Monday, March 7, 2011

What is the most accurate method of estimating peak bandwidth requirement for a web application?

I am working on a client proposal and they will need to upgrade their network infrastructure to support hosting an ASP.NET application. Essentially, I need to estimate peak usage for a system with a known quantity of users (currently 250). A simple answer like "you'll need a dedicated T1 line" would probably suffice, but I'd like to have data to back it up.

Another question referenced NetLimiter, which looks pretty slick for getting a sense of what's being used.

My general thought is that I'll fire the web app up and use the system like I would anticipate it be used at the customer, really at a leisurely pace, over a certain time span, and then multiply the bandwidth usage by the number of users and divide by the time.

This doesn't seem very scientific. It may be good enough for a proposal, but I'd like to see if there's a better way.

I know there are load tools available for testing web application performance, but it seems like these would not accurately simulate peak user load for bandwidth testing purposes (too much at once).

The platform is Windows/ASP.NET and the application is hosted within SharePoint (MOSS 2007).

From stackoverflow
  • There are several additional questions that need to be asked here.

    Is it 250 total users, or 250 concurrent users? If concurrent, is that 250 peak, or 250 typically? If it's 250 total users, are they all expected to use it at the same time (eg, an intranet site, where people must use it as part of their job), or is it more of a community site where they may or may not use it? I assume the way you've worded this that it is 250 total users, but that still doesn't tell enough about the site to make an estimate.

    If it's a community or "normal" internet site, it will also depend on the usage - eg, are people really going to be using this intensely, or is it something that some users will simply log into once, and then forget? This can be a tough question from your perspective, since you will want to assume the former, but if you spend a lot of money on network infrastructure and no one ends up using it, it can be a very bad thing.

    What is the site doing? At the low end of the spectrum, there is a "typical" web application, where you have reasonable size (say, 1-2k) pages and a handful of images. A bit more intense is a site that has a lot of media - eg, flickr style image browsing. At the upper end is a site with a lot of downloads - streaming movies, or just large files or datasets being downloaded.

    This is getting a bit outside the threshold of your question, but another thing to look at is the future of the site: is the usage going to possibly double in the next year, or month? Be wary of locking into a long term contract with something like a T1 or fiber connection, without having some way to upgrade.

    Another question is reliability - do you need redundancy in connections? It can cost a lot up front, but there are ways to do multi-homed connections where you can balance access across a couple of links, and then just use one (albeit with reduced capacity) in the event of failure.

    Another option to consider, which effectively lets you completely avoid this entire question, is to just host the application in a datacenter. You pay a relatively low monthly fee (low compared to the cost of a dedicated high-quality connection), and you get as much bandwidth as you need (eg, most hosting plans will give you something like 500GB transfer a month, to start with - and some will just give you unlimited). The datacenter is also going to be more reliable than anything you can build (short of your own 6+ figure datacenter) because they have redundant internet, power backup, redundant cooling, fire protection, physical security.. and they have people that manage all of this for you, so you never have to deal with it.

  • In lieu of a good reporting tool for bandwidth usage, you can always do a rough guesstimate.

    N = Number of page views in busiest hour P = Average Page size

    (N * P) /3600) = Average traffic per second.

    The server itself will have a lot more internal traffic for probably db server/NAS/etc. But outward facing that should give you a very rough idea on utilization. Obviously you will need to far surpass the above value as you never want to be 100% utilized, and to allow for other traffic.

    I would also not suggest using an arbitrary number like 250 users. Use the heaviest production day/hour as a reference. Double and triple if you like, but that will give you the expected distribution of user behavior if you have good log files/user auditing. It will help make your guesstimate more accurate.

    As another commenter pointed out, a data center is a good idea, when redundancy and bandwidth availability become are a concern. Your needs may vary, but do not dismiss the suggestion lightly.

    Eric Nguyen : To go from average traffic estimates to capacity planning for burst loads, see Charlie Martin's excellent notes on this comment: http://stackoverflow.com/questions/379478/whats-the-best-way-to-determine-the-hardware-requirements-for-an-application/379543#379543

0 comments:

Post a Comment