Thursday, March 31, 2011

What is the best way to scale out work to multiple machines?

We're developing a .NET app that must make up to tens of thousands of small webservice calls to a 3rd party webservice. We would prefer a single, more 'chunky' call, but the 3rd party does not support it. We've designed the client to use a configurable number of worker threads, and through testing have code that is fairly well optimized for one multicore machine. However, we still want to improve the speed, and are looking at spreading the work across multiple machines. We're well versed in typical client/server/database apps, but new to designing for multiple machines. So, a few questions related to that:
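The worker-thread fan-out described above can be sketched roughly as follows. This is illustrative Python rather than the app's actual .NET code, and `call_service` is a hypothetical stand-in for the 3rd-party webservice call:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the 3rd-party webservice call; the real
# client would issue a small HTTP request here (via WebClient in .NET).
def call_service(item):
    return item * 2

def run_batch(items, workers=8):
    # Fan the many small calls out across a configurable pool of
    # worker threads, as the client described above does.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(call_service, items))

results = run_batch(range(5))
```

Because each call is I/O-bound, a thread pool like this scales until the per-host connection limit or the remote service becomes the bottleneck, which is why the answers below focus on those two constraints.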

  • Is there any other client-side optimization, besides multithreading, that we should look at to improve the speed of an HTTP request/response? (I should note this is a non-standard webservice, so it is implemented using WebClient, not a WCF or SOAP client.)
  • Our current thinking is to use WCF to publish chunks of work to MSMQ, and run clients on one or more machines to pull work off of the queue. We have experience with WCF + MSMQ, but want to be sure we're not missing better options. Are there other, better ways to do this today?
  • I've seen some 3rd party tools like DigiPede and Microsoft's HPC offerings, but these seem like overkill. Any experience with those products or reasons we should consider them over roll-our-own?
From stackoverflow
  • Sounds like your goal is to execute all these web service calls as quickly as you can and get the results tabulated. Given that, your biggest lever is going to be the number of concurrent requests you can make.

    Be sure to look at your client-side connection limits. I believe the system default is two connections per host. I haven't tried this myself, but by raising the limit you should theoretically see a multiplier effect: more connections from a single machine means more requests in flight. There's more info on the MS forums.
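For reference, this limit lives in app.config under `system.net` (the thread below confirms `maxConnections` in app.config is what helped). Something like the following raises the per-host cap; the value 20 is just an example, and the same setting can be made in code via `ServicePointManager.DefaultConnectionLimit`:

```xml
<configuration>
  <system.net>
    <connectionManagement>
      <!-- Raise the per-host connection limit from the default of 2 -->
      <add address="*" maxconnection="20" />
    </connectionManagement>
  </system.net>
</configuration>
```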

    The MSMQ option works well. I'm running that configuration myself. ActiveMQ is also a fine solution, but MSMQ is already on the server.

    You have a good starting point. Get that in operation, then move on to performance and throughput.

    Daniel : wish I could mark all the answers - all very helpful, but this one led me to setting maxConnections in my app.config, which led to a 2x improvement in speed. Still not where we want to be, so we're looking into MSMQ, Rhino, and EC2/Azure.
  • You might want to consider Rhino Service Bus instead of MSMQ. The source is available here.

    lfoust : Isn't Rhino Service Bus built on top of MSMQ?
    configurator : Yes it is. I meant this as a wrapper around MSMQ.
  • At CodeMash this year, Wesley Faler did an interesting presentation on this sort of problem. His solution was to store "jobs" in a DB, then use clients to pull down work and mark status when complete.

    He then pushed the whole infrastructure up to Amazon's EC2.

    Here are his slides from the presentation - they should give you the basic idea:

    I've done something similar with multiple PCs locally - the basics of managing the workload were similar to Faler's approach.

  • If you have optimized the code, you could look into optimizing the network side to minimize the number of packets sent:

    • reuse HTTP sessions (i.e.: multiple transactions into one session by keeping the connection open, reduces TCP overhead)
    • reduce the number of HTTP headers to the minimum in the request to save bandwidth
    • if supported by server, use gzip to compress the body of the request (need to balance CPU usage to do the compression, and the bandwidth you save)
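To illustrate the gzip point, a small Python sketch; the payload is made up, and a real client would pair this with a single keep-alive connection reused across many requests rather than a new TCP connection per call:

```python
import gzip

# Placeholder request body; a repetitive payload compresses well.
body = b'{"query": "example"}' * 50
compressed = gzip.compress(body)

# Keep the header set minimal - only send what the server needs.
headers = {
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),
    "Connection": "keep-alive",
}

saved = len(body) - len(compressed)
```

Whether the compression pays off depends on the balance the answer mentions: CPU spent compressing versus bandwidth (and packets) saved, which favors gzip for larger, repetitive bodies and plain bodies for tiny ones.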
