
August 04, 2008

Sizing WAN Optimization Solutions

Often my discussions with customers involve questions about identifying the right-sized appliance for each remote site in the network. Identifying the correct Steelhead appliance model is usually a routine process: estimate the number of users at each site, then multiply that number by five to get the number of TCP connections needed at that site. The right Steelhead model is one that supports at least the calculated number of TCP connections. A secondary consideration is matching the Steelhead to the amount of WAN bandwidth at that site. Occasionally, however, I come across a customer who doesn't know how many WAN users they have at each branch office. Typically this is a large corporation where employee counts are known by only a few people, or are kept only by the facilities organization. What usually happens in this case is that I get asked to propose a sizing based only on the amount of WAN bandwidth at each site.
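The user-based rule of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not Riverbed's sizing tool: the model names, connection limits, and throughput figures in the hypothetical `CATALOG` are placeholders, not real Steelhead specifications. The connection estimate is the primary filter and WAN bandwidth the secondary one, as described above.

```python
from dataclasses import dataclass
from typing import Optional

CONNECTIONS_PER_USER = 5  # rule of thumb: about five TCP connections per user


@dataclass(frozen=True)
class Model:
    name: str
    max_tcp_connections: int
    optimized_wan_mbps: float


# Hypothetical catalog: placeholder numbers, NOT real Steelhead specifications.
CATALOG = [
    Model("small", 200, 2.0),
    Model("medium", 1000, 10.0),
    Model("large", 5000, 45.0),
]


def estimate_connections(users: int) -> int:
    """Estimated TCP connections for a site: users times five."""
    return users * CONNECTIONS_PER_USER


def pick_model(users: int, wan_mbps: float, catalog=CATALOG) -> Optional[Model]:
    """Smallest model meeting the connection estimate, preferring one that also covers the WAN link."""
    needed = estimate_connections(users)
    # Primary criterion: the connection limit must cover the estimate.
    candidates = sorted(
        (m for m in catalog if m.max_tcp_connections >= needed),
        key=lambda m: m.max_tcp_connections,
    )
    if not candidates:
        return None  # no single model in the catalog is large enough
    # Secondary criterion: optimized WAN throughput should cover the link.
    for m in candidates:
        if m.optimized_wan_mbps >= wan_mbps:
            return m
    # Otherwise fall back to the qualifying model with the most WAN throughput.
    return max(candidates, key=lambda m: m.optimized_wan_mbps)
```

For example, a 30-user site on a 5Mbps link needs an estimated 150 connections; the hypothetical "small" model covers that but not the bandwidth, so `pick_model(30, 5.0)` returns "medium".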

This is a vexing exercise, because it essentially means proposing a solution without knowing the requirements. It's like ordering pizza for a meeting without knowing how many people will show up: buy too much and you waste a lot of pizza; buy too little and you end up with hungry, angry meeting attendees.

As we all should know by now, WAN performance isn't just about compression and bandwidth; it's also about WAN latency and protocol chattiness. Many protocols we use, such as CIFS and Exchange/MAPI, are extremely chatty, requiring a large number of round trips across the high-latency WAN. To address this, an effective WAN optimization solution must be stateful and application-aware. In Riverbed's case, the latency-mitigation mechanisms for these application protocols require memory and CPU resources to be allocated for layer 4-7 optimization of each discrete application session and TCP connection. Because hardware resources are consumed on a per-connection basis, Riverbed sets a TCP connection limit for each Steelhead model so that no appliance exhausts its available hardware resources.
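The per-connection resource argument can be illustrated with a toy model. This is an assumption-laden sketch, not how a Steelhead actually manages memory: it supposes a fixed memory budget and a fixed, hypothetical amount of state per optimized connection, so the connection limit falls out as budget divided by per-connection cost. In this sketch, a connection that arrives when the table is full simply gets no optimization state.

```python
class ConnectionTable:
    """Toy model of per-connection optimization state with a hard limit.

    Assumption: every optimized connection reserves the same fixed amount of
    memory; real appliances are more complex.
    """

    def __init__(self, memory_budget_mb: float, per_connection_mb: float):
        if per_connection_mb <= 0:
            raise ValueError("per-connection cost must be positive")
        # The connection limit follows directly from the resource budget.
        self.limit = int(memory_budget_mb // per_connection_mb)
        self.active = set()

    def open(self, conn_id) -> bool:
        """Try to allocate optimization state; return False if the table is full."""
        if conn_id in self.active:
            return True
        if len(self.active) >= self.limit:
            return False
        self.active.add(conn_id)
        return True

    def close(self, conn_id) -> None:
        """Release the state held for a finished connection."""
        self.active.discard(conn_id)
```

Doubling the per-connection cost halves the limit, which is the trade-off the post describes: deeper application-layer optimization means more state per connection and therefore fewer connections per appliance.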

But there is another reason why WAN bandwidth alone is an ambiguous metric: are we talking about the bandwidth required before or after WAN optimization is deployed? In many cases a customer starts out with a 10Mbps CIR at a given branch office. After Steelheads are deployed, the customer is able to reduce the CIR to 4Mbps to save on WAN telecom costs, with no performance impact, because the Steelheads reduce the traffic load on the WAN. In this case, should a Steelhead 2020 with 10Mbps of optimized WAN throughput have been proposed, or a Steelhead 1520 with 4Mbps of optimized WAN throughput? Sizing an appliance purely on the pre-existing WAN bandwidth cannot answer that question.
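The ambiguity in the 10Mbps-versus-4Mbps example comes down to simple arithmetic. A minimal sketch: if optimization lets the site drop from 10Mbps to 4Mbps, the optimized link carries 40% of the original traffic, i.e., a 60% reduction. The reduction fraction here is back-calculated from the example; in practice it depends on the traffic mix and is not known before deployment, which is exactly why a bandwidth-only sizing is ambiguous.

```python
def observed_reduction(pre_cir_mbps: float, post_cir_mbps: float) -> float:
    """Fraction of WAN traffic removed by optimization, inferred from the CIR change."""
    if pre_cir_mbps <= 0:
        raise ValueError("pre-optimization CIR must be positive")
    return 1 - post_cir_mbps / pre_cir_mbps


def post_optimization_cir(pre_cir_mbps: float, reduction: float) -> float:
    """CIR needed after optimization, given an assumed reduction fraction."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return pre_cir_mbps * (1 - reduction)


# The same branch yields two different "bandwidth" figures to size against.
pre, post = 10.0, 4.0
print(f"reduction: {observed_reduction(pre, post):.0%}")  # reduction: 60%
```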

Then I'm confronted by the fact that some of our competitors provide concrete sizing based only on the amount of WAN bandwidth, with no dependency on the number of users at each site. Why don't they care how many users are at each site, while Riverbed has to size by estimating the number of users and the number of connections each user opens?

Well, the reason some competitors support a very high or unlimited number of TCP connections is that they don't do much in the way of application-specific optimization. Since they do little at the application layer, they need little additional memory and CPU for each new TCP connection and application session; these products focus only on compression and bandwidth reduction. To them, it doesn't matter whether the 1000 packets they observe in the network all belong to one TCP connection or are one packet each from 1000 distinct TCP connections; they treat those 1000 packets the same way in either case, and consume about the same amount of memory and CPU. Yes, they can support a high number of TCP connections, but only because they do very little to address protocol chattiness and WAN latency. It follows that they deliver very limited performance improvement for chatty protocols such as CIFS, MAPI, NFS, and HTTP.

Josh Tseng


Comments


Josh,

One way I do this is by looking at the flow cache on a Cisco router during normal business hours.

Then I double that number.

So if I see 200-300 active TCP sessions via sh ip route cache flow, I know I should estimate around 400-600.

It's no more an exact science than taking the number of people in the office and multiplying by 5, because not all the people in the office will be using WAN connectivity, so I figure this is a bit more realistic.

Once our Riverbed install goes live in the next few months, I should be able to see whether my estimates were realistic.
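Matt's rule of thumb (observe active TCP sessions during business hours, then double the numbers) can be written down as a tiny helper. A minimal sketch: the samples are assumed to be active-session counts already read off the router at several points in the day; how they are collected is outside the sketch, and the factor of 2 is Matt's headroom, not a Riverbed guideline.

```python
def estimate_from_samples(samples: list[int], headroom: float = 2.0) -> tuple[int, int]:
    """Return a (low, high) connection estimate from observed active-session counts."""
    if not samples:
        raise ValueError("need at least one sample")
    if headroom < 1:
        raise ValueError("headroom should not shrink the estimate")
    # Scale the observed range by the headroom factor (2x in Matt's comment).
    return round(min(samples) * headroom), round(max(samples) * headroom)


# Matt's example: 200-300 active sessions observed, so plan for roughly 400-600.
print(estimate_from_samples([200, 240, 300]))  # (400, 600)
```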

Hi Matt,

Yes, that's a very interesting approach. I wish I had thought of it on several occasions. In one case, for a very large customer, we actually went through tcpdump captures with tcptrace and counted the net number of connections set up between 2am and 11am. Please keep me posted on how well the output of that command correlates with the number of active connections the Steelhead sees and optimizes.

Josh
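Josh's tcpdump/tcptrace exercise boils down to counting distinct connections opened within a time window. The sketch below assumes the capture has already been reduced to a list of (start time, connection 4-tuple) records; it does not parse tcpdump or tcptrace output, and the record format is hypothetical. Deduplicating on the 4-tuple keeps a retransmitted SYN from being counted twice, at the cost of merging the rare case where the same 4-tuple is reused within the window.

```python
from datetime import datetime, time

# Hypothetical record key: (src_ip, src_port, dst_ip, dst_port).
FourTuple = tuple[str, int, str, int]


def count_setups(records: list[tuple[datetime, FourTuple]],
                 start: time = time(2, 0), end: time = time(11, 0)) -> int:
    """Count distinct connections whose setup time falls in [start, end)."""
    seen: set[FourTuple] = set()
    for started_at, conn in records:
        if start <= started_at.time() < end:
            seen.add(conn)
    return len(seen)


records = [
    (datetime(2008, 8, 4, 1, 59), ("10.1.1.5", 50000, "10.9.9.9", 445)),  # before window
    (datetime(2008, 8, 4, 8, 30), ("10.1.1.5", 50001, "10.9.9.9", 445)),
    (datetime(2008, 8, 4, 8, 30, 3), ("10.1.1.5", 50001, "10.9.9.9", 445)),  # retransmitted SYN
    (datetime(2008, 8, 4, 9, 15), ("10.1.1.7", 50100, "10.9.9.10", 135)),
]
print(count_setups(records))  # 2
```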

Hello Josh,
Did you ever get a result back from Matt on his "sh ip route cache" method for sizing his project?

Thanks
Keith

Hi Keith,

I haven't heard back from Matt specifically, but as he mentions, it's not an exact science. I would just note that, as in any other sizing situation, network usage tends to grow over time. That being the case, when sizing a new deployment it's best to have room to upgrade if necessary. In the case of Riverbed, that means selecting, if possible, a model that can be upgraded with a software key to increase its TCP connection limit.

Best,
Josh
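The point about growth can be made concrete with a compound-growth projection. A minimal sketch, assuming a constant annual growth rate in connection count (the rate is an input you would have to estimate; nothing here is a Riverbed figure): it reports how many years pass before the projected count exceeds a given model's connection limit, which indicates how soon a software-key upgrade or a larger model would be needed.

```python
from typing import Optional


def years_until_limit(connections: float, limit: int, annual_growth: float,
                      max_years: int = 20) -> Optional[int]:
    """Years until a compounding connection count exceeds `limit`.

    Returns 0 if the count already exceeds the limit, or None if it stays
    under the limit for `max_years`.
    """
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    years = 0
    while connections <= limit:
        if years >= max_years:
            return None
        connections *= 1 + annual_growth  # one year of compound growth
        years += 1
    return years


# Hypothetical site: 1000 connections today, 50% yearly growth, 2000-connection limit.
print(years_until_limit(1000, 2000, 0.5))  # 2
```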
