December 19, 2011

Performance: very important. Integrity: even more.

A few weeks ago a friend of mine was asking what kinds of risks WAN optimization might pose to data integrity. He was skeptical enough that he had established a policy prohibiting any form of WAN optimization on his network. I told him I thought that was a little extreme, and asked whether he had done any in-house testing to learn how, for example, RiOS strives to maintain integrity. As I thought, he hadn't.

A couple of weeks later I found myself in a similar situation: another person was questioning whether data integrity can be preserved when WAN optimizers are in the path of the traffic. Sensing a pattern, I think it's time to address this topic.

Suppose ClientA and ServerB need to communicate with each other. Network equipment between ClientA and ServerB should do everything it can to ensure that the payloads exiting one node are the same as the payloads entering the other node. Sounds simple, right? When network gear simply reads datagram headers and changes a value or two along the way without touching the payload, few people worry about data integrity. But when the network equipment actually examines and modifies the payload, well, that can make some people uncomfortable.

RiOS takes several precautions to ensure that the original goal -- traffic out equals traffic in -- is maintained. First, whenever ClientA connects to ServerB, the connection is permitted end to end: the Steelheads on both sides pass the full connection through, even if ClientA has connected to ServerB before. By not using tunnels or multiplexing, RiOS maintains a one-to-one ratio between active connections among Steelheads and the corresponding connections between clients and servers. A Steelhead doesn't attempt to spoof, or assume the identity of, a client or a server.
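To make the one-to-one idea concrete, here is a minimal Python sketch. It is a toy model, not RiOS code: the class `ConnectionTable`, its methods, and the addresses in the usage lines are hypothetical, and it assumes each client-server TCP connection is identified by its (client address, client port, server address, server port). What it illustrates is that every client connection is passed through and gets its own dedicated optimizer-to-optimizer connection, rather than being folded into a shared tunnel.

```python
from typing import Dict, Tuple

# (IP address, TCP port): hypothetical representation of one endpoint
Endpoint = Tuple[str, int]


class ConnectionTable:
    """Toy model of a 1:1 mapping between client-server connections and
    inner optimizer-to-optimizer connections (no tunnel, no multiplexing)."""

    def __init__(self) -> None:
        self._inner: Dict[Tuple[Endpoint, Endpoint], int] = {}
        self._next_id = 0

    def open(self, client: Endpoint, server: Endpoint) -> int:
        """Admit a new client connection and give it its own inner connection.

        Nothing is refused because this client reached this server before;
        a repeat visit from a new source port is simply another connection."""
        key = (client, server)
        if key in self._inner:
            raise ValueError(f"connection {key} is already open")
        inner_id = self._next_id
        self._next_id += 1
        self._inner[key] = inner_id
        return inner_id

    def close(self, client: Endpoint, server: Endpoint) -> None:
        # Tearing down the client connection tears down its inner partner too.
        del self._inner[(client, server)]

    def active(self) -> int:
        # Always equals the number of live client-server connections.
        return len(self._inner)


table = ConnectionTable()
server_b = ("10.1.0.9", 445)                # hypothetical ServerB address
table.open(("10.0.0.5", 50000), server_b)   # ClientA, first connection
table.open(("10.0.0.5", 50001), server_b)   # ClientA again: a separate inner connection
print(table.active())  # 2
```

Because each entry pairs exactly one client-server connection with exactly one inner connection, counting entries gives both totals at once; a multiplexed design would instead let many client connections share one inner connection.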

Second, RiOS continually validates the integrity and format of its data dictionary. Scalable Data Referencing, the feature that does most of the deduplication work, relies principally on byte-by-byte comparisons to ensure that the same symbolic reference always matches the same unique data chunk. When "cold" Steelheads exchange data chunks, the chunks are encoded with checksums; the checksums are compared after transmission to check for data corruption. This combination is an improvement over hashing mechanisms that run the risk of occasionally colliding. Further, if the Steelhead on one side receives a reference it no longer understands, or can't reliably reconstruct (because of a checksum mismatch, for example), it will re-request the entire chunk from the Steelhead on the other side.
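The verification steps described above can be sketched in Python. This is an illustrative model under stated assumptions, not Riverbed's implementation: SDR's real segmentation, reference format, and wire protocol aren't described here, so the sketch assumes fixed 8-byte chunks, integer references, and CRC-32 (`zlib.crc32`) as the checksum. `SenderDictionary`, `ReceiverDictionary`, and the `fetch` callback are hypothetical names. The sender uses the checksum only to find candidate chunks and confirms any match with a full byte comparison; the receiver checks the checksum on every literal chunk and re-requests any chunk it can't verify or doesn't recognize.

```python
import zlib
from typing import Callable, Dict, List, Tuple

CHUNK_SIZE = 8  # toy fixed-size chunks (assumption; not how SDR segments data)

Message = Tuple  # ("ref", ref) or ("chunk", ref, data, crc32)


class SenderDictionary:
    def __init__(self) -> None:
        self.by_ref: Dict[int, bytes] = {}
        self.candidates: Dict[int, List[int]] = {}  # crc32 -> refs with that checksum

    def encode(self, data: bytes) -> List[Message]:
        out: List[Message] = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            crc = zlib.crc32(chunk)
            # The checksum only narrows the search; a byte-by-byte comparison
            # decides, so two different chunks never share a reference.
            ref = next((r for r in self.candidates.get(crc, [])
                        if self.by_ref[r] == chunk), None)
            if ref is None:
                ref = len(self.by_ref)
                self.by_ref[ref] = chunk
                self.candidates.setdefault(crc, []).append(ref)
                out.append(("chunk", ref, chunk, crc))  # new data travels with its checksum
            else:
                out.append(("ref", ref))  # known data: send only the reference
        return out


class ReceiverDictionary:
    def __init__(self) -> None:
        self.by_ref: Dict[int, bytes] = {}

    def decode(self, messages: List[Message],
               fetch: Callable[[int], bytes]) -> bytes:
        """Rebuild the byte stream; `fetch(ref)` re-requests a whole chunk
        from the peer (assumed here to return an intact copy)."""
        parts = []
        for msg in messages:
            ref = msg[1]
            if msg[0] == "chunk":
                _, _, chunk, crc = msg
                if zlib.crc32(chunk) != crc:
                    chunk = fetch(ref)  # corrupted in transit: ask again
                self.by_ref[ref] = chunk
            elif ref not in self.by_ref:
                self.by_ref[ref] = fetch(ref)  # reference we can't resolve: ask again
            parts.append(self.by_ref[ref])
        return b"".join(parts)


sender, receiver = SenderDictionary(), ReceiverDictionary()
data = b"ABCDEFGH" * 3 + b"xyz"
msgs = sender.encode(data)  # one literal chunk, two references, then another literal chunk
print(receiver.decode(msgs, sender.by_ref.__getitem__) == data)  # True
```

The checksum lookup keeps the sender fast, while the byte comparison is what guarantees a reference never stands in for different data, even if two chunks happen to share a CRC-32 value.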

Third, RiOS doesn't assume that its data dictionary is always current. Returning to our example, suppose ClientA retrieves a file from ServerB. The Steelheads on both ends construct their respective data dictionaries. Now ClientC, in the same branch office as ClientA, retrieves the file. The wrong way to optimize this is for the client-side WAN optimizer to spoof ServerB and immediately return the content. RiOS takes care of this the right way: ClientC's connection is forwarded all the way to ServerB, ServerB returns the file, and the server-side Steelhead evaluates the return traffic patterns. Only if they exactly match the patterns already stored in the dictionary will the Steelhead send the references to the client-side Steelhead, which then returns the corresponding full data chunks to the client.
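A minimal Python sketch of that flow follows, assuming toy 4-byte chunks and in-memory objects. `FileServer`, `ServerSideOptimizer`, `ClientSideOptimizer`, and the path `/report.txt` are hypothetical stand-ins, not RiOS components. The property it demonstrates is the one described above: the client side never answers from its dictionary on its own. Every request reaches the server, and references are sent only for chunks of the server's current response that exactly match stored data, so a file that changed on the server is not served stale.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # toy fixed-size chunks (assumption)


class FileServer:
    """Stand-in for ServerB: holds the authoritative file contents."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files

    def read(self, path: str) -> bytes:
        return self.files[path]


class ServerSideOptimizer:
    def __init__(self, server: FileServer) -> None:
        self.server = server
        self.refs: Dict[bytes, int] = {}  # chunk contents -> reference

    def respond(self, path: str) -> List[Tuple]:
        data = self.server.read(path)  # always fetch the server's current answer
        out: List[Tuple] = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            # Dict lookup on bytes keys confirms equality of the full contents,
            # so a reference is sent only for an exact match.
            if chunk in self.refs:
                out.append(("ref", self.refs[chunk]))
            else:
                self.refs[chunk] = len(self.refs)
                out.append(("chunk", self.refs[chunk], chunk))
        return out


class ClientSideOptimizer:
    def __init__(self, upstream: ServerSideOptimizer) -> None:
        self.upstream = upstream
        self.chunks: Dict[int, bytes] = {}

    def get(self, path: str) -> bytes:
        parts = []
        # No shortcut: every request is forwarded upstream, even for a file
        # whose chunks are all already in the local dictionary.
        for msg in self.upstream.respond(path):
            if msg[0] == "chunk":
                self.chunks[msg[1]] = msg[2]
            parts.append(self.chunks[msg[1]])  # assumes every ref was seen before
        return b"".join(parts)


server_b = FileServer({"/report.txt": b"AAAABBBBCCCC"})
client_side = ClientSideOptimizer(ServerSideOptimizer(server_b))
print(client_side.get("/report.txt"))  # ClientA: b'AAAABBBBCCCC' (all literal)
server_b.files["/report.txt"] = b"AAAAXXXXCCCC"  # the file changes on ServerB
print(client_side.get("/report.txt"))  # ClientC: b'AAAAXXXXCCCC', not the stale copy
```

Answering ClientC from the client-side dictionary alone would save a round trip, but in this sketch it would also hand back the old `BBBB` chunk; forwarding the request means only the changed chunk crosses the WAN in full.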

If you're a long-time Riverbed customer, you probably already understand how SDR and RiOS preserve data integrity. If you're new to Riverbed, or are just looking to learn more, I hope this brief introduction has been useful. We freely admit: we are speed freaks here. But we won't let our passion for performance override our obsession with integrity. So let's go -- fast, and with trust.
blogs.riverbed.com