Today's guest blogger is Philip O'Toole, a member of our Cloud Steelhead Engineering Team, who called me a few weeks ago with this great story. I encouraged him to blog about it. It really demonstrates the ease of use and flexibility of cloud computing.
The recently-launched Riverbed Cloud Portal is a web service, allowing for simplified deployment, easy management, licensing, and instant upgrades of the new Cloud Steelhead. Hosting the Portal itself in the Amazon Web Services (AWS) Cloud allowed us to streamline deployment and management of the Portal, significantly shortening our time-to-market, while still allowing us to meet our needs for reliability and security.
Like most modern web services, the Portal is backed by a database. And like most databases, the schema sometimes needs modification. We use an open source tool to help with database migrations, and generally it works very well. However, early during development we had an interesting experience which really showed the Portal team the power of the Cloud when it comes to collaborative development.
One day the migration tool started acting up, reporting errors in a low-level part of the code, and in a manner that seemed quite specific to our environment. I was sufficiently familiar with the tool to know that something fundamental seemed to be wrong, so I e-mailed the tool's developer. He agreed that it was very strange and that the root cause was not obvious, and he had some questions for me.
Normally the process would go something like this:
- He sends me an e-mail asking me to run a test.
- I run it (perhaps I don't run it quite right, and he needs to ask me again).
- I send him the results.
- Lather, rinse, and repeat.
- We go back and forth for a week or more until we finally determine the root cause.
Most developers have been there at some point -- it can be a long, slow process working with a developer in another country, across time zones, with both of you trying to resolve an issue like this. Often it occurs late in the development cycle, close to a release, when time is most precious.
This is where the public Cloud came to the rescue. It occurred to me that if I could reproduce the issue in an EC2 Instance (i.e. a virtual machine) in the AWS Cloud, I could then simply turn the VM over to him and let him debug it himself. After all, it's a public Cloud.
It was easy since we've got lots of these virtual machines up and running for development and testing. I fired up a scrubbed VM (so it had nothing proprietary), added access to the VM from source IP addresses outside of Riverbed (access to the VMs is locked down by default -- security is always a critical consideration), reproduced the error, and then sent him the DNS name and credentials for the VM. He logged in, quickly identified the root cause, and showed me how the VM could be patched to address the issue.
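For readers who want to try something similar, here is a minimal sketch of the "add access from an outside source IP" step. It is not what we actually ran: it assumes the login was over SSH (port 22), that access is controlled through an EC2 security group, and that the boto3 library is available. The function names, the group ID, and the example address are all hypothetical.

```python
import ipaddress


def build_ssh_ingress_rule(source_ip: str, description: str = "temporary debug access") -> dict:
    """Build a security-group rule that allows SSH from exactly one IPv4 address.

    Assumption: the outside developer logs in over SSH (TCP port 22).
    """
    # ip_address() raises ValueError for anything that is not a valid address.
    addr = ipaddress.ip_address(source_ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        # A /32 range admits the single address and nothing else.
        "IpRanges": [{"CidrIp": f"{addr}/32", "Description": description}],
    }


def open_debug_access(group_id: str, source_ip: str) -> None:
    """Apply the rule to a security group (hypothetical helper, needs AWS credentials)."""
    import boto3  # third-party; imported here so the rule builder works without it

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[build_ssh_ingress_rule(source_ip)],
    )


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-only address, used here as a placeholder.
    print(build_ssh_ingress_rule("203.0.113.7"))
```

Restricting the rule to a single /32 address keeps the VM locked down to everyone except the one person you are handing it to, and the rule can be revoked again once the debugging session is over.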
As a result, we had our fix in less than a day, and I could implement it on our real systems. It struck me how easy it was to collaborate on this issue once I could recreate a machine with the problem and turn it over to the tool's creator, who lives in England. The developer has even committed the fix to the product's publicly available source, so it's a win for everyone.
It was a very interesting process -- obvious perhaps, but I had to experience it to really understand the advantages of this use of a public cloud. After all, as any developer or system engineer knows, nothing is worse than having to ship physical hardware somewhere just so someone can work hands-on with the problematic system.