Monday, October 4, 2010

Google Cloud Vs Amazon Cloud - An architectural perspective - Part 2

Check out the previous blog entry in the series: Google Cloud Vs Amazon Cloud - An architectural perspective - Part 1

Amazon Web Services (AWS)

Unlike Google, Amazon follows a service-oriented architecture to solve your cloud computing requirements. Amazon offers a suite of diverse, fine-grained services that you can mix and match to suit your requirements. Shown below is a rough sketch of how an AWS app works. Amazon offers services like EC2 for CPU cycles, S3 for storage, CloudFront for content distribution, etc. To see the full list of services offered by Amazon, look here:  

Strictly speaking, there is no such thing as an AWS app. Any application that uses one or more services from the AWS suite can be called an AWS app. (Compare this to the strict definition of a GAE app: it needs to implement specific interfaces, be packaged as a WAR, and have an App Engine-specific deployment descriptor called appengine-web.xml.) In essence, your AWS app could be as simple as a command-line utility that uses the Amazon Simple Storage Service (S3) to back up your old photos and videos. It could also be an enterprise-scale deployment of several EC2 servers that uses Elastic Load Balancing to balance load, stores data in Amazon Elastic Block Store (EBS) and delivers media via Amazon CloudFront. Think of an AWS app as a mashup. You pick the services you want to use and forget about the rest.

Amazon offers you a great deal of flexibility in how you build on their infrastructure. For example, you can get an Amazon EC2 instance and run Tomcat on it. The EC2 instance, which is a virtual instance, will look and behave just like a regular dedicated Linux server you would get from Rackspace or GoDaddy. You get root access and you can do practically anything you would be able to do with your own box. The only difference is that if you want to scale your deployment up or down, you can do it immediately. If your website is slashdotted, you can get 100 instances up and running in minutes.

Amazon does not force you to use any of their services. So while the Tomcat on your EC2 instance can use Amazon SimpleDB for persistence, nothing stops you from configuring MySQL on your instance and using it as your database. However, and this is the most important thing, YOU will need to figure out how to scale your MySQL setup up and down based on demand, how to configure replication, and so on. Amazon cannot magically do that for you.

Where AWS shines

Amazon provides you with the building blocks to build any complex cloud deployment you can think of. It gives you the flexibility to choose the features you need and ignore the rest. With AWS you are in complete control of your deployment topology. AWS is also pay-per-use, which means you pay only for what you use.

In the next blog entry, I will demystify some marketing jargon, discuss the pros and cons of each platform and give you some recommendations on how to choose the right platform for your needs.

Sunday, October 3, 2010

Membase - A super fast RDBMS alternative

I recently came across Membase. Membase is a key-value database that is a super fast replacement for a traditional RDBMS. If your web application needs to read a lot of data as quickly as possible, Membase is what you are looking for.

Membase is used by Zynga, the largest application developer on Facebook.

Saturday, October 2, 2010

Google Cloud Vs Amazon Cloud - An architectural perspective - Part 1

In this series of blog entries, I'll explore the two main cloud platforms available to Java developers today: Google's Cloud offering (called the Google App Engine) and Amazon's Cloud offering (called Amazon Web Services).  I'll make the comparison from a technical perspective.  If you are a developer/architect evaluating these options, you might find this useful.

Google App Engine (GAE)

Google App Engine (GAE) is Google's solution to your cloud computing requirements. GAE follows a container-based architecture. If you are a Java developer, you are probably familiar with the concept of a Servlet container or a Java EE container. In a container-based architecture, you write a component based on certain interfaces and drop (deploy) the component into the container. The container then manages the component and its life cycle for you. For example, you write a Servlet component and drop it into your servlet container. On a web request, the servlet container initializes your servlet and calls the doGet method automatically for you.
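To make the container idea concrete, here is a minimal sketch in plain Java. It is not the Servlet API: the `Component` interface, `MiniContainer` and `HelloComponent` are hypothetical names I made up for illustration. The point is only that the container, not your code, decides when a component is initialized and called.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class MiniContainerDemo {

    /** Hypothetical component contract, loosely modelled on a servlet's life cycle. */
    interface Component {
        void init();
        String handle(String request);
    }

    /** Hypothetical container: it owns the life cycle of every deployed component. */
    static class MiniContainer {
        private final Map<String, Component> deployed = new LinkedHashMap<>();
        private final Set<String> initialized = new HashSet<>();

        /** "Dropping" a component into the container only registers it. */
        void deploy(String path, Component component) {
            deployed.put(path, component);
        }

        /** On a request, the container initializes the component lazily, then calls it. */
        String dispatch(String path, String request) {
            Component component = deployed.get(path);
            if (component == null) {
                return "404";
            }
            if (initialized.add(path)) { // add() returns true only the first time
                component.init();
            }
            return component.handle(request);
        }
    }

    /** A component written against the contract; it never calls itself. */
    static class HelloComponent implements Component {
        int initCalls = 0;

        @Override
        public void init() {
            initCalls++;
        }

        @Override
        public String handle(String request) {
            return "Hello, " + request;
        }
    }

    public static void main(String[] args) {
        MiniContainer container = new MiniContainer();
        container.deploy("/hello", new HelloComponent());
        System.out.println(container.dispatch("/hello", "world")); // prints "Hello, world"
    }
}
```

Notice that `HelloComponent` has no `main` of its own and never decides when it runs. That inversion is what a servlet container does for your servlets.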

The Google App Engine (GAE) is also a container. You write a Google App Engine app (GAE app), drop (deploy) it into the Google App Engine, and the App Engine manages the app for you. For example, you can build an online discussion forum as a GAE app. You deploy it to GAE and let GAE handle the provisioning for you. When users hit your application, GAE will automatically load your GAE app and serve the requests. If there are no users accessing your application, GAE puts your application to "sleep", saving you from using up valuable CPU cycles. On a new web request, GAE can "wake up" your application and service the request. If your application receives a large number of requests, GAE will replicate your application across multiple servers, or "scale it up", to meet the demand. Likewise, as your load eases, GAE can "scale it down". All this is done with no intervention from you. That's the promise of GAE.
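The sleep / scale-up / scale-down behaviour can be pictured with a toy model. This is only a sketch under my own assumptions: I do not know how GAE's scheduler actually decides, and `instancesNeeded` and the capacity figure of 100 requests per second per instance are invented for illustration.

```java
class ScalingSketch {

    /**
     * Toy model: how many instances a container might keep running for a given load.
     * Zero traffic means zero instances (the app is "asleep"); otherwise round up
     * so that no instance is asked to serve more than its capacity.
     */
    static int instancesNeeded(int requestsPerSecond, int capacityPerInstance) {
        if (capacityPerInstance <= 0) {
            throw new IllegalArgumentException("capacityPerInstance must be positive");
        }
        if (requestsPerSecond <= 0) {
            return 0;
        }
        return (requestsPerSecond + capacityPerInstance - 1) / capacityPerInstance;
    }

    public static void main(String[] args) {
        int capacity = 100; // assumed requests per second one instance can serve
        int[] loads = {0, 40, 950, 120}; // idle, first visitors, traffic spike, load easing
        for (int load : loads) {
            System.out.println(load + " req/s -> " + instancesNeeded(load, capacity) + " instance(s)");
        }
    }
}
```

With these assumed numbers the model goes from 0 instances to 1, up to 10 during the spike, and back down to 2 as the load eases. On GAE this calculation, whatever form it really takes, happens without you writing any of it.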

GAE also offers a variety of services that are available to your GAE app. Need a database? Instead of setting up your own database and then replicating/clustering it, use the Datastore API and Google will handle those tasks for you. Need to store large media files? Use the Blobstore service. Need a super fast, temporary data store? Use Memcache.
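A common way to combine a fast temporary store with a durable one is the cache-aside pattern. The sketch below uses plain Java maps as stand-ins; it does not use the real Memcache or Datastore APIs, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class CacheAsideSketch {
    // Stand-in for a Memcache-style store: fast, but entries may vanish at any time.
    private final Map<String, String> cache = new HashMap<>();
    // Stand-in for a durable datastore: slower, but the source of truth.
    private final Map<String, String> datastore = new HashMap<>();
    int datastoreReads = 0;

    /** Writes go to the durable store; the stale cache entry is dropped. */
    void put(String key, String value) {
        datastore.put(key, value);
        cache.remove(key);
    }

    /** Reads try the cache first and fall back to the durable store on a miss. */
    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        datastoreReads++;
        String value = datastore.get(key);
        if (value != null) {
            cache.put(key, value); // remember it for the next reader
        }
        return value;
    }

    /** Simulates the temporary store losing its contents. */
    void evictAll() {
        cache.clear();
    }

    public static void main(String[] args) {
        CacheAsideSketch forum = new CacheAsideSketch();
        forum.put("thread:1", "Welcome to the forum");
        forum.get("thread:1"); // miss: read from the durable store
        forum.get("thread:1"); // hit: served from the cache
        System.out.println("datastore reads: " + forum.datastoreReads); // prints "datastore reads: 1"
    }
}
```

Because the cache is temporary, the code must always be able to rebuild an entry from the durable store. That is why a cache complements the Datastore rather than replacing it.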

The GAE App contract

GAE can make your app work only if your app adheres to a strict contract with the container. The contract consists of:

1. Programmatic interfaces
2. Programmatic restrictions

Programmatic Interfaces

GAE provides APIs to the services that it offers. You must use only the APIs/services provided by GAE in your GAE app. For example, for persistent data storage, GAE provides a Datastore. You must use the GAE-provided Datastore to persist your data. You cannot configure your own MySQL instance and use it on GAE.

Programmatic Restrictions

GAE places a lot of restrictions on what you can and cannot do in a GAE app. For example, you cannot read from or write to the filesystem. You are also not allowed to spawn your own threads in a GAE app. Some of these make sense. If you were to do file I/O in your GAE app (file I/O being local to the machine), how would you expect GAE to automatically provision servers for you during high load?
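Here is a small sketch of why machine-local state and automatic provisioning do not mix. Each `Instance` below is a hypothetical stand-in for one server running your app; its `localDisk` map plays the role of the local filesystem, and the shared map plays the role of a service such as the Datastore. None of this is GAE code.

```java
import java.util.HashMap;
import java.util.Map;

class LocalStateSketch {

    /** Hypothetical stand-in for one server that the platform has provisioned. */
    static class Instance {
        private final Map<String, String> localDisk = new HashMap<>(); // visible to this instance only
        private final Map<String, String> sharedStore;                 // visible to every instance

        Instance(Map<String, String> sharedStore) {
            this.sharedStore = sharedStore;
        }

        void saveLocally(String name, String data) { localDisk.put(name, data); }
        String readLocally(String name) { return localDisk.get(name); }

        void saveShared(String name, String data) { sharedStore.put(name, data); }
        String readShared(String name) { return sharedStore.get(name); }
    }

    public static void main(String[] args) {
        Map<String, String> shared = new HashMap<>();
        Instance a = new Instance(shared);
        Instance b = new Instance(shared); // added by the platform under high load

        // A request handled by instance A stores an upload both ways.
        a.saveLocally("avatar.png", "image-bytes");
        a.saveShared("avatar.png", "image-bytes");

        // The next request from the same user happens to land on instance B.
        System.out.println(b.readLocally("avatar.png")); // prints "null"
        System.out.println(b.readShared("avatar.png"));  // prints "image-bytes"
    }
}
```

The data written to A's "disk" is invisible to B, so the app breaks as soon as a second instance exists. Forcing all state through shared services is what lets the container add and remove instances freely.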

One of the unfortunate consequences of these restrictions is that not all libraries that you use in your regular Java SE or Java EE projects will work. You cannot use a library that does file I/O anywhere (even if the library is not primarily meant for file I/O); the library will fail to run on GAE. The GAE community maintains a large list of libraries that are compatible/incompatible with GAE: Will it Play in App Engine

Where App Engine Shines

If your application can live with the programming interfaces and restrictions of GAE, then GAE would make an excellent choice as your cloud platform. It can relieve you of a lot of pain related to server management, provisioning, sizing and administration. GAE is also a pay-per-use platform. This means you do not incur any upfront fees to build on GAE. GAE offers a significant free quota that lets you get started at no cost and pay as you scale.