We are Web 2.0 company that built a hosted Content Management solution from the ground up using LAMP. In short, people log into our backend to manage their website content and then use our API to extract that content. This API gets plugged into templates that can be hosted anywhere on the interwebs.
Scaling for us has progressed as follows:
- Shared hosting (1and1)
- Dedicated single server hosting (Rackspace)
- 1 Web Server, 1 DB Server (Rackspace)
- 1 Backend Web Server, 1 API Web Server, 1 DB Server
- Memcache, caching, caching, caching.
The question is, what's next for us? Every time one of our sites are dugg or mentioned in a popular website, our API server gets crushed with too many connections. Or every time our DB server gets overrun with queries, our Web server requests back up.
This is obviously the 'next problem' for any company like ours and I was wondering if you could point me in some directions.
I am currently attracted to the virtualization solutions (like EC2) but need some pointers on what to consider.
What/where/how to scale is dependent on what your issues are. Since you've been hit a few times, and you know it's the API server, you need to identify what's actually causing the issue.
Is it DB lookup times?
A volume of requests that the web server just can't handle even though they're shortlived?
API requests take too long to process? (independent of DB lookups, e.g., does the code take a bit to run)?
Once you identify WHAT the problem is, you should have a pretty clear picture of what you need to do. If it's just volume of requests, and it's the API server, you just need more web servers (and code changes to allow horizontal scaling) or a beefier web server. If it's API requests taking too long, you're looking at code optimizations. There's never a 1-shot fix when it comes to scalability.
The most common scaling issues have to do with slow (2-3 seconds) execution of the actual code for each request, which in turn leads to more web servers, which leads to more database interactions (for cross-server sessions, etc.) which leads to database performance issues. High performance, server independent code with memcache (I actually prefer a wrapper around memcache so the application doesn't know/care where it gets the data from, just that it gets it and the translation layer handles DB/memcache lookups as well as populating memcache).
What is the level of scaling you are looking for? Is it a stop-gap solution e.g. scale vertically? If it is a more strategic scaling project, does your current architecture support scaling horizontally?
Depends really if your bottleneck is reads or writes. Scaling writes is much harder than reads.
It also depends on how much data you have in the database.
If your database is small, but cannot cope with the read load, you can deploy enough ram that it fits in ram. If it still cannot cope, you can add read-replicas, possibly on the same box as your web servers, this will give you good read-scalability - the number of slaves from one MySQL master is quite high and will depend chiefly on the write workload.
If you need to scale writes, that's a totally different game. To do that you'll need to split your data out, either horizontally (partitioning / sharding) or vertically (functional partitioning etc) so that you can spread the workload over several write servers which do not need to do each others' work.
I'm not sure what EC2 can do for you, it essentially offers slow, high latency machines with nonpersistent discs and low IO performance on the end of a more-or-less nonexistent SLA. I guess it might be useful in your case as you can provision them relatively quickly - provided you're just using them as read-replicas and you don't have too much data (remember they have nonpersistent discs and sucky IO)