Are there any scalability best practices specifically for sites with huge audiences?

While this question has been asked in a variety of contexts before, I can't find any information pertaining specifically to sites targeting very large audiences - for example on the scale of hundreds of thousands or even millions of users.

When writing sites that target smaller audiences (such as intranet-hosted, data-driven sites that handle from a few to a few thousand users), we only tend to follow best practices within the confines of our project budgets/deadlines - i.e. developer costs, rollout schedules and maintainability have a far bigger impact than we would often like on how we code things.

Some things are also negligible (to a point), for instance delivery time, image compression/size and bandwidth, because on a LAN-hosted application these carry a relatively small financial cost that (within reason) we don't need to worry about too much.

However, when looking to target a much broader audience, for instance an audience of (hopefully) millions of users:

  • Are there any best practices that no longer need to be worried about (i.e. become more negligible the larger the audience)?
  • Are there any practices that should be adhered to even more tightly?
  • Also, are there any practices that only really come into play as your audience achieves some critical mass [and what would that critical mass be]? i.e. applying artificial constraints that wouldn't begin to concern you on a private network

Examples I've come across so far are:

  • Load libraries such as jQuery from Google's CDN rather than from your own servers, as the CDN can usually deliver them much faster. This will also help keep bandwidth costs down for delivery of your site.
  • Host images on a CDN for the same reason as hosting your JavaScript code elsewhere.
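
As a rough sketch of the two examples above (in Python; the CDN host `cdn.example.com` and the `asset_url` helper are made-up names for illustration, not part of any real library), templates could build every static URL through one function, so that switching between CDN and local hosting is a single flag:

```python
# Hypothetical CDN host, used only for illustration.
CDN_BASE = "https://cdn.example.com/"


def asset_url(path: str, use_cdn: bool = True, local_base: str = "/static/") -> str:
    """Return the URL a template should use for a static asset."""
    base = CDN_BASE if use_cdn else local_base
    # Normalise the slashes so the base and the path are joined by exactly one "/".
    return base.rstrip("/") + "/" + path.lstrip("/")


# Images and scripts go through the same helper.
logo = asset_url("/img/logo.png")
script = asset_url("js/app.js", use_cdn=False)
```

On an intranet the flag would simply stay off, which is the kind of constraint that only starts to matter on a public site.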
13.10.2009 17:51:16

I guess it depends on what one aims for on the "triangle" of pressures: CAP (Consistency, Availability & Partition tolerance). E.g. one can only have so much "C" when faced with network disruptions, which bring "P" into play.

Nowadays, it would appear that the accent is put more on delivering a "good user experience", which seems to hinge on "Time to Result" (e.g. having a complete web page on the user's desktop): this translates to investing (amongst other things) more on the "A" and "P" sides than on the "C" one.

More concretely: spend some time deciding when to perform data aggregation for the presentation layer, e.g. can I aggregate this data over a longer time period before recomputing another view to push to users?
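
A minimal sketch of that idea in Python, assuming the aggregate can tolerate being slightly stale: the view is recomputed at most once per time window and served from memory in between. The `CachedAggregate` class and its parameters are hypothetical names for illustration only.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class CachedAggregate(Generic[T]):
    """Recompute an expensive aggregate at most once per `ttl_seconds`."""

    def __init__(
        self,
        compute: Callable[[], T],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._value: Optional[T] = None
        self._expires_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        # Recompute only on first use or once the window has elapsed.
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value
```

A longer `ttl_seconds` trades freshness (the "C" side) for fewer recomputations and faster responses, which is the trade-off described above.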

Of course, I am only barely scratching the surface of the problem.

13.10.2009 18:11:12
Thanks, I appreciate that insight. Do you have any recommended reading on the subject?
BobTheBuilder 13.10.2009 18:17:59
A good presentation on CAP:…
jldupont 13.10.2009 18:46:26
@Jean-Lou Dupont: Thanks, I appreciate the link. I'll follow up on it when I get a moment.
BobTheBuilder 13.10.2009 18:55:34

I would check out YSlow and follow its recommendations for improving performance.

13.10.2009 17:53:14
Thanks, there are some useful tips on their site to take into account. +1
BobTheBuilder 13.10.2009 18:20:00

I think there are three big things to keep in mind here:

a) You aren't going to write the next twitter/youtube/facebook/ebay/amazon/whatever. It doesn't happen too often, so it is a big case of YAGNI.

b) If you do happen to write one of those, chances are you'll have the opportunity to rewrite the application more than a few times.

c) The only object lesson from any of the architecture types who have spoken publicly about those apps is that scaling horizontally is the way to go. Vertical scaling maxes out real, real quick.
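
To make point c) a bit more concrete, here is one small building block of horizontal scaling, sketched in Python: routing each key (say, a user id) to one of several interchangeable nodes by hashing it. The `pick_node` function and the node names are hypothetical, and this is only one of many ways to spread load.

```python
import hashlib
from typing import Sequence


def pick_node(key: str, nodes: Sequence[str]) -> str:
    """Map a key to one node; the same key always lands on the same node."""
    if not nodes:
        raise ValueError("at least one node is required")
    # Use a stable hash: Python's built-in hash() for str varies between processes.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical node names, for illustration only.
nodes = ["app-1", "app-2", "app-3"]
target = pick_node("user-42", nodes)
```

Adding capacity means adding a name to the list rather than buying a bigger box. Note that plain modulo hashing remaps most keys when the node count changes; consistent hashing is the usual way to reduce that.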

Also, I'd argue that process improvements become much bigger at these lofty scales. You will have legions of developers, strict deployment windows and lots of boxes to worry about. It had better be real scripted, automated and repeatable.

13.10.2009 18:06:14
+1 for C, as for the other two points, I'm sure I'll make enough of my own mistakes without repeating everyone else's along the way...and anyone can have an idea, so A is a bit too defeatist for me to take seriously.
BobTheBuilder 13.10.2009 18:16:45

@jldupont - Just looked at the presentation that you linked to. One thing I didn't get is why "Distributed Databases" is given as an example scenario where you lose Availability to gain Consistency and Partition tolerance. I think for distributed databases you lose Consistency.

14.11.2009 15:04:28