ApacheCon - Top 10 Scalability Mistakes

Presented by John Coggeshall, author of PHP 5 Unleashed

The “fastest” approach isn’t always the most scalable. John covers how to scale everything from your data and your code to your team. He quotes Theo Schlossnagle: “Scalability marginally impacts procedure, procedure grossly impacts scalability.”

Performance and resource scalability require forethought and process. Besides obvious things like version control, it is very helpful to set performance goals and metrics ahead of time, and to have API documentation and internal development mailing lists. One of the first things to consider is what it means for your application and business to perform: 10 / 100 / 1000 requests per second? What are your performance requirements?

(Note that he focuses primarily on PHP, but some of the tips are generic for all apps)

Some performance metrics to consider: RESPONSE TIME - I agree with him that this is one of the biggest ones. It’s what your users (and your boss) see when they use your site. Others he mentioned are resource usage (CPU / memory / etc.) and throughput (i.e. requests per second).

When it comes to scalability, you can either be reactive or proactive. Twitter is a good example of a reactive app. They have had horrible problems that have blown up on them. If you build for a niche market (John writes ERP apps for car dealerships) - you know ahead of time that your userbase is limited. Then you may not have to squeeze every ounce of performance out of every routine.

Quotable: Don’t write an application you’ll need three years from now, write an application you need today. HOWEVER - THINK about what you might need in three years.

One specific example of something you may want to do ahead of time: separate your database writes and reads in your code so that you could read from a replication server later, even if you start out with both of them hitting the same server.
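To make the read/write split concrete, here is a minimal sketch in Python (the talk is about PHP, but the idea is language-neutral). The `Database` wrapper and its method names are my own invention for illustration, not anything from John's slides, and SQLite in memory stands in for a real server.

```python
import sqlite3


class Database:
    """Hypothetical wrapper that keeps reads and writes on separate handles."""

    def __init__(self, write_conn, read_conn=None):
        self.write_conn = write_conn
        # Until a replica exists, reads simply share the primary connection.
        self.read_conn = read_conn if read_conn is not None else write_conn

    def execute_write(self, sql, params=()):
        # All INSERT / UPDATE / DELETE / DDL statements go to the primary.
        cursor = self.write_conn.execute(sql, params)
        self.write_conn.commit()
        return cursor.rowcount

    def execute_read(self, sql, params=()):
        # All SELECT statements go through the read handle.
        return self.read_conn.execute(sql, params).fetchall()


primary = sqlite3.connect(":memory:")
db = Database(primary)  # one server today
db.execute_write("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
db.execute_write("INSERT INTO articles (title) VALUES (?)", ("Scaling",))
rows = db.execute_read("SELECT title FROM articles")
```

Because callers already choose between `execute_read` and `execute_write`, pointing reads at a replica later should only mean passing a second connection to the constructor.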

John’s top ten scalability tips:

Network File Systems - Don’t use NFS to host your code base just because it’s easier to deploy the code. Why? NFS / GFS are really slow and have tons of locking issues. John recommends rsync - which I agree with if you’re deploying PHP / file-based apps, which he is primarily talking about. What about run-time updates such as accepting file uploads that need to be replicated to all web servers? First - consider - does it really need to be instantly replicated to all servers? Most likely, it could be synced. NFS may be an option for this, but not for hosting code.

I/O Buffers - I/O buffers are there for a reason - to make things faster. Sending 4098 bytes of data to the user when your system write blocks are 4096 bytes is stupid - it takes an additional block for two extra bytes.
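The arithmetic behind that example is easy to check. A small Python sketch, assuming the 4096-byte block size from the example above (the function name is mine):

```python
def blocks_needed(payload_bytes, block_size=4096):
    """Number of whole write blocks needed to send payload_bytes."""
    # Ceiling division without floating point.
    return -(-payload_bytes // block_size)


exact_fit = blocks_needed(4096)  # fits in a single block
two_over = blocks_needed(4098)   # two extra bytes spill into a second block
wasted = two_over * 4096 - 4098  # unused bytes in that second block
```

Trimming two bytes from that response, or matching the output buffer to the block size, halves the number of blocks written.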

RAM Disk - RAM disks improve performance, but are not appropriate for many things. One good use case he gives for RAM disks is sessions - if you don’t mind losing sessions in a worst-case scenario. This will improve performance, but is obviously risky if that data is critical.

Bandwidth Optimization - You can optimize bandwidth in various ways. He’s discussing PHP apps - and recommends mod_deflate or zlib.output_compression = 1.
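To get a feel for why compression pays off on typical markup, here is a rough Python sketch using the standard zlib module. The sample HTML is made up and far more repetitive than a real page, so treat the ratio as illustrative only.

```python
import zlib

# Made-up, highly repetitive markup standing in for a real page.
html = ('<li class="item">Hello, ApacheCon</li>\n' * 200).encode("utf-8")

compressed = zlib.compress(html, 6)     # level 6 is a common middle setting
ratio = len(compressed) / len(html)     # fraction of the original size
restored = zlib.decompress(compressed)  # what the browser ends up with
```

The trade is CPU time on the server for fewer bytes on the wire, which is usually worth it for text responses.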

Configuring PHP for speed - you’ll have to see the slides for the complete list of speed enhancements. Here are most of them (he went through them too fast for me to note exactly what each one does):

register_globals = off
output_buffering = 4096
session.auto_start = off
session.gc_divisor = 10000
session.use_trans_sid = off
register_argc_argv = off
auto_globals_jit = on

Blocking calls - Blocking I/O can always be a problem in an application, e.g. attempting to open a remote URL from within your PHP scripts. If the resource is locked / slow / unavailable, your script hangs while it waits for a timeout. You may as well try to scale an application that has a sleep(30) in it. At the very least, set the timeout to only two or three seconds. Solutions: don’t use blocking calls in your application. Have out-of-process scripts responsible for pulling down data, then cache that data in your database, etc. Zend has a commercial solution for PHP apps to do background processing in job queues (Zend job queues).
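Here is roughly what the out-of-process approach could look like, sketched in Python rather than PHP. The `CACHE` dictionary is a stand-in for a database table or similar store, and the function names are hypothetical. The three-second timeout follows the advice above.

```python
import time
import urllib.request

CACHE = {}  # hypothetical stand-in for a database table or cache server


def fetch_remote(url, timeout=3):
    """Fetch a remote resource, giving up after a few seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def refresh(url, fetch=fetch_remote):
    """Run this from cron or a job queue, never from a web request."""
    try:
        CACHE[url] = (time.time(), fetch(url))
        return True
    except OSError:
        # Remote is slow or down: keep serving the previous copy.
        return False


def get_cached(url, default=None):
    """What the web request calls: a local lookup that never touches the network."""
    entry = CACHE.get(url)
    return entry[1] if entry else default
```

The web request only ever calls `get_cached`, so a slow remote server costs the background job a few seconds instead of tying up a web worker.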

Caching - failing to cache, or to cache intelligently, is one of the biggest pitfalls of scalability. A lot of people don’t realize how much they can cache. Use the op-code cache in PHP - this will keep your server from recompiling your script for every request.

Semi-static caching - if your application has a lot of content that could change, so it has to be stored in the DB, but mostly never does, you can use semi-static caching. He suggests that instead of generating the HTML for the browser, you make this script generate another PHP script that contains mostly static content (i.e. the content of an article) and has minimal dynamic code (ads / username on page). Then you could add a mod_rewrite rule that redirects to your generated file. I don’t personally recommend this unless you REALLY need it. I have seen it lead to bad problems.
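For the curious, the two-stage idea can be sketched in Python with the standard string.Template class. This is my own illustration of the concept, not John's code: the page template, field names and functions are all made up, and a real PHP version would write a generated .php file to disk instead of holding a string.

```python
from string import Template

# Hypothetical page template: title and body are static, username is dynamic.
PAGE = Template("<h1>$title</h1><p>$body</p><p>Signed in as $username</p>")


def build_semi_static(article):
    """Stage 1: run once when the article is saved."""
    # Escape "$" in content so stage 2 does not treat it as a placeholder.
    title = article["title"].replace("$", "$$")
    body = article["body"].replace("$", "$$")
    # safe_substitute leaves the unknown $username placeholder untouched.
    return PAGE.safe_substitute(title=title, body=body)


def serve(cached_page, username):
    """Stage 2: run per request, filling in only the dynamic part."""
    return Template(cached_page).substitute(username=username)


cached = build_semi_static({"title": "Cost: $5", "body": "Hi"})
page = serve(cached, "ann")
```

Note how much care even this toy version needs around escaping, which hints at the kind of problems I mentioned above.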

Poor database design - Using MyISAM everywhere instead of InnoDB is a bad idea. John says NEVER put logic in your code to say “if I can’t connect to this DB, switch to this DB”. You may start writing to a second master while other threads are writing to the original. He suggests using MySQL Proxy (for PHP) instead if necessary.

Use SQLite - great for PHP if you are doing 99.999% reads. You have to understand when to use it (a write locks the whole database basically).

Knowing where not to optimize - vmstat and iostat are your friends. Use PHP profilers. Log information so that you can see where your bottlenecks are. Amdahl’s law: improving code execution time by 50% when it executes only 2% of the time only nets a 1% improvement. Making code 10% faster when it runs 40% of the time nets 4%, which is MUCH better.
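The two cases are quick to verify. A small Python sketch of the arithmetic (the function names are mine):

```python
def time_saved(fraction_of_runtime, improvement):
    """Share of total runtime saved by speeding up one part of the code."""
    return fraction_of_runtime * improvement


def overall_speedup(fraction_of_runtime, improvement):
    """Amdahl-style overall speedup factor for the whole program."""
    return 1.0 / (1.0 - fraction_of_runtime * improvement)


rare_path = time_saved(0.02, 0.50)  # 50% faster, but only 2% of runtime
hot_path = time_saved(0.40, 0.10)   # 10% faster on 40% of runtime
```

The smaller improvement on the hot path saves four times as much total time, which is why you profile before you optimize.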

Final thoughts:

Scalability is a two-way street: scale up and down.
Without process, you will fail.
You have to be able to afford to write the program.
You have to be able to afford to make it ten times larger.