There’s a special brand of terror that settles in your gut when you’re running an online promotion and your website or servers aren’t up to the task. The errors stack up, the traffic drops, the boss calls. It’s not a good feeling at all. Proper maintenance and planning usually avoid these situations, but there are times when, despite our best efforts to the contrary, stuff happens. And it doesn’t matter how hard you’ve worked; the damage is done. If it were me, I’d want to minimize the damage and fix my website ASAP.
The steps you take to fix a site while the problem is occurring are absolutely critical. We know from experience that it’s very easy to make the problem worse with the wrong moves. You’ve got to be methodical and resist the urge to try too many things at once. You need to be able to assess the impact of each configuration change you make: are things better or worse? Only by changing one thing at a time can you know for sure what’s making a difference.
Make quick backups of all your config files before you start whacking on them. It’s easy to get turned around and need to start over when things go from bad to worse.
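A minimal sketch of that backup step in Python, assuming your configs are plain files you can read. The function name `backup_configs` and the example paths are hypothetical. It copies each file into a timestamped folder and mirrors the file’s original path, so two files named `nginx.conf` in different directories don’t overwrite each other.

```python
import shutil
import time
from pathlib import Path

def backup_configs(paths, backup_root):
    """Copy each config file into a timestamped folder under backup_root.

    Each copy keeps its full original path (minus the drive/root), so files
    with the same name in different directories don't collide.
    Returns the timestamped folder that was used.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest_dir = Path(backup_root) / stamp
    for p in map(Path, paths):
        src = p.resolve()
        # e.g. /etc/nginx/nginx.conf -> <dest_dir>/etc/nginx/nginx.conf
        dest = dest_dir / src.relative_to(src.anchor)
        dest.parent.mkdir(parents=True, exist_ok=True)
        # copy2 keeps timestamps and permission bits as well as the contents
        shutil.copy2(src, dest)
    return dest_dir
```

For example, `backup_configs(["/etc/nginx/nginx.conf", "/etc/mysql/my.cnf"], "/root/config-backups")` (paths illustrative). Getting back to a known-good state is then just a matter of copying a file back out of the newest folder.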
Here’s a checklist of things to consider and try when you’re working your way out of an issue.
Check the easy stuff, like DNS
Make sure your domain and DNS are configured properly and pointing to the right place. If you’ve got a load balancer in the mix, the hostname should point to it, not to the webservers. If you’ve got multiple authoritative servers for your domain, make sure they all have the same records. A rogue DNS server handing out bad IP addresses is not uncommon, and it’s frustrating to diagnose: sometimes the site will work, sometimes it won’t. During a promo we keep our TTLs low, typically 15 minutes, so we can make quick changes if necessary.
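One way to catch a rogue server is to ask each authoritative nameserver directly, for example with `dig @ns1.example.com www.example.com A +short`, and compare the answers. The sketch below assumes you’ve already collected those answers into a dict; the function name and the hostnames are hypothetical. It flags any server whose record set differs from the most common answer.

```python
from collections import Counter

def inconsistent_servers(answers):
    """answers maps nameserver -> set of IPs it returned for one hostname.

    Returns the nameservers whose answer differs from the most common one.
    """
    if not answers:
        return []
    # frozenset makes each record set hashable and order-insensitive
    counts = Counter(frozenset(ips) for ips in answers.values())
    majority, _ = counts.most_common(1)[0]
    return sorted(ns for ns, ips in answers.items() if frozenset(ips) != majority)
```

With `{"ns1.example.com": {"203.0.113.10"}, "ns2.example.com": {"203.0.113.10"}, "ns3.example.com": {"198.51.100.7"}}`, it returns `["ns3.example.com"]`, the server to go look at.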
Is your webserver to blame?
Somewhere in the website will be a bottleneck of some sort. There may be more than one, but you’ll usually have one that’s causing more problems than others. Is it the web server or is it the database server? It’s best to move through the process just like a web request would, from the network, to the webserver, to the database server.
So if you’ve already confirmed that the DNS is shipshape and your requests are hitting your site, start with the webserver. Look at the number of processes the webserver is spawning. If you’ve got an endless list of idle webserver processes, that’s a problem, but you haven’t narrowed it down yet. It’s too late to do anything about it now, but if you’re running Apache, you should be running NGINX. The performance is night and day. Remember that for next time. If you’re running Windows in a heavy traffic situation, well, your problems started long before today. It’s not my intent to start a flamewar here, but the fact of the matter is that, in our experience, we’ve had much better luck scaling and tuning Linux systems than Windows.
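A rough way to count webserver processes on a Linux box, assuming `ps` is available. `ps -eo comm=` prints one command name per line with no header. The helper names and the default process names (`nginx`, `apache2`, `httpd`) are assumptions, so adjust them to whatever your server actually runs. Note that this only counts processes; it can’t tell idle workers from busy ones.

```python
import subprocess
from collections import Counter

def count_processes(ps_output, names=("nginx", "apache2", "httpd")):
    """Count processes per name from `ps -eo comm=` output (one name per line)."""
    counts = Counter(line.strip() for line in ps_output.splitlines())
    # Only report the names that actually appear.
    return {name: counts[name] for name in names if counts[name]}

def webserver_process_counts():
    """Run ps on the local machine and count the webserver processes."""
    out = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    ).stdout
    return count_processes(out)
```

Running `webserver_process_counts()` a few times during the slowdown and comparing the numbers gives you something more concrete than eyeballing `top`.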
Try creating a dummy page
One quick way to start narrowing things down is to create a small dummy page on the webserver. This may seem simplistic, but you’d be surprised how often it’ll shed light on the problem. At first, just make it a classic “Hello, world!” text file. Call it test.txt, put it in the root of the website, and try to hit it from a browser. If you’re running SSL, try it over HTTPS as well. If it comes back quickly, add some HTML. Nothing fancy, just enough to make sure that the server is parsing HTML properly. Then add a database connection string, then a query, an update and so on. At some point it’s gonna grind to a halt, and that will help tell you where the issue is. If the HTML takes a long time to render, then you may have problems in one or more server-side include files. There may be a bunch of logic that’s executed every time a page is loaded, and maybe it doesn’t have to run every time or for every page.

If it dies at the database connection, then your bottleneck is probably in the database. Look at the SQL statements being executed. Are they mainly SELECT statements? If those are running slow, you’ve probably got missing primary keys or indexes. Make sure that any columns used in joins are indexed or are primary keys.
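To make the dummy-page test repeatable, you can time each test URL from a script instead of a browser. This is a minimal sketch using only Python’s standard library; `time_fetch` is a hypothetical helper, and it measures the full response, including the body, as seen from the machine you run it on.

```python
import time
import urllib.request

def time_fetch(url, timeout=10.0):
    """Fetch url and return (HTTP status, seconds taken).

    Connection errors, HTTP errors and timeouts raise exceptions,
    which is useful information in itself.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # include the body transfer, not just the headers
        status = resp.status
    return status, time.perf_counter() - start
```

For example, call it on `http://www.example.com/test.txt`, then the HTTPS version, then each heavier test page as you add the HTML, the database connection and the queries (hostname and page names illustrative). The step where the time jumps is the one to dig into.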
If you’re running mainly INSERT or UPDATE statements, then too much indexing could be working against you, because every index has to be updated on each write. This is the Yin and Yang of databases: too little indexing is bad for SELECT and good for INSERT and UPDATE, and too much does the reverse.
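To check whether a join can actually use an index, ask the database for its query plan. The sketch below uses SQLite from Python’s standard library as a stand-in; MySQL and PostgreSQL have their own `EXPLAIN` with different output. The `customers` and `orders` tables and the index name are hypothetical. Comparing the plan before and after indexing the join column shows the SELECT side of the trade-off; the cost is that the new index also has to be maintained on every INSERT and UPDATE to `orders`.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")
JOIN_SQL = ("SELECT c.name, o.total FROM customers c "
            "JOIN orders o ON o.customer_id = c.id WHERE c.id = 42")

# Without an index on orders.customer_id, SQLite typically has to scan orders.
before = query_plan(conn, JOIN_SQL)

# Index the join column, then look at the plan again.
conn.execute("CREATE INDEX idx_orders_customer_id ON orders(customer_id)")
after = query_plan(conn, JOIN_SQL)

for label, plan in (("before", before), ("after", after)):
    print(label, plan)
```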
What about those APIs?
You should also consider what external services your website may be accessing. If you’ve got shipping calculations, then you’re probably reaching out to various shippers through their APIs for shipping quotes. Or, you may be reaching into your warehouse or other back-office system for inventory or other information. Make sure these calls aren’t the source of your slowdowns.
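A simple way to find out whether external calls are the slow part is to time them and log the slow ones. This is a minimal sketch; the decorator name `warn_if_slow`, the logger name, the placeholder `get_shipping_quote` and the 0.5-second threshold are all assumptions, and it doesn’t replace setting a timeout on the underlying HTTP request itself.

```python
import functools
import logging
import time

log = logging.getLogger("external_calls")

def warn_if_slow(threshold_s):
    """Decorator: log a warning when the wrapped call takes longer than threshold_s seconds."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if the call raises, so failed slow calls are logged too.
                elapsed = time.perf_counter() - start
                if elapsed > threshold_s:
                    log.warning("%s took %.2fs (threshold %.2fs)",
                                func.__name__, elapsed, threshold_s)
        return wrapper
    return decorator

@warn_if_slow(threshold_s=0.5)
def get_shipping_quote(zip_code):
    """Hypothetical placeholder for a call to a shipper's quote API."""
    raise NotImplementedError("replace with your real API call")
```

Wrapping each back-office and shipper call this way puts the slow ones in your logs by name, instead of leaving them buried in overall page load time.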
Could caching be running amok?
If your site is built on a CMS like MODX, then it probably has extensive caching built in. Make sure that each snippet or chunk that can be cached is being cached. There are also whole-page caches that can provide additional help. However, you need to make sure that your site is suitable for whole-page caching. If your page template includes any personal information about the current user, then you need to be careful about what gets cached. It’s entirely possible for users to get pages served from the cache that have other users’ information embedded in them. Not a good scenario.
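The safe pattern for whole-page caching can be sketched as: cache pages only for anonymous visitors, and skip the cache whenever the request carries a session. This is a generic in-memory illustration, not MODX’s caching API; the class name, the `session_id` argument and the 300-second TTL are all hypothetical.

```python
import time

class PageCache:
    """Minimal whole-page cache: only anonymous responses are cached, so one
    user's personalized page is never served to someone else."""

    def __init__(self, ttl_s=300):
        self.ttl_s = ttl_s
        self._store = {}  # path -> (expires_at, html)

    def get_or_render(self, path, session_id, render):
        # Logged-in (personalized) requests bypass the cache entirely.
        if session_id is not None:
            return render(path)
        now = time.monotonic()
        hit = self._store.get(path)
        if hit and hit[0] > now:
            return hit[1]
        html = render(path)
        self._store[path] = (now + self.ttl_s, html)
        return html
```

The design choice is deliberately conservative: logged-in users always get a fresh render, which costs some performance but rules out leaking one user’s details into another user’s page.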
These are just a few of the many items that you can and should check when you’re faced with a poorly performing website. Remember, it’s always best to plan ahead and perform as much stress testing as you can prior to the event. Be methodical, make good backups, and make sure you check everything, even the most obvious and simple items. A website fix is like everything else: the problem is always in the last place you look.