

Running a high load website




Posted by WebMedic, 01-08-2011, 03:47 AM
Hello, not sure if this is the right place to ask. I have a website with an average of ~4000 to ~5000 users on it all day, lots of connections and lots of MySQL queries, and at points the site is not even accessible. I am running it right now on a VPS with 4 cores and 1GB of RAM. The RAM is not an issue, as usage always sits around ~600MB and does not go over that; what I am having issues with is the CPU load caused by the MySQL queries, and the fact that the website cannot take more people without hanging and lagging really badly.

What do I do in this case? I cannot afford to throw money at it and buy huge servers, so I need a better solution. It was suggested that I move MySQL to a different server - would that suffice, and does the MySQL server need any particular specs? My next problem would then be expanding my web server (nginx) to handle a lot more users. Please give me your suggestions, with links to read more about them. Thanks for your time.

Posted by Johnny Cache, 01-08-2011, 04:35 AM
I know you mentioned that moving to a dedicated server isn't in your budget right now, but it may be your only viable option if this client of yours is utilizing so many system resources. I have a few higher-end VPS nodes with BurstNET (which can be verified by doing a traceroute on my domain) and they have fairly inexpensive Premium VPS plans if you still prefer an inexpensive VPS solution. Good luck!

Posted by squirrelhost, 01-08-2011, 05:19 AM
If you're sticking with your current kit for the moment, there are a lot of things you could do, mostly involving caching of some sort.

1. If you're using PHP, install APC opcode caching (there are others, but no matter).
2. Think about caching MySQL queries. MySQL has its own query cache, which may even be enabled, but it's pretty rubbish; the best option is memcached. This requires a little re-coding around every MySQL SELECT statement (unlike step 1, which is simply an install-and-configure thing).
3. If you're serving images, then in addition to setting far-future expires times etc., you can use a frontend like Varnish to cache and serve them. This is what it was written for.

(Some of the above were discussed by Rasmus in http://talks.php.net/show/oscon06/0 - not always the same technology, but the same principles.) There are many other things you could do, but it's probably best to go the simple caching route above first.
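To make step 2 concrete, the pattern looks roughly like this - an untested sketch, where the table name (users_online) and connection details are just made-up examples you'd adapt to your own code:

    <?php
    // Rough sketch of caching a MySQL result in memcached (step 2 above).
    // Table name and credentials are placeholders.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    $key   = 'users_online_count';
    $count = $mc->get($key);

    if ($count === false) {
        // Cache miss: run the expensive query once...
        $db  = new mysqli('localhost', 'dbuser', 'dbpass', 'mydb');
        $res = $db->query('SELECT COUNT(*) AS c FROM users_online');
        $row = $res->fetch_assoc();
        $count = $row['c'];

        // ...then keep the result for 60 seconds so the next few thousand
        // page views hit memcached instead of MySQL.
        $mc->set($key, $count, 60);
    }

    echo "Users online: {$count}\n";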

Posted by brentpresley, 01-08-2011, 09:55 AM
I would recommend that you first update MySQL to version 5.5.8 if you have not already done so. We have been testing this version and found that it significantly decreases CPU requirements. Beyond that, you should look at other areas of your setup that can be optimized (do you run PHP? Is it being run with an accelerator?). If overall optimization fails, then you have simply outgrown your current VPS and need to consider going to the next step.

Posted by WebMedic, 01-08-2011, 12:27 PM
squirrelhost gets the idea here; I am looking to optimize the setup, not buy a huge server that will eventually get an overdose of connections and die on me again. I have already installed APC. Let me tell you about my current setup:

- nginx web server (and as for images and static files, nginx is configured to cache them, so I don't think I need a frontend for that, do I?)
- php5-fpm with APC
- MySQL (just checked the version, brentpresley): mysql Ver 14.14 Distrib 5.1.41, for debian-linux-gnu (x86_64) using readline 6.1

The website is completely custom made; I am pretty sure I will need to rewrite a huge part of it for optimization purposes, but right now I need to figure out a setup that can handle my ~5000+ people on the site without the lag. What I am looking for is a solid setup that can be expanded later on, when the website turns enough profit to afford dedicated servers. Here is what I have in mind:

- Separate the MySQL database onto another server. The question here is: what kind of specs am I looking at for the MySQL server?
- Use a CDN for the static files, or at least put them all under a different subdomain so that later on I can easily point that subdomain at another server / frontend / CDN.

I was reading nrg-media . de/2010/08/learning-how-to-scale-the-hard-way/ - this seems like a really good setup, but I think it's for a much larger project, so I want a smaller, more affordable version of it. More ideas are welcome.

Posted by brentpresley, 01-08-2011, 12:39 PM
MySQL 5.1.41. I would give 5.5.8 a try; it's gone gold now, and there is a thread here with good reviews of its performance.

Posted by WebMedic, 01-08-2011, 12:42 PM
Yeah I am reading up on that right now and will do the upgrade and get back to you with the results, thanks for your recommendation!

Posted by brentpresley, 01-08-2011, 12:44 PM
Just check and make sure it works with your control panel and the rest of your setup first, if you are running a control panel, that is. We have tested it with DirectAdmin on CentOS with LiteSpeed and custom Apache without issue, but have not tried cPanel yet because there were reports around the web of some incompatibilities.

Posted by squirrelhost, 01-08-2011, 12:48 PM
Have a look at some of the optimizations in the link I posted above. There are lots of little things you can do to whittle down the times, like:

- Serve files, especially images, from partition(s) mounted noatime (why bother updating access times at all?).
- Use require instead of require_once or include_once, which lead to stat operations looking up the file path for similarly-named files.
- Remove small PHP include files - put the content in whatever is including them.
- You can cache images with nginx - they introduced that a few years back, and the performance is probably not too bad.
- Remember to set expires times on image files as high as you can get away with (10 years?).
- Cache CSS files for just as long; if you make any changes, use versioning, i.e. update to style2.css, then style3.css, and it won't matter if they're cached forever.

I'd still look at memcached - it can make a huge difference for MySQL queries.
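For the expires/versioning items, something along these lines in the nginx vhost is usually enough (an illustrative snippet only - adjust the extensions and lifetimes to your own layout):

    # Far-future expires for static files; safe to cache "forever" as long as
    # you version the filenames (style2.css, style3.css, ...) when they change.
    location ~* \.(jpg|jpeg|gif|png|ico|css|js)$ {
        expires     max;            # roughly a 10-year lifetime
        add_header  Cache-Control public;
        access_log  off;            # skip logging for static hits
    }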

Posted by WebMedic, 01-08-2011, 01:06 PM
Can you send me a link to a good step-by-step guide for the MySQL upgrade on Ubuntu? I am going to do this on the live server and don't want to mess it up. Serving static files from noatime partitions sounds like a great idea - can I do that without issues on an OpenVZ VPS, by creating my own partition? I am already caching images and all static files in nginx. Memcached will definitely be used; the code needs a lot of rewriting, so I am leaving that for later to do all the rewrites at once. Thanks

Posted by brentpresley, 01-08-2011, 01:20 PM
Take this with a HUGE grain of salt, because we run only CentOS, but here is what I found: http://www.ovaistariq.net/490/a-step...-to-mysql-5-5/

Posted by squirrelhost, 01-08-2011, 01:38 PM
I only use FreeBSD (BSD variants). I just edit /etc/fstab and reboot to change any flags like noatime etc... Presumably CentOS and Linux are mostly the same - the fstab format was more or less cut and pasted from Unix, maybe just living in a different location.
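On Linux the same thing is one /etc/fstab entry plus a remount; the device and mount point below are just examples:

    # /etc/fstab (example only - your device and mount point will differ)
    /dev/sdb1   /var/www/static   ext3   defaults,noatime,nodiratime   0   2

Then remount that filesystem (or reboot, as above) and access times stop being written.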

Posted by squirrelhost, 01-08-2011, 02:17 PM
As for MySQL:

- If you're deleting/adding records all the time, you really need to optimize the tables frequently.
- If you're searching the DB, make sure you're correctly indexing the columns on the right of the WHERE in each statement.
- Go one step further and use Sphinx for searching, then just use the DB for retrieval (using the IDs returned by the Sphinx search).
- If you're not into replication, why have log-bin=mysql-bin in your my.cnf? Just comment it out and forget the logs.
- Use my-huge.cnf as the basis for your my.cnf (or the large version at least).

There are lots of good articles out there on using EXPLAIN when indexing, to see how useful an index actually is. Also, don't bother indexing complete fields if they're large; just use something like index(`verylongfield`(16)) and so on...
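To make the indexing bits concrete (the table and column names here are hypothetical):

    -- Check whether MySQL can use an index for a query:
    EXPLAIN SELECT id, title FROM articles WHERE author = 'bob';

    -- If the "key" column in the EXPLAIN output is NULL, index the WHERE column:
    ALTER TABLE articles ADD INDEX idx_author (author);

    -- For long text columns, index only a prefix to keep the index small:
    ALTER TABLE articles ADD INDEX idx_title (title(16));

    -- And if rows are added/deleted constantly, defragment periodically:
    OPTIMIZE TABLE articles;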

Posted by tim2718281, 01-08-2011, 02:36 PM
The first thing to do is analyze the web server logs to see how many HTTP requests the server is getting, and whether that number can be reduced - typically by exploiting browser and ISP caching with far-future expires headers. CSS, JS and image files (JPG, PNG, etc.) are all candidates for caching. Exploiting caching will not only reduce the demands on the server, it will also speed up response times.

Posted by WebMedic, 01-08-2011, 07:10 PM
brent: I will be installing that soon and will get back to you about it. @squirrelhost: Thanks a lot, I'm looking deeper into my MySQL queries now. @tim: I am already caching the images and CSS files, and any other static files for that matter; nginx is configured nicely for me. I think my main problem is the huge number of guests going around and using the website = high load on MySQL = high CPU load = can't take more users. What I have been doing today: I researched memcached and started rewriting my code to use it. The code is huge, so I will be doing more optimization tomorrow and hopefully finish off all the major queries and get them memcached. Hopefully that will help ease the load on MySQL; I will come back here to post results. Thanks everyone for your input!

Posted by mugo, 01-09-2011, 04:07 AM
You can use free load-balancing software to spread your site across more than one VPS. You can set up MySQL multi-master or master-slave replication, or even do periodic dumps if the databases update infrequently. I have one client set up with 12 VPSs and synced DBs; I don't pay anywhere near the cost of a dedi, and he hasn't had a second of downtime in two years. (And he gets slammed - he even gave his site out during an interview on CNN, and it barely made a noticeable impact.) If a VPS works for you now, it's easier to deploy more of them than to try for a dedi. I personally don't migrate sites to a dedi until there are either two of them, or they are backed up by a few VPSs. (My hosting niche is HA hosting, so I never put *anything* on just one server.)
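For reference, the MySQL side of a basic master-slave pair is only a few lines of my.cnf plus one statement on the slave - a sketch only, with placeholder server IDs, host and credentials:

    # master my.cnf (sketch)
    [mysqld]
    server-id = 1
    log-bin   = mysql-bin

    # slave my.cnf (sketch)
    [mysqld]
    server-id = 2
    relay-log = mysql-relay-bin

    -- then on the slave, using the file/position reported by
    -- SHOW MASTER STATUS on the master:
    CHANGE MASTER TO
        MASTER_HOST='192.0.2.10',
        MASTER_USER='repl',
        MASTER_PASSWORD='secret',
        MASTER_LOG_FILE='mysql-bin.000001',
        MASTER_LOG_POS=4;
    START SLAVE;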

Posted by WebMedic, 01-09-2011, 04:25 AM
mugo, I am very interested in your setup - can you please share more information? What load-balancing software are you using? How are you connecting all 12 VPSs, and how does it come out cheaper than a dedicated server? I would highly appreciate any details you can share so I can start on a similar, easily expandable setup. Thanks!

Posted by mugo, 01-09-2011, 04:43 AM
I do custom setups, so I use everything from on-site high-end dedis with XOSoft to an array of VPSs behind a load balancer. I normally use HAProxy, as it's tried, true, and very mature software. There are a few others I use, but I would highly suggest it. Since I'm an admin by trade, I don't need managed hosting; I have a handful of companies with good prices on unmanaged VPSs, instant setup, etc. I also do either heartbeat-based failover or DNS failover with many of the custom setups. What you choose really depends on what you serve, how many users, and what uptime you are shooting for.

I think the most complex setup I have for one customer is multiple load balancers behind multiple cascading failovers; it would take around 15 simultaneous outages to bring him down. But that company specifically wanted to stay up should a nuclear threat or attack happen in the US or abroad, so it was set up accordingly. As a rough idea of a "usual" deployment: 2 load balancers in front of 4 geographically dispersed web servers, with DNS failover. That can be set up for 50-100/mo, depending on variables (space, RAM, location), but it gives you a general idea.

A dedi does have its place, but I'd generally take 10 distributed VPSs over 2 dedis any day. If you have intense apps and processing that are not alleviated by distributing bandwidth and web/internet services, then it's not a good fit. But for a LAMP site whose only gravity is bandwidth/hits, it works like a charm. Oh, and you don't have to worry as much about backups, as long as you keep snapshots, since you have data distributed all over kingdom come. hehe.
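For a rough idea of what the HAProxy part looks like, here is a bare-bones config for two web backends behind one balancer (a sketch only; the IP addresses are placeholders):

    # Minimal haproxy.cfg sketch: round-robin HTTP across two backend VPSs
    global
        maxconn 4096

    defaults
        mode    http
        timeout connect 5s
        timeout client  30s
        timeout server  30s

    frontend www
        bind *:80
        default_backend web_pool

    backend web_pool
        balance roundrobin
        option  httpchk GET /          # simple health check
        server  web1 10.0.0.11:80 check
        server  web2 10.0.0.12:80 check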

Posted by squirrelhost, 01-09-2011, 08:54 AM
It doesn't really matter how many boxes (or VPSs) you have when it comes to real performance - otherwise Brad Fitzpatrick at LiveJournal would never have developed memcached. The total time taken to handle one complete request can still be slow. Even Gmail takes many seconds to load here for me sometimes, and that's with maybe 50,000 boxes running it. It's a separate issue: request performance, load balancing, redundancy and high availability are all separate concerns.

Posted by mugo, 01-09-2011, 02:37 PM
I know what you are saying, but I'm not sure what the relevance is or what idea you are trying to convey. I don't think the OP is worried so much about the performance of one individual hit. Spreading those hits over several affordable VPSs would do him well and save the expense of a big dedicated server - which I would never put a real site on without a second one or some redundancy. I have a customer with 12 VPSs, and yes, it matters very much if I take 10 of those down. As long as the individual boxes are quick enough for their share of transactions, plus a couple of failures' worth of extra load, you're fine. Load balancing is a way to achieve HA. Like peanut butter and jelly.

Posted by UNIXy, 01-09-2011, 02:58 PM
How are you sharing and replicating your storage across all these nodes? Regards Joe / UNIXY

Posted by mugo, 01-09-2011, 03:26 PM
Understand this is just one instance (customer); I use just about every method that works, for various reasons: DRBD and other block-level replication techniques, shared iSCSI, SAN - but the one in question uses key-based rsync. Files only change every few days, so it's not as dynamic as most. Some DBs are master-slaved, some are just mysqldumped to other hosts upon change (direct to host, no export-import). I take each client's needs and do whatever fits their serving and usage patterns. Four different load-balancing apps, HA software... it just depends on what the customer needs, wants, and can afford. UNIXy, I think I've even taken some of your discussions and ideas and put them into my practice over the years! hehe Last edited by mugo; 01-09-2011 at 03:33 PM.

Posted by squirrelhost, 01-09-2011, 09:38 PM
It's trivially simple. You can load balance whatever you like, but with clunky, not very snappy MySQL, probably unoptimized queries, and bog-standard PHP code at the last step, it won't gain you all that much. The thread isn't about load balancing to avoid doing something intelligent - but I don't hold anything against people who want to do so.

Posted by mugo, 01-10-2011, 01:36 AM
LB would help alleviate quite a lot and introduce redundancy on a large scale, so writing it off as just silly compared to caching is... well... silly. In my experience, the code that is in place for whatever instance is usually the best you have to work with - it is at the edge of understanding of whoever wrote it or put it there to begin with. Telling someone to migrate databases because MySQL sucks, and to "go optimize queries" and the sub-standard PHP code, usually does little to help in any situation: the admin/webmaster either has the best in place they understand, or hasn't attained the skill level to optimize it any further, if at all. I've never been a fan of the "go optimize and you'll be alright" stock answer, and for obvious reasons - it rarely helps anyone at the end of the day. And reading between the lines, if someone is running a site on a VPS, I doubt they have the cash to pay a developer to analyze all their code. I think the OP said he's done some optimization and caching, and it looks as if you may just be assuming his setup, coding, and queries are a pile of rubbish and he should "man up" and fix it. Did you and load balancing have an argument, and it left in a huff, slamming the door behind it, or what? Migrating all the data over to something other than MySQL probably isn't a workable solution in this case. Assuming he has clunky code, uses bad queries, and made bad choices in software probably isn't helping, at the end of the day.

Posted by squirrelhost, 01-10-2011, 01:47 AM
You miss the point yet again - if SQL queries are slow, at least caching them is a good idea, and opcode caching for PHP scripts will always produce benefits, regardless of how good or bad the code is. If you'd read the link I posted above, you could have got an idea of the many simple optimisations that can bring major benefits. Having spent 10 years working for Yahoo, I think I've more experience of load balancing, redundancy and high availability than most here - but nevertheless, in this case the question is how much you can get out of a pint pot, and the answer can be many gallons indeed with the many simple caching tools available.

Posted by mugo, 01-10-2011, 01:55 AM
And I've been building LB'd / HA sites for over 10 years... so we can flex our techie muscles!! What if I work for Google? Oh... man... bad blood there... I got the point. But this isn't helping the OP *at all*; he has the info, and I'd say a good combination of both would do him well. Off topic - has anyone noticed that Google has a nice clean interface, while Yahoo just bombards the user with so much info it's hard to decide what to click on? Silly websites.

Posted by UNIXy, 01-11-2011, 01:07 AM
Real-time DB replication across high-latency links is painful, to say the least. Based on experience, I've only seen it done in the finance industry, where they use EMC SRDF and Oracle on dedicated high-speed leased lines. Needless to say, it costs quite a bit! Regards Joe / UNIXY

Posted by plumsauce, 01-11-2011, 01:53 AM
And therein lies the rub. It is almost a given that the site in its current condition is the best that someone can do, or can afford to pay for. There is always someone else who can do better, but it comes at a cost. And the fix is never very fast, because it can involve a *lot* of code review by someone who has never seen the code before.

Posted by plumsauce, 01-11-2011, 02:01 AM
And yet you miss the most obvious immediate improvement - going to real hardware for about the same cost. Getting closer to the metal has never hurt anyone. *First*, fix the foundation, *then* apply the tweaks. Otherwise you have a wobbly mess. A highly optimised wobbly mess, but nonetheless, a wobbly mess.

Posted by plumsauce, 01-11-2011, 02:03 AM
But, when the cost of not doing it outweighs the cost of doing it ... Of course, there are custom solutions out there. But, they are equally expensive in terms of initial development prices.

Posted by brentpresley, 01-11-2011, 09:17 AM
Hey OP, Can you give us a run-down of what you have done and if/what difference it has made?

Posted by rhythmic, 01-11-2011, 09:58 AM
There are a lot of great suggestions here, particularly with respect to query/page caching and browser cache control. However, there are probably lots of quick-fix areas that can make a big impact. The OP indicated the need to stop the bleeding for current users and a lack of money, so breathing room may be what he requires right now until he can implement some of the better suggestions in this thread.

Usually when I look at a site that is performing poorly (assuming it is the first time it has been looked at for optimization), it isn't the pages that 90-100% of users are hitting that are causing the load. Those pages tend to be mostly static with PK or indexed queries. They seem like the logical choice for the bottleneck, but often there are less frequently accessed pages that contribute dramatically more load. Things like searches, reports, administrative queries, etc. can create orders of magnitude more load than a typical page load. In those situations, you might be able to apply a quick band-aid or two while you investigate caching options. I'm not saying that this is happening on this site, as there's not nearly enough information shared to know that. But the OP should take the time to figure that out first, before implementing any of the suggestions people have made.

Turn on slow query logging and leave it on. Turn on all query logging for as long as you can - at least a few hours - and look at what is accounting for MySQL's time. At peak periods of sluggishness, do a SHOW PROCESSLIST, and do the same at peak periods of good performance; compare the difference. Also be mindful of your CPU. If you are experiencing very high IO wait (depending on what type of VPS you have, the IO wait reading your OS reports back to you may lie a bit), you may be taxing your disk with queries that are "larger" than they need to be, particularly unindexed queries that need to table scan. However, if IO wait is low but user CPU is high, perhaps you're spending a lot of time sorting result sets, executing abusive PHP code, etc.

Once you get a feel for what is causing the load, use things like EXPLAIN, code profiling, and tweaking DB/PHP/Apache settings. Again, this is to buy breathing room. When growth pushes you back into trouble again, you'll already have done the easy fixes.
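For the logging part, roughly this (MySQL 5.1-style settings; the log path and thresholds are just examples):

    # my.cnf - log anything slower than 2 seconds, plus unindexed queries
    [mysqld]
    slow_query_log                = 1
    slow_query_log_file           = /var/log/mysql/slow.log
    long_query_time               = 2
    log_queries_not_using_indexes = 1

    -- then, at a peak period, from the mysql client:
    SHOW FULL PROCESSLIST;
    SHOW GLOBAL STATUS LIKE 'Threads_%';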

Posted by raffo, 01-12-2011, 08:08 AM
Of course, update your software and use memory caching and file caching. But also consider limiting resources per client, for example a maximum of 5 simultaneous connections per IP. I suggest you use a reverse proxy like nginx and treat it as a kind of firewall. Configure a cache, gzip compression, keep-alive and connection limits, and you will see your total load drop well below normal. With this proxy you don't block any IP, you just limit the connections. It's much more efficient than Apache's mod_qos, but you need to know a little more about what you're doing.
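The per-IP limit is just a couple of nginx directives; an illustrative snippet follows (the zone name, limits and backend address are placeholders, and older nginx releases spell the first directive limit_zone instead of limit_conn_zone):

    http {
        # Track connections per client IP
        limit_conn_zone $binary_remote_addr zone=perip:10m;

        gzip              on;
        gzip_types        text/css application/x-javascript text/plain;
        keepalive_timeout 15;

        server {
            listen 80;
            location / {
                limit_conn perip 5;                  # max 5 simultaneous connections per IP
                proxy_pass http://127.0.0.1:8080;    # placeholder backend (Apache, etc.)
            }
        }
    }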

Posted by mugo, 01-13-2011, 01:36 AM
I have quite a few sites I use master-master with, including one particularly busy site split between two VPSs, and it works well. It's a basic WordPress install with a custom theme - a company blog that's updated maybe every few days, but gets heavy traffic. I use it for redundancy / auto-failover, with a script that watches replication and notifies me if either side balks. But over high latency, or with something that writes sessions to a DB while both servers are active - yeah, I wouldn't run anything "high load" with that design. A lot of the VPS providers are making private backend IPs available if you happen to be in the same DC, which works pretty well. But for the level of HA I sell, that doesn't do me too much good, since geographic and vendor diversity is part of the overall "no single point of failure" method. Real-time master-master is working very well for a heavily customized Drupal instance that runs my company's main website, but we have both behind ZXTM load balancers with a 1G fiber link between the geographical locations. That does work quite well. Do you use DRBD much, Joe?

Posted by WebMedic, 01-15-2011, 09:22 PM
Sorry for the late response, I've been really busy. I would like to thank everyone for their input again - good stuff here. What I have done so far:

- Optimized the nginx and php5-fpm settings a bit more (at first they were too small, then I made them too big for the VPS to handle... I think I have a good setting going on now).
- Working slowly on optimizing my code. As I said before, it's custom code and I cannot really install or ask for plugins and addons to fix anything, so I'm working on it slowly; I was not using any template system, for example, so that's changing now to Smarty.
- Part of my optimization was installing memcached and using it on a lot of queries that were repeating on every request where they didn't really need to.

I have a lot of other work right now, so the website is on hold until I get more free time to expand it. Thanks!

Posted by serveradmin4linux, 01-15-2011, 10:23 PM
In order to reduce the load, you can install nginx as a reverse proxy in front of Apache; I installed it and it reduced the load on the server.


