Rik van Riel wrote:
> Lorenzo Allegrucci wrote:
>
>> Hi lkml,
>>
>> according to the test below (sysbench) Linux seems to have scalability
>> problems beyond 8 client threads:
>> http://jeffr-tech.livejournal.com/6268.html#cutid1
>> http://jeffr-tech.livejournal.com/5705.html
>> Hardware is an 8-core amd64 system and jeffr seems willing to try more
>> Linux versions on that machine.
>> Anyway, is there anyone who can reproduce this?
>
>
> I have reproduced it on a quad core test system.
>
> With 4 threads (on 4 cores) I get a high throughput, with
> approximately 58% user time and 42% system time.
>
> With 8 threads (on 4 cores) I get way lower throughput,
> with 37% user time, 29% system time 35% idle time!
>
> The maximum time taken per query also increases from
> 0.0096s to 0.5273s. Ouch!
>
> I don't know if this is MySQL, glibc or Linux kernel,
> but something strange is going on...

Like you, I'm also seeing idle time start going up as threads increase.

I initially thought this was a problem with the multiprocessor
scheduler, because the pattern is exactly like some artifact in the
load balancing. However, after looking at the stats and testing a
couple of things, I think it may not be after all.

I've reproduced this on an 8-socket/16-way dual core Opteron. So far
what I am seeing is that MySQL is having trouble putting enough load
into the scheduler.

Virtually all of the sleep time is coming from unix_stream_recvmsg,
which seems to be what the clients and server threads use to
communicate. There doesn't seem to be any other tell-tale event that
the database is blocking on.

It seems like it might at least partially be a problem with MySQL
thread/connection management. I found a couple of interesting issues
so far.

Firstly, the MySQL version that I'm using (5.0.26-Max) is making lots
of calls to sched_setscheduler, attempting to fiddle with SCHED_OTHER
priority in what looks like an attempt to boost CPU time while holding
some resource.
All these calls actually fail, because you cannot change SCHED_OTHER
priority like that. Adding a hack to make it fall through to
set_user_nice provides a boost which eliminates the cliff (but a
downward degradation is still there).

Secondly, I've raised the thread numbers from 16 to 32 for my system,
which also provides a bit more (although it doesn't help the downward
slope).

Combined, it looks like around 30-40% improvement past 16 threads. It
isn't anything like making up for the dropoff seen in the blog link,
but different systems, different MySQL version... I wonder how close we
are with this hack in place?

Attached is a graph of my numbers, from 1 to 32 clients. plain =
2.6.20.1, sched is with the attached sched patch, and thread is with
32 rather than 16 clients.

Anyway, I'll keep experimenting. If anyone from MySQL wants to help
look at this, send me a mail (e.g. especially with the
sched_setscheduler issue, you might be able to do something better).

Nick

-- 
SUSE Labs, Novell Inc.
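
For anyone who wants to see the sched_setscheduler behaviour described
above from userspace, here is a minimal sketch. It is neither the
attached patch nor MySQL's code; the helper names try_set_other_priority
and be_nicer are made up for illustration, and it assumes Linux with
Python 3. It relies on the documented rule that SCHED_OTHER only accepts
a static priority of 0, so any other value is rejected with EINVAL.

```python
import errno
import os


def try_set_other_priority(priority):
    """Ask for SCHED_OTHER with the given static priority on this process.

    Returns 0 on success, or the errno value if the kernel refuses.
    (Hypothetical helper, for illustration only.)
    """
    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(priority))
    except OSError as exc:
        return exc.errno
    return 0


def be_nicer(increment):
    """Adjust the nice value, which is what SCHED_OTHER actually honours.

    Returns the new nice value. Positive increments need no privilege;
    negative ones normally need CAP_SYS_NICE or a suitable RLIMIT_NICE.
    (Hypothetical helper, for illustration only.)
    """
    return os.nice(increment)


if __name__ == "__main__":
    err = try_set_other_priority(5)
    print("SCHED_OTHER with priority 5:", errno.errorcode.get(err, err))
    print("nice value now:", be_nicer(0))
```

An application that wants a temporary boost under SCHED_OTHER would have
to go through the nice value (nice/setpriority) instead, which is roughly
what the set_user_nice hack does on the kernel side.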