From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Rik van Riel <riel@redhat.com>
Cc: Lorenzo Allegrucci <l_allegrucci@yahoo.it>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Suparna Bhattacharya <suparna@in.ibm.com>,
	Jens Axboe <jens.axboe@oracle.com>
Subject: Re: SMP performance degradation with sysbench
Date: Tue, 27 Feb 2007 00:36:04 +1100
Message-ID: <45E2E244.8040009@yahoo.com.au>
In-Reply-To: <45E21FEC.9060605@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3000 bytes --]

Rik van Riel wrote:
> Lorenzo Allegrucci wrote:
> 
>> Hi lkml,
>>
>> according to the test below (sysbench) Linux seems to have scalability
>> problems beyond 8 client threads:
>> http://jeffr-tech.livejournal.com/6268.html#cutid1
>> http://jeffr-tech.livejournal.com/5705.html
>> Hardware is an 8-core amd64 system and jeffr seems willing to try more
>> Linux versions on that machine.
>> Anyway, is there anyone who can reproduce this?
> 
> 
> I have reproduced it on a quad core test system.
> 
> With 4 threads (on 4 cores) I get a high throughput, with
> approximately 58% user time and 42% system time.
> 
> With 8 threads (on 4 cores) I get way lower throughput,
> with 37% user time, 29% system time and 35% idle time!
> 
> The maximum time taken per query also increases from
> 0.0096s to 0.5273s. Ouch!
> 
> I don't know if this is MySQL, glibc or Linux kernel,
> but something strange is going on...

Like you, I'm also seeing idle time start going up as threads increase.

I initially thought this was a problem with the multiprocessor scheduler,
because the pattern looks exactly like a load-balancing artifact.

However, after looking at the stats, and testing a couple of things, I
think it may not be after all.

I've reproduced this on an 8-socket/16-way dual core Opteron. So far what
I am seeing is that MySQL is having trouble putting enough load into the
scheduler.

Virtually all of the sleep time is coming from unix_stream_recvmsg, which
seems to be how the client and server threads communicate.
There doesn't seem to be any other tell-tale event that the database is
blocking on.

It seems like it might at least partially be a problem with MySQL
thread/connection management.

I found a couple of interesting issues so far. Firstly, the MySQL version
that I'm using (5.0.26-Max) is making lots of calls to sched_setscheduler
attempting to fiddle with SCHED_OTHER priority in what looks like an
attempt to boost CPU time while holding some resource. All these calls
actually fail, because you cannot change SCHED_OTHER priority like that.
Adding a hack to make it fall through to set_user_nice provides a boost
which eliminates the cliff (but a downward degradation is still there).
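
To make the failure mode concrete, here is a minimal user-space sketch of
the kind of call described above. It is not taken from MySQL; the helper
name is hypothetical, and it only assumes a Linux system with glibc.

```c
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper (not from MySQL or the kernel): request SCHED_OTHER
 * with the given static priority for the calling thread.
 * Returns 0 on success, or the errno value reported by the call.
 */
static int try_sched_other_priority(int prio)
{
	struct sched_param param = { .sched_priority = prio };

	/* pid 0 means "the calling thread" */
	if (sched_setscheduler(0, SCHED_OTHER, &param) == -1)
		return errno;
	return 0;
}
```

On an unpatched kernel, try_sched_other_priority(8) should come back with
EINVAL, because sched_setscheduler(2) requires sched_priority to be 0 for
SCHED_OTHER.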

Secondly, I've raised the thread numbers from 16 to 32 for my system,
which also provides a bit more throughput (although it doesn't help the
downward slope).

Combined, it looks like around 30-40% improvement past 16 threads. It
isn't anything like making up for the dropoff seen in the blog link, but
different systems, different MySQL version... I wonder how close we are
with this hack in place?

Attached is a graph of my numbers, from 1 to 32 clients. plain = 2.6.20.1,
sched is with the attached sched patch, and thread is with 32 rather than
16 clients.

Anyway, I'll keep experimenting. If anyone from MySQL wants to help look
at this, send me a mail (especially about the sched_setscheduler issue,
where you might be able to do something better).

Nick

-- 
SUSE Labs, Novell Inc.

[-- Attachment #2: graph.png --]
[-- Type: image/png, Size: 6969 bytes --]

[-- Attachment #3: mysql-hack.patch --]
[-- Type: text/plain, Size: 766 bytes --]

--- kernel/sched.c.orig	2007-02-26 11:46:46.849841000 +0100
+++ kernel/sched.c	2007-02-26 12:04:09.283056000 +0100
@@ -4227,8 +4227,6 @@ recheck:
 	    (p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
 	    (!p->mm && param->sched_priority > MAX_RT_PRIO-1))
 		return -EINVAL;
-	if (is_rt_policy(policy) != (param->sched_priority != 0))
-		return -EINVAL;
 
 	/*
 	 * Allow unprivileged RT tasks to decrease priority:
@@ -4302,6 +4300,13 @@ recheck:
 
 	rt_mutex_adjust_pi(p);
 
+	if (!is_rt_policy(policy)) {
+		if (param->sched_priority == 8)
+			set_user_nice(p, -20);
+		else
+			set_user_nice(p, param->sched_priority - 6);
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sched_setscheduler);
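
For readers who want to see which nice value a given request ends up with
under the hack above, the mapping can be restated in user space roughly as
follows. The function name is hypothetical; only the two branches are
taken from the patch.

```c
/*
 * User-space restatement of the mapping used by the hack; the function
 * name is hypothetical. A requested priority of 8 maps to nice -20,
 * any other value is shifted down by 6.
 */
static int hack_priority_to_nice(int sched_priority)
{
	if (sched_priority == 8)
		return -20;
	return sched_priority - 6;
}
```

So a request for priority 6 lands on nice 0, priority 10 on nice 4, and
priority 8 is special-cased to nice -20.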
