linux-kernel.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Vinod, Chegu" <chegu_vinod@hp.com>,
	"Low, Jason" <jason.low2@hp.com>,
	linux-tip-commits@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	aquini@redhat.com, Michel Lespinasse <walken@google.com>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [tip:core/locking] x86/smp: Move waiting on contended ticket lock out of line
Date: Thu, 28 Feb 2013 10:22:23 -0800	[thread overview]
Message-ID: <CA+55aFw3t+sC-zv6njxdJ6hB0arJnZzoGNO9W7o8h=tcUo5P_Q@mail.gmail.com> (raw)
In-Reply-To: <512F7429.4020103@redhat.com>

On Thu, Feb 28, 2013 at 7:13 AM, Rik van Riel <riel@redhat.com> wrote:
>
> Btw, the IPC lock is already fairly fine grained. One ipc
> lock is allocated for each set of semaphores allocated through
> sys_semget. Looking up those semaphores in the namespace, when
> they are used later, is done under RCU.

Bullshit. There is no fine-grained locking, and the code has never
ever been optimized for concurrency at all.

Yes, the lookup is done with RCU, and that's good. And I can tell you
*why* it's done with RCU:  all the trivial sysv semaphore benchmarks
I've seen tend to have just one lock per semget and then they test the
performance of that semaphore.

But I'm suspecting that since there is clearly contention going on
(and hopefully the real world people aren't just all pounding on one
single mutex), maybe these loads actually allocate multiple semaphores
with semget (that 'nsems' argument). And then we scale really badly,
because we use a single spinlock for all of them, even if we didn't
have to.

THERE IS ABSOLUTELY ZERO SCALING. None. Nada. The scaling that exists
(your RCU lookup example) is for *unrelated* locks, which is exactly
what you'd expect if all the simple benchmarks around did a single
sem-lock and then the "scaling" was tested by running a thousand
copies of that program - all using their own private single ipc
semaphore.

Plus the code in ipc/sem.c really is not very good. It does that
"sem_lock()" (which gets the ipc spinlock) way too eagerly, because
the semaphore lock also protects some trivial ipc stuff, so it gets
the lock *first*, then it does all the random checks it wants to do
(security checks, you name it), and then it does all the queue undo
crap etc. All while holding that lock.

So no. There's no scaling, and the SINGLE lock that is held is held
way too f*cking long.

And that's what I'm talking about. We've never had people actually
bother about the IPC locking, because we've never had people who
cared. Everybody hates the crazy SysV IPC code, often for good reason.
People are told not to use it (use shared memory and futexes instead,
which actually *do* scale, and people *do* care about). But of course,
lots of projects do use the horrid SysV IPC code because it's
portable, and it does have those undo features etc.

And I'm sure we *could* fix it, but judging by past performance nobody
really cares enough. Which is why I do think that fixing it the wrong
way (by making the contention on the ipc_lock() less costly) may well
be the right thing here for once.

Of course, if the real-world applications really only use a single
SysV semaphore, then we can't scale any better. But even then we could
still try to lower the hold-times of that ipc_lock().

And hold-time really tends to be one big killer for contention. A lot
of hardware tends to have ping-pong avoidance logic, which means that
if you unlock quickly after getting the lock, the cacheline will still
remain in the local node caches. Hardware features like that make it
much harder to trigger the really bad contention cases even if there
are tons of CPU's trying to access the same lock. But if the lock hold
times are long, it's *much* easier to get bad contention.

Looking at the code, if we could just get the security hook outside of
the locked region, that would probably be a big improvement. Maybe do
that part in the RCU lookup part? Because selinux really is very
expensive, with the whole ipc_has_perm -> avc_has_perm_flags thing etc
etc. This is exactly the kind of thing we did with pathnames.

I'm sure there are other things we could do to improve ipc lock times
even if we don't actually split the lock, but the security one might
be a good first step.

                      Linus


Thread overview: 54+ messages
2013-02-06 20:03 [PATCH -v5 0/5] x86,smp: make ticket spinlock proportional backoff w/ auto tuning Rik van Riel
2013-02-06 20:04 ` [PATCH -v5 1/5] x86,smp: move waiting on contended ticket lock out of line Rik van Riel
2013-02-13 12:06   ` [tip:core/locking] x86/smp: Move " tip-bot for Rik van Riel
2013-02-13 16:20     ` Linus Torvalds
2013-02-13 18:30       ` Linus Torvalds
2013-02-14  0:54         ` H. Peter Anvin
2013-02-14  1:31           ` Linus Torvalds
2013-02-14  1:56             ` H. Peter Anvin
2013-02-14 10:50         ` Ingo Molnar
2013-02-14 16:10           ` Linus Torvalds
2013-02-15 15:57             ` Ingo Molnar
2013-02-15  6:48         ` Benjamin Herrenschmidt
2013-02-13 19:08       ` Rik van Riel
2013-02-13 19:36         ` Linus Torvalds
2013-02-13 22:21           ` Rik van Riel
2013-02-13 22:40             ` Linus Torvalds
2013-02-13 23:41               ` Rik van Riel
2013-02-14  1:21                 ` Linus Torvalds
2013-02-14  1:46                   ` Linus Torvalds
2013-02-14 10:43                   ` Ingo Molnar
2013-02-27 16:42                   ` Rik van Riel
2013-02-27 17:10                     ` Linus Torvalds
2013-02-27 19:53                       ` Rik van Riel
2013-02-27 20:18                         ` Linus Torvalds
2013-02-27 21:55                           ` Rik van Riel
     [not found]                             ` <CA+55aFwa0EjGG2NUDYVLVBmXJa2k81YiuNO2yggk=GLRQxhhUQ@mail.gmail.com>
2013-02-28  2:58                               ` Rik van Riel
2013-02-28  3:19                                 ` Linus Torvalds
2013-02-28  4:06                                 ` Davidlohr Bueso
2013-02-28  4:49                                   ` Linus Torvalds
2013-02-28 15:13                                     ` Rik van Riel
2013-02-28 18:22                                       ` Linus Torvalds [this message]
2013-02-28 20:26                                         ` Linus Torvalds
2013-02-28 21:14                                           ` Rik van Riel
2013-02-28 21:58                                             ` Linus Torvalds
2013-02-28 22:38                                               ` Rik van Riel
2013-02-28 23:09                                                 ` Linus Torvalds
2013-03-01  6:42                                                   ` Rik van Riel
2013-03-01 18:18                                                     ` Davidlohr Bueso
2013-03-01 18:50                                                       ` Rik van Riel
2013-03-01 18:52                                                       ` Linus Torvalds
2013-02-06 20:04 ` [PATCH -v5 2/5] x86,smp: proportional backoff for ticket spinlocks Rik van Riel
2013-02-13 12:07   ` [tip:core/locking] x86/smp: Implement " tip-bot for Rik van Riel
2013-02-06 20:05 ` [PATCH -v5 3/5] x86,smp: auto tune spinlock backoff delay factor Rik van Riel
2013-02-13 12:08   ` [tip:core/locking] x86/smp: Auto " tip-bot for Rik van Riel
2013-02-06 20:06 ` [PATCH -v5 4/5] x86,smp: keep spinlock delay values per hashed spinlock address Rik van Riel
2013-02-13 12:09   ` [tip:core/locking] x86/smp: Keep " tip-bot for Eric Dumazet
2013-02-06 20:07 ` [PATCH -v5 5/5] x86,smp: limit spinlock delay on virtual machines Rik van Riel
2013-02-07 11:11   ` Ingo Molnar
2013-02-07 21:24     ` [PATCH fix " Rik van Riel
2013-02-13 12:10       ` [tip:core/locking] x86/smp: Limit " tip-bot for Rik van Riel
2013-02-07 11:25   ` [PATCH -v5 5/5] x86,smp: limit " Stefano Stabellini
2013-02-07 11:59     ` Raghavendra K T
2013-02-07 13:28     ` Rik van Riel
2013-02-06 20:08 ` [PATCH -v5 6/5] x86,smp: add debugging code to track spinlock delay value Rik van Riel
