From: Chris Mason <chris.mason@oracle.com>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>,
zach.brown@oracle.com, jens.axboe@oracle.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] ipc semaphores: reduce ipc_lock contention in semtimedop
Date: Wed, 14 Apr 2010 13:33:19 -0400
Message-ID: <20100414173319.GA3228@think>
In-Reply-To: <4BC5EA75.9090803@colorfullife.com>
On Wed, Apr 14, 2010 at 06:16:53PM +0200, Manfred Spraul wrote:
> On 04/13/2010 08:19 PM, Chris Mason wrote:
> >On Wed, Apr 14, 2010 at 04:09:45AM +1000, Nick Piggin wrote:
> >>On Tue, Apr 13, 2010 at 01:39:41PM -0400, Chris Mason wrote:
> >>The other thing I don't know if your patch gets right is requeueing
> >>of the operations. When you requeue from one list to another, then you
> >>seem to lose ordering with other pending operations, so that would
> >>seem to break the API as well (can't remember if the API strictly
> >>mandates FIFO, but anyway it can open up starvation cases).
> >I don't see anything in the docs about the FIFO order. I could add an
> >extra sort on sequence number pretty easily, but is the starvation case
> >really that bad?
> >
> How do you want to determine the sequence number?
> Is atomic_inc_return() on a per-semaphore array counter sufficiently fast?
I haven't tried yet, but hopefully it won't be a problem. A later patch
does atomics on the reference count and it doesn't show up in the
profiles.
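
A minimal userspace sketch of the sequence-number idea, assuming C11 atomics in place of the kernel's atomic_inc_return(). The struct and function names (seq_array_sketch, pending_op_sketch, pending_op_stamp, pending_op_before) are hypothetical and not taken from the patch; the only point is that one atomic increment per queued operation gives a total order that a later sort can use to restore FIFO behaviour.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-semaphore-array counter. */
struct seq_array_sketch {
    atomic_uint_fast64_t next_seq;
};

/* Hypothetical stand-in for one queued semaphore operation. */
struct pending_op_sketch {
    uint64_t seq;
};

/*
 * Stamp an operation when it is queued. Like atomic_inc_return(), this
 * yields the value after the increment, so the first stamp is 1.
 */
static void pending_op_stamp(struct seq_array_sketch *sma,
                             struct pending_op_sketch *op)
{
    assert(sma != NULL && op != NULL);
    op->seq = atomic_fetch_add_explicit(&sma->next_seq, 1,
                                        memory_order_relaxed) + 1;
}

/* Nonzero if a was queued before b; usable as a sort predicate. */
static int pending_op_before(const struct pending_op_sketch *a,
                             const struct pending_op_sketch *b)
{
    return a->seq < b->seq;
}
```

Relaxed ordering is enough here only because the counter is used for ordering alone and does not publish any other data; that is an assumption of the sketch.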
>
> >>I was looking at doing a sequence number to be able to sort these, but
> >>it ended up getting over complex (and SAP was only using simple ops so
> >>it didn't seem to need much better).
> >>
> >>We want to be careful not to change semantics at all. And it gets
> >>tricky quickly :( What about Zach's simpler wakeup API?
> >Yeah, that's why my patches include code to handle userland sending
> >duplicate semids. Zach's simpler API is cooking too, but if I can get
> >this done without insane complexity it helps with more than just the
> >post/wait oracle workload.
> >
> What is the oracle workload, which multi-sembuf operations does it use?
> How many semaphores are in one array?
>
> When the last optimizations were written, I've searched a bit:
> - postgres uses per-process semaphores, with small semaphore arrays.
> [process sleeps on its own semaphore and is woken up by someone
> else when it can make progress]
This is similar to Oracle (and the sembench program). Each process has
a semaphore and when it is waiting for a commit it goes to sleep on it.
They are woken up in bulk with semtimedop calls from a single process.
But Oracle also uses semaphores for locking in a traditional sense.
Putting the waiters into a per-semaphore list is really only part of the
speedup. The real boost comes from the patch that breaks up the single
array lock into per-semaphore locks.
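
A rough userspace sketch of the lock split, assuming a C11 atomic_flag spinlock per semaphore instead of one lock for the whole array. All names here (split_sem, split_sem_array, split_sem_post and so on) are hypothetical and the real patch differs; the sketch only shows that a single-semaphore operation touches one lock, so operations on different semaphores in the same array do not contend.

```c
#include <assert.h>
#include <stdatomic.h>

#define SPLIT_NSEMS 4

/* Hypothetical per-semaphore state with its own lock. */
struct split_sem {
    atomic_flag lock;   /* per-semaphore spinlock */
    int value;          /* current semaphore value */
};

struct split_sem_array {
    struct split_sem sems[SPLIT_NSEMS];
};

static void split_sem_array_init(struct split_sem_array *a)
{
    for (int i = 0; i < SPLIT_NSEMS; i++) {
        atomic_flag_clear(&a->sems[i].lock);
        a->sems[i].value = 0;
    }
}

static void split_sem_lock(struct split_sem *s)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;
}

static void split_sem_unlock(struct split_sem *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Add n to one semaphore, taking only that semaphore's lock. */
static void split_sem_post(struct split_sem_array *a, int idx, int n)
{
    assert(idx >= 0 && idx < SPLIT_NSEMS);
    split_sem_lock(&a->sems[idx]);
    a->sems[idx].value += n;
    split_sem_unlock(&a->sems[idx]);
}
```

A multi-sembuf operation would have to take several of these locks in a fixed order to avoid deadlock; the sketch leaves that out.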
We gain another 10-15% from a later patch that uses atomics on the
refcount, which lets us do sem_putref without a lock (meaning we're
lockless once we get woken up).
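
To illustrate the lockless put, here is a small sketch assuming C11 atomics rather than the kernel's atomic_t helpers. The names (refobj_sketch, refobj_get, refobj_put) are hypothetical and the freed flag merely stands in for releasing the object; the idea is that the caller that drops the count from 1 to 0 is the only one that tears the object down, so no lock is needed on the put path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct refobj_sketch {
    atomic_int refcount;
    bool freed;         /* illustration only: stands in for freeing */
};

static void refobj_get(struct refobj_sketch *o)
{
    atomic_fetch_add_explicit(&o->refcount, 1, memory_order_relaxed);
}

/* Drop one reference; returns true if this call dropped the last one. */
static bool refobj_put(struct refobj_sketch *o)
{
    int old = atomic_fetch_sub_explicit(&o->refcount, 1,
                                        memory_order_acq_rel);
    assert(old > 0);    /* a put without a matching get is a bug */
    if (old == 1) {
        o->freed = true;
        return true;
    }
    return false;
}
```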
I'm cleaning up fixes based on suggestions here and will repost.
> - with google, I couldn't find anything relevant that uses
> multi-sembuf semop() calls.
>
I think this should help any workload that has more than one semaphore
per array, even if they only do one sem per call.
> And I agree with Nick: We should be careful about changing the API.
Definitely, thanks for reading through it.
-chris