linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chris Mason <chris.mason@oracle.com>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>,
	zach.brown@oracle.com, jens.axboe@oracle.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] ipc semaphores: reduce ipc_lock contention in semtimedop
Date: Wed, 14 Apr 2010 15:50:03 -0400	[thread overview]
Message-ID: <20100414195003.GE3228@think> (raw)
In-Reply-To: <4BC61370.7020700@colorfullife.com>

On Wed, Apr 14, 2010 at 09:11:44PM +0200, Manfred Spraul wrote:
> On 04/14/2010 07:33 PM, Chris Mason wrote:
> >On Wed, Apr 14, 2010 at 06:16:53PM +0200, Manfred Spraul wrote:
> >>On 04/13/2010 08:19 PM, Chris Mason wrote:
> >>>On Wed, Apr 14, 2010 at 04:09:45AM +1000, Nick Piggin wrote:
> >>>>On Tue, Apr 13, 2010 at 01:39:41PM -0400, Chris Mason wrote:
> >>>>The other thing I don't know if your patch gets right is requeueing on
> >>>>of the operations. When you requeue from one list to another, then you
> >>>>seem to lose ordering with other pending operations, so that would
> >>>>seem to break the API as well (can't remember if the API strictly
> >>>>mandates FIFO, but anyway it can open up starvation cases).
> >>>I don't see anything in the docs about the FIFO order.  I could add an
> >>>extra sort on sequence number pretty easily, but is the starvation case
> >>>really that bad?
> >>>
> >>How do you want to determine the sequence number?
> >>Is atomic_inc_return() on a per-semaphore array counter sufficiently fast?
> >I haven't tried yet, but hopefully it won't be a problem.  A later patch
> >does atomics on the reference count and it doesn't show up in the
> >profiles.
> >
> >>>>I was looking at doing a sequence number to be able to sort these, but
> >>>>it ended up getting over complex (and SAP was only using simple ops so
> >>>>it didn't seem to need much better).
> >>>>
> >>>>We want to be careful not to change semantics at all. And it gets
> >>>>tricky quickly :( What about Zach's simpler wakeup API?
> >>>Yeah, that's why my patches include code to handle userland sending
> >>>duplicate semids.  Zach's simpler API is cooking too, but if I can get
> >>>this done without insane complexity it helps with more than just the
> >>>post/wait oracle workload.
> >>>
> >>What is the oracle workload, which multi-sembuf operations does it use?
> >>How many semaphores are in one array?
> >>
> >>When the last optimizations were written, I've searched a bit:
> >>- postgres uses per-process semaphores, with small semaphore arrays.
> >>     [process sleeps on it's own semaphore and is woken up by someone
> >>else when it can make progress]
> >This is similar to Oracle (and the sembench program).  Each process has
> >a semaphore and when it is waiting for a commit it goes to sleep on it.
> >They are woken up in bulk with semtimedop calls from a single process.
> >
> Hmm. Thus you have:
> - single sembuf decrease operations that are waiting frequently.
> - multi-sembuf  increase operations.
> 
> What about optimizing for that case?
> Increase operations succeed immediately. Thus complex_count is 0.

I've been wondering about that.  I can optimize the patch to special
case the increase operations.  The only problem I saw was checking or
the range overflow.  Current behavior will abort the whole set if the
range overflow happens.

> 
> If we have performed an update operation, then we can scan all
> simple_lists that have seen an increase instead of checking the
> global list - as long as there are no complex operations waiting.
> Right now, we give up if the update operation was a complex
> operation - but that does not matter.
> All that matters are the sleeping operations, not the operation that
> did the wakeup.
> I've attached an untested idea.

Zach Brown's original patch set tried just the list magic and not
the spinlocks.  I'm afraid it didn't help very much over all.

> 
> >But oracle also uses semaphores for locking in a traditional sense.
> >
> >Putting the waiters into a per-semaphore list is really only part of the
> >speedup.  The real boost comes from the patch to break up the locks into
> >a per semaphore lock.
> >
> Ok. Then simple tricks won't help.
> How many semaphores are in one array?

On a big system I saw about 4000 semaphores total.  The database will
just allocate as many as it can into a single array and keep creating
arrays until it has all it needs.

-chris


  reply	other threads:[~2010-04-14 19:51 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-04-12 18:49 [PATCH RFC] Optimize semtimedop Chris Mason
2010-04-12 18:49 ` [PATCH 1/2] ipc semaphores: reduce ipc_lock contention in semtimedop Chris Mason
2010-04-13 17:15   ` Manfred Spraul
2010-04-13 17:39     ` Chris Mason
2010-04-13 18:09       ` Nick Piggin
2010-04-13 18:19         ` Chris Mason
2010-04-13 18:57           ` Nick Piggin
2010-04-13 19:01             ` Chris Mason
2010-04-13 19:25               ` Nick Piggin
2010-04-13 19:38                 ` Chris Mason
2010-04-13 20:05                   ` Nick Piggin
2010-05-16 16:57             ` Manfred Spraul
2010-05-16 22:40               ` Chris Mason
2010-05-17  7:20               ` Nick Piggin
2010-04-14 16:16           ` Manfred Spraul
2010-04-14 17:33             ` Chris Mason
2010-04-14 19:11               ` Manfred Spraul
2010-04-14 19:50                 ` Chris Mason [this message]
2010-04-15 16:33                   ` Manfred Spraul
2010-04-15 16:34                     ` Chris Mason
2010-04-13 18:24         ` Zach Brown
2010-04-16 11:26   ` Manfred Spraul
2010-04-16 11:45     ` Chris Mason
2010-04-12 18:49 ` [PATCH 2/2] ipc semaphores: order wakeups based on waiter CPU Chris Mason
2010-04-17 10:24   ` Manfred Spraul

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100414195003.GE3228@think \
    --to=chris.mason@oracle.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=manfred@colorfullife.com \
    --cc=npiggin@suse.de \
    --cc=zach.brown@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).