On 04/14/2010 07:33 PM, Chris Mason wrote:
> On Wed, Apr 14, 2010 at 06:16:53PM +0200, Manfred Spraul wrote:
>    
>> On 04/13/2010 08:19 PM, Chris Mason wrote:
>>      
>>> On Wed, Apr 14, 2010 at 04:09:45AM +1000, Nick Piggin wrote:
>>>        
>>>> On Tue, Apr 13, 2010 at 01:39:41PM -0400, Chris Mason wrote:
>>>> The other thing I don't know if your patch gets right is requeueing on
>>>> of the operations. When you requeue from one list to another, then you
>>>> seem to lose ordering with other pending operations, so that would
>>>> seem to break the API as well (can't remember if the API strictly
>>>> mandates FIFO, but anyway it can open up starvation cases).
>>>>          
>>> I don't see anything in the docs about the FIFO order.  I could add an
>>> extra sort on sequence number pretty easily, but is the starvation case
>>> really that bad?
>>>
>>>        
>> How do you want to determine the sequence number?
>> Is atomic_inc_return() on a per-semaphore array counter sufficiently fast?
>>      
> I haven't tried yet, but hopefully it won't be a problem.  A later patch
> does atomics on the reference count and it doesn't show up in the
> profiles.
>
>    
>>      
>>>> I was looking at doing a sequence number to be able to sort these, but
>>>> it ended up getting over complex (and SAP was only using simple ops so
>>>> it didn't seem to need much better).
>>>>
>>>> We want to be careful not to change semantics at all. And it gets
>>>> tricky quickly :( What about Zach's simpler wakeup API?
>>>>          
>>> Yeah, that's why my patches include code to handle userland sending
>>> duplicate semids.  Zach's simpler API is cooking too, but if I can get
>>> this done without insane complexity it helps with more than just the
>>> post/wait oracle workload.
>>>
>>>        
>> What is the oracle workload, which multi-sembuf operations does it use?
>> How many semaphores are in one array?
>>
>> When the last optimizations were written, I've searched a bit:
>> - postgres uses per-process semaphores, with small semaphore arrays.
>>      [process sleeps on it's own semaphore and is woken up by someone
>> else when it can make progress]
>>      
> This is similar to Oracle (and the sembench program).  Each process has
> a semaphore and when it is waiting for a commit it goes to sleep on it.
> They are woken up in bulk with semtimedop calls from a single process.
>
>    
Hmm. Thus you have:
- single sembuf decrease operations that are waiting frequently.
- multi-sembuf  increase operations.

What about optimizing for that case?
Increase operations succeed immediately. Thus complex_count is 0.

If we have performed an update operation, then we can scan all 
simple_lists that have seen an increase instead of checking the global 
list - as long as there are no complex operations waiting.
Right now, we give up if the update operation was a complex operation - 
but that does not matter.
All that matters are the sleeping operations, not the operation that did 
the wakeup.
I've attached an untested idea.

> But oracle also uses semaphores for locking in a traditional sense.
>
> Putting the waiters into a per-semaphore list is really only part of the
> speedup.  The real boost comes from the patch to break up the locks into
> a per semaphore lock.
>
>    
Ok. Then simple tricks won't help.
How many semaphores are in one array?

--
     Manfred