linux-kernel.vger.kernel.org archive mirror
* e100 oops on resume
@ 2006-01-24 22:59 Stefan Seyfried
  2006-01-24 23:21 ` Mattia Dongili
  0 siblings, 1 reply; 85+ messages in thread
From: Stefan Seyfried @ 2006-01-24 22:59 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: netdev

Hi,
Since 2.6.16-rc1-git3, e100 dies on resume (regardless of whether resuming
from disk, from RAM, or via runtime power management). Unfortunately I only
have a bad photo of the oops right now; it is available from
https://bugzilla.novell.com/attachment.cgi?id=64761&action=view
I have reproduced this on a second e100 machine and can get a serial
console log from this machine tomorrow if needed.
It did resume fine with 2.6.15-git12.
-- 
Stefan Seyfried                  \ "I didn't want to write for pay. I
QA / R&D Team Mobile Devices      \ wanted to be paid for what I write."
SUSE LINUX Products GmbH, Nürnberg \                    -- Leonard Cohen

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: e100 oops on resume
  2006-01-24 22:59 e100 oops on resume Stefan Seyfried
@ 2006-01-24 23:21 ` Mattia Dongili
  2006-01-25  9:02   ` Olaf Kirch
  0 siblings, 1 reply; 85+ messages in thread
From: Mattia Dongili @ 2006-01-24 23:21 UTC (permalink / raw)
  To: Stefan Seyfried; +Cc: Linux Kernel Mailing List, netdev

On Tue, Jan 24, 2006 at 11:59:19PM +0100, Stefan Seyfried wrote:
> Hi,
> Since 2.6.16-rc1-git3, e100 dies on resume (regardless of whether resuming
> from disk, from RAM, or via runtime power management). Unfortunately I only
> have a bad photo of the oops right now; it is available from
> https://bugzilla.novell.com/attachment.cgi?id=64761&action=view
> I have reproduced this on a second e100 machine and can get a serial
> console log from this machine tomorrow if needed.
> It did resume fine with 2.6.15-git12.

I experienced the same today; I was planning to get a photo tomorrow :)
I'm running 2.6.16-rc1-mm2, and the last working kernel was 2.6.15-mm4
(I didn't try 2.6.16-rc1-mm1, being scared of the reiserfs breakage).

-- 
mattia
:wq!


* Re: e100 oops on resume
  2006-01-24 23:21 ` Mattia Dongili
@ 2006-01-25  9:02   ` Olaf Kirch
  2006-01-25 12:11     ` Olaf Kirch
  0 siblings, 1 reply; 85+ messages in thread
From: Olaf Kirch @ 2006-01-25  9:02 UTC (permalink / raw)
  To: Stefan Seyfried, Linux Kernel Mailing List, netdev

[-- Attachment #1: Type: text/plain, Size: 1478 bytes --]

On Wed, Jan 25, 2006 at 12:21:42AM +0100, Mattia Dongili wrote:
> I experienced the same today; I was planning to get a photo tomorrow :)
> I'm running 2.6.16-rc1-mm2, and the last working kernel was 2.6.15-mm4
> (I didn't try 2.6.16-rc1-mm1, being scared of the reiserfs breakage).

I think that's because the latest driver version wants to wait for
the ucode download, and calls e100_exec_cb_wait before allocating any
control blocks.

static inline int e100_exec_cb_wait(struct nic *nic, struct sk_buff *skb,
	void (*cb_prepare)(struct nic *, struct cb *, struct sk_buff *))
{
	int err = 0, counter = 50;
	struct cb *cb = nic->cb_to_clean;

	if ((err = e100_exec_cb(nic, NULL, e100_setup_ucode)))
		DPRINTK(PROBE, ERR, "ucode cmd failed with error %d\n", err);
	/* NOTE: the oops shows that e100_exec_cb fails with ENOMEM,
	 * which also means there are no cbs */

	/* ... other stuff ...
	 * and then we die here because cb is NULL: */
	while (!(cb->status & cpu_to_le16(cb_complete))) {
		msleep(10);
		if (!--counter) break;
	}

I'm not sure what the right fix would be. e100_resume would probably
have to call e100_alloc_cbs early on, while e100_up should avoid
calling it a second time if nic->cbs_avail != 0. A tentative patch
for testing is attached.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax

[-- Attachment #2: e100-resume-fix --]
[-- Type: text/plain, Size: 1830 bytes --]

[PATCH] e100: allocate cbs early on when resuming

Signed-off-by: Olaf Kirch <okir@suse.de>

 drivers/net/e100.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

Index: build/drivers/net/e100.c
===================================================================
--- build.orig/drivers/net/e100.c
+++ build/drivers/net/e100.c
@@ -1298,8 +1298,10 @@ static inline int e100_exec_cb_wait(stru
 	int err = 0, counter = 50;
 	struct cb *cb = nic->cb_to_clean;
 
-	if ((err = e100_exec_cb(nic, NULL, e100_setup_ucode)))
+	if ((err = e100_exec_cb(nic, NULL, e100_setup_ucode))) {
 		DPRINTK(PROBE,ERR, "ucode cmd failed with error %d\n", err);
+		return err;
+	}
 
 	/* must restart cuc */
 	nic->cuc_cmd = cuc_start;
@@ -1721,9 +1723,11 @@ static int e100_alloc_cbs(struct nic *ni
 	struct cb *cb;
 	unsigned int i, count = nic->params.cbs.count;
 
+	/* bail out if we've been here before */
+	if (nic->cbs_avail)
+		return 0;
+
 	nic->cuc_cmd = cuc_start;
-	nic->cb_to_use = nic->cb_to_send = nic->cb_to_clean = NULL;
-	nic->cbs_avail = 0;
 
 	nic->cbs = pci_alloc_consistent(nic->pdev,
 		sizeof(struct cb) * count, &nic->cbs_dma_addr);
@@ -2578,6 +2582,8 @@ static int __devinit e100_probe(struct p
 	nic->pdev = pdev;
 	nic->msg_enable = (1 << debug) - 1;
 	pci_set_drvdata(pdev, netdev);
+	nic->cb_to_use = nic->cb_to_send = nic->cb_to_clean = NULL;
+	nic->cbs_avail = 0;
 
 	if((err = pci_enable_device(pdev))) {
 		DPRINTK(PROBE, ERR, "Cannot enable PCI device, aborting.\n");
@@ -2752,6 +2758,8 @@ static int e100_resume(struct pci_dev *p
 	retval = pci_enable_wake(pdev, 0, 0);
 	if (retval)
 		DPRINTK(PROBE,ERR, "Error clearing wake events\n");
+	if ((retval = e100_alloc_cbs(nic)))
+		DPRINTK(PROBE,ERR, "No memory for cbs\n");
 	if(e100_hw_init(nic))
 		DPRINTK(HW, ERR, "e100_hw_init failed\n");
 


* Re: e100 oops on resume
  2006-01-25  9:02   ` Olaf Kirch
@ 2006-01-25 12:11     ` Olaf Kirch
  2006-01-25 13:51       ` sched_yield() makes OpenLDAP slow Howard Chu
  2006-01-25 19:37       ` e100 oops on resume Jesse Brandeburg
  0 siblings, 2 replies; 85+ messages in thread
From: Olaf Kirch @ 2006-01-25 12:11 UTC (permalink / raw)
  To: Stefan Seyfried, Linux Kernel Mailing List, netdev

On Wed, Jan 25, 2006 at 10:02:40AM +0100, Olaf Kirch wrote:
> I'm not sure what the right fix would be. e100_resume would probably
> have to call e100_alloc_cbs early on, while e100_up should avoid
> calling it a second time if nic->cbs_avail != 0. A tentative patch
> for testing is attached.

Reportedly, the patch fixes the crash on resume.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


* Re: sched_yield() makes OpenLDAP slow
  2006-01-25 12:11     ` Olaf Kirch
@ 2006-01-25 13:51       ` Howard Chu
  2006-01-25 14:38         ` Robert Hancock
                           ` (2 more replies)
  2006-01-25 19:37       ` e100 oops on resume Jesse Brandeburg
  1 sibling, 3 replies; 85+ messages in thread
From: Howard Chu @ 2006-01-25 13:51 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: hancockr


Robert Hancock wrote:
> Howard Chu wrote:
> > POSIX requires a reschedule to occur, as noted here:
> > http://blog.firetree.net/2005/06/22/thread-yield-after-mutex-unlock/
>
> No, it doesn't:
>
> > The relevant SUSv3 text is here
> > http://www.opengroup.org/onlinepubs/000095399/functions/pthread_mutex_unlock.html
>
> "If there are threads blocked on the mutex object referenced by mutex
> when pthread_mutex_unlock() is called, resulting in the mutex becoming
> available, the scheduling policy shall determine which thread shall
> acquire the mutex."
>
> This says nothing about requiring a reschedule. The "scheduling policy"
> can well decide that the thread which just released the mutex can
> re-acquire it.

No, because the thread that just released the mutex is obviously not one
of the threads blocked on the mutex. When a mutex is unlocked, one of
the *waiting* threads at the time of the unlock must acquire it, and the
scheduling policy can determine which one. But the thread that released
the mutex is not one of the waiting threads, and is not eligible for
consideration.

> > I suppose if pthread_mutex_unlock() actually behaved correctly we could
> > remove the other sched_yield() hacks that didn't belong there in the
> > first place and go on our merry way.
>
> Generally, needing to implement hacks like this is a sign that there are
> problems with the synchronization design of the code (like a mutex which
> has excessive contention). Programs should not rely on the scheduling
> behavior of the kernel for proper operation when that behavior is not
> defined.
>
> --
> Robert Hancock      Saskatoon, SK, Canada
> To email, remove "nospam" from hancockr@nospamshaw.ca
> Home Page: http://www.roberthancock.com/

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/



* Re: sched_yield() makes OpenLDAP slow
  2006-01-25 13:51       ` sched_yield() makes OpenLDAP slow Howard Chu
@ 2006-01-25 14:38         ` Robert Hancock
  2006-01-25 17:49         ` Christopher Friesen
  2006-01-26  1:07         ` sched_yield() makes OpenLDAP slow David Schwartz
  2 siblings, 0 replies; 85+ messages in thread
From: Robert Hancock @ 2006-01-25 14:38 UTC (permalink / raw)
  To: Howard Chu; +Cc: Linux Kernel Mailing List

Howard Chu wrote:
> No, because the thread that just released the mutex is obviously not one
> of the threads blocked on the mutex. When a mutex is unlocked, one of
> the *waiting* threads at the time of the unlock must acquire it, and the
> scheduling policy can determine which one. But the thread that released
> the mutex is not one of the waiting threads, and is not eligible for
> consideration.

That statement does not imply that any reschedule needs to happen at the 
time of the mutex unlock at all, only that the other threads waiting on 
the mutex can attempt to reacquire it when the scheduler allows them to. 
In all likelihood, what tends to happen is that either the thread that
had the mutex previously still has time left in its timeslice and is 
allowed to keep running and reacquire the mutex, or another thread is 
woken up (perhaps on another CPU) but doesn't reacquire the mutex before 
the original thread carries on and acquires it, and therefore goes back 
to sleep.

Forcing the mutex to ping-pong between different threads would be quite 
inefficient (especially on SMP machines), and is not something that 
POSIX requires.

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/


* Re: sched_yield() makes OpenLDAP slow
  2006-01-25 13:51       ` sched_yield() makes OpenLDAP slow Howard Chu
  2006-01-25 14:38         ` Robert Hancock
@ 2006-01-25 17:49         ` Christopher Friesen
  2006-01-25 18:26           ` pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow) Howard Chu
  2006-01-26  1:07         ` sched_yield() makes OpenLDAP slow David Schwartz
  2 siblings, 1 reply; 85+ messages in thread
From: Christopher Friesen @ 2006-01-25 17:49 UTC (permalink / raw)
  To: Howard Chu; +Cc: Linux Kernel Mailing List, hancockr

Howard Chu wrote:
> 
> Robert Hancock wrote:

>  > This says nothing about requiring a reschedule. The "scheduling policy"
>  > can well decide that the thread which just released the mutex can
>  > re-acquire it.
> 
> No, because the thread that just released the mutex is obviously not one
> of the threads blocked on the mutex. When a mutex is unlocked, one of
> the *waiting* threads at the time of the unlock must acquire it, and the
> scheduling policy can determine which one. But the thread that released
> the mutex is not one of the waiting threads, and is not eligible for
> consideration.

Is it *required* that the new owner of the mutex is determined at the 
time of mutex release?

If the kernel doesn't actually determine the new owner of the mutex 
until the currently running thread swaps out, it would be possible for 
the currently running thread to re-acquire the mutex.

Chris


* pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-25 17:49         ` Christopher Friesen
@ 2006-01-25 18:26           ` Howard Chu
  2006-01-25 18:59             ` Nick Piggin
                               ` (2 more replies)
  0 siblings, 3 replies; 85+ messages in thread
From: Howard Chu @ 2006-01-25 18:26 UTC (permalink / raw)
  To: Christopher Friesen; +Cc: Linux Kernel Mailing List, hancockr

Christopher Friesen wrote:
> Howard Chu wrote:
>>
>> Robert Hancock wrote:
>
>>  > This says nothing about requiring a reschedule. The "scheduling 
>> policy"
>>  > can well decide that the thread which just released the mutex can
>>  > re-acquire it.
>>
>> No, because the thread that just released the mutex is obviously not
>> one of the threads blocked on the mutex. When a mutex is unlocked,
>> one of the *waiting* threads at the time of the unlock must acquire
>> it, and the scheduling policy can determine which one. But the thread
>> that released the mutex is not one of the waiting threads, and is not
>> eligible for consideration.
>
> Is it *required* that the new owner of the mutex is determined at the 
> time of mutex release?
>
> If the kernel doesn't actually determine the new owner of the mutex 
> until the currently running thread swaps out, it would be possible for 
> the currently running thread to re-acquire the mutex.

The SUSv3 text seems pretty clear. It says "WHEN pthread_mutex_unlock() 
is called, ... the scheduling policy SHALL decide ..." It doesn't say 
MAY, and it doesn't say "some undefined time after the call." There is 
nothing optional or implementation-defined here. The only thing that is 
not explicitly stated is what happens when there are no waiting threads; 
in that case obviously the running thread can continue running.

re: forcing the mutex to ping-pong between different threads - if that 
is inefficient, then the thread scheduler needs to be tuned differently. 
Threads and thread context switches are supposed to be cheap, otherwise 
you might as well just program with fork() instead. (And of course, back 
when Unix was first developed, *processes* were lightweight, compared to 
other extant OSs.)

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/



* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-25 18:26           ` pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow) Howard Chu
@ 2006-01-25 18:59             ` Nick Piggin
  2006-01-25 19:32               ` Howard Chu
  2006-01-25 21:06             ` Lee Revell
  2006-01-26  0:08             ` Robert Hancock
  2 siblings, 1 reply; 85+ messages in thread
From: Nick Piggin @ 2006-01-25 18:59 UTC (permalink / raw)
  To: Howard Chu; +Cc: Christopher Friesen, Linux Kernel Mailing List, hancockr

Howard Chu wrote:
> Christopher Friesen wrote:
> 
>> Howard Chu wrote:
>>
>>>
>>> Robert Hancock wrote:
>>
>>
>>>  > This says nothing about requiring a reschedule. The "scheduling 
>>> policy"
>>>  > can well decide that the thread which just released the mutex can
>>>  > re-acquire it.
>>>
>>> No, because the thread that just released the mutex is obviously not
>>> one of the threads blocked on the mutex. When a mutex is unlocked,
>>> one of the *waiting* threads at the time of the unlock must acquire
>>> it, and the scheduling policy can determine which one. But the thread
>>> that released the mutex is not one of the waiting threads, and is not
>>> eligible for consideration.
>>
>>
>> Is it *required* that the new owner of the mutex is determined at the 
>> time of mutex release?
>>
>> If the kernel doesn't actually determine the new owner of the mutex 
>> until the currently running thread swaps out, it would be possible for 
>> the currently running thread to re-acquire the mutex.
> 
> 
> The SUSv3 text seems pretty clear. It says "WHEN pthread_mutex_unlock() 
> is called, ... the scheduling policy SHALL decide ..." It doesn't say 
> MAY, and it doesn't say "some undefined time after the call." There is 
> nothing optional or implementation-defined here. The only thing that is 
> not explicitly stated is what happens when there are no waiting threads; 
> in that case obviously the running thread can continue running.
> 

But it doesn't say the unlocking thread must yield to the new mutex
owner, only that the scheduling policy shall determine which thread
acquires the lock.

It doesn't say that decision must be made immediately, either (e.g.
it could be made as a by-product of which contender is chosen to run
next).

I think the intention of the wording is that for deterministic policies,
it is clear that the waiting threads are actually woken and reevaluated
for scheduling. In the case of SCHED_OTHER, it means basically nothing,
considering the scheduling policy is arbitrary.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 


* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-25 18:59             ` Nick Piggin
@ 2006-01-25 19:32               ` Howard Chu
  2006-01-26  8:51                 ` Nick Piggin
  2006-01-26 10:38                 ` Nikita Danilov
  0 siblings, 2 replies; 85+ messages in thread
From: Howard Chu @ 2006-01-25 19:32 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Christopher Friesen, Linux Kernel Mailing List, hancockr

Nick Piggin wrote:
> Howard Chu wrote:
>> The SUSv3 text seems pretty clear. It says "WHEN 
>> pthread_mutex_unlock() is called, ... the scheduling policy SHALL 
>> decide ..." It doesn't say MAY, and it doesn't say "some undefined 
>> time after the call." There is nothing optional or 
>> implementation-defined here. The only thing that is not explicitly 
>> stated is what happens when there are no waiting threads; in that 
>> case obviously the running thread can continue running.
>>
>
> But it doesn't say the unlocking thread must yield to the new mutex
> owner, only that the scheduling policy shall determine which thread
> acquires the lock.

True, the unlocking thread doesn't have to yield to the new mutex owner 
as a direct consequence of the unlock. But logically, if the unlocking 
thread subsequently calls mutex_lock, it must block, because some other 
thread has already been assigned ownership of the mutex.

> It doesn't say that decision must be made immediately, either (e.g.
> it could be made as a by-product of which contender is chosen to run
> next).

A straightforward reading of the language here says the decision happens 
"when pthread_mutex_unlock() is called" and not at any later time. There 
is nothing here to support your interpretation.
>
> I think the intention of the wording is that for deterministic policies,
> it is clear that the waiting threads are actually woken and reevaluated
> for scheduling. In the case of SCHED_OTHER, it means basically nothing,
> considering the scheduling policy is arbitrary.
>
Clearly the point is that one of the waiting threads is woken and gets 
the mutex, and it doesn't matter which thread is chosen. I.e., whatever 
thread the scheduling policy chooses. The fact that SCHED_OTHER can 
choose arbitrarily is immaterial; it still can only choose one of the 
waiting threads.

The fact that SCHED_OTHER's scheduling behavior is undefined is not free 
license to implement whatever you want. Scheduling policies are an 
optional feature; the basic thread behavior must still be consistent 
even on systems that don't implement scheduling policies.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/



* Re: e100 oops on resume
  2006-01-25 12:11     ` Olaf Kirch
  2006-01-25 13:51       ` sched_yield() makes OpenLDAP slow Howard Chu
@ 2006-01-25 19:37       ` Jesse Brandeburg
  2006-01-25 20:14         ` Olaf Kirch
  2006-01-26  0:28         ` Jesse Brandeburg
  1 sibling, 2 replies; 85+ messages in thread
From: Jesse Brandeburg @ 2006-01-25 19:37 UTC (permalink / raw)
  To: Olaf Kirch; +Cc: Stefan Seyfried, Linux Kernel Mailing List, netdev

[-- Attachment #1: Type: text/plain, Size: 966 bytes --]

On 1/25/06, Olaf Kirch <okir@suse.de> wrote:
> On Wed, Jan 25, 2006 at 10:02:40AM +0100, Olaf Kirch wrote:
> > I'm not sure what the right fix would be. e100_resume would probably
> > have to call e100_alloc_cbs early on, while e100_up should avoid
> > calling it a second time if nic->cbs_avail != 0. A tentative patch
> > for testing is attached.
>
> Reportedly, the patch fixes the crash on resume.

Cool, thanks for the research. I have a concern about this, however.

It's an interesting patch, but it raises the question: why does
e100_hw_init need to be called at all in resume?  I looked back
through our history and that hw_init call has always been there.  I
think it's incorrect, but it's taking me a while to set up a system with
the ability to resume.

Everywhere else in the driver, alloc_cbs is called before hw_init, so it
just seems like a long-standing bug.

Comments?  Anyone want to test?  I compile-tested this, but it is
otherwise untested.

[-- Attachment #2: e100_resume_no_init.diff --]
[-- Type: application/octet-stream, Size: 818 bytes --]

e100: remove hw_init call to fix panic

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

e100 seems to have had a long-standing bug where e100_hw_init was being
called when it should not have been.  This caused a panic due to recent
changes that rely on correct setup in the driver, and more robust error
paths.
---

 drivers/net/e100.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -2752,8 +2752,6 @@ static int e100_resume(struct pci_dev *p
 	retval = pci_enable_wake(pdev, 0, 0);
 	if (retval)
 		DPRINTK(PROBE,ERR, "Error clearing wake events\n");
-	if(e100_hw_init(nic))
-		DPRINTK(HW, ERR, "e100_hw_init failed\n");
 
 	netif_device_attach(netdev);
 	if(netif_running(netdev))


* Re: e100 oops on resume
  2006-01-25 19:37       ` e100 oops on resume Jesse Brandeburg
@ 2006-01-25 20:14         ` Olaf Kirch
  2006-01-25 22:28           ` Jesse Brandeburg
  2006-01-26  0:28         ` Jesse Brandeburg
  1 sibling, 1 reply; 85+ messages in thread
From: Olaf Kirch @ 2006-01-25 20:14 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: Stefan Seyfried, Linux Kernel Mailing List, netdev

On Wed, Jan 25, 2006 at 11:37:40AM -0800, Jesse Brandeburg wrote:
> It's an interesting patch, but it raises the question: why does
> e100_hw_init need to be called at all in resume?  I looked back
> through our history and that hw_init call has always been there.  I
> think it's incorrect, but it's taking me a while to set up a system with
> the ability to resume.

I'll ask the folks here to give it a try tomorrow. But I suspect at
least some of it will be needed. For instance, I assume you'll
have to reload the ucode when bringing the NIC back from sleep.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax


* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-25 18:26           ` pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow) Howard Chu
  2006-01-25 18:59             ` Nick Piggin
@ 2006-01-25 21:06             ` Lee Revell
  2006-01-25 22:14               ` Howard Chu
  2006-01-26  0:08             ` Robert Hancock
  2 siblings, 1 reply; 85+ messages in thread
From: Lee Revell @ 2006-01-25 21:06 UTC (permalink / raw)
  To: Howard Chu; +Cc: Christopher Friesen, Linux Kernel Mailing List, hancockr

On Wed, 2006-01-25 at 10:26 -0800, Howard Chu wrote:
> The SUSv3 text seems pretty clear. It says "WHEN
> pthread_mutex_unlock() 
> is called, ... the scheduling policy SHALL decide ..." It doesn't say 
> MAY, and it doesn't say "some undefined time after the call."  

This does NOT require pthread_mutex_unlock() to cause the scheduler to
immediately pick a new runnable process.  It only says it's up to the
scheduling POLICY what to do.  The policy could be "let the unlocking
thread finish its timeslice, then reschedule".

Lee



* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-25 21:06             ` Lee Revell
@ 2006-01-25 22:14               ` Howard Chu
  2006-01-26  0:16                 ` Robert Hancock
                                   ` (3 more replies)
  0 siblings, 4 replies; 85+ messages in thread
From: Howard Chu @ 2006-01-25 22:14 UTC (permalink / raw)
  To: Lee Revell; +Cc: Christopher Friesen, Linux Kernel Mailing List, hancockr

Lee Revell wrote:
> On Wed, 2006-01-25 at 10:26 -0800, Howard Chu wrote:
>   
>> The SUSv3 text seems pretty clear. It says "WHEN
>> pthread_mutex_unlock() 
>> is called, ... the scheduling policy SHALL decide ..." It doesn't say 
>> MAY, and it doesn't say "some undefined time after the call."  
>>     
>
> This does NOT require pthread_mutex_unlock() to cause the scheduler to
> immediately pick a new runnable process.  It only says it's up to the
> scheduling POLICY what to do.  The policy could be "let the unlocking
> thread finish its timeslice, then reschedule".
>   

This is obviously some very old ground.

http://groups.google.com/groups?threadm=etai7.108188%24B37.2381726%40news1.rdc1.bc.home.com

Kaz's post clearly interprets the POSIX spec differently from you. The 
policy can decide *which of the waiting threads* gets the mutex, but the 
releasing thread is totally out of the picture. For good or bad, the 
current pthread_mutex_unlock() is not POSIX-compliant. Now then, if 
we're forced to live with that, for efficiency's sake, that's OK, 
assuming that valid workarounds exist, such as inserting a sched_yield() 
after the unlock.

http://groups.google.com/group/comp.programming.threads/msg/16c01eac398a1139?hl=en&

But then we have to deal with you folks' bizarre notion that 
sched_yield() can legitimately be a no-op, which also defies the POSIX 
spec. Again, in SUSv3 "The /sched_yield/() function shall force the 
running thread to relinquish the processor until it again becomes the 
head of its thread list. It takes no arguments." There is no language 
here saying "sched_yield *may* do nothing at all." There are of course 
cases where it will have no effect, such as when called in a 
single-threaded program, but those are the exceptions that define the 
rule. Otherwise, the expectation is that some other runnable thread will 
acquire the CPU. Again, note that sched_yield() is a core function of 
the Threads specification, while scheduling policies are an optional 
feature. The function's core behavior (give up the CPU and make some 
other runnable thread run) is invariant; the current thread gives up the 
CPU regardless of which scheduling policy is in effect or even if 
scheduling policies are implemented at all. The only behavior that's 
open to implementors is which *of the other runnable threads* is chosen 
to take the place of the current thread.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/



* Re: e100 oops on resume
  2006-01-25 20:14         ` Olaf Kirch
@ 2006-01-25 22:28           ` Jesse Brandeburg
  0 siblings, 0 replies; 85+ messages in thread
From: Jesse Brandeburg @ 2006-01-25 22:28 UTC (permalink / raw)
  To: Olaf Kirch; +Cc: Stefan Seyfried, Linux Kernel Mailing List, netdev

On 1/25/06, Olaf Kirch <okir@suse.de> wrote:
> On Wed, Jan 25, 2006 at 11:37:40AM -0800, Jesse Brandeburg wrote:
> > It's an interesting patch, but it raises the question: why does
> > e100_hw_init need to be called at all in resume?  I looked back
> > through our history and that hw_init call has always been there.  I
> > think it's incorrect, but it's taking me a while to set up a system with
> > the ability to resume.
>
> I'll ask the folks here to give it a try tomorrow. But I suspect at
> least some of it will be needed. For instance, I assume you'll
> have to reload the ucode when bringing the NIC back from sleep.

I totally agree, that's what it looks like, but unless I'm missing
something e100_up will take care of everything, and if the interface
is not up, e100_open->e100_up afterward will take care of it.

We have to be really careful about what might happen when resuming on
a system with an SMBus link to a BMC, as there are some tricky
transitions in the hardware that can be easily violated.

Jesse


* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-25 18:26           ` pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow) Howard Chu
  2006-01-25 18:59             ` Nick Piggin
  2006-01-25 21:06             ` Lee Revell
@ 2006-01-26  0:08             ` Robert Hancock
  2 siblings, 0 replies; 85+ messages in thread
From: Robert Hancock @ 2006-01-26  0:08 UTC (permalink / raw)
  To: Howard Chu; +Cc: Christopher Friesen, Linux Kernel Mailing List

Howard Chu wrote:
> The SUSv3 text seems pretty clear. It says "WHEN pthread_mutex_unlock() 
> is called, ... the scheduling policy SHALL decide ..." It doesn't say 
> MAY, and it doesn't say "some undefined time after the call." There is 
> nothing optional or implementation-defined here. The only thing that is 
> not explicitly stated is what happens when there are no waiting threads; 
> in that case obviously the running thread can continue running.

It says the scheduling policy will decide who gets the mutex. It does 
not say that such a decision must be made immediately. That seems rather 
implementation defined to me.

> 
> re: forcing the mutex to ping-pong between different threads - if that 
> is inefficient, then the thread scheduler needs to be tuned differently. 
> Threads and thread context switches are supposed to be cheap, otherwise 
> you might as well just program with fork() instead. (And of course, back 
> when Unix was first developed, *processes* were lightweight, compared to 
> other extant OSs.)

This is nothing to do with the thread scheduler being inefficient. It is 
inherently inefficient to context-switch repeatedly no matter how good 
the kernel is. It trashes the CPU pipeline, at the very least, can cause 
thrashing of the CPU caches, and can cause cache lines to be pushed back 
and forth across the bus on SMP machines which really kills performance.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/


* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-25 22:14               ` Howard Chu
@ 2006-01-26  0:16                 ` Robert Hancock
  2006-01-26  0:49                   ` Howard Chu
  2006-01-26  2:05                 ` David Schwartz
                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 85+ messages in thread
From: Robert Hancock @ 2006-01-26  0:16 UTC (permalink / raw)
  To: Howard Chu; +Cc: Lee Revell, Christopher Friesen, Linux Kernel Mailing List

Howard Chu wrote:
> Kaz's post clearly interprets the POSIX spec differently from you. The 
> policy can decide *which of the waiting threads* gets the mutex, but the 
> releasing thread is totally out of the picture. For good or bad, the 
> current pthread_mutex_unlock() is not POSIX-compliant. Now then, if 
> we're forced to live with that, for efficiency's sake, that's OK, 
> assuming that valid workarounds exist, such as inserting a sched_yield() 
> after the unlock.
> 
> http://groups.google.com/group/comp.programming.threads/msg/16c01eac398a1139?hl=en& 

Did you read the rest of this post?

"In any event, all the mutex fairness in the world won't solve the
problem. Consider if this lock/unlock cycle is inside a larger
lock/unlock cycle. Yielding at the unlock or blocking at the lock will
increase the deadlock over the larger mutex.

The fact is, the threads library can't read the programmer's mind. So
it shouldn't try to, especially if that makes the common cases much
worse for the benefit of excruciatingly rare cases."

And earlier in that thread ("old behavior" referring to an old 
LinuxThreads version which allowed "unfair" locking):

"Notice however that even the old "unfair" behavior is perfectly
acceptable with respect to the POSIX standard: for the default
scheduling policy, POSIX makes no guarantees of fairness, such as "the
thread waiting for the mutex for the longest time always acquires it
first". Properly written multithreaded code avoids that kind of heavy
contention on mutexes, and does not run into fairness problems. If you
need scheduling guarantees, you should consider using the real-time
scheduling policies SCHED_RR and SCHED_FIFO, which have precisely
defined scheduling behaviors. "

If you indeed have some thread which is trying to do an essentially 
infinite amount of work, you really should not have that thread locking 
a mutex, which other threads need to acquire, for a large part of each 
cycle. Correctness aside, this is simply not efficient.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: e100 oops on resume
  2006-01-25 19:37       ` e100 oops on resume Jesse Brandeburg
  2006-01-25 20:14         ` Olaf Kirch
@ 2006-01-26  0:28         ` Jesse Brandeburg
  2006-01-26  9:32           ` Pavel Machek
                             ` (2 more replies)
  1 sibling, 3 replies; 85+ messages in thread
From: Jesse Brandeburg @ 2006-01-26  0:28 UTC (permalink / raw)
  To: Olaf Kirch; +Cc: Stefan Seyfried, Linux Kernel Mailing List, netdev

On 1/25/06, Jesse Brandeburg <jesse.brandeburg@gmail.com> wrote:
> On 1/25/06, Olaf Kirch <okir@suse.de> wrote:
> > On Wed, Jan 25, 2006 at 10:02:40AM +0100, Olaf Kirch wrote:
> > > I'm not sure what the right fix would be. e100_resume would probably
> > > have to call e100_alloc_cbs early on, while e100_up should avoid
> > > calling it a second time if nic->cbs_avail != 0. A tentative patch
> > > for testing is attached.
> >
> > Reportedly, the patch fixes the crash on resume.
>
> Cool, thanks for the research, I have a concern about this however.
>
> it's an interesting patch, but it raises the question: why does
> e100_init_hw need to be called at all in resume?  I looked back
> through our history and that init_hw call has always been there.  I
> think it's incorrect, but it's taking me a while to set up a system with
> the ability to resume.
>
> everywhere else in the driver, alloc_cbs is called before init_hw, so it
> just seems like a long-standing bug.
>
> comments?  anyone want to test?  I compile-tested this, but it is untested.

Okay, I reproduced the issue on 2.6.15.1 (with S1 sleep) and was able
to show that my patch, which just removes e100_init_hw, works okay for
me.  Let me know how it goes for you; I think this is a good fix.

Jesse

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26  0:16                 ` Robert Hancock
@ 2006-01-26  0:49                   ` Howard Chu
  2006-01-26  1:04                     ` Lee Revell
  0 siblings, 1 reply; 85+ messages in thread
From: Howard Chu @ 2006-01-26  0:49 UTC (permalink / raw)
  To: Robert Hancock; +Cc: Lee Revell, Christopher Friesen, Linux Kernel Mailing List

Robert Hancock wrote:
> Howard Chu wrote:
>> Kaz's post clearly interprets the POSIX spec differently from you. 
>> The policy can decide *which of the waiting threads* gets the mutex, 
>> but the releasing thread is totally out of the picture. For good or 
>> bad, the current pthread_mutex_unlock() is not POSIX-compliant. Now 
>> then, if we're forced to live with that, for efficiency's sake, 
>> that's OK, assuming that valid workarounds exist, such as inserting a 
>> sched_yield() after the unlock.
>>
>> http://groups.google.com/group/comp.programming.threads/msg/16c01eac398a1139?hl=en& 
>
>
> Did you read the rest of this post?
>
> "In any event, all the mutex fairness in the world won't solve the
> problem. Consider if this lock/unlock cycle is inside a larger
> lock/unlock cycle. Yielding at the unlock or blocking at the lock will
> increase the dreadlock over the larger mutex.

Basic "fairness" isn't the issue. Fairness is concerned with which of 
*multiple waiting threads* gets the mutex, and that is certainly 
irrelevant here. The issue is that the releasing thread should not be a 
candidate.

The mutex functions are a core part of the thread specification; they 
have a fundamental behavior, and the definition says if there are 
blocked threads waiting on a mutex when it gets unlocked, one of the 
waiting threads gets the mutex. Which of the waiting threads gets it is 
unspecified in the core spec. On a system that implements the scheduling 
option, the scheduling policy specifies which thread. The scheduling 
policy is an optional feature, it serves only to refine the core 
functionality. A program written to the basic core specification should 
not break when run in an environment that implements optional features.

The spec may be mandating a non-optimal behavior, but that's a 
side-issue - someone should file an objection with the Open Group to get 
it redefined if it's such a bad idea. But for now, the NPTL 
implementation is non-conformant.

Standards aren't just academic exercises. They're meant to be useful. If 
the standard is too thinly specified, is ambiguous, or allows 
nonsensical behavior, it's not useful and should be fixed at the source, 
not just ignored and papered over in implementations.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26  0:49                   ` Howard Chu
@ 2006-01-26  1:04                     ` Lee Revell
  2006-01-26  1:31                       ` Howard Chu
  0 siblings, 1 reply; 85+ messages in thread
From: Lee Revell @ 2006-01-26  1:04 UTC (permalink / raw)
  To: Howard Chu; +Cc: Robert Hancock, Christopher Friesen, Linux Kernel Mailing List

On Wed, 2006-01-25 at 16:49 -0800, Howard Chu wrote:
> Basic "fairness" isn't the issue. Fairness is concerned with which of 
> *multiple waiting threads* gets the mutex, and that is certainly 
> irrelevant here. The issue is that the releasing thread should not be
> a candidate.
> 

You seem to be making 2 controversial assertions:

1. pthread_mutex_unlock must cause an immediate reschedule if other
threads are blocked on the mutex, and 
2. if the unlocking thread immediately tries to relock the mutex,
another thread must get it first

I disagree with #1, which makes #2 irrelevant.  It would lead to
obviously incorrect behavior; pthread_mutex_unlock would no longer be an
RT-safe operation, for example.

Also consider the SCHED_FIFO policy (static priorities, and the scheduler
always runs the highest-priority runnable thread): under your
interpretation of POSIX, a high-priority thread unlocking a mutex would
require the scheduler to run a lower-priority thread, which violates
SCHED_FIFO semantics.

Lee




^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: sched_yield() makes OpenLDAP slow
  2006-01-25 13:51       ` sched_yield() makes OpenLDAP slow Howard Chu
  2006-01-25 14:38         ` Robert Hancock
  2006-01-25 17:49         ` Christopher Friesen
@ 2006-01-26  1:07         ` David Schwartz
  2006-01-26  8:30           ` Helge Hafting
  2 siblings, 1 reply; 85+ messages in thread
From: David Schwartz @ 2006-01-26  1:07 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: hancockr


> Robert Hancock wrote:

>  > "If there are threads blocked on the mutex object referenced by mutex
>  > when pthread_mutex_unlock() is called, resulting in the mutex becoming
>  > available, the scheduling policy shall determine which thread shall
>  > acquire the mutex."
>  >
>  > This says nothing about requiring a reschedule. The "scheduling policy"
>  > can well decide that the thread which just released the mutex can
>  > re-acquire it.

> No, because the thread that just released the mutex is obviously not one
> of  the threads blocked on the mutex.

	So what?

> When a mutex is unlocked, one of
> the *waiting* threads at the time of the unlock must acquire it, and the
> scheduling policy can determine that.

	This is false and is nowhere found in the standard.

> But the thread the released the
> mutex is not one of the waiting threads, and is not eligible for
> consideration.

	Where are you getting this from? Nothing requires the scheduler to schedule
any threads when the mutex is released.

	All that must happen is that the mutex must be unlocked. The scheduler is
permitted to allow any thread it wants to run at that point, or no thread.
Nothing says the thread that released the mutex can't continue running and
nothing says that it can't call pthread_mutex_lock and re-acquire the mutex
before any other thread gets around to getting it.

	In general, it is very bad karma for the scheduler to stop a thread before
its timeslice is up if it doesn't have to. Consider one CPU and two threads,
each needing to do 100 quick lock/unlock cycles. Why force 200 context
switches?

	DS



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26  1:04                     ` Lee Revell
@ 2006-01-26  1:31                       ` Howard Chu
  0 siblings, 0 replies; 85+ messages in thread
From: Howard Chu @ 2006-01-26  1:31 UTC (permalink / raw)
  To: Lee Revell; +Cc: Robert Hancock, Christopher Friesen, Linux Kernel Mailing List

Lee Revell wrote:
> On Wed, 2006-01-25 at 16:49 -0800, Howard Chu wrote:
>   
>> Basic "fairness" isn't the issue. Fairness is concerned with which of 
>> *multiple waiting threads* gets the mutex, and that is certainly 
>> irrelevant here. The issue is that the releasing thread should not be
>> a candidate.
>>
>>     
>
> You seem to be making 2 controversial assertions:
>
> 1. pthread_mutex_unlock must cause an immediate reschedule if other
> threads are blocked on the mutex, and 
> 2. if the unlocking thread immediately tries to relock the mutex,
> another thread must get it first
>
> I disagree with #1, which makes #2 irrelevant.  It would lead to
> obviously incorrect behavior, pthread_mutex_unlock would no longer be an
> RT safe operation for example.
>   

Actually no, I see that #1 is unnecessary, and already acknowledged as such
http://groups.google.com/group/fa.linux.kernel/msg/89da66017d53d496

But #2 still holds.

> Also consider a SCHED_FIFO policy - static priorities and the scheduler
> always runs the highest priority runnable thread - under your
> interpretation of POSIX a high priority thread unlocking a mutex would
> require the scheduler to run a lower priority thread which violates
> SCHED_FIFO semantics

See the Mutex Initialization Scheduling Attributes section which 
specifically addresses priority inversion:
http://www.opengroup.org/onlinepubs/000095399/xrat/xsh_chap02.html#tag_03_02_09

If point #2 were not true, then there would be no need to bother with 
any of that. Instead that text ends with "it is important that 
IEEE Std 1003.1-2001 provide these interfaces for those cases in which 
it is necessary."

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/


^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-25 22:14               ` Howard Chu
  2006-01-26  0:16                 ` Robert Hancock
@ 2006-01-26  2:05                 ` David Schwartz
  2006-01-26  2:48                   ` Mark Lord
  2006-01-26  8:54                 ` Nick Piggin
  2006-01-26 10:44                 ` Nikita Danilov
  3 siblings, 1 reply; 85+ messages in thread
From: David Schwartz @ 2006-01-26  2:05 UTC (permalink / raw)
  To: Lee Revell; +Cc: Christopher Friesen, Linux Kernel Mailing List, hancockr


> Kaz's post clearly interprets the POSIX spec differently from you. The
> policy can decide *which of the waiting threads* gets the mutex, but the
> releasing thread is totally out of the picture. For good or bad, the
> current pthread_mutex_unlock() is not POSIX-compliant. Now then, if
> we're forced to live with that, for efficiency's sake, that's OK,
> assuming that valid workarounds exist, such as inserting a sched_yield()
> after the unlock.

	My thanks to David Hopwood for providing me with the definitive refutation
of this position. The response is that the as-if rule allows the
implementation to violate the specification internally, provided no compliant
application could tell the difference.

	When you call 'pthread_mutex_lock', there is no guarantee regarding how
long it will or might take until you are actually waiting for the mutex. So
no conforming application can ever tell whether or not it is waiting for the
mutex or about to wait for the mutex.

	So you cannot write an application that can tell the difference.

	His exact quote is, "It could have been the case that the other threads ran
more slowly, so that they didn't reach the point of blocking on the mutex
before the pthread_mutex_unlock()."

	You can find it on comp.programming.threads if you like.

	DS



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26  2:05                 ` David Schwartz
@ 2006-01-26  2:48                   ` Mark Lord
  2006-01-26  3:30                     ` David Schwartz
  0 siblings, 1 reply; 85+ messages in thread
From: Mark Lord @ 2006-01-26  2:48 UTC (permalink / raw)
  To: davids
  Cc: Lee Revell, Christopher Friesen, Linux Kernel Mailing List, hancockr

David Schwartz wrote:
>> Kaz's post clearly interprets the POSIX spec differently from you. The
>> policy can decide *which of the waiting threads* gets the mutex, but the
>> releasing thread is totally out of the picture. For good or bad, the
>> current pthread_mutex_unlock() is not POSIX-compliant. Now then, if
>> we're forced to live with that, for efficiency's sake, that's OK,
>> assuming that valid workarounds exist, such as inserting a sched_yield()
>> after the unlock.
> 
> 	My thanks to David Hopwood for providing me with the definitive refutation
> of this position. The response is that the as-if rules allows the
> implementation to violate the specification internally provided no compliant
> application could tell the difference.
> 
> 	When you call 'pthread_mutex_lock', there is no guarantee regarding how
> long it will or might take until you are actually waiting for the mutex. So
> no conforming application can ever tell whether or not it is waiting for the
> mutex or about to wait for the mutex.
> 
> 	So you cannot write an application that can tell the difference.

Not true.  The code for the relinquishing thread could indeed tell the difference.

-ml

^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26  2:48                   ` Mark Lord
@ 2006-01-26  3:30                     ` David Schwartz
  2006-01-26  3:49                       ` Samuel Masham
  0 siblings, 1 reply; 85+ messages in thread
From: David Schwartz @ 2006-01-26  3:30 UTC (permalink / raw)
  To: lkml; +Cc: Lee Revell, Christopher Friesen, Linux Kernel Mailing List, hancockr


> > 	So you cannot write an application that can tell the difference.

> Not true.  The code for the relinquishing thread could indeed
> tell the difference.
>
> -ml

	It can tell the difference between the other thread getting the mutex first
and it getting the mutex first. But it cannot tell the difference between an
implementation that puts random sleeps before calls to 'pthread_mutex_lock'
and an implementation that has the allegedly non-compliant behavior. That
makes the behavior compliant under the 'as-if' rule.

	If you don't believe me, try to write a program that prints 'non-compliant'
on a system that has the alleged non-compliance but is guaranteed not to do
so on any compliant system. It cannot be done.

	In order to claim the alleged non-compliance, you would have to know that a
thread waiting for a mutex did not get it. But there is no possible way you
can know that another thread is waiting for the mutex (as opposed to being
about to wait for it). So you can never detect the claimed non-compliance,
so it's not non-compliance.

	This is definitive, really. It 100% refutes the claim.

	DS



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26  3:30                     ` David Schwartz
@ 2006-01-26  3:49                       ` Samuel Masham
  2006-01-26  4:02                         ` Samuel Masham
  0 siblings, 1 reply; 85+ messages in thread
From: Samuel Masham @ 2006-01-26  3:49 UTC (permalink / raw)
  To: davids
  Cc: lkml, Lee Revell, Christopher Friesen, Linux Kernel Mailing List,
	hancockr

On 26/01/06, David Schwartz <davids@webmaster.com> wrote:
>
> > >     So you cannot write an application that can tell the difference.
>
> > Not true.  The code for the relinquishing thread could indeed
> > tell the difference.
> >
> > -ml
>
>         It can tell the difference between the other thread getting the mutex first
> and it getting the mutex first. But it cannot tell the difference between an
> implementation that puts random sleeps before calls to 'pthread_mutex_lock'
> and an implementation that has the allegedly non-compliant behavior. That
> makes the behavior compliant under the 'as-if' rule.
>
>         If you don't believe me, try to write a program that prints 'non-compliant'
> on a system that has the alleged non-compliance but is guaranteed not to do
> so on any compliant system. It cannot be done.

Just turn priority inheritance on; then, in the running thread, check
your priority. If it goes up, the waiting thread is really waiting.

Then, if you can release and re-acquire the lock, it's non-compliant... no?

i.e. pthread_mutexattr_setprotocol(pthread_mutexattr_t *attr, int
protocol); with PTHREAD_PRIO_INHERIT

comment:
As an RT person I don't like the idea of scheduler bounce, so the way
around it seems to be to have mutex acquisition work on a FIFO-like
basis.


>         In order to claim the alleged compliance, you would have to know that a
> thread waiting for a mutex did not get it. But there is no possible way you
> can know that another thread is waiting for the mutex (as opposed to being
> about to wait for it). So you can never detect the claimed non-compliance,
> so it's not non-compliance.
>
>         This is definitive, really. It 100% refutes the claim.
>
>         DS
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26  3:49                       ` Samuel Masham
@ 2006-01-26  4:02                         ` Samuel Masham
  2006-01-26  4:53                           ` Lee Revell
  0 siblings, 1 reply; 85+ messages in thread
From: Samuel Masham @ 2006-01-26  4:02 UTC (permalink / raw)
  To: davids
  Cc: lkml, Lee Revell, Christopher Friesen, Linux Kernel Mailing List,
	hancockr

On 26/01/06, Samuel Masham <samuel.masham@gmail.com> wrote:
> comment:
> As a rt person I don't like the idea of scheduler bounce so the way
> round seems to be have the mutex lock acquiring work on a FIFO like
> basis.

which is obviously wrong...

However, my basic point stands but needs to be clarified a bit:

I think I can print non-compliant if the mutex acquisition doesn't
respect the higher priority of the waiter over the current process,
even if the mutex is "available".

OK?

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26  4:02                         ` Samuel Masham
@ 2006-01-26  4:53                           ` Lee Revell
  2006-01-26  6:14                             ` Samuel Masham
  0 siblings, 1 reply; 85+ messages in thread
From: Lee Revell @ 2006-01-26  4:53 UTC (permalink / raw)
  To: Samuel Masham
  Cc: davids, lkml, Christopher Friesen, Linux Kernel Mailing List, hancockr

On Thu, 2006-01-26 at 13:02 +0900, Samuel Masham wrote:
> On 26/01/06, Samuel Masham <samuel.masham@gmail.com> wrote:
> > comment:
> > As a rt person I don't like the idea of scheduler bounce so the way
> > round seems to be have the mutex lock acquiring work on a FIFO like
> > basis.
> 
> which is obviously wrong...
> 
> Howeve my basic point stands but needs to be clarified a bit:
> 
> I think I can print non-compliant if the mutex acquisition doesn't
> respect the higher priority of the waiter over the current process
> even if the mutex is "available".
> 
> OK?

I don't think using an optional feature (PI) counts...

Lee


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26  4:53                           ` Lee Revell
@ 2006-01-26  6:14                             ` Samuel Masham
  0 siblings, 0 replies; 85+ messages in thread
From: Samuel Masham @ 2006-01-26  6:14 UTC (permalink / raw)
  To: Lee Revell
  Cc: davids, lkml, Christopher Friesen, Linux Kernel Mailing List, hancockr

On 26/01/06, Lee Revell <rlrevell@joe-job.com> wrote:
> On Thu, 2006-01-26 at 13:02 +0900, Samuel Masham wrote:
> > On 26/01/06, Samuel Masham <samuel.masham@gmail.com> wrote:
> > > comment:
> > > As a rt person I don't like the idea of scheduler bounce so the way
> > > round seems to be have the mutex lock acquiring work on a FIFO like
> > > basis.
> >
> > which is obviously wrong...
> >
> > Howeve my basic point stands but needs to be clarified a bit:
> >
> > I think I can print non-compliant if the mutex acquisition doesn't
> > respect the higher priority of the waiter over the current process
> > even if the mutex is "available".
> >
> > OK?
>
> I don't think using an optional feature (PI) counts...
>
> Lee

So acquiring a mutex with PI enabled must involve the scheduler...

... and you can skip that bit with PI disabled, as one can argue that
the user can't tell whether the time slice hit between the call to
acquire the mutex and the actual mutex wait itself?

Sounds a bit of a fudge to me....

I assume that mutexes must never support the wchan (proc)
interface or the like?

On the other hand, the basic point that relying on high contention
around mutexes is a bad idea is fine by me.

Samuel

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: sched_yield() makes OpenLDAP slow
  2006-01-26  1:07         ` sched_yield() makes OpenLDAP slow David Schwartz
@ 2006-01-26  8:30           ` Helge Hafting
  2006-01-26  9:01             ` Nick Piggin
  2006-01-26 10:50             ` Nikita Danilov
  0 siblings, 2 replies; 85+ messages in thread
From: Helge Hafting @ 2006-01-26  8:30 UTC (permalink / raw)
  To: davids; +Cc: Linux Kernel Mailing List, hancockr

David Schwartz wrote:

>>Robert Hancock wrote:
>>    
>>
>>But the thread the released the
>>mutex is not one of the waiting threads, and is not eligible for
>>consideration.
>>    
>>
>
>	Where are you getting this from? Nothing requires the scheduler to schedule
>any threads when the mutex is released.
>  
>
Correct.

>	All that must happen is that the mutex must be unlocked. The scheduler is
>permitted to allow any thread it wants to run at that point, or no thread.
>Nothing says the thread that released the mutex can't continue running and
>  
>
Correct. The releasing thread may keep running.

>nothing says that it can't call pthread_mutex_lock and re-acquire the mutex
>before any other thread gets around to getting it.
>  
>
Wrong.
The spec says that the mutex must be given to a waiter (if any) at the
moment of release.  The waiter doesn't have to be scheduled at that
point; it may keep sleeping with its freshly unlocked mutex.  So the
unlocking thread may continue - but if it tries to re-acquire the mutex,
it will find the mutex taken and go to sleep at that point.  Then other
processes will schedule, and at some time the one now owning the mutex
will wake up and do its work.

>	In general, it is very bad karma for the scheduler to stop a thread before
>its timeslice is up if it doesn't have to. Consider one CPU and two threads,
>each needing to do 100 quick lock/unlock cycles. Why force 200 context
>switches?
>
Good point, except it is a strange program that does this.  Locking the
mutex once, doing 100 operations, then unlocking is the better way. :-)

Helge Hafting

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-25 19:32               ` Howard Chu
@ 2006-01-26  8:51                 ` Nick Piggin
  2006-01-26 14:15                   ` Kyle Moffett
  2006-01-26 10:38                 ` Nikita Danilov
  1 sibling, 1 reply; 85+ messages in thread
From: Nick Piggin @ 2006-01-26  8:51 UTC (permalink / raw)
  To: Howard Chu; +Cc: Christopher Friesen, Linux Kernel Mailing List, hancockr

Howard Chu wrote:
> Nick Piggin wrote:
> 
>> Howard Chu wrote:
>>
>>> The SUSv3 text seems pretty clear. It says "WHEN 
>>> pthread_mutex_unlock() is called, ... the scheduling policy SHALL 
>>> decide ..." It doesn't say MAY, and it doesn't say "some undefined 
>>> time after the call." There is nothing optional or 
>>> implementation-defined here. The only thing that is not explicitly 
>>> stated is what happens when there are no waiting threads; in that 
>>> case obviously the running thread can continue running.
>>>
>>
>> But it doesn't say the unlocking thread must yield to the new mutex
>> owner, only that the scheduling policy shall determine the which
>> thread aquires the lock.
> 
> 
> True, the unlocking thread doesn't have to yield to the new mutex owner 
> as a direct consequence of the unlock. But logically, if the unlocking 
> thread subsequently calls mutex_lock, it must block, because some other 
> thread has already been assigned ownership of the mutex.
> 
>> It doesn't say that decision must be made immediately, either (eg.
>> it could be made as a by product of which contender is chosen to run
>> next).
> 
> 
> A straightforward reading of the language here says the decision happens 
> "when pthread_mutex_unlock() is called" and not at any later time. There 
> is nothing here to support your interpretation.
> 

OK, so what happens if my scheduling policy decides _right then_ that
the next _running_ thread that was blocked on, or tries to acquire,
the mutex is the next owner?

This is the logical way for a *scheduling* policy to determine which
thread gets the mutex. I don't know any other way that the scheduling
policy could determine the next thread to get the mutex.

>>
>> I think the intention of the wording is that for deterministic policies,
>> it is clear that the waiting threads are actually woken and reevaluated
>> for scheduling. In the case of SCHED_OTHER, it means basically nothing,
>> considering the scheduling policy is arbitrary.
>>
> Clearly the point is that one of the waiting threads is waken and gets 
> the mutex, and it doesn't matter which thread is chosen. I.e., whatever 
> thread the scheduling policy chooses. The fact that SCHED_OTHER can 
> choose arbitrarily is immaterial, it still can only choose one of the 
> waiting threads.
> 

I don't know that it exactly says one of the waiting threads must get the
mutex.

> The fact that SCHED_OTHER's scheduling behavior is undefined is not free 
> license to implement whatever you want. Scheduling policies are an 
> optional feature; the basic thread behavior must still be consistent 
> even on systems that don't implement scheduling policies.
> 

It just so happens that normal tasks in Linux run in SCHED_OTHER. It
is irrelevant whether it might be an optional feature or not.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-25 22:14               ` Howard Chu
  2006-01-26  0:16                 ` Robert Hancock
  2006-01-26  2:05                 ` David Schwartz
@ 2006-01-26  8:54                 ` Nick Piggin
  2006-01-26 14:24                   ` Howard Chu
  2006-01-26 10:44                 ` Nikita Danilov
  3 siblings, 1 reply; 85+ messages in thread
From: Nick Piggin @ 2006-01-26  8:54 UTC (permalink / raw)
  To: Howard Chu
  Cc: Lee Revell, Christopher Friesen, Linux Kernel Mailing List, hancockr

Howard Chu wrote:
> Lee Revell wrote:
> 
>> On Wed, 2006-01-25 at 10:26 -0800, Howard Chu wrote:
>>  
>>
>>> The SUSv3 text seems pretty clear. It says "WHEN
>>> pthread_mutex_unlock() is called, ... the scheduling policy SHALL 
>>> decide ..." It doesn't say MAY, and it doesn't say "some undefined 
>>> time after the call."      
>>
>>
>> This does NOT require pthread_mutex_unlock() to cause the scheduler to
>> immediately pick a new runnable process.  It only says it's up the the
>> scheduling POLICY what to do.  The policy could be "let the unlocking
>> thread finish its timeslice then reschedule".
>>   
> 
> 
> This is obviously some very old ground.
> 
> http://groups.google.com/groups?threadm=etai7.108188%24B37.2381726%40news1.rdc1.bc.home.com 
> 
> 
> Kaz's post clearly interprets the POSIX spec differently from you. The 
> policy can decide *which of the waiting threads* gets the mutex, but the 
> releasing thread is totally out of the picture. For good or bad, the 
> current pthread_mutex_unlock() is not POSIX-compliant. Now then, if 
> we're forced to live with that, for efficiency's sake, that's OK, 
> assuming that valid workarounds exist, such as inserting a sched_yield() 
> after the unlock.
> 
> http://groups.google.com/group/comp.programming.threads/msg/16c01eac398a1139?hl=en& 
> 
> 
> But then we have to deal with you folks' bizarre notion that 
> sched_yield() can legitimately be a no-op, which also defies the POSIX 
> spec. Again, in SUSv3 "The /sched_yield/() function shall force the 
> running thread to relinquish the processor until it again becomes the 
> head of its thread list. It takes no arguments." There is no language 

How many times have we been over this? What do you think the "head of
its thread list" might mean?

> here saying "sched_yield *may* do nothing at all." There are of course 

There is language saying SCHED_OTHER is arbitrary, including how the
thread list is implemented and how a task might come to be at the head
of it.

They obviously don't need to redefine exactly what sched_yield may do
under each scheduling policy, do they?

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: sched_yield() makes OpenLDAP slow
  2006-01-26  8:30           ` Helge Hafting
@ 2006-01-26  9:01             ` Nick Piggin
  2006-01-26 10:50             ` Nikita Danilov
  1 sibling, 0 replies; 85+ messages in thread
From: Nick Piggin @ 2006-01-26  9:01 UTC (permalink / raw)
  To: Helge Hafting; +Cc: davids, Linux Kernel Mailing List, hancockr

Helge Hafting wrote:
> David Schwartz wrote:

>> nothing says that it can't call pthread_mutex_lock and re-acquire the 
>> mutex
>> before any other thread gets around to getting it.
>>  
>>
> Wrong.
> The spec says that the mutex must be given to a waiter (if any) at the
> moment of release.

Repeating myself here...

To me it says that the scheduling policy decides at the moment of release.
What if the scheduling policy decides *right then* to give the mutex to
the next running thread that tries to acquire it?

That would be the logical way for a scheduling policy to decide the next
owner of the mutex.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: e100 oops on resume
  2006-01-26  0:28         ` Jesse Brandeburg
@ 2006-01-26  9:32           ` Pavel Machek
  2006-01-26 19:02           ` Stefan Seyfried
       [not found]           ` <BAY108-DAV111F6EF46F6682FEECCC1593140@phx.gbl>
  2 siblings, 0 replies; 85+ messages in thread
From: Pavel Machek @ 2006-01-26  9:32 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Olaf Kirch, Stefan Seyfried, Linux Kernel Mailing List, netdev

On St 25-01-06 16:28:48, Jesse Brandeburg wrote:
> On 1/25/06, Jesse Brandeburg <jesse.brandeburg@gmail.com> wrote:
> > On 1/25/06, Olaf Kirch <okir@suse.de> wrote:
> > > On Wed, Jan 25, 2006 at 10:02:40AM +0100, Olaf Kirch wrote:
> > > > I'm not sure what the right fix would be. e100_resume would probably
> > > > have to call e100_alloc_cbs early on, while e100_up should avoid
> > > > calling it a second time if nic->cbs_avail != 0. A tentative patch
> > > > for testing is attached.
> > >
> > > Reportedly, the patch fixes the crash on resume.
> >
> > Cool, thanks for the research, I have a concern about this however.
> >
> > it's an interesting patch, but it raises the question: why does
> > e100_init_hw need to be called at all in resume?  I looked back
> > through our history and that init_hw call has always been there.  I
> > think it's incorrect, but it's taking me a while to set up a system with
> > the ability to resume.
> >
> > everywhere else in the driver alloc_cbs is called before init_hw so it
> > just seems like a long-standing bug.
> >
> > comments?  anyone want to test? I compile-tested this, but it is untested.
> 
> Okay I reproduced the issue on 2.6.15.1 (with S1 sleep) and was able
> to show that my patch that just removes e100_init_hw works okay for
> me.  Let me know how it goes for you, I think this is a good fix.

S1 preserves hardware state; .suspend/.resume routines can be NULL for
S1. Try with swsusp or S3.
								Pavel
-- 
Thanks, Sharp!

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-25 19:32               ` Howard Chu
  2006-01-26  8:51                 ` Nick Piggin
@ 2006-01-26 10:38                 ` Nikita Danilov
  2006-01-30  8:35                   ` Helge Hafting
  1 sibling, 1 reply; 85+ messages in thread
From: Nikita Danilov @ 2006-01-26 10:38 UTC (permalink / raw)
  To: Howard Chu; +Cc: Christopher Friesen, Linux Kernel Mailing List, hancockr

Howard Chu writes:

[...]

 > 
 > A straightforward reading of the language here says the decision happens 
 > "when pthread_mutex_unlock() is called" and not at any later time. There 
 > is nothing here to support your interpretation.
 > >
 > > I think the intention of the wording is that for deterministic policies,
 > > it is clear that the waiting threads are actually woken and reevaluated
 > > for scheduling. In the case of SCHED_OTHER, it means basically nothing,
 > > considering the scheduling policy is arbitrary.
 > >
 > Clearly the point is that one of the waiting threads is waken and gets 
 > the mutex, and it doesn't matter which thread is chosen. I.e., whatever 

Note that this behavior directly leads to "convoy formation": if that
woken thread T0 does not immediately run (e.g., because there are higher
priority threads) but still already owns the mutex, then other running
threads contending for this mutex will block waiting for T0, forming a
convoy.

 > thread the scheduling policy chooses. The fact that SCHED_OTHER can 
 > choose arbitrarily is immaterial, it still can only choose one of the 
 > waiting threads.

Looks like a good time to submit a Defect Report to the Open Group.

 > 
 > The fact that SCHED_OTHER's scheduling behavior is undefined is not free 
 > license to implement whatever you want. Scheduling policies are an 
 > optional feature; the basic thread behavior must still be consistent 
 > even on systems that don't implement scheduling policies.
 > 
 > -- 
 >   -- Howard Chu

Nikita.


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-25 22:14               ` Howard Chu
                                   ` (2 preceding siblings ...)
  2006-01-26  8:54                 ` Nick Piggin
@ 2006-01-26 10:44                 ` Nikita Danilov
  3 siblings, 0 replies; 85+ messages in thread
From: Nikita Danilov @ 2006-01-26 10:44 UTC (permalink / raw)
  To: Howard Chu; +Cc: Christopher Friesen, Linux Kernel Mailing List, hancockr

Howard Chu writes:

[...]

 > 
 > But then we have to deal with you folks' bizarre notion that 
 > sched_yield() can legitimately be a no-op, which also defies the POSIX 
 > spec. Again, in SUSv3 "The /sched_yield/() function shall force the 
 > running thread to relinquish the processor until it again becomes the 
 > head of its thread list. It takes no arguments." There is no language 
 > here saying "sched_yield *may* do nothing at all." There are of course 

As has been pointed out to you already, while there is no such language,
the effect may be the same if, for example, the scheduling policy decides
to put the current thread back at "the head of its thread list"
immediately after sched_yield(). That is a valid behavior for SCHED_OTHER.

Nikita.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: sched_yield() makes OpenLDAP slow
  2006-01-26  8:30           ` Helge Hafting
  2006-01-26  9:01             ` Nick Piggin
@ 2006-01-26 10:50             ` Nikita Danilov
  1 sibling, 0 replies; 85+ messages in thread
From: Nikita Danilov @ 2006-01-26 10:50 UTC (permalink / raw)
  To: Helge Hafting; +Cc: Linux Kernel Mailing List, hancockr

Helge Hafting writes:

[...]

 > 
 > >nothing says that it can't call pthread_mutex_lock and re-acquire the mutex
 > >before any other thread gets around to getting it.
 > >  
 > >
 > Wrong.
 > The spec says that the mutex must be given to a waiter (if any) at the
 > moment of release.  The waiter doesn't have to be scheduled at that
 > point, it may keep sleeping with its freshly unlocked mutex.  So the
 > unlocking thread may continue - but if it tries to reacquire the mutex
 > it will find the mutex taken and go to sleep at that point. Then other

You just described a convoy formation: a phenomenon that all reasonable
mutex implementations try to avoid at all costs. If that's what the
standard prescribes, then the standard has to be amended.

 > 
 > Helge Hafting

Nikita.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26  8:51                 ` Nick Piggin
@ 2006-01-26 14:15                   ` Kyle Moffett
  2006-01-26 14:43                     ` Howard Chu
  0 siblings, 1 reply; 85+ messages in thread
From: Kyle Moffett @ 2006-01-26 14:15 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Howard Chu, Christopher Friesen, Linux Kernel Mailing List, hancockr

Haven't you OpenLDAP guys realized that the pthread model you're  
actually looking for is this?  POSIX mutexes are not designed to  
mandate scheduling requirements *precisely* because this achieves  
your scheduling goals by explicitly stating what they are.

s: pthread_mutex_lock(&mutex);
s: pthread_cond_wait(&wake_slave, &mutex);

m: [do some work]
m: pthread_cond_signal(&wake_slave);
m: pthread_cond_wait(&wake_master, &mutex);

s: [return from pthread_cond_wait]
s: [do some work]
s: pthread_cond_signal(&wake_master);
s: pthread_cond_wait(&wake_slave, &mutex);

Of course, if that's the model you're looking for, you could always  
do this instead:

void master_func() {
	while (1) {
		[do some work]
		slave_func();
	}
}

void slave_func() {
	[do some work]
}

The semantics are effectively the same.

Cheers,
Kyle Moffett

--
Premature optimization is the root of all evil in programming
   -- C.A.R. Hoare




^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26  8:54                 ` Nick Piggin
@ 2006-01-26 14:24                   ` Howard Chu
  2006-01-26 14:54                     ` Nick Piggin
  0 siblings, 1 reply; 85+ messages in thread
From: Howard Chu @ 2006-01-26 14:24 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Lee Revell, Christopher Friesen, Linux Kernel Mailing List, hancockr

Nick Piggin wrote:
> Howard Chu wrote:
>> But then we have to deal with you folks' bizarre notion that 
>> sched_yield() can legitimately be a no-op, which also defies the 
>> POSIX spec. Again, in SUSv3 "The /sched_yield/() function shall force 
>> the running thread to relinquish the processor until it again becomes 
>> the head of its thread list. It takes no arguments." There is no 
>> language 
>
> How many times have we been over this? What do you think the "head of
> its thread list" might mean?
>
>> here saying "sched_yield *may* do nothing at all." There are of course 
>
> There is language saying SCHED_OTHER is arbitrary, including how the
> thread list is implemented and how a task might become on the head of
> it.
>
> They obviously don't need to redefine exactly what sched_yield may do
> under each scheduling policy, do they?
>
As Dave Butenhof says so often, threading is a cooperative programming 
model, not a competitive one. The sched_yield function exists for a 
specific purpose, to let one thread decide to allow some other thread to 
run. No matter what the scheduling policy, or even if there is no 
scheduling policy at all, the expectation is that the current thread 
will not continue to run unless there are no other runnable threads in 
the same process. The other important point here is that the yielding 
thread is only cooperating with other threads in its process. The 2.6 
kernel behavior effectively causes the entire process to give up its 
time slice, since the yielding thread has to wait for other processes in 
the system before it can run again. Again, if folks wanted process 
scheduling behavior they would have used fork().

By the way, I've already raised an objection with the Open Group asking 
for more clarification here.
http://www.opengroup.org/austin/aardvark/latest/xshbug2.txt   request 
number 120.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 14:15                   ` Kyle Moffett
@ 2006-01-26 14:43                     ` Howard Chu
  2006-01-26 19:57                       ` David Schwartz
  0 siblings, 1 reply; 85+ messages in thread
From: Howard Chu @ 2006-01-26 14:43 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: Nick Piggin, Christopher Friesen, Linux Kernel Mailing List, hancockr

Kyle Moffett wrote:
> Haven't you OpenLDAP guys realized that the pthread model you're 
> actually looking for is this?  POSIX mutexes are not designed to 
> mandate scheduling requirements *precisely* because this achieves your 
> scheduling goals by explicitly stating what they are.

This isn't about OpenLDAP. Yes, we had a lot of yield() calls scattered 
through the code, leftovers from when we only supported non-preemptive 
threading. Those calls have been removed. There are a few remaining, 
that are only in code paths for unusual errors, so what they do has no 
real performance impact.

The point of this discussion is that the POSIX spec says one thing and 
you guys say another; one way or another that should be resolved. The 
2.6 kernel behavior is a noticeable departure from previous releases. The 
2.4/LinuxThreads guys believed their implementation was correct. If you 
believe the 2.6 implementation is correct, then you should get the spec 
amended or state up front that the "P" in "NPTL" doesn't really mean 
anything.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 14:24                   ` Howard Chu
@ 2006-01-26 14:54                     ` Nick Piggin
  2006-01-26 15:23                       ` Howard Chu
  0 siblings, 1 reply; 85+ messages in thread
From: Nick Piggin @ 2006-01-26 14:54 UTC (permalink / raw)
  To: Howard Chu
  Cc: Lee Revell, Christopher Friesen, Linux Kernel Mailing List, hancockr

Howard Chu wrote:
> Nick Piggin wrote:

>> They obviously don't need to redefine exactly what sched_yield may do
>> under each scheduling policy, do they?
>>
> As Dave Butenhof says so often, threading is a cooperative programming 
> model, not a competitive one. The sched_yield function exists for a 
> specific purpose, to let one thread decide to allow some other thread to 
> run. No matter what the scheduling policy, or even if there is no 

Yes, and even SCHED_OTHER in Linux attempts to do this as part of
the principle of least surprise. That it doesn't _exactly_ match
what you want it to do just means you need to be using something
else.

> scheduling policy at all, the expectation is that the current thread 
> will not continue to run unless there are no other runnable threads in 
> the same process. The other important point here is that the yielding 
> thread is only cooperating with other threads in its process. The 2.6 

No I don't think so. POSIX 1.b where sched_yield is defined are the
realtime extensions, are they not?

sched_yield explicitly makes reference to the realtime priority system
of thread lists does it not? It is pretty clear that it is used for
realtime processes to deterministically give up their timeslices to
others of the same priority level.

Linux's SCHED_OTHER behaviour is arguably the best interpretation,
considering SCHED_OTHER is defined to have a single priority level.

> kernel behavior effectively causes the entire process to give up its 
> time slice, since the yielding thread has to wait for other processes in 
> the system before it can run again. Again, if folks wanted process 

It yields to all other SCHED_OTHER processes (which are all on the
same thread priority list) and not to any other processes of higher
realtime priority.

> scheduling behavior they would have used fork().
> 

It so happens that processes and threads use the same scheduling
policy in Linux. Is that forbidden somewhere?

> By the way, I've already raised an objection with the Open Group asking 
> for more clarification here.
> http://www.opengroup.org/austin/aardvark/latest/xshbug2.txt   request 
> number 120.
> 

-- 
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 14:54                     ` Nick Piggin
@ 2006-01-26 15:23                       ` Howard Chu
  2006-01-26 15:51                         ` Nick Piggin
  0 siblings, 1 reply; 85+ messages in thread
From: Howard Chu @ 2006-01-26 15:23 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Lee Revell, Christopher Friesen, Linux Kernel Mailing List, hancockr

Nick Piggin wrote:
> Howard Chu wrote:
>> scheduling policy at all, the expectation is that the current thread 
>> will not continue to run unless there are no other runnable threads 
>> in the same process. The other important point here is that the 
>> yielding thread is only cooperating with other threads in its 
>> process. The 2.6 
>
> No I don't think so. POSIX 1.b where sched_yield is defined are the
> realtime extensions, are they not?
>
> sched_yield explicitly makes reference to the realtime priority system
> of thread lists does it not? It is pretty clear that it is used for
> realtime processes to deterministically give up their timeslices to
> others of the same priority level.

The fact that sched_yield came originally from the realtime extensions 
is just a historical artifact. There was a pthread_yield() function 
specifically for threads and it was merged with sched_yield(). Today 
sched_yield() is a core part of the basic Threads specification, 
independent of the realtime extensions. The fact that it is defined 
solely in the language of the realtime priorities is an obvious flaw in 
the spec, since the function itself exists independently of realtime 
priorities. The objection I raised with the Open Group specifically 
addresses this flaw.

> Linux's SCHED_OTHER behaviour is arguably the best interpretation,
> considering SCHED_OTHER is defined to have a single priority level.

It appears that you just read the spec and blindly followed it without 
thinking about what it really said and failed to say. The best 
interpretation would come from saying "hey, this spec is only defined 
for realtime behavior, WTF is it supposed to do for the default 
non-realtime case?" and getting a clear definition in the spec.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 15:23                       ` Howard Chu
@ 2006-01-26 15:51                         ` Nick Piggin
  2006-01-26 16:44                           ` Howard Chu
  0 siblings, 1 reply; 85+ messages in thread
From: Nick Piggin @ 2006-01-26 15:51 UTC (permalink / raw)
  To: Howard Chu
  Cc: Lee Revell, Christopher Friesen, Linux Kernel Mailing List, hancockr

Howard Chu wrote:
> Nick Piggin wrote:
> 
>> Howard Chu wrote:
>>
>>> scheduling policy at all, the expectation is that the current thread 
>>> will not continue to run unless there are no other runnable threads 
>>> in the same process. The other important point here is that the 
>>> yielding thread is only cooperating with other threads in its 
>>> process. The 2.6 
>>
>>
>> No I don't think so. POSIX 1.b where sched_yield is defined are the
>> realtime extensions, are they not?
>>
>> sched_yield explicitly makes reference to the realtime priority system
>> of thread lists does it not? It is pretty clear that it is used for
>> realtime processes to deterministically give up their timeslices to
>> others of the same priority level.
> 
> 
> The fact that sched_yield came originally from the realtime extensions 
> is just a historical artifact. There was a pthread_yield() function 
> specifically for threads and it was merged with sched_yield(). Today 
> sched_yield() is a core part of the basic Threads specification, 
> independent of the realtime extensions. The fact that it is defined 
> solely in the language of the realtime priorities is an obvious flaw in 
> the spec, since the function itself exists independently of realtime 
> priorities. The objection I raised with the Open Group specifically 
> addresses this flaw.
> 

Either way, it by no means says anything about yielding to other
threads in the process but nobody else. Where did you get that
from?

>> Linux's SCHED_OTHER behaviour is arguably the best interpretation,
>> considering SCHED_OTHER is defined to have a single priority level.
> 
> 
> It appears that you just read the spec and blindly followed it without 
> thinking about what it really said and failed to say. The best 

No, a spec is something that is written unambiguously, and generally
the wording leads me to believe they attempted to make it so (it
definitely isn't perfect - your mutex unlock example is one that could
be interpreted either way). If they failed to say something that should
be there then the spec needs to be corrected -- however in this case
I don't think you've shown what's missing.

And actually your reading things into the spec that "they failed to say"
is wrong I believe (in the above sched_yield example).

> interpretation would come from saying "hey, this spec is only defined 
> for realtime behavior, WTF is it supposed to do for the default 
> non-realtime case?" and getting a clear definition in the spec.
> 

However they do not omit to say that. They quite explicitly say that
SCHED_OTHER is considered a single priority class in relation to its
interactions with other realtime classes, and is otherwise free to
be implemented in any way.

I can't see how you still have a problem with that...

-- 
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 15:51                         ` Nick Piggin
@ 2006-01-26 16:44                           ` Howard Chu
  2006-01-26 17:34                             ` linux-os (Dick Johnson)
  0 siblings, 1 reply; 85+ messages in thread
From: Howard Chu @ 2006-01-26 16:44 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Lee Revell, Christopher Friesen, Linux Kernel Mailing List, hancockr

Nick Piggin wrote:
> No, a spec is something that is written unambiguously, and generally
> the wording leads me to believe they attempted to make it so (it
> definitely isn't perfect - your mutex unlock example is one that could
> be interpreted either way). If they failed to say something that should
> be there then the spec needs to be corrected -- however in this case
> I don't think you've shown what's missing.

What is missing: sched_yield is a core threads function but it's defined 
using language that only has meaning in the presence of an optional 
feature (Process Scheduling.) Since the function must exist even in the 
absence of these options, the definition must be changed to use language 
that has meaning even in the absence of these options.

> And actually your reading things into the spec that "they failed to say"
> is wrong I believe (in the above sched_yield example).
>
>> interpretation would come from saying "hey, this spec is only defined 
>> for realtime behavior, WTF is it supposed to do for the default 
>> non-realtime case?" and getting a clear definition in the spec.
>
> However they do not omit to say that. They quite explicitly say that
> SCHED_OTHER is considered a single priority class in relation to its
> interactions with other realtime classes, and is otherwise free to
> be implemented in any way.
>
> I can't see how you still have a problem with that...
>
I may be missing the obvious, but I couldn't find this explicit 
statement in the SUS docs. Also, it would not address the core 
complaint, that sched_yield's definition has no meaning when the Process 
Scheduling option doesn't exist.

The current Open Group response to my objection reads:
 >>>

Add to APPLICATION USAGE
Since there may not be more than one thread runnable in a process
a call to sched_yield() might not relinquish the processor at all.
In a single threaded application this will always be the case.

<<<
The interesting point one can draw from this response is that 
sched_yield is only intended to yield to other runnable threads within a 
single process. This response is also problematic, because restricting 
it to threads within a process makes it useless for Process Scheduling. 
E.g., the Process Scheduling language would imply that a single-threaded 
app could yield the processor to some other process. As such, I think 
this response is also flawed, and the definition still needs more work.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 16:44                           ` Howard Chu
@ 2006-01-26 17:34                             ` linux-os (Dick Johnson)
  2006-01-26 19:00                               ` Nick Piggin
  2006-01-30  8:44                               ` Helge Hafting
  0 siblings, 2 replies; 85+ messages in thread
From: linux-os (Dick Johnson) @ 2006-01-26 17:34 UTC (permalink / raw)
  To: Howard Chu
  Cc: Nick Piggin, Lee Revell, Christopher Friesen,
	Linux Kernel Mailing List, hancockr


On Thu, 26 Jan 2006, Howard Chu wrote:

> Nick Piggin wrote:
>> No, a spec is something that is written unambiguously, and generally
>> the wording leads me to believe they attempted to make it so (it
>> definitely isn't perfect - your mutex unlock example is one that could
>> be interpreted either way). If they failed to say something that should
>> be there then the spec needs to be corrected -- however in this case
>> I don't think you've shown what's missing.
>
> What is missing: sched_yield is a core threads function but it's defined
> using language that only has meaning in the presence of an optional
> feature (Process Scheduling.) Since the function must exist even in the
> absence of these options, the definition must be changed to use language
> that has meaning even in the absence of these options.
>
>> And actually your reading things into the spec that "they failed to say"
>> is wrong I believe (in the above sched_yield example).
>>
>>> interpretation would come from saying "hey, this spec is only defined
>>> for realtime behavior, WTF is it supposed to do for the default
>>> non-realtime case?" and getting a clear definition in the spec.
>>
>> However they do not omit to say that. They quite explicitly say that
>> SCHED_OTHER is considered a single priority class in relation to its
>> interactions with other realtime classes, and is otherwise free to
>> be implemented in any way.
>>
>> I can't see how you still have a problem with that...
>>
> I may be missing the obvious, but I couldn't find this explicit
> statement in the SUS docs. Also, it would not address the core
> complaint, that sched_yield's definition has no meaning when the Process
> Scheduling option doesn't exist.
>
> The current Open Group response to my objection reads:
> >>>
>
> Add to APPLICATION USAGE
> Since there may not be more than one thread runnable in a process
> a call to sched_yield() might not relinquish the processor at all.
> In a single threaded application this will always be the case.
>
> <<<
> The interesting point one can draw from this response is that
> sched_yield is only intended to yield to other runnable threads within a
> single process. This response is also problematic, because restricting
> it to threads within a process makes it useless for Process Scheduling.
> E.g., the Process Scheduling language would imply that a single-threaded
> app could yield the processor to some other process. As such, I think
> this response is also flawed, and the definition still needs more work.
>
> --
>  -- Howard Chu
>  Chief Architect, Symas Corp.  http://www.symas.com
>  Director, Highland Sun        http://highlandsun.com/hyc
>  OpenLDAP Core Team            http://www.openldap.org/project/
>

To fix the current problem, you can substitute usleep(0); It will
give the CPU to somebody if it's computable, then give it back to
you. It seems to work in every case that sched_yield() has
mucked up (perhaps 20 to 30 here).

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.66 BogoMips).
Warning : 98.36% of all statistics are fiction.
.


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 17:34                             ` linux-os (Dick Johnson)
@ 2006-01-26 19:00                               ` Nick Piggin
  2006-01-26 19:14                                 ` linux-os (Dick Johnson)
  2006-01-30  8:44                               ` Helge Hafting
  1 sibling, 1 reply; 85+ messages in thread
From: Nick Piggin @ 2006-01-26 19:00 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Howard Chu, Lee Revell, Christopher Friesen,
	Linux Kernel Mailing List, hancockr

linux-os (Dick Johnson) wrote:
> 
> To fix the current problem, you can substitute usleep(0); It will
> give the CPU to somebody if it's computable, then give it back to
> you. It seems to work in every case that sched_yield() has
> mucked up (perhaps 20 to 30 here).
> 

That sounds like a terrible hack.

What cases has sched_yield mucked up for you, and why do you
think the problem is sched_yield mucking up? Can you solve it
using mutexes?

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: e100 oops on resume
  2006-01-26  0:28         ` Jesse Brandeburg
  2006-01-26  9:32           ` Pavel Machek
@ 2006-01-26 19:02           ` Stefan Seyfried
  2006-01-26 19:09             ` Olaf Kirch
  2006-01-28 11:53             ` Mattia Dongili
       [not found]           ` <BAY108-DAV111F6EF46F6682FEECCC1593140@phx.gbl>
  2 siblings, 2 replies; 85+ messages in thread
From: Stefan Seyfried @ 2006-01-26 19:02 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: Olaf Kirch, Linux Kernel Mailing List, netdev

On Wed, Jan 25, 2006 at 04:28:48PM -0800, Jesse Brandeburg wrote:
 
> Okay I reproduced the issue on 2.6.15.1 (with S1 sleep) and was able
> to show that my patch that just removes e100_init_hw works okay for
> me.  Let me know how it goes for you, I think this is a good fix.

worked for me in the Compaq Armada e500 and reportedly also fixed the
SONY that originally uncovered it.

Will be in the next SUSE betas, so if anything breaks, we'll notice
it.

Thanks.
-- 
Stefan Seyfried                  \ "I didn't want to write for pay. I
QA / R&D Team Mobile Devices      \ wanted to be paid for what I write."
SUSE LINUX Products GmbH, Nürnberg \                    -- Leonard Cohen

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: e100 oops on resume
  2006-01-26 19:02           ` Stefan Seyfried
@ 2006-01-26 19:09             ` Olaf Kirch
  2006-01-28 11:53             ` Mattia Dongili
  1 sibling, 0 replies; 85+ messages in thread
From: Olaf Kirch @ 2006-01-26 19:09 UTC (permalink / raw)
  To: Stefan Seyfried; +Cc: Jesse Brandeburg, Linux Kernel Mailing List, netdev

On Thu, Jan 26, 2006 at 08:02:37PM +0100, Stefan Seyfried wrote:
> Will be in the next SUSE betas, so if anything breaks, we'll notice
> it.

I doubt it. As Jesse mentioned, e100_hw_init is called from e100_up,
so the call from e100_resume was really superfluous.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 19:00                               ` Nick Piggin
@ 2006-01-26 19:14                                 ` linux-os (Dick Johnson)
  2006-01-26 21:12                                   ` Nick Piggin
  0 siblings, 1 reply; 85+ messages in thread
From: linux-os (Dick Johnson) @ 2006-01-26 19:14 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Howard Chu, Lee Revell, Christopher Friesen,
	Linux Kernel Mailing List, hancockr


On Thu, 26 Jan 2006, Nick Piggin wrote:

> linux-os (Dick Johnson) wrote:
>>
>> To fix the current problem, you can substitute usleep(0); It will
>> give the CPU to somebody if it's computable, then give it back to
>> you. It seems to work in every case that sched_yield() has
>> mucked up (perhaps 20 to 30 here).
>>
>
> That sounds like a terrible hack.
>
> What cases has sched_yield mucked up for you, and why do you
> think the problem is sched_yield mucking up? Can you solve it
> using mutexes?
>
> Thanks,
> Nick

Somebody wrote code that used Linux Threads. We didn't know
why it was so slow so I was asked to investigate. It was
a user-interface where high-speed image data gets put into
a buffer (using DMA) and one thread manipulates it. Another
thread copies and crunches the data, then displays it. The
writer insisted that he was doing the correct thing, however
the response sucked big time. I ran top and found that the
threaded processes were always grabbing big chunks of
CPU time. Searching for every instance of sched_yield(), I
was going to replace it with a diagnostic. However, the code
ran beautifully when the 'fprintf(stderr, "Message\n")' was
in the code! The call to write() sleeps. That gave the
CPU to somebody who was starving. The "quick-fix" was
to replace sched_yield() with usleep(0).

The permanent fix was to not use threads at all.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.66 BogoMips).
Warning : 98.36% of all statistics are fiction.
.

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 14:43                     ` Howard Chu
@ 2006-01-26 19:57                       ` David Schwartz
  2006-01-26 20:27                         ` Howard Chu
  2006-01-30  8:28                         ` Helge Hafting
  0 siblings, 2 replies; 85+ messages in thread
From: David Schwartz @ 2006-01-26 19:57 UTC (permalink / raw)
  To: hyc; +Cc: Linux Kernel Mailing List


> The point of this discussion is that the POSIX spec says one thing and
> you guys say another; one way or another that should be resolved. The
> 2.6 kernel behavior is a noticeable departure from previous releases. The
> 2.4/LinuxThreads guys believed their implementation was correct. If you
> believe the 2.6 implementation is correct, then you should get the spec
> amended or state up front that the "P" in "NPTL" doesn't really mean
> anything.

	There is disagreement over what the POSIX specification says. You have
already seen three arguments against your interpretation, any one of which
is, IMO, sufficient to demolish it.

	First, there's the as-if issue. You cannot write a program that can print
"non-compliant" with the behavior you claim is non-compliant that is
guaranteed not to do so by the standard because there is no way to know that
another thread is blocked on the mutex (except for PI mutexes).

	Second, there's the plain language of the standard. It says "If X is so at
time T, then Y". This does not require Y to happen at time T. It is X
happening at time T that requires Y, but the time for Y is not specified.

	Suppose a law says, for example, "if there are two or more bids with the same
price lower than all other bids at the close of bidding, the first such bid
to be received shall be accepted". The phrase "at the close of bidding"
refers to the time the rule is determined to apply to the situation, not
the time at which the decision as to which bid to accept is made.

	Third, there's the ambiguity of the standard. It says the "scheduling
policy" shall decide, not that the scheduler shall decide. If the policy is
to make a conditional or delayed decision, that is still perfectly valid
policy. "Whichever thread requests it first" is a valid scheduler policy.

	DS



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 19:57                       ` David Schwartz
@ 2006-01-26 20:27                         ` Howard Chu
  2006-01-26 20:46                           ` Nick Piggin
  2006-01-27  2:16                           ` David Schwartz
  2006-01-30  8:28                         ` Helge Hafting
  1 sibling, 2 replies; 85+ messages in thread
From: Howard Chu @ 2006-01-26 20:27 UTC (permalink / raw)
  To: davids; +Cc: Linux Kernel Mailing List

David Schwartz wrote:
>> The point of this discussion is that the POSIX spec says one thing and
>> you guys say another; one way or another that should be resolved. The
>> 2.6 kernel behavior is a noticeable departure from previous releases. The
>> 2.4/LinuxThreads guys believed their implementation was correct. If you
>> believe the 2.6 implementation is correct, then you should get the spec
>> amended or state up front that the "P" in "NPTL" doesn't really mean
>> anything.
>>     
>
> 	There is disagreement over what the POSIX specification says. You have
> already seen three arguments against your interpretation, any one of which
> is, IMO, sufficient to demolish it.
>   

> 	First, there's the as-if issue. You cannot write a program that can print
> "non-compliant" with the behavior you claim is non-compliant that is
> guaranteed not to do so by the standard because there is no way to know that
> another thread is blocked on the mutex (except for PI mutexes).
>   

The exception here demolishes this argument, IMO. Moreover, if the 
unlocker was a lower priority thread and there are higher priority 
threads blocked on the mutex, you really want the higher priority thread 
to run.

> 	Second, there's the plain language of the standard. It says "If X is so at
> time T, then Y". This does not require Y to happen at time T. It is X
> happening at time T that requires Y, but the time for Y is not specified.
>
> 	Suppose a law says, for example, "if there are two or more bids with the same
> price lower than all other bids at the close of bidding, the first such bid
> to be received shall be accepted". The phrase "at the close of bidding"
> refers to the time the rule is determined to apply to the situation, not
> the time at which the decision as to which bid to accept is made.
>   

The time at which the decision takes effect is immaterial; the point is 
that the decision can only be made from the set of options available at 
time T.

Per your analogy, if a new bid comes in at time T+1, it can't have any 
effect on which of the bids shall be accepted.

> 	Third, there's the ambiguity of the standard. It says the "scheduling
> policy" shall decide, not that the scheduler shall decide. If the policy is
> to make a conditional or delayed decision, that is still perfectly valid
> policy. "Whichever thread requests it first" is a valid scheduler policy.

I am not debating what the policy can decide. Merely the set of choices 
from which it may decide.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 20:27                         ` Howard Chu
@ 2006-01-26 20:46                           ` Nick Piggin
  2006-01-26 21:32                             ` Howard Chu
  2006-01-27  2:16                           ` David Schwartz
  1 sibling, 1 reply; 85+ messages in thread
From: Nick Piggin @ 2006-01-26 20:46 UTC (permalink / raw)
  To: Howard Chu; +Cc: davids, Linux Kernel Mailing List

Howard Chu wrote:
> David Schwartz wrote:
> 

> The time at which the decision takes effect is immaterial; the point is 
> that the decision can only be made from the set of options available at 
> time T.
> 
> Per your analogy, if a new bid comes in at time T+1, it can't have any 
> effect on which of the bids shall be accepted.
> 
>>     Third, there's the ambiguity of the standard. It says the "scheduling
>> policy" shall decide, not that the scheduler shall decide. If the 
>> policy is
>> to make a conditional or delayed decision, that is still perfectly valid
>> policy. "Whichever thread requests it first" is a valid scheduler policy.
> 
> 
> I am not debating what the policy can decide. Merely the set of choices 
> from which it may decide.
> 

OK, you believe that the mutex *must* be granted to a blocking thread
at the time of the unlock. I don't think this is unreasonable from the
wording (because it does not seem to be completely unambiguous english),
however think about this -

A realtime system with tasks A and B, A has an RT scheduling priority of
1, and B is 2. A and B are both runnable, so A is running. A takes a mutex
then sleeps, B runs and ends up blocked on the mutex. A wakes up and at
some point it drops the mutex and then tries to take it again.

What happens?

I haven't programmed realtime systems of any complexity, but I'd think it
would be undesirable if A were to block and allow B to run at this point.

Now this has nothing to do with PI or SCHED_OTHER, so behaviour is exactly
determined by our respective interpretations of what it means for "the
scheduling policy to decide which task gets the mutex".

What have I proven? Nothing ;) but perhaps my question could be answered
by someone who knows a lot more about RT systems than I.

Nick

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 19:14                                 ` linux-os (Dick Johnson)
@ 2006-01-26 21:12                                   ` Nick Piggin
  2006-01-26 21:31                                     ` linux-os (Dick Johnson)
  0 siblings, 1 reply; 85+ messages in thread
From: Nick Piggin @ 2006-01-26 21:12 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Howard Chu, Lee Revell, Christopher Friesen,
	Linux Kernel Mailing List, hancockr

linux-os (Dick Johnson) wrote:
> On Thu, 26 Jan 2006, Nick Piggin wrote:

>>What cases has sched_yield mucked up for you, and why do you
>>think the problem is sched_yield mucking up? Can you solve it
>>using mutexes?
>>
>>Thanks,
>>Nick
> 
> 
> Somebody wrote code that used Linux Threads. We didn't know
> why it was so slow so I was asked to investigate. It was
> a user-interface where high-speed image data gets put into
> a buffer (using DMA) and one thread manipulates it. Another
> thread copies and crunches the data, then displays it. The
> writer insisted that he was doing the correct thing, however
> the response sucked big time. I ran top and found that the
> threaded processes were always grabbing big chunks of
> CPU time. Searching for every instance of sched_yield(), I
> was going to replace it with a diagnostic. However, the code
> ran beautifully when the 'fprintf(stderr, "Message\n")' was
> in the code! The call to write() sleeps. That gave the
> CPU to somebody who was starving. The "quick-fix" was
> to replace sched_yield() with usleep(0).
> 
> The permanent fix was to not use threads at all.
> 

This sounds like a trivial producer-consumer problem that you
would find in any basic book on synchronisation, threading, or
operating systems.

If it was not a realtime system, then I can't believe it has any
usages of sched_yield in there at all. If it is a realtime system,
then replacing them with something else could easily have broken
it.

Also, I'm not sure that you can rely on write or usleep for 0
microseconds to sleep.

> Cheers,
> Dick Johnson
> Penguin : Linux version 2.6.13.4 on an i686 machine (5589.66 BogoMips).
> Warning : 98.36% of all statistics are fiction.
> .
> 
> ****************************************************************
> The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.
> 
> Thank you.
> 

Any chance you can get rid of that crazy disclaimer when posting
to lkml, please?

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 21:12                                   ` Nick Piggin
@ 2006-01-26 21:31                                     ` linux-os (Dick Johnson)
  2006-01-27  7:06                                       ` Valdis.Kletnieks
  0 siblings, 1 reply; 85+ messages in thread
From: linux-os (Dick Johnson) @ 2006-01-26 21:31 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Howard Chu, Lee Revell, Christopher Friesen,
	Linux Kernel Mailing List, hancockr


On Thu, 26 Jan 2006, Nick Piggin wrote:
[SNIPPED...]

> Any chance you can get rid of that crazy disclaimer when posting
> to lkml, please?
>
> Thanks,
> Nick
> --
> SUSE Labs, Novell Inc.
> Send instant messages to your online friends http://au.messenger.yahoo.com
>

I tried. The "!@#(*$%^~!" IT/Legal Department(s) don't have a clue.
I asked the "mail-filter" guy on linux-kernel if he could just
exclude everything after a "." in the first column, just like
/bin/mail and, for that matter, sendmail. I was just told that
"It doesn't...." even though I can run sendmail by hand, using
telnet port 25, over the network, and know that the "." in the
first column is the way it knows the end-of-message after it
receives the "DATA" command.

Hoping that somebody, sometime, will implement my suggestion,
I continue to put a dot in the first column after my signature.
I know that if I send my mail around the lab without going through
the "*(_!@#&%" MicroWorm mail-grinder, the dot gets rid of
everything thereafter.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.66 BogoMips).
Warning : 98.36% of all statistics are fiction.
.

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 20:46                           ` Nick Piggin
@ 2006-01-26 21:32                             ` Howard Chu
  2006-01-26 21:41                               ` Nick Piggin
                                                 ` (2 more replies)
  0 siblings, 3 replies; 85+ messages in thread
From: Howard Chu @ 2006-01-26 21:32 UTC (permalink / raw)
  To: Nick Piggin; +Cc: davids, Linux Kernel Mailing List

Nick Piggin wrote:
> OK, you believe that the mutex *must* be granted to a blocking thread
> at the time of the unlock. I don't think this is unreasonable from the
> wording (because it does not seem to be completely unambiguous english),
> however think about this -
>
> A realtime system with tasks A and B, A has an RT scheduling priority of
> 1, and B is 2. A and B are both runnable, so A is running. A takes a 
> mutex
> then sleeps, B runs and ends up blocked on the mutex. A wakes up and at
> some point it drops the mutex and then tries to take it again.
>
> What happens?
>
> I haven't programmed realtime systems of any complexity, but I'd think it
> would be undesirable if A were to block and allow B to run at this point.

But why does A take the mutex in the first place? Presumably because it 
is about to execute a critical section. And also presumably, A will not 
release the mutex until it no longer has anything critical to do; 
certainly it could hold it longer if it needed to.

If A still needed the mutex, why release it and reacquire it, why not 
just hold onto it? The fact that it is being released is significant.

> Now this has nothing to do with PI or SCHED_OTHER, so behaviour is 
> exactly
> determined by our respective interpretations of what it means for "the
> scheduling policy to decide which task gets the mutex".
>
> What have I proven? Nothing ;) but perhaps my question could be answered
> by someone who knows a lot more about RT systems than I.

In the last RT work I did 12-13 years ago, there was only one high 
priority producer task and it was never allowed to block. The consumers 
just kept up as best they could (multi-proc machine of course). I've 
seldom seen a need for many priority levels. Probably not much you can 
generalize from this though.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 21:32                             ` Howard Chu
@ 2006-01-26 21:41                               ` Nick Piggin
  2006-01-26 21:56                                 ` Howard Chu
  2006-01-26 21:58                               ` Christopher Friesen
  2006-01-27  4:13                               ` Steven Rostedt
  2 siblings, 1 reply; 85+ messages in thread
From: Nick Piggin @ 2006-01-26 21:41 UTC (permalink / raw)
  To: Howard Chu; +Cc: davids, Linux Kernel Mailing List

Howard Chu wrote:
> Nick Piggin wrote:
> 
>> OK, you believe that the mutex *must* be granted to a blocking thread
>> at the time of the unlock. I don't think this is unreasonable from the
>> wording (because it does not seem to be completely unambiguous english),
>> however think about this -
>>
>> A realtime system with tasks A and B, A has an RT scheduling priority of
>> 1, and B is 2. A and B are both runnable, so A is running. A takes a 
>> mutex
>> then sleeps, B runs and ends up blocked on the mutex. A wakes up and at
>> some point it drops the mutex and then tries to take it again.
>>
>> What happens?
>>
>> I haven't programmed realtime systems of any complexity, but I'd think it
>> would be undesirable if A were to block and allow B to run at this point.
> 
> 
> But why does A take the mutex in the first place? Presumably because it 
> is about to execute a critical section. And also presumably, A will not 
> release the mutex until it no longer has anything critical to do; 
> certainly it could hold it longer if it needed to.
> 
> If A still needed the mutex, why release it and reacquire it, why not 
> just hold onto it? The fact that it is being released is significant.
> 

Regardless of why, that is just the simplest scenario I could think
of that would give us a test case. However...

Why not hold onto it? We sometimes do this in the kernel if we need
to take a lock that is incompatible with the lock already being held,
or if we discover we need to take a mutex which nests outside our
currently held lock in other paths. Ie to prevent deadlock.

Another reason might be because we will be running for a very long
time without requiring the lock. Or we might like to release it because
we expect a higher priority process to take it.

>> Now this has nothing to do with PI or SCHED_OTHER, so behaviour is 
>> exactly
>> determined by our respective interpretations of what it means for "the
>> scheduling policy to decide which task gets the mutex".
>>
>> What have I proven? Nothing ;) but perhaps my question could be answered
>> by someone who knows a lot more about RT systems than I.
> 
> 
> In the last RT work I did 12-13 years ago, there was only one high 
> priority producer task and it was never allowed to block. The consumers 
> just kept up as best they could (multi-proc machine of course). I've 
> seldom seen a need for many priority levels. Probably not much you can 
> generalize from this though.
> 

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 21:41                               ` Nick Piggin
@ 2006-01-26 21:56                                 ` Howard Chu
  2006-01-26 22:24                                   ` Nick Piggin
  2006-01-27  4:27                                   ` Steven Rostedt
  0 siblings, 2 replies; 85+ messages in thread
From: Howard Chu @ 2006-01-26 21:56 UTC (permalink / raw)
  To: Nick Piggin; +Cc: davids, Linux Kernel Mailing List

Nick Piggin wrote:
> Howard Chu wrote:
>> Nick Piggin wrote:
>>
>>> OK, you believe that the mutex *must* be granted to a blocking thread
>>> at the time of the unlock. I don't think this is unreasonable from the
>>> wording (because it does not seem to be completely unambiguous 
>>> english),
>>> however think about this -
>>>
>>> A realtime system with tasks A and B, A has an RT scheduling 
>>> priority of
>>> 1, and B is 2. A and B are both runnable, so A is running. A takes a 
>>> mutex
>>> then sleeps, B runs and ends up blocked on the mutex. A wakes up and at
>>> some point it drops the mutex and then tries to take it again.
>>>
>>> What happens?
>>>
>>> I haven't programmed realtime systems of any complexity, but I'd 
>>> think it
>>> would be undesirable if A were to block and allow B to run at this 
>>> point.
>>
>>
>> But why does A take the mutex in the first place? Presumably because 
>> it is about to execute a critical section. And also presumably, A 
>> will not release the mutex until it no longer has anything critical 
>> to do; certainly it could hold it longer if it needed to.
>>
>> If A still needed the mutex, why release it and reacquire it, why not 
>> just hold onto it? The fact that it is being released is significant.
>>
>
> Regardless of why, that is just the simplest scenario I could think
> of that would give us a test case. However...
>
> Why not hold onto it? We sometimes do this in the kernel if we need
> to take a lock that is incompatible with the lock already being held,
> or if we discover we need to take a mutex which nests outside our
> currently held lock in other paths. Ie to prevent deadlock.

In those cases, A cannot retake the mutex anyway. I.e., you just said 
that you released the first mutex because you want to acquire a 
different one. So those cases don't fit this example very well.

> Another reason might be because we will be running for a very long
> time without requiring the lock.

And again in this case, A should not be immediately reacquiring the lock 
if it doesn't actually need it.

> Or we might like to release it because
> we expect a higher priority process to take it.

And in this case, the expected behavior is the same as I've been pursuing.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 21:32                             ` Howard Chu
  2006-01-26 21:41                               ` Nick Piggin
@ 2006-01-26 21:58                               ` Christopher Friesen
  2006-01-27  4:13                               ` Steven Rostedt
  2 siblings, 0 replies; 85+ messages in thread
From: Christopher Friesen @ 2006-01-26 21:58 UTC (permalink / raw)
  To: Howard Chu; +Cc: Nick Piggin, davids, Linux Kernel Mailing List

Howard Chu wrote:

> But why does A take the mutex in the first place? Presumably because it 
> is about to execute a critical section. And also presumably, A will not 
> release the mutex until it no longer has anything critical to do; 
> certainly it could hold it longer if it needed to.

Suppose A is pulling job requests off a queue.

A takes the mutex because it is going to modify data protected by the 
mutex.  It then gives up the mutex when it's done modifying the data.

> If A still needed the mutex, why release it and reacquire it, why not 
> just hold onto it? The fact that it is being released is significant.

Suppose A then pulls another job request off the queue.  It just so 
happens that this job requires touching some data protected by the same 
mutex.  It would need to take the mutex again.

A doesn't necessarily know what data the various jobs will require it to 
access, so it doesn't know a priori what mutexes will be required.

Chris

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 21:56                                 ` Howard Chu
@ 2006-01-26 22:24                                   ` Nick Piggin
  2006-01-27  8:08                                     ` Howard Chu
  2006-01-27  4:27                                   ` Steven Rostedt
  1 sibling, 1 reply; 85+ messages in thread
From: Nick Piggin @ 2006-01-26 22:24 UTC (permalink / raw)
  To: Howard Chu; +Cc: davids, Linux Kernel Mailing List

Howard Chu wrote:
> Nick Piggin wrote:

>> Regardless of why, that is just the simplest scenario I could think
>> of that would give us a test case. However...
>>
>> Why not hold onto it? We sometimes do this in the kernel if we need
>> to take a lock that is incompatible with the lock already being held,
>> or if we discover we need to take a mutex which nests outside our
>> currently held lock in other paths. Ie to prevent deadlock.
> 
> 
> In those cases, A cannot retake the mutex anyway. I.e., you just said 
> that you released the first mutex because you want to acquire a 
> different one. So those cases don't fit this example very well.
> 

Umm yes, then *after* acquiring the different one, A would like to
retake the original mutex.

>> Another reason might be because we will be running for a very long
>> time without requiring the lock.
> 
> 
> And again in this case, A should not be immediately reacquiring the lock 
> if it doesn't actually need it.
> 

No, not immediately, I said "for a very long time". As in: A does not
need the exclusion provided by the lock for a very long time so it
drops it to avoid needless contention, then reacquires it when it finally
does need the lock.

>> Or we might like to release it because
>> we expect a higher priority process to take it.
> 
> 
> And in this case, the expected behavior is the same as I've been pursuing.
> 

No, we're talking about what happens when A tries to acquire it again.

Just accept that my described scenario is legitimate then consider it in
isolation rather than getting caught up in the superfluous details of how
such a situation might come about.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 20:27                         ` Howard Chu
  2006-01-26 20:46                           ` Nick Piggin
@ 2006-01-27  2:16                           ` David Schwartz
  2006-01-27  8:19                             ` Howard Chu
  1 sibling, 1 reply; 85+ messages in thread
From: David Schwartz @ 2006-01-27  2:16 UTC (permalink / raw)
  To: hyc; +Cc: Linux Kernel Mailing List


> David Schwartz wrote:

> > First, there's the as-if issue. You cannot write a program
> > that can print
> > "non-compliant" with the behavior you claim is non-compliant that is
> > guaranteed not to do so by the standard because there is no way
> > to know that
> > another thread is blocked on the mutex (except for PI mutexes).

> The exception here demolishes this argument, IMO.

	You're saying the authors of the standard intended that clause to be read
in light of the possibility of PI mutexes?! That's just nuts.

> Moreover, if the
> unlocker was a lower priority thread and there are higher priority
> threads blocked on the mutex, you really want the higher priority thread
> to run.

	Yes, I agree.

> > 	Second, there's the plain language of the standard. It says
> > "If X is so at
> > time T, then Y". This does not require Y to happen at time T. It is X
> > happening at time T that requires Y, but the time for Y is not
> specified.

> > 	If a law says, for example, "if there are two or more bids
> > with the same
> > price lower than all other bids at the close of bidding, the
> > first such bid
> > to be received shall be accepted". The phrase "at the close of bidding"
> > refers to the time the rule is determined to apply to the
> > situation, not
> > the time at which the decision as to which bid to accept is made.

> The time at which the decision takes effect is immaterial; the point is
> that the decision can only be made from the set of options available at
> time T.
>
> Per your analogy, if a new bid comes in at time T+1, it can't have any
> effect on which of the bids shall be accepted.

	Only because of the specifics of this analogy. If the rule said "if there
are two or more such bids with the same price at the close of bidding, the
winning bid shall be determined by the board of directors' policy", nothing
prevents the board of directors from having a policy of going back to the
bidders and asking if they can lower their bids further.

	Nothing prevents them from rebidding the project if they want. In other
words, it doesn't place any restrictions on what the board can do.

> > 	Third, there's the ambiguity of the standard. It says the "scheduling
> > policy" shall decide, not that the scheduler shall decide. If
> > the policy is
> > to make a conditional or delayed decision, that is still perfectly valid
> > policy. "Whichever thread requests it first" is a valid
> > scheduler policy.

> I am not debating what the policy can decide. Merely the set of choices
> from which it may decide.

	Which is a restriction not found in the standard. A "policy" is a way of
deciding, not a decision. Scheduling policy can be to let whoever asks first
get it.

	DS




* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 21:32                             ` Howard Chu
  2006-01-26 21:41                               ` Nick Piggin
  2006-01-26 21:58                               ` Christopher Friesen
@ 2006-01-27  4:13                               ` Steven Rostedt
  2 siblings, 0 replies; 85+ messages in thread
From: Steven Rostedt @ 2006-01-27  4:13 UTC (permalink / raw)
  To: Howard Chu; +Cc: Linux Kernel Mailing List, davids, Nick Piggin

On Thu, 2006-01-26 at 13:32 -0800, Howard Chu wrote:
> Nick Piggin wrote:
> > OK, you believe that the mutex *must* be granted to a blocking thread
> > at the time of the unlock. I don't think this is unreasonable from the
> > wording (because it does not seem to be completely unambiguous english),
> > however think about this -
> >
> > A realtime system with tasks A and B, A has an RT scheduling priority of
> > 1, and B is 2. A and B are both runnable, so A is running. A takes a 
> > mutex
> > then sleeps, B runs and ends up blocked on the mutex. A wakes up and at
> > some point it drops the mutex and then tries to take it again.
> >
> > What happens?
> >
> > I haven't programmed realtime systems of any complexity, but I'd think it
> > would be undesirable if A were to block and allow B to run at this point.
> 
> But why does A take the mutex in the first place? Presumably because it 
> is about to execute a critical section. And also presumably, A will not 
> release the mutex until it no longer has anything critical to do; 
> certainly it could hold it longer if it needed to.

A while back I discovered that the -rt patch did just this with the
conversion of spin_locks to rt_mutexes. Here's a scenario that happened
amazingly often.

Three tasks A, B, C:  A with highest prio (say 3), B is middle (say 2)
and C is lowest (say 1).  And all this with PI (although without PI it
can happen even more easily; see my explanation here:
http://marc.theaimsgroup.com/?l=linux-kernel&m=111165425915947&w=4 )

C grabs mutex X
B preempts C and tries to grab mutex X and blocks (C inherits from B)
A comes along and preempts C and blocks on X (C now inherits from A)
C lets go of mutex X and gives it to A.
A does some work then releases mutex X (B, although not running, acquires
it).
A needs to grab X again but B owns it. Since B has the lock, the
high-priority task A must give up the CPU to the lower-priority task B.

I implemented "lock stealing" for this very case and cut down
unnecessary schedules and latencies tremendously.  If A goes to grab X
again, but B has it (and hasn't woken up yet), A can "steal" it from B
and continue.

Hmm, this may still be within POSIX even if, as you say, a "waiting"
process must get the lock.  If A comes back before B wakes up, A is now
a waiting process and may take it. OK, maybe I'm stretching it a little,
but that's what RT wants.

> 
> If A still needed the mutex, why release it and reacquire it, why not 
> just hold onto it? The fact that it is being released is significant.

There are several reasons.  Why hold a mutex when you don't need to?
This could be an SMP machine, and B could grab the mutex in the small
window in which A releases it.  Also, locks are released and reacquired
a lot to prevent deadlocks.

It's good practice to always release a mutex (or any lock) when it's not
needed, even if you plan on grabbing it again right away. For all you
know, a higher-priority process may be waiting to get it.

> 
> > Now this has nothing to do with PI or SCHED_OTHER, so behaviour is 
> > exactly
> > determined by our respective interpretations of what it means for "the
> > scheduling policy to decide which task gets the mutex".
> >
> > What have I proven? Nothing ;) but perhaps my question could be answered
> > by someone who knows a lot more about RT systems than I.
> 
> In the last RT work I did 12-13 years ago, there was only one high 
> priority producer task and it was never allowed to block. The consumers 
> just kept up as best they could (multi-proc machine of course). I've 
> seldom seen a need for many priority levels. Probably not much you can 
> generalize from this though.

That seems to be a very simple system.  I usually deal with 4 or 5
priority levels and that can easily create headaches.

-- Steve





* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 21:56                                 ` Howard Chu
  2006-01-26 22:24                                   ` Nick Piggin
@ 2006-01-27  4:27                                   ` Steven Rostedt
  1 sibling, 0 replies; 85+ messages in thread
From: Steven Rostedt @ 2006-01-27  4:27 UTC (permalink / raw)
  To: Howard Chu; +Cc: Linux Kernel Mailing List, davids, Nick Piggin

On Thu, 2006-01-26 at 13:56 -0800, Howard Chu wrote:
> Nick Piggin wrote:

> >>
> >> But why does A take the mutex in the first place? Presumably because 
> >> it is about to execute a critical section. And also presumably, A 
> >> will not release the mutex until it no longer has anything critical 
> >> to do; certainly it could hold it longer if it needed to.
> >>
> >> If A still needed the mutex, why release it and reacquire it, why not 
> >> just hold onto it? The fact that it is being released is significant.
> >>
> >
> > Regardless of why, that is just the simplest scenario I could think
> > of that would give us a test case. However...
> >
> > Why not hold onto it? We sometimes do this in the kernel if we need
> > to take a lock that is incompatible with the lock already being held,
> > or if we discover we need to take a mutex which nests outside our
> > currently held lock in other paths. Ie to prevent deadlock.
> 
> In those cases, A cannot retake the mutex anyway. I.e., you just said 
> that you released the first mutex because you want to acquire a 
> different one. So those cases don't fit this example very well.

Let's say you have two locks X and Y.  Y nests inside of X. To do block1
you need to hold lock Y, and to do block2 you need to hold both locks X
and Y; block1 must be done first without holding lock X.

func()
{
again:
	mutex_lock(Y);
	block1();
	if (!mutex_try_lock(X)) {
		/* Can't take X while holding Y without risking deadlock:
		 * drop Y, then take both in the proper nesting order. */
		mutex_unlock(Y);
		mutex_lock(X);
		mutex_lock(Y);
		/* block1's work may be stale now that Y was dropped. */
		if (block1_has_changed()) {
			mutex_unlock(Y);
			mutex_unlock(X);
			goto again;
		}
	}
	block2();
	mutex_unlock(X);
	mutex_unlock(Y);
}

Stuff like the above actually is done (it's done in the kernel). So you
can see here that Y can be released and reacquired right away.  If
another task (of lower priority) was waiting on Y, we don't want to give
up the lock, since we would then block and the chances of
block1_has_changed() returning true go up even more.

> 
> > Another reason might be because we will be running for a very long
> > time without requiring the lock.
> 
> And again in this case, A should not be immediately reacquiring the lock 
> if it doesn't actually need it.

I'm not sure what Nick means here, but I'm sure he didn't mean it to
come out that way ;)

> 
> > Or we might like to release it because
> > we expect a higher priority process to take it.
> 
> And in this case, the expected behavior is the same as I've been pursuing.

But you can't know whether a higher- or lower-priority process is
waiting.  Sure, it works as you say when a higher-priority process is
waiting, but it doesn't when a lower-priority process is waiting.

-- Steve




* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 21:31                                     ` linux-os (Dick Johnson)
@ 2006-01-27  7:06                                       ` Valdis.Kletnieks
  0 siblings, 0 replies; 85+ messages in thread
From: Valdis.Kletnieks @ 2006-01-27  7:06 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Nick Piggin, Howard Chu, Lee Revell, Christopher Friesen,
	Linux Kernel Mailing List, hancockr

[-- Attachment #1: Type: text/plain, Size: 1590 bytes --]

On Thu, 26 Jan 2006 16:31:28 EST, "linux-os (Dick Johnson)" said:

> "It doesn't...." even though I can run sendmail by hand, using
> telnet port 25, over the network, and know that the "." in the
> first column is the way it knows the end-of-message after it
> receives the "DATA" command.

Right. That's how an MTA talks to another MTA.  However, your mail
needs to be properly escaped.  RFC821, section 4.5.2:

     4.5.2.  TRANSPARENCY

         Without some provision for data transparency the character
         sequence "<CRLF>.<CRLF>" ends the mail text and cannot be sent
         by the user.  In general, users are not aware of such
         "forbidden" sequences.  To allow all user composed text to be
         transmitted transparently the following procedures are used.

            1. Before sending a line of mail text the sender-SMTP checks
            the first character of the line.  If it is a period, one
            additional period is inserted at the beginning of the line.

            2. When a line of mail text is received by the receiver-SMTP
            it checks the line.  If the line is composed of a single
            period it is the end of mail.  If the first character is a
            period and there are other characters on the line, the first
            character is deleted.

In other words, the on-the-wire protocol is specifically designed so that
you *can't* accidentally lose the rest of the message by sending a bare '.'.
The fact that some programs implement it when talking to the user is
merely a convenience hack on the program's part.



* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 22:24                                   ` Nick Piggin
@ 2006-01-27  8:08                                     ` Howard Chu
  2006-01-27 19:25                                       ` Philipp Matthias Hahn
  2006-02-01 12:31                                       ` Nick Piggin
  0 siblings, 2 replies; 85+ messages in thread
From: Howard Chu @ 2006-01-27  8:08 UTC (permalink / raw)
  To: Nick Piggin; +Cc: davids, Linux Kernel Mailing List

Nick Piggin wrote:
> Howard Chu wrote:
>
>>> Another reason might be because we will be running for a very long
>>> time without requiring the lock.
>>
>>
>> And again in this case, A should not be immediately reacquiring the 
>> lock if it doesn't actually need it.
>>
>
> No, not immediately, I said "for a very long time". As in: A does not
> need the exclusion provided by the lock for a very long time so it
> drops it to avoid needless contention, then reacquires it when it finally
> does need the lock.

OK. I think this is really a separate situation. Just to recap: A takes 
lock, does some work, releases lock, a very long time passes, then A 
takes the lock again. In the "time passes" part, that mutex could be 
locked and unlocked any number of times by other threads and A won't 
know or care. Particularly on an SMP machine, other threads that were 
blocked on that mutex could do useful work in the interim without 
impacting A's progress at all. So here, when A leaves the mutex unlocked 
for a long time, it's desirable to give the mutex to one of the waiters 
ASAP.

>>> Or we might like to release it because
>>> we expect a higher priority process to take it.
>>
>>
>> And in this case, the expected behavior is the same as I've been 
>> pursuing.
>>
>
> No, we're talking about what happens when A tries to acquire it again.
>
> Just accept that my described scenario is legitimate then consider it in
> isolation rather than getting caught up in the superfluous details of how
> such a situation might come about.

OK. I'm not trying to be difficult here. In much of life, context is 
everything; very little can be understood in isolation.

Back to the scenario:

> A realtime system with tasks A and B, A has an RT scheduling priority of
> 1, and B is 2. A and B are both runnable, so A is running. A takes a 
> mutex
> then sleeps, B runs and ends up blocked on the mutex. A wakes up and at
> some point it drops the mutex and then tries to take it again.
>
> What happens?

As I understand the spec, A must block because B has acquired the mutex. 
Once again, the SUS discussion of priority inheritance would never need 
to have been written if this were not the case:

 >>>
In a priority-driven environment, a direct use of traditional primitives 
like mutexes and condition variables can lead to unbounded priority 
inversion, where a higher priority thread can be blocked by a lower 
priority thread, or set of threads, for an unbounded duration of time. 
As a result, it becomes impossible to guarantee thread deadlines. 
Priority inversion can be bounded and minimized by the use of priority 
inheritance protocols. This allows thread deadlines to be guaranteed 
even in the presence of synchronization requirements.
<<<

The very first sentence indicates that a higher priority thread can be 
blocked by a lower priority thread. If your interpretation of the spec 
were correct, then such an instance would never occur. Since your 
scenario is using realtime threads, we can assume that the Priority 
Ceiling feature is present and you can use it if needed. ( 
http://www.opengroup.org/onlinepubs/000095399/xrat/xsh_chap02.html#tag_03_02_09_06 
Realtime Threads option group )

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/



* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-27  2:16                           ` David Schwartz
@ 2006-01-27  8:19                             ` Howard Chu
  2006-01-27 19:50                               ` David Schwartz
  0 siblings, 1 reply; 85+ messages in thread
From: Howard Chu @ 2006-01-27  8:19 UTC (permalink / raw)
  To: davids; +Cc: Linux Kernel Mailing List

David Schwartz wrote:
>>> 	Third, there's the ambiguity of the standard. It says the "scheduling
>>> policy" shall decide, not that the scheduler shall decide. If
>>> the policy is
>>> to make a conditional or delayed decision, that is still perfectly valid
>>> policy. "Whichever thread requests it first" is a valid
>>> scheduler policy.
>>>       

>> I am not debating what the policy can decide. Merely the set of choices
>> from which it may decide.
>>     
>
> 	Which is a restriction not found in the standard. A "policy" is a way of
> deciding, not a decision. Scheduling policy can be to let whoever asks first
> get it.
>   

If we just went with "whoever asks first" then clearly one of the 
blocked threads asked before the unlocker made its new request. You're 
arguing for my point, then.

Other ambiguities aside, one thing is clear - a decision is triggered by 
the unlock. What you seem to be arguing is the equivalent of saying that 
the decision is made based on the next lock operation. The spec doesn't 
say that mutex_lock is to behave this way. Why do you suppose that is? 
Perhaps you should raise this question with the Open Group.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/



* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-27  8:08                                     ` Howard Chu
@ 2006-01-27 19:25                                       ` Philipp Matthias Hahn
  2006-02-01 12:31                                       ` Nick Piggin
  1 sibling, 0 replies; 85+ messages in thread
From: Philipp Matthias Hahn @ 2006-01-27 19:25 UTC (permalink / raw)
  To: Howard Chu; +Cc: davids, Linux Kernel Mailing List

Hello!

On Fri, Jan 27, 2006 at 12:08:13AM -0800, Howard Chu wrote:
> >No, not immediately, I said "for a very long time". As in: A does not
> >need the exclusion provided by the lock for a very long time so it
> >drops it to avoid needless contention, then reacquires it when it finally
> >does need the lock.
> 
> OK. I think this is really a separate situation. Just to recap: A takes 
> lock, does some work, releases lock, a very long time passes, then A 
> takes the lock again. In the "time passes" part, that mutex could be 
> locked and unlocked any number of times by other threads and A won't 
> know or care. Particularly on an SMP machine, other threads that were 
> blocked on that mutex could do useful work in the interim without 
> impacting A's progress at all. So here, when A leaves the mutex unlocked 
> for a long time, it's desirable to give the mutex to one of the waiters 
> ASAP.

When you release a lock, you unblock at most one thread that is waiting
for that lock, and you put that released thread in the runnable state.
Then it's up to the scheduler what happens next:
- if you have multiple processors, you _can_ run the released thread on
  another processor, so both threads run.
- if you are single processor or don't want to schedule the released
  thread on a second cpu, you must decide to
  - either _continue running the releasing thread_ and let the released
    thread stay some more time in the runnable queue,
  - or _preempt the releasing thread_ to the runnable queue and make the
    released thread running.
If you have different priorities, your decision is easy: run the most
important thread.
But if you don't have priorities, you base your decision on other
metrics: since it takes more time to switch threads (save/restore
state) than to continue running the same thread, from a throughput
perspective you'll prefer not to change threads.

Similar thinking for yield(): You put the running thread back to the
runnable queue and choose one thread from it as the new running thread.
Note that you might choose the old thread as the new thread again;
with SCHED_OTHER this is perfectly fine if you decided to honor
throughput more than fairness.
It's different with SCHED_FIFO/RR, since there you are forced to put the
old thread at the end of your runnable queue and choose the new one from
the front of the queue, so all other threads with the same priority will
run before your yielding thread gets the CPU again.

Summary: yield() only makes sense with a SCHED_FIFO/RR policy, because
with SCHED_OTHER you know too little about the exact policy to make any
use of it.

BYtE
Philipp
-- 
      Dipl.-Inform. Philipp.Hahn@informatik.uni-oldenburg.de
      Abteilung Systemsoftware und verteilte Systeme, Fk. II
Carl von Ossietzky Universitaet Oldenburg, 26111 Oldenburg, Germany
    http://www.svs.informatik.uni-oldenburg.de/contact/pmhahn/
      Telefon: +49 441 798-2866    Telefax: +49 441 798-2756


* RE: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-27  8:19                             ` Howard Chu
@ 2006-01-27 19:50                               ` David Schwartz
  2006-01-27 20:13                                 ` Howard Chu
  0 siblings, 1 reply; 85+ messages in thread
From: David Schwartz @ 2006-01-27 19:50 UTC (permalink / raw)
  To: hyc; +Cc: Linux Kernel Mailing List


> If we just went with "whoever asks first" then clearly one of the
> blocked threads asked before the unlocker made its new request. You're
> arguing for my point, then.

	Huh? I am saying the policy can be anything at all. We could just go with
"whoever asks first", but we are not required to. And, in any event, I meant
whoever asks for the mutex first, not whoever blocks first. (Note that I
didn't say "whoever asked first" which would mean something totally
different.)

> Other ambiguities aside, one thing is clear - a decision is triggered by
> the unlock. What you seem to be arguing is the equivalent of saying that
> the decision is made based on the next lock operation.

	The spec says that the decision is triggered by a particular condition that
exists at the time of the unlock. That does not mean the decision is made at
the time of the unlock.

> The spec doesn't
> say that mutex_lock is to behave this way.

	We don't agree on what the specification says.

> Why do you suppose that is?

	Why do I suppose what? I find the specification perfectly clear and your
reading of it incredibly strained for the three reasons I stated.

> Perhaps you should raise this question with the Open Group.

	I don't think it's unclear.

	DS




* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-27 19:50                               ` David Schwartz
@ 2006-01-27 20:13                                 ` Howard Chu
  2006-01-27 21:05                                   ` David Schwartz
  0 siblings, 1 reply; 85+ messages in thread
From: Howard Chu @ 2006-01-27 20:13 UTC (permalink / raw)
  To: davids; +Cc: Linux Kernel Mailing List

David Schwartz wrote:
> 	We don't agree on what the specification says.
>
>   
>> Why do you suppose that is?
>>     
>
> 	Why do I suppose what? I find the specification perfectly clear and your
> reading of it incredibly strained for the three reasons I stated.
>   

Oddly enough, you said 
http://groups.google.com/group/comp.programming.threads/msg/28b58e91886a3602?hl=en&
"Unfortunately, it sounds reasonable"  so I can't lend credence to your 
stating that my reading is incredibly strained. The fact that 
LinuxThreads historically adhered to my reading of it lends more weight 
to my argument. The fact that people accepted this interpretation for so 
many years lends further weight. In light of this, it is your current 
interpretation that is incredibly strained, and I would say, broken.

You have essentially created a tri-state mutex. (Locked, unlocked, and 
sort-of-unlocked-but-really-reserved.) That may be a good and useful 
thing in its own right, but it should not be the default behavior.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/



* RE: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-27 20:13                                 ` Howard Chu
@ 2006-01-27 21:05                                   ` David Schwartz
  2006-01-27 21:23                                     ` Howard Chu
  0 siblings, 1 reply; 85+ messages in thread
From: David Schwartz @ 2006-01-27 21:05 UTC (permalink / raw)
  To: hyc; +Cc: Linux Kernel Mailing List


> David Schwartz wrote:
> > 	We don't agree on what the specification says.
> >
> >
> >> Why do you suppose that is?
> >>
> >
> > 	Why do I suppose what? I find the specification perfectly
> clear and your
> > reading of it incredibly strained for the three reasons I stated.
> >

> Oddly enough, you said
> http://groups.google.com/group/comp.programming.threads/msg/28b58e91886a3602?hl=en&
> "Unfortunately, it sounds reasonable"  so I can't lend credence to your
> stating that my reading is incredibly strained. The fact that
> LinuxThreads historically adhered to my reading of it lends more weight
> to my argument. The fact that people accepted this interpretation for so
> many years lends further weight. In light of this, it is your current
> interpretation that is incredibly strained, and I would say, broken.

	After collecting other opinions from comp.programming.threads, and being
unable to find other people who considered it reasonable, I've changed my
opinion. I was far too generous and deferential before.

	The more I consider it, the more absurd I find it. POSIX and SuS were so
careful not to dictate scheduler policy (or even hint at any notion of
fairness) that to argue that they intended to prohibit a thread from
releasing and reacquiring a mutex while another thread was blocked on it is
not tenable.

	You are essentially arguing that they intended to prohibit the most natural
and highest performing implementation. This is totally inconsistent with
POSIX's overall design intention to provide the lightest and
highest-performing primitives and allow users to add features with overhead
if they needed those features and could tolerate the overhead.

> You have essentially created a tri-state mutex. (Locked, unlocked, and
> sort-of-unlocked-but-really-reserved.) That may be a good and useful
> thing in its own right, but it should not be the default behavior.

	Huh?

	I'm suggesting the most natural implementation: When a thread tries to
acquire a mutex, it is blocked if a higher-priority thread is already
waiting for the mutex. When a thread releases a mutex, the highest-priority
thread waiting for the mutex is woken (but not necessarily guaranteed the
mutex, the mutex is simply marked available). When a thread tries to acquire
a mutex, it gets it unless a higher-priority thread is already registered as
wanting it. When a thread tries to acquire a mutex, it loops until it
acquires it and on each iteration blocks if the mutex is taken or a
higher-priority thread is waiting for it, otherwise it takes the mutex.

	A thread that is descheduled should never get priority over a thread that
is already running (unless a scheduling priority mechanism requires it).

	DS




* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-27 21:05                                   ` David Schwartz
@ 2006-01-27 21:23                                     ` Howard Chu
  2006-01-27 23:31                                       ` David Schwartz
  0 siblings, 1 reply; 85+ messages in thread
From: Howard Chu @ 2006-01-27 21:23 UTC (permalink / raw)
  To: davids; +Cc: Linux Kernel Mailing List

David Schwartz wrote:
> 	After collecting other opinions from comp.programming.threads, and being
> unable to find other people who considered it reasonable, I've changed my
> opinion. I was far too generous and deferential before.
>   

David, you specifically have been faced with this question before:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/2184ba84f911d9dd/a6e4f7cf13bbec2d#a6e4f7cf13bbec2d
and you didn't dispute the interpretation then. The wording for 
pthread_mutex_unlock hasn't changed between 2001 and now.

And here:
http://groups.google.com/group/comp.programming.threads/msg/89cc5d600e34e88a?hl=en&

If those statements were incorrect, I have a feeling someone would have 
corrected them at the time. Certainly you can attest to that.
http://groups.google.com/group/comp.programming.threads/msg/d5b2231ca57bb102?hl=en&

Clearly at this point there's nothing to be gained from pursuing this 
any further. The 2.6 kernel has been out for too long; if it were to be 
"fixed" again it would just make life ugly for another group of people, 
and I don't want to write the autoconf tests to detect the 
flavor-of-the-week. We've wasted enough time arguing futilely over it, 
I'll stop.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/



* RE: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-27 21:23                                     ` Howard Chu
@ 2006-01-27 23:31                                       ` David Schwartz
  0 siblings, 0 replies; 85+ messages in thread
From: David Schwartz @ 2006-01-27 23:31 UTC (permalink / raw)
  To: hyc; +Cc: Linux Kernel Mailing List


> David, you specifically have been faced with this question before:
> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/2184ba84f911d9dd/a6e4f7cf13bbec2d#a6e4f7cf13bbec2d
> and you didn't dispute the interpretation then. The wording for
> pthread_mutex_unlock hasn't changed between 2001 and now.

	This was a totally different question. This was about the implementation,
not the interpretation. You'll note that I objected to the implementation.

> And here:
>
> http://groups.google.com/group/comp.programming.threads/msg/89cc5d600e34e88a?hl=en&

	Again, I don't see that I commented on the interpretation. This was an
unfortunate missed opportunity. Kaz is incorrect here.

> If those statements were incorrect, I have a feeling someone would have
> corrected them at the time. Certainly you can attest to that.

	Obviously not, since they are incorrect and nobody did.

>
> http://groups.google.com/group/comp.programming.threads/msg/d5b2231ca57bb102?hl=en&

	Again, this had nothing whatsoever to do with whether the interpretation is
correct or not.

> Clearly at this point there's nothing to be gained from pursuing this
> any further. The 2.6 kernel has been out for too long; if it were to be
> "fixed" again it would just make life ugly for another group of people,
> and I don't want to write the autoconf tests to detect the
> flavor-of-the-week. We've wasted enough time arguing futilely over it,
> I'll stop.

	The problem is that this interpretation is simply incorrect and results in
maximally inefficient implementations.

	David Butenhof recently posted to comp.programming.threads and indicated
that he disagreed with this implementation. That's about as close to
authoritative as you're likely to get.

	POSIX had no intention to constrain the scheduler to compel inefficient
behavior. In fact, they went out of their way to create the lightest
possible primitives.

	DS




* Re: e100 oops on resume
  2006-01-26 19:02           ` Stefan Seyfried
  2006-01-26 19:09             ` Olaf Kirch
@ 2006-01-28 11:53             ` Mattia Dongili
  2006-01-28 19:53               ` Jesse Brandeburg
  1 sibling, 1 reply; 85+ messages in thread
From: Mattia Dongili @ 2006-01-28 11:53 UTC (permalink / raw)
  To: Stefan Seyfried
  Cc: Jesse Brandeburg, Olaf Kirch, Linux Kernel Mailing List, netdev

On Thu, Jan 26, 2006 at 08:02:37PM +0100, Stefan Seyfried wrote:
> On Wed, Jan 25, 2006 at 04:28:48PM -0800, Jesse Brandeburg wrote:
>  
> > Okay I reproduced the issue on 2.6.15.1 (with S1 sleep) and was able
> > to show that my patch that just removes e100_init_hw works okay for
> > me.  Let me know how it goes for you, I think this is a good fix.
> 
> worked for me in the Compaq Armada e500 and reportedly also fixed the
> SONY that originally uncovered it.

confirmed here too. The patch fixes S3 resume on this Sony (GR7/K)
running 2.6.16-rc1-mm3.

0000:02:08.0 Ethernet controller: Intel Corporation 82801CAM (ICH3) PRO/100 VE (LOM) Ethernet Controller (rev 41)
	Subsystem: Sony Corporation Vaio PCG-GR214EP/GR214MP/GR215MP/GR314MP/GR315MP
	Flags: bus master, medium devsel, latency 66, IRQ 9
	Memory at d0204000 (32-bit, non-prefetchable) [size=4K]
	I/O ports at 4000 [size=64]
	Capabilities: <available only to root>

thanks
-- 
mattia
:wq!

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: e100 oops on resume
  2006-01-28 11:53             ` Mattia Dongili
@ 2006-01-28 19:53               ` Jesse Brandeburg
  2006-02-07  6:57                 ` Jeff Garzik
  0 siblings, 1 reply; 85+ messages in thread
From: Jesse Brandeburg @ 2006-01-28 19:53 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Stefan Seyfried, Jesse Brandeburg, Olaf Kirch,
	Linux Kernel Mailing List, netdev, Jesse Brandeburg,
	Jeff Kirsher

[-- Attachment #1: Type: text/plain, Size: 737 bytes --]

On 1/28/06, Mattia Dongili <malattia@linux.it> wrote:
> On Thu, Jan 26, 2006 at 08:02:37PM +0100, Stefan Seyfried wrote:
> > On Wed, Jan 25, 2006 at 04:28:48PM -0800, Jesse Brandeburg wrote:
> >
> > > Okay I reproduced the issue on 2.6.15.1 (with S1 sleep) and was able
> > > to show that my patch that just removes e100_init_hw works okay for
> > > me.  Let me know how it goes for you, I think this is a good fix.
> >
> > worked for me in the Compaq Armada e500 and reportedly also fixed the
> > SONY that originally uncovered it.
>
> confirmed here too. The patch fixes S3 resume on this Sony (GR7/K)
> running 2.6.16-rc1-mm3.

excellent news! thanks for testing.

Jeff, could you please apply to 2.6.16-rcX

Jesse

[-- Attachment #2: e100_resume_no_init.patch --]
[-- Type: application/octet-stream, Size: 818 bytes --]

e100: remove init_hw call to fix panic

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

e100 seems to have had a long standing bug where e100_init_hw was being
called when it should not have been.  This caused a panic due to recent
changes that rely on correct set up in the driver, and more robust error
paths.
---

 drivers/net/e100.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -2752,8 +2752,6 @@ static int e100_resume(struct pci_dev *p
 	retval = pci_enable_wake(pdev, 0, 0);
 	if (retval)
 		DPRINTK(PROBE,ERR, "Error clearing wake events\n");
-	if(e100_hw_init(nic))
-		DPRINTK(HW, ERR, "e100_hw_init failed\n");
 
 	netif_device_attach(netdev);
 	if(netif_running(netdev))

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 19:57                       ` David Schwartz
  2006-01-26 20:27                         ` Howard Chu
@ 2006-01-30  8:28                         ` Helge Hafting
  1 sibling, 0 replies; 85+ messages in thread
From: Helge Hafting @ 2006-01-30  8:28 UTC (permalink / raw)
  To: davids; +Cc: hyc, Linux Kernel Mailing List

David Schwartz wrote:

>	Third, there's the ambiguity of the standard. It says the "scheduling
>policy" shall decide, not that the scheduler shall decide. If the policy is
>to make a conditional or delayed decision, that is still perfectly valid
>policy. "Whichever thread requests it first" is a valid scheduler policy.
>  
>
Sure.  And with a "whichever thread acquires it first" policy, then
it is obvious what happens when a mutex is released when someone
is blocked on it:  Whoever blocked on it first is then the one
who requested it first - that cannot change as the request was made
before the mutex even was released.  So then, the releasing thread has
no chance of getting the mutex back until the others have had a
go at it - no matter what threads actually get scheduled.

Helge Hafting


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 10:38                 ` Nikita Danilov
@ 2006-01-30  8:35                   ` Helge Hafting
  2006-01-30 11:13                     ` Nikita Danilov
  2006-01-31 23:18                     ` David Schwartz
  0 siblings, 2 replies; 85+ messages in thread
From: Helge Hafting @ 2006-01-30  8:35 UTC (permalink / raw)
  To: Nikita Danilov
  Cc: Howard Chu, Christopher Friesen, Linux Kernel Mailing List, hancockr

Nikita Danilov wrote:

>Howard Chu writes:
>
>[...]
>
> > 
> > A straightforward reading of the language here says the decision happens 
> > "when pthread_mutex_unlock() is called" and not at any later time. There 
> > is nothing here to support your interpretation.
> > >
> > > I think the intention of the wording is that for deterministic policies,
> > > it is clear that the waiting threads are actually worken and reevaluated
> > > for scheduling. In the case of SCHED_OTHER, it means basically nothing,
> > > considering the scheduling policy is arbitrary.
> > >
> > > Clearly the point is that one of the waiting threads is woken and gets 
> > the mutex, and it doesn't matter which thread is chosen. I.e., whatever 
>
>Note that this behavior directly leads to "convoy formation": if that
>woken thread T0 does not immediately run (e.g., because there are higher
>priority threads) but still already owns the mutex, then other running
>threads contending for this mutex will block waiting for T0, forming a
>convoy.
>
I just wonder - what is the problem with this convoy formation?
It can only happen when the cpu is overloaded, and in that case
someone has to wait.  In this case, the mutex waiters. 

Aggressively handing the cpu to whoever holds a mutex will mean the
mutexes are free more of the time - but it will *not* mean less waiting in
the system.  You just change who waits.

Helge Hafting


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-26 17:34                             ` linux-os (Dick Johnson)
  2006-01-26 19:00                               ` Nick Piggin
@ 2006-01-30  8:44                               ` Helge Hafting
  2006-01-30  8:50                                 ` Howard Chu
  2006-01-30 13:28                                 ` linux-os (Dick Johnson)
  1 sibling, 2 replies; 85+ messages in thread
From: Helge Hafting @ 2006-01-30  8:44 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Howard Chu, Nick Piggin, Lee Revell, Christopher Friesen,
	Linux Kernel Mailing List, hancockr

linux-os (Dick Johnson) wrote:

>To fix the current problem, you can substitute usleep(0); It will
>give the CPU to somebody if it's computable, then give it back to
>you. It seems to work in every case that sched_yield() has
>mucked up (perhaps 20 to 30 here).
>  
>
Isn't that dangerous?  Someday, someone working on linux (or some
other unixish os) might come up with an usleep implementation where
usleep(0) just returns and becomes a no-op.  Which probably is ok
with the usleep spec - it did sleep for zero time . . .
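
[Editorial sketch, not part of the original mail: what the portable
alternatives look like, with illustrative wrapper names. sched_yield()
states the intent explicitly, and a non-zero sleep actually blocks, so
neither depends on usleep(0) happening to have a side effect - though
under SCHED_OTHER, POSIX still leaves the effect of a yield largely
unspecified, which is the subject of this whole thread.]

```c
#include <sched.h>
#include <time.h>
#include <assert.h>

/* Explicit, standard yield: what usleep(0) is being used to fake.
 * Under SCHED_OTHER the effect is largely unspecified, which is
 * exactly why none of these calls can carry program correctness. */
int give_up_cpu(void)
{
	return sched_yield();
}

/* A non-zero sleep, rounded up to the clock granularity, really
 * blocks, so the scheduler must pick someone else if anyone is
 * runnable - unlike usleep(0), which may legally be a no-op. */
int sleep_a_tick(void)
{
	struct timespec ts = { 0, 1000 };	/* 1 microsecond */
	return nanosleep(&ts, NULL);
}
```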

Helge Hafting

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-30  8:44                               ` Helge Hafting
@ 2006-01-30  8:50                                 ` Howard Chu
  2006-01-30 15:33                                   ` Kyle Moffett
  2006-01-30 13:28                                 ` linux-os (Dick Johnson)
  1 sibling, 1 reply; 85+ messages in thread
From: Howard Chu @ 2006-01-30  8:50 UTC (permalink / raw)
  To: Helge Hafting
  Cc: linux-os (Dick Johnson),
	Nick Piggin, Lee Revell, Christopher Friesen,
	Linux Kernel Mailing List, hancockr

Helge Hafting wrote:
> linux-os (Dick Johnson) wrote:
>
>> To fix the current problem, you can substitute usleep(0); It will
>> give the CPU to somebody if it's computable, then give it back to
>> you. It seems to work in every case that sched_yield() has
>> mucked up (perhaps 20 to 30 here).
>>  
>>
> Isn't that dangerous?  Someday, someone working on linux (or some
> other unixish os) might come up with an usleep implementation where
> usleep(0) just returns and becomes a no-op.  Which probably is ok
> with the usleep spec - it did sleep for zero time . . .
>
We actually experimented with usleep(0) and select(...) with a zeroed 
timeval. Both of these approaches performed worse than just using 
sched_yield(), depending on the system and some other conditions. 
Dual-core AMD64 vs single-CPU had quite different behaviors. Also, if 
the slapd main event loop was using epoll() instead of select(), the 
selects used for yields slowed down by a couple orders of magnitude. (A 
test that normally took ~30 seconds took as long as 45 minutes in one 
case, it was quite erratic.)

It turned out that most of those yields were leftovers inherited from 
when we only supported non-preemptive threads, and simply deleting them 
was the best approach.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-30  8:35                   ` Helge Hafting
@ 2006-01-30 11:13                     ` Nikita Danilov
  2006-01-31 23:18                     ` David Schwartz
  1 sibling, 0 replies; 85+ messages in thread
From: Nikita Danilov @ 2006-01-30 11:13 UTC (permalink / raw)
  To: Helge Hafting
  Cc: Nikita Danilov, Howard Chu, Christopher Friesen,
	Linux Kernel Mailing List, hancockr

Helge Hafting writes:
 > Nikita Danilov wrote:
 > 
 > >Howard Chu writes:
 > >
 > >[...]
 > >
 > > > 
 > > > A straightforward reading of the language here says the decision happens 
 > > > "when pthread_mutex_unlock() is called" and not at any later time. There 
 > > > is nothing here to support your interpretation.
 > > > >
 > > > > I think the intention of the wording is that for deterministic policies,
 > > > > it is clear that the waiting threads are actually woken and reevaluated
 > > > > for scheduling. In the case of SCHED_OTHER, it means basically nothing,
 > > > > considering the scheduling policy is arbitrary.
 > > > >
 > > > Clearly the point is that one of the waiting threads is woken and gets 
 > > > the mutex, and it doesn't matter which thread is chosen. I.e., whatever 
 > >
 > >Note that this behavior directly leads to "convoy formation": if that
 > >woken thread T0 does not immediately run (e.g., because there are higher
 > >priority threads) but still already owns the mutex, then other running
 > >threads contending for this mutex will block waiting for T0, forming a
 > >convoy.
 > >
 > I just wonder - what is the problem with this convoy formation?
 > It can only happen when the cpu is overloaded, and in that case
 > someone has to wait.  In this case, the mutex waiters. 

The obvious problem is an extra context switch: if the mutex is left
unlocked, then the first thread (say, T0) that tries to acquire it
succeeds and continues to run, whereas if the mutex is directly handed to
a runnable (but not running) thread T1, T0 has to block until T1 runs.

What's worse, convoys tend to grow once formed.

 > 
 > Aggressively handing the cpu to whoever holds a mutex will mean the
 > mutexes are free more of the time - but it will *not* mean less waiting in
 > the system.  You just change who waits.
 > 
 > Helge Hafting

Nikita.


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-30  8:44                               ` Helge Hafting
  2006-01-30  8:50                                 ` Howard Chu
@ 2006-01-30 13:28                                 ` linux-os (Dick Johnson)
  2006-01-30 15:15                                   ` Helge Hafting
  1 sibling, 1 reply; 85+ messages in thread
From: linux-os (Dick Johnson) @ 2006-01-30 13:28 UTC (permalink / raw)
  To: Helge Hafting
  Cc: Howard Chu, Nick Piggin, Lee Revell, Christopher Friesen,
	Linux Kernel Mailing List, hancockr


On Mon, 30 Jan 2006, Helge Hafting wrote:

> linux-os (Dick Johnson) wrote:
>
>> To fix the current problem, you can substitute usleep(0); It will
>> give the CPU to somebody if it's computable, then give it back to
>> you. It seems to work in every case that sched_yield() has
>> mucked up (perhaps 20 to 30 here).
>>
>>
> Isn't that dangerous?  Someday, someone working on linux (or some
> other unixish os) might come up with an usleep implementation where
> usleep(0) just returns and becomes a no-op.  Which probably is ok
> with the usleep spec - it did sleep for zero time . . .
>
> Helge Hafting

Dangerous?? You have a product that needs to ship. You can make
it work by adding a hack. You add a hack. I don't see danger at
all. I see getting the management off the back of the software
engineers so that they can fix the code. Further, you __test__ the
stuff before you ship. If usleep(0) just spins, then you use
usleep(1).

Also, I don't think any Engineer would use threads for anything
that could be potentially dangerous anyway. You create step-by-step
ordered procedures with explicit state-machines for things that
really need to happen as written. You use threads for things that
must occur, but you don't give a damn when they occur (like updating
a window on the screen or sorting keys in a database).


Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.66 BogoMips).
Warning : 98.36% of all statistics are fiction.
.

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-30 13:28                                 ` linux-os (Dick Johnson)
@ 2006-01-30 15:15                                   ` Helge Hafting
  0 siblings, 0 replies; 85+ messages in thread
From: Helge Hafting @ 2006-01-30 15:15 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Howard Chu, Nick Piggin, Lee Revell, Christopher Friesen,
	Linux Kernel Mailing List, hancockr

linux-os (Dick Johnson) wrote:

>On Mon, 30 Jan 2006, Helge Hafting wrote:
>
>  
>
>>linux-os (Dick Johnson) wrote:
>>
>>    
>>
>>>To fix the current problem, you can substitute usleep(0); It will
>>>give the CPU to somebody if it's computable, then give it back to
>>>you. It seems to work in every case that sched_yield() has
>>>mucked up (perhaps 20 to 30 here).
>>>
>>>
>>>      
>>>
>>Isn't that dangerous?  Someday, someone working on linux (or some
>>other unixish os) might come up with an usleep implementation where
>>usleep(0) just returns and becomes a no-op.  Which probably is ok
>>with the usleep spec - it did sleep for zero time . . .
>>    
>>
>
>Dangerous?? You have a product that needs to ship. You can make
>it work by adding a hack. You add a hack. I don't see danger at
>all. I see getting the management off the back of the software
>engineers so that they can fix the code. Further, you __test__ the
>stuff before you ship. If usleep(0) just spins, then you use
>usleep(1).
>  
>
The dangerous part was that usleep(0) works as a "yield"
today, as your testing will confirm before you ship the product.
But it may break next year if someone changes this part of
the kernel.  Then your customer suddenly has a broken product.

Helge Hafting

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-30  8:50                                 ` Howard Chu
@ 2006-01-30 15:33                                   ` Kyle Moffett
  0 siblings, 0 replies; 85+ messages in thread
From: Kyle Moffett @ 2006-01-30 15:33 UTC (permalink / raw)
  To: Howard Chu
  Cc: Helge Hafting, linux-os (Dick Johnson),
	Nick Piggin, Lee Revell, Christopher Friesen,
	Linux Kernel Mailing List, hancockr

On Jan 30, 2006, at 03:50, Howard Chu wrote:
> Helge Hafting wrote:
>> linux-os (Dick Johnson) wrote:
>>> To fix the current problem, you can substitute usleep(0); It will  
>>> give the CPU to somebody if it's computable, then give it back to  
>>> you. It seems to work in every case that sched_yield() has mucked  
>>> up (perhaps 20 to 30 here).
>>
>> Isn't that dangerous?  Someday, someone working on linux (or some  
>> other unixish os) might come up with an usleep implementation  
>> where usleep(0) just returns and becomes a no-op.  Which probably  
>> is ok with the usleep spec - it did sleep for zero time . . .
>
> We actually experimented with usleep(0) and select(...) with a  
> zeroed timeval. Both of these approaches performed worse than just  
> using sched_yield(), depending on the system and some other  
> conditions. Dual-core AMD64 vs single-CPU had quite different  
> behaviors. Also, if the slapd main event loop was using epoll()  
> instead of select(), the select's used for yields slowed down by a  
> couple orders of magnitude. (A test that normally took ~30 seconds  
> took as long as 45 minutes in one case, it was quite erratic.)
>
> It turned out that most of those yield's were leftovers inherited  
> from when we only supported non-preemptive threads, and simply  
> deleting them was the best approach.

I would argue that in a non realtime environment sched_yield() is not  
useful at all.  When you want to wait for another process, you wait  
explicitly for that process using one of the various POSIX-defined  
methods, such as mutexes, condition variables, etc.  There are very  
clearly and thoroughly defined ways to wait for other processes to  
complete work, why rely on usleep(0) giving CPU to some other task  
when you can explicitly tell the scheduler "I am waiting for task foo  
to release this mutex" or "I can't run until somebody signals this  
condition variable".
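
[Editorial sketch of the explicit-wait pattern being described; names
are illustrative, not OpenLDAP code. The consumer sleeps on a condition
variable instead of spinning in a yield loop, so the kernel knows
exactly what it is waiting for, and the producer wakes it directly.]

```c
#include <pthread.h>
#include <assert.h>

/* Illustrative names; not OpenLDAP code. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int work_ready;

/* The pattern being argued against:
 *	while (!work_ready)
 *		sched_yield();	-- spins, and its effect depends on policy
 */

static void *consumer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	while (!work_ready)			/* loop guards against spurious wakeups */
		pthread_cond_wait(&cond, &lock);	/* sleeps without spinning */
	pthread_mutex_unlock(&lock);
	return (void *)1L;
}

static void producer_publish(void)
{
	pthread_mutex_lock(&lock);
	work_ready = 1;
	pthread_cond_signal(&cond);	/* wake exactly the waiter we mean */
	pthread_mutex_unlock(&lock);
}

int condvar_demo(void)
{
	pthread_t t;
	void *ret = NULL;
	work_ready = 0;
	pthread_create(&t, NULL, consumer, NULL);
	producer_publish();
	pthread_join(t, &ret);
	return ret == (void *)1L;
}
```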

Cheers,
Kyle Moffett

--
Unix was not designed to stop people from doing stupid things,  
because that would also stop them from doing clever things.
   -- Doug Gwyn



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Can I do a regular read to simulate prefetch instruction?
       [not found]             ` <4807377b0601271404w6dbfcff6s4de1c3f785dded9f@mail.gmail.com>
@ 2006-01-30 17:25               ` John Smith
  0 siblings, 0 replies; 85+ messages in thread
From: John Smith @ 2006-01-30 17:25 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

Hi,

I found that some network card drivers (e.g. the e1000 driver) use the
prefetch instruction to reduce memory access latency and speed up data
operations. My question is: suppose we want to pre-read an skb buffer into
the cache, what is the difference between the following two methods, i.e.
what is the difference between using prefetch and using a regular read
operation?
1. use the prefetch instruction to trigger a pre-fetch of the skb address,
    e.g. prefetch(skb);
2. use an assignment statement to trigger a pre-fetch of the skb address,
    e.g. skb1 = skb;

I was told the data will be prefetched into a so-called prefetch queue
only when using the prefetch instruction. Is this true?
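
[Editorial sketch of the distinction, using GCC's __builtin_prefetch
for illustration; the kernel's prefetch() wraps the arch-specific
instruction, and treating __builtin_prefetch as equivalent here is an
assumption. Two key points: skb1 = skb only copies the pointer value
and touches no skb data at all, so it prefetches nothing; and a real
dereference stalls until the data arrives (and a dead, non-volatile
read is simply deleted by the compiler), whereas a prefetch instruction
starts the cache fill without blocking and is not allowed to fault.
Whether the line lands in a dedicated prefetch queue or the normal
cache hierarchy is CPU-specific.]

```c
#include <stddef.h>
#include <assert.h>

/* Walk an array, hinting the line 16 elements ahead.  The hint starts
 * a cache fill without blocking and cannot fault; the s += a[i] loads
 * are the ones that can stall. */
long sum_with_prefetch(const long *a, size_t n)
{
	long s = 0;
	for (size_t i = 0; i < n; i++) {
		if (i + 16 < n)
			__builtin_prefetch(&a[i + 16], 0, 3);	/* read, high locality */
		s += a[i];
	}
	return s;
}

/* "Prefetching" with a real load: the volatile forces the compiler to
 * keep the access (a plain dead read would be deleted), but the CPU
 * must wait for the data, and the load can fault on a bad pointer,
 * which a prefetch instruction never does. */
long touch_by_read(const long *a)
{
	return *(const volatile long *)a;
}
```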

Thanks,

John



^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-30  8:35                   ` Helge Hafting
  2006-01-30 11:13                     ` Nikita Danilov
@ 2006-01-31 23:18                     ` David Schwartz
  1 sibling, 0 replies; 85+ messages in thread
From: David Schwartz @ 2006-01-31 23:18 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Howard Chu, Linux Kernel Mailing List


> I just wonder - what is the problem with this convoy formation?
> It can only happen when the cpu is overloaded, and in that case
> someone has to wait.  In this case, the mutex waiters. 

	The problem is that you need to become more efficient as load increases, not less. If you get less efficient as load increases, you can get into a situation where, even though you have an amount of load you can handle, you will never catch up on the load that backed up before.
 
> Aggressively handing the cpu to whoever holds a mutex will mean the
> mutexes are free more of the time - but it will *not* mean less waiting in
> tghe system.  You just changes who waits.

	It will mean fewer context switches and more effective use of caches as load increases. Even a very small amount of "gets more efficient as load goes up" can mean the difference between a system that handles load spikes smoothly (with a temporary reduction in responsiveness) and a system that backs up in a load spike and never recovers (with a permanently increasing reduction in responsiveness even with load that's normally tolerable).

	As load goes up, you need your threads to use more of their timeslice. This means not descheduling a running thread unless it is unavoidable.

	DS



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
  2006-01-27  8:08                                     ` Howard Chu
  2006-01-27 19:25                                       ` Philipp Matthias Hahn
@ 2006-02-01 12:31                                       ` Nick Piggin
  1 sibling, 0 replies; 85+ messages in thread
From: Nick Piggin @ 2006-02-01 12:31 UTC (permalink / raw)
  To: Howard Chu; +Cc: davids, Linux Kernel Mailing List

Howard Chu wrote:
> Nick Piggin wrote:
>> Howard Chu wrote:
>>
>>>
>>> And again in this case, A should not be immediately reacquiring the 
>>> lock if it doesn't actually need it.
>>>
>>
>> No, not immediately, I said "for a very long time". As in: A does not
>> need the exclusion provided by the lock for a very long time so it
>> drops it to avoid needless contention, then reaquires it when it finally
>> does need the lock.
> 
> 
> OK. I think this is really a separate situation. Just to recap: A takes 
> lock, does some work, releases lock, a very long time passes, then A 
> takes the lock again. In the "time passes" part, that mutex could be 
> locked and unlocked any number of times by other threads and A won't 
> know or care. Particularly on an SMP machine, other threads that were 
> blocked on that mutex could do useful work in the interim without 
> impacting A's progress at all. So here, when A leaves the mutex unlocked 
> for a long time, it's desirable to give the mutex to one of the waiters 
> ASAP.
> 

But how do you quantify "a long time"? And what happens if process A is
at a very high priority, such that nothing else is allowed to run?

>> Just accept that my described scenario is legitimate then consider it in
>> isolation rather than getting caught up in the superfluous details of how
>> such a situation might come about.
> 
> 
> OK. I'm not trying to be difficult here. In much of life, context is 
> everything; very little can be understood in isolation.
> 

OK, but other valid examples were offered up - lock inversion avoidance,
and externally driven systems (ie. where it is not known which lock will
be taken next).

> Back to the scenario:
> 
>> A realtime system with tasks A and B, A has an RT scheduling priority of
>> 1, and B is 2. A and B are both runnable, so A is running. A takes a 
>> mutex
>> then sleeps, B runs and ends up blocked on the mutex. A wakes up and at
>> some point it drops the mutex and then tries to take it again.
>>
>> What happens?
> 
> 
> As I understand the spec, A must block because B has acquired the mutex. 
> Once again, the SUS discussion of priority inheritance would never need 
> to have been written if this were not the case:
> 
>  >>>
> In a priority-driven environment, a direct use of traditional primitives 
> like mutexes and condition variables can lead to unbounded priority 
> inversion, where a higher priority thread can be blocked by a lower 
> priority thread, or set of threads, for an unbounded duration of time. 
> As a result, it becomes impossible to guarantee thread deadlines. 
> Priority inversion can be bounded and minimized by the use of priority 
> inheritance protocols. This allows thread deadlines to be guaranteed 
> even in the presence of synchronization requirements.
> <<<
> 
> The very first sentence indicates that a higher priority thread can be 
> blocked by a lower priority thread. If your interpretation of the spec 
> were correct, then such an instance would never occur. Since your 

Wrong. It will obviously occur if the lower priority process is able
to take a lock before a higher priority process.

The situation will not exist in "the scenario" though, if we follow
my reading of the spec, because *the scheduler* determines the next
process to gain the mutex. This makes perfect sense to me.

> scenario is using realtime threads, then we can assume that the Priority 
> Ceiling feature is present and you can use it if needed. ( 
> http://www.opengroup.org/onlinepubs/000095399/xrat/xsh_chap02.html#tag_03_02_09_06 
> Realtime Threads option group )
> 

Any kind of priority boost / inheritance like this is orthogonal to
the issue. They still do not prevent B from acquiring the mutex and
thereby blocking the execution of the higher priority A. I think this
is against the spirit of the spec, especially the part where it says
*the scheduler* will choose which process to gain the lock.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: e100 oops on resume
  2006-01-28 19:53               ` Jesse Brandeburg
@ 2006-02-07  6:57                 ` Jeff Garzik
  0 siblings, 0 replies; 85+ messages in thread
From: Jeff Garzik @ 2006-02-07  6:57 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Stefan Seyfried, Olaf Kirch, Linux Kernel Mailing List, netdev,
	Jesse Brandeburg, Jeff Kirsher

Jesse Brandeburg wrote:
> On 1/28/06, Mattia Dongili <malattia@linux.it> wrote:
> 
>>On Thu, Jan 26, 2006 at 08:02:37PM +0100, Stefan Seyfried wrote:
>>
>>>On Wed, Jan 25, 2006 at 04:28:48PM -0800, Jesse Brandeburg wrote:
>>>
>>>
>>>>Okay I reproduced the issue on 2.6.15.1 (with S1 sleep) and was able
>>>>to show that my patch that just removes e100_init_hw works okay for
>>>>me.  Let me know how it goes for you, I think this is a good fix.
>>>
>>>worked for me in the Compaq Armada e500 and reportedly also fixed the
>>>SONY that originally uncovered it.
>>
>>confirmed here too. The patch fixes S3 resume on this Sony (GR7/K)
>>running 2.6.16-rc1-mm3.
> 
> 
> excellent news! thanks for testing.
> 
> Jeff, could you please apply to 2.6.16-rcX
> 
> Jesse

SIGH.  In your last patch submission you had it right, but Intel has yet 
again regressed in patch submission form.

Your fixes will be expedited if they can be applied by script, and then 
quickly whisked upstream to Linus/Andrew.  This one had to be applied by 
hand (so yes, it's applied) for several reasons:

* Unreviewable in mail reader, due to MIME type application/octet-stream.

* In general, never use MIME (attachments), they decrease the audience 
that can easily review your patch.

* Your patch's description and signed-off-by were buried inside the 
octet-stream attachment.

* Please review http://linux.yyz.us/patch-format.html  (I probably 
should add MIME admonitions to that)

	Jeff



^ permalink raw reply	[flat|nested] 85+ messages in thread

end of thread, other threads:[~2006-02-07  6:57 UTC | newest]

Thread overview: 85+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-24 22:59 e100 oops on resume Stefan Seyfried
2006-01-24 23:21 ` Mattia Dongili
2006-01-25  9:02   ` Olaf Kirch
2006-01-25 12:11     ` Olaf Kirch
2006-01-25 13:51       ` sched_yield() makes OpenLDAP slow Howard Chu
2006-01-25 14:38         ` Robert Hancock
2006-01-25 17:49         ` Christopher Friesen
2006-01-25 18:26           ` pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow) Howard Chu
2006-01-25 18:59             ` Nick Piggin
2006-01-25 19:32               ` Howard Chu
2006-01-26  8:51                 ` Nick Piggin
2006-01-26 14:15                   ` Kyle Moffett
2006-01-26 14:43                     ` Howard Chu
2006-01-26 19:57                       ` David Schwartz
2006-01-26 20:27                         ` Howard Chu
2006-01-26 20:46                           ` Nick Piggin
2006-01-26 21:32                             ` Howard Chu
2006-01-26 21:41                               ` Nick Piggin
2006-01-26 21:56                                 ` Howard Chu
2006-01-26 22:24                                   ` Nick Piggin
2006-01-27  8:08                                     ` Howard Chu
2006-01-27 19:25                                       ` Philipp Matthias Hahn
2006-02-01 12:31                                       ` Nick Piggin
2006-01-27  4:27                                   ` Steven Rostedt
2006-01-26 21:58                               ` Christopher Friesen
2006-01-27  4:13                               ` Steven Rostedt
2006-01-27  2:16                           ` David Schwartz
2006-01-27  8:19                             ` Howard Chu
2006-01-27 19:50                               ` David Schwartz
2006-01-27 20:13                                 ` Howard Chu
2006-01-27 21:05                                   ` David Schwartz
2006-01-27 21:23                                     ` Howard Chu
2006-01-27 23:31                                       ` David Schwartz
2006-01-30  8:28                         ` Helge Hafting
2006-01-26 10:38                 ` Nikita Danilov
2006-01-30  8:35                   ` Helge Hafting
2006-01-30 11:13                     ` Nikita Danilov
2006-01-31 23:18                     ` David Schwartz
2006-01-25 21:06             ` Lee Revell
2006-01-25 22:14               ` Howard Chu
2006-01-26  0:16                 ` Robert Hancock
2006-01-26  0:49                   ` Howard Chu
2006-01-26  1:04                     ` Lee Revell
2006-01-26  1:31                       ` Howard Chu
2006-01-26  2:05                 ` David Schwartz
2006-01-26  2:48                   ` Mark Lord
2006-01-26  3:30                     ` David Schwartz
2006-01-26  3:49                       ` Samuel Masham
2006-01-26  4:02                         ` Samuel Masham
2006-01-26  4:53                           ` Lee Revell
2006-01-26  6:14                             ` Samuel Masham
2006-01-26  8:54                 ` Nick Piggin
2006-01-26 14:24                   ` Howard Chu
2006-01-26 14:54                     ` Nick Piggin
2006-01-26 15:23                       ` Howard Chu
2006-01-26 15:51                         ` Nick Piggin
2006-01-26 16:44                           ` Howard Chu
2006-01-26 17:34                             ` linux-os (Dick Johnson)
2006-01-26 19:00                               ` Nick Piggin
2006-01-26 19:14                                 ` linux-os (Dick Johnson)
2006-01-26 21:12                                   ` Nick Piggin
2006-01-26 21:31                                     ` linux-os (Dick Johnson)
2006-01-27  7:06                                       ` Valdis.Kletnieks
2006-01-30  8:44                               ` Helge Hafting
2006-01-30  8:50                                 ` Howard Chu
2006-01-30 15:33                                   ` Kyle Moffett
2006-01-30 13:28                                 ` linux-os (Dick Johnson)
2006-01-30 15:15                                   ` Helge Hafting
2006-01-26 10:44                 ` Nikita Danilov
2006-01-26  0:08             ` Robert Hancock
2006-01-26  1:07         ` sched_yield() makes OpenLDAP slow David Schwartz
2006-01-26  8:30           ` Helge Hafting
2006-01-26  9:01             ` Nick Piggin
2006-01-26 10:50             ` Nikita Danilov
2006-01-25 19:37       ` e100 oops on resume Jesse Brandeburg
2006-01-25 20:14         ` Olaf Kirch
2006-01-25 22:28           ` Jesse Brandeburg
2006-01-26  0:28         ` Jesse Brandeburg
2006-01-26  9:32           ` Pavel Machek
2006-01-26 19:02           ` Stefan Seyfried
2006-01-26 19:09             ` Olaf Kirch
2006-01-28 11:53             ` Mattia Dongili
2006-01-28 19:53               ` Jesse Brandeburg
2006-02-07  6:57                 ` Jeff Garzik
     [not found]           ` <BAY108-DAV111F6EF46F6682FEECCC1593140@phx.gbl>
     [not found]             ` <4807377b0601271404w6dbfcff6s4de1c3f785dded9f@mail.gmail.com>
2006-01-30 17:25               ` Can I do a regular read to simulate prefetch instruction? John Smith

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).