linux-kernel.vger.kernel.org archive mirror
* e100 oops on resume
@ 2006-01-24 22:59 Stefan Seyfried
  2006-01-24 23:21 ` Mattia Dongili
  0 siblings, 1 reply; 88+ messages in thread
From: Stefan Seyfried @ 2006-01-24 22:59 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: netdev

Hi,
since 2.6.16rc1-git3, e100 dies on resume (regardless of whether it is from
disk, RAM or runtime power management). Unfortunately I only have a bad photo
of the oops right now; it is available from
https://bugzilla.novell.com/attachment.cgi?id=64761&action=view
I have reproduced this on a second e100 machine and can get a serial
console log from this machine tomorrow if needed.
It did resume fine with 2.6.15-git12
-- 
Stefan Seyfried                  \ "I didn't want to write for pay. I
QA / R&D Team Mobile Devices      \ wanted to be paid for what I write."
SUSE LINUX Products GmbH, Nürnberg \                    -- Leonard Cohen

* RE: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
@ 2006-01-30 22:01 linux
  0 siblings, 0 replies; 88+ messages in thread
From: linux @ 2006-01-30 22:01 UTC (permalink / raw)
  To: davids; +Cc: linux-kernel

> 	It can tell the difference between the other thread getting
> the mutex first and it getting the mutex first. But it cannot tell the
> difference between an implementation that puts random sleeps before calls
> to 'pthread_mutex_lock' and an implementation that has the allegedly
> non-compliant behavior. That makes the behavior compliant under the
> 'as-if' rule.
> 
> 	If you don't believe me, try to write a program that prints
> 'non-compliant' on a system that has the alleged non-compliance but is
> guaranteed not to do so on any compliant system. It cannot be done.
> 
> 	In order to claim the alleged non-compliance, you would have to
> know that a thread waiting for a mutex did not get it. But there is no
> possible way you can know that another thread is waiting for the mutex
> (as opposed to being about to wait for it). So you can never detect the
> claimed non-compliance, so it's not non-compliance.

An excellent point, but the existence of pthread_mutex_trylock()
invalidates it.

To be very specific, the following will do the job:

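#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

volatile unsigned shared_variable;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Print the message and abort the whole process. */
static void
fatal(const char *msg)
{
	fprintf(stderr, "%s\n", msg);
	abort();
}

/* Thread entry point, with the signature pthread_create() expects. */
void *
thread_function(void *arg)
{
	unsigned prev_value = shared_variable;

	(void)arg;
	for (;;) {
		unsigned cur_value, delta;
		if (pthread_mutex_trylock(&lock) == 0) {
			cur_value = ++shared_variable;
			pthread_mutex_unlock(&lock);
			delta = cur_value - prev_value;
		} else {
			/* Another thread is holding the lock. */
			pthread_mutex_lock(&lock);
			cur_value = ++shared_variable;
			pthread_mutex_unlock(&lock);
			delta = cur_value - prev_value;
			if (delta == 1)
				fatal("non-compliant");
		}
		/* Assuming we don't wrap */
		if (delta == 0)
			fatal("buggy as a roach motel");
		/* Measure the next delta from our own latest increment. */
		prev_value = cur_value;
	}
}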

You need to run more than one instance of the thread_function()
to have a chance of triggering the non-compliant message, of course.
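
[A minimal sketch, not part of the original message, of the one property
the argument leans on: pthread_mutex_trylock() lets a thread observe that
a mutex is currently held without blocking on it.  The helper name
mutex_was_busy() is hypothetical.  Note that it only reports that some
thread holds the lock at that instant; it says nothing about whether any
other thread is waiting for it.]

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical helper: returns 1 if the mutex was observed to be held,
 * 0 if it was free (in which case we briefly took and released it).
 */
static int
mutex_was_busy(pthread_mutex_t *m)
{
	int rc;

	assert(m != NULL);
	rc = pthread_mutex_trylock(m);
	if (rc == 0) {
		/* We acquired it, so nobody held it; give it back. */
		pthread_mutex_unlock(m);
		return 0;
	}
	/* EBUSY is the documented "already locked" result. */
	return rc == EBUSY;
}
```

The answer is stale as soon as it is returned, which is why the example
above falls back to a blocking pthread_mutex_lock() in its else branch.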

* RE: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
@ 2006-01-30 23:37 linux
  0 siblings, 0 replies; 88+ messages in thread
From: linux @ 2006-01-30 23:37 UTC (permalink / raw)
  To: davids; +Cc: linux-kernel

Thinking some more on my example, for SCHED_OTHER threads, it is possible
to define the problem away by making pthread_mutex_trylock behave
compatibly with pthread_mutex_lock.  That is, threads in pthread_mutex_lock()
are actually descheduled just before waiting for the lock (SCHED_OTHER is
allowed to do that), and when the lock becomes available, the scheduler
then decides who to run through the acquisition code.

As long as pthread_mutex_trylock() succeeds in such a case, some may call
it weird, but it's conformant, and the performance arguments for the
"unfair" case might easily win the day.

This is assuming that SCHED_OTHER can block a process for an arbitrary
time for no good reason.  Otherwise, if the lock holder is waiting for
device I/O and no other processes are competing for the CPU, perhaps
blocking on the edge like that for an unbounded time is illegal.


However, if you have priorities and can't redefine locking using creative
scheduling policies, it's less clear.  If I have a couple of real-time
tasks, I can't decide arbitrarily to run one in lieu of the other.
For example, suppose that without priority inheritance, you have three
tasks, A (highest priority), B, and C (lowest).

There are three locks.  Initially, A holds lock 1 and C holds lock 2.
Then A tries to acquire lock 2.  A blocks, so B runs until it blocks trying
to get lock 1.  Then C runs and drops lock 2.  A gets it, then drops lock 1
and tries to re-acquire it.

It seems to me that the Posix spec mandates that B gets lock 1 (and A
must block) before A can re-acquire it.

(This can also be done with priority inheritance, although it's a bit
different.  A version that works whether priority inheritance is
implemented or not is probably possible, too.)

I'm not saying that this is a good thing, but it's distinguishable, and I
don't have any language-lawyer way to escape the obligation.

* Re: pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow)
@ 2006-02-01 17:06 Lee Schermerhorn
  0 siblings, 0 replies; 88+ messages in thread
From: Lee Schermerhorn @ 2006-02-01 17:06 UTC (permalink / raw)
  To: Nick Piggin, linux-kernel

Hi, Nick, All:

It is with some trepidation that I jump into this already looong thread...

I spent ~6 years as secretary of the Posix.4/.4a [realtime and threads]
working group where we discussed this stuff ad nauseam.  I was also
"technical reviewer" of a few chapters of the .4 drafts [Posix.1a spec].
I only bring this up to point out where my "understanding" of the
"intent" of the spec, such as it is, comes from.  A caveat:  my
involvement was with IEEE/Posix.  The Open Group has adopted the Posix
specs into the SUS and has, rightfully so, imposed additional
interpretation and requirements on it.  E.g., the SUS can require
certain features that are optional in Posix.

There are ambiguities in the spec.  We tried to avoid these.  One reason
for some of the ambiguities is that this is the best that we could get
the various factions, corporations, ... represented in the working
groups and the balloting groups [not necessarily the same folks] to
agree upon.

The drafts went through a couple of years [maybe longer] of balloting
where folks would object to the wording based on all sorts of real and
imagined scenarios.  Still, when all is said and done, the spec says
what it says.  I've been told that any interpretation that meets the
letter of the specification [and English is notorious for its
ambiguity, leaving the field wide open, here] is valid, intentions of
the authors notwithstanding.

Also, note that, for Posix, at least, there are 2 parts of the spec:
The main body of the spec--so called "normative" text or mandatory
requirements--and the rationale that attempts to explain some of the
background and, well, rationale.  The rationale is non-normative/non-
binding.

With that background:

[note:  the single '>' indents are from Nick.  I grabbed this from the
archive.  sorry].

>> Back to the scenario:
>> 
>>> A realtime system with tasks A and B, A has an RT scheduling priority of
>>> 1, and B is 2. A and B are both runnable, so A is running. A takes a 
>>> mutex
>>> then sleeps, B runs and ends up blocked on the mutex. A wakes up and at
>>> some point it drops the mutex and then tries to take it again.
>>>
>>> What happens?
>> 
>> 
>> As I understand the spec, A must block because B has acquired the mutex. 
>> Once again, the SUS discussion of priority inheritance would never need 
>> to have been written if this were not the case:
>> 
>>  >>>
>> In a priority-driven environment, a direct use of traditional primitives 
>> like mutexes and condition variables can lead to unbounded priority 
>> inversion, where a higher priority thread can be blocked by a lower 
>> priority thread, or set of threads, for an unbounded duration of time. 
>> As a result, it becomes impossible to guarantee thread deadlines. 
>> Priority inversion can be bounded and minimized by the use of priority 
>> inheritance protocols. This allows thread deadlines to be guaranteed 
>> even in the presence of synchronization requirements.
>> <<<
>> 
>> The very first sentence indicates that a higher priority thread can be 
>> blocked by a lower priority thread. If your interpretation of the spec 
>> were correct, then such an instance would never occur. Since your 
>
>Wrong. It will obviously occur if the lower priority process is able
>to take a lock before a higher priority process.
>
>The situation will not exist in "the scenario" though, if we follow
>my reading of the spec, because *the scheduler* determines the next
>process to gain the mutex. This makes perfect sense to me.

My copy of the Posix spec and the on-line SUSv2 say *the scheduling
policy* determines the next process/thread.  I.e., the scheduler, per se,
isn't required to get involved--the selected waiter doesn't need to run
to obtain the mutex.

The intention of the authors [again, not binding] was that for
SCHED_OTHER, all bets are off; and for RT/FIFO policies, the highest
priority, longest waiting thread would be chosen.  Think of the queue of
waiters being ordered the same way as the run queue for the policy in
effect.  Then, the selection of the next thread to get the mutex is
simply the head of the list of waiters.  Presumably the scheduling
policy determined how the waiters were queued--paying the price at wait
time, when you're going to suffer a context switch anyway.  However, one
could queue in any order at wait time and pay the price to scan the
queue at unlock time.  
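
[A minimal sketch of the "pay at wait time" variant described above, not
taken from any real implementation.  The struct and function names are
hypothetical, and it assumes that a larger prio value means higher
priority, with FIFO order among waiters of equal priority.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter record; the names are illustrative only. */
struct waiter {
	int prio;               /* assumption: larger value = higher priority */
	int id;
	struct waiter *next;
};

/*
 * Wait time: insert w behind every waiter of equal or higher priority,
 * so the list stays sorted by priority, oldest first within a level.
 */
static void
waitq_insert(struct waiter **head, struct waiter *w)
{
	struct waiter **pp = head;

	assert(w != NULL);
	while (*pp != NULL && (*pp)->prio >= w->prio)
		pp = &(*pp)->next;
	w->next = *pp;
	*pp = w;
}

/* Unlock time: the next owner is simply the head of the list. */
static struct waiter *
waitq_pop(struct waiter **head)
{
	struct waiter *w = *head;

	if (w != NULL) {
		*head = w->next;
		w->next = NULL;
	}
	return w;
}
```

The alternative mentioned above would make waitq_insert() a plain append
and move the priority scan into waitq_pop().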

Now, this harks back to Howard Chu's [I think?] discussion of
[paraphrasing] "what is the eligible set of threads from which the
system selects."  The spec is ambiguous here.

One consideration that I haven't heard discussed in this thread [and I
might have missed it] is the notion of "forward progress".  I checked my
copy and the spec seems to be silent on this topic.  Most
implementations that I've worked with work hard to guarantee forward
progress through the mutex.  Generally, I've seen this implemented by
handing off the lock to the "most eligible" thread [as determined by the
scheduling policy].  This is, in my reading of the spec and based on my
understanding of intent, a perfectly conforming implementation.  It also
avoids the "thundering herd" phenomenon.
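
[A rough sketch of direct hand-off, built on a ticket counter over a
plain mutex and condition variable.  All names are hypothetical and this
is not how any particular libc does it.  It assumes "most eligible" just
means longest waiting (FIFO), ignoring priorities, and for brevity it
wakes every waiter with a broadcast, so unlike a real hand-off
implementation it does not avoid the thundering herd.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical FIFO hand-off lock: ownership passes in ticket order. */
struct handoff_lock {
	pthread_mutex_t mu;
	pthread_cond_t cv;
	unsigned long next_ticket;   /* next ticket to hand out */
	unsigned long now_serving;   /* ticket that owns the lock */
};

#define HANDOFF_LOCK_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

static void
handoff_lock(struct handoff_lock *l)
{
	unsigned long me;

	pthread_mutex_lock(&l->mu);
	me = l->next_ticket++;
	while (l->now_serving != me)
		pthread_cond_wait(&l->cv, &l->mu);
	pthread_mutex_unlock(&l->mu);
}

static void
handoff_unlock(struct handoff_lock *l)
{
	pthread_mutex_lock(&l->mu);
	l->now_serving++;            /* the next ticket holder now owns it */
	pthread_cond_broadcast(&l->cv);
	pthread_mutex_unlock(&l->mu);
}

static struct handoff_lock demo_lock = HANDOFF_LOCK_INIT;
static long demo_counter;

static void *
demo_worker(void *arg)
{
	long i, iters = (long)(intptr_t)arg;

	for (i = 0; i < iters; i++) {
		handoff_lock(&demo_lock);
		demo_counter++;
		handoff_unlock(&demo_lock);
	}
	return NULL;
}

/* Runs up to 16 workers and returns the final counter value. */
static long
handoff_demo(int nthreads, long iters)
{
	pthread_t t[16];
	int i, rc;

	if (nthreads > 16)
		nthreads = 16;
	demo_counter = 0;
	for (i = 0; i < nthreads; i++) {
		rc = pthread_create(&t[i], NULL, demo_worker,
				    (void *)(intptr_t)iters);
		assert(rc == 0);
	}
	for (i = 0; i < nthreads; i++)
		pthread_join(t[i], NULL);
	return demo_counter;
}
```

Because the unlocker advances now_serving itself, a thread that releases
and immediately re-locks draws a later ticket and cannot overtake a
thread that was already waiting.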

>
>> scenario is using realtime threads, then we can assume that the Priority 
>> Ceiling feature is present and you can use it if needed. ( 
>> http://www.opengroup.org/onlinepubs/000095399/xrat/xsh_chap02.html#tag_03_02_09_06 
>> Realtime Threads option group )
>> 
>
>Any kind of priority boost / inheritance like this is orthogonal to
>the issue. They still do not prevent B from acquiring the mutex and
>thereby blocking the execution of the higher priority A. I think this
>is against the spirit of the spec, especially the part where it says
>*the scheduler* will choose which process to gain the lock.
>

I don't agree.  Priority inheritance and priority ceiling options were
provided for just this purpose.  Some of the "hard real-time" working
group members insisted on this feature to avoid the dreaded priority
inversion.  However, the non-real-time folks interested in threads for
concurrent programming, didn't need nor want the associated overhead.
Thus, the options.  If, in the absence of priority inheritance, a lower
priority thread [B] gains a mutex [e.g., because of direct hand-off at
unlock time] and a higher priority thread [A] must then wait for the
mutex, that's allowed.  [Disclaimer:  the priority inheritance and other
mutex options were added in a later version of the spec, after I ended
my involvement in the working group.  However, they were discussed at
length in the original .4/.4a working groups and deferred to the later
update.]

The scenario that most of the real time folks [that I've talked to or
heard discuss it] worry about is when a 3rd process, C, with priority
between A and B, preempts B while it's holding the mutex.   C can run
for an unbounded time [*unbounded* priority inversion is the big
bugaboo], as long as it doesn't try to obtain the mutex, effectively
preventing B from finishing its use and allowing A to proceed.  This is
what priority inheritance is supposed to prevent.

With priority inheritance, when A attempts to take the mutex held by B,
B inherits A's priority such that any C [between A and B in priority]
can't preempt B.  When B releases the mutex, its priority is
dropped [to the higher of its native priority or the highest priority of
any waiters on any other mutexes that B or any of those aforementioned
waiters might hold, yada, yada, yada--an ugly requirement, no?  more
rationale for making it optional!] and now A, if it's the highest
priority waiter, can grab the mutex.  In a uniprocessor, it probably
doesn't matter whether we hand the mutex off directly to A or just wake
up A, let it preempt B and grab the mutex.  In an SMP, some other thread
of lower priority than A could sneak in, requiring another heavy-weight
priority inheritance transaction if we don't hand off.
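
[For reference, a minimal sketch of how an application asks for the
priority inheritance option through the standard mutex attributes.  The
helper name pi_mutex_init() is hypothetical; the pthread_mutexattr_*
calls and PTHREAD_PRIO_INHERIT are the POSIX interfaces.  Since the
feature is optional, the sketch assumes a platform may refuse it, for
example with ENOTSUP.]

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical helper: initialise *m as a priority-inheritance mutex.
 * Returns 0 on success or an error number (e.g. ENOTSUP where the
 * option is not available).
 */
static int
pi_mutex_init(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int rc;

	assert(m != NULL);
	rc = pthread_mutexattr_init(&attr);
	if (rc != 0)
		return rc;
	/* Ask that the holder inherit the priority of its waiters. */
	rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	if (rc == 0)
		rc = pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
	return rc;
}
```

The priority ceiling variant would pass PTHREAD_PRIO_PROTECT instead and
also set a ceiling with pthread_mutexattr_setprioceiling().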

So, ....

The current implementation, that apparently doesn't hand off, but
requires a waiter to run to grab the mutex is probably conforming, in a
strict sense.  But, to say that this is the intent of the spec is, IMO,
a stretch.  I suspect it violates the "Principle of Least Astonishment"
for a lot of practitioners.  I know it does for me, but I've learned that
my vote doesn't count for much in any venue...

Trying to be helpful, but probably just muddying the water...,
Lee



end of thread, other threads:[~2006-02-07  6:57 UTC | newest]

Thread overview: 88+ messages
2006-01-24 22:59 e100 oops on resume Stefan Seyfried
2006-01-24 23:21 ` Mattia Dongili
2006-01-25  9:02   ` Olaf Kirch
2006-01-25 12:11     ` Olaf Kirch
2006-01-25 13:51       ` sched_yield() makes OpenLDAP slow Howard Chu
2006-01-25 14:38         ` Robert Hancock
2006-01-25 17:49         ` Christopher Friesen
2006-01-25 18:26           ` pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow) Howard Chu
2006-01-25 18:59             ` Nick Piggin
2006-01-25 19:32               ` Howard Chu
2006-01-26  8:51                 ` Nick Piggin
2006-01-26 14:15                   ` Kyle Moffett
2006-01-26 14:43                     ` Howard Chu
2006-01-26 19:57                       ` David Schwartz
2006-01-26 20:27                         ` Howard Chu
2006-01-26 20:46                           ` Nick Piggin
2006-01-26 21:32                             ` Howard Chu
2006-01-26 21:41                               ` Nick Piggin
2006-01-26 21:56                                 ` Howard Chu
2006-01-26 22:24                                   ` Nick Piggin
2006-01-27  8:08                                     ` Howard Chu
2006-01-27 19:25                                       ` Philipp Matthias Hahn
2006-02-01 12:31                                       ` Nick Piggin
2006-01-27  4:27                                   ` Steven Rostedt
2006-01-26 21:58                               ` Christopher Friesen
2006-01-27  4:13                               ` Steven Rostedt
2006-01-27  2:16                           ` David Schwartz
2006-01-27  8:19                             ` Howard Chu
2006-01-27 19:50                               ` David Schwartz
2006-01-27 20:13                                 ` Howard Chu
2006-01-27 21:05                                   ` David Schwartz
2006-01-27 21:23                                     ` Howard Chu
2006-01-27 23:31                                       ` David Schwartz
2006-01-30  8:28                         ` Helge Hafting
2006-01-26 10:38                 ` Nikita Danilov
2006-01-30  8:35                   ` Helge Hafting
2006-01-30 11:13                     ` Nikita Danilov
2006-01-31 23:18                     ` David Schwartz
2006-01-25 21:06             ` Lee Revell
2006-01-25 22:14               ` Howard Chu
2006-01-26  0:16                 ` Robert Hancock
2006-01-26  0:49                   ` Howard Chu
2006-01-26  1:04                     ` Lee Revell
2006-01-26  1:31                       ` Howard Chu
2006-01-26  2:05                 ` David Schwartz
2006-01-26  2:48                   ` Mark Lord
2006-01-26  3:30                     ` David Schwartz
2006-01-26  3:49                       ` Samuel Masham
2006-01-26  4:02                         ` Samuel Masham
2006-01-26  4:53                           ` Lee Revell
2006-01-26  6:14                             ` Samuel Masham
2006-01-26  8:54                 ` Nick Piggin
2006-01-26 14:24                   ` Howard Chu
2006-01-26 14:54                     ` Nick Piggin
2006-01-26 15:23                       ` Howard Chu
2006-01-26 15:51                         ` Nick Piggin
2006-01-26 16:44                           ` Howard Chu
2006-01-26 17:34                             ` linux-os (Dick Johnson)
2006-01-26 19:00                               ` Nick Piggin
2006-01-26 19:14                                 ` linux-os (Dick Johnson)
2006-01-26 21:12                                   ` Nick Piggin
2006-01-26 21:31                                     ` linux-os (Dick Johnson)
2006-01-27  7:06                                       ` Valdis.Kletnieks
2006-01-30  8:44                               ` Helge Hafting
2006-01-30  8:50                                 ` Howard Chu
2006-01-30 15:33                                   ` Kyle Moffett
2006-01-30 13:28                                 ` linux-os (Dick Johnson)
2006-01-30 15:15                                   ` Helge Hafting
2006-01-26 10:44                 ` Nikita Danilov
2006-01-26  0:08             ` Robert Hancock
2006-01-26  1:07         ` sched_yield() makes OpenLDAP slow David Schwartz
2006-01-26  8:30           ` Helge Hafting
2006-01-26  9:01             ` Nick Piggin
2006-01-26 10:50             ` Nikita Danilov
2006-01-25 19:37       ` e100 oops on resume Jesse Brandeburg
2006-01-25 20:14         ` Olaf Kirch
2006-01-25 22:28           ` Jesse Brandeburg
2006-01-26  0:28         ` Jesse Brandeburg
2006-01-26  9:32           ` Pavel Machek
2006-01-26 19:02           ` Stefan Seyfried
2006-01-26 19:09             ` Olaf Kirch
2006-01-28 11:53             ` Mattia Dongili
2006-01-28 19:53               ` Jesse Brandeburg
2006-02-07  6:57                 ` Jeff Garzik
     [not found]           ` <BAY108-DAV111F6EF46F6682FEECCC1593140@phx.gbl>
     [not found]             ` <4807377b0601271404w6dbfcff6s4de1c3f785dded9f@mail.gmail.com>
2006-01-30 17:25               ` Can I do a regular read to simulate prefetch instruction? John Smith
2006-01-30 22:01 pthread_mutex_unlock (was Re: sched_yield() makes OpenLDAP slow) linux
2006-01-30 23:37 linux
2006-02-01 17:06 Lee Schermerhorn
