All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*()
@ 2009-07-07 23:58 Anton Vorontsov
  2009-07-08  0:50 ` Oleg Nesterov
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Vorontsov @ 2009-07-07 23:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andrew Morton, Oleg Nesterov, Peter Zijlstra,
	linux-kernel

Using early netconsole and gianfar driver this error pops up:

  netconsole: timeout waiting for carrier

It appears that net/core/netpoll.c:netpoll_setup() is using
cond_resched() in a loop waiting for a carrier.

The thing is that cond_resched() is a no-op when system_state !=
SYSTEM_RUNNING, and so drivers/net/phy/phy.c's state_queue is never
scheduled, therefore link detection doesn't work.

This patch fixes the issue by removing SYSTEM_RUNNING checks, which
might be dangerous to do, but looking at 'git blame' I can see that
the first SYSTEM_RUNNING check was introduced in this commit:

  commit 8ba7b0a14b2ec19583bedbcdbea7f1c5008fc922
  Author: Linus Torvalds <torvalds@g5.osdl.org>
  Date:   Mon Mar 6 17:38:49 2006 -0800

   Add early-boot-safety check to cond_resched()

   Just to be safe, we should not trigger a conditional reschedule during
   the early boot sequence.  We've historically done some questionable
   early on, and the safety warnings in __might_sleep() are generally
   turned off during that period, so there might be problems lurking.

   This affects CONFIG_PREEMPT_VOLUNTARY, which takes over might_sleep() to
   cause a voluntary conditional reschedule.

The commit doesn't mention that it fixes some particular problem,
though it surely breaks netconsole for drivers that are using phylib.

I tested the patch with CONFIG_PREEMPT_VOLUNTARY (on PowerPC), and it
didn't cause any issues.

Surely, the other way to fix the netconsole is to call schedule() or
msleep() directly, but I'm curious if removing the checks makes sense
nowadays.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
---
 kernel/sched.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 7c9098d..3899baa 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6560,8 +6560,7 @@ static void __cond_resched(void)
 
 int __sched _cond_resched(void)
 {
-	if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE) &&
-					system_state == SYSTEM_RUNNING) {
+	if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE)) {
 		__cond_resched();
 		return 1;
 	}
@@ -6579,7 +6578,7 @@ EXPORT_SYMBOL(_cond_resched);
  */
 int cond_resched_lock(spinlock_t *lock)
 {
-	int resched = need_resched() && system_state == SYSTEM_RUNNING;
+	int resched = need_resched();
 	int ret = 0;
 
 	if (spin_needbreak(lock) || resched) {
@@ -6599,7 +6598,7 @@ int __sched cond_resched_softirq(void)
 {
 	BUG_ON(!in_softirq());
 
-	if (need_resched() && system_state == SYSTEM_RUNNING) {
+	if (need_resched()) {
 		local_bh_enable();
 		__cond_resched();
 		local_bh_disable();
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*()
  2009-07-07 23:58 [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*() Anton Vorontsov
@ 2009-07-08  0:50 ` Oleg Nesterov
  2009-07-08  6:24   ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2009-07-08  0:50 UTC (permalink / raw)
  To: Anton Vorontsov
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Peter Zijlstra, linux-kernel

On 07/08, Anton Vorontsov wrote:
>
> but I'm curious if removing the checks makes sense
> nowadays.
...
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -6560,8 +6560,7 @@ static void __cond_resched(void)
>
>  int __sched _cond_resched(void)
>  {
> -	if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE) &&
> -					system_state == SYSTEM_RUNNING) {
> +	if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE)) {
>  		__cond_resched();
>  		return 1;
>  	}

and, with CONFIG_PREEMPT preempt_schedule() does not check system_state,
so it looks really strange cond_resched() does check SYSTEM_RUNNING.



debug_smp_processor_id() looks strange too:
	
	/*
	 * It is valid to assume CPU-locality during early bootup:
	 */
	if (system_state != SYSTEM_RUNNING)
		goto out;

this doesn't look right, smp_init() is called before we set
SYSTEM_RUNNING.

Hmm, and

	/*
	 * Kernel threads bound to a single CPU can safely use
	 * smp_processor_id():
	 */
	if (cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu)))
		goto out;

perhaps this should use PF_THREAD_BOUND ?

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*()
  2009-07-08  0:50 ` Oleg Nesterov
@ 2009-07-08  6:24   ` Peter Zijlstra
  2009-07-08 12:03     ` Anton Vorontsov
  2009-07-08 16:12     ` [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*() Linus Torvalds
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2009-07-08  6:24 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Anton Vorontsov, Ingo Molnar, Linus Torvalds, Andrew Morton,
	linux-kernel

On Wed, 2009-07-08 at 02:50 +0200, Oleg Nesterov wrote:

> 	/*
> 	 * It is valid to assume CPU-locality during early bootup:
> 	 */
> 	if (system_state != SYSTEM_RUNNING)
> 		goto out;
> 
> this doesn't look right, smp_init() is called before we set
> SYSTEM_RUNNING.

The thing is, there's also ton's of code that might end up calling
cond_resched() and co before the scheduler is fully initialized. Doing
so would indeed mess things up.

Also, by definition we'd have to call smp_init() before SYSTEM_RUNNING,
because you simply cannot declare a system up and running when your core
functionality isn't initialized.

So I'd really rather preserve these checks -- I can even remember
running into some of these things a while back, but memory isn't
providing specific cases.

> Hmm, and
> 
> 	/*
> 	 * Kernel threads bound to a single CPU can safely use
> 	 * smp_processor_id():
> 	 */
> 	if (cpumask_equal(&current->cpus_allowed, cpumask_of(this_cpu)))
> 		goto out;
> 
> perhaps this should use PF_THREAD_BOUND ?

That might predate PF_THREAD_BOUND, also I think this is more generic,
and I think we used it for that set_affinity dance we did Rusty 'fixed'
a while back.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*()
  2009-07-08  6:24   ` Peter Zijlstra
@ 2009-07-08 12:03     ` Anton Vorontsov
  2009-07-08 12:12       ` Peter Zijlstra
  2009-07-08 16:12     ` [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*() Linus Torvalds
  1 sibling, 1 reply; 27+ messages in thread
From: Anton Vorontsov @ 2009-07-08 12:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Ingo Molnar, Linus Torvalds, Andrew Morton, linux-kernel

On Wed, Jul 08, 2009 at 08:24:23AM +0200, Peter Zijlstra wrote:
> On Wed, 2009-07-08 at 02:50 +0200, Oleg Nesterov wrote:
> 
> > 	/*
> > 	 * It is valid to assume CPU-locality during early bootup:
> > 	 */
> > 	if (system_state != SYSTEM_RUNNING)
> > 		goto out;
> > 
> > this doesn't look right, smp_init() is called before we set
> > SYSTEM_RUNNING.
> 
> The thing is, there's also ton's of code that might end up calling
> cond_resched() and co before the scheduler is fully initialized.

Hm. Speaking of cond_resched*() only, then it should be pretty
safe to convert the SYSTEM_RUNNING checks to scheduler_running,
no? scheduler_running is set after sched_init().

Something like this

diff --git a/kernel/sched.c b/kernel/sched.c
index 7c9098d..555360b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6561,7 +6561,7 @@ static void __cond_resched(void)
 int __sched _cond_resched(void)
 {
 	if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE) &&
-					system_state == SYSTEM_RUNNING) {
+					scheduler_running) {
 		__cond_resched();
 		return 1;
 	}
@@ -6579,7 +6579,7 @@ EXPORT_SYMBOL(_cond_resched);
  */
 int cond_resched_lock(spinlock_t *lock)
 {
-	int resched = need_resched() && system_state == SYSTEM_RUNNING;
+	int resched = need_resched() && scheduler_running;
 	int ret = 0;
 
 	if (spin_needbreak(lock) || resched) {
@@ -6599,7 +6599,7 @@ int __sched cond_resched_softirq(void)
 {
 	BUG_ON(!in_softirq());
 
-	if (need_resched() && system_state == SYSTEM_RUNNING) {
+	if (need_resched() && scheduler_running) {
 		local_bh_enable();
 		__cond_resched();
 		local_bh_disable();

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*()
  2009-07-08 12:03     ` Anton Vorontsov
@ 2009-07-08 12:12       ` Peter Zijlstra
  2009-07-08 12:55         ` Anton Vorontsov
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2009-07-08 12:12 UTC (permalink / raw)
  To: avorontsov
  Cc: Oleg Nesterov, Ingo Molnar, Linus Torvalds, Andrew Morton, linux-kernel

On Wed, 2009-07-08 at 16:03 +0400, Anton Vorontsov wrote:
> On Wed, Jul 08, 2009 at 08:24:23AM +0200, Peter Zijlstra wrote:
> > On Wed, 2009-07-08 at 02:50 +0200, Oleg Nesterov wrote:
> > 
> > > 	/*
> > > 	 * It is valid to assume CPU-locality during early bootup:
> > > 	 */
> > > 	if (system_state != SYSTEM_RUNNING)
> > > 		goto out;
> > > 
> > > this doesn't look right, smp_init() is called before we set
> > > SYSTEM_RUNNING.
> > 
> > The thing is, there's also ton's of code that might end up calling
> > cond_resched() and co before the scheduler is fully initialized.
> 
> Hm. Speaking of cond_resched*() only, then it should be pretty
> safe to convert the SYSTEM_RUNNING checks to scheduler_running,
> no? scheduler_running is set after sched_init().

Hmm, that might work, I'd have to audit sched_init_smp() as it seems to
do way too much...



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*()
  2009-07-08 12:12       ` Peter Zijlstra
@ 2009-07-08 12:55         ` Anton Vorontsov
  2009-07-08 12:58           ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Anton Vorontsov @ 2009-07-08 12:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Ingo Molnar, Linus Torvalds, Andrew Morton, linux-kernel

On Wed, Jul 08, 2009 at 02:12:25PM +0200, Peter Zijlstra wrote:
> On Wed, 2009-07-08 at 16:03 +0400, Anton Vorontsov wrote:
> > On Wed, Jul 08, 2009 at 08:24:23AM +0200, Peter Zijlstra wrote:
> > > On Wed, 2009-07-08 at 02:50 +0200, Oleg Nesterov wrote:
> > > 
> > > > 	/*
> > > > 	 * It is valid to assume CPU-locality during early bootup:
> > > > 	 */
> > > > 	if (system_state != SYSTEM_RUNNING)
> > > > 		goto out;
> > > > 
> > > > this doesn't look right, smp_init() is called before we set
> > > > SYSTEM_RUNNING.
> > > 
> > > The thing is, there's also ton's of code that might end up calling
> > > cond_resched() and co before the scheduler is fully initialized.
> > 
> > Hm. Speaking of cond_resched*() only, then it should be pretty
> > safe to convert the SYSTEM_RUNNING checks to scheduler_running,
> > no? scheduler_running is set after sched_init().
> 
> Hmm, that might work, I'd have to audit sched_init_smp() as it seems to
> do way too much...

sched_init_smp() is called from the kernel_thread(), so
if the scheduler is not functional prior to kernel_thread(),
you're in trouble anyway, no? The point is that a lot of
code is calling schedule() prior to sched_init_smp()
(e.g. msleep(), mutexes), and there are no issues. So
should be no issues with cond_resched()?

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*()
  2009-07-08 12:55         ` Anton Vorontsov
@ 2009-07-08 12:58           ` Peter Zijlstra
  2009-07-08 20:45             ` [PATCH] sched: Make cond_resched*() available earlier Anton Vorontsov
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2009-07-08 12:58 UTC (permalink / raw)
  To: avorontsov
  Cc: Oleg Nesterov, Ingo Molnar, Linus Torvalds, Andrew Morton, linux-kernel

On Wed, 2009-07-08 at 16:55 +0400, Anton Vorontsov wrote:
> On Wed, Jul 08, 2009 at 02:12:25PM +0200, Peter Zijlstra wrote:
> > On Wed, 2009-07-08 at 16:03 +0400, Anton Vorontsov wrote:
> > > On Wed, Jul 08, 2009 at 08:24:23AM +0200, Peter Zijlstra wrote:
> > > > On Wed, 2009-07-08 at 02:50 +0200, Oleg Nesterov wrote:
> > > > 
> > > > > 	/*
> > > > > 	 * It is valid to assume CPU-locality during early bootup:
> > > > > 	 */
> > > > > 	if (system_state != SYSTEM_RUNNING)
> > > > > 		goto out;
> > > > > 
> > > > > this doesn't look right, smp_init() is called before we set
> > > > > SYSTEM_RUNNING.
> > > > 
> > > > The thing is, there's also ton's of code that might end up calling
> > > > cond_resched() and co before the scheduler is fully initialized.
> > > 
> > > Hm. Speaking of cond_resched*() only, then it should be pretty
> > > safe to convert the SYSTEM_RUNNING checks to scheduler_running,
> > > no? scheduler_running is set after sched_init().
> > 
> > Hmm, that might work, I'd have to audit sched_init_smp() as it seems to
> > do way too much...
> 
> sched_init_smp() is called from the kernel_thread(), so
> if the scheduler is not functional prior to kernel_thread(),
> you're in trouble anyway, no? The point is that a lot of
> code is calling schedule() prior to sched_init_smp()
> (e.g. msleep(), mutexes), and there are no issues. So
> should be no issues with cond_resched()?

Yeah, it should be good, I just got paranoid looking at
sched_init_smp().


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*()
  2009-07-08  6:24   ` Peter Zijlstra
  2009-07-08 12:03     ` Anton Vorontsov
@ 2009-07-08 16:12     ` Linus Torvalds
  2009-07-08 21:10       ` Andrew Morton
  1 sibling, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2009-07-08 16:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Anton Vorontsov, Ingo Molnar, Andrew Morton, linux-kernel



On Wed, 8 Jul 2009, Peter Zijlstra wrote:

> On Wed, 2009-07-08 at 02:50 +0200, Oleg Nesterov wrote:
> 
> > 	/*
> > 	 * It is valid to assume CPU-locality during early bootup:
> > 	 */
> > 	if (system_state != SYSTEM_RUNNING)
> > 		goto out;
> > 
> > this doesn't look right, smp_init() is called before we set
> > SYSTEM_RUNNING.
> 
> The thing is, there's also ton's of code that might end up calling
> cond_resched() and co before the scheduler is fully initialized. Doing
> so would indeed mess things up.

I forget what triggered me to add some of them, but we definitely had bugs 
with the scheduler being called early, and having some really nasty oopses 
happen early in the boot (and that early, they don't get saved, so the 
reporting percentage goes down to something very low).

So I think that the patch Anton posted is probably the RightThing(tm) to 
do, and in that sense I like it - but they make me very nervous. There's a 
lot of "cond_resched()"s sprinkled about in helper routines, and if they 
are ever called early, you're basically screwed.

> Also, by definition we'd have to call smp_init() before SYSTEM_RUNNING,
> because you simply cannot declare a system up and running when your core
> functionality isn't initialized.

Now, that part I'm not sure about.

We can consider a UP system to be running, and brining up the other CPU's 
to be a matter of CPU hotplug. 

It's not what we do _now_, of course. We don't force hotplug support on 
people just because they want SMP, and we don't want to go through the 
whole "rewrite locks" things twice etc.

> So I'd really rather preserve these checks -- I can even remember
> running into some of these things a while back, but memory isn't
> providing specific cases.

Yes. I get nervous too about the patch.

That said, I do agree that maybe SYSTEM_RUNNING isn't the right check. 
Testing that the scheduler is initialized may be the more correct one. I 
think the SYSTEM_RUNNING one just comes from that being used for other 
debug issues.

		Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] sched: Make cond_resched*() available earlier
  2009-07-08 12:58           ` Peter Zijlstra
@ 2009-07-08 20:45             ` Anton Vorontsov
  0 siblings, 0 replies; 27+ messages in thread
From: Anton Vorontsov @ 2009-07-08 20:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Ingo Molnar, Linus Torvalds, Andrew Morton, linux-kernel

Using early netconsole and gianfar driver this error pops up:

  netconsole: timeout waiting for carrier

It appears that net/core/netpoll.c:netpoll_setup() is using
cond_resched() in a loop waiting for a carrier.

The thing is that cond_resched() is a no-op when system_state !=
SYSTEM_RUNNING, and so drivers/net/phy/phy.c's state_queue is never
scheduled, therefore link detection doesn't work.

The system_state check exists to avoid very early calls to the
scheduler. Though, using cond_resched() should be safe after scheduler
is initialized, so instead of checking the system_state, we can test
that the scheduler is running, therefore making cond_resched() and
friends available much earlier (but not too much).

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
---

On Wed, Jul 08, 2009 at 02:58:46PM +0200, Peter Zijlstra wrote:
[...]
> > > > Hm. Speaking of cond_resched*() only, then it should be pretty
> > > > safe to convert the SYSTEM_RUNNING checks to scheduler_running,
> > > > no? scheduler_running is set after sched_init().
> > > 
> > > Hmm, that might work, I'd have to audit sched_init_smp() as it seems to
> > > do way too much...
> > 
> > sched_init_smp() is called from the kernel_thread(), so
> > if the scheduler is not functional prior to kernel_thread(),
> > you're in trouble anyway, no? The point is that a lot of
> > code is calling schedule() prior to sched_init_smp()
> > (e.g. msleep(), mutexes), and there are no issues. So
> > should be no issues with cond_resched()?
> 
> Yeah, it should be good, I just got paranoid looking at
> sched_init_smp().

OK, thanks everyone for the reviews.

Here is an updated patch. As usual, tested on UP PowerPC, with and
without SMP, PREEMPT and PREEMPT_VOLUNTARY.

 kernel/sched.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 7c9098d..555360b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6561,7 +6561,7 @@ static void __cond_resched(void)
 int __sched _cond_resched(void)
 {
 	if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE) &&
-					system_state == SYSTEM_RUNNING) {
+					scheduler_running) {
 		__cond_resched();
 		return 1;
 	}
@@ -6579,7 +6579,7 @@ EXPORT_SYMBOL(_cond_resched);
  */
 int cond_resched_lock(spinlock_t *lock)
 {
-	int resched = need_resched() && system_state == SYSTEM_RUNNING;
+	int resched = need_resched() && scheduler_running;
 	int ret = 0;
 
 	if (spin_needbreak(lock) || resched) {
@@ -6599,7 +6599,7 @@ int __sched cond_resched_softirq(void)
 {
 	BUG_ON(!in_softirq());
 
-	if (need_resched() && system_state == SYSTEM_RUNNING) {
+	if (need_resched() && scheduler_running) {
 		local_bh_enable();
 		__cond_resched();
 		local_bh_disable();
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*()
  2009-07-08 16:12     ` [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*() Linus Torvalds
@ 2009-07-08 21:10       ` Andrew Morton
  2009-07-08 21:33         ` Anton Vorontsov
  2009-07-09 23:20         ` [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*() Pavel Machek
  0 siblings, 2 replies; 27+ messages in thread
From: Andrew Morton @ 2009-07-08 21:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: a.p.zijlstra, oleg, avorontsov, mingo, linux-kernel

> On Wed, 8 Jul 2009 09:12:30 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:
> That said, I do agree that maybe SYSTEM_RUNNING isn't the right check. 
> Testing that the scheduler is initialized may be the more correct one. I 
> think the SYSTEM_RUNNING one just comes from that being used for other 
> debug issues.

Agreed.  system_state is too general.

If we specifically want to know whether it is safe to call schedule() then
let's create a global boolean it_is_safe_to_call_schedule and test that,
rather than testing something which indirectly and unreliably implies "it
is safe to call schedule".  If that boolean already exists then no-brainer.

All that being said, I wonder if the netconsole code should be using
msleep(1) instead.  Spinning on cond_resched() is a bit rude.  But one
would have to verify that it is safe to call schedule() at this time, and
for the netconsole caller, this is dubious.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*()
  2009-07-08 21:10       ` Andrew Morton
@ 2009-07-08 21:33         ` Anton Vorontsov
  2009-07-08 21:47           ` Andrew Morton
  2009-07-09 23:20         ` [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*() Pavel Machek
  1 sibling, 1 reply; 27+ messages in thread
From: Anton Vorontsov @ 2009-07-08 21:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, a.p.zijlstra, oleg, mingo, linux-kernel

On Wed, Jul 08, 2009 at 02:10:24PM -0700, Andrew Morton wrote:
> > On Wed, 8 Jul 2009 09:12:30 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > That said, I do agree that maybe SYSTEM_RUNNING isn't the right check. 
> > Testing that the scheduler is initialized may be the more correct one. I 
> > think the SYSTEM_RUNNING one just comes from that being used for other 
> > debug issues.
> 
> Agreed.  system_state is too general.
> 
> If we specifically want to know whether it is safe to call schedule() then
> let's create a global boolean it_is_safe_to_call_schedule and test that,
> rather than testing something which indirectly and unreliably implies "it
> is safe to call schedule".  If that boolean already exists then no-brainer.
> 
> All that being said, I wonder if the netconsole code should be using
> msleep(1) instead.  Spinning on cond_resched() is a bit rude.  But one
> would have to verify that it is safe to call schedule() at this time, and
> for the netconsole caller, this is dubious.

What do you mean by "verify that it is safe"? If it works,
can I assume that it's safe? ;-) It works, fwiw.

Thanks,

-- 
Anton Vorontsov
email: cbouatmailru@gmail.com
irc://irc.freenode.net/bd2

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*()
  2009-07-08 21:33         ` Anton Vorontsov
@ 2009-07-08 21:47           ` Andrew Morton
  2009-07-08 22:20             ` [PATCH] netpoll: Fix carrier detection for drivers that are using phylib Anton Vorontsov
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2009-07-08 21:47 UTC (permalink / raw)
  To: avorontsov; +Cc: torvalds, a.p.zijlstra, oleg, mingo, linux-kernel, netdev

(belatedly cc'ing netdev)

Original diagnosis:

: Using early netconsole and gianfar driver this error pops up:
: 
:   netconsole: timeout waiting for carrier
: 
: It appears that net/core/netpoll.c:netpoll_setup() is using
: cond_resched() in a loop waiting for a carrier.
: 
: The thing is that cond_resched() is a no-op when system_state !=
: SYSTEM_RUNNING, and so drivers/net/phy/phy.c's state_queue is never
: scheduled, therefore link detection doesn't work

> On Thu, 9 Jul 2009 01:33:31 +0400 Anton Vorontsov <avorontsov@ru.mvista.com> wrote:
> On Wed, Jul 08, 2009 at 02:10:24PM -0700, Andrew Morton wrote:
> > > On Wed, 8 Jul 2009 09:12:30 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > That said, I do agree that maybe SYSTEM_RUNNING isn't the right check. 
> > > Testing that the scheduler is initialized may be the more correct one. I 
> > > think the SYSTEM_RUNNING one just comes from that being used for other 
> > > debug issues.
> > 
> > Agreed.  system_state is too general.
> > 
> > If we specifically want to know whether it is safe to call schedule() then
> > let's create a global boolean it_is_safe_to_call_schedule and test that,
> > rather than testing something which indirectly and unreliably implies "it
> > is safe to call schedule".  If that boolean already exists then no-brainer.
> > 
> > All that being said, I wonder if the netconsole code should be using
> > msleep(1) instead.  Spinning on cond_resched() is a bit rude.  But one
> > would have to verify that it is safe to call schedule() at this time, and
> > for the netconsole caller, this is dubious.
> 
> What do you mean by "verify that it is safe"? If it works,
> can I assume that it's safe? ;-) It works, fwiw.
> 

netconsole is supposed to be available as early as possible in boot for
obvious reasons.  I'd say there's a decent risk now and in the future that
netconsole will be initialised prior to the scheduler being available.

In fact, if "netconsole: timeout waiting for carrier" newly added to
netpoll_setup() a depedency on the scheduler being available then perhaps
that was an incorrect change.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
  2009-07-08 21:47           ` Andrew Morton
@ 2009-07-08 22:20             ` Anton Vorontsov
  2009-07-09  0:01               ` Linus Torvalds
  2009-07-09 12:52               ` Matt Mackall
  0 siblings, 2 replies; 27+ messages in thread
From: Anton Vorontsov @ 2009-07-08 22:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, a.p.zijlstra, oleg, mingo, linux-kernel, netdev

Using early netconsole and gianfar driver this error pops up:

  netconsole: timeout waiting for carrier

It appears that net/core/netpoll.c:netpoll_setup() is using
cond_resched() in a loop waiting for a carrier.

The thing is that cond_resched() is a no-op when system_state !=
SYSTEM_RUNNING, and so drivers/net/phy/phy.c's state_queue is never
scheduled, therefore link detection doesn't work.

I belive that the main problem is in cond_resched()[1], but despite
how the cond_resched() story ends, it might be a good idea to call
msleep(1) instead of cond_resched(), as suggested by Andrew Morton.

[1] http://lkml.org/lkml/2009/7/7/463

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
---

On Wed, Jul 08, 2009 at 02:47:44PM -0700, Andrew Morton wrote:
> (belatedly cc'ing netdev)
> 
> Original diagnosis:
> 
> : Using early netconsole and gianfar driver this error pops up:
> : 
> :   netconsole: timeout waiting for carrier
> : 
> : It appears that net/core/netpoll.c:netpoll_setup() is using
> : cond_resched() in a loop waiting for a carrier.
> : 
> : The thing is that cond_resched() is a no-op when system_state !=
> : SYSTEM_RUNNING, and so drivers/net/phy/phy.c's state_queue is never
> : scheduled, therefore link detection doesn't work
> 
> > On Thu, 9 Jul 2009 01:33:31 +0400 Anton Vorontsov <avorontsov@ru.mvista.com> wrote:
> > On Wed, Jul 08, 2009 at 02:10:24PM -0700, Andrew Morton wrote:
> > > > On Wed, 8 Jul 2009 09:12:30 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > > That said, I do agree that maybe SYSTEM_RUNNING isn't the right check. 
> > > > Testing that the scheduler is initialized may be the more correct one. I 
> > > > think the SYSTEM_RUNNING one just comes from that being used for other 
> > > > debug issues.
> > > 
> > > Agreed.  system_state is too general.
> > > 
> > > If we specifically want to know whether it is safe to call schedule() then
> > > let's create a global boolean it_is_safe_to_call_schedule and test that,
> > > rather than testing something which indirectly and unreliably implies "it
> > > is safe to call schedule".  If that boolean already exists then no-brainer.
> > > 
> > > All that being said, I wonder if the netconsole code should be using
> > > msleep(1) instead.  Spinning on cond_resched() is a bit rude.  But one
> > > would have to verify that it is safe to call schedule() at this time, and
> > > for the netconsole caller, this is dubious.
> > 
> > What do you mean by "verify that it is safe"? If it works,
> > can I assume that it's safe? ;-) It works, fwiw.
> > 
> 
> netconsole is supposed to be available as early as possible in boot for
> obvious reasons.  I'd say there's a decent risk now and in the future that
> netconsole will be initialised prior to the scheduler being available.
> 
> In fact, if "netconsole: timeout waiting for carrier" newly added to
> netpoll_setup() a depedency on the scheduler being available then perhaps
> that was an incorrect change.

'git blame' says that carrier detection code didn't change since 2.6.12
(where git history starts), PHYLIB is using workqueue since its
submission (2.6.13). And SYSTEM_RUNNING check was added in 2.6.16.
So it's not a new dependency.

The netpoll code is using msleep() just a few lines below cond_resched(),
so we won't make things worse. ;-)

Thanks!

 net/core/netpoll.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9675f31..df30feb 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -740,7 +740,7 @@ int netpoll_setup(struct netpoll *np)
 				       np->name);
 				break;
 			}
-			cond_resched();
+			msleep(1);
 		}
 
 		/* If carrier appears to come up instantly, we don't
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
  2009-07-08 22:20             ` [PATCH] netpoll: Fix carrier detection for drivers that are using phylib Anton Vorontsov
@ 2009-07-09  0:01               ` Linus Torvalds
  2009-07-09  3:08                 ` David Miller
                                   ` (2 more replies)
  2009-07-09 12:52               ` Matt Mackall
  1 sibling, 3 replies; 27+ messages in thread
From: Linus Torvalds @ 2009-07-09  0:01 UTC (permalink / raw)
  To: Anton Vorontsov
  Cc: Andrew Morton, a.p.zijlstra, oleg, mingo, linux-kernel, netdev



On Thu, 9 Jul 2009, Anton Vorontsov wrote:
> 
> The netpoll code is using msleep() just a few lines below cond_resched(),
> so we won't make things worse. ;-)

Yeah. That function is definitely sleeping. It does things like 
kmalloc(GFP_KERNEL), rtnl_lock() and synchronize_rcu() etc too, so an 
added msleep() is the least of our problems.

Afaik, it's called from a bog-standard "module_init()", which happens late 
enough that everything works.

In fact, I wonder if we should set SYSTEM_RUNNING much earlier - _before_ 
doing the whole "do_initcalls()". By then we've set up all the core stuff 
and enabled interrupts, so we really _are_ running. We just don't 
necessarily have drivers, filesystems etc loaded yet. But anything that 
happens late enough to be an initcall should be largely considered to 
be during "normal code".

(The "early_initcall" cases are special - those really do happen pretty 
early).

So ACK on Anton's patch, but I wonder if we _also_ should do the 
following?

Looking at the people looking at SYSTEM_RUNNING, I do note some odd cases. 
Why the heck does kernel/perf_counter.c do it, for example?

		Linus

---
 init/main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/init/main.c b/init/main.c
index 2c5ade7..f10d9cd 100644
--- a/init/main.c
+++ b/init/main.c
@@ -788,6 +788,7 @@ static void __init do_initcalls(void)
 {
 	initcall_t *call;
 
+	system_state = SYSTEM_RUNNING;
 	for (call = __early_initcall_end; call < __initcall_end; call++)
 		do_one_initcall(*call);
 
@@ -839,7 +840,6 @@ static noinline int init_post(void)
 	free_initmem();
 	unlock_kernel();
 	mark_rodata_ro();
-	system_state = SYSTEM_RUNNING;
 	numa_default_policy();
 
 	if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
  2009-07-09  0:01               ` Linus Torvalds
@ 2009-07-09  3:08                 ` David Miller
  2009-07-09  7:56                 ` Peter Zijlstra
  2009-07-09 13:26                 ` Matt Mackall
  2 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2009-07-09  3:08 UTC (permalink / raw)
  To: torvalds
  Cc: avorontsov, akpm, a.p.zijlstra, oleg, mingo, linux-kernel, netdev

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 8 Jul 2009 17:01:14 -0700 (PDT)

> So ACK on Anton's patch,

I'll integrate it into my tree, thanks.

> but I wonder if we _also_ should do the following?

I think we should set SYSTEM_RUNNING earlier, makes sense
to me.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
  2009-07-09  0:01               ` Linus Torvalds
  2009-07-09  3:08                 ` David Miller
@ 2009-07-09  7:56                 ` Peter Zijlstra
  2009-07-09 12:56                   ` Matt Mackall
  2009-07-09 13:26                 ` Matt Mackall
  2 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2009-07-09  7:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Anton Vorontsov, Andrew Morton, oleg, mingo, linux-kernel, netdev

On Wed, 2009-07-08 at 17:01 -0700, Linus Torvalds wrote:
> Looking at the people looking at SYSTEM_RUNNING, I do note some odd cases. 
> Why the heck does kernel/perf_counter.c do it, for example?

Ah, those are the swcounter and other probe entry points. I've had
several cases where we called into the perf counter code from those
points before it was initialized, getting in kernel segfaults due to
dereferencing uninitialized data etc..

I could keep a variable that tracked the perf_counter_init() state, and
use that instead if you prefer?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
  2009-07-08 22:20             ` [PATCH] netpoll: Fix carrier detection for drivers that are using phylib Anton Vorontsov
  2009-07-09  0:01               ` Linus Torvalds
@ 2009-07-09 12:52               ` Matt Mackall
  1 sibling, 0 replies; 27+ messages in thread
From: Matt Mackall @ 2009-07-09 12:52 UTC (permalink / raw)
  To: avorontsov
  Cc: Andrew Morton, torvalds, a.p.zijlstra, oleg, mingo, linux-kernel, netdev

On Thu, 2009-07-09 at 02:20 +0400, Anton Vorontsov wrote:
> Using early netconsole and gianfar driver this error pops up:
> 
>   netconsole: timeout waiting for carrier
> 
> It appears that net/core/netpoll.c:netpoll_setup() is using
> cond_resched() in a loop waiting for a carrier.
> 
> The thing is that cond_resched() is a no-op when system_state !=
> SYSTEM_RUNNING, and so drivers/net/phy/phy.c's state_queue is never
> scheduled, therefore link detection doesn't work.
> 
> I belive that the main problem is in cond_resched()[1], but despite
> how the cond_resched() story ends, it might be a good idea to call
> msleep(1) instead of cond_resched(), as suggested by Andrew Morton.
> 
> [1] http://lkml.org/lkml/2009/7/7/463
> 
> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
> ---
> 
> On Wed, Jul 08, 2009 at 02:47:44PM -0700, Andrew Morton wrote:
> > (belatedly cc'ing netdev)
> > 
> > Original diagnosis:
> > 
> > : Using early netconsole and gianfar driver this error pops up:
> > : 
> > :   netconsole: timeout waiting for carrier
> > : 
> > : It appears that net/core/netpoll.c:netpoll_setup() is using
> > : cond_resched() in a loop waiting for a carrier.
> > : 
> > : The thing is that cond_resched() is a no-op when system_state !=
> > : SYSTEM_RUNNING, and so drivers/net/phy/phy.c's state_queue is never
> > : scheduled, therefore link detection doesn't work
> > 
> > > On Thu, 9 Jul 2009 01:33:31 +0400 Anton Vorontsov <avorontsov@ru.mvista.com> wrote:
> > > On Wed, Jul 08, 2009 at 02:10:24PM -0700, Andrew Morton wrote:
> > > > > On Wed, 8 Jul 2009 09:12:30 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > > > That said, I do agree that maybe SYSTEM_RUNNING isn't the right check. 
> > > > > Testing that the scheduler is initialized may be the more correct one. I 
> > > > > think the SYSTEM_RUNNING one just comes from that being used for other 
> > > > > debug issues.
> > > > 
> > > > Agreed.  system_state is too general.
> > > > 
> > > > If we specifically want to know whether it is safe to call schedule() then
> > > > let's create a global boolean it_is_safe_to_call_schedule and test that,
> > > > rather than testing something which indirectly and unreliably implies "it
> > > > is safe to call schedule".  If that boolean already exists then no-brainer.
> > > > 
> > > > All that being said, I wonder if the netconsole code should be using
> > > > msleep(1) instead.  Spinning on cond_resched() is a bit rude.  But one
> > > > would have to verify that it is safe to call schedule() at this time, and
> > > > for the netconsole caller, this is dubious.
> > > 
> > > What do you mean by "verify that it is safe"? If it works,
> > > can I assume that it's safe? ;-) It works, fwiw.
> > > 
> > 
> > netconsole is supposed to be available as early as possible in boot for
> > obvious reasons.  I'd say there's a decent risk now and in the future that
> > netconsole will be initialised prior to the scheduler being available.
> > 
> > In fact, if "netconsole: timeout waiting for carrier" newly added to
> > netpoll_setup() a depedency on the scheduler being available then perhaps
> > that was an incorrect change.
> 
> 'git blame' says that carrier detection code didn't change since 2.6.12
> (where git history starts), PHYLIB is using workqueue since its
> submission (2.6.13). And SYSTEM_RUNNING check was added in 2.6.16.
> So it's not a new dependency.
> 
> The netpoll code is using msleep() just a few lines below cond_resched(),
> so we won't make things worse. ;-)

I think that's an improvement with or without the SYSTEM_RUNNING fix.

Signed-off-by: Matt Mackall <mpm@selenic.com>

> Thanks!
> 
>  net/core/netpoll.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 9675f31..df30feb 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -740,7 +740,7 @@ int netpoll_setup(struct netpoll *np)
>  				       np->name);
>  				break;
>  			}
> -			cond_resched();
> +			msleep(1);
>  		}
>  
>  		/* If carrier appears to come up instantly, we don't

-- 
http://selenic.com : development and support for Mercurial and Linux



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
  2009-07-09  7:56                 ` Peter Zijlstra
@ 2009-07-09 12:56                   ` Matt Mackall
  0 siblings, 0 replies; 27+ messages in thread
From: Matt Mackall @ 2009-07-09 12:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Anton Vorontsov, Andrew Morton, oleg, mingo,
	linux-kernel, netdev

On Thu, 2009-07-09 at 09:56 +0200, Peter Zijlstra wrote:
> On Wed, 2009-07-08 at 17:01 -0700, Linus Torvalds wrote:
> > Looking at the people looking at SYSTEM_RUNNING, I do note some odd cases. 
> > Why the heck does kernel/perf_counter.c do it, for example?
> 
> Ah, those are the swcounter and other probe entry points. I've had
> several cases where we called into the perf counter code from those
> points before it was initialized, getting in kernel segfaults due to
> dereferencing uninitialized data etc..
> 
> I could keep a variable that tracked the perf_counter_init() state, and
> use that instead if you prefer?

Looks like that'd be more accurate. Linus's proposed patch might break
your current assumptions.

-- 
http://selenic.com : development and support for Mercurial and Linux



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
  2009-07-09  0:01               ` Linus Torvalds
  2009-07-09  3:08                 ` David Miller
  2009-07-09  7:56                 ` Peter Zijlstra
@ 2009-07-09 13:26                 ` Matt Mackall
  2009-07-09 13:46                   ` Peter Zijlstra
  2 siblings, 1 reply; 27+ messages in thread
From: Matt Mackall @ 2009-07-09 13:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Anton Vorontsov, Andrew Morton, a.p.zijlstra, oleg, mingo,
	linux-kernel, netdev

On Wed, 2009-07-08 at 17:01 -0700, Linus Torvalds wrote:
> 
> On Thu, 9 Jul 2009, Anton Vorontsov wrote:
> > 
> > The netpoll code is using msleep() just a few lines below cond_resched(),
> > so we won't make things worse. ;-)
> 
> Yeah. That function is definitely sleeping. It does things like 
> kmalloc(GFP_KERNEL), rtnl_lock() and synchronize_rcu() etc too, so an 
> added msleep() is the least of our problems.
> 
> Afaik, it's called from a bog-standard "module_init()", which happens late 
> enough that everything works.
> 
> In fact, I wonder if we should set SYSTEM_RUNNING much earlier - _before_ 
> doing the whole "do_initcalls()".

Well there are two ways of consistently defining SYSTEM_RUNNING:

a) define it with reference to the well-understood notion of booting vs
running and don't switch it until handing off to init

b) define it with reference to its usage by an arbitrary user like
cond_resched()

In the latter case, we obviously need to move it to the earliest point
that scheduling is possible. But there are a number of things like 

http://lxr.linux.no/linux+v2.6.30/kernel/printk.c#L228

that assume the definition is actually (a). We're currently within a
couple lines of a strict definition of (a) already, so I actually think
cond_resched() is just wrong (and we already know it broke a
previously-working user). It should perhaps be using another private
flag that gets set as soon as scheduling is up and running.

But I'd actually go further and say that it's unfortunate to be checking
extra flags in such an important inline, especially since the check is
false for all but the first couple seconds of run time. Seems like we
could avoid adding an extra check by artificially elevating the preempt
count in early boot (or at compile time) then dropping it when
scheduling becomes available.

-- 
http://selenic.com : development and support for Mercurial and Linux



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
  2009-07-09 13:26                 ` Matt Mackall
@ 2009-07-09 13:46                   ` Peter Zijlstra
  2009-07-09 14:18                     ` Matt Mackall
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2009-07-09 13:46 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Linus Torvalds, Anton Vorontsov, Andrew Morton, oleg, mingo,
	linux-kernel, netdev

On Thu, 2009-07-09 at 08:26 -0500, Matt Mackall wrote: 
> On Wed, 2009-07-08 at 17:01 -0700, Linus Torvalds wrote:
> > 
> > On Thu, 9 Jul 2009, Anton Vorontsov wrote:
> > > 
> > > The netpoll code is using msleep() just a few lines below cond_resched(),
> > > so we won't make things worse. ;-)
> > 
> > Yeah. That function is definitely sleeping. It does things like 
> > kmalloc(GFP_KERNEL), rtnl_lock() and synchronize_rcu() etc too, so an 
> > added msleep() is the least of our problems.
> > 
> > Afaik, it's called from a bog-standard "module_init()", which happens late 
> > enough that everything works.
> > 
> > In fact, I wonder if we should set SYSTEM_RUNNING much earlier - _before_ 
> > doing the whole "do_initcalls()".
> 
> Well there are two ways of consistently defining SYSTEM_RUNNING:
> 
> a) define it with reference to the well-understood notion of booting vs
> running and don't switch it until handing off to init

This makes the most sense IMHO.

> b) define it with reference to its usage by an arbitrary user like
> cond_resched()
> 
> In the latter case, we obviously need to move it to the earliest point
> that scheduling is possible. But there are a number of things like 
> 
> http://lxr.linux.no/linux+v2.6.30/kernel/printk.c#L228
> 
> that assume the definition is actually (a). We're currently within a
> couple lines of a strict definition of (a) already, so I actually think
> cond_resched() is just wrong (and we already know it broke a
> previously-working user). It should perhaps be using another private
> flag that gets set as soon as scheduling is up and running.

Right as mentioned before in this thread, we grew scheduler_running a
while back which could be used for this.

> But I'd actually go further and say that it's unfortunate to be checking
> extra flags in such an important inline, especially since the check is
> false for all but the first couple seconds of run time. Seems like we
> could avoid adding an extra check by artificially elevating the preempt
> count in early boot (or at compile time) then dropping it when
> scheduling becomes available.

Calling cond_resched() and co when !preemptable is an error so this
wouldn't actually work.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
  2009-07-09 13:46                   ` Peter Zijlstra
@ 2009-07-09 14:18                     ` Matt Mackall
  2009-07-09 14:31                       ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Matt Mackall @ 2009-07-09 14:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Anton Vorontsov, Andrew Morton, oleg, mingo,
	linux-kernel, netdev

On Thu, 2009-07-09 at 15:46 +0200, Peter Zijlstra wrote:
> On Thu, 2009-07-09 at 08:26 -0500, Matt Mackall wrote: 
> > On Wed, 2009-07-08 at 17:01 -0700, Linus Torvalds wrote:
> > > 
> > > On Thu, 9 Jul 2009, Anton Vorontsov wrote:
> > > > 
> > > > The netpoll code is using msleep() just a few lines below cond_resched(),
> > > > so we won't make things worse. ;-)
> > > 
> > > Yeah. That function is definitely sleeping. It does things like 
> > > kmalloc(GFP_KERNEL), rtnl_lock() and synchronize_rcu() etc too, so an 
> > > added msleep() is the least of our problems.
> > > 
> > > Afaik, it's called from a bog-standard "module_init()", which happens late 
> > > enough that everything works.
> > > 
> > > In fact, I wonder if we should set SYSTEM_RUNNING much earlier - _before_ 
> > > doing the whole "do_initcalls()".
> > 
> > Well there are two ways of consistently defining SYSTEM_RUNNING:
> > 
> > a) define it with reference to the well-understood notion of booting vs
> > running and don't switch it until handing off to init
> 
> This makes the most sense IMHO.
> 
> > b) define it with reference to its usage by an arbitrary user like
> > cond_resched()
> > 
> > In the latter case, we obviously need to move it to the earliest point
> > that scheduling is possible. But there are a number of things like 
> > 
> > http://lxr.linux.no/linux+v2.6.30/kernel/printk.c#L228
> > 
> > that assume the definition is actually (a). We're currently within a
> > couple lines of a strict definition of (a) already, so I actually think
> > cond_resched() is just wrong (and we already know it broke a
> > previously-working user). It should perhaps be using another private
> > flag that gets set as soon as scheduling is up and running.
> 
> Right as mentioned before in this thread, we grew scheduler_running a
> while back which could be used for this.
> 
> > But I'd actually go further and say that it's unfortunate to be checking
> > extra flags in such an important inline, especially since the check is
> > false for all but the first couple seconds of run time. Seems like we
> > could avoid adding an extra check by artificially elevating the preempt
> > count in early boot (or at compile time) then dropping it when
> > scheduling becomes available.
> 
> Calling cond_resched() and co when !preemptable is an error so this
> wouldn't actually work.

Sorry if I was unclear. I'm suggesting setting the count so the existing
PREEMPT_ACTIVE test here fires:

int __sched _cond_resched(void)
{
        if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE) &&
                                        system_state == SYSTEM_RUNNING) {
                __cond_resched();
                return 1;
        }
        return 0;
}

That should be kosher.

-- 
http://selenic.com : development and support for Mercurial and Linux



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
  2009-07-09 14:18                     ` Matt Mackall
@ 2009-07-09 14:31                       ` Peter Zijlstra
  2009-07-09 14:43                         ` Matt Mackall
  2009-07-09 17:29                         ` Linus Torvalds
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2009-07-09 14:31 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Linus Torvalds, Anton Vorontsov, Andrew Morton, oleg, mingo,
	linux-kernel, netdev

On Thu, 2009-07-09 at 09:18 -0500, Matt Mackall wrote:
> 
> Sorry if I was unclear. I'm suggesting setting the count so the existing
> PREEMPT_ACTIVE test here fires:
> 
> int __sched _cond_resched(void)
> {
>         if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE) &&
>                                         system_state == SYSTEM_RUNNING) {
>                 __cond_resched();
>                 return 1;
>         }
>         return 0;
> }

Right, /me read preempt and thought a simple preempt inc but didn't read
the code. Shame on me.

So something like (utterly untested and such)

---

 arch/x86/include/asm/thread_info.h |    2 +-
 kernel/sched.c                     |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index b078352..7b5dbce 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -49,7 +49,7 @@ struct thread_info {
 	.exec_domain	= &default_exec_domain,	\
 	.flags		= 0,			\
 	.cpu		= 0,			\
-	.preempt_count	= 1,			\
+	.preempt_count	= 1 + PREEMPT_ACTIVE,	\
 	.addr_limit	= KERNEL_DS,		\
 	.restart_block = {			\
 		.fn = do_no_restart_syscall,	\
diff --git a/kernel/sched.c b/kernel/sched.c
index fd3ac58..8e81162 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6599,8 +6599,7 @@ static void __cond_resched(void)
 
 int __sched _cond_resched(void)
 {
-	if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE) &&
-					system_state == SYSTEM_RUNNING) {
+	if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE)) {
 		__cond_resched();
 		return 1;
 	}
@@ -9422,6 +9421,7 @@ void __init sched_init(void)
 	perf_counter_init();
 
 	scheduler_running = 1;
+	sub_preempt_count(PREEMPT_ACTIVE);
 }
 
 #ifdef CONFIG_DEBUG_SPINLOCK_SLEEP



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
  2009-07-09 14:31                       ` Peter Zijlstra
@ 2009-07-09 14:43                         ` Matt Mackall
  2009-07-09 14:51                           ` Peter Zijlstra
  2009-07-09 17:29                         ` Linus Torvalds
  1 sibling, 1 reply; 27+ messages in thread
From: Matt Mackall @ 2009-07-09 14:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Anton Vorontsov, Andrew Morton, oleg, mingo,
	linux-kernel, netdev

On Thu, 2009-07-09 at 16:31 +0200, Peter Zijlstra wrote:
> On Thu, 2009-07-09 at 09:18 -0500, Matt Mackall wrote:
> > 
> > Sorry if I was unclear. I'm suggesting setting the count so the existing
> > PREEMPT_ACTIVE test here fires:
> > 
> > int __sched _cond_resched(void)
> > {
> >         if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE) &&
> >                                         system_state == SYSTEM_RUNNING) {
> >                 __cond_resched();
> >                 return 1;
> >         }
> >         return 0;
> > }
> 
> Right, /me read preempt and thought a simple preempt inc but didn't read
> the code. Shame on me.
> 
> So something like (utterly untested and such)

Yeah, that's what I had in mind. Probably throw in a define:

/* for disabling scheduling in early boot */
#define PREEMPT_EARLY (1 + PREEMPT_ACTIVE)

and slap a comment on the sub_preempt_count().

Does anything actually use scheduler_running yet? Perhaps my tree is
old.

Also, might_sleep's use of system_state probably bears revisiting.

-- 
http://selenic.com : development and support for Mercurial and Linux



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
  2009-07-09 14:43                         ` Matt Mackall
@ 2009-07-09 14:51                           ` Peter Zijlstra
  2009-07-09 15:06                             ` Matt Mackall
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2009-07-09 14:51 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Linus Torvalds, Anton Vorontsov, Andrew Morton, oleg, mingo,
	linux-kernel, netdev

On Thu, 2009-07-09 at 09:43 -0500, Matt Mackall wrote:

> Yeah, that's what I had in mind. Probably throw in a define:
> 
> /* for disabling scheduling in early boot */
> #define PREEMPT_EARLY (1 + PREEMPT_ACTIVE)
> 
> and slap a comment on the sub_preempt_count().

Right, and visit all the other arch init code ;-)

I'll wait to see if Ingo has anything to say about it and then complete
this thing.

> Does anything actually use scheduler_running yet? Perhaps my tree is
> old.

# git grep scheduler_running
kernel/sched.c:static __read_mostly int scheduler_running;
kernel/sched.c: scheduler_running = 1;
kernel/sched_rt.c:      if (unlikely(!scheduler_running))
kernel/sched_rt.c:      if (unlikely(!scheduler_running))

If memory serves there used to be more, but I think that migrated into
kernel/sched_clock.c, which has sched_clock_running.

> Also, might_sleep's use of system_state probably bears revisiting.

Yeah, all that code is from long before we had scheduler_running (which
was introduced around CFS/.23).


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
  2009-07-09 14:51                           ` Peter Zijlstra
@ 2009-07-09 15:06                             ` Matt Mackall
  0 siblings, 0 replies; 27+ messages in thread
From: Matt Mackall @ 2009-07-09 15:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Anton Vorontsov, Andrew Morton, oleg, mingo,
	linux-kernel, netdev

On Thu, 2009-07-09 at 16:51 +0200, Peter Zijlstra wrote:
> > Does anything actually use scheduler_running yet? Perhaps my tree is
> > old.
> 
> # git grep scheduler_running
> kernel/sched.c:static __read_mostly int scheduler_running;
> kernel/sched.c: scheduler_running = 1;
> kernel/sched_rt.c:      if (unlikely(!scheduler_running))
> kernel/sched_rt.c:      if (unlikely(!scheduler_running))
> 
> If memory serves there used to be more, but I think that migrated into
> kernel/sched_clock.c, which has sched_clock_running.

The static in the first line makes that a bit of a head-scratcher? Oh, I
see: it's living in sin. Going to avert my eyes for now.

-- 
http://selenic.com : development and support for Mercurial and Linux



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] netpoll: Fix carrier detection for drivers that are using phylib
  2009-07-09 14:31                       ` Peter Zijlstra
  2009-07-09 14:43                         ` Matt Mackall
@ 2009-07-09 17:29                         ` Linus Torvalds
  1 sibling, 0 replies; 27+ messages in thread
From: Linus Torvalds @ 2009-07-09 17:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matt Mackall, Anton Vorontsov, Andrew Morton, oleg, mingo,
	linux-kernel, netdev



On Thu, 9 Jul 2009, Peter Zijlstra wrote:
> 
> So something like (utterly untested and such)

This looks like a good patch. Please make it so - who knows what other 
uses of cond_resched() we have in module init routines that might have 
deadlocks without it. The netpoll case got fixed, but please just do this.

I'd like to do my system_state movement too (the thing is, when you load 
drivers as modules you _will_ have "system_state == SYSTEM_RUNNING", so 
any initcall that depends on it being "early boot" is already broken), but 
there's no way that patch is appropriate for post-rc2. This one, however, 
looks appropriate (modulo getting some testing, of course)

		Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*()
  2009-07-08 21:10       ` Andrew Morton
  2009-07-08 21:33         ` Anton Vorontsov
@ 2009-07-09 23:20         ` Pavel Machek
  1 sibling, 0 replies; 27+ messages in thread
From: Pavel Machek @ 2009-07-09 23:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, a.p.zijlstra, oleg, avorontsov, mingo, linux-kernel

Hi!

> > That said, I do agree that maybe SYSTEM_RUNNING isn't the right check. 
> > Testing that the scheduler is initialized may be the more correct one. I 
> > think the SYSTEM_RUNNING one just comes from that being used for other 
> > debug issues.
> 
> Agreed.  system_state is too general.
> 
> If we specifically want to know whether it is safe to call schedule() then
> let's create a global boolean it_is_safe_to_call_schedule and test that,
> rather than testing something which indirectly and unreliably implies "it
> is safe to call schedule".  If that boolean already exists then no-brainer.

or maybe we could embed that check into schedule(), just returning
when scheduler is not ready?

And I always wondered... system_state is not protected by any kind of
lock and is not atomic_t... Does it all work by mistake?
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2009-07-10 12:06 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-07-07 23:58 [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*() Anton Vorontsov
2009-07-08  0:50 ` Oleg Nesterov
2009-07-08  6:24   ` Peter Zijlstra
2009-07-08 12:03     ` Anton Vorontsov
2009-07-08 12:12       ` Peter Zijlstra
2009-07-08 12:55         ` Anton Vorontsov
2009-07-08 12:58           ` Peter Zijlstra
2009-07-08 20:45             ` [PATCH] sched: Make cond_resched*() available earlier Anton Vorontsov
2009-07-08 16:12     ` [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*() Linus Torvalds
2009-07-08 21:10       ` Andrew Morton
2009-07-08 21:33         ` Anton Vorontsov
2009-07-08 21:47           ` Andrew Morton
2009-07-08 22:20             ` [PATCH] netpoll: Fix carrier detection for drivers that are using phylib Anton Vorontsov
2009-07-09  0:01               ` Linus Torvalds
2009-07-09  3:08                 ` David Miller
2009-07-09  7:56                 ` Peter Zijlstra
2009-07-09 12:56                   ` Matt Mackall
2009-07-09 13:26                 ` Matt Mackall
2009-07-09 13:46                   ` Peter Zijlstra
2009-07-09 14:18                     ` Matt Mackall
2009-07-09 14:31                       ` Peter Zijlstra
2009-07-09 14:43                         ` Matt Mackall
2009-07-09 14:51                           ` Peter Zijlstra
2009-07-09 15:06                             ` Matt Mackall
2009-07-09 17:29                         ` Linus Torvalds
2009-07-09 12:52               ` Matt Mackall
2009-07-09 23:20         ` [PATCH/RFC] sched: Remove SYSTEM_RUNNING checks from cond_resched*() Pavel Machek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.