All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] 2.4.16 kernel/printk.c (per processor initialization check)
@ 2001-12-03  5:46 j-nomura
  2001-12-03  9:20 ` Andrew Morton
  2001-12-03 10:32 ` j-nomura
  0 siblings, 2 replies; 25+ messages in thread
From: j-nomura @ 2001-12-03  5:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: j-nomura

Hello,

I experienced system hang on my SMP machine and it turned out to be due to
console write before mmu initialization completes.

To be more specific, even if secondary processors are not in status enough
to do actual console I/O (e.g. mmu is not initialized), call_console_drivers()
tries to do it.
This leads to unpredictable result. For me, for example, it cause machine
check abort and hang up system.

Attached is a patch for it.

--- kernel/printk.c	2001/11/27 04:41:49	1.1.1.8
+++ kernel/printk.c	2001/12/03 05:25:26
@@ -491,20 +491,22 @@
  */
 void release_console_sem(void)
 {
 	unsigned long flags;
 	unsigned long _con_start, _log_end;
 	unsigned long must_wake_klogd = 0;
 
 	for ( ; ; ) {
 		spin_lock_irqsave(&logbuf_lock, flags);
 		must_wake_klogd |= log_start - log_end;
+		if (!(cpu_online_map & 1UL << smp_processor_id()))
+			break;
 		if (con_start == log_end)
 			break;			/* Nothing to print */
 		_con_start = con_start;
 		_log_end = log_end;
 		con_start = log_end;		/* Flush */
 		spin_unlock_irqrestore(&logbuf_lock, flags);
 		call_console_drivers(_con_start, _log_end);
 	}
 	console_may_schedule = 0;
 	up(&console_sem);

Best regards.
--
NOMURA, Jun'ichi <j-nomura@ce.jp.nec.com, nomura@hpc.bs1.fc.nec.co.jp>
HPC Operating System Group, 1st Computers Software Division,
Computers Software Operations Unit, NEC Solutions.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processor initialization check)
  2001-12-03  5:46 [PATCH] 2.4.16 kernel/printk.c (per processor initialization check) j-nomura
@ 2001-12-03  9:20 ` Andrew Morton
  2001-12-03 10:32 ` j-nomura
  1 sibling, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2001-12-03  9:20 UTC (permalink / raw)
  To: j-nomura; +Cc: linux-kernel

j-nomura@ce.jp.nec.com wrote:
> 
> Hello,
> 
> I experienced system hang on my SMP machine and it turned out to be due to
> console write before mmu initialization completes.
> 
> To be more specific, even if secondary processors are not in status enough
> to do actual console I/O (e.g. mmu is not initialized), call_console_drivers()
> tries to do it.
> This leads to unpredictable result. For me, for example, it cause machine
> check abort and hang up system.
> 
> Attached is a patch for it.
> 
> --- kernel/printk.c     2001/11/27 04:41:49     1.1.1.8
> +++ kernel/printk.c     2001/12/03 05:25:26
> @@ -491,20 +491,22 @@
>   */
>  void release_console_sem(void)
>  {
>         unsigned long flags;
>         unsigned long _con_start, _log_end;
>         unsigned long must_wake_klogd = 0;
> 
>         for ( ; ; ) {
>                 spin_lock_irqsave(&logbuf_lock, flags);
>                 must_wake_klogd |= log_start - log_end;
> +               if (!(cpu_online_map & 1UL << smp_processor_id()))
> +                       break;
>                 if (con_start == log_end)
>                         break;                  /* Nothing to print */
>                 _con_start = con_start;
>                 _log_end = log_end;
>                 con_start = log_end;            /* Flush */
>                 spin_unlock_irqrestore(&logbuf_lock, flags);
>                 call_console_drivers(_con_start, _log_end);
>         }
>         console_may_schedule = 0;
>         up(&console_sem);
> 

Seems that there is some sort of ordering problem here - someone
is calling printk before the MMU is initialised, but after some
console drivers have been installed.

I suspect the real fix is elsewhere, but I'm not sure where.

Probably a clearer place to put this test would be within
printk itself, immediately before the down_trylock.  Does that
work?

-

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processor initialization check)
  2001-12-03  5:46 [PATCH] 2.4.16 kernel/printk.c (per processor initialization check) j-nomura
  2001-12-03  9:20 ` Andrew Morton
@ 2001-12-03 10:32 ` j-nomura
  2001-12-04  1:45   ` [PATCH] 2.4.16 kernel/printk.c (per processor initializationcheck) Andrew Morton
  2001-12-06  5:01   ` j-nomura
  1 sibling, 2 replies; 25+ messages in thread
From: j-nomura @ 2001-12-03 10:32 UTC (permalink / raw)
  To: akpm; +Cc: j-nomura, linux-kernel

Hi,

Thank you for commenting.

From: Andrew Morton <akpm@zip.com.au>
Subject: Re: [PATCH] 2.4.16 kernel/printk.c (per processor initialization check)
Date: Mon, 03 Dec 2001 01:20:28 -0800

> Seems that there is some sort of ordering problem here - someone
> is calling printk before the MMU is initialised, but after some
> console drivers have been installed.

Yes.
Because smp_init() is later in place than console_init(), printk() can be
called in such a situation.
For example, in IA-64, identify_cpu() is called before ia64_mmu_init(),
while identify_cpu() calls printk() in it.
I don't think the ordering itself is a problem.

> I suspect the real fix is elsewhere, but I'm not sure where.
> 
> Probably a clearer place to put this test would be within
> printk itself, immediately before the down_trylock.  Does that
> work?

The reason I put it in release_console_sem() is that release_console_sem()
can be called from other functions than printk(), e.g. console_unblank().
I agree with you that it is clearer but I think it is not sufficient.

Best regards.
--
NOMURA, Jun'ichi <j-nomura@ce.jp.nec.com, nomura@hpc.bs1.fc.nec.co.jp>
HPC Operating System Group, 1st Computers Software Division,
Computers Software Operations Unit, NEC Solutions.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processor initializationcheck)
  2001-12-03 10:32 ` j-nomura
@ 2001-12-04  1:45   ` Andrew Morton
  2001-12-06  5:01   ` j-nomura
  1 sibling, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2001-12-04  1:45 UTC (permalink / raw)
  To: j-nomura; +Cc: linux-kernel

j-nomura@ce.jp.nec.com wrote:
> 
> Hi,
> 
> Thank you for commenting.
> 
> From: Andrew Morton <akpm@zip.com.au>
> Subject: Re: [PATCH] 2.4.16 kernel/printk.c (per processor initialization check)
> Date: Mon, 03 Dec 2001 01:20:28 -0800
> 
> > Seems that there is some sort of ordering problem here - someone
> > is calling printk before the MMU is initialised, but after some
> > console drivers have been installed.
> 
> Yes.
> Because smp_init() is later in place than console_init(), printk() can be
> called in such a situation.
> For example, in IA-64, identify_cpu() is called before ia64_mmu_init(),
> while identify_cpu() calls printk() in it.
> I don't think the ordering itself is a problem.
> 
> > I suspect the real fix is elsewhere, but I'm not sure where.
> >
> > Probably a clearer place to put this test would be within
> > printk itself, immediately before the down_trylock.  Does that
> > work?
> 
> The reason I put it in release_console_sem() is that release_console_sem()
> can be called from other functions than printk(), e.g. console_unblank().
> I agree with you that it is clearer but I think it is not sufficient.
> 

I really doubt if any of those paths could be called before
even the MMU is set up.

It seems that the ia64 port has installed some console drivers,
and has then called them before it is ready to do so.  Via printk.

It should not have installed the console drivers that early.  Do
you know what console driver is causing the problem?

If the console driver is not fixable then a more general approach
would be, in printk.c:

#ifndef ARCH_HAS_PRINTK_MAY_BE_USED
#define printk_may_be_used() (1)
#endif

then, in printk() itself:

                if (*p == '\n')
                        log_level_unknown = 1;
        }

+	if (!printk_may_be_used())
+		return printed_len;

        if (!down_trylock(&console_sem)) {
                /*

then, for ia64, give it a printk_may_be_used() function, and
define ARCH_HAS_PRINTK_MAY_BE_USED somewhere.

Or just not install console drivers before they may be safely
used!

-

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processor initializationcheck)
  2001-12-03 10:32 ` j-nomura
  2001-12-04  1:45   ` [PATCH] 2.4.16 kernel/printk.c (per processor initializationcheck) Andrew Morton
@ 2001-12-06  5:01   ` j-nomura
  2001-12-07  3:40     ` [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) Andrew Morton
  1 sibling, 1 reply; 25+ messages in thread
From: j-nomura @ 2001-12-06  5:01 UTC (permalink / raw)
  To: akpm; +Cc: j-nomura, linux-kernel

Hi,
excuse me for not prompt response. I've been off-line for 2 days.

> > The reason I put it in release_console_sem() is that release_console_sem()
> > can be called from other functions than printk(), e.g. console_unblank().
> > I agree with you that it is clearer but I think it is not sufficient.
> 
> I really doubt if any of those paths could be called before
> even the MMU is set up.

I didn't have any intention to say that console_unblank() is called so early.

OK. Here is revised patch which checks if cpu initialization is done
just before down_trylock(). This works for me.

diff -u -r1.1.1.8 printk.c
--- kernel/printk.c	2001/11/27 04:41:49	1.1.1.8
+++ kernel/printk.c	2001/12/06 04:54:50
@@ -438,7 +438,13 @@
 			log_level_unknown = 1;
 	}
 
-	if (!down_trylock(&console_sem)) {
+	if (!(cpu_online_map & 1UL << smp_processor_id())) {
+		/*
+		 * The cpu has not been initialized completely
+		 * enough to call console drivers.
+		 */
+		spin_unlock_irqrestore(&logbuf_lock, flags);
+	} else if (!down_trylock(&console_sem)) {
 		/*
 		 * We own the drivers.  We can drop the spinlock and let
 		 * release_console_sem() print the text



Best regards.
--
NOMURA, Jun'ichi <j-nomura@ce.jp.nec.com, nomura@hpc.bs1.fc.nec.co.jp>
HPC Operating System Group, 1st Computers Software Division,
Computers Software Operations Unit, NEC Solutions.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-06  5:01   ` j-nomura
@ 2001-12-07  3:40     ` Andrew Morton
  2001-12-07 18:52       ` Marcelo Tosatti
  2001-12-07 21:37       ` David Mosberger
  0 siblings, 2 replies; 25+ messages in thread
From: Andrew Morton @ 2001-12-07  3:40 UTC (permalink / raw)
  To: j-nomura; +Cc: linux-kernel, David Mosberger, Marcelo Tosatti

j-nomura@ce.jp.nec.com wrote:
> 
> Hi,
> excuse me for not prompt response. I've been off-line for 2 days.
> 
> > > The reason I put it in release_console_sem() is that release_console_sem()
> > > can be called from other functions than printk(), e.g. console_unblank().
> > > I agree with you that it is clearer but I think it is not sufficient.
> >
> > I really doubt if any of those paths could be called before
> > even the MMU is set up.
> 

Marcelo,

after a fairly lengthy off-list discussion, it turns out that
special-casing printk is probably the best way to proceed
to fix this one.

The problem is that the boot processor sets up the console drivers,
and is able to call them via printk(), but the application processors
at that time are not able to call printk() because the console device
driver mappings are not set up.  Undoing this inside the ia64 boot code is
complex and fragile.   Possibly the problem exists on other platforms
but hasn't been discovered yet.

So the patch defines an architecture-specific macro `arch_consoles_callable()'
which, if not defined, defaults to `1', so the impact to other platforms
(and to uniprocessor ia64) is zero.



--- linux-2.4.17-pre4/include/asm-ia64/system.h	Thu Nov 22 23:02:59 2001
+++ linux-akpm/include/asm-ia64/system.h	Wed Dec  5 23:09:15 2001
@@ -405,6 +405,10 @@ extern void ia64_load_extra (struct task
 	ia64_psr(ia64_task_regs(prev))->dfh = 1;				\
 	__switch_to(prev,next,last);						\
   } while (0)
+
+/* Return true if this CPU can call the console drivers in printk() */
+#define arch_consoles_callable() (cpu_online_map & (1UL << smp_processor_id()))
+
 #else
 # define switch_to(prev,next,last) do {						\
 	ia64_psr(ia64_task_regs(next))->dfh = (ia64_get_fpu_owner() != (next));	\
--- linux-2.4.17-pre4/kernel/printk.c	Thu Nov 22 23:02:59 2001
+++ linux-akpm/kernel/printk.c	Wed Dec  5 23:03:54 2001
@@ -38,6 +38,10 @@
 
 #define LOG_BUF_MASK	(LOG_BUF_LEN-1)
 
+#ifndef arch_consoles_callable
+#define arch_consoles_callable() (1)
+#endif
+
 /* printk's without a loglevel use this.. */
 #define DEFAULT_MESSAGE_LOGLEVEL 4 /* KERN_WARNING */
 
@@ -438,6 +442,14 @@ asmlinkage int printk(const char *fmt, .
 			log_level_unknown = 1;
 	}
 
+	if (!arch_consoles_callable()) {
+		/*
+		 * On some architectures, the consoles are not usable
+		 * on secondary CPUs early in the boot process.
+		 */
+		spin_unlock_irqrestore(&logbuf_lock, flags);
+		goto out;
+	}
 	if (!down_trylock(&console_sem)) {
 		/*
 		 * We own the drivers.  We can drop the spinlock and let
@@ -454,6 +466,7 @@ asmlinkage int printk(const char *fmt, .
 		 */
 		spin_unlock_irqrestore(&logbuf_lock, flags);
 	}
+out:
 	return printed_len;
 }
 EXPORT_SYMBOL(printk);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-07  3:40     ` [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) Andrew Morton
@ 2001-12-07 18:52       ` Marcelo Tosatti
  2001-12-07 20:52         ` William Lee Irwin III
  2001-12-07 21:37       ` David Mosberger
  1 sibling, 1 reply; 25+ messages in thread
From: Marcelo Tosatti @ 2001-12-07 18:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: j-nomura, linux-kernel, David Mosberger


How hard would it be to fix that on ia64 code? 

I'm really not willing to apply this kludge... 


On Thu, 6 Dec 2001, Andrew Morton wrote:

> j-nomura@ce.jp.nec.com wrote:
> > 
> > Hi,
> > excuse me for not prompt response. I've been off-line for 2 days.
> > 
> > > > The reason I put it in release_console_sem() is that release_console_sem()
> > > > can be called from other functions than printk(), e.g. console_unblank().
> > > > I agree with you that it is clearer but I think it is not sufficient.
> > >
> > > I really doubt if any of those paths could be called before
> > > even the MMU is set up.
> > 
> 
> Marcelo,
> 
> after a fairly lengthy off-list discussion, it turns out that
> special-casing printk is probably the best way to proceed
> to fix this one.
> 
> The problem is that the boot processor sets up the console drivers,
> and is able to call them via printk(), but the application processors
> at that time are not able to call printk() because the console device
> driver mappings are not set up.  Undoing this inside the ia64 boot code is
> complex and fragile.   Possibly the problem exists on other platforms
> but hasn't been discovered yet.
> 
> So the patch defines an architecture-specific macro `arch_consoles_callable()'
> which, if not defined, defaults to `1', so the impact to other platforms
> (and to uniprocessor ia64) is zero.
> 
> 
> 
> --- linux-2.4.17-pre4/include/asm-ia64/system.h	Thu Nov 22 23:02:59 2001
> +++ linux-akpm/include/asm-ia64/system.h	Wed Dec  5 23:09:15 2001
> @@ -405,6 +405,10 @@ extern void ia64_load_extra (struct task
>  	ia64_psr(ia64_task_regs(prev))->dfh = 1;				\
>  	__switch_to(prev,next,last);						\
>    } while (0)
> +
> +/* Return true if this CPU can call the console drivers in printk() */
> +#define arch_consoles_callable() (cpu_online_map & (1UL << smp_processor_id()))
> +
>  #else
>  # define switch_to(prev,next,last) do {						\
>  	ia64_psr(ia64_task_regs(next))->dfh = (ia64_get_fpu_owner() != (next));	\
> --- linux-2.4.17-pre4/kernel/printk.c	Thu Nov 22 23:02:59 2001
> +++ linux-akpm/kernel/printk.c	Wed Dec  5 23:03:54 2001
> @@ -38,6 +38,10 @@
>  
>  #define LOG_BUF_MASK	(LOG_BUF_LEN-1)
>  
> +#ifndef arch_consoles_callable
> +#define arch_consoles_callable() (1)
> +#endif
> +
>  /* printk's without a loglevel use this.. */
>  #define DEFAULT_MESSAGE_LOGLEVEL 4 /* KERN_WARNING */
>  
> @@ -438,6 +442,14 @@ asmlinkage int printk(const char *fmt, .
>  			log_level_unknown = 1;
>  	}
>  
> +	if (!arch_consoles_callable()) {
> +		/*
> +		 * On some architectures, the consoles are not usable
> +		 * on secondary CPUs early in the boot process.
> +		 */
> +		spin_unlock_irqrestore(&logbuf_lock, flags);
> +		goto out;
> +	}
>  	if (!down_trylock(&console_sem)) {
>  		/*
>  		 * We own the drivers.  We can drop the spinlock and let
> @@ -454,6 +466,7 @@ asmlinkage int printk(const char *fmt, .
>  		 */
>  		spin_unlock_irqrestore(&logbuf_lock, flags);
>  	}
> +out:
>  	return printed_len;
>  }
>  EXPORT_SYMBOL(printk);
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-07 21:37       ` David Mosberger
@ 2001-12-07 20:47         ` Marcelo Tosatti
  2001-12-07 22:08         ` Alan Cox
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Marcelo Tosatti @ 2001-12-07 20:47 UTC (permalink / raw)
  To: David Mosberger; +Cc: Andrew Morton, j-nomura, linux-kernel



On Fri, 7 Dec 2001, David Mosberger wrote:

> >>>>> On Fri, 7 Dec 2001 16:52:07 -0200 (BRST), Marcelo Tosatti <marcelo@conectiva.com.br> said:
> 
>   Marcelo> I'm really not willing to apply this kludge...
> 
> Do you agree that it should always be safe to call printk() from C code?

No if you can't access the console to print the message :) 

Its just that I would prefer to see the thing fixed in arch-dependant code
instead special casing core code. 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-07 18:52       ` Marcelo Tosatti
@ 2001-12-07 20:52         ` William Lee Irwin III
  0 siblings, 0 replies; 25+ messages in thread
From: William Lee Irwin III @ 2001-12-07 20:52 UTC (permalink / raw)
  To: linux-kernel

On Fri, Dec 07, 2001 at 04:52:07PM -0200, Marcelo Tosatti wrote:
> How hard would it be to fix that on ia64 code? 
> I'm really not willing to apply this kludge... 

It's possible, but the off-list discussion's consensus centered
around this being a bootstrap ordering issue, where drivers may
not be called from the application processors (at least not universally)
until they have been initialized to some degree. printk() is in the
unique position of performing such calls, and that is the idea around
which this patch is centered. Specifically, on those architectures
where some initialization of kernel virtual addressing is required to
access memory regions from which console (or any) I/O is done, the
driver calls may not be done from application processors until they
have been initialized (on IA64, this is initializing the uncached
region's region register and installing TLB fault handlers).

So this is actually a broader issue than IA64 alone, though
IA64 seems to be the first to have encountered it. The patch here
provides a hook for all affected architectures to protect themselves
against this issue, although there are perhaps other ways.


Cheers,
Bill

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-07 22:17         ` David Mosberger
@ 2001-12-07 21:09           ` Marcelo Tosatti
  2001-12-08  1:10           ` David Mosberger
  1 sibling, 0 replies; 25+ messages in thread
From: Marcelo Tosatti @ 2001-12-07 21:09 UTC (permalink / raw)
  To: David Mosberger; +Cc: Andrew Morton, j-nomura, linux-kernel



On Fri, 7 Dec 2001, David Mosberger wrote:

> >>>>> On Fri, 7 Dec 2001 18:47:11 -0200 (BRST), Marcelo Tosatti <marcelo@conectiva.com.br> said:
> 
>   Marcelo> On Fri, 7 Dec 2001, David Mosberger wrote:
> 
>   >> >>>>> On Fri, 7 Dec 2001 16:52:07 -0200 (BRST), Marcelo Tosatti
>   >> <marcelo@conectiva.com.br> said:
>   >> 
>   Marcelo> I'm really not willing to apply this kludge...
>   >>  Do you agree that it should always be safe to call printk() from
>   >> C code?
> 
>   Marcelo> No if you can't access the console to print the message :)
> 
> Let me quote the first few lines of the Linux kernel:
> 
> 	----
> 	asmlinkage void __init start_kernel(void)
> 	{
> 		char * command_line;
> 		unsigned long mempages;
> 		extern char saved_command_line[];
> 	/*
> 	 * Interrupts are still disabled. Do necessary setups, then
> 	 * enable them
> 	 */
> 		lock_kernel();
> 		printk(linux_banner);
> 	----
> 
> You still think it doesn't make sense?

Ok, it does. However, this still does not make me change my mind.

>   Marcelo> Its just that I would prefer to see the thing fixed in
>   Marcelo> arch-dependant code instead special casing core code.
> 
> Only architecture specific problems should be fixed with architecture
> specific code.
> 
> I'm not entirely sure whether this particular problem is architecture
> specific.  Perhaps it is and, if so, I'm certainly happy to fix it in
> the ia64 specific code. However, are you really 100% certain that on
> x86 there are no console drivers which in some fashion depend on
> cpu_init() having completed execution?  Note that the x86 version of
> cpu_init() also has printk()s.  If you're not certain of this, the AP
> startup problem could occur on x86, too.  I haven't looked at all the
> other platforms, but I suspect similar things will be true, there.

Prove, please. If you show me it can also happen on other architectures,
I'll be glad to apply the patch.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-07  3:40     ` [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) Andrew Morton
  2001-12-07 18:52       ` Marcelo Tosatti
@ 2001-12-07 21:37       ` David Mosberger
  2001-12-07 20:47         ` Marcelo Tosatti
                           ` (3 more replies)
  1 sibling, 4 replies; 25+ messages in thread
From: David Mosberger @ 2001-12-07 21:37 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Andrew Morton, j-nomura, linux-kernel, David Mosberger

>>>>> On Fri, 7 Dec 2001 16:52:07 -0200 (BRST), Marcelo Tosatti <marcelo@conectiva.com.br> said:

  Marcelo> I'm really not willing to apply this kludge...

Do you agree that it should always be safe to call printk() from C code?

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-07 21:37       ` David Mosberger
  2001-12-07 20:47         ` Marcelo Tosatti
@ 2001-12-07 22:08         ` Alan Cox
  2001-12-07 22:14         ` Christopher Friesen
  2001-12-07 22:17         ` David Mosberger
  3 siblings, 0 replies; 25+ messages in thread
From: Alan Cox @ 2001-12-07 22:08 UTC (permalink / raw)
  To: davidm; +Cc: Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel

> >>>>> On Fri, 7 Dec 2001 16:52:07 -0200 (BRST), Marcelo Tosatti <marcelo@conectiva.com.br> said:
>   Marcelo> I'm really not willing to apply this kludge...
> 
> Do you agree that it should always be safe to call printk() from C code?

Sounds a good goal, but surely thats up to the arch code to get right

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-07 21:37       ` David Mosberger
  2001-12-07 20:47         ` Marcelo Tosatti
  2001-12-07 22:08         ` Alan Cox
@ 2001-12-07 22:14         ` Christopher Friesen
  2001-12-07 22:17         ` David Mosberger
  3 siblings, 0 replies; 25+ messages in thread
From: Christopher Friesen @ 2001-12-07 22:14 UTC (permalink / raw)
  To: davidm; +Cc: Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel

David Mosberger wrote:
> 
> >>>>> On Fri, 7 Dec 2001 16:52:07 -0200 (BRST), Marcelo Tosatti <marcelo@conectiva.com.br> said:
> 
>   Marcelo> I'm really not willing to apply this kludge...
> 
> Do you agree that it should always be safe to call printk() from C code?

Is it really safe to call this from interrupt handlers?  I can think of cases
where the time required to print can totally mess stuff up...

Chris

-- 
Chris Friesen                    | MailStop: 043/33/F10  
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-07 21:37       ` David Mosberger
                           ` (2 preceding siblings ...)
  2001-12-07 22:14         ` Christopher Friesen
@ 2001-12-07 22:17         ` David Mosberger
  2001-12-07 21:09           ` Marcelo Tosatti
  2001-12-08  1:10           ` David Mosberger
  3 siblings, 2 replies; 25+ messages in thread
From: David Mosberger @ 2001-12-07 22:17 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: David Mosberger, Andrew Morton, j-nomura, linux-kernel

>>>>> On Fri, 7 Dec 2001 18:47:11 -0200 (BRST), Marcelo Tosatti <marcelo@conectiva.com.br> said:

  Marcelo> On Fri, 7 Dec 2001, David Mosberger wrote:

  >> >>>>> On Fri, 7 Dec 2001 16:52:07 -0200 (BRST), Marcelo Tosatti
  >> <marcelo@conectiva.com.br> said:
  >> 
  Marcelo> I'm really not willing to apply this kludge...
  >>  Do you agree that it should always be safe to call printk() from
  >> C code?

  Marcelo> No if you can't access the console to print the message :)

Let me quote the first few lines of the Linux kernel:

	----
	asmlinkage void __init start_kernel(void)
	{
		char * command_line;
		unsigned long mempages;
		extern char saved_command_line[];
	/*
	 * Interrupts are still disabled. Do necessary setups, then
	 * enable them
	 */
		lock_kernel();
		printk(linux_banner);
	----

You still think it doesn't make sense?

  Marcelo> Its just that I would prefer to see the thing fixed in
  Marcelo> arch-dependant code instead special casing core code.

Only architecture specific problems should be fixed with architecture
specific code.

I'm not entirely sure whether this particular problem is architecture
specific.  Perhaps it is and, if so, I'm certainly happy to fix it in
the ia64 specific code. However, are you really 100% certain that on
x86 there are no console drivers which in some fashion depend on
cpu_init() having completed execution?  Note that the x86 version of
cpu_init() also has printk()s.  If you're not certain of this, the AP
startup problem could occur on x86, too.  I haven't looked at all the
other platforms, but I suspect similar things will be true, there.

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-07 22:17         ` David Mosberger
  2001-12-07 21:09           ` Marcelo Tosatti
@ 2001-12-08  1:10           ` David Mosberger
  2001-12-08 11:27             ` Alan Cox
  2001-12-08 16:41             ` David Mosberger
  1 sibling, 2 replies; 25+ messages in thread
From: David Mosberger @ 2001-12-08  1:10 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: David Mosberger, Andrew Morton, j-nomura, linux-kernel

>>>>> On Fri, 7 Dec 2001 19:09:03 -0200 (BRST), Marcelo Tosatti <marcelo@conectiva.com.br> said:

  >> I'm not entirely sure whether this particular problem is
  >> architecture specific.  Perhaps it is and, if so, I'm certainly
  >> happy to fix it in the ia64 specific code. However, are you
  >> really 100% certain that on x86 there are no console drivers
  >> which in some fashion depend on cpu_init() having completed
  >> execution?  Note that the x86 version of cpu_init() also has
  >> printk()s.  If you're not certain of this, the AP startup problem
  >> could occur on x86, too.  I haven't looked at all the other
  >> platforms, but I suspect similar things will be true, there.

  Marcelo> Prove, please. If you show me it can also happen on other
  Marcelo> architectures, I'll be glad to apply the patch.

I'm no x86 expert, but I have the impression that
current_cpu_data.loops_per_jiffy will be invalid (probably 0) until
smp_store_cpu_info() is called in smp_callin().  If so, a console
driver using udelay() might not work properly.  I suspect there are
other issues, but this is just based on looking at the x86 source code
for 5 minutes.

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-08  1:10           ` David Mosberger
@ 2001-12-08 11:27             ` Alan Cox
  2001-12-08 16:41             ` David Mosberger
  1 sibling, 0 replies; 25+ messages in thread
From: Alan Cox @ 2001-12-08 11:27 UTC (permalink / raw)
  To: davidm; +Cc: Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel

> I'm no x86 expert, but I have the impression that
> current_cpu_data.loops_per_jiffy will be invalid (probably 0) until
> smp_store_cpu_info() is called in smp_callin().  If so, a console
> driver using udelay() might not work properly.  I suspect there are
> other issues, but this is just based on looking at the x86 source code
> for 5 minutes.

x86_udelay_tsc wont have been set at that point so the main timer is still
being used.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-08  1:10           ` David Mosberger
  2001-12-08 11:27             ` Alan Cox
@ 2001-12-08 16:41             ` David Mosberger
  2001-12-08 20:45               ` Alan Cox
  2001-12-09  0:32               ` David Mosberger
  1 sibling, 2 replies; 25+ messages in thread
From: David Mosberger @ 2001-12-08 16:41 UTC (permalink / raw)
  To: Alan Cox; +Cc: davidm, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel

>>>>> On Sat, 8 Dec 2001 11:27:19 +0000 (GMT), Alan Cox <alan@lxorguk.ukuu.org.uk> said:

  >> I'm no x86 expert, but I have the impression that
  >> current_cpu_data.loops_per_jiffy will be invalid (probably 0)
  >> until smp_store_cpu_info() is called in smp_callin().  If so, a
  >> console driver using udelay() might not work properly.  I suspect
  >> there are other issues, but this is just based on looking at the
  >> x86 source code for 5 minutes.

  Alan> x86_udelay_tsc wont have been set at that point so the main
  Alan> timer is still being used.

x86 does use current_cpu_data.loops_per_jiffy in the non-TSC case, no?

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-08 16:41             ` David Mosberger
@ 2001-12-08 20:45               ` Alan Cox
  2001-12-09  0:32               ` David Mosberger
  1 sibling, 0 replies; 25+ messages in thread
From: Alan Cox @ 2001-12-08 20:45 UTC (permalink / raw)
  To: davidm; +Cc: Alan Cox, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel

>   Alan> x86_udelay_tsc wont have been set at that point so the main
>   Alan> timer is still being used.
> 
> x86 does use current_cpu_data.loops_per_jiffy in the non-TSC case, no?

I believe so.  So we should propogate that across earlier, although its
not needed for our current console drivers that I can see

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-08 16:41             ` David Mosberger
  2001-12-08 20:45               ` Alan Cox
@ 2001-12-09  0:32               ` David Mosberger
  2001-12-09  0:55                 ` Alan Cox
  2001-12-09  0:58                 ` David Mosberger
  1 sibling, 2 replies; 25+ messages in thread
From: David Mosberger @ 2001-12-09  0:32 UTC (permalink / raw)
  To: Alan Cox; +Cc: davidm, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel

>>>>> On Sat, 8 Dec 2001 20:45:07 +0000 (GMT), Alan Cox <alan@lxorguk.ukuu.org.uk> said:

  Alan> x86_udelay_tsc wont have been set at that point so the main
  Alan> timer is still being used.

  >>  x86 does use current_cpu_data.loops_per_jiffy in the non-TSC
  >> case, no?

  Alan> I believe so.  So we should propogate that across earlier,
  Alan> although its not needed for our current console drivers that I
  Alan> can see

I don't think you can do it early enough.  calibrate_delay() requires
irqs to be enabled and the first printk() happens long before irqs are
enabled on an AP.

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-09  0:32               ` David Mosberger
@ 2001-12-09  0:55                 ` Alan Cox
  2001-12-09  0:58                 ` David Mosberger
  1 sibling, 0 replies; 25+ messages in thread
From: Alan Cox @ 2001-12-09  0:55 UTC (permalink / raw)
  To: davidm; +Cc: Alan Cox, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel

> I don't think you can do it early enough.  calibrate_delay() requires
> irqs to be enabled and the first printk() happens long before irqs are
> enabled on an AP.

So we make sure our initial console code doesnt need udelay(), or set an
initial safe default like 25MHz

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-09  0:32               ` David Mosberger
  2001-12-09  0:55                 ` Alan Cox
@ 2001-12-09  0:58                 ` David Mosberger
  2001-12-09  1:14                   ` David Mosberger
  2001-12-09  1:15                   ` Alan Cox
  1 sibling, 2 replies; 25+ messages in thread
From: David Mosberger @ 2001-12-09  0:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: davidm, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel

>>>>> On Sun, 9 Dec 2001 00:55:25 +0000 (GMT), Alan Cox <alan@lxorguk.ukuu.org.uk> said:

  >> I don't think you can do it early enough.  calibrate_delay()
  >> requires irqs to be enabled and the first printk() happens long
  >> before irqs are enabled on an AP.

  Alan> So we make sure our initial console code doesnt need udelay(),
  Alan> or set an initial safe default like 25MHz

So someone is going to maintain a list of what a console driver can
and cannot do for all 12+ ports in existence?

The alternative is to do:

--- linux-2.4.16/kernel/printk.c	Mon Nov 26 11:19:24 2001
+++ lia64-kdb/kernel/printk.c	Thu Nov 29 21:45:08 2001
@@ -498,6 +505,10 @@
 	for ( ; ; ) {
 		spin_lock_irqsave(&logbuf_lock, flags);
 		must_wake_klogd |= log_start - log_end;
+#ifdef CONFIG_SMP
+		if (!(cpu_online_map & (1UL << smp_processor_id())))
+			break;
+#endif
 		if (con_start == log_end)
 			break;			/* Nothing to print */
 		_con_start = con_start;

and be done with it.

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-09  0:58                 ` David Mosberger
@ 2001-12-09  1:14                   ` David Mosberger
  2001-12-09  1:32                     ` Alan Cox
  2001-12-09  1:15                   ` Alan Cox
  1 sibling, 1 reply; 25+ messages in thread
From: David Mosberger @ 2001-12-09  1:14 UTC (permalink / raw)
  To: Alan Cox; +Cc: davidm, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel

>>>>> On Sun, 9 Dec 2001 01:15:25 +0000 (GMT), Alan Cox <alan@lxorguk.ukuu.org.uk> said:

  Alan> And break the ability for non broken setups to debug SMP boot
  Alan> up. Lets do the job properly.

Then use Andrew's patch (attached below).

	--david

--- linux-2.4.17-pre4/include/asm-ia64/system.h	Thu Nov 22 23:02:59 2001
+++ linux-akpm/include/asm-ia64/system.h	Wed Dec  5 23:09:15 2001
@@ -405,6 +405,10 @@ extern void ia64_load_extra (struct task
 	ia64_psr(ia64_task_regs(prev))->dfh = 1;				\
 	__switch_to(prev,next,last);						\
   } while (0)
+
+/* Return true if this CPU can call the console drivers in printk() */
+#define arch_consoles_callable() (cpu_online_map & (1UL << smp_processor_id()))
+
 #else
 # define switch_to(prev,next,last) do {						\
 	ia64_psr(ia64_task_regs(next))->dfh = (ia64_get_fpu_owner() != (next));	\
--- linux-2.4.17-pre4/kernel/printk.c	Thu Nov 22 23:02:59 2001
+++ linux-akpm/kernel/printk.c	Wed Dec  5 23:03:54 2001
@@ -38,6 +38,10 @@
 
 #define LOG_BUF_MASK	(LOG_BUF_LEN-1)
 
+#ifndef arch_consoles_callable
+#define arch_consoles_callable() (1)
+#endif
+
 /* printk's without a loglevel use this.. */
 #define DEFAULT_MESSAGE_LOGLEVEL 4 /* KERN_WARNING */
 
@@ -438,6 +442,14 @@ asmlinkage int printk(const char *fmt, .
 			log_level_unknown = 1;
 	}
 
+	if (!arch_consoles_callable()) {
+		/*
+		 * On some architectures, the consoles are not usable
+		 * on secondary CPUs early in the boot process.
+		 */
+		spin_unlock_irqrestore(&logbuf_lock, flags);
+		goto out;
+	}
 	if (!down_trylock(&console_sem)) {
 		/*
 		 * We own the drivers.  We can drop the spinlock and let
@@ -454,6 +466,7 @@ asmlinkage int printk(const char *fmt, .
 		 */
 		spin_unlock_irqrestore(&logbuf_lock, flags);
 	}
+out:
 	return printed_len;
 }
 EXPORT_SYMBOL(printk);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-09  0:58                 ` David Mosberger
  2001-12-09  1:14                   ` David Mosberger
@ 2001-12-09  1:15                   ` Alan Cox
  1 sibling, 0 replies; 25+ messages in thread
From: Alan Cox @ 2001-12-09  1:15 UTC (permalink / raw)
  To: davidm; +Cc: Alan Cox, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel

>   Alan> So we make sure our initial console code doesnt need udelay(),
>   Alan> or set an initial safe default like 25MHz
> 
> So someone is going to maintain a list of what a console driver can
> and cannot do for all 12+ ports in existence?
> 
> The alternative is to do:

And break the ability for non broken setups to debug SMP boot up. Lets do
the job properly.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
  2001-12-09  1:14                   ` David Mosberger
@ 2001-12-09  1:32                     ` Alan Cox
  0 siblings, 0 replies; 25+ messages in thread
From: Alan Cox @ 2001-12-09  1:32 UTC (permalink / raw)
  To: davidm; +Cc: Alan Cox, Marcelo Tosatti, Andrew Morton, j-nomura, linux-kernel

>   Alan> And break the ability for non broken setups to debug SMP boot
>   Alan> up. Lets do the job properly.
> 
> Then use Andrew's patch (attached below).

No objection.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck)
@ 2001-12-08 17:36 Manfred Spraul
  0 siblings, 0 replies; 25+ messages in thread
From: Manfred Spraul @ 2001-12-08 17:36 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

Alan Cox wrote:
> x86_udelay_tsc wont have been set at that point so the main timer is still
> being used.

No. x86_udelay_tsc is initialized by time_init(), and time_init() is called before
smp_init(). The udelay implementation only multiplies with loops_per_jiffy,
therefore there is no oops on i386.

But could oops if the bios disables the TSC instruction - the first printk on
the secondary cpu happens before

     clear_in_cr4(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE)

--
    Manfred


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2001-12-09  1:24 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-12-03  5:46 [PATCH] 2.4.16 kernel/printk.c (per processor initialization check) j-nomura
2001-12-03  9:20 ` Andrew Morton
2001-12-03 10:32 ` j-nomura
2001-12-04  1:45   ` [PATCH] 2.4.16 kernel/printk.c (per processor initializationcheck) Andrew Morton
2001-12-06  5:01   ` j-nomura
2001-12-07  3:40     ` [PATCH] 2.4.16 kernel/printk.c (per processorinitializationcheck) Andrew Morton
2001-12-07 18:52       ` Marcelo Tosatti
2001-12-07 20:52         ` William Lee Irwin III
2001-12-07 21:37       ` David Mosberger
2001-12-07 20:47         ` Marcelo Tosatti
2001-12-07 22:08         ` Alan Cox
2001-12-07 22:14         ` Christopher Friesen
2001-12-07 22:17         ` David Mosberger
2001-12-07 21:09           ` Marcelo Tosatti
2001-12-08  1:10           ` David Mosberger
2001-12-08 11:27             ` Alan Cox
2001-12-08 16:41             ` David Mosberger
2001-12-08 20:45               ` Alan Cox
2001-12-09  0:32               ` David Mosberger
2001-12-09  0:55                 ` Alan Cox
2001-12-09  0:58                 ` David Mosberger
2001-12-09  1:14                   ` David Mosberger
2001-12-09  1:32                     ` Alan Cox
2001-12-09  1:15                   ` Alan Cox
2001-12-08 17:36 Manfred Spraul

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.