linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] panic: fix incomplete panic log in panic()
@ 2012-10-11  8:03 Qing Zhu
  2012-10-11 21:18 ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Qing Zhu @ 2012-10-11  8:03 UTC (permalink / raw)
  To: akpm, mingo, ben, markivx, ak
  Cc: linux-kernel, cxie4, binw, wwang27, njun, Qing Zhu

Panic log should be printed on the console, but if someone lock the
console when panic, console won't print out panic log.

The incomplete panic log issue will happen in below scenarios:
1. One task call console_lock(), then panic happend before it call
console_unlock(). No panic log can be printed.
2. Cpu 0 call panic()->Cpu 1 call console_lock()->Cpu 0 call
smp_send_stop()
Cpu 1 will be stopped and can't unlock console,only top part of panic log
will be printed.

So unlock console anyway in panic() to keep panic log printed.

Signed-off-by: Qing Zhu <qzhu@marvell.com>
---
 kernel/panic.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index e1b2822..3924d25 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -23,6 +23,7 @@
 #include <linux/init.h>
 #include <linux/nmi.h>
 #include <linux/dmi.h>
+#include <linux/console.h>
 
 #define PANIC_TIMER_STEP 100
 #define PANIC_BLINK_SPD 18
@@ -127,6 +128,13 @@ void panic(const char *fmt, ...)
 
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
+	/*
+	 * Unlock the console anyway here, in case it's occupied by another
+	 * one which has no chance to unlock the console thus prevents the
+	 * panic log prints on the console.
+	 */
+	console_unlock();
+
 	bust_spinlocks(0);
 
 	if (!panic_blink)
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] panic: fix incomplete panic log in panic()
  2012-10-11  8:03 [PATCH] panic: fix incomplete panic log in panic() Qing Zhu
@ 2012-10-11 21:18 ` Andrew Morton
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2012-10-11 21:18 UTC (permalink / raw)
  To: Qing Zhu
  Cc: mingo, ben, markivx, ak, linux-kernel, cxie4, binw, wwang27, njun

On Thu, 11 Oct 2012 16:03:07 +0800
Qing Zhu <qzhu@marvell.com> wrote:

> Panic log should be printed on the console, but if someone lock the
> console when panic, console won't print out panic log.
> 
> The incomplete panic log issue will happen in below scenarios:
> 1. One task call console_lock(), then panic happend before it call
> console_unlock(). No panic log can be printed.
> 2. Cpu 0 call panic()->Cpu 1 call console_lock()->Cpu 0 call
> smp_send_stop()
> Cpu 1 will be stopped and can't unlock console,only top part of panic log
> will be printed.
> 
> So unlock console anyway in panic() to keep panic log printed.
> 
> Signed-off-by: Qing Zhu <qzhu@marvell.com>
> ---
>  kernel/panic.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index e1b2822..3924d25 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -23,6 +23,7 @@
>  #include <linux/init.h>
>  #include <linux/nmi.h>
>  #include <linux/dmi.h>
> +#include <linux/console.h>
>  
>  #define PANIC_TIMER_STEP 100
>  #define PANIC_BLINK_SPD 18
> @@ -127,6 +128,13 @@ void panic(const char *fmt, ...)
>  
>  	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>  
> +	/*
> +	 * Unlock the console anyway here, in case it's occupied by another
> +	 * one which has no chance to unlock the console thus prevents the
> +	 * panic log prints on the console.
> +	 */
> +	console_unlock();
> +
>  	bust_spinlocks(0);
>  
>  	if (!panic_blink)

hm.  console_unlock() does a large amount of work, and it seems risky
to do all of that when the system is in a bad-and-getting-worse state.

Is there some more modest thing we could do here, for example,

	if (console_locked) {
		up(&console_sem);
		console_locked = 0;
	}

or something along those lines?

Also, perhaps this operation should be moved into bust_spinlocks(). 
What would have happened if your code had triggered an oops, rather
than called panic()?



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] panic: fix incomplete panic log in panic()
  2012-10-22 14:54         ` Qing Z
@ 2012-10-22 19:51           ` Andrew Morton
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2012-10-22 19:51 UTC (permalink / raw)
  To: Qing Z
  Cc: mingo, ben, markivx, ak, linux-kernel, cxie4, binw, wwang27,
	xjian, zhangwm, Qing Zhu

On Mon, 22 Oct 2012 22:54:54 +0800
Qing Z <njumical@gmail.com> wrote:

> Hi Andrew,
>           Basically, console_unlock() should be called to make panic
> log printed.  Call console_unlock() in panic have some risks when
> recurse in it(are there other bad cases?). The condition is very rare
> and the two issue cases I list always happen between console_lock()
> and console_unlock(). So I think we need to couple with
> console_lock(), but should avoid the case that panic happen in
> console_unlock().  I think it is a more modest and safe way.  Please
> corect me if there is something wrong. Thanks!
> 
> bool Is_in_console_unlock;
> void console_unlock(void)
> {
>        ...
> +      Is_in_console_unlock = ture;
>         /* flush buffered message fragment immediately to console */
>         console_cont_flush(text, sizeof(text));
> again:
>         for (;;) {
>         ....
>  +       Is_in_console_unlock = false;
> }
> 
> void panic(const char *fmt, ...)
> {
>         ....
> +      /*
> +       * we should unlock console here to make oops log printed, in case
> +       * console is locked before panic in this cpu, or other cpus lock the
> +       * console before be stopped.
> +       */
> +      if( unlikely(console_locked) && !Is_in_console_unlock )
> +           {
> +                  console_unlock();
> +                  console_locked = 0;
> +            }
> 
>         /*
>          * Note smp_send_stop is the usual smp shutdown function, which
>          * unfortunately means it may not be hardened to work in a panic
>          * situation.
>          */
>         smp_send_stop();
> 
>         ....
> }

Well, if something like that will solve the problem then yes, I guess
that is the way to go.  It's not pretty, but it is clear and direct,
and this isn't a pretty problem!

But is this approach sufficient?  What happens in the case of an oops
or a BUG() inside console_lock()?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] panic: fix incomplete panic log in panic()
  2012-10-18  0:06       ` Andrew Morton
@ 2012-10-22 14:54         ` Qing Z
  2012-10-22 19:51           ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Qing Z @ 2012-10-22 14:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mingo, ben, markivx, ak, linux-kernel, cxie4, binw, wwang27,
	xjian, zhangwm, Qing Zhu

2012/10/18 Andrew Morton <akpm@linux-foundation.org>:
> On Wed, 17 Oct 2012 18:44:32 +0800
> Qing Z <njumical@gmail.com> wrote:
>
>>          In ./drivers/video/fbmem.c, codes below cause issues:
>>
>>                             case FBIOPAN_DISPLAY:
>>                       ...
>>                       console_lock();
>>                       ret = fb_pan_display(info, &var);
>>                       console_unlock();
>>                       ...
>>                       break;
>>
>> issue case 1:
>> 1. core 0 call console_lock();
>> 2. panic;
>> ...
>> 4. panic process done.
>> Result: all panic log won't be printed.
>>
>> issue case 2:
>> 1. core 0 panic;
>> 2. core 1 call console_lock();
>> 3. core 0 call smp_send_stop(), core1 stop;
>> 4. core 0 panic process done.
>> Result: only little top part of panic log will be printed.
>>
>> My soluiton according to your suggestions:
>>
>> As you said, the first priority is to get oops message reliably
>> delivered. I think we needn't care about console_sem when panic, just
>> make sure we print the log imediately, so add
>> sema_init(&console_sem,1) in bust_spinlocks(0), just like zap_locks()
>> do.  It is safer than console_unlock() or up().
>
> hm, I see.
>
>> We can't add sema_init(..) in bust_spinlocks(1) due to issue case2,
>> although the condition is rare. About issue case 2: should we avoid
>> call console_lock() when panic?
>
> Well, I think we do have infrastructure to support that:
>
> +       if (!oops_in_progress)
>                 console_lock();
>
> I haven't looked to see how practical that approach would be.
>
>
> It would be better if we were to do
>
>         if (oops_in_progress)
>                 console_trylock();
>         else
>                 console_lock();
>
> where console_trylock() would *try* to do a console_lock() but would
> bail out if it was unable to immediately take the lock.  This is better
> because most of the time, the oopsing CPU *will* lock the console and
> will prevent other code from getting into the console code and messing
> things up.
>
> A problem with this approach is that it is very hard to test - the
> "console_trylock failed" case will be rare.
>
>
> I think it would be acceptable to just skip over the console_lock() if
> oops_in_progress is set.  And if we skipped the console_lock(), we
> should also skip the console_unlock().  So something like:
>
>         bool console_unlock_needed = true;
>
>         if (unlikely(oops_in_progress))
>                 console_unlock_needed = false;
>         else
>                 console_lock();
>
>         ...
>
>         if (console_unlock_needed)
>                 console_unlock();
>
>
>> If we init console_sem in panic, old text may be flushed too, but
>> should be before panic oops message. Also we can fix it by updating
>> con_start("con_start = log_end") once panic happen, only log after
>> panic will be printed.
Hi Andrew,
          Basically, console_unlock() should be called to make panic
log printed.  Call console_unlock() in panic have some risks when
recurse in it(are there other bad cases?). The condition is very rare
and the two issue cases I list always happen between console_lock()
and console_unlock(). So I think we need to couple with
console_lock(), but should avoid the case that panic happen in
console_unlock().  I think it is a more modest and safe way.  Please
corect me if there is something wrong. Thanks!

bool Is_in_console_unlock;
void console_unlock(void)
{
       ...
+      Is_in_console_unlock = ture;
        /* flush buffered message fragment immediately to console */
        console_cont_flush(text, sizeof(text));
again:
        for (;;) {
        ....
 +       Is_in_console_unlock = false;
}

void panic(const char *fmt, ...)
{
        ....
+      /*
+       * we should unlock console here to make oops log printed, in case
+       * console is locked before panic in this cpu, or other cpus lock the
+       * console before be stopped.
+       */
+      if( unlikely(console_locked) && !Is_in_console_unlock )
+           {
+                  console_unlock();
+                  console_locked = 0;
+            }

        /*
         * Note smp_send_stop is the usual smp shutdown function, which
         * unfortunately means it may not be hardened to work in a panic
         * situation.
         */
        smp_send_stop();

        ....
}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] panic: fix incomplete panic log in panic()
  2012-10-17 10:44     ` Qing Z
@ 2012-10-18  0:06       ` Andrew Morton
  2012-10-22 14:54         ` Qing Z
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2012-10-18  0:06 UTC (permalink / raw)
  To: Qing Z
  Cc: mingo, ben, markivx, ak, linux-kernel, cxie4, binw, wwang27,
	xjian, zhangwm, Qing Zhu

On Wed, 17 Oct 2012 18:44:32 +0800
Qing Z <njumical@gmail.com> wrote:

>          In ./drivers/video/fbmem.c, codes below cause issues:
> 	
>                             case FBIOPAN_DISPLAY:
> 			...
> 			console_lock();
> 			ret = fb_pan_display(info, &var);
> 			console_unlock();
> 			...
> 			break;
> 
> issue case 1:
> 1. core 0 call console_lock();
> 2. panic;
> ...
> 4. panic process done.
> Result: all panic log won't be printed.
> 
> issue case 2:
> 1. core 0 panic;
> 2. core 1 call console_lock();
> 3. core 0 call smp_send_stop(), core1 stop;
> 4. core 0 panic process done.
> Result: only little top part of panic log will be printed.
> 
> My soluiton according to your suggestions:
> 
> As you said, the first priority is to get oops message reliably
> delivered. I think we needn't care about console_sem when panic, just
> make sure we print the log imediately, so add
> sema_init(&console_sem,1) in bust_spinlocks(0), just like zap_locks()
> do.  It is safer than console_unlock() or up().

hm, I see.

> We can't add sema_init(..) in bust_spinlocks(1) due to issue case2,
> although the condition is rare. About issue case 2: should we avoid
> call console_lock() when panic?

Well, I think we do have infrastructure to support that:

+	if (!oops_in_progress)
		console_lock();

I haven't looked to see how practical that approach would be.


It would be better if we were to do

	if (oops_in_progress)
		console_trylock();
	else
		console_lock();

where console_trylock() would *try* to do a console_lock() but would
bail out if it was unable to immediately take the lock.  This is better
because most of the time, the oopsing CPU *will* lock the console and
will prevent other code from getting into the console code and messing
things up.

A problem with this approach is that it is very hard to test - the
"console_trylock failed" case will be rare.


I think it would be acceptable to just skip over the console_lock() if
oops_in_progress is set.  And if we skipped the console_lock(), we
should also skip the console_unlock().  So something like:

	bool console_unlock_needed = true;

	if (unlikely(oops_in_progress))
		console_unlock_needed = false;
	else
		console_lock();

	...

	if (console_unlock_needed)
		console_unlock();


> If we init console_sem in panic, old text may be flushed too, but
> should be before panic oops message. Also we can fix it by updating
> con_start("con_start = log_end") once panic happen, only log after
> panic will be printed.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] panic: fix incomplete panic log in panic()
  2012-10-15 22:02   ` Andrew Morton
  2012-10-15 22:06     ` Andi Kleen
  2012-10-16 14:25     ` Qing Z
@ 2012-10-17 10:44     ` Qing Z
  2012-10-18  0:06       ` Andrew Morton
  2 siblings, 1 reply; 10+ messages in thread
From: Qing Z @ 2012-10-17 10:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mingo, ben, markivx, ak, linux-kernel, cxie4, binw, wwang27,
	xjian, zhangwm, Qing Zhu

2012/10/16 Andrew Morton <akpm@linux-foundation.org>:
> On Mon, 15 Oct 2012 19:38:46 +0800
> Qing Z <njumical@gmail.com> wrote:
>
>> >>       atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>> >>
>> >> +     /*
>> >> +      * Unlock the console anyway here, in case it's occupied by another
>> >> +      * one which has no chance to unlock the console thus prevents the
>> >> +      * panic log prints on the console.
>> >> +      */
>> >> +     console_unlock();
>> >> +
>> >>       bust_spinlocks(0);
>> >>
>> >>       if (!panic_blink)
>> >
>> > hm.  console_unlock() does a large amount of work, and it seems risky
>> > to do all of that when the system is in a bad-and-getting-worse state.
>> >
>> > Is there some more modest thing we could do here, for example,
>> >
>> >         if (console_locked) {
>> >                 up(&console_sem);
>> >                 console_locked = 0;
>> >         }
>> >
>> > or something along those lines?
>> >
>> > Also, perhaps this operation should be moved into bust_spinlocks().
>> > What would have happened if your code had triggered an oops, rather
>> > than called panic()?
>> >
>> >
>> Hi Andrew,
>>       Thanks for your reply!
>>       For your question" What would have happened if your code had
>> triggered an oops, rather than called panic()?", actually we found the
>> issue when trigger an oops. When we call FBIOPAN_DISPLAY in
>> ./drivers/video/fbmem.c, it will first lock console, if we trigger an
>> oops before unlock console, the issue happen. It also exist when call
>> panic() directly in the same case. It is a common issue for panic
>> process.
>>       I have two options for solution:
>>       1. I agree with your suggestion that add some modest thing in
>> bust_spinlocks(), bust_spinlocks() is supposed to clear spinlocks
>> which prevent oops information from reaching the user. But it didn't
>> clear console_sem. We can add codes that clear console_sem.
>>       1) add up(&console_sem) in bust_spinlocks(0).
>>               It will be risky in case that no printk after bust_spinlocks(0) in
>> panic(), because no console_unlock() to print log out.
>>       2) call console_unlock()in bust_spinlocks(0).
>>               For bust_spinlocks(0), console_unblank() is used to flush oops to
>> mtdoops console(commit: b61312d353da1871778711040464b10f5cd904df).
>> Logically, if panic without the issue, console_unlock is called after
>> couples of console_lock and console_unlock; if panic with the issue,
>> will it be risky call console_unlock() in console_unblank() after
>> console_lock()?
>>       2. Moreover, there is another option. We can also add protect codes
>> in vprintk(), vprintk() just cover the cases that two cores' log
>> interleave when panic and printk recurse itself. We can add all cases'
>> protection here. Actually the original vprintk() don___t have the issue,
>> but after the patch(commit: fe21773d655c2c64641ec2cef499289ea175c817)
>> which fix two cores' log interleave issue , the issue is not covered.
>> I add a flag after panic_smp_self_stop() in panic(), and check the
>> flag, if flag is set, vprintk will call zap_locks(), I have tested the
>> option, the issue also disappear.
>>       What do you think?
>
> The #1 priority is to get the oops message reliably delivered.
>
> That means we should avoid console_unlock() on the oops path: it's far
> too complicated and risks deadlocks, re-oopses, recursion, etc.
>
> If there was text queued in the console layer and that text fails to be
> emitted, well, that's sad, but it's more important that the oops
> message be displayed.
>
> If the oops trace is occasionally interleaved with other text then
> that's sad too, but at least the info we need is readable.  Oopses
> inside console_lock() are rare.
>
>
> So I'd suggest that the code in bust_spinlocks(1) should simply do
> whatever needs to be done to make the forthcoming oops trace be
> visible, and leave it at that - don't bother trying to flush out any
> old text.
>
>
> Also, we should be careful with things like up() on a semaphore which
> hasn't been down()ed.  Because under some Kconfig combinations, such an
> operation might trigger debugging traces and we could get into a big
> mess.  (An up() on non-down()ed semaphore is actually an OK operation,
> so this was a bad example.  But you see the problem).
>
Hi Andrew,
        Thanks for your suggestions.
         I am e newbie, may not state the issue clearly. Let me state it again:
         In ./drivers/video/fbmem.c, codes below cause issues:
	
                            case FBIOPAN_DISPLAY:
			...
			console_lock();
			ret = fb_pan_display(info, &var);
			console_unlock();
			...
			break;

issue case 1:
1. core 0 call console_lock();
2. panic;
...
4. panic process done.
Result: all panic log won't be printed.

issue case 2:
1. core 0 panic;
2. core 1 call console_lock();
3. core 0 call smp_send_stop(), core1 stop;
4. core 0 panic process done.
Result: only little top part of panic log will be printed.

My soluiton according to your suggestions:

As you said, the first priority is to get oops message reliably
delivered. I think we needn't care about console_sem when panic, just
make sure we print the log imediately, so add
sema_init(&console_sem,1) in bust_spinlocks(0), just like zap_locks()
do.  It is safer than console_unlock() or up().

We can't add sema_init(..) in bust_spinlocks(1) due to issue case2,
although the condition is rare. About issue case 2: should we avoid
call console_lock() when panic?

If we init console_sem in panic, old text may be flushed too, but
should be before panic oops message. Also we can fix it by updating
con_start("con_start = log_end") once panic happen, only log after
panic will be printed.


What do you think? Thanks!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] panic: fix incomplete panic log in panic()
  2012-10-15 22:02   ` Andrew Morton
  2012-10-15 22:06     ` Andi Kleen
@ 2012-10-16 14:25     ` Qing Z
  2012-10-17 10:44     ` Qing Z
  2 siblings, 0 replies; 10+ messages in thread
From: Qing Z @ 2012-10-16 14:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mingo, ben, markivx, ak, linux-kernel, cxie4, binw, wwang27,
	xjian, zhangwm, Qing Zhu

2012/10/16 Andrew Morton <akpm@linux-foundation.org>:
> On Mon, 15 Oct 2012 19:38:46 +0800
> Qing Z <njumical@gmail.com> wrote:
>
>> >>       atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>> >>
>> >> +     /*
>> >> +      * Unlock the console anyway here, in case it's occupied by another
>> >> +      * one which has no chance to unlock the console thus prevents the
>> >> +      * panic log prints on the console.
>> >> +      */
>> >> +     console_unlock();
>> >> +
>> >>       bust_spinlocks(0);
>> >>
>> >>       if (!panic_blink)
>> >
>> > hm.  console_unlock() does a large amount of work, and it seems risky
>> > to do all of that when the system is in a bad-and-getting-worse state.
>> >
>> > Is there some more modest thing we could do here, for example,
>> >
>> >         if (console_locked) {
>> >                 up(&console_sem);
>> >                 console_locked = 0;
>> >         }
>> >
>> > or something along those lines?
>> >
>> > Also, perhaps this operation should be moved into bust_spinlocks().
>> > What would have happened if your code had triggered an oops, rather
>> > than called panic()?
>> >
>> >
>> Hi Andrew,
>>       Thanks for your reply!
>>       For your question" What would have happened if your code had
>> triggered an oops, rather than called panic()?", actually we found the
>> issue when trigger an oops. When we call FBIOPAN_DISPLAY in
>> ./drivers/video/fbmem.c, it will first lock console, if we trigger an
>> oops before unlock console, the issue happen. It also exist when call
>> panic() directly in the same case. It is a common issue for panic
>> process.
>>       I have two options for solution:
>>       1. I agree with your suggestion that add some modest thing in
>> bust_spinlocks(), bust_spinlocks() is supposed to clear spinlocks
>> which prevent oops information from reaching the user. But it didn't
>> clear console_sem. We can add codes that clear console_sem.
>>       1) add up(&console_sem) in bust_spinlocks(0).
>>               It will be risky in case that no printk after bust_spinlocks(0) in
>> panic(), because no console_unlock() to print log out.
>>       2) call console_unlock()in bust_spinlocks(0).
>>               For bust_spinlocks(0), console_unblank() is used to flush oops to
>> mtdoops console(commit: b61312d353da1871778711040464b10f5cd904df).
>> Logically, if panic without the issue, console_unlock is called after
>> couples of console_lock and console_unlock; if panic with the issue,
>> will it be risky call console_unlock() in console_unblank() after
>> console_lock()?
>>       2. Moreover, there is another option. We can also add protect codes
>> in vprintk(), vprintk() just cover the cases that two cores' log
>> interleave when panic and printk recurse itself. We can add all cases'
>> protection here. Actually the original vprintk() don___t have the issue,
>> but after the patch(commit: fe21773d655c2c64641ec2cef499289ea175c817)
>> which fix two cores' log interleave issue , the issue is not covered.
>> I add a flag after panic_smp_self_stop() in panic(), and check the
>> flag, if flag is set, vprintk will call zap_locks(), I have tested the
>> option, the issue also disappear.
>>       What do you think?
>
> The #1 priority is to get the oops message reliably delivered.
>
> That means we should avoid console_unlock() on the oops path: it's far
> too complicated and risks deadlocks, re-oopses, recursion, etc.
>
> If there was text queued in the console layer and that text fails to be
> emitted, well, that's sad, but it's more important that the oops
> message be displayed.
>
> If the oops trace is occasionally interleaved with other text then
> that's sad too, but at least the info we need is readable.  Oopses
> inside console_lock() are rare.
>
>
> So I'd suggest that the code in bust_spinlocks(1) should simply do
> whatever needs to be done to make the forthcoming oops trace be
> visible, and leave it at that - don't bother trying to flush out any
> old text.
>
>
> Also, we should be careful with things like up() on a semaphore which
> hasn't been down()ed.  Because under some Kconfig combinations, such an
> operation might trigger debugging traces and we could get into a big
> mess.  (An up() on non-down()ed semaphore is actually an OK operation,
> so this was a bad example.  But you see the problem).
>
Hi Andrew,
         It is indeed risky that call console_unlock() directly in
panic process, like deadlock that  core0 stop core1 but core1 is
holding logbuf_lock.
         As to old text, if we hold logbuf_lock when panic,we can keep
panic log continuous and entire, old text will be before panic
log.Frankly, I don't see any risk that old text impact or interleave
panic log.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] panic: fix incomplete panic log in panic()
  2012-10-15 22:02   ` Andrew Morton
@ 2012-10-15 22:06     ` Andi Kleen
  2012-10-16 14:25     ` Qing Z
  2012-10-17 10:44     ` Qing Z
  2 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2012-10-15 22:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Qing Z, mingo, ben, markivx, linux-kernel, cxie4, binw, wwang27,
	xjian, zhangwm, Qing Zhu

> Also, we should be careful with things like up() on a semaphore which
> hasn't been down()ed.  Because under some Kconfig combinations, such an
> operation might trigger debugging traces and we could get into a big
> mess.  (An up() on non-down()ed semaphore is actually an OK operation,
> so this was a bad example.  But you see the problem).

iirc the debugging code normally checks oops_in_progress, so this shouldn't 
happen.

Agree on the rest of your email -- keep oops simple ... (KOSS)

-Andi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] panic: fix incomplete panic log in panic()
  2012-10-15 11:38 ` Qing Z
@ 2012-10-15 22:02   ` Andrew Morton
  2012-10-15 22:06     ` Andi Kleen
                       ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Andrew Morton @ 2012-10-15 22:02 UTC (permalink / raw)
  To: Qing Z
  Cc: mingo, ben, markivx, ak, linux-kernel, cxie4, binw, wwang27,
	xjian, zhangwm, Qing Zhu

On Mon, 15 Oct 2012 19:38:46 +0800
Qing Z <njumical@gmail.com> wrote:

> >>       atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> >>
> >> +     /*
> >> +      * Unlock the console anyway here, in case it's occupied by another
> >> +      * one which has no chance to unlock the console thus prevents the
> >> +      * panic log prints on the console.
> >> +      */
> >> +     console_unlock();
> >> +
> >>       bust_spinlocks(0);
> >>
> >>       if (!panic_blink)
> >
> > hm.  console_unlock() does a large amount of work, and it seems risky
> > to do all of that when the system is in a bad-and-getting-worse state.
> >
> > Is there some more modest thing we could do here, for example,
> >
> >         if (console_locked) {
> >                 up(&console_sem);
> >                 console_locked = 0;
> >         }
> >
> > or something along those lines?
> >
> > Also, perhaps this operation should be moved into bust_spinlocks().
> > What would have happened if your code had triggered an oops, rather
> > than called panic()?
> >
> >
> Hi Andrew,
> 	Thanks for your reply!
> 	For your question" What would have happened if your code had
> triggered an oops, rather than called panic()?", actually we found the
> issue when trigger an oops. When we call FBIOPAN_DISPLAY in
> ./drivers/video/fbmem.c, it will first lock console, if we trigger an
> oops before unlock console, the issue happen. It also exist when call
> panic() directly in the same case. It is a common issue for panic
> process.
> 	I have two options for solution:
> 	1. I agree with your suggestion that add some modest thing in
> bust_spinlocks(), bust_spinlocks() is supposed to clear spinlocks
> which prevent oops information from reaching the user. But it didn't
> clear console_sem. We can add codes that clear console_sem.
> 	1) add up(&console_sem) in bust_spinlocks(0).
> 		It will be risky in case that no printk after bust_spinlocks(0) in
> panic(), because no console_unlock() to print log out.
> 	2) call console_unlock()in bust_spinlocks(0).
> 		For bust_spinlocks(0), console_unblank() is used to flush oops to
> mtdoops console(commit: b61312d353da1871778711040464b10f5cd904df).
> Logically, if panic without the issue, console_unlock is called after
> couples of console_lock and console_unlock; if panic with the issue,
> will it be risky call console_unlock() in console_unblank() after
> console_lock()?
> 	2. Moreover, there is another option. We can also add protect codes
> in vprintk(), vprintk() just cover the cases that two cores' log
> interleave when panic and printk recurse itself. We can add all cases'
> protection here. Actually the original vprintk() don___t have the issue,
> but after the patch(commit: fe21773d655c2c64641ec2cef499289ea175c817)
> which fix two cores' log interleave issue , the issue is not covered.
> I add a flag after panic_smp_self_stop() in panic(), and check the
> flag, if flag is set, vprintk will call zap_locks(), I have tested the
> option, the issue also disappear.
> 	What do you think?

The #1 priority is to get the oops message reliably delivered.

That means we should avoid console_unlock() on the oops path: it's far
too complicated and risks deadlocks, re-oopses, recursion, etc.

If there was text queued in the console layer and that text fails to be
emitted, well, that's sad, but it's more important that the oops
message be displayed.

If the oops trace is occasionally interleaved with other text then
that's sad too, but at least the info we need is readable.  Oopses
inside console_lock() are rare.


So I'd suggest that the code in bust_spinlocks(1) should simply do
whatever needs to be done to make the forthcoming oops trace be
visible, and leave it at that - don't bother trying to flush out any
old text.


Also, we should be careful with things like up() on a semaphore which
hasn't been down()ed.  Because under some Kconfig combinations, such an
operation might trigger debugging traces and we could get into a big
mess.  (An up() on non-down()ed semaphore is actually an OK operation,
so this was a bad example.  But you see the problem).


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] panic: fix incomplete panic log in panic()
       [not found] <BD54883EA7DE8B41A61B7E723561C711139F75D348@sc-vexch3.marvell.com>
@ 2012-10-15 11:38 ` Qing Z
  2012-10-15 22:02   ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Qing Z @ 2012-10-15 11:38 UTC (permalink / raw)
  To: akpm
  Cc: mingo, ben, markivx, ak, linux-kernel, cxie4, binw, wwang27,
	xjian, zhangwm, Qing Zhu

> On Thu, 11 Oct 2012 16:03:07 +0800
> Qing Zhu <qzhu@marvell.com> wrote:
>
>> Panic log should be printed on the console, but if someone lock the
>> console when panic, console won't print out panic log.
>>
>> The incomplete panic log issue will happen in below scenarios:
>> 1. One task call console_lock(), then panic happend before it call
>> console_unlock(). No panic log can be printed.
>> 2. Cpu 0 call panic()->Cpu 1 call console_lock()->Cpu 0 call
>> smp_send_stop()
>> Cpu 1 will be stopped and can't unlock console,only top part of panic log
>> will be printed.
>>
>> So unlock console anyway in panic() to keep panic log printed.
>>
>> Signed-off-by: Qing Zhu <qzhu@marvell.com>
>> ---
>>  kernel/panic.c |    8 ++++++++
>>  1 files changed, 8 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index e1b2822..3924d25 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -23,6 +23,7 @@
>>  #include <linux/init.h>
>>  #include <linux/nmi.h>
>>  #include <linux/dmi.h>
>> +#include <linux/console.h>
>>
>>  #define PANIC_TIMER_STEP 100
>>  #define PANIC_BLINK_SPD 18
>> @@ -127,6 +128,13 @@ void panic(const char *fmt, ...)
>>
>>       atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>>
>> +     /*
>> +      * Unlock the console anyway here, in case it's occupied by another
>> +      * one which has no chance to unlock the console thus prevents the
>> +      * panic log prints on the console.
>> +      */
>> +     console_unlock();
>> +
>>       bust_spinlocks(0);
>>
>>       if (!panic_blink)
>
> hm.  console_unlock() does a large amount of work, and it seems risky
> to do all of that when the system is in a bad-and-getting-worse state.
>
> Is there some more modest thing we could do here, for example,
>
>         if (console_locked) {
>                 up(&console_sem);
>                 console_locked = 0;
>         }
>
> or something along those lines?
>
> Also, perhaps this operation should be moved into bust_spinlocks().
> What would have happened if your code had triggered an oops, rather
> than called panic()?
>
>
Hi Andrew,
	Thanks for your reply!
	For your question" What would have happened if your code had
triggered an oops, rather than called panic()?", actually we found the
issue when trigger an oops. When we call FBIOPAN_DISPLAY in
./drivers/video/fbmem.c, it will first lock console, if we trigger an
oops before unlock console, the issue happen. It also exist when call
panic() directly in the same case. It is a common issue for panic
process.
	I have two options for solution:
	1. I agree with your suggestion that add some modest thing in
bust_spinlocks(), bust_spinlocks() is supposed to clear spinlocks
which prevent oops information from reaching the user. But it didn't
clear console_sem. We can add codes that clear console_sem.
	1) add up(&console_sem) in bust_spinlocks(0).
		It will be risky in case that no printk after bust_spinlocks(0) in
panic(), because no console_unlock() to print log out.
	2) call console_unlock()in bust_spinlocks(0).
		For bust_spinlocks(0), console_unblank() is used to flush oops to
mtdoops console(commit: b61312d353da1871778711040464b10f5cd904df).
Logically, if panic without the issue, console_unlock is called after
couples of console_lock and console_unlock; if panic with the issue,
will it be risky call console_unlock() in console_unblank() after
console_lock()?
	2. Moreover, there is another option. We can also add protect codes
in vprintk(), vprintk() just cover the cases that two cores' log
interleave when panic and printk recurse itself. We can add all cases'
protection here. Actually the original vprintk() don’t have the issue,
but after the patch(commit: fe21773d655c2c64641ec2cef499289ea175c817)
which fix two cores' log interleave issue , the issue is not covered.
I add a flag after panic_smp_self_stop() in panic(), and check the
flag, if flag is set, vprintk will call zap_locks(), I have tested the
option, the issue also disappear.
	What do you think?

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-10-22 19:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-11  8:03 [PATCH] panic: fix incomplete panic log in panic() Qing Zhu
2012-10-11 21:18 ` Andrew Morton
     [not found] <BD54883EA7DE8B41A61B7E723561C711139F75D348@sc-vexch3.marvell.com>
2012-10-15 11:38 ` Qing Z
2012-10-15 22:02   ` Andrew Morton
2012-10-15 22:06     ` Andi Kleen
2012-10-16 14:25     ` Qing Z
2012-10-17 10:44     ` Qing Z
2012-10-18  0:06       ` Andrew Morton
2012-10-22 14:54         ` Qing Z
2012-10-22 19:51           ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).