linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Knut Petersen <Knut_Petersen@t-online.de>
To: Frederic Weisbecker <fweisbec@gmail.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>, Greg KH <greg@kroah.com>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [BUG 3.12.rc4] Oops: unable to handle kernel paging request during shutdown
Date: Tue, 15 Oct 2013 00:31:34 +0200	[thread overview]
Message-ID: <525C70C6.60702@t-online.de> (raw)
In-Reply-To: <20131014215117.GE13538@localhost.localdomain>

On 14.10.2013 23:51, Frederic Weisbecker wrote:
> On Mon, Oct 14, 2013 at 02:28:30PM -0700, Paul E. McKenney wrote:
>> On Mon, Oct 14, 2013 at 10:53:03AM -0700, Linus Torvalds wrote:
>>> Hmm. No obvious ideas come to mind, but I'm adding more people to the cc.
>>>
>>> Clearly the wait_event_interruptible_timeout() in the RCU grace-period
>>> thread causes this, but I'm not seeing why shutdown would trigger it.
>>>
>>> The code disassembles to
>>>
>>>     0: 85 db                 test   %ebx,%ebx
>>>     2: 79 0c                 jns    0x10
>>>     4: 81 e6 ff 00 00 00     and    $0xff,%esi
>>>     a: 8d 44 f0 30           lea    0x30(%eax,%esi,8),%eax
>>>     e: eb 0a                 jmp    0x1a
>>>    10: c1 e9 1a             shr    $0x1a,%ecx
>>>    13: 8d 84 c8 30 0e 00 00 lea    0xe30(%eax,%ecx,8),%eax
>>>    1a: 8b 48 04             mov    0x4(%eax),%ecx
>>>    1d: 89 50 04             mov    %edx,0x4(%eax)
>>>    20: 89 02                 mov    %eax,(%edx)
>>>    22: 89 4a 04             mov    %ecx,0x4(%edx)
>>>    25:* 89 11                 mov    %edx,(%ecx) <-- trapping instruction
>>>    27: 5b                   pop    %ebx
>>>    28: 5e                   pop    %esi
>>>    29: 5d                   pop    %ebp
>>>    2a: c3                   ret
>>>
>>> so the oops is in the final
>>>
>>>          list_add_tail(&timer->entry, vec);
>>>
>>> where "%ecx" is "vec->prev" (f8c551f4). That looks like it might be a
>>> perfectly valid pointer, but clearly it isn't (it's about 115M off the
>>> top of virtual memory, I think that might be in the vmalloc area).
>>>
>>> So I'm *guessing* that something did a vfree() on some data structure
>>> that contained active timers - and then later on the RCU thread ended
>>> up being the next thing that tried to add a timer after the
>>> now-non-existing one.
>>>
>>> And your other oopses do seem to have a similar pattern, even if their
>>> actual oops is elsewhere. They oops in run_timer_softirq, also taking
>>> a page fault in the 0xf9...... range, so it might well be a vmalloc
>>> address there too.
>>>
>>> But I sure as hell can't start to guess what that would be.
>>>
>>> I'm wondering it CONFIG_DEBUG_OBJECTS (and then
>>> CONFIG_DEBUG_OBJECTS_FREE=y and CONFIG_DEBUG_OBJECTS_TIMERS=y) might
>>> help catch this...
>> I would also like to nominate CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, which
>> checks for invoking call_rcu() twice in a row on the same rcu_head.
>>
>> Any chance of a look at the .config file?
> Yeah it would be nice to get the .config.
>
> Looks like the problem happens on timer softirq during timers callback execution.
> The two previous reports from Knut seem to be about the same.
>
> I've been thinking that CONFIG_DEBUG_LIST could help. Unfortunately it's good
> to spot list APIs misuse but, if Linus is right, the problem may be that the list
> belongs to an object that has been freed, and I believe that won't detect such a thing.
>
> Still it would be nice to try it, in case we are missing some list misuse.
>
> Also knut, do you have a serial line?
Yes, if really necesarry .... I´d prefer network

>
> If so may be looking at the timer events may help. Could you try to enable the following
> timer events?
>
>        echo 1 > /sys/kernel/debug/tracing/events/timer/timer_cancel
>        echo 1 > /sys/kernel/debug/tracing/events/timer/timer_expire_entry
>        echo 1 > /sys/kernel/debug/tracing/events/timer/timer_expire_exit
>        echo 1 > /sys/kernel/debug/tracing/events/timer/timer_init
>        echo 1 > /sys/kernel/debug/tracing/events/timer/timer_start
>
> And also enable ftrace_dump_on_oops:
>
>        sysctl kernel.ftrace_dump_on_oops=1
I assume you forgot .../enable several times ?!

echo 1 > /sys/kernel/debug/tracing/events/timer/timer_cancel/enable
echo 1 > /sys/kernel/debug/tracing/events/timer/timer_expire_entry/enable
echo 1 > /sys/kernel/debug/tracing/events/timer/timer_expire_exit/enable
echo 1 > /sys/kernel/debug/tracing/events/timer/timer_init/enable
echo 1 > /sys/kernel/debug/tracing/events/timer/timer_start/enable
sysctl kernel.ftrace_dump_on_oops=1

I added that to /etc/init.d/boot.local


>
> We may have some interesting clues about who manipulated the timer list last before
> the oops happened.
>
> Thanks!
>


  reply	other threads:[~2013-10-14 22:31 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <525BD08C.2080101@t-online.de>
2013-10-14 17:53 ` [BUG 3.12.rc4] Oops: unable to handle kernel paging request during shutdown Linus Torvalds
2013-10-14 21:28   ` Paul E. McKenney
2013-10-14 21:51     ` Frederic Weisbecker
2013-10-14 22:31       ` Knut Petersen [this message]
2013-10-14 22:43         ` Frederic Weisbecker
2013-10-15  6:40       ` Ingo Molnar
2013-10-15  7:53         ` Knut Petersen
2013-10-17 14:25         ` Frederic Weisbecker
2013-10-18  6:30           ` Ingo Molnar
2013-10-14 21:52     ` Knut Petersen
2013-10-14 23:16       ` Paul E. McKenney
2013-10-15  0:59         ` Paul E. McKenney
2013-10-15  8:06           ` Knut Petersen
2013-10-25  8:38   ` Linus Torvalds
2013-10-25  9:02     ` Linus Torvalds
2013-10-25  9:08       ` Paul E. McKenney
2013-10-25  9:17         ` Greg Kroah-Hartman
2013-10-25  9:13       ` Greg Kroah-Hartman
2013-10-25  9:28       ` Rafael J. Wysocki
2013-10-25  9:51         ` Rafael J. Wysocki
2013-10-25  9:54           ` Viresh Kumar
2013-10-25 10:10           ` Rafael J. Wysocki
2013-10-25 10:00             ` Viresh Kumar
2013-10-25 10:07             ` Linus Torvalds
2013-10-25 11:10               ` Rafael J. Wysocki
2013-10-25 13:49                 ` Viresh Kumar
2013-10-25 14:21                   ` Rafael J. Wysocki
2013-10-28 15:02       ` Knut Petersen
2013-10-25 10:23     ` Thomas Gleixner
2013-10-25 10:48       ` Linus Torvalds
2013-10-26 11:43         ` Ingo Molnar
2013-10-28 14:50           ` Knut Petersen
2013-10-28 15:01             ` Ingo Molnar
2013-10-28 15:16               ` Ingo Molnar
2013-10-28 15:45                 ` Knut Petersen
2013-10-27 20:20         ` Linus Torvalds
2013-10-27 20:39           ` Linus Torvalds
2013-10-27 21:13             ` Linus Torvalds
2013-10-27 21:24               ` Greg Kroah-Hartman
2013-10-28 17:23                 ` Bjorn Helgaas
2013-10-28 17:30                   ` Veaceslav Falico
2013-10-28 17:35                     ` Bjorn Helgaas
2013-10-28 17:39                       ` Veaceslav Falico
2013-10-28 18:52                   ` Greg Kroah-Hartman
2013-10-30 18:04             ` Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=525C70C6.60702@t-online.de \
    --to=knut_petersen@t-online.de \
    --cc=fweisbec@gmail.com \
    --cc=greg@kroah.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).