* [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
From: Stefan Richter @ 2010-09-11 9:50 UTC
To: linux-kernel; +Cc: bugzilla-daemon, axboe
Full quote for lkml:
bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=18252
>
> Summary: spinlock lockup in __make_request <- submit_bio <-
> ondemand_readahead
> Product: IO/Storage
> Version: 2.5
> Kernel Version: 2.6.36-rc3
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Block Layer
> AssignedTo: axboe@kernel.dk
> ReportedBy: stefanr@s5r6.in-berlin.de
> Regression: No
>
>
> Created an attachment (id=29562)
> --> (https://bugzilla.kernel.org/attachment.cgi?id=29562)
> BUG screenshot
>
> After a week uptime of 2.6.36-rc3 (I ran 2.6.35 before that),

Almost two weeks uptime actually.

> I was greeted by a black screen of death today in the morning:
>
> (see screenshot in attachment; partial transcript:)
>
> sending NMI to all CPUs:
> BUG: spinlock lockup on CPU#0, ktorrent/4313, ffff8802...
> PID: 4313, comm: ktorrent Tainted: G M D W 2.6.36-rc3 #3
> Call Trace:
> [...] do_raw_spin_lock+0x118/0x147
> [...] _raw_spin_lock_irq+0x44/0x49
> [...] ? __make_request+0x5c/0x400
> [...] __make_request+0x5c/0x400
> [...] generic_make_request+0x23a/0x2a9
> [...] submit_bio+0xad/0xb6
> [...] mpage_bio_submit...
> [...] do_mpage_readpage...
> [...] ? get_parent_ip...
> [...] ? sub_preempt_count...
> [...] ? __lru_cache_add...
> [...] mpage_readpages...
> [...] ? ext4_get_block...
> [...] ? __alloc_pages_nodemask...
> [...] ? ext4_get_block...
> [...] ext4_readpages...
> [...] __do_page_cache_readahead...
> [...] ? __do_page_cache_readahead...
> [...] ra_submit...
> [...] ondemand_readahead...
>
> This is a system with Phenom II x4 and Radeon graphics. Since kernel mode
> setting is fairly new for radeon, it is possible that the lockup happened with
> earlier kernels too but simply ended in a lockup without trace dump to the
> screen. IOW, it is not clear to me whether this is a regression or not.
>
> The bug happened while kaffeine wrote an MPEG 2 TS to the same filesystem from
> which ktorrent was reading. Of course this kind of commonplace workload
> happened without problem two or three times before during the week in which I
> ran 2.6.36-rc3.
>

(The screenshot is a bit large, hence I reported it in bugzilla instead of the list.)

The kernel taint was due to a prior, apparently unrelated lockdep report, bug
17752 "2.6.36-rc3: inconsistent lock state (iprune_sem, shrink_icache_memory".
And there were three machine check events ten days ago due to corrected ECC
memory errors.
--
Stefan Richter
-=====-==-=- =--= -=-==
http://arcgraph.de/sr/
* Re: [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
From: Andrew Morton @ 2010-09-13 21:41 UTC
To: Stefan Richter; +Cc: linux-kernel, bugzilla-daemon, axboe, linux-scsi
On Sat, 11 Sep 2010 11:50:41 +0200
Stefan Richter <stefanr@s5r6.in-berlin.de> wrote:
> Full quote for lkml:
>
> bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=18252
> >
> > Summary: spinlock lockup in __make_request <- submit_bio <-
> > ondemand_readahead
> > Product: IO/Storage
> > Version: 2.5
> > Kernel Version: 2.6.36-rc3
> > Platform: All
> > OS/Version: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: Block Layer
> > AssignedTo: axboe@kernel.dk
> > ReportedBy: stefanr@s5r6.in-berlin.de
> > Regression: No
> >
> >
> > Created an attachment (id=29562)
> > --> (https://bugzilla.kernel.org/attachment.cgi?id=29562)
> > BUG screenshot
> >
> > After a week uptime of 2.6.36-rc3 (I ran 2.6.35 before that),
>
> Almost two weeks uptime actually.
>
> > I was greeted by a black screen of death today in the morning:
> >
> > (see screenshot in attachment; partial transcript:)
> >
> > sending NMI to all CPUs:
> > BUG: spinlock lockup on CPU#0, ktorrent/4313, ffff8802...
> > PID: 4313, comm: ktorrent Tainted: G M D W 2.6.36-rc3 #3
> > Call Trace:
> > [...] do_raw_spin_lock+0x118/0x147
> > [...] _raw_spin_lock_irq+0x44/0x49
> > [...] ? __make_request+0x5c/0x400
> > [...] __make_request+0x5c/0x400
> > [...] generic_make_request+0x23a/0x2a9
> > [...] submit_bio+0xad/0xb6
> > [...] mpage_bio_submit...
> > [...] do_mpage_readpage...
> > [...] ? get_parent_ip...
> > [...] ? sub_preempt_count...
> > [...] ? __lru_cache_add...
> > [...] mpage_readpages...
> > [...] ? ext4_get_block...
> > [...] ? __alloc_pages_nodemask...
> > [...] ? ext4_get_block...
> > [...] ext4_readpages...
> > [...] __do_page_cache_readahead...
> > [...] ? __do_page_cache_readahead...
> > [...] ra_submit...
> > [...] ondemand_readahead...
> >
> > This is a system with Phenom II x4 and Radeon graphics. Since kernel mode
> > setting is fairly new for radeon, it is possible that the lockup happened with
> > earlier kernels too but simply ended in a lockup without trace dump to the
> > screen. IOW, it is not clear to me whether this is a regression or not.
> >
> > The bug happened while kaffeine wrote an MPEG 2 TS to the same filesystem from
> > which ktorrent was reading. Of course this kind of commonplace workload
> > happened without problem two or three times before during the week in which I
> > ran 2.6.36-rc3.
> >
>
> (The screenshot is a bit large, hence I reported in bugzilla instead of the list.)
>
What you've quoted above appears to be just the aftermath.
https://bugzilla.kernel.org/attachment.cgi?id=29562 indicates that the
kernel earlier crashed in scsi code, perhaps under
scsi_setup_fs_cmnd().

The question is: was that actually the first crash, or did an even
earlier one scroll off?
* Re: [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
From: Stefan Richter @ 2010-09-14 6:56 UTC
To: Andrew Morton, Florian Mickler
Cc: linux-kernel, bugzilla-daemon, axboe, linux-scsi
Andrew Morton wrote:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=18252
...
> What you've quoted above appears to be just the aftermath.
> https://bugzilla.kernel.org/attachment.cgi?id=29562 indicates that the
> kernel earlier crashed in scsi code, perhaps under
> scsi_setup_fs_cmnd().
>
> The question is: was that actually the first crash, or did an even
> earlier one scroll off?

It happened overnight. The screenshot
https://bugzilla.kernel.org/attachment.cgi?id=29562 shows that there was a lot
more logged before it. When I saw it in the morning I assumed that the tail
was a repetition of the leading bug trace, but it seems I was mistaken.

Florian Mickler wrote:
> There was a SCSI-related use-after-free OOPS fixed recently and pulled 3 days
> ago.
>
> On Sat, 11 Sep 2010 19:07:44 +0000
> James Bottomley <James.Bottomley@suse.de> wrote:
>
>> This includes the oops from use after free, a set of qla2xxx fixes, some
>> misc warning cleanups from the recently introduced printk issue, an hpsa
>> lockup fix and a medium removal bug in sd introduced by the BKL
>> pushdown.
>>
>> The patch is available here:
>>
>> master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6.git
>
> Maybe you are seeing that?
>
> (reacting to the general-protection-fault preceded by scsi_init in the
> attachment jpg)

Now that you point it out --- perhaps. Though I haven't looked into the
mechanics of the now fixed scsi_init_io use after free.

I am going to update to 2.6.36-rc4 today (I had reverted to 2.6.35 since the
report). If the issue does not return within two weeks or so, I suggest
closing the bug as fixed.
--
Stefan Richter
-=====-==-=- =--= -===-
http://arcgraph.de/sr/
* Re: [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
From: Jens Axboe @ 2010-09-14 6:58 UTC
To: Stefan Richter
Cc: Andrew Morton, Florian Mickler, linux-kernel, bugzilla-daemon,
linux-scsi
On 2010-09-14 08:56, Stefan Richter wrote:
> Andrew Morton wrote:
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=18252
> ...
>> What you've quoted above appears to be just the aftermath.
>> https://bugzilla.kernel.org/attachment.cgi?id=29562 indicates that the
>> kernel earlier crashed in scsi code, perhaps under
>> scsi_setup_fs_cmnd().
>>
>> The question is: was that actually the first crash, or did an even
>> earlier one scroll off?
>
> It happened overnight. The screenshot
> https://bugzilla.kernel.org/attachment.cgi?id=29562 shows that there was a lot
> more logged before it. When I saw it in the morning I assumed that the tail
> was a repetition of the leading bug trace, but it seems I was mistaken.
>
> Florian Mickler wrote:
>> There was a SCSI-related use-after-free OOPS fixed recently and pulled 3 days
>> ago.
>>
>> On Sat, 11 Sep 2010 19:07:44 +0000
>> James Bottomley <James.Bottomley@suse.de> wrote:
>>
>>> This includes the oops from use after free, a set of qla2xxx fixes, some
>>> misc warning cleanups from the recently introduced printk issue, an hpsa
>>> lockup fix and a medium removal bug in sd introduced by the BKL
>>> pushdown.
>>>
>>> The patch is available here:
>>>
>>> master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6.git
>>
>> Maybe you are seeing that?
>>
>> (reacting to the general-protection-fault preceded by scsi_init in the
>> attachment jpg)
>
> Now that you point it out --- perhaps. Though I haven't looked into the
> mechanics of the now fixed scsi_init_io use after free.

It seems the very likely explanation, since I can't see any other way that
you would deadlock on the queue lock from that call trace if you haven't
had someone else crash with the lock held already.
--
Jens Axboe
* Re: [Bug 18252] spinlock lockup in __make_request <- submit_bio <- ondemand_readahead
From: Stefan Richter @ 2010-09-14 11:18 UTC
To: Jens Axboe
Cc: Andrew Morton, Florian Mickler, linux-kernel, bugzilla-daemon,
linux-scsi
Jens Axboe wrote:
> On 2010-09-14 08:56, Stefan Richter wrote:
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=18252
>> Florian Mickler wrote:
>>> There was a SCSI-related use-after-free OOPS fixed recently and pulled 3 days
>>> ago.
...
>>> Maybe you are seeing that?
>>>
>>> (reacting to the general-protection-fault preceded by scsi_init in the
>>> attachment jpg)
>> Now that you point it out --- perhaps. Though I haven't looked into the
>> mechanics of the now fixed scsi_init_io use after free.
>
> It seems the very likely explanation, since I can't see any other way that
> you would deadlock on the queue lock from that call trace if you haven't
> had someone else crash with the lock held already.

Good, I'll close the bugzilla item right away on the reasonable assumption that
it is fixed by commit 3a5c19c23db65a554f2e4f5df5f307c668277056.
--
Stefan Richter
-=====-==-=- =--= -===-
http://arcgraph.de/sr/