linux-kernel.vger.kernel.org archive mirror
* [BUG] kernel BUG at /.../block/cfq-iosched.c:3145!
@ 2014-04-09  4:44 Benjamin Herrenschmidt
  2014-04-09 13:35 ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2014-04-09  4:44 UTC
  To: Jens Axboe
  Cc: Jan Kara, Linux Kernel Mailing List, Frederic Weisbecker,
	James Bottomley, Brian J King

Hi folks !

While testing a branch of fixes before I send it to Linus, which
happens to be based on 18a1a7a1d862ae0794a0179473d08a414dd49234,
I hit this:

kernel BUG at /home/benh/linux-powerpc-test/block/cfq-iosched.c:3145!
cpu 0x3c: Vector: 700 (Program Check) at [c000003ca69bb190]
    pc: c00000000033b05c: .cfq_dispatch_requests+0x90/0x99c
    lr: c00000000033b038: .cfq_dispatch_requests+0x6c/0x99c
    sp: c000003ca69bb410
   msr: 9000000000029032
  current = 0xc000003ca63d32a0
  paca    = 0xc00000000ffef000	 softe: 0	 irq_happened: 0x01
    pid   = 3487, comm = smartd
kernel BUG at /home/benh/linux-powerpc-test/block/cfq-iosched.c:3145!
enter ? for help
[c000003ca69bb4c0] c00000000032000c .elv_drain_elevator+0x70/0xc8
[c000003ca69bb540] c000000000320140 .__elv_add_request+0xdc/0x27c
[c000003ca69bb5e0] c0000000003286f8 .blk_execute_rq_nowait+0xc0/0xf8
[c000003ca69bb670] c0000000003287ec .blk_execute_rq+0xbc/0xe8
[c000003ca69bb810] c000000000332350 .sg_io+0x218/0x39c
[c000003ca69bb930] c000000000332c3c .scsi_cmd_ioctl+0x270/0x4ac
[c000003ca69bba70] c0000000005d559c .sd_ioctl+0xa4/0xd8
[c000003ca69bbb20] c00000000032eb1c .__blkdev_driver_ioctl+0x34/0x54
[c000003ca69bbb90] c00000000032f83c .blkdev_ioctl+0x7b8/0x850
[c000003ca69bbc40] c00000000018d6e0 .block_ioctl+0x4c/0x60
[c000003ca69bbcb0] c0000000001691cc .do_vfs_ioctl+0x5cc/0x670
[c000003ca69bbd90] c0000000001692b4 .SyS_ioctl+0x44/0x70
[c000003ca69bbe30] c00000000000a024 syscall_exit+0x0/0x98
--- Exception: c00 (System Call) at 00003fffb5240ee0

The storage driver is our usual IBM "IPR".

Is that a known issue ?

Cheers,
Ben.
 




* Re: [BUG] kernel BUG at /.../block/cfq-iosched.c:3145!
  2014-04-09  4:44 [BUG] kernel BUG at /.../block/cfq-iosched.c:3145! Benjamin Herrenschmidt
@ 2014-04-09 13:35 ` Jens Axboe
  2014-04-09 23:47   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2014-04-09 13:35 UTC
  To: Benjamin Herrenschmidt
  Cc: Jan Kara, Linux Kernel Mailing List, Frederic Weisbecker,
	James Bottomley, Brian J King

On 04/08/2014 10:44 PM, Benjamin Herrenschmidt wrote:
> Hi folks !
>
> While testing a branch of fixes before I send it to Linus, which
> happens to be based on 18a1a7a1d862ae0794a0179473d08a414dd49234,
> I hit this:
>
> kernel BUG at /home/benh/linux-powerpc-test/block/cfq-iosched.c:3145!
> cpu 0x3c: Vector: 700 (Program Check) at [c000003ca69bb190]
>      pc: c00000000033b05c: .cfq_dispatch_requests+0x90/0x99c
>      lr: c00000000033b038: .cfq_dispatch_requests+0x6c/0x99c
>      sp: c000003ca69bb410
>     msr: 9000000000029032
>    current = 0xc000003ca63d32a0
>    paca    = 0xc00000000ffef000	 softe: 0	 irq_happened: 0x01
>      pid   = 3487, comm = smartd
> kernel BUG at /home/benh/linux-powerpc-test/block/cfq-iosched.c:3145!
> enter ? for help
> [c000003ca69bb4c0] c00000000032000c .elv_drain_elevator+0x70/0xc8
> [c000003ca69bb540] c000000000320140 .__elv_add_request+0xdc/0x27c
> [c000003ca69bb5e0] c0000000003286f8 .blk_execute_rq_nowait+0xc0/0xf8
> [c000003ca69bb670] c0000000003287ec .blk_execute_rq+0xbc/0xe8
> [c000003ca69bb810] c000000000332350 .sg_io+0x218/0x39c
> [c000003ca69bb930] c000000000332c3c .scsi_cmd_ioctl+0x270/0x4ac
> [c000003ca69bba70] c0000000005d559c .sd_ioctl+0xa4/0xd8
> [c000003ca69bbb20] c00000000032eb1c .__blkdev_driver_ioctl+0x34/0x54
> [c000003ca69bbb90] c00000000032f83c .blkdev_ioctl+0x7b8/0x850
> [c000003ca69bbc40] c00000000018d6e0 .block_ioctl+0x4c/0x60
> [c000003ca69bbcb0] c0000000001691cc .do_vfs_ioctl+0x5cc/0x670
> [c000003ca69bbd90] c0000000001692b4 .SyS_ioctl+0x44/0x70
> [c000003ca69bbe30] c00000000000a024 syscall_exit+0x0/0x98
> --- Exception: c00 (System Call) at 00003fffb5240ee0
>
> The storage driver is our usual IBM "IPR".
>
> Is that a known issue ?

Nope, that's not a known issue. This must be related to the FIFO 
changes... How reproducible is this?

-- 
Jens Axboe



* Re: [BUG] kernel BUG at /.../block/cfq-iosched.c:3145!
  2014-04-09 13:35 ` Jens Axboe
@ 2014-04-09 23:47   ` Benjamin Herrenschmidt
  2014-04-10  0:02     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2014-04-09 23:47 UTC
  To: Jens Axboe
  Cc: Jan Kara, Linux Kernel Mailing List, Frederic Weisbecker,
	James Bottomley, Brian J King

On Wed, 2014-04-09 at 07:35 -0600, Jens Axboe wrote:
> On 04/08/2014 10:44 PM, Benjamin Herrenschmidt wrote:
> > Hi folks !
> >
> > While testing a branch of fixes before I send it to Linus, which
> > happens to be based on 18a1a7a1d862ae0794a0179473d08a414dd49234,
> > I hit this:
> >
> > kernel BUG at /home/benh/linux-powerpc-test/block/cfq-iosched.c:3145!
> > cpu 0x3c: Vector: 700 (Program Check) at [c000003ca69bb190]
> >      pc: c00000000033b05c: .cfq_dispatch_requests+0x90/0x99c
> >      lr: c00000000033b038: .cfq_dispatch_requests+0x6c/0x99c
> >      sp: c000003ca69bb410
> >     msr: 9000000000029032
> >    current = 0xc000003ca63d32a0
> >    paca    = 0xc00000000ffef000	 softe: 0	 irq_happened: 0x01
> >      pid   = 3487, comm = smartd
> > kernel BUG at /home/benh/linux-powerpc-test/block/cfq-iosched.c:3145!
> > enter ? for help
> > [c000003ca69bb4c0] c00000000032000c .elv_drain_elevator+0x70/0xc8
> > [c000003ca69bb540] c000000000320140 .__elv_add_request+0xdc/0x27c
> > [c000003ca69bb5e0] c0000000003286f8 .blk_execute_rq_nowait+0xc0/0xf8
> > [c000003ca69bb670] c0000000003287ec .blk_execute_rq+0xbc/0xe8
> > [c000003ca69bb810] c000000000332350 .sg_io+0x218/0x39c
> > [c000003ca69bb930] c000000000332c3c .scsi_cmd_ioctl+0x270/0x4ac
> > [c000003ca69bba70] c0000000005d559c .sd_ioctl+0xa4/0xd8
> > [c000003ca69bbb20] c00000000032eb1c .__blkdev_driver_ioctl+0x34/0x54
> > [c000003ca69bbb90] c00000000032f83c .blkdev_ioctl+0x7b8/0x850
> > [c000003ca69bbc40] c00000000018d6e0 .block_ioctl+0x4c/0x60
> > [c000003ca69bbcb0] c0000000001691cc .do_vfs_ioctl+0x5cc/0x670
> > [c000003ca69bbd90] c0000000001692b4 .SyS_ioctl+0x44/0x70
> > [c000003ca69bbe30] c00000000000a024 syscall_exit+0x0/0x98
> > --- Exception: c00 (System Call) at 00003fffb5240ee0
> >
> > The storage driver is our usual IBM "IPR".
> >
> > Is that a known issue ?
> 
> Nope, that's not a known issue. This must be related to the FIFO 
> changes... How reproducible is this?

Dunno yet, haven't had a chance to dig. I'll try to grab that machine
later today and reproduce &| bisect.

Cheers,
Ben.




* Re: [BUG] kernel BUG at /.../block/cfq-iosched.c:3145!
  2014-04-09 23:47   ` Benjamin Herrenschmidt
@ 2014-04-10  0:02     ` Benjamin Herrenschmidt
  2014-04-10  1:31       ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2014-04-10  0:02 UTC
  To: Jens Axboe
  Cc: Jan Kara, Linux Kernel Mailing List, Frederic Weisbecker,
	James Bottomley, Brian J King

On Thu, 2014-04-10 at 09:47 +1000, Benjamin Herrenschmidt wrote:

> > Nope, that's not a known issue. This must be related to the FIFO 
> > changes... How reproducible is this?
> 
> Dunno yet, haven't had a chance to dig. I'll try to grab that machine
> later today and reproduce &| bisect.

Second boot leads to a slightly different symptom (see below). I'll
try bisecting a bit later today if I get a chance.

=============================================================================
BUG blkdev_requests (Not tainted): Poison overwritten
-----------------------------------------------------------------------------

Disabling lock debugging due to kernel taint
INFO: 0xc000003ca627f790-0xc000003ca627f797. First byte 0xc0 instead of 0x6b
INFO: Allocated in .mempool_alloc_slab+0x1c/0x30 age=17 cpu=36 pid=3579
	.kmem_cache_alloc+0xc0/0x1ac
	.mempool_alloc_slab+0x1c/0x30
	.mempool_alloc+0x98/0x18c
	.get_request+0x248/0x490
	.blk_queue_bio+0x174/0x25c
	.generic_make_request+0xa8/0xec
	.submit_bio+0x134/0x14c
	.mpage_readpages+0x108/0x118
	.ext4_readpages+0x48/0x5c
	.__do_page_cache_readahead+0x1b0/0x298
	.ra_submit+0x28/0x38
	.filemap_fault+0x158/0x3e4
	.__do_fault+0x48/0xbc
	.do_read_fault.isra.65+0x2c/0xe4
	.handle_mm_fault+0x460/0xbb8
	.do_page_fault+0x498/0x7e0
INFO: Freed in .mempool_free_slab+0x1c/0x30 age=17 cpu=36 pid=0
	.kmem_cache_free+0x1d4/0x1fc
	.mempool_free_slab+0x1c/0x30
	.mempool_free+0xb0/0xbc
	.__blk_put_request+0xd0/0x104
	.blk_end_bidi_request+0x40/0x64
	.scsi_io_completion+0x198/0x5a4
	.scsi_finish_command+0xd0/0xdc
	.scsi_softirq_done+0x128/0x134
	.blk_done_softirq+0xa0/0xb8
	.__do_softirq+0x1b8/0x370
	.irq_exit+0x74/0xa4
	.__do_irq+0x90/0xa4
	.call_do_irq+0x14/0x24
	.do_IRQ+0x80/0xc0
	hardware_interrupt_common+0x138/0x180
	0xc000000001e46640
INFO: Slab 0xf00000000d445888 objects=92 used=92 fp=0x          (null) flags=0x13fff8000000080
INFO: Object 0xc000003ca627f788 @offset=63368 fp=0xc000003ca6270de8

Bytes b4 c000003ca627f778: 00 00 00 00 ff fe ef be 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
Object c000003ca627f788: 6b 6b 6b 6b 6b 6b 6b 6b c0 00 00 3c a6 27 fa 50  kkkkkkkk...<.'.P
Object c000003ca627f798: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f7a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f7b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f7c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f7d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f7e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f7f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f808: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f818: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f828: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f838: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f848: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f858: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f868: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f878: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f888: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f898: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f8a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f8b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f8c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f8d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f8e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object c000003ca627f8f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
Redzone c000003ca627f908: bb bb bb bb bb bb bb bb                          ........
Padding c000003ca627fa48: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
CPU: 60 PID: 3579 Comm: X Tainted: G    B        3.14.0-test #1
Call Trace:
[c000003ca331aaf0] [c00000000001405c] .show_stack+0x50/0x14c (unreliable)
[c000003ca331abc0] [c0000000008a3a94] .dump_stack+0x90/0xc0
[c000003ca331ac40] [c00000000014bbec] .print_trailer+0x17c/0x188
[c000003ca331acd0] [c00000000014bd08] .check_bytes_and_report+0xc0/0x114
[c000003ca331ad80] [c00000000014be44] .check_object+0xe8/0x240
[c000003ca331ae20] [c0000000008a2488] .alloc_debug_processing+0x184/0x19c
[c000003ca331aeb0] [c0000000008a2958] .__slab_alloc+0x4b8/0x540
[c000003ca331b020] [c00000000014de68] .kmem_cache_alloc+0xc0/0x1ac
[c000003ca331b0d0] [c000000000107f3c] .mempool_alloc_slab+0x1c/0x30
[c000003ca331b140] [c0000000001080d4] .mempool_alloc+0x98/0x18c
[c000003ca331b220] [c0000000003227c4] .get_request+0x248/0x490
[c000003ca331b340] [c000000000324a68] .blk_queue_bio+0x174/0x25c
[c000003ca331b3e0] [c000000000322280] .generic_make_request+0xa8/0xec
[c000003ca331b470] [c0000000003223f8] .submit_bio+0x134/0x14c
[c000003ca331b530] [c000000000186030] ._submit_bh+0x244/0x278
[c000003ca331b5c0] [c000000000186b14] .ll_rw_block+0xd4/0x100
[c000003ca331b670] [c000000000188d80] .__breadahead+0x2c/0x4c
[c000003ca331b6f0] [c00000000021d18c] .__ext4_get_inode_loc+0x390/0x444
[c000003ca331b7c0] [c00000000021f7cc] .ext4_iget+0x4c/0x97c
[c000003ca331b8c0] [c000000000229558] .ext4_lookup+0xc4/0x15c
[c000003ca331b960] [c00000000016007c] .lookup_real+0x4c/0x74
[c000003ca331b9e0] [c000000000165bdc] .do_last+0x58c/0xc54
[c000003ca331bb10] [c00000000016650c] .path_openat+0x268/0x6a0
[c000003ca331bc30] [c000000000166cc0] .do_filp_open+0x34/0x80
[c000003ca331bd70] [c000000000155a10] .do_sys_open+0x1a4/0x250
[c000003ca331be30] [c00000000000a024] syscall_exit+0x0/0x98
FIX blkdev_requests: Restoring 0xc000003ca627f790-0xc000003ca627f797=0x6b

FIX blkdev_requests: Marking all objects used
------------[ cut here ]------------
kernel BUG at /home/benh/linux-powerpc-test/block/elevator.c:262!
cpu 0x34: Vector: 700 (Program Check) at [c000001febbd32c0]
    pc: c00000000031ee40: .elv_rqhash_add+0x10/0x8c
    lr: c0000000003202c8: .__elv_add_request+0x264/0x27c
    sp: c000001febbd3540
   msr: 9000000000029032
  current = 0xc000003ca620c320
  paca    = 0xc00000000ffed000	 softe: 0	 irq_happened: 0x01
    pid   = 3565, comm = in:imjournal
kernel BUG at /home/benh/linux-powerpc-test/block/elevator.c:262!
enter ? for help
[link register   ] c0000000003202c8 .__elv_add_request+0x264/0x27c
[c000001febbd3540] c0000000003201a0 .__elv_add_request+0x13c/0x27c (unreliable)
[c000001febbd35e0] c000000000324858 .blk_flush_plug_list+0x1f4/0x254
[c000001febbd36a0] c0000000003248d0 .blk_finish_plug+0x18/0x3c
[c000001febbd3720] c000000000111a84 .__do_page_cache_readahead+0x268/0x298
[c000001febbd3870] c000000000111d24 .ra_submit+0x28/0x38
[c000001febbd38e0] c000000000106f54 .filemap_fault+0x158/0x3e4
[c000001febbd39b0] c000000000126cfc .__do_fault+0x48/0xbc
[c000001febbd3a60] c0000000001272b0 .do_read_fault.isra.65+0x2c/0xe4
[c000001febbd3b20] c00000000012a718 .handle_mm_fault+0x460/0xbb8
[c000001febbd3c00] c00000000002ff54 .do_page_fault+0x498/0x7e0
[c000001febbd3e30] c0000000000094e8 handle_page_fault+0x10/0x30
--- Exception: 301 (Data Access) at 00003fff9a40d2d4
SP (3fff9585df30) is in userspace
34:mon> 




* Re: [BUG] kernel BUG at /.../block/cfq-iosched.c:3145!
  2014-04-10  0:02     ` Benjamin Herrenschmidt
@ 2014-04-10  1:31       ` Jens Axboe
  2014-04-10  1:36         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2014-04-10  1:31 UTC
  To: Benjamin Herrenschmidt
  Cc: Jan Kara, Linux Kernel Mailing List, Frederic Weisbecker,
	James Bottomley, Brian J King

On 2014-04-09 18:02, Benjamin Herrenschmidt wrote:
> On Thu, 2014-04-10 at 09:47 +1000, Benjamin Herrenschmidt wrote:
>
>>> Nope, that's not a known issue. This must be related to the FIFO
>>> changes... How reproducible is this?
>>
>> Dunno yet, haven't had a chance to dig. I'll try to grab that machine
>> later today and reproduce &| bisect.
>
> Second boot leads to a slightly different symptom (see below). I'll
> try bisecting a bit later today if I get a chance.
>
> =============================================================================
> BUG blkdev_requests (Not tainted): Poison overwritten
> -----------------------------------------------------------------------------
>
> Disabling lock debugging due to kernel taint
> INFO: 0xc000003ca627f790-0xc000003ca627f797. First byte 0xc0 instead of 0x6b
> INFO: Allocated in .mempool_alloc_slab+0x1c/0x30 age=17 cpu=36 pid=3579
> 	.kmem_cache_alloc+0xc0/0x1ac
> 	.mempool_alloc_slab+0x1c/0x30
> 	.mempool_alloc+0x98/0x18c
> 	.get_request+0x248/0x490
> 	.blk_queue_bio+0x174/0x25c
> 	.generic_make_request+0xa8/0xec
> 	.submit_bio+0x134/0x14c
> 	.mpage_readpages+0x108/0x118
> 	.ext4_readpages+0x48/0x5c
> 	.__do_page_cache_readahead+0x1b0/0x298
> 	.ra_submit+0x28/0x38
> 	.filemap_fault+0x158/0x3e4
> 	.__do_fault+0x48/0xbc
> 	.do_read_fault.isra.65+0x2c/0xe4
> 	.handle_mm_fault+0x460/0xbb8
> 	.do_page_fault+0x498/0x7e0
> INFO: Freed in .mempool_free_slab+0x1c/0x30 age=17 cpu=36 pid=0
> 	.kmem_cache_free+0x1d4/0x1fc
> 	.mempool_free_slab+0x1c/0x30
> 	.mempool_free+0xb0/0xbc
> 	.__blk_put_request+0xd0/0x104
> 	.blk_end_bidi_request+0x40/0x64
> 	.scsi_io_completion+0x198/0x5a4
> 	.scsi_finish_command+0xd0/0xdc
> 	.scsi_softirq_done+0x128/0x134
> 	.blk_done_softirq+0xa0/0xb8
> 	.__do_softirq+0x1b8/0x370
> 	.irq_exit+0x74/0xa4
> 	.__do_irq+0x90/0xa4
> 	.call_do_irq+0x14/0x24
> 	.do_IRQ+0x80/0xc0
> 	hardware_interrupt_common+0x138/0x180
> 	0xc000000001e46640
> INFO: Slab 0xf00000000d445888 objects=92 used=92 fp=0x          (null) flags=0x13fff8000000080
> INFO: Object 0xc000003ca627f788 @offset=63368 fp=0xc000003ca6270de8
>
> Bytes b4 c000003ca627f778: 00 00 00 00 ff fe ef be 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
> Object c000003ca627f788: 6b 6b 6b 6b 6b 6b 6b 6b c0 00 00 3c a6 27 fa 50  kkkkkkkk...<.'.P
> Object c000003ca627f798: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f7a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f7b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f7c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f7d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f7e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f7f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f808: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f818: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f828: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f838: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f848: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f858: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f868: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f878: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f888: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f898: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f8a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f8b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f8c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f8d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f8e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
> Object c000003ca627f8f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
> Redzone c000003ca627f908: bb bb bb bb bb bb bb bb                          ........
> Padding c000003ca627fa48: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
> CPU: 60 PID: 3579 Comm: X Tainted: G    B        3.14.0-test #1
> Call Trace:
> [c000003ca331aaf0] [c00000000001405c] .show_stack+0x50/0x14c (unreliable)
> [c000003ca331abc0] [c0000000008a3a94] .dump_stack+0x90/0xc0
> [c000003ca331ac40] [c00000000014bbec] .print_trailer+0x17c/0x188
> [c000003ca331acd0] [c00000000014bd08] .check_bytes_and_report+0xc0/0x114
> [c000003ca331ad80] [c00000000014be44] .check_object+0xe8/0x240
> [c000003ca331ae20] [c0000000008a2488] .alloc_debug_processing+0x184/0x19c
> [c000003ca331aeb0] [c0000000008a2958] .__slab_alloc+0x4b8/0x540
> [c000003ca331b020] [c00000000014de68] .kmem_cache_alloc+0xc0/0x1ac
> [c000003ca331b0d0] [c000000000107f3c] .mempool_alloc_slab+0x1c/0x30
> [c000003ca331b140] [c0000000001080d4] .mempool_alloc+0x98/0x18c
> [c000003ca331b220] [c0000000003227c4] .get_request+0x248/0x490
> [c000003ca331b340] [c000000000324a68] .blk_queue_bio+0x174/0x25c
> [c000003ca331b3e0] [c000000000322280] .generic_make_request+0xa8/0xec
> [c000003ca331b470] [c0000000003223f8] .submit_bio+0x134/0x14c
> [c000003ca331b530] [c000000000186030] ._submit_bh+0x244/0x278
> [c000003ca331b5c0] [c000000000186b14] .ll_rw_block+0xd4/0x100
> [c000003ca331b670] [c000000000188d80] .__breadahead+0x2c/0x4c
> [c000003ca331b6f0] [c00000000021d18c] .__ext4_get_inode_loc+0x390/0x444
> [c000003ca331b7c0] [c00000000021f7cc] .ext4_iget+0x4c/0x97c
> [c000003ca331b8c0] [c000000000229558] .ext4_lookup+0xc4/0x15c
> [c000003ca331b960] [c00000000016007c] .lookup_real+0x4c/0x74
> [c000003ca331b9e0] [c000000000165bdc] .do_last+0x58c/0xc54
> [c000003ca331bb10] [c00000000016650c] .path_openat+0x268/0x6a0
> [c000003ca331bc30] [c000000000166cc0] .do_filp_open+0x34/0x80
> [c000003ca331bd70] [c000000000155a10] .do_sys_open+0x1a4/0x250
> [c000003ca331be30] [c00000000000a024] syscall_exit+0x0/0x98
> FIX blkdev_requests: Restoring 0xc000003ca627f790-0xc000003ca627f797=0x6b
>
> FIX blkdev_requests: Marking all objects used
> ------------[ cut here ]------------
> kernel BUG at /home/benh/linux-powerpc-test/block/elevator.c:262!
> cpu 0x34: Vector: 700 (Program Check) at [c000001febbd32c0]
>      pc: c00000000031ee40: .elv_rqhash_add+0x10/0x8c
>      lr: c0000000003202c8: .__elv_add_request+0x264/0x27c
>      sp: c000001febbd3540
>     msr: 9000000000029032
>    current = 0xc000003ca620c320
>    paca    = 0xc00000000ffed000	 softe: 0	 irq_happened: 0x01
>      pid   = 3565, comm = in:imjournal
> kernel BUG at /home/benh/linux-powerpc-test/block/elevator.c:262!
> enter ? for help
> [link register   ] c0000000003202c8 .__elv_add_request+0x264/0x27c
> [c000001febbd3540] c0000000003201a0 .__elv_add_request+0x13c/0x27c (unreliable)
> [c000001febbd35e0] c000000000324858 .blk_flush_plug_list+0x1f4/0x254
> [c000001febbd36a0] c0000000003248d0 .blk_finish_plug+0x18/0x3c
> [c000001febbd3720] c000000000111a84 .__do_page_cache_readahead+0x268/0x298
> [c000001febbd3870] c000000000111d24 .ra_submit+0x28/0x38
> [c000001febbd38e0] c000000000106f54 .filemap_fault+0x158/0x3e4
> [c000001febbd39b0] c000000000126cfc .__do_fault+0x48/0xbc
> [c000001febbd3a60] c0000000001272b0 .do_read_fault.isra.65+0x2c/0xe4
> [c000001febbd3b20] c00000000012a718 .handle_mm_fault+0x460/0xbb8
> [c000001febbd3c00] c00000000002ff54 .do_page_fault+0x498/0x7e0
> [c000001febbd3e30] c0000000000094e8 handle_page_fault+0x10/0x30
> --- Exception: 301 (Data Access) at 00003fff9a40d2d4
> SP (3fff9585df30) is in userspace
> 34:mon>

OK, I think we're seeing different symptoms of the same bug. Stay tuned, 
will have something for you to test shortly, I hope.

-- 
Jens Axboe



* Re: [BUG] kernel BUG at /.../block/cfq-iosched.c:3145!
  2014-04-10  1:31       ` Jens Axboe
@ 2014-04-10  1:36         ` Benjamin Herrenschmidt
  2014-04-10  1:38           ` Jens Axboe
  2014-04-10  2:25           ` Jens Axboe
  0 siblings, 2 replies; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2014-04-10  1:36 UTC
  To: Jens Axboe
  Cc: Jan Kara, Linux Kernel Mailing List, Frederic Weisbecker,
	James Bottomley, Brian J King

On Wed, 2014-04-09 at 19:31 -0600, Jens Axboe wrote:
> OK, I think we're seeing different symptoms of the same bug. Stay
> tuned, will have something for you to test shortly, I hope.

Ah thanks. I'll defer the bisection then (it's painful on that machine
for various reasons....)

This is a 64-way P7 btw, so a nice race-o-matic :-)

Cheers,
Ben.



* Re: [BUG] kernel BUG at /.../block/cfq-iosched.c:3145!
  2014-04-10  1:36         ` Benjamin Herrenschmidt
@ 2014-04-10  1:38           ` Jens Axboe
  2014-04-10  2:25           ` Jens Axboe
  1 sibling, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2014-04-10  1:38 UTC
  To: Benjamin Herrenschmidt
  Cc: Jan Kara, Linux Kernel Mailing List, Frederic Weisbecker,
	James Bottomley, Brian J King

On 2014-04-09 19:36, Benjamin Herrenschmidt wrote:
> On Wed, 2014-04-09 at 19:31 -0600, Jens Axboe wrote:
>> OK, I think we're seeing different symptoms of the same bug. Stay
>> tuned, will have something for you to test shortly, I hope.
>
> Ah thanks. I'll defer the bisection then (it's painful on that machine
> for various reasons....)

Yeah, hunting it now. Hopefully I can reproduce it, and a fix should be
forthcoming.

> This is a 64-way P7 btw, so a nice race-o-matic :-)

It's why we all love ppc :-)

-- 
Jens Axboe



* Re: [BUG] kernel BUG at /.../block/cfq-iosched.c:3145!
  2014-04-10  1:36         ` Benjamin Herrenschmidt
  2014-04-10  1:38           ` Jens Axboe
@ 2014-04-10  2:25           ` Jens Axboe
  2014-04-10  3:35             ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2014-04-10  2:25 UTC
  To: Benjamin Herrenschmidt
  Cc: Jan Kara, Linux Kernel Mailing List, Frederic Weisbecker,
	James Bottomley, Brian J King

[-- Attachment #1: Type: text/plain, Size: 402 bytes --]

On 2014-04-09 19:36, Benjamin Herrenschmidt wrote:
> On Wed, 2014-04-09 at 19:31 -0600, Jens Axboe wrote:
>> OK, I think we're seeing different symptoms of the same bug. Stay
>> tuned, will have something for you to test shortly, I hope.
>
> Ah thanks. I'll defer the bisection then (it's painful on that machine
> for various reasons....)

Can you try with these two patches applied?

-- 
Jens Axboe


[-- Attachment #2: scsi-flags.patch --]
[-- Type: text/x-patch, Size: 5331 bytes --]

From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: <linux-scsi@vger.kernel.org>
CC: "Martin K. Petersen" <martin.petersen@oracle.com>, Jens Axboe
	<axboe@fb.com>, Jan Kara <jack@suse.cz>, Frederic Weisbecker
	<fweisbec@gmail.com>
Subject: [PATCH] scsi: Make sure cmd_flags are 64-bit
Date: Wed, 9 Apr 2014 22:20:48 -0400
Message-ID: <1397096448-22073-1-git-send-email-martin.petersen@oracle.com>
X-Mailer: git-send-email 1.8.3.1

From: "Martin K. Petersen" <martin.petersen@oracle.com>

cmd_flags in struct request is now 64 bits wide, but the scsi_execute
functions took their flags argument as an int, truncating it and leading
to errors. Make sure the flags parameters are u64.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
---
 drivers/scsi/scsi_lib.c    | 4 ++--
 include/scsi/scsi_device.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5681c05ac506..65a123d9c676 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -184,7 +184,7 @@ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
  */
 int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 		 int data_direction, void *buffer, unsigned bufflen,
-		 unsigned char *sense, int timeout, int retries, int flags,
+		 unsigned char *sense, int timeout, int retries, u64 flags,
 		 int *resid)
 {
 	struct request *req;
@@ -235,7 +235,7 @@ EXPORT_SYMBOL(scsi_execute);
 int scsi_execute_req_flags(struct scsi_device *sdev, const unsigned char *cmd,
 		     int data_direction, void *buffer, unsigned bufflen,
 		     struct scsi_sense_hdr *sshdr, int timeout, int retries,
-		     int *resid, int flags)
+		     int *resid, u64 flags)
 {
 	char *sense = NULL;
 	int result;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 4e845b80efd3..5853c913d2b0 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -423,11 +423,11 @@ extern int scsi_is_target_device(const struct device *);
 extern int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 			int data_direction, void *buffer, unsigned bufflen,
 			unsigned char *sense, int timeout, int retries,
-			int flag, int *resid);
+			u64 flags, int *resid);
 extern int scsi_execute_req_flags(struct scsi_device *sdev,
 	const unsigned char *cmd, int data_direction, void *buffer,
 	unsigned bufflen, struct scsi_sense_hdr *sshdr, int timeout,
-	int retries, int *resid, int flags);
+	int retries, int *resid, u64 flags);
 static inline int scsi_execute_req(struct scsi_device *sdev,
 	const unsigned char *cmd, int data_direction, void *buffer,
 	unsigned bufflen, struct scsi_sense_hdr *sshdr, int timeout,
-- 
1.8.3.1
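
A minimal sketch of why the parameter width in the hunks above can matter.
It assumes, which the hunks themselves do not state, that a caller may pass
a flag that sits above bit 31; the REQ_* masks in this thread are built as
(1ULL << bit), so such a mask would not fit in a 32-bit int. The demo_*
names and the bit position 33 are hypothetical and chosen only for
illustration.

```c
#include <stdint.h>

/* Hypothetical flags; the real __REQ_* numbering is not assumed here. */
#define DEMO_LOW_FLAG  (1ULL << 0)
#define DEMO_HIGH_FLAG (1ULL << 33)

/*
 * Models a 32-bit-wide flags parameter: only the low 32 bits of the
 * caller's mask survive. The explicit uint32_t keeps the narrowing
 * well defined in C.
 */
static uint64_t demo_pass_narrow(uint64_t flags)
{
	uint32_t narrowed = (uint32_t)flags;

	return narrowed;
}

/* Models the u64 parameter from the patch: the mask arrives intact. */
static uint64_t demo_pass_wide(uint64_t flags)
{
	return flags;
}
```

Passing DEMO_HIGH_FLAG | DEMO_LOW_FLAG through demo_pass_narrow() keeps only
DEMO_LOW_FLAG, while demo_pass_wide() returns the full mask.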


[-- Attachment #3: blk-ipi-v2.patch --]
[-- Type: text/x-patch, Size: 4661 bytes --]

diff --git a/block/blk-core.c b/block/blk-core.c
index 7af4a4898dcb..0bc030aff0d0 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1307,7 +1307,7 @@ void __blk_put_request(struct request_queue *q, struct request *req)
 		struct request_list *rl = blk_rq_rl(req);
 
 		BUG_ON(!list_empty(&req->queuelist));
-		BUG_ON(!hlist_unhashed(&req->hash));
+		BUG_ON(ELV_ON_HASH(req));
 
 		blk_free_request(rl, req);
 		freed_request(rl, flags);
diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index ebd6b6f1bdeb..53b1737e978d 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -30,8 +30,8 @@ static void blk_done_softirq(struct softirq_action *h)
 	while (!list_empty(&local_list)) {
 		struct request *rq;
 
-		rq = list_entry(local_list.next, struct request, queuelist);
-		list_del_init(&rq->queuelist);
+		rq = list_entry(local_list.next, struct request, ipi_list);
+		list_del_init(&rq->ipi_list);
 		rq->q->softirq_done_fn(rq);
 	}
 }
@@ -45,14 +45,9 @@ static void trigger_softirq(void *data)
 
 	local_irq_save(flags);
 	list = this_cpu_ptr(&blk_cpu_done);
-	/*
-	 * We reuse queuelist for a list of requests to process. Since the
-	 * queuelist is used by the block layer only for requests waiting to be
-	 * submitted to the device it is unused now.
-	 */
-	list_add_tail(&rq->queuelist, list);
+	list_add_tail(&rq->ipi_list, list);
 
-	if (list->next == &rq->queuelist)
+	if (list->next == &rq->ipi_list)
 		raise_softirq_irqoff(BLOCK_SOFTIRQ);
 
 	local_irq_restore(flags);
@@ -141,7 +136,7 @@ void __blk_complete_request(struct request *req)
 		struct list_head *list;
 do_local:
 		list = this_cpu_ptr(&blk_cpu_done);
-		list_add_tail(&req->queuelist, list);
+		list_add_tail(&req->ipi_list, list);
 
 		/*
 		 * if the list only contains our just added request,
@@ -149,7 +144,7 @@ do_local:
 		 * entries there, someone already raised the irq but it
 		 * hasn't run yet.
 		 */
-		if (list->next == &req->queuelist)
+		if (list->next == &req->ipi_list)
 			raise_softirq_irqoff(BLOCK_SOFTIRQ);
 	} else if (raise_blk_irq(ccpu, req))
 		goto do_local;
diff --git a/block/blk.h b/block/blk.h
index d23b415b8a28..1d880f1f957f 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -78,7 +78,7 @@ static inline void blk_clear_rq_complete(struct request *rq)
 /*
  * Internal elevator interface
  */
-#define ELV_ON_HASH(rq) hash_hashed(&(rq)->hash)
+#define ELV_ON_HASH(rq) ((rq)->cmd_flags & REQ_HASHED)
 
 void blk_insert_flush(struct request *rq);
 void blk_abort_flushes(struct request_queue *q);
diff --git a/block/elevator.c b/block/elevator.c
index 42c45a7d6714..1e01b66a0b92 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -247,6 +247,7 @@ EXPORT_SYMBOL(elevator_exit);
 static inline void __elv_rqhash_del(struct request *rq)
 {
 	hash_del(&rq->hash);
+	rq->cmd_flags &= ~REQ_HASHED;
 }
 
 static void elv_rqhash_del(struct request_queue *q, struct request *rq)
@@ -261,6 +262,7 @@ static void elv_rqhash_add(struct request_queue *q, struct request *rq)
 
 	BUG_ON(ELV_ON_HASH(rq));
 	hash_add(e->hash, &rq->hash, rq_hash_key(rq));
+	rq->cmd_flags |= REQ_HASHED;
 }
 
 static void elv_rqhash_reposition(struct request_queue *q, struct request *rq)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index bbc3a6c88fce..aa0eaa2d0bd8 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -189,6 +189,7 @@ enum rq_flag_bits {
 	__REQ_KERNEL, 		/* direct IO to kernel pages */
 	__REQ_PM,		/* runtime pm request */
 	__REQ_END,		/* last of chain of requests */
+	__REQ_HASHED,		/* on IO scheduler merge hash */
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -241,5 +242,6 @@ enum rq_flag_bits {
 #define REQ_KERNEL		(1ULL << __REQ_KERNEL)
 #define REQ_PM			(1ULL << __REQ_PM)
 #define REQ_END			(1ULL << __REQ_END)
+#define REQ_HASHED		(1ULL << __REQ_HASHED)
 
 #endif /* __LINUX_BLK_TYPES_H */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5a31307c5ded..e402f8421c0c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -118,7 +118,18 @@ struct request {
 	struct bio *bio;
 	struct bio *biotail;
 
-	struct hlist_node hash;	/* merge hash */
+	/*
+	 * The hash is used inside the scheduler, and killed once the
+	 * request reaches the dispatch list. The ipi_list is only used
+	 * to queue the request for softirq completion, which is long
+	 * after the request has been unhashed (and even removed from
+	 * the dispatch list).
+	 */
+	union {
+		struct hlist_node hash;	/* merge hash */
+		struct list_head ipi_list;
+	};
+
 	/*
 	 * The rb_node is only used inside the io scheduler, requests
 	 * are pruned when moved to the dispatch queue. So let the
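
A userspace sketch of the storage sharing that the patch above introduces,
and of why the hashed state is tracked in a flag bit rather than read back
from the node pointers. The demo_* types are stand-ins that only mirror the
two-pointer layout of the kernel's hlist_node and list_head; the bit
position of DEMO_REQ_HASHED is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins with the same two-pointer shape as the kernel structures. */
struct demo_list_head  { struct demo_list_head *next, *prev; };
struct demo_hlist_node { struct demo_hlist_node *next, **pprev; };

#define DEMO_REQ_HASHED (1ULL << 0)	/* hypothetical bit position */

struct demo_request {
	uint64_t cmd_flags;
	union {				/* same storage, used at different times */
		struct demo_hlist_node hash;		/* while in the scheduler */
		struct demo_list_head  ipi_list;	/* while queued for completion */
	};
};

/* Pointer-based check in the style of hlist_unhashed(): pprev == NULL. */
static bool demo_hash_looks_unhashed(const struct demo_request *rq)
{
	return rq->hash.pprev == NULL;
}

/* Flag-based check in the spirit of the patched ELV_ON_HASH(). */
static bool demo_on_hash(const struct demo_request *rq)
{
	return (rq->cmd_flags & DEMO_REQ_HASHED) != 0;
}

/* Leaves the list node self-linked, as list_del_init() would. */
static void demo_ipi_list_init(struct demo_request *rq)
{
	rq->ipi_list.next = &rq->ipi_list;
	rq->ipi_list.prev = &rq->ipi_list;
}
```

Once demo_ipi_list_init() has run, the bytes that back hash.pprev are
non-NULL, so the pointer-based check reports the request as hashed even
though it never was; the flag-based check is unaffected by the shared
storage.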


* Re: [BUG] kernel BUG at /.../block/cfq-iosched.c:3145!
  2014-04-10  2:25           ` Jens Axboe
@ 2014-04-10  3:35             ` Benjamin Herrenschmidt
  2014-04-10  3:52               ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2014-04-10  3:35 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Jan Kara, Linux Kernel Mailing List, Frederic Weisbecker,
	James Bottomley, Brian J King

On Wed, 2014-04-09 at 20:25 -0600, Jens Axboe wrote:
> On 2014-04-09 19:36, Benjamin Herrenschmidt wrote:
> > On Wed, 2014-04-09 at 19:31 -0600, Jens Axboe wrote:
> >> OK, I think we're seeing different symptoms of the same bug. Stay
> >> tuned, will have something for you to test shortly, I hope.
> >
> > Ah thanks. I'll defer the bisection then (it's painful on that machine
> > for various reasons....)
> 
> Can you try with these two patches applied?

Booted to login prompt once ... and twice. Much better ! :-)

Thanks a lot !

Not sure if it's related, but in the "good" boots (with the patch),
I see a truckload of:

systemd-udevd[3264]: starting version 204
sr 2:6:0:0: [sr0]  
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sr 2:6:0:0: [sr0]  
Sense Key : Illegal Request [current] 
sr 2:6:0:0: [sr0]  
Add. Sense: Logical block address out of range
sr 2:6:0:0: [sr0] CDB: 
Read(10): 28 00 00 00 00 00 00 00 01 00
end_request: critical target error, dev sr0, sector 0
Buffer I/O error on device sr0, logical block 0
[  OK  ] Started udev Kernel Device Manager.
sr 2:6:0:0: [sr0]  
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sr 2:6:0:0: [sr0]  
Sense Key : Illegal Request [current] 
sr 2:6:0:0: [sr0]  
Add. Sense: Logical block address out of range
sr 2:6:0:0: [sr0] CDB: 
 
   etc...

I might have a crap coaster in that drive. It doesn't prevent
the machine from working otherwise, so far...

Cheers,
Ben.
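
For readers decoding the CDB bytes in the log above, a small sketch of the
standard READ(10) layout: opcode in byte 0, a big-endian logical block
address in bytes 2 to 5, and a big-endian transfer length in bytes 7 and 8.
The demo_* names are hypothetical; the byte values are the ones printed in
the log.

```c
#include <stdint.h>

/* The CDB as printed in the log: Read(10): 28 00 00 00 00 00 00 00 01 00 */
static const uint8_t demo_logged_cdb[10] = {
	0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00
};

struct demo_read10 {
	uint8_t  opcode;	/* 0x28 for READ(10) */
	uint32_t lba;		/* first logical block to read */
	uint16_t blocks;	/* number of blocks to transfer */
};

/* Extracts the big-endian fields of a 10-byte READ(10) CDB. */
static struct demo_read10 demo_decode_read10(const uint8_t cdb[10])
{
	struct demo_read10 r;

	r.opcode = cdb[0];
	r.lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
		((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
	r.blocks = (uint16_t)(((unsigned)cdb[7] << 8) | cdb[8]);
	return r;
}
```

Decoding demo_logged_cdb gives a one-block read at logical block 0, which
matches the "sector 0" and "logical block 0" lines in the same log.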




* Re: [BUG] kernel BUG at /.../block/cfq-iosched.c:3145!
  2014-04-10  3:35             ` Benjamin Herrenschmidt
@ 2014-04-10  3:52               ` Jens Axboe
  2014-04-10  6:45                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2014-04-10  3:52 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jan Kara, Linux Kernel Mailing List, Frederic Weisbecker,
	James Bottomley, Brian J King

On 2014-04-09 21:35, Benjamin Herrenschmidt wrote:
> On Wed, 2014-04-09 at 20:25 -0600, Jens Axboe wrote:
>> On 2014-04-09 19:36, Benjamin Herrenschmidt wrote:
>>> On Wed, 2014-04-09 at 19:31 -0600, Jens Axboe wrote:
>>>> OK, I think we're seeing different symptoms of the same bug. Stay
>>>> tuned, will have something for you to test shortly, I hope.
>>>
>>> Ah thanks. I'll defer the bisection then (it's painful on that machine
>>> for various reasons....)
>>
>> Can you try with these two patches applied?
>
> Booted to login prompt once ... and twice. Much better ! :-)

Excellent! I was hoping it was the same bug :-)
I'll amend the commit and add your tested-by.

> Not sure if it's related, but in the "good" boots (with the patch),
> I see a truckload of:
>
> systemd-udevd[3264]: starting version 204
> sr 2:6:0:0: [sr0]
> Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sr 2:6:0:0: [sr0]
> Sense Key : Illegal Request [current]
> sr 2:6:0:0: [sr0]
> Add. Sense: Logical block address out of range
> sr 2:6:0:0: [sr0] CDB:
> Read(10): 28 00 00 00 00 00 00 00 01 00
> end_request: critical target error, dev sr0, sector 0
> Buffer I/O error on device sr0, logical block 0
> [  OK  ] Started udev Kernel Device Manager.
> sr 2:6:0:0: [sr0]
> Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sr 2:6:0:0: [sr0]
> Sense Key : Illegal Request [current]
> sr 2:6:0:0: [sr0]
> Add. Sense: Logical block address out of range
> sr 2:6:0:0: [sr0] CDB:
>
>     etc...
>
> I might have a crap coaster in that drive. It doesn't prevent
> the machine from working otherwise, so far...

That does look like just a dud cdrom in the tray.

-- 
Jens Axboe



* Re: [BUG] kernel BUG at /.../block/cfq-iosched.c:3145!
  2014-04-10  3:52               ` Jens Axboe
@ 2014-04-10  6:45                 ` Benjamin Herrenschmidt
  2014-04-10 13:47                   ` Jens Axboe
  2014-04-10 18:10                   ` Jens Axboe
  0 siblings, 2 replies; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2014-04-10  6:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Jan Kara, Linux Kernel Mailing List, Frederic Weisbecker,
	James Bottomley, Brian J King

On Wed, 2014-04-09 at 21:52 -0600, Jens Axboe wrote:
> On 2014-04-09 21:35, Benjamin Herrenschmidt wrote:
> > On Wed, 2014-04-09 at 20:25 -0600, Jens Axboe wrote:
> >> On 2014-04-09 19:36, Benjamin Herrenschmidt wrote:
> >>> On Wed, 2014-04-09 at 19:31 -0600, Jens Axboe wrote:
> >>>> OK, I think we're seeing different symptoms of the same bug. Stay
> >>>> tuned, will have something for you to test shortly, I hope.
> >>>
> >>> Ah thanks. I'll defer the bisection then (it's painful on that machine
> >>> for various reasons....)
> >>
> >> Can you try with these two patches applied?
> >
> > Booted to login prompt once ... and twice. Much better ! :-)
> 
> Excellent! I was hoping it was the same bug :-)
> I'll amend the commit and add your tested-by.

A colleague was complaining of yet another oddball block related crash
today on another machine when doing rsync's with upstream (this machine
booted fine but crashed later on, but then it has less CPUs... maybe
that's relevant) and these patches fixed it too.

Cheers,
Ben.




* Re: [BUG] kernel BUG at /.../block/cfq-iosched.c:3145!
  2014-04-10  6:45                 ` Benjamin Herrenschmidt
@ 2014-04-10 13:47                   ` Jens Axboe
  2014-04-10 18:10                   ` Jens Axboe
  1 sibling, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2014-04-10 13:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jan Kara, Linux Kernel Mailing List, Frederic Weisbecker,
	James Bottomley, Brian J King

On 2014-04-10 00:45, Benjamin Herrenschmidt wrote:
> On Wed, 2014-04-09 at 21:52 -0600, Jens Axboe wrote:
>> On 2014-04-09 21:35, Benjamin Herrenschmidt wrote:
>>> On Wed, 2014-04-09 at 20:25 -0600, Jens Axboe wrote:
>>>> On 2014-04-09 19:36, Benjamin Herrenschmidt wrote:
>>>>> On Wed, 2014-04-09 at 19:31 -0600, Jens Axboe wrote:
>>>>>> OK, I think we're seeing different symptoms of the same bug. Stay
>>>>>> tuned, will have something for you to test shortly, I hope.
>>>>>
>>>>> Ah thanks. I'll defer the bisection then (it's painful on that machine
>>>>> for various reasons....)
>>>>
>>>> Can you try with these two patches applied?
>>>
>>> Booted to login prompt once ... and twice. Much better ! :-)
>>
>> Excellent! I was hoping it was the same bug :-)
>> I'll amend the commit and add your tested-by.
>
> A colleague was complaining of yet another oddball block related crash
> today on another machine when doing rsync's with upstream (this machine
> booted fine but crashed later on, but then it has less CPUs... maybe
> that's relevant) and these patches fixed it too.

I'm sure it could manifest itself in a lot of weird ways. I'll send it 
upstream this morning, so hopefully should be fixed in mainline shortly.

-- 
Jens Axboe



* Re: [BUG] kernel BUG at /.../block/cfq-iosched.c:3145!
  2014-04-10  6:45                 ` Benjamin Herrenschmidt
  2014-04-10 13:47                   ` Jens Axboe
@ 2014-04-10 18:10                   ` Jens Axboe
  1 sibling, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2014-04-10 18:10 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jan Kara, Linux Kernel Mailing List, Frederic Weisbecker,
	James Bottomley, Brian J King

On 04/10/2014 12:45 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2014-04-09 at 21:52 -0600, Jens Axboe wrote:
>> On 2014-04-09 21:35, Benjamin Herrenschmidt wrote:
>>> On Wed, 2014-04-09 at 20:25 -0600, Jens Axboe wrote:
>>>> On 2014-04-09 19:36, Benjamin Herrenschmidt wrote:
>>>>> On Wed, 2014-04-09 at 19:31 -0600, Jens Axboe wrote:
>>>>>> OK, I think we're seeing different symptoms of the same bug. Stay
>>>>>> tuned, will have something for you to test shortly, I hope.
>>>>>
>>>>> Ah thanks. I'll defer the bisection then (it's painful on that machine
>>>>> for various reasons....)
>>>>
>>>> Can you try with these two patches applied?
>>>
>>> Booted to login prompt once ... and twice. Much better ! :-)
>>
>> Excellent! I was hoping it was the same bug :-)
>> I'll amend the commit and add your tested-by.
>
> A colleague was complaining of yet another oddball block related crash
> today on another machine when doing rsync's with upstream (this machine
> booted fine but crashed later on, but then it has less CPUs... maybe
> that's relevant) and these patches fixed it too.

Pull request is now in, jfyi.

-- 
Jens Axboe



Thread overview: 13+ messages
2014-04-09  4:44 [BUG] kernel BUG at /.../block/cfq-iosched.c:3145! Benjamin Herrenschmidt
2014-04-09 13:35 ` Jens Axboe
2014-04-09 23:47   ` Benjamin Herrenschmidt
2014-04-10  0:02     ` Benjamin Herrenschmidt
2014-04-10  1:31       ` Jens Axboe
2014-04-10  1:36         ` Benjamin Herrenschmidt
2014-04-10  1:38           ` Jens Axboe
2014-04-10  2:25           ` Jens Axboe
2014-04-10  3:35             ` Benjamin Herrenschmidt
2014-04-10  3:52               ` Jens Axboe
2014-04-10  6:45                 ` Benjamin Herrenschmidt
2014-04-10 13:47                   ` Jens Axboe
2014-04-10 18:10                   ` Jens Axboe
