All of lore.kernel.org
 help / color / mirror / Atom feed
* kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-18  9:18 Michael Ellerman
  2015-11-18 11:06 ` Laurent Dufour
  2015-11-18 14:03 ` Mark Salter
  0 siblings, 2 replies; 58+ messages in thread
From: Michael Ellerman @ 2015-11-18  9:18 UTC (permalink / raw)
  To: James E. J. Bottomley, brking; +Cc: linux-scsi, linux-kernel, linuxppc-dev

Hi folks,

I'm intermittently seeing the following oops on at least one powerpc box.

The BUG_ON() is from:

static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb)
{
	...
	count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
	BUG_ON(count > sdb->table.nents);

Looking at the dump it looks like count was 2, I can't work out what nents was.

The machine's just a fairly boring bare metal setup, with a single IPR adapter:

0001:08:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 02)
	Subsystem: IBM PCIe3 x8 SAS RAID Internal Adapter 6Gb (57D7)
	Flags: bus master, fast devsel, latency 0
	Kernel driver in use: ipr


Anyone seen it before or have any ideas?

cheers


systemd[1]: Starting Uncomplicated firewall...
         Starting Uncomplicated firewall...
------------[ cut here ]------------
kernel BUG at drivers/scsi/scsi_lib.c:1096!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 132 PID: 2699 Comm: kworker/132:1H Not tainted 4.4.0-rc1-54939-ge22a248-dirty #77
Workqueue: kblockd cfq_kick_queue
task: c000000fef147400 ti: c000000feb384000 task.ti: c000000feb384000
NIP: c0000000005ab4a8 LR: c0000000005ab490 CTR: 0000000000000000
REGS: c000000feb387620 TRAP: 0700   Not tainted  (4.4.0-rc1-54939-ge22a248-dirty)
MSR: 9000000100029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002228  XER: 00000000
CFAR: c000000000464950 SOFTE: 0 
GPR00: c0000000005ab490 c000000feb3878a0 c000000000d77d00 0000000000000002 
GPR04: c000000ff2030158 c000000ff47a0c00 0000000000000000 0000000000001000 
GPR08: 0000000000000000 0000000000000001 0000000000000000 fffffffffffffff7 
GPR12: 0000000000002200 c00000000fde5200 c0000000000c8098 c000000000b39858 
GPR16: c000000000ae12c8 c000000000b39948 0000000000000000 0000000000000000 
GPR20: c0000000005ab9c0 c0000017f5800144 0000000000000000 c000001e551a6850 
GPR24: c0000017f5800140 c0000007efb9c800 0000000000000000 c000001e551a6800 
GPR28: 0000000000000000 0000000000000000 c000000ff2030158 c000000feb420240 
NIP [c0000000005ab4a8] scsi_init_sgtable+0xa8/0x180
LR [c0000000005ab490] scsi_init_sgtable+0x90/0x180
Call Trace:
[c000000feb3878a0] [c0000000005ab490] scsi_init_sgtable+0x90/0x180 (unreliable)
[c000000feb3878e0] [c0000000005ab5d4] scsi_init_io+0x54/0x160
[c000000feb387930] [c0000000005fb43c] sd_init_command+0x6c/0xb00
[c000000feb3879f0] [c0000000005ac2f8] scsi_setup_cmnd+0x108/0x1a0
[c000000feb387a30] [c0000000005ac594] scsi_prep_fn+0x104/0x1c0
[c000000feb387a70] [c00000000045d85c] blk_peek_request+0x20c/0x390
[c000000feb387af0] [c0000000005ad6a8] scsi_request_fn+0xb8/0x7f0
[c000000feb387bf0] [c0000000004583b4] __blk_run_queue+0x54/0x80
[c000000feb387c20] [c00000000047e178] cfq_kick_queue+0x38/0xc0
[c000000feb387c50] [c0000000000c00cc] process_one_work+0x2ac/0x560
[c000000feb387ce0] [c0000000000c0510] worker_thread+0x190/0x660
[c000000feb387d80] [c0000000000c8198] kthread+0x108/0x130
[c000000feb387e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac
Instruction dump:
2fa30000 409e00e4 e8bf0000 e87e0030 7fc4f378 4beb91ed 60000000 813f0008 
78630020 7f891840 7d301026 5529effe <0b090000> 907f0008 38600000 813e005c 
---[ end trace 3ea632a512ff48d5 ]---


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-18  9:18 kernel BUG at drivers/scsi/scsi_lib.c:1096! Michael Ellerman
@ 2015-11-18 11:06 ` Laurent Dufour
  2015-11-18 11:10   ` Michael Ellerman
  2015-11-18 14:03 ` Mark Salter
  1 sibling, 1 reply; 58+ messages in thread
From: Laurent Dufour @ 2015-11-18 11:06 UTC (permalink / raw)
  To: linuxppc-dev

On 18/11/2015 10:18, Michael Ellerman wrote:
> Hi folks,
> 
> I'm intermittently seeing the following oops on at least one powerpc box.
> 
> The BUG_ON() is from:
> 
> static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb)
> {
> 	...
> 	count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
> 	BUG_ON(count > sdb->table.nents);
> 
> Looking at the dump it looks like count was 2, I can't work out what nents was.
> 
> The machine's just a fairly boring bare metal setup, with a single IPR adapter:
> 
> 0001:08:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 02)
> 	Subsystem: IBM PCIe3 x8 SAS RAID Internal Adapter 6Gb (57D7)
> 	Flags: bus master, fast devsel, latency 0
> 	Kernel driver in use: ipr
> 
> 
> Anyone seen it before or have any ideas?

Hi Michael,

I'm facing the same panic on my bare metal system.

Sounds to be raised by the latest update to 4.4-rc1

Laurent.

> cheers
> 
> 
> systemd[1]: Starting Uncomplicated firewall...
>          Starting Uncomplicated firewall...
> ------------[ cut here ]------------
> kernel BUG at drivers/scsi/scsi_lib.c:1096!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in:
> CPU: 132 PID: 2699 Comm: kworker/132:1H Not tainted 4.4.0-rc1-54939-ge22a248-dirty #77
> Workqueue: kblockd cfq_kick_queue
> task: c000000fef147400 ti: c000000feb384000 task.ti: c000000feb384000
> NIP: c0000000005ab4a8 LR: c0000000005ab490 CTR: 0000000000000000
> REGS: c000000feb387620 TRAP: 0700   Not tainted  (4.4.0-rc1-54939-ge22a248-dirty)
> MSR: 9000000100029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002228  XER: 00000000
> CFAR: c000000000464950 SOFTE: 0 
> GPR00: c0000000005ab490 c000000feb3878a0 c000000000d77d00 0000000000000002 
> GPR04: c000000ff2030158 c000000ff47a0c00 0000000000000000 0000000000001000 
> GPR08: 0000000000000000 0000000000000001 0000000000000000 fffffffffffffff7 
> GPR12: 0000000000002200 c00000000fde5200 c0000000000c8098 c000000000b39858 
> GPR16: c000000000ae12c8 c000000000b39948 0000000000000000 0000000000000000 
> GPR20: c0000000005ab9c0 c0000017f5800144 0000000000000000 c000001e551a6850 
> GPR24: c0000017f5800140 c0000007efb9c800 0000000000000000 c000001e551a6800 
> GPR28: 0000000000000000 0000000000000000 c000000ff2030158 c000000feb420240 
> NIP [c0000000005ab4a8] scsi_init_sgtable+0xa8/0x180
> LR [c0000000005ab490] scsi_init_sgtable+0x90/0x180
> Call Trace:
> [c000000feb3878a0] [c0000000005ab490] scsi_init_sgtable+0x90/0x180 (unreliable)
> [c000000feb3878e0] [c0000000005ab5d4] scsi_init_io+0x54/0x160
> [c000000feb387930] [c0000000005fb43c] sd_init_command+0x6c/0xb00
> [c000000feb3879f0] [c0000000005ac2f8] scsi_setup_cmnd+0x108/0x1a0
> [c000000feb387a30] [c0000000005ac594] scsi_prep_fn+0x104/0x1c0
> [c000000feb387a70] [c00000000045d85c] blk_peek_request+0x20c/0x390
> [c000000feb387af0] [c0000000005ad6a8] scsi_request_fn+0xb8/0x7f0
> [c000000feb387bf0] [c0000000004583b4] __blk_run_queue+0x54/0x80
> [c000000feb387c20] [c00000000047e178] cfq_kick_queue+0x38/0xc0
> [c000000feb387c50] [c0000000000c00cc] process_one_work+0x2ac/0x560
> [c000000feb387ce0] [c0000000000c0510] worker_thread+0x190/0x660
> [c000000feb387d80] [c0000000000c8198] kthread+0x108/0x130
> [c000000feb387e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac
> Instruction dump:
> 2fa30000 409e00e4 e8bf0000 e87e0030 7fc4f378 4beb91ed 60000000 813f0008 
> 78630020 7f891840 7d301026 5529effe <0b090000> 907f0008 38600000 813e005c 
> ---[ end trace 3ea632a512ff48d5 ]---
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-18 11:06 ` Laurent Dufour
@ 2015-11-18 11:10   ` Michael Ellerman
  2015-11-18 11:17     ` Laurent Dufour
  0 siblings, 1 reply; 58+ messages in thread
From: Michael Ellerman @ 2015-11-18 11:10 UTC (permalink / raw)
  To: Laurent Dufour, linuxppc-dev

On Wed, 2015-11-18 at 12:06 +0100, Laurent Dufour wrote:
> On 18/11/2015 10:18, Michael Ellerman wrote:
> > Hi folks,
> > 
> > I'm intermittently seeing the following oops on at least one powerpc box.
> > 
> > The BUG_ON() is from:
> > 
> > static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb)
> > {
> > 	...
> > 	count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
> > 	BUG_ON(count > sdb->table.nents);
> > 
> > Looking at the dump it looks like count was 2, I can't work out what nents was.
> > 
> > The machine's just a fairly boring bare metal setup, with a single IPR adapter:
> > 
> > 0001:08:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 02)
> > 	Subsystem: IBM PCIe3 x8 SAS RAID Internal Adapter 6Gb (57D7)
> > 	Flags: bus master, fast devsel, latency 0
> > 	Kernel driver in use: ipr
> > 
> > 
> > Anyone seen it before or have any ideas?
> 
> Hi Michael,
> 
> I'm facing the same panic on my bare metal system.
> 
> Sounds to be raised by the latest update to 4.4-rc1

Yeah I think it's new since 4.3.

Do you see it regularly, or just every now and then?

cheers

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-18 11:10   ` Michael Ellerman
@ 2015-11-18 11:17     ` Laurent Dufour
  0 siblings, 0 replies; 58+ messages in thread
From: Laurent Dufour @ 2015-11-18 11:17 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev

On 18/11/2015 12:10, Michael Ellerman wrote:
> On Wed, 2015-11-18 at 12:06 +0100, Laurent Dufour wrote:
>> On 18/11/2015 10:18, Michael Ellerman wrote:
>>> Hi folks,
>>>
>>> I'm intermittently seeing the following oops on at least one powerpc box.
>>>
>>> The BUG_ON() is from:
>>>
>>> static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb)
>>> {
>>> 	...
>>> 	count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
>>> 	BUG_ON(count > sdb->table.nents);
>>>
>>> Looking at the dump it looks like count was 2, I can't work out what nents was.
>>>
>>> The machine's just a fairly boring bare metal setup, with a single IPR adapter:
>>>
>>> 0001:08:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 02)
>>> 	Subsystem: IBM PCIe3 x8 SAS RAID Internal Adapter 6Gb (57D7)
>>> 	Flags: bus master, fast devsel, latency 0
>>> 	Kernel driver in use: ipr
>>>
>>>
>>> Anyone seen it before or have any ideas?
>>
>> Hi Michael,
>>
>> I'm facing the same panic on my bare metal system.
>>
>> Sounds to be raised by the latest update to 4.4-rc1
> 
> Yeah I think it's new since 4.3.
> 
> Do you see it regularly, or just every now and then?

I got it almost every time I'm stressing the system's memory, but I
can't state if it's tied or not.

I can't remember having seeing it with 4.3, I'll double check that and
getting back to you.

Laurent.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-18  9:18 kernel BUG at drivers/scsi/scsi_lib.c:1096! Michael Ellerman
  2015-11-18 11:06 ` Laurent Dufour
@ 2015-11-18 14:03 ` Mark Salter
  2015-11-19  1:02   ` Michael Ellerman
  1 sibling, 1 reply; 58+ messages in thread
From: Mark Salter @ 2015-11-18 14:03 UTC (permalink / raw)
  To: Michael Ellerman, James E. J. Bottomley, brking
  Cc: linux-scsi, linux-kernel, linuxppc-dev

On Wed, 2015-11-18 at 20:18 +1100, Michael Ellerman wrote:
> Hi folks,
> 
> I'm intermittently seeing the following oops on at least one powerpc box.
> 
> The BUG_ON() is from:
> 
> static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb)
> {
> 	...
> 	count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
> 	BUG_ON(count > sdb->table.nents);
> 
> Looking at the dump it looks like count was 2, I can't work out what nents was.
> 
> The machine's just a fairly boring bare metal setup, with a single IPR adapter:
> 
> 0001:08:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 02)
> 	Subsystem: IBM PCIe3 x8 SAS RAID Internal Adapter 6Gb (57D7)
> 	Flags: bus master, fast devsel, latency 0
> 	Kernel driver in use: ipr
> 
> 
> Anyone seen it before or have any ideas?

I'm also seeing it on arm64 in 4.4-rc1

[    6.859003] Call trace:                                                                      
[    6.861439] [<fffffe000074345c>] scsi_init_sgtable+0x84/0x88                                 
[    6.867072] [<fffffe00007434ac>] scsi_init_io+0x4c/0x1ac                                     
[    6.872358] [<fffffe000075004c>] sd_setup_read_write_cmnd+0x44/0x844                         
[    6.878682] [<fffffe0000750884>] sd_init_command+0x38/0xb0                                   
[    6.884141] [<fffffe00007436e4>] scsi_setup_cmnd+0xd8/0x13c                                  
[    6.889686] [<fffffe000074397c>] scsi_prep_fn+0xc0/0x140                                     
[    6.894973] [<fffffe00003bd0cc>] blk_peek_request+0x148/0x24c                                
[    6.900692] [<fffffe0000744754>] scsi_request_fn+0x58/0x648                                  
[    6.906237] [<fffffe00003b8e80>] __blk_run_queue+0x40/0x58                                   
[    6.911696] [<fffffe00003b91a4>] blk_run_queue+0x30/0x48                                     
[    6.916983] [<fffffe00007422e4>] scsi_run_queue+0x204/0x294                                  
[    6.922528] [<fffffe000074321c>] scsi_end_request+0x13c/0x1a0                                
[    6.928247] [<fffffe0000744ef4>] scsi_io_completion+0xf0/0x564                               
[    6.934052] [<fffffe000073beb0>] scsi_finish_command+0xe4/0x144                              
[    6.939943] [<fffffe00007446cc>] scsi_softirq_done+0x148/0x178                               
[    6.945748] [<fffffe00003c39a4>] blk_done_softirq+0x7c/0x94                                  
[    6.951295] [<fffffe00000c8f44>] __do_softirq+0x114/0x2a0                                    
[    6.956667] [<fffffe00000c9340>] irq_exit+0x8c/0xe4                                          
[    6.961522] [<fffffe000009cf00>] handle_IPI+0x170/0x228                                      
[    6.966721] [<fffffe00000904e0>] gic_handle_irq+0xa0/0xb8                                    
[    6.972093] Exception stack(0xfffffe03dc143de0 to 0xfffffe03dc143f00)                        

> 
> cheers
> 
> 
> systemd[1]: Starting Uncomplicated firewall...
>          Starting Uncomplicated firewall...
> ------------[ cut here ]------------
> kernel BUG at drivers/scsi/scsi_lib.c:1096!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in:
> CPU: 132 PID: 2699 Comm: kworker/132:1H Not tainted 4.4.0-rc1-54939-ge22a248-dirty #77
> Workqueue: kblockd cfq_kick_queue
> task: c000000fef147400 ti: c000000feb384000 task.ti: c000000feb384000
> NIP: c0000000005ab4a8 LR: c0000000005ab490 CTR: 0000000000000000
> REGS: c000000feb387620 TRAP: 0700   Not tainted  (4.4.0-rc1-54939-ge22a248-dirty)
> MSR: 9000000100029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002228  XER: 00000000
> CFAR: c000000000464950 SOFTE: 0 
> GPR00: c0000000005ab490 c000000feb3878a0 c000000000d77d00 0000000000000002 
> GPR04: c000000ff2030158 c000000ff47a0c00 0000000000000000 0000000000001000 
> GPR08: 0000000000000000 0000000000000001 0000000000000000 fffffffffffffff7 
> GPR12: 0000000000002200 c00000000fde5200 c0000000000c8098 c000000000b39858 
> GPR16: c000000000ae12c8 c000000000b39948 0000000000000000 0000000000000000 
> GPR20: c0000000005ab9c0 c0000017f5800144 0000000000000000 c000001e551a6850 
> GPR24: c0000017f5800140 c0000007efb9c800 0000000000000000 c000001e551a6800 
> GPR28: 0000000000000000 0000000000000000 c000000ff2030158 c000000feb420240 
> NIP [c0000000005ab4a8] scsi_init_sgtable+0xa8/0x180
> LR [c0000000005ab490] scsi_init_sgtable+0x90/0x180
> Call Trace:
> [c000000feb3878a0] [c0000000005ab490] scsi_init_sgtable+0x90/0x180 (unreliable)
> [c000000feb3878e0] [c0000000005ab5d4] scsi_init_io+0x54/0x160
> [c000000feb387930] [c0000000005fb43c] sd_init_command+0x6c/0xb00
> [c000000feb3879f0] [c0000000005ac2f8] scsi_setup_cmnd+0x108/0x1a0
> [c000000feb387a30] [c0000000005ac594] scsi_prep_fn+0x104/0x1c0
> [c000000feb387a70] [c00000000045d85c] blk_peek_request+0x20c/0x390
> [c000000feb387af0] [c0000000005ad6a8] scsi_request_fn+0xb8/0x7f0
> [c000000feb387bf0] [c0000000004583b4] __blk_run_queue+0x54/0x80
> [c000000feb387c20] [c00000000047e178] cfq_kick_queue+0x38/0xc0
> [c000000feb387c50] [c0000000000c00cc] process_one_work+0x2ac/0x560
> [c000000feb387ce0] [c0000000000c0510] worker_thread+0x190/0x660
> [c000000feb387d80] [c0000000000c8198] kthread+0x108/0x130
> [c000000feb387e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac
> Instruction dump:
> 2fa30000 409e00e4 e8bf0000 e87e0030 7fc4f378 4beb91ed 60000000 813f0008 
> 78630020 7f891840 7d301026 5529effe <0b090000> 907f0008 38600000 813e005c 
> ---[ end trace 3ea632a512ff48d5 ]---
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-18 14:03 ` Mark Salter
@ 2015-11-19  1:02   ` Michael Ellerman
  2015-11-19  8:23     ` Christoph Hellwig
  0 siblings, 1 reply; 58+ messages in thread
From: Michael Ellerman @ 2015-11-19  1:02 UTC (permalink / raw)
  To: Mark Salter, James E. J. Bottomley, brking
  Cc: linux-scsi, linux-kernel, linuxppc-dev

On Wed, 2015-11-18 at 09:03 -0500, Mark Salter wrote:
> On Wed, 2015-11-18 at 20:18 +1100, Michael Ellerman wrote:
> > Hi folks,
> > 
> > I'm intermittently seeing the following oops on at least one powerpc box.
> > 
> > The BUG_ON() is from:
> > 
> > static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb)
> > {
> > 	...
> > 	count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
> > 	BUG_ON(count > sdb->table.nents);
> > 
> > Looking at the dump it looks like count was 2, I can't work out what nents was.
> > 
> > The machine's just a fairly boring bare metal setup, with a single IPR adapter:
> > 
> > 0001:08:00.0 RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) (rev 02)
> > 	Subsystem: IBM PCIe3 x8 SAS RAID Internal Adapter 6Gb (57D7)
> > 	Flags: bus master, fast devsel, latency 0
> > 	Kernel driver in use: ipr
> > 
> > 
> > Anyone seen it before or have any ideas?
> 
> I'm also seeing it on arm64 in 4.4-rc1

Ah thanks, that's a good data point. I was assuming it was a driver bug, but I
assume you're not using IPR :)

> [    6.859003] Call trace:                                                                      
> [    6.861439] [<fffffe000074345c>] scsi_init_sgtable+0x84/0x88                                 
> [    6.867072] [<fffffe00007434ac>] scsi_init_io+0x4c/0x1ac                                     
> [    6.872358] [<fffffe000075004c>] sd_setup_read_write_cmnd+0x44/0x844                         
> [    6.878682] [<fffffe0000750884>] sd_init_command+0x38/0xb0                                   
> [    6.884141] [<fffffe00007436e4>] scsi_setup_cmnd+0xd8/0x13c                                  
> [    6.889686] [<fffffe000074397c>] scsi_prep_fn+0xc0/0x140                                     
> [    6.894973] [<fffffe00003bd0cc>] blk_peek_request+0x148/0x24c                                
> [    6.900692] [<fffffe0000744754>] scsi_request_fn+0x58/0x648                                  
> [    6.906237] [<fffffe00003b8e80>] __blk_run_queue+0x40/0x58                                   
> [    6.911696] [<fffffe00003b91a4>] blk_run_queue+0x30/0x48                                     
> [    6.916983] [<fffffe00007422e4>] scsi_run_queue+0x204/0x294                                  
> [    6.922528] [<fffffe000074321c>] scsi_end_request+0x13c/0x1a0                                
> [    6.928247] [<fffffe0000744ef4>] scsi_io_completion+0xf0/0x564                               
> [    6.934052] [<fffffe000073beb0>] scsi_finish_command+0xe4/0x144                              
> [    6.939943] [<fffffe00007446cc>] scsi_softirq_done+0x148/0x178                               
> [    6.945748] [<fffffe00003c39a4>] blk_done_softirq+0x7c/0x94                                  
> [    6.951295] [<fffffe00000c8f44>] __do_softirq+0x114/0x2a0                                    
> [    6.956667] [<fffffe00000c9340>] irq_exit+0x8c/0xe4                                          
> [    6.961522] [<fffffe000009cf00>] handle_IPI+0x170/0x228                                      
> [    6.966721] [<fffffe00000904e0>] gic_handle_irq+0xa0/0xb8                                    
> [    6.972093] Exception stack(0xfffffe03dc143de0 to 0xfffffe03dc143f00)                        

cheers


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-19  1:02   ` Michael Ellerman
@ 2015-11-19  8:23     ` Christoph Hellwig
  2015-11-19 15:35         ` Hannes Reinecke
  2015-11-20 12:10         ` Michael Ellerman
  0 siblings, 2 replies; 58+ messages in thread
From: Christoph Hellwig @ 2015-11-19  8:23 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Mark Salter, James E. J. Bottomley, brking, linux-scsi,
	linux-kernel, linuxppc-dev, linux-block

It's pretty much guaranteed a block layer bug, most likely in the
merge bios to request infrastucture where we don't obey the merging
limits properly.

Does either of you have a known good and first known bad kernel?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-19  8:23     ` Christoph Hellwig
@ 2015-11-19 15:35         ` Hannes Reinecke
  2015-11-20 12:10         ` Michael Ellerman
  1 sibling, 0 replies; 58+ messages in thread
From: Hannes Reinecke @ 2015-11-19 15:35 UTC (permalink / raw)
  To: Christoph Hellwig, Michael Ellerman
  Cc: Mark Salter, James E. J. Bottomley, brking, linux-scsi,
	linux-kernel, linuxppc-dev, linux-block

On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
> It's pretty much guaranteed a block layer bug, most likely in the
> merge bios to request infrastucture where we don't obey the merging
> limits properly.
> 
> Does either of you have a known good and first known bad kernel?

Well, I have been fighting a similar issue for several months now,
albeit with multipath enabled. Haven't had much progress with this,
sadly.
Seeing that this is our distro kernel it might or might not be
related; however, as the symptoms are identical there still is a
chance that this is actually a generic block-layer problem.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-19 15:35         ` Hannes Reinecke
  0 siblings, 0 replies; 58+ messages in thread
From: Hannes Reinecke @ 2015-11-19 15:35 UTC (permalink / raw)
  To: Christoph Hellwig, Michael Ellerman
  Cc: Mark Salter, James E. J. Bottomley, brking, linux-scsi,
	linux-kernel, linuxppc-dev, linux-block

On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
> It's pretty much guaranteed a block layer bug, most likely in the
> merge bios to request infrastucture where we don't obey the merging
> limits properly.
> 
> Does either of you have a known good and first known bad kernel?

Well, I have been fighting a similar issue for several months now,
albeit with multipath enabled. Haven't had much progress with this,
sadly.
Seeing that this is our distro kernel it might or might not be
related; however, as the symptoms are identical there still is a
chance that this is actually a generic block-layer problem.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-19  8:23     ` Christoph Hellwig
@ 2015-11-20 12:10         ` Michael Ellerman
  2015-11-20 12:10         ` Michael Ellerman
  1 sibling, 0 replies; 58+ messages in thread
From: Michael Ellerman @ 2015-11-20 12:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mark Salter, James E. J. Bottomley, brking, linux-scsi,
	linux-kernel, linuxppc-dev, linux-block, Laurent Dufour

On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:

> It's pretty much guaranteed a block layer bug, most likely in the
> merge bios to request infrastucture where we don't obey the merging
> limits properly.
> 
> Does either of you have a known good and first known bad kernel?

Not me, I've only hit it one or two times. All I can say is I have hit it in
4.4-rc1.

Laurent, can you narrow it down at all?

cheers


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-20 12:10         ` Michael Ellerman
  0 siblings, 0 replies; 58+ messages in thread
From: Michael Ellerman @ 2015-11-20 12:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mark Salter, James E. J. Bottomley, brking, linux-scsi,
	linux-kernel, linuxppc-dev, linux-block, Laurent Dufour

On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:

> It's pretty much guaranteed a block layer bug, most likely in the
> merge bios to request infrastucture where we don't obey the merging
> limits properly.
> 
> Does either of you have a known good and first known bad kernel?

Not me, I've only hit it one or two times. All I can say is I have hit it in
4.4-rc1.

Laurent, can you narrow it down at all?

cheers

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-20 12:10         ` Michael Ellerman
  (?)
@ 2015-11-20 12:56         ` Laurent Dufour
  2015-11-20 13:37           ` Mark Salter
  -1 siblings, 1 reply; 58+ messages in thread
From: Laurent Dufour @ 2015-11-20 12:56 UTC (permalink / raw)
  To: Michael Ellerman, Christoph Hellwig
  Cc: Mark Salter, James E. J. Bottomley, brking, linux-scsi,
	linux-kernel, linuxppc-dev, linux-block

On 20/11/2015 13:10, Michael Ellerman wrote:
> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> 
>> It's pretty much guaranteed a block layer bug, most likely in the
>> merge bios to request infrastucture where we don't obey the merging
>> limits properly.
>>
>> Does either of you have a known good and first known bad kernel?
> 
> Not me, I've only hit it one or two times. All I can say is I have hit it in
> 4.4-rc1.
> 
> Laurent, can you narrow it down at all?

I'm trying bisect it but it's very time consuming, and I think I missed
some bad kernel because I didn't let enough time to the system to panic.

But I'm still on...

Cheers


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-20 12:56         ` Laurent Dufour
@ 2015-11-20 13:37           ` Mark Salter
  0 siblings, 0 replies; 58+ messages in thread
From: Mark Salter @ 2015-11-20 13:37 UTC (permalink / raw)
  To: Laurent Dufour, Michael Ellerman, Christoph Hellwig
  Cc: James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block

On Fri, 2015-11-20 at 13:56 +0100, Laurent Dufour wrote:
> On 20/11/2015 13:10, Michael Ellerman wrote:
> > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > 
> > > It's pretty much guaranteed a block layer bug, most likely in the
> > > merge bios to request infrastucture where we don't obey the merging
> > > limits properly.
> > > 
> > > Does either of you have a known good and first known bad kernel?
> > 
> > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > 4.4-rc1.
> > 
> > Laurent, can you narrow it down at all?
> 
> I'm trying bisect it but it's very time consuming, and I think I missed
> some bad kernel because I didn't let enough time to the system to panic.
> 
> But I'm still on...
> 
I usually hit it at boot time. Maybe one out of five times. I'll take a stab
at bisecting today...


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-19 15:35         ` Hannes Reinecke
  (?)
@ 2015-11-20 14:38         ` Ewan Milne
  2015-11-20 14:55             ` Hannes Reinecke
  -1 siblings, 1 reply; 58+ messages in thread
From: Ewan Milne @ 2015-11-20 14:38 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Michael Ellerman, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block

On Thu, 2015-11-19 at 16:35 +0100, Hannes Reinecke wrote:
> On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
> > It's pretty much guaranteed a block layer bug, most likely in the
> > merge bios to request infrastucture where we don't obey the merging
> > limits properly.
> > 
> > Does either of you have a known good and first known bad kernel?
> 
> Well, I have been fighting a similar issue for several months now,
> albeit with multipath enabled. Haven't had much progress with this,
> sadly.
> Seeing that this is our distro kernel it might or might not be
> related; however, as the symptoms are identical there still is a
> chance that this is actually a generic block-layer problem.
> 
> Cheers,
> 
> Hannes

We have seen this also.  (e.g.  req->nr_phys_segments was 3, but
blk_rq_map_sg() returned 4.)  I was suspicious of the patch:

bio: modify __bio_add_page() to accept pages that don't start a new segment

But we put some debugging code in and didn't hit it.  We haven't
found the problem yet, either, though.  We're still looking.

As Christoph said, it would seem to be a problem with the block layer
merging.

The API for this seems defective, in that blk_rq_map_sg() should
never be returning a value indicating that it overwrote past the
end of the supplied SG array and depend on the caller to check it.
(We could get data corruption on another I/O if it used adjacent
memory for a different SG list, for example.)

-Ewan






^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-20 14:38         ` Ewan Milne
@ 2015-11-20 14:55             ` Hannes Reinecke
  0 siblings, 0 replies; 58+ messages in thread
From: Hannes Reinecke @ 2015-11-20 14:55 UTC (permalink / raw)
  To: emilne
  Cc: Christoph Hellwig, Michael Ellerman, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block

On 11/20/2015 03:38 PM, Ewan Milne wrote:
> On Thu, 2015-11-19 at 16:35 +0100, Hannes Reinecke wrote:
>> On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
>>> It's pretty much guaranteed a block layer bug, most likely in the
>>> merge bios to request infrastucture where we don't obey the merging
>>> limits properly.
>>>
>>> Does either of you have a known good and first known bad kernel?
>>
>> Well, I have been fighting a similar issue for several months now,
>> albeit with multipath enabled. Haven't had much progress with this,
>> sadly.
>> Seeing that this is our distro kernel it might or might not be
>> related; however, as the symptoms are identical there still is a
>> chance that this is actually a generic block-layer problem.
>>
>> Cheers,
>>
>> Hannes
> 
> We have seen this also.  (e.g.  req->nr_phys_segments was 3, but
> blk_rq_map_sg() returned 4.)  I was suspicious of the patch:
> 
> bio: modify __bio_add_page() to accept pages that don't start a new segment
> 
> But we put some debugging code in and didn't hit it.  We haven't
> found the problem yet, either, though.  We're still looking.
> 
Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.

I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...

> As Christoph said, it would seem to be a problem with the block layer
> merging.
> 
> The API for this seems defective, in that blk_rq_map_sg() should
> never be returning a value indicating that it overwrote past the
> end of the supplied SG array and depend on the caller to check it.
> (We could get data corruption on another I/O if it used adjacent
> memory for a different SG list, for example.)
> 
Yeah, the API is bloody useless.
By the time you hit the BUG_ON you've already caused a memory
corruption, so no way you can recover there.

At the very least we should be passing in the sg list count into
blk_map_rq_sg(), but that's a core blocklayer API and changes here
would require changes by quite a set of drivers. Plus it wouldn't
help me for a distribution kernel ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-20 14:55             ` Hannes Reinecke
  0 siblings, 0 replies; 58+ messages in thread
From: Hannes Reinecke @ 2015-11-20 14:55 UTC (permalink / raw)
  To: emilne
  Cc: linux-block, linux-scsi, James E. J. Bottomley, linux-kernel,
	Christoph Hellwig, brking, Mark Salter, linuxppc-dev

On 11/20/2015 03:38 PM, Ewan Milne wrote:
> On Thu, 2015-11-19 at 16:35 +0100, Hannes Reinecke wrote:
>> On 11/19/2015 09:23 AM, Christoph Hellwig wrote:
>>> It's pretty much guaranteed a block layer bug, most likely in the
>>> merge bios to request infrastucture where we don't obey the merging
>>> limits properly.
>>>
>>> Does either of you have a known good and first known bad kernel?
>>
>> Well, I have been fighting a similar issue for several months now,
>> albeit with multipath enabled. Haven't had much progress with this,
>> sadly.
>> Seeing that this is our distro kernel it might or might not be
>> related; however, as the symptoms are identical there still is a
>> chance that this is actually a generic block-layer problem.
>>
>> Cheers,
>>
>> Hannes
> 
> We have seen this also.  (e.g.  req->nr_phys_segments was 3, but
> blk_rq_map_sg() returned 4.)  I was suspicious of the patch:
> 
> bio: modify __bio_add_page() to accept pages that don't start a new segment
> 
> But we put some debugging code in and didn't hit it.  We haven't
> found the problem yet, either, though.  We're still looking.
> 
Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.

I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...

> As Christoph said, it would seem to be a problem with the block layer
> merging.
> 
> The API for this seems defective, in that blk_rq_map_sg() should
> never be returning a value indicating that it overwrote past the
> end of the supplied SG array and depend on the caller to check it.
> (We could get data corruption on another I/O if it used adjacent
> memory for a different SG list, for example.)
> 
Yeah, the API is bloody useless.
By the time you hit the BUG_ON you've already caused a memory
corruption, so no way you can recover there.

At the very least we should be passing in the sg list count into
blk_map_rq_sg(), but that's a core blocklayer API and changes here
would require changes by quite a set of drivers. Plus it wouldn't
help me for a distribution kernel ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-20 14:55             ` Hannes Reinecke
  (?)
@ 2015-11-20 15:28             ` Ewan Milne
  2015-11-23  6:55                 ` Hannes Reinecke
  2015-11-25  9:04               ` Hannes Reinecke
  -1 siblings, 2 replies; 58+ messages in thread
From: Ewan Milne @ 2015-11-20 15:28 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Michael Ellerman, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block

On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
> Can't we have a joint effort here?
> I've been spending a _LOT_ of time trying to debug things here, but
> none of the ideas I've come up with have been able to fix anything.

Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.

> 
> I'm almost tempted to increase the count from scsi_alloc_sgtable()
> by one and be done with ...
> 

That might not fix it if it is a problem with the merge code, though.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-20 12:10         ` Michael Ellerman
@ 2015-11-21 11:30           ` Laurent Dufour
  -1 siblings, 0 replies; 58+ messages in thread
From: Laurent Dufour @ 2015-11-21 11:30 UTC (permalink / raw)
  To: Michael Ellerman, Ming Lei
  Cc: Christoph Hellwig, Mark Salter, James E. J. Bottomley, brking,
	linux-scsi, linux-kernel, linuxppc-dev, linux-block

On 20/11/2015 13:10, Michael Ellerman wrote:
> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> 
>> It's pretty much guaranteed a block layer bug, most likely in the
>> merge bios to request infrastucture where we don't obey the merging
>> limits properly.
>>
>> Does either of you have a known good and first known bad kernel?
> 
> Not me, I've only hit it one or two times. All I can say is I have hit it in
> 4.4-rc1.
> 
> Laurent, can you narrow it down at all?

It seems that the panic is triggered by the commit bdced438acd8 ("block:
setup bi_phys_segments after splitting") which has been pulled by the
merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
git://git.kernel.dk/linux-block").

My system is panicing promptly when running a kernel built at
d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
without panicing.

This being said, I can't explain what's going wrong.

May Ming shed some light here ?

Cheers



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-21 11:30           ` Laurent Dufour
  0 siblings, 0 replies; 58+ messages in thread
From: Laurent Dufour @ 2015-11-21 11:30 UTC (permalink / raw)
  To: Michael Ellerman, Ming Lei
  Cc: linux-block, linux-scsi, James E. J. Bottomley, linux-kernel,
	Christoph Hellwig, brking, Mark Salter, linuxppc-dev

On 20/11/2015 13:10, Michael Ellerman wrote:
> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> 
>> It's pretty much guaranteed a block layer bug, most likely in the
>> merge bios to request infrastucture where we don't obey the merging
>> limits properly.
>>
>> Does either of you have a known good and first known bad kernel?
> 
> Not me, I've only hit it one or two times. All I can say is I have hit it in
> 4.4-rc1.
> 
> Laurent, can you narrow it down at all?

It seems that the panic is triggered by the commit bdced438acd8 ("block:
setup bi_phys_segments after splitting") which has been pulled by the
merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
git://git.kernel.dk/linux-block").

My system is panicing promptly when running a kernel built at
d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
without panicing.

This being said, I can't explain what's going wrong.

May Ming shed some light here ?

Cheers


_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-21 11:30           ` Laurent Dufour
@ 2015-11-21 16:56             ` Ming Lei
  -1 siblings, 0 replies; 58+ messages in thread
From: Ming Lei @ 2015-11-21 16:56 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Michael Ellerman, Christoph Hellwig, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block, tom.leiming

On Sat, 21 Nov 2015 12:30:14 +0100
Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:

> On 20/11/2015 13:10, Michael Ellerman wrote:
> > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > 
> >> It's pretty much guaranteed a block layer bug, most likely in the
> >> merge bios to request infrastucture where we don't obey the merging
> >> limits properly.
> >>
> >> Does either of you have a known good and first known bad kernel?
> > 
> > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > 4.4-rc1.
> > 
> > Laurent, can you narrow it down at all?
> 
> It seems that the panic is triggered by the commit bdced438acd8 ("block:
> setup bi_phys_segments after splitting") which has been pulled by the
> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> git://git.kernel.dk/linux-block").
> 
> My system is panicing promptly when running a kernel built at
> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> without panicing.
> 
> This being said, I can't explain what's going wrong.
> 
> May Ming shed some light here ?

Laurent, looks there is one bug in blk_bio_segment_split(), would you
mind testing the following patch to see if it fixes your issue?

---
>From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Sun, 22 Nov 2015 00:47:13 +0800
Subject: [PATCH] block: fix segment split

Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
always points to the iterator local variable, which is obviously
wrong, so fix it by pointing to the local variable of 'bvprv'.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-merge.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index de5716d8..f2efe8a 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 
 			seg_size += bv.bv_len;
 			bvprv = bv;
-			bvprvp = &bv;
+			bvprvp = &bvprv;
 			sectors += bv.bv_len >> 9;
 			continue;
 		}
@@ -108,7 +108,7 @@ new_segment:
 
 		nsegs++;
 		bvprv = bv;
-		bvprvp = &bv;
+		bvprvp = &bvprv;
 		seg_size = bv.bv_len;
 		sectors += bv.bv_len >> 9;
 	}
-- 
1.9.1



Thanks,
Ming



^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-21 16:56             ` Ming Lei
  0 siblings, 0 replies; 58+ messages in thread
From: Ming Lei @ 2015-11-21 16:56 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Michael Ellerman, Christoph Hellwig, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block, tom.leiming

On Sat, 21 Nov 2015 12:30:14 +0100
Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:

> On 20/11/2015 13:10, Michael Ellerman wrote:
> > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> >=20
> >> It's pretty much guaranteed a block layer bug, most likely in the
> >> merge bios to request infrastucture where we don't obey the merging
> >> limits properly.
> >>
> >> Does either of you have a known good and first known bad kernel?
> >=20
> > Not me, I've only hit it one or two times. All I can say is I have hit =
it in
> > 4.4-rc1.
> >=20
> > Laurent, can you narrow it down at all?
>=20
> It seems that the panic is triggered by the commit bdced438acd8 ("block:
> setup bi_phys_segments after splitting") which has been pulled by the
> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> git://git.kernel.dk/linux-block").
>=20
> My system is panicing promptly when running a kernel built at
> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> without panicing.
>=20
> This being said, I can't explain what's going wrong.
>=20
> May Ming shed some light here ?

Laurent, looks there is one bug in blk_bio_segment_split(), would you
mind testing the following patch to see if it fixes your issue?

---
=46rom 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Sun, 22 Nov 2015 00:47:13 +0800
Subject: [PATCH] block: fix segment split

Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
always points to the iterator local variable, which is obviously
wrong, so fix it by pointing to the local variable of 'bvprv'.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-merge.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index de5716d8..f2efe8a 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_q=
ueue *q,
=20
 			seg_size +=3D bv.bv_len;
 			bvprv =3D bv;
-			bvprvp =3D &bv;
+			bvprvp =3D &bvprv;
 			sectors +=3D bv.bv_len >> 9;
 			continue;
 		}
@@ -108,7 +108,7 @@ new_segment:
=20
 		nsegs++;
 		bvprv =3D bv;
-		bvprvp =3D &bv;
+		bvprvp =3D &bvprv;
 		seg_size =3D bv.bv_len;
 		sectors +=3D bv.bv_len >> 9;
 	}
--=20
1.9.1



Thanks,
Ming

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-21 16:56             ` Ming Lei
  (?)
@ 2015-11-22 23:20             ` Mark Salter
  2015-11-23  0:36               ` Ming Lei
  2015-11-23 13:57                 ` Laurent Dufour
  -1 siblings, 2 replies; 58+ messages in thread
From: Mark Salter @ 2015-11-22 23:20 UTC (permalink / raw)
  To: Ming Lei, Laurent Dufour
  Cc: Michael Ellerman, Christoph Hellwig, James E. J. Bottomley,
	brking, linux-scsi, linux-kernel, linuxppc-dev, linux-block,
	tom.leiming

On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> On Sat, 21 Nov 2015 12:30:14 +0100
> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
> 
> > On 20/11/2015 13:10, Michael Ellerman wrote:
> > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > > 
> > > > It's pretty much guaranteed a block layer bug, most likely in the
> > > > merge bios to request infrastucture where we don't obey the merging
> > > > limits properly.
> > > > 
> > > > Does either of you have a known good and first known bad kernel?
> > > 
> > > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > > 4.4-rc1.
> > > 
> > > Laurent, can you narrow it down at all?
> > 
> > It seems that the panic is triggered by the commit bdced438acd8 ("block:
> > setup bi_phys_segments after splitting") which has been pulled by the
> > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> > git://git.kernel.dk/linux-block").
> > 
> > My system is panicing promptly when running a kernel built at
> > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> > without panicing.
> > 
> > This being said, I can't explain what's going wrong.
> > 
> > May Ming shed some light here ?
> 
> Laurent, looks there is one bug in blk_bio_segment_split(), would you
> mind testing the following patch to see if it fixes your issue?
> 
> ---
> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@canonical.com>
> Date: Sun, 22 Nov 2015 00:47:13 +0800
> Subject: [PATCH] block: fix segment split
> 
> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> always points to the iterator local variable, which is obviously
> wrong, so fix it by pointing to the local variable of 'bvprv'.
> 
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  block/blk-merge.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index de5716d8..f2efe8a 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>  
>  			seg_size += bv.bv_len;
>  			bvprv = bv;
> -			bvprvp = &bv;
> +			bvprvp = &bvprv;
>  			sectors += bv.bv_len >> 9;
>  			continue;
>  		}
> @@ -108,7 +108,7 @@ new_segment:
>  
>  		nsegs++;
>  		bvprv = bv;
> -		bvprvp = &bv;
> +		bvprvp = &bvprv;
>  		seg_size = bv.bv_len;
>  		sectors += bv.bv_len >> 9;
>  	}

I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-22 23:20             ` Mark Salter
@ 2015-11-23  0:36               ` Ming Lei
  2015-11-23  1:50                   ` Mark Salter
  2015-11-23 13:57                 ` Laurent Dufour
  1 sibling, 1 reply; 58+ messages in thread
From: Ming Lei @ 2015-11-23  0:36 UTC (permalink / raw)
  To: Mark Salter
  Cc: Laurent Dufour, Michael Ellerman, Christoph Hellwig,
	James E. J. Bottomley, brking, Linux SCSI List,
	Linux Kernel Mailing List, linuxppc-dev, linux-block

On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msalter@redhat.com> wrote:
> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> On Sat, 21 Nov 2015 12:30:14 +0100
>> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
>>
>> > On 20/11/2015 13:10, Michael Ellerman wrote:
>> > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>> > >
>> > > > It's pretty much guaranteed a block layer bug, most likely in the
>> > > > merge bios to request infrastucture where we don't obey the merging
>> > > > limits properly.
>> > > >
>> > > > Does either of you have a known good and first known bad kernel?
>> > >
>> > > Not me, I've only hit it one or two times. All I can say is I have hit it in
>> > > 4.4-rc1.
>> > >
>> > > Laurent, can you narrow it down at all?
>> >
>> > It seems that the panic is triggered by the commit bdced438acd8 ("block:
>> > setup bi_phys_segments after splitting") which has been pulled by the
>> > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>> > git://git.kernel.dk/linux-block").
>> >
>> > My system is panicing promptly when running a kernel built at
>> > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>> > without panicing.
>> >
>> > This being said, I can't explain what's going wrong.
>> >
>> > May Ming shed some light here ?
>>
>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> mind testing the following patch to see if it fixes your issue?
>>
>> ---
>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> From: Ming Lei <ming.lei@canonical.com>
>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>> Subject: [PATCH] block: fix segment split
>>
>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> always points to the iterator local variable, which is obviously
>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>
>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>> ---
>>  block/blk-merge.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index de5716d8..f2efe8a 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>
>>                       seg_size += bv.bv_len;
>>                       bvprv = bv;
>> -                     bvprvp = &bv;
>> +                     bvprvp = &bvprv;
>>                       sectors += bv.bv_len >> 9;
>>                       continue;
>>               }
>> @@ -108,7 +108,7 @@ new_segment:
>>
>>               nsegs++;
>>               bvprv = bv;
>> -             bvprvp = &bv;
>> +             bvprvp = &bvprv;
>>               seg_size = bv.bv_len;
>>               sectors += bv.bv_len >> 9;
>>       }
>
> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.

OK, looks there are still other bugs, care to share us how to reproduce
it on arm64?

thanks,
Ming

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-23  0:36               ` Ming Lei
@ 2015-11-23  1:50                   ` Mark Salter
  0 siblings, 0 replies; 58+ messages in thread
From: Mark Salter @ 2015-11-23  1:50 UTC (permalink / raw)
  To: Ming Lei
  Cc: Laurent Dufour, Michael Ellerman, Christoph Hellwig,
	James E. J. Bottomley, brking, Linux SCSI List,
	Linux Kernel Mailing List, linuxppc-dev, linux-block

On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msalter@redhat.com> wrote:
> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> > > On Sat, 21 Nov 2015 12:30:14 +0100
> > > Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
> > > 
> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > > > > 
> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
> > > > > > merge bios to request infrastucture where we don't obey the merging
> > > > > > limits properly.
> > > > > > 
> > > > > > Does either of you have a known good and first known bad kernel?
> > > > > 
> > > > > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > > > > 4.4-rc1.
> > > > > 
> > > > > Laurent, can you narrow it down at all?
> > > > 
> > > > It seems that the panic is triggered by the commit bdced438acd8 ("block:
> > > > setup bi_phys_segments after splitting") which has been pulled by the
> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> > > > git://git.kernel.dk/linux-block").
> > > > 
> > > > My system is panicing promptly when running a kernel built at
> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> > > > without panicing.
> > > > 
> > > > This being said, I can't explain what's going wrong.
> > > > 
> > > > May Ming shed some light here ?
> > > 
> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
> > > mind testing the following patch to see if it fixes your issue?
> > > 
> > > ---
> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> > > From: Ming Lei <ming.lei@canonical.com>
> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
> > > Subject: [PATCH] block: fix segment split
> > > 
> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> > > always points to the iterator local variable, which is obviously
> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
> > > 
> > > Signed-off-by: Ming Lei <ming.lei@canonical.com>
> > > ---
> > >  block/blk-merge.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > > index de5716d8..f2efe8a 100644
> > > --- a/block/blk-merge.c
> > > +++ b/block/blk-merge.c
> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
> > > 
> > >                       seg_size += bv.bv_len;
> > >                       bvprv = bv;
> > > -                     bvprvp = &bv;
> > > +                     bvprvp = &bvprv;
> > >                       sectors += bv.bv_len >> 9;
> > >                       continue;
> > >               }
> > > @@ -108,7 +108,7 @@ new_segment:
> > > 
> > >               nsegs++;
> > >               bvprv = bv;
> > > -             bvprvp = &bv;
> > > +             bvprvp = &bvprv;
> > >               seg_size = bv.bv_len;
> > >               sectors += bv.bv_len >> 9;
> > >       }
> > 
> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
> 
> OK, looks there are still other bugs, care to share us how to reproduce
> it on arm64?
> 
> thanks,
> Ming

Unfortunately, the best reproducer I have is to boot the platform. I have seen the
BUG a few times post-boot, but I don't have a consistant reproducer. I am using
upstream 4.4-rc1 with this config:

  http://people.redhat.com/msalter/fh_defconfig

With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 boots.
On an AMD Seattle platform, about every 9 boots.

I have a script that loops through an ssh command to reboot the platform under test.
I manually install test kernels and then run the script and wait for failure. While
debugging, I have tried more minimal configs with which I have been unable to
reproduce the problem even after several hours of reboots. With the above mentioned
fh_defconfig, I have been able to get a failure within 20 or so boots with most
kernel builds but at certain kernel commits, the failure has taken a longer time to
reproduce.

>From my POV, I can't say which commit causes the problem. So far, I have not been
able to reproduce at all before commit d9734e0d1ccf but I am currently trying to
reproduce with commit 0d51ce9ca1116 (one merge earlier than d9734e0d1ccf).



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-23  1:50                   ` Mark Salter
  0 siblings, 0 replies; 58+ messages in thread
From: Mark Salter @ 2015-11-23  1:50 UTC (permalink / raw)
  To: Ming Lei
  Cc: Laurent Dufour, Michael Ellerman, Christoph Hellwig,
	James E. J. Bottomley, brking, Linux SCSI List,
	Linux Kernel Mailing List, linuxppc-dev, linux-block

On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msalter@redhat.com> wrote:
> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> > > On Sat, 21 Nov 2015 12:30:14 +0100
> > > Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
> > > 
> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > > > > 
> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
> > > > > > merge bios to request infrastucture where we don't obey the merging
> > > > > > limits properly.
> > > > > > 
> > > > > > Does either of you have a known good and first known bad kernel?
> > > > > 
> > > > > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > > > > 4.4-rc1.
> > > > > 
> > > > > Laurent, can you narrow it down at all?
> > > > 
> > > > It seems that the panic is triggered by the commit bdced438acd8 ("block:
> > > > setup bi_phys_segments after splitting") which has been pulled by the
> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> > > > git://git.kernel.dk/linux-block").
> > > > 
> > > > My system is panicing promptly when running a kernel built at
> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> > > > without panicing.
> > > > 
> > > > This being said, I can't explain what's going wrong.
> > > > 
> > > > May Ming shed some light here ?
> > > 
> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
> > > mind testing the following patch to see if it fixes your issue?
> > > 
> > > ---
> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> > > From: Ming Lei <ming.lei@canonical.com>
> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
> > > Subject: [PATCH] block: fix segment split
> > > 
> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> > > always points to the iterator local variable, which is obviously
> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
> > > 
> > > Signed-off-by: Ming Lei <ming.lei@canonical.com>
> > > ---
> > >  block/blk-merge.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > > index de5716d8..f2efe8a 100644
> > > --- a/block/blk-merge.c
> > > +++ b/block/blk-merge.c
> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
> > > 
> > >                       seg_size += bv.bv_len;
> > >                       bvprv = bv;
> > > -                     bvprvp = &bv;
> > > +                     bvprvp = &bvprv;
> > >                       sectors += bv.bv_len >> 9;
> > >                       continue;
> > >               }
> > > @@ -108,7 +108,7 @@ new_segment:
> > > 
> > >               nsegs++;
> > >               bvprv = bv;
> > > -             bvprvp = &bv;
> > > +             bvprvp = &bvprv;
> > >               seg_size = bv.bv_len;
> > >               sectors += bv.bv_len >> 9;
> > >       }
> > 
> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
> 
> OK, looks there are still other bugs, care to share us how to reproduce
> it on arm64?
> 
> thanks,
> Ming

Unfortunately, the best reproducer I have is to boot the platform. I have seen the
BUG a few times post-boot, but I don't have a consistant reproducer. I am using
upstream 4.4-rc1 with this config:

  http://people.redhat.com/msalter/fh_defconfig

With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 boots.
On an AMD Seattle platform, about every 9 boots.

I have a script that loops through an ssh command to reboot the platform under test.
I manually install test kernels and then run the script and wait for failure. While
debugging, I have tried more minimal configs with which I have been unable to
reproduce the problem even after several hours of reboots. With the above mentioned
fh_defconfig, I have been able to get a failure within 20 or so boots with most
kernel builds but at certain kernel commits, the failure has taken a longer time to
reproduce.

From my POV, I can't say which commit causes the problem. So far, I have not been
able to reproduce at all before commit d9734e0d1ccf but I am currently trying to
reproduce with commit 0d51ce9ca1116 (one merge earlier than d9734e0d1ccf).

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-23  1:50                   ` Mark Salter
  (?)
@ 2015-11-23  2:46                   ` Ming Lei
  2015-11-23 15:21                       ` Ming Lei
  -1 siblings, 1 reply; 58+ messages in thread
From: Ming Lei @ 2015-11-23  2:46 UTC (permalink / raw)
  To: Mark Salter
  Cc: Laurent Dufour, Michael Ellerman, Christoph Hellwig,
	James E. J. Bottomley, brking, Linux SCSI List,
	Linux Kernel Mailing List, linuxppc-dev, linux-block

[-- Attachment #1: Type: text/plain, Size: 4947 bytes --]

Hi Mark,

On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter <msalter@redhat.com> wrote:
> On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
>> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msalter@redhat.com> wrote:
>> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> > > On Sat, 21 Nov 2015 12:30:14 +0100
>> > > Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
>> > >
>> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
>> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>> > > > >
>> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
>> > > > > > merge bios to request infrastucture where we don't obey the merging
>> > > > > > limits properly.
>> > > > > >
>> > > > > > Does either of you have a known good and first known bad kernel?
>> > > > >
>> > > > > Not me, I've only hit it one or two times. All I can say is I have hit it in
>> > > > > 4.4-rc1.
>> > > > >
>> > > > > Laurent, can you narrow it down at all?
>> > > >
>> > > > It seems that the panic is triggered by the commit bdced438acd8 ("block:
>> > > > setup bi_phys_segments after splitting") which has been pulled by the
>> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>> > > > git://git.kernel.dk/linux-block").
>> > > >
>> > > > My system is panicing promptly when running a kernel built at
>> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>> > > > without panicing.
>> > > >
>> > > > This being said, I can't explain what's going wrong.
>> > > >
>> > > > May Ming shed some light here ?
>> > >
>> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> > > mind testing the following patch to see if it fixes your issue?
>> > >
>> > > ---
>> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> > > From: Ming Lei <ming.lei@canonical.com>
>> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
>> > > Subject: [PATCH] block: fix segment split
>> > >
>> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> > > always points to the iterator local variable, which is obviously
>> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
>> > >
>> > > Signed-off-by: Ming Lei <ming.lei@canonical.com>
>> > > ---
>> > >  block/blk-merge.c | 4 ++--
>> > >  1 file changed, 2 insertions(+), 2 deletions(-)
>> > >
>> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
>> > > index de5716d8..f2efe8a 100644
>> > > --- a/block/blk-merge.c
>> > > +++ b/block/blk-merge.c
>> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>> > >
>> > >                       seg_size += bv.bv_len;
>> > >                       bvprv = bv;
>> > > -                     bvprvp = &bv;
>> > > +                     bvprvp = &bvprv;
>> > >                       sectors += bv.bv_len >> 9;
>> > >                       continue;
>> > >               }
>> > > @@ -108,7 +108,7 @@ new_segment:
>> > >
>> > >               nsegs++;
>> > >               bvprv = bv;
>> > > -             bvprvp = &bv;
>> > > +             bvprvp = &bvprv;
>> > >               seg_size = bv.bv_len;
>> > >               sectors += bv.bv_len >> 9;
>> > >       }
>> >
>> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>>
>> OK, looks there are still other bugs, care to share us how to reproduce
>> it on arm64?
>>
>> thanks,
>> Ming
>
> Unfortunately, the best reproducer I have is to boot the platform. I have seen the
> BUG a few times post-boot, but I don't have a consistant reproducer. I am using
> upstream 4.4-rc1 with this config:
>
>   http://people.redhat.com/msalter/fh_defconfig
>
> With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 boots.
> On an AMD Seattle platform, about every 9 boots.

Thanks for the input, and I will try to reproduce the issue on mustang with
your kernel config.

>
> I have a script that loops through an ssh command to reboot the platform under test.
> I manually install test kernels and then run the script and wait for failure. While
> debugging, I have tried more minimal configs with which I have been unable to
> reproduce the problem even after several hours of reboots. With the above mentioned
> fh_defconfig, I have been able to get a failure within 20 or so boots with most
> kernel builds but at certain kernel commits, the failure has taken a longer time to
> reproduce.
>
> From my POV, I can't say which commit causes the problem. So far, I have not been
> able to reproduce at all before commit d9734e0d1ccf but I am currently trying to
> reproduce with commit 0d51ce9ca1116 (one merge earlier than d9734e0d1ccf).

The patch for fixing 'bvprvp' is better to be included for test,
because that issue
may have a big effect on computing physical seg count.

Also I appreciate if you, Laurent or anyone may provide debug log which
can be captured with the attached debug patch when this issue is trigered .


Thanks,

[-- Attachment #2: dbg_scsi_blk_merge.patch --]
[-- Type: text/x-patch, Size: 1429 bytes --]

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index dd8ad2a..a0db1fe 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1077,14 +1077,31 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	}
 }
 
+static void scsi_dump_req(struct request *req)
+{
+	struct bio_vec bvec;
+	struct bio *bio;
+	struct bvec_iter iter;
+	int i = 0;
+
+	__rq_for_each_bio(bio, req) {
+		printk("%d-%p\n", i, bio);
+		bio_for_each_segment(bvec, bio, iter) {
+			printk("\t %p %u %u\n", bvec.bv_page,
+					bvec.bv_offset, bvec.bv_len);
+		}
+	}
+}
+
 static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb)
 {
-	int count;
+	int count, pre_count = req->nr_phys_segments;
+	int alloc_cnt = queue_max_segments(req->q);
 
 	/*
 	 * If sg table allocation fails, requeue request later.
 	 */
-	if (unlikely(scsi_alloc_sgtable(sdb, req->nr_phys_segments,
+	if (unlikely(scsi_alloc_sgtable(sdb, alloc_cnt,
 					req->mq_ctx != NULL)))
 		return BLKPREP_DEFER;
 
@@ -1093,6 +1110,11 @@ static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb)
 	 * each segment.
 	 */
 	count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
+
+	if (count > sdb->table.nents) {
+		printk("%s prev vs. now: %d-%d\n", __func__, pre_count, count);
+		scsi_dump_req(req);
+	}
 	BUG_ON(count > sdb->table.nents);
 	sdb->table.nents = count;
 	sdb->length = blk_rq_bytes(req);

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-20 15:28             ` Ewan Milne
@ 2015-11-23  6:55                 ` Hannes Reinecke
  2015-11-25  9:04               ` Hannes Reinecke
  1 sibling, 0 replies; 58+ messages in thread
From: Hannes Reinecke @ 2015-11-23  6:55 UTC (permalink / raw)
  To: emilne
  Cc: Christoph Hellwig, Michael Ellerman, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block

On 11/20/2015 04:28 PM, Ewan Milne wrote:
> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>> Can't we have a joint effort here?
>> I've been spending a _LOT_ of time trying to debug things here, but
>> none of the ideas I've come up with have been able to fix anything.
>
> Yes.  I'm not the one primarily looking at it, and we don't have a
> reproducer in-house.  We just have the one dump right now.
>
Oh, I got plenty of them :-(

>>
>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>> by one and be done with ...
>>
>
> That might not fix it if it is a problem with the merge code, though.
>
Of course not. But it'll be a band-aid to keep the customer happy.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-23  6:55                 ` Hannes Reinecke
  0 siblings, 0 replies; 58+ messages in thread
From: Hannes Reinecke @ 2015-11-23  6:55 UTC (permalink / raw)
  To: emilne
  Cc: Christoph Hellwig, Michael Ellerman, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block

On 11/20/2015 04:28 PM, Ewan Milne wrote:
> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>> Can't we have a joint effort here?
>> I've been spending a _LOT_ of time trying to debug things here, but
>> none of the ideas I've come up with have been able to fix anything.
>
> Yes.  I'm not the one primarily looking at it, and we don't have a
> reproducer in-house.  We just have the one dump right now.
>
Oh, I got plenty of them :-(

>>
>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>> by one and be done with ...
>>
>
> That might not fix it if it is a problem with the merge code, though.
>
Of course not. But it'll be a band-aid to keep the customer happy.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-22 23:20             ` Mark Salter
@ 2015-11-23 13:57                 ` Laurent Dufour
  2015-11-23 13:57                 ` Laurent Dufour
  1 sibling, 0 replies; 58+ messages in thread
From: Laurent Dufour @ 2015-11-23 13:57 UTC (permalink / raw)
  To: Mark Salter, Ming Lei
  Cc: linux-block, linux-scsi, James E. J. Bottomley, linux-kernel,
	Christoph Hellwig, brking, tom.leiming, linuxppc-dev

On 23/11/2015 00:20, Mark Salter wrote:
> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> On Sat, 21 Nov 2015 12:30:14 +0100
>> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
>>
>>> On 20/11/2015 13:10, Michael Ellerman wrote:
>>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>>>>
>>>>> It's pretty much guaranteed a block layer bug, most likely in the
>>>>> merge bios to request infrastucture where we don't obey the merging
>>>>> limits properly.
>>>>>
>>>>> Does either of you have a known good and first known bad kernel?
>>>>
>>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
>>>> 4.4-rc1.
>>>>
>>>> Laurent, can you narrow it down at all?
>>>
>>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
>>> setup bi_phys_segments after splitting") which has been pulled by the
>>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>>> git://git.kernel.dk/linux-block").
>>>
>>> My system is panicing promptly when running a kernel built at
>>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>>> without panicing.
>>>
>>> This being said, I can't explain what's going wrong.
>>>
>>> May Ming shed some light here ?
>>
>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> mind testing the following patch to see if it fixes your issue?
>>
>> ---
>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> From: Ming Lei <ming.lei@canonical.com>
>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>> Subject: [PATCH] block: fix segment split
>>
>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> always points to the iterator local variable, which is obviously
>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>
>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>> ---
>>  block/blk-merge.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index de5716d8..f2efe8a 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>  
>>  			seg_size += bv.bv_len;
>>  			bvprv = bv;
>> -			bvprvp = &bv;
>> +			bvprvp = &bvprv;
>>  			sectors += bv.bv_len >> 9;
>>  			continue;
>>  		}
>> @@ -108,7 +108,7 @@ new_segment:
>>  
>>  		nsegs++;
>>  		bvprv = bv;
>> -		bvprvp = &bv;
>> +		bvprvp = &bvprv;
>>  		seg_size = bv.bv_len;
>>  		sectors += bv.bv_len >> 9;
>>  	}
> 
> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.

On my side, with the patch applied on top of 4.4-rc1, I can't get the
panic anymore.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-23 13:57                 ` Laurent Dufour
  0 siblings, 0 replies; 58+ messages in thread
From: Laurent Dufour @ 2015-11-23 13:57 UTC (permalink / raw)
  To: Mark Salter, Ming Lei
  Cc: Christoph Hellwig, linux-scsi, tom.leiming,
	James E. J. Bottomley, linux-kernel, linux-block, brking,
	linuxppc-dev

On 23/11/2015 00:20, Mark Salter wrote:
> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> On Sat, 21 Nov 2015 12:30:14 +0100
>> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
>>
>>> On 20/11/2015 13:10, Michael Ellerman wrote:
>>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>>>>
>>>>> It's pretty much guaranteed a block layer bug, most likely in the
>>>>> merge bios to request infrastucture where we don't obey the merging
>>>>> limits properly.
>>>>>
>>>>> Does either of you have a known good and first known bad kernel?
>>>>
>>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
>>>> 4.4-rc1.
>>>>
>>>> Laurent, can you narrow it down at all?
>>>
>>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
>>> setup bi_phys_segments after splitting") which has been pulled by the
>>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>>> git://git.kernel.dk/linux-block").
>>>
>>> My system is panicing promptly when running a kernel built at
>>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>>> without panicing.
>>>
>>> This being said, I can't explain what's going wrong.
>>>
>>> May Ming shed some light here ?
>>
>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> mind testing the following patch to see if it fixes your issue?
>>
>> ---
>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> From: Ming Lei <ming.lei@canonical.com>
>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>> Subject: [PATCH] block: fix segment split
>>
>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> always points to the iterator local variable, which is obviously
>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>
>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>> ---
>>  block/blk-merge.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index de5716d8..f2efe8a 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>  
>>  			seg_size += bv.bv_len;
>>  			bvprv = bv;
>> -			bvprvp = &bv;
>> +			bvprvp = &bvprv;
>>  			sectors += bv.bv_len >> 9;
>>  			continue;
>>  		}
>> @@ -108,7 +108,7 @@ new_segment:
>>  
>>  		nsegs++;
>>  		bvprv = bv;
>> -		bvprvp = &bv;
>> +		bvprvp = &bvprv;
>>  		seg_size = bv.bv_len;
>>  		sectors += bv.bv_len >> 9;
>>  	}
> 
> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.

On my side, with the patch applied on top of 4.4-rc1, I can't get the
panic anymore.


_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-23 13:57                 ` Laurent Dufour
  (?)
@ 2015-11-23 15:13                 ` Pratyush Anand
  2015-11-23 15:20                   ` Laurent Dufour
  -1 siblings, 1 reply; 58+ messages in thread
From: Pratyush Anand @ 2015-11-23 15:13 UTC (permalink / raw)
  To: Laurent Dufour, Mark Salter, Ming Lei
  Cc: linux-block, linux-scsi, James E. J. Bottomley, linux-kernel,
	Christoph Hellwig, brking, tom.leiming, linuxppc-dev

On 23/11/2015:02:57:19 PM, Laurent Dufour wrote:
> On 23/11/2015 00:20, Mark Salter wrote:
> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> >> On Sat, 21 Nov 2015 12:30:14 +0100
> >> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
> >>
> >>> On 20/11/2015 13:10, Michael Ellerman wrote:
> >>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> >>>>
> >>>>> It's pretty much guaranteed a block layer bug, most likely in the
> >>>>> merge bios to request infrastucture where we don't obey the merging
> >>>>> limits properly.
> >>>>>
> >>>>> Does either of you have a known good and first known bad kernel?
> >>>>
> >>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
> >>>> 4.4-rc1.
> >>>>
> >>>> Laurent, can you narrow it down at all?
> >>>
> >>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
> >>> setup bi_phys_segments after splitting") which has been pulled by the
> >>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> >>> git://git.kernel.dk/linux-block").
> >>>
> >>> My system is panicing promptly when running a kernel built at
> >>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> >>> without panicing.
> >>>
> >>> This being said, I can't explain what's going wrong.
> >>>
> >>> May Ming shed some light here ?
> >>
> >> Laurent, looks there is one bug in blk_bio_segment_split(), would you
> >> mind testing the following patch to see if it fixes your issue?
> >>
> >> ---
> >> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> >> From: Ming Lei <ming.lei@canonical.com>
> >> Date: Sun, 22 Nov 2015 00:47:13 +0800
> >> Subject: [PATCH] block: fix segment split
> >>
> >> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> >> always points to the iterator local variable, which is obviously
> >> wrong, so fix it by pointing to the local variable of 'bvprv'.
> >>
> >> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> >> ---
> >>  block/blk-merge.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/block/blk-merge.c b/block/blk-merge.c
> >> index de5716d8..f2efe8a 100644
> >> --- a/block/blk-merge.c
> >> +++ b/block/blk-merge.c
> >> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
> >>  
> >>  			seg_size += bv.bv_len;
> >>  			bvprv = bv;
> >> -			bvprvp = &bv;
> >> +			bvprvp = &bvprv;
> >>  			sectors += bv.bv_len >> 9;
> >>  			continue;
> >>  		}
> >> @@ -108,7 +108,7 @@ new_segment:
> >>  
> >>  		nsegs++;
> >>  		bvprv = bv;
> >> -		bvprvp = &bv;
> >> +		bvprvp = &bvprv;
> >>  		seg_size = bv.bv_len;
> >>  		sectors += bv.bv_len >> 9;
> >>  	}
> > 
> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
> 
> On my side, with the patch applied on top of 4.4-rc1, I can't get the
> panic anymore.

git bisect shows:

bdced438acd83ad83a6c6fc7f50099b820245ddb is the first bad commit
commit bdced438acd83ad83a6c6fc7f50099b820245ddb
Author: Ming Lei <ming.lei@canonical.com>
Date:   Tue Oct 20 23:13:52 2015 +0800 

    block: setup bi_phys_segments after splitting

Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.

~Pratyush

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-23 15:13                 ` Pratyush Anand
@ 2015-11-23 15:20                   ` Laurent Dufour
  2015-11-23 15:27                     ` Ming Lei
  0 siblings, 1 reply; 58+ messages in thread
From: Laurent Dufour @ 2015-11-23 15:20 UTC (permalink / raw)
  To: Pratyush Anand, Mark Salter, Ming Lei
  Cc: linux-block, linux-scsi, James E. J. Bottomley, linux-kernel,
	Christoph Hellwig, brking, tom.leiming, linuxppc-dev

On 23/11/2015 16:13, Pratyush Anand wrote:
> On 23/11/2015:02:57:19 PM, Laurent Dufour wrote:
>> On 23/11/2015 00:20, Mark Salter wrote:
>>> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>>>> On Sat, 21 Nov 2015 12:30:14 +0100
>>>> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
>>>>
>>>>> On 20/11/2015 13:10, Michael Ellerman wrote:
>>>>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>>>>>>
>>>>>>> It's pretty much guaranteed a block layer bug, most likely in the
>>>>>>> merge bios to request infrastucture where we don't obey the merging
>>>>>>> limits properly.
>>>>>>>
>>>>>>> Does either of you have a known good and first known bad kernel?
>>>>>>
>>>>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
>>>>>> 4.4-rc1.
>>>>>>
>>>>>> Laurent, can you narrow it down at all?
>>>>>
>>>>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
>>>>> setup bi_phys_segments after splitting") which has been pulled by the
>>>>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>>>>> git://git.kernel.dk/linux-block").
>>>>>
>>>>> My system is panicing promptly when running a kernel built at
>>>>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>>>>> without panicing.
>>>>>
>>>>> This being said, I can't explain what's going wrong.
>>>>>
>>>>> May Ming shed some light here ?
>>>>
>>>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>>>> mind testing the following patch to see if it fixes your issue?
>>>>
>>>> ---
>>>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>>>> From: Ming Lei <ming.lei@canonical.com>
>>>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>>>> Subject: [PATCH] block: fix segment split
>>>>
>>>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>>>> always points to the iterator local variable, which is obviously
>>>> wrong, so fix it by pointing to the local variable of 'b
> ~Pratyush
> 
vprv'.
>>>>
>>>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>>>> ---
>>>>  block/blk-merge.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>>>> index de5716d8..f2efe8a 100644
>>>> --- a/block/blk-merge.c
>>>> +++ b/block/blk-merge.c
>>>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>>>  
>>>>  			seg_size += bv.bv_len;bdced438acd8
>>>>  			bvprv = bv;
>>>> -			bvprvp = &bv;
>>>> +			bvprvp = &bvprv;
>>>>  			sectors += bv.bv_len >> 9;
>>>>  			continue;
>>>>  		}
>>>> @@ -108,7 +108,7 @@ new_segment:
>>>>  
>>>>  		nsegs++;
>>>>  		bvprv = bv;
>>>> -		bvprvp = &bv;
>>>> +		bvprvp = &bvprv;
>>>>  		seg_size = bv.bv_len;
>>>>  		sectors += bv.bv_len >> 9;
>>>>  	}
>>>
>>> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>>
>> On my side, with the patch applied on top of 4.4-rc1, I can't get the
>> panic anymore.
> 
> git bisect shows:
> 
> bdced438acd83ad83a6c6fc7f50099b820245ddb is the first bad commit
> commit bdced438acd83ad83a6c6fc7f50099b820245ddb
> Author: Ming Lei <ming.lei@canonical.com>
> Date:   Tue Oct 20 23:13:52 2015 +0800 
> 
>     block: setup bi_phys_segments after splitting
> 
> Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.

That's what I mentioned earlier ;)

Now Ming send an additional patch with seems to fix the bug introduced
through the commit bdced438acd8. When testing with this new patch I
can't get the panic anymore, but Mark reported he is still hitting it.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-23  2:46                   ` Ming Lei
@ 2015-11-23 15:21                       ` Ming Lei
  0 siblings, 0 replies; 58+ messages in thread
From: Ming Lei @ 2015-11-23 15:21 UTC (permalink / raw)
  To: Mark Salter
  Cc: Ming Lei, Laurent Dufour, Michael Ellerman, Christoph Hellwig,
	James E. J. Bottomley, brking, Linux SCSI List,
	Linux Kernel Mailing List, linuxppc-dev, linux-block, Ming Lei

On Mon, 23 Nov 2015 10:46:20 +0800
Ming Lei <ming.lei@canonical.com> wrote:

> Hi Mark,
> 
> On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter <msalter@redhat.com> wrote:
> > On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
> >> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msalter@redhat.com> wrote:
> >> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> >> > > On Sat, 21 Nov 2015 12:30:14 +0100
> >> > > Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
> >> > >
> >> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
> >> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> >> > > > >
> >> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
> >> > > > > > merge bios to request infrastucture where we don't obey the merging
> >> > > > > > limits properly.
> >> > > > > >
> >> > > > > > Does either of you have a known good and first known bad kernel?
> >> > > > >
> >> > > > > Not me, I've only hit it one or two times. All I can say is I have hit it in
> >> > > > > 4.4-rc1.
> >> > > > >
> >> > > > > Laurent, can you narrow it down at all?
> >> > > >
> >> > > > It seems that the panic is triggered by the commit bdced438acd8 ("block:
> >> > > > setup bi_phys_segments after splitting") which has been pulled by the
> >> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> >> > > > git://git.kernel.dk/linux-block").
> >> > > >
> >> > > > My system is panicing promptly when running a kernel built at
> >> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> >> > > > without panicing.
> >> > > >
> >> > > > This being said, I can't explain what's going wrong.
> >> > > >
> >> > > > May Ming shed some light here ?
> >> > >
> >> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
> >> > > mind testing the following patch to see if it fixes your issue?
> >> > >
> >> > > ---
> >> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> >> > > From: Ming Lei <ming.lei@canonical.com>
> >> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
> >> > > Subject: [PATCH] block: fix segment split
> >> > >
> >> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> >> > > always points to the iterator local variable, which is obviously
> >> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
> >> > >
> >> > > Signed-off-by: Ming Lei <ming.lei@canonical.com>
> >> > > ---
> >> > >  block/blk-merge.c | 4 ++--
> >> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> >> > >
> >> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
> >> > > index de5716d8..f2efe8a 100644
> >> > > --- a/block/blk-merge.c
> >> > > +++ b/block/blk-merge.c
> >> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
> >> > >
> >> > >                       seg_size += bv.bv_len;
> >> > >                       bvprv = bv;
> >> > > -                     bvprvp = &bv;
> >> > > +                     bvprvp = &bvprv;
> >> > >                       sectors += bv.bv_len >> 9;
> >> > >                       continue;
> >> > >               }
> >> > > @@ -108,7 +108,7 @@ new_segment:
> >> > >
> >> > >               nsegs++;
> >> > >               bvprv = bv;
> >> > > -             bvprvp = &bv;
> >> > > +             bvprvp = &bvprv;
> >> > >               seg_size = bv.bv_len;
> >> > >               sectors += bv.bv_len >> 9;
> >> > >       }
> >> >
> >> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
> >>
> >> OK, looks there are still other bugs, care to share us how to reproduce
> >> it on arm64?
> >>
> >> thanks,
> >> Ming
> >
> > Unfortunately, the best reproducer I have is to boot the platform. I have seen the
> > BUG a few times post-boot, but I don't have a consistant reproducer. I am using
> > upstream 4.4-rc1 with this config:
> >
> >   http://people.redhat.com/msalter/fh_defconfig
> >
> > With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 boots.
> > On an AMD Seattle platform, about every 9 boots.
> 
> Thanks for the input, and I will try to reproduce the issue on mustang with
> your kernel config.

I can reproduce the issue on mustang, and looks I may understand the story now.

When 64K page size is used on arm64, and the default segment size of block
is 65536, then one segment should only include one page at most.

Commit bdced438acd83a(block: setup bi_phys_segments after splitting) does not
compute bio->bi_seg_front_size and bio->bi_seg_back_size, then one less segment
may be obtained because blk_phys_contig_segment() thought the last bvec in 1st
bio and the 1st bvec in the 2nd bio is in one physical segment, so cause the
regression.

Looks the following patch can fix the issue by figuring bio->bi_seg_front_size
and bio->bi_seg_back_size in blk_bio_segment_split().

Mark, thanks again for providing the reproduction steps, and could you run your
test to see if it can fix your issue?

---
>From 86b5f33d48715c1150fdcfd9a76e495e7aa913aa Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Mon, 23 Nov 2015 20:27:23 +0800
Subject: [PATCH 2/2] blk-merge: fix blk_bio_segment_split

Commit bdced438acd83a(block: setup bi_phys_segments after
splitting) introduces function of computing bio->bi_phys_segments
during bio splitting.

Unfortunately both bio->bi_seg_front_size and bio->bi_seg_back_size
arn't computed, so too many physical segments may be obtained
for one request since both the two are used to check if one segment
across two bios can be possible.

This patch fixes the issue by computing the two variables in
blk_bio_segment_split().

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Mark Salter <msalter@redhat.com>
Fixes: bdced438acd83a(block: setup bi_phys_segments after splitting)
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-merge.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index f2efe8a..50793cd 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -76,6 +76,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 	struct bio_vec bv, bvprv, *bvprvp = NULL;
 	struct bvec_iter iter;
 	unsigned seg_size = 0, nsegs = 0, sectors = 0;
+	unsigned front_seg_size = bio->bi_seg_front_size;
+	bool do_split = true;
+	struct bio *new = NULL;
 
 	bio_for_each_segment(bv, bio, iter) {
 		if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
@@ -111,13 +114,26 @@ new_segment:
 		bvprvp = &bvprv;
 		seg_size = bv.bv_len;
 		sectors += bv.bv_len >> 9;
+
+		if (nsegs == 1 && seg_size > front_seg_size)
+			front_seg_size = seg_size;
 	}
 
-	*segs = nsegs;
-	return NULL;
+	do_split = false;
 split:
 	*segs = nsegs;
-	return bio_split(bio, sectors, GFP_NOIO, bs);
+
+	if (do_split) {
+		new = bio_split(bio, sectors, GFP_NOIO, bs);
+		if (new)
+			bio = new;
+	}
+
+	bio->bi_seg_front_size = front_seg_size;
+	if (seg_size > bio->bi_seg_back_size)
+		bio->bi_seg_back_size = seg_size;
+
+	return do_split ? new : NULL;
 }
 
 void blk_queue_split(struct request_queue *q, struct bio **bio,
-- 
1.9.1

Thanks,

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-23 15:21                       ` Ming Lei
  0 siblings, 0 replies; 58+ messages in thread
From: Ming Lei @ 2015-11-23 15:21 UTC (permalink / raw)
  To: Mark Salter
  Cc: Ming Lei, Laurent Dufour, Michael Ellerman, Christoph Hellwig,
	James E. J. Bottomley, brking, Linux SCSI List,
	Linux Kernel Mailing List, linuxppc-dev, linux-block, Ming Lei

On Mon, 23 Nov 2015 10:46:20 +0800
Ming Lei <ming.lei@canonical.com> wrote:

> Hi Mark,
>=20
> On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter <msalter@redhat.com> wrote:
> > On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
> >> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msalter@redhat.com> wrot=
e:
> >> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> >> > > On Sat, 21 Nov 2015 12:30:14 +0100
> >> > > Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
> >> > >
> >> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
> >> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> >> > > > >
> >> > > > > > It's pretty much guaranteed a block layer bug, most likely i=
n the
> >> > > > > > merge bios to request infrastucture where we don't obey the =
merging
> >> > > > > > limits properly.
> >> > > > > >
> >> > > > > > Does either of you have a known good and first known bad ker=
nel?
> >> > > > >
> >> > > > > Not me, I've only hit it one or two times. All I can say is I =
have hit it in
> >> > > > > 4.4-rc1.
> >> > > > >
> >> > > > > Laurent, can you narrow it down at all?
> >> > > >
> >> > > > It seems that the panic is triggered by the commit bdced438acd8 =
("block:
> >> > > > setup bi_phys_segments after splitting") which has been pulled b=
y the
> >> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> >> > > > git://git.kernel.dk/linux-block").
> >> > > >
> >> > > > My system is panicing promptly when running a kernel built at
> >> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can ru=
n hours
> >> > > > without panicing.
> >> > > >
> >> > > > This being said, I can't explain what's going wrong.
> >> > > >
> >> > > > May Ming shed some light here ?
> >> > >
> >> > > Laurent, looks there is one bug in blk_bio_segment_split(), would =
you
> >> > > mind testing the following patch to see if it fixes your issue?
> >> > >
> >> > > ---
> >> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 =
2001
> >> > > From: Ming Lei <ming.lei@canonical.com>
> >> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
> >> > > Subject: [PATCH] block: fix segment split
> >> > >
> >> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> >> > > always points to the iterator local variable, which is obviously
> >> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
> >> > >
> >> > > Signed-off-by: Ming Lei <ming.lei@canonical.com>
> >> > > ---
> >> > >  block/blk-merge.c | 4 ++--
> >> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> >> > >
> >> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
> >> > > index de5716d8..f2efe8a 100644
> >> > > --- a/block/blk-merge.c
> >> > > +++ b/block/blk-merge.c
> >> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct =
request_queue *q,
> >> > >
> >> > >                       seg_size +=3D bv.bv_len;
> >> > >                       bvprv =3D bv;
> >> > > -                     bvprvp =3D &bv;
> >> > > +                     bvprvp =3D &bvprv;
> >> > >                       sectors +=3D bv.bv_len >> 9;
> >> > >                       continue;
> >> > >               }
> >> > > @@ -108,7 +108,7 @@ new_segment:
> >> > >
> >> > >               nsegs++;
> >> > >               bvprv =3D bv;
> >> > > -             bvprvp =3D &bv;
> >> > > +             bvprvp =3D &bvprv;
> >> > >               seg_size =3D bv.bv_len;
> >> > >               sectors +=3D bv.bv_len >> 9;
> >> > >       }
> >> >
> >> > I'm still hitting the BUG even with this patch applied on top of 4.4=
-rc1.
> >>
> >> OK, looks there are still other bugs, care to share us how to reproduce
> >> it on arm64?
> >>
> >> thanks,
> >> Ming
> >
> > Unfortunately, the best reproducer I have is to boot the platform. I ha=
ve seen the
> > BUG a few times post-boot, but I don't have a consistant reproducer. I =
am using
> > upstream 4.4-rc1 with this config:
> >
> >   http://people.redhat.com/msalter/fh_defconfig
> >
> > With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every=
 6-7 boots.
> > On an AMD Seattle platform, about every 9 boots.
>=20
> Thanks for the input, and I will try to reproduce the issue on mustang wi=
th
> your kernel config.

I can reproduce the issue on mustang, and looks I may understand the story =
now.

When 64K page size is used on arm64, and the default segment size of block
is 65536, then one segment should only include one page at most.

Commit bdced438acd83a(block: setup bi_phys_segments after splitting) does n=
ot
compute bio->bi_seg_front_size and bio->bi_seg_back_size, then one less seg=
ment
may be obtained because blk_phys_contig_segment() thought the last bvec in =
1st
bio and the 1st bvec in the 2nd bio is in one physical segment, so cause the
regression.

Looks the following patch can fix the issue by figuring bio->bi_seg_front_s=
ize
and bio->bi_seg_back_size in blk_bio_segment_split().

Mark, thanks again for providing the reproduction steps, and could you run =
your
test to see if it can fix your issue?

---
=46rom 86b5f33d48715c1150fdcfd9a76e495e7aa913aa Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Mon, 23 Nov 2015 20:27:23 +0800
Subject: [PATCH 2/2] blk-merge: fix blk_bio_segment_split

Commit bdced438acd83a(block: setup bi_phys_segments after
splitting) introduces function of computing bio->bi_phys_segments
during bio splitting.

Unfortunately both bio->bi_seg_front_size and bio->bi_seg_back_size
arn't computed, so too many physical segments may be obtained
for one request since both the two are used to check if one segment
across two bios can be possible.

This patch fixes the issue by computing the two variables in
blk_bio_segment_split().

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Mark Salter <msalter@redhat.com>
Fixes: bdced438acd83a(block: setup bi_phys_segments after splitting)
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-merge.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index f2efe8a..50793cd 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -76,6 +76,9 @@ static struct bio *blk_bio_segment_split(struct request_q=
ueue *q,
 	struct bio_vec bv, bvprv, *bvprvp =3D NULL;
 	struct bvec_iter iter;
 	unsigned seg_size =3D 0, nsegs =3D 0, sectors =3D 0;
+	unsigned front_seg_size =3D bio->bi_seg_front_size;
+	bool do_split =3D true;
+	struct bio *new =3D NULL;
=20
 	bio_for_each_segment(bv, bio, iter) {
 		if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
@@ -111,13 +114,26 @@ new_segment:
 		bvprvp =3D &bvprv;
 		seg_size =3D bv.bv_len;
 		sectors +=3D bv.bv_len >> 9;
+
+		if (nsegs =3D=3D 1 && seg_size > front_seg_size)
+			front_seg_size =3D seg_size;
 	}
=20
-	*segs =3D nsegs;
-	return NULL;
+	do_split =3D false;
 split:
 	*segs =3D nsegs;
-	return bio_split(bio, sectors, GFP_NOIO, bs);
+
+	if (do_split) {
+		new =3D bio_split(bio, sectors, GFP_NOIO, bs);
+		if (new)
+			bio =3D new;
+	}
+
+	bio->bi_seg_front_size =3D front_seg_size;
+	if (seg_size > bio->bi_seg_back_size)
+		bio->bi_seg_back_size =3D seg_size;
+
+	return do_split ? new : NULL;
 }
=20
 void blk_queue_split(struct request_queue *q, struct bio **bio,
--=20
1.9.1

Thanks,

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-23 15:20                   ` Laurent Dufour
@ 2015-11-23 15:27                     ` Ming Lei
  2015-11-23 16:24                       ` Laurent Dufour
  2015-11-24  1:30                       ` Mark Salter
  0 siblings, 2 replies; 58+ messages in thread
From: Ming Lei @ 2015-11-23 15:27 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Pratyush Anand, Mark Salter, linux-block, Linux SCSI List,
	James E. J. Bottomley, Linux Kernel Mailing List,
	Christoph Hellwig, brking, linuxppc-dev

On Mon, Nov 23, 2015 at 11:20 PM, Laurent Dufour
<ldufour@linux.vnet.ibm.com> wrote:
>>
>> Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.
>
> That's what I mentioned earlier ;)
>
> Now Ming send an additional patch with seems to fix the bug introduced
> through the commit bdced438acd8. When testing with this new patch I
> can't get the panic anymore, but Mark reported he is still hitting it.

Laurent, thanks for your test on the 1st patch, and looks there are at
least two problems, and my 2nd patch sent just now should address
Mark's issue which is caused by bdced438acd83a.

Once the 2nd one is tested OK, I will send out the two together.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-23 15:27                     ` Ming Lei
@ 2015-11-23 16:24                       ` Laurent Dufour
  2015-11-24  1:30                       ` Mark Salter
  1 sibling, 0 replies; 58+ messages in thread
From: Laurent Dufour @ 2015-11-23 16:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: Pratyush Anand, Mark Salter, linux-block, Linux SCSI List,
	James E. J. Bottomley, Linux Kernel Mailing List,
	Christoph Hellwig, brking, linuxppc-dev

On 23/11/2015 16:27, Ming Lei wrote:
> On Mon, Nov 23, 2015 at 11:20 PM, Laurent Dufour
> <ldufour@linux.vnet.ibm.com> wrote:
>>>
>>> Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.
>>
>> That's what I mentioned earlier ;)
>>
>> Now Ming send an additional patch with seems to fix the bug introduced
>> through the commit bdced438acd8. When testing with this new patch I
>> can't get the panic anymore, but Mark reported he is still hitting it.
> 
> Laurent, thanks for your test on the 1st patch, and looks there are at
> least two problems, and my 2nd patch sent just now should address
> Mark's issue which is caused by bdced438acd83a.
> 
> Once the 2nd one is tested OK, I will send out the two together.

FWIW, I applied the 2nd patch, and my system is still running like a charm.

Cheers,
Laurent.



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-23 15:27                     ` Ming Lei
  2015-11-23 16:24                       ` Laurent Dufour
@ 2015-11-24  1:30                       ` Mark Salter
  1 sibling, 0 replies; 58+ messages in thread
From: Mark Salter @ 2015-11-24  1:30 UTC (permalink / raw)
  To: Ming Lei, Laurent Dufour
  Cc: Pratyush Anand, linux-block, Linux SCSI List,
	James E. J. Bottomley, Linux Kernel Mailing List,
	Christoph Hellwig, brking, linuxppc-dev

On Mon, 2015-11-23 at 23:27 +0800, Ming Lei wrote:
> On Mon, Nov 23, 2015 at 11:20 PM, Laurent Dufour
> <ldufour@linux.vnet.ibm.com> wrote:
> > > 
> > > Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.
> > 
> > That's what I mentioned earlier ;)
> > 
> > Now Ming send an additional patch with seems to fix the bug introduced
> > through the commit bdced438acd8. When testing with this new patch I
> > can't get the panic anymore, but Mark reported he is still hitting it.
> 
> Laurent, thanks for your test on the 1st patch, and looks there are at
> least two problems, and my 2nd patch sent just now should address
> Mark's issue which is caused by bdced438acd83a.
> 
> Once the 2nd one is tested OK, I will send out the two together.
> 
> Thanks,
> Ming

Thanks Ming.
With both patches applied, I have been unable to reproduce the problem.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-23 15:21                       ` Ming Lei
  (?)
  (?)
@ 2015-11-24 18:59                         ` Alan Ott
  -1 siblings, 0 replies; 58+ messages in thread
From: Alan Ott @ 2015-11-24 18:59 UTC (permalink / raw)
  To: Ming Lei, Mark Salter
  Cc: Ming Lei, Laurent Dufour, Michael Ellerman, Christoph Hellwig,
	James E. J. Bottomley, brking, Linux SCSI List,
	Linux Kernel Mailing List, linuxppc-dev, linux-block,
	linux-arm-kernel

On 11/23/2015 10:21 AM, Ming Lei wrote:
> On Mon, 23 Nov 2015 10:46:20 +0800
> Ming Lei <ming.lei@canonical.com> wrote:
>
>> Hi Mark,
>>
>> On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter <msalter@redhat.com> wrote:
>>> On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
>>>> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msalter@redhat.com> wrote:
>>>>> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>>>>>> On Sat, 21 Nov 2015 12:30:14 +0100
>>>>>> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
>>>>>>
>>>>>>> On 20/11/2015 13:10, Michael Ellerman wrote:
>>>>>>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>>>>>>>>
>>>>>>>>> It's pretty much guaranteed a block layer bug, most likely in the
>>>>>>>>> merge bios to request infrastucture where we don't obey the merging
>>>>>>>>> limits properly.
>>>>>>>>>
>>>>>>>>> Does either of you have a known good and first known bad kernel?
>>>>>>>>
>>>>>>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
>>>>>>>> 4.4-rc1.
>>>>>>>>
>>>>>>>> Laurent, can you narrow it down at all?
>>>>>>>
>>>>>>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
>>>>>>> setup bi_phys_segments after splitting") which has been pulled by the
>>>>>>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>>>>>>> git://git.kernel.dk/linux-block").
>>>>>>>
>>>>>>> My system is panicing promptly when running a kernel built at
>>>>>>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>>>>>>> without panicing.
>>>>>>>
>>>>>>> This being said, I can't explain what's going wrong.
>>>>>>>
>>>>>>> May Ming shed some light here ?
>>>>>>
>>>>>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>>>>>> mind testing the following patch to see if it fixes your issue?
>>>>>>
>>>>>> ---
>>>>>>  From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>>>>>> From: Ming Lei <ming.lei@canonical.com>
>>>>>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>>>>>> Subject: [PATCH] block: fix segment split
>>>>>>
>>>>>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>>>>>> always points to the iterator local variable, which is obviously
>>>>>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>>>>>
>>>>>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>>>>>> ---
>>>>>>   block/blk-merge.c | 4 ++--
>>>>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>>>>>> index de5716d8..f2efe8a 100644
>>>>>> --- a/block/blk-merge.c
>>>>>> +++ b/block/blk-merge.c
>>>>>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>>>>>
>>>>>>                        seg_size += bv.bv_len;
>>>>>>                        bvprv = bv;
>>>>>> -                     bvprvp = &bv;
>>>>>> +                     bvprvp = &bvprv;
>>>>>>                        sectors += bv.bv_len >> 9;
>>>>>>                        continue;
>>>>>>                }
>>>>>> @@ -108,7 +108,7 @@ new_segment:
>>>>>>
>>>>>>                nsegs++;
>>>>>>                bvprv = bv;
>>>>>> -             bvprvp = &bv;
>>>>>> +             bvprvp = &bvprv;
>>>>>>                seg_size = bv.bv_len;
>>>>>>                sectors += bv.bv_len >> 9;
>>>>>>        }
>>>>>
>>>>> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>>>>
>>>> OK, looks there are still other bugs, care to share us how to reproduce
>>>> it on arm64?
>>>>
>>>> thanks,
>>>> Ming
>>>
>>> Unfortunately, the best reproducer I have is to boot the platform. I have seen the
>>> BUG a few times post-boot, but I don't have a consistant reproducer. I am using
>>> upstream 4.4-rc1 with this config:
>>>
>>>    http://people.redhat.com/msalter/fh_defconfig
>>>
>>> With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 boots.
>>> On an AMD Seattle platform, about every 9 boots.
>>
>> Thanks for the input, and I will try to reproduce the issue on mustang with
>> your kernel config.
>
> I can reproduce the issue on mustang, and looks I may understand the story now.
>
> When 64K page size is used on arm64, and the default segment size of block
> is 65536, then one segment should only include one page at most.
>
> Commit bdced438acd83a(block: setup bi_phys_segments after splitting) does not
> compute bio->bi_seg_front_size and bio->bi_seg_back_size, then one less segment
> may be obtained because blk_phys_contig_segment() thought the last bvec in 1st
> bio and the 1st bvec in the 2nd bio is in one physical segment, so cause the
> regression.
>
> Looks the following patch can fix the issue by figuring bio->bi_seg_front_size
> and bio->bi_seg_back_size in blk_bio_segment_split().
>
> Mark, thanks again for providing the reproduction steps, and could you run your
> test to see if it can fix your issue?
>
> ---
>  From 86b5f33d48715c1150fdcfd9a76e495e7aa913aa Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@canonical.com>
> Date: Mon, 23 Nov 2015 20:27:23 +0800
> Subject: [PATCH 2/2] blk-merge: fix blk_bio_segment_split
>
> Commit bdced438acd83a(block: setup bi_phys_segments after
> splitting) introduces function of computing bio->bi_phys_segments
> during bio splitting.
>
> Unfortunately both bio->bi_seg_front_size and bio->bi_seg_back_size
> arn't computed, so too many physical segments may be obtained
> for one request since both the two are used to check if one segment
> across two bios can be possible.
>
> This patch fixes the issue by computing the two variables in
> blk_bio_segment_split().
>
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Reported-by: Mark Salter <msalter@redhat.com>
> Fixes: bdced438acd83a(block: setup bi_phys_segments after splitting)
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>   block/blk-merge.c | 22 +++++++++++++++++++---
>   1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index f2efe8a..50793cd 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -76,6 +76,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>   	struct bio_vec bv, bvprv, *bvprvp = NULL;
>   	struct bvec_iter iter;
>   	unsigned seg_size = 0, nsegs = 0, sectors = 0;
> +	unsigned front_seg_size = bio->bi_seg_front_size;
> +	bool do_split = true;
> +	struct bio *new = NULL;
>
>   	bio_for_each_segment(bv, bio, iter) {
>   		if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
> @@ -111,13 +114,26 @@ new_segment:
>   		bvprvp = &bvprv;
>   		seg_size = bv.bv_len;
>   		sectors += bv.bv_len >> 9;
> +
> +		if (nsegs == 1 && seg_size > front_seg_size)
> +			front_seg_size = seg_size;
>   	}
>
> -	*segs = nsegs;
> -	return NULL;
> +	do_split = false;
>   split:
>   	*segs = nsegs;
> -	return bio_split(bio, sectors, GFP_NOIO, bs);
> +
> +	if (do_split) {
> +		new = bio_split(bio, sectors, GFP_NOIO, bs);
> +		if (new)
> +			bio = new;
> +	}
> +
> +	bio->bi_seg_front_size = front_seg_size;
> +	if (seg_size > bio->bi_seg_back_size)
> +		bio->bi_seg_back_size = seg_size;
> +
> +	return do_split ? new : NULL;
>   }
>
>   void blk_queue_split(struct request_queue *q, struct bio **bio,
>

I applied both your patches and tested on Overdrive 3000. This fixes the 
issue for me.

Added linux-arm-kernel, since arm64 triggers this issue.

Thanks Ming and testers for your hard work on this.

Tested-by: Alan Ott <alan@softiron.co.uk>


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-24 18:59                         ` Alan Ott
  0 siblings, 0 replies; 58+ messages in thread
From: Alan Ott @ 2015-11-24 18:59 UTC (permalink / raw)
  To: Ming Lei, Mark Salter
  Cc: linux-block, Linux SCSI List, Michael Ellerman, Ming Lei,
	James E. J. Bottomley, Linux Kernel Mailing List,
	Christoph Hellwig, brking, Laurent Dufour, linuxppc-dev,
	linux-arm-kernel

On 11/23/2015 10:21 AM, Ming Lei wrote:
> On Mon, 23 Nov 2015 10:46:20 +0800
> Ming Lei <ming.lei@canonical.com> wrote:
>
>> Hi Mark,
>>
>> On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter <msalter@redhat.com> wrote:
>>> On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
>>>> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msalter@redhat.com> wrote:
>>>>> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>>>>>> On Sat, 21 Nov 2015 12:30:14 +0100
>>>>>> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
>>>>>>
>>>>>>> On 20/11/2015 13:10, Michael Ellerman wrote:
>>>>>>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>>>>>>>>
>>>>>>>>> It's pretty much guaranteed a block layer bug, most likely in the
>>>>>>>>> merge bios to request infrastucture where we don't obey the merging
>>>>>>>>> limits properly.
>>>>>>>>>
>>>>>>>>> Does either of you have a known good and first known bad kernel?
>>>>>>>>
>>>>>>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
>>>>>>>> 4.4-rc1.
>>>>>>>>
>>>>>>>> Laurent, can you narrow it down at all?
>>>>>>>
>>>>>>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
>>>>>>> setup bi_phys_segments after splitting") which has been pulled by the
>>>>>>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>>>>>>> git://git.kernel.dk/linux-block").
>>>>>>>
>>>>>>> My system is panicing promptly when running a kernel built at
>>>>>>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>>>>>>> without panicing.
>>>>>>>
>>>>>>> This being said, I can't explain what's going wrong.
>>>>>>>
>>>>>>> May Ming shed some light here ?
>>>>>>
>>>>>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>>>>>> mind testing the following patch to see if it fixes your issue?
>>>>>>
>>>>>> ---
>>>>>>  From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>>>>>> From: Ming Lei <ming.lei@canonical.com>
>>>>>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>>>>>> Subject: [PATCH] block: fix segment split
>>>>>>
>>>>>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>>>>>> always points to the iterator local variable, which is obviously
>>>>>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>>>>>
>>>>>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>>>>>> ---
>>>>>>   block/blk-merge.c | 4 ++--
>>>>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>>>>>> index de5716d8..f2efe8a 100644
>>>>>> --- a/block/blk-merge.c
>>>>>> +++ b/block/blk-merge.c
>>>>>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>>>>>
>>>>>>                        seg_size += bv.bv_len;
>>>>>>                        bvprv = bv;
>>>>>> -                     bvprvp = &bv;
>>>>>> +                     bvprvp = &bvprv;
>>>>>>                        sectors += bv.bv_len >> 9;
>>>>>>                        continue;
>>>>>>                }
>>>>>> @@ -108,7 +108,7 @@ new_segment:
>>>>>>
>>>>>>                nsegs++;
>>>>>>                bvprv = bv;
>>>>>> -             bvprvp = &bv;
>>>>>> +             bvprvp = &bvprv;
>>>>>>                seg_size = bv.bv_len;
>>>>>>                sectors += bv.bv_len >> 9;
>>>>>>        }
>>>>>
>>>>> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>>>>
>>>> OK, looks there are still other bugs, care to share us how to reproduce
>>>> it on arm64?
>>>>
>>>> thanks,
>>>> Ming
>>>
>>> Unfortunately, the best reproducer I have is to boot the platform. I have seen the
>>> BUG a few times post-boot, but I don't have a consistant reproducer. I am using
>>> upstream 4.4-rc1 with this config:
>>>
>>>    http://people.redhat.com/msalter/fh_defconfig
>>>
>>> With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 boots.
>>> On an AMD Seattle platform, about every 9 boots.
>>
>> Thanks for the input, and I will try to reproduce the issue on mustang with
>> your kernel config.
>
> I can reproduce the issue on mustang, and looks I may understand the story now.
>
> When 64K page size is used on arm64, and the default segment size of block
> is 65536, then one segment should only include one page at most.
>
> Commit bdced438acd83a(block: setup bi_phys_segments after splitting) does not
> compute bio->bi_seg_front_size and bio->bi_seg_back_size, then one less segment
> may be obtained because blk_phys_contig_segment() thought the last bvec in 1st
> bio and the 1st bvec in the 2nd bio is in one physical segment, so cause the
> regression.
>
> Looks the following patch can fix the issue by figuring bio->bi_seg_front_size
> and bio->bi_seg_back_size in blk_bio_segment_split().
>
> Mark, thanks again for providing the reproduction steps, and could you run your
> test to see if it can fix your issue?
>
> ---
>  From 86b5f33d48715c1150fdcfd9a76e495e7aa913aa Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@canonical.com>
> Date: Mon, 23 Nov 2015 20:27:23 +0800
> Subject: [PATCH 2/2] blk-merge: fix blk_bio_segment_split
>
> Commit bdced438acd83a(block: setup bi_phys_segments after
> splitting) introduces function of computing bio->bi_phys_segments
> during bio splitting.
>
> Unfortunately both bio->bi_seg_front_size and bio->bi_seg_back_size
> arn't computed, so too many physical segments may be obtained
> for one request since both the two are used to check if one segment
> across two bios can be possible.
>
> This patch fixes the issue by computing the two variables in
> blk_bio_segment_split().
>
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Reported-by: Mark Salter <msalter@redhat.com>
> Fixes: bdced438acd83a(block: setup bi_phys_segments after splitting)
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>   block/blk-merge.c | 22 +++++++++++++++++++---
>   1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index f2efe8a..50793cd 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -76,6 +76,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>   	struct bio_vec bv, bvprv, *bvprvp = NULL;
>   	struct bvec_iter iter;
>   	unsigned seg_size = 0, nsegs = 0, sectors = 0;
> +	unsigned front_seg_size = bio->bi_seg_front_size;
> +	bool do_split = true;
> +	struct bio *new = NULL;
>
>   	bio_for_each_segment(bv, bio, iter) {
>   		if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
> @@ -111,13 +114,26 @@ new_segment:
>   		bvprvp = &bvprv;
>   		seg_size = bv.bv_len;
>   		sectors += bv.bv_len >> 9;
> +
> +		if (nsegs == 1 && seg_size > front_seg_size)
> +			front_seg_size = seg_size;
>   	}
>
> -	*segs = nsegs;
> -	return NULL;
> +	do_split = false;
>   split:
>   	*segs = nsegs;
> -	return bio_split(bio, sectors, GFP_NOIO, bs);
> +
> +	if (do_split) {
> +		new = bio_split(bio, sectors, GFP_NOIO, bs);
> +		if (new)
> +			bio = new;
> +	}
> +
> +	bio->bi_seg_front_size = front_seg_size;
> +	if (seg_size > bio->bi_seg_back_size)
> +		bio->bi_seg_back_size = seg_size;
> +
> +	return do_split ? new : NULL;
>   }
>
>   void blk_queue_split(struct request_queue *q, struct bio **bio,
>

I applied both your patches and tested on Overdrive 3000. This fixes the 
issue for me.

Added linux-arm-kernel, since arm64 triggers this issue.

Thanks Ming and testers for your hard work on this.

Tested-by: Alan Ott <alan@softiron.co.uk>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-24 18:59                         ` Alan Ott
  0 siblings, 0 replies; 58+ messages in thread
From: Alan Ott @ 2015-11-24 18:59 UTC (permalink / raw)
  To: Ming Lei, Mark Salter
  Cc: Ming Lei, Laurent Dufour, Michael Ellerman, Christoph Hellwig,
	James E. J. Bottomley, brking, Linux SCSI List,
	Linux Kernel Mailing List, linuxppc-dev, linux-block,
	linux-arm-kernel

On 11/23/2015 10:21 AM, Ming Lei wrote:
> On Mon, 23 Nov 2015 10:46:20 +0800
> Ming Lei <ming.lei@canonical.com> wrote:
>
>> Hi Mark,
>>
>> On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter <msalter@redhat.com> wrote:
>>> On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
>>>> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msalter@redhat.com> wrote:
>>>>> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>>>>>> On Sat, 21 Nov 2015 12:30:14 +0100
>>>>>> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
>>>>>>
>>>>>>> On 20/11/2015 13:10, Michael Ellerman wrote:
>>>>>>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>>>>>>>>
>>>>>>>>> It's pretty much guaranteed a block layer bug, most likely in the
>>>>>>>>> merge bios to request infrastucture where we don't obey the merging
>>>>>>>>> limits properly.
>>>>>>>>>
>>>>>>>>> Does either of you have a known good and first known bad kernel?
>>>>>>>>
>>>>>>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
>>>>>>>> 4.4-rc1.
>>>>>>>>
>>>>>>>> Laurent, can you narrow it down at all?
>>>>>>>
>>>>>>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
>>>>>>> setup bi_phys_segments after splitting") which has been pulled by the
>>>>>>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>>>>>>> git://git.kernel.dk/linux-block").
>>>>>>>
>>>>>>> My system is panicing promptly when running a kernel built at
>>>>>>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>>>>>>> without panicing.
>>>>>>>
>>>>>>> This being said, I can't explain what's going wrong.
>>>>>>>
>>>>>>> May Ming shed some light here ?
>>>>>>
>>>>>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>>>>>> mind testing the following patch to see if it fixes your issue?
>>>>>>
>>>>>> ---
>>>>>>  From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>>>>>> From: Ming Lei <ming.lei@canonical.com>
>>>>>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>>>>>> Subject: [PATCH] block: fix segment split
>>>>>>
>>>>>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>>>>>> always points to the iterator local variable, which is obviously
>>>>>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>>>>>
>>>>>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>>>>>> ---
>>>>>>   block/blk-merge.c | 4 ++--
>>>>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>>>>>> index de5716d8..f2efe8a 100644
>>>>>> --- a/block/blk-merge.c
>>>>>> +++ b/block/blk-merge.c
>>>>>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>>>>>
>>>>>>                        seg_size += bv.bv_len;
>>>>>>                        bvprv = bv;
>>>>>> -                     bvprvp = &bv;
>>>>>> +                     bvprvp = &bvprv;
>>>>>>                        sectors += bv.bv_len >> 9;
>>>>>>                        continue;
>>>>>>                }
>>>>>> @@ -108,7 +108,7 @@ new_segment:
>>>>>>
>>>>>>                nsegs++;
>>>>>>                bvprv = bv;
>>>>>> -             bvprvp = &bv;
>>>>>> +             bvprvp = &bvprv;
>>>>>>                seg_size = bv.bv_len;
>>>>>>                sectors += bv.bv_len >> 9;
>>>>>>        }
>>>>>
>>>>> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>>>>
>>>> OK, looks there are still other bugs, care to share us how to reproduce
>>>> it on arm64?
>>>>
>>>> thanks,
>>>> Ming
>>>
>>> Unfortunately, the best reproducer I have is to boot the platform. I have seen the
>>> BUG a few times post-boot, but I don't have a consistant reproducer. I am using
>>> upstream 4.4-rc1 with this config:
>>>
>>>    http://people.redhat.com/msalter/fh_defconfig
>>>
>>> With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 boots.
>>> On an AMD Seattle platform, about every 9 boots.
>>
>> Thanks for the input, and I will try to reproduce the issue on mustang with
>> your kernel config.
>
> I can reproduce the issue on mustang, and looks I may understand the story now.
>
> When 64K page size is used on arm64, and the default segment size of block
> is 65536, then one segment should only include one page at most.
>
> Commit bdced438acd83a(block: setup bi_phys_segments after splitting) does not
> compute bio->bi_seg_front_size and bio->bi_seg_back_size, then one less segment
> may be obtained because blk_phys_contig_segment() thought the last bvec in 1st
> bio and the 1st bvec in the 2nd bio is in one physical segment, so cause the
> regression.
>
> Looks the following patch can fix the issue by figuring bio->bi_seg_front_size
> and bio->bi_seg_back_size in blk_bio_segment_split().
>
> Mark, thanks again for providing the reproduction steps, and could you run your
> test to see if it can fix your issue?
>
> ---
>  From 86b5f33d48715c1150fdcfd9a76e495e7aa913aa Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@canonical.com>
> Date: Mon, 23 Nov 2015 20:27:23 +0800
> Subject: [PATCH 2/2] blk-merge: fix blk_bio_segment_split
>
> Commit bdced438acd83a(block: setup bi_phys_segments after
> splitting) introduces function of computing bio->bi_phys_segments
> during bio splitting.
>
> Unfortunately both bio->bi_seg_front_size and bio->bi_seg_back_size
> arn't computed, so too many physical segments may be obtained
> for one request since both the two are used to check if one segment
> across two bios can be possible.
>
> This patch fixes the issue by computing the two variables in
> blk_bio_segment_split().
>
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Reported-by: Mark Salter <msalter@redhat.com>
> Fixes: bdced438acd83a(block: setup bi_phys_segments after splitting)
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>   block/blk-merge.c | 22 +++++++++++++++++++---
>   1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index f2efe8a..50793cd 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -76,6 +76,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>   	struct bio_vec bv, bvprv, *bvprvp = NULL;
>   	struct bvec_iter iter;
>   	unsigned seg_size = 0, nsegs = 0, sectors = 0;
> +	unsigned front_seg_size = bio->bi_seg_front_size;
> +	bool do_split = true;
> +	struct bio *new = NULL;
>
>   	bio_for_each_segment(bv, bio, iter) {
>   		if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
> @@ -111,13 +114,26 @@ new_segment:
>   		bvprvp = &bvprv;
>   		seg_size = bv.bv_len;
>   		sectors += bv.bv_len >> 9;
> +
> +		if (nsegs == 1 && seg_size > front_seg_size)
> +			front_seg_size = seg_size;
>   	}
>
> -	*segs = nsegs;
> -	return NULL;
> +	do_split = false;
>   split:
>   	*segs = nsegs;
> -	return bio_split(bio, sectors, GFP_NOIO, bs);
> +
> +	if (do_split) {
> +		new = bio_split(bio, sectors, GFP_NOIO, bs);
> +		if (new)
> +			bio = new;
> +	}
> +
> +	bio->bi_seg_front_size = front_seg_size;
> +	if (seg_size > bio->bi_seg_back_size)
> +		bio->bi_seg_back_size = seg_size;
> +
> +	return do_split ? new : NULL;
>   }
>
>   void blk_queue_split(struct request_queue *q, struct bio **bio,
>

I applied both your patches and tested on Overdrive 3000. This fixes the 
issue for me.

Added linux-arm-kernel, since arm64 triggers this issue.

Thanks Ming and testers for your hard work on this.

Tested-by: Alan Ott <alan@softiron.co.uk>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-24 18:59                         ` Alan Ott
  0 siblings, 0 replies; 58+ messages in thread
From: Alan Ott @ 2015-11-24 18:59 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/23/2015 10:21 AM, Ming Lei wrote:
> On Mon, 23 Nov 2015 10:46:20 +0800
> Ming Lei <ming.lei@canonical.com> wrote:
>
>> Hi Mark,
>>
>> On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter <msalter@redhat.com> wrote:
>>> On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
>>>> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msalter@redhat.com> wrote:
>>>>> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>>>>>> On Sat, 21 Nov 2015 12:30:14 +0100
>>>>>> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
>>>>>>
>>>>>>> On 20/11/2015 13:10, Michael Ellerman wrote:
>>>>>>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>>>>>>>>
>>>>>>>>> It's pretty much guaranteed a block layer bug, most likely in the
>>>>>>>>> merge bios to request infrastucture where we don't obey the merging
>>>>>>>>> limits properly.
>>>>>>>>>
>>>>>>>>> Does either of you have a known good and first known bad kernel?
>>>>>>>>
>>>>>>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
>>>>>>>> 4.4-rc1.
>>>>>>>>
>>>>>>>> Laurent, can you narrow it down at all?
>>>>>>>
>>>>>>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
>>>>>>> setup bi_phys_segments after splitting") which has been pulled by the
>>>>>>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>>>>>>> git://git.kernel.dk/linux-block").
>>>>>>>
>>>>>>> My system is panicing promptly when running a kernel built at
>>>>>>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>>>>>>> without panicing.
>>>>>>>
>>>>>>> This being said, I can't explain what's going wrong.
>>>>>>>
>>>>>>> May Ming shed some light here ?
>>>>>>
>>>>>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>>>>>> mind testing the following patch to see if it fixes your issue?
>>>>>>
>>>>>> ---
>>>>>>  From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>>>>>> From: Ming Lei <ming.lei@canonical.com>
>>>>>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>>>>>> Subject: [PATCH] block: fix segment split
>>>>>>
>>>>>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>>>>>> always points to the iterator local variable, which is obviously
>>>>>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>>>>>
>>>>>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>>>>>> ---
>>>>>>   block/blk-merge.c | 4 ++--
>>>>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>>>>>> index de5716d8..f2efe8a 100644
>>>>>> --- a/block/blk-merge.c
>>>>>> +++ b/block/blk-merge.c
>>>>>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>>>>>
>>>>>>                        seg_size += bv.bv_len;
>>>>>>                        bvprv = bv;
>>>>>> -                     bvprvp = &bv;
>>>>>> +                     bvprvp = &bvprv;
>>>>>>                        sectors += bv.bv_len >> 9;
>>>>>>                        continue;
>>>>>>                }
>>>>>> @@ -108,7 +108,7 @@ new_segment:
>>>>>>
>>>>>>                nsegs++;
>>>>>>                bvprv = bv;
>>>>>> -             bvprvp = &bv;
>>>>>> +             bvprvp = &bvprv;
>>>>>>                seg_size = bv.bv_len;
>>>>>>                sectors += bv.bv_len >> 9;
>>>>>>        }
>>>>>
>>>>> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>>>>
>>>> OK, looks there are still other bugs, care to share us how to reproduce
>>>> it on arm64?
>>>>
>>>> thanks,
>>>> Ming
>>>
>>> Unfortunately, the best reproducer I have is to boot the platform. I have seen the
>>> BUG a few times post-boot, but I don't have a consistant reproducer. I am using
>>> upstream 4.4-rc1 with this config:
>>>
>>>    http://people.redhat.com/msalter/fh_defconfig
>>>
>>> With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 boots.
>>> On an AMD Seattle platform, about every 9 boots.
>>
>> Thanks for the input, and I will try to reproduce the issue on mustang with
>> your kernel config.
>
> I can reproduce the issue on mustang, and looks I may understand the story now.
>
> When 64K page size is used on arm64, and the default segment size of block
> is 65536, then one segment should only include one page at most.
>
> Commit bdced438acd83a(block: setup bi_phys_segments after splitting) does not
> compute bio->bi_seg_front_size and bio->bi_seg_back_size, then one less segment
> may be obtained because blk_phys_contig_segment() thought the last bvec in 1st
> bio and the 1st bvec in the 2nd bio is in one physical segment, so cause the
> regression.
>
> Looks the following patch can fix the issue by figuring bio->bi_seg_front_size
> and bio->bi_seg_back_size in blk_bio_segment_split().
>
> Mark, thanks again for providing the reproduction steps, and could you run your
> test to see if it can fix your issue?
>
> ---
>  From 86b5f33d48715c1150fdcfd9a76e495e7aa913aa Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@canonical.com>
> Date: Mon, 23 Nov 2015 20:27:23 +0800
> Subject: [PATCH 2/2] blk-merge: fix blk_bio_segment_split
>
> Commit bdced438acd83a(block: setup bi_phys_segments after
> splitting) introduces function of computing bio->bi_phys_segments
> during bio splitting.
>
> Unfortunately both bio->bi_seg_front_size and bio->bi_seg_back_size
> arn't computed, so too many physical segments may be obtained
> for one request since both the two are used to check if one segment
> across two bios can be possible.
>
> This patch fixes the issue by computing the two variables in
> blk_bio_segment_split().
>
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Reported-by: Mark Salter <msalter@redhat.com>
> Fixes: bdced438acd83a(block: setup bi_phys_segments after splitting)
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>   block/blk-merge.c | 22 +++++++++++++++++++---
>   1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index f2efe8a..50793cd 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -76,6 +76,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>   	struct bio_vec bv, bvprv, *bvprvp = NULL;
>   	struct bvec_iter iter;
>   	unsigned seg_size = 0, nsegs = 0, sectors = 0;
> +	unsigned front_seg_size = bio->bi_seg_front_size;
> +	bool do_split = true;
> +	struct bio *new = NULL;
>
>   	bio_for_each_segment(bv, bio, iter) {
>   		if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
> @@ -111,13 +114,26 @@ new_segment:
>   		bvprvp = &bvprv;
>   		seg_size = bv.bv_len;
>   		sectors += bv.bv_len >> 9;
> +
> +		if (nsegs == 1 && seg_size > front_seg_size)
> +			front_seg_size = seg_size;
>   	}
>
> -	*segs = nsegs;
> -	return NULL;
> +	do_split = false;
>   split:
>   	*segs = nsegs;
> -	return bio_split(bio, sectors, GFP_NOIO, bs);
> +
> +	if (do_split) {
> +		new = bio_split(bio, sectors, GFP_NOIO, bs);
> +		if (new)
> +			bio = new;
> +	}
> +
> +	bio->bi_seg_front_size = front_seg_size;
> +	if (seg_size > bio->bi_seg_back_size)
> +		bio->bi_seg_back_size = seg_size;
> +
> +	return do_split ? new : NULL;
>   }
>
>   void blk_queue_split(struct request_queue *q, struct bio **bio,
>

I applied both your patches and tested on Overdrive 3000. This fixes the 
issue for me.

Added linux-arm-kernel, since arm64 triggers this issue.

Thanks Ming and testers for your hard work on this.

Tested-by: Alan Ott <alan@softiron.co.uk>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-20 15:28             ` Ewan Milne
  2015-11-23  6:55                 ` Hannes Reinecke
@ 2015-11-25  9:04               ` Hannes Reinecke
  2015-11-25 17:56                   ` Jens Axboe
  2015-11-25 18:01                 ` Mike Snitzer
  1 sibling, 2 replies; 58+ messages in thread
From: Hannes Reinecke @ 2015-11-25  9:04 UTC (permalink / raw)
  To: emilne
  Cc: Christoph Hellwig, Michael Ellerman, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block, Mike Snitzer, Jun'ichi Nomura,
	Jens Axboe

On 11/20/2015 04:28 PM, Ewan Milne wrote:
> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>> Can't we have a joint effort here?
>> I've been spending a _LOT_ of time trying to debug things here, but
>> none of the ideas I've come up with have been able to fix anything.
> 
> Yes.  I'm not the one primarily looking at it, and we don't have a
> reproducer in-house.  We just have the one dump right now.
> 
>>
>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>> by one and be done with ...
>>
> 
> That might not fix it if it is a problem with the merge code, though.
> 
And indeed, it doesn't.
Seems I finally found the culprit.

What happens is this:
We have two paths, with these seg_boundary_masks:

path-1:    seg_boundary_mask = 65535,
path-2:    seg_boundary_mask = 4294967295,

consequently the DM request queue has this:

md-1:    seg_boundary_mask = 65535,

What happens now is that a request is being formatted, and sent
to path 2. During submission req->nr_phys_segments is formatted
with the limits of path 2, arriving at a count of 3.
Now the request gets retried on path 1, but as the NOMERGE request
flag is set req->nr_phys_segments is never updated.
But blk_rq_map_sg() ignores all counters, and just uses the
bi_vec directly, resulting in a count of 4 -> boom.

So the culprit here is the NOMERGE flag, which is evaluated
via
->dm_dispatch_request()
  ->blk_insert_cloned_request()
    ->blk_rq_check_limits()

If the above assessment is correct, the following patch should
fix it:

diff --git a/block/blk-core.c b/block/blk-core.c
index 801ced7..12cccd6 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
  */
 int blk_rq_check_limits(struct request_queue *q, struct request *rq)
 {
-       if (!rq_mergeable(rq))
+       if (rq->cmd_type != REQ_TYPE_FS)
                return 0;

        if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
rq->cmd_flags)) {


Mike? Jens?
Can you comment on it?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-25  9:04               ` Hannes Reinecke
@ 2015-11-25 17:56                   ` Jens Axboe
  2015-11-25 18:01                 ` Mike Snitzer
  1 sibling, 0 replies; 58+ messages in thread
From: Jens Axboe @ 2015-11-25 17:56 UTC (permalink / raw)
  To: Hannes Reinecke, emilne
  Cc: Christoph Hellwig, Michael Ellerman, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block, Mike Snitzer, Jun'ichi Nomura

On 11/25/2015 02:04 AM, Hannes Reinecke wrote:
> On 11/20/2015 04:28 PM, Ewan Milne wrote:
>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>>> Can't we have a joint effort here?
>>> I've been spending a _LOT_ of time trying to debug things here, but
>>> none of the ideas I've come up with have been able to fix anything.
>>
>> Yes.  I'm not the one primarily looking at it, and we don't have a
>> reproducer in-house.  We just have the one dump right now.
>>
>>>
>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>>> by one and be done with ...
>>>
>>
>> That might not fix it if it is a problem with the merge code, though.
>>
> And indeed, it doesn't.
> Seems I finally found the culprit.
>
> What happens is this:
> We have two paths, with these seg_boundary_masks:
>
> path-1:    seg_boundary_mask = 65535,
> path-2:    seg_boundary_mask = 4294967295,
>
> consequently the DM request queue has this:
>
> md-1:    seg_boundary_mask = 65535,
>
> What happens now is that a request is being formatted, and sent
> to path 2. During submission req->nr_phys_segments is formatted
> with the limits of path 2, arriving at a count of 3.
> Now the request gets retried on path 1, but as the NOMERGE request
> flag is set req->nr_phys_segments is never updated.
> But blk_rq_map_sg() ignores all counters, and just uses the
> bi_vec directly, resulting in a count of 4 -> boom.
>
> So the culprit here is the NOMERGE flag, which is evaluated
> via
> ->dm_dispatch_request()
>    ->blk_insert_cloned_request()
>      ->blk_rq_check_limits()
>
> If the above assessment is correct, the following patch should
> fix it:
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 801ced7..12cccd6 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
>    */
>   int blk_rq_check_limits(struct request_queue *q, struct request *rq)
>   {
> -       if (!rq_mergeable(rq))
> +       if (rq->cmd_type != REQ_TYPE_FS)
>                  return 0;
>
>          if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
> rq->cmd_flags)) {
>
>
> Mike? Jens?
> Can you comment on it?

We only support merging on REQ_TYPE_FS already, so how is the above 
making it any different? In general, NOMERGE being set or not should not 
make a difference. It's only a hint that we need not check further if we 
should be merging on this request, since we already tried it once, found 
we'd exceed various limits, then set NOMERGE to reflect that.


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-25 17:56                   ` Jens Axboe
  0 siblings, 0 replies; 58+ messages in thread
From: Jens Axboe @ 2015-11-25 17:56 UTC (permalink / raw)
  To: Hannes Reinecke, emilne
  Cc: Christoph Hellwig, Michael Ellerman, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block, Mike Snitzer, Jun'ichi Nomura

On 11/25/2015 02:04 AM, Hannes Reinecke wrote:
> On 11/20/2015 04:28 PM, Ewan Milne wrote:
>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>>> Can't we have a joint effort here?
>>> I've been spending a _LOT_ of time trying to debug things here, but
>>> none of the ideas I've come up with have been able to fix anything.
>>
>> Yes.  I'm not the one primarily looking at it, and we don't have a
>> reproducer in-house.  We just have the one dump right now.
>>
>>>
>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>>> by one and be done with ...
>>>
>>
>> That might not fix it if it is a problem with the merge code, though.
>>
> And indeed, it doesn't.
> Seems I finally found the culprit.
>
> What happens is this:
> We have two paths, with these seg_boundary_masks:
>
> path-1:    seg_boundary_mask = 65535,
> path-2:    seg_boundary_mask = 4294967295,
>
> consequently the DM request queue has this:
>
> md-1:    seg_boundary_mask = 65535,
>
> What happens now is that a request is being formatted, and sent
> to path 2. During submission req->nr_phys_segments is formatted
> with the limits of path 2, arriving at a count of 3.
> Now the request gets retried on path 1, but as the NOMERGE request
> flag is set req->nr_phys_segments is never updated.
> But blk_rq_map_sg() ignores all counters, and just uses the
> bi_vec directly, resulting in a count of 4 -> boom.
>
> So the culprit here is the NOMERGE flag, which is evaluated
> via
> ->dm_dispatch_request()
>    ->blk_insert_cloned_request()
>      ->blk_rq_check_limits()
>
> If the above assessment is correct, the following patch should
> fix it:
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 801ced7..12cccd6 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
>    */
>   int blk_rq_check_limits(struct request_queue *q, struct request *rq)
>   {
> -       if (!rq_mergeable(rq))
> +       if (rq->cmd_type != REQ_TYPE_FS)
>                  return 0;
>
>          if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
> rq->cmd_flags)) {
>
>
> Mike? Jens?
> Can you comment on it?

We only support merging on REQ_TYPE_FS already, so how is the above 
making it any different? In general, NOMERGE being set or not should not 
make a difference. It's only a hint that we need not check further if we 
should be merging on this request, since we already tried it once, found 
we'd exceed various limits, then set NOMERGE to reflect that.


-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-25  9:04               ` Hannes Reinecke
  2015-11-25 17:56                   ` Jens Axboe
@ 2015-11-25 18:01                 ` Mike Snitzer
  2015-11-25 19:01                     ` Hannes Reinecke
  1 sibling, 1 reply; 58+ messages in thread
From: Mike Snitzer @ 2015-11-25 18:01 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: emilne, Christoph Hellwig, Michael Ellerman, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block, Jun'ichi Nomura, Jens Axboe

On Wed, Nov 25 2015 at  4:04am -0500,
Hannes Reinecke <hare@suse.de> wrote:

> On 11/20/2015 04:28 PM, Ewan Milne wrote:
> > On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
> >> Can't we have a joint effort here?
> >> I've been spending a _LOT_ of time trying to debug things here, but
> >> none of the ideas I've come up with have been able to fix anything.
> > 
> > Yes.  I'm not the one primarily looking at it, and we don't have a
> > reproducer in-house.  We just have the one dump right now.
> > 
> >>
> >> I'm almost tempted to increase the count from scsi_alloc_sgtable()
> >> by one and be done with ...
> >>
> > 
> > That might not fix it if it is a problem with the merge code, though.
> > 
> And indeed, it doesn't.

How did you arrive at that?  Do you have a reproducer now?

> Seems I finally found the culprit.
> 
> What happens is this:
> We have two paths, with these seg_boundary_masks:
> 
> path-1:    seg_boundary_mask = 65535,
> path-2:    seg_boundary_mask = 4294967295,
> 
> consequently the DM request queue has this:
> 
> md-1:    seg_boundary_mask = 65535,
> 
> What happens now is that a request is being formatted, and sent
> to path 2. During submission req->nr_phys_segments is formatted
> with the limits of path 2, arriving at a count of 3.
> Now the request gets retried on path 1, but as the NOMERGE request
> flag is set req->nr_phys_segments is never updated.
> But blk_rq_map_sg() ignores all counters, and just uses the
> bi_vec directly, resulting in a count of 4 -> boom.
> 
> So the culprit here is the NOMERGE flag,

NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.

> which is evaluated via
> ->dm_dispatch_request()
>   ->blk_insert_cloned_request()
>     ->blk_rq_check_limits()

blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
anyway after reading your mail I'm still left wondering if your proposed
patch is correct.

> If the above assessment is correct, the following patch should
> fix it:
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 801ced7..12cccd6 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
>   */
>  int blk_rq_check_limits(struct request_queue *q, struct request *rq)
>  {
> -       if (!rq_mergeable(rq))
> +       if (rq->cmd_type != REQ_TYPE_FS)
>                 return 0;
> 
>         if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
> rq->cmd_flags)) {
> 
> 
> Mike? Jens?
> Can you comment on it?

You're not explaining the actual change in the patch very well; I think
you're correct but you're leaving the justification as an exercise to 
the reviewer:

blk_rq_check_limits() will call blk_recalc_rq_segments() after the
!rq_mergeable() check but you're saying for this case in question we
never get there -- due to the cloned request having NOMERGE set.

So in blk_rq_check_limits() you've unrolled rq_mergeable() and
open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)

I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
sense for cloned requests that always have NOMERGE set.

So you're saying that by having blk_rq_check_limits() go on to call
blk_recalc_rq_segments() this bug will be fixed?

BTW, I think blk_rq_check_limits()'s export should be removed and the
function made static and renamed to blk_clone_rq_check_limits(), again:
blk_insert_cloned_request() is the only caller of blk_rq_check_limits()

Seems prudent to make that change now to be clear that this code is only
used by cloned requests.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-25 18:01                 ` Mike Snitzer
@ 2015-11-25 19:01                     ` Hannes Reinecke
  0 siblings, 0 replies; 58+ messages in thread
From: Hannes Reinecke @ 2015-11-25 19:01 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: emilne, Christoph Hellwig, Michael Ellerman, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block, Jun'ichi Nomura, Jens Axboe

On 11/25/2015 07:01 PM, Mike Snitzer wrote:
> On Wed, Nov 25 2015 at  4:04am -0500,
> Hannes Reinecke <hare@suse.de> wrote:
>
>> On 11/20/2015 04:28 PM, Ewan Milne wrote:
>>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>>>> Can't we have a joint effort here?
>>>> I've been spending a _LOT_ of time trying to debug things here, but
>>>> none of the ideas I've come up with have been able to fix anything.
>>>
>>> Yes.  I'm not the one primarily looking at it, and we don't have a
>>> reproducer in-house.  We just have the one dump right now.
>>>
>>>>
>>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>>>> by one and be done with ...
>>>>
>>>
>>> That might not fix it if it is a problem with the merge code, though.
>>>
>> And indeed, it doesn't.
>
> How did you arrive at that?  Do you have a reproducer now?
>
Not a reproducer, but several dumps for analysis.

>> Seems I finally found the culprit.
>>
>> What happens is this:
>> We have two paths, with these seg_boundary_masks:
>>
>> path-1:    seg_boundary_mask = 65535,
>> path-2:    seg_boundary_mask = 4294967295,
>>
>> consequently the DM request queue has this:
>>
>> md-1:    seg_boundary_mask = 65535,
>>
>> What happens now is that a request is being formatted, and sent
>> to path 2. During submission req->nr_phys_segments is formatted
>> with the limits of path 2, arriving at a count of 3.
>> Now the request gets retried on path 1, but as the NOMERGE request
>> flag is set req->nr_phys_segments is never updated.
>> But blk_rq_map_sg() ignores all counters, and just uses the
>> bi_vec directly, resulting in a count of 4 -> boom.
>>
>> So the culprit here is the NOMERGE flag,
>
> NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.
>
Yes.

>> which is evaluated via
>> ->dm_dispatch_request()
>>    ->blk_insert_cloned_request()
>>      ->blk_rq_check_limits()
>
> blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
> anyway after reading your mail I'm still left wondering if your proposed
> patch is correct.
>
>> If the above assessment is correct, the following patch should
>> fix it:
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 801ced7..12cccd6 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
>>    */
>>   int blk_rq_check_limits(struct request_queue *q, struct request *rq)
>>   {
>> -       if (!rq_mergeable(rq))
>> +       if (rq->cmd_type != REQ_TYPE_FS)
>>                  return 0;
>>
>>          if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
>> rq->cmd_flags)) {
>>
>>
>> Mike? Jens?
>> Can you comment on it?
>
> You're not explaining the actual change in the patch very well; I think
> you're correct but you're leaving the justification as an exercise to
> the reviewer:
>
> blk_rq_check_limits() will call blk_recalc_rq_segments() after the
> !rq_mergeable() check but you're saying for this case in question we
> never get there -- due to the cloned request having NOMERGE set.
>
> So in blk_rq_check_limits() you've unrolled rq_mergeable() and
> open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)
>
> I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
> the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
> sense for cloned requests that always have NOMERGE set.
>
> So you're saying that by having blk_rq_check_limits() go on to call
> blk_recalc_rq_segments() this bug will be fixed?
>
That is the idea.

I've already established that in all instances I have seen so far
req->nr_phys_segments is _less_ than req->bio->bi_phys_segments.

As it turns out, req->nr_phys_segemnts _would_ have been updated in
blk_rq_check_limits(), but isn't due to the NOMERGE flag being set
for the cloned request.
So each cloned request inherits the values from the original request,
despite the fact that req->nr_phys_segments _has_ to be evaluated in
the final request_queue context, as the queue limits _might_ be 
different from the original (merged) queue limits of the multipath
request queue.

> BTW, I think blk_rq_check_limits()'s export should be removed and the
> function made static and renamed to blk_clone_rq_check_limits(), again:
> blk_insert_cloned_request() is the only caller of blk_rq_check_limits()
>
Actually, seeing Jens' last comment the check for REQ_TYPE_FS is 
pointless, too, so we might as well remove the entire if-clause.

> Seems prudent to make that change now to be clear that this code is only
> used by cloned requests.
>
Yeah, that would make sense. I'll be preparing a patch.
With a more detailed description :-)

Cheers,

Hannes

-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-25 19:01                     ` Hannes Reinecke
  0 siblings, 0 replies; 58+ messages in thread
From: Hannes Reinecke @ 2015-11-25 19:01 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: emilne, Christoph Hellwig, Michael Ellerman, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block, Jun'ichi Nomura, Jens Axboe

On 11/25/2015 07:01 PM, Mike Snitzer wrote:
> On Wed, Nov 25 2015 at  4:04am -0500,
> Hannes Reinecke <hare@suse.de> wrote:
>
>> On 11/20/2015 04:28 PM, Ewan Milne wrote:
>>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>>>> Can't we have a joint effort here?
>>>> I've been spending a _LOT_ of time trying to debug things here, but
>>>> none of the ideas I've come up with have been able to fix anything.
>>>
>>> Yes.  I'm not the one primarily looking at it, and we don't have a
>>> reproducer in-house.  We just have the one dump right now.
>>>
>>>>
>>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>>>> by one and be done with ...
>>>>
>>>
>>> That might not fix it if it is a problem with the merge code, though.
>>>
>> And indeed, it doesn't.
>
> How did you arrive at that?  Do you have a reproducer now?
>
Not a reproducer, but several dumps for analysis.

>> Seems I finally found the culprit.
>>
>> What happens is this:
>> We have two paths, with these seg_boundary_masks:
>>
>> path-1:    seg_boundary_mask = 65535,
>> path-2:    seg_boundary_mask = 4294967295,
>>
>> consequently the DM request queue has this:
>>
>> md-1:    seg_boundary_mask = 65535,
>>
>> What happens now is that a request is being formatted, and sent
>> to path 2. During submission req->nr_phys_segments is formatted
>> with the limits of path 2, arriving at a count of 3.
>> Now the request gets retried on path 1, but as the NOMERGE request
>> flag is set req->nr_phys_segments is never updated.
>> But blk_rq_map_sg() ignores all counters, and just uses the
>> bi_vec directly, resulting in a count of 4 -> boom.
>>
>> So the culprit here is the NOMERGE flag,
>
> NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.
>
Yes.

>> which is evaluated via
>> ->dm_dispatch_request()
>>    ->blk_insert_cloned_request()
>>      ->blk_rq_check_limits()
>
> blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
> anyway after reading your mail I'm still left wondering if your proposed
> patch is correct.
>
>> If the above assessment is correct, the following patch should
>> fix it:
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 801ced7..12cccd6 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
>>    */
>>   int blk_rq_check_limits(struct request_queue *q, struct request *rq)
>>   {
>> -       if (!rq_mergeable(rq))
>> +       if (rq->cmd_type != REQ_TYPE_FS)
>>                  return 0;
>>
>>          if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
>> rq->cmd_flags)) {
>>
>>
>> Mike? Jens?
>> Can you comment on it?
>
> You're not explaining the actual change in the patch very well; I think
> you're correct but you're leaving the justification as an exercise to
> the reviewer:
>
> blk_rq_check_limits() will call blk_recalc_rq_segments() after the
> !rq_mergeable() check but you're saying for this case in question we
> never get there -- due to the cloned request having NOMERGE set.
>
> So in blk_rq_check_limits() you've unrolled rq_mergeable() and
> open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)
>
> I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
> the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
> sense for cloned requests that always have NOMERGE set.
>
> So you're saying that by having blk_rq_check_limits() go on to call
> blk_recalc_rq_segments() this bug will be fixed?
>
That is the idea.

I've already established that in all instances I have seen so far
req->nr_phys_segments is _less_ than req->bio->bi_phys_segments.

As it turns out, req->nr_phys_segemnts _would_ have been updated in
blk_rq_check_limits(), but isn't due to the NOMERGE flag being set
for the cloned request.
So each cloned request inherits the values from the original request,
despite the fact that req->nr_phys_segments _has_ to be evaluated in
the final request_queue context, as the queue limits _might_ be 
different from the original (merged) queue limits of the multipath
request queue.

> BTW, I think blk_rq_check_limits()'s export should be removed and the
> function made static and renamed to blk_clone_rq_check_limits(), again:
> blk_insert_cloned_request() is the only caller of blk_rq_check_limits()
>
Actually, seeing Jens' last comment the check for REQ_TYPE_FS is 
pointless, too, so we might as well remove the entire if-clause.

> Seems prudent to make that change now to be clear that this code is only
> used by cloned requests.
>
Yeah, that would make sense. I'll be preparing a patch.
With a more detailed description :-)

Cheers,

Hannes

-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-25 17:56                   ` Jens Axboe
  (?)
@ 2015-11-25 19:10                   ` Hannes Reinecke
  2015-11-25 19:24                       ` Jens Axboe
  -1 siblings, 1 reply; 58+ messages in thread
From: Hannes Reinecke @ 2015-11-25 19:10 UTC (permalink / raw)
  To: Jens Axboe, emilne
  Cc: Christoph Hellwig, Michael Ellerman, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block, Mike Snitzer, Jun'ichi Nomura

On 11/25/2015 06:56 PM, Jens Axboe wrote:
> On 11/25/2015 02:04 AM, Hannes Reinecke wrote:
>> On 11/20/2015 04:28 PM, Ewan Milne wrote:
>>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>>>> Can't we have a joint effort here?
>>>> I've been spending a _LOT_ of time trying to debug things here, but
>>>> none of the ideas I've come up with have been able to fix anything.
>>>
>>> Yes.  I'm not the one primarily looking at it, and we don't have a
>>> reproducer in-house.  We just have the one dump right now.
>>>
>>>>
>>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>>>> by one and be done with ...
>>>>
>>>
>>> That might not fix it if it is a problem with the merge code, though.
>>>
>> And indeed, it doesn't.
>> Seems I finally found the culprit.
>>
>> What happens is this:
>> We have two paths, with these seg_boundary_masks:
>>
>> path-1:    seg_boundary_mask = 65535,
>> path-2:    seg_boundary_mask = 4294967295,
>>
>> consequently the DM request queue has this:
>>
>> md-1:    seg_boundary_mask = 65535,
>>
>> What happens now is that a request is being formatted, and sent
>> to path 2. During submission req->nr_phys_segments is formatted
>> with the limits of path 2, arriving at a count of 3.
>> Now the request gets retried on path 1, but as the NOMERGE request
>> flag is set req->nr_phys_segments is never updated.
>> But blk_rq_map_sg() ignores all counters, and just uses the
>> bi_vec directly, resulting in a count of 4 -> boom.
>>
>> So the culprit here is the NOMERGE flag, which is evaluated
>> via
>> ->dm_dispatch_request()
>>    ->blk_insert_cloned_request()
>>      ->blk_rq_check_limits()
>>
>> If the above assessment is correct, the following patch should
>> fix it:
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 801ced7..12cccd6 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
>>    */
>>   int blk_rq_check_limits(struct request_queue *q, struct request *rq)
>>   {
>> -       if (!rq_mergeable(rq))
>> +       if (rq->cmd_type != REQ_TYPE_FS)
>>                  return 0;
>>
>>          if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
>> rq->cmd_flags)) {
>>
>>
>> Mike? Jens?
>> Can you comment on it?
>
> We only support merging on REQ_TYPE_FS already, so how is the above
> making it any different? In general, NOMERGE being set or not should not
> make a difference. It's only a hint that we need not check further if we
> should be merging on this request, since we already tried it once, found
> we'd exceed various limits, then set NOMERGE to reflect that.
>
The problem is that NOMERGE does too much, as it inhibits _any_ merging.

Unfortunately, the req->nr_phys_segments value is evaluated in the final
_driver_ context _after_ the merging happend; cf 
scsi_lib.c:scsi_init_sgtable().
As nr_phys_segments is inherited from the original request (and never
recalculated with the new request queue limits) the following 
blk_rq_map_sg() call might end up at a different calculation, especially
after retrying a request on another path.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-25 19:10                   ` Hannes Reinecke
@ 2015-11-25 19:24                       ` Jens Axboe
  0 siblings, 0 replies; 58+ messages in thread
From: Jens Axboe @ 2015-11-25 19:24 UTC (permalink / raw)
  To: Hannes Reinecke, emilne
  Cc: Christoph Hellwig, Michael Ellerman, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block, Mike Snitzer, Jun'ichi Nomura

On 11/25/2015 12:10 PM, Hannes Reinecke wrote:
> On 11/25/2015 06:56 PM, Jens Axboe wrote:
>> On 11/25/2015 02:04 AM, Hannes Reinecke wrote:
>>> On 11/20/2015 04:28 PM, Ewan Milne wrote:
>>>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>>>>> Can't we have a joint effort here?
>>>>> I've been spending a _LOT_ of time trying to debug things here, but
>>>>> none of the ideas I've come up with have been able to fix anything.
>>>>
>>>> Yes.  I'm not the one primarily looking at it, and we don't have a
>>>> reproducer in-house.  We just have the one dump right now.
>>>>
>>>>>
>>>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>>>>> by one and be done with ...
>>>>>
>>>>
>>>> That might not fix it if it is a problem with the merge code, though.
>>>>
>>> And indeed, it doesn't.
>>> Seems I finally found the culprit.
>>>
>>> What happens is this:
>>> We have two paths, with these seg_boundary_masks:
>>>
>>> path-1:    seg_boundary_mask = 65535,
>>> path-2:    seg_boundary_mask = 4294967295,
>>>
>>> consequently the DM request queue has this:
>>>
>>> md-1:    seg_boundary_mask = 65535,
>>>
>>> What happens now is that a request is being formatted, and sent
>>> to path 2. During submission req->nr_phys_segments is formatted
>>> with the limits of path 2, arriving at a count of 3.
>>> Now the request gets retried on path 1, but as the NOMERGE request
>>> flag is set req->nr_phys_segments is never updated.
>>> But blk_rq_map_sg() ignores all counters, and just uses the
>>> bi_vec directly, resulting in a count of 4 -> boom.
>>>
>>> So the culprit here is the NOMERGE flag, which is evaluated
>>> via
>>> ->dm_dispatch_request()
>>>    ->blk_insert_cloned_request()
>>>      ->blk_rq_check_limits()
>>>
>>> If the above assessment is correct, the following patch should
>>> fix it:
>>>
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index 801ced7..12cccd6 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
>>>    */
>>>   int blk_rq_check_limits(struct request_queue *q, struct request *rq)
>>>   {
>>> -       if (!rq_mergeable(rq))
>>> +       if (rq->cmd_type != REQ_TYPE_FS)
>>>                  return 0;
>>>
>>>          if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
>>> rq->cmd_flags)) {
>>>
>>>
>>> Mike? Jens?
>>> Can you comment on it?
>>
>> We only support merging on REQ_TYPE_FS already, so how is the above
>> making it any different? In general, NOMERGE being set or not should not
>> make a difference. It's only a hint that we need not check further if we
>> should be merging on this request, since we already tried it once, found
>> we'd exceed various limits, then set NOMERGE to reflect that.
>>
> The problem is that NOMERGE does too much, as it inhibits _any_ merging.

Right, that is the point of the flag from the block layer view, where it 
was originally added for the case mentioned.

> Unfortunately, the req->nr_phys_segments value is evaluated in the final
> _driver_ context _after_ the merging happend; cf
> scsi_lib.c:scsi_init_sgtable().
> As nr_phys_segments is inherited from the original request (and never
> recalculated with the new request queue limits) the following
> blk_rq_map_sg() call might end up at a different calculation, especially
> after retrying a request on another path.

That all sounds pretty horrible. Why is blk_rq_check_limits() checking 
for mergeable at all? If merging is disabled on the request, I'm 
assuming that's an attempt at an optimization since we know it won't 
change. But that should be tracked separately, like how it's done on the 
bio.


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-11-25 19:24                       ` Jens Axboe
  0 siblings, 0 replies; 58+ messages in thread
From: Jens Axboe @ 2015-11-25 19:24 UTC (permalink / raw)
  To: Hannes Reinecke, emilne
  Cc: Christoph Hellwig, Michael Ellerman, Mark Salter,
	James E. J. Bottomley, brking, linux-scsi, linux-kernel,
	linuxppc-dev, linux-block, Mike Snitzer, Jun'ichi Nomura

On 11/25/2015 12:10 PM, Hannes Reinecke wrote:
> On 11/25/2015 06:56 PM, Jens Axboe wrote:
>> On 11/25/2015 02:04 AM, Hannes Reinecke wrote:
>>> On 11/20/2015 04:28 PM, Ewan Milne wrote:
>>>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>>>>> Can't we have a joint effort here?
>>>>> I've been spending a _LOT_ of time trying to debug things here, but
>>>>> none of the ideas I've come up with have been able to fix anything.
>>>>
>>>> Yes.  I'm not the one primarily looking at it, and we don't have a
>>>> reproducer in-house.  We just have the one dump right now.
>>>>
>>>>>
>>>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>>>>> by one and be done with ...
>>>>>
>>>>
>>>> That might not fix it if it is a problem with the merge code, though.
>>>>
>>> And indeed, it doesn't.
>>> Seems I finally found the culprit.
>>>
>>> What happens is this:
>>> We have two paths, with these seg_boundary_masks:
>>>
>>> path-1:    seg_boundary_mask = 65535,
>>> path-2:    seg_boundary_mask = 4294967295,
>>>
>>> consequently the DM request queue has this:
>>>
>>> md-1:    seg_boundary_mask = 65535,
>>>
>>> What happens now is that a request is being formatted, and sent
>>> to path 2. During submission req->nr_phys_segments is formatted
>>> with the limits of path 2, arriving at a count of 3.
>>> Now the request gets retried on path 1, but as the NOMERGE request
>>> flag is set req->nr_phys_segments is never updated.
>>> But blk_rq_map_sg() ignores all counters, and just uses the
>>> bi_vec directly, resulting in a count of 4 -> boom.
>>>
>>> So the culprit here is the NOMERGE flag, which is evaluated
>>> via
>>> ->dm_dispatch_request()
>>>    ->blk_insert_cloned_request()
>>>      ->blk_rq_check_limits()
>>>
>>> If the above assessment is correct, the following patch should
>>> fix it:
>>>
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index 801ced7..12cccd6 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
>>>    */
>>>   int blk_rq_check_limits(struct request_queue *q, struct request *rq)
>>>   {
>>> -       if (!rq_mergeable(rq))
>>> +       if (rq->cmd_type != REQ_TYPE_FS)
>>>                  return 0;
>>>
>>>          if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
>>> rq->cmd_flags)) {
>>>
>>>
>>> Mike? Jens?
>>> Can you comment on it?
>>
>> We only support merging on REQ_TYPE_FS already, so how is the above
>> making it any different? In general, NOMERGE being set or not should not
>> make a difference. It's only a hint that we need not check further if we
>> should be merging on this request, since we already tried it once, found
>> we'd exceed various limits, then set NOMERGE to reflect that.
>>
> The problem is that NOMERGE does too much, as it inhibits _any_ merging.

Right, that is the point of the flag from the block layer view, where it 
was originally added for the case mentioned.

> Unfortunately, the req->nr_phys_segments value is evaluated in the final
> _driver_ context _after_ the merging happend; cf
> scsi_lib.c:scsi_init_sgtable().
> As nr_phys_segments is inherited from the original request (and never
> recalculated with the new request queue limits) the following
> blk_rq_map_sg() call might end up at a different calculation, especially
> after retrying a request on another path.

That all sounds pretty horrible. Why is blk_rq_check_limits() checking 
for mergeable at all? If merging is disabled on the request, I'm 
assuming that's an attempt at an optimization since we know it won't 
change. But that should be tracked separately, like how it's done on the 
bio.


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-25 19:24                       ` Jens Axboe
  (?)
@ 2015-11-25 20:23                       ` Mike Snitzer
  2015-11-25 21:20                         ` Mike Snitzer
  -1 siblings, 1 reply; 58+ messages in thread
From: Mike Snitzer @ 2015-11-25 20:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Hannes Reinecke, emilne, Christoph Hellwig, Michael Ellerman,
	Mark Salter, James E. J. Bottomley, brking, linux-scsi,
	linux-kernel, linuxppc-dev, linux-block, Jun'ichi Nomura

On Wed, Nov 25 2015 at  2:24pm -0500,
Jens Axboe <axboe@fb.com> wrote:

> On 11/25/2015 12:10 PM, Hannes Reinecke wrote:
> >On 11/25/2015 06:56 PM, Jens Axboe wrote:
> >>On 11/25/2015 02:04 AM, Hannes Reinecke wrote:
> >>>On 11/20/2015 04:28 PM, Ewan Milne wrote:
> >>>>On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
> >>>>>Can't we have a joint effort here?
> >>>>>I've been spending a _LOT_ of time trying to debug things here, but
> >>>>>none of the ideas I've come up with have been able to fix anything.
> >>>>
> >>>>Yes.  I'm not the one primarily looking at it, and we don't have a
> >>>>reproducer in-house.  We just have the one dump right now.
> >>>>
> >>>>>
> >>>>>I'm almost tempted to increase the count from scsi_alloc_sgtable()
> >>>>>by one and be done with ...
> >>>>>
> >>>>
> >>>>That might not fix it if it is a problem with the merge code, though.
> >>>>
> >>>And indeed, it doesn't.
> >>>Seems I finally found the culprit.
> >>>
> >>>What happens is this:
> >>>We have two paths, with these seg_boundary_masks:
> >>>
> >>>path-1:    seg_boundary_mask = 65535,
> >>>path-2:    seg_boundary_mask = 4294967295,
> >>>
> >>>consequently the DM request queue has this:
> >>>
> >>>md-1:    seg_boundary_mask = 65535,
> >>>
> >>>What happens now is that a request is being formatted, and sent
> >>>to path 2. During submission req->nr_phys_segments is formatted
> >>>with the limits of path 2, arriving at a count of 3.
> >>>Now the request gets retried on path 1, but as the NOMERGE request
> >>>flag is set req->nr_phys_segments is never updated.
> >>>But blk_rq_map_sg() ignores all counters, and just uses the
> >>>bi_vec directly, resulting in a count of 4 -> boom.
> >>>
> >>>So the culprit here is the NOMERGE flag, which is evaluated
> >>>via
> >>>->dm_dispatch_request()
> >>>   ->blk_insert_cloned_request()
> >>>     ->blk_rq_check_limits()
> >>>
> >>>If the above assessment is correct, the following patch should
> >>>fix it:
> >>>
> >>>diff --git a/block/blk-core.c b/block/blk-core.c
> >>>index 801ced7..12cccd6 100644
> >>>--- a/block/blk-core.c
> >>>+++ b/block/blk-core.c
> >>>@@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
> >>>   */
> >>>  int blk_rq_check_limits(struct request_queue *q, struct request *rq)
> >>>  {
> >>>-       if (!rq_mergeable(rq))
> >>>+       if (rq->cmd_type != REQ_TYPE_FS)
> >>>                 return 0;
> >>>
> >>>         if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
> >>>rq->cmd_flags)) {
> >>>
> >>>
> >>>Mike? Jens?
> >>>Can you comment on it?
> >>
> >>We only support merging on REQ_TYPE_FS already, so how is the above
> >>making it any different? In general, NOMERGE being set or not should not
> >>make a difference. It's only a hint that we need not check further if we
> >>should be merging on this request, since we already tried it once, found
> >>we'd exceed various limits, then set NOMERGE to reflect that.
> >>
> >The problem is that NOMERGE does too much, as it inhibits _any_ merging.
> 
> Right, that is the point of the flag from the block layer view,
> where it was originally added for the case mentioned.

And we really don't want _any_ merging.  The merging, if any, will have
already happened in upper DM-multipath's elevator.  So there should be
no need to have the underlying SCSI paths do any merging.

> >Unfortunately, the req->nr_phys_segments value is evaluated in the final
> >_driver_ context _after_ the merging happend; cf
> >scsi_lib.c:scsi_init_sgtable().
> >As nr_phys_segments is inherited from the original request (and never
> >recalculated with the new request queue limits) the following
> >blk_rq_map_sg() call might end up at a different calculation, especially
> >after retrying a request on another path.
> 
> That all sounds pretty horrible. Why is blk_rq_check_limits()
> checking for mergeable at all? If merging is disabled on the
> request, I'm assuming that's an attempt at an optimization since we
> know it won't change. But that should be tracked separately, like
> how it's done on the bio.

Not clear to me why it was checking for merging...

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-25 20:23                       ` Mike Snitzer
@ 2015-11-25 21:20                         ` Mike Snitzer
  0 siblings, 0 replies; 58+ messages in thread
From: Mike Snitzer @ 2015-11-25 21:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Hannes Reinecke, emilne, Christoph Hellwig, Michael Ellerman,
	Mark Salter, James E. J. Bottomley, brking, linux-scsi,
	linux-kernel, linuxppc-dev, linux-block, Jun'ichi Nomura

On Wed, Nov 25 2015 at  3:23pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Wed, Nov 25 2015 at  2:24pm -0500,
> Jens Axboe <axboe@fb.com> wrote:
> 
> > On 11/25/2015 12:10 PM, Hannes Reinecke wrote:
> > >The problem is that NOMERGE does too much, as it inhibits _any_ merging.
> > 
> > Right, that is the point of the flag from the block layer view,
> > where it was originally added for the case mentioned.
> 
> And we really don't want _any_ merging.  The merging, if any, will have
> already happened in upper DM-multipath's elevator.  So there should be
> no need to have the underlying SCSI paths do any merging.
> 
> > >Unfortunately, the req->nr_phys_segments value is evaluated in the final
> > >_driver_ context _after_ the merging happend; cf
> > >scsi_lib.c:scsi_init_sgtable().
> > >As nr_phys_segments is inherited from the original request (and never
> > >recalculated with the new request queue limits) the following
> > >blk_rq_map_sg() call might end up at a different calculation, especially
> > >after retrying a request on another path.
> > 
> > That all sounds pretty horrible. Why is blk_rq_check_limits()
> > checking for mergeable at all? If merging is disabled on the
> > request, I'm assuming that's an attempt at an optimization since we
> > know it won't change. But that should be tracked separately, like
> > how it's done on the bio.
> 
> Not clear to me why it was checking for merging...

Ewan pointed out that blk_rq_check_limits()'s call to rq_mergable() was
introduced as part of Martin's DISCARD cleanup that prepared for
WRITE_SAME, see: commit e2a60da74 ("block: Clean up special command handling logic")

And prior to that, blk_rq_check_limits()'s (rq->cmd_flags & REQ_DISCARD)
check was introduced by some guy crazy doppelganger name "Ike Snitzer",
see commit: 3383977f ("block: update request stacking methods to support discards")

So basically we probably never needed the extra check in
blk_rq_check_limits() to begin with.. Ike was probably paranoid. ;)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-11-25 19:01                     ` Hannes Reinecke
@ 2015-12-04 16:59                       ` Takashi Iwai
  -1 siblings, 0 replies; 58+ messages in thread
From: Takashi Iwai @ 2015-12-04 16:59 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Mike Snitzer, emilne, Christoph Hellwig, Michael Ellerman,
	Mark Salter, James E. J. Bottomley, brking, linux-scsi,
	linux-kernel, linuxppc-dev, linux-block, Jun'ichi Nomura,
	Jens Axboe

On Wed, 25 Nov 2015 20:01:47 +0100,
Hannes Reinecke wrote:
> 
> On 11/25/2015 07:01 PM, Mike Snitzer wrote:
> > On Wed, Nov 25 2015 at  4:04am -0500,
> > Hannes Reinecke <hare@suse.de> wrote:
> >
> >> On 11/20/2015 04:28 PM, Ewan Milne wrote:
> >>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
> >>>> Can't we have a joint effort here?
> >>>> I've been spending a _LOT_ of time trying to debug things here, but
> >>>> none of the ideas I've come up with have been able to fix anything.
> >>>
> >>> Yes.  I'm not the one primarily looking at it, and we don't have a
> >>> reproducer in-house.  We just have the one dump right now.
> >>>
> >>>>
> >>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
> >>>> by one and be done with ...
> >>>>
> >>>
> >>> That might not fix it if it is a problem with the merge code, though.
> >>>
> >> And indeed, it doesn't.
> >
> > How did you arrive at that?  Do you have a reproducer now?
> >
> Not a reproducer, but several dumps for analysis.
> 
> >> Seems I finally found the culprit.
> >>
> >> What happens is this:
> >> We have two paths, with these seg_boundary_masks:
> >>
> >> path-1:    seg_boundary_mask = 65535,
> >> path-2:    seg_boundary_mask = 4294967295,
> >>
> >> consequently the DM request queue has this:
> >>
> >> md-1:    seg_boundary_mask = 65535,
> >>
> >> What happens now is that a request is being formatted, and sent
> >> to path 2. During submission req->nr_phys_segments is formatted
> >> with the limits of path 2, arriving at a count of 3.
> >> Now the request gets retried on path 1, but as the NOMERGE request
> >> flag is set req->nr_phys_segments is never updated.
> >> But blk_rq_map_sg() ignores all counters, and just uses the
> >> bi_vec directly, resulting in a count of 4 -> boom.
> >>
> >> So the culprit here is the NOMERGE flag,
> >
> > NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.
> >
> Yes.
> 
> >> which is evaluated via
> >> ->dm_dispatch_request()
> >>    ->blk_insert_cloned_request()
> >>      ->blk_rq_check_limits()
> >
> > blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
> > anyway after reading your mail I'm still left wondering if your proposed
> > patch is correct.
> >
> >> If the above assessment is correct, the following patch should
> >> fix it:
> >>
> >> diff --git a/block/blk-core.c b/block/blk-core.c
> >> index 801ced7..12cccd6 100644
> >> --- a/block/blk-core.c
> >> +++ b/block/blk-core.c
> >> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
> >>    */
> >>   int blk_rq_check_limits(struct request_queue *q, struct request *rq)
> >>   {
> >> -       if (!rq_mergeable(rq))
> >> +       if (rq->cmd_type != REQ_TYPE_FS)
> >>                  return 0;
> >>
> >>          if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
> >> rq->cmd_flags)) {
> >>
> >>
> >> Mike? Jens?
> >> Can you comment on it?
> >
> > You're not explaining the actual change in the patch very well; I think
> > you're correct but you're leaving the justification as an exercise to
> > the reviewer:
> >
> > blk_rq_check_limits() will call blk_recalc_rq_segments() after the
> > !rq_mergeable() check but you're saying for this case in question we
> > never get there -- due to the cloned request having NOMERGE set.
> >
> > So in blk_rq_check_limits() you've unrolled rq_mergeable() and
> > open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)
> >
> > I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
> > the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
> > sense for cloned requests that always have NOMERGE set.
> >
> > So you're saying that by having blk_rq_check_limits() go on to call
> > blk_recalc_rq_segments() this bug will be fixed?
> >
> That is the idea.
> 
> I've already established that in all instances I have seen so far
> req->nr_phys_segments is _less_ than req->bio->bi_phys_segments.
> 
> As it turns out, req->nr_phys_segemnts _would_ have been updated in
> blk_rq_check_limits(), but isn't due to the NOMERGE flag being set
> for the cloned request.
> So each cloned request inherits the values from the original request,
> despite the fact that req->nr_phys_segments _has_ to be evaluated in
> the final request_queue context, as the queue limits _might_ be 
> different from the original (merged) queue limits of the multipath
> request queue.
> 
> > BTW, I think blk_rq_check_limits()'s export should be removed and the
> > function made static and renamed to blk_clone_rq_check_limits(), again:
> > blk_insert_cloned_request() is the only caller of blk_rq_check_limits()
> >
> Actually, seeing Jens' last comment the check for REQ_TYPE_FS is 
> pointless, too, so we might as well remove the entire if-clause.
> 
> > Seems prudent to make that change now to be clear that this code is only
> > used by cloned requests.
> >
> Yeah, that would make sense. I'll be preparing a patch.
> With a more detailed description :-)

Do we have already a fix?  Right now I got (likely) this kernel BUG()
on the almost latest Linus tree (commit 25364a9e54fb8296).  It
happened while I started a KVM right after a fresh boot.  The machine
paniced even before that, so I hit this twice today.

I have a crash dump, so if anything is needed, let me know.
(But the system had to be restarted since it's my main workstation.)


thanks,

Takashi

===
[  137.211255] ------------[ cut here ]------------
[  137.211265] WARNING: CPU: 0 PID: 391 at block/blk-merge.c:435 blk_rq_map_sg+0x430/0x4d0()
[  137.211267] Modules linked in: nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs lockd grace sunrpc fscache parport_pc ppdev parport fuse vmw_vsock_vmci_transport vsock vmw_vmci iscsi_ibft iscsi_boot_sysfs snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm i915 irqbypass crct10dif_pclmul crc32_pclmul snd_emu10k1 snd_hda_codec_ca0132 snd_util_mem aesni_intel snd_ac97_codec snd_hda_intel aes_x86_64 dcdbas snd_hda_codec glue_helper dm_mod ac97_bus lrw gf128mul ablk_helper cryptd snd_rawmidi snd_hda_core i2c_algo_bit snd_seq_device drm_kms_helper snd_hwdep syscopyarea snd_pcm sysfillrect sysimgblt snd_timer fb_sys_fops iTCO_wdt i2c_i801 drm snd iTCO_vendor_support emu10k1_gp battery lpc_ich gameport mfd_core soundcore
[  137.211326]  mei_me video mei thermal button usbhid crc32c_intel sr_mod ehci_pci cdrom serio_raw xhci_pci ehci_hcd e1000e xhci_hcd ptp pps_core usbcore usb_common sg
[  137.211342] CPU: 0 PID: 391 Comm: kworker/0:1H Not tainted 4.4.0-rc3-test2+ #183
[  137.211344] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
[  137.211349] Workqueue: kblockd bfq_kick_queue
[  137.211352]  ffffffff81a6b805 ffff880214307ab8 ffffffff813b1cc9 0000000000000000
[  137.211357]  ffff880214307af0 ffffffff8106d9a2 ffff88019e7d49c8 0000000000000000
[  137.211362]  0000000000000000 0000000000001000 0000000000000020 ffff880214307b00
[  137.211367] Call Trace:
[  137.211373]  [<ffffffff813b1cc9>] dump_stack+0x4b/0x72
[  137.211378]  [<ffffffff8106d9a2>] warn_slowpath_common+0x82/0xc0
[  137.211383]  [<ffffffff8106da9a>] warn_slowpath_null+0x1a/0x20
[  137.211387]  [<ffffffff81381340>] blk_rq_map_sg+0x430/0x4d0
[  137.211393]  [<ffffffff81514a60>] ? scsi_init_cmd_errh+0x90/0x90
[  137.211398]  [<ffffffff81514fc4>] scsi_init_sgtable+0x44/0x80
[  137.211402]  [<ffffffff81515045>] scsi_init_io+0x45/0x1d0
[  137.211408]  [<ffffffff815214a8>] sd_setup_read_write_cmnd+0x38/0xa20
[  137.211413]  [<ffffffff810c141d>] ? trace_hardirqs_off+0xd/0x10
[  137.211416]  [<ffffffff81524fc7>] sd_init_command+0x27/0xa0
[  137.211420]  [<ffffffff815152c0>] scsi_setup_cmnd+0xf0/0x150
[  137.211424]  [<ffffffff81515471>] scsi_prep_fn+0xd1/0x170
[  137.211428]  [<ffffffff8137b87c>] blk_peek_request+0x18c/0x320
[  137.211433]  [<ffffffff81516cb2>] scsi_request_fn+0x42/0x610
[  137.211438]  [<ffffffff81374943>] __blk_run_queue+0x33/0x40
[  137.211442]  [<ffffffff813a8015>] bfq_kick_queue+0x25/0x40
[  137.211447]  [<ffffffff8108b18d>] process_one_work+0x1ed/0x660
[  137.211451]  [<ffffffff8108b0fc>] ? process_one_work+0x15c/0x660
[  137.211456]  [<ffffffff8108b64e>] worker_thread+0x4e/0x450
[  137.211460]  [<ffffffff8108b600>] ? process_one_work+0x660/0x660
[  137.211464]  [<ffffffff8108b600>] ? process_one_work+0x660/0x660
[  137.211468]  [<ffffffff8109206e>] kthread+0xfe/0x120
[  137.211473]  [<ffffffff81091f70>] ? kthread_create_on_node+0x230/0x230
[  137.211478]  [<ffffffff816d19ef>] ret_from_fork+0x3f/0x70
[  137.211482]  [<ffffffff81091f70>] ? kthread_create_on_node+0x230/0x230
[  137.211485] ---[ end trace b39a62f95b1cad19 ]---
[  137.211499] ------------[ cut here ]------------
[  137.211506] kernel BUG at drivers/scsi/scsi_lib.c:1096!
[  137.211511] invalid opcode: 0000 [#1] PREEMPT SMP 
[  137.211519] Modules linked in: nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs lockd grace sunrpc fscache parport_pc ppdev parport fuse vmw_vsock_vmci_transport vsock vmw_vmci iscsi_ibft iscsi_boot_sysfs snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm i915 irqbypass crct10dif_pclmul crc32_pclmul snd_emu10k1 snd_hda_codec_ca0132 snd_util_mem aesni_intel snd_ac97_codec snd_hda_intel aes_x86_64 dcdbas snd_hda_codec glue_helper dm_mod ac97_bus lrw gf128mul ablk_helper cryptd snd_rawmidi snd_hda_core i2c_algo_bit snd_seq_device drm_kms_helper snd_hwdep syscopyarea snd_pcm sysfillrect sysimgblt snd_timer fb_sys_fops iTCO_wdt i2c_i801 drm snd iTCO_vendor_support emu10k1_gp battery lpc_ich gameport mfd_core soundcore
[  137.211630]  mei_me video mei thermal button usbhid crc32c_intel sr_mod ehci_pci cdrom serio_raw xhci_pci ehci_hcd e1000e xhci_hcd ptp pps_core usbcore usb_common sg
[  137.211660] CPU: 0 PID: 391 Comm: kworker/0:1H Tainted: G        W       4.4.0-rc3-test2+ #183
[  137.211668] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
[  137.211675] Workqueue: kblockd bfq_kick_queue
[  137.211681] task: ffff880214754a40 ti: ffff880214304000 task.ti: ffff880214304000
[  137.211713] RIP: 0010:[<ffffffff81514ff2>]  [<ffffffff81514ff2>] scsi_init_sgtable+0x72/0x80
[  137.211747] RSP: 0018:ffff880214307b90  EFLAGS: 00010097
[  137.211777] RAX: 000000000000000e RBX: ffff8802147e68b0 RCX: 0000000000000006
[  137.211807] RDX: 0000000000000024 RSI: 0000000000000000 RDI: 0000000000000009
[  137.211838] RBP: ffff880214307ba8 R08: 0000000000000001 R09: 0000000000000000
[  137.211868] R10: ffff8802147e68b0 R11: 0000000000000001 R12: ffff88003711c320
[  137.211899] R13: 0000000000000000 R14: ffff8800370be000 R15: ffff8802147e68b0
[  137.211930] FS:  0000000000000000(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000
[  137.211961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  137.211991] CR2: 000000000041bee0 CR3: 00000000d85ac000 CR4: 00000000001426f0
[  137.212021] Stack:
[  137.212048]  ffff88003711c200 000000000003b000 0000000000000000 ffff880214307bf0
[  137.212082]  ffffffff81515045 ffff880214307c28 0000000000000092 ffff8802147e68b0
[  137.212115]  000000000003b000 ffff8800d8a90800 ffff8800370be000 0000000000000020
[  137.212148] Call Trace:
[  137.212178]  [<ffffffff81515045>] scsi_init_io+0x45/0x1d0
[  137.212205]  [<ffffffff815214a8>] sd_setup_read_write_cmnd+0x38/0xa20
[  137.212232]  [<ffffffff810c141d>] ? trace_hardirqs_off+0xd/0x10
[  137.212258]  [<ffffffff81524fc7>] sd_init_command+0x27/0xa0
[  137.212285]  [<ffffffff815152c0>] scsi_setup_cmnd+0xf0/0x150
[  137.212312]  [<ffffffff81515471>] scsi_prep_fn+0xd1/0x170
[  137.212338]  [<ffffffff8137b87c>] blk_peek_request+0x18c/0x320
[  137.212365]  [<ffffffff81516cb2>] scsi_request_fn+0x42/0x610
[  137.212392]  [<ffffffff81374943>] __blk_run_queue+0x33/0x40
[  137.212418]  [<ffffffff813a8015>] bfq_kick_queue+0x25/0x40
[  137.212444]  [<ffffffff8108b18d>] process_one_work+0x1ed/0x660
[  137.212471]  [<ffffffff8108b0fc>] ? process_one_work+0x15c/0x660
[  137.212498]  [<ffffffff8108b64e>] worker_thread+0x4e/0x450
[  137.212525]  [<ffffffff8108b600>] ? process_one_work+0x660/0x660
[  137.212552]  [<ffffffff8108b600>] ? process_one_work+0x660/0x660
[  137.212578]  [<ffffffff8109206e>] kthread+0xfe/0x120
[  137.212605]  [<ffffffff81091f70>] ? kthread_create_on_node+0x230/0x230
[  137.212631]  [<ffffffff816d19ef>] ret_from_fork+0x3f/0x70
[  137.212657]  [<ffffffff81091f70>] ? kthread_create_on_node+0x230/0x230
[  137.212683] Code: 24 08 72 27 41 89 44 24 08 8b 43 5c 5b 41 89 44 24 10 44 89 e8 41 5c 41 5d 5d c3 41 bd 02 00 00 00 5b 44 89 e8 41 5c 41 5d 5d c3 <0f> 0b 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 
[  137.212798] RIP  [<ffffffff81514ff2>] scsi_init_sgtable+0x72/0x80
[  137.212826]  RSP <ffff880214307b90>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-12-04 16:59                       ` Takashi Iwai
  0 siblings, 0 replies; 58+ messages in thread
From: Takashi Iwai @ 2015-12-04 16:59 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Mike Snitzer, emilne, Christoph Hellwig, Michael Ellerman,
	Mark Salter, James E. J. Bottomley, brking, linux-scsi,
	linux-kernel, linuxppc-dev, linux-block, Jun'ichi Nomura,
	Jens Axboe

On Wed, 25 Nov 2015 20:01:47 +0100,
Hannes Reinecke wrote:
> 
> On 11/25/2015 07:01 PM, Mike Snitzer wrote:
> > On Wed, Nov 25 2015 at  4:04am -0500,
> > Hannes Reinecke <hare@suse.de> wrote:
> >
> >> On 11/20/2015 04:28 PM, Ewan Milne wrote:
> >>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
> >>>> Can't we have a joint effort here?
> >>>> I've been spending a _LOT_ of time trying to debug things here, but
> >>>> none of the ideas I've come up with have been able to fix anything.
> >>>
> >>> Yes.  I'm not the one primarily looking at it, and we don't have a
> >>> reproducer in-house.  We just have the one dump right now.
> >>>
> >>>>
> >>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
> >>>> by one and be done with ...
> >>>>
> >>>
> >>> That might not fix it if it is a problem with the merge code, though.
> >>>
> >> And indeed, it doesn't.
> >
> > How did you arrive at that?  Do you have a reproducer now?
> >
> Not a reproducer, but several dumps for analysis.
> 
> >> Seems I finally found the culprit.
> >>
> >> What happens is this:
> >> We have two paths, with these seg_boundary_masks:
> >>
> >> path-1:    seg_boundary_mask = 65535,
> >> path-2:    seg_boundary_mask = 4294967295,
> >>
> >> consequently the DM request queue has this:
> >>
> >> md-1:    seg_boundary_mask = 65535,
> >>
> >> What happens now is that a request is being formatted, and sent
> >> to path 2. During submission req->nr_phys_segments is formatted
> >> with the limits of path 2, arriving at a count of 3.
> >> Now the request gets retried on path 1, but as the NOMERGE request
> >> flag is set req->nr_phys_segments is never updated.
> >> But blk_rq_map_sg() ignores all counters, and just uses the
> >> bi_vec directly, resulting in a count of 4 -> boom.
> >>
> >> So the culprit here is the NOMERGE flag,
> >
> > NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.
> >
> Yes.
> 
> >> which is evaluated via
> >> ->dm_dispatch_request()
> >>    ->blk_insert_cloned_request()
> >>      ->blk_rq_check_limits()
> >
> > blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
> > anyway after reading your mail I'm still left wondering if your proposed
> > patch is correct.
> >
> >> If the above assessment is correct, the following patch should
> >> fix it:
> >>
> >> diff --git a/block/blk-core.c b/block/blk-core.c
> >> index 801ced7..12cccd6 100644
> >> --- a/block/blk-core.c
> >> +++ b/block/blk-core.c
> >> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
> >>    */
> >>   int blk_rq_check_limits(struct request_queue *q, struct request *rq)
> >>   {
> >> -       if (!rq_mergeable(rq))
> >> +       if (rq->cmd_type != REQ_TYPE_FS)
> >>                  return 0;
> >>
> >>          if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
> >> rq->cmd_flags)) {
> >>
> >>
> >> Mike? Jens?
> >> Can you comment on it?
> >
> > You're not explaining the actual change in the patch very well; I think
> > you're correct but you're leaving the justification as an exercise to
> > the reviewer:
> >
> > blk_rq_check_limits() will call blk_recalc_rq_segments() after the
> > !rq_mergeable() check but you're saying for this case in question we
> > never get there -- due to the cloned request having NOMERGE set.
> >
> > So in blk_rq_check_limits() you've unrolled rq_mergeable() and
> > open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)
> >
> > I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
> > the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
> > sense for cloned requests that always have NOMERGE set.
> >
> > So you're saying that by having blk_rq_check_limits() go on to call
> > blk_recalc_rq_segments() this bug will be fixed?
> >
> That is the idea.
> 
> I've already established that in all instances I have seen so far
> req->nr_phys_segments is _less_ than req->bio->bi_phys_segments.
> 
> As it turns out, req->nr_phys_segemnts _would_ have been updated in
> blk_rq_check_limits(), but isn't due to the NOMERGE flag being set
> for the cloned request.
> So each cloned request inherits the values from the original request,
> despite the fact that req->nr_phys_segments _has_ to be evaluated in
> the final request_queue context, as the queue limits _might_ be 
> different from the original (merged) queue limits of the multipath
> request queue.
> 
> > BTW, I think blk_rq_check_limits()'s export should be removed and the
> > function made static and renamed to blk_clone_rq_check_limits(), again:
> > blk_insert_cloned_request() is the only caller of blk_rq_check_limits()
> >
> Actually, seeing Jens' last comment the check for REQ_TYPE_FS is 
> pointless, too, so we might as well remove the entire if-clause.
> 
> > Seems prudent to make that change now to be clear that this code is only
> > used by cloned requests.
> >
> Yeah, that would make sense. I'll be preparing a patch.
> With a more detailed description :-)

Do we have already a fix?  Right now I got (likely) this kernel BUG()
on the almost latest Linus tree (commit 25364a9e54fb8296).  It
happened while I started a KVM right after a fresh boot.  The machine
paniced even before that, so I hit this twice today.

I have a crash dump, so if anything is needed, let me know.
(But the system had to be restarted since it's my main workstation.)


thanks,

Takashi

===
[  137.211255] ------------[ cut here ]------------
[  137.211265] WARNING: CPU: 0 PID: 391 at block/blk-merge.c:435 blk_rq_map_sg+0x430/0x4d0()
[  137.211267] Modules linked in: nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs lockd grace sunrpc fscache parport_pc ppdev parport fuse vmw_vsock_vmci_transport vsock vmw_vmci iscsi_ibft iscsi_boot_sysfs snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm i915 irqbypass crct10dif_pclmul crc32_pclmul snd_emu10k1 snd_hda_codec_ca0132 snd_util_mem aesni_intel snd_ac97_codec snd_hda_intel aes_x86_64 dcdbas snd_hda_codec glue_helper dm_mod ac97_bus lrw gf128mul ablk_helper cryptd snd_rawmidi snd_hda_core i2c_algo_bit snd_seq_device drm_kms_helper snd_hwdep syscopyarea snd_pcm sysfillrect sysimgblt snd_timer fb_sys_fops iTCO_wdt i2c_i801 drm snd iTCO_vendor_support emu10k1_gp battery lpc_ich gameport mfd_cor
 e soundcore
[  137.211326]  mei_me video mei thermal button usbhid crc32c_intel sr_mod ehci_pci cdrom serio_raw xhci_pci ehci_hcd e1000e xhci_hcd ptp pps_core usbcore usb_common sg
[  137.211342] CPU: 0 PID: 391 Comm: kworker/0:1H Not tainted 4.4.0-rc3-test2+ #183
[  137.211344] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
[  137.211349] Workqueue: kblockd bfq_kick_queue
[  137.211352]  ffffffff81a6b805 ffff880214307ab8 ffffffff813b1cc9 0000000000000000
[  137.211357]  ffff880214307af0 ffffffff8106d9a2 ffff88019e7d49c8 0000000000000000
[  137.211362]  0000000000000000 0000000000001000 0000000000000020 ffff880214307b00
[  137.211367] Call Trace:
[  137.211373]  [<ffffffff813b1cc9>] dump_stack+0x4b/0x72
[  137.211378]  [<ffffffff8106d9a2>] warn_slowpath_common+0x82/0xc0
[  137.211383]  [<ffffffff8106da9a>] warn_slowpath_null+0x1a/0x20
[  137.211387]  [<ffffffff81381340>] blk_rq_map_sg+0x430/0x4d0
[  137.211393]  [<ffffffff81514a60>] ? scsi_init_cmd_errh+0x90/0x90
[  137.211398]  [<ffffffff81514fc4>] scsi_init_sgtable+0x44/0x80
[  137.211402]  [<ffffffff81515045>] scsi_init_io+0x45/0x1d0
[  137.211408]  [<ffffffff815214a8>] sd_setup_read_write_cmnd+0x38/0xa20
[  137.211413]  [<ffffffff810c141d>] ? trace_hardirqs_off+0xd/0x10
[  137.211416]  [<ffffffff81524fc7>] sd_init_command+0x27/0xa0
[  137.211420]  [<ffffffff815152c0>] scsi_setup_cmnd+0xf0/0x150
[  137.211424]  [<ffffffff81515471>] scsi_prep_fn+0xd1/0x170
[  137.211428]  [<ffffffff8137b87c>] blk_peek_request+0x18c/0x320
[  137.211433]  [<ffffffff81516cb2>] scsi_request_fn+0x42/0x610
[  137.211438]  [<ffffffff81374943>] __blk_run_queue+0x33/0x40
[  137.211442]  [<ffffffff813a8015>] bfq_kick_queue+0x25/0x40
[  137.211447]  [<ffffffff8108b18d>] process_one_work+0x1ed/0x660
[  137.211451]  [<ffffffff8108b0fc>] ? process_one_work+0x15c/0x660
[  137.211456]  [<ffffffff8108b64e>] worker_thread+0x4e/0x450
[  137.211460]  [<ffffffff8108b600>] ? process_one_work+0x660/0x660
[  137.211464]  [<ffffffff8108b600>] ? process_one_work+0x660/0x660
[  137.211468]  [<ffffffff8109206e>] kthread+0xfe/0x120
[  137.211473]  [<ffffffff81091f70>] ? kthread_create_on_node+0x230/0x230
[  137.211478]  [<ffffffff816d19ef>] ret_from_fork+0x3f/0x70
[  137.211482]  [<ffffffff81091f70>] ? kthread_create_on_node+0x230/0x230
[  137.211485] ---[ end trace b39a62f95b1cad19 ]---
[  137.211499] ------------[ cut here ]------------
[  137.211506] kernel BUG at drivers/scsi/scsi_lib.c:1096!
[  137.211511] invalid opcode: 0000 [#1] PREEMPT SMP 
[  137.211519] Modules linked in: nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs lockd grace sunrpc fscache parport_pc ppdev parport fuse vmw_vsock_vmci_transport vsock vmw_vmci iscsi_ibft iscsi_boot_sysfs snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm i915 irqbypass crct10dif_pclmul crc32_pclmul snd_emu10k1 snd_hda_codec_ca0132 snd_util_mem aesni_intel snd_ac97_codec snd_hda_intel aes_x86_64 dcdbas snd_hda_codec glue_helper dm_mod ac97_bus lrw gf128mul ablk_helper cryptd snd_rawmidi snd_hda_core i2c_algo_bit snd_seq_device drm_kms_helper snd_hwdep syscopyarea snd_pcm sysfillrect sysimgblt snd_timer fb_sys_fops iTCO_wdt i2c_i801 drm snd iTCO_vendor_support emu10k1_gp battery lpc_ich gameport mfd_cor
 e soundcore
[  137.211630]  mei_me video mei thermal button usbhid crc32c_intel sr_mod ehci_pci cdrom serio_raw xhci_pci ehci_hcd e1000e xhci_hcd ptp pps_core usbcore usb_common sg
[  137.211660] CPU: 0 PID: 391 Comm: kworker/0:1H Tainted: G        W       4.4.0-rc3-test2+ #183
[  137.211668] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
[  137.211675] Workqueue: kblockd bfq_kick_queue
[  137.211681] task: ffff880214754a40 ti: ffff880214304000 task.ti: ffff880214304000
[  137.211713] RIP: 0010:[<ffffffff81514ff2>]  [<ffffffff81514ff2>] scsi_init_sgtable+0x72/0x80
[  137.211747] RSP: 0018:ffff880214307b90  EFLAGS: 00010097
[  137.211777] RAX: 000000000000000e RBX: ffff8802147e68b0 RCX: 0000000000000006
[  137.211807] RDX: 0000000000000024 RSI: 0000000000000000 RDI: 0000000000000009
[  137.211838] RBP: ffff880214307ba8 R08: 0000000000000001 R09: 0000000000000000
[  137.211868] R10: ffff8802147e68b0 R11: 0000000000000001 R12: ffff88003711c320
[  137.211899] R13: 0000000000000000 R14: ffff8800370be000 R15: ffff8802147e68b0
[  137.211930] FS:  0000000000000000(0000) GS:ffff88021e200000(0000) knlGS:0000000000000000
[  137.211961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  137.211991] CR2: 000000000041bee0 CR3: 00000000d85ac000 CR4: 00000000001426f0
[  137.212021] Stack:
[  137.212048]  ffff88003711c200 000000000003b000 0000000000000000 ffff880214307bf0
[  137.212082]  ffffffff81515045 ffff880214307c28 0000000000000092 ffff8802147e68b0
[  137.212115]  000000000003b000 ffff8800d8a90800 ffff8800370be000 0000000000000020
[  137.212148] Call Trace:
[  137.212178]  [<ffffffff81515045>] scsi_init_io+0x45/0x1d0
[  137.212205]  [<ffffffff815214a8>] sd_setup_read_write_cmnd+0x38/0xa20
[  137.212232]  [<ffffffff810c141d>] ? trace_hardirqs_off+0xd/0x10
[  137.212258]  [<ffffffff81524fc7>] sd_init_command+0x27/0xa0
[  137.212285]  [<ffffffff815152c0>] scsi_setup_cmnd+0xf0/0x150
[  137.212312]  [<ffffffff81515471>] scsi_prep_fn+0xd1/0x170
[  137.212338]  [<ffffffff8137b87c>] blk_peek_request+0x18c/0x320
[  137.212365]  [<ffffffff81516cb2>] scsi_request_fn+0x42/0x610
[  137.212392]  [<ffffffff81374943>] __blk_run_queue+0x33/0x40
[  137.212418]  [<ffffffff813a8015>] bfq_kick_queue+0x25/0x40
[  137.212444]  [<ffffffff8108b18d>] process_one_work+0x1ed/0x660
[  137.212471]  [<ffffffff8108b0fc>] ? process_one_work+0x15c/0x660
[  137.212498]  [<ffffffff8108b64e>] worker_thread+0x4e/0x450
[  137.212525]  [<ffffffff8108b600>] ? process_one_work+0x660/0x660
[  137.212552]  [<ffffffff8108b600>] ? process_one_work+0x660/0x660
[  137.212578]  [<ffffffff8109206e>] kthread+0xfe/0x120
[  137.212605]  [<ffffffff81091f70>] ? kthread_create_on_node+0x230/0x230
[  137.212631]  [<ffffffff816d19ef>] ret_from_fork+0x3f/0x70
[  137.212657]  [<ffffffff81091f70>] ? kthread_create_on_node+0x230/0x230
[  137.212683] Code: 24 08 72 27 41 89 44 24 08 8b 43 5c 5b 41 89 44 24 10 44 89 e8 41 5c 41 5d 5d c3 41 bd 02 00 00 00 5b 44 89 e8 41 5c 41 5d 5d c3 <0f> 0b 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 
[  137.212798] RIP  [<ffffffff81514ff2>] scsi_init_sgtable+0x72/0x80
[  137.212826]  RSP <ffff880214307b90>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-12-04 16:59                       ` Takashi Iwai
@ 2015-12-04 17:02                         ` Jens Axboe
  -1 siblings, 0 replies; 58+ messages in thread
From: Jens Axboe @ 2015-12-04 17:02 UTC (permalink / raw)
  To: Takashi Iwai, Hannes Reinecke
  Cc: Mike Snitzer, emilne, Christoph Hellwig, Michael Ellerman,
	Mark Salter, James E. J. Bottomley, brking, linux-scsi,
	linux-kernel, linuxppc-dev, linux-block, Jun'ichi Nomura

On 12/04/2015 09:59 AM, Takashi Iwai wrote:
> On Wed, 25 Nov 2015 20:01:47 +0100,
> Hannes Reinecke wrote:
>>
>> On 11/25/2015 07:01 PM, Mike Snitzer wrote:
>>> On Wed, Nov 25 2015 at  4:04am -0500,
>>> Hannes Reinecke <hare@suse.de> wrote:
>>>
>>>> On 11/20/2015 04:28 PM, Ewan Milne wrote:
>>>>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>>>>>> Can't we have a joint effort here?
>>>>>> I've been spending a _LOT_ of time trying to debug things here, but
>>>>>> none of the ideas I've come up with have been able to fix anything.
>>>>>
>>>>> Yes.  I'm not the one primarily looking at it, and we don't have a
>>>>> reproducer in-house.  We just have the one dump right now.
>>>>>
>>>>>>
>>>>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>>>>>> by one and be done with ...
>>>>>>
>>>>>
>>>>> That might not fix it if it is a problem with the merge code, though.
>>>>>
>>>> And indeed, it doesn't.
>>>
>>> How did you arrive at that?  Do you have a reproducer now?
>>>
>> Not a reproducer, but several dumps for analysis.
>>
>>>> Seems I finally found the culprit.
>>>>
>>>> What happens is this:
>>>> We have two paths, with these seg_boundary_masks:
>>>>
>>>> path-1:    seg_boundary_mask = 65535,
>>>> path-2:    seg_boundary_mask = 4294967295,
>>>>
>>>> consequently the DM request queue has this:
>>>>
>>>> md-1:    seg_boundary_mask = 65535,
>>>>
>>>> What happens now is that a request is being formatted, and sent
>>>> to path 2. During submission req->nr_phys_segments is formatted
>>>> with the limits of path 2, arriving at a count of 3.
>>>> Now the request gets retried on path 1, but as the NOMERGE request
>>>> flag is set req->nr_phys_segments is never updated.
>>>> But blk_rq_map_sg() ignores all counters, and just uses the
>>>> bi_vec directly, resulting in a count of 4 -> boom.
>>>>
>>>> So the culprit here is the NOMERGE flag,
>>>
>>> NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.
>>>
>> Yes.
>>
>>>> which is evaluated via
>>>> ->dm_dispatch_request()
>>>>     ->blk_insert_cloned_request()
>>>>       ->blk_rq_check_limits()
>>>
>>> blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
>>> anyway after reading your mail I'm still left wondering if your proposed
>>> patch is correct.
>>>
>>>> If the above assessment is correct, the following patch should
>>>> fix it:
>>>>
>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>> index 801ced7..12cccd6 100644
>>>> --- a/block/blk-core.c
>>>> +++ b/block/blk-core.c
>>>> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
>>>>     */
>>>>    int blk_rq_check_limits(struct request_queue *q, struct request *rq)
>>>>    {
>>>> -       if (!rq_mergeable(rq))
>>>> +       if (rq->cmd_type != REQ_TYPE_FS)
>>>>                   return 0;
>>>>
>>>>           if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
>>>> rq->cmd_flags)) {
>>>>
>>>>
>>>> Mike? Jens?
>>>> Can you comment on it?
>>>
>>> You're not explaining the actual change in the patch very well; I think
>>> you're correct but you're leaving the justification as an exercise to
>>> the reviewer:
>>>
>>> blk_rq_check_limits() will call blk_recalc_rq_segments() after the
>>> !rq_mergeable() check but you're saying for this case in question we
>>> never get there -- due to the cloned request having NOMERGE set.
>>>
>>> So in blk_rq_check_limits() you've unrolled rq_mergeable() and
>>> open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)
>>>
>>> I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
>>> the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
>>> sense for cloned requests that always have NOMERGE set.
>>>
>>> So you're saying that by having blk_rq_check_limits() go on to call
>>> blk_recalc_rq_segments() this bug will be fixed?
>>>
>> That is the idea.
>>
>> I've already established that in all instances I have seen so far
>> req->nr_phys_segments is _less_ than req->bio->bi_phys_segments.
>>
>> As it turns out, req->nr_phys_segemnts _would_ have been updated in
>> blk_rq_check_limits(), but isn't due to the NOMERGE flag being set
>> for the cloned request.
>> So each cloned request inherits the values from the original request,
>> despite the fact that req->nr_phys_segments _has_ to be evaluated in
>> the final request_queue context, as the queue limits _might_ be
>> different from the original (merged) queue limits of the multipath
>> request queue.
>>
>>> BTW, I think blk_rq_check_limits()'s export should be removed and the
>>> function made static and renamed to blk_clone_rq_check_limits(), again:
>>> blk_insert_cloned_request() is the only caller of blk_rq_check_limits()
>>>
>> Actually, seeing Jens' last comment the check for REQ_TYPE_FS is
>> pointless, too, so we might as well remove the entire if-clause.
>>
>>> Seems prudent to make that change now to be clear that this code is only
>>> used by cloned requests.
>>>
>> Yeah, that would make sense. I'll be preparing a patch.
>> With a more detailed description :-)
>
> Do we have already a fix?  Right now I got (likely) this kernel BUG()
> on the almost latest Linus tree (commit 25364a9e54fb8296).  It
> happened while I started a KVM right after a fresh boot.  The machine
> paniced even before that, so I hit this twice today.

Update to the tree as-of yesterday (or today) and it should work. 
25364a9e54fb8296 doesn't include the latest block fixes that were sent 
in yesterday, that should fix it. You need commit a88d32af18b8 or newer.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-12-04 17:02                         ` Jens Axboe
  0 siblings, 0 replies; 58+ messages in thread
From: Jens Axboe @ 2015-12-04 17:02 UTC (permalink / raw)
  To: Takashi Iwai, Hannes Reinecke
  Cc: Mike Snitzer, emilne, Christoph Hellwig, Michael Ellerman,
	Mark Salter, James E. J. Bottomley, brking, linux-scsi,
	linux-kernel, linuxppc-dev, linux-block, Jun'ichi Nomura

On 12/04/2015 09:59 AM, Takashi Iwai wrote:
> On Wed, 25 Nov 2015 20:01:47 +0100,
> Hannes Reinecke wrote:
>>
>> On 11/25/2015 07:01 PM, Mike Snitzer wrote:
>>> On Wed, Nov 25 2015 at  4:04am -0500,
>>> Hannes Reinecke <hare@suse.de> wrote:
>>>
>>>> On 11/20/2015 04:28 PM, Ewan Milne wrote:
>>>>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
>>>>>> Can't we have a joint effort here?
>>>>>> I've been spending a _LOT_ of time trying to debug things here, but
>>>>>> none of the ideas I've come up with have been able to fix anything.
>>>>>
>>>>> Yes.  I'm not the one primarily looking at it, and we don't have a
>>>>> reproducer in-house.  We just have the one dump right now.
>>>>>
>>>>>>
>>>>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
>>>>>> by one and be done with ...
>>>>>>
>>>>>
>>>>> That might not fix it if it is a problem with the merge code, though.
>>>>>
>>>> And indeed, it doesn't.
>>>
>>> How did you arrive at that?  Do you have a reproducer now?
>>>
>> Not a reproducer, but several dumps for analysis.
>>
>>>> Seems I finally found the culprit.
>>>>
>>>> What happens is this:
>>>> We have two paths, with these seg_boundary_masks:
>>>>
>>>> path-1:    seg_boundary_mask = 65535,
>>>> path-2:    seg_boundary_mask = 4294967295,
>>>>
>>>> consequently the DM request queue has this:
>>>>
>>>> md-1:    seg_boundary_mask = 65535,
>>>>
>>>> What happens now is that a request is being formatted, and sent
>>>> to path 2. During submission req->nr_phys_segments is formatted
>>>> with the limits of path 2, arriving at a count of 3.
>>>> Now the request gets retried on path 1, but as the NOMERGE request
>>>> flag is set req->nr_phys_segments is never updated.
>>>> But blk_rq_map_sg() ignores all counters, and just uses the
>>>> bi_vec directly, resulting in a count of 4 -> boom.
>>>>
>>>> So the culprit here is the NOMERGE flag,
>>>
>>> NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.
>>>
>> Yes.
>>
>>>> which is evaluated via
>>>> ->dm_dispatch_request()
>>>>     ->blk_insert_cloned_request()
>>>>       ->blk_rq_check_limits()
>>>
>>> blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
>>> anyway after reading your mail I'm still left wondering if your proposed
>>> patch is correct.
>>>
>>>> If the above assessment is correct, the following patch should
>>>> fix it:
>>>>
>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>> index 801ced7..12cccd6 100644
>>>> --- a/block/blk-core.c
>>>> +++ b/block/blk-core.c
>>>> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
>>>>     */
>>>>    int blk_rq_check_limits(struct request_queue *q, struct request *rq)
>>>>    {
>>>> -       if (!rq_mergeable(rq))
>>>> +       if (rq->cmd_type != REQ_TYPE_FS)
>>>>                   return 0;
>>>>
>>>>           if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
>>>> rq->cmd_flags)) {
>>>>
>>>>
>>>> Mike? Jens?
>>>> Can you comment on it?
>>>
>>> You're not explaining the actual change in the patch very well; I think
>>> you're correct but you're leaving the justification as an exercise to
>>> the reviewer:
>>>
>>> blk_rq_check_limits() will call blk_recalc_rq_segments() after the
>>> !rq_mergeable() check but you're saying for this case in question we
>>> never get there -- due to the cloned request having NOMERGE set.
>>>
>>> So in blk_rq_check_limits() you've unrolled rq_mergeable() and
>>> open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)
>>>
>>> I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
>>> the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
>>> sense for cloned requests that always have NOMERGE set.
>>>
>>> So you're saying that by having blk_rq_check_limits() go on to call
>>> blk_recalc_rq_segments() this bug will be fixed?
>>>
>> That is the idea.
>>
>> I've already established that in all instances I have seen so far
>> req->nr_phys_segments is _less_ than req->bio->bi_phys_segments.
>>
>> As it turns out, req->nr_phys_segemnts _would_ have been updated in
>> blk_rq_check_limits(), but isn't due to the NOMERGE flag being set
>> for the cloned request.
>> So each cloned request inherits the values from the original request,
>> despite the fact that req->nr_phys_segments _has_ to be evaluated in
>> the final request_queue context, as the queue limits _might_ be
>> different from the original (merged) queue limits of the multipath
>> request queue.
>>
>>> BTW, I think blk_rq_check_limits()'s export should be removed and the
>>> function made static and renamed to blk_clone_rq_check_limits(), again:
>>> blk_insert_cloned_request() is the only caller of blk_rq_check_limits()
>>>
>> Actually, seeing Jens' last comment the check for REQ_TYPE_FS is
>> pointless, too, so we might as well remove the entire if-clause.
>>
>>> Seems prudent to make that change now to be clear that this code is only
>>> used by cloned requests.
>>>
>> Yeah, that would make sense. I'll be preparing a patch.
>> With a more detailed description :-)
>
> Do we have already a fix?  Right now I got (likely) this kernel BUG()
> on the almost latest Linus tree (commit 25364a9e54fb8296).  It
> happened while I started a KVM right after a fresh boot.  The machine
> paniced even before that, so I hit this twice today.

Update to the tree as-of yesterday (or today) and it should work. 
25364a9e54fb8296 doesn't include the latest block fixes that were sent 
in yesterday, that should fix it. You need commit a88d32af18b8 or newer.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
  2015-12-04 17:02                         ` Jens Axboe
@ 2015-12-04 17:09                           ` Takashi Iwai
  -1 siblings, 0 replies; 58+ messages in thread
From: Takashi Iwai @ 2015-12-04 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Hannes Reinecke, Mike Snitzer, emilne, Christoph Hellwig,
	Michael Ellerman, Mark Salter, James E. J. Bottomley, brking,
	linux-scsi, linux-kernel, linuxppc-dev, linux-block,
	Jun'ichi Nomura

On Fri, 04 Dec 2015 18:02:58 +0100,
Jens Axboe wrote:
> 
> On 12/04/2015 09:59 AM, Takashi Iwai wrote:
> > On Wed, 25 Nov 2015 20:01:47 +0100,
> > Hannes Reinecke wrote:
> >>
> >> On 11/25/2015 07:01 PM, Mike Snitzer wrote:
> >>> On Wed, Nov 25 2015 at  4:04am -0500,
> >>> Hannes Reinecke <hare@suse.de> wrote:
> >>>
> >>>> On 11/20/2015 04:28 PM, Ewan Milne wrote:
> >>>>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
> >>>>>> Can't we have a joint effort here?
> >>>>>> I've been spending a _LOT_ of time trying to debug things here, but
> >>>>>> none of the ideas I've come up with have been able to fix anything.
> >>>>>
> >>>>> Yes.  I'm not the one primarily looking at it, and we don't have a
> >>>>> reproducer in-house.  We just have the one dump right now.
> >>>>>
> >>>>>>
> >>>>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
> >>>>>> by one and be done with ...
> >>>>>>
> >>>>>
> >>>>> That might not fix it if it is a problem with the merge code, though.
> >>>>>
> >>>> And indeed, it doesn't.
> >>>
> >>> How did you arrive at that?  Do you have a reproducer now?
> >>>
> >> Not a reproducer, but several dumps for analysis.
> >>
> >>>> Seems I finally found the culprit.
> >>>>
> >>>> What happens is this:
> >>>> We have two paths, with these seg_boundary_masks:
> >>>>
> >>>> path-1:    seg_boundary_mask = 65535,
> >>>> path-2:    seg_boundary_mask = 4294967295,
> >>>>
> >>>> consequently the DM request queue has this:
> >>>>
> >>>> md-1:    seg_boundary_mask = 65535,
> >>>>
> >>>> What happens now is that a request is being formatted, and sent
> >>>> to path 2. During submission req->nr_phys_segments is formatted
> >>>> with the limits of path 2, arriving at a count of 3.
> >>>> Now the request gets retried on path 1, but as the NOMERGE request
> >>>> flag is set req->nr_phys_segments is never updated.
> >>>> But blk_rq_map_sg() ignores all counters, and just uses the
> >>>> bi_vec directly, resulting in a count of 4 -> boom.
> >>>>
> >>>> So the culprit here is the NOMERGE flag,
> >>>
> >>> NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.
> >>>
> >> Yes.
> >>
> >>>> which is evaluated via
> >>>> ->dm_dispatch_request()
> >>>>     ->blk_insert_cloned_request()
> >>>>       ->blk_rq_check_limits()
> >>>
> >>> blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
> >>> anyway after reading your mail I'm still left wondering if your proposed
> >>> patch is correct.
> >>>
> >>>> If the above assessment is correct, the following patch should
> >>>> fix it:
> >>>>
> >>>> diff --git a/block/blk-core.c b/block/blk-core.c
> >>>> index 801ced7..12cccd6 100644
> >>>> --- a/block/blk-core.c
> >>>> +++ b/block/blk-core.c
> >>>> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
> >>>>     */
> >>>>    int blk_rq_check_limits(struct request_queue *q, struct request *rq)
> >>>>    {
> >>>> -       if (!rq_mergeable(rq))
> >>>> +       if (rq->cmd_type != REQ_TYPE_FS)
> >>>>                   return 0;
> >>>>
> >>>>           if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
> >>>> rq->cmd_flags)) {
> >>>>
> >>>>
> >>>> Mike? Jens?
> >>>> Can you comment on it?
> >>>
> >>> You're not explaining the actual change in the patch very well; I think
> >>> you're correct but you're leaving the justification as an exercise to
> >>> the reviewer:
> >>>
> >>> blk_rq_check_limits() will call blk_recalc_rq_segments() after the
> >>> !rq_mergeable() check but you're saying for this case in question we
> >>> never get there -- due to the cloned request having NOMERGE set.
> >>>
> >>> So in blk_rq_check_limits() you've unrolled rq_mergeable() and
> >>> open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)
> >>>
> >>> I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
> >>> the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
> >>> sense for cloned requests that always have NOMERGE set.
> >>>
> >>> So you're saying that by having blk_rq_check_limits() go on to call
> >>> blk_recalc_rq_segments() this bug will be fixed?
> >>>
> >> That is the idea.
> >>
> >> I've already established that in all instances I have seen so far
> >> req->nr_phys_segments is _less_ than req->bio->bi_phys_segments.
> >>
> >> As it turns out, req->nr_phys_segemnts _would_ have been updated in
> >> blk_rq_check_limits(), but isn't due to the NOMERGE flag being set
> >> for the cloned request.
> >> So each cloned request inherits the values from the original request,
> >> despite the fact that req->nr_phys_segments _has_ to be evaluated in
> >> the final request_queue context, as the queue limits _might_ be
> >> different from the original (merged) queue limits of the multipath
> >> request queue.
> >>
> >>> BTW, I think blk_rq_check_limits()'s export should be removed and the
> >>> function made static and renamed to blk_clone_rq_check_limits(), again:
> >>> blk_insert_cloned_request() is the only caller of blk_rq_check_limits()
> >>>
> >> Actually, seeing Jens' last comment the check for REQ_TYPE_FS is
> >> pointless, too, so we might as well remove the entire if-clause.
> >>
> >>> Seems prudent to make that change now to be clear that this code is only
> >>> used by cloned requests.
> >>>
> >> Yeah, that would make sense. I'll be preparing a patch.
> >> With a more detailed description :-)
> >
> > Do we have already a fix?  Right now I got (likely) this kernel BUG()
> > on the almost latest Linus tree (commit 25364a9e54fb8296).  It
> > happened while I started a KVM right after a fresh boot.  The machine
> > paniced even before that, so I hit this twice today.
> 
> Update to the tree as-of yesterday (or today) and it should work. 
> 25364a9e54fb8296 doesn't include the latest block fixes that were sent 
> in yesterday, that should fix it. You need commit a88d32af18b8 or newer.

Alright, I'll give it a try.  Thanks for information!


Takashi

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!
@ 2015-12-04 17:09                           ` Takashi Iwai
  0 siblings, 0 replies; 58+ messages in thread
From: Takashi Iwai @ 2015-12-04 17:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Hannes Reinecke, Mike Snitzer, emilne, Christoph Hellwig,
	Michael Ellerman, Mark Salter, James E. J. Bottomley, brking,
	linux-scsi, linux-kernel, linuxppc-dev, linux-block,
	Jun'ichi Nomura

On Fri, 04 Dec 2015 18:02:58 +0100,
Jens Axboe wrote:
> 
> On 12/04/2015 09:59 AM, Takashi Iwai wrote:
> > On Wed, 25 Nov 2015 20:01:47 +0100,
> > Hannes Reinecke wrote:
> >>
> >> On 11/25/2015 07:01 PM, Mike Snitzer wrote:
> >>> On Wed, Nov 25 2015 at  4:04am -0500,
> >>> Hannes Reinecke <hare@suse.de> wrote:
> >>>
> >>>> On 11/20/2015 04:28 PM, Ewan Milne wrote:
> >>>>> On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:
> >>>>>> Can't we have a joint effort here?
> >>>>>> I've been spending a _LOT_ of time trying to debug things here, but
> >>>>>> none of the ideas I've come up with have been able to fix anything.
> >>>>>
> >>>>> Yes.  I'm not the one primarily looking at it, and we don't have a
> >>>>> reproducer in-house.  We just have the one dump right now.
> >>>>>
> >>>>>>
> >>>>>> I'm almost tempted to increase the count from scsi_alloc_sgtable()
> >>>>>> by one and be done with ...
> >>>>>>
> >>>>>
> >>>>> That might not fix it if it is a problem with the merge code, though.
> >>>>>
> >>>> And indeed, it doesn't.
> >>>
> >>> How did you arrive at that?  Do you have a reproducer now?
> >>>
> >> Not a reproducer, but several dumps for analysis.
> >>
> >>>> Seems I finally found the culprit.
> >>>>
> >>>> What happens is this:
> >>>> We have two paths, with these seg_boundary_masks:
> >>>>
> >>>> path-1:    seg_boundary_mask = 65535,
> >>>> path-2:    seg_boundary_mask = 4294967295,
> >>>>
> >>>> consequently the DM request queue has this:
> >>>>
> >>>> md-1:    seg_boundary_mask = 65535,
> >>>>
> >>>> What happens now is that a request is being formatted, and sent
> >>>> to path 2. During submission req->nr_phys_segments is formatted
> >>>> with the limits of path 2, arriving at a count of 3.
> >>>> Now the request gets retried on path 1, but as the NOMERGE request
> >>>> flag is set req->nr_phys_segments is never updated.
> >>>> But blk_rq_map_sg() ignores all counters, and just uses the
> >>>> bi_vec directly, resulting in a count of 4 -> boom.
> >>>>
> >>>> So the culprit here is the NOMERGE flag,
> >>>
> >>> NOMERGE is always set in __blk_rq_prep_clone() for cloned requests.
> >>>
> >> Yes.
> >>
> >>>> which is evaluated via
> >>>> ->dm_dispatch_request()
> >>>>     ->blk_insert_cloned_request()
> >>>>       ->blk_rq_check_limits()
> >>>
> >>> blk_insert_cloned_request() is the only caller of blk_rq_check_limits();
> >>> anyway after reading your mail I'm still left wondering if your proposed
> >>> patch is correct.
> >>>
> >>>> If the above assessment is correct, the following patch should
> >>>> fix it:
> >>>>
> >>>> diff --git a/block/blk-core.c b/block/blk-core.c
> >>>> index 801ced7..12cccd6 100644
> >>>> --- a/block/blk-core.c
> >>>> +++ b/block/blk-core.c
> >>>> @@ -1928,7 +1928,7 @@ EXPORT_SYMBOL(submit_bio);
> >>>>     */
> >>>>    int blk_rq_check_limits(struct request_queue *q, struct request *rq)
> >>>>    {
> >>>> -       if (!rq_mergeable(rq))
> >>>> +       if (rq->cmd_type != REQ_TYPE_FS)
> >>>>                   return 0;
> >>>>
> >>>>           if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q,
> >>>> rq->cmd_flags)) {
> >>>>
> >>>>
> >>>> Mike? Jens?
> >>>> Can you comment on it?
> >>>
> >>> You're not explaining the actual change in the patch very well; I think
> >>> you're correct but you're leaving the justification as an exercise to
> >>> the reviewer:
> >>>
> >>> blk_rq_check_limits() will call blk_recalc_rq_segments() after the
> >>> !rq_mergeable() check but you're saying for this case in question we
> >>> never get there -- due to the cloned request having NOMERGE set.
> >>>
> >>> So in blk_rq_check_limits() you've unrolled rq_mergeable() and
> >>> open-coded the lone remaining check (rq->cmd_type != REQ_TYPE_FS)
> >>>
> >>> I agree that the (rq->cmd_flags & REQ_NOMERGE_FLAGS) check in
> >>> the blk_insert_cloned_request() call-chain (via rq_mergeable()) makes no
> >>> sense for cloned requests that always have NOMERGE set.
> >>>
> >>> So you're saying that by having blk_rq_check_limits() go on to call
> >>> blk_recalc_rq_segments() this bug will be fixed?
> >>>
> >> That is the idea.
> >>
> >> I've already established that in all instances I have seen so far
> >> req->nr_phys_segments is _less_ than req->bio->bi_phys_segments.
> >>
> >> As it turns out, req->nr_phys_segemnts _would_ have been updated in
> >> blk_rq_check_limits(), but isn't due to the NOMERGE flag being set
> >> for the cloned request.
> >> So each cloned request inherits the values from the original request,
> >> despite the fact that req->nr_phys_segments _has_ to be evaluated in
> >> the final request_queue context, as the queue limits _might_ be
> >> different from the original (merged) queue limits of the multipath
> >> request queue.
> >>
> >>> BTW, I think blk_rq_check_limits()'s export should be removed and the
> >>> function made static and renamed to blk_clone_rq_check_limits(), again:
> >>> blk_insert_cloned_request() is the only caller of blk_rq_check_limits()
> >>>
> >> Actually, seeing Jens' last comment the check for REQ_TYPE_FS is
> >> pointless, too, so we might as well remove the entire if-clause.
> >>
> >>> Seems prudent to make that change now to be clear that this code is only
> >>> used by cloned requests.
> >>>
> >> Yeah, that would make sense. I'll be preparing a patch.
> >> With a more detailed description :-)
> >
> > Do we have already a fix?  Right now I got (likely) this kernel BUG()
> > on the almost latest Linus tree (commit 25364a9e54fb8296).  It
> > happened while I started a KVM right after a fresh boot.  The machine
> > paniced even before that, so I hit this twice today.
> 
> Update to the tree as-of yesterday (or today) and it should work. 
> 25364a9e54fb8296 doesn't include the latest block fixes that were sent 
> in yesterday, that should fix it. You need commit a88d32af18b8 or newer.

Alright, I'll give it a try.  Thanks for information!


Takashi

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2015-12-04 17:10 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-18  9:18 kernel BUG at drivers/scsi/scsi_lib.c:1096! Michael Ellerman
2015-11-18 11:06 ` Laurent Dufour
2015-11-18 11:10   ` Michael Ellerman
2015-11-18 11:17     ` Laurent Dufour
2015-11-18 14:03 ` Mark Salter
2015-11-19  1:02   ` Michael Ellerman
2015-11-19  8:23     ` Christoph Hellwig
2015-11-19 15:35       ` Hannes Reinecke
2015-11-19 15:35         ` Hannes Reinecke
2015-11-20 14:38         ` Ewan Milne
2015-11-20 14:55           ` Hannes Reinecke
2015-11-20 14:55             ` Hannes Reinecke
2015-11-20 15:28             ` Ewan Milne
2015-11-23  6:55               ` Hannes Reinecke
2015-11-23  6:55                 ` Hannes Reinecke
2015-11-25  9:04               ` Hannes Reinecke
2015-11-25 17:56                 ` Jens Axboe
2015-11-25 17:56                   ` Jens Axboe
2015-11-25 19:10                   ` Hannes Reinecke
2015-11-25 19:24                     ` Jens Axboe
2015-11-25 19:24                       ` Jens Axboe
2015-11-25 20:23                       ` Mike Snitzer
2015-11-25 21:20                         ` Mike Snitzer
2015-11-25 18:01                 ` Mike Snitzer
2015-11-25 19:01                   ` Hannes Reinecke
2015-11-25 19:01                     ` Hannes Reinecke
2015-12-04 16:59                     ` Takashi Iwai
2015-12-04 16:59                       ` Takashi Iwai
2015-12-04 17:02                       ` Jens Axboe
2015-12-04 17:02                         ` Jens Axboe
2015-12-04 17:09                         ` Takashi Iwai
2015-12-04 17:09                           ` Takashi Iwai
2015-11-20 12:10       ` Michael Ellerman
2015-11-20 12:10         ` Michael Ellerman
2015-11-20 12:56         ` Laurent Dufour
2015-11-20 13:37           ` Mark Salter
2015-11-21 11:30         ` Laurent Dufour
2015-11-21 11:30           ` Laurent Dufour
2015-11-21 16:56           ` Ming Lei
2015-11-21 16:56             ` Ming Lei
2015-11-22 23:20             ` Mark Salter
2015-11-23  0:36               ` Ming Lei
2015-11-23  1:50                 ` Mark Salter
2015-11-23  1:50                   ` Mark Salter
2015-11-23  2:46                   ` Ming Lei
2015-11-23 15:21                     ` Ming Lei
2015-11-23 15:21                       ` Ming Lei
2015-11-24 18:59                       ` Alan Ott
2015-11-24 18:59                         ` Alan Ott
2015-11-24 18:59                         ` Alan Ott
2015-11-24 18:59                         ` Alan Ott
2015-11-23 13:57               ` Laurent Dufour
2015-11-23 13:57                 ` Laurent Dufour
2015-11-23 15:13                 ` Pratyush Anand
2015-11-23 15:20                   ` Laurent Dufour
2015-11-23 15:27                     ` Ming Lei
2015-11-23 16:24                       ` Laurent Dufour
2015-11-24  1:30                       ` Mark Salter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.