* 4.1-rc2 dm-multipath-mq kernel warning
@ 2015-05-05 14:04 Bart Van Assche
  2015-05-06  2:23 ` Mike Snitzer
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2015-05-05 14:04 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

Hello Mike,

While retesting my SRP initiator patches on top of kernel v4.1-rc2 with
DM_MQ_DEFAULT=y I ran into the kernel warning below. Does this mean that
I'm missing any device-mapper-related patches? This warning was
reported shortly after scsi_remove_host() had been invoked.

Thanks,

Bart.

[  636.982089] device-mapper: multipath: Failing path 8:16.
[  636.982395] blk_update_request: I/O error, dev dm-1, sector 42378
[  636.987025] ------------[ cut here ]------------
[  636.987053] WARNING: CPU: 0 PID: 3 at drivers/md/dm.c:1090 free_rq_clone+0x7a/0xf0 [dm_mod]()
[  636.987070] Modules linked in: dm_service_time dm_multipath scsi_dh ib_srp scsi_transport_srp netconsole configfs fuse dm_crypt xts gf128mul algif_skcipher af_alg loop rdma_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en ptp pps_core iscsi_ibft iscsi_boot_sysfs af_packet mlx4_ib ib_sa ib_mad ib_core ib_addr mlx4_core iTCO_wdt sky2 iTCO_vendor_support coretemp lpc_ich shpchp mfd_core pcspkr serio_raw i2c_i801 acpi_cpufreq tpm_infineon tpm_tis tpm asus_atk0110 button processor dm_mod sr_mod cdrom ata_generic radeon ata_piix firewire_ohci i2c_algo_bit firewire_core crc_itu_t drm_kms_helper ttm pata_marvell drm floppy sg [last unloaded: scsi_transport_srp]
[  636.998755] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.1.0-rc2-debug+ #1
[  636.998771] Hardware name: System manufacturer P5Q DELUXE/P5Q DELUXE, BIOS 2301    07/10/2009
[  636.998786]  ffffffffa02ce8ad ffff8801b81efca8 ffffffff81606de3 ffff8801bfc0fff8
[  636.998862]  0000000000000000 ffff8801b81efce8 ffffffff81052940 ffffffff81e3d288
[  636.998953]  ffff8800a1f82580 ffff8800a1e40000 ffff8800a1f82530 ffff8800a1e40000
[  636.999032] Call Trace:
[  636.999054]  [<ffffffff81606de3>] dump_stack+0x4c/0x6e
[  636.999074]  [<ffffffff81052940>] warn_slowpath_common+0x80/0xc0
[  636.999092]  [<ffffffff81052a25>] warn_slowpath_null+0x15/0x20
[  636.999115]  [<ffffffffa02bdc2a>] free_rq_clone+0x7a/0xf0 [dm_mod]
[  636.999139]  [<ffffffffa02be45c>] dm_softirq_done+0x13c/0x250 [dm_mod]
[  636.999162]  [<ffffffff812c6ec8>] blk_done_softirq+0x78/0x90
[  636.999181]  [<ffffffff81056a2a>] __do_softirq+0x10a/0x240
[  636.999200]  [<ffffffff81056b7a>] run_ksoftirqd+0x1a/0x70
[  636.999218]  [<ffffffff81076adf>] smpboot_thread_fn+0x16f/0x260
[  636.999256]  [<ffffffff81073099>] kthread+0xf9/0x110
[  636.999294]  [<ffffffff8160fce2>] ret_from_fork+0x42/0x70
[  636.999338] ---[ end trace ba34bc950a902b8f ]---


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-05 14:04 4.1-rc2 dm-multipath-mq kernel warning Bart Van Assche
@ 2015-05-06  2:23 ` Mike Snitzer
  2015-05-06  7:45   ` Bart Van Assche
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2015-05-06  2:23 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, May 05 2015 at 10:04am -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> Hello Mike,
> 
> While retesting my SRP initiator patches on top of kernel v4.1-rc2
> with DM_MQ_DEFAULT=y I ran into the kernel warning below. Does this
> mean that I'm missing any device mapper related patches ? This
> warning was reported shortly after scsi_remove_host() had been
> invoked.

I put the warning in place because, to me, if it triggers it speaks to
unsafe teardown occurring (request is still completing but the queue it
was issued from no longer exists).

Like I said before I'm open to removing the WARN_ON_ONCE() if this
scenario is perfectly valid.  But I just haven't had time to revisit
what appears to be a potentially serious problem with the underlying
paths' teardown vs upper level mpath IO.

I'll try to revisit this week.  But I welcome input from others too.

(Just thinking about it further now, it could be that the way the clone
request is allocated in the case of blk-mq DM is as part of the original
request's pdu... meaning there isn't a proper get_request() call against
the underlying queue.. so the expected refcounting likely isn't
happening.  And given the request won't be free'd from that underlying
request_queue there really isn't a need to artificially link these
cloned requests with the underlying request_queue... so I'm now leaning
toward just removing the WARN_ON_ONCE.. but I'll look closer tomorrow)

Mike
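
For readers following the thread, here is a minimal userspace sketch of the layout Mike is describing. It only models the idea that a clone carved out of the original request's pdu is never obtained from the underlying queue, so no reference on that queue is taken. All names (fake_queue, fake_request, fake_tio, queue_get_request, alloc_request_with_pdu, embedded_clone) are hypothetical stand-ins and not the kernel's definitions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the kernel's real structures. */
struct fake_queue {
	int refcount;                 /* references held by in-flight requests */
};

struct fake_request {
	struct fake_queue *q;         /* queue the request is associated with */
	int from_queue_allocator;     /* 1 if obtained via queue_get_request() */
};

struct fake_tio {
	struct fake_request *orig;
	struct fake_request *clone;
};

/* Conventional path: allocating from a queue takes a reference on it. */
static struct fake_request *queue_get_request(struct fake_queue *q)
{
	struct fake_request *rq = calloc(1, sizeof(*rq));

	if (!rq)
		return NULL;
	q->refcount++;
	rq->q = q;
	rq->from_queue_allocator = 1;
	return rq;
}

/* One allocation laid out as [ request | tio | clone ]. */
static struct fake_request *alloc_request_with_pdu(void)
{
	return calloc(1, sizeof(struct fake_request) +
			 sizeof(struct fake_tio) +
			 sizeof(struct fake_request));
}

/*
 * The clone is carved out of the original's pdu and merely pointed at
 * the lower queue; no reference on that queue is taken.
 */
static struct fake_request *embedded_clone(struct fake_request *rq,
					   struct fake_queue *lower)
{
	struct fake_tio *tio = (struct fake_tio *)(rq + 1);

	tio->orig = rq;
	tio->clone = (struct fake_request *)(tio + 1);
	tio->clone->q = lower;
	return tio->clone;
}
```

In this model the lower queue's refcount stays unchanged when the embedded clone is set up, which is the situation in which a "queue is gone while the clone completes" check could fire without the clone ever having pinned that queue.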


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-06  2:23 ` Mike Snitzer
@ 2015-05-06  7:45   ` Bart Van Assche
  2015-05-06 18:29     ` Mike Snitzer
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2015-05-06  7:45 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 05/06/15 04:23, Mike Snitzer wrote:
> On Tue, May 05 2015 at 10:04am -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> While retesting my SRP initiator patches on top of kernel v4.1-rc2
>> with DM_MQ_DEFAULT=y I ran into the kernel warning below. Does this
>> mean that I'm missing any device mapper related patches ? This
>> warning was reported shortly after scsi_remove_host() had been
>> invoked.
> 
> I put the warning in place because, to me, if it triggers it speaks to
> unsafe teardown occurring (request is still completing but the queue it
> was issued from no longer exists).
> 
> Like I said before I'm open to removing the WARN_ON_ONCE() if this
> scenario is perfectly valid.  But I just haven't had time to revisit
> what appears to be a potentially serious problem with the underlying
> paths' teardown vs upper level mpath IO.
> 
> I'll try to revisit this week.  But I welcome input from others too.
> 
> (Just thinking about it further now, it could be that the way the clone
> request is allocated in the case of blk-mq DM is as part of the original
> request's pdu... meaning there isn't a proper get_request() call against
> the underlying queue.. so the expected refcounting likely isn't
> happening.  And given the request won't be free'd from that underlying
> request_queue there really isn't a need to artificially link these
> cloned requests with the underlying request_queue... so I'm now leaning
> toward just removing the WARN_ON_ONCE.. but I'll look closer tomorrow)

Hello Mike,

With CONFIG_SCSI_MQ_DEFAULT=y and CONFIG_DM_MQ_DEFAULT=n I just ran into
the bug report below. I will continue my v4.1-rc2 tests with SCSI_MQ=n.

[  288.035205] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
[  288.035415] IP: [<ffffffff812bda07>] blk_rq_prep_clone+0x87/0x160
[  288.035565] PGD a1890067 PUD a432f067 PMD 0 
[  288.035753] Oops: 0000 [#1] PREEMPT SMP 
[  288.035957] Modules linked in: dm_service_time dm_multipath scsi_dh netconsole configfs fuse dm_crypt xts gf128mul algif_skcipher af_alg loop rdma_ucm rdma_cm iw_cm ib_srp scsi_transport_srp ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en ptp pps_core mlx4_ib ib_sa iscsi_ibft ib_mad iscsi_boot_sysfs ib_core ib_addr af_packet mlx4_core iTCO_wdt tpm_infineon tpm_tis iTCO_vendor_support sky2 lpc_ich tpm mfd_core shpchp serio_raw acpi_cpufreq i2c_i801 asus_atk0110 button processor pcspkr coretemp dm_mod sr_mod cdrom ata_generic ata_piix firewire_ohci radeon firewire_core crc_itu_t i2c_algo_bit drm_kms_helper ttm drm pata_marvell floppy sg
[  288.040008] CPU: 0 PID: 2223 Comm: kdmwork-254:1 Not tainted 4.1.0-rc2-debug+ #4
[  288.040008] Hardware name: System manufacturer P5Q DELUXE/P5Q DELUXE, BIOS 2301    07/10/2009
[  288.040008] task: ffff8801a2f75180 ti: ffff88019d008000 task.ti: ffff88019d008000
[  288.040008] RIP: 0010:[<ffffffff812bda07>]  [<ffffffff812bda07>] blk_rq_prep_clone+0x87/0x160
[  288.040008] RSP: 0018:ffff88019d00bd38  EFLAGS: 00010246
[  288.040008] RAX: 0000000000000000 RBX: ffffffffa02914f0 RCX: 0000000000000001
[  288.040008] RDX: ffff8800a0cec660 RSI: ffff8801b7d22880 RDI: ffff8800a0cbed10
[  288.040008] RBP: ffff88019d00bd88 R08: 0000000000000020 R09: 0000000000000000
[  288.040008] R10: 0000000000000001 R11: ffff8800a0cbd200 R12: ffff8800a43cc618
[  288.040008] R13: ffff8801b7d22880 R14: ffff8800a0cbed10 R15: 0000000000000000
[  288.040008] FS:  0000000000000000(0000) GS:ffff8801bfc00000(0000) knlGS:0000000000000000
[  288.040008] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  288.040008] CR2: 0000000000000068 CR3: 00000000a1a15000 CR4: 00000000000407f0
[  288.040008] Stack:
[  288.040008]  ffff88019d00bda0 ffff88019b80c828 ffff8800a0cec660 00000020a0cec660
[  288.040008]  ffff8801b6101148 ffff8800a0cec660 0000000000000002 ffff88019b80c828
[  288.040008]  ffffc90001f12040 0000000000000000 ffff88019d00bdd8 ffffffffa0292a71
[  288.040008] Call Trace:
[  288.040008]  [<ffffffffa0292a71>] map_request.isra.39+0x191/0x230 [dm_mod]
[  288.040008]  [<ffffffffa0292b2a>] map_tio_request+0x1a/0x40 [dm_mod]
[  288.040008]  [<ffffffff8107318e>] kthread_worker_fn+0x7e/0x1b0
[  288.040008]  [<ffffffff81073110>] ? __init_kthread_worker+0x60/0x60
[  288.040008]  [<ffffffff81073099>] kthread+0xf9/0x110
[  288.040008]  [<ffffffff81072fa0>] ? kthread_create_on_node+0x230/0x230
[  288.040008]  [<ffffffff8160fee2>] ret_from_fork+0x42/0x70
[  288.040008]  [<ffffffff81072fa0>] ? kthread_create_on_node+0x230/0x230

# gdb vmlinux
(gdb) list *(blk_rq_prep_clone+0x87)
0xffffffff812bda07 is in blk_rq_prep_clone (block/blk-core.c:2976).
2971                            goto free_and_out;
2972
2973                    if (bio_ctr && bio_ctr(bio, bio_src, data))
2974                            goto free_and_out;
2975
2976                    if (rq->bio) {
2977                            rq->biotail->bi_next = bio;
2978                            rq->biotail = bio;
2979                    } else
2980                            rq->bio = rq->biotail = bio;

Bart.
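
A note on reading the oops above: the faulting address (CR2) is 0x68 and gdb points at the load of rq->bio. One plausible reading, which is an assumption and not something the log proves, is that the request pointer itself was NULL and that the bio member sits 0x68 bytes into the structure on this build. The sketch below uses a hypothetical fake_request whose padding is chosen to reproduce that offset; it is not the layout of the kernel's struct request.

```c
#include <stddef.h>
#include <stdint.h>

struct fake_bio;

/*
 * Hypothetical layout: the padding is picked so that 'bio' lands at
 * offset 0x68, matching the fault address in the report. The real
 * struct request layout depends on kernel version and config.
 */
struct fake_request {
	unsigned char earlier_fields[0x68];
	struct fake_bio *bio;
	struct fake_bio *biotail;
};

/* Address touched when reading a member through a given base pointer. */
static uintptr_t fault_address(uintptr_t base, size_t member_offset)
{
	return base + member_offset;
}
```

With a NULL base, fault_address(0, offsetof(struct fake_request, bio)) evaluates to 0x68, which is why small non-zero fault addresses usually point to a member access through a NULL structure pointer.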


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-06  7:45   ` Bart Van Assche
@ 2015-05-06 18:29     ` Mike Snitzer
  2015-05-07 10:19       ` Bart Van Assche
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2015-05-06 18:29 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Wed, May 06 2015 at  3:45am -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 05/06/15 04:23, Mike Snitzer wrote:
> > On Tue, May 05 2015 at 10:04am -0400,
> > Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >> While retesting my SRP initiator patches on top of kernel v4.1-rc2
> >> with DM_MQ_DEFAULT=y I ran into the kernel warning below. Does this
> >> mean that I'm missing any device mapper related patches ? This
> >> warning was reported shortly after scsi_remove_host() had been
> >> invoked.
> > 
> > I put the warning in place because, to me, if it triggers it speaks to
> > unsafe teardown occurring (request is still completing but the queue it
> > was issued from no longer exists).
> > 
> > Like I said before I'm open to removing the WARN_ON_ONCE() if this
> > scenario is perfectly valid.  But I just haven't had time to revisit
> > what appears to be a potentially serious problem with the underlying
> > paths' teardown vs upper level mpath IO.
> > 
> > I'll try to revisit this week.  But I welcome input from others too.
> > 
> > (Just thinking about it further now, it could be that the way the clone
> > request is allocated in the case of blk-mq DM is as part of the original
> > request's pdu... meaning there isn't a proper get_request() call against
> > the underlying queue.. so the expected refcounting likely isn't
> > happening.  And given the request won't be free'd from that underlying
> > request_queue there really isn't a need to artificially link these
> > cloned requests with the underlying request_queue... so I'm now leaning
> > toward just removing the WARN_ON_ONCE.. but I'll look closer tomorrow)
> 
> Hello Mike,
> 
> With CONFIG_SCSI_MQ_DEFAULT=y and CONFIG_DM_MQ_DEFAULT=n I just ran into
> the bug report below. I will continue my v4.1-rc2 tests with SCSI_MQ=n.

What were you doing when this happened?  Quite a strange place to get a
NULL pointer (it should be noted that for 4.2 hch's patch does away with
cloning the request's bios).  Is there an easy reproducer?  (Unlikely,
considering I've tested CONFIG_SCSI_MQ_DEFAULT=y and
CONFIG_DM_MQ_DEFAULT=n a fair amount.)

BTW, my "Just thinking about it further now" above was relative to
CONFIG_DM_MQ_DEFAULT=y and CONFIG_SCSI_MQ_DEFAULT=n.

Mike


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-06 18:29     ` Mike Snitzer
@ 2015-05-07 10:19       ` Bart Van Assche
  2015-05-27 12:57         ` Mike Snitzer
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2015-05-07 10:19 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 05/06/15 20:29, Mike Snitzer wrote:
> On Wed, May 06 2015 at  3:45am -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
>> On 05/06/15 04:23, Mike Snitzer wrote:
>>> On Tue, May 05 2015 at 10:04am -0400,
>>> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>>>> While retesting my SRP initiator patches on top of kernel v4.1-rc2
>>>> with DM_MQ_DEFAULT=y I ran into the kernel warning below. Does this
>>>> mean that I'm missing any device mapper related patches ? This
>>>> warning was reported shortly after scsi_remove_host() had been
>>>> invoked.
>>>
>>> I put the warning in place because, to me, if it triggers it speaks to
>>> unsafe teardown occurring (request is still completing but the queue it
>>> was issued from no longer exists).
>>>
>>> Like I said before I'm open to removing the WARN_ON_ONCE() if this
>>> scenario is perfectly valid.  But I just haven't had time to revisit
>>> what appears to be a potentially serious problem with the underlying
>>> paths' teardown vs upper level mpath IO.
>>>
>>> I'll try to revisit this week.  But I welcome input from others too.
>>>
>>> (Just thinking about it further now, it could be that the way the clone
>>> request is allocated in the case of blk-mq DM is as part of the original
>>> request's pdu... meaning there isn't a proper get_request() call against
>>> the underlying queue.. so the expected refcounting likely isn't
>>> happening.  And given the request won't be free'd from that underlying
>>> request_queue there really isn't a need to artificially link these
>>> cloned requests with the underlying request_queue... so I'm now leaning
>>> toward just removing the WARN_ON_ONCE.. but I'll look closer tomorrow)
>>
>> Hello Mike,
>>
>> With CONFIG_SCSI_MQ_DEFAULT=y and CONFIG_DM_MQ_DEFAULT=n I just ran into
>> the bug report below. I will continue my v4.1-rc2 tests with SCSI_MQ=n.
>
> What were you doing when this happened?  Quite a strange place to get a
> NULL pointer (it should be noted that for 4.2 hch's patch does away with
> cloning the request's bios).  Is there an easy reproducer (unlikely
> considering I've tested CONFIG_SCSI_MQ_DEFAULT=y and
> CONFIG_DM_MQ_DEFAULT=n a fair amount).
>
> BTW, my "Just thinking about it further now" above was relative to
> CONFIG_DM_MQ_DEFAULT=y and CONFIG_SCSI_MQ_DEFAULT=n.

Hello Mike,

With kernel v4.1-rc2, CONFIG_SCSI_MQ_DEFAULT=y and CONFIG_DM_MQ_DEFAULT=n,
I run the following command, which triggers a call of scsi_remove_host():

for p in /sys/class/srp_remote_ports/*; do echo 1 > $p/delete; done

If no I/O is running, that command works fine. But if I run the same
command while I/O is running, the message "BUG: unable to handle kernel
NULL pointer dereference at 0000000000000068 / IP:
blk_rq_prep_clone+0x87/0x160" appears. I just reproduced this after
having rebuilt the kernel after a "make clean".

Bart.


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-07 10:19       ` Bart Van Assche
@ 2015-05-27 12:57         ` Mike Snitzer
  2015-05-27 15:29           ` Bart Van Assche
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2015-05-27 12:57 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Thu, May 07 2015 at  6:19am -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 05/06/15 20:29, Mike Snitzer wrote:
> >On Wed, May 06 2015 at  3:45am -0400,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >
> >>On 05/06/15 04:23, Mike Snitzer wrote:
> >>>On Tue, May 05 2015 at 10:04am -0400,
> >>>Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>>>While retesting my SRP initiator patches on top of kernel v4.1-rc2
> >>>>with DM_MQ_DEFAULT=y I ran into the kernel warning below. Does this
> >>>>mean that I'm missing any device mapper related patches ? This
> >>>>warning was reported shortly after scsi_remove_host() had been
> >>>>invoked.
> >>>
> >>>I put the warning in place because, to me, if it triggers it speaks to
> >>>unsafe teardown occurring (request is still completing but the queue it
> >>>was issued from no longer exists).
> >>>
> >>>Like I said before I'm open to removing the WARN_ON_ONCE() if this
> >>>scenario is perfectly valid.  But I just haven't had time to revisit
> >>>what appears to be a potentially serious problem with the underlying
> >>>paths' teardown vs upper level mpath IO.
> >>>
> >>>I'll try to revisit this week.  But I welcome input from others too.
> >>>
> >>>(Just thinking about it further now, it could be that the way the clone
> >>>request is allocated in the case of blk-mq DM is as part of the original
> >>>request's pdu... meaning there isn't a proper get_request() call against
> >>>the underlying queue.. so the expected refcounting likely isn't
> >>>happening.  And given the request won't be free'd from that underlying
> >>>request_queue there really isn't a need to artificially link these
> >>>cloned requests with the underlying request_queue... so I'm now leaning
> >>>toward just removing the WARN_ON_ONCE.. but I'll look closer tomorrow)
> >>
> >>Hello Mike,
> >>
> >>With CONFIG_SCSI_MQ_DEFAULT=y and CONFIG_DM_MQ_DEFAULT=n I just ran into
> >>the bug report below. I will continue my v4.1-rc2 tests with SCSI_MQ=n.
> >
> >What were you doing when this happened?  Quite a strange place to get a
> >NULL pointer (it should be noted that for 4.2 hch's patch does away with
> >cloning the request's bios).  Is there an easy reproducer (unlikely
> >considering I've tested CONFIG_SCSI_MQ_DEFAULT=y and
> >CONFIG_DM_MQ_DEFAULT=n a fair amount).
> >
> >BTW, my "Just thinking about it further now" above was relative to
> >CONFIG_DM_MQ_DEFAULT=y and CONFIG_SCSI_MQ_DEFAULT=n.
> 
> Hello Mike,
> 
> With kernel v4.1-rc2, with CONFIG_SCSI_MQ_DEFAULT=y and
> CONFIG_DM_MQ_DEFAULT=n if I run "for p in
> /sys/class/srp_remote_ports/*; do echo 1 > $p/delete; done" if no
> I/O is running that command works fine. That command triggers a call
> of scsi_remove_host(). But if I run the same command while I/O is
> running the message "BUG: unable to handle kernel NULL pointer
> dereference at 0000000000000068 / IP: blk_rq_prep_clone+0x87/0x160"
> appears. I just reproduced this after having rebuilt the kernel
> after a "make clean".

Hey Bart,

Looks like Junichi likely fixed this issue you reported, please try this
patch: https://patchwork.kernel.org/patch/6487321/

Thanks,
Mike


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-27 12:57         ` Mike Snitzer
@ 2015-05-27 15:29           ` Bart Van Assche
  2015-05-27 15:33             ` Bart Van Assche
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2015-05-27 15:29 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 05/27/15 14:57, Mike Snitzer wrote:
> Looks like Junichi likely fixed this issue you reported, please try this
> patch: https://patchwork.kernel.org/patch/6487321/

Hello Mike,

On a setup on which an I/O verification test passes with 
blk-mq/scsi-mq/dm-mq disabled, this is what fio reports after a few 
minutes with scsi-mq and dm-mq enabled:

test: Laying out IO file(s) (1 file(s) / 10MB)
fio: io_u error on file /mnt/test.0.0: Input/output error: write offset=8327168, buflen=4096
fio: io_u error on file /mnt/test.0.0: Input/output error: write offset=9007104, buflen=4096
fio: pid=4568, err=5/file:io_u.c:1564, func=io_u error, error=Input/output error

Bart.


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-27 15:29           ` Bart Van Assche
@ 2015-05-27 15:33             ` Bart Van Assche
  2015-05-27 16:14               ` Mike Snitzer
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2015-05-27 15:33 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 05/27/15 17:29, Bart Van Assche wrote:
> On 05/27/15 14:57, Mike Snitzer wrote:
>> Looks like Junichi likely fixed this issue you reported, please try this
>> patch: https://patchwork.kernel.org/patch/6487321/
>
> Hello Mike,
>
> On a setup on which an I/O verification test passes with
> blk-mq/scsi-mq/dm-mq disabled, this is what fio reports after a few
> minutes with scsi-mq and dm-mq enabled:
>
> test: Laying out IO file(s) (1 file(s) / 10MB)
> fio: io_u error on file /mnt/test.0.0: Input/output error: write
> offset=8327168, buflen=4096
> fio: io_u error on file /mnt/test.0.0: Input/output error: write
> offset=9007104, buflen=4096
> fio: pid=4568, err=5/file:io_u.c:1564, func=io_u error,
> error=Input/output error

(replying to my own e-mail)

BTW, on the same test setup kmemleak reports several memory leaks, e.g. 
this one:

unreferenced object 0xffff88009b14e2b0 (size 16):
   comm "fio", pid 4274, jiffies 4294978034 (age 1253.210s)
   hex dump (first 16 bytes):
     40 12 f3 99 01 88 ff ff 00 10 00 00 00 00 00 00  @...............
   backtrace:
     [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
     [<ffffffff811679a8>] kmem_cache_alloc+0xf8/0x160
     [<ffffffff8111c950>] mempool_alloc_slab+0x10/0x20
     [<ffffffff8111cb37>] mempool_alloc+0x57/0x150
     [<ffffffffa04d2b61>] __multipath_map.isra.17+0xe1/0x220 [dm_multipath]
     [<ffffffffa04d2cb5>] multipath_clone_and_map+0x15/0x20 [dm_multipath]
     [<ffffffffa02889b5>] map_request.isra.39+0xd5/0x220 [dm_mod]
     [<ffffffffa028b0e4>] dm_mq_queue_rq+0x134/0x240 [dm_mod]
     [<ffffffff812cccb5>] __blk_mq_run_hw_queue+0x1d5/0x380
     [<ffffffff812ccaa5>] blk_mq_run_hw_queue+0xc5/0x100
     [<ffffffff812ce350>] blk_sq_make_request+0x240/0x300
     [<ffffffff812c0f30>] generic_make_request+0xc0/0x110
     [<ffffffff812c0ff2>] submit_bio+0x72/0x150
     [<ffffffff811c07cb>] do_blockdev_direct_IO+0x1f3b/0x2da0
     [<ffffffff811c166e>] __blockdev_direct_IO+0x3e/0x40
     [<ffffffff8120aa1a>] ext4_direct_IO+0x1aa/0x390

Bart.


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-27 15:33             ` Bart Van Assche
@ 2015-05-27 16:14               ` Mike Snitzer
  2015-05-27 17:00                 ` Mike Snitzer
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2015-05-27 16:14 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Wed, May 27 2015 at 11:33am -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 05/27/15 17:29, Bart Van Assche wrote:
> >On 05/27/15 14:57, Mike Snitzer wrote:
> >>Looks like Junichi likely fixed this issue you reported, please try this
> >>patch: https://patchwork.kernel.org/patch/6487321/
> >
> >Hello Mike,
> >
> >On a setup on which an I/O verification test passes with
> >blk-mq/scsi-mq/dm-mq disabled, this is what fio reports after a few
> >minutes with scsi-mq and dm-mq enabled:
> >
> >test: Laying out IO file(s) (1 file(s) / 10MB)
> >fio: io_u error on file /mnt/test.0.0: Input/output error: write
> >offset=8327168, buflen=4096
> >fio: io_u error on file /mnt/test.0.0: Input/output error: write
> >offset=9007104, buflen=4096
> >fio: pid=4568, err=5/file:io_u.c:1564, func=io_u error,
> >error=Input/output error

I'll look closer at this.. so NULL pointer is fixed but this test hits
IO errors.

> (replying to my own e-mail)
> 
> BTW, on the same test setup kmemleak reports several memory leaks,
> e.g. this one:
> 
> unreferenced object 0xffff88009b14e2b0 (size 16):
>   comm "fio", pid 4274, jiffies 4294978034 (age 1253.210s)
>   hex dump (first 16 bytes):
>     40 12 f3 99 01 88 ff ff 00 10 00 00 00 00 00 00  @...............
>   backtrace:
>     [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
>     [<ffffffff811679a8>] kmem_cache_alloc+0xf8/0x160
>     [<ffffffff8111c950>] mempool_alloc_slab+0x10/0x20
>     [<ffffffff8111cb37>] mempool_alloc+0x57/0x150
>     [<ffffffffa04d2b61>] __multipath_map.isra.17+0xe1/0x220 [dm_multipath]
>     [<ffffffffa04d2cb5>] multipath_clone_and_map+0x15/0x20 [dm_multipath]
>     [<ffffffffa02889b5>] map_request.isra.39+0xd5/0x220 [dm_mod]
>     [<ffffffffa028b0e4>] dm_mq_queue_rq+0x134/0x240 [dm_mod]
>     [<ffffffff812cccb5>] __blk_mq_run_hw_queue+0x1d5/0x380
>     [<ffffffff812ccaa5>] blk_mq_run_hw_queue+0xc5/0x100
>     [<ffffffff812ce350>] blk_sq_make_request+0x240/0x300
>     [<ffffffff812c0f30>] generic_make_request+0xc0/0x110
>     [<ffffffff812c0ff2>] submit_bio+0x72/0x150
>     [<ffffffff811c07cb>] do_blockdev_direct_IO+0x1f3b/0x2da0
>     [<ffffffff811c166e>] __blockdev_direct_IO+0x3e/0x40
>     [<ffffffff8120aa1a>] ext4_direct_IO+0x1aa/0x390

It would appear there is potential for an early return from
dm-mpath.c:__multipath_map() to leak the dm_mpath_io.

Please add this patch:

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 6395347..eff7bdd 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -429,9 +429,11 @@ static int __multipath_map(struct dm_target *ti, struct request *clone,
 		/* blk-mq request-based interface */
 		*__clone = blk_get_request(bdev_get_queue(bdev),
 					   rq_data_dir(rq), GFP_ATOMIC);
-		if (IS_ERR(*__clone))
+		if (IS_ERR(*__clone)) {
 			/* ENOMEM, requeue */
+			clear_mapinfo(m, map_context);
 			return r;
+		}
 		(*__clone)->bio = (*__clone)->biotail = NULL;
 		(*__clone)->rq_disk = bdev->bd_disk;
 		(*__clone)->cmd_flags |= REQ_FAILFAST_TRANSPORT;
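
For readers following the thread, the leak pattern this patch addresses can be modelled in userspace roughly as follows. This is a sketch under assumptions: set_mapinfo() here is a hypothetical stand-in for whatever allocates the per-request dm_mpath_io, clear_mapinfo() only mirrors the name used in the patch, and clone_alloc_fails simulates blk_get_request() returning an error.

```c
#include <stdlib.h>

struct fake_mpio { int dummy; };
struct fake_map_context { struct fake_mpio *ptr; };

/* Counts per-request contexts that are currently allocated. */
static int live_allocations;

/* Hypothetical stand-in: allocate the per-request context. */
static int set_mapinfo(struct fake_map_context *ctx)
{
	ctx->ptr = malloc(sizeof(*ctx->ptr));
	if (!ctx->ptr)
		return -1;
	live_allocations++;
	return 0;
}

/* Release the per-request context again. */
static void clear_mapinfo(struct fake_map_context *ctx)
{
	free(ctx->ptr);
	ctx->ptr = NULL;
	live_allocations--;
}

enum { MAP_REMAPPED = 0, MAP_REQUEUE = 1 };

/*
 * 'fixed' selects the behaviour with the extra clear_mapinfo() call on
 * the early-return path; without it the context stays allocated.
 */
static int map_one(struct fake_map_context *ctx, int clone_alloc_fails,
		   int fixed)
{
	if (set_mapinfo(ctx))
		return MAP_REQUEUE;

	if (clone_alloc_fails) {
		if (fixed)
			clear_mapinfo(ctx);
		return MAP_REQUEUE;
	}
	return MAP_REMAPPED;
}
```

In this model a failed clone allocation with fixed == 0 leaves live_allocations at 1 although the request is merely requeued; with fixed == 1 the counter returns to 0, which is the effect the added clear_mapinfo() call is meant to have.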


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-27 16:14               ` Mike Snitzer
@ 2015-05-27 17:00                 ` Mike Snitzer
  2015-05-27 22:37                   ` Mike Snitzer
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2015-05-27 17:00 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Wed, May 27 2015 at 12:14pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Wed, May 27 2015 at 11:33am -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> 
> > On 05/27/15 17:29, Bart Van Assche wrote:
> > >On 05/27/15 14:57, Mike Snitzer wrote:
> > >>Looks like Junichi likely fixed this issue you reported, please try this
> > >>patch: https://patchwork.kernel.org/patch/6487321/
> > >
> > >Hello Mike,
> > >
> > >On a setup on which an I/O verification test passes with
> > >blk-mq/scsi-mq/dm-mq disabled, this is what fio reports after a few
> > >minutes with scsi-mq and dm-mq enabled:
> > >
> > >test: Laying out IO file(s) (1 file(s) / 10MB)
> > >fio: io_u error on file /mnt/test.0.0: Input/output error: write
> > >offset=8327168, buflen=4096
> > >fio: io_u error on file /mnt/test.0.0: Input/output error: write
> > >offset=9007104, buflen=4096
> > >fio: pid=4568, err=5/file:io_u.c:1564, func=io_u error,
> > >error=Input/output error
> 
> I'll look closer at this.. so NULL pointer is fixed but this test hits
> IO errors.

Further code inspection revealed an issue with dm-mq enabled but scsi-mq
disabled (when requeuing the original request after clone_rq() failure DM
core wasn't unwinding the dm_start_request() accounting).  The following
patch will fix this issue.  I've also switched the dm-mq on scsi-mq case
to return BLK_MQ_RQ_QUEUE_BUSY directly (like hch suggested last week).
I have no idea if this would actually fix your case (would be surprising
but worth a shot I suppose).

Anyway, feel free to try this patch:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 85966ee..02e2d1f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1115,23 +1115,37 @@ static void old_requeue_request(struct request *rq)
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 
-static void dm_requeue_original_request(struct mapped_device *md,
-					struct request *rq)
+static void __dm_requeue_original_request(struct mapped_device *md,
+					  struct request *rq, bool in_blk_mq_queue_rq)
 {
 	int rw = rq_data_dir(rq);
 
 	dm_unprep_request(rq);
 
-	if (!rq->q->mq_ops)
-		old_requeue_request(rq);
-	else {
-		blk_mq_requeue_request(rq);
-		blk_mq_kick_requeue_list(rq->q);
+	if (!in_blk_mq_queue_rq) {
+		if (!rq->q->mq_ops)
+			old_requeue_request(rq);
+		else {
+			blk_mq_requeue_request(rq);
+			blk_mq_kick_requeue_list(rq->q);
+		}
 	}
 
 	rq_completed(md, rw, false);
 }
 
+static void dm_requeue_original_request(struct mapped_device *md,
+					struct request *rq)
+{
+	return __dm_requeue_original_request(md, rq, false);
+}
+
+static void dm_unprep_before_requeuing_original_request(struct mapped_device *md,
+							struct request *rq)
+{
+	return __dm_requeue_original_request(md, rq, true);
+}
+
 static void old_stop_queue(struct request_queue *q)
 {
 	unsigned long flags;
@@ -2679,15 +2693,18 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 		/* clone request is allocated at the end of the pdu */
 		tio->clone = (void *)blk_mq_rq_to_pdu(rq) + sizeof(struct dm_rq_target_io);
 		if (!clone_rq(rq, md, tio, GFP_ATOMIC))
-			return BLK_MQ_RQ_QUEUE_BUSY;
+			goto out_requeue;
 		queue_kthread_work(&md->kworker, &tio->work);
 	} else {
 		/* Direct call is fine since .queue_rq allows allocations */
 		if (map_request(tio, rq, md) == DM_MAPIO_REQUEUE)
-			dm_requeue_original_request(md, rq);
+			goto out_requeue;
 	}
 
 	return BLK_MQ_RQ_QUEUE_OK;
+out_requeue:
+	dm_unprep_before_requeuing_original_request(md, rq);
+	return BLK_MQ_RQ_QUEUE_BUSY;
 }
 
 static struct blk_mq_ops dm_mq_ops = {
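
For readers following the thread, the accounting issue described above can be illustrated with a small userspace model. It is only a sketch: fake_md, start_request(), unwind_request() and queue_rq() are hypothetical names, and the single pending counter stands in for whatever dm_start_request() accounts for in DM core.

```c
#include <stdbool.h>

enum queue_rq_status { QUEUE_OK, QUEUE_BUSY };

/* Hypothetical stand-in for the mapped device's in-flight accounting. */
struct fake_md {
	int pending;
};

static void start_request(struct fake_md *md)
{
	md->pending++;
}

static void unwind_request(struct fake_md *md)
{
	md->pending--;
}

/*
 * Model of a .queue_rq handler: the request is accounted for first, then
 * cloning may fail. Returning "busy" asks the caller to retry later, so
 * the accounting done for this attempt has to be undone first.
 */
static enum queue_rq_status queue_rq(struct fake_md *md, bool clone_fails,
				     bool unwind_on_failure)
{
	start_request(md);

	if (clone_fails) {
		if (unwind_on_failure)
			unwind_request(md);
		return QUEUE_BUSY;
	}
	return QUEUE_OK;
}
```

Without the unwind, every failed attempt leaves pending one higher than the number of requests actually in flight, so a later "wait until pending reaches zero" step in this model would never complete.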


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-27 17:00                 ` Mike Snitzer
@ 2015-05-27 22:37                   ` Mike Snitzer
  2015-05-28  8:19                     ` Bart Van Assche
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2015-05-27 22:37 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Wed, May 27 2015 at  1:00pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Wed, May 27 2015 at 12:14pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
> > On Wed, May 27 2015 at 11:33am -0400,
> > Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> > 
> > > On 05/27/15 17:29, Bart Van Assche wrote:
> > > >On 05/27/15 14:57, Mike Snitzer wrote:
> > > >>Looks like Junichi likely fixed this issue you reported, please try this
> > > >>patch: https://patchwork.kernel.org/patch/6487321/
> > > >
> > > >Hello Mike,
> > > >
> > > >On a setup on which an I/O verification test passes with
> > > >blk-mq/scsi-mq/dm-mq disabled, this is what fio reports after a few
> > > >minutes with scsi-mq and dm-mq enabled:
> > > >
> > > >test: Laying out IO file(s) (1 file(s) / 10MB)
> > > >fio: io_u error on file /mnt/test.0.0: Input/output error: write
> > > >offset=8327168, buflen=4096
> > > >fio: io_u error on file /mnt/test.0.0: Input/output error: write
> > > >offset=9007104, buflen=4096
> > > >fio: pid=4568, err=5/file:io_u.c:1564, func=io_u error,
> > > >error=Input/output error
> > 
> > I'll look closer at this.. so NULL pointer is fixed but this test hits
> > IO errors.
> 
> Further code inspection revealed an issue with dm-mq enabled but scsi-mq
> disabled (when requeuing the original request after clone_rq() failure DM
> core wasn't unwinding the dm_start_request() accounting).  The following
> patch will fix this issue.  I've also switched the dm-mq on scsi-mq case
> to return BLK_MQ_RQ_QUEUE_BUSY directly (like hch suggested last week).
> I have no idea if this would actually fix your case (would be surprising
> but worth a shot I suppose).
> 
> Anyway, feel free to try this patch:

FYI, I've staged a variant patch for 4.1 that is simpler; along with the
various fixes I've picked up from Junichi and the leak fix I emailed
earlier.  They are now in linux-next and available in this 'dm-4.1'
specific branch (based on 4.1-rc5):
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.1

Please try and let me know if your test works.

I don't have an SRP setup; otherwise I'd try the reproducer you shared a
while ago.  Any chance you're aware of a way to reproduce with LIO (and
tcm utils)?

Mike


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-27 22:37                   ` Mike Snitzer
@ 2015-05-28  8:19                     ` Bart Van Assche
  2015-05-28 13:10                       ` Mike Snitzer
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2015-05-28  8:19 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development, Christoph Hellwig

On 05/28/15 00:37, Mike Snitzer wrote:
> FYI, I've staged a variant patch for 4.1 that is simpler; along with the
> various fixes I've picked up from Junichi and the leak fix I emailed
> earlier.  They are now in linux-next and available in this 'dm-4.1'
> specific branch (based on 4.1-rc5):
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.1
>
> Please try and let me know if your test works.

No data corruption was reported this time but a very large number of 
memory leaks were reported by kmemleak. The initiator system ran out of 
memory after some time due to these leaks. Here is an example of a leak 
reported by kmemleak:

unreferenced object 0xffff8800a39fc1a8 (size 96):
    comm "srp_daemon", pid 2116, jiffies 4294955508 (age 137.600s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
      [<ffffffff81167d19>] kmem_cache_alloc_node+0xd9/0x190
      [<ffffffff81425400>] scsi_init_request+0x20/0x40
      [<ffffffff812cbb98>] blk_mq_init_rq_map+0x228/0x290
      [<ffffffff812cbcc6>] blk_mq_alloc_tag_set+0xc6/0x220
      [<ffffffff81427488>] scsi_mq_setup_tags+0xc8/0xd0
      [<ffffffff8141e34f>] scsi_add_host_with_dma+0x6f/0x300
      [<ffffffffa04c62bf>] srp_create_target+0x11cf/0x1600 [ib_srp]
      [<ffffffff813f9c93>] dev_attr_store+0x13/0x20
      [<ffffffff81200a33>] sysfs_kf_write+0x43/0x60
      [<ffffffff811fff8b>] kernfs_fop_write+0x13b/0x1a0
      [<ffffffff81183e53>] __vfs_write+0x23/0xe0
      [<ffffffff81184524>] vfs_write+0xa4/0x1b0
      [<ffffffff811852d4>] SyS_write+0x44/0xb0
      [<ffffffff81613cdb>] system_call_fastpath+0x16/0x73
      [<ffffffffffffffff>] 0xffffffffffffffff

> I don't have an SRP setup; otherwise I'd try the reproducer you shared a
> while ago.  Any chance you're aware of a way to reproduce with LIO (and
> tcm utils)?

If I find a way to reproduce this with LIO I'll let you know.

Bart.


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-28  8:19                     ` Bart Van Assche
@ 2015-05-28 13:10                       ` Mike Snitzer
  2015-05-28 14:07                         ` Mike Snitzer
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2015-05-28 13:10 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, Christoph Hellwig

On Thu, May 28 2015 at  4:19am -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 05/28/15 00:37, Mike Snitzer wrote:
> >FYI, I've staged a variant patch for 4.1 that is simpler; along with the
> >various fixes I've picked up from Junichi and the leak fix I emailed
> >earlier.  They are now in linux-next and available in this 'dm-4.1'
> >specific branch (based on 4.1-rc5):
> >https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.1
> >
> >Please try and let me know if your test works.
> 
> No data corruption was reported this time but a very large number of
> memory leaks were reported by kmemleak. The initiator system ran out
> of memory after some time due to these leaks. Here is an example of
> a leak reported by kmemleak:
> 
> unreferenced object 0xffff8800a39fc1a8 (size 96):
>    comm "srp_daemon", pid 2116, jiffies 4294955508 (age 137.600s)
>    hex dump (first 32 bytes):
>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>    backtrace:
>      [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
>      [<ffffffff81167d19>] kmem_cache_alloc_node+0xd9/0x190
>      [<ffffffff81425400>] scsi_init_request+0x20/0x40
>      [<ffffffff812cbb98>] blk_mq_init_rq_map+0x228/0x290
>      [<ffffffff812cbcc6>] blk_mq_alloc_tag_set+0xc6/0x220
>      [<ffffffff81427488>] scsi_mq_setup_tags+0xc8/0xd0
>      [<ffffffff8141e34f>] scsi_add_host_with_dma+0x6f/0x300
>      [<ffffffffa04c62bf>] srp_create_target+0x11cf/0x1600 [ib_srp]
>      [<ffffffff813f9c93>] dev_attr_store+0x13/0x20
>      [<ffffffff81200a33>] sysfs_kf_write+0x43/0x60
>      [<ffffffff811fff8b>] kernfs_fop_write+0x13b/0x1a0
>      [<ffffffff81183e53>] __vfs_write+0x23/0xe0
>      [<ffffffff81184524>] vfs_write+0xa4/0x1b0
>      [<ffffffff811852d4>] SyS_write+0x44/0xb0
>      [<ffffffff81613cdb>] system_call_fastpath+0x16/0x73
>      [<ffffffffffffffff>] 0xffffffffffffffff

I suspect I'm missing some cleanup of the request I got from the
underlying blk-mq device.  I'll have a closer look.

> >I don't have an SRP setup; otherwise I'd try the reproducer you shared a
> >while ago.  Any chance you're aware of a way to reproduce with LIO (and
> >tcm utils)?
> 
> If I find a way to reproduce this with LIO I'll let you know.

Could you try Junichi's script that he posted today to see if it at
least shows the leak?  I'll do the same but I need to rebuild with
kmemleak enabled, etc.


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-28 13:10                       ` Mike Snitzer
@ 2015-05-28 14:07                         ` Mike Snitzer
  2015-05-28 14:54                           ` Bart Van Assche
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2015-05-28 14:07 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Junichi Nomura, device-mapper development, Christoph Hellwig

On Thu, May 28 2015 at  9:10P -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Thu, May 28 2015 at  4:19am -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> 
> > On 05/28/15 00:37, Mike Snitzer wrote:
> > >FYI, I've staged a variant patch for 4.1 that is simpler; along with the
> > >various fixes I've picked up from Junichi and the leak fix I emailed
> > >earlier.  They are now in linux-next and available in this 'dm-4.1'
> > >specific branch (based on 4.1-rc5):
> > >https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.1
> > >
> > >Please try and let me know if your test works.
> > 
> > No data corruption was reported this time but a very large number of
> > memory leaks were reported by kmemleak. The initiator system ran out
> > of memory after some time due to these leaks. Here is an example of
> > a leak reported by kmemleak:
> > 
> > unreferenced object 0xffff8800a39fc1a8 (size 96):
> >    comm "srp_daemon", pid 2116, jiffies 4294955508 (age 137.600s)
> >    hex dump (first 32 bytes):
> >      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >    backtrace:
> >      [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
> >      [<ffffffff81167d19>] kmem_cache_alloc_node+0xd9/0x190
> >      [<ffffffff81425400>] scsi_init_request+0x20/0x40
> >      [<ffffffff812cbb98>] blk_mq_init_rq_map+0x228/0x290
> >      [<ffffffff812cbcc6>] blk_mq_alloc_tag_set+0xc6/0x220
> >      [<ffffffff81427488>] scsi_mq_setup_tags+0xc8/0xd0
> >      [<ffffffff8141e34f>] scsi_add_host_with_dma+0x6f/0x300
> >      [<ffffffffa04c62bf>] srp_create_target+0x11cf/0x1600 [ib_srp]
> >      [<ffffffff813f9c93>] dev_attr_store+0x13/0x20
> >      [<ffffffff81200a33>] sysfs_kf_write+0x43/0x60
> >      [<ffffffff811fff8b>] kernfs_fop_write+0x13b/0x1a0
> >      [<ffffffff81183e53>] __vfs_write+0x23/0xe0
> >      [<ffffffff81184524>] vfs_write+0xa4/0x1b0
> >      [<ffffffff811852d4>] SyS_write+0x44/0xb0
> >      [<ffffffff81613cdb>] system_call_fastpath+0x16/0x73
> >      [<ffffffffffffffff>] 0xffffffffffffffff
> 
> I suspect I'm missing some cleanup of the request I got from the
> underlying blk-mq device.  I'll have a closer look.

BTW, your test was with the dm-4.1 branch right?

The above kmemleak trace clearly speaks to dm-mpath's ->clone_and_map_rq
having allocated the underlying scsi-mq request.  So it'll later require
a call to dm-mpath's ->release_clone_rq to free the associated memory --
which happens in dm.c:free_rq_clone().

But I'm not yet seeing where we'd be missing a required call to
free_rq_clone() in the DM core error paths.  You can try this patch to
see if you hit the WARN_ON but I highly doubt you would.. similarly the
clone request shouldn't ever be allocated (nor tio->clone initialized)
in the REQUEUE case:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1badfb2..2db936f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1972,8 +1972,10 @@ static int map_request(struct dm_rq_target_io *tio, struct request *rq,
 			dm_kill_unmapped_request(rq, r);
 			return r;
 		}
-		if (r != DM_MAPIO_REMAPPED)
+		if (r != DM_MAPIO_REMAPPED) {
+			WARN_ON_ONCE(clone && !IS_ERR(clone));
 			return r;
+		}
 		if (setup_clone(clone, rq, tio, GFP_ATOMIC)) {
 			/* -ENOMEM */
 			ti->type->release_clone_rq(clone);
@@ -2759,7 +2761,8 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 	} else {
 		/* Direct call is fine since .queue_rq allows allocations */
 		if (map_request(tio, rq, md) == DM_MAPIO_REQUEUE) {
-			/* Undo dm_start_request() before requeuing */
+			/* Free clone and undo dm_start_request() before requeuing */
+			dm_unprep_request(rq);
 			rq_completed(md, rq_data_dir(rq), false);
 			return BLK_MQ_RQ_QUEUE_BUSY;
 		}
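
The two invariants discussed above (every clone obtained from the target's
clone-and-map hook is released exactly once, and the start-of-request
accounting is undone before a requeue) can be modelled outside the kernel.
The following is only a user-space sketch: all types and function names are
hypothetical stand-ins, not the definitions in drivers/md/dm.c, and the
"pending" counter merely imitates the accounting done by dm_start_request().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's types and return codes. */
enum map_result { MAPIO_REMAPPED, MAPIO_REQUEUE };
enum queue_status { QUEUE_OK, QUEUE_BUSY };

struct clone { int unused; };
struct tio { struct clone *clone; };   /* NULL until a clone is attached */
struct mapped_dev { int pending; };    /* imitates in-flight accounting */

static int live_clones;                /* allocated and not yet released */

/* Models a target's clone-and-map hook: allocates only if it can map. */
static enum map_result clone_and_map(int path_available, struct clone **out)
{
	*out = NULL;
	if (!path_available)
		return MAPIO_REQUEUE;
	*out = malloc(sizeof(**out));
	if (!*out)
		return MAPIO_REQUEUE;
	live_clones++;
	return MAPIO_REMAPPED;
}

/* Models the release hook that balances every successful allocation. */
static void release_clone(struct clone *c)
{
	if (!c)
		return;
	free(c);
	live_clones--;
}

/* Models map_request(): a clone must not exist unless the map succeeded. */
static enum map_result map_request(struct tio *tio, int path_available)
{
	struct clone *c;
	enum map_result r = clone_and_map(path_available, &c);

	if (r != MAPIO_REMAPPED) {
		assert(c == NULL);     /* same idea as the WARN_ON_ONCE above */
		return r;
	}
	tio->clone = c;
	return r;
}

/* Models the .queue_rq path: undo the accounting before reporting busy. */
static enum queue_status queue_rq(struct mapped_dev *md, struct tio *tio,
				  int path_available)
{
	md->pending++;                 /* start-of-request accounting */
	if (map_request(tio, path_available) == MAPIO_REQUEUE) {
		md->pending--;         /* unwind before the request is requeued */
		return QUEUE_BUSY;
	}
	return QUEUE_OK;
}

/* Models completion: release the clone, then drop the accounting. */
static void complete_request(struct mapped_dev *md, struct tio *tio)
{
	release_clone(tio->clone);
	tio->clone = NULL;
	md->pending--;
}
```

In this model a requeued request leaves both live_clones and md->pending
unchanged, which is the property the real error paths are expected to have.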


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-28 14:07                         ` Mike Snitzer
@ 2015-05-28 14:54                           ` Bart Van Assche
  2015-05-28 15:06                             ` Mike Snitzer
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2015-05-28 14:54 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Junichi Nomura, device-mapper development, Christoph Hellwig

On 05/28/15 16:07, Mike Snitzer wrote:
> On Thu, May 28 2015 at  9:10P -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>
>> On Thu, May 28 2015 at  4:19am -0400,
>> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>>
>>> On 05/28/15 00:37, Mike Snitzer wrote:
>>>> FYI, I've staged a variant patch for 4.1 that is simpler; along with the
>>>> various fixes I've picked up from Junichi and the leak fix I emailed
>>>> earlier.  They are now in linux-next and available in this 'dm-4.1'
>>>> specific branch (based on 4.1-rc5):
>>>> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.1
>>>>
>>>> Please try and let me know if your test works.
>>>
>>> No data corruption was reported this time but a very large number of
>>> memory leaks were reported by kmemleak. The initiator system ran out
>>> of memory after some time due to these leaks. Here is an example of
>>> a leak reported by kmemleak:
>>>
>>> unreferenced object 0xffff8800a39fc1a8 (size 96):
>>>     comm "srp_daemon", pid 2116, jiffies 4294955508 (age 137.600s)
>>>     hex dump (first 32 bytes):
>>>       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>     backtrace:
>>>       [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
>>>       [<ffffffff81167d19>] kmem_cache_alloc_node+0xd9/0x190
>>>       [<ffffffff81425400>] scsi_init_request+0x20/0x40
>>>       [<ffffffff812cbb98>] blk_mq_init_rq_map+0x228/0x290
>>>       [<ffffffff812cbcc6>] blk_mq_alloc_tag_set+0xc6/0x220
>>>       [<ffffffff81427488>] scsi_mq_setup_tags+0xc8/0xd0
>>>       [<ffffffff8141e34f>] scsi_add_host_with_dma+0x6f/0x300
>>>       [<ffffffffa04c62bf>] srp_create_target+0x11cf/0x1600 [ib_srp]
>>>       [<ffffffff813f9c93>] dev_attr_store+0x13/0x20
>>>       [<ffffffff81200a33>] sysfs_kf_write+0x43/0x60
>>>       [<ffffffff811fff8b>] kernfs_fop_write+0x13b/0x1a0
>>>       [<ffffffff81183e53>] __vfs_write+0x23/0xe0
>>>       [<ffffffff81184524>] vfs_write+0xa4/0x1b0
>>>       [<ffffffff811852d4>] SyS_write+0x44/0xb0
>>>       [<ffffffff81613cdb>] system_call_fastpath+0x16/0x73
>>>       [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> I suspect I'm missing some cleanup of the request I got from the
>> underlying blk-mq device.  I'll have a closer look.
>
> BTW, your test was with the dm-4.1 branch right?
>
> The above kmemleak trace clearly speaks to dm-mpath's ->clone_and_map_rq
> having allocated the underlying scsi-mq request.  So it'll later require
> a call to dm-mpath's ->release_clone_rq to free the associated memory --
> which happens in dm.c:free_rq_clone().
>
> But I'm not yet seeing where we'd be missing a required call to
> free_rq_clone() in the DM core error paths.  You can try this patch to
> see if you hit the WARN_ON but I highly doubt you would.. similarly the
> clone request shouldn't ever be allocated (nor tio->clone initialized)
> in the REQUEUE case:

Hello Mike,

This occurred with the dm-4.1 branch merged with the for-4.2 IB branch. 
The leak was reported for regular I/O and before I started to trigger 
path failures. I had a look myself at how the sense_buffer pointer is 
manipulated by the scsi-mq code but could not find anything that is 
wrong. So the next thing I did was to repeat my test with kmemleak disabled. 
During this test the number of kmalloc-96 objects in /proc/slabinfo 
remained constant. So I probably have hit a bug in kmemleak. Maybe the 
code that clears and restores the sense buffer pointer in 
scsi_mq_prep_fn() is confusing kmemleak ? Sorry for the noise.

Bart.
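
The check described above (watching whether the kmalloc-96 object count in
/proc/slabinfo stays constant during a test run) can be scripted. This is a
minimal sketch, assuming the slabinfo 2.x line layout in which the cache name
is followed by the active object count. The helper names slab_active_objs()
and read_slab_active() are made up for illustration, and reading
/proc/slabinfo normally requires root.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one slabinfo-style line. If it describes the cache called `name`,
 * store its active object count in *active and return 1; otherwise return 0.
 * Header lines fail the numeric conversion and are skipped.
 */
static int slab_active_objs(const char *line, const char *name,
			    unsigned long *active)
{
	char cache[64];
	unsigned long objs;

	if (sscanf(line, "%63s %lu", cache, &objs) != 2)
		return 0;
	if (strcmp(cache, name) != 0)
		return 0;
	*active = objs;
	return 1;
}

/*
 * Scan a slabinfo-style file for `name`.
 * Returns 0 on success, -1 if the file is unreadable or the cache is absent.
 */
static int read_slab_active(const char *path, const char *name,
			    unsigned long *active)
{
	char line[512];
	FILE *f = fopen(path, "r");
	int found = 0;

	if (!f)
		return -1;
	while (!found && fgets(line, sizeof(line), f))
		found = slab_active_objs(line, name, active);
	fclose(f);
	return found ? 0 : -1;
}
```

Calling read_slab_active("/proc/slabinfo", "kmalloc-96", &n) before and after
a test run and comparing the two values gives a check that does not depend on
kmemleak.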


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-28 14:54                           ` Bart Van Assche
@ 2015-05-28 15:06                             ` Mike Snitzer
  2015-05-29 10:04                               ` Bart Van Assche
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2015-05-28 15:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Junichi Nomura, device-mapper development, Christoph Hellwig

On Thu, May 28 2015 at 10:54am -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 05/28/15 16:07, Mike Snitzer wrote:
> >On Thu, May 28 2015 at  9:10P -0400,
> >Mike Snitzer <snitzer@redhat.com> wrote:
> >
> >>On Thu, May 28 2015 at  4:19am -0400,
> >>Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >>
> >>>On 05/28/15 00:37, Mike Snitzer wrote:
> >>>>FYI, I've staged a variant patch for 4.1 that is simpler; along with the
> >>>>various fixes I've picked up from Junichi and the leak fix I emailed
> >>>>earlier.  They are now in linux-next and available in this 'dm-4.1'
> >>>>specific branch (based on 4.1-rc5):
> >>>>https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.1
> >>>>
> >>>>Please try and let me know if your test works.
> >>>
> >>>No data corruption was reported this time but a very large number of
> >>>memory leaks were reported by kmemleak. The initiator system ran out
> >>>of memory after some time due to these leaks. Here is an example of
> >>>a leak reported by kmemleak:
> >>>
> >>>unreferenced object 0xffff8800a39fc1a8 (size 96):
> >>>    comm "srp_daemon", pid 2116, jiffies 4294955508 (age 137.600s)
> >>>    hex dump (first 32 bytes):
> >>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>    backtrace:
> >>>      [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
> >>>      [<ffffffff81167d19>] kmem_cache_alloc_node+0xd9/0x190
> >>>      [<ffffffff81425400>] scsi_init_request+0x20/0x40
> >>>      [<ffffffff812cbb98>] blk_mq_init_rq_map+0x228/0x290
> >>>      [<ffffffff812cbcc6>] blk_mq_alloc_tag_set+0xc6/0x220
> >>>      [<ffffffff81427488>] scsi_mq_setup_tags+0xc8/0xd0
> >>>      [<ffffffff8141e34f>] scsi_add_host_with_dma+0x6f/0x300
> >>>      [<ffffffffa04c62bf>] srp_create_target+0x11cf/0x1600 [ib_srp]
> >>>      [<ffffffff813f9c93>] dev_attr_store+0x13/0x20
> >>>      [<ffffffff81200a33>] sysfs_kf_write+0x43/0x60
> >>>      [<ffffffff811fff8b>] kernfs_fop_write+0x13b/0x1a0
> >>>      [<ffffffff81183e53>] __vfs_write+0x23/0xe0
> >>>      [<ffffffff81184524>] vfs_write+0xa4/0x1b0
> >>>      [<ffffffff811852d4>] SyS_write+0x44/0xb0
> >>>      [<ffffffff81613cdb>] system_call_fastpath+0x16/0x73
> >>>      [<ffffffffffffffff>] 0xffffffffffffffff
> >>
> >>I suspect I'm missing some cleanup of the request I got from the
> >>underlying blk-mq device.  I'll have a closer look.
> >
> >BTW, your test was with the dm-4.1 branch right?
> >
> >The above kmemleak trace clearly speaks to dm-mpath's ->clone_and_map_rq
> >having allocated the underlying scsi-mq request.  So it'll later require
> >a call to dm-mpath's ->release_clone_rq to free the associated memory --
> >which happens in dm.c:free_rq_clone().
> >
> >But I'm not yet seeing where we'd be missing a required call to
> >free_rq_clone() in the DM core error paths.  You can try this patch to
> >see if you hit the WARN_ON but I highly doubt you would.. similarly the
> >clone request shouldn't ever be allocated (nor tio->clone initialized)
> >in the REQUEUE case:
> 
> Hello Mike,
> 
> This occurred with the dm-4.1 branch merged with the for-4.2 IB
> branch. The leak was reported for regular I/O and before I started
> to trigger path failures. I had a look myself at how the
> sense_buffer pointer is manipulated by the scsi-mq code but could
> not find anything that is wrong. So the next thing I did was to repeat my
> test with kmemleak disabled. During this test the number of
> kmalloc-96 objects in /proc/slabinfo remained constant. So I
> probably have hit a bug in kmemleak. Maybe the code that clears and
> restores the sense buffer pointer in scsi_mq_prep_fn() is confusing
> kmemleak ? Sorry for the noise.

Ah, no problem, very good news (albeit strange)!

So you can confirm that with dm-4.1 your test passes?  If possible
please try your test a few times.  Also, if time permits, please vary
scsi-mq and dm-mq enable/disable (4 permutations) to make sure all
supported modes pass your SRP torture test.

I just have to review Junichi's patch from today to silence the WARN_ON
I added; once I work through that I'll likely send dm-4.1 to Linus.

Thanks for all your help testing.
Mike
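
The four scsi-mq/dm-mq permutations mentioned above can be enumerated
mechanically. This is a small sketch that assumes scsi_mod.use_blk_mq and
dm_mod.use_blk_mq are the boot parameters that toggle the two modes on the
kernel under test. Those names are not stated in this thread, so verify them
against the kernel's parameter documentation. The helper format_mq_combo()
is hypothetical.

```c
#include <stdio.h>

/*
 * Format permutation `idx` (0..3) as a kernel command-line fragment.
 * Bit 1 of idx selects scsi-mq, bit 0 selects dm-mq.
 * Returns the value returned by snprintf().
 */
static int format_mq_combo(unsigned int idx, char *buf, size_t len)
{
	int scsi_mq = (idx >> 1) & 1;
	int dm_mq = idx & 1;

	/* Parameter names are an assumption; see the note above. */
	return snprintf(buf, len,
			"scsi_mod.use_blk_mq=%c dm_mod.use_blk_mq=%c",
			scsi_mq ? 'Y' : 'N', dm_mq ? 'Y' : 'N');
}
```

Looping idx from 0 to 3 yields one fragment per supported mode, each of which
would need its own boot and test run.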


* Re: 4.1-rc2 dm-multipath-mq kernel warning
  2015-05-28 15:06                             ` Mike Snitzer
@ 2015-05-29 10:04                               ` Bart Van Assche
  0 siblings, 0 replies; 17+ messages in thread
From: Bart Van Assche @ 2015-05-29 10:04 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Junichi Nomura, device-mapper development, Christoph Hellwig

On 05/28/15 17:06, Mike Snitzer wrote:
> So you can confirm that with dm-4.1 your test passes?  If possible
> please try your test a few times.  Also, if time permits, please vary
> scsi-mq and dm-mq enable/disable (4 permutations) to make sure all
> supported modes pass your SRP torture test.
>
> I just have to review Junichi's patch from today to silence the WARN_ON
> I added; once I work through that I'll likely send dm-4.1 to Linus.

Hello Mike,

Good news: with the latest dm-4.1 branch and with both scsi-mq and dm-mq 
enabled my SRP fail-over test passes.

Thanks,

Bart.


Thread overview: 17+ messages
2015-05-05 14:04 4.1-rc2 dm-multipath-mq kernel warning Bart Van Assche
2015-05-06  2:23 ` Mike Snitzer
2015-05-06  7:45   ` Bart Van Assche
2015-05-06 18:29     ` Mike Snitzer
2015-05-07 10:19       ` Bart Van Assche
2015-05-27 12:57         ` Mike Snitzer
2015-05-27 15:29           ` Bart Van Assche
2015-05-27 15:33             ` Bart Van Assche
2015-05-27 16:14               ` Mike Snitzer
2015-05-27 17:00                 ` Mike Snitzer
2015-05-27 22:37                   ` Mike Snitzer
2015-05-28  8:19                     ` Bart Van Assche
2015-05-28 13:10                       ` Mike Snitzer
2015-05-28 14:07                         ` Mike Snitzer
2015-05-28 14:54                           ` Bart Van Assche
2015-05-28 15:06                             ` Mike Snitzer
2015-05-29 10:04                               ` Bart Van Assche
