* Kernel v4.19-rc4 KASAN complaint
@ 2018-09-18 21:16 Bart Van Assche
  2018-09-20  7:10 ` Christoph Hellwig
  2018-09-20 17:01 ` Keith Busch
  0 siblings, 2 replies; 19+ messages in thread
From: Bart Van Assche @ 2018-09-18 21:16 UTC (permalink / raw)


Hello,

If I run the nvmeof-mp tests from https://github.com/bvanassche/blktests 
against kernel v4.19-rc4 then a KASAN complaint appears. This complaint 
does not appear when I run these tests against kernel v4.18. Could this 
be a regression?

Thanks,

Bart.

BUG: KASAN: use-after-free in srcu_invoke_callbacks+0x207/0x290
Read of size 8 at addr ffff880074250f70 by task kworker/0:3/26033

CPU: 0 PID: 26033 Comm: kworker/0:3 Not tainted 4.19.0-rc4-dbg+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: rcu_gp srcu_invoke_callbacks
Call Trace:
dump_stack+0xa4/0xf5
print_address_description+0x78/0x290
kasan_report+0x241/0x360
__asan_load8+0x54/0x90
srcu_invoke_callbacks+0x207/0x290
process_one_work+0x4ae/0xa20
worker_thread+0x63/0x5a0
kthread+0x1cf/0x1f0
ret_from_fork+0x24/0x30

Allocated by task 24735:
save_stack+0x43/0xd0
kasan_kmalloc+0xad/0xe0
kmem_cache_alloc_trace+0x13d/0x300
nvme_validate_ns+0x8e9/0x1020 [nvme_core]
nvme_scan_work+0x3be/0x4a0 [nvme_core]
process_one_work+0x4ae/0xa20
worker_thread+0x63/0x5a0
kthread+0x1cf/0x1f0
ret_from_fork+0x24/0x30

Freed by task 17790:
save_stack+0x43/0xd0
__kasan_slab_free+0x135/0x190
kasan_slab_free+0xe/0x10
kfree+0x105/0x2e0
nvme_free_ns+0x160/0x1a0 [nvme_core]
nvme_ns_remove+0x1ba/0x250 [nvme_core]
nvme_remove_invalid_namespaces+0x1d9/0x220 [nvme_core]
nvme_scan_work+0x43b/0x4a0 [nvme_core]
process_one_work+0x4ae/0xa20
worker_thread+0x63/0x5a0
kthread+0x1cf/0x1f0
ret_from_fork+0x24/0x30

The buggy address belongs to the object at ffff880074250d80
which belongs to the cache kmalloc-1024 of size 1024
The buggy address is located 496 bytes inside of
1024-byte region [ffff880074250d80, ffff880074251180)
The buggy address belongs to the page:
page:ffffea0001d09400 count:1 mapcount:0 mapping:ffff88011bf8ea00 
index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 ffffea0003587a00 0000000200000002 ffff88011bf8ea00
raw: 0000000000000000 00000000001c001c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff880074250e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff880074250e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 >ffff880074250f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
ffff880074250f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff880074251000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
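
The offsets in the report above can be re-derived by hand. What follows is a minimal user-space C sketch, not kernel code, and the helper names are made up for illustration. It assumes generic KASAN, where one shadow byte describes 8 bytes of memory, so each 16-byte row of the dump covers 128 bytes. As far as I know, the shadow value 0xfb marks a freed slab object in that mode.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of the faulting address inside the slab object. */
static uint64_t offset_in_object(uint64_t fault_addr, uint64_t object_start)
{
	assert(fault_addr >= object_start);
	return fault_addr - object_start;
}

/* Exclusive end address of an object of the given size. */
static uint64_t object_end(uint64_t object_start, uint64_t size)
{
	return object_start + size;
}

/*
 * Zero-based column of the shadow byte for addr within its dump row,
 * assuming 8 bytes per shadow byte and 16 shadow bytes (128 bytes) per row.
 */
static unsigned int shadow_column(uint64_t addr)
{
	return (unsigned int)((addr & 0x7f) >> 3);
}
```

With the values from the report, offset_in_object(0xffff880074250f70, 0xffff880074250d80) is 496, object_end(0xffff880074250d80, 1024) is 0xffff880074251180, and shadow_column(0xffff880074250f70) is 14, which is the byte the caret points at in the row marked with '>'.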

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Kernel v4.19-rc4 KASAN complaint
  2018-09-18 21:16 Kernel v4.19-rc4 KASAN complaint Bart Van Assche
@ 2018-09-20  7:10 ` Christoph Hellwig
  2018-09-20 17:24   ` Bart Van Assche
  2018-09-24  4:27   ` Sagi Grimberg
  2018-09-20 17:01 ` Keith Busch
  1 sibling, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-09-20  7:10 UTC (permalink / raw)


On Tue, Sep 18, 2018 at 02:16:48PM -0700, Bart Van Assche wrote:
> Hello,
> 
> If I run the nvmeof-mp tests from https://github.com/bvanassche/blktests
> against kernel v4.19-rc4 then a KASAN complaint appears. This complaint does
> not appear when I run these tests against kernel v4.18. Could this be a
> regression?

Sounds like it is.  4.19 has the new ANA code, so the multipath code
has some churn.

> BUG: KASAN: use-after-free in srcu_invoke_callbacks+0x207/0x290

Can you resolve the address using gdb on vmlinux to a specific
line of code?


* Kernel v4.19-rc4 KASAN complaint
  2018-09-18 21:16 Kernel v4.19-rc4 KASAN complaint Bart Van Assche
  2018-09-20  7:10 ` Christoph Hellwig
@ 2018-09-20 17:01 ` Keith Busch
  2018-09-20 17:31   ` Bart Van Assche
  1 sibling, 1 reply; 19+ messages in thread
From: Keith Busch @ 2018-09-20 17:01 UTC (permalink / raw)


On Tue, Sep 18, 2018 at 02:16:48PM -0700, Bart Van Assche wrote:
> Hello,
> 
> If I run the nvmeof-mp tests from https://github.com/bvanassche/blktests
> against kernel v4.19-rc4 then a KASAN complaint appears. This complaint does
> not appear when I run these tests against kernel v4.18. Could this be a
> regression?

Not sure if the following is what you're hitting since it wouldn't be a
regression. It looks like removing a namespace path and clearing it as
the current path is in the wrong order, such that the very next IO
may reference the namespace being deleted.

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 893f1fcc17cd..a01b6743d62b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3143,8 +3143,8 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 	}
 
 	mutex_lock(&ns->ctrl->subsys->lock);
-	nvme_mpath_clear_current_path(ns);
 	list_del_rcu(&ns->siblings);
+	nvme_mpath_clear_current_path(ns);
 	mutex_unlock(&ns->ctrl->subsys->lock);
 
 	down_write(&ns->ctrl->namespaces_rwsem);
--
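
To illustrate the ordering concern described above, here is a single-threaded user-space model in C. It is only a sketch under simplifying assumptions: the structures and helpers (struct head, find_path(), del_sibling() and so on) are hypothetical stand-ins and not the nvme driver's code, and the "I/O arriving in the window" is simulated by an explicit call between the two removal steps.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PATHS 4

struct path { int id; };

struct head {
	struct path *siblings[MAX_PATHS]; /* stands in for the sibling list */
	size_t nr;
	struct path *current_path;        /* cached path used by I/O */
};

/* I/O side: use the cached path, else pick the first sibling and cache it. */
static struct path *find_path(struct head *h)
{
	if (!h->current_path && h->nr > 0)
		h->current_path = h->siblings[0];
	return h->current_path;
}

static void clear_current_path(struct head *h, struct path *p)
{
	if (h->current_path == p)
		h->current_path = NULL;
}

static void del_sibling(struct head *h, struct path *p)
{
	size_t i, j = 0;

	for (i = 0; i < h->nr; i++)
		if (h->siblings[i] != p)
			h->siblings[j++] = h->siblings[i];
	h->nr = j;
}

/*
 * Remove path "a" while one I/O arrives between the two removal steps.
 * Returns 1 if the cache still points at the removed path afterwards.
 */
static int stale_after_remove(int clear_first)
{
	struct path a = { 1 }, b = { 2 };
	struct head h = { { &a, &b }, 2, NULL };

	find_path(&h);                  /* warm the cache with "a" */
	assert(h.current_path == &a);

	if (clear_first) {
		clear_current_path(&h, &a);
		find_path(&h);          /* "a" is still listed, so it is re-cached */
		del_sibling(&h, &a);
	} else {
		del_sibling(&h, &a);
		find_path(&h);          /* may still return "a" inside the window */
		clear_current_path(&h, &a);
	}
	return h.current_path == &a;
}
```

In this model stale_after_remove(1) returns 1 (clearing first leaves a stale cached pointer) and stale_after_remove(0) returns 0, which matches the reasoning behind swapping the two calls in the patch.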


* Kernel v4.19-rc4 KASAN complaint
  2018-09-20  7:10 ` Christoph Hellwig
@ 2018-09-20 17:24   ` Bart Van Assche
  2018-09-25 23:32     ` Christoph Hellwig
  2018-09-24  4:27   ` Sagi Grimberg
  1 sibling, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2018-09-20 17:24 UTC (permalink / raw)


On Thu, 2018-09-20 at 00:10 -0700, Christoph Hellwig wrote:
> On Tue, Sep 18, 2018 at 02:16:48PM -0700, Bart Van Assche wrote:
> > Hello,
> > 
> > If I run the nvmeof-mp tests from https://github.com/bvanassche/blktests
> > against kernel v4.19-rc4 then a KASAN complaint appears. This complaint does
> > not appear when I run these tests against kernel v4.18. Could this be a
> > regression?
> 
> Sounds like it is.  4.19 has the new ANA code, so the multipath code
> has some churn.
> 
> > BUG: KASAN: use-after-free in srcu_invoke_callbacks+0x207/0x290
> 
> Can you resolve the address using gdb on vmlinux to a specific
> line of code?

Sure. The gdb output (which is probably not very useful) is as follows:

(gdb) list *(srcu_invoke_callbacks+0x207)
0xffffffff811872e7 is in srcu_invoke_callbacks (./include/linux/compiler.h:188).
183     })
184
185     static __always_inline
186     void __read_once_size(const volatile void *p, void *res, int size)
187     {
188             __READ_ONCE_SIZE;
189     }
190
191     #ifdef CONFIG_KASAN
192     /*

This may be more useful:

(gdb) list *(srcu_invoke_callbacks+0x1fa)
0xffffffff811872da is in srcu_invoke_callbacks (kernel/rcu/srcutree.c:1206).
1201            /*
1202             * Update counts, accelerate new callbacks, and if needed,
1203             * schedule another round of callback invocation.
1204             */
1205            spin_lock_irq_rcu_node(sdp);
1206            rcu_segcblist_insert_count(&sdp->srcu_cblist, &ready_cbs);
1207            (void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
1208                                           rcu_seq_snap(&sp->srcu_gp_seq));
1209            sdp->srcu_cblist_invoking = false;
1210            more = rcu_segcblist_ready_cbs(&sdp->srcu_cblist);
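
As a cross-check of the two gdb addresses above: in a trace entry such as srcu_invoke_callbacks+0x207/0x290, the first number is the offset from the start of the function and the second is the size of the function. A minimal user-space C sketch of that arithmetic follows; the helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Start address of a function, given an address inside it and its offset. */
static uint64_t symbol_base(uint64_t addr, uint64_t offset)
{
	assert(addr >= offset);
	return addr - offset;
}

/* Address of symbol+offset; the offset must lie inside the function. */
static uint64_t symbol_plus_offset(uint64_t base, uint64_t offset, uint64_t size)
{
	assert(offset < size);
	return base + offset;
}
```

With the numbers from this session, symbol_base(0xffffffff811872e7, 0x207) is 0xffffffff811870e0, and adding 0x1fa to that base gives 0xffffffff811872da, the address gdb reports for kernel/rcu/srcutree.c:1206.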

Bart.


* Kernel v4.19-rc4 KASAN complaint
  2018-09-20 17:01 ` Keith Busch
@ 2018-09-20 17:31   ` Bart Van Assche
  2018-09-20 17:36     ` Keith Busch
  2018-09-20 17:36     ` Bart Van Assche
  0 siblings, 2 replies; 19+ messages in thread
From: Bart Van Assche @ 2018-09-20 17:31 UTC (permalink / raw)


On Thu, 2018-09-20 at 11:01 -0600, Keith Busch wrote:
> On Tue, Sep 18, 2018 at 02:16:48PM -0700, Bart Van Assche wrote:
> > Hello,
> > 
> > If I run the nvmeof-mp tests from https://github.com/bvanassche/blktests
> > against kernel v4.19-rc4 then a KASAN complaint appears. This complaint does
> > not appear when I run these tests against kernel v4.18. Could this be a
> > regression?
> 
> Not sure if the following is what you're hitting since it wouldn't be a
> regression. It looks like removing a namespace path and clearing it as
> the current path is in the wrong order, such that the very next IO
> may reference the namespace being deleted.
> 
> ---
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 893f1fcc17cd..a01b6743d62b 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3143,8 +3143,8 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>  	}
>  
>  	mutex_lock(&ns->ctrl->subsys->lock);
> -	nvme_mpath_clear_current_path(ns);
>  	list_del_rcu(&ns->siblings);
> +	nvme_mpath_clear_current_path(ns);
>  	mutex_unlock(&ns->ctrl->subsys->lock);
>  
>  	down_write(&ns->ctrl->namespaces_rwsem);

That patch makes the KASAN complaint disappear on my test setup.

Thanks!

Bart.


* Kernel v4.19-rc4 KASAN complaint
  2018-09-20 17:31   ` Bart Van Assche
@ 2018-09-20 17:36     ` Keith Busch
  2018-10-05  7:34       ` Christoph Hellwig
  2018-09-20 17:36     ` Bart Van Assche
  1 sibling, 1 reply; 19+ messages in thread
From: Keith Busch @ 2018-09-20 17:36 UTC (permalink / raw)


On Thu, Sep 20, 2018 at 10:31:29AM -0700, Bart Van Assche wrote:
> On Thu, 2018-09-20 at 11:01 -0600, Keith Busch wrote:
> > On Tue, Sep 18, 2018 at 02:16:48PM -0700, Bart Van Assche wrote:
> > > Hello,
> > > 
> > > If I run the nvmeof-mp tests from https://github.com/bvanassche/blktests
> > > against kernel v4.19-rc4 then a KASAN complaint appears. This complaint does
> > > not appear when I run these tests against kernel v4.18. Could this be a
> > > regression?
> > 
> > Not sure if the following is what you're hitting since it wouldn't be a
> > regression. It looks like removing a namespace path and clearing it as
> > the current path is in the wrong order, such that the very next IO
> > may reference the namespace being deleted.
> > 
> > ---
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 893f1fcc17cd..a01b6743d62b 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -3143,8 +3143,8 @@ static void nvme_ns_remove(struct nvme_ns *ns)
> >  	}
> >  
> >  	mutex_lock(&ns->ctrl->subsys->lock);
> > -	nvme_mpath_clear_current_path(ns);
> >  	list_del_rcu(&ns->siblings);
> > +	nvme_mpath_clear_current_path(ns);
> >  	mutex_unlock(&ns->ctrl->subsys->lock);
> >  
> >  	down_write(&ns->ctrl->namespaces_rwsem);
> 
> That patch makes the KASAN complaint disappear on my test setup.

Nice, thanks for confirming. I'll send a proper patch, but also wonder
why the error doesn't show up in 4.18.


* Kernel v4.19-rc4 KASAN complaint
  2018-09-20 17:31   ` Bart Van Assche
  2018-09-20 17:36     ` Keith Busch
@ 2018-09-20 17:36     ` Bart Van Assche
  2018-09-20 17:45       ` Keith Busch
  1 sibling, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2018-09-20 17:36 UTC (permalink / raw)


On Thu, 2018-09-20 at 10:31 -0700, Bart Van Assche wrote:
> On Thu, 2018-09-20 at 11:01 -0600, Keith Busch wrote:
> > On Tue, Sep 18, 2018 at 02:16:48PM -0700, Bart Van Assche wrote:
> > > Hello,
> > > 
> > > If I run the nvmeof-mp tests from https://github.com/bvanassche/blktests
> > > against kernel v4.19-rc4 then a KASAN complaint appears. This complaint does
> > > not appear when I run these tests against kernel v4.18. Could this be a
> > > regression?
> > 
> > Not sure if the following is what you're hitting since it wouldn't be a
> > regression. It looks like removing a namespace path and clearing it as
> > the current path is in the wrong order, such that the very next IO
> > may reference the namespace being deleted.
> > 
> > ---
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 893f1fcc17cd..a01b6743d62b 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -3143,8 +3143,8 @@ static void nvme_ns_remove(struct nvme_ns *ns)
> >  	}
> >  
> >  	mutex_lock(&ns->ctrl->subsys->lock);
> > -	nvme_mpath_clear_current_path(ns);
> >  	list_del_rcu(&ns->siblings);
> > +	nvme_mpath_clear_current_path(ns);
> >  	mutex_unlock(&ns->ctrl->subsys->lock);
> >  
> >  	down_write(&ns->ctrl->namespaces_rwsem);
> 
> That patch makes the KASAN complaint disappear on my test setup.

Sorry, I spoke too soon. I didn't see it in the dmesg output because it
disappeared from that buffer but I found the complaint in the system log:

Sep 20 10:17:23 ubuntu-vm kernel: [  144.820073] WARNING: possible circular locking dependency detected
Sep 20 10:17:23 ubuntu-vm kernel: [  144.972667] Call Trace:
Sep 20 10:18:51 ubuntu-vm kernel: [  232.585122] BUG: KASAN: use-after-free in srcu_invoke_callbacks+0x207/0x290
Sep 20 10:18:51 ubuntu-vm kernel: [  232.585146] Call Trace:
Sep 20 10:28:02 ubuntu-vm kernel: [   58.755492] WARNING: possible circular locking dependency detected
Sep 20 10:28:02 ubuntu-vm kernel: [   58.825452] Call Trace:
Sep 20 10:31:44 ubuntu-vm kernel: [  281.520737] BUG: KASAN: use-after-free in srcu_invoke_callbacks+0x207/0x290
Sep 20 10:31:44 ubuntu-vm kernel: [  281.538488] Call Trace:

Bart.


* Kernel v4.19-rc4 KASAN complaint
  2018-09-20 17:36     ` Bart Van Assche
@ 2018-09-20 17:45       ` Keith Busch
  0 siblings, 0 replies; 19+ messages in thread
From: Keith Busch @ 2018-09-20 17:45 UTC (permalink / raw)


On Thu, Sep 20, 2018 at 10:36:30AM -0700, Bart Van Assche wrote:
> On Thu, 2018-09-20 at 10:31 -0700, Bart Van Assche wrote:
> > On Thu, 2018-09-20 at 11:01 -0600, Keith Busch wrote:
> > > On Tue, Sep 18, 2018 at 02:16:48PM -0700, Bart Van Assche wrote:
> > > > Hello,
> > > > 
> > > > If I run the nvmeof-mp tests from https://github.com/bvanassche/blktests
> > > > against kernel v4.19-rc4 then a KASAN complaint appears. This complaint does
> > > > not appear when I run these tests against kernel v4.18. Could this be a
> > > > regression?
> > > 
> > > Not sure if the following is what you're hitting since it wouldn't be a
> > > regression. It looks like removing a namespace path and clearing it as
> > > the current path is in the wrong order, such that the very next IO
> > > may reference the namespace being deleted.
> > > 
> > > ---
> > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > > index 893f1fcc17cd..a01b6743d62b 100644
> > > --- a/drivers/nvme/host/core.c
> > > +++ b/drivers/nvme/host/core.c
> > > @@ -3143,8 +3143,8 @@ static void nvme_ns_remove(struct nvme_ns *ns)
> > >  	}
> > >  
> > >  	mutex_lock(&ns->ctrl->subsys->lock);
> > > -	nvme_mpath_clear_current_path(ns);
> > >  	list_del_rcu(&ns->siblings);
> > > +	nvme_mpath_clear_current_path(ns);
> > >  	mutex_unlock(&ns->ctrl->subsys->lock);
> > >  
> > >  	down_write(&ns->ctrl->namespaces_rwsem);
> > 
> > That patch makes the KASAN complaint disappear on my test setup.
> 
> Sorry, I spoke too soon. I didn't see it in the dmesg output because it
> disappeared from that buffer but I found the complaint in the system log:

Bummer. :(


* Kernel v4.19-rc4 KASAN complaint
  2018-09-20  7:10 ` Christoph Hellwig
  2018-09-20 17:24   ` Bart Van Assche
@ 2018-09-24  4:27   ` Sagi Grimberg
  2018-09-24 14:04     ` Bart Van Assche
  1 sibling, 1 reply; 19+ messages in thread
From: Sagi Grimberg @ 2018-09-24  4:27 UTC (permalink / raw)



>> Hello,
>>
>> If I run the nvmeof-mp tests from https://github.com/bvanassche/blktests
>> against kernel v4.19-rc4 then a KASAN complaint appears. This complaint does
>> not appear when I run these tests against kernel v4.18. Could this be a
>> regression?
> 
> Sounds like it is.  4.19 has the new ANA code, so the multipath code
> has some churn.

Note that this is testing nvmf with dm-multipath.

>> BUG: KASAN: use-after-free in srcu_invoke_callbacks+0x207/0x290
> 
> Can you resolve the address using gdb on vmlinux to a specific
> line of code?

Did this get resolved?
Bart, did this reproduce with nvme.multipath=1, or was it disabled?


* Kernel v4.19-rc4 KASAN complaint
  2018-09-24  4:27   ` Sagi Grimberg
@ 2018-09-24 14:04     ` Bart Van Assche
  0 siblings, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2018-09-24 14:04 UTC (permalink / raw)


On 9/23/18 9:27 PM, Sagi Grimberg wrote:
> Did this get resolved?

Not yet unfortunately.

> Bart, did this reproduce with nvme.multipath=1, or was it disabled?

The nvmeof-mp tests only work with CONFIG_NVME_MULTIPATH enabled. Is the 
nvme.multipath parameter relevant in that mode?

Bart.


* Kernel v4.19-rc4 KASAN complaint
  2018-09-20 17:24   ` Bart Van Assche
@ 2018-09-25 23:32     ` Christoph Hellwig
  2018-09-26  3:14       ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2018-09-25 23:32 UTC (permalink / raw)


[Adding Paul]

Hi Paul,

Bart reported a use after free in the SRCU code when testing the
nvme multipath code here:

http://lists.infradead.org/pipermail/linux-nvme/2018-September/020009.html

Based on his analysis, it appears to me that the use after free is on the
srcu_data structure, which is internal to the SRCU implementation.

While I don't want to exclude an actual cause in the nvme code I wonder
if you have any additional insights from the RCU perspective.

On Thu, Sep 20, 2018 at 10:24:01AM -0700, Bart Van Assche wrote:
> On Thu, 2018-09-20 at 00:10 -0700, Christoph Hellwig wrote:
> > On Tue, Sep 18, 2018 at 02:16:48PM -0700, Bart Van Assche wrote:
> > > Hello,
> > > 
> > > If I run the nvmeof-mp tests from https://github.com/bvanassche/blktests
> > > against kernel v4.19-rc4 then a KASAN complaint appears. This complaint does
> > > not appear when I run these tests against kernel v4.18. Could this be a
> > > regression?
> > 
> > Sounds like it is.  4.19 has the new ANA code, so the multipath code
> > has some churn.
> > 
> > > BUG: KASAN: use-after-free in srcu_invoke_callbacks+0x207/0x290
> > 
> > Can you resolve the address using gdb on vmlinux to a specific
> > line of code?
> 
> Sure. The gdb output (which is probably not very useful) is as follows:
> 
> (gdb) list *(srcu_invoke_callbacks+0x207)
> 0xffffffff811872e7 is in srcu_invoke_callbacks (./include/linux/compiler.h:188).
> 183     })
> 184
> 185     static __always_inline
> 186     void __read_once_size(const volatile void *p, void *res, int size)
> 187     {
> 188             __READ_ONCE_SIZE;
> 189     }
> 190
> 191     #ifdef CONFIG_KASAN
> 192     /*
> 
> This may be more useful:
> 
> (gdb) list *(srcu_invoke_callbacks+0x1fa)
> 0xffffffff811872da is in srcu_invoke_callbacks (kernel/rcu/srcutree.c:1206).
> 1201            /*
> 1202             * Update counts, accelerate new callbacks, and if needed,
> 1203             * schedule another round of callback invocation.
> 1204             */
> 1205            spin_lock_irq_rcu_node(sdp);
> 1206            rcu_segcblist_insert_count(&sdp->srcu_cblist, &ready_cbs);
> 1207            (void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
> 1208                                           rcu_seq_snap(&sp->srcu_gp_seq));
> 1209            sdp->srcu_cblist_invoking = false;
> 1210            more = rcu_segcblist_ready_cbs(&sdp->srcu_cblist);
> 
> Bart.
> 

* Kernel v4.19-rc4 KASAN complaint
  2018-09-25 23:32     ` Christoph Hellwig
@ 2018-09-26  3:14       ` Paul E. McKenney
  2018-10-05  7:38         ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2018-09-26  3:14 UTC (permalink / raw)


On Tue, Sep 25, 2018 at 04:32:11PM -0700, Christoph Hellwig wrote:
> [Adding Paul]
> 
> Hi Paul,
> 
> Bart reported a use after free in the SRCU code when testing the
> nvme multipath code here:
> 
> http://lists.infradead.org/pipermail/linux-nvme/2018-September/020009.html
> 
> Based on his analysis, it appears to me that the use after free is on the
> srcu_data structure, which is internal to the SRCU implementation.
> 
> While I don't want to exclude an actual cause in the nvme code I wonder
> if you have any additional insights from the RCU perspective.
> 
> On Thu, Sep 20, 2018 at 10:24:01AM -0700, Bart Van Assche wrote:
> > On Thu, 2018-09-20 at 00:10 -0700, Christoph Hellwig wrote:
> > > On Tue, Sep 18, 2018 at 02:16:48PM -0700, Bart Van Assche wrote:
> > > > Hello,
> > > > 
> > > > If I run the nvmeof-mp tests from https://github.com/bvanassche/blktests
> > > > against kernel v4.19-rc4 then a KASAN complaint appears. This complaint does
> > > > not appear when I run these tests against kernel v4.18. Could this be a
> > > > regression?

I would be quite surprised if any of the SRCU commits since v4.18 caused
this sort of problem, but there are not that many of them, so they are
easy to check (at least assuming that this is reproducible):

	gitk v4.18.. -- kernel/rcu/srcu* include/linux/*srcu*

But checking below...

> > > Sounds like it is.  4.19 has the new ANA code, so the multipath code
> > > has some churn.
> > > 
> > > > BUG: KASAN: use-after-free in srcu_invoke_callbacks+0x207/0x290
> > > 
> > > Can you resolve the address using gdb on vmlinux to a specific
> > > line of code?
> > 
> > Sure. The gdb output (which is probably not very useful) is as follows:
> > 
> > (gdb) list *(srcu_invoke_callbacks+0x207)
> > 0xffffffff811872e7 is in srcu_invoke_callbacks (./include/linux/compiler.h:188).
> > 183     })
> > 184
> > 185     static __always_inline
> > 186     void __read_once_size(const volatile void *p, void *res, int size)
> > 187     {
> > 188             __READ_ONCE_SIZE;
> > 189     }
> > 190
> > 191     #ifdef CONFIG_KASAN
> > 192     /*
> > 
> > This may be more useful:
> > 
> > (gdb) list *(srcu_invoke_callbacks+0x1fa)
> > 0xffffffff811872da is in srcu_invoke_callbacks (kernel/rcu/srcutree.c:1206).
> > 1201            /*
> > 1202             * Update counts, accelerate new callbacks, and if needed,
> > 1203             * schedule another round of callback invocation.
> > 1204             */
> > 1205            spin_lock_irq_rcu_node(sdp);
> > 1206            rcu_segcblist_insert_count(&sdp->srcu_cblist, &ready_cbs);
> > 1207            (void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
> > 1208                                           rcu_seq_snap(&sp->srcu_gp_seq));
> > 1209            sdp->srcu_cblist_invoking = false;
> > 1210            more = rcu_segcblist_ready_cbs(&sdp->srcu_cblist);

I would expect something like this if someone did a double call_srcu()
or passed something to call_srcu() but then kept using it (for an example
of the latter, failed to make it inaccessible to readers before invoking
call_srcu() on it).  Yet another way to get here is to have unioned the
rcu_head structure with something used by the SRCU readers.

The double call_srcu() can be located by building your kernel with
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y and rerunning your tests.  The other
two usually require inspection or bisection.
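
To make the double call_srcu() case concrete, here is a toy user-space model in C of a callback list with a tail pointer. It is only a sketch: the types and helpers (struct cb_head, cb_enqueue() and so on) are hypothetical and far simpler than the kernel's segmented callback lists, but they show how queueing the same head twice can corrupt the list.

```c
#include <assert.h>
#include <stddef.h>

struct cb_head { struct cb_head *next; };

struct cb_list {
	struct cb_head *head;
	struct cb_head **tail;  /* points at the last ->next slot */
	long len;               /* number of enqueue operations */
};

static void cb_list_init(struct cb_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
	l->len = 0;
}

/* Append c to the list; assumes c is not already queued. */
static void cb_enqueue(struct cb_list *l, struct cb_head *c)
{
	assert(c != NULL);
	c->next = NULL;
	*l->tail = c;
	l->tail = &c->next;
	l->len++;
}

/* Count the callbacks that can be reached by walking from the head. */
static long cb_reachable(const struct cb_list *l)
{
	const struct cb_head *c;
	long n = 0;

	for (c = l->head; c; c = c->next)
		n++;
	return n;
}

/* Queue a, b, then a again; report how many callbacks remain reachable. */
static long reachable_after_double_enqueue(long *queued)
{
	struct cb_head a, b;
	struct cb_list l;

	cb_list_init(&l);
	cb_enqueue(&l, &a);
	cb_enqueue(&l, &b);
	cb_enqueue(&l, &a);     /* the bug: "a" is already on the list */
	*queued = l.len;
	return cb_reachable(&l);
}
```

In this model three enqueues are counted but only one callback is reachable, because re-queueing "a" resets its next pointer and drops "b" from the walk. The real failure mode depends on the actual list implementation, which is why a debug option that detects the second enqueue is the practical way to find it.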

So, the eternal question:  Is bisection feasible?

							Thanx, Paul


* Kernel v4.19-rc4 KASAN complaint
  2018-09-20 17:36     ` Keith Busch
@ 2018-10-05  7:34       ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-10-05  7:34 UTC (permalink / raw)


On Thu, Sep 20, 2018 at 11:36:25AM -0600, Keith Busch wrote:
> > >  	mutex_lock(&ns->ctrl->subsys->lock);
> > > -	nvme_mpath_clear_current_path(ns);
> > >  	list_del_rcu(&ns->siblings);
> > > +	nvme_mpath_clear_current_path(ns);
> > >  	mutex_unlock(&ns->ctrl->subsys->lock);
> > >  
> > >  	down_write(&ns->ctrl->namespaces_rwsem);
> > 
> > That patch makes the KASAN complaint disappear on my test setup.
> 
> Nice, thanks for confirming. I'll send a proper patch, but also wonder
> why the error doesn't show up in 4.18.

Independent of fixing the issue Bart reported this looks like a good
fix, can you send a proper patch for it?


* Kernel v4.19-rc4 KASAN complaint
  2018-09-26  3:14       ` Paul E. McKenney
@ 2018-10-05  7:38         ` Christoph Hellwig
  2018-10-17  6:39           ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2018-10-05  7:38 UTC (permalink / raw)


On Tue, Sep 25, 2018 at 08:14:17PM -0700, Paul E. McKenney wrote:
> I would expect something like this if someone did a double call_srcu()
> or passed something to call_srcu() but then kept using it (for an example
> of the latter, failed to make it inaccessible to readers before invoking
> call_srcu() on it).  Yet another way to get here is to have unioned the
> rcu_head structure with something used by the SRCU readers.
>
> The double call_srcu() can be located by building your kernel with
> CONFIG_DEBUG_OBJECTS_RCU_HEAD=y and rerunning your tests.  The other
> two usually require inspection or bisection.
> 
> So, the eternal question:  Is bisection feasible?

Looks like I misread the lines pointed to by gdb, and it indeed seems
like a premature free of the nvme_ns_head structure, but I fail to see
where.

I've tried to bring up Bart's test case, but have failed so far.

Bart, can you help out a bit?

The last commit of the original 4.19 nvme merge is:

b369b30cf510fe94d8884837039362e2ec223cec

Can you check if that shows the problem (I suspect it does) and if so
bisect between that and 9b89bc3857a6c0dfda18ddae2a42c114ecc32753, which
is the first commit of the merge?

> 							Thanx, Paul

* Kernel v4.19-rc4 KASAN complaint
  2018-10-05  7:38         ` Christoph Hellwig
@ 2018-10-17  6:39           ` Christoph Hellwig
  2018-10-17 14:38             ` Bart Van Assche
  2018-10-17 17:32             ` Bart Van Assche
  0 siblings, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-10-17  6:39 UTC (permalink / raw)


On Fri, Oct 05, 2018 at 12:38:33AM -0700, Christoph Hellwig wrote:
> Looks like I misread the lines pointed to by gdb, and it indeed seems
> like a premature free of the nvme_ns_head structure, but I fail to see
> where.
> 
> I've tried to bring up Bart's test case, but have failed so far.
> 
> Bart, can you help out a bit?
> 
> The last commit of the original 4.19 nvme merge is:
> 
> b369b30cf510fe94d8884837039362e2ec223cec
> 
> Can you check if that shows the problem (I suspect it does) and if so
> bisect between that and 9b89bc3857a6c0dfda18ddae2a42c114ecc32753, which
> is the first commit of the merge?

Bart, do you have cycles to help out bisecting this?


* Kernel v4.19-rc4 KASAN complaint
  2018-10-17  6:39           ` Christoph Hellwig
@ 2018-10-17 14:38             ` Bart Van Assche
  2018-10-17 17:32             ` Bart Van Assche
  1 sibling, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2018-10-17 14:38 UTC (permalink / raw)


On 10/16/18 11:39 PM, Christoph Hellwig wrote:
> On Fri, Oct 05, 2018 at 12:38:33AM -0700, Christoph Hellwig wrote:
>> Looks like I misread the lines pointed to by gdb, and it indeed seems
>> like a premature free of the nvme_ns_head structure, but I fail to see
>> where.
>>
>> I've tried to bring up Bart's test case, but have failed so far.
>>
>> Bart, can you help out a bit?
>>
>> The last commit of the original 4.19 nvme merge is:
>>
>> b369b30cf510fe94d8884837039362e2ec223cec
>>
>> Can you check if that shows the problem (I suspect it does) and if so
>> bisect between that and 9b89bc3857a6c0dfda18ddae2a42c114ecc32753, which
>> is the first commit of the merge?
> 
> Bart, do you have cycles to help out bisecting this?

Hi Christoph,

I will try to free up some time to search for the root cause of this 
complaint.

Bart.


* Kernel v4.19-rc4 KASAN complaint
  2018-10-17  6:39           ` Christoph Hellwig
  2018-10-17 14:38             ` Bart Van Assche
@ 2018-10-17 17:32             ` Bart Van Assche
  1 sibling, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2018-10-17 17:32 UTC (permalink / raw)


On 10/16/18 11:39 PM, Christoph Hellwig wrote:
> On Fri, Oct 05, 2018 at 12:38:33AM -0700, Christoph Hellwig wrote:
>> Looks like I misread the lines pointed to by gdb, and it indeed seems
>> like a premature free of the nvme_ns_head structure, but I fail to see
>> where.
>>
>> I've tried to bring up Bart's test case, but have failed so far.
>>
>> Bart, can you help out a bit?
>>
>> The last commit of the original 4.19 nvme merge is:
>>
>> b369b30cf510fe94d8884837039362e2ec223cec
>>
>> Can you check if that shows the problem (I suspect it does) and if so
>> bisect between that and 9b89bc3857a6c0dfda18ddae2a42c114ecc32753, which
>> is the first commit of the merge?
> 
> Bart, do you have cycles to help out bisecting this?

Hi Christoph,

With your current nvme-4.20 branch I have not been able to reproduce 
this. The nvmeof-mp test scripts have not been changed. So I don't think 
we have to spend more time on this.

Bart.


* Kernel v4.19-rc4 KASAN complaint
  2018-10-08  6:11   ` Byungchul Park
@ 2018-10-08 10:13     ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-10-08 10:13 UTC (permalink / raw)


On Mon, Oct 08, 2018 at 03:11:52PM +0900, Byungchul Park wrote:
> Is it OK to call nvme_mpath_clear_current_path(ns) without holding
> any lock in nvme_failover_req(req)? Is it not possible to race with
> another rcu_assign_pointer(ns->head->current_path, some_value)?

We can race with an assignment, but it should be harmless, as that
call is just an optimization to get everyone to stop using the
path ASAP.  If we actually free the namespace we always go through
nvme_ns_remove, which calls nvme_mpath_clear_current_path under
subsys->lock.


* Kernel v4.19-rc4 KASAN complaint
       [not found] ` <31b80bc0-afc6-6bd9-c722-302f538d3e5b@lge.com>
@ 2018-10-08  6:11   ` Byungchul Park
  2018-10-08 10:13     ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Byungchul Park @ 2018-10-08  6:11 UTC (permalink / raw)


>On Tue, Sep 25, 2018 at 08:14:17PM -0700, Paul E. McKenney wrote:
>>I would expect something like this if someone did a double call_srcu()
>>or passed something to call_srcu() but then kept using it (for an example
>>of the latter, failed to make it inaccessible to readers before invoking
>>call_srcu() on it).  Yet another way to get here is to have unioned the
>>rcu_head structure with something used by the SRCU readers.
>>
>>The double call_srcu() can be located by building your kernel with
>>CONFIG_DEBUG_OBJECTS_RCU_HEAD=y and rerunning your tests.  The other
>>two usually require inspection or bisection.
>>
>>So, the eternal question:  Is bisection feasible?
>
>Looks like I misread the lines pointed to by gdb, and it indeed seems
>like a premature free of the nvme_ns_head structure, but I fail to see
>where.

Sorry for the noise if I am getting something wrong. I am not familiar
with the nvme code; I am just curious about the following.

Is it OK to call nvme_mpath_clear_current_path(ns) without holding
any lock in nvme_failover_req(req)? Is it not possible to race with
another rcu_assign_pointer(ns->head->current_path, some_value)?

In other words, is access to the namespace serialized when
nvme_failover_req(req) is called, so that no one else can be doing
rcu_assign_pointer(ns->head->current_path, some_value) at the same time?

Sorry again if I am missing something, but I hope this is helpful. Please ignore it if not.

Thanks,
Byungchul

>I've tried to bring up Bart's test case, but have failed so far.
>
>Bart, can you help out a bit?
>
>The last commit of the original 4.19 nvme merge is:
>
>b369b30cf510fe94d8884837039362e2ec223cec
>
>Can you check if that shows the problem (I suspect it does) and if so
>bisect between that and 9b89bc3857a6c0dfda18ddae2a42c114ecc32753, which
>is the first commit of the merge?
>
>>							Thanx, Paul

