All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nvme-fabrics: add-remove ctrl repeat fix
@ 2016-07-01 19:13 Jay Freyensee
  2016-07-04  8:48 ` Christoph Hellwig
  2016-07-12 23:15 ` J Freyensee
  0 siblings, 2 replies; 6+ messages in thread
From: Jay Freyensee @ 2016-07-01 19:13 UTC (permalink / raw)


Fix by: "Ming Lin" <mlin at kernel.org>

Repeatedly adding then removing the same NVMe-over-Fabrics controller
over and over again (shown below) can cause a kernel crash (also shown
below).  This patch fixes that.

[nvmf]# ./setup_nvme_connections.sh
traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside
-nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=lightside
-nqn,hostnqn=good-wins-nqn > /dev/nvme-fabrics
[nvmf]# ./remove_nvme_connections.sh 2
echo 1 > /sys/class/nvme/nvme0/delete_controller
echo 1 > /sys/class/nvme/nvme1/delete_controller
[nvmf]# ./setup_nvme_connections.sh
traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside
-nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
Killed

[nvmf]# dmesg
[  313.416908] nvme nvme0: creating 16 I/O queues.
[  313.523908] nvme nvme0: new ctrl: NQN "darkside-nqn", addr
192.168.1.100:4420
[  313.524857] BUG: unable to handle kernel NULL pointer dereference at
0000000000000010
[  313.525262] IP: [<ffffffff8136c60e>] strcmp+0xe/0x30
[  313.525490] PGD 0
[  313.525726] Oops: 0000 [#1] SMP
[  313.525900] Modules linked in: nvme_rdma nvme_fabrics nvme_core
ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_en
mlx4_ib ib_core mlx4_core
[  313.527085] CPU: 15 PID: 5856 Comm: setup_nvme_conn Not tainted
4.7.0-rc2+ #2
[  313.527259] Hardware name: Supermicro X9DRT-F/IBQF/IBFF/X9DRT
-F/IBQF/IBFF, BIOS 1.0a 10/09/2012
[  313.527551] task: ffff88027646cd40 ti: ffff88025b980000 task.ti:
ffff88025b980000
[  313.527879] RIP: 0010:[<ffffffff8136c60e>]  [<ffffffff8136c60e>]
strcmp+0xe/0x30
[  313.528232] RSP: 0018:ffff88025b983db0  EFLAGS: 00010206
[  313.528403] RAX: 0000000000000000 RBX: ffff880471879880 RCX:
fffffffffffffff1
[  313.528594] RDX: 0000000000000000 RSI: ffff880474afa860 RDI:
0000000000000011
[  313.528778] RBP: ffff88025b983db0 R08: ffff880474afa860 R09:
ffff880471879058
[  313.528956] R10: 000000000000002c R11: ffff88047f415000 R12:
ffff880471879800
[  313.529129] R13: ffff880471879000 R14: ffff880474afa860 R15:
fffffffffffffff8
[  313.529303] FS:  00007f778f510700(0000) GS:ffff88047fbc0000(0000)
knlGS:0000000000000000
[  313.529629] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  313.529817] CR2: 0000000000000010 CR3: 0000000274174000 CR4:
00000000000406e0
[  313.529989] Stack:
[  313.530154]  ffff88025b983e48 ffffffffa0171c74 0000000000000001
0000000000000059
[  313.530621]  ffff880476f32400 ffff88047e8add80 0000010074b33aa0
ffff880471879059
[  313.531162]  ffff88047187904b ffff880471879058 0000000000000000
ffff88047736e000
[  313.531629] Call Trace:
[  313.531797]  [<ffffffffa0171c74>] nvmf_dev_write+0x674/0x840
[nvme_fabrics]
[  313.531974]  [<ffffffff81180b53>] __vfs_write+0x23/0x120
[  313.532146]  [<ffffffff8119daff>] ? __fd_install+0x1f/0xc0
[  313.532316]  [<ffffffff8119d97a>] ? __alloc_fd+0x3a/0x170
[  313.532487]  [<ffffffff811811f3>] vfs_write+0xb3/0x1b0
[  313.532658]  [<ffffffff8117e321>] ? filp_close+0x51/0x70
[  313.532845]  [<ffffffff811824e1>] SyS_write+0x41/0xa0
[  313.533016]  [<ffffffff8183055b>]
entry_SYSCALL_64_fastpath+0x13/0x8f
[  313.533188] Code: 80 3a 00 75 f7 48 83 c6 01 0f b6 4e ff 48 83 c2 01
84 c9 88 4a ff 75 ed 5d c3 0f 1f 00 55 48 89 e5 eb 04 84 c0 74 18 48 83
c7 01 <0f> b6 47 ff 48 83 c6 01 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3 31
[  313.536563] RIP  [<ffffffff8136c60e>] strcmp+0xe/0x30
[  313.536815]  RSP <ffff88025b983db0>
[  313.536981] CR2: 0000000000000010
[  313.537151] ---[ end trace 3d952e590e7bc2d5 ]---

Reported-and-tested-by: Jay Freyensee <james.p.freyensee at intel.com>
Signed-off-by: Ming Lin <mlin at kernel.org>
Signed-off-by: Jay Freyensee <james.p.freyensee at intel.com>
---
 drivers/nvme/host/fabrics.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 918310c..f8045e7 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -88,6 +88,10 @@ static void nvmf_host_destroy(struct kref *ref)
 {
 	struct nvmf_host *host = container_of(ref, struct nvmf_host, ref);
 
+	mutex_lock(&nvmf_hosts_mutex);
+	list_del(&host->list);
+	mutex_unlock(&nvmf_hosts_mutex);
+
 	kfree(host);
 }
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] nvme-fabrics: add-remove ctrl repeat fix
  2016-07-01 19:13 [PATCH] nvme-fabrics: add-remove ctrl repeat fix Jay Freyensee
@ 2016-07-04  8:48 ` Christoph Hellwig
  2016-07-08 15:56   ` J Freyensee
  2016-07-12 23:15 ` J Freyensee
  1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2016-07-04  8:48 UTC (permalink / raw)


On Fri, Jul 01, 2016@12:13:32PM -0700, Jay Freyensee wrote:
> Fix by: "Ming Lin" <mlin at kernel.org>

So is this authored by you or by Ming?  In the latter case this
should be a

From: "Ming Lin" <mlin@kernel.org>

The patch itself looks fine, I'll queue it up after we've sorted
out the above question, thanks to both of you.

But I think we'll also need a kref_get_unless_zero in __nvmf_host_find
to make this code fully correct, I'll prepare a patch for that.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] nvme-fabrics: add-remove ctrl repeat fix
  2016-07-04  8:48 ` Christoph Hellwig
@ 2016-07-08 15:56   ` J Freyensee
  0 siblings, 0 replies; 6+ messages in thread
From: J Freyensee @ 2016-07-08 15:56 UTC (permalink / raw)


On Mon, 2016-07-04@01:48 -0700, Christoph Hellwig wrote:
> On Fri, Jul 01, 2016@12:13:32PM -0700, Jay Freyensee wrote:
> > Fix by: "Ming Lin" <mlin at kernel.org>
> 
> So is this authored by you or by Ming?  In the latter case this
> should be a
> 
> From: "Ming Lin" <mlin at kernel.org>
> 
> The patch itself looks fine, I'll queue it up after we've sorted
> out the above question, thanks to both of you.

Sorry for the late reply, back from an extended 4th of July vacation.

Ming authored the fix after I found the kernel crash and sent it to me
for testing.  So yes, to answer your question, the title would be
"From: Ming Lin".  Sorry about that.

Jay

> But I think we'll also need a kref_get_unless_zero in
> __nvmf_host_find
> to make this code fully correct, I'll prepare a patch for that.
> 
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] nvme-fabrics: add-remove ctrl repeat fix
  2016-07-01 19:13 [PATCH] nvme-fabrics: add-remove ctrl repeat fix Jay Freyensee
  2016-07-04  8:48 ` Christoph Hellwig
@ 2016-07-12 23:15 ` J Freyensee
  2016-07-12 23:38   ` Jens Axboe
  1 sibling, 1 reply; 6+ messages in thread
From: J Freyensee @ 2016-07-12 23:15 UTC (permalink / raw)


On Fri, 2016-07-01@12:13 -0700, Jay Freyensee wrote:

I haven't seen this patch get folded in on the list yet, though
Christoph OK'ed the fix, minus the 'Fix by:' I used instead of 'From:'.

Has this been folded into 4.8?  Should I go ahead and re-spin the patch
and make the tweak change?  Be nice to eliminate this kernel crash with
fabrics being merged into 4.8 kernel.

Thanks,
Jay

> Fix by: "Ming Lin" <mlin at kernel.org>
> 
> Repeatedly adding then removing the same NVMe-over-Fabrics controller
> over and over again (shown below) can cause a kernel crash (also
> shown
> below).  This patch fixes that.
> 
> [nvmf]# ./setup_nvme_connections.sh
> traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside
> -nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
> traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=lightside
> -nqn,hostnqn=good-wins-nqn > /dev/nvme-fabrics
> [nvmf]# ./remove_nvme_connections.sh 2
> echo 1 > /sys/class/nvme/nvme0/delete_controller
> echo 1 > /sys/class/nvme/nvme1/delete_controller
> [nvmf]# ./setup_nvme_connections.sh
> traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside
> -nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
> Killed
> 
> [nvmf]# dmesg
> [  313.416908] nvme nvme0: creating 16 I/O queues.
> [  313.523908] nvme nvme0: new ctrl: NQN "darkside-nqn", addr
> 192.168.1.100:4420
> [  313.524857] BUG: unable to handle kernel NULL pointer dereference
> at
> 0000000000000010
> [  313.525262] IP: [<ffffffff8136c60e>] strcmp+0xe/0x30
> [  313.525490] PGD 0
> [  313.525726] Oops: 0000 [#1] SMP
> [  313.525900] Modules linked in: nvme_rdma nvme_fabrics nvme_core
> ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
> mlx4_en
> mlx4_ib ib_core mlx4_core
> [  313.527085] CPU: 15 PID: 5856 Comm: setup_nvme_conn Not tainted
> 4.7.0-rc2+ #2
> [  313.527259] Hardware name: Supermicro X9DRT-F/IBQF/IBFF/X9DRT
> -F/IBQF/IBFF, BIOS 1.0a 10/09/2012
> [  313.527551] task: ffff88027646cd40 ti: ffff88025b980000 task.ti:
> ffff88025b980000
> [  313.527879] RIP: 0010:[<ffffffff8136c60e>]  [<ffffffff8136c60e>]
> strcmp+0xe/0x30
> [  313.528232] RSP: 0018:ffff88025b983db0  EFLAGS: 00010206
> [  313.528403] RAX: 0000000000000000 RBX: ffff880471879880 RCX:
> fffffffffffffff1
> [  313.528594] RDX: 0000000000000000 RSI: ffff880474afa860 RDI:
> 0000000000000011
> [  313.528778] RBP: ffff88025b983db0 R08: ffff880474afa860 R09:
> ffff880471879058
> [  313.528956] R10: 000000000000002c R11: ffff88047f415000 R12:
> ffff880471879800
> [  313.529129] R13: ffff880471879000 R14: ffff880474afa860 R15:
> fffffffffffffff8
> [  313.529303] FS:  00007f778f510700(0000) GS:ffff88047fbc0000(0000)
> knlGS:0000000000000000
> [  313.529629] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  313.529817] CR2: 0000000000000010 CR3: 0000000274174000 CR4:
> 00000000000406e0
> [  313.529989] Stack:
> [  313.530154]  ffff88025b983e48 ffffffffa0171c74 0000000000000001
> 0000000000000059
> [  313.530621]  ffff880476f32400 ffff88047e8add80 0000010074b33aa0
> ffff880471879059
> [  313.531162]  ffff88047187904b ffff880471879058 0000000000000000
> ffff88047736e000
> [  313.531629] Call Trace:
> [  313.531797]  [<ffffffffa0171c74>] nvmf_dev_write+0x674/0x840
> [nvme_fabrics]
> [  313.531974]  [<ffffffff81180b53>] __vfs_write+0x23/0x120
> [  313.532146]  [<ffffffff8119daff>] ? __fd_install+0x1f/0xc0
> [  313.532316]  [<ffffffff8119d97a>] ? __alloc_fd+0x3a/0x170
> [  313.532487]  [<ffffffff811811f3>] vfs_write+0xb3/0x1b0
> [  313.532658]  [<ffffffff8117e321>] ? filp_close+0x51/0x70
> [  313.532845]  [<ffffffff811824e1>] SyS_write+0x41/0xa0
> [  313.533016]  [<ffffffff8183055b>]
> entry_SYSCALL_64_fastpath+0x13/0x8f
> [  313.533188] Code: 80 3a 00 75 f7 48 83 c6 01 0f b6 4e ff 48 83 c2
> 01
> 84 c9 88 4a ff 75 ed 5d c3 0f 1f 00 55 48 89 e5 eb 04 84 c0 74 18 48
> 83
> c7 01 <0f> b6 47 ff 48 83 c6 01 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3
> 31
> [  313.536563] RIP  [<ffffffff8136c60e>] strcmp+0xe/0x30
> [  313.536815]  RSP <ffff88025b983db0>
> [  313.536981] CR2: 0000000000000010
> [  313.537151] ---[ end trace 3d952e590e7bc2d5 ]---
> 
> Reported-and-tested-by: Jay Freyensee <james.p.freyensee at intel.com>
> Signed-off-by: Ming Lin <mlin at kernel.org>
> Signed-off-by: Jay Freyensee <james.p.freyensee at intel.com>
> ---
>  drivers/nvme/host/fabrics.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/nvme/host/fabrics.c
> b/drivers/nvme/host/fabrics.c
> index 918310c..f8045e7 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -88,6 +88,10 @@ static void nvmf_host_destroy(struct kref *ref)
>  {
>  	struct nvmf_host *host = container_of(ref, struct nvmf_host,
> ref);
>  
> +	mutex_lock(&nvmf_hosts_mutex);
> +	list_del(&host->list);
> +	mutex_unlock(&nvmf_hosts_mutex);
> +
>  	kfree(host);
>  }
>  

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] nvme-fabrics: add-remove ctrl repeat fix
  2016-07-12 23:15 ` J Freyensee
@ 2016-07-12 23:38   ` Jens Axboe
  2016-07-13 16:16     ` J Freyensee
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2016-07-12 23:38 UTC (permalink / raw)


On 07/12/2016 04:15 PM, J Freyensee wrote:
> On Fri, 2016-07-01@12:13 -0700, Jay Freyensee wrote:
>
> I haven't seen this patch get folded in on the list yet, though
> Christoph OK'ed the fix, minus the 'Fix by:' I used instead of 'From:'.
>
> Has this been folded into 4.8?  Should I go ahead and re-spin the patch
> and make the tweak change?  Be nice to eliminate this kernel crash with
> fabrics being merged into 4.8 kernel.

It was added into for-4.8/drivers earlier today, with the author 
attribution fixed.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] nvme-fabrics: add-remove ctrl repeat fix
  2016-07-12 23:38   ` Jens Axboe
@ 2016-07-13 16:16     ` J Freyensee
  0 siblings, 0 replies; 6+ messages in thread
From: J Freyensee @ 2016-07-13 16:16 UTC (permalink / raw)


On Tue, 2016-07-12@16:38 -0700, Jens Axboe wrote:
> On 07/12/2016 04:15 PM, J Freyensee wrote:
> > On Fri, 2016-07-01@12:13 -0700, Jay Freyensee wrote:
> > 
> > I haven't seen this patch get folded in on the list yet, though
> > Christoph OK'ed the fix, minus the 'Fix by:' I used instead of
> > 'From:'.
> > 
> > Has this been folded into 4.8?  Should I go ahead and re-spin the
> > patch
> > and make the tweak change?  Be nice to eliminate this kernel crash
> > with
> > fabrics being merged into 4.8 kernel.
> 
> It was added into for-4.8/drivers earlier today, with the author 
> attribution fixed.

Thanks Jens!!

> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-07-13 16:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-01 19:13 [PATCH] nvme-fabrics: add-remove ctrl repeat fix Jay Freyensee
2016-07-04  8:48 ` Christoph Hellwig
2016-07-08 15:56   ` J Freyensee
2016-07-12 23:15 ` J Freyensee
2016-07-12 23:38   ` Jens Axboe
2016-07-13 16:16     ` J Freyensee

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.