* [PATCH] nvme-fabrics: add-remove ctrl repeat fix
@ 2016-07-01 19:13 Jay Freyensee
2016-07-04 8:48 ` Christoph Hellwig
2016-07-12 23:15 ` J Freyensee
0 siblings, 2 replies; 6+ messages in thread
From: Jay Freyensee @ 2016-07-01 19:13 UTC (permalink / raw)
Fix by: "Ming Lin" <mlin at kernel.org>
Repeatedly adding then removing the same NVMe-over-Fabrics controller
over and over again (shown below) can cause a kernel crash (also shown
below). This patch fixes that.
[nvmf]# ./setup_nvme_connections.sh
traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside
-nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=lightside
-nqn,hostnqn=good-wins-nqn > /dev/nvme-fabrics
[nvmf]# ./remove_nvme_connections.sh 2
echo 1 > /sys/class/nvme/nvme0/delete_controller
echo 1 > /sys/class/nvme/nvme1/delete_controller
[nvmf]# ./setup_nvme_connections.sh
traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside
-nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
Killed
[nvmf]# dmesg
[ 313.416908] nvme nvme0: creating 16 I/O queues.
[ 313.523908] nvme nvme0: new ctrl: NQN "darkside-nqn", addr
192.168.1.100:4420
[ 313.524857] BUG: unable to handle kernel NULL pointer dereference at
0000000000000010
[ 313.525262] IP: [<ffffffff8136c60e>] strcmp+0xe/0x30
[ 313.525490] PGD 0
[ 313.525726] Oops: 0000 [#1] SMP
[ 313.525900] Modules linked in: nvme_rdma nvme_fabrics nvme_core
ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_en
mlx4_ib ib_core mlx4_core
[ 313.527085] CPU: 15 PID: 5856 Comm: setup_nvme_conn Not tainted
4.7.0-rc2+ #2
[ 313.527259] Hardware name: Supermicro X9DRT-F/IBQF/IBFF/X9DRT
-F/IBQF/IBFF, BIOS 1.0a 10/09/2012
[ 313.527551] task: ffff88027646cd40 ti: ffff88025b980000 task.ti:
ffff88025b980000
[ 313.527879] RIP: 0010:[<ffffffff8136c60e>] [<ffffffff8136c60e>]
strcmp+0xe/0x30
[ 313.528232] RSP: 0018:ffff88025b983db0 EFLAGS: 00010206
[ 313.528403] RAX: 0000000000000000 RBX: ffff880471879880 RCX:
fffffffffffffff1
[ 313.528594] RDX: 0000000000000000 RSI: ffff880474afa860 RDI:
0000000000000011
[ 313.528778] RBP: ffff88025b983db0 R08: ffff880474afa860 R09:
ffff880471879058
[ 313.528956] R10: 000000000000002c R11: ffff88047f415000 R12:
ffff880471879800
[ 313.529129] R13: ffff880471879000 R14: ffff880474afa860 R15:
fffffffffffffff8
[ 313.529303] FS: 00007f778f510700(0000) GS:ffff88047fbc0000(0000)
knlGS:0000000000000000
[ 313.529629] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 313.529817] CR2: 0000000000000010 CR3: 0000000274174000 CR4:
00000000000406e0
[ 313.529989] Stack:
[ 313.530154] ffff88025b983e48 ffffffffa0171c74 0000000000000001
0000000000000059
[ 313.530621] ffff880476f32400 ffff88047e8add80 0000010074b33aa0
ffff880471879059
[ 313.531162] ffff88047187904b ffff880471879058 0000000000000000
ffff88047736e000
[ 313.531629] Call Trace:
[ 313.531797] [<ffffffffa0171c74>] nvmf_dev_write+0x674/0x840
[nvme_fabrics]
[ 313.531974] [<ffffffff81180b53>] __vfs_write+0x23/0x120
[ 313.532146] [<ffffffff8119daff>] ? __fd_install+0x1f/0xc0
[ 313.532316] [<ffffffff8119d97a>] ? __alloc_fd+0x3a/0x170
[ 313.532487] [<ffffffff811811f3>] vfs_write+0xb3/0x1b0
[ 313.532658] [<ffffffff8117e321>] ? filp_close+0x51/0x70
[ 313.532845] [<ffffffff811824e1>] SyS_write+0x41/0xa0
[ 313.533016] [<ffffffff8183055b>]
entry_SYSCALL_64_fastpath+0x13/0x8f
[ 313.533188] Code: 80 3a 00 75 f7 48 83 c6 01 0f b6 4e ff 48 83 c2 01
84 c9 88 4a ff 75 ed 5d c3 0f 1f 00 55 48 89 e5 eb 04 84 c0 74 18 48 83
c7 01 <0f> b6 47 ff 48 83 c6 01 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3 31
[ 313.536563] RIP [<ffffffff8136c60e>] strcmp+0xe/0x30
[ 313.536815] RSP <ffff88025b983db0>
[ 313.536981] CR2: 0000000000000010
[ 313.537151] ---[ end trace 3d952e590e7bc2d5 ]---
Reported-and-tested-by: Jay Freyensee <james.p.freyensee at intel.com>
Signed-off-by: Ming Lin <mlin at kernel.org>
Signed-off-by: Jay Freyensee <james.p.freyensee at intel.com>
---
drivers/nvme/host/fabrics.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 918310c..f8045e7 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -88,6 +88,10 @@ static void nvmf_host_destroy(struct kref *ref)
{
struct nvmf_host *host = container_of(ref, struct nvmf_host, ref);
+ mutex_lock(&nvmf_hosts_mutex);
+ list_del(&host->list);
+ mutex_unlock(&nvmf_hosts_mutex);
+
kfree(host);
}
--
2.5.5
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH] nvme-fabrics: add-remove ctrl repeat fix
2016-07-01 19:13 [PATCH] nvme-fabrics: add-remove ctrl repeat fix Jay Freyensee
@ 2016-07-04 8:48 ` Christoph Hellwig
2016-07-08 15:56 ` J Freyensee
2016-07-12 23:15 ` J Freyensee
1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2016-07-04 8:48 UTC (permalink / raw)
On Fri, Jul 01, 2016@12:13:32PM -0700, Jay Freyensee wrote:
> Fix by: "Ming Lin" <mlin at kernel.org>
So is this authored by you or by Ming? In the latter case this
should be a
From: "Ming Lin" <mlin@kernel.org>
The patch itself looks fine, I'll queue it up after we've sorted
out the above question, thanks to both of you.
But I think we'll also need a kref_get_unless_zero in __nvmf_host_find
to make this code fully correct, I'll prepare a patch for that.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] nvme-fabrics: add-remove ctrl repeat fix
2016-07-04 8:48 ` Christoph Hellwig
@ 2016-07-08 15:56 ` J Freyensee
0 siblings, 0 replies; 6+ messages in thread
From: J Freyensee @ 2016-07-08 15:56 UTC (permalink / raw)
On Mon, 2016-07-04@01:48 -0700, Christoph Hellwig wrote:
> On Fri, Jul 01, 2016@12:13:32PM -0700, Jay Freyensee wrote:
> > Fix by: "Ming Lin" <mlin at kernel.org>
>
> So is this authored by you or by Ming? In the latter case this
> should be a
>
> From: "Ming Lin" <mlin at kernel.org>
>
> The patch itself looks fine, I'll queue it up after we've sorted
> out the above question, thanks to both of you.
Sorry for the late reply, back from an extended 4th of July vacation.
Ming authored the fix after I found the kernel crash and sent it to me
for testing. So yes, to answer your question, the title would be
"From: Ming Lin". Sorry about that.
Jay
> But I think we'll also need a kref_get_unless_zero in
> __nvmf_host_find
> to make this code fully correct, I'll prepare a patch for that.
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] nvme-fabrics: add-remove ctrl repeat fix
2016-07-01 19:13 [PATCH] nvme-fabrics: add-remove ctrl repeat fix Jay Freyensee
2016-07-04 8:48 ` Christoph Hellwig
@ 2016-07-12 23:15 ` J Freyensee
2016-07-12 23:38 ` Jens Axboe
1 sibling, 1 reply; 6+ messages in thread
From: J Freyensee @ 2016-07-12 23:15 UTC (permalink / raw)
On Fri, 2016-07-01@12:13 -0700, Jay Freyensee wrote:
I haven't seen this patch get folded in on the list yet, though
Christoph OK'ed the fix, minus the 'Fix by:' I used instead of 'From:'.
Has this been folded into 4.8? Should I go ahead and re-spin the patch
and make the tweak change? Be nice to eliminate this kernel crash with
fabrics being merged into 4.8 kernel.
Thanks,
Jay
> Fix by: "Ming Lin" <mlin at kernel.org>
>
> Repeatedly adding then removing the same NVMe-over-Fabrics controller
> over and over again (shown below) can cause a kernel crash (also
> shown
> below). This patch fixes that.
>
> [nvmf]# ./setup_nvme_connections.sh
> traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside
> -nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
> traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=lightside
> -nqn,hostnqn=good-wins-nqn > /dev/nvme-fabrics
> [nvmf]# ./remove_nvme_connections.sh 2
> echo 1 > /sys/class/nvme/nvme0/delete_controller
> echo 1 > /sys/class/nvme/nvme1/delete_controller
> [nvmf]# ./setup_nvme_connections.sh
> traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside
> -nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
> Killed
>
> [nvmf]# dmesg
> [ 313.416908] nvme nvme0: creating 16 I/O queues.
> [ 313.523908] nvme nvme0: new ctrl: NQN "darkside-nqn", addr
> 192.168.1.100:4420
> [ 313.524857] BUG: unable to handle kernel NULL pointer dereference
> at
> 0000000000000010
> [ 313.525262] IP: [<ffffffff8136c60e>] strcmp+0xe/0x30
> [ 313.525490] PGD 0
> [ 313.525726] Oops: 0000 [#1] SMP
> [ 313.525900] Modules linked in: nvme_rdma nvme_fabrics nvme_core
> ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
> mlx4_en
> mlx4_ib ib_core mlx4_core
> [ 313.527085] CPU: 15 PID: 5856 Comm: setup_nvme_conn Not tainted
> 4.7.0-rc2+ #2
> [ 313.527259] Hardware name: Supermicro X9DRT-F/IBQF/IBFF/X9DRT
> -F/IBQF/IBFF, BIOS 1.0a 10/09/2012
> [ 313.527551] task: ffff88027646cd40 ti: ffff88025b980000 task.ti:
> ffff88025b980000
> [ 313.527879] RIP: 0010:[<ffffffff8136c60e>] [<ffffffff8136c60e>]
> strcmp+0xe/0x30
> [ 313.528232] RSP: 0018:ffff88025b983db0 EFLAGS: 00010206
> [ 313.528403] RAX: 0000000000000000 RBX: ffff880471879880 RCX:
> fffffffffffffff1
> [ 313.528594] RDX: 0000000000000000 RSI: ffff880474afa860 RDI:
> 0000000000000011
> [ 313.528778] RBP: ffff88025b983db0 R08: ffff880474afa860 R09:
> ffff880471879058
> [ 313.528956] R10: 000000000000002c R11: ffff88047f415000 R12:
> ffff880471879800
> [ 313.529129] R13: ffff880471879000 R14: ffff880474afa860 R15:
> fffffffffffffff8
> [ 313.529303] FS: 00007f778f510700(0000) GS:ffff88047fbc0000(0000)
> knlGS:0000000000000000
> [ 313.529629] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 313.529817] CR2: 0000000000000010 CR3: 0000000274174000 CR4:
> 00000000000406e0
> [ 313.529989] Stack:
> [ 313.530154] ffff88025b983e48 ffffffffa0171c74 0000000000000001
> 0000000000000059
> [ 313.530621] ffff880476f32400 ffff88047e8add80 0000010074b33aa0
> ffff880471879059
> [ 313.531162] ffff88047187904b ffff880471879058 0000000000000000
> ffff88047736e000
> [ 313.531629] Call Trace:
> [ 313.531797] [<ffffffffa0171c74>] nvmf_dev_write+0x674/0x840
> [nvme_fabrics]
> [ 313.531974] [<ffffffff81180b53>] __vfs_write+0x23/0x120
> [ 313.532146] [<ffffffff8119daff>] ? __fd_install+0x1f/0xc0
> [ 313.532316] [<ffffffff8119d97a>] ? __alloc_fd+0x3a/0x170
> [ 313.532487] [<ffffffff811811f3>] vfs_write+0xb3/0x1b0
> [ 313.532658] [<ffffffff8117e321>] ? filp_close+0x51/0x70
> [ 313.532845] [<ffffffff811824e1>] SyS_write+0x41/0xa0
> [ 313.533016] [<ffffffff8183055b>]
> entry_SYSCALL_64_fastpath+0x13/0x8f
> [ 313.533188] Code: 80 3a 00 75 f7 48 83 c6 01 0f b6 4e ff 48 83 c2
> 01
> 84 c9 88 4a ff 75 ed 5d c3 0f 1f 00 55 48 89 e5 eb 04 84 c0 74 18 48
> 83
> c7 01 <0f> b6 47 ff 48 83 c6 01 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3
> 31
> [ 313.536563] RIP [<ffffffff8136c60e>] strcmp+0xe/0x30
> [ 313.536815] RSP <ffff88025b983db0>
> [ 313.536981] CR2: 0000000000000010
> [ 313.537151] ---[ end trace 3d952e590e7bc2d5 ]---
>
> Reported-and-tested-by: Jay Freyensee <james.p.freyensee at intel.com>
> Signed-off-by: Ming Lin <mlin at kernel.org>
> Signed-off-by: Jay Freyensee <james.p.freyensee at intel.com>
> ---
> drivers/nvme/host/fabrics.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/nvme/host/fabrics.c
> b/drivers/nvme/host/fabrics.c
> index 918310c..f8045e7 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -88,6 +88,10 @@ static void nvmf_host_destroy(struct kref *ref)
> {
> struct nvmf_host *host = container_of(ref, struct nvmf_host,
> ref);
>
> + mutex_lock(&nvmf_hosts_mutex);
> + list_del(&host->list);
> + mutex_unlock(&nvmf_hosts_mutex);
> +
> kfree(host);
> }
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] nvme-fabrics: add-remove ctrl repeat fix
2016-07-12 23:15 ` J Freyensee
@ 2016-07-12 23:38 ` Jens Axboe
2016-07-13 16:16 ` J Freyensee
0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2016-07-12 23:38 UTC (permalink / raw)
On 07/12/2016 04:15 PM, J Freyensee wrote:
> On Fri, 2016-07-01@12:13 -0700, Jay Freyensee wrote:
>
> I haven't seen this patch get folded in on the list yet, though
> Christoph OK'ed the fix, minus the 'Fix by:' I used instead of 'From:'.
>
> Has this been folded into 4.8? Should I go ahead and re-spin the patch
> and make the tweak change? Be nice to eliminate this kernel crash with
> fabrics being merged into 4.8 kernel.
It was added into for-4.8/drivers earlier today, with the author
attribution fixed.
--
Jens Axboe
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] nvme-fabrics: add-remove ctrl repeat fix
2016-07-12 23:38 ` Jens Axboe
@ 2016-07-13 16:16 ` J Freyensee
0 siblings, 0 replies; 6+ messages in thread
From: J Freyensee @ 2016-07-13 16:16 UTC (permalink / raw)
On Tue, 2016-07-12@16:38 -0700, Jens Axboe wrote:
> On 07/12/2016 04:15 PM, J Freyensee wrote:
> > On Fri, 2016-07-01@12:13 -0700, Jay Freyensee wrote:
> >
> > I haven't seen this patch get folded in on the list yet, though
> > Christoph OK'ed the fix, minus the 'Fix by:' I used instead of
> > 'From:'.
> >
> > Has this been folded into 4.8? Should I go ahead and re-spin the
> > patch
> > and make the tweak change? Be nice to eliminate this kernel crash
> > with
> > fabrics being merged into 4.8 kernel.
>
> It was added into for-4.8/drivers earlier today, with the author
> attribution fixed.
Thanks Jens!!
>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-07-13 16:16 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-01 19:13 [PATCH] nvme-fabrics: add-remove ctrl repeat fix Jay Freyensee
2016-07-04 8:48 ` Christoph Hellwig
2016-07-08 15:56 ` J Freyensee
2016-07-12 23:15 ` J Freyensee
2016-07-12 23:38 ` Jens Axboe
2016-07-13 16:16 ` J Freyensee
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.