linux-nvme.lists.infradead.org archive mirror
From: Daniel Wagner <dwagner@suse.de>
To: Johannes Thumshirn <jthumshirn@suse.de>
Cc: hch@lst.de, linux-nvme@lists.infradead.org,
	Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>,
	sagi@grimberg.me
Subject: Re: [PATCH] nvmet: Always remove processed AER elements from list
Date: Mon, 4 Nov 2019 11:19:35 +0100
Message-ID: <20191104101935.lzdhraz5wnd56g4r@beryllium.lan>
In-Reply-To: <20191104095034.GA3193@linux-lxv2>

On Mon, Nov 04, 2019 at 10:50:34AM +0100, Johannes Thumshirn wrote:
> On Mon, Nov 04, 2019 at 09:13:38AM +0100, Daniel Wagner wrote:
> [...]
> > BTW, I got feedback on how to produce the oops on our customer's setup:
> > 
> > Test Steps:
> > 1. Create two targets on node n1 as target side (tgt1 and tgt2)
> > 2. Run perf to write data to tgt2:
> >    sudo ./perf -q 1 -w read -o 4096 -t 60 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.219.4 trsvcid:4421'
> > 3. Delete the tgt2 configuration on node n1 during perf execution
> >    and reboot the node n1 automatically.

sudo rmdir subsystems/199212f7-33d0-46c3-b362-e47a217c2c49/namespaces/2
sudo unlink /sys/kernel/config/nvmet/ports/2/subsystems/199212f7-33d0-46c3-b362-e47a217c2c49
sudo rmdir subsystems/199212f7-33d0-46c3-b362-e47a217c2c49/
sudo rmdir ports/2/

> I assume this "perf" is some kind of performance test. So it should be
> possible to create a blktest recreating this issue.
> 
> I'd translate the "perf" line into something similar to 'fio -rw=read -bs 4k \
> --time_based --timeout=60 --iodepth=1' and then running on a mpathed
> nvme-loop and delete one of the controllers while the IO is running. This
> should be a fairly usual multi-path test. The only thing I can't see yet
> is, how we end up with unprocessed AENs here.

The 'perf' command seems to be able to write to a specific target. At
least I read this from another command I see in the feedback I
got:

  Run perf to write data to tgt1 and tgt2 for a long time, command:
    sudo ./perf -q 1 -w read -o 4096 -t 600000 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.219.4 trsvcid:4420'
    sudo ./perf -q 1 -w read -o 4096 -t 600000 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.219.4 trsvcid:4421'


Anyway, I also have a few lines from dmesg before the crash (this is
without the patch):

[ 1507.453659] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0052-3910-8036-c7c04f315332.
[ 1507.453794] nvme nvme12: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.219.4:4421
[ 1507.453925] nvme nvme12: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 1926.227086] nvmet: creating controller 1 for subsystem 416d7463-d3c2-41e4-b2f5-76df2ed3db26 for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0052-4210-8033-c7c04f315332.
[ 1927.992304] nvmet: creating controller 2 for subsystem 199212f7-33d0-46c3-b362-e47a217c2c49 for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0052-4210-8033-c7c04f315332.
[ 2135.800592] nvmet: adding nsid 2 to subsystem 199212f7-33d0-46c3-b362-e47a217c2c49
[ 2137.174479] nvmet_rdma: enabling port 2 (192.168.219.4:4421)
[ 2138.597566] nvmet: creating controller 2 for subsystem 199212f7-33d0-46c3-b362-e47a217c2c49 for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0052-4210-8033-c7c04f315332.
[ 2912.103808] nvmet: ctrl 2 keep-alive timer (15 seconds) expired!
[ 2912.103812] nvmet: ctrl 2 fatal error occurred!
[ 4005.326810] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN 2014-08.org.nvmexpress:uuid:a9ebf41b-201c-42e1-9b09-4dd7fb70e2c5.
[ 4005.329045] nvmet: creating controller 2 for subsystem 416d7463-d3c2-41e4-b2f5-76df2ed3db26 for NQN 2014-08.org.nvmexpress:uuid:a9ebf41b-201c-42e1-9b09-4dd7fb70e2c5.
[ 4068.052392] nvmet: adding nsid 2 to subsystem 199212f7-33d0-46c3-b362-e47a217c2c49
[ 4068.106984] nvmet_rdma: enabling port 2 (192.168.219.4:4421)
[ 4078.581846] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN 2014-08.org.nvmexpress:uuid:742c21ef-fcee-4b64-aec1-6063ae5710e4.
[ 4078.583849] nvmet: creating controller 2 for subsystem 199212f7-33d0-46c3-b362-e47a217c2c49 for NQN 2014-08.org.nvmexpress:uuid:742c21ef-fcee-4b64-aec1-6063ae5710e4.
[ 5076.654122] nvmet: adding nsid 1 to subsystem 416d7463-d3c2-41e4-b2f5-76df2ed3db26
[ 5076.720100] nvmet_rdma: enabling port 1 (192.168.219.4:4420)
[ 5076.744432] nvmet: adding nsid 2 to subsystem 199212f7-33d0-46c3-b362-e47a217c2c49
[ 5076.804401] nvmet_rdma: enabling port 2 (192.168.219.4:4421)
[ 5115.805550] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN 2014-08.org.nvmexpress:uuid:988626aa-4466-44dc-8142-6bfc382192e3.
[ 5115.807714] nvmet: creating controller 2 for subsystem 199212f7-33d0-46c3-b362-e47a217c2c49 for NQN 2014-08.org.nvmexpress:uuid:988626aa-4466-44dc-8142-6bfc382192e3.
[ 5136.186393] general protection fault: 0000 [#1] SMP PTI
[ 5136.191643] CPU: 14 PID: 54995 Comm: kworker/14:1 Tainted: G           OE      4.12.14-95.19.1.17807.0.PTF.1134097-default #1 SLE12-SP4
[ 5136.203873] Hardware name: Dell Inc. PowerEdge R740xd/01KPX8, BIOS 1.5.4 07/30/2018
[ 5136.211554] Workqueue: events nvmet_async_event_work [nvmet]
[ 5136.217230] task: ffff983bdb11d0c0 task.stack: ffffafc449b58000
[ 5136.223169] RIP: 0010:nvmet_async_event_work+0x5e/0xb0 [nvmet]
[ 5136.229017] RSP: 0018:ffffafc449b5be58 EFLAGS: 00010202
[ 5136.234258] RAX: dead000000000100 RBX: ffff983b99b830b0 RCX: 0000000000000002
[ 5136.241404] RDX: 0000000000040002 RSI: 38ffff983bdb3c05 RDI: ffff983bdb9412c0
[ 5136.248552] RBP: ffff983b99b83018 R08: 00000000fffffff0 R09: 0000000000000000
[ 5136.255709] R10: fffffffffffffff4 R11: ffffffffffffff83 R12: ffff983b99b830a0
[ 5136.262887] R13: ffff983bcaa55f29 R14: 0000000000000000 R15: 0ffff983c501e860
[ 5136.270045] FS:  0000000000000000(0000) GS:ffff983c501c0000(0000) knlGS:0000000000000000
[ 5136.278160] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5136.283925] CR2: 00007f06d9f246e0 CR3: 00000032ed00a001 CR4: 00000000007606e0
[ 5136.291073] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5136.298236] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5136.305398] PKRU: 55555554
[ 5136.308121] Call Trace:
[ 5136.310587]  process_one_work+0x14c/0x390
[ 5136.314619]  worker_thread+0x47/0x3e0
[ 5136.318294]  kthread+0xff/0x140
[ 5136.321462]  ? max_active_store+0x60/0x60
[ 5136.325503]  ? __kthread_parkme+0x70/0x70
[ 5136.329526]  ret_from_fork+0x35/0x40
[ 5136.333131] Code: b8 00 01 00 00 00 00 ad de 89 53 e8 4c 8b 6c d3 c8 0f b6 4f 12 0f b6 57 11 49 8b 75 08 c1 e1 10 c1 e2 08 09 ca 0f b6 4f 10 09 ca <89> 16 48 8b 0f 48 8b 57 08 48 89 51 08 48 89 0a 48 89 07 66 b8
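
On x86-64 a general protection fault is commonly raised when an
instruction dereferences a non-canonical address. The following Python
sketch checks a few of the register values above. It assumes 4-level
paging (48-bit virtual addresses), which is an assumption about this
machine; the helper is_canonical is hypothetical and only illustrates
the rule that bits 63..47 must all be equal:

```python
def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits-1) of addr are all equal (sign-extended)."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# Register values copied from the oops above.
regs = {
    "RAX": 0xdead000000000100,
    "RBX": 0xffff983b99b830b0,
    "RSI": 0x38ffff983bdb3c05,
    "RDI": 0xffff983bdb9412c0,
    "R13": 0xffff983bcaa55f29,
    "R15": 0x0ffff983c501e860,
}

non_canonical = [name for name, value in regs.items() if not is_canonical(value)]
print(non_canonical)
```

RAX matches what LIST_POISON1 evaluates to on x86_64 (0x100 plus the
0xdead000000000000 poison delta). Which register the faulting
instruction actually dereferenced would still need to be confirmed by
disassembling the Code: line.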


At 2912 tgt2 is probably deleted. I wonder why there are two "enabling
port 2" after it.
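
The count can be cross-checked against the log with a small Python
sketch. The DMESG string holds only a handful of lines copied from the
excerpt above, and the function name port_enables is made up for this
example:

```python
import re

# Lines copied from the dmesg excerpt (keep-alive expiry, port enables, fault).
DMESG = """\
[ 2137.174479] nvmet_rdma: enabling port 2 (192.168.219.4:4421)
[ 2912.103808] nvmet: ctrl 2 keep-alive timer (15 seconds) expired!
[ 4068.106984] nvmet_rdma: enabling port 2 (192.168.219.4:4421)
[ 5076.720100] nvmet_rdma: enabling port 1 (192.168.219.4:4420)
[ 5076.804401] nvmet_rdma: enabling port 2 (192.168.219.4:4421)
[ 5136.186393] general protection fault: 0000 [#1] SMP PTI
"""

LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
PORT = re.compile(r"enabling port (\d+) ")


def port_enables(log, port):
    """Return the timestamps (in seconds) at which `port` was enabled."""
    stamps = []
    for raw in log.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        p = PORT.search(m.group(2))
        if p and int(p.group(1)) == port:
            stamps.append(float(m.group(1)))
    return stamps


expiry = 2912.103808  # keep-alive timer expiry for ctrl 2
after = [t for t in port_enables(DMESG, 2) if t > expiry]
print(after)
```

This only confirms the ordering of events in the log; it does not
explain why port 2 was enabled again.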

Thanks,
Daniel

