From: Chris Leech <cleech@redhat.com>
To: linux-nvme@lists.infradead.org, sagi@grimberg.me, hch@lst.de
Cc: lengchao@huawei.com, dwagner@suse.de, hare@suse.de,
mlombard@redhat.com, jmeneghi@redhat.com
Subject: kdump details (Re: nvme-multipath: round-robin infinite looping)
Date: Mon, 21 Mar 2022 15:43:02 -0700 [thread overview]
Message-ID: <20220321224304.955072-2-cleech@redhat.com> (raw)
In-Reply-To: <20220321224304.955072-1-cleech@redhat.com>
Backtrace of a soft-lockup while executing in nvme_ns_head_make_request
crash> bt
PID: 6 TASK: ffff90550742c000 CPU: 0 COMMAND: "kworker/0:0H"
#0 [ffffb40a40003d48] machine_kexec at ffffffff93864e9e
#1 [ffffb40a40003da0] __crash_kexec at ffffffff939a4c4d
#2 [ffffb40a40003e68] panic at ffffffff938ed587
#3 [ffffb40a40003ee8] watchdog_timer_fn.cold.10 at ffffffff939db197
#4 [ffffb40a40003f18] __hrtimer_run_queues at ffffffff93983c20
#5 [ffffb40a40003f78] hrtimer_interrupt at ffffffff93984480
#6 [ffffb40a40003fd8] smp_apic_timer_interrupt at ffffffff942026ba
#7 [ffffb40a40003ff0] apic_timer_interrupt at ffffffff94201c4f
--- <IRQ stack> ---
#8 [ffffb40a403d3d18] apic_timer_interrupt at ffffffff94201c4f
[exception RIP: nvme_ns_head_make_request+592]
RIP: ffffffffc0280810 RSP: ffffb40a403d3dc0 RFLAGS: 00000206
RAX: 0000000000000003 RBX: ffff9059517dd800 RCX: 0000000000000001
RDX: 000043822f41b300 RSI: 0000000000000000 RDI: ffff9058da8d0010
RBP: ffff9058da8d0010 R8: 0000000000000001 R9: 0000000000000000
R10: 0000000000000003 R11: 0000000000000600 R12: 0000000000000001
R13: ffff90724096e000 R14: ffff9058da8dca50 R15: ffff9058da8d0000
ORIG_RAX: ffffffffffffff13 CS: 0010 SS: 0018
#9 [ffffb40a403d3e18] generic_make_request at ffffffff93c6e8bb
#10 [ffffb40a403d3e80] nvme_requeue_work at ffffffffc02805aa [nvme_core]
#11 [ffffb40a403d3e98] process_one_work at ffffffff9390b237
#12 [ffffb40a403d3ed8] worker_thread at ffffffff9390b8f0
#13 [ffffb40a403d3f10] kthread at ffffffff9391269a
#14 [ffffb40a403d3f50] ret_from_fork at ffffffff94200255
RIP here is in the nvme_round_robin_path loop. I've seen a report
without softlockup_panic enabled where multiple CPUs locked with a
variety of addresses reported all within the loop.
Disassembling and looking for the loop test expression, we can find the
instructions for ns && ns != old
236 for (ns = nvme_next_ns(head, old);
237 ns && ns != old;
0x000000000000982f <+575>: test %rbx,%rbx
0x0000000000009832 <+578>: je 0x98a8 <nvme_ns_head_make_request+696>
0x0000000000009834 <+580>: cmp %rbx,%r13
0x0000000000009837 <+583>: je 0x98a8 <nvme_ns_head_make_request+696>
At this point ns is in rbx, and old is in r13, that doesn’t appear to
change while in this loop.
Similarly head is in r15, being used here in list_first_or_null_rcu
./include/linux/compiler.h:
276 __READ_ONCE_SIZE;
0x0000000000009961 <+881>: mov (%r15),%rax
drivers/nvme/host/multipath.c:
222 return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
0x0000000000009964 <+884>: mov %rax,0x18(%rsp)
0x0000000000009969 <+889>: cmp %rax,%r15
0x000000000000996c <+892>: je 0x99d4 <nvme_ns_head_make_request+996>
So, from the saved register values in the backtrace:
old = ffff90724096e000
head = ffff9058da8d0000
Checking that old.head == head
crash> struct nvme_ns.head ffff90724096e000
head = 0xffff9058da8d0000
Dumping the nvme_ns structs on head.list
crash> list nvme_ns.siblings -H 0xffff9058da8d0000
ffff907084504000
ffff9059517dd800
Only 2, and neither of them match “old”
What do the list pointers in old look like?
crash> struct nvme_ns.siblings ffff90724096e000
siblings = {
next = 0xffff9059517dd830,
prev = 0xdead000000000200
}
old.siblings->next points into a valid nvme_ns on the list, but prev has
been poisoned.
This loop can still exit if any ns on the list is not disabled and an
ANA optimized path.
crash> list nvme_ns.siblings -H 0xffff9058da8d0000 -s nvme_ns.ctrl
ffff907084504000
ctrl = 0xffff90555c13c338
ffff9059517dd800
ctrl = 0xffff90555c138338
crash> struct nvme_ctrl.state 0xffff90555c13c338
state = NVME_CTRL_CONNECTING
crash> struct nvme_ctrl.state 0xffff90555c138338
state = NVME_CTRL_CONNECTING
No good, both are disabled while connecting.
Dump head.current_path[], and there’s “old”
crash> struct nvme_ns_head ffff9058da8d0000 -o
struct nvme_ns_head {
[ffff9058da8d0000] struct list_head list;
[ffff9058da8d0010] struct srcu_struct srcu;
[ffff9058da8dc510] struct nvme_subsystem *subsys;
[ffff9058da8dc518] unsigned int ns_id;
[ffff9058da8dc51c] struct nvme_ns_ids ids;
[ffff9058da8dc548] struct list_head entry;
[ffff9058da8dc558] struct kref ref;
[ffff9058da8dc55c] bool shared;
[ffff9058da8dc560] int instance;
[ffff9058da8dc568] struct nvme_effects_log *effects;
[ffff9058da8dc570] struct cdev cdev;
[ffff9058da8dc5f8] struct device cdev_device;
[ffff9058da8dc9c8] struct gendisk *disk;
[ffff9058da8dc9d0] struct bio_list requeue_list;
[ffff9058da8dc9e0] spinlock_t requeue_lock;
[ffff9058da8dc9e8] struct work_struct requeue_work;
[ffff9058da8dca28] struct mutex lock;
[ffff9058da8dca48] unsigned long flags;
[ffff9058da8dca50] struct nvme_ns *current_path[];
}
SIZE: 51792
crash> p *(struct nvme_ns **) 0xffff9058da8dca50 @ 4
$18 =
{0xffff90724096e000, 0x0, 0x0, 0x0}
So, we end up in an infinite loop because:
The nvme_ns_head list contains 2 nvme_ns. Both of them have ctrl->state
as NVME_CTRL_CONNECTING, so they are considered as disabled and we will
execute the continue statement in the loop. “old” has been removed from
the siblings list, so it will never trigger the ns != old condition
next prev parent reply other threads:[~2022-03-21 22:44 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-21 22:43 nvme-multipath: round-robin infinite looping Chris Leech
2022-03-21 22:43 ` Chris Leech [this message]
2022-03-22 11:16 ` kdump details (Re: nvme-multipath: round-robin infinite looping) Christoph Hellwig
2022-03-22 15:36 ` Chris Leech
2022-03-21 22:43 ` [RFC PATCH] nvme-multipath: break endless loop in nvme_round_robin_path Chris Leech
2022-03-22 11:17 ` Christoph Hellwig
2022-03-22 12:07 ` Daniel Wagner
2022-03-22 15:42 ` Chris Leech
2022-03-21 22:43 ` [RFC PATCH] nvme: fix RCU hole that allowed for endless looping in multipath round robin Chris Leech
2022-03-23 14:54 ` Sagi Grimberg
2022-03-23 15:34 ` Christoph Hellwig
2022-03-23 19:07 ` John Meneghini
2022-04-05 13:14 ` John Meneghini
2022-03-25 6:36 ` Christoph Hellwig
2022-03-25 12:42 ` Sagi Grimberg
2022-04-05 17:25 ` Chris Leech
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220321224304.955072-2-cleech@redhat.com \
--to=cleech@redhat.com \
--cc=dwagner@suse.de \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=jmeneghi@redhat.com \
--cc=lengchao@huawei.com \
--cc=linux-nvme@lists.infradead.org \
--cc=mlombard@redhat.com \
--cc=sagi@grimberg.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.