linux-kernel.vger.kernel.org archive mirror
* [PATCH] nvme-pci: cancel nvme device request before disabling
@ 2020-08-14  7:14 Tong Zhang
  2020-08-14  8:16 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Tong Zhang @ 2020-08-14  7:14 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, kbusch, axboe, hch, sagi; +Cc: ztong0001

This patch addresses an IRQ free warning and a NULL pointer dereference
that occur when an NVMe device hits a timeout error during initialization.
The problem happens when nvme_timeout() is called while nvme_reset_work()
is still executing. Fix the problem by setting the NVME_REQ_CANCELLED flag
on the problematic request before calling nvme_dev_disable(), so that
__nvme_submit_sync_cmd() returns an error code and nvme_submit_sync_cmd()
fails gracefully. The following is the console output.

  [   62.472097] nvme nvme0: I/O 13 QID 0 timeout, disable controller
  [   62.488796] nvme nvme0: could not set timestamp (881)
  [   62.494888] ------------[ cut here ]------------
  [   62.495142] Trying to free already-free IRQ 11
  [   62.495366] WARNING: CPU: 0 PID: 7 at kernel/irq/manage.c:1751 free_irq+0x1f7/0x370
  [   62.495742] Modules linked in:
  [   62.495902] CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #8
  [   62.496206] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
  [   62.496772] Workqueue: nvme-reset-wq nvme_reset_work
  [   62.497019] RIP: 0010:free_irq+0x1f7/0x370
  [   62.497223] Code: e8 ce 49 11 00 48 83 c4 08 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f6 48 c70
  [   62.498133] RSP: 0000:ffffa96800043d40 EFLAGS: 00010086
  [   62.498391] RAX: 0000000000000000 RBX: ffff9b87fc458400 RCX: 0000000000000000
  [   62.498741] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffffffff9693d72c
  [   62.499091] RBP: ffff9b87fd4c8f60 R08: ffffa96800043bfd R09: 0000000000000163
  [   62.499440] R10: ffffa96800043bf8 R11: ffffa96800043bfd R12: ffff9b87fd4c8e00
  [   62.499790] R13: ffff9b87fd4c8ea4 R14: 000000000000000b R15: ffff9b87fd76b000
  [   62.500140] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
  [   62.500534] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   62.500816] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
  [   62.501165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [   62.501515] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [   62.501864] Call Trace:
  [   62.501993]  pci_free_irq+0x13/0x20
  [   62.502167]  nvme_reset_work+0x5d0/0x12a0
  [   62.502369]  ? update_load_avg+0x59/0x580
  [   62.502569]  ? ttwu_queue_wakelist+0xa8/0xc0
  [   62.502780]  ? try_to_wake_up+0x1a2/0x450
  [   62.502979]  process_one_work+0x1d2/0x390
  [   62.503179]  worker_thread+0x45/0x3b0
  [   62.503361]  ? process_one_work+0x390/0x390
  [   62.503568]  kthread+0xf9/0x130
  [   62.503726]  ? kthread_park+0x80/0x80
  [   62.503911]  ret_from_fork+0x22/0x30
  [   62.504090] ---[ end trace de9ed4a70f8d71e2 ]---
  [  123.912275] nvme nvme0: I/O 12 QID 0 timeout, disable controller
  [  123.914670] nvme nvme0: 1/0/0 default/read/poll queues
  [  123.916310] BUG: kernel NULL pointer dereference, address: 0000000000000000
  [  123.917469] #PF: supervisor write access in kernel mode
  [  123.917725] #PF: error_code(0x0002) - not-present page
  [  123.917976] PGD 0 P4D 0
  [  123.918109] Oops: 0002 [#1] SMP PTI
  [  123.918283] CPU: 0 PID: 7 Comm: kworker/u4:0 Tainted: G        W         5.8.0+ #8
  [  123.918650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
  [  123.919219] Workqueue: nvme-reset-wq nvme_reset_work
  [  123.919469] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
  [  123.919757] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
  [  123.920657] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
  [  123.920912] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
  [  123.921258] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
  [  123.921602] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
  [  123.921949] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
  [  123.922295] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
  [  123.922641] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
  [  123.923032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  123.923312] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
  [  123.923660] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [  123.924007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [  123.924353] Call Trace:
  [  123.924479]  blk_mq_alloc_tag_set+0x137/0x2a0
  [  123.924694]  nvme_reset_work+0xed6/0x12a0
  [  123.924898]  process_one_work+0x1d2/0x390
  [  123.925099]  worker_thread+0x45/0x3b0
  [  123.925280]  ? process_one_work+0x390/0x390
  [  123.925486]  kthread+0xf9/0x130
  [  123.925642]  ? kthread_park+0x80/0x80
  [  123.925825]  ret_from_fork+0x22/0x30
  [  123.926004] Modules linked in:
  [  123.926158] CR2: 0000000000000000
  [  123.926322] ---[ end trace de9ed4a70f8d71e3 ]---
  [  123.926549] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
  [  123.926832] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
  [  123.927734] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
  [  123.927989] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
  [  123.928336] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
  [  123.928679] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
  [  123.929025] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
  [  123.929370] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
  [  123.929715] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
  [  123.930106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  123.930384] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
  [  123.930731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [  123.931077] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ba725ae47305..c4f1ce0ee1e3 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1249,8 +1249,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn_ratelimited(dev->ctrl.device,
 			 "I/O %d QID %d timeout, disable controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_disable(dev, true);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
+		nvme_dev_disable(dev, true);
 		return BLK_EH_DONE;
 	case NVME_CTRL_RESETTING:
 		return BLK_EH_RESET_TIMER;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] nvme-pci: cancel nvme device request before disabling
  2020-08-14  7:14 [PATCH] nvme-pci: cancel nvme device request before disabling Tong Zhang
@ 2020-08-14  8:16 ` Christoph Hellwig
  2020-08-14 11:53   ` Tong Zhang
  2020-08-14 11:46 ` [PATCH v2] " Tong Zhang
  2020-08-14 15:04 ` [PATCH] " Keith Busch
  2 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2020-08-14  8:16 UTC (permalink / raw)
  To: Tong Zhang; +Cc: linux-kernel, linux-nvme, kbusch, axboe, hch, sagi

On Fri, Aug 14, 2020 at 03:14:31AM -0400, Tong Zhang wrote:
>   This patch addresses an irq free warning and null pointer dereference
>   error problem when nvme devices got timeout error during initialization.
>   This problem happens when nvme_timeout() function is called while
>   nvme_reset_work() is still in execution. This patch fixed the problem by
>   setting flag of the problematic request to NVME_REQ_CANCELLED before
>   calling nvme_dev_disable() to make sure __nvme_submit_sync_cmd() returns
>   an error code and let nvme_submit_sync_cmd() fail gracefully.
>   The following is console output.

The commit log looks a little weird due to the extra indentation.

The patch itself looks good, though:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* [PATCH v2] nvme-pci: cancel nvme device request before disabling
  2020-08-14  7:14 [PATCH] nvme-pci: cancel nvme device request before disabling Tong Zhang
  2020-08-14  8:16 ` Christoph Hellwig
@ 2020-08-14 11:46 ` Tong Zhang
  2020-08-14 15:04 ` [PATCH] " Keith Busch
  2 siblings, 0 replies; 14+ messages in thread
From: Tong Zhang @ 2020-08-14 11:46 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, sagi, hch, axboe, kbusch; +Cc: ztong0001

This patch addresses an IRQ free warning and a NULL pointer dereference
that occur when an NVMe device hits a timeout error during initialization.
The problem happens when nvme_timeout() is called while nvme_reset_work()
is still executing. Fix the problem by setting the NVME_REQ_CANCELLED flag
on the problematic request before calling nvme_dev_disable(), so that
__nvme_submit_sync_cmd() returns an error code and nvme_submit_sync_cmd()
fails gracefully. The following is the console output.

[   62.472097] nvme nvme0: I/O 13 QID 0 timeout, disable controller
[   62.488796] nvme nvme0: could not set timestamp (881)
[   62.494888] ------------[ cut here ]------------
[   62.495142] Trying to free already-free IRQ 11
[   62.495366] WARNING: CPU: 0 PID: 7 at kernel/irq/manage.c:1751 free_irq+0x1f7/0x370
[   62.495742] Modules linked in:
[   62.495902] CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #8
[   62.496206] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
[   62.496772] Workqueue: nvme-reset-wq nvme_reset_work
[   62.497019] RIP: 0010:free_irq+0x1f7/0x370
[   62.497223] Code: e8 ce 49 11 00 48 83 c4 08 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f6 48 c70
[   62.498133] RSP: 0000:ffffa96800043d40 EFLAGS: 00010086
[   62.498391] RAX: 0000000000000000 RBX: ffff9b87fc458400 RCX: 0000000000000000
[   62.498741] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffffffff9693d72c
[   62.499091] RBP: ffff9b87fd4c8f60 R08: ffffa96800043bfd R09: 0000000000000163
[   62.499440] R10: ffffa96800043bf8 R11: ffffa96800043bfd R12: ffff9b87fd4c8e00
[   62.499790] R13: ffff9b87fd4c8ea4 R14: 000000000000000b R15: ffff9b87fd76b000
[   62.500140] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
[   62.500534] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   62.500816] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
[   62.501165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   62.501515] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   62.501864] Call Trace:
[   62.501993]  pci_free_irq+0x13/0x20
[   62.502167]  nvme_reset_work+0x5d0/0x12a0
[   62.502369]  ? update_load_avg+0x59/0x580
[   62.502569]  ? ttwu_queue_wakelist+0xa8/0xc0
[   62.502780]  ? try_to_wake_up+0x1a2/0x450
[   62.502979]  process_one_work+0x1d2/0x390
[   62.503179]  worker_thread+0x45/0x3b0
[   62.503361]  ? process_one_work+0x390/0x390
[   62.503568]  kthread+0xf9/0x130
[   62.503726]  ? kthread_park+0x80/0x80
[   62.503911]  ret_from_fork+0x22/0x30
[   62.504090] ---[ end trace de9ed4a70f8d71e2 ]---
[  123.912275] nvme nvme0: I/O 12 QID 0 timeout, disable controller
[  123.914670] nvme nvme0: 1/0/0 default/read/poll queues
[  123.916310] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  123.917469] #PF: supervisor write access in kernel mode
[  123.917725] #PF: error_code(0x0002) - not-present page
[  123.917976] PGD 0 P4D 0
[  123.918109] Oops: 0002 [#1] SMP PTI
[  123.918283] CPU: 0 PID: 7 Comm: kworker/u4:0 Tainted: G        W         5.8.0+ #8
[  123.918650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
[  123.919219] Workqueue: nvme-reset-wq nvme_reset_work
[  123.919469] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
[  123.919757] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
[  123.920657] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
[  123.920912] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
[  123.921258] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
[  123.921602] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
[  123.921949] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
[  123.922295] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
[  123.922641] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
[  123.923032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  123.923312] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
[  123.923660] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  123.924007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  123.924353] Call Trace:
[  123.924479]  blk_mq_alloc_tag_set+0x137/0x2a0
[  123.924694]  nvme_reset_work+0xed6/0x12a0
[  123.924898]  process_one_work+0x1d2/0x390
[  123.925099]  worker_thread+0x45/0x3b0
[  123.925280]  ? process_one_work+0x390/0x390
[  123.925486]  kthread+0xf9/0x130
[  123.925642]  ? kthread_park+0x80/0x80
[  123.925825]  ret_from_fork+0x22/0x30
[  123.926004] Modules linked in:
[  123.926158] CR2: 0000000000000000
[  123.926322] ---[ end trace de9ed4a70f8d71e3 ]---
[  123.926549] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
[  123.926832] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
[  123.927734] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
[  123.927989] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
[  123.928336] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
[  123.928679] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
[  123.929025] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
[  123.929370] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
[  123.929715] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
[  123.930106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  123.930384] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
[  123.930731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  123.931077] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ba725ae47305..c4f1ce0ee1e3 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1249,8 +1249,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn_ratelimited(dev->ctrl.device,
 			 "I/O %d QID %d timeout, disable controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_disable(dev, true);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
+		nvme_dev_disable(dev, true);
 		return BLK_EH_DONE;
 	case NVME_CTRL_RESETTING:
 		return BLK_EH_RESET_TIMER;
-- 
2.25.1



* Re: [PATCH] nvme-pci: cancel nvme device request before disabling
  2020-08-14  8:16 ` Christoph Hellwig
@ 2020-08-14 11:53   ` Tong Zhang
  0 siblings, 0 replies; 14+ messages in thread
From: Tong Zhang @ 2020-08-14 11:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, linux-nvme, kbusch, axboe, sagi

Oops, sorry for the duplicated email -- I forgot to turn off HTML when
sending the previous one.

Thanks Christoph,
I made a couple of fixes to the commit log; please see the v2 patch.
Thanks!
-- Tong


On Fri, Aug 14, 2020 at 4:16 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Fri, Aug 14, 2020 at 03:14:31AM -0400, Tong Zhang wrote:
> >   This patch addresses an irq free warning and null pointer dereference
> >   error problem when nvme devices got timeout error during initialization.
> >   This problem happens when nvme_timeout() function is called while
> >   nvme_reset_work() is still in execution. This patch fixed the problem by
> >   setting flag of the problematic request to NVME_REQ_CANCELLED before
> >   calling nvme_dev_disable() to make sure __nvme_submit_sync_cmd() returns
> >   an error code and let nvme_submit_sync_cmd() fail gracefully.
> >   The following is console output.
>
> The commit log looks a little weird due to the extra indentation.
>
> The patch itself looks good, though:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH] nvme-pci: cancel nvme device request before disabling
  2020-08-14  7:14 [PATCH] nvme-pci: cancel nvme device request before disabling Tong Zhang
  2020-08-14  8:16 ` Christoph Hellwig
  2020-08-14 11:46 ` [PATCH v2] " Tong Zhang
@ 2020-08-14 15:04 ` Keith Busch
  2020-08-14 15:37   ` Tong Zhang
  2 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2020-08-14 15:04 UTC (permalink / raw)
  To: Tong Zhang; +Cc: linux-kernel, linux-nvme, axboe, hch, sagi

On Fri, Aug 14, 2020 at 03:14:31AM -0400, Tong Zhang wrote:
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index ba725ae47305..c4f1ce0ee1e3 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1249,8 +1249,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>  		dev_warn_ratelimited(dev->ctrl.device,
>  			 "I/O %d QID %d timeout, disable controller\n",
>  			 req->tag, nvmeq->qid);
> -		nvme_dev_disable(dev, true);
>  		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
> +		nvme_dev_disable(dev, true);
>  		return BLK_EH_DONE;

Shouldn't this flag have been set in nvme_cancel_request()? It's not
like the timed-out command is the only command to have been cancelled
by this action, nor is it guaranteed that getting here means the
request was in fact cancelled. The controller could still provide a real
completion.


* Re: [PATCH] nvme-pci: cancel nvme device request before disabling
  2020-08-14 15:04 ` [PATCH] " Keith Busch
@ 2020-08-14 15:37   ` Tong Zhang
  2020-08-14 15:42     ` Keith Busch
  0 siblings, 1 reply; 14+ messages in thread
From: Tong Zhang @ 2020-08-14 15:37 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-kernel, linux-nvme, axboe, Christoph Hellwig, sagi

Hi Keith,

On Fri, Aug 14, 2020 at 11:04 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Fri, Aug 14, 2020 at 03:14:31AM -0400, Tong Zhang wrote:
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index ba725ae47305..c4f1ce0ee1e3 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -1249,8 +1249,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
> >               dev_warn_ratelimited(dev->ctrl.device,
> >                        "I/O %d QID %d timeout, disable controller\n",
> >                        req->tag, nvmeq->qid);
> > -             nvme_dev_disable(dev, true);
> >               nvme_req(req)->flags |= NVME_REQ_CANCELLED;
> > +             nvme_dev_disable(dev, true);
> >               return BLK_EH_DONE;
>
> Shouldn't this flag have been set in nvme_cancel_request()?

nvme_cancel_request() is not setting this flag, and that causes
__nvme_submit_sync_cmd() to return a positive number; eventually this
allows nvme_init_identify() to proceed despite the timeout error, even
though the problematic NVMe device has already been disabled by
nvme_dev_disable(dev, true). The rest of the logic in nvme_init_identify()
is not going to check those already-released resources -- in fact, this
is a use-after-free.

> like the timeout out command is the only command to have been cancelled
> by this action, nor is it guaranteed that getting here will mean the
> request was in fact cancelled. The controller could still provide a real
> completion.

what would you like to suggest here?


* Re: [PATCH] nvme-pci: cancel nvme device request before disabling
  2020-08-14 15:37   ` Tong Zhang
@ 2020-08-14 15:42     ` Keith Busch
  2020-08-14 16:11       ` Tong Zhang
  2020-08-14 18:00       ` [PATCH] " Sagi Grimberg
  0 siblings, 2 replies; 14+ messages in thread
From: Keith Busch @ 2020-08-14 15:42 UTC (permalink / raw)
  To: Tong Zhang; +Cc: linux-kernel, linux-nvme, axboe, Christoph Hellwig, sagi

On Fri, Aug 14, 2020 at 11:37:20AM -0400, Tong Zhang wrote:
> On Fri, Aug 14, 2020 at 11:04 AM Keith Busch <kbusch@kernel.org> wrote:
> >
> > On Fri, Aug 14, 2020 at 03:14:31AM -0400, Tong Zhang wrote:
> > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > index ba725ae47305..c4f1ce0ee1e3 100644
> > > --- a/drivers/nvme/host/pci.c
> > > +++ b/drivers/nvme/host/pci.c
> > > @@ -1249,8 +1249,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
> > >               dev_warn_ratelimited(dev->ctrl.device,
> > >                        "I/O %d QID %d timeout, disable controller\n",
> > >                        req->tag, nvmeq->qid);
> > > -             nvme_dev_disable(dev, true);
> > >               nvme_req(req)->flags |= NVME_REQ_CANCELLED;
> > > +             nvme_dev_disable(dev, true);
> > >               return BLK_EH_DONE;
> >
> > Shouldn't this flag have been set in nvme_cancel_request()?
> 
> nvme_cancel_request() is not setting this flag to cancelled and this is causing

Right, I see that it doesn't, but I'm saying that it should. We used to
do something like that, and I'm struggling to recall why we're not
anymore. The driver is not reporting a non-response back for all
cancelled requests, and that is probably not what we should be doing.


* Re: [PATCH] nvme-pci: cancel nvme device request before disabling
  2020-08-14 15:42     ` Keith Busch
@ 2020-08-14 16:11       ` Tong Zhang
  2020-08-27 15:01         ` Keith Busch
  2020-08-14 18:00       ` [PATCH] " Sagi Grimberg
  1 sibling, 1 reply; 14+ messages in thread
From: Tong Zhang @ 2020-08-14 16:11 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-kernel, linux-nvme, axboe, Christoph Hellwig, sagi

On Fri, Aug 14, 2020 at 11:42 AM Keith Busch <kbusch@kernel.org> wrote:
> > > On Fri, Aug 14, 2020 at 03:14:31AM -0400, Tong Zhang wrote:
> > > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > > index ba725ae47305..c4f1ce0ee1e3 100644
> > > > --- a/drivers/nvme/host/pci.c
> > > > +++ b/drivers/nvme/host/pci.c
> > > > @@ -1249,8 +1249,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
> > > >               dev_warn_ratelimited(dev->ctrl.device,
> > > >                        "I/O %d QID %d timeout, disable controller\n",
> > > >                        req->tag, nvmeq->qid);
> > > > -             nvme_dev_disable(dev, true);
> > > >               nvme_req(req)->flags |= NVME_REQ_CANCELLED;
> > > > +             nvme_dev_disable(dev, true);
> > > >               return BLK_EH_DONE;

> anymore. The driver is not reporting   non-response back for all
> cancelled requests, and that is probably not what we should be doing.

OK, thanks for the explanation. I think the bottom line here is to let
the probe function know about the error and stop proceeding. I also
don't see an obvious reason to set NVME_REQ_CANCELLED after
nvme_dev_disable(dev, true).


* Re: [PATCH] nvme-pci: cancel nvme device request before disabling
  2020-08-14 15:42     ` Keith Busch
  2020-08-14 16:11       ` Tong Zhang
@ 2020-08-14 18:00       ` Sagi Grimberg
  1 sibling, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2020-08-14 18:00 UTC (permalink / raw)
  To: Keith Busch, Tong Zhang
  Cc: linux-kernel, linux-nvme, axboe, Christoph Hellwig


>>>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>>>> index ba725ae47305..c4f1ce0ee1e3 100644
>>>> --- a/drivers/nvme/host/pci.c
>>>> +++ b/drivers/nvme/host/pci.c
>>>> @@ -1249,8 +1249,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>>>>                dev_warn_ratelimited(dev->ctrl.device,
>>>>                         "I/O %d QID %d timeout, disable controller\n",
>>>>                         req->tag, nvmeq->qid);
>>>> -             nvme_dev_disable(dev, true);
>>>>                nvme_req(req)->flags |= NVME_REQ_CANCELLED;
>>>> +             nvme_dev_disable(dev, true);
>>>>                return BLK_EH_DONE;
>>>
>>> Shouldn't this flag have been set in nvme_cancel_request()?
>>
>> nvme_cancel_request() is not setting this flag to cancelled and this is causing
> 
> Right, I see that it doesn't, but I'm saying that it should. We used to
> do something like that, and I'm struggling to recall why we're not
> anymore.

I also don't recall why, but I know that we rely on the status
propagating back from submit_sync_cmd, which won't happen because
it converts the status into -ENODEV.

> The driver is not reporting   non-response back for all
> cancelled requests, and that is probably not what we should be doing.

I'd think that we should modify our callers to handle NVMe status
codes as well, rather than rely on nvme_submit_sync_cmd to return
negative codes under some conditions.


* Re: [PATCH] nvme-pci: cancel nvme device request before disabling
  2020-08-14 16:11       ` Tong Zhang
@ 2020-08-27 15:01         ` Keith Busch
  2020-08-28 12:43           ` Tong Zhang
  2020-08-28 14:17           ` [PATCH v2] " Tong Zhang
  0 siblings, 2 replies; 14+ messages in thread
From: Keith Busch @ 2020-08-27 15:01 UTC (permalink / raw)
  To: Tong Zhang; +Cc: linux-kernel, linux-nvme, axboe, Christoph Hellwig, sagi

On Fri, Aug 14, 2020 at 12:11:56PM -0400, Tong Zhang wrote:
> On Fri, Aug 14, 2020 at 11:42 AM Keith Busch <kbusch@kernel.org> wrote:
> > > > On Fri, Aug 14, 2020 at 03:14:31AM -0400, Tong Zhang wrote:
> > > > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > > > index ba725ae47305..c4f1ce0ee1e3 100644
> > > > > --- a/drivers/nvme/host/pci.c
> > > > > +++ b/drivers/nvme/host/pci.c
> > > > > @@ -1249,8 +1249,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
> > > > >               dev_warn_ratelimited(dev->ctrl.device,
> > > > >                        "I/O %d QID %d timeout, disable controller\n",
> > > > >                        req->tag, nvmeq->qid);
> > > > > -             nvme_dev_disable(dev, true);
> > > > >               nvme_req(req)->flags |= NVME_REQ_CANCELLED;
> > > > > +             nvme_dev_disable(dev, true);
> > > > >               return BLK_EH_DONE;
> 
> > anymore. The driver is not reporting   non-response back for all
> > cancelled requests, and that is probably not what we should be doing.
> 
> OK, thanks for the explanation. I think the bottom line here is to let the
> probe function know and stop proceeding when there's an error.
> I also don't see an obvious reason to set NVME_REQ_CANCELLED
> after nvme_dev_disable(dev, true).

The flag was set after disabling back when it didn't happen to matter:
the block layer had a complicated timeout scheme that didn't actually
complete the request until the timeout handler returned, so setting the
flag where it is was 'ok'. That's clearly not the case anymore, so yes,
I think we do need your patch.

There is one case you are missing, though:

---
@@ -1267,10 +1267,10 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, reset controller\n",
 			 req->tag, nvmeq->qid);
+		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		nvme_dev_disable(dev, false);
 		nvme_reset_ctrl(&dev->ctrl);
 
-		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		return BLK_EH_DONE;
 	}
--


* Re: [PATCH] nvme-pci: cancel nvme device request before disabling
  2020-08-27 15:01         ` Keith Busch
@ 2020-08-28 12:43           ` Tong Zhang
  2020-08-28 14:17           ` [PATCH v2] " Tong Zhang
  1 sibling, 0 replies; 14+ messages in thread
From: Tong Zhang @ 2020-08-28 12:43 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-kernel, linux-nvme, axboe, Christoph Hellwig, sagi

Hi Keith,
Thanks for the confirmation. I will send another revision according to
your comments.
Best,
- Tong

On Thu, Aug 27, 2020 at 11:01 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Fri, Aug 14, 2020 at 12:11:56PM -0400, Tong Zhang wrote:
> > On Fri, Aug 14, 2020 at 11:42 AM Keith Busch <kbusch@kernel.org> wrote:
> > > > > On Fri, Aug 14, 2020 at 03:14:31AM -0400, Tong Zhang wrote:
> > > > > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > > > > index ba725ae47305..c4f1ce0ee1e3 100644
> > > > > > --- a/drivers/nvme/host/pci.c
> > > > > > +++ b/drivers/nvme/host/pci.c
> > > > > > @@ -1249,8 +1249,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
> > > > > >               dev_warn_ratelimited(dev->ctrl.device,
> > > > > >                        "I/O %d QID %d timeout, disable controller\n",
> > > > > >                        req->tag, nvmeq->qid);
> > > > > > -             nvme_dev_disable(dev, true);
> > > > > >               nvme_req(req)->flags |= NVME_REQ_CANCELLED;
> > > > > > +             nvme_dev_disable(dev, true);
> > > > > >               return BLK_EH_DONE;
> >
> > > anymore. The driver is not reporting   non-response back for all
> > > cancelled requests, and that is probably not what we should be doing.
> >
> > OK, thanks for the explanation. I think the bottom line here is to let the
> > probe function know and stop proceeding when there's an error.
> > I also don't see an obvious reason to set NVME_REQ_CANCELLED
> > after nvme_dev_disable(dev, true).
>
> The flag was set after disabling when it didn't happen to matter: the
> block layer had a complicated timeout scheme that didn't actually
> complete the request until the timeout handler returned, so the flag set
> where it is was 'ok'. That's clearly not the case anymore, so yes, I
> think we do need your patch.
>
> There is one case you are missing, though:
>
> ---
> @@ -1267,10 +1267,10 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>                 dev_warn(dev->ctrl.device,
>                          "I/O %d QID %d timeout, reset controller\n",
>                          req->tag, nvmeq->qid);
> +               nvme_req(req)->flags |= NVME_REQ_CANCELLED;
>                 nvme_dev_disable(dev, false);
>                 nvme_reset_ctrl(&dev->ctrl);
>
> -               nvme_req(req)->flags |= NVME_REQ_CANCELLED;
>                 return BLK_EH_DONE;
>         }
> --


* [PATCH v2] nvme-pci: cancel nvme device request before disabling
  2020-08-27 15:01         ` Keith Busch
  2020-08-28 12:43           ` Tong Zhang
@ 2020-08-28 14:17           ` Tong Zhang
  2020-08-28 15:51             ` Keith Busch
  2020-08-28 15:56             ` Sagi Grimberg
  1 sibling, 2 replies; 14+ messages in thread
From: Tong Zhang @ 2020-08-28 14:17 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi, linux-nvme, linux-kernel; +Cc: ztong0001

This patch addresses an IRQ free warning and a NULL pointer dereference
that occur when an NVMe device hits a timeout error during initialization.
The problem happens when nvme_timeout() is called while nvme_reset_work()
is still executing. Fix the problem by setting the NVME_REQ_CANCELLED flag
on the problematic request before calling nvme_dev_disable(), so that
__nvme_submit_sync_cmd() returns an error code and nvme_submit_sync_cmd()
fails gracefully. The following is the console output.

[   62.472097] nvme nvme0: I/O 13 QID 0 timeout, disable controller
[   62.488796] nvme nvme0: could not set timestamp (881)
[   62.494888] ------------[ cut here ]------------
[   62.495142] Trying to free already-free IRQ 11
[   62.495366] WARNING: CPU: 0 PID: 7 at kernel/irq/manage.c:1751 free_irq+0x1f7/0x370
[   62.495742] Modules linked in:
[   62.495902] CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #8
[   62.496206] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
[   62.496772] Workqueue: nvme-reset-wq nvme_reset_work
[   62.497019] RIP: 0010:free_irq+0x1f7/0x370
[   62.497223] Code: e8 ce 49 11 00 48 83 c4 08 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f6 48 c70
[   62.498133] RSP: 0000:ffffa96800043d40 EFLAGS: 00010086
[   62.498391] RAX: 0000000000000000 RBX: ffff9b87fc458400 RCX: 0000000000000000
[   62.498741] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffffffff9693d72c
[   62.499091] RBP: ffff9b87fd4c8f60 R08: ffffa96800043bfd R09: 0000000000000163
[   62.499440] R10: ffffa96800043bf8 R11: ffffa96800043bfd R12: ffff9b87fd4c8e00
[   62.499790] R13: ffff9b87fd4c8ea4 R14: 000000000000000b R15: ffff9b87fd76b000
[   62.500140] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
[   62.500534] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   62.500816] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
[   62.501165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   62.501515] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   62.501864] Call Trace:
[   62.501993]  pci_free_irq+0x13/0x20
[   62.502167]  nvme_reset_work+0x5d0/0x12a0
[   62.502369]  ? update_load_avg+0x59/0x580
[   62.502569]  ? ttwu_queue_wakelist+0xa8/0xc0
[   62.502780]  ? try_to_wake_up+0x1a2/0x450
[   62.502979]  process_one_work+0x1d2/0x390
[   62.503179]  worker_thread+0x45/0x3b0
[   62.503361]  ? process_one_work+0x390/0x390
[   62.503568]  kthread+0xf9/0x130
[   62.503726]  ? kthread_park+0x80/0x80
[   62.503911]  ret_from_fork+0x22/0x30
[   62.504090] ---[ end trace de9ed4a70f8d71e2 ]---
[  123.912275] nvme nvme0: I/O 12 QID 0 timeout, disable controller
[  123.914670] nvme nvme0: 1/0/0 default/read/poll queues
[  123.916310] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  123.917469] #PF: supervisor write access in kernel mode
[  123.917725] #PF: error_code(0x0002) - not-present page
[  123.917976] PGD 0 P4D 0
[  123.918109] Oops: 0002 [#1] SMP PTI
[  123.918283] CPU: 0 PID: 7 Comm: kworker/u4:0 Tainted: G        W         5.8.0+ #8
[  123.918650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
[  123.919219] Workqueue: nvme-reset-wq nvme_reset_work
[  123.919469] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
[  123.919757] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
[  123.920657] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
[  123.920912] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
[  123.921258] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
[  123.921602] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
[  123.921949] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
[  123.922295] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
[  123.922641] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
[  123.923032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  123.923312] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
[  123.923660] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  123.924007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  123.924353] Call Trace:
[  123.924479]  blk_mq_alloc_tag_set+0x137/0x2a0
[  123.924694]  nvme_reset_work+0xed6/0x12a0
[  123.924898]  process_one_work+0x1d2/0x390
[  123.925099]  worker_thread+0x45/0x3b0
[  123.925280]  ? process_one_work+0x390/0x390
[  123.925486]  kthread+0xf9/0x130
[  123.925642]  ? kthread_park+0x80/0x80
[  123.925825]  ret_from_fork+0x22/0x30
[  123.926004] Modules linked in:
[  123.926158] CR2: 0000000000000000
[  123.926322] ---[ end trace de9ed4a70f8d71e3 ]---
[  123.926549] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
[  123.926832] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
[  123.927734] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
[  123.927989] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
[  123.928336] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
[  123.928679] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
[  123.929025] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
[  123.929370] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
[  123.929715] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
[  123.930106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  123.930384] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
[  123.930731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  123.931077] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Co-developed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
---

v2: also set NVME_REQ_CANCELLED before calling nvme_dev_disable() in
the other timeout path, per Keith Busch's comment

 drivers/nvme/host/pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ba725ae47305..0d51f1210444 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1249,8 +1249,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn_ratelimited(dev->ctrl.device,
 			 "I/O %d QID %d timeout, disable controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_disable(dev, true);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
+		nvme_dev_disable(dev, true);
 		return BLK_EH_DONE;
 	case NVME_CTRL_RESETTING:
 		return BLK_EH_RESET_TIMER;
@@ -1267,10 +1267,10 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn(dev->ctrl.device,
 			 "I/O %d QID %d timeout, reset controller\n",
 			 req->tag, nvmeq->qid);
+		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		nvme_dev_disable(dev, false);
 		nvme_reset_ctrl(&dev->ctrl);
 
-		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		return BLK_EH_DONE;
 	}
 
-- 
2.25.1


* Re: [PATCH v2] nvme-pci: cancel nvme device request before disabling
  2020-08-28 14:17           ` [PATCH v2] " Tong Zhang
@ 2020-08-28 15:51             ` Keith Busch
  2020-08-28 15:56             ` Sagi Grimberg
  1 sibling, 0 replies; 14+ messages in thread
From: Keith Busch @ 2020-08-28 15:51 UTC (permalink / raw)
  To: Tong Zhang; +Cc: axboe, hch, sagi, linux-nvme, linux-kernel

On Fri, Aug 28, 2020 at 10:17:08AM -0400, Tong Zhang wrote:
> This patch addresses an IRQ free warning and a NULL pointer dereference
> that occur when an NVMe device hits a timeout error during
> initialization. The problem happens when nvme_timeout() is called while
> nvme_reset_work() is still executing. The patch fixes the problem by
> setting the NVME_REQ_CANCELLED flag on the problematic request before
> calling nvme_dev_disable(), which makes __nvme_submit_sync_cmd() return
> an error code and lets nvme_submit_sync_cmd() fail gracefully.
> The following is the console output.

Thanks, this looks good to me.

Reviewed-by: Keith Busch <kbusch@kernel.org>

* Re: [PATCH v2] nvme-pci: cancel nvme device request before disabling
  2020-08-28 14:17           ` [PATCH v2] " Tong Zhang
  2020-08-28 15:51             ` Keith Busch
@ 2020-08-28 15:56             ` Sagi Grimberg
  1 sibling, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2020-08-28 15:56 UTC (permalink / raw)
  To: Tong Zhang, kbusch, axboe, hch, linux-nvme, linux-kernel

Added to nvme-5.9-rc

end of thread, other threads:[~2020-08-28 15:56 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-14  7:14 [PATCH] nvme-pci: cancel nvme device request before disabling Tong Zhang
2020-08-14  8:16 ` Christoph Hellwig
2020-08-14 11:53   ` Tong Zhang
2020-08-14 11:46 ` [PATCH v2] " Tong Zhang
2020-08-14 15:04 ` [PATCH] " Keith Busch
2020-08-14 15:37   ` Tong Zhang
2020-08-14 15:42     ` Keith Busch
2020-08-14 16:11       ` Tong Zhang
2020-08-27 15:01         ` Keith Busch
2020-08-28 12:43           ` Tong Zhang
2020-08-28 14:17           ` [PATCH v2] " Tong Zhang
2020-08-28 15:51             ` Keith Busch
2020-08-28 15:56             ` Sagi Grimberg
2020-08-14 18:00       ` [PATCH] " Sagi Grimberg