* BUG: scheduling while atomic when nvmet_rdma_queue_response fails in posting a request
@ 2021-05-30  7:33 Michal Kalderon
  2021-06-08 16:50 ` Christoph Hellwig
  2021-06-08 17:43 ` Sagi Grimberg
  0 siblings, 2 replies; 8+ messages in thread
From: Michal Kalderon @ 2021-05-30  7:33 UTC (permalink / raw)
  To: Christoph Hellwig, sagi; +Cc: linux-nvme, Shai Malin, Ariel Elior

Hi Christoph, Sagi, 

We're testing some device error recovery scenarios and hit the BUG below (full stack trace at the end of this mail).
In the error scenario, nvmet_rdma_queue_response() receives an error from the device when trying to post a WR.

This leads to nvmet_rdma_release_rsp() being called from softirq context,
eventually reaching blk_mq_delay_run_hw_queue(), which tries to schedule while still in softirq.

Could you please advise what the correct solution should be in this case?

thanks,
Michal

[ 8790.082863] nvmet_rdma: post_recv cmd failed
[ 8790.083484] nvmet_rdma: sending cmd response failed
[ 8790.084131] ------------[ cut here ]------------
[ 8790.084140] WARNING: CPU: 7 PID: 46 at block/blk-mq.c:1422 __blk_mq_run_hw_queue+0xb7/0x100
[ 8790.084619] Modules linked in: null_blk nvmet_rdma nvmet nvme_rdma nvme_fabrics nvme_core netconsole qedr(OE) qede(OE) qed(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache xt_CHECKSUM nft_chain_nat xt_MASQUERADE nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nft_counter nft_compat tun bridge stp llc nf_tables nfnetlink ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad rpcrdma rdma_ucm ib_iser rdma_cm iw_cm intel_rapl_msr intel_rapl_common ib_cm sb_edac libiscsi scsi_transport_iscsi kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sunrpc rapl ib_uverbs ib_core cirrus drm_kms_helper drm virtio_balloon i2c_piix4 pcspkr crc32c_intel virtio_net serio_raw net_failover failover floppy crc8 ata_generic pata_acpi qemu_fw_cfg [last unloaded: qedr]
[ 8790.084748] CPU: 7 PID: 46 Comm: ksoftirqd/7 Tainted: G           OE     5.8.10 #1
[ 8790.084749] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[ 8790.084752] RIP: 0010:__blk_mq_run_hw_queue+0xb7/0x100
[ 8790.084753] Code: 00 48 89 ef e8 ea 34 c8 ff 48 89 df 41 89 c4 e8 1f 7f 00 00 f6 83 a8 00 00 00 20 74 b1 41 f7 c4 fe ff ff ff 74 b7 0f 0b eb b3 <0f> 0b eb 86 48 83 bf 98 00 00 00 00 48 c7 c0 df 81 3f 82 48 c7 c2
[ 8790.084754] RSP: 0018:ffffc9000020ba60 EFLAGS: 00010206
[ 8790.084755] RAX: 0000000000000100 RBX: ffff88809fe8c400 RCX: 00000000ffffffff
[ 8790.084756] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88809fe8c400
[ 8790.084756] RBP: ffff888137b81a50 R08: ffffffffffffffff R09: 0000000000000020
[ 8790.084757] R10: 0000000000000001 R11: ffff8881365d4968 R12: 0000000000000000
[ 8790.084758] R13: ffff888137b81a40 R14: ffff88811e2b9e80 R15: ffff8880b3d964f0
[ 8790.084759] FS:  0000000000000000(0000) GS:ffff88813bbc0000(0000) knlGS:0000000000000000
[ 8790.084759] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8790.084760] CR2: 000055ca53900da8 CR3: 000000012b83e006 CR4: 0000000000360ee0
[ 8790.084763] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8790.084763] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8790.084764] Call Trace:
[ 8790.084767]  __blk_mq_delay_run_hw_queue+0x140/0x160
[ 8790.084768]  blk_mq_get_tag+0x1d1/0x270
[ 8790.084771]  ? finish_wait+0x80/0x80
[ 8790.084773]  __blk_mq_alloc_request+0xb1/0x100
[ 8790.084774]  blk_mq_make_request+0x144/0x5d0
[ 8790.084778]  generic_make_request+0x2db/0x340
[ 8790.084779]  ? bvec_alloc+0x82/0xe0
[ 8790.084781]  submit_bio+0x43/0x160
[ 8790.084781]  ? bio_add_page+0x39/0x90
[ 8790.084794]  nvmet_bdev_execute_rw+0x28c/0x360 [nvmet]
[ 8790.084800]  nvmet_rdma_execute_command+0x72/0x110 [nvmet_rdma]
[ 8790.084802]  nvmet_rdma_release_rsp+0xc1/0x1e0 [nvmet_rdma]
[ 8790.084804]  nvmet_rdma_queue_response.cold.63+0x14/0x19 [nvmet_rdma]
[ 8790.084806]  nvmet_req_complete+0x11/0x40 [nvmet]
[ 8790.084809]  nvmet_bio_done+0x27/0x100 [nvmet]
[ 8790.084811]  blk_update_request+0x23e/0x3b0
[ 8790.084812]  blk_mq_end_request+0x1a/0x120
[ 8790.084814]  blk_done_softirq+0xa1/0xd0
[ 8790.084818]  __do_softirq+0xe4/0x2f8
[ 8790.084821]  ? sort_range+0x20/0x20
[ 8790.084824]  run_ksoftirqd+0x26/0x40
[ 8790.084825]  smpboot_thread_fn+0xc5/0x160
[ 8790.084827]  kthread+0x116/0x130
[ 8790.084828]  ? kthread_park+0x80/0x80
[ 8790.084832]  ret_from_fork+0x22/0x30
[ 8790.084833] ---[ end trace 16ec813ee3f82b56 ]---
[ 8790.085314] BUG: scheduling while atomic: ksoftirqd/7/46/0x00000100


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: BUG: scheduling while atomic when nvmet_rdma_queue_response fails in posting a request
  2021-05-30  7:33 BUG: scheduling while atomic when nvmet_rdma_queue_response fails in posting a request Michal Kalderon
@ 2021-06-08 16:50 ` Christoph Hellwig
  2021-06-08 17:43 ` Sagi Grimberg
  1 sibling, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2021-06-08 16:50 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Christoph Hellwig, sagi, linux-nvme, Shai Malin, Ariel Elior

What kernel version is this?

On Sun, May 30, 2021 at 07:33:18AM +0000, Michal Kalderon wrote:
> 
> this leads to nvmet_rdma_release_rsp being called from softirq eventually
> reaching the blk_mq_delay_run_hw_queue which tries to schedule in softirq. (full stack below)
> 
> could you please advise what the correct solution should be in this case ? 
> 
> thanks,
> Michal
> 
> [trace snipped]
---end quoted text---


* Re: BUG: scheduling while atomic when nvmet_rdma_queue_response fails in posting a request
  2021-05-30  7:33 BUG: scheduling while atomic when nvmet_rdma_queue_response fails in posting a request Michal Kalderon
  2021-06-08 16:50 ` Christoph Hellwig
@ 2021-06-08 17:43 ` Sagi Grimberg
  2021-06-08 18:41   ` Keith Busch
  1 sibling, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2021-06-08 17:43 UTC (permalink / raw)
  To: Michal Kalderon, Christoph Hellwig; +Cc: linux-nvme, Shai Malin, Ariel Elior


> Hi Christoph, Sagi,
> 
> We're testing some device error recovery scenarios and hit the following BUG, stack trace below.
> In the error scenario, nvmet_rdma_queue_response receives an error from the device when trying to post a wr,
> 
> this leads to nvmet_rdma_release_rsp being called from softirq eventually
> reaching the blk_mq_delay_run_hw_queue which tries to schedule in softirq. (full stack below)
> 
> could you please advise what the correct solution should be in this case ?

Hey Michal,

I agree this can happen and requires correction. Does the below resolve
the issue?

--
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 7d607f435e36..6d2eea322779 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -16,6 +16,7 @@
  #include <linux/wait.h>
  #include <linux/inet.h>
  #include <asm/unaligned.h>
+#include <linux/async.h>

  #include <rdma/ib_verbs.h>
  #include <rdma/rdma_cm.h>
@@ -712,6 +713,12 @@ static void nvmet_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
         }
  }

+static void nvmet_rdma_async_release_rsp(void *data, async_cookie_t cookie)
+{
+       struct nvmet_rdma_rsp *rsp = data;
+       nvmet_rdma_release_rsp(rsp);
+}
+
  static void nvmet_rdma_queue_response(struct nvmet_req *req)
  {
         struct nvmet_rdma_rsp *rsp =
@@ -745,7 +752,12 @@ static void nvmet_rdma_queue_response(struct nvmet_req *req)

         if (unlikely(ib_post_send(cm_id->qp, first_wr, NULL))) {
                 pr_err("sending cmd response failed\n");
-               nvmet_rdma_release_rsp(rsp);
+               /*
+                * We might be in atomic context, hence release
+                * the rsp in async context in case we need to
+                * process the wr_wait_list.
+                */
+               async_schedule(nvmet_rdma_async_release_rsp, rsp);
         }
  }
--

> 
> thanks,
> Michal
> 
> [trace snipped]


* Re: BUG: scheduling while atomic when nvmet_rdma_queue_response fails in posting a request
  2021-06-08 17:43 ` Sagi Grimberg
@ 2021-06-08 18:41   ` Keith Busch
  2021-06-09  0:03     ` Sagi Grimberg
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2021-06-08 18:41 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Michal Kalderon, Christoph Hellwig, linux-nvme, Shai Malin, Ariel Elior

On Tue, Jun 08, 2021 at 10:43:45AM -0700, Sagi Grimberg wrote:
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index 7d607f435e36..6d2eea322779 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -16,6 +16,7 @@
>  #include <linux/wait.h>
>  #include <linux/inet.h>
>  #include <asm/unaligned.h>
> +#include <linux/async.h>
> 
>  #include <rdma/ib_verbs.h>
>  #include <rdma/rdma_cm.h>
> @@ -712,6 +713,12 @@ static void nvmet_rdma_send_done(struct ib_cq *cq,
> struct ib_wc *wc)
>         }
>  }
> 
> +static void nvmet_rdma_async_release_rsp(void *data, async_cookie_t cookie)
> +{
> +       struct nvmet_rdma_rsp *rsp = data;
> +       nvmet_rdma_release_rsp(rsp);
> +}
> +
>  static void nvmet_rdma_queue_response(struct nvmet_req *req)
>  {
>         struct nvmet_rdma_rsp *rsp =
> @@ -745,7 +752,12 @@ static void nvmet_rdma_queue_response(struct nvmet_req
> *req)
> 
>         if (unlikely(ib_post_send(cm_id->qp, first_wr, NULL))) {
>                 pr_err("sending cmd response failed\n");
> -               nvmet_rdma_release_rsp(rsp);
> +               /*
> +                * We might be in atomic context, hence release
> +                * the rsp in async context in case we need to
> +                * process the wr_wait_list.
> +                */
> +               async_schedule(nvmet_rdma_async_release_rsp, rsp);
>         }
>  }

Just FYI, async_schedule() has conditions where it may execute your
callback synchronously. Your suggestion is probably fine for testing,
but it sounds like you require something that can guarantee a non-atomic
context for nvmet_rdma_release_rsp().


* Re: BUG: scheduling while atomic when nvmet_rdma_queue_response fails in posting a request
  2021-06-08 18:41   ` Keith Busch
@ 2021-06-09  0:03     ` Sagi Grimberg
  2021-06-14 14:44       ` [EXT] " Michal Kalderon
  0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2021-06-09  0:03 UTC (permalink / raw)
  To: Keith Busch
  Cc: Michal Kalderon, Christoph Hellwig, linux-nvme, Shai Malin, Ariel Elior


>> [patch snipped]
> 
> Just FYI, async_schedule() has conditions where it may execute your
> callback synchronously. Your suggestion is probably fine for testing,
> but it sounds like you require something that can guarantee a non-atomic
> context for nvmet_rdma_release_rsp().

OK, it seems that the issue is that we are submitting I/O in atomic
context. This should be more appropriate...

--
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 7d607f435e36..16f2f5a84ae7 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -102,6 +102,7 @@ struct nvmet_rdma_queue {

         struct work_struct      release_work;
         struct list_head        rsp_wait_list;
+       struct work_struct      wr_wait_work;
         struct list_head        rsp_wr_wait_list;
         spinlock_t              rsp_wr_wait_lock;

@@ -517,8 +518,10 @@ static int nvmet_rdma_post_recv(struct nvmet_rdma_device *ndev,
         return ret;
  }

-static void nvmet_rdma_process_wr_wait_list(struct nvmet_rdma_queue *queue)
+static void nvmet_rdma_process_wr_wait_list(struct work_struct *w)
  {
+       struct nvmet_rdma_queue *queue =
+               container_of(w, struct nvmet_rdma_queue, wr_wait_work);
         spin_lock(&queue->rsp_wr_wait_lock);
         while (!list_empty(&queue->rsp_wr_wait_list)) {
                 struct nvmet_rdma_rsp *rsp;
@@ -677,7 +680,7 @@ static void nvmet_rdma_release_rsp(struct nvmet_rdma_rsp *rsp)
                 nvmet_req_free_sgls(&rsp->req);

         if (unlikely(!list_empty_careful(&queue->rsp_wr_wait_list)))
-               nvmet_rdma_process_wr_wait_list(queue);
+               schedule_work(&queue->wr_wait_work);

         nvmet_rdma_put_rsp(rsp);
  }
@@ -1446,6 +1449,7 @@ nvmet_rdma_alloc_queue(struct nvmet_rdma_device *ndev,
          * inside a CM callback would trigger a deadlock. (great API design..)
          */
         INIT_WORK(&queue->release_work, nvmet_rdma_release_queue_work);
+       INIT_WORK(&queue->wr_wait_work, nvmet_rdma_process_wr_wait_list);
         queue->dev = ndev;
         queue->cm_id = cm_id;
         queue->port = port->nport;
--


* RE: [EXT] Re: BUG: scheduling while atomic when nvmet_rdma_queue_response fails in posting a request
  2021-06-09  0:03     ` Sagi Grimberg
@ 2021-06-14 14:44       ` Michal Kalderon
  2021-06-14 16:44         ` Sagi Grimberg
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Kalderon @ 2021-06-14 14:44 UTC (permalink / raw)
  To: Sagi Grimberg, Keith Busch
  Cc: Christoph Hellwig, linux-nvme, Shai Malin, Ariel Elior

> From: Sagi Grimberg <sagi@grimberg.me>
> Sent: Wednesday, June 9, 2021 3:04 AM
> 
> [first patch and review comments snipped]
> 
> OK, it seems that the issue is that we are submitting I/O in atomic
> context. This should be more appropriate...

Thanks Sagi, this seems to work. I'm still hitting some other issues where in some cases reconnect fails, but I'm collecting more info.

> 
> --
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index 7d607f435e36..16f2f5a84ae7 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -102,6 +102,7 @@ struct nvmet_rdma_queue {
> 
>          struct work_struct      release_work;
>          struct list_head        rsp_wait_list;
> +       struct work_struct      wr_wait_work;
>          struct list_head        rsp_wr_wait_list;
>          spinlock_t              rsp_wr_wait_lock;
> 
> @@ -517,8 +518,10 @@ static int nvmet_rdma_post_recv(struct
> nvmet_rdma_device *ndev,
>          return ret;
>   }
> 
> -static void nvmet_rdma_process_wr_wait_list(struct nvmet_rdma_queue
> *queue)
> +static void nvmet_rdma_process_wr_wait_list(struct work_struct *w)
>   {
> +       struct nvmet_rdma_queue *queue =
> +               container_of(w, struct nvmet_rdma_queue, wr_wait_work);
>          spin_lock(&queue->rsp_wr_wait_lock);
>          while (!list_empty(&queue->rsp_wr_wait_list)) {
>                  struct nvmet_rdma_rsp *rsp;
> @@ -677,7 +680,7 @@ static void nvmet_rdma_release_rsp(struct
> nvmet_rdma_rsp *rsp)
>                  nvmet_req_free_sgls(&rsp->req);
> 
>          if (unlikely(!list_empty_careful(&queue->rsp_wr_wait_list)))
> -               nvmet_rdma_process_wr_wait_list(queue);
> +               schedule_work(&queue->wr_wait_work);
> 
>          nvmet_rdma_put_rsp(rsp);
>   }
> @@ -1446,6 +1449,7 @@ nvmet_rdma_alloc_queue(struct
> nvmet_rdma_device *ndev,
>           * inside a CM callback would trigger a deadlock. (great API
> design..)
>           */
>          INIT_WORK(&queue->release_work,
> nvmet_rdma_release_queue_work);
> +       INIT_WORK(&queue->wr_wait_work,
> nvmet_rdma_process_wr_wait_list);
>          queue->dev = ndev;
>          queue->cm_id = cm_id;
>          queue->port = port->nport;
> --
Thanks, 

Tested-by: Michal Kalderon <michal.kalderon@marvell.com>



* Re: [EXT] Re: BUG: scheduling while atomic when nvmet_rdma_queue_response fails in posting a request
  2021-06-14 14:44       ` [EXT] " Michal Kalderon
@ 2021-06-14 16:44         ` Sagi Grimberg
  2021-06-14 18:14           ` Michal Kalderon
  0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2021-06-14 16:44 UTC (permalink / raw)
  To: Michal Kalderon, Keith Busch
  Cc: Christoph Hellwig, linux-nvme, Shai Malin, Ariel Elior


>> OK, it seems that the issue is that we are submitting I/O in atomic
>> context. This should be more appropriate...
> 
> Thanks Sagi, this seems to work. I'm still hitting some other issues where in some cases reconnect fails, but I'm
> Collecting more info.

Same type of failures?


* RE: [EXT] Re: BUG: scheduling while atomic when nvmet_rdma_queue_response fails in posting a request
  2021-06-14 16:44         ` Sagi Grimberg
@ 2021-06-14 18:14           ` Michal Kalderon
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Kalderon @ 2021-06-14 18:14 UTC (permalink / raw)
  To: Sagi Grimberg, Keith Busch
  Cc: Christoph Hellwig, linux-nvme, Shai Malin, Ariel Elior

> From: Sagi Grimberg <sagi@grimberg.me>
> Sent: Monday, June 14, 2021 7:45 PM
> 
> 
> >> OK, it seems that the issue is that we are submitting I/O in atomic
> >> context. This should be more appropriate...
> >
> > Thanks Sagi, this seems to work. I'm still hitting some other issues where in
> some cases reconnect fails, but I'm
> > Collecting more info.
> 
> Same type of failures?
No, something else.
After recovery completes, I'm getting the following errors on the initiator side without any messages on the target:
[14678.618025] nvme nvme2: Connect rejected: status -104 (reset by remote host).
[14678.619350] nvme nvme2: rdma connection establishment failed (-104)
[14678.622274] nvme nvme2: Failed reconnect attempt 6
[14678.623623] nvme nvme2: Reconnecting in 10 seconds...
[14751.304247] nvme nvme2: I/O 0 QID 0 timeout
[14751.305749] nvme nvme2: Connect command failed, error wo/DNR bit: 881
[14751.307240] nvme nvme2: failed to connect queue: 0 ret=881
[14751.310497] nvme nvme2: Failed reconnect attempt 7
[14751.312174] nvme nvme2: Reconnecting in 10 seconds...
[14825.032645] nvme nvme2: I/O 1 QID 0 timeout



end of thread, other threads:[~2021-06-14 18:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-30  7:33 BUG: scheduling while atomic when nvmet_rdma_queue_response fails in posting a request Michal Kalderon
2021-06-08 16:50 ` Christoph Hellwig
2021-06-08 17:43 ` Sagi Grimberg
2021-06-08 18:41   ` Keith Busch
2021-06-09  0:03     ` Sagi Grimberg
2021-06-14 14:44       ` [EXT] " Michal Kalderon
2021-06-14 16:44         ` Sagi Grimberg
2021-06-14 18:14           ` Michal Kalderon
