* [PATCH] nvme-tcp: Fix possible race of io_work and direct send
@ 2020-12-21  8:03 Sagi Grimberg
  2020-12-21 14:13 ` Potnuri Bharat Teja
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Sagi Grimberg @ 2020-12-21  8:03 UTC (permalink / raw)
  To: linux-nvme; +Cc: Keith Busch, Christoph Hellwig

We may send a request (with or without its data) from two
paths:

1. From our I/O context nvme_tcp_io_work which
is triggered from:
- queue_rq
- r2t reception
- socket data_ready and write_space callbacks

2. Directly from queue_rq if the send_list is empty (because
we want to save the context switch associated with scheduling
our io_work).

However, now that we have the send_mutex, we may run into
a race condition where none of these contexts will send the pending
payload to the controller. Both the io_work send path and the queue_rq
send path opportunistically attempt to acquire the send_mutex; however,
queue_rq only attempts to send a single request, and if the io_work
context fails to acquire the send_mutex it will complete without
rescheduling itself.

The race can trigger with the following sequence:
1. queue_rq sends a request (no in-capsule data) and blocks
2. RX path receives r2t - prepares data PDU to send, adds h2cdata PDU to the
   send_list and schedules io_work
3. io_work triggers and cannot acquire the send_mutex - because of (1), ends
   without rescheduling itself
4. queue_rq completes the send, and returns
==> no context will send the h2cdata - timeout.

Fix this by having queue_rq send as much as it can from the send_list,
so that if anything is left, it's because the socket buffer is
full and the socket write_space callback will trigger, thus guaranteeing
that a context will be scheduled to send the h2cdata PDU.

Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context")
Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reported-by: Samuel Jones <sjones@kalrayinc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/tcp.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 1ba659927442..979ee31b8dd1 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -262,6 +262,16 @@ static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
 	}
 }
 
+static inline void nvme_tcp_send_all(struct nvme_tcp_queue *queue)
+{
+	int ret;
+
+	/* drain the send queue as much as we can... */
+	do {
+		ret = nvme_tcp_try_send(queue);
+	} while (ret > 0);
+}
+
 static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
 		bool sync, bool last)
 {
@@ -279,7 +289,7 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
 	if (queue->io_cpu == smp_processor_id() &&
 	    sync && empty && mutex_trylock(&queue->send_mutex)) {
 		queue->more_requests = !last;
-		nvme_tcp_try_send(queue);
+		nvme_tcp_send_all(queue);
 		queue->more_requests = false;
 		mutex_unlock(&queue->send_mutex);
 	} else if (last) {
-- 
2.25.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: [PATCH] nvme-tcp: Fix possible race of io_work and direct send
  2020-12-21  8:03 [PATCH] nvme-tcp: Fix possible race of io_work and direct send Sagi Grimberg
@ 2020-12-21 14:13 ` Potnuri Bharat Teja
  2020-12-22 13:45 ` Christoph Hellwig
  2021-01-07 19:05 ` Or Gerlitz
  2 siblings, 0 replies; 8+ messages in thread
From: Potnuri Bharat Teja @ 2020-12-21 14:13 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Keith Busch, Christoph Hellwig, linux-nvme

On Monday, December 21, 2020 at 13:33:39 +0530, Sagi Grimberg wrote:
> We may send a request (with or without its data) from two
> paths:
> 
> 1. From our I/O context nvme_tcp_io_work which
> is triggered from:
> - queue_rq
> - r2t reception
> - socket data_ready and write_space callbacks
> 
> 2. Directly from queue_rq if the send_list is empty (because
> we want to save the context switch associated with scheduling
> our io_work).
> 
> However, now that we have the send_mutex, we may run into
> a race condition where none of these contexts will send the pending
> payload to the controller. Both the io_work send path and the queue_rq
> send path opportunistically attempt to acquire the send_mutex; however,
> queue_rq only attempts to send a single request, and if the io_work
> context fails to acquire the send_mutex it will complete without
> rescheduling itself.
> 
> The race can trigger with the following sequence:
> 1. queue_rq sends a request (no in-capsule data) and blocks
> 2. RX path receives r2t - prepares data PDU to send, adds h2cdata PDU to the
>    send_list and schedules io_work
> 3. io_work triggers and cannot acquire the send_mutex - because of (1), ends
>    without rescheduling itself
> 4. queue_rq completes the send, and returns
> ==> no context will send the h2cdata - timeout.
> 
> Fix this by having queue_rq send as much as it can from the send_list,
> so that if anything is left, it's because the socket buffer is
> full and the socket write_space callback will trigger, thus guaranteeing
> that a context will be scheduled to send the h2cdata PDU.
> 
> Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context")
> Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
> Reported-by: Samuel Jones <sjones@kalrayinc.com>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>

> ---
>  drivers/nvme/host/tcp.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 1ba659927442..979ee31b8dd1 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -262,6 +262,16 @@ static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
>  	}
>  }
>  
> +static inline void nvme_tcp_send_all(struct nvme_tcp_queue *queue)
> +{
> +	int ret;
> +
> +	/* drain the send queue as much as we can... */
> +	do {
> +		ret = nvme_tcp_try_send(queue);
> +	} while (ret > 0);
> +}
> +
>  static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>  		bool sync, bool last)
>  {
> @@ -279,7 +289,7 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>  	if (queue->io_cpu == smp_processor_id() &&
>  	    sync && empty && mutex_trylock(&queue->send_mutex)) {
>  		queue->more_requests = !last;
> -		nvme_tcp_try_send(queue);
> +		nvme_tcp_send_all(queue);
>  		queue->more_requests = false;
>  		mutex_unlock(&queue->send_mutex);
>  	} else if (last) {
> -- 
> 2.25.1
> 
> 



* Re: [PATCH] nvme-tcp: Fix possible race of io_work and direct send
  2020-12-21  8:03 [PATCH] nvme-tcp: Fix possible race of io_work and direct send Sagi Grimberg
  2020-12-21 14:13 ` Potnuri Bharat Teja
@ 2020-12-22 13:45 ` Christoph Hellwig
  2021-01-07 19:05 ` Or Gerlitz
  2 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2020-12-22 13:45 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Keith Busch, Christoph Hellwig, linux-nvme

Thanks,

applied to nvme-5.11.



* Re: [PATCH] nvme-tcp: Fix possible race of io_work and direct send
  2020-12-21  8:03 [PATCH] nvme-tcp: Fix possible race of io_work and direct send Sagi Grimberg
  2020-12-21 14:13 ` Potnuri Bharat Teja
  2020-12-22 13:45 ` Christoph Hellwig
@ 2021-01-07 19:05 ` Or Gerlitz
  2021-01-08 11:23   ` Yi Zhang
  2021-01-08 20:20   ` Sagi Grimberg
  2 siblings, 2 replies; 8+ messages in thread
From: Or Gerlitz @ 2021-01-07 19:05 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Keith Busch, Boris Pismenny, Christoph Hellwig, linux-nvme,
	Ben Ben-ishay

On Mon, Dec 21, 2020 at 10:06 AM Sagi Grimberg <sagi@grimberg.me> wrote:
> We may send a request (with or without its data) from two paths:
> 1. From our I/O context nvme_tcp_io_work which is triggered from:
> - queue_rq
> - r2t reception
> - socket data_ready and write_space callbacks

> 2. Directly from queue_rq if the send_list is empty (because
> we want to save the context switch associated with scheduling
> our io_work).

> Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context")
> Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
> Reported-by: Samuel Jones <sjones@kalrayinc.com>

> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c

> @@ -279,7 +289,7 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>         if (queue->io_cpu == smp_processor_id() &&
>             sync && empty && mutex_trylock(&queue->send_mutex)) {

but we can't rely on the processor id if we are in a preemptible state..

[  373.940336] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:4/457
[  373.942037] caller is nvme_tcp_submit_async_event+0x4da/0x690 [nvme_tcp]
[  373.942144] CPU: 0 PID: 457 Comm: kworker/u8:4 Not tainted 5.11.0-rc1+ #116
[  373.942171] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[  373.942196] Workqueue: nvme-wq nvme_async_event_work [nvme_core]
[  373.942196] Workqueue: nvme-wq nvme_async_event_work [nvme_core]
[  373.942259] Call Trace:
[  373.942286]  dump_stack+0xd0/0x119
[  373.942346]  check_preemption_disabled+0xec/0xf0
[  373.942397]  nvme_tcp_submit_async_event+0x4da/0x690 [nvme_tcp]
[  373.942449]  nvme_async_event_work+0x11d/0x1f0 [nvme_core]
[  373.942496]  ? nvme_ctrl_loss_tmo_show+0x140/0x140 [nvme_core]
[  373.942576]  process_one_work+0x9f3/0x1630
[  373.942649]  ? pwq_dec_nr_in_flight+0x330/0x330
[  373.942765]  worker_thread+0x9e/0xf60
[  373.942849]  ? process_one_work+0x1630/0x1630
[  373.942889]  kthread+0x362/0x480
[  373.942918]  ? kthread_create_on_node+0x100/0x100

+       /*
+        * if we're the first on the send_list and we can try to send
+        * directly, otherwise queue io_work. Also, only do that if we
+        * are on the same cpu, so we don't introduce contention.
+        */
+       if (queue->io_cpu == smp_processor_id() &&
+           sync && empty && mutex_trylock(&queue->send_mutex)) {
+               nvme_tcp_try_send(queue);
+               mutex_unlock(&queue->send_mutex);
+       } else {
+               queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
+       }

so before this patch that check was not needed? I was not sure about all
the involved contexts and how to fix that, so I am just sending the report



* Re: [PATCH] nvme-tcp: Fix possible race of io_work and direct send
  2021-01-07 19:05 ` Or Gerlitz
@ 2021-01-08 11:23   ` Yi Zhang
  2021-01-08 20:20   ` Sagi Grimberg
  1 sibling, 0 replies; 8+ messages in thread
From: Yi Zhang @ 2021-01-08 11:23 UTC (permalink / raw)
  To: Or Gerlitz, Sagi Grimberg
  Cc: Keith Busch, Boris Pismenny, Christoph Hellwig, linux-nvme,
	Ben Ben-ishay



On 1/8/21 3:05 AM, Or Gerlitz wrote:
> On Mon, Dec 21, 2020 at 10:06 AM Sagi Grimberg <sagi@grimberg.me> wrote:
>> We may send a request (with or without its data) from two paths:
>> 1. From our I/O context nvme_tcp_io_work which is triggered from:
>> - queue_rq
>> - r2t reception
>> - socket data_ready and write_space callbacks
>> 2. Directly from queue_rq if the send_list is empty (because
>> we want to save the context switch associated with scheduling
>> our io_work).
>> Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context")
>> Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
>> Reported-by: Samuel Jones <sjones@kalrayinc.com>
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -279,7 +289,7 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>>          if (queue->io_cpu == smp_processor_id() &&
>>              sync && empty && mutex_trylock(&queue->send_mutex)) {
> but we can't rely on the processor id if we are on preemptible state..
>
> [  373.940336] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:4/457
> [  373.942037] caller is nvme_tcp_submit_async_event+0x4da/0x690 [nvme_tcp]
> [  373.942144] CPU: 0 PID: 457 Comm: kworker/u8:4 Not tainted 5.11.0-rc1+ #116
> [  373.942171] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [  373.942196] Workqueue: nvme-wq nvme_async_event_work [nvme_core]
> [  373.942259] Call Trace:
> [  373.942286]  dump_stack+0xd0/0x119
> [  373.942346]  check_preemption_disabled+0xec/0xf0
> [  373.942397]  nvme_tcp_submit_async_event+0x4da/0x690 [nvme_tcp]
> [  373.942449]  nvme_async_event_work+0x11d/0x1f0 [nvme_core]
> [  373.942496]  ? nvme_ctrl_loss_tmo_show+0x140/0x140 [nvme_core]
> [  373.942576]  process_one_work+0x9f3/0x1630
> [  373.942649]  ? pwq_dec_nr_in_flight+0x330/0x330
> [  373.942765]  worker_thread+0x9e/0xf60
> [  373.942849]  ? process_one_work+0x1630/0x1630
> [  373.942889]  kthread+0x362/0x480
> [  373.942918]  ? kthread_create_on_node+0x100/0x100
>
>         /*
> +        * if we're the first on the send_list and we can try to send
> +        * directly, otherwise queue io_work. Also, only do that if we
> +        * are on the same cpu, so we don't introduce contention.
> +        */
> +       if (queue->io_cpu == smp_processor_id() &&
> +           sync && empty && mutex_trylock(&queue->send_mutex)) {
> +               nvme_tcp_try_send(queue);
> +               mutex_unlock(&queue->send_mutex);
> +       } else {
> +               queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
> +       }
>
> so before this patch that check was not needed? I was not sure re all
> the involved contexts and how to fix that.. so just sending the report
Yeah, I met a similar issue after enabling PREEMPT.

[  135.285366] run blktests nvme/003 at 2021-01-08 06:13:04
[  135.358457] loop: module loaded
[  135.368607] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  135.369568] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  135.374261] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:0d2e879c-9efb-4026-a82e-450e4283175e.
[  135.374381] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 127.0.0.1:4420
[  135.374382] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u64:18/187
[  135.374387] caller is nvme_tcp_submit_async_event+0x111/0x140 [nvme_tcp]
[  135.389518] CPU: 31 PID: 187 Comm: kworker/u64:18 Tainted: G S I 5.11.0-rc2.v1 #2
[  135.398121] Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020
[  135.405688] Workqueue: nvme-wq nvme_async_event_work [nvme_core]
[  135.411704] Call Trace:
[  135.414157]  dump_stack+0x57/0x6a
[  135.417475]  check_preemption_disabled+0xb6/0xd0
[  135.422094]  nvme_tcp_submit_async_event+0x111/0x140 [nvme_tcp]
[  135.428014]  nvme_async_event_work+0x5d/0xc0 [nvme_core]
[  135.433324]  process_one_work+0x1b6/0x3b0
[  135.437337]  worker_thread+0x30/0x370
[  135.441002]  ? process_one_work+0x3b0/0x3b0
[  135.445190]  kthread+0x13d/0x160
[  135.448423]  ? kthread_park+0x80/0x80
[  135.452089]  ret_from_fork+0x1f/0x30
[  145.394927] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"






* Re: [PATCH] nvme-tcp: Fix possible race of io_work and direct send
  2021-01-07 19:05 ` Or Gerlitz
  2021-01-08 11:23   ` Yi Zhang
@ 2021-01-08 20:20   ` Sagi Grimberg
  2021-01-09  1:59     ` Yi Zhang
  2021-01-10  8:06     ` Or Gerlitz
  1 sibling, 2 replies; 8+ messages in thread
From: Sagi Grimberg @ 2021-01-08 20:20 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Keith Busch, Boris Pismenny, Christoph Hellwig, linux-nvme,
	Ben Ben-ishay


>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
> 
>> @@ -279,7 +289,7 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>>          if (queue->io_cpu == smp_processor_id() &&
>>              sync && empty && mutex_trylock(&queue->send_mutex)) {
> 
> but we can't rely on the processor id if we are on preemptible state..
> 
> [  373.940336] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:4/457
> [  373.942037] caller is nvme_tcp_submit_async_event+0x4da/0x690 [nvme_tcp]
> [  373.942144] CPU: 0 PID: 457 Comm: kworker/u8:4 Not tainted 5.11.0-rc1+ #116
> [  373.942171] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [  373.942196] Workqueue: nvme-wq nvme_async_event_work [nvme_core]
> [  373.942259] Call Trace:
> [  373.942286]  dump_stack+0xd0/0x119
> [  373.942346]  check_preemption_disabled+0xec/0xf0
> [  373.942397]  nvme_tcp_submit_async_event+0x4da/0x690 [nvme_tcp]
> [  373.942449]  nvme_async_event_work+0x11d/0x1f0 [nvme_core]
> [  373.942496]  ? nvme_ctrl_loss_tmo_show+0x140/0x140 [nvme_core]
> [  373.942576]  process_one_work+0x9f3/0x1630
> [  373.942649]  ? pwq_dec_nr_in_flight+0x330/0x330
> [  373.942765]  worker_thread+0x9e/0xf60
> [  373.942849]  ? process_one_work+0x1630/0x1630
> [  373.942889]  kthread+0x362/0x480
> [  373.942918]  ? kthread_create_on_node+0x100/0x100

Well, this is a heuristic anyways, too bad we need to
disable preemption just for this...

I'm assuming that this makes the issue go away?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index cf856103025a..ea07336ca179 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -281,8 +281,11 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
                 bool sync, bool last)
  {
         struct nvme_tcp_queue *queue = req->queue;
+       int cpu;
         bool empty;

+       cpu = get_cpu();
+       put_cpu();
         empty = llist_add(&req->lentry, &queue->req_list) &&
                 list_empty(&queue->send_list) && !queue->request;

@@ -291,8 +294,8 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
          * directly, otherwise queue io_work. Also, only do that if we
          * are on the same cpu, so we don't introduce contention.
          */
-       if (queue->io_cpu == smp_processor_id() &&
-           sync && empty && mutex_trylock(&queue->send_mutex)) {
+       if (queue->io_cpu == cpu && sync && empty &&
+           mutex_trylock(&queue->send_mutex)) {
                 queue->more_requests = !last;
                 nvme_tcp_send_all(queue);
                 queue->more_requests = false;

--



* Re: [PATCH] nvme-tcp: Fix possible race of io_work and direct send
  2021-01-08 20:20   ` Sagi Grimberg
@ 2021-01-09  1:59     ` Yi Zhang
  2021-01-10  8:06     ` Or Gerlitz
  1 sibling, 0 replies; 8+ messages in thread
From: Yi Zhang @ 2021-01-09  1:59 UTC (permalink / raw)
  To: Sagi Grimberg, Or Gerlitz
  Cc: Keith Busch, Boris Pismenny, Christoph Hellwig, linux-nvme,
	Ben Ben-ishay



On 1/9/21 4:20 AM, Sagi Grimberg wrote:
>
>>> --- a/drivers/nvme/host/tcp.c
>>> +++ b/drivers/nvme/host/tcp.c
>>
>>> @@ -279,7 +289,7 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>>>          if (queue->io_cpu == smp_processor_id() &&
>>>              sync && empty && mutex_trylock(&queue->send_mutex)) {
>>
>> but we can't rely on the processor id if we are on preemptible state..
>>
>> [  373.940336] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:4/457
>> [  373.942037] caller is nvme_tcp_submit_async_event+0x4da/0x690 [nvme_tcp]
>> [  373.942144] CPU: 0 PID: 457 Comm: kworker/u8:4 Not tainted 5.11.0-rc1+ #116
>> [  373.942171] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
>> [  373.942259] Call Trace:
>> [  373.942286]  dump_stack+0xd0/0x119
>> [  373.942346]  check_preemption_disabled+0xec/0xf0
>> [  373.942397]  nvme_tcp_submit_async_event+0x4da/0x690 [nvme_tcp]
>> [  373.942449]  nvme_async_event_work+0x11d/0x1f0 [nvme_core]
>> [  373.942496]  ? nvme_ctrl_loss_tmo_show+0x140/0x140 [nvme_core]
>> [  373.942576]  process_one_work+0x9f3/0x1630
>> [  373.942649]  ? pwq_dec_nr_in_flight+0x330/0x330
>> [  373.942765]  worker_thread+0x9e/0xf60
>> [  373.942849]  ? process_one_work+0x1630/0x1630
>> [  373.942889]  kthread+0x362/0x480
>> [  373.942918]  ? kthread_create_on_node+0x100/0x100
>
> Well, this is a heuristic anyways, too bad we need to
> disable preemption just for this...
>
> I'm assuming that this makes the issue go away?
> -- 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index cf856103025a..ea07336ca179 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -281,8 +281,11 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>                 bool sync, bool last)
>  {
>         struct nvme_tcp_queue *queue = req->queue;
> +       int cpu;
>         bool empty;
>
> +       cpu = get_cpu();
> +       put_cpu();
>         empty = llist_add(&req->lentry, &queue->req_list) &&
>                 list_empty(&queue->send_list) && !queue->request;
>
> @@ -291,8 +294,8 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>          * directly, otherwise queue io_work. Also, only do that if we
>          * are on the same cpu, so we don't introduce contention.
>          */
> -       if (queue->io_cpu == smp_processor_id() &&
> -           sync && empty && mutex_trylock(&queue->send_mutex)) {
> +       if (queue->io_cpu == cpu && sync && empty &&
> +           mutex_trylock(&queue->send_mutex)) {
>                 queue->more_requests = !last;
>                 nvme_tcp_send_all(queue);
>                 queue->more_requests = false;
>
Hi Sagi
Thanks for the quick response, confirmed the issue was fixed, feel free 
to add:

Tested-by: Yi Zhang <yi.zhang@redhat.com>

> -- 
>




* Re: [PATCH] nvme-tcp: Fix possible race of io_work and direct send
  2021-01-08 20:20   ` Sagi Grimberg
  2021-01-09  1:59     ` Yi Zhang
@ 2021-01-10  8:06     ` Or Gerlitz
  1 sibling, 0 replies; 8+ messages in thread
From: Or Gerlitz @ 2021-01-10  8:06 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Keith Busch, Boris Pismenny, Christoph Hellwig, linux-nvme,
	Ben Ben-ishay

On Fri, Jan 8, 2021 at 10:20 PM Sagi Grimberg <sagi@grimberg.me> wrote:

> Well, this is a heuristic anyways, too bad we need to
> disable preemption just for this...
>
> I'm assuming that this makes the issue go away?
> --
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index cf856103025a..ea07336ca179 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -281,8 +281,11 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>                  bool sync, bool last)
>   {
>          struct nvme_tcp_queue *queue = req->queue;
> +       int cpu;
>          bool empty;
>
> +       cpu = get_cpu();
> +       put_cpu();
>          empty = llist_add(&req->lentry, &queue->req_list) &&
>                  list_empty(&queue->send_list) && !queue->request;
>
> @@ -291,8 +294,8 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>           * directly, otherwise queue io_work. Also, only do that if we
>           * are on the same cpu, so we don't introduce contention.
>           */
> -       if (queue->io_cpu == smp_processor_id() &&
> -           sync && empty && mutex_trylock(&queue->send_mutex)) {
> +       if (queue->io_cpu == cpu && sync && empty &&


If preemption happens here, the value returned by get_cpu() may no longer
be the CPU we are running on when the actual comparison takes place. So
what you are saying is that overall correctness will not be hurt? Worth
putting a comment here for readers / future modifiers of the code.

> +           mutex_trylock(&queue->send_mutex)) {
>                  queue->more_requests = !last;
>                  nvme_tcp_send_all(queue);
>                  queue->more_requests = false;


