All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
@ 2018-01-09 19:23 Bart Van Assche
  2018-01-10 21:40 ` Doug Ledford
  2018-01-11 11:27 ` Moni Shoua
  0 siblings, 2 replies; 15+ messages in thread
From: Bart Van Assche @ 2018-01-09 19:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma, Bart Van Assche, Moni Shoua, stable

The following sequence:
* Change queue pair state into IB_QPS_ERR.
* Post a work request on the queue pair.
Triggers the following race condition in the rdma_rxe driver:
* rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
  that examines the QP send queue.
* rxe_post_send() posts a work request on the QP send queue.
Avoid that this race causes a work request to be ignored by scheduling
an rxe_completer() call from rxe_post_send() for queues that are in the
error state.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: <stable@vger.kernel.org> # v4.8
---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index a6fbed48db8a..8f631d64c192 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -814,6 +814,8 @@ static int rxe_post_send_kernel(struct rxe_qp *qp, struct ib_send_wr *wr,
 			(queue_count(qp->sq.queue) > 1);
 
 	rxe_run_task(&qp->req.task, must_sched);
+	if (unlikely(qp->req.state == QP_STATE_ERROR))
+		rxe_run_task(&qp->comp.task, 1);
 
 	return err;
 }
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
  2018-01-09 19:23 [PATCH] RDMA/rxe: Fix a race condition related to the QP error state Bart Van Assche
@ 2018-01-10 21:40 ` Doug Ledford
  2018-01-10 22:01   ` Doug Ledford
  2018-01-11 11:27 ` Moni Shoua
  1 sibling, 1 reply; 15+ messages in thread
From: Doug Ledford @ 2018-01-10 21:40 UTC (permalink / raw)
  To: Bart Van Assche, Jason Gunthorpe; +Cc: linux-rdma, Moni Shoua, stable

[-- Attachment #1: Type: text/plain, Size: 901 bytes --]

On Tue, 2018-01-09 at 11:23 -0800, Bart Van Assche wrote:
> The following sequence:
> * Change queue pair state into IB_QPS_ERR.
> * Post a work request on the queue pair.
> Triggers the following race condition in the rdma_rxe driver:
> * rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
>   that examines the QP send queue.
> * rxe_post_send() posts a work request on the QP send queue.
If rxe_completer() runs before rxe_post_send(), the send queue is
believed to be empty while a stale work request stays on the send queue
indefinitely.  To avoid this race, schedule rxe_completer() after a work
request is queued on a qp in the error state by rxe_post_send().

I think that improves the log message, yes?

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
  2018-01-10 21:40 ` Doug Ledford
@ 2018-01-10 22:01   ` Doug Ledford
  2018-01-11  6:22     ` Leon Romanovsky
  0 siblings, 1 reply; 15+ messages in thread
From: Doug Ledford @ 2018-01-10 22:01 UTC (permalink / raw)
  To: Bart Van Assche, Jason Gunthorpe; +Cc: linux-rdma, Moni Shoua, stable

[-- Attachment #1: Type: text/plain, Size: 1055 bytes --]

On Wed, 2018-01-10 at 16:40 -0500, Doug Ledford wrote:
> On Tue, 2018-01-09 at 11:23 -0800, Bart Van Assche wrote:
> > The following sequence:
> > * Change queue pair state into IB_QPS_ERR.
> > * Post a work request on the queue pair.
> > Triggers the following race condition in the rdma_rxe driver:
> > * rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
> >   that examines the QP send queue.
> > * rxe_post_send() posts a work request on the QP send queue.
> 
> If rxe_completer() runs before rxe_post_send(), the send queue is
> believed to be empty while a stale work request stays on the send queue
> indefinitely.  To avoid this race, schedule rxe_completer() after a work
> request is queued on a qp in the error state by rxe_post_send().
> 
> I think that improves the log message, yes?
> 

I did some further edits.  But, patch applied to for-next.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
  2018-01-10 22:01   ` Doug Ledford
@ 2018-01-11  6:22     ` Leon Romanovsky
       [not found]       ` <20180111062252.GP7368-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Leon Romanovsky @ 2018-01-11  6:22 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Bart Van Assche, Jason Gunthorpe, linux-rdma, Moni Shoua, stable

[-- Attachment #1: Type: text/plain, Size: 1321 bytes --]

On Wed, Jan 10, 2018 at 05:01:40PM -0500, Doug Ledford wrote:
> On Wed, 2018-01-10 at 16:40 -0500, Doug Ledford wrote:
> > On Tue, 2018-01-09 at 11:23 -0800, Bart Van Assche wrote:
> > > The following sequence:
> > > * Change queue pair state into IB_QPS_ERR.
> > > * Post a work request on the queue pair.
> > > Triggers the following race condition in the rdma_rxe driver:
> > > * rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
> > >   that examines the QP send queue.
> > > * rxe_post_send() posts a work request on the QP send queue.
> >
> > If rxe_completer() runs before rxe_post_send(), the send queue is
> > believed to be empty while a stale work request stays on the send queue
> > indefinitely.  To avoid this race, schedule rxe_completer() after a work
> > request is queued on a qp in the error state by rxe_post_send().
> >
> > I think that improves the log message, yes?
> >
>
> I did some further edits.  But, patch applied to for-next.

The proposed patch definitely decreases the chance of races, but it is not fixing them.
There is a chance to have change in qp state immediately after your "if ..." check.

Thanks

>
> --
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>     Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
  2018-01-09 19:23 [PATCH] RDMA/rxe: Fix a race condition related to the QP error state Bart Van Assche
  2018-01-10 21:40 ` Doug Ledford
@ 2018-01-11 11:27 ` Moni Shoua
  2018-01-11 16:16   ` Bart Van Assche
  1 sibling, 1 reply; 15+ messages in thread
From: Moni Shoua @ 2018-01-11 11:27 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Jason Gunthorpe, Doug Ledford, linux-rdma, stable

On Tue, Jan 9, 2018 at 9:23 PM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> The following sequence:
> * Change queue pair state into IB_QPS_ERR.
> * Post a work request on the queue pair.
> Triggers the following race condition in the rdma_rxe driver:
> * rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
>   that examines the QP send queue.
> * rxe_post_send() posts a work request on the QP send queue.
> Avoid that this race causes a work request to be ignored by scheduling
> an rxe_completer() call from rxe_post_send() for queues that are in the
> error state.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Moni Shoua <monis@mellanox.com>
> Cc: <stable@vger.kernel.org> # v4.8
> ---
>  drivers/infiniband/sw/rxe/rxe_verbs.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index a6fbed48db8a..8f631d64c192 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -814,6 +814,8 @@ static int rxe_post_send_kernel(struct rxe_qp *qp, struct ib_send_wr *wr,
>                         (queue_count(qp->sq.queue) > 1);
>
>         rxe_run_task(&qp->req.task, must_sched);
> +       if (unlikely(qp->req.state == QP_STATE_ERROR))
> +               rxe_run_task(&qp->comp.task, 1);
>
>         return err;
>  }
> --
> 2.15.1
>

Maybe I am missing something but I think that the race is when qp is
in ERROR state and the following functions run in parallel
* rxe_drain_req_pkts (called from rxe_requester after post_send)
* rxe_drain_resp_pkts (called from rxe_completer after modify to ERROR)

Am I right?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
  2018-01-11  6:22     ` Leon Romanovsky
@ 2018-01-11 16:02           ` Bart Van Assche
  0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2018-01-11 16:02 UTC (permalink / raw)
  To: leon-DgEjT+Ai2ygdnm+yROfE0A, dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: jgg-VPRAkNaXOzVWk0Htik3J/w, monis-VPRAkNaXOzVWk0Htik3J/w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, stable-u79uwXL29TY76Z2rM5mHXA

On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> The proposed patch definitely decreases the chance of races, but it is not fixing them.
> There is a chance to have change in qp state immediately after your "if ..." check.

Hello Leon,

Please have a look at rxe_qp_error() and you will see that the patch I posted
is a proper fix. In the scenario you described rxe_qp_error() will trigger a
run of rxe_completer().

Bart.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
@ 2018-01-11 16:02           ` Bart Van Assche
  0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2018-01-11 16:02 UTC (permalink / raw)
  To: leon, dledford; +Cc: jgg, monis, linux-rdma, stable

On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> The proposed patch definitely decreases the chance of races, but it is not fixing them.
> There is a chance to have change in qp state immediately after your "if ..." check.

Hello Leon,

Please have a look at rxe_qp_error() and you will see that the patch I posted
is a proper fix. In the scenario you described rxe_qp_error() will trigger a
run of rxe_completer().

Bart.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
  2018-01-11 11:27 ` Moni Shoua
@ 2018-01-11 16:16   ` Bart Van Assche
  0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2018-01-11 16:16 UTC (permalink / raw)
  To: monis; +Cc: jgg, linux-rdma, stable, dledford

On Thu, 2018-01-11 at 13:27 +0200, Moni Shoua wrote:
> Maybe I am missing something but I think that the race is when qp is
> in ERROR state and the following functions run in parallel
> * rxe_drain_req_pkts (called from rxe_requester after post_send)
> * rxe_drain_resp_pkts (called from rxe_completer after modify to ERROR)
> 
> Am I right?

Hello Moni,

I think that's a real race and a race that has to be fixed but not the race
that caused the missing completions in the tests I ran myself.

Best regards,

Bart.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
  2018-01-11 16:02           ` Bart Van Assche
@ 2018-01-11 19:00               ` Leon Romanovsky
  -1 siblings, 0 replies; 15+ messages in thread
From: Leon Romanovsky @ 2018-01-11 19:00 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA, jgg-VPRAkNaXOzVWk0Htik3J/w,
	monis-VPRAkNaXOzVWk0Htik3J/w, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 922 bytes --]

On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > There is a chance to have change in qp state immediately after your "if ..." check.
>
> Hello Leon,
>
> Please have a look at rxe_qp_error() and you will see that the patch I posted
> is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> run of rxe_completer().

Bart,

What am I missing?

CPU1							CPU2
							if (unlikely....
						<---
/* move the qp to the error state */
void rxe_qp_error(struct rxe_qp *qp)
{
        qp->req.state = QP_STATE_ERROR;
        qp->resp.state = QP_STATE_ERROR;
        qp->attr.qp_state = IB_QPS_ERR;
						--->
							rxe_run_task(&qp->req.task, must_sched);



It is more or less the same as without "if (unlikely..."

Thanks

>
> Bart.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
@ 2018-01-11 19:00               ` Leon Romanovsky
  0 siblings, 0 replies; 15+ messages in thread
From: Leon Romanovsky @ 2018-01-11 19:00 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dledford, jgg, monis, linux-rdma, stable

[-- Attachment #1: Type: text/plain, Size: 922 bytes --]

On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > There is a chance to have change in qp state immediately after your "if ..." check.
>
> Hello Leon,
>
> Please have a look at rxe_qp_error() and you will see that the patch I posted
> is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> run of rxe_completer().

Bart,

What am I missing?

CPU1							CPU2
							if (unlikely....
						<---
/* move the qp to the error state */
void rxe_qp_error(struct rxe_qp *qp)
{
        qp->req.state = QP_STATE_ERROR;
        qp->resp.state = QP_STATE_ERROR;
        qp->attr.qp_state = IB_QPS_ERR;
						--->
							rxe_run_task(&qp->req.task, must_sched);



It is more or less the same as without "if (unlikely..."

Thanks

>
> Bart.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
  2018-01-11 19:00               ` Leon Romanovsky
  (?)
@ 2018-01-11 19:07               ` Bart Van Assche
  2018-01-12  6:23                 ` Leon Romanovsky
  -1 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2018-01-11 19:07 UTC (permalink / raw)
  To: leon; +Cc: jgg, monis, linux-rdma, stable, dledford

On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > There is a chance to have change in qp state immediately after your "if ..." check.
> > 
> > Hello Leon,
> > 
> > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > run of rxe_completer().
> 
> Bart,
> 
> What am I missing?
> 
> CPU1							CPU2
> 							if (unlikely....
> 						<---
> /* move the qp to the error state */
> void rxe_qp_error(struct rxe_qp *qp)
> {
>         qp->req.state = QP_STATE_ERROR;
>         qp->resp.state = QP_STATE_ERROR;
>         qp->attr.qp_state = IB_QPS_ERR;
> 						--->
> 							rxe_run_task(&qp->req.task, must_sched);
> 
> 
> 
> It is more or less the same as without "if (unlikely..."

Hello Leon,

In the above the part of rxe_qp_error() that I was referring to in my e-mail
is missing:

	if (qp_type(qp) == IB_QPT_RC)
		rxe_run_task(&qp->comp.task, 1);

Bart.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
  2018-01-11 19:07               ` Bart Van Assche
@ 2018-01-12  6:23                 ` Leon Romanovsky
       [not found]                   ` <20180112062359.GG15760-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Leon Romanovsky @ 2018-01-12  6:23 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: jgg, monis, linux-rdma, stable, dledford

[-- Attachment #1: Type: text/plain, Size: 1447 bytes --]

On Thu, Jan 11, 2018 at 07:07:06PM +0000, Bart Van Assche wrote:
> On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> > On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > > There is a chance to have change in qp state immediately after your "if ..." check.
> > >
> > > Hello Leon,
> > >
> > > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > > run of rxe_completer().
> >
> > Bart,
> >
> > What am I missing?
> >
> > CPU1							CPU2
> > 							if (unlikely....
> > 						<---
> > /* move the qp to the error state */
> > void rxe_qp_error(struct rxe_qp *qp)
> > {
> >         qp->req.state = QP_STATE_ERROR;
> >         qp->resp.state = QP_STATE_ERROR;
> >         qp->attr.qp_state = IB_QPS_ERR;
> > 						--->
> > 							rxe_run_task(&qp->req.task, must_sched);
> >
> >
> >
> > It is more or less the same as without "if (unlikely..."
>
> Hello Leon,
>
> In the above the part of rxe_qp_error() that I was referring to in my e-mail
> is missing:
>
> 	if (qp_type(qp) == IB_QPT_RC)
> 		rxe_run_task(&qp->comp.task, 1);


But it is exactly where race exists, as long QP isn't protected, it can
switch CPUs and create race.

Thanks

>
> Bart.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
  2018-01-12  6:23                 ` Leon Romanovsky
@ 2018-01-12  6:32                       ` Bart Van Assche
  0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2018-01-12  6:32 UTC (permalink / raw)
  To: leon-DgEjT+Ai2ygdnm+yROfE0A
  Cc: jgg-VPRAkNaXOzVWk0Htik3J/w, monis-VPRAkNaXOzVWk0Htik3J/w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, stable-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2137 bytes --]

On Fri, 2018-01-12 at 08:23 +0200, Leon Romanovsky wrote:
> On Thu, Jan 11, 2018 at 07:07:06PM +0000, Bart Van Assche wrote:
> > On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> > > On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > > > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > > > There is a chance to have change in qp state immediately after your "if ..." check.
> > > > 
> > > > Hello Leon,
> > > > 
> > > > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > > > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > > > run of rxe_completer().
> > > 
> > > Bart,
> > > 
> > > What am I missing?
> > > 
> > > CPU1							CPU2
> > > 							if (unlikely....
> > > 						<---
> > > /* move the qp to the error state */
> > > void rxe_qp_error(struct rxe_qp *qp)
> > > {
> > >         qp->req.state = QP_STATE_ERROR;
> > >         qp->resp.state = QP_STATE_ERROR;
> > >         qp->attr.qp_state = IB_QPS_ERR;
> > > 						--->
> > > 							rxe_run_task(&qp->req.task, must_sched);
> > > 
> > > 
> > > 
> > > It is more or less the same as without "if (unlikely..."
> > 
> > Hello Leon,
> > 
> > In the above the part of rxe_qp_error() that I was referring to in my e-mail
> > is missing:
> > 
> > 	if (qp_type(qp) == IB_QPT_RC)
> > 		rxe_run_task(&qp->comp.task, 1);
> 
> 
> But it is exactly where race exists, as long QP isn't protected, it can
> switch CPUs and create race.

Hello Leon,

Can you clarify which race you are referring to? rxe_run_task() uses the
tasklet mechanism and tasklets are guaranteed to run on at most one CPU at a
time. See also the "Top and Bottom Halves" chapter in Linux Device Drivers,
3rd edition. See also the tasklet_schedule() implementation in
<linux/interrupt.h> and in kernel/softirq.c.

Thanks,

Bart.N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±­ÙšŠ{ayº\x1dʇڙë,j\a­¢f£¢·hš‹»öì\x17/oSc¾™Ú³9˜uÀ¦æå‰È&jw¨®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿïêäz¹Þ–Šàþf£¢·hšˆ§~ˆmš

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
@ 2018-01-12  6:32                       ` Bart Van Assche
  0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2018-01-12  6:32 UTC (permalink / raw)
  To: leon; +Cc: jgg, monis, linux-rdma, stable, dledford

On Fri, 2018-01-12 at 08:23 +0200, Leon Romanovsky wrote:
> On Thu, Jan 11, 2018 at 07:07:06PM +0000, Bart Van Assche wrote:
> > On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> > > On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > > > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > > > There is a chance to have change in qp state immediately after your "if ..." check.
> > > > 
> > > > Hello Leon,
> > > > 
> > > > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > > > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > > > run of rxe_completer().
> > > 
> > > Bart,
> > > 
> > > What am I missing?
> > > 
> > > CPU1							CPU2
> > > 							if (unlikely....
> > > 						<---
> > > /* move the qp to the error state */
> > > void rxe_qp_error(struct rxe_qp *qp)
> > > {
> > >         qp->req.state = QP_STATE_ERROR;
> > >         qp->resp.state = QP_STATE_ERROR;
> > >         qp->attr.qp_state = IB_QPS_ERR;
> > > 						--->
> > > 							rxe_run_task(&qp->req.task, must_sched);
> > > 
> > > 
> > > 
> > > It is more or less the same as without "if (unlikely..."
> > 
> > Hello Leon,
> > 
> > In the above the part of rxe_qp_error() that I was referring to in my e-mail
> > is missing:
> > 
> > 	if (qp_type(qp) == IB_QPT_RC)
> > 		rxe_run_task(&qp->comp.task, 1);
> 
> 
> But it is exactly where race exists, as long QP isn't protected, it can
> switch CPUs and create race.

Hello Leon,

Can you clarify which race you are referring to? rxe_run_task() uses the
tasklet mechanism and tasklets are guaranteed to run on at most one CPU at a
time. See also the "Top and Bottom Halves" chapter in Linux Device Drivers,
3rd edition. See also the tasklet_schedule() implementation in
<linux/interrupt.h> and in kernel/softirq.c.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
  2018-01-12  6:32                       ` Bart Van Assche
  (?)
@ 2018-01-12  6:38                       ` Leon Romanovsky
  -1 siblings, 0 replies; 15+ messages in thread
From: Leon Romanovsky @ 2018-01-12  6:38 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: jgg, monis, linux-rdma, stable, dledford

[-- Attachment #1: Type: text/plain, Size: 2235 bytes --]

On Fri, Jan 12, 2018 at 06:32:47AM +0000, Bart Van Assche wrote:
> On Fri, 2018-01-12 at 08:23 +0200, Leon Romanovsky wrote:
> > On Thu, Jan 11, 2018 at 07:07:06PM +0000, Bart Van Assche wrote:
> > > On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> > > > On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > > > > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > > > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > > > > There is a chance to have change in qp state immediately after your "if ..." check.
> > > > >
> > > > > Hello Leon,
> > > > >
> > > > > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > > > > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > > > > run of rxe_completer().
> > > >
> > > > Bart,
> > > >
> > > > What am I missing?
> > > >
> > > > CPU1							CPU2
> > > > 							if (unlikely....
> > > > 						<---
> > > > /* move the qp to the error state */
> > > > void rxe_qp_error(struct rxe_qp *qp)
> > > > {
> > > >         qp->req.state = QP_STATE_ERROR;
> > > >         qp->resp.state = QP_STATE_ERROR;
> > > >         qp->attr.qp_state = IB_QPS_ERR;
> > > > 						--->
> > > > 							rxe_run_task(&qp->req.task, must_sched);
> > > >
> > > >
> > > >
> > > > It is more or less the same as without "if (unlikely..."
> > >
> > > Hello Leon,
> > >
> > > In the above the part of rxe_qp_error() that I was referring to in my e-mail
> > > is missing:
> > >
> > > 	if (qp_type(qp) == IB_QPT_RC)
> > > 		rxe_run_task(&qp->comp.task, 1);
> >
> >
> > But it is exactly where race exists, as long QP isn't protected, it can
> > switch CPUs and create race.
>
> Hello Leon,
>
> Can you clarify which race you are referring to? rxe_run_task() uses the
> tasklet mechanism and tasklets are guaranteed to run on at most one CPU at a
> time. See also the "Top and Bottom Halves" chapter in Linux Device Drivers,
> 3rd edition. See also the tasklet_schedule() implementation in
> <linux/interrupt.h> and in kernel/softirq.c.

Ahh, Bart, Sorry.

I found the cause of my misunderstanding, it is "comp" in the rxe_run_task
call and not "req".

Thanks

>
> Thanks,
>
> Bart.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-01-12  6:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-09 19:23 [PATCH] RDMA/rxe: Fix a race condition related to the QP error state Bart Van Assche
2018-01-10 21:40 ` Doug Ledford
2018-01-10 22:01   ` Doug Ledford
2018-01-11  6:22     ` Leon Romanovsky
     [not found]       ` <20180111062252.GP7368-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2018-01-11 16:02         ` Bart Van Assche
2018-01-11 16:02           ` Bart Van Assche
     [not found]           ` <1515686552.2752.2.camel-Sjgp3cTcYWE@public.gmane.org>
2018-01-11 19:00             ` Leon Romanovsky
2018-01-11 19:00               ` Leon Romanovsky
2018-01-11 19:07               ` Bart Van Assche
2018-01-12  6:23                 ` Leon Romanovsky
     [not found]                   ` <20180112062359.GG15760-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2018-01-12  6:32                     ` Bart Van Assche
2018-01-12  6:32                       ` Bart Van Assche
2018-01-12  6:38                       ` Leon Romanovsky
2018-01-11 11:27 ` Moni Shoua
2018-01-11 16:16   ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.