* [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
@ 2018-01-09 19:23 Bart Van Assche
2018-01-10 21:40 ` Doug Ledford
2018-01-11 11:27 ` Moni Shoua
0 siblings, 2 replies; 15+ messages in thread
From: Bart Van Assche @ 2018-01-09 19:23 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Doug Ledford, linux-rdma, Bart Van Assche, Moni Shoua, stable
The following sequence:
* Change queue pair state into IB_QPS_ERR.
* Post a work request on the queue pair.
Triggers the following race condition in the rdma_rxe driver:
* rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
that examines the QP send queue.
* rxe_post_send() posts a work request on the QP send queue.
Avoid that this race causes a work request to be ignored by scheduling
an rxe_completer() call from rxe_post_send() for queues that are in the
error state.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: <stable@vger.kernel.org> # v4.8
---
drivers/infiniband/sw/rxe/rxe_verbs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index a6fbed48db8a..8f631d64c192 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -814,6 +814,8 @@ static int rxe_post_send_kernel(struct rxe_qp *qp, struct ib_send_wr *wr,
(queue_count(qp->sq.queue) > 1);
rxe_run_task(&qp->req.task, must_sched);
+ if (unlikely(qp->req.state == QP_STATE_ERROR))
+ rxe_run_task(&qp->comp.task, 1);
return err;
}
--
2.15.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
2018-01-09 19:23 [PATCH] RDMA/rxe: Fix a race condition related to the QP error state Bart Van Assche
@ 2018-01-10 21:40 ` Doug Ledford
2018-01-10 22:01 ` Doug Ledford
2018-01-11 11:27 ` Moni Shoua
1 sibling, 1 reply; 15+ messages in thread
From: Doug Ledford @ 2018-01-10 21:40 UTC (permalink / raw)
To: Bart Van Assche, Jason Gunthorpe; +Cc: linux-rdma, Moni Shoua, stable
[-- Attachment #1: Type: text/plain, Size: 901 bytes --]
On Tue, 2018-01-09 at 11:23 -0800, Bart Van Assche wrote:
> The following sequence:
> * Change queue pair state into IB_QPS_ERR.
> * Post a work request on the queue pair.
> Triggers the following race condition in the rdma_rxe driver:
> * rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
> that examines the QP send queue.
> * rxe_post_send() posts a work request on the QP send queue.
If rxe_completer() runs before rxe_post_send(), the send queue is
believed to be empty while a stale work request stays on the send queue
indefinitely. To avoid this race, schedule rxe_completer() after a work
request is queued on a qp in the error state by rxe_post_send().
I think that improves the log message, yes?
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
2018-01-10 21:40 ` Doug Ledford
@ 2018-01-10 22:01 ` Doug Ledford
2018-01-11 6:22 ` Leon Romanovsky
0 siblings, 1 reply; 15+ messages in thread
From: Doug Ledford @ 2018-01-10 22:01 UTC (permalink / raw)
To: Bart Van Assche, Jason Gunthorpe; +Cc: linux-rdma, Moni Shoua, stable
[-- Attachment #1: Type: text/plain, Size: 1055 bytes --]
On Wed, 2018-01-10 at 16:40 -0500, Doug Ledford wrote:
> On Tue, 2018-01-09 at 11:23 -0800, Bart Van Assche wrote:
> > The following sequence:
> > * Change queue pair state into IB_QPS_ERR.
> > * Post a work request on the queue pair.
> > Triggers the following race condition in the rdma_rxe driver:
> > * rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
> > that examines the QP send queue.
> > * rxe_post_send() posts a work request on the QP send queue.
>
> If rxe_completer() runs before rxe_post_send(), the send queue is
> believed to be empty while a stale work request stays on the send queue
> indefinitely. To avoid this race, schedule rxe_completer() after a work
> request is queued on a qp in the error state by rxe_post_send().
>
> I think that improves the log message, yes?
>
I did some further edits. But, patch applied to for-next.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
2018-01-10 22:01 ` Doug Ledford
@ 2018-01-11 6:22 ` Leon Romanovsky
[not found] ` <20180111062252.GP7368-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Leon Romanovsky @ 2018-01-11 6:22 UTC (permalink / raw)
To: Doug Ledford
Cc: Bart Van Assche, Jason Gunthorpe, linux-rdma, Moni Shoua, stable
[-- Attachment #1: Type: text/plain, Size: 1321 bytes --]
On Wed, Jan 10, 2018 at 05:01:40PM -0500, Doug Ledford wrote:
> On Wed, 2018-01-10 at 16:40 -0500, Doug Ledford wrote:
> > On Tue, 2018-01-09 at 11:23 -0800, Bart Van Assche wrote:
> > > The following sequence:
> > > * Change queue pair state into IB_QPS_ERR.
> > > * Post a work request on the queue pair.
> > > Triggers the following race condition in the rdma_rxe driver:
> > > * rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
> > > that examines the QP send queue.
> > > * rxe_post_send() posts a work request on the QP send queue.
> >
> > If rxe_completer() runs before rxe_post_send(), the send queue is
> > believed to be empty while a stale work request stays on the send queue
> > indefinitely. To avoid this race, schedule rxe_completer() after a work
> > request is queued on a qp in the error state by rxe_post_send().
> >
> > I think that improves the log message, yes?
> >
>
> I did some further edits. But, patch applied to for-next.
The proposed patch definitely decreases the chance of races, but it is not fixing them.
There is a chance to have change in qp state immediately after your "if ..." check.
Thanks
>
> --
> Doug Ledford <dledford@redhat.com>
> GPG KeyID: B826A3330E572FDD
> Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
2018-01-09 19:23 [PATCH] RDMA/rxe: Fix a race condition related to the QP error state Bart Van Assche
2018-01-10 21:40 ` Doug Ledford
@ 2018-01-11 11:27 ` Moni Shoua
2018-01-11 16:16 ` Bart Van Assche
1 sibling, 1 reply; 15+ messages in thread
From: Moni Shoua @ 2018-01-11 11:27 UTC (permalink / raw)
To: Bart Van Assche; +Cc: Jason Gunthorpe, Doug Ledford, linux-rdma, stable
On Tue, Jan 9, 2018 at 9:23 PM, Bart Van Assche <bart.vanassche@wdc.com> wrote:
> The following sequence:
> * Change queue pair state into IB_QPS_ERR.
> * Post a work request on the queue pair.
> Triggers the following race condition in the rdma_rxe driver:
> * rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
> that examines the QP send queue.
> * rxe_post_send() posts a work request on the QP send queue.
> Avoid that this race causes a work request to be ignored by scheduling
> an rxe_completer() call from rxe_post_send() for queues that are in the
> error state.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Moni Shoua <monis@mellanox.com>
> Cc: <stable@vger.kernel.org> # v4.8
> ---
> drivers/infiniband/sw/rxe/rxe_verbs.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index a6fbed48db8a..8f631d64c192 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -814,6 +814,8 @@ static int rxe_post_send_kernel(struct rxe_qp *qp, struct ib_send_wr *wr,
> (queue_count(qp->sq.queue) > 1);
>
> rxe_run_task(&qp->req.task, must_sched);
> + if (unlikely(qp->req.state == QP_STATE_ERROR))
> + rxe_run_task(&qp->comp.task, 1);
>
> return err;
> }
> --
> 2.15.1
>
Maybe I am missing something but I think that the race is when qp is
in ERROR state and the following functions run in parallel
* rxe_drain_req_pkts (called from rxe_requester after post_send)
* rxe_drain_resp_pkts (called from rxe_completer after modify to ERROR)
Am I right?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
2018-01-11 6:22 ` Leon Romanovsky
@ 2018-01-11 16:02 ` Bart Van Assche
0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2018-01-11 16:02 UTC (permalink / raw)
To: leon-DgEjT+Ai2ygdnm+yROfE0A, dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: jgg-VPRAkNaXOzVWk0Htik3J/w, monis-VPRAkNaXOzVWk0Htik3J/w,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, stable-u79uwXL29TY76Z2rM5mHXA
On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> The proposed patch definitely decreases the chance of races, but it is not fixing them.
> There is a chance to have change in qp state immediately after your "if ..." check.
Hello Leon,
Please have a look at rxe_qp_error() and you will see that the patch I posted
is a proper fix. In the scenario you described rxe_qp_error() will trigger a
run of rxe_completer().
Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
@ 2018-01-11 16:02 ` Bart Van Assche
0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2018-01-11 16:02 UTC (permalink / raw)
To: leon, dledford; +Cc: jgg, monis, linux-rdma, stable
On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> The proposed patch definitely decreases the chance of races, but it is not fixing them.
> There is a chance to have change in qp state immediately after your "if ..." check.
Hello Leon,
Please have a look at rxe_qp_error() and you will see that the patch I posted
is a proper fix. In the scenario you described rxe_qp_error() will trigger a
run of rxe_completer().
Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
2018-01-11 11:27 ` Moni Shoua
@ 2018-01-11 16:16 ` Bart Van Assche
0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2018-01-11 16:16 UTC (permalink / raw)
To: monis; +Cc: jgg, linux-rdma, stable, dledford
On Thu, 2018-01-11 at 13:27 +0200, Moni Shoua wrote:
> Maybe I am missing something but I think that the race is when qp is
> in ERROR state and the following functions run in parallel
> * rxe_drain_req_pkts (called from rxe_requester after post_send)
> * rxe_drain_resp_pkts (called from rxe_completer after modify to ERROR)
>
> Am I right?
Hello Moni,
I think that's a real race and a race that has to be fixed but not the race
that caused the missing completions in the tests I ran myself.
Best regards,
Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
2018-01-11 16:02 ` Bart Van Assche
@ 2018-01-11 19:00 ` Leon Romanovsky
-1 siblings, 0 replies; 15+ messages in thread
From: Leon Romanovsky @ 2018-01-11 19:00 UTC (permalink / raw)
To: Bart Van Assche
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA, jgg-VPRAkNaXOzVWk0Htik3J/w,
monis-VPRAkNaXOzVWk0Htik3J/w, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
stable-u79uwXL29TY76Z2rM5mHXA
[-- Attachment #1: Type: text/plain, Size: 922 bytes --]
On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > There is a chance to have change in qp state immediately after your "if ..." check.
>
> Hello Leon,
>
> Please have a look at rxe_qp_error() and you will see that the patch I posted
> is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> run of rxe_completer().
Bart,
What am I missing?
CPU1 CPU2
if (unlikely....
<---
/* move the qp to the error state */
void rxe_qp_error(struct rxe_qp *qp)
{
qp->req.state = QP_STATE_ERROR;
qp->resp.state = QP_STATE_ERROR;
qp->attr.qp_state = IB_QPS_ERR;
--->
rxe_run_task(&qp->req.task, must_sched);
It is more or less the same as without "if (unlikely..."
Thanks
>
> Bart.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
@ 2018-01-11 19:00 ` Leon Romanovsky
0 siblings, 0 replies; 15+ messages in thread
From: Leon Romanovsky @ 2018-01-11 19:00 UTC (permalink / raw)
To: Bart Van Assche; +Cc: dledford, jgg, monis, linux-rdma, stable
[-- Attachment #1: Type: text/plain, Size: 922 bytes --]
On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > There is a chance to have change in qp state immediately after your "if ..." check.
>
> Hello Leon,
>
> Please have a look at rxe_qp_error() and you will see that the patch I posted
> is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> run of rxe_completer().
Bart,
What am I missing?
CPU1 CPU2
if (unlikely....
<---
/* move the qp to the error state */
void rxe_qp_error(struct rxe_qp *qp)
{
qp->req.state = QP_STATE_ERROR;
qp->resp.state = QP_STATE_ERROR;
qp->attr.qp_state = IB_QPS_ERR;
--->
rxe_run_task(&qp->req.task, must_sched);
It is more or less the same as without "if (unlikely..."
Thanks
>
> Bart.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
2018-01-11 19:00 ` Leon Romanovsky
(?)
@ 2018-01-11 19:07 ` Bart Van Assche
2018-01-12 6:23 ` Leon Romanovsky
-1 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2018-01-11 19:07 UTC (permalink / raw)
To: leon; +Cc: jgg, monis, linux-rdma, stable, dledford
On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > There is a chance to have change in qp state immediately after your "if ..." check.
> >
> > Hello Leon,
> >
> > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > run of rxe_completer().
>
> Bart,
>
> What am I missing?
>
> CPU1 CPU2
> if (unlikely....
> <---
> /* move the qp to the error state */
> void rxe_qp_error(struct rxe_qp *qp)
> {
> qp->req.state = QP_STATE_ERROR;
> qp->resp.state = QP_STATE_ERROR;
> qp->attr.qp_state = IB_QPS_ERR;
> --->
> rxe_run_task(&qp->req.task, must_sched);
>
>
>
> It is more or less the same as without "if (unlikely..."
Hello Leon,
In the above the part of rxe_qp_error() that I was referring to in my e-mail
is missing:
if (qp_type(qp) == IB_QPT_RC)
rxe_run_task(&qp->comp.task, 1);
Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
2018-01-11 19:07 ` Bart Van Assche
@ 2018-01-12 6:23 ` Leon Romanovsky
[not found] ` <20180112062359.GG15760-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Leon Romanovsky @ 2018-01-12 6:23 UTC (permalink / raw)
To: Bart Van Assche; +Cc: jgg, monis, linux-rdma, stable, dledford
[-- Attachment #1: Type: text/plain, Size: 1447 bytes --]
On Thu, Jan 11, 2018 at 07:07:06PM +0000, Bart Van Assche wrote:
> On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> > On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > > There is a chance to have change in qp state immediately after your "if ..." check.
> > >
> > > Hello Leon,
> > >
> > > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > > run of rxe_completer().
> >
> > Bart,
> >
> > What am I missing?
> >
> > CPU1 CPU2
> > if (unlikely....
> > <---
> > /* move the qp to the error state */
> > void rxe_qp_error(struct rxe_qp *qp)
> > {
> > qp->req.state = QP_STATE_ERROR;
> > qp->resp.state = QP_STATE_ERROR;
> > qp->attr.qp_state = IB_QPS_ERR;
> > --->
> > rxe_run_task(&qp->req.task, must_sched);
> >
> >
> >
> > It is more or less the same as without "if (unlikely..."
>
> Hello Leon,
>
> In the above the part of rxe_qp_error() that I was referring to in my e-mail
> is missing:
>
> if (qp_type(qp) == IB_QPT_RC)
> rxe_run_task(&qp->comp.task, 1);
But it is exactly where race exists, as long QP isn't protected, it can
switch CPUs and create race.
Thanks
>
> Bart.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
2018-01-12 6:23 ` Leon Romanovsky
@ 2018-01-12 6:32 ` Bart Van Assche
0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2018-01-12 6:32 UTC (permalink / raw)
To: leon-DgEjT+Ai2ygdnm+yROfE0A
Cc: jgg-VPRAkNaXOzVWk0Htik3J/w, monis-VPRAkNaXOzVWk0Htik3J/w,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, stable-u79uwXL29TY76Z2rM5mHXA,
dledford-H+wXaHxf7aLQT0dZR+AlfA
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2137 bytes --]
On Fri, 2018-01-12 at 08:23 +0200, Leon Romanovsky wrote:
> On Thu, Jan 11, 2018 at 07:07:06PM +0000, Bart Van Assche wrote:
> > On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> > > On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > > > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > > > There is a chance to have change in qp state immediately after your "if ..." check.
> > > >
> > > > Hello Leon,
> > > >
> > > > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > > > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > > > run of rxe_completer().
> > >
> > > Bart,
> > >
> > > What am I missing?
> > >
> > > CPU1 CPU2
> > > if (unlikely....
> > > <---
> > > /* move the qp to the error state */
> > > void rxe_qp_error(struct rxe_qp *qp)
> > > {
> > > qp->req.state = QP_STATE_ERROR;
> > > qp->resp.state = QP_STATE_ERROR;
> > > qp->attr.qp_state = IB_QPS_ERR;
> > > --->
> > > rxe_run_task(&qp->req.task, must_sched);
> > >
> > >
> > >
> > > It is more or less the same as without "if (unlikely..."
> >
> > Hello Leon,
> >
> > In the above the part of rxe_qp_error() that I was referring to in my e-mail
> > is missing:
> >
> > if (qp_type(qp) == IB_QPT_RC)
> > rxe_run_task(&qp->comp.task, 1);
>
>
> But it is exactly where race exists, as long QP isn't protected, it can
> switch CPUs and create race.
Hello Leon,
Can you clarify which race you are referring to? rxe_run_task() uses the
tasklet mechanism and tasklets are guaranteed to run on at most one CPU at a
time. See also the "Top and Bottom Halves" chapter in Linux Device Drivers,
3rd edition. See also the tasklet_schedule() implementation in
<linux/interrupt.h> and in kernel/softirq.c.
Thanks,
Bart.N§²æìr¸yúèØb²X¬¶Ç§vØ^)Þº{.nÇ+·¥{±Ù{ayº\x1dÊÚë,j\a¢f£¢·h»öì\x17/oSc¾Ú³9uÀ¦æåÈ&jw¨®\x03(éÝ¢j"ú\x1a¶^[m§ÿïêäz¹Þàþf£¢·h§~m
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
@ 2018-01-12 6:32 ` Bart Van Assche
0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2018-01-12 6:32 UTC (permalink / raw)
To: leon; +Cc: jgg, monis, linux-rdma, stable, dledford
On Fri, 2018-01-12 at 08:23 +0200, Leon Romanovsky wrote:
> On Thu, Jan 11, 2018 at 07:07:06PM +0000, Bart Van Assche wrote:
> > On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> > > On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > > > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > > > There is a chance to have change in qp state immediately after your "if ..." check.
> > > >
> > > > Hello Leon,
> > > >
> > > > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > > > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > > > run of rxe_completer().
> > >
> > > Bart,
> > >
> > > What am I missing?
> > >
> > > CPU1 CPU2
> > > if (unlikely....
> > > <---
> > > /* move the qp to the error state */
> > > void rxe_qp_error(struct rxe_qp *qp)
> > > {
> > > qp->req.state = QP_STATE_ERROR;
> > > qp->resp.state = QP_STATE_ERROR;
> > > qp->attr.qp_state = IB_QPS_ERR;
> > > --->
> > > rxe_run_task(&qp->req.task, must_sched);
> > >
> > >
> > >
> > > It is more or less the same as without "if (unlikely..."
> >
> > Hello Leon,
> >
> > In the above the part of rxe_qp_error() that I was referring to in my e-mail
> > is missing:
> >
> > if (qp_type(qp) == IB_QPT_RC)
> > rxe_run_task(&qp->comp.task, 1);
>
>
> But it is exactly where race exists, as long QP isn't protected, it can
> switch CPUs and create race.
Hello Leon,
Can you clarify which race you are referring to? rxe_run_task() uses the
tasklet mechanism and tasklets are guaranteed to run on at most one CPU at a
time. See also the "Top and Bottom Halves" chapter in Linux Device Drivers,
3rd edition. See also the tasklet_schedule() implementation in
<linux/interrupt.h> and in kernel/softirq.c.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] RDMA/rxe: Fix a race condition related to the QP error state
2018-01-12 6:32 ` Bart Van Assche
(?)
@ 2018-01-12 6:38 ` Leon Romanovsky
-1 siblings, 0 replies; 15+ messages in thread
From: Leon Romanovsky @ 2018-01-12 6:38 UTC (permalink / raw)
To: Bart Van Assche; +Cc: jgg, monis, linux-rdma, stable, dledford
[-- Attachment #1: Type: text/plain, Size: 2235 bytes --]
On Fri, Jan 12, 2018 at 06:32:47AM +0000, Bart Van Assche wrote:
> On Fri, 2018-01-12 at 08:23 +0200, Leon Romanovsky wrote:
> > On Thu, Jan 11, 2018 at 07:07:06PM +0000, Bart Van Assche wrote:
> > > On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
> > > > On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
> > > > > On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
> > > > > > The proposed patch definitely decreases the chance of races, but it is not fixing them.
> > > > > > There is a chance to have change in qp state immediately after your "if ..." check.
> > > > >
> > > > > Hello Leon,
> > > > >
> > > > > Please have a look at rxe_qp_error() and you will see that the patch I posted
> > > > > is a proper fix. In the scenario you described rxe_qp_error() will trigger a
> > > > > run of rxe_completer().
> > > >
> > > > Bart,
> > > >
> > > > What am I missing?
> > > >
> > > > CPU1 CPU2
> > > > if (unlikely....
> > > > <---
> > > > /* move the qp to the error state */
> > > > void rxe_qp_error(struct rxe_qp *qp)
> > > > {
> > > > qp->req.state = QP_STATE_ERROR;
> > > > qp->resp.state = QP_STATE_ERROR;
> > > > qp->attr.qp_state = IB_QPS_ERR;
> > > > --->
> > > > rxe_run_task(&qp->req.task, must_sched);
> > > >
> > > >
> > > >
> > > > It is more or less the same as without "if (unlikely..."
> > >
> > > Hello Leon,
> > >
> > > In the above the part of rxe_qp_error() that I was referring to in my e-mail
> > > is missing:
> > >
> > > if (qp_type(qp) == IB_QPT_RC)
> > > rxe_run_task(&qp->comp.task, 1);
> >
> >
> > But it is exactly where race exists, as long QP isn't protected, it can
> > switch CPUs and create race.
>
> Hello Leon,
>
> Can you clarify which race you are referring to? rxe_run_task() uses the
> tasklet mechanism and tasklets are guaranteed to run on at most one CPU at a
> time. See also the "Top and Bottom Halves" chapter in Linux Device Drivers,
> 3rd edition. See also the tasklet_schedule() implementation in
> <linux/interrupt.h> and in kernel/softirq.c.
Ahh, Bart, Sorry.
I found the cause of my misunderstanding, it is "comp" in the rxe_run_task
call and not "req".
Thanks
>
> Thanks,
>
> Bart.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2018-01-12 6:38 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-09 19:23 [PATCH] RDMA/rxe: Fix a race condition related to the QP error state Bart Van Assche
2018-01-10 21:40 ` Doug Ledford
2018-01-10 22:01 ` Doug Ledford
2018-01-11 6:22 ` Leon Romanovsky
[not found] ` <20180111062252.GP7368-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2018-01-11 16:02 ` Bart Van Assche
2018-01-11 16:02 ` Bart Van Assche
[not found] ` <1515686552.2752.2.camel-Sjgp3cTcYWE@public.gmane.org>
2018-01-11 19:00 ` Leon Romanovsky
2018-01-11 19:00 ` Leon Romanovsky
2018-01-11 19:07 ` Bart Van Assche
2018-01-12 6:23 ` Leon Romanovsky
[not found] ` <20180112062359.GG15760-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2018-01-12 6:32 ` Bart Van Assche
2018-01-12 6:32 ` Bart Van Assche
2018-01-12 6:38 ` Leon Romanovsky
2018-01-11 11:27 ` Moni Shoua
2018-01-11 16:16 ` Bart Van Assche
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.