* [PATCH -v2] blk-mq: Start to fix memory ordering...
@ 2017-09-06  8:00 Peter Zijlstra
  2017-09-06 14:02 ` Alan Stern
  2017-10-04 17:18 ` Jens Axboe
  0 siblings, 2 replies; 4+ messages in thread
From: Peter Zijlstra @ 2017-09-06  8:00 UTC
  To: Jens Axboe
  Cc: parri.andrea, linux-kernel, tom.leiming, hch, paulmck,
	will.deacon, boqun.feng, stern


Attempt to untangle the ordering in blk-mq. The patch introducing the
single smp_mb__before_atomic() is obviously broken in that it doesn't
clearly specify a pairing barrier and an obtained guarantee.

The comment is further misleading in that it hints that the
deadline store and the COMPLETE store also need to be ordered, but
AFAICT there is no such dependency. However, what does appear to be
important is the clear of COMPLETE happening _after_ the store to
STARTED, and that worked by pure accident.

This clarifies blk_mq_start_request() -- we should not get there with
STARTED set -- which simplifies the code and makes the barrier usage
sane (the old code could be read to allow not having _any_ atomic after
the barrier, in which case the barrier has nothing to order). We
then also introduce the missing pairing barrier for it.

Also downgrade the barrier to smp_wmb(); this is cheaper on
PowerPC/ARM and doesn't cost anything extra on x86.

And it documents the STARTED vs COMPLETE ordering, although I've not
been entirely successful in reverse engineering the blk-mq state
machine, so there might still be more funnies around timeout vs
requeue.

If I got anything wrong, feel free to educate me by adding comments to
clarify things ;-)

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ming Lei <tom.leiming@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@fb.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Fixes: 538b75341835 ("blk-mq: request deadline must be visible before marking rq as started")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 - spelling; Andrea and Bart
 - compiles (urgh!)
 - smp_wmb(); Andrea


 block/blk-mq.c      | 52 ++++++++++++++++++++++++++++++++++++++++------------
 block/blk-timeout.c |  2 +-
 2 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4603b115e234..506a0f355117 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -558,22 +558,32 @@ void blk_mq_start_request(struct request *rq)
 
 	blk_add_timer(rq);
 
-	/*
-	 * Ensure that ->deadline is visible before set the started
-	 * flag and clear the completed flag.
-	 */
-	smp_mb__before_atomic();
+	WARN_ON_ONCE(test_bit(REQ_ATOM_STARTED, &rq->atomic_flags));
 
 	/*
 	 * Mark us as started and clear complete. Complete might have been
 	 * set if requeue raced with timeout, which then marked it as
 	 * complete. So be sure to clear complete again when we start
 	 * the request, otherwise we'll ignore the completion event.
+	 *
+	 * Ensure that ->deadline is visible before we set STARTED, such that
+	 * blk_mq_check_expired() is guaranteed to observe our ->deadline when
+	 * it observes STARTED.
 	 */
-	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
-		set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
-	if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags))
+	smp_wmb();
+	set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
+	if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) {
+		/*
+		 * Coherence order guarantees these consecutive stores to a
+		 * single variable propagate in the specified order. Thus the
+		 * clear_bit() is ordered _after_ the set bit. See
+		 * blk_mq_check_expired().
+		 *
+		 * (the bits must be part of the same byte for this to be
+		 * true).
+		 */
 		clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
+	}
 
 	if (q->dma_drain_size && blk_rq_bytes(rq)) {
 		/*
@@ -744,11 +754,20 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, void *priv, bool reserved)
 {
 	struct blk_mq_timeout_data *data = priv;
+	unsigned long deadline;
 
 	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
 		return;
 
 	/*
+	 * Ensures that if we see STARTED we must also see our
+	 * up-to-date deadline, see blk_mq_start_request().
+	 */
+	smp_rmb();
+
+	deadline = READ_ONCE(rq->deadline);
+
+	/*
 	 * The rq being checked may have been freed and reallocated
 	 * out already here, we avoid this race by checking rq->deadline
 	 * and REQ_ATOM_COMPLETE flag together:
@@ -761,11 +780,20 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 	 *   and clearing the flag in blk_mq_start_request(), so
 	 *   this rq won't be timed out too.
 	 */
-	if (time_after_eq(jiffies, rq->deadline)) {
-		if (!blk_mark_rq_complete(rq))
+	if (time_after_eq(jiffies, deadline)) {
+		if (!blk_mark_rq_complete(rq)) {
+			/*
+			 * Again coherence order ensures that consecutive reads
+			 * from the same variable must be in that order. This
+			 * ensures that if we see COMPLETE clear, we must then
+			 * see STARTED set and we'll ignore this timeout.
+			 *
+			 * (There's also the MB implied by the test_and_clear())
+			 */
 			blk_mq_rq_timed_out(rq, reserved);
-	} else if (!data->next_set || time_after(data->next, rq->deadline)) {
-		data->next = rq->deadline;
+		}
+	} else if (!data->next_set || time_after(data->next, deadline)) {
+		data->next = deadline;
 		data->next_set = 1;
 	}
 }
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 17ec83bb0900..e3e9c9771d36 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -211,7 +211,7 @@ void blk_add_timer(struct request *req)
 	if (!req->timeout)
 		req->timeout = q->rq_timeout;
 
-	req->deadline = jiffies + req->timeout;
+	WRITE_ONCE(req->deadline, jiffies + req->timeout);
 
 	/*
 	 * Only the non-mq case needs to add the request to a protected list.
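
The smp_wmb()/smp_rmb() pairing that the patch introduces can be
illustrated outside the kernel. What follows is only a minimal userspace
sketch in C11, not the kernel code: struct demo_request, the DEMO_*
masks, demo_start() and demo_observe() are hypothetical names invented
for this illustration, and the C11 release/acquire fences are used as
(somewhat stronger) stand-ins for smp_wmb() and smp_rmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request; not the kernel's layout. */
struct demo_request {
	atomic_ulong deadline;	/* payload published by the starter */
	atomic_uint flags;	/* both flag bits live in one variable */
};

#define DEMO_STARTED	(1u << 0)	/* assumed bit positions */
#define DEMO_COMPLETE	(1u << 1)

/* Writer side, loosely modelled on blk_mq_start_request(). */
static inline void demo_start(struct demo_request *rq, unsigned long deadline)
{
	atomic_store_explicit(&rq->deadline, deadline, memory_order_relaxed);

	/* Stand-in for smp_wmb(): deadline store ordered before the flag store. */
	atomic_thread_fence(memory_order_release);

	atomic_fetch_or_explicit(&rq->flags, DEMO_STARTED, memory_order_relaxed);

	/*
	 * Same variable as the store above, so coherence order places this
	 * clear after the set without any extra barrier.
	 */
	atomic_fetch_and_explicit(&rq->flags, ~DEMO_COMPLETE, memory_order_relaxed);
}

/*
 * Reader side, loosely modelled on blk_mq_check_expired(). Returns true
 * and fills *deadline only if STARTED was observed.
 */
static inline bool demo_observe(struct demo_request *rq, unsigned long *deadline)
{
	unsigned int flags = atomic_load_explicit(&rq->flags, memory_order_relaxed);

	if (!(flags & DEMO_STARTED))
		return false;

	/* Stand-in for smp_rmb(): flag load ordered before the deadline load. */
	atomic_thread_fence(memory_order_acquire);

	*deadline = atomic_load_explicit(&rq->deadline, memory_order_relaxed);
	return true;
}
```

In this sketch a reader that sees DEMO_STARTED is guaranteed to read the
deadline written before the fence in demo_start(), which is the guarantee
the new comments in the patch describe for STARTED and ->deadline.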

* Re: [PATCH -v2] blk-mq: Start to fix memory ordering...
  2017-09-06  8:00 [PATCH -v2] blk-mq: Start to fix memory ordering Peter Zijlstra
@ 2017-09-06 14:02 ` Alan Stern
  2017-09-06 14:19   ` Boqun Feng
  2017-10-04 17:18 ` Jens Axboe
  1 sibling, 1 reply; 4+ messages in thread
From: Alan Stern @ 2017-09-06 14:02 UTC
  To: Peter Zijlstra
  Cc: Jens Axboe, parri.andrea, linux-kernel, tom.leiming, hch,
	paulmck, will.deacon, boqun.feng

On Wed, 6 Sep 2017, Peter Zijlstra wrote:

> Attempt to untangle the ordering in blk-mq. The patch introducing the
> single smp_mb__before_atomic() is obviously broken in that it doesn't
> clearly specify a pairing barrier and an obtained guarantee.
> 
> The comment is further misleading in that it hints that the
> deadline store and the COMPLETE store also need to be ordered, but
> AFAICT there is no such dependency. However, what does appear to be
> important is the clear of COMPLETE happening _after_ the store to
> STARTED, and that worked by pure accident.
> 
> This clarifies blk_mq_start_request() -- we should not get there with
> STARTED set -- which simplifies the code and makes the barrier usage
> sane (the old code could be read to allow not having _any_ atomic after
> the barrier, in which case the barrier has nothing to order). We
> then also introduce the missing pairing barrier for it.
> 
> Also downgrade the barrier to smp_wmb(); this is cheaper on
> PowerPC/ARM and doesn't cost anything extra on x86.
> 
> And it documents the STARTED vs COMPLETE ordering, although I've not
> been entirely successful in reverse engineering the blk-mq state
> machine, so there might still be more funnies around timeout vs
> requeue.
> 
> If I got anything wrong, feel free to educate me by adding comments to
> clarify things ;-)
> 
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Ming Lei <tom.leiming@gmail.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Andrea Parri <parri.andrea@gmail.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Fixes: 538b75341835 ("blk-mq: request deadline must be visible before marking rq as started")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  - spelling; Andrea and Bart
>  - compiles (urgh!)
>  - smp_wmb(); Andrea
> 
> 
>  block/blk-mq.c      | 52 ++++++++++++++++++++++++++++++++++++++++------------
>  block/blk-timeout.c |  2 +-
>  2 files changed, 41 insertions(+), 13 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4603b115e234..506a0f355117 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -558,22 +558,32 @@ void blk_mq_start_request(struct request *rq)
>  
>  	blk_add_timer(rq);
>  
> -	/*
> -	 * Ensure that ->deadline is visible before set the started
> -	 * flag and clear the completed flag.
> -	 */
> -	smp_mb__before_atomic();
> +	WARN_ON_ONCE(test_bit(REQ_ATOM_STARTED, &rq->atomic_flags));
>  
>  	/*
>  	 * Mark us as started and clear complete. Complete might have been
>  	 * set if requeue raced with timeout, which then marked it as
>  	 * complete. So be sure to clear complete again when we start
>  	 * the request, otherwise we'll ignore the completion event.
> +	 *
> +	 * Ensure that ->deadline is visible before we set STARTED, such that
> +	 * blk_mq_check_expired() is guaranteed to observe our ->deadline when
> +	 * it observes STARTED.
>  	 */
> -	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
> -		set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
> -	if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags))
> +	smp_wmb();
> +	set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
> +	if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) {
> +		/*
> +		 * Coherence order guarantees these consecutive stores to a
> +		 * single variable propagate in the specified order. Thus the
> +		 * clear_bit() is ordered _after_ the set bit. See
> +		 * blk_mq_check_expired().
> +		 *
> +		 * (the bits must be part of the same byte for this to be
> +		 * true).

Adding this comment is well and good, but for more security you should 
also add a comment (maybe even a compile-time check) to the place where 
REQ_ATOM_STARTED and REQ_ATOM_COMPLETE are defined.  Otherwise they 
might eventually get moved into separate bytes.

Alan Stern

* Re: [PATCH -v2] blk-mq: Start to fix memory ordering...
  2017-09-06 14:02 ` Alan Stern
@ 2017-09-06 14:19   ` Boqun Feng
  0 siblings, 0 replies; 4+ messages in thread
From: Boqun Feng @ 2017-09-06 14:19 UTC
  To: Alan Stern
  Cc: Peter Zijlstra, Jens Axboe, parri.andrea, linux-kernel,
	tom.leiming, hch, paulmck, will.deacon

On Wed, Sep 06, 2017 at 10:02:00AM -0400, Alan Stern wrote:
> On Wed, 6 Sep 2017, Peter Zijlstra wrote:
> 
> > Attempt to untangle the ordering in blk-mq. The patch introducing the
> > single smp_mb__before_atomic() is obviously broken in that it doesn't
> > clearly specify a pairing barrier and an obtained guarantee.
> > 
> > The comment is further misleading in that it hints that the
> > deadline store and the COMPLETE store also need to be ordered, but
> > AFAICT there is no such dependency. However, what does appear to be
> > important is the clear of COMPLETE happening _after_ the store to
> > STARTED, and that worked by pure accident.
> > 
> > This clarifies blk_mq_start_request() -- we should not get there with
> > STARTED set -- which simplifies the code and makes the barrier usage
> > sane (the old code could be read to allow not having _any_ atomic after
> > the barrier, in which case the barrier has nothing to order). We
> > then also introduce the missing pairing barrier for it.
> > 
> > Also downgrade the barrier to smp_wmb(); this is cheaper on
> > PowerPC/ARM and doesn't cost anything extra on x86.
> > 
> > And it documents the STARTED vs COMPLETE ordering, although I've not
> > been entirely successful in reverse engineering the blk-mq state
> > machine, so there might still be more funnies around timeout vs
> > requeue.
> > 
> > If I got anything wrong, feel free to educate me by adding comments to
> > clarify things ;-)
> > 
> > Cc: Alan Stern <stern@rowland.harvard.edu>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Ming Lei <tom.leiming@gmail.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Jens Axboe <axboe@fb.com>
> > Cc: Andrea Parri <parri.andrea@gmail.com>
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> > Cc: Bart Van Assche <bart.vanassche@wdc.com>
> > Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > Fixes: 538b75341835 ("blk-mq: request deadline must be visible before marking rq as started")
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  - spelling; Andrea and Bart
> >  - compiles (urgh!)
> >  - smp_wmb(); Andrea
> > 
> > 
> >  block/blk-mq.c      | 52 ++++++++++++++++++++++++++++++++++++++++------------
> >  block/blk-timeout.c |  2 +-
> >  2 files changed, 41 insertions(+), 13 deletions(-)
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 4603b115e234..506a0f355117 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -558,22 +558,32 @@ void blk_mq_start_request(struct request *rq)
> >  
> >  	blk_add_timer(rq);
> >  
> > -	/*
> > -	 * Ensure that ->deadline is visible before set the started
> > -	 * flag and clear the completed flag.
> > -	 */
> > -	smp_mb__before_atomic();
> > +	WARN_ON_ONCE(test_bit(REQ_ATOM_STARTED, &rq->atomic_flags));
> >  
> >  	/*
> >  	 * Mark us as started and clear complete. Complete might have been
> >  	 * set if requeue raced with timeout, which then marked it as
> >  	 * complete. So be sure to clear complete again when we start
> >  	 * the request, otherwise we'll ignore the completion event.
> > +	 *
> > +	 * Ensure that ->deadline is visible before we set STARTED, such that
> > +	 * blk_mq_check_expired() is guaranteed to observe our ->deadline when
> > +	 * it observes STARTED.
> >  	 */
> > -	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
> > -		set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
> > -	if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags))
> > +	smp_wmb();
> > +	set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
> > +	if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) {
> > +		/*
> > +		 * Coherence order guarantees these consecutive stores to a
> > +		 * single variable propagate in the specified order. Thus the
> > +		 * clear_bit() is ordered _after_ the set bit. See
> > +		 * blk_mq_check_expired().
> > +		 *
> > +		 * (the bits must be part of the same byte for this to be
> > +		 * true).
> 
> Adding this comment is well and good, but for more security you should 
> also add a comment (maybe even a compile-time check) to the place where 
> REQ_ATOM_STARTED and REQ_ATOM_COMPLETE are defined.  Otherwise they 
> might eventually get moved into separate bytes.
> 

How about adding:

	BUILD_BUG_ON((REQ_ATOM_STARTED / BITS_PER_BYTE) != (REQ_ATOM_COMPLETE / BITS_PER_BYTE));

here?

Regards,
Boqun

> Alan Stern
> 
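
The compile-time check discussed above can be sketched in plain C11,
outside the kernel. This is only an illustration under stated
assumptions: the DEMO_ATOM_* enumerators and the DEMO_SAME_BYTE() macro
are hypothetical names, the bit numbers are made up for the example, and
C11 static_assert with CHAR_BIT stands in for the kernel's
BUILD_BUG_ON() and BITS_PER_BYTE.

```c
#include <assert.h>	/* static_assert (C11), assert */
#include <limits.h>	/* CHAR_BIT */

/* Hypothetical bit numbers; the real REQ_ATOM_* values are defined by the kernel. */
enum demo_atom_flags {
	DEMO_ATOM_COMPLETE = 0,
	DEMO_ATOM_STARTED = 1,
};

/* True when bit numbers a and b fall into the same byte of a bitmap. */
#define DEMO_SAME_BYTE(a, b)	(((a) / CHAR_BIT) == ((b) / CHAR_BIT))

/*
 * Compilation fails if someone later moves the two flags into different
 * bytes, which would invalidate the same-variable coherence argument.
 */
static_assert(DEMO_SAME_BYTE(DEMO_ATOM_STARTED, DEMO_ATOM_COMPLETE),
	      "STARTED and COMPLETE must share a byte");
```

Placing such a check next to the flag definitions, as suggested earlier
in the thread, keeps the constraint visible where a future change would
break it.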

* Re: [PATCH -v2] blk-mq: Start to fix memory ordering...
  2017-09-06  8:00 [PATCH -v2] blk-mq: Start to fix memory ordering Peter Zijlstra
  2017-09-06 14:02 ` Alan Stern
@ 2017-10-04 17:18 ` Jens Axboe
  1 sibling, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2017-10-04 17:18 UTC
  To: Peter Zijlstra
  Cc: parri.andrea, linux-kernel, tom.leiming, hch, paulmck,
	will.deacon, boqun.feng, stern

On 09/06/2017 02:00 AM, Peter Zijlstra wrote:
> 
> Attempt to untangle the ordering in blk-mq. The patch introducing the
> single smp_mb__before_atomic() is obviously broken in that it doesn't
> clearly specify a pairing barrier and an obtained guarantee.
> 
> The comment is further misleading in that it hints that the
> deadline store and the COMPLETE store also need to be ordered, but
> AFAICT there is no such dependency. However, what does appear to be
> important is the clear of COMPLETE happening _after_ the store to
> STARTED, and that worked by pure accident.
> 
> This clarifies blk_mq_start_request() -- we should not get there with
> STARTED set -- which simplifies the code and makes the barrier usage
> sane (the old code could be read to allow not having _any_ atomic after
> the barrier, in which case the barrier has nothing to order). We
> then also introduce the missing pairing barrier for it.
> 
> Also downgrade the barrier to smp_wmb(); this is cheaper on
> PowerPC/ARM and doesn't cost anything extra on x86.
> 
> And it documents the STARTED vs COMPLETE ordering, although I've not
> been entirely successful in reverse engineering the blk-mq state
> machine, so there might still be more funnies around timeout vs
> requeue.
> 
> If I got anything wrong, feel free to educate me by adding comments to
> clarify things ;-)

Sorry for the belated response on this. I spent some time and looked
over everything; it looks solid to me.

I'll queue this up for some testing, and also add a compile-time check
to keep us from violating the requirement that STARTED and COMPLETE be
in the same byte of storage.

-- 
Jens Axboe
