linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH 8/8] blk-mq: drain I/O when all CPUs in a hctx are offline
       [not found]       ` <1ec7922c-f2b0-08ec-5849-f4eb7f71e9e7@acm.org>
@ 2020-05-28  5:19         ` Ming Lei
  2020-05-28 13:37           ` Bart Van Assche
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2020-05-28  5:19 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, linux-block, John Garry, Hannes Reinecke,
	Thomas Gleixner, Paul E. McKenney, linux-kernel

On Wed, May 27, 2020 at 08:33:48PM -0700, Bart Van Assche wrote:
> On 2020-05-27 18:46, Ming Lei wrote:
> > On Wed, May 27, 2020 at 04:09:19PM -0700, Bart Van Assche wrote:
> >> On 2020-05-27 11:06, Christoph Hellwig wrote:
> >>> --- a/block/blk-mq-tag.c
> >>> +++ b/block/blk-mq-tag.c
> >>> @@ -180,6 +180,14 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
> >>>  	sbitmap_finish_wait(bt, ws, &wait);
> >>>  
> >>>  found_tag:
> >>> +	/*
> >>> +	 * Give up this allocation if the hctx is inactive.  The caller will
> >>> +	 * retry on an active hctx.
> >>> +	 */
> >>> +	if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state))) {
> >>> +		blk_mq_put_tag(tags, data->ctx, tag + tag_offset);
> >>> +		return -1;
> >>> +	}
> >>>  	return tag + tag_offset;
> >>>  }
> >>
> >> The code that has been added in blk_mq_hctx_notify_offline() will only
> >> work correctly if blk_mq_get_tag() tests BLK_MQ_S_INACTIVE after the
> >> store instructions involved in the tag allocation happened. Does this
> >> mean that a memory barrier should be added in the above function before
> >> the test_bit() call?
> > 
> > Please see comment in blk_mq_hctx_notify_offline():
> > 
> > +       /*
> > +        * Prevent new request from being allocated on the current hctx.
> > +        *
> > +        * The smp_mb__after_atomic() Pairs with the implied barrier in
> > +        * test_and_set_bit_lock in sbitmap_get().  Ensures the inactive flag is
> > +        * seen once we return from the tag allocator.
> > +        */
> > +       set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
> 
> From Documentation/atomic_bitops.txt: "Except for a successful
> test_and_set_bit_lock() which has ACQUIRE semantics and
> clear_bit_unlock() which has RELEASE semantics."

test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state) is called only after
one tag has been allocated, which means test_and_set_bit_lock() has
already succeeded before the test_bit(). The ACQUIRE semantics guarantee
that test_bit(BLK_MQ_S_INACTIVE) is always done after the successful
test_and_set_bit_lock(), so the tag bit is always set before
BLK_MQ_S_INACTIVE is tested.
 
See Documentation/memory-barriers.txt:
 (5) ACQUIRE operations.

     This acts as a one-way permeable barrier.  It guarantees that all memory
     operations after the ACQUIRE operation will appear to happen after the
     ACQUIRE operation with respect to the other components of the system.
     ACQUIRE operations include LOCK operations and both smp_load_acquire()
     and smp_cond_load_acquire() operations.

> 
> My understanding is that operations that have acquire semantics pair
> with operations that have release semantics. I haven't been able to find
> any documentation that shows that smp_mb__after_atomic() has release
> semantics. So I looked up its definition. This is what I found:
> 
> $ git grep -nH 'define __smp_mb__after_atomic'
> arch/ia64/include/asm/barrier.h:49:#define __smp_mb__after_atomic()
> barrier()
> arch/mips/include/asm/barrier.h:133:#define __smp_mb__after_atomic()
> smp_llsc_mb()
> arch/s390/include/asm/barrier.h:50:#define __smp_mb__after_atomic()
> barrier()
> arch/sparc/include/asm/barrier_64.h:57:#define __smp_mb__after_atomic()
> barrier()
> arch/x86/include/asm/barrier.h:83:#define __smp_mb__after_atomic()	do {
> } while (0)
> arch/xtensa/include/asm/barrier.h:20:#define __smp_mb__after_atomic()	
> barrier()
> include/asm-generic/barrier.h:116:#define __smp_mb__after_atomic()
> __smp_mb()
> 
> My interpretation of the above is that not all smp_mb__after_atomic()
> implementations have release semantics. Do you agree with this conclusion?

I understand smp_mb__after_atomic() orders set_bit(BLK_MQ_S_INACTIVE)
and reading the tag bit which is done in blk_mq_all_tag_iter().

So the two pair of OPs are ordered:

1) if one request(tag bit) is allocated before setting BLK_MQ_S_INACTIVE,
the tag bit will be observed in blk_mq_all_tag_iter() from blk_mq_hctx_has_requests(),
so the request will be drained.

OR

2) if one request(tag bit) is allocated after setting BLK_MQ_S_INACTIVE,
the request(tag bit) will be released and retried on another CPU
finally, see __blk_mq_alloc_request().
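
To make this concrete, here is a rough sketch of the two sides
(simplified; "tag_word" and "tag_nr" are made-up names standing in for
the sbitmap word and bit, and the drain side really iterates all tags
via blk_mq_all_tag_iter()):

/* CPU hotplug side, blk_mq_hctx_notify_offline(), simplified */
void hotplug_cpu(void)
{
	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
	smp_mb__after_atomic();
	r0 = test_bit(tag_nr, &tag_word);	/* drain the request if set */
}

/* allocation side, blk_mq_get_tag(), simplified */
void alloc_cpu(void)
{
	if (!test_and_set_bit_lock(tag_nr, &tag_word)) {	/* ACQUIRE */
		r1 = test_bit(BLK_MQ_S_INACTIVE, &hctx->state);
		/* if r1 != 0, the tag is put back and the allocation retried */
	}
}

What I expect is that the outcome (r0 == 0 && r1 == 0) is impossible,
which is exactly cases 1) and 2) above.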

Cc Paul and linux-kernel list.


Thanks,
Ming



* Re: [PATCH 8/8] blk-mq: drain I/O when all CPUs in a hctx are offline
  2020-05-28  5:19         ` [PATCH 8/8] blk-mq: drain I/O when all CPUs in a hctx are offline Ming Lei
@ 2020-05-28 13:37           ` Bart Van Assche
  2020-05-28 17:21             ` Paul E. McKenney
  2020-05-29  1:13             ` Ming Lei
  0 siblings, 2 replies; 10+ messages in thread
From: Bart Van Assche @ 2020-05-28 13:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, linux-block, John Garry, Hannes Reinecke,
	Thomas Gleixner, Paul E. McKenney, linux-kernel

On 2020-05-27 22:19, Ming Lei wrote:
> On Wed, May 27, 2020 at 08:33:48PM -0700, Bart Van Assche wrote:
>> My understanding is that operations that have acquire semantics pair
>> with operations that have release semantics. I haven't been able to find
>> any documentation that shows that smp_mb__after_atomic() has release
>> semantics. So I looked up its definition. This is what I found:
>>
>> $ git grep -nH 'define __smp_mb__after_atomic'
>> arch/ia64/include/asm/barrier.h:49:#define __smp_mb__after_atomic()
>> barrier()
>> arch/mips/include/asm/barrier.h:133:#define __smp_mb__after_atomic()
>> smp_llsc_mb()
>> arch/s390/include/asm/barrier.h:50:#define __smp_mb__after_atomic()
>> barrier()
>> arch/sparc/include/asm/barrier_64.h:57:#define __smp_mb__after_atomic()
>> barrier()
>> arch/x86/include/asm/barrier.h:83:#define __smp_mb__after_atomic()	do {
>> } while (0)
>> arch/xtensa/include/asm/barrier.h:20:#define __smp_mb__after_atomic()	
>> barrier()
>> include/asm-generic/barrier.h:116:#define __smp_mb__after_atomic()
>> __smp_mb()
>>
>> My interpretation of the above is that not all smp_mb__after_atomic()
>> implementations have release semantics. Do you agree with this conclusion?
> 
> I understand smp_mb__after_atomic() orders set_bit(BLK_MQ_S_INACTIVE)
> and reading the tag bit which is done in blk_mq_all_tag_iter().
> 
> So the two pair of OPs are ordered:
> 
> 1) if one request(tag bit) is allocated before setting BLK_MQ_S_INACTIVE,
> the tag bit will be observed in blk_mq_all_tag_iter() from blk_mq_hctx_has_requests(),
> so the request will be drained.
> 
> OR
> 
> 2) if one request(tag bit) is allocated after setting BLK_MQ_S_INACTIVE,
> the request(tag bit) will be released and retried on another CPU
> finally, see __blk_mq_alloc_request().
> 
> Cc Paul and linux-kernel list.

I do not agree with the above conclusion. My understanding of
acquire/release labels is that if the following holds:
(1) A store operation that stores the value V into memory location M has
a release label.
(2) A load operation that reads memory location M has an acquire label.
(3) The load operation (2) retrieves the value V that was stored by (1).

that the following ordering property holds: all load and store
instructions that happened before the store instruction (1) in program
order are guaranteed to happen before the load and store instructions
that follow (2) in program order.
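
As a minimal example of such a pairing (with made-up variable names):

void writer(void)
{
	WRITE_ONCE(data, 1);
	smp_store_release(&flag, 1);	/* (1) store-release of V == 1 into M == flag */
}

void reader(void)
{
	if (smp_load_acquire(&flag))	/* (2) + (3) load-acquire of M observing V */
		r1 = READ_ONCE(data);	/* guaranteed to observe data == 1 */
}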

In the ARM manual these semantics have been described as follows: "A
Store-Release instruction is multicopy atomic when observed with a
Load-Acquire instruction".

In this case the load-acquire operation is the
"test_and_set_bit_lock(nr, word)" statement from the sbitmap code. That
code is executed indirectly by blk_mq_get_tag(). Since there is no
matching store-release instruction in __blk_mq_alloc_request() for
'word', ordering of the &data->hctx->state and 'tag' memory locations is
not guaranteed by the acquire property of the "test_and_set_bit_lock(nr,
word)" statement from the sbitmap code.

Thanks,

Bart.


* Re: [PATCH 8/8] blk-mq: drain I/O when all CPUs in a hctx are offline
  2020-05-28 13:37           ` Bart Van Assche
@ 2020-05-28 17:21             ` Paul E. McKenney
  2020-05-29  1:53               ` Ming Lei
  2020-05-29  1:13             ` Ming Lei
  1 sibling, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2020-05-28 17:21 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Ming Lei, Christoph Hellwig, linux-block, John Garry,
	Hannes Reinecke, Thomas Gleixner, linux-kernel

On Thu, May 28, 2020 at 06:37:47AM -0700, Bart Van Assche wrote:
> On 2020-05-27 22:19, Ming Lei wrote:
> > On Wed, May 27, 2020 at 08:33:48PM -0700, Bart Van Assche wrote:
> >> My understanding is that operations that have acquire semantics pair
> >> with operations that have release semantics. I haven't been able to find
> >> any documentation that shows that smp_mb__after_atomic() has release
> >> semantics. So I looked up its definition. This is what I found:
> >>
> >> $ git grep -nH 'define __smp_mb__after_atomic'
> >> arch/ia64/include/asm/barrier.h:49:#define __smp_mb__after_atomic()
> >> barrier()
> >> arch/mips/include/asm/barrier.h:133:#define __smp_mb__after_atomic()
> >> smp_llsc_mb()
> >> arch/s390/include/asm/barrier.h:50:#define __smp_mb__after_atomic()
> >> barrier()
> >> arch/sparc/include/asm/barrier_64.h:57:#define __smp_mb__after_atomic()
> >> barrier()
> >> arch/x86/include/asm/barrier.h:83:#define __smp_mb__after_atomic()	do {
> >> } while (0)
> >> arch/xtensa/include/asm/barrier.h:20:#define __smp_mb__after_atomic()	
> >> barrier()
> >> include/asm-generic/barrier.h:116:#define __smp_mb__after_atomic()
> >> __smp_mb()
> >>
> >> My interpretation of the above is that not all smp_mb__after_atomic()
> >> implementations have release semantics. Do you agree with this conclusion?
> > 
> > I understand smp_mb__after_atomic() orders set_bit(BLK_MQ_S_INACTIVE)
> > and reading the tag bit which is done in blk_mq_all_tag_iter().
> > 
> > So the two pair of OPs are ordered:
> > 
> > 1) if one request(tag bit) is allocated before setting BLK_MQ_S_INACTIVE,
> > the tag bit will be observed in blk_mq_all_tag_iter() from blk_mq_hctx_has_requests(),
> > so the request will be drained.
> > 
> > OR
> > 
> > 2) if one request(tag bit) is allocated after setting BLK_MQ_S_INACTIVE,
> > the request(tag bit) will be released and retried on another CPU
> > finally, see __blk_mq_alloc_request().
> > 
> > Cc Paul and linux-kernel list.
> 
> I do not agree with the above conclusion. My understanding of
> acquire/release labels is that if the following holds:
> (1) A store operation that stores the value V into memory location M has
> a release label.
> (2) A load operation that reads memory location M has an acquire label.
> (3) The load operation (2) retrieves the value V that was stored by (1).
> 
> that the following ordering property holds: all load and store
> instructions that happened before the store instruction (1) in program
> order are guaranteed to happen before the load and store instructions
> that follow (2) in program order.
> 
> In the ARM manual these semantics have been described as follows: "A
> Store-Release instruction is multicopy atomic when observed with a
> Load-Acquire instruction".
> 
> In this case the load-acquire operation is the
> "test_and_set_bit_lock(nr, word)" statement from the sbitmap code. That
> code is executed indirectly by blk_mq_get_tag(). Since there is no
> matching store-release instruction in __blk_mq_alloc_request() for
> 'word', ordering of the &data->hctx->state and 'tag' memory locations is
> not guaranteed by the acquire property of the "test_and_set_bit_lock(nr,
> word)" statement from the sbitmap code.

I feel like I just parachuted into the middle of the conversation,
so let me start by giving a (silly) example illustrating the limits of
smp_mb__{before,after}_atomic() that might be tangling things up.

But please please please avoid doing this in real code unless you have
an extremely good reason included in a comment.

void t1(void)
{
	WRITE_ONCE(a, 1);
	smp_mb__before_atomic();
	WRITE_ONCE(b, 1);  // Just Say No to code here!!!
	atomic_inc(&c);
	WRITE_ONCE(d, 1);  // Just Say No to code here!!!
	smp_mb__after_atomic();
	WRITE_ONCE(e, 1);
}

void t2(void)
{
	r1 = READ_ONCE(e);
	smp_mb();
	r2 = READ_ONCE(d);
	smp_mb();
	r3 = READ_ONCE(c);
	smp_mb();
	r4 = READ_ONCE(b);
	smp_mb();
	r5 = READ_ONCE(a);
}

Each platform must provide strong ordering for either atomic_inc()
on the one hand (as ia64 does) or for smp_mb__{before,after}_atomic()
on the other (as powerpc does).  Note that both ia64 and powerpc are
weakly ordered.

So ia64 could see (r1 == 1 && r2 == 0) on the one hand as well as (r4 ==
1 && r5 == 0).  So clearly smp_mb__{before,after}_atomic() need not have
any ordering properties whatsoever.

Similarly, powerpc could see (r3 == 1 && r4 == 0) on the one hand as well
as (r2 == 1 && r3 == 0) on the other.  Or even both at the same time.
So clearly atomic_inc() need not have any ordering properties whatsoever.

But the combination of smp_mb__before_atomic() and the later atomic_inc()
does provide full ordering, so that no architecture can see (r3 == 1 &&
r5 == 0), and either of r1 or r2 can be substituted for r3.

Similarly, atomic_inc() and the later smp_mb__after_atomic() also
provide full ordering, so that no architecture can see (r1 == 1 && r3 ==
0), and either r4 or r5 can be substituted for r3.


So a call to set_bit() followed by a call to smp_mb__after_atomic() will
provide a full memory barrier (implying release semantics) for any write
access after the smp_mb__after_atomic() with respect to the set_bit() or
any access preceding it.  But the set_bit() by itself won't have release
semantics, nor will the smp_mb__after_atomic(), only their combination
further combined with some write following the smp_mb__after_atomic().

More generally, there will be the equivalent of smp_mb() somewhere between
the set_bit() and every access following the smp_mb__after_atomic().
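
Or, as one more (equally silly) sketch, this time with two additional
made-up variables "mask" and "f":

void t3(void)
{
	WRITE_ONCE(a, 1);
	set_bit(0, &mask);		/* non-value-returning atomic */
	smp_mb__after_atomic();
	r6 = READ_ONCE(f);		/* ordered after both the set_bit()
					 * and the WRITE_ONCE(a, 1) above */
}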

Does that help, or am I missing the point?

							Thanx, Paul


* Re: [PATCH 8/8] blk-mq: drain I/O when all CPUs in a hctx are offline
  2020-05-28 13:37           ` Bart Van Assche
  2020-05-28 17:21             ` Paul E. McKenney
@ 2020-05-29  1:13             ` Ming Lei
  1 sibling, 0 replies; 10+ messages in thread
From: Ming Lei @ 2020-05-29  1:13 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, linux-block, John Garry, Hannes Reinecke,
	Thomas Gleixner, Paul E. McKenney, linux-kernel

On Thu, May 28, 2020 at 06:37:47AM -0700, Bart Van Assche wrote:
> On 2020-05-27 22:19, Ming Lei wrote:
> > On Wed, May 27, 2020 at 08:33:48PM -0700, Bart Van Assche wrote:
> >> My understanding is that operations that have acquire semantics pair
> >> with operations that have release semantics. I haven't been able to find
> >> any documentation that shows that smp_mb__after_atomic() has release
> >> semantics. So I looked up its definition. This is what I found:
> >>
> >> $ git grep -nH 'define __smp_mb__after_atomic'
> >> arch/ia64/include/asm/barrier.h:49:#define __smp_mb__after_atomic()
> >> barrier()
> >> arch/mips/include/asm/barrier.h:133:#define __smp_mb__after_atomic()
> >> smp_llsc_mb()
> >> arch/s390/include/asm/barrier.h:50:#define __smp_mb__after_atomic()
> >> barrier()
> >> arch/sparc/include/asm/barrier_64.h:57:#define __smp_mb__after_atomic()
> >> barrier()
> >> arch/x86/include/asm/barrier.h:83:#define __smp_mb__after_atomic()	do {
> >> } while (0)
> >> arch/xtensa/include/asm/barrier.h:20:#define __smp_mb__after_atomic()	
> >> barrier()
> >> include/asm-generic/barrier.h:116:#define __smp_mb__after_atomic()
> >> __smp_mb()
> >>
> >> My interpretation of the above is that not all smp_mb__after_atomic()
> >> implementations have release semantics. Do you agree with this conclusion?
> > 
> > I understand smp_mb__after_atomic() orders set_bit(BLK_MQ_S_INACTIVE)
> > and reading the tag bit which is done in blk_mq_all_tag_iter().
> > 
> > So the two pair of OPs are ordered:
> > 
> > 1) if one request(tag bit) is allocated before setting BLK_MQ_S_INACTIVE,
> > the tag bit will be observed in blk_mq_all_tag_iter() from blk_mq_hctx_has_requests(),
> > so the request will be drained.
> > 
> > OR
> > 
> > 2) if one request(tag bit) is allocated after setting BLK_MQ_S_INACTIVE,
> > the request(tag bit) will be released and retried on another CPU
> > finally, see __blk_mq_alloc_request().
> > 
> > Cc Paul and linux-kernel list.
> 
> I do not agree with the above conclusion. My understanding of
> acquire/release labels is that if the following holds:
> (1) A store operation that stores the value V into memory location M has
> a release label.
> (2) A load operation that reads memory location M has an acquire label.
> (3) The load operation (2) retrieves the value V that was stored by (1).
> 
> that the following ordering property holds: all load and store
> instructions that happened before the store instruction (1) in program
> order are guaranteed to happen before the load and store instructions
> that follow (2) in program order.
> 
> In the ARM manual these semantics have been described as follows: "A
> Store-Release instruction is multicopy atomic when observed with a
> Load-Acquire instruction".
> 
> In this case the load-acquire operation is the
> "test_and_set_bit_lock(nr, word)" statement from the sbitmap code. That
> code is executed indirectly by blk_mq_get_tag(). Since there is no
> matching store-release instruction in __blk_mq_alloc_request() for
> 'word', ordering of the &data->hctx->state and 'tag' memory locations is
> not guaranteed by the acquire property of the "test_and_set_bit_lock(nr,
> word)" statement from the sbitmap code.

If the order isn't guaranteed, either of the following two documents has to be wrong:

Documentation/memory-barriers.txt:
	...
	In all cases there are variants on "ACQUIRE" operations and "RELEASE" operations
	for each construct.  These operations all imply certain barriers:
	
	 (1) ACQUIRE operation implication:
	
	     Memory operations issued after the ACQUIRE will be completed after the
	     ACQUIRE operation has completed.

Documentation/atomic_bitops.txt:
	...
	Except for a successful test_and_set_bit_lock() which has ACQUIRE semantics and
	clear_bit_unlock() which has RELEASE semantics.

Setting the tag bit is part of a successful test_and_set_bit_lock(), which
has ACQUIRE semantics, and any memory operation issued after the ACQUIRE
(here the test_bit(BLK_MQ_S_INACTIVE)) will be completed after the ACQUIRE
has completed, according to the above two documents.

Thanks,
Ming



* Re: [PATCH 8/8] blk-mq: drain I/O when all CPUs in a hctx are offline
  2020-05-28 17:21             ` Paul E. McKenney
@ 2020-05-29  1:53               ` Ming Lei
  2020-05-29  3:07                 ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2020-05-29  1:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Bart Van Assche, Christoph Hellwig, linux-block, John Garry,
	Hannes Reinecke, Thomas Gleixner, linux-kernel

Hi Paul,

Thanks for your response!

On Thu, May 28, 2020 at 10:21:21AM -0700, Paul E. McKenney wrote:
> On Thu, May 28, 2020 at 06:37:47AM -0700, Bart Van Assche wrote:
> > On 2020-05-27 22:19, Ming Lei wrote:
> > > On Wed, May 27, 2020 at 08:33:48PM -0700, Bart Van Assche wrote:
> > >> My understanding is that operations that have acquire semantics pair
> > >> with operations that have release semantics. I haven't been able to find
> > >> any documentation that shows that smp_mb__after_atomic() has release
> > >> semantics. So I looked up its definition. This is what I found:
> > >>
> > >> $ git grep -nH 'define __smp_mb__after_atomic'
> > >> arch/ia64/include/asm/barrier.h:49:#define __smp_mb__after_atomic()
> > >> barrier()
> > >> arch/mips/include/asm/barrier.h:133:#define __smp_mb__after_atomic()
> > >> smp_llsc_mb()
> > >> arch/s390/include/asm/barrier.h:50:#define __smp_mb__after_atomic()
> > >> barrier()
> > >> arch/sparc/include/asm/barrier_64.h:57:#define __smp_mb__after_atomic()
> > >> barrier()
> > >> arch/x86/include/asm/barrier.h:83:#define __smp_mb__after_atomic()	do {
> > >> } while (0)
> > >> arch/xtensa/include/asm/barrier.h:20:#define __smp_mb__after_atomic()	
> > >> barrier()
> > >> include/asm-generic/barrier.h:116:#define __smp_mb__after_atomic()
> > >> __smp_mb()
> > >>
> > >> My interpretation of the above is that not all smp_mb__after_atomic()
> > >> implementations have release semantics. Do you agree with this conclusion?
> > > 
> > > I understand smp_mb__after_atomic() orders set_bit(BLK_MQ_S_INACTIVE)
> > > and reading the tag bit which is done in blk_mq_all_tag_iter().
> > > 
> > > So the two pair of OPs are ordered:
> > > 
> > > 1) if one request(tag bit) is allocated before setting BLK_MQ_S_INACTIVE,
> > > the tag bit will be observed in blk_mq_all_tag_iter() from blk_mq_hctx_has_requests(),
> > > so the request will be drained.
> > > 
> > > OR
> > > 
> > > 2) if one request(tag bit) is allocated after setting BLK_MQ_S_INACTIVE,
> > > the request(tag bit) will be released and retried on another CPU
> > > finally, see __blk_mq_alloc_request().
> > > 
> > > Cc Paul and linux-kernel list.
> > 
> > I do not agree with the above conclusion. My understanding of
> > acquire/release labels is that if the following holds:
> > (1) A store operation that stores the value V into memory location M has
> > a release label.
> > (2) A load operation that reads memory location M has an acquire label.
> > (3) The load operation (2) retrieves the value V that was stored by (1).
> > 
> > that the following ordering property holds: all load and store
> > instructions that happened before the store instruction (1) in program
> > order are guaranteed to happen before the load and store instructions
> > that follow (2) in program order.
> > 
> > In the ARM manual these semantics have been described as follows: "A
> > Store-Release instruction is multicopy atomic when observed with a
> > Load-Acquire instruction".
> > 
> > In this case the load-acquire operation is the
> > "test_and_set_bit_lock(nr, word)" statement from the sbitmap code. That
> > code is executed indirectly by blk_mq_get_tag(). Since there is no
> > matching store-release instruction in __blk_mq_alloc_request() for
> > 'word', ordering of the &data->hctx->state and 'tag' memory locations is
> > not guaranteed by the acquire property of the "test_and_set_bit_lock(nr,
> > word)" statement from the sbitmap code.
> 
> I feel like I just parachuted into the middle of the conversation,
> so let me start by giving a (silly) example illustrating the limits of
> smp_mb__{before,after}_atomic() that might be tangling things up.
> 
> But please please please avoid doing this in real code unless you have
> an extremely good reason included in a comment.
> 
> void t1(void)
> {
> 	WRITE_ONCE(a, 1);
> 	smp_mb__before_atomic();
> 	WRITE_ONCE(b, 1);  // Just Say No to code here!!!
> 	atomic_inc(&c);
> 	WRITE_ONCE(d, 1);  // Just Say No to code here!!!
> 	smp_mb__after_atomic();
> 	WRITE_ONCE(e, 1);
> }
> 
> void t2(void)
> {
> 	r1 = READ_ONCE(e);
> 	smp_mb();
> 	r2 = READ_ONCE(d);
> 	smp_mb();
> 	r3 = READ_ONCE(c);
> 	smp_mb();
> 	r4 = READ_ONCE(b);
> 	smp_mb();
> 	r5 = READ_ONCE(a);
> }
> 
> Each platform must provide strong ordering for either atomic_inc()
> on the one hand (as ia64 does) or for smp_mb__{before,after}_atomic()
> on the other (as powerpc does).  Note that both ia64 and powerpc are
> weakly ordered.
> 
> So ia64 could see (r1 == 1 && r2 == 0) on the one hand as well as (r4 ==
> 1 && r5 == 0).  So clearly smp_mb__{before,after}_atomic() need not have
> any ordering properties whatsoever.
> 
> Similarly, powerpc could see (r3 == 1 && r4 == 0) on the one hand as well
> as (r2 == 1 && r3 == 0) on the other.  Or even both at the same time.
> So clearly atomic_inc() need not have any ordering properties whatsoever.
> 
> But the combination of smp_mb__before_atomic() and the later atomic_inc()
> does provide full ordering, so that no architecture can see (r3 == 1 &&
> r5 == 0), and either of r1 or r2 can be substituted for r3.
> 
> Similarly, atomic_inc() and the later smp_mb__after_atomic() also
> provide full ordering, so that no architecture can see (r1 == 1 && r3 ==
> 0), and either r4 or r5 can be substituted for r3.
> 
> 
> So a call to set_bit() followed by a call to smp_mb__after_atomic() will
> provide a full memory barrier (implying release semantics) for any write
> access after the smp_mb__after_atomic() with respect to the set_bit() or
> any access preceding it.  But the set_bit() by itself won't have release
> semantics, nor will the smp_mb__after_atomic(), only their combination
> further combined with some write following the smp_mb__after_atomic().
> 
> More generally, there will be the equivalent of smp_mb() somewhere between
> the set_bit() and every access following the smp_mb__after_atomic().
> 
> Does that help, or am I missing the point?

Yeah, it does help.

BTW, can we replace the smp_mb__after_atomic() with smp_mb() for
ordering set_bit() and the memory OP following the smp_mb()?


Thanks,
Ming



* Re: [PATCH 8/8] blk-mq: drain I/O when all CPUs in a hctx are offline
  2020-05-29  1:53               ` Ming Lei
@ 2020-05-29  3:07                 ` Paul E. McKenney
  2020-05-29  3:53                   ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2020-05-29  3:07 UTC (permalink / raw)
  To: Ming Lei
  Cc: Bart Van Assche, Christoph Hellwig, linux-block, John Garry,
	Hannes Reinecke, Thomas Gleixner, linux-kernel

On Fri, May 29, 2020 at 09:53:04AM +0800, Ming Lei wrote:
> Hi Paul,
> 
> Thanks for your response!
> 
> On Thu, May 28, 2020 at 10:21:21AM -0700, Paul E. McKenney wrote:
> > On Thu, May 28, 2020 at 06:37:47AM -0700, Bart Van Assche wrote:
> > > On 2020-05-27 22:19, Ming Lei wrote:
> > > > On Wed, May 27, 2020 at 08:33:48PM -0700, Bart Van Assche wrote:
> > > >> My understanding is that operations that have acquire semantics pair
> > > >> with operations that have release semantics. I haven't been able to find
> > > >> any documentation that shows that smp_mb__after_atomic() has release
> > > >> semantics. So I looked up its definition. This is what I found:
> > > >>
> > > >> $ git grep -nH 'define __smp_mb__after_atomic'
> > > >> arch/ia64/include/asm/barrier.h:49:#define __smp_mb__after_atomic()
> > > >> barrier()
> > > >> arch/mips/include/asm/barrier.h:133:#define __smp_mb__after_atomic()
> > > >> smp_llsc_mb()
> > > >> arch/s390/include/asm/barrier.h:50:#define __smp_mb__after_atomic()
> > > >> barrier()
> > > >> arch/sparc/include/asm/barrier_64.h:57:#define __smp_mb__after_atomic()
> > > >> barrier()
> > > >> arch/x86/include/asm/barrier.h:83:#define __smp_mb__after_atomic()	do {
> > > >> } while (0)
> > > >> arch/xtensa/include/asm/barrier.h:20:#define __smp_mb__after_atomic()	
> > > >> barrier()
> > > >> include/asm-generic/barrier.h:116:#define __smp_mb__after_atomic()
> > > >> __smp_mb()
> > > >>
> > > >> My interpretation of the above is that not all smp_mb__after_atomic()
> > > >> implementations have release semantics. Do you agree with this conclusion?
> > > > 
> > > > I understand smp_mb__after_atomic() orders set_bit(BLK_MQ_S_INACTIVE)
> > > > and reading the tag bit which is done in blk_mq_all_tag_iter().
> > > > 
> > > > So the two pair of OPs are ordered:
> > > > 
> > > > 1) if one request(tag bit) is allocated before setting BLK_MQ_S_INACTIVE,
> > > > the tag bit will be observed in blk_mq_all_tag_iter() from blk_mq_hctx_has_requests(),
> > > > so the request will be drained.
> > > > 
> > > > OR
> > > > 
> > > > 2) if one request(tag bit) is allocated after setting BLK_MQ_S_INACTIVE,
> > > > the request(tag bit) will be released and retried on another CPU
> > > > finally, see __blk_mq_alloc_request().
> > > > 
> > > > Cc Paul and linux-kernel list.
> > > 
> > > I do not agree with the above conclusion. My understanding of
> > > acquire/release labels is that if the following holds:
> > > (1) A store operation that stores the value V into memory location M has
> > > a release label.
> > > (2) A load operation that reads memory location M has an acquire label.
> > > (3) The load operation (2) retrieves the value V that was stored by (1).
> > > 
> > > that the following ordering property holds: all load and store
> > > instructions that happened before the store instruction (1) in program
> > > order are guaranteed to happen before the load and store instructions
> > > that follow (2) in program order.
> > > 
> > > In the ARM manual these semantics have been described as follows: "A
> > > Store-Release instruction is multicopy atomic when observed with a
> > > Load-Acquire instruction".
> > > 
> > > In this case the load-acquire operation is the
> > > "test_and_set_bit_lock(nr, word)" statement from the sbitmap code. That
> > > code is executed indirectly by blk_mq_get_tag(). Since there is no
> > > matching store-release instruction in __blk_mq_alloc_request() for
> > > 'word', ordering of the &data->hctx->state and 'tag' memory locations is
> > > not guaranteed by the acquire property of the "test_and_set_bit_lock(nr,
> > > word)" statement from the sbitmap code.
> > 
> > I feel like I just parachuted into the middle of the conversation,
> > so let me start by giving a (silly) example illustrating the limits of
> > smp_mb__{before,after}_atomic() that might be tangling things up.
> > 
> > But please please please avoid doing this in real code unless you have
> > an extremely good reason included in a comment.
> > 
> > void t1(void)
> > {
> > 	WRITE_ONCE(a, 1);
> > 	smp_mb__before_atomic();
> > 	WRITE_ONCE(b, 1);  // Just Say No to code here!!!
> > 	atomic_inc(&c);
> > 	WRITE_ONCE(d, 1);  // Just Say No to code here!!!
> > 	smp_mb__after_atomic();
> > 	WRITE_ONCE(e, 1);
> > }
> > 
> > void t2(void)
> > {
> > 	r1 = READ_ONCE(e);
> > 	smp_mb();
> > 	r2 = READ_ONCE(d);
> > 	smp_mb();
> > 	r3 = READ_ONCE(c);
> > 	smp_mb();
> > 	r4 = READ_ONCE(b);
> > 	smp_mb();
> > 	r5 = READ_ONCE(a);
> > }
> > 
> > Each platform must provide strong ordering for either atomic_inc()
> > on the one hand (as ia64 does) or for smp_mb__{before,after}_atomic()
> > on the other (as powerpc does).  Note that both ia64 and powerpc are
> > weakly ordered.
> > 
> > So ia64 could see (r1 == 1 && r2 == 0) on the one hand as well as (r4 ==
> > 1 && r5 == 0).  So clearly smp_mb__{before,after}_atomic() need not have
> > any ordering properties whatsoever.
> > 
> > Similarly, powerpc could see (r3 == 1 && r4 == 0) on the one hand as well
> > as (r2 == 1 && r3 == 0) on the other.  Or even both at the same time.
> > So clearly atomic_inc() need not have any ordering properties whatsoever.
> > 
> > But the combination of smp_mb__before_atomic() and the later atomic_inc()
> > does provide full ordering, so that no architecture can see (r3 == 1 &&
> > r5 == 0), and either of r1 or r2 can be substituted for r3.
> > 
> > Similarly, atomic_inc() and the later smp_mb__after_atomic() also
> > provide full ordering, so that no architecture can see (r1 == 1 && r3 ==
> > 0), and either r4 or r5 can be substituted for r3.
> > 
> > 
> > So a call to set_bit() followed by a call to smp_mb__after_atomic() will
> > provide a full memory barrier (implying release semantics) for any write
> > access after the smp_mb__after_atomic() with respect to the set_bit() or
> > any access preceding it.  But the set_bit() by itself won't have release
> > semantics, nor will the smp_mb__after_atomic(), only their combination
> > further combined with some write following the smp_mb__after_atomic().
> > 
> > More generally, there will be the equivalent of smp_mb() somewhere between
> > the set_bit() and every access following the smp_mb__after_atomic().
> > 
> > Does that help, or am I missing the point?
> 
> Yeah, it does help.
> 
> BTW, can we replace the smp_mb__after_atomic() with smp_mb() for
> ordering set_bit() and the memory OP following the smp_mb()?

Placing an smp_mb() between set_bit() and a later access will indeed
order set_bit() with that later access.
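
That is, roughly, with "later_access" standing in for whatever access
follows:

	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
	smp_mb();			/* a full barrier in its own right */
	r1 = READ_ONCE(later_access);	/* ordered after the set_bit() */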

That said, I don't know this code well enough to say whether or not
that ordering is sufficient.

						Thanx, Paul


* Re: [PATCH 8/8] blk-mq: drain I/O when all CPUs in a hctx are offline
  2020-05-29  3:07                 ` Paul E. McKenney
@ 2020-05-29  3:53                   ` Ming Lei
  2020-05-29 18:13                     ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2020-05-29  3:53 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Bart Van Assche, Christoph Hellwig, linux-block, John Garry,
	Hannes Reinecke, Thomas Gleixner, linux-kernel

Hi Paul,

On Thu, May 28, 2020 at 08:07:28PM -0700, Paul E. McKenney wrote:
> On Fri, May 29, 2020 at 09:53:04AM +0800, Ming Lei wrote:
> > Hi Paul,
> > 
> > Thanks for your response!
> > 
> > On Thu, May 28, 2020 at 10:21:21AM -0700, Paul E. McKenney wrote:
> > > On Thu, May 28, 2020 at 06:37:47AM -0700, Bart Van Assche wrote:
> > > > On 2020-05-27 22:19, Ming Lei wrote:
> > > > > On Wed, May 27, 2020 at 08:33:48PM -0700, Bart Van Assche wrote:
> > > > >> My understanding is that operations that have acquire semantics pair
> > > > >> with operations that have release semantics. I haven't been able to find
> > > > >> any documentation that shows that smp_mb__after_atomic() has release
> > > > >> semantics. So I looked up its definition. This is what I found:
> > > > >>
> > > > >> $ git grep -nH 'define __smp_mb__after_atomic'
> > > > >> arch/ia64/include/asm/barrier.h:49:#define __smp_mb__after_atomic()
> > > > >> barrier()
> > > > >> arch/mips/include/asm/barrier.h:133:#define __smp_mb__after_atomic()
> > > > >> smp_llsc_mb()
> > > > >> arch/s390/include/asm/barrier.h:50:#define __smp_mb__after_atomic()
> > > > >> barrier()
> > > > >> arch/sparc/include/asm/barrier_64.h:57:#define __smp_mb__after_atomic()
> > > > >> barrier()
> > > > >> arch/x86/include/asm/barrier.h:83:#define __smp_mb__after_atomic()	do {
> > > > >> } while (0)
> > > > >> arch/xtensa/include/asm/barrier.h:20:#define __smp_mb__after_atomic()	
> > > > >> barrier()
> > > > >> include/asm-generic/barrier.h:116:#define __smp_mb__after_atomic()
> > > > >> __smp_mb()
> > > > >>
> > > > >> My interpretation of the above is that not all smp_mb__after_atomic()
> > > > >> implementations have release semantics. Do you agree with this conclusion?
> > > > > 
> > > > > I understand smp_mb__after_atomic() orders set_bit(BLK_MQ_S_INACTIVE)
> > > > > and reading the tag bit which is done in blk_mq_all_tag_iter().
> > > > > 
> > > > > So the two pair of OPs are ordered:
> > > > > 
> > > > > 1) if one request(tag bit) is allocated before setting BLK_MQ_S_INACTIVE,
> > > > > the tag bit will be observed in blk_mq_all_tag_iter() from blk_mq_hctx_has_requests(),
> > > > > so the request will be drained.
> > > > > 
> > > > > OR
> > > > > 
> > > > > 2) if one request(tag bit) is allocated after setting BLK_MQ_S_INACTIVE,
> > > > > the request(tag bit) will be released and retried on another CPU
> > > > > finally, see __blk_mq_alloc_request().
> > > > > 
> > > > > Cc Paul and linux-kernel list.
> > > > 
> > > > I do not agree with the above conclusion. My understanding of
> > > > acquire/release labels is that if the following holds:
> > > > (1) A store operation that stores the value V into memory location M has
> > > > a release label.
> > > > (2) A load operation that reads memory location M has an acquire label.
> > > > (3) The load operation (2) retrieves the value V that was stored by (1).
> > > > 
> > > > that the following ordering property holds: all load and store
> > > > instructions that happened before the store instruction (1) in program
> > > > order are guaranteed to happen before the load and store instructions
> > > > that follow (2) in program order.
> > > > 
> > > > In the ARM manual these semantics have been described as follows: "A
> > > > Store-Release instruction is multicopy atomic when observed with a
> > > > Load-Acquire instruction".
> > > > 
> > > > In this case the load-acquire operation is the
> > > > "test_and_set_bit_lock(nr, word)" statement from the sbitmap code. That
> > > > code is executed indirectly by blk_mq_get_tag(). Since there is no
> > > > matching store-release instruction in __blk_mq_alloc_request() for
> > > > 'word', ordering of the &data->hctx->state and 'tag' memory locations is
> > > > not guaranteed by the acquire property of the "test_and_set_bit_lock(nr,
> > > > word)" statement from the sbitmap code.
> > > 
> > > I feel like I just parachuted into the middle of the conversation,
> > > so let me start by giving a (silly) example illustrating the limits of
> > > smp_mb__{before,after}_atomic() that might be tangling things up.
> > > 
> > > But please please please avoid doing this in real code unless you have
> > > an extremely good reason included in a comment.
> > > 
> > > void t1(void)
> > > {
> > > 	WRITE_ONCE(a, 1);
> > > 	smp_mb__before_atomic();
> > > 	WRITE_ONCE(b, 1);  // Just Say No to code here!!!
> > > 	atomic_inc(&c);
> > > 	WRITE_ONCE(d, 1);  // Just Say No to code here!!!
> > > 	smp_mb__after_atomic();
> > > 	WRITE_ONCE(e, 1);
> > > }
> > > 
> > > void t2(void)
> > > {
> > > 	r1 = READ_ONCE(e);
> > > 	smp_mb();
> > > 	r2 = READ_ONCE(d);
> > > 	smp_mb();
> > > 	r3 = READ_ONCE(c);
> > > 	smp_mb();
> > > 	r4 = READ_ONCE(b);
> > > 	smp_mb();
> > > 	r5 = READ_ONCE(a);
> > > }
> > > 
> > > Each platform must provide strong ordering for either atomic_inc()
> > > on the one hand (as ia64 does) or for smp_mb__{before,after}_atomic()
> > > on the other (as powerpc does).  Note that both ia64 and powerpc are
> > > weakly ordered.
> > > 
> > > So ia64 could see (r1 == 1 && r2 == 0) on the one hand as well as (r4 ==
> > > 1 && r5 == 0).  So clearly smp_mb__{before,after}_atomic() need not have
> > > any ordering properties whatsoever.
> > > 
> > > Similarly, powerpc could see (r3 == 1 && r4 == 0) on the one hand as well
> > > as (r2 == 1 && r3 == 0) on the other.  Or even both at the same time.
> > > So clearly atomic_inc() need not have any ordering properties whatsoever.
> > > 
> > > But the combination of smp_mb__before_atomic() and the later atomic_inc()
> > > does provide full ordering, so that no architecture can see (r3 == 1 &&
> > > r5 == 0), and either of r1 or r2 can be substituted for r3.
> > > 
> > > Similarly, atomic_inc() and the later smp_mb__after_atomic() also
> > > provide full ordering, so that no architecture can see (r1 == 1 && r3 ==
> > > 0), and either r4 or r5 can be substituted for r3.
> > > 
> > > 
> > > So a call to set_bit() followed by a call to smp_mb__after_atomic() will
> > > provide a full memory barrier (implying release semantics) for any write
> > > access after the smp_mb__after_atomic() with respect to the set_bit() or
> > > any access preceding it.  But the set_bit() by itself won't have release
> > > semantics, nor will the smp_mb__after_atomic(), only their combination
> > > further combined with some write following the smp_mb__after_atomic().
> > > 
> > > More generally, there will be the equivalent of smp_mb() somewhere between
> > > the set_bit() and every access following the smp_mb__after_atomic().
> > > 
> > > Does that help, or am I missing the point?
> > 
> > Yeah, it does help.
> > 
> > BTW, can we replace the smp_mb__after_atomic() with smp_mb() for
> > ordering set_bit() and the memory OP following the smp_mb()?
> 
> Placing an smp_mb() between set_bit() and a later access will indeed
> order set_bit() with that later access.
> 
> That said, I don't know this code well enough to say whether or not
> that ordering is sufficient.

Another pair is in blk_mq_get_tag(), and we expect the following two
memory OPs are ordered:

1) set bit in successful test_and_set_bit_lock(), which is called
from sbitmap_get()

2) test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state)

Do you think that the above two OPs are ordered?

Thanks,
Ming



* Re: [PATCH 8/8] blk-mq: drain I/O when all CPUs in a hctx are offline
  2020-05-29  3:53                   ` Ming Lei
@ 2020-05-29 18:13                     ` Paul E. McKenney
  2020-05-29 19:55                       ` Bart Van Assche
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2020-05-29 18:13 UTC (permalink / raw)
  To: Ming Lei
  Cc: Bart Van Assche, Christoph Hellwig, linux-block, John Garry,
	Hannes Reinecke, Thomas Gleixner, linux-kernel

On Fri, May 29, 2020 at 11:53:15AM +0800, Ming Lei wrote:
> Hi Paul,
> 
> On Thu, May 28, 2020 at 08:07:28PM -0700, Paul E. McKenney wrote:
> > On Fri, May 29, 2020 at 09:53:04AM +0800, Ming Lei wrote:
> > > Hi Paul,
> > > 
> > > Thanks for your response!
> > > 
> > > On Thu, May 28, 2020 at 10:21:21AM -0700, Paul E. McKenney wrote:
> > > > On Thu, May 28, 2020 at 06:37:47AM -0700, Bart Van Assche wrote:
> > > > > On 2020-05-27 22:19, Ming Lei wrote:
> > > > > > On Wed, May 27, 2020 at 08:33:48PM -0700, Bart Van Assche wrote:
> > > > > >> My understanding is that operations that have acquire semantics pair
> > > > > >> with operations that have release semantics. I haven't been able to find
> > > > > >> any documentation that shows that smp_mb__after_atomic() has release
> > > > > >> semantics. So I looked up its definition. This is what I found:
> > > > > >>
> > > > > >> $ git grep -nH 'define __smp_mb__after_atomic'
> > > > > >> arch/ia64/include/asm/barrier.h:49:#define __smp_mb__after_atomic()
> > > > > >> barrier()
> > > > > >> arch/mips/include/asm/barrier.h:133:#define __smp_mb__after_atomic()
> > > > > >> smp_llsc_mb()
> > > > > >> arch/s390/include/asm/barrier.h:50:#define __smp_mb__after_atomic()
> > > > > >> barrier()
> > > > > >> arch/sparc/include/asm/barrier_64.h:57:#define __smp_mb__after_atomic()
> > > > > >> barrier()
> > > > > >> arch/x86/include/asm/barrier.h:83:#define __smp_mb__after_atomic()	do {
> > > > > >> } while (0)
> > > > > >> arch/xtensa/include/asm/barrier.h:20:#define __smp_mb__after_atomic()	
> > > > > >> barrier()
> > > > > >> include/asm-generic/barrier.h:116:#define __smp_mb__after_atomic()
> > > > > >> __smp_mb()
> > > > > >>
> > > > > >> My interpretation of the above is that not all smp_mb__after_atomic()
> > > > > >> implementations have release semantics. Do you agree with this conclusion?
> > > > > > 
> > > > > > I understand smp_mb__after_atomic() orders set_bit(BLK_MQ_S_INACTIVE)
> > > > > > and reading the tag bit which is done in blk_mq_all_tag_iter().
> > > > > > 
> > > > > > So the two pair of OPs are ordered:
> > > > > > 
> > > > > > 1) if one request(tag bit) is allocated before setting BLK_MQ_S_INACTIVE,
> > > > > > the tag bit will be observed in blk_mq_all_tag_iter() from blk_mq_hctx_has_requests(),
> > > > > > so the request will be drained.
> > > > > > 
> > > > > > OR
> > > > > > 
> > > > > > 2) if one request(tag bit) is allocated after setting BLK_MQ_S_INACTIVE,
> > > > > > the request(tag bit) will be released and retried on another CPU
> > > > > > finally, see __blk_mq_alloc_request().
> > > > > > 
> > > > > > Cc Paul and linux-kernel list.
> > > > > 
> > > > > I do not agree with the above conclusion. My understanding of
> > > > > acquire/release labels is that if the following holds:
> > > > > (1) A store operation that stores the value V into memory location M has
> > > > > a release label.
> > > > > (2) A load operation that reads memory location M has an acquire label.
> > > > > (3) The load operation (2) retrieves the value V that was stored by (1).
> > > > > 
> > > > > that the following ordering property holds: all load and store
> > > > > instructions that happened before the store instruction (1) in program
> > > > > order are guaranteed to happen before the load and store instructions
> > > > > that follow (2) in program order.
> > > > > 
> > > > > In the ARM manual these semantics have been described as follows: "A
> > > > > Store-Release instruction is multicopy atomic when observed with a
> > > > > Load-Acquire instruction".
> > > > > 
> > > > > In this case the load-acquire operation is the
> > > > > "test_and_set_bit_lock(nr, word)" statement from the sbitmap code. That
> > > > > code is executed indirectly by blk_mq_get_tag(). Since there is no
> > > > > matching store-release instruction in __blk_mq_alloc_request() for
> > > > > 'word', ordering of the &data->hctx->state and 'tag' memory locations is
> > > > > not guaranteed by the acquire property of the "test_and_set_bit_lock(nr,
> > > > > word)" statement from the sbitmap code.
> > > > 
> > > > I feel like I just parachuted into the middle of the conversation,
> > > > so let me start by giving a (silly) example illustrating the limits of
> > > > smp_mb__{before,after}_atomic() that might be tangling things up.
> > > > 
> > > > But please please please avoid doing this in real code unless you have
> > > > an extremely good reason included in a comment.
> > > > 
> > > > void t1(void)
> > > > {
> > > > 	WRITE_ONCE(a, 1);
> > > > 	smp_mb__before_atomic();
> > > > 	WRITE_ONCE(b, 1);  // Just Say No to code here!!!
> > > > 	atomic_inc(&c);
> > > > 	WRITE_ONCE(d, 1);  // Just Say No to code here!!!
> > > > 	smp_mb__after_atomic();
> > > > 	WRITE_ONCE(e, 1);
> > > > }
> > > > 
> > > > void t2(void)
> > > > {
> > > > 	r1 = READ_ONCE(e);
> > > > 	smp_mb();
> > > > 	r2 = READ_ONCE(d);
> > > > 	smp_mb();
> > > > 	r3 = READ_ONCE(c);
> > > > 	smp_mb();
> > > > 	r4 = READ_ONCE(b);
> > > > 	smp_mb();
> > > > 	r5 = READ_ONCE(a);
> > > > }
> > > > 
> > > > Each platform must provide strong ordering for either atomic_inc()
> > > > on the one hand (as ia64 does) or for smp_mb__{before,after}_atomic()
> > > > on the other (as powerpc does).  Note that both ia64 and powerpc are
> > > > weakly ordered.
> > > > 
> > > > So ia64 could see (r1 == 1 && r2 == 0) on the one hand as well as (r4 ==
> > > > 1 && r5 == 0).  So clearly smp_mb__{before,after}_atomic() need not have
> > > > any ordering properties whatsoever.
> > > > 
> > > > Similarly, powerpc could see (r3 == 1 && r4 == 0) on the one hand as well
> > > > as (r2 == 1 && r3 == 0) on the other.  Or even both at the same time.
> > > > So clearly atomic_inc() need not have any ordering properties whatsoever.
> > > > 
> > > > But the combination of smp_mb__before_atomic() and the later atomic_inc()
> > > > does provide full ordering, so that no architecture can see (r3 == 1 &&
> > > > r5 == 0), and either of r1 or r2 can be substituted for r3.
> > > > 
> > > > Similarly, atomic_inc() and the later smp_mb__after_atomic() also
> > > > provide full ordering, so that no architecture can see (r1 == 1 && r3 ==
> > > > 0), and either r4 or r5 can be substituted for r3.
> > > > 
> > > > 
> > > > So a call to set_bit() followed by a call to smp_mb__after_atomic() will
> > > > provide a full memory barrier (implying release semantics) for any write
> > > > access after the smp_mb__after_atomic() with respect to the set_bit() or
> > > > any access preceding it.  But the set_bit() by itself won't have release
> > > > semantics, nor will the smp_mb__after_atomic(), only their combination
> > > > further combined with some write following the smp_mb__after_atomic().
> > > > 
> > > > More generally, there will be the equivalent of smp_mb() somewhere between
> > > > the set_bit() and every access following the smp_mb__after_atomic().
> > > > 
> > > > Does that help, or am I missing the point?
> > > 
> > > Yeah, it does help.
> > > 
> > > BTW, can we replace the smp_mb__after_atomic() with smp_mb() for
> > > ordering set_bit() and the memory OP following the smp_mb()?
> > 
> > Placing an smp_mb() between set_bit() and a later access will indeed
> > order set_bit() with that later access.
> > 
> > That said, I don't know this code well enough to say whether or not
> > that ordering is sufficient.
> 
> Another pair is in blk_mq_get_tag(), and we expect the following two
> memory OPs are ordered:
> 
> 1) set bit in successful test_and_set_bit_lock(), which is called
> from sbitmap_get()
> 
> 2) test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state)
> 
> Do you think that the above two OPs are ordered?

Given that he has been through the code, I would like to hear Bart's
thoughts, actually.

							Thanx, Paul


* Re: [PATCH 8/8] blk-mq: drain I/O when all CPUs in a hctx are offline
  2020-05-29 18:13                     ` Paul E. McKenney
@ 2020-05-29 19:55                       ` Bart Van Assche
  2020-05-29 21:12                         ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2020-05-29 19:55 UTC (permalink / raw)
  To: paulmck, Ming Lei
  Cc: Christoph Hellwig, linux-block, John Garry, Hannes Reinecke,
	Thomas Gleixner, linux-kernel

On 2020-05-29 11:13, Paul E. McKenney wrote:
> On Fri, May 29, 2020 at 11:53:15AM +0800, Ming Lei wrote:
>> Another pair is in blk_mq_get_tag(), and we expect the following two
>> memory OPs are ordered:
>>
>> 1) set bit in successful test_and_set_bit_lock(), which is called
>> from sbitmap_get()
>>
>> 2) test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state)
>>
>> Do you think that the above two OPs are ordered?
> 
> Given that he has been through the code, I would like to hear Bart's
> thoughts, actually.

Hi Paul,

My understanding of the involved instructions is as follows (see also
https://lore.kernel.org/linux-block/b98f055f-6f38-a47c-965d-b6bcf4f5563f@huawei.com/T/#t
for the entire e-mail thread):
* blk_mq_hctx_notify_offline() sets the BLK_MQ_S_INACTIVE bit in
hctx->state, calls smp_mb__after_atomic() and waits in a loop until all
tags have been freed. Each tag is an integer number that has a 1:1
correspondence with a block layer request structure. The code that
iterates over block layer request tags relies on
__sbitmap_for_each_set(). That function examines both the 'word' and
'cleared' members of struct sbitmap_word.
* What blk_mq_hctx_notify_offline() waits for is freeing of tags by
blk_mq_put_tag(). blk_mq_put_tag() frees a tag by setting a bit in
sbitmap_word.cleared (see also sbitmap_deferred_clear_bit()).
* Tag allocation by blk_mq_get_tag() relies on test_and_set_bit_lock().
The actual allocation happens by sbitmap_get() that sets a bit in
sbitmap_word.word. blk_mq_get_tag() tests the BLK_MQ_S_INACTIVE bit
after tag allocation succeeded.

What confuses me is that blk_mq_hctx_notify_offline() uses
smp_mb__after_atomic() to enforce the order of memory accesses while
blk_mq_get_tag() relies on the acquire semantics of
test_and_set_bit_lock(). Usually ordering is enforced by combining two
smp_mb() calls or by combining a store-release with a load-acquire.
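
Expressed as a litmus-test-like sketch (variable names invented; the
real code uses the sbitmap word for the tag and hctx->state for the
flag), the pattern in patch 8/8 appears to be:

	/* P0: blk_mq_hctx_notify_offline() */
	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);
	smp_mb__after_atomic();
	r0 = READ_ONCE(tag_word);			/* the drain loop */

	/* P1: blk_mq_get_tag() */
	test_and_set_bit_lock(tag_nr, &tag_word);	/* "load-acquire", sets the tag bit */
	r1 = test_bit(BLK_MQ_S_INACTIVE, &hctx->state);

	/* patch 8/8 assumes that (r0 == 0 && r1 == 0) is forbidden */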

Does the Linux memory model provide the expected ordering guarantees
when combining load-acquire with smp_mb__after_atomic() as used in patch
8/8 of this series?

Thanks,

Bart.


* Re: [PATCH 8/8] blk-mq: drain I/O when all CPUs in a hctx are offline
  2020-05-29 19:55                       ` Bart Van Assche
@ 2020-05-29 21:12                         ` Paul E. McKenney
  0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2020-05-29 21:12 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Ming Lei, Christoph Hellwig, linux-block, John Garry,
	Hannes Reinecke, Thomas Gleixner, linux-kernel

On Fri, May 29, 2020 at 12:55:43PM -0700, Bart Van Assche wrote:
> On 2020-05-29 11:13, Paul E. McKenney wrote:
> > On Fri, May 29, 2020 at 11:53:15AM +0800, Ming Lei wrote:
> >> Another pair is in blk_mq_get_tag(), and we expect the following two
> >> memory OPs are ordered:
> >>
> >> 1) set bit in successful test_and_set_bit_lock(), which is called
> >> from sbitmap_get()
> >>
> >> 2) test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state)
> >>
> >> Do you think that the above two OPs are ordered?
> > 
> > Given that he has been through the code, I would like to hear Bart's
> > thoughts, actually.
> 
> Hi Paul,
> 
> My understanding of the involved instructions is as follows (see also
> https://lore.kernel.org/linux-block/b98f055f-6f38-a47c-965d-b6bcf4f5563f@huawei.com/T/#t
> for the entire e-mail thread):
> * blk_mq_hctx_notify_offline() sets the BLK_MQ_S_INACTIVE bit in
> hctx->state, calls smp_mb__after_atomic() and waits in a loop until all
> tags have been freed. Each tag is an integer number that has a 1:1
> correspondence with a block layer request structure. The code that
> iterates over block layer request tags relies on
> __sbitmap_for_each_set(). That function examines both the 'word' and
> 'cleared' members of struct sbitmap_word.
> * What blk_mq_hctx_notify_offline() waits for is freeing of tags by
> blk_mq_put_tag(). blk_mq_put_tag() frees a tag by setting a bit in
> sbitmap_word.cleared (see also sbitmap_deferred_clear_bit()).
> * Tag allocation by blk_mq_get_tag() relies on test_and_set_bit_lock().
> The actual allocation happens by sbitmap_get() that sets a bit in
> sbitmap_word.word. blk_mq_get_tag() tests the BLK_MQ_S_INACTIVE bit
> after tag allocation succeeded.
> 
> What confuses me is that blk_mq_hctx_notify_offline() uses
> smp_mb__after_atomic() to enforce the order of memory accesses while
> blk_mq_get_tag() relies on the acquire semantics of
> test_and_set_bit_lock(). Usually ordering is enforced by combining two
> smp_mb() calls or by combining a store-release with a load-acquire.
> 
> Does the Linux memory model provide the expected ordering guarantees
> when combining load-acquire with smp_mb__after_atomic() as used in patch
> 8/8 of this series?

Strictly speaking, smp_mb__after_atomic() works only in combination
with a non-value-returning atomic operation. Let's look at a (silly)
example where smp_mb__after_atomic() would not help in conjunction
with smp_store_release():

void thread1(void)
{
	smp_store_release(&x, 1);
	smp_mb__after_atomic();
	r1 = READ_ONCE(y);
}

void thread2(void)
{
	smp_store_release(&y, 1);
	smp_mb__after_atomic();
	r2 = READ_ONCE(x);
}

Even on x86 (or perhaps especially on x86) it is quite possible that
execution could end with r1 == r2 == 0 because on x86 there is no
ordering whatsoever from smp_mb__after_atomic().  In this case,
the CPU is well within its rights to reorder each thread's store
with its later load.  Yes, even x86.

On the other hand, suppose that the stores are non-value-returning
atomics:

void thread1(void)
{
	atomic_inc(&x);
	smp_mb__after_atomic();
	r1 = READ_ONCE(y);
}

void thread2(void)
{
	atomic_inc(&y);
	smp_mb__after_atomic();
	r2 = READ_ONCE(x);
}

In this case, for all architectures, there would be the equivalent
of an smp_mb() full barrier associated with either the atomic_inc()
or the smp_mb__after_atomic(), which would rule out the case where
execution ends with r1 == r2 == 0.

Does that help?

							Thanx, Paul

