From: "Paul E. McKenney" <paulmck@kernel.org>
To: Ming Lei <ming.lei@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>,
	Christoph Hellwig <hch@lst.de>,
	linux-block@vger.kernel.org, John Garry <john.garry@huawei.com>,
	Hannes Reinecke <hare@suse.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 8/8] blk-mq: drain I/O when all CPUs in a hctx are offline
Date: Thu, 28 May 2020 20:07:28 -0700
Message-ID: <20200529030728.GW2869@paulmck-ThinkPad-P72>
In-Reply-To: <20200529015304.GC1075489@T590>

On Fri, May 29, 2020 at 09:53:04AM +0800, Ming Lei wrote:
> Hi Paul,
> 
> Thanks for your response!
> 
> On Thu, May 28, 2020 at 10:21:21AM -0700, Paul E. McKenney wrote:
> > On Thu, May 28, 2020 at 06:37:47AM -0700, Bart Van Assche wrote:
> > > On 2020-05-27 22:19, Ming Lei wrote:
> > > > On Wed, May 27, 2020 at 08:33:48PM -0700, Bart Van Assche wrote:
> > > >> My understanding is that operations that have acquire semantics pair
> > > >> with operations that have release semantics. I haven't been able to find
> > > >> any documentation that shows that smp_mb__after_atomic() has release
> > > >> semantics. So I looked up its definition. This is what I found:
> > > >>
> > > > >> $ git grep -nH 'define __smp_mb__after_atomic'
> > > > >> arch/ia64/include/asm/barrier.h:49:#define __smp_mb__after_atomic() barrier()
> > > > >> arch/mips/include/asm/barrier.h:133:#define __smp_mb__after_atomic() smp_llsc_mb()
> > > > >> arch/s390/include/asm/barrier.h:50:#define __smp_mb__after_atomic() barrier()
> > > > >> arch/sparc/include/asm/barrier_64.h:57:#define __smp_mb__after_atomic() barrier()
> > > > >> arch/x86/include/asm/barrier.h:83:#define __smp_mb__after_atomic()	do { } while (0)
> > > > >> arch/xtensa/include/asm/barrier.h:20:#define __smp_mb__after_atomic()	barrier()
> > > > >> include/asm-generic/barrier.h:116:#define __smp_mb__after_atomic() __smp_mb()
> > > >>
> > > >> My interpretation of the above is that not all smp_mb__after_atomic()
> > > >> implementations have release semantics. Do you agree with this conclusion?
> > > > 
> > > > I understand smp_mb__after_atomic() orders set_bit(BLK_MQ_S_INACTIVE)
> > > > and reading the tag bit which is done in blk_mq_all_tag_iter().
> > > > 
> > > > > So the two pairs of operations are ordered:
> > > > 
> > > > 1) if one request(tag bit) is allocated before setting BLK_MQ_S_INACTIVE,
> > > > the tag bit will be observed in blk_mq_all_tag_iter() from blk_mq_hctx_has_requests(),
> > > > so the request will be drained.
> > > > 
> > > > OR
> > > > 
> > > > > 2) if one request(tag bit) is allocated after setting BLK_MQ_S_INACTIVE,
> > > > > the request(tag bit) will eventually be released and retried on
> > > > > another CPU, see __blk_mq_alloc_request().
> > > > 
> > > > Cc Paul and linux-kernel list.
> > > 
> > > I do not agree with the above conclusion. My understanding of
> > > acquire/release labels is that if the following holds:
> > > (1) A store operation that stores the value V into memory location M has
> > > a release label.
> > > (2) A load operation that reads memory location M has an acquire label.
> > > (3) The load operation (2) retrieves the value V that was stored by (1).
> > > 
> > > > then the following ordering property holds: all load and store
> > > instructions that happened before the store instruction (1) in program
> > > order are guaranteed to happen before the load and store instructions
> > > that follow (2) in program order.
> > > 
> > > In the ARM manual these semantics have been described as follows: "A
> > > Store-Release instruction is multicopy atomic when observed with a
> > > Load-Acquire instruction".
> > > 
> > > In this case the load-acquire operation is the
> > > "test_and_set_bit_lock(nr, word)" statement from the sbitmap code. That
> > > code is executed indirectly by blk_mq_get_tag(). Since there is no
> > > matching store-release instruction in __blk_mq_alloc_request() for
> > > 'word', ordering of the &data->hctx->state and 'tag' memory locations is
> > > not guaranteed by the acquire property of the "test_and_set_bit_lock(nr,
> > > word)" statement from the sbitmap code.
> > 
> > I feel like I just parachuted into the middle of the conversation,
> > so let me start by giving a (silly) example illustrating the limits of
> > smp_mb__{before,after}_atomic() that might be tangling things up.
> > 
> > But please please please avoid doing this in real code unless you have
> > an extremely good reason included in a comment.
> > 
> > void t1(void)
> > {
> > 	WRITE_ONCE(a, 1);
> > 	smp_mb__before_atomic();
> > 	WRITE_ONCE(b, 1);  // Just Say No to code here!!!
> > 	atomic_inc(&c);
> > 	WRITE_ONCE(d, 1);  // Just Say No to code here!!!
> > 	smp_mb__after_atomic();
> > 	WRITE_ONCE(e, 1);
> > }
> > 
> > void t2(void)
> > {
> > 	r1 = READ_ONCE(e);
> > 	smp_mb();
> > 	r2 = READ_ONCE(d);
> > 	smp_mb();
> > 	r3 = READ_ONCE(c);
> > 	smp_mb();
> > 	r4 = READ_ONCE(b);
> > 	smp_mb();
> > 	r5 = READ_ONCE(a);
> > }
> > 
> > Each platform must provide strong ordering for either atomic_inc()
> > on the one hand (as ia64 does) or for smp_mb__{before,after}_atomic()
> > on the other (as powerpc does).  Note that both ia64 and powerpc are
> > weakly ordered.
> > 
> > So ia64 could see (r1 == 1 && r2 == 0) on the one hand as well as (r4 ==
> > > 1 && r5 == 0).  So clearly smp_mb__{before,after}_atomic() need not have
> > any ordering properties whatsoever.
> > 
> > Similarly, powerpc could see (r3 == 1 && r4 == 0) on the one hand as well
> > as (r2 == 1 && r3 == 0) on the other.  Or even both at the same time.
> > So clearly atomic_inc() need not have any ordering properties whatsoever.
> > 
> > But the combination of smp_mb__before_atomic() and the later atomic_inc()
> > does provide full ordering, so that no architecture can see (r3 == 1 &&
> > r5 == 0), and either of r1 or r2 can be substituted for r3.
> > 
> > > Similarly, atomic_inc() and the later smp_mb__after_atomic() also
> > provide full ordering, so that no architecture can see (r1 == 1 && r3 ==
> > 0), and either r4 or r5 can be substituted for r3.
> > 
> > 
> > So a call to set_bit() followed by a call to smp_mb__after_atomic() will
> > provide a full memory barrier (implying release semantics) for any write
> > access after the smp_mb__after_atomic() with respect to the set_bit() or
> > any access preceding it.  But the set_bit() by itself won't have release
> > semantics, nor will the smp_mb__after_atomic(), only their combination
> > further combined with some write following the smp_mb__after_atomic().
> > 
> > More generally, there will be the equivalent of smp_mb() somewhere between
> > the set_bit() and every access following the smp_mb__after_atomic().
> > 
> > Does that help, or am I missing the point?
> 
> Yeah, it does help.
> 
> BTW, can we replace the smp_mb__after_atomic() with smp_mb() for
> ordering set_bit() and the memory OP following the smp_mb()?

Placing an smp_mb() between set_bit() and a later access will indeed
order set_bit() with that later access.

That said, I don't know this code well enough to say whether or not
that ordering is sufficient.

						Thanx, Paul
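
The ordering discussed in this message can be tried out in userspace.
The following is a minimal sketch in C11, not blk-mq code and not taken
from the patch.  It assumes that both sides place a full barrier between
their atomic bit update and the read that follows; whether the real
allocation path does so is the open question in this thread.  All names
(hctx_state, tag_word, offline_side, alloc_side, count_missed_both) are
hypothetical stand-ins.  A relaxed atomic_fetch_or_explicit() plays the
role of set_bit(), and atomic_thread_fence(memory_order_seq_cst) plays
the role of smp_mb() or smp_mb__after_atomic().

```c
/* Userspace C11 analogy only; every name here is hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define INACTIVE_BIT 0x1u	/* stands in for BLK_MQ_S_INACTIVE */
#define TAG_BIT      0x1u	/* stands in for one tag bit */

static atomic_uint hctx_state;	/* stands in for hctx->state */
static atomic_uint tag_word;	/* stands in for the tag bitmap word */
static unsigned int offline_saw_tag;	/* read only after pthread_join() */
static unsigned int alloc_saw_inactive;	/* read only after pthread_join() */

/* Offline side: mark inactive, full fence, then look for a tag. */
static void *offline_side(void *arg)
{
	(void)arg;
	/* Relaxed read-modify-write: no ordering by itself, like set_bit(). */
	atomic_fetch_or_explicit(&hctx_state, INACTIVE_BIT,
				 memory_order_relaxed);
	/* Full fence between the bit update and the later read. */
	atomic_thread_fence(memory_order_seq_cst);
	offline_saw_tag = atomic_load_explicit(&tag_word,
					       memory_order_relaxed) & TAG_BIT;
	return NULL;
}

/* Allocation side: take a tag, full fence, then check the inactive bit. */
static void *alloc_side(void *arg)
{
	(void)arg;
	atomic_fetch_or_explicit(&tag_word, TAG_BIT, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	alloc_saw_inactive = atomic_load_explicit(&hctx_state,
						  memory_order_relaxed) &
			     INACTIVE_BIT;
	return NULL;
}

/* Count trials in which each side missed the other side's bit. */
int count_missed_both(int trials)
{
	int missed = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t a, b;
		int rc;

		atomic_store(&hctx_state, 0);
		atomic_store(&tag_word, 0);

		rc = pthread_create(&a, NULL, offline_side, NULL);
		assert(rc == 0);
		rc = pthread_create(&b, NULL, alloc_side, NULL);
		assert(rc == 0);
		(void)rc;
		pthread_join(a, NULL);
		pthread_join(b, NULL);

		if (!offline_saw_tag && !alloc_saw_inactive)
			missed++;
	}
	return missed;
}
```

Calling count_missed_both() with any trial count should return 0: with
a seq_cst fence on each side, C11 does not allow both reads to miss the
other side's update.  Dropping either fence makes that outcome legal,
although a given machine might not exhibit it in a short run.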
