From: Luis Chamberlain <mcgrof@kernel.org>
To: Jan Kara <jack@suse.cz>
Cc: Bart Van Assche <bvanassche@acm.org>,
	Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org
Subject: Re: [PATCH] blktrace: Avoid sparse warnings when assigning q->blk_trace
Date: Fri, 29 May 2020 11:43:00 +0000
Message-ID: <20200529114300.GA11244@42.do-not-panic.com>
In-Reply-To: <20200529090448.GN14550@quack2.suse.cz>

On Fri, May 29, 2020 at 11:04:48AM +0200, Jan Kara wrote:
> On Fri 29-05-20 08:00:56, Luis Chamberlain wrote:
> > On Thu, May 28, 2020 at 08:55:39PM +0200, Jan Kara wrote:
> > > On Thu 28-05-20 18:43:33, Luis Chamberlain wrote:
> > > > On Thu, May 28, 2020 at 08:31:52PM +0200, Jan Kara wrote:
> > > > > On Thu 28-05-20 07:44:38, Bart Van Assche wrote:
> > > > > > (+Luis)
> > > > > > 
> > > > > > On 2020-05-28 02:29, Jan Kara wrote:
> > > > > > > Mostly for historical reasons, q->blk_trace is assigned through xchg()
> > > > > > > and cmpxchg() atomic operations. Although this is correct, sparse
> > > > > > > complains about this because it violates rcu annotations. Furthermore
> > > > > > > there's no real need for atomic operations anymore since all changes to
> > > > > > > q->blk_trace happen under q->blk_trace_mutex. So let's just replace
> > > > > > > xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and
> > > > > > > rcu_assign_pointer(). This makes the code more efficient and sparse
> > > > > > > happy.
> > > > > > > 
> > > > > > > Reported-by: kbuild test robot <lkp@intel.com>
> > > > > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > > > > 
> > > > > > How about adding a reference to commit c780e86dd48e ("blktrace: Protect
> > > > > > q->blk_trace with RCU") in the description of this patch?
> > > > > 
> > > > > Yes, that's probably a good idea.
> > > > > 
> > > > > > > @@ -1669,10 +1672,7 @@ static int blk_trace_setup_queue(struct request_queue *q,
> > > > > > >  
> > > > > > >  	blk_trace_setup_lba(bt, bdev);
> > > > > > >  
> > > > > > > -	ret = -EBUSY;
> > > > > > > -	if (cmpxchg(&q->blk_trace, NULL, bt))
> > > > > > > -		goto free_bt;
> > > > > > > -
> > > > > > > +	rcu_assign_pointer(q->blk_trace, bt);
> > > > > > >  	get_probe_ref();
> > > > > > >  	return 0;
> > > > > > 
> > > > > > This changes a conditional assignment of q->blk_trace into an
> > > > > > unconditional assignment. Shouldn't q->blk_trace only be assigned if
> > > > > > q->blk_trace == NULL?
> > > > > 
> > > > > Yes but both callers of blk_trace_setup_queue() actually check that
> > > > > q->blk_trace is NULL before calling blk_trace_setup_queue() and since we
> > > > > hold blk_trace_mutex all the time, the value of q->blk_trace cannot change.
> > > > > So the conditional assignment was just bogus.
> > > > 
> > > > If you run a blktrace against a different partition the check does have
> > > > an effect today. This is because the request_queue is shared between
> > > > partitions implicitly, even though they end up using a different struct
> > > > dentry. So the check is actually still needed, however my change adds
> > > > this check early as well so we don't do a memory allocation just to
> > > > throw it away.
> > > 
> > > I'm not sure we are speaking about the same check but I might be missing
> > > something. blk_trace_setup_queue() is only called from
> > > sysfs_blk_trace_attr_store(). That does:
> > > 
> > >         mutex_lock(&q->blk_trace_mutex);
> > > 
> > >         bt = rcu_dereference_protected(q->blk_trace,
> > >                                        lockdep_is_held(&q->blk_trace_mutex));
> > >         if (attr == &dev_attr_enable) {
> > >                 if (!!value == !!bt) {
> > >                         ret = 0;
> > >                         goto out_unlock_bdev;
> > >                 }
> > > 
> > > 		^^^ So if 'bt' is non-NULL, and we are enabling, we bail
> > > instead of calling blk_trace_setup_queue().
> > > 
> > > Similarly later:
> > > 
> > >         if (bt == NULL) {
> > >                 ret = blk_trace_setup_queue(q, bdev);
> > > 	...
> > > so we again call blk_trace_setup_queue() only if bt is NULL. So IMO the
> > > cmpxchg() in blk_trace_setup_queue() could never fail to set the value.
> > > Am I missing something?
> > 
> > I believe we are talking about the same check indeed. Consider the
> > situation not as a race, but instead consider the state machine of
> > the ioctl. The BLKTRACESETUP goes first, and when that is over we
> > > have not run BLKTRACESTART. So, prior to BLKTRACESTART we can have
> > another BLKTRACESETUP run but against another partition.
> 
> So first note that BLKTRACESETUP goes through do_blk_trace_setup() while
> 'echo 1 >/sys/block/../trace/enable' goes through blk_trace_setup_queue().
> Although these operations achieve very similar things, they are completely
> separate code paths. I was speaking about the second case while you are now
> speaking about the first one.
> 
> WRT your BLKTRACESETUP example, the first BLKTRACESETUP will end up
> setting q->blk_trace to 'bt' so the second BLKTRACESETUP will see
> q->blk_trace is not NULL (my patch adds this check to do_blk_trace_setup()
> so we bail out earlier than during cmpxchg()) and fails. Again I don't see
> any problem here...

Ah, the patch I was CC'd on didn't contain this hunk! It only had the
change from cmpxchg() to the rcu_assign_pointer(), so I misunderstood
your intention, sorry!

In that case, I already proposed a patch that adds this early check. It
also prints a short message, since we currently don't tell the user why
the setup fails [0].
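
For illustration only, the check-before-assign ordering discussed above
can be modelled in userspace C roughly as follows. This is a sketch under
assumptions, not the kernel code: a pthread mutex stands in for
q->blk_trace_mutex, a plain pointer store stands in for
rcu_assign_pointer(), and struct queue, struct trace, trace_setup() and
trace_setup_locked() are hypothetical names made up for this example.

```c
/*
 * Userspace model only. The names below are hypothetical; a pthread
 * mutex stands in for q->blk_trace_mutex and a plain pointer stands in
 * for the RCU-protected q->blk_trace.
 */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct trace {
	int id;
};

struct queue {
	pthread_mutex_t trace_mutex;
	struct trace *trace;	/* NULL means no trace is set up */
};

/* Caller must hold q->trace_mutex. */
static int trace_setup_locked(struct queue *q, int id)
{
	struct trace *bt;

	assert(q != NULL);

	/* Check first, so nothing is allocated just to be thrown away. */
	if (q->trace) {
		fprintf(stderr, "queue already has a trace set up (id %d)\n",
			q->trace->id);
		return -EBUSY;
	}

	bt = malloc(sizeof(*bt));
	if (!bt)
		return -ENOMEM;
	bt->id = id;

	/*
	 * All writers are serialized by the mutex, so no cmpxchg() is
	 * needed. Kernel code would publish with rcu_assign_pointer().
	 */
	q->trace = bt;
	return 0;
}

/* Takes the mutex around the check and the assignment. */
static int trace_setup(struct queue *q, int id)
{
	int ret;

	pthread_mutex_lock(&q->trace_mutex);
	ret = trace_setup_locked(q, id);
	pthread_mutex_unlock(&q->trace_mutex);
	return ret;
}
```

In this model a second trace_setup() on the same queue returns -EBUSY
before any allocation happens, and the message gives the user a hint as
to why.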

Let me know how you folks would like to proceed.

[0] https://lkml.kernel.org/r/20200516031956.2605-7-mcgrof@kernel.org

  Luis
