All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	Jianchao Wang <jianchao.w.wang@oracle.com>,
	Kent Overstreet <kent.overstreet@gmail.com>
Subject: Re: [PATCH] percpu-refcount: relax limit on percpu_ref_reinit()
Date: Wed, 19 Sep 2018 10:51:49 +0800	[thread overview]
Message-ID: <20180919025148.GB20560@ming.t460p> (raw)
In-Reply-To: <20180918124909.GA902964@devbig004.ftw2.facebook.com>

Hi Tejun,

On Tue, Sep 18, 2018 at 05:49:09AM -0700, Tejun Heo wrote:
> Hello, Ming.
> 
> Sorry about the delay.
> 
> On Thu, Sep 13, 2018 at 06:11:40AM +0800, Ming Lei wrote:
> > > Yeah but what guards ->release() starting to run and then the ref
> > > being switched to percpu mode?  Or maybe that doesn't matter?
> > 
> > OK, we may add synchronize_rcu() just after clearing the DEAD flag in
> > the new introduced helper to avoid the race.
> 
> That doesn't make sense to me.  How is synchronize_rcu() gonna change
> anything there?

As you saw in the new post, synchronize_rcu() isn't used for avoiding
the race. Instead, it is done by grabbing one extra ref on atomic part.

> 
> > > > 4) after the queue is recovered(or the controller is reset successfully), it
> > > > isn't necessary to wait until the refcount drops zero, since it is fine to
> > > > reinit it by clearing DEAD and switching back to percpu mode from atomic mode.
> > > > And waiting for the refcount dropping to zero in the reset handler may trigger
> > > > IO hang if IO timeout happens again during reset.
> > > 
> > > Does the recovery need the in-flight commands actually drained or does
> > > it just need to block new issues for a while.  If latter, why is
> > 
> > The recovery needn't to drain the in-flight commands actually.
> 
> Is it just waiting till confirm_kill is called?  So that new ref is
> not given away?  If synchronization like that is gonna work, the
> percpu ref operations on the reader side must be wrapped in a larger
> critical region, which brings up two issues.
> 
> 1. Callers of percpu_ref must not depend on what internal
>    synchronization construct percpu_ref uses.  Again, percpu_ref
>    doesn't even use regular RCU.
> 
> 2. If there is already an outer RCU protection around ref operation,
>    that RCU critical section can and should be used for
>    synchronization, not percpu_ref.

I guess the above doesn't apply any more because there isn't new 
synchronize_rcu() introduced in my new post.

> 
> > > percpu_ref even being used?
> > 
> > Just for avoiding to invent a new wheel, especially .q_usage_counter
> > has served for this purpose for long time.
> 
> It sounds like this was more of an abuse.  So, basically what you want
> is sth like the following.
> 
> READER
> 
>  rcu_read_lock();
>  if (can_issue_new_commands)
> 	issue;
>  else
> 	abort;
>  rcu_read_unlock();
> 
> WRITER
> 
>  can_issue_new_commands = false;
>  synchronize_rcu();
>  // no new command will be issued anymore
> 
> Right?  There isn't much wheel to reinvent here and using percpu_ref
> for the above is likely already incorrect due to the different RCU
> type being used.

No RCU story any more, :-)

It might work, but still a reinvented wheel since perpcu-refcount does
provide same function. Not mention the inter-action between the two
mechanism may have to be considered.

Also there is still cost introduced in WRITER side, and the
synchronize_rcu() often takes a bit long, especially there might be lots
of namespaces, each need to run one synchronize_rcu(). We have learned
lessons in converting to blk-mq for scsi, in which synchronize_rcu()
introduces long delay in booting.


Thanks,
Ming

WARNING: multiple messages have this Message-ID (diff)
From: ming.lei@redhat.com (Ming Lei)
Subject: [PATCH] percpu-refcount: relax limit on percpu_ref_reinit()
Date: Wed, 19 Sep 2018 10:51:49 +0800	[thread overview]
Message-ID: <20180919025148.GB20560@ming.t460p> (raw)
In-Reply-To: <20180918124909.GA902964@devbig004.ftw2.facebook.com>

Hi Tejun,

On Tue, Sep 18, 2018@05:49:09AM -0700, Tejun Heo wrote:
> Hello, Ming.
> 
> Sorry about the delay.
> 
> On Thu, Sep 13, 2018@06:11:40AM +0800, Ming Lei wrote:
> > > Yeah but what guards ->release() starting to run and then the ref
> > > being switched to percpu mode?  Or maybe that doesn't matter?
> > 
> > OK, we may add synchronize_rcu() just after clearing the DEAD flag in
> > the new introduced helper to avoid the race.
> 
> That doesn't make sense to me.  How is synchronize_rcu() gonna change
> anything there?

As you saw in the new post, synchronize_rcu() isn't used for avoiding
the race. Instead, it is done by grabbing one extra ref on atomic part.

> 
> > > > 4) after the queue is recovered(or the controller is reset successfully), it
> > > > isn't necessary to wait until the refcount drops zero, since it is fine to
> > > > reinit it by clearing DEAD and switching back to percpu mode from atomic mode.
> > > > And waiting for the refcount dropping to zero in the reset handler may trigger
> > > > IO hang if IO timeout happens again during reset.
> > > 
> > > Does the recovery need the in-flight commands actually drained or does
> > > it just need to block new issues for a while.  If latter, why is
> > 
> > The recovery needn't to drain the in-flight commands actually.
> 
> Is it just waiting till confirm_kill is called?  So that new ref is
> not given away?  If synchronization like that is gonna work, the
> percpu ref operations on the reader side must be wrapped in a larger
> critical region, which brings up two issues.
> 
> 1. Callers of percpu_ref must not depend on what internal
>    synchronization construct percpu_ref uses.  Again, percpu_ref
>    doesn't even use regular RCU.
> 
> 2. If there is already an outer RCU protection around ref operation,
>    that RCU critical section can and should be used for
>    synchronization, not percpu_ref.

I guess the above doesn't apply any more because there isn't new 
synchronize_rcu() introduced in my new post.

> 
> > > percpu_ref even being used?
> > 
> > Just for avoiding to invent a new wheel, especially .q_usage_counter
> > has served for this purpose for long time.
> 
> It sounds like this was more of an abuse.  So, basically what you want
> is sth like the following.
> 
> READER
> 
>  rcu_read_lock();
>  if (can_issue_new_commands)
> 	issue;
>  else
> 	abort;
>  rcu_read_unlock();
> 
> WRITER
> 
>  can_issue_new_commands = false;
>  synchronize_rcu();
>  // no new command will be issued anymore
> 
> Right?  There isn't much wheel to reinvent here and using percpu_ref
> for the above is likely already incorrect due to the different RCU
> type being used.

No RCU story any more, :-)

It might work, but still a reinvented wheel since perpcu-refcount does
provide same function. Not mention the inter-action between the two
mechanism may have to be considered.

Also there is still cost introduced in WRITER side, and the
synchronize_rcu() often takes a bit long, especially there might be lots
of namespaces, each need to run one synchronize_rcu(). We have learned
lessons in converting to blk-mq for scsi, in which synchronize_rcu()
introduces long delay in booting.


Thanks,
Ming

  reply	other threads:[~2018-09-19  8:27 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-09 12:58 [PATCH] percpu-refcount: relax limit on percpu_ref_reinit() Ming Lei
2018-09-09 18:46 ` Bart Van Assche
2018-09-09 23:59   ` Ming Lei
2018-09-10  1:40 ` jianchao.wang
2018-09-10 16:11   ` Ming Lei
2018-09-11  1:48     ` jianchao.wang
2018-09-11  4:03       ` Ming Lei
2018-09-11  4:40         ` jianchao.wang
2018-09-11  4:40           ` jianchao.wang
2018-09-11  8:20           ` Ming Lei
2018-09-11 14:22             ` jianchao.wang
2018-09-11 13:44           ` Tejun Heo
2018-09-11 14:13             ` jianchao.wang
2018-09-10  1:54 ` jianchao.wang
2018-09-10 16:49 ` Tejun Heo
2018-09-11  0:00   ` Ming Lei
2018-09-11 13:48     ` Tejun Heo
2018-09-11 15:45       ` Ming Lei
2018-09-11 15:49         ` Tejun Heo
2018-09-11 16:05           ` Ming Lei
2018-09-11 16:30             ` Tejun Heo
2018-09-11 16:34               ` Ming Lei
2018-09-11 16:38                 ` Tejun Heo
2018-09-12  1:52                   ` Ming Lei
2018-09-12 15:53                     ` Tejun Heo
2018-09-12 22:11                       ` Ming Lei
2018-09-12 22:11                         ` Ming Lei
2018-09-18 12:49                         ` Tejun Heo
2018-09-18 12:49                           ` Tejun Heo
2018-09-19  2:51                           ` Ming Lei [this message]
2018-09-19  2:51                             ` Ming Lei
2018-09-19 20:36                             ` Tejun Heo
2018-09-19 20:36                               ` Tejun Heo
2018-09-18  3:21 ` jianchao.wang
2018-09-18  7:34   ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180919025148.GB20560@ming.t460p \
    --to=ming.lei@redhat.com \
    --cc=jianchao.w.wang@oracle.com \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.