Re: [PATCH 7/7] RCU, workqueue: Implement rcu_work

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Lai Jiangshan <jiangshanlai+lkml@gmail.com>
Cc: Tejun Heo <tj@kernel.org>,
	torvalds@linux-foundation.org, jannh@google.com, bcrl@kvack.org,
	viro@zeniv.linux.org.uk, kent.overstreet@gmail.com,
	security@kernel.org, LKML <linux-kernel@vger.kernel.org>,
	kernel-team@fb.com
Subject: Re: [PATCH 7/7] RCU, workqueue: Implement rcu_work
Date: Wed, 7 Mar 2018 06:54:08 -0800	[thread overview]
Message-ID: <20180307145408.GC3918@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAJhGHyAuz3TRCpPnCBzZ+om3k9N3ykCWYOq_bv+hucV=XXiY9w@mail.gmail.com>

On Wed, Mar 07, 2018 at 10:49:49AM +0800, Lai Jiangshan wrote:
> On Wed, Mar 7, 2018 at 1:33 AM, Tejun Heo <tj@kernel.org> wrote:
> 
> > +/**
> > + * queue_rcu_work_on - queue work on specific CPU after a RCU grace period
> > + * @cpu: CPU number to execute work on
> > + * @wq: workqueue to use
> > + * @rwork: work to queue
> 
> For many people, "RCU grace period" is clear enough, but not ALL.
> 
> So please make it a little more clear that it just queues work after
> a *Normal* RCU grace period. it supports only one RCU variant.
> 
> 
> > + *
> > + * Return: %false if @work was already on a queue, %true otherwise.
> > + */
> 
> I'm afraid this will be a hard-using API.
> 
> The user can't find a plan B when it returns false, especially when
> the user expects the work must be called at least once again
> after an RCU grace period.
> 
> And the error-prone part of it is that, like other queue_work() functions,
> the return value of it is often ignored and makes the problem worse.
> 
> So, since workqueue.c provides this API, it should handle this
> problem. For example, by calling call_rcu() again in this case, but
> everything will be much more complex: a synchronization is needed
> for "calling call_rcu() again" and allowing the work item called
> twice after the last queue_rcu_work() is not workqueue style.

I confess that I had not thought of this aspect, but doesn't the
rcu_barrier() in v2 of this patchset guarantee that it has passed
the RCU portion of the overall wait time?  Given that, what I am
missing is now this differs from flush_work() on a normal work item.

So could you please show me the sequence of events that leads to this
problem?  (Again, against v2 of this patch, which replaces the
synchronize_rcu() with rcu_barrier().)

> Some would argue that the delayed_work has the same problem
> when the user expects the work must be called at least once again
> after a period of time. But time interval is easy to detect, the user
> can check the time and call the queue_delayed_work() again
> when needed which is also a frequent design pattern. And
> for rcu, it is hard to use this design pattern since it is hard
> to detect (new) rcu grace period without using call_rcu().
> 
> I would not provide this API. it is not a NACK. I'm just trying
> expressing my thinking about the API. I'd rather RCU be changed
> and RCU callbacks are changed to be sleepable. But a complete
> overhaul cleanup on the whole source tree for compatibility
> is needed at first, an even more complex job.

One downside of allowing RCU callback functions to sleep is that
one poorly written callback can block a bunch of other ones.
One advantage of Tejun's approach is that such delays only affect
the workqueues, which are already designed to handle such delays.

						Thanx, Paul

> > +bool queue_rcu_work_on(int cpu, struct workqueue_struct *wq,
> > +                      struct rcu_work *rwork)
> > +{
> > +       struct work_struct *work = &rwork->work;
> > +
> > +       if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
> > +               rwork->wq = wq;
> > +               rwork->cpu = cpu;
> > +               call_rcu(&rwork->rcu, rcu_work_rcufn);
> > +               return true;
> > +       }
> > +
> > +       return false;
> > +}
> > +EXPORT_SYMBOL(queue_rcu_work_on);
> > +
>