From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753957AbbJ0FdW (ORCPT ); Tue, 27 Oct 2015 01:33:22 -0400 Received: from e39.co.us.ibm.com ([32.97.110.160]:38281 "EHLO e39.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753924AbbJ0FdV (ORCPT ); Tue, 27 Oct 2015 01:33:21 -0400 X-IBM-Helo: d03dlp02.boulder.ibm.com X-IBM-MailFrom: paulmck@linux.vnet.ibm.com X-IBM-RcptTo: linux-kernel@vger.kernel.org Date: Mon, 26 Oct 2015 22:33:12 -0700 From: "Paul E. McKenney" To: Tejun Heo Cc: Linus Torvalds , Ingo Molnar , Linux Kernel Mailing List , Lai Jiangshan , Dipankar Sarma , Andrew Morton , Mathieu Desnoyers , Josh Triplett , Thomas Gleixner , Peter Zijlstra , Steven Rostedt , David Howells , Eric Dumazet , Darren Hart , =?iso-8859-1?Q?Fr=E9d=E9ric?= Weisbecker , Oleg Nesterov , pranith kumar , Patrick Marlier Subject: Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference() Message-ID: <20151027053312.GL5105@linux.vnet.ibm.com> Reply-To: paulmck@linux.vnet.ibm.com References: <20151006161305.GA9799@linux.vnet.ibm.com> <1444148028-11551-1-git-send-email-paulmck@linux.vnet.ibm.com> <1444148028-11551-11-git-send-email-paulmck@linux.vnet.ibm.com> <20151026084506.GA28423@gmail.com> <20151026145552.GG5105@linux.vnet.ibm.com> <20151027051939.GA19355@mtj.duckdns.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151027051939.GA19355@mtj.duckdns.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 15102705-0033-0000-0000-0000069110CB Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Oct 27, 2015 at 02:19:39PM +0900, Tejun Heo wrote: > Hello, > > On Tue, Oct 27, 2015 at 12:37:16PM +0900, Linus Torvalds wrote: > > > I believe that the above should instead be: > > > > > > struct bdi_writeback *wb = list_entry_rcu(bdi->wb_list.next, > > I should have just used list_entry() here. It's just offseting the > pointer to set up the initial iteration point. OK, that sounds much better! > ... > > That said, I'm not sure why it doesn't just do the normal > > > > rcu_read_lock(); > > list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) { > > .... > > } > > rcu_read_unlock(); > > > > like the other places do. It looks like it wants that > > "list_for_each_entry_continue_rcu()" because it does that odd "pin > > entry and drop rcu lock and retake it and continue where you left > > off", but I'm not sure why the continue version would be so > > different.. It's going to do that "follow next entry" regardless, and > > the "goto restart" doesn't look like it actually adds anything. If > > following the next pointer is ok even after having released the RCU > > read lock, then I'm not seeing why the end of the loop couldn't just > > do > > > > rcu_read_unlock(); > > wb_wait_for_completion(bdi, &fallback_work_done); > > rcu_read_lock(); > > > > and just continue the loop (and the pinning of "wb" and releasing the > > "last_wb" thing in the *next* iteration should make it all work the > > same). > > > > Adding Tejun to the cc, because this is his code and there's probably > > something subtle I'm missing. Tejun, can you take a look? It's > > bdi_split_work_to_wbs() in fs/fs-writeback.c. > > Yeah, just releasing and regrabbing should work too as the iterator > doesn't depend on anything other than the current entry (e.g. as > opposed to imaginary list_for_each_entry_safe_rcu()). It's slightly > icky to meddle with locking behind the iterator's back tho. Either > way should be fine but how about something like the following? > > Subject: writeback: don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs() > > bdi_split_work_to_wbs() uses list_for_each_entry_rcu_continue() to > walk @bdi->wb_list. To set up the initial iteration condition, it > uses list_entry_rcu() to calculate the entry pointer corresponding to > the list head; however, this isn't an actual RCU dereference and using > list_entry_rcu() for it ended up breaking a proposed list_entry_rcu() > change because it was feeding an non-lvalue pointer into the macro. > > Don't use the RCU variant for simple pointer offsetting. Use > list_entry() instead. > > Signed-off-by: Tejun Heo Acked-by: Paul E. McKenney > --- > fs/fs-writeback.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c > index 29e4599..7378169 100644 > --- a/fs/fs-writeback.c > +++ b/fs/fs-writeback.c > @@ -779,8 +779,8 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi, > bool skip_if_busy) > { > struct bdi_writeback *last_wb = NULL; > - struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list, > - struct bdi_writeback, bdi_node); > + struct bdi_writeback *wb = list_entry(&bdi->wb_list, > + struct bdi_writeback, bdi_node); > > might_sleep(); > restart: >