From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753895AbbJ0Fcp (ORCPT ); Tue, 27 Oct 2015 01:32:45 -0400
Received: from e31.co.us.ibm.com ([32.97.110.149]:45369 "EHLO
	e31.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751280AbbJ0Fcn (ORCPT );
	Tue, 27 Oct 2015 01:32:43 -0400
X-IBM-Helo: d03dlp03.boulder.ibm.com
X-IBM-MailFrom: paulmck@linux.vnet.ibm.com
X-IBM-RcptTo: linux-kernel@vger.kernel.org
Date: Mon, 26 Oct 2015 22:32:36 -0700
From: "Paul E. McKenney"
To: Linus Torvalds
Cc: Tejun Heo, Ingo Molnar, Linux Kernel Mailing List, Lai Jiangshan,
	Dipankar Sarma, Andrew Morton, Mathieu Desnoyers, Josh Triplett,
	Thomas Gleixner, Peter Zijlstra, Steven Rostedt, David Howells,
	Eric Dumazet, Darren Hart, Frédéric Weisbecker, Oleg Nesterov,
	pranith kumar, Patrick Marlier
Subject: Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use
	lockless_dereference()
Message-ID: <20151027053236.GK5105@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20151006161305.GA9799@linux.vnet.ibm.com>
	<1444148028-11551-1-git-send-email-paulmck@linux.vnet.ibm.com>
	<1444148028-11551-11-git-send-email-paulmck@linux.vnet.ibm.com>
	<20151026084506.GA28423@gmail.com>
	<20151026145552.GG5105@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 15102705-8236-0000-0000-0000130E97D4
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 27, 2015 at 12:37:16PM +0900, Linus Torvalds wrote:
> On Mon, Oct 26, 2015 at 11:55 PM, Paul E. McKenney wrote:
> >>         struct bdi_writeback *last_wb = NULL;
> >>         struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list,
> >
> > I believe that the above should instead be:
> >
> >         struct bdi_writeback *wb = list_entry_rcu(bdi->wb_list.next,
>
> I don't think you can do that.
>
> You haven't even taken the RCU read lock yet at this point.
>
> What the code seems to try to do is to get the "head pointer" of the
> list before taking the read lock (since _that_ is stable), and then
> follow the list under the lock.
>
> You're making it actually follow the first RCU pointer too early.

Good point, color me dazed and confused.  :-/

							Thanx, Paul

> That said, I'm not sure why it doesn't just do the normal
>
>     rcu_read_lock();
>     list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) {
>         ....
>     }
>     rcu_read_unlock();
>
> like the other places do. It looks like it wants that
> "list_for_each_entry_continue_rcu()" because it does that odd "pin
> entry and drop rcu lock and retake it and continue where you left
> off", but I'm not sure why the continue version would be so
> different.. It's going to do that "follow next entry" regardless, and
> the "goto restart" doesn't look like it actually adds anything. If
> following the next pointer is ok even after having released the RCU
> read lock, then I'm not seeing why the end of the loop couldn't just
> do
>
>     rcu_read_unlock();
>     wb_wait_for_completion(bdi, &fallback_work_done);
>     rcu_read_lock();
>
> and just continue the loop (and the pinning of "wb" and releasing the
> "last_wb" thing in the *next* iteration should make it all work the
> same).
>
> Adding Tejun to the cc, because this is his code and there's probably
> something subtle I'm missing. Tejun, can you take a look? It's
> bdi_split_work_to_wbs() in fs/fs-writeback.c.
>
>                     Linus
>
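
For readers following the exchange above, here is a small userspace sketch of the distinction Linus draws: taking the address of the list head loads no pointer and so is safe before the read lock, whereas reading head->next is already the first pointer load and belongs inside the read-side section. Everything in it is a hypothetical stand-in, not kernel code: mock_read_lock(), mock_read_unlock(), load_next(), struct item, and the unlocked_loads counter are invented for illustration and only count where loads happen. They provide none of the ordering or grace-period guarantees of real RCU.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins only; none of this is the kernel's implementation. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct item {
	int id;
	struct list_head node;
};

static int read_depth;		/* nesting depth of the mock read-side section */
static int unlocked_loads;	/* ->next loads performed outside that section */

static void mock_read_lock(void)
{
	read_depth++;
}

static void mock_read_unlock(void)
{
	assert(read_depth > 0);
	read_depth--;
}

/* Load p->next and record whether the load happened outside the section. */
static struct list_head *load_next(struct list_head *p)
{
	if (read_depth == 0)
		unlocked_loads++;
	return p->next;
}

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Start from the head's address: no pointer is loaded before the lock. */
static int sum_from_head(struct list_head *head)
{
	struct list_head *pos = head;	/* address only, nothing dereferenced */
	int sum = 0;

	mock_read_lock();
	for (pos = load_next(pos); pos != head; pos = load_next(pos))
		sum += container_of(pos, struct item, node)->id;
	mock_read_unlock();
	return sum;
}

/* Start from head->next: the first load happens before the lock. */
static int sum_from_first(struct list_head *head)
{
	struct list_head *pos = load_next(head);	/* early dereference */
	int sum = 0;

	mock_read_lock();
	for (; pos != head; pos = load_next(pos))
		sum += container_of(pos, struct item, node)->id;
	mock_read_unlock();
	return sum;
}
```

Both walkers visit the same entries and return the same sum on a quiescent list. The only observable difference in this sketch is that sum_from_first() bumps unlocked_loads once, which corresponds to the "follow the first RCU pointer too early" objection. Whether the real bdi_split_work_to_wbs() loop can be simplified as suggested is a separate question that this sketch does not address.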