From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753519AbbJ0Dhb (ORCPT ); Mon, 26 Oct 2015 23:37:31 -0400 Received: from mail-ig0-f180.google.com ([209.85.213.180]:38294 "EHLO mail-ig0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751572AbbJ0DhR (ORCPT ); Mon, 26 Oct 2015 23:37:17 -0400 MIME-Version: 1.0 In-Reply-To: <20151026145552.GG5105@linux.vnet.ibm.com> References: <20151006161305.GA9799@linux.vnet.ibm.com> <1444148028-11551-1-git-send-email-paulmck@linux.vnet.ibm.com> <1444148028-11551-11-git-send-email-paulmck@linux.vnet.ibm.com> <20151026084506.GA28423@gmail.com> <20151026145552.GG5105@linux.vnet.ibm.com> Date: Tue, 27 Oct 2015 12:37:16 +0900 X-Google-Sender-Auth: UqQyp-M4cjaO74EYMqGBGtedXCI Message-ID: Subject: Re: [PATCH tip/core/rcu 11/13] rculist: Make list_entry_rcu() use lockless_dereference() From: Linus Torvalds To: Paul McKenney , Tejun Heo Cc: Ingo Molnar , Linux Kernel Mailing List , Lai Jiangshan , Dipankar Sarma , Andrew Morton , Mathieu Desnoyers , Josh Triplett , Thomas Gleixner , Peter Zijlstra , Steven Rostedt , David Howells , Eric Dumazet , Darren Hart , =?UTF-8?B?RnLDqWTDqXJpYyBXZWlzYmVja2Vy?= , Oleg Nesterov , pranith kumar , Patrick Marlier Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Oct 26, 2015 at 11:55 PM, Paul E. McKenney wrote: >> struct bdi_writeback *last_wb = NULL; >> struct bdi_writeback *wb = list_entry_rcu(&bdi->wb_list, > > I believe that the above should instead be: > > struct bdi_writeback *wb = list_entry_rcu(bdi->wb_list.next, I don't think you can do that. You haven't even taken the RCU read lock yet at this point. What the code seems to try to do is to get the "head pointer" of the list before taking the read lock (since _that_ is stable), and then follow the list under the lock. You're making it actually follow the first RCU pointer too early. That said, I'm not sure why it doesn't just do the normal rcu_read_lock(); list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) { .... } rcu_read_unlock(); like the other places do. It looks like it wants that "list_for_each_entry_continue_rcu()" because it does that odd "pin entry and drop rcu lock and retake it and continue where you left off", but I'm not sure why the continue version would be so different.. It's going to do that "follow next entry" regardless, and the "goto restart" doesn't look like it actually adds anything. If following the next pointer is ok even after having released the RCU read lock, then I'm not seeing why the end of the loop couldn't just do rcu_read_unlock(); wb_wait_for_completion(bdi, &fallback_work_done); rcu_read_lock(); and just continue the loop (and the pinning of "wb" and releasing the "last_wb" thing in the *next* iteration should make it all work the same). Adding Tejun to the cc, because this is his code and there's probably something subtle I'm missing. Tejun, can you take a look? It's bdi_split_work_to_wbs() in fs/fs-writeback.c. Linus