From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753550AbbFXNwN (ORCPT ); Wed, 24 Jun 2015 09:52:13 -0400
Received: from mx1.redhat.com ([209.132.183.28]:54484 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752542AbbFXNwH (ORCPT ); Wed, 24 Jun 2015 09:52:07 -0400
Date: Wed, 24 Jun 2015 15:50:49 +0200
From: Oleg Nesterov
To: Peter Zijlstra
Cc: paulmck@linux.vnet.ibm.com, tj@kernel.org, mingo@redhat.com,
        linux-kernel@vger.kernel.org, der.herr@hofr.at, dave@stgolabs.net,
        riel@redhat.com, viro@ZenIV.linux.org.uk,
        torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 09/13] hotplug: Replace hotplug lock with percpu-rwsem
Message-ID: <20150624135049.GA31992@redhat.com>
References: <20150622121623.291363374@infradead.org>
        <20150622122256.480062572@infradead.org>
        <20150622225739.GA5582@redhat.com>
        <20150623071637.GA3644@twins.programming.kicks-ass.net>
        <20150623170122.GA26854@redhat.com>
        <20150623175318.GE3644@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150623175318.GE3644@twins.programming.kicks-ass.net>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/23, Peter Zijlstra wrote:
>
> On Tue, Jun 23, 2015 at 07:01:22PM +0200, Oleg Nesterov wrote:
> > On 06/23, Peter Zijlstra wrote:
> > >
> > > On Tue, Jun 23, 2015 at 12:57:39AM +0200, Oleg Nesterov wrote:
> > > > > +
> > > > > +	lock_map_acquire_read(&cpu_hotplug.rwsem.rw_sem.dep_map);
> > > > > +	_percpu_down_read(&cpu_hotplug.rwsem);
> > > > >  }
> > > >
> > > > Confused... Why do we need _percpu_down_read()? Can't get_online_cpus()
> > > > just use percpu_down_read() ?
> > > >
> > > > Yes, percpu_down_read() is not recursive, like the normal down_read().
> > > > But this does not matter because we rely on ->cpuhp_ref anyway?
> > >
> > > While we will not call the actual lock, lockdep will still get confused
> > > by the inconsistent locking order observed.
> > >
> > > Change it and boot, you'll find lockdep output pretty quickly.
> >
> > Hmm. and I simply can't understand why...
>
> If in one callchain we do:
>
> 	get_online_cpus();
> 	lock(A);
>
> in another we do:
>
> 	lock(A);
> 	get_online_cpus();
>
> lockdep will complain about the inverted lock order, however this is not
> a problem at all for recursive locks.

Ah, but in this case lockdep is right. This is deadlockable because
with the new implementation percpu_down_write() blocks the new readers.
So this change just hides the valid warning.

Just suppose that the 3rd CPU does percpu_down_write()->down_write()
right after the 2nd CPU (above) takes lock(A).

I have to admit that I didn't realize that the code above is currently
correct... but it is.

So we need percpu_down_write_dont_block_readers(). I already thought
about this before, I'll try to make the patch tomorrow on top of your
changes.

This means that we do not need task_struct->cpuhp_ref, but we can't
avoid the livelock we currently have: cpu_hotplug_begin() can never
succeed if the new readers come fast enough.

Oleg.
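
The three-CPU scenario above can be sketched as a small userspace model.
This is only an illustration under stated assumptions, not kernel code:
struct hotplug_model, the can_take_*() helpers, all_cpus_stuck() and
check_model() are hypothetical names, and the locks are reduced to plain
state so that the interleaving is deterministic. CPU 1 follows the first
callchain, CPU 2 the second, and CPU 3 is the writer.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the scenario; not kernel code.
 * All names below are made up for illustration.
 */
struct hotplug_model {
	int  lock_a_owner;              /* CPU holding lock(A), or -1 if free */
	int  readers;                   /* CPUs inside get_online_cpus() */
	bool writer_waiting;            /* a CPU is in percpu_down_write() */
	bool writer_blocks_new_readers; /* the behaviour under discussion */
};

/* A new reader is refused only if a waiting writer blocks new readers. */
bool can_take_read(const struct hotplug_model *m)
{
	return !(m->writer_waiting && m->writer_blocks_new_readers);
}

/* The writer has to wait until every existing reader has left. */
bool can_take_write(const struct hotplug_model *m)
{
	return m->readers == 0;
}

bool can_take_lock_a(const struct hotplug_model *m)
{
	return m->lock_a_owner < 0;
}

/* Replay the interleaving and report whether no CPU can make progress. */
bool all_cpus_stuck(bool writer_blocks_new_readers)
{
	struct hotplug_model m = {
		.lock_a_owner = -1,
		.readers = 0,
		.writer_waiting = false,
		.writer_blocks_new_readers = writer_blocks_new_readers,
	};

	m.readers++;             /* CPU 1: get_online_cpus() succeeds */
	m.lock_a_owner = 2;      /* CPU 2: lock(A) succeeds */
	m.writer_waiting = true; /* CPU 3: percpu_down_write() starts */

	bool cpu1_runs = can_take_lock_a(&m); /* CPU 1 now wants lock(A) */
	bool cpu2_runs = can_take_read(&m);   /* CPU 2 now wants get_online_cpus() */
	bool cpu3_runs = can_take_write(&m);  /* CPU 3 waits for readers to drain */

	return !cpu1_runs && !cpu2_runs && !cpu3_runs;
}

/* Usage: a writer that blocks new readers deadlocks, one that does not is fine. */
void check_model(void)
{
	assert(all_cpus_stuck(true));
	assert(!all_cpus_stuck(false));
}
```

In this model, when the writer does not block new readers CPU 2 can still
take the read side, so the cycle never closes. The model says nothing
about the livelock: a writer that never blocks readers can wait forever
if readers keep arriving.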