From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751189AbaKYRwT (ORCPT ); Tue, 25 Nov 2014 12:52:19 -0500 Received: from out01.mta.xmission.com ([166.70.13.231]:53464 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750950AbaKYRwR (ORCPT ); Tue, 25 Nov 2014 12:52:17 -0500 From: ebiederm@xmission.com (Eric W. Biederman) To: Oleg Nesterov Cc: Andrew Morton , Aaron Tomlin , Pavel Emelyanov , Serge Hallyn , Sterling Alexander , linux-kernel@vger.kernel.org References: <20141124200629.GA21009@redhat.com> <87vbm4ff0r.fsf@x220.int.ebiederm.org> <20141125170718.GA29360@redhat.com> Date: Tue, 25 Nov 2014 11:50:59 -0600 In-Reply-To: <20141125170718.GA29360@redhat.com> (Oleg Nesterov's message of "Tue, 25 Nov 2014 18:07:18 +0100") Message-ID: <87lhmzb24c.fsf@x220.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-AID: U2FsdGVkX19/Lob7BE2q9jvFeafsfl1tDOR+U5si2SU= X-SA-Exim-Connect-IP: 97.121.92.161 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.7 XMSubLong Long Subject * 1.5 XMNoVowels Alpha-numberic number with no vowels * 0.0 TVD_RCVD_IP Message was received from an IP address * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa07 1397; Body=1 Fuz1=1 Fuz2=1] * 0.0 T_TooManySym_03 6+ unique symbols in subject * 0.0 T_TooManySym_01 4+ unique symbols in subject * 0.0 T_TooManySym_02 5+ unique symbols in subject X-Spam-DCC: XMission; sa07 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: **;Oleg Nesterov X-Spam-Relay-Country: X-Spam-Timing: total 431 ms - load_scoreonly_sql: 0.04 (0.0%), signal_user_changed: 3.7 (0.8%), b_tie_ro: 2.8 (0.7%), parse: 1.13 (0.3%), extract_message_metadata: 5 (1.2%), get_uri_detail_list: 2.9 (0.7%), tests_pri_-1000: 5 (1.2%), tests_pri_-950: 1.93 (0.4%), tests_pri_-900: 1.57 (0.4%), tests_pri_-400: 36 (8.4%), check_bayes: 34 (8.0%), b_tokenize: 10 (2.3%), b_tok_get_all: 8 (2.0%), b_comp_prob: 3.6 (0.8%), b_tok_touch_all: 9 (2.0%), b_finish: 0.77 (0.2%), tests_pri_0: 357 (83.0%), tests_pri_500: 4.7 (1.1%), rewrite_mail: 0.00 (0.0%) Subject: Re: [PATCH 2/2] exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Wed, 24 Sep 2014 11:00:52 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Oleg Nesterov writes: > On 11/24, Eric W. Biederman wrote: >> >> Oleg Nesterov writes: >> >> > --- a/kernel/pid.c >> > +++ b/kernel/pid.c >> > @@ -320,7 +320,6 @@ struct pid *alloc_pid(struct pid_namespace *ns) >> > goto out_free; >> > } >> > >> > - get_pid_ns(ns); >> > atomic_set(&pid->count, 1); >> > for (type = 0; type < PIDTYPE_MAX; ++type) >> > INIT_HLIST_HEAD(&pid->tasks[type]); >> > @@ -336,7 +335,7 @@ struct pid *alloc_pid(struct pid_namespace *ns) >> > } >> > spin_unlock_irq(&pidmap_lock); >> > >> > -out: >> > + get_pid_ns(ns); >> >> Moving the label and changing the goto out logic is gratuitous confusing >> and I think it probably even generates worse code. >> >> Furthermore multiple exits make adding debugging code more difficult. > > Oh, I strongly disagree but I am not going to argue ;) cleanups are > always subjective, and I do believe in "maintainer is always right" > mantra. I can make v2 without this change. Fair enough. My primary complaint was that you were changing the logic and fixing a bug at the same time. That added noise and made analysis of what was really going on much more difficult. >> Moving get_pid_ns down does close a leak in the error handling path. > > OK, good. > >> However at the moment my I can't figure out if it is safe to move >> get_pid_ns elow hlist_add_head_rcu. Because once we are on the rcu list >> the pid is findable, and being publicly visible with a bad refcount could cause >> problems. > > The caller has a reference, this ns can't go away. Obviously, otherwise > get_pid_ns(ns) is not safe. > > We need this get_pid_ns() to balance put_pid()->put_pid_ns() which obviously > won't be called until we return this pid, otherwise everything is wrong. > > So I think this should be safe? My concern is exposing a half initialized struct pid to the world via an rcu data structure. In particular could one of the rcu users get into trouble because we haven't called get_pid_ns yet? That is unclear to me. That is one of those weird nasty races I would rather not have to consider and moving the get_pid_ns after hlist_add requires that we think about it. To fix the error handling and avoid thinking about the races we have two choices: - In the error path that is currently called out_unlock we can drop the extra references. - Immediately after we perform the test that on error jumps to out_unlock we call get_pid_ns. My preference would be the first, as it is a trivially correct one line change. Aka I think this is the obviously correct trivial fix. out_unlock: spin_unlock_irq(&pidmap_lock); + put_pid_ns(ns); out_free: while (++i <= ns->level) free_pidmap(pid->numbers + i); Eric