From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753135AbbHESHg (ORCPT ); Wed, 5 Aug 2015 14:07:36 -0400 Received: from out03.mta.xmission.com ([166.70.13.233]:48006 "EHLO out03.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751912AbbHESHe (ORCPT ); Wed, 5 Aug 2015 14:07:34 -0400 From: ebiederm@xmission.com (Eric W. Biederman) To: Oleg Nesterov Cc: "Kirill A. Shutemov" , Andrew Morton , Kees Cook , David Howells , linux-kernel@vger.kernel.org, Peter Zijlstra , Ingo Molnar , "Kirill A. Shutemov" , Rik van Riel , Vladimir Davydov , Ricky Zhou , Julien Tinnes References: <20150728171500.GA2871@www.outflux.net> <20150728143504.5aa996ba5955522a19c2d5f1@linux-foundation.org> <20150728221111.GA23391@node.dhcp.inet.fi> <20150805172356.GA20490@redhat.com> Date: Wed, 05 Aug 2015 13:00:49 -0500 In-Reply-To: <20150805172356.GA20490@redhat.com> (Oleg Nesterov's message of "Wed, 5 Aug 2015 19:23:56 +0200") Message-ID: <87wpx9sjhq.fsf@x220.int.ebiederm.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-AID: U2FsdGVkX192wnIX3SDak5jzklkEM6yTigNbfP2fmbg= X-SA-Exim-Connect-IP: 97.119.22.40 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.0 TVD_RCVD_IP Message was received from an IP address * 0.7 XMSubLong Long Subject * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa01 1397; Body=1 Fuz1=1 Fuz2=1] * 0.1 XMSolicitRefs_0 Weightloss drug * 0.0 T_TooManySym_01 4+ unique symbols in subject X-Spam-DCC: XMission; sa01 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: ;Oleg Nesterov X-Spam-Relay-Country: X-Spam-Timing: total 827 ms - load_scoreonly_sql: 0.06 (0.0%), signal_user_changed: 4.0 (0.5%), b_tie_ro: 2.9 (0.3%), parse: 1.28 (0.2%), extract_message_metadata: 31 (3.8%), get_uri_detail_list: 3.7 (0.5%), tests_pri_-1000: 12 (1.5%), tests_pri_-950: 2.1 (0.3%), tests_pri_-900: 1.68 (0.2%), tests_pri_-400: 42 (5.1%), check_bayes: 40 (4.8%), b_tokenize: 14 (1.7%), b_tok_get_all: 10 (1.2%), b_comp_prob: 5 (0.7%), b_tok_touch_all: 4.3 (0.5%), b_finish: 2.2 (0.3%), tests_pri_0: 720 (87.0%), tests_pri_500: 7 (0.9%), rewrite_mail: 0.00 (0.0%) Subject: Re: [PATCH] user_ns: use correct check for single-threadedness X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Wed, 24 Sep 2014 11:00:52 -0600) X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Oleg Nesterov writes: > On 07/29, Kirill A. Shutemov wrote: >> >> On Tue, Jul 28, 2015 at 02:35:04PM -0700, Andrew Morton wrote: >> > On Tue, 28 Jul 2015 10:15:00 -0700 Kees Cook wrote: >> > >> > > From: Ricky Zhou >> > > >> > > Checking mm_users > 1 does not mean a process is multithreaded. For >> > > example, reading /proc/PID/maps temporarily increments mm_users, allowing >> > > other processes to (accidentally) interfere with unshare() calls. >> > > >> > > This fixes observed failures of unshare(CLONE_NEWUSER) incorrectly >> > > returning EINVAL if another processes happened to be simultaneously >> > > reading the maps file. >> > >> > Yikes. current_is_single_threaded() is expensive. Are we sure this >> > isn't going to kill someone's workload? >> >> It's expensive only if mm_users > 1. We will go to for_each_process() only >> if somebody outside of the process grabs mm_users references (like reading >> /proc/PID/maps). > > Yes, > >> Or if it called it from multithreaded application. > > Not really, please note the "negative fast-path" check: > > if (atomic_read(&task->signal->live) != 1) > return false; > >> Acked-by: Kirill A. Shutemov > > Same here, I think the patch is fine. > > I don't think that sys_setns() is performance critical, and Eric seems > to agree with this change too. I do and I am about to merge it into my userns tree. It is clearly an issue that is affecting users in the real world, which fundamentally makes the current implementation buggy. That said the performance of sys_setns does matter. Removing the possibility of calling synchronize_rcu from switch_task_namespaces happened because the performance of sys_setns was a problem. Looking at the implementation of current_is_single_threaded() it appears that the data structures dealing with threads could stand some improvement so current_is_single_threaded() could become a constant time function. The class of application that is likely to call setns and be strongly affected by it's performance are container management applications or applications that are deliberately using more than one namespace at a time. There might be advantages to using current_is_single_threaded() as currently written to slow down an application. That slow down could expand a race condition enough to make another bug exploitable. So if we could figure out how to make current_is_single_threaded() faster or at least have the slow path not triggerable by users playing with proc I would very much be in favor of it. Eric