From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1760836AbdJQWCv (ORCPT ); Tue, 17 Oct 2017 18:02:51 -0400
Received: from mail.kernel.org ([198.145.29.99]:35316 "EHLO mail.kernel.org"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1754607AbdJQWCt (ORCPT ); Tue, 17 Oct 2017 18:02:49 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0093F21925
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=luto@kernel.org
X-Google-Smtp-Source: ABhQp+S3vTUaHEjPZlgM9w5IUSgTKFCw/1X6Omn8/LFnG8x8LRdL28zKYrlMMbIS+JNQBWbp2IEtEWibXmEdIl4tdVY=
MIME-Version: 1.0
In-Reply-To:
References: <150788678482.924140.11785205105514746135.stgit@buzz>
 <20171013160514.GA27812@redhat.com>
 <3bdb5341-9ae6-265a-ce5b-45c2cfc76fad@yandex-team.ru>
 <20171016143628.b2ef80a9ef16d4345889b4d9@linux-foundation.org>
From: Andy Lutomirski
Date: Tue, 17 Oct 2017 15:02:27 -0700
X-Gmail-Original-Message-ID:
Message-ID:
Subject: Re: [PATCH v4] pidns: introduce syscall translate_pid
To: Prakash Sangappa
Cc: Nagarathnam Muthusamy, Andrew Morton, Konstantin Khlebnikov,
 Oleg Nesterov, Linux API, "linux-kernel@vger.kernel.org",
 Serge Hallyn, "Eric W. Biederman", Eugene Syromiatnikov
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 17, 2017 at 8:38 AM, Prakash Sangappa wrote:
>
>
> On 10/16/17 5:52 PM, Andy Lutomirski wrote:
>>
>> On Mon, Oct 16, 2017 at 3:54 PM, prakash.sangappa
>> wrote:
>>>
>>>
>>> On 10/16/2017 03:07 PM, Nagarathnam Muthusamy wrote:
>>>>
>>>>
>>>>
>>>> On 10/16/2017 02:36 PM, Andrew Morton wrote:
>>>>>
>>>>> On Sat, 14 Oct 2017 11:17:47 +0300 Konstantin Khlebnikov
>>>>> wrote:
>>>>>
>>>>>>>>> pid_t translate_pid(pid_t pid, int source, int target);
>>>>>>>>>
>>>>>>>>> This syscall converts pid from source pid-ns into pid in target pid-ns.
>>>>>>>>> If pid is unreachable from target pid-ns it returns zero.
>>>>>>>>>
>>>>>>>>> Pid-namespaces are referred file descriptors opened to proc files
>>>>>>>>> /proc/[pid]/ns/pid or /proc/[pid]/ns/pid_for_children. Negative argument
>>>>>>>>> refers to current pid namespace, same as file /proc/self/ns/pid.
>>>>>>>>>
>>>>>>>>> Kernel expose virtual pids in /proc/[pid]/status:NSpid, but backward
>>>>>>>>> translation requires scanning all tasks. Also pids could be translated
>>>>>>>>> by sending them through unix socket between namespaces, this method is
>>>>>>>>> slow and insecure because other side is exposed inside pid namespace.
>>>>>>
>>>>>> Andrew asked why we might need this.
>>>>>>
>>>>>> Such conversion is required for interaction between processes across
>>>>>> pid-namespaces. For example to identify process in container by pid file
>>>>>> looking from outside.
>>>>>>
>>>>>> Two years ago I've solved this in project of mine with monstrous code which
>>>>>> forks couple times just to convert pid, lucky for me performance wasn't
>>>>>> important.
>>>>>
>>>>> That's a single user who needed this a single time, and found a
>>>>> userspace-based solution anyway. This is not exactly compelling!
>>>>>
>>>>> Is there a stronger case to be made? How does this change benefit our
>>>>> users? Sell it to us!
>>>>
>>>> Oracle database is planning to use pid namespace for sandboxing database
>>>> instances and they need an API similar to translate_pid to effectively
>>>> translate process IDs from other pid namespaces. Prakash (cced in mail) can
>>>> provide more details on this usecase.
>>>
>>>
>>> As Nagarathnam indicated, Oracle Database will be using pid namespaces and
>>> needs a direct method of converting pids of processes in the pid namespace
>>> hierarchy. In this use case multiple nested PID namespaces will be used. The
>>> currently available mechanism are not very efficient for this use case. For
>>> ex. as Konstantin described, using /proc//status would require the
>>> application to scan all the pid's status files to determine the pid of given
>>> process in a child namespace.
>>>
>>> Use of SCM_CREDENTIALS's socket message is another way, which would require
>>> every process starting inside a pid namespace to send this message and the
>>> receiving process in the target namespace would have to save the converted
>>> pid and reference it. This mechanism becomes cumbersome especially if the
>>> application has to deal with multiple nested pid namespaces. Also, the
>>> Database needs to be able to convert a thread's global pid(gettid()).
>>> Passing the thread's pid(gettid()) in SCM_CREDENTIALS message requires
>>> CAP_SYS_ADMIN, which is an issue.
>>>
>>> So having a direct method, like the API that Konstantin is proposing, will
>>> work best for the Database since pid of a process in any of the nested pid
>>> namespaces can be converted as and when required. I think with the proposed
>>> API, the application should be able to convert pid of a process or
>>> tid(gettid()) of a thread as well.
>>>
>>
>> Can you explain what Oracle's database is planning to do with this
>> information?
>
>
> Database uses the PID to programmatically find out if the process/thread is
> alive(kill 0) also send signals to the processes requesting it to dump
> status/debug information and kill the processes in case of a shutdown abort
> of the instance.

What I'm wondering is: how does the caller of kill() end up
controlling a task whose pid it doesn't know in its own namespace?

>
> -Prakash.
>
>
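
For reference, the /proc-based workaround that the thread calls inefficient can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the patch or from anyone on the thread: the helper names parse_nspid and find_outer_pids are hypothetical, and it assumes a procfs that exposes the NSpid field in /proc/[pid]/status, with the leftmost value being the pid as seen from the reader's side and the rightmost being the pid in the task's own namespace.

```python
import os


def parse_nspid(status_text):
    """Return the NSpid values found in the text of a /proc/[pid]/status file.

    The list runs from the outermost visible namespace to the task's own
    (innermost) namespace. An empty list means there was no NSpid line.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def find_outer_pids(inner_pid, proc_root="/proc"):
    """Scan every task under proc_root for ones whose innermost pid matches.

    Hypothetical helper: returns the outer pids of all nested tasks whose
    pid inside their own namespace equals inner_pid.
    """
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-task entries such as "self" or "meminfo"
        path = os.path.join(proc_root, entry, "status")
        try:
            with open(path, errors="replace") as status_file:
                pids = parse_nspid(status_file.read())
        except OSError:
            continue  # the task exited while we were scanning
        # Only tasks in a nested namespace have more than one NSpid value.
        if len(pids) > 1 and pids[-1] == inner_pid:
            matches.append(pids[0])
    return matches
```

Each lookup walks the whole of /proc, which is the cost being objected to above. The sketch also returns a list because an inner pid by itself does not say which child namespace it belongs to (every container has its own pid 1), and it only visits thread-group leaders, so a gettid() value of a non-leader thread would additionally need the /proc/[pid]/task/ directories.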