From: nagarathnam muthusamy
Organization: Oracle Corporation
Date: Wed, 01 Nov 2017 09:59:55 -0700
Message-ID: <59F9FD8B.8090607@oracle.com>
To: prakash sangappa
CC: Andy Lutomirski, Andrew Morton, Konstantin Khlebnikov, Oleg Nesterov, Linux API, linux-kernel@vger.kernel.org, Serge Hallyn, "Eric W. Biederman", Eugene Syromiatnikov
Subject: Re: [PATCH v4] pidns: introduce syscall translate_pid
In-Reply-To: <59E689F5.2080706@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

I believe all the questions raised in this thread were answered. Just wondering if there are any outstanding questions?

Thanks,
Nagarathnam.
On 10/17/2017 3:53 PM, prakash sangappa wrote:
>
> On 10/17/2017 3:40 PM, Andy Lutomirski wrote:
>> On Tue, Oct 17, 2017 at 3:35 PM, prakash sangappa wrote:
>>> On 10/17/2017 3:02 PM, Andy Lutomirski wrote:
>>>> On Tue, Oct 17, 2017 at 8:38 AM, Prakash Sangappa wrote:
>>>>>
>>>>> On 10/16/17 5:52 PM, Andy Lutomirski wrote:
>>>>>> On Mon, Oct 16, 2017 at 3:54 PM, prakash.sangappa wrote:
>>>>>>>
>>>>>>> On 10/16/2017 03:07 PM, Nagarathnam Muthusamy wrote:
>>>>>>>>
>>>>>>>> On 10/16/2017 02:36 PM, Andrew Morton wrote:
>>>>>>>>> On Sat, 14 Oct 2017 11:17:47 +0300 Konstantin Khlebnikov wrote:
>>>>>>>>>
>>>>>>>>>>>>> pid_t translate_pid(pid_t pid, int source, int target);
>>>>>>>>>>>>>
>>>>>>>>>>>>> This syscall converts a pid from the source pid-ns into a pid in the target pid-ns. If the pid is unreachable from the target pid-ns, it returns zero.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Pid-namespaces are referred to by file descriptors opened to the proc files /proc/[pid]/ns/pid or /proc/[pid]/ns/pid_for_children. A negative argument refers to the current pid namespace, the same as the file /proc/self/ns/pid.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The kernel exposes virtual pids in /proc/[pid]/status:NSpid, but backward translation requires scanning all tasks. Pids could also be translated by sending them through a unix socket between namespaces, but this method is slow and insecure because the other side is exposed inside the pid namespace.
>>>>>>>>>> Andrew asked why we might need this.
>>>>>>>>>>
>>>>>>>>>> Such conversion is required for interaction between processes across pid-namespaces. For example, to identify a process in a container by its pid file when looking from outside.
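This is a minimal C sketch of the /proc-based backward translation that the patch description calls a scan of all tasks. It is an illustration, not code from the thread, and rests on a few assumptions:

- /proc is mounted for the caller's own pid namespace.
- The kernel exposes the NSpid field in /proc/[pid]/status (Linux 4.1 and later).
- A pid namespace can be identified by the device and inode of /proc/[pid]/ns/pid.

The helpers innermost_nspid(), pidns_of() and scan_translate() are hypothetical names. Every task has to be visited, so the cost grows with the number of processes on the system.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the last value on the NSpid line of /proc/<pid>/status, i.e. the
 * task's pid in its own (innermost) pid namespace, or -1 on error. */
static pid_t innermost_nspid(pid_t pid)
{
    char path[64], line[256];
    snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    pid_t last = -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "NSpid:", 6) != 0)
            continue;
        char *p = line + 6, *end;
        for (;;) {  /* walk the tab-separated pid columns */
            long v = strtol(p, &end, 10);
            if (end == p)
                break;
            last = (pid_t)v;
            p = end;
        }
        break;
    }
    fclose(f);
    return last;
}

/* Identify a task's pid namespace by the (dev, inode) of /proc/<pid>/ns/pid. */
static int pidns_of(pid_t pid, struct stat *st)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/ns/pid", (int)pid);
    return stat(path, st);
}

/* Hypothetical backward translation: find the pid (as named in /proc, i.e.
 * in our namespace) of the task whose own pid namespace is that of ns_ref
 * and whose pid inside that namespace is `inner`. Visits every task.
 * Returns 0 if no such task is found, -1 on error. */
static pid_t scan_translate(pid_t inner, pid_t ns_ref)
{
    struct stat ref, st;
    if (pidns_of(ns_ref, &ref) != 0)
        return -1;
    DIR *d = opendir("/proc");
    if (!d)
        return -1;
    pid_t found = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        char *end;
        long pid = strtol(e->d_name, &end, 10);
        if (*end != '\0' || pid <= 0)
            continue;  /* skip non-pid entries such as "self" */
        if (pidns_of((pid_t)pid, &st) != 0)
            continue;  /* task exited or access denied */
        if (st.st_dev != ref.st_dev || st.st_ino != ref.st_ino)
            continue;  /* task lives in a different pid namespace */
        if (innermost_nspid((pid_t)pid) == inner) {
            found = (pid_t)pid;
            break;
        }
    }
    closedir(d);
    return found;
}
```

The sketch has two limitations:

- It only matches tasks whose own namespace is that of ns_ref.
- It only finds thread-group leaders. Other threads are not listed directly under /proc but under /proc/[pid]/task/, so a gettid() value would need a second, deeper scan.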
>>>>>>>>>>
>>>>>>>>>> Two years ago I solved this in a project of mine with monstrous code which forks a couple of times just to convert a pid; luckily for me, performance wasn't important.
>>>>>>>>> That's a single user who needed this a single time, and found a userspace-based solution anyway. This is not exactly compelling!
>>>>>>>>>
>>>>>>>>> Is there a stronger case to be made? How does this change benefit our users? Sell it to us!
>>>>>>>> Oracle database is planning to use pid namespaces for sandboxing database instances, and they need an API similar to translate_pid to effectively translate process IDs from other pid namespaces. Prakash (cc'd on this mail) can provide more details on this use case.
>>>>>>>
>>>>>>> As Nagarathnam indicated, Oracle Database will be using pid namespaces and needs a direct method of converting pids of processes in the pid namespace hierarchy. In this use case, multiple nested PID namespaces will be used. The currently available mechanisms are not very efficient for this use case. For example, as Konstantin described, using /proc//status would require the application to scan all the pids' status files to determine the pid of a given process in a child namespace.
>>>>>>>
>>>>>>> Using an SCM_CREDENTIALS socket message is another way, but it would require every process starting inside a pid namespace to send this message, and the receiving process in the target namespace would have to save the converted pid and reference it. This mechanism becomes cumbersome, especially if the application has to deal with multiple nested pid namespaces.
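For comparison, here is a minimal sketch of the SCM_CREDENTIALS route, again an illustration rather than code from the thread. The sender attaches a struct ucred to a message on a unix socket. The receiver must have enabled SO_PASSCRED, and it gets the pid back already translated into its own pid namespace. The helper names are hypothetical.

The kernel accepts an explicit pid only if it equals the sender's own process id, unless the sender has CAP_SYS_ADMIN. That is the gettid() restriction raised in the next quoted paragraph.

```c
#define _GNU_SOURCE /* struct ucred, SCM_CREDENTIALS */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receiver side: ask the kernel to deliver sender credentials. */
static int enable_passcred(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_PASSCRED, &on, sizeof(on));
}

/* Sender side: attach our own credentials to a one-byte message.
 * Putting gettid() of a non-leader thread in .pid would be rejected
 * without CAP_SYS_ADMIN. */
static int send_creds(int fd)
{
    struct ucred cred = { .pid = getpid(), .uid = getuid(), .gid = getgid() };
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(struct ucred))]; struct cmsghdr align; } u;
    memset(&u, 0, sizeof(u));
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_CREDENTIALS;
    c->cmsg_len = CMSG_LEN(sizeof(struct ucred));
    memcpy(CMSG_DATA(c), &cred, sizeof(cred));
    return sendmsg(fd, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one message and return the sender pid as seen in the
 * receiver's pid namespace, or -1 if no credentials arrived. */
static pid_t recv_creds_pid(int fd)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(struct ucred))]; struct cmsghdr align; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    if (recvmsg(fd, &msg, 0) != 1)
        return -1;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
            struct ucred cred;
            memcpy(&cred, CMSG_DATA(c), sizeof(cred));
            return cred.pid;
        }
    }
    return -1;
}
```

With both ends of a socketpair in the same namespace, the received pid simply equals getpid(). Across namespaces the kernel performs the translation when the message is received. This is also why every process of interest has to send such a message before anyone can look it up.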
>>>>>>> Also, the Database needs to be able to convert a thread's global pid (gettid()). Passing the thread's pid (gettid()) in an SCM_CREDENTIALS message requires CAP_SYS_ADMIN, which is an issue.
>>>>>>>
>>>>>>> So having a direct method, like the API that Konstantin is proposing, will work best for the Database, since the pid of a process in any of the nested pid namespaces can be converted as and when required. I think with the proposed API, the application should be able to convert the pid of a process or the tid (gettid()) of a thread as well.
>>>>>>>
>>>>>> Can you explain what Oracle's database is planning to do with this information?
>>>>>
>>>>> The Database uses the PID to programmatically find out whether the process/thread is alive (kill 0), to send signals to the processes requesting them to dump status/debug information, and to kill the processes in case of a shutdown abort of the instance.
>>>> What I'm wondering is: how does the caller of kill() end up controlling a task whose pid it doesn't know in its own namespace?
>>>
>>> I was generally describing how the DB would use the PID of a process. The above description was for the case when no namespaces are used.
>>>
>>> With namespaces, the DB would convert the PIDs of processes inside its child namespaces to PIDs in its own namespace and use those pids to issue kill().
>> Seems vaguely sensible.
>>
>> If I were designing this type of system, I'd have a manager process in each namespace running as PID 1, though -- PID 1 is special and needs to understand what's going on anyway. Then PID 1 would do the kill() calls and wouldn't need translate_pid().
>
> Yes, this has been tried out with the prototype use of PID namespaces in the DB.
> It works, but it would be slow, as the manager would have to exchange messages with the controlling processes, which would be in the parent namespace. The DB could use the API to convert the pid.
>