All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	linux-api@vger.kernel.org, linux-kernel@vger.kernel.org,
	Jann Horn <jannh@google.com>,
	Serge Hallyn <serge.hallyn@ubuntu.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Prakash Sangappa <prakash.sangappa@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH RFC v5] pidns: introduce syscall translate_pid
Date: Thu, 31 May 2018 12:52:49 -0500	[thread overview]
Message-ID: <87efhremq6.fsf@xmission.com> (raw)
In-Reply-To: <3bdd6b27-0a46-5802-8671-07268cecc1c7@oracle.com> (Nagarathnam Muthusamy's message of "Thu, 31 May 2018 10:41:20 -0700")

Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com> writes:

> On 05/15/2018 10:36 AM, Konstantin Khlebnikov wrote:
>>
>>
>> On 15.05.2018 20:19, Nagarathnam Muthusamy wrote:
>>>
>>>
>>> On 04/24/2018 10:36 PM, Konstantin Khlebnikov wrote:
>>>> On 23.04.2018 20:37, Nagarathnam Muthusamy wrote:
>>>>>
>>>>>
>>>>> On 04/05/2018 12:02 AM, Konstantin Khlebnikov wrote:
>>>>>> On 05.04.2018 01:29, Eric W. Biederman wrote:
>>>>>>> Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com> writes:
>>>>>>>
>>>>>>>> On 04/04/2018 12:11 PM, Konstantin Khlebnikov wrote:
>>>>>>>>> Each process have different pids, one for each pid namespace
>>>>>>>>> it belongs.
>>>>>>>>> When interaction happens within single pid-ns translation
>>>>>>>>> isn't required.
>>>>>>>>> More complicated scenarios needs special handling.
>>>>>>>>>
>>>>>>>>> For example:
>>>>>>>>> - reading pid-files or logs written inside container with pid
>>>>>>>>> namespace
>>>>>>>>> - attaching with ptrace to tasks from different pid namespace
>>>>>>>>> - passing pids across pid namespaces in any kind of API
>>>>>>>>>
>>>>>>>>> Currently there are several interfaces that could be used here:
>>>>>>>>>
>>>>>>>>> Pid namespaces are identified by inode number of
>>>>>>>>> /proc/[pid]/ns/pid.
>>>>>>>
>>>>>>> Using the inode number in interfaces is not an
>>>>>>> option. Especially not
>>>>>>> withou referencing the device number for the filesystem as well.
>>>>>>
>>>>>> This is supposed to be single-instance fs,
>>>>>> not part of proc but referenced but its magic "symlinks".
>>>>>>
>>>>>> Device numbers are not mentioned in "man namespaces".
>>>>>>
>>>>>>>
>>>>>>>>> Pids for nested Pid namespaces are shown in file
>>>>>>>>> /proc/[pid]/status.
>>>>>>>>> In some cases conversion pid -> vpid could be easily done
>>>>>>>>> using this
>>>>>>>>> information, but backward translation requires scanning all tasks.
>>>>>>>>>
>>>>>>>>> Unix socket automatically translates pid attached to
>>>>>>>>> SCM_CREDENTIALS.
>>>>>>>>> This requires CAP_SYS_ADMIN for sending arbitrary pids and
>>>>>>>>> entering
>>>>>>>>> into pid namespace, this expose process and could be insecure.
>>>>>>>>>
>>>>>>>>> This patch adds new syscall for converting pids between pid
>>>>>>>>> namespaces:
>>>>>>>>>
>>>>>>>>> pid_t translate_pid(pid_t pid, int source_type, int source,
>>>>>>>>>                                  int target_type, int target);
>>>>>>>>>
>>>>>>>>> @source_type and @target_type defines type of following arguments:
>>>>>>>>>
>>>>>>>>> TRANSLATE_PID_CURRENT_PIDNS  - current pid namespace,
>>>>>>>>> argument is unused
>>>>>>>>> TRANSLATE_PID_TASK_PIDNS     - task pid-ns, argument is task pid
>>>>>>>>
>>>>>>>> I believe using pid to represent the namespace has been already
>>>>>>>> discussed in V1 of this patch in
>>>>>>>> https://lkml.org/lkml/2015/9/22/1087
>>>>>>>> after which we moved on to fd based version of this interface.
>>>>>>>
>>>>>>> Or in short why is the case of pids important?
>>>>>>>
>>>>>>> You Konstantin you almost said why they were important in your
>>>>>>> message
>>>>>>> saying you were going to send this one.  However you don't
>>>>>>> explain in
>>>>>>> your description why you want to identify pid namespaces by pid.
>>>>>>>
>>>>>>
>>>>>> Open of /proc/[pid]/ns/pid requires same permissions as ptrace,
>>>>>> pid based variant doesn't have such restrictions.
>>>>>
>>>>> Can you provide more information on usecase requiring PID
>>>>> translation but not used for tracing related purposes?
>>>>
>>>> Any introspection for [nested] containers. It's easier to work
>>>> when you have all information when you don't have any.
>>>> For example our CMS https://github.com/yandex/porto allows to
>>>> start nested sub-container (or even deeper) by request from any
>>>> container and have to tell back which pid task is have. And it
>>>> could translate any pid inside into accessible by client and vice
>>>> versa.
>>>>
>>>
>>> I still dont get the exact reason why PID based approach to
>>> identify the namespace during pid translation process is absolutely
>>> required compared to fd based approach. 
>>
>> As I told open(/proc/%d/ns/pid) have security restrictions - same
>> uid/CAP_SYS_PTRACE/whatever
>> Pidns-fd holds pid-namespace and without restrictions could be abused.
>> Pid based API is racy but always available without any restrictions.
>>
>>
>>> From your version of TranslatePid in
>>>
>>> https://github.com/yandex/porto/blob/0d7e6e7e1830dcd0038a057b2ab9964cec5b8fab/src/util/unix.cpp 
>>>
>>>
>>> I see that you are going through the trouble of forking a process
>>> and sending SMC_CREDENTIALS for pid translation. Even your existing
>>> API could be extremely simplified if translate_pid based on file
>>> descriptors make it to the gate and I believe from the last
>>> discussion it was almost there
>>> https://patchwork.kernel.org/patch/10305439/
>>>
>>>
>>>>> On a side note, can we have the types TRANSLATE_PID_CURRENT_PIDNS
>>>>> and TRANSLATE_PID_FD_PIDNS integrated first and then possibly
>>>>> extend the interface to include TRANSLATE_PID_TASK_PIDNS in
>>>>> future?
>>>>
>>>> I don't see reason for this separation.
>>>> Pids and pid namespaces are part of the API for a long time.
>>>
>>> If you are talking about the translate_pid API proposed, I believe
>>> the V4 proposed under https://patchwork.kernel.org/patch/10003935/
>>> had only fd based API before a mix of PID and fd based is proposed
>>> in V5. Again, I was just wondering if we can get the FD based
>>> approach in first and then extend the API to include PID based
>>> approach later as fd based approach could provide a lot of
>>> immediate benefits?
>>>
>>> Thanks,
>>> Nagarathnam.
>>>>
>>>>>
>>>>> Thanks,
>>>>> Nagarathnam.
>>>>>> Most pid-based syscalls are racy in some cases but they are
>>>>>> here for decades and everybody knowns how to deal with it.
>>>>>> So, I've decided to merge both worlds in one interface which
>>>>>> clearly tells what to expect.
>>>>>
>>>
>
> Ping? Any additional comments on this patch?

I have totally lost the thread.  Let me see if I can find enough of the
thread to see what is going on.

The whole let's use pids instead of fds was a major distraction.

Eric

  reply	other threads:[~2018-05-31 17:53 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-04 19:11 [PATCH RFC v5] pidns: introduce syscall translate_pid Konstantin Khlebnikov
2018-04-04 20:31 ` Nagarathnam Muthusamy
2018-04-04 22:29   ` Eric W. Biederman
2018-04-05  7:02     ` Konstantin Khlebnikov
2018-04-23 17:37       ` Nagarathnam Muthusamy
2018-04-25  5:36         ` Konstantin Khlebnikov
2018-05-15 17:19           ` Nagarathnam Muthusamy
2018-05-15 17:36             ` Konstantin Khlebnikov
2018-05-15 17:40               ` Nagarathnam Muthusamy
2018-05-15 17:44                 ` Nagarathnam Muthusamy
2018-05-31 17:41               ` Nagarathnam Muthusamy
2018-05-31 17:52                 ` Eric W. Biederman [this message]
2018-05-31 18:05                 ` Eric W. Biederman
2018-06-01  6:58                   ` Konstantin Khlebnikov
2018-06-01 15:55                     ` NAGARATHNAM MUTHUSAMY
2018-06-01 16:14                       ` Eric W. Biederman
2018-06-01 16:15                       ` Konstantin Khlebnikov
2018-04-27 12:15       ` Michael Kerrisk (man-pages)
2018-05-31 18:02         ` Eric W. Biederman
2018-05-31 18:02           ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87efhremq6.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=akpm@linux-foundation.org \
    --cc=jannh@google.com \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=nagarathnam.muthusamy@oracle.com \
    --cc=oleg@redhat.com \
    --cc=prakash.sangappa@oracle.com \
    --cc=serge.hallyn@ubuntu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.