From: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>
To: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
"Eric W. Biederman" <ebiederm@xmission.com>
Cc: linux-api@vger.kernel.org, linux-kernel@vger.kernel.org,
Jann Horn <jannh@google.com>,
Serge Hallyn <serge.hallyn@ubuntu.com>,
Oleg Nesterov <oleg@redhat.com>,
Andy Lutomirski <luto@amacapital.net>,
Prakash Sangappa <prakash.sangappa@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH RFC v5] pidns: introduce syscall translate_pid
Date: Tue, 15 May 2018 10:40:19 -0700 [thread overview]
Message-ID: <4b21a648-0abc-e92d-ec37-681dff63bb55@oracle.com> (raw)
In-Reply-To: <3e2c285a-1bf8-f71d-1b74-4d6465c29a54@yandex-team.ru>
On 05/15/2018 10:36 AM, Konstantin Khlebnikov wrote:
>
>
> On 15.05.2018 20:19, Nagarathnam Muthusamy wrote:
>>
>>
>> On 04/24/2018 10:36 PM, Konstantin Khlebnikov wrote:
>>> On 23.04.2018 20:37, Nagarathnam Muthusamy wrote:
>>>>
>>>>
>>>> On 04/05/2018 12:02 AM, Konstantin Khlebnikov wrote:
>>>>> On 05.04.2018 01:29, Eric W. Biederman wrote:
>>>>>> Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com> writes:
>>>>>>
>>>>>>> On 04/04/2018 12:11 PM, Konstantin Khlebnikov wrote:
>>>>>>>> Each process have different pids, one for each pid namespace it
>>>>>>>> belongs.
>>>>>>>> When interaction happens within single pid-ns translation isn't
>>>>>>>> required.
>>>>>>>> More complicated scenarios needs special handling.
>>>>>>>>
>>>>>>>> For example:
>>>>>>>> - reading pid-files or logs written inside container with pid
>>>>>>>> namespace
>>>>>>>> - attaching with ptrace to tasks from different pid namespace
>>>>>>>> - passing pids across pid namespaces in any kind of API
>>>>>>>>
>>>>>>>> Currently there are several interfaces that could be used here:
>>>>>>>>
>>>>>>>> Pid namespaces are identified by inode number of
>>>>>>>> /proc/[pid]/ns/pid.
>>>>>>
>>>>>> Using the inode number in interfaces is not an option. Especially
>>>>>> not
>>>>>> withou referencing the device number for the filesystem as well.
>>>>>
>>>>> This is supposed to be single-instance fs,
>>>>> not part of proc but referenced but its magic "symlinks".
>>>>>
>>>>> Device numbers are not mentioned in "man namespaces".
>>>>>
>>>>>>
>>>>>>>> Pids for nested Pid namespaces are shown in file
>>>>>>>> /proc/[pid]/status.
>>>>>>>> In some cases conversion pid -> vpid could be easily done using
>>>>>>>> this
>>>>>>>> information, but backward translation requires scanning all tasks.
>>>>>>>>
>>>>>>>> Unix socket automatically translates pid attached to
>>>>>>>> SCM_CREDENTIALS.
>>>>>>>> This requires CAP_SYS_ADMIN for sending arbitrary pids and
>>>>>>>> entering
>>>>>>>> into pid namespace, this expose process and could be insecure.
>>>>>>>>
>>>>>>>> This patch adds new syscall for converting pids between pid
>>>>>>>> namespaces:
>>>>>>>>
>>>>>>>> pid_t translate_pid(pid_t pid, int source_type, int source,
>>>>>>>> int target_type, int target);
>>>>>>>>
>>>>>>>> @source_type and @target_type defines type of following arguments:
>>>>>>>>
>>>>>>>> TRANSLATE_PID_CURRENT_PIDNS - current pid namespace, argument
>>>>>>>> is unused
>>>>>>>> TRANSLATE_PID_TASK_PIDNS - task pid-ns, argument is task pid
>>>>>>>
>>>>>>> I believe using pid to represent the namespace has been already
>>>>>>> discussed in V1 of this patch in
>>>>>>> https://lkml.org/lkml/2015/9/22/1087
>>>>>>> after which we moved on to fd based version of this interface.
>>>>>>
>>>>>> Or in short why is the case of pids important?
>>>>>>
>>>>>> You Konstantin you almost said why they were important in your
>>>>>> message
>>>>>> saying you were going to send this one. However you don't
>>>>>> explain in
>>>>>> your description why you want to identify pid namespaces by pid.
>>>>>>
>>>>>
>>>>> Open of /proc/[pid]/ns/pid requires same permissions as ptrace,
>>>>> pid based variant doesn't have such restrictions.
>>>>
>>>> Can you provide more information on usecase requiring PID
>>>> translation but not used for tracing related purposes?
>>>
>>> Any introspection for [nested] containers. It's easier to work when
>>> you have all information when you don't have any.
>>> For example our CMS https://github.com/yandex/porto allows to start
>>> nested sub-container (or even deeper) by request from any container
>>> and have to tell back which pid task is have. And it could translate
>>> any pid inside into accessible by client and vice versa.
>>>
>>
>> I still dont get the exact reason why PID based approach to identify
>> the namespace during pid translation process is absolutely required
>> compared to fd based approach.
>
> As I told open(/proc/%d/ns/pid) have security restrictions - same
> uid/CAP_SYS_PTRACE/whatever
> Pidns-fd holds pid-namespace and without restrictions could be abused.
> Pid based API is racy but always available without any restrictions.
I get that Pid based API is available without any restrictions but do we
have any existing usecase which requires Pid based API but cannot use
Pidns-fd based API? Most of the usecases discussed in this thread deals
with introspection of a process by another process and I believe that
security requirement for opening (/proc/%d/ns/pid) is required for all
such usecases. In other words, Why would a process which does not belong
to same uid of the process observed or have CAP_SYS_PTRACE be allowed to
translate PID?
Thanks,
Nagarathnam.
>
>
>> From your version of TranslatePid in
>>
>> https://github.com/yandex/porto/blob/0d7e6e7e1830dcd0038a057b2ab9964cec5b8fab/src/util/unix.cpp
>>
>>
>> I see that you are going through the trouble of forking a process and
>> sending SMC_CREDENTIALS for pid translation. Even your existing API
>> could be extremely simplified if translate_pid based on file
>> descriptors make it to the gate and I believe from the last
>> discussion it was almost there
>> https://patchwork.kernel.org/patch/10305439/
>>
>>
>>>> On a side note, can we have the types TRANSLATE_PID_CURRENT_PIDNS
>>>> and TRANSLATE_PID_FD_PIDNS integrated first and then possibly
>>>> extend the interface to include TRANSLATE_PID_TASK_PIDNS in future?
>>>
>>> I don't see reason for this separation.
>>> Pids and pid namespaces are part of the API for a long time.
>>
>> If you are talking about the translate_pid API proposed, I believe
>> the V4 proposed under https://patchwork.kernel.org/patch/10003935/
>> had only fd based API before a mix of PID and fd based is proposed in
>> V5. Again, I was just wondering if we can get the FD based approach
>> in first and then extend the API to include PID based approach later
>> as fd based approach could provide a lot of immediate benefits?
>>
>> Thanks,
>> Nagarathnam.
>>>
>>>>
>>>> Thanks,
>>>> Nagarathnam.
>>>>> Most pid-based syscalls are racy in some cases but they are
>>>>> here for decades and everybody knowns how to deal with it.
>>>>> So, I've decided to merge both worlds in one interface which
>>>>> clearly tells what to expect.
>>>>
>>
next prev parent reply other threads:[~2018-05-15 17:40 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-04 19:11 [PATCH RFC v5] pidns: introduce syscall translate_pid Konstantin Khlebnikov
2018-04-04 20:31 ` Nagarathnam Muthusamy
2018-04-04 22:29 ` Eric W. Biederman
2018-04-05 7:02 ` Konstantin Khlebnikov
2018-04-23 17:37 ` Nagarathnam Muthusamy
2018-04-25 5:36 ` Konstantin Khlebnikov
2018-05-15 17:19 ` Nagarathnam Muthusamy
2018-05-15 17:36 ` Konstantin Khlebnikov
2018-05-15 17:40 ` Nagarathnam Muthusamy [this message]
2018-05-15 17:44 ` Nagarathnam Muthusamy
2018-05-31 17:41 ` Nagarathnam Muthusamy
2018-05-31 17:52 ` Eric W. Biederman
2018-05-31 18:05 ` Eric W. Biederman
2018-06-01 6:58 ` Konstantin Khlebnikov
2018-06-01 15:55 ` NAGARATHNAM MUTHUSAMY
2018-06-01 16:14 ` Eric W. Biederman
2018-06-01 16:15 ` Konstantin Khlebnikov
2018-04-27 12:15 ` Michael Kerrisk (man-pages)
2018-05-31 18:02 ` Eric W. Biederman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4b21a648-0abc-e92d-ec37-681dff63bb55@oracle.com \
--to=nagarathnam.muthusamy@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=ebiederm@xmission.com \
--cc=jannh@google.com \
--cc=khlebnikov@yandex-team.ru \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=oleg@redhat.com \
--cc=prakash.sangappa@oracle.com \
--cc=serge.hallyn@ubuntu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).