linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@kernel.org>
To: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Djalal Harouni <tixxdz@gmail.com>,
	Alexey Gladkov <gladkov.alexey@gmail.com>,
	Aliaksandr Patseyenak <Aliaksandr_Patseyenak1@epam.com>,
	Tatsiana Brouka <Tatsiana_Brouka@epam.com>
Subject: Re: [PATCH 1/2 v2] fdmap(2)
Date: Thu, 28 Sep 2017 08:02:23 -0700	[thread overview]
Message-ID: <CALCETrW_8BHRtL5qPr4JHe_7TptEwc_4ZoDiuhZZE6Ph979YKQ@mail.gmail.com> (raw)
In-Reply-To: <CACVxJT_B4m-O66haETWkFVx5AgGqDbM-K-53odJ0bRp0_2xhrQ@mail.gmail.com>

On Thu, Sep 28, 2017 at 3:55 AM, Alexey Dobriyan <adobriyan@gmail.com> wrote:
> On 9/28/17, Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote:
>> On 27 September 2017 at 17:03, Andy Lutomirski <luto@amacapital.net> wrote:
>
>>>> The idea is to start process. In ideal world, only bynary system calls
>>>> would exist and shells could emulate /proc/* same way bash implement
>>>> /dev/tcp
>>>
>>> Then start the process by doing it for real and making it obviously
>>> useful.  We should not add a pair of vaguely useful, rather weak
>>> syscalls just to start a process of modernizing /proc.
>
> Before doing it for real it would be nice to have at least a nod
> from people in charge that syscalls which return binary
> information are OK. Otherwise some EIATF guy will just say
> "NAK /proc is fine, it always was fine".

There's nothing inherently wrong with syscalls that return binary
information.  There is something wrong with reinventing the world with
insufficient justification, though.

/proc may be old, clunky, and kind of slow, but it has a lot of good
things going for it.  It supports access control (DAC and MAC).  It
handles namespacing in a way that's awkward but supported by all the
existing namespace managers.  It may soon support mount options, which
is rather important.

I feel like we've been discussing this performance issue for over a
year, and I distinctly recall discussing it in Santa Fe.  I suggested
a two-pronged approach:

1. Add a new syscall that will, in a single call, open, read, and
close a proc file and maybe even a regular non-proc file.  Like this:

long readfileat(int dirfd, const char *path, void *buf, size_t len, int flags);

2. Where needed, add new /proc files with lighter-weight
representations.  I think we discussed something that's like status
but in nl_attr format.

This doesn't end up with a bunch of code duplication the way that a
full-blown syscall-based reimplementation would.  It supports all the
/proc features rather than just a subset.  It fully respects access
control, future mount options, and the potential lack of a /proc
mount.

>
> Or look from another angle: sched_setaffinity exists but there is
> no /proc counterpart, shells must use taskset(1) and world didn't end.

sched_setaffinity() modifies the caller.  /proc wouldn't have made much sense.

>
>> I concur.
>>
>> Alexey, you still have not wxplained who specifically needs this
>> right now, and how, precisely, they plan to use the new system calls.
>> It is all very arm-wavey so far.
>
> It is not if you read even example program in the original patch.
> Any program which queries information about file descriptors
> will benefit both in CPU and memory usage.
>
> void closefrom(int start)
> {
>         int fd[1024];
>         int n;
>
>         while ((n = fdmap(0, fd, sizeof(fd)/sizeof(fd[0]), start)) > 0) {
>                 unsigned int i;
>
>                 for (i = 0; i < n; i++)
>                         close(fd[i]);
>
>                 start = fd[n - 1] + 1;
>         }
> }
>
> CRIU naturally to know everything about descriptors of target processes:
> It does:
>
> int predump_task_files(int pid)
> {
>         struct dirent *de;
>         DIR *fd_dir;
>         int ret = -1;
>
>         pr_info("Pre-dump fds for %d)\n", pid);
>
>         fd_dir = opendir_proc(pid, "fd");
>         if (!fd_dir)
>                 return -1;
>
>         while ((de = readdir(fd_dir))) {
>                 if (dir_dots(de))
>                         continue;
>
>                 if (predump_one_fd(pid, atoi(de->d_name)))
>                         goto out;
>         }
>
>         ret = 0;
> out:
>         closedir(fd_dir);
>         return ret;
> }
>
> which is again inefficient.

And /proc/PID/fds would solve this.

  reply	other threads:[~2017-09-28 15:02 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-24 20:06 [PATCH 1/2 v2] fdmap(2) Alexey Dobriyan
2017-09-24 20:08 ` [PATCH v2 2/2] pidmap(2) Alexey Dobriyan
2017-09-24 21:27   ` Andy Lutomirski
2017-09-26 18:46     ` Alexey Dobriyan
2017-09-27 15:04       ` Andy Lutomirski
2017-09-25  7:43   ` Michael Kerrisk (man-pages)
2017-09-25 10:47   ` Djalal Harouni
2017-09-26  5:44   ` kbuild test robot
2017-09-24 21:31 ` [PATCH 1/2 v2] fdmap(2) Andy Lutomirski
2017-09-26 18:43   ` Alexey Dobriyan
2017-09-25  7:42 ` Michael Kerrisk (man-pages)
2017-09-26 19:00   ` Alexey Dobriyan
2017-09-27 15:03     ` Andy Lutomirski
2017-09-28  7:26       ` Michael Kerrisk (man-pages)
2017-09-28 10:55         ` Alexey Dobriyan
2017-09-28 15:02           ` Andy Lutomirski [this message]
2017-10-11 17:37             ` Alexey Dobriyan
2017-09-28 10:10       ` Alexey Dobriyan
2017-10-23  9:29   ` Pavel Machek
2017-10-25 12:45     ` Alexey Dobriyan
2017-10-25 13:48       ` Pavel Machek
2017-09-26  4:25 ` kbuild test robot
2017-10-10 22:08 ` [1/2,v2] fdmap(2) Andrei Vagin
2017-10-11 18:12   ` Alexey Dobriyan
2017-10-12  8:06     ` Andrei Vagin
2017-10-18 11:35       ` Alexey Dobriyan
2017-10-18 17:47         ` Andy Lutomirski
2017-10-19 15:34           ` Alexey Dobriyan
2017-10-20  7:48             ` Greg KH
2017-10-25 13:11               ` Alexey Dobriyan
2017-10-26  7:53             ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALCETrW_8BHRtL5qPr4JHe_7TptEwc_4ZoDiuhZZE6Ph979YKQ@mail.gmail.com \
    --to=luto@kernel.org \
    --cc=Aliaksandr_Patseyenak1@epam.com \
    --cc=Tatsiana_Brouka@epam.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=gladkov.alexey@gmail.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtk.manpages@gmail.com \
    --cc=rdunlap@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tixxdz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).