From: Florian Weimer <fweimer@redhat.com>
To: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Daniel Colascione <dancol@google.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Joel Fernandes <joelaf@google.com>,
	Linux API <linux-api@vger.kernel.org>, Willy Tarreau <w@1wt.eu>,
	Vlastimil Babka <vbabka@suse.cz>,
	Carlos O'Donell <carlos@redhat.com>,
	"libc-alpha@sourceware.org" <libc-alpha@sourceware.org>
Subject: Re: Official Linux system wrapper library?
Date: Sun, 11 Nov 2018 12:09:28 +0100
Message-ID: <877ehjx447.fsf@oldenburg.str.redhat.com>
In-Reply-To: <bbc12da5-830e-99a7-95e3-d9da42947dc9@gmail.com> (Michael Kerrisk's message of "Sun, 11 Nov 2018 07:55:30 +0100")

* Michael Kerrisk:

> [adding in glibc folk for comment]
>
> On 11/10/18 7:52 PM, Daniel Colascione wrote:
>> Now that glibc is basically not adding any new system call wrappers,
>> how about publishing an "official" system call glue library as part of
>> the kernel distribution, along with the uapi headers? I don't think
>> it's reasonable to expect people to keep using syscall(__NR_XXX) for
>> all new functionality, especially as the system grows increasingly
>> sophisticated capabilities (like the new mount API, and hopefully the
>> new process API) outside the strictures of the POSIX process.
>
> As a quick glance at the glibc NEWS file shows, the above is not
> quite true:
>
> [[
> Version 2.28
> * The renameat2 function has been added...
> * The statx function has been added...
>
> Version 2.27
> * Support for memory protection keys was added.  The <sys/mman.h> header now
>   declares the functions pkey_alloc, pkey_free, pkey_mprotect...
> * The copy_file_range function was added.
>
> Version 2.26
> * New wrappers for the Linux-specific system calls preadv2 and pwritev2.
>
> Version 2.25
> * The getrandom [function] have been added.
> ]]
>
> I make that 11 system call wrappers added in the last 2 years.

And you missed mlock2 and memfd_create.

In some cases, we used system calls before the kernel had them (because
the kernel does not add system calls consistently across architectures).

On the other hand, this is only half of the story because distributions
do not backport system call wrappers, even those that backport kernel
implementations (or just rebase the kernel).  This is something that
could be fixed eventually, but it is related to another problem:

We had a patch for the membarrier system call, but the kernel developers
could not tell us what the system call does in terms of the C/C++
memory model, and the kernel developers and our concurrency expert could
not agree on documentation.

A lot of the new system calls lack clear specifications or are just
somewhat misdesigned.  For example, pkey_alloc uses PKEY_DISABLE_WRITE
and PKEY_DISABLE_ACCESS flags (where the latter implies disabling both
read and write access), not something that matches the PROT_READ and
PROT_WRITE flags used by mmap/mprotect.  This caused problems when POWER
support for pkey_alloc was added, and we are still working on resolving
that.
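
To illustrate the mismatch, here is a rough sketch (not taken from
glibc) of the translation a caller has to do when thinking in PROT_*
terms.  The helper name prot_to_pkey_rights is hypothetical, and the
fallback flag values are an assumption for headers that predate the
pkey declarations:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>

/* Fallback values for older headers.  They match the generic Linux
   definitions, but treat them as an assumption on your platform.  */
#ifndef PKEY_DISABLE_ACCESS
# define PKEY_DISABLE_ACCESS 0x1
#endif
#ifndef PKEY_DISABLE_WRITE
# define PKEY_DISABLE_WRITE 0x2
#endif

/* Hypothetical helper: translate a PROT_* mask ("what is allowed")
   into pkey access rights ("what is disabled").  Returns -1 if the
   request cannot be expressed, that is, write access without read
   access.  PROT_EXEC is ignored because neither flag covers it.  */
int
prot_to_pkey_rights (int prot)
{
  assert ((prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC)) == 0);
  if (!(prot & PROT_READ))
    return (prot & PROT_WRITE) ? -1 : PKEY_DISABLE_ACCESS;
  if (!(prot & PROT_WRITE))
    return PKEY_DISABLE_WRITE;
  return 0;
}
```

The result would be passed as the access_rights argument of
pkey_alloc.  Note that the mapping is lossy in both directions.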

getrandom still causes boot delays because the kernel somehow fails to
seed its internal pool before starting PID 1 even on mainstream hardware
which has plenty of (true) randomness sources available, leading to
indefinite blocking of getrandom.  It seems to me that people have
largely given up on fixing this in the upstream kernel.

For copy_file_range, we are still debating whether the system call (and
the glibc emulation) should preserve holes, and there are plans to
lift the cross-device restriction.
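
To make the hole question concrete, here is a simplified sketch of a
caller-side loop with a read/write fallback.  It is not the glibc
emulation; the helper name copy_bytes and the "fall back on any error"
policy are assumptions made to keep the example short:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy LEN bytes from IN_FD to OUT_FD at their
   current file offsets.  Returns 0 on success, -1 on error or if the
   input ends early.  */
int
copy_bytes (int in_fd, int out_fd, size_t len)
{
  assert (in_fd >= 0 && out_fd >= 0);
  int use_syscall = 1;
  char buf[4096];

  while (len > 0)
    {
      ssize_t n;
      if (use_syscall)
        {
          n = copy_file_range (in_fd, NULL, out_fd, NULL, len, 0);
          if (n < 0)
            {
              /* Simplification: any failure switches to read/write.
                 Careful code would do so only for errors such as
                 ENOSYS or EXDEV.  */
              use_syscall = 0;
              continue;
            }
        }
      else
        {
          size_t chunk = len < sizeof buf ? len : sizeof buf;
          n = read (in_fd, buf, chunk);
          if (n < 0)
            return -1;
          /* This path writes every byte, so holes in the source
             become allocated zeros in the destination.  */
          for (ssize_t done = 0; done < n; )
            {
              ssize_t w = write (out_fd, buf + done, (size_t) (n - done));
              if (w < 0)
                return -1;
              done += w;
            }
        }
      if (n == 0)
        return -1;              /* Unexpected end of input.  */
      len -= (size_t) n;
    }
  return 0;
}
```

Whether the first branch preserves holes depends on the kernel and
file system, while the fallback never does, which is exactly why the
two paths are hard to specify as one interface.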

For renameat2, we already had a function in gnulib with the same name,
but which did not provide the atomic RENAME_NOREPLACE behavior for which
renameat2 was introduced.
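
For reference, the atomic behavior in question looks roughly like this
when invoked through the generic syscall function.  The helper name
rename_noreplace is hypothetical, and the fallback definition of
RENAME_NOREPLACE is only there for headers that do not provide it:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>              /* AT_FDCWD */
#include <sys/syscall.h>        /* SYS_renameat2 */
#include <unistd.h>             /* syscall */

#ifndef RENAME_NOREPLACE
# define RENAME_NOREPLACE (1 << 0)
#endif

/* Hypothetical helper: rename OLDPATH to NEWPATH, failing with EEXIST
   if NEWPATH already exists.  The existence check and the rename are
   one atomic step in the kernel.  A user-space emulation that checks
   first and renames afterwards has a race window in between.  */
int
rename_noreplace (const char *oldpath, const char *newpath)
{
  assert (oldpath != NULL && newpath != NULL);
  return (int) syscall (SYS_renameat2, AT_FDCWD, oldpath,
                        AT_FDCWD, newpath, RENAME_NOREPLACE);
}
```

On kernels or file systems without support, the call fails (for
example with ENOSYS or EINVAL) and the caller has to decide whether a
non-atomic fallback is acceptable.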

These problems are relevant to the backporting question.  One relatively
low-cost way to backport straight wrappers would be to put them as
hidden functions into libc_nonshared.a.  But with these uncertainties,
this would be rather risky because fixing bugs in the wrappers would
then require relinking.
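
A straight wrapper of this kind could look like the sketch below.
This is an illustration only, not an actual glibc patch, and the name
compat_memfd_create is invented for the example:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>        /* SYS_memfd_create */
#include <unistd.h>             /* syscall */

/* Hypothetical straight wrapper.  Hidden visibility keeps the symbol
   out of the dynamic symbol table, so every program or library that
   links the static archive gets its own private copy of this code.  */
__attribute__ ((visibility ("hidden")))
int
compat_memfd_create (const char *name, unsigned int flags)
{
  assert (name != NULL);
  return (int) syscall (SYS_memfd_create, name, flags);
}
```

Because the object code is copied into each binary at link time, a
later fix to the wrapper does not reach existing binaries until they
are relinked, unlike a fix in the shared libc.so.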

Thanks,
Florian
