archive mirror
 help / color / mirror / Atom feed
From: Kees Cook <>
To: Stefano Garzarella <>
Cc: Pavel Begunkov <>,
	Miklos Szeredi <>,
	Matthew Wilcox <>,
	Andy Lutomirski <>,
	Jann Horn <>,
	Christian Brauner <>,,,
	Linux API <>,,,
	Michael Kerrisk <>,
	Stefan Hajnoczi <>
Subject: Re: strace of io_uring events?
Date: Thu, 16 Jul 2020 08:12:35 -0700	[thread overview]
Message-ID: <202007160751.ED56C55@keescook> (raw)
In-Reply-To: <20200716131404.bnzsaarooumrp3kx@steredhat>

On Thu, Jul 16, 2020 at 03:14:04PM +0200, Stefano Garzarella wrote:
> On Wed, Jul 15, 2020 at 04:07:00PM -0700, Kees Cook wrote:
> [...]
> > Speaking to Stefano's proposal[1]:
> > 
> > - There appear to be three classes of desired restrictions:
> >   - opcodes for io_uring_register() (which can be enforced entirely with
> >     seccomp right now).
> >   - opcodes from SQEs (this _could_ be intercepted by seccomp, but is
> >     not currently written)
> >   - opcodes of the types of restrictions to restrict... for making sure
> >     things can't be changed after being set? seccomp already enforces
> >     that kind of "can only be made stricter"
> In addition we want to limit the SQEs to use only the registered fd and buffers.

Hmm, good point. Yeah, since it's an "extra" mapping (ioring file number
vs fd number) this doesn't really map well to seccomp. (And frankly,
there's some difficulty here mapping many of the ioring-syscalls to
seccomp because it's happening "deeper" than the syscall layer (i.e.
some of the arguments have already been resolved into kernel object
pointers, etc).

> Do you think it's better to have everything in seccomp instead of adding
> the restrictions in io_uring (the patch isn't very big)?

I'm still trying to understand how io_uring will be used, and it seems
odd to me that it's effectively a seccomp bypass. (Though from what I
can tell it is not an LSM bypass, which is good -- though I'm worried
there might be some embedded assumptions in LSMs about creds vs current
and LSMs may try to reason (or report) on actions with the kthread in
mind, but afaict everything important is checked against creds.

> With seccomp, would it be possible to have different restrictions for two
> instances of io_uring in the same process?

For me, this is the most compelling reason to have the restrictions NOT
implemented via seccomp. Trying to make "which instance" choice in
seccomp would be extremely clumsy.

So at this point, I think it makes sense for the restriction series to
carry on -- it is io_uring-specific and solves some problems that
seccomp is not in good position to reason about.

All this said, I'd still like a way to apply seccomp to io_uring
because it's a rather giant syscall filter bypass mechanism, and gaining
access (IIUC) is possible without actually calling any of the io_uring
syscalls. Is that correct? A process would receive an fd (via SCM_RIGHTS,
pidfd_getfd, or soon seccomp addfd), and then call mmap() on it to gain
access to the SQ and CQ, and off it goes? (The only glitch I see is
waking up the worker thread?)

What appears to be the worst bit about adding seccomp to io_uring is the
almost complete disassociation of process hierarchy from syscall action.
Only a cred is used for io_uring, and seccomp filters are associated with
task structs. I'm not sure if there is a way to solve this disconnect
without a major internal refactoring of seccomp to attach to creds and
then make every filter attachment create a new cred... *head explody*

Kees Cook

  reply	other threads:[~2020-07-16 15:12 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-15 11:12 strace of io_uring events? Miklos Szeredi
2020-07-15 14:35 ` Andy Lutomirski
2020-07-15 17:11   ` Matthew Wilcox
2020-07-15 19:42     ` Pavel Begunkov
2020-07-15 20:09       ` Miklos Szeredi
2020-07-15 20:20         ` Pavel Begunkov
2020-07-15 23:07           ` Kees Cook
2020-07-16 13:14             ` Stefano Garzarella
2020-07-16 15:12               ` Kees Cook [this message]
2020-07-17  8:01                 ` Stefano Garzarella
2020-07-21 15:27                   ` Andy Lutomirski
2020-07-21 15:31                     ` Jens Axboe
2020-07-21 17:23                       ` Andy Lutomirski
2020-07-21 17:30                         ` Jens Axboe
2020-07-21 17:44                           ` Andy Lutomirski
2020-07-21 18:39                             ` Jens Axboe
2020-07-21 19:44                               ` Andy Lutomirski
2020-07-21 19:48                                 ` Jens Axboe
2020-07-21 19:56                                 ` Andres Freund
2020-07-21 19:37                         ` Andres Freund
2020-07-21 15:58                     ` Stefano Garzarella
2020-07-23 10:39                       ` Stefan Hajnoczi
2020-07-23 13:37                       ` Colin Walters
2020-07-24  7:25                         ` Stefano Garzarella
2020-07-16 13:17             ` Aleksa Sarai
2020-07-16 15:19               ` Kees Cook
2020-07-17  8:17               ` Cyril Hrubis
2020-07-16 16:24             ` Andy Lutomirski
2020-07-16  0:12     ` tytso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202007160751.ED56C55@keescook \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).