From: Sargun Dhillon <sargun@sargun.me>
To: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Tycho Andersen <tycho@tycho.pizza>,
	Christian Brauner <christian@brauner.io>,
	Kees Cook <keescook@chromium.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Giuseppe Scrivano <gscrivan@redhat.com>,
	Song Liu <songliubraving@fb.com>,
	Robert Sesek <rsesek@google.com>,
	Containers <containers@lists.linux-foundation.org>,
	linux-man <linux-man@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Aleksa Sarai <cyphar@cyphar.com>, Jann Horn <jannh@google.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Will Drewry <wad@chromium.org>, bpf <bpf@vger.kernel.org>,
	Andy Lutomirski <luto@amacapital.net>
Subject: Re: For review: seccomp_user_notif(2) manual page [v2]
Date: Fri, 30 Oct 2020 20:27:21 +0000
Message-ID: <20201030202720.GA4088@ircssh-2.c.rugged-nimbus-611.internal>
In-Reply-To: <48e5937b-80f5-c48b-1c67-e8c9db263ca5@gmail.com>

On Thu, Oct 29, 2020 at 09:37:21PM +0100, Michael Kerrisk (man-pages) wrote:
> Hello Sargun,
> 
> On 10/29/20 9:53 AM, Sargun Dhillon wrote:
> > On Mon, Oct 26, 2020 at 10:55:04AM +0100, Michael Kerrisk (man-pages) wrote:
> 
> [...]
> 
> >>    ioctl(2) operations
> >>        The following ioctl(2) operations are provided to support seccomp
> >>        user-space notification.  For each of these operations, the first
> >>        (file descriptor) argument of ioctl(2) is the listening file
> >>        descriptor returned by a call to seccomp(2) with the
> >>        SECCOMP_FILTER_FLAG_NEW_LISTENER flag.
> >>
> >>        SECCOMP_IOCTL_NOTIF_RECV
> >>               This operation is used to obtain a user-space notification
> >>               event.  If no such event is currently pending, the
> >>               operation blocks until an event occurs.  The third
> >>               ioctl(2) argument is a pointer to a structure of the
> >>               following form which contains information about the event.
> >>               This structure must be zeroed out before the call.
> >>
> >>                   struct seccomp_notif {
> >>                       __u64  id;              /* Cookie */
> >>                       __u32  pid;             /* TID of target thread */
> >>                       __u32  flags;           /* Currently unused (0) */
> >>                       struct seccomp_data data;   /* See seccomp(2) */
> >>                   };
> >>
> >>               The fields in this structure are as follows:
> >>
> >>               id     This is a cookie for the notification.  Each such
> >>                      cookie is guaranteed to be unique for the
> >>                      corresponding seccomp filter.
> >>
> >>                      · It can be used with the
> >>                        SECCOMP_IOCTL_NOTIF_ID_VALID ioctl(2) operation
> >>                        to verify that the target is still alive.
> >>
> >>                      · When returning a notification response to the
> >>                        kernel, the supervisor must include the cookie
> >>                        value in the seccomp_notif_resp structure that is
> >>                        specified as the argument of the
> >>                        SECCOMP_IOCTL_NOTIF_SEND operation.
> >>
> >>               pid    This is the thread ID of the target thread that
> >>                      triggered the notification event.
> >>
> >>               flags  This is a bit mask of flags providing further
> >>                      information on the event.  In the current
> >>                      implementation, this field is always zero.
> >>
> >>               data   This is a seccomp_data structure containing
> >>                      information about the system call that triggered
> >>                      the notification.  This is the same structure that
> >>                      is passed to the seccomp filter.  See seccomp(2)
> >>                      for details of this structure.
> >>
> >>               On success, this operation returns 0; on failure, -1 is
> >>               returned, and errno is set to indicate the cause of the
> >>               error.  This operation can fail with the following errors:
> >>
> >>               EINVAL (since Linux 5.5)
> >>                      The seccomp_notif structure that was passed to the
> >>                      call contained nonzero fields.
> >>
> >>               ENOENT The target thread was killed by a signal as the
> >>                      notification information was being generated, or
> >>                      the target's (blocked) system call was interrupted
> >>                      by a signal handler.
> >>
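The RECV and SEND steps described above can be illustrated with a minimal sketch. It assumes Linux 5.0+ UAPI headers (<linux/seccomp.h>) and a listening fd already obtained from seccomp(2) with SECCOMP_FILTER_FLAG_NEW_LISTENER. The helper names recv_notif() and fail_notif() are hypothetical. The response uses the struct seccomp_notif_resp layout from the kernel headers (id, val, error, flags), where error holds a negative errno value.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/*
 * Obtain the next notification from listen_fd. The structure is zeroed
 * first, because since Linux 5.5 nonzero fields are rejected with EINVAL.
 * Returns 0 on success, or -1 with errno set; ENOENT means the target
 * died or its system call was interrupted, so the caller can simply wait
 * for the next event.
 */
static int recv_notif(int listen_fd, struct seccomp_notif *req)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    return ioctl(listen_fd, SECCOMP_IOCTL_NOTIF_RECV, req);
}

/*
 * Respond to notification req, making the target's system call fail with
 * the positive errno value err. The cookie (req->id) must be copied into
 * the response so that the kernel can match it to the notification.
 */
static int fail_notif(int listen_fd, const struct seccomp_notif *req, int err)
{
    struct seccomp_notif_resp resp;

    memset(&resp, 0, sizeof(resp));
    resp.id = req->id;
    resp.error = -err;   /* negative errno value */
    resp.val = 0;        /* ignored when error is nonzero */
    resp.flags = 0;
    return ioctl(listen_fd, SECCOMP_IOCTL_NOTIF_SEND, &resp);
}
```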
> > 
> > I think I commented in another thread that the supervisor is not
> > notified if the target's system call is interrupted (for example, by
> > a signal). Therefore, if the supervisor is performing an interruptible,
> > long-running system call on the target's behalf, it needs to poll
> > SECCOMP_IOCTL_NOTIF_ID_VALID in the background; otherwise it can
> > end up in a bad situation, such as leaking resources or holding on to
> > file descriptors after the program under supervision intended to
> > release them.
> 
> It's been a long day, and I'm not sure I really understand this.
> Could you outline the scenario in more detail?
> 
(S = supervisor, T = target)

S: Sets up filter + interception for accept
T: socket(AF_INET, SOCK_STREAM, 0) = 7
T: bind(7, {127.0.0.1, 4444}, ...)
T: listen(7, 10)
S: pidfd_getfd(T, 7) = 7 # For the sake of discussion.
T: accept(7, ...)
S: Intercepts accept
S: Does accept in background
T: Receives signal, and accept(...) fails with EINTR
T: close(7)
S: Still running accept(7, ...) on its duplicate of the socket, so port
   4444 remains bound; if T now retries binding to port 4444, it fails.

> > A very specific example is if you're performing an accept on behalf
> > of the program generating the notification, and the program intends
> > to reuse the port. You can get into all sorts of awkward situations
> > there.
> 
> [...]
> 
See above

> > 	SECCOMP_IOCTL_NOTIF_ADDFD (since Linux 5.9)
> > 		This operation is used by the supervisor to add a file
> > 		descriptor to the process that generated the notification.
> > 		This can be used by the supervisor to enable "emulation"
> > 		[Probably a better word] of syscalls which return file
> > 		descriptors, such as socket(2) or open(2).
> > 
> > 		When the file descriptor is received by the process that
> > 		is associated with the notification / cookie, it follows
> > 		SCM_RIGHTS-like semantics and is evaluated by MAC.
> 
> I'm not sure what you mean by SCM_RIGHTS like semantics. Do you mean,
> the file descriptor refers to the same open file description
> ('struct file')?
> 
> "is evaluated by MAC"... Do you mean something like: the FD is 
> subject to LSM checks?
> 
It follows the same model as SCM_RIGHTS: the file is checked against LSMs in
the same way, so if your LSM hooks in, the same hook fires as when the file is
passed via SCM_RIGHTS. Also, as with SCM_RIGHTS, some aspects of the fd are
shared (the open file description) while others differ (like the fd flags).
Perhaps there's a better term to describe these semantics.

RE: Evaluated by MAC - yes, checked by LSMs.

> > 		In addition, if it is a socket, it inherits the cgroup
> > 		v1 classid and netprioidx of the receiving process.
> > 
> > 		The third ioctl(2) argument is a pointer to a structure of
> > 		the following form:
> > 
> > 			struct seccomp_notif_addfd {
> > 				__u64 id;
> > 				__u32 flags;
> > 				__u32 srcfd;
> > 				__u32 newfd;
> > 				__u32 newfd_flags;
> > 			};
> > 
> > 		id
> > 			This is the cookie value that was obtained using
> > 			SECCOMP_IOCTL_NOTIF_RECV.
> > 
> > 		flags
> > 			A bit mask of zero or more of the following
> > 			SECCOMP_ADDFD_FLAG_* flags:
> > 
> > 			SECCOMP_ADDFD_FLAG_SETFD - Use dup2(2)-like
> > 				(or dup3(2)-like?) semantics when copying
> > 				the file descriptor.
> > 
> > 		srcfd
> > 			The number of the file descriptor in the
> > 			supervisor process that is to be copied.
> > 
> > 		newfd
> > 			If the SECCOMP_ADDFD_FLAG_SETFD flag is specified
> > 			this will be the file descriptor that is used
> > 			in the dup2 semantics. If this file descriptor
> > 			exists in the receiving process, it is closed
> > 			and replaced by this file descriptor in an
> > 			atomic fashion. If the copy process fails
> > 			due to a MAC failure, or if srcfd is invalid,
> > 			the newfd will not be closed in the receiving
> > 			process.
> 
> Great description!
> 
> > 			If SECCOMP_ADDFD_FLAG_SETFD is not set, then
> > 			this value must be 0.
> > 
> > 		newfd_flags
> > 			The file descriptor flags to set on
> > 			the file descriptor after it has been received
> > 			by the process. The only flag that can currently
> > 			be specified is O_CLOEXEC.
> > 
> > 		On success, this operation returns the file descriptor
> > 		number in the receiving process. On failure, -1 is returned,
> > 		and errno is set to indicate the error.
> > 
> > 		It can fail with the following error codes:
> > 
> > 		EINPROGRESS
> > 			The notification specified by the cookie has not
> > 			yet been received by the listener (via
> > 			SECCOMP_IOCTL_NOTIF_RECV).
> 
> I don't understand this. Can you say more about the scenario?
> 

This should not really happen. But if you do an ADDFD(...) on a notification
*before* you've received it, you will get this error. For example:
--> epoll(...) -> returns
--> RECV(...) -> cookie id is 777
--> epoll(...) -> returns
<-- ioctl(ADDFD, id = 778) # Note that we haven't yet done a RECV that
    returned notification 778.

> > 		ENOENT
> > 			The cookie number is not valid. This can happen
> > 			if a response has already been sent, or if the
> > 			syscall was interrupted.
> > 
> > 		EBADF
> > 			If the file descriptor specified in srcfd is
> > 			invalid, or if the fd is out of range of the
> > 			destination program.
> 
> The piece "or if the fd is out of range of the destination
> program" is not clear to me. Can you say some more please.
> 

IIRC, the maximum fd number is specified in /proc by a sysctl named
nr_open (/proc/sys/fs/nr_open). It's also evaluated against the
RLIMIT_NOFILE resource limit, and nr_max.

If nr_open (the maximum number of fds a process can have open, IIRC) is
1000, then even if only 10 FDs are open, it won't work if newfd is 1001.
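To illustrate the range check described above, here is a hedged pre-check a supervisor might run before SECCOMP_IOCTL_NOTIF_ADDFD with SECCOMP_ADDFD_FLAG_SETFD. It assumes glibc's prlimit(2) wrapper (reading another process's limits requires suitable privileges) and reads /proc/sys/fs/nr_open. It does not model the "nr_max" limit mentioned above, and because limits can change at any time, the ioctl's own error remains authoritative. newfd_in_range() is a hypothetical name; passing pid 0 checks the calling process.

```c
#define _GNU_SOURCE          /* for prlimit() */
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>

/*
 * Hypothetical pre-check: return 1 if newfd is below both the target's
 * RLIMIT_NOFILE soft limit and the system-wide fs.nr_open ceiling, 0 if
 * it is not, or -1 if either limit could not be read.
 */
static int newfd_in_range(pid_t pid, unsigned int newfd)
{
    struct rlimit rl;
    unsigned long nr_open;
    FILE *f;
    int ok;

    assert(pid >= 0);        /* 0 means the calling process */
    if (prlimit(pid, RLIMIT_NOFILE, NULL, &rl) == -1)
        return -1;

    f = fopen("/proc/sys/fs/nr_open", "r");
    if (f == NULL)
        return -1;
    ok = fscanf(f, "%lu", &nr_open) == 1;
    fclose(f);
    if (!ok)
        return -1;

    /* fd numbers run from 0 to limit - 1 */
    return newfd < rl.rlim_cur && newfd < nr_open;
}
```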

> > 		EINVAL
> > 			If flags or newfd_flags were unrecognized, or
> > 			if newfd is nonzero and SECCOMP_ADDFD_FLAG_SETFD
> > 			has not been set.
> > 
> > 		EMFILE
> > 			Too many files are open by the destination process.
> > 
> > 		[There are other possible error codes, for example from
> > 		 LSMs, EFAULT if memory can't be read/written, or EBUSY.]
> > 		 
> > Does this help?
> 
> It's a good start!
> 
> Thanks,
> 
> Michael
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/
