* [Ksummit-discuss] [TECH TOPIC] seccomp
From: Christian Brauner @ 2019-07-19 9:35 UTC
To: ksummit-discuss

Hey everyone,

I would like to discuss approaches to enabling deep argument inspection
with seccomp. If we reach an agreement, I am also happy to do the work
and implement it.

Recently we landed seccomp support for SECCOMP_RET_USER_NOTIF, which
enables a process (watchee) to retrieve an fd for its seccomp filter.
This fd can then be handed to another (usually more privileged) process
(watcher). The watcher will then be able to receive seccomp messages
about the syscalls performed by the watchee.

I have integrated this feature into userspace. We currently make heavy
use of it to intercept mknod() syscalls in user namespaces, aka in
containers. If the mknod() syscall matches a device in a pre-determined
whitelist, the privileged watcher performs the mknod() syscall in lieu
of the unprivileged watchee and reports back to the watchee on the
success or failure of its attempt. If the syscall does not match a
device in the whitelist, we simply report an error.

We recently also started to intercept the setxattr() syscall to allow
the creation of various well-known xattrs, including
trusted.overlay.opaque.

The mknod() syscall can be easily filtered based on dev_t. This allows
us to intercept only a very specific subset of mknod() syscalls.
Furthermore, mknod() is not possible in user namespaces toto coelo, so
accidentally intercepting and denying syscalls that are not in the
whitelist is not a big deal. The watchee won't notice a difference.

In contrast to mknod(), setxattr() and many other syscalls that we would
like to intercept suffer from two major problems:

1. they are not easily filterable like mknod() because they have
   pointer arguments
2. some of them might actually succeed in user namespaces already
   (e.g. fscaps etc.)

Problem 1. is not specific to SECCOMP_RET_USER_NOTIF; it apparently also
affects future system call design. We recently merged the clone3()
syscall into mainline, which moves the flags from a register argument
into a dedicated extensible struct clone_args. This lifts the flag limit
of legacy clone() and allows for extensions while supporting all legacy
workloads. One of the counterarguments levelled against my design early
on was that this means clone3() cannot be easily filtered by seccomp due
to 1. This argument was fortunately not seen as decisive. I would argue
that there certainly is value in trying to design syscalls that can be
handled nicely by seccomp, but that seccomp must not become a burden on
designing extensible syscalls. The currently proposed openat2() syscall
also uses a dedicated argument struct which contains flags, and the
seccomp argument popped back up again.

In light of all this, I would argue that we should seriously look into
extending seccomp to allow filtering on pointer arguments.

There is a close connection between 1. and 2. When a watcher intercepts
a syscall from a watchee and starts to inspect its arguments it can -
depending on the syscall rather often actually - determine whether or
not the syscall would succeed or fail. If it knows that the syscall will
succeed it currently still has to perform it in lieu of the watchee
since there is no way to tell the kernel to "resume" or actually perform
the syscall. It would be nice if we could discuss approaches to enabling
this feature as well.

I'm happy to lead this session and can also illustrate how this feature
is heavily used and how we run into its limitations.

Thanks!
Christian
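The whitelist decision described in this mail (match mknod() on file type and dev_t, otherwise report an error) could look roughly like the sketch below. It is only an illustration: `struct mknod_req`, `watcher_decide()` and the allowlist entries are hypothetical names and values, since the mail does not say which devices are actually permitted or how the watcher is structured.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>      /* S_IFMT, S_IFCHR, S_IFBLK */
#include <sys/sysmacros.h> /* makedev(), major(), minor() */
#include <sys/types.h>     /* mode_t, dev_t */

/* Hypothetical: the two mknod() arguments a watcher would inspect. */
struct mknod_req {
    mode_t mode;
    dev_t dev;
};

/* Hypothetical allowlist entry: file type plus major/minor numbers. */
struct allowed_dev {
    mode_t type;
    unsigned int maj;
    unsigned int min;
};

/* Illustrative entries only; the mail does not list the real ones. */
static const struct allowed_dev allowlist[] = {
    { S_IFCHR, 1, 3 }, /* /dev/null */
    { S_IFCHR, 1, 5 }, /* /dev/zero */
    { S_IFCHR, 1, 9 }, /* /dev/urandom */
};

/* Returns 0 if the watcher may perform mknod() for the watchee, -EPERM otherwise. */
static int watcher_decide(const struct mknod_req *req)
{
    size_t i;

    assert(req != NULL);
    for (i = 0; i < sizeof(allowlist) / sizeof(allowlist[0]); i++) {
        const struct allowed_dev *d = &allowlist[i];

        if ((req->mode & S_IFMT) == d->type &&
            major(req->dev) == d->maj && minor(req->dev) == d->min)
            return 0;
    }
    return -EPERM;
}
```

Because both arguments arrive in registers, this check needs no access to the watchee's memory, which is exactly what setxattr() and clone3() lack.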
* Re: [Ksummit-discuss] [TECH TOPIC] seccomp
From: Andy Lutomirski @ 2019-07-19 12:32 UTC
To: Christian Brauner; +Cc: ksummit

On Fri, Jul 19, 2019 at 2:35 AM Christian Brauner <christian@brauner.io> wrote:
>
> In light of all this, I would argue that we should seriously look into
> extending seccomp to allow filtering on pointer arguments.

I won't be at LPC this year, but I was thinking about this anyway. I
have the following suggestion that might be a bit unorthodox: have
syscalls opt into this filtering. Specifically, a syscall that
supports pointer filtering would be refactored the way a bunch of our
syscalls are already refactored. The baseline situation is:

    SYSCALL_DEFINE1(syscallname, struct foo __user *, buf) { ... }

Instead, we would do:

    SYSCALL_FILTERABLE(syscallname, struct foo __user *, buf)
    {
        int ret;
        struct foo kbuf;

        /* copy_from_user() returns the number of bytes NOT copied */
        if (copy_from_user(&kbuf, buf, sizeof(kbuf)))
            return -EFAULT;

        ret = seccomp_deep_filter(syscallname, 0, &kbuf);
        if (ret)
            return ret;

        return do_syscallname(&kbuf);
    }

In principle, if we know we're doing a FILTERABLE syscall, we could
skip the initial seccomp invocation and just defer it until
seccomp_deep_filter(), although this might interact badly with any
SECCOMP_RET_TRACE handlers that change nr.

To make this robust, it might help a lot if the generation of these
stubs was mostly automated.
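The opt-in stub proposed here can be exercised in userspace with a small mock, sketched below under some assumptions. `mock_copy_from_user()`, the fields of `struct foo`, and `deny_flag_bit0()` are invented stand-ins; `SYSCALL_FILTERABLE` and `seccomp_deep_filter()` from the mail are proposals, not existing kernel interfaces. The point it illustrates is that the filter and the syscall body both operate on the same kernel-side snapshot, so the caller cannot change the arguments between the check and the use.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Invented stand-in for the syscall's argument struct. */
struct foo {
    unsigned long flags;
    int fd;
};

/* Mock of copy_from_user(): returns the number of bytes NOT copied.
 * A NULL source simulates a faulting user pointer. */
static unsigned long mock_copy_from_user(void *to, const void *from,
                                         unsigned long n)
{
    if (from == NULL)
        return n;
    memcpy(to, from, n);
    return 0;
}

/* A deep filter sees the copied struct, never the user pointer. */
typedef int (*deep_filter_fn)(const struct foo *karg);

/* Stand-in for the real syscall body; just echoes the fd. */
static int do_syscallname(const struct foo *karg)
{
    return karg->fd;
}

/* Mock of the opt-in stub: snapshot, filter the snapshot, then act on it. */
static int filterable_syscallname(const struct foo *ubuf, deep_filter_fn filter)
{
    struct foo kbuf;
    int ret;

    assert(filter != NULL);
    /* Copy the whole struct (sizeof(kbuf)), not the size of the pointer. */
    if (mock_copy_from_user(&kbuf, ubuf, sizeof(kbuf)))
        return -EFAULT;

    ret = filter(&kbuf);
    if (ret)
        return ret;

    return do_syscallname(&kbuf);
}

/* Example policy: reject calls that set bit 0 of flags. */
static int deny_flag_bit0(const struct foo *karg)
{
    return (karg->flags & 1UL) ? -EPERM : 0;
}
```

Passing a struct with `flags = 0` reaches `do_syscallname()`, one with bit 0 set is rejected with `-EPERM`, and a NULL pointer yields `-EFAULT` before the filter runs.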
* Re: [Ksummit-discuss] [TECH TOPIC] seccomp
From: Kees Cook @ 2019-07-20 3:18 UTC
To: Andy Lutomirski; +Cc: ksummit

On Fri, Jul 19, 2019 at 05:32:59AM -0700, Andy Lutomirski wrote:
> On Fri, Jul 19, 2019 at 2:35 AM Christian Brauner <christian@brauner.io> wrote:
> >
> > In light of all this, I would argue that we should seriously look into
> > extending seccomp to allow filtering on pointer arguments.

I would be all for this. :) I've struggled for a long while trying to
find a sane design for this.

> I won't be at LPC this year, but I was thinking about this anyway. I
> have the following suggestion that might be a bit unorthodox: have
> syscalls opt into this filtering. Specifically, a syscall that
> supports pointer filtering would be refactored the way a bunch of our
> syscalls are already refactored. The baseline situation is:
>
>     SYSCALL_DEFINE1(syscallname, struct foo __user *, buf) { ... }
>
> Instead, we would do:
>
>     SYSCALL_FILTERABLE(syscallname, struct foo __user *, buf)
>     {
>         int ret;
>         struct foo kbuf;
>
>         /* copy_from_user() returns the number of bytes NOT copied */
>         if (copy_from_user(&kbuf, buf, sizeof(kbuf)))
>             return -EFAULT;
>
>         ret = seccomp_deep_filter(syscallname, 0, &kbuf);
>         if (ret)
>             return ret;
>
>         return do_syscallname(&kbuf);
>     }
>
> In principle, if we know we're doing a FILTERABLE syscall, we could
> skip the initial seccomp invocation and just defer it until
> seccomp_deep_filter(), although this might interact badly with any
> SECCOMP_RET_TRACE handlers that change nr.

I don't like splitting the logic on seccomp invocation (we end up needing
to solve ordering issues maybe again), but I do like this explicit
opt-in feature. How you have it does make the "where do we store a cached
copy?" problem go away, too.

With a solution looming, now my mind turns to "how do we write filters
that check argument data?" Can this be done sanely with cBPF or are we
finally at requiring eBPF?

The placement of the seccomp hook looks rather like an LSM, which gets
me back to earlier LSM hooking designs I'd considered:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=seccomp/lsm&id=10c1e4d2b51ad61ad516fa44c2e007f3f5f6edfb

That design also didn't solve the split location of seccomp rules and
didn't create a dynamic way to do, say, string matching.

> To make this robust, it might help a lot if the generation of these
> stubs was mostly automated.

Agreed.

--
Kees Cook
* Re: [Ksummit-discuss] [TECH TOPIC] seccomp
From: Andy Lutomirski @ 2019-08-14 17:54 UTC
To: Kees Cook; +Cc: ksummit, Andy Lutomirski

On Fri, Jul 19, 2019 at 8:18 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Jul 19, 2019 at 05:32:59AM -0700, Andy Lutomirski wrote:
> > On Fri, Jul 19, 2019 at 2:35 AM Christian Brauner <christian@brauner.io> wrote:
> > >
> > > In light of all this, I would argue that we should seriously look into
> > > extending seccomp to allow filtering on pointer arguments.
>
> I would be all for this. :) I've struggled for a long while trying to
> find a sane design for this.
>
> > I won't be at LPC this year, but I was thinking about this anyway. I
> > have the following suggestion that might be a bit unorthodox: have
> > syscalls opt into this filtering. Specifically, a syscall that
> > supports pointer filtering would be refactored the way a bunch of our
> > syscalls are already refactored. The baseline situation is:
> >
> >     SYSCALL_DEFINE1(syscallname, struct foo __user *, buf) { ... }
> >
> > Instead, we would do:
> >
> >     SYSCALL_FILTERABLE(syscallname, struct foo __user *, buf)
> >     {
> >         int ret;
> >         struct foo kbuf;
> >
> >         /* copy_from_user() returns the number of bytes NOT copied */
> >         if (copy_from_user(&kbuf, buf, sizeof(kbuf)))
> >             return -EFAULT;
> >
> >         ret = seccomp_deep_filter(syscallname, 0, &kbuf);
> >         if (ret)
> >             return ret;
> >
> >         return do_syscallname(&kbuf);
> >     }
> >
> > In principle, if we know we're doing a FILTERABLE syscall, we could
> > skip the initial seccomp invocation and just defer it until
> > seccomp_deep_filter(), although this might interact badly with any
> > SECCOMP_RET_TRACE handlers that change nr.
>
> I don't like splitting the logic on seccomp invocation (we end up needing
> to solve ordering issues maybe again), but I do like this explicit
> opt-in feature. How you have it does make the "where do we store a cached
> copy?" problem go away, too.

After thinking about this a bit more, I think that deferring the main
seccomp filter invocation until arguments have been read is too
problematic. It has the ordering issues you're thinking of, but it
also has unpleasant effects if one of the reads faults or if
SECCOMP_RET_TRACE or SECCOMP_RET_TRAP is used. I'm thinking that this
type of deeper inspection filter should just be a totally separate
layer. Once the main seccomp logic decides that a filterable syscall
will be issued then, assuming that no -EFAULT happens, a totally
different program should get run with access to arguments. And there
should be a way for the main program to know that the syscall nr in
question is filterable on the running kernel.

Does that make sense?
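The ordering described in this mail can be sketched as a two-layer dispatch, again as a userspace mock with invented names. `nr_filterable[]`, `stage1()`, `stage2()` and the choice of error codes are assumptions made for illustration; the mail only fixes the sequence: the main filter runs first on the syscall number, the argument read may fault, and only after a successful read does the separate deep filter run.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum verdict { V_ALLOW, V_DENY };

#define NR_MAX 8

/* Hypothetical per-kernel table: which syscall numbers are filterable. */
static const bool nr_filterable[NR_MAX] = { [3] = true };

/* One intercepted call; a NULL uarg simulates a faulting user pointer. */
struct call {
    int nr;
    const unsigned long *uarg;
};

/* What the main filter could query about the running kernel. */
static bool syscall_is_filterable(int nr)
{
    return nr >= 0 && nr < NR_MAX && nr_filterable[nr];
}

/* Layer 1: the main filter, which sees only the syscall number here. */
static enum verdict stage1(int nr)
{
    return nr == 5 ? V_DENY : V_ALLOW;
}

/* Layer 2: the separate program, which sees the copied argument. */
static enum verdict stage2(unsigned long karg)
{
    return (karg & 1UL) ? V_DENY : V_ALLOW;
}

static int dispatch(const struct call *c)
{
    unsigned long karg;

    assert(c != NULL);
    if (stage1(c->nr) == V_DENY)
        return -EPERM;           /* main filter decides first */
    if (!syscall_is_filterable(c->nr))
        return 0;                /* no deep layer; pointer is never read */
    if (c->uarg == NULL)
        return -EFAULT;          /* read faulted; layer 2 never runs */
    karg = *c->uarg;             /* snapshot the argument */
    return stage2(karg) == V_DENY ? -EACCES : 0;
}
```

Keeping the layers separate means a fault in the argument read cannot change what the main filter already decided, which is the problem with deferring the main invocation.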
* Re: [Ksummit-discuss] [TECH TOPIC] seccomp
From: Kees Cook @ 2019-08-15 17:48 UTC
To: Andy Lutomirski; +Cc: ksummit

On Wed, Aug 14, 2019 at 10:54:49AM -0700, Andy Lutomirski wrote:
> After thinking about this a bit more, I think that deferring the main
> seccomp filter invocation until arguments have been read is too
> problematic. It has the ordering issues you're thinking of, but it
> also has unpleasant effects if one of the reads faults or if
> SECCOMP_RET_TRACE or SECCOMP_RET_TRAP is used. I'm thinking that this

Right, I was actually thinking of the trace/trap as being the race.

> type of deeper inspection filter should just be a totally separate
> layer. Once the main seccomp logic decides that a filterable syscall
> will be issued then, assuming that no -EFAULT happens, a totally
> different program should get run with access to arguments. And there
> should be a way for the main program to know that the syscall nr in
> question is filterable on the running kernel.

Right -- this is how I designed the original prototype: it was
effectively an LSM that was triggered by seccomp (since LSMs don't know
anything about syscalls -- their hooks are more generalized). So seccomp
would set a flag to make the LSM hook pay attention.

Existing LSMs are system-owner defined, so really something like Landlock
is needed for a process-owned LSM to be defined. But I worry that LSM
hooks are still too "deep" in the kernel to have a process-oriented
filter author who is not a kernel developer make any sense of the
hooks. They're certainly oriented in a better position to gain the
intent of a filter. For example, if a filter says "you can't open(2)
/etc/foo", but it misses saying "you can't openat(2) /etc/foo", that's a
dumb exposure. The LSM hooks are positioned to say "you can't manipulate
/etc/foo through any means".

So, I'm not entirely sure. It needs a clear design that chooses and
justifies the appropriate "depth" of filtering. And FWIW, the two most
frequent examples of argument parsing requests have been path-based
checking and network address checking. So any prototype needs to handle
these two cases sanely...

--
Kees Cook
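The open(2)/openat(2) exposure mentioned in this mail can be made concrete with a toy model, sketched below. Everything in it is invented for illustration (`enum call_kind`, the rule functions, the idea that both calls funnel into one "file open" hook in this mock); it makes no claim about the actual LSM hook set, and only contrasts a rule keyed on the syscall with a rule keyed on the object.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative call kinds, not real syscall numbers. */
enum call_kind { CALL_OPEN, CALL_OPENAT, CALL_GETPID };

/* Syscall-level rule whose author remembered open() but forgot openat(). */
static bool syscall_rule_denies(enum call_kind kind, const char *path)
{
    assert(path != NULL);
    return kind == CALL_OPEN && strcmp(path, "/etc/foo") == 0;
}

/* In this mock, every path-opening call reaches one shared hook. */
static bool reaches_file_open_hook(enum call_kind kind)
{
    return kind == CALL_OPEN || kind == CALL_OPENAT;
}

/* Hook-level rule: keyed on the object, not on how it was reached. */
static bool hook_rule_denies(enum call_kind kind, const char *path)
{
    assert(path != NULL);
    return reaches_file_open_hook(kind) && strcmp(path, "/etc/foo") == 0;
}
```

With these definitions the syscall-level rule lets `CALL_OPENAT` on "/etc/foo" through while the hook-level rule denies it, which is the gap the mail calls a "dumb exposure".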
* Re: [Ksummit-discuss] [TECH TOPIC] seccomp
From: Andy Lutomirski @ 2019-08-15 18:26 UTC
To: Kees Cook; +Cc: ksummit, Andy Lutomirski

On Thu, Aug 15, 2019 at 10:48 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Wed, Aug 14, 2019 at 10:54:49AM -0700, Andy Lutomirski wrote:
> > After thinking about this a bit more, I think that deferring the main
> > seccomp filter invocation until arguments have been read is too
> > problematic. It has the ordering issues you're thinking of, but it
> > also has unpleasant effects if one of the reads faults or if
> > SECCOMP_RET_TRACE or SECCOMP_RET_TRAP is used. I'm thinking that this
>
> Right, I was actually thinking of the trace/trap as being the race.
>
> > type of deeper inspection filter should just be a totally separate
> > layer. Once the main seccomp logic decides that a filterable syscall
> > will be issued then, assuming that no -EFAULT happens, a totally
> > different program should get run with access to arguments. And there
> > should be a way for the main program to know that the syscall nr in
> > question is filterable on the running kernel.
>
> Right -- this is how I designed the original prototype: it was
> effectively an LSM that was triggered by seccomp (since LSMs don't know
> anything about syscalls -- their hooks are more generalized). So seccomp
> would set a flag to make the LSM hook pay attention.
>
> Existing LSMs are system-owner defined, so really something like Landlock
> is needed for a process-owned LSM to be defined. But I worry that LSM
> hooks are still too "deep" in the kernel to have a process-oriented
> filter author who is not a kernel developer make any sense of the
> hooks. They're certainly oriented in a better position to gain the
> intent of a filter. For example, if a filter says "you can't open(2)
> /etc/foo", but it misses saying "you can't openat(2) /etc/foo", that's a
> dumb exposure. The LSM hooks are positioned to say "you can't manipulate
> /etc/foo through any means".
>
> So, I'm not entirely sure. It needs a clear design that chooses and
> justifies the appropriate "depth" of filtering. And FWIW, the two most
> frequent examples of argument parsing requests have been path-based
> checking and network address checking. So any prototype needs to handle
> these two cases sanely...
>

But also clone() flag filtering, and new clone() proposals keep
wanting to add structs. And filtering bpf(). /me runs.

But yes, doing this LSM-style could also make sense.
* Re: [Ksummit-discuss] [TECH TOPIC] seccomp
From: Christian Brauner @ 2019-08-15 18:31 UTC
To: Andy Lutomirski; +Cc: ksummit

On Thu, Aug 15, 2019 at 11:26:10AM -0700, Andy Lutomirski wrote:
> On Thu, Aug 15, 2019 at 10:48 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Wed, Aug 14, 2019 at 10:54:49AM -0700, Andy Lutomirski wrote:
> > > After thinking about this a bit more, I think that deferring the main
> > > seccomp filter invocation until arguments have been read is too
> > > problematic. It has the ordering issues you're thinking of, but it
> > > also has unpleasant effects if one of the reads faults or if
> > > SECCOMP_RET_TRACE or SECCOMP_RET_TRAP is used. I'm thinking that this
> >
> > Right, I was actually thinking of the trace/trap as being the race.
> >
> > > type of deeper inspection filter should just be a totally separate
> > > layer. Once the main seccomp logic decides that a filterable syscall
> > > will be issued then, assuming that no -EFAULT happens, a totally
> > > different program should get run with access to arguments. And there
> > > should be a way for the main program to know that the syscall nr in
> > > question is filterable on the running kernel.
> >
> > Right -- this is how I designed the original prototype: it was
> > effectively an LSM that was triggered by seccomp (since LSMs don't know
> > anything about syscalls -- their hooks are more generalized). So seccomp
> > would set a flag to make the LSM hook pay attention.
> >
> > Existing LSMs are system-owner defined, so really something like Landlock
> > is needed for a process-owned LSM to be defined. But I worry that LSM
> > hooks are still too "deep" in the kernel to have a process-oriented
> > filter author who is not a kernel developer make any sense of the
> > hooks. They're certainly oriented in a better position to gain the
> > intent of a filter. For example, if a filter says "you can't open(2)
> > /etc/foo", but it misses saying "you can't openat(2) /etc/foo", that's a
> > dumb exposure. The LSM hooks are positioned to say "you can't manipulate
> > /etc/foo through any means".
> >
> > So, I'm not entirely sure. It needs a clear design that chooses and
> > justifies the appropriate "depth" of filtering. And FWIW, the two most
> > frequent examples of argument parsing requests have been path-based
> > checking and network address checking. So any prototype needs to handle
> > these two cases sanely...
> >
>
> But also clone() flag filtering, and new clone() proposals keep
> wanting to add structs. And filtering bpf(). /me runs.

Yeah, I've mentioned clone3() in my initial mail. And it is not a
proposal anymore; it has been in mainline since the 5.3 merge window.
So the evil has been done. /me (sorry-not-sorry) ducks :)
* Re: [Ksummit-discuss] [TECH TOPIC] seccomp
From: Andy Lutomirski @ 2019-08-15 19:21 UTC
To: Christian Brauner; +Cc: ksummit, Andy Lutomirski

On Thu, Aug 15, 2019 at 11:31 AM Christian Brauner <christian.brauner@ubuntu.com> wrote:
>
> On Thu, Aug 15, 2019 at 11:26:10AM -0700, Andy Lutomirski wrote:
> > On Thu, Aug 15, 2019 at 10:48 AM Kees Cook <keescook@chromium.org> wrote:
> > >
> > > On Wed, Aug 14, 2019 at 10:54:49AM -0700, Andy Lutomirski wrote:
> > > > After thinking about this a bit more, I think that deferring the main
> > > > seccomp filter invocation until arguments have been read is too
> > > > problematic. It has the ordering issues you're thinking of, but it
> > > > also has unpleasant effects if one of the reads faults or if
> > > > SECCOMP_RET_TRACE or SECCOMP_RET_TRAP is used. I'm thinking that this
> > >
> > > Right, I was actually thinking of the trace/trap as being the race.
> > >
> > > > type of deeper inspection filter should just be a totally separate
> > > > layer. Once the main seccomp logic decides that a filterable syscall
> > > > will be issued then, assuming that no -EFAULT happens, a totally
> > > > different program should get run with access to arguments. And there
> > > > should be a way for the main program to know that the syscall nr in
> > > > question is filterable on the running kernel.
> > >
> > > Right -- this is how I designed the original prototype: it was
> > > effectively an LSM that was triggered by seccomp (since LSMs don't know
> > > anything about syscalls -- their hooks are more generalized). So seccomp
> > > would set a flag to make the LSM hook pay attention.
> > >
> > > Existing LSMs are system-owner defined, so really something like Landlock
> > > is needed for a process-owned LSM to be defined. But I worry that LSM
> > > hooks are still too "deep" in the kernel to have a process-oriented
> > > filter author who is not a kernel developer make any sense of the
> > > hooks. They're certainly oriented in a better position to gain the
> > > intent of a filter. For example, if a filter says "you can't open(2)
> > > /etc/foo", but it misses saying "you can't openat(2) /etc/foo", that's a
> > > dumb exposure. The LSM hooks are positioned to say "you can't manipulate
> > > /etc/foo through any means".
> > >
> > > So, I'm not entirely sure. It needs a clear design that chooses and
> > > justifies the appropriate "depth" of filtering. And FWIW, the two most
> > > frequent examples of argument parsing requests have been path-based
> > > checking and network address checking. So any prototype needs to handle
> > > these two cases sanely...
> > >
> >
> > But also clone() flag filtering, and new clone() proposals keep
> > wanting to add structs. And filtering bpf(). /me runs.
>
> Yeah, I've mentioned clone3() in my initial mail. And it is not a
> proposal anymore it's in mainline since the 5.3 merge window. So the
> evil has been done. /me (sorry-not-sorry) ducks :)

/me throws something squishy

So I guess we want some way for a seccomp filter to see clone3() being
called and determine that it or a related filter will be invoked again
with the arguments read before clone3() actually does anything. Doing
this with Landlock would involve poking quite a few places to add a
syscall, whereas my FILTERABLE thing would do it more simply.

These approaches aren't necessarily mutually exclusive. Maybe some
flags could be passed to the main seccomp filter so that it could
determine things like:

 - This syscall is FILTERABLE and (optionally) these args will be filtered.

 - Landlock will be called for filesystem access and the following
   hooks are enabled.

The idea is that we want the ability to make additional syscalls be
FILTERABLE and/or to add new seccompable LSM hooks in new kernels.
Doing this in a way that has an acceptably low risk of accidentally
opening security holes when LSM hooks change will require quite a bit
of care.
* Re: [Ksummit-discuss] [TECH TOPIC] seccomp
From: James Morris @ 2019-07-20 7:23 UTC
To: Christian Brauner; +Cc: mic, ksummit-discuss

On Fri, 19 Jul 2019, Christian Brauner wrote:

> There is a close connection between 1. and 2. When a watcher intercepts
> a syscall from a watchee and starts to inspect its arguments it can -
> depending on the syscall rather often actually - determine whether or
> not the syscall would succeed or fail. If it knows that the syscall will
> succeed it currently still has to perform it in lieu of the watchee
> since there is no way to tell the kernel to "resume" or actually perform
> the syscall. It would be nice if we could discuss approaches to enabling
> this feature as well.

Landlock is exploring userspace access control via the seccomp syscall
with eBPF, but from within the same process:

https://landlock.io/

It may be worth investigating whether Landlock could be extended to a
split watcher/watchee model.

--
James Morris
<jmorris@namei.org>
* Re: [Ksummit-discuss] [TECH TOPIC] seccomp
From: Christian Brauner @ 2019-07-20 7:41 UTC
To: James Morris; +Cc: mic, ksummit-discuss

On July 20, 2019 9:23:33 AM GMT+02:00, James Morris <jmorris@namei.org> wrote:
> On Fri, 19 Jul 2019, Christian Brauner wrote:
>
> > There is a close connection between 1. and 2. When a watcher intercepts
> > a syscall from a watchee and starts to inspect its arguments it can -
> > depending on the syscall rather often actually - determine whether or
> > not the syscall would succeed or fail. If it knows that the syscall will
> > succeed it currently still has to perform it in lieu of the watchee
> > since there is no way to tell the kernel to "resume" or actually perform
> > the syscall. It would be nice if we could discuss approaches to enabling
> > this feature as well.
>
> Landlock is exploring userspace access control via the seccomp
> syscall with ebpf, but from within the same process:
>
> https://landlock.io/
>
> It may be worth investigating whether Landlock could be extended to a
> split watcher/watchee model.

Certainly a valid point but...
I don't want to rely on landlock for this.
First, no one knows if and when it will ever land.
Second, seccomp is the go-to sandboxing solution for a lot of userspace
already. Often used without a full LSM.
Third, syscall interception to me is seccomp territory. :)
That's to say I'd like seccomp to have this feature *natively* and
ideally not tied to a complete LSM that needs to be merged for this. :)

Christian
* Re: [Ksummit-discuss] [TECH TOPIC] seccomp
From: Serge E. Hallyn @ 2019-07-25 14:18 UTC
To: Christian Brauner; +Cc: mic, ksummit-discuss

On Sat, Jul 20, 2019 at 09:41:11AM +0200, Christian Brauner wrote:
> On July 20, 2019 9:23:33 AM GMT+02:00, James Morris <jmorris@namei.org> wrote:
> > On Fri, 19 Jul 2019, Christian Brauner wrote:
> >
> > > There is a close connection between 1. and 2. When a watcher intercepts
> > > a syscall from a watchee and starts to inspect its arguments it can -
> > > depending on the syscall rather often actually - determine whether or
> > > not the syscall would succeed or fail. If it knows that the syscall will
> > > succeed it currently still has to perform it in lieu of the watchee
> > > since there is no way to tell the kernel to "resume" or actually perform
> > > the syscall. It would be nice if we could discuss approaches to enabling
> > > this feature as well.
> >
> > Landlock is exploring userspace access control via the seccomp
> > syscall with ebpf, but from within the same process:
> >
> > https://landlock.io/
> >
> > It may be worth investigating whether Landlock could be extended to a
> > split watcher/watchee model.
>
> Certainly a valid point but...
> I don't want to rely on landlock for this.
> First, no one knows if and when it will ever land.
> Second, seccomp is the go-to sandboxing solution for a lot of userspace already.
> Often used without a full LSM.
> Third, syscall interception to me is seccomp territory. :)
> That's to say I'd like seccomp to have this feature *natively* and ideally not tied to
> a complete LSM that needs to be merged for this. :)

Sounds all the more like discussion is warranted :)