linux-kernel.vger.kernel.org archive mirror
From: Andy Lutomirski <luto@amacapital.net>
To: Brendan Shanks <bshanks@codeweavers.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	Paul Gofman <gofmanp@gmail.com>,
	Gabriel Krisman Bertazi <krisman@collabora.com>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	kernel@collabora.com, Thomas Gleixner <tglx@linutronix.de>,
	Kees Cook <keescook@chromium.org>, Will Drewry <wad@chromium.org>,
	"H . Peter Anvin" <hpa@zytor.com>,
	Zebediah Figura <zfigura@codeweavers.com>
Subject: Re: [PATCH RFC] seccomp: Implement syscall isolation based on memory areas
Date: Sun, 31 May 2020 18:51:40 -0700
Message-ID: <53C0BD81-A942-4BB3-8538-D5107E84C5CD@amacapital.net>
In-Reply-To: <8DF2868F-E756-4B33-A7AE-C61F4AB9ABB9@codeweavers.com>



> On May 31, 2020, at 4:50 PM, Brendan Shanks <bshanks@codeweavers.com> wrote:
> 
> 
>> On May 31, 2020, at 11:57 AM, Andy Lutomirski <luto@kernel.org> wrote:
>> 
>> Using SECCOMP_RET_USER_NOTIF is likely to be considerably more
>> expensive than my scheme.  On a non-PTI system, my approach will add a
>> few tens of ns to each syscall.  On a PTI system, it will be worse.
>> But using any kind of notifier for all syscalls will cause a context
>> switch to a different user program for each syscall, and that will be
>> much slower.
> 
> There’s also no way (at least to my understanding) to modify register state from SECCOMP_RET_USER_NOTIF, which is how the existing -staging SIGSYS handler works:
> 
> <https://github.com/wine-staging/wine-staging/blob/master/patches/ntdll-Syscall_Emulation/0001-ntdll-Support-x86_64-syscall-emulation.patch#L62>
> 
>> I think that the implementation may well want to live in seccomp, but
>> doing this as a seccomp filter isn't quite right.  It's not a security
>> thing -- it's an emulation thing.  Seccomp is all about making
>> inescapable sandboxes, but that's not what you're doing at all, and
>> the fact that seccomp filters are preserved across execve() sounds
>> like it'll be annoying for you.
> 
> Definitely. Regardless of what approach is taken, we don’t want it to persist across execve.
> 
>> What if there was a special filter type that ran a BPF program on each
>> syscall, and the program was allowed to access user memory to make its
>> decisions, e.g. to look at some list of memory addresses.  But this
>> would explicitly *not* be a security feature -- execve() would remove
>> the filter, and the filter's outcome would be one of redirecting
>> execution or allowing the syscall.  If the "allow" outcome occurs,
>> then regular seccomp filters run.  Obviously the exact semantics here
>> would need some care.
> 
> Although if that’s running a BPF filter on every syscall, wouldn’t it also incur the ~10% overhead that Paul and Gabriel have seen with existing seccomp?
> 
> 

Unlikely. Some benchmarking is needed, but the seccomp ptrace overhead is likely *huge* compared to the overhead of just a filter.

Here are some wild-guess numbers for made-up modern hardware, cache hot:

Empty syscall: 50ns, or 300ns with PTI

Empty syscall accepted by simple seccomp filter: 10ns more than an empty syscall without seccomp

Seccomp ptrace round trip: 6 us, worse with PTI

Seccomp user notif round trip: 4 us

Syscall hypothetically redirected back to the same process: about the same as an empty syscall accepted by a simple filter, plus however long it takes to run the handler. Add 900ns if using SIGSYS instead of plain redirection. Add an extra 500ns on current kernels because signal delivery sucks, but I can fix this.

Take these numbers with a huge grain of salt.  But the point is that the BPF part is the least of your worries.
