linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Formal description of system call interface
@ 2016-11-06 22:39 Dmitry Vyukov
       [not found] ` <CACT4Y+YYgs43nJnyg3B9cHWOue62iMW3ZgXQKiKG12A1NVMgtg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-11-11 17:10 ` Andy Lutomirski
  0 siblings, 2 replies; 14+ messages in thread
From: Dmitry Vyukov @ 2016-11-06 22:39 UTC (permalink / raw)
  To: linux-api-u79uwXL29TY76Z2rM5mHXA, LKML
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Thomas Gleixner,
	Sasha Levin, Mathieu Desnoyers, scientist-b10kYP2dOMg,
	Steven Rostedt, Arnd Bergmann, carlos-H+wXaHxf7aLQT0dZR+AlfA,
	syzkaller, Kostya Serebryany, Mike Frysinger, Dave Jones,
	Tavis Ormandy

Hello,

This is notes from the discussion we had at Linux Plumbers this week
regarding providing a formal description of system calls (user API).

The idea come up in the context of syzkaller, syscall fuzzer, which
has descriptions for 1000+ syscalls mostly concentrating on types of
arguments and return values. However, problems are that a small group
of people can't write descriptions for all syscalls; can't keep them
up-to-date and doesn't have necessary domain expertise to do correct
descriptions in some cases.

We identified a surprisingly large number of potential users for such
descriptions:
 - fuzzers (syzkaller, trinity, iknowthis)
 - strace/syscall tracepoints (capturing indirect arguments and
   printing human-readable info)
 - generation of entry points for C libraries (glibc, liblinux
   (raw syscalls), Go runtime, clang/gcc sanitizers)
 - valgrind/sanitizers checking of input/output values of syscalls
 - seccomp filters (minijail, libseccomp) need to know interfaces
   to generate wrappers
 - safety certification (requires syscall specifications)
 - man pages (could provide actual syscall interface rather than
   glibc wrapper interface, it was noted that possible errno values
   is an important part here)
 - generation of syscall argument validation in kernel (fast version
   is enabled all the time, extended is optional)

It's worth noting that number of these users already have some
descriptions that suffer from the same problems of being
incomplete/outdated. See also linux-api mailing list description
which lists an overlapping set of cases:
https://www.kernel.org/doc/man-pages/linux-api-ml.html

We discussed several implementation approaches:
 - Extracting the interface from kernel code either by parsing
   sources or using dwarf. However, current source doesn't have
   enough info: fd are specified as int, while we need to know exact
   fd type (e.g. fd_epoll_t); not possible to extract flag set for
   'int flags'; don't know what is 'char*'.
 - Making the formal description the master copy and generating
   kernel code from it (structs, flags, syscall entry points).
   This is quite pervasive, but otherwise should work.
 - Doing what syzkaller currently does: providing the description
   on side. Verifying that description and implementation match
   is an important piece here. We can do dynamic checking in syscall
   entry points (print warnings on anything that does not match
   descriptions); or static checking (but again kernel code doesn't
   have enough info for checking).

We decided to pursue the last option as the least pervasive for now.
Several locations for the descriptions were proposed: with source code,
include/uapi, Documentation.

Action points:
 - polish DSL for description (must be extensible)
 - write a parser for DSL
 - provide definition for mm syscalls (mm is reasonably simple
   and self-contained)
 - see if we can do validation of mm arguments

It was acknowledged that whatever we do now it will probably
significantly change and evolve over time as we better understand
what we need and what works.

For the reference, current syzkaller descriptions are in txt files here:
https://github.com/google/syzkaller/tree/master/sys
The most generic syscalls are here:
https://github.com/google/syzkaller/blob/master/sys/sys.txt
Specific subsystems are described in separate files, e.g.:
https://github.com/google/syzkaller/blob/master/sys/bpf.txt
https://github.com/google/syzkaller/blob/master/sys/tty.txt
https://github.com/google/syzkaller/blob/master/sys/sndseq.txt
The descriptions should be self-explanatory, but just in case there
is also a semi-formal DSL specification here:
https://github.com/google/syzkaller/blob/master/sys/README.md

Taking the opportunity, if you see that something is missing/wrong
in the descriptions of the subsystem you care about, or if it is not
described at all, fixes are welcome.

Thanks

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Formal description of system call interface
       [not found] ` <CACT4Y+YYgs43nJnyg3B9cHWOue62iMW3ZgXQKiKG12A1NVMgtg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-11-07  0:28   ` Szabolcs Nagy
  2016-11-21 15:03     ` Dmitry Vyukov
  2016-11-07 10:38   ` Cyril Hrubis
  2017-04-21 15:14   ` Carlos O'Donell
  2 siblings, 1 reply; 14+ messages in thread
From: Szabolcs Nagy @ 2016-11-07  0:28 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, LKML,
	mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Thomas Gleixner,
	Sasha Levin, Mathieu Desnoyers, scientist-b10kYP2dOMg,
	Steven Rostedt, Arnd Bergmann, carlos-H+wXaHxf7aLQT0dZR+AlfA,
	syzkaller, Kostya Serebryany, Mike Frysinger, Dave Jones,
	Tavis Ormandy

* Dmitry Vyukov <dvyukov-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> [2016-11-06 14:39:28 -0800]:
> This is notes from the discussion we had at Linux Plumbers this week
> regarding providing a formal description of system calls (user API).

yes a database of the syscall abis would be useful
..depending on the level of detail it has.

(right now there is not even spec about what registers
the syscall entry point may clobber on the various abis
which would be useful to know when making syscalls)

> Action points:
>  - polish DSL for description (must be extensible)
>  - write a parser for DSL
>  - provide definition for mm syscalls (mm is reasonably simple
>    and self-contained)
>  - see if we can do validation of mm arguments

for all abi variants? e.g. mmap offset range is abi dependent.

> For the reference, current syzkaller descriptions are in txt files here:
> https://github.com/google/syzkaller/tree/master/sys
...
> Taking the opportunity, if you see that something is missing/wrong
> in the descriptions of the subsystem you care about, or if it is not
> described at all, fixes are welcome.

abi variants are missing (abi variation makes a lot of
syscall interface related work painful).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Formal description of system call interface
       [not found] ` <CACT4Y+YYgs43nJnyg3B9cHWOue62iMW3ZgXQKiKG12A1NVMgtg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-11-07  0:28   ` Szabolcs Nagy
@ 2016-11-07 10:38   ` Cyril Hrubis
       [not found]     ` <20161107103819.GA11374-2UyX9mZUyMU@public.gmane.org>
  2016-11-21 15:37     ` Steven Rostedt
  2017-04-21 15:14   ` Carlos O'Donell
  2 siblings, 2 replies; 14+ messages in thread
From: Cyril Hrubis @ 2016-11-07 10:38 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, LKML,
	mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Thomas Gleixner,
	Sasha Levin, Mathieu Desnoyers, scientist-b10kYP2dOMg,
	Steven Rostedt, Arnd Bergmann, carlos-H+wXaHxf7aLQT0dZR+AlfA,
	syzkaller, Kostya Serebryany, Mike Frysinger, Dave Jones,
	Tavis Ormandy

Hi!
> We identified a surprisingly large number of potential users for such
> descriptions:
>  - fuzzers (syzkaller, trinity, iknowthis)
>  - strace/syscall tracepoints (capturing indirect arguments and
>    printing human-readable info)
>  - generation of entry points for C libraries (glibc, liblinux
>    (raw syscalls), Go runtime, clang/gcc sanitizers)
>  - valgrind/sanitizers checking of input/output values of syscalls
>  - seccomp filters (minijail, libseccomp) need to know interfaces
>    to generate wrappers
>  - safety certification (requires syscall specifications)
>  - man pages (could provide actual syscall interface rather than
>    glibc wrapper interface, it was noted that possible errno values
>    is an important part here)
>  - generation of syscall argument validation in kernel (fast version
>    is enabled all the time, extended is optional)

I was thinking of generating LTP testcases from a well defined format for
quite some time. Maybe that would be a good way how to keep the
descriptions to match the reality.

LTP test would however need a bit more information that just syscall
parameter anotation. We would need also description of the expected
return value in some cases, annotating it as "returns fd" or "zero on
success" would be good enough for basic tests. Better testing would need
to describe the "side efect" as well (file was created, filesystem
mounted, etc.) then we could write a classes of verify functions,
something as check_file_exits() and use them to check the results
accordingly.

I'm not sure if something like this is really doable or in the scope of
this project, but it may be worth a try.

-- 
Cyril Hrubis
chrubis-AlSwsSmVLrQ@public.gmane.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Formal description of system call interface
  2016-11-06 22:39 Formal description of system call interface Dmitry Vyukov
       [not found] ` <CACT4Y+YYgs43nJnyg3B9cHWOue62iMW3ZgXQKiKG12A1NVMgtg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-11-11 17:10 ` Andy Lutomirski
  2016-11-21 15:17   ` Dmitry Vyukov
  1 sibling, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2016-11-11 17:10 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Linux API, LKML, Michael Kerrisk, Thomas Gleixner, Sasha Levin,
	Mathieu Desnoyers, scientist, Steven Rostedt, Arnd Bergmann,
	Carlos O'Donell, syzkaller, Kostya Serebryany,
	Mike Frysinger, Dave Jones, Tavis Ormandy

On Sun, Nov 6, 2016 at 2:39 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Hello,
>
> This is notes from the discussion we had at Linux Plumbers this week
> regarding providing a formal description of system calls (user API).
>
> The idea come up in the context of syzkaller, syscall fuzzer, which
> has descriptions for 1000+ syscalls mostly concentrating on types of
> arguments and return values. However, problems are that a small group
> of people can't write descriptions for all syscalls; can't keep them
> up-to-date and doesn't have necessary domain expertise to do correct
> descriptions in some cases.
>
> We identified a surprisingly large number of potential users for such
> descriptions:

Let me add one more: consolidation of all the incompatible arch
syscall tables.  A sufficiently descriptive format should be parseable
at build time to generate the syscall tables.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Formal description of system call interface
  2016-11-07  0:28   ` Szabolcs Nagy
@ 2016-11-21 15:03     ` Dmitry Vyukov
       [not found]       ` <CACT4Y+bq97OPqW9nUoQWDdVfeCv6oOYT0=GeFmOu2rosBz4s2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Vyukov @ 2016-11-21 15:03 UTC (permalink / raw)
  To: syzkaller
  Cc: linux-api, LKML, Michael Kerrisk-manpages, Thomas Gleixner,
	Sasha Levin, Mathieu Desnoyers, scientist, Steven Rostedt,
	Arnd Bergmann, carlos, Kostya Serebryany, Mike Frysinger,
	Dave Jones, Tavis Ormandy

On Mon, Nov 7, 2016 at 1:28 AM, Szabolcs Nagy <nsz@port70.net> wrote:
> * Dmitry Vyukov <dvyukov@google.com> [2016-11-06 14:39:28 -0800]:
>> This is notes from the discussion we had at Linux Plumbers this week
>> regarding providing a formal description of system calls (user API).
> yes a database of the syscall abis would be useful
> ..depending on the level of detail it has.
>
> (right now there is not even spec about what registers
> the syscall entry point may clobber on the various abis
> which would be useful to know when making syscalls)

Hi Szabolcs,

Level of detail is discus-sable. I would say that a detail is worth
supporting if:
1. there is an intent to actually use in a foreseeable future
2. there is somebody who is ready to spend time describing the detail


>> Action points:
>>  - polish DSL for description (must be extensible)
>>  - write a parser for DSL
>>  - provide definition for mm syscalls (mm is reasonably simple
>>    and self-contained)
>>  - see if we can do validation of mm arguments
>
> for all abi variants? e.g. mmap offset range is abi dependent.

I don't think we draw exact line between what will be verified and will not.
There are simpler predicates (e.g. memory is addressable) and more complex
predicates (e.g. this flag can have this value iff that other flags is
specified and
a valid fd is passed in that field of a struct).


>> For the reference, current syzkaller descriptions are in txt files here:
>> https://github.com/google/syzkaller/tree/master/sys
> ...
>> Taking the opportunity, if you see that something is missing/wrong
>> in the descriptions of the subsystem you care about, or if it is not
>> described at all, fixes are welcome.
>
> abi variants are missing (abi variation makes a lot of
> syscall interface related work painful).


What exactly do you mean by "abi variants"? Is it architecture?
What exactly needs to be added to the descriptions support "abi variants?

Thanks

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Formal description of system call interface
       [not found]     ` <20161107103819.GA11374-2UyX9mZUyMU@public.gmane.org>
@ 2016-11-21 15:14       ` Dmitry Vyukov
       [not found]         ` <CACT4Y+aUzdX8NqMu+Y3s53vEmoBw7KysB3g2PEjZ6MyJimki1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Vyukov @ 2016-11-21 15:14 UTC (permalink / raw)
  To: Cyril Hrubis
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, LKML, Michael Kerrisk-manpages,
	Thomas Gleixner, Sasha Levin, Mathieu Desnoyers,
	scientist-b10kYP2dOMg, Steven Rostedt, Arnd Bergmann,
	carlos-H+wXaHxf7aLQT0dZR+AlfA, syzkaller, Kostya Serebryany,
	Mike Frysinger, Dave Jones, Tavis Ormandy

On Mon, Nov 7, 2016 at 11:38 AM, Cyril Hrubis <chrubis-AlSwsSmVLrQ@public.gmane.org> wrote:
> Hi!
>> We identified a surprisingly large number of potential users for such
>> descriptions:
>>  - fuzzers (syzkaller, trinity, iknowthis)
>>  - strace/syscall tracepoints (capturing indirect arguments and
>>    printing human-readable info)
>>  - generation of entry points for C libraries (glibc, liblinux
>>    (raw syscalls), Go runtime, clang/gcc sanitizers)
>>  - valgrind/sanitizers checking of input/output values of syscalls
>>  - seccomp filters (minijail, libseccomp) need to know interfaces
>>    to generate wrappers
>>  - safety certification (requires syscall specifications)
>>  - man pages (could provide actual syscall interface rather than
>>    glibc wrapper interface, it was noted that possible errno values
>>    is an important part here)
>>  - generation of syscall argument validation in kernel (fast version
>>    is enabled all the time, extended is optional)
>
> I was thinking of generating LTP testcases from a well defined format for
> quite some time. Maybe that would be a good way how to keep the
> descriptions to match the reality.
>
> LTP test would however need a bit more information that just syscall
> parameter anotation. We would need also description of the expected
> return value in some cases, annotating it as "returns fd" or "zero on
> success" would be good enough for basic tests. Better testing would need
> to describe the "side efect" as well (file was created, filesystem
> mounted, etc.) then we could write a classes of verify functions,
> something as check_file_exits() and use them to check the results
> accordingly.
>
> I'm not sure if something like this is really doable or in the scope of
> this project, but it may be worth a try.

Hi Cyril,

I think I heard something similar from Tavis in the iknowthis context.

Description of "returns fd or this set of errors" looks simple and useful.
Any test system or fuzzer will be able to verify that kernel actually returns
claimed return values. Also Carlos expressed interested in errno values
in the context of glibc.
I would do it from day one.

Re more complex side effects. I always feared that a description suitable
for automatic verification (i.e. zero false positives, otherwise it is useless)
may be too difficult to achieve.

Cyril, Tavis, can you come up with some set of predicates that can be
checked automatically yet still useful?
We can start small, e.g. "must not alter virtual address space".

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Formal description of system call interface
  2016-11-11 17:10 ` Andy Lutomirski
@ 2016-11-21 15:17   ` Dmitry Vyukov
  0 siblings, 0 replies; 14+ messages in thread
From: Dmitry Vyukov @ 2016-11-21 15:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux API, LKML, Michael Kerrisk, Thomas Gleixner, Sasha Levin,
	Mathieu Desnoyers, scientist, Steven Rostedt, Arnd Bergmann,
	Carlos O'Donell, syzkaller, Kostya Serebryany,
	Mike Frysinger, Dave Jones, Tavis Ormandy

On Fri, Nov 11, 2016 at 6:10 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Sun, Nov 6, 2016 at 2:39 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> Hello,
>>
>> This is notes from the discussion we had at Linux Plumbers this week
>> regarding providing a formal description of system calls (user API).
>>
>> The idea come up in the context of syzkaller, syscall fuzzer, which
>> has descriptions for 1000+ syscalls mostly concentrating on types of
>> arguments and return values. However, problems are that a small group
>> of people can't write descriptions for all syscalls; can't keep them
>> up-to-date and doesn't have necessary domain expertise to do correct
>> descriptions in some cases.
>>
>> We identified a surprisingly large number of potential users for such
>> descriptions:
>
> Let me add one more: consolidation of all the incompatible arch
> syscall tables.  A sufficiently descriptive format should be parseable
> at build time to generate the syscall tables.


Hi Andy,

What exactly info do we need for this?

Anything that just requires a bit of info per syscall looks like the
lowest hanging fruit
(as compared to info per every ioctl discrimination, and complete args
description).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Formal description of system call interface
       [not found]         ` <CACT4Y+aUzdX8NqMu+Y3s53vEmoBw7KysB3g2PEjZ6MyJimki1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-11-21 15:34           ` Tavis Ormandy
  2016-11-21 16:10           ` Cyril Hrubis
  1 sibling, 0 replies; 14+ messages in thread
From: Tavis Ormandy @ 2016-11-21 15:34 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Cyril Hrubis, linux-api-u79uwXL29TY76Z2rM5mHXA, LKML,
	Michael Kerrisk-manpages, Thomas Gleixner, Sasha Levin,
	Mathieu Desnoyers, scientist-b10kYP2dOMg, Steven Rostedt,
	Arnd Bergmann, carlos-H+wXaHxf7aLQT0dZR+AlfA, syzkaller,
	Kostya Serebryany, Mike Frysinger, Dave Jones

On Mon, Nov 21, 2016 at 7:14 AM, Dmitry Vyukov <dvyukov-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>
>
> Re more complex side effects. I always feared that a description suitable
> for automatic verification (i.e. zero false positives, otherwise it is useless)
> may be too difficult to achieve.
>
> Cyril, Tavis, can you come up with some set of predicates that can be
> checked automatically yet still useful?
> We can start small, e.g. "must not alter virtual address space".

Yes, I've been working on creating something like this, I have a
simple working prototype. I cant promise it has zero false positives
right now, but I think that is achievable.

Let me dig it up (I had put it on the back burner).

Tavis.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Formal description of system call interface
  2016-11-07 10:38   ` Cyril Hrubis
       [not found]     ` <20161107103819.GA11374-2UyX9mZUyMU@public.gmane.org>
@ 2016-11-21 15:37     ` Steven Rostedt
       [not found]       ` <20161121103752.70ad1418-f9ZlEuEWxVcJvu8Pb33WZ0EMvNT87kid@public.gmane.org>
  1 sibling, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2016-11-21 15:37 UTC (permalink / raw)
  To: Cyril Hrubis
  Cc: Dmitry Vyukov, linux-api, LKML, mtk.manpages, Thomas Gleixner,
	Sasha Levin, Mathieu Desnoyers, scientist, Arnd Bergmann, carlos,
	syzkaller, Kostya Serebryany, Mike Frysinger, Dave Jones,
	Tavis Ormandy

On Mon, 7 Nov 2016 11:38:20 +0100
Cyril Hrubis <chrubis@suse.cz> wrote:


> I'm not sure if something like this is really doable or in the scope of
> this project, but it may be worth a try.
> 

Looking ahead into the future, I was also thinking that if this becomes
robust, we could also start an integration specification, that could
describe how different system calls interact with each other. Like
open() to read(), write() and close().

But this is just an idea that popped in my head while reading this
thread. We want to start small first, but still could keep this in the
back of our minds for future enhancements.

-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Formal description of system call interface
       [not found]       ` <20161121103752.70ad1418-f9ZlEuEWxVcJvu8Pb33WZ0EMvNT87kid@public.gmane.org>
@ 2016-11-21 15:48         ` Dmitry Vyukov
  2016-11-21 16:58           ` Cyril Hrubis
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Vyukov @ 2016-11-21 15:48 UTC (permalink / raw)
  To: syzkaller
  Cc: Cyril Hrubis, Linux API, LKML, Michael Kerrisk-manpages,
	Thomas Gleixner, Sasha Levin, Mathieu Desnoyers,
	scientist-b10kYP2dOMg, Arnd Bergmann, Carlos O'Donell,
	Kostya Serebryany, Mike Frysinger, Dave Jones, Tavis Ormandy

On Mon, Nov 21, 2016 at 4:37 PM, Steven Rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org> wrote:
> On Mon, 7 Nov 2016 11:38:20 +0100
> Cyril Hrubis <chrubis-AlSwsSmVLrQ@public.gmane.org> wrote:
>
>
>> I'm not sure if something like this is really doable or in the scope of
>> this project, but it may be worth a try.
>>
>
> Looking ahead into the future, I was also thinking that if this becomes
> robust, we could also start an integration specification, that could
> describe how different system calls interact with each other. Like
> open() to read(), write() and close().
>
> But this is just an idea that popped in my head while reading this
> thread. We want to start small first, but still could keep this in the
> back of our minds for future enhancements.


FWIW syzkaller does something along these lines.
It has notion of 'resources' and of input/output arguments.
Then it can figure out that e.g. open creates fd's, so it should be called
before any reads/writes (provided that we want to pass in valid fd's).
It does not have notion of "destructors" for resources (e.g. close
destroys the passed in resource). But it should be easy to describe.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Formal description of system call interface
       [not found]         ` <CACT4Y+aUzdX8NqMu+Y3s53vEmoBw7KysB3g2PEjZ6MyJimki1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-11-21 15:34           ` Tavis Ormandy
@ 2016-11-21 16:10           ` Cyril Hrubis
  1 sibling, 0 replies; 14+ messages in thread
From: Cyril Hrubis @ 2016-11-21 16:10 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, LKML, Michael Kerrisk-manpages,
	Thomas Gleixner, Sasha Levin, Mathieu Desnoyers,
	scientist-b10kYP2dOMg, Steven Rostedt, Arnd Bergmann,
	carlos-H+wXaHxf7aLQT0dZR+AlfA, syzkaller, Kostya Serebryany,
	Mike Frysinger, Dave Jones, Tavis Ormandy

Hi!
> Description of "returns fd or this set of errors" looks simple and useful.
> Any test system or fuzzer will be able to verify that kernel actually returns
> claimed return values. Also Carlos expressed interested in errno values
> in the context of glibc.
> I would do it from day one.
> 
> Re more complex side effects. I always feared that a description suitable
> for automatic verification (i.e. zero false positives, otherwise it is useless)
> may be too difficult to achieve.

I'm afraid it may be as well. I would expect that we will end up with
something quite complex with a large set of exceptions from the rules.

> Cyril, Tavis, can you come up with some set of predicates that can be
> checked automatically yet still useful?
> We can start small, e.g. "must not alter virtual address space".

I will try to thing about this a bit.

-- 
Cyril Hrubis
chrubis-AlSwsSmVLrQ@public.gmane.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Formal description of system call interface
  2016-11-21 15:48         ` Dmitry Vyukov
@ 2016-11-21 16:58           ` Cyril Hrubis
  0 siblings, 0 replies; 14+ messages in thread
From: Cyril Hrubis @ 2016-11-21 16:58 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, Linux API, LKML, Michael Kerrisk-manpages,
	Thomas Gleixner, Sasha Levin, Mathieu Desnoyers, scientist,
	Arnd Bergmann, Carlos O'Donell, Kostya Serebryany,
	Mike Frysinger, Dave Jones, Tavis Ormandy

Hi!
> > Looking ahead into the future, I was also thinking that if this becomes
> > robust, we could also start an integration specification, that could
> > describe how different system calls interact with each other. Like
> > open() to read(), write() and close().
> >
> > But this is just an idea that popped in my head while reading this
> > thread. We want to start small first, but still could keep this in the
> > back of our minds for future enhancements.
> 
> 
> FWIW syzkaller does something along these lines.
> It has notion of 'resources' and of input/output arguments.
> Then it can figure out that e.g. open creates fd's, so it should be called
> before any reads/writes (provided that we want to pass in valid fd's).
> It does not have notion of "destructors" for resources (e.g. close
> destroys the passed in resource). But it should be easy to describe.

Logical extension would be to teach it that creat(), for instance,
returns file descriptor and creates a file as a side effect. That file
then could be used for a stat() or unlink() concurently, etc. But we
should also consider that not all file descriptors or files are equal,
so we may end up with some classes of files and file descriptors
some of them suitable for different subsets of operations.

I think that defining classes of objects and defining how syscalls
transform their state may yield something usable. But that would require
some serious thinking and a few trial and error implementations.

-- 
Cyril Hrubis
chrubis@suse.cz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Formal description of system call interface
       [not found]       ` <CACT4Y+bq97OPqW9nUoQWDdVfeCv6oOYT0=GeFmOu2rosBz4s2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-11-22 13:07         ` Szabolcs Nagy
  0 siblings, 0 replies; 14+ messages in thread
From: Szabolcs Nagy @ 2016-11-22 13:07 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, linux-api-u79uwXL29TY76Z2rM5mHXA, LKML,
	Michael Kerrisk-manpages, Thomas Gleixner, Sasha Levin,
	Mathieu Desnoyers, scientist-b10kYP2dOMg, Steven Rostedt,
	Arnd Bergmann, carlos-H+wXaHxf7aLQT0dZR+AlfA, Kostya Serebryany,
	Mike Frysinger, Dave Jones, Tavis Ormandy

* Dmitry Vyukov <dvyukov-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> [2016-11-21 16:03:04 +0100]:
> On Mon, Nov 7, 2016 at 1:28 AM, Szabolcs Nagy <nsz-4P1ElwuDYu6sTnJN9+BGXg@public.gmane.org> wrote:
> > * Dmitry Vyukov <dvyukov-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> [2016-11-06 14:39:28 -0800]:
> >> For the reference, current syzkaller descriptions are in txt files here:
> >> https://github.com/google/syzkaller/tree/master/sys
> > ...
> >> Taking the opportunity, if you see that something is missing/wrong
> >> in the descriptions of the subsystem you care about, or if it is not
> >> described at all, fixes are welcome.
> >
> > abi variants are missing (abi variation makes a lot of
> > syscall interface related work painful).
> 
> 
> What exactly do you mean by "abi variants"? Is it architecture?
> What exactly needs to be added to the descriptions support "abi variants?
> 

abi variant is a supported syscall abi
(linux supports several abis on the same arch:
e.g. x86 has i386,x86_64,x32 one might also
call different endianness a different abi, but
i'd expect le/be to be handled together.)

available syscalls, argument types and ordering
can be different across abis

i may be wrong, but i did not see those handled
in sys.txt, looking at it now i see missing
paddings in ipc structs (*_ds) so it may not
work the way i assumed (these are types which
have some abi variation).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Formal description of system call interface
       [not found] ` <CACT4Y+YYgs43nJnyg3B9cHWOue62iMW3ZgXQKiKG12A1NVMgtg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-11-07  0:28   ` Szabolcs Nagy
  2016-11-07 10:38   ` Cyril Hrubis
@ 2017-04-21 15:14   ` Carlos O'Donell
  2 siblings, 0 replies; 14+ messages in thread
From: Carlos O'Donell @ 2017-04-21 15:14 UTC (permalink / raw)
  To: Dmitry Vyukov, linux-api-u79uwXL29TY76Z2rM5mHXA, LKML
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Thomas Gleixner,
	Sasha Levin, Mathieu Desnoyers, scientist-b10kYP2dOMg,
	Steven Rostedt, Arnd Bergmann, syzkaller, Kostya Serebryany,
	Mike Frysinger, Dave Jones, Tavis Ormandy, Florian Weimer

On 11/06/2016 05:39 PM, Dmitry Vyukov wrote:
> Hello,
> 
> This is notes from the discussion we had at Linux Plumbers this week
> regarding providing a formal description of system calls (user API).
> 
> The idea come up in the context of syzkaller, syscall fuzzer, which
> has descriptions for 1000+ syscalls mostly concentrating on types of
> arguments and return values. However, problems are that a small group
> of people can't write descriptions for all syscalls; can't keep them
> up-to-date and doesn't have necessary domain expertise to do correct
> descriptions in some cases.
> 
> We identified a surprisingly large number of potential users for such
> descriptions:
>  - fuzzers (syzkaller, trinity, iknowthis)
>  - strace/syscall tracepoints (capturing indirect arguments and
>    printing human-readable info)
>  - generation of entry points for C libraries (glibc, liblinux
>    (raw syscalls), Go runtime, clang/gcc sanitizers)

To add another:

Auto-generation of SYS_* macros (sys/syscalls.h) in glibc which are
required for syscall().

It would mean we could copy the list directly from the most recently
released kernel instead of relying on distro kernel UAPI headers package.

We need this information in the released kernel.

>  - valgrind/sanitizers checking of input/output values of syscalls
>  - seccomp filters (minijail, libseccomp) need to know interfaces
>    to generate wrappers
>  - safety certification (requires syscall specifications)
>  - man pages (could provide actual syscall interface rather than
>    glibc wrapper interface, it was noted that possible errno values
>    is an important part here)
>  - generation of syscall argument validation in kernel (fast version
>    is enabled all the time, extended is optional)
> 
> It's worth noting that number of these users already have some
> descriptions that suffer from the same problems of being
> incomplete/outdated. See also linux-api mailing list description
> which lists an overlapping set of cases:
> https://www.kernel.org/doc/man-pages/linux-api-ml.html
> 
> We discussed several implementation approaches:
>  - Extracting the interface from kernel code either by parsing
>    sources or using dwarf. However, current source doesn't have
>    enough info: fd are specified as int, while we need to know exact
>    fd type (e.g. fd_epoll_t); not possible to extract flag set for
>    'int flags'; don't know what is 'char*'.
>  - Making the formal description the master copy and generating
>    kernel code from it (structs, flags, syscall entry points).
>    This is quite pervasive, but otherwise should work.
>  - Doing what syzkaller currently does: providing the description
>    on side. Verifying that description and implementation match
>    is an important piece here. We can do dynamic checking in syscall
>    entry points (print warnings on anything that does not match
>    descriptions); or static checking (but again kernel code doesn't
>    have enough info for checking).
> 
> We decided to pursue the last option as the least pervasive for now.
> Several locations for the descriptions were proposed: with source code,
> include/uapi, Documentation.
> 
> Action points:
>  - polish DSL for description (must be extensible)
>  - write a parser for DSL
>  - provide definition for mm syscalls (mm is reasonably simple
>    and self-contained)
>  - see if we can do validation of mm arguments

Have we made any progress on these points?

> It was acknowledged that whatever we do now it will probably
> significantly change and evolve over time as we better understand
> what we need and what works.
> 
> For the reference, current syzkaller descriptions are in txt files here:
> https://github.com/google/syzkaller/tree/master/sys
> The most generic syscalls are here:
> https://github.com/google/syzkaller/blob/master/sys/sys.txt
> Specific subsystems are described in separate files, e.g.:
> https://github.com/google/syzkaller/blob/master/sys/bpf.txt
> https://github.com/google/syzkaller/blob/master/sys/tty.txt
> https://github.com/google/syzkaller/blob/master/sys/sndseq.txt
> The descriptions should be self-explanatory, but just in case there
> is also a semi-formal DSL specification here:
> https://github.com/google/syzkaller/blob/master/sys/README.md
> 
> Taking the opportunity, if you see that something is missing/wrong
> in the descriptions of the subsystem you care about, or if it is not
> described at all, fixes are welcome.
> 
> Thanks

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-04-21 15:14 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-06 22:39 Formal description of system call interface Dmitry Vyukov
     [not found] ` <CACT4Y+YYgs43nJnyg3B9cHWOue62iMW3ZgXQKiKG12A1NVMgtg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-11-07  0:28   ` Szabolcs Nagy
2016-11-21 15:03     ` Dmitry Vyukov
     [not found]       ` <CACT4Y+bq97OPqW9nUoQWDdVfeCv6oOYT0=GeFmOu2rosBz4s2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-11-22 13:07         ` Szabolcs Nagy
2016-11-07 10:38   ` Cyril Hrubis
     [not found]     ` <20161107103819.GA11374-2UyX9mZUyMU@public.gmane.org>
2016-11-21 15:14       ` Dmitry Vyukov
     [not found]         ` <CACT4Y+aUzdX8NqMu+Y3s53vEmoBw7KysB3g2PEjZ6MyJimki1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-11-21 15:34           ` Tavis Ormandy
2016-11-21 16:10           ` Cyril Hrubis
2016-11-21 15:37     ` Steven Rostedt
     [not found]       ` <20161121103752.70ad1418-f9ZlEuEWxVcJvu8Pb33WZ0EMvNT87kid@public.gmane.org>
2016-11-21 15:48         ` Dmitry Vyukov
2016-11-21 16:58           ` Cyril Hrubis
2017-04-21 15:14   ` Carlos O'Donell
2016-11-11 17:10 ` Andy Lutomirski
2016-11-21 15:17   ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).