From: Dmitry Vyukov <dvyukov@google.com>
To: "Levin, Alexander" <alexander.levin@verizon.com>
Cc: "tglx@linutronix.de" <tglx@linutronix.de>,
"scientist@fb.com" <scientist@fb.com>,
"glider@google.com" <glider@google.com>,
"andreyknvl@google.com" <andreyknvl@google.com>,
"rostedt@goodmis.org" <rostedt@goodmis.org>,
"arnd@arndb.de" <arnd@arndb.de>,
"mathieu.desnoyers@efficios.com" <mathieu.desnoyers@efficios.com>,
"daniel.vetter@ffwll.ch" <daniel.vetter@ffwll.ch>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC 1/3] abi_spec: basic definitions of constraints, args and syscalls
Date: Wed, 14 Dec 2016 20:46:25 +0100 [thread overview]
Message-ID: <CACT4Y+b_M3J2xB-CpOafFizRGWeK=dWAkQQ-6WFyFSda-VdDQA@mail.gmail.com> (raw)
In-Reply-To: <CACT4Y+aGGf_u0D3Xkt3yNRG-WXF_bCf_BSaE0FwretT3D3aTiQ@mail.gmail.com>
On Mon, Dec 12, 2016 at 11:45 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Mon, Dec 12, 2016 at 11:29 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Wed, Nov 23, 2016 at 3:59 PM, <alexander.levin@verizon.com> wrote:
>>> On Mon, Nov 21, 2016 at 03:48:17PM +0100, Dmitry Vyukov wrote:
>>>> Several observations based on my experience with syzkaller descriptions:
>>>> - there are 2 levels: physical and logical;
>>>> on physical level there are int, pointer, array, struct, union;
>>>> and that's pretty much it.
>>>> on logical level there are flags, bitmasks, file paths, sctp socket fds,
>>>> unix socket names, etc.
>>>> These levels are almost completely orthogonal. It would be useful to
>>>> clearly separate them on description level. E.g. now you have TYPE_PTR and
>>>> TYPE_INT which is physical level; and then TYPE_FD which is also an int.
>>>>
>>>> - logical types won't fit into 64 bits, there are more of them
>>>
>>> I agree with your two points above.
>>>
>>> As an example, let's look at:
>>>
>>> int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
>>>
>>> epfd would be a physical int, logical "epoll_fd", and constrainted with being
>>> an open descriptor.
>>>
>>> fd on the other hand is tricky: it's a physical int, logical fd, and
>>> constrainted with being a file descriptor that supports poll, and being
>>> open.
>>>
>>> So while I think that logical types can be just a counter rather than a bitmask
>>> I suspect that our constraints won't fit into 64 bits. Make 2 64 bit fields?
>>
>> One observation is that there are just 5 physical types:
>> - scalar
>> - pointer
>> - array
>> - struct
>> - union
>>
>> The rest deals with what exactly "scalar" is in a particular case.
>>
>> I don't yet have complete answer, as it somewhat intermixed with the
>> rest of questions.
>>
>>
>>
>>>> - we need support for recursive types (yes, there are linked lists in
>>>> kernel APIs)
>>>
>>> I imagine that this will be handled by specific logical type handlers we'll
>>> need to implement. Can you give me an example and I'll try to code that?
>>
>> One example is te_oper_param here:
>> https://android.googlesource.com/kernel/tegra/+/android-tegra-3.10/security/tlk_driver/ote_protocol.h
>> next_ptr_user is a pointer to te_oper_param. Thus recursive definition.
>>
>> Another example is snd_seq_ev_quote:
>> http://lxr.free-electrons.com/source/include/uapi/sound/asequencer.h#L194
>> it contains struct snd_seq_event *event and snd_seq_event recursively
>> contains snd_seq_ev_quote.
>>
>> In all cases it is pointer recursion via structs.
>>
>> Sometimes it wish that developers have to write formal descriptions in
>> a limited language upfront. That would probably eliminate lots of
>> weird one-off "see what I invented here" cases :)
>>
>>
>>
>>
>>>> - we need support for input/output data
>>>> currently syzkaller does this only on pointer level, i.e. you
>>>> attach direction to pointer target
>>>> but that's not enough, frequently there is a struct where one field
>>>> is input and another is output
>>>
>>> Assuming it's "data", for intput we'll just need to check that the given
>>> length is readable and for output that the length is writable, no?
>>
>> It also can be an fd in a struct field. If it's output (e.g. pipe),
>> then we must not check that it's valid on entry. But we may want to
>> check that it's valid on successful exit, or fuzzer will use these
>> output fd's as inputs to other calls.
>>
>>
>>> We can do it with constraints right now.
>>>
>>>> - we may need support for reusing types in several arguments
>>>> e.g. you may have a pretty complex type, and you don't want to
>>>> write it out a dozen of times
>>>
>>> Yup, so if we go with the physical/logical split we can have handlers for
>>> logical types.
>>>
>>>> - we need some support for discriminated syscalls
>>>> if we want to support strace usecase, the support needs to be more
>>>> extensive than what syzkaller has;
>>>> i.e. syzkaller can't restore discrimination having actual argument
>>>> values (it can do it only in the other direction)
>>>>
>>>> - I would not create a special support for arguments;
>>>> rather I would create support for structs and struct fields,
>>>> and then pretend that a syscalls effectively accepts a struct by value
>>>
>>> But that means I need a custom handler for every syscall to parse the
>>> struct fields rather than a generic code that goes through the args and calls
>>> the right handler?
>>
>> No, you don't. We will need generic code that parses a piece of memory
>> as a struct and splits it into fields anyway.
>> We can just reuse this code to handle syscall arguments as follows.
>> Describe syscall arguments as a pseudo struct (array of fields). Then
>> syscall handling function accepts pointer to region of memory with
>> arguments and description of the struct, and invokes common struct
>> handling code.
>>
>>
>>
>>>> How would you like us to collaborate on this?
>>>> If you share your git repo, I could form it into something that would
>>>> be suitable for syzkaller and incorporate most of the above.
>>>
>>> I'd really like to have something that either generates these descriptions from
>>> your DSL (it really doesn't have to be perfect (at first)) or something that
>>> generates DSL from these C structs.
>>
>> Do you mean generating C from my DSL of a one-off or as a permanent solution?
>
>
> Main problem I am trying to resolve now is how to make types reference
> other types.
> Say we have "pointer to pointer to int" (the same applies to structs
> and arrays). This means that struct type must be able to reference
> another instance of struct type. But we can't put type into type by
> value. Namely, the following is not possible:
>
> struct type {
> int kind;
> union {
> ...
> // for kind = KIND_PTR
> struct {
> struct type type;
> } ptr;
> };
> };
>
> One obvious solution would be to always reference _pointers_ to types. E.g.:
>
> // for kind = KIND_PTR
> struct {
> struct type* type;
> } ptr;
>
> This works but leads to super-verbose descriptions (if we want to use
> static tables), because we need to describe type of each syscall
> argument and inner types for all pointers/arrays/structs/unions as
> separate global variables:
>
>
> static struct type open_arg0_inner {
> .kind = KIND_ARRAY,
> ...
> };
>
> static struct type open_arg0 {
> .kind = KIND_PTR,
> .ptr.type = &open_arg0_inner
> ...
> };
>
> static struct type open_arg1 {
> .kind = KIND_SCALAR,
> ...
> };
>
> static struct type open_arg2 {
> .kind = KIND_SCALAR,
> ...
> };
>
> static struct type open_ret {
> .kind = KIND_SCALAR,
> ...
> };
>
> struct syscall_spec syscall_spec_open = {
> .name = "open",
> .retval = {
> .name = "retval",
> .type = &open_ret,
> }
> .nargs = 3,
> .args[0] = {
> .name = "pathname",
> .type = &open_arg0,
> }
> .args[1] = {
> .name = "flags",
> .type = &open_arg1,
> }
> .args[2] = {
> .name = "mode",
> .type = &open_arg2,
> }
> };
>
>
> This looks way too verbose provided that we write these descriptions
> by hand (just coming up with consistent unique names for all these
> global vars is a problem).
> If we generate that from DSL, it can work though.
>
> Another option I am considering is to use helper construction
> functions that return pointers to types, e.g. something along these
> lines:
>
> struct syscall_spec syscall_spec_open = {
> .name = "open",
> .retval = {
> .name = "retval",
> .type = type_scalar(FD | ERRNO),
> }
> .nargs = 3,
> .args[0] = {
> .name = "pathname",
> .type = type_ptr(DIR_IN, type_array(PATHNAME)),
> }
> .args[1] = {
> .name = "flags",
> .type = type_scalar_flags(O_RDONLY | O_WRONLY, ....),
> }
> .args[2] = {
> .name = "mode",
> .type = type_scalar_flags(S_IRWXU ....),
> }
> };
>
> Now these table can only be initialized dynamically. And we will
> probably need lots of these helper functions. So I don't like it
> either...
>
> Any suggestions?
Here is my current prototype:
https://github.com/dvyukov/linux/commit/6200a9333e78bef393f8ead41205813b94d340f3
For now it can trace arguments of 4 system calls:
[ 4.055483] [pid 1258] open(&00007ffdefc023a0=[], 0x0, 0x1b6)
[ 4.055991] [pid 1258] open(&00007ffdefc023a0=[], 0x0, 0x1b6) = 3
[ 4.056486] [pid 1258] read(0x3, &00007ffdefc01320=[], 0x1000)
[ 4.056977] [pid 1258] read(0x3, &00007ffdefc01320=[], 0x1000) = 1780
[ 4.057485] [pid 1258] read(0x3, &00007f316a732000=[], 0x1000)
[ 4.057991] [pid 1258] read(0x3, &00007f316a732000=[], 0x1000) = 0
[ 4.058488] [pid 1258] close(0x0) = 0
[ 4.058848] [pid 1258] write(0x1, &00007f316a732000=[], 0x5)
[ 4.059304] [pid 1258] write(0x1, &00007f316a732000=[], 0x5) = 5
[ 4.059905] [pid 1234] close(0x0) = 0
[ 4.060239] [pid 1234] close(0x0) = 0
Main outstanding problems:
- understanding length of arrays and buffers
- understanding discriminators of unions and syscall variations
next prev parent reply other threads:[~2016-12-14 19:49 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-16 17:37 [RFC 0/3] ABI spec - verification alexander.levin
2016-11-16 17:37 ` [RFC 1/3] abi_spec: basic definitions of constraints, args and syscalls alexander.levin
2016-11-21 14:48 ` Dmitry Vyukov
2016-11-23 14:59 ` alexander.levin
2016-12-12 10:29 ` Dmitry Vyukov
2016-12-12 10:45 ` Dmitry Vyukov
2016-12-14 19:46 ` Dmitry Vyukov [this message]
2016-12-14 19:48 ` Dmitry Vyukov
2017-01-04 4:52 ` alexander.levin
2016-12-27 17:23 ` alexander.levin
2016-12-28 7:32 ` Dmitry Vyukov
2016-11-21 15:41 ` Steven Rostedt
2016-11-23 15:03 ` alexander.levin
2016-11-23 15:31 ` Steven Rostedt
2016-11-23 15:33 ` Steven Rostedt
2016-11-16 17:37 ` [RFC 2/3] abi_spec: hooks into syscall to allow pre and post checking alexander.levin
2016-11-21 15:54 ` Steven Rostedt
2016-11-21 15:57 ` Dmitry Vyukov
2016-11-23 15:04 ` alexander.levin
2016-11-16 17:37 ` [RFC 3/3] abi_spec: example spec for open(), placeholder for rest of syscalls alexander.levin
2016-11-16 17:46 ` [RFC 0/3] ABI spec - verification Thomas Gleixner
2016-11-21 14:25 ` Dmitry Vyukov
2016-11-23 14:36 ` alexander.levin
2016-12-12 10:12 ` Dmitry Vyukov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CACT4Y+b_M3J2xB-CpOafFizRGWeK=dWAkQQ-6WFyFSda-VdDQA@mail.gmail.com' \
--to=dvyukov@google.com \
--cc=alexander.levin@verizon.com \
--cc=andreyknvl@google.com \
--cc=arnd@arndb.de \
--cc=daniel.vetter@ffwll.ch \
--cc=glider@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=rostedt@goodmis.org \
--cc=scientist@fb.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).