From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756803AbcLNTuJ (ORCPT ); Wed, 14 Dec 2016 14:50:09 -0500 Received: from mail-wm0-f47.google.com ([74.125.82.47]:38314 "EHLO mail-wm0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756555AbcLNTtw (ORCPT ); Wed, 14 Dec 2016 14:49:52 -0500 MIME-Version: 1.0 In-Reply-To: References: <1479317803-17220-1-git-send-email-alexander.levin@verizon.com> <1479317803-17220-2-git-send-email-alexander.levin@verizon.com> <20161123150001.GD3218@sasha-lappy> From: Dmitry Vyukov Date: Wed, 14 Dec 2016 20:48:37 +0100 Message-ID: Subject: Re: [RFC 1/3] abi_spec: basic definitions of constraints, args and syscalls To: "Levin, Alexander" Cc: "tglx@linutronix.de" , "scientist@fb.com" , "glider@google.com" , "andreyknvl@google.com" , "rostedt@goodmis.org" , "arnd@arndb.de" , "mathieu.desnoyers@efficios.com" , "daniel.vetter@ffwll.ch" , "linux-kernel@vger.kernel.org" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Dec 14, 2016 at 8:46 PM, Dmitry Vyukov wrote: > On Mon, Dec 12, 2016 at 11:45 AM, Dmitry Vyukov wrote: >> On Mon, Dec 12, 2016 at 11:29 AM, Dmitry Vyukov wrote: >>> On Wed, Nov 23, 2016 at 3:59 PM, wrote: >>>> On Mon, Nov 21, 2016 at 03:48:17PM +0100, Dmitry Vyukov wrote: >>>>> Several observations based on my experience with syzkaller descriptions: >>>>> - there are 2 levels: physical and logical; >>>>> on physical level there are int, pointer, array, struct, union; >>>>> and that's pretty much it. >>>>> on logical level there are flags, bitmasks, file paths, sctp socket fds, >>>>> unix socket names, etc. >>>>> These levels are almost completely orthogonal. It would be useful to >>>>> clearly separate them on description level. E.g. now you have TYPE_PTR and >>>>> TYPE_INT which is physical level; and then TYPE_FD which is also an int. >>>>> >>>>> - logical types won't fit into 64 bits, there are more of them >>>> >>>> I agree with your two points above. >>>> >>>> As an example, let's look at: >>>> >>>> int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event); >>>> >>>> epfd would be a physical int, logical "epoll_fd", and constrainted with being >>>> an open descriptor. >>>> >>>> fd on the other hand is tricky: it's a physical int, logical fd, and >>>> constrainted with being a file descriptor that supports poll, and being >>>> open. >>>> >>>> So while I think that logical types can be just a counter rather than a bitmask >>>> I suspect that our constraints won't fit into 64 bits. Make 2 64 bit fields? >>> >>> One observation is that there are just 5 physical types: >>> - scalar >>> - pointer >>> - array >>> - struct >>> - union >>> >>> The rest deals with what exactly "scalar" is in a particular case. >>> >>> I don't yet have complete answer, as it somewhat intermixed with the >>> rest of questions. >>> >>> >>> >>>>> - we need support for recursive types (yes, there are linked lists in >>>>> kernel APIs) >>>> >>>> I imagine that this will be handled by specific logical type handlers we'll >>>> need to implement. Can you give me an example and I'll try to code that? >>> >>> One example is te_oper_param here: >>> https://android.googlesource.com/kernel/tegra/+/android-tegra-3.10/security/tlk_driver/ote_protocol.h >>> next_ptr_user is a pointer to te_oper_param. Thus recursive definition. >>> >>> Another example is snd_seq_ev_quote: >>> http://lxr.free-electrons.com/source/include/uapi/sound/asequencer.h#L194 >>> it contains struct snd_seq_event *event and snd_seq_event recursively >>> contains snd_seq_ev_quote. >>> >>> In all cases it is pointer recursion via structs. >>> >>> Sometimes it wish that developers have to write formal descriptions in >>> a limited language upfront. That would probably eliminate lots of >>> weird one-off "see what I invented here" cases :) >>> >>> >>> >>> >>>>> - we need support for input/output data >>>>> currently syzkaller does this only on pointer level, i.e. you >>>>> attach direction to pointer target >>>>> but that's not enough, frequently there is a struct where one field >>>>> is input and another is output >>>> >>>> Assuming it's "data", for intput we'll just need to check that the given >>>> length is readable and for output that the length is writable, no? >>> >>> It also can be an fd in a struct field. If it's output (e.g. pipe), >>> then we must not check that it's valid on entry. But we may want to >>> check that it's valid on successful exit, or fuzzer will use these >>> output fd's as inputs to other calls. >>> >>> >>>> We can do it with constraints right now. >>>> >>>>> - we may need support for reusing types in several arguments >>>>> e.g. you may have a pretty complex type, and you don't want to >>>>> write it out a dozen of times >>>> >>>> Yup, so if we go with the physical/logical split we can have handlers for >>>> logical types. >>>> >>>>> - we need some support for discriminated syscalls >>>>> if we want to support strace usecase, the support needs to be more >>>>> extensive than what syzkaller has; >>>>> i.e. syzkaller can't restore discrimination having actual argument >>>>> values (it can do it only in the other direction) >>>>> >>>>> - I would not create a special support for arguments; >>>>> rather I would create support for structs and struct fields, >>>>> and then pretend that a syscalls effectively accepts a struct by value >>>> >>>> But that means I need a custom handler for every syscall to parse the >>>> struct fields rather than a generic code that goes through the args and calls >>>> the right handler? >>> >>> No, you don't. We will need generic code that parses a piece of memory >>> as a struct and splits it into fields anyway. >>> We can just reuse this code to handle syscall arguments as follows. >>> Describe syscall arguments as a pseudo struct (array of fields). Then >>> syscall handling function accepts pointer to region of memory with >>> arguments and description of the struct, and invokes common struct >>> handling code. >>> >>> >>> >>>>> How would you like us to collaborate on this? >>>>> If you share your git repo, I could form it into something that would >>>>> be suitable for syzkaller and incorporate most of the above. >>>> >>>> I'd really like to have something that either generates these descriptions from >>>> your DSL (it really doesn't have to be perfect (at first)) or something that >>>> generates DSL from these C structs. >>> >>> Do you mean generating C from my DSL of a one-off or as a permanent solution? >> >> >> Main problem I am trying to resolve now is how to make types reference >> other types. >> Say we have "pointer to pointer to int" (the same applies to structs >> and arrays). This means that struct type must be able to reference >> another instance of struct type. But we can't put type into type by >> value. Namely, the following is not possible: >> >> struct type { >> int kind; >> union { >> ... >> // for kind = KIND_PTR >> struct { >> struct type type; >> } ptr; >> }; >> }; >> >> One obvious solution would be to always reference _pointers_ to types. E.g.: >> >> // for kind = KIND_PTR >> struct { >> struct type* type; >> } ptr; >> >> This works but leads to super-verbose descriptions (if we want to use >> static tables), because we need to describe type of each syscall >> argument and inner types for all pointers/arrays/structs/unions as >> separate global variables: >> >> >> static struct type open_arg0_inner { >> .kind = KIND_ARRAY, >> ... >> }; >> >> static struct type open_arg0 { >> .kind = KIND_PTR, >> .ptr.type = &open_arg0_inner >> ... >> }; >> >> static struct type open_arg1 { >> .kind = KIND_SCALAR, >> ... >> }; >> >> static struct type open_arg2 { >> .kind = KIND_SCALAR, >> ... >> }; >> >> static struct type open_ret { >> .kind = KIND_SCALAR, >> ... >> }; >> >> struct syscall_spec syscall_spec_open = { >> .name = "open", >> .retval = { >> .name = "retval", >> .type = &open_ret, >> } >> .nargs = 3, >> .args[0] = { >> .name = "pathname", >> .type = &open_arg0, >> } >> .args[1] = { >> .name = "flags", >> .type = &open_arg1, >> } >> .args[2] = { >> .name = "mode", >> .type = &open_arg2, >> } >> }; >> >> >> This looks way too verbose provided that we write these descriptions >> by hand (just coming up with consistent unique names for all these >> global vars is a problem). >> If we generate that from DSL, it can work though. >> >> Another option I am considering is to use helper construction >> functions that return pointers to types, e.g. something along these >> lines: >> >> struct syscall_spec syscall_spec_open = { >> .name = "open", >> .retval = { >> .name = "retval", >> .type = type_scalar(FD | ERRNO), >> } >> .nargs = 3, >> .args[0] = { >> .name = "pathname", >> .type = type_ptr(DIR_IN, type_array(PATHNAME)), >> } >> .args[1] = { >> .name = "flags", >> .type = type_scalar_flags(O_RDONLY | O_WRONLY, ....), >> } >> .args[2] = { >> .name = "mode", >> .type = type_scalar_flags(S_IRWXU ....), >> } >> }; >> >> Now these table can only be initialized dynamically. And we will >> probably need lots of these helper functions. So I don't like it >> either... >> >> Any suggestions? > > > > Here is my current prototype: > https://github.com/dvyukov/linux/commit/6200a9333e78bef393f8ead41205813b94d340f3 > > For now it can trace arguments of 4 system calls: > > [ 4.055483] [pid 1258] open(&00007ffdefc023a0=[], 0x0, 0x1b6) > [ 4.055991] [pid 1258] open(&00007ffdefc023a0=[], 0x0, 0x1b6) = 3 > [ 4.056486] [pid 1258] read(0x3, &00007ffdefc01320=[], 0x1000) > [ 4.056977] [pid 1258] read(0x3, &00007ffdefc01320=[], 0x1000) = 1780 > [ 4.057485] [pid 1258] read(0x3, &00007f316a732000=[], 0x1000) > [ 4.057991] [pid 1258] read(0x3, &00007f316a732000=[], 0x1000) = 0 > [ 4.058488] [pid 1258] close(0x0) = 0 > [ 4.058848] [pid 1258] write(0x1, &00007f316a732000=[], 0x5) > [ 4.059304] [pid 1258] write(0x1, &00007f316a732000=[], 0x5) = 5 > [ 4.059905] [pid 1234] close(0x0) = 0 > [ 4.060239] [pid 1234] close(0x0) = 0 > > > Main outstanding problems: > - understanding length of arrays and buffers > - understanding discriminators of unions and syscall variations Here is open description: struct syscall_spec syscall_spec_open = { .name = "open", .errno = {EACCES, EDQUOT, EEXIST, EFAULT, EFBIG, EINTR, EINVAL, EISDIR, ELOOP, EMFILE, ENAMETOOLONG, ENFILE, ENODEV, ENOENT, ENOMEM, ENOSPC, ENOTDIR, ENXIO, EOVERFLOW, EPERM, EROFS, ETXTBSY, EWOULDBLOCK}, .ret = &type_fd, .args = { {"pathname", &type_ptr_pathname}, {"flags", &type_open_flags}, {"mode", &type_open_mode}, }, }; Types are declares outline as: static struct type type_open_mode = { .kind = KIND_SCALAR, .scalar.size = 4, .scalar.constraint = CONSTRAINT_BITMASK, .scalar.bitmask = {$(S_IRUSR), $(S_IWUSR), $(S_IXUSR), $(S_IRGRP), $(S_IWGRP), $(S_IXGRP), $(S_IROTH), $(S_IWOTH), $(S_IXOTH)}, }; static struct type type_fd = { .kind = KIND_RESOURCE, .res.res = RES_FD, .res.type = &type_i32, }; static struct type type_array_i8 = { .kind = KIND_ARRAY, .array.type = &type_i8, }; static struct type type_ptr_array_i8 = { .kind = KIND_PTR, .ptr.type = &type_array_i8, }; etc