All of lore.kernel.org
 help / color / mirror / Atom feed
* eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 19:34 Andy Lutomirski
  2014-09-26 19:51   ` Andy Lutomirski
  2014-09-26 20:00 ` Alexei Starovoitov
  0 siblings, 2 replies; 25+ messages in thread
From: Andy Lutomirski @ 2014-09-26 19:34 UTC (permalink / raw)
  To: David Miller
  Cc: Alexei Starovoitov, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 12:06 PM, David Miller <davem@davemloft.net> wrote:
> From: Alexei Starovoitov <ast@plumgrid.com>
> Date: Fri, 26 Sep 2014 00:16:56 -0700
>
>> v14 -> v15:
>> - got rid of macros with hidden control flow (suggested by David)
>>   replaced macro with explicit goto or return and simplified
>>   where possible (affected patches #9 and #10)
>> - rebased, retested
>
> Series applied to net-next, thanks.

Hi all-

I read through the verifier.  Sorry this email is a little bit late.
Here are some thoughts, in no particular order.

Programs should IMO have feature flags.  One such feature would be
"can use pointers as integers", for example.

UNKNOWN_VALUE is IMO a crappy type.  I think that it should renamed to
INTEGER and that a bunch of the things that generate it should be
errors.

ALU ops like adding two pointers should IMO be failures.  I don't
think that they should result in UNKNOWN_VALUE.

I don't like PTR_TO_MAP_VALUE as a type.  I think that there should
instead be "pointer to (const?) buffer of N bytes".

ARG_PTR_TO_MAP_KEY is odd.  Why do we care what a function does with a
key?  It just needs to be a big enough buffer, right?

All of the stack spill stuff seems overcomplicated.  Can you just
disallow unaligned stack access?  Right off the bat, that will delete
a bunch of code and cut down runtime memory use by nearly a factor of
eight.

Also, there are no calls to functions written eBPF, right?  If so, why
not just give each stack slot a *fixed* type for the lifetime of the
program?  That would be a huge time complexity win, not to mention
being more comprehensible.  When function calls get added, they'll
need a whole pile of new infrastructure anyway.

check_call is a mess.  I predict that it will be unmaintainable.

check_ld_abs is messy, IMO.  Can't there be a real type "pointer to
object of type [type]", where skb would be a type?  Then you could use
the normal function call logic for skb pointers instead of hardcoded
crud.

You're doing a depth-first search.  I don't like it.  You need complex
pruning logic to avoid exponential behavior, and it's not obvious to
me that pathological programs that cause exponential blowups in the
verifier don't exist.  Wouldn't it make more sense to do a
breadth-first search instead and to give each reg/stack slot a
definite type at each point in the control flow?  You'd need a
function to find the intersection of two types, but I think that would
be preferable to the current code that tests whether one type is a
subset of another.

At some point someone will want loops, and that will open a huge can
of worms involving explicit typing or type inference.  It might pay to
add some kind of stack slot (and maybe even register slot) type
declarations now to get things ready for that.

NB: If this code is actually invoked in net-next when loading a
filter, please don't push to Linus until there's a convincing argument
that the verifier has acceptable asymptotic complexity.

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 19:51   ` Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2014-09-26 19:51 UTC (permalink / raw)
  To: David Miller
  Cc: Alexei Starovoitov, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 12:34 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Sep 26, 2014 at 12:06 PM, David Miller <davem@davemloft.net> wrote:
>> From: Alexei Starovoitov <ast@plumgrid.com>
>> Date: Fri, 26 Sep 2014 00:16:56 -0700
>>
>>> v14 -> v15:
>>> - got rid of macros with hidden control flow (suggested by David)
>>>   replaced macro with explicit goto or return and simplified
>>>   where possible (affected patches #9 and #10)
>>> - rebased, retested
>>
>> Series applied to net-next, thanks.
>
> Hi all-
>
> I read through the verifier.  Sorry this email is a little bit late.
> Here are some thoughts, in no particular order.
>
> Programs should IMO have feature flags.  One such feature would be
> "can use pointers as integers", for example.
>
> UNKNOWN_VALUE is IMO a crappy type.  I think that it should renamed to
> INTEGER and that a bunch of the things that generate it should be
> errors.
>
> ALU ops like adding two pointers should IMO be failures.  I don't
> think that they should result in UNKNOWN_VALUE.
>
> I don't like PTR_TO_MAP_VALUE as a type.  I think that there should
> instead be "pointer to (const?) buffer of N bytes".
>
> ARG_PTR_TO_MAP_KEY is odd.  Why do we care what a function does with a
> key?  It just needs to be a big enough buffer, right?
>
> All of the stack spill stuff seems overcomplicated.  Can you just
> disallow unaligned stack access?  Right off the bat, that will delete
> a bunch of code and cut down runtime memory use by nearly a factor of
> eight.
>
> Also, there are no calls to functions written eBPF, right?  If so, why
> not just give each stack slot a *fixed* type for the lifetime of the
> program?  That would be a huge time complexity win, not to mention
> being more comprehensible.  When function calls get added, they'll
> need a whole pile of new infrastructure anyway.
>
> check_call is a mess.  I predict that it will be unmaintainable.
>
> check_ld_abs is messy, IMO.  Can't there be a real type "pointer to
> object of type [type]", where skb would be a type?  Then you could use
> the normal function call logic for skb pointers instead of hardcoded
> crud.
>
> You're doing a depth-first search.  I don't like it.  You need complex
> pruning logic to avoid exponential behavior, and it's not obvious to
> me that pathological programs that cause exponential blowups in the
> verifier don't exist.  Wouldn't it make more sense to do a
> breadth-first search instead and to give each reg/stack slot a
> definite type at each point in the control flow?  You'd need a
> function to find the intersection of two types, but I think that would
> be preferable to the current code that tests whether one type is a
> subset of another.
>
> At some point someone will want loops, and that will open a huge can
> of worms involving explicit typing or type inference.  It might pay to
> add some kind of stack slot (and maybe even register slot) type
> declarations now to get things ready for that.
>
> NB: If this code is actually invoked in net-next when loading a
> filter, please don't push to Linus until there's a convincing argument
> that the verifier has acceptable asymptotic complexity.
>

To add one more point:

With the current verifier design, it's impossible to write a userspace
tool that can take an eBPF program and check it.  The verification is
far too context-dependent for that to be possible.  I won't go so far
as to say that a userspace tool needs to *exist*, but I strongly
object to exposing a verification algorithm that *precludes* writing
such a tool.

I think that the eBPF program format needs to encode all context
needed for verification.  Then verification should check that the
program is compliant with the context and that the context is correct.
The former could, in principle, be done in userspace, too.

Here, "context" includes the program type (i.e. what type R1 hasis),
the key and value sizes of all referenced maps, the fact that the maps
are maps (damnit, "every object implements exactly the same interface
and is called a 'map'" is a bad type system*), and possible also
things like the intended stack size and any other relevant details
about the entry calling convention.

* It's especially bad because you already have a type (skb) that
doesn't conform.

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 19:51   ` Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2014-09-26 19:51 UTC (permalink / raw)
  To: David Miller
  Cc: Alexei Starovoitov, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 26, 2014 at 12:34 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> On Fri, Sep 26, 2014 at 12:06 PM, David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:
>> From: Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org>
>> Date: Fri, 26 Sep 2014 00:16:56 -0700
>>
>>> v14 -> v15:
>>> - got rid of macros with hidden control flow (suggested by David)
>>>   replaced macro with explicit goto or return and simplified
>>>   where possible (affected patches #9 and #10)
>>> - rebased, retested
>>
>> Series applied to net-next, thanks.
>
> Hi all-
>
> I read through the verifier.  Sorry this email is a little bit late.
> Here are some thoughts, in no particular order.
>
> Programs should IMO have feature flags.  One such feature would be
> "can use pointers as integers", for example.
>
> UNKNOWN_VALUE is IMO a crappy type.  I think that it should renamed to
> INTEGER and that a bunch of the things that generate it should be
> errors.
>
> ALU ops like adding two pointers should IMO be failures.  I don't
> think that they should result in UNKNOWN_VALUE.
>
> I don't like PTR_TO_MAP_VALUE as a type.  I think that there should
> instead be "pointer to (const?) buffer of N bytes".
>
> ARG_PTR_TO_MAP_KEY is odd.  Why do we care what a function does with a
> key?  It just needs to be a big enough buffer, right?
>
> All of the stack spill stuff seems overcomplicated.  Can you just
> disallow unaligned stack access?  Right off the bat, that will delete
> a bunch of code and cut down runtime memory use by nearly a factor of
> eight.
>
> Also, there are no calls to functions written eBPF, right?  If so, why
> not just give each stack slot a *fixed* type for the lifetime of the
> program?  That would be a huge time complexity win, not to mention
> being more comprehensible.  When function calls get added, they'll
> need a whole pile of new infrastructure anyway.
>
> check_call is a mess.  I predict that it will be unmaintainable.
>
> check_ld_abs is messy, IMO.  Can't there be a real type "pointer to
> object of type [type]", where skb would be a type?  Then you could use
> the normal function call logic for skb pointers instead of hardcoded
> crud.
>
> You're doing a depth-first search.  I don't like it.  You need complex
> pruning logic to avoid exponential behavior, and it's not obvious to
> me that pathological programs that cause exponential blowups in the
> verifier don't exist.  Wouldn't it make more sense to do a
> breadth-first search instead and to give each reg/stack slot a
> definite type at each point in the control flow?  You'd need a
> function to find the intersection of two types, but I think that would
> be preferable to the current code that tests whether one type is a
> subset of another.
>
> At some point someone will want loops, and that will open a huge can
> of worms involving explicit typing or type inference.  It might pay to
> add some kind of stack slot (and maybe even register slot) type
> declarations now to get things ready for that.
>
> NB: If this code is actually invoked in net-next when loading a
> filter, please don't push to Linus until there's a convincing argument
> that the verifier has acceptable asymptotic complexity.
>

To add one more point:

With the current verifier design, it's impossible to write a userspace
tool that can take an eBPF program and check it.  The verification is
far too context-dependent for that to be possible.  I won't go so far
as to say that a userspace tool needs to *exist*, but I strongly
object to exposing a verification algorithm that *precludes* writing
such a tool.

I think that the eBPF program format needs to encode all context
needed for verification.  Then verification should check that the
program is compliant with the context and that the context is correct.
The former could, in principle, be done in userspace, too.

Here, "context" includes the program type (i.e. what type R1 hasis),
the key and value sizes of all referenced maps, the fact that the maps
are maps (damnit, "every object implements exactly the same interface
and is called a 'map'" is a bad type system*), and possible also
things like the intended stack size and any other relevant details
about the entry calling convention.

* It's especially bad because you already have a type (skb) that
doesn't conform.

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
  2014-09-26 19:34 eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite) Andy Lutomirski
  2014-09-26 19:51   ` Andy Lutomirski
@ 2014-09-26 20:00 ` Alexei Starovoitov
  2014-09-26 20:39     ` Andy Lutomirski
  1 sibling, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2014-09-26 20:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 12:34 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Sep 26, 2014 at 12:06 PM, David Miller <davem@davemloft.net> wrote:
>> From: Alexei Starovoitov <ast@plumgrid.com>
>> Date: Fri, 26 Sep 2014 00:16:56 -0700
>>
>>> v14 -> v15:
>>> - got rid of macros with hidden control flow (suggested by David)
>>>   replaced macro with explicit goto or return and simplified
>>>   where possible (affected patches #9 and #10)
>>> - rebased, retested
>>
>> Series applied to net-next, thanks.
>
> Hi all-
>
> I read through the verifier.  Sorry this email is a little bit late.
> Here are some thoughts, in no particular order.
>
> Programs should IMO have feature flags.  One such feature would be
> "can use pointers as integers", for example.

of course.
I believe we discussed it in the past.
programs will have flags. I'm not sure why you insist on this in
the first version.

> UNKNOWN_VALUE is IMO a crappy type.  I think that it should renamed to
> INTEGER and that a bunch of the things that generate it should be
> errors.

it's an internal type.
Sure, I'm ok renaming it, but 'integer' name doesn't fit.
It's not integer type.
When I post a patch for "can use pointers as integers" flag
you'll see that this 'unknown_value' will be broken down
into more precise types.

> ALU ops like adding two pointers should IMO be failures.  I don't
> think that they should result in UNKNOWN_VALUE.

they will be failures for unprivileged programs.
In this patch the whole thing is for root only and root programs need
arithmetic on pointers to compute hashes and so on.
I wish we spent more time chatting, so I could explain this better.

> I don't like PTR_TO_MAP_VALUE as a type.  I think that there should
> instead be "pointer to (const?) buffer of N bytes".
>
> ARG_PTR_TO_MAP_KEY is odd.  Why do we care what a function does with a
> key?  It just needs to be a big enough buffer, right?

not quite. there is a distinction between key and value.
They both come from map definition and correspond to key_size
and value_size, so they have to have two different corresponding
_internal_ types 'ptr_to_map_key' and 'ptr_to_map_value'
This distinction is needed to properly describe function
arguments constraints.

> All of the stack spill stuff seems overcomplicated.  Can you just
> disallow unaligned stack access?  Right off the bat, that will delete
> a bunch of code and cut down runtime memory use by nearly a factor of
> eight.

not quite. stack spill/fill is not for unaligned access.
unaligned access is disallowed first.
See line 663 in check_mem_access().
Often enough gcc/llvm ran out of registers and have to spill
them into stack. This spill/fill tracking mechanism keeps
track of what is stored into stack. Otherwise type violation will be
possible. We cannot get rid of it. It essential for safety.

> Also, there are no calls to functions written eBPF, right?  If so, why
> not just give each stack slot a *fixed* type for the lifetime of the
> program?  That would be a huge time complexity win, not to mention
> being more comprehensible.  When function calls get added, they'll
> need a whole pile of new infrastructure anyway.

you cannot. stack cannot be fixed. It's very small and valuable
resources. Programs will be using differently.

> check_call is a mess.  I predict that it will be unmaintainable.

why? care to provide details?

> check_ld_abs is messy, IMO.  Can't there be a real type "pointer to
> object of type [type]", where skb would be a type?  Then you could use
> the normal function call logic for skb pointers instead of hardcoded
> crud.
>
> You're doing a depth-first search.  I don't like it.  You need complex
> pruning logic to avoid exponential behavior, and it's not obvious to

there was in patch in previous series that did the prunning.
I dropped out of this set to simplify things.

> me that pathological programs that cause exponential blowups in the
> verifier don't exist.  Wouldn't it make more sense to do a
> breadth-first search instead and to give each reg/stack slot a
> definite type at each point in the control flow?  You'd need a
> function to find the intersection of two types, but I think that would
> be preferable to the current code that tests whether one type is a
> subset of another.

nope. breadth-first just doesn't work at all.

> At some point someone will want loops, and that will open a huge can
> of worms involving explicit typing or type inference.  It might pay to
> add some kind of stack slot (and maybe even register slot) type
> declarations now to get things ready for that.

sure. As discussed we may allow loops in the future.

> NB: If this code is actually invoked in net-next when loading a
> filter, please don't push to Linus until there's a convincing argument
> that the verifier has acceptable asymptotic complexity.

complexity is actually described in the doc.
There are several limits. Verifier will be aborted if it walks
more then 32k instructions or more then 1k branches.
So the very worst case takes micro seconds to reject
the program. So I don't see your concern.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 20:09     ` Alexei Starovoitov
  0 siblings, 0 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2014-09-26 20:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 12:51 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Sep 26, 2014 at 12:34 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> To add one more point:
>
> With the current verifier design, it's impossible to write a userspace
> tool that can take an eBPF program and check it.  The verification is
> far too context-dependent for that to be possible.  I won't go so far
> as to say that a userspace tool needs to *exist*, but I strongly
> object to exposing a verification algorithm that *precludes* writing
> such a tool.

that's just not true.
why is it not possible?

> I think that the eBPF program format needs to encode all context
> needed for verification.  Then verification should check that the
> program is compliant with the context and that the context is correct.
> The former could, in principle, be done in userspace, too.

one can have maps and other future objects equally
represented in user space. Nothing stops doing exactly the same
logic there.

> Here, "context" includes the program type (i.e. what type R1 hasis),
> the key and value sizes of all referenced maps, the fact that the maps
> are maps (damnit, "every object implements exactly the same interface
> and is called a 'map'" is a bad type system*), and possible also
> things like the intended stack size and any other relevant details
> about the entry calling convention.

Andy, I'm not sure where you're going with this.
Sounds like you want to redesign the whole thing?
How long it will take?
Did you consider all the cases I did?
I think I understand your concerns. What I don't understand
why you think we cannot address them step by step.
imo what this does covers a ton of use cases.
Some futuristic stuff may be better and may be not.
But here I have it working, tested and proven over many
use cases, whereas some future unclear stuff will take
unknown amount of time to redesign...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 20:09     ` Alexei Starovoitov
  0 siblings, 0 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2014-09-26 20:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 26, 2014 at 12:51 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> On Fri, Sep 26, 2014 at 12:34 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>
> To add one more point:
>
> With the current verifier design, it's impossible to write a userspace
> tool that can take an eBPF program and check it.  The verification is
> far too context-dependent for that to be possible.  I won't go so far
> as to say that a userspace tool needs to *exist*, but I strongly
> object to exposing a verification algorithm that *precludes* writing
> such a tool.

that's just not true.
why is it not possible?

> I think that the eBPF program format needs to encode all context
> needed for verification.  Then verification should check that the
> program is compliant with the context and that the context is correct.
> The former could, in principle, be done in userspace, too.

one can have maps and other future objects equally
represented in user space. Nothing stops doing exactly the same
logic there.

> Here, "context" includes the program type (i.e. what type R1 hasis),
> the key and value sizes of all referenced maps, the fact that the maps
> are maps (damnit, "every object implements exactly the same interface
> and is called a 'map'" is a bad type system*), and possible also
> things like the intended stack size and any other relevant details
> about the entry calling convention.

Andy, I'm not sure where you're going with this.
Sounds like you want to redesign the whole thing?
How long it will take?
Did you consider all the cases I did?
I think I understand your concerns. What I don't understand
why you think we cannot address them step by step.
imo what this does covers a ton of use cases.
Some futuristic stuff may be better and may be not.
But here I have it working, tested and proven over many
use cases, whereas some future unclear stuff will take
unknown amount of time to redesign...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 20:39     ` Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2014-09-26 20:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 1:00 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Fri, Sep 26, 2014 at 12:34 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Sep 26, 2014 at 12:06 PM, David Miller <davem@davemloft.net> wrote:
>>> From: Alexei Starovoitov <ast@plumgrid.com>
>>> Date: Fri, 26 Sep 2014 00:16:56 -0700
>>>
>>>> v14 -> v15:
>>>> - got rid of macros with hidden control flow (suggested by David)
>>>>   replaced macro with explicit goto or return and simplified
>>>>   where possible (affected patches #9 and #10)
>>>> - rebased, retested
>>>
>>> Series applied to net-next, thanks.
>>
>> Hi all-
>>
>> I read through the verifier.  Sorry this email is a little bit late.
>> Here are some thoughts, in no particular order.
>>
>> Programs should IMO have feature flags.  One such feature would be
>> "can use pointers as integers", for example.
>
> of course.
> I believe we discussed it in the past.
> programs will have flags. I'm not sure why you insist on this in
> the first version.
>
>> UNKNOWN_VALUE is IMO a crappy type.  I think that it should renamed to
>> INTEGER and that a bunch of the things that generate it should be
>> errors.
>
> it's an internal type.
> Sure, I'm ok renaming it, but 'integer' name doesn't fit.
> It's not integer type.
> When I post a patch for "can use pointers as integers" flag
> you'll see that this 'unknown_value' will be broken down
> into more precise types.
>
>> ALU ops like adding two pointers should IMO be failures.  I don't
>> think that they should result in UNKNOWN_VALUE.
>
> they will be failures for unprivileged programs.
> In this patch the whole thing is for root only and root programs need
> arithmetic on pointers to compute hashes and so on.
> I wish we spent more time chatting, so I could explain this better.
>
>> I don't like PTR_TO_MAP_VALUE as a type.  I think that there should
>> instead be "pointer to (const?) buffer of N bytes".
>>
>> ARG_PTR_TO_MAP_KEY is odd.  Why do we care what a function does with a
>> key?  It just needs to be a big enough buffer, right?
>
> not quite. there is a distinction between key and value.
> They both come from map definition and correspond to key_size
> and value_size, so they have to have two different corresponding
> _internal_ types 'ptr_to_map_key' and 'ptr_to_map_value'
> This distinction is needed to properly describe function
> arguments constraints.

But they're still just pointers to buffers of some size known to the
verifier, right?  By calling them "pointer to map key" and "pointer to
map value" you're tying them to map objects in a way that makes little
sense to me.

>
>> All of the stack spill stuff seems overcomplicated.  Can you just
>> disallow unaligned stack access?  Right off the bat, that will delete
>> a bunch of code and cut down runtime memory use by nearly a factor of
>> eight.
>
> not quite. stack spill/fill is not for unaligned access.
> unaligned access is disallowed first.
> See line 663 in check_mem_access().
> Often enough gcc/llvm ran out of registers and have to spill
> them into stack. This spill/fill tracking mechanism keeps
> track of what is stored into stack. Otherwise type violation will be
> possible. We cannot get rid of it. It essential for safety.

So what's "spill part"?  Unless I misunderstood the stack tracking
code, you're tracking each byte separately.

You're also tracking the type for each stack slot separately for each
instruction.  That looks like it'll account for the considerable
majority of total memory usage.

>
>> Also, there are no calls to functions written eBPF, right?  If so, why
>> not just give each stack slot a *fixed* type for the lifetime of the
>> program?  That would be a huge time complexity win, not to mention
>> being more comprehensible.  When function calls get added, they'll
>> need a whole pile of new infrastructure anyway.
>
> you cannot. stack cannot be fixed. It's very small and valuable
> resources. Programs will be using differently.
>
>> check_call is a mess.  I predict that it will be unmaintainable.
>
> why? care to provide details?

I may have misread my notes.

I don't like the fact that the function proto comes from the
environment instead of from the program.

>
>> check_ld_abs is messy, IMO.  Can't there be a real type "pointer to
>> object of type [type]", where skb would be a type?  Then you could use
>> the normal function call logic for skb pointers instead of hardcoded
>> crud.
>>
>> You're doing a depth-first search.  I don't like it.  You need complex
>> pruning logic to avoid exponential behavior, and it's not obvious to
>
> there was in patch in previous series that did the prunning.
> I dropped out of this set to simplify things.
>
>> me that pathological programs that cause exponential blowups in the
>> verifier don't exist.  Wouldn't it make more sense to do a
>> breadth-first search instead and to give each reg/stack slot a
>> definite type at each point in the control flow?  You'd need a
>> function to find the intersection of two types, but I think that would
>> be preferable to the current code that tests whether one type is a
>> subset of another.
>
> nope. breadth-first just doesn't work at all.

Sorry, I didn't actually mean BFS.  I meant to order the search such
that all incoming control flow edges to an insn are visited before any
of the outgoing edges are visited.

>
>> At some point someone will want loops, and that will open a huge can
>> of worms involving explicit typing or type inference.  It might pay to
>> add some kind of stack slot (and maybe even register slot) type
>> declarations now to get things ready for that.
>
> sure. As discussed we may allow loops in the future.
>
>> NB: If this code is actually invoked in net-next when loading a
>> filter, please don't push to Linus until there's a convincing argument
>> that the verifier has acceptable asymptotic complexity.
>
> complexity is actually described in the doc.
> There are several limits. Verifier will be aborted if it walks
> more then 32k instructions or more then 1k branches.
> So the very worst case takes micro seconds to reject
> the program. So I don't see your concern.

That this will randomly fail, then.  For all I know, there are
existing valid BPF programs with vastly more than 32k "instructions"
as counted by the verifier.

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 20:39     ` Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2014-09-26 20:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 26, 2014 at 1:00 PM, Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> wrote:
> On Fri, Sep 26, 2014 at 12:34 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>> On Fri, Sep 26, 2014 at 12:06 PM, David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:
>>> From: Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org>
>>> Date: Fri, 26 Sep 2014 00:16:56 -0700
>>>
>>>> v14 -> v15:
>>>> - got rid of macros with hidden control flow (suggested by David)
>>>>   replaced macro with explicit goto or return and simplified
>>>>   where possible (affected patches #9 and #10)
>>>> - rebased, retested
>>>
>>> Series applied to net-next, thanks.
>>
>> Hi all-
>>
>> I read through the verifier.  Sorry this email is a little bit late.
>> Here are some thoughts, in no particular order.
>>
>> Programs should IMO have feature flags.  One such feature would be
>> "can use pointers as integers", for example.
>
> of course.
> I believe we discussed it in the past.
> programs will have flags. I'm not sure why you insist on this in
> the first version.
>
>> UNKNOWN_VALUE is IMO a crappy type.  I think that it should renamed to
>> INTEGER and that a bunch of the things that generate it should be
>> errors.
>
> it's an internal type.
> Sure, I'm ok renaming it, but 'integer' name doesn't fit.
> It's not integer type.
> When I post a patch for "can use pointers as integers" flag
> you'll see that this 'unknown_value' will be broken down
> into more precise types.
>
>> ALU ops like adding two pointers should IMO be failures.  I don't
>> think that they should result in UNKNOWN_VALUE.
>
> they will be failures for unprivileged programs.
> In this patch the whole thing is for root only and root programs need
> arithmetic on pointers to compute hashes and so on.
> I wish we spent more time chatting, so I could explain this better.
>
>> I don't like PTR_TO_MAP_VALUE as a type.  I think that there should
>> instead be "pointer to (const?) buffer of N bytes".
>>
>> ARG_PTR_TO_MAP_KEY is odd.  Why do we care what a function does with a
>> key?  It just needs to be a big enough buffer, right?
>
> not quite. there is a distinction between key and value.
> They both come from map definition and correspond to key_size
> and value_size, so they have to have two different corresponding
> _internal_ types 'ptr_to_map_key' and 'ptr_to_map_value'
> This distinction is needed to properly describe function
> arguments constraints.

But they're still just pointers to buffers of some size known to the
verifier, right?  By calling them "pointer to map key" and "pointer to
map value" you're tying them to map objects in a way that makes little
sense to me.

>
>> All of the stack spill stuff seems overcomplicated.  Can you just
>> disallow unaligned stack access?  Right off the bat, that will delete
>> a bunch of code and cut down runtime memory use by nearly a factor of
>> eight.
>
> not quite. stack spill/fill is not for unaligned access.
> unaligned access is disallowed first.
> See line 663 in check_mem_access().
> Often enough gcc/llvm ran out of registers and have to spill
> them into stack. This spill/fill tracking mechanism keeps
> track of what is stored into stack. Otherwise type violation will be
> possible. We cannot get rid of it. It essential for safety.

So what's "spill part"?  Unless I misunderstood the stack tracking
code, you're tracking each byte separately.

You're also tracking the type for each stack slot separately for each
instruction.  That looks like it'll account for the considerable
majority of total memory usage.

>
>> Also, there are no calls to functions written eBPF, right?  If so, why
>> not just give each stack slot a *fixed* type for the lifetime of the
>> program?  That would be a huge time complexity win, not to mention
>> being more comprehensible.  When function calls get added, they'll
>> need a whole pile of new infrastructure anyway.
>
> you cannot. stack cannot be fixed. It's very small and valuable
> resources. Programs will be using differently.
>
>> check_call is a mess.  I predict that it will be unmaintainable.
>
> why? care to provide details?

I may have misread my notes.

I don't like the fact that the function proto comes from the
environment instead of from the program.

>
>> check_ld_abs is messy, IMO.  Can't there be a real type "pointer to
>> object of type [type]", where skb would be a type?  Then you could use
>> the normal function call logic for skb pointers instead of hardcoded
>> crud.
>>
>> You're doing a depth-first search.  I don't like it.  You need complex
>> pruning logic to avoid exponential behavior, and it's not obvious to
>
> there was in patch in previous series that did the prunning.
> I dropped out of this set to simplify things.
>
>> me that pathological programs that cause exponential blowups in the
>> verifier don't exist.  Wouldn't it make more sense to do a
>> breadth-first search instead and to give each reg/stack slot a
>> definite type at each point in the control flow?  You'd need a
>> function to find the intersection of two types, but I think that would
>> be preferable to the current code that tests whether one type is a
>> subset of another.
>
> nope. breadth-first just doesn't work at all.

Sorry, I didn't actually mean BFS.  I meant to order the search such
that all incoming control flow edges to an insn are visited before any
of the outgoing edges are visited.

>
>> At some point someone will want loops, and that will open a huge can
>> of worms involving explicit typing or type inference.  It might pay to
>> add some kind of stack slot (and maybe even register slot) type
>> declarations now to get things ready for that.
>
> sure. As discussed we may allow loops in the future.
>
>> NB: If this code is actually invoked in net-next when loading a
>> filter, please don't push to Linus until there's a convincing argument
>> that the verifier has acceptable asymptotic complexity.
>
> complexity is actually described in the doc.
> There are several limits. Verifier will be aborted if it walks
> more then 32k instructions or more then 1k branches.
> So the very worst case takes micro seconds to reject
> the program. So I don't see your concern.

That this will randomly fail, then.  For all I know, there are
existing valid BPF programs with vastly more than 32k "instructions"
as counted by the verifier.

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 20:42       ` Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2014-09-26 20:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 1:09 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Fri, Sep 26, 2014 at 12:51 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Sep 26, 2014 at 12:34 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> To add one more point:
>>
>> With the current verifier design, it's impossible to write a userspace
>> tool that can take an eBPF program and check it.  The verification is
>> far too context-dependent for that to be possible.  I won't go so far
>> as to say that a userspace tool needs to *exist*, but I strongly
>> object to exposing a verification algorithm that *precludes* writing
>> such a tool.
>
> that's just not true.
> why is it not possible?

Because the types of referenced objects aren't encoded in the blob
that a user program loads, unless I'm missing something.

>
>> I think that the eBPF program format needs to encode all context
>> needed for verification.  Then verification should check that the
>> program is compliant with the context and that the context is correct.
>> The former could, in principle, be done in userspace, too.
>
> one can have maps and other future objects equally
> represented in user space. Nothing stops doing exactly the same
> logic there.

But the eBPF binary doesn't encode this information. In fact, the
caller of an ebpf syscall may not even have access to this
information.

I really think that the information needed to check type safety should
be encapsulated in the program.

>
>> Here, "context" includes the program type (i.e. what type R1 hasis),
>> the key and value sizes of all referenced maps, the fact that the maps
>> are maps (damnit, "every object implements exactly the same interface
>> and is called a 'map'" is a bad type system*), and possible also
>> things like the intended stack size and any other relevant details
>> about the entry calling convention.
>
> Andy, I'm not sure where you're going with this.
> Sounds like you want to redesign the whole thing?
> How long it will take?
> Did you consider all the cases I did?
> I think I understand your concerns. What I don't understand
> why you think we cannot address them step by step.
> imo what this does covers a ton of use cases.
> Some futuristic stuff may be better and may be not.
> But here I have it working, tested and proven over many
> use cases, whereas some future unclear stuff will take
> unknown amount of time to redesign...

I think this is addressable as a smallish change on top of your code.
Rather than looking up a map when you need to learn its key and value
size, I think that all you need to do is to look in a program section
for the key and value size (and the fact that it's a map) and confirm
that the referenced map *matches* the stored values.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 20:42       ` Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2014-09-26 20:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 26, 2014 at 1:09 PM, Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> wrote:
> On Fri, Sep 26, 2014 at 12:51 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>> On Fri, Sep 26, 2014 at 12:34 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>>
>> To add one more point:
>>
>> With the current verifier design, it's impossible to write a userspace
>> tool that can take an eBPF program and check it.  The verification is
>> far too context-dependent for that to be possible.  I won't go so far
>> as to say that a userspace tool needs to *exist*, but I strongly
>> object to exposing a verification algorithm that *precludes* writing
>> such a tool.
>
> that's just not true.
> why is it not possible?

Because the types of referenced objects aren't encoded in the blob
that a user program loads, unless I'm missing something.

>
>> I think that the eBPF program format needs to encode all context
>> needed for verification.  Then verification should check that the
>> program is compliant with the context and that the context is correct.
>> The former could, in principle, be done in userspace, too.
>
> one can have maps and other future objects equally
> represented in user space. Nothing stops doing exactly the same
> logic there.

But the eBPF binary doesn't encode this information. In fact, the
caller of an ebpf syscall may not even have access to this
information.

I really think that the information needed to check type safety should
be encapsulated in the program.

>
>> Here, "context" includes the program type (i.e. what type R1 hasis),
>> the key and value sizes of all referenced maps, the fact that the maps
>> are maps (damnit, "every object implements exactly the same interface
>> and is called a 'map'" is a bad type system*), and possible also
>> things like the intended stack size and any other relevant details
>> about the entry calling convention.
>
> Andy, I'm not sure where you're going with this.
> Sounds like you want to redesign the whole thing?
> How long it will take?
> Did you consider all the cases I did?
> I think I understand your concerns. What I don't understand
> why you think we cannot address them step by step.
> imo what this does covers a ton of use cases.
> Some futuristic stuff may be better and may be not.
> But here I have it working, tested and proven over many
> use cases, whereas some future unclear stuff will take
> unknown amount of time to redesign...

I think this is addressable as a smallish change on top of your code.
Rather than looking up a map when you need to learn its key and value
size, I think that all you need to do is to look in a program section
for the key and value size (and the fact that it's a map) and confirm
that the referenced map *matches* the stored values.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 21:25       ` Alexei Starovoitov
  0 siblings, 0 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2014-09-26 21:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 1:39 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> not quite. there is a distinction between key and value.
>> They both come from map definition and correspond to key_size
>> and value_size, so they have to have two different corresponding
>> _internal_ types 'ptr_to_map_key' and 'ptr_to_map_value'
>> This distinction is needed to properly describe function
>> arguments constraints.
>
> But they're still just pointers to buffers of some size known to the
> verifier, right?  By calling them "pointer to map key" and "pointer to
> map value" you're tying them to map objects in a way that makes little
> sense to me.

'pointer_to_map_key' is internal argument constraint of the
in-kernel helper function. It tells verifier how to check the values
passed into function.
Just pointer + size abstraction is not enough here.
verifier has to know the type of what it's checking.

> So what's "spill part"?  Unless I misunderstood the stack tracking
> code, you're tracking each byte separately.
>
> You're also tracking the type for each stack slot separately for each
> instruction.  That looks like it'll account for the considerable
> majority of total memory usage.

verifier has to track each byte separately, because
malicious program may write a pointer into stack with 8-byte
write, then modify single byte with 1-byte write and then
try to read 8-byte back. Verifier has to catch that and
that's why it's tracking every byte-sized slot independently.

> I don't like the fact that the function proto comes from the
> environment instead of from the program.

that's must have.
in-kernel function argument constraints must come from
kernel. where else?
User program says I want to call function foo() and here
is my code that invokes it. Kernel sees prototype of this
foo() and checks arguments.
There is no point for user space program to also
pass foo() constraints. The only thing kernel can do
with this extra info is to check that it matches what
kernel already knows.

>> nope. breadth-first just doesn't work at all.
>
> Sorry, I didn't actually mean BFS.  I meant to order the search such
> that all incoming control flow edges to an insn are visited before any
> of the outgoing edges are visited.

hmm. I'm not sure how exactly you plan on achieving that.
I don't think we want to see real control/data flow graph
analysis in the kernel the way compilers do things.
It will be tens of thousands lines of code.
The algorithm you see in this verifier is straight forward and
tiny. I guess when time passes by when may get enough
courage to attempt something like this, but
today 'kiss' principle rules.

>> complexity is actually described in the doc.
>> There are several limits. Verifier will be aborted if it walks
>> more then 32k instructions or more then 1k branches.
>> So the very worst case takes micro seconds to reject
>> the program. So I don't see your concern.
>
> That this will randomly fail, then.  For all I know, there are
> existing valid BPF programs with vastly more than 32k "instructions"
> as counted by the verifier.

you need to double check your data :)
classic bpf limit is 4k instructions per program.
We're keeping the same limit for eBPF.
32k limit says that verifier will visit each instruction
no more than 8 times.
if we have a program full of branches, then yes, 32k limit will
be reached and that's exactly what 'state pruning' patch is
addressing! As I already said, I dropped it out of this set
to ease review and to keep patch set size minimal.
You can see it my tree:
https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/commit/?h=v14&id=1d9529ae4ce24bc31ca245a156299aa9e59a29f0
I was planning to send it next.
It's small incremental patch on top of existing things.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 21:25       ` Alexei Starovoitov
  0 siblings, 0 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2014-09-26 21:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 26, 2014 at 1:39 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>> not quite. there is a distinction between key and value.
>> They both come from map definition and correspond to key_size
>> and value_size, so they have to have two different corresponding
>> _internal_ types 'ptr_to_map_key' and 'ptr_to_map_value'
>> This distinction is needed to properly describe function
>> arguments constraints.
>
> But they're still just pointers to buffers of some size known to the
> verifier, right?  By calling them "pointer to map key" and "pointer to
> map value" you're tying them to map objects in a way that makes little
> sense to me.

'pointer_to_map_key' is internal argument constraint of the
in-kernel helper function. It tells verifier how to check the values
passed into function.
Just pointer + size abstraction is not enough here.
verifier has to know the type of what it's checking.

> So what's "spill part"?  Unless I misunderstood the stack tracking
> code, you're tracking each byte separately.
>
> You're also tracking the type for each stack slot separately for each
> instruction.  That looks like it'll account for the considerable
> majority of total memory usage.

verifier has to track each byte separately, because
malicious program may write a pointer into stack with 8-byte
write, then modify single byte with 1-byte write and then
try to read 8-byte back. Verifier has to catch that and
that's why it's tracking every byte-sized slot independently.

> I don't like the fact that the function proto comes from the
> environment instead of from the program.

that's must have.
in-kernel function argument constraints must come from
kernel. where else?
User program says I want to call function foo() and here
is my code that invokes it. Kernel sees prototype of this
foo() and checks arguments.
There is no point for user space program to also
pass foo() constraints. The only thing kernel can do
with this extra info is to check that it matches what
kernel already knows.

>> nope. breadth-first just doesn't work at all.
>
> Sorry, I didn't actually mean BFS.  I meant to order the search such
> that all incoming control flow edges to an insn are visited before any
> of the outgoing edges are visited.

hmm. I'm not sure how exactly you plan on achieving that.
I don't think we want to see real control/data flow graph
analysis in the kernel the way compilers do things.
It will be tens of thousands lines of code.
The algorithm you see in this verifier is straight forward and
tiny. I guess when time passes by when may get enough
courage to attempt something like this, but
today 'kiss' principle rules.

>> complexity is actually described in the doc.
>> There are several limits. Verifier will be aborted if it walks
>> more then 32k instructions or more then 1k branches.
>> So the very worst case takes micro seconds to reject
>> the program. So I don't see your concern.
>
> That this will randomly fail, then.  For all I know, there are
> existing valid BPF programs with vastly more than 32k "instructions"
> as counted by the verifier.

you need to double check your data :)
classic bpf limit is 4k instructions per program.
We're keeping the same limit for eBPF.
32k limit says that verifier will visit each instruction
no more than 8 times.
if we have a program full of branches, then yes, 32k limit will
be reached and that's exactly what 'state pruning' patch is
addressing! As I already said, I dropped it out of this set
to ease review and to keep patch set size minimal.
You can see it my tree:
https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/commit/?h=v14&id=1d9529ae4ce24bc31ca245a156299aa9e59a29f0
I was planning to send it next.
It's small incremental patch on top of existing things.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
  2014-09-26 20:42       ` Andy Lutomirski
  (?)
@ 2014-09-26 21:46       ` Alexei Starovoitov
  2014-09-26 21:48           ` Andy Lutomirski
  -1 siblings, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2014-09-26 21:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 1:42 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Sep 26, 2014 at 1:09 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> On Fri, Sep 26, 2014 at 12:51 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Fri, Sep 26, 2014 at 12:34 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>
>>> To add one more point:
>>>
>>> With the current verifier design, it's impossible to write a userspace
>>> tool that can take an eBPF program and check it.  The verification is
>>> far too context-dependent for that to be possible.  I won't go so far
>>> as to say that a userspace tool needs to *exist*, but I strongly
>>> object to exposing a verification algorithm that *precludes* writing
>>> such a tool.
>>
>> that's just not true.
>> why is it not possible?
>
> Because the types of referenced objects aren't encoded in the blob
> that a user program loads, unless I'm missing something.

patch #8 'handle pseudo BPF_LD_IMM64 insn' of this set
handles first type == map. Other types will be added in the future.
The same verification can be done in user space.
It's pretty much copy paste for everything from the kernel.
I don't understand yet why you really must do it in in userspace
in addition to doing it in kernel. It's definitely doable.
Instead of asking kernel to create a map, user space
can just remember map attributes (key_size, value_size)
and continues verification in userspace.

> But the eBPF binary doesn't encode this information. In fact, the
> caller of an ebpf syscall may not even have access to this
> information.

I don't follow. What info are you talking about?
are you saying that program only that references maps via fds
is not verifiable unless one knows what this fds refer to?
yeah, but we're talking user space verification here.
user knows what maps it creates with what attributes.
Also we can add a command to this syscall to fetch map
attributes. That would be trivial _incremental_ addition, right?

> I think this is addressable as a smallish change on top of your code.
> Rather than looking up a map when you need to learn its key and value
> size, I think that all you need to do is to look in a program section
> for the key and value size (and the fact that it's a map) and confirm
> that the referenced map *matches* the stored values.

we can add extra info to the program that will encode
program assumptions about maps. Sure. Though I think
it's extra info that kernel doesn't really need, since it can
only check that program assumptions match to what
kernel already knows. Kernel cannot rely on them.
So I'm not sure what this extra check really buys.

Anyway, if you think it's a smallish change, we can do it
incrementally on top of existing stuff, right?
Why this arguing then?
Sounds like you want to help with the development?
This is great! I'm all for it :)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 21:47         ` Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2014-09-26 21:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 2:25 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Fri, Sep 26, 2014 at 1:39 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> not quite. there is a distinction between key and value.
>>> They both come from map definition and correspond to key_size
>>> and value_size, so they have to have two different corresponding
>>> _internal_ types 'ptr_to_map_key' and 'ptr_to_map_value'
>>> This distinction is needed to properly describe function
>>> arguments constraints.
>>
>> But they're still just pointers to buffers of some size known to the
>> verifier, right?  By calling them "pointer to map key" and "pointer to
>> map value" you're tying them to map objects in a way that makes little
>> sense to me.
>
> 'pointer_to_map_key' is internal argument constraint of the
> in-kernel helper function. It tells verifier how to check the values
> passed into function.
> Just pointer + size abstraction is not enough here.
> verifier has to know the type of what it's checking.

Ignore "pointer_to_map_key" -- that was an error on my part.

I still think that "pointer to map value" should be "pointer to bytes".

>
>> So what's "spill part"?  Unless I misunderstood the stack tracking
>> code, you're tracking each byte separately.
>>
>> You're also tracking the type for each stack slot separately for each
>> instruction.  That looks like it'll account for the considerable
>> majority of total memory usage.
>
> verifier has to track each byte separately, because
> malicious program may write a pointer into stack with 8-byte
> write, then modify single byte with 1-byte write and then
> try to read 8-byte back. Verifier has to catch that and
> that's why it's tracking every byte-sized slot independently.
>

Can't you just disallow the 1-byte write to the stack?

>> I don't like the fact that the function proto comes from the
>> environment instead of from the program.
>
> that's must have.
> in-kernel function argument constraints must come from
> kernel. where else?
> User program says I want to call function foo() and here
> is my code that invokes it. Kernel sees prototype of this
> foo() and checks arguments.
> There is no point for user space program to also
> pass foo() constraints. The only thing kernel can do
> with this extra info is to check that it matches what
> kernel already knows.

User says "I'm calling a function called foo that has this signature".
Kernel checks (a) that the signature is right and (b) that the call is
compliant.

>
>>> nope. breadth-first just doesn't work at all.
>>
>> Sorry, I didn't actually mean BFS.  I meant to order the search such
>> that all incoming control flow edges to an insn are visited before any
>> of the outgoing edges are visited.
>
> hmm. I'm not sure how exactly you plan on achieving that.
> I don't think we want to see real control/data flow graph
> analysis in the kernel the way compilers do things.
> It will be tens of thousands lines of code.
> The algorithm you see in this verifier is straight forward and
> tiny. I guess when time passes by when may get enough
> courage to attempt something like this, but
> today 'kiss' principle rules.

I'll try it in Python.  I bet I can get it to be shorter than the current code.

>
>>> complexity is actually described in the doc.
>>> There are several limits. Verifier will be aborted if it walks
>>> more then 32k instructions or more then 1k branches.
>>> So the very worst case takes micro seconds to reject
>>> the program. So I don't see your concern.
>>
>> That this will randomly fail, then.  For all I know, there are
>> existing valid BPF programs with vastly more than 32k "instructions"
>> as counted by the verifier.
>
> you need to double check your data :)
> classic bpf limit is 4k instructions per program.
> We're keeping the same limit for eBPF.
> 32k limit says that verifier will visit each instruction
> no more than 8 times.
> if we have a program full of branches, then yes, 32k limit will
> be reached and that's exactly what 'state pruning' patch is
> addressing! As I already said, I dropped it out of this set
> to ease review and to keep patch set size minimal.
> You can see it my tree:
> https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/commit/?h=v14&id=1d9529ae4ce24bc31ca245a156299aa9e59a29f0
> I was planning to send it next.
> It's small incremental patch on top of existing things.

Yes, but does it work reliably?

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 21:47         ` Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2014-09-26 21:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 26, 2014 at 2:25 PM, Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> wrote:
> On Fri, Sep 26, 2014 at 1:39 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>>> not quite. there is a distinction between key and value.
>>> They both come from map definition and correspond to key_size
>>> and value_size, so they have to have two different corresponding
>>> _internal_ types 'ptr_to_map_key' and 'ptr_to_map_value'
>>> This distinction is needed to properly describe function
>>> arguments constraints.
>>
>> But they're still just pointers to buffers of some size known to the
>> verifier, right?  By calling them "pointer to map key" and "pointer to
>> map value" you're tying them to map objects in a way that makes little
>> sense to me.
>
> 'pointer_to_map_key' is internal argument constraint of the
> in-kernel helper function. It tells verifier how to check the values
> passed into function.
> Just pointer + size abstraction is not enough here.
> verifier has to know the type of what it's checking.

Ignore "pointer_to_map_key" -- that was an error on my part.

I still think that "pointer to map value" should be "pointer to bytes".

>
>> So what's "spill part"?  Unless I misunderstood the stack tracking
>> code, you're tracking each byte separately.
>>
>> You're also tracking the type for each stack slot separately for each
>> instruction.  That looks like it'll account for the considerable
>> majority of total memory usage.
>
> verifier has to track each byte separately, because
> malicious program may write a pointer into stack with 8-byte
> write, then modify single byte with 1-byte write and then
> try to read 8-byte back. Verifier has to catch that and
> that's why it's tracking every byte-sized slot independently.
>

Can't you just disallow the 1-byte write to the stack?

>> I don't like the fact that the function proto comes from the
>> environment instead of from the program.
>
> that's must have.
> in-kernel function argument constraints must come from
> kernel. where else?
> User program says I want to call function foo() and here
> is my code that invokes it. Kernel sees prototype of this
> foo() and checks arguments.
> There is no point for user space program to also
> pass foo() constraints. The only thing kernel can do
> with this extra info is to check that it matches what
> kernel already knows.

User says "I'm calling a function called foo that has this signature".
Kernel checks (a) that the signature is right and (b) that the call is
compliant.

>
>>> nope. breadth-first just doesn't work at all.
>>
>> Sorry, I didn't actually mean BFS.  I meant to order the search such
>> that all incoming control flow edges to an insn are visited before any
>> of the outgoing edges are visited.
>
> hmm. I'm not sure how exactly you plan on achieving that.
> I don't think we want to see real control/data flow graph
> analysis in the kernel the way compilers do things.
> It will be tens of thousands lines of code.
> The algorithm you see in this verifier is straight forward and
> tiny. I guess when time passes by when may get enough
> courage to attempt something like this, but
> today 'kiss' principle rules.

I'll try it in Python.  I bet I can get it to be shorter than the current code.

>
>>> complexity is actually described in the doc.
>>> There are several limits. Verifier will be aborted if it walks
>>> more then 32k instructions or more then 1k branches.
>>> So the very worst case takes micro seconds to reject
>>> the program. So I don't see your concern.
>>
>> That this will randomly fail, then.  For all I know, there are
>> existing valid BPF programs with vastly more than 32k "instructions"
>> as counted by the verifier.
>
> you need to double check your data :)
> classic bpf limit is 4k instructions per program.
> We're keeping the same limit for eBPF.
> 32k limit says that verifier will visit each instruction
> no more than 8 times.
> if we have a program full of branches, then yes, 32k limit will
> be reached and that's exactly what 'state pruning' patch is
> addressing! As I already said, I dropped it out of this set
> to ease review and to keep patch set size minimal.
> You can see it my tree:
> https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/commit/?h=v14&id=1d9529ae4ce24bc31ca245a156299aa9e59a29f0
> I was planning to send it next.
> It's small incremental patch on top of existing things.

Yes, but does it work reliably?

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 21:48           ` Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2014-09-26 21:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 2:46 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Fri, Sep 26, 2014 at 1:42 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Sep 26, 2014 at 1:09 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>> On Fri, Sep 26, 2014 at 12:51 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> On Fri, Sep 26, 2014 at 12:34 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>
>>>> To add one more point:
>>>>
>>>> With the current verifier design, it's impossible to write a userspace
>>>> tool that can take an eBPF program and check it.  The verification is
>>>> far too context-dependent for that to be possible.  I won't go so far
>>>> as to say that a userspace tool needs to *exist*, but I strongly
>>>> object to exposing a verification algorithm that *precludes* writing
>>>> such a tool.
>>>
>>> that's just not true.
>>> why is it not possible?
>>
>> Because the types of referenced objects aren't encoded in the blob
>> that a user program loads, unless I'm missing something.
>
> patch #8 'handle pseudo BPF_LD_IMM64 insn' of this set
> handles first type == map. Other types will be added in the future.
> The same verification can be done in user space.
> It's pretty much copy paste for everything from the kernel.
> I don't understand yet why you really must do it in in userspace
> in addition to doing it in kernel. It's definitely doable.
> Instead of asking kernel to create a map, user space
> can just remember map attributes (key_size, value_size)
> and continues verification in userspace.
>
>> But the eBPF binary doesn't encode this information. In fact, the
>> caller of an ebpf syscall may not even have access to this
>> information.
>
> I don't follow. What info are you talking about?
> are you saying that program only that references maps via fds
> is not verifiable unless one knows what this fds refer to?
> yeah, but we're talking user space verification here.
> user knows what maps it creates with what attributes.
> Also we can add a command to this syscall to fetch map
> attributes. That would be trivial _incremental_ addition, right?

That would also work, I suppose.

>
>> I think this is addressable as a smallish change on top of your code.
>> Rather than looking up a map when you need to learn its key and value
>> size, I think that all you need to do is to look in a program section
>> for the key and value size (and the fact that it's a map) and confirm
>> that the referenced map *matches* the stored values.
>
> we can add extra info to the program that will encode
> program assumptions about maps. Sure. Though I think
> it's extra info that kernel doesn't really need, since it can
> only check that program assumptions match to what
> kernel already knows. Kernel cannot rely on them.
> So I'm not sure what this extra check really buys.
>
> Anyway, if you think it's a smallish change, we can do it
> incrementally on top of existing stuff, right?
> Why this arguing then?
> Sounds like you want to help with the development?
> This is great! I'm all for it :)



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 21:48           ` Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2014-09-26 21:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 26, 2014 at 2:46 PM, Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> wrote:
> On Fri, Sep 26, 2014 at 1:42 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>> On Fri, Sep 26, 2014 at 1:09 PM, Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> wrote:
>>> On Fri, Sep 26, 2014 at 12:51 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>>>> On Fri, Sep 26, 2014 at 12:34 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>>>>
>>>> To add one more point:
>>>>
>>>> With the current verifier design, it's impossible to write a userspace
>>>> tool that can take an eBPF program and check it.  The verification is
>>>> far too context-dependent for that to be possible.  I won't go so far
>>>> as to say that a userspace tool needs to *exist*, but I strongly
>>>> object to exposing a verification algorithm that *precludes* writing
>>>> such a tool.
>>>
>>> that's just not true.
>>> why is it not possible?
>>
>> Because the types of referenced objects aren't encoded in the blob
>> that a user program loads, unless I'm missing something.
>
> patch #8 'handle pseudo BPF_LD_IMM64 insn' of this set
> handles first type == map. Other types will be added in the future.
> The same verification can be done in user space.
> It's pretty much copy paste for everything from the kernel.
> I don't understand yet why you really must do it in in userspace
> in addition to doing it in kernel. It's definitely doable.
> Instead of asking kernel to create a map, user space
> can just remember map attributes (key_size, value_size)
> and continues verification in userspace.
>
>> But the eBPF binary doesn't encode this information. In fact, the
>> caller of an ebpf syscall may not even have access to this
>> information.
>
> I don't follow. What info are you talking about?
> are you saying that program only that references maps via fds
> is not verifiable unless one knows what this fds refer to?
> yeah, but we're talking user space verification here.
> user knows what maps it creates with what attributes.
> Also we can add a command to this syscall to fetch map
> attributes. That would be trivial _incremental_ addition, right?

That would also work, I suppose.

>
>> I think this is addressable as a smallish change on top of your code.
>> Rather than looking up a map when you need to learn its key and value
>> size, I think that all you need to do is to look in a program section
>> for the key and value size (and the fact that it's a map) and confirm
>> that the referenced map *matches* the stored values.
>
> we can add extra info to the program that will encode
> program assumptions about maps. Sure. Though I think
> it's extra info that kernel doesn't really need, since it can
> only check that program assumptions match to what
> kernel already knows. Kernel cannot rely on them.
> So I'm not sure what this extra check really buys.
>
> Anyway, if you think it's a smallish change, we can do it
> incrementally on top of existing stuff, right?
> Why this arguing then?
> Sounds like you want to help with the development?
> This is great! I'm all for it :)



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 22:03           ` Alexei Starovoitov
  0 siblings, 0 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2014-09-26 22:03 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 2:47 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> Can't you just disallow the 1-byte write to the stack?

of course not.
That would be extremely limiting to users.
Can you actually see yourself living with stack that only
allows 8-byte writes/reads?
The stack usage will increase a lot, since all char/short
stack variables will become 8-byte...

> I'll try it in Python.  I bet I can get it to be shorter than the current code.

Awesome challenge! :)
I'll buy you a drink of your choice if you can achieve that.
Also I'll send you our C programs that we use for
testing to make sure, your python verifier can analyze them.
If it passes, I'll be glad to rip mine out. Seriously.
Deal?

> Yes, but does it work reliably?

I'm not saying current verifier is perfect. It can get better.
So far it was enough to let people code freely on top of it.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 22:03           ` Alexei Starovoitov
  0 siblings, 0 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2014-09-26 22:03 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 26, 2014 at 2:47 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>
> Can't you just disallow the 1-byte write to the stack?

of course not.
That would be extremely limiting to users.
Can you actually see yourself living with stack that only
allows 8-byte writes/reads?
The stack usage will increase a lot, since all char/short
stack variables will become 8-byte...

> I'll try it in Python.  I bet I can get it to be shorter than the current code.

Awesome challenge! :)
I'll buy you a drink of your choice if you can achieve that.
Also I'll send you our C programs that we use for
testing to make sure, your python verifier can analyze them.
If it passes, I'll be glad to rip mine out. Seriously.
Deal?

> Yes, but does it work reliably?

I'm not saying current verifier is perfect. It can get better.
So far it was enough to let people code freely on top of it.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 22:07             ` Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2014-09-26 22:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 3:03 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Fri, Sep 26, 2014 at 2:47 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> Can't you just disallow the 1-byte write to the stack?
>
> of course not.
> That would be extremely limiting to users.
> Can you actually see yourself living with stack that only
> allows 8-byte writes/reads?
> The stack usage will increase a lot, since all char/short
> stack variables will become 8-byte...

How about requiring that sub-8-byte stack accesses only be to integer slots?

>
>> I'll try it in Python.  I bet I can get it to be shorter than the current code.
>
> Awesome challenge! :)
> I'll buy you a drink of your choice if you can achieve that.
> Also I'll send you our C programs that we use for
> testing to make sure, your python verifier can analyze them.
> If it passes, I'll be glad to rip mine out. Seriously.
> Deal?

Sure.

>
>> Yes, but does it work reliably?
>
> I'm not saying current verifier is perfect. It can get better.
> So far it was enough to let people code freely on top of it.



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 22:07             ` Andy Lutomirski
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2014-09-26 22:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 26, 2014 at 3:03 PM, Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> wrote:
> On Fri, Sep 26, 2014 at 2:47 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>>
>> Can't you just disallow the 1-byte write to the stack?
>
> of course not.
> That would be extremely limiting to users.
> Can you actually see yourself living with stack that only
> allows 8-byte writes/reads?
> The stack usage will increase a lot, since all char/short
> stack variables will become 8-byte...

How about requiring that sub-8-byte stack accesses only be to integer slots?

>
>> I'll try it in Python.  I bet I can get it to be shorter than the current code.
>
> Awesome challenge! :)
> I'll buy you a drink of your choice if you can achieve that.
> Also I'll send you our C programs that we use for
> testing to make sure, your python verifier can analyze them.
> If it passes, I'll be glad to rip mine out. Seriously.
> Deal?

Sure.

>
>> Yes, but does it work reliably?
>
> I'm not saying current verifier is perfect. It can get better.
> So far it was enough to let people code freely on top of it.



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
  2014-09-26 22:07             ` Andy Lutomirski
  (?)
@ 2014-09-26 22:26             ` Alexei Starovoitov
  2014-09-26 22:41               ` Andy Lutomirski
  -1 siblings, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2014-09-26 22:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 3:07 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Sep 26, 2014 at 3:03 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> On Fri, Sep 26, 2014 at 2:47 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>
>>> Can't you just disallow the 1-byte write to the stack?
>>
>> of course not.
>> That would be extremely limiting to users.
>> Can you actually see yourself living with stack that only
>> allows 8-byte writes/reads?
>> The stack usage will increase a lot, since all char/short
>> stack variables will become 8-byte...
>
> How about requiring that sub-8-byte stack accesses only be to integer slots?

you mean to reject the sub-8-byte write early if it's going
into space where pointers were stored?
That will limit stack reuse.
gcc/llvm generate code where the same stack location
is used by different variables during life of the function.
So if I reject the write early, it will break otherwise valid
programs.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
  2014-09-26 22:26             ` Alexei Starovoitov
@ 2014-09-26 22:41               ` Andy Lutomirski
  2014-09-26 23:13                   ` Alexei Starovoitov
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-09-26 22:41 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 3:26 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Fri, Sep 26, 2014 at 3:07 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Sep 26, 2014 at 3:03 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>> On Fri, Sep 26, 2014 at 2:47 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>
>>>> Can't you just disallow the 1-byte write to the stack?
>>>
>>> of course not.
>>> That would be extremely limiting to users.
>>> Can you actually see yourself living with stack that only
>>> allows 8-byte writes/reads?
>>> The stack usage will increase a lot, since all char/short
>>> stack variables will become 8-byte...
>>
>> How about requiring that sub-8-byte stack accesses only be to integer slots?
>
> you mean to reject the sub-8-byte write early if it's going
> into space where pointers were stored?
> That will limit stack reuse.
> gcc/llvm generate code where the same stack location
> is used by different variables during life of the function.
> So if I reject the write early, it will break otherwise valid
> programs.

I think that a sub-8-byte write to an integer slot should leave it as
an integer and a sub-8-byte write to a non-integer slot should turn
that slot into an integer (if conversions to integer are permitted) or
be rejected otherwise.  gcc/llvm could emit an 8-byte write first, as
needed, to make this valid.

Alternatively, an integer stack slot could have a bitmask indicating
which bytes are valid.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 23:13                   ` Alexei Starovoitov
  0 siblings, 0 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2014-09-26 23:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development, linux-kernel

On Fri, Sep 26, 2014 at 3:41 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Sep 26, 2014 at 3:26 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> On Fri, Sep 26, 2014 at 3:07 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Fri, Sep 26, 2014 at 3:03 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>>> On Fri, Sep 26, 2014 at 2:47 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>
>>>>> Can't you just disallow the 1-byte write to the stack?
>>>>
>>>> of course not.
>>>> That would be extremely limiting to users.
>>>> Can you actually see yourself living with stack that only
>>>> allows 8-byte writes/reads?
>>>> The stack usage will increase a lot, since all char/short
>>>> stack variables will become 8-byte...
>>>
>>> How about requiring that sub-8-byte stack accesses only be to integer slots?
>>
>> you mean to reject the sub-8-byte write early if it's going
>> into space where pointers were stored?
>> That will limit stack reuse.
>> gcc/llvm generate code where the same stack location
>> is used by different variables during life of the function.
>> So if I reject the write early, it will break otherwise valid
>> programs.
>
> I think that a sub-8-byte write to an integer slot should leave it as
> an integer and a sub-8-byte write to a non-integer slot should turn
> that slot into an integer (if conversions to integer are permitted) or
> be rejected otherwise.  gcc/llvm could emit an 8-byte write first, as
> needed, to make this valid.

I don't think the above will work.

> Alternatively, an integer stack slot could have a bitmask indicating
> which bytes are valid.

but this one might. Let me think about it more.
Note verifier, as it stands, is using 12Kbyte of temporary
memory to track stack with byte granularity.
(it's freed as soon as verifier finishes)
This bitmask optimization, best case, will reduce it to 1.5Kbyte
at the cost of extra complexity. I'll play with this idea to
see whether this trade off is actually worth doing.
Also note that there are indirect stack reads
(see check_func_arg() -> check_stack_boundary())
used to verify that N bytes were initialized in stack
before calling into helper function.
Also in the future I was planning to allow
indirect stack write, so that helper function can
store stuff into program stack. This indirect accesses
are crossing 8-byte boundaries, so would need special
care with bitmask optimization.
We need to carefully weight all pros and cons.
imo this memory usage during verification is not a big deal,
but of course we should not waste it. I'll see what can be done.

In any case, we're talking about incremental improvements
on top of current stuff, right?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite)
@ 2014-09-26 23:13                   ` Alexei Starovoitov
  0 siblings, 0 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2014-09-26 23:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Miller, Ingo Molnar, Linus Torvalds, Daniel Borkmann,
	Hannes Frederic Sowa, Chema Gonzalez, Eric Dumazet,
	Peter Zijlstra, Pablo Neira Ayuso, H. Peter Anvin, Andrew Morton,
	Kees Cook, Linux API, Network Development,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 26, 2014 at 3:41 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> On Fri, Sep 26, 2014 at 3:26 PM, Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> wrote:
>> On Fri, Sep 26, 2014 at 3:07 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>>> On Fri, Sep 26, 2014 at 3:03 PM, Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/w@public.gmane.org> wrote:
>>>> On Fri, Sep 26, 2014 at 2:47 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>>>>>
>>>>> Can't you just disallow the 1-byte write to the stack?
>>>>
>>>> of course not.
>>>> That would be extremely limiting to users.
>>>> Can you actually see yourself living with stack that only
>>>> allows 8-byte writes/reads?
>>>> The stack usage will increase a lot, since all char/short
>>>> stack variables will become 8-byte...
>>>
>>> How about requiring that sub-8-byte stack accesses only be to integer slots?
>>
>> you mean to reject the sub-8-byte write early if it's going
>> into space where pointers were stored?
>> That will limit stack reuse.
>> gcc/llvm generate code where the same stack location
>> is used by different variables during life of the function.
>> So if I reject the write early, it will break otherwise valid
>> programs.
>
> I think that a sub-8-byte write to an integer slot should leave it as
> an integer and a sub-8-byte write to a non-integer slot should turn
> that slot into an integer (if conversions to integer are permitted) or
> be rejected otherwise.  gcc/llvm could emit an 8-byte write first, as
> needed, to make this valid.

I don't think the above will work.

> Alternatively, an integer stack slot could have a bitmask indicating
> which bytes are valid.

but this one might. Let me think about it more.
Note verifier, as it stands, is using 12Kbyte of temporary
memory to track stack with byte granularity.
(it's freed as soon as verifier finishes)
This bitmask optimization, best case, will reduce it to 1.5Kbyte
at the cost of extra complexity. I'll play with this idea to
see whether this trade off is actually worth doing.
Also note that there are indirect stack reads
(see check_func_arg() -> check_stack_boundary())
used to verify that N bytes were initialized in stack
before calling into helper function.
Also in the future I was planning to allow
indirect stack write, so that helper function can
store stuff into program stack. This indirect accesses
are crossing 8-byte boundaries, so would need special
care with bitmask optimization.
We need to carefully weight all pros and cons.
imo this memory usage during verification is not a big deal,
but of course we should not waste it. I'll see what can be done.

In any case, we're talking about incremental improvements
on top of current stuff, right?

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2014-09-26 23:13 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-26 19:34 eBPF verifier thoughts (Re: [PATCH v15 net-next 00/11] eBPF syscall, verifier, testsuite) Andy Lutomirski
2014-09-26 19:51 ` Andy Lutomirski
2014-09-26 19:51   ` Andy Lutomirski
2014-09-26 20:09   ` Alexei Starovoitov
2014-09-26 20:09     ` Alexei Starovoitov
2014-09-26 20:42     ` Andy Lutomirski
2014-09-26 20:42       ` Andy Lutomirski
2014-09-26 21:46       ` Alexei Starovoitov
2014-09-26 21:48         ` Andy Lutomirski
2014-09-26 21:48           ` Andy Lutomirski
2014-09-26 20:00 ` Alexei Starovoitov
2014-09-26 20:39   ` Andy Lutomirski
2014-09-26 20:39     ` Andy Lutomirski
2014-09-26 21:25     ` Alexei Starovoitov
2014-09-26 21:25       ` Alexei Starovoitov
2014-09-26 21:47       ` Andy Lutomirski
2014-09-26 21:47         ` Andy Lutomirski
2014-09-26 22:03         ` Alexei Starovoitov
2014-09-26 22:03           ` Alexei Starovoitov
2014-09-26 22:07           ` Andy Lutomirski
2014-09-26 22:07             ` Andy Lutomirski
2014-09-26 22:26             ` Alexei Starovoitov
2014-09-26 22:41               ` Andy Lutomirski
2014-09-26 23:13                 ` Alexei Starovoitov
2014-09-26 23:13                   ` Alexei Starovoitov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.