* [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
@ 2017-05-15 16:04 David Miller
  2017-05-16 12:37 ` Edward Cree
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2017-05-15 16:04 UTC (permalink / raw)
  To: daniel; +Cc: ast, alexei.starovoitov, netdev


If we use 1<<31, then sequences like:

	R1 = 0
	R1 <<= 2

do silly things.  Examples of this actually exist in
the MAP tests of test_verifier.c

Update test_align.c expectation strings.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 kernel/bpf/verifier.c                    |  2 +-
 tools/testing/selftests/bpf/test_align.c | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 39f2dcb..0f33db0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1715,7 +1715,7 @@ static void check_reg_overflow(struct bpf_reg_state *reg)
 static u32 calc_align(u32 imm)
 {
 	if (!imm)
-		return 1U << 31;
+		return 1U << 16;
 	return imm - ((imm - 1) & imm);
 }
 
diff --git a/tools/testing/selftests/bpf/test_align.c b/tools/testing/selftests/bpf/test_align.c
index 9644d4e..afaa297 100644
--- a/tools/testing/selftests/bpf/test_align.c
+++ b/tools/testing/selftests/bpf/test_align.c
@@ -235,12 +235,12 @@ static struct bpf_align_test tests[] = {
 		},
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 		.matches = {
-			"4: R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R5=pkt(id=0,off=0,r=0) R10=fp",
-			"5: R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R5=pkt(id=0,off=14,r=0) R10=fp",
-			"6: R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R4=pkt(id=0,off=14,r=0) R5=pkt(id=0,off=14,r=0) R10=fp",
-			"10: R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv56 R5=pkt(id=0,off=14,r=18) R10=fp",
-			"14: R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv48 R5=pkt(id=0,off=14,r=18) R10=fp",
-			"15: R0=imm0,min_value=0,max_value=0,min_align=2147483648 R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv48 R5=pkt(id=0,off=14,r=18) R10=fp",
+			"4: R0=imm0,min_value=0,max_value=0,min_align=65536 R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R5=pkt(id=0,off=0,r=0) R10=fp",
+			"5: R0=imm0,min_value=0,max_value=0,min_align=65536 R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R5=pkt(id=0,off=14,r=0) R10=fp",
+			"6: R0=imm0,min_value=0,max_value=0,min_align=65536 R1=ctx R2=pkt(id=0,off=0,r=0) R3=pkt_end R4=pkt(id=0,off=14,r=0) R5=pkt(id=0,off=14,r=0) R10=fp",
+			"10: R0=imm0,min_value=0,max_value=0,min_align=65536 R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv56 R5=pkt(id=0,off=14,r=18) R10=fp",
+			"14: R0=imm0,min_value=0,max_value=0,min_align=65536 R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv48 R5=pkt(id=0,off=14,r=18) R10=fp",
+			"15: R0=imm0,min_value=0,max_value=0,min_align=65536 R1=ctx R2=pkt(id=0,off=0,r=18) R3=pkt_end R4=inv48 R5=pkt(id=0,off=14,r=18) R10=fp",
 		},
 	},
 	{
-- 
2.1.2.532.g19b5d50

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-15 16:04 [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier David Miller
@ 2017-05-16 12:37 ` Edward Cree
  2017-05-16 19:52   ` David Miller
  2017-05-16 22:53   ` Alexei Starovoitov
  0 siblings, 2 replies; 36+ messages in thread
From: Edward Cree @ 2017-05-16 12:37 UTC (permalink / raw)
  To: David Miller, daniel; +Cc: ast, alexei.starovoitov, netdev

On 15/05/17 17:04, David Miller wrote:
> If we use 1<<31, then sequences like:
>
> 	R1 = 0
> 	R1 <<= 2
>
> do silly things.
Hmm.  It might be a bit late for this, but I wonder if, instead of handling
 alignments as (1 << align), you could store them as -(1 << align), i.e.
 leading 1s followed by 'align' 0s.
Now the alignment of 0 is 0 (really 1 << 32), which doesn't change when
 left-shifted some more.  Shifts of other numbers' alignments also do the
 right thing, e.g. align(6) << 2 = (-2) << 2 = -8 = align(6 << 2).  Of
 course you do all this in unsigned, to make sure right shifts work.
This also makes other arithmetic simple to track; for instance, align(a + b)
 is at worst align(a) | align(b).  (Of course, this bound isn't tight.)
A number is 2^(n+1)-aligned if the 2^n bit of its alignment is cleared.
Considered as unsigned numbers, smaller values are stricter alignments.
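A minimal C sketch of that representation, purely for illustration (the
 helper name align_of() is mine, this is not verifier code):

#include <stdio.h>

typedef unsigned int u32;

/* Store align(x) as -(1 << n), i.e. leading 1s followed by n zeros, where
 * 2^n is the largest power of two dividing x.  align(0) is 0 (really
 * 1 << 32) and stays 0 when shifted further left.
 */
static u32 align_of(u32 x)
{
	return x ? -(x & -x) : 0;
}

int main(void)
{
	printf("%#x\n", align_of(0) << 2);             /* still 0            */
	printf("%#x %#x\n", align_of(6) << 2,
	       align_of(6 << 2));                      /* both 0xfffffff8    */
	printf("%#x %#x\n", align_of(4) | align_of(6),
	       align_of(4 + 6));                       /* worst-case bound   */
	return 0;
}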

-Ed

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-16 12:37 ` Edward Cree
@ 2017-05-16 19:52   ` David Miller
  2017-05-16 22:53   ` Alexei Starovoitov
  1 sibling, 0 replies; 36+ messages in thread
From: David Miller @ 2017-05-16 19:52 UTC (permalink / raw)
  To: ecree; +Cc: daniel, ast, alexei.starovoitov, netdev

From: Edward Cree <ecree@solarflare.com>
Date: Tue, 16 May 2017 13:37:42 +0100

> On 15/05/17 17:04, David Miller wrote:
>> If we use 1<<31, then sequences like:
>>
>> 	R1 = 0
>> 	R1 <<= 2
>>
>> do silly things.
> Hmm.  It might be a bit late for this, but I wonder if, instead of handling
>  alignments as (1 << align), you could store them as -(1 << align), i.e.
>  leading 1s followed by 'align' 0s.
> Now the alignment of 0 is 0 (really 1 << 32), which doesn't change when
>  left-shifted some more.  Shifts of other numbers' alignments also do the
>  right thing, e.g. align(6) << 2 = (-2) << 2 = -8 = align(6 << 2).  Of
>  course you do all this in unsigned, to make sure right shifts work.
> This also makes other arithmetic simple to track; for instance, align(a + b)
>  is at worst align(a) | align(b).  (Of course, this bound isn't tight.)
> A number is 2^(n+1)-aligned if the 2^n bit of its alignment is cleared.
> Considered as unsigned numbers, smaller values are stricter alignments.

Thanks for the bit twiddling suggestion, I'll take a look!

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-16 12:37 ` Edward Cree
  2017-05-16 19:52   ` David Miller
@ 2017-05-16 22:53   ` Alexei Starovoitov
  2017-05-17 14:00     ` Edward Cree
  1 sibling, 1 reply; 36+ messages in thread
From: Alexei Starovoitov @ 2017-05-16 22:53 UTC (permalink / raw)
  To: Edward Cree, David Miller, daniel; +Cc: alexei.starovoitov, netdev

On 5/16/17 5:37 AM, Edward Cree wrote:
> On 15/05/17 17:04, David Miller wrote:
>> If we use 1<<31, then sequences like:
>>
>> 	R1 = 0
>> 	R1 <<= 2
>>
>> do silly things.
> Hmm.  It might be a bit late for this, but I wonder if, instead of handling
>  alignments as (1 << align), you could store them as -(1 << align), i.e.
>  leading 1s followed by 'align' 0s.
> Now the alignment of 0 is 0 (really 1 << 32), which doesn't change when
>  left-shifted some more.  Shifts of other numbers' alignments also do the
>  right thing, e.g. align(6) << 2 = (-2) << 2 = -8 = align(6 << 2).  Of
>  course you do all this in unsigned, to make sure right shifts work.
> This also makes other arithmetic simple to track; for instance, align(a + b)
>  is at worst align(a) | align(b).  (Of course, this bound isn't tight.)
> A number is 2^(n+1)-aligned if the 2^n bit of its alignment is cleared.
> Considered as unsigned numbers, smaller values are stricter alignments.

following this line of thinking it feels that it should be possible
to get rid of 'aux_off' and 'aux_off_align' and simplify the code.
I mean we can always do
dst_reg->min_align = min(dst_reg->min_align, src_reg->min_align);

and don't use 'off' as part of alignment checks at all.
So this bit:
if ((ip_align + reg_off + off) % size != 0) {
can be removed
and replaced with
a = alignof(ip_align)
a = min(a, reg->align)
if (a % size != 0)
and do this check always and not only after if (reg->id)

In check_packet_ptr_add():
- if (had_id)
-  dst_reg->aux_off_align = min(dst_reg->aux_off_align,
-                               src_reg->min_align);
- else
-  dst_reg->aux_off_align = src_reg->min_align;

+ if (had_id)
+  dst_reg->min_align = min(dst_reg->min_align, src_reg->min_align);
+ else
+  dst_reg->min_align = src_reg->min_align;

in that sense packet_ptr_add() will be no different than
align logic we do in adjust_reg_min_max_vals()

Thoughts?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-16 22:53   ` Alexei Starovoitov
@ 2017-05-17 14:00     ` Edward Cree
  2017-05-17 15:33       ` Edward Cree
  2017-05-17 16:13       ` David Miller
  0 siblings, 2 replies; 36+ messages in thread
From: Edward Cree @ 2017-05-17 14:00 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller, daniel; +Cc: alexei.starovoitov, netdev

On 16/05/17 23:53, Alexei Starovoitov wrote:
> following this line of thinking it feels that it should be possible
> to get rid of 'aux_off' and 'aux_off_align' and simplify the code.
> I mean we can always do
> dst_reg->min_align = min(dst_reg->min_align, src_reg->min_align);
>
> and don't use 'off' as part of alignment checks at all.
Problem with this approach, of course, is that (say) NET_IP_ALIGN +
 sizeof(ethhdr) = 16 is muchly aligned, whereas if you turn all
 constants into alignments you think you're only 2-byte aligned.
I think you have to track exact offsets when you can, and only turn
 into an alignment when you introduce a variable.
Of course it can still be fooled by e.g. 2 + (x << 2) + 14, which it
 will think is only 2-aligned when really it's 4-aligned, but unless
 you want to start tracking 'bits known to be 1' as well as 'bits
 known to be 0', I think you just accept that alignment tracking
 isn't commutative.  The obvious cases (ihl << 2 and so) will work
 when written the obvious way, unless the compiler does something
 perverse.
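To make that concrete, here is a toy sketch of pure power-of-two tracking
 (the helpers are hypothetical, only to show why 2 + (x << 2) + 14 comes
 out as 2-aligned):

#include <stdio.h>

/* alignment guaranteed by a known constant = its lowest set bit */
static unsigned int const_align(unsigned int imm)
{
	return imm ? imm - ((imm - 1) & imm) : 1u << 16;
}

/* once a variable is involved, a sum's alignment is at best the
 * smaller of the two operands' alignments
 */
static unsigned int add_align(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

int main(void)
{
	unsigned int var_shl2 = 1u << 2;   /* unknown x shifted left by 2 */
	unsigned int r;

	r = add_align(const_align(2), var_shl2);   /* 2 + (x << 2) -> 2 */
	r = add_align(r, const_align(14));         /* ... + 14     -> 2 */
	printf("tracked align = %u, real align = 4\n", r);
	return 0;
}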
OTOH the 'track known 1s as well' might work in a nice generic way
 and cover all bases, I'll have to experiment a bit with that.

-Ed

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-17 14:00     ` Edward Cree
@ 2017-05-17 15:33       ` Edward Cree
  2017-05-18  0:16         ` David Miller
  2017-05-18  2:48         ` Alexei Starovoitov
  2017-05-17 16:13       ` David Miller
  1 sibling, 2 replies; 36+ messages in thread
From: Edward Cree @ 2017-05-17 15:33 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller, daniel; +Cc: alexei.starovoitov, netdev

On 17/05/17 15:00, Edward Cree wrote:
> OTOH the 'track known 1s as well' might work in a nice generic way
>  and cover all bases, I'll have to experiment a bit with that.
> 
> -Ed

So I did some experiments (in Python, script follows) and found that
 indeed this does appear to work, at least for addition and shifts.
The idea is that we have a 0s mask and a 1s mask; for bits that are
 unknown, the 0s mask is set and the 1s mask is cleared.  So a
 completely unknown variable has masks (~0, 0), then if you shift it
 left 2 you get (~3, 0) - just shift both masks.  A constant x has
 masks (x, x) - all the 0s are known 0s and all the 1s are known 1s.
Addition is a bit more complicated: we compute the 'unknown bits'
 mask, by XORing the 0s and 1s masks together, of each addend.  Then
 we add the corresponding masks from each addend together, and force
 the 'unknown' bits to the appropriate values in each mask.
So given (a1, b1) and (a2, b2), we compute m1 = a1 ^ b1,
 m2 = a2 ^ b2, and m = m1 | m2.  Then a = (a1 + a2) | m, and
 b = (b1 + b2) & ~m.
As a worked example, 2 + (x << 2) + 14:
 2 => (2, 2) constant
 x => (~0, 0) unknown
 x << 2 => (~3, 0)
 2 + (x << 2): add (2, 2) with (~3, 0)
    m1 = 0, m2 = ~3, m = ~3
    a = (2 + ~3) | ~3 = ~1 | ~3 = ~1
    b = (2 + 0) & ~~3 = 2 & 3 = 2
    so (~1, 2), which means "...xx10"
now add 14: add (~1, 2) with (14, 14)
    m1 = ~3, m2 = 0, m = ~3
    a = (~1 + 14) | ~3 = 12 | ~3 = ~3
    b = (2 + 14) & ~~3 = 16 & 3 = 0
    so (~3, 0), which means "...xx00"
and the result is 4-byte aligned.

-Ed

PS. Beware of bugs in the following code; I have only tested it, not
 proved it correct.

--8<--

#!/usr/bin/python2

def cpl(x):
    return (~x)&0xff

class AlignedNumber(object):
    def __init__(self, mask0=0xff, mask1=0):
        """mask0 has 0s for bits known to be 0, 1s otherwise.
        mask1 has 1s for bits known to be 1, 0s otherwise.
        Thus a bit which is set in mask0 and cleared in mask1 is an 'unknown'
        bit, while a bit which is cleared in mask0 and set in mask1 is a bug.
        """
        self.masks = (mask0 & 0xff, mask1 & 0xff)
        self.validate()
    @classmethod
    def const(cls, value):
        """All bits are known, so both masks equal value."""
        return cls(value, value)
    def validate(self):
        # Check for bits 'known' to be both 0 and 1
        assert not (cpl(self.masks[0]) & self.masks[1]), self.masks
        # Check unknown bits don't follow known bits
        assert self.mx | ((self.mx - 1) & 0xff) == 0xff, self.masks
    def __str__(self):
        return ':'.join(map(bin, self.masks))
    def __lshift__(self, sh):
        """Shift both masks left, low bits become 'known 0'"""
        return self.__class__(self.masks[0] << sh, self.masks[1] << sh)
    def __rshift__(self, sh):
        """Shift 1s into mask0; high bits become 'unknown'.
        While strictly speaking they may be known 0 if we're tracking the full
        word and doing unsigned shifts, having known bits before unknown bits
        breaks the addition code."""
        return self.__class__(cpl(cpl(self.masks[0]) >> sh), self.masks[1] >> sh)
    @property
    def mx(self):
        """Mask of unknown bits"""
        return self.masks[0] ^ self.masks[1]
    def __add__(self, other):
        """OR the mx values together.  Unknown bits could cause carries, so we
        just assume that they can carry all the way to the left (thus we keep
        our mx masks in the form 1...10...0.
        Then, add our 0- and 1-masks, and force the bits of the combined mx
        mask to the unknown state."""
        if isinstance(other, int):
            return self + AlignedNumber.const(other)
        assert isinstance(other, AlignedNumber), other
        m = self.mx | other.mx
        return self.__class__((self.masks[0] + other.masks[0]) | m,
                              (self.masks[1] + other.masks[1]) & cpl(m))
    def is_aligned(self, bits):
        """We are 2^n-aligned iff the bottom n bits are known-0."""
        mask = (1 << bits) - 1
        return not (self.masks[0] & mask)

if __name__ == '__main__':
    a = AlignedNumber.const(2)
    b = AlignedNumber() << 2
    c = AlignedNumber.const(14)
    print a, b, c
    print a + b, a + b + c
    assert (a + b + c).is_aligned(2)
    d = (AlignedNumber() << 4) >> 2
    print d
    assert d.is_aligned(2)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-17 14:00     ` Edward Cree
  2017-05-17 15:33       ` Edward Cree
@ 2017-05-17 16:13       ` David Miller
  2017-05-17 17:00         ` Edward Cree
  1 sibling, 1 reply; 36+ messages in thread
From: David Miller @ 2017-05-17 16:13 UTC (permalink / raw)
  To: ecree; +Cc: ast, daniel, alexei.starovoitov, netdev

From: Edward Cree <ecree@solarflare.com>
Date: Wed, 17 May 2017 15:00:04 +0100

> On 16/05/17 23:53, Alexei Starovoitov wrote:
>> following this line of thinking it feels that it should be possible
>> to get rid of 'aux_off' and 'aux_off_align' and simplify the code.
>> I mean we can always do
>> dst_reg->min_align = min(dst_reg->min_align, src_reg->min_align);
>>
>> and don't use 'off' as part of alignment checks at all.
> Problem with this approach, of course, is that (say) NET_IP_ALIGN +
>  sizeof(ethhdr) = 16 is muchly aligned, whereas if you turn all
>  constants into alignments you think you're only 2-byte aligned.
> I think you have to track exact offsets when you can, and only turn
>  into an alignment when you introduce a variable.
> Of course it can still be fooled by e.g. 2 + (x << 2) + 14, which it
>  will think is only 2-aligned when really it's 4-aligned, but unless
>  you want to start tracking 'bits known to be 1' as well as 'bits
>  known to be 0', I think you just accept that alignment tracking
>  isn't commutative.  The obvious cases (ihl << 2 and so) will work
>  when written the obvious way, unless the compiler does something
>  perverse.
> OTOH the 'track known 1s as well' might work in a nice generic way
>  and cover all bases, I'll have to experiment a bit with that.

Both cases are common in real BPF programs.  The offsets really are
necessary.  It's funny because initially I tried to implement this
without the auxiliary offset and it simply doesn't work. :-)

We always have to track when you've seen the offset that cancels out
the NET_IP_ALIGN.  And as stated it can occur both before and after
variable offsets have been introduced.

You have to catch both:

	ptr += variable;
	ptr += 14;

and:

	ptr += 14;
	ptr += variable; /* align = 4 */

And always see at the end that "NET_IP_ALIGN + offsets" will
be properly 4 byte aligned.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-17 16:13       ` David Miller
@ 2017-05-17 17:00         ` Edward Cree
  2017-05-17 17:25           ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Edward Cree @ 2017-05-17 17:00 UTC (permalink / raw)
  To: David Miller; +Cc: ast, daniel, alexei.starovoitov, netdev

On 17/05/17 17:13, David Miller wrote:
> Both cases are common in real BPF programs.  The offsets really are
> necessary.  It's funny because initially I tried to implement this
> without the auxiliary offset and it simply doesn't work. :-)
> 
> We always have to track when you've seen the offset that cancels out
> the NET_IP_ALIGN.  And as stated it can occur both before and after
> variable offsets have been introduced.
> 
> You have to catch both:
> 
> 	ptr += variable;
> 	ptr += 14;
> 
> and:
> 
> 	ptr += 14;
> 	ptr += variable; /* align = 4 */
> 
> And always see at the end that "NET_IP_ALIGN + offsets" will
> be properly 4 byte aligned.

Did you see the algorithms I posted with two masks?  Effectively they
 are tracking the 'quotient' and 'remainder', but in a way that might
 be easier to manipulate (bit-twiddly etc.) than the aux offset.  I
 think they handle both the above cases, in a nicely general way.

-Ed

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-17 17:00         ` Edward Cree
@ 2017-05-17 17:25           ` David Miller
  0 siblings, 0 replies; 36+ messages in thread
From: David Miller @ 2017-05-17 17:25 UTC (permalink / raw)
  To: ecree; +Cc: ast, daniel, alexei.starovoitov, netdev

From: Edward Cree <ecree@solarflare.com>
Date: Wed, 17 May 2017 18:00:03 +0100

> Did you see the algorithms I posted with two masks?

I'll take a look.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-17 15:33       ` Edward Cree
@ 2017-05-18  0:16         ` David Miller
  2017-05-18  1:18           ` Alexei Starovoitov
  2017-05-18 14:10           ` Edward Cree
  2017-05-18  2:48         ` Alexei Starovoitov
  1 sibling, 2 replies; 36+ messages in thread
From: David Miller @ 2017-05-18  0:16 UTC (permalink / raw)
  To: ecree; +Cc: ast, daniel, alexei.starovoitov, netdev

From: Edward Cree <ecree@solarflare.com>
Date: Wed, 17 May 2017 16:33:27 +0100

> So I did some experiments (in Python, script follows) and found that
>  indeed this does appear to work, at least for addition and shifts.
> The idea is that we have a 0s mask and a 1s mask; for bits that are
>  unknown, the 0s mask is set and the 1s mask is cleared.  So a
>  completely unknown variable has masks (~0, 0), then if you shift it
>  left 2 you get (~3, 0) - just shift both masks.  A constant x has
>  masks (x, x) - all the 0s are known 0s and all the 1s are known 1s.
> Addition is a bit more complicated: we compute the 'unknown bits'
>  mask, by XORing the 0s and 1s masks together, of each addend.  Then
>  we add the corresponding masks from each addend together, and force
>  the 'unknown' bits to the appropriate values in each mask.
> So given (a1, b1) and (a2, b2), we compute m1 = a1 ^ b1,
>  m2 = a2 ^ b2, and m = m1 | m2.  Then a = (a1 + a2) | m, and
>  b = (b1 + b2) & ~m.
> As a worked example, 2 + (x << 2) + 14:
>  2 => (2, 2) constant
>  x => (~0, 0) unknown
>  x << 2 => (~3, 0)
>  2 + (x << 2): add (2, 2) with (~3, 0)
>     m1 = 0, m2 = ~3, m = ~3
>     a = (2 + ~3) | ~3 = ~1 | ~3 = ~1
>     b = (2 + 0) & ~~3 = 2 & 3 = 2
>     so (~1, 2), which means "...xx10"
> now add 14: add (~1, 2) with (14, 14)
>     m1 = ~3, m2 = 0, m = ~3
>     a = (~1 + 14) | ~3 = 12 | ~3 = ~3
>     b = (2 + 14) & ~~3 = 16 & 3 = 0
>     so (~3, 0), which means "...xx00"
> and the result is 4-byte aligned.

Ok, I'm starting to digest this.  If it works it's a nice universal
way to handle alignment tracking, that's for sure.

So, in C, addition (a += b) is something like:

struct bpf_reg_bits {
        u64 zero_bits;
        u64 one_bits;
};

static void add_update_bits(struct bpf_reg_bits *a, struct bpf_reg_bits *b)
{
        u64 m_zeros, m_ones, m_all;

        m_zeros = a->zero_bits ^ b->zero_bits;
        m_ones = a->one_bits ^ b->one_bits;
        m_all = m_zeros | m_ones;

        a->zero_bits = (a->zero_bits + b->zero_bits) | m_all;
        a->one_bits = (a->one_bits + b->one_bits) & ~m_all;
}

Then, is subtraction merely:

static void sub_update_bits(struct bpf_reg_bits *a, struct bpf_reg_bits *b)
{
        u64 m_zeros, m_ones, m_all;

        m_zeros = a->zero_bits ^ b->zero_bits;
        m_ones = a->one_bits ^ b->one_bits;
        m_all = m_zeros | m_ones;

        a->zero_bits = (a->zero_bits - b->zero_bits) | m_all;
        a->one_bits = (a->one_bits - b->one_bits) & ~m_all;
}

Or is something different needed?

What about multiplication?  AND should be easy too.

BTW, we track something similar already in reg->imm which tracks how
many high order bits are known to be cleared in the register.  It is
used to avoid potential overflow for packet pointer accesses.  I bet
we can subsume that into this facility as well.

Thanks for all of your suggestions so far.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-18  0:16         ` David Miller
@ 2017-05-18  1:18           ` Alexei Starovoitov
  2017-05-18 14:10           ` Edward Cree
  1 sibling, 0 replies; 36+ messages in thread
From: Alexei Starovoitov @ 2017-05-18  1:18 UTC (permalink / raw)
  To: David Miller, ecree; +Cc: daniel, alexei.starovoitov, netdev

On 5/17/17 5:16 PM, David Miller wrote:
> BTW, we track something similar already in reg->imm which tracks how
> many high order bits are known to be cleared in the register.  It is
> used to avoid potential overflow for packet pointer accesses.  I bet
> we can subsume that into this facility as well.

yeah, reg->imm tracks zero upper bits in a very simplistic way.
For alignment checking it seems we need to track lower zeros
and ones. If Edward's algorithm can be adapted to track both,
that would be a double win indeed.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-17 15:33       ` Edward Cree
  2017-05-18  0:16         ` David Miller
@ 2017-05-18  2:48         ` Alexei Starovoitov
  2017-05-18 14:49           ` Edward Cree
  1 sibling, 1 reply; 36+ messages in thread
From: Alexei Starovoitov @ 2017-05-18  2:48 UTC (permalink / raw)
  To: Edward Cree, David Miller, Daniel Borkmann; +Cc: alexei.starovoitov, netdev

On 5/17/17 8:33 AM, Edward Cree wrote:
> On 17/05/17 15:00, Edward Cree wrote:
>> OTOH the 'track known 1s as well' might work in a nice generic way
>>  and cover all bases, I'll have to experiment a bit with that.
>>
>> -Ed
>
> So I did some experiments (in Python, script follows) and found that
>  indeed this does appear to work, at least for addition and shifts.
> The idea is that we have a 0s mask and a 1s mask; for bits that are
>  unknown, the 0s mask is set and the 1s mask is cleared.  So a
>  completely unknown variable has masks (~0, 0), then if you shift it
>  left 2 you get (~3, 0) - just shift both masks.  A constant x has
>  masks (x, x) - all the 0s are known 0s and all the 1s are known 1s.
> Addition is a bit more complicated: we compute the 'unknown bits'
>  mask, by XORing the 0s and 1s masks together, of each addend.  Then
>  we add the corresponding masks from each addend together, and force
>  the 'unknown' bits to the appropriate values in each mask.
> So given (a1, b1) and (a2, b2), we compute m1 = a1 ^ b1,
>  m2 = a2 ^ b2, and m = m1 | m2.  Then a = (a1 + a2) | m, and
>  b = (b1 + b2) & ~m.
> As a worked example, 2 + (x << 2) + 14:
>  2 => (2, 2) constant
>  x => (~0, 0) unknown
>  x << 2 => (~3, 0)
>  2 + (x << 2): add (2, 2) with (~3, 0)
>     m1 = 0, m2 = ~3, m = ~3
>     a = (2 + ~3) | ~3 = ~1 | ~3 = ~1
>     b = (2 + 0) & ~~3 = 2 & 3 = 2
>     so (~1, 2), which means "...xx10"
> now add 14: add (~1, 2) with (14, 14)
>     m1 = ~3, m2 = 0, m = ~3
>     a = (~1 + 14) | ~3 = 12 | ~3 = ~3
>     b = (2 + 14) & ~~3 = 16 & 3 = 0
>     so (~3, 0), which means "...xx00"
> and the result is 4-byte aligned.
>
> -Ed
>
> PS. Beware of bugs in the following code; I have only tested it, not
>  proved it correct.
>
> --8<--
>
> #!/usr/bin/python2
>
> def cpl(x):
>     return (~x)&0xff
>
> class AlignedNumber(object):
>     def __init__(self, mask0=0xff, mask1=0):
>         """mask0 has 0s for bits known to be 0, 1s otherwise.
>         mask1 has 1s for bits known to be 1, 0s otherwise.
>         Thus a bit which is set in mask0 and cleared in mask1 is an 'unknown'
>         bit, while a bit which is cleared in mask0 and set in mask1 is a bug.
>         """
>         self.masks = (mask0 & 0xff, mask1 & 0xff)
>         self.validate()
>     @classmethod
>     def const(cls, value):
>         """All bits are known, so both masks equal value."""
>         return cls(value, value)
>     def validate(self):
>         # Check for bits 'known' to be both 0 and 1
>         assert not (cpl(self.masks[0]) & self.masks[1]), self.masks
>         # Check unknown bits don't follow known bits
>         assert self.mx | ((self.mx - 1) & 0xff) == 0xff, self.masks
>     def __str__(self):
>         return ':'.join(map(bin, self.masks))
>     def __lshift__(self, sh):
>         """Shift both masks left, low bits become 'known 0'"""
>         return self.__class__(self.masks[0] << sh, self.masks[1] << sh)
>     def __rshift__(self, sh):
>         """Shift 1s into mask0; high bits become 'unknown'.
>         While strictly speaking they may be known 0 if we're tracking the full
>         word and doing unsigned shifts, having known bits before unknown bits
>         breaks the addition code."""
>         return self.__class__(cpl(cpl(self.masks[0]) >> sh), self.masks[1] >> sh)
>     @property
>     def mx(self):
>         """Mask of unknown bits"""
>         return self.masks[0] ^ self.masks[1]
>     def __add__(self, other):
>         """OR the mx values together.  Unknown bits could cause carries, so we
>         just assume that they can carry all the way to the left (thus we keep
>         our mx masks in the form 1...10...0.
>         Then, add our 0- and 1-masks, and force the bits of the combined mx
>         mask to the unknown state."""
>         if isinstance(other, int):
>             return self + AlignedNumber.const(other)
>         assert isinstance(other, AlignedNumber), other
>         m = self.mx | other.mx
>         return self.__class__((self.masks[0] + other.masks[0]) | m,
>                               (self.masks[1] + other.masks[1]) & cpl(m))
>     def is_aligned(self, bits):
>         """We are 2^n-aligned iff the bottom n bits are known-0."""
>         mask = (1 << bits) - 1
>         return not (self.masks[0] & mask)
>
> if __name__ == '__main__':
>     a = AlignedNumber.const(2)
>     b = AlignedNumber() << 2
>     c = AlignedNumber.const(14)
>     print a, b, c
>     print a + b, a + b + c
>     assert (a + b + c).is_aligned(2)

that bit was confusing to me:
is_aligned(2) checks that the number is 4-byte aligned :)

Would it be easier to represent this logic via (mask_of_unknown, value)
instead of (mask0, mask1) ?
Where mask_of_unknown has 1s for unknown bits and 0 for known bits
in the value. Then arithmetic operations will be easier to visualize.
Like:
(mask1, val1) + (mask2, val2) = (mask1 | mask2, val1 + val2)

Here is your modified script:
#!/usr/bin/python2

class AlignedNumber(object):
    def __init__(self, mask=0xff, value=0):
        # mask has 1s for unknown bits, 0s for known bits in value
        self.masks = (mask & 0xff, value & 0xff)
    @classmethod
    def const(cls, value):
        # All bits are known, hence mask = 0
        return cls(0, value)
    def __str__(self):
        return ':'.join(map(bin, self.masks))
    def __lshift__(self, sh):
        return self.__class__(self.masks[0] << sh, self.masks[1] << sh)
    def __rshift__(self, sh):
        return self.__class__(self.masks[0] >> sh, self.masks[1] >> sh)
    def __add__(self, other):
        if isinstance(other, int):
            return self + AlignedNumber.const(other)
        assert isinstance(other, AlignedNumber), other
        return self.__class__(self.masks[0] | other.masks[0],
                              self.masks[1] + other.masks[1])
    def is_aligned(self, bits):
        """We are 2^n-aligned iff the bottom n bits are known-0."""
        mask = (1 << bits) - 1
        # assume all unknown bits are 1 for alignment checking purposes
        m = self.masks[0] | self.masks[1]
        return not (m & mask)

if __name__ == '__main__':
    a = AlignedNumber.const(2)
    b = AlignedNumber() << 2
    c = AlignedNumber.const(14)
    print a, b, c
    print a + b, a + b + c
    assert (a + b + c).is_aligned(2)
    d = (AlignedNumber() << 4) >> 2
    print d
    assert d.is_aligned(2)


As far as upper bits we can tweak the algorithm to eat into
one or more bits of known bits due to carry.
Like
00xx11 + 00xx11 = 0xxx10
we will eat only one bit (second from left) and the highest bit
is known to stay zero, since carry can only compromise 2nd from left.
Such logic should work for sparse representation of unknown bits too
Like:
10xx01xx10 +
01xx01xx00 =
xxxx1xxx10
both upper two bits would be unknown, but only one middle bit becomes
unknown.
In typical case (that verifier tracks today) a bunch of upper bits will
be zero, so worst case we will be eating one bit at a time.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-18  0:16         ` David Miller
  2017-05-18  1:18           ` Alexei Starovoitov
@ 2017-05-18 14:10           ` Edward Cree
  1 sibling, 0 replies; 36+ messages in thread
From: Edward Cree @ 2017-05-18 14:10 UTC (permalink / raw)
  To: David Miller; +Cc: ast, daniel, alexei.starovoitov, netdev

On 18/05/17 01:16, David Miller wrote:
> So, in C, addition (a += b) is something like:
>
> struct bpf_reg_bits {
>         u64 zero_bits;
>         u64 one_bits;
> };
>
> static void add_update_bits(struct bpf_reg_bits *a, struct bpf_reg_bits *b)
> {
>         u64 m_zeros, m_ones, m_all;
>
>         m_zeros = a->zero_bits ^ b->zero_bits;
>         m_ones = a->one_bits ^ b->one_bits;
>         m_all = m_zeros | m_ones;
No, this should be
    u64 m_a, m_b, m_all;

    m_a = a->zero_bits ^ a->one_bits; /* unknown bits in a */
    m_b = b->zero_bits ^ b->one_bits; /* unknown bits in b */
    m_all = m_a | m_b; /* unknown bits in result */
>         a->zero_bits = (a->zero_bits + b->zero_bits) | m_all;
>         a->one_bits = (a->one_bits + b->one_bits) & ~m_all;
> }
>
> Then, is subtraction merely:
>
> static void sub_update_bits(struct bpf_reg_bits *a, struct bpf_reg_bits *b)
> {
>         u64 m_zeros, m_ones, m_all;
>
>         m_zeros = a->zero_bits ^ b->zero_bits;
>         m_ones = a->one_bits ^ b->one_bits;
>         m_all = m_zeros | m_ones;
>
>         a->zero_bits = (a->zero_bits - b->zero_bits) | m_all;
>         a->one_bits = (a->one_bits - b->one_bits) & ~m_all;
> }
>
> Or is something different needed?
I suspect it's something different, just because I worry about what
 carries will do.

But I think Alexei's idea (mask and value) is better anyway; at the
 least it's easier to think about.

-Ed

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-18  2:48         ` Alexei Starovoitov
@ 2017-05-18 14:49           ` Edward Cree
  2017-05-18 16:38             ` Edward Cree
  0 siblings, 1 reply; 36+ messages in thread
From: Edward Cree @ 2017-05-18 14:49 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller, Daniel Borkmann
  Cc: alexei.starovoitov, netdev

On 18/05/17 03:48, Alexei Starovoitov wrote:
> Would it be easier to represent this logic via (mask_of_unknown, value)
> instead of (mask0, mask1) ?
Yes, I like this.
> As far as upper bits we can tweak the algorithm to eat into
> one or more bits of known bits due to carry.
> Like
> 00xx11 + 00xx11 = 0xxx10
> we will eat only one bit (second from left) and the highest bit
> is known to stay zero, since carry can only compromise 2nd from left.
> Such logic should work for sparse representation of unknown bits too
> Like:
> 10xx01xx10 +
> 01xx01xx00 =
> xxxx1xxx10
> both upper two bits would be unknown, but only one middle bit becomes
> unknown.
Yes, that is the behaviour we want.  But it's unclear how to efficiently
 compute it, without just iterating over the bits and computing carry
 possibilities.
Here's one idea that seemed to work when I did a couple of experiments:
let A = (a;am), B = (b;bm) where the m are the masks
Σ = am + bm + a + b
χ = Σ ^ (a + b) /* unknown carries */
μ = χ | am | bm /* mask of result */
then A + B = ((a + b) & ~μ; μ)

The idea is that we find which bits change between the case "all x are
 1" and "all x are 0", and those become xs too.  But I'm not certain
 that that's always going to cover all possible values in between.
It worked on the tests I came up with, and also your example above, but
 I can't quite prove it'll always work.
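A minimal C translation of that addition rule, using a hypothetical
 value/mask struct purely for illustration (it reproduces the
 2 + (x << 2) + 14 example from earlier in the thread):

#include <stdio.h>
#include <stdint.h>

/* (value; mask): set mask bits are unknown; invariant: value & mask == 0 */
struct vm {
	uint64_t value;
	uint64_t mask;
};

/* sigma = am + bm + a + b, chi = sigma ^ (a + b), mu = chi | am | bm */
static struct vm vm_add(struct vm a, struct vm b)
{
	uint64_t sv = a.value + b.value;
	uint64_t sigma = a.mask + b.mask + sv;
	uint64_t chi = sigma ^ sv;            /* carries the unknown bits can cause */
	uint64_t mu = chi | a.mask | b.mask;

	return (struct vm){ sv & ~mu, mu };
}

int main(void)
{
	struct vm x_shl2 = { 0, ~3ULL };                   /* "...xx00"    */
	struct vm r = vm_add((struct vm){ 2, 0 }, x_shl2); /* 2 + (x << 2) */

	r = vm_add(r, (struct vm){ 14, 0 });               /* ... + 14     */
	printf("value=%llx mask=%llx\n",
	       (unsigned long long)r.value, (unsigned long long)r.mask);
	/* mask == ~3, value == 0: low two bits known 0, so 4-byte aligned */
	return 0;
}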

-Ed

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-18 14:49           ` Edward Cree
@ 2017-05-18 16:38             ` Edward Cree
  2017-05-18 18:41               ` Edward Cree
  2017-05-19  1:22               ` Alexei Starovoitov
  0 siblings, 2 replies; 36+ messages in thread
From: Edward Cree @ 2017-05-18 16:38 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller, Daniel Borkmann
  Cc: alexei.starovoitov, netdev

On 18/05/17 15:49, Edward Cree wrote:
> Here's one idea that seemed to work when I did a couple of experiments:
> let A = (a;am), B = (b;bm) where the m are the masks
> Σ = am + bm + a + b
> χ = Σ ^ (a + b) /* unknown carries */
> μ = χ | am | bm /* mask of result */
> then A + B = ((a + b) & ~μ; μ)
>
> The idea is that we find which bits change between the case "all x are
>  1" and "all x are 0", and those become xs too.
And now I've found a similar algorithm for subtraction, which (again) I
 can't prove but it seems to work.
α = a + am - b
β = a - b - bm
χ = α ^ β
μ = χ | am | bm
then A - B = ((a - b) & ~μ; μ)
Again we're effectively finding the max. and min. values, and XORing
 them to find unknown carries.

Bitwise operations are easy, of course;
/* By assumption, a & am == b & bm == 0 */
A & B = (a & b; (a | am) & (b | bm) & ~(a & b))
A | B = (a | b; (am | bm) & ~(a | b))
/* It bothers me that & and | aren't symmetric, but I can't fix it */
A ^ B = (a ^ b; am | bm)

as are shifts by a constant (just shift 0s into both number and mask).
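In the same spirit, a sketch of the subtraction and bitwise rules above
 with a hypothetical value/mask struct (my reading of the formulas, not
 kernel code):

#include <stdint.h>

struct vm { uint64_t value; uint64_t mask; };  /* invariant: value & mask == 0 */

/* alpha = max(A) - min(B), beta = min(A) - max(B); their XOR exposes the
 * borrows that depend on unknown bits
 */
struct vm vm_sub(struct vm a, struct vm b)
{
	uint64_t dv = a.value - b.value;
	uint64_t alpha = dv + a.mask;
	uint64_t beta = dv - b.mask;
	uint64_t mu = (alpha ^ beta) | a.mask | b.mask;

	return (struct vm){ dv & ~mu, mu };
}

struct vm vm_and(struct vm a, struct vm b)
{
	uint64_t v = a.value & b.value;

	/* a result bit can be set only if it can be set in both operands */
	return (struct vm){ v, (a.value | a.mask) & (b.value | b.mask) & ~v };
}

struct vm vm_or(struct vm a, struct vm b)
{
	uint64_t v = a.value | b.value;

	return (struct vm){ v, (a.mask | b.mask) & ~v };
}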

Multiplication by a constant can be done by decomposing into shifts
 and adds; but it can also be done directly; here we find (a;am) * k.
π = a * k
γ = am * k
then A * k = (π; 0) + (0; γ), for which we use our addition algo.

Multiplication of two unknown values is a nightmare, as unknown bits
 can propagate all over the place.  We can do a shift-add
 decomposition where the adds for unknown bits have all the 1s in
 the addend replaced with xs.  A few experiments suggest that this
 works, regardless of the order of operands.  For instance
 110x * x01 comes out as either
    110x
+ xx0x
= xxxx0x
or
     x0x
   x01
+ x01
= xxxx0x
We can slightly optimise this by handling all the 1 bits in one go;
 that is, for (a;am) * (b;bm) we first find (a;am) * b using our
 multiplication-by-a-constant algo above, then for each bit in bm
 we find (a;am) * bit and force all its nonzero bits to unknown;
 finally we add all our components.

Don't even ask about division; that scrambles bits so hard that the
 only thing you can say for sure is that the leading 0s in the
 numerator stay 0 in the result.  The only exception is divisions
 by a constant which can be converted into a shift, or divisions
 of a constant by another constant; if the numerator has any xs and
 the denominator has more than one 1, everything to the right of the
 first x is totally unknown in general.

-Ed

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-18 16:38             ` Edward Cree
@ 2017-05-18 18:41               ` Edward Cree
  2017-05-19  1:22               ` Alexei Starovoitov
  1 sibling, 0 replies; 36+ messages in thread
From: Edward Cree @ 2017-05-18 18:41 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller, Daniel Borkmann
  Cc: alexei.starovoitov, netdev

Implementations (still in Python for now) at
https://gist.github.com/ecree-solarflare/0665d5b46c2d8d08de2377fbd527de8d
(I left out division, because it's so weak.)

I still can't prove + and - are correct, but they've passed every test
 case I've come up with so far.  * seems pretty obviously correct as long
 as the + it uses is.  Bitwise ops and shifts are trivial to prove.

-Ed

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-18 16:38             ` Edward Cree
  2017-05-18 18:41               ` Edward Cree
@ 2017-05-19  1:22               ` Alexei Starovoitov
  2017-05-19 14:21                 ` Edward Cree
  1 sibling, 1 reply; 36+ messages in thread
From: Alexei Starovoitov @ 2017-05-19  1:22 UTC (permalink / raw)
  To: Edward Cree, David Miller, Daniel Borkmann; +Cc: alexei.starovoitov, netdev

On 5/18/17 9:38 AM, Edward Cree wrote:
> On 18/05/17 15:49, Edward Cree wrote:
>> Here's one idea that seemed to work when I did a couple of experiments:
>> let A = (a;am), B = (b;bm) where the m are the masks
>> Σ = am + bm + a + b
>> χ = Σ ^ (a + b) /* unknown carries */
>> μ = χ | am | bm /* mask of result */
>> then A + B = ((a + b) & ~μ; μ)
>>
>> The idea is that we find which bits change between the case "all x are
>>  1" and "all x are 0", and those become xs too.

 > https://gist.github.com/ecree-solarflare/0665d5b46c2d8d08de2377fbd527de8d

I played with it quite a bit trying to break it and have to
agree that the above algorithm works.
At least for add and sub I think it's solid.
Still feels a bit magical, since it gave me better results
than I could envision for my test vectors.

In your .py I'd only change __str__(self) to print them in mask,value
order, as they're passed into the constructor, to make it easier to read.
The bin(self) output is the most useful, of course.
We should carry it into the kernel too for debugging.

> And now I've found a similar algorithm for subtraction, which (again) I
>  can't prove but it seems to work.
> α = a + am - b
> β = a - b - bm
> χ = α ^ β
> μ = χ | am | bm
> then A - B = ((a - b) & ~μ; μ)
> Again we're effectively finding the max. and min. values, and XORing
>  them to find unknown carries.
>
> Bitwise operations are easy, of course;
> /* By assumption, a & am == b & bm == 0 */
> A & B = (a & b; (a | am) & (b | bm) & ~(a & b))
> A | B = (a | b; (am | bm) & ~(a | b))
> /* It bothers me that & and | aren't symmetric, but I can't fix it */
> A ^ B = (a ^ b; am | bm)
>
> as are shifts by a constant (just shift 0s into both number and mask).
>
> Multiplication by a constant can be done by decomposing into shifts
>  and adds; but it can also be done directly; here we find (a;am) * k.
> π = a * k
> γ = am * k
> then A * k = (π; 0) + (0; γ), for which we use our addition algo.
>
> Multiplication of two unknown values is a nightmare, as unknown bits
>  can propagate all over the place.  We can do a shift-add
>  decomposition where the adds for unknown bits have all the 1s in
>  the addend replaced with xs.  A few experiments suggest that this
>  works, regardless of the order of operands.  For instance
>  110x * x01 comes out as either
>     110x
> + xx0x
> = xxxx0x
> or
>      x0x
>    x01
> + x01
> = xxxx0x
> We can slightly optimise this by handling all the 1 bits in one go;
>  that is, for (a;am) * (b;bm) we first find (a;am) * b using our
>  multiplication-by-a-constant algo above, then for each bit in bm
>  we find (a;am) * bit and force all its nonzero bits to unknown;
>  finally we add all our components.

this mul algo I don't completely understand. It feels correct,
but I'm not sure we really need it for the kernel.
For all practical cases llvm will likely emit shifts or sequence
of adds and shifts, so multiplies by crazy non-optimizable constant
or variable are rare and likely the end result is going to be
outside of packet boundary, so it will be rejected anyway and
precise alignment tracking doesn't matter much.
What I love about the whole thing that it works for access into
packet, access into map values and in the future for any other
variable length access.

> Don't even ask about division; that scrambles bits so hard that the

yeah screw div and mod. We have an option to disable div/mod altogether
under some new 'prog_flags', since it has this ugly 'div by 0'
exception path. We don't even have a 'signed division' instruction, and
llvm errors like:
     errs() << "Unsupport signed division for DAG: ";
     errs() << "Please convert to unsigned div/mod.\n";
and no one complained. It just means that division is extremely rare.

Are you planning to work on the kernel patch for this algo?
Once we have it the verifier will be smarter regarding
alignment tracking than any compiler i know :)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-19  1:22               ` Alexei Starovoitov
@ 2017-05-19 14:21                 ` Edward Cree
  2017-05-19 14:55                   ` Alexei Starovoitov
  0 siblings, 1 reply; 36+ messages in thread
From: Edward Cree @ 2017-05-19 14:21 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller, Daniel Borkmann
  Cc: alexei.starovoitov, netdev

On 19/05/17 02:22, Alexei Starovoitov wrote:
> In your .py I'd only change __str__(self) to print them in mask,value
> as the order they're passed into constructor to make it easier to read. 
Actually I was going to go the other way and change the ctor to take
 value,mask.  But I agree they're inconsistent right now.

> this mul algo I don't completely understand. It feels correct,
> but I'm not sure we really need it for the kernel. 
You're probably right; I was just driven by a completionist desire to
 cover everything I could.

> What I love about the whole thing that it works for access into
> packet, access into map values and in the future for any other
> variable length access.
Sure, but don't start thinking it subsumes all the other checks.  We
 will still need e.g. max/min tracking, because packet length isn't
 always a power of 2.

> Are you planning to work on the kernel patch for this algo?
> Once we have it the verifier will be smarter regarding
> alignment tracking than any compiler i know :) 
I'm currently translating the algos to C.  But for the kernel patch,
 I'll need to read & understand the existing verifier code, so it
 might take a while :)  (I don't suppose there's any design document
 or hacking-notes you could point me at?)
But I'll give it a go for sure.

-Ed

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-19 14:21                 ` Edward Cree
@ 2017-05-19 14:55                   ` Alexei Starovoitov
  2017-05-19 17:17                     ` Edward Cree
  0 siblings, 1 reply; 36+ messages in thread
From: Alexei Starovoitov @ 2017-05-19 14:55 UTC (permalink / raw)
  To: Edward Cree, David Miller, Daniel Borkmann; +Cc: alexei.starovoitov, netdev

On 5/19/17 7:21 AM, Edward Cree wrote:
> I'm currently translating the algos to C.  But for the kernel patch,
>  I'll need to read & understand the existing verifier code, so it
>  might take a while :)  (I don't suppose there's any design document
>  or hacking-notes you could point me at?)

Dave just gave a good overview of the verifier:
https://www.spinics.net/lists/xdp-newbies/msg00185.html

Few more details in
Documentation/networking/filter.txt
and in top comments in kernel/bpf/verifier.c

General bpf arch overview:
http://docs.cilium.io/en/latest/bpf/#instruction-set

> But I'll give it a go for sure.

Thanks!

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-19 14:55                   ` Alexei Starovoitov
@ 2017-05-19 17:17                     ` Edward Cree
  2017-05-19 20:00                       ` Alignment in BPF verifier Edward Cree
  2017-05-19 20:41                       ` [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier David Miller
  0 siblings, 2 replies; 36+ messages in thread
From: Edward Cree @ 2017-05-19 17:17 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller, Daniel Borkmann
  Cc: alexei.starovoitov, netdev

On 19/05/17 15:55, Alexei Starovoitov wrote:
> On 5/19/17 7:21 AM, Edward Cree wrote:
>> I'm currently translating the algos to C.  But for the kernel patch,
>>  I'll need to read & understand the existing verifier code, so it
>>  might take a while :)  (I don't suppose there's any design document
>>  or hacking-notes you could point me at?)
>
> Dave just gave a good overview of the verifier:
> https://www.spinics.net/lists/xdp-newbies/msg00185.html
>
> Few more details in
> Documentation/networking/filter.txt
> and in top comments in kernel/bpf/verifier.c
>
> General bpf arch overview:
> http://docs.cilium.io/en/latest/bpf/#instruction-set 
Thanks for the links!  I'm digging into implementation now.
One question: is there a way to build the verifier as userland code
 (or at least as a module), or will I have to reboot every time I
 want to test a change?
-Ed

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Alignment in BPF verifier
  2017-05-19 17:17                     ` Edward Cree
@ 2017-05-19 20:00                       ` Edward Cree
  2017-05-19 20:39                         ` David Miller
  2017-05-19 20:48                         ` David Miller
  2017-05-19 20:41                       ` [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier David Miller
  1 sibling, 2 replies; 36+ messages in thread
From: Edward Cree @ 2017-05-19 20:00 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller, Daniel Borkmann
  Cc: alexei.starovoitov, netdev

Well, I've managed to get somewhat confused by reg->id.
In particular, I'm unsure which bpf_reg_types can have an id, and what
 exactly it means.  There seems to be some code that checks around map value
 pointers, which seems strange as maps have fixed sizes (and the comments in
 enum bpf_reg_type make it seem like id is a PTR_TO_PACKET thing) - is this
 maybe because of map-of-maps support, can the contained maps have differing
 element sizes?  Or do we allow *(map_value + var + imm), if map_value + var
 was appropriately bounds-checked?

Does the 'id' identify the variable that was added to an object pointer, or
 the object itself?  Or does it blur these and identify (what the comment in
 enum bpf_reg_type calls) "skb->data + (u16) var"?

Here's what I'm thinking of doing:
struct bpf_reg_state {
    enum bpf_reg_type type;
    union {
        /* valid when type == PTR_TO_PACKET */
        u16 range;

        /* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
         *   PTR_TO_MAP_VALUE_OR_NULL
         */
        struct bpf_map *map_ptr;
    };
    /* Used to find other pointers with the same variable base, so they
     * can share range and align knowledge.
     */
    u32 id;
    u32 off; /* fixed part of pointer offset */
    /* For scalar types (CONST_IMM | UNKNOWN_VALUE), this represents our
     * knowledge of the actual value.
     * For pointer types, this represents the variable part of the offset
     * from the pointed-to object, and is shared with all bpf_reg_states
     * with the same id as us.
     */
    struct tnum align;
    /* Used to determine if any memory access using this register will
     * result in a bad access. These two fields must be last.
     * See states_equal()
     * These refer to the same value as align, not necessarily the actual
     * contents of the register.
     */
    s64 min_value;
    u64 max_value;
};

Does that sound reasonable?  (And does my added comment on min/max_value
 accurately describe the current semantics, or will I need to change that
 as well?)

-Ed

PS. I think this approach would also mean several of the bpf_reg_types can
 be folded together:
* PTR_TO_MAP_VALUE and PTR_TO_MAP_VALUE_ADJ are the same
* FRAME_PTR is just a PTR_TO_STACK with known-zero offset
* CONST_IMM is similarly a special case of UNKNOWN_VALUE

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Alignment in BPF verifier
  2017-05-19 20:00                       ` Alignment in BPF verifier Edward Cree
@ 2017-05-19 20:39                         ` David Miller
  2017-05-19 23:05                           ` Daniel Borkmann
  2017-05-19 20:48                         ` David Miller
  1 sibling, 1 reply; 36+ messages in thread
From: David Miller @ 2017-05-19 20:39 UTC (permalink / raw)
  To: ecree; +Cc: ast, daniel, alexei.starovoitov, netdev

From: Edward Cree <ecree@solarflare.com>
Date: Fri, 19 May 2017 21:00:13 +0100

> Well, I've managed to get somewhat confused by reg->id.
> In particular, I'm unsure which bpf_reg_types can have an id, and what
>  exactly it means.  There seems to be some code that checks around map value
>  pointers, which seems strange as maps have fixed sizes (and the comments in
>  enum bpf_reg_type make it seem like id is a PTR_TO_PACKET thing) - is this
>  maybe because of map-of-maps support, can the contained maps have differing
>  element sizes?  Or do we allow *(map_value + var + imm), if map_value + var
>  was appropriately bounds-checked?
> 
> Does the 'id' identify the variable that was added to an object pointer, or
>  the object itself?  Or does it blur these and identify (what the comment in
>  enum bpf_reg_type calls) "skb->data + (u16) var"?

The reg->id value changes any time a variable gets added to a packet
pointer.

You will also notice right now that only packet pointers have their
alignment tracked.

I have changes pending that will do that for MAP pointers too, but
it needs more work.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-19 17:17                     ` Edward Cree
  2017-05-19 20:00                       ` Alignment in BPF verifier Edward Cree
@ 2017-05-19 20:41                       ` David Miller
  2017-05-19 21:37                         ` Alexei Starovoitov
  1 sibling, 1 reply; 36+ messages in thread
From: David Miller @ 2017-05-19 20:41 UTC (permalink / raw)
  To: ecree; +Cc: ast, daniel, alexei.starovoitov, netdev

From: Edward Cree <ecree@solarflare.com>
Date: Fri, 19 May 2017 18:17:42 +0100

> One question: is there a way to build the verifier as userland code
>  (or at least as a module), or will I have to reboot every time I
>  want to test a change?

There currently is no such mechanism, you will have to reboot every
time.

I have considered working on making the code buildable outside of the
kernel.  It shouldn't be too hard.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Alignment in BPF verifier
  2017-05-19 20:00                       ` Alignment in BPF verifier Edward Cree
  2017-05-19 20:39                         ` David Miller
@ 2017-05-19 20:48                         ` David Miller
  1 sibling, 0 replies; 36+ messages in thread
From: David Miller @ 2017-05-19 20:48 UTC (permalink / raw)
  To: ecree; +Cc: ast, daniel, alexei.starovoitov, netdev

From: Edward Cree <ecree@solarflare.com>
Date: Fri, 19 May 2017 21:00:13 +0100

> Here's what I'm thinking of doing:
> struct bpf_reg_state {
>     enum bpf_reg_type type;
>     union {
>         /* valid when type == PTR_TO_PACKET */
>         u16 range;
> 
>         /* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
>          *   PTR_TO_MAP_VALUE_OR_NULL
>          */
>         struct bpf_map *map_ptr;
>     };
>     /* Used to find other pointers with the same variable base, so they
>      * can share range and align knowledge.
>      */
>     u32 id;
>     u32 off; /* fixed part of pointer offset */
>     /* For scalar types (CONST_IMM | UNKNOWN_VALUE), this represents our
>      * knowledge of the actual value.
>      * For pointer types, this represents the variable part of the offset
>      * from the pointed-to object, and is shared with all bpf_reg_states
>      * with the same id as us.
>      */
>     struct tnum align;
>     /* Used to determine if any memory access using this register will
>      * result in a bad access. These two fields must be last.
>      * See states_equal()
>      * These refer to the same value as align, not necessarily the actual
>      * contents of the register.
>      */
>     s64 min_value;
>     u64 max_value;
> };

Be very careful with the layout of bpf_reg_state.

There are layout dependencies in the state pruning.  Please take a look
at states_equal().  It is walking the set of registers at two snapshot
locations and trying to see if they are "equivalent".

What's happening here is that the verifier makes a stack of all branch
points in the program.  On the first pass it analyzes the register
values taking one of the two paths a branch takes.  Then when it hits
the end of the program, on that path, to BPF_EXIT it starts popping
the entries on the stack.

The naive implementation would pop each stack entry, and then traverse
the other arm of the branch.  But for programs with lots of branches
this gets very expensive.

So at each stack pop, the verifier tries to determine if it can skip
traversing the branch's other path.  And it does this by analyzing
register state.

The tests are basically:

		if (memcmp(rold, rcur, sizeof(*rold)) == 0)
			continue;

exact equivalent, then we're fine.

		/* If the ranges were not the same, but everything else was and
		 * we didn't do a variable access into a map then we are a-ok.
		 */
		if (!varlen_map_access &&
		    memcmp(rold, rcur, offsetofend(struct bpf_reg_state, id)) == 0)
			continue;

We didn't do any variable MAP accesses, and everything in the register
"up to and including member ID" is the same, we're fine.

And then we drop down into some packet pointer specific tests to try
and optimize things further.

So you have to be careful what you place before and/or after 'id'.

Probably we need to put the alignment stuff before 'id' so that it
is considered by the offsetofend() length memcmp().

Hope that helps.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-19 20:41                       ` [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier David Miller
@ 2017-05-19 21:37                         ` Alexei Starovoitov
  2017-05-19 23:16                           ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Alexei Starovoitov @ 2017-05-19 21:37 UTC (permalink / raw)
  To: David Miller, ecree; +Cc: daniel, alexei.starovoitov, netdev

On 5/19/17 1:41 PM, David Miller wrote:
> From: Edward Cree <ecree@solarflare.com>
> Date: Fri, 19 May 2017 18:17:42 +0100
>
>> One question: is there a way to build the verifier as userland code
>>  (or at least as a module), or will I have to reboot every time I
>>  want to test a change?
>
> There currently is no such mechanism; you will have to reboot every
> time.
>
> I have considered working on making the code buildable outside of the
> kernel.  It shouldn't be too hard.

It's not hard.
We did it twice and abandoned it both times.
The first time was to have a 'user space verifier' to check programs
before loading, the second time was for fuzzing via llvm.
We abandoned it because it diverges very quickly from the kernel.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Alignment in BPF verifier
  2017-05-19 20:39                         ` David Miller
@ 2017-05-19 23:05                           ` Daniel Borkmann
  2017-05-23 14:41                             ` Edward Cree
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel Borkmann @ 2017-05-19 23:05 UTC (permalink / raw)
  To: David Miller, ecree; +Cc: ast, alexei.starovoitov, netdev

On 05/19/2017 10:39 PM, David Miller wrote:
> From: Edward Cree <ecree@solarflare.com>
> Date: Fri, 19 May 2017 21:00:13 +0100
>
>> Well, I've managed to get somewhat confused by reg->id.
>> In particular, I'm unsure which bpf_reg_types can have an id, and what
>>   exactly it means.  There seems to be some code that checks around map value
>>   pointers, which seems strange as maps have fixed sizes (and the comments in
>>   enum bpf_reg_type make it seem like id is a PTR_TO_PACKET thing) - is this

Besides PTR_TO_PACKET also PTR_TO_MAP_VALUE_OR_NULL uses it to
track all registers (incl. spilled ones) with the same reg->id
that originated from the same map lookup. After the reg type is
then migrated to either PTR_TO_MAP_VALUE (resp. CONST_PTR_TO_MAP
for map in map) or UNKNOWN_VALUE depending on the branch, the
reg->id is then reset to 0 again. Whole reason for this is that
LLVM generates code where it can move and/or spill a reg of type
PTR_TO_MAP_VALUE_OR_NULL to other regs before we do the NULL
test on it, and later on it expects that the spilled or moved
regs work wrt access. So they're marked with an id and then all
of them are type migrated. So here meaning of reg->id is different
than in PTR_TO_PACKET case. Example:

0: (b7) r1 = 10
1: (7b) *(u64 *)(r10 -8) = r1
2: (bf) r2 = r10
3: (07) r2 += -8
4: (18) r1 = 0x59c00000
6: (85) call 1 //map lookup
7: (bf) r4 = r0
8: (15) if r0 == 0x0 goto pc+1
  R0=map_value(ks=8,vs=8) R4=map_value_or_null(ks=8,vs=8) R10=fp
9: (7a) *(u64 *)(r4 +0) = 0
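
For reference, a hedged sketch of the C-level source pattern that can
produce bytecode like the above (the map definition, names and section
strings are made up; the helpers are assumed to come from the selftests'
bpf_helpers.h).  The lookup result is copied before the NULL test, and
the access afterwards goes through the copy, which is why both registers
need to share the same id:

#include <linux/bpf.h>
#include "bpf_helpers.h"	/* SEC(), bpf_map_lookup_elem(), struct bpf_map_def */

struct bpf_map_def SEC("maps") my_map = {	/* hypothetical map */
	.type		= BPF_MAP_TYPE_HASH,
	.key_size	= sizeof(__u64),
	.value_size	= sizeof(__u64),
	.max_entries	= 1,
};

SEC("socket")
int lookup_copy_then_check(void *ctx)
{
	__u64 key = 10;
	__u64 *p, *v;

	p = bpf_map_lookup_elem(&my_map, &key);	/* R0 = map_value_or_null */
	v = p;					/* copy made before the NULL test */
	if (p)					/* NULL check on the original ... */
		*v = 0;				/* ... store through the copy */
	return 0;
}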

>>   maybe because of map-of-maps support, can the contained maps have differing
>>   element sizes?  Or do we allow *(map_value + var + imm), if map_value + var
>>   was appropriately bounds-checked?
>>
>> Does the 'id' identify the variable that was added to an object pointer, or
>>   the object itself?  Or does it blur these and identify (what the comment in
>>   enum bpf_reg_type calls) "skb->data + (u16) var"?
>
> The reg->id value changes any time a variable gets added to a packet
> pointer.
>
> You will also notice right now that only packet pointers have their
> alignment tracked.
>
> I have changes pending that will do that for MAP pointers too, but
> it needs more work.
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-19 21:37                         ` Alexei Starovoitov
@ 2017-05-19 23:16                           ` David Miller
  2017-05-20  0:20                             ` Alexei Starovoitov
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2017-05-19 23:16 UTC (permalink / raw)
  To: ast; +Cc: ecree, daniel, alexei.starovoitov, netdev

From: Alexei Starovoitov <ast@fb.com>
Date: Fri, 19 May 2017 14:37:56 -0700

> On 5/19/17 1:41 PM, David Miller wrote:
>> From: Edward Cree <ecree@solarflare.com>
>> Date: Fri, 19 May 2017 18:17:42 +0100
>>
>>> One question: is there a way to build the verifier as userland code
>>>  (or at least as a module), or will I have to reboot every time I
>>>  want to test a change?
>>
>> There currently is no such mechanism; you will have to reboot every
>> time.
>>
>> I have considered working on making the code buildable outside of the
>> kernel.  It shouldn't be too hard.
> 
> It's not hard.
> We did it twice and abandoned it both times.
> The first time was to have a 'user space verifier' to check programs
> before loading, the second time was for fuzzing via llvm.
> We abandoned it because it diverges very quickly from the kernel.
> 

Well, my idea was to create an environment in which the kernel's
verifier.c could be built as-is.

Maybe there would be some small compromises in verifier.c such as an
ifdef test or two, but that should be it.
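
Something like the following is the sort of compromise meant here
(purely hypothetical, not actual verifier.c code; the user space shim
header is made up):

#ifdef __KERNEL__
#include <linux/kernel.h>
#include <linux/slab.h>
#else
#include "verifier_user.h"	/* hypothetical shim mapping kmalloc()/kfree(),
				 * the verbose() log output, etc. onto libc */
#endif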

It really is just a piece of what amounts to compiler infrastructure
and not very kernel specific.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
  2017-05-19 23:16                           ` David Miller
@ 2017-05-20  0:20                             ` Alexei Starovoitov
  0 siblings, 0 replies; 36+ messages in thread
From: Alexei Starovoitov @ 2017-05-20  0:20 UTC (permalink / raw)
  To: David Miller; +Cc: ecree, daniel, alexei.starovoitov, netdev

On 5/19/17 4:16 PM, David Miller wrote:
> From: Alexei Starovoitov <ast@fb.com>
> Date: Fri, 19 May 2017 14:37:56 -0700
>
>> On 5/19/17 1:41 PM, David Miller wrote:
>>> From: Edward Cree <ecree@solarflare.com>
>>> Date: Fri, 19 May 2017 18:17:42 +0100
>>>
>>>> One question: is there a way to build the verifier as userland code
>>>>  (or at least as a module), or will I have to reboot every time I
>>>>  want to test a change?
>>>
>>> There currently is no such mechanism; you will have to reboot every
>>> time.
>>>
>>> I have considered working on making the code buildable outside of the
>>> kernel.  It shouldn't be too hard.
>>
>> It's not hard.
>> We did it twice and abandoned it both times.
>> The first time was to have a 'user space verifier' to check programs
>> before loading, the second time was for fuzzing via llvm.
>> We abandoned it because it diverges very quickly from the kernel.
>>
>
> Well, my idea was to create an environment in which the kernel's
> verifier.c could be built as-is.
>
> Maybe there would be some small compromises in verifier.c such as an
> ifdef test or two, but that should be it.

That's exactly what we did the first time: we added a few ifdefs to
verifier.c.
The second time we went even further by compiling kernel/bpf/verifier.c
as-is and linking everything magically via user space hooks, to the
point that test_verifier.c runs as-is but calls a bpf_check() function
that was compiled for user space via clang.
That code is here:
https://github.com/iovisor/bpf-fuzzer
It's definitely possible to refresh it and make it work again.

My point is that unless we put such a 'let's build verifier.c for user
space' setup into tools/testing/selftests/ or something, the project is
destined to bit rot.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Alignment in BPF verifier
  2017-05-19 23:05                           ` Daniel Borkmann
@ 2017-05-23 14:41                             ` Edward Cree
  2017-05-23 17:43                               ` Edward Cree
  2017-05-23 19:45                               ` Alexei Starovoitov
  0 siblings, 2 replies; 36+ messages in thread
From: Edward Cree @ 2017-05-23 14:41 UTC (permalink / raw)
  To: Daniel Borkmann, David Miller; +Cc: ast, alexei.starovoitov, netdev

I'm still plugging away at this... it's going to be quite a big patch and
 rewrite a lot of stuff (and I'm not sure I'll be able to break it into
 smaller bisectable patches).
And of course I have more questions.  In check_packet_ptr_add(), we
 forbid adding a negative constant to a packet ptr.  Is there some
 principled reason for that, or is it just because the bounds checking is
 hard?  It seems like, if imm + reg->off > 0 (suitably carefully checked
 to avoid overflow etc.), then the subtraction should be legal.  Indeed,
 even if the reg->off (fixed part of offset) is zero, if the variable part
 is known (min_value) to be >= -imm, the subtraction should be safe.
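For illustration, a hedged sketch of the kind of access being argued as
safe here (SCHED_CLS style; the 14-byte fixed offset and the -2 are made
up, and SEC() is assumed to come from the selftests' bpf_helpers.h):

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include "bpf_helpers.h"

SEC("classifier")
int neg_off_example(struct __sk_buff *skb)
{
	void *data     = (void *)(long)skb->data;
	void *data_end = (void *)(long)skb->data_end;
	unsigned char *p;

	if (data + 14 > data_end)	/* proves the first 14 bytes are readable */
		return TC_ACT_OK;

	p = data + 14;			/* fixed part: reg->off = 14 */
	p -= 2;				/* imm = -2; imm + reg->off = 12 > 0 */
	if (*p != 0)			/* still inside the proven-readable range */
		return TC_ACT_SHOT;
	return TC_ACT_OK;
}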

On 20/05/17 00:05, Daniel Borkmann wrote:
> Besides PTR_TO_PACKET also PTR_TO_MAP_VALUE_OR_NULL uses it to
> track all registers (incl. spilled ones) with the same reg->id
> that originated from the same map lookup. After the reg type is
> then migrated to either PTR_TO_MAP_VALUE (resp. CONST_PTR_TO_MAP
> for map in map) or UNKNOWN_VALUE depending on the branch, the
> reg->id is then reset to 0 again. Whole reason for this is that
> LLVM generates code where it can move and/or spill a reg of type
> PTR_TO_MAP_VALUE_OR_NULL to other regs before we do the NULL
> test on it, and later on it expects that the spilled or moved
> regs work wrt access. So they're marked with an id and then all
> of them are type migrated. So here meaning of reg->id is different
> than in PTR_TO_PACKET case.
Hmm, that means that we can't do arithmetic on a
 PTR_TO_MAP_VALUE_OR_NULL, we have to convert it to a PTR_TO_MAP_VALUE
 first by NULL-checking it.  That's probably fine, but I can just about
 imagine some compiler optimisation reordering them.  Any reason not to
 split this out into a different reg->field, rather than overloading id?
Of course that would need (more) caution wrt. states_equal(), but it
 looks like I'll be mangling that a lot anyway - for instance, we don't
 want to just use memcmp() to compare alignments, we want to check that
 our alignment is stricter than the old alignment.  (Of course memcmp()
 is a conservative check, so the "memcmp() the whole reg_state" fast
 path can remain.)

-Ed

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Alignment in BPF verifier
  2017-05-23 14:41                             ` Edward Cree
@ 2017-05-23 17:43                               ` Edward Cree
  2017-05-23 23:59                                 ` Alexei Starovoitov
  2017-05-23 19:45                               ` Alexei Starovoitov
  1 sibling, 1 reply; 36+ messages in thread
From: Edward Cree @ 2017-05-23 17:43 UTC (permalink / raw)
  To: Daniel Borkmann, David Miller; +Cc: ast, alexei.starovoitov, netdev

Another issue: it looks like the min/max_value handling for subtraction is
 bogus.  In adjust_reg_min_max_vals() we have
    if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
        dst_reg->min_value -= min_val;
    if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
        dst_reg->max_value -= max_val;
 where min_val and max_val refer to the src_reg.
But surely they should be used the other way round; if (say) 2 <= R1 <= 6
 and 1 <= R2 <= 4, then this will claim 1 <= (R1 - R2) <= 2, whereas really
 (R1 - R2) could be anything from -2 to 5.
This also means that the code just above the switch,
    if (min_val == BPF_REGISTER_MIN_RANGE)
        dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
    if (max_val == BPF_REGISTER_MAX_RANGE)
        dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
 is wrong, since e.g. subtracting MAX_RANGE needs to blow our min_value,
 not our max_value.
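For concreteness, a hedged sketch of what a fix might look like (keeping
the existing field names; untested):

    /* Unknown bounds have to cross over: subtracting an unbounded max
     * blows our min, and vice versa.
     */
    if (max_val == BPF_REGISTER_MAX_RANGE)
        dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
    if (min_val == BPF_REGISTER_MIN_RANGE)
        dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
    /* Otherwise the new min comes from subtracting src's max, and the
     * new max from subtracting src's min: with 2 <= R1 <= 6 and
     * 1 <= R2 <= 4 this gives -2 <= (R1 - R2) <= 5.
     */
    if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
        dst_reg->min_value -= max_val;
    if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
        dst_reg->max_value -= min_val;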

-Ed

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Alignment in BPF verifier
  2017-05-23 14:41                             ` Edward Cree
  2017-05-23 17:43                               ` Edward Cree
@ 2017-05-23 19:45                               ` Alexei Starovoitov
  2017-05-23 21:27                                 ` Daniel Borkmann
  1 sibling, 1 reply; 36+ messages in thread
From: Alexei Starovoitov @ 2017-05-23 19:45 UTC (permalink / raw)
  To: Edward Cree, Daniel Borkmann, David Miller; +Cc: alexei.starovoitov, netdev

On 5/23/17 7:41 AM, Edward Cree wrote:
> I'm still plugging away at this... it's going to be quite a big patch and
>  rewrite a lot of stuff (and I'm not sure I'll be able to break it into
>  smaller bisectable patches).
> And of course I have more questions.  In check_packet_ptr_add(), we
>  forbid adding a negative constant to a packet ptr.  Is there some
>  principled reason for that, or is it just because the bounds checking is
>  hard?  It seems like, if imm + reg->off > 0 (suitably carefully checked
>  to avoid overflow etc.), then the subtraction should be legal.  Indeed,
>  even if the reg->off (fixed part of offset) is zero, if the variable part
>  is known (min_value) to be >= -imm, the subtraction should be safe.

adding negative imm to pkt_ptr is ok, but what is the use case?
Do you see llvm generating such code?
I think if we try to track everything with the current shape of
state pruning, the verifier will stop accepting old programs
because it reaches the complexity limit.

I think we need to rearchitect the whole thing.
I was thinking of doing it compiler-style: convert to ssa and
do traditional data flow analysis, use-def chains and register
liveness; then pruning heuristics won't be necessary and the verifier
should be able to check everything in more or less a single pass.
Things like register liveness can be done without ssa.  It can
be used to augment the existing pruning, since it will know which
registers are dead, so they don't have to be compared, but that
feels half-way.  I'd rather go all the way.

> On 20/05/17 00:05, Daniel Borkmann wrote:
>> Besides PTR_TO_PACKET also PTR_TO_MAP_VALUE_OR_NULL uses it to
>> track all registers (incl. spilled ones) with the same reg->id
>> that originated from the same map lookup. After the reg type is
>> then migrated to either PTR_TO_MAP_VALUE (resp. CONST_PTR_TO_MAP
>> for map in map) or UNKNOWN_VALUE depending on the branch, the
>> reg->id is then reset to 0 again. Whole reason for this is that
>> LLVM generates code where it can move and/or spill a reg of type
>> PTR_TO_MAP_VALUE_OR_NULL to other regs before we do the NULL
>> test on it, and later on it expects that the spilled or moved
>> regs work wrt access. So they're marked with an id and then all
>> of them are type migrated. So here meaning of reg->id is different
>> than in PTR_TO_PACKET case.
> Hmm, that means that we can't do arithmetic on a
>  PTR_TO_MAP_VALUE_OR_NULL, we have to convert it to a PTR_TO_MAP_VALUE
>  first by NULL-checking it.  That's probably fine, but I can just about
>  imagine some compiler optimisation reordering them.  Any reason not to
>  split this out into a different reg->field, rather than overloading id?

'id' is sort of like 'version' of a pointer and has the same meaning in
both cases. How exactly do you see this split?

> Of course that would need (more) caution wrt. states_equal(), but it
>  looks like I'll be mangling that a lot anyway - for instance, we don't
>  want to just use memcmp() to compare alignments, we want to check that
>  our alignment is stricter than the old alignment.  (Of course memcmp()
>  is a conservative check, so the "memcmp() the whole reg_state" fast
>  path can remain.)

yes. that would be good improvement. Not sure how much it will help
the pruning though.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Alignment in BPF verifier
  2017-05-23 19:45                               ` Alexei Starovoitov
@ 2017-05-23 21:27                                 ` Daniel Borkmann
  2017-05-24 13:46                                   ` Edward Cree
  2017-05-25 16:31                                   ` David Miller
  0 siblings, 2 replies; 36+ messages in thread
From: Daniel Borkmann @ 2017-05-23 21:27 UTC (permalink / raw)
  To: Alexei Starovoitov, Edward Cree, David Miller; +Cc: alexei.starovoitov, netdev

On 05/23/2017 09:45 PM, Alexei Starovoitov wrote:
> On 5/23/17 7:41 AM, Edward Cree wrote:
>> I'm still plugging away at this... it's going to be quite a big patch and
>>  rewrite a lot of stuff (and I'm not sure I'll be able to break it into
>>  smaller bisectable patches).
>> And of course I have more questions.  In check_packet_ptr_add(), we
>>  forbid adding a negative constant to a packet ptr.  Is there some
>>  principled reason for that, or is it just because the bounds checking is
>>  hard?  It seems like, if imm + reg->off > 0 (suitably carefully checked
>>  to avoid overflow etc.), then the subtraction should be legal.  Indeed,
>>  even if the reg->off (fixed part of offset) is zero, if the variable part
>>  is known (min_value) to be >= -imm, the subtraction should be safe.
>
> adding negative imm to pkt_ptr is ok, but what is the use case?
> Do you see llvm generating such code?

Btw, currently, you kind of have it in a limited way via BPF_STX_MEM()
and BPF_LDX_MEM() when accessing pkt data through the offset, which
can be negative on the insn level.
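For example, a hedged fragment in test_verifier.c's .insns style
(registers chosen arbitrarily, assuming r2 was already bounds-checked
against pkt_end):

	BPF_STX_MEM(BPF_H, BPF_REG_2, BPF_REG_3, -2),	/* *(u16 *)(r2 - 2) = r3 */
	BPF_LDX_MEM(BPF_H, BPF_REG_4, BPF_REG_2, -2),	/* r4 = *(u16 *)(r2 - 2) */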

> I think if we try to track everything with the current shape of
> state pruning, the verifier will stop accepting old programs
> because it reaches the complexity limit.
>
> I think we need to rearchitect the whole thing.
> I was thinking of doing it compiler-style: convert to ssa and
> do traditional data flow analysis, use-def chains and register
> liveness; then pruning heuristics won't be necessary and the verifier
> should be able to check everything in more or less a single pass.
> Things like register liveness can be done without ssa.  It can
> be used to augment the existing pruning, since it will know which
> registers are dead, so they don't have to be compared, but that
> feels half-way.  I'd rather go all the way.
>
>> On 20/05/17 00:05, Daniel Borkmann wrote:
>>> Besides PTR_TO_PACKET also PTR_TO_MAP_VALUE_OR_NULL uses it to
>>> track all registers (incl. spilled ones) with the same reg->id
>>> that originated from the same map lookup. After the reg type is
>>> then migrated to either PTR_TO_MAP_VALUE (resp. CONST_PTR_TO_MAP
>>> for map in map) or UNKNOWN_VALUE depending on the branch, the
>>> reg->id is then reset to 0 again. Whole reason for this is that
>>> LLVM generates code where it can move and/or spill a reg of type
>>> PTR_TO_MAP_VALUE_OR_NULL to other regs before we do the NULL
>>> test on it, and later on it expects that the spilled or moved
>>> regs work wrt access. So they're marked with an id and then all
>>> of them are type migrated. So here meaning of reg->id is different
>>> than in PTR_TO_PACKET case.
>> Hmm, that means that we can't do arithmetic on a
>>  PTR_TO_MAP_VALUE_OR_NULL, we have to convert it to a PTR_TO_MAP_VALUE
>>  first by NULL-checking it.  That's probably fine, but I can just about
>>  imagine some compiler optimisation reordering them.  Any reason not to
>>  split this out into a different reg->field, rather than overloading id?
>
> 'id' is sort of like 'version' of a pointer and has the same meaning in
> both cases. How exactly do you see this split?

Also, the same id is never reused once generated and later propagated
through regs. So far we haven't run into this kind of optimization
from llvm side yet, but we did run into others which led to requiring
the id marker (see 57a09bf0a416). I could imagine it might be needed
at some point, though, where we later transition directly to
PTR_TO_MAP_VALUE_ADJ after the NULL check. Out of curiosity, did you
run into it with llvm?

>> Of course that would need (more) caution wrt. states_equal(), but it
>>  looks like I'll be mangling that a lot anyway - for instance, we don't
>>  want to just use memcmp() to compare alignments, we want to check that
>>  our alignment is stricter than the old alignment.  (Of course memcmp()
>>  is a conservative check, so the "memcmp() the whole reg_state" fast
>>  path can remain.)
>
> yes. that would be good improvement. Not sure how much it will help
> the pruning though.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Alignment in BPF verifier
  2017-05-23 17:43                               ` Edward Cree
@ 2017-05-23 23:59                                 ` Alexei Starovoitov
  0 siblings, 0 replies; 36+ messages in thread
From: Alexei Starovoitov @ 2017-05-23 23:59 UTC (permalink / raw)
  To: Edward Cree, Daniel Borkmann, David Miller
  Cc: alexei.starovoitov, netdev, Josef Bacik

On 5/23/17 10:43 AM, Edward Cree wrote:
> Another issue: it looks like the min/max_value handling for subtraction is
>  bogus.  In adjust_reg_min_max_vals() we have
>     if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
>         dst_reg->min_value -= min_val;
>     if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
>         dst_reg->max_value -= max_val;
>  where min_val and max_val refer to the src_reg.
> But surely they should be used the other way round; if (say) 2 <= R1 <= 6
>  and 1 <= R2 <= 4, then this will claim 1 <= (R1 - R2) <= 2, whereas really
>  (R1 - R2) could be anything from -2 to 5.
> This also means that the code just above the switch,
>     if (min_val == BPF_REGISTER_MIN_RANGE)
>         dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
>     if (max_val == BPF_REGISTER_MAX_RANGE)
>         dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
>  is wrong, since e.g. subtracting MAX_RANGE needs to blow our min_value,
>  not our max_value.

Right, good catch.  I have a feeling we discussed a similar thing before.
Maybe some patch fell through the cracks.
That's the reason the fancy verifier analysis is root only.
I'm assuming you're going to send a fix?
Thanks!

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Alignment in BPF verifier
  2017-05-23 21:27                                 ` Daniel Borkmann
@ 2017-05-24 13:46                                   ` Edward Cree
  2017-05-24 16:39                                     ` Alexei Starovoitov
  2017-05-25 16:31                                   ` David Miller
  1 sibling, 1 reply; 36+ messages in thread
From: Edward Cree @ 2017-05-24 13:46 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, David Miller
  Cc: alexei.starovoitov, netdev

On 23/05/17 22:27, Daniel Borkmann wrote:
> On 05/23/2017 09:45 PM, Alexei Starovoitov wrote:
>> On 5/23/17 7:41 AM, Edward Cree wrote:
>>> Hmm, that means that we can't do arithmetic on a
>>>  PTR_TO_MAP_VALUE_OR_NULL, we have to convert it to a PTR_TO_MAP_VALUE
>>>  first by NULL-checking it.  That's probably fine, but I can just about
>>>  imagine some compiler optimisation reordering them.  Any reason not to
>>>  split this out into a different reg->field, rather than overloading id?
>>
>> 'id' is sort of like 'version' of a pointer and has the same meaning in
>> both cases. How exactly do you see this split?
I was thinking there would be reg->id and reg->map_id.  Both could share the
 env->id_gen, since that's not likely to run out, but they'd be separate
 fields so that a PTR_TO_MAP_VALUE_OR_NULL could say "this is either map_value
 plus a 4-byte-aligned offset less than 24, or NULL plus that same offset",
 and then if another pointer with the same map_id and no variable-offset part
 was NULL-checked, we could convert both pointers to PTR_TO_MAP_VALUE.  (I'm
 getting rid of PTR_TO_MAP_VALUE_ADJ in my patch, along with several other
 types, by taking the 'we have an offset' part out of the bpf_reg_type.)
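Very roughly, the split would look like this (a sketch only; the
surrounding struct fields and any final layout decisions are omitted):

	/* inside struct bpf_reg_state, alongside the existing fields: */
	u32 id;		/* which variable offset this pointer carries; shared
			 * by all pointers derived from the same addition */
	u32 map_id;	/* which map lookup this (possibly NULL) value came
			 * from; a NULL check on any copy migrates every reg
			 * carrying the same map_id */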
> So far we haven't run into this kind of optimization
> from llvm side yet[...] Out of curiosity, did you run into it with llvm?
No, purely theoretical.  I haven't even built/installed llvm yet, I'm just
 working with the bytecode in test_verifier.c for now.  I'm merely trying to
 not have restrictions that are unnecessary; but since allowing this kind of
 construct would take a non-zero amount of work, I'll file it for later.

-Ed

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Alignment in BPF verifier
  2017-05-24 13:46                                   ` Edward Cree
@ 2017-05-24 16:39                                     ` Alexei Starovoitov
  0 siblings, 0 replies; 36+ messages in thread
From: Alexei Starovoitov @ 2017-05-24 16:39 UTC (permalink / raw)
  To: Edward Cree, Daniel Borkmann, David Miller; +Cc: alexei.starovoitov, netdev

On 5/24/17 6:46 AM, Edward Cree wrote:
> On 23/05/17 22:27, Daniel Borkmann wrote:
>> On 05/23/2017 09:45 PM, Alexei Starovoitov wrote:
>>> On 5/23/17 7:41 AM, Edward Cree wrote:
>>>> Hmm, that means that we can't do arithmetic on a
>>>>  PTR_TO_MAP_VALUE_OR_NULL, we have to convert it to a PTR_TO_MAP_VALUE
>>>>  first by NULL-checking it.  That's probably fine, but I can just about
>>>>  imagine some compiler optimisation reordering them.  Any reason not to
>>>>  split this out into a different reg->field, rather than overloading id?
>>>
>>> 'id' is sort of like 'version' of a pointer and has the same meaning in
>>> both cases. How exactly do you see this split?
> I was thinking there would be reg->id and reg->map_id.  Both could share the
>  env->id_gen, since that's not likely to run out, but they'd be separate
>  fields so that a PTR_TO_MAP_VALUE_OR_NULL could say "this is either map_value
>  plus a 4-byte-aligned offset less than 24, or NULL plus that same offset",
>  and then if another pointer with the same map_id and no variable-offset part
>  was NULL-checked, we could convert both pointers to PTR_TO_MAP_VALUE.  (I'm
>  getting rid of PTR_TO_MAP_VALUE_ADJ in my patch, along with several other
>  types, by taking the 'we have an offset' part out of the bpf_reg_type.)

got it. makes sense.

>> So far we haven't run into this kind of optimization
>> from llvm side yet[...] Out of curiosity, did you run into it with llvm?
> No, purely theoretical.  I haven't even built/installed llvm yet, I'm just
>  working with the bytecode in test_verifier.c for now.  I'm merely trying to
>  not have restrictions that are unnecessary; but since allowing this kind of
>  construct would take a non-zero amount of work, I'll file it for later.

Modern Fedora/Ubuntu come with an llvm that has the bpf backend by default.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Alignment in BPF verifier
  2017-05-23 21:27                                 ` Daniel Borkmann
  2017-05-24 13:46                                   ` Edward Cree
@ 2017-05-25 16:31                                   ` David Miller
  1 sibling, 0 replies; 36+ messages in thread
From: David Miller @ 2017-05-25 16:31 UTC (permalink / raw)
  To: daniel; +Cc: ast, ecree, alexei.starovoitov, netdev

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Tue, 23 May 2017 23:27:20 +0200

> On 05/23/2017 09:45 PM, Alexei Starovoitov wrote:
>> On 5/23/17 7:41 AM, Edward Cree wrote:
>>> Hmm, that means that we can't do arithmetic on a
>>>  PTR_TO_MAP_VALUE_OR_NULL, we have to convert it to a PTR_TO_MAP_VALUE
>>>  first by NULL-checking it.  That's probably fine, but I can just about
>>>  imagine some compiler optimisation reordering them.  Any reason not to
>>>  split this out into a different reg->field, rather than overloading
>>>  id?
>>
>> 'id' is sort of like 'version' of a pointer and has the same meaning
>> in
>> both cases. How exactly do you see this split?
> 
> Also, the same id is never reused once generated and later propagated
> through regs. So far we haven't run into this kind of optimization
> from llvm side yet, but we did run into others which led to requiring
> the id marker (see 57a09bf0a416). I could imagine it might be needed
> at some point, though, where we later transition directly to
> PTR_TO_MAP_VALUE_ADJ after the NULL check. Out of curiosity, did you
> run into it with llvm?

We could handle this issue in find_good_pkt_pointers(); nothing prevents
us from advancing the state there for cases like the one Edward notes above.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread

Thread overview: 36+ messages
2017-05-15 16:04 [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier David Miller
2017-05-16 12:37 ` Edward Cree
2017-05-16 19:52   ` David Miller
2017-05-16 22:53   ` Alexei Starovoitov
2017-05-17 14:00     ` Edward Cree
2017-05-17 15:33       ` Edward Cree
2017-05-18  0:16         ` David Miller
2017-05-18  1:18           ` Alexei Starovoitov
2017-05-18 14:10           ` Edward Cree
2017-05-18  2:48         ` Alexei Starovoitov
2017-05-18 14:49           ` Edward Cree
2017-05-18 16:38             ` Edward Cree
2017-05-18 18:41               ` Edward Cree
2017-05-19  1:22               ` Alexei Starovoitov
2017-05-19 14:21                 ` Edward Cree
2017-05-19 14:55                   ` Alexei Starovoitov
2017-05-19 17:17                     ` Edward Cree
2017-05-19 20:00                       ` Alignment in BPF verifier Edward Cree
2017-05-19 20:39                         ` David Miller
2017-05-19 23:05                           ` Daniel Borkmann
2017-05-23 14:41                             ` Edward Cree
2017-05-23 17:43                               ` Edward Cree
2017-05-23 23:59                                 ` Alexei Starovoitov
2017-05-23 19:45                               ` Alexei Starovoitov
2017-05-23 21:27                                 ` Daniel Borkmann
2017-05-24 13:46                                   ` Edward Cree
2017-05-24 16:39                                     ` Alexei Starovoitov
2017-05-25 16:31                                   ` David Miller
2017-05-19 20:48                         ` David Miller
2017-05-19 20:41                       ` [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier David Miller
2017-05-19 21:37                         ` Alexei Starovoitov
2017-05-19 23:16                           ` David Miller
2017-05-20  0:20                             ` Alexei Starovoitov
2017-05-17 16:13       ` David Miller
2017-05-17 17:00         ` Edward Cree
2017-05-17 17:25           ` David Miller
