* multi bpf filter will impact performance?
@ 2010-11-30  9:22 Rui
  2010-11-30  9:34 ` Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Rui @ 2010-11-30  9:22 UTC (permalink / raw)
  To: netdev

hi

I ran a test with an Intel X520 10G NIC in an HP DL380 G6 to see how
BPF filters impact performance.

kernel 2.6.32, SLES11 SP1, stock ixgbe driver

step 0:
launch 4 tcpdump processes, each with a filter that selects certain
GTP-U packets. As shown by 'tcpdump -d', each BPF program is about 100
instructions.

#!/bin/sh
PCAP_FRAMES=32000 ./tcpdump_MMAP -i eth4 'udp dst port 2152 and (
(((ether[48:1]&0x07)>0) and
(((ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1]+ether[70:1]+ether[71:1]+ether[72:1]+ether[73:1])&0x03)==0))
or (((ether[48:1]&0x07)==0) and
(((ether[62:1]+ether[63:1]+ether[64:1]+ether[65:1]+ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1])&0x03)==0))
) ' -w /dev/null -s 4096 2>f1.log &
PCAP_FRAMES=32000 ./tcpdump_MMAP -i eth4 'udp dst port 2152 and (
(((ether[48:1]&0x07)>0) and
(((ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1]+ether[70:1]+ether[71:1]+ether[72:1]+ether[73:1])&0x03)==1))
or (((ether[48:1]&0x07)==0) and
(((ether[62:1]+ether[63:1]+ether[64:1]+ether[65:1]+ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1])&0x03)==1))
) ' -w /dev/null -s 4096 2>f2.log &
PCAP_FRAMES=32000 ./tcpdump_MMAP -i eth4 'udp dst port 2152 and (
(((ether[48:1]&0x07)>0) and
(((ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1]+ether[70:1]+ether[71:1]+ether[72:1]+ether[73:1])&0x03)==2))
or (((ether[48:1]&0x07)==0) and
(((ether[62:1]+ether[63:1]+ether[64:1]+ether[65:1]+ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1])&0x03)==2))
) ' -w /dev/null -s 4096 2>f3.log &
PCAP_FRAMES=32000 ./tcpdump_MMAP -i eth4 'udp dst port 2152 and (
(((ether[48:1]&0x07)>0) and
(((ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1]+ether[70:1]+ether[71:1]+ether[72:1]+ether[73:1])&0x03)==3))
or (((ether[48:1]&0x07)==0) and
(((ether[62:1]+ether[63:1]+ether[64:1]+ether[65:1]+ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1])&0x03)==3))
) ' -w /dev/null -s 4096 2>f4.log &


step 1:
use stress-test equipment to generate >1.2 Gbps of traffic


step 2:
type 'ifconfig eth4';
it shows many dropped packets

step 3:
type 'sar -n DEV 1'; the incoming throughput is capped at about 800 Mbps


my questions:

Q1. Are the BPF filters run one after another? Is only one filter
executed per socket? (In other words, does the kernel side of each
tcpdump run its filter in parallel?)

Q2. The performance is bad; any ideas on how to improve it?

thanks a lot
rui

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-11-30  9:22 multi bpf filter will impact performance? Rui
@ 2010-11-30  9:34 ` Eric Dumazet
       [not found]   ` <AANLkTi=VpmnrXTBNV7McQm6mq9ULT7KTKbM8_hLPoL=2@mail.gmail.com>
  2010-11-30 10:01 ` multi bpf filter will impact performance? Eric Dumazet
  2010-11-30 11:17 ` Eric Dumazet
  2 siblings, 1 reply; 42+ messages in thread
From: Eric Dumazet @ 2010-11-30  9:34 UTC (permalink / raw)
  To: Rui; +Cc: netdev

On Tuesday, 30 November 2010 at 17:22 +0800, Rui wrote:
> hi
> 
> I ran a test with an Intel X520 10G NIC in an HP DL380 G6 to see how
> BPF filters impact performance.
> 
> kernel 2.6.32, SLES11 SP1, stock ixgbe driver
> 

Could you try the latest net-next-2.6? We optimized BPF a bit lately:

commit 93aaae2e01e57483256b7da05c9a7ebd65ad4686
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Fri Nov 19 09:49:59 2010 -0800

    filter: optimize sk_run_filter
    


> step 0:
> launch 4 tcpdump processes, each with a filter that selects certain
> GTP-U packets. As shown by 'tcpdump -d', each BPF program is about 100
> instructions.
> 
> #!/bin/sh
> PCAP_FRAMES=32000 ./tcpdump_MMAP -i eth4 'udp dst port 2152 and (
> (((ether[48:1]&0x07)>0) and
> (((ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1]+ether[70:1]+ether[71:1]+ether[72:1]+ether[73:1])&0x03)==0))
> or (((ether[48:1]&0x07)==0) and
> (((ether[62:1]+ether[63:1]+ether[64:1]+ether[65:1]+ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1])&0x03)==0))
> ) ' -w /dev/null -s 4096 2>f1.log &
> PCAP_FRAMES=32000 ./tcpdump_MMAP -i eth4 'udp dst port 2152 and (
> (((ether[48:1]&0x07)>0) and
> (((ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1]+ether[70:1]+ether[71:1]+ether[72:1]+ether[73:1])&0x03)==1))
> or (((ether[48:1]&0x07)==0) and
> (((ether[62:1]+ether[63:1]+ether[64:1]+ether[65:1]+ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1])&0x03)==1))
> ) ' -w /dev/null -s 4096 2>f2.log &
> PCAP_FRAMES=32000 ./tcpdump_MMAP -i eth4 'udp dst port 2152 and (
> (((ether[48:1]&0x07)>0) and
> (((ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1]+ether[70:1]+ether[71:1]+ether[72:1]+ether[73:1])&0x03)==2))
> or (((ether[48:1]&0x07)==0) and
> (((ether[62:1]+ether[63:1]+ether[64:1]+ether[65:1]+ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1])&0x03)==2))
> ) ' -w /dev/null -s 4096 2>f3.log &
> PCAP_FRAMES=32000 ./tcpdump_MMAP -i eth4 'udp dst port 2152 and (
> (((ether[48:1]&0x07)>0) and
> (((ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1]+ether[70:1]+ether[71:1]+ether[72:1]+ether[73:1])&0x03)==3))
> or (((ether[48:1]&0x07)==0) and
> (((ether[62:1]+ether[63:1]+ether[64:1]+ether[65:1]+ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1])&0x03)==3))
> ) ' -w /dev/null -s 4096 2>f4.log &
> 
> 

Hmm, do you receive traffic on several queues of your card?

Do you have 4 CPUs running?

grep eth /proc/interrupts


> step 1:
> use stress-test equipment to generate >1.2 Gbps of traffic
> 
> 
> step 2:
> type 'ifconfig eth4';
> it shows many dropped packets
> 
> step 3:
> type 'sar -n DEV 1'; the incoming throughput is capped at about 800 Mbps
> 
> 
> my questions:
> 
> Q1. Are the BPF filters run one after another? Is only one filter
> executed per socket? (In other words, does the kernel side of each
> tcpdump run its filter in parallel?)
> 

The BPF filter runs as part of softirq processing.

> Q2. The performance is bad; any ideas on how to improve it?
> 

Multiqueue card: split the IRQs across separate CPUs.

If not a multiqueue card: use RPS on a recent kernel to split the load
across several CPUs.

Use a filter against the queue, instead of computing a hash yourself in
the BPF program (code added in commit d19742fb, linux-2.6.33).

(You need to tweak libpcap to use the SKF_AD_QUEUE instruction.)

commit d19742fb1c68e6db83b76e06dea5a374c99e104f
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Tue Oct 20 01:06:22 2009 -0700

    filter: Add SKF_AD_QUEUE instruction
    
    It can help being able to filter packets on their queue_mapping.
    
    If filter performance is not good, we could add a "numqueue" field
    in struct packet_type, so that netif_nit_deliver() and other functions
    can directly ignore packets with not expected queue number.
    
    Lets experiment this simple filter extension first.






^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-11-30  9:22 multi bpf filter will impact performance? Rui
  2010-11-30  9:34 ` Eric Dumazet
@ 2010-11-30 10:01 ` Eric Dumazet
  2010-11-30 11:17 ` Eric Dumazet
  2 siblings, 0 replies; 42+ messages in thread
From: Eric Dumazet @ 2010-11-30 10:01 UTC (permalink / raw)
  To: Rui; +Cc: netdev

On Tuesday, 30 November 2010 at 17:22 +0800, Rui wrote:

> step 0:
> launch 4 tcpdump processes, each with a filter that selects certain
> GTP-U packets. As shown by 'tcpdump -d', each BPF program is about 100
> instructions.

So each incoming packet has to go through your 4 filters; that's a total
of ~400 BPF instructions (a bit less on average because of the
conditional branches).




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-11-30  9:22 multi bpf filter will impact performance? Rui
  2010-11-30  9:34 ` Eric Dumazet
  2010-11-30 10:01 ` multi bpf filter will impact performance? Eric Dumazet
@ 2010-11-30 11:17 ` Eric Dumazet
  2 siblings, 0 replies; 42+ messages in thread
From: Eric Dumazet @ 2010-11-30 11:17 UTC (permalink / raw)
  To: Rui; +Cc: netdev

On Tuesday, 30 November 2010 at 17:22 +0800, Rui wrote:

> PCAP_FRAMES=32000 ./tcpdump_MMAP -i eth4 'udp dst port 2152 and (
> (((ether[48:1]&0x07)>0) and
> (((ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1]+ether[70:1]+ether[71:1]+ether[72:1]+ether[73:1])&0x03)==0))
> or (((ether[48:1]&0x07)==0) and
> (((ether[62:1]+ether[63:1]+ether[64:1]+ether[65:1]+ether[66:1]+ether[67:1]+ether[68:1]+ether[69:1])&0x03)==0))
> ) ' -w /dev/null -s 4096 2>f1.log &

(000) ldh      [12]
(001) jeq      #0x86dd          jt 2	jf 6
(002) ldb      [20]
(003) jeq      #0x11            jt 4	jf 95
(004) ldh      [56]
(005) jeq      #0x868           jt 14	jf 95
(006) jeq      #0x800           jt 7	jf 95
(007) ldb      [23]
(008) jeq      #0x11            jt 9	jf 95
(009) ldh      [20]
(010) jset     #0x1fff          jt 95	jf 11
(011) ldxb     4*([14]&0xf)
(012) ldh      [x + 16]
(013) jeq      #0x868           jt 14	jf 95
(014) ldb      [48]
(015) and      #0x7
(016) ldx      #0x0
(017) jgt      x                jt 18	jf 55
(018) ldb      [66]
(019) st       M[4]
(020) ldb      [67]
(021) tax      
(022) ld       M[4]
(023) add      x
(024) st       M[6]
(025) ldb      [68]
(026) tax      
(027) ld       M[6]
(028) add      x
(029) st       M[8]
(030) ldb      [69]
(031) tax      
(032) ld       M[8]
(033) add      x
(034) st       M[10]
(035) ldb      [70]
(036) tax      
(037) ld       M[10]
(038) add      x
(039) st       M[12]
(040) ldb      [71]
(041) tax      
(042) ld       M[12]
(043) add      x
(044) st       M[14]
(045) ldb      [72]
(046) tax      
(047) ld       M[14]
(048) add      x
(049) st       M[0]
(050) ldb      [73]
(051) tax      
(052) ld       M[0]
(053) add      x

[deleted part]

Ouch... we are missing an "add{b,h} [byteoff]" instruction, or an "ldx{b,h} [byteoff]":

ldb	[66]
ldxb	[67]
add	x
ldxb	[68]
add	x
ldxb	[69]
add	x
ldxb	[70]
add	x
ldxb	[71]
add	x
ldxb	[72]
add	x
ldxb	[73]
add	x
...

With the current instruction set, the pcap optimizer could at least do something like:

ldb	[66]
tax
ldb	[67]
add	x
tax
ldb	[68]
add	x
tax
ldb	[69]
add	x
tax
ldb	[70]
add	x
tax
ldb	[71]
add	x
tax
ldb	[72]
add	x
tax
ldb	[73]
add	x
...




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
       [not found]     ` <1291127670.2904.96.camel@edumazet-laptop>
@ 2010-12-01  3:48       ` Rui
  2010-12-01  4:03         ` Eric Dumazet
  2010-12-01  7:36         ` Changli Gao
  0 siblings, 2 replies; 42+ messages in thread
From: Rui @ 2010-12-01  3:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Tue, Nov 30, 2010 at 10:34 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday, 30 November 2010 at 22:21 +0800, Rui wrote:
>
>> does it mean each incoming packet will issue a softirq?
>
> Hmm, if your machine is loaded, one cpu is probably looping inside
> ksoftirqd. You can probably check with "top" command.
>
>> is it possible to know the corresponding sock, then run only one filter?
>>
>
> We could add a SKF_AD_CPU filter, thats a couple of lines ;)
>
> tcpdump   .... "cpu 0 and udp ..."
>
>> >
>> >> Q2. performance is bad? any idea to improve it?
>> >>
>> >
>> > multiqueue card : split each IRQ on a separate cpu.
>> >
>> > If not multiqueue card : use RPS on a recent kernel to split the load on
>> > several cpus.
>> >
>> > Use a filter against the queue, instead of doing a hash code yourself in
>> > the bpf. (code added in commit d19742fb (linux-2.6.33)
>> >
>> > (you need to tweak libpcap to use SKF_AD_QUEUE instruction)
>> >
>> > commit d19742fb1c68e6db83b76e06dea5a374c99e104f
>> > Author: Eric Dumazet <eric.dumazet@gmail.com>
>> > Date:   Tue Oct 20 01:06:22 2009 -0700
>> >
>> >    filter: Add SKF_AD_QUEUE instruction
>> >
>> >    It can help being able to filter packets on their queue_mapping.
>> >
>> >    If filter performance is not good, we could add a "numqueue" field
>> >    in struct packet_type, so that netif_nit_deliver() and other functions
>> >    can directly ignore packets with not expected queue number.
>> >
>> >    Lets experiment this simple filter extension first.
>> >
>> >
>> >
>> >
>> >
>> >
>> yes, I will try SKF_AD_QUEUE with kernel 2.6.33.
>>
>> and I am testing the performance when the traffic falls on the same
>> rx-queue (same src IP and dst IP traffic);
>> in this situation, I need to find another way to hash the traffic.
>
> Hmm, in this case, all trafic is handled by one cpu in ksoftirqd.
> and RPS wont be able to help you, since the skb hash will be the same
> for all packets.
>
> It would need a change in __skb_get_rxhash() to compute rxhash with the
> hash you use  (if udp dest port is 2152, then use ether[66-73] or
> ether[62-69] as the jhash parameter, while we currently only use udp
> addr/port information to compute the rxhash)
>
> Then, you could use RPS to spread the load into 4 separate cpus.
>
>
>
>
One more question:

if RPS can spread the load across 4 separate CPUs, what about
packet_rcv (or tpacket_rcv)? Will they run in parallel?

As I use tcpdump (PACKET_MMAP) to copy packets to user space, I expect
simultaneous packet_rcv calls running on each CPU to put the packets
into the ring buffer.

thank you
rui

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01  3:48       ` Rui
@ 2010-12-01  4:03         ` Eric Dumazet
  2010-12-01  7:45           ` [PATCH net-next-2.6] filter: add SKF_AD_RXHASH and SKF_AD_CPU Eric Dumazet
  2010-12-03  9:40           ` multi bpf filter will impact performance? Junchang Wang
  2010-12-01  7:36         ` Changli Gao
  1 sibling, 2 replies; 42+ messages in thread
From: Eric Dumazet @ 2010-12-01  4:03 UTC (permalink / raw)
  To: Rui; +Cc: netdev

On Wednesday, 1 December 2010 at 11:48 +0800, Rui wrote:

> if RPS can spread the load across 4 separate CPUs, what about
> packet_rcv (or tpacket_rcv)? Will they run in parallel?
> 
> As I use tcpdump (PACKET_MMAP) to copy packets to user space, I expect
> simultaneous packet_rcv calls running on each CPU to put the packets
> into the ring buffer.

Yes, the filter code can run in parallel with no particular slowdown,
since the code and BPF data are shared read-only by all CPUs (no writes).

But it is rather important for performance that each CPU stores packets
into its own packet socket or ring buffer, to avoid false-sharing
slowdowns.

With such a setup (split packets across four CPUs, then make sure each
CPU delivers packets to one particular PACKET socket/ring buffer), it
should really be fast enough.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01  3:48       ` Rui
  2010-12-01  4:03         ` Eric Dumazet
@ 2010-12-01  7:36         ` Changli Gao
  2010-12-01  7:47           ` Eric Dumazet
  1 sibling, 1 reply; 42+ messages in thread
From: Changli Gao @ 2010-12-01  7:36 UTC (permalink / raw)
  To: Rui; +Cc: Eric Dumazet, netdev

[-- Attachment #1: Type: text/plain, Size: 524 bytes --]

On Wed, Dec 1, 2010 at 11:48 AM, Rui <wirelesser@gmail.com> wrote:
> one more question is
>
> if RPS can spread the load across 4 separate CPUs, what about
> packet_rcv (or tpacket_rcv)? Will they run in parallel?
>

You mentioned RPS, but current BPF doesn't have an instruction to
get the current CPU number. You can try the attached patch.

Maybe we can leverage BPF and SO_REUSEPORT to direct the traffic
to the socket instance on the local CPU.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

[-- Attachment #2: bpf_cpu.diff --]
[-- Type: text/plain, Size: 728 bytes --]

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 447a775..35db44a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -124,7 +124,8 @@ struct sock_fprog {	/* Required for SO_ATTACH_FILTER. */
 #define SKF_AD_MARK 	20
 #define SKF_AD_QUEUE	24
 #define SKF_AD_HATYPE	28
-#define SKF_AD_MAX	32
+#define SKF_AD_CPU	32
+#define SKF_AD_MAX	36
 #define SKF_NET_OFF   (-0x100000)
 #define SKF_LL_OFF    (-0x200000)
 
diff --git a/net/core/filter.c b/net/core/filter.c
index a44d27f..3baa3f7 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -410,6 +410,9 @@ load_b:
 				A = 0;
 			continue;
 		}
+		case SKF_AD_CPU:
+			A = raw_smp_processor_id();
+			continue;
 		default:
 			return 0;
 		}

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH net-next-2.6] filter: add SKF_AD_RXHASH and SKF_AD_CPU
  2010-12-01  4:03         ` Eric Dumazet
@ 2010-12-01  7:45           ` Eric Dumazet
  2010-12-01  8:03             ` Changli Gao
  2010-12-06 21:02             ` David Miller
  2010-12-03  9:40           ` multi bpf filter will impact performance? Junchang Wang
  1 sibling, 2 replies; 42+ messages in thread
From: Eric Dumazet @ 2010-12-01  7:45 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Rui

Add SKF_AD_RXHASH and SKF_AD_CPU to filter ancillary mechanism,
to be able to build advanced filters.

This can help spreading packets on several sockets with a fast
selection, after RPS dispatch to N cpus for example, or to catch a
percentage of flows in one queue.

tcpdump -s 500 "cpu = 1" :

[0] ld CPU
[1] jeq #1  jt 2  jf 3 
[2] ret #500
[3] ret #0

# take 12.5 % of flows (average)
tcpdump -s 1000 "rxhash & 7 = 2" :

[0] ld RXHASH
[1] and #7
[2] jeq #2  jt 3  jf 4 
[3] ret #1000
[4] ret #0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Rui <wirelesser@gmail.com>
---
 include/linux/filter.h |    4 +++-
 net/core/filter.c      |    6 ++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 447a775..5334ada 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -124,7 +124,9 @@ struct sock_fprog {	/* Required for SO_ATTACH_FILTER. */
 #define SKF_AD_MARK 	20
 #define SKF_AD_QUEUE	24
 #define SKF_AD_HATYPE	28
-#define SKF_AD_MAX	32
+#define SKF_AD_RXHASH	32
+#define SKF_AD_CPU	36
+#define SKF_AD_MAX	40
 #define SKF_NET_OFF   (-0x100000)
 #define SKF_LL_OFF    (-0x200000)
 
diff --git a/net/core/filter.c b/net/core/filter.c
index a44d27f..054e286 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -375,6 +375,12 @@ load_b:
 				return 0;
 			A = skb->dev->type;
 			continue;
+		case SKF_AD_RXHASH:
+			A = skb->rxhash;
+			continue;
+		case SKF_AD_CPU:
+			A = raw_smp_processor_id();
+			continue;
 		case SKF_AD_NLATTR: {
 			struct nlattr *nla;
 



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01  7:36         ` Changli Gao
@ 2010-12-01  7:47           ` Eric Dumazet
  2010-12-01  7:59             ` Changli Gao
  0 siblings, 1 reply; 42+ messages in thread
From: Eric Dumazet @ 2010-12-01  7:47 UTC (permalink / raw)
  To: Changli Gao; +Cc: Rui, netdev

On Wednesday, 1 December 2010 at 15:36 +0800, Changli Gao wrote:
> On Wed, Dec 1, 2010 at 11:48 AM, Rui <wirelesser@gmail.com> wrote:
> > One more question:
> >
> > if RPS can spread the load across 4 separate CPUs, what about
> > packet_rcv (or tpacket_rcv)? Will they run in parallel?
> >
> 
> You mentioned RPS. But the current bpf doesn't have an instruction to
> get the current CPU number. You can try this patch attached.
> 
> Maybe we can leverage the bpf and SO_REUSEPORT to direct the traffic
> to the socket instance on the local CPU.
> 

Oh well, it seems you were reading over my shoulder; I was preparing
a patch with SKF_AD_RXHASH and SKF_AD_CPU.





^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01  7:47           ` Eric Dumazet
@ 2010-12-01  7:59             ` Changli Gao
  2010-12-01  8:09               ` Eric Dumazet
  0 siblings, 1 reply; 42+ messages in thread
From: Changli Gao @ 2010-12-01  7:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Rui, netdev

On Wed, Dec 1, 2010 at 3:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wednesday, 1 December 2010 at 15:36 +0800, Changli Gao wrote:
>
> Oh well, it seems you were reading over my shoulder; I was preparing
> a patch with SKF_AD_RXHASH and SKF_AD_CPU.
>
>

Nice to hear it. :)

There are too many filters: bpf, iptables and tc. Maybe a unified one
such as nft is needed; then the duplicated code would be reduced. Maybe
it is just a nice dream.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next-2.6] filter: add SKF_AD_RXHASH and SKF_AD_CPU
  2010-12-01  7:45           ` [PATCH net-next-2.6] filter: add SKF_AD_RXHASH and SKF_AD_CPU Eric Dumazet
@ 2010-12-01  8:03             ` Changli Gao
  2010-12-06 21:02             ` David Miller
  1 sibling, 0 replies; 42+ messages in thread
From: Changli Gao @ 2010-12-01  8:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Rui

On Wed, Dec 1, 2010 at 3:45 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Add SKF_AD_RXHASH and SKF_AD_CPU to filter ancillary mechanism,
> to be able to build advanced filters.
>
> This can help spreading packets on several sockets with a fast
> selection, after RPS dispatch to N cpus for example, or to catch a
> percentage of flows in one queue.
>
> tcpdump -s 500 "cpu = 1" :
>
> [0] ld CPU
> [1] jeq #1  jt 2  jf 3
> [2] ret #500
> [3] ret #0
>
> # take 12.5 % of flows (average)
> tcpdump -s 1000 "rxhash & 7 = 2" :
>
> [0] ld RXHASH
> [1] and #7
> [2] jeq #2  jt 3  jf 4
> [3] ret #1000
> [4] ret #0
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Rui <wirelesser@gmail.com>

Acked-by: Changli Gao <xiaosuo@gmail.com>

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01  7:59             ` Changli Gao
@ 2010-12-01  8:09               ` Eric Dumazet
  2010-12-01  8:15                 ` Changli Gao
  0 siblings, 1 reply; 42+ messages in thread
From: Eric Dumazet @ 2010-12-01  8:09 UTC (permalink / raw)
  To: Changli Gao; +Cc: Rui, netdev

On Wednesday, 1 December 2010 at 15:59 +0800, Changli Gao wrote:
> On Wed, Dec 1, 2010 at 3:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Wednesday, 1 December 2010 at 15:36 +0800, Changli Gao wrote:
> >
> > Oh well, it seems you were reading over my shoulder; I was preparing
> > a patch with SKF_AD_RXHASH and SKF_AD_CPU.
> >
> >
> 
> Nice to hear it. :)
> 
> There are too many filters: bpf, iptables and tc. Maybe a unified one
> such as nft is needed; then the duplicated code would be reduced. Maybe
> it is just a nice dream.
> 

You forgot the rxhash function as well; it would be nice to be able to
augment it with custom code if necessary.

A dream would be to have a native compiler, instead of interpreting
pseudo-code...




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01  8:09               ` Eric Dumazet
@ 2010-12-01  8:15                 ` Changli Gao
  2010-12-01  8:42                   ` Eric Dumazet
  0 siblings, 1 reply; 42+ messages in thread
From: Changli Gao @ 2010-12-01  8:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Rui, netdev

On Wed, Dec 1, 2010 at 4:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> A dream would be to have a native compiler, and not interpret pseudo
> code...
>

FYI: FreeBSD's BPF implementation has JIT compilers in the kernel on
both i386 and amd64.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01  8:15                 ` Changli Gao
@ 2010-12-01  8:42                   ` Eric Dumazet
  2010-12-01 17:22                     ` Hagen Paul Pfeifer
  0 siblings, 1 reply; 42+ messages in thread
From: Eric Dumazet @ 2010-12-01  8:42 UTC (permalink / raw)
  To: Changli Gao; +Cc: Rui, netdev

On Wednesday, 1 December 2010 at 16:15 +0800, Changli Gao wrote:
> On Wed, Dec 1, 2010 at 4:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > A dream would be to have a native compiler, and not interpret pseudo
> > code...
> >
> 
> FYI: FreeBSD's BPF implementation has JIT compilers in the kernel on
> both i386 and amd64.
> 


IMHO, a better pcap optimizer would be the first step.

If you take a look at their generated code, it's not a real win over the
code we currently have.


Really.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01  8:42                   ` Eric Dumazet
@ 2010-12-01 17:22                     ` Hagen Paul Pfeifer
  2010-12-01 18:18                       ` David Miller
  0 siblings, 1 reply; 42+ messages in thread
From: Hagen Paul Pfeifer @ 2010-12-01 17:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, Rui, netdev


On Wed, 01 Dec 2010 09:42:47 +0100, Eric Dumazet wrote:

> IMHO, a better pcap optimizer would be the first step.

+1

Optimizing complex filter rules is step one in the process of optimizing
the packet processing. A JIT compiler like FreeBSD provides cannot polish
a (pcap) turd. I thought Patrick was working on a generic filter
mechanism for netfilter!? ... ;)

Hagen



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01 17:22                     ` Hagen Paul Pfeifer
@ 2010-12-01 18:18                       ` David Miller
  2010-12-01 18:24                         ` David Miller
                                           ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: David Miller @ 2010-12-01 18:18 UTC (permalink / raw)
  To: hagen; +Cc: eric.dumazet, xiaosuo, wirelesser, netdev

From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Wed, 01 Dec 2010 18:22:48 +0100

> On Wed, 01 Dec 2010 09:42:47 +0100, Eric Dumazet wrote:
> 
>> IMHO, a better pcap optimizer would be the first step.
> 
> +1
> 
> Optimizing complex filter rules is step one in the process of optimizing
> the packet processing. A JIT compiler like FreeBSD provides cannot polish a
> (pcap)turd. I thought Patrick was working on a generic filter mechanism for
> netfilter!? ... ;)

Yes, and we spoke at the netfilter workshop about making that interpreter
available to socket filters and the packet classifier layer.

However, I think it's still valuable to write a few JIT compilers for
the existing BPF stuff.  I considered working on a sparc64 JIT just to
see what it would look like.

If people work on the BPF optimizer and BPF JITs in parallel, we'll have
both ready at the same time.  win++

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01 18:18                       ` David Miller
@ 2010-12-01 18:24                         ` David Miller
  2010-12-01 18:24                         ` Eric Dumazet
  2010-12-03  6:32                         ` multi bpf filter will impact performance? Eric Dumazet
  2 siblings, 0 replies; 42+ messages in thread
From: David Miller @ 2010-12-01 18:24 UTC (permalink / raw)
  To: hagen; +Cc: eric.dumazet, xiaosuo, wirelesser, netdev

From: David Miller <davem@davemloft.net>
Date: Wed, 01 Dec 2010 10:18:09 -0800 (PST)

> If people work on the BPF optimizer and BPF JITs in parallel, we'll have
> both ready at the same time.  win++

BTW, the JIT is non-trivial work for us because of non-linear SKBs.
We'd need some kind of helper stub or similar, with a straight line
fast path for the linear case.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01 18:18                       ` David Miller
  2010-12-01 18:24                         ` David Miller
@ 2010-12-01 18:24                         ` Eric Dumazet
  2010-12-01 18:44                           ` David Miller
  2010-12-03  6:32                         ` multi bpf filter will impact performance? Eric Dumazet
  2 siblings, 1 reply; 42+ messages in thread
From: Eric Dumazet @ 2010-12-01 18:24 UTC (permalink / raw)
  To: David Miller; +Cc: hagen, xiaosuo, wirelesser, netdev

On Wednesday, 1 December 2010 at 10:18 -0800, David Miller wrote:
> From: Hagen Paul Pfeifer <hagen@jauu.net>
> Date: Wed, 01 Dec 2010 18:22:48 +0100
> 
> > On Wed, 01 Dec 2010 09:42:47 +0100, Eric Dumazet wrote:
> > 
> >> IMHO, a better pcap optimizer would be the first step.
> > 
> > +1
> > 
> > Optimizing complex filter rules is step one in the process of optimizing
> > the packet processing. A JIT compiler like FreeBSD provides cannot polish a
> > (pcap)turd. I thought Patrick was working on a generic filter mechanism for
> > netfilter!? ... ;)
> 
> Yes, and we spoke at the netfilter workshop about making that interpreter
> available to socket filters and the packet classifier layer.
> 
> However, I think it's still valuable to write a few JIT compilers for
> the existing BPF stuff.  I considered working on a sparc64 JIT just to
> see what it would look like.
> 
> If people work on the BPF optimizer and BPF JITs in parallel, we'll have
> both ready at the same time.  win++

A third work in progress (from my side) is to add a check in
sk_chk_filter() to remove the memvalid we added lately to protect the
LOAD M(K).

It is needed anyway for arches without a BPF JIT :)




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01 18:24                         ` Eric Dumazet
@ 2010-12-01 18:44                           ` David Miller
  2010-12-01 19:48                             ` Eric Dumazet
  0 siblings, 1 reply; 42+ messages in thread
From: David Miller @ 2010-12-01 18:44 UTC (permalink / raw)
  To: eric.dumazet; +Cc: hagen, xiaosuo, wirelesser, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 01 Dec 2010 19:24:53 +0100

> A third work in progress (from my side) is to add a check in
> sk_chk_filter() to remove the memvalid we added lately to protect the
> LOAD M(K).

I understand your idea, but the static checkers are still going to
complain.  So better add a huge comment in sk_run_filter() explaining
why the checker's complaint should be ignored :-)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01 18:44                           ` David Miller
@ 2010-12-01 19:48                             ` Eric Dumazet
  2010-12-01 20:23                               ` David Miller
  0 siblings, 1 reply; 42+ messages in thread
From: Eric Dumazet @ 2010-12-01 19:48 UTC (permalink / raw)
  To: David Miller; +Cc: hagen, xiaosuo, wirelesser, netdev

On Wednesday, 1 December 2010 at 10:44 -0800, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 01 Dec 2010 19:24:53 +0100
> 
> > A third work in progress (from my side) is to add a check in
> > sk_chk_filter() to remove the memvalid we added lately to protect the
> > LOAD M(K).
> 
> I understand your idea, but the static checkers are still going to
> complain.  So better add a huge comment in sk_run_filter() explaining
> why the checker's complaint should be ignored :-)

Sure, here is the patch I plan to test ASAP

 net/core/filter.c |   70 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 61 insertions(+), 9 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index a44d27f..1e713b3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -166,11 +166,9 @@ unsigned int sk_run_filter(struct sk_buff *skb, const struct sock_filter *fentry
 	u32 A = 0;			/* Accumulator */
 	u32 X = 0;			/* Index Register */
 	u32 mem[BPF_MEMWORDS];		/* Scratch Memory Store */
-	unsigned long memvalid = 0;
 	u32 tmp;
 	int k;
 
-	BUILD_BUG_ON(BPF_MEMWORDS > BITS_PER_LONG);
 	/*
 	 * Process array of filter instructions.
 	 */
@@ -318,12 +316,10 @@ load_b:
 			X = K;
 			continue;
 		case BPF_S_LD_MEM:
-			A = (memvalid & (1UL << K)) ?
-				mem[K] : 0;
+			A = mem[K];
 			continue;
 		case BPF_S_LDX_MEM:
-			X = (memvalid & (1UL << K)) ?
-				mem[K] : 0;
+			X = mem[K];
 			continue;
 		case BPF_S_MISC_TAX:
 			X = A;
@@ -336,11 +332,9 @@ load_b:
 		case BPF_S_RET_A:
 			return A;
 		case BPF_S_ST:
-			memvalid |= 1UL << K;
 			mem[K] = A;
 			continue;
 		case BPF_S_STX:
-			memvalid |= 1UL << K;
 			mem[K] = X;
 			continue;
 		default:
@@ -419,6 +413,63 @@ load_b:
 }
 EXPORT_SYMBOL(sk_run_filter);
 
+/*
+ * Security :
+ * A BPF program is able to use 16 cells of memory to store intermediate
+ * values (check u32 mem[BPF_MEMWORDS] in sk_run_filter())
+ * As we dont want to clear mem[] array for each packet going through
+ * sk_run_filter(), we check that filter loaded by user never try to read
+ * a cell if not previously written, and we check all branches to be sure
+ * a malicious user doesnt try to abuse us.
+ */
+static int check_load_and_stores(struct sock_filter *filter, int flen)
+{
+	u16 *masks, memvalid = 0; /* one bit per cell, 16 cells */
+	int pc, ret = 0;
+
+	masks = kmalloc(flen * sizeof(*masks), GFP_KERNEL);
+	if (!masks)
+		return -ENOMEM;
+	memset(masks, 0xff, flen * sizeof(*masks));
+
+	for (pc = 0; pc < flen; pc++) {
+		memvalid &= masks[pc];
+		switch (filter[pc].code) {
+		case BPF_S_ST:
+		case BPF_S_STX:
+			memvalid |= (1 << filter[pc].k);
+			break;
+		case BPF_S_LD_MEM:
+		case BPF_S_LDX_MEM:
+			if (!(memvalid & (1 << filter[pc].k))) {
+				pr_err("filter: bad load(%d) memvalid=%x\n", filter[pc].k, memvalid);
+				ret = -EINVAL;
+				goto error;
+			}
+			break;
+		case BPF_S_JMP_JA:
+			/* a jump must set masks on target */
+			masks[pc + 1 + filter[pc].k] &= memvalid;
+			break;
+		case BPF_S_JMP_JEQ_K:
+		case BPF_S_JMP_JEQ_X:
+		case BPF_S_JMP_JGE_K:
+		case BPF_S_JMP_JGE_X:
+		case BPF_S_JMP_JGT_K:
+		case BPF_S_JMP_JGT_X:
+		case BPF_S_JMP_JSET_X:
+		case BPF_S_JMP_JSET_K:
+			/* a jump must set masks on targets */
+			masks[pc + 1 + filter[pc].jt] &= memvalid;
+			masks[pc + 1 + filter[pc].jf] &= memvalid;
+			break;
+		}
+	}
+error:
+	kfree(masks);
+	return ret;
+}
+
 /**
  *	sk_chk_filter - verify socket filter code
  *	@filter: filter to verify
@@ -432,6 +483,7 @@ EXPORT_SYMBOL(sk_run_filter);
  * All jumps are forward as they are not signed.
  *
  * Returns 0 if the rule set is legal or -EINVAL if not.
+ *
  */
 int sk_chk_filter(struct sock_filter *filter, int flen)
 {
@@ -547,7 +599,7 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
 	switch (filter[flen - 1].code) {
 	case BPF_S_RET_K:
 	case BPF_S_RET_A:
-		return 0;
+		return check_load_and_stores(filter, flen);
 	}
 	return -EINVAL;
 }




* Re: multi bpf filter will impact performance?
  2010-12-01 19:48                             ` Eric Dumazet
@ 2010-12-01 20:23                               ` David Miller
  2010-12-01 20:45                                 ` [PATCH net-next-2.6] filter: add a security check at install time Eric Dumazet
  0 siblings, 1 reply; 42+ messages in thread
From: David Miller @ 2010-12-01 20:23 UTC (permalink / raw)
  To: eric.dumazet; +Cc: hagen, xiaosuo, wirelesser, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 01 Dec 2010 20:48:57 +0100

> Le mercredi 01 décembre 2010 à 10:44 -0800, David Miller a écrit :
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Wed, 01 Dec 2010 19:24:53 +0100
>> 
>> > A third work in progress (from my side) is to add a check in
>> > sk_chk_filter() to remove the memvalid we added lately to protect the
>> > LOAD M(K).
>> 
>> I understand your idea, but the static checkers are still going to
>> complain.  So better add a huge comment in sk_run_filter() explaining
>> why the checker's complaint should be ignored :-)
> 
> Sure, here is the patch I plan to test ASAP

Looks good to me.


* [PATCH net-next-2.6] filter: add a security check at install time
  2010-12-01 20:23                               ` David Miller
@ 2010-12-01 20:45                                 ` Eric Dumazet
  2010-12-02  2:30                                   ` Changli Gao
  0 siblings, 1 reply; 42+ messages in thread
From: Eric Dumazet @ 2010-12-01 20:45 UTC (permalink / raw)
  To: David Miller; +Cc: hagen, xiaosuo, wirelesser, netdev, Dan Rosenberg

On Wednesday, 01 December 2010 at 12:23 -0800, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 01 Dec 2010 20:48:57 +0100
> 
> > Le mercredi 01 décembre 2010 à 10:44 -0800, David Miller a écrit :
> >> From: Eric Dumazet <eric.dumazet@gmail.com>
> >> Date: Wed, 01 Dec 2010 19:24:53 +0100
> >> 
> >> > A third work in progress (from my side) is to add a check in
> >> > sk_chk_filter() to remove the memvalid we added lately to protect the
> >> > LOAD M(K).
> >> 
> >> I understand your idea, but the static checkers are still going to
> >> complain.  So better add a huge comment in sk_run_filter() explaining
> >> why the checker's complaint should be ignored :-)
> > 
> > Sure, here is the patch I plan to test ASAP
> 
> Looks good to me.

Yes, it survives the tests I did.

I'm submitting the patch and Cc'ing Dan Rosenberg; I'd like him to
double-check it if he likes.

Thanks

[PATCH net-next-2.6] filter: add a security check at install time

We added some security checks in commit 57fe93b374a6
(filter: make sure filters dont read uninitialized memory) to close a
potential leak of kernel information to the user.

This added an extra cost at run time, while we can instead perform a
check of the filter itself at install time, to make sure a malicious
user doesn't try to abuse us.

This patch adds a check_loads() function, whose sole purpose is to
perform this check, allocating a temporary array of masks. We scan the
filter and propagate bitmask information, telling us whether a load M(K)
is allowed because a previous store M(K) is guaranteed on every path
reaching it. (So that sk_run_filter() can never read uninitialized
memory.)

Note: this can uncover application bugs, denying a filter attach that
was previously allowed.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dan Rosenberg <drosenberg@vsecurity.com>
---
 net/core/filter.c |   70 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 61 insertions(+), 9 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index a44d27f..00a0d50 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -166,11 +166,9 @@ unsigned int sk_run_filter(struct sk_buff *skb, const struct sock_filter *fentry
 	u32 A = 0;			/* Accumulator */
 	u32 X = 0;			/* Index Register */
 	u32 mem[BPF_MEMWORDS];		/* Scratch Memory Store */
-	unsigned long memvalid = 0;
 	u32 tmp;
 	int k;
 
-	BUILD_BUG_ON(BPF_MEMWORDS > BITS_PER_LONG);
 	/*
 	 * Process array of filter instructions.
 	 */
@@ -318,12 +316,10 @@ load_b:
 			X = K;
 			continue;
 		case BPF_S_LD_MEM:
-			A = (memvalid & (1UL << K)) ?
-				mem[K] : 0;
+			A = mem[K];
 			continue;
 		case BPF_S_LDX_MEM:
-			X = (memvalid & (1UL << K)) ?
-				mem[K] : 0;
+			X = mem[K];
 			continue;
 		case BPF_S_MISC_TAX:
 			X = A;
@@ -336,11 +332,9 @@ load_b:
 		case BPF_S_RET_A:
 			return A;
 		case BPF_S_ST:
-			memvalid |= 1UL << K;
 			mem[K] = A;
 			continue;
 		case BPF_S_STX:
-			memvalid |= 1UL << K;
 			mem[K] = X;
 			continue;
 		default:
@@ -419,6 +413,64 @@ load_b:
 }
 EXPORT_SYMBOL(sk_run_filter);
 
+/*
+ * Security :
+ * A BPF program is able to use 16 cells of memory to store intermediate
+ * values (check u32 mem[BPF_MEMWORDS] in sk_run_filter())
+ * As we dont want to clear mem[] array for each packet going through
+ * sk_run_filter(), we check that filter loaded by user never try to read
+ * a cell if not previously written, and we check all branches to be sure
+ * a malicious user doesnt try to abuse us.
+ */
+static int check_loads(struct sock_filter *filter, int flen)
+{
+	u16 *masks, memvalid = 0; /* one bit per cell, 16 cells */
+	int pc, ret = 0;
+
+	BUILD_BUG_ON(BPF_MEMWORDS > 16);
+	masks = kmalloc(flen * sizeof(*masks), GFP_KERNEL);
+	if (!masks)
+		return -ENOMEM;
+	memset(masks, 0xff, flen * sizeof(*masks));
+
+	for (pc = 0; pc < flen; pc++) {
+		memvalid &= masks[pc];
+
+		switch (filter[pc].code) {
+		case BPF_S_ST:
+		case BPF_S_STX:
+			memvalid |= (1 << filter[pc].k);
+			break;
+		case BPF_S_LD_MEM:
+		case BPF_S_LDX_MEM:
+			if (!(memvalid & (1 << filter[pc].k))) {
+				ret = -EINVAL;
+				goto error;
+			}
+			break;
+		case BPF_S_JMP_JA:
+			/* a jump must set masks on target */
+			masks[pc + 1 + filter[pc].k] &= memvalid;
+			break;
+		case BPF_S_JMP_JEQ_K:
+		case BPF_S_JMP_JEQ_X:
+		case BPF_S_JMP_JGE_K:
+		case BPF_S_JMP_JGE_X:
+		case BPF_S_JMP_JGT_K:
+		case BPF_S_JMP_JGT_X:
+		case BPF_S_JMP_JSET_X:
+		case BPF_S_JMP_JSET_K:
+			/* a jump must set masks on targets */
+			masks[pc + 1 + filter[pc].jt] &= memvalid;
+			masks[pc + 1 + filter[pc].jf] &= memvalid;
+			break;
+		}
+	}
+error:
+	kfree(masks);
+	return ret;
+}
+
 /**
  *	sk_chk_filter - verify socket filter code
  *	@filter: filter to verify
@@ -547,7 +599,7 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
 	switch (filter[flen - 1].code) {
 	case BPF_S_RET_K:
 	case BPF_S_RET_A:
-		return 0;
+		return check_loads(filter, flen);
 	}
 	return -EINVAL;
 }




* Re: [PATCH net-next-2.6] filter: add a security check at install time
  2010-12-01 20:45                                 ` [PATCH net-next-2.6] filter: add a security check at install time Eric Dumazet
@ 2010-12-02  2:30                                   ` Changli Gao
  2010-12-02  6:46                                     ` Eric Dumazet
  0 siblings, 1 reply; 42+ messages in thread
From: Changli Gao @ 2010-12-02  2:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, hagen, wirelesser, netdev, Dan Rosenberg

On Thu, Dec 2, 2010 at 4:45 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mercredi 01 décembre 2010 à 12:23 -0800, David Miller a écrit :
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Wed, 01 Dec 2010 20:48:57 +0100
>>
>> > Le mercredi 01 décembre 2010 à 10:44 -0800, David Miller a écrit :
>> >> From: Eric Dumazet <eric.dumazet@gmail.com>
>> >> Date: Wed, 01 Dec 2010 19:24:53 +0100
>> >>
>> >> > A third work in progress (from my side) is to add a check in
>> >> > sk_chk_filter() to remove the memvalid we added lately to protect the
>> >> > LOAD M(K).
>> >>
>> >> I understand your idea, but the static checkers are still going to
>> >> complain.  So better add a huge comment in sk_run_filter() explaining
>> >> why the checker's complaint should be ignored :-)
>> >
>> > Sure, here is the patch I plan to test ASAP
>>
>> Looks good to me.
>
> Yes, it survives tests I did.
>
> I submit the patch and Cc Dan Rosenberg, I would like him to double
> check it if he likes.
>
> Thanks
>
> [PATCH net-next-2.6] filter: add a security check at install time
>
> We added some security checks in commit 57fe93b374a6
> (filter: make sure filters dont read uninitialized memory) to close a
> potential leak of kernel information to user.
>
> This added a potential extra cost at run time, while we can perform a
> check of the filter itself, to make sure a malicious user doesnt try to
> abuse us.
>
> This patch adds a check_loads() function, whole unique purpose is to
> make this check, allocating a temporary array of mask. We scan the
> filter and propagate a bitmask information, telling us if a load M(K) is
> allowed because a previous store M(K) is guaranteed. (So that
> sk_run_filter() can possibly not read unitialized memory)
>
> Note: this can uncover application bug, denying a filter attach,
> previously allowed.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Dan Rosenberg <drosenberg@vsecurity.com>
> ---
>  net/core/filter.c |   70 ++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 61 insertions(+), 9 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index a44d27f..00a0d50 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -166,11 +166,9 @@ unsigned int sk_run_filter(struct sk_buff *skb, const struct sock_filter *fentry
>        u32 A = 0;                      /* Accumulator */
>        u32 X = 0;                      /* Index Register */
>        u32 mem[BPF_MEMWORDS];          /* Scratch Memory Store */
> -       unsigned long memvalid = 0;
>        u32 tmp;
>        int k;
>
> -       BUILD_BUG_ON(BPF_MEMWORDS > BITS_PER_LONG);
>        /*
>         * Process array of filter instructions.
>         */
> @@ -318,12 +316,10 @@ load_b:
>                        X = K;
>                        continue;
>                case BPF_S_LD_MEM:
> -                       A = (memvalid & (1UL << K)) ?
> -                               mem[K] : 0;
> +                       A = mem[K];
>                        continue;
>                case BPF_S_LDX_MEM:
> -                       X = (memvalid & (1UL << K)) ?
> -                               mem[K] : 0;
> +                       X = mem[K];
>                        continue;
>                case BPF_S_MISC_TAX:
>                        X = A;
> @@ -336,11 +332,9 @@ load_b:
>                case BPF_S_RET_A:
>                        return A;
>                case BPF_S_ST:
> -                       memvalid |= 1UL << K;
>                        mem[K] = A;
>                        continue;
>                case BPF_S_STX:
> -                       memvalid |= 1UL << K;
>                        mem[K] = X;
>                        continue;
>                default:
> @@ -419,6 +413,64 @@ load_b:
>  }
>  EXPORT_SYMBOL(sk_run_filter);
>
> +/*
> + * Security :
> + * A BPF program is able to use 16 cells of memory to store intermediate
> + * values (check u32 mem[BPF_MEMWORDS] in sk_run_filter())
> + * As we dont want to clear mem[] array for each packet going through
> + * sk_run_filter(), we check that filter loaded by user never try to read
> + * a cell if not previously written, and we check all branches to be sure
> + * a malicious user doesnt try to abuse us.
> + */
> +static int check_loads(struct sock_filter *filter, int flen)
> +{
> +       u16 *masks, memvalid = 0; /* one bit per cell, 16 cells */
> +       int pc, ret = 0;
> +
> +       BUILD_BUG_ON(BPF_MEMWORDS > 16);
> +       masks = kmalloc(flen * sizeof(*masks), GFP_KERNEL);
> +       if (!masks)
> +               return -ENOMEM;
> +       memset(masks, 0xff, flen * sizeof(*masks));
> +
> +       for (pc = 0; pc < flen; pc++) {
> +               memvalid &= masks[pc];
> +

It seems wrong. Think about the following instructions:

/* m[1] isn't set */
   jeq jt jf
jt:
   st m[1]
   jmp ja
jf:
   jmp ja2 /* m[1] is invalidated by masks */
ja:
   ld m[1] /* -EINVAL is returned */
ja2:

So you need to search all the possible branches to validate the instructions.

> +               switch (filter[pc].code) {
> +               case BPF_S_ST:
> +               case BPF_S_STX:
> +                       memvalid |= (1 << filter[pc].k);
> +                       break;
> +               case BPF_S_LD_MEM:
> +               case BPF_S_LDX_MEM:
> +                       if (!(memvalid & (1 << filter[pc].k))) {
> +                               ret = -EINVAL;
> +                               goto error;
> +                       }
> +                       break;
> +               case BPF_S_JMP_JA:
> +                       /* a jump must set masks on target */
> +                       masks[pc + 1 + filter[pc].k] &= memvalid;
> +                       break;
> +               case BPF_S_JMP_JEQ_K:
> +               case BPF_S_JMP_JEQ_X:
> +               case BPF_S_JMP_JGE_K:
> +               case BPF_S_JMP_JGE_X:
> +               case BPF_S_JMP_JGT_K:
> +               case BPF_S_JMP_JGT_X:
> +               case BPF_S_JMP_JSET_X:
> +               case BPF_S_JMP_JSET_K:
> +                       /* a jump must set masks on targets */
> +                       masks[pc + 1 + filter[pc].jt] &= memvalid;
> +                       masks[pc + 1 + filter[pc].jf] &= memvalid;
> +                       break;
> +               }
> +       }
> +error:
> +       kfree(masks);
> +       return ret;
> +}
> +
>  /**
>  *     sk_chk_filter - verify socket filter code
>  *     @filter: filter to verify
> @@ -547,7 +599,7 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
>        switch (filter[flen - 1].code) {
>        case BPF_S_RET_K:
>        case BPF_S_RET_A:
> -               return 0;
> +               return check_loads(filter, flen);
>        }
>        return -EINVAL;
>  }
>
>
>



-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH net-next-2.6] filter: add a security check at install time
  2010-12-02  2:30                                   ` Changli Gao
@ 2010-12-02  6:46                                     ` Eric Dumazet
  2010-12-02  8:11                                       ` Changli Gao
  2010-12-06 21:07                                       ` David Miller
  0 siblings, 2 replies; 42+ messages in thread
From: Eric Dumazet @ 2010-12-02  6:46 UTC (permalink / raw)
  To: Changli Gao; +Cc: David Miller, hagen, wirelesser, netdev, Dan Rosenberg

On Thursday, 02 December 2010 at 10:30 +0800, Changli Gao wrote:

> It seems wrong. Think about the following instructions:
> 
> /* m[1] isn't set */
>    jeq jt jf
> jt:
>    st m[1]
>    jmp ja
> jf:
>    jmp ja2 /* m[1] is invalidated by masks */
> ja:
>    ld m[1] /* -EINVAL is returned */
> ja2:
> 
> So you need to search all the possible branches to validate the instructions.

Well, I already did this branch search.

It's only that the instruction following a JMP should not inherit
memvalid from the JMP itself, but rather the AND of the memvalid values
of all possible paths that can reach this instruction.

I'll think about it after some coffee, but I feel I might just set
memvalid to ~0 after the JMPs.

Thanks

[PATCH v2 net-next-2.6] filter: add a security check at install time

We added some security checks in commit 57fe93b374a6
(filter: make sure filters dont read uninitialized memory) to close a
potential leak of kernel information to the user.

This added an extra cost at run time, while we can instead perform a
check of the filter itself at install time, to make sure a malicious
user doesn't try to abuse us.

This patch adds a check_loads() function, whose sole purpose is to
perform this check, allocating a temporary array of masks. We scan the
filter and propagate bitmask information, telling us whether a load M(K)
is allowed because a previous store M(K) is guaranteed on every path
reaching it. (So that sk_run_filter() can never read uninitialized
memory.)

Note: this can uncover application bugs, denying a filter attach that
was previously allowed.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: Changli Gao <xiaosuo@gmail.com>
---
v2: set memvalid to ~0 on JMP instructions

diff --git a/net/core/filter.c b/net/core/filter.c
index a44d27f..0d636ef 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -166,11 +166,9 @@ unsigned int sk_run_filter(struct sk_buff *skb, const struct sock_filter *fentry
 	u32 A = 0;			/* Accumulator */
 	u32 X = 0;			/* Index Register */
 	u32 mem[BPF_MEMWORDS];		/* Scratch Memory Store */
-	unsigned long memvalid = 0;
 	u32 tmp;
 	int k;
 
-	BUILD_BUG_ON(BPF_MEMWORDS > BITS_PER_LONG);
 	/*
 	 * Process array of filter instructions.
 	 */
@@ -318,12 +316,10 @@ load_b:
 			X = K;
 			continue;
 		case BPF_S_LD_MEM:
-			A = (memvalid & (1UL << K)) ?
-				mem[K] : 0;
+			A = mem[K];
 			continue;
 		case BPF_S_LDX_MEM:
-			X = (memvalid & (1UL << K)) ?
-				mem[K] : 0;
+			X = mem[K];
 			continue;
 		case BPF_S_MISC_TAX:
 			X = A;
@@ -336,11 +332,9 @@ load_b:
 		case BPF_S_RET_A:
 			return A;
 		case BPF_S_ST:
-			memvalid |= 1UL << K;
 			mem[K] = A;
 			continue;
 		case BPF_S_STX:
-			memvalid |= 1UL << K;
 			mem[K] = X;
 			continue;
 		default:
@@ -419,6 +413,66 @@ load_b:
 }
 EXPORT_SYMBOL(sk_run_filter);
 
+/*
+ * Security :
+ * A BPF program is able to use 16 cells of memory to store intermediate
+ * values (check u32 mem[BPF_MEMWORDS] in sk_run_filter())
+ * As we dont want to clear mem[] array for each packet going through
+ * sk_run_filter(), we check that filter loaded by user never try to read
+ * a cell if not previously written, and we check all branches to be sure
+ * a malicious user doesnt try to abuse us.
+ */
+static int check_load_and_stores(struct sock_filter *filter, int flen)
+{
+	u16 *masks, memvalid = 0; /* one bit per cell, 16 cells */
+	int pc, ret = 0;
+
+	BUILD_BUG_ON(BPF_MEMWORDS > 16);
+	masks = kmalloc(flen * sizeof(*masks), GFP_KERNEL);
+	if (!masks)
+		return -ENOMEM;
+	memset(masks, 0xff, flen * sizeof(*masks));
+
+	for (pc = 0; pc < flen; pc++) {
+		memvalid &= masks[pc];
+
+		switch (filter[pc].code) {
+		case BPF_S_ST:
+		case BPF_S_STX:
+			memvalid |= (1 << filter[pc].k);
+			break;
+		case BPF_S_LD_MEM:
+		case BPF_S_LDX_MEM:
+			if (!(memvalid & (1 << filter[pc].k))) {
+				ret = -EINVAL;
+				goto error;
+			}
+			break;
+		case BPF_S_JMP_JA:
+			/* a jump must set masks on target */
+			masks[pc + 1 + filter[pc].k] &= memvalid;
+			memvalid = ~0;
+			break;
+		case BPF_S_JMP_JEQ_K:
+		case BPF_S_JMP_JEQ_X:
+		case BPF_S_JMP_JGE_K:
+		case BPF_S_JMP_JGE_X:
+		case BPF_S_JMP_JGT_K:
+		case BPF_S_JMP_JGT_X:
+		case BPF_S_JMP_JSET_X:
+		case BPF_S_JMP_JSET_K:
+			/* a jump must set masks on targets */
+			masks[pc + 1 + filter[pc].jt] &= memvalid;
+			masks[pc + 1 + filter[pc].jf] &= memvalid;
+			memvalid = ~0;
+			break;
+		}
+	}
+error:
+	kfree(masks);
+	return ret;
+}
+
 /**
  *	sk_chk_filter - verify socket filter code
  *	@filter: filter to verify
@@ -547,7 +601,7 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
 	switch (filter[flen - 1].code) {
 	case BPF_S_RET_K:
 	case BPF_S_RET_A:
-		return 0;
+		return check_load_and_stores(filter, flen);
 	}
 	return -EINVAL;
 }







* Re: [PATCH net-next-2.6] filter: add a security check at install time
  2010-12-02  6:46                                     ` Eric Dumazet
@ 2010-12-02  8:11                                       ` Changli Gao
  2010-12-02  8:53                                         ` Eric Dumazet
  2010-12-06 21:07                                       ` David Miller
  1 sibling, 1 reply; 42+ messages in thread
From: Changli Gao @ 2010-12-02  8:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, hagen, wirelesser, netdev, Dan Rosenberg

On Thu, Dec 2, 2010 at 2:46 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> [PATCH v2 net-next-2.6] filter: add a security check at install time
>
> We added some security checks in commit 57fe93b374a6
> (filter: make sure filters dont read uninitialized memory) to close a
> potential leak of kernel information to user.
>
> This added a potential extra cost at run time, while we can perform a
> check of the filter itself, to make sure a malicious user doesnt try to
> abuse us.
>
> This patch adds a check_loads() function, whole unique purpose is to
> make this check, allocating a temporary array of mask. We scan the
> filter and propagate a bitmask information, telling us if a load M(K) is
> allowed because a previous store M(K) is guaranteed. (So that
> sk_run_filter() can possibly not read unitialized memory)
>
> Note: this can uncover application bug, denying a filter attach,
> previously allowed.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Dan Rosenberg <drosenberg@vsecurity.com>
> Cc: Changli Gao <xiaosuo@gmail.com>

It seems correct to me now.

Acked-by: Changli Gao <xiaosuo@gmail.com>

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: [PATCH net-next-2.6] filter: add a security check at install time
  2010-12-02  8:11                                       ` Changli Gao
@ 2010-12-02  8:53                                         ` Eric Dumazet
  2010-12-02  9:00                                           ` Eric Dumazet
  0 siblings, 1 reply; 42+ messages in thread
From: Eric Dumazet @ 2010-12-02  8:53 UTC (permalink / raw)
  To: Changli Gao; +Cc: David Miller, hagen, wirelesser, netdev, Dan Rosenberg

On Thursday, 02 December 2010 at 16:11 +0800, Changli Gao wrote:

> It seems correct to me now.
> 
> Acked-by: Changli Gao <xiaosuo@gmail.com>
> 

Thanks for reviewing, Changli.

Now I am thinking about not denying the filter installation, but instead
changing the problematic LOAD M(1) and LOADX M(1) into LOADI #0
(BPF_S_LD_IMM K=0) and LOADIX #0 (BPF_S_LDX_IMM K=0).

(i.e. pretend the value of the memory cell is 0, not a random value
taken from the stack)


[PATCH v3 net-next-2.6] filter: add a security check at install time

We added some security checks in commit 57fe93b374a6
(filter: make sure filters dont read uninitialized memory) to close a
potential leak of kernel information to the user.

This added an extra cost at run time, while we can instead perform a
check of the filter itself at install time, to make sure a malicious
user doesn't try to abuse us.

This patch adds a check_loads() function, whose sole purpose is to
perform this check, allocating a temporary array of masks. We scan the
filter and propagate bitmask information, telling us whether a load M(K)
is allowed because a previous store M(K) is guaranteed on every path
reaching it.

If we detect a problematic load M(K), we replace it with a load of the
immediate value 0.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: Changli Gao <xiaosuo@gmail.com>
---
v3: replace problematic loads M(K) by load of immediate 0 value,
    dont report an error to user.
v2: set memvalid to ~0 on JMP instructions

 net/core/filter.c |   78 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 69 insertions(+), 9 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index a44d27f..4456a6c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -166,11 +166,9 @@ unsigned int sk_run_filter(struct sk_buff *skb, const struct sock_filter *fentry
 	u32 A = 0;			/* Accumulator */
 	u32 X = 0;			/* Index Register */
 	u32 mem[BPF_MEMWORDS];		/* Scratch Memory Store */
-	unsigned long memvalid = 0;
 	u32 tmp;
 	int k;
 
-	BUILD_BUG_ON(BPF_MEMWORDS > BITS_PER_LONG);
 	/*
 	 * Process array of filter instructions.
 	 */
@@ -318,12 +316,10 @@ load_b:
 			X = K;
 			continue;
 		case BPF_S_LD_MEM:
-			A = (memvalid & (1UL << K)) ?
-				mem[K] : 0;
+			A = mem[K];
 			continue;
 		case BPF_S_LDX_MEM:
-			X = (memvalid & (1UL << K)) ?
-				mem[K] : 0;
+			X = mem[K];
 			continue;
 		case BPF_S_MISC_TAX:
 			X = A;
@@ -336,11 +332,9 @@ load_b:
 		case BPF_S_RET_A:
 			return A;
 		case BPF_S_ST:
-			memvalid |= 1UL << K;
 			mem[K] = A;
 			continue;
 		case BPF_S_STX:
-			memvalid |= 1UL << K;
 			mem[K] = X;
 			continue;
 		default:
@@ -419,6 +413,72 @@ load_b:
 }
 EXPORT_SYMBOL(sk_run_filter);
 
+/*
+ * Security :
+ * A BPF program is able to use 16 cells of memory to store intermediate
+ * values (check u32 mem[BPF_MEMWORDS] in sk_run_filter())
+ * As we dont want to clear mem[] array for each packet going through
+ * sk_run_filter(), we check that filter loaded by user never try to read
+ * a cell if not previously written, and we check all branches to be sure
+ * a malicious user doesnt try to abuse us.
+ * If such malicious (or buggy) read is detected, its replaced by a
+ * load of immediate zero value.
+ */
+static int check_load_and_stores(struct sock_filter *filter, int flen)
+{
+	u16 *masks, memvalid = 0; /* one bit per cell, 16 cells */
+	int pc;
+
+	BUILD_BUG_ON(BPF_MEMWORDS > 16);
+	masks = kmalloc(flen * sizeof(*masks), GFP_KERNEL);
+	if (!masks)
+		return -ENOMEM;
+	memset(masks, 0xff, flen * sizeof(*masks));
+
+	for (pc = 0; pc < flen; pc++) {
+		memvalid &= masks[pc];
+
+		switch (filter[pc].code) {
+		case BPF_S_ST:
+		case BPF_S_STX:
+			memvalid |= (1 << filter[pc].k);
+			break;
+		case BPF_S_LD_MEM:
+			if (!(memvalid & (1 << filter[pc].k))) {
+				filter[pc].code = BPF_S_LD_IMM;
+				filter[pc].k = 0;
+			}
+			break;
+		case BPF_S_LDX_MEM:
+			if (!(memvalid & (1 << filter[pc].k))) {
+				filter[pc].code = BPF_S_LDX_IMM;
+				filter[pc].k = 0;
+			}
+			break;
+		case BPF_S_JMP_JA:
+			/* a jump must set masks on target */
+			masks[pc + 1 + filter[pc].k] &= memvalid;
+			memvalid = ~0;
+			break;
+		case BPF_S_JMP_JEQ_K:
+		case BPF_S_JMP_JEQ_X:
+		case BPF_S_JMP_JGE_K:
+		case BPF_S_JMP_JGE_X:
+		case BPF_S_JMP_JGT_K:
+		case BPF_S_JMP_JGT_X:
+		case BPF_S_JMP_JSET_X:
+		case BPF_S_JMP_JSET_K:
+			/* a jump must set masks on targets */
+			masks[pc + 1 + filter[pc].jt] &= memvalid;
+			masks[pc + 1 + filter[pc].jf] &= memvalid;
+			memvalid = ~0;
+			break;
+		}
+	}
+	kfree(masks);
+	return 0;
+}
+
 /**
  *	sk_chk_filter - verify socket filter code
  *	@filter: filter to verify
@@ -547,7 +607,7 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
 	switch (filter[flen - 1].code) {
 	case BPF_S_RET_K:
 	case BPF_S_RET_A:
-		return 0;
+		return check_load_and_stores(filter, flen);
 	}
 	return -EINVAL;
 }




* Re: [PATCH net-next-2.6] filter: add a security check at install time
  2010-12-02  8:53                                         ` Eric Dumazet
@ 2010-12-02  9:00                                           ` Eric Dumazet
  2010-12-02  9:10                                             ` Changli Gao
  2010-12-02 10:59                                             ` Changli Gao
  0 siblings, 2 replies; 42+ messages in thread
From: Eric Dumazet @ 2010-12-02  9:00 UTC (permalink / raw)
  To: Changli Gao; +Cc: David Miller, hagen, wirelesser, netdev, Dan Rosenberg

On Thursday, 02 December 2010 at 09:53 +0100, Eric Dumazet wrote:
> Le jeudi 02 décembre 2010 à 16:11 +0800, Changli Gao a écrit :
> 
> > It seems correct to me now.
> > 
> > Acked-by: Changli Gao <xiaosuo@gmail.com>
> > 
> 
> Thanks for reviewing Changli.
> 
> Now I am thinking about not denying the filter installation, but change
> the problematic LOAD M(1)  and LOADX M(1)  by LOADI #0 (BPF_S_LD_IMM
> K=0) and LOADIX #0 (BPF_S_LDX_IMM K=0)
> 
> (ie pretend the value of memory is 0, not a random value taken from
> stack)
> 
> 
> [PATCH v3 net-next-2.6] filter: add a security check at install time

Doh, I sent a version with the old (v1) check_load_and_stores() name;
here is a v4 with the shorter name check_loads(), as mentioned in the
changelog.

Sorry for the mess.

[PATCH v4 net-next-2.6] filter: add a security check at install time

We added some security checks in commit 57fe93b374a6
(filter: make sure filters dont read uninitialized memory) to close a
potential leak of kernel information to the user.

This added an extra cost at run time, while we can instead perform a
check of the filter itself at install time, to make sure a malicious
user doesn't try to abuse us.

This patch adds a check_loads() function, whose sole purpose is to
perform this check, allocating a temporary array of masks. We scan the
filter and propagate bitmask information, telling us whether a load M(K)
is allowed because a previous store M(K) is guaranteed on every path
reaching it.

If we detect a problematic load M(K), we replace it with a load of the
immediate value 0.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: Changli Gao <xiaosuo@gmail.com>
---
v4: really use check_loads(), not check_load_and_stores()
v3: replace problematic loads M(K) by load of immediate 0 value,
    dont report an error to user.
v2: set memvalid to ~0 on JMP instructions

 net/core/filter.c |   78 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 69 insertions(+), 9 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index a44d27f..2bd7dbc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -166,11 +166,9 @@ unsigned int sk_run_filter(struct sk_buff *skb, const struct sock_filter *fentry
 	u32 A = 0;			/* Accumulator */
 	u32 X = 0;			/* Index Register */
 	u32 mem[BPF_MEMWORDS];		/* Scratch Memory Store */
-	unsigned long memvalid = 0;
 	u32 tmp;
 	int k;
 
-	BUILD_BUG_ON(BPF_MEMWORDS > BITS_PER_LONG);
 	/*
 	 * Process array of filter instructions.
 	 */
@@ -318,12 +316,10 @@ load_b:
 			X = K;
 			continue;
 		case BPF_S_LD_MEM:
-			A = (memvalid & (1UL << K)) ?
-				mem[K] : 0;
+			A = mem[K];
 			continue;
 		case BPF_S_LDX_MEM:
-			X = (memvalid & (1UL << K)) ?
-				mem[K] : 0;
+			X = mem[K];
 			continue;
 		case BPF_S_MISC_TAX:
 			X = A;
@@ -336,11 +332,9 @@ load_b:
 		case BPF_S_RET_A:
 			return A;
 		case BPF_S_ST:
-			memvalid |= 1UL << K;
 			mem[K] = A;
 			continue;
 		case BPF_S_STX:
-			memvalid |= 1UL << K;
 			mem[K] = X;
 			continue;
 		default:
@@ -419,6 +413,72 @@ load_b:
 }
 EXPORT_SYMBOL(sk_run_filter);
 
+/*
+ * Security :
+ * A BPF program is able to use 16 cells of memory to store intermediate
+ * values (check u32 mem[BPF_MEMWORDS] in sk_run_filter())
+ * As we dont want to clear mem[] array for each packet going through
+ * sk_run_filter(), we check that filter loaded by user never try to read
+ * a cell if not previously written, and we check all branches to be sure
+ * a malicious user doesnt try to abuse us.
+ * If such malicious (or buggy) read is detected, its replaced by a
+ * load of immediate zero value.
+ */
+static int check_loads(struct sock_filter *filter, int flen)
+{
+	u16 *masks, memvalid = 0; /* one bit per cell, 16 cells */
+	int pc;
+
+	BUILD_BUG_ON(BPF_MEMWORDS > 16);
+	masks = kmalloc(flen * sizeof(*masks), GFP_KERNEL);
+	if (!masks)
+		return -ENOMEM;
+	memset(masks, 0xff, flen * sizeof(*masks));
+
+	for (pc = 0; pc < flen; pc++) {
+		memvalid &= masks[pc];
+
+		switch (filter[pc].code) {
+		case BPF_S_ST:
+		case BPF_S_STX:
+			memvalid |= (1 << filter[pc].k);
+			break;
+		case BPF_S_LD_MEM:
+			if (!(memvalid & (1 << filter[pc].k))) {
+				filter[pc].code = BPF_S_LD_IMM;
+				filter[pc].k = 0;
+			}
+			break;
+		case BPF_S_LDX_MEM:
+			if (!(memvalid & (1 << filter[pc].k))) {
+				filter[pc].code = BPF_S_LDX_IMM;
+				filter[pc].k = 0;
+			}
+			break;
+		case BPF_S_JMP_JA:
+			/* a jump must set masks on target */
+			masks[pc + 1 + filter[pc].k] &= memvalid;
+			memvalid = ~0;
+			break;
+		case BPF_S_JMP_JEQ_K:
+		case BPF_S_JMP_JEQ_X:
+		case BPF_S_JMP_JGE_K:
+		case BPF_S_JMP_JGE_X:
+		case BPF_S_JMP_JGT_K:
+		case BPF_S_JMP_JGT_X:
+		case BPF_S_JMP_JSET_X:
+		case BPF_S_JMP_JSET_K:
+			/* a jump must set masks on targets */
+			masks[pc + 1 + filter[pc].jt] &= memvalid;
+			masks[pc + 1 + filter[pc].jf] &= memvalid;
+			memvalid = ~0;
+			break;
+		}
+	}
+	kfree(masks);
+	return 0;
+}
+
 /**
  *	sk_chk_filter - verify socket filter code
  *	@filter: filter to verify
@@ -547,7 +607,7 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
 	switch (filter[flen - 1].code) {
 	case BPF_S_RET_K:
 	case BPF_S_RET_A:
-		return 0;
+		return check_loads(filter, flen);
 	}
 	return -EINVAL;
 }



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next-2.6] filter: add a security check at install time
  2010-12-02  9:00                                           ` Eric Dumazet
@ 2010-12-02  9:10                                             ` Changli Gao
  2010-12-02  9:54                                               ` Eric Dumazet
  2010-12-02 10:59                                             ` Changli Gao
  1 sibling, 1 reply; 42+ messages in thread
From: Changli Gao @ 2010-12-02  9:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, hagen, wirelesser, netdev, Dan Rosenberg

On Thu, Dec 2, 2010 at 5:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 02 décembre 2010 à 09:53 +0100, Eric Dumazet a écrit :
>> Le jeudi 02 décembre 2010 à 16:11 +0800, Changli Gao a écrit :
>>
>> > It seems correct to me now.
>> >
>> > Acked-by: Changli Gao <xiaosuo@gmail.com>
>> >
>>
>> Thanks for reviewing Changli.
>>
>> Now I am thinking about not denying the filter installation, but change
>> the problematic LOAD M(1)  and LOADX M(1)  by LOADI #0 (BPF_S_LD_IMM
>> K=0) and LOADIX #0 (BPF_S_LDX_IMM K=0)

Oops. We were wrong. The RAM of the BPF machine is initialized to 0, so
loading from a cell in which no value has been stored before is valid.
So we can't reject the following instruction sequence.

jeq jt jf
jt:
store m[0]
jf:
load m[0]

After applying your patch, the third instruction will be replaced with
a load of 0, which is wrong for the jt branch. So NACK.


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
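[Editor's note: a minimal user-space sketch of the mask-propagation pass
from the v4 patch, with simplified stand-in opcodes (not the kernel's
BPF_S_* values or struct sock_filter). Run on the jeq/store/load
sequence above, it rewrites the load, which is exactly the behaviour
being NACKed here:]

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy opcodes standing in for the kernel's BPF_S_* values. */
enum { OP_ST, OP_LD_MEM, OP_LD_IMM, OP_JEQ, OP_RET };

struct insn { int code; int jt, jf, k; };

/* Mirror of the patch's pass: one "memvalid" bit per scratch cell,
 * propagated along fall-through and jump edges via a per-insn mask.
 * Returns the number of loads rewritten to "load immediate 0". */
static int check_loads(struct insn *f, int flen)
{
	uint16_t *masks, memvalid = 0;
	int pc, fixed = 0;

	masks = malloc(flen * sizeof(*masks));
	memset(masks, 0xff, flen * sizeof(*masks));

	for (pc = 0; pc < flen; pc++) {
		memvalid &= masks[pc];	/* keep only what every path into pc guarantees */
		switch (f[pc].code) {
		case OP_ST:
			memvalid |= 1 << f[pc].k;	/* cell k written from here on */
			break;
		case OP_LD_MEM:
			if (!(memvalid & (1 << f[pc].k))) {
				f[pc].code = OP_LD_IMM;	/* unwritten cell: read 0 instead */
				f[pc].k = 0;
				fixed++;
			}
			break;
		case OP_JEQ:
			/* both targets may only rely on what is valid here */
			masks[pc + 1 + f[pc].jt] &= memvalid;
			masks[pc + 1 + f[pc].jf] &= memvalid;
			memvalid = ~0;
			break;
		}
	}
	free(masks);
	return fixed;
}
```

On {jeq; jt: store m[0]; jf: load m[0]; ret}, the pass rewrites the load
to a load of immediate 0, because the store is guaranteed only on the jt
path, not on every path reaching the load.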

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next-2.6] filter: add a security check at install time
  2010-12-02  9:10                                             ` Changli Gao
@ 2010-12-02  9:54                                               ` Eric Dumazet
  2010-12-02 10:10                                                 ` Changli Gao
  0 siblings, 1 reply; 42+ messages in thread
From: Eric Dumazet @ 2010-12-02  9:54 UTC (permalink / raw)
  To: Changli Gao; +Cc: David Miller, hagen, wirelesser, netdev, Dan Rosenberg

Le jeudi 02 décembre 2010 à 17:10 +0800, Changli Gao a écrit :
> On Thu, Dec 2, 2010 at 5:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le jeudi 02 décembre 2010 à 09:53 +0100, Eric Dumazet a écrit :
> >> Le jeudi 02 décembre 2010 à 16:11 +0800, Changli Gao a écrit :
> >>
> >> > It seems correct to me now.
> >> >
> >> > Acked-by: Changli Gao <xiaosuo@gmail.com>
> >> >
> >>
> >> Thanks for reviewing Changli.
> >>
> >> Now I am thinking about not denying the filter installation, but change
> >> the problematic LOAD M(1)  and LOADX M(1)  by LOADI #0 (BPF_S_LD_IMM
> >> K=0) and LOADIX #0 (BPF_S_LDX_IMM K=0)
> 
> Oops. We were wrong. The RAM of BPF machine is initialized to 0. So
> loading from a cell, in which no value is stored before, is valid. So
> we can't prevent the following instructions.
> 

It was not 'initialized to 0'; that's the point of the previous patches.



> jeq jt jf

> jt:
> store m[0]
> jf:
> load m[0]
> 
> After applying your patch, the third instruction will be replaced with
> load 0. It is wrong for the jt branch. So NACK.
> 
> 

But this is _exactly_ the case we want to deny (or protect against).

We want to:

- Accept valid programs generated by libpcap's current and future
optimizers. Show me a real sample.

A valid program doesn't mix stores and loads the way your example does.

Memory cells are used only because of the limited instruction set and
registers (A, X).

- Make sure a malicious, stupid or buggy program doesn't read garbage
from the stack.

After the optimizer, your program should reduce to (no memory loads/stores):

RET #0


To let your 'program' run, we could add temporary state saying:

Memory cell K has a known value m(k), or an unknown one.
Register A has a known value a, or an unknown one.
Register X has a known value x, or an unknown one.

We would then be able to 'optimize' pointless "jeq jt jf" tests when the
value of A is known, since we know the result of the test (only one
branch will be taken).

I am not sure it's worth it, really, since the whole instruction set
would have to be taken into account to maintain this state (implementing
a kind of optimizer in the kernel).

It's probably better to spend the time on the userland optimizer, and on
a JIT compiler...

(By the way, I believe FreeBSD has the security problem Dan reported to
us)
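[Editor's note: the known-value tracking Eric sketches could look like
the following user-space prototype; the opcodes and names (fold_jeq,
struct astate) are hypothetical, not kernel API:]

```c
#include <stdbool.h>
#include <stdint.h>

/* Track whether the accumulator A holds a compile-time constant. */
struct astate { bool known; uint32_t val; };

enum { LD_IMM, JEQ_K, OTHER };
struct insn { int code; uint32_t k; uint8_t jt, jf; };

/* If A's value is known, a "jeq #k jt jf" test folds at install time:
 * return the offset of the branch that will always be taken, or -1 if
 * the result still depends on the packet. */
static int fold_jeq(const struct insn *in, const struct astate *a)
{
	if (in->code != JEQ_K || !a->known)
		return -1;
	return a->val == in->k ? in->jt : in->jf;
}
```

This is the core of the in-kernel "optimizer" Eric judges not worth the
maintenance cost: every opcode would need an update rule for this state.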




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next-2.6] filter: add a security check at install time
  2010-12-02  9:54                                               ` Eric Dumazet
@ 2010-12-02 10:10                                                 ` Changli Gao
  2010-12-02 11:15                                                   ` Eric Dumazet
  0 siblings, 1 reply; 42+ messages in thread
From: Changli Gao @ 2010-12-02 10:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, hagen, wirelesser, netdev, Dan Rosenberg

On Thu, Dec 2, 2010 at 5:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 02 décembre 2010 à 17:10 +0800, Changli Gao a écrit :
>>
>> Oops. We were wrong. The RAM of BPF machine is initialized to 0. So
>> loading from a cell, in which no value is stored before, is valid. So
>> we can't prevent the following instructions.
>>
>
> It was not 'initialized to 0', thats the point of previous patches.
>

I checked the implementation of bpf in FreeBSD, and found its RAM isn't
initialized to 0. So it can't be a common 'feature', and no application
relies on it. Maybe we can drop this accidentally added 'feature' and
break the 'ABI'.

Now I agree with you totally. Thanks for your explanation.

Acked-by: Changli Gao <xiaosuo@gmail.com>

>
> (By the way, I believe FreeBSD has the security problem Dan reported to us)

Yes, it doesn't do this check.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next-2.6] filter: add a security check at install time
  2010-12-02  9:00                                           ` Eric Dumazet
  2010-12-02  9:10                                             ` Changli Gao
@ 2010-12-02 10:59                                             ` Changli Gao
  1 sibling, 0 replies; 42+ messages in thread
From: Changli Gao @ 2010-12-02 10:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, hagen, wirelesser, netdev, Dan Rosenberg

On Thu, Dec 2, 2010 at 5:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 02 décembre 2010 à 09:53 +0100, Eric Dumazet a écrit :
>> Le jeudi 02 décembre 2010 à 16:11 +0800, Changli Gao a écrit :
>>
>> > It seems correct to me now.
>> >
>> > Acked-by: Changli Gao <xiaosuo@gmail.com>
>> >
>>
>> Thanks for reviewing Changli.
>>
>> Now I am thinking about not denying the filter installation, but change
>> the problematic LOAD M(1)  and LOADX M(1)  by LOADI #0 (BPF_S_LD_IMM
>> K=0) and LOADIX #0 (BPF_S_LDX_IMM K=0)
>>
>> (ie pretend the value of memory is 0, not a random value taken from
>> stack)
>>
>>
>> [PATCH v3 net-next-2.6] filter: add a security check at install time
>
> Doh, I sent a version with old (V1) check_load_and_stores() name, here
> is a V4 with shorter name check_loads() as mentioned in changelog.
>
> Sorry for the mess.
>
> [PATCH v4 net-next-2.6] filter: add a security check at install time
>
> We added some security checks in commit 57fe93b374a6
> (filter: make sure filters dont read uninitialized memory) to close a
> potential leak of kernel information to user.
>
> This added a potential extra cost at run time, while we can perform a
> check of the filter itself, to make sure a malicious user doesnt try to
> abuse us.
>
> This patch adds a check_loads() function, whole unique purpose is to
> make this check, allocating a temporary array of mask. We scan the
> filter and propagate a bitmask information, telling us if a load M(K) is
> allowed because a previous store M(K) is guaranteed.
>
> If we detect a problematic load M(K), we replace it by a load of
> immediate value 0
>

Since the RAM of BPF isn't initialized to zero and this kind of
instruction sequence is invalid, I think we'd better return -EINVAL
instead of silently mangling the instruction. I prefer V2.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next-2.6] filter: add a security check at install time
  2010-12-02 10:10                                                 ` Changli Gao
@ 2010-12-02 11:15                                                   ` Eric Dumazet
  2010-12-02 11:29                                                     ` Changli Gao
  0 siblings, 1 reply; 42+ messages in thread
From: Eric Dumazet @ 2010-12-02 11:15 UTC (permalink / raw)
  To: Changli Gao; +Cc: David Miller, hagen, wirelesser, netdev, Dan Rosenberg

Le jeudi 02 décembre 2010 à 18:10 +0800, Changli Gao a écrit :

> > (By the way, I believe FreeBSD has the security problem Dan reported to us)
> 
> Yes. it doesn't do this check.
> 

Their scratch memory is not on the stack but part of the filter, so
there is no security problem (you can only read previous values of
scratch registers written by your own filter while handling a previous
packet).

This means it is either run on a single CPU, or there is a hidden SMP
bug in their implementation.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next-2.6] filter: add a security check at install time
  2010-12-02 11:15                                                   ` Eric Dumazet
@ 2010-12-02 11:29                                                     ` Changli Gao
  2010-12-02 13:14                                                       ` Eric Dumazet
  0 siblings, 1 reply; 42+ messages in thread
From: Changli Gao @ 2010-12-02 11:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, hagen, wirelesser, netdev, Dan Rosenberg

On Thu, Dec 2, 2010 at 7:15 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Their scratch memory is not on stack but part of the filter, so no
> security problem (You can only read previous values of scratch registers
> written by your own filter on handling a previous packet.)
>

The code I checked is the newest in SVN. The scratch memory really is on the stack.

/*
 * Execute the filter program starting at pc on the packet p
 * wirelen is the length of the original packet
 * buflen is the amount of data present
 */
u_int
bpf_filter(const struct bpf_insn *pc, u_char *p, u_int wirelen, u_int buflen)
{
        u_int32_t A = 0, X = 0;
        bpf_u_int32 k;
        u_int32_t mem[BPF_MEMWORDS];



-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next-2.6] filter: add a security check at install time
  2010-12-02 11:29                                                     ` Changli Gao
@ 2010-12-02 13:14                                                       ` Eric Dumazet
  0 siblings, 0 replies; 42+ messages in thread
From: Eric Dumazet @ 2010-12-02 13:14 UTC (permalink / raw)
  To: Changli Gao; +Cc: David Miller, hagen, wirelesser, netdev, Dan Rosenberg

Le jeudi 02 décembre 2010 à 19:29 +0800, Changli Gao a écrit :
> On Thu, Dec 2, 2010 at 7:15 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > Their scratch memory is not on stack but part of the filter, so no
> > security problem (You can only read previous values of scratch registers
> > written by your own filter on handling a previous packet.)
> >
> 
> The code I checked is the newest in SVN. The scratch memory is really on stack.
> 

> u_int
> bpf_filter(const struct bpf_insn *pc, u_char *p, u_int wirelen, u_int buflen)
> {
>         u_int32_t A = 0, X = 0;
>         bpf_u_int32 k;
>         u_int32_t mem[BPF_MEMWORDS];
> 
> 
> 

This legacy code is not used on x86 now that they have a JIT by default?

My remark about SMP 'problem' was about their JIT implementation.

net/bpf_jitter.h
/* Structure describing a native filtering program created by the jitter. */

typedef struct bpf_jit_filter {
        /* The native filtering binary, in the form of a bpf_filter_func. */
        bpf_filter_func func;

        int             mem[BPF_MEMWORDS];      /* Scratch memory */
} bpf_jit_filter;

Apparently they use locking around this stuff:

BPFD_LOCK(d);
...
   bf = bpf_jitter_enable != 0 ? d->bd_bfilter : NULL;
   if (bf != NULL)
	slen = (*(bf->func))(pkt, pktlen, pktlen); 
   else
	slen = bpf_filter(d->bd_rfilter, pkt, pktlen, pktlen);

BPFD_UNLOCK(d);




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01 18:18                       ` David Miller
  2010-12-01 18:24                         ` David Miller
  2010-12-01 18:24                         ` Eric Dumazet
@ 2010-12-03  6:32                         ` Eric Dumazet
  2010-12-05 20:53                           ` PATCH] filter: fix sk_filter rcu handling Eric Dumazet
  2 siblings, 1 reply; 42+ messages in thread
From: Eric Dumazet @ 2010-12-03  6:32 UTC (permalink / raw)
  To: David Miller; +Cc: hagen, xiaosuo, wirelesser, netdev

Le mercredi 01 décembre 2010 à 10:18 -0800, David Miller a écrit :

> However, I think it's still valuable to write a few JIT compilers for
> the existing BPF stuff.  I considered working on a sparc64 JIT just to
> see what it would look like.
> 
> If people work on the BPF optimizer and BPF JITs in parallel, we'll have
> both ready at the same time.  win++

I began work on implementing a BPF JIT for x86_64

My plan is to use external helpers to load skb data/metadata, to keep
the BPF program very short and to have no dependencies on struct
layouts.

These helpers would be the three loaders: load_word, load_half and
load_byte.

When the requested bits are in the skb head, these helpers should be
fast.

For practical reasons, they would be in ASM for their fast path, and C
for the slow path. They are in ASM because they can take a shortcut (in
case of error, unwinding the stack to perform the "return 0;") so that
we don't have to test their return value in the JIT-generated
program.
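[Editor's note: a C sketch of what the fast path of such a helper might
look like, assuming a linear (skb-head) buffer. The name and signature
are illustrative only, not the eventual kernel helpers; the real ASM
versions would also reach fragmented data in their slow path and unwind
the stack on error instead of returning a code:]

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fast-path helper: fetch a big-endian 16-bit word at
 * offset k from the linear part of a packet, or fail so the caller
 * can make the filter return 0 (drop). */
static int load_half(const uint8_t *data, size_t len, size_t k,
		     uint32_t *val)
{
	if (len < sizeof(uint16_t) || k > len - sizeof(uint16_t))
		return -1;			/* out of bounds */
	/* packets are in network byte order (big-endian) */
	*val = ((uint32_t)data[k] << 8) | data[k + 1];
	return 0;
}
```

The bounds check is the part the JIT-generated code gets to skip testing
when the helper itself performs the error shortcut.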




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: multi bpf filter will impact performance?
  2010-12-01  4:03         ` Eric Dumazet
  2010-12-01  7:45           ` [PATCH net-next-2.6] filter: add SKF_AD_RXHASH and SKF_AD_CPU Eric Dumazet
@ 2010-12-03  9:40           ` Junchang Wang
  1 sibling, 0 replies; 42+ messages in thread
From: Junchang Wang @ 2010-12-03  9:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Rui, netdev, Fenghua Yu

On Wed, Dec 1, 2010 at 12:03 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> But its rather important for performance that each cpu store packets
> into its own packet socket or ring buffer, to avoid false sharing
> slowdowns.

Hi Eric,
But the current situation is that a single socket (AF_PACKET, for
example) can only create one ring buffer, meaning that multiple cores
will compete when passing data to user-space programs. Right?

>
> With such a setup (split packets to four cpus, then make sure one cpu
> deliver packets to one particular PACKET socket/ring buffer), it should
> really be fast enough.
>

Is there any patch or configuration for this? I'm preparing such a
patch, and I wonder whether there's a conflict.


Thanks.

--Junchang

^ permalink raw reply	[flat|nested] 42+ messages in thread

* PATCH] filter: fix sk_filter rcu handling
  2010-12-03  6:32                         ` multi bpf filter will impact performance? Eric Dumazet
@ 2010-12-05 20:53                           ` Eric Dumazet
  2010-12-05 21:08                             ` Andi Kleen
  2010-12-06 17:29                             ` David Miller
  0 siblings, 2 replies; 42+ messages in thread
From: Eric Dumazet @ 2010-12-05 20:53 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, hagen, xiaosuo, wirelesser, Paul E. McKenney, stable,
	Pavel Emelyanov

Le vendredi 03 décembre 2010 à 07:32 +0100, Eric Dumazet a écrit :
> Le mercredi 01 décembre 2010 à 10:18 -0800, David Miller a écrit :
> 
> > However, I think it's still valuable to write a few JIT compilers for
> > the existing BPF stuff.  I considered working on a sparc64 JIT just to
> > see what it would look like.
> > 
> > If people work on the BPF optimizer and BPF JITs in parallel, we'll have
> > both ready at the same time.  win++
> 
> I began work on implementing a BPF JIT for x86_64
> 
> My plan is to use external helpers to load skb data/metadata, to keep
> BPF program very short and have no dependencies against struct layouts.
> 
> These helpers would be the three load_word, load_half, load_byte.
> 
> In case the bits are in skb head, these helpers should be fast.
> 
> For practical reasons, they would be in ASM for their fast path, and C
> for the slow path. They are ASM because they are able to perform the
> shortcut (in case of error, doing the stack unwind to perform the
> "return 0;") so that we dont have to test their return from the JIT
> program.
> 
> 

While working on this, I found an annoying problem in the current code.

This patch is a stable candidate.

Thanks


[PATCH] filter: fix sk_filter rcu handling

Pavel Emelyanov tried to fix a race between sk_filter_(de|at)tach and
sk_clone() in commit 47e958eac280c263397

The problem is that we can have several clones sharing a common
sk_filter, and these clones might want to sk_filter_attach() their own
filters at the same time, overwriting old_filter->rcu and corrupting
the RCU queues.

We cannot use filter->rcu without being sure no other thread could do
the same thing.

Switch the code to a more conventional ref-counting technique: do the
atomic decrement immediately and queue one RCU callback when the last
reference is released.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: stable@kernel.org
---
 include/net/sock.h |    4 +++-
 net/core/filter.c  |   19 ++++++-------------
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index a6338d0..4308af7 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1155,6 +1155,8 @@ extern void sk_common_release(struct sock *sk);
 /* Initialise core socket variables */
 extern void sock_init_data(struct socket *sock, struct sock *sk);
 
+extern void sk_filter_release_rcu(struct rcu_head *rcu);
+
 /**
  *	sk_filter_release - release a socket filter
  *	@fp: filter to remove
@@ -1165,7 +1167,7 @@ extern void sock_init_data(struct socket *sock, struct sock *sk);
 static inline void sk_filter_release(struct sk_filter *fp)
 {
 	if (atomic_dec_and_test(&fp->refcnt))
-		kfree(fp);
+		call_rcu_bh(&fp->rcu, sk_filter_release_rcu);
 }
 
 static inline void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp)
diff --git a/net/core/filter.c b/net/core/filter.c
index c1ee800..ae21a0d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -589,23 +589,16 @@ int sk_chk_filter(struct sock_filter *filter, int flen)
 EXPORT_SYMBOL(sk_chk_filter);
 
 /**
- * 	sk_filter_rcu_release - Release a socket filter by rcu_head
+ * 	sk_filter_release_rcu - Release a socket filter by rcu_head
  *	@rcu: rcu_head that contains the sk_filter to free
  */
-static void sk_filter_rcu_release(struct rcu_head *rcu)
+void sk_filter_release_rcu(struct rcu_head *rcu)
 {
 	struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu);
 
-	sk_filter_release(fp);
-}
-
-static void sk_filter_delayed_uncharge(struct sock *sk, struct sk_filter *fp)
-{
-	unsigned int size = sk_filter_len(fp);
-
-	atomic_sub(size, &sk->sk_omem_alloc);
-	call_rcu_bh(&fp->rcu, sk_filter_rcu_release);
+	kfree(fp);
 }
+EXPORT_SYMBOL(sk_filter_release_rcu);
 
 /**
  *	sk_attach_filter - attach a socket filter
@@ -649,7 +642,7 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 	rcu_assign_pointer(sk->sk_filter, fp);
 
 	if (old_fp)
-		sk_filter_delayed_uncharge(sk, old_fp);
+		sk_filter_uncharge(sk, old_fp);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sk_attach_filter);
@@ -663,7 +656,7 @@ int sk_detach_filter(struct sock *sk)
 					   sock_owned_by_user(sk));
 	if (filter) {
 		rcu_assign_pointer(sk->sk_filter, NULL);
-		sk_filter_delayed_uncharge(sk, filter);
+		sk_filter_uncharge(sk, filter);
 		ret = 0;
 	}
 	return ret;
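[Editor's note: a user-space analogue of the ref-counting pattern the
changelog describes. All names here are hypothetical; atomic_fetch_sub
stands in for the kernel's atomic_dec_and_test, and the function pointer
stands in for the rcu_head plus call_rcu_bh(). The key property is that
only the thread dropping the last reference queues the deferred free, so
the callback storage is written at most once:]

```c
#include <stdatomic.h>
#include <stdlib.h>

struct filter {
	atomic_int refcnt;
	/* stands in for struct rcu_head + the call_rcu_bh() callback */
	void (*deferred_free)(struct filter *);
};

static int frees_done;	/* test instrumentation */

static void count_free(struct filter *fp)
{
	frees_done++;
	free(fp);
}

static void filter_release(struct filter *fp)
{
	/* Decrement immediately; atomic_fetch_sub returns the previous
	 * value, so exactly one caller sees 1 and queues the free.
	 * Concurrent droppers never touch the callback storage. */
	if (atomic_fetch_sub(&fp->refcnt, 1) == 1)
		fp->deferred_free(fp);
}
```

Contrast with the buggy scheme: if every detach/attach path called the
RCU queuing function directly, two clones could both write the shared
rcu_head, which is the corruption the patch closes.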


_______________________________________________
stable mailing list
stable@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/stable

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: PATCH] filter: fix sk_filter rcu handling
  2010-12-05 20:53                           ` PATCH] filter: fix sk_filter rcu handling Eric Dumazet
@ 2010-12-05 21:08                             ` Andi Kleen
  2010-12-05 21:28                               ` Eric Dumazet
  2010-12-06 17:29                             ` David Miller
  1 sibling, 1 reply; 42+ messages in thread
From: Andi Kleen @ 2010-12-05 21:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, hagen, stable, xiaosuo, wirelesser, Paul E. McKenney,
	David Miller, Pavel Emelyanov

> While working on this, I found an annoying problem with current code.
> 
> This patch is a stable candidate.

Does this actually fix a real-world problem someone is experiencing?
"Theoretical races" are normally outside stable's scope.

-Andi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PATCH] filter: fix sk_filter rcu handling
  2010-12-05 21:08                             ` Andi Kleen
@ 2010-12-05 21:28                               ` Eric Dumazet
  0 siblings, 0 replies; 42+ messages in thread
From: Eric Dumazet @ 2010-12-05 21:28 UTC (permalink / raw)
  To: Andi Kleen
  Cc: netdev, hagen, stable, xiaosuo, wirelesser, Paul E. McKenney,
	David Miller, Pavel Emelyanov

Le dimanche 05 décembre 2010 à 22:08 +0100, Andi Kleen a écrit :
> > While working on this, I found an annoying problem with current code.
> > 
> > This patch is a stable candidate.
> 
> Does this actually fix a real world problem someone is experiencing? 
> "Theoretical races" are normally outside stable's scope.
> 
> -Andi

I guess it's a security problem.

We can probably write a program, run by an unprivileged user, that
crashes the machine.

I am not sure I want to spend time writing this program, as I have
other, more interesting topics, but some hackers probably will.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: PATCH] filter: fix sk_filter rcu handling
  2010-12-05 20:53                           ` PATCH] filter: fix sk_filter rcu handling Eric Dumazet
  2010-12-05 21:08                             ` Andi Kleen
@ 2010-12-06 17:29                             ` David Miller
  1 sibling, 0 replies; 42+ messages in thread
From: David Miller @ 2010-12-06 17:29 UTC (permalink / raw)
  To: eric.dumazet; +Cc: hagen, xiaosuo, wirelesser, netdev, xemul, stable, paulmck

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 05 Dec 2010 21:53:52 +0100

> [PATCH] filter: fix sk_filter rcu handling
> 
> Pavel Emelyanov tried to fix a race between sk_filter_(de|at)tach and
> sk_clone() in commit 47e958eac280c263397
> 
> Problem is we can have several clones sharing a common sk_filter, and
> these clones might want to sk_filter_attach() their own filters at the
> same time, and can overwrite old_filter->rcu, corrupting RCU queues.
> 
> We can not use filter->rcu without being sure no other thread could do
> the same thing.
> 
> Switch code to a more conventional ref-counting technique : Do the
> atomic decrement immediately and queue one rcu call back when last
> reference is released.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next-2.6] filter: add SKF_AD_RXHASH and SKF_AD_CPU
  2010-12-01  7:45           ` [PATCH net-next-2.6] filter: add SKF_AD_RXHASH and SKF_AD_CPU Eric Dumazet
  2010-12-01  8:03             ` Changli Gao
@ 2010-12-06 21:02             ` David Miller
  1 sibling, 0 replies; 42+ messages in thread
From: David Miller @ 2010-12-06 21:02 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, wirelesser

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 01 Dec 2010 08:45:56 +0100

> Add SKF_AD_RXHASH and SKF_AD_CPU to filter ancillary mechanism,
> to be able to build advanced filters.
> 
> This can help spreading packets on several sockets with a fast
> selection, after RPS dispatch to N cpus for example, or to catch a
> percentage of flows in one queue.
> 
> tcpdump -s 500 "cpu = 1" :
> 
> [0] ld CPU
> [1] jeq #1  jt 2  jf 3 
> [2] ret #500
> [3] ret #0
> 
> # take 12.5 % of flows (average)
> tcpdump -s 1000 "rxhash & 7 = 2" :
> 
> [0] ld RXHASH
> [1] and #7
> [2] jeq #2  jt 3  jf 4 
> [3] ret #1000
> [4] ret #0
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next-2.6] filter: add a security check at install time
  2010-12-02  6:46                                     ` Eric Dumazet
  2010-12-02  8:11                                       ` Changli Gao
@ 2010-12-06 21:07                                       ` David Miller
  1 sibling, 0 replies; 42+ messages in thread
From: David Miller @ 2010-12-06 21:07 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiaosuo, hagen, wirelesser, netdev, drosenberg

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 02 Dec 2010 07:46:24 +0100

> [PATCH v2 net-next-2.6] filter: add a security check at install time

Applied.

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2010-12-06 21:07 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-30  9:22 multi bpf filter will impact performance? Rui
2010-11-30  9:34 ` Eric Dumazet
     [not found]   ` <AANLkTi=VpmnrXTBNV7McQm6mq9ULT7KTKbM8_hLPoL=2@mail.gmail.com>
     [not found]     ` <1291127670.2904.96.camel@edumazet-laptop>
2010-12-01  3:48       ` Rui
2010-12-01  4:03         ` Eric Dumazet
2010-12-01  7:45           ` [PATCH net-next-2.6] filter: add SKF_AD_RXHASH and SKF_AD_CPU Eric Dumazet
2010-12-01  8:03             ` Changli Gao
2010-12-06 21:02             ` David Miller
2010-12-03  9:40           ` multi bpf filter will impact performance? Junchang Wang
2010-12-01  7:36         ` Changli Gao
2010-12-01  7:47           ` Eric Dumazet
2010-12-01  7:59             ` Changli Gao
2010-12-01  8:09               ` Eric Dumazet
2010-12-01  8:15                 ` Changli Gao
2010-12-01  8:42                   ` Eric Dumazet
2010-12-01 17:22                     ` Hagen Paul Pfeifer
2010-12-01 18:18                       ` David Miller
2010-12-01 18:24                         ` David Miller
2010-12-01 18:24                         ` Eric Dumazet
2010-12-01 18:44                           ` David Miller
2010-12-01 19:48                             ` Eric Dumazet
2010-12-01 20:23                               ` David Miller
2010-12-01 20:45                                 ` [PATCH net-next-2.6] filter: add a security check at install time Eric Dumazet
2010-12-02  2:30                                   ` Changli Gao
2010-12-02  6:46                                     ` Eric Dumazet
2010-12-02  8:11                                       ` Changli Gao
2010-12-02  8:53                                         ` Eric Dumazet
2010-12-02  9:00                                           ` Eric Dumazet
2010-12-02  9:10                                             ` Changli Gao
2010-12-02  9:54                                               ` Eric Dumazet
2010-12-02 10:10                                                 ` Changli Gao
2010-12-02 11:15                                                   ` Eric Dumazet
2010-12-02 11:29                                                     ` Changli Gao
2010-12-02 13:14                                                       ` Eric Dumazet
2010-12-02 10:59                                             ` Changli Gao
2010-12-06 21:07                                       ` David Miller
2010-12-03  6:32                         ` multi bpf filter will impact performance? Eric Dumazet
2010-12-05 20:53                           ` PATCH] filter: fix sk_filter rcu handling Eric Dumazet
2010-12-05 21:08                             ` Andi Kleen
2010-12-05 21:28                               ` Eric Dumazet
2010-12-06 17:29                             ` David Miller
2010-11-30 10:01 ` multi bpf filter will impact performance? Eric Dumazet
2010-11-30 11:17 ` Eric Dumazet
