* [PATCH net-next v2 0/2] net: permit skb_segment on head_frag frag_list skb
@ 2018-03-20 15:54 Yonghong Song
2018-03-20 15:55 ` [PATCH net-next v2 1/2] " Yonghong Song
2018-03-20 15:55 ` [PATCH net-next v2 2/2] net: bpf: add a test for skb_segment in test_bpf module Yonghong Song
0 siblings, 2 replies; 5+ messages in thread
From: Yonghong Song @ 2018-03-20 15:54 UTC (permalink / raw)
To: edumazet, ast, daniel, diptanu, netdev; +Cc: kernel-team
One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
function skb_segment(), line 3667. The bpf program attaches to
clsact ingress, calls bpf_skb_change_proto to change protocol
from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
to send the changed packet out.
...
3665 while (pos < offset + len) {
3666 if (i >= nfrags) {
3667 BUG_ON(skb_headlen(list_skb));
...
The triggering input skb has the following properties:
list_skb = skb->frag_list;
skb->nfrags != NULL && skb_headlen(list_skb) != 0
and skb_segment() is not able to handle a frag_list skb
if its headlen (list_skb->len - list_skb->data_len) is not 0.
Patch #1 provides a simple solution to avoid BUG_ON. If
list_skb->head_frag is true, its page-backed frag will
be processed before the list_skb->frags.
Patch #2 provides a test case in test_bpf module which
constructs a skb and calls skb_segment() directly. The test
case is able to trigger the BUG_ON without Patch #1.
Changelog:
v1 -> v2:
. Removed never-hit BUG_ON, spotted by Linyu Yuan.
Yonghong Song (2):
net: permit skb_segment on head_frag frag_list skb
net: bpf: add a test for skb_segment in test_bpf module
lib/test_bpf.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
net/core/skbuff.c | 42 ++++++++++++++++++++++----------
2 files changed, 99 insertions(+), 14 deletions(-)
--
2.9.5
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH net-next v2 1/2] net: permit skb_segment on head_frag frag_list skb
2018-03-20 15:54 [PATCH net-next v2 0/2] net: permit skb_segment on head_frag frag_list skb Yonghong Song
@ 2018-03-20 15:55 ` Yonghong Song
2018-03-20 18:08 ` Alexander Duyck
2018-03-20 15:55 ` [PATCH net-next v2 2/2] net: bpf: add a test for skb_segment in test_bpf module Yonghong Song
1 sibling, 1 reply; 5+ messages in thread
From: Yonghong Song @ 2018-03-20 15:55 UTC (permalink / raw)
To: edumazet, ast, daniel, diptanu, netdev; +Cc: kernel-team
One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
function skb_segment(), line 3667. The bpf program attaches to
clsact ingress, calls bpf_skb_change_proto to change protocol
from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
to send the changed packet out.
3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
3473 netdev_features_t features)
3474 {
3475 struct sk_buff *segs = NULL;
3476 struct sk_buff *tail = NULL;
...
3665 while (pos < offset + len) {
3666 if (i >= nfrags) {
3667 BUG_ON(skb_headlen(list_skb));
3668
3669 i = 0;
3670 nfrags = skb_shinfo(list_skb)->nr_frags;
3671 frag = skb_shinfo(list_skb)->frags;
3672 frag_skb = list_skb;
...
call stack:
...
#1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
#2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
#3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
#4 [ffff883ffef03668] die at ffffffff8101deb2
#5 [ffff883ffef03698] do_trap at ffffffff8101a700
#6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
#7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
#8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
[exception RIP: skb_segment+3044]
RIP: ffffffff817e4dd4 RSP: ffff883ffef03860 RFLAGS: 00010216
RAX: 0000000000002bf6 RBX: ffff883feb7aaa00 RCX: 0000000000000011
RDX: ffff883fb87910c0 RSI: 0000000000000011 RDI: ffff883feb7ab500
RBP: ffff883ffef03928 R8: 0000000000002ce2 R9: 00000000000027da
R10: 000001ea00000000 R11: 0000000000002d82 R12: ffff883f90a1ee80
R13: ffff883fb8791120 R14: ffff883feb7abc00 R15: 0000000000002ce2
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
--- <IRQ stack> ---
...
The triggering input skb has the following properties:
list_skb = skb->frag_list;
skb->nfrags != NULL && skb_headlen(list_skb) != 0
and skb_segment() is not able to handle a frag_list skb
if its headlen (list_skb->len - list_skb->data_len) is not 0.
This patch addressed the issue by handling skb_headlen(list_skb) != 0
case properly if list_skb->head_frag is true, which is expected in
most cases. A one-element frag array is created for the list_skb head
and processed before list_skb->frags are processed.
Reported-by: Diptanu Gon Choudhury <diptanu@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
net/core/skbuff.c | 42 +++++++++++++++++++++++++++++-------------
1 file changed, 29 insertions(+), 13 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 715c134..0ad4cda 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3475,9 +3475,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
struct sk_buff *segs = NULL;
struct sk_buff *tail = NULL;
struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
- skb_frag_t *frag = skb_shinfo(head_skb)->frags;
+ skb_frag_t *frag = skb_shinfo(head_skb)->frags, head_frag;
unsigned int mss = skb_shinfo(head_skb)->gso_size;
unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
+ struct sk_buff *check_list_skb = list_skb;
struct sk_buff *frag_skb = head_skb;
unsigned int offset = doffset;
unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
@@ -3590,6 +3591,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
nskb = skb_clone(list_skb, GFP_ATOMIC);
list_skb = list_skb->next;
+ check_list_skb = list_skb;
if (unlikely(!nskb))
goto err;
@@ -3664,21 +3666,35 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
while (pos < offset + len) {
if (i >= nfrags) {
- BUG_ON(skb_headlen(list_skb));
-
- i = 0;
- nfrags = skb_shinfo(list_skb)->nr_frags;
- frag = skb_shinfo(list_skb)->frags;
- frag_skb = list_skb;
+ if (skb_headlen(list_skb) && check_list_skb == list_skb) {
+ struct page *page;
+
+ BUG_ON(!list_skb->head_frag);
+
+ i = 0;
+ nfrags = 1;
+ page = virt_to_head_page(list_skb->head);
+ head_frag.page.p = page;
+ head_frag.page_offset = list_skb->data -
+ (unsigned char *)page_address(page);
+ head_frag.size = skb_headlen(list_skb);
+ frag = &head_frag;
+ check_list_skb = list_skb->next;
+ } else {
+ i = 0;
+ nfrags = skb_shinfo(list_skb)->nr_frags;
+ frag = skb_shinfo(list_skb)->frags;
+ frag_skb = list_skb;
- BUG_ON(!nfrags);
+ BUG_ON(!nfrags);
- if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
- skb_zerocopy_clone(nskb, frag_skb,
- GFP_ATOMIC))
- goto err;
+ if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
+ skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
+ goto err;
- list_skb = list_skb->next;
+ list_skb = list_skb->next;
+ check_list_skb = list_skb;
+ }
}
if (unlikely(skb_shinfo(nskb)->nr_frags >=
--
2.9.5
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH net-next v2 2/2] net: bpf: add a test for skb_segment in test_bpf module
2018-03-20 15:54 [PATCH net-next v2 0/2] net: permit skb_segment on head_frag frag_list skb Yonghong Song
2018-03-20 15:55 ` [PATCH net-next v2 1/2] " Yonghong Song
@ 2018-03-20 15:55 ` Yonghong Song
1 sibling, 0 replies; 5+ messages in thread
From: Yonghong Song @ 2018-03-20 15:55 UTC (permalink / raw)
To: edumazet, ast, daniel, diptanu, netdev; +Cc: kernel-team
Without the previous commit,
"modprobe test_bpf" will have the following errors:
...
[ 98.149165] ------------[ cut here ]------------
[ 98.159362] kernel BUG at net/core/skbuff.c:3667!
[ 98.169756] invalid opcode: 0000 [#1] SMP PTI
[ 98.179370] Modules linked in:
[ 98.179371] test_bpf(+)
...
which triggers the bug the previous commit intends to fix.
The skbs are constructed to mimic what mlx5 may generate.
The packet size/header may not mimic real cases in production. But
the processing flow is similar.
Signed-off-by: Yonghong Song <yhs@fb.com>
---
lib/test_bpf.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 70 insertions(+), 1 deletion(-)
diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 2efb213..045d7d3 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -6574,6 +6574,72 @@ static bool exclude_test(int test_id)
return test_id < test_range[0] || test_id > test_range[1];
}
+static struct sk_buff *build_test_skb(void *page)
+{
+ u32 headroom = NET_SKB_PAD + NET_IP_ALIGN + ETH_HLEN;
+ struct sk_buff *skb[2];
+ int i, data_size = 8;
+
+ for (i = 0; i < 2; i++) {
+ /* this will set skb[i]->head_frag */
+ skb[i] = build_skb(page, headroom);
+ if (!skb[i])
+ return NULL;
+
+ skb_reserve(skb[i], headroom);
+ skb_put(skb[i], data_size);
+ skb[i]->protocol = htons(ETH_P_IP);
+ skb_reset_network_header(skb[i]);
+ skb_set_mac_header(skb[i], -ETH_HLEN);
+
+ skb_add_rx_frag(skb[i], skb_shinfo(skb[i])->nr_frags,
+ page, 0, 64, 64);
+ // skb: skb_headlen(skb[i]): 8, skb[i]->head_frag = 1
+ }
+
+ /* setup shinfo */
+ skb_shinfo(skb[0])->gso_size = 1448;
+ skb_shinfo(skb[0])->gso_type = SKB_GSO_TCPV4;
+ skb_shinfo(skb[0])->gso_type |= SKB_GSO_DODGY;
+ skb_shinfo(skb[0])->gso_segs = 0;
+ skb_shinfo(skb[0])->frag_list = skb[1];
+
+ /* adjust skb[0]'s len */
+ skb[0]->len += skb[1]->len;
+ skb[0]->data_len += skb[1]->data_len;
+ skb[0]->truesize += skb[1]->truesize;
+
+ return skb[0];
+}
+
+static __init int test_skb_segment(void)
+{
+ netdev_features_t features;
+ struct sk_buff *skb;
+ void *page;
+ int ret = -1;
+
+ page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page) {
+ pr_info("%s: failed to get_free_page!", __func__);
+ return ret;
+ }
+
+ features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
+ features |= NETIF_F_RXCSUM;
+ skb = build_test_skb(page);
+ if (!skb) {
+ pr_info("%s: failed to build_test_skb", __func__);
+ } else if (skb_segment(skb, features)) {
+ ret = 0;
+ pr_info("%s: success in skb_segment!", __func__);
+ } else {
+ pr_info("%s: failed in skb_segment!", __func__);
+ }
+ free_page((unsigned long)page);
+ return ret;
+}
+
static __init int test_bpf(void)
{
int i, err_cnt = 0, pass_cnt = 0;
@@ -6632,8 +6698,11 @@ static int __init test_bpf_init(void)
return ret;
ret = test_bpf();
-
destroy_bpf_tests();
+ if (ret)
+ return ret;
+
+ ret = test_skb_segment();
return ret;
}
--
2.9.5
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH net-next v2 1/2] net: permit skb_segment on head_frag frag_list skb
2018-03-20 15:55 ` [PATCH net-next v2 1/2] " Yonghong Song
@ 2018-03-20 18:08 ` Alexander Duyck
2018-03-20 22:53 ` Yonghong Song
0 siblings, 1 reply; 5+ messages in thread
From: Alexander Duyck @ 2018-03-20 18:08 UTC (permalink / raw)
To: Yonghong Song
Cc: Eric Dumazet, ast, Daniel Borkmann, diptanu, Netdev, Kernel Team
On Tue, Mar 20, 2018 at 8:55 AM, Yonghong Song <yhs@fb.com> wrote:
> One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
> function skb_segment(), line 3667. The bpf program attaches to
> clsact ingress, calls bpf_skb_change_proto to change protocol
> from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
> to send the changed packet out.
>
> 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
> 3473 netdev_features_t features)
> 3474 {
> 3475 struct sk_buff *segs = NULL;
> 3476 struct sk_buff *tail = NULL;
> ...
> 3665 while (pos < offset + len) {
> 3666 if (i >= nfrags) {
> 3667 BUG_ON(skb_headlen(list_skb));
> 3668
> 3669 i = 0;
> 3670 nfrags = skb_shinfo(list_skb)->nr_frags;
> 3671 frag = skb_shinfo(list_skb)->frags;
> 3672 frag_skb = list_skb;
> ...
>
> call stack:
> ...
> #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
> #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
> #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
> #4 [ffff883ffef03668] die at ffffffff8101deb2
> #5 [ffff883ffef03698] do_trap at ffffffff8101a700
> #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
> #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
> #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
> [exception RIP: skb_segment+3044]
> RIP: ffffffff817e4dd4 RSP: ffff883ffef03860 RFLAGS: 00010216
> RAX: 0000000000002bf6 RBX: ffff883feb7aaa00 RCX: 0000000000000011
> RDX: ffff883fb87910c0 RSI: 0000000000000011 RDI: ffff883feb7ab500
> RBP: ffff883ffef03928 R8: 0000000000002ce2 R9: 00000000000027da
> R10: 000001ea00000000 R11: 0000000000002d82 R12: ffff883f90a1ee80
> R13: ffff883fb8791120 R14: ffff883feb7abc00 R15: 0000000000002ce2
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
> --- <IRQ stack> ---
> ...
>
> The triggering input skb has the following properties:
> list_skb = skb->frag_list;
> skb->nfrags != NULL && skb_headlen(list_skb) != 0
> and skb_segment() is not able to handle a frag_list skb
> if its headlen (list_skb->len - list_skb->data_len) is not 0.
>
> This patch addressed the issue by handling skb_headlen(list_skb) != 0
> case properly if list_skb->head_frag is true, which is expected in
> most cases. A one-element frag array is created for the list_skb head
> and processed before list_skb->frags are processed.
>
> Reported-by: Diptanu Gon Choudhury <diptanu@fb.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
> net/core/skbuff.c | 42 +++++++++++++++++++++++++++++-------------
> 1 file changed, 29 insertions(+), 13 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 715c134..0ad4cda 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3475,9 +3475,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> struct sk_buff *segs = NULL;
> struct sk_buff *tail = NULL;
> struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
> - skb_frag_t *frag = skb_shinfo(head_skb)->frags;
> + skb_frag_t *frag = skb_shinfo(head_skb)->frags, head_frag;
I would move head_frag down into the while loop. No point in making it
global to this function and eating up the extra stack space if you
don't need it.
> unsigned int mss = skb_shinfo(head_skb)->gso_size;
> unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
> + struct sk_buff *check_list_skb = list_skb;
This seems like a waste of a pointer. You can probably just repurpose
nfrags to achieve the same purpose you are achieving below. Note that
nfrags is a signed int and only needing to store up to only 18 or so
total frags so we can probably just reserve a nfrags value of -1 to
indicate that we are using the header frag.
> struct sk_buff *frag_skb = head_skb;
> unsigned int offset = doffset;
> unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
> @@ -3590,6 +3591,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>
> nskb = skb_clone(list_skb, GFP_ATOMIC);
> list_skb = list_skb->next;
> + check_list_skb = list_skb;
>
> if (unlikely(!nskb))
> goto err;
If my understanding is correct then this is unneeded if you just
change how you use i and nfrags.
> @@ -3664,21 +3666,35 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>
> while (pos < offset + len) {
You could probably just declare here since this would be more local to
where it is actually used and the fact is it can be overwritten with
each iteration of the loop anyway so there is no need to reserve the
space before you get here.
> if (i >= nfrags) {
> - BUG_ON(skb_headlen(list_skb));
> -
> - i = 0;
> - nfrags = skb_shinfo(list_skb)->nr_frags;
> - frag = skb_shinfo(list_skb)->frags;
> - frag_skb = list_skb;
> + if (skb_headlen(list_skb) && check_list_skb == list_skb) {
> + struct page *page;
> +
> + BUG_ON(!list_skb->head_frag);
> +
> + i = 0;
> + nfrags = 1;
> + page = virt_to_head_page(list_skb->head);
> + head_frag.page.p = page;
> + head_frag.page_offset = list_skb->data -
> + (unsigned char *)page_address(page);
> + head_frag.size = skb_headlen(list_skb);
> + frag = &head_frag;
> + check_list_skb = list_skb->next;
The whole need for check_list_skb can be worked around if we take
advantage of the fact that i and nfrags are both integers so we could
use -1 for both to indicate we are processing the head frag. The only
bit that gets messy is the fact that we have to add special handling
for the case where skb_shinfo(list_skb)->nr_frags is 0. If that is the
case we would have to set nfrags to 0 and bump the list_skb =
list_skb->next so that we avoid trying to pull frags from an otherwise
empty buffer.
> + } else {
> + i = 0;
> + nfrags = skb_shinfo(list_skb)->nr_frags;
> + frag = skb_shinfo(list_skb)->frags;
> + frag_skb = list_skb;
>
> - BUG_ON(!nfrags);
> + BUG_ON(!nfrags);
>
> - if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> - skb_zerocopy_clone(nskb, frag_skb,
> - GFP_ATOMIC))
> - goto err;
> + if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> + skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
> + goto err;
>
> - list_skb = list_skb->next;
> + list_skb = list_skb->next;
> + check_list_skb = list_skb;
> + }
> }
>
> if (unlikely(skb_shinfo(nskb)->nr_frags >=
> --
> 2.9.5
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net-next v2 1/2] net: permit skb_segment on head_frag frag_list skb
2018-03-20 18:08 ` Alexander Duyck
@ 2018-03-20 22:53 ` Yonghong Song
0 siblings, 0 replies; 5+ messages in thread
From: Yonghong Song @ 2018-03-20 22:53 UTC (permalink / raw)
To: Alexander Duyck
Cc: Eric Dumazet, ast, Daniel Borkmann, diptanu, Netdev, Kernel Team
On 3/20/18 11:08 AM, Alexander Duyck wrote:
> On Tue, Mar 20, 2018 at 8:55 AM, Yonghong Song <yhs@fb.com> wrote:
>> One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
>> function skb_segment(), line 3667. The bpf program attaches to
>> clsact ingress, calls bpf_skb_change_proto to change protocol
>> from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
>> to send the changed packet out.
>>
>> 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
>> 3473 netdev_features_t features)
>> 3474 {
>> 3475 struct sk_buff *segs = NULL;
>> 3476 struct sk_buff *tail = NULL;
>> ...
>> 3665 while (pos < offset + len) {
>> 3666 if (i >= nfrags) {
>> 3667 BUG_ON(skb_headlen(list_skb));
>> 3668
>> 3669 i = 0;
>> 3670 nfrags = skb_shinfo(list_skb)->nr_frags;
>> 3671 frag = skb_shinfo(list_skb)->frags;
>> 3672 frag_skb = list_skb;
>> ...
>>
>> call stack:
>> ...
>> #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
>> #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
>> #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
>> #4 [ffff883ffef03668] die at ffffffff8101deb2
>> #5 [ffff883ffef03698] do_trap at ffffffff8101a700
>> #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
>> #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
>> #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
>> [exception RIP: skb_segment+3044]
>> RIP: ffffffff817e4dd4 RSP: ffff883ffef03860 RFLAGS: 00010216
>> RAX: 0000000000002bf6 RBX: ffff883feb7aaa00 RCX: 0000000000000011
>> RDX: ffff883fb87910c0 RSI: 0000000000000011 RDI: ffff883feb7ab500
>> RBP: ffff883ffef03928 R8: 0000000000002ce2 R9: 00000000000027da
>> R10: 000001ea00000000 R11: 0000000000002d82 R12: ffff883f90a1ee80
>> R13: ffff883fb8791120 R14: ffff883feb7abc00 R15: 0000000000002ce2
>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>> #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
>> --- <IRQ stack> ---
>> ...
>>
>> The triggering input skb has the following properties:
>> list_skb = skb->frag_list;
>> skb->nfrags != NULL && skb_headlen(list_skb) != 0
>> and skb_segment() is not able to handle a frag_list skb
>> if its headlen (list_skb->len - list_skb->data_len) is not 0.
>>
>> This patch addressed the issue by handling skb_headlen(list_skb) != 0
>> case properly if list_skb->head_frag is true, which is expected in
>> most cases. A one-element frag array is created for the list_skb head
>> and processed before list_skb->frags are processed.
>>
>> Reported-by: Diptanu Gon Choudhury <diptanu@fb.com>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>> net/core/skbuff.c | 42 +++++++++++++++++++++++++++++-------------
>> 1 file changed, 29 insertions(+), 13 deletions(-)
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 715c134..0ad4cda 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3475,9 +3475,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>> struct sk_buff *segs = NULL;
>> struct sk_buff *tail = NULL;
>> struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
>> - skb_frag_t *frag = skb_shinfo(head_skb)->frags;
>> + skb_frag_t *frag = skb_shinfo(head_skb)->frags, head_frag;
>
> I would move head_frag down into the while loop. No point in making it
> global to this function and eating up the extra stack space if you
> don't need it.
We need head_frag declared outside the main segmentation loop.
Its value may survive across different main loop iterations.
However, I see your point to save the stack space.
Will use a pointer here and do allocation on demand instead.
>
>> unsigned int mss = skb_shinfo(head_skb)->gso_size;
>> unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
>> + struct sk_buff *check_list_skb = list_skb;
>
> This seems like a waste of a pointer. You can probably just repurpose
> nfrags to achieve the same purpose you are achieving below. Note that
> nfrags is a signed int and only needing to store up to only 18 or so
> total frags so we can probably just reserve a nfrags value of -1 to
> indicate that we are using the header frag.
Just did some prototyping. Indeed, we could reserve the i = -1 to
indicate the special head_frag.
>
>> struct sk_buff *frag_skb = head_skb;
>> unsigned int offset = doffset;
>> unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
>> @@ -3590,6 +3591,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>> nskb = skb_clone(list_skb, GFP_ATOMIC);
>> list_skb = list_skb->next;
>> + check_list_skb = list_skb;
>>
>> if (unlikely(!nskb))
>> goto err;
>
> If my understanding is correct then this is unneeded if you just
> change how you use i and nfrags.
Right.
>
>> @@ -3664,21 +3666,35 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>> while (pos < offset + len) {
>
> You could probably just declare here since this would be more local to
> where it is actually used and the fact is it can be overwritten with
> each iteration of the loop anyway so there is no need to reserve the
> space before you get here.
The actual space can be allocated once and reused later. But the space
needs to be preserved across main segmentation loop.
>
>> if (i >= nfrags) {
>> - BUG_ON(skb_headlen(list_skb));
>> -
>> - i = 0;
>> - nfrags = skb_shinfo(list_skb)->nr_frags;
>> - frag = skb_shinfo(list_skb)->frags;
>> - frag_skb = list_skb;
>> + if (skb_headlen(list_skb) && check_list_skb == list_skb) {
>> + struct page *page;
>> +
>> + BUG_ON(!list_skb->head_frag);
>> +
>> + i = 0;
>> + nfrags = 1;
>> + page = virt_to_head_page(list_skb->head);
>> + head_frag.page.p = page;
>> + head_frag.page_offset = list_skb->data -
>> + (unsigned char *)page_address(page);
>> + head_frag.size = skb_headlen(list_skb);
>> + frag = &head_frag;
>> + check_list_skb = list_skb->next;
>
> The whole need for check_list_skb can be worked around if we take
> advantage of the fact that i and nfrags are both integers so we could
> use -1 for both to indicate we are processing the head frag. The only
> bit that gets messy is the fact that we have to add special handling
> for the case where skb_shinfo(list_skb)->nr_frags is 0. If that is the
> case we would have to set nfrags to 0 and bump the list_skb =
> list_skb->next so that we avoid trying to pull frags from an otherwise
> empty buffer.
Right. if nr_frags = 0, we do need to avoid pulling frags.
Thanks for the review, will send out v3 shortly.
>
>> + } else {
>> + i = 0;
>> + nfrags = skb_shinfo(list_skb)->nr_frags;
>> + frag = skb_shinfo(list_skb)->frags;
>> + frag_skb = list_skb;
>>
>> - BUG_ON(!nfrags);
>> + BUG_ON(!nfrags);
>>
>> - if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> - skb_zerocopy_clone(nskb, frag_skb,
>> - GFP_ATOMIC))
>> - goto err;
>> + if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> + skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>> + goto err;
>>
>> - list_skb = list_skb->next;
>> + list_skb = list_skb->next;
>> + check_list_skb = list_skb;
>> + }
>> }
>>
>> if (unlikely(skb_shinfo(nskb)->nr_frags >=
>> --
>> 2.9.5
>>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2018-03-20 22:54 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-20 15:54 [PATCH net-next v2 0/2] net: permit skb_segment on head_frag frag_list skb Yonghong Song
2018-03-20 15:55 ` [PATCH net-next v2 1/2] " Yonghong Song
2018-03-20 18:08 ` Alexander Duyck
2018-03-20 22:53 ` Yonghong Song
2018-03-20 15:55 ` [PATCH net-next v2 2/2] net: bpf: add a test for skb_segment in test_bpf module Yonghong Song
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.