From: Yonghong Song <yhs@fb.com>
To: Daniel Borkmann <daniel@iogearbox.net>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Alexei Starovoitov <ast@fb.com>, netdev <netdev@vger.kernel.org>,
	Martin Lau <kafai@fb.com>, Yonghong Song <yhs@fb.com>
Subject: BUG_ON triggered in skb_segment
Date: Mon, 12 Mar 2018 22:45:56 -0700
Message-ID: <9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com>

Hi,

One of our in-house projects, a bpf-based NAT, hits a kernel BUG_ON in
the net-next function skb_segment, at line 3667.

3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
3473                             netdev_features_t features)
3474 {
3475         struct sk_buff *segs = NULL;
3476         struct sk_buff *tail = NULL;
...
3665                 while (pos < offset + len) {
3666                         if (i >= nfrags) {
3667                                 BUG_ON(skb_headlen(list_skb));
3668
3669                                 i = 0;
3670                                 nfrags = skb_shinfo(list_skb)->nr_frags;
3671                                 frag = skb_shinfo(list_skb)->frags;
3672                                 frag_skb = list_skb;
...

call stack:
...
  #0 [ffff883ffef034f8] machine_kexec at ffffffff81044c41
  #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
  #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
  #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
  #4 [ffff883ffef03668] die at ffffffff8101deb2
  #5 [ffff883ffef03698] do_trap at ffffffff8101a700
  #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
  #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
  #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
     [exception RIP: skb_segment+3044]
     RIP: ffffffff817e4dd4  RSP: ffff883ffef03860  RFLAGS: 00010216
     RAX: 0000000000002bf6  RBX: ffff883feb7aaa00  RCX: 0000000000000011
     RDX: ffff883fb87910c0  RSI: 0000000000000011  RDI: ffff883feb7ab500
     RBP: ffff883ffef03928   R8: 0000000000002ce2   R9: 00000000000027da
     R10: 000001ea00000000  R11: 0000000000002d82  R12: ffff883f90a1ee80
     R13: ffff883fb8791120  R14: ffff883feb7abc00  R15: 0000000000002ce2
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
#10 [ffff883ffef03990] tcp4_gso_segment at ffffffff818717d8
#11 [ffff883ffef039b0] inet_gso_segment at ffffffff81882c9b
#12 [ffff883ffef03a10] skb_mac_gso_segment at ffffffff817f39b8
#13 [ffff883ffef03a38] __skb_gso_segment at ffffffff817f3ac9
#14 [ffff883ffef03a68] validate_xmit_skb at ffffffff817f3eed
#15 [ffff883ffef03aa8] validate_xmit_skb_list at ffffffff817f40a2
#16 [ffff883ffef03ad8] sch_direct_xmit at ffffffff81824efb
#17 [ffff883ffef03b20] __qdisc_run at ffffffff818251aa
#18 [ffff883ffef03b90] __dev_queue_xmit at ffffffff817f45ed
#19 [ffff883ffef03c08] dev_queue_xmit at ffffffff817f4b90
#20 [ffff883ffef03c18] __bpf_redirect at ffffffff81812b66
#21 [ffff883ffef03c40] skb_do_redirect at ffffffff81813209
#22 [ffff883ffef03c60] __netif_receive_skb_core at ffffffff817f310d
#23 [ffff883ffef03cc8] __netif_receive_skb at ffffffff817f32e8
#24 [ffff883ffef03ce8] netif_receive_skb_internal at ffffffff817f5538
#25 [ffff883ffef03d10] napi_gro_complete at ffffffff817f56c0
#26 [ffff883ffef03d28] dev_gro_receive at ffffffff817f5ea6
#27 [ffff883ffef03d78] napi_gro_receive at ffffffff817f6168
#28 [ffff883ffef03da0] mlx5e_handle_rx_cqe_mpwrq at ffffffff817381c2
#29 [ffff883ffef03e30] mlx5e_poll_rx_cq at ffffffff817386c2
#30 [ffff883ffef03e80] mlx5e_napi_poll at ffffffff8173926e
#31 [ffff883ffef03ed0] net_rx_action at ffffffff817f5a6e
#32 [ffff883ffef03f48] __softirqentry_text_start at ffffffff81c000c3
#33 [ffff883ffef03fa8] irq_exit at ffffffff8108f515
#34 [ffff883ffef03fb8] do_IRQ at ffffffff81a01b11
--- <IRQ stack> ---
bt: cannot transition from IRQ stack to current process stack:
         IRQ stack pointer: ffff883ffef034f8
     process stack pointer: ffffffff81a01ae9
        current stack base: ffffc9000c5c4000
...

Setup:
======

The test involves three machines:
   M_ipv6 <-> M_nat <-> M_ipv4

M_nat does ipv4<->ipv6 address translation and then forwards each packet
to the proper destination. The control plane configures M_nat so that it
understands the virtual ipv4 address for machine M_ipv6 and the
virtual ipv6 address for machine M_ipv4.

M_nat runs a bpf program, which is attached to clsact (ingress) qdisc.
The program uses bpf_skb_change_proto to do protocol conversion.
bpf_skb_change_proto adjusts the skb header_len and len to match
the protocol change.
After the conversion, the program makes the proper changes to the
ethhdr and ip4/6 headers, recalculates the checksum, and sends the packet
out through bpf_redirect.
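
To make the length adjustment concrete, here is a minimal userspace sketch of
the arithmetic involved in an ipv6<->ipv4 header swap. It is not the
bpf_skb_change_proto implementation. It assumes an ipv4 header without
options (20 bytes) and an ipv6 base header without extension headers
(40 bytes), and the names l3_proto, proto_len_delta and len_after_convert are
hypothetical, introduced only for illustration.

```c
/* Fixed header sizes assumed here: ipv4 without options, ipv6 base header. */
enum {
	IPV4_HDR_LEN = 20,
	IPV6_HDR_LEN = 40,
};

/* Hypothetical tag for the network-layer protocol of a packet. */
enum l3_proto { L3_IPV4, L3_IPV6 };

/* Signed change in packet length when the L3 header is swapped. */
static int proto_len_delta(enum l3_proto from, enum l3_proto to)
{
	int from_len = (from == L3_IPV6) ? IPV6_HDR_LEN : IPV4_HDR_LEN;
	int to_len = (to == L3_IPV6) ? IPV6_HDR_LEN : IPV4_HDR_LEN;

	return to_len - from_len;
}

/* Total packet length after the swap; the payload itself is untouched. */
static unsigned int len_after_convert(unsigned int len,
				      enum l3_proto from, enum l3_proto to)
{
	return (unsigned int)((int)len + proto_len_delta(from, to));
}
```

Under these assumptions a packet going from M_ipv6 to M_ipv4 shrinks by
20 bytes and one going the other way grows by 20 bytes.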

Experiment:
===========

MTU: 1500B for all three machines.

tso/lro/gro are enabled on the M_nat box.

ping works in both directions between M_ipv6 and M_ipv4.
Transferring a small file (4KB) between M_ipv6 and M_ipv4 works
(both ways).
Transferring a large file (e.g., 4MB) from M_ipv6 to M_ipv4 fails with
the above BUG_ON almost immediately.
I have not really tested a large file transfer from M_ipv4 to M_ipv6.

The error path is likely (also judging from the above call stack):
   nic -> lro/gro -> bpf_program -> gso (BUG_ON)

In one of the experiments, I explicitly printed skb->len and
skb->data_len. The values are below:
   skb_segment: len 2856, data_len 2686
They would need to be equal to avoid the BUG_ON.

In another experiment, I got:
   skb_segment: len 1428, data_len 1258

In both cases, the difference is 170 bytes. Not sure whether
this is just a coincidence or not.
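
For reference, the BUG_ON at line 3667 tests skb_headlen(), which the kernel
computes as len - data_len, i.e. the number of bytes in the linear part of
the skb. The following is a minimal userspace sketch of that arithmetic
applied to the two printed values. struct fake_skb, fake_skb_headlen and
would_trip_check are hypothetical stand-ins, not kernel definitions.

```c
/* Hypothetical stand-in for the two sk_buff fields printed above. */
struct fake_skb {
	unsigned int len;      /* total bytes: linear area plus paged data */
	unsigned int data_len; /* bytes held outside the linear area */
};

/* Same arithmetic as the kernel's skb_headlen(): len - data_len. */
static unsigned int fake_skb_headlen(const struct fake_skb *skb)
{
	return skb->len - skb->data_len;
}

/* Nonzero when a BUG_ON(headlen) style check would fire for this skb. */
static int would_trip_check(const struct fake_skb *skb)
{
	return fake_skb_headlen(skb) != 0;
}
```

With len 2856 / data_len 2686 and with len 1428 / data_len 1258, the head
length comes out as 170 bytes in both cases, so the check fires. It would
only pass if len and data_len were equal.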

Workaround:
===========

A workaround to avoid the BUG_ON is to disable lro/gro. This way, the
kernel does not receive big packets, and hence gso is not really called.

I am not familiar with the gso code. Has anybody hit this BUG_ON before?
Any suggestions on how to debug this?

Thanks!

Yonghong
