All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Hangbin Liu <liuhangbin@gmail.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Jiri Benc" <jbenc@redhat.com>,
	"Eelco Chaudron" <echaudro@redhat.com>,
	ast@kernel.org, "Daniel Borkmann" <daniel@iogearbox.net>,
	"Lorenzo Bianconi" <lorenzo.bianconi@redhat.com>,
	"David Ahern" <dsahern@gmail.com>,
	"Andrii Nakryiko" <andrii.nakryiko@gmail.com>,
	"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Maciej Fijalkowski" <maciej.fijalkowski@intel.com>,
	"Björn Töpel" <bjorn.topel@gmail.com>,
	"Martin KaFai Lau" <kafai@fb.com>,
	brouer@redhat.com
Subject: Re: [PATCHv10 bpf-next 2/4] xdp: extend xdp_redirect_map with broadcast support
Date: Mon, 26 Apr 2021 13:41:05 +0200	[thread overview]
Message-ID: <20210426134105.4706af0b@carbon> (raw)
In-Reply-To: <20210426104704.GR3465@Leo-laptop-t470s>

On Mon, 26 Apr 2021 18:47:04 +0800
Hangbin Liu <liuhangbin@gmail.com> wrote:

> On Mon, Apr 26, 2021 at 11:53:50AM +0200, Jesper Dangaard Brouer wrote:
> > On Fri, 23 Apr 2021 10:00:17 +0800
> > Hangbin Liu <liuhangbin@gmail.com> wrote:
> >   
> > > This patch adds two flags BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS to
> > > extend xdp_redirect_map for broadcast support.
> > > 
> > > With BPF_F_BROADCAST the packet will be broadcasted to all the interfaces
> > > in the map. with BPF_F_EXCLUDE_INGRESS the ingress interface will be
> > > excluded when do broadcasting.
> > > 
> > > When getting the devices in dev hash map via dev_map_hash_get_next_key(),
> > > there is a possibility that we fall back to the first key when a device
> > > was removed. This will duplicate packets on some interfaces. So just walk
> > > the whole buckets to avoid this issue. For dev array map, we also walk the
> > > whole map to find valid interfaces.
> > > 
> > > Function bpf_clear_redirect_map() was removed in
> > > commit ee75aef23afe ("bpf, xdp: Restructure redirect actions").
> > > Add it back as we need to use ri->map again.
> > > 
> > > Here is the performance result by using 10Gb i40e NIC, do XDP_DROP on
> > > veth peer, run xdp_redirect_{map, map_multi} in sample/bpf and send pkts
> > > via pktgen cmd:
> > > ./pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -t 10 -s 64  
> > 
> > While running:
> >  $ sudo ./xdp_redirect_map_multi -F i40e2 i40e2
> >  Get interfaces 7 7
> >  libbpf: elf: skipping unrecognized data section(23) .eh_frame
> >  libbpf: elf: skipping relo section(24) .rel.eh_frame for section(23) .eh_frame
> >  Forwarding   10140845 pkt/s
> >  Forwarding   11767042 pkt/s
> >  Forwarding   11783437 pkt/s
> >  Forwarding   11767331 pkt/s
> > 
> > When starting:  sudo ./xdp_monitor --stats  
> 
> That seems the same issue I reported previously in our meeting.
> https://bugzilla.redhat.com/show_bug.cgi?id=1906820#c4
> 
> I only saw it 3 times and can't reproduce it easily.
> 
> Do you have any idea where is the root cause?

All the information you need to find the root-cause is listed below.
I have even decoded where in the code it happens, and also include the
code with line-numbering and pointed to the line the crash happens in,
I don't think it is possible for me to be more specific and help further.

 
> > System crashed with:
> > 
> > [ 5509.997837] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > [ 5510.004793] #PF: supervisor read access in kernel mode
> > [ 5510.009929] #PF: error_code(0x0000) - not-present page
> > [ 5510.015060] PGD 0 P4D 0 
> > [ 5510.017591] Oops: 0000 [#1] PREEMPT SMP PTI
> > [ 5510.021769] CPU: 3 PID: 29 Comm: ksoftirqd/3 Not tainted 5.12.0-rc7-net-next-hangbin-v10+ #602
> > [ 5510.030368] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0a 08/01/2016
> > [ 5510.037835] RIP: 0010:perf_trace_xdp_redirect_template+0xba/0x130
> > [ 5510.043929] Code: 00 00 00 8b 45 18 74 1e 41 83 f9 19 74 18 45 85 c9 0f 85 83 00 00 00 81 7d 10 ff ff ff 7f 75 7a 89 c1 31 c0 eb 0d 48 8b 75 b8 <48> 8b 16 8b 8a d0 00 00 00 49 8b 55 38 41 b8 01 00 00 00 be 24 00
> > [ 5510.062668] RSP: 0018:ffffc9000017fc50 EFLAGS: 00010246
> > [ 5510.067884] RAX: 0000000000000000 RBX: ffffe8ffffccf180 RCX: 0000000000000000
> > [ 5510.075007] RDX: ffffffff817d1a9b RSI: 0000000000000000 RDI: ffffe8ffffcd8000
> > [ 5510.082133] RBP: ffffc9000017fc98 R08: 0000000000000000 R09: 0000000000000019
> > [ 5510.089256] R10: 0000000000000000 R11: ffff88887fd2ab70 R12: 0000000000000000
> > [ 5510.096382] R13: ffffc900000a5000 R14: ffff88810a8f2000 R15: ffffffff82a7c840
> > [ 5510.103505] FS:  0000000000000000(0000) GS:ffff88887fcc0000(0000) knlGS:0000000000000000
> > [ 5510.111584] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 5510.117319] CR2: 0000000000000000 CR3: 0000000157b1e004 CR4: 00000000003706e0
> > [ 5510.124444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 5510.131567] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 5510.138692] Call Trace:
> > [ 5510.141141]  xdp_do_redirect+0x16b/0x230
> > [ 5510.145064]  i40e_clean_rx_irq+0x62e/0x9a0 [i40e]
> > [ 5510.149779]  i40e_napi_poll+0xf0/0x410 [i40e]
> > [ 5510.154135]  __napi_poll+0x2a/0x140
> > [ 5510.157620]  net_rx_action+0x215/0x2d0
> > [ 5510.161364]  __do_softirq+0xe3/0x2df
> > [ 5510.164938]  run_ksoftirqd+0x1a/0x20
> > [ 5510.168514]  smpboot_thread_fn+0xee/0x1e0
> > [ 5510.172519]  ? sort_range+0x20/0x20
> > [ 5510.176003]  kthread+0x116/0x150
> > [ 5510.179237]  ? kthread_park+0x90/0x90
> > [ 5510.182893]  ret_from_fork+0x22/0x30
> > [ 5510.186474] Modules linked in: algif_hash af_alg bpf_preload fuse veth nf_defrag_ipv6 nf_defrag_ipv4 tun bridge stp llc rpcrdma sunrpc rdma_ucm ib_umad rdma_cm ib_ipoib coretemp iw_cm kvm_intel ib_cm kvm mlx5_ib i40iw irqbypass ib_uverbs rapl intel_cstate i2c_i801 intel_uncore ib_core pcspkr bfq i2c_smbus acpi_ipmi wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad sch_fq_codel mlx5_core mlxfw i40e ixgbe igc psample mdio igb sd_mod ptp t10_pi nfp i2c_algo_bit i2c_core pps_core hid_generic [last unloaded: bpfilter]
> > [ 5510.231847] CR2: 0000000000000000
> > [ 5510.235164] ---[ end trace b8e076677c53b5e8 ]---
> > [ 5510.241762] RIP: 0010:perf_trace_xdp_redirect_template+0xba/0x130
> > [ 5510.247851] Code: 00 00 00 8b 45 18 74 1e 41 83 f9 19 74 18 45 85 c9 0f 85 83 00 00 00 81 7d 10 ff ff ff 7f 75 7a 89 c1 31 c0 eb 0d 48 8b 75 b8 <48> 8b 16 8b 8a d0 00 00 00 49 8b 55 38 41 b8 01 00 00 00 be 24 00
> > [ 5510.266590] RSP: 0018:ffffc9000017fc50 EFLAGS: 00010246
> > [ 5510.271804] RAX: 0000000000000000 RBX: ffffe8ffffccf180 RCX: 0000000000000000
> > [ 5510.278931] RDX: ffffffff817d1a9b RSI: 0000000000000000 RDI: ffffe8ffffcd8000
> > [ 5510.286053] RBP: ffffc9000017fc98 R08: 0000000000000000 R09: 0000000000000019
> > [ 5510.293178] R10: 0000000000000000 R11: ffff88887fd2ab70 R12: 0000000000000000
> > [ 5510.300301] R13: ffffc900000a5000 R14: ffff88810a8f2000 R15: ffffffff82a7c840
> > [ 5510.307426] FS:  0000000000000000(0000) GS:ffff88887fcc0000(0000) knlGS:0000000000000000
> > [ 5510.315503] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 5510.321241] CR2: 0000000000000000 CR3: 0000000157b1e004 CR4: 00000000003706e0
> > [ 5510.328364] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 5510.335490] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 5510.342612] Kernel panic - not syncing: Fatal exception in interrupt
> > [ 5510.348994] Kernel Offset: disabled
> > [ 5510.354469] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> > 
> > [net-next]$ ./scripts/faddr2line vmlinux xdp_do_redirect+0x16b
> > xdp_do_redirect+0x16b/0x230:
> > trace_xdp_redirect at include/trace/events/xdp.h:136
> > (inlined by) trace_xdp_redirect at include/trace/events/xdp.h:136
> > (inlined by) xdp_do_redirect at net/core/filter.c:3996
> > 
> > Decode: perf_trace_xdp_redirect_template+0xba
> >  ./scripts/faddr2line vmlinux perf_trace_xdp_redirect_template+0xba
> > perf_trace_xdp_redirect_template+0xba/0x130:
> > perf_trace_xdp_redirect_template at include/trace/events/xdp.h:89 (discriminator 13)
> > 
> > less -N net/core/filter.c
> >  [...]
> >    3993         if (unlikely(err))
> >    3994                 goto err;
> >    3995   
> > -> 3996         _trace_xdp_redirect_map(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index);  
> >    3997         return 0;
> >    3998 err:
> >    3999         _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index, err);
> >    4000         return err;
> >    4001 }
> >    4002 EXPORT_SYMBOL_GPL(xdp_do_redirect);
> > 
> >   
> > > ---
> > >  include/linux/bpf.h            |  20 ++++
> > >  include/linux/filter.h         |  18 +++-
> > >  include/net/xdp.h              |   1 +
> > >  include/uapi/linux/bpf.h       |  17 +++-
> > >  kernel/bpf/cpumap.c            |   3 +-
> > >  kernel/bpf/devmap.c            | 181 ++++++++++++++++++++++++++++++++-
> > >  net/core/filter.c              |  37 ++++++-
> > >  net/core/xdp.c                 |  29 ++++++
> > >  net/xdp/xskmap.c               |   3 +-
> > >  tools/include/uapi/linux/bpf.h |  17 +++-
> > >  10 files changed, 312 insertions(+), 14 deletions(-)
> > >   
> > [...]
> >   
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index cae56d08a670..05ba5ab4345f 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -3926,6 +3926,23 @@ void xdp_do_flush(void)
> > >  }
> > >  EXPORT_SYMBOL_GPL(xdp_do_flush);
> > >  
> > > +void bpf_clear_redirect_map(struct bpf_map *map)
> > > +{
> > > +	struct bpf_redirect_info *ri;
> > > +	int cpu;
> > > +
> > > +	for_each_possible_cpu(cpu) {
> > > +		ri = per_cpu_ptr(&bpf_redirect_info, cpu);
> > > +		/* Avoid polluting remote cacheline due to writes if
> > > +		 * not needed. Once we pass this test, we need the
> > > +		 * cmpxchg() to make sure it hasn't been changed in
> > > +		 * the meantime by remote CPU.
> > > +		 */
> > > +		if (unlikely(READ_ONCE(ri->map) == map))
> > > +			cmpxchg(&ri->map, map, NULL);
> > > +	}
> > > +}
> > > +
> > >  int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> > >  		    struct bpf_prog *xdp_prog)
> > >  {
> > > @@ -3933,6 +3950,7 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> > >  	enum bpf_map_type map_type = ri->map_type;
> > >  	void *fwd = ri->tgt_value;
> > >  	u32 map_id = ri->map_id;
> > > +	struct bpf_map *map;
> > >  	int err;
> > >  
> > >  	ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */
> > > @@ -3942,7 +3960,14 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> > >  	case BPF_MAP_TYPE_DEVMAP:
> > >  		fallthrough;
> > >  	case BPF_MAP_TYPE_DEVMAP_HASH:
> > > -		err = dev_map_enqueue(fwd, xdp, dev);
> > > +		map = READ_ONCE(ri->map);
> > > +		if (map) {
> > > +			WRITE_ONCE(ri->map, NULL);
> > > +			err = dev_map_enqueue_multi(xdp, dev, map,
> > > +						    ri->flags & BPF_F_EXCLUDE_INGRESS);
> > > +		} else {
> > > +			err = dev_map_enqueue(fwd, xdp, dev);
> > > +		}
> > >  		break;
> > >  	case BPF_MAP_TYPE_CPUMAP:
> > >  		err = cpu_map_enqueue(fwd, xdp, dev);
> > > @@ -3984,13 +4009,21 @@ static int xdp_do_generic_redirect_map(struct net_device *dev,
> > >  				       enum bpf_map_type map_type, u32 map_id)
> > >  {
> > >  	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> > > +	struct bpf_map *map;
> > >  	int err;
> > >  
> > >  	switch (map_type) {
> > >  	case BPF_MAP_TYPE_DEVMAP:
> > >  		fallthrough;
> > >  	case BPF_MAP_TYPE_DEVMAP_HASH:
> > > -		err = dev_map_generic_redirect(fwd, skb, xdp_prog);
> > > +		map = READ_ONCE(ri->map);
> > > +		if (map) {
> > > +			WRITE_ONCE(ri->map, NULL);
> > > +			err = dev_map_redirect_multi(dev, skb, xdp_prog, map,
> > > +						     ri->flags & BPF_F_EXCLUDE_INGRESS);
> > > +		} else {
> > > +			err = dev_map_generic_redirect(fwd, skb, xdp_prog);
> > > +		}
> > >  		if (unlikely(err))
> > >  			goto err;
> > >  		break;  
> > 

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


  parent reply	other threads:[~2021-04-26 11:41 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-23  2:00 [PATCHv10 bpf-next 0/4] xdp: extend xdp_redirect_map with broadcast support Hangbin Liu
2021-04-23  2:00 ` [PATCHv10 bpf-next 1/4] bpf: run devmap xdp_prog on flush instead of bulk enqueue Hangbin Liu
2021-04-23  2:00 ` [PATCHv10 bpf-next 2/4] xdp: extend xdp_redirect_map with broadcast support Hangbin Liu
2021-04-26  9:53   ` Jesper Dangaard Brouer
2021-04-26 10:47     ` Hangbin Liu
2021-04-26 10:54       ` Hangbin Liu
2021-04-26 11:41       ` Jesper Dangaard Brouer [this message]
2021-04-26 11:54         ` Hangbin Liu
2021-04-26 11:40     ` Hangbin Liu
2021-04-26 11:47       ` Hangbin Liu
2021-04-26 13:00         ` Toke Høiland-Jørgensen
2021-04-27  8:40         ` Jesper Dangaard Brouer
2021-04-23  2:00 ` [PATCHv10 bpf-next 3/4] sample/bpf: add xdp_redirect_map_multi for redirect_map broadcast test Hangbin Liu
2021-04-23  2:00 ` [PATCHv10 bpf-next 4/4] selftests/bpf: add xdp_redirect_multi test Hangbin Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210426134105.4706af0b@carbon \
    --to=brouer@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=ast@kernel.org \
    --cc=bjorn.topel@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dsahern@gmail.com \
    --cc=echaudro@redhat.com \
    --cc=jbenc@redhat.com \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=liuhangbin@gmail.com \
    --cc=lorenzo.bianconi@redhat.com \
    --cc=maciej.fijalkowski@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.