Netdev Archive on lore.kernel.org
* xdpsock poll with 5.2.21rt kernel
@ 2019-11-12 22:42 Paul Thomas
  2019-11-29 16:48 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 5+ messages in thread
From: Paul Thomas @ 2019-11-12 22:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, David Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev,
	xdp-newbies, bpf, linux-rt-users

Hello,

I'm doing some testing with AF_XDP, and I'm seeing some behavior I
don't quite understand. It seems I can get into a situation where
xdpsock (from samples/bpf/xdpsock_user.c) is using most of the cpu
even though I'm trying to use poll().

To start with I run xdpsock with --rxdrop and --poll. At first this
behaves nicely, the cpu usage is very low:
# ps -AL -o pid,lwp,cmd,comm,rtprio,cpuid,pcpu | grep [x]dpsock
 1932  1932 ./xdpsock -r -p -i eth1     xdpsock              -     3  0.0
 1932  1933 ./xdpsock -r -p -i eth1     xdpsock              -     2  0.0

And strace shows nice orderly ppoll timeouts every second.
# strace -p 1932
strace: Process 1932 attached
ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=0, tv_nsec=510616211}, NULL,
0) = 0 (Timeout)
ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=1, tv_nsec=0}, NULL, 0) = 0 (Timeout)
ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=1, tv_nsec=0}, NULL, 0) = 0 (Timeout)
...

Then I generate some traffic and ppoll() is not timing out anymore:
ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=1, tv_nsec=0}, NULL, 0) = 1
([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=999996790})
ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=1, tv_nsec=0}, NULL, 0) = 1
([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=999997260})
ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=1, tv_nsec=0}, NULL, 0) = 1
([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=999997100})
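
For reference, the two outcomes in the traces above (a return of 0 on
timeout, and 1 with POLLIN set once data arrives) can be reproduced with
a small standalone sketch. It assumes a plain pipe in place of the AF_XDP
socket, and wait_readable() is a hypothetical helper, not something from
xdpsock_user.c.

```c
#include <poll.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if POLLIN is set, 0 on timeout, and -1 on error or if
 * poll() reported only an error/hangup condition.
 */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret;	/* 0: timed out, -1: poll() failed (errno set) */
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

Calling wait_readable() on the read end of an empty pipe should time out,
and calling it again after a byte has been written to the other end
should report the descriptor as readable.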

This is where it gets strange: if I stop the traffic, strace no
longer shows any activity, but the xdpsock cpu usage is way up:
# ps -AL -o pid,lwp,cmd,comm,rtprio,cpuid,pcpu | grep [x]dpsock
 1932  1932 ./xdpsock -r -p -i eth1     xdpsock              -     3 61.0
 1932  1933 ./xdpsock -r -p -i eth1     xdpsock              -     2  0.0

So is it getting stuck at while (ret != rcvd) in rx_drop()?

Is it normal to get past poll() and then have the count returned by
xsk_ring_prod__reserve() differ from the count returned by
xsk_ring_cons__peek()?
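
One way to picture the question above is a toy model of the retry loop.
The sketch below is not the xdpsock or libbpf code: toy_ring,
toy_ring_reserve() and toy_reserve_retry() are hypothetical names, and
the all-or-nothing reserve is an assumption about how the fill-ring
reserve behaves. It only illustrates that if nothing advances the
consumer index, a loop of the form while (ret != rcvd) makes no progress
and never reaches poll(), which would be consistent with strace going
quiet while cpu usage stays high.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy single-producer ring: only the indices matter for this sketch. */
struct toy_ring {
	uint32_t size;		/* number of slots */
	uint32_t producer;	/* advanced by user space on reserve */
	uint32_t consumer;	/* advanced by the other side when it takes entries */
};

/* Free slots; unsigned arithmetic copes with index wrap-around. */
static uint32_t toy_ring_free(const struct toy_ring *r)
{
	return r->size - (r->producer - r->consumer);
}

/* All-or-nothing reserve: returns nb on success, 0 if fewer than nb slots are free. */
static size_t toy_ring_reserve(struct toy_ring *r, size_t nb, uint32_t *idx)
{
	if (toy_ring_free(r) < nb)
		return 0;
	*idx = r->producer;
	r->producer += (uint32_t)nb;
	return nb;
}

/*
 * Bounded stand-in for a "while (ret != rcvd)" retry loop.
 * Returns the number of attempts used, or -1 if the ring never had room.
 */
static int toy_reserve_retry(struct toy_ring *r, size_t rcvd, uint32_t *idx,
			     int max_tries)
{
	for (int tries = 1; tries <= max_tries; tries++) {
		if (toy_ring_reserve(r, rcvd, idx) == rcvd)
			return tries;
		/* Nothing here sleeps or wakes the consumer; unbounded, this would spin. */
	}
	return -1;
}
```

With a full ring and a consumer that never moves, toy_reserve_retry()
runs out of attempts; once the consumer index advances by the requested
amount, the first attempt succeeds. The bound exists only so the sketch
terminates.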

I see the added xsk_ring_prod__needs_wakeup() with the extra poll() in
the latest 5.4 kernels, but I don't think any of the needs_wakeup
stuff is in the 5.2 kernel. Is that needed for this case?

This is with a 5.2.21 preempt-rt kernel on arm64 using the macb driver
(so XDP_SKB and not XDP_DRV).

Any thoughts would be appreciated.

thanks,
Paul


* Re: xdpsock poll with 5.2.21rt kernel
  2019-11-12 22:42 xdpsock poll with 5.2.21rt kernel Paul Thomas
@ 2019-11-29 16:48 ` Sebastian Andrzej Siewior
  2019-12-02 15:36   ` Paul Thomas
  0 siblings, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-11-29 16:48 UTC (permalink / raw)
  To: Paul Thomas
  Cc: Alexei Starovoitov, Daniel Borkmann, David Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev,
	xdp-newbies, bpf, linux-rt-users

On 2019-11-12 17:42:42 [-0500], Paul Thomas wrote:
> Any thoughts would be appreciated.

Could you please enable CONFIG_DEBUG_ATOMIC_SLEEP and check if the kernel
complains?

> thanks,
> Paul

Sebastian


* Re: xdpsock poll with 5.2.21rt kernel
  2019-11-29 16:48 ` Sebastian Andrzej Siewior
@ 2019-12-02 15:36   ` Paul Thomas
  2019-12-02 16:26     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 5+ messages in thread
From: Paul Thomas @ 2019-12-02 15:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Alexei Starovoitov, Daniel Borkmann, David Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev,
	xdp-newbies, bpf, linux-rt-users

On Fri, Nov 29, 2019 at 11:48 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2019-11-12 17:42:42 [-0500], Paul Thomas wrote:
> > Any thoughts would be appreciated.
>
> Could you please enable CONFIG_DEBUG_ATOMIC_SLEEP and check if the kernel
> complains?

Hi Sebastian,

Well, it does complain (report below), but I'm not sure it's related.
The other thing I tried was the AF_XDP example here:
https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP

With this example poll() always seems to block correctly, so I think
maybe there is something wrong with the xdpsock_user.c example or how
I'm using it.

[  259.591480] BUG: assuming atomic context at net/core/ptp_classifier.c:106
[  259.591488] in_atomic(): 0, irqs_disabled(): 0, pid: 953, name: irq/22-eth%d
[  259.591494] CPU: 0 PID: 953 Comm: irq/22-eth%d Tainted: G        WC       5.2.21-rt13-00016-g93898e751d0e #90
[  259.591499] Hardware name: Enclustra XU5 SOM (DT)
[  259.591501] Call trace:
[  259.591503] dump_backtrace (/arch/arm64/kernel/traps.c:94)
[  259.591514] show_stack (/arch/arm64/kernel/traps.c:151)
[  259.591520] dump_stack (/lib/dump_stack.c:115)
[  259.591526] __cant_sleep (/kernel/sched/core.c:6386)
[  259.591531] ptp_classify_raw (/./include/linux/compiler.h:194
/./include/asm-generic/atomic-instrumented.h:27
/./include/linux/jump_label.h:251 /net/core/ptp_classifier.c:106)
[  259.591537] skb_defer_rx_timestamp (/./include/linux/skbuff.h:2236
/net/core/timestamping.c:60)
[  259.591541] netif_receive_skb_internal (/net/core/dev.c:5217)
[  259.591547] netif_receive_skb (/net/core/dev.c:5296)
[  259.591550] gem_rx (/drivers/net/ethernet/cadence/macb_main.c:993)
[  259.591556] macb_poll (/drivers/net/ethernet/cadence/macb_main.c:1265)
[  259.591561] net_rx_action (/net/core/dev.c:6387 /net/core/dev.c:6461)
[  259.591565] __do_softirq (/./include/linux/compiler.h:194
/./arch/arm64/include/asm/preempt.h:12 /kernel/softirq.c:400)
[  259.591569] __local_bh_enable_ip (/kernel/softirq.c:182)
[  259.591574] irq_forced_thread_fn (/kernel/irq/manage.c:1008)
[  259.591579] irq_thread (/kernel/irq/manage.c:1101)
[  259.591584] kthread (/kernel/kthread.c:255)
[  259.591589] ret_from_fork (/arch/arm64/kernel/entry.S:1176)


* Re: xdpsock poll with 5.2.21rt kernel
  2019-12-02 15:36   ` Paul Thomas
@ 2019-12-02 16:26     ` Sebastian Andrzej Siewior
  2019-12-02 17:11       ` Paul Thomas
  0 siblings, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-12-02 16:26 UTC (permalink / raw)
  To: Paul Thomas
  Cc: Alexei Starovoitov, Daniel Borkmann, David Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev,
	xdp-newbies, bpf, linux-rt-users

On 2019-12-02 10:36:54 [-0500], Paul Thomas wrote:
> On Fri, Nov 29, 2019 at 11:48 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > On 2019-11-12 17:42:42 [-0500], Paul Thomas wrote:
> > > Any thoughts would be appreciated.
> >
> > Could you please enable CONFIG_DEBUG_ATOMIC_SLEEP and check if the kernel
> > complains?
> 
> Hi Sebastian,
> 
> Well, it does complain (report below), but I'm not sure it's related.
> The other thing I tried was the AF_XDP example here:
> https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP
> 
> With this example poll() always seems to block correctly, so I think
> maybe there is something wrong with the xdpsock_user.c example or how
> I'm using it.
> 
> [  259.591480] BUG: assuming atomic context at net/core/ptp_classifier.c:106
> [  259.591488] in_atomic(): 0, irqs_disabled(): 0, pid: 953, name: irq/22-eth%d
> [  259.591494] CPU: 0 PID: 953 Comm: irq/22-eth%d Tainted: G        WC       5.2.21-rt13-00016-g93898e751d0e #90
> [  259.591499] Hardware name: Enclustra XU5 SOM (DT)
> [  259.591501] Call trace:
> [  259.591503] dump_backtrace (/arch/arm64/kernel/traps.c:94)
> [  259.591514] show_stack (/arch/arm64/kernel/traps.c:151)
> [  259.591520] dump_stack (/lib/dump_stack.c:115)
> [  259.591526] __cant_sleep (/kernel/sched/core.c:6386)
> [  259.591531] ptp_classify_raw (/./include/linux/compiler.h:194

Is this the only splat? Nothing more? I would expect something at boot
time, too.

So this part expects disabled preemption. Other invocations disable
preemption. The whole BPF part is currently not working on -RT.

Sebastian


* Re: xdpsock poll with 5.2.21rt kernel
  2019-12-02 16:26     ` Sebastian Andrzej Siewior
@ 2019-12-02 17:11       ` Paul Thomas
  0 siblings, 0 replies; 5+ messages in thread
From: Paul Thomas @ 2019-12-02 17:11 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Alexei Starovoitov, Daniel Borkmann, David Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend, netdev,
	xdp-newbies, bpf, linux-rt-users

> >
> > Well, it does complain (report below), but I'm not sure it's related.
> > The other thing I tried was the AF_XDP example here:
> > https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP
> >
> > With this example poll() always seems to block correctly, so I think
> > maybe there is something wrong with the xdpsock_user.c example or how
> > I'm using it.
> >
> > [  259.591480] BUG: assuming atomic context at net/core/ptp_classifier.c:106
> > [  259.591488] in_atomic(): 0, irqs_disabled(): 0, pid: 953, name: irq/22-eth%d
> > [  259.591494] CPU: 0 PID: 953 Comm: irq/22-eth%d Tainted: G        WC       5.2.21-rt13-00016-g93898e751d0e #90
> > [  259.591499] Hardware name: Enclustra XU5 SOM (DT)
> > [  259.591501] Call trace:
> > [  259.591503] dump_backtrace (/arch/arm64/kernel/traps.c:94)
> > [  259.591514] show_stack (/arch/arm64/kernel/traps.c:151)
> > [  259.591520] dump_stack (/lib/dump_stack.c:115)
> > [  259.591526] __cant_sleep (/kernel/sched/core.c:6386)
> > [  259.591531] ptp_classify_raw (/./include/linux/compiler.h:194
>
> Is this the only splat? Nothing more? I would expect something at boot
> time, too.
I should have explained more. This seems to happen every second,
starting at boot, in ptp_classifier.c regardless of whether I'm doing
anything with BPF.

>
> So this part expects disabled preemption. Other invocations disable
> preemption. The whole BPF part is currently not working on -RT.
OK, so I should expect more issues as we play with AF_XDP? An
application based on the other example [1] is at least running.
Preempt-rt + AF_XDP seems like an awesome combination, so hopefully
any BPF issues can be resolved.

thanks,
Paul

[1] https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP


