From: Srivats P <pstavirs@gmail.com>
To: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>,
	Xdp <xdp-newbies@vger.kernel.org>
Subject: Re: AF_XDP sendto kick returning EPERM
Date: Tue, 11 May 2021 17:32:43 +0530
Message-ID: <CANzUK58Bsurc=ACPEqKcKpxZnPuiR84bFvu27ZNr1x8N-JxKWg@mail.gmail.com>
In-Reply-To: <20210509154136.GA36905@ranger.igk.intel.com>

On Sun, May 9, 2021 at 9:24 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Fri, May 07, 2021 at 08:39:04PM +0530, Srivats P wrote:
> > Here's an update -
> >
> > On Fri, May 7, 2021 at 8:17 PM Srivats P <pstavirs@gmail.com> wrote:
> > >
> > > On Mon, May 3, 2021 at 1:54 PM Magnus Karlsson
> > > <magnus.karlsson@gmail.com> wrote:
> > > >
> > > > On Thu, Apr 29, 2021 at 5:47 PM Srivats P <pstavirs@gmail.com> wrote:
> > > > >
> > > > > On Tue, Apr 27, 2021 at 12:58 PM Magnus Karlsson
> > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Apr 23, 2021 at 5:44 PM Srivats P <pstavirs@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm using sendto() to kick tx in my AF_XDP program after I submit
> > > > > > > descriptors to the tx ring -
> > > > > > >
> > > > > > > ret = sendto(xsk_socket__fd(xsk_), NULL, 0, MSG_DONTWAIT, NULL, 0);
> > > > > > >
> > > > > > > However, I'm receiving EPERM as the return value every time. AFAIK
> > > > > > > this is not an expected return value. Since this is with i40e, I
> > > > > > > checked i40e_xsk_wakeup() - but that also doesn't return EPERM. I am
> > > > > > > running as root and I don't see any problems with creating the xsk,
> > > > > > > configuring umem etc.
> > > > > > >
> > > > > > > Also, no packets seem to go out either.
> > > > > > >
> > > > > > > # uname -a
> > > > > > > Linux Ostinato-1 5.11.15-1-default #1 SMP Fri Apr 16 16:47:34 UTC 2021
> > > > > > > (64fb5bf) x86_64 x86_64 x86_64 GNU/Linux
> > > > > > >
> > > > > > > I don't see the problem on another machine with i40e but older kernel 5.4 series
> > > > > > >
> > > > > > > Any suggestions on what to look for or how to proceed?
> > > > > >
> > > > > > Weird. Have not seen this before. What is your command line for
> > > > > > xdpsock? Is it unmodified?
> > > > >
> > > > > This is not xdpsock, but my own AF_XDP program.
> > > > >
> > > > > >
> > > > > > Using bpftrace, we can get the call stack of xsk_sendmsg. Somewhere in
> > > > > > this stack there must be an EPERM. You can run the same command on
> > > > > > your system, but use ftrace to see what a sendto call hits. Then see
> > > > > > where the code terminates.
> > > > > >
> > > > > > mkarlsso@kurt:~/src/dna-linux$ sudo bpftrace -e 'kprobe:xsk_sendmsg {
> > > > > > @[kstack()] = count(); }'
> > > > > > Attaching 1 probe...
> > > > > > ^C
> > > > > >
> > > > > > @[
> > > > > >     xsk_sendmsg+1
> > > > > >     sock_sendmsg+94
> > > > > >     __sys_sendto+238
> > > > > >     __x64_sys_sendto+37
> > > > > >     do_syscall_64+51
> > > > > >     entry_SYSCALL_64_after_hwframe+68
> > > > > > ]: 2244805
> > > > >
> > > > > Ostinato-1:~ # bpftrace -e 'kprobe:xsk_sendmsg {
> > > > > @[kstack()] = count(); }'
> > > > > Attaching 1 probe...
> > > > > ^C
> > > > > @[
> > > > >     xsk_sendmsg+1
> > > > >     sock_sendmsg+94
> > > > >     __sys_sendto+238
> > > > >     __x64_sys_sendto+37
> > > > >     do_syscall_64+51
> > > > >     entry_SYSCALL_64_after_hwframe+68
> > > > > ]: 1253307
> > > > >
> > > > > Which doesn't seem to suggest any error - I've looked at the source
> > > > > code for all these functions, but don't see any reference to EPERM.
> > > >
> > > > It must be in there somewhere :-). Could you please use ftrace
> > > > (through perf for example) and trace all functions that a sendto hits
> > > > in your case? Then we might see what it hits.
> > > >
> > > > Are you running in SKB mode or in zero-copy mode? Guess it is
> > > > zero-copy from your mail, but just want to verify. Does Rx work as
> > > > expected?
> > > >
> > > > Could you share your AF_XDP program?
>
> +1, that would help us probably :)

The code is proprietary, but if required I can extract relevant bits
into a sample program or modify the sample xdpsock_user.c suitably.

>
> > >
> > > After some experimentation and a lot of head-scratching, I found part
> > > of the problem last night. The sendto() was not returning EPERM (-1),
> > > but ENXIO (-6) - I was mistakenly printing the return value of the
> > > sendto() call (which always returns -1 on failure) instead of
> > > errno (duh!).
> > >
> > > Looking at the code, I see ENXIO is returned if the xsk is unbound.
> > > I'm still investigating this and will post an update soon. The problem
> > > is happening at a customer end and there's some delay and follow up
> > > required to get the logs.
> >
> > sendto() was returning ENXIO because the interface MTU was set to 9000
> > which I know is not supported with AF_XDP. But shouldn't
> > xsk_socket__create() fail in this case? Note the actual packet being
> > transmitted was 64 bytes.
>
> It depends. You said that you have your own AF_XDP app, so if you're
> setting the XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD flag then libbpf wouldn't
> be loading the built-in AF_XDP eBPF prog on interface and that's where the
> failure should happen.

I used AF_XDP for TX only with my own eBPF program for RX. For this
reason, I was using INHIBIT_PROG_LOAD while opening the xsk. That's
why I didn't see an error while creating the xsk.

>
> >
> > Not sure if it plays a role in the above sendto() failure, but before
> > creating the xsk socket, my call to bpf_set_link_xdp_fd() was failing
> > because of the MTU problem (the newly added error message for this
> > case was very helpful!). Once the MTU was reduced to 1500, both the
> > failure to attach the Rx eBPF program to the interface and the
> > persistent ENXIO from the Tx sendto() went away. This was on kernel
> > version 5.12.
> >
> > Can someone tell me what is expected to happen for a Tx AF_XDP socket
> > in case of MTU > 4K?
>
> See the last paragraph.
>
> >
> > I also found a second case of sendto() returning ENXIO. In this
> > scenario, I was removing my RX eBPF program by calling
> >
> >     bpf_set_link_xdp_fd(ifIndex, -1, 0)
> >
> > while AF_XDP transmit (and the associated sendto() wakeup) was still
> > going on. In this case, sendto() fails with ENETDOWN for some time and
> > then with ENXIO. This case was on kernel version 5.4.0.
>
> I think that we addressed the ENETDOWN Tx issue with the following set:
> https://lore.kernel.org/netdev/20200205045834.56795-1-maciej.fijalkowski@intel.com/
>
> I see that it has been merged in 5.6. But it was related to being unable
> to spawn multiple AF_XDP Tx-only instances. With what you're saying it
> feels to me that you have multiple instances of your AF_XDP progs and you
> terminate one of them? Previously, every instance would die due to the
> fact that the underlying XDP prog would be unloaded from interface, but
> right now we have bpf_link support for AF_XDP which would handle that
> properly. Note that it was developed for the built-in prog.

I think my case is different. I have only one AF_XDP Tx-only instance,
but I'm not using the built-in AF_XDP eBPF program. So when I remove
my eBPF program, the AF_XDP Tx also gets affected. I solved my problem
by tearing down the AF_XDP Tx socket before removing my custom eBPF Rx
program.
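The fix described here is an ordering rule. A minimal sketch of one way a program might encode it, assuming hypothetical callback names: in a libbpf-based program, `tx_cleanup` would stop whatever thread is still issuing sendto() kicks and then call `xsk_socket__delete()` and `xsk_umem__delete()`, while `rx_detach` would call `bpf_set_link_xdp_fd(ifindex, -1, 0)` as quoted earlier in the thread.

```c
/* Hypothetical teardown hooks supplied by the application. */
struct xdp_teardown {
    /* Stop any thread still issuing sendto() kicks, then delete the AF_XDP
     * socket and its umem (e.g. xsk_socket__delete(), xsk_umem__delete()). */
    void (*tx_cleanup)(void *ctx);
    /* Detach the custom Rx XDP program, e.g. bpf_set_link_xdp_fd(ifindex, -1, 0).
     * Returns 0 on success or a negative error code. */
    int (*rx_detach)(void *ctx);
    void *ctx;
};

/* Tear down in the order that worked in this thread: AF_XDP Tx first, Rx
 * XDP program last. Detaching the program while Tx is still running can
 * trigger a driver reset that tears down the XDP Tx resources, after which
 * the Tx socket fails with ENETDOWN and later ENXIO. */
static int xdp_teardown_run(const struct xdp_teardown *t)
{
    t->tx_cleanup(t->ctx);
    return t->rx_detach(t->ctx);
}
```

Putting both steps behind one function means no code path can detach the Rx program while the Tx socket is still live.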

>
> >
> > Does removing an XDP program cause the interface to go down (ENETDOWN),
> > leading to the XDP socket being unbound (ENXIO)? Should removing (or
> > replacing) an Rx eBPF program affect AF_XDP Tx?
>
> Removing an XDP prog causes the interface to go through a reset (or some
> other reconfiguration), because the driver needs to remove the XDP Tx
> resources and change the Rx memory model. For Intel drivers, the AF_XDP
> Tx resources are configured when the Rx eBPF prog is loaded. We would
> have to develop some mechanism that decouples the creation of XDP Tx
> resources from loading the Rx eBPF prog. There have been discussions
> around feature detection, but I think they were about the opposite:
> don't configure Tx rings if your prog will not be doing the XDP_TX action.

I guess I was implicitly assuming that the XDP Tx and Rx paths are
independent, which is not the case. This is good to keep in mind while
coding.
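Since the Tx path can be disturbed by Rx-side changes, a Tx loop should read errno on every failed kick instead of the -1 returned by sendto(); printing that -1 is what made ENXIO look like EPERM at the start of this thread. A minimal sketch, loosely modeled on the `kick_tx()` helper in the kernel's xdpsock_user.c sample; `xsk_tx_kick()` is a hypothetical name, and which errno values to treat as retryable is an assumption.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Kick the kernel to process the Tx ring of the AF_XDP socket xsk_fd
 * (e.g. the value returned by xsk_socket__fd()). Returns 0 if the kick went
 * through or hit a condition worth retrying, otherwise the errno value. */
static int xsk_tx_kick(int xsk_fd)
{
    /* sendto() returns -1 on any failure; the cause is only in errno. */
    if (sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0) >= 0)
        return 0;

    int err = errno; /* save before another libc call overwrites it */
    switch (err) {
    case EAGAIN:
    case EBUSY:
    case ENOBUFS:
    case ENETDOWN: /* seen transiently while the driver reconfigures */
        return 0;  /* assumption: simply retry on the next kick */
    default:       /* e.g. ENXIO: the socket is no longer bound */
        fprintf(stderr, "AF_XDP tx kick failed: %s (errno %d)\n",
                strerror(err), err);
        return err;
    }
}
```

Passing `xsk_socket__fd(xsk)` as `xsk_fd` gives the same call as the original kick. Treating ENETDOWN as retryable matches the transient phase reported on 5.4, while ENXIO (socket unbound) is surfaced to the caller.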

I think it might be a worthwhile goal to allow the eBPF program to be
removed/replaced without affecting Tx - not sure how feasible that is
though.

Thanks for all the help!

>
> >
> > Srivats


Thread overview: 8+ messages
2021-04-23 15:44 AF_XDP sendto kick returning EPERM Srivats P
2021-04-27  7:28 ` Magnus Karlsson
2021-04-29 15:47   ` Srivats P
2021-05-03  8:24     ` Magnus Karlsson
2021-05-07 14:47       ` Srivats P
2021-05-07 15:09         ` Srivats P
2021-05-09 15:41           ` Maciej Fijalkowski
2021-05-11 12:02             ` Srivats P [this message]
