* AF_XDP not transmitting frames immediately
@ 2021-12-13 21:04 Jesper Dangaard Brouer
  2021-12-14  8:07 ` Karlsson, Magnus
  0 siblings, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2021-12-13 21:04 UTC
  To: Karlsson, Magnus, Björn Töpel
  Cc: brouer, Xdp, Ong Boon Leong, Joao Pedro Barros Silva,
	Diogo Alexandre Da Silva Lima

Hi Magnus and Bjørn,

I'm working on an AF_XDP program[1] that needs to send a bulk of packets
in a short time window (related to Time-Triggered Ethernet).

My observations are that AF_XDP doesn't send the frames immediately.
And yes, I do call sendto() to trigger a TX kick.
In zero-copy mode this is particularly bad.  My program wants to send 4
packets in a burst, but I'm observing 8 packets grouped together on the
receiving host.

Is this a known property of AF_XDP?

How can I get AF_XDP to "flush" TX packets when calling sendto()?
Should we add a flag other than the current MSG_DONTWAIT?

--Jesper

Hint: I'm using tcpdump hardware timestamping on the receiving host via this command line:

  tcpdump -vv -s0 -ni eth1 -j adapter_unsynced --time-stamp-precision=nano -w af_xdp_tx_cyclic.dump42

Note that [1] is on a specific branch:
  [1] https://github.com/xdp-project/bpf-examples/tree/vestas03_AF_XDP_example/AF_XDP-interaction
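
For context, a minimal sketch of the usual user-space Tx pattern, assuming the socket was bound with XDP_USE_NEED_WAKEUP: publish all descriptors with a single producer update, then issue the sendto() kick only when the kernel has set the need-wakeup flag on the Tx ring. The ring, the flag value and every toy_* name are a hypothetical in-memory model, not the libxdp/libbpf API; the comments name the real calls they stand in for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory model of an AF_XDP Tx ring as seen from user space.
 * Real code would use xsk_ring_prod__reserve(), xsk_ring_prod__tx_desc(),
 * xsk_ring_prod__submit() and xsk_ring_prod__needs_wakeup(). */
#define TOY_RING_SIZE 8u          /* must be a power of two */
#define TOY_NEED_WAKEUP 0x1u      /* stands in for XDP_RING_NEED_WAKEUP */

struct toy_tx_desc {
    uint64_t addr;                /* frame offset in the UMEM */
    uint32_t len;
};

struct toy_tx_ring {
    uint32_t prod;                /* advanced by user space */
    uint32_t cons;                /* advanced by the kernel/driver */
    uint32_t flags;               /* kernel sets TOY_NEED_WAKEUP */
    struct toy_tx_desc desc[TOY_RING_SIZE];
    unsigned kicks;               /* number of simulated sendto() calls */
};

/* Stand-in for: sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0) */
static void toy_kick(struct toy_tx_ring *r)
{
    r->kicks++;
}

/* Queue up to n frames, publish them with one producer update, then kick
 * only if the kernel asked for a wakeup. Returns the number queued. */
static uint32_t toy_send_burst(struct toy_tx_ring *r, const uint64_t *addrs,
                               uint32_t n, uint32_t len)
{
    uint32_t free_slots = TOY_RING_SIZE - (r->prod - r->cons);
    uint32_t todo = n < free_slots ? n : free_slots;

    for (uint32_t i = 0; i < todo; i++) {
        struct toy_tx_desc *d = &r->desc[(r->prod + i) & (TOY_RING_SIZE - 1)];
        d->addr = addrs[i];
        d->len = len;
    }
    r->prod += todo;              /* the "submit" step */
    assert(r->prod - r->cons <= TOY_RING_SIZE);

    if (r->flags & TOY_NEED_WAKEUP)
        toy_kick(r);
    return todo;
}
```

In this model the kick itself carries no "flush" semantics; what happens to the queued frames after the wakeup is up to the kernel and the driver.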



* RE: AF_XDP not transmitting frames immediately
  2021-12-13 21:04 AF_XDP not transmitting frames immediately Jesper Dangaard Brouer
@ 2021-12-14  8:07 ` Karlsson, Magnus
  2021-12-14 10:32   ` Jesper Dangaard Brouer
  2021-12-15 10:17   ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 16+ messages in thread
From: Karlsson, Magnus @ 2021-12-14  8:07 UTC
  To: Jesper Dangaard Brouer, Björn Töpel
  Cc: Brouer, Jesper, Xdp, Ong, Boon Leong, Joao Pedro Barros Silva,
	Diogo Alexandre Da Silva Lima



> -----Original Message-----
> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> Sent: Monday, December 13, 2021 10:04 PM
> To: Karlsson, Magnus <magnus.karlsson@intel.com>; Björn Töpel
> <bjorn@kernel.org>
> Cc: Brouer, Jesper <brouer@redhat.com>; Xdp <xdp-newbies@vger.kernel.org>;
> Ong, Boon Leong <boon.leong.ong@intel.com>; Joao Pedro Barros Silva
> <jopbs@vestas.com>; Diogo Alexandre Da Silva Lima <dioli@vestas.com>
> Subject: AF_XDP not transmitting frames immediately
> 
> Hi Magnus and Bjørn,
> 
> I'm working on an AF_XDP program[1] that needs to send a bulk of packets
> in a short time window (related to Time-Triggered Ethernet).
> 
> My observations are that AF_XDP doesn't send the frames immediately.
> And yes, I do call sendto() to trigger a TX kick.
> In zero-copy mode this is particularly bad.  My program wants to send 4
> packets in a burst, but I'm observing 8 packets grouped together on the
> receiving host.
> 
> Is this a known property of AF_XDP?

Nope! It is supposed to be able to send one packet at a time, though I have several times seen driver bugs where the batching behavior shines through like this, and once a bug in the core code. There is even a test these days for sending just a single packet, since we have had issues with this in the past. That test does pass in bpf-next, but it is only run with the veth driver, which does not support zero-copy, so this could still be an issue. What driver are you using in zero-copy mode, and what kernel version are you on?

> How can I get AF_XDP to "flush" TX packets when calling sendto()?
> Should we add a flag other than the current MSG_DONTWAIT?

In zero-copy mode with softirq driver processing (not busy poll), a sendto() will just trigger the xsk_wakeup ndo, which schedules NAPI unless it is already executing. It is then up to the driver to get packets from the Tx ring, put them on the HW and make sure they are sent. Barring any HW quirks, sending a single packet should be perfectly fine.
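
A toy model of the path described above (not kernel code; all names are hypothetical): a wakeup that arrives while NAPI is already scheduled is coalesced, so if a driver only services the Tx ring when NAPI runs, two kicks for two bursts of 4 can leave the NIC as one group of 8. This is one plausible way batching can shine through, not a diagnosis of igc.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of: sendto() -> xsk_wakeup ndo -> schedule NAPI
 * unless it is already scheduled/running. */
struct toy_napi {
    bool scheduled;      /* models the NAPI "scheduled" state bit */
    unsigned pending;    /* descriptors user space has put in the Tx ring */
    unsigned polls;      /* how many times the poll routine ran */
};

/* Returns true if this wakeup actually scheduled a new poll. */
static bool toy_xsk_wakeup(struct toy_napi *n)
{
    if (n->scheduled)
        return false;    /* already pending: this kick is coalesced */
    n->scheduled = true;
    return true;
}

/* Poll: hand up to budget pending frames to the "hardware".
 * Returns how many frames went out in this poll. */
static unsigned toy_napi_poll(struct toy_napi *n, unsigned budget)
{
    unsigned sent = n->pending < budget ? n->pending : budget;

    assert(n->scheduled);
    n->pending -= sent;
    n->polls++;
    if (n->pending == 0)
        n->scheduled = false; /* like napi_complete: re-arm for the next kick */
    return sent;
}
```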

/Magnus

> --Jesper
> 
> Hint: I'm using tcpdump hardware timestamping on the receiving host via this command line:
> 
>   tcpdump -vv -s0 -ni eth1 -j adapter_unsynced --time-stamp-precision=nano -w af_xdp_tx_cyclic.dump42
> 
> Note that [1] is on a specific branch:
>   [1] https://github.com/xdp-project/bpf-examples/tree/vestas03_AF_XDP_example/AF_XDP-interaction



* Re: AF_XDP not transmitting frames immediately
  2021-12-14  8:07 ` Karlsson, Magnus
@ 2021-12-14 10:32   ` Jesper Dangaard Brouer
  2021-12-14 10:40     ` Karlsson, Magnus
  2021-12-15 10:17   ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2021-12-14 10:32 UTC
  To: Karlsson, Magnus, Björn Töpel
  Cc: brouer, Xdp, Ong, Boon Leong, Joao Pedro Barros Silva,
	Diogo Alexandre Da Silva Lima



On 14/12/2021 09.07, Karlsson, Magnus wrote:
> 
> 
>> -----Original Message----- From: Jesper Dangaard Brouer
>> <jbrouer@redhat.com> Sent: Monday, December 13, 2021 10:04 PM To:
>> Karlsson, Magnus <magnus.karlsson@intel.com>; Björn Töpel 
>> <bjorn@kernel.org> Cc: Brouer, Jesper <brouer@redhat.com>; Xdp
>> <xdp-newbies@vger.kernel.org>; Ong, Boon Leong
>> <boon.leong.ong@intel.com>; Joao Pedro Barros Silva
>> <jopbs@vestas.com>; Diogo Alexandre Da Silva Lima 
>> <dioli@vestas.com> Subject: AF_XDP not transmitting frames
>> immediately
>> 
>> Hi Magnus and Bjørn,
>> 
>> I'm working on an AF_XDP program[1] that needs to send a bulk of
>> packets in a short time window (related to Time-Triggered
>> Ethernet).
>> 
>> My observations are that AF_XDP doesn't send the frames
>> immediately. And yes, I do call sendto() to trigger a TX kick. In
>> zero-copy mode this is particularly bad.  My program wants to send 4
>> packets in a burst, but I'm observing 8 packets grouped together on
>> the receiving host.
>> 
>> Is this a known property of AF_XDP?
> 
> Nope! It is supposed to be able to send one packet at a time, though
> I have several times seen bugs in the drivers where the batching
> behavior shines through like this, and once a bug in the core code.
> There is even a test these days for just sending a single packet,

Where is that test in the kernel tree?

> since we have had issues with this in the past. That test does pass
> in bpf-next, but it is only run with the veth driver that does not
> support zero-copy so could still be an issue. What driver are you
> using in zero-copy mode and what kernel version are you on?

Driver: igc with Intel chip i225

Kernel version: 5.15.0-net-next+ #618 SMP PREEMPT
  - Devel branch at commit 6d3b1b069946 (v5.15-12802-g6d3b1b069946)

>> How can I get AF_XDP to "flush" TX packets when calling sendto()? 
>> Should we add a flag other than the current MSG_DONTWAIT?
> 
> In zero-copy mode with softirq driver processing (not busy poll), a
> sendto will just trigger the xsk_wakeup ndo that schedules napi
> unless it is already executing. It is up to the driver to then get
> packets from the Tx ring and put them on the HW and make sure they
> are sent. Barring any HW quirks, sending a single packet should be
> perfectly fine.

I will investigate driver level issues.

I have other (100G) NICs in my testlab, but I'm using these 1G NICs 
because they support hardware timestamping, which allows me to 
investigate these timing issues.
I'll find a way to see if other drivers behave differently.

>> Hint: I'm using tcpdump hardware timestamping on the receiving host via
>> this command line:
>> 
>> tcpdump -vv -s0 -ni eth1 -j adapter_unsynced --time-stamp-precision=nano -w af_xdp_tx_cyclic.dump42
>> 
>> Notice[1] on specific branch: [1] 
>> https://github.com/xdp-project/bpf-examples/tree/vestas03_AF_XDP_example/AF_XDP-interaction
> 

Thanks for your feedback,
--Jesper



* RE: AF_XDP not transmitting frames immediately
  2021-12-14 10:32   ` Jesper Dangaard Brouer
@ 2021-12-14 10:40     ` Karlsson, Magnus
  2021-12-14 11:25       ` Maciej Fijalkowski
  2021-12-15  1:08       ` Desouza, Ederson
  0 siblings, 2 replies; 16+ messages in thread
From: Karlsson, Magnus @ 2021-12-14 10:40 UTC
  To: Jesper Dangaard Brouer, Björn Töpel, Desouza, Ederson
  Cc: Brouer, Jesper, Xdp, Ong, Boon Leong, Joao Pedro Barros Silva,
	Diogo Alexandre Da Silva Lima, Fijalkowski, Maciej

Adding Ederson and Maciej.

> -----Original Message-----
> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> Sent: Tuesday, December 14, 2021 11:32 AM
> To: Karlsson, Magnus <magnus.karlsson@intel.com>; Björn Töpel
> <bjorn@kernel.org>
> Cc: Brouer, Jesper <brouer@redhat.com>; Xdp <xdp-newbies@vger.kernel.org>;
> Ong, Boon Leong <boon.leong.ong@intel.com>; Joao Pedro Barros Silva
> <jopbs@vestas.com>; Diogo Alexandre Da Silva Lima <dioli@vestas.com>
> Subject: Re: AF_XDP not transmitting frames immediately
> 
> 
> 
> On 14/12/2021 09.07, Karlsson, Magnus wrote:
> >
> >
> >> -----Original Message----- From: Jesper Dangaard Brouer
> >> <jbrouer@redhat.com> Sent: Monday, December 13, 2021 10:04 PM To:
> >> Karlsson, Magnus <magnus.karlsson@intel.com>; Björn Töpel
> >> <bjorn@kernel.org> Cc: Brouer, Jesper <brouer@redhat.com>; Xdp
> >> <xdp-newbies@vger.kernel.org>; Ong, Boon Leong
> >> <boon.leong.ong@intel.com>; Joao Pedro Barros Silva
> >> <jopbs@vestas.com>; Diogo Alexandre Da Silva Lima <dioli@vestas.com>
> >> Subject: AF_XDP not transmitting frames immediately
> >>
> >> Hi Magnus and Bjørn,
> >>
> >> I'm working on an AF_XDP program[1] that needs to send a bulk of
> >> packets in a short time window (related to Time-Triggered Ethernet).
> >>
> >> My observations are that AF_XDP doesn't send the frames immediately.
> >> And yes, I do call sendto() to trigger a TX kick. In zero-copy mode
> >> this is particularly bad.  My program wants to send 4 packets in a
> >> burst, but I'm observing 8 packets grouped together on the receiving
> >> host.
> >>
> >> Is this a known property of AF_XDP?
> >
> > Nope! It is supposed to be able to send one packet at a time, though I
> > have several times seen bugs in the drivers where the batching
> > behavior shines through like this, and once a bug in the core code.
> > There is even a test these days for just sending a single packet,
> 
> Where is that test in the kernel tree?

In tools/testing/selftests/bpf/xdpxceiver.c. It is the RUN_TO_COMPLETION_SINGLE_PKT test. But the framework only operates on veth currently.

> > since we have had issues with this in the past. That test does pass in
> > bpf-next, but it is only run with the veth driver that does not
> > support zero-copy so could still be an issue. What driver are you
> > using in zero-copy mode and what kernel version are you on?
> 
> Driver: igc with Intel chip i225

I have never tried this one personally. I do not know if I have one in the lab, but let me check.

Ederson, do you have any experience with this card and if so, have you seen something similar?

> Kernel version: 5.15.0-net-next+ #618 SMP PREEMPT
>   - Devel branch at commit 6d3b1b069946 (v5.15-12802-g6d3b1b069946)
> 
> >> How can I get AF_XDP to "flush" TX packets when calling sendto()?
> >> Should we add a flag other than the current MSG_DONTWAIT?
> >
> > In zero-copy mode with softirq driver processing (not busy poll), a
> > sendto will just trigger the xsk_wakeup ndo that schedules napi unless
> > it is already executing. It is up to the driver to then get packets
> > from the Tx ring and put them on the HW and make sure they are sent.
> > Barring any HW quirks, sending a single packet should be perfectly fine.
> 
> I will investigate driver level issues.
> 
> I have other (100G) NICs in my testlab, but I'm using these 1G NICs because
> they support hardware timestamping, which allows me to investigate these
> timing issues.
> I'll find a way to see if other drivers behave differently.

Would be great if you could check if the problem also exists on e.g. ice. 

> >> Hint: I'm using tcpdump hardware timestamping on the receiving host via
> >> this command line:
> >>
> >> tcpdump -vv -s0 -ni eth1 -j adapter_unsynced --time-stamp-precision=nano -w af_xdp_tx_cyclic.dump42
> >>
> >> Note that [1] is on a specific branch:
> >> [1] https://github.com/xdp-project/bpf-examples/tree/vestas03_AF_XDP_example/AF_XDP-interaction
> >
> 
> Thanks for your feedback,
> --Jesper



* Re: AF_XDP not transmitting frames immediately
  2021-12-14 10:40     ` Karlsson, Magnus
@ 2021-12-14 11:25       ` Maciej Fijalkowski
  2021-12-14 14:04         ` Jesper Dangaard Brouer
  2021-12-15  1:08       ` Desouza, Ederson
  1 sibling, 1 reply; 16+ messages in thread
From: Maciej Fijalkowski @ 2021-12-14 11:25 UTC
  To: Karlsson, Magnus
  Cc: Jesper Dangaard Brouer, Björn Töpel, Desouza, Ederson,
	Brouer, Jesper, Xdp, Ong, Boon Leong, Joao Pedro Barros Silva,
	Diogo Alexandre Da Silva Lima

On Tue, Dec 14, 2021 at 10:40:05AM +0000, Karlsson, Magnus wrote:
> Adding Ederson and Maciej.
> 
> > On 14/12/2021 09.07, Karlsson, Magnus wrote:
> > >
> > >
> > >> -----Original Message----- From: Jesper Dangaard Brouer
> > >> <jbrouer@redhat.com> Sent: Monday, December 13, 2021 10:04 PM To:
> > >> Karlsson, Magnus <magnus.karlsson@intel.com>; Björn Töpel
> > >> <bjorn@kernel.org> Cc: Brouer, Jesper <brouer@redhat.com>; Xdp
> > >> <xdp-newbies@vger.kernel.org>; Ong, Boon Leong
> > >> <boon.leong.ong@intel.com>; Joao Pedro Barros Silva
> > >> <jopbs@vestas.com>; Diogo Alexandre Da Silva Lima <dioli@vestas.com>
> > >> Subject: AF_XDP not transmitting frames immediately
> > >>
> > >> Hi Magnus and Bjørn,
> > >>
> > >> I'm working on an AF_XDP program[1] that needs to send a bulk of
> > >> packets in a short time window (related to Time-Triggered Ethernet).
> > >>
> > >> My observations are that AF_XDP doesn't send the frames immediately.
> > >> And yes, I do call sendto() to trigger a TX kick. In zero-copy mode
> > >> this is particularly bad.  My program wants to send 4 packets in a
> > >> burst, but I'm observing 8 packets grouped together on the receiving
> > >> host.
> > >>
> > >> Is this a known property of AF_XDP?
> > >
> > > Nope! It is supposed to be able to send one packet at a time, though I
> > > have several times seen bugs in the drivers where the batching
> > > behavior shines through like this, and once a bug in the core code.
> > > There is even a test these days for just sending a single packet,
> >
> > Where is that test in the kernel tree?
> 
> In tools/testing/selftests/bpf/xdpxceiver.c. It is the RUN_TO_COMPLETION_SINGLE_PKT test. But the framework only operates on veth currently.

I'd say it's the driver's fault. Magnus fixed something similar for i40e:
https://lore.kernel.org/netdev/20210401172107.1191618-3-anthony.l.nguyen@intel.com/

We don't currently have igc HW on our side to dig into this :<

> 
> > > since we have had issues with this in the past. That test does pass in
> > > bpf-next, but it is only run with the veth driver that does not
> > > support zero-copy so could still be an issue. What driver are you
> > > using in zero-copy mode and what kernel version are you on?
> >
> > Driver: igc with Intel chip i225
> 
> Have never tried this one personally. Do not know if I have one in the lab but let me check.
> 
> Ederson, do you have any experience with this card and if so, have you seen something similar?
> 
> > Kernel version: 5.15.0-net-next+ #618 SMP PREEMPT
> >   - Devel branch at commit 6d3b1b069946 (v5.15-12802-g6d3b1b069946)
> >
> > >> How can I get AF_XDP to "flush" TX packets when calling sendto()?
> > >> Should we add a flag other than the current MSG_DONTWAIT?
> > >
> > > In zero-copy mode with softirq driver processing (not busy poll), a
> > > sendto will just trigger the xsk_wakeup ndo that schedules napi unless
> > > it is already executing. It is up to the driver to then get packets
> > > from the Tx ring and put them on the HW and make sure they are sent.
> > > Barring any HW quirks, sending a single packet should be perfectly fine.
> >
> > I will investigate driver level issues.
> >
> > I have other (100G) NICs in my testlab, but I'm using these 1G NICs because
> > they support hardware timestamping, which allows me to investigate these
> > timing issues.
> > I'll find a way to see if other drivers behave differently.
> 
> Would be great if you could check if the problem also exists on e.g. ice.
> 
> > >> Hint: I'm using tcpdump hardware timestamping on the receiving host via
> > >> this command line:
> > >>
> > >> tcpdump -vv -s0 -ni eth1 -j adapter_unsynced --time-stamp-precision=nano -w af_xdp_tx_cyclic.dump42
> > >>
> > >> Note that [1] is on a specific branch:
> > >> [1] https://github.com/xdp-project/bpf-examples/tree/vestas03_AF_XDP_example/AF_XDP-interaction
> > >
> >
> > Thanks for your feedback,
> > --Jesper
> 


* Re: AF_XDP not transmitting frames immediately
  2021-12-14 11:25       ` Maciej Fijalkowski
@ 2021-12-14 14:04         ` Jesper Dangaard Brouer
  2021-12-14 14:57           ` Ong, Boon Leong
  0 siblings, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2021-12-14 14:04 UTC
  To: Maciej Fijalkowski, Karlsson, Magnus
  Cc: brouer, Jesper Dangaard Brouer, Björn Töpel, Desouza,
	Ederson, Xdp, Ong, Boon Leong, Joao Pedro Barros Silva,
	Diogo Alexandre Da Silva Lima



On 14/12/2021 12.25, Maciej Fijalkowski wrote:
> On Tue, Dec 14, 2021 at 10:40:05AM +0000, Karlsson, Magnus wrote:
>> Adding Ederson and Maciej.
>>
>>> On 14/12/2021 09.07, Karlsson, Magnus wrote:
>>>>
>>>>
>>>>> -----Original Message----- From: Jesper Dangaard Brouer
>>>>> <jbrouer@redhat.com> Sent: Monday, December 13, 2021 10:04 PM To:
>>>>> Karlsson, Magnus <magnus.karlsson@intel.com>; Björn Töpel
>>>>> <bjorn@kernel.org> Cc: Brouer, Jesper <brouer@redhat.com>; Xdp
>>>>> <xdp-newbies@vger.kernel.org>; Ong, Boon Leong
>>>>> <boon.leong.ong@intel.com>; Joao Pedro Barros Silva
>>>>> <jopbs@vestas.com>; Diogo Alexandre Da Silva Lima <dioli@vestas.com>
>>>>> Subject: AF_XDP not transmitting frames immediately
>>>>>
>>>>> Hi Magnus and Bjørn,
>>>>>
>>>>> I'm working on an AF_XDP program[1] that needs to send a bulk of
>>>>> packets in a short time window (related to Time-Triggered Ethernet).
>>>>>
>>>>> My observations are that AF_XDP doesn't send the frames immediately.
>>>>> And yes, I do call sendto() to trigger a TX kick. In zero-copy mode
>>>>> this is particularly bad.  My program wants to send 4 packets in a
>>>>> burst, but I'm observing 8 packets grouped together on the receiving
>>>>> host.
>>>>>
>>>>> Is this a known property of AF_XDP?
>>>>
>>>> Nope! It is supposed to be able to send one packet at a time, though I
>>>> have several times seen bugs in the drivers where the batching
>>>> behavior shines through like this, and once a bug in the core code.
>>>> There is even a test these days for just sending a single packet,
>>>
>>> Where is that test in the kernel tree?
>>
>> In tools/testing/selftests/bpf/xdpxceiver.c. It is the RUN_TO_COMPLETION_SINGLE_PKT test. But the framework only operates on veth currently.
> 
> I'd say it's the driver's fault. Magnus fixed something similar for i40e:
> https://lore.kernel.org/netdev/20210401172107.1191618-3-anthony.l.nguyen@intel.com/

Thanks for that hint.

> 
> We don't currently have igc HW on our side to dig into this :<

I suspected Boon Leong (cc) would have this hardware.

>>
>>>> since we have had issues with this in the past. That test does pass in
>>>> bpf-next, but it is only run with the veth driver that does not
>>>> support zero-copy so could still be an issue. What driver are you
>>>> using in zero-copy mode and what kernel version are you on?
>>>
>>> Driver: igc with Intel chip i225
>>
>> Have never tried this one personally. Do not know if I have one in the lab but let me check.
>>
>> Ederson, do you have any experience with this card and if so, have you seen something similar?
>>
>>> Kernel version: 5.15.0-net-next+ #618 SMP PREEMPT
>>>    - Devel branch at commit 6d3b1b069946 (v5.15-12802-g6d3b1b069946)
>>>
>>>>> How can I get AF_XDP to "flush" TX packets when calling sendto()?
>>>>> Should we add a flag other than the current MSG_DONTWAIT?
>>>>
>>>> In zero-copy mode with softirq driver processing (not busy poll), a
>>>> sendto will just trigger the xsk_wakeup ndo that schedules napi unless
>>>> it is already executing. It is up to the driver to then get packets
>>>> from the Tx ring and put them on the HW and make sure they are sent.
>>>> Barring any HW quirks, sending a single packet should be perfectly fine.
>>>
>>> I will investigate driver level issues.
>>>
>>> I have other (100G) NICs in my testlab, but I'm using these 1G NICs because
>>> they support hardware timestamping, which allows me to investigate these
>>> timing issues.
>>> I'll find a way to see if other drivers behave differently.
>>
>> Would be great if you could check if the problem also exists on e.g. ice.
>>

I'm having issues getting my ice hardware to link up.

I tested that the i40e driver works as expected. Thus, this is likely an
issue with the igc driver.  I will dig some more.


>>>>> Hint: I'm using tcpdump hardware timestamping on the receiving host via
>>>>> this command line:
>>>>>
>>>>> tcpdump -vv -s0 -ni eth1 -j adapter_unsynced --time-stamp-precision=nano -w af_xdp_tx_cyclic.dump42
>>>>>
>>>>> Note that [1] is on a specific branch:

[1] https://github.com/xdp-project/bpf-examples/tree/vestas03_AF_XDP_example/AF_XDP-interaction

In [1] I tried to play with SO_PREFER_BUSY_POLL (see commit [2]), but it
didn't make a difference.

[2] https://github.com/xdp-project/bpf-examples/commit/3685d5ea93fced
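
A sketch of how the preferred busy-poll options are typically applied to the XSK file descriptor, similar to what the kernel's xdpsock sample does. The fd would come from xsk_socket__fd(); the function name is hypothetical, the 20 us timeout and budget of 64 in the usage are illustrative assumptions, and enabling these options generally requires CAP_NET_ADMIN.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* asm-generic values (x86-64, arm64) for libc headers that predate
 * Linux 5.11; a few architectures use different numbers. */
#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46
#endif
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

/* Enable preferred busy polling on an XSK fd.
 * Returns 0 on success or -errno from the first failing setsockopt(). */
static int toy_apply_busy_poll(int xsk_fd, int usecs, int budget)
{
    int one = 1;

    /* Sketch precondition; the kernel itself rejects budgets above U16_MAX. */
    assert(budget > 0 && budget <= 65535);

    if (setsockopt(xsk_fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &one, sizeof(one)) < 0)
        return -errno;
    if (setsockopt(xsk_fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs)) < 0)
        return -errno;
    if (setsockopt(xsk_fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET, &budget, sizeof(budget)) < 0)
        return -errno;
    return 0;
}
```

Usage would look like `toy_apply_busy_poll(xsk_socket__fd(xsk), 20, 64)`. With preferred busy polling, my understanding is that the application's own sendto()/recvfrom()/poll() calls then drive the NAPI context.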

--Jesper



* RE: AF_XDP not transmitting frames immediately
  2021-12-14 14:04         ` Jesper Dangaard Brouer
@ 2021-12-14 14:57           ` Ong, Boon Leong
  2021-12-14 15:42             ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 16+ messages in thread
From: Ong, Boon Leong @ 2021-12-14 14:57 UTC
  To: Jesper Dangaard Brouer, Fijalkowski, Maciej, Karlsson, Magnus
  Cc: Brouer, Jesper, Björn Töpel, Desouza, Ederson, Xdp,
	Joao Pedro Barros Silva, Diogo Alexandre Da Silva Lima

>On 14/12/2021 12.25, Maciej Fijalkowski wrote:
>> On Tue, Dec 14, 2021 at 10:40:05AM +0000, Karlsson, Magnus wrote:
>>> Adding Ederson and Maciej.
>>>
>>>> On 14/12/2021 09.07, Karlsson, Magnus wrote:
>>>>>
>>>>>
>>>>>> -----Original Message----- From: Jesper Dangaard Brouer
>>>>>> <jbrouer@redhat.com> Sent: Monday, December 13, 2021 10:04 PM To:
>>>>>> Karlsson, Magnus <magnus.karlsson@intel.com>; Björn Töpel
>>>>>> <bjorn@kernel.org> Cc: Brouer, Jesper <brouer@redhat.com>; Xdp
>>>>>> <xdp-newbies@vger.kernel.org>; Ong, Boon Leong
>>>>>> <boon.leong.ong@intel.com>; Joao Pedro Barros Silva
>>>>>> <jopbs@vestas.com>; Diogo Alexandre Da Silva Lima <dioli@vestas.com>
>>>>>> Subject: AF_XDP not transmitting frames immediately
>>>>>>
>>>>>> Hi Magnus and Bjørn,
>>>>>>
>>>>>> I'm working on an AF_XDP program[1] that needs to send a bulk of
>>>>>> packets in a short time window (related to Time-Triggered Ethernet).
>>>>>>
>>>>>> My observations are that AF_XDP doesn't send the frames immediately.
>>>>>> And yes, I do call sendto() to trigger a TX kick. In zero-copy mode
>>>>>> this is particularly bad.  My program wants to send 4 packets in a
>>>>>> burst, but I'm observing 8 packets grouped together on the receiving
>>>>>> host.
>>>>>>
>>>>>> Is this a known property of AF_XDP?
>>>>>
>>>>> Nope! It is supposed to be able to send one packet at a time, though I
>>>>> have several times seen bugs in the drivers where the batching
>>>>> behavior shines through like this, and once a bug in the core code.
>>>>> There is even a test these days for just sending a single packet,
>>>>
>>>> Where is that test in the kernel tree?
>>>
>>> In tools/testing/selftests/bpf/xdpxceiver.c. It is the
>>> RUN_TO_COMPLETION_SINGLE_PKT test. But the framework only operates on
>>> veth currently.
>>
>> I'd say it's the driver's fault. Magnus fixed something similar for i40e:
>> https://lore.kernel.org/netdev/20210401172107.1191618-3-anthony.l.nguyen@intel.com/
>
>Thanks for that hint.
>
>>
>> We don't currently have igc HW on our side to dig into this :<
>
>I suspected Boon Leong (cc) would have this hardware.

Unfortunately, my current lab setup does not have an I225 hooked up,
and I am working remotely due to restricted access to the Intel facility.
Perhaps Ederson has a system ready to test?

For ZC mode, the igc driver (and likewise stmmac) depends on the XSK wakeup
to trigger the NAPI poll (igc_poll), which first cleans up the Tx ring and then
calls igc_xdp_xmit_zc() to start submitting Tx frames to the DMA engine. We
have used busy-poll to achieve lower Tx frame latency/jitter.
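
A hypothetical model of the poll order described above (illustrative names, not igc's functions): the poll first reclaims HW Tx descriptors the NIC has completed, then moves frames from the XSK Tx ring onto free HW descriptors, limited by both the budget and the number of free descriptors. It shows why frames can wait: until the clean-up step runs, frames queued by user space have no HW descriptor to go to.

```c
#include <assert.h>

/* Hypothetical model of a zero-copy Tx queue. */
struct toy_txq {
    unsigned hw_size;      /* number of HW Tx descriptors */
    unsigned hw_in_use;    /* handed to the NIC and not yet reclaimed */
    unsigned hw_done;      /* of those, completed by the NIC */
    unsigned xsk_pending;  /* frames waiting in the XSK Tx ring */
};

/* Step 1: reclaim completed descriptors. A real driver would also report
 * them to user space via the completion ring. */
static unsigned toy_clean_tx(struct toy_txq *q)
{
    unsigned cleaned = q->hw_done;

    q->hw_in_use -= cleaned;
    q->hw_done = 0;
    return cleaned;
}

/* Step 2: submit frames, limited by the budget and by free descriptors. */
static unsigned toy_xmit_zc(struct toy_txq *q, unsigned budget)
{
    unsigned free_desc = q->hw_size - q->hw_in_use;
    unsigned n = q->xsk_pending;

    if (n > free_desc)
        n = free_desc;
    if (n > budget)
        n = budget;
    q->xsk_pending -= n;
    q->hw_in_use += n;
    assert(q->hw_in_use <= q->hw_size);
    return n;
}

/* One poll: clean first, then transmit. Returns frames submitted. */
static unsigned toy_tx_poll(struct toy_txq *q, unsigned budget)
{
    (void)toy_clean_tx(q);
    return toy_xmit_zc(q, budget);
}
```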

There was another issue in stmmac that was recently patched [1] so that
the driver does not perform a MAC reset whenever an XDP program is added;
otherwise, right after XDP socket creation, Tx transmission takes an extra
2-3 s due to link down/up. Jesper, are you seeing something similar in your
app?

If yes, then it is likely because the mainline igc driver calls igc_down(),
which is a little too aggressive in resetting the MAC completely.

[1] https://patchwork.kernel.org/project/netdevbpf/patch/20211111143949.2806049-1-boon.leong.ong@intel.com/ 

>
>>>
>>>>> since we have had issues with this in the past. That test does pass in
>>>>> bpf-next, but it is only run with the veth driver that does not
>>>>> support zero-copy so could still be an issue. What driver are you
>>>>> using in zero-copy mode and what kernel version are you on?
>>>>
>>>> Driver: igc with Intel chip i225
>>>
>>> Have never tried this one personally. Do not know if I have one in the lab
>>> but let me check.
>>>
>>> Ederson, do you have any experience with this card and if so, have you seen
>>> something similar?
>>>
>>>> Kernel version: 5.15.0-net-next+ #618 SMP PREEMPT
>>>>    - Devel branch at commit 6d3b1b069946 (v5.15-12802-g6d3b1b069946)
>>>>
>>>>>> How can I get AF_XDP to "flush" TX packets when calling sendto()?
>>>>>> Should we add a flag other than the current MSG_DONTWAIT?
>>>>>
>>>>> In zero-copy mode with softirq driver processing (not busy poll), a
>>>>> sendto will just trigger the xsk_wakeup ndo that schedules napi unless
>>>>> it is already executing. It is up to the driver to then get packets
>>>>> from the Tx ring and put them on the HW and make sure they are sent.
>>>>> Barring any HW quirks, sending a single packet should be perfectly fine.
>>>>
>>>> I will investigate driver level issues.
>>>>
>>>> I have other (100G) NICs in my testlab, but I'm using these 1G NICs because
>>>> they support hardware timestamping, which allows me to investigate these
>>>> timing issues.
>>>> I'll find a way to see if other drivers behave differently.
>>>
>>> Would be great if you could check if the problem also exists on e.g. ice.
>>>
>
>I'm having issues getting my ice hardware to link up.
>
>I tested that the i40e driver works as expected. Thus, this is likely an
>issue with the igc driver.  I will dig some more.
>
>
>>>>>> Hint: I'm using tcpdump hardware timestamping on the receiving host via
>>>>>> this command line:
>>>>>>
>>>>>> tcpdump -vv -s0 -ni eth1 -j adapter_unsynced --time-stamp-precision=nano -w af_xdp_tx_cyclic.dump42
>>>>>>
>>>>>> Note that [1] is on a specific branch:
>
>[1] https://github.com/xdp-project/bpf-examples/tree/vestas03_AF_XDP_example/AF_XDP-interaction
>
>In [1] I tried to play with SO_PREFER_BUSY_POLL (see commit [2]), but it
>didn't make a difference.
>
>[2] https://github.com/xdp-project/bpf-examples/commit/3685d5ea93fced
>
>--Jesper



* Re: AF_XDP not transmitting frames immediately
  2021-12-14 14:57           ` Ong, Boon Leong
@ 2021-12-14 15:42             ` Jesper Dangaard Brouer
  2021-12-14 16:05               ` Maciej Fijalkowski
  0 siblings, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2021-12-14 15:42 UTC
  To: Ong, Boon Leong, Jesper Dangaard Brouer, Fijalkowski, Maciej,
	Karlsson, Magnus
  Cc: brouer, Björn Töpel, Desouza, Ederson, Xdp,
	Joao Pedro Barros Silva, Diogo Alexandre Da Silva Lima


On 14/12/2021 15.57, Ong, Boon Leong wrote:
>> I suspected Boon Leong (cc) would have this hardware.
>
> Unfortunately, my current lab setup does not have an I225 hooked up,
> and I am working remotely due to restricted access to the Intel facility.
> Perhaps Ederson has a system ready to test?
> 
> For ZC mode, the igc driver (and likewise stmmac) depends on the XSK wakeup
> to trigger the NAPI poll (igc_poll), which first cleans up the Tx ring and then
> calls igc_xdp_xmit_zc() to start submitting Tx frames to the DMA engine. We
> have used busy-poll to achieve lower Tx frame latency/jitter.
> 
> There was another issue in stmmac that was recently patched [1] so that
> the driver does not perform a MAC reset whenever an XDP program is added;
> otherwise, right after XDP socket creation, Tx transmission takes an extra
> 2-3 s due to link down/up. Jesper, are you seeing something similar in your
> app?

Yes, and it is quite annoying.

In my setup, if I transmit AF_XDP packets too early, they are simply
lost... that confused me a bit.

I wanted to ask the AF_XDP maintainers:
  - What is the best way to know when AF_XDP is ready to Tx packets?

For example, is there an API I can call (e.g. one that blocks) that waits
until the XSK socket is ready to transmit?
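
One possible user-space workaround, offered only as an assumption and not confirmed in this thread: before the first transmit, poll the interface's sysfs operstate file (e.g. /sys/class/net/eth1/operstate) until it reads "up". This would only cover the link down/up window after the XDP program is attached; a link that is up does not guarantee that the driver's zero-copy Tx queue is ready. The helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Poll a sysfs operstate file until it reads "up" or timeout_ms elapses
 * (checked in 10 ms steps). Returns 0 when up, -1 on timeout. */
static int wait_operstate_up(const char *path, unsigned timeout_ms)
{
    const struct timespec step = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    unsigned waited = 0;
    char buf[32];

    assert(path != NULL);
    for (;;) {
        FILE *f = fopen(path, "r");

        if (f) {
            int up = fgets(buf, sizeof(buf), f) != NULL &&
                     strncmp(buf, "up", 2) == 0 &&
                     (buf[2] == '\n' || buf[2] == '\0');
            fclose(f);
            if (up)
                return 0;
        }
        if (waited >= timeout_ms)
            return -1;
        nanosleep(&step, NULL);
        waited += 10;
    }
}
```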


> If yes, then it is likely because the mainline igc driver calls igc_down(),
> which is a little too aggressive in resetting the MAC completely.
> 

It would be good to fix igc too, like [1].
BUT afaik it will only make the window in which the XSK is not ready for
TX packets smaller.


> [1] https://patchwork.kernel.org/project/netdevbpf/patch/20211111143949.2806049-1-boon.leong.ong@intel.com/
> 

Thanks for the link
--Jesper



* Re: AF_XDP not transmitting frames immediately
  2021-12-14 15:42             ` Jesper Dangaard Brouer
@ 2021-12-14 16:05               ` Maciej Fijalkowski
  0 siblings, 0 replies; 16+ messages in thread
From: Maciej Fijalkowski @ 2021-12-14 16:05 UTC
  To: Jesper Dangaard Brouer
  Cc: Ong, Boon Leong, Karlsson, Magnus, brouer, Björn Töpel,
	Desouza, Ederson, Xdp, Joao Pedro Barros Silva,
	Diogo Alexandre Da Silva Lima

On Tue, Dec 14, 2021 at 04:42:18PM +0100, Jesper Dangaard Brouer wrote:
> 
> On 14/12/2021 15.57, Ong, Boon Leong wrote:
> > > I suspected Boon Leong (cc) would have this hardware.
> >
> > Unfortunately, my current lab setup does not have an I225 hooked up,
> > and I am working remotely due to restricted access to the Intel facility.
> > Perhaps Ederson has a system ready to test?
> > 
> > For ZC mode, the igc driver (and likewise stmmac) depends on the XSK wakeup
> > to trigger the NAPI poll (igc_poll), which first cleans up the Tx ring and then
> > calls igc_xdp_xmit_zc() to start submitting Tx frames to the DMA engine. We
> > have used busy-poll to achieve lower Tx frame latency/jitter.
> > 
> > There was another issue in stmmac that was recently patched [1] so that
> > the driver does not perform a MAC reset whenever an XDP program is added;
> > otherwise, right after XDP socket creation, Tx transmission takes an extra
> > 2-3 s due to link down/up. Jesper, are you seeing something similar in your
> > app?
> 
> Yes, and it is quite annoying.
> 
> In my setup, if I transmit AF_XDP packets too early, they are simply lost...
> that confused me a bit.
> 
> I wanted to ask the AF_XDP maintainers:
>  - What is the best way to know when AF_XDP is ready to Tx packets?
> 
> For example, is there an API I can call (e.g. one that blocks) that waits
> until the XSK socket is ready to transmit?

Not a maintainer, but anyway.
From a driver POV xsk_tx_peek_desc() (or batching variant) is used to make
sure that user space produced entries in the XSK Tx ring so that driver
can consume it and place it onto HW descriptors.

Off the top of my head I'm not aware of any blocking calls; maybe you
could spin on xsk_tx_peek_desc().
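On the user-space side, one workaround for "packets sent too early are simply lost", assuming the loss comes from the link down/up cycle described above, is to wait until the interface reports IFF_UP and IFF_RUNNING before producing the first Tx descriptors. This is a minimal sketch using the standard SIOCGIFFLAGS ioctl; `link_is_running()` and `wait_for_link()` are hypothetical helper names. As noted further down in this thread, such a check can at best narrow the window: it does not prove that the XSK Tx path itself is ready.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: 1 if ifname has IFF_UP and IFF_RUNNING set, 0 if not,
 * -errno on failure (e.g. -ENODEV for an unknown interface). */
static int link_is_running(const char *ifname)
{
	struct ifreq ifr;
	int fd, ret;

	assert(ifname != NULL);
	if (strlen(ifname) >= IFNAMSIZ)
		return -EINVAL;

	fd = socket(AF_INET, SOCK_DGRAM, 0); /* any socket works for SIOCGIFFLAGS */
	if (fd < 0)
		return -errno;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	if (ioctl(fd, SIOCGIFFLAGS, &ifr) < 0)
		ret = -errno;
	else
		ret = (ifr.ifr_flags & IFF_UP) && (ifr.ifr_flags & IFF_RUNNING);
	close(fd);
	return ret;
}

/* Hypothetical helper: poll every 10 ms until the link runs or timeout_ms
 * expires. Returns 1 when running, 0 on timeout, -errno on error. */
static int wait_for_link(const char *ifname, int timeout_ms)
{
	const struct timespec step = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	int waited = 0;

	for (;;) {
		int r = link_is_running(ifname);

		if (r != 0)
			return r;
		if (waited >= timeout_ms)
			return 0;
		nanosleep(&step, NULL);
		waited += 10;
	}
}
```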

For igc maybe it would be worth returning some status from
igc_xdp_xmit_zc(). Like, imagine that you consumed all the budget but
there are still descriptors in the XSK Tx ring. You'd like to signal to
NAPI that there is still work to be done.
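The "signal to NAPI that there is still work to be done" idea can be modelled outside the kernel. The sketch below is a user-space simulation, not igc code: `sim_xmit_zc()` stands in for a zero-copy xmit routine that moves at most `budget` descriptors from an XSK Tx ring to a HW ring and reports whether descriptors were left behind, and `sim_napi_poll()` follows the usual NAPI convention of returning the full budget to stay scheduled. All names and counts are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated rings: only the counts matter in this model. */
struct sim_rings {
	unsigned int xsk_tx_pending; /* descriptors user space has produced */
	unsigned int hw_free;        /* free HW Tx descriptors */
	unsigned int hw_posted;      /* descriptors handed to the "DMA engine" */
};

/* Hypothetical stand-in for a driver's zero-copy xmit: consume up to
 * `budget` XSK descriptors, bounded by free HW slots. Returns true if
 * XSK Tx descriptors remain, i.e. more work is pending. */
static bool sim_xmit_zc(struct sim_rings *r, unsigned int budget)
{
	unsigned int n = budget;

	if (n > r->xsk_tx_pending)
		n = r->xsk_tx_pending;
	if (n > r->hw_free)
		n = r->hw_free;

	r->xsk_tx_pending -= n;
	r->hw_free -= n;
	r->hw_posted += n;
	return r->xsk_tx_pending > 0;
}

/* NAPI convention: returning `budget` keeps the poll scheduled,
 * returning less than `budget` lets it complete. */
static int sim_napi_poll(struct sim_rings *r, int budget)
{
	bool more = sim_xmit_zc(r, (unsigned int)budget);

	return more ? budget : 0;
}
```

With 10 pending descriptors and a budget of 4, the model stays scheduled for two polls and completes on the third. A real driver would also clean Tx completions in the same poll so that HW slots are freed.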

> 
> 
> > If yes, then it is likely because the igc driver in mainline calls
> > igc_down(), which is a bit too aggressive in resetting the MAC completely.
> > 
> 
> It would be good to fix igc too, like[1].
> BUT afaik it will only make the window smaller when XSK is not ready for TX
> packets.
> 
> 
> > [1]https://patchwork.kernel.org/project/netdevbpf/patch/20211111143949.2806049-1-boon.leong.ong@intel.com/
> > 
> 
> Thanks for the link
> --Jesper
> 


* Re: AF_XDP not transmitting frames immediately
  2021-12-14 10:40     ` Karlsson, Magnus
  2021-12-14 11:25       ` Maciej Fijalkowski
@ 2021-12-15  1:08       ` Desouza, Ederson
  2021-12-15  8:41         ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 16+ messages in thread
From: Desouza, Ederson @ 2021-12-15  1:08 UTC (permalink / raw)
  To: bjorn, Karlsson, Magnus, jbrouer
  Cc: jopbs, xdp-newbies, dioli, Fijalkowski, Maciej, Gomes, Vinicius,
	Ong, Boon Leong, Brouer, Jesper

+Vinicius

On Tue, 2021-12-14 at 10:40 +0000, Karlsson, Magnus wrote:
> Adding Ederson and Maciej.
> 
> > -----Original Message-----
> > From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> > Sent: Tuesday, December 14, 2021 11:32 AM
> > To: Karlsson, Magnus <magnus.karlsson@intel.com>; Björn Töpel
> > <bjorn@kernel.org>
> > Cc: Brouer, Jesper <brouer@redhat.com>; Xdp <xdp-newbies@vger.kernel.org>;
> > Ong, Boon Leong <boon.leong.ong@intel.com>; Joao Pedro Barros Silva
> > <jopbs@vestas.com>; Diogo Alexandre Da Silva Lima <dioli@vestas.com>
> > Subject: Re: AF_XDP not transmitting frames immediately
> > 
> > 
> > 
> > On 14/12/2021 09.07, Karlsson, Magnus wrote:
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> > > > Sent: Monday, December 13, 2021 10:04 PM
> > > > To: Karlsson, Magnus <magnus.karlsson@intel.com>; Björn Töpel <bjorn@kernel.org>
> > > > Cc: Brouer, Jesper <brouer@redhat.com>; Xdp <xdp-newbies@vger.kernel.org>;
> > > > Ong, Boon Leong <boon.leong.ong@intel.com>; Joao Pedro Barros Silva
> > > > <jopbs@vestas.com>; Diogo Alexandre Da Silva Lima <dioli@vestas.com>
> > > > Subject: AF_XDP not transmitting frames immediately
> > > > 
> > > > Hi Magnus and Bjørn,
> > > > 
> > > > I'm coding on an AF_XDP program[1] that needs to send (a bulk of
> > > > packets) in a short time-window (related to Time-Triggered
> > > > Ethernet).
> > > > 
> > > > My observations are that AF_XDP doesn't send the frames
> > > > immediately. And yes, I do call sendto() to trigger a TX kick.
> > > > In zero-copy mode this is particularly bad. My program wants to
> > > > send 4 packets in a burst, but I'm observing 8 packets grouped
> > > > together on the receiving host.
> > > > 
> > > > Is this a known property of AF_XDP?
> > > 
> > > Nope! It is supposed to be able to send one packet at a time,
> > > though I have several times seen bugs in the drivers where the
> > > batching behavior shines through like this, and once a bug in the
> > > core code. There is even a test these days for just sending a
> > > single packet,
> > 
> > Where is that test in the kernel tree?
> 
> In tools/testing/selftests/bpf/xdpxceiver.c. It is the
> RUN_TO_COMPLETION_SINGLE_PKT test. But the framework only operates on
> veth currently.
> 
> > > since we have had issues with this in the past. That test does
> > > pass in bpf-next, but it is only run with the veth driver that
> > > does not support zero-copy so could still be an issue. What
> > > driver are you using in zero-copy mode and what kernel version
> > > are you on?
> > 
> > Driver: igc with Intel chip i225
> 
> Have never tried this one personally. Do not know if I have one in
> the lab but let me check.
> 
> Ederson, do you have any experience with this card and if so, have
> you seen something similar?

Not sure. I wonder how small an interval Jesper is using. I imagine
that if it's too small, the interrupt generated to trigger the tx could
end up serving more than one packet.

Vinicius should have more prompt access to i225 - could you please help
on this?
> 
> > Kernel version: 5.15.0-net-next+ #618 SMP PREEMPT
> >    - Devel branch at commit 6d3b1b069946 (v5.15-12802-g6d3b1b069946)
> > 
> > > > How can I get AF_XDP to "flush" TX packets when calling
> > > > sendto()? Should we add a flag other than the current MSG_DONTWAIT?
> > > 
> > > In zero-copy mode with softirq driver processing (not busy poll),
> > > a sendto will just trigger the xsk_wakeup ndo that schedules napi
> > > unless it is already executing. It is up to the driver to then get
> > > packets from the Tx ring and put them on the HW and make sure they
> > > are sent. Barring any HW quirks, sending one packet should be
> > > perfectly fine.
> > 
> > I will investigate driver level issues.
> > 
> > I have other (100G) NICs in my testlab, but I'm using these 1G NICs
> > because they support hardware timestamping, which allows me to
> > investigate these timing issues.
> > I'll find a way to see if other drivers behave differently.
> 
> Would be great if you could check if the problem also exists on e.g.
> ice. 
> 
> > > > Hint, I'm using tcpdump hardware timestamping on the receiving
> > > > host via cmdline:
> > > > 
> > > > tcpdump -vv -s0 -ni eth1 -j adapter_unsynced --time-stamp-precision=nano -w af_xdp_tx_cyclic.dump42
> > > > 
> > > > Notice[1] on specific branch:
> > > > [1] https://github.com/xdp-project/bpf-examples/tree/vestas03_AF_XDP_example/AF_XDP-interaction
> > > 
> > 
> > Thanks for your feedback,
> > --Jesper
> 



* Re: AF_XDP not transmitting frames immediately
  2021-12-15  1:08       ` Desouza, Ederson
@ 2021-12-15  8:41         ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2021-12-15  8:41 UTC (permalink / raw)
  To: Desouza, Ederson, bjorn, Karlsson, Magnus, jbrouer
  Cc: brouer, jopbs, xdp-newbies, dioli, Fijalkowski, Maciej, Gomes,
	Vinicius, Ong, Boon Leong



On 15/12/2021 02.08, Desouza, Ederson wrote:
> +Vinicius
> 
> On Tue, 2021-12-14 at 10:40 +0000, Karlsson, Magnus wrote:
>> Adding Ederson and Maciej.
>> 
>>> On 14/12/2021 09.07, Karlsson, Magnus wrote:
>>>> 
>>>> 
>>>>> From: Jesper Dangaard Brouer 
>>>>> 
>>>>> Hi Magnus and Bjørn,
>>>>> 
>>>>> I'm coding on an AF_XDP program[1] that needs to send (a bulk
>>>>> of packets) in a short time-window (related to
>>>>> Time-Triggered Ethernet).
>>>>> 
>>>>> My observations are that AF_XDP doesn't send the frames 
>>>>> immediately. And yes, I do call sendto() to trigger a TX
>>>>> kick. In zero-copy mode this is particularly bad. My program
>>>>> wants to send 4 packets in a burst, but I'm observing 8
>>>>> packets grouped together on the receiving host.
>>>>> 
>>>>> Is this a known property of AF_XDP?
>>>> 
>>>> Nope! It is supposed to be able to send one packet at a time, 
>>>> though I have several times seen bugs in the drivers where the
>>>> batching behavior shines through like this, and once a bug in
>>>> the core code. There is even a test these days for just sending
>>>> a single packet,
>>> 
>>> Where is that test in the kernel tree?
>> 
>> In tools/testing/selftests/bpf/xdpxceiver.c. It is the 
>> RUN_TO_COMPLETION_SINGLE_PKT test. But the framework only operates
>> on veth currently.
>> 
>>>> since we have had issues with this in the past. That test does 
>>>> pass in bpf-next, but it is only run with the veth driver that
>>>> does not support zero-copy so could still be an issue. What
>>>> driver are you using in zero-copy mode and what kernel version
>>>> are you on?
>>> 
>>> Driver: igc with Intel chip i225
>> 
>> Have never tried this one personally. Do not know if I have one in 
>> the lab but let me check.
>> 
>> Ederson, do you have any experience with this card and if so, have 
>> you seen something similar?
> 
> Not sure. I wonder how small an interval Jesper is using. I
> imagine that if it's too small, the interrupt generated to trigger
> the tx could end up serving more than one packet.

The interval is currently 1 second (code here[0]), thus the TX interrupt 
should have plenty of time to trigger.

[0] 
https://github.com/xdp-project/bpf-examples/blob/vestas03_AF_XDP_example/AF_XDP-interaction/af_xdp_user.c#L1129
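For a cyclic sender like this (one burst per second), a common way to keep the period from drifting is to sleep until an absolute CLOCK_MONOTONIC deadline with clock_nanosleep(TIMER_ABSTIME). This is a minimal sketch under that assumption, not the code at [0]; `timespec_add_ns()` and `wait_next_period()` are hypothetical names, and sending the burst is left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Add a non-negative ns value to *ts, keeping tv_nsec in [0, 1e9). */
static void timespec_add_ns(struct timespec *ts, int64_t ns)
{
	int64_t nsec;

	assert(ns >= 0);
	nsec = (int64_t)ts->tv_nsec + ns;
	ts->tv_sec += (time_t)(nsec / NSEC_PER_SEC);
	ts->tv_nsec = (long)(nsec % NSEC_PER_SEC);
}

/* Advance *deadline by period_ns and sleep until that absolute time, so
 * time spent sending the burst does not accumulate as drift.
 * Returns 0 on success or the error number from clock_nanosleep(). */
static int wait_next_period(struct timespec *deadline, int64_t period_ns)
{
	int ret;

	timespec_add_ns(deadline, period_ns);
	do {
		ret = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, NULL);
	} while (ret == EINTR);
	return ret;
}
```

Typical use would be to initialise the deadline once with clock_gettime(CLOCK_MONOTONIC, ...) and then loop on wait_next_period(&deadline, NSEC_PER_SEC) followed by the application's own burst-send routine.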


> Vinicius should have more prompt access to i225 - could you please
> help on this?

My reproducer[1] needs the --zero-copy option to trigger the error case,
as it defaults to copy-mode.
It is only in zero-copy mode for igc/i225 that I see this 8-packet
bulking, when sending bursts of 4 packets every second.
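To quantify "bulks of 8 instead of 4" from the receiver's hardware timestamps (the tcpdump capture with --time-stamp-precision=nano), one option is to split the arrival times into bursts wherever the gap to the previous packet exceeds a threshold, and then look at the burst sizes. A minimal sketch, assuming the timestamps have already been extracted from the pcap into a sorted array of nanoseconds; `count_bursts()` and the choice of gap threshold are hypothetical, not part of the reproducer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split sorted arrival times (ns) into bursts: a new burst starts when the
 * gap to the previous packet exceeds gap_ns. Stores each burst's packet count
 * in sizes[] (capacity max_bursts) and returns the number of bursts found.
 * Bursts beyond max_bursts are counted but not stored. */
static size_t count_bursts(const uint64_t *ts_ns, size_t n, uint64_t gap_ns,
			   size_t *sizes, size_t max_bursts)
{
	size_t bursts = 0, cur = 0;

	for (size_t i = 0; i < n; i++) {
		if (i > 0 && ts_ns[i] - ts_ns[i - 1] > gap_ns) {
			if (bursts < max_bursts)
				sizes[bursts] = cur;
			bursts++;
			cur = 0;
		}
		cur++;
	}
	if (n > 0) {
		if (bursts < max_bursts)
			sizes[bursts] = cur;
		bursts++;
	}
	return bursts;
}
```

With a gap threshold well below the 1 second period (e.g. 1 ms), correctly paced traffic should report bursts of 4; the behaviour described here would show up as bursts of 8.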


[1] 
https://github.com/xdp-project/bpf-examples/tree/vestas03_AF_XDP_example/AF_XDP-interaction

--Jesper



* Re: AF_XDP not transmitting frames immediately
  2021-12-14  8:07 ` Karlsson, Magnus
  2021-12-14 10:32   ` Jesper Dangaard Brouer
@ 2021-12-15 10:17   ` Jesper Dangaard Brouer
  2021-12-15 11:07     ` Karlsson, Magnus
  1 sibling, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2021-12-15 10:17 UTC (permalink / raw)
  To: Karlsson, Magnus, Jesper Dangaard Brouer, Björn Töpel
  Cc: brouer, Xdp, Ong, Boon Leong, Joao Pedro Barros Silva,
	Diogo Alexandre Da Silva Lima



On 14/12/2021 09.07, Karlsson, Magnus wrote:
> 
>> 
>> I'm coding on an AF_XDP program[1] that needs to send (a bulk of
>> packets) in a short time-window (related to Time-Triggered
>> Ethernet).
>> 
[...]
> 
>> How can I get AF_XDP to "flush" TX packets when calling sendto()? 
>> Should we add a flag other than the current MSG_DONTWAIT?
> 
> In zero-copy mode with softirq driver processing (not busy poll), a
> sendto will just trigger the xsk_wakeup ndo that schedules napi
> unless it is already executing. It is up to the driver to then get
> packets from the Tx ring and put them on the HW and make sure they
> are sent. Barring any HW quirks, sending one packet should be
> perfectly fine.

This actually doesn't sound so good from my customer's use-case PoV:
we only trigger an ndo_xsk_wakeup that schedules napi.

We want to trigger HW transmission immediately.
Can we achieve this by using busy-poll mode?
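For reference, the "TX kick" discussed in this thread is a zero-length, non-blocking sendto() on the XSK socket; per the explanation above, in zero-copy mode it ends up in the driver's ndo_xsk_wakeup, which schedules napi. The sketch below mirrors how the kernel's xdpsock sample treats the result: ENOBUFS, EAGAIN, EBUSY and ENETDOWN are treated as transient rather than fatal. `xsk_kick_tx()` is a hypothetical name; the fd would come from the application's XSK socket (e.g. libbpf's xsk_socket__fd()). If the socket was bound with XDP_USE_NEED_WAKEUP, applications typically only kick when xsk_ring_prod__needs_wakeup() reports it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Kick Tx processing for an XSK socket. Returns 0 if the kick was issued or
 * hit a transient condition, otherwise -errno. */
static int xsk_kick_tx(int xsk_fd)
{
	ssize_t ret = sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0);

	if (ret >= 0)
		return 0;
	/* Transient: ring or driver busy, or link down; caller may retry. */
	if (errno == ENOBUFS || errno == EAGAIN || errno == EBUSY ||
	    errno == ENETDOWN)
		return 0;
	return -errno;
}
```

Note that a successful return only means the wakeup was requested; when the frames actually reach the wire still depends on when napi runs, which is exactly the latency concern raised here.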


>> Hint, I'm using tcpdump hardware timestamping on the receiving host via
>> cmdline:
>> 
>> tcpdump -vv -s0 -ni eth1 -j adapter_unsynced --time-stamp-precision=nano -w af_xdp_tx_cyclic.dump42
>> 
>> Notice[1] on specific branch: [1] 
>> https://github.com/xdp-project/bpf-examples/tree/vestas03_AF_XDP_example/AF_XDP-interaction



* RE: AF_XDP not transmitting frames immediately
  2021-12-15 10:17   ` Jesper Dangaard Brouer
@ 2021-12-15 11:07     ` Karlsson, Magnus
  2021-12-15 17:11       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 16+ messages in thread
From: Karlsson, Magnus @ 2021-12-15 11:07 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Björn Töpel
  Cc: Brouer, Jesper, Xdp, Ong, Boon Leong, Joao Pedro Barros Silva,
	Diogo Alexandre Da Silva Lima



> -----Original Message-----
> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> Sent: Wednesday, December 15, 2021 11:18 AM
> To: Karlsson, Magnus <magnus.karlsson@intel.com>; Jesper Dangaard
> Brouer <jbrouer@redhat.com>; Björn Töpel <bjorn@kernel.org>
> Cc: Brouer, Jesper <brouer@redhat.com>; Xdp <xdp-newbies@vger.kernel.org>;
> Ong, Boon Leong <boon.leong.ong@intel.com>;
> Joao Pedro Barros Silva <jopbs@vestas.com>; Diogo Alexandre Da Silva Lima
> <dioli@vestas.com>
> Subject: Re: AF_XDP not transmitting frames immediately
> 
> 
> 
> On 14/12/2021 09.07, Karlsson, Magnus wrote:
> >
> >>
> >> I'm coding on an AF_XDP program[1] that needs to send (a bulk of
> >> packets) in a short time-window (related to Time-Triggered Ethernet).
> >>
> [...]
> >
> >> How can I get AF_XDP to "flush" TX packets when calling sendto()?
> >> Should we add a flag other than the current MSG_DONTWAIT?
> >
> > In zero-copy mode with softirq driver processing (not busy poll), a
> > sendto will just trigger the xsk_wakeup ndo that schedules napi unless
> > it is already executing. It is up to the driver to then get packets
> > from the Tx ring and put them on the HW and make sure they are sent.
> > Barring any HW quirks, sending one packet should be perfectly fine.
> 
> This actually doesn't sound so good from my customer's use-case PoV:
> we only trigger an ndo_xsk_wakeup that schedules napi.
> 
> We want to trigger HW transmission immediately.
> Can we achieve this by using busy-poll mode?

Yes, but not without napi. The napi context will in this case be executed in process context right away, unless it is already running somewhere else but that should not be the case. Will this be good enough?
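Busy polling is opted into per socket with socket options. This is a minimal sketch, assuming Linux 5.11 or later, where SO_PREFER_BUSY_POLL and SO_BUSY_POLL_BUDGET exist. The fallback numeric values are the asm-generic ones and are an assumption for architectures that use them. The timeout and budget values passed in are illustrative, and raising them may require CAP_NET_ADMIN. Once enabled, the application's own sendto()/poll() calls are expected to drive the napi processing in process context, as described above. The AF_XDP documentation typically pairs this with the netdev's napi_defer_hard_irqs and gro_flush_timeout settings, which are not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46         /* asm-generic value (assumption) */
#endif
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69  /* asm-generic value, Linux >= 5.11 (assumption) */
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70  /* asm-generic value, Linux >= 5.11 (assumption) */
#endif

/* Enable preferred busy polling on an (XSK) socket.
 * busy_poll_us: busy-poll timeout in microseconds; budget: max packets per poll.
 * Returns 0 on success, -errno from the first failing setsockopt(). */
static int xsk_enable_busy_poll(int fd, int busy_poll_us, int budget)
{
	int one = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &one, sizeof(one)) < 0)
		return -errno;
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &busy_poll_us,
		       sizeof(busy_poll_us)) < 0)
		return -errno;
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET, &budget,
		       sizeof(budget)) < 0)
		return -errno;
	return 0;
}
```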

> 
> >> Hint, I'm using tcpdump hardware timestamping on the receiving host via
> >> cmdline:
> >>
> >> tcpdump -vv -s0 -ni eth1 -j adapter_unsynced --time-stamp-precision=nano -w af_xdp_tx_cyclic.dump42
> >>
> >> Notice[1] on specific branch:
> >> [1] https://github.com/xdp-project/bpf-examples/tree/vestas03_AF_XDP_example/AF_XDP-interaction



* Re: AF_XDP not transmitting frames immediately
  2021-12-15 11:07     ` Karlsson, Magnus
@ 2021-12-15 17:11       ` Jesper Dangaard Brouer
  2021-12-16  8:34         ` Karlsson, Magnus
  0 siblings, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2021-12-15 17:11 UTC (permalink / raw)
  To: Karlsson, Magnus, Jesper Dangaard Brouer, Björn Töpel
  Cc: brouer, Xdp, Ong, Boon Leong, Joao Pedro Barros Silva,
	Diogo Alexandre Da Silva Lima



On 15/12/2021 12.07, Karlsson, Magnus wrote:
> 
>> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
>> On 14/12/2021 09.07, Karlsson, Magnus wrote:
>>> 
>>>> 
>>>> I'm coding on an AF_XDP program[1] that needs to send (a bulk
>>>> of packets) in a short time-window (related to Time-Triggered
>>>> Ethernet).
>>>> 
>> [...]
>>> 
>>>> How can I get AF_XDP to "flush" TX packets when calling
>>>> sendto()? Should we add a flag other than the current
>>>> MSG_DONTWAIT?
>>> 
>>> In zero-copy mode with softirq driver processing (not busy poll),
>>> a sendto will just trigger the xsk_wakeup ndo that schedules napi
>>> unless it is already executing. It is up to the driver to then
>>> get packets from the Tx ring and put them on the HW and make sure
>>> they are sent. Barring any HW quirks, sending one packet should
>>> be perfectly fine.
>> 
>> This actually doesn't sound so good from my customer's use-case
>> PoV: we only trigger an ndo_xsk_wakeup that schedules napi.
>> 
>> We want to trigger HW transmission immediately. Can we achieve this
>> by using busy-poll mode?
> 
> Yes, but not without napi. The napi context will in this case be
> executed in process context right away, unless it is already running
> somewhere else but that should not be the case. Will this be good
> enough?

"Time" will tell if it is good enough (pun intended).
Meaning I will implement and measure it.
The busy-poll mode does sound like a way forward.

Looking at kernel code, I can see that drivers' TX NAPI usually does
DMA-TX completion *before* transmitting new frames.  This usually makes 
sense, but for our use-case of hitting a narrow time-slot, I worry about 
the jitter this introduces.  I would like to see a mode/flag that would 
allow transmitting new frames (and afterwards invoking/scheduling 
TX-completion, e.g. via raising the softirq/NAPI).
Well this is future work, first I will measure current implementation.
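The ordering concern can be illustrated with a toy model rather than driver code: a poll routine that either reaps Tx completions before posting new frames (the usual order described above) or posts first and reaps afterwards (the hypothetical mode/flag). The model counts how many work steps run before the first new frame is posted, which stands in for the delay that would add jitter. All names and numbers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_txq {
	unsigned int done_pending;  /* completed descriptors not yet reaped */
	unsigned int to_send;       /* new frames waiting in the XSK Tx ring */
	unsigned int free_slots;    /* free HW descriptors */
	unsigned int steps;         /* work units executed so far */
	unsigned int first_tx_step; /* 1-based step of first post, 0 = none yet */
};

/* Reap all completions, one step per descriptor. */
static void toy_clean(struct toy_txq *q)
{
	while (q->done_pending > 0) {
		q->done_pending--;
		q->free_slots++;
		q->steps++;
	}
}

/* Post as many waiting frames as there are free HW slots. */
static void toy_xmit(struct toy_txq *q)
{
	while (q->to_send > 0 && q->free_slots > 0) {
		if (q->first_tx_step == 0)
			q->first_tx_step = q->steps + 1;
		q->to_send--;
		q->free_slots--;
		q->steps++;
	}
}

/* xmit_first == false: usual order (clean, then xmit).
 * xmit_first == true: hypothetical order (xmit, then clean). */
static void toy_poll(struct toy_txq *q, bool xmit_first)
{
	if (xmit_first) {
		toy_xmit(q);
		toy_clean(q);
	} else {
		toy_clean(q);
		toy_xmit(q);
	}
}
```

With 32 completions pending, the clean-first order posts its first frame at step 33 while the xmit-first order posts at step 1. The trade-off is that xmit-first can only use slots that are already free, so with a nearly full ring some frames may have to wait for the next poll.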

--Jesper



* RE: AF_XDP not transmitting frames immediately
  2021-12-15 17:11       ` Jesper Dangaard Brouer
@ 2021-12-16  8:34         ` Karlsson, Magnus
  2021-12-16 15:28           ` Jakub Kicinski
  0 siblings, 1 reply; 16+ messages in thread
From: Karlsson, Magnus @ 2021-12-16  8:34 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Björn Töpel
  Cc: Brouer, Jesper, Xdp, Ong, Boon Leong, Joao Pedro Barros Silva,
	Diogo Alexandre Da Silva Lima, Fijalkowski, Maciej



> -----Original Message-----
> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> Sent: Wednesday, December 15, 2021 6:11 PM
> To: Karlsson, Magnus <magnus.karlsson@intel.com>; Jesper Dangaard
> Brouer <jbrouer@redhat.com>; Björn Töpel <bjorn@kernel.org>
> Cc: Brouer, Jesper <brouer@redhat.com>; Xdp <xdp-newbies@vger.kernel.org>;
> Ong, Boon Leong <boon.leong.ong@intel.com>;
> Joao Pedro Barros Silva <jopbs@vestas.com>; Diogo Alexandre Da Silva Lima
> <dioli@vestas.com>
> Subject: Re: AF_XDP not transmitting frames immediately
> 
> 
> 
> On 15/12/2021 12.07, Karlsson, Magnus wrote:
> >
> >> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> >> On 14/12/2021 09.07, Karlsson, Magnus wrote:
> >>>
> >>>>
> >>>> I'm coding on an AF_XDP program[1] that needs to send (a bulk of
> >>>> packets) in a short time-window (related to Time-Triggered
> >>>> Ethernet).
> >>>>
> >> [...]
> >>>
> >>>> How can I get AF_XDP to "flush" TX packets when calling sendto()?
> >>>> Should we add a flag other than the current MSG_DONTWAIT?
> >>>
> >>> In zero-copy mode with softirq driver processing (not busy poll), a
> >>> sendto will just trigger the xsk_wakeup ndo that schedules napi
> >>> unless it is already executing. It is up to the driver to then get
> >>> packets from the Tx ring and put them on the HW and make sure they
> >>> are sent. Barring any HW quirks, sending one packet should be
> >>> perfectly fine.
> >>
> >> This actually doesn't sound so good from my customer's use-case PoV:
> >> we only trigger an ndo_xsk_wakeup that schedules napi.
> >>
> >> We want to trigger HW transmission immediately. Can we achieve this
> >> by using busy-poll mode?
> >
> > Yes, but not without napi. The napi context will in this case be
> > executed in process context right away, unless it is already running
> > somewhere else but that should not be the case. Will this be good
> > enough?
> 
> "Time" will tell if it is good enough (pun intended).
> Meaning I will implement and measure it.
> The busy-poll mode does sound like a way forward.
> 
> Looking at kernel code, I can see that drivers' TX NAPI usually does DMA-TX
> completion *before* transmitting new frames.  This usually makes sense,
> but for our use-case of hitting a narrow time-slot, I worry about the jitter this
> introduces.  I would like to see a mode/flag that would allow transmitting
> new frames (and afterwards invoking/scheduling TX-completion, e.g. via
> raising the softirq/NAPI).
> Well this is future work, first I will measure current implementation.

Maciej has been experimenting with the ice driver to do sending first. Completions are only done lazily when needed to make sure that there are always a number of descriptors available for sending. This yields much better throughput and is the style that DPDK uses for its drivers. Hopefully, this will also improve latency, though we have not measured that. Could be adopted for the igc driver too if that is the case.
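One reading of the "send first, clean lazily" scheme is: post as many frames as there are free descriptors, and only reap completions when the number of free descriptors drops below a threshold. The simplified user-space model below follows that reading; its names and threshold are hypothetical, and it is not the ice or DPDK implementation.

```c
#include <assert.h>

struct lazy_txq {
	unsigned int ring_size;
	unsigned int free_slots;      /* descriptors available for new frames */
	unsigned int completed;       /* finished by HW, not yet reaped */
	unsigned int clean_threshold; /* reap only when free_slots < this */
	unsigned int cleanups;        /* how many times completions were reaped */
};

static void lazy_reap(struct lazy_txq *q)
{
	q->free_slots += q->completed;
	q->completed = 0;
	q->cleanups++;
}

/* Post up to n frames; completions are reaped only when running low.
 * Returns how many frames were posted. */
static unsigned int lazy_xmit(struct lazy_txq *q, unsigned int n)
{
	unsigned int sent;

	if (q->free_slots < q->clean_threshold && q->completed > 0)
		lazy_reap(q);

	sent = n < q->free_slots ? n : q->free_slots;
	q->free_slots -= sent;
	return sent;
}

/* Simulate HW finishing up to k in-flight descriptors. */
static void lazy_hw_complete(struct lazy_txq *q, unsigned int k)
{
	unsigned int in_flight = q->ring_size - q->free_slots - q->completed;

	if (k > in_flight)
		k = in_flight;
	q->completed += k;
}
```

In this model the send path does no completion work at all until free descriptors run low, which is what should help latency. It also means completions become visible later, which is relevant to the follow-up below.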

> --Jesper



* Re: AF_XDP not transmitting frames immediately
  2021-12-16  8:34         ` Karlsson, Magnus
@ 2021-12-16 15:28           ` Jakub Kicinski
  0 siblings, 0 replies; 16+ messages in thread
From: Jakub Kicinski @ 2021-12-16 15:28 UTC (permalink / raw)
  To: Karlsson, Magnus
  Cc: Jesper Dangaard Brouer, Björn Töpel, Brouer, Jesper,
	Xdp, Ong, Boon Leong, Joao Pedro Barros Silva,
	Diogo Alexandre Da Silva Lima, Fijalkowski, Maciej

On Thu, 16 Dec 2021 08:34:23 +0000 Karlsson, Magnus wrote:
> > "Time" will tell if it is good enough (pun intended).
> > Meaning I will implement and measure it.
> > The busy-poll mode does sound like a way forward.
> > 
> > Looking at kernel code, I can see that drivers' TX NAPI usually does DMA-TX
> > completion *before* transmitting new frames.  This usually makes sense,
> > but for our use-case of hitting a narrow time-slot, I worry about the jitter this
> > introduces.  I would like to see a mode/flag that would allow transmitting
> > new frames (and afterwards invoking/scheduling TX-completion, e.g. via
> > raising the softirq/NAPI).
> > Well this is future work, first I will measure current implementation.  
> 
> Maciej has been experimenting with the ice driver to do sending
> first. Completions are only done lazily when needed to make sure that
> there are always a number of descriptors available for sending. This
> yields much better throughput and is the style that DPDK uses for its
> drivers. Hopefully, this will also improve latency, though we have
> not measured that. Could be adopted for the igc driver too if that is
> the case.

I think this came up before; there was a concern that some applications
may forgo retransmitting data until they see a completion (like
skb_still_in_host_queue()). Make sure you call out that it's a change
in behavior in the commit message and/or docs.
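On the AF_XDP application side, the dependency described here looks roughly like this: a UMEM frame placed on the Tx ring is treated as still owned by the kernel until its address appears on the completion ring, so a retransmit (or any reuse of that frame) waits for the completion. This is a minimal sketch assuming aligned-chunk mode and a hypothetical per-frame state table. With libbpf the completed addresses would come from xsk_ring_cons__peek() and xsk_ring_cons__comp_addr() on the completion ring. If the driver reaps completions lazily, that wait presumably gets longer, which is the behaviour change worth documenting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_FRAMES 64u   /* hypothetical UMEM size in frames */
#define FRAME_SIZE 4096u /* hypothetical frame size (aligned-chunk mode) */

/* Hypothetical per-frame bookkeeping. */
struct frame_tracker {
	bool in_host_queue[NUM_FRAMES]; /* submitted to Tx, completion not seen */
};

static unsigned int frame_idx(uint64_t addr)
{
	uint64_t idx = addr / FRAME_SIZE;

	assert(idx < NUM_FRAMES); /* addr must lie inside the UMEM */
	return (unsigned int)idx;
}

/* Call when a frame is placed on the XSK Tx ring. */
static void frame_submitted(struct frame_tracker *t, uint64_t addr)
{
	t->in_host_queue[frame_idx(addr)] = true;
}

/* Call for each address read from the completion ring. */
static void frame_completed(struct frame_tracker *t, uint64_t addr)
{
	t->in_host_queue[frame_idx(addr)] = false;
}

/* Similar in spirit to skb_still_in_host_queue(): do not retransmit or
 * rewrite a frame whose completion has not been seen yet. */
static bool may_retransmit(const struct frame_tracker *t, uint64_t addr)
{
	return !t->in_host_queue[frame_idx(addr)];
}
```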



Thread overview: 16+ messages
2021-12-13 21:04 AF_XDP not transmitting frames immediately Jesper Dangaard Brouer
2021-12-14  8:07 ` Karlsson, Magnus
2021-12-14 10:32   ` Jesper Dangaard Brouer
2021-12-14 10:40     ` Karlsson, Magnus
2021-12-14 11:25       ` Maciej Fijalkowski
2021-12-14 14:04         ` Jesper Dangaard Brouer
2021-12-14 14:57           ` Ong, Boon Leong
2021-12-14 15:42             ` Jesper Dangaard Brouer
2021-12-14 16:05               ` Maciej Fijalkowski
2021-12-15  1:08       ` Desouza, Ederson
2021-12-15  8:41         ` Jesper Dangaard Brouer
2021-12-15 10:17   ` Jesper Dangaard Brouer
2021-12-15 11:07     ` Karlsson, Magnus
2021-12-15 17:11       ` Jesper Dangaard Brouer
2021-12-16  8:34         ` Karlsson, Magnus
2021-12-16 15:28           ` Jakub Kicinski
