* How does the Kernel decide which Umem frame to choose for the next packet?
@ 2020-05-18  8:37 Gaul, Maximilian
  2020-05-18  8:51 ` Magnus Karlsson
  0 siblings, 1 reply; 4+ messages in thread
From: Gaul, Maximilian @ 2020-05-18  8:37 UTC (permalink / raw)
  To: Xdp

Hello,

I read through the paper "The Path to DPDK Speeds for AF XDP" but I couldn't find information on how the Kernel chooses the Umem frame for the next packet, or, in the case of zero-copy, how the driver decides which Umem frame to choose.
But maybe I just overlooked it.

So how does this work?

Best regards

Max

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: How does the Kernel decide which Umem frame to choose for the next packet?
  2020-05-18  8:37 How does the Kernel decide which Umem frame to choose for the next packet? Gaul, Maximilian
@ 2020-05-18  8:51 ` Magnus Karlsson
  2020-05-18  9:17   ` AW: " Gaul, Maximilian
  0 siblings, 1 reply; 4+ messages in thread
From: Magnus Karlsson @ 2020-05-18  8:51 UTC (permalink / raw)
  To: Gaul, Maximilian; +Cc: Xdp

On Mon, May 18, 2020 at 10:39 AM Gaul, Maximilian
<maximilian.gaul@hm.edu> wrote:
>
> Hello,
>
> I read through the paper "The Path to DPDK Speeds for AF XDP" but I couldn't find information on how the Kernel chooses the Umem frame for the next packet, or, in the case of zero-copy, how the driver decides which Umem frame to choose.
> But maybe I just overlooked it.
>
> So how does this work?

User-space decides this by what frames it enters into the fill ring.
Kernel-space uses the frames in order from that ring.

/Magnus

> Best regards
>
> Max


* AW: How does the Kernel decide which Umem frame to choose for the next packet?
  2020-05-18  8:51 ` Magnus Karlsson
@ 2020-05-18  9:17   ` Gaul, Maximilian
  2020-05-18 13:14     ` Magnus Karlsson
  0 siblings, 1 reply; 4+ messages in thread
From: Gaul, Maximilian @ 2020-05-18  9:17 UTC (permalink / raw)
  To: Magnus Karlsson; +Cc: Xdp

> User-space decides this by what frames it enters into the fill ring.
> Kernel-space uses the frames in order from that ring.
> 
> /Magnus

Thank you for your reply Magnus,

I am sorry to ask again but I am not so sure when this happens.
So I first check my socket RX-ring for new packets:

		xsk_ring_cons__peek(&xsk_socket->rx, 1024, &idx_rx)

which looks like this:

		static inline size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons,
							 size_t nb, __u32 *idx)
		{
			size_t entries = xsk_cons_nb_avail(cons, nb);

			if (entries > 0) {
				/* Make sure we do not speculatively read the data before
				 * we have received the packet buffers from the ring.
				 */
				libbpf_smp_rmb();

				*idx = cons->cached_cons;
				cons->cached_cons += entries;
			}

			return entries;
		}

where `idx_rx` is the starting position of descriptors for the new packets in the RX-ring.

My first question here is: how can there already be packet descriptors in my RX-ring if I haven't entered any frames into the fill ring of the umem yet?
So I assume libbpf already did this for me?

After this call I know how many packets are waiting. So I reserve exactly as many Umem frames:

		xsk_ring_prod__reserve(&umem_info->fq, rx_rcvd_amnt, &idx_fq);

which looks like this:

		static inline size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod,
								size_t nb, __u32 *idx)
		{
			if (xsk_prod_nb_free(prod, nb) < nb)
				return 0;

			*idx = prod->cached_prod;
			prod->cached_prod += nb;

			return nb;
		}

But what am I exactly reserving here? How can I reserve anything from the Umem without telling it the RX-ring of my socket?

After this, I extract the RX-ring packet descriptors, starting at `idx_rx`:

		const struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk_socket->rx, idx_rx + i);

I am also not entirely certain about the zero-copy aspect of AF-XDP. As far as I know, the NIC writes incoming packets via DMA directly into system memory. But in this case system memory means the Umem area - right? Whereas with non-zero-copy, the packets could land anywhere in memory, and the Kernel first has to copy them into the Umem area?

I am also a bit confused what the size of a RX-queue means in this context. Assuming the output of ethtool:

		$ ethtool -g eth20
		Ring parameters for eth20:
		Pre-set maximums:
		RX:             8192
		RX Mini:        0
		RX Jumbo:       0
		TX:             8192
		Current hardware settings:
		RX:             1024
		RX Mini:        0
		RX Jumbo:       0
		TX:             1024

Does this mean that at the moment my NIC can store 1024 incoming packets inside its own memory? So there is no connection between the RX-queue size of the NIC and the Umem area?

Sorry for this wall of text. Maybe you can answer a few of my questions, I hope they are not too confusing.

Thank you so much

Max


* Re: How does the Kernel decide which Umem frame to choose for the next packet?
  2020-05-18  9:17   ` AW: " Gaul, Maximilian
@ 2020-05-18 13:14     ` Magnus Karlsson
  0 siblings, 0 replies; 4+ messages in thread
From: Magnus Karlsson @ 2020-05-18 13:14 UTC (permalink / raw)
  To: Gaul, Maximilian; +Cc: Xdp

On Mon, May 18, 2020 at 11:17 AM Gaul, Maximilian
<maximilian.gaul@hm.edu> wrote:
>
> > User-space decides this by what frames it enters into the fill ring.
> > Kernel-space uses the frames in order from that ring.
> >
> > /Magnus
>
> Thank you for your reply Magnus,
>
> I am sorry to ask again but I am not so sure when this happens.
> So I first check my socket RX-ring for new packets:
>
>                 xsk_ring_cons__peek(&xsk_socket->rx, 1024, &idx_rx)
>
> which looks like this:
>
>                 static inline size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons,
>                                                          size_t nb, __u32 *idx)
>                 {
>                         size_t entries = xsk_cons_nb_avail(cons, nb);
>
>                         if (entries > 0) {
>                                 /* Make sure we do not speculatively read the data before
>                                  * we have received the packet buffers from the ring.
>                                  */
>                                 libbpf_smp_rmb();
>
>                                 *idx = cons->cached_cons;
>                                 cons->cached_cons += entries;
>                         }
>
>                         return entries;
>                 }
>
> where `idx_rx` is the starting position of descriptors for the new packets in the RX-ring.
>
> My first question here is: how can there already be packet descriptors in my RX-ring if I haven't entered any frames into the fill ring of the umem yet?
> So I assume libbpf already did this for me?

Yes, that is correct.

> After this call I know how many packets are waiting. So I reserve exactly as many Umem frames:
>
>                 xsk_ring_prod__reserve(&umem_info->fq, rx_rcvd_amnt, &idx_fq);
>
> which looks like this:
>
>                 static inline size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod,
>                                                                 size_t nb, __u32 *idx)
>                 {
>                         if (xsk_prod_nb_free(prod, nb) < nb)
>                                 return 0;
>
>                         *idx = prod->cached_prod;
>                         prod->cached_prod += nb;
>
>                         return nb;
>                 }
>
> But what am I exactly reserving here? How can I reserve anything from the Umem without telling it the RX-ring of my socket?

You are reserving descriptor slots in a producer ring.

> After this, I extract the RX-ring packet descriptors, starting at `idx_rx`:
>
>                 const struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk_socket->rx, idx_rx + i);
>
> I am also not entirely certain about the zero-copy aspect of AF-XDP. As far as I know, the NIC writes incoming packets via DMA directly into system memory. But in this case system memory means the Umem area - right? Whereas with non-zero-copy, the packets could land anywhere in memory, and the Kernel first has to copy them into the Umem area?

In zero-copy mode, the NIC DMAs the packets straight into the umem, so
they are immediately visible to the user-space process.
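For reference, with libbpf the mode can be requested at socket creation through the bind flags (fragment only, not a complete program; with XDP_ZEROCOPY set, the bind fails if the driver lacks zero-copy support, while XDP_COPY forces copy mode):

```c
struct xsk_socket_config cfg = {
	.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
	.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
	.bind_flags = XDP_ZEROCOPY,	/* or XDP_COPY to force copy mode */
};
```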

> I am also a bit confused what the size of a RX-queue means in this context. Assuming the output of ethtool:
>
>                 $ ethtool -g eth20
>                 Ring parameters for eth20:
>                 Pre-set maximums:
>                 RX:             8192
>                 RX Mini:        0
>                 RX Jumbo:       0
>                 TX:             8192
>                 Current hardware settings:
>                 RX:             1024
>                 RX Mini:        0
>                 RX Jumbo:       0
>                 TX:             1024
>
> Does this mean that at the moment my NIC can store 1024 incoming packets inside its own memory?

The NIC does not have its own memory. This just means that there can
be 1024 packets that will be processed by the NIC, or have been
processed by the NIC but not yet handled by the driver. Nothing you need
to care about unless you are performance optimizing, or writing a
driver of course :-).

> So there is no connection between the RX-queue size of the NIC and the Umem area?

Correct.

/Magnus

> Sorry for this wall of text. Maybe you can answer a few of my questions, I hope they are not too confusing.
>
> Thank you so much
>
> Max

