From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org
Cc: netdev@vger.kernel.org, magnus.karlsson@intel.com,
	bjorn@kernel.org, tirthendu.sarkar@intel.com,
	maciej.fijalkowski@intel.com, simon.horman@corigine.com,
	toke@kernel.org
Subject: [PATCH v4 bpf-next 15/22] xsk: add multi-buffer documentation
Date: Thu, 15 Jun 2023 19:25:59 +0200
Message-ID: <20230615172606.349557-16-maciej.fijalkowski@intel.com>
In-Reply-To: <20230615172606.349557-1-maciej.fijalkowski@intel.com>

From: Magnus Karlsson <magnus.karlsson@intel.com>

Add AF_XDP multi-buffer support documentation including two
pseudo-code samples.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
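Note, not part of the commit: the Tx rules documented in the patch below
(XDP_PKT_CONTD on every descriptor except the last, no zero-length
descriptors, at most 18 frames for a portable copy-mode application) can be
illustrated with a minimal sketch. It assumes the packet lies contiguously in
the umem and that frames are a fixed size. struct sketch_desc,
sketch_fill_tx_descs() and SKETCH_MAX_FRAGS are hypothetical names made up
for this note; only XDP_PKT_CONTD comes from the uapi header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as in <linux/if_xdp.h>; redefined only so that the sketch
 * builds without kernel headers.
 */
#ifndef XDP_PKT_CONTD
#define XDP_PKT_CONTD (1 << 0)
#endif

/* Hypothetical stand-in mirroring the fields of struct xdp_desc. */
struct sketch_desc {
    uint64_t addr;
    uint32_t len;
    uint32_t options;
};

/* Portable copy-mode cap: CONFIG_MAX_SKB_FRAGS is at least 17, so 17 + 1. */
#define SKETCH_MAX_FRAGS 18

/*
 * Describe one packet of pkt_len bytes starting at umem offset addr as a
 * chain of descriptors of at most frame_size bytes each. Returns the number
 * of descriptors written, or 0 if the packet is empty, would need more than
 * SKETCH_MAX_FRAGS frames, or does not fit into max_descs entries.
 */
static size_t sketch_fill_tx_descs(struct sketch_desc *descs, size_t max_descs,
                                   uint64_t addr, uint32_t pkt_len,
                                   uint32_t frame_size)
{
    uint64_t needed;
    size_t i;

    assert(frame_size > 0);

    if (pkt_len == 0)
        return 0; /* zero-length descriptors are treated as invalid */

    /* Round up in 64 bits so that the addition cannot overflow. */
    needed = ((uint64_t)pkt_len + frame_size - 1) / frame_size;
    if (needed > SKETCH_MAX_FRAGS || needed > max_descs)
        return 0;

    for (i = 0; i < needed; i++) {
        int last = (i == needed - 1);

        descs[i].addr = addr + (uint64_t)i * frame_size;
        descs[i].len = last ? pkt_len - (uint32_t)i * frame_size : frame_size;
        /* Only the final descriptor of the packet has the flag cleared. */
        descs[i].options = last ? 0 : XDP_PKT_CONTD;
    }

    return (size_t)needed;
}
```

A real application would write these values into the descriptors returned by
xsk_ring_prod__tx_desc() instead of a local array.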
 Documentation/networking/af_xdp.rst | 177 ++++++++++++++++++++++++++++
 1 file changed, 177 insertions(+)
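
Note, not part of the commit: the documentation in this patch states that a
batch read from the Rx ring may end in the middle of a packet. A rough sketch
of the per-socket state an application might carry across batches to cope
with that is shown here. struct sketch_desc, struct sketch_rx_state and
sketch_rx_batch() are hypothetical names used for illustration only; a real
application would read descriptors with xsk_ring_cons__rx_desc() and would
also collect the frame addresses rather than just counting bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as in <linux/if_xdp.h>; redefined only so that the sketch
 * builds without kernel headers.
 */
#ifndef XDP_PKT_CONTD
#define XDP_PKT_CONTD (1 << 0)
#endif

/* Hypothetical stand-in mirroring the fields of struct xdp_desc. */
struct sketch_desc {
    uint64_t addr;
    uint32_t len;
    uint32_t options;
};

/* Reassembly state that has to survive from one batch to the next. */
struct sketch_rx_state {
    uint32_t frags;      /* frames seen so far in the current packet */
    uint32_t bytes;      /* bytes seen so far in the current packet */
    uint64_t pkts_done;  /* number of complete packets seen */
    uint64_t bytes_done; /* total length of all complete packets */
};

/*
 * Consume one batch of n Rx descriptors. Returns how many packets were
 * completed by this batch. A packet whose last descriptor has not arrived
 * yet stays pending in *st and is finished by a later call.
 */
static size_t sketch_rx_batch(struct sketch_rx_state *st,
                              const struct sketch_desc *descs, size_t n)
{
    size_t i, completed = 0;

    assert(st != NULL);

    for (i = 0; i < n; i++) {
        st->frags++;
        st->bytes += descs[i].len;

        if (descs[i].options & XDP_PKT_CONTD)
            continue; /* more frames follow, possibly in the next batch */

        /* Flag cleared: this descriptor ends the current packet. */
        st->pkts_done++;
        st->bytes_done += st->bytes;
        st->frags = 0;
        st->bytes = 0;
        completed++;
    }

    return completed;
}
```

Keeping the state in a structure owned by the caller, rather than in static
variables, lets the same function serve several sockets.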

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index 247c6c4127e9..2b583f58967b 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -453,6 +453,93 @@ XDP_OPTIONS getsockopt
 Gets options from an XDP socket. The only one supported so far is
 XDP_OPTIONS_ZEROCOPY which tells you if zero-copy is on or not.
 
+Multi-Buffer Support
+--------------------
+
+With multi-buffer support, programs using AF_XDP sockets can receive
+and transmit packets consisting of multiple buffers in both copy and
+zero-copy mode. For example, a packet can consist of two
+frames/buffers, one with the header and the other one with the data,
+or a 9K Ethernet jumbo frame can be constructed by chaining together
+three 4K frames.
+
+Some definitions:
+
+* A packet consists of one or more frames
+
+* A descriptor in one of the AF_XDP rings always refers to a single
+  frame. In the case the packet consists of a single frame, the
+  descriptor refers to the whole packet.
+
+To enable multi-buffer support for an AF_XDP socket, use the new bind
+flag XDP_USE_SG. If this is not provided, all multi-buffer packets
+will be dropped just as before. Note that the XDP program loaded also
+needs to be in multi-buffer mode. This can be accomplished by using
+"xdp.frags" as the section name of the XDP program used.
+
+To represent a packet consisting of multiple frames, a new flag called
+XDP_PKT_CONTD is introduced in the options field of the Rx and Tx
+descriptors. If it is set (1), the packet continues with the next
+descriptor; if it is clear (0), this is the last descriptor of the
+packet. Why invert the end-of-packet (eop) flag found in many NICs?
+To preserve compatibility with non-multi-buffer applications: on Rx
+they see this bit cleared for all packets, and on Tx they set the
+options field to zero, as anything else is treated as an invalid
+descriptor.
+
+These are the semantics for producing packets consisting of multiple
+frames onto the AF_XDP Tx ring:
+
+* When an invalid descriptor is found, all the other
+  descriptors/frames of this packet are marked as invalid and not
+  completed. The next descriptor is treated as the start of a new
+  packet, even if this was not the intent (because we cannot guess
+  the intent). As before, if your program is producing invalid
+  descriptors you have a bug that must be fixed.
+
+* Zero length descriptors are treated as invalid descriptors.
+
+* For copy mode, the maximum supported number of frames in a packet is
+  equal to CONFIG_MAX_SKB_FRAGS + 1. If it is exceeded, all
+  descriptors accumulated so far are dropped and treated as
+  invalid. To produce an application that will work on any system
+  regardless of this config setting, limit the number of frags to 18,
+  as the minimum value of the config is 17.
+
+* For zero-copy mode, the limit is whatever the NIC HW supports,
+  usually at least five on the NICs we have checked. We
+  consciously chose to not enforce a rigid limit (such as
+  CONFIG_MAX_SKB_FRAGS + 1) for zero-copy mode, as it would have
+  resulted in copy actions under the hood to fit within the limit the
+  NIC supports, which would defeat the purpose of zero-copy mode.
+
+* The ZC batch API guarantees that it will provide a batch of Tx
+  descriptors that ends with a full packet. If not, ZC
+  drivers would have to gather the full packet on their side. The
+  approach we picked makes ZC drivers' life much easier (at least on
+  the Tx side).
+
+On the Rx path in copy-mode, the xsk core copies the XDP data into
+multiple descriptors, if needed, and sets the XDP_PKT_CONTD flag as
+detailed before. Zero-copy mode works the same, though the data is not
+copied. When the application gets a descriptor with the XDP_PKT_CONTD
+flag set to one, it means that the packet consists of multiple buffers
+and it continues with the next buffer in the following
+descriptor. When a descriptor with XDP_PKT_CONTD == 0 is received, it
+means that this is the last buffer of the packet. AF_XDP guarantees
+that only a complete packet (all frames in the packet) is sent to the
+application.
+
+If an application reads a batch of descriptors, using for example the
+libxdp interfaces, it is not guaranteed that the batch will end with a
+full packet. It might end in the middle of a packet and the rest of the
+buffers of that packet will arrive at the beginning of the next batch,
+since the libxdp interface does not read the whole ring (unless you
+have an enormous batch size or a very small ring size).
+
+An example program each for Rx and Tx multi-buffer support can be found
+later in this document.
+
 Usage
 =====
 
@@ -532,6 +619,96 @@ like this:
 But please use the libbpf functions as they are optimized and ready to
 use. Will make your life easier.
 
+Usage Multi-Buffer Rx
+=====================
+
+Here is a simple Rx path pseudo-code example (using libxdp interfaces
+for simplicity). Error paths have been excluded to keep it short:
+
+.. code-block:: c
+
+    void rx_packets(struct xsk_socket_info *xsk)
+    {
+        static bool new_packet = true;
+        u32 idx_rx = 0, idx_fq = 0;
+        static char *pkt;
+
+        int rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
+
+        xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+
+        for (int i = 0; i < rcvd; i++) {
+            struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++);
+            char *frag = xsk_umem__get_data(xsk->umem->buffer, desc->addr);
+            bool eop = !(desc->options & XDP_PKT_CONTD);
+
+            if (new_packet)
+                pkt = frag;
+            else
+                add_frag_to_pkt(pkt, frag);
+
+            if (eop)
+                process_pkt(pkt);
+
+            new_packet = eop;
+
+            *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = desc->addr;
+        }
+
+        xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
+        xsk_ring_cons__release(&xsk->rx, rcvd);
+    }
+
+Usage Multi-Buffer Tx
+=====================
+
+Here is an example Tx path pseudo-code (using libxdp interfaces for
+simplicity) ignoring that the umem is finite in size, and that we
+eventually will run out of packets to send. Also assumes pkts.addr
+points to a valid location in the umem.
+
+.. code-block:: c
+
+    void tx_packets(struct xsk_socket_info *xsk, struct pkt *pkts,
+                    int batch_size)
+    {
+        u32 idx, i, pkt_nb = 0;
+
+        xsk_ring_prod__reserve(&xsk->tx, batch_size, &idx);
+
+        for (i = 0; i < batch_size;) {
+            u64 addr = pkts[pkt_nb].addr;
+            u32 len = pkts[pkt_nb].size;
+
+            do {
+                struct xdp_desc *tx_desc;
+
+                tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i++);
+                tx_desc->addr = addr;
+
+                if (len > xsk_frame_size) {
+                    tx_desc->len = xsk_frame_size;
+                    tx_desc->options = XDP_PKT_CONTD;
+                } else {
+                    tx_desc->len = len;
+                    tx_desc->options = 0;
+                    pkt_nb++;
+                }
+                len -= tx_desc->len;
+                addr += xsk_frame_size;
+
+                if (i == batch_size) {
+                    /* Remember len, addr, pkt_nb for next iteration.
+                     * Skipped for simplicity.
+                     */
+                    break;
+                }
+            } while (len);
+        }
+
+        xsk_ring_prod__submit(&xsk->tx, i);
+    }
+
 Sample application
 ==================
 
-- 
2.34.1

