All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ferruh Yigit <ferruh.yigit@intel.com>
To: Ciara Loftus <ciara.loftus@intel.com>, <dev@dpdk.org>
Cc: <stephen@networkplumber.org>
Subject: Re: [PATCH v2] net/af_xdp: re-enable secondary process support
Date: Fri, 4 Feb 2022 14:18:46 +0000	[thread overview]
Message-ID: <927b094f-2d84-1dbb-0ad5-37dcf1e1c98e@intel.com> (raw)
In-Reply-To: <20220204125436.30397-1-ciara.loftus@intel.com>

On 2/4/2022 12:54 PM, Ciara Loftus wrote:
> Secondary process support had been disabled for the AF_XDP PMD
> because there was no logic in place to share the AF_XDP socket
> file descriptors between the processes. This commit introduces
> this logic using the IPC APIs.
> 
> Since AF_XDP rings are single-producer single-consumer, rx/tx
> in the secondary process is disabled. However other operations
> including retrieval of stats are permitted.
> 
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> 
> ---
> v1 -> v2:
> * Rebase to next-net
> 
> RFC -> v1:
> * Added newline to af_xdp.rst
> * Fixed spelling errors
> * Fixed potential NULL dereference in init_internals
> * Fixed potential free of address-of expression in afxdp_mp_request_fds
> ---
>   doc/guides/nics/af_xdp.rst             |   9 ++
>   doc/guides/nics/features/af_xdp.ini    |   1 +
>   doc/guides/rel_notes/release_22_03.rst |   1 +
>   drivers/net/af_xdp/rte_eth_af_xdp.c    | 210 +++++++++++++++++++++++--
>   4 files changed, 207 insertions(+), 14 deletions(-)
> 
> diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
> index db02ea1984..eb4eab28a8 100644
> --- a/doc/guides/nics/af_xdp.rst
> +++ b/doc/guides/nics/af_xdp.rst
> @@ -141,4 +141,13 @@ Limitations
>     NAPI context from a watchdog timer instead of from softirqs. More information
>     on this feature can be found at [1].
>   
> +- **Secondary Processes**
> +
> +  Rx and Tx are not supported for secondary processes due to the single-producer
> +  single-consumer nature of the AF_XDP rings. However other operations including
> +  statistics retrieval are permitted.

Hi Ciara,

Isn't this limitation same for all PMDs, like not both primary & secondary can Rx/Tx
from same queue at the same time.
But primary can initiallize the PMD and secondary can do the datapath,
or isn't af_xdp supports multiple queue, if so some queues can be used by
primary and some by secondary for datapath.

Is there anyhing special for af_xdp that prevents it?

> +  The maximum number of queues permitted for PMDs operating in this model is 8
> +  as this is the maximum number of fds that can be sent through the IPC APIs as
> +  defined by RTE_MP_MAX_FD_NUM.
> +
>     [1] https://lwn.net/Articles/837010/
> diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini
> index 54b738e616..8e7e075aaf 100644
> --- a/doc/guides/nics/features/af_xdp.ini
> +++ b/doc/guides/nics/features/af_xdp.ini
> @@ -9,4 +9,5 @@ Power mgmt address monitor = Y
>   MTU update           = Y
>   Promiscuous mode     = Y
>   Stats per queue      = Y
> +Multiprocess aware   = Y
>   x86-64               = Y
> diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
> index bf2e3f78a9..dfd2cbbccf 100644
> --- a/doc/guides/rel_notes/release_22_03.rst
> +++ b/doc/guides/rel_notes/release_22_03.rst
> @@ -58,6 +58,7 @@ New Features
>   * **Updated AF_XDP PMD**
>   
>     * Added support for libxdp >=v1.2.2.
> +  * Re-enabled secondary process support. RX/TX is not supported.
>   
>   * **Updated Cisco enic driver.**
>   
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 1b6192fa44..407f6d8dbe 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -80,6 +80,18 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, NOTICE);
>   
>   #define ETH_AF_XDP_ETH_OVERHEAD		(RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN)
>   
> +#define ETH_AF_XDP_MP_KEY "afxdp_mp_send_fds"
> +
> +static int afxdp_dev_count;
> +
> +/* Message header to synchronize fds via IPC */
> +struct ipc_hdr {
> +	char port_name[RTE_DEV_NAME_MAX_LEN];
> +	/* The file descriptors are in the dedicated part
> +	 * of the Unix message to be translated by the kernel.
> +	 */
> +};
> +
>   struct xsk_umem_info {
>   	struct xsk_umem *umem;
>   	struct rte_ring *buf_ring;
> @@ -147,6 +159,10 @@ struct pmd_internals {
>   	struct pkt_tx_queue *tx_queues;
>   };
>   
> +struct pmd_process_private {
> +	int rxq_xsk_fds[RTE_MAX_QUEUES_PER_PORT];
> +};
> +
>   #define ETH_AF_XDP_IFACE_ARG			"iface"
>   #define ETH_AF_XDP_START_QUEUE_ARG		"start_queue"
>   #define ETH_AF_XDP_QUEUE_COUNT_ARG		"queue_count"
> @@ -795,11 +811,12 @@ static int
>   eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
>   {
>   	struct pmd_internals *internals = dev->data->dev_private;
> +	struct pmd_process_private *process_private = dev->process_private;
>   	struct xdp_statistics xdp_stats;
>   	struct pkt_rx_queue *rxq;
>   	struct pkt_tx_queue *txq;
>   	socklen_t optlen;
> -	int i, ret;
> +	int i, ret, fd;
>   
>   	for (i = 0; i < dev->data->nb_rx_queues; i++) {
>   		optlen = sizeof(struct xdp_statistics);
> @@ -815,8 +832,9 @@ eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
>   		stats->ibytes += stats->q_ibytes[i];
>   		stats->imissed += rxq->stats.rx_dropped;
>   		stats->oerrors += txq->stats.tx_dropped;
> -		ret = getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
> -				XDP_STATISTICS, &xdp_stats, &optlen);
> +		fd = process_private->rxq_xsk_fds[i];
> +		ret = fd >= 0 ? getsockopt(fd, SOL_XDP, XDP_STATISTICS,
> +					   &xdp_stats, &optlen) : -1;
>   		if (ret != 0) {
>   			AF_XDP_LOG(ERR, "getsockopt() failed for XDP_STATISTICS.\n");
>   			return -1;
> @@ -883,8 +901,10 @@ eth_dev_close(struct rte_eth_dev *dev)
>   	struct pkt_rx_queue *rxq;
>   	int i;
>   
> -	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> +	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
> +		rte_free(dev->process_private);
>   		return 0;
> +	}
>   
>   	AF_XDP_LOG(INFO, "Closing AF_XDP ethdev on numa socket %u\n",
>   		rte_socket_id());
> @@ -1349,6 +1369,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
>   		   struct rte_mempool *mb_pool)
>   {
>   	struct pmd_internals *internals = dev->data->dev_private;
> +	struct pmd_process_private *process_private = dev->process_private;
>   	struct pkt_rx_queue *rxq;
>   	int ret;
>   
> @@ -1387,6 +1408,8 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
>   	rxq->fds[0].fd = xsk_socket__fd(rxq->xsk);
>   	rxq->fds[0].events = POLLIN;
>   
> +	process_private->rxq_xsk_fds[rx_queue_id] = rxq->fds[0].fd;
> +
>   	dev->data->rx_queues[rx_queue_id] = rxq;
>   	return 0;
>   
> @@ -1688,6 +1711,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
>   {
>   	const char *name = rte_vdev_device_name(dev);
>   	const unsigned int numa_node = dev->device.numa_node;
> +	struct pmd_process_private *process_private;
>   	struct pmd_internals *internals;
>   	struct rte_eth_dev *eth_dev;
>   	int ret;
> @@ -1753,9 +1777,17 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
>   	if (ret)
>   		goto err_free_tx;
>   
> +	process_private = (struct pmd_process_private *)
> +		rte_zmalloc_socket(name, sizeof(struct pmd_process_private),
> +				   RTE_CACHE_LINE_SIZE, numa_node);
> +	if (process_private == NULL) {
> +		AF_XDP_LOG(ERR, "Failed to alloc memory for process private\n");
> +		goto err_free_tx;
> +	}

Need to free 'process_private' in the PMD, in 'close()' and 'remove()' paths.

> +
>   	eth_dev = rte_eth_vdev_allocate(dev, 0);
>   	if (eth_dev == NULL)
> -		goto err_free_tx;
> +		goto err_free_pp;
>   
>   	eth_dev->data->dev_private = internals;
>   	eth_dev->data->dev_link = pmd_link;
> @@ -1764,6 +1796,10 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
>   	eth_dev->dev_ops = &ops;
>   	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
>   	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
> +	eth_dev->process_private = process_private;
> +
> +	for (i = 0; i < queue_cnt; i++)
> +		process_private->rxq_xsk_fds[i] = -1;
>   
>   #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
>   	AF_XDP_LOG(INFO, "Zero copy between umem and mbuf enabled.\n");
> @@ -1771,6 +1807,8 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
>   
>   	return eth_dev;
>   
> +err_free_pp:
> +	rte_free(process_private);
>   err_free_tx:
>   	rte_free(internals->tx_queues);
>   err_free_rx:
> @@ -1780,6 +1818,115 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
>   	return NULL;
>   }
>   
> +/* Secondary process requests rxq fds from primary. */
> +static int
> +afxdp_mp_request_fds(const char *name, struct rte_eth_dev *dev)
> +{
> +	struct pmd_process_private *process_private = dev->process_private;
> +	struct timespec timeout = {.tv_sec = 1, .tv_nsec = 0};
> +	struct rte_mp_msg request, *reply;
> +	struct rte_mp_reply replies;
> +	struct ipc_hdr *request_param = (struct ipc_hdr *)request.param;
> +	int i, ret;
> +
> +	/* Prepare the request */
> +	memset(&request, 0, sizeof(request));
> +	strlcpy(request.name, ETH_AF_XDP_MP_KEY, sizeof(request.name));
> +	strlcpy(request_param->port_name, name,
> +		sizeof(request_param->port_name));
> +	request.len_param = sizeof(*request_param);
> +
> +	/* Send the request and receive the reply */
> +	AF_XDP_LOG(DEBUG, "Sending IPC request for %s\n", name);
> +	ret = rte_mp_request_sync(&request, &replies, &timeout);
> +	if (ret < 0 || replies.nb_received != 1) {
> +		AF_XDP_LOG(ERR, "Failed to request fds from primary: %d",
> +			   rte_errno);
> +		return -1;
> +	}
> +	reply = replies.msgs;
> +	AF_XDP_LOG(DEBUG, "Received IPC reply for %s\n", name);

I think message can mention "multi-process IPC" for clarification.

> +	if (dev->data->nb_rx_queues != reply->num_fds) {
> +		AF_XDP_LOG(ERR, "Incorrect number of fds received: %d != %d\n",
> +			   reply->num_fds, dev->data->nb_rx_queues);
> +		return -EINVAL;
> +	}
> +
> +	for (i = 0; i < reply->num_fds; i++)
> +		process_private->rxq_xsk_fds[i] = reply->fds[i];
> +
> +	free(reply);
> +	return 0;
> +}
> +
> +/* Primary process sends rxq fds to secondary. */
> +static int
> +afxdp_mp_send_fds(const struct rte_mp_msg *request, const void *peer)
> +{
> +	struct rte_eth_dev *dev;
> +	struct pmd_process_private *process_private;
> +	struct rte_mp_msg reply;
> +	const struct ipc_hdr *request_param =
> +		(const struct ipc_hdr *)request->param;
> +	struct ipc_hdr *reply_param =
> +		(struct ipc_hdr *)reply.param;
> +	const char *request_name = request_param->port_name;
> +	uint16_t port_id;
> +	int i, ret;
> +
> +	AF_XDP_LOG(DEBUG, "Received IPC request for %s\n", request_name);
> +
> +	/* Find the requested port */
> +	ret = rte_eth_dev_get_port_by_name(request_name, &port_id);
> +	if (ret) {
> +		AF_XDP_LOG(ERR, "Failed to get port id for %s\n", request_name);
> +		return -1;
> +	}
> +	dev = &rte_eth_devices[port_id];

Better to not access the global array, there is a new API and a cleanup
already done [1] in other PMDs, can you please apply the same here.

[1]
https://patches.dpdk.org/project/dpdk/patch/20220203082412.79028-1-kumaraparamesh92@gmail.com/

> +	process_private = dev->process_private;
> +
> +	/* Populate the reply with the xsk fd for each queue */
> +	reply.num_fds = 0;
> +	if (dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM) {
> +		AF_XDP_LOG(ERR, "Number of rx queues (%d) exceeds max number of fds (%d)\n",
> +			   dev->data->nb_rx_queues, RTE_MP_MAX_FD_NUM);
> +		return -EINVAL;
> +	}
> +
> +	for (i = 0; i < dev->data->nb_rx_queues; i++)
> +		reply.fds[reply.num_fds++] = process_private->rxq_xsk_fds[i];
> +
> +	/* Send the reply */
> +	strlcpy(reply.name, request->name, sizeof(reply.name));
> +	strlcpy(reply_param->port_name, request_name,
> +		sizeof(reply_param->port_name));
> +	reply.len_param = sizeof(*reply_param);
> +	AF_XDP_LOG(DEBUG, "Sending IPC reply for %s\n", reply_param->port_name);
> +	if (rte_mp_reply(&reply, peer) < 0) {
> +		AF_XDP_LOG(ERR, "Failed to reply to IPC request\n");
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +/* Secondary process rx function. RX is disabled because rings are SPSC */
> +static uint16_t
> +eth_af_xdp_rx_noop(void *queue __rte_unused,
> +		struct rte_mbuf **bufs __rte_unused,
> +		uint16_t nb_pkts __rte_unused)
> +{
> +	return 0;
> +}
> +
> +/* Secondary process tx function. TX is disabled because rings are SPSC */
> +static uint16_t
> +eth_af_xdp_tx_noop(void *queue __rte_unused,
> +			struct rte_mbuf **bufs __rte_unused,
> +			uint16_t nb_pkts __rte_unused)
> +{
> +	return 0;
> +}
> +

Now there are multiple PMDs using noop/dummy Rx/Tx burst functions,
what do you think to add static inline functions in the 'ethdev_driver.h'
(and clean existing drivers to use it) with a separate patch, later in this
patch use those functions?


>   static int
>   rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
>   {
> @@ -1789,19 +1936,39 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
>   	int xsk_queue_cnt = ETH_AF_XDP_DFLT_QUEUE_COUNT;
>   	int shared_umem = 0;
>   	char prog_path[PATH_MAX] = {'\0'};
> -	int busy_budget = -1;
> +	int busy_budget = -1, ret;
>   	struct rte_eth_dev *eth_dev = NULL;
> -	const char *name;
> +	const char *name = rte_vdev_device_name(dev);
>   
> -	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n",
> -		rte_vdev_device_name(dev));
> +	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n", name);
>   
> -	name = rte_vdev_device_name(dev);
>   	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> -		AF_XDP_LOG(ERR, "Failed to probe %s. "
> -				"AF_XDP PMD does not support secondary processes.\n",
> -				name);
> -		return -ENOTSUP;
> +		eth_dev = rte_eth_dev_attach_secondary(name);
> +		if (eth_dev == NULL) {
> +			AF_XDP_LOG(ERR, "Failed to probe %s\n", name);
> +			return -EINVAL;
> +		}
> +		eth_dev->dev_ops = &ops;
> +		eth_dev->device = &dev->device;
> +		eth_dev->rx_pkt_burst = eth_af_xdp_rx_noop;
> +		eth_dev->tx_pkt_burst = eth_af_xdp_tx_noop;
> +		eth_dev->process_private = (struct pmd_process_private *)
> +			rte_zmalloc_socket(name,
> +					   sizeof(struct pmd_process_private),
> +					   RTE_CACHE_LINE_SIZE,
> +					   eth_dev->device->numa_node);
> +		if (eth_dev->process_private == NULL) {
> +			AF_XDP_LOG(ERR,
> +				"Failed to alloc memory for process private\n");
> +			return -ENOMEM;
> +		}
> +
> +		/* Obtain the xsk fds from the primary process. */
> +		if (afxdp_mp_request_fds(name, eth_dev))
> +			return -1;
> +
> +		rte_eth_dev_probing_finish(eth_dev);
> +		return 0;
>   	}
>   
>   	kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_arguments);
> @@ -1836,6 +2003,17 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
>   		return -1;
>   	}
>   
> +	/* Register IPC callback which shares xsk fds from primary to secondary */
> +	if (!afxdp_dev_count) {
> +		ret = rte_mp_action_register(ETH_AF_XDP_MP_KEY, afxdp_mp_send_fds);
> +		if (ret < 0) {
> +			AF_XDP_LOG(ERR, "%s: Failed to register IPC callback: %s",
> +				   name, strerror(rte_errno));
> +			return -1;
> +		}
> +	}
> +	afxdp_dev_count++;
> +
>   	rte_eth_dev_probing_finish(eth_dev);
>   
>   	return 0;
> @@ -1858,6 +2036,10 @@ rte_pmd_af_xdp_remove(struct rte_vdev_device *dev)
>   		return 0;
>   
>   	eth_dev_close(eth_dev);
> +	rte_free(eth_dev->process_private);
> +	if (afxdp_dev_count == 1)
> +		rte_mp_action_unregister(ETH_AF_XDP_MP_KEY);
> +	afxdp_dev_count--;
>   	rte_eth_dev_release_port(eth_dev);
>   
>   


  reply	other threads:[~2022-02-04 14:18 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-10 15:32 [RFC PATCH] net/af_xdp: reenable secondary process support Ciara Loftus
2021-12-11 21:49 ` Stephen Hemminger
2022-01-12  7:54 ` [PATCH] net/af_xdp: re-enable " Ciara Loftus
2022-02-04 12:54   ` [PATCH v2] " Ciara Loftus
2022-02-04 14:18     ` Ferruh Yigit [this message]
2022-02-07  7:49       ` Loftus, Ciara
2022-02-07 10:27         ` Ferruh Yigit
2022-02-07 11:39           ` Loftus, Ciara
2022-02-08 10:58             ` Ferruh Yigit
2022-02-08 13:48     ` [PATCH v3] " Ciara Loftus
2022-02-08 17:45       ` Stephen Hemminger
2022-02-08 18:00         ` Ferruh Yigit
2022-02-08 18:42           ` Stephen Hemminger
2022-02-08 18:56             ` Ferruh Yigit
2022-02-09  7:41               ` Loftus, Ciara
2022-02-09  9:48       ` [PATCH v4] " Ciara Loftus
2022-02-09 15:29         ` Stephen Hemminger
2022-02-11 13:32           ` Ferruh Yigit
2022-02-09 17:55         ` Ferruh Yigit
2022-02-10 15:08           ` Ferruh Yigit
2022-02-10 15:19         ` Ferruh Yigit
2022-02-10 15:40           ` Loftus, Ciara
2022-02-10 16:06             ` Ferruh Yigit
2022-02-10 17:47               ` Loftus, Ciara
2022-02-10 20:12                 ` Ferruh Yigit
2022-02-11  7:28                   ` Loftus, Ciara
2022-02-11  9:26                     ` Loftus, Ciara
2022-02-11 12:29                       ` Ferruh Yigit
2022-02-11 13:01                         ` Loftus, Ciara
2022-02-11 13:07                           ` Ferruh Yigit
2022-02-11 15:32                             ` Loftus, Ciara
2022-02-16 11:23                               ` Loftus, Ciara
2022-02-11 11:31                     ` Ferruh Yigit
2022-02-17 12:44         ` David Marchand
2022-02-17 12:47           ` Ferruh Yigit
2022-02-17 12:53             ` David Marchand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=927b094f-2d84-1dbb-0ad5-37dcf1e1c98e@intel.com \
    --to=ferruh.yigit@intel.com \
    --cc=ciara.loftus@intel.com \
    --cc=dev@dpdk.org \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.