* [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode
@ 2018-10-01 12:51 Magnus Karlsson
  2018-10-01 12:51 ` [PATCH bpf-next v2 1/5] net: add umem reference in netdev{_rx}_queue Magnus Karlsson
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Magnus Karlsson @ 2018-10-01 12:51 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jakub.kicinski

Previously, the xsk code did not record which umem was bound to a
specific queue id. This was not required if all drivers were zero-copy
enabled as this had to be recorded in the driver anyway. So if a user
tried to bind two umems to the same queue, the driver would say
no. But if copy-mode was first enabled and then zero-copy mode (or the
reverse order), we mistakenly enabled both of them on the same umem
leading to buggy behavior. The main culprit for this is that we did
not store the association of umem to queue id in the copy case and
only relied on the driver reporting this. As this relation was not
stored in the driver for copy mode (it does not rely on the AF_XDP
NDOs), this obviously could not work.

This patch fixes the problem by always recording the umem to queue id
relationship in the netdev_queue and netdev_rx_queue structs. This way
we always know what kind of umem has been bound to a queue id and can
act appropriately at bind time. To make the bind semantics consistent
with ethtool queue manipulations and to facilitate the implementation
of drivers, we also forbid decreasing the number of queues/channels
with ethtool if there is an active AF_XDP socket in the set of queues
that are disabled.
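
As a rough illustration of the bind-time rule described above, here is a
minimal userspace sketch. It is a model only, not the kernel code: the
model_* names, the fixed-size slot array and the MODEL_* flags are
hypothetical stand-ins, and the range check on queue_id is the model's own
guard. The sketch also keeps the slot registered when it falls back to copy
mode, which is an assumption about the intended behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_QUEUES 4

/* Hypothetical stand-ins for the XDP_COPY / XDP_ZEROCOPY bind flags. */
#define MODEL_COPY     (1u << 1)
#define MODEL_ZEROCOPY (1u << 2)

struct model_umem {
	bool zc;      /* true once the "driver" accepted zero-copy */
	int queue_id; /* queue this umem is registered on, -1 if none */
};

struct model_dev {
	bool supports_zc;                          /* driver has the AF_XDP NDOs */
	struct model_umem *umem[MODEL_NUM_QUEUES]; /* one slot per queue id */
};

/* Register umem on queue_id, for copy mode as well as zero-copy mode. */
static int model_assign_dev(struct model_dev *dev, struct model_umem *umem,
			    unsigned int queue_id, unsigned int flags)
{
	bool force_zc = flags & MODEL_ZEROCOPY;
	bool force_copy = flags & MODEL_COPY;

	if (force_zc && force_copy)
		return -EINVAL;
	if (queue_id >= MODEL_NUM_QUEUES)
		return -EINVAL; /* model-only guard */

	if (dev->umem[queue_id])
		return -EBUSY; /* slot taken, regardless of mode */

	dev->umem[queue_id] = umem; /* always recorded */
	umem->queue_id = (int)queue_id;
	umem->zc = false;

	if (force_copy)
		return 0; /* copy mode: nothing more to do */

	if (!dev->supports_zc) {
		if (!force_zc)
			return 0; /* fall back to copy mode, stay registered */
		dev->umem[queue_id] = NULL;
		umem->queue_id = -1;
		return -EOPNOTSUPP;
	}

	umem->zc = true;
	return 0;
}

/* Drop the registration again, e.g. when the umem goes away. */
static void model_clear_dev(struct model_dev *dev, struct model_umem *umem)
{
	if (umem->queue_id >= 0) {
		assert(dev->umem[umem->queue_id] == umem);
		dev->umem[umem->queue_id] = NULL;
		umem->queue_id = -1;
	}
	umem->zc = false;
}
```

With this model, binding a copy-mode umem to queue 0 and then a zero-copy
umem to the same queue fails the second bind with -EBUSY, in either order.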

Jakub, please take a look at your patches. The last one I had to
change slightly to make it fit with the new interface
xdp_get_umem_from_qid(). An added bonus with this function is that we,
in the future, can also use it from the driver to get a umem, thus
simplifying driver implementations (and later remove the umem from the
NDO completely). Björn will mail patches, at a later point in time,
using this in the i40e and ixgbe drivers, that removes a good chunk of
code from the ZC implementations. I also made your code aware of Tx
queues. If we create a socket that only has a Tx queue, then the queue
id will refer to a Tx queue id only and could be larger than the
available amount of Rx queues. Please take a look at it.
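
To make the Tx-only case concrete, the following is a small userspace
sketch of a lookup that consults the Rx slots first and then the Tx slots.
It follows the shape of xdp_get_umem_from_qid() in patch 2, but the
model_* types and the fixed array size are hypothetical, and the sketch
assumes both queue counts stay at or below MODEL_MAX_QUEUES.

```c
#include <stddef.h>

#define MODEL_MAX_QUEUES 8

struct model_umem {
	int id; /* placeholder payload */
};

struct model_dev {
	unsigned int real_num_rx_queues; /* assumed <= MODEL_MAX_QUEUES */
	unsigned int real_num_tx_queues; /* assumed <= MODEL_MAX_QUEUES */
	struct model_umem *rx_umem[MODEL_MAX_QUEUES];
	struct model_umem *tx_umem[MODEL_MAX_QUEUES];
};

/* Store the umem in every per-queue slot that exists for this qid. */
static void model_reg_umem_at_qid(struct model_dev *dev,
				  struct model_umem *umem, unsigned int qid)
{
	if (qid < dev->real_num_rx_queues)
		dev->rx_umem[qid] = umem;
	if (qid < dev->real_num_tx_queues)
		dev->tx_umem[qid] = umem;
}

/* Look up the umem; a qid beyond the Rx range may still be a valid Tx qid. */
static struct model_umem *model_get_umem_from_qid(const struct model_dev *dev,
						  unsigned int qid)
{
	if (qid < dev->real_num_rx_queues)
		return dev->rx_umem[qid];
	if (qid < dev->real_num_tx_queues)
		return dev->tx_umem[qid];
	return NULL;
}
```

On a model device with 2 Rx and 4 Tx queues, a umem registered at queue id
3 only lands in the Tx slot, and the lookup still finds it there.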

Differences against v1:
* Included patches from Jakub that forbid decreasing the number of active
  queues if a queue to be deactivated has an AF_XDP socket. These have
  been adapted somewhat to the new interfaces in patch 2.
* Removed redundant check against real_num_[rt]x_queue in xsk_bind
* Only need to test against real_num_[rt]x_queues in
  xdp_clear_umem_at_qid.

Patch 1: Introduces a umem reference in the netdev_rx_queue and
         netdev_queue structs.
Patch 2: Records which queue_id is bound to which umem and makes sure
         that you cannot bind two different umems to the same queue_id.
Patch 3: Pre patch to ethtool_set_channels.
Patch 4: Forbid decreasing the number of active queues if a deactivated
         queue has an AF_XDP socket.
Patch 5: Simplify xdp_clear_umem_at_qid now that ethtool cannot deactivate
         the queue id we are running on.
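
The check that patch 4 adds to ethtool_set_channels() can be sketched in
userspace as follows. This is a model under stated assumptions: the
model_channels struct only mirrors the three counts used by the check, the
has_umem[] array is a hypothetical replacement for calling
xdp_get_umem_from_qid() per queue, and the loop is additionally capped at
MODEL_MAX_QUEUES to keep the sketch memory-safe.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_MAX_QUEUES 16

struct model_channels {
	unsigned int rx_count;
	unsigned int tx_count;
	unsigned int combined_count;
};

static unsigned int model_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static unsigned int model_max(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Return -EINVAL if any queue that would be disabled by going from the
 * current config to the requested one still has a umem, 0 otherwise.
 */
static int model_check_set_channels(const bool has_umem[MODEL_MAX_QUEUES],
				    const struct model_channels *curr,
				    const struct model_channels *req)
{
	unsigned int from, to, i;

	/* First queue id that is no longer guaranteed to exist ... */
	from = req->combined_count + model_min(req->rx_count, req->tx_count);
	/* ... up to one past the last queue id that exists today. */
	to = curr->combined_count + model_max(curr->rx_count, curr->tx_count);

	for (i = from; i < to && i < MODEL_MAX_QUEUES; i++)
		if (has_umem[i])
			return -EINVAL;

	return 0;
}
```

For example, with 4 combined channels today and a umem on queue 3, a
request for 2 combined channels scans queues 2 and 3 and is rejected,
while a request that keeps 4 combined channels scans nothing and passes.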

I based this patch set on bpf-next commit 5bf7a60b8e70 ("bpf: permit
CGROUP_DEVICE programs accessing helper bpf_get_current_cgroup_id()")

Thanks: Magnus

Jakub Kicinski (2):
  ethtool: rename local variable max -> curr
  ethtool: don't allow disabling queues with umem installed

Magnus Karlsson (3):
  net: add umem reference in netdev{_rx}_queue
  xsk: fix bug when trying to use both copy and zero-copy on one queue
    id
  xsk: simplify xdp_clear_umem_at_qid implementation

 include/linux/netdevice.h |  6 ++++
 include/net/xdp_sock.h    |  7 ++++
 net/core/ethtool.c        | 23 +++++++++----
 net/xdp/xdp_umem.c        | 87 ++++++++++++++++++++++++++++++++---------------
 net/xdp/xdp_umem.h        |  2 +-
 net/xdp/xsk.c             |  7 ----
 6 files changed, 91 insertions(+), 41 deletions(-)


* [PATCH bpf-next v2 1/5] net: add umem reference in netdev{_rx}_queue
  2018-10-01 12:51 [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode Magnus Karlsson
@ 2018-10-01 12:51 ` Magnus Karlsson
  2018-10-01 12:51 ` [PATCH bpf-next v2 2/5] xsk: fix bug when trying to use both copy and zero-copy on one queue id Magnus Karlsson
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Magnus Karlsson @ 2018-10-01 12:51 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jakub.kicinski

These references to the umem will be used to store information
on what kind of AF_XDP umem is bound to a queue id, if any.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 include/linux/netdevice.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1cbbf77..8318f79 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -609,6 +609,9 @@ struct netdev_queue {
 
 	/* Subordinate device that the queue has been assigned to */
 	struct net_device	*sb_dev;
+#ifdef CONFIG_XDP_SOCKETS
+	struct xdp_umem         *umem;
+#endif
 /*
  * write-mostly part
  */
@@ -738,6 +741,9 @@ struct netdev_rx_queue {
 	struct kobject			kobj;
 	struct net_device		*dev;
 	struct xdp_rxq_info		xdp_rxq;
+#ifdef CONFIG_XDP_SOCKETS
+	struct xdp_umem                 *umem;
+#endif
 } ____cacheline_aligned_in_smp;
 
 /*
-- 
2.7.4


* [PATCH bpf-next v2 2/5] xsk: fix bug when trying to use both copy and zero-copy on one queue id
  2018-10-01 12:51 [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode Magnus Karlsson
  2018-10-01 12:51 ` [PATCH bpf-next v2 1/5] net: add umem reference in netdev{_rx}_queue Magnus Karlsson
@ 2018-10-01 12:51 ` Magnus Karlsson
  2018-10-01 12:51 ` [PATCH bpf-next v2 3/5] ethtool: rename local variable max -> curr Magnus Karlsson
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Magnus Karlsson @ 2018-10-01 12:51 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jakub.kicinski

Previously, the xsk code did not record which umem was bound to a
specific queue id. This was not required if all drivers were zero-copy
enabled as this had to be recorded in the driver anyway. So if a user
tried to bind two umems to the same queue, the driver would say
no. But if copy-mode was first enabled and then zero-copy mode (or the
reverse order), we mistakenly enabled both of them on the same umem
leading to buggy behavior. The main culprit for this is that we did
not store the association of umem to queue id in the copy case and
only relied on the driver reporting this. As this relation was not
stored in the driver for copy mode (it does not rely on the AF_XDP
NDOs), this obviously could not work.

This patch fixes the problem by always recording the umem to queue id
relationship in the netdev_queue and netdev_rx_queue structs. This way
we always know what kind of umem has been bound to a queue id and can
act appropriately at bind time.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xdp_umem.c | 90 ++++++++++++++++++++++++++++++++++++++----------------
 net/xdp/xdp_umem.h |  2 +-
 net/xdp/xsk.c      |  7 -----
 3 files changed, 64 insertions(+), 35 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 555427b..4d6c665 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -42,27 +42,47 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs)
 	}
 }
 
-int xdp_umem_query(struct net_device *dev, u16 queue_id)
+/* The umem is stored both in the _rx struct and the _tx struct as we do
+ * not know if the device has more tx queues than rx, or the opposite.
+ * This might also change during run time.
+ */
+static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
+				u16 queue_id)
 {
-	struct netdev_bpf bpf;
+	if (queue_id < dev->real_num_rx_queues)
+		dev->_rx[queue_id].umem = umem;
+	if (queue_id < dev->real_num_tx_queues)
+		dev->_tx[queue_id].umem = umem;
+}
 
-	ASSERT_RTNL();
+static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
+					      u16 queue_id)
+{
+	if (queue_id < dev->real_num_rx_queues)
+		return dev->_rx[queue_id].umem;
+	if (queue_id < dev->real_num_tx_queues)
+		return dev->_tx[queue_id].umem;
 
-	memset(&bpf, 0, sizeof(bpf));
-	bpf.command = XDP_QUERY_XSK_UMEM;
-	bpf.xsk.queue_id = queue_id;
+	return NULL;
+}
 
-	if (!dev->netdev_ops->ndo_bpf)
-		return 0;
-	return dev->netdev_ops->ndo_bpf(dev, &bpf) ?: !!bpf.xsk.umem;
+static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
+{
+	/* Zero out the entry independent on how many queues are configured
+	 * at this point in time, as it might be used in the future.
+	 */
+	if (queue_id < dev->num_rx_queues)
+		dev->_rx[queue_id].umem = NULL;
+	if (queue_id < dev->num_tx_queues)
+		dev->_tx[queue_id].umem = NULL;
 }
 
 int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
-			u32 queue_id, u16 flags)
+			u16 queue_id, u16 flags)
 {
 	bool force_zc, force_copy;
 	struct netdev_bpf bpf;
-	int err;
+	int err = 0;
 
 	force_zc = flags & XDP_ZEROCOPY;
 	force_copy = flags & XDP_COPY;
@@ -70,17 +90,23 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
 	if (force_zc && force_copy)
 		return -EINVAL;
 
-	if (force_copy)
-		return 0;
+	rtnl_lock();
+	if (xdp_get_umem_from_qid(dev, queue_id)) {
+		err = -EBUSY;
+		goto out_rtnl_unlock;
+	}
 
-	if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_async_xmit)
-		return force_zc ? -EOPNOTSUPP : 0; /* fail or fallback */
+	xdp_reg_umem_at_qid(dev, umem, queue_id);
+	umem->dev = dev;
+	umem->queue_id = queue_id;
+	if (force_copy)
+		/* For copy-mode, we are done. */
+		goto out_rtnl_unlock;
 
-	rtnl_lock();
-	err = xdp_umem_query(dev, queue_id);
-	if (err) {
-		err = err < 0 ? -EOPNOTSUPP : -EBUSY;
-		goto err_rtnl_unlock;
+	if (!dev->netdev_ops->ndo_bpf ||
+	    !dev->netdev_ops->ndo_xsk_async_xmit) {
+		err = -EOPNOTSUPP;
+		goto err_unreg_umem;
 	}
 
 	bpf.command = XDP_SETUP_XSK_UMEM;
@@ -89,18 +115,20 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
 
 	err = dev->netdev_ops->ndo_bpf(dev, &bpf);
 	if (err)
-		goto err_rtnl_unlock;
+		goto err_unreg_umem;
 	rtnl_unlock();
 
 	dev_hold(dev);
-	umem->dev = dev;
-	umem->queue_id = queue_id;
 	umem->zc = true;
 	return 0;
 
-err_rtnl_unlock:
+err_unreg_umem:
+	xdp_clear_umem_at_qid(dev, queue_id);
+	if (!force_zc)
+		err = 0; /* fallback to copy mode */
+out_rtnl_unlock:
 	rtnl_unlock();
-	return force_zc ? err : 0; /* fail or fallback */
+	return err;
 }
 
 static void xdp_umem_clear_dev(struct xdp_umem *umem)
@@ -108,7 +136,7 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
 	struct netdev_bpf bpf;
 	int err;
 
-	if (umem->dev) {
+	if (umem->zc) {
 		bpf.command = XDP_SETUP_XSK_UMEM;
 		bpf.xsk.umem = NULL;
 		bpf.xsk.queue_id = umem->queue_id;
@@ -119,9 +147,17 @@ static void xdp_umem_clear_dev(struct xdp_umem *umem)
 
 		if (err)
 			WARN(1, "failed to disable umem!\n");
+	}
+
+	if (umem->dev) {
+		rtnl_lock();
+		xdp_clear_umem_at_qid(umem->dev, umem->queue_id);
+		rtnl_unlock();
+	}
 
+	if (umem->zc) {
 		dev_put(umem->dev);
-		umem->dev = NULL;
+		umem->zc = false;
 	}
 }
 
diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h
index c8be1ad..2760322 100644
--- a/net/xdp/xdp_umem.h
+++ b/net/xdp/xdp_umem.h
@@ -9,7 +9,7 @@
 #include <net/xdp_sock.h>
 
 int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev,
-			u32 queue_id, u16 flags);
+			u16 queue_id, u16 flags);
 bool xdp_umem_validate_queues(struct xdp_umem *umem);
 void xdp_get_umem(struct xdp_umem *umem);
 void xdp_put_umem(struct xdp_umem *umem);
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5a432df..caeddad1 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -419,13 +419,6 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	}
 
 	qid = sxdp->sxdp_queue_id;
-
-	if ((xs->rx && qid >= dev->real_num_rx_queues) ||
-	    (xs->tx && qid >= dev->real_num_tx_queues)) {
-		err = -EINVAL;
-		goto out_unlock;
-	}
-
 	flags = sxdp->sxdp_flags;
 
 	if (flags & XDP_SHARED_UMEM) {
-- 
2.7.4


* [PATCH bpf-next v2 3/5] ethtool: rename local variable max -> curr
  2018-10-01 12:51 [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode Magnus Karlsson
  2018-10-01 12:51 ` [PATCH bpf-next v2 1/5] net: add umem reference in netdev{_rx}_queue Magnus Karlsson
  2018-10-01 12:51 ` [PATCH bpf-next v2 2/5] xsk: fix bug when trying to use both copy and zero-copy on one queue id Magnus Karlsson
@ 2018-10-01 12:51 ` Magnus Karlsson
  2018-10-01 12:51 ` [PATCH bpf-next v2 4/5] ethtool: don't allow disabling queues with umem installed Magnus Karlsson
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Magnus Karlsson @ 2018-10-01 12:51 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jakub.kicinski

From: Jakub Kicinski <jakub.kicinski@netronome.com>

ethtool_set_channels() validates the config against driver's max
settings. It retrieves the current config and stores it in a
variable called max. This was okay when only max settings were
accessed but we will soon want to access current settings as
well, so calling the entire structure max makes the code less
readable.

While at it, drop unnecessary parentheses.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 net/core/ethtool.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 96afc55..9a648fb 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1655,7 +1655,7 @@ static noinline_for_stack int ethtool_get_channels(struct net_device *dev,
 static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
 						   void __user *useraddr)
 {
-	struct ethtool_channels channels, max = { .cmd = ETHTOOL_GCHANNELS };
+	struct ethtool_channels channels, curr = { .cmd = ETHTOOL_GCHANNELS };
 	u32 max_rx_in_use = 0;
 
 	if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels)
@@ -1664,13 +1664,13 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
 	if (copy_from_user(&channels, useraddr, sizeof(channels)))
 		return -EFAULT;
 
-	dev->ethtool_ops->get_channels(dev, &max);
+	dev->ethtool_ops->get_channels(dev, &curr);
 
 	/* ensure new counts are within the maximums */
-	if ((channels.rx_count > max.max_rx) ||
-	    (channels.tx_count > max.max_tx) ||
-	    (channels.combined_count > max.max_combined) ||
-	    (channels.other_count > max.max_other))
+	if (channels.rx_count > curr.max_rx ||
+	    channels.tx_count > curr.max_tx ||
+	    channels.combined_count > curr.max_combined ||
+	    channels.other_count > curr.max_other)
 		return -EINVAL;
 
 	/* ensure the new Rx count fits within the configured Rx flow
-- 
2.7.4


* [PATCH bpf-next v2 4/5] ethtool: don't allow disabling queues with umem installed
  2018-10-01 12:51 [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode Magnus Karlsson
                   ` (2 preceding siblings ...)
  2018-10-01 12:51 ` [PATCH bpf-next v2 3/5] ethtool: rename local variable max -> curr Magnus Karlsson
@ 2018-10-01 12:51 ` Magnus Karlsson
  2018-10-01 12:51 ` [PATCH bpf-next v2 5/5] xsk: simplify xdp_clear_umem_at_qid implementation Magnus Karlsson
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Magnus Karlsson @ 2018-10-01 12:51 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jakub.kicinski

From: Jakub Kicinski <jakub.kicinski@netronome.com>

We already check the RSS indirection table does not use queues which
would be disabled by channel reconfiguration. Make sure user does not
try to disable queues which have a UMEM and zero-copy AF_XDP socket
installed.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 include/net/xdp_sock.h |  7 +++++++
 net/core/ethtool.c     | 11 +++++++++++
 net/xdp/xdp_umem.c     |  4 ++--
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 70a115b..13acb98 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -86,6 +86,7 @@ struct xdp_umem_fq_reuse *xsk_reuseq_prepare(u32 nentries);
 struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem,
 					  struct xdp_umem_fq_reuse *newq);
 void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq);
+struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev, u16 queue_id);
 
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
@@ -183,6 +184,12 @@ static inline void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq)
 {
 }
 
+static inline struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
+						     u16 queue_id)
+{
+	return NULL;
+}
+
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
 	return NULL;
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 9a648fb..5a788ad 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -27,6 +27,7 @@
 #include <linux/rtnetlink.h>
 #include <linux/sched/signal.h>
 #include <linux/net.h>
+#include <net/xdp_sock.h>
 
 /*
  * Some useful ethtool_ops methods that're device independent.
@@ -1656,7 +1657,9 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
 						   void __user *useraddr)
 {
 	struct ethtool_channels channels, curr = { .cmd = ETHTOOL_GCHANNELS };
+	u16 from_channel, to_channel;
 	u32 max_rx_in_use = 0;
+	unsigned int i;
 
 	if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels)
 		return -EOPNOTSUPP;
@@ -1680,6 +1683,14 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
 	    (channels.combined_count + channels.rx_count) <= max_rx_in_use)
 	    return -EINVAL;
 
+	/* Disabling channels, query zero-copy AF_XDP sockets */
+	from_channel = channels.combined_count +
+		min(channels.rx_count, channels.tx_count);
+	to_channel = curr.combined_count + max(curr.rx_count, curr.tx_count);
+	for (i = from_channel; i < to_channel; i++)
+		if (xdp_get_umem_from_qid(dev, i))
+			return -EINVAL;
+
 	return dev->ethtool_ops->set_channels(dev, &channels);
 }
 
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 4d6c665..773326f 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -55,8 +55,8 @@ static void xdp_reg_umem_at_qid(struct net_device *dev, struct xdp_umem *umem,
 		dev->_tx[queue_id].umem = umem;
 }
 
-static struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
-					      u16 queue_id)
+struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
+				       u16 queue_id)
 {
 	if (queue_id < dev->real_num_rx_queues)
 		return dev->_rx[queue_id].umem;
-- 
2.7.4


* [PATCH bpf-next v2 5/5] xsk: simplify xdp_clear_umem_at_qid implementation
  2018-10-01 12:51 [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode Magnus Karlsson
                   ` (3 preceding siblings ...)
  2018-10-01 12:51 ` [PATCH bpf-next v2 4/5] ethtool: don't allow disabling queues with umem installed Magnus Karlsson
@ 2018-10-01 12:51 ` Magnus Karlsson
  2018-10-01 20:31 ` [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode Jakub Kicinski
  2018-10-05  7:35 ` Daniel Borkmann
  6 siblings, 0 replies; 10+ messages in thread
From: Magnus Karlsson @ 2018-10-01 12:51 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jakub.kicinski

As we now do not allow ethtool to deactivate the queue id we are
running an AF_XDP socket on, we can simplify the implementation of
xdp_clear_umem_at_qid().

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xdp_umem.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 773326f..c6007c5 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -68,12 +68,9 @@ struct xdp_umem *xdp_get_umem_from_qid(struct net_device *dev,
 
 static void xdp_clear_umem_at_qid(struct net_device *dev, u16 queue_id)
 {
-	/* Zero out the entry independent on how many queues are configured
-	 * at this point in time, as it might be used in the future.
-	 */
-	if (queue_id < dev->num_rx_queues)
+	if (queue_id < dev->real_num_rx_queues)
 		dev->_rx[queue_id].umem = NULL;
-	if (queue_id < dev->num_tx_queues)
+	if (queue_id < dev->real_num_tx_queues)
 		dev->_tx[queue_id].umem = NULL;
 }
 
-- 
2.7.4


* Re: [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode
  2018-10-01 12:51 [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode Magnus Karlsson
                   ` (4 preceding siblings ...)
  2018-10-01 12:51 ` [PATCH bpf-next v2 5/5] xsk: simplify xdp_clear_umem_at_qid implementation Magnus Karlsson
@ 2018-10-01 20:31 ` Jakub Kicinski
  2018-10-02 12:49   ` Magnus Karlsson
  2018-10-05  7:35 ` Daniel Borkmann
  6 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2018-10-01 20:31 UTC (permalink / raw)
  To: Magnus Karlsson; +Cc: bjorn.topel, ast, daniel, netdev, Jesper Dangaard Brouer

On Mon,  1 Oct 2018 14:51:32 +0200, Magnus Karlsson wrote:
> Jakub, please take a look at your patches. The last one I had to
> change slightly to make it fit with the new interface
> xdp_get_umem_from_qid(). An added bonus with this function is that we,
> in the future, can also use it from the driver to get a umem, thus
> simplifying driver implementations (and later remove the umem from the
> NDO completely). Björn will mail patches, at a later point in time,
> using this in the i40e and ixgbe drivers, that removes a good chunk of
> code from the ZC implementations. 

Nice, drivers which don't follow the prepare/commit model of handling
reconfigurations will benefit!

> I also made your code aware of Tx queues. If we create a socket that
> only has a Tx queue, then the queue id will refer to a Tx queue id
> only and could be larger than the available amount of Rx queues.
> Please take a look at it.

The semantics of Tx queue id are slightly unclear.  To me XDP is
associated with Rx, so the qid in driver context can only refer to 
Rx queue and its associated XDP Tx queue.  It does not mean the Tx
queue the stack uses, like it does for copy fallback.  If one doesn't have
a Rx queue $id, there will be no associated XDP Tx queue $id (in all
drivers but Intel, and virtio, which use per-CPU Tx queues making TX
queue even more meaningless).

It's to be seen how others implement AF_XDP.  My general feeling is
that we should only talk about Rx queues in context of driver XDP. 


* Re: [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode
  2018-10-01 20:31 ` [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode Jakub Kicinski
@ 2018-10-02 12:49   ` Magnus Karlsson
  2018-10-02 16:58     ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Magnus Karlsson @ 2018-10-02 12:49 UTC (permalink / raw)
  To: jakub.kicinski
  Cc: Karlsson, Magnus, Björn Töpel, ast, Daniel Borkmann,
	Network Development, Jesper Dangaard Brouer

On Mon, Oct 1, 2018 at 10:34 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Mon,  1 Oct 2018 14:51:32 +0200, Magnus Karlsson wrote:
> > Jakub, please take a look at your patches. The last one I had to
> > change slightly to make it fit with the new interface
> > xdp_get_umem_from_qid(). An added bonus with this function is that we,
> > in the future, can also use it from the driver to get a umem, thus
> > simplifying driver implementations (and later remove the umem from the
> > NDO completely). Björn will mail patches, at a later point in time,
> > using this in the i40e and ixgbe drivers, that removes a good chunk of
> > code from the ZC implementations.
>
> Nice, drivers which don't follow the prepare/commit model of handling
> reconfigurations will benefit!
>
> > I also made your code aware of Tx queues. If we create a socket that
> > only has a Tx queue, then the queue id will refer to a Tx queue id
> > only and could be larger than the available amount of Rx queues.
> > Please take a look at it.
>
> The semantics of Tx queue id are slightly unclear.  To me XDP is
> associated with Rx, so the qid in driver context can only refer to
> Rx queue and its associated XDP Tx queue.  It does not mean the Tx
> queue stack uses, like it does for copy fallback.  If one doesn't have
> a Rx queue $id, there will be no associated XDP Tx queue $id (in all
> drivers but Intel, and virtio, which use per-CPU Tx queues making TX
> queue even more meaningless).
>
> Its to be seen how others implement AF_XDP.  My general feeling is
> that we should only talk about Rx queues in context of driver XDP.

This is the way I see it. From an uapi point of view we can create a
socket that can only do Rx, only Tx or both. We then bind this socket
to a specific queue id on a device. If a packet is received on this
queue id it is sent (by the default xdpsock sample program) to the
socket. If a packet is sent on this socket it goes out on this same
queue id. If you have not registered an Rx ring (in user space) for
this socket, you cannot receive anything on this socket. And
conversely, if you have no Tx ring, you will not be able to send
anything.

But if we take a look at this from the driver perspective and the NDO
XDP_SETUP_XSK_UMEM, today it does not know whether Rx and Tx
rings have been set up in the socket. It will always initialize the HW
Rx and Tx queues of the supplied queue id. So with today's NDO
interface you will always get a Rx/Tx queue pair. In order to realize
the uapi above in an efficient manner and to support devices with more
Tx queues than Rx, we need to change the NDO.

Just as a note, in the applications I am used to working on, radio base
stations and other telecom apps, it is the common case to have many
more Tx queues than Rx queues just to be able to use scheduling,
shaping and other QoS features that are important on egress in those
systems. Hence the interest in supporting Tx-only queues. But
maybe this is just a weird case, I do not know.


* Re: [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode
  2018-10-02 12:49   ` Magnus Karlsson
@ 2018-10-02 16:58     ` Jakub Kicinski
  0 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2018-10-02 16:58 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Karlsson, Magnus, Björn Töpel, ast, Daniel Borkmann,
	Network Development, Jesper Dangaard Brouer

On Tue, 2 Oct 2018 14:49:13 +0200, Magnus Karlsson wrote:
> On Mon, Oct 1, 2018 at 10:34 PM Jakub Kicinski wrote:
> > On Mon,  1 Oct 2018 14:51:32 +0200, Magnus Karlsson wrote:  
> > > Jakub, please take a look at your patches. The last one I had to
> > > change slightly to make it fit with the new interface
> > > xdp_get_umem_from_qid(). An added bonus with this function is that we,
> > > in the future, can also use it from the driver to get a umem, thus
> > > simplifying driver implementations (and later remove the umem from the
> > > NDO completely). Björn will mail patches, at a later point in time,
> > > using this in the i40e and ixgbe drivers, that removes a good chunk of
> > > code from the ZC implementations.  
> >
> > Nice, drivers which don't follow the prepare/commit model of handling
> > reconfigurations will benefit!
> >  
> > > I also made your code aware of Tx queues. If we create a socket that
> > > only has a Tx queue, then the queue id will refer to a Tx queue id
> > > only and could be larger than the available amount of Rx queues.
> > > Please take a look at it.  
> >
> > The semantics of Tx queue id are slightly unclear.  To me XDP is
> > associated with Rx, so the qid in driver context can only refer to
> > Rx queue and its associated XDP Tx queue.  It does not mean the Tx
> > queue stack uses, like it does for copy fallback.  If one doesn't have
> > a Rx queue $id, there will be no associated XDP Tx queue $id (in all
> > drivers but Intel, and virtio, which use per-CPU Tx queues making TX
> > queue even more meaningless).
> >
> > Its to be seen how others implement AF_XDP.  My general feeling is
> > that we should only talk about Rx queues in context of driver XDP.  
> 
> This is the way I see it. From an uapi point of view we can create a
> socket that can only do Rx, only Tx or both. We then bind this socket
> to a specific queue id on a device. If a packet is received on this
> queue id it is sent (by the default xdpsock sample program) to the
> socket. If a packet is sent on this socket it goes out on this same
> queue id. If you have not registered an Rx ring (in user space) for
> this socket, you cannot receive anything on this socket. And
> conversely, if you have no Tx ring, you will not be able to send
> anything.
> 
> But if we take a look at this from the driver perspective and the NDO
> XDP_SETUP_XSK_UMEM, today it does not know anything about if Rx and Tx
> rings have been setup in the socket. It will always initialize the HW
> Rx and Tx queues of the supplied queue id. So with today's NDO
> interface you will always get a Rx/Tx queue pair. In order to realize
> the uapi above in an efficient manner and to support devices with more
> Tx queues than Rx, we need to change the NDO.
> 
> Just as a note, in the applications I am used to work on, radio base
> stations and other telecom apps, it is the common case to have many
> more Tx queues than Rx queues just to be able to use scheduling,
> shaping and other QoS features that are important on egress in those
> systems. Therefore the interest in supporting Tx only queues. But
> maybe this is just a weird case, do not know.

It's a good case, it should be supported.  I'm just wondering whether
the API we have today is going to be the right one.  So for i40e you
actually allocate TX ring per RX ring?  In ixgbe IIUC there is an XDP
TX ring per core so regardless how many TX queues one requests there
will actually be num_cpu_ids XDP TX queues... so even the check against
RX isn't meaningful there.  Hm..  Okay, I think what you've done is the
safest bet, we can always relax the check later on.

LGTM, sorry for the noise! :)


* Re: [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode
  2018-10-01 12:51 [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode Magnus Karlsson
                   ` (5 preceding siblings ...)
  2018-10-01 20:31 ` [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode Jakub Kicinski
@ 2018-10-05  7:35 ` Daniel Borkmann
  6 siblings, 0 replies; 10+ messages in thread
From: Daniel Borkmann @ 2018-10-05  7:35 UTC (permalink / raw)
  To: Magnus Karlsson, bjorn.topel, ast, netdev, jakub.kicinski

On 10/01/2018 02:51 PM, Magnus Karlsson wrote:
> Previously, the xsk code did not record which umem was bound to a
> specific queue id. This was not required if all drivers were zero-copy
> enabled as this had to be recorded in the driver anyway. So if a user
> tried to bind two umems to the same queue, the driver would say
> no. But if copy-mode was first enabled and then zero-copy mode (or the
> reverse order), we mistakenly enabled both of them on the same umem
> leading to buggy behavior. The main culprit for this is that we did
> not store the association of umem to queue id in the copy case and
> only relied on the driver reporting this. As this relation was not
> stored in the driver for copy mode (it does not rely on the AF_XDP
> NDOs), this obviously could not work.
> 
> This patch fixes the problem by always recording the umem to queue id
> relationship in the netdev_queue and netdev_rx_queue structs. This way
> we always know what kind of umem has been bound to a queue id and can
> act appropriately at bind time. To make the bind semantics consistent
> with ethtool queue manipulations and to facilitate the implementation
> of drivers, we also forbid decreasing the number of queues/channels
> with ethtool if there is an active AF_XDP socket in the set of queues
> that are disabled.
> 
> Jakub, please take a look at your patches. The last one I had to
> change slightly to make it fit with the new interface
> xdp_get_umem_from_qid(). An added bonus with this function is that we,
> in the future, can also use it from the driver to get a umem, thus
> simplifying driver implementations (and later remove the umem from the
> NDO completely). Björn will mail patches, at a later point in time,
> using this in the i40e and ixgbe drivers, that remove a good chunk of
> code from the ZC implementations. I also made your code aware of Tx
> queues. If we create a socket that only has a Tx queue, then the queue
> id will refer to a Tx queue id only and could be larger than the
> number of available Rx queues. Please take a look at it.
> 
> Differences against v1:
> * Included patches from Jakub that forbid decreasing the number of active
>   queues if a queue to be deactivated has an AF_XDP socket. These have
>   been adapted somewhat to the new interfaces in patch 2.
> * Removed redundant check against real_num_[rt]x_queue in xsk_bind
> * Only need to test against real_num_[rt]x_queues in
>   xdp_clear_umem_at_qid.
> 
> Patch 1: Introduces a umem reference in the netdev_rx_queue and
>          netdev_queue structs.
> Patch 2: Records which queue_id is bound to which umem and makes sure
>          that you cannot bind two different umems to the same queue_id.
> Patch 3: Pre patch to ethtool_set_channels.
> Patch 4: Forbid decreasing the number of active queues if a deactivated
>          queue has an AF_XDP socket.
> Patch 5: Simplify xdp_clear_umem_at_qid now that ethtool cannot deactivate
>          the queue id we are running on.
> 
> I based this patch set on bpf-next commit 5bf7a60b8e70 ("bpf: permit
> CGROUP_DEVICE programs accessing helper bpf_get_current_cgroup_id()")
> 
> Thanks: Magnus
> 
> Jakub Kicinski (2):
>   ethtool: rename local variable max -> curr
>   ethtool: don't allow disabling queues with umem installed
> 
> Magnus Karlsson (3):
>   net: add umem reference in netdev{_rx}_queue
>   xsk: fix bug when trying to use both copy and zero-copy on one queue
>     id
>   xsk: simplify xdp_clear_umem_at_qid implementation
> 
>  include/linux/netdevice.h |  6 ++++
>  include/net/xdp_sock.h    |  7 ++++
>  net/core/ethtool.c        | 23 +++++++++----
>  net/xdp/xdp_umem.c        | 87 ++++++++++++++++++++++++++++++++---------------
>  net/xdp/xdp_umem.h        |  2 +-
>  net/xdp/xsk.c             |  7 ----
>  6 files changed, 91 insertions(+), 41 deletions(-)
> 
> --
> 2.7.4
> 

Applied to bpf-next, thanks everyone!
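
For readers skimming the thread, the two rules described in the quoted
cover letter (record the umem per queue id at bind time, and refuse to
shrink the channel count under an active AF_XDP socket) can be sketched
in plain C as follows. This is a simplified userspace model, not the
applied patches: struct dev_sketch, struct umem_sketch and all
sketch_*() functions are hypothetical names, the fixed-size arrays
stand in for the umem references added to netdev_queue and
netdev_rx_queue, and the error codes are assumptions chosen for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_QUEUES 8

/* Hypothetical stand-in for struct xdp_umem. */
struct umem_sketch {
	int id;
};

/* Hypothetical stand-in for a net_device: one umem slot per queue. */
struct dev_sketch {
	unsigned int num_rx;	/* active Rx queues, <= SKETCH_MAX_QUEUES */
	unsigned int num_tx;	/* active Tx queues, <= SKETCH_MAX_QUEUES */
	struct umem_sketch *rx_umem[SKETCH_MAX_QUEUES];
	struct umem_sketch *tx_umem[SKETCH_MAX_QUEUES];
};

/* Look up the umem recorded for a queue id, or NULL if none is bound. */
static struct umem_sketch *sketch_get_umem(const struct dev_sketch *dev,
					   unsigned int qid)
{
	if (qid < dev->num_rx && dev->rx_umem[qid])
		return dev->rx_umem[qid];
	if (qid < dev->num_tx && dev->tx_umem[qid])
		return dev->tx_umem[qid];
	return NULL;
}

/* Record the binding, in copy and zero-copy mode alike. */
static int sketch_bind_umem(struct dev_sketch *dev, struct umem_sketch *umem,
			    unsigned int qid)
{
	struct umem_sketch *cur;

	if (qid >= dev->num_rx && qid >= dev->num_tx)
		return -EINVAL;	/* not an active queue in either direction */
	cur = sketch_get_umem(dev, qid);
	if (cur && cur != umem)
		return -EBUSY;	/* a different umem already owns this qid */
	if (qid < dev->num_rx)
		dev->rx_umem[qid] = umem;
	if (qid < dev->num_tx)
		dev->tx_umem[qid] = umem;
	return 0;
}

/* Refuse a new channel count that would disable a queue with a umem. */
static int sketch_set_channels(struct dev_sketch *dev, unsigned int new_rx,
			       unsigned int new_tx)
{
	unsigned int i;

	if (new_rx > SKETCH_MAX_QUEUES || new_tx > SKETCH_MAX_QUEUES)
		return -EINVAL;
	for (i = new_rx; i < dev->num_rx; i++)
		if (dev->rx_umem[i])
			return -EINVAL;
	for (i = new_tx; i < dev->num_tx; i++)
		if (dev->tx_umem[i])
			return -EINVAL;
	dev->num_rx = new_rx;
	dev->num_tx = new_tx;
	return 0;
}

/* Usage example: 2 Rx queues, 4 Tx queues, socket bound on Tx-only qid 3. */
static void sketch_umem_selfcheck(void)
{
	struct dev_sketch dev = { .num_rx = 2, .num_tx = 4 };
	struct umem_sketch a = { 1 }, b = { 2 };

	assert(sketch_bind_umem(&dev, &a, 3) == 0);
	assert(sketch_bind_umem(&dev, &b, 3) == -EBUSY);
	assert(sketch_get_umem(&dev, 3) == &a);
	assert(sketch_set_channels(&dev, 2, 3) == -EINVAL);
	assert(sketch_set_channels(&dev, 1, 4) == 0);
}
```

Because the binding is recorded in one place regardless of mode, the
second bind on queue id 3 is rejected without consulting the driver,
which is the gap the series describes for the copy-mode case.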


end of thread, other threads:[~2018-10-05 14:33 UTC | newest]

Thread overview: 10+ messages
2018-10-01 12:51 [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode Magnus Karlsson
2018-10-01 12:51 ` [PATCH bpf-next v2 1/5] net: add umem reference in netdev{_rx}_queue Magnus Karlsson
2018-10-01 12:51 ` [PATCH bpf-next v2 2/5] xsk: fix bug when trying to use both copy and zero-copy on one queue id Magnus Karlsson
2018-10-01 12:51 ` [PATCH bpf-next v2 3/5] ethtool: rename local variable max -> curr Magnus Karlsson
2018-10-01 12:51 ` [PATCH bpf-next v2 4/5] ethtool: don't allow disabling queues with umem installed Magnus Karlsson
2018-10-01 12:51 ` [PATCH bpf-next v2 5/5] xsk: simplify xdp_clear_umem_at_qid implementation Magnus Karlsson
2018-10-01 20:31 ` [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode Jakub Kicinski
2018-10-02 12:49   ` Magnus Karlsson
2018-10-02 16:58     ` Jakub Kicinski
2018-10-05  7:35 ` Daniel Borkmann
