* [RFC PATCH 0/7] tc cls_u32 hardware interface
@ 2016-02-01  1:48 John Fastabend
  2016-02-01  1:49 ` [RFC PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: John Fastabend @ 2016-02-01  1:48 UTC (permalink / raw)
  To: anjali.singhai, jesse.brandeburg, jhs
  Cc: ast, donald.c.skidmore, horms, netdev, tgraf, davem

I was waiting for net-next to open before submitting this, but it seems
like a good idea to get an RFC out there for folks to start looking over.

This extends the setup_tc framework so it can support more than just
the mqprio offload and push other classifiers and qdiscs into the
hardware. The series here targets the u32 classifier and the ixgbe
driver. I chose the u32 classifier because it is protocol
oblivious and aligns with multiple hardware devices I have access
to. I did an initial implementation on ixgbe because (a) I have one
in my box, (b) it's a stable driver, and (c) it is relatively simple
compared to the other devices I have here but still has enough
flexibility to exercise the features of cls_u32.
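To illustrate what "protocol oblivious" means here: a u32 key is just
an offset, a value, and a mask over a 32-bit packet word, so matching
needs no knowledge of the protocol being parsed. The following is a
simplified userspace sketch, not the cls_u32 code; struct
u32_key_sketch, read_be32() and u32_key_match() are hypothetical
names, and real u32 filters also support hash tables and variable
header offsets, which are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified u32 key: compare the 32-bit big-endian word
 * at byte offset 'off' against 'val' under 'mask'. */
struct u32_key_sketch {
	uint32_t val;
	uint32_t mask;
	size_t off;
};

/* Read a 32-bit big-endian (network order) word from pkt at off. */
static uint32_t read_be32(const uint8_t *pkt, size_t off)
{
	return ((uint32_t)pkt[off] << 24) | ((uint32_t)pkt[off + 1] << 16) |
	       ((uint32_t)pkt[off + 2] << 8) | (uint32_t)pkt[off + 3];
}

/* True if the key matches. No protocol knowledge is involved: the key
 * is only (offset, value, mask). Out-of-range offsets never match. */
static bool u32_key_match(const struct u32_key_sketch *k,
			  const uint8_t *pkt, size_t len)
{
	if (k->off + 4 > len)
		return false;
	return (read_be32(pkt, k->off) & k->mask) == (k->val & k->mask);
}
```

With an IPv4 header in pkt, off = 8, mask = 0x00FF0000 and
val = 0x00060000 select the protocol byte and match TCP, which is
roughly how a "match ip protocol 6 0xff" selector is encoded.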

I intentionally limited the scope of this series to the basic
feature set. Specifically, this uses a 'big hammer' feature bit
to decide whether to offload: if the bit is set, rules are
offloaded; if it is not, they are not. If we can agree on this
patch series, there are more patches in my queue we can talk about
that make the offload decision per rule using flags, similar to how
we do L2 MAC updates. Additionally, the error strategy can be
improved (hard abort, log and continue, etc.). I think these are
nice-to-have improvements but shouldn't block this series. I am
also working on similar support for the other Intel devices now,
namely i40e and fm10k.
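A minimal sketch of the 'big hammer' policy, assuming a single
device-wide feature bit. SKETCH_F_HW_TC, sketch_tc_should_offload(),
sketch_add_rule() and sketch_hw_add_rule() are hypothetical stand-ins
(the real flag is added in patch 4/7 and is not reproduced here), and
the software classifier is assumed to be updated regardless of the bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device-wide offload bit. */
#define SKETCH_F_HW_TC (1ULL << 0)

typedef uint64_t sketch_features_t;

/* Counts rules "pushed to hardware" so the sketch is observable. */
static int offloaded_rules;

/* Hypothetical stand-in for calling the driver's ndo_setup_tc(). */
static int sketch_hw_add_rule(void)
{
	offloaded_rules++;
	return 0;
}

/* All-or-nothing policy: the decision depends only on the device
 * feature bit, never on the individual rule. */
static bool sketch_tc_should_offload(sketch_features_t features)
{
	return (features & SKETCH_F_HW_TC) != 0;
}

static int sketch_add_rule(sketch_features_t features)
{
	/* The software classifier update is not modeled; only the
	 * hardware copy is gated by the feature bit. */
	if (!sketch_tc_should_offload(features))
		return 0;
	return sketch_hw_add_rule();
}
```

The per-rule flags mentioned above would replace the single feature
test with a check of flags carried on each rule.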

Also in the future-work bin: by adding get_parse_graph and
set_parse_graph attributes, as in my previous flow_api work, we
can build programmable devices and programmatically learn when
rules can or cannot be loaded into the hardware.

Note this series is based on a slightly stale net-next tree. I
think it should apply to the latest, but I haven't updated it in a
while. I'll do that soon; I was sort of waiting for net-next to
open before doing so.

Any comments/feedback appreciated.

Thanks,
John

---

John Fastabend (7):
      net: rework ndo tc op to consume additional qdisc handle parameter
      net: rework setup_tc ndo op to consume general tc operand
      net: sched: add cls_u32 offload hooks for netdevs
      net: add tc offload feature flag
      net: tc: helper functions to query action types
      net: ixgbe: add minimal parser details for ixgbe
      net: ixgbe: add support for tc_u32 offload


 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |    8 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  |    2 
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 
 drivers/net/ethernet/broadcom/bnxt/bnxt.c        |    9 +
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |   11 +
 drivers/net/ethernet/intel/i40e/i40e_main.c      |   10 +
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    3 
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    6 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  198 ++++++++++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h   |  110 ++++++++++++
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |   13 +
 drivers/net/ethernet/sfc/efx.h                   |    3 
 drivers/net/ethernet/sfc/tx.c                    |   10 +
 drivers/net/ethernet/ti/netcp_core.c             |   14 +-
 include/linux/netdev_features.h                  |    3 
 include/linux/netdevice.h                        |   24 +++
 include/net/pkt_cls.h                            |   33 ++++
 include/net/tc_act/tc_gact.h                     |   14 ++
 net/core/ethtool.c                               |    1 
 net/sched/cls_u32.c                              |   73 ++++++++
 net/sched/sch_mqprio.c                           |    8 +
 21 files changed, 529 insertions(+), 26 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h

* [RFC PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter
  2016-02-01  1:48 [RFC PATCH 0/7] tc cls_u32 hardware interface John Fastabend
@ 2016-02-01  1:49 ` John Fastabend
  2016-02-01  1:49 ` [RFC PATCH 2/7] net: rework setup_tc ndo op to consume general tc operand John Fastabend
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2016-02-01  1:49 UTC (permalink / raw)
  To: anjali.singhai, jesse.brandeburg, jhs
  Cc: ast, donald.c.skidmore, horms, netdev, tgraf, davem

The ndo_setup_tc() op was added to support drivers offloading Tx
qdiscs; however, only mqprio support was ever added, so the op
only ever passed the number of traffic classes to the driver.

This patch generalizes the ndo_setup_tc op so that a handle can
be provided to indicate whether the offload is for ingress, egress,
or potentially even child qdiscs.
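As a rough illustration of the driver side, here is a userspace sketch
of the wrapper pattern this patch applies to each driver: the new
handle is checked first, and anything other than TC_H_ROOT is refused
before the existing traffic class setup runs. TC_H_ROOT is copied from
include/uapi/linux/pkt_sched.h; sketch_legacy_setup_tc(), its limit of
8 traffic classes, and sketch_setup_tc() are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Value as defined in include/uapi/linux/pkt_sched.h. */
#define TC_H_ROOT (0xFFFFFFFFU)

/* Hypothetical stand-in for a driver's pre-existing setup routine,
 * which only understands a traffic class count. */
static int sketch_legacy_setup_tc(uint8_t num_tc)
{
	return num_tc <= 8 ? 0 : -EINVAL;
}

/* Wrapper in the style of __ixgbe_setup_tc() from this patch: refuse
 * any handle the driver cannot offload, then defer to the old code. */
static int sketch_setup_tc(uint32_t handle, uint8_t num_tc)
{
	if (handle != TC_H_ROOT)
		return -EINVAL;	/* only root (egress) setup for now */
	return sketch_legacy_setup_tc(num_tc);
}
```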

CC: Murali Karicheri <m-karicheri2@ti.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: Or Gerlitz <ogerlitz@mellanox.com>
CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Bruce Allan <bruce.w.allan@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |    7 +++++++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  |    1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c        |    5 ++++-
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |   10 +++++++++-
 drivers/net/ethernet/intel/i40e/i40e_main.c      |    9 ++++++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |   11 ++++++++++-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |   12 ++++++++++--
 drivers/net/ethernet/sfc/efx.h                   |    2 +-
 drivers/net/ethernet/sfc/tx.c                    |    5 ++++-
 drivers/net/ethernet/ti/netcp_core.c             |    5 ++++-
 include/linux/netdevice.h                        |    2 +-
 net/sched/sch_mqprio.c                           |    5 +++--
 13 files changed, 63 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index b3552dd..2deb47e8 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4268,6 +4268,13 @@ int bnx2x_setup_tc(struct net_device *dev, u8 num_tc)
 	return 0;
 }
 
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
+{
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+	return bnx2x_setup_tc(dev, num_tc);
+}
+
 /* called with rtnl_lock */
 int bnx2x_change_mac_addr(struct net_device *dev, void *p)
 {
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 4cbb03f8..e92d6e7 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -486,6 +486,7 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev);
 
 /* setup_tc callback */
 int bnx2x_setup_tc(struct net_device *dev, u8 num_tc);
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc);
 
 int bnx2x_get_vf_config(struct net_device *dev, int vf,
 			struct ifla_vf_info *ivi);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 6c4e3a6..b17bb17 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12994,7 +12994,7 @@ static const struct net_device_ops bnx2x_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= poll_bnx2x,
 #endif
-	.ndo_setup_tc		= bnx2x_setup_tc,
+	.ndo_setup_tc		= __bnx2x_setup_tc,
 #ifdef CONFIG_BNX2X_SRIOV
 	.ndo_set_vf_mac		= bnx2x_set_vf_mac,
 	.ndo_set_vf_vlan	= bnx2x_set_vf_vlan,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 11446ad..ae9a3c2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5260,10 +5260,13 @@ static int bnxt_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
-static int bnxt_setup_tc(struct net_device *dev, u8 tc)
+static int bnxt_setup_tc(struct net_device *dev, u32 handle, u8 tc)
 {
 	struct bnxt *bp = netdev_priv(dev);
 
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
 	if (tc > bp->max_tc) {
 		netdev_err(dev, "too many traffic classes requested: %d Max supported is %d\n",
 			   tc, bp->max_tc);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 83ddf36..a75e8db 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1190,6 +1190,14 @@ int fm10k_setup_tc(struct net_device *dev, u8 tc)
 	return 0;
 }
 
+static int __fm10k_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+{
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
+	return fm10k_setup_tc(dev, tc);
+}
+
 static int fm10k_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
 {
 	switch (cmd) {
@@ -1372,7 +1380,7 @@ static const struct net_device_ops fm10k_netdev_ops = {
 	.ndo_vlan_rx_kill_vid	= fm10k_vlan_rx_kill_vid,
 	.ndo_set_rx_mode	= fm10k_set_rx_mode,
 	.ndo_get_stats64	= fm10k_get_stats64,
-	.ndo_setup_tc		= fm10k_setup_tc,
+	.ndo_setup_tc		= __fm10k_setup_tc,
 	.ndo_set_vf_mac		= fm10k_ndo_set_vf_mac,
 	.ndo_set_vf_vlan	= fm10k_ndo_set_vf_vlan,
 	.ndo_set_vf_rate	= fm10k_ndo_set_vf_bw,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 23211e0..20e550b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5298,6 +5298,13 @@ exit:
 	return ret;
 }
 
+static int __i40e_setup_tc(struct net_device *netdev, u32 handle, u8 tc)
+{
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+	return i40e_setup_tc(netdev, tc);
+}
+
 /**
  * i40e_open - Called when a network interface is made active
  * @netdev: network interface device structure
@@ -8887,7 +8894,7 @@ static const struct net_device_ops i40e_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= i40e_netpoll,
 #endif
-	.ndo_setup_tc		= i40e_setup_tc,
+	.ndo_setup_tc		= __i40e_setup_tc,
 #ifdef I40E_FCOE
 	.ndo_fcoe_enable	= i40e_fcoe_enable,
 	.ndo_fcoe_disable	= i40e_fcoe_disable,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index fca35aa..a05f3b7 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8167,6 +8167,15 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	return 0;
 }
 
+int __ixgbe_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+{
+	/* Only support egress tc setup for now */
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
+	return ixgbe_setup_tc(dev, tc);
+}
+
 #ifdef CONFIG_PCI_IOV
 void ixgbe_sriov_reinit(struct ixgbe_adapter *adapter)
 {
@@ -8625,7 +8634,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_get_vf_config	= ixgbe_ndo_get_vf_config,
 	.ndo_get_stats64	= ixgbe_get_stats64,
 #ifdef CONFIG_IXGBE_DCB
-	.ndo_setup_tc		= ixgbe_setup_tc,
+	.ndo_setup_tc		= __ixgbe_setup_tc,
 #endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= ixgbe_netpoll,
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 659209f..9b5c9f4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -69,6 +69,14 @@ int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 	return 0;
 }
 
+static int __mlx4_en_setup_tc(struct net_device *dev, u32 handle, u8 up)
+{
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
+	return mlx4_en_setup_tc(dev, up);
+}
+
 #ifdef CONFIG_RFS_ACCEL
 
 struct mlx4_en_filter {
@@ -2463,7 +2471,7 @@ static const struct net_device_ops mlx4_netdev_ops = {
 #endif
 	.ndo_set_features	= mlx4_en_set_features,
 	.ndo_fix_features	= mlx4_en_fix_features,
-	.ndo_setup_tc		= mlx4_en_setup_tc,
+	.ndo_setup_tc		= __mlx4_en_setup_tc,
 #ifdef CONFIG_RFS_ACCEL
 	.ndo_rx_flow_steer	= mlx4_en_filter_rfs,
 #endif
@@ -2501,7 +2509,7 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
 #endif
 	.ndo_set_features	= mlx4_en_set_features,
 	.ndo_fix_features	= mlx4_en_fix_features,
-	.ndo_setup_tc		= mlx4_en_setup_tc,
+	.ndo_setup_tc		= __mlx4_en_setup_tc,
 #ifdef CONFIG_RFS_ACCEL
 	.ndo_rx_flow_steer	= mlx4_en_filter_rfs,
 #endif
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 1aaf76c..2444089 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -32,7 +32,7 @@ netdev_tx_t efx_hard_start_xmit(struct sk_buff *skb,
 				struct net_device *net_dev);
 netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb);
 void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index);
-int efx_setup_tc(struct net_device *net_dev, u8 num_tc);
+int efx_setup_tc(struct net_device *net_dev, u32 handle, u8 num_tc);
 unsigned int efx_tx_max_skb_descs(struct efx_nic *efx);
 extern unsigned int efx_piobuf_size;
 extern bool efx_separate_tx_channels;
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index f7a0ec1..8f1d53e 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -562,7 +562,7 @@ void efx_init_tx_queue_core_txq(struct efx_tx_queue *tx_queue)
 				     efx->n_tx_channels : 0));
 }
 
-int efx_setup_tc(struct net_device *net_dev, u8 num_tc)
+int efx_setup_tc(struct net_device *net_dev, u32 handle, u8 num_tc)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
 	struct efx_channel *channel;
@@ -570,6 +570,9 @@ int efx_setup_tc(struct net_device *net_dev, u8 num_tc)
 	unsigned tc;
 	int rc;
 
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
 	if (efx_nic_rev(efx) < EFX_REV_FALCON_B0 || num_tc > EFX_MAX_TX_TC)
 		return -EINVAL;
 
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index 92d08eb..e1b9606 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -1831,13 +1831,16 @@ static u16 netcp_select_queue(struct net_device *dev, struct sk_buff *skb,
 	return 0;
 }
 
-static int netcp_setup_tc(struct net_device *dev, u8 num_tc)
+static int netcp_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
 {
 	int i;
 
 	/* setup tc must be called under rtnl lock */
 	ASSERT_RTNL();
 
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
 	/* Sanity-check the number of traffic classes requested */
 	if ((dev->real_num_tx_queues <= 1) ||
 	    (dev->real_num_tx_queues < num_tc))
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 81b26a5..afbce40 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1148,7 +1148,7 @@ struct net_device_ops {
 	int			(*ndo_set_vf_rss_query_en)(
 						   struct net_device *dev,
 						   int vf, bool setting);
-	int			(*ndo_setup_tc)(struct net_device *dev, u8 tc);
+	int			(*ndo_setup_tc)(struct net_device *dev, u32 handle, u8 tc);
 #if IS_ENABLED(CONFIG_FCOE)
 	int			(*ndo_fcoe_enable)(struct net_device *dev);
 	int			(*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 6bc2617..4cec88a 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -39,7 +39,7 @@ static void mqprio_destroy(struct Qdisc *sch)
 	}
 
 	if (priv->hw_owned && dev->netdev_ops->ndo_setup_tc)
-		dev->netdev_ops->ndo_setup_tc(dev, 0);
+		dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0);
 	else
 		netdev_set_num_tc(dev, 0);
 }
@@ -141,7 +141,8 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	 */
 	if (qopt->hw) {
 		priv->hw_owned = 1;
-		err = dev->netdev_ops->ndo_setup_tc(dev, qopt->num_tc);
+		err = dev->netdev_ops->ndo_setup_tc(dev, sch->handle,
+						    qopt->num_tc);
 		if (err)
 			goto err;
 	} else {

* [RFC PATCH 2/7] net: rework setup_tc ndo op to consume general tc operand
  2016-02-01  1:48 [RFC PATCH 0/7] tc cls_u32 hardware interface John Fastabend
  2016-02-01  1:49 ` [RFC PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
@ 2016-02-01  1:49 ` John Fastabend
  2016-02-01  1:50 ` [RFC PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2016-02-01  1:49 UTC (permalink / raw)
  To: anjali.singhai, jesse.brandeburg, jhs
  Cc: ast, donald.c.skidmore, horms, netdev, tgraf, davem

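This patch updates setup_tc so we can pass additional parameters into
the ndo op in a generic way. To do this, we provide a structure holding
a type field and a union of per-type attributes, and we also pass the
(__be16) protocol the offload applies to.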

This lets each classifier and qdisc provide its own set of attributes
without having to add new ndo ops or grow the signature of the
callback.
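A userspace sketch of the tagged-union pattern, assuming a driver that
only understands mqprio. The struct mirrors the shape of struct
tc_to_netdev in this patch, with __be16 replaced by uint16_t;
TC_SETUP_SKETCH_OTHER, the sketch_other member and
sketch_ndo_setup_tc() are hypothetical. The important detail is that
the handler checks type before reading the union.

```c
#include <errno.h>
#include <stdint.h>

enum {
	TC_SETUP_MQPRIO,
	TC_SETUP_SKETCH_OTHER,	/* hypothetical future offload type */
};

/* Same shape as struct tc_to_netdev: a type tag plus a union. */
struct tc_to_netdev {
	unsigned int type;
	union {
		uint8_t tc;		/* valid when type == TC_SETUP_MQPRIO */
		uint32_t sketch_other;	/* hypothetical payload */
	};
};

/* Records what the "hardware" was configured with. */
static uint8_t configured_tc;

/* Driver-side handler: dispatch on the tag, reject unknown types. */
static int sketch_ndo_setup_tc(uint32_t handle, uint16_t proto,
			       const struct tc_to_netdev *tc)
{
	(void)handle;
	(void)proto;
	if (tc->type != TC_SETUP_MQPRIO)
		return -EINVAL;	/* unknown operand: refuse, do not guess */
	configured_tc = tc->tc;
	return 0;
}
```

Adding a new offload then means a new enum value and a new union
member; drivers that reject unknown types keep working unchanged.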

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |    7 ++++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |    3 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       |    8 ++++++--
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |    7 ++++---
 drivers/net/ethernet/intel/i40e/i40e_main.c     |    7 ++++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |    7 ++++---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |    7 ++++---
 drivers/net/ethernet/sfc/efx.h                  |    3 ++-
 drivers/net/ethernet/sfc/tx.c                   |    9 ++++++---
 drivers/net/ethernet/ti/netcp_core.c            |   13 +++++++------
 include/linux/netdevice.h                       |   20 +++++++++++++++++++-
 net/sched/sch_mqprio.c                          |    9 ++++++---
 12 files changed, 68 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 2deb47e8..c44b267 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4268,11 +4268,12 @@ int bnx2x_setup_tc(struct net_device *dev, u8 num_tc)
 	return 0;
 }
 
-int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+		     struct tc_to_netdev *tc)
 {
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
-	return bnx2x_setup_tc(dev, num_tc);
+	return bnx2x_setup_tc(dev, tc->tc);
 }
 
 /* called with rtnl_lock */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index e92d6e7..ef2c776 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -486,7 +486,8 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev);
 
 /* setup_tc callback */
 int bnx2x_setup_tc(struct net_device *dev, u8 num_tc);
-int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc);
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+		     struct tc_to_netdev *tc);
 
 int bnx2x_get_vf_config(struct net_device *dev, int vf,
 			struct ifla_vf_info *ivi);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index ae9a3c2..e99ec0d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5260,13 +5260,17 @@ static int bnxt_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
-static int bnxt_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			 struct tc_to_netdev *ntc)
 {
 	struct bnxt *bp = netdev_priv(dev);
+	u8 tc;
 
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || ntc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
+	tc = ntc->tc;
+
 	if (tc > bp->max_tc) {
 		netdev_err(dev, "too many traffic classes requested: %d Max supported is %d\n",
 			   tc, bp->max_tc);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index a75e8db..6db73b1 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1190,12 +1190,13 @@ int fm10k_setup_tc(struct net_device *dev, u8 tc)
 	return 0;
 }
 
-static int __fm10k_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+static int __fm10k_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			    struct tc_to_netdev *tc)
 {
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
-	return fm10k_setup_tc(dev, tc);
+	return fm10k_setup_tc(dev, tc->tc);
 }
 
 static int fm10k_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 20e550b..abc966c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5298,11 +5298,12 @@ exit:
 	return ret;
 }
 
-static int __i40e_setup_tc(struct net_device *netdev, u32 handle, u8 tc)
+static int __i40e_setup_tc(struct net_device *netdev, u32 handle, __be16 proto,
+			   struct tc_to_netdev *tc)
 {
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
-	return i40e_setup_tc(netdev, tc);
+	return i40e_setup_tc(netdev, tc->tc);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index a05f3b7..38a160b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8167,13 +8167,14 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	return 0;
 }
 
-int __ixgbe_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+int __ixgbe_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+		     struct tc_to_netdev *tc)
 {
 	/* Only support egress tc setup for now */
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
-	return ixgbe_setup_tc(dev, tc);
+	return ixgbe_setup_tc(dev, tc->tc);
 }
 
 #ifdef CONFIG_PCI_IOV
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 9b5c9f4..ee3b43f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -69,12 +69,13 @@ int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 	return 0;
 }
 
-static int __mlx4_en_setup_tc(struct net_device *dev, u32 handle, u8 up)
+static int __mlx4_en_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			      struct tc_to_netdev *tc)
 {
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
-	return mlx4_en_setup_tc(dev, up);
+	return mlx4_en_setup_tc(dev, tc->tc);
 }
 
 #ifdef CONFIG_RFS_ACCEL
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 2444089..a73890f 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -32,7 +32,8 @@ netdev_tx_t efx_hard_start_xmit(struct sk_buff *skb,
 				struct net_device *net_dev);
 netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb);
 void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index);
-int efx_setup_tc(struct net_device *net_dev, u32 handle, u8 num_tc);
+int efx_setup_tc(struct net_device *net_dev, u32 handle, __be16 proto,
+		 struct tc_to_netdev *tc);
 unsigned int efx_tx_max_skb_descs(struct efx_nic *efx);
 extern unsigned int efx_piobuf_size;
 extern bool efx_separate_tx_channels;
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index 8f1d53e..2cdb571 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -562,17 +562,20 @@ void efx_init_tx_queue_core_txq(struct efx_tx_queue *tx_queue)
 				     efx->n_tx_channels : 0));
 }
 
-int efx_setup_tc(struct net_device *net_dev, u32 handle, u8 num_tc)
+int efx_setup_tc(struct net_device *net_dev, u32 handle, __be16 proto,
+		 struct tc_to_netdev *ntc)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
 	struct efx_channel *channel;
 	struct efx_tx_queue *tx_queue;
-	unsigned tc;
+	unsigned tc, num_tc;
 	int rc;
 
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || ntc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
+	num_tc = ntc->tc;
+
 	if (efx_nic_rev(efx) < EFX_REV_FALCON_B0 || num_tc > EFX_MAX_TX_TC)
 		return -EINVAL;
 
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index e1b9606..b51800f 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -1831,25 +1831,26 @@ static u16 netcp_select_queue(struct net_device *dev, struct sk_buff *skb,
 	return 0;
 }
 
-static int netcp_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
+static int netcp_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			  struct tc_to_netdev *tc)
 {
 	int i;
 
 	/* setup tc must be called under rtnl lock */
 	ASSERT_RTNL();
 
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
 	/* Sanity-check the number of traffic classes requested */
 	if ((dev->real_num_tx_queues <= 1) ||
-	    (dev->real_num_tx_queues < num_tc))
+	    (dev->real_num_tx_queues < tc->tc))
 		return -EINVAL;
 
 	/* Configure traffic class to queue mappings */
-	if (num_tc) {
-		netdev_set_num_tc(dev, num_tc);
-		for (i = 0; i < num_tc; i++)
+	if (tc->tc) {
+		netdev_set_num_tc(dev, tc->tc);
+		for (i = 0; i < tc->tc; i++)
 			netdev_set_tc_queue(dev, i, 1, i);
 	} else {
 		netdev_reset_tc(dev);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index afbce40..27b8904 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -779,6 +779,21 @@ static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
 typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
 				       struct sk_buff *skb);
 
+/* This structure holds attributes of qdisc and classifiers
+ * that are being passed to the netdevice through the setup_tc op.
+ */
+enum {
+	TC_SETUP_MQPRIO,
+};
+
+struct tc_to_netdev {
+	unsigned int type;
+	union {
+		u8 tc;
+	};
+};
+
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1148,7 +1163,10 @@ struct net_device_ops {
 	int			(*ndo_set_vf_rss_query_en)(
 						   struct net_device *dev,
 						   int vf, bool setting);
-	int			(*ndo_setup_tc)(struct net_device *dev, u32 handle, u8 tc);
+	int			(*ndo_setup_tc)(struct net_device *dev,
+						u32 handle,
+						__be16 protocol,
+						struct tc_to_netdev *tc);
 #if IS_ENABLED(CONFIG_FCOE)
 	int			(*ndo_fcoe_enable)(struct net_device *dev);
 	int			(*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 4cec88a..06b2797 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -28,6 +28,7 @@ static void mqprio_destroy(struct Qdisc *sch)
 {
 	struct net_device *dev = qdisc_dev(sch);
 	struct mqprio_sched *priv = qdisc_priv(sch);
+	struct tc_to_netdev tc = {.type = TC_SETUP_MQPRIO};
 	unsigned int ntx;
 
 	if (priv->qdiscs) {
@@ -39,7 +40,7 @@ static void mqprio_destroy(struct Qdisc *sch)
 	}
 
 	if (priv->hw_owned && dev->netdev_ops->ndo_setup_tc)
-		dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0);
+		dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0, &tc);
 	else
 		netdev_set_num_tc(dev, 0);
 }
@@ -140,9 +141,11 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	 * supplied and verified mapping
 	 */
 	if (qopt->hw) {
+		struct tc_to_netdev tc = {.type = TC_SETUP_MQPRIO,
+					  .tc = qopt->num_tc};
+
 		priv->hw_owned = 1;
-		err = dev->netdev_ops->ndo_setup_tc(dev, sch->handle,
-						    qopt->num_tc);
+		err = dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0, &tc);
 		if (err)
 			goto err;
 	} else {

* [RFC PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-01  1:48 [RFC PATCH 0/7] tc cls_u32 hardware interface John Fastabend
  2016-02-01  1:49 ` [RFC PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
  2016-02-01  1:49 ` [RFC PATCH 2/7] net: rework setup_tc ndo op to consume general tc operand John Fastabend
@ 2016-02-01  1:50 ` John Fastabend
  2016-02-02 16:25   ` Or Gerlitz
  2016-02-01  1:51 ` [RFC PATCH 4/7] net: add tc offload feature flag John Fastabend
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: John Fastabend @ 2016-02-01  1:50 UTC (permalink / raw)
  To: anjali.singhai, jesse.brandeburg, jhs
  Cc: ast, donald.c.skidmore, horms, netdev, tgraf, davem

This patch allows netdev drivers to consume cls_u32 offloads via
the ndo_setup_tc ndo op.

This work aligns with how network drivers have been doing qdisc
offloads for mqprio.
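A sketch of how a driver might consume these commands, assuming a
hypothetical one-entry hardware table. The command enum values match
include/net/pkt_cls.h from this patch, but the structs are trimmed
copies (exts, sel and fshift omitted) with hypothetical sketch_ names,
and sketch_u32_offload() is not the ixgbe implementation from
patch 7/7.

```c
#include <errno.h>
#include <stdint.h>

/* Command values as in include/net/pkt_cls.h from this patch. */
enum {
	TC_CLSU32_NEW_KNODE,
	TC_CLSU32_REPLACE_KNODE,
	TC_CLSU32_DELETE_KNODE,
	TC_CLSU32_NEW_HNODE,
	TC_CLSU32_REPLACE_HNODE,
};

/* Trimmed copies of tc_cls_u32_knode / tc_cls_u32_hnode. */
struct sketch_knode { uint32_t handle, val, mask, link_handle; };
struct sketch_hnode { uint32_t handle, prio; unsigned int divisor; };

struct sketch_cls_u32_offload {
	int command;	/* selects which union member is valid */
	union {
		struct sketch_knode knode;
		struct sketch_hnode hnode;
	};
};

/* Hypothetical one-entry "hardware table". */
static uint32_t hw_handle;
static int hw_valid;

static int sketch_u32_offload(const struct sketch_cls_u32_offload *cls)
{
	switch (cls->command) {
	case TC_CLSU32_NEW_KNODE:
	case TC_CLSU32_REPLACE_KNODE:
		/* install or overwrite the key node */
		hw_handle = cls->knode.handle;
		hw_valid = 1;
		return 0;
	case TC_CLSU32_DELETE_KNODE:
		/* delete carries only the handle, as u32_remove_hw_knode() does */
		if (!hw_valid || hw_handle != cls->knode.handle)
			return -ENOENT;
		hw_valid = 0;
		return 0;
	default:
		/* hash table (hnode) offload is not supported by this sketch */
		return -EOPNOTSUPP;
	}
}
```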

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/netdevice.h |    6 +++-
 include/net/pkt_cls.h     |   33 ++++++++++++++++++++
 net/sched/cls_u32.c       |   73 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 109 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 27b8904..38d1e59 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -779,17 +779,21 @@ static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
 typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
 				       struct sk_buff *skb);
 
-/* This structure holds attributes of qdisc and classifiers
+/* These structures hold the attributes of qdisc and classifiers
  * that are being passed to the netdevice through the setup_tc op.
  */
 enum {
 	TC_SETUP_MQPRIO,
+	TC_SETUP_CLSU32,
 };
 
+struct tc_cls_u32_offload;
+
 struct tc_to_netdev {
 	unsigned int type;
 	union {
 		u8 tc;
+		struct tc_cls_u32_offload *cls_u32;
 	};
 };
 
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index bc49967..0bd12cd 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -358,4 +358,37 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 }
 #endif /* CONFIG_NET_CLS_IND */
 
+struct tc_cls_u32_knode {
+	struct tcf_exts *exts;
+	u8 fshift;
+	u32 handle;
+	u32 val;
+	u32 mask;
+	u32 link_handle;
+	struct tc_u32_sel *sel;
+};
+
+struct tc_cls_u32_hnode {
+	u32 handle;
+	u32 prio;
+	unsigned int divisor;
+};
+
+enum {
+	TC_CLSU32_NEW_KNODE,
+	TC_CLSU32_REPLACE_KNODE,
+	TC_CLSU32_DELETE_KNODE,
+	TC_CLSU32_NEW_HNODE,
+	TC_CLSU32_REPLACE_HNODE,
+};
+
+struct tc_cls_u32_offload {
+	/* knode values */
+	int command;
+	union {
+		struct tc_cls_u32_knode knode;
+		struct tc_cls_u32_hnode hnode;
+	};
+};
+
 #endif
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 4fbb674..dfaaf29 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -43,6 +43,7 @@
 #include <net/netlink.h>
 #include <net/act_api.h>
 #include <net/pkt_cls.h>
+#include <linux/netdevice.h>
 
 struct tc_u_knode {
 	struct tc_u_knode __rcu	*next;
@@ -424,6 +425,68 @@ static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
 	return 0;
 }
 
+static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle)
+{
+	struct net_device *dev = tp->q->dev_queue->dev;
+	struct tc_cls_u32_offload u32_offload = {0};
+	struct tc_to_netdev offload;
+
+	offload.type = TC_SETUP_CLSU32;
+	offload.cls_u32 = &u32_offload;
+
+	if (dev->netdev_ops->ndo_setup_tc) {
+		offload.cls_u32->command = TC_CLSU32_DELETE_KNODE;
+		offload.cls_u32->knode.handle = handle;
+		dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+					      tp->protocol, &offload);
+	}
+}
+
+static void u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
+{
+	struct net_device *dev = tp->q->dev_queue->dev;
+	struct tc_cls_u32_offload u32_offload = {0};
+	struct tc_to_netdev offload;
+
+	offload.type = TC_SETUP_CLSU32;
+	offload.cls_u32 = &u32_offload;
+
+	if (dev->netdev_ops->ndo_setup_tc) {
+		offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
+		offload.cls_u32->hnode.divisor = h->divisor;
+		offload.cls_u32->hnode.handle = h->handle;
+		offload.cls_u32->hnode.prio = h->prio;
+
+		dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+					      tp->protocol, &offload);
+	}
+}
+
+static void u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n)
+{
+	struct net_device *dev = tp->q->dev_queue->dev;
+	struct tc_cls_u32_offload u32_offload = {0};
+	struct tc_to_netdev offload;
+
+	offload.type = TC_SETUP_CLSU32;
+	offload.cls_u32 = &u32_offload;
+
+	if (dev->netdev_ops->ndo_setup_tc) {
+		offload.cls_u32->command = TC_CLSU32_REPLACE_KNODE;
+		offload.cls_u32->knode.handle = n->handle;
+		offload.cls_u32->knode.fshift = n->fshift;
+		offload.cls_u32->knode.val = n->val;
+		offload.cls_u32->knode.mask = n->mask;
+		offload.cls_u32->knode.sel = &n->sel;
+		offload.cls_u32->knode.exts = &n->exts;
+		if (n->ht_down)
+			offload.cls_u32->knode.link_handle = n->ht_down->handle;
+
+		dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+					      tp->protocol, &offload);
+	}
+}
+
 static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 {
 	struct tc_u_knode *n;
@@ -434,6 +497,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 			RCU_INIT_POINTER(ht->ht[h],
 					 rtnl_dereference(n->next));
 			tcf_unbind_filter(tp, &n->res);
+			u32_remove_hw_knode(tp, n->handle);
 			call_rcu(&n->rcu, u32_delete_key_freepf_rcu);
 		}
 	}
@@ -540,8 +604,10 @@ static int u32_delete(struct tcf_proto *tp, unsigned long arg)
 	if (ht == NULL)
 		return 0;
 
-	if (TC_U32_KEY(ht->handle))
+	if (TC_U32_KEY(ht->handle)) {
+		u32_remove_hw_knode(tp, ht->handle);
 		return u32_delete_key(tp, (struct tc_u_knode *)ht);
+	}
 
 	if (root_ht == ht)
 		return -EINVAL;
@@ -769,6 +835,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 		u32_replace_knode(tp, tp_c, new);
 		tcf_unbind_filter(tp, &n->res);
 		call_rcu(&n->rcu, u32_delete_key_rcu);
+		u32_replace_hw_knode(tp, new);
 		return 0;
 	}
 
@@ -795,6 +862,8 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 		RCU_INIT_POINTER(ht->next, tp_c->hlist);
 		rcu_assign_pointer(tp_c->hlist, ht);
 		*arg = (unsigned long)ht;
+
+		u32_replace_hw_hnode(tp, ht);
 		return 0;
 	}
 
@@ -877,7 +946,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 
 		RCU_INIT_POINTER(n->next, pins);
 		rcu_assign_pointer(*ins, n);
-
+		u32_replace_hw_knode(tp, n);
 		*arg = (unsigned long)n;
 		return 0;
 	}


* [RFC PATCH 4/7] net: add tc offload feature flag
  2016-02-01  1:48 [RFC PATCH 0/7] tc cls_u32 hardware interface John Fastabend
                   ` (2 preceding siblings ...)
  2016-02-01  1:50 ` [RFC PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
@ 2016-02-01  1:51 ` John Fastabend
  2016-02-01  1:51 ` [RFC PATCH 5/7] net: tc: helper functions to query action types John Fastabend
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2016-02-01  1:51 UTC (permalink / raw)
  To: anjali.singhai, jesse.brandeburg, jhs
  Cc: ast, donald.c.skidmore, horms, netdev, tgraf, davem

It's useful to be able to turn off the qdisc offload feature at a
per-device level. This gives us a big hammer to enable/disable
offloading. More fine-grained control (i.e. per rule) may be
supported later.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/netdev_features.h |    3 +++
 net/core/ethtool.c              |    1 +
 2 files changed, 4 insertions(+)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index d9654f0e..a734bf4 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -67,6 +67,8 @@ enum {
 	NETIF_F_HW_L2FW_DOFFLOAD_BIT,	/* Allow L2 Forwarding in Hardware */
 	NETIF_F_BUSY_POLL_BIT,		/* Busy poll */
 
+	NETIF_F_HW_TC_BIT,		/* Offload TC infrastructure */
+
 	/*
 	 * Add your fresh new feature above and remember to update
 	 * netdev_features_strings[] in net/core/ethtool.c and maybe
@@ -124,6 +126,7 @@ enum {
 #define NETIF_F_HW_VLAN_STAG_TX	__NETIF_F(HW_VLAN_STAG_TX)
 #define NETIF_F_HW_L2FW_DOFFLOAD	__NETIF_F(HW_L2FW_DOFFLOAD)
 #define NETIF_F_BUSY_POLL	__NETIF_F(BUSY_POLL)
+#define NETIF_F_HW_TC		__NETIF_F(HW_TC)
 
 #define for_each_netdev_feature(mask_addr, bit)	\
 	for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 09948a7..bd8e157a 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -98,6 +98,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
 	[NETIF_F_RXALL_BIT] =            "rx-all",
 	[NETIF_F_HW_L2FW_DOFFLOAD_BIT] = "l2-fwd-offload",
 	[NETIF_F_BUSY_POLL_BIT] =        "busy-poll",
+	[NETIF_F_HW_TC_BIT] =		 "hw-tc-offload",
 };
 
 static const char


* [RFC PATCH 5/7] net: tc: helper functions to query action types
  2016-02-01  1:48 [RFC PATCH 0/7] tc cls_u32 hardware interface John Fastabend
                   ` (3 preceding siblings ...)
  2016-02-01  1:51 ` [RFC PATCH 4/7] net: add tc offload feature flag John Fastabend
@ 2016-02-01  1:51 ` John Fastabend
  2016-02-01  1:52 ` [RFC PATCH 6/7] net: ixgbe: add minimal parser details for ixgbe John Fastabend
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2016-02-01  1:51 UTC (permalink / raw)
  To: anjali.singhai, jesse.brandeburg, jhs
  Cc: ast, donald.c.skidmore, horms, netdev, tgraf, davem

This is a helper function drivers can use to learn if the
action type is a drop action.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/tc_act/tc_gact.h |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/net/tc_act/tc_gact.h b/include/net/tc_act/tc_gact.h
index 592a6bc..3fee3e9 100644
--- a/include/net/tc_act/tc_gact.h
+++ b/include/net/tc_act/tc_gact.h
@@ -2,6 +2,7 @@
 #define __NET_TC_GACT_H
 
 #include <net/act_api.h>
+#include <linux/tc_act/tc_gact.h>
 
 struct tcf_gact {
 	struct tcf_common	common;
@@ -15,4 +16,17 @@ struct tcf_gact {
 #define to_gact(a) \
 	container_of(a->priv, struct tcf_gact, common)
 
+static inline bool is_tcf_gact_dropped(const struct tc_action *a)
+{
+	struct tcf_gact *gact;
+
+	if (a->ops && a->ops->type != TCA_ACT_GACT)
+		return false;
+
+	gact = a->priv;
+	if (gact->tcf_action == TC_ACT_SHOT)
+		return true;
+
+	return false;
+}
 #endif /* __NET_TC_GACT_H */


* [RFC PATCH 6/7] net: ixgbe: add minimal parser details for ixgbe
  2016-02-01  1:48 [RFC PATCH 0/7] tc cls_u32 hardware interface John Fastabend
                   ` (4 preceding siblings ...)
  2016-02-01  1:51 ` [RFC PATCH 5/7] net: tc: helper functions to query action types John Fastabend
@ 2016-02-01  1:52 ` John Fastabend
  2016-02-02 16:27   ` Or Gerlitz
  2016-02-01  1:53 ` [RFC PATCH 7/7] net: ixgbe: add support for tc_u32 offload John Fastabend
  2016-02-02 11:49 ` [RFC PATCH 0/7] tc cls_u32 hardware interface Jiri Pirko
  7 siblings, 1 reply; 17+ messages in thread
From: John Fastabend @ 2016-02-01  1:52 UTC (permalink / raw)
  To: anjali.singhai, jesse.brandeburg, jhs
  Cc: ast, donald.c.skidmore, horms, netdev, tgraf, davem

This adds an ixgbe data structure that is used to determine what
headers:fields can be matched and in what order they are supported.

For hardware devices this can be a bit tricky because typically
only pre-programmed (firmware, ucode, rtl) parse graphs are
supported, and we don't yet have an interface to change them from
the OS. So for now it's a "you get whatever your friendly vendor
provides" affair.

In the future we can add get and set routines to update this data
structure. One interesting thing to note is that this data structure
identifies ethernet, ip, and tcp fields without having to hardcode
them as enumerations or use other identifiers.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h |  110 ++++++++++++++++++++++++
 1 file changed, 110 insertions(+)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
new file mode 100644
index 0000000..747d14a
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
@@ -0,0 +1,110 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 2016 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  Contact Information:
+  Linux NICS <linux.nics@intel.com>
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+#ifndef _IXGBE_MODEL_H_
+#define _IXGBE_MODEL_H_
+
+#include "ixgbe.h"
+#include "ixgbe_type.h"
+
+struct ixgbe_mat_field {
+	unsigned int off;
+	unsigned int mask;
+	int (*val)(struct ixgbe_fdir_filter *input,
+		   union ixgbe_atr_input *mask,
+		   __u32 val, __u32 m);
+	int link;
+};
+
+static inline int ixgbe_mat_prgm_sip(struct ixgbe_fdir_filter *input,
+				     union ixgbe_atr_input *mask,
+				     __u32 val, __u32 m)
+{
+	input->filter.formatted.src_ip[0] = val;
+	mask->formatted.src_ip[0] = m;
+	return 0;
+}
+
+static inline int ixgbe_mat_prgm_dip(struct ixgbe_fdir_filter *input,
+				     union ixgbe_atr_input *mask,
+				     __u32 val, __u32 m)
+{
+	input->filter.formatted.dst_ip[0] = val;
+	mask->formatted.dst_ip[0] = m;
+	return 0;
+}
+
+static struct ixgbe_mat_field ixgbe_ipv4_fields[] = {
+	{ .off = 12, .mask = -1, .val = ixgbe_mat_prgm_sip, .link = 0},
+	{ .off = 16, .mask = -1, .val = ixgbe_mat_prgm_dip, .link = 0},
+	{ .val = NULL } /* terminal node */
+};
+
+static inline int ixgbe_mat_prgm_sport(struct ixgbe_fdir_filter *input,
+				       union ixgbe_atr_input *mask,
+				       __u32 val, __u32 m)
+{
+	input->filter.formatted.src_port = val & 0xffff;
+	mask->formatted.src_port = m & 0xffff;
+	return 0;
+}
+
+static inline int ixgbe_mat_prgm_dport(struct ixgbe_fdir_filter *input,
+				       union ixgbe_atr_input *mask,
+				       __u32 val, __u32 m)
+{
+	input->filter.formatted.dst_port = val & 0xffff;
+	mask->formatted.dst_port = m & 0xffff;
+	return 0;
+}
+
+static struct ixgbe_mat_field ixgbe_tcp_fields[] = {
+	{.off = 0, .mask = 0xffff, .val = ixgbe_mat_prgm_sport, .link = -1},
+	{.off = 2, .mask = 0xffff, .val = ixgbe_mat_prgm_dport, .link = -1},
+	{ .val = NULL } /* terminal node */
+};
+
+struct ixgbe_nexthdr {
+	/* offset, shift, and mask of position to next header */
+	unsigned int o;
+	__u32 s;
+	__u32 m;
+	/* match criteria to make this jump*/
+	unsigned int off;
+	__u32 val;
+	__u32 mask;
+	/* location of jump to make */
+	struct ixgbe_mat_field *jump;
+};
+
+static struct ixgbe_nexthdr ixgbe_ipv4_jumps[] = {
+	{ .o = 0, .s = 6, .m = 0xf,
+	  .off = 8, .val = 0x600, .mask = 0xff00, .jump = ixgbe_tcp_fields},
+	{ .jump = NULL } /* terminal node */
+};
+#endif /* _IXGBE_MODEL_H_ */


* [RFC PATCH 7/7] net: ixgbe: add support for tc_u32 offload
  2016-02-01  1:48 [RFC PATCH 0/7] tc cls_u32 hardware interface John Fastabend
                   ` (5 preceding siblings ...)
  2016-02-01  1:52 ` [RFC PATCH 6/7] net: ixgbe: add minimal parser details for ixgbe John Fastabend
@ 2016-02-01  1:53 ` John Fastabend
  2016-02-02  2:17   ` David Miller
  2016-02-02 11:49 ` [RFC PATCH 0/7] tc cls_u32 hardware interface Jiri Pirko
  7 siblings, 1 reply; 17+ messages in thread
From: John Fastabend @ 2016-02-01  1:53 UTC (permalink / raw)
  To: anjali.singhai, jesse.brandeburg, jhs
  Cc: ast, donald.c.skidmore, horms, netdev, tgraf, davem

This adds initial support for offloading the u32 tc classifier. This
initial implementation supports only a few base matches and actions,
to illustrate the use of the infrastructure patches.

However, it is an interesting subset because it handles the u32
nexthdr logic to correctly map tcp packets from ip headers using the
ihl and protocol fields. After this is accepted we can easily extend
the match and action fields by updating the model header file.

Also, only the drop action is supported initially.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    3 
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    6 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  188 ++++++++++++++++++++++
 3 files changed, 190 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index f4c9a42..532b1bd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -925,6 +925,9 @@ s32 ixgbe_fdir_erase_perfect_filter_82599(struct ixgbe_hw *hw,
 					  u16 soft_id);
 void ixgbe_atr_compute_perfect_hash_82599(union ixgbe_atr_input *input,
 					  union ixgbe_atr_input *mask);
+int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
+				    struct ixgbe_fdir_filter *input,
+				    u16 sw_idx);
 void ixgbe_set_rx_mode(struct net_device *netdev);
 #ifdef CONFIG_IXGBE_DCB
 void ixgbe_set_rx_drop_en(struct ixgbe_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 1ed4c9a..712972d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2490,9 +2490,9 @@ static int ixgbe_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
 	return ret;
 }
 
-static int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
-					   struct ixgbe_fdir_filter *input,
-					   u16 sw_idx)
+int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
+				    struct ixgbe_fdir_filter *input,
+				    u16 sw_idx)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct hlist_node *node2;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 38a160b..c1aabf1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -51,6 +51,7 @@
 #include <linux/prefetch.h>
 #include <scsi/fc/fc_fcoe.h>
 #include <net/vxlan.h>
+#include <net/pkt_cls.h>
 
 #ifdef CONFIG_OF
 #include <linux/of_net.h>
@@ -8167,10 +8168,189 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	return 0;
 }
 
+#include <net/tc_act/tc_gact.h>
+#include "ixgbe_model.h"
+static int ixgbe_delete_clsu32(struct ixgbe_adapter *adapter,
+			       struct tc_cls_u32_offload *cls)
+{
+	int err;
+
+	spin_lock(&adapter->fdir_perfect_lock);
+	err = ixgbe_update_ethtool_fdir_entry(adapter, NULL, cls->knode.handle);
+	spin_unlock(&adapter->fdir_perfect_lock);
+	return err;
+}
+
+#define IXGBE_MAX_LINK_HANDLE 10
+static struct ixgbe_mat_field *
+ixgbe_jump_tables[IXGBE_MAX_LINK_HANDLE] = {ixgbe_ipv4_fields,};
+
+static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
+				  __be16 protocol,
+				  struct tc_cls_u32_offload *cls)
+{
+	u32 loc = cls->knode.handle & 0xfffff;
+	struct ixgbe_hw *hw = &adapter->hw;
+	struct ixgbe_mat_field *field_ptr;
+	struct ixgbe_fdir_filter *input;
+	union ixgbe_atr_input mask;
+	const struct tc_action *a;
+	int i, err = 0;
+	u8 queue;
+	u32 handle;
+
+	memset(&mask, 0, sizeof(union ixgbe_atr_input));
+	handle = cls->knode.handle;
+
+	/* At the moment cls_u32 jumps to transport layer and skips past
+	 * L2 headers. The canonical method to match L2 frames is to use
+	 * negative values. However this is error prone at best but really
+	 * just broken because there is no way to "know" what sort of hdr
+	 * is in front of the transport layer. Fix cls_u32 to support L2
+	 * headers when needed.
+	 */
+	if (protocol != htons(ETH_P_IP))
+		return -EINVAL;
+
+	if (cls->knode.link_handle ||
+	    cls->knode.link_handle >= IXGBE_MAX_LINK_HANDLE) {
+		struct ixgbe_nexthdr *nexthdr = ixgbe_ipv4_jumps;
+		u32 uhtid = TC_U32_USERHTID(cls->knode.link_handle);
+
+		for (i = 0; nexthdr[i].jump; i++) {
+			if (nexthdr->o != cls->knode.sel->offoff ||
+			    nexthdr->s != cls->knode.sel->offshift ||
+			    nexthdr->m != cls->knode.sel->offmask ||
+			    /* do not support multiple key jumps; it's just mad */
+			    cls->knode.sel->nkeys > 1)
+				return -EINVAL;
+
+			if (nexthdr->off != cls->knode.sel->keys[0].off ||
+			    nexthdr->val != cls->knode.sel->keys[0].val ||
+			    nexthdr->mask != cls->knode.sel->keys[0].mask)
+				return -EINVAL;
+
+			if (uhtid >= IXGBE_MAX_LINK_HANDLE)
+				return -EINVAL;
+
+			ixgbe_jump_tables[uhtid] = nexthdr->jump;
+		}
+		return 0;
+	}
+
+	if (loc >= ((1024 << adapter->fdir_pballoc) - 2)) {
+		e_err(drv, "Location out of range\n");
+		return -EINVAL;
+	}
+
+	/* cls u32 is a graph starting at root node 0x800. The driver tracks
+	 * links and also the fields used to advance the parser across each
+	 * link (e.g. nexthdr/eat parameters from 'tc'). This way we can map
+	 * the u32 graph onto the hardware parse graph denoted in ixgbe_model.h.
+	 * To support new nodes, update the ixgbe_model.h parse structures;
+	 * this function _should_ be generic, so try not to hardcode values.
+	 */
+	if (TC_U32_USERHTID(handle) == 0x800) {
+		field_ptr = ixgbe_jump_tables[0];
+	} else {
+		if (TC_U32_USERHTID(handle) >= ARRAY_SIZE(ixgbe_jump_tables))
+			return -EINVAL;
+
+		field_ptr = ixgbe_jump_tables[TC_U32_USERHTID(handle)];
+	}
+
+	if (!field_ptr)
+		return -EINVAL;
+
+	input = kzalloc(sizeof(*input), GFP_ATOMIC);
+	if (!input)
+		return -ENOMEM;
+
+	for (i = 0; i < cls->knode.sel->nkeys; i++) {
+		int off = cls->knode.sel->keys[i].off;
+		__be32 val = cls->knode.sel->keys[i].val;
+		__be32 m = cls->knode.sel->keys[i].mask;
+		bool found_entry = false;
+		int j;
+
+		mask.formatted.flow_type = IXGBE_ATR_L4TYPE_IPV6_MASK |
+					   IXGBE_ATR_L4TYPE_MASK;
+		for (j = 0; field_ptr[j].val; j++) {
+			if (field_ptr[j].off == off &&
+			    field_ptr[j].mask == m) {
+				field_ptr[j].val(input, &mask, val, m);
+				found_entry = true;
+				break;
+			}
+		}
+
+		if (!found_entry)
+			goto err_out;
+	}
+
+	if (list_empty(&cls->knode.exts->actions))
+		goto err_out;
+
+	list_for_each_entry(a, &cls->knode.exts->actions, list) {
+		if (!is_tcf_gact_dropped(a))
+			goto err_out;
+	}
+
+	input->action = IXGBE_FDIR_DROP_QUEUE;
+	queue = IXGBE_FDIR_DROP_QUEUE;
+	input->sw_idx = loc;
+
+	spin_lock(&adapter->fdir_perfect_lock);
+
+	if (hlist_empty(&adapter->fdir_filter_list)) {
+		memcpy(&adapter->fdir_mask, &mask, sizeof(mask));
+		err = ixgbe_fdir_set_input_mask_82599(hw, &mask);
+		if (err)
+			goto err_out_w_lock;
+	} else if (memcmp(&adapter->fdir_mask, &mask, sizeof(mask))) {
+		err = -EINVAL;
+		goto err_out_w_lock;
+	}
+
+	ixgbe_atr_compute_perfect_hash_82599(&input->filter, &mask);
+	err = ixgbe_fdir_write_perfect_filter_82599(hw, &input->filter,
+						    input->sw_idx, queue);
+	if (!err)
+		ixgbe_update_ethtool_fdir_entry(adapter, input, input->sw_idx);
+	spin_unlock(&adapter->fdir_perfect_lock);
+
+	return err;
+err_out_w_lock:
+	spin_unlock(&adapter->fdir_perfect_lock);
+err_out:
+	kfree(input);
+	return -EINVAL;
+}
+
 int __ixgbe_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
 		     struct tc_to_netdev *tc)
 {
-	/* Only support egress tc setup for now */
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+
+	if (TC_H_MAJ(handle) == TC_H_MAJ(TC_H_INGRESS) &&
+	    tc->type == TC_SETUP_CLSU32) {
+		struct tc_cls_u32_offload *cls = tc->cls_u32;
+
+		if (!(dev->features & NETIF_F_HW_TC))
+			return -EINVAL;
+
+		switch (cls->command) {
+		case TC_CLSU32_NEW_KNODE:
+		case TC_CLSU32_REPLACE_KNODE:
+			return ixgbe_configure_clsu32(adapter,
+						      proto, cls);
+		case TC_CLSU32_DELETE_KNODE:
+			return ixgbe_delete_clsu32(adapter, cls);
+		default:
+			return -EINVAL;
+		}
+	}
+
 	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
@@ -8244,6 +8424,7 @@ static int ixgbe_set_features(struct net_device *netdev,
 	 */
 	switch (features & NETIF_F_NTUPLE) {
 	case NETIF_F_NTUPLE:
+	case NETIF_F_HW_TC:
 		/* turn off ATR, enable perfect filters and reset */
 		if (!(adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE))
 			need_reset = true;
@@ -8634,9 +8815,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_set_vf_trust	= ixgbe_ndo_set_vf_trust,
 	.ndo_get_vf_config	= ixgbe_ndo_get_vf_config,
 	.ndo_get_stats64	= ixgbe_get_stats64,
-#ifdef CONFIG_IXGBE_DCB
 	.ndo_setup_tc		= __ixgbe_setup_tc,
-#endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= ixgbe_netpoll,
 #endif
@@ -9007,7 +9186,8 @@ skip_sriov:
 	case ixgbe_mac_X550EM_x:
 		netdev->features |= NETIF_F_SCTP_CRC;
 		netdev->hw_features |= NETIF_F_SCTP_CRC |
-				       NETIF_F_NTUPLE;
+				       NETIF_F_NTUPLE |
+				       NETIF_F_HW_TC;
 		break;
 	default:
 		break;


* Re: [RFC PATCH 7/7] net: ixgbe: add support for tc_u32 offload
  2016-02-01  1:53 ` [RFC PATCH 7/7] net: ixgbe: add support for tc_u32 offload John Fastabend
@ 2016-02-02  2:17   ` David Miller
  2016-02-02  4:53     ` John Fastabend
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2016-02-02  2:17 UTC (permalink / raw)
  To: john.fastabend
  Cc: anjali.singhai, jesse.brandeburg, jhs, ast, donald.c.skidmore,
	horms, netdev, tgraf

From: John Fastabend <john.fastabend@gmail.com>
Date: Sun, 31 Jan 2016 17:53:10 -0800

> +			ixgbe_jump_tables[uhtid] = nexthdr->jump;

I can't figure out what protects concurrent accesses to this shared
ixgbe_jump_table[].  Is RTNL held here?  If so it's likely that
GFP_ATOMIC can be changed to GFP_KERNEL in this function.


* Re: [RFC PATCH 7/7] net: ixgbe: add support for tc_u32 offload
  2016-02-02  2:17   ` David Miller
@ 2016-02-02  4:53     ` John Fastabend
  0 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2016-02-02  4:53 UTC (permalink / raw)
  To: David Miller
  Cc: anjali.singhai, jesse.brandeburg, jhs, ast, donald.c.skidmore,
	horms, netdev, tgraf

On 16-02-01 06:17 PM, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Sun, 31 Jan 2016 17:53:10 -0800
> 
>> +			ixgbe_jump_tables[uhtid] = nexthdr->jump;
> 
> I can't figure out what protects concurrent accesses to this shared
> ixgbe_jump_table[].  Is RTNL held here?  If so it's likely that
> GFP_ATOMIC can be changed to GFP_KERNEL in this function.
> 
> 

This is only called from tc_ctl_tfilter, which is wrapped in the
rtnl_lock, as you suspected.

Thanks for the catch. I'll convert it to GFP_KERNEL along with the
other spot I copy-pasted the GFP_ATOMIC code from.

.John


* Re: [RFC PATCH 0/7] tc cls_u32 hardware interface
  2016-02-01  1:48 [RFC PATCH 0/7] tc cls_u32 hardware interface John Fastabend
                   ` (6 preceding siblings ...)
  2016-02-01  1:53 ` [RFC PATCH 7/7] net: ixgbe: add support for tc_u32 offload John Fastabend
@ 2016-02-02 11:49 ` Jiri Pirko
  2016-02-02 14:58   ` John Fastabend
  7 siblings, 1 reply; 17+ messages in thread
From: Jiri Pirko @ 2016-02-02 11:49 UTC (permalink / raw)
  To: John Fastabend
  Cc: anjali.singhai, jesse.brandeburg, jhs, ast, donald.c.skidmore,
	horms, netdev, tgraf, davem

Mon, Feb 01, 2016 at 02:48:32AM CET, john.fastabend@gmail.com wrote:
>I was waiting for net-next to open to submit this but it seems like
>a good idea to get an RFC out there for folks to start looking over.
>
>This extends the setup_tc framework so it can support more than just
>the mqprio offload and push other classifiers and qdiscs into the
>hardware. The series here targets the u32 classifier and ixgbe
>driver. I worked out the u32 classifier because it is protocol
>oblivious and aligns with multiple hardware devices I have access
>to. I did an initial implementation on ixgbe because (a) I have one
>in my box (b) its a stable driver and (c) it is relatively simple
>compared to the other devices I have here but still has enough
>flexibility to exercise the features of cls_u32.
>
>I intentionally limited the scope of this series to the basic
>feature set. Specifically this uses a 'big hammer' feature bit
>to do the offload or not. If the bit is set you get offloaded rules
>if it is not then rules will not be offloaded. If we can agree on
>this patch series there are some more patches on my queue we can
>talk about to make the offload decision per rule using flags similar
>to how we do l2 mac updates. Additionally the error strategy can
>be improved to be hard aborting, log and continue, etc. I think
>these are nice to have improvements but shouldn't block this series.
>I am working on similar support for the other Intel devices now
>as well namely i40e and fm10k.
>
>Also in the future work bin by adding get_parse_graph and
>set_parse_graph attributes as in my previous flow_api work we
>can build programmable devices and programmatically learn when
>rules can or can not be loaded into the hardware.
>
>Note this series is on a slightly behind net-next stack I think it
>should apply to the latest but I haven't updated the series for
>awhile I'll do that soon I was sort of waiting for net-next to
>open to do this.
>
>Any comments/feedback appreciated.

I like this patchset. I gave it a quick peek and it looks to me like
the correct way to go. There are a couple of things that need to be
decided, as you described them (e.g. per-rule offload); we should
discuss them at the netdev conference. I hope you will be there.

I'm curious about how you plan to expose the parse graph get/set ops...


>
>Thanks,
>John
>
>---
>
>John Fastabend (7):
>      net: rework ndo tc op to consume additional qdisc handle parameter
>      net: rework setup_tc ndo op to consume general tc operand
>      net: sched: add cls_u32 offload hooks for netdevs
>      net: add tc offload feature flag
>      net: tc: helper functions to query action types
>      net: ixgbe: add minimal parser details for ixgbe
>      net: ixgbe: add support for tc_u32 offload
>
>
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |    8 +
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  |    2 
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 
> drivers/net/ethernet/broadcom/bnxt/bnxt.c        |    9 +
> drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |   11 +
> drivers/net/ethernet/intel/i40e/i40e_main.c      |   10 +
> drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    3 
> drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    6 -
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  198 ++++++++++++++++++++++
> drivers/net/ethernet/intel/ixgbe/ixgbe_model.h   |  110 ++++++++++++
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |   13 +
> drivers/net/ethernet/sfc/efx.h                   |    3 
> drivers/net/ethernet/sfc/tx.c                    |   10 +
> drivers/net/ethernet/ti/netcp_core.c             |   14 +-
> include/linux/netdev_features.h                  |    3 
> include/linux/netdevice.h                        |   24 +++
> include/net/pkt_cls.h                            |   33 ++++
> include/net/tc_act/tc_gact.h                     |   14 ++
> net/core/ethtool.c                               |    1 
> net/sched/cls_u32.c                              |   73 ++++++++
> net/sched/sch_mqprio.c                           |    8 +
> 21 files changed, 529 insertions(+), 26 deletions(-)
> create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
>


* Re: [RFC PATCH 0/7] tc cls_u32 hardware interface
  2016-02-02 11:49 ` [RFC PATCH 0/7] tc cls_u32 hardware interface Jiri Pirko
@ 2016-02-02 14:58   ` John Fastabend
  0 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2016-02-02 14:58 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: anjali.singhai, jesse.brandeburg, jhs, ast, donald.c.skidmore,
	horms, netdev, tgraf, davem

On 16-02-02 03:49 AM, Jiri Pirko wrote:
> Mon, Feb 01, 2016 at 02:48:32AM CET, john.fastabend@gmail.com wrote:
>> I was waiting for net-next to open to submit this but it seems like
>> a good idea to get an RFC out there for folks to start looking over.
>>
>> This extends the setup_tc framework so it can support more than just
>> the mqprio offload and push other classifiers and qdiscs into the
>> hardware. The series here targets the u32 classifier and ixgbe
>> driver. I worked out the u32 classifier because it is protocol
>> oblivious and aligns with multiple hardware devices I have access
>> to. I did an initial implementation on ixgbe because (a) I have one
>> in my box (b) its a stable driver and (c) it is relatively simple
>> compared to the other devices I have here but still has enough
>> flexibility to exercise the features of cls_u32.
>>
>> I intentionally limited the scope of this series to the basic
>> feature set. Specifically this uses a 'big hammer' feature bit
>> to do the offload or not. If the bit is set you get offloaded rules
>> if it is not then rules will not be offloaded. If we can agree on
>> this patch series there are some more patches on my queue we can
>> talk about to make the offload decision per rule using flags similar
>> to how we do l2 mac updates. Additionally the error strategy can
>> be improved to be hard aborting, log and continue, etc. I think
>> these are nice to have improvements but shouldn't block this series.
>> I am working on similar support for the other Intel devices now
>> as well namely i40e and fm10k.
>>
>> Also in the future work bin by adding get_parse_graph and
>> set_parse_graph attributes as in my previous flow_api work we
>> can build programmable devices and programmatically learn when
>> rules can or can not be loaded into the hardware.
>>
>> Note this series is on a slightly behind net-next stack. I think it
>> should apply to the latest, but I haven't updated the series for
>> a while; I'll do that soon. I was sort of waiting for net-next to
>> open to do this.
>>
>> Any comments/feedback appreciated.
> 
> I like this patchset. I gave it a quick peek and it looks to me like
> the correct way to go.

Great.

> There are a couple of things that need to be decided
> as you described them (e.g. per-rule offload) - we should discuss them
> at the netdev conference. I hope you will be there.
> 

I'll be at the conference, so we can discuss it there. Even without
per-rule offload flags, though, this set is useful. As I noted, I view
that as an optimization, and it's much more useful on a NIC, where host
traffic is the norm, than on a switch, where host traffic is most likely
the exception.

> I'm curious about how you plan to expose the parse graphs get/set ops...

The current stack of patches I have on my dev box uses a new netlink
handler. ethtool could work as well, but my preference is netlink in
this case.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-01  1:50 ` [RFC PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
@ 2016-02-02 16:25   ` Or Gerlitz
  2016-02-02 16:42     ` John Fastabend
  0 siblings, 1 reply; 17+ messages in thread
From: Or Gerlitz @ 2016-02-02 16:25 UTC (permalink / raw)
  To: John Fastabend
  Cc: Anjali Singhai Jain, Jesse Brandeburg, Jamal Hadi Salim, ast,
	Skidmore, Donald C, horms, Linux Netdev List, Thomas Graf,
	David Miller, Jiri Pirko

On Mon, Feb 1, 2016 at 3:50 AM, John Fastabend <john.fastabend@gmail.com> wrote:
> This patch allows netdev drivers to consume cls_u32 offloads via
> the ndo_setup_tc ndo op.
>
> This work aligns with how network drivers have been doing qdisc
> offloads for mqprio.

[...]

> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -779,17 +779,21 @@ static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
>  typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>                                        struct sk_buff *skb);
>
> -/* This structure holds attributes of qdisc and classifiers
> +/* These structures hold the attributes of qdisc and classifiers
>   * that are being passed to the netdevice through the setup_tc op.
>   */
>  enum {
>         TC_SETUP_MQPRIO,
> +       TC_SETUP_CLSU32,
>  };
>
> +struct tc_cls_u32_offload;
> +
>  struct tc_to_netdev {
>         unsigned int type;
>         union {
>                 u8 tc;
> +               struct tc_cls_u32_offload *cls_u32;
>         };
>  };

So under this approach we're making the HW driver u32 aware. Do we
really want to go there?

The flow-dissector + actions structure way of describing matching and
actions maybe had some drawbacks, but it's not affiliated with a specific
networking component (here TC/U32). Looking forward, do we expect
everything (netfilter offloads, for example) to be expressed in u32 terms?

Or.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 6/7] net: ixgbe: add minimal parser details for ixgbe
  2016-02-01  1:52 ` [RFC PATCH 6/7] net: ixgbe: add minimal parser details for ixgbe John Fastabend
@ 2016-02-02 16:27   ` Or Gerlitz
  2016-02-02 16:46     ` John Fastabend
  0 siblings, 1 reply; 17+ messages in thread
From: Or Gerlitz @ 2016-02-02 16:27 UTC (permalink / raw)
  To: John Fastabend
  Cc: Anjali Singhai Jain, Jesse Brandeburg, Jamal Hadi Salim, ast,
	Skidmore, Donald C, horms, Linux Netdev List, Thomas Graf,
	David Miller

On Mon, Feb 1, 2016 at 3:52 AM, John Fastabend <john.fastabend@gmail.com> wrote:
> This adds an ixgbe data structure that is used to determine what
> headers:fields can be matched and in what order they are supported.
>
> For hardware devices this can be a bit tricky because typically
> only pre-programmed (firmware, ucode, rtl) parse graphs will be
> supported and we don't yet have an interface to change these from
> the OS. So it's sort of a "you get whatever your friendly vendor
> provides" affair at the moment.
>
> In the future we can add the get routines and set routines to
> update this data structure. One interesting thing to note here
> is the data structure here identifies ethernet, ip, and tcp
> fields without having to hardcode them as enumerations or use
> other identifiers.

Maybe for the current state this patch (or most of it) can be
made generic and provided in a way that multiple HW drivers can use it?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-02 16:25   ` Or Gerlitz
@ 2016-02-02 16:42     ` John Fastabend
  2016-02-02 22:06       ` Or Gerlitz
  0 siblings, 1 reply; 17+ messages in thread
From: John Fastabend @ 2016-02-02 16:42 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Anjali Singhai Jain, Jesse Brandeburg, Jamal Hadi Salim, ast,
	Skidmore, Donald C, horms, Linux Netdev List, Thomas Graf,
	David Miller, Jiri Pirko

On 16-02-02 08:25 AM, Or Gerlitz wrote:
> On Mon, Feb 1, 2016 at 3:50 AM, John Fastabend <john.fastabend@gmail.com> wrote:
>> This patch allows netdev drivers to consume cls_u32 offloads via
>> the ndo_setup_tc ndo op.
>>
>> This work aligns with how network drivers have been doing qdisc
>> offloads for mqprio.
> 
> [...]
> 
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -779,17 +779,21 @@ static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
>>  typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>                                        struct sk_buff *skb);
>>
>> -/* This structure holds attributes of qdisc and classifiers
>> +/* These structures hold the attributes of qdisc and classifiers
>>   * that are being passed to the netdevice through the setup_tc op.
>>   */
>>  enum {
>>         TC_SETUP_MQPRIO,
>> +       TC_SETUP_CLSU32,
>>  };
>>
>> +struct tc_cls_u32_offload;
>> +
>>  struct tc_to_netdev {
>>         unsigned int type;
>>         union {
>>                 u8 tc;
>> +               struct tc_cls_u32_offload *cls_u32;
>>         };
>>  };
> 
> So under this approach we're making the HW driver u32 aware. Do we
> really want to go there?
> 

Yes. I'm not convinced that mapping some universal language X onto
arbitrary hardware is worth the complexity/cost at the moment. I already
started writing this universal block of code, and it gets a bit complex
to do it right. Anyway, none of this is exposed via UAPI, so it can be
consolidated or reworked as needed. Also, I'm not too keen on going from
tc/netfilter/etc. to language X (a hardware IR) to hardware when the
block of code to jump from u32 or flower to hardware is so simple. I
added flower support to the driver with about 100 lines of code, FWIW;
I'll send the patch out later today. Sure, I skipped populating all the
fields by breaking out of some case statements, but not that many.

I don't mind opening up some helper functions if you like my backend
structures. But anyway, most of the hard work is programming the hardware
and hoping someone did silicon validation, IMO.

> The flow-dissector + actions structure way of describing matching and
> actions maybe had some drawbacks, but it's not affiliated with a specific
> networking component (here TC/U32). Looking forward, do we expect
> everything (netfilter offloads, for example) to be expressed in u32 terms?

I'm a bit tired of speculating about what-ifs. When we see the netfilter
offload code, let's take a look at consolidating. For now I have code that
_works_.

> 
> Or.
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 6/7] net: ixgbe: add minimal parser details for ixgbe
  2016-02-02 16:27   ` Or Gerlitz
@ 2016-02-02 16:46     ` John Fastabend
  0 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2016-02-02 16:46 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Anjali Singhai Jain, Jesse Brandeburg, Jamal Hadi Salim, ast,
	Skidmore, Donald C, horms, Linux Netdev List, Thomas Graf,
	David Miller

On 16-02-02 08:27 AM, Or Gerlitz wrote:
> On Mon, Feb 1, 2016 at 3:52 AM, John Fastabend <john.fastabend@gmail.com> wrote:
>> This adds an ixgbe data structure that is used to determine what
>> headers:fields can be matched and in what order they are supported.
>>
>> For hardware devices this can be a bit tricky because typically
>> only pre-programmed (firmware, ucode, rtl) parse graphs will be
>> supported and we don't yet have an interface to change these from
>> the OS. So it's sort of a "you get whatever your friendly vendor
>> provides" affair at the moment.
>>
>> In the future we can add the get routines and set routines to
>> update this data structure. One interesting thing to note here
>> is the data structure here identifies ethernet, ip, and tcp
>> fields without having to hardcode them as enumerations or use
>> other identifiers.
> 
> Maybe for the current state this patch (or most of it) can be
> made generic and provided in a way that multiple HW drivers can use it?
> 

If you want the structs, we can put them in a helper lib, but the
main code is two for loops to catch the keys and an if block to catch
the nexthdr code, mixed with a bunch of code to program the specific
device. It's just not that helpful to other drivers.

.John

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-02 16:42     ` John Fastabend
@ 2016-02-02 22:06       ` Or Gerlitz
  0 siblings, 0 replies; 17+ messages in thread
From: Or Gerlitz @ 2016-02-02 22:06 UTC (permalink / raw)
  To: John Fastabend
  Cc: Anjali Singhai Jain, Jesse Brandeburg, Jamal Hadi Salim, ast,
	Skidmore, Donald C, Simon Horman, Linux Netdev List, Thomas Graf,
	David Miller, Jiri Pirko

On Tue, Feb 2, 2016 at 6:42 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> [..] I added
> flower support to the driver with about 100lines of code fwiw I'll
> send the patch out later today,

That would be very helpful. I would appreciate it if you posted the code
that supports flower to the list or to your GitHub.

> sure I skipped populating all the
> fields by breaking out of some case statements but not that many.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-02-02 22:06 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-01  1:48 [RFC PATCH 0/7] tc cls_u32 hardware interface John Fastabend
2016-02-01  1:49 ` [RFC PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
2016-02-01  1:49 ` [RFC PATCH 2/7] net: rework setup_tc ndo op to consume general tc operand John Fastabend
2016-02-01  1:50 ` [RFC PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
2016-02-02 16:25   ` Or Gerlitz
2016-02-02 16:42     ` John Fastabend
2016-02-02 22:06       ` Or Gerlitz
2016-02-01  1:51 ` [RFC PATCH 4/7] net: add tc offload feature flag John Fastabend
2016-02-01  1:51 ` [RFC PATCH 5/7] net: tc: helper functions to query action types John Fastabend
2016-02-01  1:52 ` [RFC PATCH 6/7] net: ixgbe: add minimal parser details for ixgbe John Fastabend
2016-02-02 16:27   ` Or Gerlitz
2016-02-02 16:46     ` John Fastabend
2016-02-01  1:53 ` [RFC PATCH 7/7] net: ixgbe: add support for tc_u32 offload John Fastabend
2016-02-02  2:17   ` David Miller
2016-02-02  4:53     ` John Fastabend
2016-02-02 11:49 ` [RFC PATCH 0/7] tc cls_u32 hardware interface Jiri Pirko
2016-02-02 14:58   ` John Fastabend
