All of lore.kernel.org
 help / color / mirror / Atom feed
* [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe
@ 2016-02-17  5:15 John Fastabend
  2016-02-17  5:16 ` [net-next PATCH v3 1/8] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
                   ` (8 more replies)
  0 siblings, 9 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-17  5:15 UTC (permalink / raw)
  To: jiri, amir, john.fastabend, jhs, davem; +Cc: netdev, jeffrey.t.kirsher

This extends the setup_tc framework so it can support more than just
the mqprio offload and push other classifiers and qdiscs into the
hardware. The series here targets the u32 classifier and ixgbe
driver. I worked out the u32 classifier because it is protocol
oblivious and aligns with multiple hardware devices I have access
to. I did an initial implementation on ixgbe because (a) I have one
in my box (b) its a stable driver and (c) it is relatively simple
compared to the other devices I have here but still has enough
flexibility to exercise the features of cls_u32.

I intentionally limited the scope of this series to the basic
feature set. Specifically this uses a 'big hammer' feature bit
to do the offload or not. If the bit is set you get offloaded rules
if it is not then rules will not be offloaded. If we can agree on
this patch series there are some more patches on my queue we can
talk about to make the offload decision per rule using flags similar
to how we do l2 mac updates. Additionally the error strategy can
be improved to be hard aborting, log and continue, etc. I think
these are nice to have improvements but shouldn't block this series.

Also by adding get_parse_graph and set_parse_graph attributes as
in my previous flow_api work we can build programmable devices
and programmatically learn when rules can or can not be loaded
into the hardware. Again future work.

... v3 includes a couple style fixups suggested by Jiri and a
quick fix to ixgbe to check features flag and not dev_features
flag the latter being always set.

---

John Fastabend (8):
      net: rework ndo tc op to consume additional qdisc handle parameter
      net: rework setup_tc ndo op to consume general tc operand
      net: sched: add cls_u32 offload hooks for netdevs
      net: add tc offload feature flag
      net: tc: helper functions to query action types
      net: ixgbe: add minimal parser details for ixgbe
      net: ixgbe: add support for tc_u32 offload
      net: ixgbe: abort with cls u32 divisor groups greater than 1


 drivers/net/ethernet/amd/xgbe/xgbe-drv.c         |   10 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |    8 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  |    2 
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 
 drivers/net/ethernet/broadcom/bnxt/bnxt.c        |    9 +
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |   11 +
 drivers/net/ethernet/intel/i40e/i40e.h           |    3 
 drivers/net/ethernet/intel/i40e/i40e_fcoe.c      |    2 
 drivers/net/ethernet/intel/i40e/i40e_main.c      |   19 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    7 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    6 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  272 ++++++++++++++++++++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h   |  112 +++++++++
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |   13 +
 drivers/net/ethernet/sfc/efx.h                   |    3 
 drivers/net/ethernet/sfc/tx.c                    |   10 +
 drivers/net/ethernet/ti/netcp_core.c             |   14 +
 include/linux/netdev_features.h                  |    3 
 include/linux/netdevice.h                        |   25 ++
 include/net/pkt_cls.h                            |   34 +++
 include/net/tc_act/tc_gact.h                     |   16 +
 net/core/ethtool.c                               |    1 
 net/sched/cls_u32.c                              |   99 ++++++++
 net/sched/sch_mqprio.c                           |    8 -
 24 files changed, 632 insertions(+), 57 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [net-next PATCH v3 1/8] net: rework ndo tc op to consume additional qdisc handle parameter
  2016-02-17  5:15 [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe John Fastabend
@ 2016-02-17  5:16 ` John Fastabend
  2016-02-17 10:42   ` Jamal Hadi Salim
  2016-02-17  5:16 ` [net-next PATCH v3 2/8] net: rework setup_tc ndo op to consume general tc operand John Fastabend
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: John Fastabend @ 2016-02-17  5:16 UTC (permalink / raw)
  To: jiri, amir, john.fastabend, jhs, davem; +Cc: netdev, jeffrey.t.kirsher

The ndo_setup_tc() op was added to support drivers offloading tx
qdiscs however only support for mqprio was ever added. So we
only ever added support for passing the number of traffic classes
to the driver.

This patch generalizes the ndo_setup_tc op so that a handle can
be provided to indicate if the offload is for ingress or egress
or potentially even child qdiscs.

CC: Murali Karicheri <m-karicheri2@ti.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: Or Gerlitz <ogerlitz@mellanox.com>
CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Bruce Allan <bruce.w.allan@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c         |    5 ++++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |    7 +++++++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  |    1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c        |    5 ++++-
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |   10 +++++++++-
 drivers/net/ethernet/intel/i40e/i40e.h           |    2 +-
 drivers/net/ethernet/intel/i40e/i40e_fcoe.c      |    2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c      |   17 ++++++++++++-----
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |   11 ++++++++++-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |   12 ++++++++++--
 drivers/net/ethernet/sfc/efx.h                   |    2 +-
 drivers/net/ethernet/sfc/tx.c                    |    5 ++++-
 drivers/net/ethernet/ti/netcp_core.c             |    5 ++++-
 include/linux/netdevice.h                        |    3 ++-
 net/sched/sch_mqprio.c                           |    5 +++--
 16 files changed, 74 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 8a9b493..9955cae 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1626,12 +1626,15 @@ static void xgbe_poll_controller(struct net_device *netdev)
 }
 #endif /* End CONFIG_NET_POLL_CONTROLLER */
 
-static int xgbe_setup_tc(struct net_device *netdev, u8 tc)
+static int xgbe_setup_tc(struct net_device *netdev, u32 handle, u8 tc)
 {
 	struct xgbe_prv_data *pdata = netdev_priv(netdev);
 	unsigned int offset, queue;
 	u8 i;
 
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
 	if (tc && (tc != pdata->hw_feat.tc_cnt))
 		return -EINVAL;
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 9e42bca..b262cba 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4272,6 +4272,13 @@ int bnx2x_setup_tc(struct net_device *dev, u8 num_tc)
 	return 0;
 }
 
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
+{
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+	return bnx2x_setup_tc(dev, num_tc);
+}
+
 /* called with rtnl_lock */
 int bnx2x_change_mac_addr(struct net_device *dev, void *p)
 {
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 3736986..60a4109 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -486,6 +486,7 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev);
 
 /* setup_tc callback */
 int bnx2x_setup_tc(struct net_device *dev, u8 num_tc);
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc);
 
 int bnx2x_get_vf_config(struct net_device *dev, int vf,
 			struct ifla_vf_info *ivi);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index c584525..81fc51c 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -13061,7 +13061,7 @@ static const struct net_device_ops bnx2x_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= poll_bnx2x,
 #endif
-	.ndo_setup_tc		= bnx2x_setup_tc,
+	.ndo_setup_tc		= __bnx2x_setup_tc,
 #ifdef CONFIG_BNX2X_SRIOV
 	.ndo_set_vf_mac		= bnx2x_set_vf_mac,
 	.ndo_set_vf_vlan	= bnx2x_set_vf_vlan,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5dc89e5..ff08faf 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5370,10 +5370,13 @@ static int bnxt_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
-static int bnxt_setup_tc(struct net_device *dev, u8 tc)
+static int bnxt_setup_tc(struct net_device *dev, u32 handle, u8 tc)
 {
 	struct bnxt *bp = netdev_priv(dev);
 
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
 	if (tc > bp->max_tc) {
 		netdev_err(dev, "too many traffic classes requested: %d Max supported is %d\n",
 			   tc, bp->max_tc);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 662569d..12701a4 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1204,6 +1204,14 @@ err_queueing_scheme:
 	return err;
 }
 
+static int __fm10k_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+{
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
+	return fm10k_setup_tc(dev, tc);
+}
+
 static int fm10k_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
 {
 	switch (cmd) {
@@ -1386,7 +1394,7 @@ static const struct net_device_ops fm10k_netdev_ops = {
 	.ndo_vlan_rx_kill_vid	= fm10k_vlan_rx_kill_vid,
 	.ndo_set_rx_mode	= fm10k_set_rx_mode,
 	.ndo_get_stats64	= fm10k_get_stats64,
-	.ndo_setup_tc		= fm10k_setup_tc,
+	.ndo_setup_tc		= __fm10k_setup_tc,
 	.ndo_set_vf_mac		= fm10k_ndo_set_vf_mac,
 	.ndo_set_vf_vlan	= fm10k_ndo_set_vf_vlan,
 	.ndo_set_vf_rate	= fm10k_ndo_set_vf_bw,
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 53ed3bd..ef9ca07 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -788,7 +788,7 @@ struct i40e_mac_filter *i40e_find_mac(struct i40e_vsi *vsi, u8 *macaddr,
 				      bool is_vf, bool is_netdev);
 #ifdef I40E_FCOE
 int i40e_close(struct net_device *netdev);
-int i40e_setup_tc(struct net_device *netdev, u8 tc);
+int __i40e_setup_tc(struct net_device *netdev, u32 handle, u8 tc);
 void i40e_netpoll(struct net_device *netdev);
 int i40e_fcoe_enable(struct net_device *netdev);
 int i40e_fcoe_disable(struct net_device *netdev);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_fcoe.c b/drivers/net/ethernet/intel/i40e/i40e_fcoe.c
index 579a46c..7c66ce4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_fcoe.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_fcoe.c
@@ -1457,7 +1457,7 @@ static const struct net_device_ops i40e_fcoe_netdev_ops = {
 	.ndo_tx_timeout		= i40e_tx_timeout,
 	.ndo_vlan_rx_add_vid	= i40e_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid	= i40e_vlan_rx_kill_vid,
-	.ndo_setup_tc		= i40e_setup_tc,
+	.ndo_setup_tc		= __i40e_setup_tc,
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= i40e_netpoll,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 320b049..abcb6c1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5253,11 +5253,7 @@ void i40e_down(struct i40e_vsi *vsi)
  * @netdev: net device to configure
  * @tc: number of traffic classes to enable
  **/
-#ifdef I40E_FCOE
-int i40e_setup_tc(struct net_device *netdev, u8 tc)
-#else
 static int i40e_setup_tc(struct net_device *netdev, u8 tc)
-#endif
 {
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_vsi *vsi = np->vsi;
@@ -5310,6 +5306,17 @@ exit:
 	return ret;
 }
 
+#ifdef I40E_FCOE
+int __i40e_setup_tc(struct net_device *netdev, u32 handle, u8 tc)
+#else
+static int __i40e_setup_tc(struct net_device *netdev, u32 handle, u8 tc)
+#endif
+{
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+	return i40e_setup_tc(netdev, tc);
+}
+
 /**
  * i40e_open - Called when a network interface is made active
  * @netdev: network interface device structure
@@ -8951,7 +8958,7 @@ static const struct net_device_ops i40e_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= i40e_netpoll,
 #endif
-	.ndo_setup_tc		= i40e_setup_tc,
+	.ndo_setup_tc		= __i40e_setup_tc,
 #ifdef I40E_FCOE
 	.ndo_fcoe_enable	= i40e_fcoe_enable,
 	.ndo_fcoe_disable	= i40e_fcoe_disable,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0c701b8..1ba714e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8200,6 +8200,15 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	return 0;
 }
 
+int __ixgbe_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+{
+	/* Only support egress tc setup for now */
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
+	return ixgbe_setup_tc(dev, tc);
+}
+
 #ifdef CONFIG_PCI_IOV
 void ixgbe_sriov_reinit(struct ixgbe_adapter *adapter)
 {
@@ -8658,7 +8667,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_get_vf_config	= ixgbe_ndo_get_vf_config,
 	.ndo_get_stats64	= ixgbe_get_stats64,
 #ifdef CONFIG_IXGBE_DCB
-	.ndo_setup_tc		= ixgbe_setup_tc,
+	.ndo_setup_tc		= __ixgbe_setup_tc,
 #endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= ixgbe_netpoll,
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 0c7e3f6..d5c6c16 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -69,6 +69,14 @@ int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 	return 0;
 }
 
+static int __mlx4_en_setup_tc(struct net_device *dev, u32 handle, u8 up)
+{
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
+	return mlx4_en_setup_tc(dev, up);
+}
+
 #ifdef CONFIG_RFS_ACCEL
 
 struct mlx4_en_filter {
@@ -2466,7 +2474,7 @@ static const struct net_device_ops mlx4_netdev_ops = {
 #endif
 	.ndo_set_features	= mlx4_en_set_features,
 	.ndo_fix_features	= mlx4_en_fix_features,
-	.ndo_setup_tc		= mlx4_en_setup_tc,
+	.ndo_setup_tc		= __mlx4_en_setup_tc,
 #ifdef CONFIG_RFS_ACCEL
 	.ndo_rx_flow_steer	= mlx4_en_filter_rfs,
 #endif
@@ -2504,7 +2512,7 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
 #endif
 	.ndo_set_features	= mlx4_en_set_features,
 	.ndo_fix_features	= mlx4_en_fix_features,
-	.ndo_setup_tc		= mlx4_en_setup_tc,
+	.ndo_setup_tc		= __mlx4_en_setup_tc,
 #ifdef CONFIG_RFS_ACCEL
 	.ndo_rx_flow_steer	= mlx4_en_filter_rfs,
 #endif
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 1082747..7815fa0 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -32,7 +32,7 @@ netdev_tx_t efx_hard_start_xmit(struct sk_buff *skb,
 				struct net_device *net_dev);
 netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb);
 void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index);
-int efx_setup_tc(struct net_device *net_dev, u8 num_tc);
+int efx_setup_tc(struct net_device *net_dev, u32 handle, u8 num_tc);
 unsigned int efx_tx_max_skb_descs(struct efx_nic *efx);
 extern unsigned int efx_piobuf_size;
 extern bool efx_separate_tx_channels;
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index f7a0ec1..8f1d53e 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -562,7 +562,7 @@ void efx_init_tx_queue_core_txq(struct efx_tx_queue *tx_queue)
 				     efx->n_tx_channels : 0));
 }
 
-int efx_setup_tc(struct net_device *net_dev, u8 num_tc)
+int efx_setup_tc(struct net_device *net_dev, u32 handle, u8 num_tc)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
 	struct efx_channel *channel;
@@ -570,6 +570,9 @@ int efx_setup_tc(struct net_device *net_dev, u8 num_tc)
 	unsigned tc;
 	int rc;
 
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
 	if (efx_nic_rev(efx) < EFX_REV_FALCON_B0 || num_tc > EFX_MAX_TX_TC)
 		return -EINVAL;
 
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index c61d66d..40cde81 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -1835,13 +1835,16 @@ static u16 netcp_select_queue(struct net_device *dev, struct sk_buff *skb,
 	return 0;
 }
 
-static int netcp_setup_tc(struct net_device *dev, u8 num_tc)
+static int netcp_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
 {
 	int i;
 
 	/* setup tc must be called under rtnl lock */
 	ASSERT_RTNL();
 
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
 	/* Sanity-check the number of traffic classes requested */
 	if ((dev->real_num_tx_queues <= 1) ||
 	    (dev->real_num_tx_queues < num_tc))
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0499569..48928b6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -51,6 +51,7 @@
 #include <linux/neighbour.h>
 #include <uapi/linux/netdevice.h>
 #include <uapi/linux/if_bonding.h>
+#include <uapi/linux/pkt_cls.h>
 
 struct netpoll_info;
 struct device;
@@ -1150,7 +1151,7 @@ struct net_device_ops {
 	int			(*ndo_set_vf_rss_query_en)(
 						   struct net_device *dev,
 						   int vf, bool setting);
-	int			(*ndo_setup_tc)(struct net_device *dev, u8 tc);
+	int			(*ndo_setup_tc)(struct net_device *dev, u32 handle, u8 tc);
 #if IS_ENABLED(CONFIG_FCOE)
 	int			(*ndo_fcoe_enable)(struct net_device *dev);
 	int			(*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index ad70ecf..f5a0e8a 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -39,7 +39,7 @@ static void mqprio_destroy(struct Qdisc *sch)
 	}
 
 	if (priv->hw_owned && dev->netdev_ops->ndo_setup_tc)
-		dev->netdev_ops->ndo_setup_tc(dev, 0);
+		dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0);
 	else
 		netdev_set_num_tc(dev, 0);
 }
@@ -141,7 +141,8 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	 */
 	if (qopt->hw) {
 		priv->hw_owned = 1;
-		err = dev->netdev_ops->ndo_setup_tc(dev, qopt->num_tc);
+		err = dev->netdev_ops->ndo_setup_tc(dev, sch->handle,
+						    qopt->num_tc);
 		if (err)
 			goto err;
 	} else {

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [net-next PATCH v3 2/8] net: rework setup_tc ndo op to consume general tc operand
  2016-02-17  5:15 [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe John Fastabend
  2016-02-17  5:16 ` [net-next PATCH v3 1/8] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
@ 2016-02-17  5:16 ` John Fastabend
  2016-02-17 10:42   ` Jamal Hadi Salim
  2016-02-17  5:17 ` [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: John Fastabend @ 2016-02-17  5:16 UTC (permalink / raw)
  To: jiri, amir, john.fastabend, jhs, davem; +Cc: netdev, jeffrey.t.kirsher

This patch updates setup_tc so we can pass additional parameters into
the ndo op in a generic way. To do this we provide structured union
and type flag.

This lets each classifier and qdisc provide its own set of attributes
without having to add new ndo ops or grow the signature of the
callback.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c        |    9 ++++++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |    7 ++++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |    3 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       |    8 ++++++--
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |    7 ++++---
 drivers/net/ethernet/intel/i40e/i40e.h          |    3 ++-
 drivers/net/ethernet/intel/i40e/i40e_main.c     |   10 ++++++----
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |    7 ++++---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |    7 ++++---
 drivers/net/ethernet/sfc/efx.h                  |    3 ++-
 drivers/net/ethernet/sfc/tx.c                   |    9 ++++++---
 drivers/net/ethernet/ti/netcp_core.c            |   13 +++++++------
 include/linux/netdevice.h                       |   20 +++++++++++++++++++-
 net/sched/sch_mqprio.c                          |    9 ++++++---
 14 files changed, 78 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 9955cae..cfd3f7e 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1626,15 +1626,18 @@ static void xgbe_poll_controller(struct net_device *netdev)
 }
 #endif /* End CONFIG_NET_POLL_CONTROLLER */
 
-static int xgbe_setup_tc(struct net_device *netdev, u32 handle, u8 tc)
+static int xgbe_setup_tc(struct net_device *netdev, u32 handle, __be16 proto,
+			 struct tc_to_netdev *tc_to_netdev)
 {
 	struct xgbe_prv_data *pdata = netdev_priv(netdev);
 	unsigned int offset, queue;
-	u8 i;
+	u8 i, tc;
 
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc_to_netdev->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
+	tc = tc_to_netdev->tc;
+
 	if (tc && (tc != pdata->hw_feat.tc_cnt))
 		return -EINVAL;
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index b262cba..45843d1 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4272,11 +4272,12 @@ int bnx2x_setup_tc(struct net_device *dev, u8 num_tc)
 	return 0;
 }
 
-int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+		     struct tc_to_netdev *tc)
 {
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
-	return bnx2x_setup_tc(dev, num_tc);
+	return bnx2x_setup_tc(dev, tc->tc);
 }
 
 /* called with rtnl_lock */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 60a4109..0e68fad 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -486,7 +486,8 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev);
 
 /* setup_tc callback */
 int bnx2x_setup_tc(struct net_device *dev, u8 num_tc);
-int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc);
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+		     struct tc_to_netdev *tc);
 
 int bnx2x_get_vf_config(struct net_device *dev, int vf,
 			struct ifla_vf_info *ivi);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index ff08faf..169920a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5370,13 +5370,17 @@ static int bnxt_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
-static int bnxt_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			 struct tc_to_netdev *ntc)
 {
 	struct bnxt *bp = netdev_priv(dev);
+	u8 tc;
 
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || ntc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
+	tc = ntc->tc;
+
 	if (tc > bp->max_tc) {
 		netdev_err(dev, "too many traffic classes requested: %d Max supported is %d\n",
 			   tc, bp->max_tc);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 12701a4..dc1a821 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1204,12 +1204,13 @@ err_queueing_scheme:
 	return err;
 }
 
-static int __fm10k_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+static int __fm10k_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			    struct tc_to_netdev *tc)
 {
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
-	return fm10k_setup_tc(dev, tc);
+	return fm10k_setup_tc(dev, tc->tc);
 }
 
 static int fm10k_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index ef9ca07..933c4b3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -788,7 +788,8 @@ struct i40e_mac_filter *i40e_find_mac(struct i40e_vsi *vsi, u8 *macaddr,
 				      bool is_vf, bool is_netdev);
 #ifdef I40E_FCOE
 int i40e_close(struct net_device *netdev);
-int __i40e_setup_tc(struct net_device *netdev, u32 handle, u8 tc);
+int __i40e_setup_tc(struct net_device *netdev, u32 handle, __be16 proto,
+		    struct tc_to_netdev *tc);
 void i40e_netpoll(struct net_device *netdev);
 int i40e_fcoe_enable(struct net_device *netdev);
 int i40e_fcoe_disable(struct net_device *netdev);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index abcb6c1..257d162 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5307,14 +5307,16 @@ exit:
 }
 
 #ifdef I40E_FCOE
-int __i40e_setup_tc(struct net_device *netdev, u32 handle, u8 tc)
+int __i40e_setup_tc(struct net_device *netdev, u32 handle, __be16 proto,
+		    struct tc_to_netdev *tc)
 #else
-static int __i40e_setup_tc(struct net_device *netdev, u32 handle, u8 tc)
+static int __i40e_setup_tc(struct net_device *netdev, u32 handle, __be16 proto,
+			   struct tc_to_netdev *tc)
 #endif
 {
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
-	return i40e_setup_tc(netdev, tc);
+	return i40e_setup_tc(netdev, tc->tc);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 1ba714e..dca2298 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8200,13 +8200,14 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	return 0;
 }
 
-int __ixgbe_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+int __ixgbe_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+		     struct tc_to_netdev *tc)
 {
 	/* Only support egress tc setup for now */
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
-	return ixgbe_setup_tc(dev, tc);
+	return ixgbe_setup_tc(dev, tc->tc);
 }
 
 #ifdef CONFIG_PCI_IOV
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index d5c6c16..01d6a96 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -69,12 +69,13 @@ int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 	return 0;
 }
 
-static int __mlx4_en_setup_tc(struct net_device *dev, u32 handle, u8 up)
+static int __mlx4_en_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			      struct tc_to_netdev *tc)
 {
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
-	return mlx4_en_setup_tc(dev, up);
+	return mlx4_en_setup_tc(dev, tc->tc);
 }
 
 #ifdef CONFIG_RFS_ACCEL
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 7815fa0..5e3f93f 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -32,7 +32,8 @@ netdev_tx_t efx_hard_start_xmit(struct sk_buff *skb,
 				struct net_device *net_dev);
 netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb);
 void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index);
-int efx_setup_tc(struct net_device *net_dev, u32 handle, u8 num_tc);
+int efx_setup_tc(struct net_device *net_dev, u32 handle, __be16 proto,
+		 struct tc_to_netdev *tc);
 unsigned int efx_tx_max_skb_descs(struct efx_nic *efx);
 extern unsigned int efx_piobuf_size;
 extern bool efx_separate_tx_channels;
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index 8f1d53e..2cdb571 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -562,17 +562,20 @@ void efx_init_tx_queue_core_txq(struct efx_tx_queue *tx_queue)
 				     efx->n_tx_channels : 0));
 }
 
-int efx_setup_tc(struct net_device *net_dev, u32 handle, u8 num_tc)
+int efx_setup_tc(struct net_device *net_dev, u32 handle, __be16 proto,
+		 struct tc_to_netdev *ntc)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
 	struct efx_channel *channel;
 	struct efx_tx_queue *tx_queue;
-	unsigned tc;
+	unsigned tc, num_tc;
 	int rc;
 
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || ntc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
+	num_tc = ntc->tc;
+
 	if (efx_nic_rev(efx) < EFX_REV_FALCON_B0 || num_tc > EFX_MAX_TX_TC)
 		return -EINVAL;
 
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index 40cde81..8586a20 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -1835,25 +1835,26 @@ static u16 netcp_select_queue(struct net_device *dev, struct sk_buff *skb,
 	return 0;
 }
 
-static int netcp_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
+static int netcp_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			  struct tc_to_netdev tc)
 {
 	int i;
 
 	/* setup tc must be called under rtnl lock */
 	ASSERT_RTNL();
 
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
 	/* Sanity-check the number of traffic classes requested */
 	if ((dev->real_num_tx_queues <= 1) ||
-	    (dev->real_num_tx_queues < num_tc))
+	    (dev->real_num_tx_queues < tc->tc))
 		return -EINVAL;
 
 	/* Configure traffic class to queue mappings */
-	if (num_tc) {
-		netdev_set_num_tc(dev, num_tc);
-		for (i = 0; i < num_tc; i++)
+	if (tc->tc) {
+		netdev_set_num_tc(dev, tc->tc);
+		for (i = 0; i < tc->tc; i++)
 			netdev_set_tc_queue(dev, i, 1, i);
 	} else {
 		netdev_reset_tc(dev);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 48928b6..e396060 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -779,6 +779,21 @@ static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
 typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
 				       struct sk_buff *skb);
 
+/* This structure holds attributes of qdisc and classifiers
+ * that are being passed to the netdevice through the setup_tc op.
+ */
+enum {
+	TC_SETUP_MQPRIO,
+};
+
+struct tc_to_netdev {
+	unsigned int type;
+	union {
+		u8 tc;
+	};
+};
+
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1151,7 +1166,10 @@ struct net_device_ops {
 	int			(*ndo_set_vf_rss_query_en)(
 						   struct net_device *dev,
 						   int vf, bool setting);
-	int			(*ndo_setup_tc)(struct net_device *dev, u32 handle, u8 tc);
+	int			(*ndo_setup_tc)(struct net_device *dev,
+						u32 handle,
+						__be16 protocol,
+						struct tc_to_netdev *tc);
 #if IS_ENABLED(CONFIG_FCOE)
 	int			(*ndo_fcoe_enable)(struct net_device *dev);
 	int			(*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index f5a0e8a..f9947d1 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -28,6 +28,7 @@ static void mqprio_destroy(struct Qdisc *sch)
 {
 	struct net_device *dev = qdisc_dev(sch);
 	struct mqprio_sched *priv = qdisc_priv(sch);
+	struct tc_to_netdev tc = {.type = TC_SETUP_MQPRIO};
 	unsigned int ntx;
 
 	if (priv->qdiscs) {
@@ -39,7 +40,7 @@ static void mqprio_destroy(struct Qdisc *sch)
 	}
 
 	if (priv->hw_owned && dev->netdev_ops->ndo_setup_tc)
-		dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0);
+		dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0, &tc);
 	else
 		netdev_set_num_tc(dev, 0);
 }
@@ -140,9 +141,11 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	 * supplied and verified mapping
 	 */
 	if (qopt->hw) {
+		struct tc_to_netdev tc = {.type = TC_SETUP_MQPRIO,
+					  .tc = qopt->num_tc};
+
 		priv->hw_owned = 1;
-		err = dev->netdev_ops->ndo_setup_tc(dev, sch->handle,
-						    qopt->num_tc);
+		err = dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0, &tc);
 		if (err)
 			goto err;
 	} else {

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-17  5:15 [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe John Fastabend
  2016-02-17  5:16 ` [net-next PATCH v3 1/8] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
  2016-02-17  5:16 ` [net-next PATCH v3 2/8] net: rework setup_tc ndo op to consume general tc operand John Fastabend
@ 2016-02-17  5:17 ` John Fastabend
  2016-02-17  7:02   ` Jiri Pirko
  2016-02-17 10:59   ` Jamal Hadi Salim
  2016-02-17  5:17 ` [net-next PATCH v3 4/8] net: add tc offload feature flag John Fastabend
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-17  5:17 UTC (permalink / raw)
  To: jiri, amir, john.fastabend, jhs, davem; +Cc: netdev, jeffrey.t.kirsher

This patch allows netdev drivers to consume cls_u32 offloads via
the ndo_setup_tc ndo op.

This works aligns with how network drivers have been doing qdisc
offloads for mqprio.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/netdevice.h |    6 ++-
 include/net/pkt_cls.h     |   34 +++++++++++++++
 net/sched/cls_u32.c       |   99 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 136 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e396060..47671ce0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -779,17 +779,21 @@ static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
 typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
 				       struct sk_buff *skb);
 
-/* This structure holds attributes of qdisc and classifiers
+/* These structures hold the attributes of qdisc and classifiers
  * that are being passed to the netdevice through the setup_tc op.
  */
 enum {
 	TC_SETUP_MQPRIO,
+	TC_SETUP_CLSU32,
 };
 
+struct tc_cls_u32_offload;
+
 struct tc_to_netdev {
 	unsigned int type;
 	union {
 		u8 tc;
+		struct tc_cls_u32_offload *cls_u32;
 	};
 };
 
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index bc49967..59789ca 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -358,4 +358,38 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 }
 #endif /* CONFIG_NET_CLS_IND */
 
+struct tc_cls_u32_knode {
+	struct tcf_exts *exts;
+	u8 fshift;
+	u32 handle;
+	u32 val;
+	u32 mask;
+	u32 link_handle;
+	struct tc_u32_sel *sel;
+};
+
+struct tc_cls_u32_hnode {
+	u32 handle;
+	u32 prio;
+	unsigned int divisor;
+};
+
+enum tc_clsu32_command {
+	TC_CLSU32_NEW_KNODE,
+	TC_CLSU32_REPLACE_KNODE,
+	TC_CLSU32_DELETE_KNODE,
+	TC_CLSU32_NEW_HNODE,
+	TC_CLSU32_REPLACE_HNODE,
+	TC_CLSU32_DELETE_HNODE,
+};
+
+struct tc_cls_u32_offload {
+	/* knode values */
+	enum tc_clsu32_command command;
+	union {
+		struct tc_cls_u32_knode knode;
+		struct tc_cls_u32_hnode hnode;
+	};
+};
+
 #endif
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 4fbb674..d54bc94 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -43,6 +43,7 @@
 #include <net/netlink.h>
 #include <net/act_api.h>
 #include <net/pkt_cls.h>
+#include <linux/netdevice.h>
 
 struct tc_u_knode {
 	struct tc_u_knode __rcu	*next;
@@ -424,6 +425,93 @@ static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
 	return 0;
 }
 
+static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle)
+{
+	struct net_device *dev = tp->q->dev_queue->dev;
+	struct tc_cls_u32_offload u32_offload = {0};
+	struct tc_to_netdev offload;
+
+	offload.type = TC_SETUP_CLSU32;
+	offload.cls_u32 = &u32_offload;
+
+	if (dev->netdev_ops->ndo_setup_tc) {
+		offload.cls_u32->command = TC_CLSU32_DELETE_KNODE;
+		offload.cls_u32->knode.handle = handle;
+		dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+					      tp->protocol, &offload);
+	}
+}
+
+static void u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
+{
+	struct net_device *dev = tp->q->dev_queue->dev;
+	struct tc_cls_u32_offload u32_offload = {0};
+	struct tc_to_netdev offload;
+
+	offload.type = TC_SETUP_CLSU32;
+	offload.cls_u32 = &u32_offload;
+
+	if (dev->netdev_ops->ndo_setup_tc) {
+		offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
+		offload.cls_u32->hnode.divisor = h->divisor;
+		offload.cls_u32->hnode.handle = h->handle;
+		offload.cls_u32->hnode.prio = h->prio;
+
+		dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+					      tp->protocol, &offload);
+	}
+}
+
+static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
+{
+	struct net_device *dev = tp->q->dev_queue->dev;
+	struct tc_cls_u32_offload u32_offload = {0};
+	struct tc_to_netdev offload;
+
+	offload.type = TC_SETUP_CLSU32;
+	offload.cls_u32 = &u32_offload;
+
+	if (dev->netdev_ops->ndo_setup_tc) {
+		offload.cls_u32->command = TC_CLSU32_DELETE_HNODE;
+		offload.cls_u32->hnode.divisor = h->divisor;
+		offload.cls_u32->hnode.handle = h->handle;
+		offload.cls_u32->hnode.prio = h->prio;
+
+		dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+					      tp->protocol, &offload);
+	}
+}
+
+static void u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n)
+{
+	struct net_device *dev = tp->q->dev_queue->dev;
+	struct tc_cls_u32_offload u32_offload = {0};
+	struct tc_to_netdev offload;
+
+	offload.type = TC_SETUP_CLSU32;
+	offload.cls_u32 = &u32_offload;
+
+	if (dev->netdev_ops->ndo_setup_tc) {
+		offload.cls_u32->command = TC_CLSU32_REPLACE_KNODE;
+		offload.cls_u32->knode.handle = n->handle;
+		offload.cls_u32->knode.fshift = n->fshift;
+#ifdef CONFIG_CLS_U32_MARK
+		offload.cls_u32->knode.val = n->val;
+		offload.cls_u32->knode.mask = n->mask;
+#else
+		offload.cls_u32->knode.val = 0;
+		offload.cls_u32->knode.mask = 0;
+#endif
+		offload.cls_u32->knode.sel = &n->sel;
+		offload.cls_u32->knode.exts = &n->exts;
+		if (n->ht_down)
+			offload.cls_u32->knode.link_handle = n->ht_down->handle;
+
+		dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+					      tp->protocol, &offload);
+	}
+}
+
 static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 {
 	struct tc_u_knode *n;
@@ -434,6 +522,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 			RCU_INIT_POINTER(ht->ht[h],
 					 rtnl_dereference(n->next));
 			tcf_unbind_filter(tp, &n->res);
+			u32_remove_hw_knode(tp, n->handle);
 			call_rcu(&n->rcu, u32_delete_key_freepf_rcu);
 		}
 	}
@@ -454,6 +543,7 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 	     phn;
 	     hn = &phn->next, phn = rtnl_dereference(*hn)) {
 		if (phn == ht) {
+			u32_clear_hw_hnode(tp, ht);
 			RCU_INIT_POINTER(*hn, ht->next);
 			kfree_rcu(ht, rcu);
 			return 0;
@@ -540,8 +630,10 @@ static int u32_delete(struct tcf_proto *tp, unsigned long arg)
 	if (ht == NULL)
 		return 0;
 
-	if (TC_U32_KEY(ht->handle))
+	if (TC_U32_KEY(ht->handle)) {
+		u32_remove_hw_knode(tp, ht->handle);
 		return u32_delete_key(tp, (struct tc_u_knode *)ht);
+	}
 
 	if (root_ht == ht)
 		return -EINVAL;
@@ -769,6 +861,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 		u32_replace_knode(tp, tp_c, new);
 		tcf_unbind_filter(tp, &n->res);
 		call_rcu(&n->rcu, u32_delete_key_rcu);
+		u32_replace_hw_knode(tp, new);
 		return 0;
 	}
 
@@ -795,6 +888,8 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 		RCU_INIT_POINTER(ht->next, tp_c->hlist);
 		rcu_assign_pointer(tp_c->hlist, ht);
 		*arg = (unsigned long)ht;
+
+		u32_replace_hw_hnode(tp, ht);
 		return 0;
 	}
 
@@ -877,7 +972,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 
 		RCU_INIT_POINTER(n->next, pins);
 		rcu_assign_pointer(*ins, n);
-
+		u32_replace_hw_knode(tp, n);
 		*arg = (unsigned long)n;
 		return 0;
 	}

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [net-next PATCH v3 4/8] net: add tc offload feature flag
  2016-02-17  5:15 [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe John Fastabend
                   ` (2 preceding siblings ...)
  2016-02-17  5:17 ` [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
@ 2016-02-17  5:17 ` John Fastabend
  2016-02-17 11:01   ` Jamal Hadi Salim
  2016-02-17  5:18 ` [net-next PATCH v3 5/8] net: tc: helper functions to query action types John Fastabend
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 34+ messages in thread
From: John Fastabend @ 2016-02-17  5:17 UTC (permalink / raw)
  To: jiri, amir, john.fastabend, jhs, davem; +Cc: netdev, jeffrey.t.kirsher

Its useful to turn off the qdisc offload feature at a per device
level. This gives us a big hammer to enable/disable offloading.
More fine grained control (i.e. per rule) may be supported later.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/netdev_features.h |    3 +++
 net/core/ethtool.c              |    1 +
 2 files changed, 4 insertions(+)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index d9654f0e..a734bf4 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -67,6 +67,8 @@ enum {
 	NETIF_F_HW_L2FW_DOFFLOAD_BIT,	/* Allow L2 Forwarding in Hardware */
 	NETIF_F_BUSY_POLL_BIT,		/* Busy poll */
 
+	NETIF_F_HW_TC_BIT,		/* Offload TC infrastructure */
+
 	/*
 	 * Add your fresh new feature above and remember to update
 	 * netdev_features_strings[] in net/core/ethtool.c and maybe
@@ -124,6 +126,7 @@ enum {
 #define NETIF_F_HW_VLAN_STAG_TX	__NETIF_F(HW_VLAN_STAG_TX)
 #define NETIF_F_HW_L2FW_DOFFLOAD	__NETIF_F(HW_L2FW_DOFFLOAD)
 #define NETIF_F_BUSY_POLL	__NETIF_F(BUSY_POLL)
+#define NETIF_F_HW_TC		__NETIF_F(HW_TC)
 
 #define for_each_netdev_feature(mask_addr, bit)	\
 	for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 65f907a..c2d3118 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -98,6 +98,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
 	[NETIF_F_RXALL_BIT] =            "rx-all",
 	[NETIF_F_HW_L2FW_DOFFLOAD_BIT] = "l2-fwd-offload",
 	[NETIF_F_BUSY_POLL_BIT] =        "busy-poll",
+	[NETIF_F_HW_TC_BIT] =		 "hw-tc-offload",
 };
 
 static const char

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [net-next PATCH v3 5/8] net: tc: helper functions to query action types
  2016-02-17  5:15 [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe John Fastabend
                   ` (3 preceding siblings ...)
  2016-02-17  5:17 ` [net-next PATCH v3 4/8] net: add tc offload feature flag John Fastabend
@ 2016-02-17  5:18 ` John Fastabend
  2016-02-17  7:03   ` Jiri Pirko
  2016-02-17 11:02   ` Jamal Hadi Salim
  2016-02-17  5:18 ` [net-next PATCH v3 6/8] net: ixgbe: add minimal parser details for ixgbe John Fastabend
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-17  5:18 UTC (permalink / raw)
  To: jiri, amir, john.fastabend, jhs, davem; +Cc: netdev, jeffrey.t.kirsher

This is a helper function drivers can use to learn if the
action type is a drop action.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/tc_act/tc_gact.h |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/net/tc_act/tc_gact.h b/include/net/tc_act/tc_gact.h
index 592a6bc..04a3183 100644
--- a/include/net/tc_act/tc_gact.h
+++ b/include/net/tc_act/tc_gact.h
@@ -2,6 +2,7 @@
 #define __NET_TC_GACT_H
 
 #include <net/act_api.h>
+#include <linux/tc_act/tc_gact.h>
 
 struct tcf_gact {
 	struct tcf_common	common;
@@ -15,4 +16,19 @@ struct tcf_gact {
 #define to_gact(a) \
 	container_of(a->priv, struct tcf_gact, common)
 
+#ifdef CONFIG_NET_CLS_ACT
+static inline bool is_tcf_gact_shot(const struct tc_action *a)
+{
+	struct tcf_gact *gact;
+
+	if (a->ops && a->ops->type != TCA_ACT_GACT)
+		return false;
+
+	gact = a->priv;
+	if (gact->tcf_action == TC_ACT_SHOT)
+		return true;
+
+	return false;
+}
+#endif
 #endif /* __NET_TC_GACT_H */

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [net-next PATCH v3 6/8] net: ixgbe: add minimal parser details for ixgbe
  2016-02-17  5:15 [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe John Fastabend
                   ` (4 preceding siblings ...)
  2016-02-17  5:18 ` [net-next PATCH v3 5/8] net: tc: helper functions to query action types John Fastabend
@ 2016-02-17  5:18 ` John Fastabend
  2016-02-17 11:06   ` Jamal Hadi Salim
  2016-02-17 18:01   ` Rustad, Mark D
  2016-02-17  5:18 ` [net-next PATCH v3 7/8] net: ixgbe: add support for tc_u32 offload John Fastabend
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-17  5:18 UTC (permalink / raw)
  To: jiri, amir, john.fastabend, jhs, davem; +Cc: netdev, jeffrey.t.kirsher

This adds an ixgbe data structure that is used to determine what
headers:fields can be matched and in what order they are supported.

For hardware devices this can be a bit tricky because typically
only pre-programmed (firmware, ucode, rtl) parse graphs will be
supported and we don't yet have an interface to change these from
the OS. So its sort of a you get whatever your friendly vendor
provides affair at the moment.

In the future we can add the get routines and set routines to
update this data structure. One interesting thing to note here
is the data structure here identifies ethernet, ip, and tcp
fields without having to hardcode them as enumerations or use
other identifiers.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h |  112 ++++++++++++++++++++++++
 1 file changed, 112 insertions(+)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
new file mode 100644
index 0000000..43ebec4
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
@@ -0,0 +1,112 @@
+/*******************************************************************************
+ *
+ * Intel 10 Gigabit PCI Express Linux drive
+ * Copyright(c) 2013 - 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Contact Information:
+ * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+ * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+ *
+ ******************************************************************************/
+
+#ifndef _IXGBE_MODEL_H_
+#define _IXGBE_MODEL_H_
+
+#include "ixgbe.h"
+#include "ixgbe_type.h"
+
+struct ixgbe_mat_field {
+	unsigned int off;
+	unsigned int mask;
+	int (*val)(struct ixgbe_fdir_filter *input,
+		   union ixgbe_atr_input *mask,
+		   __u32 val, __u32 m);
+	unsigned int type;
+};
+
+static inline int ixgbe_mat_prgm_sip(struct ixgbe_fdir_filter *input,
+				     union ixgbe_atr_input *mask,
+				     __u32 val, __u32 m)
+{
+	input->filter.formatted.src_ip[0] = val;
+	mask->formatted.src_ip[0] = m;
+	return 0;
+}
+
+static inline int ixgbe_mat_prgm_dip(struct ixgbe_fdir_filter *input,
+				     union ixgbe_atr_input *mask,
+				     __u32 val, __u32 m)
+{
+	input->filter.formatted.dst_ip[0] = val;
+	mask->formatted.dst_ip[0] = m;
+	return 0;
+}
+
+static struct ixgbe_mat_field ixgbe_ipv4_fields[] = {
+	{ .off = 12, .mask = -1, .val = ixgbe_mat_prgm_sip,
+	  .type = IXGBE_ATR_FLOW_TYPE_IPV4},
+	{ .off = 16, .mask = -1, .val = ixgbe_mat_prgm_dip,
+	  .type = IXGBE_ATR_FLOW_TYPE_IPV4},
+	{ .val = NULL } /* terminal node */
+};
+
+static inline int ixgbe_mat_prgm_sport(struct ixgbe_fdir_filter *input,
+				       union ixgbe_atr_input *mask,
+				       __u32 val, __u32 m)
+{
+	input->filter.formatted.src_port = val & 0xffff;
+	mask->formatted.src_port = m & 0xffff;
+	return 0;
+};
+
+static inline int ixgbe_mat_prgm_dport(struct ixgbe_fdir_filter *input,
+				       union ixgbe_atr_input *mask,
+				       __u32 val, __u32 m)
+{
+	input->filter.formatted.dst_port = val & 0xffff;
+	mask->formatted.dst_port = m & 0xffff;
+	return 0;
+};
+
+static struct ixgbe_mat_field ixgbe_tcp_fields[] = {
+	{.off = 0, .mask = 0xffff, .val = ixgbe_mat_prgm_sport,
+	 .type = IXGBE_ATR_FLOW_TYPE_TCPV4},
+	{.off = 2, .mask = 0xffff, .val = ixgbe_mat_prgm_dport,
+	 .type = IXGBE_ATR_FLOW_TYPE_TCPV4},
+	{ .val = NULL } /* terminal node */
+};
+
+struct ixgbe_nexthdr {
+	/* offset, shift, and mask of position to next header */
+	unsigned int o;
+	__u32 s;
+	__u32 m;
+	/* match criteria to make this jump*/
+	unsigned int off;
+	__u32 val;
+	__u32 mask;
+	/* location of jump to make */
+	struct ixgbe_mat_field *jump;
+};
+
+static struct ixgbe_nexthdr ixgbe_ipv4_jumps[] = {
+	{ .o = 0, .s = 6, .m = 0xf,
+	  .off = 8, .val = 0x600, .mask = 0xff00, .jump = ixgbe_tcp_fields},
+	{ .jump = NULL } /* terminal node */
+};
+#endif /* _IXGBE_MODEL_H_ */

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [net-next PATCH v3 7/8] net: ixgbe: add support for tc_u32 offload
  2016-02-17  5:15 [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe John Fastabend
                   ` (5 preceding siblings ...)
  2016-02-17  5:18 ` [net-next PATCH v3 6/8] net: ixgbe: add minimal parser details for ixgbe John Fastabend
@ 2016-02-17  5:18 ` John Fastabend
  2016-02-17 11:17   ` Jamal Hadi Salim
  2016-02-17  5:19 ` [net-next PATCH v3 8/8] net: ixgbe: abort with cls u32 divisor groups greater than 1 John Fastabend
  2016-02-17 14:48 ` [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe David Miller
  8 siblings, 1 reply; 34+ messages in thread
From: John Fastabend @ 2016-02-17  5:18 UTC (permalink / raw)
  To: jiri, amir, john.fastabend, jhs, davem; +Cc: netdev, jeffrey.t.kirsher

This adds initial support for offloading the u32 tc classifier. This
initial implementation only implements a few base matches and actions
to illustrate the use of the infrastructure patches.

However it is an interesting subset because it handles the u32 next
hdr logic to correctly map tcp packets from ip headers using the ihl
and protocol fields. After this is accepted we can extend the match
and action fields easily by updating the model header file.

Also only the drop action is supported initially.

Here is a short test script,

 #tc qdisc add dev eth4 ingress
 #tc filter add dev eth4 parent ffff: protocol ip \
	u32 ht 800: order 1 \
	match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop

<-- hardware has dst/src ip match rule installed -->

 #tc filter del dev eth4 parent ffff: prio 49152
 #tc filter add dev eth4 parent ffff: protocol ip prio 99 \
	handle 1: u32 divisor 1
 #tc filter add dev eth4 protocol ip parent ffff: prio 99 \
	u32 ht 800: order 1 link 1: \
	offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
 #tc filter add dev eth4 parent ffff: protocol ip \
	u32 ht 1: order 3 match tcp src 23 ffff action drop

<-- hardware has tcp src port rule installed -->

 #tc qdisc del dev eth4 parent ffff:

<-- hardware cleaned up -->

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    6 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    6 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  231 +++++++++++++++++++---
 3 files changed, 213 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 4b9156c..fc877c7 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -796,6 +796,9 @@ struct ixgbe_adapter {
 	u8 default_up;
 	unsigned long fwd_bitmask; /* Bitmask indicating in use pools */
 
+#define IXGBE_MAX_LINK_HANDLE 10
+	struct ixgbe_mat_field *jump_tables[IXGBE_MAX_LINK_HANDLE];
+
 /* maximum number of RETA entries among all devices supported by ixgbe
  * driver: currently it's x550 device in non-SRIOV mode
  */
@@ -925,6 +928,9 @@ s32 ixgbe_fdir_erase_perfect_filter_82599(struct ixgbe_hw *hw,
 					  u16 soft_id);
 void ixgbe_atr_compute_perfect_hash_82599(union ixgbe_atr_input *input,
 					  union ixgbe_atr_input *mask);
+int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
+				    struct ixgbe_fdir_filter *input,
+				    u16 sw_idx);
 void ixgbe_set_rx_mode(struct net_device *netdev);
 #ifdef CONFIG_IXGBE_DCB
 void ixgbe_set_rx_drop_en(struct ixgbe_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index bea96b3..726e0ee 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2520,9 +2520,9 @@ static int ixgbe_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
 	return ret;
 }
 
-static int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
-					   struct ixgbe_fdir_filter *input,
-					   u16 sw_idx)
+int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
+				    struct ixgbe_fdir_filter *input,
+				    u16 sw_idx)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct hlist_node *node2;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index dca2298..abdfaea 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -51,6 +51,8 @@
 #include <linux/prefetch.h>
 #include <scsi/fc/fc_fcoe.h>
 #include <net/vxlan.h>
+#include <net/pkt_cls.h>
+#include <net/tc_act/tc_gact.h>
 
 #ifdef CONFIG_OF
 #include <linux/of_net.h>
@@ -65,6 +67,7 @@
 #include "ixgbe_common.h"
 #include "ixgbe_dcb_82599.h"
 #include "ixgbe_sriov.h"
+#include "ixgbe_model.h"
 
 char ixgbe_driver_name[] = "ixgbe";
 static const char ixgbe_driver_string[] =
@@ -5545,6 +5548,9 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter)
 #endif /* CONFIG_IXGBE_DCB */
 #endif /* IXGBE_FCOE */
 
+	/* initialize static ixgbe jump table entries */
+	adapter->jump_tables[0] = ixgbe_ipv4_fields;
+
 	adapter->mac_table = kzalloc(sizeof(struct ixgbe_mac_addr) *
 				     hw->mac.num_rar_entries,
 				     GFP_ATOMIC);
@@ -8200,10 +8206,191 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	return 0;
 }
 
+static int ixgbe_delete_clsu32(struct ixgbe_adapter *adapter,
+			       struct tc_cls_u32_offload *cls)
+{
+	int err;
+
+	spin_lock(&adapter->fdir_perfect_lock);
+	err = ixgbe_update_ethtool_fdir_entry(adapter, NULL, cls->knode.handle);
+	spin_unlock(&adapter->fdir_perfect_lock);
+	return err;
+}
+
+static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
+				  __be16 protocol,
+				  struct tc_cls_u32_offload *cls)
+{
+	u32 loc = cls->knode.handle & 0xfffff;
+	struct ixgbe_hw *hw = &adapter->hw;
+	struct ixgbe_mat_field *field_ptr;
+	struct ixgbe_fdir_filter *input;
+	union ixgbe_atr_input mask;
+#ifdef CONFIG_NET_CLS_ACT
+	const struct tc_action *a;
+#endif
+	int i, err = 0;
+	u8 queue;
+	u32 handle;
+
+	memset(&mask, 0, sizeof(union ixgbe_atr_input));
+	handle = cls->knode.handle;
+
+	/* At the moment cls_u32 jumps to transport layer and skips past
+	 * L2 headers. The canonical method to match L2 frames is to use
+	 * negative values. However this is error prone at best but really
+	 * just broken because there is no way to "know" what sort of hdr
+	 * is in front of the transport layer. Fix cls_u32 to support L2
+	 * headers when needed.
+	 */
+	if (protocol != htons(ETH_P_IP))
+		return -EINVAL;
+
+	if (cls->knode.link_handle ||
+	    cls->knode.link_handle >= IXGBE_MAX_LINK_HANDLE) {
+		struct ixgbe_nexthdr *nexthdr = ixgbe_ipv4_jumps;
+		u32 uhtid = TC_U32_USERHTID(cls->knode.link_handle);
+
+		for (i = 0; nexthdr[i].jump; i++) {
+			if (nexthdr->o != cls->knode.sel->offoff ||
+			    nexthdr->s != cls->knode.sel->offshift ||
+			    nexthdr->m != cls->knode.sel->offmask ||
+			    /* do not support multiple key jumps its just mad */
+			    cls->knode.sel->nkeys > 1)
+				return -EINVAL;
+
+			if (nexthdr->off != cls->knode.sel->keys[0].off ||
+			    nexthdr->val != cls->knode.sel->keys[0].val ||
+			    nexthdr->mask != cls->knode.sel->keys[0].mask)
+				return -EINVAL;
+
+			if (uhtid >= IXGBE_MAX_LINK_HANDLE)
+				return -EINVAL;
+
+			adapter->jump_tables[uhtid] = nexthdr->jump;
+		}
+		return 0;
+	}
+
+	if (loc >= ((1024 << adapter->fdir_pballoc) - 2)) {
+		e_err(drv, "Location out of range\n");
+		return -EINVAL;
+	}
+
+	/* cls u32 is a graph starting at root node 0x800. The driver tracks
+	 * links and also the fields used to advance the parser across each
+	 * link (e.g. nexthdr/eat parameters from 'tc'). This way we can map
+	 * the u32 graph onto the hardware parse graph denoted in ixgbe_model.h
+	 * To add support for new nodes update ixgbe_model.h parse structures
+	 * this function _should_ be generic try not to hardcode values here.
+	 */
+	if (TC_U32_USERHTID(handle) == 0x800) {
+		field_ptr = adapter->jump_tables[0];
+	} else {
+		if (TC_U32_USERHTID(handle) >= ARRAY_SIZE(adapter->jump_tables))
+			return -EINVAL;
+
+		field_ptr = adapter->jump_tables[TC_U32_USERHTID(handle)];
+	}
+
+	if (!field_ptr)
+		return -EINVAL;
+
+	input = kzalloc(sizeof(*input), GFP_KERNEL);
+	if (!input)
+		return -ENOMEM;
+
+	for (i = 0; i < cls->knode.sel->nkeys; i++) {
+		int off = cls->knode.sel->keys[i].off;
+		__be32 val = cls->knode.sel->keys[i].val;
+		__be32 m = cls->knode.sel->keys[i].mask;
+		bool found_entry = false;
+		int j;
+
+		for (j = 0; field_ptr[j].val; j++) {
+			if (field_ptr[j].off == off &&
+			    field_ptr[j].mask == m) {
+				field_ptr[j].val(input, &mask, val, m);
+				input->filter.formatted.flow_type |=
+					field_ptr[j].type;
+				found_entry = true;
+				break;
+			}
+		}
+
+		if (!found_entry)
+			goto err_out;
+	}
+
+	mask.formatted.flow_type = IXGBE_ATR_L4TYPE_IPV6_MASK |
+				   IXGBE_ATR_L4TYPE_MASK;
+
+	if (input->filter.formatted.flow_type == IXGBE_ATR_FLOW_TYPE_IPV4)
+		mask.formatted.flow_type &= IXGBE_ATR_L4TYPE_IPV6_MASK;
+
+#ifdef CONFIG_NET_CLS_ACT
+	if (list_empty(&cls->knode.exts->actions))
+		goto err_out;
+
+	list_for_each_entry(a, &cls->knode.exts->actions, list) {
+		if (!is_tcf_gact_shot(a))
+			goto err_out;
+	}
+#endif
+
+	input->action = IXGBE_FDIR_DROP_QUEUE;
+	queue = IXGBE_FDIR_DROP_QUEUE;
+	input->sw_idx = loc;
+
+	spin_lock(&adapter->fdir_perfect_lock);
+
+	if (hlist_empty(&adapter->fdir_filter_list)) {
+		memcpy(&adapter->fdir_mask, &mask, sizeof(mask));
+		err = ixgbe_fdir_set_input_mask_82599(hw, &mask);
+		if (err)
+			goto err_out_w_lock;
+	} else if (memcmp(&adapter->fdir_mask, &mask, sizeof(mask))) {
+		err = -EINVAL;
+		goto err_out_w_lock;
+	}
+
+	ixgbe_atr_compute_perfect_hash_82599(&input->filter, &mask);
+	err = ixgbe_fdir_write_perfect_filter_82599(hw, &input->filter,
+						    input->sw_idx, queue);
+	if (!err)
+		ixgbe_update_ethtool_fdir_entry(adapter, input, input->sw_idx);
+	spin_unlock(&adapter->fdir_perfect_lock);
+
+	return err;
+err_out_w_lock:
+	spin_unlock(&adapter->fdir_perfect_lock);
+err_out:
+	kfree(input);
+	return -EINVAL;
+}
+
 int __ixgbe_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
 		     struct tc_to_netdev *tc)
 {
-	/* Only support egress tc setup for now */
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+
+	if (TC_H_MAJ(handle) == TC_H_MAJ(TC_H_INGRESS) &&
+	    tc->type == TC_SETUP_CLSU32) {
+		if (!(dev->features & NETIF_F_HW_TC))
+			return -EINVAL;
+
+		switch (tc->cls_u32->command) {
+		case TC_CLSU32_NEW_KNODE:
+		case TC_CLSU32_REPLACE_KNODE:
+			return ixgbe_configure_clsu32(adapter,
+						      proto, tc->cls_u32);
+		case TC_CLSU32_DELETE_KNODE:
+			return ixgbe_delete_clsu32(adapter, tc->cls_u32);
+		default:
+			return -EINVAL;
+		}
+	}
+
 	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
@@ -8272,19 +8459,17 @@ static int ixgbe_set_features(struct net_device *netdev,
 	}
 
 	/*
-	 * Check if Flow Director n-tuple support was enabled or disabled.  If
-	 * the state changed, we need to reset.
+	 * Check if Flow Director n-tuple support or hw_tc support was
+	 * enabled or disabled.  If the state changed, we need to reset.
 	 */
-	switch (features & NETIF_F_NTUPLE) {
-	case NETIF_F_NTUPLE:
+	if ((features & NETIF_F_NTUPLE) || (features & NETIF_F_HW_TC)) {
 		/* turn off ATR, enable perfect filters and reset */
 		if (!(adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE))
 			need_reset = true;
 
 		adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
 		adapter->flags |= IXGBE_FLAG_FDIR_PERFECT_CAPABLE;
-		break;
-	default:
+	} else {
 		/* turn off perfect filters, enable ATR and reset */
 		if (adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE)
 			need_reset = true;
@@ -8292,23 +8477,16 @@ static int ixgbe_set_features(struct net_device *netdev,
 		adapter->flags &= ~IXGBE_FLAG_FDIR_PERFECT_CAPABLE;
 
 		/* We cannot enable ATR if SR-IOV is enabled */
-		if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
-			break;
-
-		/* We cannot enable ATR if we have 2 or more traffic classes */
-		if (netdev_get_num_tc(netdev) > 1)
-			break;
-
-		/* We cannot enable ATR if RSS is disabled */
-		if (adapter->ring_feature[RING_F_RSS].limit <= 1)
-			break;
-
-		/* A sample rate of 0 indicates ATR disabled */
-		if (!adapter->atr_sample_rate)
-			break;
-
-		adapter->flags |= IXGBE_FLAG_FDIR_HASH_CAPABLE;
-		break;
+		if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED ||
+		    /* We cannot enable ATR if we have 2 or more tcs */
+		    (netdev_get_num_tc(netdev) > 1) ||
+		    /* We cannot enable ATR if RSS is disabled */
+		    (adapter->ring_feature[RING_F_RSS].limit <= 1) ||
+		    /* A sample rate of 0 indicates ATR disabled */
+		    (!adapter->atr_sample_rate))
+			; /* do nothing not supported */
+		else /* otherwise supported and set the flag */
+			adapter->flags |= IXGBE_FLAG_FDIR_HASH_CAPABLE;
 	}
 
 	if (features & NETIF_F_HW_VLAN_CTAG_RX)
@@ -8667,9 +8845,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_set_vf_trust	= ixgbe_ndo_set_vf_trust,
 	.ndo_get_vf_config	= ixgbe_ndo_get_vf_config,
 	.ndo_get_stats64	= ixgbe_get_stats64,
-#ifdef CONFIG_IXGBE_DCB
 	.ndo_setup_tc		= __ixgbe_setup_tc,
-#endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= ixgbe_netpoll,
 #endif
@@ -9040,7 +9216,8 @@ skip_sriov:
 	case ixgbe_mac_X550EM_x:
 		netdev->features |= NETIF_F_SCTP_CRC;
 		netdev->hw_features |= NETIF_F_SCTP_CRC |
-				       NETIF_F_NTUPLE;
+				       NETIF_F_NTUPLE |
+				       NETIF_F_HW_TC;
 		break;
 	default:
 		break;

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [net-next PATCH v3 8/8] net: ixgbe: abort with cls u32 divisor groups greater than 1
  2016-02-17  5:15 [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe John Fastabend
                   ` (6 preceding siblings ...)
  2016-02-17  5:18 ` [net-next PATCH v3 7/8] net: ixgbe: add support for tc_u32 offload John Fastabend
@ 2016-02-17  5:19 ` John Fastabend
  2016-02-17 14:48 ` [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe David Miller
  8 siblings, 0 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-17  5:19 UTC (permalink / raw)
  To: jiri, amir, john.fastabend, jhs, davem; +Cc: netdev, jeffrey.t.kirsher

This patch ensures ixgbe will not try to offload hash tables from the
u32 module. The device class does not currently support this so until
it is enabled just abort on these tables.

Interestingly the more flexible your hardware is the less code you
need to implement to guard against these cases.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |    1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   31 +++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index fc877c7..84fa28c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -798,6 +798,7 @@ struct ixgbe_adapter {
 
 #define IXGBE_MAX_LINK_HANDLE 10
 	struct ixgbe_mat_field *jump_tables[IXGBE_MAX_LINK_HANDLE];
+	unsigned long tables;
 
 /* maximum number of RETA entries among all devices supported by ixgbe
  * driver: currently it's x550 device in non-SRIOV mode
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index abdfaea..cf4b729 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8217,6 +8217,27 @@ static int ixgbe_delete_clsu32(struct ixgbe_adapter *adapter,
 	return err;
 }
 
+static int ixgbe_configure_clsu32_add_hnode(struct ixgbe_adapter *adapter,
+					    __be16 protocol,
+					    struct tc_cls_u32_offload *cls)
+{
+	/* This ixgbe devices do not support hash tables at the moment
+	 * so abort when given hash tables.
+	 */
+	if (cls->hnode.divisor > 0)
+		return -EINVAL;
+
+	set_bit(TC_U32_USERHTID(cls->hnode.handle), &adapter->tables);
+	return 0;
+}
+
+static int ixgbe_configure_clsu32_del_hnode(struct ixgbe_adapter *adapter,
+					    struct tc_cls_u32_offload *cls)
+{
+	clear_bit(TC_U32_USERHTID(cls->hnode.handle), &adapter->tables);
+	return 0;
+}
+
 static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
 				  __be16 protocol,
 				  struct tc_cls_u32_offload *cls)
@@ -8251,6 +8272,9 @@ static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
 		struct ixgbe_nexthdr *nexthdr = ixgbe_ipv4_jumps;
 		u32 uhtid = TC_U32_USERHTID(cls->knode.link_handle);
 
+		if (!test_bit(uhtid, &adapter->tables))
+			return -EINVAL;
+
 		for (i = 0; nexthdr[i].jump; i++) {
 			if (nexthdr->o != cls->knode.sel->offoff ||
 			    nexthdr->s != cls->knode.sel->offshift ||
@@ -8386,6 +8410,13 @@ int __ixgbe_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
 						      proto, tc->cls_u32);
 		case TC_CLSU32_DELETE_KNODE:
 			return ixgbe_delete_clsu32(adapter, tc->cls_u32);
+		case TC_CLSU32_NEW_HNODE:
+		case TC_CLSU32_REPLACE_HNODE:
+			return ixgbe_configure_clsu32_add_hnode(adapter, proto,
+								tc->cls_u32);
+		case TC_CLSU32_DELETE_HNODE:
+			return ixgbe_configure_clsu32_del_hnode(adapter,
+								tc->cls_u32);
 		default:
 			return -EINVAL;
 		}

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-17  5:17 ` [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
@ 2016-02-17  7:02   ` Jiri Pirko
  2016-02-17 10:59   ` Jamal Hadi Salim
  1 sibling, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2016-02-17  7:02 UTC (permalink / raw)
  To: John Fastabend; +Cc: amir, jhs, davem, netdev, jeffrey.t.kirsher

Wed, Feb 17, 2016 at 06:17:09AM CET, john.fastabend@gmail.com wrote:
>This patch allows netdev drivers to consume cls_u32 offloads via
>the ndo_setup_tc ndo op.
>
>This works aligns with how network drivers have been doing qdisc
>offloads for mqprio.
>
>Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 5/8] net: tc: helper functions to query action types
  2016-02-17  5:18 ` [net-next PATCH v3 5/8] net: tc: helper functions to query action types John Fastabend
@ 2016-02-17  7:03   ` Jiri Pirko
  2016-02-17 11:02   ` Jamal Hadi Salim
  1 sibling, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2016-02-17  7:03 UTC (permalink / raw)
  To: John Fastabend; +Cc: amir, jhs, davem, netdev, jeffrey.t.kirsher

Wed, Feb 17, 2016 at 06:18:03AM CET, john.fastabend@gmail.com wrote:
>This is a helper function drivers can use to learn if the
>action type is a drop action.
>
>Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 1/8] net: rework ndo tc op to consume additional qdisc handle parameter
  2016-02-17  5:16 ` [net-next PATCH v3 1/8] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
@ 2016-02-17 10:42   ` Jamal Hadi Salim
  0 siblings, 0 replies; 34+ messages in thread
From: Jamal Hadi Salim @ 2016-02-17 10:42 UTC (permalink / raw)
  To: John Fastabend, jiri, amir, davem; +Cc: netdev, jeffrey.t.kirsher

On 16-02-17 12:16 AM, John Fastabend wrote:
> The ndo_setup_tc() op was added to support drivers offloading tx
> qdiscs however only support for mqprio was ever added. So we
> only ever added support for passing the number of traffic classes
> to the driver.
>
> This patch generalizes the ndo_setup_tc op so that a handle can
> be provided to indicate if the offload is for ingress or egress
> or potentially even child qdiscs.
>
> CC: Murali Karicheri <m-karicheri2@ti.com>
> CC: Shradha Shah <sshah@solarflare.com>
> CC: Or Gerlitz <ogerlitz@mellanox.com>
> CC: Ariel Elior <ariel.elior@qlogic.com>
> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> CC: Bruce Allan <bruce.w.allan@intel.com>
> CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
> CC: Don Skidmore <donald.c.skidmore@intel.com>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 2/8] net: rework setup_tc ndo op to consume general tc operand
  2016-02-17  5:16 ` [net-next PATCH v3 2/8] net: rework setup_tc ndo op to consume general tc operand John Fastabend
@ 2016-02-17 10:42   ` Jamal Hadi Salim
  0 siblings, 0 replies; 34+ messages in thread
From: Jamal Hadi Salim @ 2016-02-17 10:42 UTC (permalink / raw)
  To: John Fastabend, jiri, amir, davem; +Cc: netdev, jeffrey.t.kirsher

On 16-02-17 12:16 AM, John Fastabend wrote:
> This patch updates setup_tc so we can pass additional parameters into
> the ndo op in a generic way. To do this we provide structured union
> and type flag.
>
> This lets each classifier and qdisc provide its own set of attributes
> without having to add new ndo ops or grow the signature of the
> callback.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>


Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-17  5:17 ` [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
  2016-02-17  7:02   ` Jiri Pirko
@ 2016-02-17 10:59   ` Jamal Hadi Salim
  2016-02-17 14:24     ` John Fastabend
  1 sibling, 1 reply; 34+ messages in thread
From: Jamal Hadi Salim @ 2016-02-17 10:59 UTC (permalink / raw)
  To: John Fastabend, jiri, amir, davem; +Cc: netdev, jeffrey.t.kirsher

On 16-02-17 12:17 AM, John Fastabend wrote:
> This patch allows netdev drivers to consume cls_u32 offloads via
> the ndo_setup_tc ndo op.
>
> This works aligns with how network drivers have been doing qdisc
> offloads for mqprio.
>

This one i have comments on.

> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>   include/linux/netdevice.h |    6 ++-
>   include/net/pkt_cls.h     |   34 +++++++++++++++
>   net/sched/cls_u32.c       |   99 ++++++++++++++++++++++++++++++++++++++++++++-
>   3 files changed, 136 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index e396060..47671ce0 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -779,17 +779,21 @@ static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
>   typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>   				       struct sk_buff *skb);
>
> -/* This structure holds attributes of qdisc and classifiers
> +/* These structures hold the attributes of qdisc and classifiers
>    * that are being passed to the netdevice through the setup_tc op.
>    */
>   enum {
>   	TC_SETUP_MQPRIO,
> +	TC_SETUP_CLSU32,
>   };
>
> +struct tc_cls_u32_offload;
> +
>   struct tc_to_netdev {
>   	unsigned int type;
>   	union {
>   		u8 tc;
> +		struct tc_cls_u32_offload *cls_u32;
>   	};
>   };
>
 >
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index bc49967..59789ca 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -358,4 +358,38 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
>   }
>   #endif /* CONFIG_NET_CLS_IND */
>
> +struct tc_cls_u32_knode {
> +	struct tcf_exts *exts;
> +	u8 fshift;
> +	u32 handle;
> +	u32 val;
> +	u32 mask;
> +	u32 link_handle;
> +	struct tc_u32_sel *sel;
> +};
>

Swapping sel and fshift would give better struct alignment.

> +struct tc_cls_u32_hnode {
> +	u32 handle;
> +	u32 prio;
> +	unsigned int divisor;
> +};


Assuming in the future "prio" would be moved to something that is more
generic classifier specific?

> +enum tc_clsu32_command {
> +	TC_CLSU32_NEW_KNODE,
> +	TC_CLSU32_REPLACE_KNODE,
> +	TC_CLSU32_DELETE_KNODE,
> +	TC_CLSU32_NEW_HNODE,
> +	TC_CLSU32_REPLACE_HNODE,
> +	TC_CLSU32_DELETE_HNODE,
> +};
> +

It seems to me commands should be generic which speak
Netlinkism. A REPLACE is just a flag to NEW. You dont need
a NEW_XXX for every object. switchdev got this right.
If you use cmd + flags  then you can have all kinds of
netlink semantics that relay user intent from user space. Example:
Exclusivity where user says "create if it doesnt exist but dont replace
if it does".
At minimal add "flags" there.
Maybe not this release - but it makes sense to move "command" into 
tc_to_netdev; a u16 cmd + u16 flags.

> +struct tc_cls_u32_offload {
> +	/* knode values */
> +	enum tc_clsu32_command command;
> +	union {
> +		struct tc_cls_u32_knode knode;
> +		struct tc_cls_u32_hnode hnode;
> +	};
> +};
> +
>   #endif
> diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
> index 4fbb674..d54bc94 100644

> +static void u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
> +{
> +	struct net_device *dev = tp->q->dev_queue->dev;
> +	struct tc_cls_u32_offload u32_offload = {0};
> +	struct tc_to_netdev offload;
> +
> +	offload.type = TC_SETUP_CLSU32;
> +	offload.cls_u32 = &u32_offload;
> +
> +	if (dev->netdev_ops->ndo_setup_tc) {
> +		offload.cls_u32->command = TC_CLSU32_NEW_HNODE;

TC_CLSU32_REPLACE_HNODE?

> +		offload.cls_u32->hnode.divisor = h->divisor;
> +		offload.cls_u32->hnode.handle = h->handle;
> +		offload.cls_u32->hnode.prio = h->prio;
> +
> +		dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
> +					      tp->protocol, &offload);
> +	}
> +}


>   static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
>   {
>   	struct tc_u_knode *n;
> @@ -434,6 +522,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
>   			RCU_INIT_POINTER(ht->ht[h],
>   					 rtnl_dereference(n->next));
>   			tcf_unbind_filter(tp, &n->res);
> +			u32_remove_hw_knode(tp, n->handle);
>   			call_rcu(&n->rcu, u32_delete_key_freepf_rcu);
>   		}
>   	}
> @@ -454,6 +543,7 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
>   	     phn;
>   	     hn = &phn->next, phn = rtnl_dereference(*hn)) {
>   		if (phn == ht) {
> +			u32_clear_hw_hnode(tp, ht);
>   			RCU_INIT_POINTER(*hn, ht->next);
>   			kfree_rcu(ht, rcu);
>   			return 0;
> @@ -540,8 +630,10 @@ static int u32_delete(struct tcf_proto *tp, unsigned long arg)
>   	if (ht == NULL)
>   		return 0;
>
> -	if (TC_U32_KEY(ht->handle))
> +	if (TC_U32_KEY(ht->handle)) {
> +		u32_remove_hw_knode(tp, ht->handle);
>   		return u32_delete_key(tp, (struct tc_u_knode *)ht);
> +	}
>


You are unconditionally calling the _hw_ api. For someone not using _hw_ 
offloads, there are a few more instructions. Maybe just do the
dev->netdev_ops->ndo_setup_tc first?


And to Or's point: How do i distinguish s/w from h/w?

cheers,
jamal

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 4/8] net: add tc offload feature flag
  2016-02-17  5:17 ` [net-next PATCH v3 4/8] net: add tc offload feature flag John Fastabend
@ 2016-02-17 11:01   ` Jamal Hadi Salim
  0 siblings, 0 replies; 34+ messages in thread
From: Jamal Hadi Salim @ 2016-02-17 11:01 UTC (permalink / raw)
  To: John Fastabend, jiri, amir, davem; +Cc: netdev, jeffrey.t.kirsher

On 16-02-17 12:17 AM, John Fastabend wrote:
> Its useful to turn off the qdisc offload feature at a per device
> level. This gives us a big hammer to enable/disable offloading.
> More fine grained control (i.e. per rule) may be supported later.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 5/8] net: tc: helper functions to query action types
  2016-02-17  5:18 ` [net-next PATCH v3 5/8] net: tc: helper functions to query action types John Fastabend
  2016-02-17  7:03   ` Jiri Pirko
@ 2016-02-17 11:02   ` Jamal Hadi Salim
  1 sibling, 0 replies; 34+ messages in thread
From: Jamal Hadi Salim @ 2016-02-17 11:02 UTC (permalink / raw)
  To: John Fastabend, jiri, amir, davem; +Cc: netdev, jeffrey.t.kirsher

On 16-02-17 12:18 AM, John Fastabend wrote:
> This is a helper function drivers can use to learn if the
> action type is a drop action.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 6/8] net: ixgbe: add minimal parser details for ixgbe
  2016-02-17  5:18 ` [net-next PATCH v3 6/8] net: ixgbe: add minimal parser details for ixgbe John Fastabend
@ 2016-02-17 11:06   ` Jamal Hadi Salim
  2016-02-17 15:09     ` David Miller
  2016-02-17 18:01   ` Rustad, Mark D
  1 sibling, 1 reply; 34+ messages in thread
From: Jamal Hadi Salim @ 2016-02-17 11:06 UTC (permalink / raw)
  To: John Fastabend, jiri, amir, davem; +Cc: netdev, jeffrey.t.kirsher

On 16-02-17 12:18 AM, John Fastabend wrote:
> This adds an ixgbe data structure that is used to determine what
> headers:fields can be matched and in what order they are supported.
>
> For hardware devices this can be a bit tricky because typically
> only pre-programmed (firmware, ucode, rtl) parse graphs will be
> supported and we don't yet have an interface to change these from
> the OS. So its sort of a you get whatever your friendly vendor
> provides affair at the moment.
>
> In the future we can add the get routines and set routines to
> update this data structure. One interesting thing to note here
> is the data structure here identifies ethernet, ip, and tcp
> fields without having to hardcode them as enumerations or use
> other identifiers.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>


should __u32 be u32? Also at some points you seem to interchangeably
use unsigned int vs u32. I think most of unsigned ints should be u32.


Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 7/8] net: ixgbe: add support for tc_u32 offload
  2016-02-17  5:18 ` [net-next PATCH v3 7/8] net: ixgbe: add support for tc_u32 offload John Fastabend
@ 2016-02-17 11:17   ` Jamal Hadi Salim
  2016-02-17 11:42     ` Jiri Pirko
  0 siblings, 1 reply; 34+ messages in thread
From: Jamal Hadi Salim @ 2016-02-17 11:17 UTC (permalink / raw)
  To: John Fastabend, jiri, amir, davem; +Cc: netdev, jeffrey.t.kirsher

On 16-02-17 12:18 AM, John Fastabend wrote:
> This adds initial support for offloading the u32 tc classifier. This
> initial implementation only implements a few base matches and actions
> to illustrate the use of the infrastructure patches.
>
> However it is an interesting subset because it handles the u32 next
> hdr logic to correctly map tcp packets from ip headers using the ihl
> and protocol fields. After this is accepted we can extend the match
> and action fields easily by updating the model header file.
>
> Also only the drop action is supported initially.
>
> Here is a short test script,
>
>   #tc qdisc add dev eth4 ingress
>   #tc filter add dev eth4 parent ffff: protocol ip \
> 	u32 ht 800: order 1 \
> 	match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop
>

Note: i dont see anything that says "hw". Are you delegating ht 0x800
for h/w only? It is the default ht; so may not be the best choice.

> <-- hardware has dst/src ip match rule installed -->
>
>   #tc filter del dev eth4 parent ffff: prio 49152

Would be useful to dump the installed rule so user gets to
see kernel-assigned prio 49152. The better way
to do this is use the handle 800::1 as opposed to prio.

>   #tc filter add dev eth4 parent ffff: protocol ip prio 99 \
> 	handle 1: u32 divisor 1
>   #tc filter add dev eth4 protocol ip parent ffff: prio 99 \
> 	u32 ht 800: order 1 link 1: \
> 	offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
>   #tc filter add dev eth4 parent ffff: protocol ip \
> 	u32 ht 1: order 3 match tcp src 23 ffff action drop
>
> <-- hardware has tcp src port rule installed -->
>
>   #tc qdisc del dev eth4 parent ffff:
>
> <-- hardware cleaned up -->
>

All looks cool but I am just worried about the lack of intent that
something needs to go to hw vs sw. Other worry:
What happens when things fail to install in hw?
As it stands right now your assumption is all default rules go to h/w.
And failure to install is not handled.
I know Dave wants to push this in so a followup sets of patches to
address this is fine by me. Otherwise if it is not too much work, please
address this.

cheers,
jamal

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 7/8] net: ixgbe: add support for tc_u32 offload
  2016-02-17 11:17   ` Jamal Hadi Salim
@ 2016-02-17 11:42     ` Jiri Pirko
  2016-02-17 11:47       ` Jamal Hadi Salim
  0 siblings, 1 reply; 34+ messages in thread
From: Jiri Pirko @ 2016-02-17 11:42 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: John Fastabend, amir, davem, netdev, jeffrey.t.kirsher

Wed, Feb 17, 2016 at 12:17:26PM CET, jhs@mojatatu.com wrote:
>On 16-02-17 12:18 AM, John Fastabend wrote:
>>This adds initial support for offloading the u32 tc classifier. This
>>initial implementation only implements a few base matches and actions
>>to illustrate the use of the infrastructure patches.
>>
>>However it is an interesting subset because it handles the u32 next
>>hdr logic to correctly map tcp packets from ip headers using the ihl
>>and protocol fields. After this is accepted we can extend the match
>>and action fields easily by updating the model header file.
>>
>>Also only the drop action is supported initially.
>>
>>Here is a short test script,
>>
>>  #tc qdisc add dev eth4 ingress
>>  #tc filter add dev eth4 parent ffff: protocol ip \
>>	u32 ht 800: order 1 \
>>	match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop
>>
>
>Note: i dont see anything that says "hw". Are you delegating ht 0x800
>for h/w only? It is the default ht; so may not be the best choice.

That is not implemented in this patchset. hw/sw/hwsw flag will be done
in a follow up. So far, the user has only possibility to enable/disable
the whole thing by ethtool feature flag.


>
>><-- hardware has dst/src ip match rule installed -->
>>
>>  #tc filter del dev eth4 parent ffff: prio 49152
>
>Would be useful to dump the installed rule so user gets to
>see kernel-assigned prio 49152. The better way
>to do this is use the handle 800::1 as opposed to prio.
>
>>  #tc filter add dev eth4 parent ffff: protocol ip prio 99 \
>>	handle 1: u32 divisor 1
>>  #tc filter add dev eth4 protocol ip parent ffff: prio 99 \
>>	u32 ht 800: order 1 link 1: \
>>	offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
>>  #tc filter add dev eth4 parent ffff: protocol ip \
>>	u32 ht 1: order 3 match tcp src 23 ffff action drop
>>
>><-- hardware has tcp src port rule installed -->
>>
>>  #tc qdisc del dev eth4 parent ffff:
>>
>><-- hardware cleaned up -->
>>
>
>All looks cool but I am just worried about the lack of intent that
>something needs to go to hw vs sw. Other worry:
>What happens when things fail to install in hw?

Silently fail. I believe that this should be handled in the same
follow-up I referred to above.


>As it stands right now your assumption is all default rules go to h/w.
>And failure to install is not handled.
>I know Dave wants to push this in so a followup sets of patches to
>address this is fine by me. Otherwise if it is not too much work, please
>address this.
>
>cheers,
>jamal

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 7/8] net: ixgbe: add support for tc_u32 offload
  2016-02-17 11:42     ` Jiri Pirko
@ 2016-02-17 11:47       ` Jamal Hadi Salim
  2016-02-17 14:25         ` John Fastabend
  0 siblings, 1 reply; 34+ messages in thread
From: Jamal Hadi Salim @ 2016-02-17 11:47 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: John Fastabend, amir, davem, netdev, jeffrey.t.kirsher

On 16-02-17 06:42 AM, Jiri Pirko wrote:
> Wed, Feb 17, 2016 at 12:17:26PM CET, jhs@mojatatu.com wrote:
>> On 16-02-17 12:18 AM, John Fastabend wrote:

>>
>> Note: i dont see anything that says "hw". Are you delegating ht 0x800
>> for h/w only? It is the default ht; so may not be the best choice.
>
> That is not implemented in this patchset. hw/sw/hwsw flag will be done
> in a follow up. So far, the user has only possibility to enable/disable
> the whole thing by ethtool feature flag.

[..]
>> All looks cool but I am just worried about the lack of intent that
>> something needs to go to hw vs sw. Other worry:
>> What happens when things fail to install in hw?
>
> Silently fail. I believe that this should be handled in the same
> follow-up I referred to above.
>

I can live with the above. John, please consider my comments when
sending next update.
So only outstanding issue is you need to fix that NEW with REPLACE
in patch 3.

For this patch:
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-17 10:59   ` Jamal Hadi Salim
@ 2016-02-17 14:24     ` John Fastabend
  2016-02-17 23:07       ` John Fastabend
  0 siblings, 1 reply; 34+ messages in thread
From: John Fastabend @ 2016-02-17 14:24 UTC (permalink / raw)
  To: Jamal Hadi Salim, jiri, amir, davem; +Cc: netdev, jeffrey.t.kirsher

On 16-02-17 02:59 AM, Jamal Hadi Salim wrote:
> On 16-02-17 12:17 AM, John Fastabend wrote:
>> This patch allows netdev drivers to consume cls_u32 offloads via
>> the ndo_setup_tc ndo op.
>>
>> This works aligns with how network drivers have been doing qdisc
>> offloads for mqprio.
>>
> 
> This one i have comments on.
> 
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>   include/linux/netdevice.h |    6 ++-
>>   include/net/pkt_cls.h     |   34 +++++++++++++++
>>   net/sched/cls_u32.c       |   99
>> ++++++++++++++++++++++++++++++++++++++++++++-

[...]

>>   #endif /* CONFIG_NET_CLS_IND */
>>
>> +struct tc_cls_u32_knode {
>> +    struct tcf_exts *exts;
>> +    u8 fshift;
>> +    u32 handle;
>> +    u32 val;
>> +    u32 mask;
>> +    u32 link_handle;
>> +    struct tc_u32_sel *sel;
>> +};
>>
> 
> Swapping sel and fshift would give better struct alignment.
> 

Makes sense I went ahead and did this.

>> +struct tc_cls_u32_hnode {
>> +    u32 handle;
>> +    u32 prio;
>> +    unsigned int divisor;
>> +};
> 
> 
> Assuming in the future "prio" would be moved to something that is more
> generic classifier specific?
> 

Sure in the future it can be moved up into a generic struct if
it becomes useful there.

>> +enum tc_clsu32_command {
>> +    TC_CLSU32_NEW_KNODE,
>> +    TC_CLSU32_REPLACE_KNODE,
>> +    TC_CLSU32_DELETE_KNODE,
>> +    TC_CLSU32_NEW_HNODE,
>> +    TC_CLSU32_REPLACE_HNODE,
>> +    TC_CLSU32_DELETE_HNODE,
>> +};
>> +
> 
> It seems to me commands should be generic which speak
> Netlinkism. A REPLACE is just a flag to NEW. You dont need
> a NEW_XXX for every object. switchdev got this right.
> If you use cmd + flags  then you can have all kinds of
> netlink semantics that relay user intent from user space. Example:
> Exclusivity where user says "create if it doesnt exist but dont replace
> if it does".
> At minimal add "flags" there.
> Maybe not this release - but it makes sense to move "command" into
> tc_to_netdev; a u16 cmd + u16 flags.
> 

Yep next set of patches add the specific hw/sw/both semantics
and specific error handling strategies. For this series we just
get the simplest one.

>> +struct tc_cls_u32_offload {
>> +    /* knode values */
>> +    enum tc_clsu32_command command;
>> +    union {
>> +        struct tc_cls_u32_knode knode;
>> +        struct tc_cls_u32_hnode hnode;
>> +    };
>> +};
>> +
>>   #endif
>> diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
>> index 4fbb674..d54bc94 100644
> 
>> +static void u32_replace_hw_hnode(struct tcf_proto *tp, struct
>> tc_u_hnode *h)
>> +{
>> +    struct net_device *dev = tp->q->dev_queue->dev;
>> +    struct tc_cls_u32_offload u32_offload = {0};
>> +    struct tc_to_netdev offload;
>> +
>> +    offload.type = TC_SETUP_CLSU32;
>> +    offload.cls_u32 = &u32_offload;
>> +
>> +    if (dev->netdev_ops->ndo_setup_tc) {
>> +        offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
> 
> TC_CLSU32_REPLACE_HNODE?
> 

Yep I made this change and will send out v4.

[...]

> 
> 
> You are unconditionally calling the _hw_ api. For someone not using _hw_
> offloads, there are a few more instructions. Maybe just do the
> dev->netdev_ops->ndo_setup_tc first?
> 

My simple add/rem stress test didn't seem to make any difference.
I think there is so much other stuff going on here this is in the
noise. I'll take a pass at optimizing this later for all
cases not just the hw loading ones.

> 
> And to Or's point: How do i distinguish s/w from h/w?

Next series handles this for now just enable the simplest case.

> 
> cheers,
> jamal
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 7/8] net: ixgbe: add support for tc_u32 offload
  2016-02-17 11:47       ` Jamal Hadi Salim
@ 2016-02-17 14:25         ` John Fastabend
  0 siblings, 0 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-17 14:25 UTC (permalink / raw)
  To: Jamal Hadi Salim, Jiri Pirko; +Cc: amir, davem, netdev, jeffrey.t.kirsher

On 16-02-17 03:47 AM, Jamal Hadi Salim wrote:
> On 16-02-17 06:42 AM, Jiri Pirko wrote:
>> Wed, Feb 17, 2016 at 12:17:26PM CET, jhs@mojatatu.com wrote:
>>> On 16-02-17 12:18 AM, John Fastabend wrote:
> 
>>>
>>> Note: i dont see anything that says "hw". Are you delegating ht 0x800
>>> for h/w only? It is the default ht; so may not be the best choice.
>>
>> That is not implemented in this patchset. hw/sw/hwsw flag will be done
>> in a follow up. So far, the user has only possibility to enable/disable
>> the whole thing by ethtool feature flag.
> 
> [..]
>>> All looks cool but I am just worried about the lack of intent that
>>> something needs to go to hw vs sw. Other worry:
>>> What happens when things fail to install in hw?
>>
>> Silently fail. I believe that this should be handled in the same
>> follow-up I referred to above.
>>
> 
> I can live with the above. John, please consider my comments when
> sending next update.
> So only outstanding issue is you need to fix that NEW with REPLACE
> in patch 3.
> 

Perfect. I'm sending an update for patch 3 now and I'll carry all
your Acks forward thanks for the review Jamal.

> For this patch:
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> 
> cheers,
> jamal
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe
  2016-02-17  5:15 [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe John Fastabend
                   ` (7 preceding siblings ...)
  2016-02-17  5:19 ` [net-next PATCH v3 8/8] net: ixgbe: abort with cls u32 divisor groups greater than 1 John Fastabend
@ 2016-02-17 14:48 ` David Miller
  8 siblings, 0 replies; 34+ messages in thread
From: David Miller @ 2016-02-17 14:48 UTC (permalink / raw)
  To: john.fastabend; +Cc: jiri, amir, jhs, netdev, jeffrey.t.kirsher

From: John Fastabend <john.fastabend@gmail.com>
Date: Tue, 16 Feb 2016 21:15:50 -0800

> This extends the setup_tc framework so it can support more than just
> the mqprio offload and push other classifiers and qdiscs into the
> hardware.

Series applied, thanks John!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 6/8] net: ixgbe: add minimal parser details for ixgbe
  2016-02-17 11:06   ` Jamal Hadi Salim
@ 2016-02-17 15:09     ` David Miller
  2016-02-17 15:14       ` John Fastabend
  0 siblings, 1 reply; 34+ messages in thread
From: David Miller @ 2016-02-17 15:09 UTC (permalink / raw)
  To: jhs; +Cc: john.fastabend, jiri, amir, netdev, jeffrey.t.kirsher

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Wed, 17 Feb 2016 06:06:02 -0500

> On 16-02-17 12:18 AM, John Fastabend wrote:
>> This adds an ixgbe data structure that is used to determine what
>> headers:fields can be matched and in what order they are supported.
>>
>> For hardware devices this can be a bit tricky because typically
>> only pre-programmed (firmware, ucode, rtl) parse graphs will be
>> supported and we don't yet have an interface to change these from
>> the OS. So its sort of a you get whatever your friendly vendor
>> provides affair at the moment.
>>
>> In the future we can add the get routines and set routines to
>> update this data structure. One interesting thing to note here
>> is the data structure here identifies ethernet, ip, and tcp
>> fields without having to hardcode them as enumerations or use
>> other identifiers.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> 
> 
> should __u32 be u32? Also at some points you seem to interchangeably
> use unsigned int vs u32. I think most of unsigned ints should be u32.

Anything only kernel visible should be 'u32', whereas if the object is
exported via a uapi header file it should be '__u32'.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 6/8] net: ixgbe: add minimal parser details for ixgbe
  2016-02-17 15:09     ` David Miller
@ 2016-02-17 15:14       ` John Fastabend
  0 siblings, 0 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-17 15:14 UTC (permalink / raw)
  To: David Miller, jhs; +Cc: jiri, amir, netdev, jeffrey.t.kirsher

On 16-02-17 07:09 AM, David Miller wrote:
> From: Jamal Hadi Salim <jhs@mojatatu.com>
> Date: Wed, 17 Feb 2016 06:06:02 -0500
> 
>> On 16-02-17 12:18 AM, John Fastabend wrote:
>>> This adds an ixgbe data structure that is used to determine what
>>> headers:fields can be matched and in what order they are supported.
>>>
>>> For hardware devices this can be a bit tricky because typically
>>> only pre-programmed (firmware, ucode, rtl) parse graphs will be
>>> supported and we don't yet have an interface to change these from
>>> the OS. So its sort of a you get whatever your friendly vendor
>>> provides affair at the moment.
>>>
>>> In the future we can add the get routines and set routines to
>>> update this data structure. One interesting thing to note here
>>> is the data structure here identifies ethernet, ip, and tcp
>>> fields without having to hardcode them as enumerations or use
>>> other identifiers.
>>>
>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>
>>
>> should __u32 be u32? Also at some points you seem to interchangeably
>> use unsigned int vs u32. I think most of unsigned ints should be u32.
> 
> Anything only kernel visible should be 'u32', whereas if the object is
> exported via a uapi header file it should be '__u32'.
> 

yep, it looks like you may have applied the series so I'll push a
fix for this and another to align the structs a bit better in cls_u32 as
well per Jamals feedback.

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 6/8] net: ixgbe: add minimal parser details for ixgbe
  2016-02-17  5:18 ` [net-next PATCH v3 6/8] net: ixgbe: add minimal parser details for ixgbe John Fastabend
  2016-02-17 11:06   ` Jamal Hadi Salim
@ 2016-02-17 18:01   ` Rustad, Mark D
  2016-02-17 22:34     ` John Fastabend
  1 sibling, 1 reply; 34+ messages in thread
From: Rustad, Mark D @ 2016-02-17 18:01 UTC (permalink / raw)
  To: John Fastabend; +Cc: Jiri Pirko, amir, jhs, davem, netdev, Kirsher, Jeffrey T

[-- Attachment #1: Type: text/plain, Size: 676 bytes --]

John Fastabend <john.fastabend@gmail.com> wrote:

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h  
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
> new file mode 100644
> index 0000000..43ebec4
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
> @@ -0,0 +1,112 @@
> +/*******************************************************************************
> + *
> + * Intel 10 Gigabit PCI Express Linux drive
> + * Copyright(c) 2013 - 2015 Intel Corporation.

Those copyright dates don't seem reasonable for a new file, since it  
clearly didn't exist in 2013. IANAL, but it has to be better to at least be  
accurate about when it was created.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 6/8] net: ixgbe: add minimal parser details for ixgbe
  2016-02-17 18:01   ` Rustad, Mark D
@ 2016-02-17 22:34     ` John Fastabend
  0 siblings, 0 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-17 22:34 UTC (permalink / raw)
  To: Rustad, Mark D; +Cc: Jiri Pirko, amir, jhs, davem, netdev, Kirsher, Jeffrey T

On 16-02-17 10:01 AM, Rustad, Mark D wrote:
> John Fastabend <john.fastabend@gmail.com> wrote:
> 
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
>> b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
>> new file mode 100644
>> index 0000000..43ebec4
>> --- /dev/null
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
>> @@ -0,0 +1,112 @@
>> +/*******************************************************************************
>>
>> + *
>> + * Intel 10 Gigabit PCI Express Linux drive
>> + * Copyright(c) 2013 - 2015 Intel Corporation.
> 
> Those copyright dates don't seem reasonable for a new file, since it
> clearly didn't exist in 2013. IANAL, but it has to be better to at least
> be accurate about when it was created.
> 
> -- 
> Mark Rustad, Networking Division, Intel Corporation

Sure I'll fix this as well. :/ thanks Mark.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-17 14:24     ` John Fastabend
@ 2016-02-17 23:07       ` John Fastabend
  2016-02-18  9:23         ` Amir Vadai"
  2016-02-18 12:14         ` Jamal Hadi Salim
  0 siblings, 2 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-17 23:07 UTC (permalink / raw)
  To: Jamal Hadi Salim, jiri, amir, davem; +Cc: netdev, jeffrey.t.kirsher

[...]

>>
>>> +static void u32_replace_hw_hnode(struct tcf_proto *tp, struct
>>> tc_u_hnode *h)
>>> +{
>>> +    struct net_device *dev = tp->q->dev_queue->dev;
>>> +    struct tc_cls_u32_offload u32_offload = {0};
>>> +    struct tc_to_netdev offload;
>>> +
>>> +    offload.type = TC_SETUP_CLSU32;
>>> +    offload.cls_u32 = &u32_offload;
>>> +
>>> +    if (dev->netdev_ops->ndo_setup_tc) {
>>> +        offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
>>
>> TC_CLSU32_REPLACE_HNODE?
>>
> 
> Yep I made this change and will send out v4.
> 
> [...]
> 
>>

Actually thinking about this a bit more I wrote this thinking
that there existed some hardware that actually cared if it was
a new rule or an existing rule. For me it doesn't matter I do
the same thing in the new/replace cases I just write into the
slot on the hardware table and if it happens to have something
in it well its overwritten e.g. "replaced". This works because
the cls_u32 layer protects us from doing something unexpected.

I'm wondering (mostly asking the mlx folks) is there hardware
out there that cares to make this distinction between new and
replace? Otherwise I can just drop new and always use replace.
Or vice versa which is the case in its current form.

Thanks,
John

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-17 23:07       ` John Fastabend
@ 2016-02-18  9:23         ` Amir Vadai"
  2016-02-19  3:37           ` Simon Horman
  2016-02-19  8:16           ` Or Gerlitz
  2016-02-18 12:14         ` Jamal Hadi Salim
  1 sibling, 2 replies; 34+ messages in thread
From: Amir Vadai" @ 2016-02-18  9:23 UTC (permalink / raw)
  To: John Fastabend; +Cc: Jamal Hadi Salim, jiri, davem, netdev, jeffrey.t.kirsher

On Wed, Feb 17, 2016 at 03:07:23PM -0800, John Fastabend wrote:
> [...]
> 
> >>
> >>> +static void u32_replace_hw_hnode(struct tcf_proto *tp, struct
> >>> tc_u_hnode *h)
> >>> +{
> >>> +    struct net_device *dev = tp->q->dev_queue->dev;
> >>> +    struct tc_cls_u32_offload u32_offload = {0};
> >>> +    struct tc_to_netdev offload;
> >>> +
> >>> +    offload.type = TC_SETUP_CLSU32;
> >>> +    offload.cls_u32 = &u32_offload;
> >>> +
> >>> +    if (dev->netdev_ops->ndo_setup_tc) {
> >>> +        offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
> >>
> >> TC_CLSU32_REPLACE_HNODE?
> >>
> > 
> > Yep I made this change and will send out v4.
> > 
> > [...]
> > 
> >>
> 
> Actually thinking about this a bit more I wrote this thinking
> that there existed some hardware that actually cared if it was
> a new rule or an existing rule. For me it doesn't matter I do
> the same thing in the new/replace cases I just write into the
> slot on the hardware table and if it happens to have something
> in it well its overwritten e.g. "replaced". This works because
> the cls_u32 layer protects us from doing something unexpected.
> 
> I'm wondering (mostly asking the mlx folks) is there hardware
> out there that cares to make this distinction between new and
> replace? Otherwise I can just drop new and always use replace.
> Or vice versa which is the case in its current form.
I don't see a need for such a distinction in mlx hardware.

Thanks,
Amir.

> 
> Thanks,
> John
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-17 23:07       ` John Fastabend
  2016-02-18  9:23         ` Amir Vadai"
@ 2016-02-18 12:14         ` Jamal Hadi Salim
  2016-02-18 15:24           ` John Fastabend
  1 sibling, 1 reply; 34+ messages in thread
From: Jamal Hadi Salim @ 2016-02-18 12:14 UTC (permalink / raw)
  To: John Fastabend, jiri, amir, davem; +Cc: netdev, jeffrey.t.kirsher

On 16-02-17 06:07 PM, John Fastabend wrote:
> [...]
>

> Actually thinking about this a bit more I wrote this thinking
> that there existed some hardware that actually cared if it was
> a new rule or an existing rule. For me it doesn't matter I do
> the same thing in the new/replace cases I just write into the
> slot on the hardware table and if it happens to have something
> in it well its overwritten e.g. "replaced". This works because
> the cls_u32 layer protects us from doing something unexpected.
>

You are describing create-or-update which is a reasonable default
BUT: counting on the user to specify the htid+bktid+nodeid
for every filter and knowing what that means is prone to mistakes
when for example (using your big hammer approach right now) they
dont specify the handle and the kernel creates one for them.

IMO, it would be better at this early stage to enforce the correct
behavior for future generations.
To follow the netlink semantics which a lot of people are already
trained to think in.

Current netlink behavior is supposed to be:

1) NEW ==> "Create".
Ambigous - could mean a)"create if it doesnt exist" or b) "fail if it 
exists otherwise create"
Unfortunately different parts of the kernel often assume some
default from either #a or #b.

2) NEW|REPLACE flag ==> "Create if it doesnt exist and replace
if it exists"

3)NEW|EXCLUSIVE ==> "Create if it doesnt exist and fail if it
exists"

4)NEW|APPEND ==> "just fscking create; i dont care if it exists".

IOW, just add the flag field which is intepreted from whatever
the user explicitly asks for. And reject say what the hardware
doesnt support.
I have worked with tcams where we support #3. It is a bit inefficient
because you have to check if a rule exists first. And i have worked
in cases where #1 is assumed to mean #2 and at times #4. It is better
user experience to be explicit.

cheers,
jamal

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-18 12:14         ` Jamal Hadi Salim
@ 2016-02-18 15:24           ` John Fastabend
  2016-02-19 12:52             ` Jamal Hadi Salim
  0 siblings, 1 reply; 34+ messages in thread
From: John Fastabend @ 2016-02-18 15:24 UTC (permalink / raw)
  To: Jamal Hadi Salim, jiri, amir, davem; +Cc: netdev, jeffrey.t.kirsher

On 16-02-18 04:14 AM, Jamal Hadi Salim wrote:
> On 16-02-17 06:07 PM, John Fastabend wrote:
>> [...]
>>
> 
>> Actually thinking about this a bit more I wrote this thinking
>> that there existed some hardware that actually cared if it was
>> a new rule or an existing rule. For me it doesn't matter I do
>> the same thing in the new/replace cases I just write into the
>> slot on the hardware table and if it happens to have something
>> in it well its overwritten e.g. "replaced". This works because
>> the cls_u32 layer protects us from doing something unexpected.
>>
> 
> You are describing create-or-update which is a reasonable default
> BUT: counting on the user to specify the htid+bktid+nodeid
> for every filter and knowing what that means is prone to mistakes
> when for example (using your big hammer approach right now) they
> dont specify the handle and the kernel creates one for them.
> 
> IMO, it would be better at this early stage to enforce the correct
> behavior for future generations.
> To follow the netlink semantics which a lot of people are already
> trained to think in.
> 
> Current netlink behavior is supposed to be:
> 
> 1) NEW ==> "Create".
> Ambigous - could mean a)"create if it doesnt exist" or b) "fail if it
> exists otherwise create"
> Unfortunately different parts of the kernel often assume some
> default from either #a or #b.
> 

But this is already handled by the core cls_api.c code. We never
get to u32_change if the flags are not correct.

Look at the block right above the op call into the classifiers
change() code in cls_api.c. Starting at line 287.

> 2) NEW|REPLACE flag ==> "Create if it doesnt exist and replace
> if it exists"
> 
> 3)NEW|EXCLUSIVE ==> "Create if it doesnt exist and fail if it
> exists"
> 
> 4)NEW|APPEND ==> "just fscking create; i dont care if it exists".
> 
> IOW, just add the flag field which is intepreted from whatever
> the user explicitly asks for. And reject say what the hardware
> doesnt support.

I think I agree with what your saying but I'm fairly sure this is
working as you describe.

> I have worked with tcams where we support #3. It is a bit inefficient
> because you have to check if a rule exists first. And i have worked
> in cases where #1 is assumed to mean #2 and at times #4. It is better
> user experience to be explicit.

Agreed, and I think it is :)

> 
> cheers,
> jamal

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-18  9:23         ` Amir Vadai"
@ 2016-02-19  3:37           ` Simon Horman
  2016-02-19  8:16           ` Or Gerlitz
  1 sibling, 0 replies; 34+ messages in thread
From: Simon Horman @ 2016-02-19  3:37 UTC (permalink / raw)
  To: Amir Vadai"
  Cc: John Fastabend, Jamal Hadi Salim, jiri, davem, netdev, jeffrey.t.kirsher

On Thu, Feb 18, 2016 at 11:23:35AM +0200, Amir Vadai" wrote:
> On Wed, Feb 17, 2016 at 03:07:23PM -0800, John Fastabend wrote:
> > [...]
> > 
> > >>
> > >>> +static void u32_replace_hw_hnode(struct tcf_proto *tp, struct
> > >>> tc_u_hnode *h)
> > >>> +{
> > >>> +    struct net_device *dev = tp->q->dev_queue->dev;
> > >>> +    struct tc_cls_u32_offload u32_offload = {0};
> > >>> +    struct tc_to_netdev offload;
> > >>> +
> > >>> +    offload.type = TC_SETUP_CLSU32;
> > >>> +    offload.cls_u32 = &u32_offload;
> > >>> +
> > >>> +    if (dev->netdev_ops->ndo_setup_tc) {
> > >>> +        offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
> > >>
> > >> TC_CLSU32_REPLACE_HNODE?
> > >>
> > > 
> > > Yep I made this change and will send out v4.
> > > 
> > > [...]
> > > 
> > >>
> > 
> > Actually thinking about this a bit more I wrote this thinking
> > that there existed some hardware that actually cared if it was
> > a new rule or an existing rule. For me it doesn't matter I do
> > the same thing in the new/replace cases I just write into the
> > slot on the hardware table and if it happens to have something
> > in it well its overwritten e.g. "replaced". This works because
> > the cls_u32 layer protects us from doing something unexpected.
> > 
> > I'm wondering (mostly asking the mlx folks) is there hardware
> > out there that cares to make this distinction between new and
> > replace? Otherwise I can just drop new and always use replace.
> > Or vice versa which is the case in its current form.
> I don't see a need for such a distinction in mlx hardware.

FWIW, I think it is unlikely such a distinction would
be needed for Netronome hardware.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-18  9:23         ` Amir Vadai"
  2016-02-19  3:37           ` Simon Horman
@ 2016-02-19  8:16           ` Or Gerlitz
  1 sibling, 0 replies; 34+ messages in thread
From: Or Gerlitz @ 2016-02-19  8:16 UTC (permalink / raw)
  To: Amir Vadai, John Fastabend
  Cc: Jamal Hadi Salim, Jiří Pírko, David Miller,
	Linux Netdev List, Jeff Kirsher

On Thu, Feb 18, 2016 at 11:23 AM, Amir Vadai" <amir@vadai.me> wrote:
> On Wed, Feb 17, 2016 at 03:07:23PM -0800, John Fastabend wrote:

>> Actually thinking about this a bit more I wrote this thinking
>> that there existed some hardware that actually cared if it was
>> a new rule or an existing rule. For me it doesn't matter I do
>> the same thing in the new/replace cases I just write into the
>> slot on the hardware table and if it happens to have something
>> in it well its overwritten e.g. "replaced". This works because
>> the cls_u32 layer protects us from doing something unexpected.

>> I'm wondering (mostly asking the mlx folks) is there hardware
>> out there that cares to make this distinction between new and
>> replace? Otherwise I can just drop new and always use replace.
>> Or vice versa which is the case in its current form.

> I don't see a need for such a distinction in mlx hardware.

If you (say) have the same match and different action for a given rule
we have HW API to modify existing steering entry under which we would
end up calling the FW once instead of twice (delete old, add new).

Or.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-18 15:24           ` John Fastabend
@ 2016-02-19 12:52             ` Jamal Hadi Salim
  0 siblings, 0 replies; 34+ messages in thread
From: Jamal Hadi Salim @ 2016-02-19 12:52 UTC (permalink / raw)
  To: John Fastabend, jiri, amir, davem; +Cc: netdev, jeffrey.t.kirsher

On 16-02-18 10:24 AM, John Fastabend wrote:
> On 16-02-18 04:14 AM, Jamal Hadi Salim wrote:
>> On 16-02-17 06:07 PM, John Fastabend wrote:
>>> [...]
>>>
>>

>> IMO, it would be better at this early stage to enforce the correct
>> behavior for future generations.
>> To follow the netlink semantics which a lot of people are already
>> trained to think in.
>>
>> Current netlink behavior is supposed to be:
>>
>> 1) NEW ==> "Create".
>> Ambigous - could mean a)"create if it doesnt exist" or b) "fail if it
>> exists otherwise create"
>> Unfortunately different parts of the kernel often assume some
>> default from either #a or #b.
>>
>
> But this is already handled by the core cls_api.c code. We never
> get to u32_change if the flags are not correct.
>
> Look at the block right above the op call into the classifiers
> change() code in cls_api.c. Starting at line 287.
>
>

Indeed that would cover s/ware filters but not h/ware. That will
depend on what hardware can do. Really,
all you need is to propagate those flags. Your
driver ndo can ignore them. I know i will need them.

Alternatively: If we say all filters are going to be stored in
s/ware as well i.e the tri-state we talked about then the cls_api
checks will work as well. But when you are talking millions
of filters (such as i deal with) - that may become impractical
(and then you are going to have all kind of clever things to find
whether an EXCLUSIVE will work or not etc).

cheers,
jamal

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2016-02-19 12:52 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-17  5:15 [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe John Fastabend
2016-02-17  5:16 ` [net-next PATCH v3 1/8] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
2016-02-17 10:42   ` Jamal Hadi Salim
2016-02-17  5:16 ` [net-next PATCH v3 2/8] net: rework setup_tc ndo op to consume general tc operand John Fastabend
2016-02-17 10:42   ` Jamal Hadi Salim
2016-02-17  5:17 ` [net-next PATCH v3 3/8] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
2016-02-17  7:02   ` Jiri Pirko
2016-02-17 10:59   ` Jamal Hadi Salim
2016-02-17 14:24     ` John Fastabend
2016-02-17 23:07       ` John Fastabend
2016-02-18  9:23         ` Amir Vadai"
2016-02-19  3:37           ` Simon Horman
2016-02-19  8:16           ` Or Gerlitz
2016-02-18 12:14         ` Jamal Hadi Salim
2016-02-18 15:24           ` John Fastabend
2016-02-19 12:52             ` Jamal Hadi Salim
2016-02-17  5:17 ` [net-next PATCH v3 4/8] net: add tc offload feature flag John Fastabend
2016-02-17 11:01   ` Jamal Hadi Salim
2016-02-17  5:18 ` [net-next PATCH v3 5/8] net: tc: helper functions to query action types John Fastabend
2016-02-17  7:03   ` Jiri Pirko
2016-02-17 11:02   ` Jamal Hadi Salim
2016-02-17  5:18 ` [net-next PATCH v3 6/8] net: ixgbe: add minimal parser details for ixgbe John Fastabend
2016-02-17 11:06   ` Jamal Hadi Salim
2016-02-17 15:09     ` David Miller
2016-02-17 15:14       ` John Fastabend
2016-02-17 18:01   ` Rustad, Mark D
2016-02-17 22:34     ` John Fastabend
2016-02-17  5:18 ` [net-next PATCH v3 7/8] net: ixgbe: add support for tc_u32 offload John Fastabend
2016-02-17 11:17   ` Jamal Hadi Salim
2016-02-17 11:42     ` Jiri Pirko
2016-02-17 11:47       ` Jamal Hadi Salim
2016-02-17 14:25         ` John Fastabend
2016-02-17  5:19 ` [net-next PATCH v3 8/8] net: ixgbe: abort with cls u32 divisor groups greater than 1 John Fastabend
2016-02-17 14:48 ` [net-next PATCH v3 0/8] tc offload for cls_u32 on ixgbe David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.