All of lore.kernel.org
 help / color / mirror / Atom feed
* [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
@ 2016-02-03  9:27 John Fastabend
  2016-02-03  9:27 ` [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
                   ` (8 more replies)
  0 siblings, 9 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-03  9:27 UTC (permalink / raw)
  To: amir, ogerlitz, jiri, jhs, jeffrey.t.kirsher; +Cc: netdev, davem

This extends the setup_tc framework so it can support more than just
the mqprio offload and push other classifiers and qdiscs into the
hardware. The series here targets the u32 classifier and ixgbe
driver. I worked out the u32 classifier because it is protocol
oblivious and aligns with multiple hardware devices I have access
to. I did an initial implementation on ixgbe because (a) I have one
in my box (b) its a stable driver and (c) it is relatively simple
compared to the other devices I have here but still has enough
flexibility to exercise the features of cls_u32.

I intentionally limited the scope of this series to the basic
feature set. Specifically this uses a 'big hammer' feature bit
to do the offload or not. If the bit is set you get offloaded rules
if it is not then rules will not be offloaded. If we can agree on
this patch series there are some more patches on my queue we can
talk about to make the offload decision per rule using flags similar
to how we do l2 mac updates. Additionally the error strategy can
be improved to be hard aborting, log and continue, etc. I think
these are nice to have improvements but shouldn't block this series.

Also by adding get_parse_graph and set_parse_graph attributes as
in my previous flow_api work we can build programmable devices
and programmatically learn when rules can or can not be loaded
into the hardware. Again future work.

Any comments/feedback appreciated.

Thanks,
John

---

John Fastabend (7):
      net: rework ndo tc op to consume additional qdisc handle parameter
      net: rework setup_tc ndo op to consume general tc operand
      net: sched: add cls_u32 offload hooks for netdevs
      net: add tc offload feature flag
      net: tc: helper functions to query action types
      net: ixgbe: add minimal parser details for ixgbe
      net: ixgbe: add support for tc_u32 offload


 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |    8 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  |    2 
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 
 drivers/net/ethernet/broadcom/bnxt/bnxt.c        |    9 +
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |   11 +
 drivers/net/ethernet/intel/i40e/i40e_main.c      |   10 +
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    3 
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    6 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  206 ++++++++++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h   |  112 ++++++++++++
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |   13 +
 drivers/net/ethernet/sfc/efx.h                   |    3 
 drivers/net/ethernet/sfc/tx.c                    |   10 +
 drivers/net/ethernet/ti/netcp_core.c             |   14 +
 include/linux/netdev_features.h                  |    3 
 include/linux/netdevice.h                        |   24 ++-
 include/net/pkt_cls.h                            |   33 ++++
 include/net/tc_act/tc_gact.h                     |   16 ++
 net/core/ethtool.c                               |    1 
 net/sched/cls_u32.c                              |   73 ++++++++
 net/sched/sch_mqprio.c                           |    8 +
 21 files changed, 541 insertions(+), 26 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter
  2016-02-03  9:27 [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe John Fastabend
@ 2016-02-03  9:27 ` John Fastabend
  2016-02-03  9:58   ` kbuild test robot
                     ` (2 more replies)
  2016-02-03  9:28 ` [net-next PATCH 2/7] net: rework setup_tc ndo op to consume general tc operand John Fastabend
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-03  9:27 UTC (permalink / raw)
  To: amir, ogerlitz, jiri, jhs, jeffrey.t.kirsher; +Cc: netdev, davem

The ndo_setup_tc() op was added to support drivers offloading tx
qdiscs however only support for mqprio was ever added. So we
only ever added support for passing the number of traffic classes
to the driver.

This patch generalizes the ndo_setup_tc op so that a handle can
be provided to indicate if the offload is for ingress or egress
or potentially even child qdiscs.

CC: Murali Karicheri <m-karicheri2@ti.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: Or Gerlitz <ogerlitz@mellanox.com>
CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Bruce Allan <bruce.w.allan@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |    7 +++++++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  |    1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c        |    5 ++++-
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |   10 +++++++++-
 drivers/net/ethernet/intel/i40e/i40e_main.c      |    9 ++++++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |   11 ++++++++++-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |   12 ++++++++++--
 drivers/net/ethernet/sfc/efx.h                   |    2 +-
 drivers/net/ethernet/sfc/tx.c                    |    5 ++++-
 drivers/net/ethernet/ti/netcp_core.c             |    5 ++++-
 include/linux/netdevice.h                        |    2 +-
 net/sched/sch_mqprio.c                           |    5 +++--
 13 files changed, 63 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 9695a4c..1c7ff51 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4272,6 +4272,13 @@ int bnx2x_setup_tc(struct net_device *dev, u8 num_tc)
 	return 0;
 }
 
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
+{
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+	return bnx2x_setup_tc(dev, num_tc);
+}
+
 /* called with rtnl_lock */
 int bnx2x_change_mac_addr(struct net_device *dev, void *p)
 {
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 4cbb03f8..e92d6e7 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -486,6 +486,7 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev);
 
 /* setup_tc callback */
 int bnx2x_setup_tc(struct net_device *dev, u8 num_tc);
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc);
 
 int bnx2x_get_vf_config(struct net_device *dev, int vf,
 			struct ifla_vf_info *ivi);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 6c4e3a6..b17bb17 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12994,7 +12994,7 @@ static const struct net_device_ops bnx2x_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= poll_bnx2x,
 #endif
-	.ndo_setup_tc		= bnx2x_setup_tc,
+	.ndo_setup_tc		= __bnx2x_setup_tc,
 #ifdef CONFIG_BNX2X_SRIOV
 	.ndo_set_vf_mac		= bnx2x_set_vf_mac,
 	.ndo_set_vf_vlan	= bnx2x_set_vf_vlan,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5dc89e5..ff08faf 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5370,10 +5370,13 @@ static int bnxt_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
-static int bnxt_setup_tc(struct net_device *dev, u8 tc)
+static int bnxt_setup_tc(struct net_device *dev, u32 handle, u8 tc)
 {
 	struct bnxt *bp = netdev_priv(dev);
 
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
 	if (tc > bp->max_tc) {
 		netdev_err(dev, "too many traffic classes requested: %d Max supported is %d\n",
 			   tc, bp->max_tc);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 662569d..12701a4 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1204,6 +1204,14 @@ err_queueing_scheme:
 	return err;
 }
 
+static int __fm10k_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+{
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
+	return fm10k_setup_tc(dev, tc);
+}
+
 static int fm10k_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
 {
 	switch (cmd) {
@@ -1386,7 +1394,7 @@ static const struct net_device_ops fm10k_netdev_ops = {
 	.ndo_vlan_rx_kill_vid	= fm10k_vlan_rx_kill_vid,
 	.ndo_set_rx_mode	= fm10k_set_rx_mode,
 	.ndo_get_stats64	= fm10k_get_stats64,
-	.ndo_setup_tc		= fm10k_setup_tc,
+	.ndo_setup_tc		= __fm10k_setup_tc,
 	.ndo_set_vf_mac		= fm10k_ndo_set_vf_mac,
 	.ndo_set_vf_vlan	= fm10k_ndo_set_vf_vlan,
 	.ndo_set_vf_rate	= fm10k_ndo_set_vf_bw,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 8f3b53e..ae70e7c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5306,6 +5306,13 @@ exit:
 	return ret;
 }
 
+static int __i40e_setup_tc(struct net_device *netdev, u32 handle, u8 tc)
+{
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+	return i40e_setup_tc(netdev, tc);
+}
+
 /**
  * i40e_open - Called when a network interface is made active
  * @netdev: network interface device structure
@@ -8890,7 +8897,7 @@ static const struct net_device_ops i40e_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= i40e_netpoll,
 #endif
-	.ndo_setup_tc		= i40e_setup_tc,
+	.ndo_setup_tc		= __i40e_setup_tc,
 #ifdef I40E_FCOE
 	.ndo_fcoe_enable	= i40e_fcoe_enable,
 	.ndo_fcoe_disable	= i40e_fcoe_disable,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index c4003a8..61ec625 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8200,6 +8200,15 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	return 0;
 }
 
+int __ixgbe_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+{
+	/* Only support egress tc setup for now */
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
+	return ixgbe_setup_tc(dev, tc);
+}
+
 #ifdef CONFIG_PCI_IOV
 void ixgbe_sriov_reinit(struct ixgbe_adapter *adapter)
 {
@@ -8658,7 +8667,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_get_vf_config	= ixgbe_ndo_get_vf_config,
 	.ndo_get_stats64	= ixgbe_get_stats64,
 #ifdef CONFIG_IXGBE_DCB
-	.ndo_setup_tc		= ixgbe_setup_tc,
+	.ndo_setup_tc		= __ixgbe_setup_tc,
 #endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= ixgbe_netpoll,
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 0c7e3f6..d5c6c16 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -69,6 +69,14 @@ int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 	return 0;
 }
 
+static int __mlx4_en_setup_tc(struct net_device *dev, u32 handle, u8 up)
+{
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
+	return mlx4_en_setup_tc(dev, up);
+}
+
 #ifdef CONFIG_RFS_ACCEL
 
 struct mlx4_en_filter {
@@ -2466,7 +2474,7 @@ static const struct net_device_ops mlx4_netdev_ops = {
 #endif
 	.ndo_set_features	= mlx4_en_set_features,
 	.ndo_fix_features	= mlx4_en_fix_features,
-	.ndo_setup_tc		= mlx4_en_setup_tc,
+	.ndo_setup_tc		= __mlx4_en_setup_tc,
 #ifdef CONFIG_RFS_ACCEL
 	.ndo_rx_flow_steer	= mlx4_en_filter_rfs,
 #endif
@@ -2504,7 +2512,7 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
 #endif
 	.ndo_set_features	= mlx4_en_set_features,
 	.ndo_fix_features	= mlx4_en_fix_features,
-	.ndo_setup_tc		= mlx4_en_setup_tc,
+	.ndo_setup_tc		= __mlx4_en_setup_tc,
 #ifdef CONFIG_RFS_ACCEL
 	.ndo_rx_flow_steer	= mlx4_en_filter_rfs,
 #endif
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 1082747..7815fa0 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -32,7 +32,7 @@ netdev_tx_t efx_hard_start_xmit(struct sk_buff *skb,
 				struct net_device *net_dev);
 netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb);
 void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index);
-int efx_setup_tc(struct net_device *net_dev, u8 num_tc);
+int efx_setup_tc(struct net_device *net_dev, u32 handle, u8 num_tc);
 unsigned int efx_tx_max_skb_descs(struct efx_nic *efx);
 extern unsigned int efx_piobuf_size;
 extern bool efx_separate_tx_channels;
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index f7a0ec1..8f1d53e 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -562,7 +562,7 @@ void efx_init_tx_queue_core_txq(struct efx_tx_queue *tx_queue)
 				     efx->n_tx_channels : 0));
 }
 
-int efx_setup_tc(struct net_device *net_dev, u8 num_tc)
+int efx_setup_tc(struct net_device *net_dev, u32 handle, u8 num_tc)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
 	struct efx_channel *channel;
@@ -570,6 +570,9 @@ int efx_setup_tc(struct net_device *net_dev, u8 num_tc)
 	unsigned tc;
 	int rc;
 
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
 	if (efx_nic_rev(efx) < EFX_REV_FALCON_B0 || num_tc > EFX_MAX_TX_TC)
 		return -EINVAL;
 
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index c61d66d..40cde81 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -1835,13 +1835,16 @@ static u16 netcp_select_queue(struct net_device *dev, struct sk_buff *skb,
 	return 0;
 }
 
-static int netcp_setup_tc(struct net_device *dev, u8 num_tc)
+static int netcp_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
 {
 	int i;
 
 	/* setup tc must be called under rtnl lock */
 	ASSERT_RTNL();
 
+	if (handle != TC_H_ROOT)
+		return -EINVAL;
+
 	/* Sanity-check the number of traffic classes requested */
 	if ((dev->real_num_tx_queues <= 1) ||
 	    (dev->real_num_tx_queues < num_tc))
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 289c231..e8d01a1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1150,7 +1150,7 @@ struct net_device_ops {
 	int			(*ndo_set_vf_rss_query_en)(
 						   struct net_device *dev,
 						   int vf, bool setting);
-	int			(*ndo_setup_tc)(struct net_device *dev, u8 tc);
+	int			(*ndo_setup_tc)(struct net_device *dev, u32 handle, u8 tc);
 #if IS_ENABLED(CONFIG_FCOE)
 	int			(*ndo_fcoe_enable)(struct net_device *dev);
 	int			(*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 6bc2617..4cec88a 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -39,7 +39,7 @@ static void mqprio_destroy(struct Qdisc *sch)
 	}
 
 	if (priv->hw_owned && dev->netdev_ops->ndo_setup_tc)
-		dev->netdev_ops->ndo_setup_tc(dev, 0);
+		dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0);
 	else
 		netdev_set_num_tc(dev, 0);
 }
@@ -141,7 +141,8 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	 */
 	if (qopt->hw) {
 		priv->hw_owned = 1;
-		err = dev->netdev_ops->ndo_setup_tc(dev, qopt->num_tc);
+		err = dev->netdev_ops->ndo_setup_tc(dev, sch->handle,
+						    qopt->num_tc);
 		if (err)
 			goto err;
 	} else {

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [net-next PATCH 2/7] net: rework setup_tc ndo op to consume general tc operand
  2016-02-03  9:27 [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe John Fastabend
  2016-02-03  9:27 ` [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
@ 2016-02-03  9:28 ` John Fastabend
  2016-02-03  9:28 ` [net-next PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-03  9:28 UTC (permalink / raw)
  To: amir, ogerlitz, jiri, jhs, jeffrey.t.kirsher; +Cc: netdev, davem

This patch updates setup_tc so we can pass additional parameters into
the ndo op in a generic way. To do this we provide structured union
and type flag.

This lets each classifier and qdisc provide its own set of attributes
without having to add new ndo ops or grow the signature of the
callback.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |    7 ++++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |    3 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       |    8 ++++++--
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |    7 ++++---
 drivers/net/ethernet/intel/i40e/i40e_main.c     |    7 ++++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |    7 ++++---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |    7 ++++---
 drivers/net/ethernet/sfc/efx.h                  |    3 ++-
 drivers/net/ethernet/sfc/tx.c                   |    9 ++++++---
 drivers/net/ethernet/ti/netcp_core.c            |   13 +++++++------
 include/linux/netdevice.h                       |   20 +++++++++++++++++++-
 net/sched/sch_mqprio.c                          |    9 ++++++---
 12 files changed, 68 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 1c7ff51..e925831 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4272,11 +4272,12 @@ int bnx2x_setup_tc(struct net_device *dev, u8 num_tc)
 	return 0;
 }
 
-int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+		     struct tc_to_netdev *tc)
 {
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
-	return bnx2x_setup_tc(dev, num_tc);
+	return bnx2x_setup_tc(dev, tc->tc);
 }
 
 /* called with rtnl_lock */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index e92d6e7..ef2c776 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -486,7 +486,8 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct net_device *dev);
 
 /* setup_tc callback */
 int bnx2x_setup_tc(struct net_device *dev, u8 num_tc);
-int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc);
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+		     struct tc_to_netdev *tc);
 
 int bnx2x_get_vf_config(struct net_device *dev, int vf,
 			struct ifla_vf_info *ivi);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index ff08faf..169920a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5370,13 +5370,17 @@ static int bnxt_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
-static int bnxt_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			 struct tc_to_netdev *ntc)
 {
 	struct bnxt *bp = netdev_priv(dev);
+	u8 tc;
 
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || ntc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
+	tc = ntc->tc;
+
 	if (tc > bp->max_tc) {
 		netdev_err(dev, "too many traffic classes requested: %d Max supported is %d\n",
 			   tc, bp->max_tc);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 12701a4..dc1a821 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1204,12 +1204,13 @@ err_queueing_scheme:
 	return err;
 }
 
-static int __fm10k_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+static int __fm10k_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			    struct tc_to_netdev *tc)
 {
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
-	return fm10k_setup_tc(dev, tc);
+	return fm10k_setup_tc(dev, tc->tc);
 }
 
 static int fm10k_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index ae70e7c..8a84328 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5306,11 +5306,12 @@ exit:
 	return ret;
 }
 
-static int __i40e_setup_tc(struct net_device *netdev, u32 handle, u8 tc)
+static int __i40e_setup_tc(struct net_device *netdev, u32 handle, __be16 proto,
+			   struct tc_to_netdev *tc)
 {
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
-	return i40e_setup_tc(netdev, tc);
+	return i40e_setup_tc(netdev, tc->tc);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 61ec625..03e236c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8200,13 +8200,14 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	return 0;
 }
 
-int __ixgbe_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+int __ixgbe_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+		     struct tc_to_netdev *tc)
 {
 	/* Only support egress tc setup for now */
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
-	return ixgbe_setup_tc(dev, tc);
+	return ixgbe_setup_tc(dev, tc->tc);
 }
 
 #ifdef CONFIG_PCI_IOV
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index d5c6c16..01d6a96 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -69,12 +69,13 @@ int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 	return 0;
 }
 
-static int __mlx4_en_setup_tc(struct net_device *dev, u32 handle, u8 up)
+static int __mlx4_en_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			      struct tc_to_netdev *tc)
 {
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
-	return mlx4_en_setup_tc(dev, up);
+	return mlx4_en_setup_tc(dev, tc->tc);
 }
 
 #ifdef CONFIG_RFS_ACCEL
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 7815fa0..5e3f93f 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -32,7 +32,8 @@ netdev_tx_t efx_hard_start_xmit(struct sk_buff *skb,
 				struct net_device *net_dev);
 netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb);
 void efx_xmit_done(struct efx_tx_queue *tx_queue, unsigned int index);
-int efx_setup_tc(struct net_device *net_dev, u32 handle, u8 num_tc);
+int efx_setup_tc(struct net_device *net_dev, u32 handle, __be16 proto,
+		 struct tc_to_netdev *tc);
 unsigned int efx_tx_max_skb_descs(struct efx_nic *efx);
 extern unsigned int efx_piobuf_size;
 extern bool efx_separate_tx_channels;
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index 8f1d53e..2cdb571 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -562,17 +562,20 @@ void efx_init_tx_queue_core_txq(struct efx_tx_queue *tx_queue)
 				     efx->n_tx_channels : 0));
 }
 
-int efx_setup_tc(struct net_device *net_dev, u32 handle, u8 num_tc)
+int efx_setup_tc(struct net_device *net_dev, u32 handle, __be16 proto,
+		 struct tc_to_netdev *ntc)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
 	struct efx_channel *channel;
 	struct efx_tx_queue *tx_queue;
-	unsigned tc;
+	unsigned tc, num_tc;
 	int rc;
 
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || ntc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
+	num_tc = ntc->tc;
+
 	if (efx_nic_rev(efx) < EFX_REV_FALCON_B0 || num_tc > EFX_MAX_TX_TC)
 		return -EINVAL;
 
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index 40cde81..8586a20 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -1835,25 +1835,26 @@ static u16 netcp_select_queue(struct net_device *dev, struct sk_buff *skb,
 	return 0;
 }
 
-static int netcp_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
+static int netcp_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			  struct tc_to_netdev tc)
 {
 	int i;
 
 	/* setup tc must be called under rtnl lock */
 	ASSERT_RTNL();
 
-	if (handle != TC_H_ROOT)
+	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
 	/* Sanity-check the number of traffic classes requested */
 	if ((dev->real_num_tx_queues <= 1) ||
-	    (dev->real_num_tx_queues < num_tc))
+	    (dev->real_num_tx_queues < tc->tc))
 		return -EINVAL;
 
 	/* Configure traffic class to queue mappings */
-	if (num_tc) {
-		netdev_set_num_tc(dev, num_tc);
-		for (i = 0; i < num_tc; i++)
+	if (tc->tc) {
+		netdev_set_num_tc(dev, tc->tc);
+		for (i = 0; i < tc->tc; i++)
 			netdev_set_tc_queue(dev, i, 1, i);
 	} else {
 		netdev_reset_tc(dev);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e8d01a1..9090ff7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -778,6 +778,21 @@ static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
 typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
 				       struct sk_buff *skb);
 
+/* This structure holds attributes of qdisc and classifiers
+ * that are being passed to the netdevice through the setup_tc op.
+ */
+enum {
+	TC_SETUP_MQPRIO,
+};
+
+struct tc_to_netdev {
+	unsigned int type;
+	union {
+		u8 tc;
+	};
+};
+
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1150,7 +1165,10 @@ struct net_device_ops {
 	int			(*ndo_set_vf_rss_query_en)(
 						   struct net_device *dev,
 						   int vf, bool setting);
-	int			(*ndo_setup_tc)(struct net_device *dev, u32 handle, u8 tc);
+	int			(*ndo_setup_tc)(struct net_device *dev,
+						u32 handle,
+						__be16 protocol,
+						struct tc_to_netdev *tc);
 #if IS_ENABLED(CONFIG_FCOE)
 	int			(*ndo_fcoe_enable)(struct net_device *dev);
 	int			(*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 4cec88a..06b2797 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -28,6 +28,7 @@ static void mqprio_destroy(struct Qdisc *sch)
 {
 	struct net_device *dev = qdisc_dev(sch);
 	struct mqprio_sched *priv = qdisc_priv(sch);
+	struct tc_to_netdev tc = {.type = TC_SETUP_MQPRIO};
 	unsigned int ntx;
 
 	if (priv->qdiscs) {
@@ -39,7 +40,7 @@ static void mqprio_destroy(struct Qdisc *sch)
 	}
 
 	if (priv->hw_owned && dev->netdev_ops->ndo_setup_tc)
-		dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0);
+		dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0, &tc);
 	else
 		netdev_set_num_tc(dev, 0);
 }
@@ -140,9 +141,11 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	 * supplied and verified mapping
 	 */
 	if (qopt->hw) {
+		struct tc_to_netdev tc = {.type = TC_SETUP_MQPRIO,
+					  .tc = qopt->num_tc};
+
 		priv->hw_owned = 1;
-		err = dev->netdev_ops->ndo_setup_tc(dev, sch->handle,
-						    qopt->num_tc);
+		err = dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0, &tc);
 		if (err)
 			goto err;
 	} else {

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [net-next PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-03  9:27 [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe John Fastabend
  2016-02-03  9:27 ` [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
  2016-02-03  9:28 ` [net-next PATCH 2/7] net: rework setup_tc ndo op to consume general tc operand John Fastabend
@ 2016-02-03  9:28 ` John Fastabend
  2016-02-03 10:14   ` kbuild test robot
  2016-02-04 13:18   ` Amir Vadai"
  2016-02-03  9:28 ` [net-next PATCH 4/7] net: add tc offload feature flag John Fastabend
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-03  9:28 UTC (permalink / raw)
  To: amir, ogerlitz, jiri, jhs, jeffrey.t.kirsher; +Cc: netdev, davem

This patch allows netdev drivers to consume cls_u32 offloads via
the ndo_setup_tc ndo op.

This works aligns with how network drivers have been doing qdisc
offloads for mqprio.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/netdevice.h |    6 +++-
 include/net/pkt_cls.h     |   33 ++++++++++++++++++++
 net/sched/cls_u32.c       |   73 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 109 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9090ff7..861ce67 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -778,17 +778,21 @@ static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
 typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
 				       struct sk_buff *skb);
 
-/* This structure holds attributes of qdisc and classifiers
+/* These structures hold the attributes of qdisc and classifiers
  * that are being passed to the netdevice through the setup_tc op.
  */
 enum {
 	TC_SETUP_MQPRIO,
+	TC_SETUP_CLSU32,
 };
 
+struct tc_cls_u32_offload;
+
 struct tc_to_netdev {
 	unsigned int type;
 	union {
 		u8 tc;
+		struct tc_cls_u32_offload *cls_u32;
 	};
 };
 
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index bc49967..0bd12cd 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -358,4 +358,37 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 }
 #endif /* CONFIG_NET_CLS_IND */
 
+struct tc_cls_u32_knode {
+	struct tcf_exts *exts;
+	u8 fshift;
+	u32 handle;
+	u32 val;
+	u32 mask;
+	u32 link_handle;
+	struct tc_u32_sel *sel;
+};
+
+struct tc_cls_u32_hnode {
+	u32 handle;
+	u32 prio;
+	unsigned int divisor;
+};
+
+enum {
+	TC_CLSU32_NEW_KNODE,
+	TC_CLSU32_REPLACE_KNODE,
+	TC_CLSU32_DELETE_KNODE,
+	TC_CLSU32_NEW_HNODE,
+	TC_CLSU32_REPLACE_HNODE,
+};
+
+struct tc_cls_u32_offload {
+	/* knode values */
+	int command;
+	union {
+		struct tc_cls_u32_knode knode;
+		struct tc_cls_u32_hnode hnode;
+	};
+};
+
 #endif
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 4fbb674..dfaaf29 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -43,6 +43,7 @@
 #include <net/netlink.h>
 #include <net/act_api.h>
 #include <net/pkt_cls.h>
+#include <linux/netdevice.h>
 
 struct tc_u_knode {
 	struct tc_u_knode __rcu	*next;
@@ -424,6 +425,68 @@ static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
 	return 0;
 }
 
+static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle)
+{
+	struct net_device *dev = tp->q->dev_queue->dev;
+	struct tc_cls_u32_offload u32_offload = {0};
+	struct tc_to_netdev offload;
+
+	offload.type = TC_SETUP_CLSU32;
+	offload.cls_u32 = &u32_offload;
+
+	if (dev->netdev_ops->ndo_setup_tc) {
+		offload.cls_u32->command = TC_CLSU32_DELETE_KNODE;
+		offload.cls_u32->knode.handle = handle;
+		dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+					      tp->protocol, &offload);
+	}
+}
+
+static void u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
+{
+	struct net_device *dev = tp->q->dev_queue->dev;
+	struct tc_cls_u32_offload u32_offload = {0};
+	struct tc_to_netdev offload;
+
+	offload.type = TC_SETUP_CLSU32;
+	offload.cls_u32 = &u32_offload;
+
+	if (dev->netdev_ops->ndo_setup_tc) {
+		offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
+		offload.cls_u32->hnode.divisor = h->divisor;
+		offload.cls_u32->hnode.handle = h->handle;
+		offload.cls_u32->hnode.prio = h->prio;
+
+		dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+					      tp->protocol, &offload);
+	}
+}
+
+static void u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n)
+{
+	struct net_device *dev = tp->q->dev_queue->dev;
+	struct tc_cls_u32_offload u32_offload = {0};
+	struct tc_to_netdev offload;
+
+	offload.type = TC_SETUP_CLSU32;
+	offload.cls_u32 = &u32_offload;
+
+	if (dev->netdev_ops->ndo_setup_tc) {
+		offload.cls_u32->command = TC_CLSU32_REPLACE_KNODE;
+		offload.cls_u32->knode.handle = n->handle;
+		offload.cls_u32->knode.fshift = n->fshift;
+		offload.cls_u32->knode.val = n->val;
+		offload.cls_u32->knode.mask = n->mask;
+		offload.cls_u32->knode.sel = &n->sel;
+		offload.cls_u32->knode.exts = &n->exts;
+		if (n->ht_down)
+			offload.cls_u32->knode.link_handle = n->ht_down->handle;
+
+		dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+					      tp->protocol, &offload);
+	}
+}
+
 static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 {
 	struct tc_u_knode *n;
@@ -434,6 +497,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 			RCU_INIT_POINTER(ht->ht[h],
 					 rtnl_dereference(n->next));
 			tcf_unbind_filter(tp, &n->res);
+			u32_remove_hw_knode(tp, n->handle);
 			call_rcu(&n->rcu, u32_delete_key_freepf_rcu);
 		}
 	}
@@ -540,8 +604,10 @@ static int u32_delete(struct tcf_proto *tp, unsigned long arg)
 	if (ht == NULL)
 		return 0;
 
-	if (TC_U32_KEY(ht->handle))
+	if (TC_U32_KEY(ht->handle)) {
+		u32_remove_hw_knode(tp, ht->handle);
 		return u32_delete_key(tp, (struct tc_u_knode *)ht);
+	}
 
 	if (root_ht == ht)
 		return -EINVAL;
@@ -769,6 +835,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 		u32_replace_knode(tp, tp_c, new);
 		tcf_unbind_filter(tp, &n->res);
 		call_rcu(&n->rcu, u32_delete_key_rcu);
+		u32_replace_hw_knode(tp, new);
 		return 0;
 	}
 
@@ -795,6 +862,8 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 		RCU_INIT_POINTER(ht->next, tp_c->hlist);
 		rcu_assign_pointer(tp_c->hlist, ht);
 		*arg = (unsigned long)ht;
+
+		u32_replace_hw_hnode(tp, ht);
 		return 0;
 	}
 
@@ -877,7 +946,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 
 		RCU_INIT_POINTER(n->next, pins);
 		rcu_assign_pointer(*ins, n);
-
+		u32_replace_hw_knode(tp, n);
 		*arg = (unsigned long)n;
 		return 0;
 	}

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [net-next PATCH 4/7] net: add tc offload feature flag
  2016-02-03  9:27 [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe John Fastabend
                   ` (2 preceding siblings ...)
  2016-02-03  9:28 ` [net-next PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
@ 2016-02-03  9:28 ` John Fastabend
  2016-02-03  9:29 ` [net-next PATCH 5/7] net: tc: helper functions to query action types John Fastabend
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-03  9:28 UTC (permalink / raw)
  To: amir, ogerlitz, jiri, jhs, jeffrey.t.kirsher; +Cc: netdev, davem

Its useful to turn off the qdisc offload feature at a per device
level. This gives us a big hammer to enable/disable offloading.
More fine grained control (i.e. per rule) may be supported later.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/netdev_features.h |    3 +++
 net/core/ethtool.c              |    1 +
 2 files changed, 4 insertions(+)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index d9654f0e..a734bf4 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -67,6 +67,8 @@ enum {
 	NETIF_F_HW_L2FW_DOFFLOAD_BIT,	/* Allow L2 Forwarding in Hardware */
 	NETIF_F_BUSY_POLL_BIT,		/* Busy poll */
 
+	NETIF_F_HW_TC_BIT,		/* Offload TC infrastructure */
+
 	/*
 	 * Add your fresh new feature above and remember to update
 	 * netdev_features_strings[] in net/core/ethtool.c and maybe
@@ -124,6 +126,7 @@ enum {
 #define NETIF_F_HW_VLAN_STAG_TX	__NETIF_F(HW_VLAN_STAG_TX)
 #define NETIF_F_HW_L2FW_DOFFLOAD	__NETIF_F(HW_L2FW_DOFFLOAD)
 #define NETIF_F_BUSY_POLL	__NETIF_F(BUSY_POLL)
+#define NETIF_F_HW_TC		__NETIF_F(HW_TC)
 
 #define for_each_netdev_feature(mask_addr, bit)	\
 	for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index daf0470..636c1c1 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -98,6 +98,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
 	[NETIF_F_RXALL_BIT] =            "rx-all",
 	[NETIF_F_HW_L2FW_DOFFLOAD_BIT] = "l2-fwd-offload",
 	[NETIF_F_BUSY_POLL_BIT] =        "busy-poll",
+	[NETIF_F_HW_TC_BIT] =		 "hw-tc-offload",
 };
 
 static const char

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [net-next PATCH 5/7] net: tc: helper functions to query action types
  2016-02-03  9:27 [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe John Fastabend
                   ` (3 preceding siblings ...)
  2016-02-03  9:28 ` [net-next PATCH 4/7] net: add tc offload feature flag John Fastabend
@ 2016-02-03  9:29 ` John Fastabend
  2016-02-03  9:29 ` [net-next PATCH 6/7] net: ixgbe: add minimal parser details for ixgbe John Fastabend
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-03  9:29 UTC (permalink / raw)
  To: amir, ogerlitz, jiri, jhs, jeffrey.t.kirsher; +Cc: netdev, davem

This is a helper function drivers can use to learn if the
action type is a drop action.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/tc_act/tc_gact.h |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/net/tc_act/tc_gact.h b/include/net/tc_act/tc_gact.h
index 592a6bc..3067a10 100644
--- a/include/net/tc_act/tc_gact.h
+++ b/include/net/tc_act/tc_gact.h
@@ -2,6 +2,7 @@
 #define __NET_TC_GACT_H
 
 #include <net/act_api.h>
+#include <linux/tc_act/tc_gact.h>
 
 struct tcf_gact {
 	struct tcf_common	common;
@@ -15,4 +16,19 @@ struct tcf_gact {
 #define to_gact(a) \
 	container_of(a->priv, struct tcf_gact, common)
 
+#ifdef CONFIG_NET_CLS_ACT
+static inline bool is_tcf_gact_dropped(const struct tc_action *a)
+{
+	struct tcf_gact *gact;
+
+	if (a->ops && a->ops->type != TCA_ACT_GACT)
+		return false;
+
+	gact = a->priv;
+	if (gact->tcf_action == TC_ACT_SHOT)
+		return true;
+
+	return false;
+}
+#endif
 #endif /* __NET_TC_GACT_H */

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [net-next PATCH 6/7] net: ixgbe: add minimal parser details for ixgbe
  2016-02-03  9:27 [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe John Fastabend
                   ` (4 preceding siblings ...)
  2016-02-03  9:29 ` [net-next PATCH 5/7] net: tc: helper functions to query action types John Fastabend
@ 2016-02-03  9:29 ` John Fastabend
  2016-02-03  9:29 ` [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload John Fastabend
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-03  9:29 UTC (permalink / raw)
  To: amir, ogerlitz, jiri, jhs, jeffrey.t.kirsher; +Cc: netdev, davem

This adds an ixgbe data structure that is used to determine what
headers:fields can be matched and in what order they are supported.

For hardware devices this can be a bit tricky because typically
only pre-programmed (firmware, ucode, rtl) parse graphs will be
supported and we don't yet have an interface to change these from
the OS. So its sort of a you get whatever your friendly vendor
provides affair at the moment.

In the future we can add the get routines and set routines to
update this data structure. One interesting thing to note here
is the data structure here identifies ethernet, ip, and tcp
fields without having to hardcode them as enumerations or use
other identifiers.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h |  112 ++++++++++++++++++++++++
 1 file changed, 112 insertions(+)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
new file mode 100644
index 0000000..43ebec4
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
@@ -0,0 +1,112 @@
+/*******************************************************************************
+ *
+ * Intel 10 Gigabit PCI Express Linux drive
+ * Copyright(c) 2013 - 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Contact Information:
+ * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+ * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+ *
+ ******************************************************************************/
+
+#ifndef _IXGBE_MODEL_H_
+#define _IXGBE_MODEL_H_
+
+#include "ixgbe.h"
+#include "ixgbe_type.h"
+
+struct ixgbe_mat_field {
+	unsigned int off;
+	unsigned int mask;
+	int (*val)(struct ixgbe_fdir_filter *input,
+		   union ixgbe_atr_input *mask,
+		   __u32 val, __u32 m);
+	unsigned int type;
+};
+
+static inline int ixgbe_mat_prgm_sip(struct ixgbe_fdir_filter *input,
+				     union ixgbe_atr_input *mask,
+				     __u32 val, __u32 m)
+{
+	input->filter.formatted.src_ip[0] = val;
+	mask->formatted.src_ip[0] = m;
+	return 0;
+}
+
+static inline int ixgbe_mat_prgm_dip(struct ixgbe_fdir_filter *input,
+				     union ixgbe_atr_input *mask,
+				     __u32 val, __u32 m)
+{
+	input->filter.formatted.dst_ip[0] = val;
+	mask->formatted.dst_ip[0] = m;
+	return 0;
+}
+
+static struct ixgbe_mat_field ixgbe_ipv4_fields[] = {
+	{ .off = 12, .mask = -1, .val = ixgbe_mat_prgm_sip,
+	  .type = IXGBE_ATR_FLOW_TYPE_IPV4},
+	{ .off = 16, .mask = -1, .val = ixgbe_mat_prgm_dip,
+	  .type = IXGBE_ATR_FLOW_TYPE_IPV4},
+	{ .val = NULL } /* terminal node */
+};
+
+static inline int ixgbe_mat_prgm_sport(struct ixgbe_fdir_filter *input,
+				       union ixgbe_atr_input *mask,
+				       __u32 val, __u32 m)
+{
+	input->filter.formatted.src_port = val & 0xffff;
+	mask->formatted.src_port = m & 0xffff;
+	return 0;
+};
+
+static inline int ixgbe_mat_prgm_dport(struct ixgbe_fdir_filter *input,
+				       union ixgbe_atr_input *mask,
+				       __u32 val, __u32 m)
+{
+	input->filter.formatted.dst_port = val & 0xffff;
+	mask->formatted.dst_port = m & 0xffff;
+	return 0;
+};
+
+static struct ixgbe_mat_field ixgbe_tcp_fields[] = {
+	{.off = 0, .mask = 0xffff, .val = ixgbe_mat_prgm_sport,
+	 .type = IXGBE_ATR_FLOW_TYPE_TCPV4},
+	{.off = 2, .mask = 0xffff, .val = ixgbe_mat_prgm_dport,
+	 .type = IXGBE_ATR_FLOW_TYPE_TCPV4},
+	{ .val = NULL } /* terminal node */
+};
+
+struct ixgbe_nexthdr {
+	/* offset, shift, and mask of position to next header */
+	unsigned int o;
+	__u32 s;
+	__u32 m;
+	/* match criteria to make this jump*/
+	unsigned int off;
+	__u32 val;
+	__u32 mask;
+	/* location of jump to make */
+	struct ixgbe_mat_field *jump;
+};
+
+static struct ixgbe_nexthdr ixgbe_ipv4_jumps[] = {
+	{ .o = 0, .s = 6, .m = 0xf,
+	  .off = 8, .val = 0x600, .mask = 0xff00, .jump = ixgbe_tcp_fields},
+	{ .jump = NULL } /* terminal node */
+};
+#endif /* _IXGBE_MODEL_H_ */

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload
  2016-02-03  9:27 [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe John Fastabend
                   ` (5 preceding siblings ...)
  2016-02-03  9:29 ` [net-next PATCH 6/7] net: ixgbe: add minimal parser details for ixgbe John Fastabend
@ 2016-02-03  9:29 ` John Fastabend
  2016-02-03 10:07   ` Amir Vadai"
  2016-02-04  7:30   ` Amir Vadai"
  2016-02-03 10:11 ` [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe Amir Vadai"
  2016-02-04  9:16 ` Jiri Pirko
  8 siblings, 2 replies; 34+ messages in thread
From: John Fastabend @ 2016-02-03  9:29 UTC (permalink / raw)
  To: amir, ogerlitz, jiri, jhs, jeffrey.t.kirsher; +Cc: netdev, davem

This adds initial support for offloading the u32 tc classifier. This
initial implementation only implements a few base matches and actions
to illustrate the use of the infrastructure patches.

However it is an interesting subset because it handles the u32 next
hdr logic to correctly map tcp packets from ip headers using the ihl
and protocol fields. After this is accepted we can extend the match
and action fields easily by updating the model header file.

Also only the drop action is supported initially.

Here is a short test script,

 #tc qdisc add dev eth4 ingress
 #tc filter add dev eth4 parent ffff: protocol ip \
	u32 ht 800: order 1 \
	match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop

<-- hardware has dst/src ip match rule installed -->

 #tc filter del dev eth4 parent ffff: prio 49152
 #tc filter add dev eth4 parent ffff: protocol ip prio 99 \
	handle 1: u32 divisor 1
 #tc filter add dev eth4 protocol ip parent ffff: prio 99 \
	u32 ht 800: order 1 link 1: \
	offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
 #tc filter add dev eth4 parent ffff: protocol ip \
	u32 ht 1: order 3 match tcp src 23 ffff action drop

<-- hardware has tcp src port rule installed -->

 #tc qdisc del dev eth4 parent ffff:

<-- hardware cleaned up -->

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    3 
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    6 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  196 ++++++++++++++++++++++
 3 files changed, 198 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 4b9156c..09c2d9b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -925,6 +925,9 @@ s32 ixgbe_fdir_erase_perfect_filter_82599(struct ixgbe_hw *hw,
 					  u16 soft_id);
 void ixgbe_atr_compute_perfect_hash_82599(union ixgbe_atr_input *input,
 					  union ixgbe_atr_input *mask);
+int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
+				    struct ixgbe_fdir_filter *input,
+				    u16 sw_idx);
 void ixgbe_set_rx_mode(struct net_device *netdev);
 #ifdef CONFIG_IXGBE_DCB
 void ixgbe_set_rx_drop_en(struct ixgbe_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index bea96b3..726e0ee 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2520,9 +2520,9 @@ static int ixgbe_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
 	return ret;
 }
 
-static int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
-					   struct ixgbe_fdir_filter *input,
-					   u16 sw_idx)
+int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
+				    struct ixgbe_fdir_filter *input,
+				    u16 sw_idx)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct hlist_node *node2;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 03e236c..a1a91bf 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -51,6 +51,7 @@
 #include <linux/prefetch.h>
 #include <scsi/fc/fc_fcoe.h>
 #include <net/vxlan.h>
+#include <net/pkt_cls.h>
 
 #ifdef CONFIG_OF
 #include <linux/of_net.h>
@@ -8200,10 +8201,197 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	return 0;
 }
 
+#include <net/tc_act/tc_gact.h>
+#include "ixgbe_model.h"
+static int ixgbe_delete_clsu32(struct ixgbe_adapter *adapter,
+			       struct tc_cls_u32_offload *cls)
+{
+	int err;
+
+	spin_lock(&adapter->fdir_perfect_lock);
+	err = ixgbe_update_ethtool_fdir_entry(adapter, NULL, cls->knode.handle);
+	spin_unlock(&adapter->fdir_perfect_lock);
+	return err;
+}
+
+#define IXGBE_MAX_LINK_HANDLE 10
+static struct ixgbe_mat_field *
+ixgbe_jump_tables[IXGBE_MAX_LINK_HANDLE] = {ixgbe_ipv4_fields,};
+
+static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
+				  __be16 protocol,
+				  struct tc_cls_u32_offload *cls)
+{
+	u32 loc = cls->knode.handle & 0xfffff;
+	struct ixgbe_hw *hw = &adapter->hw;
+	struct ixgbe_mat_field *field_ptr;
+	struct ixgbe_fdir_filter *input;
+	union ixgbe_atr_input mask;
+#ifdef CONFIG_NET_CLS_ACT
+	const struct tc_action *a;
+#endif
+	int i, err = 0;
+	u8 queue;
+	u32 handle;
+
+	memset(&mask, 0, sizeof(union ixgbe_atr_input));
+	handle = cls->knode.handle;
+
+	/* At the moment cls_u32 jumps to transport layer and skips past
+	 * L2 headers. The canonical method to match L2 frames is to use
+	 * negative values. However this is error prone at best but really
+	 * just broken because there is no way to "know" what sort of hdr
+	 * is in front of the transport layer. Fix cls_u32 to support L2
+	 * headers when needed.
+	 */
+	if (protocol != htons(ETH_P_IP))
+		return -EINVAL;
+
+	if (cls->knode.link_handle ||
+	    cls->knode.link_handle >= IXGBE_MAX_LINK_HANDLE) {
+		struct ixgbe_nexthdr *nexthdr = ixgbe_ipv4_jumps;
+		u32 uhtid = TC_U32_USERHTID(cls->knode.link_handle);
+
+		for (i = 0; nexthdr[i].jump; i++) {
+			if (nexthdr->o != cls->knode.sel->offoff ||
+			    nexthdr->s != cls->knode.sel->offshift ||
+			    nexthdr->m != cls->knode.sel->offmask ||
+			    /* do not support multiple key jumps its just mad */
+			    cls->knode.sel->nkeys > 1)
+				return -EINVAL;
+
+			if (nexthdr->off != cls->knode.sel->keys[0].off ||
+			    nexthdr->val != cls->knode.sel->keys[0].val ||
+			    nexthdr->mask != cls->knode.sel->keys[0].mask)
+				return -EINVAL;
+
+			if (uhtid >= IXGBE_MAX_LINK_HANDLE)
+				return -EINVAL;
+
+			ixgbe_jump_tables[uhtid] = nexthdr->jump;
+		}
+		return 0;
+	}
+
+	if (loc >= ((1024 << adapter->fdir_pballoc) - 2)) {
+		e_err(drv, "Location out of range\n");
+		return -EINVAL;
+	}
+
+	/* cls u32 is a graph starting at root node 0x800. The driver tracks
+	 * links and also the fields used to advance the parser across each
+	 * link (e.g. nexthdr/eat parameters from 'tc'). This way we can map
+	 * the u32 graph onto the hardware parse graph denoted in ixgbe_model.h
+	 * To add support for new nodes update ixgbe_model.h parse structures
+	 * this function _should_ be generic try not to hardcode values here.
+	 */
+	if (TC_U32_USERHTID(handle) == 0x800) {
+		field_ptr = ixgbe_jump_tables[0];
+	} else {
+		if (TC_U32_USERHTID(handle) >= ARRAY_SIZE(ixgbe_jump_tables))
+			return -EINVAL;
+
+		field_ptr = ixgbe_jump_tables[TC_U32_USERHTID(handle)];
+	}
+
+	if (!field_ptr)
+		return -EINVAL;
+
+	input = kzalloc(sizeof(*input), GFP_KERNEL);
+	if (!input)
+		return -ENOMEM;
+
+	for (i = 0; i < cls->knode.sel->nkeys; i++) {
+		int off = cls->knode.sel->keys[i].off;
+		__be32 val = cls->knode.sel->keys[i].val;
+		__be32 m = cls->knode.sel->keys[i].mask;
+		bool found_entry = false;
+		int j;
+
+		for (j = 0; field_ptr[j].val; j++) {
+			if (field_ptr[j].off == off &&
+			    field_ptr[j].mask == m) {
+				field_ptr[j].val(input, &mask, val, m);
+				input->filter.formatted.flow_type |=
+					field_ptr[j].type;
+				found_entry = true;
+				break;
+			}
+		}
+
+		if (!found_entry)
+			goto err_out;
+	}
+
+	mask.formatted.flow_type = IXGBE_ATR_L4TYPE_IPV6_MASK |
+				   IXGBE_ATR_L4TYPE_MASK;
+
+	if (input->filter.formatted.flow_type == IXGBE_ATR_FLOW_TYPE_IPV4)
+		mask.formatted.flow_type &= IXGBE_ATR_L4TYPE_IPV6_MASK;
+
+#ifdef CONFIG_NET_CLS_ACT
+	if (list_empty(&cls->knode.exts->actions))
+		goto err_out;
+
+	list_for_each_entry(a, &cls->knode.exts->actions, list) {
+		if (!is_tcf_gact_dropped(a))
+			goto err_out;
+	}
+#endif
+
+	input->action = IXGBE_FDIR_DROP_QUEUE;
+	queue = IXGBE_FDIR_DROP_QUEUE;
+	input->sw_idx = loc;
+
+	spin_lock(&adapter->fdir_perfect_lock);
+
+	if (hlist_empty(&adapter->fdir_filter_list)) {
+		memcpy(&adapter->fdir_mask, &mask, sizeof(mask));
+		err = ixgbe_fdir_set_input_mask_82599(hw, &mask);
+		if (err)
+			goto err_out_w_lock;
+	} else if (memcmp(&adapter->fdir_mask, &mask, sizeof(mask))) {
+		err = -EINVAL;
+		goto err_out_w_lock;
+	}
+
+	ixgbe_atr_compute_perfect_hash_82599(&input->filter, &mask);
+	err = ixgbe_fdir_write_perfect_filter_82599(hw, &input->filter,
+						    input->sw_idx, queue);
+	if (!err)
+		ixgbe_update_ethtool_fdir_entry(adapter, input, input->sw_idx);
+	spin_unlock(&adapter->fdir_perfect_lock);
+
+	return err;
+err_out_w_lock:
+	spin_unlock(&adapter->fdir_perfect_lock);
+err_out:
+	kfree(input);
+	return -EINVAL;
+}
+
 int __ixgbe_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
 		     struct tc_to_netdev *tc)
 {
-	/* Only support egress tc setup for now */
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+
+	if (TC_H_MAJ(handle) == TC_H_MAJ(TC_H_INGRESS) &&
+	    tc->type == TC_SETUP_CLSU32) {
+		if (!(dev->hw_features & NETIF_F_HW_TC))
+			return -EINVAL;
+
+		switch (tc->cls_u32->command) {
+		case TC_CLSU32_NEW_KNODE:
+		case TC_CLSU32_REPLACE_KNODE:
+			return ixgbe_configure_clsu32(adapter,
+						      proto, tc->cls_u32);
+		case TC_CLSU32_DELETE_KNODE:
+			return ixgbe_delete_clsu32(adapter, tc->cls_u32);
+		default:
+			return -EINVAL;
+		}
+	}
+
 	if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
 		return -EINVAL;
 
@@ -8277,6 +8465,7 @@ static int ixgbe_set_features(struct net_device *netdev,
 	 */
 	switch (features & NETIF_F_NTUPLE) {
 	case NETIF_F_NTUPLE:
+	case NETIF_F_HW_TC:
 		/* turn off ATR, enable perfect filters and reset */
 		if (!(adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE))
 			need_reset = true;
@@ -8667,9 +8856,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_set_vf_trust	= ixgbe_ndo_set_vf_trust,
 	.ndo_get_vf_config	= ixgbe_ndo_get_vf_config,
 	.ndo_get_stats64	= ixgbe_get_stats64,
-#ifdef CONFIG_IXGBE_DCB
 	.ndo_setup_tc		= __ixgbe_setup_tc,
-#endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= ixgbe_netpoll,
 #endif
@@ -9040,7 +9227,8 @@ skip_sriov:
 	case ixgbe_mac_X550EM_x:
 		netdev->features |= NETIF_F_SCTP_CRC;
 		netdev->hw_features |= NETIF_F_SCTP_CRC |
-				       NETIF_F_NTUPLE;
+				       NETIF_F_NTUPLE |
+				       NETIF_F_HW_TC;
 		break;
 	default:
 		break;

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter
  2016-02-03  9:27 ` [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
@ 2016-02-03  9:58   ` kbuild test robot
  2016-02-03  9:59   ` kbuild test robot
  2016-02-03 11:44   ` kbuild test robot
  2 siblings, 0 replies; 34+ messages in thread
From: kbuild test robot @ 2016-02-03  9:58 UTC (permalink / raw)
  To: John Fastabend
  Cc: kbuild-all, amir, ogerlitz, jiri, jhs, jeffrey.t.kirsher, netdev, davem

[-- Attachment #1: Type: text/plain, Size: 1231 bytes --]

Hi John,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/John-Fastabend/tc-offload-for-cls_u32-on-ixgbe/20160203-173342
config: x86_64-randconfig-x016-201605 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/intel/fm10k/fm10k_netdev.c: In function '__fm10k_setup_tc':
>> drivers/net/ethernet/intel/fm10k/fm10k_netdev.c:1209:16: error: 'TC_H_ROOT' undeclared (first use in this function)
     if (handle != TC_H_ROOT)
                   ^
   drivers/net/ethernet/intel/fm10k/fm10k_netdev.c:1209:16: note: each undeclared identifier is reported only once for each function it appears in

vim +/TC_H_ROOT +1209 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c

  1203	
  1204		return err;
  1205	}
  1206	
  1207	static int __fm10k_setup_tc(struct net_device *dev, u32 handle, u8 tc)
  1208	{
> 1209		if (handle != TC_H_ROOT)
  1210			return -EINVAL;
  1211	
  1212		return fm10k_setup_tc(dev, tc);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 26812 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter
  2016-02-03  9:27 ` [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
  2016-02-03  9:58   ` kbuild test robot
@ 2016-02-03  9:59   ` kbuild test robot
  2016-02-03 11:44   ` kbuild test robot
  2 siblings, 0 replies; 34+ messages in thread
From: kbuild test robot @ 2016-02-03  9:59 UTC (permalink / raw)
  To: John Fastabend
  Cc: kbuild-all, amir, ogerlitz, jiri, jhs, jeffrey.t.kirsher, netdev, davem

[-- Attachment #1: Type: text/plain, Size: 2750 bytes --]

Hi John,

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/John-Fastabend/tc-offload-for-cls_u32-on-ixgbe/20160203-173342
config: x86_64-randconfig-x005-201605 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from drivers/net/ethernet/intel/fm10k/fm10k.h:24,
                    from drivers/net/ethernet/intel/fm10k/fm10k_netdev.c:21:
   drivers/net/ethernet/intel/fm10k/fm10k_netdev.c: In function '__fm10k_setup_tc':
   drivers/net/ethernet/intel/fm10k/fm10k_netdev.c:1209:16: error: 'TC_H_ROOT' undeclared (first use in this function)
     if (handle != TC_H_ROOT)
                   ^
   include/linux/compiler.h:147:28: note: in definition of macro '__trace_if'
     if (__builtin_constant_p((cond)) ? !!(cond) :   \
                               ^
>> drivers/net/ethernet/intel/fm10k/fm10k_netdev.c:1209:2: note: in expansion of macro 'if'
     if (handle != TC_H_ROOT)
     ^
   drivers/net/ethernet/intel/fm10k/fm10k_netdev.c:1209:16: note: each undeclared identifier is reported only once for each function it appears in
     if (handle != TC_H_ROOT)
                   ^
   include/linux/compiler.h:147:28: note: in definition of macro '__trace_if'
     if (__builtin_constant_p((cond)) ? !!(cond) :   \
                               ^
>> drivers/net/ethernet/intel/fm10k/fm10k_netdev.c:1209:2: note: in expansion of macro 'if'
     if (handle != TC_H_ROOT)
     ^

vim +/if +1209 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c

  1193		/* flag to indicate SWPRI has yet to be updated */
  1194		interface->flags |= FM10K_FLAG_SWPRI_CONFIG;
  1195	
  1196		return 0;
  1197	err_open:
  1198		fm10k_mbx_free_irq(interface);
  1199	err_mbx_irq:
  1200		fm10k_clear_queueing_scheme(interface);
  1201	err_queueing_scheme:
  1202		netif_device_detach(dev);
  1203	
  1204		return err;
  1205	}
  1206	
  1207	static int __fm10k_setup_tc(struct net_device *dev, u32 handle, u8 tc)
  1208	{
> 1209		if (handle != TC_H_ROOT)
  1210			return -EINVAL;
  1211	
  1212		return fm10k_setup_tc(dev, tc);
  1213	}
  1214	
  1215	static int fm10k_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
  1216	{
  1217		switch (cmd) {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 32973 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload
  2016-02-03  9:29 ` [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload John Fastabend
@ 2016-02-03 10:07   ` Amir Vadai"
  2016-02-03 10:26     ` John Fastabend
  2016-02-04  7:30   ` Amir Vadai"
  1 sibling, 1 reply; 34+ messages in thread
From: Amir Vadai" @ 2016-02-03 10:07 UTC (permalink / raw)
  To: John Fastabend; +Cc: ogerlitz, jiri, jhs, jeffrey.t.kirsher, netdev, davem

On Wed, Feb 03, 2016 at 01:29:59AM -0800, John Fastabend wrote:
> This adds initial support for offloading the u32 tc classifier. This
> initial implementation only implements a few base matches and actions
> to illustrate the use of the infrastructure patches.
> 
> However it is an interesting subset because it handles the u32 next
> hdr logic to correctly map tcp packets from ip headers using the ihl
> and protocol fields. After this is accepted we can extend the match
> and action fields easily by updating the model header file.
> 
> Also only the drop action is supported initially.
> 
> Here is a short test script,
> 
>  #tc qdisc add dev eth4 ingress
>  #tc filter add dev eth4 parent ffff: protocol ip \
> 	u32 ht 800: order 1 \
> 	match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop
> 
> <-- hardware has dst/src ip match rule installed -->
> 
>  #tc filter del dev eth4 parent ffff: prio 49152
>  #tc filter add dev eth4 parent ffff: protocol ip prio 99 \
> 	handle 1: u32 divisor 1
>  #tc filter add dev eth4 protocol ip parent ffff: prio 99 \
> 	u32 ht 800: order 1 link 1: \
> 	offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
>  #tc filter add dev eth4 parent ffff: protocol ip \
> 	u32 ht 1: order 3 match tcp src 23 ffff action drop
> 
> <-- hardware has tcp src port rule installed -->
> 
>  #tc qdisc del dev eth4 parent ffff:
> 
> <-- hardware cleaned up -->
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    3 
>  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    6 -
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  196 ++++++++++++++++++++++
>  3 files changed, 198 insertions(+), 7 deletions(-)
> 

What are you doing w.r.t priorities? Are the filters processed by the
order of the priorities?

[...]

>  
> -static int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
> -					   struct ixgbe_fdir_filter *input,
> -					   u16 sw_idx)
> +int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
> +				    struct ixgbe_fdir_filter *input,
> +				    u16 sw_idx)
>  {
>  	struct ixgbe_hw *hw = &adapter->hw;
>  	struct hlist_node *node2;
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 03e236c..a1a91bf 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -51,6 +51,7 @@
>  #include <linux/prefetch.h>
>  #include <scsi/fc/fc_fcoe.h>
>  #include <net/vxlan.h>
> +#include <net/pkt_cls.h>
>  
>  #ifdef CONFIG_OF
>  #include <linux/of_net.h>
> @@ -8200,10 +8201,197 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
>  	return 0;
>  }
>  
> +#include <net/tc_act/tc_gact.h>
> +#include "ixgbe_model.h"
Did you leave those #include's in the middle of the file on purpose?

[...]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
  2016-02-03  9:27 [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe John Fastabend
                   ` (6 preceding siblings ...)
  2016-02-03  9:29 ` [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload John Fastabend
@ 2016-02-03 10:11 ` Amir Vadai"
  2016-02-03 10:21   ` John Fastabend
  2016-02-04  9:16 ` Jiri Pirko
  8 siblings, 1 reply; 34+ messages in thread
From: Amir Vadai" @ 2016-02-03 10:11 UTC (permalink / raw)
  To: John Fastabend; +Cc: ogerlitz, jiri, jhs, jeffrey.t.kirsher, netdev, davem

On Wed, Feb 03, 2016 at 01:27:32AM -0800, John Fastabend wrote:
> This extends the setup_tc framework so it can support more than just
> the mqprio offload and push other classifiers and qdiscs into the
> hardware. The series here targets the u32 classifier and ixgbe
> driver. I worked out the u32 classifier because it is protocol
> oblivious and aligns with multiple hardware devices I have access
> to. I did an initial implementation on ixgbe because (a) I have one
> in my box (b) its a stable driver and (c) it is relatively simple
> compared to the other devices I have here but still has enough
> flexibility to exercise the features of cls_u32.
> 
> I intentionally limited the scope of this series to the basic
> feature set. Specifically this uses a 'big hammer' feature bit
> to do the offload or not. If the bit is set you get offloaded rules
> if it is not then rules will not be offloaded. If we can agree on
> this patch series there are some more patches on my queue we can
> talk about to make the offload decision per rule using flags similar
> to how we do l2 mac updates. Additionally the error strategy can
> be improved to be hard aborting, log and continue, etc. I think
> these are nice to have improvements but shouldn't block this series.
> 
> Also by adding get_parse_graph and set_parse_graph attributes as
> in my previous flow_api work we can build programmable devices
> and programmatically learn when rules can or can not be loaded
> into the hardware. Again future work.
> 
> Any comments/feedback appreciated.
> 
> Thanks,
> John
> 
> ---
> 
> John Fastabend (7):
>       net: rework ndo tc op to consume additional qdisc handle parameter
>       net: rework setup_tc ndo op to consume general tc operand
>       net: sched: add cls_u32 offload hooks for netdevs
>       net: add tc offload feature flag
>       net: tc: helper functions to query action types
>       net: ixgbe: add minimal parser details for ixgbe
>       net: ixgbe: add support for tc_u32 offload
> 

Hi John,

Nice work :)

I will add mlx5 support, and see if can live with u32. If not - will
add flower support too.

Amir

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-03  9:28 ` [net-next PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
@ 2016-02-03 10:14   ` kbuild test robot
  2016-02-04 13:18   ` Amir Vadai"
  1 sibling, 0 replies; 34+ messages in thread
From: kbuild test robot @ 2016-02-03 10:14 UTC (permalink / raw)
  To: John Fastabend
  Cc: kbuild-all, amir, ogerlitz, jiri, jhs, jeffrey.t.kirsher, netdev, davem

[-- Attachment #1: Type: text/plain, Size: 1416 bytes --]

Hi John,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/John-Fastabend/tc-offload-for-cls_u32-on-ixgbe/20160203-173342
config: i386-randconfig-x009-02010231 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   net/sched/cls_u32.c: In function 'u32_replace_hw_knode':
>> net/sched/cls_u32.c:478:33: error: 'struct tc_u_knode' has no member named 'val'
      offload.cls_u32->knode.val = n->val;
                                    ^
>> net/sched/cls_u32.c:479:34: error: 'struct tc_u_knode' has no member named 'mask'
      offload.cls_u32->knode.mask = n->mask;
                                     ^

vim +478 net/sched/cls_u32.c

   472		offload.cls_u32 = &u32_offload;
   473	
   474		if (dev->netdev_ops->ndo_setup_tc) {
   475			offload.cls_u32->command = TC_CLSU32_REPLACE_KNODE;
   476			offload.cls_u32->knode.handle = n->handle;
   477			offload.cls_u32->knode.fshift = n->fshift;
 > 478			offload.cls_u32->knode.val = n->val;
 > 479			offload.cls_u32->knode.mask = n->mask;
   480			offload.cls_u32->knode.sel = &n->sel;
   481			offload.cls_u32->knode.exts = &n->exts;
   482			if (n->ht_down)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 27112 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
  2016-02-03 10:11 ` [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe Amir Vadai"
@ 2016-02-03 10:21   ` John Fastabend
  2016-02-03 10:31     ` Or Gerlitz
  0 siblings, 1 reply; 34+ messages in thread
From: John Fastabend @ 2016-02-03 10:21 UTC (permalink / raw)
  To: Amir Vadai"; +Cc: ogerlitz, jiri, jhs, jeffrey.t.kirsher, netdev, davem

On 16-02-03 02:11 AM, Amir Vadai" wrote:
> On Wed, Feb 03, 2016 at 01:27:32AM -0800, John Fastabend wrote:
>> This extends the setup_tc framework so it can support more than just
>> the mqprio offload and push other classifiers and qdiscs into the
>> hardware. The series here targets the u32 classifier and ixgbe
>> driver. I worked out the u32 classifier because it is protocol
>> oblivious and aligns with multiple hardware devices I have access
>> to. I did an initial implementation on ixgbe because (a) I have one
>> in my box (b) its a stable driver and (c) it is relatively simple
>> compared to the other devices I have here but still has enough
>> flexibility to exercise the features of cls_u32.
>>
>> I intentionally limited the scope of this series to the basic
>> feature set. Specifically this uses a 'big hammer' feature bit
>> to do the offload or not. If the bit is set you get offloaded rules
>> if it is not then rules will not be offloaded. If we can agree on
>> this patch series there are some more patches on my queue we can
>> talk about to make the offload decision per rule using flags similar
>> to how we do l2 mac updates. Additionally the error strategy can
>> be improved to be hard aborting, log and continue, etc. I think
>> these are nice to have improvements but shouldn't block this series.
>>
>> Also by adding get_parse_graph and set_parse_graph attributes as
>> in my previous flow_api work we can build programmable devices
>> and programmatically learn when rules can or can not be loaded
>> into the hardware. Again future work.
>>
>> Any comments/feedback appreciated.
>>
>> Thanks,
>> John
>>
>> ---
>>
>> John Fastabend (7):
>>       net: rework ndo tc op to consume additional qdisc handle parameter
>>       net: rework setup_tc ndo op to consume general tc operand
>>       net: sched: add cls_u32 offload hooks for netdevs
>>       net: add tc offload feature flag
>>       net: tc: helper functions to query action types
>>       net: ixgbe: add minimal parser details for ixgbe
>>       net: ixgbe: add support for tc_u32 offload
>>
> 
> Hi John,
> 
> Nice work :)

Thanks, we will need at least a v2 to fixup some build errors
with various compile flags caught by build_bot and missed by me.

> 
> I will add mlx5 support, and see if can live with u32. If not - will
> add flower support too.

That would be great.

Thanks
.John

> 
> Amir
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload
  2016-02-03 10:07   ` Amir Vadai"
@ 2016-02-03 10:26     ` John Fastabend
  2016-02-03 12:46       ` Jamal Hadi Salim
  0 siblings, 1 reply; 34+ messages in thread
From: John Fastabend @ 2016-02-03 10:26 UTC (permalink / raw)
  To: Amir Vadai"; +Cc: ogerlitz, jiri, jhs, jeffrey.t.kirsher, netdev, davem

On 16-02-03 02:07 AM, Amir Vadai" wrote:
> On Wed, Feb 03, 2016 at 01:29:59AM -0800, John Fastabend wrote:
>> This adds initial support for offloading the u32 tc classifier. This
>> initial implementation only implements a few base matches and actions
>> to illustrate the use of the infrastructure patches.
>>
>> However it is an interesting subset because it handles the u32 next
>> hdr logic to correctly map tcp packets from ip headers using the ihl
>> and protocol fields. After this is accepted we can extend the match
>> and action fields easily by updating the model header file.
>>
>> Also only the drop action is supported initially.
>>
>> Here is a short test script,
>>
>>  #tc qdisc add dev eth4 ingress
>>  #tc filter add dev eth4 parent ffff: protocol ip \
>> 	u32 ht 800: order 1 \
>> 	match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop
>>
>> <-- hardware has dst/src ip match rule installed -->
>>
>>  #tc filter del dev eth4 parent ffff: prio 49152
>>  #tc filter add dev eth4 parent ffff: protocol ip prio 99 \
>> 	handle 1: u32 divisor 1
>>  #tc filter add dev eth4 protocol ip parent ffff: prio 99 \
>> 	u32 ht 800: order 1 link 1: \
>> 	offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
>>  #tc filter add dev eth4 parent ffff: protocol ip \
>> 	u32 ht 1: order 3 match tcp src 23 ffff action drop
>>
>> <-- hardware has tcp src port rule installed -->
>>
>>  #tc qdisc del dev eth4 parent ffff:
>>
>> <-- hardware cleaned up -->
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>  drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    3 
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    6 -
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  196 ++++++++++++++++++++++
>>  3 files changed, 198 insertions(+), 7 deletions(-)
>>
> 
> What are you doing w.r.t priorities? Are the filters processed by the
> order of the priorities?
> 

The rules are put in order by the handles which is populated in
my command above such that 'ht 1: order 3' gives handle 1::3 and
'ht 800: order 1' gives 800::1. Take a look at this block in cls_u32

        if (err == 0) {
                struct tc_u_knode __rcu **ins;
                struct tc_u_knode *pins;

                ins = &ht->ht[TC_U32_HASH(handle)];
                for (pins = rtnl_dereference(*ins); pins;
                     ins = &pins->next, pins = rtnl_dereference(*ins))
                        if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle))
                                break;

                RCU_INIT_POINTER(n->next, pins);
                rcu_assign_pointer(*ins, n);
                u32_replace_hw_knode(tp, n);
                *arg = (unsigned long)n;
                return 0;


If you leave ht and order off the tc cli I believe 'tc' just
picks some semi-arbitrary ones for you. I've been in the habit
of always specifying them even for software filters.

[...]

>>  
>> +#include <net/tc_act/tc_gact.h>
>> +#include "ixgbe_model.h"
> Did you leave those #include's in the middle of the file on purpose?
> 
> [...]
> 

Probably bad form I'll move it to the top with the other header files.

Thanks!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
  2016-02-03 10:21   ` John Fastabend
@ 2016-02-03 10:31     ` Or Gerlitz
  2016-02-03 12:21       ` Jamal Hadi Salim
  0 siblings, 1 reply; 34+ messages in thread
From: Or Gerlitz @ 2016-02-03 10:31 UTC (permalink / raw)
  To: John Fastabend; +Cc: amir, jiri, jhs, jeffrey.t.kirsher, netdev, davem

On 2/3/2016 12:21 PM, John Fastabend wrote:
> Thanks, we will need at least a v2 to fixup some build errors
> with various compile flags caught by build_bot and missed by me.
Hi John,

You didn't mark that as RFC... but we said this direction/approach yet
to be talked @ netdev next-week, so.. can you clarify?

I suggest not to rush and asking pulling this, lets have the tc workshop 
beforehand...

Please add to v2 listing of changes from V0/V1 to assist with the review.

thanks,

Or.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter
  2016-02-03  9:27 ` [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
  2016-02-03  9:58   ` kbuild test robot
  2016-02-03  9:59   ` kbuild test robot
@ 2016-02-03 11:44   ` kbuild test robot
  2 siblings, 0 replies; 34+ messages in thread
From: kbuild test robot @ 2016-02-03 11:44 UTC (permalink / raw)
  To: John Fastabend
  Cc: kbuild-all, amir, ogerlitz, jiri, jhs, jeffrey.t.kirsher, netdev, davem

[-- Attachment #1: Type: text/plain, Size: 3329 bytes --]

Hi John,

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/John-Fastabend/tc-offload-for-cls_u32-on-ixgbe/20160203-173342
config: x86_64-allmodconfig (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

>> drivers/net/ethernet/amd/xgbe/xgbe-drv.c:1715:19: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
     .ndo_setup_tc  = xgbe_setup_tc,
                      ^
   drivers/net/ethernet/amd/xgbe/xgbe-drv.c:1715:19: note: (near initialization for 'xgbe_netdev_ops.ndo_setup_tc')
--
>> drivers/net/ethernet/intel/i40e/i40e_fcoe.c:1460:19: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
     .ndo_setup_tc  = i40e_setup_tc,
                      ^
   drivers/net/ethernet/intel/i40e/i40e_fcoe.c:1460:19: note: (near initialization for 'i40e_fcoe_netdev_ops.ndo_setup_tc')

vim +1715 drivers/net/ethernet/amd/xgbe/xgbe-drv.c

c5aa9e3b Lendacky, Thomas 2014-06-05  1699  static const struct net_device_ops xgbe_netdev_ops = {
c5aa9e3b Lendacky, Thomas 2014-06-05  1700  	.ndo_open		= xgbe_open,
c5aa9e3b Lendacky, Thomas 2014-06-05  1701  	.ndo_stop		= xgbe_close,
c5aa9e3b Lendacky, Thomas 2014-06-05  1702  	.ndo_start_xmit		= xgbe_xmit,
c5aa9e3b Lendacky, Thomas 2014-06-05  1703  	.ndo_set_rx_mode	= xgbe_set_rx_mode,
c5aa9e3b Lendacky, Thomas 2014-06-05  1704  	.ndo_set_mac_address	= xgbe_set_mac_address,
c5aa9e3b Lendacky, Thomas 2014-06-05  1705  	.ndo_validate_addr	= eth_validate_addr,
23e4eef7 Lendacky, Thomas 2014-07-29  1706  	.ndo_do_ioctl		= xgbe_ioctl,
c5aa9e3b Lendacky, Thomas 2014-06-05  1707  	.ndo_change_mtu		= xgbe_change_mtu,
a8373f1a Lendacky, Thomas 2015-04-09  1708  	.ndo_tx_timeout		= xgbe_tx_timeout,
c5aa9e3b Lendacky, Thomas 2014-06-05  1709  	.ndo_get_stats64	= xgbe_get_stats64,
801c62d9 Lendacky, Thomas 2014-06-24  1710  	.ndo_vlan_rx_add_vid	= xgbe_vlan_rx_add_vid,
801c62d9 Lendacky, Thomas 2014-06-24  1711  	.ndo_vlan_rx_kill_vid	= xgbe_vlan_rx_kill_vid,
c5aa9e3b Lendacky, Thomas 2014-06-05  1712  #ifdef CONFIG_NET_POLL_CONTROLLER
c5aa9e3b Lendacky, Thomas 2014-06-05  1713  	.ndo_poll_controller	= xgbe_poll_controller,
c5aa9e3b Lendacky, Thomas 2014-06-05  1714  #endif
fca2d994 Lendacky, Thomas 2014-07-29 @1715  	.ndo_setup_tc		= xgbe_setup_tc,
c5aa9e3b Lendacky, Thomas 2014-06-05  1716  	.ndo_set_features	= xgbe_set_features,
c5aa9e3b Lendacky, Thomas 2014-06-05  1717  };
c5aa9e3b Lendacky, Thomas 2014-06-05  1718  
c5aa9e3b Lendacky, Thomas 2014-06-05  1719  struct net_device_ops *xgbe_get_netdev_ops(void)
c5aa9e3b Lendacky, Thomas 2014-06-05  1720  {
c5aa9e3b Lendacky, Thomas 2014-06-05  1721  	return (struct net_device_ops *)&xgbe_netdev_ops;
c5aa9e3b Lendacky, Thomas 2014-06-05  1722  }
c5aa9e3b Lendacky, Thomas 2014-06-05  1723  

:::::: The code at line 1715 was first introduced by commit
:::::: fca2d99428473884e67ef8ea1586e58151ed6ac3 amd-xgbe: Add traffic class support

:::::: TO: Lendacky, Thomas <Thomas.Lendacky@amd.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 53132 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
  2016-02-03 10:31     ` Or Gerlitz
@ 2016-02-03 12:21       ` Jamal Hadi Salim
  2016-02-03 18:48         ` Fastabend, John R
  0 siblings, 1 reply; 34+ messages in thread
From: Jamal Hadi Salim @ 2016-02-03 12:21 UTC (permalink / raw)
  To: Or Gerlitz, John Fastabend; +Cc: amir, jiri, jeffrey.t.kirsher, netdev, davem

On 16-02-03 05:31 AM, Or Gerlitz wrote:
> On 2/3/2016 12:21 PM, John Fastabend wrote:
>> Thanks, we will need at least a v2 to fixup some build errors
>> with various compile flags caught by build_bot and missed by me.
> Hi John,
>
> You didn't mark that as RFC... but we said this direction/approach yet
> to be talked @ netdev next-week, so.. can you clarify?
>
> I suggest not to rush and asking pulling this, lets have the tc workshop
> beforehand...
>

Yes, the tc workshop is a good place for this.
I think we can spill some of it into the switchdev workshop (which is a
nice flow since that happens later).

Some comments:
1) "priorities" for filters and some form of "index" for actions is
is needed. I think index (which tends to be a 32 bit value is what
Amir's patches refered to as "cookie" - or at least some hardware
can be used to query the action with). Priorities maybe implicit in
the order in which they are added. And the idea of appending vs
exclusivity vs replace (which  netlink already supports)
is important to worry about (TCAMS tend to assume an append mode
for example).

2) I like the u32 approach where it makes sense; but sometimes it
doesnt make sense from a usability pov. I work with some ASICs
that have 10 tuples that are  fixed. Yes, a user can describe a policy
with u32 but flower would be more  usable say with flower (both
programmatic and cli)

3) The concept of "hook address" is important to be able to express.
Amir's patches seemed to miss that (and John brought it up in an
email). It could be as simple as ifindex + hookid. With ifindex of
0 meaning all ports and maybe hookid of 0 meaning all hooks.
Hook semantics are as mentioned by John (as it stands right now
in/egress)

4) Why are we forsaking switchdev John?
This is certainly re-usable beyond NICs and SRIOV.

5)What happened to being both able to hardware and/or software?

Anyways, I think Seville would be a blast! Come one, come all.

cheers,
jamal

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload
  2016-02-03 10:26     ` John Fastabend
@ 2016-02-03 12:46       ` Jamal Hadi Salim
  2016-02-03 19:02         ` Fastabend, John R
  2016-02-09 11:30         ` Fastabend, John R
  0 siblings, 2 replies; 34+ messages in thread
From: Jamal Hadi Salim @ 2016-02-03 12:46 UTC (permalink / raw)
  To: John Fastabend, Amir Vadai"
  Cc: ogerlitz, jiri, jeffrey.t.kirsher, netdev, davem

On 16-02-03 05:26 AM, John Fastabend wrote:
> On 16-02-03 02:07 AM, Amir Vadai" wrote:
>> On Wed, Feb 03, 2016 at 01:29:59AM -0800, John Fastabend wrote:
>>> This adds initial support for offloading the u32 tc classifier. This
>>> initial implementation only implements a few base matches and actions
>>> to illustrate the use of the infrastructure patches.
>>>
>>> However it is an interesting subset because it handles the u32 next
>>> hdr logic to correctly map tcp packets from ip headers using the ihl
>>> and protocol fields. After this is accepted we can extend the match
>>> and action fields easily by updating the model header file.
>>>
>>> Also only the drop action is supported initially.
>>>
>>> Here is a short test script,
>>>
>>>   #tc qdisc add dev eth4 ingress
>>>   #tc filter add dev eth4 parent ffff: protocol ip \
>>> 	u32 ht 800: order 1 \
>>> 	match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop
>>>
>>> <-- hardware has dst/src ip match rule installed -->
>>>
>>>   #tc filter del dev eth4 parent ffff: prio 49152
>>>   #tc filter add dev eth4 parent ffff: protocol ip prio 99 \
>>> 	handle 1: u32 divisor 1
>>>   #tc filter add dev eth4 protocol ip parent ffff: prio 99 \
>>> 	u32 ht 800: order 1 link 1: \
>>> 	offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
>>>   #tc filter add dev eth4 parent ffff: protocol ip \
>>> 	u32 ht 1: order 3 match tcp src 23 ffff action drop
>>>
>>> <-- hardware has tcp src port rule installed -->
>>>
>>>   #tc qdisc del dev eth4 parent ffff:
>>>
>>> <-- hardware cleaned up -->
>>>
>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>> ---
>>>   drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    3
>>>   drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    6 -
>>>   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  196 ++++++++++++++++++++++
>>>   3 files changed, 198 insertions(+), 7 deletions(-)
>>>
>>
>> What are you doing w.r.t priorities? Are the filters processed by the
>> order of the priorities?
>>
>
> The rules are put in order by the handles which is populated in
> my command above such that 'ht 1: order 3' gives handle 1::3 and
> 'ht 800: order 1' gives 800::1. Take a look at this block in cls_u32
>
>          if (err == 0) {
>                  struct tc_u_knode __rcu **ins;
>                  struct tc_u_knode *pins;
>
>                  ins = &ht->ht[TC_U32_HASH(handle)];
>                  for (pins = rtnl_dereference(*ins); pins;
>                       ins = &pins->next, pins = rtnl_dereference(*ins))
>                          if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle))
>                                  break;
>
>                  RCU_INIT_POINTER(n->next, pins);
>                  rcu_assign_pointer(*ins, n);
>                  u32_replace_hw_knode(tp, n);
>                  *arg = (unsigned long)n;
>                  return 0;
>
>
> If you leave ht and order off the tc cli I believe 'tc' just
> picks some semi-arbitrary ones for you. I've been in the habit
> of always specifying them even for software filters.
>

The default table id is essentially 0x800. Default bucket is 0.
"order" essentially is the filter id. And given you can link tables
(Nice work John!); essentially the ht:bucket:nodeid is an "address" to
a specific filter on a specific table and when makes sense a specific
hash bucket. Some other way to look at it is as a way to construct
a mapping to a TCAM key.
What John is doing is essentially taking the nodeid and trying to use
it as a priority. In otherwise the abstraction is reduced to a linked
list in which the ordering is how the list is traversed.
It may work in this case, but i am for being able to explicitly specify
priorities.

cheers,
jamal

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
  2016-02-03 12:21       ` Jamal Hadi Salim
@ 2016-02-03 18:48         ` Fastabend, John R
  2016-02-04 13:12           ` Jamal Hadi Salim
  0 siblings, 1 reply; 34+ messages in thread
From: Fastabend, John R @ 2016-02-03 18:48 UTC (permalink / raw)
  To: Jamal Hadi Salim, Or Gerlitz; +Cc: amir, jiri, jeffrey.t.kirsher, netdev, davem

On 2/3/2016 4:21 AM, Jamal Hadi Salim wrote:
> On 16-02-03 05:31 AM, Or Gerlitz wrote:
>> On 2/3/2016 12:21 PM, John Fastabend wrote:
>>> Thanks, we will need at least a v2 to fixup some build errors
>>> with various compile flags caught by build_bot and missed by me.
>> Hi John,
>>
>> You didn't mark that as RFC... but we said this direction/approach yet
>> to be talked @ netdev next-week, so.. can you clarify?

Yeah I think this set of patches is ready and it really is what we've
been talking about doing for two or three conferences now. Also I don't
know where "we" decided to talk about this @ netdev or how that became
a prereq for patches. I think this is the correct approach and I am not
seeing any contentious pieces here so why not consider it for inclusion.
Do you have some issue with the approach? I don't recall any when we
talked about this last time.

>>
>> I suggest not to rush and asking pulling this, lets have the tc workshop
>> beforehand...
>>

rush? we've been talking about it for a year+

> 
> Yes, the tc workshop is a good place for this.
> I think we can spill some of it into the switchdev workshop (which is a
> nice flow since that happens later).
> 

Sure but this patch is really the basic stuff we should move on to
some of the more interesting pieces. I have ~30 or so patches behind
this that do the fun stuff like resource allocation, capababilities,
support for the divisor > 1, some new actions, etc.

> Some comments:
> 1) "priorities" for filters and some form of "index" for actions is
> is needed. I think index (which tends to be a 32 bit value is what
> Amir's patches refered to as "cookie" - or at least some hardware
> can be used to query the action with). Priorities maybe implicit in
> the order in which they are added. And th idea of appending vs
> exclusivity vs replace (which  netlink already supports)
> is important to worry about (TCAMS tend to assume an append mode
> for example).

The code denotes add/del/replace already. I'm not sure why a TCAM
would assume an append mode but OK maybe that is some API you have
the APIs I use don't have these semantics.

For this series using cls_u32 the handle gives you everything you need
to put entries in the right table and row. Namely the ht # and order #
from 'tc'. Take a look at u32_change and u32_classify its the handle
that places the filter into the list and the handle that is matched in
classify. We should place the filters in the hardware in the same order
that is used by u32_change.

Also ran a few tests and can't see how priority works in u32 maybe you
can shed some light but as best I can tell it doesn't have any effect
on rule execution.

> 
> 2) I like the u32 approach where it makes sense; but sometimes it
> doesnt make sense from a usability pov. I work with some ASICs
> that have 10 tuples that are  fixed. Yes, a user can describe a policy
> with u32 but flower would be more  usable say with flower (both
> programmatic and cli)

Sure so create a set of offload hooks for flower we don't need only
one hardware classifier any more than we would like a single software
classifiers. I'll send out my flower patches when I get to a real system
I'm on a corporate laptop at the moment.

> 
> 3) The concept of "hook address" is important to be able to express.
> Amir's patches seemed to miss that (and John brought it up in an
> email). It could be as simple as ifindex + hookid. With ifindex of
> 0 meaning all ports and maybe hookid of 0 meaning all hooks.
> Hook semantics are as mentioned by John (as it stands right now
> in/egress)
> 

Again I'm trying to faithfully implement what we have in software
and load that into the hardware. The handle today gives ingress/egres
hook. If you want an all ports hook we should add it to 'tc' software
first and then push that to the hardware not create magic hardware
bits. See I've drank the cool aid software first than hardware.

> 4) Why are we forsaking switchdev John?
> This is certainly re-usable beyond NICs and SRIOV.
> 

Sure and switchdev can use it just like they use fdb_add and friends.
I just don't want to require switchdev infrastructure on things that
really are not switches. I think Amir indicated he would take a try
at the switchdev integration. If not I'm willing to do it but it
doesn't block this series in any way imo.

> 5)What happened to being both able to hardware and/or software?

Follow up patch once we get the basic infrastructure in place with
the big feature flag bit. I have a patch I'm testing for this now
but again I want to move in logical and somewhat minimal sets.

> 
> Anyways, I think Seville would be a blast! Come one, come all.
> 

I'll be there but lets be sure to follow up with this online I
know folks are following this who wont be at Seville and I don't
see any reason to block these patches and stop the thread for a
week or more.

> cheers,
> jamal
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload
  2016-02-03 12:46       ` Jamal Hadi Salim
@ 2016-02-03 19:02         ` Fastabend, John R
  2016-02-09 11:30         ` Fastabend, John R
  1 sibling, 0 replies; 34+ messages in thread
From: Fastabend, John R @ 2016-02-03 19:02 UTC (permalink / raw)
  To: Jamal Hadi Salim, Amir Vadai"
  Cc: ogerlitz, jiri, jeffrey.t.kirsher, netdev, davem

On 2/3/2016 4:46 AM, Jamal Hadi Salim wrote:
[...]

>>>>
>>>
>>> What are you doing w.r.t priorities? Are the filters processed by the
>>> order of the priorities?
>>>

In the same order as software processes filters. I tried to faithfully
translate the u32 classify and  u32_change loops into hardware. If I
missed some case its a bug and I'll fix it. My reading and experiments
of u32 indicate the ht:bucket:node build a handle and this is how we
order rules.

When I test this I create a veth pair and create the same rule set on
the veth pair as my hardware netdev. Then inject a skb into both
the veth and hardware netdev if you get different results its a bug.

>>
>> The rules are put in order by the handles which is populated in
>> my command above such that 'ht 1: order 3' gives handle 1::3 and
>> 'ht 800: order 1' gives 800::1. Take a look at this block in cls_u32
>>
>>          if (err == 0) {
>>                  struct tc_u_knode __rcu **ins;
>>                  struct tc_u_knode *pins;
>>
>>                  ins = &ht->ht[TC_U32_HASH(handle)];
>>                  for (pins = rtnl_dereference(*ins); pins;
>>                       ins = &pins->next, pins = rtnl_dereference(*ins))
>>                          if (TC_U32_NODE(handle) <
>> TC_U32_NODE(pins->handle))
>>                                  break;
>>
>>                  RCU_INIT_POINTER(n->next, pins);
>>                  rcu_assign_pointer(*ins, n);
>>                  u32_replace_hw_knode(tp, n);
>>                  *arg = (unsigned long)n;
>>                  return 0;
>>
>>
>> If you leave ht and order off the tc cli I believe 'tc' just
>> picks some semi-arbitrary ones for you. I've been in the habit
>> of always specifying them even for software filters.
>>
> 
> The default table id is essentially 0x800. Default bucket is 0.
> "order" essentially is the filter id. And given you can link tables
> (Nice work John!); essentially the ht:bucket:nodeid is an "address" to
> a specific filter on a specific table and when makes sense a specific
> hash bucket. Some other way to look at it is as a way to construct
> a mapping to a TCAM key.

Sure as long as your mapping works/looks like the software only case
it doesn't matter how you do the hardware mapping and my guess is this
is going to be highly device specific. Even with the handful of devices
I have it looks different depending on the device.

Note if you don't do the table links there really is no safe
way to get into the headers beyond the IP header because you need the
nexthdr stuff. Also just to stick this out there I have a patch to let
cls_u32 start processing at the ethernet header instead of the
transport header similar to how bpf works. This negative offset business
doesn't really work as best I can tell.

> What John is doing is essentially taking the nodeid and trying to use
> it as a priority. In otherwise the abstraction is reduced to a linked
> list in which the ordering is how the list is traversed.
> It may work in this case, but i am for being able to explicitly specify
> priorities.

But that doesn't exist in software today. If users want explicit order
today they build the ht, nodeid out correctly just use that. If you
are working on a TCAM just use the nodeid that should be equivalent to
priority.

And  although the ixgbe supports only a single table the mapping on more
complex devices will take it onto multiple tables if that optimizes
things.

Note I need to do a couple fixes on my existing code one to abort when
given a bad ifindex and two to hard abort when a hashtable is given
a higher order rule that I can't support. Both are fairly small tweaks
to the ixgbe code not to the infrastructure.

> 
> cheers,
> jamal
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload
  2016-02-03  9:29 ` [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload John Fastabend
  2016-02-03 10:07   ` Amir Vadai"
@ 2016-02-04  7:30   ` Amir Vadai"
  2016-02-04  8:23     ` Fastabend, John R
  1 sibling, 1 reply; 34+ messages in thread
From: Amir Vadai" @ 2016-02-04  7:30 UTC (permalink / raw)
  To: John Fastabend; +Cc: ogerlitz, jiri, jhs, jeffrey.t.kirsher, netdev, davem

On Wed, Feb 03, 2016 at 01:29:59AM -0800, John Fastabend wrote:
> This adds initial support for offloading the u32 tc classifier. This
> initial implementation only implements a few base matches and actions
> to illustrate the use of the infrastructure patches.
> 
> However it is an interesting subset because it handles the u32 next
> hdr logic to correctly map tcp packets from ip headers using the ihl
> and protocol fields. After this is accepted we can extend the match
> and action fields easily by updating the model header file.
> 
> Also only the drop action is supported initially.
> 
> Here is a short test script,
> 
>  #tc qdisc add dev eth4 ingress
>  #tc filter add dev eth4 parent ffff: protocol ip \
> 	u32 ht 800: order 1 \
> 	match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop
> 
> <-- hardware has dst/src ip match rule installed -->
> 
>  #tc filter del dev eth4 parent ffff: prio 49152
>  #tc filter add dev eth4 parent ffff: protocol ip prio 99 \
> 	handle 1: u32 divisor 1
>  #tc filter add dev eth4 protocol ip parent ffff: prio 99 \
> 	u32 ht 800: order 1 link 1: \
> 	offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
>  #tc filter add dev eth4 parent ffff: protocol ip \
> 	u32 ht 1: order 3 match tcp src 23 ffff action drop
> 
> <-- hardware has tcp src port rule installed -->
> 
>  #tc qdisc del dev eth4 parent ffff:
> 
> <-- hardware cleaned up -->
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    3 
>  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    6 -
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  196 ++++++++++++++++++++++
>  3 files changed, 198 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index 4b9156c..09c2d9b 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h

[...]

> @@ -8277,6 +8465,7 @@ static int ixgbe_set_features(struct net_device *netdev,
>  	 */
>  	switch (features & NETIF_F_NTUPLE) {
>  	case NETIF_F_NTUPLE:
> +	case NETIF_F_HW_TC:
>  		/* turn off ATR, enable perfect filters and reset */
>  		if (!(adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE))
>  			need_reset = true;

I think you have a bug here. I don't see how the NETIF_F_HW_TC case will
happen after masking 'features' out.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload
  2016-02-04  7:30   ` Amir Vadai"
@ 2016-02-04  8:23     ` Fastabend, John R
  2016-02-04 12:12       ` Amir Vadai"
  0 siblings, 1 reply; 34+ messages in thread
From: Fastabend, John R @ 2016-02-04  8:23 UTC (permalink / raw)
  To: Amir Vadai"; +Cc: ogerlitz, jiri, jhs, jeffrey.t.kirsher, netdev, davem

On 2/3/2016 11:30 PM, Amir Vadai" wrote:
> On Wed, Feb 03, 2016 at 01:29:59AM -0800, John Fastabend wrote:
>> This adds initial support for offloading the u32 tc classifier. This
>> initial implementation only implements a few base matches and actions
>> to illustrate the use of the infrastructure patches.
>>
>> However it is an interesting subset because it handles the u32 next
>> hdr logic to correctly map tcp packets from ip headers using the ihl
>> and protocol fields. After this is accepted we can extend the match
>> and action fields easily by updating the model header file.
>>
>> Also only the drop action is supported initially.
>>
>> Here is a short test script,
>>
>>  #tc qdisc add dev eth4 ingress
>>  #tc filter add dev eth4 parent ffff: protocol ip \
>> 	u32 ht 800: order 1 \
>> 	match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop
>>
>> <-- hardware has dst/src ip match rule installed -->
>>
>>  #tc filter del dev eth4 parent ffff: prio 49152
>>  #tc filter add dev eth4 parent ffff: protocol ip prio 99 \
>> 	handle 1: u32 divisor 1
>>  #tc filter add dev eth4 protocol ip parent ffff: prio 99 \
>> 	u32 ht 800: order 1 link 1: \
>> 	offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
>>  #tc filter add dev eth4 parent ffff: protocol ip \
>> 	u32 ht 1: order 3 match tcp src 23 ffff action drop
>>
>> <-- hardware has tcp src port rule installed -->
>>
>>  #tc qdisc del dev eth4 parent ffff:
>>
>> <-- hardware cleaned up -->
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>  drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    3 
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    6 -
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  196 ++++++++++++++++++++++
>>  3 files changed, 198 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>> index 4b9156c..09c2d9b 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> 
> [...]
> 
>> @@ -8277,6 +8465,7 @@ static int ixgbe_set_features(struct net_device *netdev,
>>  	 */
>>  	switch (features & NETIF_F_NTUPLE) {
>>  	case NETIF_F_NTUPLE:
>> +	case NETIF_F_HW_TC:
>>  		/* turn off ATR, enable perfect filters and reset */
>>  		if (!(adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE))
>>  			need_reset = true;
> 
> I think you have a bug here. I don't see how the NETIF_F_HW_TC case will
> happen after masking 'features' out.
> 

Ah I should have annotated this in the commit msg. I turn the feature
off by default to enable it the user needs to run

	# ethtool -K ethx hw-tc-offload on

this is just a habit of mine to leave new features off by default for
a bit until I work out some of the kinks. For example I found a case
today where if you build loops into your u32 graph the hardware tables
can get out of sync with the software tables. This is sort of extreme
corner case not sure if anyone would really use u32 but it is valid
and the hardware should abort correctly.

Thanks,
John

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
  2016-02-03  9:27 [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe John Fastabend
                   ` (7 preceding siblings ...)
  2016-02-03 10:11 ` [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe Amir Vadai"
@ 2016-02-04  9:16 ` Jiri Pirko
  2016-02-04 23:19   ` Pablo Neira Ayuso
  8 siblings, 1 reply; 34+ messages in thread
From: Jiri Pirko @ 2016-02-04  9:16 UTC (permalink / raw)
  To: John Fastabend; +Cc: amir, ogerlitz, jhs, jeffrey.t.kirsher, netdev, davem

Wed, Feb 03, 2016 at 10:27:32AM CET, john.fastabend@gmail.com wrote:
>This extends the setup_tc framework so it can support more than just
>the mqprio offload and push other classifiers and qdiscs into the
>hardware. The series here targets the u32 classifier and ixgbe
>driver. I worked out the u32 classifier because it is protocol
>oblivious and aligns with multiple hardware devices I have access
>to. I did an initial implementation on ixgbe because (a) I have one
>in my box (b) its a stable driver and (c) it is relatively simple
>compared to the other devices I have here but still has enough
>flexibility to exercise the features of cls_u32.
>
>I intentionally limited the scope of this series to the basic
>feature set. Specifically this uses a 'big hammer' feature bit
>to do the offload or not. If the bit is set you get offloaded rules
>if it is not then rules will not be offloaded. If we can agree on
>this patch series there are some more patches on my queue we can
>talk about to make the offload decision per rule using flags similar
>to how we do l2 mac updates. Additionally the error strategy can
>be improved to be hard aborting, log and continue, etc. I think
>these are nice to have improvements but shouldn't block this series.
>
>Also by adding get_parse_graph and set_parse_graph attributes as
>in my previous flow_api work we can build programmable devices
>and programmatically learn when rules can or can not be loaded
>into the hardware. Again future work.
>
>Any comments/feedback appreciated.

I like this being thin and elegant solution. However, ~2 years ago when I
pushed openvswitch kernel datapath offload patchset, people were yelling
at me that it is not generic enough solution, that tc has to be able
to use the api (Jamal :)), nftables as well.

Now this patch is making offload strictly tc-based and nobody seems to
care :) I do. I think that we might try to find some generic middle layer.

Let's discuss this more in person next week.

Thanks!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload
  2016-02-04  8:23     ` Fastabend, John R
@ 2016-02-04 12:12       ` Amir Vadai"
  2016-02-09 11:27         ` Fastabend, John R
  0 siblings, 1 reply; 34+ messages in thread
From: Amir Vadai" @ 2016-02-04 12:12 UTC (permalink / raw)
  To: Fastabend, John R; +Cc: ogerlitz, jiri, jhs, jeffrey.t.kirsher, netdev, davem

On Thu, Feb 04, 2016 at 12:23:02AM -0800, Fastabend, John R wrote:
> On 2/3/2016 11:30 PM, Amir Vadai" wrote:
> > On Wed, Feb 03, 2016 at 01:29:59AM -0800, John Fastabend wrote:
> >> This adds initial support for offloading the u32 tc classifier. This
> >> initial implementation only implements a few base matches and actions
> >> to illustrate the use of the infrastructure patches.
> >>
> >> However it is an interesting subset because it handles the u32 next
> >> hdr logic to correctly map tcp packets from ip headers using the ihl
> >> and protocol fields. After this is accepted we can extend the match
> >> and action fields easily by updating the model header file.
> >>
> >> Also only the drop action is supported initially.
> >>
> >> Here is a short test script,
> >>
> >>  #tc qdisc add dev eth4 ingress
> >>  #tc filter add dev eth4 parent ffff: protocol ip \
> >> 	u32 ht 800: order 1 \
> >> 	match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop
> >>
> >> <-- hardware has dst/src ip match rule installed -->
> >>
> >>  #tc filter del dev eth4 parent ffff: prio 49152
> >>  #tc filter add dev eth4 parent ffff: protocol ip prio 99 \
> >> 	handle 1: u32 divisor 1
> >>  #tc filter add dev eth4 protocol ip parent ffff: prio 99 \
> >> 	u32 ht 800: order 1 link 1: \
> >> 	offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
> >>  #tc filter add dev eth4 parent ffff: protocol ip \
> >> 	u32 ht 1: order 3 match tcp src 23 ffff action drop
> >>
> >> <-- hardware has tcp src port rule installed -->
> >>
> >>  #tc qdisc del dev eth4 parent ffff:
> >>
> >> <-- hardware cleaned up -->
> >>
> >> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >> ---
> >>  drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    3 
> >>  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    6 -
> >>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  196 ++++++++++++++++++++++
> >>  3 files changed, 198 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> >> index 4b9156c..09c2d9b 100644
> >> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> >> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> > 
> > [...]
> > 
> >> @@ -8277,6 +8465,7 @@ static int ixgbe_set_features(struct net_device *netdev,
> >>  	 */
> >>  	switch (features & NETIF_F_NTUPLE) {
> >>  	case NETIF_F_NTUPLE:
> >> +	case NETIF_F_HW_TC:
> >>  		/* turn off ATR, enable perfect filters and reset */
> >>  		if (!(adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE))
> >>  			need_reset = true;
> > 
> > I think you have a bug here. I don't see how the NETIF_F_HW_TC case will
> > happen after masking 'features' out.
> > 
> 
> Ah I should have annotated this in the commit msg. I turn the feature
> off by default to enable it the user needs to run
> 
> 	# ethtool -K ethx hw-tc-offload on
> 
> this is just a habit of mine to leave new features off by default for
> a bit until I work out some of the kinks. For example I found a case
> today where if you build loops into your u32 graph the hardware tables
> can get out of sync with the software tables. This is sort of extreme
> corner case not sure if anyone would really use u32 but it is valid
> and the hardware should abort correctly.
Yeh - that is nice :) But I was just pointing out on a small typo which I
think you have.
The new case will never happen. You compare: (features & NETIF_F_NTUPLE) == NETIF_F_HW_TC
Also the comment before the switch should be modified.

> 
> Thanks,
> John
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
  2016-02-03 18:48         ` Fastabend, John R
@ 2016-02-04 13:12           ` Jamal Hadi Salim
  2016-02-09 11:24             ` Fastabend, John R
  0 siblings, 1 reply; 34+ messages in thread
From: Jamal Hadi Salim @ 2016-02-04 13:12 UTC (permalink / raw)
  To: Fastabend, John R, Or Gerlitz
  Cc: amir, jiri, jeffrey.t.kirsher, netdev, davem


On 16-02-03 01:48 PM, Fastabend, John R wrote:

BTW: For the record John, I empathize with you that we need to
move. Please have patience - we are close; lets just get this resolved
in Seville. I like your patches a lot and would love to just have
your patches pushed in, but the challenges with community is being able
to reach some middle ground. We are not as bad as some of the standards
organizations. I am sure we'll get this resolved by end of next week
if not, I am %100 in agreement some form of your patches (And Amir's
need to go in and then we can refactor as needed)

>> 1) "priorities" for filters and some form of "index" for actions is
>> is needed. I think index (which tends to be a 32 bit value is what
>> Amir's patches refered to as "cookie" - or at least some hardware
>> can be used to query the action with). Priorities maybe implicit in
>> the order in which they are added. And th idea of appending vs
>> exclusivity vs replace (which  netlink already supports)
>> is important to worry about (TCAMS tend to assume an append mode
>> for example).
>
> The code denotes add/del/replace already. I'm not sure why a TCAM
> would assume an append mode but OK maybe that is some API you have
> the APIs I use don't have these semantics.
>

Basically most hardware (or i should say driver implementations of
mostly TCAMS) allow you to add exactly the same filter as many times
as you want. They dont really look at what you want to filter on
and then scream "conflict". IOW, you (user) are responsible for
conflict resolution at the filter level. The driver sees this blob
and requests for some index/key from the hardware then just adds it.
You can then use this key/index to delete/replace etc.
This is what i meant by "append" mode.
However if a classifier implementation cares about filter ambiguity
resolution, then priorities are used. We need to worry about the
bigger picture.


> For this series using cls_u32 the handle gives you everything you need
> to put entries in the right table and row. Namely the ht # and order #
> from 'tc'.

True - but with a caveat. There are only 2^12 max tables you can
have for example and up to 2^12 filters per bucket etc.

>Take a look at u32_change and u32_classify its the handle
> that places the filter into the list and the handle that is matched in
> classify. We should place the filters in the hardware in the same order
> that is used by u32_change.
>

I can see some parallels, but:
The nodeid in itself is insufficent for two reasons:
You cant have more than 2^12 filters per bucket;
and the nodeid then takes two meanings: a) it is an id
b) it specifies the order in which things are looked up.

I think you need to take the u32 address and map it to something in your
hardware. But at the same time it is important to have the abstraction
closely emulate your hardware.

> Also ran a few tests and can't see how priority works in u32 maybe you
> can shed some light but as best I can tell it doesn't have any effect
> on rule execution.
>

True.
u32 doesnt care because it will give you a nodeid if you dont specify
one. i.e conflict resolution is mapped to you not specifying exactly
the same ht:bkt:nodeid more than once. And if you will let the
kernel do it for you (as i am assumming you are saying your hardware
will) then no need.

>>
>> 2) I like the u32 approach where it makes sense; but sometimes it
>> doesnt make sense from a usability pov. I work with some ASICs
>> that have 10 tuples that are  fixed. Yes, a user can describe a policy
>> with u32 but flower would be more  usable say with flower (both
>> programmatic and cli)
>
> Sure so create a set of offload hooks for flower we don't need only
> one hardware classifier any more than we would like a single software
> classifiers.


Glad to hear that.
I was a little concerned that despite my love for u32 it was
going to be _the_ classifier. It doesnt fit for all offload cases
and sometimes it is because of human operators (the 10 tuple
hardware classifier i mentioned earlier).
BTW: Classifier in this case is very wide ranging (a regex hardware
offload for example qualifies).

>
> Again I'm trying to faithfully implement what we have in software
> and load that into the hardware. The handle today gives ingress/egres
> hook. If you want an all ports hook we should add it to 'tc' software
> first and then push that to the hardware not create magic hardware
> bits. See I've drank the cool aid software first than hardware.
>

;-> No disagreement. It felt like a small sensible
change - thats why i suggested it.

>> 4) Why are we forsaking switchdev John?
>> This is certainly re-usable beyond NICs and SRIOV.
>>
>
> Sure and switchdev can use it just like they use fdb_add and friends.
> I just don't want to require switchdev infrastructure on things that
> really are not switches. I think Amir indicated he would take a try
> at the switchdev integration. If not I'm willing to do it but it
> doesn't block this series in any way imo.
>

Ok. Makes sense.

>> 5)What happened to being both able to hardware and/or software?
>
> Follow up patch once we get the basic infrastructure in place with
> the big feature flag bit. I have a patch I'm testing for this now
> but again I want to move in logical and somewhat minimal sets.
>

Sounds sensible.

>>
>> Anyways, I think Seville would be a blast! Come one, come all.
>>
>
> I'll be there but lets be sure to follow up with this online I
> know folks are following this who wont be at Seville and I don't
> see any reason to block these patches and stop the thread for a
> week or more.
>

I really dont see much of a blocker.

cheers,
jamal

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-03  9:28 ` [net-next PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
  2016-02-03 10:14   ` kbuild test robot
@ 2016-02-04 13:18   ` Amir Vadai"
  2016-02-09 11:09     ` Fastabend, John R
  1 sibling, 1 reply; 34+ messages in thread
From: Amir Vadai" @ 2016-02-04 13:18 UTC (permalink / raw)
  To: John Fastabend; +Cc: ogerlitz, jiri, jhs, jeffrey.t.kirsher, netdev, davem

On Wed, Feb 03, 2016 at 01:28:37AM -0800, John Fastabend wrote:
> This patch allows netdev drivers to consume cls_u32 offloads via
> the ndo_setup_tc ndo op.
> 
> This works aligns with how network drivers have been doing qdisc
> offloads for mqprio.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  include/linux/netdevice.h |    6 +++-
>  include/net/pkt_cls.h     |   33 ++++++++++++++++++++
>  net/sched/cls_u32.c       |   73 ++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 109 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 9090ff7..861ce67 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -778,17 +778,21 @@ static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
>  typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>  				       struct sk_buff *skb);
>  
> -/* This structure holds attributes of qdisc and classifiers
> +/* These structures hold the attributes of qdisc and classifiers
>   * that are being passed to the netdevice through the setup_tc op.
>   */
>  enum {
>  	TC_SETUP_MQPRIO,
> +	TC_SETUP_CLSU32,
>  };
>  
> +struct tc_cls_u32_offload;
> +
>  struct tc_to_netdev {
>  	unsigned int type;
>  	union {
>  		u8 tc;
> +		struct tc_cls_u32_offload *cls_u32;
>  	};
>  };
>  
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index bc49967..0bd12cd 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -358,4 +358,37 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
>  }
>  #endif /* CONFIG_NET_CLS_IND */
>  
> +struct tc_cls_u32_knode {
> +	struct tcf_exts *exts;
> +	u8 fshift;
> +	u32 handle;
> +	u32 val;
> +	u32 mask;
> +	u32 link_handle;
> +	struct tc_u32_sel *sel;
> +};
> +
> +struct tc_cls_u32_hnode {
> +	u32 handle;
> +	u32 prio;
> +	unsigned int divisor;
> +};
> +
> +enum {
> +	TC_CLSU32_NEW_KNODE,
TC_CLSU32_NEW_KNODE is never used

[...]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
  2016-02-04  9:16 ` Jiri Pirko
@ 2016-02-04 23:19   ` Pablo Neira Ayuso
  2016-02-09 11:06     ` Fastabend, John R
  0 siblings, 1 reply; 34+ messages in thread
From: Pablo Neira Ayuso @ 2016-02-04 23:19 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: John Fastabend, amir, ogerlitz, jhs, jeffrey.t.kirsher, netdev, davem

On Thu, Feb 04, 2016 at 10:16:56AM +0100, Jiri Pirko wrote:
> Wed, Feb 03, 2016 at 10:27:32AM CET, john.fastabend@gmail.com wrote:
> >
> >Also by adding get_parse_graph and set_parse_graph attributes as
> >in my previous flow_api work we can build programmable devices
> >and programmatically learn when rules can or can not be loaded
> >into the hardware. Again future work.
> >
> >Any comments/feedback appreciated.
> 
> I like this being thin and elegant solution. However, ~2 years ago when I
> pushed openvswitch kernel datapath offload patchset, people were yelling
> at me that it is not generic enough solution, that tc has to be able
> to use the api (Jamal :)), nftables as well.

I would be glad to join this debate during NetDev 1.1 too.

I think we should provide a solution that allows people uses both
tc and nftables, this would require a bit of generic infrastructure on
top of it so we don't restrict users to one single solution, in other
words, we allow the user to select its own poison.

> Now this patch is making offload strictly tc-based and nobody seems to
> care :) I do. I think that we might try to find some generic middle layer.

I agree and I'll be happy to help to push this ahead. Let's try to sit
and get together to resolve this.

See you soon.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
  2016-02-04 23:19   ` Pablo Neira Ayuso
@ 2016-02-09 11:06     ` Fastabend, John R
  0 siblings, 0 replies; 34+ messages in thread
From: Fastabend, John R @ 2016-02-09 11:06 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jiri Pirko
  Cc: amir, ogerlitz, jhs, jeffrey.t.kirsher, netdev, davem

On 2/4/2016 3:19 PM, Pablo Neira Ayuso wrote:
> On Thu, Feb 04, 2016 at 10:16:56AM +0100, Jiri Pirko wrote:
>> Wed, Feb 03, 2016 at 10:27:32AM CET, john.fastabend@gmail.com wrote:
>>>
>>> Also by adding get_parse_graph and set_parse_graph attributes as
>>> in my previous flow_api work we can build programmable devices
>>> and programmatically learn when rules can or can not be loaded
>>> into the hardware. Again future work.
>>>
>>> Any comments/feedback appreciated.

Sorry if you get this twice it doesn't look like my original response
made it to netdev and the laptop I replied on charger blew up.

>>
>> I like this being thin and elegant solution. However, ~2 years ago when I
>> pushed openvswitch kernel datapath offload patchset, people were yelling
>> at me that it is not generic enough solution, that tc has to be able
>> to use the api (Jamal :)), nftables as well.
>

The other problem with OVS is if you have the capabilities to do
wildcard lookups (e.g. TCAM/SRAM/etc) then offloading the exact
match table in OVS is really inefficient use of the resource. You
really want to load the megaflow table into hardware. I just don't
think its a good scheme for what you want.

> I would be glad to join this debate during NetDev 1.1 too.
>

great.

> I think we should provide a solution that allows people uses both
> tc and nftables, this would require a bit of generic infrastructure on
> top of it so we don't restrict users to one single solution, in other
> words, we allow the user to select its own poison.
>
>> Now this patch is making offload strictly tc-based and nobody seems to
>> care :) I do. I think that we might try to find some generic middle
>> layer.

If we can build the universal model for 'tc' and 'nftable' we should
unify them higher in the stack? It doesn't make sense to me for the
driver folks to try and create the unified model for two subsystems
if we don't think its worthwhile in software as well.

>
> I agree and I'll be happy to help to push this ahead. Let's try to sit
> and get together to resolve this.

Great.

>
> See you soon.
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs
  2016-02-04 13:18   ` Amir Vadai"
@ 2016-02-09 11:09     ` Fastabend, John R
  0 siblings, 0 replies; 34+ messages in thread
From: Fastabend, John R @ 2016-02-09 11:09 UTC (permalink / raw)
  To: Amir Vadai"; +Cc: ogerlitz, jiri, jhs, jeffrey.t.kirsher, netdev, davem

On 2/4/2016 5:18 AM, Amir Vadai" wrote:
> On Wed, Feb 03, 2016 at 01:28:37AM -0800, John Fastabend wrote:
>> This patch allows netdev drivers to consume cls_u32 offloads via
>> the ndo_setup_tc ndo op.
>>
>> This works aligns with how network drivers have been doing qdisc
>> offloads for mqprio.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---

[...]

>> +enum {
>> +	TC_CLSU32_NEW_KNODE,
> TC_CLSU32_NEW_KNODE is never used

aha yep that snuck in there. In a follow up patch for the fm10k devices
where we can support hash tables (e.g. divisor > 1) I use it. Although
on closer inspection I need to check that the divisor == 1 on ixgbe or
else abort because we can get out of sync if software expects hash
tables here.

Thanks, nice catch.

> 
> [...]
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
  2016-02-04 13:12           ` Jamal Hadi Salim
@ 2016-02-09 11:24             ` Fastabend, John R
  2016-02-09 12:20               ` Jamal Hadi Salim
  0 siblings, 1 reply; 34+ messages in thread
From: Fastabend, John R @ 2016-02-09 11:24 UTC (permalink / raw)
  To: Jamal Hadi Salim, Or Gerlitz; +Cc: amir, jiri, jeffrey.t.kirsher, netdev, davem

On 2/4/2016 5:12 AM, Jamal Hadi Salim wrote:
> 
> On 16-02-03 01:48 PM, Fastabend, John R wrote:
> 
> BTW: For the record John, I empathize with you that we need to
> move. Please have patience - we are close; lets just get this resolved
> in Seville. I like your patches a lot and would love to just have
> your patches pushed in, but the challenges with community is being able
> to reach some middle ground. We are not as bad as some of the standards
> organizations. I am sure we'll get this resolved by end of next week
> if not, I am %100 in agreement some form of your patches (And Amir's
> need to go in and then we can refactor as needed)

Agreed although I'm a bit worried we are starting to talk about a
single hardware IR. This discussion has always failed in my experience.

> 
>>> 1) "priorities" for filters and some form of "index" for actions is
>>> is needed. I think index (which tends to be a 32 bit value is what
>>> Amir's patches refered to as "cookie" - or at least some hardware
>>> can be used to query the action with). Priorities maybe implicit in
>>> the order in which they are added. And th idea of appending vs
>>> exclusivity vs replace (which  netlink already supports)
>>> is important to worry about (TCAMS tend to assume an append mode
>>> for example).
>>
>> The code denotes add/del/replace already. I'm not sure why a TCAM
>> would assume an append mode but OK maybe that is some API you have
>> the APIs I use don't have these semantics.
>>
> 
> Basically most hardware (or i should say driver implementations of
> mostly TCAMS) allow you to add exactly the same filter as many times
> as you want. They dont really look at what you want to filter on
> and then scream "conflict". IOW, you (user) are responsible for
> conflict resolution at the filter level. The driver sees this blob
> and requests for some index/key from the hardware then just adds it.
> You can then use this key/index to delete/replace etc.
> This is what i meant by "append" mode.
> However if a classifier implementation cares about filter ambiguity
> resolution, then priorities are used. We need to worry about the
> bigger picture.
> 

Sure in other classifiers its used but its not needed in the set I
planned to added it later.

> 
>> For this series using cls_u32 the handle gives you everything you need
>> to put entries in the right table and row. Namely the ht # and order #
>> from 'tc'.
> 
> True - but with a caveat. There are only 2^12 max tables you can
> have for example and up to 2^12 filters per bucket etc.
> 

This is a software limitation as well right? If it hasn't showed up
as a limitation on the software side why would it be an issue here?
Do you have more than 2^12 tables on your devices? If so I guess we
can tack on another 32bits somewhere.

>> Take a look at u32_change and u32_classify its the handle
>> that places the filter into the list and the handle that is matched in
>> classify. We should place the filters in the hardware in the same order
>> that is used by u32_change.
>>
> 
> I can see some parallels, but:
> The nodeid in itself is insufficent for two reasons:
> You cant have more than 2^12 filters per bucket;
> and the nodeid then takes two meanings: a) it is an id
> b) it specifies the order in which things are looked up.
> 
> I think you need to take the u32 address and map it to something in your
> hardware. But at the same time it is important to have the abstraction
> closely emulate your hardware.
> 

IMO the hardware/interface must preserve the same ordering of
filters/hash_Tables/etc. How it does that mapping should be
a driver concern and it can always abort if it fails.

>> Also ran a few tests and can't see how priority works in u32 maybe you
>> can shed some light but as best I can tell it doesn't have any effect
>> on rule execution.
>>
> 
> True.
> u32 doesnt care because it will give you a nodeid if you dont specify
> one. i.e conflict resolution is mapped to you not specifying exactly
> the same ht:bkt:nodeid more than once. And if you will let the
> kernel do it for you (as i am assumming you are saying your hardware
> will) then no need.

Yep. Faithfully offloading u32 here not changing anything except
I do have to abort on some cases with the simpler devices. fm10k for
example can model hash nodes with divisors > 1.

> 
>>>
>>> 2) I like the u32 approach where it makes sense; but sometimes it
>>> doesnt make sense from a usability pov. I work with some ASICs
>>> that have 10 tuples that are  fixed. Yes, a user can describe a policy
>>> with u32 but flower would be more  usable say with flower (both
>>> programmatic and cli)
>>
>> Sure so create a set of offload hooks for flower we don't need only
>> one hardware classifier any more than we would like a single software
>> classifiers.
> 
> 
> Glad to hear that.
> I was a little concerned that despite my love for u32 it was
> going to be _the_ classifier. It doesnt fit for all offload cases
> and sometimes it is because of human operators (the 10 tuple
> hardware classifier i mentioned earlier).
> BTW: Classifier in this case is very wide ranging (a regex hardware
> offload for example qualifies).

My issue is we can map flower onto u32 that is fine and u32 onto
bpf. But we lose a lot of the power of each classifier when we
do this. flower for example is nice because of its simplicity
presumably this translates into faster updates, u32 is great because
we get full parse graph support and hash tables, ebpf is the biggest
beast of all and lets us load arbitrary functions into the device.
All are nice in their own right.

> 
>>
>> Again I'm trying to faithfully implement what we have in software
>> and load that into the hardware. The handle today gives ingress/egres
>> hook. If you want an all ports hook we should add it to 'tc' software
>> first and then push that to the hardware not create magic hardware
>> bits. See I've drank the cool aid software first than hardware.
>>
> 
> ;-> No disagreement. It felt like a small sensible
> change - thats why i suggested it.
>

Yep its in the git log if we can get past this initial series.


>>> 4) Why are we forsaking switchdev John?
>>> This is certainly re-usable beyond NICs and SRIOV.
>>>
>>
>> Sure and switchdev can use it just like they use fdb_add and friends.
>> I just don't want to require switchdev infrastructure on things that
>> really are not switches. I think Amir indicated he would take a try
>> at the switchdev integration. If not I'm willing to do it but it
>> doesn't block this series in any way imo.
>>
> 
> Ok. Makes sense.
> 

Great!

>>> 5)What happened to being both able to hardware and/or software?
>>
>> Follow up patch once we get the basic infrastructure in place with
>> the big feature flag bit. I have a patch I'm testing for this now
>> but again I want to move in logical and somewhat minimal sets.
>>
> 
> Sounds sensible.
> 
>>>
>>> Anyways, I think Seville would be a blast! Come one, come all.
>>>
>>
>> I'll be there but lets be sure to follow up with this online I
>> know folks are following this who wont be at Seville and I don't
>> see any reason to block these patches and stop the thread for a
>> week or more.
>>
> 
> I really dont see much of a blocker.

Perfect hopefully it didn't get thrashed on too much last couple
days. I'll be in Seville in a couple hours!

> 
> cheers,
> jamal

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload
  2016-02-04 12:12       ` Amir Vadai"
@ 2016-02-09 11:27         ` Fastabend, John R
  0 siblings, 0 replies; 34+ messages in thread
From: Fastabend, John R @ 2016-02-09 11:27 UTC (permalink / raw)
  To: Amir Vadai"; +Cc: ogerlitz, jiri, jhs, jeffrey.t.kirsher, netdev, davem

[...]

>> Ah I should have annotated this in the commit msg. I turn the feature
>> off by default to enable it the user needs to run
>>
>> 	# ethtool -K ethx hw-tc-offload on
>>
>> this is just a habit of mine to leave new features off by default for
>> a bit until I work out some of the kinks. For example I found a case
>> today where if you build loops into your u32 graph the hardware tables
>> can get out of sync with the software tables. This is sort of extreme
>> corner case not sure if anyone would really use u32 but it is valid
>> and the hardware should abort correctly.
> Yeh - that is nice :) But I was just pointing out on a small typo which I
> think you have.
> The new case will never happen. You compare: (features & NETIF_F_NTUPLE) == NETIF_F_HW_TC
> Also the comment before the switch should be modified.

Aha nice catch my scripts were enabling both ntuple and hw-tc-offload
for testing compatibility issues. I wonder if there is a bug somewhere
else though that checks that code most likely because it was definately
getting offloaded.

Good catch again thanks.

> 
>>
>> Thanks,
>> John
>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload
  2016-02-03 12:46       ` Jamal Hadi Salim
  2016-02-03 19:02         ` Fastabend, John R
@ 2016-02-09 11:30         ` Fastabend, John R
  1 sibling, 0 replies; 34+ messages in thread
From: Fastabend, John R @ 2016-02-09 11:30 UTC (permalink / raw)
  To: Jamal Hadi Salim, Amir Vadai"
  Cc: ogerlitz, jiri, jeffrey.t.kirsher, netdev, davem

[...]

>>
>> If you leave ht and order off the tc cli I believe 'tc' just
>> picks some semi-arbitrary ones for you. I've been in the habit
>> of always specifying them even for software filters.
>>
> 
> The default table id is essentially 0x800. Default bucket is 0.
> "order" essentially is the filter id. And given you can link tables
> (Nice work John!); essentially the ht:bucket:nodeid is an "address" to
> a specific filter on a specific table and when makes sense a specific
> hash bucket. Some other way to look at it is as a way to construct
> a mapping to a TCAM key.
> What John is doing is essentially taking the nodeid and trying to use
> it as a priority. In otherwise the abstraction is reduced to a linked
> list in which the ordering is how the list is traversed.
> It may work in this case, but i am for being able to explicitly specify
> priorities.

Sorry bombing you with emails Jamal. Another thing to note is ixgbe
doesn't support hash tables explicitly but our other devices do. So
when a hash node is created we can map that onto a hardware block
and actually do the hash.

> 
> cheers,
> jamal
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
  2016-02-09 11:24             ` Fastabend, John R
@ 2016-02-09 12:20               ` Jamal Hadi Salim
  0 siblings, 0 replies; 34+ messages in thread
From: Jamal Hadi Salim @ 2016-02-09 12:20 UTC (permalink / raw)
  To: Fastabend, John R, Or Gerlitz
  Cc: amir, jiri, jeffrey.t.kirsher, netdev, davem

On 16-02-09 06:24 AM, Fastabend, John R wrote:
> On 2/4/2016 5:12 AM, Jamal Hadi Salim wrote:
>>
>> On 16-02-03 01:48 PM, Fastabend, John R wrote:
>>

>> Basically most hardware (or i should say driver implementations of
>> mostly TCAMS) allow you to add exactly the same filter as many times
>> as you want. They dont really look at what you want to filter on
>> and then scream "conflict". IOW, you (user) are responsible for
>> conflict resolution at the filter level. The driver sees this blob
>> and requests for some index/key from the hardware then just adds it.
>> You can then use this key/index to delete/replace etc.
>> This is what i meant by "append" mode.
>> However if a classifier implementation cares about filter ambiguity
>> resolution, then priorities are used. We need to worry about the
>> bigger picture.
>>
>
> Sure in other classifiers its used but its not needed in the set I
> planned to added it later.
>

If you leave it open for some other hardware to use we should be fine.

>>
>>> For this series using cls_u32 the handle gives you everything you need
>>> to put entries in the right table and row. Namely the ht # and order #
>>> from 'tc'.
>>
>> True - but with a caveat. There are only 2^12 max tables you can
>> have for example and up to 2^12 filters per bucket etc.
>>
>
> This is a software limitation as well right? If it hasn't showed up
> as a limitation on the software side why would it be an issue here?
> Do you have more than 2^12 tables on your devices? If so I guess we
> can tack on another 32bits somewhere.
>

That handle is used as an "Address" to the 32 bit filter.
Just beware of the semantics the handle has.
It hasnt shown up as a software limitation because the defaults
are good enough for most people. But if you ever want to install
a million rules that can be looked up at a reasonable pps rate
it will become very obvious quickly. I have a sample setup in the
talk tommorow which shows such an example.

>> I think you need to take the u32 address and map it to something in your
>> hardware. But at the same time it is important to have the abstraction
>> closely emulate your hardware.
>>
>
> IMO the hardware/interface must preserve the same ordering of
> filters/hash_Tables/etc. How it does that mapping should be
> a driver concern and it can always abort if it fails.
>

Sure.

>>> Also ran a few tests and can't see how priority works in u32 maybe you
>>> can shed some light but as best I can tell it doesn't have any effect
>>> on rule execution.
>>>
>>
>> True.
>> u32 doesnt care because it will give you a nodeid if you dont specify
>> one. i.e conflict resolution is mapped to you not specifying exactly
>> the same ht:bkt:nodeid more than once. And if you will let the
>> kernel do it for you (as i am assumming you are saying your hardware
>> will) then no need.
>
> Yep. Faithfully offloading u32 here not changing anything except
> I do have to abort on some cases with the simpler devices. fm10k for
> example can model hash nodes with divisors > 1.
>

I wonder if when we get to capabilities we can do this...

>>

>
> My issue is we can map flower onto u32 that is fine and u32 onto
> bpf. But we lose a lot of the power of each classifier when we
> do this. flower for example is nice because of its simplicity
> presumably this translates into faster updates, u32 is great because
> we get full parse graph support and hash tables, ebpf is the biggest
> beast of all and lets us load arbitrary functions into the device.
> All are nice in their own right.
>

Did i send you my slides? ;->


cheers,
jamal

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2016-02-09 12:20 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-03  9:27 [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe John Fastabend
2016-02-03  9:27 ` [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter John Fastabend
2016-02-03  9:58   ` kbuild test robot
2016-02-03  9:59   ` kbuild test robot
2016-02-03 11:44   ` kbuild test robot
2016-02-03  9:28 ` [net-next PATCH 2/7] net: rework setup_tc ndo op to consume general tc operand John Fastabend
2016-02-03  9:28 ` [net-next PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs John Fastabend
2016-02-03 10:14   ` kbuild test robot
2016-02-04 13:18   ` Amir Vadai"
2016-02-09 11:09     ` Fastabend, John R
2016-02-03  9:28 ` [net-next PATCH 4/7] net: add tc offload feature flag John Fastabend
2016-02-03  9:29 ` [net-next PATCH 5/7] net: tc: helper functions to query action types John Fastabend
2016-02-03  9:29 ` [net-next PATCH 6/7] net: ixgbe: add minimal parser details for ixgbe John Fastabend
2016-02-03  9:29 ` [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload John Fastabend
2016-02-03 10:07   ` Amir Vadai"
2016-02-03 10:26     ` John Fastabend
2016-02-03 12:46       ` Jamal Hadi Salim
2016-02-03 19:02         ` Fastabend, John R
2016-02-09 11:30         ` Fastabend, John R
2016-02-04  7:30   ` Amir Vadai"
2016-02-04  8:23     ` Fastabend, John R
2016-02-04 12:12       ` Amir Vadai"
2016-02-09 11:27         ` Fastabend, John R
2016-02-03 10:11 ` [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe Amir Vadai"
2016-02-03 10:21   ` John Fastabend
2016-02-03 10:31     ` Or Gerlitz
2016-02-03 12:21       ` Jamal Hadi Salim
2016-02-03 18:48         ` Fastabend, John R
2016-02-04 13:12           ` Jamal Hadi Salim
2016-02-09 11:24             ` Fastabend, John R
2016-02-09 12:20               ` Jamal Hadi Salim
2016-02-04  9:16 ` Jiri Pirko
2016-02-04 23:19   ` Pablo Neira Ayuso
2016-02-09 11:06     ` Fastabend, John R

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.