* [RFC for accelerated IPoIB 00/27] Enhanced mode for IPoIB driver
@ 2017-03-01 14:02 Erez Shitrit
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit

    The IPoIB protocol encapsulates IP packets over Infiniband datagrams.
    As a direct RDMA Upper Layer Protocol (ULP), IPoIB cannot support HW
    features that are specific to the IP protocol stack.

    Nevertheless, RDMA interfaces have been extended to support some of the
    prominent IP offload features, such as TCP/UDP checksum and TSO.
    This provided a reasonable performance gain for IPoIB, but it is still
    insufficient to cope with the increasing network bandwidth demand.

    However, common network interfaces now provide features that are very
    hard to implement in IPoIB interfaces while they go through the RDMA
    layer; examples include TSS and RSS, tunneling offloads, and XDP.
    Rather than continuously porting IP network interface developments into
    the RDMA stack, we propose adding an abstract network data-path interface
    to RDMA devices.

    In order to present a consistent interface to users, the IPoIB ULP
    continues to represent the network device to the IP stack.
    The common code also manages the IPoIB control plane, such as resolving
    path queries and registering to multicast groups.
    Data path operations are forwarded to devices that implement the new
    API, or fall back to the standard implementation otherwise.
    Using the foregoing approach, we show how IPoIB closes the performance
    gap compared to state-of-the-art Ethernet network interfaces.

    The idea of the implementation is to expose a set of functions that a
    network interface needs: create, delete, init HW resources, send, and
    attach/detach multicast to the QP.
    This set of functions is encapsulated in a new struct, which the
    specific HW layer may or may not provide.

    The IPoIB code is adapted to enable accelerating the network interface,
    but it keeps working as before if the HW below doesn't support the
    acceleration.
    Each HW vendor can either supply the acceleration for IPoIB or leave
    IPoIB working as before.

    The flow is as follows:

    /* default accelerating functions, same as before */
    struct ib_ipoib_accel_ops default_ipoib_accel_ops = {
            .ib_dev_init = ipoib_dev_init_default,
            .ib_dev_cleanup = ipoib_dev_uninit_default,
            .ib_dev_open = ipoib_ib_dev_open_default,
            .ib_dev_stop = ipoib_ib_dev_stop_default,
            .send = ipoib_send,
            .create_netdev = ipoib_create_netdev_default,
            .attach_mcast = ipoib_mcast_attach,
            .detach_mcast = ipoib_mcast_detach,
    };

        /*
         * Check whether the IB core layer supports the accelerations via
         * the new verb get_ipoib_accel_ops; if the response is not NULL,
         * IPoIB will use it.
         */
        cur_ipoib_accel_ops = ib_get_ipoib_accel_ops(hca);
        if (!cur_ipoib_accel_ops) {
                /*
                 * IPoIB has a default set of acceleration functions that
                 * runs the same code as before, so if the underlying layer
                 * doesn't support acceleration the code looks and feels
                 * the same.
                 */
                pr_debug("default ipoib_ops for %s\n", hca->name);
                cur_ipoib_accel_ops = &default_ipoib_accel_ops;
        }

        /* create the netdev device */
        cur_ipoib_accel_ops->create_netdev(hca, name, ipoib_setup_common);

        /* create HW resources */
        priv->ipoib_accel_ops->ib_dev_init(dev, &priv->qp_num);

        /* open the device */
        priv->ipoib_accel_ops->ib_dev_open(dev);

        /* attach multicast */
        priv->ipoib_accel_ops->attach_mcast(dev, mgid, mlid, set_qkey);

        /* send traffic */
        priv->ipoib_accel_ops->send(dev, skb, ah, dqpn, dqkey);
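
    On the HW provider side, the expected shape is to fill the same struct
    with device-specific callbacks and return it from the new verb. A rough
    sketch only (the mlx5 function names below are illustrative; the real
    skeleton is added in drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c later
    in this series):

        static struct ib_ipoib_accel_ops mlx5_ipoib_accel_ops = {
                .create_netdev  = mlx5_ipoib_create_netdev,
                .ib_dev_init    = mlx5_ipoib_dev_init,
                .ib_dev_cleanup = mlx5_ipoib_dev_cleanup,
                .ib_dev_open    = mlx5_ipoib_dev_open,
                .ib_dev_stop    = mlx5_ipoib_dev_stop,
                .send           = mlx5_ipoib_send,
                .attach_mcast   = mlx5_ipoib_attach_mcast,
                .detach_mcast   = mlx5_ipoib_detach_mcast,
        };

        static struct ib_ipoib_accel_ops *
        mlx5_ib_get_ipoib_accel_ops(struct ib_device *device)
        {
                return &mlx5_ipoib_accel_ops;
        }

        /* during mlx5_ib device registration */
        dev->ib_dev.get_ipoib_accel_ops = mlx5_ib_get_ipoib_accel_ops;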

Erez Shitrit (27):
  IB/ipoib: Separate control and data related initializations
  IB/ipoib: separate control from HW operation on ipoib_open/stop ndo
  IB/ipoib: Rename qpn to dqpn in ipoib_send and post_send functions
  IB/verb: Add ipoib_options struct and API
  IB/ipoib: Support ipoib acceleration options callbacks
  IB/ipoib: Add context to ipoib to be used in acceleration layer
  hw/mlx5: Add New bit to check over QP creation
  linux/mlx5/mlx5_ifc.h: Add underlay_qpn field to PRM objects
  net/mlx5e: Refactor EN code to support IB link
  net/mlx5e: Creating and Destroying flow-steering tables for IB link
  mlx5_en: Support netdevice creation for IB link type
  mlx5_en: Refactor attach_netdev API
  mlx5_en: Use underlay_qpn in tis creation
  mlx5_en: Export resource creation function to be used in IB link
  mlx5_core: Enable flow-steering for IB link
  mlx5_core/en: Enhanced flow table creation to support ETH and IB
    links.
  mlx5_en: Change cleanup API in order to enable IB link
  mlx5_en: Change mlx5e_open_locked and mlx5e_close_locked api
  mlx5_en: Export open/close api for IB link
  include/linux/mlx5: Add  mlx5_wqe_eth_pad and enhanced-ipoib-qp-mode
  mlx5_en: Refactor TX send flow
  mlx5_en: Export send function for IB link type
  mlx5_en: New function pointer for build_rx_skb is
  mlx5_en: Change the function that checks the packet type
  mlx5_en: Add support for build_rx_skb for packet from IB type
  mlx5_ib: skeleton for mlx5_ib to support ipoib_ops
  RFC: cover letter

 drivers/infiniband/hw/mlx5/Makefile                |   2 +-
 drivers/infiniband/hw/mlx5/main.c                  |   3 +
 drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c        | 289 ++++++++++++++
 drivers/infiniband/hw/mlx5/qp.c                    |   5 +-
 drivers/infiniband/ulp/ipoib/ipoib.h               |  26 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c            | 275 ++++++-------
 drivers/infiniband/ulp/ipoib/ipoib_main.c          | 187 +++++++--
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c     |  13 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c         |  52 +--
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c          |   5 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  23 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  |  12 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  24 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c    |  66 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 435 ++++++++++++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  21 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  70 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c    | 292 +++++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   9 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  19 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |   8 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |  77 ++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/fw.c       |   3 +-
 include/linux/mlx5/driver.h                        |  17 +
 include/linux/mlx5/fs.h                            |  16 +-
 include/linux/mlx5/mlx5_ifc.h                      |  11 +-
 include/linux/mlx5/qp.h                            |   8 +
 include/rdma/ib_ipoib_accel_ops.h                  |  71 ++++
 include/rdma/ib_verbs.h                            |   9 +
 30 files changed, 1545 insertions(+), 504 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c
 create mode 100644 include/rdma/ib_ipoib_accel_ops.h

-- 
1.8.3.1


* [RFC for accelerated IPoIB 01/26] IB/ipoib: Separate control and data related initializations
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 02/26] IB/ipoib: separate control from HW operation on ipoib_open/stop ndo Erez Shitrit
                     ` (26 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


This patch prepares the init and teardown flows so we can call them
through the ipoib_options function pointers.
It arranges that area of the code as follows:
All operations that deal with resource allocation/deletion are done in
one place.
All operations that are control oriented, meaning they are not tied to
the specific HW beneath, are done in a separate place.

The allocation of HW resources is now done in ipoib_dev_init_default,
and the deletion of all those resources in ipoib_dev_uninit_default.
The only exception is the creation of the PD object, which is used both
for resource allocation (creating the QP etc.) and for control flows
such as creating an AH.

It also does the following:
Moves the creation of rx_ring and tx_ring into the resource allocation
area.
Moves ipoib_ib_dev_open, which opens the device, into the control area
instead of dev_init, which creates the resources.
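
In outline, the resulting ipoib_dev_init() looks like this (condensed
from the diff below, error handling omitted):

	int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
	{
		struct ipoib_dev_priv *priv = netdev_priv(dev);

		/* control-plane resources, not tied to the HW data path */
		priv->wq = alloc_ordered_workqueue("ipoib_wq", WQ_MEM_RECLAIM);
		priv->pd = ib_alloc_pd(priv->ca, 0);

		/* HW/data-path resources: rx/tx rings, CQs, QP */
		ipoib_dev_init_default(dev, ca, port);

		/* the QP exists now, so derive the dev address from it */
		priv->dev->dev_addr[1] = (priv->qp->qp_num >> 16) & 0xff;
		priv->dev->dev_addr[2] = (priv->qp->qp_num >>  8) & 0xff;
		priv->dev->dev_addr[3] = (priv->qp->qp_num) & 0xff;

		ipoib_neigh_hash_init(priv);

		if (dev->flags & IFF_UP)
			ipoib_ib_dev_open(dev);

		return 0;
	}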

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib.h       |   5 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c    |  39 +++-------
 drivers/infiniband/ulp/ipoib/ipoib_main.c  | 116 ++++++++++++++++++++++++-----
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c |  40 +---------
 4 files changed, 110 insertions(+), 90 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index bed233bf45c3..7cd9befd7d54 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -491,14 +491,13 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 void ipoib_flush_paths(struct net_device *dev);
 int ipoib_check_sm_sendonly_fullmember_support(struct ipoib_dev_priv *priv);
 struct ipoib_dev_priv *ipoib_intf_alloc(const char *format);
-
-int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
+void ipoib_ib_tx_timer_func(unsigned long ctx);
 void ipoib_ib_dev_flush_light(struct work_struct *work);
 void ipoib_ib_dev_flush_normal(struct work_struct *work);
 void ipoib_ib_dev_flush_heavy(struct work_struct *work);
 void ipoib_pkey_event(struct work_struct *work);
 void ipoib_ib_dev_cleanup(struct net_device *dev);
-
+void ipoib_dev_uninit_default(struct net_device *dev);
 int ipoib_ib_dev_open(struct net_device *dev);
 void ipoib_ib_dev_up(struct net_device *dev);
 void ipoib_ib_dev_down(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 12c4f84a6639..3c0a35d883e2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -40,6 +40,7 @@
 
 #include <linux/ip.h>
 #include <linux/tcp.h>
+#include <linux/vmalloc.h>
 
 #include "ipoib.h"
 
@@ -692,7 +693,7 @@ static void ipoib_stop_ah(struct net_device *dev)
 	ipoib_flush_ah(dev);
 }
 
-static void ipoib_ib_tx_timer_func(unsigned long ctx)
+void ipoib_ib_tx_timer_func(unsigned long ctx)
 {
 	drain_tx_cq((struct net_device *)ctx);
 }
@@ -913,32 +914,6 @@ void ipoib_ib_dev_stop(struct net_device *dev)
 	ib_req_notify_cq(priv->recv_cq, IB_CQ_NEXT_COMP);
 }
 
-int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
-{
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
-
-	priv->ca = ca;
-	priv->port = port;
-	priv->qp = NULL;
-
-	if (ipoib_transport_dev_init(dev, ca)) {
-		printk(KERN_WARNING "%s: ipoib_transport_dev_init failed\n", ca->name);
-		return -ENODEV;
-	}
-
-	setup_timer(&priv->poll_timer, ipoib_ib_tx_timer_func,
-		    (unsigned long) dev);
-
-	if (dev->flags & IFF_UP) {
-		if (ipoib_ib_dev_open(dev)) {
-			ipoib_transport_dev_cleanup(dev);
-			return -ENODEV;
-		}
-	}
-
-	return 0;
-}
-
 /*
  * Takes whatever value which is in pkey index 0 and updates priv->pkey
  * returns 0 if the pkey value was changed.
@@ -1236,7 +1211,13 @@ void ipoib_ib_dev_cleanup(struct net_device *dev)
 	 */
 	ipoib_stop_ah(dev);
 
-	ipoib_transport_dev_cleanup(dev);
-}
+	clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
+
+	ipoib_dev_uninit_default(dev);
 
+	if (priv->pd) {
+		ib_dealloc_pd(priv->pd);
+		priv->pd = NULL;
+	}
+}
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 259c59f67394..8c644bbc2828 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1622,8 +1622,23 @@ static void ipoib_neigh_hash_uninit(struct net_device *dev)
 	wait_for_completion(&priv->ntbl.deleted);
 }
 
+void ipoib_dev_uninit_default(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
 
-int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
+	ipoib_transport_dev_cleanup(dev);
+
+	ipoib_cm_dev_cleanup(dev);
+
+	kfree(priv->rx_ring);
+	vfree(priv->tx_ring);
+
+	priv->rx_ring = NULL;
+	priv->tx_ring = NULL;
+}
+
+static int ipoib_dev_init_default(struct net_device *dev, struct ib_device *ca,
+				  int port)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 
@@ -1642,29 +1657,95 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 
 	/* priv->tx_head, tx_tail & tx_outstanding are already 0 */
 
-	if (ipoib_ib_dev_init(dev, ca, port))
+	if (ipoib_transport_dev_init(dev, ca)) {
+		pr_warn("%s: ipoib_transport_dev_init failed\n", ca->name);
 		goto out_tx_ring_cleanup;
+	}
+
+	setup_timer(&priv->poll_timer, ipoib_ib_tx_timer_func,
+		    (unsigned long)dev);
+
+	return 0;
+
+out_tx_ring_cleanup:
+	vfree(priv->tx_ring);
+
+out_rx_ring_cleanup:
+	kfree(priv->rx_ring);
+
+out:
+	return -ENOMEM;
+}
+
+int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int ret = -ENOMEM;
+
+	priv->ca = ca;
+	priv->port = port;
+	priv->qp = NULL;
 
 	/*
-	 * Must be after ipoib_ib_dev_init so we can allocate a per
-	 * device wq there and use it here
+	 * the various IPoIB tasks assume they will never race against
+	 * themselves, so always use a single thread workqueue
 	 */
-	if (ipoib_neigh_hash_init(priv) < 0)
+	priv->wq = alloc_ordered_workqueue("ipoib_wq", WQ_MEM_RECLAIM);
+	if (!priv->wq) {
+		pr_warn("%s: failed to allocate device WQ\n", dev->name);
+		goto out;
+	}
+
+	/* create pd, which used both for control and datapath*/
+	priv->pd = ib_alloc_pd(priv->ca, 0);
+	if (IS_ERR(priv->pd)) {
+		pr_warn("%s: failed to allocate PD\n", ca->name);
+		goto clean_wq;
+	}
+
+	ret = ipoib_dev_init_default(dev, ca, port);
+	if (ret) {
+		pr_warn("%s failed to init HW resource\n", dev->name);
+		goto out_free_pd;
+	}
+
+	/* after qp created set dev address */
+	priv->dev->dev_addr[1] = (priv->qp->qp_num >> 16) & 0xff;
+	priv->dev->dev_addr[2] = (priv->qp->qp_num >>  8) & 0xff;
+	priv->dev->dev_addr[3] = (priv->qp->qp_num) & 0xff;
+
+	if (ipoib_neigh_hash_init(priv) < 0) {
+		pr_warn("%s failed to init neigh hash\n", dev->name);
 		goto out_dev_uninit;
+	}
+
+	if (dev->flags & IFF_UP) {
+		if (ipoib_ib_dev_open(dev)) {
+			pr_warn("%s failed to open device\n", dev->name);
+			ret = -ENODEV;
+			goto out_dev_uninit;
+		}
+	}
 
 	return 0;
 
 out_dev_uninit:
 	ipoib_ib_dev_cleanup(dev);
 
-out_tx_ring_cleanup:
-	vfree(priv->tx_ring);
+out_free_pd:
+	if (priv->pd) {
+		ib_dealloc_pd(priv->pd);
+		priv->pd = NULL;
+	}
 
-out_rx_ring_cleanup:
-	kfree(priv->rx_ring);
+clean_wq:
+	if (priv->wq) {
+		destroy_workqueue(priv->wq);
+		priv->wq = NULL;
+	}
 
 out:
-	return -ENOMEM;
+	return ret;
 }
 
 void ipoib_dev_cleanup(struct net_device *dev)
@@ -1685,19 +1766,16 @@ void ipoib_dev_cleanup(struct net_device *dev)
 	}
 	unregister_netdevice_many(&head);
 
-	/*
-	 * Must be before ipoib_ib_dev_cleanup or we delete an in use
-	 * work queue
-	 */
 	ipoib_neigh_hash_uninit(dev);
 
 	ipoib_ib_dev_cleanup(dev);
 
-	kfree(priv->rx_ring);
-	vfree(priv->tx_ring);
-
-	priv->rx_ring = NULL;
-	priv->tx_ring = NULL;
+	/* no more works over the priv->wq */
+	if (priv->wq) {
+		flush_workqueue(priv->wq);
+		destroy_workqueue(priv->wq);
+		priv->wq = NULL;
+	}
 }
 
 static int ipoib_set_vf_link_state(struct net_device *dev, int vf, int link_state)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 189dcd1709d2..529b9509a2c4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -147,22 +147,6 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 	int ret, size;
 	int i;
 
-	priv->pd = ib_alloc_pd(priv->ca, 0);
-	if (IS_ERR(priv->pd)) {
-		printk(KERN_WARNING "%s: failed to allocate PD\n", ca->name);
-		return -ENODEV;
-	}
-
-	/*
-	 * the various IPoIB tasks assume they will never race against
-	 * themselves, so always use a single thread workqueue
-	 */
-	priv->wq = alloc_ordered_workqueue("ipoib_wq", WQ_MEM_RECLAIM);
-	if (!priv->wq) {
-		printk(KERN_WARNING "ipoib: failed to allocate device WQ\n");
-		goto out_free_pd;
-	}
-
 	size = ipoib_recvq_size + 1;
 	ret = ipoib_cm_dev_init(dev);
 	if (!ret) {
@@ -173,7 +157,7 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 			size += ipoib_recvq_size * ipoib_max_conn_qp;
 	} else
 		if (ret != -ENOSYS)
-			goto out_free_wq;
+			return -ENODEV;
 
 	cq_attr.cqe = size;
 	priv->recv_cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL,
@@ -212,10 +196,6 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 		goto out_free_send_cq;
 	}
 
-	priv->dev->dev_addr[1] = (priv->qp->qp_num >> 16) & 0xff;
-	priv->dev->dev_addr[2] = (priv->qp->qp_num >>  8) & 0xff;
-	priv->dev->dev_addr[3] = (priv->qp->qp_num      ) & 0xff;
-
 	for (i = 0; i < MAX_SKB_FRAGS + 1; ++i)
 		priv->tx_sge[i].lkey = priv->pd->local_dma_lkey;
 
@@ -247,13 +227,6 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 out_cm_dev_cleanup:
 	ipoib_cm_dev_cleanup(dev);
 
-out_free_wq:
-	destroy_workqueue(priv->wq);
-	priv->wq = NULL;
-
-out_free_pd:
-	ib_dealloc_pd(priv->pd);
-
 	return -ENODEV;
 }
 
@@ -266,7 +239,6 @@ void ipoib_transport_dev_cleanup(struct net_device *dev)
 			ipoib_warn(priv, "ib_qp_destroy failed\n");
 
 		priv->qp = NULL;
-		clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
 	}
 
 	if (ib_destroy_cq(priv->send_cq))
@@ -274,16 +246,6 @@ void ipoib_transport_dev_cleanup(struct net_device *dev)
 
 	if (ib_destroy_cq(priv->recv_cq))
 		ipoib_warn(priv, "ib_cq_destroy (recv) failed\n");
-
-	ipoib_cm_dev_cleanup(dev);
-
-	if (priv->wq) {
-		flush_workqueue(priv->wq);
-		destroy_workqueue(priv->wq);
-		priv->wq = NULL;
-	}
-
-	ib_dealloc_pd(priv->pd);
 }
 
 void ipoib_event(struct ib_event_handler *handler,
-- 
1.8.3.1


* [RFC for accelerated IPoIB 02/26] IB/ipoib: separate control from HW operation on ipoib_open/stop ndo
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 01/26] IB/ipoib: Separate control and data related initializations Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 03/26] IB/ipoib: Rename qpn to dqpn in ipoib_send and post_send functions Erez Shitrit
                     ` (25 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


This patch prepares the netdev part of the ipoib driver to be able to
use the ipoib_options.
It deals with the two flows driven by the .ndo callbacks: ipoib_open and
ipoib_stop.
It arranges that area of the code as follows:
All operations that deal with HW resources (for example, changing the QP
state, posting receives, etc.) are done in one place.
All operations that are control oriented (like restarting the multicast
task, starting the ah reaper, etc.) are done in a separate place.

The functions that deal with the HW resources are now
ipoib_ib_dev_open_default for the ipoib_open flow and
ipoib_ib_dev_stop_default for ipoib_stop.
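
In outline, the split for the open flow looks like this (condensed from
the diff below):

	int ipoib_ib_dev_open(struct net_device *dev)
	{
		struct ipoib_dev_priv *priv = netdev_priv(dev);

		/* control: P_Key presence check and starting the ah reaper */
		ipoib_pkey_dev_check_presence(dev);
		if (!test_bit(IPOIB_PKEY_ASSIGNED, &priv->flags))
			return -1;
		clear_bit(IPOIB_STOP_REAPER, &priv->flags);
		queue_delayed_work(priv->wq, &priv->ah_reap_task,
				   round_jiffies_relative(HZ));

		/* HW: init the QP, post receives, enable NAPI */
		if (ipoib_ib_dev_open_default(dev))
			goto stop_ah_reap;

		return 0;

	stop_ah_reap:
		set_bit(IPOIB_STOP_REAPER, &priv->flags);
		cancel_delayed_work(&priv->ah_reap_task);
		return -1;
	}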

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib.h      |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c   | 228 +++++++++++++++++-------------
 drivers/infiniband/ulp/ipoib/ipoib_main.c |   2 +-
 3 files changed, 129 insertions(+), 103 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 7cd9befd7d54..b3900b253ad5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -501,7 +501,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 int ipoib_ib_dev_open(struct net_device *dev);
 void ipoib_ib_dev_up(struct net_device *dev);
 void ipoib_ib_dev_down(struct net_device *dev);
-void ipoib_ib_dev_stop(struct net_device *dev);
+int ipoib_ib_dev_stop_default(struct net_device *dev);
 void ipoib_pkey_dev_check_presence(struct net_device *dev);
 
 int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 3c0a35d883e2..5d732c5f01ee 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -693,24 +693,113 @@ static void ipoib_stop_ah(struct net_device *dev)
 	ipoib_flush_ah(dev);
 }
 
-void ipoib_ib_tx_timer_func(unsigned long ctx)
+static int recvs_pending(struct net_device *dev)
 {
-	drain_tx_cq((struct net_device *)ctx);
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int pending = 0;
+	int i;
+
+	for (i = 0; i < ipoib_recvq_size; ++i)
+		if (priv->rx_ring[i].skb)
+			++pending;
+
+	return pending;
 }
 
-int ipoib_ib_dev_open(struct net_device *dev)
+int ipoib_ib_dev_stop_default(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	int ret;
+	struct ib_qp_attr qp_attr;
+	unsigned long begin;
+	struct ipoib_tx_buf *tx_req;
+	int i;
 
-	ipoib_pkey_dev_check_presence(dev);
+	if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
+		napi_disable(&priv->napi);
 
-	if (!test_bit(IPOIB_PKEY_ASSIGNED, &priv->flags)) {
-		ipoib_warn(priv, "P_Key 0x%04x is %s\n", priv->pkey,
-			   (!(priv->pkey & 0x7fff) ? "Invalid" : "not found"));
-		return -1;
+	ipoib_cm_dev_stop(dev);
+
+	/*
+	 * Move our QP to the error state and then reinitialize in
+	 * when all work requests have completed or have been flushed.
+	 */
+	qp_attr.qp_state = IB_QPS_ERR;
+	if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE))
+		ipoib_warn(priv, "Failed to modify QP to ERROR state\n");
+
+	/* Wait for all sends and receives to complete */
+	begin = jiffies;
+
+	while (priv->tx_head != priv->tx_tail || recvs_pending(dev)) {
+		if (time_after(jiffies, begin + 5 * HZ)) {
+			ipoib_warn(priv, "timing out; %d sends %d receives not completed\n",
+				   priv->tx_head - priv->tx_tail, recvs_pending(dev));
+
+			/*
+			 * assume the HW is wedged and just free up
+			 * all our pending work requests.
+			 */
+			while ((int) priv->tx_tail - (int) priv->tx_head < 0) {
+				tx_req = &priv->tx_ring[priv->tx_tail &
+							(ipoib_sendq_size - 1)];
+				ipoib_dma_unmap_tx(priv, tx_req);
+				dev_kfree_skb_any(tx_req->skb);
+				++priv->tx_tail;
+				--priv->tx_outstanding;
+			}
+
+			for (i = 0; i < ipoib_recvq_size; ++i) {
+				struct ipoib_rx_buf *rx_req;
+
+				rx_req = &priv->rx_ring[i];
+				if (!rx_req->skb)
+					continue;
+				ipoib_ud_dma_unmap_rx(priv,
+						      priv->rx_ring[i].mapping);
+				dev_kfree_skb_any(rx_req->skb);
+				rx_req->skb = NULL;
+			}
+
+			goto timeout;
+		}
+
+		ipoib_drain_cq(dev);
+
+		msleep(1);
 	}
 
+	ipoib_dbg(priv, "All sends and receives done.\n");
+
+timeout:
+	del_timer_sync(&priv->poll_timer);
+	qp_attr.qp_state = IB_QPS_RESET;
+	if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE))
+		ipoib_warn(priv, "Failed to modify QP to RESET state\n");
+
+	ib_req_notify_cq(priv->recv_cq, IB_CQ_NEXT_COMP);
+
+	return 0;
+}
+
+int ipoib_ib_dev_stop(struct net_device *dev)
+{
+	ipoib_ib_dev_stop_default(dev);
+
+	ipoib_flush_ah(dev);
+
+	return 0;
+}
+
+void ipoib_ib_tx_timer_func(unsigned long ctx)
+{
+	drain_tx_cq((struct net_device *)ctx);
+}
+
+int ipoib_ib_dev_open_default(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int ret;
+
 	ret = ipoib_init_qp(dev);
 	if (ret) {
 		ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret);
@@ -729,10 +818,6 @@ int ipoib_ib_dev_open(struct net_device *dev)
 		goto dev_stop;
 	}
 
-	clear_bit(IPOIB_STOP_REAPER, &priv->flags);
-	queue_delayed_work(priv->wq, &priv->ah_reap_task,
-			   round_jiffies_relative(HZ));
-
 	if (!test_and_set_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
 		napi_enable(&priv->napi);
 
@@ -744,6 +829,35 @@ int ipoib_ib_dev_open(struct net_device *dev)
 	return -1;
 }
 
+int ipoib_ib_dev_open(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+	ipoib_pkey_dev_check_presence(dev);
+
+	if (!test_bit(IPOIB_PKEY_ASSIGNED, &priv->flags)) {
+		ipoib_warn(priv, "P_Key 0x%04x is %s\n", priv->pkey,
+			   (!(priv->pkey & 0x7fff) ? "Invalid" : "not found"));
+		return -1;
+	}
+
+	clear_bit(IPOIB_STOP_REAPER, &priv->flags);
+	queue_delayed_work(priv->wq, &priv->ah_reap_task,
+			   round_jiffies_relative(HZ));
+
+	if (ipoib_ib_dev_open_default(dev)) {
+		pr_warn("%s: Failed to open dev\n", dev->name);
+		goto stop_ah_reap;
+	}
+
+	return 0;
+
+stop_ah_reap:
+	set_bit(IPOIB_STOP_REAPER, &priv->flags);
+	cancel_delayed_work(&priv->ah_reap_task);
+	return -1;
+}
+
 void ipoib_pkey_dev_check_presence(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -787,19 +901,6 @@ void ipoib_ib_dev_down(struct net_device *dev)
 	ipoib_flush_paths(dev);
 }
 
-static int recvs_pending(struct net_device *dev)
-{
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	int pending = 0;
-	int i;
-
-	for (i = 0; i < ipoib_recvq_size; ++i)
-		if (priv->rx_ring[i].skb)
-			++pending;
-
-	return pending;
-}
-
 void ipoib_drain_cq(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -839,81 +940,6 @@ void ipoib_drain_cq(struct net_device *dev)
 	local_bh_enable();
 }
 
-void ipoib_ib_dev_stop(struct net_device *dev)
-{
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	struct ib_qp_attr qp_attr;
-	unsigned long begin;
-	struct ipoib_tx_buf *tx_req;
-	int i;
-
-	if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
-		napi_disable(&priv->napi);
-
-	ipoib_cm_dev_stop(dev);
-
-	/*
-	 * Move our QP to the error state and then reinitialize in
-	 * when all work requests have completed or have been flushed.
-	 */
-	qp_attr.qp_state = IB_QPS_ERR;
-	if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE))
-		ipoib_warn(priv, "Failed to modify QP to ERROR state\n");
-
-	/* Wait for all sends and receives to complete */
-	begin = jiffies;
-
-	while (priv->tx_head != priv->tx_tail || recvs_pending(dev)) {
-		if (time_after(jiffies, begin + 5 * HZ)) {
-			ipoib_warn(priv, "timing out; %d sends %d receives not completed\n",
-				   priv->tx_head - priv->tx_tail, recvs_pending(dev));
-
-			/*
-			 * assume the HW is wedged and just free up
-			 * all our pending work requests.
-			 */
-			while ((int) priv->tx_tail - (int) priv->tx_head < 0) {
-				tx_req = &priv->tx_ring[priv->tx_tail &
-							(ipoib_sendq_size - 1)];
-				ipoib_dma_unmap_tx(priv, tx_req);
-				dev_kfree_skb_any(tx_req->skb);
-				++priv->tx_tail;
-				--priv->tx_outstanding;
-			}
-
-			for (i = 0; i < ipoib_recvq_size; ++i) {
-				struct ipoib_rx_buf *rx_req;
-
-				rx_req = &priv->rx_ring[i];
-				if (!rx_req->skb)
-					continue;
-				ipoib_ud_dma_unmap_rx(priv,
-						      priv->rx_ring[i].mapping);
-				dev_kfree_skb_any(rx_req->skb);
-				rx_req->skb = NULL;
-			}
-
-			goto timeout;
-		}
-
-		ipoib_drain_cq(dev);
-
-		msleep(1);
-	}
-
-	ipoib_dbg(priv, "All sends and receives done.\n");
-
-timeout:
-	del_timer_sync(&priv->poll_timer);
-	qp_attr.qp_state = IB_QPS_RESET;
-	if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE))
-		ipoib_warn(priv, "Failed to modify QP to RESET state\n");
-
-	ipoib_flush_ah(dev);
-
-	ib_req_notify_cq(priv->recv_cq, IB_CQ_NEXT_COMP);
-}
-
 /*
  * Takes whatever value which is in pkey index 0 and updates priv->pkey
  * returns 0 if the pkey value was changed.
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 8c644bbc2828..be8dcea252f9 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -166,7 +166,7 @@ static int ipoib_stop(struct net_device *dev)
 	netif_stop_queue(dev);
 
 	ipoib_ib_dev_down(dev);
-	ipoib_ib_dev_stop(dev);
+	ipoib_ib_dev_stop_default(dev);
 
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
 		struct ipoib_dev_priv *cpriv;
-- 
1.8.3.1


* [RFC for accelerated IPoIB 03/26] IB/ipoib: Rename qpn to dqpn in ipoib_send and post_send functions
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 01/26] IB/ipoib: Separate control and data related initializations Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 02/26] IB/ipoib: separate control from HW operation on ipoib_open/stop ndo Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 04/26] IB/verb: Add ipoib_options struct and API Erez Shitrit
                     ` (24 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib.h    | 2 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index b3900b253ad5..d94a7a953338 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -483,7 +483,7 @@ static inline void ipoib_put_ah(struct ipoib_ah *ah)
 int ipoib_add_umcast_attr(struct net_device *dev);
 
 void ipoib_send(struct net_device *dev, struct sk_buff *skb,
-		struct ipoib_ah *address, u32 qpn);
+		struct ipoib_ah *address, u32 dqpn);
 void ipoib_reap_ah(struct work_struct *work);
 
 struct ipoib_path *__path_find(struct net_device *dev, void *gid);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 5d732c5f01ee..dd5fb2964e63 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -514,7 +514,7 @@ void ipoib_send_comp_handler(struct ib_cq *cq, void *dev_ptr)
 
 static inline int post_send(struct ipoib_dev_priv *priv,
 			    unsigned int wr_id,
-			    struct ib_ah *address, u32 qpn,
+			    struct ib_ah *address, u32 dqpn,
 			    struct ipoib_tx_buf *tx_req,
 			    void *head, int hlen)
 {
@@ -524,7 +524,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
 	ipoib_build_sge(priv, tx_req);
 
 	priv->tx_wr.wr.wr_id	= wr_id;
-	priv->tx_wr.remote_qpn	= qpn;
+	priv->tx_wr.remote_qpn	= dqpn;
 	priv->tx_wr.ah		= address;
 
 	if (head) {
@@ -539,7 +539,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
 }
 
 void ipoib_send(struct net_device *dev, struct sk_buff *skb,
-		struct ipoib_ah *address, u32 qpn)
+		struct ipoib_ah *address, u32 dqpn)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_tx_buf *tx_req;
@@ -621,7 +621,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 	skb_dst_drop(skb);
 
 	rc = post_send(priv, priv->tx_head & (ipoib_sendq_size - 1),
-		       address->ah, qpn, tx_req, phead, hlen);
+		       address->ah, dqpn, tx_req, phead, hlen);
 	if (unlikely(rc)) {
 		ipoib_warn(priv, "post_send failed, error %d\n", rc);
 		++dev->stats.tx_errors;
-- 
1.8.3.1


* [RFC for accelerated IPoIB 04/26] IB/verb: Add ipoib_options struct and API
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 03/26] IB/ipoib: Rename qpn to dqpn in ipoib_send and post_send functions Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 05/26] IB/ipoib: Support ipoib acceleration options callbacks Erez Shitrit
                     ` (23 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


The idea is to allow vendors to optimize the IPoIB data path.
The new structure exposes a set of callback functions for handling
data-path flows in the IPoIB driver.
Each vendor can implement this set of functions in order to optimize its
specific data path and let IPoIB leverage it.
The IPoIB driver code was changed accordingly and works both with a
vendor-specific implementation and without one.
The assumption is that a vendor must provide the full set of functions,
not only part of it, in order to work properly.

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 include/rdma/ib_ipoib_accel_ops.h | 71 +++++++++++++++++++++++++++++++++++++++
 include/rdma/ib_verbs.h           |  9 +++++
 2 files changed, 80 insertions(+)
 create mode 100644 include/rdma/ib_ipoib_accel_ops.h

diff --git a/include/rdma/ib_ipoib_accel_ops.h b/include/rdma/ib_ipoib_accel_ops.h
new file mode 100644
index 000000000000..e2b177f01ab5
--- /dev/null
+++ b/include/rdma/ib_ipoib_accel_ops.h
@@ -0,0 +1,71 @@
+/*
+ * Copyright (c) 2017 Mellanox Technologies Ltd.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if !defined(IB_IPOIB_ACCEL_OPS_H)
+#define IB_IPOIB_ACCEL_OPS_H
+
+#include <rdma/ib_verbs.h>
+
+struct ipoib_ah;
+
+struct ib_ipoib_accel_ops {
+
+	/*
+	 * HW provider driver creates the net_device for IPoIB.
+	 * hca: The current ib device.
+	 * name: is the format of the new network device (probably ib%d)
+	 */
+	struct net_device * (*create_netdev)(struct ib_device *hca,
+					     const char *name,
+					     void (*setup)(struct net_device *));
+
+	/* IB resource allocation function, returns new UD QP */
+	int (*ib_dev_init)(struct net_device *dev, int *qp_num);
+	void (*ib_dev_cleanup)(struct net_device *dev);
+
+	/* handle network stack when open/stop device */
+	int (*ib_dev_open)(struct net_device *dev);
+	int (*ib_dev_stop)(struct net_device *dev);
+
+	/* send packet */
+	void (*send)(struct net_device *dev, struct sk_buff *skb,
+		     struct ipoib_ah *address, u32 dqpn, u32 dqkey);
+
+	/* multicast */
+	int (*attach_mcast)(struct net_device *dev, union ib_gid *gid,
+			    u16 lid, int set_qkey);
+	int (*detach_mcast)(struct net_device *dev, union ib_gid *gid,
+			     u16 lid);
+
+};
+
+#endif /* IB_IPOIB_ACCEL_OPS_H */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 89f5bd4e1d52..68e3bf25450d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2147,6 +2147,7 @@ struct ib_device {
 						struct ib_wq_attr *attr,
 						u32 wq_attr_mask,
 						struct ib_udata *udata);
+	struct ib_ipoib_accel_ops * (*get_ipoib_accel_ops)(struct ib_device *device);
 	struct ib_rwq_ind_table *  (*create_rwq_ind_table)(struct ib_device *device,
 							   struct ib_rwq_ind_table_init_attr *init_attr,
 							   struct ib_udata *udata);
@@ -2322,6 +2323,14 @@ static inline u8 rdma_start_port(const struct ib_device *device)
 	return rdma_cap_ib_switch(device) ? 0 : 1;
 }
 
+static inline struct ib_ipoib_accel_ops *ib_get_ipoib_accel_ops(struct ib_device *device)
+{
+	if (!device->get_ipoib_accel_ops)
+		return NULL;
+
+	return device->get_ipoib_accel_ops(device);
+}
+
 /**
  * rdma_end_port - Return the last valid port number for the device
  * specified
-- 
1.8.3.1


* [RFC for accelerated IPoIB 05/26] IB/ipoib: Support ipoib acceleration options callbacks
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 04/26] IB/verb: Add ipoib_options struct and API Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 06/26] IB/ipoib: Add context to ipoib to be used in acceleration layer Erez Shitrit
                     ` (22 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


The IPoIB driver now uses the new set of callback functions.
If the HW provider supports the new ipoib_options implementation, the
driver uses those callbacks in its data-path flows; otherwise it uses
its default implementation for all data flows.
The default implementation is exactly the driver implementation as it
was before, without HW vendor support.

TODO: We added the remote qkey to ipoib_send in order to match the send
op signature.
In accel mode this parameter is used, but in regular mode it is
redundant. We need to think about a better solution.

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib.h           | 18 ++++--
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        | 10 +--
 drivers/infiniband/ulp/ipoib/ipoib_main.c      | 89 ++++++++++++++++++++------
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 13 ++--
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     | 14 +++-
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c      |  5 +-
 6 files changed, 112 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index d94a7a953338..7b158090c1ac 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -51,8 +51,8 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_pack.h>
 #include <rdma/ib_sa.h>
+#include <rdma/ib_ipoib_accel_ops.h>
 #include <linux/sched.h>
-
 /* constants */
 
 enum ipoib_flush_level {
@@ -357,6 +357,7 @@ struct ipoib_dev_priv {
 	struct ib_cq	 *recv_cq;
 	struct ib_cq	 *send_cq;
 	struct ib_qp	 *qp;
+	u32		  qp_num;
 	u32		  qkey;
 
 	union ib_gid local_gid;
@@ -402,6 +403,7 @@ struct ipoib_dev_priv {
 	u64	hca_caps;
 	struct ipoib_ethtool_st ethtool;
 	struct timer_list poll_timer;
+	struct ib_ipoib_accel_ops *ipoib_accel_ops;
 	unsigned max_send_sge;
 	bool sm_fullmember_sendonly_support;
 };
@@ -483,14 +485,16 @@ static inline void ipoib_put_ah(struct ipoib_ah *ah)
 int ipoib_add_umcast_attr(struct net_device *dev);
 
 void ipoib_send(struct net_device *dev, struct sk_buff *skb,
-		struct ipoib_ah *address, u32 dqpn);
+		struct ipoib_ah *address, u32 dqpn, u32 dqkey);
 void ipoib_reap_ah(struct work_struct *work);
 
 struct ipoib_path *__path_find(struct net_device *dev, void *gid);
 void ipoib_mark_paths_invalid(struct net_device *dev);
 void ipoib_flush_paths(struct net_device *dev);
 int ipoib_check_sm_sendonly_fullmember_support(struct ipoib_dev_priv *priv);
-struct ipoib_dev_priv *ipoib_intf_alloc(const char *format);
+struct ipoib_dev_priv *ipoib_intf_alloc(struct ib_device *hca,
+					const char *format,
+					struct ib_ipoib_accel_ops *ipoib_accel_ops);
 void ipoib_ib_tx_timer_func(unsigned long ctx);
 void ipoib_ib_dev_flush_light(struct work_struct *work);
 void ipoib_ib_dev_flush_normal(struct work_struct *work);
@@ -498,7 +502,10 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 void ipoib_pkey_event(struct work_struct *work);
 void ipoib_ib_dev_cleanup(struct net_device *dev);
 void ipoib_dev_uninit_default(struct net_device *dev);
+int ipoib_ib_dev_open_default(struct net_device *dev);
+int ipoib_ib_dev_stop_default(struct net_device *dev);
 int ipoib_ib_dev_open(struct net_device *dev);
+int ipoib_ib_dev_stop(struct net_device *dev);
 void ipoib_ib_dev_up(struct net_device *dev);
 void ipoib_ib_dev_down(struct net_device *dev);
 int ipoib_ib_dev_stop_default(struct net_device *dev);
@@ -561,8 +568,9 @@ void ipoib_path_iter_read(struct ipoib_path_iter *iter,
 			  struct ipoib_path *path);
 #endif
 
-int ipoib_mcast_attach(struct net_device *dev, u16 mlid,
-		       union ib_gid *mgid, int set_qkey);
+int ipoib_mcast_attach(struct net_device *dev, union ib_gid *mgid, u16 mlid,
+		       int set_qkey);
+int ipoib_mcast_detach(struct net_device *dev, union ib_gid *mgid, u16 mlid);
 void ipoib_mcast_remove_list(struct list_head *remove_list);
 void ipoib_check_and_add_mcast_sendonly(struct ipoib_dev_priv *priv, u8 *mgid,
 				struct list_head *remove_list);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index dd5fb2964e63..44f6a3d5b672 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -539,7 +539,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
 }
 
 void ipoib_send(struct net_device *dev, struct sk_buff *skb,
-		struct ipoib_ah *address, u32 dqpn)
+		struct ipoib_ah *address, u32 dqpn, u32 dqkey)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_tx_buf *tx_req;
@@ -783,7 +783,9 @@ int ipoib_ib_dev_stop_default(struct net_device *dev)
 
 int ipoib_ib_dev_stop(struct net_device *dev)
 {
-	ipoib_ib_dev_stop_default(dev);
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+	priv->ipoib_accel_ops->ib_dev_stop(dev);
 
 	ipoib_flush_ah(dev);
 
@@ -845,7 +847,7 @@ int ipoib_ib_dev_open(struct net_device *dev)
 	queue_delayed_work(priv->wq, &priv->ah_reap_task,
 			   round_jiffies_relative(HZ));
 
-	if (ipoib_ib_dev_open_default(dev)) {
+	if (priv->ipoib_accel_ops->ib_dev_open(dev)) {
 		pr_warn("%s: Failed to open dev\n", dev->name);
 		goto stop_ah_reap;
 	}
@@ -1239,7 +1241,7 @@ void ipoib_ib_dev_cleanup(struct net_device *dev)
 
 	clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
 
-	ipoib_dev_uninit_default(dev);
+	priv->ipoib_accel_ops->ib_dev_cleanup(dev);
 
 	if (priv->pd) {
 		ib_dealloc_pd(priv->pd);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index be8dcea252f9..9946693270a3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -166,7 +166,7 @@ static int ipoib_stop(struct net_device *dev)
 	netif_stop_queue(dev);
 
 	ipoib_ib_dev_down(dev);
-	ipoib_ib_dev_stop_default(dev);
+	ipoib_ib_dev_stop(dev);
 
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
 		struct ipoib_dev_priv *cpriv;
@@ -964,7 +964,9 @@ static void neigh_add_path(struct sk_buff *skb, u8 *daddr,
 			}
 		} else {
 			spin_unlock_irqrestore(&priv->lock, flags);
-			ipoib_send(dev, skb, path->ah, IPOIB_QPN(daddr));
+			priv->ipoib_accel_ops->send(dev, skb, path->ah,
+						    IPOIB_QPN(daddr),
+						    priv->tx_wr.remote_qkey);
 			ipoib_neigh_put(neigh);
 			return;
 		}
@@ -1042,7 +1044,9 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
 			  be16_to_cpu(path->pathrec.dlid));
 
 		spin_unlock_irqrestore(&priv->lock, flags);
-		ipoib_send(dev, skb, path->ah, IPOIB_QPN(phdr->hwaddr));
+		priv->ipoib_accel_ops->send(dev, skb, path->ah,
+					    IPOIB_QPN(phdr->hwaddr),
+					    priv->tx_wr.remote_qkey);
 		return;
 	} else if ((path->query || !path_rec_start(dev, path)) &&
 		   skb_queue_len(&path->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
@@ -1122,7 +1126,9 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			goto unref;
 		}
 	} else if (neigh->ah) {
-		ipoib_send(dev, skb, neigh->ah, IPOIB_QPN(phdr->hwaddr));
+		priv->ipoib_accel_ops->send(dev, skb, neigh->ah,
+					    IPOIB_QPN(phdr->hwaddr),
+					    priv->tx_wr.remote_qkey);
 		goto unref;
 	}
 
@@ -1637,8 +1643,7 @@ void ipoib_dev_uninit_default(struct net_device *dev)
 	priv->tx_ring = NULL;
 }
 
-static int ipoib_dev_init_default(struct net_device *dev, struct ib_device *ca,
-				  int port)
+static int ipoib_dev_init_default(struct net_device *dev, int *qp_num)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 
@@ -1651,14 +1656,14 @@ static int ipoib_dev_init_default(struct net_device *dev, struct ib_device *ca,
 	priv->tx_ring = vzalloc(ipoib_sendq_size * sizeof *priv->tx_ring);
 	if (!priv->tx_ring) {
 		printk(KERN_WARNING "%s: failed to allocate TX ring (%d entries)\n",
-		       ca->name, ipoib_sendq_size);
+		       priv->ca->name, ipoib_sendq_size);
 		goto out_rx_ring_cleanup;
 	}
 
 	/* priv->tx_head, tx_tail & tx_outstanding are already 0 */
 
-	if (ipoib_transport_dev_init(dev, ca)) {
-		pr_warn("%s: ipoib_transport_dev_init failed\n", ca->name);
+	if (ipoib_transport_dev_init(dev, priv->ca)) {
+		pr_warn("%s: ipoib_transport_dev_init failed\n", priv->ca->name);
 		goto out_tx_ring_cleanup;
 	}
 
@@ -1703,16 +1708,16 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 		goto clean_wq;
 	}
 
-	ret = ipoib_dev_init_default(dev, ca, port);
+	ret = priv->ipoib_accel_ops->ib_dev_init(dev, &priv->qp_num);
 	if (ret) {
 		pr_warn("%s failed to init HW resource\n", dev->name);
 		goto out_free_pd;
 	}
 
 	/* after qp created set dev address */
-	priv->dev->dev_addr[1] = (priv->qp->qp_num >> 16) & 0xff;
-	priv->dev->dev_addr[2] = (priv->qp->qp_num >>  8) & 0xff;
-	priv->dev->dev_addr[3] = (priv->qp->qp_num) & 0xff;
+	priv->dev->dev_addr[1] = (priv->qp_num >> 16) & 0xff;
+	priv->dev->dev_addr[2] = (priv->qp_num >>  8) & 0xff;
+	priv->dev->dev_addr[3] = (priv->qp_num) & 0xff;
 
 	if (ipoib_neigh_hash_init(priv) < 0) {
 		pr_warn("%s failed to init neigh hash\n", dev->name);
@@ -1851,21 +1856,19 @@ static int ipoib_get_vf_stats(struct net_device *dev, int vf,
 	.ndo_get_iflink		 = ipoib_get_iflink,
 };
 
-void ipoib_setup(struct net_device *dev)
+void ipoib_setup_common(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 
+	dev->header_ops		 = &ipoib_header_ops;
+
 	if (priv->hca_caps & IB_DEVICE_VIRTUAL_FUNCTION)
 		dev->netdev_ops	= &ipoib_netdev_ops_vf;
 	else
 		dev->netdev_ops	= &ipoib_netdev_ops_pf;
 
-	dev->header_ops		 = &ipoib_header_ops;
-
 	ipoib_set_ethtool_ops(dev);
 
-	netif_napi_add(dev, &priv->napi, ipoib_poll, NAPI_POLL_WEIGHT);
-
 	dev->watchdog_timeo	 = HZ;
 
 	dev->flags		|= IFF_BROADCAST | IFF_MULTICAST;
@@ -1901,12 +1904,47 @@ void ipoib_setup(struct net_device *dev)
 	INIT_DELAYED_WORK(&priv->neigh_reap_task, ipoib_reap_neigh);
 }
 
-struct ipoib_dev_priv *ipoib_intf_alloc(const char *name)
+void ipoib_setup(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+	netif_napi_add(dev, &priv->napi, ipoib_poll, NAPI_POLL_WEIGHT);
+}
+
+static struct net_device *ipoib_create_netdev_default(struct ib_device *hca,
+						      const char *name,
+						      void (*setup)(struct net_device *))
 {
 	struct net_device *dev;
 
 	dev = alloc_netdev((int)sizeof(struct ipoib_dev_priv), name,
-			   NET_NAME_UNKNOWN, ipoib_setup);
+			   NET_NAME_UNKNOWN, setup);
+	if (!dev)
+		return NULL;
+
+	ipoib_setup(dev);
+
+	return dev;
+}
+
+struct ib_ipoib_accel_ops default_ipoib_accel_ops = {
+	.ib_dev_init = ipoib_dev_init_default,
+	.ib_dev_cleanup = ipoib_dev_uninit_default,
+	.ib_dev_open = ipoib_ib_dev_open_default,
+	.ib_dev_stop = ipoib_ib_dev_stop_default,
+	.send = ipoib_send,
+	.create_netdev = ipoib_create_netdev_default,
+	.attach_mcast = ipoib_mcast_attach,
+	.detach_mcast = ipoib_mcast_detach,
+};
+
+struct ipoib_dev_priv *ipoib_intf_alloc(struct ib_device *hca,
+					const char *name,
+					struct ib_ipoib_accel_ops *ipoib_accel_ops)
+{
+	struct net_device *dev;
+
+	dev = ipoib_accel_ops->create_netdev(hca, name, ipoib_setup_common);
 	if (!dev)
 		return NULL;
 
@@ -2093,11 +2131,20 @@ static struct net_device *ipoib_add_port(const char *format,
 	struct ipoib_dev_priv *priv;
 	struct ib_port_attr attr;
 	int result = -ENOMEM;
+	struct ib_ipoib_accel_ops *cur_ipoib_accel_ops;
 
-	priv = ipoib_intf_alloc(format);
+	cur_ipoib_accel_ops = ib_get_ipoib_accel_ops(hca);
+	if (!cur_ipoib_accel_ops) {
+		pr_debug("default ipoib_ops for %s\n", hca->name);
+		cur_ipoib_accel_ops = &default_ipoib_accel_ops;
+	}
+
+	priv = ipoib_intf_alloc(hca, format, cur_ipoib_accel_ops);
 	if (!priv)
 		goto alloc_mem_failed;
 
+	priv->ipoib_accel_ops = cur_ipoib_accel_ops;
+
 	SET_NETDEV_DEV(priv->dev, hca->dma_device);
 	priv->dev->dev_id = port - 1;
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 69e146cdc306..cf38e7a256de 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -260,8 +260,9 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,
 			return 0;
 		}
 
-		ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid),
-					 &mcast->mcmember.mgid, set_qkey);
+		ret = priv->ipoib_accel_ops->attach_mcast(dev, &mcast->mcmember.mgid,
+						    be16_to_cpu(mcast->mcmember.mlid),
+						    set_qkey);
 		if (ret < 0) {
 			ipoib_warn(priv, "couldn't attach QP to multicast group %pI6\n",
 				   mcast->mcmember.mgid.raw);
@@ -720,8 +721,9 @@ static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast)
 				mcast->mcmember.mgid.raw);
 
 		/* Remove ourselves from the multicast group */
-		ret = ib_detach_mcast(priv->qp, &mcast->mcmember.mgid,
-				      be16_to_cpu(mcast->mcmember.mlid));
+		ret = priv->ipoib_accel_ops->detach_mcast(dev,
+						     &mcast->mcmember.mgid,
+						     be16_to_cpu(mcast->mcmember.mlid));
 		if (ret)
 			ipoib_warn(priv, "ib_detach_mcast failed (result = %d)\n", ret);
 	} else if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
@@ -825,7 +827,8 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb)
 			}
 		}
 		spin_unlock_irqrestore(&priv->lock, flags);
-		ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN);
+		priv->ipoib_accel_ops->send(dev, skb, mcast->ah,
+				IB_MULTICAST_QPN, priv->tx_wr.remote_qkey);
 		if (neigh)
 			ipoib_neigh_put(neigh);
 		return;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 529b9509a2c4..476d4cf917b5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -35,7 +35,7 @@
 
 #include "ipoib.h"
 
-int ipoib_mcast_attach(struct net_device *dev, u16 mlid, union ib_gid *mgid, int set_qkey)
+int ipoib_mcast_attach(struct net_device *dev, union ib_gid *mgid, u16 mlid, int set_qkey)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ib_qp_attr *qp_attr = NULL;
@@ -74,6 +74,16 @@ int ipoib_mcast_attach(struct net_device *dev, u16 mlid, union ib_gid *mgid, int
 	return ret;
 }
 
+int ipoib_mcast_detach(struct net_device *dev, union ib_gid *mgid, u16 mlid)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int ret;
+
+	ret = ib_detach_mcast(priv->qp, mgid, mlid);
+
+	return ret;
+}
+
 int ipoib_init_qp(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -196,6 +206,8 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 		goto out_free_send_cq;
 	}
 
+	priv->qp_num = priv->qp->qp_num;
+
 	for (i = 0; i < MAX_SKB_FRAGS + 1; ++i)
 		priv->tx_sge[i].lkey = priv->pd->local_dma_lkey;
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index deedb6fc1b05..e303bad8c067 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -134,10 +134,13 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 
 	snprintf(intf_name, sizeof intf_name, "%s.%04x",
 		 ppriv->dev->name, pkey);
-	priv = ipoib_intf_alloc(intf_name);
+
+	priv = ipoib_intf_alloc(ppriv->ca, intf_name, ppriv->ipoib_accel_ops);
 	if (!priv)
 		return -ENOMEM;
 
+	priv->ipoib_accel_ops = ppriv->ipoib_accel_ops;
+
 	if (!rtnl_trylock())
 		return restart_syscall();
 
-- 
1.8.3.1


* [RFC for accelerated IPoIB 06/26] IB/ipoib: Add context to ipoib to be used in acceleration layer
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 05/26] IB/ipoib: Support ipoib acceleration options callbacks Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 07/26] hw/mlx5: Add New bit to check over QP creation Erez Shitrit
                     ` (21 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Each HW-specific acceleration layer can keep a handle to its own private data here.
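
For illustration only (not part of this patch), a rough sketch of how a
vendor acceleration layer might use the new field; struct my_accel_ctx and
the two helpers below are hypothetical names:

    struct my_accel_ctx {                   /* hypothetical vendor state */
            void *hw_handle;
    };

    static int my_accel_init(struct ipoib_dev_priv *priv)
    {
            struct my_accel_ctx *ctx;

            ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
            if (!ctx)
                    return -ENOMEM;
            priv->context = ctx;            /* keep the vendor handle */
            return 0;
    }

    static void my_accel_cleanup(struct ipoib_dev_priv *priv)
    {
            kfree(priv->context);
            priv->context = NULL;
    }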

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 7b158090c1ac..48b76bd872cb 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -406,6 +406,7 @@ struct ipoib_dev_priv {
 	struct ib_ipoib_accel_ops *ipoib_accel_ops;
 	unsigned max_send_sge;
 	bool sm_fullmember_sendonly_support;
+	void *context;
 };
 
 struct ipoib_ah {
-- 
1.8.3.1

* [RFC for accelerated IPoIB 07/26] hw/mlx5: Add New bit to check over QP creation
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 06/26] IB/ipoib: Add context to ipoib to be used in acceleration layer Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 08/26] linux/mlx5/mlx5_ifc.h: Add underlay_qpn field to PRM objects Erez Shitrit
                     ` (20 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Accept the IB_QP_CREATE_NETIF_QP creation flag in the create-flags check of create_kernel_qp().
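
For reference, a hedged sketch of the kind of kernel QP creation this flag is
meant to allow; pd, the CQs and the queue sizes below are placeholders only:

    struct ib_qp_init_attr init_attr = {
            .qp_type        = IB_QPT_UD,
            .send_cq        = send_cq,
            .recv_cq        = recv_cq,
            .create_flags   = IB_QP_CREATE_NETIF_QP,
            .cap = {
                    .max_send_wr    = 64,
                    .max_recv_wr    = 64,
                    .max_send_sge   = 1,
                    .max_recv_sge   = 1,
            },
    };
    struct ib_qp *qp;

    qp = ib_create_qp(pd, &init_attr);
    if (IS_ERR(qp))
            return PTR_ERR(qp);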

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/qp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index dc0ea63900c1..41e14d57fec9 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -897,6 +897,7 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
 	if (init_attr->create_flags & ~(IB_QP_CREATE_SIGNATURE_EN |
 					IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK |
 					IB_QP_CREATE_IPOIB_UD_LSO |
+					IB_QP_CREATE_NETIF_QP |
 					mlx5_ib_create_qp_sqpn_qp1()))
 		return -EINVAL;
 
-- 
1.8.3.1

* [RFC for accelerated IPoIB 08/26] linux/mlx5/mlx5_ifc.h: Add underlay_qpn field to PRM objects
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 07/26] hw/mlx5: Add New bit to check over QP creation Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 09/26] net/mlx5e: Refactor EN code to support IB link Erez Shitrit
                     ` (19 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Add an underlay_qpn field to the TIS and flow_table PRM objects, and a new
bit to the HCA capability table:

1. New capability bit ipoib_enhanced_offloads: indicates the new ability of a
UD QP to do RSS.
2. New underlay_qpn field in the SET_FLOW_TABLE_ROOT command, to connect the
QP to the FS table (a short usage sketch follows).
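
A minimal illustration of how the new fields and capability bit are consumed;
mdev, the tisc/in mailbox pointers and qpn are assumed to be set up as done
in later patches of this series:

    if (!MLX5_CAP_GEN(mdev, ipoib_enhanced_offloads))
            return -EOPNOTSUPP;             /* no enhanced IPoIB offloads */

    /* TIS context: attach the underlay UD QP */
    MLX5_SET(tisc, tisc, underlay_qpn, qpn);

    /* SET_FLOW_TABLE_ROOT command: connect the QP to the FS root table */
    MLX5_SET(set_flow_table_root_in, in, underlay_qpn, qpn);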

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 include/linux/mlx5/mlx5_ifc.h | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 2d197d8a7025..afb6c8ab156a 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -859,7 +859,8 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 
 	u8         compact_address_vector[0x1];
 	u8         striding_rq[0x1];
-	u8         reserved_at_202[0x2];
+	u8         reserved_at_202[0x1];
+	u8         ipoib_enhanced_offloads[0x1];
 	u8         ipoib_basic_offloads[0x1];
 	u8         reserved_at_205[0xa];
 	u8         drain_sigerr[0x1];
@@ -2217,7 +2218,9 @@ struct mlx5_ifc_tisc_bits {
 	u8         reserved_at_120[0x8];
 	u8         transport_domain[0x18];
 
-	u8         reserved_at_140[0x3c0];
+	u8         reserved_at_140[0x8];
+	u8         underlay_qpn[0x18];
+	u8         reserved_at_160[0x3a0];
 };
 
 enum {
@@ -7906,7 +7909,9 @@ struct mlx5_ifc_set_flow_table_root_in_bits {
 	u8         reserved_at_a0[0x8];
 	u8         table_id[0x18];
 
-	u8         reserved_at_c0[0x140];
+	u8         reserved_at_c0[0x8];
+	u8         underlay_qpn[0x18];
+	u8         reserved_at_e0[0x120];
 };
 
 enum {
-- 
1.8.3.1

* [RFC for accelerated IPoIB 09/26] net/mlx5e: Refactor EN code to support IB link
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (7 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 08/26] linux/mlx5/mlx5_ifc.h: Add underlay_qpn field to PRM objects Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 10/26] net/mlx5e: Creating and Destroying flow-steering tables for " Erez Shitrit
                     ` (18 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


The idea is to use the same infrastructure for both ETH and IB link types,
so the first step is to refactor the ETH handling so that it can serve the
IB link as well.

1. Check the HW requirements separately for ETH and for IB (the resulting
call sites are sketched below).
2. Move code into common functions that will be used for both link types.
3. Change the init and cleanup flows so they are not specific to the ETH
link.
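
After the refactoring the two probe paths call the same helper; the ETH call
site is in the diff below, the IB one is added later in the series:

    /* ETH probe path (mlx5e_add) */
    err = mlx5e_check_required_hca_cap(mdev, MLX5_INTERFACE_PROTOCOL_ETH);
    if (err)
            return NULL;

    /* IB probe path (mlx5i_create_netdev, patch 11) */
    if (mlx5e_check_required_hca_cap(mdev, MLX5_INTERFACE_PROTOCOL_IB))
            return NULL;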

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 120 ++++++++++++++--------
 1 file changed, 80 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 041e0ac16096..88541f99d37b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3323,29 +3323,39 @@ static void mlx5e_netpoll(struct net_device *dev)
 	.ndo_get_offload_stats	 = mlx5e_get_offload_stats,
 };
 
-static int mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev)
-{
-	if (MLX5_CAP_GEN(mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
-		return -EOPNOTSUPP;
-	if (!MLX5_CAP_GEN(mdev, eth_net_offloads) ||
-	    !MLX5_CAP_GEN(mdev, nic_flow_table) ||
-	    !MLX5_CAP_ETH(mdev, csum_cap) ||
-	    !MLX5_CAP_ETH(mdev, max_lso_cap) ||
-	    !MLX5_CAP_ETH(mdev, vlan_cap) ||
-	    !MLX5_CAP_ETH(mdev, rss_ind_tbl_cap) ||
-	    MLX5_CAP_FLOWTABLE(mdev,
-			       flow_table_properties_nic_receive.max_ft_level)
-			       < 3) {
-		mlx5_core_warn(mdev,
-			       "Not creating net device, some required device capabilities are missing\n");
-		return -EOPNOTSUPP;
+static int mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev,
+					int link_type)
+{
+	if (link_type == MLX5_INTERFACE_PROTOCOL_ETH) {
+		if (!MLX5_CAP_GEN(mdev, eth_net_offloads) ||
+		    !MLX5_CAP_GEN(mdev, nic_flow_table) ||
+		    !MLX5_CAP_ETH(mdev, csum_cap) ||
+		    !MLX5_CAP_ETH(mdev, max_lso_cap) ||
+		    !MLX5_CAP_ETH(mdev, vlan_cap) ||
+		    !MLX5_CAP_ETH(mdev, rss_ind_tbl_cap) ||
+		    MLX5_CAP_FLOWTABLE(mdev,
+				       flow_table_properties_nic_receive.max_ft_level)
+				       < 3) {
+			mlx5_core_warn(mdev,
+				       "Not creating net device, some required device capabilities are missing\n");
+			return -ENOTSUPP;
+		}
+		if (!MLX5_CAP_ETH(mdev, self_lb_en_modifiable))
+			mlx5_core_warn(mdev, "Self loop back prevention is not supported\n");
+		if (!MLX5_CAP_GEN(mdev, cq_moderation))
+			mlx5_core_warn(mdev, "CQ modiration is not supported\n");
+
+		return 0;
+	} else if (link_type == MLX5_INTERFACE_PROTOCOL_IB) {
+		if (!MLX5_CAP_GEN(mdev, ipoib_enhanced_offloads)) {
+			pr_warn("Not creating net device (IB), some required device capabilities are missing\n");
+			return -ENOTSUPP;
+		}
+		return 0;
 	}
-	if (!MLX5_CAP_ETH(mdev, self_lb_en_modifiable))
-		mlx5_core_warn(mdev, "Self loop back prevention is not supported\n");
-	if (!MLX5_CAP_GEN(mdev, cq_moderation))
-		mlx5_core_warn(mdev, "CQ modiration is not supported\n");
 
-	return 0;
+	return -ENOTSUPP;
+
 }
 
 u16 mlx5e_get_max_inline_cap(struct mlx5_core_dev *mdev)
@@ -3455,12 +3465,12 @@ u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout)
 	return MLX5_CAP_ETH(mdev, lro_timer_supported_periods[i]);
 }
 
-static void mlx5e_build_nic_netdev_priv(struct mlx5_core_dev *mdev,
+void mlx5n_build_nic_netdev_priv_common(struct mlx5_core_dev *mdev,
 					struct net_device *netdev,
+					struct mlx5e_priv *priv,
 					const struct mlx5e_profile *profile,
 					void *ppriv)
 {
-	struct mlx5e_priv *priv = netdev_priv(netdev);
 	u32 link_speed = 0;
 	u32 pci_bw = 0;
 	u8 cq_period_mode = MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ?
@@ -3524,6 +3534,15 @@ static void mlx5e_build_nic_netdev_priv(struct mlx5_core_dev *mdev,
 	MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_COMPRESS, priv->params.rx_cqe_compress_def);
 
 	mutex_init(&priv->state_lock);
+}
+
+static void mlx5e_build_nic_netdev_priv(struct mlx5_core_dev *mdev,
+					struct net_device *netdev,
+					struct mlx5e_priv *priv,
+					const struct mlx5e_profile *profile,
+					void *ppriv)
+{
+	mlx5n_build_nic_netdev_priv_common(mdev, netdev, priv, profile, ppriv);
 
 	INIT_WORK(&priv->update_carrier_work, mlx5e_update_carrier_work);
 	INIT_WORK(&priv->set_rx_mode_work, mlx5e_set_rx_mode_work);
@@ -3663,7 +3682,7 @@ static void mlx5e_nic_init(struct mlx5_core_dev *mdev,
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 
-	mlx5e_build_nic_netdev_priv(mdev, netdev, profile, ppriv);
+	mlx5e_build_nic_netdev_priv(mdev, netdev, priv, profile, ppriv);
 	mlx5e_build_nic_netdev(netdev);
 	mlx5e_vxlan_init(priv);
 }
@@ -3676,7 +3695,8 @@ static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
 		bpf_prog_put(priv->xdp_prog);
 }
 
-static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
+/* used for ETH && IB link layer */
+static int mlx5n_init_nic_rx_common(struct mlx5e_priv *priv)
 {
 	struct mlx5_core_dev *mdev = priv->mdev;
 	int err;
@@ -3706,20 +3726,12 @@ static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
 		goto err_destroy_indirect_tirs;
 	}
 
-	err = mlx5e_create_flow_steering(priv);
-	if (err) {
-		mlx5_core_warn(mdev, "create flow steering failed, %d\n", err);
-		goto err_destroy_direct_tirs;
-	}
-
 	err = mlx5e_tc_init(priv);
 	if (err)
-		goto err_destroy_flow_steering;
+		goto err_destroy_direct_tirs;
 
 	return 0;
 
-err_destroy_flow_steering:
-	mlx5e_destroy_flow_steering(priv);
 err_destroy_direct_tirs:
 	mlx5e_destroy_direct_tirs(priv);
 err_destroy_indirect_tirs:
@@ -3732,12 +3744,31 @@ static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
 	return err;
 }
 
-static void mlx5e_cleanup_nic_rx(struct mlx5e_priv *priv)
+static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	int err;
+
+	err = mlx5n_init_nic_rx_common(priv);
+	if (err) {
+		mlx5_core_warn(mdev, "failed create nic rx res, %d\n", err);
+		return err;
+	}
+
+	err = mlx5e_create_flow_steering(priv);
+	if (err) {
+		mlx5_core_warn(mdev, "create flow steering failed, %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+static void mlx5n_cleanup_nic_rx_common(struct mlx5e_priv *priv)
 {
 	int i;
 
 	mlx5e_tc_cleanup(priv);
-	mlx5e_destroy_flow_steering(priv);
 	mlx5e_destroy_direct_tirs(priv);
 	mlx5e_destroy_indirect_tirs(priv);
 	for (i = 0; i < priv->profile->max_nch(priv->mdev); i++)
@@ -3745,6 +3776,12 @@ static void mlx5e_cleanup_nic_rx(struct mlx5e_priv *priv)
 	mlx5e_destroy_rqt(priv, &priv->indir_rqt);
 }
 
+static void mlx5e_cleanup_nic_rx(struct mlx5e_priv *priv)
+{
+	mlx5e_destroy_flow_steering(priv);
+	mlx5n_cleanup_nic_rx_common(priv);
+}
+
 static int mlx5e_init_nic_tx(struct mlx5e_priv *priv)
 {
 	int err;
@@ -3836,7 +3873,8 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 		return NULL;
 	}
 
-	profile->init(mdev, netdev, profile, ppriv);
+	if (profile->init)
+		profile->init(mdev, netdev, profile, ppriv);
 
 	netif_carrier_off(netdev);
 
@@ -3849,7 +3887,9 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 	return netdev;
 
 err_cleanup_nic:
-	profile->cleanup(priv);
+	if (profile->cleanup)
+		profile->cleanup(priv);
+
 	free_netdev(netdev);
 
 	return NULL;
@@ -3881,7 +3921,7 @@ int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
 		goto err_close_drop_rq;
 
 	mlx5e_create_q_counter(priv);
-
+	//TBD do i need to change that?
 	mlx5e_init_l2_addr(priv);
 
 	/* MTU range: 68 - hw-specific max */
@@ -4006,7 +4046,7 @@ static void *mlx5e_add(struct mlx5_core_dev *mdev)
 	int err;
 	struct net_device *netdev;
 
-	err = mlx5e_check_required_hca_cap(mdev);
+	err = mlx5e_check_required_hca_cap(mdev, MLX5_INTERFACE_PROTOCOL_ETH);
 	if (err)
 		return NULL;
 
-- 
1.8.3.1

* [RFC for accelerated IPoIB 10/26] net/mlx5e: Creating and Destroying flow-steering tables for IB link
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (8 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 09/26] net/mlx5e: Refactor EN code to support IB link Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 11/26] net/mlx5e: Support netdevice creation for IB link type Erez Shitrit
                     ` (17 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Add new functions that create and destroy the RSS/flow-steering tables for the IB link type.

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h    |  2 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 41 +++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 39f8ac849af7..f3337ec4457f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -755,6 +755,8 @@ void mlx5e_page_release(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info,
 
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
 void mlx5e_destroy_flow_steering(struct mlx5e_priv *priv);
+int mlx5i_create_flow_steering(struct mlx5e_priv *priv);
+void mlx5i_destroy_flow_steering(struct mlx5e_priv *priv);
 void mlx5e_init_l2_addr(struct mlx5e_priv *priv);
 void mlx5e_destroy_flow_table(struct mlx5e_flow_table *ft);
 int mlx5e_self_test_num(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index a0e5a69402b3..c6b40003007c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -1081,6 +1081,40 @@ static void mlx5e_destroy_vlan_table(struct mlx5e_priv *priv)
 	mlx5e_destroy_flow_table(&priv->fs.vlan.ft);
 }
 
+int mlx5i_create_flow_steering(struct mlx5e_priv *priv)
+{
+	int err;
+
+	priv->fs.ns = mlx5_get_flow_namespace(priv->mdev,
+					       MLX5_FLOW_NAMESPACE_KERNEL);
+
+	if (!priv->fs.ns)
+		return -EINVAL;
+
+	err = mlx5e_arfs_create_tables(priv);
+	if (err) {
+		netdev_err(priv->netdev, "Failed to create arfs tables, err=%d\n",
+			   err);
+		priv->netdev->hw_features &= ~NETIF_F_NTUPLE;
+	}
+
+	err = mlx5e_create_ttc_table(priv);
+	if (err) {
+		netdev_err(priv->netdev, "Failed to create ttc table, err=%d\n",
+			   err);
+		goto err_destroy_arfs_tables;
+	}
+
+	mlx5e_ethtool_init_steering(priv);
+
+	return 0;
+
+err_destroy_arfs_tables:
+	mlx5e_arfs_destroy_tables(priv);
+
+	return err;
+}
+
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv)
 {
 	int err;
@@ -1141,3 +1175,10 @@ void mlx5e_destroy_flow_steering(struct mlx5e_priv *priv)
 	mlx5e_arfs_destroy_tables(priv);
 	mlx5e_ethtool_cleanup_steering(priv);
 }
+
+void mlx5i_destroy_flow_steering(struct mlx5e_priv *priv)
+{
+	mlx5e_destroy_ttc_table(priv);
+	mlx5e_arfs_destroy_tables(priv);
+	mlx5e_ethtool_cleanup_steering(priv);
+}
-- 
1.8.3.1

* [RFC for accelerated IPoIB 11/26] net/mlx5e: Support netdevice creation for IB link type
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (9 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 10/26] net/mlx5e: Creating and Destroying flow-steering tables for " Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 12/26] net/mlx5e: Refactor attach_netdev API Erez Shitrit
                     ` (16 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Implement the interface required to let the IB link run on top of the ETH
data structures.
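
For illustration only, a rough sketch of how the IPoIB side is expected to
call the new API; the name, setup callback and qpn below are placeholders and
are not added by this patch:

    struct mlx5i_create_ext_param param = {
            .size_base_priv = sizeof(struct ipoib_dev_priv), /* ULP priv in front */
            .qpn            = underlay_qpn,                  /* IPoIB UD QP number */
    };
    struct net_device *netdev;

    netdev = mlx5i_create_netdev(mdev, name, ipoib_setup_fn, &param);
    if (!netdev)
            return -ENODEV;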

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 99 +++++++++++++++++++++++
 include/linux/mlx5/driver.h                       | 11 +++
 2 files changed, 110 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 88541f99d37b..dd7d1d123d38 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3843,6 +3843,105 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
 	mlx5_lag_remove(mdev);
 }
 
+static void mlx5i_nic_init(struct mlx5_core_dev *mdev,
+			   struct net_device *netdev,
+			   const struct mlx5e_profile *profile,
+			   void *ppriv)
+{
+	int base_size = *(int *)ppriv;
+	struct mlx5e_priv *priv = netdev_priv(netdev) + base_size;
+
+	mlx5n_build_nic_netdev_priv_common(mdev, netdev, priv, profile, ppriv);
+}
+
+static int mlx5i_init_nic_rx(struct mlx5e_priv *priv)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	int err;
+
+	err = mlx5n_init_nic_rx_common(priv);
+	if (err) {
+		mlx5_core_warn(mdev, "failed create nic rx res, %d\n", err);
+		return err;
+	}
+
+	err = mlx5i_create_flow_steering(priv);
+	if (err) {
+		mlx5_core_warn(mdev, "create flow steering failed, %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+static void mlx5i_cleanup_nic_rx(struct mlx5e_priv *priv)
+{
+	mlx5i_destroy_flow_steering(priv);
+	mlx5n_cleanup_nic_rx_common(priv);
+}
+
+static const struct mlx5e_profile mlx5i_nic_profile = {
+	.init		   = mlx5i_nic_init,
+	.cleanup	   = NULL,
+	.init_rx	   = mlx5i_init_nic_rx,
+	.cleanup_rx	   = mlx5i_cleanup_nic_rx,
+	.init_tx	   = mlx5e_init_nic_tx,
+	.cleanup_tx	   = mlx5e_cleanup_nic_tx,
+	.enable		   = NULL,/*mlx5e_nic_enable,*/
+	.disable	   = NULL,
+	.update_stats	   = NULL,/*mlx5e_update_stats,*/
+	.max_nch	   = mlx5e_get_max_num_channels,
+	.max_tc		   = MLX5E_MAX_NUM_TC,
+};
+
+struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
+				       const char *name,
+				       void (*setup)(struct net_device *dev),
+				       struct mlx5i_create_ext_param *param)
+{
+	const struct mlx5e_profile *profile = &mlx5i_nic_profile;
+	int nch = profile->max_nch(mdev);
+	struct net_device *netdev;
+	struct mlx5e_priv *priv;
+
+	if (mlx5e_check_required_hca_cap(mdev, MLX5_INTERFACE_PROTOCOL_IB))
+		return NULL;
+
+	netdev = alloc_netdev_mqs(sizeof(struct mlx5e_priv) + param->size_base_priv,
+				  name, NET_NAME_UNKNOWN,
+				  setup,
+				  nch * MLX5E_MAX_NUM_TC,
+				  nch);
+	if (!netdev) {
+		pr_err("alloc_netdev_mqs failed\n");
+		return NULL;
+	}
+
+	if (profile->init)
+		profile->init(mdev, netdev, profile, &param->size_base_priv);
+
+	netif_carrier_off(netdev);
+
+	priv = netdev_priv(netdev) + param->size_base_priv;
+
+	priv->underlay_qpn = param->qpn;
+
+	priv->wq = create_singlethread_workqueue("mlx5i");
+	if (!priv->wq)
+		goto err_cleanup_nic;
+
+	return netdev;
+
+err_cleanup_nic:
+	if (profile->cleanup)
+		profile->cleanup(priv);
+
+	free_netdev(netdev);
+
+	return NULL;
+}
+EXPORT_SYMBOL(mlx5i_create_netdev);
+
 static const struct mlx5e_profile mlx5e_nic_profile = {
 	.init		   = mlx5e_nic_init,
 	.cleanup	   = mlx5e_nic_cleanup,
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 886ff2b00500..0acf099e9d06 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1127,4 +1127,15 @@ enum {
 	MLX5_TRIGGERED_CMD_COMP = (u64)1 << 32,
 };
 
+struct mlx5i_create_ext_param {
+	int	size_base_priv;
+	u32	qpn;
+};
+
+struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
+				       const char *name,
+				       void (*setup)(struct net_device *dev),
+				       struct mlx5i_create_ext_param *param);
+int mlx5i_attach(struct mlx5_core_dev *mdev, void *vpriv);
+void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv);
 #endif /* MLX5_DRIVER_H */
-- 
1.8.3.1

* [RFC for accelerated IPoIB 12/26] net/mlx5e: Refactor attach_netdev API
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (10 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 11/26] net/mlx5e: Support netdevice creation for IB link type Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 13/26] net/mlx5e: Use underlay_qpn in tis creation Erez Shitrit
                     ` (15 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Use the priv object instead of the netdev object; this gives the ability to
use the function for the IB link as well.
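
The converted call sites then look roughly like this:

    /* ETH / representor callers: priv is simply netdev_priv(netdev) */
    err = mlx5e_attach_netdev(mdev, netdev_priv(netdev));

    /* IB callers (later patches): priv lives at an offset inside
     * netdev_priv(), so it has to be passed explicitly
     */
    err = mlx5e_attach_netdev(mdev, priv);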

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 ++++----
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 4 +++-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f3337ec4457f..e5c8badc38c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -918,7 +918,7 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 				       const struct mlx5e_profile *profile,
 				       void *ppriv);
 void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
-int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev);
+int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
 void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev);
 u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout);
 void mlx5e_add_vxlan_port(struct net_device *netdev,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index dd7d1d123d38..4e9a11b94025 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3994,14 +3994,14 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 	return NULL;
 }
 
-int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
+int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
 {
 	const struct mlx5e_profile *profile;
-	struct mlx5e_priv *priv;
+	struct net_device *netdev;
 	u16 max_mtu;
 	int err;
 
-	priv = netdev_priv(netdev);
+	netdev = priv->netdev;
 	profile = priv->profile;
 	clear_bit(MLX5E_STATE_DESTROYING, &priv->state);
 
@@ -4114,7 +4114,7 @@ static int mlx5e_attach(struct mlx5_core_dev *mdev, void *vpriv)
 	if (err)
 		return err;
 
-	err = mlx5e_attach_netdev(mdev, netdev);
+	err = mlx5e_attach_netdev(mdev, priv);
 	if (err) {
 		mlx5e_destroy_mdev_resources(mdev);
 		return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 850378893b25..0aad28da1638 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -556,6 +556,7 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
 			 struct mlx5_eswitch_rep *rep)
 {
 	struct net_device *netdev;
+	struct mlx5e_priv *priv;
 	int err;
 
 	netdev = mlx5e_create_netdev(esw->dev, &mlx5e_rep_profile, rep);
@@ -567,7 +568,8 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
 
 	rep->netdev = netdev;
 
-	err = mlx5e_attach_netdev(esw->dev, netdev);
+	priv = netdev_priv(netdev);
+	err = mlx5e_attach_netdev(esw->dev, priv);
 	if (err) {
 		pr_warn("Failed to attach representor netdev for vport %d\n",
 			rep->vport);
-- 
1.8.3.1

* [RFC for accelerated IPoIB 13/26] net/mlx5e: Use underlay_qpn in tis creation
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (11 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 12/26] net/mlx5e: Refactor attach_netdev API Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 14/26] net/mlx5e: Export resource creation function to be used in IB link Erez Shitrit
                     ` (14 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Enable the IB link to use the same TIS creation code; for the ETH link the
underlay_qpn stays zero by default, so its behavior is unchanged.
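
In short, the flow is (taken from the surrounding patches of the series):

    /* IB link: recorded at netdev creation time (patch 11) */
    priv->underlay_qpn = param->qpn;

    /* ETH link: underlay_qpn stays 0, so the TIS is created exactly as before */
    MLX5_SET(tisc, tisc, underlay_qpn, priv->underlay_qpn);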

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      | 2 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e5c8badc38c7..84db4761f09c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -222,6 +222,7 @@ struct mlx5e_params {
 	bool rx_am_enabled;
 	u32 lro_timeout;
 	u32 pflags;
+	u32 underlay_qpn;
 };
 
 #ifdef CONFIG_MLX5_CORE_EN_DCB
@@ -718,6 +719,7 @@ struct mlx5e_priv {
 
 	const struct mlx5e_profile *profile;
 	void                      *ppriv;
+	u32                       underlay_qpn;
 };
 
 void mlx5e_build_ptys2ethtool_map(void);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 4e9a11b94025..eedbe3947423 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2453,6 +2453,7 @@ static int mlx5e_create_tis(struct mlx5e_priv *priv, int tc)
 
 	MLX5_SET(tisc, tisc, prio, tc << 1);
 	MLX5_SET(tisc, tisc, transport_domain, mdev->mlx5e_res.td.tdn);
+	MLX5_SET(tisc, tisc, underlay_qpn, priv->underlay_qpn);
 
 	if (mlx5_lag_is_lacp_owner(mdev))
 		MLX5_SET(tisc, tisc, strict_lag_tx_port_affinity, 1);
-- 
1.8.3.1

* [RFC for accelerated IPoIB 14/26] net/mlx5e: Export resource creation function to be used in IB link
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (12 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 13/26] net/mlx5e: Use underlay_qpn in tis creation Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 15/26] net/mlx5: Enable flow-steering for " Erez Shitrit
                     ` (13 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Add mlx5i_attach(), which creates the resources of the IB network device,
and mlx5i_detach(), which cleans them up.
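
For illustration, the expected pairing on the IB (IPoIB) side; where exactly
these calls live is up to the ULP glue and is not part of this patch:

    /* bring up the IB netdevice resources, e.g. on interface open */
    err = mlx5i_attach(mdev, priv);
    if (err)
            return err;

    /* ... data path is active ... */

    /* tear the resources down again, e.g. on interface stop */
    mlx5i_detach(mdev, priv);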

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 122 +++++++++++++++-------
 1 file changed, 87 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index eedbe3947423..726d516b7ea8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3844,6 +3844,54 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
 	mlx5_lag_remove(mdev);
 }
 
+static int mlx5n_attach_netdev_common(struct mlx5_core_dev *mdev,
+				      struct mlx5e_priv *priv)
+{
+	const struct mlx5e_profile *profile;
+	struct net_device *netdev;
+	int err;
+
+	netdev = priv->netdev;
+	profile = priv->profile;
+	clear_bit(MLX5E_STATE_DESTROYING, &priv->state);
+
+	err = profile->init_tx(priv);
+	if (err)
+		goto out;
+
+	err = mlx5e_open_drop_rq(priv);
+	if (err) {
+		mlx5_core_err(mdev, "open drop rq failed, %d\n", err);
+		goto err_cleanup_tx;
+	}
+
+	err = profile->init_rx(priv);
+	if (err)
+		goto err_close_drop_rq;
+
+	mlx5e_create_q_counter(priv);
+
+	if (profile->enable)
+		profile->enable(priv);
+
+	rtnl_lock();
+	if (netif_running(netdev))
+		mlx5e_open(netdev);
+	netif_device_attach(netdev);
+	rtnl_unlock();
+
+	return 0;
+
+err_close_drop_rq:
+	mlx5e_close_drop_rq(priv);
+
+err_cleanup_tx:
+	profile->cleanup_tx(priv);
+
+out:
+	return err;
+}
+
 static void mlx5i_nic_init(struct mlx5_core_dev *mdev,
 			   struct net_device *netdev,
 			   const struct mlx5e_profile *profile,
@@ -3943,6 +3991,42 @@ struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
 }
 EXPORT_SYMBOL(mlx5i_create_netdev);
 
+int mlx5i_attach(struct mlx5_core_dev *mdev, void *vpriv)
+{
+	struct mlx5e_priv *priv = vpriv;
+	struct net_device *netdev = priv->netdev;
+	int err;
+
+	if (netif_device_present(netdev))
+		return 0;
+
+	err = mlx5e_create_mdev_resources(mdev);
+	if (err)
+		return err;
+
+	err = mlx5n_attach_netdev_common(mdev, priv);
+	if (err) {
+		mlx5e_destroy_mdev_resources(mdev);
+		return err;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(mlx5i_attach);
+
+void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv)
+{
+	struct mlx5e_priv *priv = vpriv;
+	struct net_device *netdev = priv->netdev;
+
+	if (!netif_device_present(netdev))
+		return;
+
+	mlx5e_detach_netdev(mdev, netdev);
+	mlx5e_destroy_mdev_resources(mdev);
+}
+EXPORT_SYMBOL(mlx5i_detach);
+
 static const struct mlx5e_profile mlx5e_nic_profile = {
 	.init		   = mlx5e_nic_init,
 	.cleanup	   = mlx5e_nic_cleanup,
@@ -3997,31 +4081,17 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 
 int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
 {
-	const struct mlx5e_profile *profile;
 	struct net_device *netdev;
 	u16 max_mtu;
 	int err;
 
 	netdev = priv->netdev;
-	profile = priv->profile;
-	clear_bit(MLX5E_STATE_DESTROYING, &priv->state);
-
-	err = profile->init_tx(priv);
-	if (err)
-		goto out;
 
-	err = mlx5e_open_drop_rq(priv);
+	err = mlx5n_attach_netdev_common(mdev, priv);
 	if (err) {
-		mlx5_core_err(mdev, "open drop rq failed, %d\n", err);
-		goto err_cleanup_tx;
+		mlx5_core_err(mdev, "failed attach netdev %d\n", err);
+		return err;
 	}
-
-	err = profile->init_rx(priv);
-	if (err)
-		goto err_close_drop_rq;
-
-	mlx5e_create_q_counter(priv);
-	//TBD do i need to change that?
 	mlx5e_init_l2_addr(priv);
 
 	/* MTU range: 68 - hw-specific max */
@@ -4031,25 +4101,7 @@ int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
 
 	mlx5e_set_dev_port_mtu(netdev);
 
-	if (profile->enable)
-		profile->enable(priv);
-
-	rtnl_lock();
-	if (netif_running(netdev))
-		mlx5e_open(netdev);
-	netif_device_attach(netdev);
-	rtnl_unlock();
-
 	return 0;
-
-err_close_drop_rq:
-	mlx5e_close_drop_rq(priv);
-
-err_cleanup_tx:
-	profile->cleanup_tx(priv);
-
-out:
-	return err;
 }
 
 static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev)
-- 
1.8.3.1

* [RFC for accelerated IPoIB 15/26] net/mlx5: Enable flow-steering for IB link
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (13 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 14/26] net/mlx5e: Export resource creation function to be used in IB link Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 16/26] net/mlx5e: Enhanced flow table creation to support ETH and IB links Erez Shitrit
                     ` (12 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Query the relevant capabilities when the device supports
ipoib_enhanced_offloads, and initialize the flow-steering tables accordingly.
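
For readability, the new condition in mlx5_init_fs() is equivalent to the
following sketch (the patch itself keeps it as a single expression):

    bool eth_fs = MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH &&
                  MLX5_CAP_GEN(dev, nic_flow_table);
    bool ib_fs  = MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_IB &&
                  MLX5_CAP_GEN(dev, ipoib_enhanced_offloads);

    if ((eth_fs || ib_fs) && MLX5_CAP_FLOWTABLE_NIC_RX(dev, ft_support)) {
            err = init_root_ns(steering);
            if (err)
                    goto err;
    }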

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 10 +++++-----
 drivers/net/ethernet/mellanox/mlx5/core/fw.c      |  3 ++-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index fa4edd88daf1..dd21fc557281 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1991,9 +1991,6 @@ int mlx5_init_fs(struct mlx5_core_dev *dev)
 	struct mlx5_flow_steering *steering;
 	int err = 0;
 
-	if (MLX5_CAP_GEN(dev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
-		return 0;
-
 	err = mlx5_init_fc_stats(dev);
 	if (err)
 		return err;
@@ -2004,8 +2001,11 @@ int mlx5_init_fs(struct mlx5_core_dev *dev)
 	steering->dev = dev;
 	dev->priv.steering = steering;
 
-	if (MLX5_CAP_GEN(dev, nic_flow_table) &&
-	    MLX5_CAP_FLOWTABLE_NIC_RX(dev, ft_support)) {
+	if ((((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH) &&
+	      (MLX5_CAP_GEN(dev, nic_flow_table))) ||
+	     ((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_IB) &&
+	      MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)))
+	    && MLX5_CAP_FLOWTABLE_NIC_RX(dev, ft_support)) {
 		err = init_root_ns(steering);
 		if (err)
 			goto err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 5718aada6605..f95bc78b02f2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -123,7 +123,8 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
 			return err;
 	}
 
-	if (MLX5_CAP_GEN(dev, nic_flow_table)) {
+	if (MLX5_CAP_GEN(dev, nic_flow_table) ||
+	    MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)) {
 		err = mlx5_core_get_caps(dev, MLX5_CAP_FLOW_TABLE);
 		if (err)
 			return err;
-- 
1.8.3.1

* [RFC for accelerated IPoIB 16/26] net/mlx5e: Enhanced flow table creation to support ETH and IB links.
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (14 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 15/26] net/mlx5: Enable flow-steering for " Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 17/26] net/mlx5e: Change cleanup API in order to enable IB link Erez Shitrit
                     ` (11 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


The IB link needs the underlay QPN to support flow steering, so change the
flow-table creation API so that both link types are served by the same set
of functions.
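
A converted call site then takes the following shape; this is essentially
what the en_fs.c hunks below do for the TTC table:

    struct create_flow_table_param param = {0};

    param.ns           = priv->fs.ns;
    param.prio         = MLX5E_NIC_PRIO;
    param.max_fte      = MLX5E_TTC_TABLE_SIZE;
    param.level        = MLX5E_TTC_FT_LEVEL;
    param.underlay_qpn = priv->underlay_qpn; /* 0 for ETH, the IPoIB UD QP for IB */

    ft->t = mlx5_create_flow_table(&param);
    if (IS_ERR(ft->t))
            return PTR_ERR(ft->t);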

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  | 12 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c    | 39 ++++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  9 ++-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 19 +++++-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |  8 +++
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 67 ++++++++++++++--------
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |  1 +
 include/linux/mlx5/fs.h                            | 16 ++++--
 8 files changed, 125 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
index 68419a01db36..ea3032d97b0d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -325,10 +325,18 @@ static int arfs_create_table(struct mlx5e_priv *priv,
 {
 	struct mlx5e_arfs_tables *arfs = &priv->fs.arfs;
 	struct mlx5e_flow_table *ft = &arfs->arfs_tables[type].ft;
+	struct create_flow_table_param param = {0};
 	int err;
 
-	ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-				       MLX5E_ARFS_TABLE_SIZE, MLX5E_ARFS_FT_LEVEL, 0);
+	ft->num_groups = 0;
+
+	param.ns = priv->fs.ns;
+	param.prio = MLX5E_NIC_PRIO;
+	param.max_fte = MLX5E_ARFS_TABLE_SIZE;
+	param.level = MLX5E_ARFS_FT_LEVEL;
+	param.flags = 0;
+
+	ft->t = mlx5_create_flow_table(&param);
 	if (IS_ERR(ft->t)) {
 		err = PTR_ERR(ft->t);
 		ft->t = NULL;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index c6b40003007c..46b48b76e7ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -779,9 +779,16 @@ static int mlx5e_create_ttc_table(struct mlx5e_priv *priv)
 	struct mlx5e_ttc_table *ttc = &priv->fs.ttc;
 	struct mlx5e_flow_table *ft = &ttc->ft;
 	int err;
+	struct create_flow_table_param param = {0};
 
-	ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-				       MLX5E_TTC_TABLE_SIZE, MLX5E_TTC_FT_LEVEL, 0);
+	param.ns = priv->fs.ns;
+	param.prio = MLX5E_NIC_PRIO;
+	param.max_fte = MLX5E_TTC_TABLE_SIZE;
+	param.level = MLX5E_TTC_FT_LEVEL;
+	param.flags = 0;
+	param.underlay_qpn = priv->underlay_qpn;
+
+	ft->t = mlx5_create_flow_table(&param);
 	if (IS_ERR(ft->t)) {
 		err = PTR_ERR(ft->t);
 		ft->t = NULL;
@@ -952,10 +959,16 @@ static int mlx5e_create_l2_table(struct mlx5e_priv *priv)
 	struct mlx5e_l2_table *l2_table = &priv->fs.l2;
 	struct mlx5e_flow_table *ft = &l2_table->ft;
 	int err;
+	struct create_flow_table_param param = {0};
+
+	param.ns = priv->fs.ns;
+	param.prio = MLX5E_NIC_PRIO;
+	param.max_fte = MLX5E_L2_TABLE_SIZE;
+	param.level = MLX5E_L2_FT_LEVEL;
+	param.flags = 0;
 
 	ft->num_groups = 0;
-	ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-				       MLX5E_L2_TABLE_SIZE, MLX5E_L2_FT_LEVEL, 0);
+	ft->t = mlx5_create_flow_table(&param);
 
 	if (IS_ERR(ft->t)) {
 		err = PTR_ERR(ft->t);
@@ -1041,11 +1054,18 @@ static int mlx5e_create_vlan_table_groups(struct mlx5e_flow_table *ft)
 static int mlx5e_create_vlan_table(struct mlx5e_priv *priv)
 {
 	struct mlx5e_flow_table *ft = &priv->fs.vlan.ft;
+	struct create_flow_table_param param = {0};
 	int err;
 
 	ft->num_groups = 0;
-	ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-				       MLX5E_VLAN_TABLE_SIZE, MLX5E_VLAN_FT_LEVEL, 0);
+
+	param.ns = priv->fs.ns;
+	param.prio = MLX5E_NIC_PRIO;
+	param.max_fte = MLX5E_VLAN_TABLE_SIZE;
+	param.level = MLX5E_VLAN_FT_LEVEL;
+	param.flags = 0;
+
+	ft->t = mlx5_create_flow_table(&param);
 
 	if (IS_ERR(ft->t)) {
 		err = PTR_ERR(ft->t);
@@ -1091,13 +1111,6 @@ int mlx5i_create_flow_steering(struct mlx5e_priv *priv)
 	if (!priv->fs.ns)
 		return -EINVAL;
 
-	err = mlx5e_arfs_create_tables(priv);
-	if (err) {
-		netdev_err(priv->netdev, "Failed to create arfs tables, err=%d\n",
-			   err);
-		priv->netdev->hw_features &= ~NETIF_F_NTUPLE;
-	}
-
 	err = mlx5e_create_ttc_table(priv);
 	if (err) {
 		netdev_err(priv->netdev, "Failed to create ttc table, err=%d\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index d0c8bf014453..06dfe755f931 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -340,6 +340,7 @@ static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw, int nvports)
 	struct mlx5_core_dev *dev = esw->dev;
 	struct mlx5_flow_namespace *root_ns;
 	struct mlx5_flow_table *fdb;
+	struct create_flow_table_param param = {0};
 	struct mlx5_flow_group *g;
 	void *match_criteria;
 	int table_size;
@@ -361,8 +362,14 @@ static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw, int nvports)
 		return -ENOMEM;
 	memset(flow_group_in, 0, inlen);
 
 	table_size = BIT(MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size));
+	param.ns = root_ns;
+	param.prio = 0;
+	param.level = 0;
+	param.max_fte = table_size;
+	param.flags = 0;
+
-	fdb = mlx5_create_flow_table(root_ns, 0, table_size, 0, 0);
+	fdb = mlx5_create_flow_table(&param);
 	if (IS_ERR(fdb)) {
 		err = PTR_ERR(fdb);
 		esw_warn(dev, "Failed to create FDB Table err %d\n", err);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 595f7c7383b3..5e929888f0d8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -410,6 +410,7 @@ static int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)
 	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
 	struct mlx5_core_dev *dev = esw->dev;
 	struct mlx5_flow_namespace *root_ns;
+	struct create_flow_table_param param = {0};
 	struct mlx5_flow_table *fdb = NULL;
 	struct mlx5_flow_group *g;
 	u32 *flow_group_in;
@@ -447,7 +448,14 @@ static int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)
 	esw->fdb_table.fdb = fdb;
 
 	table_size = nvports + MAX_PF_SQ + 1;
-	fdb = mlx5_create_flow_table(root_ns, FDB_SLOW_PATH, table_size, 0, 0);
+
+	param.ns = root_ns;
+	param.prio = FDB_SLOW_PATH;
+	param.level = 0;
+	param.max_fte = table_size;
+	param.flags = 0;
+
+	fdb = mlx5_create_flow_table(&param);
 	if (IS_ERR(fdb)) {
 		err = PTR_ERR(fdb);
 		esw_warn(dev, "Failed to create slow path FDB Table err %d\n", err);
@@ -531,6 +539,7 @@ static int esw_create_offloads_table(struct mlx5_eswitch *esw)
 	struct mlx5_flow_namespace *ns;
 	struct mlx5_flow_table *ft_offloads;
 	struct mlx5_core_dev *dev = esw->dev;
+	struct create_flow_table_param param = {0};
 	int err = 0;
 
 	ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_OFFLOADS);
@@ -539,7 +548,13 @@ static int esw_create_offloads_table(struct mlx5_eswitch *esw)
 		return -EOPNOTSUPP;
 	}
 
-	ft_offloads = mlx5_create_flow_table(ns, 0, dev->priv.sriov.num_vfs + 2, 0, 0);
+	param.ns = ns;
+	param.prio = 0;
+	param.level = 0;
+	param.max_fte = dev->priv.sriov.num_vfs + 2;
+	param.flags = 0;
+
+	ft_offloads = mlx5_create_flow_table(&param);
 	if (IS_ERR(ft_offloads)) {
 		err = PTR_ERR(ft_offloads);
 		esw_warn(esw->dev, "Failed to create offloads table, err %d\n", err);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index b53fc85a2375..d82721f00f94 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -45,6 +45,10 @@ int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
 	u32 in[MLX5_ST_SZ_DW(set_flow_table_root_in)]   = {0};
 	u32 out[MLX5_ST_SZ_DW(set_flow_table_root_out)] = {0};
 
+	if ((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_IB) &&
+	    ft->underlay_qpn == 0)
+		return 0;
+
 	MLX5_SET(set_flow_table_root_in, in, opcode,
 		 MLX5_CMD_OP_SET_FLOW_TABLE_ROOT);
 	MLX5_SET(set_flow_table_root_in, in, table_type, ft->type);
@@ -54,6 +58,10 @@ int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
 		MLX5_SET(set_flow_table_root_in, in, other_vport, 1);
 	}
 
+	if ((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_IB) &&
+	    ft->underlay_qpn != 0)
+		MLX5_SET(set_flow_table_root_in, in, underlay_qpn, ft->underlay_qpn);
+
 	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index dd21fc557281..07e766770c14 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -776,18 +776,16 @@ static void list_add_flow_table(struct mlx5_flow_table *ft,
 	list_add(&ft->node.list, prev);
 }
 
-static struct mlx5_flow_table *__mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
+static struct mlx5_flow_table *__mlx5_create_flow_table(struct create_flow_table_param *param,
 							enum fs_flow_table_op_mod op_mod,
-							u16 vport, int prio,
-							int max_fte, u32 level,
-							u32 flags)
+							u16 vport)
 {
 	struct mlx5_flow_table *next_ft = NULL;
 	struct mlx5_flow_table *ft;
 	int err;
 	int log_table_sz;
 	struct mlx5_flow_root_namespace *root =
-		find_root(&ns->node);
+		find_root(&param->ns->node);
 	struct fs_prio *fs_prio = NULL;
 
 	if (!root) {
@@ -796,29 +794,31 @@ static struct mlx5_flow_table *__mlx5_create_flow_table(struct mlx5_flow_namespa
 	}
 
 	mutex_lock(&root->chain_lock);
-	fs_prio = find_prio(ns, prio);
+	fs_prio = find_prio(param->ns, param->prio);
 	if (!fs_prio) {
 		err = -EINVAL;
 		goto unlock_root;
 	}
-	if (level >= fs_prio->num_levels) {
+	if (param->level >= fs_prio->num_levels) {
 		err = -ENOSPC;
 		goto unlock_root;
 	}
 	/* The level is related to the
 	 * priority level range.
 	 */
-	level += fs_prio->start_level;
-	ft = alloc_flow_table(level,
+	param->level += fs_prio->start_level;
+	ft = alloc_flow_table(param->level,
 			      vport,
-			      max_fte ? roundup_pow_of_two(max_fte) : 0,
+			      param->max_fte ? roundup_pow_of_two(param->max_fte) : 0,
 			      root->table_type,
-			      op_mod, flags);
+			      op_mod, param->flags);
 	if (!ft) {
 		err = -ENOMEM;
 		goto unlock_root;
 	}
 
+	ft->underlay_qpn = param->underlay_qpn;
+
 	tree_init_node(&ft->node, 1, del_flow_table);
 	log_table_sz = ft->max_fte ? ilog2(ft->max_fte) : 0;
 	next_ft = find_next_chained_ft(fs_prio);
@@ -847,29 +847,36 @@ static struct mlx5_flow_table *__mlx5_create_flow_table(struct mlx5_flow_namespa
 	return ERR_PTR(err);
 }
 
-struct mlx5_flow_table *mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
-					       int prio, int max_fte,
-					       u32 level,
-					       u32 flags)
+struct mlx5_flow_table *mlx5_create_flow_table(struct create_flow_table_param *param)
 {
-	return __mlx5_create_flow_table(ns, FS_FT_OP_MOD_NORMAL, 0, prio,
-					max_fte, level, flags);
+	return __mlx5_create_flow_table(param, FS_FT_OP_MOD_NORMAL, 0);
 }
 
 struct mlx5_flow_table *mlx5_create_vport_flow_table(struct mlx5_flow_namespace *ns,
 						     int prio, int max_fte,
 						     u32 level, u16 vport)
 {
-	return __mlx5_create_flow_table(ns, FS_FT_OP_MOD_NORMAL, vport, prio,
-					max_fte, level, 0);
+	struct create_flow_table_param param = {0};
+
+	param.ns = ns;
+	param.prio = prio;
+	param.max_fte = max_fte;
+	param.level = level;
+
+	return __mlx5_create_flow_table(&param, FS_FT_OP_MOD_NORMAL, vport);
 }
 
 struct mlx5_flow_table *mlx5_create_lag_demux_flow_table(
 					       struct mlx5_flow_namespace *ns,
 					       int prio, u32 level)
 {
-	return __mlx5_create_flow_table(ns, FS_FT_OP_MOD_LAG_DEMUX, 0, prio, 0,
-					level, 0);
+	struct create_flow_table_param param = {0};
+
+	param.ns = ns;
+	param.prio = prio;
+	param.level = level;
+
+	return __mlx5_create_flow_table(&param, FS_FT_OP_MOD_LAG_DEMUX, 0);
 }
 EXPORT_SYMBOL(mlx5_create_lag_demux_flow_table);
 
@@ -881,11 +888,18 @@ struct mlx5_flow_table *mlx5_create_auto_grouped_flow_table(struct mlx5_flow_nam
 							    u32 flags)
 {
 	struct mlx5_flow_table *ft;
+	struct create_flow_table_param param = {0};
 
 	if (max_num_groups > num_flow_table_entries)
 		return ERR_PTR(-EINVAL);
 
-	ft = mlx5_create_flow_table(ns, prio, num_flow_table_entries, level, flags);
+	param.ns = ns;
+	param.prio = prio;
+	param.level = level;
+	param.max_fte = num_flow_table_entries;
+	param.flags = flags;
+
+	ft = mlx5_create_flow_table(&param);
 	if (IS_ERR(ft))
 		return ft;
 
@@ -1828,11 +1842,18 @@ static int create_anchor_flow_table(struct mlx5_flow_steering *steering)
 {
 	struct mlx5_flow_namespace *ns = NULL;
 	struct mlx5_flow_table *ft;
+	struct create_flow_table_param param = {0};
 
 	ns = mlx5_get_flow_namespace(steering->dev, MLX5_FLOW_NAMESPACE_ANCHOR);
 	if (WARN_ON(!ns))
 		return -EINVAL;
-	ft = mlx5_create_flow_table(ns, ANCHOR_PRIO, ANCHOR_SIZE, ANCHOR_LEVEL, 0);
+	param.ns = ns;
+	param.prio = ANCHOR_PRIO;
+	param.level = ANCHOR_LEVEL;
+	param.max_fte = ANCHOR_SIZE;
+	param.flags = 0;
+
+	ft = mlx5_create_flow_table(&param);
 	if (IS_ERR(ft)) {
 		mlx5_core_err(steering->dev, "Failed to create last anchor flow table");
 		return PTR_ERR(ft);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index 8e668c63f69e..9ec8a2835642 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -118,6 +118,7 @@ struct mlx5_flow_table {
 	/* FWD rules that point on this flow table */
 	struct list_head		fwd_rules;
 	u32				flags;
+	u32				underlay_qpn;
 };
 
 struct mlx5_fc_cache {
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 949b24b6c479..9ed3cfa607d1 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -104,12 +104,18 @@ struct mlx5_flow_table *
 				    u32 level,
 				    u32 flags);
 
+struct create_flow_table_param {
+	struct mlx5_flow_namespace *ns;
+	int prio;
+	int max_fte;
+	u32 level;
+	u32 flags;
+	u32 underlay_qpn;
+};
+
 struct mlx5_flow_table *
-mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
-		       int prio,
-		       int num_flow_table_entries,
-		       u32 level,
-		       u32 flags);
+mlx5_create_flow_table(struct create_flow_table_param *param);
+
 struct mlx5_flow_table *
 mlx5_create_vport_flow_table(struct mlx5_flow_namespace *ns,
 			     int prio,
-- 
1.8.3.1

* [RFC for accelerated IPoIB 17/26] net/mlx5e: Change cleanup API in order to enable IB link
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (15 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 16/26] net/mlx5e: Enhanced flow table creation to support ETH and IB links Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 18/26] net/mlx5e: Change mlx5e_open_locked and mlx5e_close_locked api Erez Shitrit
                     ` (10 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


1. Change the mlx5e_detach_netdev() API to take the priv object.
2. Allow the function to be called while rtnl_lock is already held, as is
done on the IB link path (see the sketch below).
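
For illustration, the two caller contexts the change has to serve; the IB
side shown here is only a sketch, not code added by this patch:

    /* ETH remove path: rtnl is not held, mlx5e_detach_netdev() takes it */
    mlx5e_detach_netdev(mdev, priv);

    /* IB remove path: the caller already holds rtnl, so the function
     * must not try to take it again
     */
    rtnl_lock();
    /* ... other teardown that needs rtnl ... */
    mlx5e_detach_netdev(mdev, priv);
    rtnl_unlock();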

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 18 +++++++++++-------
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  |  4 ++--
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 84db4761f09c..a10966df24f6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -921,7 +921,7 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 				       void *ppriv);
 void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
 int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
-void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev);
+void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
 u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout);
 void mlx5e_add_vxlan_port(struct net_device *netdev,
 			  struct udp_tunnel_info *ti);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 726d516b7ea8..42ac16daa126 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4022,7 +4022,7 @@ void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv)
 	if (!netif_device_present(netdev))
 		return;
 
-	mlx5e_detach_netdev(mdev, netdev);
+	mlx5e_detach_netdev(mdev, priv);
 	mlx5e_destroy_mdev_resources(mdev);
 }
 EXPORT_SYMBOL(mlx5i_detach);
@@ -4127,18 +4127,22 @@ static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev)
 	}
 }
 
-void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
+void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
 {
-	struct mlx5e_priv *priv = netdev_priv(netdev);
+	struct net_device *netdev = priv->netdev;
 	const struct mlx5e_profile *profile = priv->profile;
-
+	bool locked = false;
 	set_bit(MLX5E_STATE_DESTROYING, &priv->state);
 
-	rtnl_lock();
+	if (!rtnl_is_locked()) {
+		rtnl_lock();
+		locked = true;
+	}
 	if (netif_running(netdev))
 		mlx5e_close(netdev);
 	netif_device_detach(netdev);
-	rtnl_unlock();
+	if (locked)
+		rtnl_unlock();
 
 	if (profile->disable)
 		profile->disable(priv);
@@ -4184,7 +4188,7 @@ static void mlx5e_detach(struct mlx5_core_dev *mdev, void *vpriv)
 	if (!netif_device_present(netdev))
 		return;
 
-	mlx5e_detach_netdev(mdev, netdev);
+	mlx5e_detach_netdev(mdev, priv);
 	mlx5e_destroy_mdev_resources(mdev);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 0aad28da1638..21d3d8e0bab7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -586,7 +586,7 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
 	return 0;
 
 err_detach_netdev:
-	mlx5e_detach_netdev(esw->dev, netdev);
+	mlx5e_detach_netdev(esw->dev, priv);
 
 err_destroy_netdev:
 	mlx5e_destroy_netdev(esw->dev, netdev_priv(netdev));
@@ -601,6 +601,6 @@ void mlx5e_vport_rep_unload(struct mlx5_eswitch *esw,
 	struct net_device *netdev = rep->netdev;
 
 	unregister_netdev(netdev);
-	mlx5e_detach_netdev(esw->dev, netdev);
+	mlx5e_detach_netdev(esw->dev, netdev_priv(netdev));
 	mlx5e_destroy_netdev(esw->dev, netdev_priv(netdev));
 }
-- 
1.8.3.1
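
A sketch of the teardown path this change is meant to serve: a caller on
the IB link side that already owns rtnl_lock can now reach
mlx5e_detach_netdev() without deadlocking on a second rtnl_lock.  The
wrapper name below is hypothetical; only mlx5i_detach() and
mlx5e_detach_netdev() come from this series.

/* Hypothetical IB-link cleanup path (sketch).  The caller already holds
 * rtnl_lock, e.g. because it runs from its own ndo_stop, and the
 * reworked mlx5e_detach_netdev() is intended to notice that and skip
 * taking the lock again.
 */
static void ipoib_accel_cleanup(struct mlx5_core_dev *mdev, void *mlx5e_part)
{
	ASSERT_RTNL();			/* documents the locking assumption */

	mlx5i_detach(mdev, mlx5e_part);	/* ends up in mlx5e_detach_netdev() */
}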


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC for accelerated IPoIB 18/26] net/mlx5e: Change mlx5e_open_locked and mlx5e_close_locked api
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (16 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 17/26] net/mlx5e: Change cleanup API in order to enable IB link Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 19/26] net/mlx5e: Export open/close api for IB link Erez Shitrit
                     ` (9 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Let the IB link call these functions directly with the relevant priv.

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  4 +--
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 24 ++++++++---------
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 31 +++++++++++-----------
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  4 +--
 4 files changed, 31 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index a10966df24f6..2f9242ae06f3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -797,8 +797,8 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, __always_unused __be16 proto,
 void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_priv *priv, void *tirc,
 				    enum mlx5e_traffic_types tt);
 
-int mlx5e_open_locked(struct net_device *netdev);
-int mlx5e_close_locked(struct net_device *netdev);
+int mlx5e_open_locked(struct mlx5e_priv *priv);
+int mlx5e_close_locked(struct mlx5e_priv *priv);
 void mlx5e_build_default_indir_rqt(struct mlx5_core_dev *mdev,
 				   u32 *indirection_rqt, int len,
 				   int num_channels);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index bb67863aa361..0c8773718292 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -516,14 +516,14 @@ static int mlx5e_set_ringparam(struct net_device *dev,
 
 	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	if (was_opened)
-		mlx5e_close_locked(dev);
+		mlx5e_close_locked(priv);
 
 	priv->params.log_rq_size = log_rq_size;
 	priv->params.log_sq_size = log_sq_size;
 	priv->params.min_rx_wqes = min_rx_wqes;
 
 	if (was_opened)
-		err = mlx5e_open_locked(dev);
+		err = mlx5e_open_locked(priv);
 
 	mutex_unlock(&priv->state_lock);
 
@@ -561,7 +561,7 @@ static int mlx5e_set_channels(struct net_device *dev,
 
 	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	if (was_opened)
-		mlx5e_close_locked(dev);
+		mlx5e_close_locked(priv);
 
 	arfs_enabled = dev->features & NETIF_F_NTUPLE;
 	if (arfs_enabled)
@@ -572,7 +572,7 @@ static int mlx5e_set_channels(struct net_device *dev,
 				      MLX5E_INDIR_RQT_SIZE, count);
 
 	if (was_opened)
-		err = mlx5e_open_locked(dev);
+		err = mlx5e_open_locked(priv);
 	if (err)
 		goto out;
 
@@ -626,7 +626,7 @@ static int mlx5e_set_coalesce(struct net_device *netdev,
 
 	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	if (was_opened && restart) {
-		mlx5e_close_locked(netdev);
+		mlx5e_close_locked(priv);
 		priv->params.rx_am_enabled = !!coal->use_adaptive_rx_coalesce;
 	}
 
@@ -655,7 +655,7 @@ static int mlx5e_set_coalesce(struct net_device *netdev,
 
 out:
 	if (was_opened && restart)
-		err = mlx5e_open_locked(netdev);
+		err = mlx5e_open_locked(priv);
 
 	mutex_unlock(&priv->state_lock);
 	return err;
@@ -1112,12 +1112,12 @@ static int mlx5e_set_tunable(struct net_device *dev,
 
 		was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 		if (was_opened)
-			mlx5e_close_locked(dev);
+			mlx5e_close_locked(priv);
 
 		priv->params.tx_max_inline = val;
 
 		if (was_opened)
-			err = mlx5e_open_locked(dev);
+			err = mlx5e_open_locked(priv);
 
 		mutex_unlock(&priv->state_lock);
 		break;
@@ -1444,12 +1444,12 @@ static int set_pflag_rx_cqe_based_moder(struct net_device *netdev, bool enable)
 
 	reset = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	if (reset)
-		mlx5e_close_locked(netdev);
+		mlx5e_close_locked(priv);
 
 	mlx5e_set_rx_cq_mode_params(&priv->params, rx_cq_period_mode);
 
 	if (reset)
-		err = mlx5e_open_locked(netdev);
+		err = mlx5e_open_locked(priv);
 
 	return err;
 }
@@ -1473,13 +1473,13 @@ static int set_pflag_rx_cqe_compress(struct net_device *netdev,
 	reset = test_bit(MLX5E_STATE_OPENED, &priv->state);
 
 	if (reset)
-		mlx5e_close_locked(netdev);
+		mlx5e_close_locked(priv);
 
 	MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_COMPRESS, enable);
 	priv->params.rx_cqe_compress_def = enable;
 
 	if (reset)
-		err = mlx5e_open_locked(netdev);
+		err = mlx5e_open_locked(priv);
 	return err;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 42ac16daa126..8e6d0b3cd835 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -152,8 +152,8 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
 	mutex_lock(&priv->state_lock);
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
 		goto unlock;
-	mlx5e_close_locked(priv->netdev);
-	err = mlx5e_open_locked(priv->netdev);
+	mlx5e_close_locked(priv);
+	err = mlx5e_open_locked(priv);
 	if (err)
 		netdev_err(priv->netdev, "mlx5e_open_locked failed recovering from a tx_timeout, err(%d).\n",
 			   err);
@@ -2239,9 +2239,9 @@ static void mlx5e_netdev_set_tcs(struct net_device *netdev)
 		netdev_set_tc_queue(netdev, tc, nch, 0);
 }
 
-int mlx5e_open_locked(struct net_device *netdev)
+int mlx5e_open_locked(struct mlx5e_priv *priv)
 {
-	struct mlx5e_priv *priv = netdev_priv(netdev);
+	struct net_device *netdev = priv->netdev;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	int num_txqs;
 	int err;
@@ -2297,15 +2297,14 @@ int mlx5e_open(struct net_device *netdev)
 	int err;
 
 	mutex_lock(&priv->state_lock);
-	err = mlx5e_open_locked(netdev);
+	err = mlx5e_open_locked(priv);
 	mutex_unlock(&priv->state_lock);
 
 	return err;
 }
 
-int mlx5e_close_locked(struct net_device *netdev)
+int mlx5e_close_locked(struct mlx5e_priv *priv)
 {
-	struct mlx5e_priv *priv = netdev_priv(netdev);
 	struct mlx5_core_dev *mdev = priv->mdev;
 
 	/* May already be CLOSED in case a previous configuration operation
@@ -2336,7 +2335,7 @@ int mlx5e_close(struct net_device *netdev)
 		return -ENODEV;
 
 	mutex_lock(&priv->state_lock);
-	err = mlx5e_close_locked(netdev);
+	err = mlx5e_close_locked(priv);
 	mutex_unlock(&priv->state_lock);
 
 	return err;
@@ -2641,12 +2640,12 @@ static int mlx5e_setup_tc(struct net_device *netdev, u8 tc)
 
 	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	if (was_opened)
-		mlx5e_close_locked(priv->netdev);
+		mlx5e_close_locked(priv);
 
 	priv->params.num_tc = tc ? tc : 1;
 
 	if (was_opened)
-		err = mlx5e_open_locked(priv->netdev);
+		err = mlx5e_open_locked(priv);
 
 	mutex_unlock(&priv->state_lock);
 
@@ -2771,7 +2770,7 @@ static int set_feature_lro(struct net_device *netdev, bool enable)
 	mutex_lock(&priv->state_lock);
 
 	if (was_opened && (priv->params.rq_wq_type == MLX5_WQ_TYPE_LINKED_LIST))
-		mlx5e_close_locked(priv->netdev);
+		mlx5e_close_locked(priv);
 
 	priv->params.lro_en = enable;
 	err = mlx5e_modify_tirs_lro(priv);
@@ -2781,7 +2780,7 @@ static int set_feature_lro(struct net_device *netdev, bool enable)
 	}
 
 	if (was_opened && (priv->params.rq_wq_type == MLX5_WQ_TYPE_LINKED_LIST))
-		mlx5e_open_locked(priv->netdev);
+		mlx5e_open_locked(priv);
 
 	mutex_unlock(&priv->state_lock);
 
@@ -2915,13 +2914,13 @@ static int mlx5e_change_mtu(struct net_device *netdev, int new_mtu)
 
 	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	if (was_opened && reset)
-		mlx5e_close_locked(netdev);
+		mlx5e_close_locked(priv);
 
 	netdev->mtu = new_mtu;
 	mlx5e_set_dev_port_mtu(netdev);
 
 	if (was_opened && reset)
-		err = mlx5e_open_locked(netdev);
+		err = mlx5e_open_locked(priv);
 
 	mutex_unlock(&priv->state_lock);
 
@@ -3173,7 +3172,7 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
 	reset = (!priv->xdp_prog || !prog);
 
 	if (was_opened && reset)
-		mlx5e_close_locked(netdev);
+		mlx5e_close_locked(priv);
 	if (was_opened && !reset) {
 		/* num_channels is invariant here, so we can take the
 		 * batched reference right upfront.
@@ -3196,7 +3195,7 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
 		mlx5e_set_rq_priv_params(priv);
 
 	if (was_opened && reset)
-		mlx5e_open_locked(netdev);
+		mlx5e_open_locked(priv);
 
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state) || reset)
 		goto unlock;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 06d5e6fecb0a..ad08e64fee1a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -169,12 +169,12 @@ void mlx5e_modify_rx_cqe_compression(struct mlx5e_priv *priv, bool val)
 
 	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	if (was_opened)
-		mlx5e_close_locked(priv->netdev);
+		mlx5e_close_locked(priv);
 
 	MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_COMPRESS, val);
 
 	if (was_opened)
-		mlx5e_open_locked(priv->netdev);
+		mlx5e_open_locked(priv);
 
 unlock:
 	mutex_unlock(&priv->state_lock);
-- 
1.8.3.1
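
The conversion leaves one recurring pattern at every call site; spelled
out once as a sketch, with the updated field chosen only as an example:

/* Sketch of the common call-site pattern after this change: the
 * channels are bounced around a configuration update using only the
 * driver priv, so a caller that has no net_device at hand (the IB
 * link) can reuse the same helpers.
 */
static int sketch_update_tx_inline(struct mlx5e_priv *priv, u32 new_val)
{
	bool was_opened;
	int err = 0;

	mutex_lock(&priv->state_lock);

	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
	if (was_opened)
		mlx5e_close_locked(priv);

	priv->params.tx_max_inline = new_val;	/* example field only */

	if (was_opened)
		err = mlx5e_open_locked(priv);

	mutex_unlock(&priv->state_lock);
	return err;
}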


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC for accelerated IPoIB 19/26] net/mlx5e: Export open/close api for IB link
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (17 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 18/26] net/mlx5e: Change mlx5e_open_locked and mlx5e_close_locked api Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 20/26] include/linux/mlx5: Add mlx5_wqe_eth_pad and enhanced-ipoib-qp-mode Erez Shitrit
                     ` (8 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Now the IB device is able to call open or close for its net device.

TBD:
One change is waiting for the new channels API; until then an "if" is
used in the code.

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 23 +++++++++++++++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 13 ++++++++++++-
 include/linux/mlx5/driver.h                       |  3 +++
 4 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2f9242ae06f3..154cab2a301b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -934,4 +934,5 @@ int mlx5e_get_offload_stats(int attr_id, const struct net_device *dev,
 
 bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv);
 bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv);
+bool mlx5e_is_eswitch_vport_mngr(struct mlx5_core_dev *mdev);
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8e6d0b3cd835..2ca2de9d2578 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2269,7 +2269,9 @@ int mlx5e_open_locked(struct mlx5e_priv *priv)
 	}
 
 	mlx5e_redirect_rqts(priv);
-	mlx5e_update_carrier(priv);
+	/* only for the RFC, will use channels api when available */
+	if (MLX5_CAP_GEN(mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH)
+		mlx5e_update_carrier(priv);
 	mlx5e_timestamp_init(priv);
 #ifdef CONFIG_RFS_ACCEL
 	priv->netdev->rx_cpu_rmap = priv->mdev->rmap;
@@ -2277,7 +2279,7 @@ int mlx5e_open_locked(struct mlx5e_priv *priv)
 	if (priv->profile->update_stats)
 		queue_delayed_work(priv->wq, &priv->update_stats_work, 0);
 
-	if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
+	if (mlx5e_is_eswitch_vport_mngr(mdev)) {
 		err = mlx5e_add_sqs_fwd_rules(priv);
 		if (err)
 			goto err_close_channels;
@@ -3900,6 +3902,7 @@ static void mlx5i_nic_init(struct mlx5_core_dev *mdev,
 	struct mlx5e_priv *priv = netdev_priv(netdev) + base_size;
 
 	mlx5n_build_nic_netdev_priv_common(mdev, netdev, priv, profile, ppriv);
+	priv->ppriv = NULL;
 }
 
 static int mlx5i_init_nic_rx(struct mlx5e_priv *priv)
@@ -4026,6 +4029,22 @@ void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv)
 }
 EXPORT_SYMBOL(mlx5i_detach);
 
+int mlx5i_open(void *vpriv)
+{
+	struct mlx5e_priv *priv = vpriv;
+
+	return mlx5e_open_locked(priv);
+}
+EXPORT_SYMBOL(mlx5i_open);
+
+int mlx5i_close(void *vpriv)
+{
+	struct mlx5e_priv *priv = vpriv;
+
+	return mlx5e_close_locked(priv);
+}
+EXPORT_SYMBOL(mlx5i_close);
+
 static const struct mlx5e_profile mlx5e_nic_profile = {
 	.init		   = mlx5e_nic_init,
 	.cleanup	   = mlx5e_nic_cleanup,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 21d3d8e0bab7..cbb1092dddd4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -219,7 +219,12 @@ int mlx5e_nic_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
 void mlx5e_remove_sqs_fwd_rules(struct mlx5e_priv *priv)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-	struct mlx5_eswitch_rep *rep = priv->ppriv;
+	struct mlx5_eswitch_rep *rep;
+
+	if (!priv->ppriv)
+		return;
+
+	rep = priv->ppriv;
 
 	mlx5_eswitch_sqs2vport_stop(esw, rep);
 }
@@ -323,6 +328,12 @@ bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv)
 	return false;
 }
 
+bool mlx5e_is_eswitch_vport_mngr(struct mlx5_core_dev *mdev)
+{
+	return (MLX5_CAP_GEN(mdev, vport_group_manager) &&
+		MLX5_CAP_GEN(mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH);
+}
+
 bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv)
 {
 	struct mlx5_eswitch_rep *rep = (struct mlx5_eswitch_rep *)priv->ppriv;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 0acf099e9d06..f7f4e301fbd2 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1138,4 +1138,7 @@ struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
 				       struct mlx5i_create_ext_param *param);
 int mlx5i_attach(struct mlx5_core_dev *mdev, void *vpriv);
 void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv);
+int mlx5i_close(void *vpriv);
+int mlx5i_open(void *vpriv);
+
 #endif /* MLX5_DRIVER_H */
-- 
1.8.3.1
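
A sketch of the intended consumer on the IB side: mlx5i_open() and
mlx5i_close() are the symbols exported above, while
ipoib_to_mlx5e_priv() stands in for however the ULP locates the mlx5e
private area inside its net_device (a later patch in this series does
that with a fixed offset).

/* Hypothetical open/stop glue on the IB-link side (sketch). */
static int ipoib_accel_open(struct net_device *dev)
{
	void *mlx5e_part = ipoib_to_mlx5e_priv(dev);	/* placeholder helper */

	return mlx5i_open(mlx5e_part);
}

static int ipoib_accel_stop(struct net_device *dev)
{
	void *mlx5e_part = ipoib_to_mlx5e_priv(dev);	/* placeholder helper */

	return mlx5i_close(mlx5e_part);
}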


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC for accelerated IPoIB 20/26] include/linux/mlx5: Add mlx5_wqe_eth_pad and enhanced-ipoib-qp-mode
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (18 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 19/26] net/mlx5e: Export open/close api for IB link Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 21/26] net/mlx5e: Refactor TX send flow Erez Shitrit
                     ` (7 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


mlx5_wqe_eth_pad will be used in the TX flow for the IB link type, and
the enhanced-ipoib-qp-mode value will be used for QP creation.

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/qp.c | 4 ----
 include/linux/mlx5/qp.h         | 8 ++++++++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 41e14d57fec9..d6918e6b6f28 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -71,10 +71,6 @@ enum {
 	[MLX5_IB_WR_UMR]			= MLX5_OPCODE_UMR,
 };
 
-struct mlx5_wqe_eth_pad {
-	u8 rsvd0[16];
-};
-
 enum raw_qp_set_mask_map {
 	MLX5_RAW_QP_MOD_SET_RQ_Q_CTR_ID		= 1UL << 0,
 	MLX5_RAW_QP_RATE_LIMIT			= 1UL << 1,
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index 219c699c17b7..568fffff8ac9 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -221,6 +221,14 @@ enum {
 	MLX5_ETH_WQE_L4_CSUM            = 1 << 7,
 };
 
+struct mlx5_wqe_eth_pad {
+	u8 rsvd0[16];
+};
+
+enum {
+	MLX5_QP_ENHANCED_ULP_STATELESS_MODE = 2,
+};
+
 struct mlx5_wqe_eth_seg {
 	u8              rsvd0[4];
 	u8              cs_flags;
-- 
1.8.3.1
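
Both additions are consumed later in the series.  A condensed sketch of
the QP-creation side, where the new enum value selects the enhanced
(stateless-offload) IPoIB mode; "qpc" is the QP-context area of a
create_qp command mailbox prepared by the caller:

/* Sketch: mark an underlay UD QP as an enhanced-mode IPoIB QP. */
static void sketch_set_enhanced_ipoib_mode(void *qpc)
{
	MLX5_SET(qpc, qpc, st, MLX5_QP_ST_UD);
	MLX5_SET(qpc, qpc, ulp_stateless_offload_mode,
		 MLX5_QP_ENHANCED_ULP_STATELESS_MODE);
}

mlx5_wqe_eth_pad, in turn, fills the 16-byte gap between the datagram
and eth segments of the UD send WQE built for that QP, as the TX
patches later in the series show.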


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC for accelerated IPoIB 21/26] net/mlx5e: Refactor TX send flow
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (19 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 20/26] include/linux/mlx5: Add mlx5_wqe_eth_pad and enhanced-ipoib-qp-mode Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 22/26] net/mlx5e: Export send function for IB link type Erez Shitrit
                     ` (6 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Prepare for sending IB link type packets by splitting the TX send flow
into reusable segment-building helpers.

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h    |   4 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 237 ++++++++++++++----------
 2 files changed, 141 insertions(+), 100 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 154cab2a301b..b6758d0b93a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -402,6 +402,10 @@ struct mlx5e_tx_wqe_info {
 	u32 num_bytes;
 	u8  num_wqebbs;
 	u8  num_dma;
+	u16 ds_cnt;
+	u16 ihs;
+	u8 opcode;
+	bool bf;
 };
 
 enum mlx5e_dma_map_type {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index cfb68371c397..22443ce778ff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -216,94 +216,65 @@ static inline void mlx5e_insert_vlan(void *start, struct sk_buff *skb, u16 ihs,
 	mlx5e_tx_skb_pull_inline(skb_data, skb_len, cpy2_sz);
 }
 
-static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
+static inline void mlx5n_sq_fill_ctrl_seg_and_send(struct mlx5e_sq *sq,
+						   struct mlx5e_tx_wqe_info *wi,
+						   struct mlx5_wqe_ctrl_seg *cseg,
+						   struct sk_buff *skb, u16 pi)
 {
 	struct mlx5_wq_cyc       *wq   = &sq->wq;
 
-	u16 pi = sq->pc & wq->sz_m1;
-	struct mlx5e_tx_wqe      *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
-	struct mlx5e_tx_wqe_info *wi   = &sq->db.txq.wqe_info[pi];
+	cseg->opmod_idx_opcode = cpu_to_be32((sq->pc << 8) | wi->opcode);
+	cseg->qpn_ds           = cpu_to_be32((sq->sqn << 8) | wi->ds_cnt);
 
-	struct mlx5_wqe_ctrl_seg *cseg = &wqe->ctrl;
-	struct mlx5_wqe_eth_seg  *eseg = &wqe->eth;
-	struct mlx5_wqe_data_seg *dseg;
+	sq->db.txq.skb[pi] = skb;
 
-	unsigned char *skb_data = skb->data;
-	unsigned int skb_len = skb->len;
-	u8  opcode = MLX5_OPCODE_SEND;
-	dma_addr_t dma_addr = 0;
-	unsigned int num_bytes;
-	bool bf = false;
-	u16 headlen;
-	u16 ds_cnt;
-	u16 ihs;
-	int i;
+	wi->num_wqebbs = DIV_ROUND_UP(wi->ds_cnt, MLX5_SEND_WQEBB_NUM_DS);
+	sq->pc += wi->num_wqebbs;
 
-	memset(wqe, 0, sizeof(*wqe));
+	netdev_tx_sent_queue(sq->txq, wi->num_bytes);
 
-	if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
-		eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM;
-		if (skb->encapsulation) {
-			eseg->cs_flags |= MLX5_ETH_WQE_L3_INNER_CSUM |
-					  MLX5_ETH_WQE_L4_INNER_CSUM;
-			sq->stats.csum_partial_inner++;
-		} else {
-			eseg->cs_flags |= MLX5_ETH_WQE_L4_CSUM;
-		}
-	} else
-		sq->stats.csum_none++;
+	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
+		skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
 
-	if (sq->cc != sq->prev_cc) {
-		sq->prev_cc = sq->cc;
-		sq->bf_budget = (sq->cc == sq->pc) ? MLX5E_SQ_BF_BUDGET : 0;
+	if (unlikely(!mlx5e_sq_has_room_for(sq, MLX5E_SQ_STOP_ROOM))) {
+		netif_tx_stop_queue(sq->txq);
+		sq->stats.stopped++;
 	}
 
-	if (skb_is_gso(skb)) {
-		eseg->mss    = cpu_to_be16(skb_shinfo(skb)->gso_size);
-		opcode       = MLX5_OPCODE_LSO;
+	if (!skb->xmit_more || netif_xmit_stopped(sq->txq)) {
+		int bf_sz = 0;
 
-		if (skb->encapsulation) {
-			ihs = skb_inner_transport_offset(skb) + inner_tcp_hdrlen(skb);
-			sq->stats.tso_inner_packets++;
-			sq->stats.tso_inner_bytes += skb->len - ihs;
-		} else {
-			ihs = skb_transport_offset(skb) + tcp_hdrlen(skb);
-			sq->stats.tso_packets++;
-			sq->stats.tso_bytes += skb->len - ihs;
-		}
+		if (wi->bf && test_bit(MLX5E_SQ_STATE_BF_ENABLE, &sq->state))
+			bf_sz = wi->num_wqebbs << 3;
 
-		num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * ihs;
-	} else {
-		bf = sq->bf_budget &&
-		     !skb->xmit_more &&
-		     !skb_shinfo(skb)->nr_frags;
-		ihs = mlx5e_get_inline_hdr_size(sq, skb, bf);
-		num_bytes = max_t(unsigned int, skb->len, ETH_ZLEN);
+		cseg->fm_ce_se = MLX5_WQE_CTRL_CQ_UPDATE;
+		mlx5e_tx_notify_hw(sq, cseg, bf_sz);
 	}
 
-	wi->num_bytes = num_bytes;
+	/* fill sq edge with nops to avoid wqe wrap around */
+	while ((sq->pc & wq->sz_m1) > sq->edge)
+		mlx5e_send_nop(sq, false);
 
-	if (skb_vlan_tag_present(skb)) {
-		mlx5e_insert_vlan(eseg->inline_hdr_start, skb, ihs, &skb_data,
-				  &skb_len);
-		ihs += VLAN_HLEN;
-	} else {
-		memcpy(eseg->inline_hdr_start, skb_data, ihs);
-		mlx5e_tx_skb_pull_inline(&skb_data, &skb_len, ihs);
-	}
+	sq->bf_budget = wi->bf ? sq->bf_budget - 1 : 0;
 
-	eseg->inline_hdr_sz = cpu_to_be16(ihs);
+	sq->stats.packets++;
+	sq->stats.bytes += wi->num_bytes;
+}
 
-	ds_cnt  = sizeof(*wqe) / MLX5_SEND_WQE_DS;
-	ds_cnt += DIV_ROUND_UP(ihs - sizeof(eseg->inline_hdr_start),
-			       MLX5_SEND_WQE_DS);
-	dseg    = (struct mlx5_wqe_data_seg *)cseg + ds_cnt;
+static inline int mlx5n_sq_build_data_seg(struct mlx5e_sq *sq,
+					  struct mlx5e_tx_wqe_info *wi,
+					  struct mlx5_wqe_data_seg *dseg,
+					  struct sk_buff *skb)
+{
+	dma_addr_t dma_addr = 0;
+	u16 headlen;
+	int i;
 
 	wi->num_dma = 0;
 
-	headlen = skb_len - skb->data_len;
+	headlen = skb->len - skb->data_len;
 	if (headlen) {
-		dma_addr = dma_map_single(sq->pdev, skb_data, headlen,
+		dma_addr = dma_map_single(sq->pdev, skb->data, headlen,
 					  DMA_TO_DEVICE);
 		if (unlikely(dma_mapping_error(sq->pdev, dma_addr)))
 			goto dma_unmap_wqe_err;
@@ -336,57 +307,123 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
 
 		dseg++;
 	}
+	wi->ds_cnt += wi->num_dma;
 
-	ds_cnt += wi->num_dma;
-
-	cseg->opmod_idx_opcode = cpu_to_be32((sq->pc << 8) | opcode);
-	cseg->qpn_ds           = cpu_to_be32((sq->sqn << 8) | ds_cnt);
+	return 0;
 
-	sq->db.txq.skb[pi] = skb;
+dma_unmap_wqe_err:
+	sq->stats.dropped++;
+	mlx5e_dma_unmap_wqe_err(sq, wi->num_dma);
 
-	wi->num_wqebbs = DIV_ROUND_UP(ds_cnt, MLX5_SEND_WQEBB_NUM_DS);
-	sq->pc += wi->num_wqebbs;
+	dev_kfree_skb_any(skb);
+	return -ENOMEM;
+}
 
-	netdev_tx_sent_queue(sq->txq, wi->num_bytes);
+static inline void mlx5n_sq_build_ether_seg(struct mlx5e_sq *sq,
+					    struct mlx5e_tx_wqe_info *wi,
+					    struct mlx5_wqe_eth_seg *eseg,
+					    struct sk_buff *skb)
+{
+	unsigned int num_bytes;
 
-	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
-		skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
+	if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
+		eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM;
+		if (skb->encapsulation) {
+			eseg->cs_flags |= MLX5_ETH_WQE_L3_INNER_CSUM |
+					  MLX5_ETH_WQE_L4_INNER_CSUM;
+			sq->stats.csum_partial_inner++;
+		} else {
+			eseg->cs_flags |= MLX5_ETH_WQE_L4_CSUM;
+		}
+	} else {
+		sq->stats.csum_none++;
+	}
 
-	if (unlikely(!mlx5e_sq_has_room_for(sq, MLX5E_SQ_STOP_ROOM))) {
-		netif_tx_stop_queue(sq->txq);
-		sq->stats.stopped++;
+	if (sq->cc != sq->prev_cc) {
+		sq->prev_cc = sq->cc;
+		sq->bf_budget = (sq->cc == sq->pc) ? MLX5E_SQ_BF_BUDGET : 0;
 	}
 
-	sq->stats.xmit_more += skb->xmit_more;
-	if (!skb->xmit_more || netif_xmit_stopped(sq->txq)) {
-		int bf_sz = 0;
+	wi->opcode = MLX5_OPCODE_SEND;
+	wi->bf = false;
+	wi->ihs = 0;
 
-		if (bf && test_bit(MLX5E_SQ_STATE_BF_ENABLE, &sq->state))
-			bf_sz = wi->num_wqebbs << 3;
+	if (skb_is_gso(skb)) {
+		eseg->mss    = cpu_to_be16(skb_shinfo(skb)->gso_size);
+		wi->opcode      = MLX5_OPCODE_LSO;
 
-		cseg->fm_ce_se = MLX5_WQE_CTRL_CQ_UPDATE;
-		mlx5e_tx_notify_hw(sq, &wqe->ctrl, bf_sz);
+		if (skb->encapsulation) {
+			wi->ihs = skb_inner_transport_offset(skb) + inner_tcp_hdrlen(skb);
+			sq->stats.tso_inner_packets++;
+			sq->stats.tso_inner_bytes += skb->len - wi->ihs;
+		} else {
+			wi->ihs = skb_transport_offset(skb) + tcp_hdrlen(skb);
+			sq->stats.tso_packets++;
+			sq->stats.tso_bytes += skb->len - wi->ihs;
+		}
+
+		num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * wi->ihs;
+	} else {
+		wi->bf = sq->bf_budget &&
+		     !skb->xmit_more &&
+		     !skb_shinfo(skb)->nr_frags;
+		wi->ihs = mlx5e_get_inline_hdr_size(sq, skb, wi->bf);
+		num_bytes = max_t(unsigned int, skb->len, ETH_ZLEN);
 	}
 
-	/* fill sq edge with nops to avoid wqe wrap around */
-	while ((pi = (sq->pc & wq->sz_m1)) > sq->edge) {
-		sq->db.txq.skb[pi] = NULL;
-		mlx5e_send_nop(sq, false);
+	wi->num_bytes = num_bytes;
+
+	if (skb_vlan_tag_present(skb)) {
+		mlx5e_insert_vlan(eseg->inline_hdr_start, skb, wi->ihs, &skb->data,
+				  &skb->len);
+		wi->ihs += VLAN_HLEN;
+	} else {
+		memcpy(eseg->inline_hdr_start, skb->data, wi->ihs);
+		mlx5e_tx_skb_pull_inline(&skb->data, &skb->len, wi->ihs);
 	}
 
-	if (bf)
-		sq->bf_budget--;
+	eseg->inline_hdr_sz = cpu_to_be16(wi->ihs);
+}
 
-	sq->stats.packets++;
-	sq->stats.bytes += num_bytes;
-	return NETDEV_TX_OK;
+static inline void mlx5n_sq_build_datagram_seg(struct mlx5e_sq *sq,
+					       struct mlx5_wqe_datagram_seg *dseg,
+					       struct mlx5_av *av, u32 dqpn, u32 dqkey,
+					       struct sk_buff *skb)
+{
+	memcpy(&dseg->av, av, sizeof(struct mlx5_av));
+	dseg->av.dqp_dct = cpu_to_be32(dqpn | MLX5_EXTENDED_UD_AV);
+	dseg->av.key.qkey.qkey = cpu_to_be32(dqkey);
+}
 
-dma_unmap_wqe_err:
-	sq->stats.dropped++;
-	mlx5e_dma_unmap_wqe_err(sq, wi->num_dma);
+static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
+{
+	struct mlx5_wq_cyc       *wq   = &sq->wq;
+	u16                      pi    = sq->pc & wq->sz_m1;
+	void                     *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
+	struct mlx5e_tx_wqe_info *wi   = &sq->db.txq.wqe_info[pi];
 
-	dev_kfree_skb_any(skb);
+	struct mlx5_wqe_ctrl_seg *ctrl_seg = wqe;
+	struct mlx5_wqe_eth_seg  *ether_seg =
+		wqe + sizeof(struct mlx5_wqe_ctrl_seg);
+	struct mlx5_wqe_data_seg *data_seg;
+
+	memset(wqe, 0, sizeof(struct mlx5_wqe_ctrl_seg) +
+	       sizeof(struct mlx5_wqe_eth_seg));
+
+	mlx5n_sq_build_ether_seg(sq, wi, ether_seg, skb);
+
+	wi->ds_cnt  = (sizeof(struct mlx5_wqe_ctrl_seg) +
+			sizeof(struct mlx5_wqe_eth_seg)) / MLX5_SEND_WQE_DS;
+	wi->ds_cnt += DIV_ROUND_UP(wi->ihs - sizeof(ether_seg->inline_hdr_start),
+			MLX5_SEND_WQE_DS);
+	data_seg = (struct mlx5_wqe_data_seg *)ctrl_seg + wi->ds_cnt;
+
+	if (mlx5n_sq_build_data_seg(sq, wi, data_seg, skb) < 0)
+		goto out;
+
+	mlx5n_sq_fill_ctrl_seg_and_send(sq, wi, ctrl_seg, skb, pi);
 
+out:
 	return NETDEV_TX_OK;
 }
 
-- 
1.8.3.1
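
The value of the split is that the eth/data/ctrl helpers no longer
assume anything about what sits between the control and eth segments.
A small sketch of the head-size arithmetic the shared code has to
accommodate once the IB layout (added later in the series) is in play:

/* Sketch: leading-segment size per link type.  The Ethernet layout is
 * what mlx5e_sq_xmit() above builds; the IB layout adds a UD address
 * vector (datagram segment) and the 16-byte pad in between.
 */
static inline size_t sketch_tx_wqe_head_size(bool ib_link)
{
	size_t head = sizeof(struct mlx5_wqe_ctrl_seg) +
		      sizeof(struct mlx5_wqe_eth_seg);

	if (ib_link)
		head += sizeof(struct mlx5_wqe_datagram_seg) +
			sizeof(struct mlx5_wqe_eth_pad);

	return head;	/* divided by MLX5_SEND_WQE_DS to seed ds_cnt */
}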


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC for accelerated IPoIB 22/26] net/mlx5e: Export send function for IB link type
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (20 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 21/26] net/mlx5e: Refactor TX send flow Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 23/26] net/mlx5e: New function pointer for build_rx_skb is Erez Shitrit
                     ` (5 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


The function will be used by the IB link in order to send packets.

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 55 +++++++++++++++++++++++++
 include/linux/mlx5/driver.h                     |  5 ++-
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 22443ce778ff..fea06be30393 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -435,6 +435,61 @@ netdev_tx_t mlx5e_xmit(struct sk_buff *skb, struct net_device *dev)
 	return mlx5e_sq_xmit(sq, skb);
 }
 
+static int s_ctrl_seg = sizeof(struct mlx5_wqe_ctrl_seg);
+static int s_datagram_seg = sizeof(struct mlx5_wqe_datagram_seg);
+static int s_pad = sizeof(struct mlx5_wqe_eth_pad);
+static int s_eth_seg = sizeof(struct mlx5_wqe_eth_seg);
+static netdev_tx_t mlx5i_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb,
+				 struct mlx5_av *av, u32 dqpn, u32 dqkey)
+{
+	struct mlx5_wq_cyc       *wq   = &sq->wq;
+	u16                      pi    = sq->pc & wq->sz_m1;
+	void                     *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
+	struct mlx5e_tx_wqe_info *wi   = &sq->db.txq.wqe_info[pi];
+
+	struct mlx5_wqe_ctrl_seg *ctrl_seg = wqe;
+	struct mlx5_wqe_datagram_seg *datagram_seg =
+			wqe + s_ctrl_seg;
+	struct mlx5_wqe_eth_pad *pad =
+			(void *)datagram_seg + s_datagram_seg;
+	struct mlx5_wqe_eth_seg  *ether_seg =
+			(void *)pad + s_pad;
+	struct mlx5_wqe_data_seg *data_seg;
+
+	int tot = s_ctrl_seg + s_datagram_seg + s_pad + s_eth_seg;
+
+	memset(wqe, 0, tot);
+
+	mlx5n_sq_build_datagram_seg(sq, datagram_seg, av, dqpn, dqkey, skb);
+
+	mlx5n_sq_build_ether_seg(sq, wi, ether_seg, skb);
+
+	wi->ds_cnt  = tot / MLX5_SEND_WQE_DS;
+	wi->ds_cnt += DIV_ROUND_UP(wi->ihs - sizeof(ether_seg->inline_hdr_start),
+			MLX5_SEND_WQE_DS);
+	data_seg = (struct mlx5_wqe_data_seg *)ctrl_seg + wi->ds_cnt;
+
+	if (mlx5n_sq_build_data_seg(sq, wi, data_seg, skb) < 0)
+		goto out;
+
+	mlx5n_sq_fill_ctrl_seg_and_send(sq, wi, ctrl_seg, skb, pi);
+
+out:
+	return NETDEV_TX_OK;
+}
+
+netdev_tx_t mlx5i_xmit(struct sk_buff *skb, void *p,
+		       struct mlx5_av *av, u32 dqpn, u32 dqkey)
+{
+	struct mlx5e_priv *priv = p;
+	struct mlx5e_sq *sq;
+
+	sq = priv->txq_to_sq_map[skb_get_queue_mapping(skb)];
+
+	return mlx5i_sq_xmit(sq, skb, av, dqpn, dqkey);
+}
+EXPORT_SYMBOL(mlx5i_xmit);
+
 bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 {
 	struct mlx5e_sq *sq;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index f7f4e301fbd2..c60494854190 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1132,6 +1132,8 @@ struct mlx5i_create_ext_param {
 	u32	qpn;
 };
 
+struct mlx5_av;
+
 struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
 				       const char *name,
 				       void (*setup)(struct net_device *dev),
@@ -1140,5 +1142,6 @@ struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
 void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv);
 int mlx5i_close(void *vpriv);
 int mlx5i_open(void *vpriv);
-
+netdev_tx_t mlx5i_xmit(struct sk_buff *skb, void *p, struct mlx5_av *av,
+		       u32 dqpn, u32 dqkey);
 #endif /* MLX5_DRIVER_H */
-- 
1.8.3.1
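
A sketch of the expected caller: the ULP resolves the destination
address vector, QPN and Q_Key itself and hands them to the exported
xmit routine together with the mlx5e private area of the shared
net_device.  The helper that locates that area is hypothetical here.

/* Hypothetical ULP-side send hook (sketch). */
static void ipoib_accel_send(struct net_device *dev, struct sk_buff *skb,
			     struct mlx5_av *av, u32 dqpn, u32 dqkey)
{
	void *mlx5e_part = ipoib_to_mlx5e_priv(dev);	/* placeholder helper */

	mlx5i_xmit(skb, mlx5e_part, av, dqpn, dqkey);
}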


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC for accelerated IPoIB 23/26] net/mlx5e: New function pointer for build_rx_skb is
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (21 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 22/26] net/mlx5e: Export send function for IB link type Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 24/26] net/mlx5e: Change the function that checks the packet type Erez Shitrit
                     ` (4 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Add a build_rx_skb function pointer to the RQ in order to support the
IB link with the same base RX code.

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  6 ++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  7 +++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 10 +++++-----
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index b6758d0b93a5..84de1ca11524 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -298,6 +298,11 @@ typedef int (*mlx5e_fp_alloc_wqe)(struct mlx5e_rq *rq, struct mlx5e_rx_wqe *wqe,
 
 typedef void (*mlx5e_fp_dealloc_wqe)(struct mlx5e_rq *rq, u16 ix);
 
+typedef void (*mlx5e_fp_build_rx_skb)(struct mlx5_cqe64 *cqe,
+				      u32 cqe_bcnt,
+				      struct mlx5e_rq *rq,
+				      struct sk_buff *skb);
+
 struct mlx5e_dma_info {
 	struct page	*page;
 	dma_addr_t	addr;
@@ -367,6 +372,7 @@ struct mlx5e_rq {
 	mlx5e_fp_handle_rx_cqe handle_rx_cqe;
 	mlx5e_fp_alloc_wqe     alloc_wqe;
 	mlx5e_fp_dealloc_wqe   dealloc_wqe;
+	mlx5e_fp_build_rx_skb  build_rx_skb;
 
 	unsigned long          state;
 	int                    ix;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2ca2de9d2578..7fbfdd41c709 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -493,6 +493,12 @@ static int mlx5e_create_rq_umr_mkey(struct mlx5e_rq *rq)
 	return mlx5e_create_umr_mkey(priv, num_mtts, PAGE_SHIFT, &rq->umr_mkey);
 }
 
+/* forward declaration */
+inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
+			       u32 cqe_bcnt,
+			       struct mlx5e_rq *rq,
+			       struct sk_buff *skb);
+
 static int mlx5e_create_rq(struct mlx5e_channel *c,
 			   struct mlx5e_rq_param *param,
 			   struct mlx5e_rq *rq)
@@ -538,6 +544,7 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
 	if (rq->xdp_prog)
 		rq->buff.map_dir = DMA_BIDIRECTIONAL;
 
+	rq->build_rx_skb = mlx5e_build_rx_skb;
 	switch (priv->params.rq_wq_type) {
 	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
 		if (mlx5e_is_vf_vport_rep(priv)) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index ad08e64fee1a..98546b3395df 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -590,10 +590,10 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
 	rq->stats.csum_none++;
 }
 
-static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
-				      u32 cqe_bcnt,
-				      struct mlx5e_rq *rq,
-				      struct sk_buff *skb)
+inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
+			       u32 cqe_bcnt,
+			       struct mlx5e_rq *rq,
+			       struct sk_buff *skb)
 {
 	struct net_device *netdev = rq->netdev;
 	struct mlx5e_tstamp *tstamp = rq->tstamp;
@@ -632,7 +632,7 @@ static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
 {
 	rq->stats.packets++;
 	rq->stats.bytes += cqe_bcnt;
-	mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb);
+	rq->build_rx_skb(cqe, cqe_bcnt, rq, skb);
 }
 
 static inline void mlx5e_xmit_xdp_doorbell(struct mlx5e_sq *sq)
-- 
1.8.3.1
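
A minimal sketch of what the hook enables: RQ setup can install a
link-specific SKB builder while the rest of the completion path stays
shared.  The stub below only demonstrates the required signature; the
real IB handler arrives two patches later.

/* Placeholder RX builder with the mlx5e_fp_build_rx_skb signature. */
static void sketch_build_rx_skb(struct mlx5_cqe64 *cqe, u32 cqe_bcnt,
				struct mlx5e_rq *rq, struct sk_buff *skb)
{
	skb_put(skb, cqe_bcnt);
	skb->protocol = eth_type_trans(skb, rq->netdev);
}

/* Selection at RQ creation time, mirroring mlx5e_create_rq() above. */
static void sketch_install_rx_builder(struct mlx5e_rq *rq, bool eth_link)
{
	rq->build_rx_skb = eth_link ? mlx5e_build_rx_skb
				    : sketch_build_rx_skb;
}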


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC for accelerated IPoIB 24/26] net/mlx5e: Change the function that checks the packet type
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (22 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 23/26] net/mlx5e: New function pointer for build_rx_skb is Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 25/26] net/mlx5e: Add support for build_rx_skb for packet from IB type Erez Shitrit
                     ` (3 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Now we can use it for non-Ethernet packet types (like IB).
After changing the order of the skb processing, the function can check
the skb->protocol field to determine whether the packet is ETH_P_IP or
ETH_P_IPV6.

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 98546b3395df..071a6ecce720 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -549,9 +549,8 @@ static inline void mlx5e_skb_set_hash(struct mlx5_cqe64 *cqe,
 
 static inline bool is_first_ethertype_ip(struct sk_buff *skb)
 {
-	__be16 ethertype = ((struct ethhdr *)skb->data)->h_proto;
-
-	return (ethertype == htons(ETH_P_IP) || ethertype == htons(ETH_P_IPV6));
+	return (skb->protocol == htons(ETH_P_IP) ||
+		skb->protocol == htons(ETH_P_IPV6));
 }
 
 static inline void mlx5e_handle_csum(struct net_device *netdev,
@@ -621,8 +620,8 @@ inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 
 	skb->mark = be32_to_cpu(cqe->sop_drop_qpn) & MLX5E_TC_FLOW_ID_MASK;
 
-	mlx5e_handle_csum(netdev, cqe, rq, skb, !!lro_num_seg);
 	skb->protocol = eth_type_trans(skb, netdev);
+	mlx5e_handle_csum(netdev, cqe, rq, skb, !!lro_num_seg);
 }
 
 static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
-- 
1.8.3.1
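
The reordering works because eth_type_trans() both strips the Ethernet
header and fills skb->protocol, so by the time the checksum handler
runs the check no longer has to peek at raw skb->data; a non-Ethernet
RX builder only needs to set skb->protocol by its own means before
calling mlx5e_handle_csum().  A sketch of the Ethernet-side ordering:

/* Sketch of the ordering this patch relies on (Ethernet path). */
static void sketch_finish_rx_skb(struct net_device *netdev,
				 struct mlx5_cqe64 *cqe,
				 struct mlx5e_rq *rq,
				 struct sk_buff *skb, bool lro)
{
	skb->protocol = eth_type_trans(skb, netdev);	/* sets skb->protocol */
	mlx5e_handle_csum(netdev, cqe, rq, skb, lro);	/* may now rely on it */
}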


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC for accelerated IPoIB 25/26] net/mlx5e: Add support for build_rx_skb for packet from IB type
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (23 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 24/26] net/mlx5e: Change the function that checks the packet type Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 26/26] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops Erez Shitrit
                     ` (2 subsequent siblings)
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


New function that parses and builds the skb for IPoIB traffic.

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  8 ++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 49 +++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 7fbfdd41c709..ffca235a8985 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -499,6 +499,11 @@ inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 			       struct mlx5e_rq *rq,
 			       struct sk_buff *skb);
 
+inline void mlx5i_build_rx_skb(struct mlx5_cqe64 *cqe,
+			       u32 cqe_bcnt,
+			       struct mlx5e_rq *rq,
+			       struct sk_buff *skb);
+
 static int mlx5e_create_rq(struct mlx5e_channel *c,
 			   struct mlx5e_rq_param *param,
 			   struct mlx5e_rq *rq)
@@ -584,6 +589,9 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
 		else
 			rq->handle_rx_cqe = mlx5e_handle_rx_cqe;
 
+		if (MLX5_CAP_GEN(mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
+			rq->build_rx_skb = mlx5i_build_rx_skb;
+
 		rq->alloc_wqe = mlx5e_alloc_rx_wqe;
 		rq->dealloc_wqe = mlx5e_dealloc_rx_wqe;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 071a6ecce720..db3064c4b052 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -624,6 +624,55 @@ inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 	mlx5e_handle_csum(netdev, cqe, rq, skb, !!lro_num_seg);
 }
 
+#define MLX5_IB_GRH_DGID_OFFSET 24
+#define MLX5_IB_GRH_BYTES       40
+#define MLX5_IPOIB_ENCAP_LEN    4
+#define MLX5_GID_SIZE           16
+
+inline void mlx5i_build_rx_skb(struct mlx5_cqe64 *cqe,
+			       u32 cqe_bcnt,
+			       struct mlx5e_rq *rq,
+			       struct sk_buff *skb)
+{
+	struct net_device *netdev = rq->netdev;
+	u8 *dgid;
+	u8 g;
+
+	skb_put(skb, cqe_bcnt);
+
+	g = (be32_to_cpu(cqe->flags_rqpn) >> 28) & 3;
+	dgid = skb->data + MLX5_IB_GRH_DGID_OFFSET;
+	if ((!g) || dgid[0] != 0xff)
+		skb->pkt_type = PACKET_HOST;
+	else if (memcmp(dgid, netdev->broadcast + 4, MLX5_GID_SIZE) == 0)
+		skb->pkt_type = PACKET_BROADCAST;
+	else
+		skb->pkt_type = PACKET_MULTICAST;
+
+	/* TODO: IB/ipoib: Allow mcast packets from other VFs
+	 * 68996a6e760e5c74654723eeb57bf65628ae87f4
+	 */
+
+	skb_pull(skb, MLX5_IB_GRH_BYTES);
+
+	skb->protocol = *((__be16 *)(skb->data));
+
+	mlx5e_handle_csum(netdev, cqe, rq, skb, rq->priv->params.lro_en);
+
+	skb_record_rx_queue(skb, rq->ix);
+
+	if (likely(netdev->features & NETIF_F_RXHASH))
+		mlx5e_skb_set_hash(cqe, skb);
+
+	skb_reset_mac_header(skb);
+	skb_pull(skb, MLX5_IPOIB_ENCAP_LEN);
+
+	++netdev->stats.rx_packets;
+	netdev->stats.rx_bytes += skb->len;
+
+	skb->dev = netdev;
+}
+
 static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
 					 struct mlx5_cqe64 *cqe,
 					 u32 cqe_bcnt,
-- 
1.8.3.1
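
For reference, a sketch of the receive buffer the new builder walks
through, using the offsets implied by the constants in this patch:

/*
 * Sketch of an IPoIB receive buffer as handled by mlx5i_build_rx_skb():
 *
 *   offset 0                         40             44
 *   +---------------------------------+--------------+---------------+
 *   | GRH, 40 bytes (DGID at off. 24) | IPoIB header | IP packet ... |
 *   +---------------------------------+--------------+---------------+
 *
 * The DGID is inspected first to classify the packet (host, broadcast
 * or multicast), skb_pull() then skips the GRH, the first 16 bits of
 * the IPoIB header become skb->protocol, and a final skb_pull() of
 * MLX5_IPOIB_ENCAP_LEN leaves skb->data at the IP header.
 */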


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC for accelerated IPoIB 26/26] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (24 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 25/26] net/mlx5e: Add support for build_rx_skb for packet from IB type Erez Shitrit
@ 2017-03-01 14:02   ` Erez Shitrit
  2017-03-01 18:20   ` [RFC for accelerated IPoIB 00/27] Enhanced mode for IPoIB driver Jason Gunthorpe
  2017-03-01 18:28   ` Jason Gunthorpe
  27 siblings, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-01 14:02 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Erez Shitrit


Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/Makefile         |   2 +-
 drivers/infiniband/hw/mlx5/main.c           |   3 +
 drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c | 289 ++++++++++++++++++++++++++++
 3 files changed, 293 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c

diff --git a/drivers/infiniband/hw/mlx5/Makefile b/drivers/infiniband/hw/mlx5/Makefile
index 90ad2adc752f..0c4caa339565 100644
--- a/drivers/infiniband/hw/mlx5/Makefile
+++ b/drivers/infiniband/hw/mlx5/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_MLX5_INFINIBAND)	+= mlx5_ib.o
 
-mlx5_ib-y :=	main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o gsi.o ib_virt.o cmd.o
+mlx5_ib-y :=	main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o gsi.o ib_virt.o cmd.o mlx5_ipoib_ops.o
 mlx5_ib-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += odp.o
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 4a043cf35b9a..4140c5b2a050 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -72,6 +72,8 @@ enum {
 	MLX5_ATOMIC_SIZE_QP_8BYTES = 1 << 3,
 };
 
+struct ib_ipoib_accel_ops *mlx5_ib_get_ipoib_ops(struct ib_device *device);
+
 static enum rdma_link_layer
 mlx5_port_type_cap_to_rdma_ll(int port_type_cap)
 {
@@ -3436,6 +3438,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.alloc_mr		= mlx5_ib_alloc_mr;
 	dev->ib_dev.map_mr_sg		= mlx5_ib_map_mr_sg;
 	dev->ib_dev.check_mr_status	= mlx5_ib_check_mr_status;
+	dev->ib_dev.get_ipoib_accel_ops = mlx5_ib_get_ipoib_ops;
 	dev->ib_dev.get_port_immutable  = mlx5_port_immutable;
 	dev->ib_dev.get_dev_fw_str      = get_dev_fw_str;
 	if (mlx5_core_is_pf(mdev)) {
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c b/drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c
new file mode 100644
index 000000000000..b8008aca690b
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c
@@ -0,0 +1,289 @@
+/*
+ * Copyright (c) 2013-2015, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <rdma/ib_ipoib_accel_ops.h>
+#include "mlx5_ib.h"
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/fs.h>
+/*FIX ME*/
+#include "../../ulp/ipoib/ipoib.h"
+
+#define IB_DEFAULT_Q_KEY   0xb1b
+
+static inline void *next_netdev_priv(const struct net_device *dev)
+{
+	return (char *)netdev_priv(dev) + sizeof(struct ipoib_dev_priv);
+}
+
+int mlx5_ib_config_ipoib_qp(struct mlx5_ib_dev *ib_dev, struct mlx5_core_qp *qp)
+{
+	u32 *in;
+	struct mlx5_qp_context *context;
+	int inlen;
+	void *addr_path;
+	void *qpc;
+	int ret;
+
+	inlen = MLX5_ST_SZ_BYTES(create_qp_in);
+	in = mlx5_vzalloc(inlen);
+	if (!in)
+		return -ENOMEM;
+
+	qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+	MLX5_SET(qpc, qpc, st, MLX5_QP_ST_UD);
+	MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+	MLX5_SET(qpc, qpc, ulp_stateless_offload_mode,
+		 MLX5_QP_ENHANCED_ULP_STATELESS_MODE);
+
+	addr_path = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
+	MLX5_SET(ads, addr_path, port, 1);
+	MLX5_SET(ads, addr_path, grh, 1);
+
+	ret = mlx5_core_create_qp(ib_dev->mdev, qp, in, inlen);
+	if (ret) {
+		pr_err("Failed creating IPoIB QP err : %d\n", ret);
+		goto clean_inbox;
+	}
+
+	/* QP states */
+	context = kzalloc(sizeof(*context), GFP_KERNEL);
+	if (!context)
+		goto clean_inbox;
+
+	context->flags = cpu_to_be32(MLX5_QP_PM_MIGRATED << 11);
+	context->pri_path.port = 1;
+	context->qkey = cpu_to_be32(IB_DEFAULT_Q_KEY);
+
+	ret = mlx5_core_qp_modify(ib_dev->mdev,
+				  MLX5_CMD_OP_RST2INIT_QP, 0, context, qp);
+	if (ret) {
+		pr_warn("Failed to modify qp RST2INIT, err: %d\n", ret);
+		goto clean_in_modify;
+	}
+	memset(context, 0, sizeof(*context));
+
+	ret = mlx5_core_qp_modify(ib_dev->mdev,
+				  MLX5_CMD_OP_INIT2RTR_QP, 0, context,
+				  qp);
+	if (ret) {
+		pr_warn("Failed to modify qp INIT2RTR, err: %d\n", ret);
+		goto clean_in_modify;
+	}
+
+	ret = mlx5_core_qp_modify(ib_dev->mdev,
+				  MLX5_CMD_OP_RTR2RTS_QP, 0, context,
+				  qp);
+	if (ret) {
+		pr_warn("Failed to modify qp RTR2RTS, err: %d\n", ret);
+		goto clean_in_modify;
+	}
+
+	kvfree(in);
+	kfree(context);
+
+	return 0;
+
+clean_in_modify:
+	kfree(context);
+clean_inbox:
+	kvfree(in);
+	return ret;
+}
+
+static void mlx5_ib_clean_qp(struct mlx5_ib_dev *ib_dev,
+			     struct mlx5_core_qp *qp)
+{
+	mlx5_core_destroy_qp(ib_dev->mdev, qp);
+	kfree(qp);
+}
+
+struct net_device *mlx5_ib_create_netdev(struct ib_device *hca,
+					 const char *name,
+					 void (*setup)(struct net_device *))
+{
+	struct net_device *dev;
+	struct mlx5_ib_dev *ib_dev = to_mdev(hca);
+	struct mlx5_core_qp *qp;
+	struct mlx5i_create_ext_param param;
+	struct ipoib_dev_priv *priv;
+	int ret;
+
+	/* new IPoIB QP */
+	qp = kzalloc(sizeof(*qp), GFP_KERNEL);
+	if (!qp) {
+		pr_warn("Failed to allocate memory for QP\n");
+		return NULL;
+	}
+
+	/* config and move qp to RTS */
+	ret = mlx5_ib_config_ipoib_qp(ib_dev, qp);
+	if (ret) {
+		pr_warn("Failed config IPoIB QP ret: %d\n", ret);
+		goto clean_qp;
+	}
+
+	param.size_base_priv = sizeof(struct ipoib_dev_priv);
+	param.qpn = qp->qpn;
+	dev = mlx5i_create_netdev(ib_dev->mdev, name, setup, &param);
+	if (dev) {
+		priv = netdev_priv(dev);
+		priv->qp_num = qp->qpn;
+		priv->context = qp;
+		pr_debug("%s qpn: %d created\n", __func__, qp->qpn);
+		return dev;
+	}
+
+	pr_err("%s: Failed to create net device\n", __func__);
+
+clean_qp:
+	mlx5_ib_clean_qp(ib_dev, qp);
+	return NULL;
+}
+
+int mlx5_ib_dev_init(struct net_device *dev, int *qp_num)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	void *next_priv = next_netdev_priv(dev);
+	struct ib_device *hca = priv->ca;
+	struct mlx5_ib_dev *ib_dev = to_mdev(hca);
+	int ret;
+
+	ret = mlx5i_attach(ib_dev->mdev, next_priv);
+	if (ret) {
+		pr_err("Failed resources allocation for device: %s ret: %d\n",
+		       dev->name, ret);
+		return ret;
+	}
+
+	*qp_num = priv->qp_num;
+
+	pr_debug("resources allocated for device: %s\n", dev->name);
+
+	return 0;
+}
+
+void mlx5_ib_dev_cleanup(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	void *next_priv = next_netdev_priv(dev);
+	struct ib_device *hca = priv->ca;
+	struct mlx5_ib_dev *ib_dev = to_mdev(hca);
+	struct mlx5_qp_context context;
+	int ret;
+
+	/* detach the qp from flow steering by resetting it */
+	ret = mlx5_core_qp_modify(ib_dev->mdev,
+				  MLX5_CMD_OP_2RST_QP, 0, &context,
+				  (struct mlx5_core_qp *)priv->context);
+	if (ret)
+		pr_err("%s failed (ret: %d) to reset QP\n", __func__, ret);
+
+	mlx5i_detach(ib_dev->mdev, next_priv);
+
+	mlx5_ib_clean_qp(ib_dev, (struct mlx5_core_qp *)priv->context);
+}
+
+int mlx5_ib_dev_open(struct net_device *dev)
+{
+	void *next_priv = next_netdev_priv(dev);
+
+	return mlx5i_open(next_priv);
+}
+
+int mlx5_ib_dev_stop(struct net_device *dev)
+{
+	void *next_priv = next_netdev_priv(dev);
+
+	return mlx5i_close(next_priv);
+}
+
+void mlx5_ib_send(struct net_device *dev, struct sk_buff *skb,
+		  struct ipoib_ah *address, u32 dqpn, u32 dqkey)
+{
+	void *next_priv = next_netdev_priv(dev);
+
+	mlx5i_xmit(skb, next_priv, &to_mah(address->ah)->av, dqpn, dqkey);
+}
+
+int mlx5_ib_attach_mcast(struct net_device *dev, union ib_gid *gid,
+			 u16 lid, int set_qkey)
+{
+	int err;
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_device *hca = priv->ca;
+	struct mlx5_ib_dev *ib_dev = to_mdev(hca);
+
+	pr_debug("%s attaching QPN 0x%x, MGID %pI6\n",
+		 dev->name, priv->qp_num, gid->raw);
+
+	err = mlx5_core_attach_mcg(ib_dev->mdev, gid, priv->qp_num);
+	if (err)
+		pr_err("%s failed attaching QPN 0x%x, MGID %pI6\n",
+		       dev->name, priv->qp_num, gid->raw);
+
+	return err;
+}
+
+int mlx5_ib_dettach_mcast(struct net_device *dev, union ib_gid *gid, u16 lid)
+{
+	int err;
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ib_device *hca = priv->ca;
+	struct mlx5_ib_dev *ib_dev = to_mdev(hca);
+
+	pr_debug("%s detaching QPN 0x%x, MGID %pI6\n",
+		 dev->name, priv->qp_num, gid->raw);
+
+	err = mlx5_core_detach_mcg(ib_dev->mdev, gid, priv->qp_num);
+	if (err)
+		pr_err("%s failed detaching QPN 0x%x, MGID %pI6\n",
+		       dev->name, priv->qp_num, gid->raw);
+
+	return err;
+}
+
+struct ib_ipoib_accel_ops ipoib_ops = {
+	.ib_dev_init = mlx5_ib_dev_init,
+	.ib_dev_cleanup = mlx5_ib_dev_cleanup,
+	.ib_dev_open = mlx5_ib_dev_open,
+	.ib_dev_stop = mlx5_ib_dev_stop,
+	.send = mlx5_ib_send,
+	.create_netdev = mlx5_ib_create_netdev,
+	.attach_mcast = mlx5_ib_attach_mcast,
+	.detach_mcast = mlx5_ib_dettach_mcast
+
+};
+
+struct ib_ipoib_accel_ops *mlx5_ib_get_ipoib_ops(struct ib_device *device)
+{
+	return &ipoib_ops;
+}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFC for accelerated IPoIB 00/27] Enhanced mode for IPoIB driver
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (25 preceding siblings ...)
  2017-03-01 14:02   ` [RFC for accelerated IPoIB 26/26] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops Erez Shitrit
@ 2017-03-01 18:20   ` Jason Gunthorpe
       [not found]     ` <20170301182039.GC14791-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2017-03-01 18:28   ` Jason Gunthorpe
  27 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2017-03-01 18:20 UTC (permalink / raw)
  To: Erez Shitrit
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, Niranjana Vishwanathapura

On Wed, Mar 01, 2017 at 04:02:08PM +0200, Erez Shitrit wrote:
>     /* default accelerating functions, same as before */
>     struct ib_ipoib_accel_ops default_ipoib_accel_ops = {
>             .ib_dev_init = ipoib_dev_init_default,
>             .ib_dev_cleanup = ipoib_dev_uninit_default,
>             .ib_dev_open = ipoib_ib_dev_open_default,
>             .ib_dev_stop = ipoib_ib_dev_stop_default,
>             .send = ipoib_send,
>             .create_netdev = ipoib_create_netdev_default,
>             .attach_mcast = ipoib_mcast_attach,
>             .dettach_mcast = ipoib_mcast_dettach,
>     };

This is so ridiculously close to what opa_vnic is doing, please try to
work with them to figure out some common version of your
'create_netdev':

+struct ib_ipoib_accel_ops {
+
+       /*
+        * HW provider driver creates the net_device for IPoIB.
+        * hca: The current ib device.
+        * name: is the format of the new network device (probably ib%d)
+        */
+       struct net_device * (*create_netdev)(struct ib_device *hca,
+                                            const char *name,
+                                            void (*setup)(struct net_device *));


vs:

@@ -2110,6 +2128,15 @@  struct ib_device {

+	struct net_device *(*alloc_rdma_netdev)(
+					struct ib_device *device,
+					u8 port_num,
+					enum rdma_netdev_t type,
+					const char *name,
+					unsigned char name_assign_type,
+					void (*setup)(struct net_device *));


I've pointed out again and again you two are working on the same
thing.

It is inexplicable to me you couldn't use this opa_vnic patch as a
starting point for the ipoib api:

https://patchwork.kernel.org/patch/9587819/

I would propose you add a RDMA_NETDEV_IPOIB to their stuff and a
rdma_netdev_get_ipoib_ops(struct net_device *) to get your
rdma_netdev_ipoib_ops.
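
Roughly something like this (just a sketch -- the ops struct and helper
names below are illustrative, and where the ops pointer actually lives
is up for discussion):

enum rdma_netdev_t {
	RDMA_NETDEV_OPA_VNIC,
	RDMA_NETDEV_IPOIB,	/* new type requested via alloc_rdma_netdev() */
};

/* filled in by the low level driver for an ipoib-type rdma netdev */
struct rdma_netdev_ipoib_ops;

/* ipoib asks the netdev it got back from the driver for its ops */
struct rdma_netdev_ipoib_ops *
rdma_netdev_get_ipoib_ops(struct net_device *netdev);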

I also wonder why there are so many ops; for instance, the approach of
having the driver set .ndo_stop and others seems better.

ipoib could either 'wrapper' that pointer after calling
alloc_rdma_netdev or it could export enough utility functions for the
driver to implement the needed behavior. (eg this may be more in line
with the "midlayer mistake" thinking of
https://lwn.net/Articles/336262/)

Maybe Niranjana has some suggestions based on learning in vnic, they
tried a few different schemes.

Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC for accelerated IPoIB 00/27] Enhanced mode for IPoIB driver
       [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (26 preceding siblings ...)
  2017-03-01 18:20   ` [RFC for accelerated IPoIB 00/27] Enhanced mode for IPoIB driver Jason Gunthorpe
@ 2017-03-01 18:28   ` Jason Gunthorpe
       [not found]     ` <20170301182833.GD14791-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  27 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2017-03-01 18:28 UTC (permalink / raw)
  To: Erez Shitrit
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w

On Wed, Mar 01, 2017 at 04:02:08PM +0200, Erez Shitrit wrote:
>     However, New features are exisiting in common network interfaces that
>     are very hard to implement in IPoIB interfaces while it uses the RDMA
>     layer, examples include TSS and RSS, tunneling offloads, and XDP.
>     Rather than continuously porting IP network interface developments into
>     the RDMA stack, we propose adding an abstract network data-path interfaces
>     to RDMA devices.

I should also say that we have discussed this for a while and I am
broadly in favor of this approach of delegating IPoIB skb transmit
directly to the low level driver.

There are too many optimizations that do not fit sanely into verbs,
and the entire overhead of building a verbs wr and posting it then
translating that to a driver wr is unnecessary layering.

Longer term, it might be quite nice if the verbs-specific part of
ipoib was cleanly split out into the 'default' implementation of your
get_ipoib_accel_ops.

Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC for accelerated IPoIB 00/27] Enhanced mode for IPoIB driver
       [not found]     ` <20170301182039.GC14791-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-03-02 19:13       ` Erez Shitrit
       [not found]         ` <CAAk-MO9tAHioaSXv8MPu=Kf3QSxjuQhAt6vYRVCzuNriXkm-+A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Erez Shitrit @ 2017-03-02 19:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky,
	Niranjana Vishwanathapura

On Wed, Mar 1, 2017 at 8:20 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Wed, Mar 01, 2017 at 04:02:08PM +0200, Erez Shitrit wrote:
>>     /* default accelerating functions, same as before */
>>     struct ib_ipoib_accel_ops default_ipoib_accel_ops = {
>>             .ib_dev_init = ipoib_dev_init_default,
>>             .ib_dev_cleanup = ipoib_dev_uninit_default,
>>             .ib_dev_open = ipoib_ib_dev_open_default,
>>             .ib_dev_stop = ipoib_ib_dev_stop_default,
>>             .send = ipoib_send,
>>             .create_netdev = ipoib_create_netdev_default,
>>             .attach_mcast = ipoib_mcast_attach,
>>             .dettach_mcast = ipoib_mcast_dettach,
>>     };
>
> This is so ridiculously close to what opa_vnic is doing, please try to
> work with them to figure out some common version of your
> 'create_netdev':
>
> +struct ib_ipoib_accel_ops {
> +
> +       /*
> +        * HW provider driver creates the net_device for IPoIB.
> +        * hca: The current ib device.
> +        * name: is the format of the new network device (probably ib%d)
> +        */
> +       struct net_device * (*create_netdev)(struct ib_device *hca,
> +                                            const char *name,
> +                                            void (*setup)(struct net_device *));
>
>
> vs:
>
> @@ -2110,6 +2128,15 @@  struct ib_device {
>
> +       struct net_device *(*alloc_rdma_netdev)(
> +                                       struct ib_device *device,
> +                                       u8 port_num,
> +                                       enum rdma_netdev_t type,
> +                                       const char *name,
> +                                       unsigned char name_assign_type,
> +                                       void (*setup)(struct net_device *));
>
>
> I've pointed out again and again you two are working on the same
> thing.
>
> It is inexplicable to me you couldn't use this opa_vnic patch as a
> starting point for the ipoib api:
>
> https://patchwork.kernel.org/patch/9587819/
>
> I would propose you add a RDMA_NETDEV_IPOIB to their stuff and a
> rdma_netdev_get_ipoib_ops(struct net_device *) to get your
> rdma_netdev_ipoib_ops.
>

We agree that the two implementations look similar in some ways,
and we think that in the end we can define one API that will work for
both of us.
But we need to remember that IPoIB acceleration and VNIC are coming
to solve different issues.

Anyway, it isn't right for us to use the API exactly as it is in the
VNIC code; there are a few differences that led us to start with a
different set of functions:

1. We added acceleration to the IPoIB driver, which is a generic
network device driver and should work over any HW-specific vendor
driver, without exposing the way the acceleration is done beneath it,
so we preferred not to tie the implementation to the netdev ndo notion.
The VNIC driver is free to do whatever it likes because it "knows"
what is going on in the low-level driver (which is from the same
vendor).
Also, we tried to keep the IPoIB code as it was before; the only change
is separating control from data-path, and that determined the meaning
of the functions we exported: only data-path or HW resource handling.

2. We wanted to keep the IPoIB code as clean as we can, so we created
a "default acceleration" that works the same way as before. The idea
behind that is to separate the HW-specific parts from the
control/management area, and that is done better/more simply with the
set of functions that are exposed.

3. We are using some functions that are not part of the ndo calls or
are used differently. For example:
we need functions to attach/detach mcast, and we are using a send
function that needs to get the ah object. We could do some hacking but
we think it is cleaner this way.

I think that we can change the create_netdev API to be like it is in the
VNIC code, but we would like to keep the rest of the functions in our
struct and suggest that the VNIC driver use it in order to get the
pointer to create_netdev. We think that a struct of pointers to
functions can be flexible for future needs with no need to consider the
ndo existence.

Meaning (roughly as sketched below):
one verb that exposes a struct of functions.
For IPoIB we will use all of them.
For VNIC it will use create_netdev only.
Both of the drivers are able to use them for their specific needs.
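
A minimal sketch of that verb, mirroring what patch 26 already does for
mlx5 (mlx5_ib_get_ipoib_ops); exactly where the verb hangs off
ib_device is an assumption here:

/* one verb on the ib device exposing the whole function struct */
struct ib_ipoib_accel_ops *(*get_ipoib_accel_ops)(struct ib_device *device);

/* IPoIB would use every op in the returned struct;
 * VNIC would only need ops->create_netdev().
 */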

> I also wonder why there are so many ops, for instance the approach of
> having the driver set .ndo_stop and others seems better.
>
> ipoib could either 'wrapper' that pointer after calling
> alloc_rdma_netdev or it could export enough utility functions for the
> driver to implement the needed behavior. (eg this may be more in line
> with the "midlayer mistake" thinking of
> https://lwn.net/Articles/336262/)
>
> Maybe Niranjana has some suggestions based on learning in vnic, they
> tried a few different schemes.
>
> Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC for accelerated IPoIB 00/27] Enhanced mode for IPoIB driver
       [not found]     ` <20170301182833.GD14791-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-03-02 19:17       ` Erez Shitrit
  2017-03-02 20:30       ` ira.weiny
  1 sibling, 0 replies; 34+ messages in thread
From: Erez Shitrit @ 2017-03-02 19:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky

On Wed, Mar 1, 2017 at 8:28 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Wed, Mar 01, 2017 at 04:02:08PM +0200, Erez Shitrit wrote:
>>     However, New features are exisiting in common network interfaces that
>>     are very hard to implement in IPoIB interfaces while it uses the RDMA
>>     layer, examples include TSS and RSS, tunneling offloads, and XDP.
>>     Rather than continuously porting IP network interface developments into
>>     the RDMA stack, we propose adding an abstract network data-path interfaces
>>     to RDMA devices.
>
> I should also say that we have discussed this for a while and I am
> broadly in favor of this approach of delegating IPoIB skb transmit
> directly to the low level driver.
>
> There are too many optimizations that do not fit sanely into verbs,
> and the entire overhead of building a verbs wr and posting it then
> translating that to a driver wr is unnecessary layering.
>
> Longer term, it might be quite nice if the part verbs specific part of
> ipoib was cleanly split out into the 'default' implementation of your
> get_ipoib_accel_ops.

That is what we tried to do here (hopefully..): we tried to keep the
separation of the control plane from the data-path and verbs.
The acceleration functions are only for data-path or HW resource handling.

>
> Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC for accelerated IPoIB 00/27] Enhanced mode for IPoIB driver
       [not found]         ` <CAAk-MO9tAHioaSXv8MPu=Kf3QSxjuQhAt6vYRVCzuNriXkm-+A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-03-02 20:06           ` Jason Gunthorpe
       [not found]             ` <20170302200619.GA17530-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2017-03-02 20:06 UTC (permalink / raw)
  To: Erez Shitrit
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky,
	Niranjana Vishwanathapura

On Thu, Mar 02, 2017 at 09:13:33PM +0200, Erez Shitrit wrote:

> but, we need to remember that IPoIB acceleration and VNIC are coming
> to solve different issues.

Seriously? Have you read the vnic patches and looked at what they are
doing with their create_netdev and related?

It is *exactly* the same problem - the low-level driver has
accelerations that apply to skbs that are slow/impossible to use over
verbs.

There is no 'magic' going on there; the vnic ulp plays exactly the
same role as ipoib in arbitrating the control plane side, while the
stuff under create_netdev is concerned entirely with accelerated rx/tx
of skbs.

ipoib is entirely the same, and needs to converge on the same generic
interface for shuffling skbs (which is netdev ndo) and a generic
interface to provide any special functions like attach/detach.

Most of your comments, and those from others at Mellanox, make me think
you have not really internalized what the vnic patches that add the
create_netdev api are all about.

> 2. we wanted to keep the IPoIB code as clean as we can, so we created
> a "default acceleration" that works the same way as it was before, the
> idea that leads that is to separate the HW specific from the
> control/management area, and that is done better/simpler with the set
> of functions that are exposed.

Yes, this is the right way to go.

> 3. We are using some functions that are not part of the ndo calls or
> are used differently for example:

What these patches do is not consistent with the rest of the stack.

Niranjana chose to add control functions directly to 'rdma_netdev'; I
think that is a reasonable choice and you should do the same:

+struct rdma_netdev {
+	void *clnt_priv;
+
+	/* control functions */
+	void (*set_id)(struct net_device *netdev, int id);
+};

Add ipoib_attach_mcast/etc

Maybe propose to Niranjana that this should be a struct
rdma_netdev_ops pointer.

For everything else I would use ndo ops directly and not involve
the rdma side.
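
I.e. something like this (a sketch only; set_id and clnt_priv are from
the vnic patch, the mcast signatures are copied from your
ib_ipoib_accel_ops, and the rdma_netdev_ops name is just a suggestion):

struct rdma_netdev_ops {
	void (*set_id)(struct net_device *netdev, int id);
	/* ipoib control functions */
	int (*attach_mcast)(struct net_device *netdev, union ib_gid *gid,
			    u16 lid, int set_qkey);
	int (*detach_mcast)(struct net_device *netdev, union ib_gid *gid,
			    u16 lid);
};

struct rdma_netdev {
	void *clnt_priv;
	const struct rdma_netdev_ops *ops;	/* control functions */
};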

> I think that we can change the create_netdev api to be like is in the
> VNIC code, but we would like to keep the rest of the functions in
> our

Of course you can, it is the same function!

Both patchsets need to adopt the same scheme for dealing with 'priv'
data in the netdev, and the same signature for the create_netdev()
driver op. The vnic approach is reasonable here and provides a
dedicated priv for the driver and a second priv for the management
layer.

You should start here:

https://patchwork.kernel.org/patch/9587819/

And propose any fundamental modifications to Niranjana.

For instance, ipoib should define a header like 'opib_vnic.h' that
specifies ipoib versions of opa_vnic_rdma_netdev, opa_vnic_priv,
opa_vnic_dev_priv, and so on.

> struct and suggest the VNIC driver will use it in order to get the
> pointer of the create_netdev, We think that struct of pointers to
> functions can be flexible to future needs with no need to consider the
> ndo existence.

I think that misses the point; there are many ndo ops that are
relevant only to the low level driver (eg ndo_select_queue,
ndo_features_check, etc) and those should just be directly hooked by
the low level driver without involvement from the ipoib layer.

For instance, instead of having ipoib ndo_start_xmit call an
ops->start(), it is more consistent with netdev for the driver to
provide ndo_start_xmit and call ipoib_get_ah() to get the path
information.

This broadly makes more sense because the driver might want to return
NETDEV_TX_BUSY right away and doesn't need to be forced to do a
mandatory AH lookup by the ipoib wrapper.
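
E.g. a very rough sketch of the driver side (ipoib_get_ah() and the
ring-full check are assumptions about helpers ipoib and the driver
would need, not existing code; mlx5i_xmit() and next_netdev_priv() are
from the posted patches):

static netdev_tx_t mlx5i_ndo_start_xmit(struct sk_buff *skb,
					struct net_device *dev)
{
	struct ipoib_ah *ah;
	u32 dqpn, dqkey;

	/* the driver can push back before any ipoib work is done */
	if (unlikely(mlx5i_tx_ring_full(dev)))
		return NETDEV_TX_BUSY;

	/* only now ask ipoib for the path information */
	ah = ipoib_get_ah(dev, skb, &dqpn, &dqkey);
	if (!ah) {
		dev_kfree_skb_any(skb);
		return NETDEV_TX_OK;
	}

	mlx5i_xmit(skb, next_netdev_priv(dev), &to_mah(ah->ah)->av,
		   dqpn, dqkey);
	return NETDEV_TX_OK;
}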

In general wrapping the ndo calls in ipoib is probably not a great
idea. (same comment applies to vnic)

Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC for accelerated IPoIB 00/27] Enhanced mode for IPoIB driver
       [not found]     ` <20170301182833.GD14791-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2017-03-02 19:17       ` Erez Shitrit
@ 2017-03-02 20:30       ` ira.weiny
  1 sibling, 0 replies; 34+ messages in thread
From: ira.weiny @ 2017-03-02 20:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Erez Shitrit, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w

On Wed, Mar 01, 2017 at 11:28:33AM -0700, Jason Gunthorpe wrote:
> On Wed, Mar 01, 2017 at 04:02:08PM +0200, Erez Shitrit wrote:
> >     However, New features are exisiting in common network interfaces that
> >     are very hard to implement in IPoIB interfaces while it uses the RDMA
> >     layer, examples include TSS and RSS, tunneling offloads, and XDP.
> >     Rather than continuously porting IP network interface developments into
> >     the RDMA stack, we propose adding an abstract network data-path interfaces
> >     to RDMA devices.
> 
> I should also say that we have discussed this for a while and I am
> broadly in favor of this approach of delegating IPoIB skb transmit
> directly to the low level driver.

I agree.

> 
> There are too many optimizations that do not fit sanely into verbs,
> and the entire overhead of building a verbs wr and posting it then
> translating that to a driver wr is unnecessary layering.

Yes, Niranjana has more details but the idea with the new rdma_netdev is that
the type defines what skbs the driver can expect and the low level driver does
the correct optimizations if the rdma_netdev is used.

I assume none of this applies to Connected Mode?

> 
> Longer term, it might be quite nice if the part verbs specific part of
> ipoib was cleanly split out into the 'default' implementation of your
> get_ipoib_accel_ops.

This seems reasonable,
Ira


> 
> Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC for accelerated IPoIB 00/27] Enhanced mode for IPoIB driver
       [not found]             ` <20170302200619.GA17530-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-03-03  0:14               ` Vishwanathapura, Niranjana
  0 siblings, 0 replies; 34+ messages in thread
From: Vishwanathapura, Niranjana @ 2017-03-03  0:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Erez Shitrit, Erez Shitrit, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	Leon Romanovsky

Hi,

I haven't gone through the patches in detail yet.
Jason already responded regarding using 'rdma netdev' for this use case.
I agree with it and am adding a couple of points to that here.

On Thu, Mar 02, 2017 at 01:06:19PM -0700, Jason Gunthorpe wrote:
>On Thu, Mar 02, 2017 at 09:13:33PM +0200, Erez Shitrit wrote:
>
>> 3. We are using some functions that are not part of the ndo calls or
>> are used differently for example:
>
>What these patches do is not consistent with the rest of the stack.
>
>Niranjana choose to add control functions directly to 'rdma_netdev', I
>think that is a reasonable choice and you should do the same:
>
>+struct rdma_netdev {
>+	void *clnt_priv;
>+
>+	/* control functions */
>+	void (*set_id)(struct net_device *netdev, int id);
>+};
>
>Add ipoib_attach_mcast/etc
>
>Maybe propose to Niranjana that this should be a struct
>rdma_netdev_ops pointer.
>
>For everything else I would use ndo ops directly and do not involve
>the rdma side.
>

Many of the operations in ib_ipoib_accel_ops can use ndo calls or can easily
be adapted to use them.

>> I think that we can change the create_netdev api to be like is in the
>> VNIC code, but we would like to keep the rest of the functions in
>> our
>
>Of course you can, it is the same function!
>
>Both patchsets need to adopt the same scheme for dealing with 'priv'
>data in the netdev, and the same signature for the create_netdev()
>driver op. The vnic approach is reasonable here and provides a
>dedicated priv for the driver and a second priv for the management
>layer.
>
>You should start here:
>
>https://patchwork.kernel.org/patch/9587819/
>
>And propose any fundamental modifications to Niranjana.
>
>For instance, ipoib should define a header like 'opib_vnic.h' that
>specifies ipoib versions of opa_vnic_rdma_netdev, opa_vnic_priv,
>opa_vnic_dev_priv, and so on.
>
>> struct and suggest the VNIC driver will use it in order to get the
>> pointer of the create_netdev, We think that struct of pointers to
>> functions can be flexible to future needs with no need to consider the
>> ndo existence.
>
>I think that misses the point, there many ndo ops that are
>relavent only to the low level driver (eg ndo_select_queue,
>ndo_features_check, etc) and those should just be directly hooked by
>the low level driver without involvement from the ipoib layer.
>
>For instance instead of having ipoib ndo_start_xmit call a
>ops->start(), it is more consistent with netdev for the driver to
>provide ndo_start_xmit and call ipoib_get_ah() to get the path
>information.
>
>This broadly makes more sense because the driver might want to return
>NETDEV_TX_BUSY right away and doesn't need to be forced to do a
>mandatory AH lookup by the ipoib wrapper.
>
>In general wrapping the ndo calls in ipoib is probably not a great
>idea. (same comment applies to vnic)
>

Having OPA_VNIC call a function out of ib_ipoib_accel_ops is not the right
design at all.
'rdma netdev' is a generic netdev interface which can be used for multiple
use cases.
Both OPA_VNIC and this should use the 'rdma netdev' interface. OPA_VNIC is
already doing so by defining RDMA_NETDEV_OPA_VNIC.
The control ops should be associated with the 'rdma netdev' and not the
other way around.

You can put the control ops in the ipoib wrapper around 'rdma_netdev'
(opa_vnic defines an opa_vnic_rdma_netdev structure; ipoib can define a
similar one) or perhaps directly under the 'rdma_netdev' structure.
I see that in PATCH#26 the mlx5 driver is accessing the ipoib ULP's private
data structure, which it shouldn't. VNIC using rdma netdev clearly
distinguishes between client and device private data structures.
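
For example, a rough sketch following the opa_vnic pattern (the names
below are illustrative, not taken from either patch set):

/* ipoib wrapper around the generic rdma netdev, like opa_vnic_rdma_netdev */
struct ipoib_rdma_netdev {
	struct rdma_netdev rn;	/* must stay first in the netdev priv */
	/* ipoib-specific control ops would hang here or off rn */
};

/* the ULP's private data is reached only through rn.clnt_priv, so the
 * low level driver never touches ipoib_dev_priv directly
 */
static inline void *ipoib_clnt_priv(struct net_device *dev)
{
	struct ipoib_rdma_netdev *ird = netdev_priv(dev);

	return ird->rn.clnt_priv;
}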

Niranjana

>Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2017-03-03  0:14 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-01 14:02 [RFC for accelerated IPoIB 00/27] Enhanced mode for IPoIB driver Erez Shitrit
     [not found] ` <1488376954-8346-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-03-01 14:02   ` [RFC for accelerated IPoIB 01/26] IB/ipoib: Separate control and data related initializations Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 02/26] IB/ipoib: separate control from HW operation on ipoib_open/stop ndo Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 03/26] IB/ipoib: Rename qpn to dqpn in ipoib_send and post_send functions Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 04/26] IB/verb: Add ipoib_options struct and API Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 05/26] IB/ipoib: Support ipoib acceleration options callbacks Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 06/26] IB/ipoib: Add context to ipoib to be used in acceleration layer Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 07/26] hw/mlx5: Add New bit to check over QP creation Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 08/26] linux/mlx5/mlx5_ifc.h: Add underlay_qpn field to PRM objects Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 09/26] net/mlx5e: Refactor EN code to support IB link Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 10/26] net/mlx5e: Creating and Destroying flow-steering tables for " Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 11/26] net/mlx5e: Support netdevice creation for IB link type Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 12/26] net/mlx5e: Refactor attach_netdev API Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 13/26] net/mlx5e: Use underlay_qpn in tis creation Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 14/26] net/mlx5e: Export resource creation function to be used in IB link Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 15/26] net/mlx5: Enable flow-steering for " Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 16/26] net/mlx5e: Enhanced flow table creation to support ETH and IB links Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 17/26] net/mlx5e: Change cleanup API in order to enable IB link Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 18/26] net/mlx5e: Change mlx5e_open_locked and mlx5e_close_locked api Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 19/26] net/mlx5e: Export open/close api for IB link Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 20/26] include/linux/mlx5: Add mlx5_wqe_eth_pad and enhanced-ipoib-qp-mode Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 21/26] net/mlx5e: Refactor TX send flow Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 22/26] net/mlx5e: Export send function for IB link type Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 23/26] net/mlx5e: New function pointer for build_rx_skb is Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 24/26] net/mlx5e: Change the function that checks the packet type Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 25/26] net/mlx5e: Add support for build_rx_skb for packet from IB type Erez Shitrit
2017-03-01 14:02   ` [RFC for accelerated IPoIB 26/26] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops Erez Shitrit
2017-03-01 18:20   ` [RFC for accelerated IPoIB 00/27] Enhanced mode for IPoIB driver Jason Gunthorpe
     [not found]     ` <20170301182039.GC14791-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-03-02 19:13       ` Erez Shitrit
     [not found]         ` <CAAk-MO9tAHioaSXv8MPu=Kf3QSxjuQhAt6vYRVCzuNriXkm-+A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-02 20:06           ` Jason Gunthorpe
     [not found]             ` <20170302200619.GA17530-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-03-03  0:14               ` Vishwanathapura, Niranjana
2017-03-01 18:28   ` Jason Gunthorpe
     [not found]     ` <20170301182833.GD14791-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-03-02 19:17       ` Erez Shitrit
2017-03-02 20:30       ` ira.weiny
