* [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver
@ 2017-03-13 18:31 Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 01/25] IB/ipoib: Separate control and data related initializations Erez Shitrit
From: Erez Shitrit @ 2017-03-13 18:31 UTC
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit

    The IPoIB protocol encapsulates IP packets over Infiniband datagrams.
    As a direct RDMA Upper Layer Protocol (ULP), IPoIB cannot support HW
    features that are specific to the IP protocol stack.

    Nevertheless, RDMA interfaces have been extended to support some of the
    prominent IP offload features, such as TCP/UDP checksum and TSO.
    This provided a reasonable performance gain for IPoIB, but it is still
    insufficient to cope with the increasing demand for network bandwidth.

    However, common network interfaces now offer new features that are very
    hard to implement in IPoIB while it is layered on top of the RDMA stack;
    examples include TSS and RSS, tunneling offloads, and XDP.
    Rather than continuously porting IP network interface developments into
    the RDMA stack, we propose adding an abstract network data-path interface
    to RDMA devices.

    In order to present a consistent interface to users, the IPoIB ULP
    continues to represent the network device to the IP stack.
    The common code also manages the IPoIB control plane, such as resolving
    path queries and registering to multicast groups.
    Data path operations are forwarded to devices that implement the new
    API, or fall back to the standard implementation otherwise.
    Using the foregoing approach, we show how IPoIB closes the performance
    gap compared to state-of-the-art Ethernet network interfaces.

    The implementation exposes a struct that holds data members and a set
    of functions used by network interfaces, such as create, delete, init HW
    resources, send, and attach/detach multicast to a QP.
    This set of functions is encapsulated in the new struct, which the
    specific HW layer may or may not provide.

    The IPoIB code will be adapted to enable the option of accelerating the
    network interface, but it will work as before if the HW below does not
    support the acceleration.
    Each HW vendor can either supply the acceleration for IPoIB or leave
    IPoIB working as before.
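As a rough illustration, the struct-plus-fallback idea can be sketched in user-space C: the HW layer either supplies a table of data-path callbacks or leaves it NULL, and IPoIB then uses its generic implementation. The struct `ipoib_accel_ops_sketch`, its members, the `generic_*` stubs and `select_ops()` are hypothetical names chosen for this sketch; they are not the API added by this series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical table of data-path callbacks (illustrative names only).
 * A HW driver that supports acceleration fills it in; others leave it NULL.
 */
struct ipoib_accel_ops_sketch {
	const char *name;
	int (*create)(void *dev);
	void (*destroy)(void *dev);
	int (*init_hw)(void *dev);
	int (*send)(void *dev, const void *pkt, size_t len, unsigned int dqpn);
	int (*mcast_attach)(void *dev, const unsigned char gid[16]);
	int (*mcast_detach)(void *dev, const unsigned char gid[16]);
};

/* Generic IPoIB implementation, used when the HW layer provides no table. */
static int generic_create(void *dev) { (void)dev; return 0; }
static void generic_destroy(void *dev) { (void)dev; }
static int generic_init_hw(void *dev) { (void)dev; return 0; }
static int generic_send(void *dev, const void *pkt, size_t len,
			unsigned int dqpn)
{
	(void)dev; (void)pkt; (void)dqpn;
	return (int)len; /* pretend the whole packet was posted */
}
static int generic_mcast(void *dev, const unsigned char gid[16])
{
	(void)dev; (void)gid;
	return 0;
}

static const struct ipoib_accel_ops_sketch generic_ops = {
	.name = "generic",
	.create = generic_create,
	.destroy = generic_destroy,
	.init_hw = generic_init_hw,
	.send = generic_send,
	.mcast_attach = generic_mcast,
	.mcast_detach = generic_mcast,
};

/* Prefer the vendor table; otherwise fall back to the generic code path. */
static const struct ipoib_accel_ops_sketch *
select_ops(const struct ipoib_accel_ops_sketch *vendor_ops)
{
	const struct ipoib_accel_ops_sketch *ops =
		vendor_ops ? vendor_ops : &generic_ops;

	assert(ops->send != NULL); /* every selected table must be able to transmit */
	return ops;
}
```

A caller would resolve the table once, e.g. `ops = select_ops(vendor_ops)` at device creation, and then call `ops->send(...)` on the data path without re-checking which implementation is in use.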

    TBD:
	1. A few functions in the API might change, at least the send functions, which will no longer use the ipoib_ah struct.
	2. Init/cleanup are currently separate functions; they may later be moved into the ndo_ops struct.
	3. The low-level functions will be redesigned to reduce the number of functions exported from the mlx5_core layer to the IB layer.

Changes from v0:
----------------
1. Use the vnic/hfi API as a base for the new design/impl.
2. Change the low level driver to support the new struct. 


Erez Shitrit (25):
  IB/ipoib: Separate control and data related initializations
  IB/ipoib: separate control from HW operation on ipoib_open/stop ndo
  IB/ipoib: Rename qpn to dqpn in ipoib_send and post_send functions
  IB/verb: Add ipoib_options struct and API
  IB/ipoib: Support ipoib acceleration options callbacks
  hw/mlx5: Add New bit to check over QP creation
  linux/mlx5/mlx5_ifc.h: Add underlay_qpn field to PRM objects
  net/mlx5e: Refactor EN code to support IB link
  net/mlx5e: Creating and Destroying flow-steering tables for IB link
  net/mlx5e: Support netdevice creation for IB link type
  net/mlx5e: Refactor attach_netdev API
  net/mlx5e: Use underlay_qpn in tis creation
  net/mlx5e: Export resource creation function to be used in IB link
  net/mlx5: Enable flow-steering for IB link
  net/mlx5e: Enhanced flow table creation to support ETH and IB links.
  net/mlx5e: Change cleanup API in order to enable IB link
  net/mlx5e: Change mlx5e_open_locked and mlx5e_close_locked api
  net/mlx5e: Export open/close api for IB link
  include/linux/mlx5: Add mlx5_wqe_eth_pad and enhanced-ipoib-qp-mode
  net/mlx5e: Refactor TX send flow
  net/mlx5e: Export send function for IB link type
  net/mlx5e: New function pointer for build_rx_skb is
  net/mlx5e: Change the function that checks the packet type
  net/mlx5e: Add support for build_rx_skb for packet from IB type
  mlx5_ib: skeleton for mlx5_ib to support ipoib_ops

 drivers/infiniband/hw/mlx5/Makefile                |   2 +-
 drivers/infiniband/hw/mlx5/main.c                  |  10 +
 drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c        | 289 ++++++++++++++
 drivers/infiniband/hw/mlx5/qp.c                    |   5 +-
 drivers/infiniband/ulp/ipoib/ipoib.h               |  35 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c            |  66 ++--
 drivers/infiniband/ulp/ipoib/ipoib_ethtool.c       |   6 +-
 drivers/infiniband/ulp/ipoib/ipoib_fs.c            |   4 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c            | 316 +++++++--------
 drivers/infiniband/ulp/ipoib/ipoib_main.c          | 299 ++++++++++----
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c     |  39 +-
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c       |  12 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c         |  62 +--
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c          |   9 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  23 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  |  12 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  24 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c    |  66 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 432 ++++++++++++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  21 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  70 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c    | 292 +++++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   9 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  19 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |   8 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |  77 ++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/fw.c       |   3 +-
 include/linux/mlx5/driver.h                        |  18 +
 include/linux/mlx5/fs.h                            |  16 +-
 include/linux/mlx5/mlx5_ifc.h                      |  11 +-
 include/linux/mlx5/qp.h                            |   8 +
 include/rdma/ib_ipoib_accel_ops.h                  |  59 +++
 include/rdma/ib_verbs.h                            |  36 ++
 34 files changed, 1720 insertions(+), 639 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c
 create mode 100644 include/rdma/ib_ipoib_accel_ops.h

-- 
1.8.3.1

* [RFC v1 for accelerated IPoIB 01/25] IB/ipoib: Separate control and data related initializations
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
@ 2017-03-13 18:31 ` Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 02/25] IB/ipoib: separate control from HW operation on ipoib_open/stop ndo Erez Shitrit
From: Erez Shitrit @ 2017-03-13 18:31 UTC
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


This patch prepares the init and teardown flows so that they can be
called through ipoib_options function pointers.
It arranges that area of code as follows:
All operations which deal with resource allocation/deletion are done
in one place.
All operations that are control oriented, meaning that they are not
tied to the specific HW beneath, are done in a separate place.

The operations that allocate HW resources are now in
ipoib_dev_init_default, and the deletion of all those resources is in
ipoib_dev_uninit_default. The only exception is the creation of the PD
object, which is used both for resource allocation (create QP, etc.) and
for control flows such as creating an AH.

It also:
Moves the creation of rx_ring and tx_ring into the resource allocation
area.
Moves the call to ipoib_ib_dev_open, which opens the device, out of
dev_init (which creates resources) and into the control area.
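The resulting split, and the reverse-order goto unwinding used by the new
ipoib_dev_init(), can be modelled in user-space C as below. This is a
simplified sketch, not the driver code: struct sketch_priv, sketch_alloc()
and the failure-injection counters are hypothetical stand-ins for the work
queue, PD, rings and neighbour table, and the real error path differs in
details (for example, ipoib_ib_dev_cleanup() also releases the PD).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for priv->wq, priv->pd, the rings and the neigh table. */
struct sketch_priv {
	void *wq, *pd, *rx_ring, *tx_ring, *neigh_tbl;
};

static int live_allocs;  /* allocations not yet freed */
static int fail_at = -1; /* index of the allocation to fail, -1 for none */
static int alloc_idx;    /* allocations attempted so far */

static void *sketch_alloc(void)
{
	void *p;

	if (alloc_idx++ == fail_at)
		return NULL;
	p = malloc(1);
	if (p)
		live_allocs++;
	return p;
}

static void sketch_free(void **p)
{
	if (*p) {
		assert(live_allocs > 0);
		free(*p);
		live_allocs--;
		*p = NULL;
	}
}

/* Data-path resources only (cf. ipoib_dev_init_default). */
static int dev_init_default(struct sketch_priv *priv)
{
	priv->rx_ring = sketch_alloc();
	if (!priv->rx_ring)
		goto out;
	priv->tx_ring = sketch_alloc();
	if (!priv->tx_ring)
		goto out_rx_ring_cleanup;
	return 0;

out_rx_ring_cleanup:
	sketch_free(&priv->rx_ring);
out:
	return -1;
}

/* Reverse of dev_init_default (cf. ipoib_dev_uninit_default). */
static void dev_uninit_default(struct sketch_priv *priv)
{
	sketch_free(&priv->tx_ring);
	sketch_free(&priv->rx_ring);
}

/* Control path: work queue and PD first, then HW resources (cf. ipoib_dev_init). */
static int dev_init(struct sketch_priv *priv)
{
	priv->wq = sketch_alloc();
	if (!priv->wq)
		goto out;
	priv->pd = sketch_alloc(); /* shared by control and data path */
	if (!priv->pd)
		goto clean_wq;
	if (dev_init_default(priv))
		goto out_free_pd;
	priv->neigh_tbl = sketch_alloc();
	if (!priv->neigh_tbl)
		goto out_dev_uninit;
	return 0;

out_dev_uninit:
	dev_uninit_default(priv);
out_free_pd:
	sketch_free(&priv->pd);
clean_wq:
	sketch_free(&priv->wq);
out:
	return -1;
}

/* Full teardown, in reverse order of dev_init. */
static void dev_cleanup(struct sketch_priv *priv)
{
	sketch_free(&priv->neigh_tbl);
	dev_uninit_default(priv);
	sketch_free(&priv->pd);
	sketch_free(&priv->wq);
}
```

Failing each allocation in turn and checking that nothing is left allocated
is a quick way to confirm that every goto label releases exactly what was
acquired before it.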

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h       |   5 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c    |  39 +++-------
 drivers/infiniband/ulp/ipoib/ipoib_main.c  | 116 ++++++++++++++++++++++++-----
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c |  40 +---------
 4 files changed, 110 insertions(+), 90 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index bed233bf45c3..7cd9befd7d54 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -491,14 +491,13 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 void ipoib_flush_paths(struct net_device *dev);
 int ipoib_check_sm_sendonly_fullmember_support(struct ipoib_dev_priv *priv);
 struct ipoib_dev_priv *ipoib_intf_alloc(const char *format);
-
-int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
+void ipoib_ib_tx_timer_func(unsigned long ctx);
 void ipoib_ib_dev_flush_light(struct work_struct *work);
 void ipoib_ib_dev_flush_normal(struct work_struct *work);
 void ipoib_ib_dev_flush_heavy(struct work_struct *work);
 void ipoib_pkey_event(struct work_struct *work);
 void ipoib_ib_dev_cleanup(struct net_device *dev);
-
+void ipoib_dev_uninit_default(struct net_device *dev);
 int ipoib_ib_dev_open(struct net_device *dev);
 void ipoib_ib_dev_up(struct net_device *dev);
 void ipoib_ib_dev_down(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 12c4f84a6639..3c0a35d883e2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -40,6 +40,7 @@
 
 #include <linux/ip.h>
 #include <linux/tcp.h>
+#include <linux/vmalloc.h>
 
 #include "ipoib.h"
 
@@ -692,7 +693,7 @@ static void ipoib_stop_ah(struct net_device *dev)
 	ipoib_flush_ah(dev);
 }
 
-static void ipoib_ib_tx_timer_func(unsigned long ctx)
+void ipoib_ib_tx_timer_func(unsigned long ctx)
 {
 	drain_tx_cq((struct net_device *)ctx);
 }
@@ -913,32 +914,6 @@ void ipoib_ib_dev_stop(struct net_device *dev)
 	ib_req_notify_cq(priv->recv_cq, IB_CQ_NEXT_COMP);
 }
 
-int ipoib_ib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
-{
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
-
-	priv->ca = ca;
-	priv->port = port;
-	priv->qp = NULL;
-
-	if (ipoib_transport_dev_init(dev, ca)) {
-		printk(KERN_WARNING "%s: ipoib_transport_dev_init failed\n", ca->name);
-		return -ENODEV;
-	}
-
-	setup_timer(&priv->poll_timer, ipoib_ib_tx_timer_func,
-		    (unsigned long) dev);
-
-	if (dev->flags & IFF_UP) {
-		if (ipoib_ib_dev_open(dev)) {
-			ipoib_transport_dev_cleanup(dev);
-			return -ENODEV;
-		}
-	}
-
-	return 0;
-}
-
 /*
  * Takes whatever value which is in pkey index 0 and updates priv->pkey
  * returns 0 if the pkey value was changed.
@@ -1236,7 +1211,13 @@ void ipoib_ib_dev_cleanup(struct net_device *dev)
 	 */
 	ipoib_stop_ah(dev);
 
-	ipoib_transport_dev_cleanup(dev);
-}
+	clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
+
+	ipoib_dev_uninit_default(dev);
 
+	if (priv->pd) {
+		ib_dealloc_pd(priv->pd);
+		priv->pd = NULL;
+	}
+}
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 259c59f67394..8c644bbc2828 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1622,8 +1622,23 @@ static void ipoib_neigh_hash_uninit(struct net_device *dev)
 	wait_for_completion(&priv->ntbl.deleted);
 }
 
+void ipoib_dev_uninit_default(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
 
-int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
+	ipoib_transport_dev_cleanup(dev);
+
+	ipoib_cm_dev_cleanup(dev);
+
+	kfree(priv->rx_ring);
+	vfree(priv->tx_ring);
+
+	priv->rx_ring = NULL;
+	priv->tx_ring = NULL;
+}
+
+static int ipoib_dev_init_default(struct net_device *dev, struct ib_device *ca,
+				  int port)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 
@@ -1642,29 +1657,95 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 
 	/* priv->tx_head, tx_tail & tx_outstanding are already 0 */
 
-	if (ipoib_ib_dev_init(dev, ca, port))
+	if (ipoib_transport_dev_init(dev, ca)) {
+		pr_warn("%s: ipoib_transport_dev_init failed\n", ca->name);
 		goto out_tx_ring_cleanup;
+	}
+
+	setup_timer(&priv->poll_timer, ipoib_ib_tx_timer_func,
+		    (unsigned long)dev);
+
+	return 0;
+
+out_tx_ring_cleanup:
+	vfree(priv->tx_ring);
+
+out_rx_ring_cleanup:
+	kfree(priv->rx_ring);
+
+out:
+	return -ENOMEM;
+}
+
+int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int ret = -ENOMEM;
+
+	priv->ca = ca;
+	priv->port = port;
+	priv->qp = NULL;
 
 	/*
-	 * Must be after ipoib_ib_dev_init so we can allocate a per
-	 * device wq there and use it here
+	 * the various IPoIB tasks assume they will never race against
+	 * themselves, so always use a single thread workqueue
 	 */
-	if (ipoib_neigh_hash_init(priv) < 0)
+	priv->wq = alloc_ordered_workqueue("ipoib_wq", WQ_MEM_RECLAIM);
+	if (!priv->wq) {
+		pr_warn("%s: failed to allocate device WQ\n", dev->name);
+		goto out;
+	}
+
+	/* create pd, which used both for control and datapath*/
+	priv->pd = ib_alloc_pd(priv->ca, 0);
+	if (IS_ERR(priv->pd)) {
+		pr_warn("%s: failed to allocate PD\n", ca->name);
+		goto clean_wq;
+	}
+
+	ret = ipoib_dev_init_default(dev, ca, port);
+	if (ret) {
+		pr_warn("%s failed to init HW resource\n", dev->name);
+		goto out_free_pd;
+	}
+
+	/* after qp created set dev address */
+	priv->dev->dev_addr[1] = (priv->qp->qp_num >> 16) & 0xff;
+	priv->dev->dev_addr[2] = (priv->qp->qp_num >>  8) & 0xff;
+	priv->dev->dev_addr[3] = (priv->qp->qp_num) & 0xff;
+
+	if (ipoib_neigh_hash_init(priv) < 0) {
+		pr_warn("%s failed to init neigh hash\n", dev->name);
 		goto out_dev_uninit;
+	}
+
+	if (dev->flags & IFF_UP) {
+		if (ipoib_ib_dev_open(dev)) {
+			pr_warn("%s failed to open device\n", dev->name);
+			ret = -ENODEV;
+			goto out_dev_uninit;
+		}
+	}
 
 	return 0;
 
 out_dev_uninit:
 	ipoib_ib_dev_cleanup(dev);
 
-out_tx_ring_cleanup:
-	vfree(priv->tx_ring);
+out_free_pd:
+	if (priv->pd) {
+		ib_dealloc_pd(priv->pd);
+		priv->pd = NULL;
+	}
 
-out_rx_ring_cleanup:
-	kfree(priv->rx_ring);
+clean_wq:
+	if (priv->wq) {
+		destroy_workqueue(priv->wq);
+		priv->wq = NULL;
+	}
 
 out:
-	return -ENOMEM;
+	return ret;
 }
 
 void ipoib_dev_cleanup(struct net_device *dev)
@@ -1685,19 +1766,16 @@ void ipoib_dev_cleanup(struct net_device *dev)
 	}
 	unregister_netdevice_many(&head);
 
-	/*
-	 * Must be before ipoib_ib_dev_cleanup or we delete an in use
-	 * work queue
-	 */
 	ipoib_neigh_hash_uninit(dev);
 
 	ipoib_ib_dev_cleanup(dev);
 
-	kfree(priv->rx_ring);
-	vfree(priv->tx_ring);
-
-	priv->rx_ring = NULL;
-	priv->tx_ring = NULL;
+	/* no more works over the priv->wq */
+	if (priv->wq) {
+		flush_workqueue(priv->wq);
+		destroy_workqueue(priv->wq);
+		priv->wq = NULL;
+	}
 }
 
 static int ipoib_set_vf_link_state(struct net_device *dev, int vf, int link_state)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 189dcd1709d2..529b9509a2c4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -147,22 +147,6 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 	int ret, size;
 	int i;
 
-	priv->pd = ib_alloc_pd(priv->ca, 0);
-	if (IS_ERR(priv->pd)) {
-		printk(KERN_WARNING "%s: failed to allocate PD\n", ca->name);
-		return -ENODEV;
-	}
-
-	/*
-	 * the various IPoIB tasks assume they will never race against
-	 * themselves, so always use a single thread workqueue
-	 */
-	priv->wq = alloc_ordered_workqueue("ipoib_wq", WQ_MEM_RECLAIM);
-	if (!priv->wq) {
-		printk(KERN_WARNING "ipoib: failed to allocate device WQ\n");
-		goto out_free_pd;
-	}
-
 	size = ipoib_recvq_size + 1;
 	ret = ipoib_cm_dev_init(dev);
 	if (!ret) {
@@ -173,7 +157,7 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 			size += ipoib_recvq_size * ipoib_max_conn_qp;
 	} else
 		if (ret != -ENOSYS)
-			goto out_free_wq;
+			return -ENODEV;
 
 	cq_attr.cqe = size;
 	priv->recv_cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL,
@@ -212,10 +196,6 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 		goto out_free_send_cq;
 	}
 
-	priv->dev->dev_addr[1] = (priv->qp->qp_num >> 16) & 0xff;
-	priv->dev->dev_addr[2] = (priv->qp->qp_num >>  8) & 0xff;
-	priv->dev->dev_addr[3] = (priv->qp->qp_num      ) & 0xff;
-
 	for (i = 0; i < MAX_SKB_FRAGS + 1; ++i)
 		priv->tx_sge[i].lkey = priv->pd->local_dma_lkey;
 
@@ -247,13 +227,6 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 out_cm_dev_cleanup:
 	ipoib_cm_dev_cleanup(dev);
 
-out_free_wq:
-	destroy_workqueue(priv->wq);
-	priv->wq = NULL;
-
-out_free_pd:
-	ib_dealloc_pd(priv->pd);
-
 	return -ENODEV;
 }
 
@@ -266,7 +239,6 @@ void ipoib_transport_dev_cleanup(struct net_device *dev)
 			ipoib_warn(priv, "ib_qp_destroy failed\n");
 
 		priv->qp = NULL;
-		clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
 	}
 
 	if (ib_destroy_cq(priv->send_cq))
@@ -274,16 +246,6 @@ void ipoib_transport_dev_cleanup(struct net_device *dev)
 
 	if (ib_destroy_cq(priv->recv_cq))
 		ipoib_warn(priv, "ib_cq_destroy (recv) failed\n");
-
-	ipoib_cm_dev_cleanup(dev);
-
-	if (priv->wq) {
-		flush_workqueue(priv->wq);
-		destroy_workqueue(priv->wq);
-		priv->wq = NULL;
-	}
-
-	ib_dealloc_pd(priv->pd);
 }
 
 void ipoib_event(struct ib_event_handler *handler,
-- 
1.8.3.1

* [RFC v1 for accelerated IPoIB 02/25] IB/ipoib: separate control from HW operation on ipoib_open/stop ndo
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 01/25] IB/ipoib: Separate control and data related initializations Erez Shitrit
@ 2017-03-13 18:31 ` Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 03/25] IB/ipoib: Rename qpn to dqpn in ipoib_send and post_send functions Erez Shitrit
From: Erez Shitrit @ 2017-03-13 18:31 UTC
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


This patch prepares the netdev part of the ipoib driver to use
ipoib_options.
It deals with the two ndo flows: ipoib_open and ipoib_stop.
It arranges that area of code as follows:
All operations which deal with the HW resources (for example, changing
the QP state, post-receive, etc.) are done in one place.
All operations that are control oriented (like restarting the multicast
task, starting the reap_ah task, etc.) are done in a separate place.

The functions that deal with the HW resources are now
ipoib_ib_dev_open_default for the ipoib_open flow and
ipoib_ib_dev_stop_default for the ipoib_stop flow.
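A minimal user-space model of this split, assuming a simplified device
state in which booleans stand in for the P_Key check, the AH reaper and the
QP state. struct sketch_dev and the functions below are hypothetical; they
only mirror the shape of ipoib_ib_dev_open()/ipoib_ib_dev_open_default():
the control wrapper does its own work first and undoes it if the HW part
fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state; booleans stand in for the real driver state. */
struct sketch_dev {
	bool pkey_assigned;  /* control: P_Key found on the port */
	bool reaper_running; /* control: periodic AH reaper queued */
	bool qp_ready;       /* HW: QP initialized and receives posted */
	bool hw_fail;        /* test hook: force the HW open to fail */
};

/* HW part of the open flow (cf. ipoib_ib_dev_open_default). */
static int dev_open_default(struct sketch_dev *d)
{
	assert(d->reaper_running); /* the control side has already run */
	if (d->hw_fail)
		return -1;
	d->qp_ready = true;
	return 0;
}

/* HW part of the stop flow (cf. ipoib_ib_dev_stop_default). */
static int dev_stop_default(struct sketch_dev *d)
{
	d->qp_ready = false; /* real code: QP to ERR, drain CQs, QP to RESET */
	return 0;
}

/* Control part of the open flow (cf. ipoib_ib_dev_open). */
static int dev_open(struct sketch_dev *d)
{
	if (!d->pkey_assigned)
		return -1;

	d->reaper_running = true;
	if (dev_open_default(d)) {
		d->reaper_running = false; /* undo the control-side work */
		return -1;
	}
	return 0;
}

/* Control part of the stop flow: HW stop first, then control cleanup. */
static int dev_stop(struct sketch_dev *d)
{
	dev_stop_default(d);
	d->reaper_running = false; /* simplified stand-in for AH cleanup */
	return 0;
}
```

Keeping the HW-facing halves free of control-plane state is what would
allow a vendor to replace only dev_open_default/dev_stop_default.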

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h      |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c   | 228 +++++++++++++++++-------------
 drivers/infiniband/ulp/ipoib/ipoib_main.c |   2 +-
 3 files changed, 129 insertions(+), 103 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 7cd9befd7d54..b3900b253ad5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -501,7 +501,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 int ipoib_ib_dev_open(struct net_device *dev);
 void ipoib_ib_dev_up(struct net_device *dev);
 void ipoib_ib_dev_down(struct net_device *dev);
-void ipoib_ib_dev_stop(struct net_device *dev);
+int ipoib_ib_dev_stop_default(struct net_device *dev);
 void ipoib_pkey_dev_check_presence(struct net_device *dev);
 
 int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 3c0a35d883e2..5d732c5f01ee 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -693,24 +693,113 @@ static void ipoib_stop_ah(struct net_device *dev)
 	ipoib_flush_ah(dev);
 }
 
-void ipoib_ib_tx_timer_func(unsigned long ctx)
+static int recvs_pending(struct net_device *dev)
 {
-	drain_tx_cq((struct net_device *)ctx);
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int pending = 0;
+	int i;
+
+	for (i = 0; i < ipoib_recvq_size; ++i)
+		if (priv->rx_ring[i].skb)
+			++pending;
+
+	return pending;
 }
 
-int ipoib_ib_dev_open(struct net_device *dev)
+int ipoib_ib_dev_stop_default(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	int ret;
+	struct ib_qp_attr qp_attr;
+	unsigned long begin;
+	struct ipoib_tx_buf *tx_req;
+	int i;
 
-	ipoib_pkey_dev_check_presence(dev);
+	if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
+		napi_disable(&priv->napi);
 
-	if (!test_bit(IPOIB_PKEY_ASSIGNED, &priv->flags)) {
-		ipoib_warn(priv, "P_Key 0x%04x is %s\n", priv->pkey,
-			   (!(priv->pkey & 0x7fff) ? "Invalid" : "not found"));
-		return -1;
+	ipoib_cm_dev_stop(dev);
+
+	/*
+	 * Move our QP to the error state and then reinitialize in
+	 * when all work requests have completed or have been flushed.
+	 */
+	qp_attr.qp_state = IB_QPS_ERR;
+	if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE))
+		ipoib_warn(priv, "Failed to modify QP to ERROR state\n");
+
+	/* Wait for all sends and receives to complete */
+	begin = jiffies;
+
+	while (priv->tx_head != priv->tx_tail || recvs_pending(dev)) {
+		if (time_after(jiffies, begin + 5 * HZ)) {
+			ipoib_warn(priv, "timing out; %d sends %d receives not completed\n",
+				   priv->tx_head - priv->tx_tail, recvs_pending(dev));
+
+			/*
+			 * assume the HW is wedged and just free up
+			 * all our pending work requests.
+			 */
+			while ((int) priv->tx_tail - (int) priv->tx_head < 0) {
+				tx_req = &priv->tx_ring[priv->tx_tail &
+							(ipoib_sendq_size - 1)];
+				ipoib_dma_unmap_tx(priv, tx_req);
+				dev_kfree_skb_any(tx_req->skb);
+				++priv->tx_tail;
+				--priv->tx_outstanding;
+			}
+
+			for (i = 0; i < ipoib_recvq_size; ++i) {
+				struct ipoib_rx_buf *rx_req;
+
+				rx_req = &priv->rx_ring[i];
+				if (!rx_req->skb)
+					continue;
+				ipoib_ud_dma_unmap_rx(priv,
+						      priv->rx_ring[i].mapping);
+				dev_kfree_skb_any(rx_req->skb);
+				rx_req->skb = NULL;
+			}
+
+			goto timeout;
+		}
+
+		ipoib_drain_cq(dev);
+
+		msleep(1);
 	}
 
+	ipoib_dbg(priv, "All sends and receives done.\n");
+
+timeout:
+	del_timer_sync(&priv->poll_timer);
+	qp_attr.qp_state = IB_QPS_RESET;
+	if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE))
+		ipoib_warn(priv, "Failed to modify QP to RESET state\n");
+
+	ib_req_notify_cq(priv->recv_cq, IB_CQ_NEXT_COMP);
+
+	return 0;
+}
+
+int ipoib_ib_dev_stop(struct net_device *dev)
+{
+	ipoib_ib_dev_stop_default(dev);
+
+	ipoib_flush_ah(dev);
+
+	return 0;
+}
+
+void ipoib_ib_tx_timer_func(unsigned long ctx)
+{
+	drain_tx_cq((struct net_device *)ctx);
+}
+
+int ipoib_ib_dev_open_default(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int ret;
+
 	ret = ipoib_init_qp(dev);
 	if (ret) {
 		ipoib_warn(priv, "ipoib_init_qp returned %d\n", ret);
@@ -729,10 +818,6 @@ int ipoib_ib_dev_open(struct net_device *dev)
 		goto dev_stop;
 	}
 
-	clear_bit(IPOIB_STOP_REAPER, &priv->flags);
-	queue_delayed_work(priv->wq, &priv->ah_reap_task,
-			   round_jiffies_relative(HZ));
-
 	if (!test_and_set_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
 		napi_enable(&priv->napi);
 
@@ -744,6 +829,35 @@ int ipoib_ib_dev_open(struct net_device *dev)
 	return -1;
 }
 
+int ipoib_ib_dev_open(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+	ipoib_pkey_dev_check_presence(dev);
+
+	if (!test_bit(IPOIB_PKEY_ASSIGNED, &priv->flags)) {
+		ipoib_warn(priv, "P_Key 0x%04x is %s\n", priv->pkey,
+			   (!(priv->pkey & 0x7fff) ? "Invalid" : "not found"));
+		return -1;
+	}
+
+	clear_bit(IPOIB_STOP_REAPER, &priv->flags);
+	queue_delayed_work(priv->wq, &priv->ah_reap_task,
+			   round_jiffies_relative(HZ));
+
+	if (ipoib_ib_dev_open_default(dev)) {
+		pr_warn("%s: Failed to open dev\n", dev->name);
+		goto stop_ah_reap;
+	}
+
+	return 0;
+
+stop_ah_reap:
+	set_bit(IPOIB_STOP_REAPER, &priv->flags);
+	cancel_delayed_work(&priv->ah_reap_task);
+	return -1;
+}
+
 void ipoib_pkey_dev_check_presence(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -787,19 +901,6 @@ void ipoib_ib_dev_down(struct net_device *dev)
 	ipoib_flush_paths(dev);
 }
 
-static int recvs_pending(struct net_device *dev)
-{
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	int pending = 0;
-	int i;
-
-	for (i = 0; i < ipoib_recvq_size; ++i)
-		if (priv->rx_ring[i].skb)
-			++pending;
-
-	return pending;
-}
-
 void ipoib_drain_cq(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -839,81 +940,6 @@ void ipoib_drain_cq(struct net_device *dev)
 	local_bh_enable();
 }
 
-void ipoib_ib_dev_stop(struct net_device *dev)
-{
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	struct ib_qp_attr qp_attr;
-	unsigned long begin;
-	struct ipoib_tx_buf *tx_req;
-	int i;
-
-	if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &priv->flags))
-		napi_disable(&priv->napi);
-
-	ipoib_cm_dev_stop(dev);
-
-	/*
-	 * Move our QP to the error state and then reinitialize in
-	 * when all work requests have completed or have been flushed.
-	 */
-	qp_attr.qp_state = IB_QPS_ERR;
-	if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE))
-		ipoib_warn(priv, "Failed to modify QP to ERROR state\n");
-
-	/* Wait for all sends and receives to complete */
-	begin = jiffies;
-
-	while (priv->tx_head != priv->tx_tail || recvs_pending(dev)) {
-		if (time_after(jiffies, begin + 5 * HZ)) {
-			ipoib_warn(priv, "timing out; %d sends %d receives not completed\n",
-				   priv->tx_head - priv->tx_tail, recvs_pending(dev));
-
-			/*
-			 * assume the HW is wedged and just free up
-			 * all our pending work requests.
-			 */
-			while ((int) priv->tx_tail - (int) priv->tx_head < 0) {
-				tx_req = &priv->tx_ring[priv->tx_tail &
-							(ipoib_sendq_size - 1)];
-				ipoib_dma_unmap_tx(priv, tx_req);
-				dev_kfree_skb_any(tx_req->skb);
-				++priv->tx_tail;
-				--priv->tx_outstanding;
-			}
-
-			for (i = 0; i < ipoib_recvq_size; ++i) {
-				struct ipoib_rx_buf *rx_req;
-
-				rx_req = &priv->rx_ring[i];
-				if (!rx_req->skb)
-					continue;
-				ipoib_ud_dma_unmap_rx(priv,
-						      priv->rx_ring[i].mapping);
-				dev_kfree_skb_any(rx_req->skb);
-				rx_req->skb = NULL;
-			}
-
-			goto timeout;
-		}
-
-		ipoib_drain_cq(dev);
-
-		msleep(1);
-	}
-
-	ipoib_dbg(priv, "All sends and receives done.\n");
-
-timeout:
-	del_timer_sync(&priv->poll_timer);
-	qp_attr.qp_state = IB_QPS_RESET;
-	if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE))
-		ipoib_warn(priv, "Failed to modify QP to RESET state\n");
-
-	ipoib_flush_ah(dev);
-
-	ib_req_notify_cq(priv->recv_cq, IB_CQ_NEXT_COMP);
-}
-
 /*
  * Takes whatever value which is in pkey index 0 and updates priv->pkey
  * returns 0 if the pkey value was changed.
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 8c644bbc2828..be8dcea252f9 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -166,7 +166,7 @@ static int ipoib_stop(struct net_device *dev)
 	netif_stop_queue(dev);
 
 	ipoib_ib_dev_down(dev);
-	ipoib_ib_dev_stop(dev);
+	ipoib_ib_dev_stop_default(dev);
 
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
 		struct ipoib_dev_priv *cpriv;
-- 
1.8.3.1

* [RFC v1 for accelerated IPoIB 03/25] IB/ipoib: Rename qpn to dqpn in ipoib_send and post_send functions
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 01/25] IB/ipoib: Separate control and data related initializations Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 02/25] IB/ipoib: separate control from HW operation on ipoib_open/stop ndo Erez Shitrit
@ 2017-03-13 18:31 ` Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API Erez Shitrit
From: Erez Shitrit @ 2017-03-13 18:31 UTC
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h    | 2 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index b3900b253ad5..d94a7a953338 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -483,7 +483,7 @@ static inline void ipoib_put_ah(struct ipoib_ah *ah)
 int ipoib_add_umcast_attr(struct net_device *dev);
 
 void ipoib_send(struct net_device *dev, struct sk_buff *skb,
-		struct ipoib_ah *address, u32 qpn);
+		struct ipoib_ah *address, u32 dqpn);
 void ipoib_reap_ah(struct work_struct *work);
 
 struct ipoib_path *__path_find(struct net_device *dev, void *gid);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 5d732c5f01ee..dd5fb2964e63 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -514,7 +514,7 @@ void ipoib_send_comp_handler(struct ib_cq *cq, void *dev_ptr)
 
 static inline int post_send(struct ipoib_dev_priv *priv,
 			    unsigned int wr_id,
-			    struct ib_ah *address, u32 qpn,
+			    struct ib_ah *address, u32 dqpn,
 			    struct ipoib_tx_buf *tx_req,
 			    void *head, int hlen)
 {
@@ -524,7 +524,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
 	ipoib_build_sge(priv, tx_req);
 
 	priv->tx_wr.wr.wr_id	= wr_id;
-	priv->tx_wr.remote_qpn	= qpn;
+	priv->tx_wr.remote_qpn	= dqpn;
 	priv->tx_wr.ah		= address;
 
 	if (head) {
@@ -539,7 +539,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
 }
 
 void ipoib_send(struct net_device *dev, struct sk_buff *skb,
-		struct ipoib_ah *address, u32 qpn)
+		struct ipoib_ah *address, u32 dqpn)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_tx_buf *tx_req;
@@ -621,7 +621,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 	skb_dst_drop(skb);
 
 	rc = post_send(priv, priv->tx_head & (ipoib_sendq_size - 1),
-		       address->ah, qpn, tx_req, phead, hlen);
+		       address->ah, dqpn, tx_req, phead, hlen);
 	if (unlikely(rc)) {
 		ipoib_warn(priv, "post_send failed, error %d\n", rc);
 		++dev->stats.tx_errors;
-- 
1.8.3.1

* [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 03/25] IB/ipoib: Rename qpn to dqpn in ipoib_send and post_send functions Erez Shitrit
@ 2017-03-13 18:31 ` Erez Shitrit
       [not found]   ` <1489429896-10781-5-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks Erez Shitrit
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


The idea is to allow vendors to optimize the IPoIB data path.
A new struct that includes both function pointers and data members is
exposed. It provides a set of callback functions for handling the data
path flows in the IPoIB driver.
Each vendor can implement this set of functions in order to optimize its
specific data path, and let IPoIB leverage it.
The IPoIB driver code was changed accordingly, and works both with and
without a vendor-specific implementation.
Vendors are assumed to provide the full set of functions, not only part
of them, in order for this to work properly.
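
A minimal userspace sketch of the "full set or nothing" contract described
above, with the kernel types (struct net_device, struct sk_buff,
struct ipoib_ah, union ib_gid) replaced by plain C types. All names here
(dp_ops, select_dp_ops, dp_send, the vendor_* tables) are hypothetical.
Note that this patch only states the assumption; the completeness check
below is an illustration of it, not code from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the callbacks in struct rdma_netdev. */
struct dp_ops {
	int (*send)(const void *pkt, unsigned int dqpn, unsigned int dqkey);
	int (*attach_mcast)(const unsigned char *gid, unsigned short lid);
	int (*detach_mcast)(const unsigned char *gid, unsigned short lid);
};

/* Default (non-accelerated) path; returns 0 so callers can tell it ran. */
static int default_send(const void *pkt, unsigned int dqpn, unsigned int dqkey)
{
	(void)pkt; (void)dqpn; (void)dqkey; /* dqkey is unused in regular mode */
	return 0;
}

static int default_mcast(const unsigned char *gid, unsigned short lid)
{
	(void)gid; (void)lid;
	return 0;
}

static const struct dp_ops default_ops = { default_send, default_mcast, default_mcast };

/* Example vendor path; returns 1 to mark that the accelerated path ran. */
static int vendor_send(const void *pkt, unsigned int dqpn, unsigned int dqkey)
{
	(void)pkt; (void)dqpn; (void)dqkey;
	return 1;
}

static int vendor_mcast(const unsigned char *gid, unsigned short lid)
{
	(void)gid; (void)lid;
	return 1;
}

static const struct dp_ops vendor_full = { vendor_send, vendor_mcast, vendor_mcast };
static const struct dp_ops vendor_partial = { vendor_send, NULL, NULL };

/* All-or-nothing: a vendor table is accepted only if every callback is set. */
static int dp_ops_complete(const struct dp_ops *ops)
{
	return ops != NULL && ops->send && ops->attach_mcast && ops->detach_mcast;
}

/* Fall back to the default data path for a missing or partial table. */
static const struct dp_ops *select_dp_ops(const struct dp_ops *vendor)
{
	return dp_ops_complete(vendor) ? vendor : &default_ops;
}

/* Forward a send through whichever table was selected. */
static int dp_send(const struct dp_ops *ops, const void *pkt,
		   unsigned int dqpn, unsigned int dqkey)
{
	assert(ops != NULL && ops->send != NULL);
	return ops->send(pkt, dqpn, dqkey);
}
```

Rejecting a partial table outright keeps the data path uniform: the fast
path never has to test individual callbacks for NULL.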

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 include/rdma/ib_ipoib_accel_ops.h | 59 +++++++++++++++++++++++++++++++++++++++
 include/rdma/ib_verbs.h           | 36 ++++++++++++++++++++++++
 2 files changed, 95 insertions(+)
 create mode 100644 include/rdma/ib_ipoib_accel_ops.h

diff --git a/include/rdma/ib_ipoib_accel_ops.h b/include/rdma/ib_ipoib_accel_ops.h
new file mode 100644
index 000000000000..148a5529a559
--- /dev/null
+++ b/include/rdma/ib_ipoib_accel_ops.h
@@ -0,0 +1,59 @@
+/*
+ * Copyright (c) 2017 Mellanox Technologies Ltd.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if !defined(IB_IPOIB_ACCEL_OPS_H)
+#define IB_IPOIB_ACCEL_OPS_H
+
+#include <rdma/ib_verbs.h>
+
+/* ipoib rdma netdev's private data structure */
+struct ipoib_rdma_netdev {
+	struct rdma_netdev rn;  /* keep this first */
+	/* followed by device private data */
+	char *dev_priv[0];
+};
+
+static inline void *ipoib_priv(const struct net_device *dev)
+{
+	struct rdma_netdev *rn = netdev_priv(dev);
+
+	return rn->clnt_priv;
+}
+
+static inline void *ipoib_dev_priv(const struct net_device *dev)
+{
+	struct ipoib_rdma_netdev *ipoib_rn = netdev_priv(dev);
+
+	return ipoib_rn->dev_priv;
+}
+
+#endif /* IB_IPOIB_ACCEL_OPS_H */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 85b9034c8cfc..9b090efccdba 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1901,6 +1901,41 @@ struct ib_port_immutable {
 	u32                           max_mad_size;
 };
 
+/* rdma netdev type - specifies protocol type */
+enum rdma_netdev_t {
+	RDMA_NETDEV_OPA_VNIC,
+	RDMA_NETDEV_IPOIB
+};
+
+struct ipoib_ah;
+
+/**
+ * struct rdma_netdev - rdma netdev
+ * For cases where netstack interfacing is required.
+ */
+struct rdma_netdev {
+	void *clnt_priv;
+
+	/* control functions */
+	void (*set_id)(struct net_device *netdev, int id);
+	/* IB resource allocation function, returns new UD QP */
+	int (*ib_dev_init)(struct net_device *dev, struct ib_device *hca,
+			   int *qp_num);
+	void (*ib_dev_cleanup)(struct net_device *dev, struct ib_device *hca);
+
+	/* send packet */
+	void (*send)(struct net_device *dev, struct sk_buff *skb,
+		     struct ipoib_ah *address, u32 dqpn, u32 dqkey);
+
+	/* multicast */
+	int (*attach_mcast)(struct net_device *dev, struct ib_device *hca,
+			    union ib_gid *gid, u16 lid, int set_qkey);
+	int (*detach_mcast)(struct net_device *dev, struct ib_device *hca,
+			    union ib_gid *gid, u16 lid);
+	int qp_num;
+	void *context;
+};
+
 struct ib_device {
 	struct device                *dma_device;
 
@@ -2149,6 +2184,7 @@ struct ib_device {
 						struct ib_wq_attr *attr,
 						u32 wq_attr_mask,
 						struct ib_udata *udata);
+	struct ib_ipoib_accel_ops * (*get_ipoib_accel_ops)(struct ib_device *device);
 	struct ib_rwq_ind_table *  (*create_rwq_ind_table)(struct ib_device *device,
 							   struct ib_rwq_ind_table_init_attr *init_attr,
 							   struct ib_udata *udata);
-- 
1.8.3.1


* [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
                   ` (3 preceding siblings ...)
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API Erez Shitrit
@ 2017-03-13 18:31 ` Erez Shitrit
  2017-03-13 20:10   ` Jason Gunthorpe
       [not found]   ` <1489429896-10781-6-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 06/25] hw/mlx5: Add New bit to check over QP creation Erez Shitrit
                   ` (10 subsequent siblings)
  15 siblings, 2 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


The IPoIB driver now uses the new set of callback functions.
If the HW provider supports the new ipoib_options implementation, the
driver uses its callbacks in the data path flows; otherwise it uses the
driver's default implementation for all data flows.
The default implementation is exactly the driver's implementation as it
was before HW vendor support was added.

TODO: We added the remote qkey (dqkey) to ipoib_send in order to match
the signature of the send op.
In accel mode this parameter is used, but in regular mode it is
redundant. We need to think about a better solution.
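
A minimal userspace sketch of the fallback described above, modelled on
how ipoib_ib_dev_open() and ipoib_ib_dev_stop() in this patch now call
priv->rn_ops->ndo_open/ndo_stop instead of the *_default() functions
directly. The sketch_* names and the toy device struct are hypothetical,
and choosing the table once at allocation time is an assumption (the code
that sets rn_ops is not shown in this hunk).

```c
#include <assert.h>
#include <stddef.h>

struct sketch_dev;

/* Hypothetical, simplified counterpart of the ndo_open/ndo_stop pair. */
struct sketch_ops {
	int (*ndo_open)(struct sketch_dev *dev);
	int (*ndo_stop)(struct sketch_dev *dev);
};

struct sketch_dev {
	const struct sketch_ops *rn_ops; /* assumed: chosen once, at allocation */
	int reaper_running;              /* stands in for common ULP state */
	int dp_open;                     /* 1 while the data path is open */
	int opened_by_vendor;            /* records which path was used */
};

static int default_open(struct sketch_dev *dev) { dev->dp_open = 1; return 0; }
static int default_stop(struct sketch_dev *dev) { dev->dp_open = 0; return 0; }
static const struct sketch_ops default_ops = { default_open, default_stop };

static int vendor_open(struct sketch_dev *dev)
{
	dev->dp_open = 1;
	dev->opened_by_vendor = 1;
	return 0;
}
static int vendor_stop(struct sketch_dev *dev) { dev->dp_open = 0; return 0; }
static const struct sketch_ops vendor_ops = { vendor_open, vendor_stop };

/* Use the HW provider's table if it offers one, else the default path. */
static void sketch_dev_init(struct sketch_dev *dev, const struct sketch_ops *hw_ops)
{
	dev->rn_ops = hw_ops ? hw_ops : &default_ops;
	dev->reaper_running = 0;
	dev->dp_open = 0;
	dev->opened_by_vendor = 0;
}

/* Common work stays in the ULP; only the data path is forwarded. */
static int sketch_dev_open(struct sketch_dev *dev)
{
	assert(dev->rn_ops != NULL);
	dev->reaper_running = 1;          /* compare: queueing ah_reap_task */
	if (dev->rn_ops->ndo_open(dev)) {
		dev->reaper_running = 0;  /* compare: the stop_ah_reap error path */
		return -1;
	}
	return 0;
}

static int sketch_dev_stop(struct sketch_dev *dev)
{
	dev->rn_ops->ndo_stop(dev);
	dev->reaper_running = 0;          /* compare: ipoib_flush_ah() after ndo_stop */
	return 0;
}
```

Because the decision is made once and stored in the device, the open and
stop paths contain no vendor-specific branches.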

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h           |  30 ++--
 drivers/infiniband/ulp/ipoib/ipoib_cm.c        |  66 ++++----
 drivers/infiniband/ulp/ipoib/ipoib_ethtool.c   |   6 +-
 drivers/infiniband/ulp/ipoib/ipoib_fs.c        |   4 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        |  57 +++----
 drivers/infiniband/ulp/ipoib/ipoib_main.c      | 207 ++++++++++++++++---------
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |  39 +++--
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c   |  12 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |  24 ++-
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c      |   9 +-
 10 files changed, 275 insertions(+), 179 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index d94a7a953338..48da1b5be183 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -51,8 +51,8 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_pack.h>
 #include <rdma/ib_sa.h>
+#include <rdma/ib_ipoib_accel_ops.h>
 #include <linux/sched.h>
-
 /* constants */
 
 enum ipoib_flush_level {
@@ -357,6 +357,7 @@ struct ipoib_dev_priv {
 	struct ib_cq	 *recv_cq;
 	struct ib_cq	 *send_cq;
 	struct ib_qp	 *qp;
+	u32		  qp_num;
 	u32		  qkey;
 
 	union ib_gid local_gid;
@@ -404,6 +405,7 @@ struct ipoib_dev_priv {
 	struct timer_list poll_timer;
 	unsigned max_send_sge;
 	bool sm_fullmember_sendonly_support;
+	const struct net_device_ops	*rn_ops;
 };
 
 struct ipoib_ah {
@@ -483,22 +485,26 @@ static inline void ipoib_put_ah(struct ipoib_ah *ah)
 int ipoib_add_umcast_attr(struct net_device *dev);
 
 void ipoib_send(struct net_device *dev, struct sk_buff *skb,
-		struct ipoib_ah *address, u32 dqpn);
+		struct ipoib_ah *address, u32 dqpn, u32 dqkey);
 void ipoib_reap_ah(struct work_struct *work);
 
 struct ipoib_path *__path_find(struct net_device *dev, void *gid);
 void ipoib_mark_paths_invalid(struct net_device *dev);
 void ipoib_flush_paths(struct net_device *dev);
 int ipoib_check_sm_sendonly_fullmember_support(struct ipoib_dev_priv *priv);
-struct ipoib_dev_priv *ipoib_intf_alloc(const char *format);
+struct ipoib_dev_priv *ipoib_intf_alloc(struct ib_device *hca, u8 port,
+					const char *format);
 void ipoib_ib_tx_timer_func(unsigned long ctx);
 void ipoib_ib_dev_flush_light(struct work_struct *work);
 void ipoib_ib_dev_flush_normal(struct work_struct *work);
 void ipoib_ib_dev_flush_heavy(struct work_struct *work);
 void ipoib_pkey_event(struct work_struct *work);
 void ipoib_ib_dev_cleanup(struct net_device *dev);
-void ipoib_dev_uninit_default(struct net_device *dev);
+void ipoib_dev_uninit_default(struct net_device *dev, struct ib_device *hca);
+int ipoib_ib_dev_open_default(struct net_device *dev);
+int ipoib_ib_dev_stop_default(struct net_device *dev);
 int ipoib_ib_dev_open(struct net_device *dev);
+int ipoib_ib_dev_stop(struct net_device *dev);
 void ipoib_ib_dev_up(struct net_device *dev);
 void ipoib_ib_dev_down(struct net_device *dev);
 int ipoib_ib_dev_stop_default(struct net_device *dev);
@@ -561,8 +567,10 @@ void ipoib_path_iter_read(struct ipoib_path_iter *iter,
 			  struct ipoib_path *path);
 #endif
 
-int ipoib_mcast_attach(struct net_device *dev, u16 mlid,
-		       union ib_gid *mgid, int set_qkey);
+int ipoib_mcast_attach(struct net_device *dev, struct ib_device *hca,
+		       union ib_gid *mgid, u16 mlid, int set_qkey);
+int ipoib_mcast_detach(struct net_device *dev, struct ib_device *hca,
+		       union ib_gid *mgid, u16 mlid);
 void ipoib_mcast_remove_list(struct list_head *remove_list);
 void ipoib_check_and_add_mcast_sendonly(struct ipoib_dev_priv *priv, u8 *mgid,
 				struct list_head *remove_list);
@@ -586,7 +594,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 void ipoib_set_umcast(struct net_device *ndev, int umcast_val);
 int  ipoib_set_mode(struct net_device *dev, const char *buf);
 
-void ipoib_setup(struct net_device *dev);
+void ipoib_setup_common(struct net_device *dev);
 
 void ipoib_pkey_open(struct ipoib_dev_priv *priv);
 void ipoib_drain_cq(struct net_device *dev);
@@ -606,14 +614,14 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 
 static inline int ipoib_cm_admin_enabled(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	return IPOIB_CM_SUPPORTED(dev->dev_addr) &&
 		test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
 }
 
 static inline int ipoib_cm_enabled(struct net_device *dev, u8 *hwaddr)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	return IPOIB_CM_SUPPORTED(hwaddr) &&
 		test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags);
 }
@@ -636,13 +644,13 @@ static inline void ipoib_cm_set(struct ipoib_neigh *neigh, struct ipoib_cm_tx *t
 
 static inline int ipoib_cm_has_srq(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	return !!priv->cm.srq;
 }
 
 static inline unsigned int ipoib_cm_max_mtu(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	return priv->cm.max_cm_mtu;
 }
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 53d69d7d9ad4..9593930454ef 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -91,7 +91,7 @@ static void ipoib_cm_dma_unmap_rx(struct ipoib_dev_priv *priv, int frags,
 
 static int ipoib_cm_post_receive_srq(struct net_device *dev, int id)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ib_recv_wr *bad_wr;
 	int i, ret;
 
@@ -117,7 +117,7 @@ static int ipoib_cm_post_receive_nonsrq(struct net_device *dev,
 					struct ib_recv_wr *wr,
 					struct ib_sge *sge, int id)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ib_recv_wr *bad_wr;
 	int i, ret;
 
@@ -144,7 +144,7 @@ static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev,
 					     u64 mapping[IPOIB_CM_RX_SG],
 					     gfp_t gfp)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct sk_buff *skb;
 	int i;
 
@@ -195,7 +195,7 @@ static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev,
 static void ipoib_cm_free_rx_ring(struct net_device *dev,
 				  struct ipoib_cm_rx_buf *rx_ring)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int i;
 
 	for (i = 0; i < ipoib_recvq_size; ++i)
@@ -234,7 +234,7 @@ static void ipoib_cm_start_rx_drain(struct ipoib_dev_priv *priv)
 static void ipoib_cm_rx_event_handler(struct ib_event *event, void *ctx)
 {
 	struct ipoib_cm_rx *p = ctx;
-	struct ipoib_dev_priv *priv = netdev_priv(p->dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(p->dev);
 	unsigned long flags;
 
 	if (event->event != IB_EVENT_QP_LAST_WQE_REACHED)
@@ -250,7 +250,7 @@ static void ipoib_cm_rx_event_handler(struct ib_event *event, void *ctx)
 static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev,
 					   struct ipoib_cm_rx *p)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ib_qp_init_attr attr = {
 		.event_handler = ipoib_cm_rx_event_handler,
 		.send_cq = priv->recv_cq, /* For drain WR */
@@ -275,7 +275,7 @@ static int ipoib_cm_modify_rx_qp(struct net_device *dev,
 				 struct ib_cm_id *cm_id, struct ib_qp *qp,
 				 unsigned psn)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ib_qp_attr qp_attr;
 	int qp_attr_mask, ret;
 
@@ -330,7 +330,7 @@ static void ipoib_cm_init_rx_wr(struct net_device *dev,
 				struct ib_recv_wr *wr,
 				struct ib_sge *sge)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int i;
 
 	for (i = 0; i < priv->cm.num_frags; ++i)
@@ -348,7 +348,7 @@ static void ipoib_cm_init_rx_wr(struct net_device *dev,
 static int ipoib_cm_nonsrq_init_rx(struct net_device *dev, struct ib_cm_id *cm_id,
 				   struct ipoib_cm_rx *rx)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct {
 		struct ib_recv_wr wr;
 		struct ib_sge sge[IPOIB_CM_RX_SG];
@@ -419,7 +419,7 @@ static int ipoib_cm_send_rep(struct net_device *dev, struct ib_cm_id *cm_id,
 			     struct ib_qp *qp, struct ib_cm_req_event_param *req,
 			     unsigned psn)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_cm_data data = {};
 	struct ib_cm_rep_param rep = {};
 
@@ -439,7 +439,7 @@ static int ipoib_cm_send_rep(struct net_device *dev, struct ib_cm_id *cm_id,
 static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
 {
 	struct net_device *dev = cm_id->context;
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_cm_rx *p;
 	unsigned psn;
 	int ret;
@@ -512,7 +512,7 @@ static int ipoib_cm_rx_handler(struct ib_cm_id *cm_id,
 		/* Fall through */
 	case IB_CM_REJ_RECEIVED:
 		p = cm_id->context;
-		priv = netdev_priv(p->dev);
+		priv = ipoib_priv(p->dev);
 		if (ib_modify_qp(p->qp, &ipoib_cm_err_attr, IB_QP_STATE))
 			ipoib_warn(priv, "unable to move qp to error state\n");
 		/* Fall through */
@@ -556,7 +556,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
 
 void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_cm_rx_buf *rx_ring;
 	unsigned int wr_id = wc->wr_id & ~(IPOIB_OP_CM | IPOIB_OP_RECV);
 	struct sk_buff *skb, *newskb;
@@ -705,7 +705,7 @@ static inline int post_send(struct ipoib_dev_priv *priv,
 
 void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_tx *tx)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_tx_buf *tx_req;
 	int rc;
 	unsigned usable_sge = tx->max_send_sge - !!skb_headlen(skb);
@@ -783,7 +783,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
 
 void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_cm_tx *tx = wc->qp->qp_context;
 	unsigned int wr_id = wc->wr_id & ~IPOIB_OP_CM;
 	struct ipoib_tx_buf *tx_req;
@@ -852,7 +852,7 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 
 int ipoib_cm_dev_open(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int ret;
 
 	if (!IPOIB_CM_SUPPORTED(dev->dev_addr))
@@ -884,7 +884,7 @@ int ipoib_cm_dev_open(struct net_device *dev)
 
 static void ipoib_cm_free_rx_reap_list(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_cm_rx *rx, *n;
 	LIST_HEAD(list);
 
@@ -907,7 +907,7 @@ static void ipoib_cm_free_rx_reap_list(struct net_device *dev)
 
 void ipoib_cm_dev_stop(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_cm_rx *p;
 	unsigned long begin;
 	int ret;
@@ -966,7 +966,7 @@ void ipoib_cm_dev_stop(struct net_device *dev)
 static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
 {
 	struct ipoib_cm_tx *p = cm_id->context;
-	struct ipoib_dev_priv *priv = netdev_priv(p->dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(p->dev);
 	struct ipoib_cm_data *data = event->private_data;
 	struct sk_buff_head skqueue;
 	struct ib_qp_attr qp_attr;
@@ -1034,7 +1034,7 @@ static int ipoib_cm_rep_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 
 static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_cm_tx *tx)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ib_qp_init_attr attr = {
 		.send_cq		= priv->recv_cq,
 		.recv_cq		= priv->recv_cq,
@@ -1067,7 +1067,7 @@ static int ipoib_cm_send_req(struct net_device *dev,
 			     u32 qpn,
 			     struct ib_sa_path_rec *pathrec)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_cm_data data = {};
 	struct ib_cm_req_param req = {};
 
@@ -1102,7 +1102,7 @@ static int ipoib_cm_send_req(struct net_device *dev,
 static int ipoib_cm_modify_tx_init(struct net_device *dev,
 				  struct ib_cm_id *cm_id, struct ib_qp *qp)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ib_qp_attr qp_attr;
 	int qp_attr_mask, ret;
 	ret = ib_find_pkey(priv->ca, priv->port, priv->pkey, &qp_attr.pkey_index);
@@ -1127,7 +1127,7 @@ static int ipoib_cm_modify_tx_init(struct net_device *dev,
 static int ipoib_cm_tx_init(struct ipoib_cm_tx *p, u32 qpn,
 			    struct ib_sa_path_rec *pathrec)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(p->dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(p->dev);
 	int ret;
 
 	p->tx_ring = __vmalloc(ipoib_sendq_size * sizeof *p->tx_ring,
@@ -1184,7 +1184,7 @@ static int ipoib_cm_tx_init(struct ipoib_cm_tx *p, u32 qpn,
 
 static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(p->dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(p->dev);
 	struct ipoib_tx_buf *tx_req;
 	unsigned long begin;
 
@@ -1234,7 +1234,7 @@ static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
 			       struct ib_cm_event *event)
 {
 	struct ipoib_cm_tx *tx = cm_id->context;
-	struct ipoib_dev_priv *priv = netdev_priv(tx->dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(tx->dev);
 	struct net_device *dev = priv->dev;
 	struct ipoib_neigh *neigh;
 	unsigned long flags;
@@ -1285,7 +1285,7 @@ static int ipoib_cm_tx_handler(struct ib_cm_id *cm_id,
 struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path *path,
 				       struct ipoib_neigh *neigh)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_cm_tx *tx;
 
 	tx = kzalloc(sizeof *tx, GFP_ATOMIC);
@@ -1304,7 +1304,7 @@ struct ipoib_cm_tx *ipoib_cm_create_tx(struct net_device *dev, struct ipoib_path
 
 void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(tx->dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(tx->dev);
 	unsigned long flags;
 	if (test_and_clear_bit(IPOIB_FLAG_INITIALIZED, &tx->flags)) {
 		spin_lock_irqsave(&priv->lock, flags);
@@ -1439,7 +1439,7 @@ static void ipoib_cm_skb_reap(struct work_struct *work)
 void ipoib_cm_skb_too_long(struct net_device *dev, struct sk_buff *skb,
 			   unsigned int mtu)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int e = skb_queue_empty(&priv->cm.skb_queue);
 
 	if (skb_dst(skb))
@@ -1488,7 +1488,7 @@ static void ipoib_cm_stale_task(struct work_struct *work)
 static ssize_t show_mode(struct device *d, struct device_attribute *attr,
 			 char *buf)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(to_net_dev(d));
+	struct ipoib_dev_priv *priv = ipoib_priv(to_net_dev(d));
 
 	if (test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags))
 		return sprintf(buf, "connected\n");
@@ -1501,7 +1501,7 @@ static ssize_t set_mode(struct device *d, struct device_attribute *attr,
 {
 	struct net_device *dev = to_net_dev(d);
 	int ret;
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	if (test_bit(IPOIB_FLAG_GOING_DOWN, &priv->flags))
 		return -EPERM;
@@ -1530,7 +1530,7 @@ int ipoib_cm_add_mode_attr(struct net_device *dev)
 
 static void ipoib_cm_create_srq(struct net_device *dev, int max_sge)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ib_srq_init_attr srq_init_attr = {
 		.srq_type = IB_SRQT_BASIC,
 		.attr = {
@@ -1559,7 +1559,7 @@ static void ipoib_cm_create_srq(struct net_device *dev, int max_sge)
 
 int ipoib_cm_dev_init(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int max_srq_sge, i;
 
 	INIT_LIST_HEAD(&priv->cm.passive_ids);
@@ -1620,7 +1620,7 @@ int ipoib_cm_dev_init(struct net_device *dev)
 
 void ipoib_cm_dev_cleanup(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int ret;
 
 	if (!priv->cm.srq)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ethtool.c b/drivers/infiniband/ulp/ipoib/ipoib_ethtool.c
index 7b6d40ff1acf..32ad805fc721 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ethtool.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ethtool.c
@@ -60,7 +60,7 @@ struct ipoib_stats {
 static void ipoib_get_drvinfo(struct net_device *netdev,
 			      struct ethtool_drvinfo *drvinfo)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(netdev);
+	struct ipoib_dev_priv *priv = ipoib_priv(netdev);
 
 	ib_get_device_fw_str(priv->ca, drvinfo->fw_version,
 			     sizeof(drvinfo->fw_version));
@@ -77,7 +77,7 @@ static void ipoib_get_drvinfo(struct net_device *netdev,
 static int ipoib_get_coalesce(struct net_device *dev,
 			      struct ethtool_coalesce *coal)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	coal->rx_coalesce_usecs = priv->ethtool.coalesce_usecs;
 	coal->rx_max_coalesced_frames = priv->ethtool.max_coalesced_frames;
@@ -88,7 +88,7 @@ static int ipoib_get_coalesce(struct net_device *dev,
 static int ipoib_set_coalesce(struct net_device *dev,
 			      struct ethtool_coalesce *coal)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int ret;
 
 	/*
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_fs.c b/drivers/infiniband/ulp/ipoib/ipoib_fs.c
index 6bd5740e2691..6e73450241bf 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_fs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_fs.c
@@ -261,7 +261,7 @@ static int ipoib_path_open(struct inode *inode, struct file *file)
 
 void ipoib_create_debug_files(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	char name[IFNAMSIZ + sizeof "_path"];
 
 	snprintf(name, sizeof name, "%s_mcg", dev->name);
@@ -279,7 +279,7 @@ void ipoib_create_debug_files(struct net_device *dev)
 
 void ipoib_delete_debug_files(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	debugfs_remove(priv->mcg_dentry);
 	debugfs_remove(priv->path_dentry);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index dd5fb2964e63..b9ae9c9046d0 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -72,7 +72,7 @@ struct ipoib_ah *ipoib_create_ah(struct net_device *dev,
 		ah = (struct ipoib_ah *)vah;
 	} else {
 		ah->ah = vah;
-		ipoib_dbg(netdev_priv(dev), "Created ah %p\n", ah->ah);
+		ipoib_dbg(ipoib_priv(dev), "Created ah %p\n", ah->ah);
 	}
 
 	return ah;
@@ -81,7 +81,7 @@ struct ipoib_ah *ipoib_create_ah(struct net_device *dev,
 void ipoib_free_ah(struct kref *kref)
 {
 	struct ipoib_ah *ah = container_of(kref, struct ipoib_ah, ref);
-	struct ipoib_dev_priv *priv = netdev_priv(ah->dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(ah->dev);
 
 	unsigned long flags;
 
@@ -100,7 +100,7 @@ static void ipoib_ud_dma_unmap_rx(struct ipoib_dev_priv *priv,
 
 static int ipoib_ib_post_receive(struct net_device *dev, int id)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ib_recv_wr *bad_wr;
 	int ret;
 
@@ -122,7 +122,7 @@ static int ipoib_ib_post_receive(struct net_device *dev, int id)
 
 static struct sk_buff *ipoib_alloc_rx_skb(struct net_device *dev, int id)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct sk_buff *skb;
 	int buf_size;
 	u64 *mapping;
@@ -154,7 +154,7 @@ static struct sk_buff *ipoib_alloc_rx_skb(struct net_device *dev, int id)
 
 static int ipoib_ib_post_receives(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int i;
 
 	for (i = 0; i < ipoib_recvq_size; ++i) {
@@ -173,7 +173,7 @@ static int ipoib_ib_post_receives(struct net_device *dev)
 
 static void ipoib_ib_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	unsigned int wr_id = wc->wr_id & ~IPOIB_OP_RECV;
 	struct sk_buff *skb;
 	u64 mapping[IPOIB_UD_RX_SG];
@@ -382,7 +382,7 @@ static void ipoib_qp_state_validate_work(struct work_struct *work)
 
 static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	unsigned int wr_id = wc->wr_id;
 	struct ipoib_tx_buf *tx_req;
 
@@ -486,14 +486,14 @@ int ipoib_poll(struct napi_struct *napi, int budget)
 void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr)
 {
 	struct net_device *dev = dev_ptr;
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	napi_schedule(&priv->napi);
 }
 
 static void drain_tx_cq(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	netif_tx_lock(dev);
 	while (poll_tx(priv))
@@ -507,7 +507,7 @@ static void drain_tx_cq(struct net_device *dev)
 
 void ipoib_send_comp_handler(struct ib_cq *cq, void *dev_ptr)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev_ptr);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev_ptr);
 
 	mod_timer(&priv->poll_timer, jiffies);
 }
@@ -539,9 +539,9 @@ static inline int post_send(struct ipoib_dev_priv *priv,
 }
 
 void ipoib_send(struct net_device *dev, struct sk_buff *skb,
-		struct ipoib_ah *address, u32 dqpn)
+		struct ipoib_ah *address, u32 dqpn, u32 dqkey)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_tx_buf *tx_req;
 	int hlen, rc;
 	void *phead;
@@ -644,7 +644,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 
 static void __ipoib_reap_ah(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_ah *ah, *tah;
 	LIST_HEAD(remove_list);
 	unsigned long flags;
@@ -678,7 +678,7 @@ void ipoib_reap_ah(struct work_struct *work)
 
 static void ipoib_flush_ah(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	cancel_delayed_work(&priv->ah_reap_task);
 	flush_workqueue(priv->wq);
@@ -687,7 +687,7 @@ static void ipoib_flush_ah(struct net_device *dev)
 
 static void ipoib_stop_ah(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	set_bit(IPOIB_STOP_REAPER, &priv->flags);
 	ipoib_flush_ah(dev);
@@ -695,7 +695,7 @@ static void ipoib_stop_ah(struct net_device *dev)
 
 static int recvs_pending(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int pending = 0;
 	int i;
 
@@ -708,7 +708,7 @@ static int recvs_pending(struct net_device *dev)
 
 int ipoib_ib_dev_stop_default(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ib_qp_attr qp_attr;
 	unsigned long begin;
 	struct ipoib_tx_buf *tx_req;
@@ -783,7 +783,9 @@ int ipoib_ib_dev_stop_default(struct net_device *dev)
 
 int ipoib_ib_dev_stop(struct net_device *dev)
 {
-	ipoib_ib_dev_stop_default(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
+
+	priv->rn_ops->ndo_stop(dev);
 
 	ipoib_flush_ah(dev);
 
@@ -797,7 +799,7 @@ void ipoib_ib_tx_timer_func(unsigned long ctx)
 
 int ipoib_ib_dev_open_default(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int ret;
 
 	ret = ipoib_init_qp(dev);
@@ -831,7 +833,7 @@ int ipoib_ib_dev_open_default(struct net_device *dev)
 
 int ipoib_ib_dev_open(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	ipoib_pkey_dev_check_presence(dev);
 
@@ -845,7 +847,7 @@ int ipoib_ib_dev_open(struct net_device *dev)
 	queue_delayed_work(priv->wq, &priv->ah_reap_task,
 			   round_jiffies_relative(HZ));
 
-	if (ipoib_ib_dev_open_default(dev)) {
+	if (priv->rn_ops->ndo_open(dev)) {
 		pr_warn("%s: Failed to open dev\n", dev->name);
 		goto stop_ah_reap;
 	}
@@ -860,7 +862,7 @@ int ipoib_ib_dev_open(struct net_device *dev)
 
 void ipoib_pkey_dev_check_presence(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	if (!(priv->pkey & 0x7fff) ||
 	    ib_find_pkey(priv->ca, priv->port, priv->pkey,
@@ -872,7 +874,7 @@ void ipoib_pkey_dev_check_presence(struct net_device *dev)
 
 void ipoib_ib_dev_up(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	ipoib_pkey_dev_check_presence(dev);
 
@@ -888,7 +890,7 @@ void ipoib_ib_dev_up(struct net_device *dev)
 
 void ipoib_ib_dev_down(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	ipoib_dbg(priv, "downing ib_dev\n");
 
@@ -903,7 +905,7 @@ void ipoib_ib_dev_down(struct net_device *dev)
 
 void ipoib_drain_cq(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int i, n;
 
 	/*
@@ -1217,7 +1219,8 @@ void ipoib_ib_dev_flush_heavy(struct work_struct *work)
 
 void ipoib_ib_dev_cleanup(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
+	struct rdma_netdev *rn = netdev_priv(dev);
 
 	ipoib_dbg(priv, "cleaning up ib_dev\n");
 	/*
@@ -1239,7 +1242,7 @@ void ipoib_ib_dev_cleanup(struct net_device *dev)
 
 	clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
 
-	ipoib_dev_uninit_default(dev);
+	rn->ib_dev_cleanup(dev, priv->ca);
 
 	if (priv->pd) {
 		ib_dealloc_pd(priv->pd);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index be8dcea252f9..605f5673a6ca 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -110,7 +110,7 @@ static struct net_device *ipoib_get_net_dev_by_params(
 
 int ipoib_open(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	ipoib_dbg(priv, "bringing up interface\n");
 
@@ -157,7 +157,7 @@ int ipoib_open(struct net_device *dev)
 
 static int ipoib_stop(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	ipoib_dbg(priv, "stopping interface\n");
 
@@ -166,7 +166,7 @@ static int ipoib_stop(struct net_device *dev)
 	netif_stop_queue(dev);
 
 	ipoib_ib_dev_down(dev);
-	ipoib_ib_dev_stop_default(dev);
+	ipoib_ib_dev_stop(dev);
 
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
 		struct ipoib_dev_priv *cpriv;
@@ -195,7 +195,7 @@ static void ipoib_uninit(struct net_device *dev)
 
 static netdev_features_t ipoib_fix_features(struct net_device *dev, netdev_features_t features)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	if (test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags))
 		features &= ~(NETIF_F_IP_CSUM | NETIF_F_TSO);
@@ -205,7 +205,7 @@ static netdev_features_t ipoib_fix_features(struct net_device *dev, netdev_featu
 
 static int ipoib_change_mtu(struct net_device *dev, int new_mtu)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	/* dev->mtu > 2K ==> connected mode */
 	if (ipoib_cm_admin_enabled(dev)) {
@@ -468,7 +468,7 @@ static struct net_device *ipoib_get_net_dev_by_params(
 
 int ipoib_set_mode(struct net_device *dev, const char *buf)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	if ((test_bit(IPOIB_FLAG_ADMIN_CM, &priv->flags) &&
 	     !strcmp(buf, "connected\n")) ||
@@ -505,7 +505,7 @@ int ipoib_set_mode(struct net_device *dev, const char *buf)
 
 struct ipoib_path *__path_find(struct net_device *dev, void *gid)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct rb_node *n = priv->path_tree.rb_node;
 	struct ipoib_path *path;
 	int ret;
@@ -529,7 +529,7 @@ struct ipoib_path *__path_find(struct net_device *dev, void *gid)
 
 static int __path_add(struct net_device *dev, struct ipoib_path *path)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct rb_node **n = &priv->path_tree.rb_node;
 	struct rb_node *pn = NULL;
 	struct ipoib_path *tpath;
@@ -564,7 +564,7 @@ static void path_free(struct net_device *dev, struct ipoib_path *path)
 	while ((skb = __skb_dequeue(&path->queue)))
 		dev_kfree_skb_irq(skb);
 
-	ipoib_dbg(netdev_priv(dev), "path_free\n");
+	ipoib_dbg(ipoib_priv(dev), "path_free\n");
 
 	/* remove all neigh connected to this path */
 	ipoib_del_neighs_by_gid(dev, path->pathrec.dgid.raw);
@@ -598,7 +598,7 @@ struct ipoib_path_iter *ipoib_path_iter_init(struct net_device *dev)
 
 int ipoib_path_iter_next(struct ipoib_path_iter *iter)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(iter->dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(iter->dev);
 	struct rb_node *n;
 	struct ipoib_path *path;
 	int ret = 1;
@@ -635,7 +635,7 @@ void ipoib_path_iter_read(struct ipoib_path_iter *iter,
 
 void ipoib_mark_paths_invalid(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_path *path, *tp;
 
 	spin_lock_irq(&priv->lock);
@@ -731,7 +731,7 @@ static void push_pseudo_header(struct sk_buff *skb, const char *daddr)
 
 void ipoib_flush_paths(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_path *path, *tp;
 	LIST_HEAD(remove_list);
 	unsigned long flags;
@@ -765,7 +765,7 @@ static void path_rec_completion(int status,
 {
 	struct ipoib_path *path = path_ptr;
 	struct net_device *dev = path->dev;
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_ah *ah = NULL;
 	struct ipoib_ah *old_ah = NULL;
 	struct ipoib_neigh *neigh, *tn;
@@ -858,7 +858,7 @@ static void path_rec_completion(int status,
 
 static struct ipoib_path *path_rec_create(struct net_device *dev, void *gid)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_path *path;
 
 	if (!priv->broadcast)
@@ -886,7 +886,7 @@ static struct ipoib_path *path_rec_create(struct net_device *dev, void *gid)
 static int path_rec_start(struct net_device *dev,
 			  struct ipoib_path *path)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	ipoib_dbg(priv, "Start path record lookup for %pI6\n",
 		  path->pathrec.dgid.raw);
@@ -917,7 +917,8 @@ static int path_rec_start(struct net_device *dev,
 static void neigh_add_path(struct sk_buff *skb, u8 *daddr,
 			   struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
+	struct rdma_netdev *rn = netdev_priv(dev);
 	struct ipoib_path *path;
 	struct ipoib_neigh *neigh;
 	unsigned long flags;
@@ -964,7 +965,9 @@ static void neigh_add_path(struct sk_buff *skb, u8 *daddr,
 			}
 		} else {
 			spin_unlock_irqrestore(&priv->lock, flags);
-			ipoib_send(dev, skb, path->ah, IPOIB_QPN(daddr));
+			rn->send(dev, skb, path->ah,
+				 IPOIB_QPN(daddr),
+				 priv->tx_wr.remote_qkey);
 			ipoib_neigh_put(neigh);
 			return;
 		}
@@ -998,7 +1001,8 @@ static void neigh_add_path(struct sk_buff *skb, u8 *daddr,
 static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
 			     struct ipoib_pseudo_header *phdr)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
+	struct rdma_netdev *rn = netdev_priv(dev);
 	struct ipoib_path *path;
 	unsigned long flags;
 
@@ -1042,7 +1046,9 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
 			  be16_to_cpu(path->pathrec.dlid));
 
 		spin_unlock_irqrestore(&priv->lock, flags);
-		ipoib_send(dev, skb, path->ah, IPOIB_QPN(phdr->hwaddr));
+		rn->send(dev, skb, path->ah,
+			 IPOIB_QPN(phdr->hwaddr),
+			 priv->tx_wr.remote_qkey);
 		return;
 	} else if ((path->query || !path_rec_start(dev, path)) &&
 		   skb_queue_len(&path->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
@@ -1058,7 +1064,8 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
 
 static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
+	struct rdma_netdev *rn = netdev_priv(dev);
 	struct ipoib_neigh *neigh;
 	struct ipoib_pseudo_header *phdr;
 	struct ipoib_header *header;
@@ -1122,7 +1129,9 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			goto unref;
 		}
 	} else if (neigh->ah) {
-		ipoib_send(dev, skb, neigh->ah, IPOIB_QPN(phdr->hwaddr));
+		rn->send(dev, skb, neigh->ah,
+			 IPOIB_QPN(phdr->hwaddr),
+			 priv->tx_wr.remote_qkey);
 		goto unref;
 	}
 
@@ -1144,7 +1153,7 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 static void ipoib_timeout(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	ipoib_warn(priv, "transmit timeout: latency %d msecs\n",
 		   jiffies_to_msecs(jiffies - dev_trans_start(dev)));
@@ -1178,7 +1187,7 @@ static int ipoib_hard_header(struct sk_buff *skb,
 
 static void ipoib_set_mcast_list(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	if (!test_bit(IPOIB_FLAG_OPER_UP, &priv->flags)) {
 		ipoib_dbg(priv, "IPOIB_FLAG_OPER_UP not set");
@@ -1190,7 +1199,7 @@ static void ipoib_set_mcast_list(struct net_device *dev)
 
 static int ipoib_get_iflink(const struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	/* parent interface */
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
@@ -1218,7 +1227,7 @@ static u32 ipoib_addr_hash(struct ipoib_neigh_hash *htbl, u8 *daddr)
 
 struct ipoib_neigh *ipoib_neigh_get(struct net_device *dev, u8 *daddr)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_neigh_table *ntbl = &priv->ntbl;
 	struct ipoib_neigh_hash *htbl;
 	struct ipoib_neigh *neigh = NULL;
@@ -1347,7 +1356,7 @@ static struct ipoib_neigh *ipoib_neigh_ctor(u8 *daddr,
 struct ipoib_neigh *ipoib_neigh_alloc(u8 *daddr,
 				      struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_neigh_table *ntbl = &priv->ntbl;
 	struct ipoib_neigh_hash *htbl;
 	struct ipoib_neigh *neigh;
@@ -1404,7 +1413,7 @@ void ipoib_neigh_dtor(struct ipoib_neigh *neigh)
 {
 	/* neigh reference count was dropprd to zero */
 	struct net_device *dev = neigh->dev;
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct sk_buff *skb;
 	if (neigh->ah)
 		ipoib_put_ah(neigh->ah);
@@ -1414,7 +1423,7 @@ void ipoib_neigh_dtor(struct ipoib_neigh *neigh)
 	}
 	if (ipoib_cm_get(neigh))
 		ipoib_cm_destroy_tx(ipoib_cm_get(neigh));
-	ipoib_dbg(netdev_priv(dev),
+	ipoib_dbg(ipoib_priv(dev),
 		  "neigh free for %06x %pI6\n",
 		  IPOIB_QPN(neigh->daddr),
 		  neigh->daddr + 4);
@@ -1436,7 +1445,7 @@ static void ipoib_neigh_reclaim(struct rcu_head *rp)
 void ipoib_neigh_free(struct ipoib_neigh *neigh)
 {
 	struct net_device *dev = neigh->dev;
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_neigh_table *ntbl = &priv->ntbl;
 	struct ipoib_neigh_hash *htbl;
 	struct ipoib_neigh __rcu **np;
@@ -1519,7 +1528,7 @@ static void neigh_hash_free_rcu(struct rcu_head *head)
 
 void ipoib_del_neighs_by_gid(struct net_device *dev, u8 *gid)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ipoib_neigh_table *ntbl = &priv->ntbl;
 	struct ipoib_neigh_hash *htbl;
 	unsigned long flags;
@@ -1605,7 +1614,7 @@ static void ipoib_flush_neighs(struct ipoib_dev_priv *priv)
 
 static void ipoib_neigh_hash_uninit(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int stopped;
 
 	ipoib_dbg(priv, "ipoib_neigh_hash_uninit\n");
@@ -1622,9 +1631,9 @@ static void ipoib_neigh_hash_uninit(struct net_device *dev)
 	wait_for_completion(&priv->ntbl.deleted);
 }
 
-void ipoib_dev_uninit_default(struct net_device *dev)
+void ipoib_dev_uninit_default(struct net_device *dev, struct ib_device *hca)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	ipoib_transport_dev_cleanup(dev);
 
@@ -1637,10 +1646,12 @@ void ipoib_dev_uninit_default(struct net_device *dev)
 	priv->tx_ring = NULL;
 }
 
-static int ipoib_dev_init_default(struct net_device *dev, struct ib_device *ca,
-				  int port)
+static int ipoib_dev_init_default(struct net_device *dev,
+				  struct ib_device *hca, int *qp_num)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
+
+	netif_napi_add(dev, &priv->napi, ipoib_poll, NAPI_POLL_WEIGHT);
 
 	/* Allocate RX/TX "rings" to hold queued skbs */
 	priv->rx_ring =	kzalloc(ipoib_recvq_size * sizeof *priv->rx_ring,
@@ -1651,14 +1662,14 @@ static int ipoib_dev_init_default(struct net_device *dev, struct ib_device *ca,
 	priv->tx_ring = vzalloc(ipoib_sendq_size * sizeof *priv->tx_ring);
 	if (!priv->tx_ring) {
 		printk(KERN_WARNING "%s: failed to allocate TX ring (%d entries)\n",
-		       ca->name, ipoib_sendq_size);
+		       priv->ca->name, ipoib_sendq_size);
 		goto out_rx_ring_cleanup;
 	}
 
 	/* priv->tx_head, tx_tail & tx_outstanding are already 0 */
 
-	if (ipoib_transport_dev_init(dev, ca)) {
-		pr_warn("%s: ipoib_transport_dev_init failed\n", ca->name);
+	if (ipoib_transport_dev_init(dev, priv->ca)) {
+		pr_warn("%s: ipoib_transport_dev_init failed\n", priv->ca->name);
 		goto out_tx_ring_cleanup;
 	}
 
@@ -1679,7 +1690,8 @@ static int ipoib_dev_init_default(struct net_device *dev, struct ib_device *ca,
 
 int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
+	struct rdma_netdev *rn = netdev_priv(dev);
 	int ret = -ENOMEM;
 
 	priv->ca = ca;
@@ -1703,16 +1715,16 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 		goto clean_wq;
 	}
 
-	ret = ipoib_dev_init_default(dev, ca, port);
+	ret = rn->ib_dev_init(dev, ca, &priv->qp_num);
 	if (ret) {
 		pr_warn("%s failed to init HW resource\n", dev->name);
 		goto out_free_pd;
 	}
 
 	/* after qp created set dev address */
-	priv->dev->dev_addr[1] = (priv->qp->qp_num >> 16) & 0xff;
-	priv->dev->dev_addr[2] = (priv->qp->qp_num >>  8) & 0xff;
-	priv->dev->dev_addr[3] = (priv->qp->qp_num) & 0xff;
+	priv->dev->dev_addr[1] = (priv->qp_num >> 16) & 0xff;
+	priv->dev->dev_addr[2] = (priv->qp_num >>  8) & 0xff;
+	priv->dev->dev_addr[3] = (priv->qp_num) & 0xff;
 
 	if (ipoib_neigh_hash_init(priv) < 0) {
 		pr_warn("%s failed to init neigh hash\n", dev->name);
@@ -1750,7 +1762,7 @@ int ipoib_dev_init(struct net_device *dev, struct ib_device *ca, int port)
 
 void ipoib_dev_cleanup(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev), *cpriv, *tcpriv;
+	struct ipoib_dev_priv *priv = ipoib_priv(dev), *cpriv, *tcpriv;
 	LIST_HEAD(head);
 
 	ASSERT_RTNL();
@@ -1780,7 +1792,7 @@ void ipoib_dev_cleanup(struct net_device *dev)
 
 static int ipoib_set_vf_link_state(struct net_device *dev, int vf, int link_state)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	return ib_set_vf_link_state(priv->ca, vf, priv->port, link_state);
 }
@@ -1788,7 +1800,7 @@ static int ipoib_set_vf_link_state(struct net_device *dev, int vf, int link_stat
 static int ipoib_get_vf_config(struct net_device *dev, int vf,
 			       struct ifla_vf_info *ivf)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int err;
 
 	err = ib_get_vf_config(priv->ca, vf, priv->port, ivf);
@@ -1802,7 +1814,7 @@ static int ipoib_get_vf_config(struct net_device *dev, int vf,
 
 static int ipoib_set_vf_guid(struct net_device *dev, int vf, u64 guid, int type)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	if (type != IFLA_VF_IB_NODE_GUID && type != IFLA_VF_IB_PORT_GUID)
 		return -EINVAL;
@@ -1813,7 +1825,7 @@ static int ipoib_set_vf_guid(struct net_device *dev, int vf, u64 guid, int type)
 static int ipoib_get_vf_stats(struct net_device *dev, int vf,
 			      struct ifla_vf_stats *vf_stats)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	return ib_get_vf_stats(priv->ca, vf, priv->port, vf_stats);
 }
@@ -1851,21 +1863,12 @@ static int ipoib_get_vf_stats(struct net_device *dev, int vf,
 	.ndo_get_iflink		 = ipoib_get_iflink,
 };
 
-void ipoib_setup(struct net_device *dev)
+void ipoib_setup_common(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
-
-	if (priv->hca_caps & IB_DEVICE_VIRTUAL_FUNCTION)
-		dev->netdev_ops	= &ipoib_netdev_ops_vf;
-	else
-		dev->netdev_ops	= &ipoib_netdev_ops_pf;
-
 	dev->header_ops		 = &ipoib_header_ops;
 
 	ipoib_set_ethtool_ops(dev);
 
-	netif_napi_add(dev, &priv->napi, ipoib_poll, NAPI_POLL_WEIGHT);
-
 	dev->watchdog_timeo	 = HZ;
 
 	dev->flags		|= IFF_BROADCAST | IFF_MULTICAST;
@@ -1879,11 +1882,14 @@ void ipoib_setup(struct net_device *dev)
 	netif_keep_dst(dev);
 
 	memcpy(dev->broadcast, ipv4_bcast_addr, INFINIBAND_ALEN);
+}
 
-	priv->dev = dev;
+void ipoib_build_priv(struct net_device *dev)
+{
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
+	priv->dev = dev;
 	spin_lock_init(&priv->lock);
-
 	init_rwsem(&priv->vlan_rwsem);
 
 	INIT_LIST_HEAD(&priv->path_list);
@@ -1901,22 +1907,80 @@ void ipoib_setup(struct net_device *dev)
 	INIT_DELAYED_WORK(&priv->neigh_reap_task, ipoib_reap_neigh);
 }
 
-struct ipoib_dev_priv *ipoib_intf_alloc(const char *name)
+static const struct net_device_ops ipoib_netdev_default_pf = {
+	.ndo_open		 = ipoib_ib_dev_open_default,
+	.ndo_stop		 = ipoib_ib_dev_stop_default,
+};
+
+static struct net_device *ipoib_create_netdev_default(struct ib_device *hca,
+						      const char *name,
+						      void (*setup)(struct net_device *))
 {
 	struct net_device *dev;
+	struct rdma_netdev *rn;
 
-	dev = alloc_netdev((int)sizeof(struct ipoib_dev_priv), name,
-			   NET_NAME_UNKNOWN, ipoib_setup);
+	dev = alloc_netdev((int)sizeof(struct ipoib_rdma_netdev),
+			   name,
+			   NET_NAME_UNKNOWN, setup);
 	if (!dev)
 		return NULL;
 
-	return netdev_priv(dev);
+	rn = netdev_priv(dev);
+
+	rn->ib_dev_init = ipoib_dev_init_default;
+	rn->ib_dev_cleanup = ipoib_dev_uninit_default;
+	rn->send = ipoib_send;
+	rn->attach_mcast = ipoib_mcast_attach;
+	rn->detach_mcast = ipoib_mcast_detach;
+
+	dev->netdev_ops = &ipoib_netdev_default_pf;
+
+	return dev;
+}
+
+struct ipoib_dev_priv *ipoib_intf_alloc(struct ib_device *hca, u8 port,
+					const char *name)
+{
+	struct net_device *dev;
+	struct ipoib_dev_priv *priv;
+	struct rdma_netdev *rn;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv) {
+		pr_err("%s failed allocating priv\n", __func__);
+		return NULL;
+	}
+
+	if (!hca->alloc_rdma_netdev)
+		dev = ipoib_create_netdev_default(hca, name, ipoib_setup_common);
+	else
+		dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
+					     name, NET_NAME_UNKNOWN,
+					     ipoib_setup_common);
+	if (!dev) {
+		kfree(priv);
+		return NULL;
+	}
+
+	priv->rn_ops = dev->netdev_ops;
+
+	/* FIXME: hca_caps is still zero here; move this after query_cap */
+	if (priv->hca_caps & IB_DEVICE_VIRTUAL_FUNCTION)
+		dev->netdev_ops	= &ipoib_netdev_ops_vf;
+	else
+		dev->netdev_ops	= &ipoib_netdev_ops_pf;
+
+	rn = netdev_priv(dev);
+	rn->clnt_priv = priv;
+	ipoib_build_priv(dev);
+
+	return priv;
 }
 
 static ssize_t show_pkey(struct device *dev,
 			 struct device_attribute *attr, char *buf)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(to_net_dev(dev));
+	struct ipoib_dev_priv *priv = ipoib_priv(to_net_dev(dev));
 
 	return sprintf(buf, "0x%04x\n", priv->pkey);
 }
@@ -1925,14 +1989,14 @@ static ssize_t show_pkey(struct device *dev,
 static ssize_t show_umcast(struct device *dev,
 			   struct device_attribute *attr, char *buf)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(to_net_dev(dev));
+	struct ipoib_dev_priv *priv = ipoib_priv(to_net_dev(dev));
 
 	return sprintf(buf, "%d\n", test_bit(IPOIB_FLAG_UMCAST, &priv->flags));
 }
 
 void ipoib_set_umcast(struct net_device *ndev, int umcast_val)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(ndev);
+	struct ipoib_dev_priv *priv = ipoib_priv(ndev);
 
 	if (umcast_val > 0) {
 		set_bit(IPOIB_FLAG_UMCAST, &priv->flags);
@@ -2005,7 +2069,7 @@ static int ipoib_check_lladdr(struct net_device *dev,
 
 static int ipoib_set_mac(struct net_device *dev, void *addr)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct sockaddr_storage *ss = addr;
 	int ret;
 
@@ -2094,7 +2158,7 @@ static struct net_device *ipoib_add_port(const char *format,
 	struct ib_port_attr attr;
 	int result = -ENOMEM;
 
-	priv = ipoib_intf_alloc(format);
+	priv = ipoib_intf_alloc(hca, port, format);
 	if (!priv)
 		goto alloc_mem_failed;
 
@@ -2224,7 +2288,7 @@ static void ipoib_add_one(struct ib_device *device)
 			continue;
 		dev = ipoib_add_port("ib%d", device, p);
 		if (!IS_ERR(dev)) {
-			priv = netdev_priv(dev);
+			priv = ipoib_priv(dev);
 			list_add_tail(&priv->list, dev_list);
 			count++;
 		}
@@ -2264,6 +2328,7 @@ static void ipoib_remove_one(struct ib_device *device, void *client_data)
 
 		unregister_netdev(priv->dev);
 		free_netdev(priv->dev);
+		kfree(priv);
 	}
 
 	kfree(dev_list);
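After this patch, the net_device private area is no longer the struct ipoib_dev_priv itself. It holds the rdma_netdev instead (struct ipoib_rdma_netdev on the default path). ipoib_intf_alloc() kzalloc()s the IPoIB state separately and links it through rn->clnt_priv, so ipoib_remove_one() must now kfree() it after free_netdev(). ipoib_priv() is defined elsewhere in the series. The sketch below assumes it simply returns rn->clnt_priv, which matches how ipoib_intf_alloc() fills that field. The model_* names are hypothetical, and calloc()/free() stand in for the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ipoib_dev_priv. */
struct model_client_priv {
	unsigned int qp_num;
};

/* Hypothetical stand-in for struct rdma_netdev. */
struct model_rdma_netdev {
	void *clnt_priv; /* mirrors rn->clnt_priv */
	int accelerated; /* 1 if a device-specific allocator was used */
};

/* The area netdev_priv() would return now holds the rdma_netdev. */
struct model_netdev {
	struct model_rdma_netdev rn;
};

typedef struct model_netdev *(*model_alloc_fn)(void);

/* Plays the role of ipoib_create_netdev_default(). */
static struct model_netdev *model_create_default(void)
{
	return calloc(1, sizeof(struct model_netdev));
}

/* Plays the role of a hca->alloc_rdma_netdev implementation. */
static struct model_netdev *model_create_accel(void)
{
	struct model_netdev *dev = calloc(1, sizeof(*dev));

	if (dev)
		dev->rn.accelerated = 1;
	return dev;
}

/* Assumed behaviour of ipoib_priv(): follow clnt_priv. Only valid once
 * the client state has been linked. */
static struct model_client_priv *model_priv(struct model_netdev *dev)
{
	assert(dev->rn.clnt_priv != NULL);
	return dev->rn.clnt_priv;
}

/* Mirrors ipoib_intf_alloc(): allocate priv, pick the device allocator
 * if one exists (else the default), then link the two objects. */
static struct model_netdev *model_intf_alloc(model_alloc_fn device_alloc)
{
	struct model_client_priv *priv = calloc(1, sizeof(*priv));
	struct model_netdev *dev;

	if (!priv)
		return NULL;

	dev = device_alloc ? device_alloc() : model_create_default();
	if (!dev) {
		free(priv); /* same unwinding as kfree(priv) on failure */
		return NULL;
	}

	dev->rn.clnt_priv = priv;
	return dev;
}

/* Teardown releases both objects, like free_netdev() plus kfree(priv). */
static void model_intf_free(struct model_netdev *dev)
{
	free(model_priv(dev));
	free(dev);
}
```

Passing NULL to model_intf_alloc() models a device without alloc_rdma_netdev, which falls back to the default allocator as in the `if (!hca->alloc_rdma_netdev)` branch.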
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 69e146cdc306..a557a5e993fd 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -114,7 +114,7 @@ static void ipoib_mcast_free(struct ipoib_mcast *mcast)
 	struct net_device *dev = mcast->dev;
 	int tx_dropped = 0;
 
-	ipoib_dbg_mcast(netdev_priv(dev), "deleting multicast group %pI6\n",
+	ipoib_dbg_mcast(ipoib_priv(dev), "deleting multicast group %pI6\n",
 			mcast->mcmember.mgid.raw);
 
 	/* remove all neigh connected to this mcast */
@@ -158,7 +158,7 @@ static struct ipoib_mcast *ipoib_mcast_alloc(struct net_device *dev,
 
 static struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, void *mgid)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct rb_node *n = priv->multicast_tree.rb_node;
 
 	while (n) {
@@ -182,7 +182,7 @@ static struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, void *mgid
 
 static int __ipoib_mcast_add(struct net_device *dev, struct ipoib_mcast *mcast)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct rb_node **n = &priv->multicast_tree.rb_node, *pn = NULL;
 
 	while (*n) {
@@ -212,7 +212,8 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,
 				   struct ib_sa_mcmember_rec *mcmember)
 {
 	struct net_device *dev = mcast->dev;
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
+	struct rdma_netdev *rn = netdev_priv(dev);
 	struct ipoib_ah *ah;
 	int ret;
 	int set_qkey = 0;
@@ -260,8 +261,9 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,
 			return 0;
 		}
 
-		ret = ipoib_mcast_attach(dev, be16_to_cpu(mcast->mcmember.mlid),
-					 &mcast->mcmember.mgid, set_qkey);
+		ret = rn->attach_mcast(dev, priv->ca, &mcast->mcmember.mgid,
+				       be16_to_cpu(mcast->mcmember.mlid),
+				       set_qkey);
 		if (ret < 0) {
 			ipoib_warn(priv, "couldn't attach QP to multicast group %pI6\n",
 				   mcast->mcmember.mgid.raw);
@@ -375,7 +377,7 @@ static int ipoib_mcast_join_complete(int status,
 {
 	struct ipoib_mcast *mcast = multicast->context;
 	struct net_device *dev = mcast->dev;
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	ipoib_dbg_mcast(priv, "%sjoin completion for %pI6 (status %d)\n",
 			test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) ?
@@ -477,7 +479,7 @@ static int ipoib_mcast_join_complete(int status,
  */
 static int ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ib_sa_multicast *multicast;
 	struct ib_sa_mcmember_rec rec = {
 		.join_state = 1
@@ -678,7 +680,7 @@ void ipoib_mcast_join_task(struct work_struct *work)
 
 void ipoib_mcast_start_thread(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	unsigned long flags;
 
 	ipoib_dbg_mcast(priv, "starting multicast thread\n");
@@ -690,7 +692,7 @@ void ipoib_mcast_start_thread(struct net_device *dev)
 
 int ipoib_mcast_stop_thread(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	unsigned long flags;
 
 	ipoib_dbg_mcast(priv, "stopping multicast thread\n");
@@ -706,7 +708,8 @@ int ipoib_mcast_stop_thread(struct net_device *dev)
 
 static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
+	struct rdma_netdev *rn = netdev_priv(dev);
 	int ret = 0;
 
 	if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
@@ -720,8 +723,8 @@ static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast)
 				mcast->mcmember.mgid.raw);
 
 		/* Remove ourselves from the multicast group */
-		ret = ib_detach_mcast(priv->qp, &mcast->mcmember.mgid,
-				      be16_to_cpu(mcast->mcmember.mlid));
+		ret = rn->detach_mcast(dev, priv->ca, &mcast->mcmember.mgid,
+				       be16_to_cpu(mcast->mcmember.mlid));
 		if (ret)
 			ipoib_warn(priv, "ib_detach_mcast failed (result = %d)\n", ret);
 	} else if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
@@ -762,7 +765,8 @@ void ipoib_mcast_remove_list(struct list_head *remove_list)
 
 void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
+	struct rdma_netdev *rn = netdev_priv(dev);
 	struct ipoib_mcast *mcast;
 	unsigned long flags;
 	void *mgid = daddr + 4;
@@ -825,7 +829,8 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb)
 			}
 		}
 		spin_unlock_irqrestore(&priv->lock, flags);
-		ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN);
+		rn->send(dev, skb, mcast->ah, IB_MULTICAST_QPN,
+			 priv->tx_wr.remote_qkey);
 		if (neigh)
 			ipoib_neigh_put(neigh);
 		return;
@@ -837,7 +842,7 @@ void ipoib_mcast_send(struct net_device *dev, u8 *daddr, struct sk_buff *skb)
 
 void ipoib_mcast_dev_flush(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	LIST_HEAD(remove_list);
 	struct ipoib_mcast *mcast, *tmcast;
 	unsigned long flags;
@@ -1029,7 +1034,7 @@ struct ipoib_mcast_iter *ipoib_mcast_iter_init(struct net_device *dev)
 
 int ipoib_mcast_iter_next(struct ipoib_mcast_iter *iter)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(iter->dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(iter->dev);
 	struct rb_node *n;
 	struct ipoib_mcast *mcast;
 	int ret = 1;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
index cdc7df4fdb8a..5d795aaea92c 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
@@ -44,7 +44,7 @@
 
 static int ipoib_fill_info(struct sk_buff *skb, const struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	u16 val;
 
 	if (nla_put_u16(skb, IFLA_IPOIB_PKEY, priv->pkey))
@@ -107,7 +107,7 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
 	if (!pdev || pdev->type != ARPHRD_INFINIBAND)
 		return -ENODEV;
 
-	ppriv = netdev_priv(pdev);
+	ppriv = ipoib_priv(pdev);
 
 	if (test_bit(IPOIB_FLAG_SUBINTERFACE, &ppriv->flags)) {
 		ipoib_warn(ppriv, "child creation disallowed for child devices\n");
@@ -129,7 +129,7 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev,
 	 */
 	child_pkey |= 0x8000;
 
-	err = __ipoib_vlan_add(ppriv, netdev_priv(dev), child_pkey, IPOIB_RTNL_CHILD);
+	err = __ipoib_vlan_add(ppriv, ipoib_priv(dev), child_pkey, IPOIB_RTNL_CHILD);
 
 	if (!err && data)
 		err = ipoib_changelink(dev, tb, data);
@@ -140,8 +140,8 @@ static void ipoib_unregister_child_dev(struct net_device *dev, struct list_head
 {
 	struct ipoib_dev_priv *priv, *ppriv;
 
-	priv = netdev_priv(dev);
-	ppriv = netdev_priv(priv->parent);
+	priv = ipoib_priv(dev);
+	ppriv = ipoib_priv(priv->parent);
 
 	down_write(&ppriv->vlan_rwsem);
 	unregister_netdevice_queue(dev, head);
@@ -161,7 +161,7 @@ static size_t ipoib_get_size(const struct net_device *dev)
 	.maxtype	= IFLA_IPOIB_MAX,
 	.policy		= ipoib_policy,
 	.priv_size	= sizeof(struct ipoib_dev_priv),
-	.setup		= ipoib_setup,
+	.setup		= ipoib_setup_common,
 	.newlink	= ipoib_new_child_link,
 	.changelink	= ipoib_changelink,
 	.dellink	= ipoib_unregister_child_dev,
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 529b9509a2c4..5f3dba9041e0 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -35,9 +35,10 @@
 
 #include "ipoib.h"
 
-int ipoib_mcast_attach(struct net_device *dev, u16 mlid, union ib_gid *mgid, int set_qkey)
+int ipoib_mcast_attach(struct net_device *dev, struct ib_device *hca,
+		       union ib_gid *mgid, u16 mlid, int set_qkey)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ib_qp_attr *qp_attr = NULL;
 	int ret;
 	u16 pkey_index;
@@ -74,9 +75,20 @@ int ipoib_mcast_attach(struct net_device *dev, u16 mlid, union ib_gid *mgid, int
 	return ret;
 }
 
+int ipoib_mcast_detach(struct net_device *dev, struct ib_device *hca,
+		       union ib_gid *mgid, u16 mlid)
+{
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
+	int ret;
+
+	ret = ib_detach_mcast(priv->qp, mgid, mlid);
+
+	return ret;
+}
+
 int ipoib_init_qp(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	int ret;
 	struct ib_qp_attr qp_attr;
 	int attr_mask;
@@ -130,7 +142,7 @@ int ipoib_init_qp(struct net_device *dev)
 
 int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 	struct ib_qp_init_attr init_attr = {
 		.cap = {
 			.max_send_wr  = ipoib_sendq_size,
@@ -196,6 +208,8 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 		goto out_free_send_cq;
 	}
 
+	priv->qp_num = priv->qp->qp_num;
+
 	for (i = 0; i < MAX_SKB_FRAGS + 1; ++i)
 		priv->tx_sge[i].lkey = priv->pd->local_dma_lkey;
 
@@ -232,7 +246,7 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 
 void ipoib_transport_dev_cleanup(struct net_device *dev)
 {
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	if (priv->qp) {
 		if (ib_destroy_qp(priv->qp))
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index deedb6fc1b05..ce1a12ba2fd9 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -43,7 +43,7 @@ static ssize_t show_parent(struct device *d, struct device_attribute *attr,
 			   char *buf)
 {
 	struct net_device *dev = to_net_dev(d);
-	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	struct ipoib_dev_priv *priv = ipoib_priv(dev);
 
 	return sprintf(buf, "%s\n", priv->parent->name);
 }
@@ -127,14 +127,15 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
 
-	ppriv = netdev_priv(pdev);
+	ppriv = ipoib_priv(pdev);
 
 	if (test_bit(IPOIB_FLAG_GOING_DOWN, &ppriv->flags))
 		return -EPERM;
 
 	snprintf(intf_name, sizeof intf_name, "%s.%04x",
 		 ppriv->dev->name, pkey);
-	priv = ipoib_intf_alloc(intf_name);
+
+	priv = ipoib_intf_alloc(ppriv->ca, ppriv->port, intf_name);
 	if (!priv)
 		return -ENOMEM;
 
@@ -182,7 +183,7 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
 
-	ppriv = netdev_priv(pdev);
+	ppriv = ipoib_priv(pdev);
 
 	if (test_bit(IPOIB_FLAG_GOING_DOWN, &ppriv->flags))
 		return -EPERM;
-- 
1.8.3.1

* [RFC v1 for accelerated IPoIB 06/25] hw/mlx5: Add New bit to check over QP creation
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
                   ` (4 preceding siblings ...)
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks Erez Shitrit
@ 2017-03-13 18:31 ` Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 07/25] linux/mlx5/mlx5_ifc.h: Add underlay_qpn field to PRM objects Erez Shitrit
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


Allow the IB_QP_CREATE_NETIF_QP flag in create_kernel_qp(). If the flag is
not in the supported-flags mask, creating a QP with it fails with -EINVAL.
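
The one-line change extends a supported-flags mask: create_kernel_qp()
rejects any create flag outside that mask with -EINVAL. Below is a minimal,
self-contained sketch of that whitelist pattern in plain C. The DEMO_* flag
names and values are hypothetical stand-ins, not the real IB_QP_CREATE_*
constants from include/rdma/ib_verbs.h:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
enum demo_qp_create_flags {
	DEMO_QP_CREATE_IPOIB_UD_LSO		= 1u << 0,
	DEMO_QP_CREATE_BLOCK_MC_LOOPBACK	= 1u << 1,
	DEMO_QP_CREATE_NETIF_QP			= 1u << 2,
	DEMO_QP_CREATE_SIGNATURE_EN		= 1u << 3,
	DEMO_QP_CREATE_UNSUPPORTED		= 1u << 4,
};

/* Flags this hypothetical driver accepts. Adding NETIF_QP here is the
 * equivalent of the one-line change in create_kernel_qp(). */
#define DEMO_SUPPORTED_FLAGS (DEMO_QP_CREATE_IPOIB_UD_LSO |		\
			      DEMO_QP_CREATE_BLOCK_MC_LOOPBACK |	\
			      DEMO_QP_CREATE_NETIF_QP |			\
			      DEMO_QP_CREATE_SIGNATURE_EN)

/* Returns 0 if every requested flag is supported, -EINVAL otherwise. */
static int demo_check_create_flags(uint32_t create_flags)
{
	if (create_flags & ~(uint32_t)DEMO_SUPPORTED_FLAGS)
		return -EINVAL;
	return 0;
}
```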

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index dc0ea63900c1..41e14d57fec9 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -897,6 +897,7 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
 	if (init_attr->create_flags & ~(IB_QP_CREATE_SIGNATURE_EN |
 					IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK |
 					IB_QP_CREATE_IPOIB_UD_LSO |
+					IB_QP_CREATE_NETIF_QP |
 					mlx5_ib_create_qp_sqpn_qp1()))
 		return -EINVAL;
 
-- 
1.8.3.1

* [RFC v1 for accelerated IPoIB 07/25] linux/mlx5/mlx5_ifc.h: Add underlay_qpn field to PRM objects
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
                   ` (5 preceding siblings ...)
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 06/25] hw/mlx5: Add New bit to check over QP creation Erez Shitrit
@ 2017-03-13 18:31 ` Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 08/25] net/mlx5e: Refactor EN code to support IB link Erez Shitrit
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


Add the following to the PRM objects in mlx5_ifc.h:

1. A new HCA capability bit, ipoib_enhanced_offloads, which indicates that a
UD QP can perform RSS.
2. An underlay_qpn field in the TIS context and in the SET_FLOW_TABLE_ROOT
command input. The latter is needed so that the SET_ROOT command can connect
a QP to a flow steering (FS) table.
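
Each new field is carved out of an existing reserved region, so the sizes of
the PRM structures do not change. The sketch below checks that arithmetic
for the two split regions and shows where underlay_qpn lands in the TIS
context: bit offset 0x148, i.e. the low 24 bits of the eleventh 32-bit
big-endian dword (byte offset 40). ifc_set() and ifc_get() are hypothetical
helpers written for this illustration under an assumed mlx5_ifc convention
(bit offsets counted from the most significant bit, dwords stored
big-endian); they are not the kernel's MLX5_SET()/MLX5_GET() macros:

```c
#include <stdint.h>

/* tisc: reserved_at_140[0x3c0] becomes [0x8] + underlay_qpn[0x18] + [0x3a0]. */
_Static_assert(0x8 + 0x18 + 0x3a0 == 0x3c0, "tisc split keeps the size");
/* set_flow_table_root_in: reserved_at_c0[0x140] is split the same way. */
_Static_assert(0x8 + 0x18 + 0x120 == 0x140, "root_in split keeps the size");

#define TISC_BITS		0x500	/* 0x140 + 0x3c0 */
#define TISC_UNDERLAY_QPN_OFF	0x148	/* after reserved_at_140[0x8] */
#define TISC_UNDERLAY_QPN_SZ	0x18

static uint32_t ifc_load_dw(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void ifc_store_dw(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Assumes the field does not cross a dword boundary (true for these fields). */
static void ifc_set(uint8_t *buf, unsigned int off, unsigned int sz,
		    uint32_t val)
{
	uint8_t *p = buf + (off / 32) * 4;
	unsigned int shift = 32 - sz - (off % 32);
	uint32_t mask = (sz == 32) ? 0xffffffffu : ((1u << sz) - 1);
	uint32_t dw = ifc_load_dw(p);

	/* Clear the field, then write the value truncated to sz bits. */
	dw = (dw & ~(mask << shift)) | ((val & mask) << shift);
	ifc_store_dw(p, dw);
}

static uint32_t ifc_get(const uint8_t *buf, unsigned int off, unsigned int sz)
{
	unsigned int shift = 32 - sz - (off % 32);
	uint32_t mask = (sz == 32) ? 0xffffffffu : ((1u << sz) - 1);

	return (ifc_load_dw(buf + (off / 32) * 4) >> shift) & mask;
}
```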

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 2d197d8a7025..afb6c8ab156a 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -859,7 +859,8 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 
 	u8         compact_address_vector[0x1];
 	u8         striding_rq[0x1];
-	u8         reserved_at_202[0x2];
+	u8         reserved_at_202[0x1];
+	u8         ipoib_enhanced_offloads[0x1];
 	u8         ipoib_basic_offloads[0x1];
 	u8         reserved_at_205[0xa];
 	u8         drain_sigerr[0x1];
@@ -2217,7 +2218,9 @@ struct mlx5_ifc_tisc_bits {
 	u8         reserved_at_120[0x8];
 	u8         transport_domain[0x18];
 
-	u8         reserved_at_140[0x3c0];
+	u8         reserved_at_140[0x8];
+	u8         underlay_qpn[0x18];
+	u8         reserved_at_160[0x3a0];
 };
 
 enum {
@@ -7906,7 +7909,9 @@ struct mlx5_ifc_set_flow_table_root_in_bits {
 	u8         reserved_at_a0[0x8];
 	u8         table_id[0x18];
 
-	u8         reserved_at_c0[0x140];
+	u8         reserved_at_c0[0x8];
+	u8         underlay_qpn[0x18];
+	u8         reserved_at_e0[0x120];
 };
 
 enum {
-- 
1.8.3.1

* [RFC v1 for accelerated IPoIB 08/25] net/mlx5e: Refactor EN code to support IB link
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
                   ` (6 preceding siblings ...)
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 07/25] linux/mlx5/mlx5_ifc.h: Add underlay_qpn field to PRM objects Erez Shitrit
@ 2017-03-13 18:31 ` Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 10/25] net/mlx5e: Support netdevice creation for IB link type Erez Shitrit
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


The idea is to use the same infrastructure for both the ETH and IB link
types, so the first step is to refactor the ETH handling so that it can be
used for the IB link as well:

1. Check the required device capabilities separately for ETH and for IB.
2. Move code into common functions that are used by both link types.
3. Change the init and cleanup flows so that they are not specific to the
ETH link.
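
Step 1 amounts to a single capability check that dispatches on the link
type. A minimal sketch with a hypothetical, reduced set of capability fields
(the real code reads them with MLX5_CAP_GEN(), MLX5_CAP_ETH() and
MLX5_CAP_FLOWTABLE()). It returns -EOPNOTSUPP, the errno used by the original
ETH-only check, and treats an unknown link type as unsupported:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MLX5_INTERFACE_PROTOCOL_ETH/IB. */
enum demo_link_type { DEMO_LINK_ETH, DEMO_LINK_IB };

/* Hypothetical subset of the device capabilities. */
struct demo_caps {
	bool eth_net_offloads;
	bool nic_flow_table;
	bool csum_cap;
	unsigned int max_ft_level;
	bool ipoib_enhanced_offloads;
};

/* One entry point, per-link-type requirements. */
static int demo_check_required_caps(const struct demo_caps *c,
				    enum demo_link_type link)
{
	switch (link) {
	case DEMO_LINK_ETH:
		if (!c->eth_net_offloads || !c->nic_flow_table ||
		    !c->csum_cap || c->max_ft_level < 3)
			return -EOPNOTSUPP;
		return 0;
	case DEMO_LINK_IB:
		/* The enhanced IPoIB path only needs the new cap bit. */
		return c->ipoib_enhanced_offloads ? 0 : -EOPNOTSUPP;
	}
	return -EOPNOTSUPP;	/* unknown link type */
}
```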

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 120 ++++++++++++++--------
 1 file changed, 80 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 041e0ac16096..88541f99d37b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3323,29 +3323,39 @@ static void mlx5e_netpoll(struct net_device *dev)
 	.ndo_get_offload_stats	 = mlx5e_get_offload_stats,
 };
 
-static int mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev)
-{
-	if (MLX5_CAP_GEN(mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
-		return -EOPNOTSUPP;
-	if (!MLX5_CAP_GEN(mdev, eth_net_offloads) ||
-	    !MLX5_CAP_GEN(mdev, nic_flow_table) ||
-	    !MLX5_CAP_ETH(mdev, csum_cap) ||
-	    !MLX5_CAP_ETH(mdev, max_lso_cap) ||
-	    !MLX5_CAP_ETH(mdev, vlan_cap) ||
-	    !MLX5_CAP_ETH(mdev, rss_ind_tbl_cap) ||
-	    MLX5_CAP_FLOWTABLE(mdev,
-			       flow_table_properties_nic_receive.max_ft_level)
-			       < 3) {
-		mlx5_core_warn(mdev,
-			       "Not creating net device, some required device capabilities are missing\n");
-		return -EOPNOTSUPP;
+static int mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev,
+					int link_type)
+{
+	if (link_type == MLX5_INTERFACE_PROTOCOL_ETH) {
+		if (!MLX5_CAP_GEN(mdev, eth_net_offloads) ||
+		    !MLX5_CAP_GEN(mdev, nic_flow_table) ||
+		    !MLX5_CAP_ETH(mdev, csum_cap) ||
+		    !MLX5_CAP_ETH(mdev, max_lso_cap) ||
+		    !MLX5_CAP_ETH(mdev, vlan_cap) ||
+		    !MLX5_CAP_ETH(mdev, rss_ind_tbl_cap) ||
+		    MLX5_CAP_FLOWTABLE(mdev,
+				       flow_table_properties_nic_receive.max_ft_level)
+				       < 3) {
+			mlx5_core_warn(mdev,
+				       "Not creating net device, some required device capabilities are missing\n");
+			return -ENOTSUPP;
+		}
+		if (!MLX5_CAP_ETH(mdev, self_lb_en_modifiable))
+			mlx5_core_warn(mdev, "Self loop back prevention is not supported\n");
+		if (!MLX5_CAP_GEN(mdev, cq_moderation))
+			mlx5_core_warn(mdev, "CQ moderation is not supported\n");
+
+		return 0;
+	} else if (link_type == MLX5_INTERFACE_PROTOCOL_IB) {
+		if (!MLX5_CAP_GEN(mdev, ipoib_enhanced_offloads)) {
+			pr_warn("Not creating net device (IB), some required device capabilities are missing\n");
+			return -ENOTSUPP;
+		}
+		return 0;
 	}
-	if (!MLX5_CAP_ETH(mdev, self_lb_en_modifiable))
-		mlx5_core_warn(mdev, "Self loop back prevention is not supported\n");
-	if (!MLX5_CAP_GEN(mdev, cq_moderation))
-		mlx5_core_warn(mdev, "CQ modiration is not supported\n");
 
-	return 0;
+	return -ENOTSUPP;
+
 }
 
 u16 mlx5e_get_max_inline_cap(struct mlx5_core_dev *mdev)
@@ -3455,12 +3465,12 @@ u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout)
 	return MLX5_CAP_ETH(mdev, lro_timer_supported_periods[i]);
 }
 
-static void mlx5e_build_nic_netdev_priv(struct mlx5_core_dev *mdev,
+void mlx5n_build_nic_netdev_priv_common(struct mlx5_core_dev *mdev,
 					struct net_device *netdev,
+					struct mlx5e_priv *priv,
 					const struct mlx5e_profile *profile,
 					void *ppriv)
 {
-	struct mlx5e_priv *priv = netdev_priv(netdev);
 	u32 link_speed = 0;
 	u32 pci_bw = 0;
 	u8 cq_period_mode = MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ?
@@ -3524,6 +3534,15 @@ static void mlx5e_build_nic_netdev_priv(struct mlx5_core_dev *mdev,
 	MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_COMPRESS, priv->params.rx_cqe_compress_def);
 
 	mutex_init(&priv->state_lock);
+}
+
+static void mlx5e_build_nic_netdev_priv(struct mlx5_core_dev *mdev,
+					struct net_device *netdev,
+					struct mlx5e_priv *priv,
+					const struct mlx5e_profile *profile,
+					void *ppriv)
+{
+	mlx5n_build_nic_netdev_priv_common(mdev, netdev, priv, profile, ppriv);
 
 	INIT_WORK(&priv->update_carrier_work, mlx5e_update_carrier_work);
 	INIT_WORK(&priv->set_rx_mode_work, mlx5e_set_rx_mode_work);
@@ -3663,7 +3682,7 @@ static void mlx5e_nic_init(struct mlx5_core_dev *mdev,
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 
-	mlx5e_build_nic_netdev_priv(mdev, netdev, profile, ppriv);
+	mlx5e_build_nic_netdev_priv(mdev, netdev, priv, profile, ppriv);
 	mlx5e_build_nic_netdev(netdev);
 	mlx5e_vxlan_init(priv);
 }
@@ -3676,7 +3695,8 @@ static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
 		bpf_prog_put(priv->xdp_prog);
 }
 
-static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
+/* used for ETH && IB link layer */
+static int mlx5n_init_nic_rx_common(struct mlx5e_priv *priv)
 {
 	struct mlx5_core_dev *mdev = priv->mdev;
 	int err;
@@ -3706,20 +3726,12 @@ static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
 		goto err_destroy_indirect_tirs;
 	}
 
-	err = mlx5e_create_flow_steering(priv);
-	if (err) {
-		mlx5_core_warn(mdev, "create flow steering failed, %d\n", err);
-		goto err_destroy_direct_tirs;
-	}
-
 	err = mlx5e_tc_init(priv);
 	if (err)
-		goto err_destroy_flow_steering;
+		goto err_destroy_direct_tirs;
 
 	return 0;
 
-err_destroy_flow_steering:
-	mlx5e_destroy_flow_steering(priv);
 err_destroy_direct_tirs:
 	mlx5e_destroy_direct_tirs(priv);
 err_destroy_indirect_tirs:
@@ -3732,12 +3744,31 @@ static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
 	return err;
 }
 
-static void mlx5e_cleanup_nic_rx(struct mlx5e_priv *priv)
+static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	int err;
+
+	err = mlx5n_init_nic_rx_common(priv);
+	if (err) {
+		mlx5_core_warn(mdev, "failed create nic rx res, %d\n", err);
+		return err;
+	}
+
+	err = mlx5e_create_flow_steering(priv);
+	if (err) {
+		mlx5_core_warn(mdev, "create flow steering failed, %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+static void mlx5n_cleanup_nic_rx_common(struct mlx5e_priv *priv)
 {
 	int i;
 
 	mlx5e_tc_cleanup(priv);
-	mlx5e_destroy_flow_steering(priv);
 	mlx5e_destroy_direct_tirs(priv);
 	mlx5e_destroy_indirect_tirs(priv);
 	for (i = 0; i < priv->profile->max_nch(priv->mdev); i++)
@@ -3745,6 +3776,12 @@ static void mlx5e_cleanup_nic_rx(struct mlx5e_priv *priv)
 	mlx5e_destroy_rqt(priv, &priv->indir_rqt);
 }
 
+static void mlx5e_cleanup_nic_rx(struct mlx5e_priv *priv)
+{
+	mlx5e_destroy_flow_steering(priv);
+	mlx5n_cleanup_nic_rx_common(priv);
+}
+
 static int mlx5e_init_nic_tx(struct mlx5e_priv *priv)
 {
 	int err;
@@ -3836,7 +3873,8 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 		return NULL;
 	}
 
-	profile->init(mdev, netdev, profile, ppriv);
+	if (profile->init)
+		profile->init(mdev, netdev, profile, ppriv);
 
 	netif_carrier_off(netdev);
 
@@ -3849,7 +3887,9 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 	return netdev;
 
 err_cleanup_nic:
-	profile->cleanup(priv);
+	if (profile->cleanup)
+		profile->cleanup(priv);
+
 	free_netdev(netdev);
 
 	return NULL;
@@ -3881,7 +3921,7 @@ int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
 		goto err_close_drop_rq;
 
 	mlx5e_create_q_counter(priv);
-
+	//TBD do i need to change that?
 	mlx5e_init_l2_addr(priv);
 
 	/* MTU range: 68 - hw-specific max */
@@ -4006,7 +4046,7 @@ static void *mlx5e_add(struct mlx5_core_dev *mdev)
 	int err;
 	struct net_device *netdev;
 
-	err = mlx5e_check_required_hca_cap(mdev);
+	err = mlx5e_check_required_hca_cap(mdev, MLX5_INTERFACE_PROTOCOL_ETH);
 	if (err)
 		return NULL;
 
-- 
1.8.3.1

* [RFC v1 for accelerated IPoIB 09/25] net/mlx5e: Creating and Destroying flow-steering tables for IB link
@ 2017-03-13 18:31   ` Erez Shitrit
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 11/25] net/mlx5e: Refactor attach_netdev API Erez Shitrit
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


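Add mlx5i_create_flow_steering() and mlx5i_destroy_flow_steering() to create
and destroy the flow steering tables (aRFS and TTC) that provide RSS for the
IB link type.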
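
mlx5i_create_flow_steering() treats the two tables differently: a failure to
create the aRFS tables is logged and only clears NETIF_F_NTUPLE, while a
failure to create the TTC table aborts and unwinds the aRFS tables. A minimal
sketch of that ordering, using hypothetical demo_* types and failure-injection
flags (these are not mlx5 APIs):

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical state; the real code works on struct mlx5e_priv. */
struct demo_fs {
	bool ntuple_supported;	/* models NETIF_F_NTUPLE in hw_features */
	bool arfs_created;
	bool ttc_created;
	bool fail_arfs;		/* failure injection */
	bool fail_ttc;
};

static int demo_arfs_create(struct demo_fs *fs)
{
	if (fs->fail_arfs)
		return -1;
	fs->arfs_created = true;
	return 0;
}

/* Safe to call even if creation failed, like the unwind path assumes. */
static void demo_arfs_destroy(struct demo_fs *fs)
{
	fs->arfs_created = false;
}

static int demo_ttc_create(struct demo_fs *fs)
{
	if (fs->fail_ttc)
		return -1;
	fs->ttc_created = true;
	return 0;
}

/* aRFS is optional (failure only drops the ntuple feature); the TTC table
 * is mandatory and its failure unwinds what was created before it. */
static int demo_create_flow_steering(struct demo_fs *fs)
{
	int err;

	if (demo_arfs_create(fs)) {
		fprintf(stderr, "aRFS tables unavailable, disabling ntuple\n");
		fs->ntuple_supported = false;
	}

	err = demo_ttc_create(fs);
	if (err)
		goto err_destroy_arfs;

	return 0;

err_destroy_arfs:
	demo_arfs_destroy(fs);
	return err;
}
```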

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h    |  2 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 41 +++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 39f8ac849af7..f3337ec4457f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -755,6 +755,8 @@ void mlx5e_page_release(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info,
 
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
 void mlx5e_destroy_flow_steering(struct mlx5e_priv *priv);
+int mlx5i_create_flow_steering(struct mlx5e_priv *priv);
+void mlx5i_destroy_flow_steering(struct mlx5e_priv *priv);
 void mlx5e_init_l2_addr(struct mlx5e_priv *priv);
 void mlx5e_destroy_flow_table(struct mlx5e_flow_table *ft);
 int mlx5e_self_test_num(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index a0e5a69402b3..c6b40003007c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -1081,6 +1081,40 @@ static void mlx5e_destroy_vlan_table(struct mlx5e_priv *priv)
 	mlx5e_destroy_flow_table(&priv->fs.vlan.ft);
 }
 
+int mlx5i_create_flow_steering(struct mlx5e_priv *priv)
+{
+	int err;
+
+	priv->fs.ns = mlx5_get_flow_namespace(priv->mdev,
+					       MLX5_FLOW_NAMESPACE_KERNEL);
+
+	if (!priv->fs.ns)
+		return -EINVAL;
+
+	err = mlx5e_arfs_create_tables(priv);
+	if (err) {
+		netdev_err(priv->netdev, "Failed to create arfs tables, err=%d\n",
+			   err);
+		priv->netdev->hw_features &= ~NETIF_F_NTUPLE;
+	}
+
+	err = mlx5e_create_ttc_table(priv);
+	if (err) {
+		netdev_err(priv->netdev, "Failed to create ttc table, err=%d\n",
+			   err);
+		goto err_destroy_arfs_tables;
+	}
+
+	mlx5e_ethtool_init_steering(priv);
+
+	return 0;
+
+err_destroy_arfs_tables:
+	mlx5e_arfs_destroy_tables(priv);
+
+	return err;
+}
+
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv)
 {
 	int err;
@@ -1141,3 +1175,10 @@ void mlx5e_destroy_flow_steering(struct mlx5e_priv *priv)
 	mlx5e_arfs_destroy_tables(priv);
 	mlx5e_ethtool_cleanup_steering(priv);
 }
+
+void mlx5i_destroy_flow_steering(struct mlx5e_priv *priv)
+{
+	mlx5e_destroy_ttc_table(priv);
+	mlx5e_arfs_destroy_tables(priv);
+	mlx5e_ethtool_cleanup_steering(priv);
+}
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* [RFC v1 for accelerated IPoIB 10/25] net/mlx5e: Support netdevice creation for IB link type
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
                   ` (7 preceding siblings ...)
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 08/25] net/mlx5e: Refactor EN code to support IB link Erez Shitrit
@ 2017-03-13 18:31 ` Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 14/25] net/mlx5: Enable flow-steering for IB link Erez Shitrit
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


Implement the interface required to run the IB link type on top of the ETH
data structures: an IB netdev profile (mlx5i_nic_profile) and an exported
mlx5i_create_netdev().
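
The IB profile leaves several callbacks (.cleanup, .enable, .disable,
.update_stats) as NULL, which is why the shared creation code in patch 08 now
checks profile->init and profile->cleanup before calling them. A minimal
sketch of a callback table with optional entries; struct demo_profile and the
demo_* functions are hypothetical and only mirror the shape of
struct mlx5e_profile:

```c
#include <stddef.h>

struct demo_priv;

struct demo_profile {
	void (*init)(struct demo_priv *priv);
	void (*cleanup)(struct demo_priv *priv);	/* optional */
	int (*init_rx)(struct demo_priv *priv);		/* mandatory */
};

struct demo_priv {
	const struct demo_profile *profile;
	int init_calls;
	int rx_ready;
	int fail_rx;	/* failure injection */
};

static void demo_common_init(struct demo_priv *p) { p->init_calls++; }
static void demo_eth_cleanup(struct demo_priv *p) { p->init_calls--; }

static int demo_rx_init(struct demo_priv *p)
{
	if (p->fail_rx)
		return -1;
	p->rx_ready = 1;
	return 0;
}

static const struct demo_profile demo_eth_profile = {
	.init    = demo_common_init,
	.cleanup = demo_eth_cleanup,
	.init_rx = demo_rx_init,
};

/* Like mlx5i_nic_profile: no cleanup callback. */
static const struct demo_profile demo_ib_profile = {
	.init    = demo_common_init,
	.cleanup = NULL,
	.init_rx = demo_rx_init,
};

/* Generic code checks optional callbacks before calling them. */
static int demo_bring_up(struct demo_priv *p, const struct demo_profile *prof)
{
	int err;

	p->profile = prof;
	if (prof->init)
		prof->init(p);

	err = prof->init_rx(p);
	if (err && prof->cleanup)
		prof->cleanup(p);
	return err;
}
```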

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 98 +++++++++++++++++++++++
 include/linux/mlx5/driver.h                       | 12 +++
 2 files changed, 110 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 88541f99d37b..3db0334cdba0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3843,6 +3843,104 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
 	mlx5_lag_remove(mdev);
 }
 
+static void mlx5i_nic_init(struct mlx5_core_dev *mdev,
+			   struct net_device *netdev,
+			   const struct mlx5e_profile *profile,
+			   void *ppriv)
+{
+	struct mlx5e_priv *priv = ipoib_dev_priv(netdev);
+
+	mlx5n_build_nic_netdev_priv_common(mdev, netdev, priv, profile, ppriv);
+}
+
+static int mlx5i_init_nic_rx(struct mlx5e_priv *priv)
+{
+	struct mlx5_core_dev *mdev = priv->mdev;
+	int err;
+
+	err = mlx5n_init_nic_rx_common(priv);
+	if (err) {
+		mlx5_core_warn(mdev, "failed create nic rx res, %d\n", err);
+		return err;
+	}
+
+	err = mlx5i_create_flow_steering(priv);
+	if (err) {
+		mlx5_core_warn(mdev, "create flow steering failed, %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+
+static void mlx5i_cleanup_nic_rx(struct mlx5e_priv *priv)
+{
+	mlx5i_destroy_flow_steering(priv);
+	mlx5n_cleanup_nic_rx_common(priv);
+}
+
+static const struct mlx5e_profile mlx5i_nic_profile = {
+	.init		   = mlx5i_nic_init,
+	.cleanup	   = NULL,
+	.init_rx	   = mlx5i_init_nic_rx,
+	.cleanup_rx	   = mlx5i_cleanup_nic_rx,
+	.init_tx	   = mlx5e_init_nic_tx,
+	.cleanup_tx	   = mlx5e_cleanup_nic_tx,
+	.enable		   = NULL,/*mlx5e_nic_enable,*/
+	.disable	   = NULL,
+	.update_stats	   = NULL,/*mlx5e_update_stats,*/
+	.max_nch	   = mlx5e_get_max_num_channels,
+	.max_tc		   = MLX5E_MAX_NUM_TC,
+};
+
+struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
+				       const char *name,
+				       void (*setup)(struct net_device *dev),
+				       struct mlx5i_create_ext_param *param)
+{
+	const struct mlx5e_profile *profile = &mlx5i_nic_profile;
+	int nch = profile->max_nch(mdev);
+	struct net_device *netdev;
+	struct mlx5e_priv *priv;
+
+	if (mlx5e_check_required_hca_cap(mdev, MLX5_INTERFACE_PROTOCOL_IB))
+		return NULL;
+
+	netdev = alloc_netdev_mqs(sizeof(struct mlx5e_priv) + param->size_base_priv,
+				  name, NET_NAME_UNKNOWN,
+				  setup,
+				  nch * MLX5E_MAX_NUM_TC,
+				  nch);
+	if (!netdev) {
+		pr_err("alloc_netdev_mqs failed\n");
+		return NULL;
+	}
+
+	if (profile->init)
+		profile->init(mdev, netdev, profile, &param->size_base_priv);
+
+	netif_carrier_off(netdev);
+
+	priv = ipoib_dev_priv(netdev);
+
+	priv->underlay_qpn = param->qpn;
+
+	priv->wq = create_singlethread_workqueue("mlx5i");
+	if (!priv->wq)
+		goto err_cleanup_nic;
+
+	return netdev;
+
+err_cleanup_nic:
+	if (profile->cleanup)
+		profile->cleanup(priv);
+
+	free_netdev(netdev);
+
+	return NULL;
+}
+EXPORT_SYMBOL(mlx5i_create_netdev);
+
 static const struct mlx5e_profile mlx5e_nic_profile = {
 	.init		   = mlx5e_nic_init,
 	.cleanup	   = mlx5e_nic_cleanup,
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 886ff2b00500..d0060cfb2a4f 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -48,6 +48,7 @@
 #include <linux/mlx5/device.h>
 #include <linux/mlx5/doorbell.h>
 #include <linux/mlx5/srq.h>
+#include <rdma/ib_ipoib_accel_ops.h>
 
 enum {
 	MLX5_BOARD_ID_LEN = 64,
@@ -1127,4 +1128,15 @@ enum {
 	MLX5_TRIGGERED_CMD_COMP = (u64)1 << 32,
 };
 
+struct mlx5i_create_ext_param {
+	int	size_base_priv;
+	u32	qpn;
+};
+
+struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
+				       const char *name,
+				       void (*setup)(struct net_device *dev),
+				       struct mlx5i_create_ext_param *param);
+int mlx5i_attach(struct mlx5_core_dev *mdev, void *vpriv);
+void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv);
 #endif /* MLX5_DRIVER_H */
-- 
1.8.3.1

* [RFC v1 for accelerated IPoIB 11/25] net/mlx5e: Refactor attach_netdev API
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 09/25] net/mlx5e: Creating and Destroying flow-steering tables for IB link Erez Shitrit
@ 2017-03-13 18:31   ` Erez Shitrit
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 12/25] net/mlx5e: Use underlay_qpn in tis creation Erez Shitrit
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


Pass the priv object instead of the netdev object to mlx5e_attach_netdev(),
so that the function can also be used for the IB link.
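
One plausible reason for the change: in patch 10 the IB netdev's mlx5e_priv
is obtained with ipoib_dev_priv() rather than netdev_priv(), which suggests
it does not necessarily sit at the start of the netdev private area. Passing
priv, which carries a back-pointer to its netdev, avoids re-deriving it
inside the shared function. The sketch below assumes a hypothetical layout
(an IPoIB-owned area placed before the mlx5e-style private data) and
hypothetical demo_* types, to show why a fixed netdev-to-priv accessor would
be wrong for one of the two layouts:

```c
/* Hypothetical reduced mlx5e_priv with a back-pointer to its netdev. */
struct demo_netdev;

struct demo_epriv {
	struct demo_netdev *netdev;
	int attached;
};

/* Hypothetical netdev; priv_area models what netdev_priv() returns. */
struct demo_netdev {
	void *priv_area;
	int present;
};

/* ETH layout: the private area starts with the mlx5e-style priv. */
struct demo_eth_area {
	struct demo_epriv epriv;
};

/* Assumed IB layout: an IPoIB-owned part comes first. */
struct demo_ib_area {
	long ipoib_base[4];
	struct demo_epriv epriv;
};

/* netdev_priv()-style accessor: only correct when epriv is at offset 0. */
static struct demo_epriv *demo_netdev_priv(struct demo_netdev *nd)
{
	return nd->priv_area;
}

/* Shared attach path taking priv: the netdev is reached through the
 * back-pointer, whatever the private-area layout. */
static int demo_attach(struct demo_epriv *priv)
{
	priv->netdev->present = 1;
	priv->attached = 1;
	return 0;
}
```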

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 ++++----
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 4 +++-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index f3337ec4457f..e5c8badc38c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -918,7 +918,7 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 				       const struct mlx5e_profile *profile,
 				       void *ppriv);
 void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
-int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev);
+int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
 void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev);
 u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout);
 void mlx5e_add_vxlan_port(struct net_device *netdev,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 3db0334cdba0..0eb16ada0ae6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3993,14 +3993,14 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 	return NULL;
 }
 
-int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
+int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
 {
 	const struct mlx5e_profile *profile;
-	struct mlx5e_priv *priv;
+	struct net_device *netdev;
 	u16 max_mtu;
 	int err;
 
-	priv = netdev_priv(netdev);
+	netdev = priv->netdev;
 	profile = priv->profile;
 	clear_bit(MLX5E_STATE_DESTROYING, &priv->state);
 
@@ -4113,7 +4113,7 @@ static int mlx5e_attach(struct mlx5_core_dev *mdev, void *vpriv)
 	if (err)
 		return err;
 
-	err = mlx5e_attach_netdev(mdev, netdev);
+	err = mlx5e_attach_netdev(mdev, priv);
 	if (err) {
 		mlx5e_destroy_mdev_resources(mdev);
 		return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 850378893b25..0aad28da1638 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -556,6 +556,7 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
 			 struct mlx5_eswitch_rep *rep)
 {
 	struct net_device *netdev;
+	struct mlx5e_priv *priv;
 	int err;
 
 	netdev = mlx5e_create_netdev(esw->dev, &mlx5e_rep_profile, rep);
@@ -567,7 +568,8 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
 
 	rep->netdev = netdev;
 
-	err = mlx5e_attach_netdev(esw->dev, netdev);
+	priv = netdev_priv(netdev);
+	err = mlx5e_attach_netdev(esw->dev, priv);
 	if (err) {
 		pr_warn("Failed to attach representor netdev for vport %d\n",
 			rep->vport);
-- 
1.8.3.1

* [RFC v1 for accelerated IPoIB 12/25] net/mlx5e: Use underlay_qpn in tis creation
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 09/25] net/mlx5e: Creating and Destroying flow-steering tables for IB link Erez Shitrit
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 11/25] net/mlx5e: Refactor attach_netdev API Erez Shitrit
@ 2017-03-13 18:31   ` Erez Shitrit
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 13/25] net/mlx5e: Export resource creation function to be used in IB link Erez Shitrit
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


Set underlay_qpn when creating the TIS so that the IB link can use the same
code. For the ETH link, underlay_qpn is zero by default.
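
Writing the field unconditionally is safe for ETH because alloc_netdev_mqs()
returns a zeroed private area, so priv->underlay_qpn remains 0 unless the IB
path sets it (mlx5i_create_netdev() in patch 10 copies param->qpn into it).
A minimal sketch with hypothetical reduced types, using calloc() to model
the zeroed allocation:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reduced mlx5e_priv: only the field this patch adds. */
struct demo_priv {
	uint32_t underlay_qpn;
};

/* Hypothetical reduced TIS context. */
struct demo_tisc {
	uint8_t prio;
	uint32_t transport_domain;
	uint32_t underlay_qpn;	/* 24-bit field in the PRM */
};

/* Zeroed allocation, modelling the zeroed netdev private area. */
static struct demo_priv *demo_alloc_priv(void)
{
	return calloc(1, sizeof(struct demo_priv));
}

/* Shared TIS fill: ETH leaves underlay_qpn at 0, IB sets it beforehand. */
static void demo_fill_tisc(const struct demo_priv *priv, struct demo_tisc *t,
			   int tc, uint32_t tdn)
{
	t->prio = (uint8_t)(tc << 1);
	t->transport_domain = tdn;
	t->underlay_qpn = priv->underlay_qpn & 0xffffff;
}
```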

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      | 2 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e5c8badc38c7..84db4761f09c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -222,6 +222,7 @@ struct mlx5e_params {
 	bool rx_am_enabled;
 	u32 lro_timeout;
 	u32 pflags;
+	u32 underlay_qpn;
 };
 
 #ifdef CONFIG_MLX5_CORE_EN_DCB
@@ -718,6 +719,7 @@ struct mlx5e_priv {
 
 	const struct mlx5e_profile *profile;
 	void                      *ppriv;
+	u32                       underlay_qpn;
 };
 
 void mlx5e_build_ptys2ethtool_map(void);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 0eb16ada0ae6..87881f9ddf35 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2453,6 +2453,7 @@ static int mlx5e_create_tis(struct mlx5e_priv *priv, int tc)
 
 	MLX5_SET(tisc, tisc, prio, tc << 1);
 	MLX5_SET(tisc, tisc, transport_domain, mdev->mlx5e_res.td.tdn);
+	MLX5_SET(tisc, tisc, underlay_qpn, priv->underlay_qpn);
 
 	if (mlx5_lag_is_lacp_owner(mdev))
 		MLX5_SET(tisc, tisc, strict_lag_tx_port_affinity, 1);
-- 
1.8.3.1

* [RFC v1 for accelerated IPoIB 13/25] net/mlx5e: Export resource creation function to be used in IB link
                     ` (2 preceding siblings ...)
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 12/25] net/mlx5e: Use underlay_qpn in tis creation Erez Shitrit
@ 2017-03-13 18:31   ` Erez Shitrit
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 17/25] net/mlx5e: Change mlx5e_open_locked and mlx5e_close_locked api Erez Shitrit
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


Add mlx5i_attach(), which creates the resources of the IB network device,
and mlx5i_detach(), which releases them.
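
Both exported functions are guarded by netif_device_present(), so a repeated
attach or detach is a no-op, and mlx5i_attach() destroys the mdev resources
again if the netdev-level attach fails. A minimal sketch of that shape with
hypothetical demo_* state and helpers; it models the ordering only, not the
real mlx5 resource handling:

```c
#include <stdbool.h>

/* Hypothetical state standing in for netif_device_present() and the
 * resources created by mlx5e_create_mdev_resources(). */
struct demo_dev {
	bool present;
	bool mdev_res;
	bool fail_netdev_attach;	/* failure injection */
};

static int demo_create_mdev_res(struct demo_dev *d) { d->mdev_res = true; return 0; }
static void demo_destroy_mdev_res(struct demo_dev *d) { d->mdev_res = false; }

static int demo_attach_netdev(struct demo_dev *d)
{
	if (d->fail_netdev_attach)
		return -1;
	d->present = true;
	return 0;
}

static void demo_detach_netdev(struct demo_dev *d) { d->present = false; }

/* No-op if already attached; otherwise create the shared resources, then
 * the netdev ones, and undo the first step if the second fails. */
static int demo_attach(struct demo_dev *d)
{
	int err;

	if (d->present)
		return 0;

	err = demo_create_mdev_res(d);
	if (err)
		return err;

	err = demo_attach_netdev(d);
	if (err) {
		demo_destroy_mdev_res(d);
		return err;
	}
	return 0;
}

/* No-op if not attached; teardown runs in reverse order of creation. */
static void demo_detach(struct demo_dev *d)
{
	if (!d->present)
		return;
	demo_detach_netdev(d);
	demo_destroy_mdev_res(d);
}
```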

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 122 +++++++++++++++-------
 1 file changed, 87 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 87881f9ddf35..5b3c2e67607f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3844,6 +3844,54 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
 	mlx5_lag_remove(mdev);
 }
 
+static int mlx5n_attach_netdev_common(struct mlx5_core_dev *mdev,
+				      struct mlx5e_priv *priv)
+{
+	const struct mlx5e_profile *profile;
+	struct net_device *netdev;
+	int err;
+
+	netdev = priv->netdev;
+	profile = priv->profile;
+	clear_bit(MLX5E_STATE_DESTROYING, &priv->state);
+
+	err = profile->init_tx(priv);
+	if (err)
+		goto out;
+
+	err = mlx5e_open_drop_rq(priv);
+	if (err) {
+		mlx5_core_err(mdev, "open drop rq failed, %d\n", err);
+		goto err_cleanup_tx;
+	}
+
+	err = profile->init_rx(priv);
+	if (err)
+		goto err_close_drop_rq;
+
+	mlx5e_create_q_counter(priv);
+
+	if (profile->enable)
+		profile->enable(priv);
+
+	rtnl_lock();
+	if (netif_running(netdev))
+		mlx5e_open(netdev);
+	netif_device_attach(netdev);
+	rtnl_unlock();
+
+	return 0;
+
+err_close_drop_rq:
+	mlx5e_close_drop_rq(priv);
+
+err_cleanup_tx:
+	profile->cleanup_tx(priv);
+
+out:
+	return err;
+}
+
 static void mlx5i_nic_init(struct mlx5_core_dev *mdev,
 			   struct net_device *netdev,
 			   const struct mlx5e_profile *profile,
@@ -3942,6 +3990,42 @@ struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
 }
 EXPORT_SYMBOL(mlx5i_create_netdev);
 
+int mlx5i_attach(struct mlx5_core_dev *mdev, void *vpriv)
+{
+	struct mlx5e_priv *priv = vpriv;
+	struct net_device *netdev = priv->netdev;
+	int err;
+
+	if (netif_device_present(netdev))
+		return 0;
+
+	err = mlx5e_create_mdev_resources(mdev);
+	if (err)
+		return err;
+
+	err = mlx5n_attach_netdev_common(mdev, priv);
+	if (err) {
+		mlx5e_destroy_mdev_resources(mdev);
+		return err;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(mlx5i_attach);
+
+void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv)
+{
+	struct mlx5e_priv *priv = vpriv;
+	struct net_device *netdev = priv->netdev;
+
+	if (!netif_device_present(netdev))
+		return;
+
+	mlx5e_detach_netdev(mdev, netdev);
+	mlx5e_destroy_mdev_resources(mdev);
+}
+EXPORT_SYMBOL(mlx5i_detach);
+
 static const struct mlx5e_profile mlx5e_nic_profile = {
 	.init		   = mlx5e_nic_init,
 	.cleanup	   = mlx5e_nic_cleanup,
@@ -3996,31 +4080,17 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 
 int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
 {
-	const struct mlx5e_profile *profile;
 	struct net_device *netdev;
 	u16 max_mtu;
 	int err;
 
 	netdev = priv->netdev;
-	profile = priv->profile;
-	clear_bit(MLX5E_STATE_DESTROYING, &priv->state);
-
-	err = profile->init_tx(priv);
-	if (err)
-		goto out;
 
-	err = mlx5e_open_drop_rq(priv);
+	err = mlx5n_attach_netdev_common(mdev, priv);
 	if (err) {
-		mlx5_core_err(mdev, "open drop rq failed, %d\n", err);
-		goto err_cleanup_tx;
+		mlx5_core_err(mdev, "failed to attach netdev, %d\n", err);
+		return err;
 	}
-
-	err = profile->init_rx(priv);
-	if (err)
-		goto err_close_drop_rq;
-
-	mlx5e_create_q_counter(priv);
-	//TBD do i need to change that?
 	mlx5e_init_l2_addr(priv);
 
 	/* MTU range: 68 - hw-specific max */
@@ -4030,25 +4100,7 @@ int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
 
 	mlx5e_set_dev_port_mtu(netdev);
 
-	if (profile->enable)
-		profile->enable(priv);
-
-	rtnl_lock();
-	if (netif_running(netdev))
-		mlx5e_open(netdev);
-	netif_device_attach(netdev);
-	rtnl_unlock();
-
 	return 0;
-
-err_close_drop_rq:
-	mlx5e_close_drop_rq(priv);
-
-err_cleanup_tx:
-	profile->cleanup_tx(priv);
-
-out:
-	return err;
 }
 
 static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [RFC v1 for accelerated IPoIB 14/25] net/mlx5: Enable flow-steering for IB link
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
                   ` (8 preceding siblings ...)
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 10/25] net/mlx5e: Support netdevice creation for IB link type Erez Shitrit
@ 2017-03-13 18:31 ` Erez Shitrit
  2017-03-15 18:56     ` Leon Romanovsky
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 15/25] net/mlx5e: Enhanced flow table creation to support ETH and IB links Erez Shitrit
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


Query the relevant capabilities when the device supports
ipoib_enhanced_offloads, and initialize the flow-steering table accordingly.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 10 +++++-----
 drivers/net/ethernet/mellanox/mlx5/core/fw.c      |  3 ++-
 2 files changed, 7 insertions(+), 6 deletions(-)
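
The enabling condition in mlx5_init_fs() and the capability query in
mlx5_query_hca_caps() can be restated as two predicates. This sketch
assumes a hypothetical struct caps whose boolean fields stand for the
MLX5_CAP_GEN()/MLX5_CAP_FLOWTABLE_NIC_RX() bits used in the hunks below;
it models only the boolean logic, not how the bits are read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability snapshot; not an mlx5 type. */
enum port_type { PORT_TYPE_IB, PORT_TYPE_ETH };

struct caps {
	enum port_type port_type;
	bool nic_flow_table;
	bool ipoib_enhanced_offloads;
	bool nic_rx_ft_support;
};

/* fw.c side: fetch flow-table caps if either feature bit is set. */
bool should_query_flow_table_caps(const struct caps *c)
{
	return c->nic_flow_table || c->ipoib_enhanced_offloads;
}

/*
 * fs_core.c side: build the NIC RX root namespace for ETH ports with
 * nic_flow_table, or for IB ports with enhanced IPoIB offloads, and in
 * both cases only when RX flow tables are supported.
 */
bool should_init_nic_rx_root(const struct caps *c)
{
	bool eth_ok = c->port_type == PORT_TYPE_ETH && c->nic_flow_table;
	bool ib_ok = c->port_type == PORT_TYPE_IB &&
		     c->ipoib_enhanced_offloads;

	return (eth_ok || ib_ok) && c->nic_rx_ft_support;
}
```

Note that each feature bit only counts for its own port type: an ETH
port that advertises ipoib_enhanced_offloads but not nic_flow_table
still does not get a NIC RX root namespace.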

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index fa4edd88daf1..dd21fc557281 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -1991,9 +1991,6 @@ int mlx5_init_fs(struct mlx5_core_dev *dev)
 	struct mlx5_flow_steering *steering;
 	int err = 0;
 
-	if (MLX5_CAP_GEN(dev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
-		return 0;
-
 	err = mlx5_init_fc_stats(dev);
 	if (err)
 		return err;
@@ -2004,8 +2001,11 @@ int mlx5_init_fs(struct mlx5_core_dev *dev)
 	steering->dev = dev;
 	dev->priv.steering = steering;
 
-	if (MLX5_CAP_GEN(dev, nic_flow_table) &&
-	    MLX5_CAP_FLOWTABLE_NIC_RX(dev, ft_support)) {
+	if ((((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH) &&
+	      (MLX5_CAP_GEN(dev, nic_flow_table))) ||
+	     ((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_IB) &&
+	      MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)))
+	    && MLX5_CAP_FLOWTABLE_NIC_RX(dev, ft_support)) {
 		err = init_root_ns(steering);
 		if (err)
 			goto err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 5718aada6605..f95bc78b02f2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -123,7 +123,8 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
 			return err;
 	}
 
-	if (MLX5_CAP_GEN(dev, nic_flow_table)) {
+	if (MLX5_CAP_GEN(dev, nic_flow_table) ||
+	    MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)) {
 		err = mlx5_core_get_caps(dev, MLX5_CAP_FLOW_TABLE);
 		if (err)
 			return err;
-- 
1.8.3.1

* [RFC v1 for accelerated IPoIB 15/25] net/mlx5e: Enhanced flow table creation to support ETH and IB links.
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
                   ` (9 preceding siblings ...)
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 14/25] net/mlx5: Enable flow-steering for IB link Erez Shitrit
@ 2017-03-13 18:31 ` Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 16/25] net/mlx5e: Change cleanup API in order to enable IB link Erez Shitrit
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


An IB link needs the underlay QP to support flow steering, so change
the flow-steering creation API to support both link types with the
same set of functions.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  | 12 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c    | 39 ++++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  9 ++-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 19 +++++-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |  8 +++
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 67 ++++++++++++++--------
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |  1 +
 include/linux/mlx5/fs.h                            | 16 ++++--
 8 files changed, 125 insertions(+), 46 deletions(-)
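
A sketch of how the parameter block and the new underlay_qpn field fit
together, assuming simplified stand-ins (struct ft_param, struct ft,
make_ft() and root_update_action() are hypothetical, not mlx5 types).
Because the callers below zero-initialize the parameter block,
underlay_qpn defaults to 0, and mlx5_cmd_update_root_ft() treats 0 on an
IB port as "no underlay QP" and returns before issuing
SET_FLOW_TABLE_ROOT. Unlike __mlx5_create_flow_table(), which adds
start_level to param->level in place, the sketch computes the absolute
level into the new table so a parameter block can be reused unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct create_flow_table_param (no ns). */
struct ft_param {
	int prio;
	int max_fte;
	uint32_t level;
	uint32_t flags;
	uint32_t underlay_qpn; /* 0 means "no underlay QP" */
};

struct ft {
	uint32_t level;
	uint32_t underlay_qpn;
};

/* Copy fields from the param block; the caller's param is not modified. */
struct ft make_ft(const struct ft_param *p, uint32_t prio_start_level)
{
	struct ft t = {
		.level = p->level + prio_start_level,
		.underlay_qpn = p->underlay_qpn,
	};
	return t;
}

enum root_action { ROOT_SKIP, ROOT_SET, ROOT_SET_WITH_UNDERLAY };

/* Mirrors the two checks added to mlx5_cmd_update_root_ft(). */
enum root_action root_update_action(bool port_is_ib, const struct ft *t)
{
	if (port_is_ib && t->underlay_qpn == 0)
		return ROOT_SKIP;              /* IB table without underlay QP */
	if (port_is_ib)
		return ROOT_SET_WITH_UNDERLAY; /* pass underlay_qpn to firmware */
	return ROOT_SET;                       /* Ethernet: unchanged path */
}
```

With designated initializers, a caller names only the fields it cares
about and every other field, including underlay_qpn, is zero.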

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
index 68419a01db36..ea3032d97b0d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -325,10 +325,18 @@ static int arfs_create_table(struct mlx5e_priv *priv,
 {
 	struct mlx5e_arfs_tables *arfs = &priv->fs.arfs;
 	struct mlx5e_flow_table *ft = &arfs->arfs_tables[type].ft;
+	struct create_flow_table_param param = {0};
 	int err;
 
-	ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-				       MLX5E_ARFS_TABLE_SIZE, MLX5E_ARFS_FT_LEVEL, 0);
+	ft->num_groups = 0;
+
+	param.ns = priv->fs.ns;
+	param.prio = MLX5E_NIC_PRIO;
+	param.max_fte = MLX5E_ARFS_TABLE_SIZE;
+	param.level = MLX5E_ARFS_FT_LEVEL;
+	param.flags = 0;
+
+	ft->t = mlx5_create_flow_table(&param);
 	if (IS_ERR(ft->t)) {
 		err = PTR_ERR(ft->t);
 		ft->t = NULL;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index c6b40003007c..46b48b76e7ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -779,9 +779,16 @@ static int mlx5e_create_ttc_table(struct mlx5e_priv *priv)
 	struct mlx5e_ttc_table *ttc = &priv->fs.ttc;
 	struct mlx5e_flow_table *ft = &ttc->ft;
 	int err;
+	struct create_flow_table_param param = {0};
 
-	ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-				       MLX5E_TTC_TABLE_SIZE, MLX5E_TTC_FT_LEVEL, 0);
+	param.ns = priv->fs.ns;
+	param.prio = MLX5E_NIC_PRIO;
+	param.max_fte = MLX5E_TTC_TABLE_SIZE;
+	param.level = MLX5E_TTC_FT_LEVEL;
+	param.flags = 0;
+	param.underlay_qpn = priv->underlay_qpn;
+
+	ft->t = mlx5_create_flow_table(&param);
 	if (IS_ERR(ft->t)) {
 		err = PTR_ERR(ft->t);
 		ft->t = NULL;
@@ -952,10 +959,16 @@ static int mlx5e_create_l2_table(struct mlx5e_priv *priv)
 	struct mlx5e_l2_table *l2_table = &priv->fs.l2;
 	struct mlx5e_flow_table *ft = &l2_table->ft;
 	int err;
+	struct create_flow_table_param param = {0};
+
+	param.ns = priv->fs.ns;
+	param.prio = MLX5E_NIC_PRIO;
+	param.max_fte = MLX5E_L2_TABLE_SIZE;
+	param.level = MLX5E_L2_FT_LEVEL;
+	param.flags = 0;
 
 	ft->num_groups = 0;
-	ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-				       MLX5E_L2_TABLE_SIZE, MLX5E_L2_FT_LEVEL, 0);
+	ft->t = mlx5_create_flow_table(&param);
 
 	if (IS_ERR(ft->t)) {
 		err = PTR_ERR(ft->t);
@@ -1041,11 +1054,18 @@ static int mlx5e_create_vlan_table_groups(struct mlx5e_flow_table *ft)
 static int mlx5e_create_vlan_table(struct mlx5e_priv *priv)
 {
 	struct mlx5e_flow_table *ft = &priv->fs.vlan.ft;
+	struct create_flow_table_param param = {0};
 	int err;
 
 	ft->num_groups = 0;
-	ft->t = mlx5_create_flow_table(priv->fs.ns, MLX5E_NIC_PRIO,
-				       MLX5E_VLAN_TABLE_SIZE, MLX5E_VLAN_FT_LEVEL, 0);
+
+	param.ns = priv->fs.ns;
+	param.prio = MLX5E_NIC_PRIO;
+	param.max_fte = MLX5E_VLAN_TABLE_SIZE;
+	param.level = MLX5E_VLAN_FT_LEVEL;
+	param.flags = 0;
+
+	ft->t = mlx5_create_flow_table(&param);
 
 	if (IS_ERR(ft->t)) {
 		err = PTR_ERR(ft->t);
@@ -1091,13 +1111,6 @@ int mlx5i_create_flow_steering(struct mlx5e_priv *priv)
 	if (!priv->fs.ns)
 		return -EINVAL;
 
-	err = mlx5e_arfs_create_tables(priv);
-	if (err) {
-		netdev_err(priv->netdev, "Failed to create arfs tables, err=%d\n",
-			   err);
-		priv->netdev->hw_features &= ~NETIF_F_NTUPLE;
-	}
-
 	err = mlx5e_create_ttc_table(priv);
 	if (err) {
 		netdev_err(priv->netdev, "Failed to create ttc table, err=%d\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index d0c8bf014453..06dfe755f931 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -340,6 +340,7 @@ static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw, int nvports)
 	struct mlx5_core_dev *dev = esw->dev;
 	struct mlx5_flow_namespace *root_ns;
 	struct mlx5_flow_table *fdb;
+	struct create_flow_table_param param = {0};
 	struct mlx5_flow_group *g;
 	void *match_criteria;
 	int table_size;
@@ -361,8 +362,14 @@ static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw, int nvports)
 		return -ENOMEM;
 	memset(flow_group_in, 0, inlen);
 
 	table_size = BIT(MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size));
+	param.ns = root_ns;
+	param.prio = 0;
+	param.level = 0;
+	param.max_fte = table_size;
+	param.flags = 0;
+
-	fdb = mlx5_create_flow_table(root_ns, 0, table_size, 0, 0);
+	fdb = mlx5_create_flow_table(&param);
 	if (IS_ERR(fdb)) {
 		err = PTR_ERR(fdb);
 		esw_warn(dev, "Failed to create FDB Table err %d\n", err);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 595f7c7383b3..5e929888f0d8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -410,6 +410,7 @@ static int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)
 	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
 	struct mlx5_core_dev *dev = esw->dev;
 	struct mlx5_flow_namespace *root_ns;
+	struct create_flow_table_param param = {0};
 	struct mlx5_flow_table *fdb = NULL;
 	struct mlx5_flow_group *g;
 	u32 *flow_group_in;
@@ -447,7 +448,14 @@ static int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)
 	esw->fdb_table.fdb = fdb;
 
 	table_size = nvports + MAX_PF_SQ + 1;
-	fdb = mlx5_create_flow_table(root_ns, FDB_SLOW_PATH, table_size, 0, 0);
+
+	param.ns = root_ns;
+	param.prio = FDB_SLOW_PATH;
+	param.level = 0;
+	param.max_fte = table_size;
+	param.flags = 0;
+
+	fdb = mlx5_create_flow_table(&param);
 	if (IS_ERR(fdb)) {
 		err = PTR_ERR(fdb);
 		esw_warn(dev, "Failed to create slow path FDB Table err %d\n", err);
@@ -531,6 +539,7 @@ static int esw_create_offloads_table(struct mlx5_eswitch *esw)
 	struct mlx5_flow_namespace *ns;
 	struct mlx5_flow_table *ft_offloads;
 	struct mlx5_core_dev *dev = esw->dev;
+	struct create_flow_table_param param = {0};
 	int err = 0;
 
 	ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_OFFLOADS);
@@ -539,7 +548,13 @@ static int esw_create_offloads_table(struct mlx5_eswitch *esw)
 		return -EOPNOTSUPP;
 	}
 
-	ft_offloads = mlx5_create_flow_table(ns, 0, dev->priv.sriov.num_vfs + 2, 0, 0);
+	param.ns = ns;
+	param.prio = 0;
+	param.level = 0;
+	param.max_fte = dev->priv.sriov.num_vfs + 2;
+	param.flags = 0;
+
+	ft_offloads = mlx5_create_flow_table(&param);
 	if (IS_ERR(ft_offloads)) {
 		err = PTR_ERR(ft_offloads);
 		esw_warn(esw->dev, "Failed to create offloads table, err %d\n", err);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index b53fc85a2375..d82721f00f94 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -45,6 +45,10 @@ int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
 	u32 in[MLX5_ST_SZ_DW(set_flow_table_root_in)]   = {0};
 	u32 out[MLX5_ST_SZ_DW(set_flow_table_root_out)] = {0};
 
+	if ((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_IB) &&
+	    ft->underlay_qpn == 0)
+		return 0;
+
 	MLX5_SET(set_flow_table_root_in, in, opcode,
 		 MLX5_CMD_OP_SET_FLOW_TABLE_ROOT);
 	MLX5_SET(set_flow_table_root_in, in, table_type, ft->type);
@@ -54,6 +58,10 @@ int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
 		MLX5_SET(set_flow_table_root_in, in, other_vport, 1);
 	}
 
+	if ((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_IB) &&
+	    ft->underlay_qpn != 0)
+		MLX5_SET(set_flow_table_root_in, in, underlay_qpn, ft->underlay_qpn);
+
 	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index dd21fc557281..07e766770c14 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -776,18 +776,16 @@ static void list_add_flow_table(struct mlx5_flow_table *ft,
 	list_add(&ft->node.list, prev);
 }
 
-static struct mlx5_flow_table *__mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
+static struct mlx5_flow_table *__mlx5_create_flow_table(struct create_flow_table_param *param,
 							enum fs_flow_table_op_mod op_mod,
-							u16 vport, int prio,
-							int max_fte, u32 level,
-							u32 flags)
+							u16 vport)
 {
 	struct mlx5_flow_table *next_ft = NULL;
 	struct mlx5_flow_table *ft;
 	int err;
 	int log_table_sz;
 	struct mlx5_flow_root_namespace *root =
-		find_root(&ns->node);
+		find_root(&param->ns->node);
 	struct fs_prio *fs_prio = NULL;
 
 	if (!root) {
@@ -796,29 +794,31 @@ static struct mlx5_flow_table *__mlx5_create_flow_table(struct mlx5_flow_namespa
 	}
 
 	mutex_lock(&root->chain_lock);
-	fs_prio = find_prio(ns, prio);
+	fs_prio = find_prio(param->ns, param->prio);
 	if (!fs_prio) {
 		err = -EINVAL;
 		goto unlock_root;
 	}
-	if (level >= fs_prio->num_levels) {
+	if (param->level >= fs_prio->num_levels) {
 		err = -ENOSPC;
 		goto unlock_root;
 	}
 	/* The level is related to the
 	 * priority level range.
 	 */
-	level += fs_prio->start_level;
-	ft = alloc_flow_table(level,
+	param->level += fs_prio->start_level;
+	ft = alloc_flow_table(param->level,
 			      vport,
-			      max_fte ? roundup_pow_of_two(max_fte) : 0,
+			      param->max_fte ? roundup_pow_of_two(param->max_fte) : 0,
 			      root->table_type,
-			      op_mod, flags);
+			      op_mod, param->flags);
 	if (!ft) {
 		err = -ENOMEM;
 		goto unlock_root;
 	}
 
+	ft->underlay_qpn = param->underlay_qpn;
+
 	tree_init_node(&ft->node, 1, del_flow_table);
 	log_table_sz = ft->max_fte ? ilog2(ft->max_fte) : 0;
 	next_ft = find_next_chained_ft(fs_prio);
@@ -847,29 +847,36 @@ static struct mlx5_flow_table *__mlx5_create_flow_table(struct mlx5_flow_namespa
 	return ERR_PTR(err);
 }
 
-struct mlx5_flow_table *mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
-					       int prio, int max_fte,
-					       u32 level,
-					       u32 flags)
+struct mlx5_flow_table *mlx5_create_flow_table(struct create_flow_table_param *param)
 {
-	return __mlx5_create_flow_table(ns, FS_FT_OP_MOD_NORMAL, 0, prio,
-					max_fte, level, flags);
+	return __mlx5_create_flow_table(param, FS_FT_OP_MOD_NORMAL, 0);
 }
 
 struct mlx5_flow_table *mlx5_create_vport_flow_table(struct mlx5_flow_namespace *ns,
 						     int prio, int max_fte,
 						     u32 level, u16 vport)
 {
-	return __mlx5_create_flow_table(ns, FS_FT_OP_MOD_NORMAL, vport, prio,
-					max_fte, level, 0);
+	struct create_flow_table_param param = {0};
+
+	param.ns = ns;
+	param.prio = prio;
+	param.max_fte = max_fte;
+	param.level = level;
+
+	return __mlx5_create_flow_table(&param, FS_FT_OP_MOD_NORMAL, vport);
 }
 
 struct mlx5_flow_table *mlx5_create_lag_demux_flow_table(
 					       struct mlx5_flow_namespace *ns,
 					       int prio, u32 level)
 {
-	return __mlx5_create_flow_table(ns, FS_FT_OP_MOD_LAG_DEMUX, 0, prio, 0,
-					level, 0);
+	struct create_flow_table_param param = {0};
+
+	param.ns = ns;
+	param.prio = prio;
+	param.level = level;
+
+	return __mlx5_create_flow_table(&param, FS_FT_OP_MOD_LAG_DEMUX, 0);
 }
 EXPORT_SYMBOL(mlx5_create_lag_demux_flow_table);
 
@@ -881,11 +888,18 @@ struct mlx5_flow_table *mlx5_create_auto_grouped_flow_table(struct mlx5_flow_nam
 							    u32 flags)
 {
 	struct mlx5_flow_table *ft;
+	struct create_flow_table_param param = {0};
 
 	if (max_num_groups > num_flow_table_entries)
 		return ERR_PTR(-EINVAL);
 
-	ft = mlx5_create_flow_table(ns, prio, num_flow_table_entries, level, flags);
+	param.ns = ns;
+	param.prio = prio;
+	param.level = level;
+	param.max_fte = num_flow_table_entries;
+	param.flags = flags;
+
+	ft = mlx5_create_flow_table(&param);
 	if (IS_ERR(ft))
 		return ft;
 
@@ -1828,11 +1842,18 @@ static int create_anchor_flow_table(struct mlx5_flow_steering *steering)
 {
 	struct mlx5_flow_namespace *ns = NULL;
 	struct mlx5_flow_table *ft;
+	struct create_flow_table_param param = {0};
 
 	ns = mlx5_get_flow_namespace(steering->dev, MLX5_FLOW_NAMESPACE_ANCHOR);
 	if (WARN_ON(!ns))
 		return -EINVAL;
-	ft = mlx5_create_flow_table(ns, ANCHOR_PRIO, ANCHOR_SIZE, ANCHOR_LEVEL, 0);
+	param.ns = ns;
+	param.prio = ANCHOR_PRIO;
+	param.level = ANCHOR_LEVEL;
+	param.max_fte = ANCHOR_SIZE;
+	param.flags = 0;
+
+	ft = mlx5_create_flow_table(&param);
 	if (IS_ERR(ft)) {
 		mlx5_core_err(steering->dev, "Failed to create last anchor flow table");
 		return PTR_ERR(ft);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index 8e668c63f69e..9ec8a2835642 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -118,6 +118,7 @@ struct mlx5_flow_table {
 	/* FWD rules that point on this flow table */
 	struct list_head		fwd_rules;
 	u32				flags;
+	u32				underlay_qpn;
 };
 
 struct mlx5_fc_cache {
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 949b24b6c479..9ed3cfa607d1 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -104,12 +104,18 @@ struct mlx5_flow_table *
 				    u32 level,
 				    u32 flags);
 
+struct create_flow_table_param {
+	struct mlx5_flow_namespace *ns;
+	int prio;
+	int max_fte;
+	u32 level;
+	u32 flags;
+	u32 underlay_qpn;
+};
+
 struct mlx5_flow_table *
-mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
-		       int prio,
-		       int num_flow_table_entries,
-		       u32 level,
-		       u32 flags);
+mlx5_create_flow_table(struct create_flow_table_param *param);
+
 struct mlx5_flow_table *
 mlx5_create_vport_flow_table(struct mlx5_flow_namespace *ns,
 			     int prio,
-- 
1.8.3.1

* [RFC v1 for accelerated IPoIB 16/25] net/mlx5e: Change cleanup API in order to enable IB link
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
                   ` (10 preceding siblings ...)
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 15/25] net/mlx5e: Enhanced flow table creation to support ETH and IB links Erez Shitrit
@ 2017-03-13 18:31 ` Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 21/25] net/mlx5e: Export send function for IB link type Erez Shitrit
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


1. Change the mlx5e_detach_netdev API to take a struct mlx5e_priv
   instead of a struct net_device.
2. Allow the function to be called when rtnl_lock is already held,
   as is done for the IB link.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 18 +++++++++++-------
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  |  4 ++--
 3 files changed, 14 insertions(+), 10 deletions(-)
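
The conditional locking in mlx5e_detach_netdev() below can be modelled
in user space as follows. This is a hypothetical sketch in which a
pthread mutex stands in for RTNL; big_lock and its helpers are invented
names. One caveat it makes explicit: in the kernel, rtnl_is_locked()
reports whether any task holds the RTNL mutex, not whether the caller
does, so the sketch tracks ownership with a per-thread flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the RTNL mutex. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
/* Set only while the current thread owns big_lock. */
static _Thread_local bool big_lock_held_by_me;

void big_lock_acquire(void)
{
	pthread_mutex_lock(&big_lock);
	big_lock_held_by_me = true;
}

void big_lock_release(void)
{
	big_lock_held_by_me = false;
	pthread_mutex_unlock(&big_lock);
}

static int detach_count;

/* Safe to call whether or not the caller already holds big_lock. */
void detach_dev(void)
{
	bool locked = false;

	if (!big_lock_held_by_me) {
		big_lock_acquire();
		locked = true;
	}

	detach_count++; /* stands in for close + netif_device_detach() */

	if (locked)
		big_lock_release();
}
```

A common alternative is to split the function into a variant that
expects the lock to be held and a wrapper that takes it, which is the
shape mlx5e_open()/mlx5e_open_locked() already have in this driver.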

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 84db4761f09c..a10966df24f6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -921,7 +921,7 @@ struct net_device *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
 				       void *ppriv);
 void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
 int mlx5e_attach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
-void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev);
+void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
 u32 mlx5e_choose_lro_timeout(struct mlx5_core_dev *mdev, u32 wanted_timeout);
 void mlx5e_add_vxlan_port(struct net_device *netdev,
 			  struct udp_tunnel_info *ti);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 5b3c2e67607f..b91bd7a179fc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4021,7 +4021,7 @@ void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv)
 	if (!netif_device_present(netdev))
 		return;
 
-	mlx5e_detach_netdev(mdev, netdev);
+	mlx5e_detach_netdev(mdev, priv);
 	mlx5e_destroy_mdev_resources(mdev);
 }
 EXPORT_SYMBOL(mlx5i_detach);
@@ -4126,18 +4126,22 @@ static void mlx5e_register_vport_rep(struct mlx5_core_dev *mdev)
 	}
 }
 
-void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct net_device *netdev)
+void mlx5e_detach_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv)
 {
-	struct mlx5e_priv *priv = netdev_priv(netdev);
+	struct net_device *netdev = priv->netdev;
 	const struct mlx5e_profile *profile = priv->profile;
-
+	bool locked = false;
 	set_bit(MLX5E_STATE_DESTROYING, &priv->state);
 
-	rtnl_lock();
+	if (!rtnl_is_locked()) {
+		rtnl_lock();
+		locked = true;
+	}
 	if (netif_running(netdev))
 		mlx5e_close(netdev);
 	netif_device_detach(netdev);
-	rtnl_unlock();
+	if (locked)
+		rtnl_unlock();
 
 	if (profile->disable)
 		profile->disable(priv);
@@ -4183,7 +4187,7 @@ static void mlx5e_detach(struct mlx5_core_dev *mdev, void *vpriv)
 	if (!netif_device_present(netdev))
 		return;
 
-	mlx5e_detach_netdev(mdev, netdev);
+	mlx5e_detach_netdev(mdev, priv);
 	mlx5e_destroy_mdev_resources(mdev);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 0aad28da1638..21d3d8e0bab7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -586,7 +586,7 @@ int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
 	return 0;
 
 err_detach_netdev:
-	mlx5e_detach_netdev(esw->dev, netdev);
+	mlx5e_detach_netdev(esw->dev, priv);
 
 err_destroy_netdev:
 	mlx5e_destroy_netdev(esw->dev, netdev_priv(netdev));
@@ -601,6 +601,6 @@ void mlx5e_vport_rep_unload(struct mlx5_eswitch *esw,
 	struct net_device *netdev = rep->netdev;
 
 	unregister_netdev(netdev);
-	mlx5e_detach_netdev(esw->dev, netdev);
+	mlx5e_detach_netdev(esw->dev, netdev_priv(netdev));
 	mlx5e_destroy_netdev(esw->dev, netdev_priv(netdev));
 }
-- 
1.8.3.1

* [RFC v1 for accelerated IPoIB 17/25] net/mlx5e: Change mlx5e_open_locked and mlx5e_close_locked api
       [not found] ` <1489429896-10781-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 13/25] net/mlx5e: Export resource creation function to be used in IB link Erez Shitrit
@ 2017-03-13 18:31   ` Erez Shitrit
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 18/25] net/mlx5e: Export open/close api for IB link Erez Shitrit
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w,
	saedm-VPRAkNaXOzVWk0Htik3J/w,
	erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Erez Shitrit


Let the IB link call these functions directly with the relevant priv.

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  4 +--
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 24 ++++++++---------
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 31 +++++++++++-----------
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  4 +--
 4 files changed, 31 insertions(+), 32 deletions(-)
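
The close/reconfigure/reopen sequence that the ethtool handlers below
perform can be sketched with a hypothetical struct priv (not struct
mlx5e_priv; open_locked(), close_locked() and set_rq_size() are
illustrative names). Because the *_locked helpers take the private
structure directly, any caller that already has it, such as the IB
link, can drive them without going through netdev_priv().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical private state; field names loosely follow mlx5e_priv. */
struct priv {
	pthread_mutex_t state_lock;
	bool opened;
	int log_rq_size;
	int open_calls;
};

/* Callers must hold p->state_lock. */
static int open_locked(struct priv *p)
{
	p->opened = true;
	p->open_calls++;
	return 0;
}

/* Callers must hold p->state_lock; closing twice is harmless. */
static int close_locked(struct priv *p)
{
	if (!p->opened)
		return 0;
	p->opened = false;
	return 0;
}

/* ethtool-style setter: close, change params, reopen if it was open. */
int set_rq_size(struct priv *p, int log_rq_size)
{
	bool was_opened;
	int err = 0;

	pthread_mutex_lock(&p->state_lock);
	was_opened = p->opened;
	if (was_opened)
		close_locked(p);

	p->log_rq_size = log_rq_size;

	if (was_opened)
		err = open_locked(p);
	pthread_mutex_unlock(&p->state_lock);
	return err;
}
```

A closed device only has its parameters updated; the new values take
effect the next time it is opened.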

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index a10966df24f6..2f9242ae06f3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -797,8 +797,8 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, __always_unused __be16 proto,
 void mlx5e_build_indir_tir_ctx_hash(struct mlx5e_priv *priv, void *tirc,
 				    enum mlx5e_traffic_types tt);
 
-int mlx5e_open_locked(struct net_device *netdev);
-int mlx5e_close_locked(struct net_device *netdev);
+int mlx5e_open_locked(struct mlx5e_priv *priv);
+int mlx5e_close_locked(struct mlx5e_priv *priv);
 void mlx5e_build_default_indir_rqt(struct mlx5_core_dev *mdev,
 				   u32 *indirection_rqt, int len,
 				   int num_channels);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index bb67863aa361..0c8773718292 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -516,14 +516,14 @@ static int mlx5e_set_ringparam(struct net_device *dev,
 
 	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	if (was_opened)
-		mlx5e_close_locked(dev);
+		mlx5e_close_locked(priv);
 
 	priv->params.log_rq_size = log_rq_size;
 	priv->params.log_sq_size = log_sq_size;
 	priv->params.min_rx_wqes = min_rx_wqes;
 
 	if (was_opened)
-		err = mlx5e_open_locked(dev);
+		err = mlx5e_open_locked(priv);
 
 	mutex_unlock(&priv->state_lock);
 
@@ -561,7 +561,7 @@ static int mlx5e_set_channels(struct net_device *dev,
 
 	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	if (was_opened)
-		mlx5e_close_locked(dev);
+		mlx5e_close_locked(priv);
 
 	arfs_enabled = dev->features & NETIF_F_NTUPLE;
 	if (arfs_enabled)
@@ -572,7 +572,7 @@ static int mlx5e_set_channels(struct net_device *dev,
 				      MLX5E_INDIR_RQT_SIZE, count);
 
 	if (was_opened)
-		err = mlx5e_open_locked(dev);
+		err = mlx5e_open_locked(priv);
 	if (err)
 		goto out;
 
@@ -626,7 +626,7 @@ static int mlx5e_set_coalesce(struct net_device *netdev,
 
 	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	if (was_opened && restart) {
-		mlx5e_close_locked(netdev);
+		mlx5e_close_locked(priv);
 		priv->params.rx_am_enabled = !!coal->use_adaptive_rx_coalesce;
 	}
 
@@ -655,7 +655,7 @@ static int mlx5e_set_coalesce(struct net_device *netdev,
 
 out:
 	if (was_opened && restart)
-		err = mlx5e_open_locked(netdev);
+		err = mlx5e_open_locked(priv);
 
 	mutex_unlock(&priv->state_lock);
 	return err;
@@ -1112,12 +1112,12 @@ static int mlx5e_set_tunable(struct net_device *dev,
 
 		was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 		if (was_opened)
-			mlx5e_close_locked(dev);
+			mlx5e_close_locked(priv);
 
 		priv->params.tx_max_inline = val;
 
 		if (was_opened)
-			err = mlx5e_open_locked(dev);
+			err = mlx5e_open_locked(priv);
 
 		mutex_unlock(&priv->state_lock);
 		break;
@@ -1444,12 +1444,12 @@ static int set_pflag_rx_cqe_based_moder(struct net_device *netdev, bool enable)
 
 	reset = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	if (reset)
-		mlx5e_close_locked(netdev);
+		mlx5e_close_locked(priv);
 
 	mlx5e_set_rx_cq_mode_params(&priv->params, rx_cq_period_mode);
 
 	if (reset)
-		err = mlx5e_open_locked(netdev);
+		err = mlx5e_open_locked(priv);
 
 	return err;
 }
@@ -1473,13 +1473,13 @@ static int set_pflag_rx_cqe_compress(struct net_device *netdev,
 	reset = test_bit(MLX5E_STATE_OPENED, &priv->state);
 
 	if (reset)
-		mlx5e_close_locked(netdev);
+		mlx5e_close_locked(priv);
 
 	MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_COMPRESS, enable);
 	priv->params.rx_cqe_compress_def = enable;
 
 	if (reset)
-		err = mlx5e_open_locked(netdev);
+		err = mlx5e_open_locked(priv);
 	return err;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b91bd7a179fc..ca1867cdce48 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -152,8 +152,8 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
 	mutex_lock(&priv->state_lock);
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
 		goto unlock;
-	mlx5e_close_locked(priv->netdev);
-	err = mlx5e_open_locked(priv->netdev);
+	mlx5e_close_locked(priv);
+	err = mlx5e_open_locked(priv);
 	if (err)
 		netdev_err(priv->netdev, "mlx5e_open_locked failed recovering from a tx_timeout, err(%d).\n",
 			   err);
@@ -2239,9 +2239,9 @@ static void mlx5e_netdev_set_tcs(struct net_device *netdev)
 		netdev_set_tc_queue(netdev, tc, nch, 0);
 }
 
-int mlx5e_open_locked(struct net_device *netdev)
+int mlx5e_open_locked(struct mlx5e_priv *priv)
 {
-	struct mlx5e_priv *priv = netdev_priv(netdev);
+	struct net_device *netdev = priv->netdev;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	int num_txqs;
 	int err;
@@ -2297,15 +2297,14 @@ int mlx5e_open(struct net_device *netdev)
 	int err;
 
 	mutex_lock(&priv->state_lock);
-	err = mlx5e_open_locked(netdev);
+	err = mlx5e_open_locked(priv);
 	mutex_unlock(&priv->state_lock);
 
 	return err;
 }
 
-int mlx5e_close_locked(struct net_device *netdev)
+int mlx5e_close_locked(struct mlx5e_priv *priv)
 {
-	struct mlx5e_priv *priv = netdev_priv(netdev);
 	struct mlx5_core_dev *mdev = priv->mdev;
 
 	/* May already be CLOSED in case a previous configuration operation
@@ -2336,7 +2335,7 @@ int mlx5e_close(struct net_device *netdev)
 		return -ENODEV;
 
 	mutex_lock(&priv->state_lock);
-	err = mlx5e_close_locked(netdev);
+	err = mlx5e_close_locked(priv);
 	mutex_unlock(&priv->state_lock);
 
 	return err;
@@ -2641,12 +2640,12 @@ static int mlx5e_setup_tc(struct net_device *netdev, u8 tc)
 
 	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	if (was_opened)
-		mlx5e_close_locked(priv->netdev);
+		mlx5e_close_locked(priv);
 
 	priv->params.num_tc = tc ? tc : 1;
 
 	if (was_opened)
-		err = mlx5e_open_locked(priv->netdev);
+		err = mlx5e_open_locked(priv);
 
 	mutex_unlock(&priv->state_lock);
 
@@ -2771,7 +2770,7 @@ static int set_feature_lro(struct net_device *netdev, bool enable)
 	mutex_lock(&priv->state_lock);
 
 	if (was_opened && (priv->params.rq_wq_type == MLX5_WQ_TYPE_LINKED_LIST))
-		mlx5e_close_locked(priv->netdev);
+		mlx5e_close_locked(priv);
 
 	priv->params.lro_en = enable;
 	err = mlx5e_modify_tirs_lro(priv);
@@ -2781,7 +2780,7 @@ static int set_feature_lro(struct net_device *netdev, bool enable)
 	}
 
 	if (was_opened && (priv->params.rq_wq_type == MLX5_WQ_TYPE_LINKED_LIST))
-		mlx5e_open_locked(priv->netdev);
+		mlx5e_open_locked(priv);
 
 	mutex_unlock(&priv->state_lock);
 
@@ -2915,13 +2914,13 @@ static int mlx5e_change_mtu(struct net_device *netdev, int new_mtu)
 
 	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	if (was_opened && reset)
-		mlx5e_close_locked(netdev);
+		mlx5e_close_locked(priv);
 
 	netdev->mtu = new_mtu;
 	mlx5e_set_dev_port_mtu(netdev);
 
 	if (was_opened && reset)
-		err = mlx5e_open_locked(netdev);
+		err = mlx5e_open_locked(priv);
 
 	mutex_unlock(&priv->state_lock);
 
@@ -3173,7 +3172,7 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
 	reset = (!priv->xdp_prog || !prog);
 
 	if (was_opened && reset)
-		mlx5e_close_locked(netdev);
+		mlx5e_close_locked(priv);
 	if (was_opened && !reset) {
 		/* num_channels is invariant here, so we can take the
 		 * batched reference right upfront.
@@ -3196,7 +3195,7 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
 		mlx5e_set_rq_priv_params(priv);
 
 	if (was_opened && reset)
-		mlx5e_open_locked(netdev);
+		mlx5e_open_locked(priv);
 
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state) || reset)
 		goto unlock;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 06d5e6fecb0a..ad08e64fee1a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -169,12 +169,12 @@ void mlx5e_modify_rx_cqe_compression(struct mlx5e_priv *priv, bool val)
 
 	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	if (was_opened)
-		mlx5e_close_locked(priv->netdev);
+		mlx5e_close_locked(priv);
 
 	MLX5E_SET_PFLAG(priv, MLX5E_PFLAG_RX_CQE_COMPRESS, val);
 
 	if (was_opened)
-		mlx5e_open_locked(priv->netdev);
+		mlx5e_open_locked(priv);
 
 unlock:
 	mutex_unlock(&priv->state_lock);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [RFC v1 for accelerated IPoIB 18/25] net/mlx5e: Export open/close api for IB link
       [not found] ` <1489429896-10781-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 17/25] net/mlx5e: Change mlx5e_open_locked and mlx5e_close_locked api Erez Shitrit
@ 2017-03-13 18:31   ` Erez Shitrit
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 19/25] include/linux/mlx5: Add mlx5_wqe_eth_pad and enhanced-ipoib-qp-mode Erez Shitrit
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w,
	saedm-VPRAkNaXOzVWk0Htik3J/w,
	erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Erez Shitrit


Export mlx5i_open() and mlx5i_close() so that the IB device can open and close its net device.

TBD:
One change is still waiting for the new channels API; until that API is
available, the code uses an "if" on the port type instead.
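
A minimal userspace sketch of the calling convention after this change. It uses simplified stand-ins: `struct demo_priv`, `demo_open_locked()` and the other `demo_*` names are hypothetical. As in the patch, the opaque-pointer wrappers call the `_locked` helpers directly, so the caller is assumed to already hold the state lock; the `state_locked` flag only models that contract. The last function shows the close / modify / reopen pattern used by mlx5e_setup_tc() and similar callers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct mlx5e_priv. */
struct demo_priv {
	bool state_locked; /* models priv->state_lock being held */
	bool opened;       /* models the MLX5E_STATE_OPENED bit */
	int  num_tc;       /* a parameter whose change needs a reopen */
};

/* Both helpers take the private struct directly and require the lock. */
static int demo_open_locked(struct demo_priv *priv)
{
	assert(priv->state_locked);
	priv->opened = true;
	return 0;
}

static int demo_close_locked(struct demo_priv *priv)
{
	assert(priv->state_locked);
	priv->opened = false;
	return 0;
}

/* Opaque-pointer wrappers in the style of mlx5i_open()/mlx5i_close():
 * they do not take the lock themselves. */
int demo_ib_open(void *vpriv)
{
	return demo_open_locked(vpriv);
}

int demo_ib_close(void *vpriv)
{
	return demo_close_locked(vpriv);
}

/* Close / modify / reopen, reopening only if the device was open. */
int demo_set_num_tc(struct demo_priv *priv, int tc)
{
	bool was_opened;
	int err = 0;

	priv->state_locked = true;  /* mutex_lock(&priv->state_lock) */
	was_opened = priv->opened;
	if (was_opened)
		demo_close_locked(priv);

	priv->num_tc = tc ? tc : 1;

	if (was_opened)
		err = demo_open_locked(priv);
	priv->state_locked = false; /* mutex_unlock(&priv->state_lock) */
	return err;
}
```

Passing the private struct instead of the net_device is what lets a non-Ethernet owner (here, the IB side) drive the same open/close code.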

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 23 +++++++++++++++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 13 ++++++++++++-
 include/linux/mlx5/driver.h                       |  3 +++
 4 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2f9242ae06f3..154cab2a301b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -934,4 +934,5 @@ int mlx5e_get_offload_stats(int attr_id, const struct net_device *dev,
 
 bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv);
 bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv);
+bool mlx5e_is_eswitch_vport_mngr(struct mlx5_core_dev *mdev);
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ca1867cdce48..24efc8ccc075 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2269,7 +2269,9 @@ int mlx5e_open_locked(struct mlx5e_priv *priv)
 	}
 
 	mlx5e_redirect_rqts(priv);
-	mlx5e_update_carrier(priv);
+	/* only for the RFC, will use channels api when available */
+	if (MLX5_CAP_GEN(mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH)
+		mlx5e_update_carrier(priv);
 	mlx5e_timestamp_init(priv);
 #ifdef CONFIG_RFS_ACCEL
 	priv->netdev->rx_cpu_rmap = priv->mdev->rmap;
@@ -2277,7 +2279,7 @@ int mlx5e_open_locked(struct mlx5e_priv *priv)
 	if (priv->profile->update_stats)
 		queue_delayed_work(priv->wq, &priv->update_stats_work, 0);
 
-	if (MLX5_CAP_GEN(mdev, vport_group_manager)) {
+	if (mlx5e_is_eswitch_vport_mngr(mdev)) {
 		err = mlx5e_add_sqs_fwd_rules(priv);
 		if (err)
 			goto err_close_channels;
@@ -3899,6 +3901,7 @@ static void mlx5i_nic_init(struct mlx5_core_dev *mdev,
 	struct mlx5e_priv *priv = ipoib_dev_priv(netdev);
 
 	mlx5n_build_nic_netdev_priv_common(mdev, netdev, priv, profile, ppriv);
+	priv->ppriv = NULL;
 }
 
 static int mlx5i_init_nic_rx(struct mlx5e_priv *priv)
@@ -4025,6 +4028,22 @@ void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv)
 }
 EXPORT_SYMBOL(mlx5i_detach);
 
+int mlx5i_open(void *vpriv)
+{
+	struct mlx5e_priv *priv = vpriv;
+
+	return mlx5e_open_locked(priv);
+}
+EXPORT_SYMBOL(mlx5i_open);
+
+int mlx5i_close(void *vpriv)
+{
+	struct mlx5e_priv *priv = vpriv;
+
+	return mlx5e_close_locked(priv);
+}
+EXPORT_SYMBOL(mlx5i_close);
+
 static const struct mlx5e_profile mlx5e_nic_profile = {
 	.init		   = mlx5e_nic_init,
 	.cleanup	   = mlx5e_nic_cleanup,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 21d3d8e0bab7..cbb1092dddd4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -219,7 +219,12 @@ int mlx5e_nic_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
 void mlx5e_remove_sqs_fwd_rules(struct mlx5e_priv *priv)
 {
 	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
-	struct mlx5_eswitch_rep *rep = priv->ppriv;
+	struct mlx5_eswitch_rep *rep;
+
+	if (!priv->ppriv)
+		return;
+
+	rep = priv->ppriv;
 
 	mlx5_eswitch_sqs2vport_stop(esw, rep);
 }
@@ -323,6 +328,12 @@ bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv)
 	return false;
 }
 
+bool mlx5e_is_eswitch_vport_mngr(struct mlx5_core_dev *mdev)
+{
+	return (MLX5_CAP_GEN(mdev, vport_group_manager) &&
+		MLX5_CAP_GEN(mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH);
+}
+
 bool mlx5e_is_vf_vport_rep(struct mlx5e_priv *priv)
 {
 	struct mlx5_eswitch_rep *rep = (struct mlx5_eswitch_rep *)priv->ppriv;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index d0060cfb2a4f..c18be51287e7 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1139,4 +1139,7 @@ struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
 				       struct mlx5i_create_ext_param *param);
 int mlx5i_attach(struct mlx5_core_dev *mdev, void *vpriv);
 void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv);
+int mlx5i_close(void *vpriv);
+int mlx5i_open(void *vpriv);
+
 #endif /* MLX5_DRIVER_H */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [RFC v1 for accelerated IPoIB 19/25] include/linux/mlx5: Add mlx5_wqe_eth_pad and enhanced-ipoib-qp-mode
       [not found] ` <1489429896-10781-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 18/25] net/mlx5e: Export open/close api for IB link Erez Shitrit
@ 2017-03-13 18:31   ` Erez Shitrit
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 20/25] net/mlx5e: Refactor TX send flow Erez Shitrit
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w,
	saedm-VPRAkNaXOzVWk0Htik3J/w,
	erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Erez Shitrit


Move mlx5_wqe_eth_pad to include/linux/mlx5/qp.h so that it can be used in
the TX flow for the IB link type, and add the enhanced IPoIB QP mode
(MLX5_QP_ENHANCED_ULP_STATELESS_MODE) for QP creation.
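
The IB send path in this series places a datagram segment and this pad between the control segment and the Ethernet segment. The sketch below computes the resulting offset of the Ethernet segment and the descriptor counts. The segment sizes are assumptions meant to match include/linux/mlx5/qp.h of this period and should be checked against the real headers; all `DEMO_*` and `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes in bytes; verify against include/linux/mlx5/qp.h. */
#define DEMO_CTRL_SEG_SZ      16 /* struct mlx5_wqe_ctrl_seg */
#define DEMO_DATAGRAM_SEG_SZ  48 /* struct mlx5_wqe_datagram_seg (an mlx5_av) */
#define DEMO_ETH_PAD_SZ       16 /* struct mlx5_wqe_eth_pad: u8 rsvd0[16] */
#define DEMO_ETH_SEG_SZ       16 /* struct mlx5_wqe_eth_seg */
#define DEMO_INLINE_START_SZ   2 /* sizeof(eseg->inline_hdr_start) */
#define DEMO_SEND_WQE_DS      16 /* MLX5_SEND_WQE_DS */
#define DEMO_WQEBB_NUM_DS      4 /* MLX5_SEND_WQEBB_NUM_DS (64-byte WQEBB) */

#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Byte offset of the Ethernet segment inside the WQE. */
static size_t demo_eth_seg_offset(int ib_link)
{
	if (!ib_link)
		return DEMO_CTRL_SEG_SZ;
	return DEMO_CTRL_SEG_SZ + DEMO_DATAGRAM_SEG_SZ + DEMO_ETH_PAD_SZ;
}

/* 16-byte descriptors used before the gather (data) segments. */
static unsigned int demo_ds_cnt(int ib_link, unsigned int ihs)
{
	unsigned int fixed = demo_eth_seg_offset(ib_link) + DEMO_ETH_SEG_SZ;
	unsigned int ds = fixed / DEMO_SEND_WQE_DS;

	/* inlined header bytes that do not fit in inline_hdr_start[] */
	if (ihs > DEMO_INLINE_START_SZ)
		ds += DEMO_DIV_ROUND_UP(ihs - DEMO_INLINE_START_SZ,
					DEMO_SEND_WQE_DS);
	return ds;
}

/* Number of 64-byte WQE basic blocks, one data segment per DMA entry. */
static unsigned int demo_num_wqebbs(unsigned int ds_cnt, unsigned int num_dma)
{
	return DEMO_DIV_ROUND_UP(ds_cnt + num_dma, DEMO_WQEBB_NUM_DS);
}
```

Under these assumed sizes, an 18-byte inline header with one DMA entry fits in one WQEBB on Ethernet but needs two on the IB link, because of the extra datagram segment and pad.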

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/qp.c | 4 ----
 include/linux/mlx5/qp.h         | 8 ++++++++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 41e14d57fec9..d6918e6b6f28 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -71,10 +71,6 @@ enum {
 	[MLX5_IB_WR_UMR]			= MLX5_OPCODE_UMR,
 };
 
-struct mlx5_wqe_eth_pad {
-	u8 rsvd0[16];
-};
-
 enum raw_qp_set_mask_map {
 	MLX5_RAW_QP_MOD_SET_RQ_Q_CTR_ID		= 1UL << 0,
 	MLX5_RAW_QP_RATE_LIMIT			= 1UL << 1,
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index 219c699c17b7..568fffff8ac9 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -221,6 +221,14 @@ enum {
 	MLX5_ETH_WQE_L4_CSUM            = 1 << 7,
 };
 
+struct mlx5_wqe_eth_pad {
+	u8 rsvd0[16];
+};
+
+enum {
+	MLX5_QP_ENHANCED_ULP_STATELESS_MODE = 2,
+};
+
 struct mlx5_wqe_eth_seg {
 	u8              rsvd0[4];
 	u8              cs_flags;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [RFC v1 for accelerated IPoIB 20/25] net/mlx5e: Refactor TX send flow
       [not found] ` <1489429896-10781-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 19/25] include/linux/mlx5: Add mlx5_wqe_eth_pad and enhanced-ipoib-qp-mode Erez Shitrit
@ 2017-03-13 18:31   ` Erez Shitrit
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 22/25] net/mlx5e: New function pointer for build_rx_skb is Erez Shitrit
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 23/25] net/mlx5e: Change the function that checks the packet type Erez Shitrit
  9 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w,
	saedm-VPRAkNaXOzVWk0Htik3J/w,
	erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Erez Shitrit


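Prepare the TX flow for sending IB link type packets by splitting
mlx5e_sq_xmit() into helpers that build the Ethernet segment, build the
data segments, and fill the control segment and post the WQE.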
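
One of the helpers split out here, mlx5n_sq_build_ether_seg(), selects the opcode and computes the byte count later passed to netdev_tx_sent_queue(). A simplified sketch of just that accounting follows. The opcode values are assumed, the `demo_*` names are hypothetical, and the fixed `min_inline` argument stands in for mlx5e_get_inline_hdr_size().

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_ETH_ZLEN      60   /* ETH_ZLEN: minimum Ethernet frame length */
#define DEMO_OPCODE_SEND 0x0a   /* assumed value of MLX5_OPCODE_SEND */
#define DEMO_OPCODE_LSO  0x0e   /* assumed value of MLX5_OPCODE_LSO */

/* Hypothetical subset of struct mlx5e_tx_wqe_info. */
struct demo_wqe_info {
	unsigned int   num_bytes;
	unsigned short ihs;
	unsigned char  opcode;
};

/* Hypothetical subset of the sk_buff fields the helper reads. */
struct demo_skb {
	unsigned int   len;
	bool           gso;
	unsigned short gso_segs;
	unsigned short hdr_len; /* L2..L4 headers, replicated per segment */
};

static void demo_build_ether_info(struct demo_wqe_info *wi,
				  const struct demo_skb *skb,
				  unsigned short min_inline)
{
	if (skb->gso) {
		wi->opcode = DEMO_OPCODE_LSO;
		wi->ihs = skb->hdr_len;
		/* every segment after the first re-sends the headers */
		wi->num_bytes = skb->len + (skb->gso_segs - 1) * wi->ihs;
	} else {
		wi->opcode = DEMO_OPCODE_SEND;
		wi->ihs = min_inline;
		/* short frames are padded to the minimum on the wire */
		wi->num_bytes = skb->len > DEMO_ETH_ZLEN ? skb->len : DEMO_ETH_ZLEN;
	}
}
```

Keeping these results in the per-WQE info struct, rather than in locals, is what allows the control-segment helper to be shared between the Ethernet and IB send functions.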

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h    |   4 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 237 ++++++++++++++----------
 2 files changed, 141 insertions(+), 100 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 154cab2a301b..b6758d0b93a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -402,6 +402,10 @@ struct mlx5e_tx_wqe_info {
 	u32 num_bytes;
 	u8  num_wqebbs;
 	u8  num_dma;
+	u16 ds_cnt;
+	u16 ihs;
+	u8 opcode;
+	bool bf;
 };
 
 enum mlx5e_dma_map_type {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index cfb68371c397..22443ce778ff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -216,94 +216,65 @@ static inline void mlx5e_insert_vlan(void *start, struct sk_buff *skb, u16 ihs,
 	mlx5e_tx_skb_pull_inline(skb_data, skb_len, cpy2_sz);
 }
 
-static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
+static inline void mlx5n_sq_fill_ctrl_seg_and_send(struct mlx5e_sq *sq,
+						   struct mlx5e_tx_wqe_info *wi,
+						   struct mlx5_wqe_ctrl_seg *cseg,
+						   struct sk_buff *skb, u16 pi)
 {
 	struct mlx5_wq_cyc       *wq   = &sq->wq;
 
-	u16 pi = sq->pc & wq->sz_m1;
-	struct mlx5e_tx_wqe      *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
-	struct mlx5e_tx_wqe_info *wi   = &sq->db.txq.wqe_info[pi];
+	cseg->opmod_idx_opcode = cpu_to_be32((sq->pc << 8) | wi->opcode);
+	cseg->qpn_ds           = cpu_to_be32((sq->sqn << 8) | wi->ds_cnt);
 
-	struct mlx5_wqe_ctrl_seg *cseg = &wqe->ctrl;
-	struct mlx5_wqe_eth_seg  *eseg = &wqe->eth;
-	struct mlx5_wqe_data_seg *dseg;
+	sq->db.txq.skb[pi] = skb;
 
-	unsigned char *skb_data = skb->data;
-	unsigned int skb_len = skb->len;
-	u8  opcode = MLX5_OPCODE_SEND;
-	dma_addr_t dma_addr = 0;
-	unsigned int num_bytes;
-	bool bf = false;
-	u16 headlen;
-	u16 ds_cnt;
-	u16 ihs;
-	int i;
+	wi->num_wqebbs = DIV_ROUND_UP(wi->ds_cnt, MLX5_SEND_WQEBB_NUM_DS);
+	sq->pc += wi->num_wqebbs;
 
-	memset(wqe, 0, sizeof(*wqe));
+	netdev_tx_sent_queue(sq->txq, wi->num_bytes);
 
-	if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
-		eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM;
-		if (skb->encapsulation) {
-			eseg->cs_flags |= MLX5_ETH_WQE_L3_INNER_CSUM |
-					  MLX5_ETH_WQE_L4_INNER_CSUM;
-			sq->stats.csum_partial_inner++;
-		} else {
-			eseg->cs_flags |= MLX5_ETH_WQE_L4_CSUM;
-		}
-	} else
-		sq->stats.csum_none++;
+	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
+		skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
 
-	if (sq->cc != sq->prev_cc) {
-		sq->prev_cc = sq->cc;
-		sq->bf_budget = (sq->cc == sq->pc) ? MLX5E_SQ_BF_BUDGET : 0;
+	if (unlikely(!mlx5e_sq_has_room_for(sq, MLX5E_SQ_STOP_ROOM))) {
+		netif_tx_stop_queue(sq->txq);
+		sq->stats.stopped++;
 	}
 
-	if (skb_is_gso(skb)) {
-		eseg->mss    = cpu_to_be16(skb_shinfo(skb)->gso_size);
-		opcode       = MLX5_OPCODE_LSO;
+	if (!skb->xmit_more || netif_xmit_stopped(sq->txq)) {
+		int bf_sz = 0;
 
-		if (skb->encapsulation) {
-			ihs = skb_inner_transport_offset(skb) + inner_tcp_hdrlen(skb);
-			sq->stats.tso_inner_packets++;
-			sq->stats.tso_inner_bytes += skb->len - ihs;
-		} else {
-			ihs = skb_transport_offset(skb) + tcp_hdrlen(skb);
-			sq->stats.tso_packets++;
-			sq->stats.tso_bytes += skb->len - ihs;
-		}
+		if (wi->bf && test_bit(MLX5E_SQ_STATE_BF_ENABLE, &sq->state))
+			bf_sz = wi->num_wqebbs << 3;
 
-		num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * ihs;
-	} else {
-		bf = sq->bf_budget &&
-		     !skb->xmit_more &&
-		     !skb_shinfo(skb)->nr_frags;
-		ihs = mlx5e_get_inline_hdr_size(sq, skb, bf);
-		num_bytes = max_t(unsigned int, skb->len, ETH_ZLEN);
+		cseg->fm_ce_se = MLX5_WQE_CTRL_CQ_UPDATE;
+		mlx5e_tx_notify_hw(sq, cseg, bf_sz);
 	}
 
-	wi->num_bytes = num_bytes;
+	/* fill sq edge with nops to avoid wqe wrap around */
+	while ((sq->pc & wq->sz_m1) > sq->edge)
+		mlx5e_send_nop(sq, false);
 
-	if (skb_vlan_tag_present(skb)) {
-		mlx5e_insert_vlan(eseg->inline_hdr_start, skb, ihs, &skb_data,
-				  &skb_len);
-		ihs += VLAN_HLEN;
-	} else {
-		memcpy(eseg->inline_hdr_start, skb_data, ihs);
-		mlx5e_tx_skb_pull_inline(&skb_data, &skb_len, ihs);
-	}
+	sq->bf_budget = wi->bf ? sq->bf_budget - 1 : 0;
 
-	eseg->inline_hdr_sz = cpu_to_be16(ihs);
+	sq->stats.packets++;
+	sq->stats.bytes += wi->num_bytes;
+}
 
-	ds_cnt  = sizeof(*wqe) / MLX5_SEND_WQE_DS;
-	ds_cnt += DIV_ROUND_UP(ihs - sizeof(eseg->inline_hdr_start),
-			       MLX5_SEND_WQE_DS);
-	dseg    = (struct mlx5_wqe_data_seg *)cseg + ds_cnt;
+static inline int mlx5n_sq_build_data_seg(struct mlx5e_sq *sq,
+					  struct mlx5e_tx_wqe_info *wi,
+					  struct mlx5_wqe_data_seg *dseg,
+					  struct sk_buff *skb)
+{
+	dma_addr_t dma_addr = 0;
+	u16 headlen;
+	int i;
 
 	wi->num_dma = 0;
 
-	headlen = skb_len - skb->data_len;
+	headlen = skb->len - skb->data_len;
 	if (headlen) {
-		dma_addr = dma_map_single(sq->pdev, skb_data, headlen,
+		dma_addr = dma_map_single(sq->pdev, skb->data, headlen,
 					  DMA_TO_DEVICE);
 		if (unlikely(dma_mapping_error(sq->pdev, dma_addr)))
 			goto dma_unmap_wqe_err;
@@ -336,57 +307,123 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
 
 		dseg++;
 	}
+	wi->ds_cnt += wi->num_dma;
 
-	ds_cnt += wi->num_dma;
-
-	cseg->opmod_idx_opcode = cpu_to_be32((sq->pc << 8) | opcode);
-	cseg->qpn_ds           = cpu_to_be32((sq->sqn << 8) | ds_cnt);
+	return 0;
 
-	sq->db.txq.skb[pi] = skb;
+dma_unmap_wqe_err:
+	sq->stats.dropped++;
+	mlx5e_dma_unmap_wqe_err(sq, wi->num_dma);
 
-	wi->num_wqebbs = DIV_ROUND_UP(ds_cnt, MLX5_SEND_WQEBB_NUM_DS);
-	sq->pc += wi->num_wqebbs;
+	dev_kfree_skb_any(skb);
+	return -ENOMEM;
+}
 
-	netdev_tx_sent_queue(sq->txq, wi->num_bytes);
+static inline void mlx5n_sq_build_ether_seg(struct mlx5e_sq *sq,
+					    struct mlx5e_tx_wqe_info *wi,
+					    struct mlx5_wqe_eth_seg *eseg,
+					    struct sk_buff *skb)
+{
+	unsigned int num_bytes;
 
-	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
-		skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
+	if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
+		eseg->cs_flags = MLX5_ETH_WQE_L3_CSUM;
+		if (skb->encapsulation) {
+			eseg->cs_flags |= MLX5_ETH_WQE_L3_INNER_CSUM |
+					  MLX5_ETH_WQE_L4_INNER_CSUM;
+			sq->stats.csum_partial_inner++;
+		} else {
+			eseg->cs_flags |= MLX5_ETH_WQE_L4_CSUM;
+		}
+	} else {
+		sq->stats.csum_none++;
+	}
 
-	if (unlikely(!mlx5e_sq_has_room_for(sq, MLX5E_SQ_STOP_ROOM))) {
-		netif_tx_stop_queue(sq->txq);
-		sq->stats.stopped++;
+	if (sq->cc != sq->prev_cc) {
+		sq->prev_cc = sq->cc;
+		sq->bf_budget = (sq->cc == sq->pc) ? MLX5E_SQ_BF_BUDGET : 0;
 	}
 
-	sq->stats.xmit_more += skb->xmit_more;
-	if (!skb->xmit_more || netif_xmit_stopped(sq->txq)) {
-		int bf_sz = 0;
+	wi->opcode = MLX5_OPCODE_SEND;
+	wi->bf = false;
+	wi->ihs = 0;
 
-		if (bf && test_bit(MLX5E_SQ_STATE_BF_ENABLE, &sq->state))
-			bf_sz = wi->num_wqebbs << 3;
+	if (skb_is_gso(skb)) {
+		eseg->mss  = cpu_to_be16(skb_shinfo(skb)->gso_size);
+		wi->opcode = MLX5_OPCODE_LSO;
 
-		cseg->fm_ce_se = MLX5_WQE_CTRL_CQ_UPDATE;
-		mlx5e_tx_notify_hw(sq, &wqe->ctrl, bf_sz);
+		if (skb->encapsulation) {
+			wi->ihs = skb_inner_transport_offset(skb) + inner_tcp_hdrlen(skb);
+			sq->stats.tso_inner_packets++;
+			sq->stats.tso_inner_bytes += skb->len - wi->ihs;
+		} else {
+			wi->ihs = skb_transport_offset(skb) + tcp_hdrlen(skb);
+			sq->stats.tso_packets++;
+			sq->stats.tso_bytes += skb->len - wi->ihs;
+		}
+
+		num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * wi->ihs;
+	} else {
+		wi->bf = sq->bf_budget &&
+		     !skb->xmit_more &&
+		     !skb_shinfo(skb)->nr_frags;
+		wi->ihs = mlx5e_get_inline_hdr_size(sq, skb, wi->bf);
+		num_bytes = max_t(unsigned int, skb->len, ETH_ZLEN);
 	}
 
-	/* fill sq edge with nops to avoid wqe wrap around */
-	while ((pi = (sq->pc & wq->sz_m1)) > sq->edge) {
-		sq->db.txq.skb[pi] = NULL;
-		mlx5e_send_nop(sq, false);
+	wi->num_bytes = num_bytes;
+
+	if (skb_vlan_tag_present(skb)) {
+		mlx5e_insert_vlan(eseg->inline_hdr_start, skb, wi->ihs, &skb->data,
+				  &skb->len);
+		wi->ihs += VLAN_HLEN;
+	} else {
+		memcpy(eseg->inline_hdr_start, skb->data, wi->ihs);
+		mlx5e_tx_skb_pull_inline(&skb->data, &skb->len, wi->ihs);
 	}
 
-	if (bf)
-		sq->bf_budget--;
+	eseg->inline_hdr_sz = cpu_to_be16(wi->ihs);
+}
 
-	sq->stats.packets++;
-	sq->stats.bytes += num_bytes;
-	return NETDEV_TX_OK;
+static inline void mlx5n_sq_build_datagram_seg(struct mlx5e_sq *sq,
+					       struct mlx5_wqe_datagram_seg *dseg,
+					       struct mlx5_av *av, u32 dqpn, u32 dqkey,
+					       struct sk_buff *skb)
+{
+	memcpy(&dseg->av, av, sizeof(struct mlx5_av));
+	dseg->av.dqp_dct = cpu_to_be32(dqpn | MLX5_EXTENDED_UD_AV);
+	dseg->av.key.qkey.qkey = cpu_to_be32(dqkey);
+}
 
-dma_unmap_wqe_err:
-	sq->stats.dropped++;
-	mlx5e_dma_unmap_wqe_err(sq, wi->num_dma);
+static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb)
+{
+	struct mlx5_wq_cyc       *wq   = &sq->wq;
+	u16                      pi    = sq->pc & wq->sz_m1;
+	void                     *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
+	struct mlx5e_tx_wqe_info *wi   = &sq->db.txq.wqe_info[pi];
 
-	dev_kfree_skb_any(skb);
+	struct mlx5_wqe_ctrl_seg *ctrl_seg = wqe;
+	struct mlx5_wqe_eth_seg  *ether_seg =
+		wqe + sizeof(struct mlx5_wqe_ctrl_seg);
+	struct mlx5_wqe_data_seg *data_seg;
+
+	memset(wqe, 0, sizeof(struct mlx5_wqe_ctrl_seg) +
+	       sizeof(struct mlx5_wqe_eth_seg));
+
+	mlx5n_sq_build_ether_seg(sq, wi, ether_seg, skb);
+
+	wi->ds_cnt  = (sizeof(struct mlx5_wqe_ctrl_seg) +
+			sizeof(struct mlx5_wqe_eth_seg)) / MLX5_SEND_WQE_DS;
+	wi->ds_cnt += DIV_ROUND_UP(wi->ihs - sizeof(ether_seg->inline_hdr_start),
+			MLX5_SEND_WQE_DS);
+	data_seg = (struct mlx5_wqe_data_seg *)ctrl_seg + wi->ds_cnt;
+
+	if (mlx5n_sq_build_data_seg(sq, wi, data_seg, skb) < 0)
+		goto out;
+
+	mlx5n_sq_fill_ctrl_seg_and_send(sq, wi, ctrl_seg, skb, pi);
 
+out:
 	return NETDEV_TX_OK;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [RFC v1 for accelerated IPoIB 21/25] net/mlx5e: Export send function for IB link type
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
                   ` (11 preceding siblings ...)
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 16/25] net/mlx5e: Change cleanup API in order to enable IB link Erez Shitrit
@ 2017-03-13 18:31 ` Erez Shitrit
       [not found] ` <1489429896-10781-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


Export mlx5i_xmit(), which the IB link uses to send packets.
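
For the IB link, mlx5i_sq_xmit() writes a datagram segment built from the address vector, destination QPN and Q_Key, and the shared helper then fills the control segment. Below is a sketch of the big-endian words involved. It assumes `MLX5_EXTENDED_UD_AV` is `0x80000000`, does not model the full `struct mlx5_av`, and uses hypothetical `demo_*` names.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

#define DEMO_EXTENDED_UD_AV 0x80000000u /* assumed value of MLX5_EXTENDED_UD_AV */

/* The two control-segment words written by the shared send helper. */
struct demo_ctrl_words {
	uint32_t opmod_idx_opcode; /* big-endian: (producer counter << 8) | opcode */
	uint32_t qpn_ds;           /* big-endian: (SQ number << 8) | DS count */
};

/* The datagram-segment words that depend on the destination. */
struct demo_datagram_words {
	uint32_t dqp_dct; /* big-endian: destination QPN | extended-AV flag */
	uint32_t qkey;    /* big-endian Q_Key */
};

static struct demo_ctrl_words demo_fill_ctrl(uint16_t pc, uint8_t opcode,
					     uint32_t sqn, uint8_t ds_cnt)
{
	struct demo_ctrl_words c;

	c.opmod_idx_opcode = htonl(((uint32_t)pc << 8) | opcode);
	c.qpn_ds = htonl((sqn << 8) | ds_cnt);
	return c;
}

static struct demo_datagram_words demo_fill_datagram(uint32_t dqpn,
						     uint32_t dqkey)
{
	struct demo_datagram_words d;

	d.dqp_dct = htonl(dqpn | DEMO_EXTENDED_UD_AV);
	d.qkey = htonl(dqkey);
	return d;
}
```

Only the datagram words are IB specific; the control words are the same as on the Ethernet path, which is why the fill-and-send helper is shared.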

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 55 +++++++++++++++++++++++++
 include/linux/mlx5/driver.h                     |  5 ++-
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 22443ce778ff..fea06be30393 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -435,6 +435,61 @@ netdev_tx_t mlx5e_xmit(struct sk_buff *skb, struct net_device *dev)
 	return mlx5e_sq_xmit(sq, skb);
 }
 
+static int s_ctrl_seg = sizeof(struct mlx5_wqe_ctrl_seg);
+static int s_datagram_seg = sizeof(struct mlx5_wqe_datagram_seg);
+static int s_pad = sizeof(struct mlx5_wqe_eth_pad);
+static int s_eth_seg = sizeof(struct mlx5_wqe_eth_seg);
+static netdev_tx_t mlx5i_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb,
+				 struct mlx5_av *av, u32 dqpn, u32 dqkey)
+{
+	struct mlx5_wq_cyc       *wq   = &sq->wq;
+	u16                      pi    = sq->pc & wq->sz_m1;
+	void                     *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
+	struct mlx5e_tx_wqe_info *wi   = &sq->db.txq.wqe_info[pi];
+
+	struct mlx5_wqe_ctrl_seg *ctrl_seg = wqe;
+	struct mlx5_wqe_datagram_seg *datagram_seg =
+			wqe + s_ctrl_seg;
+	struct mlx5_wqe_eth_pad *pad =
+			(void *)datagram_seg + s_datagram_seg;
+	struct mlx5_wqe_eth_seg  *ether_seg =
+			(void *)pad + s_pad;
+	struct mlx5_wqe_data_seg *data_seg;
+
+	int tot = s_ctrl_seg + s_datagram_seg + s_pad + s_eth_seg;
+
+	memset(wqe, 0, tot);
+
+	mlx5n_sq_build_datagram_seg(sq, datagram_seg, av, dqpn, dqkey, skb);
+
+	mlx5n_sq_build_ether_seg(sq, wi, ether_seg, skb);
+
+	wi->ds_cnt  = tot / MLX5_SEND_WQE_DS;
+	wi->ds_cnt += DIV_ROUND_UP(wi->ihs - sizeof(ether_seg->inline_hdr_start),
+			MLX5_SEND_WQE_DS);
+	data_seg = (struct mlx5_wqe_data_seg *)ctrl_seg + wi->ds_cnt;
+
+	if (mlx5n_sq_build_data_seg(sq, wi, data_seg, skb) < 0)
+		goto out;
+
+	mlx5n_sq_fill_ctrl_seg_and_send(sq, wi, ctrl_seg, skb, pi);
+
+out:
+	return NETDEV_TX_OK;
+}
+
+netdev_tx_t mlx5i_xmit(struct sk_buff *skb, void *p,
+		       struct mlx5_av *av, u32 dqpn, u32 dqkey)
+{
+	struct mlx5e_priv *priv = p;
+	struct mlx5e_sq *sq;
+
+	sq = priv->txq_to_sq_map[skb_get_queue_mapping(skb)];
+
+	return mlx5i_sq_xmit(sq, skb, av, dqpn, dqkey);
+}
+EXPORT_SYMBOL(mlx5i_xmit);
+
 bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 {
 	struct mlx5e_sq *sq;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index c18be51287e7..6d2ac932d321 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1133,6 +1133,8 @@ struct mlx5i_create_ext_param {
 	u32	qpn;
 };
 
+struct mlx5_av;
+
 struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
 				       const char *name,
 				       void (*setup)(struct net_device *dev),
@@ -1141,5 +1143,6 @@ struct net_device *mlx5i_create_netdev(struct mlx5_core_dev *mdev,
 void mlx5i_detach(struct mlx5_core_dev *mdev, void *vpriv);
 int mlx5i_close(void *vpriv);
 int mlx5i_open(void *vpriv);
-
+netdev_tx_t mlx5i_xmit(struct sk_buff *skb, void *p, struct mlx5_av *av,
+		       u32 dqpn, u32 dqkey);
 #endif /* MLX5_DRIVER_H */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [RFC v1 for accelerated IPoIB 22/25] net/mlx5e: New function pointer for build_rx_skb is
       [not found] ` <1489429896-10781-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (7 preceding siblings ...)
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 20/25] net/mlx5e: Refactor TX send flow Erez Shitrit
@ 2017-03-13 18:31   ` Erez Shitrit
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 23/25] net/mlx5e: Change the function that checks the packet type Erez Shitrit
  9 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w,
	saedm-VPRAkNaXOzVWk0Htik3J/w,
	erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Erez Shitrit


Add a build_rx_skb function pointer to struct mlx5e_rq so that the IB link
can be supported with the same base code.
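
The indirection can be illustrated with a small sketch: each RQ carries its own builder, and the completion path calls through the pointer. The IB builder below is hypothetical (this patch only assigns mlx5e_build_rx_skb), and so are all `demo_*` names.

```c
#include <assert.h>

struct demo_rq;

/* Per-RQ skb builder, analogous to mlx5e_fp_build_rx_skb. */
typedef void (*demo_fp_build_rx_skb)(struct demo_rq *rq, unsigned int cqe_bcnt);

struct demo_rq {
	demo_fp_build_rx_skb build_rx_skb;
	unsigned long packets;
	unsigned long bytes;
	int last_builder; /* 1 = Ethernet, 2 = IB; for illustration only */
};

static void demo_build_eth_rx_skb(struct demo_rq *rq, unsigned int cqe_bcnt)
{
	(void)cqe_bcnt;
	rq->last_builder = 1;
}

/* Hypothetical IB variant; not part of this patch. */
static void demo_build_ib_rx_skb(struct demo_rq *rq, unsigned int cqe_bcnt)
{
	(void)cqe_bcnt;
	rq->last_builder = 2;
}

/* Chosen once at RQ creation, like rq->build_rx_skb in mlx5e_create_rq(). */
static void demo_rq_init(struct demo_rq *rq, int ib_link)
{
	rq->packets = 0;
	rq->bytes = 0;
	rq->last_builder = 0;
	rq->build_rx_skb = ib_link ? demo_build_ib_rx_skb : demo_build_eth_rx_skb;
}

/* Mirrors mlx5e_complete_rx_cqe(): update stats, then call through the pointer. */
static void demo_complete_rx_cqe(struct demo_rq *rq, unsigned int cqe_bcnt)
{
	rq->packets++;
	rq->bytes += cqe_bcnt;
	rq->build_rx_skb(rq, cqe_bcnt);
}
```

The cost is one indirect call per received packet in exchange for not branching on the link type inside the hot completion path.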

Signed-off-by: Erez Shitrit <erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  6 ++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  7 +++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 10 +++++-----
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index b6758d0b93a5..84de1ca11524 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -298,6 +298,11 @@ typedef int (*mlx5e_fp_alloc_wqe)(struct mlx5e_rq *rq, struct mlx5e_rx_wqe *wqe,
 
 typedef void (*mlx5e_fp_dealloc_wqe)(struct mlx5e_rq *rq, u16 ix);
 
+typedef void (*mlx5e_fp_build_rx_skb)(struct mlx5_cqe64 *cqe,
+				      u32 cqe_bcnt,
+				      struct mlx5e_rq *rq,
+				      struct sk_buff *skb);
+
 struct mlx5e_dma_info {
 	struct page	*page;
 	dma_addr_t	addr;
@@ -367,6 +372,7 @@ struct mlx5e_rq {
 	mlx5e_fp_handle_rx_cqe handle_rx_cqe;
 	mlx5e_fp_alloc_wqe     alloc_wqe;
 	mlx5e_fp_dealloc_wqe   dealloc_wqe;
+	mlx5e_fp_build_rx_skb  build_rx_skb;
 
 	unsigned long          state;
 	int                    ix;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 24efc8ccc075..4dc8b21d011d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -493,6 +493,12 @@ static int mlx5e_create_rq_umr_mkey(struct mlx5e_rq *rq)
 	return mlx5e_create_umr_mkey(priv, num_mtts, PAGE_SHIFT, &rq->umr_mkey);
 }
 
+/* forward declaration */
+inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
+			       u32 cqe_bcnt,
+			       struct mlx5e_rq *rq,
+			       struct sk_buff *skb);
+
 static int mlx5e_create_rq(struct mlx5e_channel *c,
 			   struct mlx5e_rq_param *param,
 			   struct mlx5e_rq *rq)
@@ -538,6 +544,7 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
 	if (rq->xdp_prog)
 		rq->buff.map_dir = DMA_BIDIRECTIONAL;
 
+	rq->build_rx_skb = mlx5e_build_rx_skb;
 	switch (priv->params.rq_wq_type) {
 	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
 		if (mlx5e_is_vf_vport_rep(priv)) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index ad08e64fee1a..98546b3395df 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -590,10 +590,10 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
 	rq->stats.csum_none++;
 }
 
-static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
-				      u32 cqe_bcnt,
-				      struct mlx5e_rq *rq,
-				      struct sk_buff *skb)
+inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
+			       u32 cqe_bcnt,
+			       struct mlx5e_rq *rq,
+			       struct sk_buff *skb)
 {
 	struct net_device *netdev = rq->netdev;
 	struct mlx5e_tstamp *tstamp = rq->tstamp;
@@ -632,7 +632,7 @@ static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
 {
 	rq->stats.packets++;
 	rq->stats.bytes += cqe_bcnt;
-	mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb);
+	rq->build_rx_skb(cqe, cqe_bcnt, rq, skb);
 }
 
 static inline void mlx5e_xmit_xdp_doorbell(struct mlx5e_sq *sq)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
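Below is a minimal user-space sketch of the dispatch pattern this patch introduces, which patch 24 extends. The receive queue stores a build_rx_skb pointer that is chosen once, when the RQ is created, so the completion path makes one call through that pointer and never tests the link type per packet. All types and names here (fake_rq, fake_create_rq, and so on) are hypothetical stand-ins, not the mlx5 structures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for mlx5_cqe64, sk_buff and mlx5e_rq. */
struct fake_cqe { uint32_t byte_cnt; };
struct fake_skb { const char *built_by; uint32_t len; };

struct fake_rq;
typedef void (*fake_build_rx_skb_fn)(struct fake_cqe *cqe, uint32_t cqe_bcnt,
				     struct fake_rq *rq, struct fake_skb *skb);

struct fake_rq {
	fake_build_rx_skb_fn build_rx_skb;	/* selected once, at RQ creation */
};

enum fake_port_type { FAKE_PORT_TYPE_IB, FAKE_PORT_TYPE_ETH };

static void fake_eth_build_rx_skb(struct fake_cqe *cqe, uint32_t cqe_bcnt,
				  struct fake_rq *rq, struct fake_skb *skb)
{
	(void)cqe;
	(void)rq;
	skb->built_by = "eth";
	skb->len = cqe_bcnt;
}

static void fake_ipoib_build_rx_skb(struct fake_cqe *cqe, uint32_t cqe_bcnt,
				    struct fake_rq *rq, struct fake_skb *skb)
{
	(void)cqe;
	(void)rq;
	skb->built_by = "ipoib";
	skb->len = cqe_bcnt;
}

/* Same shape as the create_rq change: Ethernet by default, IPoIB otherwise. */
static void fake_create_rq(struct fake_rq *rq, enum fake_port_type port_type)
{
	rq->build_rx_skb = fake_eth_build_rx_skb;
	if (port_type != FAKE_PORT_TYPE_ETH)
		rq->build_rx_skb = fake_ipoib_build_rx_skb;
}

/* Completion path: no per-packet port-type test, just the stored pointer. */
static void fake_complete_rx_cqe(struct fake_rq *rq, struct fake_cqe *cqe,
				 uint32_t cqe_bcnt, struct fake_skb *skb)
{
	rq->build_rx_skb(cqe, cqe_bcnt, rq, skb);
}
```

The trade-off is an indirect call on the hot path, replacing what used to be a direct call to a static inline function.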


* [RFC v1 for accelerated IPoIB 23/25] net/mlx5e: Change the function that checks the packet type
       [not found] ` <1489429896-10781-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (8 preceding siblings ...)
  2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 22/25] net/mlx5e: New function pointer for build_rx_skb is Erez Shitrit
@ 2017-03-13 18:31   ` Erez Shitrit
  9 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


Make is_first_ethertype_ip() usable for non-Ethernet packets (such as
IB) as well. mlx5e_build_rx_skb() now sets skb->protocol (via
eth_type_trans()) before calling mlx5e_handle_csum(), so the function
can check skb->protocol for ETH_P_IP/ETH_P_IPV6 instead of parsing the
Ethernet header.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 98546b3395df..071a6ecce720 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -549,9 +549,8 @@ static inline void mlx5e_skb_set_hash(struct mlx5_cqe64 *cqe,
 
 static inline bool is_first_ethertype_ip(struct sk_buff *skb)
 {
-	__be16 ethertype = ((struct ethhdr *)skb->data)->h_proto;
-
-	return (ethertype == htons(ETH_P_IP) || ethertype == htons(ETH_P_IPV6));
+	return (skb->protocol == htons(ETH_P_IP) ||
+		skb->protocol == htons(ETH_P_IPV6));
 }
 
 static inline void mlx5e_handle_csum(struct net_device *netdev,
@@ -621,8 +620,8 @@ inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 
 	skb->mark = be32_to_cpu(cqe->sop_drop_qpn) & MLX5E_TC_FLOW_ID_MASK;
 
-	mlx5e_handle_csum(netdev, cqe, rq, skb, !!lro_num_seg);
 	skb->protocol = eth_type_trans(skb, netdev);
+	mlx5e_handle_csum(netdev, cqe, rq, skb, !!lro_num_seg);
 }
 
 static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
-- 
1.8.3.1
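The ordering requirement in the commit message can be sketched in user space. The classifier looks only at skb->protocol, so it gives the same answer whether the protocol came from an Ethernet header (eth_type_trans() in mlx5e_build_rx_skb()) or from the IPoIB encapsulation header (as in patch 24), provided it runs after the protocol has been set. The sketch_* names and the "0 means not set yet" convention are assumptions for illustration only.

```c
#include <arpa/inet.h>	/* htons() */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as ETH_P_IP and ETH_P_IPV6 in <linux/if_ether.h>. */
#define SKETCH_ETH_P_IP   0x0800
#define SKETCH_ETH_P_IPV6 0x86DD

struct sketch_skb {
	uint16_t protocol;	/* network byte order; 0 means "not set yet" */
};

/* Same test as the patched is_first_ethertype_ip(): no L2 header is parsed. */
static bool sketch_is_first_ethertype_ip(const struct sketch_skb *skb)
{
	return skb->protocol == htons(SKETCH_ETH_P_IP) ||
	       skb->protocol == htons(SKETCH_ETH_P_IPV6);
}

/* Order used after this patch: set the protocol first, then classify. */
static bool sketch_rx(struct sketch_skb *skb, uint16_t ethertype_be)
{
	skb->protocol = ethertype_be;			/* eth_type_trans() or IPoIB encap */
	return sketch_is_first_ethertype_ip(skb);	/* check done in mlx5e_handle_csum() */
}
```

Before this patch, the check ran ahead of eth_type_trans() and read h_proto from skb->data, which only works when an Ethernet header is present.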


* [RFC v1 for accelerated IPoIB 24/25] net/mlx5e: Add support for build_rx_skb for packet from IB type
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
                   ` (13 preceding siblings ...)
       [not found] ` <1489429896-10781-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-03-13 18:31 ` Erez Shitrit
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 25/25] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops Erez Shitrit
  15 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


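Add a new function, mlx5i_build_rx_skb(), that parses and builds the
skb for IPoIB traffic. It is used instead of mlx5e_build_rx_skb() when
the port type is not Ethernet.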

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  8 ++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 49 +++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 4dc8b21d011d..3b609bcc0914 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -499,6 +499,11 @@ inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 			       struct mlx5e_rq *rq,
 			       struct sk_buff *skb);
 
+inline void mlx5i_build_rx_skb(struct mlx5_cqe64 *cqe,
+			       u32 cqe_bcnt,
+			       struct mlx5e_rq *rq,
+			       struct sk_buff *skb);
+
 static int mlx5e_create_rq(struct mlx5e_channel *c,
 			   struct mlx5e_rq_param *param,
 			   struct mlx5e_rq *rq)
@@ -584,6 +589,9 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
 		else
 			rq->handle_rx_cqe = mlx5e_handle_rx_cqe;
 
+		if (MLX5_CAP_GEN(mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
+			rq->build_rx_skb = mlx5i_build_rx_skb;
+
 		rq->alloc_wqe = mlx5e_alloc_rx_wqe;
 		rq->dealloc_wqe = mlx5e_dealloc_rx_wqe;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 071a6ecce720..db3064c4b052 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -624,6 +624,55 @@ inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
 	mlx5e_handle_csum(netdev, cqe, rq, skb, !!lro_num_seg);
 }
 
+#define MLX5_IB_GRH_DGID_OFFSET 24
+#define MLX5_IB_GRH_BYTES       40
+#define MLX5_IPOIB_ENCAP_LEN    4
+#define MLX5_GID_SIZE           16
+
+inline void mlx5i_build_rx_skb(struct mlx5_cqe64 *cqe,
+			       u32 cqe_bcnt,
+			       struct mlx5e_rq *rq,
+			       struct sk_buff *skb)
+{
+	struct net_device *netdev = rq->netdev;
+	u8 *dgid;
+	u8 g;
+
+	skb_put(skb, cqe_bcnt);
+
+	g = (be32_to_cpu(cqe->flags_rqpn) >> 28) & 3;
+	dgid = skb->data + MLX5_IB_GRH_DGID_OFFSET;
+	if ((!g) || dgid[0] != 0xff)
+		skb->pkt_type = PACKET_HOST;
+	else if (memcmp(dgid, netdev->broadcast + 4, MLX5_GID_SIZE) == 0)
+		skb->pkt_type = PACKET_BROADCAST;
+	else
+		skb->pkt_type = PACKET_MULTICAST;
+
+	/* TODO: IB/ipoib: Allow mcast packets from other VFs
+	 * 68996a6e760e5c74654723eeb57bf65628ae87f4
+	 */
+
+	skb_pull(skb, MLX5_IB_GRH_BYTES);
+
+	skb->protocol = *((__be16 *)(skb->data));
+
+	mlx5e_handle_csum(netdev, cqe, rq, skb, rq->priv->params.lro_en);
+
+	skb_record_rx_queue(skb, rq->ix);
+
+	if (likely(netdev->features & NETIF_F_RXHASH))
+		mlx5e_skb_set_hash(cqe, skb);
+
+	skb_reset_mac_header(skb);
+	skb_pull(skb, MLX5_IPOIB_ENCAP_LEN);
+
+	++netdev->stats.rx_packets;
+	netdev->stats.rx_bytes += skb->len;
+
+	skb->dev = netdev;
+}
+
 static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
 					 struct mlx5_cqe64 *cqe,
 					 u32 cqe_bcnt,
-- 
1.8.3.1
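The receive-side parsing in mlx5i_build_rx_skb() can be sketched as a pure function over a flat buffer. The buffer holds a 40-byte GRH whose DGID starts at byte 24, then the 4-byte IPoIB encapsulation header whose first two bytes hold the ethertype, then the IP packet. The pkt_type rules follow the patch. Several details are assumptions of this sketch: the sketch_* names, the length check, and passing the GRH-present flag as a plain int (the patch derives it from cqe->flags_rqpn). The broadcast GID is taken from byte 4 of the 20-byte IPoIB hardware address, which is why the patch compares against netdev->broadcast + 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GRH_BYTES        40	/* MLX5_IB_GRH_BYTES */
#define SKETCH_GRH_DGID_OFFSET  24	/* MLX5_IB_GRH_DGID_OFFSET */
#define SKETCH_IPOIB_ENCAP_LEN   4	/* MLX5_IPOIB_ENCAP_LEN */
#define SKETCH_GID_SIZE         16	/* MLX5_GID_SIZE */
#define SKETCH_IPOIB_HWADDR_LEN 20	/* 4 bytes of flags/QPN + 16-byte GID */

enum sketch_pkt_type {
	SKETCH_PACKET_HOST,
	SKETCH_PACKET_BROADCAST,
	SKETCH_PACKET_MULTICAST,
};

struct sketch_rx_info {
	enum sketch_pkt_type pkt_type;
	uint16_t protocol;	/* network byte order, like skb->protocol */
	size_t payload_off;	/* where the IP header starts in buf */
};

static int sketch_parse_ipoib_rx(const uint8_t *buf, size_t len, int grh_present,
				 const uint8_t bcast_hwaddr[SKETCH_IPOIB_HWADDR_LEN],
				 struct sketch_rx_info *info)
{
	const uint8_t *dgid;

	/* Defensive check for the sketch; not part of the patch. */
	if (len < SKETCH_GRH_BYTES + SKETCH_IPOIB_ENCAP_LEN)
		return -1;

	dgid = buf + SKETCH_GRH_DGID_OFFSET;
	if (!grh_present || dgid[0] != 0xff)
		info->pkt_type = SKETCH_PACKET_HOST;
	else if (memcmp(dgid, bcast_hwaddr + 4, SKETCH_GID_SIZE) == 0)
		info->pkt_type = SKETCH_PACKET_BROADCAST;
	else
		info->pkt_type = SKETCH_PACKET_MULTICAST;

	/* skb_pull(GRH), then read the ethertype from the encap header. */
	memcpy(&info->protocol, buf + SKETCH_GRH_BYTES, sizeof(info->protocol));

	/* skb_pull(encap): the IP header follows. */
	info->payload_off = SKETCH_GRH_BYTES + SKETCH_IPOIB_ENCAP_LEN;
	return 0;
}
```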


* [RFC v1 for accelerated IPoIB 25/25] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops
  2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
                   ` (14 preceding siblings ...)
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 24/25] net/mlx5e: Add support for build_rx_skb for packet from IB type Erez Shitrit
@ 2017-03-13 18:31 ` Erez Shitrit
       [not found]   ` <1489429896-10781-26-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2017-03-14  6:07   ` Vishwanathapura, Niranjana
  15 siblings, 2 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-13 18:31 UTC (permalink / raw)
  To: dledford; +Cc: linux-rdma, netdev, valex, leonro, saedm, erezsh, Erez Shitrit


Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
---
 drivers/infiniband/hw/mlx5/Makefile         |   2 +-
 drivers/infiniband/hw/mlx5/main.c           |  10 +
 drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c | 289 ++++++++++++++++++++++++++++
 3 files changed, 300 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c

diff --git a/drivers/infiniband/hw/mlx5/Makefile b/drivers/infiniband/hw/mlx5/Makefile
index 90ad2adc752f..0c4caa339565 100644
--- a/drivers/infiniband/hw/mlx5/Makefile
+++ b/drivers/infiniband/hw/mlx5/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_MLX5_INFINIBAND)	+= mlx5_ib.o
 
-mlx5_ib-y :=	main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o gsi.o ib_virt.o cmd.o
+mlx5_ib-y :=	main.o cq.o doorbell.o qp.o mem.o srq.o mr.o ah.o mad.o gsi.o ib_virt.o cmd.o mlx5_ipoib_ops.o
 mlx5_ib-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += odp.o
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 4a043cf35b9a..c9bcaf2cc0c6 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -72,6 +72,14 @@ enum {
 	MLX5_ATOMIC_SIZE_QP_8BYTES = 1 << 3,
 };
 
+struct net_device *mlx5_alloc_rdma_netdev(struct ib_device *hca,
+				     u8 port_num,
+				     enum rdma_netdev_t type,
+				     const char *name,
+				     unsigned char name_assign_type,
+				     void (*setup)(struct net_device *));
+void mlx5_free_rdma_netdev(struct net_device *netdev);
+
 static enum rdma_link_layer
 mlx5_port_type_cap_to_rdma_ll(int port_type_cap)
 {
@@ -3436,6 +3444,8 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.alloc_mr		= mlx5_ib_alloc_mr;
 	dev->ib_dev.map_mr_sg		= mlx5_ib_map_mr_sg;
 	dev->ib_dev.check_mr_status	= mlx5_ib_check_mr_status;
+	dev->ib_dev.alloc_rdma_netdev	= mlx5_alloc_rdma_netdev;
+	dev->ib_dev.free_rdma_netdev	= mlx5_free_rdma_netdev;
 	dev->ib_dev.get_port_immutable  = mlx5_port_immutable;
 	dev->ib_dev.get_dev_fw_str      = get_dev_fw_str;
 	if (mlx5_core_is_pf(mdev)) {
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c b/drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c
new file mode 100644
index 000000000000..9ca2fc4fbc15
--- /dev/null
+++ b/drivers/infiniband/hw/mlx5/mlx5_ipoib_ops.c
@@ -0,0 +1,289 @@
+/*
+ * Copyright (c) 2013-2015, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <linux/netdevice.h>
+#include <rdma/ib_ipoib_accel_ops.h>
+#include "mlx5_ib.h"
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/fs.h>
+
+/*FIX ME*/
+#include "../../ulp/ipoib/ipoib.h"
+
+#define IB_DEFAULT_Q_KEY   0xb1b
+
+int mlx5_ib_config_ipoib_qp(struct mlx5_ib_dev *ib_dev, struct mlx5_core_qp *qp)
+{
+	u32 *in;
+	struct mlx5_qp_context *context;
+	int inlen;
+	void *addr_path;
+	void *qpc;
+	int ret;
+
+	inlen = MLX5_ST_SZ_BYTES(create_qp_in);
+	in = mlx5_vzalloc(inlen);
+	if (!in)
+		return -ENOMEM;
+
+	qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+	MLX5_SET(qpc, qpc, st, MLX5_QP_ST_UD);
+	MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+	MLX5_SET(qpc, qpc, ulp_stateless_offload_mode,
+		 MLX5_QP_ENHANCED_ULP_STATELESS_MODE);
+
+	addr_path = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
+	MLX5_SET(ads, addr_path, port, 1);
+	MLX5_SET(ads, addr_path, grh, 1);
+
+	ret = mlx5_core_create_qp(ib_dev->mdev, qp, in, inlen);
+	if (ret) {
+		pr_err("Failed creating IPoIB QP err : %d\n", ret);
+		goto clean_inbox;
+	}
+
+	ret = -ENOMEM;	/* returned if the QP context allocation fails */
+	context = kzalloc(sizeof(*context), GFP_KERNEL);
+	if (!context)
+		goto clean_inbox;
+
+	context->flags = cpu_to_be32(MLX5_QP_PM_MIGRATED << 11);
+	context->pri_path.port = 1;
+	context->qkey = cpu_to_be32(IB_DEFAULT_Q_KEY);
+
+	ret = mlx5_core_qp_modify(ib_dev->mdev,
+				  MLX5_CMD_OP_RST2INIT_QP, 0, context, qp);
+	if (ret) {
+		pr_warn("Failed to modify qp RST2INIT, err: %d\n", ret);
+		goto clean_in_modify;
+	}
+	memset(context, 0, sizeof(*context));
+
+	ret = mlx5_core_qp_modify(ib_dev->mdev,
+				  MLX5_CMD_OP_INIT2RTR_QP, 0, context,
+				  qp);
+	if (ret) {
+		pr_warn("Failed to modify qp INIT2RTR, err: %d\n", ret);
+		goto clean_in_modify;
+	}
+
+	ret = mlx5_core_qp_modify(ib_dev->mdev,
+				  MLX5_CMD_OP_RTR2RTS_QP, 0, context,
+				  qp);
+	if (ret) {
+		pr_warn("Failed to modify qp RTR2RTS, err: %d\n", ret);
+		goto clean_in_modify;
+	}
+
+	kvfree(in);
+	kfree(context);
+
+	return 0;
+
+clean_in_modify:
+	kfree(context);
+clean_inbox:
+	kvfree(in);
+	return ret;
+}
+
+static void mlx5_ib_clean_qp(struct mlx5_ib_dev *ib_dev,
+			     struct mlx5_core_qp *qp)
+{
+	mlx5_core_destroy_qp(ib_dev->mdev, qp);
+	kfree(qp);
+}
+
+int mlx5_ib_dev_init(struct net_device *dev, struct ib_device *hca,
+		     int *qp_num)
+{
+	void *next_priv = ipoib_dev_priv(dev);
+	struct rdma_netdev *rn = netdev_priv(dev);
+	struct mlx5_ib_dev *ib_dev = to_mdev(hca);
+	int ret;
+
+	ret = mlx5i_attach(ib_dev->mdev, next_priv);
+	if (ret) {
+		pr_err("Failed resources allocation for device: %s ret: %d\n",
+		       dev->name, ret);
+		return ret;
+	}
+
+	*qp_num = rn->qp_num;
+
+	pr_debug("resources allocated for device: %s\n", dev->name);
+
+	return 0;
+}
+
+void mlx5_ib_dev_cleanup(struct net_device *dev, struct ib_device *hca)
+{
+	void *next_priv = ipoib_dev_priv(dev);
+	struct rdma_netdev *rn = netdev_priv(dev);
+	struct mlx5_ib_dev *ib_dev = to_mdev(hca);
+	struct mlx5_qp_context context;
+	int ret;
+
+	/* detach qp from flow-steering by reset it */
+	ret = mlx5_core_qp_modify(ib_dev->mdev,
+				  MLX5_CMD_OP_2RST_QP, 0, &context,
+				  (struct mlx5_core_qp *)rn->context);
+	if (ret)
+		pr_err("%s failed (ret: %d) to reset QP\n", __func__, ret);
+
+	mlx5i_detach(ib_dev->mdev, next_priv);
+
+	mlx5_ib_clean_qp(ib_dev, (struct mlx5_core_qp *)rn->context);
+}
+
+int mlx5_ib_dev_open(struct net_device *dev)
+{
+	void *next_priv = ipoib_dev_priv(dev);
+
+	return mlx5i_open(next_priv);
+}
+
+int mlx5_ib_dev_stop(struct net_device *dev)
+{
+	void *next_priv = ipoib_dev_priv(dev);
+
+	return mlx5i_close(next_priv);
+}
+
+void mlx5_ib_send(struct net_device *dev, struct sk_buff *skb,
+		  struct ipoib_ah *address, u32 dqpn, u32 dqkey)
+{
+	void *next_priv = ipoib_dev_priv(dev);
+
+	mlx5i_xmit(skb, next_priv, &to_mah(address->ah)->av, dqpn, dqkey);
+}
+
+int mlx5_ib_attach_mcast(struct net_device *dev, struct ib_device *hca,
+			 union ib_gid *gid, u16 lid, int set_qkey)
+{
+	struct rdma_netdev *rn = netdev_priv(dev);
+	struct mlx5_ib_dev *ib_dev = to_mdev(hca);
+	int err;
+
+	pr_debug("%s attaching QPN 0x%x, MGID %pI6\n",
+		 dev->name, rn->qp_num, gid->raw);
+
+	err = mlx5_core_attach_mcg(ib_dev->mdev, gid, rn->qp_num);
+	if (err)
+		pr_err("%s failed attaching QPN 0x%x, MGID %pI6\n",
+		       dev->name, rn->qp_num, gid->raw);
+
+	return err;
+}
+
+int mlx5_ib_dettach_mcast(struct net_device *dev, struct ib_device *hca,
+			  union ib_gid *gid, u16 lid)
+{
+	struct rdma_netdev *rn = netdev_priv(dev);
+	struct mlx5_ib_dev *ib_dev = to_mdev(hca);
+	int err;
+
+	pr_debug("%s detaching QPN 0x%x, MGID %pI6\n",
+		 dev->name, rn->qp_num, gid->raw);
+
+	err = mlx5_core_detach_mcg(ib_dev->mdev, gid, rn->qp_num);
+	if (err)
+		pr_err("%s failed detaching QPN 0x%x, MGID %pI6\n",
+		       dev->name, rn->qp_num, gid->raw);
+
+	return err;
+}
+
+static const struct net_device_ops ipoib_netdev_default_pf = {
+	.ndo_open		 = mlx5_ib_dev_open,
+	.ndo_stop		 = mlx5_ib_dev_stop,
+};
+
+struct net_device *mlx5_alloc_rdma_netdev(struct ib_device *hca,
+				     u8 port_num,
+				     enum rdma_netdev_t type,
+				     const char *name,
+				     unsigned char name_assign_type,
+				     void (*setup)(struct net_device *))
+{
+	struct net_device *dev;
+	struct mlx5_ib_dev *ib_dev = to_mdev(hca);
+	struct mlx5_core_qp *qp;
+	struct mlx5i_create_ext_param param;
+	struct rdma_netdev *rn;
+	int ret;
+
+	/* new IPoIB QP */
+	qp = kzalloc(sizeof(*qp), GFP_KERNEL);
+	if (!qp) {
+		pr_warn("Failed to allocate memory for QP\n");
+		return NULL;
+	}
+
+	/* config and move qp to RTS */
+	ret = mlx5_ib_config_ipoib_qp(ib_dev, qp);
+	if (ret) {
+		pr_warn("Failed config IPoIB QP ret: %d\n", ret);
+		goto clean_qp;
+	}
+
+	param.size_base_priv = sizeof(struct ipoib_rdma_netdev);
+	param.qpn = qp->qpn;
+	dev = mlx5i_create_netdev(ib_dev->mdev, name, setup, &param);
+	if (!dev) {
+		pr_err("%s: Failed to create net device\n", __func__);
+		goto clean_qp;
+	}
+
+	pr_debug("%s qpn: %d created\n", __func__, qp->qpn);
+
+	/* set func pointers */
+	rn = netdev_priv(dev);
+	rn->qp_num = qp->qpn;
+	rn->context = qp;
+
+	dev->netdev_ops = &ipoib_netdev_default_pf;
+	rn->ib_dev_init = mlx5_ib_dev_init;
+	rn->ib_dev_cleanup = mlx5_ib_dev_cleanup;
+	rn->send = mlx5_ib_send;
+	rn->attach_mcast = mlx5_ib_attach_mcast;
+	rn->detach_mcast = mlx5_ib_dettach_mcast;
+
+	return dev;
+
+clean_qp:
+	mlx5_ib_clean_qp(ib_dev, qp);
+	return NULL;
+}
+
+void mlx5_free_rdma_netdev(struct net_device *netdev)
+{
+}
-- 
1.8.3.1
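The error handling in mlx5_ib_config_ipoib_qp() relies on the usual kernel goto-unwind idiom. Below is a minimal user-space sketch of that idiom. Each label releases one resource, in reverse order of acquisition, so both buffers are freed exactly once on every path, including success. The allocation counter and the sketch_* helpers are hypothetical. Creating and destroying the QP itself is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Counts live allocations so that leaks and double frees show up in tests. */
static int live_allocs;

static void *tracked_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Stand-in for one modify-QP command; fails when step == fail_at. */
static int sketch_transition(int step, int fail_at)
{
	return step == fail_at ? -EIO : 0;
}

/* fail_at: 0 = all transitions succeed, 1..3 = RST2INIT/INIT2RTR/RTR2RTS fails. */
static int sketch_config_qp(int fail_at)
{
	void *in, *context;
	int step, ret;

	in = tracked_alloc(64);		/* command mailbox */
	if (!in)
		return -ENOMEM;

	ret = -ENOMEM;
	context = tracked_alloc(64);	/* QP context for the transitions */
	if (!context)
		goto free_in;

	for (step = 1; step <= 3; step++) {
		ret = sketch_transition(step, fail_at);
		if (ret)
			goto free_context;
	}
	/* success (ret == 0) falls through the same cleanup labels */

free_context:
	tracked_free(context);
free_in:
	tracked_free(in);
	return ret;
}
```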


* Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
       [not found]   ` <1489429896-10781-5-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-03-13 20:01     ` Jason Gunthorpe
  2017-03-14  7:01       ` Vishwanathapura, Niranjana
                         ` (2 more replies)
  2017-03-14  6:44     ` Vishwanathapura, Niranjana
  1 sibling, 3 replies; 65+ messages in thread
From: Jason Gunthorpe @ 2017-03-13 20:01 UTC (permalink / raw)
  To: Erez Shitrit; +Cc: dledford, linux-rdma, netdev, valex, leonro, saedm, erezsh

On Mon, Mar 13, 2017 at 08:31:15PM +0200, Erez Shitrit wrote:

> diff --git a/include/rdma/ib_ipoib_accel_ops.h b/include/rdma/ib_ipoib_accel_ops.h
> new file mode 100644
> index 000000000000..148a5529a559
> +++ b/include/rdma/ib_ipoib_accel_ops.h

Both patches need a better naming scheme for this file..

rn_opa_vnic.h
rn_ipoib.h

Maybe?

> +struct rdma_netdev {
> +	void *clnt_priv;
> +
> +	/* control functions */
> +	void (*set_id)(struct net_device *netdev, int id);

> +	/* IB resource allocation function, returns new UD QP */
> +	int (*ib_dev_init)(struct net_device *dev, struct ib_device *hca,
> +			   int *qp_num);

Why can't some combination of alloc_rdma_netdev and ndo.open do this stuff?

> +	void (*ib_dev_cleanup)(struct net_device *dev, struct ib_device *hca);

Ditto

> +	/* send packet */
> +	void (*send)(struct net_device *dev, struct sk_buff *skb,
> +		     struct ipoib_ah *address, u32 dqpn, u32 dqkey);

> +	/* multicast */
> +	int (*attach_mcast)(struct net_device *dev, struct ib_device *hca,
> +			    union ib_gid *gid, u16 lid, int set_qkey);
> +	int (*detach_mcast)(struct net_device *dev, struct ib_device *hca,
> +			    union ib_gid *gid, u16 lid);

It would make more sense to store the struct ib_device pointer in the
struct rdma_netdev.
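Here is a minimal sketch of that suggestion. It assumes alloc_rdma_netdev records the ib_device pointer (and port) once, so the multicast op only needs the net_device. All names are hypothetical user-space stand-ins. The multicast LID parameter is named mlid here, following the question below.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct sketch_ib_device { const char *name; };
struct sketch_net_device;

struct sketch_rdma_netdev {
	struct sketch_ib_device *hca;	/* recorded once at allocation time */
	uint8_t port_num;
	int (*attach_mcast)(struct sketch_net_device *dev,
			    const uint8_t gid[16], uint16_t mlid, int set_qkey);
};

/* Stand-in for a net_device whose netdev_priv() area is the rdma_netdev. */
struct sketch_net_device {
	struct sketch_rdma_netdev rn;
};

static int sketch_attach_calls;

static int sketch_attach_mcast(struct sketch_net_device *dev,
			       const uint8_t gid[16], uint16_t mlid, int set_qkey)
{
	struct sketch_ib_device *hca = dev->rn.hca;	/* no hca argument needed */

	(void)gid;
	(void)mlid;
	(void)set_qkey;
	if (!hca)
		return -ENODEV;
	sketch_attach_calls++;
	return 0;
}

static void sketch_alloc_rdma_netdev(struct sketch_net_device *dev,
				     struct sketch_ib_device *hca, uint8_t port)
{
	dev->rn.hca = hca;
	dev->rn.port_num = port;
	dev->rn.attach_mcast = sketch_attach_mcast;
}
```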

Should 'lid' be 'mlid'?

> +	int qp_num;

This one probably belongs in ipoib_rdma_netdev

> +	void *context;

What is this? Why is something other than ipoib_priv or ipoib_dev_priv
needed?


>  						struct ib_wq_attr *attr,
>  						u32 wq_attr_mask,
>  						struct ib_udata *udata);
> +	struct ib_ipoib_accel_ops * (*get_ipoib_accel_ops)(struct ib_device *device);

rebase error? Not sure how this compiles

Jason

* Re: [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks Erez Shitrit
@ 2017-03-13 20:10   ` Jason Gunthorpe
       [not found]     ` <20170313201049.GB2738-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
       [not found]   ` <1489429896-10781-6-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 65+ messages in thread
From: Jason Gunthorpe @ 2017-03-13 20:10 UTC (permalink / raw)
  To: Erez Shitrit; +Cc: dledford, linux-rdma, netdev, valex, leonro, saedm, erezsh

On Mon, Mar 13, 2017 at 08:31:16PM +0200, Erez Shitrit wrote:
 
> TODO: We added remote qkey to ipoib_send in order to match send op
> signature.
> In accel mode this param will be used but in regular mode this param is
> redundant. Need to think about better solution.

The flow is backwards: in accel mode the xmit ndo should be owned by
the driver, and the driver should call a helper to get all the proper
AH data, including the qkey.

> -static int ipoib_dev_init_default(struct net_device *dev, struct ib_device *ca,
> -				  int port)
> +static int ipoib_dev_init_default(struct net_device *dev,
> +				  struct ib_device *hca, int *qp_num)
>  {
> -	struct ipoib_dev_priv *priv = netdev_priv(dev);
> +	struct ipoib_dev_priv *priv = ipoib_priv(dev);
> +
> +	netif_napi_add(dev, &priv->napi, ipoib_poll, NAPI_POLL_WEIGHT);

All these 'default' functions are part of the 'rn driver'. They should
not be calling ipoib_priv; you said you didn't want ipoib_dev_priv
leaking into the drivers.

These _default funcs should use ipoib_dev_priv and all the members of
ipoib_dev_priv that are used exclusively by the 'default'
implementation need to be moved into a dedicated priv struct.

Otherwise the entire scheme becomes hugely confusing about what in
ipoib_dev_priv is actually valid in accel mode.

I think it would be much easier to maintain if the _default functions were
all in a dedicated file, e.g. rn_ipoib_ud_verbs.c

I also recommend splitting out the bulk rename of ipoib_priv into a
single patch with a '#define ipoib_priv(dev) netdev_priv(dev)'
shim. That would make this patch much smaller.
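Here is the suggested shim in a minimal user-space form. The bulk-rename patch only switches callers to ipoib_priv(), which expands to exactly what they called before. A later patch can then change what ipoib_priv() returns in one place. struct sketch_net_device and the user-space netdev_priv() below are stand-ins for the kernel versions.

```c
#include <assert.h>

/* User-space stand-ins for struct net_device and netdev_priv(). */
struct sketch_net_device { void *priv; };

static void *netdev_priv(const struct sketch_net_device *dev)
{
	return dev->priv;
}

struct sketch_ipoib_dev_priv { int mtu; };

/* The shim: a pure rename, no behaviour change. */
#define ipoib_priv(dev) netdev_priv(dev)

static int sketch_get_mtu(const struct sketch_net_device *dev)
{
	struct sketch_ipoib_dev_priv *priv = ipoib_priv(dev);

	return priv->mtu;
}
```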

IMHO you probably don't need to send the mlx5 stuff until the series
up to here is OK. I think we all understand that mlx5 can implement
this API?

Jason


* Re: [RFC v1 for accelerated IPoIB 25/25] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops
       [not found]   ` <1489429896-10781-26-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-03-13 20:27     ` Jason Gunthorpe
       [not found]       ` <20170313202720.GC2738-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Gunthorpe @ 2017-03-13 20:27 UTC (permalink / raw)
  To: Erez Shitrit; +Cc: dledford, linux-rdma, netdev, valex, leonro, saedm, erezsh

On Mon, Mar 13, 2017 at 08:31:36PM +0200, Erez Shitrit wrote:

> +struct net_device *mlx5_alloc_rdma_netdev(struct ib_device *hca,
> +				     u8 port_num,
> +				     enum rdma_netdev_t type,
> +				     const char *name,
> +				     unsigned char name_assign_type,
> +				     void (*setup)(struct net_device *));
> +void mlx5_free_rdma_netdev(struct net_device *netdev);

Seems like OK signatures to me..

> +	dev->ib_dev.alloc_rdma_netdev	= mlx5_alloc_rdma_netdev;
> +	dev->ib_dev.free_rdma_netdev	= mlx5_free_rdma_netdev;

Since mlx5_free_rdma_netdev is empty this should just be NULL

> +int mlx5_ib_dev_init(struct net_device *dev, struct ib_device *hca,
> +		     int *qp_num)
> +{
> +	void *next_priv = ipoib_dev_priv(dev);
> +	struct rdma_netdev *rn = netdev_priv(dev);
> +	struct mlx5_ib_dev *ib_dev = to_mdev(hca);
> +	int ret;
> +
> +	ret = mlx5i_attach(ib_dev->mdev, next_priv);
> +	if (ret) {
> +		pr_err("Failed resources allocation for device: %s ret: %d\n",
> +		       dev->name, ret);
> +		return ret;
> +	}
> +
> +	*qp_num = rn->qp_num;
> +
> +	pr_debug("resources allocated for device: %s\n", dev->name);
> +
> +	return 0;
> +}
> +
> +void mlx5_ib_dev_cleanup(struct net_device *dev, struct ib_device *hca)
> +{
> +	void *next_priv = ipoib_dev_priv(dev);
> +	struct rdma_netdev *rn = netdev_priv(dev);
> +	struct mlx5_ib_dev *ib_dev = to_mdev(hca);
> +	struct mlx5_qp_context context;
> +	int ret;
> +
> +	/* detach qp from flow-steering by reset it */
> +	ret = mlx5_core_qp_modify(ib_dev->mdev,
> +				  MLX5_CMD_OP_2RST_QP, 0, &context,
> +				  (struct mlx5_core_qp *)rn->context);
> +	if (ret)
> +		pr_err("%s failed (ret: %d) to reset QP\n", __func__, ret);
> +
> +	mlx5i_detach(ib_dev->mdev, next_priv);
> +
> +	mlx5_ib_clean_qp(ib_dev, (struct mlx5_core_qp *)rn->context);
> +}

Why isn't this stuff in open/close?

> +void mlx5_ib_send(struct net_device *dev, struct sk_buff *skb,
> +		  struct ipoib_ah *address, u32 dqpn, u32 dqkey)
> +{
> +	void *next_priv = ipoib_dev_priv(dev);
> +
> +	mlx5i_xmit(skb, next_priv, &to_mah(address->ah)->av, dqpn, dqkey);

How come the qkey is not available via ipoib_ah ?

to_mah(address->ah)->av.key.qkey.qkey

?

> +static const struct net_device_ops ipoib_netdev_default_pf = {

That is a weird name for a mlx5 specific structure.

> +	param.size_base_priv = sizeof(struct ipoib_rdma_netdev);

This is really weird: the code in mlx5i_create_netdev calls
ipoib_dev_priv, so it must assume the struct is an ipoib_rdma_netdev.

> +	/* set func pointers */
> +	rn = netdev_priv(dev);
> +	rn->qp_num = qp->qpn;
> +	rn->context = qp;

No to using context. You need your own driver priv, like this:

struct mlx5_rn_priv {
	struct mlx5e_priv priv;
	struct mlx5_core_qp *qp;
};
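Below is a user-space sketch of why a typed driver priv is preferable to the void *context field. The QP is reached through a typed member. Code that holds only the embedded mlx5e priv can still recover it with container_of(), without casting rn->context. The sketch_* types stand in for struct mlx5e_priv and struct mlx5_core_qp, and sketch_container_of() is a simplified copy of the kernel macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mlx5e_priv and struct mlx5_core_qp. */
struct sketch_mlx5e_priv { int num_channels; };
struct sketch_core_qp { unsigned int qpn; };

/* Typed driver priv: the mlx5e priv first, then IPoIB-only state. */
struct sketch_rn_priv {
	struct sketch_mlx5e_priv priv;
	struct sketch_core_qp *qp;
};

/* Simplified version of the kernel's container_of(). */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Code that only holds the embedded mlx5e priv can still reach the QP. */
static struct sketch_core_qp *sketch_rn_qp(struct sketch_mlx5e_priv *epriv)
{
	return sketch_container_of(epriv, struct sketch_rn_priv, priv)->qp;
}
```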

Jason

* Re: [RFC v1 for accelerated IPoIB 25/25] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 25/25] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops Erez Shitrit
       [not found]   ` <1489429896-10781-26-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-03-14  6:07   ` Vishwanathapura, Niranjana
       [not found]     ` <20170314060730.GA79937-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
  1 sibling, 1 reply; 65+ messages in thread
From: Vishwanathapura, Niranjana @ 2017-03-14  6:07 UTC (permalink / raw)
  To: Erez Shitrit; +Cc: dledford, linux-rdma, netdev, valex, leonro, saedm, erezsh

On Mon, Mar 13, 2017 at 08:31:36PM +0200, Erez Shitrit wrote:
>+int mlx5_ib_dev_init(struct net_device *dev, struct ib_device *hca,
>+		     int *qp_num)
>+{
>+	void *next_priv = ipoib_dev_priv(dev);
>+	struct rdma_netdev *rn = netdev_priv(dev);
>+	struct mlx5_ib_dev *ib_dev = to_mdev(hca);
>+	int ret;
>+
>+	ret = mlx5i_attach(ib_dev->mdev, next_priv);
>+	if (ret) {
>+		pr_err("Failed resources allocation for device: %s ret: %d\n",
>+		       dev->name, ret);
>+		return ret;
>+	}
>+
>+	*qp_num = rn->qp_num;
>+
>+	pr_debug("resources allocated for device: %s\n", dev->name);
>+
>+	return 0;
>+}
>+
>+void mlx5_ib_dev_cleanup(struct net_device *dev, struct ib_device *hca)
>+{
>+	void *next_priv = ipoib_dev_priv(dev);
>+	struct rdma_netdev *rn = netdev_priv(dev);
>+	struct mlx5_ib_dev *ib_dev = to_mdev(hca);
>+	struct mlx5_qp_context context;
>+	int ret;
>+
>+	/* detach qp from flow-steering by reset it */
>+	ret = mlx5_core_qp_modify(ib_dev->mdev,
>+				  MLX5_CMD_OP_2RST_QP, 0, &context,
>+				  (struct mlx5_core_qp *)rn->context);
>+	if (ret)
>+		pr_err("%s failed (ret: %d) to reset QP\n", __func__, ret);
>+
>+	mlx5i_detach(ib_dev->mdev, next_priv);
>+
>+	mlx5_ib_clean_qp(ib_dev, (struct mlx5_core_qp *)rn->context);
>+}
>+

Why can't we use ndo_init() and ndo_uninit() here (just like open and stop below)?
We really don't need to pass in hca here (or to any other interface function),
as it is already made available to the driver during alloc_rdma_netdev.
Also, why is qp_num an output parameter of the init function? IPoIB can access
rn->qp_num, which is what this init function returns.

>+struct net_device *mlx5_alloc_rdma_netdev(struct ib_device *hca,
>+				     u8 port_num,
>+				     enum rdma_netdev_t type,
>+				     const char *name,
>+				     unsigned char name_assign_type,
>+				     void (*setup)(struct net_device *))
>+{

Probably need to check the 'type' here, as any rdma netdev client can call this
function (with a different rdma_netdev type) and cause the driver to misbehave.

>+void mlx5_free_rdma_netdev(struct net_device *netdev)
>+{
>+}

Maybe it is safer and cleaner for this function to undo what alloc does here
(instead of doing it in other places)?
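Here is a minimal sketch of that ownership model, assuming the driver priv owns the QP. free_rdma_netdev() undoes alloc_rdma_netdev() in reverse order, so the QP no longer has to be destroyed from a separate cleanup callback such as mlx5_ib_dev_cleanup(). The types and the live-object counter are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the QP and the netdev are plain heap objects. */
struct sketch_qp { unsigned int qpn; };
struct sketch_netdev { struct sketch_qp *qp; };	/* driver priv owns the QP */

static int live_objects;

static struct sketch_netdev *sketch_alloc_rdma_netdev(void)
{
	struct sketch_qp *qp;
	struct sketch_netdev *dev;

	qp = calloc(1, sizeof(*qp));
	if (!qp)
		return NULL;
	live_objects++;

	dev = calloc(1, sizeof(*dev));
	if (!dev) {
		free(qp);
		live_objects--;
		return NULL;
	}
	live_objects++;

	dev->qp = qp;
	return dev;
}

/* Undo alloc in reverse order; nothing else needs to destroy the QP. */
static void sketch_free_rdma_netdev(struct sketch_netdev *dev)
{
	struct sketch_qp *qp = dev->qp;

	free(dev);
	live_objects--;
	free(qp);
	live_objects--;
}
```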


* Re: [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks
       [not found]   ` <1489429896-10781-6-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-03-14  6:35     ` Vishwanathapura, Niranjana
       [not found]       ` <20170314063538.GB79937-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
  2017-03-14 15:44       ` Jason Gunthorpe
  0 siblings, 2 replies; 65+ messages in thread
From: Vishwanathapura, Niranjana @ 2017-03-14  6:35 UTC (permalink / raw)
  To: Erez Shitrit; +Cc: dledford, linux-rdma, netdev, valex, leonro, saedm, erezsh

On Mon, Mar 13, 2017 at 08:31:16PM +0200, Erez Shitrit wrote:
>+static struct net_device *ipoib_create_netdev_default(struct ib_device *hca,
>+						      const char *name,
>+						      void (*setup)(struct net_device *))
> {
> 	struct net_device *dev;
>+	struct rdma_netdev *rn;
>
>-	dev = alloc_netdev((int)sizeof(struct ipoib_dev_priv), name,
>-			   NET_NAME_UNKNOWN, ipoib_setup);
>+	dev = alloc_netdev((int)sizeof(struct ipoib_rdma_netdev),
>+			   name,
>+			   NET_NAME_UNKNOWN, setup);
> 	if (!dev)
> 		return NULL;
>
>-	return netdev_priv(dev);
>+	rn = netdev_priv(dev);
>+
>+	rn->ib_dev_init = ipoib_dev_init_default;
>+	rn->ib_dev_cleanup = ipoib_dev_uninit_default;
>+	rn->send = ipoib_send;
>+	rn->attach_mcast = ipoib_mcast_attach;
>+	rn->detach_mcast = ipoib_mcast_detach;
>+
>+	dev->netdev_ops = &ipoib_netdev_default_pf;
>+

Probably no need to set netdev_ops here as it gets overwritten.

>+	return dev;
>+}
>+
>+struct ipoib_dev_priv *ipoib_intf_alloc(struct ib_device *hca, u8 port,
>+					const char *name)
>+{
>+	struct net_device *dev;
>+	struct ipoib_dev_priv *priv;
>+	struct rdma_netdev *rn;
>+
>+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>+	if (!priv) {
>+		pr_err("%s failed allocting priv\n", __func__);
>+		return NULL;
>+	}
>+
>+	if (!hca->alloc_rdma_netdev)
>+		dev = ipoib_create_netdev_default(hca, name, ipoib_setup_common);
>+	else
>+		dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
>+					     name, NET_NAME_UNKNOWN,
>+					     ipoib_setup_common);
>+	if (!dev) {
>+		kfree(priv);
>+		return NULL;
>+	}

This will break ipoib on hfi1, as hfi1 will define alloc_rdma_netdev for the 
OPA_VNIC type. We should probably look for a dedicated return value (-ENODEV?) 
to determine if the driver supports the specified rdma netdev type, or use an ib 
device attribute to indicate that the driver supports the ipoib rdma netdev.

Niranjana

* Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
       [not found]   ` <1489429896-10781-5-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2017-03-13 20:01     ` Jason Gunthorpe
@ 2017-03-14  6:44     ` Vishwanathapura, Niranjana
  1 sibling, 0 replies; 65+ messages in thread
From: Vishwanathapura, Niranjana @ 2017-03-14  6:44 UTC (permalink / raw)
  To: Erez Shitrit
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w,
	saedm-VPRAkNaXOzVWk0Htik3J/w,
	erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Mon, Mar 13, 2017 at 08:31:15PM +0200, Erez Shitrit wrote:
>+struct ipoib_rdma_netdev {
>+	struct rdma_netdev rn;  /* keep this first */
>+	/* followed by device private data */
>+	char *dev_priv[0];
>+};
>+
>+static inline void *ipoib_priv(const struct net_device *dev)
>+{
>+	struct rdma_netdev *rn = netdev_priv(dev);
>+
>+	return rn->clnt_priv;
>+}
>+
>+static inline void *ipoib_dev_priv(const struct net_device *dev)
>+{
>+	struct ipoib_rdma_netdev *ipoib_rn = netdev_priv(dev);
>+
>+	return ipoib_rn->dev_priv;
>+}
>+

It can be confusing to see the return value of ipoib_priv() assigned to a 
struct ipoib_dev_priv (legacy name). Maybe we should rename ipoib_dev_priv() to 
ipoib_hw_priv()?

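A minimal user-space model of the layout under discussion may help. It is not kernel code: `struct net_device`, `netdev_priv()` and `alloc_netdev_model()` are simplified stand-ins, `ipoib_hw_priv()` is only the rename proposed above (not an existing function), and the GNU zero-length `dev_priv[0]` is written as a C99 flexible array member. What it illustrates is that the two accessors return different memory: `ipoib_priv()` follows `clnt_priv` to the IPoIB core's data, while the other accessor returns the area right after `struct rdma_netdev`, which belongs to the lower driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for struct net_device: the private area that
 * alloc_netdev() reserves follows the structure. */
struct net_device {
	const char *name;
	max_align_t priv[];
};

static void *netdev_priv(const struct net_device *dev)
{
	return (void *)dev->priv;
}

/* Only the member used by the accessors is modeled. */
struct rdma_netdev {
	void *clnt_priv;	/* IPoIB core (client) private data */
};

struct ipoib_rdma_netdev {
	struct rdma_netdev rn;	/* keep this first */
	max_align_t hw_priv[];	/* followed by lower-driver private data */
};

/* Client-side accessor: data owned by the IPoIB core. */
static void *ipoib_priv(const struct net_device *dev)
{
	struct rdma_netdev *rn = netdev_priv(dev);

	return rn->clnt_priv;
}

/* Hypothetical rename of ipoib_dev_priv(): data owned by the HW driver. */
static void *ipoib_hw_priv(const struct net_device *dev)
{
	struct ipoib_rdma_netdev *ipoib_rn = netdev_priv(dev);

	return ipoib_rn->hw_priv;
}

/* Allocate a net_device whose private area holds an ipoib_rdma_netdev
 * plus hw_size bytes of driver data (zeroed, like kzalloc). */
static struct net_device *alloc_netdev_model(size_t hw_size)
{
	return calloc(1, sizeof(struct net_device) +
			 sizeof(struct ipoib_rdma_netdev) + hw_size);
}
```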

>+#endif /* IB_IPOIB_ACCEL_OPS_H */
>diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>index 85b9034c8cfc..9b090efccdba 100644
>--- a/include/rdma/ib_verbs.h
>+++ b/include/rdma/ib_verbs.h
>@@ -1901,6 +1901,41 @@ struct ib_port_immutable {
> 	u32                           max_mad_size;
> };
>
>+/* rdma netdev type - specifies protocol type */
>+enum rdma_netdev_t {
>+	RDMA_NETDEV_OPA_VNIC,
>+	RDMA_NETDEV_IPOIB
>+};
>+
>+struct ipoib_ah;
>+
>+/**
>+ * struct rdma_netdev - rdma netdev
>+ * For cases where netstack interfacing is required.
>+ */
>+struct rdma_netdev {
>+	void *clnt_priv;
>+
>+	/* control functions */
>+	void (*set_id)(struct net_device *netdev, int id);
>+	/* IB resource allocation function, returns new UD QP */
>+	int (*ib_dev_init)(struct net_device *dev, struct ib_device *hca,
>+			   int *qp_num);
>+	void (*ib_dev_cleanup)(struct net_device *dev, struct ib_device *hca);
>+
>+	/* send packet */
>+	void (*send)(struct net_device *dev, struct sk_buff *skb,
>+		     struct ipoib_ah *address, u32 dqpn, u32 dqkey);
>+
>+	/* multicast */
>+	int (*attach_mcast)(struct net_device *dev, struct ib_device *hca,
>+			    union ib_gid *gid, u16 lid, int set_qkey);
>+	int (*detach_mcast)(struct net_device *dev, struct ib_device *hca,
>+			    union ib_gid *gid, u16 lid);
>+	int qp_num;

Maybe the ipoib_rdma_netdev structure is the right place for these functions?

>+	void *context;

No context should be necessary here.

>+};
>+
> struct ib_device {
> 	struct device                *dma_device;
>
>@@ -2149,6 +2184,7 @@ struct ib_device {
> 						struct ib_wq_attr *attr,
> 						u32 wq_attr_mask,
> 						struct ib_udata *udata);
>+	struct ib_ipoib_accel_ops * (*get_ipoib_accel_ops)(struct ib_device *device);

Old code; needs a fix.

Niranjana

* Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
  2017-03-13 20:01     ` Jason Gunthorpe
@ 2017-03-14  7:01       ` Vishwanathapura, Niranjana
  2017-03-14 13:25         ` Erez Shitrit
  2017-03-14 16:11         ` Jason Gunthorpe
       [not found]       ` <20170313200136.GA2738-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2017-03-16 15:17       ` Erez Shitrit
  2 siblings, 2 replies; 65+ messages in thread
From: Vishwanathapura, Niranjana @ 2017-03-14  7:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Erez Shitrit, dledford, linux-rdma, netdev, valex, leonro, saedm, erezsh

On Mon, Mar 13, 2017 at 02:01:36PM -0600, Jason Gunthorpe wrote:
>> +	/* multicast */
>> +	int (*attach_mcast)(struct net_device *dev, struct ib_device *hca,
>> +			    union ib_gid *gid, u16 lid, int set_qkey);
>> +	int (*detach_mcast)(struct net_device *dev, struct ib_device *hca,
>> +			    union ib_gid *gid, u16 lid);
>
>It would make more sense to store the struct ib_device pointer in the
>struct rdma_netdev.
>

Agreed that it shouldn't be a function parameter.
For opa_vnic, I found it convenient to store ib_device pointer in client and 
device private structures as those will be available in most places anyhow.

Niranjana

* Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
       [not found]       ` <20170313200136.GA2738-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-03-14 13:25         ` Erez Shitrit
       [not found]           ` <CAAk-MO8dZJKYgpWT8p1fz0O6y2wTx4jxoJhYf21ikMOef_xUMg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Erez Shitrit @ 2017-03-14 13:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky

On Mon, Mar 13, 2017 at 10:01 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Mon, Mar 13, 2017 at 08:31:15PM +0200, Erez Shitrit wrote:
>
>> diff --git a/include/rdma/ib_ipoib_accel_ops.h b/include/rdma/ib_ipoib_accel_ops.h
>> new file mode 100644
>> index 000000000000..148a5529a559
>> +++ b/include/rdma/ib_ipoib_accel_ops.h
>
> Both patches need a better naming scheme for this file..
>
> rn_opa_vnic.h
> rn_ipoib.h
>
> Maybe?

That works for me.

vnic?

>
>> +struct rdma_netdev {
>> +     void *clnt_priv;
>> +
>> +     /* control functions */
>> +     void (*set_id)(struct net_device *netdev, int id);
>
>> +     /* IB resource allocation function, returns new UD QP */
>> +     int (*ib_dev_init)(struct net_device *dev, struct ib_device *hca,
>> +                        int *qp_num);
>
> Why can't some combination of alloc_rdma_netdev and ndo.open do this stuff?
>
>> +     void (*ib_dev_cleanup)(struct net_device *dev, struct ib_device *hca);
>
> Ditto
>
>> +     /* send packet */
>> +     void (*send)(struct net_device *dev, struct sk_buff *skb,
>> +                  struct ipoib_ah *address, u32 dqpn, u32 dqkey);
>
>> +     /* multicast */
>> +     int (*attach_mcast)(struct net_device *dev, struct ib_device *hca,
>> +                         union ib_gid *gid, u16 lid, int set_qkey);
>> +     int (*detach_mcast)(struct net_device *dev, struct ib_device *hca,
>> +                         union ib_gid *gid, u16 lid);
>
> It would make more sense to store the struct ib_device pointer in the
> struct rdma_netdev.

Yes, I can add the ib_device and the qp_num to the rdma_netdev; that way I
can drop the extra parameters from these functions and be able to use the
ndo_ops.

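Following Jason's suggestion, the callbacks could read the HCA from the object instead of taking it as an argument. The sketch below is a hypothetical shape of that revision, modeled in user space: the `hca`, `port_num` and `qp_num` fields, the `mlid` parameter name and `demo_attach_mcast()` are illustrative assumptions, `<stdint.h>` types replace the kernel's `u8`/`u16`/`u32`, and `struct net_device` is a stand-in that embeds the `rdma_netdev` directly.

```c
#include <assert.h>
#include <stdint.h>

struct ib_device { const char *name; };
union ib_gid { uint8_t raw[16]; };

struct net_device;

/* Hypothetical revision: the HCA and QP number are stored once, so the
 * callbacks no longer take a struct ib_device * argument. */
struct rdma_netdev {
	void *clnt_priv;
	struct ib_device *hca;
	uint8_t port_num;
	uint32_t qp_num;

	int (*attach_mcast)(struct net_device *dev, union ib_gid *gid,
			    uint16_t mlid, int set_qkey);
	int (*detach_mcast)(struct net_device *dev, union ib_gid *gid,
			    uint16_t mlid);
};

/* Stand-in: the real net_device keeps the rdma_netdev in its priv area. */
struct net_device {
	const char *name;
	struct rdma_netdev rn;
};

static void *netdev_priv(struct net_device *dev)
{
	return &dev->rn;
}

static struct ib_device *seen_hca;	/* records which HCA the op used */

/* Example driver op: the HCA comes from the rdma_netdev, not a parameter. */
static int demo_attach_mcast(struct net_device *dev, union ib_gid *gid,
			     uint16_t mlid, int set_qkey)
{
	struct rdma_netdev *rn = netdev_priv(dev);

	(void)gid;
	(void)mlid;
	(void)set_qkey;
	seen_hca = rn->hca;
	return 0;
}
```
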
>
> Should 'lid' be 'mlid'?

Yes, will change.

>
>> +     int qp_num;
>
> This one probably belongs in ipoib_rdma_netdev
>
>> +     void *context;
>
> What is this? Why is something other than ipoib_priv or ipoib_dev_priv
> needed?

ipoib_priv is not known to the lower layers; that context is used
to keep data (the QP pointer in this case) that is not available
otherwise.

>
>
>>                                               struct ib_wq_attr *attr,
>>                                               u32 wq_attr_mask,
>>                                               struct ib_udata *udata);
>> +     struct ib_ipoib_accel_ops * (*get_ipoib_accel_ops)(struct ib_device *device);
>
> rebase error? Not sure how this compiles

Yes, will take it out. (It compiled with that, probably because it is
not used.)

>
> Jason

* Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
  2017-03-14  7:01       ` Vishwanathapura, Niranjana
@ 2017-03-14 13:25         ` Erez Shitrit
  2017-03-14 16:11         ` Jason Gunthorpe
  1 sibling, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-14 13:25 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Jason Gunthorpe, Erez Shitrit, Doug Ledford, linux-rdma, netdev,
	valex, Leon Romanovsky, saedm

On Tue, Mar 14, 2017 at 9:01 AM, Vishwanathapura, Niranjana
<niranjana.vishwanathapura@intel.com> wrote:
> On Mon, Mar 13, 2017 at 02:01:36PM -0600, Jason Gunthorpe wrote:
>>>
>>> +       /* multicast */
>>> +       int (*attach_mcast)(struct net_device *dev, struct ib_device
>>> *hca,
>>> +                           union ib_gid *gid, u16 lid, int set_qkey);
>>> +       int (*detach_mcast)(struct net_device *dev, struct ib_device
>>> *hca,
>>> +                           union ib_gid *gid, u16 lid);
>>
>>
>> It would make more sense to store the struct ib_device pointer in the
>> struct rdma_netdev.
>>
>
> Agree that it shouldn't be a function parameters.
> For opa_vnic, I found it convenient to store ib_device pointer in client and
> device private structures as those will be available in most places anyhow.

Will add it to the rdma_netdev obj, as Jason suggested.
Thanks,

>
> Niranjana

* Re: [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks
       [not found]     ` <20170313201049.GB2738-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-03-14 14:39       ` Erez Shitrit
       [not found]         ` <CAAk-MO8-bnWRu7BDtYDhNePj+76X4Vb7gFNhED8irLNtqR7tCw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Erez Shitrit @ 2017-03-14 14:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky

On Mon, Mar 13, 2017 at 10:10 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Mon, Mar 13, 2017 at 08:31:16PM +0200, Erez Shitrit wrote:
>
>> TODO: We added remote qkey to ipoib_send in order to match send op
>> signature.
>> In accel mode this param will be used but in regular mode this param is
>> redundant. Need to think about better solution.
>
> The flow is backwards: in accel mode the xmit ndo should be owned by
> the driver, and the driver should call a helper to get all the proper
> AH data, including the qkey.
>
>> -static int ipoib_dev_init_default(struct net_device *dev, struct ib_device *ca,
>> -                               int port)
>> +static int ipoib_dev_init_default(struct net_device *dev,
>> +                               struct ib_device *hca, int *qp_num)
>>  {
>> -     struct ipoib_dev_priv *priv = netdev_priv(dev);
>> +     struct ipoib_dev_priv *priv = ipoib_priv(dev);
>> +
>> +     netif_napi_add(dev, &priv->napi, ipoib_poll, NAPI_POLL_WEIGHT);
>
> All these 'default' functions are part of the 'rn driver'. They should
> not be calling ipoib_priv, you said you didn't want ipoib_dev_priv
> leaking into the drivers.

Do you mean the name of the function, i.e. rdma_netdev_init_default
instead of ipoib_dev_init_default?

>
> These _default funcs should use ipoib_dev_priv and all the members of
> ipoib_dev_priv that are used exclusively by the 'default'
> implementation need to be moved into a dedicated priv struct.
>
> Otherwise the entire scheme become hugely confusing about what in
> ipoib_dev_priv is actually valid in accel mode.

ipoib_dev_priv is used in both modes, accel and "default"; it keeps all
the control data/flows, while the accel mode handles only the data path.

>
> I think it would be much easier to maintain if the _default functions were
> all in a dedicated file, e.g. rn_ipoib_ud_verbs.c
>
> I also recommend splitting out the bulk rename of ipoib_priv into a
> single patch with a '#define ipoib_priv(dev) netdev_priv(dev)'
> shim. That would make this patch much smaller.

OK.

>
> IMHO you probably don't need to send the mlx5 stuff until the series
> up to here is OK. I think we all understand that mlx5 can implement
> this API?

OK, agreed. I will check that with the people who handle the mlx5 stuff.

>
> Jason

* Re: [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks
       [not found]       ` <20170314063538.GB79937-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
@ 2017-03-14 14:42         ` Erez Shitrit
  2017-03-14 16:00           ` Jason Gunthorpe
  0 siblings, 1 reply; 65+ messages in thread
From: Erez Shitrit @ 2017-03-14 14:42 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	Leon Romanovsky, saedm-VPRAkNaXOzVWk0Htik3J/w

On Tue, Mar 14, 2017 at 8:35 AM, Vishwanathapura, Niranjana
<niranjana.vishwanathapura-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> On Mon, Mar 13, 2017 at 08:31:16PM +0200, Erez Shitrit wrote:
>>
>> +static struct net_device *ipoib_create_netdev_default(struct ib_device
>> *hca,
>> +                                                     const char *name,
>> +                                                     void (*setup)(struct
>> net_device *))
>> {
>>         struct net_device *dev;
>> +       struct rdma_netdev *rn;
>>
>> -       dev = alloc_netdev((int)sizeof(struct ipoib_dev_priv), name,
>> -                          NET_NAME_UNKNOWN, ipoib_setup);
>> +       dev = alloc_netdev((int)sizeof(struct ipoib_rdma_netdev),
>> +                          name,
>> +                          NET_NAME_UNKNOWN, setup);
>>         if (!dev)
>>                 return NULL;
>>
>> -       return netdev_priv(dev);
>> +       rn = netdev_priv(dev);
>> +
>> +       rn->ib_dev_init = ipoib_dev_init_default;
>> +       rn->ib_dev_cleanup = ipoib_dev_uninit_default;
>> +       rn->send = ipoib_send;
>> +       rn->attach_mcast = ipoib_mcast_attach;
>> +       rn->detach_mcast = ipoib_mcast_detach;
>> +
>> +       dev->netdev_ops = &ipoib_netdev_default_pf;
>> +
>
>
> Probably no need to set netdev_ops here as it gets overwritten.

No, it is switched, and used.

>
>
>> +       return dev;
>> +}
>> +
>> +struct ipoib_dev_priv *ipoib_intf_alloc(struct ib_device *hca, u8 port,
>> +                                       const char *name)
>> +{
>> +       struct net_device *dev;
>> +       struct ipoib_dev_priv *priv;
>> +       struct rdma_netdev *rn;
>> +
>> +       priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>> +       if (!priv) {
>> +               pr_err("%s failed allocting priv\n", __func__);
>> +               return NULL;
>> +       }
>> +
>> +       if (!hca->alloc_rdma_netdev)
>> +               dev = ipoib_create_netdev_default(hca, name,
>> ipoib_setup_common);
>> +       else
>> +               dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
>> +                                            name, NET_NAME_UNKNOWN,
>> +                                            ipoib_setup_common);
>> +       if (!dev) {
>> +               kfree(priv);
>> +               return NULL;
>> +       }
>
>
> This will break ipoib on hfi1, as hfi1 will define alloc_rdma_netdev for the
> OPA_VNIC type. We should probably look for a dedicated return value
> (-ENODEV?) to determine if the driver supports the specified rdma netdev type,
> or use an ib device attribute to indicate that the driver supports the ipoib
> rdma netdev.

Sorry, I don't understand that. We are in the ipoib driver, so the type is
RDMA_NETDEV_IPOIB; if hfi wants to implement it, it should use the same
flag, and use OPA_VNIC for vnic.

>
> Niranjana

* Re: [RFC v1 for accelerated IPoIB 25/25] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops
       [not found]       ` <20170313202720.GC2738-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-03-14 14:53         ` Erez Shitrit
  2017-03-14 16:10           ` Jason Gunthorpe
  0 siblings, 1 reply; 65+ messages in thread
From: Erez Shitrit @ 2017-03-14 14:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	Leon Romanovsky, saedm-VPRAkNaXOzVWk0Htik3J/w

On Mon, Mar 13, 2017 at 10:27 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Mon, Mar 13, 2017 at 08:31:36PM +0200, Erez Shitrit wrote:
>
>> +struct net_device *mlx5_alloc_rdma_netdev(struct ib_device *hca,
>> +                                  u8 port_num,
>> +                                  enum rdma_netdev_t type,
>> +                                  const char *name,
>> +                                  unsigned char name_assign_type,
>> +                                  void (*setup)(struct net_device *));
>> +void mlx5_free_rdma_netdev(struct net_device *netdev);
>
> Seems like OK signatures to me..
>
>> +     dev->ib_dev.alloc_rdma_netdev   = mlx5_alloc_rdma_netdev;
>> +     dev->ib_dev.free_rdma_netdev    = mlx5_free_rdma_netdev;
>
> Since mlx5_free_rdma_netdev is empty this should just be NULL

OK,

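If the hook is left NULL, the core needs a generic release path. A minimal sketch, assuming the core simply falls back to `free_netdev()` when the driver provides no `free_rdma_netdev`; `demo_release_netdev()` and `demo_driver_free()` are hypothetical, and both structures are reduced user-space stand-ins that only record which path ran:

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins that only record which release path ran. */
struct net_device {
	int freed_by_driver;
	int freed_generic;
};

struct ib_device {
	/* optional: NULL when the driver has nothing of its own to undo */
	void (*free_rdma_netdev)(struct net_device *netdev);
};

/* Stand-in for the kernel's free_netdev(). */
static void free_netdev(struct net_device *dev)
{
	dev->freed_generic = 1;
}

/* Example driver hook, for a driver that does need one. */
static void demo_driver_free(struct net_device *dev)
{
	dev->freed_by_driver = 1;
}

/* Hypothetical core release path: optional hook, else generic free. */
static void demo_release_netdev(struct ib_device *hca, struct net_device *dev)
{
	if (hca->free_rdma_netdev)
		hca->free_rdma_netdev(dev);
	else
		free_netdev(dev);
}
```
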
>
>> +int mlx5_ib_dev_init(struct net_device *dev, struct ib_device *hca,
>> +                  int *qp_num)
>> +{
>> +     void *next_priv = ipoib_dev_priv(dev);
>> +     struct rdma_netdev *rn = netdev_priv(dev);
>> +     struct mlx5_ib_dev *ib_dev = to_mdev(hca);
>> +     int ret;
>> +
>> +     ret = mlx5i_attach(ib_dev->mdev, next_priv);
>> +     if (ret) {
>> +             pr_err("Failed resources allocation for device: %s ret: %d\n",
>> +                    dev->name, ret);
>> +             return ret;
>> +     }
>> +
>> +     *qp_num = rn->qp_num;
>> +
>> +     pr_debug("resources allocated for device: %s\n", dev->name);
>> +
>> +     return 0;
>> +}
>> +
>> +void mlx5_ib_dev_cleanup(struct net_device *dev, struct ib_device *hca)
>> +{
>> +     void *next_priv = ipoib_dev_priv(dev);
>> +     struct rdma_netdev *rn = netdev_priv(dev);
>> +     struct mlx5_ib_dev *ib_dev = to_mdev(hca);
>> +     struct mlx5_qp_context context;
>> +     int ret;
>> +
>> +     /* detach qp from flow-steering by reset it */
>> +     ret = mlx5_core_qp_modify(ib_dev->mdev,
>> +                               MLX5_CMD_OP_2RST_QP, 0, &context,
>> +                               (struct mlx5_core_qp *)rn->context);
>> +     if (ret)
>> +             pr_err("%s failed (ret: %d) to reset QP\n", __func__, ret);
>> +
>> +     mlx5i_detach(ib_dev->mdev, next_priv);
>> +
>> +     mlx5_ib_clean_qp(ib_dev, (struct mlx5_core_qp *)rn->context);
>> +}
>
> Why isn't this stuff in open/close?

According to the ipoib control flows, there is a difference between
open/close and init/cleanup. For example, in open/close the driver doesn't
destroy HW resources, it just changes the state; it destroys them in cleanup.

>
>> +void mlx5_ib_send(struct net_device *dev, struct sk_buff *skb,
>> +               struct ipoib_ah *address, u32 dqpn, u32 dqkey)
>> +{
>> +     void *next_priv = ipoib_dev_priv(dev);
>> +
>> +     mlx5i_xmit(skb, next_priv, &to_mah(address->ah)->av, dqpn, dqkey);
>
> How come the qkey is not available via ipoib_ah ?
>
> to_mah(address->ah)->av.key.qkey.qkey
>
> ?

It is; I will change the signature of that function accordingly.

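The reduced send signature amounts to reading the Q_Key from the address handle rather than passing it as a separate argument. The sketch below is hypothetical: `struct demo_ah` and `struct demo_ud_wqe` are illustrative types, not the mlx5 `mlx5_av` layout, and the byte-order conversion that real hardware descriptors need is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical address handle that carries the destination Q_Key. */
struct demo_ah {
	uint16_t dlid;
	uint32_t qkey;
};

/* Hypothetical descriptor that a driver would hand to hardware. */
struct demo_ud_wqe {
	uint16_t dlid;
	uint32_t dqpn;
	uint32_t qkey;
};

/* Reduced send signature: no separate dqkey argument; the Q_Key is
 * taken from the address handle, as suggested in review. */
static void demo_build_ud_wqe(struct demo_ud_wqe *wqe,
			      const struct demo_ah *ah, uint32_t dqpn)
{
	wqe->dlid = ah->dlid;
	wqe->dqpn = dqpn & 0xffffff;	/* QP numbers are 24 bits */
	wqe->qkey = ah->qkey;
}
```
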
>
>> +static const struct net_device_ops ipoib_netdev_default_pf = {
>
> That is a weird name for a mlx5 specific structure.

OK, will change that.

>
>> +     param.size_base_priv = sizeof(struct ipoib_rdma_netdev);
>
> This is really weird, the code in mlx5i_create_netdev calls
> ipoib_dev_priv so it must assume the struct is a ipoib_rdma_netdev.

It is the same approach as in vnic/hfi
(https://patchwork.kernel.org/patch/9587815/).
The lower driver allocates space for the rdma_netdev;
the only struct that is known to both layers is rdma_netdev.

>
>> +     /* set func pointers */
>> +     rn = netdev_priv(dev);
>> +     rn->qp_num = qp->qpn;
>> +     rn->context = qp;
>
> No for using context.. You need your own driver priv, like this:
>
> struct mlx5_rn_priv
> {
>     struct mlx5e_priv priv;
>     struct mlx5_core_qp *qp;
> };

OK, will try to fix it. (I have a priv which is shared with the en
driver, so I don't want to mix it with IB objects like the QP; I will find
a solution for that. Thanks.)

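One way to meet both constraints (keep the priv shared with the Ethernet driver untouched, and still drop `rn->context`) is to wrap it in a driver-owned structure and recover the wrapper with `container_of`. This is a hypothetical sketch: `demo_en_priv`, `demo_core_qp` and `demo_rn_priv` stand in for the mlx5 types, and `container_of` is re-defined here in simplified form so the example builds in user space.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-ins, not the real mlx5 structures. */
struct demo_en_priv { int state; };	/* shared with the Ethernet driver */
struct demo_core_qp { unsigned int qpn; };

/* Driver-owned wrapper: the shared priv is left unchanged, and IB-only
 * objects such as the QP live next to it instead of in rn->context. */
struct demo_rn_priv {
	struct demo_en_priv epriv;
	struct demo_core_qp *qp;
};

/* Code that only holds the shared priv can still reach the QP. */
static struct demo_core_qp *demo_priv_to_qp(struct demo_en_priv *epriv)
{
	return container_of(epriv, struct demo_rn_priv, epriv)->qp;
}
```
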
>
> Jason

* Re: [RFC v1 for accelerated IPoIB 25/25] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops
       [not found]     ` <20170314060730.GA79937-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
@ 2017-03-14 14:55       ` Erez Shitrit
  0 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-14 14:55 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky

On Tue, Mar 14, 2017 at 8:07 AM, Vishwanathapura, Niranjana
<niranjana.vishwanathapura-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> On Mon, Mar 13, 2017 at 08:31:36PM +0200, Erez Shitrit wrote:
>>
>> +int mlx5_ib_dev_init(struct net_device *dev, struct ib_device *hca,
>> +                    int *qp_num)
>> +{
>> +       void *next_priv = ipoib_dev_priv(dev);
>> +       struct rdma_netdev *rn = netdev_priv(dev);
>> +       struct mlx5_ib_dev *ib_dev = to_mdev(hca);
>> +       int ret;
>> +
>> +       ret = mlx5i_attach(ib_dev->mdev, next_priv);
>> +       if (ret) {
>> +               pr_err("Failed resources allocation for device: %s ret:
>> %d\n",
>> +                      dev->name, ret);
>> +               return ret;
>> +       }
>> +
>> +       *qp_num = rn->qp_num;
>> +
>> +       pr_debug("resources allocated for device: %s\n", dev->name);
>> +
>> +       return 0;
>> +}
>> +
>> +void mlx5_ib_dev_cleanup(struct net_device *dev, struct ib_device *hca)
>> +{
>> +       void *next_priv = ipoib_dev_priv(dev);
>> +       struct rdma_netdev *rn = netdev_priv(dev);
>> +       struct mlx5_ib_dev *ib_dev = to_mdev(hca);
>> +       struct mlx5_qp_context context;
>> +       int ret;
>> +
>> +       /* detach qp from flow-steering by reset it */
>> +       ret = mlx5_core_qp_modify(ib_dev->mdev,
>> +                                 MLX5_CMD_OP_2RST_QP, 0, &context,
>> +                                 (struct mlx5_core_qp *)rn->context);
>> +       if (ret)
>> +               pr_err("%s failed (ret: %d) to reset QP\n", __func__,
>> ret);
>> +
>> +       mlx5i_detach(ib_dev->mdev, next_priv);
>> +
>> +       mlx5_ib_clean_qp(ib_dev, (struct mlx5_core_qp *)rn->context);
>> +}
>> +
>
>
> Why can't we use ndo_init() and ndo_uninit() here (just like open and stop
> below)?
> We really don't need to pass in hca here (or in any other interface
> function), as it is already made available to the driver during
> alloc_rdma_netdev.
> Also, why is qp_num an output parameter in the init function? IPoIB can
> access rn->qp_num, which this init function is returning.
>
>> +struct net_device *mlx5_alloc_rdma_netdev(struct ib_device *hca,
>> +                                    u8 port_num,
>> +                                    enum rdma_netdev_t type,
>> +                                    const char *name,
>> +                                    unsigned char name_assign_type,
>> +                                    void (*setup)(struct net_device *))
>> +{
>
>
> Probably need to check the 'type' here as any rdma netdev client can call
> this function (with different rdma_netdev type) and cause driver to
> misbehave.

Agreed, will fix that. Thanks.

>
>> +void mlx5_free_rdma_netdev(struct net_device *netdev)
>> +{
>> +}
>
>
> Maybe it is safer and cleaner for this function to undo what alloc does here
> (instead of doing it in other places)?

Currently, I don't see a reason for that, will re-check it.

* Re: [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks
  2017-03-14  6:35     ` Vishwanathapura, Niranjana
       [not found]       ` <20170314063538.GB79937-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
@ 2017-03-14 15:44       ` Jason Gunthorpe
  1 sibling, 0 replies; 65+ messages in thread
From: Jason Gunthorpe @ 2017-03-14 15:44 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Erez Shitrit, dledford, linux-rdma, netdev, valex, leonro, saedm, erezsh

On Mon, Mar 13, 2017 at 11:35:38PM -0700, Vishwanathapura, Niranjana wrote:

> This will break ipoib on hfi1, as hfi1 will define alloc_rdma_netdev for the
> OPA_VNIC type. We should probably look for a dedicated return value
> (-ENODEV?) to determine if the driver supports the specified rdma netdev type,
> or use an ib device attribute to indicate that the driver supports the ipoib
> rdma netdev.

I think I prefer to call it and detect ENOSUPP or whatever, rather than
adding yet another flag.

Jason

* Re: [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks
  2017-03-14 14:42         ` Erez Shitrit
@ 2017-03-14 16:00           ` Jason Gunthorpe
       [not found]             ` <20170314160021.GD3244-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Gunthorpe @ 2017-03-14 16:00 UTC (permalink / raw)
  To: Erez Shitrit
  Cc: Vishwanathapura, Niranjana, Erez Shitrit, Doug Ledford,
	linux-rdma, netdev, valex, Leon Romanovsky, saedm

On Tue, Mar 14, 2017 at 04:42:55PM +0200, Erez Shitrit wrote:
> >> +       if (!hca->alloc_rdma_netdev)
> >> +               dev = ipoib_create_netdev_default(hca, name,
> >> ipoib_setup_common);
> >> +       else
> >> +               dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
> >> +                                            name, NET_NAME_UNKNOWN,
> >> +                                            ipoib_setup_common);
> >> +       if (!dev) {
> >> +               kfree(priv);
> >> +               return NULL;
> >> +       }
> >
> >
> > This will break ipoib on hfi1, as hfi1 will define alloc_rdma_netdev for the
> > OPA_VNIC type. We should probably look for a dedicated return value
> > (-ENODEV?) to determine if the driver supports the specified rdma netdev type,
> > or use an ib device attribute to indicate that the driver supports the ipoib
> > rdma netdev.
> 
> Sorry, I don't understand that. We are in the ipoib driver, so the type is
> RDMA_NETDEV_IPOIB; if hfi wants to implement it, it should use the same
> flag, and use OPA_VNIC for vnic.

He means it should look like this:

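 dev = ERR_PTR(-EOPNOTSUPP);
 if (hca->alloc_rdma_netdev)
     dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
                                  name, NET_NAME_UNKNOWN,
                                  ipoib_setup_common);

 /* a real failure from the driver is returned as-is */
 if (IS_ERR(dev) && PTR_ERR(dev) != -EOPNOTSUPP)
      goto out;

 /* the driver has no IPoIB rdma_netdev: use the default one */
 if (IS_ERR(dev))
      dev = ipoib_create_netdev_default(hca, name, ipoib_setup_common);
 if (IS_ERR(dev))
      goto out;

 WARN_ON(dev == NULL);

  [...]

out:
      return PTR_ERR(dev);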

And I'm confused why 'ipoib_create_netdev_default' doesn't need the
same function signature as hca->alloc_rdma_netdev

Both drivers should implement hca->alloc_rdma_netdev to return
ERR_PTR(-EOPNOTSUPP) if the requested RDMA_NETDEV type is not supported.

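The calling convention above can be exercised in user space. This sketch reduces `alloc_rdma_netdev` to two arguments, re-implements `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` as simplified stand-ins for `<linux/err.h>`, and uses `-EOPNOTSUPP` as the "type not supported" code (an assumption; any agreed errno would work the same way). `vnic_only_alloc()` mimics a driver such as hfi1 that implements only the OPA VNIC type, `failing_alloc()` a driver that fails for a real reason, and `ipoib_alloc_model()` is a hypothetical core-side caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Simplified user-space stand-ins for <linux/err.h>. */
#define MAX_ERRNO 4095
static void *ERR_PTR(long error) { return (void *)error; }
static long PTR_ERR(const void *ptr) { return (long)ptr; }
static int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

enum rdma_netdev_t { RDMA_NETDEV_OPA_VNIC, RDMA_NETDEV_IPOIB };

struct net_device { int is_default; };

struct ib_device {
	/* NULL if the driver offers no rdma_netdev at all */
	struct net_device *(*alloc_rdma_netdev)(struct ib_device *hca,
						enum rdma_netdev_t type);
};

static struct net_device *new_netdev(int is_default)
{
	struct net_device *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return ERR_PTR(-ENOMEM);
	dev->is_default = is_default;
	return dev;
}

/* hfi1-like driver: only the OPA VNIC type is implemented. */
static struct net_device *vnic_only_alloc(struct ib_device *hca,
					  enum rdma_netdev_t type)
{
	(void)hca;
	if (type != RDMA_NETDEV_OPA_VNIC)
		return ERR_PTR(-EOPNOTSUPP);
	return new_netdev(0);
}

/* Driver that supports the type but hits a real error. */
static struct net_device *failing_alloc(struct ib_device *hca,
					enum rdma_netdev_t type)
{
	(void)hca;
	(void)type;
	return ERR_PTR(-ENOMEM);
}

/* Core side: fall back to the default netdev only on -EOPNOTSUPP. */
static struct net_device *ipoib_alloc_model(struct ib_device *hca)
{
	struct net_device *dev = ERR_PTR(-EOPNOTSUPP);

	if (hca->alloc_rdma_netdev)
		dev = hca->alloc_rdma_netdev(hca, RDMA_NETDEV_IPOIB);

	if (IS_ERR(dev) && PTR_ERR(dev) == -EOPNOTSUPP)
		dev = new_netdev(1);	/* ipoib_create_netdev_default() */

	return dev;	/* a valid pointer or an ERR_PTR(), never NULL */
}
```
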
Jason

* Re: [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks
       [not found]         ` <CAAk-MO8-bnWRu7BDtYDhNePj+76X4Vb7gFNhED8irLNtqR7tCw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-03-14 16:06           ` Jason Gunthorpe
       [not found]             ` <20170314160616.GE3244-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Jason Gunthorpe @ 2017-03-14 16:06 UTC (permalink / raw)
  To: Erez Shitrit
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky

On Tue, Mar 14, 2017 at 04:39:20PM +0200, Erez Shitrit wrote:

> >> -static int ipoib_dev_init_default(struct net_device *dev, struct ib_device *ca,
> >> -                               int port)
> >> +static int ipoib_dev_init_default(struct net_device *dev,
> >> +                               struct ib_device *hca, int *qp_num)
> >>  {
> >> -     struct ipoib_dev_priv *priv = netdev_priv(dev);
> >> +     struct ipoib_dev_priv *priv = ipoib_priv(dev);
> >> +
> >> +     netif_napi_add(dev, &priv->napi, ipoib_poll, NAPI_POLL_WEIGHT);
> >
> > All these 'default' functions are part of the 'rn driver'. They should
> > not be calling ipoib_priv, you said you didn't want ipoib_dev_priv
> > leaking into the drivers.
> 
> Do you mean the name of the function, i.e. rdma_netdev_init_default
> instead of ipoib_dev_init_default?

No, I mean ipoib_priv() should only be called by IPOIB core code and
ipoib_dev_priv() should only be called by code under the rdma_netdev
interface.

The names you picked are confusing.

This is the core code priv:

 struct ipoib_dev_priv *priv = ipoib_priv(dev);

And this is the rdma_netdev priv:

  struct ipoib_rn_default *priv = ipoib_dev_priv(dev);
  struct ipoib_rn_mlx5 *priv = ipoib_dev_priv(dev);

Rename ipoib_dev_priv() to something that doesn't collide with the
structure.

Move things like priv->napi into ipoib_rn_default, only the stuff
under rdma_netdev should touch NAPI.

Try hard not to use 'struct ipoib_dev_priv' inside the callbacks
related to the default rn netdev implementation.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 25/25] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops
  2017-03-14 14:53         ` Erez Shitrit
@ 2017-03-14 16:10           ` Jason Gunthorpe
  2017-03-14 16:37             ` Erez Shitrit
       [not found]             ` <20170314161013.GF3244-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 2 replies; 65+ messages in thread
From: Jason Gunthorpe @ 2017-03-14 16:10 UTC (permalink / raw)
  To: Erez Shitrit
  Cc: Erez Shitrit, Doug Ledford, linux-rdma, netdev, valex,
	Leon Romanovsky, saedm

On Tue, Mar 14, 2017 at 04:53:24PM +0200, Erez Shitrit wrote:

> > Why isn't this stuff in open/close?
> 
> According to the ipoib control flows, there is a difference between
> open/close and init/cleanup. For example, in open/close the driver
> doesn't destroy HW resources, it just changes the state; it destroys
> them in cleanup.

So put it in mlx5_alloc_rdma_netdev then?

Or ndo.init as was suggested?

Or in the void (*setup)(struct net_device *)
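A minimal userspace sketch of the lifecycle split under discussion, assuming HW resources are created in an ndo_init-style hook and destroyed only in an ndo_uninit-style hook, while open/close merely flip state. The struct and function names are hypothetical; only the placement of allocation and teardown is the point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of where HW resources live in the netdev lifecycle. */
struct hw_res { int qpn; };

struct rn_priv {
	struct hw_res *res;  /* created in init, destroyed in uninit */
	bool up;             /* toggled by open/close only */
};

/* ndo_init-style hook: allocate HW resources (QPs, CQs in a real driver). */
static int rn_init(struct rn_priv *p)
{
	p->res = calloc(1, sizeof(*p->res));
	if (!p->res)
		return -1;  /* would be -ENOMEM in the kernel */
	p->res->qpn = 1;
	return 0;
}

/* open/close only change state; they never free HW resources. */
static int rn_open(struct rn_priv *p)
{
	assert(p->res != NULL);
	p->up = true;
	return 0;
}

static int rn_close(struct rn_priv *p)
{
	p->up = false;
	return 0;
}

/* ndo_uninit-style hook: the only place HW resources are destroyed. */
static void rn_uninit(struct rn_priv *p)
{
	free(p->res);
	p->res = NULL;
}
```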

> >> +     param.size_base_priv = sizeof(struct ipoib_rdma_netdev);
> >
> > This is really weird, the code in mlx5i_create_netdev calls
> > ipoib_dev_priv so it must assume the struct is a ipoib_rdma_netdev.
> 
> It is the same attitude as in the vnic/hfi
> (https://patchwork.kernel.org/patch/9587815/)

Not quite: they call alloc_netdev_mqs() directly, whereas here the
call goes indirectly through mlx5i_create_netdev, which assumes a priv
layout. Just drop param.size_base_priv and put that same calculation
in mlx5i_create_netdev.
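One way to read the suggestion: let the function that assumes the priv layout also compute its size, instead of receiving param.size_base_priv from the caller. The sketch below is a userspace model; struct mlx5i_priv, its field, ALIGN_UP and the helper names are hypothetical, and calloc() stands in for allocating the real netdev.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real structures. */
struct ipoib_rdma_netdev { void *hca; int port; };
struct mlx5i_priv { unsigned int qpn; };

/* Round x up to a multiple of a (a > 0). */
#define ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

/* Offset of the mlx5i part, decided where the layout is assumed. */
#define MLX5I_PRIV_OFFSET \
	ALIGN_UP(sizeof(struct ipoib_rdma_netdev), sizeof(void *))

static size_t mlx5i_priv_size(void)
{
	return MLX5I_PRIV_OFFSET + sizeof(struct mlx5i_priv);
}

static struct mlx5i_priv *mlx5i_priv_of(void *priv)
{
	assert(priv != NULL);
	return (struct mlx5i_priv *)((char *)priv + MLX5I_PRIV_OFFSET);
}

/* Model of mlx5i_create_netdev(): no size_base_priv from the caller. */
static void *mlx5i_create_netdev_priv(void)
{
	return calloc(1, mlx5i_priv_size());
}
```

Keeping the size and the offset in the same place means the two cannot drift apart when either structure changes.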

Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
  2017-03-14  7:01       ` Vishwanathapura, Niranjana
  2017-03-14 13:25         ` Erez Shitrit
@ 2017-03-14 16:11         ` Jason Gunthorpe
       [not found]           ` <20170314161149.GA15752-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 65+ messages in thread
From: Jason Gunthorpe @ 2017-03-14 16:11 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Erez Shitrit, dledford, linux-rdma, netdev, valex, leonro, saedm, erezsh

On Tue, Mar 14, 2017 at 12:01:09AM -0700, Vishwanathapura, Niranjana wrote:
> On Mon, Mar 13, 2017 at 02:01:36PM -0600, Jason Gunthorpe wrote:
> >>+	/* multicast */
> >>+	int (*attach_mcast)(struct net_device *dev, struct ib_device *hca,
> >>+			    union ib_gid *gid, u16 lid, int set_qkey);
> >>+	int (*detach_mcast)(struct net_device *dev, struct ib_device *hca,
> >>+			    union ib_gid *gid, u16 lid);
> >
> >It would make more sense to store the struct ib_device pointer in the
> >struct rdma_netdev.
> >
> 
> Agreed that it shouldn't be a function parameter.
> For opa_vnic, I found it convenient to store the ib_device pointer in the client and
> device private structures, as those will be available in most places anyhow.

If vnic uses it too, then let's add the ib_device and port num to
rdma_netdev itself?
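A sketch of rdma_netdev carrying the ib_device pointer and port number, so that the multicast callbacks drop their hca parameter and read it from the bound structure instead. This is a userspace model; struct rdma_netdev_model, the demo callbacks and the simplified union ib_gid are hypothetical stand-ins for the kernel types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct ib_device { const char *name; int mcast_refs; };
union ib_gid { uint8_t raw[16]; };
struct net_device;

struct rdma_netdev_model {
	struct ib_device *hca;   /* bound once, when the netdev is allocated */
	uint8_t port_num;
	/* No ib_device argument: implementations read it from here. */
	int (*attach_mcast)(struct net_device *dev, union ib_gid *gid,
			    uint16_t lid, int set_qkey);
	int (*detach_mcast)(struct net_device *dev, union ib_gid *gid,
			    uint16_t lid);
};

struct net_device { struct rdma_netdev_model rn; };

static int demo_attach_mcast(struct net_device *dev, union ib_gid *gid,
			     uint16_t lid, int set_qkey)
{
	struct ib_device *hca = dev->rn.hca;

	(void)gid; (void)lid; (void)set_qkey;
	hca->mcast_refs++;   /* a real driver would program the HW here */
	return 0;
}

static int demo_detach_mcast(struct net_device *dev, union ib_gid *gid,
			     uint16_t lid)
{
	struct ib_device *hca = dev->rn.hca;

	(void)gid; (void)lid;
	assert(hca->mcast_refs > 0);
	hca->mcast_refs--;
	return 0;
}
```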

Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 25/25] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops
  2017-03-14 16:10           ` Jason Gunthorpe
@ 2017-03-14 16:37             ` Erez Shitrit
       [not found]             ` <20170314161013.GF3244-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-14 16:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Erez Shitrit, Doug Ledford, linux-rdma, netdev, valex,
	Leon Romanovsky, saedm

On Tue, Mar 14, 2017 at 6:10 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Tue, Mar 14, 2017 at 04:53:24PM +0200, Erez Shitrit wrote:
>
>> > Why isn't this stuff in open/close?
>>
>> According to the ipoib control flows, there is a difference between
>> open/close and init/cleanup. For example, in open/close the driver
>> doesn't destroy HW resources, it just changes the state; it destroys
>> them in cleanup.
>
> So put it in mlx5_alloc_rdma_netdev then?
>
> Or ndo.init as was suggested?

I can do that. As I said in reply to your previous suggestion, I will add the
ib_device to the rdma_netdev and use ndo.init.

>
> Or in the void (*setup)(struct net_device *)
>
>> >> +     param.size_base_priv = sizeof(struct ipoib_rdma_netdev);
>> >
>> > This is really weird, the code in mlx5i_create_netdev calls
>> > ipoib_dev_priv so it must assume the struct is a ipoib_rdma_netdev.
>>
>> It is the same attitude as in the vnic/hfi
>> (https://patchwork.kernel.org/patch/9587815/)
>
> Not quite: they call alloc_netdev_mqs() directly, whereas here the
> call goes indirectly through mlx5i_create_netdev, which assumes a priv
> layout. Just drop param.size_base_priv and put that same calculation
> in mlx5i_create_netdev.

We are sharing two drivers as the low-level driver; anyway, I will find
a way to do that.

>
> Jason
>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
       [not found]           ` <CAAk-MO8dZJKYgpWT8p1fz0O6y2wTx4jxoJhYf21ikMOef_xUMg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-03-14 19:03             ` Vishwanathapura, Niranjana
       [not found]               ` <20170314190300.GA80705-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Vishwanathapura, Niranjana @ 2017-03-14 19:03 UTC (permalink / raw)
  To: Erez Shitrit
  Cc: Jason Gunthorpe, Erez Shitrit, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	Leon Romanovsky

On Tue, Mar 14, 2017 at 03:25:01PM +0200, Erez Shitrit wrote:
>On Mon, Mar 13, 2017 at 10:01 PM, Jason Gunthorpe
><jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> On Mon, Mar 13, 2017 at 08:31:15PM +0200, Erez Shitrit wrote:
>>
>>> diff --git a/include/rdma/ib_ipoib_accel_ops.h b/include/rdma/ib_ipoib_accel_ops.h
>>> new file mode 100644
>>> index 000000000000..148a5529a559
>>> +++ b/include/rdma/ib_ipoib_accel_ops.h
>>
>> Both patches need a better naming scheme for this file..
>>
>> rn_opa_vnic.h
>> rn_ipoib.h
>>
>> Maybe?
>
>Can work for me.
>
>vnic?
>

I do see why it is an issue for ipoib (we don't want two ipoib.h files). We can 
add rn_ prefix for vnic, but probably opa_vnic.h is good enough.

>>> +     void *context;
>>
>> What is this? Why is something other than ipoib_priv or ipoib_dev_priv
>> needed?
>
>ipoib_priv is not known in the lower layers; that context is used
>to keep data (the QP pointer in this case) that is not available
>otherwise.
>

Lower layers should be using ipoib_dev_priv(), and no other context is needed.
Which patch is using this 'context'?

Niranjana

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
       [not found]               ` <20170314190300.GA80705-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
@ 2017-03-15  6:27                 ` Leon Romanovsky
       [not found]                   ` <20170315062728.GB2079-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Leon Romanovsky @ 2017-03-15  6:27 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Erez Shitrit, Jason Gunthorpe, Erez Shitrit, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: Type: text/plain, Size: 990 bytes --]

On Tue, Mar 14, 2017 at 12:03:01PM -0700, Vishwanathapura, Niranjana wrote:
> On Tue, Mar 14, 2017 at 03:25:01PM +0200, Erez Shitrit wrote:
> > On Mon, Mar 13, 2017 at 10:01 PM, Jason Gunthorpe
> > <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> > > On Mon, Mar 13, 2017 at 08:31:15PM +0200, Erez Shitrit wrote:
> > >
> > > > diff --git a/include/rdma/ib_ipoib_accel_ops.h b/include/rdma/ib_ipoib_accel_ops.h
> > > > new file mode 100644
> > > > index 000000000000..148a5529a559
> > > > +++ b/include/rdma/ib_ipoib_accel_ops.h
> > >
> > > Both patches need a better naming scheme for this file..
> > >
> > > rn_opa_vnic.h
> > > rn_ipoib.h
> > >
> > > Maybe?
> >
> > Can work for me.
> >
> > vnic?
> >
>
> I do see why it is an issue for ipoib (we don't want two ipoib.h files). We
> can add rn_ prefix for vnic, but probably opa_vnic.h is good enough.

Just for the consistency, I would like to see the same "rn_" prefix
for both files (ipoib and opa_vnic).

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
       [not found]           ` <20170314161149.GA15752-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-03-15  6:30               ` Leon Romanovsky
  0 siblings, 0 replies; 65+ messages in thread
From: Leon Romanovsky @ 2017-03-15  6:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Vishwanathapura, Niranjana, Erez Shitrit,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, saedm-VPRAkNaXOzVWk0Htik3J/w,
	erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

[-- Attachment #1: Type: text/plain, Size: 1316 bytes --]

On Tue, Mar 14, 2017 at 10:11:49AM -0600, Jason Gunthorpe wrote:
> On Tue, Mar 14, 2017 at 12:01:09AM -0700, Vishwanathapura, Niranjana wrote:
> > On Mon, Mar 13, 2017 at 02:01:36PM -0600, Jason Gunthorpe wrote:
> > >>+	/* multicast */
> > >>+	int (*attach_mcast)(struct net_device *dev, struct ib_device *hca,
> > >>+			    union ib_gid *gid, u16 lid, int set_qkey);
> > >>+	int (*detach_mcast)(struct net_device *dev, struct ib_device *hca,
> > >>+			    union ib_gid *gid, u16 lid);
> > >
> > >It would make more sense to store the struct ib_device pointer in the
> > >struct rdma_netdev.
> > >
> >
> > Agreed that it shouldn't be a function parameter.
> > For opa_vnic, I found it convenient to store the ib_device pointer in the client and
> > device private structures, as those will be available in most places anyhow.
>
> If vnic uses it too, then let's add the ib_device and port num to
> rdma_netdev itself?

Agreed. In the end, this rdma_netdev is intended for drivers/infiniband,
and it is better to have this binding (rdma_netdev and ib_device) as early as possible.

Thanks

>
> Jason

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks
       [not found]             ` <20170314160021.GD3244-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-03-15  6:47                 ` Leon Romanovsky
  2017-03-15 10:11               ` Erez Shitrit
  1 sibling, 0 replies; 65+ messages in thread
From: Leon Romanovsky @ 2017-03-15  6:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Erez Shitrit, Vishwanathapura, Niranjana, Erez Shitrit,
	Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	saedm-VPRAkNaXOzVWk0Htik3J/w

[-- Attachment #1: Type: text/plain, Size: 2149 bytes --]

On Tue, Mar 14, 2017 at 10:00:21AM -0600, Jason Gunthorpe wrote:
> On Tue, Mar 14, 2017 at 04:42:55PM +0200, Erez Shitrit wrote:
> > >> +       if (!hca->alloc_rdma_netdev)
> > >> +               dev = ipoib_create_netdev_default(hca, name,
> > >> ipoib_setup_common);
> > >> +       else
> > >> +               dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
> > >> +                                            name, NET_NAME_UNKNOWN,
> > >> +                                            ipoib_setup_common);
> > >> +       if (!dev) {
> > >> +               kfree(priv);
> > >> +               return NULL;
> > >> +       }
> > >
> > >
> > > This will break ipoib on hfi1 as hfi1 will define alloc_rdma_netdev for
> > > the OPA_VNIC type. We should probably look for a dedicated return type
> > > (-ENODEV?) to determine if the driver supports the specified rdma netdev type,
> > > or use an IB device attribute to indicate that the driver supports the ipoib rdma netdev.
> >
> > Sorry, I don't understand that. We are in the ipoib driver, so the type is
> > RDMA_NETDEV_IPOIB; if hfi wants to implement it, it should use the same
> > flag, and use OPA_VNIC for vnic.
>
> He means it should look like this:
>
>  if (hca->alloc_rdma_netdev)
>      dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
>                                             name, NET_NAME_UNKNOWN,
>                                             ipoib_setup_common);
>
>  if (IS_ERR(dev) && PTR_ERR(dev) != ENOTSUP)
>       goto out;
>
>  dev = ipoib_create_netdev_default(hca, name, ipoib_setup_common);
>  if (IS_ERR(dev))
>       goto out;
>
>  WARN_ON(dev == NULL);
>
>   [...]
>
> out:
>       return PTR_ERR(dev);
>
> And I'm confused why 'ipoib_create_netdev_default' doesn't need the
> same function signature as hca->alloc_rdma_netdev

And now I'm confused.
In your proposal, "dev" will be overwritten; in Erez's proposal,
"dev" will be one of two: the default or the device-specific one.

>
> Both drivers should implement hca->alloc_rdma_netdev to return ENOTSUP
> if RDMA_NETDEV is not supported.

Did you mean "if RDMA_NETDEV of specific type is not supported"?
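On the driver side, "not supported" can be reported per rdma_netdev type. A userspace sketch, assuming -EOPNOTSUPP as the agreed code (the thread writes ENOTSUP) and using simplified copies of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers; hfi1_like_alloc_rdma_netdev() is a hypothetical driver that only implements the OPA_VNIC type, and the enum mirrors the type names used in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace copies of the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

enum rdma_netdev_t { RDMA_NETDEV_OPA_VNIC, RDMA_NETDEV_IPOIB };

struct net_device { enum rdma_netdev_t type; };

/* Hypothetical hfi1-like driver: implements only OPA_VNIC, so any other
 * type is reported as "not supported" rather than as a hard failure. */
static struct net_device *hfi1_like_alloc_rdma_netdev(enum rdma_netdev_t type)
{
	struct net_device *dev;

	switch (type) {
	case RDMA_NETDEV_OPA_VNIC:
		dev = calloc(1, sizeof(*dev));
		if (!dev)
			return ERR_PTR(-ENOMEM);
		dev->type = type;
		return dev;
	default:
		return ERR_PTR(-EOPNOTSUPP);
	}
}
```

With this convention, an ipoib caller can tell "this driver does not offer an IPoIB netdev" apart from a genuine allocation failure.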

Thanks

>
> Jason

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 25/25] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops
       [not found]             ` <20170314161013.GF3244-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-03-15  8:42               ` Erez Shitrit
  0 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-15  8:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky

On Tue, Mar 14, 2017 at 6:10 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Tue, Mar 14, 2017 at 04:53:24PM +0200, Erez Shitrit wrote:
>
>> > Why isn't this stuff in open/close?
>>
>> According to ipoib control flows, there is a different between
>> open/close to init/cleanup for example, in open/close the driver
>> doesn't destroy hw resources, just change the state, it destroys
>> them in cleanup.
>
> So put it in mlx5_alloc_rdma_netdev then?
>
> Or ndo.init as was suggested?
>
> Or in the void (*setup)(struct net_device *)
>
>> >> +     param.size_base_priv = sizeof(struct ipoib_rdma_netdev);
>> >
>> > This is really weird, the code in mlx5i_create_netdev calls
>> > ipoib_dev_priv so it must assume the struct is a ipoib_rdma_netdev.
>>
>> It is the same attitude as in the vnic/hfi
>> (https://patchwork.kernel.org/patch/9587815/)
>
> Not quite: they call alloc_netdev_mqs() directly, whereas here the
> call goes indirectly through mlx5i_create_netdev, which assumes a priv
> layout. Just drop param.size_base_priv and put that same calculation
> in mlx5i_create_netdev.

Agreed, will fix that. Thanks.

>
> Jason
>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks
       [not found]             ` <20170314160021.GD3244-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2017-03-15  6:47                 ` Leon Romanovsky
@ 2017-03-15 10:11               ` Erez Shitrit
  1 sibling, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-15 10:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Vishwanathapura, Niranjana, Erez Shitrit, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	Leon Romanovsky

On Tue, Mar 14, 2017 at 6:00 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Tue, Mar 14, 2017 at 04:42:55PM +0200, Erez Shitrit wrote:
>> >> +       if (!hca->alloc_rdma_netdev)
>> >> +               dev = ipoib_create_netdev_default(hca, name,
>> >> ipoib_setup_common);
>> >> +       else
>> >> +               dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
>> >> +                                            name, NET_NAME_UNKNOWN,
>> >> +                                            ipoib_setup_common);
>> >> +       if (!dev) {
>> >> +               kfree(priv);
>> >> +               return NULL;
>> >> +       }
>> >
>> >
>> > This will break ipoib on hfi1 as hfi1 will define alloc_rdma_netdev for
>> > the OPA_VNIC type. We should probably look for a dedicated return type
>> > (-ENODEV?) to determine if the driver supports the specified rdma netdev type,
>> > or use an IB device attribute to indicate that the driver supports the ipoib rdma netdev.
>>
>> Sorry, I don't understand that. We are in the ipoib driver, so the type is
>> RDMA_NETDEV_IPOIB; if hfi wants to implement it, it should use the same
>> flag, and use OPA_VNIC for vnic.
>
> He means it should look like this:
>
>  if (hca->alloc_rdma_netdev)
>      dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
>                                             name, NET_NAME_UNKNOWN,
>                                             ipoib_setup_common);
>
>  if (IS_ERR(dev) && PTR_ERR(dev) != ENOTSUP)
>       goto out;
>
>  dev = ipoib_create_netdev_default(hca, name, ipoib_setup_common);
>  if (IS_ERR(dev))
>       goto out;
>
>  WARN_ON(dev == NULL);
>
>   [...]
>
> out:
>       return PTR_ERR(dev);

OK, got it. I will arrange the code accordingly. Thank you both.

>
> And I'm confused why 'ipoib_create_netdev_default' doesn't need the
> same function signature as hca->alloc_rdma_netdev

I can pass both of them the same parameters, just to be consistent
(the default doesn't need the RDMA_NETDEV_IPOIB type or the port).

>
> Both drivers should implement hca->alloc_rdma_netdev to return ENOTSUP

Does "both" mean vnic and ipoib?

> if RDMA_NETDEV is not supported.
>
> Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks
  2017-03-15  6:47                 ` Leon Romanovsky
  (?)
@ 2017-03-15 15:58                 ` Jason Gunthorpe
       [not found]                   ` <20170315155802.GB29562-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  -1 siblings, 1 reply; 65+ messages in thread
From: Jason Gunthorpe @ 2017-03-15 15:58 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Erez Shitrit, Vishwanathapura, Niranjana, Erez Shitrit,
	Doug Ledford, linux-rdma, netdev, valex, saedm

On Wed, Mar 15, 2017 at 08:47:51AM +0200, Leon Romanovsky wrote:
> On Tue, Mar 14, 2017 at 10:00:21AM -0600, Jason Gunthorpe wrote:
> > On Tue, Mar 14, 2017 at 04:42:55PM +0200, Erez Shitrit wrote:
> > > >> +       if (!hca->alloc_rdma_netdev)
> > > >> +               dev = ipoib_create_netdev_default(hca, name,
> > > >> ipoib_setup_common);
> > > >> +       else
> > > >> +               dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
> > > >> +                                            name, NET_NAME_UNKNOWN,
> > > >> +                                            ipoib_setup_common);
> > > >> +       if (!dev) {
> > > >> +               kfree(priv);
> > > >> +               return NULL;
> > > >> +       }
> > > >
> > > >
> > > > This will break ipoib on hfi1 as hfi1 will define alloc_rdma_netdev for
> > > > the OPA_VNIC type. We should probably look for a dedicated return type
> > > > (-ENODEV?) to determine if the driver supports the specified rdma netdev type,
> > > > or use an IB device attribute to indicate that the driver supports the ipoib rdma netdev.
> > >
> > > Sorry, I don't understand that. We are in the ipoib driver, so the type is
> > > RDMA_NETDEV_IPOIB; if hfi wants to implement it, it should use the same
> > > flag, and use OPA_VNIC for vnic.
> >
> > He means it should look like this:
> >
> >  if (hca->alloc_rdma_netdev)
> >      dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
> >                                             name, NET_NAME_UNKNOWN,
> >                                             ipoib_setup_common);
> >
> >  if (IS_ERR(dev) && PTR_ERR(dev) != ENOTSUP)
> >       goto out;
> >
> >  dev = ipoib_create_netdev_default(hca, name, ipoib_setup_common);
> >  if (IS_ERR(dev))
> >       goto out;
> >
> >  WARN_ON(dev == NULL);
> >
> >   [...]
> >
> > out:
> >       return PTR_ERR(dev);
> >
> > And I'm confused why 'ipoib_create_netdev_default' doesn't need the
> > same function signature as hca->alloc_rdma_netdev
> 
> And now I'm confused.
> In your proposal, "dev" will be overwritten; in Erez's proposal,
> "dev" will be one of two: the default or the device-specific one.

Well, not Erez's version allowed dev to be ERR_PTR too. More like this then

struct net_device *get_netdev(..)
{
   struct net_device *dev;

   if (hca->alloc_rdma_netdev) {
       dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
                                    name, NET_NAME_UNKNOWN,
                                    ipoib_setup_common);

       /* PTR_ERR() is negative: return the device or a real error,
        * and fall back to the default only on -EOPNOTSUPP. */
       if (!IS_ERR(dev) || PTR_ERR(dev) != -EOPNOTSUPP)
           return dev;
   }

   return ipoib_create_netdev_default(hca, name, ipoib_setup_common);
}
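The fallback in get_netdev() can be exercised in userspace. This model reuses simplified ERR_PTR()/IS_ERR()/PTR_ERR() helpers, assumes -EOPNOTSUPP as the "not supported" code, gives the default constructor the same (hypothetical, reduced) signature as the driver hook, and checks four cases: no hook, -EOPNOTSUPP, a real error, and a successful accelerated netdev. All names other than the ERR_PTR helpers are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace copies of the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct net_device { int is_default; };

/* Hypothetical shared signature, so the default fits the same slot
 * as a driver's alloc_rdma_netdev (the default ignores the port). */
typedef struct net_device *(*alloc_fn)(int port);

struct ib_device { alloc_fn alloc_rdma_netdev; /* may be NULL */ };

static struct net_device *create_default(int port)
{
	struct net_device *dev = calloc(1, sizeof(*dev));

	(void)port;
	if (!dev)
		return ERR_PTR(-ENOMEM);
	dev->is_default = 1;
	return dev;
}

static struct net_device *get_netdev(struct ib_device *hca, int port)
{
	if (hca->alloc_rdma_netdev) {
		struct net_device *dev = hca->alloc_rdma_netdev(port);

		/* Success or a real error: hand it back unchanged. */
		if (!IS_ERR(dev) || PTR_ERR(dev) != -EOPNOTSUPP)
			return dev;
		/* -EOPNOTSUPP: the driver lacks this type, use the default. */
	}
	return create_default(port);
}

/* Mock drivers for the usage example. */
static struct net_device *drv_unsupported(int port) { (void)port; return ERR_PTR(-EOPNOTSUPP); }
static struct net_device *drv_broken(int port) { (void)port; return ERR_PTR(-ENOMEM); }
static struct net_device *drv_accel(int port) { (void)port; return calloc(1, sizeof(struct net_device)); }
```

A real error such as -ENOMEM is propagated instead of being masked by the default, and the default is only built when no driver netdev was produced, so nothing is overwritten.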

Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks
       [not found]             ` <20170314160616.GE3244-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-03-15 16:27               ` Erez Shitrit
       [not found]                 ` <CAAk-MO96BbTnn=gzdZhRpQUENn=AVUJv1JhPbrSYrNHOrbFVeg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Erez Shitrit @ 2017-03-15 16:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky,
	Niranjana Vishwanathapura

On Tue, Mar 14, 2017 at 6:06 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Tue, Mar 14, 2017 at 04:39:20PM +0200, Erez Shitrit wrote:
>
>> >> -static int ipoib_dev_init_default(struct net_device *dev, struct ib_device *ca,
>> >> -                               int port)
>> >> +static int ipoib_dev_init_default(struct net_device *dev,
>> >> +                               struct ib_device *hca, int *qp_num)
>> >>  {
>> >> -     struct ipoib_dev_priv *priv = netdev_priv(dev);
>> >> +     struct ipoib_dev_priv *priv = ipoib_priv(dev);
>> >> +
>> >> +     netif_napi_add(dev, &priv->napi, ipoib_poll, NAPI_POLL_WEIGHT);
>> >
>> > All these 'default' functions are part of the 'rn driver'. They should
>> > not be calling ipoib_priv, you said you didn't want ipoib_dev_priv
>> > leaking into the drivers.
>>
>> Do you mean in the name of the function? rdma_netdev_init_default
>> instead of ipoib_dev_init_default?
>
> No, I mean ipoib_priv() should only be called by IPOIB core code and
> ipoib_dev_priv() should only be called by code under the rdma_netdev
> interface.
>
> The names you picked are confusing.
>
> This is the core code priv:
>
>  struct ipoib_dev_priv *priv = ipoib_priv(dev);
>
> And this is the rdma_netdev priv:
>
>   struct ipoib_rn_default *priv = ipoib_dev_priv(dev)
>   struct ipoib_rn_mlx5 *priv = ipoib_dev_priv(dev)
>
> Rename ipoib_dev_priv() to something that doesn't collide with the
> structure.

OK, will rename the function ipoib_dev_priv to something like ipoib_accel_priv.

>
> Move things like priv->napi into ipoib_rn_default, only the stuff
> under rdma_netdev should touch NAPI.

When no accelerator is present, ipoib should work as before, and
struct ipoib_dev_priv is the one that handles all of the data path and
control.

>
> Try hard not to use 'struct ipoib_dev_priv' inside the callbacks
> related to the default rn netdev implementation.

OK

>
> Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
       [not found]               ` <20170315063043.GC2079-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-03-15 16:40                 ` Vishwanathapura, Niranjana
       [not found]                   ` <20170315164050.GA81782-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
  0 siblings, 1 reply; 65+ messages in thread
From: Vishwanathapura, Niranjana @ 2017-03-15 16:40 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Erez Shitrit, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, saedm-VPRAkNaXOzVWk0Htik3J/w,
	erezsh-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb

On Wed, Mar 15, 2017 at 08:30:43AM +0200, Leon Romanovsky wrote:
>On Tue, Mar 14, 2017 at 10:11:49AM -0600, Jason Gunthorpe wrote:
>> On Tue, Mar 14, 2017 at 12:01:09AM -0700, Vishwanathapura, Niranjana wrote:
>> > On Mon, Mar 13, 2017 at 02:01:36PM -0600, Jason Gunthorpe wrote:
>> > >>+	/* multicast */
>> > >>+	int (*attach_mcast)(struct net_device *dev, struct ib_device *hca,
>> > >>+			    union ib_gid *gid, u16 lid, int set_qkey);
>> > >>+	int (*detach_mcast)(struct net_device *dev, struct ib_device *hca,
>> > >>+			    union ib_gid *gid, u16 lid);
>> > >
>> > >It would make more sense to store the struct ib_device pointer in the
>> > >struct rdma_netdev.
>> > >
>> >
>> > Agreed that it shouldn't be a function parameter.
>> > For opa_vnic, I found it convenient to store the ib_device pointer in the client and
>> > device private structures, as those will be available in most places anyhow.
>>
>> If vnic uses it too, then let's add the ib_device and port num to
>> rdma_netdev itself?
>
>Agree, at the end this rdma_netdev is intended for the drivers/infiniband
>and it is better to have this binding (rdma_netdev and ib_device) as early as possible.
>

I agree with adding ibdev and port num to rdma_netdev.

Niranjana

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
       [not found]                   ` <20170315062728.GB2079-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-03-15 16:42                     ` Vishwanathapura, Niranjana
  0 siblings, 0 replies; 65+ messages in thread
From: Vishwanathapura, Niranjana @ 2017-03-15 16:42 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Erez Shitrit, Jason Gunthorpe, Erez Shitrit, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w

On Wed, Mar 15, 2017 at 08:27:28AM +0200, Leon Romanovsky wrote:
>On Tue, Mar 14, 2017 at 12:03:01PM -0700, Vishwanathapura, Niranjana wrote:
>> On Tue, Mar 14, 2017 at 03:25:01PM +0200, Erez Shitrit wrote:
>> > On Mon, Mar 13, 2017 at 10:01 PM, Jason Gunthorpe
>> > <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> > > On Mon, Mar 13, 2017 at 08:31:15PM +0200, Erez Shitrit wrote:
>> > >
>> > > > diff --git a/include/rdma/ib_ipoib_accel_ops.h b/include/rdma/ib_ipoib_accel_ops.h
>> > > > new file mode 100644
>> > > > index 000000000000..148a5529a559
>> > > > +++ b/include/rdma/ib_ipoib_accel_ops.h
>> > >
>> > > Both patches need a better naming scheme for this file..
>> > >
>> > > rn_opa_vnic.h
>> > > rn_ipoib.h
>> > >
>> > > Maybe?
>> >
>> > Can work for me.
>> >
>> > vnic?
>> >
>>
>> I do see why it is an issue for ipoib (we don't want two ipoib.h files). We
>> can add rn_ prefix for vnic, but probably opa_vnic.h is good enough.
>
>Just for the consistency, I would like to see the same "rn_" prefix
>for both files (ipoib and opa_vnic).
>
>Thanks

Fine with VNIC.

Niranjana

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks
       [not found]                 ` <CAAk-MO96BbTnn=gzdZhRpQUENn=AVUJv1JhPbrSYrNHOrbFVeg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-03-15 16:46                   ` Jason Gunthorpe
  0 siblings, 0 replies; 65+ messages in thread
From: Jason Gunthorpe @ 2017-03-15 16:46 UTC (permalink / raw)
  To: Erez Shitrit
  Cc: Erez Shitrit, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	valex-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky,
	Niranjana Vishwanathapura

On Wed, Mar 15, 2017 at 06:27:23PM +0200, Erez Shitrit wrote:

> > Move things like priv->napi into ipoib_rn_default, only the stuff
> > under rdma_netdev should touch NAPI.
> 
> When no accelerator is around, ipoib should work as before, and the
> struct ipoib_dev_priv is the one that does all datapath and
> control.

The point is to make the code maintainable, by moving everything that
is only used by *_default out of ipoib_dev_priv we can use the compiler
to prove it isn't being used improperly by the core code.

Jason
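A minimal sketch of the separation described here, assuming the default (non-accelerated) data path embeds `struct rdma_netdev` in a larger `ipoib_rn_default` container. The member lists and the `to_rn_default()` / `ipoib_napi_init_default()` helpers are hypothetical. Because `struct ipoib_dev_priv` has no `napi` member in this layout, any core code that writes `priv->napi` stops compiling, which is the compiler-enforced check being asked for.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types and helpers. */
struct napi_struct { int weight; };
struct rdma_netdev { void *clnt_priv; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Core IPoIB state: control-plane data only. There is deliberately
 * no napi member, so core code touching priv->napi fails to build.
 */
struct ipoib_dev_priv {
	int mcast_mtu;
};

/*
 * Hypothetical container for the built-in data path: everything that
 * only the *_default code uses lives here.
 */
struct ipoib_rn_default {
	struct rdma_netdev rn;		/* the part the core code sees */
	struct napi_struct napi;	/* touched only by *_default code */
};

static struct ipoib_rn_default *to_rn_default(struct rdma_netdev *rn)
{
	assert(rn != NULL);
	return container_of(rn, struct ipoib_rn_default, rn);
}

/* Only the default data path can reach NAPI, via the container. */
static void ipoib_napi_init_default(struct rdma_netdev *rn, int weight)
{
	to_rn_default(rn)->napi.weight = weight;
}
```

An accelerated provider would supply its own container around `struct rdma_netdev` and never see `ipoib_rn_default` at all.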

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
       [not found]                   ` <20170315164050.GA81782-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
@ 2017-03-15 16:52                     ` Erez Shitrit
  0 siblings, 0 replies; 65+ messages in thread
From: Erez Shitrit @ 2017-03-15 16:52 UTC (permalink / raw)
  To: Vishwanathapura, Niranjana
  Cc: Leon Romanovsky, Jason Gunthorpe, Erez Shitrit, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w

On Wed, Mar 15, 2017 at 6:40 PM, Vishwanathapura, Niranjana
<niranjana.vishwanathapura-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> On Wed, Mar 15, 2017 at 08:30:43AM +0200, Leon Romanovsky wrote:
>>
>> On Tue, Mar 14, 2017 at 10:11:49AM -0600, Jason Gunthorpe wrote:
>>>
>>> On Tue, Mar 14, 2017 at 12:01:09AM -0700, Vishwanathapura, Niranjana
>>> wrote:
>>> > On Mon, Mar 13, 2017 at 02:01:36PM -0600, Jason Gunthorpe wrote:
>>> > >>+   /* multicast */
>>> > >>+   int (*attach_mcast)(struct net_device *dev, struct ib_device
>>> > >> *hca,
>>> > >>+                       union ib_gid *gid, u16 lid, int set_qkey);
>>> > >>+   int (*detach_mcast)(struct net_device *dev, struct ib_device
>>> > >> *hca,
>>> > >>+                       union ib_gid *gid, u16 lid);
>>> > >
>>> > >It would make more sense to store the struct ib_device pointer in the
>>> > >struct rdma_netdev.
>>> > >
>>> >
>>> > Agree that it shouldn't be a function parameter.
>>> > For opa_vnic, I found it convenient to store ib_device pointer in
>>> > client and
>>> > device private structures as those will be available in most places
>>> > anyhow.
>>>
>>> If vnic uses it too, then let's add the ib_device and port num to
>>> rdma_netdev itself?
>>
>>
>> Agree, at the end this rdma_netdev is intended for the drivers/infiniband
>> and it is better to have this binding (rdma_netdev and ib_device) as early
>> as possible.
>>
>
> I agree with adding ibdev and port num to rdma_netdev.

Good, will do that.

>
> Niranjana
>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks
       [not found]                   ` <20170315155802.GB29562-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-03-15 17:39                       ` Leon Romanovsky
  0 siblings, 0 replies; 65+ messages in thread
From: Leon Romanovsky @ 2017-03-15 17:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Erez Shitrit, Vishwanathapura, Niranjana, Erez Shitrit,
	Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, valex-VPRAkNaXOzVWk0Htik3J/w,
	saedm-VPRAkNaXOzVWk0Htik3J/w

On Wed, Mar 15, 2017 at 09:58:02AM -0600, Jason Gunthorpe wrote:
> On Wed, Mar 15, 2017 at 08:47:51AM +0200, Leon Romanovsky wrote:
> > On Tue, Mar 14, 2017 at 10:00:21AM -0600, Jason Gunthorpe wrote:
> > > On Tue, Mar 14, 2017 at 04:42:55PM +0200, Erez Shitrit wrote:
> > > > >> +       if (!hca->alloc_rdma_netdev)
> > > > >> +               dev = ipoib_create_netdev_default(hca, name,
> > > > >> ipoib_setup_common);
> > > > >> +       else
> > > > >> +               dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
> > > > >> +                                            name, NET_NAME_UNKNOWN,
> > > > >> +                                            ipoib_setup_common);
> > > > >> +       if (!dev) {
> > > > >> +               kfree(priv);
> > > > >> +               return NULL;
> > > > >> +       }
> > > > >
> > > > >
> > > > > This will break ipoib on hfi1 as hfi1 will define alloc_rdma_netdev for
> > > > > OPA_VNIC type. We should probably look for a dedicated return type
> > > > > (-ENODEV?) to determine if the driver supports the specified rdma netdev type,
> > > > > or use an ib device attribute to indicate that the driver supports ipoib rdma netdev.
> > > >
> > > > Sorry, I don't understand that. We are in the ipoib driver, so the type is
> > > > RDMA_NETDEV_IPOIB; if hfi wants to implement it, it should use the same
> > > > flag, and use OPA_VNIC for vnic.
> > >
> > > He means it should look like this:
> > >
> > >  if (hca->alloc_rdma_netdev)
> > >      dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
> > >                                             name, NET_NAME_UNKNOWN,
> > >                                             ipoib_setup_common);
> > >
> > >  if (IS_ERR(dev) && PTR_ERR(dev) != ENOTSUP)
> > >       goto out;
> > >
> > >  dev = ipoib_create_netdev_default(hca, name, ipoib_setup_common);
> > >  if (IS_ERR(dev))
> > >       goto out;
> > >
> > >  WARN_ON(dev == NULL);
> > >
> > >   [...]
> > >
> > > out:
> > >       return PTR_ERR(dev);
> > >
> > > And I'm confused why 'ipoib_create_netdev_default' doesn't need the
> > > same function signature as hca->alloc_rdma_netdev
> >
> > And now, I'm confused.
> > In your proposal, the "dev" will be overwritten; in Erez's proposal,
> > "dev" will be one of two: defaults or device specific.
>
> Well, not Erez's version allowed dev to be ERR_PTR too. More like this then
>
> struct rdma_netdev *get_netdev(..)
> {
>    if (hca->alloc_rdma_netdev) {
>        dev = hca->alloc_rdma_netdev(hca, port, RDMA_NETDEV_IPOIB,
>                                               name, NET_NAME_UNKNOWN,
>                                               ipoib_setup_common);
>
>        if (!IS_ERR(dev) || PTR_ERR(dev) != -EOPNOTSUPP)
>            return dev;
>    }
>
>    return ipoib_create_netdev_default(hca, name, ipoib_setup_common);
> }

Thanks, I agree with you, the interfaces should properly handle errors
paths from day one.

>
> Jason
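The fallback discussed in this subthread can be sketched as a userspace C model. Assumptions: `ERR_PTR()`, `IS_ERR()` and `PTR_ERR()` are re-implemented locally to mimic `<linux/err.h>`; both allocators share one hypothetical signature (`rdma_netdev_alloc_fn`), so the default path looks exactly like the provider hook; and "IPoIB not supported" is signalled with `-EOPNOTSUPP`. Kernel error pointers carry negative errno values, so the comparison is against `-EOPNOTSUPP` rather than a positive `ENOTSUP`. `vnic_only_alloc()` is a hypothetical provider modelling the hfi1 case raised earlier.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

#define NET_NAME_UNKNOWN 0	/* stand-in for the kernel constant */

enum rdma_netdev_t { RDMA_NETDEV_OPA_VNIC, RDMA_NETDEV_IPOIB };

struct net_device { int accelerated; };
struct ib_device;

/* Hypothetical: one signature shared by the provider hook and the default. */
typedef struct net_device *(*rdma_netdev_alloc_fn)(struct ib_device *hca,
		unsigned char port, enum rdma_netdev_t type, const char *name,
		unsigned char name_assign_type,
		void (*setup)(struct net_device *));

struct ib_device {
	rdma_netdev_alloc_fn alloc_rdma_netdev;	/* NULL: no accelerator */
};

static void ipoib_setup_common(struct net_device *dev) { (void)dev; }

/* Built-in IPoIB data path, used when no provider handles IPoIB. */
static struct net_device *ipoib_create_netdev_default(struct ib_device *hca,
		unsigned char port, enum rdma_netdev_t type, const char *name,
		unsigned char name_assign_type,
		void (*setup)(struct net_device *))
{
	struct net_device *dev = calloc(1, sizeof(*dev));

	(void)hca; (void)port; (void)type; (void)name; (void)name_assign_type;
	if (!dev)
		return ERR_PTR(-ENOMEM);
	setup(dev);
	return dev;	/* accelerated stays 0 */
}

/* Hypothetical provider that only implements OPA_VNIC (the hfi1 case). */
static struct net_device *vnic_only_alloc(struct ib_device *hca,
		unsigned char port, enum rdma_netdev_t type, const char *name,
		unsigned char name_assign_type,
		void (*setup)(struct net_device *))
{
	struct net_device *dev;

	(void)hca; (void)port; (void)name; (void)name_assign_type;
	if (type != RDMA_NETDEV_OPA_VNIC)
		return ERR_PTR(-EOPNOTSUPP);
	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return ERR_PTR(-ENOMEM);
	setup(dev);
	dev->accelerated = 1;
	return dev;
}

static struct net_device *ipoib_get_netdev(struct ib_device *hca,
					   unsigned char port, const char *name)
{
	if (hca->alloc_rdma_netdev) {
		struct net_device *dev = hca->alloc_rdma_netdev(hca, port,
				RDMA_NETDEV_IPOIB, name, NET_NAME_UNKNOWN,
				ipoib_setup_common);

		assert(dev != NULL);	/* contract: ERR_PTR on failure, never NULL */
		/* Success or a real error goes straight back to the caller. */
		if (!IS_ERR(dev) || PTR_ERR(dev) != -EOPNOTSUPP)
			return dev;
		/* -EOPNOTSUPP: provider has no IPoIB data path; fall back. */
	}
	return ipoib_create_netdev_default(hca, port, RDMA_NETDEV_IPOIB, name,
					   NET_NAME_UNKNOWN, ipoib_setup_common);
}
```

With this shape, an HCA that only implements OPA_VNIC still gets a working software IPoIB interface, while genuine failures such as `-ENOMEM` from an accelerated provider are reported instead of being masked by the fallback.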

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 14/25] net/mlx5: Enable flow-steering for IB link
  2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 14/25] net/mlx5: Enable flow-steering for IB link Erez Shitrit
@ 2017-03-15 18:56     ` Leon Romanovsky
  0 siblings, 0 replies; 65+ messages in thread
From: Leon Romanovsky @ 2017-03-15 18:56 UTC (permalink / raw)
  To: Erez Shitrit; +Cc: dledford, linux-rdma, netdev, valex, saedm, erezsh

On Mon, Mar 13, 2017 at 08:31:25PM +0200, Erez Shitrit wrote:
>
> Get the relevant capabilities if supports ipoib_enhanced_offloads and
> init the flow steering table accordingly.
>
> Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 10 +++++-----
>  drivers/net/ethernet/mellanox/mlx5/core/fw.c      |  3 ++-
>  2 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> index fa4edd88daf1..dd21fc557281 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
> @@ -1991,9 +1991,6 @@ int mlx5_init_fs(struct mlx5_core_dev *dev)
>  	struct mlx5_flow_steering *steering;
>  	int err = 0;
>
> -	if (MLX5_CAP_GEN(dev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
> -		return 0;
> -
>  	err = mlx5_init_fc_stats(dev);
>  	if (err)
>  		return err;
> @@ -2004,8 +2001,11 @@ int mlx5_init_fs(struct mlx5_core_dev *dev)
>  	steering->dev = dev;
>  	dev->priv.steering = steering;
>
> -	if (MLX5_CAP_GEN(dev, nic_flow_table) &&
> -	    MLX5_CAP_FLOWTABLE_NIC_RX(dev, ft_support)) {
> +	if ((((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH) &&
> +	      (MLX5_CAP_GEN(dev, nic_flow_table))) ||
> +	     ((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_IB) &&
> +	      MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)))
> +	    && MLX5_CAP_FLOWTABLE_NIC_RX(dev, ft_support)) {

Erez,

Please calculate the result outside of "if.." and do it in steps,
it is pretty hard to count all these brackets.

Thanks


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
  2017-03-13 20:01     ` Jason Gunthorpe
  2017-03-14  7:01       ` Vishwanathapura, Niranjana
       [not found]       ` <20170313200136.GA2738-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-03-16 15:17       ` Erez Shitrit
  2017-03-16 16:04         ` Jason Gunthorpe
  2 siblings, 1 reply; 65+ messages in thread
From: Erez Shitrit @ 2017-03-16 15:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Erez Shitrit, Doug Ledford, linux-rdma, netdev, valex,
	Leon Romanovsky, saedm

On Mon, Mar 13, 2017 at 10:01 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Mon, Mar 13, 2017 at 08:31:15PM +0200, Erez Shitrit wrote:
>
>> diff --git a/include/rdma/ib_ipoib_accel_ops.h b/include/rdma/ib_ipoib_accel_ops.h
>> new file mode 100644
>> index 000000000000..148a5529a559
>> +++ b/include/rdma/ib_ipoib_accel_ops.h
>
> Both patches need a better naming scheme for this file..
>
> rn_opa_vnic.h
> rn_ipoib.h
>
> Maybe?
>
>> +struct rdma_netdev {
>> +     void *clnt_priv;
>> +
>> +     /* control functions */
>> +     void (*set_id)(struct net_device *netdev, int id);
>
>> +     /* IB resource allocation function, returns new UD QP */
>> +     int (*ib_dev_init)(struct net_device *dev, struct ib_device *hca,
>> +                        int *qp_num);
>
> Why can't some combination of alloc_rdma_netdev and ndo.open do this stuff?
>
>> +     void (*ib_dev_cleanup)(struct net_device *dev, struct ib_device *hca);
>
> Ditto
>
>> +     /* send packet */
>> +     void (*send)(struct net_device *dev, struct sk_buff *skb,
>> +                  struct ipoib_ah *address, u32 dqpn, u32 dqkey);
>
>> +     /* multicast */
>> +     int (*attach_mcast)(struct net_device *dev, struct ib_device *hca,
>> +                         union ib_gid *gid, u16 lid, int set_qkey);
>> +     int (*detach_mcast)(struct net_device *dev, struct ib_device *hca,
>> +                         union ib_gid *gid, u16 lid);
>
> It would make more sense to store the struct ib_device pointer in the
> struct rdma_netdev.
>
> Should 'lid' be 'mlid'?
>
>> +     int qp_num;
>
> This one probably belongs in ipoib_rdma_netdev
>
>> +     void *context;

The QP is part of the HW resources: it is created in the low-level
driver and used by the upper ipoib layer for a few reasons (for example,
the MAC of the ipoib interface includes the qp_num).
Now, if we want to use the ndo's init/uninit, I need to store member
variables (qp_num and context) in the rdma_netdev; that will let me
use the ndos as is.
rdma_netdev is the one that belongs to both layers, ipoib and the low-level driver.

>
> What is this? Why is something other than ipoib_priv or ipoib_dev_priv
> needed?
>
>
>>                                               struct ib_wq_attr *attr,
>>                                               u32 wq_attr_mask,
>>                                               struct ib_udata *udata);
>> +     struct ib_ipoib_accel_ops * (*get_ipoib_accel_ops)(struct ib_device *device);
>
> rebase error? Not sure how this compiles
>
> Jason

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API
  2017-03-16 15:17       ` Erez Shitrit
@ 2017-03-16 16:04         ` Jason Gunthorpe
  0 siblings, 0 replies; 65+ messages in thread
From: Jason Gunthorpe @ 2017-03-16 16:04 UTC (permalink / raw)
  To: Erez Shitrit
  Cc: Erez Shitrit, Doug Ledford, linux-rdma, netdev, valex,
	Leon Romanovsky, saedm

On Thu, Mar 16, 2017 at 05:17:55PM +0200, Erez Shitrit wrote:
 
> The QP is part of the HW resources: it is created in the low-level
> driver and used by the upper ipoib layer for a few reasons (for example,
> the MAC of the ipoib interface includes the qp_num).
> Now, if we want to use the ndo's init/uninit, I need to store member
> variables (qp_num and context) in the rdma_netdev; that will let me
> use the ndos as is.
> rdma_netdev is the one that belongs to both layers, ipoib and the low-level driver.

IPOIB rdma_netdev's can be cast to ipoib_rdma_netdev, so that is where
ipoib-specific cross-layer stuff should live.

Jason
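A sketch of this arrangement, assuming `struct ipoib_rdma_netdev` embeds the generic `struct rdma_netdev` as its first member and carries the IPoIB-only `qp_num`. The helper names are hypothetical. It also illustrates the earlier example of why the upper layer needs the QPN: the 20-byte IPoIB hardware address holds a flags byte, the 24-bit QPN, and the 16-byte port GID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;
typedef uint32_t u32;

#define INFINIBAND_ALEN 20	/* IPoIB link-layer address length */

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic part shared by every rdma_netdev type (stand-in). */
struct rdma_netdev {
	void *clnt_priv;
};

/*
 * Hypothetical IPoIB-specific extension: cross-layer state such as the
 * QP number lives here, not in the generic rdma_netdev.
 */
struct ipoib_rdma_netdev {
	struct rdma_netdev rn;	/* first member: safe to cast from rdma_netdev */
	u32 qp_num;		/* UD QP created by the low-level driver */
};

static struct ipoib_rdma_netdev *to_ipoib_rn(struct rdma_netdev *rn)
{
	return container_of(rn, struct ipoib_rdma_netdev, rn);
}

/*
 * Build the 20-byte IPoIB hardware address: byte 0 holds flags,
 * bytes 1-3 the 24-bit QPN, bytes 4-19 the port GID.
 */
static void ipoib_fill_hw_addr(struct rdma_netdev *rn, const u8 gid[16],
			       u8 addr[INFINIBAND_ALEN])
{
	u32 qpn = to_ipoib_rn(rn)->qp_num;

	assert(qpn <= 0xffffff);	/* QPNs are 24 bits wide */
	addr[0] = 0;			/* flags byte left clear in this sketch */
	addr[1] = (qpn >> 16) & 0xff;
	addr[2] = (qpn >> 8) & 0xff;
	addr[3] = qpn & 0xff;
	memcpy(addr + 4, gid, 16);
}
```

The generic layer only ever passes `struct rdma_netdev *`; code that knows it is dealing with IPoIB recovers `qp_num` through the cast, so the generic struct stays free of IPoIB-only fields.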

^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2017-03-16 16:04 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-13 18:31 [RFC v1 for accelerated IPoIB 00/25] Enhanced mode for IPoIB driver Erez Shitrit
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 01/25] IB/ipoib: Separate control and data related initializations Erez Shitrit
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 02/25] IB/ipoib: separate control from HW operation on ipoib_open/stop ndo Erez Shitrit
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 03/25] IB/ipoib: Rename qpn to dqpn in ipoib_send and post_send functions Erez Shitrit
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 04/25] IB/verb: Add ipoib_options struct and API Erez Shitrit
     [not found]   ` <1489429896-10781-5-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-03-13 20:01     ` Jason Gunthorpe
2017-03-14  7:01       ` Vishwanathapura, Niranjana
2017-03-14 13:25         ` Erez Shitrit
2017-03-14 16:11         ` Jason Gunthorpe
     [not found]           ` <20170314161149.GA15752-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-03-15  6:30             ` Leon Romanovsky
     [not found]               ` <20170315063043.GC2079-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-03-15 16:40                 ` Vishwanathapura, Niranjana
     [not found]                   ` <20170315164050.GA81782-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
2017-03-15 16:52                     ` Erez Shitrit
     [not found]       ` <20170313200136.GA2738-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-03-14 13:25         ` Erez Shitrit
     [not found]           ` <CAAk-MO8dZJKYgpWT8p1fz0O6y2wTx4jxoJhYf21ikMOef_xUMg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-14 19:03             ` Vishwanathapura, Niranjana
     [not found]               ` <20170314190300.GA80705-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
2017-03-15  6:27                 ` Leon Romanovsky
     [not found]                   ` <20170315062728.GB2079-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-03-15 16:42                     ` Vishwanathapura, Niranjana
2017-03-16 15:17       ` Erez Shitrit
2017-03-16 16:04         ` Jason Gunthorpe
2017-03-14  6:44     ` Vishwanathapura, Niranjana
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 05/25] IB/ipoib: Support ipoib acceleration options callbacks Erez Shitrit
2017-03-13 20:10   ` Jason Gunthorpe
     [not found]     ` <20170313201049.GB2738-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-03-14 14:39       ` Erez Shitrit
     [not found]         ` <CAAk-MO8-bnWRu7BDtYDhNePj+76X4Vb7gFNhED8irLNtqR7tCw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-14 16:06           ` Jason Gunthorpe
     [not found]             ` <20170314160616.GE3244-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-03-15 16:27               ` Erez Shitrit
     [not found]                 ` <CAAk-MO96BbTnn=gzdZhRpQUENn=AVUJv1JhPbrSYrNHOrbFVeg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-15 16:46                   ` Jason Gunthorpe
     [not found]   ` <1489429896-10781-6-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-03-14  6:35     ` Vishwanathapura, Niranjana
     [not found]       ` <20170314063538.GB79937-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
2017-03-14 14:42         ` Erez Shitrit
2017-03-14 16:00           ` Jason Gunthorpe
     [not found]             ` <20170314160021.GD3244-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-03-15  6:47               ` Leon Romanovsky
2017-03-15 15:58                 ` Jason Gunthorpe
     [not found]                   ` <20170315155802.GB29562-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-03-15 17:39                     ` Leon Romanovsky
2017-03-15 10:11               ` Erez Shitrit
2017-03-14 15:44       ` Jason Gunthorpe
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 06/25] hw/mlx5: Add New bit to check over QP creation Erez Shitrit
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 07/25] linux/mlx5/mlx5_ifc.h: Add underlay_qpn field to PRM objects Erez Shitrit
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 08/25] net/mlx5e: Refactor EN code to support IB link Erez Shitrit
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 10/25] net/mlx5e: Support netdevice creation for IB link type Erez Shitrit
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 14/25] net/mlx5: Enable flow-steering for IB link Erez Shitrit
2017-03-15 18:56   ` Leon Romanovsky
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 15/25] net/mlx5e: Enhanced flow table creation to support ETH and IB links Erez Shitrit
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 16/25] net/mlx5e: Change cleanup API in order to enable IB link Erez Shitrit
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 21/25] net/mlx5e: Export send function for IB link type Erez Shitrit
     [not found] ` <1489429896-10781-1-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 09/25] net/mlx5e: Creating and Destroying flow-steering tables for IB link Erez Shitrit
2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 11/25] net/mlx5e: Refactor attach_netdev API Erez Shitrit
2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 12/25] net/mlx5e: Use underlay_qpn in tis creation Erez Shitrit
2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 13/25] net/mlx5e: Export resource creation function to be used in IB link Erez Shitrit
2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 17/25] net/mlx5e: Change mlx5e_open_locked and mlx5e_close_locked api Erez Shitrit
2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 18/25] net/mlx5e: Export open/close api for IB link Erez Shitrit
2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 19/25] include/linux/mlx5: Add mlx5_wqe_eth_pad and enhanced-ipoib-qp-mode Erez Shitrit
2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 20/25] net/mlx5e: Refactor TX send flow Erez Shitrit
2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 22/25] net/mlx5e: New function pointer for build_rx_skb is Erez Shitrit
2017-03-13 18:31   ` [RFC v1 for accelerated IPoIB 23/25] net/mlx5e: Change the function that checks the packet type Erez Shitrit
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 24/25] net/mlx5e: Add support for build_rx_skb for packet from IB type Erez Shitrit
2017-03-13 18:31 ` [RFC v1 for accelerated IPoIB 25/25] mlx5_ib: skeleton for mlx5_ib to support ipoib_ops Erez Shitrit
     [not found]   ` <1489429896-10781-26-git-send-email-erezsh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-03-13 20:27     ` Jason Gunthorpe
     [not found]       ` <20170313202720.GC2738-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-03-14 14:53         ` Erez Shitrit
2017-03-14 16:10           ` Jason Gunthorpe
2017-03-14 16:37             ` Erez Shitrit
     [not found]             ` <20170314161013.GF3244-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-03-15  8:42               ` Erez Shitrit
2017-03-14  6:07   ` Vishwanathapura, Niranjana
     [not found]     ` <20170314060730.GA79937-wPcXA7LoDC+1XWohqUldA0EOCMrvLtNR@public.gmane.org>
2017-03-14 14:55       ` Erez Shitrit
