* [PATCH for-next V1 00/29] Add SRIOV support for IB interfaces
@ 2012-06-19  8:21 Jack Morgenstein
From: Jack Morgenstein @ 2012-06-19  8:21 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, liranl-VPRAkNaXOzVWk0Htik3J/w,
	yevgenyp-VPRAkNaXOzVWk0Htik3J/w, Jack Morgenstein,
	dotanb-VPRAkNaXOzVWk0Htik3J/w, tziporet-VPRAkNaXOzVWk0Htik3J/w

This patch set adds SRIOV support for IB interfaces.

Patches 1-13 are "precondition" patches.
Patches 14-29 actually implement the feature.

This patch set introduces InfiniBand SRIOV support for ConnectX2 and ConnectX3
devices.  Each function presents itself as an independent vHCA (virtual HCA) to
the host while a single HCA is observable by the network, which is unaware of
the vHCAs.  No changes are required in the IB subsystem, ULPs, or apps to
support SRIOV, and vHCAs are interoperable with any existing (non-virtualized)
IB deployments.
 
We term this model for SRIOV implementation the shared-port model.

Sharing the same physical port(s) among multiple vHCAs is achieved as follows:
 
1. Each vHCA port presents its own virtual GID table.
 
Currently, the virtual GID table comprises a single entry (at index 0) that
maps to a unique index in the physical GID table.  The vHCA of the PF maps to
physical GID index 0. To obtain GIDs for other vHCAs, alias GUIDs are requested
from the SM.  These are GUIDs which the SM places, per port, in the port's GUID
table after the 0'th slot (which is read-only and determined by the FW).
The host admin can assign GIDs to vHCAs using a sysfs interface (see below).
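
As a rough illustration of the single-entry virtual GID table described above,
the following C sketch translates a virtual GID index to a physical one.  The
names (struct vgid_table, vgid_table_init, vgid_to_phys) are hypothetical and
are not taken from the patches.  The rule that function N owns physical GID
slot N is an assumption made only for this example; the text above states only
that the PF maps to physical index 0 and that each vHCA maps to a unique index.

```c
#include <assert.h>
#include <stddef.h>

/* Per the description: the virtual table currently has a single entry. */
#define VGID_TABLE_LEN 1

/* Hypothetical per-vHCA-port virtual GID table (illustration only). */
struct vgid_table {
	int phys_index[VGID_TABLE_LEN]; /* physical GID index per virtual index */
};

/*
 * ASSUMPTION: function number N is given physical GID slot N, so the PF
 * (function 0) lands on physical index 0 as the description requires.
 */
void vgid_table_init(struct vgid_table *t, int func_num)
{
	assert(t != NULL && func_num >= 0);
	t->phys_index[0] = func_num;
}

/* Translate a virtual GID index; returns -1 if the index is out of range. */
int vgid_to_phys(const struct vgid_table *t, int virt_index)
{
	assert(t != NULL);
	if (virt_index < 0 || virt_index >= VGID_TABLE_LEN)
		return -1;
	return t->phys_index[virt_index];
}
```

With this layout a guest that asks for GID index 0 always sees its own GID,
whichever physical slot actually backs it.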
 
2. Each vHCA port presents its own virtual PKey table.
 
The virtual PKey table is a mapping of selected indexes of the physical PKey table.
The host admin can control which physical PKey indexes are mapped to which virtual
indexes using a sysfs interface (see below). Note that the physical PKey table may contain
both full and partial memberships of the same PKey to allow different membership
types in different virtual tables.
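
The index mapping can be pictured with a small C sketch.  All names below
(struct vpkey_table, vpkey_read, pkey_is_full_member) are hypothetical and do
not come from the patches.  The only external fact relied on is that the top
bit of a 16-bit PKey is the membership bit (set for full membership).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u /* top bit set: full membership */

/* Hypothetical view of one vHCA port's virtual PKey table. */
struct vpkey_table {
	const uint16_t *phys;     /* physical PKey table of the port */
	size_t phys_len;
	const uint8_t *virt2phys; /* virt2phys[v] = physical index behind v */
	size_t virt_len;
};

/* Read the PKey a vHCA sees at virt_index; returns 0 on success, -1 on error. */
int vpkey_read(const struct vpkey_table *t, size_t virt_index, uint16_t *pkey)
{
	size_t p;

	assert(t != NULL && pkey != NULL);
	if (virt_index >= t->virt_len)
		return -1;
	p = t->virt2phys[virt_index];
	if (p >= t->phys_len)
		return -1;
	*pkey = t->phys[p];
	return 0;
}

/* Nonzero when the PKey carries full membership. */
int pkey_is_full_member(uint16_t pkey)
{
	return (pkey & PKEY_FULL_MEMBER) != 0;
}
```

Because the physical table may hold both 0xffff and 0x7fff, one vHCA can be
mapped to the full-membership entry while another is mapped to the partial one.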
 
3. Each vHCA port has its own virtual port state.
 
A vHCA port is up if the following conditions apply:
- The physical port is up
- The virtual GID table contains the GIDs requested by the host admin
- The SM has acknowledged the requested GIDs since the last time that
  the physical port came up
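
The three conditions above amount to a simple predicate.  The C sketch below is
an illustration only: struct vport_state and the vport_* helpers are
hypothetical names, and clearing the SM acknowledgement when the physical port
comes up is an assumption about how "since the last time that the physical
port came up" could be tracked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs to the virtual port state of one vHCA port. */
struct vport_state {
	bool phys_port_up;      /* the physical port is up */
	bool gids_installed;    /* virtual GID table holds the requested GIDs */
	bool sm_acked_since_up; /* SM acked them since the last physical port-up */
};

/* Physical port came up: any earlier SM acknowledgement no longer counts. */
void vport_on_phys_port_up(struct vport_state *s)
{
	assert(s != NULL);
	s->phys_port_up = true;
	s->sm_acked_since_up = false;
}

/* Physical port went down. */
void vport_on_phys_port_down(struct vport_state *s)
{
	assert(s != NULL);
	s->phys_port_up = false;
}

/* The SM acknowledged the requested GIDs. */
void vport_on_sm_ack(struct vport_state *s)
{
	assert(s != NULL);
	s->sm_acked_since_up = true;
}

/* A vHCA port is up only if all three conditions hold. */
bool vport_is_up(const struct vport_state *s)
{
	assert(s != NULL);
	return s->phys_port_up && s->gids_installed && s->sm_acked_since_up;
}
```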
 
4. Other port attributes are shared, e.g., GID prefix, LID, SM LID, LMC mask.
 
5. Special QPs are para-virtualized.
 
vHCAs are not given direct access to QP0/1. Rather, these QPs are operated by a
special context hosted by the PF, which mediates access to/from vHCAs.
This is done by opening a “tunnel” per vHCA port per QP0/1. A tunnel comprises
a pair of UD QPs:  a “Tunnel QP” in the PF-context and a “Proxy QP” in the vHCA.
All vHCA MAD traffic must pass through the corresponding tunnel.
vHCA QPs cannot be assigned to VL15 and are denied use of the well-known QKey.
 
QP0 access is restricted to the PF vHCA. VF vHCAs also have (virtual) QP0s,
but they never receive any SMPs and all SMPs sent are discarded.
QP1 traffic is allowed for all vHCAs, but special care is required to bridge
the gap between the host and network views.

Specifically:
- Transaction IDs are mapped to guarantee uniqueness among vHCAs
- CM para-virtualization
  o   Incoming requests are steered to the correct vHCA according to the embedded GID
  o   Local communication IDs are mapped to ensure uniqueness among vHCAs
- Multicast para-virtualization
  o   The PF context aggregates membership state from all vHCAs
  o   The SA is contacted only when the aggregate membership changes
  o   If the aggregate does not change, the PF context will provide the
      requesting vHCA with the proper response
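
The multicast aggregation idea can be sketched in C as follows.  This is a
deliberately simplified model, not the code in mcg.c: it tracks only whether
each vHCA is a member of one group and ignores the different join states.  The
names (struct mcg_aggregate, mcg_join, mcg_leave) and MAX_VHCAS are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VHCAS 8 /* hypothetical limit, for illustration only */

/* Aggregate membership of all vHCAs in one multicast group. */
struct mcg_aggregate {
	bool member[MAX_VHCAS]; /* which vHCAs have joined */
	unsigned int count;     /* number of joined vHCAs */
};

/*
 * Record a join.  Returns true only if the aggregate changed (first member),
 * i.e. the SA has to be contacted; otherwise the PF context can answer the
 * requesting vHCA by itself.
 */
bool mcg_join(struct mcg_aggregate *g, unsigned int vhca)
{
	assert(g != NULL && vhca < MAX_VHCAS);
	if (g->member[vhca])
		return false;
	g->member[vhca] = true;
	return g->count++ == 0;
}

/* Record a leave.  Returns true only if the last member has left. */
bool mcg_leave(struct mcg_aggregate *g, unsigned int vhca)
{
	assert(g != NULL && vhca < MAX_VHCAS);
	if (!g->member[vhca])
		return false;
	g->member[vhca] = false;
	return --g->count == 0;
}
```

In this model the network sees one join when the first vHCA joins and one leave
when the last vHCA leaves; everything in between stays local to the host.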
 
Incoming MADs are steered according to:
- the DGID if a GRH is present
- the mapped transaction ID for response MADs
- the embedded GID in CM requests
- the remote communication ID in other CM messages
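
The steering rules can be read as a selection of the demultiplexing key.  The C
sketch below is one possible reading, not the logic of the patches: it assumes
the rules are tried in the order listed, which the text above does not state,
and every name in it (enum steer_key, struct mad_summary, mad_steer_key) is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which field of an incoming MAD selects the destination vHCA. */
enum steer_key {
	STEER_BY_DGID,              /* DGID taken from the GRH */
	STEER_BY_TID,               /* mapped transaction ID */
	STEER_BY_CM_REQ_GID,        /* GID embedded in a CM request */
	STEER_BY_CM_REMOTE_COMM_ID, /* remote communication ID */
	STEER_UNSPECIFIED           /* case not covered by the description */
};

/* Hypothetical summary of the properties that matter for steering. */
struct mad_summary {
	bool has_grh;     /* a GRH is present */
	bool is_response; /* the MAD is a response */
	bool is_cm;       /* the MAD is a CM message */
	bool is_cm_req;   /* the CM message is a request */
};

/* ASSUMPTION: the rules apply in the order in which they are listed. */
enum steer_key mad_steer_key(const struct mad_summary *m)
{
	assert(m != NULL);
	if (m->has_grh)
		return STEER_BY_DGID;
	if (m->is_response)
		return STEER_BY_TID;
	if (m->is_cm)
		return m->is_cm_req ? STEER_BY_CM_REQ_GID
				    : STEER_BY_CM_REMOTE_COMM_ID;
	return STEER_UNSPECIFIED;
}
```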

To allow the host admin to control the virtual GID and PKey tables of vHCAs,
a new sysfs ‘iov’ sub-tree has been added under the PF infiniband device.
Details on this mechanism can be found in the change log of:
   IB/mlx4: Add iov directory in sysfs under the ib device

Some Limitations
----------------
1. FMRs are not currently supported on slaves. This will be corrected in a
   future submission.
2. RoCE is not currently supported on slaves. This will be corrected in a
   future submission.
3. Due to a (correct) change in IRQ management in kernel 3.5-rc1 (see
   commit 1c6c69525b40), the KVM module no longer succeeds in passing interrupts
   through to guests (see the discussion thread beginning at
   https://lkml.org/lkml/2012/6/1/261).  Until this KVM issue is fixed, anyone
   wishing to use SRIOV-IB (or SRIOV-Ethernet) with ConnectX2 or ConnectX3
   devices in guest OSes should revert commit 1c6c69525b40
   (as a TEMPORARY workaround) to enable the guests to operate the mlx4 driver.

   VFs may still be bound to the host (via setting the "probe_vf" mlx4_core
   module parameter to a non-zero value in a conf file under /etc/modprobe.d) 
   without reverting the commit mentioned above.


Changes for V1
--------------

1. librdmacm now supports multiple VF/PF on the same host (patch 29).
2. Several patches cleaned up (these were indicated in the V0 changelogs).
   Major cleanups in patch 22 and patch 24.
3. Eliminated code duplication in Port Management Change event code (patch 8).
4. Now use pr_debug, instead of mlx4_ib_debug, and there is no module parameter
   (Roland's recommendation).
5. Now use mlx4_master_func_num() to get the master's "slave_id", to make code more readable.
6. Fixed illegal use of port num field for a force-loopback bit in ib_ah structure
   (V0 patch 2 -- eliminated). The force-loopback bit is now set for Tunnel QPs in
   mlx4_ib_post_send (patch 17).
7. New patch 2 (not related to 6 above) to reserve bits in enum ib_qp_create_flags.
8. V0 patch 26 is now rolled into V1 patch 22. This allowed us to eliminate function
   mlx4_ib_indexed_gid from patch 22 (replaced by using __mlx4_ib_query_gid() from V0 patch 26).


Amir Vadai (1):
  IB/mlx4: Add CM paravirtualization

Erez Shitrit (1):
  IB/sa: Add GuidInfoRecord query support.

Jack Morgenstein (26):
  net/mlx4_core: Pass an invalid PCI id number to VFs
  IB/core: Reserve bits in enum ib_qp_create_flags for low-level driver
    use
  IB/mlx4: Add debug printouts
  IB/core: change pkey table lookups to support full and partial
    membership for the same pkey
  IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey
    match
  IB/core: move macros from cm_msgs.h to ib_cm.h
  {NET,IB}/mlx4: Use port management change event instead of smp_snoop
  net/mlx4_core: For SRIOV, initialize ib port-capabilities for all
    slaves
  net/mlx4_core: Implement mechanism for reserved qkeys
  net/mlx4_core: Allow guests to support IB ports
  {NET,IB}/mlx4_core: place phys gid and pkey tbl sizes in
    mlx4_phys_caps struct and paravirtualize them
  IB/mlx4: SRIOV IB context objects and proxy/tunnel sqp support
  net/mlx4_core: Add proxy and tunnel QPs to the reserved QP area
  IB/mlx4: Initialize SRIOV IB support for slaves in master context
  {NET,IB}/mlx4: Implement QP paravirtualization and maintain
    phys_pkey_cache for smp_snoop
  IB/mlx4: SRIOV multiplex and demultiplex MADs
  {NET,IB}/mlx4: MAD_IFC paravirtualization
  net/mlx4_core: Add IB port-state machine, and port mgmt event
    propagation infrastructure
  {NET,IB}/mlx4: Add alias_guid mechanism
  IB/mlx4: Propagate pkey and guid change port management events to
    slaves
  IB/mlx4: Add iov directory in sysfs under the ib device
  net/mlx4_core: Adjustments to SET_PORT for SRIOV-IB
  net/mlx4_core: INIT/CLOSE port logic for IB ports in SRIOV mode
  IB/mlx4: Miscellaneous adjustments to SRIOV IB support
  {NET,IB}/mlx4: Activate SRIOV mode for IB
  {NET,IB}/mlx4: Paravirtualize Node Guids for slaves.

Oren Duer (1):
  IB/mlx4: Added Multicast Groups (MCG) para-virtualization for SRIOV

 drivers/infiniband/core/cache.c                    |   42 +-
 drivers/infiniband/core/cm_msgs.h                  |   12 -
 drivers/infiniband/core/device.c                   |   17 +-
 drivers/infiniband/core/sa_query.c                 |  133 ++
 drivers/infiniband/hw/mlx4/Makefile                |    2 +-
 drivers/infiniband/hw/mlx4/alias_GUID.c            |  688 ++++++++
 drivers/infiniband/hw/mlx4/cm.c                    |  437 +++++
 drivers/infiniband/hw/mlx4/cq.c                    |   31 +-
 drivers/infiniband/hw/mlx4/mad.c                   | 1684 +++++++++++++++++++-
 drivers/infiniband/hw/mlx4/main.c                  |  285 +++-
 drivers/infiniband/hw/mlx4/mcg.c                   | 1254 +++++++++++++++
 drivers/infiniband/hw/mlx4/mlx4_ib.h               |  360 +++++-
 drivers/infiniband/hw/mlx4/qp.c                    |  651 +++++++-
 drivers/infiniband/hw/mlx4/sysfs.c                 |  794 +++++++++
 drivers/net/ethernet/mellanox/mlx4/cmd.c           |  190 +++-
 drivers/net/ethernet/mellanox/mlx4/en_main.c       |    5 +-
 drivers/net/ethernet/mellanox/mlx4/eq.c            |  259 +++-
 drivers/net/ethernet/mellanox/mlx4/fw.c            |  227 +++-
 drivers/net/ethernet/mellanox/mlx4/fw.h            |    3 +
 drivers/net/ethernet/mellanox/mlx4/intf.c          |    5 +-
 drivers/net/ethernet/mellanox/mlx4/main.c          |  124 ++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |  116 +-
 drivers/net/ethernet/mellanox/mlx4/port.c          |   21 +-
 drivers/net/ethernet/mellanox/mlx4/qp.c            |   67 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  220 +++-
 include/linux/mlx4/device.h                        |  179 ++-
 include/linux/mlx4/driver.h                        |    5 +-
 include/linux/mlx4/qp.h                            |    3 +-
 include/rdma/ib_cache.h                            |   16 +
 include/rdma/ib_cm.h                               |   12 +
 include/rdma/ib_sa.h                               |   33 +
 include/rdma/ib_verbs.h                            |    3 +
 32 files changed, 7520 insertions(+), 358 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx4/alias_GUID.c
 create mode 100644 drivers/infiniband/hw/mlx4/cm.c
 create mode 100644 drivers/infiniband/hw/mlx4/mcg.c
 create mode 100644 drivers/infiniband/hw/mlx4/sysfs.c

Cc: dotanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
Cc: tziporet-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org

Thread overview: 45+ messages (newest: 2012-07-13 13:40 UTC)
2012-06-19  8:21 [PATCH for-next V1 00/29] Add SRIOV support for IB interfaces Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 01/29] net/mlx4_core: Pass an invalid PCI id number to VFs Jack Morgenstein
2012-07-05 20:50       ` Roland Dreier
2012-06-19  8:21   ` [PATCH for-next V1 02/29] IB/core: Reserve bits in enum ib_qp_create_flags for low-level driver use Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 03/29] IB/mlx4: Add debug printouts Jack Morgenstein
2012-07-05 22:59       ` Roland Dreier
2012-06-19  8:21   ` [PATCH for-next V1 04/29] IB/core: change pkey table lookups to support full and partial membership for the same pkey Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 05/29] IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey match Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 06/29] IB/sa: Add GuidInfoRecord query support Jack Morgenstein
2012-07-05 23:00       ` Roland Dreier
2012-06-19  8:21   ` [PATCH for-next V1 07/29] IB/core: move macros from cm_msgs.h to ib_cm.h Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 08/29] {NET,IB}/mlx4: Use port management change event instead of smp_snoop Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 09/29] net/mlx4_core: For SRIOV, initialize ib port-capabilities for all slaves Jack Morgenstein
2012-07-10 16:57       ` Roland Dreier
2012-07-11 13:33           ` Or Gerlitz
2012-06-19  8:21   ` [PATCH for-next V1 10/29] net/mlx4_core: Implement mechanism for reserved qkeys Jack Morgenstein
2012-07-05 23:26       ` Roland Dreier
2012-07-06  2:51           ` Or Gerlitz
2012-07-08 15:17           ` Or Gerlitz
2012-07-11 18:37       ` Roland Dreier
2012-07-13 13:40           ` Or Gerlitz
2012-06-19  8:21   ` [PATCH for-next V1 11/29] net/mlx4_core: Allow guests to support IB ports Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 12/29] {NET,IB}/mlx4_core: place phys gid and pkey tbl sizes in mlx4_phys_caps struct and paravirtualize them Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 13/29] IB/mlx4: SRIOV IB context objects and proxy/tunnel sqp support Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 14/29] net/mlx4_core: Add proxy and tunnel QPs to the reserved QP area Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 15/29] IB/mlx4: Initialize SRIOV IB support for slaves in master context Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 16/29] {NET,IB}/mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 17/29] IB/mlx4: SRIOV multiplex and demultiplex MADs Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 18/29] {NET,IB}/mlx4: MAD_IFC paravirtualization Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 19/29] IB/mlx4: Added Multicast Groups (MCG) para-virtualization for SRIOV Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 20/29] IB/mlx4: Add CM paravirtualization Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 21/29] net/mlx4_core: Add IB port-state machine, and port mgmt event propagation infrastructure Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 22/29] {NET,IB}/mlx4: Add alias_guid mechanism Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 23/29] IB/mlx4: Propagate pkey and guid change port management events to slaves Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 24/29] IB/mlx4: Add iov directory in sysfs under the ib device Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 25/29] net/mlx4_core: Adjustments to SET_PORT for SRIOV-IB Jack Morgenstein
2012-07-06  0:09       ` Roland Dreier
2012-07-06  2:43           ` Or Gerlitz
2012-07-06  3:09               ` Roland Dreier
2012-07-09  3:02                   ` Or Gerlitz
2012-07-06  3:01           ` Or Gerlitz
2012-06-19  8:21   ` [PATCH for-next V1 26/29] net/mlx4_core: INIT/CLOSE port logic for IB ports in SRIOV mode Jack Morgenstein
2012-06-19  8:21   ` [PATCH for-next V1 27/29] IB/mlx4: Miscellaneous adjustments to SRIOV IB support Jack Morgenstein
2012-06-19  8:22   ` [PATCH for-next V1 28/29] {NET,IB}/mlx4: Activate SRIOV mode for IB Jack Morgenstein
2012-06-19  8:22   ` [PATCH for-next V1 29/29] {NET,IB}/mlx4: Paravirtualize Node Guids for slaves Jack Morgenstein
