* Re: [PATCH for-next 00/29] Add SRIOV support for IB interfaces
@ 2012-06-14 13:21 Yann Droneaud
       [not found] ` <9e8965c740ff0e889a097a722b9fff90.squirrel-2RFepEojUI2lDZmfZ6uX/xeHL2rgt/dS@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Yann Droneaud @ 2012-06-14 13:21 UTC
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	liranl-VPRAkNaXOzVWk0Htik3J/w, yevgenyp-VPRAkNaXOzVWk0Htik3J/w,
	Jack Morgenstein, dotanb-VPRAkNaXOzVWk0Htik3J/w,
	tziporet-VPRAkNaXOzVWk0Htik3J/w,
	ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA

Hi,

> This patch set adds SRIOV support for IB interfaces.
> Patches 1-13 are "precondition" patches.
> Patches 14-29 actually implement the feature.
> This patch set introduces Infiniband SRIOV support for ConnectX2 and ConnectX3
> devices.  Each function presents itself as an independent vHCA (virtual HCA) to
> the host while a single HCA is observable by the network, which is unaware of
> the vHCAs.  No changes are required by the IB subsystem, ULPs, and apps to
> support SRIOV, and vHCAs are interoperable with any existing (non-virtualized)
> IB deployments.

Please forgive me, I haven't tried the patches or read them in depth.

How will vHCA/HCA interact with regard to Automatic Path Migration (APM) and
IPoIB bonding with fail-over (HA), with RDMA_CM and IP usage in mind?

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH for-next 00/29] Add SRIOV support for IB interfaces
       [not found] ` <9e8965c740ff0e889a097a722b9fff90.squirrel-2RFepEojUI2lDZmfZ6uX/xeHL2rgt/dS@public.gmane.org>
@ 2012-06-17 14:39   ` Or Gerlitz
       [not found]     ` <4FDDEC16.9060702-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Or Gerlitz @ 2012-06-17 14:39 UTC
  To: Yann Droneaud
  Cc: Jack Morgenstein, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, liranl-VPRAkNaXOzVWk0Htik3J/w,
	yevgenyp-VPRAkNaXOzVWk0Htik3J/w, dotanb-VPRAkNaXOzVWk0Htik3J/w,
	tziporet-VPRAkNaXOzVWk0Htik3J/w

On 6/14/2012 4:21 PM, Yann Droneaud wrote:
> How will vHCA/HCA interact with regard to Automatic Path Migration (APM)
> and IPoIB bonding with fail-over (HA), with RDMA_CM and IP usage in
> mind?

They work on a vHCA the same way they do on an HCA. Please let me know if
you have anything more specific in mind.

Or.

* Re: [PATCH for-next 00/29] Add SRIOV support for IB interfaces
       [not found]     ` <4FDDEC16.9060702-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2012-06-18  9:43       ` Yann Droneaud
  0 siblings, 0 replies; 6+ messages in thread
From: Yann Droneaud @ 2012-06-18  9:43 UTC
  To: Or Gerlitz
  Cc: Yann Droneaud, Jack Morgenstein, roland-DgEjT+Ai2ygdnm+yROfE0A,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, liranl-VPRAkNaXOzVWk0Htik3J/w,
	yevgenyp-VPRAkNaXOzVWk0Htik3J/w, dotanb-VPRAkNaXOzVWk0Htik3J/w,
	tziporet-VPRAkNaXOzVWk0Htik3J/w

Hi,

> On 6/14/2012 4:21 PM, Yann Droneaud wrote:
>> How will vHCA/HCA interact with regard to Automatic Path Migration (APM)
>> and IPoIB bonding with fail-over (HA), with RDMA_CM and IP usage in
>> mind?
>
> They work on a vHCA the same way they do on an HCA. Please let me know if
> you have anything more specific in mind.
>

Nothing special.

As I understood the patch descriptions, the SR-IOV virtualization layer
acts like some kind of address translation (NAT) layer, so I thought there
could be some corner cases, especially with fail-over.

Thanks for the clarification.

Regards.

-- 
Yann Droneaud
OPTEYA



* Re: [PATCH for-next 00/29] Add SRIOV support for IB interfaces
       [not found]     ` <CAL1RGDVsGUBHUBOaajnzO4NcFjA1xBZczLA=vxV3=oRKme+LrQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-06-12  7:22       ` Or Gerlitz
  0 siblings, 0 replies; 6+ messages in thread
From: Or Gerlitz @ 2012-06-12  7:22 UTC
  To: Roland Dreier
  Cc: Jack Morgenstein, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	liranl-VPRAkNaXOzVWk0Htik3J/w, yevgenyp-VPRAkNaXOzVWk0Htik3J/w,
	dotanb-VPRAkNaXOzVWk0Htik3J/w, tziporet-VPRAkNaXOzVWk0Htik3J/w

On 6/12/2012 10:00 AM, Roland Dreier wrote:
> Jack Morgenstein<jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>  wrote:
>> several of the patches have notations indicating things that will be fixed in V1
>
> Not sure what you want me to do with this -- it seems you yourself are saying this series is not ready to merge yet?

Roland,

We observed a few last-minute issues and preferred not to defer the
submission any further, for the following reasons:

1. to be sure that the submitted code has passed regression testing, with
not even a single bit changed since

2. the issues we spotted are all fairly simple to fix, and V1 will be out
quickly

3. the design, and the resulting code, is a bit complex, and we wanted it to
see the light of day so that review can start (thank you for the quick
feedback on patch 03)

All in all, V1 is coming soon, and if you prefer to start from there, so be
it, but we would still love to get any comments / questions / corrections.

Or.

The known issues to be fixed in V1, with NNN being the patch number:

002 illegal use here of the MSB of the port_num field in the ib core ah
    structure

013 add a patch which will mark some of the most-significant bits in
    ib_qp_create_flags as reserved for low-level driver use

015 modify the slave_num macro defined here to be usable in mlx4_core as
    well

022 requires some slight cleanup

024 requires some slight cleanup

029 change the patch to reflect that RoCE is still not supported on slaves

Also, librdmacm currently does not support multiple VF/PF on the same host.
We will change the patch set to solve that, with no change in the library.



* Re: [PATCH for-next 00/29] Add SRIOV support for IB interfaces
       [not found] ` <1339411570-4689-1-git-send-email-jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2012-06-12  7:00   ` Roland Dreier
       [not found]     ` <CAL1RGDVsGUBHUBOaajnzO4NcFjA1xBZczLA=vxV3=oRKme+LrQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Roland Dreier @ 2012-06-12  7:00 UTC
  To: Jack Morgenstein
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, liranl-VPRAkNaXOzVWk0Htik3J/w,
	yevgenyp-VPRAkNaXOzVWk0Htik3J/w, dotanb-VPRAkNaXOzVWk0Htik3J/w,
	tziporet-VPRAkNaXOzVWk0Htik3J/w

On Mon, Jun 11, 2012 at 3:45 AM, Jack Morgenstein
<jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> In addition, several of the patches have notations indicating things that
> will be fixed in V1.

Not sure what you want me to do with this -- it seems you yourself are
saying this series is not ready to merge yet?

 - R.

* [PATCH for-next 00/29] Add SRIOV support for IB interfaces
@ 2012-06-11 10:45 Jack Morgenstein
       [not found] ` <1339411570-4689-1-git-send-email-jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Jack Morgenstein @ 2012-06-11 10:45 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, liranl-VPRAkNaXOzVWk0Htik3J/w,
	yevgenyp-VPRAkNaXOzVWk0Htik3J/w, Jack Morgenstein,
	dotanb-VPRAkNaXOzVWk0Htik3J/w, tziporet-VPRAkNaXOzVWk0Htik3J/w

This patch set adds SRIOV support for IB interfaces.

Patches 1-13 are "precondition" patches.
Patches 14-29 actually implement the feature.

This patch set introduces InfiniBand SRIOV support for ConnectX2 and ConnectX3
devices.  Each function presents itself as an independent vHCA (virtual HCA) to
the host while a single HCA is observable by the network, which is unaware of
the vHCAs.  No changes are required by the IB subsystem, ULPs, and apps to
support SRIOV, and vHCAs are interoperable with any existing (non-virtualized)
IB deployments.
 
We term this model for SRIOV implementation the shared-port model.

Sharing the same physical port(s) among multiple vHCAs is achieved as follows:
 
1. Each vHCA port presents its own virtual GID table.
 
Currently, the virtual GID table comprises a single entry (at index 0) that
maps to a unique index in the physical GID table.  The vHCA of the PF maps to
physical GID index 0. To obtain GIDs for other vHCAs, alias GUIDs are requested
from the SM.  These are GUIDs which the SM places, per port, in the port's guid
table after the 0'th slot (which is read-only and determined by the FW).
The host admin can assign GIDs to vHCAs using a sysfs interface (see below).
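
As a rough illustration of the virtual-to-physical GID index mapping described
above, here is a minimal sketch in plain user-space C. It is not the mlx4 code:
every name (sketch_port_gids, sketch_gids_assign, and so on) and the table size
are hypothetical, and the alias-GUID request to the SM is left out entirely.
The only facts taken from the text are that each vHCA has a single virtual
entry at index 0, that the PF maps to physical index 0, and that each vHCA
maps to a unique physical index.

```c
#include <stddef.h>

#define SKETCH_MAX_VHCAS 8      /* hypothetical limit, for illustration only */
#define SKETCH_NO_GID    (-1)   /* this vHCA has no physical GID slot yet */

/* One shared physical port: phys_idx[v] is the physical GID table index
 * that backs virtual GID index 0 of vHCA v (vHCA 0 is the PF). */
struct sketch_port_gids {
	int phys_idx[SKETCH_MAX_VHCAS];
};

/* The PF always maps to physical GID index 0; VFs start unassigned. */
static void sketch_gids_init(struct sketch_port_gids *p)
{
	p->phys_idx[0] = 0;
	for (size_t v = 1; v < SKETCH_MAX_VHCAS; v++)
		p->phys_idx[v] = SKETCH_NO_GID;
}

/* Give a VF a physical GID index. Index 0 is reserved for the PF, and an
 * index already used by another vHCA is rejected so the mapping stays unique.
 * Returns 0 on success, -1 on rejection. */
static int sketch_gids_assign(struct sketch_port_gids *p, size_t vhca, int phys)
{
	if (vhca == 0 || vhca >= SKETCH_MAX_VHCAS || phys <= 0)
		return -1;
	for (size_t v = 0; v < SKETCH_MAX_VHCAS; v++)
		if (v != vhca && p->phys_idx[v] == phys)
			return -1;
	p->phys_idx[vhca] = phys;
	return 0;
}

/* Translate (vHCA, virtual GID index) to a physical GID index.
 * Only virtual index 0 exists in this model. */
static int sketch_gids_lookup(const struct sketch_port_gids *p,
			      size_t vhca, size_t virt_idx)
{
	if (vhca >= SKETCH_MAX_VHCAS || virt_idx != 0)
		return SKETCH_NO_GID;
	return p->phys_idx[vhca];
}
```

In this sketch the host admin's sysfs write would end up in something like
sketch_gids_assign(); how the real driver wires that up is not shown here.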
 
2. Each vHCA port presents its own virtual PKey table.
 
The virtual PKey table is a mapping of selected indexes of the physical pkey table.
The host admin can control which pkey indexes are mapped to which virtual indexes
using a sysfs interface (see below). Note that the physical PKey table may contain
both full and partial memberships of the same PKey to allow different membership
types in different virtual tables.
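
The PKey mapping above can be pictured the same way. The sketch below is plain
user-space C with hypothetical names and table sizes, not the driver code. It
relies on one fact from the IB specification: the top bit (0x8000) of a PKey is
the membership bit, set for full members and clear for partial members. It
shows how one physical table holding both 0xFFFF and 0x7FFF lets two vHCAs see
different membership types for the same PKey.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PHYS_PKEYS 8        /* hypothetical physical table size */
#define SKETCH_VIRT_PKEYS 4        /* hypothetical virtual table size */
#define SKETCH_PKEY_FULL  0x8000u  /* membership bit: set = full member */

/* Per-vHCA view: map[i] is the physical index behind virtual index i. */
struct sketch_vpkeys {
	const uint16_t *phys;            /* shared physical PKey table */
	uint8_t map[SKETCH_VIRT_PKEYS];  /* virtual index -> physical index */
};

/* Read the PKey a vHCA sees at a virtual index. Returns 0, used here to
 * mean "no entry", when either index is out of range. */
static uint16_t sketch_vpkey_read(const struct sketch_vpkeys *t, size_t virt_idx)
{
	if (virt_idx >= SKETCH_VIRT_PKEYS || t->map[virt_idx] >= SKETCH_PHYS_PKEYS)
		return 0;
	return t->phys[t->map[virt_idx]];
}

/* Nonzero if the PKey carries full membership. */
static int sketch_pkey_is_full(uint16_t pkey)
{
	return (pkey & SKETCH_PKEY_FULL) != 0;
}
```

For example, with a physical table starting {0xFFFF, 0x7FFF, ...}, a vHCA whose
map[0] is 0 sees a full-membership default PKey, while a vHCA whose map[0] is 1
sees only partial membership of the same partition.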
 
3. Each vHCA port has its own virtual port state.
 
A vHCA port is up if the following conditions apply:
- The physical port is up
- The virtual GID table contains the GIDs requested by the host admin
- The SM has acknowledged the requested GIDs since the last time that
  the physical port came up
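
The three conditions above amount to a small predicate. A minimal sketch in
plain C follows; the struct and function names are hypothetical and this is
not the port-state machine from the patches. It assumes that any physical port
state change invalidates an earlier SM acknowledgement, which is one way to
read "since the last time that the physical port came up".

```c
#include <stdbool.h>

/* Inputs to the virtual port state of one vHCA port (hypothetical layout). */
struct sketch_vport {
	bool phys_port_up;        /* the physical port is up */
	bool gids_in_table;       /* virtual GID table holds the requested GIDs */
	bool sm_acked_since_up;   /* SM acknowledged them since the last port-up */
};

/* A vHCA port is up only when all three conditions hold. */
static bool sketch_vport_is_up(const struct sketch_vport *vp)
{
	return vp->phys_port_up && vp->gids_in_table && vp->sm_acked_since_up;
}

/* Physical port went up or down: record it and forget any earlier SM
 * acknowledgement, so the vHCA port stays down until the SM acks again. */
static void sketch_vport_phys_event(struct sketch_vport *vp, bool up)
{
	vp->phys_port_up = up;
	vp->sm_acked_since_up = false;
}
```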
 
4. Other port attributes are shared, e.g., GID prefix, LID, SM LID, LMC mask.
 
5. Special QPs are para-virtualized.
 
vHCAs are not given direct access to QP0/1. Rather, these QPs are operated by a
special context hosted by the PF, which mediates access to/from vHCAs.
This is done by opening a “tunnel” per vHCA port per QP0/1. A tunnel comprises
a pair of UD QPs:  a “Tunnel QP” in the PF-context and a “Proxy QP” in the vHCA.
All vHCA MAD traffic must pass through the corresponding tunnel.
vHCA QPs cannot be assigned to VL15 and are denied use of the well-known QKey.
 
QP0 access is restricted to the PF vHCA. VF vHCAs also have (virtual) QP0’s,
but they never receive any SMPs and all SMPs sent are discarded.
QP1 traffic is allowed for all vHCAs, but special care is required to bridge
the gap between the host and network views.

Specifically:
- Transaction IDs are mapped to guarantee uniqueness among vHCAs
- CM para-virtualization
  o   Incoming requests are steered to the correct vHCA according to the embedded GID
  o   Local communication IDs are mapped to ensure uniqueness among vHCAs
- Multicast para-virtualization
  o   The PF context aggregates membership state from all vHCAs
  o   The SA is contacted only when the aggregate membership changes
  o   If the aggregate does not change, the PF context will provide the
       requesting vHCA with the proper response
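
To make the multicast aggregation rule above concrete, here is a minimal
sketch in plain C. It is a simplification under stated assumptions, not the
mcg.c implementation: membership is reduced to "joined or not" per vHCA
(the real join state has more detail), at most 32 vHCAs are tracked in a
bitmask, and all names are hypothetical. The point it illustrates is that the
SA needs to be contacted only when the aggregate over all vHCAs changes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Aggregate membership of one multicast group, kept by the PF context.
 * Bit v is set when vHCA v has joined (hypothetical, up to 32 vHCAs). */
struct sketch_mcg {
	uint32_t members;
};

/* vHCA joins. Returns true when the aggregate went from "no members" to
 * "some member", i.e. when a join must actually be sent to the SA.
 * Otherwise the PF context can answer the vHCA itself. */
static bool sketch_mcg_join(struct sketch_mcg *g, unsigned int vhca)
{
	if (vhca >= 32)
		return false;
	bool was_empty = (g->members == 0);
	g->members |= (uint32_t)1 << vhca;
	return was_empty;
}

/* vHCA leaves. Returns true when the last member just left, i.e. when a
 * leave must actually be sent to the SA. */
static bool sketch_mcg_leave(struct sketch_mcg *g, unsigned int vhca)
{
	if (vhca >= 32)
		return false;
	bool was_member = ((g->members >> vhca) & 1u) != 0;
	g->members &= ~((uint32_t)1 << vhca);
	return was_member && g->members == 0;
}
```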
 
Incoming MADs are steered according to:
- the DGID if a GRH is present
- the mapped transaction ID for response MADs
- the embedded GID in CM requests
- the remote communication ID in other CM messages
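
The steering criteria above can be sketched as a small decision function. This
is plain C with hypothetical names, not the demultiplexing code from the
patches. Two things in it are assumptions that the text does not state: the
order in which the criteria are tried, and the fallback of handing anything
unmatched to the PF. The transaction ID helpers show one possible way to keep
TIDs unique among vHCAs, by placing the vHCA number in the top 8 bits; the
original top bits would have to be saved elsewhere if they must be restored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Which field selects the destination vHCA for an incoming MAD. */
enum sketch_steer_key {
	SKETCH_BY_DGID,          /* GRH present: use the destination GID */
	SKETCH_BY_TID,           /* response MAD: use the mapped transaction ID */
	SKETCH_BY_CM_REQ_GID,    /* CM request: use the embedded GID */
	SKETCH_BY_CM_REMOTE_ID,  /* other CM message: remote communication ID */
	SKETCH_TO_PF             /* assumed fallback, not stated in the text */
};

/* Properties of an incoming MAD that matter for steering (hypothetical). */
struct sketch_mad_info {
	bool has_grh;
	bool is_response;
	bool is_cm;
	bool is_cm_req;
};

/* Pick the steering key; the precedence shown here is an assumption. */
static enum sketch_steer_key sketch_pick_key(const struct sketch_mad_info *m)
{
	if (m->has_grh)
		return SKETCH_BY_DGID;
	if (m->is_response)
		return SKETCH_BY_TID;
	if (m->is_cm)
		return m->is_cm_req ? SKETCH_BY_CM_REQ_GID
				    : SKETCH_BY_CM_REMOTE_ID;
	return SKETCH_TO_PF;
}

/* Hypothetical TID mapping: overwrite the top 8 bits with the vHCA number
 * on the way out so that the response can be routed back. */
static uint64_t sketch_tid_tag(uint64_t tid, uint8_t vhca)
{
	return (tid & 0x00FFFFFFFFFFFFFFull) | ((uint64_t)vhca << 56);
}

/* Recover the vHCA number from a tagged TID in a response. */
static uint8_t sketch_tid_vhca(uint64_t tid)
{
	return (uint8_t)(tid >> 56);
}
```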

To allow the host admin to control the virtual GID and PKey tables of vHCAs,
a new sysfs ‘iov’ sub-tree has been added under the PF infiniband device.
Details on this mechanism can be found in the change log of:
   IB/mlx4: Add iov directory in sysfs under the ib device

Known Issues
------------
1. librdmacm will currently not support multiple VF/PF on the same host.
   This will be fixed in V1.
2. FMRs are not currently supported on slaves. This will be corrected in a
   future submission.
3. RoCE is not currently supported on slaves. This will be corrected in a
   future submission.
4. Due to a (correct) change in kernel IRQ management in kernel 3.5-rc1 (see
   commit 1c6c69525b40), the KVM module no longer succeeds in passing interrupts
   through to guests (see the discussion thread beginning at
   https://lkml.org/lkml/2012/6/1/261).  Until this KVM issue is fixed, anyone
   wishing to use SRIOV-IB (or SRIOV-Ethernet) with ConnectX2 or ConnectX3
   devices in guest OSes should revert commit 1c6c69525b40 (as a TEMPORARY
   workaround) to enable the guests to operate the mlx4 driver.

   VFs may still be bound to the host (via setting the "probe_vf" mlx4_core
   module parameter to a non-zero value in a conf file under /etc/modprobe.d) 
   without reverting the commit mentioned above.

In addition, several of the patches have notations indicating things that
will be fixed in V1.

Amir Vadai (1):
  IB/mlx4: Add CM paravirtualization

Erez Shitrit (1):
  IB/sa: Add GuidInfoRecord query support.

Jack Morgenstein (26):
  net/mlx4_core: Pass an invalid PCI id number to VFs
  IB/mlx4: Mask out high order bit of port_num in mlx4_ib_create_ah
  IB/mlx4: Add run-time switchable error path debug output capability
  IB/core: change pkey table lookups to support full and partial
    membership for the same pkey
  IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey
    match
  IB/core: move macros from cm_msgs.h to ib_cm.h
  {NET,IB}/mlx4: Use port management change event instead of smp_snoop
  net/mlx4_core: For SRIOV, initialize ib port-capabilities for all
    slaves
  net/mlx4_core: Implement mechanism for reserved qkeys
  net/mlx4_core: Allow guests to support IB ports
  net/mlx4_core: place phys gid and pkey tbl sizes in mlx4_phys_caps
    struct and paravirtualize them
  IB/mlx4: SRIOV IB context objects and proxy/tunnel sqp support
  net/mlx4_core: Add proxy and tunnel QPs to the reserved QP area
  IB/mlx4: Initialize SRIOV IB support for slaves in master context
  {NET/IB}mlx4: Implement QP paravirtualization
  IB/mlx4: SRIOV multiplex and demultiplex MADs
  {NET,IB}/mlx4: MAD_IFC paravirtualization
  net/mlx4_core: Add IB port-state machine, and port mgmt event
    propagation infrastructure
  {NET,IB}/mlx4: Add alias_guid mechanism
  IB/mlx4: Propagate pkey and guid change port management events to
    slaves
  IB/mlx4: Add iov directory in sysfs under the ib device
  net/mlx4_core: Adjustments to SET_PORT for SRIOV-IB
  IB/mlx4: Initialize guid-cache index 0 (default guid)
  net/mlx4_core: INIT/CLOSE port logic for IB ports in SRIOV mode
  IB/mlx4: Miscellaneous adjustments to SRIOV IB support
  {NET/IB}mlx4: Activate SRIOV mode for IB

Oren Duer (1):
  IB/mlx4: Added Multicast Groups (MCG) para-virtualization for SRIOV

 drivers/infiniband/core/cache.c                    |   42 +-
 drivers/infiniband/core/cm_msgs.h                  |   12 -
 drivers/infiniband/core/device.c                   |   17 +-
 drivers/infiniband/core/sa_query.c                 |  133 ++
 drivers/infiniband/hw/mlx4/Makefile                |    2 +-
 drivers/infiniband/hw/mlx4/ah.c                    |    4 +-
 drivers/infiniband/hw/mlx4/alias_GUID.c            |  791 +++++++++
 drivers/infiniband/hw/mlx4/cm.c                    |  437 +++++
 drivers/infiniband/hw/mlx4/cq.c                    |   31 +-
 drivers/infiniband/hw/mlx4/mad.c                   | 1712 +++++++++++++++++++-
 drivers/infiniband/hw/mlx4/main.c                  |  284 +++-
 drivers/infiniband/hw/mlx4/mcg.c                   | 1254 ++++++++++++++
 drivers/infiniband/hw/mlx4/mlx4_ib.h               |  368 +++++-
 drivers/infiniband/hw/mlx4/qp.c                    |  663 +++++++-
 drivers/infiniband/hw/mlx4/sysfs.c                 |  808 +++++++++
 drivers/net/ethernet/mellanox/mlx4/cmd.c           |  179 ++-
 drivers/net/ethernet/mellanox/mlx4/en_main.c       |    5 +-
 drivers/net/ethernet/mellanox/mlx4/eq.c            |  257 +++-
 drivers/net/ethernet/mellanox/mlx4/fw.c            |  235 +++-
 drivers/net/ethernet/mellanox/mlx4/fw.h            |    3 +
 drivers/net/ethernet/mellanox/mlx4/intf.c          |    5 +-
 drivers/net/ethernet/mellanox/mlx4/main.c          |  103 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |  115 +-
 drivers/net/ethernet/mellanox/mlx4/port.c          |   21 +-
 drivers/net/ethernet/mellanox/mlx4/qp.c            |   66 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  220 +++-
 include/linux/mlx4/device.h                        |  168 ++-
 include/linux/mlx4/driver.h                        |    5 +-
 include/linux/mlx4/qp.h                            |    3 +-
 include/rdma/ib_cache.h                            |   16 +
 include/rdma/ib_cm.h                               |   12 +
 include/rdma/ib_sa.h                               |   33 +
 32 files changed, 7653 insertions(+), 351 deletions(-)
 create mode 100644 drivers/infiniband/hw/mlx4/alias_GUID.c
 create mode 100644 drivers/infiniband/hw/mlx4/cm.c
 create mode 100644 drivers/infiniband/hw/mlx4/mcg.c
 create mode 100644 drivers/infiniband/hw/mlx4/sysfs.c

Cc: dotanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
Cc: tziporet-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
