All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Adding support for RoCE V2 specification
@ 2014-11-27  0:56 Somnath Kotur
       [not found] ` <51dd1341-8216-494f-9574-120634cad8e7-3RiH6ntJJkP8BX6JNMqfyFjyZtpTMMwT@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Somnath Kotur @ 2014-11-27  0:56 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Somnath Kotur

Please apply.

Somnath Kotur (2):
  IB/Core: Changes to the IB Core infrastructure for RoCE V2.
  IB/ocrdma: Changes to comply with RoCEv2 Specification changes

 drivers/infiniband/core/addr.c              |   13 +++++-
 drivers/infiniband/core/cm.c                |   15 ++++---
 drivers/infiniband/core/cma.c               |    4 +-
 drivers/infiniband/core/sa_query.c          |    1 +
 drivers/infiniband/core/verbs.c             |   47 +++++++++++++++++---
 drivers/infiniband/hw/ocrdma/ocrdma.h       |    2 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c    |   64 +++++++++++++++++++++-----
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c    |   26 +++++++++++
 drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |   16 ++++++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |   11 ++++-
 include/rdma/ib_addr.h                      |    3 +-
 include/rdma/ib_sa.h                        |    1 +
 include/rdma/ib_verbs.h                     |   15 ++++++
 13 files changed, 187 insertions(+), 31 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] Adding support for RoCE V2 specification
       [not found] ` <51dd1341-8216-494f-9574-120634cad8e7-3RiH6ntJJkP8BX6JNMqfyFjyZtpTMMwT@public.gmane.org>
@ 2014-11-27  9:56   ` Moni Shoua
       [not found]     ` <CAG9sBKPGbHc49rLx+z3N+bgzMakFks2ihvgAkZpmjUz==2ZYDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2014-11-27 15:52   ` Shachar Raindel
  1 sibling, 1 reply; 10+ messages in thread
From: Moni Shoua @ 2014-11-27  9:56 UTC (permalink / raw)
  To: Somnath Kotur; +Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma

On Thu, Nov 27, 2014 at 2:56 AM, Somnath Kotur <somnath.kotur-laKkSmNT4hZhl2p70BpVqQ@public.gmane.orgm> wrote:
> Please apply.
>
> Somnath Kotur (2):
>   IB/Core: Changes to the IB Core infrastructure for RoCE V2.
>   IB/ocrdma: Changes to comply with RoCEv2 Specification changes
>
>  drivers/infiniband/core/addr.c              |   13 +++++-
>  drivers/infiniband/core/cm.c                |   15 ++++---
>  drivers/infiniband/core/cma.c               |    4 +-
>  drivers/infiniband/core/sa_query.c          |    1 +
>  drivers/infiniband/core/verbs.c             |   47 +++++++++++++++++---
>  drivers/infiniband/hw/ocrdma/ocrdma.h       |    2 +
>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c    |   64 +++++++++++++++++++++-----
>  drivers/infiniband/hw/ocrdma/ocrdma_hw.c    |   26 +++++++++++
>  drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |   16 ++++++-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |   11 ++++-
>  include/rdma/ib_addr.h                      |    3 +-
>  include/rdma/ib_sa.h                        |    1 +
>  include/rdma/ib_verbs.h                     |   15 ++++++
>  13 files changed, 187 insertions(+), 31 deletions(-)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Few comments to the IB/core part

1.       RoCEv2 spec (Annex A17) sees RoCE type (or as the patch name
it - network_hdr_type) as an attribute of the source GID. If so, there
is no point to for address structures (ib_ah_attr, ib_sa_path_rec, …)
to have both sgid and RoCE type. The alternative is to keep an extra
info in each GID table entry about the RoCE header (IPv4/IPv6 or GRH)
and use it when modifying QP from INIT to RTR (RC/UC) or creating AH
(UD).

2.       Population of  table in for RoCE should be a common code to
all drivers and needs to be moved to IB/core (unlike today when it is
specific to a vendor). We plan to push such a change soon.

3.       If I get it right, the purpose of the change in ib_addr is to
that determine RoCE type based on the resolution of address from OS.
If gateway is used (dest address is not in the subnet of source
address) then the protocol will be V2, otherwise it will be V1. This
means that IB over UDP is not possible on the same network. I don't
think that this is acceptable.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 0/2] Adding support for RoCE V2 specification
       [not found] ` <51dd1341-8216-494f-9574-120634cad8e7-3RiH6ntJJkP8BX6JNMqfyFjyZtpTMMwT@public.gmane.org>
  2014-11-27  9:56   ` Moni Shoua
@ 2014-11-27 15:52   ` Shachar Raindel
       [not found]     ` <583cdb13b37145de92f9321e325f89b6-LOZWmgKjnYgQouBfZGh8ttqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  1 sibling, 1 reply; 10+ messages in thread
From: Shachar Raindel @ 2014-11-27 15:52 UTC (permalink / raw)
  To: Somnath Kotur, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi,

> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Somnath Kotur
> Sent: Thursday, November 27, 2014 2:56 AM
> To: roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Somnath Kotur
> Subject: [PATCH 0/2] Adding support for RoCE V2 specification
> 
> Please apply.
> 

This cover letter is rather short. Can you please elaborate?

Specifically, I will be happy to hear what is the expected usage model with RoCE v2.

What are the goals of the changes you are asking to push?

What assumptions do you have on the network topology? How do you decide when to use RoCE v1 and RoCE v2?

What is the integration of RoCE v2 with the CMA?

What changes in the user space libraries and applications are needed to support RoCE v2 on base of this kernel patch?


> Somnath Kotur (2):
>   IB/Core: Changes to the IB Core infrastructure for RoCE V2.
>   IB/ocrdma: Changes to comply with RoCEv2 Specification changes
> 
>  drivers/infiniband/core/addr.c              |   13 +++++-
>  drivers/infiniband/core/cm.c                |   15 ++++---
>  drivers/infiniband/core/cma.c               |    4 +-
>  drivers/infiniband/core/sa_query.c          |    1 +
>  drivers/infiniband/core/verbs.c             |   47 +++++++++++++++++---
>  drivers/infiniband/hw/ocrdma/ocrdma.h       |    2 +
>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c    |   64
> +++++++++++++++++++++-----
>  drivers/infiniband/hw/ocrdma/ocrdma_hw.c    |   26 +++++++++++
>  drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |   16 ++++++-
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |   11 ++++-
>  include/rdma/ib_addr.h                      |    3 +-
>  include/rdma/ib_sa.h                        |    1 +
>  include/rdma/ib_verbs.h                     |   15 ++++++
>  13 files changed, 187 insertions(+), 31 deletions(-)
> 

Thanks,
--Shachar
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 0/2] Adding support for RoCE V2 specification
       [not found]     ` <583cdb13b37145de92f9321e325f89b6-LOZWmgKjnYgQouBfZGh8ttqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2014-11-28  6:30       ` Somnath Kotur
  0 siblings, 0 replies; 10+ messages in thread
From: Somnath Kotur @ 2014-11-28  6:30 UTC (permalink / raw)
  To: Shachar Raindel, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Thank you Shachar for your comments. Sure, will respond to your queries and elaborate more with an updated cover note 
and re-send the patch series . Hope that is fine with you?

Thanks
Som

> -----Original Message-----
> From: Shachar Raindel [mailto:raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org]
> Sent: Thursday, November 27, 2014 9:22 PM
> To: Somnath Kotur; roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: RE: [PATCH 0/2] Adding support for RoCE V2 specification
> 
> Hi,
> 
> > -----Original Message-----
> > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> > owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Somnath Kotur
> > Sent: Thursday, November 27, 2014 2:56 AM
> > To: roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> > Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Somnath Kotur
> > Subject: [PATCH 0/2] Adding support for RoCE V2 specification
> >
> > Please apply.
> >
> 
> This cover letter is rather short. Can you please elaborate?
> 
> Specifically, I will be happy to hear what is the expected usage model with
> RoCE v2.
> 
> What are the goals of the changes you are asking to push?
> 
> What assumptions do you have on the network topology? How do you
> decide when to use RoCE v1 and RoCE v2?
> 
> What is the integration of RoCE v2 with the CMA?
> 
> What changes in the user space libraries and applications are needed to
> support RoCE v2 on base of this kernel patch?
> 
> 
> > Somnath Kotur (2):
> >   IB/Core: Changes to the IB Core infrastructure for RoCE V2.
> >   IB/ocrdma: Changes to comply with RoCEv2 Specification changes
> >
> >  drivers/infiniband/core/addr.c              |   13 +++++-
> >  drivers/infiniband/core/cm.c                |   15 ++++---
> >  drivers/infiniband/core/cma.c               |    4 +-
> >  drivers/infiniband/core/sa_query.c          |    1 +
> >  drivers/infiniband/core/verbs.c             |   47 +++++++++++++++++---
> >  drivers/infiniband/hw/ocrdma/ocrdma.h       |    2 +
> >  drivers/infiniband/hw/ocrdma/ocrdma_ah.c    |   64
> > +++++++++++++++++++++-----
> >  drivers/infiniband/hw/ocrdma/ocrdma_hw.c    |   26 +++++++++++
> >  drivers/infiniband/hw/ocrdma/ocrdma_sli.h   |   16 ++++++-
> >  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |   11 ++++-
> >  include/rdma/ib_addr.h                      |    3 +-
> >  include/rdma/ib_sa.h                        |    1 +
> >  include/rdma/ib_verbs.h                     |   15 ++++++
> >  13 files changed, 187 insertions(+), 31 deletions(-)
> >
> 
> Thanks,
> --Shachar
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 0/2] Adding support for RoCE V2 specification
       [not found]     ` <CAG9sBKPGbHc49rLx+z3N+bgzMakFks2ihvgAkZpmjUz==2ZYDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-11-28 11:51       ` Somnath Kotur
       [not found]         ` <7F44EA5110810A40B7DAFB605C41975D312F40F4-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
       [not found]         ` <CAG9sBKMDLNnO1Q8QRNNq9cAnTdEL3DXPhpUqwKqFaN1aMGyomw@mail.gmail.com>
  0 siblings, 2 replies; 10+ messages in thread
From: Somnath Kotur @ 2014-11-28 11:51 UTC (permalink / raw)
  To: Moni Shoua; +Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma

Hi Moni/Shachar,
	Thank you for your feedback/comments. I have tried to address  some of your concerns /comments inline below.


> Few comments to the IB/core part
> 
> 1.       RoCEv2 spec (Annex A17) sees RoCE type (or as the patch name
> it - network_hdr_type) as an attribute of the source GID. If so, there is no
> point to for address structures (ib_ah_attr, ib_sa_path_rec, …) to have both
> sgid and RoCE type. The alternative is to keep an extra info in each GID table
> entry about the RoCE header (IPv4/IPv6 or GRH) and use it when modifying
> QP from INIT to RTR (RC/UC) or creating AH (UD).
Yes, that is correct.

The overarching goal behind this patch is to keep RDMA-CM as the central entity that decides the protocol (ROCEV2 /ROCEV1)
and hides all the address resolution details from applications.
This  is just a continuation of the existing RDMA-CM design philosophy for IP Based GIDs that lets applications operate over any form
of RDMA Service in a completely transparent way without any changes to the applications themselves.

Protocol Resolution(RoCEV2 or V1) must be done even before we 
send out the 1st connection request packet, the corresponding MAD pkt must be V2 for a destination that is behind  a router, correct?
I am not sure we want to try sending out RoCEv2 connection requests first and if that fails ,then fallback/retry with RoCEv1 request.
That would needlessly complicate and slow down the connection-establishment process, do you agree ?
Instead it's much more seamless to take help of the IP stack to select the network header type and subsequently the ROCE pkt format while
attempting to establish the connection.

The one thing that my patch does miss out though is the ability for local subnet end nodes to be able to communicate using RoCEv2 packet formats.
(The default policy in this patch is to use ROCEV1 for nodes connected directly /within the same local subnet.)
I believe that can be achieved by extending RDMA-CM to override the default protocol selection policy proposed in this patch by making use of
the GID Entry attribute type in another patch.
This scheme should still work with your proposal  in point# 2 , where GID Table management would move to a central/stack module instead of being
vendor specific.

Thanks
Som
 
> 2.       Population of  table in for RoCE should be a common code to
> all drivers and needs to be moved to IB/core (unlike today when it is specific
> to a vendor). We plan to push such a change soon.
> 
> 3.       If I get it right, the purpose of the change in ib_addr is to
> that determine RoCE type based on the resolution of address from OS.
> If gateway is used (dest address is not in the subnet of source
> address) then the protocol will be V2, otherwise it will be V1. This means that
> IB over UDP is not possible on the same network. I don't think that this is
> acceptable.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] Adding support for RoCE V2 specification
       [not found]         ` <7F44EA5110810A40B7DAFB605C41975D312F40F4-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
@ 2014-11-30  6:32           ` Moni Shoua
  0 siblings, 0 replies; 10+ messages in thread
From: Moni Shoua @ 2014-11-30  6:32 UTC (permalink / raw)
  To: Somnath Kotur; +Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma

> Yes, that is correct.
>
> The overarching goal behind this patch is to keep RDMA-CM as the central entity that decides the protocol (ROCEV2 /ROCEV1)
> and hides all the address resolution details from applications.
I agree but with one comment. RDMA-CM, although preferred,  is not
mandatory to establish a connection. The solution needs to take care
also for
apps that don't use RDMA-CM

> This  is just a continuation of the existing RDMA-CM design philosophy for IP Based GIDs that lets applications operate over any form
> of RDMA Service in a completely transparent way without any changes to the applications themselves.
>
> Protocol Resolution(RoCEV2 or V1) must be done even before we
> send out the 1st connection request packet, the corresponding MAD pkt must be V2 for a destination that is behind  a router, correct?
> I am not sure we want to try sending out RoCEv2 connection requests first and if that fails ,then fallback/retry with RoCEv1 request.
I agree but when you send the first packet (request or response) you
know the SGID index to be used. Knowing that and assuming you chose
the SGID index correctly there is no need to guess.
A suggestion for choosing the SGID index is:
1. First, populate GID table with GIDs matching the IP address list of
the corresponding net devices. Each GID will appear twice, one for
RoCEv1 and one for RoCEv2.
2. When connection is being established RDMA-CM will choose one of the
2 matching indices based on a
preferred protocol attribute for the rmda_id (new attr.)
3. Proffered protocol is
     a) global for the node/fabric based on the administrator settings
     b) can be changed by applications that want to override the
default (requires extra API)


> That would needlessly complicate and slow down the connection-establishment process, do you agree ?
> Instead it's much more seamless to take help of the IP stack to select the network header type and subsequently the ROCE pkt format while
> attempting to establish the connection.
>
> The one thing that my patch does miss out though is the ability for local subnet end nodes to be able to communicate using RoCEv2 packet formats.
> (The default policy in this patch is to use ROCEV1 for nodes connected directly /within the same local subnet.)
> I believe that can be achieved by extending RDMA-CM to override the default protocol selection policy proposed in this patch by making use of
> the GID Entry attribute type in another patch.
Right, and this seems to fit the proposal above.


> This scheme should still work with your proposal  in point# 2 , where GID Table management would move to a central/stack module instead of being
> vendor specific.
>
> Thanks
> Som
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 0/2] Adding support for RoCE V2 specification
       [not found]           ` <CAG9sBKMDLNnO1Q8QRNNq9cAnTdEL3DXPhpUqwKqFaN1aMGyomw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-12-02  5:35             ` Somnath Kotur
       [not found]               ` <7F44EA5110810A40B7DAFB605C41975D312F4C7C-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Somnath Kotur @ 2014-12-02  5:35 UTC (permalink / raw)
  To: Moni Shoua
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 3676 bytes --]

HI Moni,
	Thank you for your comments , please find my response inline.


> > The overarching goal behind this patch is to keep RDMA-CM as the
> > central entity that decides the protocol (ROCEV2 /ROCEV1) and hides all the
> address resolution details from applications.
> I agree but with one comment. RDMA-CM, although preferred,  is not
> mandatory to establish a connection. The solution needs to take care also for
> apps that don't use RDMA-CM

Agreed and this patch takes care of that as well thanks to the solution that was put in place 
for IP-based GIDs anyway in `modify_qp` effectively by means of the ib_resolve_l2_attrs().

> 
> > This  is just a continuation of the existing RDMA-CM design philosophy
> > for IP Based GIDs that lets applications operate over any form of RDMA
> Service in a completely transparent way without any changes to the
> applications themselves.
> >
> > Protocol Resolution(RoCEV2 or V1) must be done even before we send out
> > the 1st connection request packet, the corresponding MAD pkt must be V2
> for a destination that is behind  a router, correct?
> > I am not sure we want to try sending out RoCEv2 connection requests first
> and if that fails ,then fallback/retry with RoCEv1 request.
> I agree but when you send the first packet (request or response) you know
> the SGID index to be used. Knowing that and assuming you chose the SGID
> index correctly there is no need to guess.
> A suggestion for choosing the SGID index is:
> 1. First, populate GID table with GIDs matching the IP address list of the
> corresponding net devices. Each GID will appear twice, one for
> RoCEv1 and one for RoCEv2.
> 2. When connection is being established RDMA-CM will choose one of the
> 2 matching indices based on a preferred protocol attribute for the rmda_id
> (new attr.) 3. Proffered protocol is
>      a) global for the node/fabric based on the administrator settings
>      b) can be changed by applications that want to override the default
> (requires extra API)

Agree to point 1 which is what we were thinking of doing as well as part of the 'override' option.
Still feel that the 'default' protocol resolution should be based on help from the IP/Networking stack
as that would straight away enable existing applications and majority of use-case scenarios that use RDMA-CM to seamlessly
work without any change for RoCE V2.
What this means is now RDMA-CM will choose one of the 2 matching GID indices based on help from the IP stack.

For this to happen , the 'gid_cache' structure will have to undergo a change to now incorporate a new 'gid_type' field. 
Accordingly the helper functions that find a cached gid entry - ib_find_cached_gid() will also undergo a change to take into account this new field.

I plan to resubmit the patch series making the following changes
a) Populating GID table with 2 entries per IP address
b) Changes to the GID cache structure along with it's helper functions
c) Make sure that RDMA-CM selects/finds the corresponding GID once a network type is chosen based on help from the IP stack. 


The only thing missing is the ability for end-nodes connected locally inside same subnet to communicate using V2 and that 
could be provided as an 'override' option using the 'preferred protocol' attribute setting on a system-wide basis or on an application basis 
as you have proposed.
This I propose as an additional patch after the above patch goes in first.

Does this sound like a plan ?

Thanks
Som

N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±­ÙšŠ{ayº\x1dʇڙë,j\a­¢f£¢·hš‹»öì\x17/oSc¾™Ú³9˜uÀ¦æå‰È&jw¨®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿïêäz¹Þ–Šàþf£¢·hšˆ§~ˆmš

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] Adding support for RoCE V2 specification
       [not found]               ` <7F44EA5110810A40B7DAFB605C41975D312F4C7C-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
@ 2014-12-02  6:47                 ` Moni Shoua
       [not found]                   ` <CAG9sBKNt0sKixnKj_8ANc3ef4kwEGgrH_5WcCWr3-VfSSf8Ldw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Moni Shoua @ 2014-12-02  6:47 UTC (permalink / raw)
  To: Somnath Kotur
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Dec 2, 2014 at 7:35 AM, Somnath Kotur <Somnath.Kotur-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org> wrote:
> HI Moni,
>         Thank you for your comments , please find my response inline.
>
>
>> > The overarching goal behind this patch is to keep RDMA-CM as the
>> > central entity that decides the protocol (ROCEV2 /ROCEV1) and hides all the
>> address resolution details from applications.
>> I agree but with one comment. RDMA-CM, although preferred,  is not
>> mandatory to establish a connection. The solution needs to take care also for
>> apps that don't use RDMA-CM
>
> Agreed and this patch takes care of that as well thanks to the solution that was put in place
> for IP-based GIDs anyway in `modify_qp` effectively by means of the ib_resolve_l2_attrs().
>
>>
>> > This  is just a continuation of the existing RDMA-CM design philosophy
>> > for IP Based GIDs that lets applications operate over any form of RDMA
>> Service in a completely transparent way without any changes to the
>> applications themselves.
>> >
>> > Protocol Resolution(RoCEV2 or V1) must be done even before we send out
>> > the 1st connection request packet, the corresponding MAD pkt must be V2
>> for a destination that is behind  a router, correct?
>> > I am not sure we want to try sending out RoCEv2 connection requests first
>> and if that fails ,then fallback/retry with RoCEv1 request.
>> I agree but when you send the first packet (request or response) you know
>> the SGID index to be used. Knowing that and assuming you chose the SGID
>> index correctly there is no need to guess.
>> A suggestion for choosing the SGID index is:
>> 1. First, populate GID table with GIDs matching the IP address list of the
>> corresponding net devices. Each GID will appear twice, one for
>> RoCEv1 and one for RoCEv2.
>> 2. When connection is being established RDMA-CM will choose one of the
>> 2 matching indices based on a preferred protocol attribute for the rmda_id
>> (new attr.) 3. Proffered protocol is
>>      a) global for the node/fabric based on the administrator settings
>>      b) can be changed by applications that want to override the default
>> (requires extra API)
>
> Agree to point 1 which is what we were thinking of doing as well as part of the 'override' option.
> Still feel that the 'default' protocol resolution should be based on help from the IP/Networking stack
> as that would straight away enable existing applications and majority of use-case scenarios that use RDMA-CM to seamlessly
> work without any change for RoCE V2.
> What this means is now RDMA-CM will choose one of the 2 matching GID indices based on help from the IP stack.
>
> For this to happen , the 'gid_cache' structure will have to undergo a change to now incorporate a new 'gid_type' field.
> Accordingly the helper functions that find a cached gid entry - ib_find_cached_gid() will also undergo a change to take into account this new field.
>
> I plan to resubmit the patch series making the following changes
> a) Populating GID table with 2 entries per IP address
> b) Changes to the GID cache structure along with it's helper functions
> c) Make sure that RDMA-CM selects/finds the corresponding GID once a network type is chosen based on help from the IP stack.
>
>
> The only thing missing is the ability for end-nodes connected locally inside same subnet to communicate using V2 and that
> could be provided as an 'override' option using the 'preferred protocol' attribute setting on a system-wide basis or on an application basis
> as you have proposed.
> This I propose as an additional patch after the above patch goes in first.
>
> Does this sound like a plan ?
>
> Thanks
> Som
>
Hi Som
Generally, I agree with the changes you suggested. To summarize (and
make sure I got it right) your plan for V2 is to:

1. Associate gid_index with RoCE type in GID table and GID cache.
Requires helper functions to search for gid_index according to GID
value and type
2. Move the code that monitors IP address changes to populate GID
table from device driver to IB/core. Requires each RoCE device driver
to implement add_gid_entry/rem_gid_entry functions
3. Remove the RoCE type from address structure (e.g. ib_ah_attr).
Leave it only in rdma_addr to keep the hint from the IP stack
4. Choose sgid_index (from all matching entries) in RDMA-CM based on
hint from IP stack, user preference and admin policy

If so, I guess that we have a plan.

Thanks
Moni
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 0/2] Adding support for RoCE V2 specification
       [not found]                   ` <CAG9sBKNt0sKixnKj_8ANc3ef4kwEGgrH_5WcCWr3-VfSSf8Ldw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-12-02  9:20                     ` Somnath Kotur
       [not found]                       ` <7F44EA5110810A40B7DAFB605C41975D312F642B-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Somnath Kotur @ 2014-12-02  9:20 UTC (permalink / raw)
  To: Moni Shoua
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Moni,


> -----Original Message-----
> From: monisonlists@gmail.com [mailto:monisonlists@gmail.com] On Behalf
> Of Moni Shoua
> Sent: Tuesday, December 02, 2014 12:17 PM
> To: Somnath Kotur
> Cc: roland@kernel.org; linux-rdma@vger.kernel.org
> Subject: Re: [PATCH 0/2] Adding support for RoCE V2 specification
> 
> On Tue, Dec 2, 2014 at 7:35 AM, Somnath Kotur
> <Somnath.Kotur@emulex.com> wrote:
> > HI Moni,
> >         Thank you for your comments , please find my response inline.
> >
> >
> >> > The overarching goal behind this patch is to keep RDMA-CM as the
> >> > central entity that decides the protocol (ROCEV2 /ROCEV1) and hides
> >> > all the
> >> address resolution details from applications.
> >> I agree but with one comment. RDMA-CM, although preferred,  is not
> >> mandatory to establish a connection. The solution needs to take care
> >> also for apps that don't use RDMA-CM
> >
> > Agreed and this patch takes care of that as well thanks to the
> > solution that was put in place for IP-based GIDs anyway in `modify_qp`
> effectively by means of the ib_resolve_l2_attrs().
> >
> >>
> >> > This  is just a continuation of the existing RDMA-CM design
> >> > philosophy for IP Based GIDs that lets applications operate over
> >> > any form of RDMA
> >> Service in a completely transparent way without any changes to the
> >> applications themselves.
> >> >
> >> > Protocol Resolution(RoCEV2 or V1) must be done even before we send
> >> > out the 1st connection request packet, the corresponding MAD pkt
> >> > must be V2
> >> for a destination that is behind  a router, correct?
> >> > I am not sure we want to try sending out RoCEv2 connection requests
> >> > first
> >> and if that fails ,then fallback/retry with RoCEv1 request.
> >> I agree but when you send the first packet (request or response) you
> >> know the SGID index to be used. Knowing that and assuming you chose
> >> the SGID index correctly there is no need to guess.
> >> A suggestion for choosing the SGID index is:
> >> 1. First, populate GID table with GIDs matching the IP address list
> >> of the corresponding net devices. Each GID will appear twice, one for
> >> RoCEv1 and one for RoCEv2.
> >> 2. When connection is being established RDMA-CM will choose one of
> >> the
> >> 2 matching indices based on a preferred protocol attribute for the
> >> rmda_id (new attr.) 3. Proffered protocol is
> >>      a) global for the node/fabric based on the administrator settings
> >>      b) can be changed by applications that want to override the
> >> default (requires extra API)
> >
> > Agree to point 1 which is what we were thinking of doing as well as part of
> the 'override' option.
> > Still feel that the 'default' protocol resolution should be based on
> > help from the IP/Networking stack as that would straight away enable
> > existing applications and majority of use-case scenarios that use RDMA-CM
> to seamlessly work without any change for RoCE V2.
> > What this means is now RDMA-CM will choose one of the 2 matching GID
> indices based on help from the IP stack.
> >
> > For this to happen , the 'gid_cache' structure will have to undergo a change
> to now incorporate a new 'gid_type' field.
> > Accordingly the helper functions that find a cached gid entry -
> ib_find_cached_gid() will also undergo a change to take into account this
> new field.
> >
> > I plan to resubmit the patch series making the following changes
> > a) Populating GID table with 2 entries per IP address
> > b) Changes to the GID cache structure along with it's helper functions
> > c) Make sure that RDMA-CM selects/finds the corresponding GID once a
> network type is chosen based on help from the IP stack.
> >
> >
> > The only thing missing is the ability for end-nodes connected locally
> > inside same subnet to communicate using V2 and that could be provided
> > as an 'override' option using the 'preferred protocol' attribute setting on a
> system-wide basis or on an application basis as you have proposed.
> > This I propose as an additional patch after the above patch goes in first.
> >
> > Does this sound like a plan ?
> >
> > Thanks
> > Som
> >
> Hi Som
> Generally, I agree with the changes you suggested. To summarize (and make
> sure I got it right) your plan for V2 is to:
> 
> 1. Associate gid_index with RoCE type in GID table and GID cache.
> Requires helper functions to search for gid_index according to GID value and
> type 2. Move the code that monitors IP address changes to populate GID
> table from device driver to IB/core. Requires each RoCE device driver to
> implement add_gid_entry/rem_gid_entry functions 3. Remove the RoCE
> type from address structure (e.g. ib_ah_attr).
> Leave it only in rdma_addr to keep the hint from the IP stack 4. Choose
> sgid_index (from all matching entries) in RDMA-CM based on hint from IP
> stack, user preference and admin policy

Agree with all your points above , except for point# 2.  That IMHO would be more of a code refactoring/cleanup patch 
that can be taken up later once the baseline version is checked in as it's not really related to this patch ?
Also, if we do that , then we also have to take care of  the IB case as well where GID is something that is obtained from the port/card(query_gid)
as opposed to something that is pushed down from the stack as you are now proposing for IP-based GIDs.. i.e implications are much more than
what this patch intends to address.
So I'd rather do this at a later point in time and not club it with V2, sound good ?

Thanks
Som



> 
> If so, I guess that we have a plan.
> 
> Thanks
> Moni

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] Adding support for RoCE V2 specification
       [not found]                       ` <7F44EA5110810A40B7DAFB605C41975D312F642B-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
@ 2014-12-02 12:14                         ` Moni Shoua
  0 siblings, 0 replies; 10+ messages in thread
From: Moni Shoua @ 2014-12-02 12:14 UTC (permalink / raw)
  To: Somnath Kotur
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, linux-rdma-u79uwXL29TY76Z2rM5mHXA

>
> Agree with all your points above , except for point# 2.  That IMHO would be more of a code refactoring/cleanup patch
> that can be taken up later once the baseline version is checked in as it's not really related to this patch ?
> Also, if we do that , then we also have to take care of  the IB case as well where GID is something that is obtained from the port/card(query_gid)
> as opposed to something that is pushed down from the stack as you are now proposing for IP-based GIDs.. i.e implications are much more than
> what this patch intends to address.
> So I'd rather do this at a later point in time and not club it with V2, sound good ?
>
> Thanks
> Som

Hi Som
I think that it can be done in 2 stages. Moving GID table management
to IB/core is indeed not mandatory for RoCEv2.
However, you might want to consider it anyway since RoCEv2 support is
going to add changes to GID table management anyhow.
So, instead of modifying each device driver sperately, do it once
centrally and add a plugin function in struct ib_device (e.g.
ib_write_gid(union ib_gid, struct ib_gid_info *info)). We can work in
parallel and implement the mlx4 function while you concentrate in
IB/core and ocrdma.

I leave it for you to decide how to proceed from here.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-12-02 12:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-27  0:56 [PATCH 0/2] Adding support for RoCE V2 specification Somnath Kotur
     [not found] ` <51dd1341-8216-494f-9574-120634cad8e7-3RiH6ntJJkP8BX6JNMqfyFjyZtpTMMwT@public.gmane.org>
2014-11-27  9:56   ` Moni Shoua
     [not found]     ` <CAG9sBKPGbHc49rLx+z3N+bgzMakFks2ihvgAkZpmjUz==2ZYDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-11-28 11:51       ` Somnath Kotur
     [not found]         ` <7F44EA5110810A40B7DAFB605C41975D312F40F4-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
2014-11-30  6:32           ` Moni Shoua
     [not found]         ` <CAG9sBKMDLNnO1Q8QRNNq9cAnTdEL3DXPhpUqwKqFaN1aMGyomw@mail.gmail.com>
     [not found]           ` <CAG9sBKMDLNnO1Q8QRNNq9cAnTdEL3DXPhpUqwKqFaN1aMGyomw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-12-02  5:35             ` Somnath Kotur
     [not found]               ` <7F44EA5110810A40B7DAFB605C41975D312F4C7C-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
2014-12-02  6:47                 ` Moni Shoua
     [not found]                   ` <CAG9sBKNt0sKixnKj_8ANc3ef4kwEGgrH_5WcCWr3-VfSSf8Ldw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-12-02  9:20                     ` Somnath Kotur
     [not found]                       ` <7F44EA5110810A40B7DAFB605C41975D312F642B-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
2014-12-02 12:14                         ` Moni Shoua
2014-11-27 15:52   ` Shachar Raindel
     [not found]     ` <583cdb13b37145de92f9321e325f89b6-LOZWmgKjnYgQouBfZGh8ttqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2014-11-28  6:30       ` Somnath Kotur

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.