* Re: QoS in local SA entity
From: Or Gerlitz @ 2009-11-05 12:07 UTC
  To: Sean Hefty, Roland Dreier; +Cc: linux-rdma, Jason Gunthorpe

>
> I think this really needs to be discussed wrt the implementation of the entity providing the path records.
Fair enough, let's do it then...

> I think what's needed is a way for the SA to distribute QoS information to the end nodes, so that the decisions can be made locally.  If someone wants some sort of dynamic QoS management and is happy using a small cluster, then they can disable any local SA entities and contact the SA directly.

I believe we can also take a middle way, where the SA isn't contacted
directly using a path query for each resolution, but rather "indirectly",
e.g. using a dedicated multicast-based protocol.

> In the case of ACM, the pkey is embedded in the MGID.  'Something' could tell the SA to create ACM multicast groups using a specific SL for a given MGID or pkey in the join request.  That SL would be distributed to the end nodes when they joined their groups.

So assuming ACM supports AF_INET, using a network stack route lookup on
the destination address / rdma_bind on the source address, etc. as we
discussed, ACM can use the rdma-cm to resolve the pkey, then use this
pkey in the MGID, and management software could tell the SA to use a
specific SL for MGIDs on this partition. Next, ACM can use this SL in
the path it generates for the IB connection. Does that make sense?

> The entity that provides the path records cannot depend on calling into the librdmacm.  The dependency needs to go the other way.
I understand that you want as few dependencies as possible, but
I believe that my suggestion doesn't contradict your design but rather
enhances it.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* RE: QoS in local SA entity
From: Sean Hefty @ 2009-11-05 16:40 UTC
  To: 'Or Gerlitz', Roland Dreier; +Cc: linux-rdma, Jason Gunthorpe

>I believe we can go also on a middle way, where the SA isn't contacted
>directly using path query for each resolution, but rather "indirectly"
>e.g using a dedicated multicast based protocol.

Yes - I wasn't trying to limit how the SA could 'distribute' QoS information to
the end nodes.  ACM will obtain QoS information from the SA when it joins its
multicast groups.

>So assuming ACM supports AF_INET, using network stack route lookup on
>the destination address / rdma_bind on the source address, etc as we
>discussed, ACM can use the rdma-cm to resolve the pkey, then use this
>pkey the MGID and a management software could tell the SA to use a
>specific SL for MGIDs on this partition. Next, ACM can use this SL in
>the path it generates for the IB connection, makes sense?

ACM is intended to be a service that's used by the librdmacm to resolve address
mappings and routes.  Trying to have ACM use the librdmacm ends up with a
circular dependency.  That's the part I'm trying to avoid.

ACM uses address mappings as defined in an address configuration file (IP ->
device, port, pkey).  The address file can be created using the provided ib_acme
utility, which uses the current system configuration (in an ugly way, but it
works).  I think this provides QoS behavior similar to what you're describing.
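
As an illustration of the IP -> (device, port, pkey) mapping described
above, here is a minimal sketch in C of parsing one line of such an address
file. The line format ("<ip> <device> <port> <pkey-hex>"), the struct, and
the function name are assumptions made for this example; they are not taken
from the file that ib_acme actually generates.

```c
#include <stdio.h>
#include <string.h>

/* One address mapping: IP -> (device, port, pkey).
 * The struct layout and the text format are assumptions for illustration. */
struct addr_map {
    char ip[46];       /* textual IPv4/IPv6 address */
    char device[32];   /* RDMA device name */
    int port;          /* 1-based port number */
    unsigned pkey;     /* partition key */
};

/* Parse a line of the assumed form "<ip> <device> <port> <pkey-hex>".
 * Returns 0 on success, -1 if the line is blank, a comment, or malformed. */
int parse_addr_line(const char *line, struct addr_map *out)
{
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '#' || *line == '\n' || *line == '\0')
        return -1;
    if (sscanf(line, "%45s %31s %d %x", out->ip, out->device,
               &out->port, &out->pkey) != 4)
        return -1;
    if (out->port < 1 || out->pkey > 0xffff)
        return -1;
    return 0;
}
```

A service holding a table of such entries would know which local addresses
it may answer for, which is the role the address file plays here.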

At some future point, the ib_acm service can be merged with ib_acme to respond
to dynamic changes in ipoib address mappings, but that's a non-trivial amount of
work and involves changes to the ACM multicast groups.

- Sean


* Re: QoS in local SA entity
From: Or Gerlitz @ 2009-11-08  6:25 UTC
  To: Sean Hefty; +Cc: linux-rdma

Sean Hefty wrote:
> I wasn't trying to limit how the SA could 'distribute' QoS information to the end nodes.  ACM will obtain QoS information from the SA when it joins its
> multicast groups
Excellent... still, this depends on how the ACM MGIDs are
constructed; I'll take a look at the code.

> ACM is intended to be a service that's used by the librdmacm to resolve address mappings and routes.  Trying to have ACM use the librdmacm ends up with a circular dependency.  That's the part I'm trying to avoid.

Fair enough. I believe that my suggestion is also doable without a
circular dependency, e.g. as you indicated below or with a fairly small
enhancement of librdmacm, see next.


> ACM uses address mappings as defined in an address configuration file (IP ->
> device, port, pkey).  The address file can be created using the provided ib_acme utility, which uses the current system configuration (in an ugly way, but it works).  I think this provides QoS behavior similar to what you're describing
I assume you are referring to an IP local to the system where ACM runs,
correct? This would work well for applications calling rdma_bind
and/or rdma_resolve_addr while specifying a source address. To also
support applications which do neither of these two, that is, which call
rdma_resolve_addr with a destination address only, I suggest enhancing
the librdmacm-calling-ACM flow to resolve the source address using a
route lookup from user space. Next, librdmacm can issue rdma_bind on
behalf of this ID, and you have the <device, port, pkey> triplet at hand,
so now the ACM call can be made from librdmacm. Writing this, I realized
that the same should also be done for apps calling rdma_resolve_addr with
a source IP specified. This way you have a unified flow for the ACM use
in librdmacm for any of the apps A, B, C below:

A.1 rdma_bind(src=X)
A.2 rdma_resolve_addr(src=null, dst=Y)

B.1 rdma_resolve_addr(src=null, dst=Y)

C.1 rdma_resolve_addr(src=X, dst=Y)

where librdmacm calling-ACM flow is

L1. compute source address
L2. issue kernel rdma_bind to source address and resolve <device, port, 
pkey>
L3. issue ACM address (DGID) resolution call using (<device, port, 
pkey>, dest-ip)

Does that make sense? If so, what's the need for the address configuration file?
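
The L1/L2/L3 steps above can be sketched in C as follows. This is only a
model of the control flow: every function here (toy_route_lookup,
toy_bind_source, toy_resolve_dgid, resolve_unified) is a hypothetical
stand-in with fixed return values, not a librdmacm or ACM call.

```c
#include <stddef.h>
#include <string.h>

/* <device, port, pkey> triplet obtained by binding the source address. */
struct src_binding {
    char device[32];
    int port;
    unsigned pkey;
};

/* L1: pick a source IP for dst_ip. A real version would do a route lookup;
 * this stand-in returns a fixed address and fails on an empty destination. */
int toy_route_lookup(const char *dst_ip, char *src_ip, size_t len)
{
    const char *fixed = "192.168.0.10";
    if (dst_ip[0] == '\0' || strlen(fixed) >= len)
        return -1;
    strcpy(src_ip, fixed);
    return 0;
}

/* L2: bind to the source IP and report the triplet (fixed values here). */
int toy_bind_source(const char *src_ip, struct src_binding *b)
{
    (void)src_ip;
    strcpy(b->device, "mlx4_0");
    b->port = 1;
    b->pkey = 0xffff;
    return 0;
}

/* L3: ask the DGID provider (e.g. ACM) to resolve dst_ip. The stand-in
 * writes a dummy GID whose last byte is the bound port number. */
int toy_resolve_dgid(const struct src_binding *b, const char *dst_ip,
                     unsigned char dgid[16])
{
    (void)dst_ip;
    memset(dgid, 0, 16);
    dgid[15] = (unsigned char)b->port;
    return 0;
}

/* Unified flow for apps A, B and C: src_ip is NULL when the application
 * gave no source address (case B), otherwise it is used as-is. */
int resolve_unified(const char *src_ip, const char *dst_ip,
                    unsigned char dgid[16])
{
    char chosen[46];
    struct src_binding b;

    if (src_ip != NULL) {
        if (strlen(src_ip) >= sizeof(chosen))
            return -1;
        strcpy(chosen, src_ip);
    } else if (toy_route_lookup(dst_ip, chosen, sizeof(chosen)) != 0) {
        return -1;                      /* L1 failed */
    }
    if (toy_bind_source(chosen, &b) != 0)
        return -2;                      /* L2 failed */
    if (toy_resolve_dgid(&b, dst_ip, dgid) != 0)
        return -3;                      /* L3 failed */
    return 0;
}
```

The point of the sketch is that all three application styles converge on
the same L2 and L3 steps once a source address is known.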

Or.

* Re: QoS in local SA entity
From: Jason Gunthorpe @ 2009-11-09  0:56 UTC
  To: Or Gerlitz; +Cc: Sean Hefty, linux-rdma

On Sun, Nov 08, 2009 at 08:25:55AM +0200, Or Gerlitz wrote:

> >ACM is intended to be a service that's used by the librdmacm to resolve 
> >address mappings and routes.  Trying to have ACM use the librdmacm ends up 
> >with a circular dependency.  That's the part I'm trying to avoid.
> 
> Fair enough, I believe that my suggestion is doable also without 
> circular dependency, e.g as you indicated below or with a fairly small 
> enhancement of librdmacm, see next

The entire point of the rdma_getaddrinfo + AF_IB is to avoid hacking
up librdmacm for every address lookup/cache scheme someone invents.

The desired flow would be:

rdma_getaddrinfo("User-Specified-Host-String","User-Specified-Port-String",
                 &hints,&res);

// Server flow (hints.ai_flags |= AI_PASSIVE)
rdma_bind(res[0].bind_addr);
rdma_listen(res[0].listen_addr);

// Client Flow
rdma_bind(res[0].bind_addr); // Optional
rdma_resolve_addr2(res[0].bind_addr,res[0].dest_addr,res[0].extra_info);

And under rdma_getaddrinfo we could have any number of modules, like
glibc does.

Well-written apps should already be using normal getaddrinfo, so we
can design the upgrade to rdma_getaddrinfo to be very minor,
source-wise. Apps that are not upgraded don't get the new functionality.
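
The idea of "any number of modules" under a single lookup call can be
sketched in C as a chain of resolver callbacks that are tried in order. All
names here (lookup_module_fn, acm_module, default_module,
lookup_with_modules) and the "acm:" name prefix are hypothetical and are
not part of librdmacm or glibc.

```c
#include <stddef.h>
#include <string.h>

/* Result of a lookup: which module answered, and whether it supplied
 * path information (the "extra_info" in the flow above). */
struct lookup_result {
    const char *module;
    int has_path_info;
};

/* A module returns 0 if it resolved the name, nonzero to pass. */
typedef int (*lookup_module_fn)(const char *host, struct lookup_result *res);

/* Hypothetical module: only handles names carrying an "acm:" prefix. */
int acm_module(const char *host, struct lookup_result *res)
{
    if (strncmp(host, "acm:", 4) != 0)
        return -1;
    res->module = "acm";
    res->has_path_info = 1;
    return 0;
}

/* Hypothetical fallback: accepts any nonempty name, no path information. */
int default_module(const char *host, struct lookup_result *res)
{
    if (host[0] == '\0')
        return -1;
    res->module = "default";
    res->has_path_info = 0;
    return 0;
}

/* Try each module in order; the first one that succeeds wins. */
int lookup_with_modules(const lookup_module_fn *mods, size_t n,
                        const char *host, struct lookup_result *res)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (mods[i](host, res) == 0)
            return 0;
    return -1;
}
```

With such a chain, adding a new lookup or cache scheme means adding a
module to the table rather than changing the caller.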

Jason

* Re: QoS in local SA entity
From: Or Gerlitz @ 2009-11-09  7:44 UTC
  To: Jason Gunthorpe; +Cc: Sean Hefty, linux-rdma

Jason Gunthorpe wrote:
> The entire point of the rdma_getaddrinfo + AF_IB is to avoid hacking up librdmacm for every address lookup/cache scheme someone invents
The simple point I am trying to make is that rdma_getaddrinfo +
AF_INET is doable, is simple, and is needed to preserve the essence of
the rdma-cm. I don't see what AF_IB buys anyone, but if you want to push
it, then as long as AF_INET comes first and remains the most
supported/interoperable option, now and in the future, go ahead and add
your bits. As you indicated, the route lookup I was mentioning could be
done in rdma_getaddrinfo, with &res including both source and destination
addresses. No rdma_resolve_addr2 is needed; the one that exists now has
the source address specified. I don't see what extra info is needed for
an AF_INET address that was resolved with rdma_getaddrinfo. Is this
AF_IB specific?

I don't see why the app should bother calling rdma_getaddrinfo; it
can be done by librdmacm, with rdma_getaddrinfo having multiple modules
as you suggested. I am in favor of the approach suggested by Sean, where
librdmacm either does its native flow or, under an environment variable,
does an alternative flow. Your suggestion not to have the second flow
tightly coupled with ACM, e.g. through using the get_addrinfo
abstraction and friends, makes sense (yes!).

Or.


* Re: QoS in local SA entity
From: Jason Gunthorpe @ 2009-11-09  8:08 UTC
  To: Or Gerlitz; +Cc: Sean Hefty, linux-rdma

On Mon, Nov 09, 2009 at 09:44:31AM +0200, Or Gerlitz wrote:

>  No rdma_resolve_addr2 is needed the one that exists now has  
> source addresses specified, I  don't see that extra info is needed for 
> AF_INET that was resolved with rdma_getaddrinfo is this AF_IB specific?

The extra info in rdma_resolve_addr2 carries the IB-specific path
information from the rdma_getaddrinfo module to the kernel for the
address pair. The entire purpose of AF_IB is to let user space tell
the kernel it does not want a kernel-side ND and PR query; instead
user space will provide all the information.

Think of it this way, ACM takes over the entire process of what
AF_INET does in the kernel. AF_INET talks directly to the IB CM module
in the kernel. Thus, it also makes sense that ACM would need to talk to
IB CM directly as well. AF_IB is that direct connection.

> I don't see why the app should bother on calling rdma_getaddrinfo, it 
> can be done by librdmacm with rdma_getaddrinfo having multiple modules 
> as you suggested. I am in favor of the approach suggested by Sean of 
> librdmacm either doing its native flow or under environment variable 
> doing an alternative flow, where your suggestion not to have the 2nd 
> flow being tightly coupled with ACM, e.g through using get_addrinfo 
> abstraction and friends makes sense (yes!)

I don't entirely understand this paragraph, but the point of a string
based rdma_getaddrinfo is exactly the same point as for IP - strings
may have different meaning and may encode richer information than a
simple sock addr (eg normal getaddrinfo can determine AF_INET,
AF_INET6, and AF_UNIX depending on the form of the string).

For instance it might make sense to trigger/disable the ACM method
with a special string based indicator.

Jason

* RE: QoS in local SA entity
From: Sean Hefty @ 2009-11-09 18:38 UTC
  To: 'Or Gerlitz'; +Cc: linux-rdma

>L1. compute source address
>L2. issue kernel rdma_bind to source address and resolve <device, port,
>pkey>
>L3. issue ACM address (DGID) resolution call using (<device, port,
>pkey>, dest-ip)
>
>makes sense? if yes, what's the need in the address configuration file?

Here is where we're at today:

rdma_resolve_addr:
	- Source sends a multicast request to destination IP
	- Destination performs a path record query
	- Destination sends a response with IP to DGID mapping

rdma_resolve_route:
	- Source performs a path record query

The current implementation of ACM converts this to:
	** Source sends a multicast request to destination IP
	** Destination sends a response with IP to DGID mapping
	- Path record is constructed from multicast group information	

ACM needs to know what the local addresses are, so it can respond to requests
for those addresses.
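
The last step of the ACM flow above, building a path record from multicast
group information instead of querying the SA, can be sketched in C as
follows. The structs are simplified and hypothetical, not the real IB or
ACM data structures, and the sketch assumes the destination's response
supplies its LID along with the DGID.

```c
#include <stdint.h>
#include <string.h>

/* Parameters of a joined multicast group, as handed back by the SA. */
struct mcast_params {
    uint16_t pkey;
    uint8_t sl;     /* service level: the QoS setting under discussion */
    uint8_t mtu;    /* encoded MTU */
    uint8_t rate;   /* encoded rate */
};

/* Simplified unicast path record. */
struct path_rec {
    uint8_t sgid[16];
    uint8_t dgid[16];
    uint16_t slid, dlid;
    uint16_t pkey;
    uint8_t sl, mtu, rate;
};

/* Endpoints come from the local port and the destination's response;
 * pkey, SL, MTU and rate are copied from the multicast group. */
void path_from_group(const struct mcast_params *grp,
                     const uint8_t sgid[16], uint16_t slid,
                     const uint8_t dgid[16], uint16_t dlid,
                     struct path_rec *path)
{
    memset(path, 0, sizeof(*path));
    memcpy(path->sgid, sgid, 16);
    memcpy(path->dgid, dgid, 16);
    path->slid = slid;
    path->dlid = dlid;
    path->pkey = grp->pkey;
    path->sl = grp->sl;
    path->mtu = grp->mtu;
    path->rate = grp->rate;
}
```

Under this model, whatever SL the SA assigns to the group is the SL every
derived unicast path inherits, which is how QoS reaches the end nodes.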

- Sean


* Re: QoS in local SA entity
From: Or Gerlitz @ 2009-11-10  5:29 UTC
  To: Sean Hefty; +Cc: linux-rdma

Sean Hefty wrote:
> [...] The current implementation of ACM converts this to:
> 	** Source sends a multicast request to destination IP
> 	** Destination sends a response with IP to DGID mapping
> 	- Path record is constructed from multicast group information	
> ACM needs to know what the local addresses are, so it can respond to requests
> for those addresses
Okay, got it. Still, what do you think of my suggestion for the
unified/modified librdmacm flow (L1/L2/L3 in my email), which would be
taken when working against a "DGID/Route" provider such as ACM?

Or.

* Re: QoS in local SA entity
From: Or Gerlitz @ 2009-11-10  5:51 UTC
  To: Jason Gunthorpe; +Cc: Sean Hefty, linux-rdma

Jason Gunthorpe wrote:
> The extra info in rdma_resolve_addr2 carries the IB specific path information from the rdma_getaddrinfo module to the kernel for the address pair. The entire purpose of AF_IB is to let user space tell the kernel it does not want a kernel side ND and PR query, instead user space will provide all the information.
The kernel patches posted by Sean replace the ND/PR flow with a two-step
process: first specifying a DGID to the kernel, then specifying a PATH.
My suggestion is to have a librdmacm-initiated bind before sending the
DGID to the kernel. This way AF_INET would be supported perfectly, under
the slight limitation that the source address <device, port, pkey> tuple
would be chosen by route lookup and not by the neigh->dev that was
resolved by the kernel ND. This applies only when the modified flow of
librdmacm is taken (e.g. under user specification with an environment
variable, etc.).

--If-- on top of that you want to add AF_IB, we may be able to do that, 
but I don't see why the whole thing should be made for AF_IB only.

> Think of it this way, ACM takes over the entire process of what AF_INET does in the kernel. AF_INET talks directly to the IB CM module in the kernel. Thus, it also makes sense that ACM would need to talk to IB CM directly as well. AF_IB is that direct connection.

I don't agree we must state it this way. I see ACM as an alternative way 
for AF_INET to resolve ND/PR.

Or.

