* Is there a working cache for path record and lids etc for librdmacm?
@ 2020-11-17  2:57 Christopher Lameter
  2020-11-17  8:46 ` Jens Domke
  2020-11-17 19:33 ` Jason Gunthorpe
  0 siblings, 2 replies; 21+ messages in thread
From: Christopher Lameter @ 2020-11-17  2:57 UTC (permalink / raw)
  To: linux-rdma

We have a large number of apps running on the same host that are all
sending to the same set of hosts. Lots of requests for address resolution
are going to the SM and for a large set of hosts this can become too much
for the SM.

Is there something that can locally cache the results of the SM queries to
avoid additional requests?

We have tried IBACM but the address resolution does not work on it. It is
unable to complete a request for any address resolution and leaves kernel
threads that never terminate instead.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-17  2:57 Is there a working cache for path record and lids etc for librdmacm? Christopher Lameter
@ 2020-11-17  8:46 ` Jens Domke
  2020-11-17 14:20   ` Christopher Lameter
  2020-11-17 19:33 ` Jason Gunthorpe
  1 sibling, 1 reply; 21+ messages in thread
From: Jens Domke @ 2020-11-17  8:46 UTC (permalink / raw)
  To: Christopher Lameter; +Cc: linux-rdma

Hi Christopher,

On 11/17/20 11:57 AM, Christopher Lameter wrote:
> We have a large number of apps running on the same host that are all
> sending to the same set of hosts. Lots of requests for address resolution
> are going to the SM and for a large set of hosts this can become too much
> for the SM.

I have used ibacm successfully years ago (think somewhere in the
2013-2015 timeframe) but abandoned the approach because some
measurements indicated that using OpenMPI with rdmacm had a big
runtime overhead compared to using OpenMPI+oob (Mellanox was
informed but I'm unsure how much has changed until now)

> Is there something that can locally cache the results of the SM queries to
> avoid additional requests?

Not that I know of, but others might know better. Maybe try contacting
Sean Hefty (driver behind ibacm) directly if he missed your email here
on the list.

> We have tried IBACM but the address resolution does not work on it. It is
> unable to complete a request for any address resolution and leaves kernel
> threads that never terminate instead.

Setting up ibacm was/is painful; maybe you could verify that it works on
a test bed with low-level rdmacm tools, e.g. a ping-pong test.
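
For example, assuming the librdmacm-utils examples are installed (rping
is the rdmacm ping-pong test; the exact options may differ by version):

  rping -s -a <server_ip> -v        # on one node
  rping -c -a <server_ip> -v -C 10  # on another node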

Furthermore, another thing I learned the hard way was that a cold cache
can overwhelm opensm as well. So, if you deploy ibacm, you have to make
sure that not too many requests go to the local ibacm on too many nodes
simultaneously right after starting the ibacm service; otherwise all
nodes sending numerous requests to opensm at once can lead to timeouts,
which could be the reason for your stalled kernel threads.

(another explanation is obviously a bug in ibacm and/or incompatibility
to newer versions of librdmacm or opensm or other IB libs)

Sorry that I cannot provide more specific and direct help, but maybe my
pointers can help you solve the issue.

Best,
  Jens

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-17  8:46 ` Jens Domke
@ 2020-11-17 14:20   ` Christopher Lameter
  0 siblings, 0 replies; 21+ messages in thread
From: Christopher Lameter @ 2020-11-17 14:20 UTC (permalink / raw)
  To: Jens Domke; +Cc: linux-rdma

On Tue, 17 Nov 2020, Jens Domke wrote:

> I have used ibacm successfully years ago (think somewhere in the
> 2013-2015 timeframe) but abandoned the approach because some
> measurements indicated that using OpenMPI with rdmacm had a big
> runtime overhead compared to using OpenMPI+oob (Mellanox was
> informed but I'm unsure how much has changed until now)

Mellanox does not support ibacm... but OK, thanks. Good to know someone
who has actually used it.

> > Is there something that can locally cache the results of the SM queries to
> > avoid additional requests?
>
> Not that I know of, but others might know better. Maybe try contacting
> Sean Hefty (driver behind ibacm) directly if he missed your email here
> on the list.


I have talked to Ira Weiny, who was the last one to make major changes to
the source, but he does not know of any alternative solution.

> > We have tried IBACM but the address resolution does not work on it. It is
> > unable to complete a request for any address resolution and leaves kernel
> > threads that never terminate instead.
>
> Setting up ibacm was/is painful; maybe you could verify that it works on
> a test bed with low-level rdmacm tools, e.g. a ping-pong test.

That was done and the bug was confirmed. There is bitrot there in the MAD
communication layer.

> Furthermore, another thing I learned the hard way was that a cold cache
> can overwhelm opensm as well. So, if you deploy ibacm, you have to make
> sure that not too many requests go to the local ibacm on too many nodes
> simultaneously right after starting the ibacm service; otherwise all
> nodes sending numerous requests to opensm at once can lead to timeouts,
> which could be the reason for your stalled kernel threads.

Right. But our cluster only has around 200 nodes max, so that should be fine.

> (another explanation is obviously a bug in ibacm and/or incompatibility
> to newer versions of librdmacm or opensm or other IB libs)
>
> Sorry that I cannot provide more specific and direct help, but maybe my
> pointers can help you solve the issue.

Thanks.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-17  2:57 Is there a working cache for path record and lids etc for librdmacm? Christopher Lameter
  2020-11-17  8:46 ` Jens Domke
@ 2020-11-17 19:33 ` Jason Gunthorpe
  2020-11-20 18:05   ` Christopher Lameter
  1 sibling, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2020-11-17 19:33 UTC (permalink / raw)
  To: Christopher Lameter, Haakon Bugge, Mark Haywood; +Cc: linux-rdma

On Tue, Nov 17, 2020 at 02:57:57AM +0000, Christopher Lameter wrote:
> We have a large number of apps running on the same host that are all
> sending to the same set of hosts. Lots of requests for address resolution
> are going to the SM and for a large set of hosts this can become too much
> for the SM.
> 
> Is there something that can locally cache the results of the SM queries to
> avoid additional requests?
> 
> We have tried IBACM but the address resolution does not work on it. It is
> unable to complete a request for any address resolution and leaves kernel
> threads that never terminate instead.

If it really doesn't work at all any more we should delete it from
rdma-core if nobody is interested to fix it.

Haakon and Mark had stepped up to maintain it a while ago because they
were using it internally, so I'm surprised to hear it is broken.

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-17 19:33 ` Jason Gunthorpe
@ 2020-11-20 18:05   ` Christopher Lameter
  2020-11-20 18:34     ` Håkon Bugge
  0 siblings, 1 reply; 21+ messages in thread
From: Christopher Lameter @ 2020-11-20 18:05 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Haakon Bugge, Mark Haywood, linux-rdma

On Tue, 17 Nov 2020, Jason Gunthorpe wrote:

> If it really doesn't work at all any more we should delete it from
> rdma-core if nobody is interested to fix it.
>
> Haakon and Mark had stepped up to maintain it a while ago because they
> were using it internally, so I'm surprised to hear it is broken.

Oh great. I did not know. Will work with them to get things sorted out.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-20 18:05   ` Christopher Lameter
@ 2020-11-20 18:34     ` Håkon Bugge
  2020-11-22 12:49       ` Christopher Lameter
  0 siblings, 1 reply; 21+ messages in thread
From: Håkon Bugge @ 2020-11-20 18:34 UTC (permalink / raw)
  To: Christopher Lameter; +Cc: Jason Gunthorpe, Mark Haywood, OFED mailing list



> On 20 Nov 2020, at 19:05, Christopher Lameter <cl@linux.com> wrote:
> 
> On Tue, 17 Nov 2020, Jason Gunthorpe wrote:
> 
>> If it really doesn't work at all any more we should delete it from
>> rdma-core if nobody is interested to fix it.
>> 
>> Haakon and Mark had stepped up to maintain it a while ago because they
>> were using it internally, so I'm surprised to hear it is broken.
> 
> Oh great. I did not know. Will work with them to get things sorted out.

Inside Oracle, we're only using it for resolving IB routes. A cache for address resolution already exists in the kernel. There is a config option to disable address resolution from user-space (acme_plus_kernel_only).


Thxs, Håkon



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-20 18:34     ` Håkon Bugge
@ 2020-11-22 12:49       ` Christopher Lameter
  2020-11-22 15:50         ` Håkon Bugge
  0 siblings, 1 reply; 21+ messages in thread
From: Christopher Lameter @ 2020-11-22 12:49 UTC (permalink / raw)
  To: Håkon Bugge; +Cc: Jason Gunthorpe, Mark Haywood, OFED mailing list

On Fri, 20 Nov 2020, Håkon Bugge wrote:
> > Oh great. I did not know. Will work with them to get things sorted out.
> Inside Oracle, we're only using it for resolving IB routes. A cache for
> address resolution already exists in the kernel. There is a config
> option to disable address resolution from user-space
> (acme_plus_kernel_only).

The app that we have runs in user space. Can it use the cache? Is the
cache only in Mellanox OFED? I heard that it was removed.

Is this an option set when building ibacm?

And yes we need it to resolve IB routes.

Can you share a working config?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-22 12:49       ` Christopher Lameter
@ 2020-11-22 15:50         ` Håkon Bugge
  2020-11-22 19:22           ` Christopher Lameter
  0 siblings, 1 reply; 21+ messages in thread
From: Håkon Bugge @ 2020-11-22 15:50 UTC (permalink / raw)
  To: Christopher Lameter; +Cc: Jason Gunthorpe, Mark Haywood, OFED mailing list



> On 22 Nov 2020, at 13:49, Christopher Lameter <cl@linux.com> wrote:
> 
> On Fri, 20 Nov 2020, Håkon Bugge wrote:
>>> Oh great. I did not know. Will work with them to get things sorted out.
>> Inside Oracle, we're only using it for resolving IB routes. A cache for
>> address resolution already exists in the kernel. There is a config
>> option to disable address resolution from user-space
>> (acme_plus_kernel_only).
> 
> The app that we have runs in user space. Can it use the cache? Is the
> cache only in Mellanox OFED? I heard that it was removed.

An app in user space can use the ibacm cache. If you use the default configuration that comes with rdma-core, both address and route resolution will be from librdmacm directly to ibacm, i.e., no kernel involved. The ibacm options are by default installed in /etc/rdma/ibacm_opts.cfg

If you set acme_plus_kernel_only to one in said config file, your app will resolve the address using the kernel neighbour cache and the route resolution will go into the kernel and then "bounce" back to user space and ibacm through NetLink.
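
For example, a sketch of the relevant line in /etc/rdma/ibacm_opts.cfg
(the value syntax may differ between rdma-core versions; check the
comments in the installed file):

  # default: librdmacm resolves address and route directly via ibacm
  # set to one: address resolution uses the kernel neighbour cache, and
  # route resolution bounces through the kernel back to ibacm via netlink
  acme_plus_kernel_only 1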

I do not know if ibacm is present in Mellanox OFED, but it is easy to find out:

# rpm -q ibacm


> Is this an option set when building ibacm?

Nop, runtime config option as depicted above.

> And yes we need it to resolve IB routes.

Then the above will work.

> Can you share a working config?

The default provided by rdma-core should work, possibly requiring the option above.


Thxs, Håkon


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-22 15:50         ` Håkon Bugge
@ 2020-11-22 19:22           ` Christopher Lameter
  2020-11-23 12:50             ` Christopher Lameter
  0 siblings, 1 reply; 21+ messages in thread
From: Christopher Lameter @ 2020-11-22 19:22 UTC (permalink / raw)
  To: Håkon Bugge; +Cc: Jason Gunthorpe, Mark Haywood, OFED mailing list

On Sun, 22 Nov 2020, Håkon Bugge wrote:

> > The app that we have runs in user space. Can it use the cache? Is the
> > cache only in Mellanox OFED? I heard that it was removed.
>
> An app in user space can use the ibacm cache. If you use the default
> configuration that comes with rdma-core, both address and route
> resolution will be from librdmacm directly to ibacm, i.e., no kernel
> involved. The ibacm options are by default installed in
> /etc/rdma/ibacm_opts.cfg

I have been using that.

> If you set acme_plus_kernel_only to one in said config file, your app will resolve the address using the kernel neighbour cache and the route resolution will go into the kernel and then "bounce" back to user space and ibacm through NetLink.

Have not seen that in the RHEL7.8 version of ibacm.

> > Is this an option set when building ibacm?
>
> Nop, runtime config option as depicted above.

Must be a newer version then.

> The default provided by rdma-core should work, possibly requiring the option above.

The one in RHEL7 will never resolve anything through the subnet manager.
Every request here results in a leftover kernel thread hanging around.

Which version of ibacm do you run?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-22 19:22           ` Christopher Lameter
@ 2020-11-23 12:50             ` Christopher Lameter
  2020-11-23 19:01               ` Håkon Bugge
  0 siblings, 1 reply; 21+ messages in thread
From: Christopher Lameter @ 2020-11-23 12:50 UTC (permalink / raw)
  To: Håkon Bugge; +Cc: Jason Gunthorpe, Mark Haywood, OFED mailing list

On Sun, 22 Nov 2020, Christopher Lameter wrote:

> > If you set acme_plus_kernel_only to one in said config file, your app will resolve the address using the kernel neighbour cache and the route resolution will go into the kernel and then "bounce" back to user space and ibacm through NetLink.
>
> Have not seen that in the RHEL7.8 version of ibacm.
>

Got version 33.0 from Redhat with the option. Set it but ibacm still times
out when trying to contact the SM.

ib_acme says:

ib_acm_resolve_ip failed: Connection timed out


ibacm.log says

acmp_process_wait_queue: notice - failing request
acmp_process_timeouts: notice - dest 192.168.50.39
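
(For reference, the failure shows up with a plain destination lookup
along the lines below; the ib_acme flags are assumed from its help
output and may differ by version.)

  ib_acme -d 192.168.50.39 -v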



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-23 12:50             ` Christopher Lameter
@ 2020-11-23 19:01               ` Håkon Bugge
  2020-11-24 19:01                 ` Christopher Lameter
  0 siblings, 1 reply; 21+ messages in thread
From: Håkon Bugge @ 2020-11-23 19:01 UTC (permalink / raw)
  To: Christopher Lameter; +Cc: Jason Gunthorpe, Mark Haywood, OFED mailing list



> On 23 Nov 2020, at 13:50, Christopher Lameter <cl@linux.com> wrote:
> 
> On Sun, 22 Nov 2020, Christopher Lameter wrote:
> 
>>> If you set acme_plus_kernel_only to one in said config file, your app will resolve the address using the kernel neighbour cache and the route resolution will go into the kernel and then "bounce" back to user space and ibacm through NetLink.
>> 
>> Have not seen that in the RHEL7.8 version of ibacm.
>> 
> 
> Got version 33.0 from Redhat with the option. Set it but ibacm still times
> out when trying to contact the SM.

Contact the peer ibacm, that is. Is it started?

And, ib_acme bypasses the kernel_only check. I assume a real app (e.g., qperf <destination_ip> -cm1 rc_bw) would work, but incur an excess delay due to the ibacm timeout, before failing back to the kernel neighbour cache.


Thxs, Håkon




> 
> ib_acme says:
> 
> ib_acm_resolve_ip failed: Connection timed out
> 
> 
> ibacm.log says
> 
> acmp_process_wait_queue: notice - failing request
> acmp_process_timeouts: notice - dest 192.168.50.39
> 
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-23 19:01               ` Håkon Bugge
@ 2020-11-24 19:01                 ` Christopher Lameter
  2020-11-25  8:10                   ` Honggang LI
  0 siblings, 1 reply; 21+ messages in thread
From: Christopher Lameter @ 2020-11-24 19:01 UTC (permalink / raw)
  To: Håkon Bugge; +Cc: Jason Gunthorpe, Mark Haywood, OFED mailing list

On Mon, 23 Nov 2020, Håkon Bugge wrote:

> > Got version 33.0 from Redhat with the option. Set it but ibacm still times
> > out when trying to contact the SM.
>
> Contact the peer ibacm, that is. Is it started?


It can contact the peer ibacm if it's running on a particular host. Then
the resolution succeeds. But we want ibacm to talk to the subnet manager.

> And, ib_acme bypasses the kernel_only check. I assume a real app (e.g.,
> qperf <destination_ip> -cm1 rc_bw) would work, but incur an excess delay
> due to the ibacm timeout, before failing back to the kernel neighbour
> cache.

Ok. But what does it matter?


How do I figure out why ibacm is not talking to the subnet manager?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-24 19:01                 ` Christopher Lameter
@ 2020-11-25  8:10                   ` Honggang LI
  2020-11-25 16:43                     ` Christopher Lameter
  0 siblings, 1 reply; 21+ messages in thread
From: Honggang LI @ 2020-11-25  8:10 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Håkon Bugge, Jason Gunthorpe, Mark Haywood, OFED mailing list

On Tue, Nov 24, 2020 at 07:01:25PM +0000, Christopher Lameter wrote:
> On Mon, 23 Nov 2020, Håkon Bugge wrote:
> 
> > > Got version 33.0 from Redhat with the option. Set it but ibacm still times
> > > out when trying to contact the SM.
> >
> > Contact the peer ibacm, that is. Is it started?
> 
> 
> It can contact the peer ibacm if it's running on a particular host. Then
> the resolution succeeds. But we want ibacm to talk to the subnet manager.
> 
> > And, ib_acme bypasses the kernel_only check. I assume a real app (e.g.,
> > qperf <destination_ip> -cm1 rc_bw) would work, but incur an excess delay
> > due to the ibacm timeout, before failing back to the kernel neighbour
> > cache.
> 
> Ok. But what does it matter?
> 
> 
> How do I figure out why ibacm is not talking to the subnet manager?

No, it does not talk to the subnet manager when resolving an IPoIB IP
address or hostname to a PathRecord. The query MAD packets are sent to a
multicast group that all ibacm services are attached to.

To resolve an IPoIB address to a PathRecord:
1) The IPoIB interface must be UP and RUNNING on both the client and the
target side.
2) The ibacm service must be RUNNING on both the client and the target.
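
A quick way to check both conditions on each node (the interface name
ib0 and the systemd unit name are just assumptions; adjust to your
setup):

  ip -br link show ib0      # the IPoIB interface should be UP
  systemctl status ibacm    # the service should be active (running)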

Thanks


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-25  8:10                   ` Honggang LI
@ 2020-11-25 16:43                     ` Christopher Lameter
  2020-11-27 14:52                       ` Håkon Bugge
  0 siblings, 1 reply; 21+ messages in thread
From: Christopher Lameter @ 2020-11-25 16:43 UTC (permalink / raw)
  To: Honggang LI
  Cc: Håkon Bugge, Jason Gunthorpe, Mark Haywood, OFED mailing list

On Wed, 25 Nov 2020, Honggang LI wrote:

> > How do I figure out why ibacm is not talking to the subnet manager?
>
> No, it does not talk to the subnet manager when resolving an IPoIB IP
> address or hostname to a PathRecord. The query MAD packets are sent to a
> multicast group that all ibacm services are attached to.

Huh? When does it talk to a subnet manager (or the SA)?

If it gets an IP address of an IB node that does not have ibacm, then it
fails with a timeout...? And leaves hanging kernel threads around by
design?

So it only populates the cache from its local node information?

> To resolve an IPoIB address to a PathRecord:
> 1) The IPoIB interface must be UP and RUNNING on both the client and the
> target side.
> 2) The ibacm service must be RUNNING on both the client and the target.

That is working if you want to resolve only the IP addresses of the IB
interfaces on the client and target. None else.

Here is the description of ibacm's function from the sources:

"Conceptually, the ibacm service implements an ARP like protocol and
either uses IB multicast records to construct path record data or queries
the SA directly, depending on the selected route protocol. By default, the
ibacm services uses and caches SA path record queries."

SA queries don't work. So it's broken and cannot talk to the SM.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-25 16:43                     ` Christopher Lameter
@ 2020-11-27 14:52                       ` Håkon Bugge
  2020-11-30  8:24                         ` Christopher Lameter
  0 siblings, 1 reply; 21+ messages in thread
From: Håkon Bugge @ 2020-11-27 14:52 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Honggang LI, Jason Gunthorpe, Mark Haywood, OFED mailing list



> On 25 Nov 2020, at 17:43, Christopher Lameter <cl@linux.com> wrote:
> 
> On Wed, 25 Nov 2020, Honggang LI wrote:
> 
>>> How do I figure out why ibacm is not talking to the subnet manager?
>> 
>> No, it does not talk to the subnet manager when resolving an IPoIB IP
>> address or hostname to a PathRecord. The query MAD packets are sent to a
>> multicast group that all ibacm services are attached to.
> 
> Huh? When does it talk to a subnet manager (or the SA)?

When resolving the route AND the option "route_prot" is set to "sa". If set to "acm", what Hong describes above applies.

> If it gets an IP address of an IB node that does not have ibacm, then it
> fails with a timeout...? And leaves hanging kernel threads around by
> design?

Nop, the kernel falls back and uses the neighbour cache instead.

> So it only populates the cache from its local node information?

No, if you use ibacm for address resolution the only protocol it has is "acm", which means the information comes from a peer ibacm.

If you talk about the cache for routes, it comes either from the SA or a peer ibacm, depending on the "route_prot" setting.
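
In ibacm_opts.cfg terms that corresponds to something like the sketch
below (the option names are the ones discussed in this thread; the
defaults shipped with your rdma-core may differ):

  # address resolution: "acm" only (IB multicast to the peer ibacm)
  addr_prot acm
  # route/path-record resolution: "sa" (query the SA) or "acm" (peer ibacm)
  route_prot sa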

>> To resolve an IPoIB address to a PathRecord:
>> 1) The IPoIB interface must be UP and RUNNING on both the client and the
>> target side.
>> 2) The ibacm service must be RUNNING on both the client and the target.
> 
> That is working if you want to resolve only the IP addresses of the IB
> interfaces on the client and target. None else.

That is why it is called IBacm, right?

> Here is the description of ibacm's function from the sources:
> 
> "Conceptually, the ibacm service implements an ARP like protocol and
> either uses IB multicast records to construct path record data or queries
> the SA directly, depending on the selected route protocol. By default, the
> ibacm services uses and caches SA path record queries."
> 
> SA queries don't work. So it's broken and cannot talk to the SM.

Why do you say that? It works all the time for me, and I use "sa" as "route_prot".


Thxs, Håkon


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-27 14:52                       ` Håkon Bugge
@ 2020-11-30  8:24                         ` Christopher Lameter
  2020-12-04 11:17                           ` Håkon Bugge
  0 siblings, 1 reply; 21+ messages in thread
From: Christopher Lameter @ 2020-11-30  8:24 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Honggang LI, Jason Gunthorpe, Mark Haywood, OFED mailing list

On Fri, 27 Nov 2020, Håkon Bugge wrote:

> > Huh? When does it talk to a subnet manager (or the SA)?
>
> When resolving the route AND the option "route_prot" is set to "sa". If
> set to "acm", what Hong describes above applies.

My config has "route_prot" set to "sa"

> > If it gets an IP address of an IB node that does not have ibacm, then it
> > fails with a timeout...? And leaves hanging kernel threads around by
> > design?
>
> Nop, the kernel falls back and uses the neighbour cache instead.

But ib_acme hangs? The main issue here is what the user space app does.
And we need ibacm to cache user space address resolutions.

> > So it only populates the cache from its local node information?
>
> No, if you use ibacm for address resolution the only protocol it has is
> "acm", which means the information comes from a peer ibacm.
>
> If you talk about the cache for routes, it comes either from the SA or a
> peer ibacm, depending on the "route_prot" setting.

I have always run it with that setting. How can I debug this issue and how
can we fix this?

>
> >> To resolve IPoIB address to PathRecord, you must:
> >> 1) The IPoIB interface must UP and RUNNING on the client and target
> >> side.
> >> 2) The ibacm service must RUNNING on the client and target.
> >
> > That is working if you want to resolve only the IP addresses of the IB
> > interfaces on the client and target. None else.
>
> That is why it is called IBacm, right?

Huh? IBACM is an address resolution service for IB. Somehow that only
includes addresses of hosts running IBACM?

>
> > Here is the description of ibacm's function from the sources:
> >
> > "Conceptually, the ibacm service implements an ARP like protocol and
> > either uses IB multicast records to construct path record data or queries
> > the SA directly, depending on the selected route protocol. By default, the
> > ibacm services uses and caches SA path record queries."
> >
> > SA queries don't work. So it's broken and cannot talk to the SM.
>
> Why do you say that? It works all the time for me, and I use "sa" as "route_prot".

Not here and not in the tests that RH ran to verify the issue.

"route_prot" set to "sa" is the default config for the Redhat release of
IBACM.

However, the addr_prot is set to  "acm" by default. I set it to "sa" with
no effect.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-11-30  8:24                         ` Christopher Lameter
@ 2020-12-04 11:17                           ` Håkon Bugge
  2020-12-05 11:50                             ` Christoph Lameter
  2020-12-07 10:28                             ` Christoph Lameter
  0 siblings, 2 replies; 21+ messages in thread
From: Håkon Bugge @ 2020-12-04 11:17 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Honggang LI, Jason Gunthorpe, Mark Haywood, OFED mailing list



> On 30 Nov 2020, at 09:24, Christopher Lameter <cl@linux.com> wrote:
> 
> On Fri, 27 Nov 2020, Håkon Bugge wrote:
> 
>>> Huh? When does it talk to a subnet manager (or the SA)?
>> 
>> When resolving the route AND the option "route_prot" is set to "sa". If
>> set to "acm", what Hong describes above applies.
> 
> My config has "route_prot" set to "sa"
> 
>>> If it gets an IP address of an IB node that does not have ibacm, then it
>>> fails with a timeout...? And leaves hanging kernel threads around by
>>> design?
>> 
>> Nop, the kernel falls back and uses the neighbour cache instead.
> 
> But ib_acme hangs? The main issue here is what the user space app does.
> And we need ibacm to cache user space address resolutions.

I got the impression that you are debugging this with Honggang. If you want me to help, I need, to start with, an strace of ib_acme and ditto of ibacm.
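
Something along these lines should do (standard strace options; the
destination address and attaching to the running daemon by pid are just
a sketch):

  strace -f -tt -o ib_acme.strace ib_acme -d 192.168.50.39
  strace -f -tt -o ibacm.strace -p $(pidof ibacm)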

>>> So it only populates the cache from its local node information?
>> 
>> No, if you use ibacm for address resolution the only protocol it has is
>> "acm", which means the information comes from a peer ibacm.
>> 
>> If you talk about the cache for routes, it comes either from the SA or a
>> peer ibacm, depending on the "route_prot" setting.
> 
> I have always run it with that setting. How can I debug this issue and how
> can we fix this?

k


> 
>> 
>>>> To resolve an IPoIB address to a PathRecord:
>>>> 1) The IPoIB interface must be UP and RUNNING on both the client and the
>>>> target side.
>>>> 2) The ibacm service must be RUNNING on both the client and the target.
>>> 
>>> That is working if you want to resolve only the IP addresses of the IB
>>> interfaces on the client and target. None else.
>> 
>> That is why it is called IBacm, right?
> 
> Huh? IBACM is an address resolution service for IB. Somehow that only
> includes addresses of hosts running IBACM?

Yes. As Honggang explained, ibacm's address resolution protocol is based on IB multicast; as such, the peer must have ibacm running in order to send a unicast response back with the L2 addr.

>>> Here is the description of ibacm's function from the sources:
>>> 
>>> "Conceptually, the ibacm service implements an ARP like protocol and
>>> either uses IB multicast records to construct path record data or queries
>>> the SA directly, depending on the selected route protocol. By default, the
>>> ibacm services uses and caches SA path record queries."
>>> 
>>> SA queries don't work. So it's broken and cannot talk to the SM.
>> 
>> Why do you say that? It works all the time for me, and I use "sa" as "route_prot".
> 
> Not here and not in the tests that RH ran to verify the issue.
> 
> "route_prot" set to "sa" is the default config for the Redhat release of
> IBACM.
> 
> However, the addr_prot is set to  "acm" by default. I set it to "sa" with
> no effect.

OK. Understood. As stated above, let me know if you want me to debug this.


Thxs, Håkon


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-12-04 11:17                           ` Håkon Bugge
@ 2020-12-05 11:50                             ` Christoph Lameter
  2020-12-07 10:28                             ` Christoph Lameter
  1 sibling, 0 replies; 21+ messages in thread
From: Christoph Lameter @ 2020-12-05 11:50 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Honggang LI, Jason Gunthorpe, Mark Haywood, OFED mailing list

On Fri, 4 Dec 2020, Håkon Bugge wrote:

> >> Nop, the kernel falls back and uses the neighbour cache instead.
> >
> > But ib_acme hangs? The main issue here is what the user space app does.
> > And we need ibacm to cache user space address resolutions.
>
> I got the impression that you are debugging this with Honggang. If you want me to help, I need, to start with, an strace of ib_acme and ditto of ibacm.

Ok will do that. Do you have access to the RH case on this one?

> >>>> To resolve an IPoIB address to a PathRecord:
> >>>> 1) The IPoIB interface must be UP and RUNNING on both the client and the
> >>>> target side.
> >>>> 2) The ibacm service must be RUNNING on both the client and the target.
> >>>
> >>> That is working if you want to resolve only the IP addresses of the IB
> >>> interfaces on the client and target. None else.
> >>
> >> That is why it is called IBacm, right?
> >
> > Huh? IBACM is an address resolution service for IB. Somehow that only
> > includes addresses of hosts running IBACM?
>
> Yes. As Honggang explained, ibacm's address resolution protocol is
> based on IB multicast; as such, the peer must have ibacm running in
> order to send a unicast response back with the L2 addr.

What is the point of the route_prot and addr_prot then?

> >>> Here is the description of ibacm's function from the sources:
> >>>
> >>> "Conceptually, the ibacm service implements an ARP like protocol and
> >>> either uses IB multicast records to construct path record data or queries
> >>> the SA directly, depending on the selected route protocol. By default, the
> >>> ibacm services uses and caches SA path record queries."
> >>>
> >>> SA queries don't work. So it's broken and cannot talk to the SM.
> >>
> >> Why do you say that? It works all the time for me, and I use "sa" as "route_prot".
> >
> > Not here and not in the tests that RH ran to verify the issue.
> >
> > "route_prot" set to "sa" is the default config for the Redhat release of
> > IBACM.
> >
> > However, the addr_prot is set to  "acm" by default. I set it to "sa" with
> > no effect.
>
> OK. Understood. As stated above, let me know if you want me to debug this.

Well, what's the point of debugging this if it's only doing address resolution
via multicast and not via the SA?

Is there a particular issue with using the SA? The route information may
contain process-specific information?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-12-04 11:17                           ` Håkon Bugge
  2020-12-05 11:50                             ` Christoph Lameter
@ 2020-12-07 10:28                             ` Christoph Lameter
  2020-12-07 21:08                               ` Mark Haywood
  1 sibling, 1 reply; 21+ messages in thread
From: Christoph Lameter @ 2020-12-07 10:28 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Honggang LI, Jason Gunthorpe, Mark Haywood, OFED mailing list


Looking at librdmacm/rdma_getaddrinfo():

It seems that the call to the IBACM via ucma_ib_resolve() is only done
after a regular getaddrinfo() was run. Is IBACM truly able to provide
address resolution or is it just some strange after processing if the main
resolution attempt fails?

AFAICT ucma_resolve() should run before getaddrinfo()?

Or is there some magic in getaddrinfo() that actually does another call to
the IBACM daemon?



What is also confusing is that the path record determination is part of
getaddrinfo() as well. So both the address and route lookup end up in
getaddrinfo(). Is IB therefore using the kernel to do the lookups?
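
For context, the caller side of this path is just rdma_getaddrinfo(); a
minimal sketch (the destination address/port and the error handling are
only illustrative, link with -lrdmacm):

  #include <stdio.h>
  #include <rdma/rdma_cma.h>

  /* build: cc -o getai getai.c -lrdmacm */
  int main(void)
  {
          struct rdma_addrinfo hints = { 0 }, *res;

          hints.ai_port_space = RDMA_PS_TCP;

          /* 'node' is given, so librdmacm runs getaddrinfo() first and then,
           * per the code path discussed above, asks ibacm via ucma_ib_resolve() */
          if (rdma_getaddrinfo("192.168.50.39", "7471", &hints, &res)) {
                  perror("rdma_getaddrinfo");
                  return 1;
          }

          /* ai_route/ai_route_len carry the resolved path record, if any */
          printf("route bytes: %zu\n", (size_t)res->ai_route_len);

          rdma_freeaddrinfo(res);
          return 0;
  }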





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-12-07 10:28                             ` Christoph Lameter
@ 2020-12-07 21:08                               ` Mark Haywood
  2020-12-08  8:59                                 ` Christoph Lameter
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Haywood @ 2020-12-07 21:08 UTC (permalink / raw)
  To: Christoph Lameter, Håkon Bugge
  Cc: Honggang LI, Jason Gunthorpe, OFED mailing list



On 12/7/20 5:28 AM, Christoph Lameter wrote:
> Looking at librdmacm/rdma_getaddrinfo():
>
> It seems that the call to the IBACM via ucma_ib_resolve() is only done
> after a regular getaddrinfo() was run. Is IBACM truly able to provide
> address resolution or is it just some strange after processing if the main
> resolution attempt fails?



getaddrinfo() is called only if 'node' or 'service' are set. Otherwise, 
'hints' are set and used.

ucma_set_ib_route() (called from rdma_resolve_route()) calls 
rdma_getaddrinfo() with 'hints' set.

Increasing the ibacm log level and then using cmtime(1), I see log 
messages that indicate that ibacm is resolving addresses.
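
(To reproduce that, raise the log level in /etc/rdma/ibacm_opts.cfg and
restart the service; the option name and default log path below are
assumptions, check the comments in the shipped file.)

  log_level 2               # in /etc/rdma/ibacm_opts.cfg
  systemctl restart ibacm
  tail -f /var/log/ibacm.log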



>
> AFAICT ucma_resolve() should run before getaddrinfo()?
>
> Or is there some magic in getaddrinfo() that actually does another call to
> the IBACM daemon?
>
>
>
> What is also confusing is that the path record determination is part of
> getaddrinfo() as well. So both the address and route lookup end up in
> getaddrinfo(). Is IB therefore using the kernel to do the lookups?
>
>
>
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is there a working cache for path record and lids etc for librdmacm?
  2020-12-07 21:08                               ` Mark Haywood
@ 2020-12-08  8:59                                 ` Christoph Lameter
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Lameter @ 2020-12-08  8:59 UTC (permalink / raw)
  To: Mark Haywood
  Cc: Håkon Bugge, Honggang LI, Jason Gunthorpe, OFED mailing list

On Mon, 7 Dec 2020, Mark Haywood wrote:

> On 12/7/20 5:28 AM, Christoph Lameter wrote:
> > Looking at librdmacm/rdma_getaddrinfo():
> >
> > It seems that the call to the IBACM via ucma_ib_resolve() is only done
> > after a regular getaddrinfo() was run. Is IBACM truly able to provide
> > address resolution or is it just some strange after processing if the main
> > resolution attempt fails?
>
>
>
> getaddrinfo() is called only if 'node' or 'service' are set. Otherwise,
> 'hints' are set and used.

Right. It calls the function that does an RPC to ibacm *after*
getaddrinfo. This is confusing. I would have expected this to happen
*before* getaddrinfo and that getaddrinfo would be skipped if ibacm
returns a hit in the cache.

If node is set then we want something to be resolved. So it *first* should
check with ibacm. No?

> ucma_set_ib_route() (called from rdma_resolve_route()) calls
> rdma_getaddrinfo() with 'hints' set.

That in turn calls getaddrinfo.

> Increasing the ibacm log level and then using cmtime(1), I see log messages
> that indicate that ibacm is resolving addresses.

Well it does that under certain circumstances. What kind of addresses are
you resolving?


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-12-08  9:00 UTC | newest]

Thread overview: 21+ messages
2020-11-17  2:57 Is there a working cache for path record and lids etc for librdmacm? Christopher Lameter
2020-11-17  8:46 ` Jens Domke
2020-11-17 14:20   ` Christopher Lameter
2020-11-17 19:33 ` Jason Gunthorpe
2020-11-20 18:05   ` Christopher Lameter
2020-11-20 18:34     ` Håkon Bugge
2020-11-22 12:49       ` Christopher Lameter
2020-11-22 15:50         ` Håkon Bugge
2020-11-22 19:22           ` Christopher Lameter
2020-11-23 12:50             ` Christopher Lameter
2020-11-23 19:01               ` Håkon Bugge
2020-11-24 19:01                 ` Christopher Lameter
2020-11-25  8:10                   ` Honggang LI
2020-11-25 16:43                     ` Christopher Lameter
2020-11-27 14:52                       ` Håkon Bugge
2020-11-30  8:24                         ` Christopher Lameter
2020-12-04 11:17                           ` Håkon Bugge
2020-12-05 11:50                             ` Christoph Lameter
2020-12-07 10:28                             ` Christoph Lameter
2020-12-07 21:08                               ` Mark Haywood
2020-12-08  8:59                                 ` Christoph Lameter
