All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] kernel-boot: Do not perform device rename on OPA devices
@ 2020-02-04 13:55 Goldman, Adam
  2020-02-04 14:14 ` Honggang LI
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Goldman, Adam @ 2020-02-04 13:55 UTC (permalink / raw)
  To: linux-rdma; +Cc: mike.marciniszyn, dennis.dalessandro, Goldman, Adam

From: "Goldman, Adam" <adam.goldman@intel.com>

PSM2 will not run with recent rdma-core releases. Several tools and
libraries like PSM2, require the hfi1 name to be present.

Recent rdma-core releases added a new feature to rename kernel devices,
but the default configuration will not work with hfi1 fabrics.

Related opa-psm2 github issue:
  https://github.com/intel/opa-psm2/issues/43

Fixes: 5b4099d47be3 ("kernel-boot: Perform device rename to make stable names")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Goldman, Adam <adam.goldman@intel.com>
---
 kernel-boot/rdma-persistent-naming.rules | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel-boot/rdma-persistent-naming.rules b/kernel-boot/rdma-persistent-naming.rules
index 9b61e16..95d6851 100644
--- a/kernel-boot/rdma-persistent-naming.rules
+++ b/kernel-boot/rdma-persistent-naming.rules
@@ -25,4 +25,4 @@
 #   Device type = RoCE
 #   mlx5_0 -> rocex525400c0fe123455
 #
-ACTION=="add", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_FALLBACK"
+ACTION=="add", SUBSYSTEM=="infiniband", KERNEL!="hfi1*", PROGRAM="rdma_rename %k NAME_FALLBACK"
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-04 13:55 [PATCH] kernel-boot: Do not perform device rename on OPA devices Goldman, Adam
@ 2020-02-04 14:14 ` Honggang LI
  2020-02-04 14:53   ` Leon Romanovsky
  2020-02-04 14:59   ` Gal Pressman
  2020-02-04 14:56 ` Leon Romanovsky
  2020-02-05 19:12 ` Jason Gunthorpe
  2 siblings, 2 replies; 16+ messages in thread
From: Honggang LI @ 2020-02-04 14:14 UTC (permalink / raw)
  To: Goldman, Adam; +Cc: linux-rdma, mike.marciniszyn, dennis.dalessandro

> +ACTION=="add", SUBSYSTEM=="infiniband", KERNEL!="hfi1*", PROGRAM="rdma_rename %k NAME_FALLBACK"

Maybe, we should not enable device rename as default for all RDMA
hardware. Leave it to system admin to apply rename or not.

We are observing issues with RDMA device renaming too.

Thanks


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-04 14:14 ` Honggang LI
@ 2020-02-04 14:53   ` Leon Romanovsky
  2020-02-04 14:59   ` Gal Pressman
  1 sibling, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2020-02-04 14:53 UTC (permalink / raw)
  To: Honggang LI
  Cc: Goldman, Adam, linux-rdma, mike.marciniszyn, dennis.dalessandro

On Tue, Feb 04, 2020 at 10:14:40PM +0800, Honggang LI wrote:
> > +ACTION=="add", SUBSYSTEM=="infiniband", KERNEL!="hfi1*", PROGRAM="rdma_rename %k NAME_FALLBACK"
>
> Maybe, we should not enable device rename as default for all RDMA
> hardware. Leave it to system admin to apply rename or not.
>
> We are observing issues with RDMA device renaming too.

I can't say that I'm exciting to see such attitude, it is better to fix
broken software instead of limiting features in "independent" library.

Thanks

>
> Thanks
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-04 13:55 [PATCH] kernel-boot: Do not perform device rename on OPA devices Goldman, Adam
  2020-02-04 14:14 ` Honggang LI
@ 2020-02-04 14:56 ` Leon Romanovsky
  2020-02-04 15:26   ` Dennis Dalessandro
  2020-02-05 19:12 ` Jason Gunthorpe
  2 siblings, 1 reply; 16+ messages in thread
From: Leon Romanovsky @ 2020-02-04 14:56 UTC (permalink / raw)
  To: Goldman, Adam; +Cc: linux-rdma, mike.marciniszyn, dennis.dalessandro

On Tue, Feb 04, 2020 at 08:55:20AM -0500, Goldman, Adam wrote:
> From: "Goldman, Adam" <adam.goldman@intel.com>
>
> PSM2 will not run with recent rdma-core releases. Several tools and
> libraries like PSM2, require the hfi1 name to be present.
>
> Recent rdma-core releases added a new feature to rename kernel devices,
> but the default configuration will not work with hfi1 fabrics.
>
> Related opa-psm2 github issue:
>   https://github.com/intel/opa-psm2/issues/43

Why don't you fix opa-psm2 and add required rdma-core version
checks inside packaging spec files, like we have inside
redhat/rdma-core.spec?

Thanks

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-04 14:14 ` Honggang LI
  2020-02-04 14:53   ` Leon Romanovsky
@ 2020-02-04 14:59   ` Gal Pressman
  2020-02-04 15:51     ` Leon Romanovsky
  1 sibling, 1 reply; 16+ messages in thread
From: Gal Pressman @ 2020-02-04 14:59 UTC (permalink / raw)
  To: Honggang LI, Goldman, Adam
  Cc: linux-rdma, mike.marciniszyn, dennis.dalessandro

On 04/02/2020 16:14, Honggang LI wrote:
>> +ACTION=="add", SUBSYSTEM=="infiniband", KERNEL!="hfi1*", PROGRAM="rdma_rename %k NAME_FALLBACK"
> 
> Maybe, we should not enable device rename as default for all RDMA
> hardware. Leave it to system admin to apply rename or not.
> 
> We are observing issues with RDMA device renaming too.

+1, we're experiencing similar issues as well.
If not disabling by default, we need an easy way to disable the feature.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-04 14:56 ` Leon Romanovsky
@ 2020-02-04 15:26   ` Dennis Dalessandro
  2020-02-04 15:53     ` Leon Romanovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Dennis Dalessandro @ 2020-02-04 15:26 UTC (permalink / raw)
  To: Leon Romanovsky, Goldman, Adam; +Cc: linux-rdma, mike.marciniszyn

On 2/4/2020 9:56 AM, Leon Romanovsky wrote:
> On Tue, Feb 04, 2020 at 08:55:20AM -0500, Goldman, Adam wrote:
>> From: "Goldman, Adam" <adam.goldman@intel.com>
>>
>> PSM2 will not run with recent rdma-core releases. Several tools and
>> libraries like PSM2, require the hfi1 name to be present.
>>
>> Recent rdma-core releases added a new feature to rename kernel devices,
>> but the default configuration will not work with hfi1 fabrics.
>>
>> Related opa-psm2 github issue:
>>    https://github.com/intel/opa-psm2/issues/43
> 
> Why don't you fix opa-psm2 and add required rdma-core version
> checks inside packaging spec files, like we have inside
> redhat/rdma-core.spec?
> 
> Thanks
> 

This is the way PSM has operated from day 1. It has been broken by this 
rename stuff. Clearly not everyone is fan, [1] [2] of the rename.

Seems to me like we should revert back to the original behavior. However 
in lieu of that let HW vendors opt out like what this patch from Adam does.

[1] https://marc.info/?l=linux-rdma&m=158082841016117&w=2
[2] https://marc.info/?l=linux-rdma&m=158082569215149&w=2

-Denny

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-04 14:59   ` Gal Pressman
@ 2020-02-04 15:51     ` Leon Romanovsky
  0 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2020-02-04 15:51 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Honggang LI, Goldman, Adam, linux-rdma, mike.marciniszyn,
	dennis.dalessandro

On Tue, Feb 04, 2020 at 04:59:47PM +0200, Gal Pressman wrote:
> On 04/02/2020 16:14, Honggang LI wrote:
> >> +ACTION=="add", SUBSYSTEM=="infiniband", KERNEL!="hfi1*", PROGRAM="rdma_rename %k NAME_FALLBACK"
> >
> > Maybe, we should not enable device rename as default for all RDMA
> > hardware. Leave it to system admin to apply rename or not.
> >
> > We are observing issues with RDMA device renaming too.
>
> +1, we're experiencing similar issues as well.
> If not disabling by default, we need an easy way to disable the feature.

There is super easy way to disable it.

Copy 60-rdma-persistent-naming.rules to /etc/udev/rules.d folder and
change default to anything you want.

This file is not overwritten during rdma-core upgrades and will keep
whatever behavior you need.

Thanks

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-04 15:26   ` Dennis Dalessandro
@ 2020-02-04 15:53     ` Leon Romanovsky
  0 siblings, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2020-02-04 15:53 UTC (permalink / raw)
  To: Dennis Dalessandro; +Cc: Goldman, Adam, linux-rdma, mike.marciniszyn

On Tue, Feb 04, 2020 at 10:26:22AM -0500, Dennis Dalessandro wrote:
> On 2/4/2020 9:56 AM, Leon Romanovsky wrote:
> > On Tue, Feb 04, 2020 at 08:55:20AM -0500, Goldman, Adam wrote:
> > > From: "Goldman, Adam" <adam.goldman@intel.com>
> > >
> > > PSM2 will not run with recent rdma-core releases. Several tools and
> > > libraries like PSM2, require the hfi1 name to be present.
> > >
> > > Recent rdma-core releases added a new feature to rename kernel devices,
> > > but the default configuration will not work with hfi1 fabrics.
> > >
> > > Related opa-psm2 github issue:
> > >    https://github.com/intel/opa-psm2/issues/43
> >
> > Why don't you fix opa-psm2 and add required rdma-core version
> > checks inside packaging spec files, like we have inside
> > redhat/rdma-core.spec?
> >
> > Thanks
> >
>
> This is the way PSM has operated from day 1. It has been broken by this
> rename stuff. Clearly not everyone is fan, [1] [2] of the rename.

Of course that not everyone will be happy, it is a nature of progress :).

>
> Seems to me like we should revert back to the original behavior. However in
> lieu of that let HW vendors opt out like what this patch from Adam does.
>
> [1] https://marc.info/?l=linux-rdma&m=158082841016117&w=2
> [2] https://marc.info/?l=linux-rdma&m=158082569215149&w=2
>
> -Denny

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-04 13:55 [PATCH] kernel-boot: Do not perform device rename on OPA devices Goldman, Adam
  2020-02-04 14:14 ` Honggang LI
  2020-02-04 14:56 ` Leon Romanovsky
@ 2020-02-05 19:12 ` Jason Gunthorpe
  2020-02-05 19:22   ` Leon Romanovsky
  2020-02-05 20:35   ` Dennis Dalessandro
  2 siblings, 2 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2020-02-05 19:12 UTC (permalink / raw)
  To: Goldman, Adam; +Cc: linux-rdma, mike.marciniszyn, dennis.dalessandro

On Tue, Feb 04, 2020 at 08:55:20AM -0500, Goldman, Adam wrote:
> From: "Goldman, Adam" <adam.goldman@intel.com>
> 
> PSM2 will not run with recent rdma-core releases. Several tools and
> libraries like PSM2, require the hfi1 name to be present.
> 
> Recent rdma-core releases added a new feature to rename kernel devices,
> but the default configuration will not work with hfi1 fabrics.
> 
> Related opa-psm2 github issue:
>   https://github.com/intel/opa-psm2/issues/43
> 
> Fixes: 5b4099d47be3 ("kernel-boot: Perform device rename to make stable names")
> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
> Signed-off-by: Goldman, Adam <adam.goldman@intel.com>
>  kernel-boot/rdma-persistent-naming.rules | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel-boot/rdma-persistent-naming.rules b/kernel-boot/rdma-persistent-naming.rules
> index 9b61e16..95d6851 100644
> +++ b/kernel-boot/rdma-persistent-naming.rules
> @@ -25,4 +25,4 @@
>  #   Device type = RoCE
>  #   mlx5_0 -> rocex525400c0fe123455
>  #
> -ACTION=="add", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_FALLBACK"
> +ACTION=="add", SUBSYSTEM=="infiniband", KERNEL!="hfi1*", PROGRAM="rdma_rename %k NAME_FALLBACK"

We are moving to the new names by default slowly, when wrong
assumptions are found in other packages they need to be updated and
their fixes pushed out.

At some point the major distros will default this to On. People using
leading edge distros can turn it off with the global switch Leon
mentioned.

This is the same process netdev went through when they introduced
persistent names.

If I recall, hfi was one of the reason this work was done. HFI has
problems generating consistent names for its multi-function devices in
various cases and I NAK'd the kernel hack to try and 'fix' that.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-05 19:12 ` Jason Gunthorpe
@ 2020-02-05 19:22   ` Leon Romanovsky
  2020-02-05 20:35   ` Dennis Dalessandro
  1 sibling, 0 replies; 16+ messages in thread
From: Leon Romanovsky @ 2020-02-05 19:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Goldman, Adam, linux-rdma, mike.marciniszyn, dennis.dalessandro

On Wed, Feb 05, 2020 at 03:12:27PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 04, 2020 at 08:55:20AM -0500, Goldman, Adam wrote:
> > From: "Goldman, Adam" <adam.goldman@intel.com>
> >
> > PSM2 will not run with recent rdma-core releases. Several tools and
> > libraries like PSM2, require the hfi1 name to be present.
> >
> > Recent rdma-core releases added a new feature to rename kernel devices,
> > but the default configuration will not work with hfi1 fabrics.
> >
> > Related opa-psm2 github issue:
> >   https://github.com/intel/opa-psm2/issues/43
> >
> > Fixes: 5b4099d47be3 ("kernel-boot: Perform device rename to make stable names")
> > Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
> > Signed-off-by: Goldman, Adam <adam.goldman@intel.com>
> >  kernel-boot/rdma-persistent-naming.rules | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel-boot/rdma-persistent-naming.rules b/kernel-boot/rdma-persistent-naming.rules
> > index 9b61e16..95d6851 100644
> > +++ b/kernel-boot/rdma-persistent-naming.rules
> > @@ -25,4 +25,4 @@
> >  #   Device type = RoCE
> >  #   mlx5_0 -> rocex525400c0fe123455
> >  #
> > -ACTION=="add", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_FALLBACK"
> > +ACTION=="add", SUBSYSTEM=="infiniband", KERNEL!="hfi1*", PROGRAM="rdma_rename %k NAME_FALLBACK"
>
> We are moving to the new names by default slowly, when wrong
> assumptions are found in other packages they need to be updated and
> their fixes pushed out.
>
> At some point the major distros will default this to On. People using
> leading edge distros can turn it off with the global switch Leon
> mentioned.
>
> This is the same process netdev went through when they introduced
> persistent names.
>
> If I recall, hfi was one of the reason this work was done. HFI has
> problems generating consistent names for its multi-function devices in
> various cases and I NAK'd the kernel hack to try and 'fix' that.

You remember correctly, it was related to the bug in production where
physical ports were not aligned with their physical location and the
module parameter was supposed to "reverse" the enumeration order.

Something like that.

Thanks

>
> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-05 19:12 ` Jason Gunthorpe
  2020-02-05 19:22   ` Leon Romanovsky
@ 2020-02-05 20:35   ` Dennis Dalessandro
  2020-02-05 20:54     ` Jason Gunthorpe
  1 sibling, 1 reply; 16+ messages in thread
From: Dennis Dalessandro @ 2020-02-05 20:35 UTC (permalink / raw)
  To: Jason Gunthorpe, Goldman, Adam; +Cc: linux-rdma, mike.marciniszyn

On 2/5/2020 2:12 PM, Jason Gunthorpe wrote:
> On Tue, Feb 04, 2020 at 08:55:20AM -0500, Goldman, Adam wrote:
>> From: "Goldman, Adam" <adam.goldman@intel.com>
>>
>> PSM2 will not run with recent rdma-core releases. Several tools and
>> libraries like PSM2, require the hfi1 name to be present.
>>
>> Recent rdma-core releases added a new feature to rename kernel devices,
>> but the default configuration will not work with hfi1 fabrics.
>>
>> Related opa-psm2 github issue:
>>    https://github.com/intel/opa-psm2/issues/43
>>
>> Fixes: 5b4099d47be3 ("kernel-boot: Perform device rename to make stable names")
>> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
>> Signed-off-by: Goldman, Adam <adam.goldman@intel.com>
>>   kernel-boot/rdma-persistent-naming.rules | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel-boot/rdma-persistent-naming.rules b/kernel-boot/rdma-persistent-naming.rules
>> index 9b61e16..95d6851 100644
>> +++ b/kernel-boot/rdma-persistent-naming.rules
>> @@ -25,4 +25,4 @@
>>   #   Device type = RoCE
>>   #   mlx5_0 -> rocex525400c0fe123455
>>   #
>> -ACTION=="add", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_FALLBACK"
>> +ACTION=="add", SUBSYSTEM=="infiniband", KERNEL!="hfi1*", PROGRAM="rdma_rename %k NAME_FALLBACK"
> 
> We are moving to the new names by default slowly, when wrong
> assumptions are found in other packages they need to be updated and
> their fixes pushed out.
> 
> At some point the major distros will default this to On. People using
> leading edge distros can turn it off with the global switch Leon
> mentioned.
> 
> This is the same process netdev went through when they introduced
> persistent names.
> 
> If I recall, hfi was one of the reason this work was done. HFI has
> problems generating consistent names for its multi-function devices in
> various cases and I NAK'd the kernel hack to try and 'fix' that.

So are you saying you won't take this patch then?

I guess we can work with distros to get the right rules in place outside 
of rdma-core so that things continue to work. It would be better though 
in my opinion to just have that be in rdma-core so no one has to worry 
about it and nothing needs to be globally disabled.

You are correct someone tried to put forth a hack for the flip-flop name 
thing [1]. However even if this was used as a solution for that issue we 
would still have the same library looking for hfi1_0 problem.

[1] https://patchwork.kernel.org/patch/9508879/

-Denny






^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-05 20:35   ` Dennis Dalessandro
@ 2020-02-05 20:54     ` Jason Gunthorpe
  2020-02-05 21:59       ` Dennis Dalessandro
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2020-02-05 20:54 UTC (permalink / raw)
  To: Dennis Dalessandro; +Cc: Goldman, Adam, linux-rdma, mike.marciniszyn

On Wed, Feb 05, 2020 at 03:35:13PM -0500, Dennis Dalessandro wrote:
> On 2/5/2020 2:12 PM, Jason Gunthorpe wrote:
> > On Tue, Feb 04, 2020 at 08:55:20AM -0500, Goldman, Adam wrote:
> > > From: "Goldman, Adam" <adam.goldman@intel.com>
> > > 
> > > PSM2 will not run with recent rdma-core releases. Several tools and
> > > libraries like PSM2, require the hfi1 name to be present.
> > > 
> > > Recent rdma-core releases added a new feature to rename kernel devices,
> > > but the default configuration will not work with hfi1 fabrics.
> > > 
> > > Related opa-psm2 github issue:
> > >    https://github.com/intel/opa-psm2/issues/43
> > > 
> > > Fixes: 5b4099d47be3 ("kernel-boot: Perform device rename to make stable names")
> > > Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
> > > Signed-off-by: Goldman, Adam <adam.goldman@intel.com>
> > >   kernel-boot/rdma-persistent-naming.rules | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel-boot/rdma-persistent-naming.rules b/kernel-boot/rdma-persistent-naming.rules
> > > index 9b61e16..95d6851 100644
> > > +++ b/kernel-boot/rdma-persistent-naming.rules
> > > @@ -25,4 +25,4 @@
> > >   #   Device type = RoCE
> > >   #   mlx5_0 -> rocex525400c0fe123455
> > >   #
> > > -ACTION=="add", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_FALLBACK"
> > > +ACTION=="add", SUBSYSTEM=="infiniband", KERNEL!="hfi1*", PROGRAM="rdma_rename %k NAME_FALLBACK"
> > 
> > We are moving to the new names by default slowly, when wrong
> > assumptions are found in other packages they need to be updated and
> > their fixes pushed out.
> > 
> > At some point the major distros will default this to On. People using
> > leading edge distros can turn it off with the global switch Leon
> > mentioned.
> > 
> > This is the same process netdev went through when they introduced
> > persistent names.
> > 
> > If I recall, hfi was one of the reason this work was done. HFI has
> > problems generating consistent names for its multi-function devices in
> > various cases and I NAK'd the kernel hack to try and 'fix' that.
> 
> So are you saying you won't take this patch then?

No, this is not a longterm solution. The point of upstream here is to
highlight what needs to be fixed so leading edge distro can fix their
stuff.

> I guess we can work with distros to get the right rules in place outside of
> rdma-core so that things continue to work. 

I would actively block an attempt to try and do an end-run around
upstream like this. rdma-core is supposed to be the defacto
configuration, not be modified randomly by distros as before.

You can request distros delay enabling renaming until psm/etc are
fixed.

The distros know the users/cases where renaming is needed and can
decide if they are more or less important than psm for default
enablement.

> You are correct someone tried to put forth a hack for the flip-flop name
> thing [1]. However even if this was used as a solution for that issue we
> would still have the same library looking for hfi1_0 problem.

It was always a bad design to hardwire strings like this, that library
needs to be fixed up.

Do you remember when I was so annoyed that HFI1 created it's own char
dev, and told you not to do it? This is yet another reason why...

Why isn't psm keying off it's own chardev anyhow? There should be back
links to the RDMA device in sysfs from there.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-05 20:54     ` Jason Gunthorpe
@ 2020-02-05 21:59       ` Dennis Dalessandro
  2020-02-06 13:52         ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Dennis Dalessandro @ 2020-02-05 21:59 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Goldman, Adam, linux-rdma, mike.marciniszyn

On 2/5/2020 3:54 PM, Jason Gunthorpe wrote:
> On Wed, Feb 05, 2020 at 03:35:13PM -0500, Dennis Dalessandro wrote:
>> On 2/5/2020 2:12 PM, Jason Gunthorpe wrote:
>>> On Tue, Feb 04, 2020 at 08:55:20AM -0500, Goldman, Adam wrote:
>>>> From: "Goldman, Adam" <adam.goldman@intel.com>
>>>>
>>>> PSM2 will not run with recent rdma-core releases. Several tools and
>>>> libraries like PSM2, require the hfi1 name to be present.
>>>>
>>>> Recent rdma-core releases added a new feature to rename kernel devices,
>>>> but the default configuration will not work with hfi1 fabrics.
>>>>
>>>> Related opa-psm2 github issue:
>>>>     https://github.com/intel/opa-psm2/issues/43
>>>>
>>>> Fixes: 5b4099d47be3 ("kernel-boot: Perform device rename to make stable names")
>>>> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
>>>> Signed-off-by: Goldman, Adam <adam.goldman@intel.com>
>>>>    kernel-boot/rdma-persistent-naming.rules | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel-boot/rdma-persistent-naming.rules b/kernel-boot/rdma-persistent-naming.rules
>>>> index 9b61e16..95d6851 100644
>>>> +++ b/kernel-boot/rdma-persistent-naming.rules
>>>> @@ -25,4 +25,4 @@
>>>>    #   Device type = RoCE
>>>>    #   mlx5_0 -> rocex525400c0fe123455
>>>>    #
>>>> -ACTION=="add", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_FALLBACK"
>>>> +ACTION=="add", SUBSYSTEM=="infiniband", KERNEL!="hfi1*", PROGRAM="rdma_rename %k NAME_FALLBACK"
>>>
>>> We are moving to the new names by default slowly, when wrong
>>> assumptions are found in other packages they need to be updated and
>>> their fixes pushed out.
>>>
>>> At some point the major distros will default this to On. People using
>>> leading edge distros can turn it off with the global switch Leon
>>> mentioned.
>>>
>>> This is the same process netdev went through when they introduced
>>> persistent names.
>>>
>>> If I recall, hfi was one of the reason this work was done. HFI has
>>> problems generating consistent names for its multi-function devices in
>>> various cases and I NAK'd the kernel hack to try and 'fix' that.
>>
>> So are you saying you won't take this patch then?
> 
> No, this is not a longterm solution. The point of upstream here is to
> highlight what needs to be fixed so leading edge distro can fix their
> stuff.
> 
>> I guess we can work with distros to get the right rules in place outside of
>> rdma-core so that things continue to work.
> 
> I would actively block an attempt to try and do an end-run around
> upstream like this. rdma-core is supposed to be the defacto
> configuration, not be modified randomly by distros as before.

No but users should be free to name their devices how they want should 
they not?

> You can request distros delay enabling renaming until psm/etc are
> fixed.

Not an end-run around upstream at all. I didn't mean to imply anything 
about how it's done, delaying the enabling, or whatever is fine for now. 
I just meant something that does *not* change/impact rdma-core.

> The distros know the users/cases where renaming is needed and can
> decide if they are more or less important than psm for default
> enablement.

Exactly. We are on the same page here.

>> You are correct someone tried to put forth a hack for the flip-flop name
>> thing [1]. However even if this was used as a solution for that issue we
>> would still have the same library looking for hfi1_0 problem.
> 
> It was always a bad design to hardwire strings like this, that library
> needs to be fixed up.
> 
> Do you remember when I was so annoyed that HFI1 created it's own char
> dev, and told you not to do it? This is yet another reason why...

> Why isn't psm keying off it's own chardev anyhow? There should be back
> links to the RDMA device in sysfs from there.

No arguments here. No sense in going down this road though at this point 
in the game.

-Denny

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-05 21:59       ` Dennis Dalessandro
@ 2020-02-06 13:52         ` Jason Gunthorpe
  2020-02-06 17:51           ` Dennis Dalessandro
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2020-02-06 13:52 UTC (permalink / raw)
  To: Dennis Dalessandro; +Cc: Goldman, Adam, linux-rdma, mike.marciniszyn

On Wed, Feb 05, 2020 at 04:59:11PM -0500, Dennis Dalessandro wrote:
> > I would actively block an attempt to try and do an end-run around
> > upstream like this. rdma-core is supposed to be the defacto
> > configuration, not be modified randomly by distros as before.
> 
> No but users should be free to name their devices how they want should they
> not?

Isn't that exactly why PSM is broken?

These days I can do

$ rdma link add hfi1_0 type siw netdev eth0

and PSM will become very confused.

This is why keying off the device name was *never* OK.

> > Why isn't psm keying off it's own chardev anyhow? There should be back
> > links to the RDMA device in sysfs from there.
> 
> No arguments here. No sense in going down this road though at this point in
> the game.

I'm not sure what these means? Are you saying you won't be fixing PSM? Why?

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-06 13:52         ` Jason Gunthorpe
@ 2020-02-06 17:51           ` Dennis Dalessandro
  2020-02-06 18:22             ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Dennis Dalessandro @ 2020-02-06 17:51 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Goldman, Adam, linux-rdma, mike.marciniszyn

On 2/6/2020 8:52 AM, Jason Gunthorpe wrote:
> On Wed, Feb 05, 2020 at 04:59:11PM -0500, Dennis Dalessandro wrote:
>>> I would actively block an attempt to try and do an end-run around
>>> upstream like this. rdma-core is supposed to be the defacto
>>> configuration, not be modified randomly by distros as before.
>>
>> No but users should be free to name their devices how they want should they
>> not?
> 
> Isn't that exactly why PSM is broken?
> 
> These days I can do
> 
> $ rdma link add hfi1_0 type siw netdev eth0
> 
> and PSM will become very confused.
> 
> This is why keying off the device name was *never* OK.
> 
>>> Why isn't psm keying off it's own chardev anyhow? There should be back
>>> links to the RDMA device in sysfs from there.
>>
>> No arguments here. No sense in going down this road though at this point in
>> the game.
> 
> I'm not sure what these means? Are you saying you won't be fixing PSM? Why?
> 

It's not worth going through the same to have a cdev or not argument 
over again.

-Denny

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] kernel-boot: Do not perform device rename on OPA devices
  2020-02-06 17:51           ` Dennis Dalessandro
@ 2020-02-06 18:22             ` Jason Gunthorpe
  0 siblings, 0 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2020-02-06 18:22 UTC (permalink / raw)
  To: Dennis Dalessandro; +Cc: Goldman, Adam, linux-rdma, mike.marciniszyn

On Thu, Feb 06, 2020 at 12:51:58PM -0500, Dennis Dalessandro wrote:
> On 2/6/2020 8:52 AM, Jason Gunthorpe wrote:
> > On Wed, Feb 05, 2020 at 04:59:11PM -0500, Dennis Dalessandro wrote:
> > > > I would actively block an attempt to try and do an end-run around
> > > > upstream like this. rdma-core is supposed to be the defacto
> > > > configuration, not be modified randomly by distros as before.
> > > 
> > > No but users should be free to name their devices how they want should they
> > > not?
> > 
> > Isn't that exactly why PSM is broken?
> > 
> > These days I can do
> > 
> > $ rdma link add hfi1_0 type siw netdev eth0
> > 
> > and PSM will become very confused.
> > 
> > This is why keying off the device name was *never* OK.
> > 
> > > > Why isn't psm keying off it's own chardev anyhow? There should be back
> > > > links to the RDMA device in sysfs from there.
> > > 
> > > No arguments here. No sense in going down this road though at this point in
> > > the game.
> > 
> > I'm not sure what these means? Are you saying you won't be fixing PSM? Why?
> > 
> 
> It's not worth going through the same to have a cdev or not argument over
> again.

What do you mean? You already have the extra cdev. It has a stable
name - I see hfi1_%d used as the pattern.

It even creates a class called hfi1

If you want to enumerate devices of this class readdir on
/sys/class/hfi1. This is yours, it will have only your devices. This
is what should have been done from the beginning.

If you want to find the RDMA parent then
readlink("/sys/class/hfi1/hfi1_xx/") + "/../.." should give it to you

Assuming the driver doesn't screw up the usage of sysfs.. Oh, right,
it does, and that never did get fixed.

The struct hfi1_devdata can not have both a struct kobj and a struct
ib_device as inline members. It *should not* have a kobject at all, it
should be using cdev_device_add() and specifying the ib_device as the
parent. Due to these mistakes I suppose the hfi1_x chardev is placed
incorrectly in sysfs.

So you fix it all up. Fix the kernel, fix the psm, etc. You insistend
on this extra char dev, you need to take responsibility to make sure
it is done properly and doesn't interfere with the rest of the system.

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-02-06 18:22 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-04 13:55 [PATCH] kernel-boot: Do not perform device rename on OPA devices Goldman, Adam
2020-02-04 14:14 ` Honggang LI
2020-02-04 14:53   ` Leon Romanovsky
2020-02-04 14:59   ` Gal Pressman
2020-02-04 15:51     ` Leon Romanovsky
2020-02-04 14:56 ` Leon Romanovsky
2020-02-04 15:26   ` Dennis Dalessandro
2020-02-04 15:53     ` Leon Romanovsky
2020-02-05 19:12 ` Jason Gunthorpe
2020-02-05 19:22   ` Leon Romanovsky
2020-02-05 20:35   ` Dennis Dalessandro
2020-02-05 20:54     ` Jason Gunthorpe
2020-02-05 21:59       ` Dennis Dalessandro
2020-02-06 13:52         ` Jason Gunthorpe
2020-02-06 17:51           ` Dennis Dalessandro
2020-02-06 18:22             ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.