All of lore.kernel.org
 help / color / mirror / Atom feed
* REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
@ 2021-09-20 10:20 Russell King (Oracle)
  2021-09-20 18:33 ` Mark Brown
  2021-09-20 19:41 ` Andreas Schwab
  0 siblings, 2 replies; 17+ messages in thread
From: Russell King (Oracle) @ 2021-09-20 10:20 UTC (permalink / raw)
  To: broonie, Marco Felsch, Andreas Schwab; +Cc: linux-spi, kernel

Hi,

Commit 3ce6c9e2617e ("spi: add of_device_uevent_modalias support")
prevents Macchiatobin auto-loading the spi-nor driver at boot.

Prior to this change, /sys/bus/spi/devices/spi4.0/uevent contained:

OF_COMPATIBLE_0=st,w25q32
OF_COMPATIBLE_N=1
MODALIAS=spi:w25q32

since the DT for this platform contains:

spi-flash@0 {
        compatible = "st,w25q32";

which is entirely legal according to the binding documentation in
Documentation/devicetree/bindings/mtd/jedec,spi-nor.yaml.

The above referenced commit changes this to:

MODALIAS=of:Nspi-flashT(null)Cst,w25q32

However, the spi-nor module only supports these "of" modaliases:

alias:          of:N*T*Cjedec,spi-norC*
alias:          of:N*T*Cjedec,spi-nor

but supports _way_ more "spi" modaliases, including "spi:w25q32".

Therefore, this change breaks module autoloading.

A following commit e09f2ab8eecc ("spi: update modalias_show after
of_device_uevent_modalias support") also changed the 
/sys/bus/spi/devices/spi4.0/modalias making a similar breaking change.

Hence there are two commits that may need to be reverted:

e09f2ab8eecc ("spi: update modalias_show after of_device_uevent_modalias support")
3ce6c9e2617e ("spi: add of_device_uevent_modalias support")

Alternatively, we need to add _all_ the flash types that the spi-nor
driver supports to the DT table, which sounds like a recipe for
disaster waiting to happen as it means maintaining two large tables of
flash devices, one for the SPI aliases with the flash information and
one for the DT aliases.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-09-20 10:20 REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin Russell King (Oracle)
@ 2021-09-20 18:33 ` Mark Brown
  2021-09-20 19:37   ` Russell King (Oracle)
  2021-09-20 19:41 ` Andreas Schwab
  1 sibling, 1 reply; 17+ messages in thread
From: Mark Brown @ 2021-09-20 18:33 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Marco Felsch, Andreas Schwab, linux-spi, kernel, robh

[-- Attachment #1: Type: text/plain, Size: 2080 bytes --]

On Mon, Sep 20, 2021 at 11:20:29AM +0100, Russell King (Oracle) wrote:

> spi-flash@0 {
>         compatible = "st,w25q32";

> which is entirely legal according to the binding documentation in
> Documentation/devicetree/bindings/mtd/jedec,spi-nor.yaml.

Are you sure?  Looking at the binding document it appears that the
fallback to jedec,spi-nor is mandatory in all cases - it's either one of
the two items: cases both of which are lists with jedec,spi-nor in them
or just the plain jedec,spi-nor fallback.  It kind of doesn't matter
given that we weren't enforcing it in the past but still.

> MODALIAS=of:Nspi-flashT(null)Cst,w25q32

> However, the spi-nor module only supports these "of" modaliases:

> alias:          of:N*T*Cjedec,spi-norC*
> alias:          of:N*T*Cjedec,spi-nor

> but supports _way_ more "spi" modaliases, including "spi:w25q32".

> Therefore, this change breaks module autoloading.

Ugh, right.  This sort of stuff is why I really dislike not listing all
the compatibles in driver like some of the SoC vendors seem allergic to,
I keep having to fight for that.  

> Hence there are two commits that may need to be reverted:

> e09f2ab8eecc ("spi: update modalias_show after of_device_uevent_modalias support")
> 3ce6c9e2617e ("spi: add of_device_uevent_modalias support")

This then causes issues for anything trying to bind based with DT
aliases AIUI so it's just pushing the problems around to different
devices.  I think ideally we should be including the fallback compat IDs
that could also be matched along with the OF aliases.

> Alternatively, we need to add _all_ the flash types that the spi-nor
> driver supports to the DT table, which sounds like a recipe for
> disaster waiting to happen as it means maintaining two large tables of
> flash devices, one for the SPI aliases with the flash information and
> one for the DT aliases.

That doesn't seem particularly hard TBH, and if we're going to be
listing any compatibles we really ought to be including them all rather
than just a random one.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-09-20 18:33 ` Mark Brown
@ 2021-09-20 19:37   ` Russell King (Oracle)
  2021-09-20 21:25     ` Mark Brown
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King (Oracle) @ 2021-09-20 19:37 UTC (permalink / raw)
  To: Mark Brown; +Cc: Marco Felsch, Andreas Schwab, linux-spi, kernel, robh

On Mon, Sep 20, 2021 at 07:33:27PM +0100, Mark Brown wrote:
> On Mon, Sep 20, 2021 at 11:20:29AM +0100, Russell King (Oracle) wrote:
> 
> > spi-flash@0 {
> >         compatible = "st,w25q32";
> 
> > which is entirely legal according to the binding documentation in
> > Documentation/devicetree/bindings/mtd/jedec,spi-nor.yaml.
> 
> Are you sure?  Looking at the binding document it appears that the
> fallback to jedec,spi-nor is mandatory in all cases - it's either one of
> the two items: cases both of which are lists with jedec,spi-nor in them
> or just the plain jedec,spi-nor fallback.  It kind of doesn't matter
> given that we weren't enforcing it in the past but still.

We aren't even enforcing it today either - running the DT checker is
entirely optional, and it's not even carried by distros, so across
distro upgrades it breaks. I'd also suggest that almost no one bothers
to run it either, looking at the almost 6700 lines of output it
produces for my build - the chances of spotting anything relevant in
that are practically zero.

Nevertheless:

arch/arm64/boot/dts/marvell/armada-8040-mcbin.dt.yaml:0:0: /cp1/config-space@f4000000/spi@700680/spi-flash@0: failed to match any schema with compatible: ['st,w25q32']

because, although the driver accepts devices of type "w25q32", it
isn't listed in the DT schema. However, the driver code itself
accepts that "w25q32" is used in DT - so the schema is out of step
with the driver and has been for ages.

        /*
         * Entries that were used in DTs without "jedec,spi-nor" fallback and
         * should be kept for backward compatibility.
         */
        {"at25df321a"}, {"at25df641"},  {"at26df081a"},
        {"mx25l4005a"}, {"mx25l1606e"}, {"mx25l6405d"}, {"mx25l12805d"},
        {"mx25l25635e"},{"mx66l51235l"},
        {"n25q064"},    {"n25q128a11"}, {"n25q128a13"}, {"n25q512a"},
        {"s25fl256s1"}, {"s25fl512s"},  {"s25sl12801"}, {"s25fl008k"},
        {"s25fl064k"},
        {"sst25vf040b"},{"sst25vf016b"},{"sst25vf032b"},{"sst25wf040"},
        {"m25p40"},     {"m25p80"},     {"m25p16"},     {"m25p32"},
        {"m25p64"},     {"m25p128"},
        {"w25x80"},     {"w25x32"},     {"w25q32"},     {"w25q32dw"},
        {"w25q80bl"},   {"w25q128"},    {"w25q256"},

> > MODALIAS=of:Nspi-flashT(null)Cst,w25q32
> 
> > However, the spi-nor module only supports these "of" modaliases:
> 
> > alias:          of:N*T*Cjedec,spi-norC*
> > alias:          of:N*T*Cjedec,spi-nor
> 
> > but supports _way_ more "spi" modaliases, including "spi:w25q32".
> 
> > Therefore, this change breaks module autoloading.
> 
> Ugh, right.  This sort of stuff is why I really dislike not listing all
> the compatibles in driver like some of the SoC vendors seem allergic to,
> I keep having to fight for that.  
> 
> > Hence there are two commits that may need to be reverted:
> 
> > e09f2ab8eecc ("spi: update modalias_show after of_device_uevent_modalias support")
> > 3ce6c9e2617e ("spi: add of_device_uevent_modalias support")
> 
> This then causes issues for anything trying to bind based with DT
> aliases AIUI so it's just pushing the problems around to different
> devices.  I think ideally we should be including the fallback compat IDs
> that could also be matched along with the OF aliases.

I have a different view. These patches have fixed one problem by
creating another problem - they have _changed_ the module alias
that SPI creates for DT.

Originally, the module alias was created via of_modalias_node()
in of_register_spi_device():

        /* Select device driver */
        rc = of_modalias_node(nc, spi->modalias,
                                sizeof(spi->modalias));
        if (rc < 0) {
                dev_err(&ctlr->dev, "cannot find modalias for %pOF\n", nc);
                goto err_out;
        }

However, as a result of the above two commits, the modalias that
is now given to userspace has changed from whatever
of_modalias_node() produced to whatever of_device_modalias() and
of_device_uevent_modalias() produces - which is something quite
different.

So, IMHO the change in these two patches was _wrong_ and always
was wrong, and was always going to lead to this problem. Randomly
deciding to have a different modalias policy is always going to
lead to problems like this.

> > Alternatively, we need to add _all_ the flash types that the spi-nor
> > driver supports to the DT table, which sounds like a recipe for
> > disaster waiting to happen as it means maintaining two large tables of
> > flash devices, one for the SPI aliases with the flash information and
> > one for the DT aliases.
> 
> That doesn't seem particularly hard TBH, and if we're going to be
> listing any compatibles we really ought to be including them all rather
> than just a random one.

Looking at the shere number of combinations given the regexp in
the DT binding document, I think someone would need to script its
generation - expanding just the first set leads to:

micron,m25p40
micron,m25p80
micron,m25p16
micron,m25p32
micron,m25p64
micron,m25p128
spansion,m25p40
spansion,m25p80
spansion,m25p16
spansion,m25p32
spansion,m25p64
spansion,m25p128
st,m25p40
st,m25p80
st,m25p16
st,m25p32
st,m25p64
st,m25p128
m25p40
m25p80
m25p16
m25p32
m25p64
m25p128

And then you have a similar set for n25q*. Once all of that regexp
has been expanded, one then needs to add the list for "backward
compatibility" with all the different manufacturer prefixes that
of_modalias_node() stripped off.

Are you really sure this is a solution you wish to require?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-09-20 10:20 REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin Russell King (Oracle)
  2021-09-20 18:33 ` Mark Brown
@ 2021-09-20 19:41 ` Andreas Schwab
  2021-09-20 19:49   ` Russell King (Oracle)
  1 sibling, 1 reply; 17+ messages in thread
From: Andreas Schwab @ 2021-09-20 19:41 UTC (permalink / raw)
  To: Russell King (Oracle); +Cc: broonie, Marco Felsch, linux-spi, kernel

On Sep 20 2021, Russell King (Oracle) wrote:

> Therefore, this change breaks module autoloading.

Reverting this change breaks module autoloading.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-09-20 19:41 ` Andreas Schwab
@ 2021-09-20 19:49   ` Russell King (Oracle)
  2021-09-20 20:52     ` Mark Brown
  2021-09-20 21:56     ` Andreas Schwab
  0 siblings, 2 replies; 17+ messages in thread
From: Russell King (Oracle) @ 2021-09-20 19:49 UTC (permalink / raw)
  To: Andreas Schwab, Linus Torvalds; +Cc: broonie, Marco Felsch, linux-spi, kernel

On Mon, Sep 20, 2021 at 09:41:47PM +0200, Andreas Schwab wrote:
> On Sep 20 2021, Russell King (Oracle) wrote:
> 
> > Therefore, this change breaks module autoloading.
> 
> Reverting this change breaks module autoloading.

No.

Module autoloading worked before. Then someone probably noticed a
problem, and thought they'd fix it by changing how the module alias
strings SPI provides are produced. In fixing it, they broke existing
setups that have worked for years.

If you think it's acceptable to break existing setups to fix a bug
you need to be reminded of Linus' rant:

  https://lkml.org/lkml/2018/8/3/621

"  Bugs happen. That's a fact of life. Arguing that "we had to break
   something because we were fixing a bug" is completely insane. We fix
   tens of bugs every single day, thinking that "fixing a bug" means
   that we can break something is simply NOT TRUE."

It seems that is exactly what has happened here: a change was
introduced to fix an apparently module autoloading bug that has
caused _another_ module autoloading bug that wasn't originally
there.

That is not fixing a bug at all.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-09-20 19:49   ` Russell King (Oracle)
@ 2021-09-20 20:52     ` Mark Brown
  2021-09-20 21:56     ` Andreas Schwab
  1 sibling, 0 replies; 17+ messages in thread
From: Mark Brown @ 2021-09-20 20:52 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andreas Schwab, Linus Torvalds, Marco Felsch, linux-spi, kernel

[-- Attachment #1: Type: text/plain, Size: 1594 bytes --]

On Mon, Sep 20, 2021 at 08:49:21PM +0100, Russell King (Oracle) wrote:
> On Mon, Sep 20, 2021 at 09:41:47PM +0200, Andreas Schwab wrote:
> > On Sep 20 2021, Russell King (Oracle) wrote:

> > > Therefore, this change breaks module autoloading.

> > Reverting this change breaks module autoloading.

> No.

> Module autoloading worked before. Then someone probably noticed a
> problem, and thought they'd fix it by changing how the module alias
> strings SPI provides are produced. In fixing it, they broke existing
> setups that have worked for years.

To be clear Russell is absolutely right here.  These changes have
broken module autoloading for spi-nor on device tree based
systems which don't list the jedec,spi-nor fallback compatible,
and quite likely for some other drivers/systems that were also
relying on the fallback compatible mechanism in a similar way.
They will also have fixed systems where we weren't autoloading
based on DT compatibles but the broken systems are still broken
and regardless of the quality of the DTs that those systems have
DT is an ABI so they have to continue to work.

Ideally we'll be able to keep both sets of drivers working (and
I think we probably should just get all the compatibles listed in
the spi-nor driver for the sake of robustness if nothing else),
unfortunately this wasn't noticed until after v5.14 was released
so we might now have systems relying on the new behaviour too
which complicates things a bit.  Still, if we can't get it fixed
reasonably promptly (this week say, for the next -rc) I expect
I'll revert as Russell has suggested.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-09-20 19:37   ` Russell King (Oracle)
@ 2021-09-20 21:25     ` Mark Brown
  2021-09-21 10:55       ` Russell King (Oracle)
  0 siblings, 1 reply; 17+ messages in thread
From: Mark Brown @ 2021-09-20 21:25 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Marco Felsch, Andreas Schwab, linux-spi, kernel, robh

[-- Attachment #1: Type: text/plain, Size: 4655 bytes --]

On Mon, Sep 20, 2021 at 08:37:23PM +0100, Russell King (Oracle) wrote:
> On Mon, Sep 20, 2021 at 07:33:27PM +0100, Mark Brown wrote:
> > On Mon, Sep 20, 2021 at 11:20:29AM +0100, Russell King (Oracle) wrote:

> > Are you sure?  Looking at the binding document it appears that the
> > fallback to jedec,spi-nor is mandatory in all cases - it's either one of
> > the two items: cases both of which are lists with jedec,spi-nor in them
> > or just the plain jedec,spi-nor fallback.  It kind of doesn't matter
> > given that we weren't enforcing it in the past but still.

> We aren't even enforcing it today either - running the DT checker is
> entirely optional, and it's not even carried by distros, so across
> distro upgrades it breaks. I'd also suggest that almost no one bothers
> to run it either, looking at the almost 6700 lines of output it
> produces for my build - the chances of spotting anything relevant in
> that are practically zero.

Right, good - my read of the DT binding document was correct at
least.  Like we're both saying it doesn't really matter what was
documented, what we were accepting is what matters.

> because, although the driver accepts devices of type "w25q32", it
> isn't listed in the DT schema. However, the driver code itself
> accepts that "w25q32" is used in DT - so the schema is out of step
> with the driver and has been for ages.

Looks like it's trying to list it but not quite managing.
Another thing to fix...

> > This then causes issues for anything trying to bind based with DT
> > aliases AIUI so it's just pushing the problems around to different
> > devices.  I think ideally we should be including the fallback compat IDs
> > that could also be matched along with the OF aliases.

> I have a different view. These patches have fixed one problem by
> creating another problem - they have _changed_ the module alias
> that SPI creates for DT.

> Originally, the module alias was created via of_modalias_node()
> in of_register_spi_device():

>         /* Select device driver */
>         rc = of_modalias_node(nc, spi->modalias,
>                                 sizeof(spi->modalias));
>         if (rc < 0) {
>                 dev_err(&ctlr->dev, "cannot find modalias for %pOF\n", nc);
>                 goto err_out;
>         }

> However, as a result of the above two commits, the modalias that
> is now given to userspace has changed from whatever
> of_modalias_node() produced to whatever of_device_modalias() and
> of_device_uevent_modalias() produces - which is something quite
> different.

> So, IMHO the change in these two patches was _wrong_ and always
> was wrong, and was always going to lead to this problem. Randomly
> deciding to have a different modalias policy is always going to
> lead to problems like this.

Hrm, right - I hadn't really registered that we were generating
compat modaliases in quite that way (and hadn't had the bandwidth
to dig into that properly today, should do tomorrow).  First pass
I'd think that either SPI or probably of_device_uevent_modalias()
ought to be generating both formats (assuming we can list
multiple modaliases which I'm not sure on) but like I say I've
really not dug into this properly yet at this point.

I'm reasonably sure we should be continuing to generate the old
style modalises like a revert would, my main questions are if we
can arrange to provide both types so that anything that won't
match on the compat type can also work, and if there's any
fallout that needs fixing up if we can't and end up doing the
revert.

> > That doesn't seem particularly hard TBH, and if we're going to be
> > listing any compatibles we really ought to be including them all rather
> > than just a random one.
> 
> Looking at the shere number of combinations given the regexp in
> the DT binding document, I think someone would need to script its
> generation - expanding just the first set leads to:

...

> And then you have a similar set for n25q*. Once all of that regexp
> has been expanded, one then needs to add the list for "backward
> compatibility" with all the different manufacturer prefixes that
> of_modalias_node() stripped off.

> Are you really sure this is a solution you wish to require?

It's true that the manufacturer prefixes blow stuff up for this
particular driver especially badly without wildcards which gets
messy...  the interaction between generic parts like flash and
the DT aliases definitely isn't at all nice at the minute, the
compat stuff is doing a good job of sidestepping some of the
explosion in compatibles.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-09-20 19:49   ` Russell King (Oracle)
  2021-09-20 20:52     ` Mark Brown
@ 2021-09-20 21:56     ` Andreas Schwab
  2021-09-20 22:25       ` Linus Torvalds
  2021-09-20 22:43       ` Russell King (Oracle)
  1 sibling, 2 replies; 17+ messages in thread
From: Andreas Schwab @ 2021-09-20 21:56 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Linus Torvalds, broonie, Marco Felsch, linux-spi, kernel

On Sep 20 2021, Russell King (Oracle) wrote:

> On Mon, Sep 20, 2021 at 09:41:47PM +0200, Andreas Schwab wrote:
>> On Sep 20 2021, Russell King (Oracle) wrote:
>> 
>> > Therefore, this change breaks module autoloading.
>> 
>> Reverting this change breaks module autoloading.
>
> No.
>
> Module autoloading worked before.

Nope.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-09-20 21:56     ` Andreas Schwab
@ 2021-09-20 22:25       ` Linus Torvalds
  2021-10-04 14:00         ` Andreas Schwab
  2021-09-20 22:43       ` Russell King (Oracle)
  1 sibling, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2021-09-20 22:25 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Russell King (Oracle),
	Mark Brown, Marco Felsch, linux-spi, Pengutronix Kernel Team

On Mon, Sep 20, 2021 at 2:56 PM Andreas Schwab <schwab@suse.de> wrote:
>
> On Sep 20 2021, Russell King (Oracle) wrote:
> >
> > Module autoloading worked before.
>
> Nope.

It clearly did, Andreas. At least on Macchiatobin.

Maybe it didn't work for some other cases, but the point of
regressions is that they are things that broke that _used_ to work.

Other cases - that never worked - are not regressions if they continue
to not work.

And the thing that makes regressions special is that back when I
wasn't so strict about these things, we'd end up in endless "seesaw
situations" where somebody would fix something, it would break
something else, then that something else would break, and it would
never actually converge on anything reliable at all.

That is why the regression rule exists. It is NOT ACCEPTABLE to fix
one thing, and break another. That's not a fix at all. That's a
regression, and no amount of "but but but it fixes something else" is
valid.

So don't make completely garbage arguments. Russell reported a
regression. Denying regressions is not an option.

Yes, we have situations where even regressions don't matter - like
major security issues that simply cannot be fixed other ways, because
the regression _was_ the security hole.

(Or, actually, more commonly, the "nobody noticed" hole: a regression
is a bit like Schrödinger's cat - if nobody is around to notice it and
it doesn't actually affect any real workload, then you can treat the
regression as if it doesn't exist),

            Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-09-20 21:56     ` Andreas Schwab
  2021-09-20 22:25       ` Linus Torvalds
@ 2021-09-20 22:43       ` Russell King (Oracle)
  2021-09-21  7:34         ` Andreas Schwab
  1 sibling, 1 reply; 17+ messages in thread
From: Russell King (Oracle) @ 2021-09-20 22:43 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Linus Torvalds, broonie, Marco Felsch, linux-spi, kernel

On Mon, Sep 20, 2021 at 11:56:31PM +0200, Andreas Schwab wrote:
> On Sep 20 2021, Russell King (Oracle) wrote:
> 
> > On Mon, Sep 20, 2021 at 09:41:47PM +0200, Andreas Schwab wrote:
> >> On Sep 20 2021, Russell King (Oracle) wrote:
> >> 
> >> > Therefore, this change breaks module autoloading.
> >> 
> >> Reverting this change breaks module autoloading.
> >
> > No.
> >
> > Module autoloading worked before.
> 
> Nope.

Sorry, but you are wrong. Let me take a random built kernel I have
laying around. 5.4.0+:

-rw-r--r-- 1 rmk rmk 17164877 Jan 26  2020 /home/rmk/systems/juno-host-5.4.0+.tar.bz2

and throw it onto the Macchiatobin:

[  OK  ] Finished Suspend/Resume Running libvirt Guests.
[  OK  ] Started LSB: exim Mail Transport Agent.
[  OK  ] Started Samba SMB Daemon.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Finished Update UTMP about System Runlevel Changes.

Debian GNU/Linux 11 mcbin-ss ttyS2

mcbin-ss login: root
Password:
Linux mcbin-ss 5.4.0+ #608 SMP PREEMPT Sun Jan 26 15:44:51 GMT 2020 aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Sep 20 22:17:58 BST 2021 on ttyS2
root@mcbin-ss:~# lsmod
Module                  Size  Used by
nfnetlink              16384  0
8021q                  36864  0
garp                   16384  1 8021q
mrp                    20480  1 8021q
crct10dif_ce           16384  1
spi_nor                49152  0
armada_thermal         16384  0
sbsa_gwdt              16384  0
ip_tables              32768  0
x_tables               45056  1 ip_tables
root@mcbin-ss:~#

That's strange, spi_nor has been _autoloaded_ under 5.4.0+. If I boot
exactly the same userspace with 5.13.0+, it gets _autoloaded_:

[  OK  ] Finished Suspend/Resume Running libvirt Guests.
[  OK  ] Started Samba SMB Daemon.P■ower device driver controller.
[  OK  ] Started LSB: exim Mail Transport Agent.tails.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.ring sensors.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Finished Update UTMP about System Runlevel Changes.rvice.
[  OK  ] Started LSB: rng-tools (Debian variant).

Debian GNU/Linux 11 mcbin-ss ttyS2

mcbin-ss login: root
Password:
Linux mcbin-ss 5.13.0+ #958 SMP PREEMPT Tue Sep 14 16:17:44 BST 2021 aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Sep 20 23:19:05 BST 2021 on ttyS2
root@mcbin-ss:~# lsmod
Module                  Size  Used by
nfnetlink              20480  0
8021q                  36864  0
garp                   16384  1 8021q
mrp                    20480  1 8021q
crct10dif_ce           20480  1
spi_nor                73728  0
leds_gpio              16384  0
pwm_fan                20480  0
armada_thermal         16384  0
sbsa_gwdt              20480  0
ip_tables              32768  0
x_tables               45056  1 ip_tables

I'll prove it by moving the modules so they can't be loaded by the
normal system boot, and then trigger udev:

root@mcbin-ss:~# uname -a
Linux mcbin-ss 5.13.0+ #958 SMP PREEMPT Tue Sep 14 16:17:44 BST 2021 aarch64 GNU/Linux
root@mcbin-ss:~# lsmod
Module                  Size  Used by
root@mcbin-ss:~# mv 5.13.0+ /lib/modules
root@mcbin-ss:~# udevadm trigger --action=add
root@mcbin-ss:~# sbsa-gwdt f0610000.watchdog: Initialized with 10s timeout @ 25000000 Hz, action=0.
spi-nor spi4.0: w25q32 (4096 Kbytes)
lsmod
Module                  Size  Used by
crct10dif_ce           20480  1
pwm_fan                20480  0
spi_nor                73728  0
leds_gpio              16384  0
armada_thermal         16384  0
sbsa_gwdt              20480  0

Why does this happen?

root@mcbin-ss:~# cat /sys/bus/spi/devices/spi4.0/modalias
spi:w25q32
root@mcbin-ss:~# cat /sys/bus/spi/devices/spi4.0/uevent
DRIVER=spi-nor
OF_NAME=spi-flash
OF_FULLNAME=/cp1/config-space@f4000000/spi@700680/spi-flash@0
OF_COMPATIBLE_0=st,w25q32
OF_COMPATIBLE_N=1
MODALIAS=spi:w25q32
root@mcbin-ss:~# modinfo spi-nor
filename:       /lib/modules/5.13.0+/kernel/drivers/mtd/spi-nor/spi-nor.ko
description:    framework for SPI NOR
author:         Mike Lavender
author:         Huang Shijie <shijie8@gmail.com>
license:        GPL v2
alias:          of:N*T*Cjedec,spi-norC*
alias:          of:N*T*Cjedec,spi-nor
alias:          spi:mr25h40
...
alias:          spi:w25q32		<=======================
...
alias:          spi:spi-nor
depends:
intree:         Y
name:           spi_nor
vermagic:       5.13.0+ SMP preempt mod_unload aarch64

If I boot the exact same userspace with 5.14.0+, it does _not_ get
autoloaded, because the modinfo and uevent files contain different
contents that the spi-nor module does not have an alias for.

Yet you say without your change module autoloading doesn't work - that
may be true for you, but the point here is that:

   Your change plus another during the 5.14 cycle broke previously
   working module autoloading. It broke a previously working setup.
   This is a _regression_.

Maybe you could explain why you think otherwise - and possibly accept
that it _did_ used to work for others.

Yes, I get it that _your_ patch (which is the later one of the two
that I mentioned) was merely bringing the modalias file into line with
the uevent file - but that doesn't change the fact that 5.13 and
earlier kernels worked, 5.14 does not.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-09-20 22:43       ` Russell King (Oracle)
@ 2021-09-21  7:34         ` Andreas Schwab
  2021-09-21 12:22           ` Mark Brown
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Schwab @ 2021-09-21  7:34 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Linus Torvalds, broonie, Marco Felsch, linux-spi, kernel

On Sep 20 2021, Russell King (Oracle) wrote:

> Yet you say without your change module autoloading doesn't work - that
> may be true for you, but the point here is that:
>
>    Your change plus another during the 5.14 cycle broke previously
>    working module autoloading. It broke a previously working setup.
>    This is a _regression_.
>
> Maybe you could explain why you think otherwise - and possibly accept
> that it _did_ used to work for others.

Reverting them will break module autoloading that previously worked.
This is a _regression_.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-09-20 21:25     ` Mark Brown
@ 2021-09-21 10:55       ` Russell King (Oracle)
  0 siblings, 0 replies; 17+ messages in thread
From: Russell King (Oracle) @ 2021-09-21 10:55 UTC (permalink / raw)
  To: Mark Brown; +Cc: Marco Felsch, Andreas Schwab, linux-spi, kernel, robh

On Mon, Sep 20, 2021 at 10:25:31PM +0100, Mark Brown wrote:
> On Mon, Sep 20, 2021 at 08:37:23PM +0100, Russell King (Oracle) wrote:
> > On Mon, Sep 20, 2021 at 07:33:27PM +0100, Mark Brown wrote:
> > > On Mon, Sep 20, 2021 at 11:20:29AM +0100, Russell King (Oracle) wrote:
> 
> > > Are you sure?  Looking at the binding document it appears that the
> > > fallback to jedec,spi-nor is mandatory in all cases - it's either one of
> > > the two items: cases both of which are lists with jedec,spi-nor in them
> > > or just the plain jedec,spi-nor fallback.  It kind of doesn't matter
> > > given that we weren't enforcing it in the past but still.
> 
> > We aren't even enforcing it today either - running the DT checker is
> > entirely optional, and it's not even carried by distros, so across
> > distro upgrades it breaks. I'd also suggest that almost no one bothers
> > to run it either, looking at the almost 6700 lines of output it
> > produces for my build - the chances of spotting anything relevant in
> > that are practically zero.
> 
> Right, good - my read of the DT binding document was correct at
> least.  Like we're both saying it doesn't really matter what was
> documented, what we were accepting is what matters.

Indeed - today's reading of the DT binding document, but that misses
the history.  See commit:

8ff16cf77ce3 ("Documentation: devicetree: m25p80: add "nor-jedec"
binding")

Originally, this driver _only_ accepted a specific device name in the
compatible. This commit added "nor-jedec". This introduced the
requirement in March 2015 for this property to be present.

8947e396a829 ("Documentation: dt: mtd: replace "nor-jedec" binding with
"jedec, spi-nor"")

In May 2015, this converted commit "nor-jedec" to the "jedec,spi-nor"
we have today, and renamed the binding file from m25p80.txt to
jedec,spi-nor.txt.

So, I don't think we can hold any DTS file to conform to the binding
document given that:
(a) the binding document originally allowed it,
(b) that the driver doesn't warn that the "jedec,spi-nor" is missing,
(c) checking of the dts files for correctness to the binding
    documentation has historically been very poor.
(d) DT is an interface we should avoid regressions, just like the
    kernel to userspace interface.

> > because, although the driver accepts devices of type "w25q32", it
> > isn't listed in the DT schema. However, the driver code itself
> > accepts that "w25q32" is used in DT - so the schema is out of step
> > with the driver and has been for ages.
> 
> Looks like it's trying to list it but not quite managing.
> Another thing to fix...

Indeed. :(

> > > This then causes issues for anything trying to bind based with DT
> > > aliases AIUI so it's just pushing the problems around to different
> > > devices.  I think ideally we should be including the fallback compat IDs
> > > that could also be matched along with the OF aliases.
> 
> > I have a different view. These patches have fixed one problem by
> > creating another problem - they have _changed_ the module alias
> > that SPI creates for DT.
> 
> > Originally, the module alias was created via of_modalias_node()
> > in of_register_spi_device():
> 
> >         /* Select device driver */
> >         rc = of_modalias_node(nc, spi->modalias,
> >                                 sizeof(spi->modalias));
> >         if (rc < 0) {
> >                 dev_err(&ctlr->dev, "cannot find modalias for %pOF\n", nc);
> >                 goto err_out;
> >         }
> 
> > However, as a result of the above two commits, the modalias that
> > is now given to userspace has changed from whatever
> > of_modalias_node() produced to whatever of_device_modalias() and
> > of_device_uevent_modalias() produces - which is something quite
> > different.
> 
> > So, IMHO the change in these two patches was _wrong_ and always
> > was wrong, and was always going to lead to this problem. Randomly
> > deciding to have a different modalias policy is always going to
> > lead to problems like this.
> 
> Hrm, right - I hadn't really registered that we were generating
> compat modaliases in quite that way (and hadn't had the bandwidth
> to dig into that properly today, should do tomorrow).  First pass
> I'd think that either SPI or probably of_device_uevent_modalias()
> ought to be generating both formats (assuming we can list
> multiple modaliases which I'm not sure on) but like I say I've
> really not dug into this properly yet at this point.
> 
> I'm reasonably sure we should be continuing to generate the old
> style modalises like a revert would, my main questions are if we
> can arrange to provide both types so that anything that won't
> match on the compat type can also work, and if there's any
> fallout that needs fixing up if we can't and end up doing the
> revert.

Initially, I thought we could possibly export two modaliases, since
it works for DT...

/sys/bus/platform/devices/f0400000.xor/uevent
OF_NAME=xor
OF_FULLNAME=/ap806/config-space@f0000000/xor@400000
OF_COMPATIBLE_0=marvell,armada-7k-xor
OF_COMPATIBLE_1=marvell,xor-v2
OF_COMPATIBLE_N=2
MODALIAS=of:NxorT(null)Cmarvell,armada-7k-xorCmarvell,xor-v2

/sys/bus/platform/devices/f0400000.xor/modalias
of:NxorT(null)Cmarvell,armada-7k-xorCmarvell,xor-v2

So, I wondered whether we combine the of: and spi: forms like this:

of:...spi:...

However, it seems we can't do that - Debian Bullseye modprobe:
root@mcbin-ss:~# modprobe -R "of:NfooTbarCatmel,24c2048"
at24
root@mcbin-ss:~# modprobe -R "spi:fooof:NfooTbarCatmel,24c2048"
modprobe: FATAL: Module spi:fooof:NfooTbarCatmel,24c2048 not found in directory /lib/modules/5.14.0+

So it looks like we're stuck with having exactly one modalias string
exported, which we can't sanely change once we've settled on one
scheme for modaliases for a bus type and firmware combination.

> It's true that the manufacturer prefixes blow stuff up for this
> particular driver especially badly without wildcards which gets
> messy...  the interaction between generic parts like flash and
> the DT aliases definitely isn't at all nice at the minute, the
> compat stuff is doing a good job of sidestepping some of the
> explosion in compatibles.

Yes, certainly true... and then there's typos as well:

arch/arm/boot/dts/keystone-k2hk-evm.dts:                compatible = "Micron,n25q128a11";
arch/arm/boot/dts/keystone-k2e-evm.dts:         compatible = "Micron,n25q128a11";

where "Micron," should of course be "micron,". The prefix is irrelevant
because that part gets discarded by of_modalias_node() in
of_register_spi_device() when generating the modalias.

Going through the list in spi-nor/core.c and checking the DTS files,
it's random what you find. Sometimes they have a valid manufacturer
prefix, sometimes they don't. Sometimes they have "jedec,spi-nor",
sometimes they don't. Sometimes they only list "jedec,spi-nor".

I haven't been through them all - I stopped at n25q064, so there are
probably more gems such as the above to be found.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-09-21  7:34         ` Andreas Schwab
@ 2021-09-21 12:22           ` Mark Brown
  2021-09-21 13:02             ` Andreas Schwab
  0 siblings, 1 reply; 17+ messages in thread
From: Mark Brown @ 2021-09-21 12:22 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Russell King (Oracle), Linus Torvalds, Marco Felsch, linux-spi, kernel

[-- Attachment #1: Type: text/plain, Size: 666 bytes --]

On Tue, Sep 21, 2021 at 09:34:11AM +0200, Andreas Schwab wrote:

> Reverting them will break module autoloading that previously worked.
> This is a _regression_.

We know.  As I said in my earlier mail there's regressions both ways:

  https://lore.kernel.org/linux-spi/87sfxyfklo.fsf@igel.home/T/#mb7811e507ee8a6d33ee5ab268a6eca676b746fce

This is messy, we're aware of of the complications and I fear any
thorough fix is going to involve a full audit of all SPI drivers which
is a bunch of work.

Please stop this, it's not contributing constructively to addressing the
problem and engaging with your mails takes time away from the people who
are trying to do so.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-09-21 12:22           ` Mark Brown
@ 2021-09-21 13:02             ` Andreas Schwab
  0 siblings, 0 replies; 17+ messages in thread
From: Andreas Schwab @ 2021-09-21 13:02 UTC (permalink / raw)
  To: Mark Brown
  Cc: Russell King (Oracle), Linus Torvalds, Marco Felsch, linux-spi, kernel

On Sep 21 2021, Mark Brown wrote:

> On Tue, Sep 21, 2021 at 09:34:11AM +0200, Andreas Schwab wrote:
>
>> Reverting them will break module autoloading that previously worked.
>> This is a _regression_.
>
> We know. 

Apparently, Russell didn't.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-09-20 22:25       ` Linus Torvalds
@ 2021-10-04 14:00         ` Andreas Schwab
  2021-10-04 14:30           ` Mark Brown
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Schwab @ 2021-10-04 14:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Russell King (Oracle),
	Mark Brown, Marco Felsch, linux-spi, Pengutronix Kernel Team

On Sep 20 2021, Linus Torvalds wrote:

> That is why the regression rule exists. It is NOT ACCEPTABLE to fix
> one thing, and break another. That's not a fix at all. That's a
> regression, and no amount of "but but but it fixes something else" is
> valid.

That regression has now landed in -stable.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-10-04 14:00         ` Andreas Schwab
@ 2021-10-04 14:30           ` Mark Brown
  2021-10-04 15:23             ` Andreas Schwab
  0 siblings, 1 reply; 17+ messages in thread
From: Mark Brown @ 2021-10-04 14:30 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Linus Torvalds, Russell King (Oracle),
	Marco Felsch, linux-spi, Pengutronix Kernel Team

[-- Attachment #1: Type: text/plain, Size: 597 bytes --]

On Mon, Oct 04, 2021 at 04:00:02PM +0200, Andreas Schwab wrote:
> On Sep 20 2021, Linus Torvalds wrote:

> > That is why the regression rule exists. It is NOT ACCEPTABLE to fix
> > one thing, and break another. That's not a fix at all. That's a
> > regression, and no amount of "but but but it fixes something else" is
> > valid.

> That regression has now landed in -stable.

Like I said previously please stop this, it is not helping and just
taking time away from people actually working constructively on the
problem:

   https://lore.kernel.org/linux-spi/20210921122228.GB9990@sirena.org.uk/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin
  2021-10-04 14:30           ` Mark Brown
@ 2021-10-04 15:23             ` Andreas Schwab
  0 siblings, 0 replies; 17+ messages in thread
From: Andreas Schwab @ 2021-10-04 15:23 UTC (permalink / raw)
  To: Mark Brown
  Cc: Linus Torvalds, Russell King (Oracle),
	Marco Felsch, linux-spi, Pengutronix Kernel Team

On Okt 04 2021, Mark Brown wrote:

> On Mon, Oct 04, 2021 at 04:00:02PM +0200, Andreas Schwab wrote:
>> On Sep 20 2021, Linus Torvalds wrote:
>
>> > That is why the regression rule exists. It is NOT ACCEPTABLE to fix
>> > one thing, and break another. That's not a fix at all. That's a
>> > regression, and no amount of "but but but it fixes something else" is
>> > valid.
>
>> That regression has now landed in -stable.
>
> Like I said previously please stop this, it is not helping

Which part of "It is NOT ACCEPTABLE to fix one thing, and break another"
did you not understand?

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-10-04 15:23 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-20 10:20 REGRESSION: "spi: add of_device_uevent_modalias support" and following "fix" breaks Macchiatobin Russell King (Oracle)
2021-09-20 18:33 ` Mark Brown
2021-09-20 19:37   ` Russell King (Oracle)
2021-09-20 21:25     ` Mark Brown
2021-09-21 10:55       ` Russell King (Oracle)
2021-09-20 19:41 ` Andreas Schwab
2021-09-20 19:49   ` Russell King (Oracle)
2021-09-20 20:52     ` Mark Brown
2021-09-20 21:56     ` Andreas Schwab
2021-09-20 22:25       ` Linus Torvalds
2021-10-04 14:00         ` Andreas Schwab
2021-10-04 14:30           ` Mark Brown
2021-10-04 15:23             ` Andreas Schwab
2021-09-20 22:43       ` Russell King (Oracle)
2021-09-21  7:34         ` Andreas Schwab
2021-09-21 12:22           ` Mark Brown
2021-09-21 13:02             ` Andreas Schwab

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.