linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: SBus devices sometimes detected, sometimes not
       [not found] <alpine.SOC.1.00.1105021259220.6200@math.ut.ee>
@ 2011-05-17 20:43 ` David Miller
  2011-05-18  4:29   ` Grant Likely
  2011-05-18 15:27 ` [PATCH] of: fix race when matching drivers Milton Miller
  1 sibling, 1 reply; 9+ messages in thread
From: David Miller @ 2011-05-17 20:43 UTC (permalink / raw)
  To: mroos; +Cc: sparclinux, grant.likely, linux-kernel

From: Meelis Roos <mroos@linux.ee>
Date: Mon, 2 May 2011 13:39:46 +0300 (EEST)

> This is 2.6.39-rc3..rc5 on Sun Ultra 1 with SBus hme card.
> On about 50% of boots, hme is found fine and works. On other half boots 
> (not strictly in any order) SBus hme is not detected with this error:
> 
> hme: probe of f006be34 failed with error -22
> 
> grepping the logs, I sometimes also find similar errors about qlogicpti:
> 
> qpti: probe of f00798c4 failed with error -22
> 
> The node addresses are always the same for the device.
> 
> 2.6.38 did not have this problem.

Grant, I'm trying to diagnose this bug report which is a 2.6.39-rcX regression
and I think it's a side-effect of your changes to move away from
of_platform_{,un}register_driver().

Somehow an OF device driver's ->probe() is being called with a NULL
"->of_match", that's why the probe returns -22 (-EINVAL) since in
your transformations that is the first thing you check.

There are only a few ways this can happen that I can tell, firstly
the of_driver_match_device() call has to fail.

It could then happen if pdrv->id_table is non-NULL, but in these two
cases (drivers/net/sunhme.c:hme_sbus_driver and
drivers/scsi/qlogicpti.c:qpti_sbus_driver) no id_table is assigned.

The next possibility would be if pdrv->name matches the device's
name, but in these two cases ("SUNW,hme" vs. "hme" and "QLGC,pti"
vs. "qpti") that should not be the case either.

There is only one more way this could happen, and that is if the
device bus pointer is NULL.  Because in that situation the
match function doesn't get called and all devices can match.

This is also very unlikely, because platform_device_add() always
assigns pdev->dev.bus to &platform_bus_type before the device_add()
call.

The fact that this failure happens only on some boots for Meelis leads
me to belive that some datastructure is corrupted.  Perhaps there is
an incorrect __init or __devinit somewhere, or we reference freed up
data some other way.

Any ideas?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: SBus devices sometimes detected, sometimes not
  2011-05-17 20:43 ` SBus devices sometimes detected, sometimes not David Miller
@ 2011-05-18  4:29   ` Grant Likely
  2011-05-18 13:59     ` mroos
  0 siblings, 1 reply; 9+ messages in thread
From: Grant Likely @ 2011-05-18  4:29 UTC (permalink / raw)
  To: David Miller; +Cc: mroos, sparclinux, linux-kernel

On Tue, May 17, 2011 at 2:43 PM, David Miller <davem@davemloft.net> wrote:
> From: Meelis Roos <mroos@linux.ee>
> Date: Mon, 2 May 2011 13:39:46 +0300 (EEST)
>
>> This is 2.6.39-rc3..rc5 on Sun Ultra 1 with SBus hme card.
>> On about 50% of boots, hme is found fine and works. On other half boots
>> (not strictly in any order) SBus hme is not detected with this error:
>>
>> hme: probe of f006be34 failed with error -22
>>
>> grepping the logs, I sometimes also find similar errors about qlogicpti:
>>
>> qpti: probe of f00798c4 failed with error -22
>>
>> The node addresses are always the same for the device.
>>
>> 2.6.38 did not have this problem.
>
> Grant, I'm trying to diagnose this bug report which is a 2.6.39-rcX regression
> and I think it's a side-effect of your changes to move away from
> of_platform_{,un}register_driver().
>
> Somehow an OF device driver's ->probe() is being called with a NULL
> "->of_match", that's why the probe returns -22 (-EINVAL) since in
> your transformations that is the first thing you check.
>
> There are only a few ways this can happen that I can tell, firstly
> the of_driver_match_device() call has to fail.
>
> It could then happen if pdrv->id_table is non-NULL, but in these two
> cases (drivers/net/sunhme.c:hme_sbus_driver and
> drivers/scsi/qlogicpti.c:qpti_sbus_driver) no id_table is assigned.
>
> The next possibility would be if pdrv->name matches the device's
> name, but in these two cases ("SUNW,hme" vs. "hme" and "QLGC,pti"
> vs. "qpti") that should not be the case either.

This is the failure case I was most concerned about when merging the
busses, but as you say the device naming convention is different for
OF-generated devices and therefore unlikely, and also easy to check
for.

> There is only one more way this could happen, and that is if the
> device bus pointer is NULL.  Because in that situation the
> match function doesn't get called and all devices can match.
>
> This is also very unlikely, because platform_device_add() always
> assigns pdev->dev.bus to &platform_bus_type before the device_add()
> call.

Actually, platform_device_add() doesn't get called for the of_device
case; mostly because there are still a couple of things that need to
be unified before platform_device_add() can replace of_device_add().
of_device_register() is called instead which doesn't set the bus
pointer.  However, both versions of scan_one_device() in arch/sparc do
explicitly set the bus pointer before adding the device, so this still
doesn't look like the issue, and if it was it doesn't explain the
intermittent nature of the problem.

> The fact that this failure happens only on some boots for Meelis leads
> me to belive that some datastructure is corrupted.  Perhaps there is
> an incorrect __init or __devinit somewhere, or we reference freed up
> data some other way.

I would agree that that is a likely scenario.

> Any ideas?

It would be good to know the exact failure mode, so we need to know if
of_driver_match_device() is setting of_match which is subsequently
getting cleared, or if of_driver_match_device() isn't getting called
at all.  Meelis, can you add something like the following code to
include/linux/of_device.h in the of_driver_match_device() function
between the of_match_device() call and the return statement:

if (dev->of_match)
        printk("found match; node=%s, driver=%s\n",
                 dev->of_node->full_name, drv->name);

g.

-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: SBus devices sometimes detected, sometimes not
  2011-05-18  4:29   ` Grant Likely
@ 2011-05-18 13:59     ` mroos
  0 siblings, 0 replies; 9+ messages in thread
From: mroos @ 2011-05-18 13:59 UTC (permalink / raw)
  To: Grant Likely; +Cc: David Miller, sparclinux, linux-kernel

> It would be good to know the exact failure mode, so we need to know if
> of_driver_match_device() is setting of_match which is subsequently
> getting cleared, or if of_driver_match_device() isn't getting called
> at all.  Meelis, can you add something like the following code to
> include/linux/of_device.h in the of_driver_match_device() function
> between the of_match_device() call and the return statement:
> 
> if (dev->of_match)
>         printk("found match; node=%s, driver=%s\n",
>                  dev->of_node->full_name, drv->name);
> 

Tried it with todays 2.6.39-rc7-00211-gc1d10d1. It was harder to 
reproduce this time, about 1 time out of 10.

In normal operation:

[   77.794877] found match; node=/sbus@1f,0/SUNW,hme@0,8c00000, driver=hme
[   77.872379] sunhme.c:v3.10 August 26, 2008 David S. Miller (davem@davemloft.net)
[   77.962865] eth0: HAPPY MEAL (SBUS) 10/100baseT Ethernet 08:00:20:88:6d:6a

In case of error (bpp show only as an example that some things work):

[   70.312166] found match; node=/sbus@1f,0/SUNW,bpp@e,c800000, driver=bpp
[   70.389649] parport0: sunbpp at 0x1ffec800000
[   70.477720] found match; node=/sbus@1f,0/SUNW,hme@0,8c00000, driver=hme
[   70.578023] found match; node=/sbus@1f,0/QLGC,isp@1,10000, driver=qpti
[   70.915544] hme: probe of f006be34 failed with error -22
[   70.978825] qpti: probe of f00798c4 failed with error -22

-- 
Meelis Roos (mroos@linux.ee)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] of: fix race when matching drivers
       [not found] <alpine.SOC.1.00.1105021259220.6200@math.ut.ee>
  2011-05-17 20:43 ` SBus devices sometimes detected, sometimes not David Miller
@ 2011-05-18 15:27 ` Milton Miller
  2011-05-18 15:41   ` Grant Likely
  2011-05-18 15:45   ` Josip Rodin
  1 sibling, 2 replies; 9+ messages in thread
From: Milton Miller @ 2011-05-18 15:27 UTC (permalink / raw)
  To: Grant Likely; +Cc: David Miller, devicetree-discuss, linux-kernel, sparclinux

If two drivers are probing devices at the same time, both will write
their match table result to the dev->of_match cache at the same time.

Only write the result if the device matches.

In a thread titled "SBus devices sometimes detected, sometimes not",
Meelis reported his SBus hme was not detected about 50% of the time.
>From the debug suggested by Grant it was obvious another driver matched
some devices between the call to match the hme and the hme discovery
failling.

Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Milton Miller <miltonm@bga.com>
---

Grant, I really think this of_match cache in the device node is bad a
bad tradeoff, and am willing to submit patches to remove it for 2.6.40.
It is only used by about 26 drivers and all use it once during probe
to fill out their driver data.  It comes at the cost of a long for
every struct device in every system.

I'll even offer to throw in a patch to cache the parsing of the
compatible property to speed up of_device_is_compatible if needed.



Index: work.git/include/linux/of_device.h
===================================================================
--- work.git.orig/include/linux/of_device.h	2011-05-18 09:57:01.014386816 -0500
+++ work.git/include/linux/of_device.h	2011-05-18 09:58:27.537431575 -0500
@@ -21,8 +21,15 @@ extern void of_device_make_bus_id(struct
 static inline int of_driver_match_device(struct device *dev,
 					 const struct device_driver *drv)
 {
-	dev->of_match = of_match_device(drv->of_match_table, dev);
-	return dev->of_match != NULL;
+	const struct of_device_id *match;
+
+	match = of_match_device(drv->of_match_table, dev);
+	if (match) {
+		dev->of_match = of_match_device(drv->of_match_table, dev);
+		return 1;
+	}
+
+	return 0;
 }
 
 extern struct platform_device *of_dev_get(struct platform_device *dev);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] of: fix race when matching drivers
  2011-05-18 15:27 ` [PATCH] of: fix race when matching drivers Milton Miller
@ 2011-05-18 15:41   ` Grant Likely
  2011-05-18 15:47     ` Grant Likely
  2011-05-18 15:45   ` Josip Rodin
  1 sibling, 1 reply; 9+ messages in thread
From: Grant Likely @ 2011-05-18 15:41 UTC (permalink / raw)
  To: Milton Miller; +Cc: David Miller, devicetree-discuss, linux-kernel, sparclinux

On Wed, May 18, 2011 at 9:27 AM, Milton Miller <miltonm@bga.com> wrote:
> If two drivers are probing devices at the same time, both will write
> their match table result to the dev->of_match cache at the same time.
>
> Only write the result if the device matches.
>
> In a thread titled "SBus devices sometimes detected, sometimes not",
> Meelis reported his SBus hme was not detected about 50% of the time.
> From the debug suggested by Grant it was obvious another driver matched
> some devices between the call to match the hme and the hme discovery
> failling.
>
> Reported-by: Meelis Roos <mroos@linux.ee>
> Signed-off-by: Milton Miller <miltonm@bga.com>
> ---
>
> Grant, I really think this of_match cache in the device node is bad a
> bad tradeoff, and am willing to submit patches to remove it for 2.6.40.
> It is only used by about 26 drivers and all use it once during probe
> to fill out their driver data.  It comes at the cost of a long for
> every struct device in every system.

Ah, bugger.  I had /thought/ that matching and probing were kept
together with a mutex.  So, yes, this is bad and the of_match needs to
be removed.  Thanks for volunteering to submit the patch.  It should
be backported to 2.6.39 too.

> I'll even offer to throw in a patch to cache the parsing of the
> compatible property to speed up of_device_is_compatible if needed.

That would be useful.  :-)

I'll pick up your patch right now and fire it off to Linus.

g.

>
>
>
> Index: work.git/include/linux/of_device.h
> ===================================================================
> --- work.git.orig/include/linux/of_device.h     2011-05-18 09:57:01.014386816 -0500
> +++ work.git/include/linux/of_device.h  2011-05-18 09:58:27.537431575 -0500
> @@ -21,8 +21,15 @@ extern void of_device_make_bus_id(struct
>  static inline int of_driver_match_device(struct device *dev,
>                                         const struct device_driver *drv)
>  {
> -       dev->of_match = of_match_device(drv->of_match_table, dev);
> -       return dev->of_match != NULL;
> +       const struct of_device_id *match;
> +
> +       match = of_match_device(drv->of_match_table, dev);
> +       if (match) {
> +               dev->of_match = of_match_device(drv->of_match_table, dev);
> +               return 1;
> +       }
> +
> +       return 0;
>  }
>
>  extern struct platform_device *of_dev_get(struct platform_device *dev);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>



-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] of: fix race when matching drivers
  2011-05-18 15:27 ` [PATCH] of: fix race when matching drivers Milton Miller
  2011-05-18 15:41   ` Grant Likely
@ 2011-05-18 15:45   ` Josip Rodin
  2011-05-18 16:21     ` Grant Likely
  1 sibling, 1 reply; 9+ messages in thread
From: Josip Rodin @ 2011-05-18 15:45 UTC (permalink / raw)
  To: Milton Miller
  Cc: Grant Likely, David Miller, devicetree-discuss, linux-kernel, sparclinux

On Wed, May 18, 2011 at 10:27:39AM -0500, Milton Miller wrote:
> --- work.git.orig/include/linux/of_device.h	2011-05-18 09:57:01.014386816 -0500
> +++ work.git/include/linux/of_device.h	2011-05-18 09:58:27.537431575 -0500
> @@ -21,8 +21,15 @@ extern void of_device_make_bus_id(struct
>  static inline int of_driver_match_device(struct device *dev,
>  					 const struct device_driver *drv)
>  {
> -	dev->of_match = of_match_device(drv->of_match_table, dev);
> -	return dev->of_match != NULL;
> +	const struct of_device_id *match;
> +
> +	match = of_match_device(drv->of_match_table, dev);
> +	if (match) {
> +		dev->of_match = of_match_device(drv->of_match_table, dev);
> +		return 1;
> +	}
> +
> +	return 0;
>  }

Err, is there some reason to avoid simply assigning the existing result:
dev->of_match = match; ?

-- 
     2. That which causes joy or happiness.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] of: fix race when matching drivers
  2011-05-18 15:41   ` Grant Likely
@ 2011-05-18 15:47     ` Grant Likely
  0 siblings, 0 replies; 9+ messages in thread
From: Grant Likely @ 2011-05-18 15:47 UTC (permalink / raw)
  To: Milton Miller
  Cc: David Miller, devicetree-discuss, linux-kernel, sparclinux,
	Linus Torvalds

[cc'ing Linus: This is fix for a 2.6.39 regression, so just a heads up
that I'll be sending you a pull req ASAP (later this morning)]

On Wed, May 18, 2011 at 9:41 AM, Grant Likely <grant.likely@secretlab.ca> wrote:
> On Wed, May 18, 2011 at 9:27 AM, Milton Miller <miltonm@bga.com> wrote:
>> If two drivers are probing devices at the same time, both will write
>> their match table result to the dev->of_match cache at the same time.
>>
>> Only write the result if the device matches.
>>
>> In a thread titled "SBus devices sometimes detected, sometimes not",
>> Meelis reported his SBus hme was not detected about 50% of the time.
>> From the debug suggested by Grant it was obvious another driver matched
>> some devices between the call to match the hme and the hme discovery
>> failling.
>>
>> Reported-by: Meelis Roos <mroos@linux.ee>
>> Signed-off-by: Milton Miller <miltonm@bga.com>
>> ---
>>
>> Grant, I really think this of_match cache in the device node is bad a
>> bad tradeoff, and am willing to submit patches to remove it for 2.6.40.
>> It is only used by about 26 drivers and all use it once during probe
>> to fill out their driver data.  It comes at the cost of a long for
>> every struct device in every system.
>
> Ah, bugger.  I had /thought/ that matching and probing were kept
> together with a mutex.  So, yes, this is bad and the of_match needs to
> be removed.  Thanks for volunteering to submit the patch.  It should
> be backported to 2.6.39 too.
>
>> I'll even offer to throw in a patch to cache the parsing of the
>> compatible property to speed up of_device_is_compatible if needed.
>
> That would be useful.  :-)
>
> I'll pick up your patch right now and fire it off to Linus.

... it's not perfect since it will break some theoretical drivers
unbind/rebind use-cases, but I cannot think of any actual examples,
and it is far safer than the current code.  Regardless, the removal of
of_match will definitely need to go into stable when it is ready.

g.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] of: fix race when matching drivers
  2011-05-18 15:45   ` Josip Rodin
@ 2011-05-18 16:21     ` Grant Likely
  2011-05-18 17:03       ` Milton Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Grant Likely @ 2011-05-18 16:21 UTC (permalink / raw)
  To: Josip Rodin
  Cc: Milton Miller, David Miller, devicetree-discuss, linux-kernel,
	sparclinux

On Wed, May 18, 2011 at 05:45:17PM +0200, Josip Rodin wrote:
> On Wed, May 18, 2011 at 10:27:39AM -0500, Milton Miller wrote:
> > --- work.git.orig/include/linux/of_device.h	2011-05-18 09:57:01.014386816 -0500
> > +++ work.git/include/linux/of_device.h	2011-05-18 09:58:27.537431575 -0500
> > @@ -21,8 +21,15 @@ extern void of_device_make_bus_id(struct
> >  static inline int of_driver_match_device(struct device *dev,
> >  					 const struct device_driver *drv)
> >  {
> > -	dev->of_match = of_match_device(drv->of_match_table, dev);
> > -	return dev->of_match != NULL;
> > +	const struct of_device_id *match;
> > +
> > +	match = of_match_device(drv->of_match_table, dev);
> > +	if (match) {
> > +		dev->of_match = of_match_device(drv->of_match_table, dev);
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> >  }
> 
> Err, is there some reason to avoid simply assigning the existing result:
> dev->of_match = match; ?

Good point.  I've committed and am currently testing the modified version.

g.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] of: fix race when matching drivers
  2011-05-18 16:21     ` Grant Likely
@ 2011-05-18 17:03       ` Milton Miller
  0 siblings, 0 replies; 9+ messages in thread
From: Milton Miller @ 2011-05-18 17:03 UTC (permalink / raw)
  To: Grant Likely, Josip Rodin
  Cc: David Miller, devicetree-discuss, linux-kernel, sparclinux

On Wed, 18 May 2011 about 10:21:39 -0600, Grant Likely wrote:
> On Wed, May 18, 2011 at 05:45:17PM +0200, Josip Rodin wrote:
> > On Wed, May 18, 2011 at 10:27:39AM -0500, Milton Miller wrote:
> > > --- work.git.orig/include/linux/of_device.h	2011-05-18 09:57:01.014386816 -0500
> > > +++ work.git/include/linux/of_device.h	2011-05-18 09:58:27.537431575 -0500
> > > @@ -21,8 +21,15 @@ extern void of_device_make_bus_id(struct
> > >  static inline int of_driver_match_device(struct device *dev,
> > >  					 const struct device_driver *drv)
> > >  {
> > > -	dev->of_match = of_match_device(drv->of_match_table, dev);
> > > -	return dev->of_match != NULL;
> > > +	const struct of_device_id *match;
> > > +
> > > +	match = of_match_device(drv->of_match_table, dev);
> > > +	if (match) {
> > > +		dev->of_match = of_match_device(drv->of_match_table, dev);
> > > +		return 1;
> > > +	}
> > > +
> > > +	return 0;
> > >  }
> > 
> > Err, is there some reason to avoid simply assigning the existing result:
> > dev->of_match = match; ?
> 
> Good point.  I've committed and am currently testing the modified version.
> 
> g.

Sorry about that.  I had intended to do that (hence creating the
variable), but obvioiusly I forgot when I hurried to compile test
and send out the patch.

thanks to Josip for finding and Grant for fixing.

milton

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-05-18 17:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <alpine.SOC.1.00.1105021259220.6200@math.ut.ee>
2011-05-17 20:43 ` SBus devices sometimes detected, sometimes not David Miller
2011-05-18  4:29   ` Grant Likely
2011-05-18 13:59     ` mroos
2011-05-18 15:27 ` [PATCH] of: fix race when matching drivers Milton Miller
2011-05-18 15:41   ` Grant Likely
2011-05-18 15:47     ` Grant Likely
2011-05-18 15:45   ` Josip Rodin
2011-05-18 16:21     ` Grant Likely
2011-05-18 17:03       ` Milton Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).