All of lore.kernel.org
 help / color / mirror / Atom feed
* Hotplug bug in Linux 2.4.21?
@ 2003-06-20 17:25 Adrien Beau
  2003-06-20 17:51 ` Greg KH
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Adrien Beau @ 2003-06-20 17:25 UTC (permalink / raw)
  To: linux-hotplug

Hello,

This is my first post on this mailing list. I registered to
discuss some uncritical issues I had with hotplug scripting, but
have encountered a problem today that I feel is more important.
However, I think it will be better if I explain my overall
situation in some details. If you're really busy, there's just
one thing you need to know and investigate: it seems that Linux
2.4.21 fires every event *three* times instead of one; this
breaks scripts.

My hotplugging needs are limited to one peripheral, the SpeedTouch
USB DSL modem. Due to some mysterious factors, it is disconnected
at the USB level several times per day, and so I saw a good
hotplug script as a fine solution to this behavior.

This modem is purely driven in userland. A kernel driver is in the
works, but not on par to the userland driver yet. When the modem is
connected, "modem_run" is launched to upload the firmware, handle
the initial DSL connection and then monitor the state of the line
in the background. "pppd" is then launched, and communicates with
the modem through the "pppoa3" helper program, which handles PPP
over ATM.

My uncritical issue was a coldplug issue. modem_run runs for around
25 seconds before it returns to its caller. However, upon bootup, it
seems that after seven seconds, some component in the hotplug or usb
subsystems times out and fires a new event. Since my hotplug script
kills existing instances of pppd and modem_run before relaunching them,
this was a problem, which I solved by implementing a locking mechanism
in my script: if the lockfile exists *and* isn't too old, the script
exits. This worked fine (but still with a few dangling issues I was
intending to ask about here), until I upgraded to Linux 2.4.21.

When I booted my new kernel, I almost immediately noticed that my
DSL line wasn't up as expected, and so I looked into my logs (my
hotplug script does a fair amount of logging) and noticed two weird
things: the first one, is that *three* connection events were fired
when the modem was detected (plus another one after the seven seconds
timeout). When I disconnected it, my remover was launched *three*
times, simultaneously.

My only diagnostic is that there's something quite broken in the new
kernel. I haven't had time to investigate the source however, and
you might know it much better than me (well, I've never looked at the
hotplug source). So, any idea, any help, is welcome. I can put the
hotplug script and the log on a website if you want to have a look
at them.

But wait, it's not over... I bet some of you (most of you?) have
already wondered "But he has a locking mechanism, so what's the deal?".
Well, the "deal" is that for reasons I can't even begin to speculate
on, when three instances of my hotplug script are run in parallel
(all three launched in the same second), a very simple "if" test fails
to work (it works fine when two instances are run at a few seconds
interval). Considering the simplicity of the code, I'd call that
"impossible", but it happens indeed. ¹

Well, I'd much rather try to solve why three events are fired for one
dis/connection (sounds like a simple bug) that to solve this
impossibility.

Thank you for reading this far, I look forward to discuss this with
you. I'll be out on (perhaps extended) week-end, but I'll try to check
this list once or twice during this period.


¹ Here's the code portion that fails:

MAX_LOCK_AGE`
if expr $currenttime '-' $timestamp '<' $MAX_LOCK_AGE; then
   log "Found a recent lock, exiting"
   exit 0
else
   log "Found a stale lock"
   # go ahead, create a new one
fi

With for example currenttime\x1056107163 and timestamp\x1056107162, I get
"Found a stale lock" on my log (and of course, the lock is removed,
pppd and modem_run are killed, etc.) When only two instances run at a
few more seconds interval, under 2.4.20, I of course get "Found a
recent lock, exiting" and everything works fine. Now, isn't that weird?

-- 
adrien.beau@free.fr - http://adrien.beau.free.fr/



-------------------------------------------------------
This SF.Net email is sponsored by: INetU
Attention Web Developers & Consultants: Become An INetU Hosting Partner.
Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission!
INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hotplug bug in Linux 2.4.21?
  2003-06-20 17:25 Hotplug bug in Linux 2.4.21? Adrien Beau
@ 2003-06-20 17:51 ` Greg KH
  2003-06-20 18:15 ` Duncan Sands
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Greg KH @ 2003-06-20 17:51 UTC (permalink / raw)
  To: linux-hotplug

On Fri, Jun 20, 2003 at 07:25:59PM +0200, Adrien Beau wrote:
> 
> When I booted my new kernel, I almost immediately noticed that my
> DSL line wasn't up as expected, and so I looked into my logs (my
> hotplug script does a fair amount of logging) and noticed two weird
> things: the first one, is that *three* connection events were fired
> when the modem was detected (plus another one after the seven seconds
> timeout). When I disconnected it, my remover was launched *three*
> times, simultaneously.

Yes, we now call /sbin/hotplug for ever interface of a USB device,
instead of just once.  I've talked to the speedtouch kernel driver's
author a bunch about this, and he says that he has a updated hotplug
script to properly handle this.  You should contact him.

Hope this helps,

greg k-h


-------------------------------------------------------
This SF.Net email is sponsored by: INetU
Attention Web Developers & Consultants: Become An INetU Hosting Partner.
Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission!
INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hotplug bug in Linux 2.4.21?
  2003-06-20 17:25 Hotplug bug in Linux 2.4.21? Adrien Beau
  2003-06-20 17:51 ` Greg KH
@ 2003-06-20 18:15 ` Duncan Sands
  2003-06-20 18:44 ` Adrien Beau
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Duncan Sands @ 2003-06-20 18:15 UTC (permalink / raw)
  To: linux-hotplug

> Yes, we now call /sbin/hotplug for ever interface of a USB device,
> instead of just once.  I've talked to the speedtouch kernel driver's
> author a bunch about this, and he says that he has a updated hotplug
> script to properly handle this.  You should contact him.

Actually I said that it was a pain to write such a script in 2.4 because
there is no easy way to get the interface number, and promised you
a kernel patch to export the interface number, which I never found
time to write...

Duncan.


-------------------------------------------------------
This SF.Net email is sponsored by: INetU
Attention Web Developers & Consultants: Become An INetU Hosting Partner.
Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission!
INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hotplug bug in Linux 2.4.21?
  2003-06-20 17:25 Hotplug bug in Linux 2.4.21? Adrien Beau
  2003-06-20 17:51 ` Greg KH
  2003-06-20 18:15 ` Duncan Sands
@ 2003-06-20 18:44 ` Adrien Beau
  2003-06-20 19:18 ` Kevin P. Fleming
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Adrien Beau @ 2003-06-20 18:44 UTC (permalink / raw)
  To: linux-hotplug

On Friday 20 June 2003 19:51, Greg KH wrote:
> 
> On Fri, Jun 20, 2003 at 07:25:59PM +0200, Adrien Beau wrote:
> > three connection events were fired when the modem was detected
> > (...). When I disconnected it, my remover was launched three
> > times, simultaneously.
>
> Yes, we now call /sbin/hotplug for ever interface of a USB
> device, instead of just once. I've talked to the speedtouch
> kernel driver's author a bunch about this, and he says that he
> has a updated hotplug script to properly handle this.

To which Duncan Sands quickly replied:
>
> Actually I said that it was a pain to write such a script in 2.4
> because there is no easy way to get the interface number.

Okay. I guess that if we had that, we could easily react to only
one of the events, and ignore the others. Meanwhile, it seems like
my lockfile approach is the better answer to the issue, if only
there wasn't this strange behavior. If you want to have a look at
it, have put my script there:

http://adrien.beau.free.fr/slackware/hotplug/speedtouch
http://adrien.beau.free.fr/slackware/hotplug/speedtouch.usermap
http://adrien.beau.free.fr/slackware/hotplug/speedtouch.log

It is intended to work with the *usermode* SpeedTouch driver, and
should of course be put under /etc/hotplug/usb. The logfile might
interest you, too.

So, if anybody can help finding why:

* When everything works fine, a second instance of the script is
  launched approximately seven seconds after the first one. It looks
  like there's a timeout somewhere. Look at the logfile entries on
  June 17th for a typical occurence of the problem. This only happens
  at boot time. Hotplugging works (worked) fine.
* When two instances of the script are run at a few seconds interval,
  the lockfile mechanism works fine (again, see for example June 17th),
  but when three instances run at the same time, the mechanism fails
  in an impossible manner (see the log entries starting on June 20th,
  at 12:15pm for the first occurence of this, and see the entries at
  12:55:55pm for an occurence with the timestamps printed in the logs).

... I would be very grateful.

> Hope this helps,

Well, sure, now I know on what problem to work. :-)

-- 
adrien.beau@free.fr - http://adrien.beau.free.fr/



-------------------------------------------------------
This SF.Net email is sponsored by: INetU
Attention Web Developers & Consultants: Become An INetU Hosting Partner.
Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission!
INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hotplug bug in Linux 2.4.21?
  2003-06-20 17:25 Hotplug bug in Linux 2.4.21? Adrien Beau
                   ` (2 preceding siblings ...)
  2003-06-20 18:44 ` Adrien Beau
@ 2003-06-20 19:18 ` Kevin P. Fleming
  2003-06-21 10:12 ` Olaf Hering
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Kevin P. Fleming @ 2003-06-20 19:18 UTC (permalink / raw)
  To: linux-hotplug

Adrien Beau wrote:

> * When two instances of the script are run at a few seconds interval,
>   the lockfile mechanism works fine (again, see for example June 17th),
>   but when three instances run at the same time, the mechanism fails
>   in an impossible manner (see the log entries starting on June 20th,
>   at 12:15pm for the first occurence of this, and see the entries at
>   12:55:55pm for an occurence with the timestamps printed in the logs).
> 

I suspect your problem is that you are using timestamps for this log checking, 
and the timestamp granularity is too coarse. When the USB core decides to 
generate hotplug events for all three interfaces of a device it just found, 
those three /sbin/hotplug invocations are going to happen very quickly, 
certainly within a small fraction of a second.

You'll need to find some other way of knowing when a lockfile exists and is 
stale... something that can handle more than one process that is started at 
identical times (or nearly so).



-------------------------------------------------------
This SF.Net email is sponsored by: INetU
Attention Web Developers & Consultants: Become An INetU Hosting Partner.
Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission!
INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hotplug bug in Linux 2.4.21?
  2003-06-20 17:25 Hotplug bug in Linux 2.4.21? Adrien Beau
                   ` (3 preceding siblings ...)
  2003-06-20 19:18 ` Kevin P. Fleming
@ 2003-06-21 10:12 ` Olaf Hering
  2003-06-21 15:56 ` Adrien Beau
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Olaf Hering @ 2003-06-21 10:12 UTC (permalink / raw)
  To: linux-hotplug

 On Fri, Jun 20, Adrien Beau wrote:

> http://adrien.beau.free.fr/slackware/hotplug/speedtouch.usermap

Maybe I miss the point of this file, but you do not need it if your
driver defines a valid MODULE_DEVICE_TABLE().

Gruss Olaf

-- 
USB is for mice, FireWire is for men!


-------------------------------------------------------
This SF.Net email is sponsored by: INetU
Attention Web Developers & Consultants: Become An INetU Hosting Partner.
Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission!
INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hotplug bug in Linux 2.4.21?
  2003-06-20 17:25 Hotplug bug in Linux 2.4.21? Adrien Beau
                   ` (4 preceding siblings ...)
  2003-06-21 10:12 ` Olaf Hering
@ 2003-06-21 15:56 ` Adrien Beau
  2003-06-21 16:11 ` Adrien Beau
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Adrien Beau @ 2003-06-21 15:56 UTC (permalink / raw)
  To: linux-hotplug

On Saturday 21 June 2003 12:12, Olaf Hering wrote:
> 
> On Fri, Jun 20, Adrien Beau wrote:
> > http://adrien.beau.free.fr/slackware/hotplug/speedtouch.usermap
>
> Maybe I miss the point of this file, but you do not need it if
> your driver defines a valid MODULE_DEVICE_TABLE().

I think this is the case for a kernel module, but the driver I use
is 100% usermode.

-- 
adrien.beau@free.fr - http://adrien.beau.free.fr/



-------------------------------------------------------
This SF.Net email is sponsored by: INetU
Attention Web Developers & Consultants: Become An INetU Hosting Partner.
Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission!
INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hotplug bug in Linux 2.4.21?
  2003-06-20 17:25 Hotplug bug in Linux 2.4.21? Adrien Beau
                   ` (5 preceding siblings ...)
  2003-06-21 15:56 ` Adrien Beau
@ 2003-06-21 16:11 ` Adrien Beau
  2003-06-22  6:31 ` Adrien Beau
  2003-06-27 15:15 ` Adrien Beau
  8 siblings, 0 replies; 10+ messages in thread
From: Adrien Beau @ 2003-06-21 16:11 UTC (permalink / raw)
  To: linux-hotplug

On Friday 20 June 2003 21:18, Kevin P. Fleming wrote:
> 
> Adrien Beau wrote:
> > * When two instances of the script are run at a few seconds
> > interval, the lockfile mechanism works fine (again, see for
> > example June 17th), but when three instances run at the same
> > time, the mechanism fails in an impossible manner (see the
> > log entries starting on June 20th, at 12:15pm for the first
> > occurence of this, and see the entries at 12:55:55pm for an
> > occurence with the timestamps printed in the logs).
>
> I suspect your problem is that you are using timestamps for
> this log checking, and the timestamp granularity is too coarse.

It is coarse, but it shouldn't matter: if the two timestamps are
equal, or off by one, their difference is obviously less than
60 seconds. The script should then consider there's a fresh
lockfile, and should bail out. This is what it does when two
scripts are run at a few seconds interval.

My problem is that the script manages to find that the lockfile
exists, manages to read the timestamp from the logfile, but
then somehow decides that the difference between the timestamp
and the current time is more than 60 seconds! However, as the
logfile shows, the difference is never more than 1 second.
Something weird is happening behind the scenes.

> When the USB core decides to generate hotplug events for all
> three interfaces of a device it just found, those three
> /sbin/hotplug invocations are going to happen very quickly,
> certainly within a small fraction of a second.

This is what happens indeed.

> You'll need to find some other way of knowing when a lockfile
> exists and is stale... something that can handle more than one
> process that is started at identical times (or nearly so).

Even with the very short interval between them, the scripts find
without error that the lockfile exists, and read it without error.
However, they then go to decide that 1 is greater than 60, and
*that*, I can't understand.

I'm going to put some sleep(1) at appropriate places and see if
this helps.

-- 
adrien.beau@free.fr - http://adrien.beau.free.fr/



-------------------------------------------------------
This SF.Net email is sponsored by: INetU
Attention Web Developers & Consultants: Become An INetU Hosting Partner.
Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission!
INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hotplug bug in Linux 2.4.21?
  2003-06-20 17:25 Hotplug bug in Linux 2.4.21? Adrien Beau
                   ` (6 preceding siblings ...)
  2003-06-21 16:11 ` Adrien Beau
@ 2003-06-22  6:31 ` Adrien Beau
  2003-06-27 15:15 ` Adrien Beau
  8 siblings, 0 replies; 10+ messages in thread
From: Adrien Beau @ 2003-06-22  6:31 UTC (permalink / raw)
  To: linux-hotplug

On Saturday 21 June 2003 18:11, Adrien Beau wrote:
> 
> > You'll need to find some other way of knowing when a lockfile
> > exists and is stale... something that can handle more than
> > one process that is started at identical times (or nearly
> > so).
>
> Even with the very short interval between them, the scripts
> find without error that the lockfile exists, and read it
> without error. However, they then go to decide that 1 is
> greater than 60, and *that*, I can't understand.
>
> I'm going to put some sleep(1) at appropriate places and see if
> this helps.

It quickly turned out that this was a wrong idea, since the scripts
are started at the same time. It only makes things worse, since the
the lockfile then disappears while a script is sleeping.

It doesn't matter any more, though, since I have found a solution to
this problem. I tried to find what made the test fail. It turns out
that

  expr $currenttime '-' $timestamp '<' $MAX_LOCK_AGE

returns 1 to stdout, which means that indeed, expr thinks that 0 or 1
are less than 60. The bug is that, when run in "normal" conditions,
at a few seconds interval, the above command has an exit status of 0,
but when run in "stress" conditions, at a few hundredth of seconds
interval, its exit status is 1. That's just weird (a bug in the kernel
or glibc, or what?), but that can be worked around. I ended up using

  [ `expr $currenttime '-' $timestamp` -lt $MAX_LOCK_AGE ]

instead, which works fine.

-- 
adrien.beau@free.fr - http://adrien.beau.free.fr/



-------------------------------------------------------
This SF.Net email is sponsored by: INetU
Attention Web Developers & Consultants: Become An INetU Hosting Partner.
Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission!
INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hotplug bug in Linux 2.4.21?
  2003-06-20 17:25 Hotplug bug in Linux 2.4.21? Adrien Beau
                   ` (7 preceding siblings ...)
  2003-06-22  6:31 ` Adrien Beau
@ 2003-06-27 15:15 ` Adrien Beau
  8 siblings, 0 replies; 10+ messages in thread
From: Adrien Beau @ 2003-06-27 15:15 UTC (permalink / raw)
  To: linux-hotplug

For the sake of completeness:

On Sunday 22 June 2003 08:31, Adrien Beau wrote:
> 
> It doesn't matter any more, though, since I have found a
> solution to this problem.
(...)
> I ended up using
>
>   [ `expr $currenttime '-' $timestamp` -lt $MAX_LOCK_AGE ]
>
> instead, which works fine.

Which works better, actually, but sometimes still fails. I've
also had cases where all three tests for the existence of the
lockfile would run at the same time, before it was created.
I've solved this problem by storing the pid in the lockfile,
so a script instance can know if its lockfile got overwritten.

Now it works fine.

-- 
adrien.beau@free.fr - http://adrien.beau.free.fr/



-------------------------------------------------------
This SF.Net email is sponsored by: INetU
Attention Web Developers & Consultants: Become An INetU Hosting Partner.
Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission!
INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php
_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2003-06-27 15:15 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-06-20 17:25 Hotplug bug in Linux 2.4.21? Adrien Beau
2003-06-20 17:51 ` Greg KH
2003-06-20 18:15 ` Duncan Sands
2003-06-20 18:44 ` Adrien Beau
2003-06-20 19:18 ` Kevin P. Fleming
2003-06-21 10:12 ` Olaf Hering
2003-06-21 15:56 ` Adrien Beau
2003-06-21 16:11 ` Adrien Beau
2003-06-22  6:31 ` Adrien Beau
2003-06-27 15:15 ` Adrien Beau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.