All of lore.kernel.org
 help / color / mirror / Atom feed
* udev queue error
@ 2011-05-30 19:46 Markus Rathgeb
  2011-05-30 20:06 ` Kay Sievers
                   ` (23 more replies)
  0 siblings, 24 replies; 25+ messages in thread
From: Markus Rathgeb @ 2011-05-30 19:46 UTC (permalink / raw)
  To: linux-hotplug

[-- Attachment #1: Type: text/plain, Size: 2186 bytes --]

Hello!

If I insert a special USB stick to my computer, udev will go crazy.
The USB stick is a "SanDisk Cruzer" device. It represents himself as a
block device and a CD rom device.

I do not know, how I could help. So pleace have a look at the logs:

/proc/kmsg if I insert the stick:

<7>[99224.649752] hub 4-1:1.0: state 7 ports 4 chg 0000 evt 0008
<7>[99224.651372] hub 4-1:1.0: port 3, status 0101, change 0001, 12 Mb/s
<7>[99224.758955] hub 4-1:1.0: debounce: port 3: total 100ms stable
100ms status 0x101
<6>[99224.824960] usb 4-1.3: new high speed USB device number 18 using ehci_hcd
<7>[99224.903944] usb 4-1.3: default language 0x0409
<7>[99224.904691] usb 4-1.3: udev 18, busnum 4, minor = 401
<6>[99224.904699] usb 4-1.3: New USB device found, idVendor=0781, idProduct=5406
<6>[99224.904704] usb 4-1.3: New USB device strings: Mfr=1, Product=2,
SerialNumber=3
<6>[99224.904709] usb 4-1.3: Product: U3 Cruzer Micro
<6>[99224.904713] usb 4-1.3: Manufacturer: SanDisk
<6>[99224.904716] usb 4-1.3: SerialNumber: 22427202FB10CD97
<7>[99224.904892] usb 4-1.3: usb_probe_device
<7>[99224.904901] usb 4-1.3: configuration #1 chosen from 1 choice
<7>[99224.905078] usb 4-1.3: adding 4-1.3:1.0 (config #1, interface 0)
<6>[99224.907302] scsi13 : usb-storage 4-1.3:1.0
<7>[99224.907520] drivers/usb/core/inode.c: creating file '018'
<5>[99225.911208] scsi 13:0:0:0: Direct-Access     SanDisk  Cruzer
      8.02 PQ: 0 ANSI: 0 CCS
<5>[99225.911485] sd 13:0:0:0: Attached scsi generic sg2 type 0
<5>[99225.913086] scsi 13:0:0:1: CD-ROM            SanDisk  Cruzer
      8.02 PQ: 0 ANSI: 0
<5>[99225.921089] sr 13:0:0:1: Attached scsi generic sg3 type 5

Attached is a log of "udevadm monitor" when I plugged in the stick,
wait a moment and remove it again. But the messages do not stop...
I have to stop udev and start it again.

uname -a
Linux thor 2.6.39-gentoo #1 SMP Mon May 23 23:11:54 CEST 2011 i686 AMD
Athlon(tm) 7750 Dual-Core Processor AuthenticAMD GNU/Linux
I am using gentoo with sys-kernel/gentoo-sources-2.6.39. Should I test
also a vanilla kernel?

Gentoo package for udev: sys-fs/udev-168-r2
udevd --version
168

I hope someone could give me a hint...

With regards,
Markus

[-- Attachment #2: udev-monitor.log.bz2 --]
[-- Type: application/x-bzip2, Size: 4130 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
@ 2011-05-30 20:06 ` Kay Sievers
  2011-05-31 15:31 ` Tejun Heo
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Kay Sievers @ 2011-05-30 20:06 UTC (permalink / raw)
  To: linux-hotplug

On Mon, May 30, 2011 at 21:46, Markus Rathgeb <maggu2810@googlemail.com> wrote:
> If I insert a special USB stick to my computer, udev will go crazy.
> The USB stick is a "SanDisk Cruzer" device. It represents himself as a
> block device and a CD rom device.
>
> I do not know, how I could help. So please have a look at the logs:

Yeah, the fake cdrom on the stick seems pretty broken. It generates
media-changed events on very open(), and udev runs open() for the new
event, which ... creates the loop you are seeing.

# /sbin/udevadm monitor &
# touch /dev/sr1
KERNEL[1306785694.680617] change   \
  /devices/pci0000:00/0000:00:1d.7/usb2/2-1/2-1:1.0/host4/target4:0:0/4:0:0:1/block/sr1
(block)
ACTION=change
SUBSYSTEM=block
DISK_MEDIA_CHANGE=1
...

# grep . /sys/class/block/sr1/*
/sys/class/block/sr1/alignment_offset:0
/sys/class/block/sr1/capability:19
/sys/class/block/sr1/dev:11:1
/sys/class/block/sr1/discard_alignment:0
/sys/class/block/sr1/events:media_change eject_request
/sys/class/block/sr1/events_poll_msecs:-1
/sys/class/block/sr1/ext_range:1
/sys/class/block/sr1/inflight:       0        0
/sys/class/block/sr1/range:1
/sys/class/block/sr1/removable:1
/sys/class/block/sr1/ro:0
/sys/class/block/sr1/size:62972

We need to find a way to work around such issues in the kernel, I guess.

Tejun, any idea?

Kay

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
  2011-05-30 20:06 ` Kay Sievers
@ 2011-05-31 15:31 ` Tejun Heo
  2011-05-31 15:34 ` Markus Rathgeb
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-05-31 15:31 UTC (permalink / raw)
  To: linux-hotplug

Hello,

On Mon, May 30, 2011 at 10:06:49PM +0200, Kay Sievers wrote:
> Yeah, the fake cdrom on the stick seems pretty broken. It generates
> media-changed events on very open(), and udev runs open() for the new
> event, which ... creates the loop you are seeing.

Ugh... :-(

> We need to find a way to work around such issues in the kernel, I guess.
> 
> Tejun, any idea?

Hmmm... a couple of questions.

* udev doesn't open device w/ O_EXCL, right?

* Does a MEDIA_CHANGE event trigger multiple opens?  I _think_ if udev
  opens only once, the event shouldn't be re-generated immediately.
  It will happen on the next poll interval but it won't be a busy
  loop.

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
  2011-05-30 20:06 ` Kay Sievers
  2011-05-31 15:31 ` Tejun Heo
@ 2011-05-31 15:34 ` Markus Rathgeb
  2011-05-31 15:40 ` Kay Sievers
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Markus Rathgeb @ 2011-05-31 15:34 UTC (permalink / raw)
  To: linux-hotplug

2011/5/31 Tejun Heo <htejun@gmail.com>:
> Hello,
>
> On Mon, May 30, 2011 at 10:06:49PM +0200, Kay Sievers wrote:
>> Yeah, the fake cdrom on the stick seems pretty broken. It generates
>> media-changed events on very open(), and udev runs open() for the new
>> event, which ... creates the loop you are seeing.
>
> Ugh... :-(
>
Hello again!

Thanks for the fast reply!

Could I help with further debugging informations?
Some test cases, udev rules, etc?

Markus

>> We need to find a way to work around such issues in the kernel, I guess.
>>
>> Tejun, any idea?
>
> Hmmm... a couple of questions.
>
> * udev doesn't open device w/ O_EXCL, right?
>
> * Does a MEDIA_CHANGE event trigger multiple opens?  I _think_ if udev
>  opens only once, the event shouldn't be re-generated immediately.
>  It will happen on the next poll interval but it won't be a busy
>  loop.
>
> Thanks.
>
> --
> tejun
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (2 preceding siblings ...)
  2011-05-31 15:34 ` Markus Rathgeb
@ 2011-05-31 15:40 ` Kay Sievers
  2011-05-31 15:46 ` Kay Sievers
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Kay Sievers @ 2011-05-31 15:40 UTC (permalink / raw)
  To: linux-hotplug

On Tue, May 31, 2011 at 17:34, Markus Rathgeb <maggu2810@googlemail.com> wrote:
> 2011/5/31 Tejun Heo <htejun@gmail.com>:
>> On Mon, May 30, 2011 at 10:06:49PM +0200, Kay Sievers wrote:
>>> Yeah, the fake cdrom on the stick seems pretty broken. It generates
>>> media-changed events on very open(), and udev runs open() for the new
>>> event, which ... creates the loop you are seeing.
>>
>> Ugh... :-(

> Could I help with further debugging informations?
> Some test cases, udev rules, etc?

You can only remove the rules that handle cdrom devices. There is
nothing else you can do for now.

Kay

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (3 preceding siblings ...)
  2011-05-31 15:40 ` Kay Sievers
@ 2011-05-31 15:46 ` Kay Sievers
  2011-06-01  1:39 ` Tejun Heo
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Kay Sievers @ 2011-05-31 15:46 UTC (permalink / raw)
  To: linux-hotplug

On Tue, May 31, 2011 at 17:31, Tejun Heo <htejun@gmail.com> wrote:
> On Mon, May 30, 2011 at 10:06:49PM +0200, Kay Sievers wrote:
>> Yeah, the fake cdrom on the stick seems pretty broken. It generates
>> media-changed events on very open(), and udev runs open() for the new
>> event, which ... creates the loop you are seeing.
>
> Ugh... :-(
>
>> We need to find a way to work around such issues in the kernel, I guess.
>>
>> Tejun, any idea?
>
> Hmmm... a couple of questions.
>
> * udev doesn't open device w/ O_EXCL, right?

Cdrom_id uses O_EXCL if the disk is not mounted. If it's mounted we
obviously can't use O_EXCL and don't use it.

If it's a data disk, we call blkid which does not use O_EXCL.

> * Does a MEDIA_CHANGE event trigger multiple opens?

Yeah, it can run multiple things against the drive. If it's a data cd
it will run cdrom_id and blkid with the same event.

> I _think_ if udev
>  opens only once, the event shouldn't be re-generated immediately.

Yeah, but we can't assume only a single open() without making huge changes.

>  It will happen on the next poll interval but it won't be a busy
>  loop.

This happens without the in-kernel polling enabled.

Kay

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (4 preceding siblings ...)
  2011-05-31 15:46 ` Kay Sievers
@ 2011-06-01  1:39 ` Tejun Heo
  2011-06-01  2:10 ` Kay Sievers
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-06-01  1:39 UTC (permalink / raw)
  To: linux-hotplug

Hello, Kay, Markus.

On Tue, May 31, 2011 at 05:46:24PM +0200, Kay Sievers wrote:
> > * udev doesn't open device w/ O_EXCL, right?
> 
> Cdrom_id uses O_EXCL if the disk is not mounted. If it's mounted we
> obviously can't use O_EXCL and don't use it.

Oh, correction.  Write O_EXCL open - close sequence trigger immediate
recheck because event checking is blocked over write O_EXCL open.

> If it's a data disk, we call blkid which does not use O_EXCL.
> 
> > * Does a MEDIA_CHANGE event trigger multiple opens?
> 
> Yeah, it can run multiple things against the drive. If it's a data cd
> it will run cdrom_id and blkid with the same event.

But this can trigger chain events if the device is reporting media
changed spuriously.

> > I _think_ if udev
> >  opens only once, the event shouldn't be re-generated immediately.
> 
> Yeah, but we can't assume only a single open() without making huge changes.

Understood.

> >  It will happen on the next poll interval but it won't be a busy
> >  loop.
> 
> This happens without the in-kernel polling enabled.

Yes, it doesn't have much to do with polling per-se.  The new check
event is used during regular open path.  Before, it wouldn't have
generated a new event but with ->check_events() changes,
GET_EVENT_STATUS_NOTIFICATION result carries more weight and event is
generated if GET_EVENT_STATUS_NOTIFICATION says so (which was one of
the goals of the conversion after all).

I've just ordered the same device and will investigate how it actually
behaves but if the device is just saying media changed all the time,
I'm not sure what the kernel can do other than black listing it so
that event is not generated for the device, which will solve the udev
infinite looping but if we do that we'll just be pushing the problem
to userland and solving the problem for only this device.

I think the sensible thing to do is applying some form of back-off
strategy so that udev doesn't fall into infinite loop no matter what
the device says, which I think is better implemented from userland.
Would that be difficult to implement from udev side?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (5 preceding siblings ...)
  2011-06-01  1:39 ` Tejun Heo
@ 2011-06-01  2:10 ` Kay Sievers
  2011-06-01  4:42 ` Tejun Heo
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Kay Sievers @ 2011-06-01  2:10 UTC (permalink / raw)
  To: linux-hotplug

On Wed, Jun 1, 2011 at 03:39, Tejun Heo <htejun@gmail.com> wrote:
> On Tue, May 31, 2011 at 05:46:24PM +0200, Kay Sievers wrote:
>> > * udev doesn't open device w/ O_EXCL, right?
>>
>> Cdrom_id uses O_EXCL if the disk is not mounted. If it's mounted we
>> obviously can't use O_EXCL and don't use it.
>
> Oh, correction.  Write O_EXCL open - close sequence trigger immediate
> recheck because event checking is blocked over write O_EXCL open.

I don't think we ever open anything for writing in udev context, it
should be all read-only.

>> Yeah, it can run multiple things against the drive. If it's a data cd
>> it will run cdrom_id and blkid with the same event.
>
> But this can trigger chain events if the device is reporting media
> changed spuriously.

>> This happens without the in-kernel polling enabled.
>
> Yes, it doesn't have much to do with polling per-se.  The new check
> event is used during regular open path.  Before, it wouldn't have
> generated a new event but with ->check_events() changes,
> GET_EVENT_STATUS_NOTIFICATION result carries more weight and event is
> generated if GET_EVENT_STATUS_NOTIFICATION says so (which was one of
> the goals of the conversion after all).

Right.

> I've just ordered the same device and will investigate how it actually
> behaves

If that doesn't show it, I have one here, that does. You'll come visit
in 3 weeks anyway. :)

> but if the device is just saying media changed all the time,
> I'm not sure what the kernel can do

It does that with every open() here.

> other than black listing it so
> that event is not generated for the device, which will solve the udev
> infinite looping but if we do that we'll just be pushing the problem
> to userland and solving the problem for only this device.

Yeah, we don't want lists, if we can avoid them. They are a nightmare.

> I think the sensible thing to do is applying some form of back-off
> strategy so that udev doesn't fall into infinite loop no matter what
> the device says, which I think is better implemented from userland.
> Would that be difficult to implement from udev side?

Hmm, sure, we can do anything that software can do. :) I'll think
about a strategy that could work. Not sure where to put such a
'counter' or how to distinguish 'weird' from 'broken' devices
patterns.

Should we possibly invent/misuse some open flag to use in udev
context, that will suppress the events?

Kay

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (6 preceding siblings ...)
  2011-06-01  2:10 ` Kay Sievers
@ 2011-06-01  4:42 ` Tejun Heo
  2011-06-08 11:55 ` Kay Sievers
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-06-01  4:42 UTC (permalink / raw)
  To: linux-hotplug

Hello,

On Wed, Jun 01, 2011 at 04:10:31AM +0200, Kay Sievers wrote:
> > Oh, correction.  Write O_EXCL open - close sequence trigger immediate
> > recheck because event checking is blocked over write O_EXCL open.
> 
> I don't think we ever open anything for writing in udev context, it
> should be all read-only.

I see, so it's the multiple open(2)s then.  Not that that's a wrong
thing to do.

> > I've just ordered the same device and will investigate how it actually
> > behaves
> 
> If that doesn't show it, I have one here, that does. You'll come visit
> in 3 weeks anyway. :)

Heh, yeah, crappy devices are all around us. :)

> > I think the sensible thing to do is applying some form of back-off
> > strategy so that udev doesn't fall into infinite loop no matter what
> > the device says, which I think is better implemented from userland.
> > Would that be difficult to implement from udev side?
> 
> Hmm, sure, we can do anything that software can do. :) I'll think
> about a strategy that could work. Not sure where to put such a
> 'counter' or how to distinguish 'weird' from 'broken' devices
> patterns.
> 
> Should we possibly invent/misuse some open flag to use in udev
> context, that will suppress the events?

I don't think that would be welcome by many people.  Besides, I think
that would be more fragile than implementing proper back-off.  It just
takes one extra open() to trigger the looping and operations hanging
off from a udev event may cause that to happen outside of udev proper,
so it's not straight forward to determine who needs to set such flag.

I _think_ what we really need is a timer based throttling.  e.g. one
token is generated every, say, three seconds and can accumulate upto 5
and each media changed event checking consumes a token and should wait
if there's no token left.  Dunno how difficult it owuld be to
implement this tho.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (7 preceding siblings ...)
  2011-06-01  4:42 ` Tejun Heo
@ 2011-06-08 11:55 ` Kay Sievers
  2011-06-09 11:54 ` Tejun Heo
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Kay Sievers @ 2011-06-08 11:55 UTC (permalink / raw)
  To: linux-hotplug

On Wed, Jun 1, 2011 at 06:42, Tejun Heo <htejun@gmail.com> wrote:
> I _think_ what we really need is a timer based throttling.  e.g. one
> token is generated every, say, three seconds and can accumulate upto 5
> and each media changed event checking consumes a token and should wait
> if there's no token left.  Dunno how difficult it owuld be to
> implement this tho.

The problem is that currently every udev event run causes three new
kernel events. The whole thing is constantly triggered by the (still
present) userspace polling once a second. Udev will need to forcefully
drop events from the event queue every second to break this loop --
means constant throttling without any real idle time. Sure, such
ratelimit we should probably have, but it does not really solve our
problem properly, I guess. constantly dropping events from udev that
the kernel send us shouldn't be the expected behavior.

Maybe we can find a clean way to disable the event generation during
the time udev runs the event handler? We do a similar thing with
inotify watches too, so that udev does not generate events it has
caused itself during the event run.

Kay

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (8 preceding siblings ...)
  2011-06-08 11:55 ` Kay Sievers
@ 2011-06-09 11:54 ` Tejun Heo
  2011-06-09 12:36 ` Kay Sievers
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-06-09 11:54 UTC (permalink / raw)
  To: linux-hotplug

Hello,

On Wed, Jun 08, 2011 at 01:55:01PM +0200, Kay Sievers wrote:
> Maybe we can find a clean way to disable the event generation during
> the time udev runs the event handler? We do a similar thing with
> inotify watches too, so that udev does not generate events it has
> caused itself during the event run.

Maybe we can tag events which were generated during open(2) with the
pid of the opener so that udev can reliably filter them?  I think it
would be nice if we can somehow make it uni-directional (ie. no extra
information from userland to kernel during open).

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (9 preceding siblings ...)
  2011-06-09 11:54 ` Tejun Heo
@ 2011-06-09 12:36 ` Kay Sievers
  2011-06-12 12:28 ` Tejun Heo
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Kay Sievers @ 2011-06-09 12:36 UTC (permalink / raw)
  To: linux-hotplug

On Thu, Jun 9, 2011 at 13:54, Tejun Heo <htejun@gmail.com> wrote:
> On Wed, Jun 08, 2011 at 01:55:01PM +0200, Kay Sievers wrote:
>> Maybe we can find a clean way to disable the event generation during
>> the time udev runs the event handler? We do a similar thing with
>> inotify watches too, so that udev does not generate events it has
>> caused itself during the event run.
>
> Maybe we can tag events which were generated during open(2) with the
> pid of the opener so that udev can reliably filter them?  I think it
> would be nice if we can somehow make it uni-directional (ie. no extra
> information from userland to kernel during open

It's a cool idea.

But I fear in reality it gets messy pretty fast. We would need to
track and remember all pids from tools executed by udev rules.

With the fully async/queued nature of these events, we might not even
have read the events from the kernel after the event and the called
tools have long finished their work.

We could use the session id or process group, but that sounds like
asking for trouble.

Kay

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (10 preceding siblings ...)
  2011-06-09 12:36 ` Kay Sievers
@ 2011-06-12 12:28 ` Tejun Heo
  2011-06-13 19:02 ` Kay Sievers
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-06-12 12:28 UTC (permalink / raw)
  To: linux-hotplug

Hello, Kay.

On Thu, Jun 09, 2011 at 02:36:54PM +0200, Kay Sievers wrote:
> On Thu, Jun 9, 2011 at 13:54, Tejun Heo <htejun@gmail.com> wrote:
> > Maybe we can tag events which were generated during open(2) with the
> > pid of the opener so that udev can reliably filter them?  I think it
> > would be nice if we can somehow make it uni-directional (ie. no extra
> > information from userland to kernel during open
> 
> It's a cool idea.
> 
> But I fear in reality it gets messy pretty fast. We would need to
> track and remember all pids from tools executed by udev rules.
> 
> With the fully async/queued nature of these events, we might not even
> have read the events from the kernel after the event and the called
> tools have long finished their work.
> 
> We could use the session id or process group, but that sounds like
> asking for trouble.

I've been playing with the device a bit and this thing really is an
abomination and reports new media whenever asked. :(

I've been trying to come up with a way to work around the problem.
The only thing we can do from kernel is somehow suppressing event
generation on open (or mark it to be ignored) - be it timing or opener
based; however, there's no way to acheive that in a race-free way.

* Kernel must check whether media is changed upon open.

* If at least one MEDIA_CHANGE event is not generated after device
  indicated the event, we risk losing events.  ie. Media change can
  race against open() and userland may end up with stale information.
  The race window is quite short compared to usual tray actions tho.

I think it would be better to avoid penalizing sane devices and let
timer-based throttling take care of crazy ones.  I don't know how udev
handles it from userland but does it need to keep track of all the
separate events?  Can't it just track the pending states (ie. one bit
for each event) and throttle propagation accordingly?

Also, another thing is that as userland is now in charge of tray
locking, maybe it's safe to ignore MEDIA_CHANGE while the device is
known to be locked from userland?  But, then again, I would be
surprised if there isn't a device which is crazy enough to avoid such
safe guard and still trigger weird behavior.  :(

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (11 preceding siblings ...)
  2011-06-12 12:28 ` Tejun Heo
@ 2011-06-13 19:02 ` Kay Sievers
  2011-06-14 20:24 ` Kay Sievers
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Kay Sievers @ 2011-06-13 19:02 UTC (permalink / raw)
  To: linux-hotplug

On Sun, Jun 12, 2011 at 14:28, Tejun Heo <htejun@gmail.com> wrote:
> I've been playing with the device a bit and this thing really is an
> abomination and reports new media whenever asked. :(

Yeah, it looks pretty broken.

> I've been trying to come up with a way to work around the problem.
> The only thing we can do from kernel is somehow suppressing event
> generation on open (or mark it to be ignored) - be it timing or opener
> based; however, there's no way to acheive that in a race-free way.

Right, and both are not easy to manage. An open flag is hard to decide
when to use it, especially for tools shared by udev and possibly other
callers. The timer based solution is kinda weird inside the kernel.
And unfortunately, we still have userspace polling, which currently
triggers this behaviour once a second.

> * Kernel must check whether media is changed upon open.

Sure.

We follow GET_EVENT_STATUS_NOTIFICATION now. Can't we always force TUR
when we open() the device from userland? That's what we did in the
past, didn't we? The in-kernel polling would still not need to do
that, and the periodic usespace polling is about to die pretty soon.

> * If at least one MEDIA_CHANGE event is not generated after device
>  indicated the event, we risk losing events.  ie. Media change can
>  race against open() and userland may end up with stale information.
>  The race window is quite short compared to usual tray actions tho.

That would be pretty hard to debug, I guess. :)

> I think it would be better to avoid penalizing sane devices and let
> timer-based throttling take care of crazy ones.  I don't know how udev
> handles it from userland but does it need to keep track of all the
> separate events?  Can't it just track the pending states (ie. one bit
> for each event) and throttle propagation accordingly?

It's just a queue of events. Every event currently generates 3 new
ones in the kernel from the open() in the event event handler. We
would need to ratelimit that and discard all already queued events
when we decide that it's a loop.

We can not generally coalesce events in udev, as they might carry
different properties added from subsystems we need to read.

This loop would be triggered from new once a second with the current tools.

> Also, another thing is that as userland is now in charge of tray
> locking, maybe it's safe to ignore MEDIA_CHANGE while the device is
> known to be locked from userland?  But, then again, I would be
> surprised if there isn't a device which is crazy enough to avoid such
> safe guard and still trigger weird behavior.  :(

Guess we need to make it work with current stuff too, and not rely on
the door lock.

Kay

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (12 preceding siblings ...)
  2011-06-13 19:02 ` Kay Sievers
@ 2011-06-14 20:24 ` Kay Sievers
  2011-06-15  7:13 ` Tejun Heo
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Kay Sievers @ 2011-06-14 20:24 UTC (permalink / raw)
  To: linux-hotplug

On Mon, 2011-06-13 at 21:02 +0200, Kay Sievers wrote:
> We follow GET_EVENT_STATUS_NOTIFICATION now. Can't we always force TUR
> when we open() the device from userland? That's what we did in the
> past, didn't we? The in-kernel polling would still not need to do
> that, and the periodic usespace polling is about to die pretty soon.

After the long discussion today between Tejun and me, something like
this *could* work. A patch with printk() debug output is below.

Every open() from userland does: GET_EVENT_STATUS_NOTIFICATION and TUR.
If the both results disagree 8 times in a row, we will only look at the
TUR results in the future.

With this I get in dmesg:
  scsi 8:0:0:1: CD-ROM            SanDisk  U3 Cruzer Micro  8.02 PQ: 0 ANSI: 0 
  sr1: scsi3-mmc drive: 48x/48x tray
  sr 8:0:0:1: Attached scsi CD-ROM sr1
  sr 8:0:0:1: Attached scsi generic sg3 type 5
  sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, last_present 1 <-> cd->media_present 1, count 1
  sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, last_present 1 <-> cd->media_present 1, count 2
  sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, last_present 1 <-> cd->media_present 1, count 3
  sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, last_present 1 <-> cd->media_present 1, count 4
  sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, last_present 1 <-> cd->media_present 1, count 5
  sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, last_present 1 <-> cd->media_present 1, count 6
  sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, last_present 1 <-> cd->media_present 1, count 7
  sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, last_present 1 <-> cd->media_present 1, count 8
  sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, last_present 1 <-> cd->media_present 1, count 9
  sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, ignoring all future GET_EVENT_STATUS_NOTIFICATION

The in-kernel polling will still create a media change event every poll
interval. The rest seems to work as expected.

The other properly working devices I tried fail with the very first TUR
<-> GET_EVENT check for some reason:
  scsi 7:0:0:0: CD-ROM            TSSTcorp CDDVDW SE-S084B  TS01 PQ: 0 ANSI: 0
  sr2: scsi3-mmc drive: 8x/24x writer dvd-ram cd/rw xa/form2 cdda tray
  sr 7:0:0:0: Attached scsi CD-ROM sr2
  sr 7:0:0:0: Attached scsi generic sg4 type 5
  sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, last_present 1 <-> cd->media_present 1, count 1
  sr: GET_EVENT_STATUS_NOTIFICATION = TUR, clear mismatch count 1

Everything thing else seems to work fine.

Kay

---
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 4778e27..9023e1d 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -229,6 +229,15 @@ static unsigned int sr_check_events(struct cdrom_device_info *cdi,
 	if (!(clearing & DISK_EVENT_MEDIA_CHANGE))
 		goto skip_tur;
 
+	/*
+	 * earlier GET_EVENT_STATUS_NOTIFICATION and TUR did not agree
+	 * for a couple of times in a row, we rely on TUR only for this
+	 * likely broken device, to prevent generating incorrect media
+	 * changed events for every open()
+	 */
+	if (cd->ignore_get_event)
+		events = 0;
+
 	/* let's see whether the media is there with TUR */
 	last_present = cd->media_present;
 	ret = scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr);
@@ -241,8 +250,22 @@ static unsigned int sr_check_events(struct cdrom_device_info *cdi,
 	cd->media_present = scsi_status_is_good(ret) ||
 		(scsi_sense_valid(&sshdr) && sshdr.asc != 0x3a);
 
-	if (last_present != cd->media_present)
+	if (last_present != cd->media_present) {
 		events |= DISK_EVENT_MEDIA_CHANGE;
+	} else if (events & DISK_EVENT_MEDIA_CHANGE) {
+		cd->tur_mismatch++;
+		printk("sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, "
+		       "last_present %i <-> cd->media_present %i, count %i\n", last_present, cd->media_present, cd->tur_mismatch);
+		if (cd->tur_mismatch > 8) {
+			printk("sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, "
+			       "ignoring all future GET_EVENT_STATUS_NOTIFICATION\n");
+			cd->ignore_get_event = true;
+		}
+	} else if (cd->tur_mismatch) {
+		printk("sr: GET_EVENT_STATUS_NOTIFICATION = TUR, "
+		       "clear mismatch count %i\n", cd->tur_mismatch);
+		cd->tur_mismatch = 0;
+	}
 skip_tur:
 	if (cd->device->changed) {
 		events |= DISK_EVENT_MEDIA_CHANGE;
diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h
index e036f1d..94a3215 100644
--- a/drivers/scsi/sr.h
+++ b/drivers/scsi/sr.h
@@ -41,6 +41,8 @@ typedef struct scsi_cd {
 	unsigned readcd_known:1;	/* drive supports READ_CD (0xbe) */
 	unsigned readcd_cdda:1;	/* reading audio data using READ_CD */
 	unsigned media_present:1;	/* media is present */
+	unsigned ignore_get_event:1;	/* get_event is unreliable, use TUR */
+	int tur_mismatch;		/* nr of get_event TUR mismatches */
 	struct cdrom_device_info cdi;
 	/* We hold gendisk and scsi_device references on probe and use
 	 * the refs on this kref to decide when to release them */



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (13 preceding siblings ...)
  2011-06-14 20:24 ` Kay Sievers
@ 2011-06-15  7:13 ` Tejun Heo
  2011-06-15  9:34 ` Kay Sievers
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-06-15  7:13 UTC (permalink / raw)
  To: linux-hotplug

Hello, Kay.

Looks good to me.  Just some nitpicks.

On Tue, Jun 14, 2011 at 10:24:16PM +0200, Kay Sievers wrote:
> -	if (last_present != cd->media_present)
> +	if (last_present != cd->media_present) {
>  		events |= DISK_EVENT_MEDIA_CHANGE;
> +	} else if (events & DISK_EVENT_MEDIA_CHANGE) {
> +		cd->tur_mismatch++;
> +		printk("sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, "
> +		       "last_present %i <-> cd->media_present %i, count %i\n", last_present, cd->media_present, cd->tur_mismatch);

I suppose the above printk is for debugging and will be removed when
posting for inclusion?

> +		if (cd->tur_mismatch > 8) {
> +			printk("sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, "
> +			       "ignoring all future GET_EVENT_STATUS_NOTIFICATION\n");
> +			cd->ignore_get_event = true;

sdev_printk(KERN_WARNING, ...) would probably be a better fit.

> +		}
> +	} else if (cd->tur_mismatch) {
> +		printk("sr: GET_EVENT_STATUS_NOTIFICATION = TUR, "
> +		       "clear mismatch count %i\n", cd->tur_mismatch);
> +		cd->tur_mismatch = 0;

Ditto, we probably don't want the above in the final version.

Thank you.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (14 preceding siblings ...)
  2011-06-15  7:13 ` Tejun Heo
@ 2011-06-15  9:34 ` Kay Sievers
  2011-06-15  9:40 ` Tejun Heo
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Kay Sievers @ 2011-06-15  9:34 UTC (permalink / raw)
  To: linux-hotplug

On Wed, Jun 15, 2011 at 09:13, Tejun Heo <htejun@gmail.com> wrote:
> Looks good to me.  Just some nitpicks.
>
> On Tue, Jun 14, 2011 at 10:24:16PM +0200, Kay Sievers wrote:
>> -     if (last_present != cd->media_present)
>> +     if (last_present != cd->media_present) {
>>               events |= DISK_EVENT_MEDIA_CHANGE;
>> +     } else if (events & DISK_EVENT_MEDIA_CHANGE) {
>> +             cd->tur_mismatch++;
>> +             printk("sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, "
>> +                    "last_present %i <-> cd->media_present %i, count %i\n", last_present, cd->media_present, cd->tur_mismatch);
>
> I suppose the above printk is for debugging and will be removed when
> posting for inclusion?
>
>> +             if (cd->tur_mismatch > 8) {
>> +                     printk("sr: GET_EVENT_STATUS_NOTIFICATION but TUR unchanged, "
>> +                            "ignoring all future GET_EVENT_STATUS_NOTIFICATION\n");
>> +                     cd->ignore_get_event = true;
>
> sdev_printk(KERN_WARNING, ...) would probably be a better fit.
>
>> +             }
>> +     } else if (cd->tur_mismatch) {
>> +             printk("sr: GET_EVENT_STATUS_NOTIFICATION = TUR, "
>> +                    "clear mismatch count %i\n", cd->tur_mismatch);
>> +             cd->tur_mismatch = 0;
>
> Ditto, we probably don't want the above in the final version.

Sure not, it's nothing to merge, I just wanted to see if we are on the
right track and how it behaves.

Any idea why with the first check GET_EVENT seems, but TUR seems to
indicate no change?

Kay

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (15 preceding siblings ...)
  2011-06-15  9:34 ` Kay Sievers
@ 2011-06-15  9:40 ` Tejun Heo
  2011-06-18 21:41 ` Kay Sievers
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-06-15  9:40 UTC (permalink / raw)
  To: linux-hotplug

Hello,

On Wed, Jun 15, 2011 at 11:34:23AM +0200, Kay Sievers wrote:
> Any idea why with the first check GET_EVENT seems, but TUR seems to
> indicate no change?

That's probably because probing logic already consumed UA.  Reset
during probe also causes UA so detection logic consumes them
afterwards.  I don't think driver can tell apart different UAs at that
point.  If the device determines to report media change after reset
via GET_EVENT, you get a mismatch.  So, it's more or less expected.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (16 preceding siblings ...)
  2011-06-15  9:40 ` Tejun Heo
@ 2011-06-18 21:41 ` Kay Sievers
  2011-06-28 13:45 ` Tejun Heo
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Kay Sievers @ 2011-06-18 21:41 UTC (permalink / raw)
  To: linux-hotplug

On Wed, 2011-06-15 at 11:40 +0200, Tejun Heo wrote:
> Hello,
> 
> On Wed, Jun 15, 2011 at 11:34:23AM +0200, Kay Sievers wrote:
> > Any idea why with the first check GET_EVENT seems, but TUR seems to
> > indicate no change?
> 
> That's probably because probing logic already consumed UA.  Reset
> during probe also causes UA so detection logic consumes them
> afterwards.  I don't think driver can tell apart different UAs at that
> point.  If the device determines to report media change after reset
> via GET_EVENT, you get a mismatch.  So, it's more or less expected.

Here is a new patch. Would be great, if one of you guys can check if it
fixes the loop we are currently seeing.

Thanks,
Kay

From: Kay Sievers <kay.sievers@vrfy.org>
Subject: sr: check_events() ignore GET_EVENT when TUR says otherwise

Some (broken) devices return GET_EVENT at every open() which
lets udev run into a loop when the device is accessed fron an
event handler.

When GET_EVENT and TUR disagree for a few times in a row,
we ignore the GET_EVENT events, and trust only the TUR status.

This is the log of a USB stick with a (broken) fake CDROM drive:
  scsi 5:0:0:0: Direct-Access     SanDisk  U3 Cruzer Micro  8.02 PQ: 0 ANSI: 0 CCS
  sd 5:0:0:0: Attached scsi generic sg3 type 0
  scsi 5:0:0:1: CD-ROM            SanDisk  U3 Cruzer Micro  8.02 PQ: 0 ANSI: 0
  sd 5:0:0:0: [sdb] Attached SCSI removable disk
  sr2: scsi3-mmc drive: 48x/48x tray
  sr 5:0:0:1: Attached scsi CD-ROM sr2
  sr 5:0:0:1: Attached scsi generic sg4 type 5
  sr2: GET_EVENT and TUR disagree continuously, suppress GET_EVENT events
  sd 5:0:0:0: [sdb] 31777279 512-byte logical blocks: (16.2 GB/15.1 GiB)
  sd 5:0:0:0: [sdb] No Caching mode page present
  sd 5:0:0:0: [sdb] Assuming drive cache: write through
  sd 5:0:0:0: [sdb] No Caching mode page present
  sd 5:0:0:0: [sdb] Assuming drive cache: write through
  sdb: sdb1

Reported-By: Markus Rathgeb maggu2810@googlemail.com
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
---

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 4778e27..1b42dbd 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -229,6 +229,15 @@ static unsigned int sr_check_events(struct cdrom_device_info *cdi,
 	if (!(clearing & DISK_EVENT_MEDIA_CHANGE))
 		goto skip_tur;
 
+	/*
+	 * Earlier GET_EVENT_STATUS_NOTIFICATION and TUR did not agree
+	 * for a couple of times in a row. We rely on TUR only for this
+	 * likely broken device, to prevent generating incorrect media
+	 * changed events for every open().
+	 */
+	if (cd->ignore_get_event)
+		events &= ~DISK_EVENT_MEDIA_CHANGE;
+
 	/* let's see whether the media is there with TUR */
 	last_present = cd->media_present;
 	ret = scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr);
@@ -241,8 +250,19 @@ static unsigned int sr_check_events(struct cdrom_device_info *cdi,
 	cd->media_present = scsi_status_is_good(ret) ||
 		(scsi_sense_valid(&sshdr) && sshdr.asc != 0x3a);
 
-	if (last_present != cd->media_present)
+	if (last_present != cd->media_present) {
 		events |= DISK_EVENT_MEDIA_CHANGE;
+	} else if (events & DISK_EVENT_MEDIA_CHANGE) {
+		if (cd->tur_mismatch > 8) {
+			printk("%s: GET_EVENT and TUR disagree continuously, "
+			       "suppress GET_EVENT events\n", cd->cdi.name);
+			cd->ignore_get_event = true;
+		} else {
+			cd->tur_mismatch++;
+		}
+	} else if (!cd->ignore_get_event && cd->tur_mismatch > 0) {
+		cd->tur_mismatch = 0;
+	}
 skip_tur:
 	if (cd->device->changed) {
 		events |= DISK_EVENT_MEDIA_CHANGE;
diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h
index e036f1d..94a3215 100644
--- a/drivers/scsi/sr.h
+++ b/drivers/scsi/sr.h
@@ -41,6 +41,8 @@ typedef struct scsi_cd {
 	unsigned readcd_known:1;	/* drive supports READ_CD (0xbe) */
 	unsigned readcd_cdda:1;	/* reading audio data using READ_CD */
 	unsigned media_present:1;	/* media is present */
+	unsigned ignore_get_event:1;	/* get_event is unreliable, use TUR */
+	int tur_mismatch;		/* nr of get_event TUR mismatches */
 	struct cdrom_device_info cdi;
 	/* We hold gendisk and scsi_device references on probe and use
 	 * the refs on this kref to decide when to release them */



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (17 preceding siblings ...)
  2011-06-18 21:41 ` Kay Sievers
@ 2011-06-28 13:45 ` Tejun Heo
  2011-06-28 15:53 ` Tejun Heo
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-06-28 13:45 UTC (permalink / raw)
  To: linux-hotplug

Hello, Kay.

I've played with the patch and the usb stick and it seems to be
generally working.  I've made some changes tho.

* The TUR and GET_EVENT results accumulated between clearing
  check_events() should be compared, not the ones from single run.

* TUR reporting media change while GET_EVENT doesn't is okay.

* If ignore_get_event is set, kernel media polling can continue
  operating using TUR so that userland doesn't have to worry about
  working around.

* Flags moved to a separate bitfield var for proper exclusion.

How does this one look?

Thanks.

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 4778e27..ffb173e 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -221,14 +221,33 @@ static unsigned int sr_check_events(struct cdrom_device_info *cdi,
 		return 0;
 
 	events = sr_get_events(cd->device);
+	cd->get_event_changed |= events & DISK_EVENT_MEDIA_CHANGE;
+
+	/*
+	 * If earlier GET_EVENT_STATUS_NOTIFICATION and TUR did not agree
+	 * for a couple of times in a row.  We rely on TUR only for this
+	 * likely broken device, to prevent generating incorrect media
+	 * changed events for every open().
+	 */
+	if (cd->ignore_get_event) {
+		events &= ~DISK_EVENT_MEDIA_CHANGE;
+		goto do_tur;
+	}
+
 	/*
 	 * GET_EVENT_STATUS_NOTIFICATION is enough unless MEDIA_CHANGE
 	 * is being cleared.  Note that there are devices which hang
 	 * if asked to execute TUR repeatedly.
 	 */
-	if (!(clearing & DISK_EVENT_MEDIA_CHANGE))
-		goto skip_tur;
+	if (cd->device->changed) {
+		events |= DISK_EVENT_MEDIA_CHANGE;
+		cd->device->changed = 0;
+		cd->tur_changed = true;
+	}
 
+	if (!(clearing & DISK_EVENT_MEDIA_CHANGE))
+		return events;
+do_tur:
 	/* let's see whether the media is there with TUR */
 	last_present = cd->media_present;
 	ret = scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr);
@@ -242,11 +261,27 @@ static unsigned int sr_check_events(struct cdrom_device_info *cdi,
 		(scsi_sense_valid(&sshdr) && sshdr.asc != 0x3a);
 
 	if (last_present != cd->media_present)
-		events |= DISK_EVENT_MEDIA_CHANGE;
-skip_tur:
+		cd->device->changed = 1;
+
 	if (cd->device->changed) {
 		events |= DISK_EVENT_MEDIA_CHANGE;
 		cd->device->changed = 0;
+		cd->tur_changed = true;
+	}
+
+	if (!cd->ignore_get_event) {
+		/* check for spurious changed events from GET_EVENT */
+		if (cd->get_event_changed && !cd->tur_changed) {
+			if (cd->tur_mismatch++ > 8) {
+				sdev_printk(KERN_WARNING, cd->device,
+					    "GET_EVENT and TUR disagree continuously, suppress GET_EVENT events\n");
+				cd->ignore_get_event = true;
+			}
+		} else {
+			cd->tur_mismatch = 0;
+		}
+		cd->tur_changed = true;
+		cd->get_event_changed = true;
 	}
 
 	return events;
diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h
index e036f1d..f8ea01e 100644
--- a/drivers/scsi/sr.h
+++ b/drivers/scsi/sr.h
@@ -41,6 +41,13 @@ typedef struct scsi_cd {
 	unsigned readcd_known:1;	/* drive supports READ_CD (0xbe) */
 	unsigned readcd_cdda:1;	/* reading audio data using READ_CD */
 	unsigned media_present:1;	/* media is present */
+
+	/* GET_EVENT spurious event handling, block guarantees exclusion */
+	int tur_mismatch;		/* nr of get_event TUR mismatches */
+	bool tur_changed:1;		/* changed according to TUR */
+	bool get_event_changed:1;	/* changed according to GET_EVENT */
+	bool ignore_get_event:1;	/* get_event is unreliable, use TUR */
+
 	struct cdrom_device_info cdi;
 	/* We hold gendisk and scsi_device references on probe and use
 	 * the refs on this kref to decide when to release them */

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (18 preceding siblings ...)
  2011-06-28 13:45 ` Tejun Heo
@ 2011-06-28 15:53 ` Tejun Heo
  2011-06-28 15:57 ` Tejun Heo
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-06-28 15:53 UTC (permalink / raw)
  To: linux-hotplug

Hello, again.

Please test this one.  The previous one was broken and I also updated
it such that tur_mismatch is cleared iff !tur_changed &&
!get_event_changed.

Thanks.

---
 drivers/scsi/sr.c |   46 ++++++++++++++++++++++++++++++++++++++++++----
 drivers/scsi/sr.h |    7 +++++++
 2 files changed, 49 insertions(+), 4 deletions(-)

Index: work/drivers/scsi/sr.c
=================================--- work.orig/drivers/scsi/sr.c
+++ work/drivers/scsi/sr.c
@@ -221,14 +221,33 @@ static unsigned int sr_check_events(stru
 		return 0;
 
 	events = sr_get_events(cd->device);
+	cd->get_event_changed |= events & DISK_EVENT_MEDIA_CHANGE;
+
+	/*
+	 * If earlier GET_EVENT_STATUS_NOTIFICATION and TUR did not agree
+	 * for a couple of times in a row.  We rely on TUR only for this
+	 * likely broken device, to prevent generating incorrect media
+	 * changed events for every open().
+	 */
+	if (cd->ignore_get_event) {
+		events &= ~DISK_EVENT_MEDIA_CHANGE;
+		goto do_tur;
+	}
+
 	/*
 	 * GET_EVENT_STATUS_NOTIFICATION is enough unless MEDIA_CHANGE
 	 * is being cleared.  Note that there are devices which hang
 	 * if asked to execute TUR repeatedly.
 	 */
-	if (!(clearing & DISK_EVENT_MEDIA_CHANGE))
-		goto skip_tur;
+	if (cd->device->changed) {
+		events |= DISK_EVENT_MEDIA_CHANGE;
+		cd->device->changed = 0;
+		cd->tur_changed = true;
+	}
 
+	if (!(clearing & DISK_EVENT_MEDIA_CHANGE))
+		return events;
+do_tur:
 	/* let's see whether the media is there with TUR */
 	last_present = cd->media_present;
 	ret = scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr);
@@ -242,12 +261,31 @@ static unsigned int sr_check_events(stru
 		(scsi_sense_valid(&sshdr) && sshdr.asc != 0x3a);
 
 	if (last_present != cd->media_present)
-		events |= DISK_EVENT_MEDIA_CHANGE;
-skip_tur:
+		cd->device->changed = 1;
+
 	if (cd->device->changed) {
 		events |= DISK_EVENT_MEDIA_CHANGE;
 		cd->device->changed = 0;
+		cd->tur_changed = true;
+	}
+
+	if (cd->ignore_get_event)
+		return events;
+
+	/* check for spurious changed events from GET_EVENT */
+	if (!cd->tur_changed) {
+		if (cd->get_event_changed) {
+			if (cd->tur_mismatch++ > 8) {
+				sdev_printk(KERN_WARNING, cd->device,
+					    "GET_EVENT and TUR disagree continuously, suppress GET_EVENT events\n");
+				cd->ignore_get_event = true;
+			}
+		} else {
+			cd->tur_mismatch = 0;
+		}
 	}
+	cd->tur_changed = false;
+	cd->get_event_changed = false;
 
 	return events;
 }
Index: work/drivers/scsi/sr.h
=================================--- work.orig/drivers/scsi/sr.h
+++ work/drivers/scsi/sr.h
@@ -41,6 +41,13 @@ typedef struct scsi_cd {
 	unsigned readcd_known:1;	/* drive supports READ_CD (0xbe) */
 	unsigned readcd_cdda:1;	/* reading audio data using READ_CD */
 	unsigned media_present:1;	/* media is present */
+
+	/* GET_EVENT spurious event handling, block guarantees exclusion */
+	int tur_mismatch;		/* nr of get_event TUR mismatches */
+	bool tur_changed:1;		/* changed according to TUR */
+	bool get_event_changed:1;	/* changed according to GET_EVENT */
+	bool ignore_get_event:1;	/* get_event is unreliable, use TUR */
+
 	struct cdrom_device_info cdi;
 	/* We hold gendisk and scsi_device references on probe and use
 	 * the refs on this kref to decide when to release them */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (19 preceding siblings ...)
  2011-06-28 15:53 ` Tejun Heo
@ 2011-06-28 15:57 ` Tejun Heo
  2011-06-28 22:20 ` Kay Sievers
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2011-06-28 15:57 UTC (permalink / raw)
  To: linux-hotplug

And here's a patch which triggers clearing check on close.  It's on
top of the following branch + the previous patch.

  git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git for-linus

With the patch applied, eject triggers MEDIA_CHANGE if the drive had a
media in it.

Thanks.

---
 block/genhd.c         |   22 ++++++++++++----------
 fs/block_dev.c        |   17 ++++++++---------
 include/linux/genhd.h |    2 +-
 3 files changed, 21 insertions(+), 20 deletions(-)

Index: work/block/genhd.c
=================================--- work.orig/block/genhd.c
+++ work/block/genhd.c
@@ -1500,30 +1500,32 @@ void disk_unblock_events(struct gendisk 
 }
 
 /**
- * disk_check_events - schedule immediate event checking
- * @disk: disk to check events for
+ * disk_flush_events - schedule immediate event checking and flushing
+ * @disk: disk to check and flush events for
+ * @mask: events to flush
  *
- * Schedule immediate event checking on @disk if not blocked.
+ * Schedule immediate event checking on @disk if not blocked.  Events in
+ * @mask are scheduled to be cleared from the driver.  Note that this
+ * doesn't clear the events from @disk->ev.
  *
  * CONTEXT:
- * Don't care.  Safe to call from irq context.
+ * If @mask is non-zero must be called with bdev->bd_mutex held.
  */
-void disk_check_events(struct gendisk *disk)
+void disk_flush_events(struct gendisk *disk, unsigned int mask)
 {
 	struct disk_events *ev = disk->ev;
-	unsigned long flags;
 
 	if (!ev)
 		return;
 
-	spin_lock_irqsave(&ev->lock, flags);
+	spin_lock_irq(&ev->lock);
+	ev->clearing |= mask;
 	if (!ev->block) {
 		cancel_delayed_work(&ev->dwork);
 		queue_delayed_work(system_nrt_wq, &ev->dwork, 0);
 	}
-	spin_unlock_irqrestore(&ev->lock, flags);
+	spin_unlock_irq(&ev->lock);
 }
-EXPORT_SYMBOL_GPL(disk_check_events);
 
 /**
  * disk_clear_events - synchronously check, clear and return pending events
@@ -1713,7 +1715,7 @@ static int disk_events_set_dfl_poll_msec
 	mutex_lock(&disk_events_mutex);
 
 	list_for_each_entry(ev, &disk_events, node)
-		disk_check_events(ev->disk);
+		disk_flush_events(ev->disk, 0);
 
 	mutex_unlock(&disk_events_mutex);
 
Index: work/fs/block_dev.c
=================================--- work.orig/fs/block_dev.c
+++ work/fs/block_dev.c
@@ -1447,6 +1447,8 @@ static int __blkdev_put(struct block_dev
 
 int blkdev_put(struct block_device *bdev, fmode_t mode)
 {
+	mutex_lock(&bdev->bd_mutex);
+
 	if (mode & FMODE_EXCL) {
 		bool bdev_free;
 
@@ -1455,7 +1457,6 @@ int blkdev_put(struct block_device *bdev
 		 * are protected with bdev_lock.  bd_mutex is to
 		 * synchronize disk_holder unlinking.
 		 */
-		mutex_lock(&bdev->bd_mutex);
 		spin_lock(&bdev_lock);
 
 		WARN_ON_ONCE(--bdev->bd_holders < 0);
@@ -1473,17 +1474,15 @@ int blkdev_put(struct block_device *bdev
 		 * If this was the last claim, remove holder link and
 		 * unblock evpoll if it was a write holder.
 		 */
-		if (bdev_free) {
-			if (bdev->bd_write_holder) {
-				disk_unblock_events(bdev->bd_disk);
-				disk_check_events(bdev->bd_disk);
-				bdev->bd_write_holder = false;
-			}
+		if (bdev_free && bdev->bd_write_holder) {
+			disk_unblock_events(bdev->bd_disk);
+			bdev->bd_write_holder = false;
 		}
-
-		mutex_unlock(&bdev->bd_mutex);
 	}
 
+	disk_flush_events(bdev->bd_disk, DISK_EVENT_MEDIA_CHANGE);
+	mutex_unlock(&bdev->bd_mutex);
+
 	return __blkdev_put(bdev, mode, 0);
 }
 EXPORT_SYMBOL(blkdev_put);
Index: work/include/linux/genhd.h
=================================--- work.orig/include/linux/genhd.h
+++ work/include/linux/genhd.h
@@ -420,7 +420,7 @@ static inline int get_disk_ro(struct gen
 
 extern void disk_block_events(struct gendisk *disk);
 extern void disk_unblock_events(struct gendisk *disk);
-extern void disk_check_events(struct gendisk *disk);
+extern void disk_flush_events(struct gendisk *disk, unsigned int mask);
 extern unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask);
 
 /* drivers/char/random.c */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (20 preceding siblings ...)
  2011-06-28 15:57 ` Tejun Heo
@ 2011-06-28 22:20 ` Kay Sievers
  2011-06-28 22:21 ` Kay Sievers
  2011-06-29 20:54 ` Markus Rathgeb
  23 siblings, 0 replies; 25+ messages in thread
From: Kay Sievers @ 2011-06-28 22:20 UTC (permalink / raw)
  To: linux-hotplug

On Tue, Jun 28, 2011 at 17:53, Tejun Heo <htejun@gmail.com> wrote:
> Please test this one.  The previous one was broken and I also updated
> it such that tur_mismatch is cleared iff !tur_changed &&
> !get_event_changed.

Looks good. Works fine for me.

The firmware in this SANDISK device is so completely broken, that it's
not funny.

Thanks,
Kay

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (21 preceding siblings ...)
  2011-06-28 22:20 ` Kay Sievers
@ 2011-06-28 22:21 ` Kay Sievers
  2011-06-29 20:54 ` Markus Rathgeb
  23 siblings, 0 replies; 25+ messages in thread
From: Kay Sievers @ 2011-06-28 22:21 UTC (permalink / raw)
  To: linux-hotplug

On Tue, Jun 28, 2011 at 17:57, Tejun Heo <htejun@gmail.com> wrote:
> And here's a patch which triggers clearing check on close.  It's on
> top of the following branch + the previous patch.
>
>  git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git for-linus
>
> With the patch applied, eject triggers MEDIA_CHANGE if the drive had a
> media in it.

Seems to works fine for me.

I think that was the last missing piece for the userspace switch-over.

Thanks,
Kay

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: udev queue error
  2011-05-30 19:46 udev queue error Markus Rathgeb
                   ` (22 preceding siblings ...)
  2011-06-28 22:21 ` Kay Sievers
@ 2011-06-29 20:54 ` Markus Rathgeb
  23 siblings, 0 replies; 25+ messages in thread
From: Markus Rathgeb @ 2011-06-29 20:54 UTC (permalink / raw)
  To: linux-hotplug

Thanks a lot!
I have to do a deeper look to udev (userspace and kernelspace) to
understand what is going on and what your patch change.
Good work...

2011/6/29 Kay Sievers <kay.sievers@vrfy.org>:
> On Tue, Jun 28, 2011 at 17:57, Tejun Heo <htejun@gmail.com> wrote:
>> And here's a patch which triggers clearing check on close.  It's on
>> top of the following branch + the previous patch.
>>
>>  git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git for-linus
>>
>> With the patch applied, eject triggers MEDIA_CHANGE if the drive had a
>> media in it.
>
> Seems to works fine for me.
>
> I think that was the last missing piece for the userspace switch-over.
>
> Thanks,
> Kay
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2011-06-29 20:54 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-30 19:46 udev queue error Markus Rathgeb
2011-05-30 20:06 ` Kay Sievers
2011-05-31 15:31 ` Tejun Heo
2011-05-31 15:34 ` Markus Rathgeb
2011-05-31 15:40 ` Kay Sievers
2011-05-31 15:46 ` Kay Sievers
2011-06-01  1:39 ` Tejun Heo
2011-06-01  2:10 ` Kay Sievers
2011-06-01  4:42 ` Tejun Heo
2011-06-08 11:55 ` Kay Sievers
2011-06-09 11:54 ` Tejun Heo
2011-06-09 12:36 ` Kay Sievers
2011-06-12 12:28 ` Tejun Heo
2011-06-13 19:02 ` Kay Sievers
2011-06-14 20:24 ` Kay Sievers
2011-06-15  7:13 ` Tejun Heo
2011-06-15  9:34 ` Kay Sievers
2011-06-15  9:40 ` Tejun Heo
2011-06-18 21:41 ` Kay Sievers
2011-06-28 13:45 ` Tejun Heo
2011-06-28 15:53 ` Tejun Heo
2011-06-28 15:57 ` Tejun Heo
2011-06-28 22:20 ` Kay Sievers
2011-06-28 22:21 ` Kay Sievers
2011-06-29 20:54 ` Markus Rathgeb

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.