data corruption: revalidating a (removable) hdd/flash on re-insert

linux-kernel.vger.kernel.org archive mirror

* data corruption: revalidating a (removable) hdd/flash on re-insert
@ 2008-10-31 15:38 Michael Tokarev
  2008-10-31 15:59 ` Lennart Sorensen
  2008-10-31 16:10 ` Kay Sievers
  0 siblings, 2 replies; 14+ messages in thread
From: Michael Tokarev @ 2008-10-31 15:38 UTC
  To: Kernel Mailing List

To make a long story short: is there a way to force the kernel
to re-validate a replaced usb-connected hard drive (or a
flash) *automatically*?

Because right now, the kernel does not see that the drive
has been replaced, and uses *some* old cached values, which
results in random data corruption here and there, and other
similar odd things.

For example, I have a USB flash reader (Carry Computer Eng.,
Co., Ltd 6-in-1 Card Reader, but that's not really relevant).
Among other things it has a compact flash slot.  And I have
two CF cards of different sizes.

So I turn the machine on, insert one CF card, mount it, do
something with its vfat filesystem, umount it and remove
it.  Next I insert another card, and when I try to mount
it, the kernel says:

FAT: invalid media value (0x1e)
VFS: Can't find a valid FAT filesystem on dev sdb1.

When I run cfdisk, for example, it complains that
"partition 0 ends after end of the drive" (or something
similar).

Sometimes the mount succeeds, but there are all sorts of
errors here and there: the filesystem is messed up, with
parts of the files "from" the just-removed card, and parts
from the new card.

And sure enough, when I forget about the issue and try
to WRITE something to the card, it becomes almost 100%
messy...

What helps is to run, for example,

  blockdev --rereadpt /dev/sdb

manually, after replacing the card.
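The manual step above can be wrapped in a small shell helper. This is a minimal sketch, assuming a POSIX shell, awk and the util-linux `blockdev` tool. The names `dev_is_mounted` and `reread_if_idle` are hypothetical, and the partition check assumes `sdX`-style names (`/dev/sdb`, `/dev/sdb1`). The mount check is there because the kernel refuses to re-read the partition table (the BLKRRPART ioctl behind `--rereadpt` fails with EBUSY) while a partition of the disk is in use.

```shell
#!/bin/sh
# Hypothetical helpers, not part of util-linux or the kernel.

# dev_is_mounted DISK [MOUNTS_FILE]
# Succeeds if DISK itself or one of its sdX-style partitions (DISK followed
# by digits only, e.g. /dev/sdb1 for /dev/sdb) is a mount source in
# MOUNTS_FILE (default: /proc/mounts).
dev_is_mounted() {
    disk=$1
    mounts=${2:-/proc/mounts}
    awk -v d="$disk" '
        # Field 1 of each mounts line is the mounted device.
        $1 == d { found = 1 }
        index($1, d) == 1 && substr($1, length(d) + 1) ~ /^[0-9]+$/ { found = 1 }
        END { exit !found }
    ' "$mounts"
}

# reread_if_idle DISK
# Re-reads DISK's partition table after a media swap, but only if nothing
# on DISK is mounted (BLKRRPART would fail with EBUSY otherwise).
reread_if_idle() {
    if dev_is_mounted "$1"; then
        echo "$1 is still mounted; not re-reading" >&2
        return 1
    fi
    blockdev --rereadpt "$1"
}
```

Run it as root after swapping the card, e.g. `reread_if_idle /dev/sdb`.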

The first thing I thought of when I saw the whole mess
was: there must be some process like hald/udev/whatever
messy subsystem du jour, which holds the device node open
and prevents the kernel from re-reading the drive on its
own as it correctly did in the past.  Nope, I've shut down
everything, and the same happens when only a shell process
is running INSTEAD of init (booting with the init=/bin/sh option).

So at some point the kernel stopped noticing the drive
change in this configuration.  I can't say
when exactly, since I didn't use the card reader for over
a year, and certainly didn't try it with more than one
card in a row for even longer.  It worked in the
past, that's for sure.  And it definitely does not work
(resulting in the above mess) with 2.6.25, 2.6.26 and 2.6.27.

Help?

Thanks!

/mjt

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: data corruption: revalidating a (removable) hdd/flash on re-insert
  2008-10-31 15:38 data corruption: revalidating a (removable) hdd/flash on re-insert Michael Tokarev
@ 2008-10-31 15:59 ` Lennart Sorensen
  2008-10-31 16:10   ` Michael Tokarev
  2008-10-31 16:10 ` Kay Sievers
  1 sibling, 1 reply; 14+ messages in thread
From: Lennart Sorensen @ 2008-10-31 15:59 UTC
  To: Michael Tokarev; +Cc: Kernel Mailing List

On Fri, Oct 31, 2008 at 06:38:01PM +0300, Michael Tokarev wrote:
> To make a long story short: is there a way to force the kernel
> to re-validate a replaced usb-connected hard drive (or a
> flash) *automatically*?
> 
> Because right now, the kernel does not see that the drive
> has been replaced, and uses *some* old cached values, which
> results in random data corruption here and there, and other
> similar odd things.
> 
> For example, I have a USB flash reader (Carry Computer Eng.,
> Co., Ltd 6-in-1 Card Reader, but that's not really relevant).
> Among other things it has a compact flash slot.  And I have
> two CF cards of different sizes.
> 
> So I turn the machine on, insert one CF card, mount it, do
> something with its vfat filesystem, umount it and remove
> it.  Next I insert another card, and when I try to mount
> it, the kernel says:
> 
> FAT: invalid media value (0x1e)
> VFS: Can't find a valid FAT filesystem on dev sdb1.
> 
> When I run cfdisk, for example, it complains that
> "partition 0 ends after end of the drive" (or something
> similar).
> 
> Sometimes the mount succeeds, but there are all sorts of
> errors here and there: the filesystem is messed up, with
> parts of the files "from" the just-removed card, and parts
> from the new card.
> 
> And sure enough, when I forget about the issue and try
> to WRITE something to the card, it becomes almost 100%
> messy...
> 
> What helps is to run, for example,
> 
>   blockdev --rereadpt /dev/sdb
> 
> manually, after replacing the card.
> 
> The first thing I thought of when I saw the whole mess
> was: there must be some process like hald/udev/whatever
> messy subsystem du jour, which holds the device node open
> and prevents the kernel from re-reading the drive on its
> own as it correctly did in the past.  Nope, I've shut down
> everything, and the same happens when only a shell process
> is running INSTEAD of init (booting with the init=/bin/sh option).
> 
> So at some point the kernel stopped noticing the drive
> change in this configuration.  I can't say
> when exactly, since I didn't use the card reader for over
> a year, and certainly didn't try it with more than one
> card in a row for even longer.  It worked in the
> past, that's for sure.  And it definitely does not work
> (resulting in the above mess) with 2.6.25, 2.6.26 and 2.6.27.

I have had this happen with a few usb flash card readers.  My solution
was to unplug the usb cable then swap the card and connect the usb cable
again.  In the end I went and bought a different card reader, which does
work correctly.

I strongly suspect it is a mistake in the hardware causing the problem,
given that the vast majority of readers do work correctly already.

So far I have had no problems with a Silverstone, Mitsumi, Dell (in a
monitor), or SanDisk.  I have had a problem with a cheap no-name "15 in
1" reader, which I stopped using because plugging and unplugging got
annoying.

So my recommendation is to get a non-broken device.

-- 
Len Sorensen


* Re: data corruption: revalidating a (removable) hdd/flash on re-insert
  2008-10-31 15:59 ` Lennart Sorensen
@ 2008-10-31 16:10   ` Michael Tokarev
  2008-10-31 18:28     ` Lennart Sorensen
  0 siblings, 1 reply; 14+ messages in thread
From: Michael Tokarev @ 2008-10-31 16:10 UTC
  To: Lennart Sorensen; +Cc: Kernel Mailing List

Lennart Sorensen wrote:
> On Fri, Oct 31, 2008 at 06:38:01PM +0300, Michael Tokarev wrote:
>> To make a long story short: is there a way to force the kernel
>> to re-validate a replaced usb-connected hard drive (or a
>> flash) *automatically*?
>>
>> Because right now, the kernel does not see that the drive
>> has been replaced, and uses *some* old cached values, which
>> results in random data corruption here and there, and other
>> similar odd things.
>>
>> For example, I have a USB flash reader (Carry Computer Eng.,
>> Co., Ltd 6-in-1 Card Reader, but that's not really relevant).
>> Among other things it has a compact flash slot.  And I have
>> two CF cards of different sizes.
[]
>> So at some point the kernel stopped noticing the drive
>> change in this configuration.  I can't say
>> when exactly, since I didn't use the card reader for over
>> a year, and certainly didn't try it with more than one
>> card in a row for even longer.  It worked in the
>> past, that's for sure.  And it definitely does not work
>> (resulting in the above mess) with 2.6.25, 2.6.26 and 2.6.27.
> 
> I have had this happen with a few usb flash card readers.  My solution
> was to unplug the usb cable then swap the card and connect the usb cable
> again.  In the end I went and bought a different card reader, which does
> work correctly.

Well, this one is an internal reader, which plugs into a 3" slot.
AND it also has a regular floppy drive in it. It's a combo,
a floppy drive AND a USB flash reader.  As such, I can't easily
re-plug it (which definitely helps, too, but for that to work I
have to open the case), and I can't replace it either, because
this device is almost unique: I still need a floppy and there are
no other such combo drives, at least none that I was able to find.
It's a great device if you think about it: it connects two epochs
together...

> I strongly suspect it is a mistake in the hardware causing the problem,
> given that the vast majority of readers do work correctly already.

So at least a) I'm not alone, and b) there's SOMETHING that works.
Excellent!

> So far I have had no problems with a Silverstone, Mitsumi, Dell (in a
> monitor), or SanDisk.  I have had a problem with a cheap no-name "15 in
> 1" reader, which I stopped using because plugging and unplugging got
> annoying.
> 
> So my recommendation is to get a non-broken device.

The thing is that with some older kernel(s) it definitely worked.
So I'd say it's the kernel which broke/regressed, not the hardware.
Suggesting to fix the hardware because a new kernel no longer works
with it is.. strange, to say the least.

And yes, it was definitely an el cheapo no-name thing.  But a..
great (epochs!) and hence unique thing, see above.. ;)

I'll try to find out when it broke.  My first suspect was the patch
introduced not-so-recently (in 2.6.2x series) to support media
change notifications done by some hardware (wait for notify instead
of constantly polling).

Thanks!

/mjt


* Re: data corruption: revalidating a (removable) hdd/flash on re-insert
  2008-10-31 15:38 data corruption: revalidating a (removable) hdd/flash on re-insert Michael Tokarev
  2008-10-31 15:59 ` Lennart Sorensen
@ 2008-10-31 16:10 ` Kay Sievers
  2008-10-31 17:39   ` Michael Tokarev
  2008-11-04 19:57   ` Pavel Machek
  1 sibling, 2 replies; 14+ messages in thread
From: Kay Sievers @ 2008-10-31 16:10 UTC
  To: Michael Tokarev; +Cc: Kernel Mailing List

On Fri, Oct 31, 2008 at 16:38, Michael Tokarev <mjt@tls.msk.ru> wrote:
> To make a long story short: is there a way to force the kernel
> to re-validate a replaced usb-connected hard drive (or a
> flash) *automatically*?
>
> Because right now, the kernel does not see that the drive
> has been replaced, and uses *some* old cached values, which
> results in random data corruption here and there, and other
> similar odd things.
>
> For example, I have a USB flash reader (Carry Computer Eng.,
> Co., Ltd 6-in-1 Card Reader, but that's not really relevant).
> Among other things it has a compact flash slot.  And I have
> two CF cards of different sizes.
>
> So I turn the machine on, insert one CF card, mount it, do
> something with its vfat filesystem, umount it and remove
> it.  Next I insert another card, and when I try to mount
> it, the kernel says:
>
> FAT: invalid media value (0x1e)
> VFS: Can't find a valid FAT filesystem on dev sdb1.
>
> When I run cfdisk, for example, it complains that
> "partition 0 ends after end of the drive" (or something
> similar).
>
> Sometimes the mount succeeds, but there are all sorts of
> errors here and there: the filesystem is messed up, with
> parts of the files "from" the just-removed card, and parts
> from the new card.
>
> And sure enough, when I forget about the issue and try
> to WRITE something to the card, it becomes almost 100%
> messy...
>
> What helps is to run, for example,
>
>  blockdev --rereadpt /dev/sdb
>
> manually, after replacing the card.
>
> The first thing I thought of when I saw the whole mess
> was: there must be some process like hald/udev/whatever
> messy subsystem du jour, which holds the device node open
> and prevents the kernel from re-reading the drive on its
> own as it correctly did in the past.  Nope, I've shut down
> everything, and the same happens when only a shell process
> is running INSTEAD of init (booting with the init=/bin/sh option).
>
> So at some point the kernel stopped noticing the drive
> change in this configuration.  I can't say
> when exactly, since I didn't use the card reader for over
> a year, and certainly didn't try it with more than one
> card in a row for even longer.  It worked in the
> past, that's for sure.  And it definitely does not work
> (resulting in the above mess) with 2.6.25, 2.6.26 and 2.6.27.

Maybe your card reader is broken. I can not reproduce this with any of
the many readers I have. Usually a media change results in media
revalidation with the next access to the device. You can easily
reproduce that:

Insert the media, and force a validation:
  $ touch /dev/sdb

Start logging the kernel uevents to the console:
  $ udevadm monitor --kernel &

Access the device:
  $ touch /dev/sdb

Nothing should happen, as the reader/kernel knows it is still valid.

Now remove the media and insert it immediately again.

Access the device:
  $ touch /dev/sdb
  UEVENT[1225468868.803950] change /devices/pci0000:00/0000:00:1d.7/usb5/5-2/5-2:1.0/host8/target8:0:0/8:0:0:0 (scsi)

and you see that the reader told the kernel (SCSI unit attention) to
revalidate the device.

These events happen only when the device is accessed. That's why
distros poll removable devices for media changes.

Every access to removable media is guarded by this revalidation check.
If you don't see these events, you should not trust this reader, and
at least never change the media while it is connected.
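The check described above can be scripted. This is a minimal sketch, assuming a POSIX shell with awk. Capture the `udevadm monitor --kernel` output to a file while you remove and re-insert the card and `touch` the device, then count the SCSI `change` events. The helper name `count_scsi_changes` is hypothetical. It matches the `UEVENT[...]` line format shown above and also, as an assumption about newer udevadm versions, a `KERNEL[...]` prefix. A count of zero after a card swap suggests the reader is not reporting the media change.

```shell
#!/bin/sh
# count_scsi_changes: read `udevadm monitor --kernel` output on stdin and
# print the number of "change" events reported for SCSI devices.
# Hypothetical helper. It accepts the UEVENT[...] prefix shown in this
# thread and, as an assumption about newer udevadm, a KERNEL[...] prefix.
count_scsi_changes() {
    awk '
        $1 ~ /^(UEVENT|KERNEL)\[/ && $2 == "change" && $NF == "(scsi)" { n++ }
        END { print n + 0 }
    '
}
```

For example: start `udevadm monitor --kernel > /tmp/uevents &`, swap the card, run `touch /dev/sdb`, stop the monitor, then run `count_scsi_changes < /tmp/uevents`.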

Kay


* Re: data corruption: revalidating a (removable) hdd/flash on re-insert
  2008-10-31 16:10 ` Kay Sievers
@ 2008-10-31 17:39   ` Michael Tokarev
  2008-10-31 18:49     ` Kay Sievers
  2008-11-04 19:57   ` Pavel Machek
  1 sibling, 1 reply; 14+ messages in thread
From: Michael Tokarev @ 2008-10-31 17:39 UTC
  To: Kay Sievers; +Cc: Kernel Mailing List

Kay Sievers wrote:
> On Fri, Oct 31, 2008 at 16:38, Michael Tokarev <mjt@tls.msk.ru> wrote:
>> To make a long story short: is there a way to force the kernel
>> to re-validate a replaced usb-connected hard drive (or a
>> flash) *automatically*?
[]
> Insert the media, and force a validation:
>   $ touch /dev/sdb

With a newly inserted flash card (I removed some irrelevant stuff):

DEVTYPE=disk SUBSYSTEM=block MINOR=16 ACTION=change MAJOR=8
DEVTYPE=partition SUBSYSTEM=block MINOR=17 ACTION=add MAJOR=8
DEVTYPE=scsi_device SUBSYSTEM=scsi DRIVER=sd SDEV_MEDIA_CHANGE=1 ACTION=change
DEVTYPE=disk SUBSYSTEM=block MINOR=16 ACTION=change MAJOR=8

> Access the device:
>   $ touch /dev/sdb
> 
> Nothing should happen, as the reader/kernel knows it is still valid.

Yes, nothing happens.

> Now remove the media and insert it immediately again.
> 
> Access the device:
>   $ touch /dev/sdb
>   UEVENT[1225468868.803950] change /devices/pci0000:00/0000:00:1d.7/usb5/5-2/5-2:1.0/host8/target8:0:0/8:0:0:0 (scsi)

> and you see that the reader told the kernel (SCSI unit attention) to
> revalidate the device.

Ok. So in my case, nothing happens here, just as
if it had not been removed/inserted.

I replaced the card with another one, and nothing
happened as well.

Only when I touch it after REMOVING the flash card do I see:

DEVTYPE=scsi_device SUBSYSTEM=scsi DRIVER=sd SDEV_MEDIA_CHANGE=1 ACTION=change
DEVTYPE=partition SUBSYSTEM=block MINOR=17 ACTION=remove MAJOR=8
DEVTYPE=disk SUBSYSTEM=block MINOR=16 ACTION=change MAJOR=8
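A small awk filter can tally the relevant properties in traces like these, which makes them quicker to compare. This is a sketch, assuming one event per line in the trimmed `KEY=VALUE` form shown above. `summarize_uevents` is a hypothetical name. For the insertion trace it reports one media change and one partition `add`. For the removal trace it reports one media change and one partition `remove`.

```shell
#!/bin/sh
# summarize_uevents: read uevent property lines (one event per line, in
# the trimmed KEY=VALUE form shown above) on stdin and print how many
# media-change notifications and partition add/remove events they contain.
# Hypothetical helper.
summarize_uevents() {
    awk '
        {
            media = 0; devtype = ""; action = ""
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "SDEV_MEDIA_CHANGE" && kv[2] == "1") media = 1
                else if (kv[1] == "DEVTYPE") devtype = kv[2]
                else if (kv[1] == "ACTION") action = kv[2]
            }
            if (media) mc++
            if (devtype == "partition" && action == "add") padd++
            if (devtype == "partition" && action == "remove") prm++
        }
        END { printf "media_change=%d part_add=%d part_remove=%d\n", mc, padd, prm }
    '
}
```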

> Every access to removable media is guarded by this revalidation check.
> If you don't see these events, you should not trust this reader, and
> at least never change the media while it is connected.

Ok.  So.. 3 questions.

1) How did it work before (I have yet to find which kernel worked)?
   I can only guess that some older kernel never cached the
   "validity".

2) Windows notices the insertions/removals just fine.  Again I
   can only guess that it constantly polls for changes.

3) And the most important one.  I think there should be a
   way to stop "caching" of the media information, i.e. to force
   revalidation events on EVERY access, for certain hardware at
   least.  Because corruption in such cases is much worse than
   any positive effects of caching etc... Maybe some unusual_devs.h
   way or somesuch?..

Now I see the device is somewhat(?) broken.  But as I said before
in another email, it's a great device (as in, two epochs connected
to each other), and it'd be sad to lose it.  Nostalgia, sort of.. ;)

Ok, maybe actually polling for devices sometimes makes sense... ;)
And there can be a workaround, using a tiny daemon that constantly
accesses the device, in order to catch removals...  Oh well.

/mjt


* Re: data corruption: revalidating a (removable) hdd/flash on re-insert
  2008-10-31 16:10   ` Michael Tokarev
@ 2008-10-31 18:28     ` Lennart Sorensen
  0 siblings, 0 replies; 14+ messages in thread
From: Lennart Sorensen @ 2008-10-31 18:28 UTC
  To: Michael Tokarev; +Cc: Kernel Mailing List

On Fri, Oct 31, 2008 at 07:10:07PM +0300, Michael Tokarev wrote:
> Well, this one is an internal reader, which plugs into a 3" slot.
> AND it also has a regular floppy drive in it. It's a combo,
> a floppy drive AND a USB flash reader.  As such, I can't easily
> re-plug it (which definitely helps, too, but for that to work I
> have to open the case), and I can't replace it either, because
> this device is almost unique: I still need a floppy and there are
> no other such combo drives, at least none that I was able to find.
> It's a great device if you think about it: it connects two epochs
> together...

Yes, I use exactly that kind of combo, made by Mitsumi.  I have never had
a problem with the Mitsumi one.  I haven't seen them around lately though,
so they may have been discontinued.

It is shown here though:
http://www.mitsumi.com/products/FA404.htm

I have the one on the right with the floppy at the bottom.

> So at least a) I'm not alone, and b) there's SOMETHING that works.
> Excellent!

That's right.

> The thing is that with some older kernel(s) it definitely worked.
> So I'd say it's the kernel which broke/regressed, not the hardware.
> Suggesting to fix the hardware because a new kernel no longer works
> with it is.. strange, to say the least.

Well, the Mitsumi I have has worked with every kernel I have ever had
(2.6.26 at the moment).  The no-name reader certainly didn't work right
with early 2.6 kernels, and I haven't bothered to check it since.

> And yes, it was definitely an el cheapo no-name thing.  But a..
> great (epochs!) and hence unique thing, see above.. ;)

I think I paid $27 (Canadian) for my Mitsumi.  I paid $20 for the no-name
thing (which has no floppy and is external).

> I'll try to find out when it broke.  My first suspect was the patch
> introduced not-so-recently (in 2.6.2x series) to support media
> change notifications done by some hardware (wait for notify instead
> of constantly polling).

Perhaps.  At least you have a simple test case, so a bisect shouldn't be
too hard to do.

-- 
Len Sorensen


* Re: data corruption: revalidating a (removable) hdd/flash on re-insert
  2008-10-31 17:39   ` Michael Tokarev
@ 2008-10-31 18:49     ` Kay Sievers
  0 siblings, 0 replies; 14+ messages in thread
From: Kay Sievers @ 2008-10-31 18:49 UTC
  To: Michael Tokarev; +Cc: Kernel Mailing List

On Fri, Oct 31, 2008 at 18:39, Michael Tokarev <mjt@tls.msk.ru> wrote:
> Kay Sievers wrote:
>>
>> On Fri, Oct 31, 2008 at 16:38, Michael Tokarev <mjt@tls.msk.ru> wrote:
>>>
>>> To make a long story short: is there a way to force the kernel
>>> to re-validate a replaced usb-connected hard drive (or a
>>> flash) *automatically*?
>
>> Insert the media, and force a validation:
>>  $ touch /dev/sdb
>
> With a newly inserted flash card (I removed some irrelevant stuff):
>
> DEVTYPE=disk SUBSYSTEM=block MINOR=16 ACTION=change MAJOR=8
> DEVTYPE=partition SUBSYSTEM=block MINOR=17 ACTION=add MAJOR=8
> DEVTYPE=scsi_device SUBSYSTEM=scsi DRIVER=sd SDEV_MEDIA_CHANGE=1 ACTION=change
> DEVTYPE=disk SUBSYSTEM=block MINOR=16 ACTION=change MAJOR=8
>
>> Access the device:
>>  $ touch /dev/sdb
>>
>> Nothing should happen, as the reader/kernel knows it is still valid.
>
> Yes, nothing happens.
>
>> Now remove the media and insert it immediately again.
>>
>> Access the device:
>>  $ touch /dev/sdb
>>  UEVENT[1225468868.803950] change /devices/pci0000:00/0000:00:1d.7/usb5/5-2/5-2:1.0/host8/target8:0:0/8:0:0:0 (scsi)
>
>> and you see that the reader told the kernel (SCSI unit attention) to
>> revalidate the device.
>
> Ok. So in my case, nothing happens here, just as
> if it had not been removed/inserted.
>
> I replaced the card with another one, and nothing
> happened as well.
>
> Only when I touch it after REMOVING the flash card do I see:
>
> DEVTYPE=scsi_device SUBSYSTEM=scsi DRIVER=sd SDEV_MEDIA_CHANGE=1 ACTION=change
> DEVTYPE=partition SUBSYSTEM=block MINOR=17 ACTION=remove MAJOR=8
> DEVTYPE=disk SUBSYSTEM=block MINOR=16 ACTION=change MAJOR=8
>
>> Every access to removable media is guarded by this revalidation check.
>> If you don't see these events, you should not trust this reader, and
>> at least never change the media while it is connected.
>
> Ok.  So.. 3 questions.
>
> 1) How did it work before (I have yet to find which kernel worked)?
>  I can only guess that some older kernel never cached the
>  "validity".

The kernel does not cache anything; it's the device itself that reports a
media change, and the kernel asks every removable device before
accessing it.

> 2) Windows notices the insertions/removals just fine.  Again I
>  can only guess that it constantly polls for changes.

It polls the device every few seconds, just like HAL does on most
Linux desktop installations. But in your case, when the reader does
not report the change correctly, even that might go wrong, just like
without polling.

> 3) And the most important one.  I think there should be a
>  way to stop "caching" of the media information, i.e. to force
>  revalidation events on EVERY access, for certain hardware at
>  least.

That's how it already is. We just rely on the device to tell us. There
is no way to revalidate anything otherwise.

> Because corruption in such cases is much worse than
>  any positive effects of caching etc... Maybe some unusual_devs.h
>  way or somesuch?..

I can't think of a way to make that work; there is no cache in the
kernel, only a state in the device. You would need to checksum the
device to find out that it isn't the same media, and even that might not
work; it's definitely not something you want to do.

You could trace with usbmon or something similar, and check in the SCSI
packets whether the SCSI unit attention really fails to signal a media
change, which is what I expect.
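For the usbmon route, the text interface is per USB bus, and the bus number can be read off the device path in the uevent (`usb5` above). This is a sketch, assuming debugfs is mounted at `/sys/kernel/debug` and the `usbmon` module is loaded. `usbmon_file` is a hypothetical helper, and it only handles paths that contain a `/usbN/` component.

```shell
#!/bin/sh
# usbmon_file DEVPATH
# Map a sysfs device path such as
#   /devices/pci0000:00/0000:00:1d.7/usb5/5-2/...
# to the usbmon text file for that bus (here .../usbmon/5u).
# Hypothetical helper; assumes debugfs at /sys/kernel/debug and usbmon loaded.
usbmon_file() {
    # Extract N from the "/usbN/" path component; empty if there is none.
    bus=$(printf '%s\n' "$1" | sed -n 's|.*/usb\([0-9][0-9]*\)/.*|\1|p')
    [ -n "$bus" ] || return 1
    printf '/sys/kernel/debug/usb/usbmon/%su\n' "$bus"
}
```

As root, you could run `cat` on the resulting file into a trace file while swapping the card and touching the device, then inspect the captured SCSI traffic.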

> Now I see the device is somewhat(?) broken.  But as I said before
> in another email, it's a great device (as in, two epochs connected
> to each other), and it'd be sad to lose it.  Nostalgia, sort of.. ;)

If it works otherwise, use it, but I wouldn't change media while it is
connected.

> Ok, maybe actually polling for devices sometimes makes sense... ;)
> And there can be a workaround, using a tiny daemon that constantly
> accesses the device, in order to catch removals...  Oh well.

That would still not work with your device if you change the media
during the polling interval, which is usually between 2 and 16
seconds. If the device does not report any change, as seems to be the
case with yours, you cannot do anything. Polling only helps to reflect
the current state of a device in the kernel while the device is not
being accessed, for example on a desktop, where you want to auto-mount
card reader/CD-ROM media on insertion.

Kay


* Re: data corruption: revalidating a (removable) hdd/flash on re-insert
  2008-10-31 16:10 ` Kay Sievers
  2008-10-31 17:39   ` Michael Tokarev
@ 2008-11-04 19:57   ` Pavel Machek
  2008-11-04 20:13     ` Kay Sievers
  1 sibling, 1 reply; 14+ messages in thread
From: Pavel Machek @ 2008-11-04 19:57 UTC
  To: Kay Sievers; +Cc: Michael Tokarev, Kernel Mailing List

On Fri 2008-10-31 17:10:26, Kay Sievers wrote:
> On Fri, Oct 31, 2008 at 16:38, Michael Tokarev <mjt@tls.msk.ru> wrote:
> > To make a long story short: is there a way to force the kernel
> > to re-validate a replaced usb-connected hard drive (or a
> > flash) *automatically*?
> >
> > Because right now, the kernel does not see that the drive
> > has been replaced, and uses *some* old cached values, which
> > results in random data corruption here and there, and other
> > similar odd things.
> 
> Maybe your card reader is broken. I can not reproduce this with any of
> the many readers I have. Usually a media change results in media
> revalidation with the next access to the device. You can easily
> reproduce that:
> 
> Insert the media, and force a validation:
>   $ touch /dev/sdb
> 
> Start logging the kernel uevents to the console:
>   $ udevadm monitor --kernel &
> 
> Access the device:
>   $ touch /dev/sdb
> 
> Nothing should happen, as the reader/kernel knows it is still valid.
> 
> Now remove the media and insert it immediately again.
> 
> Access the device:
>   $ touch /dev/sdb
>   UEVENT[1225468868.803950] change /devices/pci0000:00/0000:00:1d.7/usb5/5-2/5-2:1.0/host8/target8:0:0/8:0:0:0 (scsi)
> 
> and you see that the reader told the kernel (SCSI unit attention) to
> revalidate the device.
> 
> These events happen only when the device is accessed. That's why
> distros poll removable devices for media changes.
> 
> Every access to removable media is guarded by this revalidation check.
> If you don't see these events, you should not trust this reader, and
> at least never change the media while it is connected.

This is a rather nasty data corrupter. Could we at least blacklist
the broken device, and force revalidation on each close or something like
that?


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: data corruption: revalidating a (removable) hdd/flash on re-insert
  2008-11-04 19:57   ` Pavel Machek
@ 2008-11-04 20:13     ` Kay Sievers
  2008-11-04 20:20       ` Pavel Machek
  0 siblings, 1 reply; 14+ messages in thread
From: Kay Sievers @ 2008-11-04 20:13 UTC
  To: Pavel Machek; +Cc: Michael Tokarev, Kernel Mailing List

On Tue, Nov 4, 2008 at 20:57, Pavel Machek <pavel@suse.cz> wrote:
> On Fri 2008-10-31 17:10:26, Kay Sievers wrote:
>> On Fri, Oct 31, 2008 at 16:38, Michael Tokarev <mjt@tls.msk.ru> wrote:
>> > To make a long story short: is there a way to force the kernel
>> > to re-validate a replaced usb-connected hard drive (or a
>> > flash) *automatically*?
>> >
>> > Because right now, the kernel does not see that the drive
>> > has been replaced, and uses *some* old cached values, which
>> > results in random data corruption here and there, and other
>> > similar odd things.
>>
>> Maybe your card reader is broken. I can not reproduce this with any of
>> the many readers I have. Usually a media change results in media
>> revalidation with the next access to the device. You can easily
>> reproduce that:
>>
>> Insert the media, and force a validation:
>>   $ touch /dev/sdb
>>
>> Start logging the kernel uevents to the console:
>>   $ udevadm monitor --kernel &
>>
>> Access the device:
>>   $ touch /dev/sdb
>>
>> Nothing should happen, as the reader/kernel knows it is still valid.
>>
>> Now remove the media and insert it immediately again.
>>
>> Access the device:
>>   $ touch /dev/sdb
>>   UEVENT[1225468868.803950] change /devices/pci0000:00/0000:00:1d.7/usb5/5-2/5-2:1.0/host8/target8:0:0/8:0:0:0 (scsi)
>>
>> and you see that the reader told the kernel (SCSI unit attention) to
>> revalidate the device.
>>
>> These events happen only when the device is accessed. That's why
>> distros poll removable devices for media changes.
>>
>> Every access to removable media is guarded by this revalidation check.
>> If you don't see these events, you should not trust this reader, and
>> at least never change the media while it is connected.
>
> This is a rather nasty data corrupter.

Sure, it is.

> Could we at least blacklist
> the broken device, and force revalidation on each close or something like
> that?

What's your idea of revalidation if the hardware does not tell you?
Get an md5 of the disk content? :)

Kay


* Re: data corruption: revalidating a (removable) hdd/flash on re-insert
  2008-11-04 20:13     ` Kay Sievers
@ 2008-11-04 20:20       ` Pavel Machek
  2008-11-04 21:22         ` Michael Tokarev
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Machek @ 2008-11-04 20:20 UTC
  To: Kay Sievers; +Cc: Michael Tokarev, Kernel Mailing List


> >> Every access to removable media is guarded by this revalidation check.
> >> If you don't see these events, you should not trust this reader, and
> >> at least never change the media while it is connected.
> >
> > This is a rather nasty data corrupter.
> 
> Sure, it is.
> 
> > Could we at least blacklist
> > the broken device, and force revalidation on each close or something like
> > that?
> 
> What's your idea of revalidation if the hardware does not tell you?
> Get an md5 of the disk content? :)

Well... you should not eject media while the fs is mounted or the blockdev is
open, correct?

So can we simply claim 'media changed' on last close/unmount? Sure,
sometimes media was not changed, but that only hurts performance, not
correctness... ?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: data corruption: revalidating a (removable) hdd/flash on re-insert
  2008-11-04 20:20       ` Pavel Machek
@ 2008-11-04 21:22         ` Michael Tokarev
  2008-11-04 21:28           ` Pavel Machek
  2008-11-05  0:29           ` Kay Sievers
  0 siblings, 2 replies; 14+ messages in thread
From: Michael Tokarev @ 2008-11-04 21:22 UTC
  To: Pavel Machek; +Cc: Kay Sievers, Kernel Mailing List

Pavel Machek wrote:
>>>> Every access to removable media is guarded by this revalidation check.
>>>> If you don't see these events, you should not trust this reader, and
>>>> at least never change the media while it is connected.
>>> This is a rather nasty data corrupter.
>> Sure, it is.
>>
>>> Could we at least blacklist
>>> the broken device, and force revalidation on each close or something like
>>> that?
>> What's your idea of revalidation if the hardware does not tell you?
>> Get an md5 of the disk content? :)
> 
> Well... you should not eject media while the fs is mounted or the blockdev is
> open, correct?
> 
> So can we simply claim 'media changed' on last close/unmount? Sure,
> sometimes media was not changed, but that only hurts performance, not
> correctness... ?

Well, that's what my tiny proggy, which I used here to work around the
problem, does.  It constantly opens and closes /dev/sdFOO, currently every
0.5s (I don't think I will be able to swap the media faster than
half a second :), in order to catch REMOVALs of media -- because when
the drive does not see the media anymore, it correctly reports that
the media has changed...
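Here is a minimal shell version of such a polling loop. It is a sketch, not the actual program. It assumes a POSIX shell and a `sleep` that accepts fractional seconds, as GNU coreutils and BusyBox do. `poll_media` is a hypothetical name. Opening the device node is what makes the kernel query the reader. The code uses `true` instead of `:` because a failed redirection on a special built-in like `:` can make a POSIX shell exit, and the open will typically fail while no card is inserted.

```shell
#!/bin/sh
# poll_media DEVICE [INTERVAL] [COUNT]
# Hypothetical helper: open and close DEVICE every INTERVAL seconds
# (default 0.5) so the kernel asks the reader about media changes, in
# particular while the card is out of the slot. COUNT limits the number
# of probes; 0 (the default) means run forever.
poll_media() {
    dev=$1
    interval=${2:-0.5}
    count=${3:-0}
    n=0
    while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
        # Open for reading and close immediately. `true` is a regular
        # built-in, so a failed open (e.g. no medium) does not end the shell.
        true 2>/dev/null < "$dev"
        n=$((n + 1))
        sleep "$interval"
    done
}
```

For example, `poll_media /dev/sdb &` could be started at boot. As noted earlier in the thread, this cannot catch a swap that completes between two probes.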

I tried to make it detect the CLOSE of the file (either by userspace or
by the kernel on umount), so as not to waste time while the drive is
open/mounted, since it can't be revalidated then anyway, but neither
dnotify nor inotify is helpful here.

What is needed is to force "invalidation" on last close, so that on the
next open the kernel thinks it's shiny new media, never seen before,
i.e. to force-flush caches, or something like that.  Sure, this is not
as good as my program, which still keeps the caches in case the media
was NOT removed.  But my approach is wasteful.  And the data corruption
is indeed quite bad (we've already lost a whole gig of photos this way).

But yes, it looks like this problem is becoming less and less of an issue.
So for me it's easy to deal with (not perfect, but it works; it'd be
even better if I could wait for umount using inotify, to
wake only when really needed), and the real solution is to not use
cheap broken hardware...  (My unit was about $15, real ones cost
$25 or so, but that's not the reason I got it.  The real reason was
that it was the only time I actually saw such a thing, and it was
the last one as well... ;)

Thanks!

/mjt

* Re: data corruption: revalidating a (removable) hdd/flash on re-insert
  2008-11-04 21:22         ` Michael Tokarev
@ 2008-11-04 21:28           ` Pavel Machek
  2008-11-05  8:04             ` Michael Tokarev
  2008-11-05  0:29           ` Kay Sievers
  1 sibling, 1 reply; 14+ messages in thread
From: Pavel Machek @ 2008-11-04 21:28 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Kay Sievers, Kernel Mailing List

On Wed 2008-11-05 00:22:51, Michael Tokarev wrote:
> Pavel Machek wrote:
>>>>> Every access to removable media is guarded by this revalidation check.
>>>>> If you don't see these events, you should not trust this reader, and
>>>>> at least never change the media while it is connected.
>>>> This is a rather nasty data corrupter.
>>> Sure, it is.
>>>
>>>> Could we at least blacklist
>>>> the broken device, and force revalidation on each close or something like
>>>> that?
>>> What's your idea of revalidation if the hardware does not tell you?
>>> Get an md5 of the disk content? :)
>>
>> Well... you should not eject media while fs is mounted or blockdev is
>> open, correct?
>>
>> So can we simply claim 'media changed' on last close/unmount? Sure,
>> sometimes media was not changed, but that only hurts performance, not
>> correctness... ?
>
> Well, that's what my tiny proggy, which I used here to work around the
> problem, does.  It constantly opens and closes /dev/sdFOO, currently every
> 0.5s (I don't think I will be able to swap the media faster than
> half a second :), in order to catch REMOVALs of media -- because when
> the drive no longer sees the media, it correctly reports that
> the media has changed...

Ok, so what we need to do is put it into the kernel and activate it
via a blacklist...?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: data corruption: revalidating a (removable) hdd/flash on re-insert
  2008-11-04 21:22         ` Michael Tokarev
  2008-11-04 21:28           ` Pavel Machek
@ 2008-11-05  0:29           ` Kay Sievers
  1 sibling, 0 replies; 14+ messages in thread
From: Kay Sievers @ 2008-11-05  0:29 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Pavel Machek, Kernel Mailing List

On Tue, Nov 4, 2008 at 22:22, Michael Tokarev <mjt@tls.msk.ru> wrote:

> So for me it's easy to deal with (not perfect, but it works; it'd be
> even better if I could wait for umount using inotify, to
> wake only when really needed),

If I understand correctly what you are looking for, you could sleep in
poll() on /proc/mounts, and you will be woken up with POLLERR if anything
in your mount tree changes. Then you check the state of your USB
device and possibly invalidate it.

Kay
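
Kay's suggestion, sketched in Python under the assumption of a Linux host where poll() on /proc/mounts signals mount-table changes as described (POLLERR, with newer kernels also setting POLLPRI). `mounted_sources`, `is_in_use` and `wait_until_unmounted` are hypothetical names, and the prefix match on the device name is deliberately crude (it would also match `/dev/sdba` for `/dev/sdb`).

```python
import select


def mounted_sources(mounts_text: str) -> set[str]:
    """Source devices (first field of each line) in /proc/mounts-format text."""
    return {line.split()[0] for line in mounts_text.splitlines() if line.strip()}


def is_in_use(dev: str, mounts_text: str) -> bool:
    """True if `dev` or one of its partitions (/dev/sdb1 for /dev/sdb) is mounted.

    Crude prefix match: it would also match /dev/sdba when dev is /dev/sdb.
    """
    return any(src.startswith(dev) for src in mounted_sources(mounts_text))


def wait_until_unmounted(dev: str) -> None:
    """Sleep in poll() on /proc/mounts until nothing on `dev` is mounted."""
    with open("/proc/mounts") as f:
        poller = select.poll()
        # POLLERR is always reported; newer kernels also set POLLPRI on a change.
        poller.register(f, select.POLLPRI)
        while True:
            f.seek(0)
            if not is_in_use(dev, f.read()):
                return
            poller.poll()  # blocks until the mount table changes
```

The state is re-read before every poll(), so a change that lands between the read and the next poll() is still reported and no unmount is missed. Once `wait_until_unmounted("/dev/sdb")` returns, the caller can check or invalidate the device, for example by re-reading its partition table as `blockdev --rereadpt` does.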

* Re: data corruption: revalidating a (removable) hdd/flash on re-insert
  2008-11-04 21:28           ` Pavel Machek
@ 2008-11-05  8:04             ` Michael Tokarev
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Tokarev @ 2008-11-05  8:04 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Kay Sievers, Kernel Mailing List

Pavel Machek wrote:
> On Wed 2008-11-05 00:22:51, Michael Tokarev wrote:
>> Pavel Machek wrote:
[]
>>> So can we simply claim 'media changed' on last close/unmount? Sure,
>>> sometimes media was not changed, but that only hurts performance, not
>>> correctness... ?

>> Well, that's what my tiny proggy, which I used here to work around the
>> problem, does.  It constantly opens and closes /dev/sdFOO, currently every
>> 0.5s (I don't think I will be able to swap the media faster than
>> half a second :), in order to catch REMOVALs of media -- because when
>> the drive no longer sees the media, it correctly reports that
>> the media has changed...

> Ok, so what we need to do is put it into the kernel and activate it
> via a blacklist...?

I'm fine with my solution.. ;)  Especially now that Kay has suggested
watching /proc/mounts for notifications.

The original problem was that I didn't understand what was happening, and
blamed the kernel for "breaking" a working device (it looks like
it never worked in the first place; we just never hit
the bug before).  Once the problem became clear (thanks Kay!),
I wrote the proggy mentioned above; it's obviously a gross hack,
but it stops the corruption for me.

Generally, the solution can be one of these three:

a) leave it as it is now, since it has never been brought up
   before and hence does not affect many people, and because
   even if it did, it is becoming less and less of a problem as
   bad drives slowly go away...

b) use a mechanism like a blacklist in the kernel to force
   invalidation automatically on CLOSE for such drives (rather
   than only when it is really necessary, i.e. on REMOVAL, which
   is what my program detects).  Less efficient than my solution,
   but much easier to deal with in the kernel.

c) I will use my variant for my problem... while looking for a
   replacement for the bad hardware.
So no, I'm not asking to put that proggy into the kernel.. ;)
For a kernel-space solution, (b) would be a much simpler way, if
anything is done at all.

So, to summarize: if it is EASY (read: trivial) to do such a blacklist
in kernel space, I'd do it right away, because similar corruption
may still show up elsewhere.  If not, just file the case as
"solved for the reporter" ;)

Thanks!

/mjt

end of thread

Thread overview: 14+ messages
2008-10-31 15:38 data corruption: revalidating a (removable) hdd/flash on re-insert Michael Tokarev
2008-10-31 15:59 ` Lennart Sorensen
2008-10-31 16:10   ` Michael Tokarev
2008-10-31 18:28     ` Lennart Sorensen
2008-10-31 16:10 ` Kay Sievers
2008-10-31 17:39   ` Michael Tokarev
2008-10-31 18:49     ` Kay Sievers
2008-11-04 19:57   ` Pavel Machek
2008-11-04 20:13     ` Kay Sievers
2008-11-04 20:20       ` Pavel Machek
2008-11-04 21:22         ` Michael Tokarev
2008-11-04 21:28           ` Pavel Machek
2008-11-05  8:04             ` Michael Tokarev
2008-11-05  0:29           ` Kay Sievers
