linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Poor floppy performance in kernel 2.4.10
@ 2001-10-27 15:00 Alain Knaff
  2001-10-27 15:15 ` Alexander Viro
  0 siblings, 1 reply; 44+ messages in thread
From: Alain Knaff @ 2001-10-27 15:00 UTC (permalink / raw)
  To: torvalds, linux-kernel; +Cc: viro

Appended to this mail is a patch meant to fix the "non-cached floppy"
problem.

The  block_device_operations structure now has a new member:
  can_trust_media_change
If this returns 1, the VFS trusts the driver's check_media_change
function, and skips the call to kill_bdev in blkdev_put. If the driver
provides no such function, it defaults to "don't trust" (i.e. call
kill_bdev)

The floppy driver implements this new function in the following way:
 1. if broken_dcl on a drive is set, can_trust_media_change always
returns false
 2. if broken_dcl is not set, the function initially returns false,
until a disk change signal is seen. From then on, it always returns
true.

Other block devices could implement this by using black or whitelists,
or hardwiring a value.

HOWEVER, even after these changes, the caching problem still wasn't
solved (I confirmed with additional printk's that invalidate_bdev was
indeed no longer called).

Further investigation showed that the real culprit was actually
bdput. This function seems to de-allocate all structures attached to
the block device, including the cache... Ifdeffing it out did indeed
restore cache functionality. However, I understand that this would not
be a proper way to address the problem, as it would certainly cause a
memory leak.

Appended to this mail is the patch (without the debugging printk's,
and with bdput still intact); however in order to make it really work
bdput would need to be changed to not free up the cache, if that is
possible... Any ideas?

Regards,

Alain

diff -ur 2.4.13/linux/drivers/block/floppy.c linux/drivers/block/floppy.c
--- 2.4.13/linux/drivers/block/floppy.c	Sat Oct 27 09:42:34 2001
+++ linux/drivers/block/floppy.c	Sat Oct 27 16:17:33 2001
@@ -741,6 +741,7 @@
 		return UTESTF(FD_DISK_CHANGED);
 	if ((fd_inb(FD_DIR) ^ UDP->flags) & 0x80){
 		USETF(FD_VERIFY); /* verify write protection */
+		USETF(FD_DCL_SEEN); /* DCL for this drive has been seen */
 		if (UDRS->maxblock){
 			/* mark it changed */
 			USETF(FD_DISK_CHANGED);
@@ -3901,12 +3902,23 @@
 	return 0;
 }
 
+/* determines whether we can trust media change
+ * answers true if we have seen disk change at least once, and not marked as
+ * broken */
+static int floppy_can_trust_media_change(kdev_t dev)
+{
+	int drive=DRIVE(dev);
+	return (UTESTF(FD_DCL_SEEN) &&
+		((UDP->flags & FD_BROKEN_DCL) == 0));
+}
+
 static struct block_device_operations floppy_fops = {
 	open:			floppy_open,
 	release:		floppy_release,
 	ioctl:			fd_ioctl,
 	check_media_change:	check_floppy_change,
 	revalidate:		floppy_revalidate,
+	can_trust_media_change: floppy_can_trust_media_change
 };
 
 static void __init register_devfs_entries (int drive)
diff -ur 2.4.13/linux/fs/block_dev.c linux/fs/block_dev.c
--- 2.4.13/linux/fs/block_dev.c	Sat Oct 27 09:43:27 2001
+++ linux/fs/block_dev.c	Sat Oct 27 16:18:19 2001
@@ -601,8 +601,12 @@
 		__block_fsync(bd_inode);
 	else if (kind == BDEV_FS)
 		fsync_no_super(rdev);
-	if (!--bdev->bd_openers)
-		kill_bdev(bdev);
+	if (!--bdev->bd_openers) {
+		if(!bdev->bd_op->can_trust_media_change ||
+		   !bdev->bd_op->can_trust_media_change(rdev)) {
+			kill_bdev(bdev);
+		}
+	}
 	if (bdev->bd_op->release)
 		ret = bdev->bd_op->release(bd_inode, NULL);
 	if (!bdev->bd_openers)
diff -ur 2.4.13/linux/include/linux/fd.h linux/include/linux/fd.h
--- 2.4.13/linux/include/linux/fd.h	Fri Aug 13 21:16:16 1999
+++ linux/include/linux/fd.h	Sat Oct 27 13:37:08 2001
@@ -177,7 +177,8 @@
 				 * to clear media change status */
 	FD_UNUSED_BIT,
 	FD_DISK_CHANGED_BIT,	/* disk has been changed since last i/o */
-	FD_DISK_WRITABLE_BIT	/* disk is writable */
+	FD_DISK_WRITABLE_BIT,	/* disk is writable */
+	FD_DCL_SEEN_BIT		/* have we seen DCL? */
 };
 
 #define FDSETDRVPRM _IOW(2, 0x90, struct floppy_drive_params)
@@ -196,6 +197,7 @@
 #define FD_DISK_NEWCHANGE (1 << FD_DISK_NEWCHANGE_BIT)
 #define FD_DISK_CHANGED (1 << FD_DISK_CHANGED_BIT)
 #define FD_DISK_WRITABLE (1 << FD_DISK_WRITABLE_BIT)
+#define FD_DCL_SEEN (1 << FD_DCL_SEEN)
 
 	unsigned long spinup_date;
 	unsigned long select_date;
diff -ur 2.4.13/linux/include/linux/fs.h linux/include/linux/fs.h
--- 2.4.13/linux/include/linux/fs.h	Sat Oct 27 11:53:28 2001
+++ linux/include/linux/fs.h	Sat Oct 27 13:05:03 2001
@@ -793,6 +793,7 @@
 	int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long);
 	int (*check_media_change) (kdev_t);
 	int (*revalidate) (kdev_t);
+	int (*can_trust_media_change) (kdev_t);
 };
 
 /*

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-27 15:00 Poor floppy performance in kernel 2.4.10 Alain Knaff
@ 2001-10-27 15:15 ` Alexander Viro
  2001-10-27 17:12   ` Alain Knaff
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Viro @ 2001-10-27 15:15 UTC (permalink / raw)
  To: alain; +Cc: Linus Torvalds, Richard Gooch, linux-kernel



On Sat, 27 Oct 2001, Alain Knaff wrote:

> Appended to this mail is a patch meant to fix the "non-cached floppy"
> problem.

a) you _still_ need to stop all pending IO (readaheads in progress)
before final ->release()

b) at which point do you flush the cache?  It definitely shouldn't
survive rmmod.  And no, unregister_blkdev() is not a solution, courtesy
of devfs with its insane devfs=only option.

There is a related problem which is much nastier than short-living caches:
code that does bdev->bd_op = <stuff from devfs>;  blkdev_get(bdev, ...);
Think what happens if rmmod comes while blkdev_get() sleeps ob ->bd_sem.
Notice that it had been there since the moment when devfs went into the
tree.  Sigh...


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-27 15:15 ` Alexander Viro
@ 2001-10-27 17:12   ` Alain Knaff
  2001-10-27 17:42     ` Alexander Viro
  0 siblings, 1 reply; 44+ messages in thread
From: Alain Knaff @ 2001-10-27 17:12 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, Richard Gooch, linux-kernel

>
>
>On Sat, 27 Oct 2001, Alain Knaff wrote:
>
>> Appended to this mail is a patch meant to fix the "non-cached floppy"
>> problem.
>
>a) you _still_ need to stop all pending IO (readaheads in progress)
>before final ->release()

Ok. Didn't know about that. Maybe we should add a variant of
invalidate_bdev which stops IO in progress, but which wouldn't throw
away data that has already been read? This same variant could also
make sure that no dirty buffers were around, as a sanity check.

>b) at which point do you flush the cache?  It definitely shouldn't
>survive rmmod.

In the case of the floppy driver, the following would happen: after
rmmod, followed by insmod, the floppy driver would assume the disk has
been changed an throw away the cache (by returning 1 from
check_media_change). This is actually a side-effect of the way how the
physical DCL status is cleared: after an insmod, the floppy driver has
no way of knowing whether DCL was set at some point in the past, but
then cleared by another driver (i.e. ftape), so it errs on the safe
side, and assumes the disk has been changed.

Other block device drivers may behave differently of course...

>  And no, unregister_blkdev() is not a solution, courtesy
>of devfs with its insane devfs=only option.


Do you mean this:

int devfs_unregister_blkdev (unsigned int major, const char *name)
{
    if (boot_options & OPTION_ONLY) return 0;
    return unregister_blkdev (major, name);
}   /*  End Function devfs_unregister_blkdev  */

It seems to me that the root of the problem here is that the normal
major->device mapping is no longer the only way to "find" a device.

Cursory examination of floppy.c (as an example of a block device
driver) showed that bdops are also registered using devfs_register and
register_disk (what's THAT for?!? Floppies don't have partitions...)

Apparently, devfs_register allows a direct mapping from the device's
name to its driver, without going through its major/minor number.

Thus, a possible solution would be to equip all possible paths leading
to the driver's block_device_operations with correct "teardown"
function. Thus, not only unregister_blkdev would dump the cache, but
also devfs_unregister (maybe near the place in unregister() where
de->u.fcb.ops = NULL is done?). Best make this call generic, such as
free_bd_ops(handle), so that other unregister actions could easily be
tucked on top of it later on, should the need arise.

All this begs of course the following question: what kind of
idnetifier does the buffer cache code actually use to refer to the
block device, if there is no longer a major?

Is a pointer to some kind of structure used as the identifier? If so,
what structure is this? The device driver's block_device_operations
structure? No, won't work, because there may be minors. Or is it
something which is allocated on the fly when the device is opened? If
it is indeed the latter, I think I understand why the cache is gone
after re-opening the device: maybe it just gets a different handle
(mapping)? The cache may still be there, but the VFS would no longer
"find" it because the identifier of the device has changed. Should we
maybe use a _combination_ of pointer to block_device_operations and
minor number as the identifier for the device? That should stay valid
all the time while the module is loaded.

>There is a related problem which is much nastier than short-living caches:
>code that does bdev->bd_op = <stuff from devfs>;  blkdev_get(bdev, ...);
>Think what happens if rmmod comes while blkdev_get() sleeps ob ->bd_sem.
>Notice that it had been there since the moment when devfs went into the
>tree.  Sigh...

I'm not sure I understand the full extent of the problem, but at a
quick glance, it seems to be that the only operation exposed would be
blkdev_open, as all other operations do need an open handle, which
would prevent rmmod because module usage count would not be zero (at
least, if the module is correctly implemented...).

We would thus need to protect bd_op between the time it has been
fetched via get_blkfops, and bd_op->open. After bd_op->open, we would
be safe because of module usage count.

We could either use bdev->bd_sem (awkward, as many drivers implement
multiple bdev's), or a new per-major device lock to protect that
section.

unregister_blkdev would need to acquire the same lock while zero-ing
blkdevs[major].bdops.

Right now, block_dev.c's do_open already holds the kernel lock, so
wouldn't it be enough to also lock_kernel() in unregister_blkdev ?

A symetric situation might occur if a bd_op was called after
bd_op->release. But as far as I could see, this is not the case. Even
my own patch added the call to bd_op->can_trust_media_change _before_
the bd_op->release.

Ok, so the above only describes the solution where a device is
accessed via its major. To transpose this to devfs' direct
name-to-device mappings, a similar lock would need to be held between
the time the bd_op pointer is established, and the time it is
opened. But this may not practical, because too many software
components would be involved in that critical section. So maybe
do_open could "revalidate" the bd_op pointer (i.e. somehow check
whether the block device driver has not yet been unregistered in the
meantime) just before using it, and have the critical section only
enclose the revalidate and bd_op->open operation.

Something like the following (do_open):

	down(&bdev->bd_sem);
	lock_kernel();
+       if(bdev->bd_op && !devfs_is_bdop_still_valid(inode,bdev)) {
+               unlock_kernel();
+               up(&bdev->bd_sem);
+               return -ENODEV;
+       }
	if (!bdev->bd_op)
		bdev->bd_op = get_blkfops(MAJOR(dev));
	if (bdev->bd_op) {
		ret = 0;
		if (bdev->bd_op->open)
			ret = bdev->bd_op->open(inode, file);
		if (!ret) {
			bdev->bd_openers++;
			bdev->bd_inode->i_size = blkdev_size(dev);
			bdev->bd_inode->i_blkbits = blksize_bits(block_size(dev));
		} else if (!bdev->bd_openers)
			bdev->bd_op = NULL;
	}
	unlock_kernel();
	up(&bdev->bd_sem);


Regards,

Alain

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-27 17:12   ` Alain Knaff
@ 2001-10-27 17:42     ` Alexander Viro
  2001-10-27 18:00       ` Alain Knaff
  2001-10-27 19:13       ` Alain Knaff
  0 siblings, 2 replies; 44+ messages in thread
From: Alexander Viro @ 2001-10-27 17:42 UTC (permalink / raw)
  To: alain; +Cc: Linus Torvalds, Richard Gooch, linux-kernel



On Sat, 27 Oct 2001, Alain Knaff wrote:

> Cursory examination of floppy.c (as an example of a block device
> driver) showed that bdops are also registered using devfs_register and
> register_disk (what's THAT for?!? Floppies don't have partitions...)

Actually, _that_ is Right Thing(tm) - it should allocate a structure that
would contain pointer to methods table and would be controlled by
driver.  devfs_register() would get that + prefered name, etc. so that
we had a common object.  Then driver would have a point where it could
tell the rest of kernel that disk is gone.
 
> Apparently, devfs_register allows a direct mapping from the device's
> name to its driver, without going through its major/minor number.
> 
> Thus, a possible solution would be to equip all possible paths leading
> to the driver's block_device_operations with correct "teardown"
> function. Thus, not only unregister_blkdev would dump the cache, but
> also devfs_unregister (maybe near the place in unregister() where
> de->u.fcb.ops = NULL is done?). Best make this call generic, such as

No go.  We can have situations where some of uses come from devfs and
some - from normal device nodes.  struct block_device will be the same.

> All this begs of course the following question: what kind of
> idnetifier does the buffer cache code actually use to refer to the
> block device, if there is no longer a major?

Right now - major:minor, in 2.5 - struct block_device *.
 
> We could either use bdev->bd_sem (awkward, as many drivers implement
> multiple bdev's), or a new per-major device lock to protect that
> section.

I'd rather have refcount raised by get_blkfops().  Again, that code path
is not a problem.  devfs_get_ops() is.
 
> unregister_blkdev would need to acquire the same lock while zero-ing
> blkdevs[major].bdops.

We could put bdev on per-major cyclic list and have it killed on
unregister_blkdev().  _That_ is easy.  The trouble being, with devfs
we don't have a single removal point.  Sometimes it's still
unregister_blkdev(), sometimes - crapload of devfs_unregister() for
each minor, sometimes - both.  Worse yet, we have one more place that
holds pointer to block_device_operations - gendisk.  Also used by
devfs (and nothing else) and logics is, to put it mildly, fuzzy.

Frankly, at that point I would prefer to remove the code in devfs that
tries to provide bdev methods by devfs entry.  Rationale:
	a) it's fucked up beyond any repair
	b) it will be useless until we switch buffer cache to block_device *
	c) we will need to change that logics anyway - as it is the thing is
inherently racy
	d) right now it stands in the way of long-living cache stuff _and_
introduces an oopsable race between mount and rmmod.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-27 17:42     ` Alexander Viro
@ 2001-10-27 18:00       ` Alain Knaff
  2001-10-27 18:13         ` Alexander Viro
                           ` (3 more replies)
  2001-10-27 19:13       ` Alain Knaff
  1 sibling, 4 replies; 44+ messages in thread
From: Alain Knaff @ 2001-10-27 18:00 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, Richard Gooch, linux-kernel

>
>
>On Sat, 27 Oct 2001, Alain Knaff wrote:
>
>> Cursory examination of floppy.c (as an example of a block device
>> driver) showed that bdops are also registered using devfs_register and
>> register_disk (what's THAT for?!? Floppies don't have partitions...)
>
>Actually, _that_ is Right Thing(tm) - it should allocate a structure that
>would contain pointer to methods table and would be controlled by
>driver.  devfs_register() would get that + prefered name, etc. so that
>we had a common object.  Then driver would have a point where it could
>tell the rest of kernel that disk is gone.

Register_disk seems to be related to partitions... and is yet another
place where floppy_fops is handed out. And it doesn't seem to have a
corresponding unregister_disk function, so this worries me
somewhat. Who did that, and why didn't he contact me?

>> Apparently, devfs_register allows a direct mapping from the device's
>> name to its driver, without going through its major/minor number.
>> 
>> Thus, a possible solution would be to equip all possible paths leading
>> to the driver's block_device_operations with correct "teardown"
>> function. Thus, not only unregister_blkdev would dump the cache, but
>> also devfs_unregister (maybe near the place in unregister() where
>> de->u.fcb.ops = NULL is done?). Best make this call generic, such as
>
>No go.  We can have situations where some of uses come from devfs and
>some - from normal device nodes.  struct block_device will be the same.

Ok. So maybe some kind of counter? When it drops to zero, dump the
cache?

>> All this begs of course the following question: what kind of
>> idnetifier does the buffer cache code actually use to refer to the
>> block device, if there is no longer a major?
>
>Right now - major:minor, in 2.5 - struct block_device *.

Good. But then, what's the point of devfs=only ? I assumed this was
intended for situations where we had a direct mapping from filename to
device.

Ok, so in 2.5 will be possible with struct block_device, and the
option will make sense.

So, in the interest of stability, shouldn't we (temporarily) disable
this devfs=only stuff in 2.4 ?

>> We could either use bdev->bd_sem (awkward, as many drivers implement
>> multiple bdev's), or a new per-major device lock to protect that
>> section.
>
>I'd rather have refcount raised by get_blkfops().  Again, that code path
>is not a problem.  devfs_get_ops() is.
> 
>> unregister_blkdev would need to acquire the same lock while zero-ing
>> blkdevs[major].bdops.
>
>We could put bdev on per-major cyclic list and have it killed on
>unregister_blkdev().  _That_ is easy.  The trouble being, with devfs
>we don't have a single removal point.  Sometimes it's still
>unregister_blkdev(), sometimes - crapload of devfs_unregister() for
>each minor, sometimes - both.  Worse yet, we have one more place that
>holds pointer to block_device_operations - gendisk.  Also used by
>devfs (and nothing else) and logics is, to put it mildly, fuzzy.
>
>Frankly, at that point I would prefer to remove the code in devfs that
>tries to provide bdev methods by devfs entry.  Rationale:
>	a) it's fucked up beyond any repair
>	b) it will be useless until we switch buffer cache to block_device *
>	c) we will need to change that logics anyway - as it is the thing is
>inherently racy
>	d) right now it stands in the way of long-living cache stuff _and_
>introduces an oopsable race between mount and rmmod.
>

I agree. We could maybe just #ifdef those methods out, so that we
could easily add them back in 2.5 once struct block_device is in
place.

Regards,

Alain

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-27 18:00       ` Alain Knaff
@ 2001-10-27 18:13         ` Alexander Viro
  2001-10-27 18:26         ` Alexander Viro
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 44+ messages in thread
From: Alexander Viro @ 2001-10-27 18:13 UTC (permalink / raw)
  To: alain; +Cc: Linus Torvalds, Richard Gooch, linux-kernel



On Sat, 27 Oct 2001, Alain Knaff wrote:

> Register_disk seems to be related to partitions... and is yet another
> place where floppy_fops is handed out. And it doesn't seem to have a
> corresponding unregister_disk function, so this worries me
> somewhat. Who did that, and why didn't he contact me?

In most of the cases (and that includes floppy.c) unregister_disk() would
be invoked by unregister_blkdev().  The only exception (back then, now
we probably have more) would be SCSI.  That stuff got frozen in the middle
of a merge by devfs inclusion (2.3.46 or so).  Right now register_disk()
is no-op for partitionless devices.
 
> Good. But then, what's the point of devfs=only ? I assumed this was

Ask Richard.  Maybe you will be able to get a straight answer.  I hadn't...


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-27 18:00       ` Alain Knaff
  2001-10-27 18:13         ` Alexander Viro
@ 2001-10-27 18:26         ` Alexander Viro
  2001-10-27 19:02           ` more devfs fun Alexander Viro
  2001-11-06  5:06           ` Richard Gooch
  2001-11-06  7:01         ` Poor floppy performance in kernel 2.4.10 Richard Gooch
  2001-11-06  7:19         ` Poor floppy performance in kernel 2.4.10 Richard Gooch
  3 siblings, 2 replies; 44+ messages in thread
From: Alexander Viro @ 2001-10-27 18:26 UTC (permalink / raw)
  To: alain; +Cc: Linus Torvalds, Richard Gooch, linux-kernel

	BTW, here's one more devfs rmmod race: check_disk_changed() in
fs/devfs/base.c.  Calling ->check_media_change() with no protection
whatsoever.  If rmmod happens at that point...


^ permalink raw reply	[flat|nested] 44+ messages in thread

* more devfs fun
  2001-10-27 18:26         ` Alexander Viro
@ 2001-10-27 19:02           ` Alexander Viro
  2001-10-27 19:16             ` Alexander Viro
  2001-11-06  5:17             ` more devfs fun Richard Gooch
  2001-11-06  5:06           ` Richard Gooch
  1 sibling, 2 replies; 44+ messages in thread
From: Alexander Viro @ 2001-10-27 19:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Richard Gooch

	BTW, what the hell is that?
/*
 * hwgraph_bdevsw_get - returns the fops of the given devfs entry.
*/
struct file_operations * 
hwgraph_bdevsw_get(devfs_handle_t de)
{
        return(devfs_get_ops(de));
}

	It's arch/ia64/sn/io/hcl.c.  The funny thing being, the thing
you will get from devfs_get_ops() will _not_ be struct file_operations *.
And that's aside of the fact that any use of that function is very
likely to be racy as hell.  Sigh...


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-27 17:42     ` Alexander Viro
  2001-10-27 18:00       ` Alain Knaff
@ 2001-10-27 19:13       ` Alain Knaff
  2001-10-27 19:19         ` Alexander Viro
  1 sibling, 1 reply; 44+ messages in thread
From: Alain Knaff @ 2001-10-27 19:13 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, Richard Gooch, linux-kernel

>> All this begs of course the following question: what kind of
>> idnetifier does the buffer cache code actually use to refer to the
>> block device, if there is no longer a major?
>
>Right now - major:minor, in 2.5 - struct block_device *.

Actually, are you sure about that?

In filemap.c in do_generic_file_read, you have this:

	struct address_space *mapping = filp->f_dentry->d_inode->i_mapping;
	...
		/*
		 * Try to find the data in the page cache..
		 */
		hash = page_hash(mapping, index);

		spin_lock(&pagecache_lock);
		page = __find_page_nolock(mapping, index, *hash);

Looks as if pages are identified by (mapping, index) pairs, rather
than (major:minor, index) triplets.

And "mapping" itself seems to point to i_data of the device's inode
structure (not the device entry's inode, but the device's itself).

Which means that if the inode is put (all references to the block
device closed), and later the same major/minor is reopened, it may
very well be in a different place in memory... Meaning that the
formerly cached pages (if they are still there...) will no longer be
recognized as the same, as mapping is now a different address, even
though major/minor is the same.

The changing nature of mapping can be confirmed easily by inserting a
printk into the open routine of the device.

As long as at least one reference to the device is open, mapping stays
the same for further opens, but as soon as the last ref is closed, the
next open will return a different address...


For the following test, I've inserted the following debugging printk
in floppy_open:

	printk("Mapping=%x data=%x\n", inode->i_mapping, &inode->i_data);

As you see, while the sleep is open, Mapping is always
c25a7a90. However, once the sleep is closed, Mapping changes to
c1ba0cd0 . This seems to explain it things.

> sleep 3600 </dev/fd0 &
[1] 1449
Mapping=c25a7a90 data=cb80dc50
VFS: Disk change detected on device fd(2,0)
> mdir a:
Mapping=c25a7a90 data=cb80dc50
 Volume in drive A has no label
 Volume Serial Number is 63F7-FED8
Directory for A:/

No files
                          1 457 664 bytes free

> mdir a:
Mapping=c25a7a90 data=cb80dc50
 Volume in drive A has no label
 Volume Serial Number is 63F7-FED8
Directory for A:/

No files
                          1 457 664 bytes free
> kill %1
[1]  + 1449 terminated  sleep 3600 < /dev/fd0
> mdir a:
Mapping=c1ba0cd0 data=cb80dc50
 Volume in drive A has no label
 Volume Serial Number is 63F7-FED8
Directory for A:/

No files
                          1 457 664 bytes free



Regards,

Alain

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun
  2001-10-27 19:02           ` more devfs fun Alexander Viro
@ 2001-10-27 19:16             ` Alexander Viro
  2001-10-27 19:50               ` more devfs fun (Piled Higher and Deeper) Alexander Viro
                                 ` (2 more replies)
  2001-11-06  5:17             ` more devfs fun Richard Gooch
  1 sibling, 3 replies; 44+ messages in thread
From: Alexander Viro @ 2001-10-27 19:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Richard Gooch

	It just gets better and better...

in get_removable_partition()
        if (strcmp (de->name, "disc") == 0) return check_disc_changed (de);

with 
    if ( name && (namelen < 1) ) namelen = strlen (name);
    if ( ( new = kmalloc (sizeof *new + namelen, GFP_KERNEL) ) == NULL )
        return NULL;
...
    if (name) memcpy (new->name, name, namelen);

in create_entry().

IOW, ->name is not NUL-terminated.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-27 19:13       ` Alain Knaff
@ 2001-10-27 19:19         ` Alexander Viro
  2001-10-27 19:26           ` Alain Knaff
  2001-10-28 20:40           ` Alain Knaff
  0 siblings, 2 replies; 44+ messages in thread
From: Alexander Viro @ 2001-10-27 19:19 UTC (permalink / raw)
  To: alain; +Cc: Linus Torvalds, Richard Gooch, linux-kernel



On Sat, 27 Oct 2001, Alain Knaff wrote:

> And "mapping" itself seems to point to i_data of the device's inode
> structure (not the device entry's inode, but the device's itself).
 
> Which means that if the inode is put (all references to the block
> device closed), and later the same major/minor is reopened, it may

Stop here.  bdev->bd_inode is destroyed only when bdev is destroyed.
If we make block_device long-living (i.e. they stay around until all
pages are evicted from cache _or_ device gets unregistered) ->bd_inode
will follow.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-27 19:19         ` Alexander Viro
@ 2001-10-27 19:26           ` Alain Knaff
  2001-10-28 20:40           ` Alain Knaff
  1 sibling, 0 replies; 44+ messages in thread
From: Alain Knaff @ 2001-10-27 19:26 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, Richard Gooch, linux-kernel

>
>
>On Sat, 27 Oct 2001, Alain Knaff wrote:
>
>> And "mapping" itself seems to point to i_data of the device's inode
>> structure (not the device entry's inode, but the device's itself).
> 
>> Which means that if the inode is put (all references to the block
>> device closed), and later the same major/minor is reopened, it may
>
>Stop here.  bdev->bd_inode is destroyed only when bdev is destroyed.
>If we make block_device long-living (i.e. they stay around until all
>pages are evicted from cache _or_ device gets unregistered) ->bd_inode
>will follow.

That would be ok with me. Long live block_device! ;-)

Alain

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-27 19:16             ` Alexander Viro
@ 2001-10-27 19:50               ` Alexander Viro
  2001-10-27 20:06                 ` Alexander Viro
                                   ` (2 more replies)
  2001-10-28 10:06               ` Lorenzo Allegrucci
  2001-11-06  5:21               ` Richard Gooch
  2 siblings, 3 replies; 44+ messages in thread
From: Alexander Viro @ 2001-10-27 19:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Richard Gooch

devfs_rmdir() checks that directory is empty.  Then it calls
devfsd_notify_one(), which can block.  Then it marks the entry
unregistered and reports success.

Guess what will happen if devfs_register() will happen at that
moment...

/me _seriously_ considers hostile takeover of the damn thing

I mean, when holes are found at that rate just by cursory look through
the code...  And that stuff had been there for at least 2 years.
Richard, I hate to break it on you, but dealing with that crap is
generally considered a maintainer's job.  And it is supposed to happen
slightly faster - 2 years would be bad even for MS.  If you don't
have time for that work - say so and step down, nobody will blame
you for that.  Repeating that everything will be fine RSN is getting
real old by now...


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-27 19:50               ` more devfs fun (Piled Higher and Deeper) Alexander Viro
@ 2001-10-27 20:06                 ` Alexander Viro
  2001-10-27 21:09                   ` Ryan Cumming
  2001-10-27 21:01                 ` Alexander Viro
  2001-11-06  6:35                 ` Richard Gooch
  2 siblings, 1 reply; 44+ messages in thread
From: Alexander Viro @ 2001-10-27 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Richard Gooch

... and one more - devfs_unregister() on a directory happening when
mknod() in that directory sleeps in create_entry() (on kmalloc()).

Do you ever read your own code?  It's not like this stuff was hard
to find - I'm just poking into random places and every damn one
contains a hole.  Sigh...

Oh, BTW - here's another one:  think what happens if tree-walking in
unregister() steps on the entry we are currently removing in
devfs_unlink().


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-27 19:50               ` more devfs fun (Piled Higher and Deeper) Alexander Viro
  2001-10-27 20:06                 ` Alexander Viro
@ 2001-10-27 21:01                 ` Alexander Viro
  2001-11-06  6:35                 ` Richard Gooch
  2 siblings, 0 replies; 44+ messages in thread
From: Alexander Viro @ 2001-10-27 21:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Richard Gooch



On Sat, 27 Oct 2001, Alexander Viro wrote:

> devfs_rmdir() checks that directory is empty.  Then it calls
> devfsd_notify_one(), which can block.  Then it marks the entry
> unregistered and reports success.
> 
> Guess what will happen if devfs_register() will happen at that
> moment...

Ugh... My apologies - race here is a bit different.  Namely,
devfs_register() find a directory, starts creating a child,
blocks in kmalloc(), _then_ entire devfs_rmdir() happens and
devfs_register() merrily inserts a new child into dead directory.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-27 20:06                 ` Alexander Viro
@ 2001-10-27 21:09                   ` Ryan Cumming
  2001-10-27 21:22                     ` Alexander Viro
                                       ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Ryan Cumming @ 2001-10-27 21:09 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel

On October 27, 2001 13:06, Alexander Viro wrote:
> Do you ever read your own code?  It's not like this stuff was hard
> to find - I'm just poking into random places and every damn one
> contains a hole.  Sigh...

It might be more productive to provide patches or at least generate 
constructive ideas as how to fix these problems, as you are obviously quite 
capable of doing so. Digging through the code and sending a new email to this 
list for every few flaws you find is doing little good, and your personal 
attacks on the maintainer are even less benefical. Cooperation will get you a 
lot farther than conflict.

-Ryan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-27 21:09                   ` Ryan Cumming
@ 2001-10-27 21:22                     ` Alexander Viro
  2001-10-28  1:00                     ` Rik van Riel
  2001-10-28  8:45                     ` Richard Gooch
  2 siblings, 0 replies; 44+ messages in thread
From: Alexander Viro @ 2001-10-27 21:22 UTC (permalink / raw)
  To: Ryan Cumming; +Cc: linux-kernel



On Sat, 27 Oct 2001, Ryan Cumming wrote:

> It might be more productive to provide patches or at least generate 
> constructive ideas as how to fix these problems, as you are obviously quite 
> capable of doing so. Digging through the code and sending a new email to this 
> list for every few flaws you find is doing little good, and your personal 
> attacks on the maintainer are even less benefical. Cooperation will get you a 
> lot farther than conflict.

Been there, tried that, had been told by Richard that he would rather fix
devfs bugs himself.  Quite a few months ago.  If you have better suggestions
they would be more than welcome.

As far as I can see, if maintainer doesn't fix the bugs himself and tells
that patches are not welcome there are only two things that can be done -
going into full-disclosure mode in hope that it will change the situation or
taking over the code in question.

By that point I'm sorely tempted to do the latter (i.e. full-blown fork,
maintained with no regard to Richard's preferences + sumbitting [very
massive] fixes directly to Linus), but I want to give a try to less
drastic approach first.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-27 21:09                   ` Ryan Cumming
  2001-10-27 21:22                     ` Alexander Viro
@ 2001-10-28  1:00                     ` Rik van Riel
  2001-10-28  8:45                     ` Richard Gooch
  2 siblings, 0 replies; 44+ messages in thread
From: Rik van Riel @ 2001-10-28  1:00 UTC (permalink / raw)
  To: Ryan Cumming; +Cc: Alexander Viro, linux-kernel

On Sat, 27 Oct 2001, Ryan Cumming wrote:

> It might be more productive to provide patches or at least generate
> constructive ideas as how to fix these problems, as you are obviously
> quite capable of doing so.

1) yes, Al Viro is very capable of sending in devfs fixes
   and he has done so in the past  (IIRC around 2 months ago)

2) Richard Gooch then told Al he'd just started working on
   a patch to fix the problem and he'd rather fix things
   himself ... as far as I can see this hasn't happened yet

regards,

Rik
-- 
DMCA, SSSCA, W3C?  Who cares?  http://thefreeworld.net/

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-27 21:09                   ` Ryan Cumming
  2001-10-27 21:22                     ` Alexander Viro
  2001-10-28  1:00                     ` Rik van Riel
@ 2001-10-28  8:45                     ` Richard Gooch
  2001-10-28  9:54                       ` Alexander Viro
                                         ` (2 more replies)
  2 siblings, 3 replies; 44+ messages in thread
From: Richard Gooch @ 2001-10-28  8:45 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Ryan Cumming, Alexander Viro, linux-kernel

Rik van Riel writes:
> On Sat, 27 Oct 2001, Ryan Cumming wrote:
> 
> > It might be more productive to provide patches or at least generate
> > constructive ideas as how to fix these problems, as you are obviously
> > quite capable of doing so.
> 
> 1) yes, Al Viro is very capable of sending in devfs fixes
>    and he has done so in the past  (IIRC around 2 months ago)

A truely horrible, busy-wait patch that was quickly superceeded by a
much cleaner patch that I wrote shortly thereafter. And was applied by
Linus in due course.

> 2) Richard Gooch then told Al he'd just started working on
>    a patch to fix the problem and he'd rather fix things
>    himself ... as far as I can see this hasn't happened yet

Complete fucking bullshit. Over the last several months, I've been
sending a steady stream of bugfix patches to Linus and the list, and
if you'd been paying attention, you'd notice that in time they've been
applied.

Furthermore, I've nearly finished the big rewrite of devfs which adds
proper locking and refcounting. That work was progressing nicely (but
it's a big job), although it's temporarily stalled because of some
important travel. Work on that will resume in the next couple of
weeks. There's no point sending in an incomplete version of the code.

It's beyond me why you state that there has been no progress by me
when my announcements of new devfs patches have been posted to the
list and even Linus' ChangeLog messages have shown stuff going in. If
you don't actually know what's going on, why do you bother posting on
this subject in the first place? How would you like it if I started
flaming about how long the VM code was taking to get working? Our VM
has sucked for *years*.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-28  8:45                     ` Richard Gooch
@ 2001-10-28  9:54                       ` Alexander Viro
  2001-10-28 12:27                       ` Roman Zippel
  2001-10-28 22:31                       ` Richard Gooch
  2 siblings, 0 replies; 44+ messages in thread
From: Alexander Viro @ 2001-10-28  9:54 UTC (permalink / raw)
  To: Richard Gooch; +Cc: Rik van Riel, Ryan Cumming, linux-kernel



On Sun, 28 Oct 2001, Richard Gooch wrote:

> Complete fucking bullshit. Over the last several months, I've been
> sending a steady stream of bugfix patches to Linus and the list, and
> if you'd been paying attention, you'd notice that in time they've been
> applied.

OK, _now_ I'm really pissed off.  As far as I can see there is only
one way to get you fix anything - posting to l-k.  This "steady
stream" consists of what?  Let's see:

0.118:	buffer underrun in try_modload().  Source: some Al Viro had hit
the function in question in grep over tree and took a couple of minutes
to read it.
0.118:  moving down_read() - yes, it had fixed the instance of deadlock
pointed to you by, damn, what a coincidence, same bastard.  Come to think
of that, let me grep for down_read()...  Aha.

static int devfs_follow_link (struct dentry *dentry, struct nameidata *nd)
{
    int err;
    struct devfs_entry *de;

    de = get_devfs_entry_from_vfs_inode (dentry->d_inode, TRUE);
    if (!de) return -ENODEV;
    down_read (&symlink_rwsem);
    err = de->registered ? vfs_follow_link (nd,
                                            de->u.symlink.linkname) : -ENODEV;
    up_read (&symlink_rwsem);
    return err;
}   /*  End Function devfs_follow_link  */

Umm... Hadn't we just been there?  Recursive down_read(&symlink_rwsem)...

0.117: oh, wow - finally.  devfs_link() is gone.

0.116: reverted previous broken patch, but result contained a deadlock instead
of race.  Result of race scenario described on l-k by... damn, this asshole
again.

0.115: bogus fix for breakage introduced by blkdev-in-pagecache patch.  Hadn't
got into Linus' tree, actually.

0.114: introduced broken refcounting for symlinks (see 0.116)

0.113: "quick and dirty hack" to protect symlink bodies.  Broken, at that.
BTW, breakage in 0.113 and 0.114 hadn't stopped Mandrake from deciding that
it fixed readlink() race and shipping the thing.  Funny, but race it was
supposed to fix had been described in private email several months before.
Then it was described on l-k.  Then description had been forwarded to Mandrake
- after a question about potential breakage.  _Then_ (and I assume that it
was a coincidence) said patches had appeared.

0.111, 0.112: not a fix by any stretch of imagination.

Oh, and before that we have a (finally, only after a year of mentioning
the crap in question, heavy-weight rant on l-k when I've finally ran out
of patience _and_ detailed discussion on the possible fixes) fix for
expand-entry-table races.


So far all I see is that beating you hard enough in public can make you
fix the bugs explicitly pointed to you.  That's it.  As far as I can
see you don't read your own code, judging by the fact that every damn
look at fs/devfs/base.c shows a new hole within a couple of minutes
_and_ said holes stay until posted on l-k.  Private mail doesn't work.
You read it, reply and ignore.  About hundred kilobytes of evidence
available at request.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-27 19:16             ` Alexander Viro
  2001-10-27 19:50               ` more devfs fun (Piled Higher and Deeper) Alexander Viro
@ 2001-10-28 10:06               ` Lorenzo Allegrucci
  2001-11-06  5:21               ` Richard Gooch
  2 siblings, 0 replies; 44+ messages in thread
From: Lorenzo Allegrucci @ 2001-10-28 10:06 UTC (permalink / raw)
  To: Alexander Viro, linux-kernel; +Cc: Linus Torvalds, Richard Gooch

At 15.50 27/10/01 -0400, Alexander Viro wrote:
>/me _seriously_ considers hostile takeover of the damn thing
>
>I mean, when holes are found at that rate just by cursory look through
>the code...  And that stuff had been there for at least 2 years.
>Richard, I hate to break it on you, but dealing with that crap is
>generally considered a maintainer's job.  And it is supposed to happen
>slightly faster - 2 years would be bad even for MS.  If you don't
>have time for that work - say so and step down, nobody will blame
>you for that.  Repeating that everything will be fine RSN is getting
>real old by now...

As matters stand I would suggest a radical approach:
What about to replace the current devfs by a clean and lightweight
ramfs based filesystem?
IMO we need a complete rewrite and I'm sure we can make it simpler
and cleaner. 2.5 stuff, but it's worthwhile.

Just my 0.02 Euro,


-- 
Lorenzo

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-28  8:45                     ` Richard Gooch
  2001-10-28  9:54                       ` Alexander Viro
@ 2001-10-28 12:27                       ` Roman Zippel
  2001-10-28 13:16                         ` Alexander Viro
  2001-10-28 22:31                       ` Richard Gooch
  2 siblings, 1 reply; 44+ messages in thread
From: Roman Zippel @ 2001-10-28 12:27 UTC (permalink / raw)
  To: Richard Gooch; +Cc: Rik van Riel, Ryan Cumming, Alexander Viro, linux-kernel

Hi,

Richard Gooch wrote:

> Furthermore, I've nearly finished the big rewrite of devfs which adds
> proper locking and refcounting. That work was progressing nicely (but
> it's a big job), although it's temporarily stalled because of some
> important travel. Work on that will resume in the next couple of
> weeks. There's no point sending in an incomplete version of the code.

What about putting them somewhere in a CVS repository, so people can see
what's going on and maybe even can help out?
BTW you should really do something about your coding style, your code is
very confusing to read. I wouldn't care if it would be just some driver,
but devfs is supposed to be a very important part, so it would be nice
to use the same rules that apply to other important parts of the kernel.

bye, Roman

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-28 12:27                       ` Roman Zippel
@ 2001-10-28 13:16                         ` Alexander Viro
  0 siblings, 0 replies; 44+ messages in thread
From: Alexander Viro @ 2001-10-28 13:16 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Richard Gooch, Rik van Riel, Ryan Cumming, linux-kernel



On Sun, 28 Oct 2001, Roman Zippel wrote:

> What about putting them somewhere in a CVS repository, so people can see
> what's going on and maybe even can help out?

Looks like I'll get around to creating a CVS repository starting at the
last known code in a couple of days anyway...

> BTW you should really do something about your coding style, your code is
> very confusing to read. I wouldn't care if it would be just some driver,
> but devfs is supposed to be a very important part, so it would be nice
> to use the same rules that apply to other important parts of the kernel.

Good luck.

BTW, Richard - the last one for tonight:

devfs_unregister() vs. get_vfs_inode().  The latter blocks, so devfs_lookup()
and devfs_d_revalidate() can give you a nasty surprise - entry gets
unregistered while we allocate the inode and there's no connection between
it and inode or dentry at that point.  Then we merrily get dentry/inode
tied to unregistered devfs_entry.  And that includes reference _to_ dentry.
Enjoy...


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-27 19:19         ` Alexander Viro
  2001-10-27 19:26           ` Alain Knaff
@ 2001-10-28 20:40           ` Alain Knaff
  2001-10-28 20:57             ` Peter T. Breuer
  2001-10-28 21:42             ` Alexander Viro
  1 sibling, 2 replies; 44+ messages in thread
From: Alain Knaff @ 2001-10-28 20:40 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, Richard Gooch, linux-kernel

Appended to this mail is the "long live the struct block_device"
patch. It includes the stuff covered in the last patch as well. The
issue of stopping transfers in progress is not yet addressed.

How it works: rather than have bdput free up the struct block_device
unconditionnally as soon as the count drops to zero, forcefully
bd_forgetting any inodes that still reference it, it now checks the
list of inodes references it, and only frees it up if not referenced.

Moreover, bd_forget also calls __bdput to free up the struct
block_device when the number of referencers drops to zero later on.

Simple tests show this to make the cache workeable again, and an extra
printk in floppy_open allows to show that the struct block_device can
be freed up if needed (by creating high memory pressure).


diff -ur 2.4.13/linux/drivers/block/floppy.c linux/drivers/block/floppy.c
--- 2.4.13/linux/drivers/block/floppy.c	Sat Oct 27 09:42:34 2001
+++ linux/drivers/block/floppy.c	Sun Oct 28 21:21:46 2001
@@ -741,6 +741,7 @@
 		return UTESTF(FD_DISK_CHANGED);
 	if ((fd_inb(FD_DIR) ^ UDP->flags) & 0x80){
 		USETF(FD_VERIFY); /* verify write protection */
+		USETF(FD_DCL_SEEN); /* DCL for this drive has been seen */
 		if (UDRS->maxblock){
 			/* mark it changed */
 			USETF(FD_DISK_CHANGED);
@@ -3901,12 +3902,23 @@
 	return 0;
 }
 
+/* determines whether we can trust media change
+ * answers true if we have seen disk change at least once, and not marked as
+ * broken */
+static int floppy_can_trust_media_change(kdev_t dev)
+{
+	int drive=DRIVE(dev);
+	return (UTESTF(FD_DCL_SEEN) &&
+		((UDP->flags & FD_BROKEN_DCL) == 0));
+}
+
 static struct block_device_operations floppy_fops = {
 	open:			floppy_open,
 	release:		floppy_release,
 	ioctl:			fd_ioctl,
 	check_media_change:	check_floppy_change,
 	revalidate:		floppy_revalidate,
+	can_trust_media_change: floppy_can_trust_media_change
 };
 
 static void __init register_devfs_entries (int drive)
diff -ur 2.4.13/linux/fs/block_dev.c linux/fs/block_dev.c
--- 2.4.13/linux/fs/block_dev.c	Sun Oct 28 09:52:53 2001
+++ linux/fs/block_dev.c	Sun Oct 28 21:11:08 2001
@@ -346,19 +346,36 @@
 	inode->i_mapping = &inode->i_data;
 }
 
+/**
+ * Physically delete bdev iff inode list empty AND counter zero. Must be
+ * called with bdev_lock held. bdev_lock will be released on exit.
+ * Should be called by all functions which can make either of the conditions
+ * true (decrease counter (bdput) or free inode (bdforget)
+ */
+static void __bdput(struct block_device *bdev) {
+    int shouldDrop =
+	    (bdev->bd_inodes.next == &bdev->bd_inodes) &&
+	    (atomic_read(&bdev->bd_count) == 0);
+    if(shouldDrop) {
+	    if (bdev->bd_openers)
+		    BUG();
+	    list_del(&bdev->bd_hash);
+    }
+    spin_unlock(&bdev_lock);
+    if(shouldDrop) {
+	    if (bdev->bd_openers)
+		    BUG();
+	    iput(bdev->bd_inode);
+	    destroy_bdev(bdev);
+    }
+}
+
 void bdput(struct block_device *bdev)
 {
 	if (atomic_dec_and_lock(&bdev->bd_count, &bdev_lock)) {
-		struct list_head *p;
 		if (bdev->bd_openers)
 			BUG();
-		list_del(&bdev->bd_hash);
-		while ( (p = bdev->bd_inodes.next) != &bdev->bd_inodes ) {
-			__bd_forget(list_entry(p, struct inode, i_devices));
-		}
-		spin_unlock(&bdev_lock);
-		iput(bdev->bd_inode);
-		destroy_bdev(bdev);
+		__bdput(bdev);
 	}
 }
  
@@ -390,9 +407,17 @@
 
 void bd_forget(struct inode *inode)
 {
+	struct block_device *bdev;
 	spin_lock(&bdev_lock);
-	if (inode->i_bdev)
+	bdev = inode->i_bdev;
+	if (bdev) {
 		__bd_forget(inode);
+		if(inode != bdev->bd_inode) {
+			/* if not "root" inode, do the bdput */
+			__bdput(bdev);
+			return;
+		}
+	}
 	spin_unlock(&bdev_lock);
 }
 
@@ -601,8 +626,12 @@
 		__block_fsync(bd_inode);
 	else if (kind == BDEV_FS)
 		fsync_no_super(rdev);
-	if (!--bdev->bd_openers)
-		kill_bdev(bdev);
+	if (!--bdev->bd_openers) {
+		if(!bdev->bd_op->can_trust_media_change ||
+		   !bdev->bd_op->can_trust_media_change(rdev)) {
+			kill_bdev(bdev);
+		}
+	}
 	if (bdev->bd_op->release)
 		ret = bdev->bd_op->release(bd_inode, NULL);
 	if (!bdev->bd_openers)
diff -ur 2.4.13/linux/include/linux/fd.h linux/include/linux/fd.h
--- 2.4.13/linux/include/linux/fd.h	Fri Aug 13 21:16:16 1999
+++ linux/include/linux/fd.h	Sat Oct 27 13:37:08 2001
@@ -177,7 +177,8 @@
 				 * to clear media change status */
 	FD_UNUSED_BIT,
 	FD_DISK_CHANGED_BIT,	/* disk has been changed since last i/o */
-	FD_DISK_WRITABLE_BIT	/* disk is writable */
+	FD_DISK_WRITABLE_BIT,	/* disk is writable */
+	FD_DCL_SEEN_BIT		/* have we seen DCL? */
 };
 
 #define FDSETDRVPRM _IOW(2, 0x90, struct floppy_drive_params)
@@ -196,6 +197,7 @@
 #define FD_DISK_NEWCHANGE (1 << FD_DISK_NEWCHANGE_BIT)
 #define FD_DISK_CHANGED (1 << FD_DISK_CHANGED_BIT)
 #define FD_DISK_WRITABLE (1 << FD_DISK_WRITABLE_BIT)
+#define FD_DCL_SEEN (1 << FD_DCL_SEEN)
 
 	unsigned long spinup_date;
 	unsigned long select_date;
diff -ur 2.4.13/linux/include/linux/fs.h linux/include/linux/fs.h
--- 2.4.13/linux/include/linux/fs.h	Sat Oct 27 11:53:28 2001
+++ linux/include/linux/fs.h	Sat Oct 27 13:05:03 2001
@@ -793,6 +793,7 @@
 	int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long);
 	int (*check_media_change) (kdev_t);
 	int (*revalidate) (kdev_t);
+	int (*can_trust_media_change) (kdev_t);
 };
 
 /*

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-28 20:40           ` Alain Knaff
@ 2001-10-28 20:57             ` Peter T. Breuer
  2001-10-29  5:38               ` Alain Knaff
  2001-10-28 21:42             ` Alexander Viro
  1 sibling, 1 reply; 44+ messages in thread
From: Peter T. Breuer @ 2001-10-28 20:57 UTC (permalink / raw)
  To: alain; +Cc: linux kernel

"A month of sundays ago Alain Knaff wrote:"
> Appended to this mail is the "long live the struct block_device"
> patch. It includes the stuff covered in the last patch as well. The
> issue of stopping transfers in progress is not yet addressed.

Errr .. haven't read the patches. But are you doing something so that
the semantics of the action taken after a media check fails can
be overridden? The current invalidate_inodes is too strong for me,
since I am proxying a remote device, and I don't want to kill 
_all_ the local file descriptors when the remote media disappears. 
I need to at least continue to send down local ioctls!

No, no suggestions of an extra control device, please. Simplicity.

> +
>  static struct block_device_operations floppy_fops = {
>  	open:			floppy_open,
>  	release:		floppy_release,
>  	ioctl:			fd_ioctl,
>  	check_media_change:	check_floppy_change,
>  	revalidate:		floppy_revalidate,
> +	can_trust_media_change: floppy_can_trust_media_change
>  };

and I'd like to have "invalidate" as a method too.

Peter

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-28 20:40           ` Alain Knaff
  2001-10-28 20:57             ` Peter T. Breuer
@ 2001-10-28 21:42             ` Alexander Viro
  1 sibling, 0 replies; 44+ messages in thread
From: Alexander Viro @ 2001-10-28 21:42 UTC (permalink / raw)
  To: alain; +Cc: Linus Torvalds, Richard Gooch, linux-kernel



On Sun, 28 Oct 2001, Alain Knaff wrote:

> Appended to this mail is the "long live the struct block_device"
> patch. It includes the stuff covered in the last patch as well. The
> issue of stopping transfers in progress is not yet addressed.

Actually, both issues are not addressed.  With your patch bdev will
happily live after rmmod.  Please, wait a bit with that stuff.  I'll
post a variant tonight - still need to verify that it deals correctly
with cases like SCSI (disk going away without unregister_blkdev()).


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-28  8:45                     ` Richard Gooch
  2001-10-28  9:54                       ` Alexander Viro
  2001-10-28 12:27                       ` Roman Zippel
@ 2001-10-28 22:31                       ` Richard Gooch
  2001-10-28 23:11                         ` Alexander Viro
  2 siblings, 1 reply; 44+ messages in thread
From: Richard Gooch @ 2001-10-28 22:31 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Rik van Riel, Ryan Cumming, linux-kernel

Alexander Viro writes:
> 
> 
> On Sun, 28 Oct 2001, Richard Gooch wrote:
> 
> > Complete fucking bullshit. Over the last several months, I've been
> > sending a steady stream of bugfix patches to Linus and the list, and
> > if you'd been paying attention, you'd notice that in time they've been
> > applied.
> 
> OK, _now_ I'm really pissed off.  As far as I can see there is only
> one way to get you fix anything - posting to l-k.  This "steady
> stream" consists of what?  Let's see:
[...]
> Oh, and before that we have a (finally, only after a year of mentioning
> the crap in question, heavy-weight rant on l-k when I've finally ran out
> of patience _and_ detailed discussion on the possible fixes) fix for
> expand-entry-table races.
> 
> So far all I see is that beating you hard enough in public can make
> you fix the bugs explicitly pointed to you.  That's it.  As far as I
> can see you don't read your own code, judging by the fact that every
> damn look at fs/devfs/base.c shows a new hole within a couple of
> minutes _and_ said holes stay until posted on l-k.  Private mail
> doesn't work.  You read it, reply and ignore.  About hundred
> kilobytes of evidence available at request.

You don't get to see the bug reports or questions I respond to which
are sent to me privately or on the devfs list (I know you're not
subscribed:-). And you seem to have forgotten that I've responded to
questions or bug reports *from you* that you send privately to me,
sometimes Cc:ed to Linus. I've even responded to questions that you've
placed in the code. So it's simply not true that I only respond if
beat upon in public. Progress *is* being made, irrespective of your
"input".

As for the recent bug reports, yes, I've just seen them. I'll respond
(not because you've been flaming about it on the list) later this week
once I clear through my email backlog which accumulated while I was
off the 'net for a week. Yeah, it does take time to wade through all
the email, especially when I get greeted with a huge pile of flames.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-28 22:31                       ` Richard Gooch
@ 2001-10-28 23:11                         ` Alexander Viro
  0 siblings, 0 replies; 44+ messages in thread
From: Alexander Viro @ 2001-10-28 23:11 UTC (permalink / raw)
  To: Richard Gooch; +Cc: Rik van Riel, Ryan Cumming, linux-kernel



On Mon, 29 Oct 2001, Richard Gooch wrote:

> Alexander Viro writes:
> > So far all I see is that beating you hard enough in public can make
> > you fix the bugs explicitly pointed to you.  That's it.  As far as I
> > can see you don't read your own code, judging by the fact that every
> > damn look at fs/devfs/base.c shows a new hole within a couple of
> > minutes _and_ said holes stay until posted on l-k.  Private mail
> > doesn't work.  You read it, reply and ignore.  About hundred
                       ^^^^^^^^^^^^^^^^^^^^^^^^^
> > kilobytes of evidence available at request.
>
> You don't get to see the bug reports or questions I respond to which
> are sent to me privately or on the devfs list (I know you're not
> subscribed:-). And you seem to have forgotten that I've responded to
> questions or bug reports *from you* that you send privately to me,

I see what made its way in your code and your changelog.  As far as I
can see nothing contradicts description above.
 
1) You are maintainer of that code.
2) Couple of minutes of reading through it is enough to find a new hole.
3) There are dozens of such holes.
4) They had been there for years.

Conclusion:

You either can't see that stuff at all or you don't bother to spend even
minimal time looking for bugs.

See the problem with that situation?



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-28 20:57             ` Peter T. Breuer
@ 2001-10-29  5:38               ` Alain Knaff
  2001-10-29  6:07                 ` Alexander Viro
  0 siblings, 1 reply; 44+ messages in thread
From: Alain Knaff @ 2001-10-29  5:38 UTC (permalink / raw)
  To: ptb; +Cc: linux kernel

>"A month of sundays ago Alain Knaff wrote:"
>> Appended to this mail is the "long live the struct block_device"
>> patch. It includes the stuff covered in the last patch as well. The
>> issue of stopping transfers in progress is not yet addressed.
>
>Errr .. haven't read the patches.

Please do ;-) Or maybe, rather browse the initial mails of this
thread, which discuss the problem being fixed here, and the reason why
that problem was there in the first place.

>But are you doing something so that
>the semantics of the action taken after a media check fails can
>be overridden? The current invalidate_inodes is too strong for me,
>since I am proxying a remote device, and I don't want to kill 
>_all_ the local file descriptors when the remote media disappears. 
>I need to at least continue to send down local ioctls!
>
>No, no suggestions of an extra control device, please. Simplicity.

Don't worry. This patch does not do anything like that. No extra
control device is introduced. No killing of local file descriptors in
case a remote media disappears. Please read the context of the post
(Subject: Poor floppy performance). The discussion has unfortunately
been stretched over more than a week, with long silences in between
(I've been busy with other stuff in between, and thus had to come back
later, sorry), so the initial messages might be somewhat hard to find,
but they're there in the archives:
http://www.uwsg.iu.edu/hypermail/linux/kernel/, in the week of Oct
16-23.


All the patch it does is address the following problem: if all file
descriptors pointing to a block device are closed, the cache is
closed, no matter whether the media has been changed or not... This
clearly leads to suboptimal performance, especially with slow devices
such as the floppy.

The reason for this behaviour was threefold
 1. Apparently, for some media, check_media_change is not reliable, so
the VFS people decided not to trust it (don't ask...). Hence the
kill_bdev call, invoked after the last close, to throw away all
buffers
 2. Kill_bdev is also needed to stop any read-aheads in progress after
last close (makes sense: if there is no open filedescriptors, those
read-aheads can no longer safely be served, because the device driver
may assume that nobody needs the device any longer and thus
de-allocate critical resources: IRQ, DMA, bounce buffers...)
 3. Cached pages are identified by the pair (inode,index), and the
block device backend inode changes after the last close.

All the can_trust_media_change part does is skip the kill_bdev if the
device driver says "yes, you _can_ trust the answer of
check_media_change" (fix problem #1).

Problem number 2 is not yet addressed, Alex Viro is working on a patch
(probably by having a variant of invalidate_bdev that just stops
transfers in progress, loudly warn about dirty pages, but without
killing clean cached pages)

Number 3 is fixed by making struct block_device "long
lived". Formerly, it only existed as long as there were active open
descriptors using it; now it exists as long as there are frontend
inodes referencing it.

Another related issue (not yet addressed by my patch), is what happens
after rmmod. Right now, with the patch, the cache is so long-lived
that it hangs around even after an rmmod...

Personnally, I don't really like the "check_media_change not trusted"
semantics either... we better handle flaky media change indicators
internally in the driver, especially since the driver may have its own
limited internal caching (DMA bounce buffers, track buffers, ...). But
apparently other people have strong feelings about the issue, having
been burned badly by devices which occasionnally miss a media
change..., and who have chosen to ginore check_media_change
altogether, rather than fix those device drivers...

>> +
>>  static struct block_device_operations floppy_fops = {
>>  	open:			floppy_open,
>>  	release:		floppy_release,
>>  	ioctl:			fd_ioctl,
>>  	check_media_change:	check_floppy_change,
>>  	revalidate:		floppy_revalidate,
>> +	can_trust_media_change: floppy_can_trust_media_change
>>  };
>
>and I'd like to have "invalidate" as a method too.
>
>Peter

Regards,

Alain

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-29  5:38               ` Alain Knaff
@ 2001-10-29  6:07                 ` Alexander Viro
  2001-10-29  6:34                   ` Alain Knaff
  0 siblings, 1 reply; 44+ messages in thread
From: Alexander Viro @ 2001-10-29  6:07 UTC (permalink / raw)
  To: alain; +Cc: ptb, linux kernel



On Mon, 29 Oct 2001, Alain Knaff wrote:

> Problem number 2 is not yet addressed, Alex Viro is working on a patch
> (probably by having a variant of invalidate_bdev that just stops
> transfers in progress, loudly warn about dirty pages, but without
> killing clean cached pages)
> 
> Number 3 is fixed by making struct block_device "long
> lived". Formerly, it only existed as long as there were active open
> descriptors using it; now it exists as long as there are frontend
> inodes referencing it.

IMO that's bogus.  Correct test is "do we have any data in page cache
for this guy?" (combined with "... and if driver says that it goes
away, we'd better drop them all").  See the patch I've just sent to you
and Linus - right now I consider it as too dangerous for public testing, so...

Notice that it _doesn't_ trust check_media_change() - it still has
both detach_metadata() (needed in all cases) and truncate_inode_pages()
(kills cache, needed only for drivers with b0rken ->check_media_change())
in blkdev_put().  You'll need to make the latter call conditional to
actually keep the cache around, but I'd really like to make sure that
infrastructure changes are sane before doing that step.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-29  6:07                 ` Alexander Viro
@ 2001-10-29  6:34                   ` Alain Knaff
  0 siblings, 0 replies; 44+ messages in thread
From: Alain Knaff @ 2001-10-29  6:34 UTC (permalink / raw)
  To: Alexander Viro; +Cc: ptb, linux kernel

>
>
>On Mon, 29 Oct 2001, Alain Knaff wrote:
>
>> Problem number 2 is not yet addressed, Alex Viro is working on a patch
>> (probably by having a variant of invalidate_bdev that just stops
>> transfers in progress, loudly warn about dirty pages, but without
>> killing clean cached pages)
>> 
>> Number 3 is fixed by making struct block_device "long
>> lived". Formerly, it only existed as long as there were active open
>> descriptors using it; now it exists as long as there are frontend
>> inodes referencing it.
>
>IMO that's bogus.

As far as I understood, the same logic is used for all other kinds of
inodes, isn't it? I agree that it is suboptimal (might stay around
longer than its pages), but at least it works in simple cases
(i.e. stays cached normally, and is evicted from cache under memory
pressure).

>Correct test is "do we have any data in page cache
>for this guy?" (combined with "... and if driver says that it goes
>away, we'd better drop them all").  See the patch I've just sent to you
>and Linus - right now I consider it as too dangerous for public testing, so...

Got the patch... but only after I sent that mail to Peter. I'll have a
look at it, and try to test it. But due to other activities, I'm
afraid I'll only be able to get around to it tomorrow evening.

>Notice that it _doesn't_ trust check_media_change() - it still has
>both detach_metadata() (needed in all cases) and truncate_inode_pages()
>(kills cache, needed only for drivers with b0rken ->check_media_change())
>in blkdev_put().  You'll need to make the latter call conditional

With your patch, will making that call conditional still correctly
stop any transfers which are already in progress? (a problem that has
existed with my patches)? Or does this problem need to be addressed
separately?

>to
>actually keep the cache around, but I'd really like to make sure that
>infrastructure changes are sane before doing that step.

Ok, I'll give it a spin tomorrow evening.



Btw, the more I think about the check_media_change issue, the more I
come to the conclusion that the cases of broken media change detection
are best dealt with from within the driver. Indeed, firstly why
encumber the vast majority of drivers, which are non-broken, or which
already deal with the situation, just in order to handle a few bad
apples? Secondly, the broken drivers might have internal caches (track
buffers, etc.), and just ignoring check_media_change _outside_ the
driver won't fix all of the problem, as the internal caches will not
be invalidated.

Thus, may I suggest the following course of action in case a missed
media change detection problem with a driver: make a good-faith effort
to contact its maintainer, and have the problem fixed _in_the_driver_
rather than to have media change ignored centrally. Same goes for
other issues as well: work _with_ the device drivers, rather than
_against_ them. I think, in the long run this will save all of us lots
of grief.

Yes, we can have can_trust_media_change type functions, but it just
makes all drivers more complex, and eventually it will lead to a
situation where bad device drivers might unconditionnally return true
to it anyways, and we'd be back to square one.

Alain

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun
  2001-10-27 18:26         ` Alexander Viro
  2001-10-27 19:02           ` more devfs fun Alexander Viro
@ 2001-11-06  5:06           ` Richard Gooch
  1 sibling, 0 replies; 44+ messages in thread
From: Richard Gooch @ 2001-11-06  5:06 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel, Linus Torvalds

Alexander Viro writes:
> 	BTW, what the hell is that?
> /*
>  * hwgraph_bdevsw_get - returns the fops of the given devfs entry.
> */
> struct file_operations * 
> hwgraph_bdevsw_get(devfs_handle_t de)
> {
>         return(devfs_get_ops(de));
> }
> 
> 	It's arch/ia64/sn/io/hcl.c.  The funny thing being, the thing
> you will get from devfs_get_ops() will _not_ be struct
> file_operations *.  And that's aside of the fact that any use of
> that function is very likely to be racy as hell.  Sigh...

Sigh indeed. I didn't write that code. Looks like someone is (ab)using
the devfs API. Contact the maintainer of that code for insight.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun
  2001-10-27 19:02           ` more devfs fun Alexander Viro
  2001-10-27 19:16             ` Alexander Viro
@ 2001-11-06  5:17             ` Richard Gooch
  1 sibling, 0 replies; 44+ messages in thread
From: Richard Gooch @ 2001-11-06  5:17 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel, Linus Torvalds

Alexander Viro writes:
> 	It just gets better and better...

Sorry, Al. You're wrong on this one...

> in get_removable_partition()
>         if (strcmp (de->name, "disc") == 0) return check_disc_changed (de);
> 
> with 
>     if ( name && (namelen < 1) ) namelen = strlen (name);
>     if ( ( new = kmalloc (sizeof *new + namelen, GFP_KERNEL) ) == NULL )
>         return NULL;
> ...
>     if (name) memcpy (new->name, name, namelen);
> 
> in create_entry().
> 
> IOW, ->name is not NUL-terminated.

In fact, it is. Take a look at the declaration of struct devfs_entry
and you'll see that there is one byte already taken up for the name
string. So adding the length of the string yields storage for len+1
bytes. IOW, space for the NULL-terminator. And right after the call to
kmalloc(), I call:
	memset (new, 0, sizeof *new + namelen);

which zeros everything, including the terminator.

If I missed something as basic as NULL-termination, the thing probably
wouldn't even boot!

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-27 19:16             ` Alexander Viro
  2001-10-27 19:50               ` more devfs fun (Piled Higher and Deeper) Alexander Viro
  2001-10-28 10:06               ` Lorenzo Allegrucci
@ 2001-11-06  5:21               ` Richard Gooch
  2 siblings, 0 replies; 44+ messages in thread
From: Richard Gooch @ 2001-11-06  5:21 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel, Linus Torvalds

Alexander Viro writes:
> devfs_rmdir() checks that directory is empty.  Then it calls
> devfsd_notify_one(), which can block.  Then it marks the entry
> unregistered and reports success.
> 
> Guess what will happen if devfs_register() will happen at that
> moment...

Yeah, I fixed that one a couple of months ago in the new tree. I now
unhook from the directory structure before sending the notification.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-10-27 19:50               ` more devfs fun (Piled Higher and Deeper) Alexander Viro
  2001-10-27 20:06                 ` Alexander Viro
  2001-10-27 21:01                 ` Alexander Viro
@ 2001-11-06  6:35                 ` Richard Gooch
  2001-11-06  7:10                   ` Alexander Viro
  2001-11-06  7:14                   ` Richard Gooch
  2 siblings, 2 replies; 44+ messages in thread
From: Richard Gooch @ 2001-11-06  6:35 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel, Linus Torvalds

Alexander Viro writes:
> ... and one more - devfs_unregister() on a directory happening when
> mknod() in that directory sleeps in create_entry() (on kmalloc()).
> 
> Do you ever read your own code?  It's not like this stuff was hard
> to find - I'm just poking into random places and every damn one
> contains a hole.  Sigh...
> 
> Oh, BTW - here's another one:  think what happens if tree-walking in
> unregister() steps on the entry we are currently removing in
> devfs_unlink().

Yep, as I've long ago admitted, there are races in the old devfs
code, which couldn't be fixed without proper locking. And that's why
I've been wanting to add said locking for ages, and have been
frustrated at interruptions which delayed that work. And I'm very
happy to get the first cut of the new code released.

That said, try to understand (before getting emotional and launching
off a tirade such as the one last week) that different people have
different priorities, and mine was to provide functionality first, and
worry about hostile attacks/exploits later. This is not unreasonable
if you consider that the initial target machines for devfs were:
- my personal boxes (which are not public machines)
- big-iron machines sitting behind a firewall
- small university group sitting behind a firewall (and I know where
  all the users live:-)

I know your favourite horror scenario is the public machines available
to the undergrads, but not everyone works in such a hostile
environment.

Anyway, I hope Linus wasn't bored by all the messages you Cc:ed to
him for your^H^H^H^Hhis benefit :-O

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-27 18:00       ` Alain Knaff
  2001-10-27 18:13         ` Alexander Viro
  2001-10-27 18:26         ` Alexander Viro
@ 2001-11-06  7:01         ` Richard Gooch
  2001-11-06  7:03           ` Alexander Viro
  2001-12-09  5:38           ` Devfs races with block devices Richard Gooch
  2001-11-06  7:19         ` Poor floppy performance in kernel 2.4.10 Richard Gooch
  3 siblings, 2 replies; 44+ messages in thread
From: Richard Gooch @ 2001-11-06  7:01 UTC (permalink / raw)
  To: Alexander Viro; +Cc: alain, Linus Torvalds, linux-kernel

Alexander Viro writes:
> 	BTW, here's one more devfs rmmod race: check_disk_changed() in
> fs/devfs/base.c.  Calling ->check_media_change() with no protection
> whatsoever.  If rmmod happens at that point...

How is this different from a call to fs/block_dev.c:check_disk_change()
which also has no protection?

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-11-06  7:01         ` Poor floppy performance in kernel 2.4.10 Richard Gooch
@ 2001-11-06  7:03           ` Alexander Viro
  2001-12-09  5:38           ` Devfs races with block devices Richard Gooch
  1 sibling, 0 replies; 44+ messages in thread
From: Alexander Viro @ 2001-11-06  7:03 UTC (permalink / raw)
  To: Richard Gooch; +Cc: alain, Linus Torvalds, linux-kernel



On Tue, 6 Nov 2001, Richard Gooch wrote:

> Alexander Viro writes:
> > 	BTW, here's one more devfs rmmod race: check_disk_changed() in
> > fs/devfs/base.c.  Calling ->check_media_change() with no protection
> > whatsoever.  If rmmod happens at that point...
> 
> How is this different from a call to fs/block_dev.c:check_disk_change()
> which also has no protection?

It's called either from driver itself (and then whatever protects caller
protects it as well) or after we'd pinned the thing down by blkdev_get().


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-11-06  6:35                 ` Richard Gooch
@ 2001-11-06  7:10                   ` Alexander Viro
  2001-11-06  7:14                   ` Richard Gooch
  1 sibling, 0 replies; 44+ messages in thread
From: Alexander Viro @ 2001-11-06  7:10 UTC (permalink / raw)
  To: Richard Gooch; +Cc: linux-kernel, Linus Torvalds



On Mon, 5 Nov 2001, Richard Gooch wrote:

> Yep, as I've long ago admitted, there are races in the old devfs
> code, which couldn't be fixed without proper locking. And that's why
> I've been wanting to add said locking for ages, and have been
> frustrated at interruptions which delayed that work. And I'm very
> happy to get the first cut of the new code released.

BTW, new code still has both aforementioned races - detaching entries
from the tree doesn't help with that.

> That said, try to understand (before getting emotional and launching
> off a tirade such as the one last week) that different people have
> different priorities, and mine was to provide functionality first, and
> worry about hostile attacks/exploits later. This is not unreasonable
> if you consider that the initial target machines for devfs were:
> - my personal boxes (which are not public machines)
> - big-iron machines sitting behind a firewall
> - small university group sitting behind a firewall (and I know where
>   all the users live:-)

That's nice, but that had stopped being the case as soon as you've proposed
devfs for inclusion into the tree...


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-11-06  6:35                 ` Richard Gooch
  2001-11-06  7:10                   ` Alexander Viro
@ 2001-11-06  7:14                   ` Richard Gooch
  2001-11-06  7:17                     ` Alexander Viro
  2001-11-06 20:14                     ` Richard Gooch
  1 sibling, 2 replies; 44+ messages in thread
From: Richard Gooch @ 2001-11-06  7:14 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel, Linus Torvalds

Alexander Viro writes:
> 
> 
> On Mon, 5 Nov 2001, Richard Gooch wrote:
> 
> > Yep, as I've long ago admitted, there are races in the old devfs
> > code, which couldn't be fixed without proper locking. And that's why
> > I've been wanting to add said locking for ages, and have been
> > frustrated at interruptions which delayed that work. And I'm very
> > happy to get the first cut of the new code released.
> 
> BTW, new code still has both aforementioned races - detaching
> entries from the tree doesn't help with that.

Which "both"? You sent quite a few messages, listing more than two
races. I'm still wading through the list.

> > That said, try to understand (before getting emotional and launching
> > off a tirade such as the one last week) that different people have
> > different priorities, and mine was to provide functionality first, and
> > worry about hostile attacks/exploits later. This is not unreasonable
> > if you consider that the initial target machines for devfs were:
> > - my personal boxes (which are not public machines)
> > - big-iron machines sitting behind a firewall
> > - small university group sitting behind a firewall (and I know where
> >   all the users live:-)
> 
> That's nice, but that had stopped being the case as soon as you've
> proposed devfs for inclusion into the tree...

Marked CONFIG_EXPERIMENTAL...

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-11-06  7:14                   ` Richard Gooch
@ 2001-11-06  7:17                     ` Alexander Viro
  2001-11-06 20:14                     ` Richard Gooch
  1 sibling, 0 replies; 44+ messages in thread
From: Alexander Viro @ 2001-11-06  7:17 UTC (permalink / raw)
  To: Richard Gooch; +Cc: linux-kernel, Linus Torvalds



On Tue, 6 Nov 2001, Richard Gooch wrote:

> > BTW, new code still has both aforementioned races - detaching
                                 ^^^^^^^^^^^^^^
> > entries from the tree doesn't help with that.
> 
> Which "both"? You sent quite a few messages, listing more than two
> races. I'm still wading through the list.

Sorry - trimmed more than I should have.  Ones mentioned in the message you
were replying to - unregister() on parent vs. mknod() or unlink().


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-10-27 18:00       ` Alain Knaff
                           ` (2 preceding siblings ...)
  2001-11-06  7:01         ` Poor floppy performance in kernel 2.4.10 Richard Gooch
@ 2001-11-06  7:19         ` Richard Gooch
  2001-11-06  7:22           ` Alexander Viro
  3 siblings, 1 reply; 44+ messages in thread
From: Richard Gooch @ 2001-11-06  7:19 UTC (permalink / raw)
  To: Alexander Viro; +Cc: alain, Linus Torvalds, linux-kernel

Alexander Viro writes:
> On Sat, 27 Oct 2001, Alain Knaff wrote:
> > Good. But then, what's the point of devfs=only ? I assumed this was
> 
> Ask Richard.  Maybe you will be able to get a straight answer.  I
> hadn't...

IIRC that I've told you this already. Here it is again: devfs=only
serves as a way of enforcing that the devfs entry->driver ops
connection is the only way of accessing a driver. It deliberately
breaks the fallback to major-table-lookup.

And it actually works now. It doesn't require massive 2.5 changes.
When I boot with devfs=only (which is always), my system still works.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Poor floppy performance in kernel 2.4.10
  2001-11-06  7:19         ` Poor floppy performance in kernel 2.4.10 Richard Gooch
@ 2001-11-06  7:22           ` Alexander Viro
  0 siblings, 0 replies; 44+ messages in thread
From: Alexander Viro @ 2001-11-06  7:22 UTC (permalink / raw)
  To: Richard Gooch; +Cc: alain, Linus Torvalds, linux-kernel



On Tue, 6 Nov 2001, Richard Gooch wrote:

> IIRC that I've told you this already. Here it is again: devfs=only
> serves as a way of enforcing that the devfs entry->driver ops
> connection is the only way of accessing a driver. It deliberately
> breaks the fallback to major-table-lookup.
> 
> And it actually works now. It doesn't require massive 2.5 changes.
> When I boot with devfs=only (which is always), my system still works.

Try rmmod while blkdev_get() (from mount(2) or swapon(2)) sleeps on ->bd_sem.
And think what happens with value of ->bd_op.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: more devfs fun (Piled Higher and Deeper)
  2001-11-06  7:14                   ` Richard Gooch
  2001-11-06  7:17                     ` Alexander Viro
@ 2001-11-06 20:14                     ` Richard Gooch
  1 sibling, 0 replies; 44+ messages in thread
From: Richard Gooch @ 2001-11-06 20:14 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel, Linus Torvalds

Alexander Viro writes:
> 
> 
> On Tue, 6 Nov 2001, Richard Gooch wrote:
> 
> > > BTW, new code still has both aforementioned races - detaching
>                                  ^^^^^^^^^^^^^^
> > > entries from the tree doesn't help with that.
> > 
> > Which "both"? You sent quite a few messages, listing more than two
> > races. I'm still wading through the list.
> 
> Sorry - trimmed more than I should have.  Ones mentioned in the
> message you were replying to - unregister() on parent vs. mknod() or
> unlink().

Ah, those. Next patch (just released) cleans that up a bit more. This
stuff is still a work in progress.

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Devfs races with block devices
  2001-11-06  7:01         ` Poor floppy performance in kernel 2.4.10 Richard Gooch
  2001-11-06  7:03           ` Alexander Viro
@ 2001-12-09  5:38           ` Richard Gooch
  1 sibling, 0 replies; 44+ messages in thread
From: Richard Gooch @ 2001-12-09  5:38 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-kernel

[Finally, after other things have stabilised, I'm ready to tackle this
issue]

Alexander Viro writes:
> 	BTW, here's one more devfs rmmod race: check_disk_changed() in
> fs/devfs/base.c.  Calling ->check_media_change() with no protection
> whatsoever.  If rmmod happens at that point...

How about if I do this sequence:
	lock_kernel();
	devfs checks;
	if (bd_op->owner)
		__MOD_INC_USE_COUNT(bd_op->owner);
	revalidate();
	if (bd_op->owner)
		__MOD_DEC_USE_COUNT(bd_op->owner);
	unlock_kernel();

Is there any reason why that won't work?

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2001-12-09  5:38 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-10-27 15:00 Poor floppy performance in kernel 2.4.10 Alain Knaff
2001-10-27 15:15 ` Alexander Viro
2001-10-27 17:12   ` Alain Knaff
2001-10-27 17:42     ` Alexander Viro
2001-10-27 18:00       ` Alain Knaff
2001-10-27 18:13         ` Alexander Viro
2001-10-27 18:26         ` Alexander Viro
2001-10-27 19:02           ` more devfs fun Alexander Viro
2001-10-27 19:16             ` Alexander Viro
2001-10-27 19:50               ` more devfs fun (Piled Higher and Deeper) Alexander Viro
2001-10-27 20:06                 ` Alexander Viro
2001-10-27 21:09                   ` Ryan Cumming
2001-10-27 21:22                     ` Alexander Viro
2001-10-28  1:00                     ` Rik van Riel
2001-10-28  8:45                     ` Richard Gooch
2001-10-28  9:54                       ` Alexander Viro
2001-10-28 12:27                       ` Roman Zippel
2001-10-28 13:16                         ` Alexander Viro
2001-10-28 22:31                       ` Richard Gooch
2001-10-28 23:11                         ` Alexander Viro
2001-10-27 21:01                 ` Alexander Viro
2001-11-06  6:35                 ` Richard Gooch
2001-11-06  7:10                   ` Alexander Viro
2001-11-06  7:14                   ` Richard Gooch
2001-11-06  7:17                     ` Alexander Viro
2001-11-06 20:14                     ` Richard Gooch
2001-10-28 10:06               ` Lorenzo Allegrucci
2001-11-06  5:21               ` Richard Gooch
2001-11-06  5:17             ` more devfs fun Richard Gooch
2001-11-06  5:06           ` Richard Gooch
2001-11-06  7:01         ` Poor floppy performance in kernel 2.4.10 Richard Gooch
2001-11-06  7:03           ` Alexander Viro
2001-12-09  5:38           ` Devfs races with block devices Richard Gooch
2001-11-06  7:19         ` Poor floppy performance in kernel 2.4.10 Richard Gooch
2001-11-06  7:22           ` Alexander Viro
2001-10-27 19:13       ` Alain Knaff
2001-10-27 19:19         ` Alexander Viro
2001-10-27 19:26           ` Alain Knaff
2001-10-28 20:40           ` Alain Knaff
2001-10-28 20:57             ` Peter T. Breuer
2001-10-29  5:38               ` Alain Knaff
2001-10-29  6:07                 ` Alexander Viro
2001-10-29  6:34                   ` Alain Knaff
2001-10-28 21:42             ` Alexander Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).