linux-kernel.vger.kernel.org archive mirror
* Sysfs attributes racing with unregistration
@ 2012-01-04 16:52 Alan Stern
  2012-01-04 17:18 ` Tejun Heo
  0 siblings, 1 reply; 25+ messages in thread
From: Alan Stern @ 2012-01-04 16:52 UTC
  To: Tejun Heo; +Cc: Kernel development list

Tejun:

Can you explain the current situation regarding access to sysfs
attributes and possible races with kobject removal?  I have two
questions in particular:

	What happens if one thread calls an attribute's show or
	store method concurrently with another thread unregistering
	the underlying kobject?

	What happens if a thread continues to hold an open fd 
	reference to a sysfs attribute file after the kobject is
	unregistered, and then tries to read or write that fd?

If there are any guarantees about what happens in these situations, I 
can't find them in the kernel source.

And of course, if you can think of any other matters related to this 
topic, please mention them.

Alan Stern



* Re: Sysfs attributes racing with unregistration
  2012-01-04 16:52 Sysfs attributes racing with unregistration Alan Stern
@ 2012-01-04 17:18 ` Tejun Heo
  2012-01-04 18:13   ` Eric W. Biederman
  2012-01-04 18:13   ` Sysfs attributes racing with unregistration Alan Stern
  0 siblings, 2 replies; 25+ messages in thread
From: Tejun Heo @ 2012-01-04 17:18 UTC
  To: Alan Stern
  Cc: Kernel development list, Eric Biederman, Greg Kroah-Hartman, Kay Sievers

Hello, Alan.

On Wed, Jan 04, 2012 at 11:52:20AM -0500, Alan Stern wrote:
> Can you explain the current situation regarding access to sysfs
> attributes and possible races with kobject removal?  I have two
> questions in particular:

Heh, I haven't looked at sysfs code seriously for years now and my
memory sucks to begin with, so please take whatever I say with a
gigantic grain of salt.  Eric has been looking at sysfs a lot lately
so he probably can answer these best.  Adding him, Greg and Kay - hi!
guys.

> 	What happens if one thread calls an attribute's show or
> 	store method concurrently with another thread unregistering
> 	the underlying kobject?

sysfs nodes have two reference counts - one for object lifespan and
the other for active usage.  The latter is called active and is acquired
and released using sysfs_get/put_active().  Any callback invocation
should be performed while holding an active reference.  On removal,
sysfs_deactivate() marks the active reference count for deactivation
so that no new active reference is given out and waits for the
in-flight ones to drain.  IOW, removal makes sure new invocations of
callbacks fail and waits for in-progress ones to finish before
proceeding with removal.
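A minimal userspace sketch of the active-reference scheme described above, assuming C11 atomics and a spin-wait with `sched_yield()` standing in for whatever the kernel actually sleeps on. The names (`struct node`, `node_get_active()`, `DEACTIVATED_BIAS`, ...) are hypothetical and this is not the real sysfs code; it only illustrates the idea of adding a large negative bias on removal so that new gets fail while in-flight users are still counted.

```c
#include <assert.h>
#include <limits.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bias added on removal: drives the counter negative so
 * that no new active reference can be handed out. */
#define DEACTIVATED_BIAS INT_MIN

struct node {
    atomic_int active;   /* >= 0: number of callbacks in flight */
};

/* Take an active reference; fails once the node is deactivated. */
static bool node_get_active(struct node *n)
{
    int v = atomic_load(&n->active);

    while (v >= 0) {
        /* On failure, v is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&n->active, &v, v + 1))
            return true;
    }
    return false;
}

static void node_put_active(struct node *n)
{
    atomic_fetch_sub(&n->active, 1);
}

/* Removal, step 1: refuse all new active references.  Signed atomic
 * arithmetic wraps silently in C11, so adding INT_MIN is well defined. */
static void node_mark_deactivated(struct node *n)
{
    atomic_fetch_add(&n->active, DEACTIVATED_BIAS);
}

/* Removal, step 2: wait until in-flight callbacks have drained. */
static void node_wait_drained(struct node *n)
{
    while (atomic_load(&n->active) != DEACTIVATED_BIAS)
        sched_yield();
}

/* Full removal: block new callers, then wait for existing ones. */
static void node_deactivate(struct node *n)
{
    node_mark_deactivated(n);
    node_wait_drained(n);
}
```

Removal is split into a mark step and a wait step only so that a single thread can deactivate while it still holds a reference; `node_deactivate()` runs both back to back.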

> 	What happens if a thread continues to hold an open fd 
> 	reference to a sysfs attribute file after the kobject is
> 	unregistered, and then tries to read or write that fd?

Active reference is held only for the duration of each callback
invocation.  Userland can't prolong the existence of an active reference.
The duration of callback execution is the only deciding factor.

Someone (I think Eric, right?) was trying to generalize the semantics
to vfs layer so that severance/revocation capability is generally
available.  IIRC, it didn't get through tho.

Thanks.

-- 
tejun


* Re: Sysfs attributes racing with unregistration
  2012-01-04 17:18 ` Tejun Heo
@ 2012-01-04 18:13   ` Eric W. Biederman
  2012-01-04 19:41     ` Alan Stern
  2012-01-04 18:13   ` Sysfs attributes racing with unregistration Alan Stern
  1 sibling, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2012-01-04 18:13 UTC
  To: Tejun Heo
  Cc: Alan Stern, Kernel development list, Greg Kroah-Hartman, Kay Sievers

Tejun Heo <tj@kernel.org> writes:

> Hello, Alan.
>
> On Wed, Jan 04, 2012 at 11:52:20AM -0500, Alan Stern wrote:
>> Can you explain the current situation regarding access to sysfs
>> attributes and possible races with kobject removal?  I have two
>> questions in particular:
>
> Heh, I haven't looked at sysfs code seriously for years now and my
> memory sucks to begin with, so please take whatever I say with a
> gigantic grain of salt.  Eric has been looking at sysfs a lot lately
> so he probably can answer these best.  Adding him, Greg and Kay - hi!
> guys.
>
>> 	What happens if one thread calls an attribute's show or
>> 	store method concurrently with another thread unregistering
>> 	the underlying kobject?

>
> sysfs nodes have two reference counts - one for object lifespan and
> the other for active usage.  The latter is called active and is acquired
> and released using sysfs_get/put_active().  Any callback invocation
> should be performed while holding an active reference.  On removal,
> sysfs_deactivate() marks the active reference count for deactivation
> so that no new active reference is given out and waits for the
> in-flight ones to drain.  IOW, removal makes sure new invocations of
> callbacks fail and waits for in-progress ones to finish before
> proceeding with removal.

Or, in simple terms:

If the unregister call happens first, then we do not call the show method.

If the show method happens first, the unregister waits until the show
method is complete before letting the unregistration proceed.

Furthermore, lockdep models this wait as a reader/writer lock, so lockdep
should be able to warn you about deadlocks triggered by waiting for the
unregistration to complete.

>> 	What happens if a thread continues to hold an open fd 
>> 	reference to a sysfs attribute file after the kobject is
>> 	unregistered, and then tries to read or write that fd?
>
> Active reference is held only for the duration of each callback
> invocation.  Userland can't prolong the existence of an active reference.
> The duration of callback execution is the only deciding factor.

The fd only pins core sysfs data structures in memory.

The fd remains usable (in the -EIO/-EBADF sense of usable) even after
the kobject is unregistered.
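A single-threaded sketch of the split between the lifetime reference, which an open fd pins, and the per-call liveness check. All names are hypothetical, the per-call check is reduced to a flag (real sysfs uses the active reference), and returning `-EIO` is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a sysfs node (single-threaded sketch). */
struct snode {
    int refs;        /* lifetime references: owner plus open files */
    bool removed;    /* set when the kobject is unregistered */
    int value;       /* stands in for what show() would report */
};

/* An open file pins the node's memory, not the kobject behind it. */
struct open_file {
    struct snode *node;
};

static void snode_put(struct snode *n)
{
    if (--n->refs == 0)
        free(n);
}

static struct open_file file_open(struct snode *n)
{
    n->refs++;
    return (struct open_file){ .node = n };
}

/* Each read checks liveness only for its own duration; holding the
 * fd does not keep the attribute alive. */
static int file_read(struct open_file *f, int *out)
{
    if (f->node->removed)
        return -EIO;
    *out = f->node->value;   /* the show() callback would run here */
    return 0;
}

static void file_close(struct open_file *f)
{
    snode_put(f->node);
    f->node = NULL;
}

/* Unregistration drops the owner's reference; open files keep the
 * memory around, but every later read fails. */
static void snode_unregister(struct snode *n)
{
    n->removed = true;
    snode_put(n);
}
```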

> Someone (I think Eric, right?) was trying to generalize the semantics
> to vfs layer so that severance/revocation capability is generally
> available.  IIRC, it didn't get through tho.

Unfortunately I didn't have time to complete the effort of those
patches.  The approach was not fundamentally rejected but it needed a
clear and convincing use case as well as some strong scrutiny.  But
fundamentally, finding a way to do that was seen as interesting,
if it could be solved without slowing down the existing cases.

Eric



* Re: Sysfs attributes racing with unregistration
  2012-01-04 17:18 ` Tejun Heo
  2012-01-04 18:13   ` Eric W. Biederman
@ 2012-01-04 18:13   ` Alan Stern
  2012-01-04 18:20     ` Tejun Heo
  1 sibling, 1 reply; 25+ messages in thread
From: Alan Stern @ 2012-01-04 18:13 UTC
  To: Tejun Heo
  Cc: Kernel development list, Eric Biederman, Greg Kroah-Hartman, Kay Sievers

On Wed, 4 Jan 2012, Tejun Heo wrote:

> Hello, Alan.
> 
> On Wed, Jan 04, 2012 at 11:52:20AM -0500, Alan Stern wrote:
> > Can you explain the current situation regarding access to sysfs
> > attributes and possible races with kobject removal?  I have two
> > questions in particular:
> 
> Heh, I haven't looked at sysfs code seriously for years now and my
> memory sucks to begin with, so please take whatever I say with a
> gigantic grain of salt.  Eric has been looking at sysfs a lot lately
> so he probably can answer these best.  Adding him, Greg and Kay - hi!
> guys.
> 
> > 	What happens if one thread calls an attribute's show or
> > 	store method concurrently with another thread unregistering
> > 	the underlying kobject?
> 
> sysfs nodes have two reference counts - one for object lifespan and
> the other for active usage.  The latter is called active and is acquired
> and released using sysfs_get/put_active().  Any callback invocation
> should be performed while holding an active reference.  On removal,
> sysfs_deactivate() marks the active reference count for deactivation
> so that no new active reference is given out and waits for the
> in-flight ones to drain.  IOW, removal makes sure new invocations of
> callbacks fail and waits for in-progress ones to finish before
> proceeding with removal.
> 
> > 	What happens if a thread continues to hold an open fd 
> > 	reference to a sysfs attribute file after the kobject is
> > 	unregistered, and then tries to read or write that fd?
> 
> Active reference is held only for the duration of each callback
> invocation.  Userland can't prolong the existence of an active reference.
> The duration of callback execution is the only deciding factor.
> 
> Someone (I think Eric, right?) was trying to generalize the semantics
> to vfs layer so that severance/revocation capability is generally
> available.  IIRC, it didn't get through tho.

That's great; it's just what I wanted to know.  Thanks.

Now, looking through the code, I wonder why sysfs_{get,put}_active() 
and sysfs_deactivate() don't use a real rwsem.  Why go to all the 
effort of imitating one?  Is it just to save space?

Alan Stern



* Re: Sysfs attributes racing with unregistration
  2012-01-04 18:13   ` Sysfs attributes racing with unregistration Alan Stern
@ 2012-01-04 18:20     ` Tejun Heo
  0 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-01-04 18:20 UTC
  To: Alan Stern
  Cc: Kernel development list, Eric Biederman, Greg Kroah-Hartman, Kay Sievers

Hello,

On Wed, Jan 04, 2012 at 01:13:41PM -0500, Alan Stern wrote:
> Now, looking through the code, I wonder why sysfs_{get,put}_active() 
> and sysfs_deactivate() don't use a real rwsem.  Why go to all the 
> effort of imitating one?  Is it just to save space?

Hmmm... maybe there was something which prevented that or maybe I was
just being stupid.  I don't really remember.  Space is a fairly
important consideration too.  Depending on configuration, there can be
a LOT of sysfs_dirents and memory consumption from sysfs has been a
real problem.

Thanks.

-- 
tejun


* Re: Sysfs attributes racing with unregistration
  2012-01-04 18:13   ` Eric W. Biederman
@ 2012-01-04 19:41     ` Alan Stern
  2012-01-05  3:07       ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: Alan Stern @ 2012-01-04 19:41 UTC
  To: Eric W. Biederman
  Cc: Tejun Heo, Kernel development list, Greg Kroah-Hartman, Kay Sievers

On Wed, 4 Jan 2012, Eric W. Biederman wrote:

> > Someone (I think Eric, right?) was trying to generalize the semantics
> > to vfs layer so that severance/revocation capability is generally
> > available.  IIRC, it didn't get through tho.
> 
> Unfortunately I didn't have time to complete the effort of those
> patches.  The approach was not fundamentally rejected but it needed a
> clear and convincing use case as well as some strong scrutiny.  But
> fundamentally, finding a way to do that was seen as interesting,
> if it could be solved without slowing down the existing cases.

Ted Ts'o has been talking about something similar but not the same -- a
way to revoke an entire filesystem.  For example, see commit
7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific
kludge to avoid an oops after the disk disappears).

The use case for that is obvious and widespread: Somebody yanks out a
USB drive without unmounting it first.

Alan Stern



* Re: Sysfs attributes racing with unregistration
  2012-01-04 19:41     ` Alan Stern
@ 2012-01-05  3:07       ` Eric W. Biederman
  2012-01-05 15:13         ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Alan Stern
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2012-01-05  3:07 UTC
  To: Alan Stern
  Cc: Tejun Heo, Kernel development list, Greg Kroah-Hartman, Kay Sievers

Alan Stern <stern@rowland.harvard.edu> writes:

> On Wed, 4 Jan 2012, Eric W. Biederman wrote:
>
>> > Someone (I think Eric, right?) was trying to generalize the semantics
>> > to vfs layer so that severance/revocation capability is generally
>> > available.  IIRC, it didn't get through tho.
>> 
>> Unfortunately I didn't have time to complete the effort of those
>> patches.  The approach was not fundamentally rejected but it needed a
>> clear and convincing use case as well as some strong scrutiny.  But
>> fundamentally, finding a way to do that was seen as interesting,
>> if it could be solved without slowing down the existing cases.
>
> Ted Ts'o has been talking about something similar but not the same -- a
> way to revoke an entire filesystem.  For example, see commit
> 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific
> kludge to avoid an oops after the disk disappears).
>
> The use case for that is obvious and widespread: Somebody yanks out a
> USB drive without unmounting it first.

Agreed.  The best I have at the moment is a library that can wrap
filesystem methods to implement the hotplug bits.

Do you know how hard it is to get a remove event up to the
filesystem that sits on top of a block device?

Do you know how hard it is to detect at mount time if a block device
might be hot-pluggable?  We can always use a mount option here and
make userspace figure it out, but being able to have a good default
would be nice.

If it isn't too hard to get the event up from the block device to the
filesystem when the block device is unceremoniously removed, I might just
make the time to have hotunplug trigger a filesystem wide revoke on a
filesystem like ext4.

In addition to sysfs we need the same logic in proc, sysctl, and uio.
So it makes sense to move towards a common library that can do all of
the hard bits.

I just noticed that sysctl is currently broken in design, if not
in practice, by having poll methods that will break if you unregister
the sysctls.  Fortunately, for the time being we don't have any sysctls
where that case comes up.

Eric


* Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05  3:07       ` Eric W. Biederman
@ 2012-01-05 15:13         ` Alan Stern
  2012-01-05 15:32           ` Tejun Heo
                             ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Alan Stern @ 2012-01-05 15:13 UTC
  To: Eric W. Biederman
  Cc: Theodore Ts'o, Tejun Heo, Kernel development list,
	Greg Kroah-Hartman, Kay Sievers

On Wed, 4 Jan 2012, Eric W. Biederman wrote:

> > Ted Ts'o has been talking about something similar but not the same -- a
> > way to revoke an entire filesystem.  For example, see commit
> > 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific
> > kludge to avoid an oops after the disk disappears).
> >
> > The use case for that is obvious and widespread: Somebody yanks out a
> > USB drive without unmounting it first.
> 
> Agreed.  The best I have at the moment is a library that can wrap
> filesystem methods to implement the hotplug bits.
> 
> Do you know how hard it is to get a remove event up to the
> filesystem that sits on top of a block device?

I don't have a clear idea of what's involved (in particular, how to go
from a block_device structure to a mounted filesystem).  But the place
to do it would probably be block/genhd.c:invalidate_partition().  Ted
can tell you if there's a better alternative.

> Do you know how hard it is to detect at mount time if a block device
> might be hot-pluggable?  We can always use a mount option here and
> make userspace figure it out, but being able to have a good default
> would be nice.

I don't think it's possible to tell if a device is hot-unpluggable.  
For example, the device itself might not be removable from its parent, 
but the parent might be hot-unpluggable.  You'll probably have to 
assume that every device can potentially be unplugged, one way or 
another.

Also, even devices that aren't hot-unpluggable can fail.  The end 
result should be pretty much the same.

> If it isn't too hard to get the event up from the block device to the
> filesystem when the block device is unceremoniously removed, I might just
> make the time to have hotunplug trigger a filesystem wide revoke on a
> filesystem like ext4.
> 
> In addition to sysfs we need the same logic in proc, sysctl, and uio.
> So it makes sense to move towards a common library that can do all of
> the hard bits.

Ted mentioned the need for a new "device removed" superblock method.  
Then each filesystem can add its own implementation as people get
around to it.
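A sketch of how an optional "device removed" superblock method could be dispatched, assuming a simple array of mounted superblocks. `struct super`, `struct super_ops`, `notify_device_removed()` and `demo_device_removed()` are hypothetical and do not correspond to existing kernel interfaces; filesystems that have not implemented the method are skipped, so support can be added one filesystem at a time.

```c
#include <assert.h>
#include <stddef.h>

struct super;   /* hypothetical stand-ins, not the kernel's types */

struct super_ops {
    /* Optional: called when the underlying device disappears. */
    void (*device_removed)(struct super *sb);
};

struct super {
    const struct super_ops *ops;
    int dev;        /* identifies the backing device */
    int revoked;    /* set by a filesystem that implements the hook */
};

/* Called from the device-removal path; returns how many mounted
 * filesystems on that device were notified. */
static int notify_device_removed(struct super **supers, size_t n, int dev)
{
    int notified = 0;

    for (size_t i = 0; i < n; i++) {
        struct super *sb = supers[i];

        if (sb->dev == dev && sb->ops->device_removed) {
            sb->ops->device_removed(sb);
            notified++;
        }
    }
    return notified;
}

/* Example implementation a filesystem might provide. */
static void demo_device_removed(struct super *sb)
{
    sb->revoked = 1;
}

static const struct super_ops demo_ops = { .device_removed = demo_device_removed };
static const struct super_ops legacy_ops = { .device_removed = NULL };
```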

Alan Stern



* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 15:13         ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Alan Stern
@ 2012-01-05 15:32           ` Tejun Heo
  2012-01-05 16:03             ` Eric W. Biederman
  2012-01-05 15:52           ` Eric W. Biederman
  2012-01-05 18:18           ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Greg KH
  2 siblings, 1 reply; 25+ messages in thread
From: Tejun Heo @ 2012-01-05 15:32 UTC
  To: Alan Stern
  Cc: Eric W. Biederman, Theodore Ts'o, Kernel development list,
	Greg Kroah-Hartman, Kay Sievers

Hello,

On Thu, Jan 05, 2012 at 10:13:31AM -0500, Alan Stern wrote:
> I don't have a clear idea of what's involved (in particular, how to go
> from a block_device structure to a mounted filesystem).  But the place
> to do it would probably be block/genhd.c:invalidate_partition().  Ted
> can tell you if there's a better alternative.
> 
> > Do you know how hard it is to detect at mount time if a block device
> > might be hot-plugable?  We can always use a mount option here and
> > make userspace figure it out, but being to have a good default would
> > be nice.
> 
> I don't think it's possible to tell if a device is hot-unpluggable.  
> For example, the device itself might not be removable from its parent, 
> but the parent might be hot-unpluggable.  You'll probably have to 
> assume that every device can potentially be unplugged, one way or 
> another.
> 
> Also, even devices that aren't hot-unpluggable can fail.  The end 
> result should be pretty much the same.

Ummm.... I could be missing something but filesystems need to be able
to deal with partial device failures (i.e. some blocks can't be read)
and hot-unplug or handling full failure is a logical extension of
that.  That's how it already works, so I don't really think that is a
particularly good application for the revoke mechanism.

Thanks.

-- 
tejun


* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 15:13         ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Alan Stern
  2012-01-05 15:32           ` Tejun Heo
@ 2012-01-05 15:52           ` Eric W. Biederman
  2013-01-14 15:11             ` watchdog code anish kumar
  2012-01-05 18:18           ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Greg KH
  2 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2012-01-05 15:52 UTC
  To: Alan Stern
  Cc: Theodore Ts'o, Tejun Heo, Kernel development list,
	Greg Kroah-Hartman, Kay Sievers

Alan Stern <stern@rowland.harvard.edu> writes:

> On Wed, 4 Jan 2012, Eric W. Biederman wrote:
>
>> > Ted Ts'o has been talking about something similar but not the same -- a
>> > way to revoke an entire filesystem.  For example, see commit
>> > 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific
>> > kludge to avoid an oops after the disk disappears).
>> >
>> > The use case for that is obvious and widespread: Somebody yanks out a
>> > USB drive without unmounting it first.
>> 
>> Agreed.  The best I have at the moment is a library that can wrap
>> filesystem methods to implement the hotplug bits.
>> 
>> Do you know how hard it is to get a remove event up to the
>> filesystem that sits on top of a block device?
>
> I don't have a clear idea of what's involved (in particular, how to go
> from a block_device structure to a mounted filesystem).  But the place
> to do it would probably be block/genhd.c:invalidate_partition().  Ted
> can tell you if there's a better alternative.

Interesting.  That sounds like a good place to look.  Thanks.

>> Do you know how hard it is to detect at mount time if a block device
>> might be hot-pluggable?  We can always use a mount option here and
>> make userspace figure it out, but being able to have a good default
>> would be nice.
>
> I don't think it's possible to tell if a device is hot-unpluggable.  
> For example, the device itself might not be removable from its parent, 
> but the parent might be hot-unpluggable.  You'll probably have to 
> assume that every device can potentially be unplugged, one way or 
> another.
>
> Also, even devices that aren't hot-unpluggable can fail.  The end 
> result should be pretty much the same.

True, and ultimately I agree with you.  Unfortunately solving the
full general case right now looks like perfection being the enemy
of the good.

Once the requirement becomes being able to tear down the data
structures and remove the modules, we have to track when we are
inside a filesystem method, and we need the ability to wait until
nobody is inside any filesystem method.

That tracking is hard to make free.  So implementing it for everyone
out of the gate is a hard sell.

However, if we can pick those cases where we care more about doing
the right thing on hot-unplug than we care about performance, we should
be able to go forward with a good enough method now.

But since there are performance implications for very common system
call paths, it makes sense for the first pass to make this something
like mount -o sync: something that you can opt into when it makes
sense, but that you don't have to opt into.

So the practical options that I see are either that we autodetect block
devices that are set up to be hotpluggable, or that we require a mount
option.

>> If it isn't too hard to get the event up from the block device to the
>> filesystem when the block device is unceremoniously removed, I might just
>> make the time to have hotunplug trigger a filesystem wide revoke on a
>> filesystem like ext4.
>> 
>> In addition to sysfs we need the same logic in proc, sysctl, and uio.
>> So it makes sense to move towards a common library that can do all of
>> the hard bits.
>
> Ted mentioned the need for a new "device removed" superblock method.  
> Then each filesystem can add its own implementation as people get
> around to it.

Yeah.  If we can get the "device removed" aka "revokefs" superblock
method, it isn't too hard to build a library that filesystems can use to
wrap their normal filesystem methods and implement revokefs.
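A minimal sketch of the kind of wrapper library described here, assuming POSIX threads: every wrapped method enters and exits a per-superblock counter, and revocation refuses new calls and then waits for the counter to drain. All names (`struct revoke_state`, `revoke_enter()`, `wrapped_read()`, ...) are hypothetical, and returning `-EIO` from a revoked filesystem is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-superblock revoke state. */
struct revoke_state {
    pthread_mutex_t lock;
    pthread_cond_t drained;
    int in_methods;     /* callers currently inside a method */
    bool revoked;
};

static bool revoke_enter(struct revoke_state *rs)
{
    bool ok;

    pthread_mutex_lock(&rs->lock);
    ok = !rs->revoked;
    if (ok)
        rs->in_methods++;
    pthread_mutex_unlock(&rs->lock);
    return ok;
}

static void revoke_exit(struct revoke_state *rs)
{
    pthread_mutex_lock(&rs->lock);
    if (--rs->in_methods == 0)
        pthread_cond_broadcast(&rs->drained);
    pthread_mutex_unlock(&rs->lock);
}

/* Refuse new calls, then wait for every in-progress method to return.
 * Afterwards the filesystem's data structures can be torn down. */
static void revoke_all(struct revoke_state *rs)
{
    pthread_mutex_lock(&rs->lock);
    rs->revoked = true;
    while (rs->in_methods > 0)
        pthread_cond_wait(&rs->drained, &rs->lock);
    pthread_mutex_unlock(&rs->lock);
}

/* A filesystem method and its revocable wrapper. */
typedef int (*read_fn)(int ino);

static int wrapped_read(struct revoke_state *rs, read_fn real, int ino)
{
    int ret;

    if (!revoke_enter(rs))
        return -EIO;    /* filesystem has been revoked */
    ret = real(ino);
    revoke_exit(rs);
    return ret;
}

/* Example method used for illustration only. */
static int demo_read(int ino)
{
    return ino * 2;
}
```

The mutex taken twice on every wrapped call is exactly the per-call cost that makes this tracking hard to make free, which is the argument for making it opt-in.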

Eric


* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 15:32           ` Tejun Heo
@ 2012-01-05 16:03             ` Eric W. Biederman
  2012-01-05 16:44               ` Tejun Heo
  2012-01-05 16:47               ` Alan Stern
  0 siblings, 2 replies; 25+ messages in thread
From: Eric W. Biederman @ 2012-01-05 16:03 UTC
  To: Tejun Heo
  Cc: Alan Stern, Theodore Ts'o, Kernel development list,
	Greg Kroah-Hartman, Kay Sievers

Tejun Heo <tj@kernel.org> writes:

> Hello,
>
> On Thu, Jan 05, 2012 at 10:13:31AM -0500, Alan Stern wrote:
>> I don't have a clear idea of what's involved (in particular, how to go
>> from a block_device structure to a mounted filesystem).  But the place
>> to do it would probably be block/genhd.c:invalidate_partition().  Ted
>> can tell you if there's a better alternative.
>> 
>> > Do you know how hard it is to detect at mount time if a block device
>> > might be hot-pluggable?  We can always use a mount option here and
>> > make userspace figure it out, but being able to have a good default
>> > would be nice.
>> 
>> I don't think it's possible to tell if a device is hot-unpluggable.  
>> For example, the device itself might not be removable from its parent, 
>> but the parent might be hot-unpluggable.  You'll probably have to 
>> assume that every device can potentially be unplugged, one way or 
>> another.
>> 
>> Also, even devices that aren't hot-unpluggable can fail.  The end 
>> result should be pretty much the same.
>
> Ummm.... I could be missing something but filesystems need to be able
> to deal with partial device failures (i.e. some blocks can't be read)
> and hot-unplug or handling full failure is a logical extension of
> that.  That's how it already works, so I don't really think that is a
> particularly good application for the revoke mechanism.

Well the choices are really:
a) On a block device hotunplug keep the device and have it simply report
   everything as errors to the filesystem.  Maybe with a hint to the
   filesystem that something is wrong.
b) Have a filesystem revoke method so that we don't have to keep the
   unplugged block device structure around indefinitely.
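Choice (a) amounts to keeping the structure allocated and swapping in operations that always fail. A single-threaded sketch with hypothetical types (`struct blkdev`, `struct blkdev_ops`); a real implementation would also need to synchronize the swap against requests already in flight.

```c
#include <assert.h>
#include <errno.h>

struct blkdev;

struct blkdev_ops {
    int (*read_block)(struct blkdev *d, long nr, char *buf);
};

struct blkdev {
    const struct blkdev_ops *ops;
};

static int live_read_block(struct blkdev *d, long nr, char *buf)
{
    (void)d;
    buf[0] = (char)('a' + nr % 26);   /* pretend to read from hardware */
    return 0;
}

/* After unplug: the structure survives, every request fails. */
static int dead_read_block(struct blkdev *d, long nr, char *buf)
{
    (void)d;
    (void)nr;
    (void)buf;
    return -EIO;
}

static const struct blkdev_ops live_ops = { live_read_block };
static const struct blkdev_ops dead_ops = { dead_read_block };

/* Choice (a): keep the device, report everything as errors. */
static void blkdev_unplug(struct blkdev *d)
{
    d->ops = &dead_ops;
}
```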

It seems clear that we are doing neither (a) nor (b), which results in
periodic and spectacular failures when block devices are unplugged,
because we try to access block devices that no longer exist.

Eric


* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 16:03             ` Eric W. Biederman
@ 2012-01-05 16:44               ` Tejun Heo
  2012-01-05 16:47               ` Alan Stern
  1 sibling, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-01-05 16:44 UTC
  To: Eric W. Biederman
  Cc: Alan Stern, Theodore Ts'o, Kernel development list,
	Greg Kroah-Hartman, Kay Sievers

Hello,

On Thu, Jan 05, 2012 at 08:03:16AM -0800, Eric W. Biederman wrote:
> Well the choices are really:
> a) On a block device hotunplug keep the device and have it simply report
>    everything as errors to the filesystem.  Maybe with a hint to the
>    filesystem that something is wrong.
> b) Have a filesystem revoke method so that we don't have to keep the
>    unplugged block device structure around indefinitely.
> 
> It seems clear that we are doing neither (a) nor (b), which results in
> periodic and spectacular failures when block devices are unplugged,
> because we try to access block devices that no longer exist.

We're definitely doing a).  If it's not working properly, it's a bug.

Thanks.

-- 
tejun


* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 16:03             ` Eric W. Biederman
  2012-01-05 16:44               ` Tejun Heo
@ 2012-01-05 16:47               ` Alan Stern
  2012-01-05 17:11                 ` Tejun Heo
  2012-01-05 18:27                 ` Ted Ts'o
  1 sibling, 2 replies; 25+ messages in thread
From: Alan Stern @ 2012-01-05 16:47 UTC
  To: Eric W. Biederman
  Cc: Tejun Heo, Theodore Ts'o, Kernel development list,
	Greg Kroah-Hartman, Kay Sievers

On Thu, 5 Jan 2012, Eric W. Biederman wrote:

> > Ummm.... I could be missing something but filesystems need to be able
> > to deal with partial device failures (i.e. some blocks can't be read)
> > and hot-unplug or handling full failure is a logical extension of
> > that.  That's how it already works, so I don't really think that is a
> > particularly good application for the revoke mechanism.
> 
> Well the choices are really:
> a) On a block device hotunplug keep the device and have it simply report
>    everything as errors to the filesystem.  Maybe with a hint to the
>    filesystem that something is wrong.
> b) Have a filesystem revoke method so that we don't have to keep the
>    unplugged block device structure around indefinitely.

When I asked Ted about this, he strongly indicated that he preferred 
b).

> It seems clear that we are doing neither (a) nor (b), which results in
> periodic and spectacular failures when block devices are unplugged,
> because we try to access block devices that no longer exist.

Actually we are doing a).  But we aren't doing it well enough.

One problem (which was reported by a user last spring) is that
del_gendisk() calls device_del() for the disk and bdi_unregister() for
the disk's backing_dev_info structure.  Now, del_gendisk will leave the
data structure in memory until the disk's refcount drops to 0, but
bdi_unregister ignores refcounts and simply erases the bdi->dev
pointer.  Once this happens, any attempt to call mark_buffer_dirty()
(for example, by ext4_commit_super) will cause an oops.

Alan Stern



* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 16:47               ` Alan Stern
@ 2012-01-05 17:11                 ` Tejun Heo
  2012-01-05 18:27                 ` Ted Ts'o
  1 sibling, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-01-05 17:11 UTC
  To: Alan Stern
  Cc: Eric W. Biederman, Theodore Ts'o, Kernel development list,
	Greg Kroah-Hartman, Kay Sievers

Hello,

On Thu, Jan 05, 2012 at 11:47:54AM -0500, Alan Stern wrote:
> One problem (which was reported by a user last spring) is that
> del_gendisk() calls device_del() for the disk and bdi_unregister() for
> the disk's backing_dev_info structure.  Now, del_gendisk will leave the
> data structure in memory until the disk's refcount drops to 0, but
> bdi_unregister ignores refcounts and simply erases the bdi->dev
> pointer.  Once this happens, any attempt to call mark_buffer_dirty()
> (for example, by ext4_commit_super) will cause an oops.

Yeah, there were multiple bugs in block device hot-removal path.  I
got some of them fixed recently but didn't get to the bdi one yet.
It's a bug and needs to be fixed regardless of fs revoke support.

Thanks.

-- 
tejun


* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 15:13         ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Alan Stern
  2012-01-05 15:32           ` Tejun Heo
  2012-01-05 15:52           ` Eric W. Biederman
@ 2012-01-05 18:18           ` Greg KH
  2 siblings, 0 replies; 25+ messages in thread
From: Greg KH @ 2012-01-05 18:18 UTC
  To: Alan Stern
  Cc: Eric W. Biederman, Theodore Ts'o, Tejun Heo,
	Kernel development list, Kay Sievers

On Thu, Jan 05, 2012 at 10:13:31AM -0500, Alan Stern wrote:
> On Wed, 4 Jan 2012, Eric W. Biederman wrote:
> 
> > > Ted Ts'o has been talking about something similar but not the same -- a
> > > way to revoke an entire filesystem.  For example, see commit
> > > 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific
> > > kludge to avoid an oops after the disk disappears).
> > >
> > > The use case for that is obvious and widespread: Somebody yanks out a
> > > USB drive without unmounting it first.
> > 
> > Agreed.  The best I have at the moment is a library that can wrap
> > filesystem methods to implement the hotplug bits.
> > 
> > Do you know how hard it is to get a remove event up to the
> > filesystem that sits on top of a block device?
> 
> I don't have a clear idea of what's involved (in particular, how to go
> from a block_device structure to a mounted filesystem).  But the place
> to do it would probably be block/genhd.c:invalidate_partition().  Ted
> can tell you if there's a better alternative.
> 
> > Do you know how hard it is to detect at mount time if a block device
> > might be hot-pluggable?  We can always use a mount option here and
> > make userspace figure it out, but being able to have a good default
> > would be nice.
> 
> I don't think it's possible to tell if a device is hot-unpluggable.  
> For example, the device itself might not be removable from its parent, 
> but the parent might be hot-unpluggable.  You'll probably have to 
> assume that every device can potentially be unplugged, one way or 
> another.

These days, _any_ block device is hot-unpluggable, what with PCI hotplug
and the like (running in a virtual machine, etc.)  So you always need to
assume that any device can go away at any point in time.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 16:47               ` Alan Stern
  2012-01-05 17:11                 ` Tejun Heo
@ 2012-01-05 18:27                 ` Ted Ts'o
  2012-01-05 18:36                   ` Tejun Heo
  2012-01-05 18:38                   ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Christoph Hellwig
  1 sibling, 2 replies; 25+ messages in thread
From: Ted Ts'o @ 2012-01-05 18:27 UTC (permalink / raw)
  To: Alan Stern
  Cc: Eric W. Biederman, Tejun Heo, Kernel development list,
	Greg Kroah-Hartman, Kay Sievers

On Thu, Jan 05, 2012 at 11:47:54AM -0500, Alan Stern wrote:
> > Well the choices are really:
> > a) On a block device hotunplug keep the device and have it simply report
> >    everything as errors, to the filesystem.  Maybe with a hint to the
> >    filesystem that something is wrong.
> > b) Have a filesystem revoke method so that we don't have to keep the
> >    unplugged block device structure around indefinitely.
> 
> When I asked Ted about this, he strongly indicated that he preferred 
> b).

Ideally, we should do both.  The block device should call a
notification function (probably run out of a workqueue context, to
avoid locking issues) which tells the file system, "the block device
is _gone_ and isn't coming back".  Any attempts to read or write to
the block device should return errors, since there may be writeback
happening in the background while the file system is shutting down
the mount.  Once the file system is done, it can call a
function which tells the block device layer that it's OK to release
the block device and its related structures.
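A minimal user-space sketch of the two-phase teardown described above. Every name here (fake_bdev, fake_fs_on_gone, fake_bdev_hot_unplug, fake_bdev_release) is hypothetical and is not a kernel API; the callback is invoked synchronously rather than from a workqueue, and -EIO stands in for whatever error the block layer would actually return.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: none of these types or functions exist in the kernel. */
struct fake_bdev {
    bool gone;      /* hardware has disappeared and is never coming back */
    bool released;  /* filesystem said the structure may now be freed */
    void (*notify_gone)(struct fake_bdev *bdev, void *fs_priv);
    void *fs_priv;  /* filesystem mounted on top, if any */
};

struct fake_fs {
    bool shut_down;
};

/* Once the device is gone every I/O fails, so background writeback that
 * races with the filesystem shutdown just sees errors. */
int fake_bdev_submit_io(struct fake_bdev *bdev)
{
    return bdev->gone ? -EIO : 0;
}

/* Step 3: the filesystem tells the block layer it is finished. */
void fake_bdev_release(struct fake_bdev *bdev)
{
    bdev->released = true;
}

/* Step 2: filesystem callback.  The suggestion is to run this from a
 * workqueue to avoid locking issues; here it is called directly. */
void fake_fs_on_gone(struct fake_bdev *bdev, void *fs_priv)
{
    struct fake_fs *fs = fs_priv;

    fs->shut_down = true;     /* revoke fds, stop writeback, ... */
    fake_bdev_release(bdev);  /* only now may the bdev be freed */
}

/* Step 1: the block driver marks the device dead and notifies the
 * filesystem; it does not free anything itself. */
void fake_bdev_hot_unplug(struct fake_bdev *bdev)
{
    bdev->gone = true;
    if (bdev->notify_gone)
        bdev->notify_gone(bdev, bdev->fs_priv);
}
```

The point of splitting release out of the unplug path is that the block layer never frees the structure while the filesystem might still reference it.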

In order for the file system to shut down the file system cleanly, it
will need to access VFS-level revoke functionality that replaces file
descriptors with ones that return an error on reads and writes, and
which does the right thing with mmaps[1], etc.

So it's really more of a filesystem force-umount method.  I could
imagine that this could also be used to extend the functionality of
umount(2) so that the MNT_FORCE flag could be used with non-NFS file
systems as well as NFS file systems.

				- Ted

[1] Interesting question: do we convert an mmap region to an anonymous
region and perhaps notify the user out of band this has happened?  Or
do we just make the mapping disappear and nuke the process with a SEGV
if it attempts to access it?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 18:27                 ` Ted Ts'o
@ 2012-01-05 18:36                   ` Tejun Heo
  2012-01-05 19:28                     ` Ted Ts'o
                                       ` (2 more replies)
  2012-01-05 18:38                   ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Christoph Hellwig
  1 sibling, 3 replies; 25+ messages in thread
From: Tejun Heo @ 2012-01-05 18:36 UTC (permalink / raw)
  To: Ted Ts'o, Alan Stern, Eric W. Biederman,
	Kernel development list, Greg Kroah-Hartman, Kay Sievers

Hello, Ted.

On Thu, Jan 05, 2012 at 01:27:52PM -0500, Ted Ts'o wrote:
> So it's really more of a filesystem force-umount method.  I could
> imagine that this could also be used to extend the functionality of
> umount(2) so that the MNT_FORCE flag could be used with non-NFS file
> systems as well as NFS file systems.

I think these are two separate mechanisms.  Filesystems need to be
able to handle IO errors no matter what and underlying device going
away is the same situation.  There's no reason to mix that with force
unmount.  That's a separate feature and whether to force unmount
filesystem on device removal or permanent failure is a policy decision
which belongs to userland - ie. if such behavior is desired, it should
be implemented via udev/udisk instead of hard coded logic in kernel.

I don't know enough to decide whether such forced unmount is a useful
feature tho.  It can be neat for development but is there any real
necessity for the feature?

> [1] Interesting question: do we convert an mmap region to an anonymous
> region and perhaps notify the user out of band this has happened?  Or
> do we just make the mapping disappear and nuke the process with a SEGV
> if it attempts to access it?

FWIW, I vote for SIGBUS similarly to the way we handle mmap
vs. truncate.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 18:27                 ` Ted Ts'o
  2012-01-05 18:36                   ` Tejun Heo
@ 2012-01-05 18:38                   ` Christoph Hellwig
  1 sibling, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2012-01-05 18:38 UTC (permalink / raw)
  To: Ted Ts'o, Alan Stern, Eric W. Biederman, Tejun Heo,
	Kernel development list, Greg Kroah-Hartman, Kay Sievers

On Thu, Jan 05, 2012 at 01:27:52PM -0500, Ted Ts'o wrote:
> Ideally, we should do both.  The block device should call a
> notification function (probably run out of a workqueue context, to
> avoid locking issues) which tells the file system, "the block device
> is _gone_ and isn't coming back".  Any attempts to read or write to
> the block device should return errors, since there may be writeback
> happening in the background while the file system is shutting down
> the mount.  Once the file system is done, it can call a
> function which tells the block device layer that it's OK to release
> the block device and its related structures.

FYI: we have all the functionality for that available in XFS and would
just need to wire it up.  It's also triggered if we get a write I/O
error for metadata (typically the log), so with a minimal delay we
actually provide that behaviour already.

> In order for the file system to shut down the file system cleanly, it
> will need to access VFS-level revoke functionality that replaces file
> descriptors with ones that return an error on reads and writes, and
> which does the right thing with mmaps[1], etc.

And that part is close to impossible to get right.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 18:36                   ` Tejun Heo
@ 2012-01-05 19:28                     ` Ted Ts'o
  2012-01-05 20:52                       ` Tejun Heo
  2012-01-06  6:25                       ` Alexander E. Patrakov
  2012-01-05 20:43                     ` Eric W. Biederman
  2012-01-07 21:01                     ` Revoking filesystems [was Re: Sysfs attributes racing withunregistration] Milton Miller
  2 siblings, 2 replies; 25+ messages in thread
From: Ted Ts'o @ 2012-01-05 19:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Stern, Eric W. Biederman, Kernel development list,
	Greg Kroah-Hartman, Kay Sievers

On Thu, Jan 05, 2012 at 10:36:02AM -0800, Tejun Heo wrote:
> Hello, Ted.
> 
> On Thu, Jan 05, 2012 at 01:27:52PM -0500, Ted Ts'o wrote:
> > So it's really more of a filesystem force-umount method.  I could
> > imagine that this could also be used to extend the functionality of
> > umount(2) so that the MNT_FORCE flag could be used with non-NFS file
> > systems as well as NFS file systems.
> 
> I think these are two separate mechanisms.  Filesystems need to be
> able to handle IO errors no matter what and underlying device going
> away is the same situation.  There's no reason to mix that with force
> unmount.  That's a separate feature and whether to force unmount
> filesystem on device removal or permanent failure is a policy decision
> which belongs to userland - ie. if such behavior is desired, it should
> be implemented via udev/udisk instead of hard coded logic in kernel.

I think it's needless complexity to loop this into userspace.  If the
block device is gone, it's *gone*.  What else could userspace do with
this information that block device has disappeared?  Right now, once
gone, it's never coming back.  Even if the luser plugs the USB device
back in, it's going to be coming back as a new block device node.

So we might as well automatically forcibly unmount the file system at
this point.  I can imagine sending an optional notification that such
a thing has happened, perhaps via a netlink socket, but why not have
the kernel do the right thing automatically?

> I don't know enough to decide whether such forced unmount is a useful
> feature tho.  It can be neat for development but is there any real
> necessity for the feature?

Well, if you want to complicate matters by having this go via a
notification up to userspace, and then have the userspace thoughtfully
consider (after looking up all sorts of complex rules stored in XML
files whose schema is documented nowhere but in the source code) that
the file system should go away because the block device has gone away,
the userspace code will then have to send a forced unmount.

The other use case would be a system administrator who doesn't want to
figure out which random shell is still cd'ed into a directory of a
file system he/she wants to unmount, he can still force the umount.
(Other Unix systems have had this feature in the past, and the result
is the same as what happens if you are cd'ed into a directory which is
later rmdir'ed.)  It's an ungraceful way of running things, but
sometimes it's the easiest way to go.

     	   	       	  	    - Ted

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 18:36                   ` Tejun Heo
  2012-01-05 19:28                     ` Ted Ts'o
@ 2012-01-05 20:43                     ` Eric W. Biederman
  2012-01-05 20:55                       ` Tejun Heo
  2012-01-07 21:01                     ` Revoking filesystems [was Re: Sysfs attributes racing withunregistration] Milton Miller
  2 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2012-01-05 20:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ted Ts'o, Alan Stern, Kernel development list,
	Greg Kroah-Hartman, Kay Sievers

Tejun Heo <tj@kernel.org> writes:

> Hello, Ted.
>
> On Thu, Jan 05, 2012 at 01:27:52PM -0500, Ted Ts'o wrote:
>> So it's really more of a filesystem force-umount method.  I could
>> imagine that this could also be used to extend the functionality of
>> umount(2) so that the MNT_FORCE flag could be used with non-NFS file
>> systems as well as NFS file systems.
>
> I think these are two separate mechanisms.  Filesystems need to be
> able to handle IO errors no matter what and underlying device going
> away is the same situation.  There's no reason to mix that with force
> unmount.  That's a separate feature and whether to force unmount
> filesystem on device removal or permanent failure is a policy decision
> which belongs to userland - ie. if such behavior is desired, it should
> be implemented via udev/udisk instead of hard coded logic in kernel.
>
> I don't know enough to decide whether such forced unmount is a useful
> feature tho.  It can be neat for development but is there any real
> necessity for the feature?
>
>> [1] Interesting question: do we convert an mmap region to an anonymous
>> region and perhaps notify the user out of band this has happened?  Or
>> do we just make the mapping disappear and nuke the process with a SEGV
>> if it attempts to access it?
>
> FWIW, I vote for SIGBUS similarly to the way we handle mmap
> vs. truncate.

Agreed.  SIGBUS is documented as the mapping exists but the backing
store has gone away, which seems to describe hotunplug very well.
Additionally we already do this for sysfs and it works well.

So it appears that on a hotunplug it is desirable to wake all poll
waiters of a filesystem, invalidate all mmaps, and probably notify
all inotify watchers.  And in general scream to userspace that the
filesystem is gone and should be left alone.

That does require a notification from the block device going away
to the filesystem.  Tejun, is there an existing mechanism that we
can plug into or do we need to implement something new?

Ted, we can scream that the filesystem is going away without freeing
all of the filesystem data structures.  To userspace there would
effectively be no difference, but inside the kernel it should
allow us to skip the expensive logic of tracking every time a filesystem
method is invoked, so the fast path is not penalized.

If I don't have to provide a zero cost ability to track which filesystem
methods are active at any given time I think I can whip up something
that is usable in a couple of days.
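A minimal sketch of that idea, assuming a hypothetical toy_sb structure and toy_read()/toy_revoke() helpers (not real VFS interfaces): revocation only sets a flag that each method checks on entry, and nothing is freed, so there is no need to count in-flight callers. A caller that got past the check just before revocation simply finishes against data that is still allocated.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical superblock with only the fields this sketch needs. */
struct toy_sb {
    atomic_bool dead;  /* set once the backing device is gone */
    const char *data;  /* stand-in for filesystem state; never freed on revoke */
};

/* Hypothetical read method: the fast path costs one atomic load and no
 * per-call reference counting. */
long toy_read(struct toy_sb *sb, char *buf, size_t len)
{
    if (atomic_load_explicit(&sb->dead, memory_order_acquire))
        return -EIO;

    size_t n = strlen(sb->data);
    if (n > len)
        n = len;
    memcpy(buf, sb->data, n);
    return (long)n;
}

/* Revoke: mark the filesystem dead but keep every structure allocated.
 * Per the mail, a real version would also wake poll waiters, invalidate
 * mmaps and notify inotify watchers here. */
void toy_revoke(struct toy_sb *sb)
{
    atomic_store_explicit(&sb->dead, true, memory_order_release);
}
```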

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 19:28                     ` Ted Ts'o
@ 2012-01-05 20:52                       ` Tejun Heo
  2012-01-06  6:25                       ` Alexander E. Patrakov
  1 sibling, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-01-05 20:52 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Alan Stern, Eric W. Biederman, Kernel development list,
	Greg Kroah-Hartman, Kay Sievers

Hello,

On Thu, Jan 05, 2012 at 02:28:22PM -0500, Ted Ts'o wrote:
> > I think these are two separate mechanisms.  Filesystems need to be
> > able to handle IO errors no matter what and underlying device going
> > away is the same situation.  There's no reason to mix that with force
> > unmount.  That's a separate feature and whether to force unmount
> > filesystem on device removal or permanent failure is a policy decision
> > which belongs to userland - ie. if such behavior is desired, it should
> > be implemented via udev/udisk instead of hard coded logic in kernel.
> 
> I think it's needless complexity to loop this into userspace.  If the
> block device is gone, it's *gone*.  What else could userspace do with
> this information that block device has disappeared?  Right now, once
> gone, it's never coming back.  Even if the luser plugs the USB device
> back in, it's going to be coming back as a new block device node.
> 
> So we might as well automatically forcibly unmount the file system at
> this point.  I can imagine sending an optional notification that such
> a thing has happened, perhaps via a netlink socket, but why not have
> the kernel do the right thing automatically?

* If this was the one method to deal with hotunplug, sure, but it's
  not.  We already have (supposedly) working failure mode for hot
  device removal.

* Any modern linux distro already has all the infrastructure to handle
  this.  You can't handle hotplug without userland-provided policies
  and the same mechanism is used for hotunplugging too, *today*.  If
  force umount is decided to be the action to take on block device
  removal, that would be several line changes in userland.  Userland
  is already responsible for taking actions for those events.

* Such automation might look like a good idea now but we really don't
  know how it would end up in the longer run or for different use case
  scenarios.  I think a good example of this is the cdrom driver.  It
  implements tons of automatic behaviors, and then had to be augmented
  with ioctls to turn them on and off as they no longer fit new
  hardware, new userland behavior and changing user expectations.

So, regardless of whether adding revoking is a good idea or not, I
believe that force umount should be a separate thing from internal
block error handling.

> The other use case would be a system administrator who doesn't want to
> figure out which random shell is still cd'ed into a directory of a
> file system he/she wants to unmount, he can still force the umount.
> (Other Unix systems have had this feature in the past, and the result
> is the same as what happens if you are cd'ed into a directory which is
> later rmdir'ed.)  It's an ungraceful way of running things, but
> sometimes it's the easiest way to go.

More importantly, I can't really see valid use cases other than
scenarios like the above for using revocation for usual hot unplug.
For most users, it wouldn't matter one way or the other.  It's not
like sync + lazy umount can't already achieve most of what a forced
umount would (note that all the desktop stuff knows about "filesystem
is going away" and will gracefully step aside).  It could be nice for
cli aficionados or
grumpy admins but for the vast majority of userbase, it just wouldn't
matter.  Given that, I'm not convinced this is a worthwhile thing to
have.
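The sync + lazy umount combination maps onto two existing calls, sync(2) and umount2(2) with MNT_DETACH. A minimal sketch follows; the wrapper name sync_and_lazy_umount() is hypothetical, and the call needs CAP_SYS_ADMIN, so without it the kernel (or a seccomp filter) rejects it with EPERM.

```c
#include <errno.h>
#include <sys/mount.h>
#include <unistd.h>

/* Hypothetical wrapper.  Returns 0 on success or a negative errno value. */
int sync_and_lazy_umount(const char *target)
{
    sync(); /* flush dirty data while the device is still present */

    /* MNT_DETACH: detach the mount from the namespace now and finish the
     * unmount once it is no longer busy (e.g. a shell cd'ed into it).
     * Requires CAP_SYS_ADMIN. */
    if (umount2(target, MNT_DETACH) != 0)
        return -errno;
    return 0;
}
```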

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 20:43                     ` Eric W. Biederman
@ 2012-01-05 20:55                       ` Tejun Heo
  0 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2012-01-05 20:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ted Ts'o, Alan Stern, Kernel development list,
	Greg Kroah-Hartman, Kay Sievers

On Thu, Jan 05, 2012 at 12:43:11PM -0800, Eric W. Biederman wrote:
> That does require a notification from the block device going away
> to the filesystem.  Tejun, is there an existing mechanism that we
> can plug into or do we need to implement something new?

Of course, udev.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Revoking filesystems [was Re: Sysfs attributes racing with unregistration]
  2012-01-05 19:28                     ` Ted Ts'o
  2012-01-05 20:52                       ` Tejun Heo
@ 2012-01-06  6:25                       ` Alexander E. Patrakov
  1 sibling, 0 replies; 25+ messages in thread
From: Alexander E. Patrakov @ 2012-01-06  6:25 UTC (permalink / raw)
  To: linux-kernel

Ted Ts'o <tytso@mit.edu> wrote:

> On Thu, Jan 05, 2012 at 10:36:02AM -0800, Tejun Heo wrote:
> > Hello, Ted.
> > 
> > On Thu, Jan 05, 2012 at 01:27:52PM -0500, Ted Ts'o wrote:
> > > So it's really more of a filesystem force-umount method.  I could
> > > imagine that this could also be used to extend the functionality
> > > of umount(2) so that the MNT_FORCE flag could be used with
> > > non-NFS file systems as well as NFS file systems.
> > 
> > I think these are two separate mechanisms.  Filesystems need to be
> > able to handle IO errors no matter what and underlying device going
> > away is the same situation.  There's no reason to mix that with
> > force unmount.  That's a separate feature and whether to force
> > unmount filesystem on device removal or permanent failure is a
> > policy decision which belongs to userland - ie. if such behavior is
> > desired, it should be implemented via udev/udisk instead of hard
> > coded logic in kernel.
> 
> I think it's needless complexity to loop this into userspace.  If the
> block device is gone, it's *gone*.  What else could userspace do with
> this information that block device has disappeared?  Right now, once
> gone, it's never coming back.  Even if the luser plugs the USB device
> back in, it's going to be coming back as a new block device node.
> 
> So we might as well automatically forcibly unmount the file system at
> this point.  I can imagine sending an optional notification that such
> a thing has happened, perhaps via a netlink socket, but why not have
> the kernel do the right thing automatically?

+1, but with a different motivation. It just has to be done in the
kernel, because the userspace does not have all the needed information
to do it properly. Here are some testcases to think of, but, honestly,
I have tested only the first one and consider that it is sufficient to
prove my point.

Testcase 1: lazy unmount in progress. Plug in your USB flash drive,
mount it (or let it be automounted, say, in /media/DEVICE), open two
shells. In the first one, cd /media/DEVICE, and, after that, in the
second one, umount -l /media/DEVICE. Now look at /proc/mounts in the
second shell - there is no trace of your flash drive, so how would your
userspace guess that /media/DEVICE has to be force-unmounted if you
unplug the device now?
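A minimal sketch of what userspace can see here, using glibc's setmntent()/getmntent() to scan /proc/self/mounts. The helper listed_in_mount_table() is hypothetical. After umount -l, the detached mount no longer matches, even though a shell may still have its working directory on it.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper.  Returns 1 if `dir` appears as a mount point in
 * this process's view of the mount table, 0 if it does not, and -1 if
 * the table cannot be read. */
int listed_in_mount_table(const char *dir)
{
    FILE *f = setmntent("/proc/self/mounts", "r");
    if (!f)
        return -1;

    int found = 0;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL) {
        if (strcmp(m->mnt_dir, dir) == 0) {
            found = 1;
            break;
        }
    }
    endmntent(f);
    return found;
}
```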

Testcase 2: mount namespaces. Same issue - are you going to traverse
all of /proc/???/mounts files, unscalably?

Testcase 3 (unsure): a filesystem bind-mounted several times on
different directories. What is the correct order of unmounting?

OTOH, I won't be surprised if anyone finds a case that clearly shows
that it cannot be done correctly in the kernel, either (and actually
want you to think about it). In that case, we are screwed :( Here are
some ideas for someone else to investigate if they are a problem:

1) Strange DM mappings on top of the device (LUKS?)
2) Something else mounted in /media/DEVICE/somedir - what to do with it?

-- 
Alexander E. Patrakov


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Revoking filesystems [was Re: Sysfs attributes racing withunregistration]
  2012-01-05 18:36                   ` Tejun Heo
  2012-01-05 19:28                     ` Ted Ts'o
  2012-01-05 20:43                     ` Eric W. Biederman
@ 2012-01-07 21:01                     ` Milton Miller
  2 siblings, 0 replies; 25+ messages in thread
From: Milton Miller @ 2012-01-07 21:01 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: linux-kernel, Eric W. Biederman, Tejun Heo, Alexander E. Patrakov

[resending with better headers]

On Thu Jan 05 2012 about 14:28:28 EST, Ted Ts'o wrote:
> On Thu, Jan 05, 2012 at 10:36:02AM -0800, Tejun Heo wrote:
> > Hello, Ted.
> >
> > On Thu, Jan 05, 2012 at 01:27:52PM -0500, Ted Ts'o wrote:
> > > So it's really more of a filesystem force-umount method. I could
> > > imagine that this could also be used to extend the functionality of
> > > umount(2) so that the MNT_FORCE flag could be used with non-NFS file
> > > systems as well as NFS file systems.
> >
> > I think these are two separate mechanisms. Filesystems need to be
> > able to handle IO errors no matter what and underlying device going
> > away is the same situation. There's no reason to mix that with force
> > unmount. That's a separate feature and whether to force unmount
> > filesystem on device removal or permanent failure is a policy decision
> > which belongs to userland - ie. if such behavior is desired, it should
> > be implemented via udev/udisk instead of hard coded logic in kernel.
> 
> I think it's needless complexity to loop this into userspace. If the
> block device is gone, it's *gone*. What else could userspace do with
> this information that block device has disappeared? Right now, once
> gone, it's never coming back. Even if the luser plugs the USB device
> back in, it's going to be coming back as a new block device node.


While user space has lost the ability to read that fs, there is lots
that can continue to work, especially if the system is not under memory
pressure.

First of all, what if the process I really care about is in a chroot
on another file system that was mounted under the failed filesystem?

I don't want the kernel killing my job and leaving a partial file on
some other file system just because some other disk went offline.

Second, as long as the file is cached in memory, I might be able
to use that cached busybox to shut down my system or mount
the USB drive after it comes back at a new location, as long
as there isn't memory pressure.

> 
> So we might as well automatically forcibly unmount the file system at
> this point. I can imagine sending an optional notification that such
> a thing has happened, perhaps via a netlink socket, but why not have
> the kernel do the right thing automatically?
> 
> > I don't know enough to decide whether such forced unmount is a useful
> > feature tho. It can be neat for development but is there any real
> > necessity for the feature?
> 
> Well, if you want to complicate matters by having this go via a
> notification up to userspace, and then have the userspace thoughtfully
> consider (after looking up all sorts of complex rules stored in XML
> files whose schema is documented nowhere but in the source code) that
> the file system should go away because the block device has gone away,
> the userspace code will then have to send a forced unmount.
> 
> The other use case would be a system administrator who doesn't want to
> figure out which random shell is still cd'ed into a directory of a
> file system he/she wants to unmount, he can still force the umount.
> (Other Unix systems have had this feature in the past, and the result
> is the same as what happens if you are cd'ed into a directory which is
> later rmdir'ed.) It's an ungraceful way of running things, but
> sometimes it's the easiest way to go.
> 
> - Ted

I can see something like this as an assist to userspace, but don't
forget that mounts are a tree.
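Since a child mount's mount point always lies strictly below its parent's, sorting mount points by path depth, deepest first, is one simple way to get a children-before-parents order. This is only a sketch with hypothetical helper names; mounts stacked on the same directory have equal depth and are left in no particular order here, and a real implementation would need the parent/child relationships from the mount table (e.g. /proc/self/mountinfo).

```c
#include <stdlib.h>
#include <string.h>

/* Number of components in an absolute path: "/" -> 0, "/media/usb" -> 2. */
static size_t path_depth(const char *p)
{
    size_t depth = 0;
    for (const char *s = p; *s; s++)
        if (*s == '/' && s[1] != '\0' && s[1] != '/')
            depth++;
    return depth;
}

/* qsort comparator: deeper paths sort earlier. */
static int deeper_first(const void *a, const void *b)
{
    size_t da = path_depth(*(const char *const *)a);
    size_t db = path_depth(*(const char *const *)b);
    return (da < db) - (da > db);
}

/* Hypothetical helper: order mount points so each one is unmounted
 * before the mount it sits on top of. */
void order_for_unmount(const char **mounts, size_t n)
{
    qsort(mounts, n, sizeof *mounts, deeper_first);
}
```

For example, a chroot's /proc mounted under a USB filesystem would be ordered before the USB mount, which in turn comes before its parent.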

milton

^ permalink raw reply	[flat|nested] 25+ messages in thread

* watchdog code
  2012-01-05 15:52           ` Eric W. Biederman
@ 2013-01-14 15:11             ` anish kumar
  0 siblings, 0 replies; 25+ messages in thread
From: anish kumar @ 2013-01-14 15:11 UTC (permalink / raw)
  To: johlstei; +Cc: Kernel development list

From your comments in this thread https://lkml.org/lkml/2011/3/25/723 

>The msm watchdog driver is present in kernel only. It does not use the
>built-in Linux watchdog api. This is because the primary function of
>our watchdog is detecting bus lockups and interrupts being turned off
Doesn't the original Linux implementation (kernel/watchdog.c) already cover
this? If not, how does this implementation detect it, i.e. bus lockups
and interrupts being turned off for a long time?
Can this piece of code co-exist with the soft/hard lockup detection
in the core kernel?
>for long periods of time. We wanted this functionality to be present
>regardless of the userspace the kernel is running beneath. Userspace is
>free to have its own watchdog implemented in software.
What does this mean? Can you elaborate?

>Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
In my personal opinion, we should always acknowledge the code from which
this code is inspired :)


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-01-14 15:12 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-04 16:52 Sysfs attributes racing with unregistration Alan Stern
2012-01-04 17:18 ` Tejun Heo
2012-01-04 18:13   ` Eric W. Biederman
2012-01-04 19:41     ` Alan Stern
2012-01-05  3:07       ` Eric W. Biederman
2012-01-05 15:13         ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Alan Stern
2012-01-05 15:32           ` Tejun Heo
2012-01-05 16:03             ` Eric W. Biederman
2012-01-05 16:44               ` Tejun Heo
2012-01-05 16:47               ` Alan Stern
2012-01-05 17:11                 ` Tejun Heo
2012-01-05 18:27                 ` Ted Ts'o
2012-01-05 18:36                   ` Tejun Heo
2012-01-05 19:28                     ` Ted Ts'o
2012-01-05 20:52                       ` Tejun Heo
2012-01-06  6:25                       ` Alexander E. Patrakov
2012-01-05 20:43                     ` Eric W. Biederman
2012-01-05 20:55                       ` Tejun Heo
2012-01-07 21:01                     ` Revoking filesystems [was Re: Sysfs attributes racing withunregistration] Milton Miller
2012-01-05 18:38                   ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Christoph Hellwig
2012-01-05 15:52           ` Eric W. Biederman
2013-01-14 15:11             ` watchdog code anish kumar
2012-01-05 18:18           ` Revoking filesystems [was Re: Sysfs attributes racing with unregistration] Greg KH
2012-01-04 18:13   ` Sysfs attributes racing with unregistration Alan Stern
2012-01-04 18:20     ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).