linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RE: XFS to main kernel source
@ 2001-09-20 20:07 Gonyou, Austin
  2001-09-20 20:14 ` Alan Cox
  2001-09-20 20:29 ` Horst von Brand
  0 siblings, 2 replies; 35+ messages in thread
From: Gonyou, Austin @ 2001-09-20 20:07 UTC (permalink / raw)
  To: 'Alan Cox', narancs; +Cc: linux-xfs, linux-kernel

Won't there be a lot of changes which need to be made for it to go into 2.5
anyway though beyond just current development? Isn't 2.5 supposed to be
"radically" different?

-- 
Austin Gonyou
Systems Architect, CCNA
Coremetrics, Inc.
Phone: 512-796-9023
email: austin@coremetrics.com 

> -----Original Message-----
> From: Alan Cox [mailto:alan@lxorguk.ukuu.org.uk]
> Sent: Thursday, September 20, 2001 3:03 PM
> To: narancs@narancs.tii.matav.hu
> Cc: linux-xfs@oss.sgi.com; linux-kernel@vger.kernel.org
> Subject: Re: XFS to main kernel source
> 
> 
> > When will be the XFS patch integrated to main tree?
> > I'm really fed up with trying to get linux-2.4.9 + acXX or 
> preXX + xfs
> > together.
> 
> I can only speak for -ac but right now I have no plan to 
> tackle the merge
> except as an "its in 2.5, its ok in 2.5 backport" task
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-20 20:07 XFS to main kernel source Gonyou, Austin
@ 2001-09-20 20:14 ` Alan Cox
  2001-09-20 20:16   ` Steve Lord
  2001-09-20 20:29 ` Horst von Brand
  1 sibling, 1 reply; 35+ messages in thread
From: Alan Cox @ 2001-09-20 20:14 UTC (permalink / raw)
  To: Gonyou, Austin; +Cc: 'Alan Cox', narancs, linux-xfs, linux-kernel

> Won't there be a lot of changes which need to be made for it to go into 2.5
> anyway though beyond just current development? Isn't 2.5 supposed to be
> "radically" different?

Not really. 2.5 will change over time for certain but if anything the 2.5
changes will make it easier. One problem area with XFS is that it duplicates
chunks of what should be generic functionality - and 2.5 needs to provide
the generic paths it wants

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-20 20:14 ` Alan Cox
@ 2001-09-20 20:16   ` Steve Lord
  2001-09-20 20:25     ` Alan Cox
                       ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Steve Lord @ 2001-09-20 20:16 UTC (permalink / raw)
  To: Alan Cox; +Cc: Gonyou, Austin, narancs, linux-xfs, linux-kernel

> > Won't there be a lot of changes which need to be made for it to go into 2.5
> > anyway though beyond just current development? Isn't 2.5 supposed to be
> > "radically" different?
> 
> Not really. 2.5 will change over time for certain but if anything the 2.5
> changes will make it easier. One problem area with XFS is that it duplicates
> chunks of what should be generic functionality - and 2.5 needs to provide
> the generic paths it wants

Since we have your attention - which chunks? One of the frustrations we have
had is the lack of feedback from anyone who has looked at XFS.

Just interested.

Steve



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-20 20:16   ` Steve Lord
@ 2001-09-20 20:25     ` Alan Cox
  2001-09-20 20:26     ` Christoph Hellwig
  2001-09-20 20:40     ` Alexander Viro
  2 siblings, 0 replies; 35+ messages in thread
From: Alan Cox @ 2001-09-20 20:25 UTC (permalink / raw)
  To: Steve Lord; +Cc: Alan Cox, Gonyou Austin, narancs, linux-xfs, linux-kernel

> Since we have your attention - which chunks? One of the frustrations we have
> had is the lack of feedback from anyone who has looked at XFS.
> 
> Just interested.

Send me a current snapshot diff and I'll promise you some feedback. I didnt
realise this was an issue

Alan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-20 20:16   ` Steve Lord
  2001-09-20 20:25     ` Alan Cox
@ 2001-09-20 20:26     ` Christoph Hellwig
  2001-09-20 21:31       ` Steve Lord
  2001-09-20 20:40     ` Alexander Viro
  2 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2001-09-20 20:26 UTC (permalink / raw)
  To: Steve Lord; +Cc: Alan Cox, Gonyou, Austin, narancs, linux-xfs, linux-kernel

On Thu, Sep 20, 2001 at 03:16:52PM -0500, Steve Lord wrote:
> > > Won't there be a lot of changes which need to be made for it to go into 2.5
> > > anyway though beyond just current development? Isn't 2.5 supposed to be
> > > "radically" different?
> > 
> > Not really. 2.5 will change over time for certain but if anything the 2.5
> > changes will make it easier. One problem area with XFS is that it duplicates
> > chunks of what should be generic functionality - and 2.5 needs to provide
> > the generic paths it wants
> 
> Since we have your attention - which chunks? One of the frustrations we have
> had is the lack of feedback from anyone who has looked at XFS.

 o The whole vnode layer
 o checks already peformed by the VFS all over the place
   (just take a look at xfs_rename.c!)
 o the own quoata code
 o the hooks for a propritary clusterfs..

My 2 (euro-)cents,

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-20 20:07 XFS to main kernel source Gonyou, Austin
  2001-09-20 20:14 ` Alan Cox
@ 2001-09-20 20:29 ` Horst von Brand
  2001-09-20 20:50   ` Alan Cox
  1 sibling, 1 reply; 35+ messages in thread
From: Horst von Brand @ 2001-09-20 20:29 UTC (permalink / raw)
  To: Gonyou, Austin; +Cc: linux-xfs, linux-kernel

"Gonyou, Austin" <austin@coremetrics.com> said:
> Won't there be a lot of changes which need to be made for it to go into 2.5
> anyway though beyond just current development? Isn't 2.5 supposed to be
> "radically" different?

I'd better ask "What is 2.4 missing, so we can finish that up and move on
to 2.5" 2.5 will have its own way of finding out what new, shiny, exciting
features to acquire...
-- 
Dr. Horst H. von Brand                Usuario #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-20 20:16   ` Steve Lord
  2001-09-20 20:25     ` Alan Cox
  2001-09-20 20:26     ` Christoph Hellwig
@ 2001-09-20 20:40     ` Alexander Viro
  2001-09-21 18:03       ` Steve Lord
  2 siblings, 1 reply; 35+ messages in thread
From: Alexander Viro @ 2001-09-20 20:40 UTC (permalink / raw)
  To: Steve Lord; +Cc: Alan Cox, Gonyou, Austin, narancs, linux-xfs, linux-kernel



On Thu, 20 Sep 2001, Steve Lord wrote:

> > > Won't there be a lot of changes which need to be made for it to go into 2.5
> > > anyway though beyond just current development? Isn't 2.5 supposed to be
> > > "radically" different?
> > 
> > Not really. 2.5 will change over time for certain but if anything the 2.5
> > changes will make it easier. One problem area with XFS is that it duplicates
> > chunks of what should be generic functionality - and 2.5 needs to provide
> > the generic paths it wants
> 
> Since we have your attention - which chunks? One of the frustrations we have
> had is the lack of feedback from anyone who has looked at XFS.

Locking.  There is a _lot_ of duplication between fs/namei.c and fs/xfs/* -
you definitely don't need most of the stuff you do with locks there.

I understand that some of that stuff may be needed for CXFS, and I would
really like to see the description of locking requirements of that animal.

Parts that are needed only on IRIX since IRIX VFS is braindead should go.
Parts that can be moved to generic code should be moved (with sane set of
methods provided by filesystems a-la CXFS). The rest will become much simpler.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-20 20:29 ` Horst von Brand
@ 2001-09-20 20:50   ` Alan Cox
  0 siblings, 0 replies; 35+ messages in thread
From: Alan Cox @ 2001-09-20 20:50 UTC (permalink / raw)
  To: Horst von Brand; +Cc: Gonyou Austin, linux-xfs, linux-kernel

> I'd better ask "What is 2.4 missing, so we can finish that up and move on
> to 2.5" 2.5 will have its own way of finding out what new, shiny, exciting
> features to acquire...

<Cynic mode on>

Linus 2.4.10pre is definitely 2.5 in disguise

</Cynic>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-20 20:26     ` Christoph Hellwig
@ 2001-09-20 21:31       ` Steve Lord
  2001-09-21  3:12         ` Andreas Dilger
                           ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Steve Lord @ 2001-09-20 21:31 UTC (permalink / raw)
  To: hch, Steve Lord, Alan Cox, Gonyou, Austin, narancs, linux-xfs,
	linux-kernel

> On Thu, Sep 20, 2001 at 03:16:52PM -0500, Steve Lord wrote:

> > Since we have your attention - which chunks? One of the frustrations we hav
> e
> > had is the lack of feedback from anyone who has looked at XFS.
> 
>  o The whole vnode layer

Two answers here - economics and code stability. This is a filesystem
which has been worked on by people being payed to do so by a corporation,
therefore there is a budget (long since blown). It was simpler and hence
cheaper to wrap XFS in a conversion layer than to rework the code down
into the bowels of the filesystem. Then the stability part of it, we
started with a working filesystem, from an engineering standpoint it 
made more sense to keep as much of the existing code base intact as
possible - the less surgery performed the better in terms of keeping
things running, and making it easy to take enhancements and fixes made
in the Irix base into the Linux code (we don't do it the other way around).

>  o checks already peformed by the VFS all over the place
>    (just take a look at xfs_rename.c!)

I think I will answer this one more slowly and in response to Al Viro's
email. But that economics/stability thing comes into it again.

>  o the own quoata code

XFS quotas are transactional, when space is added to a file the quota is
adjusted in the same transaction. It is fairly hard to do this without your
own quota code.

>  o the hooks for a propritary clusterfs..

Well we have to make money on something you know.... and in reality
there are not a lot of them in the filesystem.

Thanks

   Steve


> 
> My 2 (euro-)cents,
> 
> 	Christoph
> 
> -- 
> Of course it doesn't work. We've performed a software upgrade.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-20 21:31       ` Steve Lord
@ 2001-09-21  3:12         ` Andreas Dilger
  2001-09-21  3:25           ` Steve Lord
  2001-09-21  5:58         ` Christoph Hellwig
  2001-09-21 14:19         ` Alexander Viro
  2 siblings, 1 reply; 35+ messages in thread
From: Andreas Dilger @ 2001-09-21  3:12 UTC (permalink / raw)
  To: Steve Lord
  Cc: hch, Alan Cox, Gonyou, Austin, narancs, linux-xfs, linux-kernel

On Sep 20, 2001  16:31 -0500, Steve Lord wrote:
> XFS quotas are transactional, when space is added to a file the quota is
> adjusted in the same transaction. It is fairly hard to do this without your
> own quota code.

Actually not.  The quotas in ext3 are transactional as well.  It's just
that the "ext3" journal layer allows nested transactions, so it is possible
to start a write transaction, call into the journal code which calls back
into the ext3 write code to start a nested transaction on the journal file
(i.e. it is in the same transaction as the initial write), and then the
initial write completes.

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-21  3:12         ` Andreas Dilger
@ 2001-09-21  3:25           ` Steve Lord
  2001-09-21  4:42             ` Nathan Scott
  0 siblings, 1 reply; 35+ messages in thread
From: Steve Lord @ 2001-09-21  3:25 UTC (permalink / raw)
  To: Steve Lord, hch, Alan Cox, Gonyou, Austin, narancs, linux-xfs,
	linux-kernel

> On Sep 20, 2001  16:31 -0500, Steve Lord wrote:
> > XFS quotas are transactional, when space is added to a file the quota is
> > adjusted in the same transaction. It is fairly hard to do this without your
> > own quota code.
> 
> Actually not.  The quotas in ext3 are transactional as well.  It's just
> that the "ext3" journal layer allows nested transactions, so it is possible
> to start a write transaction, call into the journal code which calls back
> into the ext3 write code to start a nested transaction on the journal file
> (i.e. it is in the same transaction as the initial write), and then the
> initial write completes.

OK, good point, but doing a major rewrite of XFS to use a different
transaction mechanism is not really on the cards, plus we have on disk
compatibility with the Irix version to consider.

Steve




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-21  3:25           ` Steve Lord
@ 2001-09-21  4:42             ` Nathan Scott
  0 siblings, 0 replies; 35+ messages in thread
From: Nathan Scott @ 2001-09-21  4:42 UTC (permalink / raw)
  To: Steve Lord
  Cc: hch, Alan Cox, Gonyou, Austin, narancs, linux-xfs, linux-kernel

hi,

On Thu, Sep 20, 2001 at 10:25:20PM -0500, Steve Lord wrote:
> > On Sep 20, 2001  16:31 -0500, Steve Lord wrote:
> > > XFS quotas are transactional, when space is added to a file the quota is
> > > adjusted in the same transaction. It is fairly hard to do this without your
> > > own quota code.
> > 
> > Actually not.  The quotas in ext3 are transactional as well.  It's just
> > that the "ext3" journal layer allows nested transactions, so it is possible
> > to start a write transaction, call into the journal code which calls back
> > into the ext3 write code to start a nested transaction on the journal file
> > (i.e. it is in the same transaction as the initial write), and then the
> > initial write completes.
> 
> OK, good point, but doing a major rewrite of XFS to use a different
> transaction mechanism is not really on the cards, plus we have on disk
> compatibility with the Irix version to consider.
> 

XFS also uses quite a different _model_ of quota, and it is
tightly integrated into XFS (by its very nature).  There
are a number of issues that it attempts to address, and in
particular it works around the inherent problems of the
traditional quotacheck/mount/quotaon/quotaoff model (ie. the
BSD quota model which the Linux VFS also uses, and which was
used in IRIX for the EFS filesystem).

Its simply different, there are advantages and disadvantages
of each way, but XFS is particularly aimed at scalability,
and the desire to never have to run quotacheck(8) on large
filesystems was one of the issues which the original design
aimed to address.

So XFS quota should be a non-issue.  I have had discussions
with Jan Kara and Alan in the past about how to most cleanly
integrate it with their (new) VFS quota, and they seem happy
with the design we collectively came up with (its slightly
different to the one in both the XFS patch and Alan's patch
at the moment).

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-20 21:31       ` Steve Lord
  2001-09-21  3:12         ` Andreas Dilger
@ 2001-09-21  5:58         ` Christoph Hellwig
  2001-09-21  8:40           ` Narancs v1
  2001-09-21 14:19         ` Alexander Viro
  2 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2001-09-21  5:58 UTC (permalink / raw)
  To: Steve Lord; +Cc: Alan Cox, Gonyou, Austin, narancs, linux-xfs, linux-kernel

On Thu, Sep 20, 2001 at 04:31:37PM -0500, Steve Lord wrote:
> >  o The whole vnode layer
> 
> Two answers here - economics and code stability. This is a filesystem
> which has been worked on by people being payed to do so by a corporation,
> therefore there is a budget (long since blown). It was simpler and hence
> cheaper to wrap XFS in a conversion layer than to rework the code down
> into the bowels of the filesystem. Then the stability part of it, we
> started with a working filesystem, from an engineering standpoint it 
> made more sense to keep as much of the existing code base intact as
> possible - the less surgery performed the better in terms of keeping
> things running, and making it easy to take enhancements and fixes made
> in the Irix base into the Linux code (we don't do it the other way around).

I completly understand SGI's reson to do this - but yet I don't want to
see such code in the mainline for obvios reasons..

> >  o the hooks for a propritary clusterfs..
> 
> Well we have to make money on something you know.... and in reality
> there are not a lot of them in the filesystem.

Again I understand SGI's reasoning - but such hooks are usually not
considered to be a good thing line.

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-21  5:58         ` Christoph Hellwig
@ 2001-09-21  8:40           ` Narancs v1
  0 siblings, 0 replies; 35+ messages in thread
From: Narancs v1 @ 2001-09-21  8:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Steve Lord, Alan Cox, Gonyou, Austin, linux-xfs, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Oh guys I did not think that this is so "technically impossible".
It's good for me to be informed what phylosofical differences you have
between SGI and Linux programming style.

If the merge will not happen in around a half year - I think SGI/XFS "will
lose market share". And Ext3 is gonna get more of the pie as that is
(maybe) possible to patch in the kernel.

I really hope, that there will be a nice technical solution to this issue
and there will be no political decisions over technical.

Thanks

- -------------------------
Narancs v1
IT Security Administrator

"Security of information is an illusion.
What is in one's mind gets into the collective consciousness (akasha),
so that can be read with meditation ;-) You don't have to hack.
Just 'remember'! You're the one."

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iEYEARECAAYFAjuq/PUACgkQGp+ylEhMCIV0tQCeLKOrR4z1QZLWugIiA8hDZM7t
TQQAn0fD6QhhtuvNYUZB/si5CQa/XtaE
=k0Lx
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-20 21:31       ` Steve Lord
  2001-09-21  3:12         ` Andreas Dilger
  2001-09-21  5:58         ` Christoph Hellwig
@ 2001-09-21 14:19         ` Alexander Viro
  2001-09-21 14:45           ` Steve Lord
  2 siblings, 1 reply; 35+ messages in thread
From: Alexander Viro @ 2001-09-21 14:19 UTC (permalink / raw)
  To: Steve Lord
  Cc: hch, Alan Cox, Gonyou, Austin, narancs, linux-xfs, linux-kernel



On Thu, 20 Sep 2001, Steve Lord wrote:

> Two answers here - economics and code stability. This is a filesystem
> which has been worked on by people being payed to do so by a corporation,
> therefore there is a budget (long since blown). It was simpler and hence
> cheaper to wrap XFS in a conversion layer than to rework the code down
> into the bowels of the filesystem. Then the stability part of it, we
> started with a working filesystem, from an engineering standpoint it 
> made more sense to keep as much of the existing code base intact as
> possible - the less surgery performed the better in terms of keeping
> things running, and making it easy to take enhancements and fixes made
> in the Irix base into the Linux code (we don't do it the other way around).

True, but there's a cost of maintaining the source and reducing the
size of said source by order of magnitude will help to reduce _that_.

The argument would make sense if you were treating everything under
your compatibility layer as a black box, but I sincerely hope that
it's not the case.
 
> >  o checks already peformed by the VFS all over the place
> >    (just take a look at xfs_rename.c!)
> 
> I think I will answer this one more slowly and in response to Al Viro's
> email. But that economics/stability thing comes into it again.

Looking forward to that...  Just documenting the exclusion requirements
of CXFS would help.  Big way.  As it is, you are bordering on the "adding
undocumented API for proprietory module" and while I've got no problems
with the last part (I don't suffer from stallmanellosis), I really don't
like the first one.  Nobody's asking to give up the guts of CXFS, but
having its exclusion requirements documented is a different story.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-21 14:19         ` Alexander Viro
@ 2001-09-21 14:45           ` Steve Lord
  0 siblings, 0 replies; 35+ messages in thread
From: Steve Lord @ 2001-09-21 14:45 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Steve Lord, hch, Alan Cox, Gonyou, Austin, narancs, linux-xfs,
	linux-kernel

> 
> 
> On Thu, 20 Sep 2001, Steve Lord wrote:
> 
> > Two answers here - economics and code stability. This is a filesystem
> > which has been worked on by people being payed to do so by a corporation,
> > therefore there is a budget (long since blown). It was simpler and hence
> > cheaper to wrap XFS in a conversion layer than to rework the code down
> > into the bowels of the filesystem. Then the stability part of it, we
> > started with a working filesystem, from an engineering standpoint it 
> > made more sense to keep as much of the existing code base intact as
> > possible - the less surgery performed the better in terms of keeping
> > things running, and making it easy to take enhancements and fixes made
> > in the Irix base into the Linux code (we don't do it the other way around).
> 
> True, but there's a cost of maintaining the source and reducing the
> size of said source by order of magnitude will help to reduce _that_.

Well there is not an order of magnitude in it, and it then leaves SGI
in the situation of having two even more divergent versions of XFS
than we have now. I do realise there are two conflicting goals in all of
this.

> 
> The argument would make sense if you were treating everything under
> your compatibility layer as a black box, but I sincerely hope that
> it's not the case.

Well, not everything, but the vast majority of the xfs code has not
had to change at all, we have a different buffer cache interface,
and the read/write path is different, and the inode creation/teardown
interface needed surgery to work with linux inodes, oh and endian
conversion. But apart from that ......

>  
> > >  o checks already peformed by the VFS all over the place
> > >    (just take a look at xfs_rename.c!)
> > 
> > I think I will answer this one more slowly and in response to Al Viro's
> > email. But that economics/stability thing comes into it again.
> 
> Looking forward to that...  Just documenting the exclusion requirements
> of CXFS would help.  Big way.  As it is, you are bordering on the "adding
> undocumented API for proprietory module" and while I've got no problems
> with the last part (I don't suffer from stallmanellosis), I really don't
> like the first one.  Nobody's asking to give up the guts of CXFS, but
> having its exclusion requirements documented is a different story.

Working on the locking first, the tricky part is how locks interact with
the transaction mechanism.

Steve



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: XFS to main kernel source
  2001-09-20 20:40     ` Alexander Viro
@ 2001-09-21 18:03       ` Steve Lord
  0 siblings, 0 replies; 35+ messages in thread
From: Steve Lord @ 2001-09-21 18:03 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Steve Lord, Alan Cox, Gonyou, Austin, narancs, linux-xfs, linux-kernel


I started writing this, and then got dragged in various other directions,
so I am sending this out as a starting point - I will also be offline until
Monday, we have to go to a wedding... so I am not ignoring responses, just
probably not reading email.

Steve

> 
> 
> On Thu, 20 Sep 2001, Steve Lord wrote:
> 

> > 
> > Since we have your attention - which chunks? One of the frustrations we hav
> e
> > had is the lack of feedback from anyone who has looked at XFS.
> 
> Locking.  There is a _lot_ of duplication between fs/namei.c and fs/xfs/* -
> you definitely don't need most of the stuff you do with locks there.

Looks like its time for the xfs locking tutorial....

I am going to walk through this in terms of the internal locks within
XFS and how they interact.

The XFS inode has two main locks, these are both multi reader, single
writer sleep locks which have some semantics beyond those provided by
the Linux i_sema lock. There is one other lock which could probably be
removed on linux - it relates to readahead state, but we are not using
the readahead code within XFS so.... the final lock is the flush lock
which is used to synchronize flush activity on the inode (only one flush
at once for an inode). These multireader locks have a couple of other
things we can do with them - trylock and demote being the two we use
right now I think.

We have the ilock and the iolock. The ilock is used to protect the core
of the inode, and the iolock is used to protect file data, the iolock
is above the ilock in the locking hierarchy, but is obtained much less
of the time.

The ilock is closely tied in with the transaction system. In general
the sequence of events when performing a transaction which modifies an
inode in xfs is as follows:

 1. obtain the ilock, do access & resource checks, fail the request
    if there are problems.

 2. drop the ilock (we cannot hold the lock whilst doing the next step)

 3. allocate a transaction and reserve the transaction space. The reason
    this cannot be performed with the lock is that reserving space in
    the log can require flushing metadata out to disk (to allow old
    log contents to be overwritten). In order to flush the metadata out to
    disk we need to ensure it is stable - which means we need a lock. So
    should an inode be at the tail of the log and we need to start 
    another transaction on it, if we held the lock across the transaction
    reserve we have a deadlock.

 4. relock the inode - reperform the checks, drop locks, cancel transaction
    and return error if things changed.

 5. start metadata modifications - all objects are locked as required and
    remain locked until the transaction is committed.

 6. commit transaction. This copies contents of the modified objects into an
    in core log buffer, the individual items are 'pinned' into memory - which
    prevents a disk write, and unlocked - which means new operations can
    access them.

 7. write log buffer to disk, this is triggered by it being full, a sync
    transaction or filesystem sync activity (write_super). On completion of
    this we move the log items into something called the active item list
    and unpin them.

 8. metadata flush, triggered by background flushing, or by step 3 above
    taking items from the active item list. The flush gets the ilock on inodes
    and the flush lock. The inode lock is held until the inode contents are
    updated in an inode buffer, the flush lock is held until the buffer
    write completes.

 9. Metadate write completes - items are removed from the active item list
    and the tail of the log is moved if required. Threads waiting for log
    space (in step 3) are woken up.

So, there are lots of places in xfs which use the ilock for synchronization
between the top half - threads in system calls, and the bottom half - the
flushing of metadata to disk. Attempting to replace the ilock in the xfs
inode with the i_sema from the linux inode is not really a straightfoward
task.

This is really a little different from the rename checks performed by
xfs. I suspect we can remove these, but really we cannot just
blindly go replacing the ilock in the xfs inode with the i_sema and
expect anything reasonable to happen.

> 
> I understand that some of that stuff may be needed for CXFS, and I would
> really like to see the description of locking requirements of that animal.

Well, the main question here would be should cxfs have to go up to the top
of the vfs layer in linux, or can it go straight into xfs. Moving some
of the current locking out of xfs would mean the former.

> 
> Parts that are needed only on IRIX since IRIX VFS is braindead should go.
> Parts that can be moved to generic code should be moved (with sane set of
> methods provided by filesystems a-la CXFS). The rest will become much simpler
> .

Brain dead is your term, I would go more for having a different design
philosopy, but yes there is still code in xfs which is not needed under
the linux vfs. I am not yet sure what could be regarded as generic, and
I am not convinced making xfs simpler is as simple as you paint it as
being.

Steve




^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH] lp.c, eexpress.c jiffies cleanup
@ 2001-11-06 20:04 Tim Schmielau
  2001-11-06 21:15 ` Andreas Dilger
  2001-11-06 21:37 ` Philip Blundell
  0 siblings, 2 replies; 35+ messages in thread
From: Tim Schmielau @ 2001-11-06 20:04 UTC (permalink / raw)
  To: Philip.Blundell; +Cc: Linus Torvalds, Alan Cox, linux-kernel, Andreas Dilger

Dear Philip, Linus, Alan,

one more jiffies cleanup patch to use comparison macros as pointed out by
Andreas Dilger on lkml.
I try to bundle files with the same maintainer to reduce email overhead.
If my procedure of mailing the maintainer as well as Cc:ing Linus and Alan
is incorrect, please drop me a short note.

In eexpress.c I also turned absolute jiffies number into multiples of HZ,
yet the resulting timeout values still do not always seem reasonable to
me.

Tim

--- ../linux-2.4.14-pre6/drivers/char/lp.c	Wed Oct 31 23:06:19 2001
+++ drivers/char/lp.c	Tue Nov  6 20:37:30 2001
@@ -302,7 +302,7 @@
 	size_t copy_size = count;

 #ifdef LP_STATS
-	if (jiffies-lp_table[minor].lastcall > LP_TIME(minor))
+	if (time_after(jiffies, lp_table[minor].lastcall + LP_TIME(minor)))
 		lp_table[minor].runchars = 0;

 	lp_table[minor].lastcall = jiffies;
--- ../linux-2.4.14-pre6/drivers/net/eexpress.c	Sun Sep 30 21:26:06 2001
+++ drivers/net/eexpress.c	Tue Nov  6 20:52:59 2001
@@ -504,7 +504,7 @@

 	if (lp->started)
 	{
-		if ((jiffies - dev->trans_start)>50)
+		if (time_after(jiffies, dev->trans_start + HZ/2))
 		{
 			if (lp->tx_link==lp->last_tx_restart)
 			{
@@ -560,7 +560,7 @@
 	}
 	else
 	{
-		if ((jiffies-lp->init_time)>10)
+		if (time_after(jiffies, lp->init_time + HZ/10))
 		{
 			unsigned short status = scb_status(dev);
 			printk(KERN_WARNING "%s: i82586 startup timed out, status %04x, resetting...\n",
@@ -725,8 +725,8 @@

 static void eexp_cmd_clear(struct net_device *dev)
 {
-	unsigned long int oldtime = jiffies;
-	while (scb_rdcmd(dev) && ((jiffies-oldtime)<10));
+	unsigned long timeout = jiffies + HZ/10;
+	while (scb_rdcmd(dev) && (time_before(jiffies, timeout)));
 	if (scb_rdcmd(dev)) {
 		printk("%s: command didn't clear\n", dev->name);
 	}
@@ -1598,16 +1598,16 @@
                 }
         }
         if (kick) {
-                unsigned long oj;
+                unsigned long timeout;
                 scb_command(dev, SCB_CUsuspend);
                 outb(0, ioaddr+SIGNAL_CA);
                 outb(0, ioaddr+SIGNAL_CA);
 #if 0
                 printk("%s: waiting for CU to go suspended\n", dev->name);
 #endif
-                oj = jiffies;
+                timeout = jiffies + 20*HZ;
                 while ((SCB_CUstat(scb_status(dev)) == 2) &&
-                       ((jiffies-oj) < 2000));
+                       time_before(jiffies, timeout));
 		if (SCB_CUstat(scb_status(dev)) == 2)
 			printk("%s: warning, CU didn't stop\n", dev->name);
                 lp->started &= ~(STARTED_CU);



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] lp.c, eexpress.c jiffies cleanup
  2001-11-06 20:04 [PATCH] lp.c, eexpress.c jiffies cleanup Tim Schmielau
@ 2001-11-06 21:15 ` Andreas Dilger
  2001-11-06 21:37 ` Philip Blundell
  1 sibling, 0 replies; 35+ messages in thread
From: Andreas Dilger @ 2001-11-06 21:15 UTC (permalink / raw)
  To: Tim Schmielau; +Cc: Philip.Blundell, linux-kernel

On Nov 06, 2001  21:04 +0100, Tim Schmielau wrote:
> In eexpress.c I also turned absolute jiffies number into multiples of HZ,
> yet the resulting timeout values still do not always seem reasonable to
> me.

I agree.  It seems very ugly.  I looked at a few drivers which loop 1 or 2
jiffies, but to busy-loop for 1/10th of a second, or even 20 seconds
is terribly bad.  I took a look at the functions scb_rdcmd() and
scb_status() and they are only reading from an I/O port, so no chance
of sleeping/blocking, just busy waiting.

> -	unsigned long int oldtime = jiffies;
> -	while (scb_rdcmd(dev) && ((jiffies-oldtime)<10));
> +	unsigned long timeout = jiffies + HZ/10;
> +	while (scb_rdcmd(dev) && (time_before(jiffies, timeout)));
>  	if (scb_rdcmd(dev)) {
>  		printk("%s: command didn't clear\n", dev->name);
>  	}
> @@ -1598,16 +1598,16 @@
>  #endif
> -                oj = jiffies;
> +                timeout = jiffies + 20*HZ;
>                  while ((SCB_CUstat(scb_status(dev)) == 2) &&
> -                       ((jiffies-oj) < 2000));
> +                       time_before(jiffies, timeout));
>  		if (SCB_CUstat(scb_status(dev)) == 2)
>  			printk("%s: warning, CU didn't stop\n", dev->name);
>                  lp->started &= ~(STARTED_CU);

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] lp.c, eexpress.c jiffies cleanup
  2001-11-06 20:04 [PATCH] lp.c, eexpress.c jiffies cleanup Tim Schmielau
  2001-11-06 21:15 ` Andreas Dilger
@ 2001-11-06 21:37 ` Philip Blundell
  2001-11-07  0:10   ` Andreas Dilger
  1 sibling, 1 reply; 35+ messages in thread
From: Philip Blundell @ 2001-11-06 21:37 UTC (permalink / raw)
  To: Tim Schmielau, linux-kernel

In message <20011106141521.R3957@lynx.no>, Andreas Dilger writes:
>On Nov 06, 2001  21:04 +0100, Tim Schmielau wrote:
>> In eexpress.c I also turned absolute jiffies number into multiples of HZ,
>> yet the resulting timeout values still do not always seem reasonable to
>> me.
>
>I agree.  It seems very ugly.  I looked at a few drivers which loop 1 or 2
>jiffies, but to busy-loop for 1/10th of a second, or even 20 seconds
>is terribly bad. 

Those timeouts are only a last resort.  If the card is working properly the loop will terminate much sooner.

p.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] lp.c, eexpress.c jiffies cleanup
  2001-11-07  0:10   ` Andreas Dilger
@ 2001-11-06 23:58     ` Tim Hockin
  0 siblings, 0 replies; 35+ messages in thread
From: Tim Hockin @ 2001-11-06 23:58 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Philip Blundell, Tim Schmielau, linux-kernel

> afterwards.  Sucking all CPU for 20 seconds and locking everything else
> out isn't an acceptable method IMHO.


There are drivers that do this WITH INTERRUPTS OFF.  Some systems (e.g.
Cobalt) have short-interval watchdog timers, and when someone uses one of
these drivers, they just see spontaneous reboots.  Ick.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] lp.c, eexpress.c jiffies cleanup
  2001-11-06 21:37 ` Philip Blundell
@ 2001-11-07  0:10   ` Andreas Dilger
  2001-11-06 23:58     ` Tim Hockin
  0 siblings, 1 reply; 35+ messages in thread
From: Andreas Dilger @ 2001-11-07  0:10 UTC (permalink / raw)
  To: Philip Blundell; +Cc: Tim Schmielau, linux-kernel

On Nov 06, 2001  21:37 +0000, Philip Blundell wrote:
> In message <20011106141521.R3957@lynx.no>, Andreas Dilger writes:
> >I agree.  It seems very ugly.  I looked at a few drivers which loop 1 or 2
> >jiffies, but to busy-loop for 1/10th of a second, or even 20 seconds
> >is terribly bad. 
> 
> Those timeouts are only a last resort.  If the card is working properly
> the loop will terminate much sooner.

Well, if the card is working properly, then you don't need the timeouts
at all.  Clearly, if they are needed, then either they should be more
realistic in length (1/10th isn't bad, but 20 seconds?), or after a
"reasonable" short timeout, it should sleep for an interval and check
afterwards.  Sucking all CPU for 20 seconds and locking everything else
out isn't an acceptable method IMHO.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH] Remove needless BKL from release functions
@ 2001-11-21 23:32 David C. Hansen
  2001-11-22 10:12 ` Oliver Neukum
  0 siblings, 1 reply; 35+ messages in thread
From: David C. Hansen @ 2001-11-21 23:32 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Linus Torvalds, Alexander Viro, Rick Lindsley

  The following is a patch which removes the BKL from quite a few 
drivers' release functions.  The release functions are already 
serialized in the VFS code by an atomic_t which guarantees that each 
function will be called only once, after all file descriptors have been 
closed.  In addition, in these drivers, the BKL was _only_ held in the 
release function and nowhere else in the driver where it might be needed.

Many of these patches simply remove the BKL from the file.  This causes 
no harm because the BKL was not really protecting anything, anyway.  
Other patches try to actually fix the locking.  Some do this by making 
use of atomic operations with the atomic_* functions, or the 
(test|set)_bit functions.  Most of these patches replace uses of normal 
integers which were used to keep open counts in the drivers.  In other 
some cases, a spinlock was added when the atomic operations could not 
guarantee proper serialization by themselves.  And, in very few cases, 
the existing locking was extended to protect more things.  These cases 
are very uncommon because locking is very uncommon in most of these drivers.

Special care has been taken not to introduce more locking issues into 
the drivers (do no harm). They're available as one big patch which is 
against 2.4.14.  The big patch is about 50k, so, instead of attaching 
it, here is a link: http://lse.sourceforge.net/lockhier/patches/bkl.rollup
Here is documentation describing some of the patches and other locking 
issues in the drivers: http://lse.sourceforge.net/lockhier/
The patch applies against 2.4.14.

--
David C. Hansen
haveblue@us.ibm.com



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Remove needless BKL from release functions
  2001-11-21 23:32 [PATCH] Remove needless BKL from release functions David C. Hansen
@ 2001-11-22 10:12 ` Oliver Neukum
  2001-11-22 12:11   ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: Oliver Neukum @ 2001-11-22 10:12 UTC (permalink / raw)
  To: David C. Hansen, Linux Kernel Mailing List
  Cc: Linus Torvalds, Alexander Viro, Rick Lindsley

> Many of these patches simply remove the BKL from the file.  This causes
> no harm because the BKL was not really protecting anything, anyway.
> Other patches try to actually fix the locking.  Some do this by making
> use of atomic operations with the atomic_* functions, or the
> (test|set)_bit functions.  Most of these patches replace uses of normal
> integers which were used to keep open counts in the drivers.  In other
> some cases, a spinlock was added when the atomic operations could not
> guarantee proper serialization by themselves.  And, in very few cases,
> the existing locking was extended to protect more things.  These cases
> are very uncommon because locking is very uncommon in most of these
> drivers.

At least some of the removals in the input tree are probably wrong. You are 
introducing a race with deregistering of input devices.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Remove needless BKL from release functions
  2001-11-22 10:12 ` Oliver Neukum
@ 2001-11-22 12:11   ` Christoph Hellwig
  2001-11-22 12:30     ` Horst von Brand
  0 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2001-11-22 12:11 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Linus Torvalds, Alexander Viro, Rick Lindsley, David C. Hansen,
	Linux Kernel Mailing List

In article <01112211121601.00690@argo> you wrote:
> At least some of the removals in the input tree are probably wrong. You are 
> introducing a race with deregistering of input devices.

Nope, it's fine to remove it.  Input is racy all over the place and the list
are modified somewhere else without any locking anyways.

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Remove needless BKL from release functions
  2001-11-22 12:11   ` Christoph Hellwig
@ 2001-11-22 12:30     ` Horst von Brand
  2001-11-22 13:05       ` Christoph Hellwig
  2001-11-23  9:44       ` Rick Lindsley
  0 siblings, 2 replies; 35+ messages in thread
From: Horst von Brand @ 2001-11-22 12:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Linux Kernel Mailing List

Christoph Hellwig <hch@ns.caldera.de> said:
> In article <01112211121601.00690@argo> you wrote:
> > At least some of the removals in the input tree are probably wrong. You
> > are introducing a race with deregistering of input devices.

> Nope, it's fine to remove it.  Input is racy all over the place and the list
> are modified somewhere else without any locking anyways.

"It is broken anyway, breaking it some more makes no difference"!?
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Remove needless BKL from release functions
  2001-11-22 12:30     ` Horst von Brand
@ 2001-11-22 13:05       ` Christoph Hellwig
  2001-11-23  9:44       ` Rick Lindsley
  1 sibling, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2001-11-22 13:05 UTC (permalink / raw)
  To: Horst von Brand; +Cc: Linux Kernel Mailing List

On Thu, Nov 22, 2001 at 09:30:06AM -0300, Horst von Brand wrote:
> > Nope, it's fine to remove it.  Input is racy all over the place and the list
> > are modified somewhere else without any locking anyways.
> 
> "It is broken anyway, breaking it some more makes no difference"!?

Wether you lock access to shared data at one or zero points doesn't matter,
so it's not breaking it more.

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Remove needless BKL from release functions
  2001-11-22 12:30     ` Horst von Brand
  2001-11-22 13:05       ` Christoph Hellwig
@ 2001-11-23  9:44       ` Rick Lindsley
  2001-11-23 10:10         ` Oliver.Neukum
  1 sibling, 1 reply; 35+ messages in thread
From: Rick Lindsley @ 2001-11-23  9:44 UTC (permalink / raw)
  To: Horst von Brand; +Cc: Linux Kernel Mailing List

Christoph Hellwig <hch@ns.caldera.de> said:

    Nope, it's fine to remove it.  Input is racy all over the place and the list
    are modified somewhere else without any locking anyways.
    
Horst von Brand <vonbrand@inf.utfsm.cl> then said:

    "It is broken anyway, breaking it some more makes no difference"!?

No, it is more a matter of "it is not helping at all, so removing it
makes no difference in behavior."  Removing it does, however, help
clean up the code and remove unnecessary instances of the BKL from the
kernel code.

If you check the web page at
http://lse.sourceforge.net/lockhier/patches.html, you'll find
additional information on why this patch was produced.  The most common
"no-op" was that (BKL) locking was done during release but not during
open. In some cases, there truly are things to guard. In some cases,
there really isn't. In all cases, nothing was really being correctly
guarded.

Usage of the BKL exists, in many cases, as a legendary artifact. Nobody
is sure if it's really needed, so everybody is afraid to take it out,
"just in case".  These patches represent real research -- we believe it
really is safe to take it out in these cases.  If we could not be sure,
we didn't try to patch it.

Rick

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Remove needless BKL from release functions
  2001-11-23  9:44       ` Rick Lindsley
@ 2001-11-23 10:10         ` Oliver.Neukum
  2001-11-23 10:47           ` Christoph Hellwig
  2001-11-23 12:08           ` Rick Lindsley
  0 siblings, 2 replies; 35+ messages in thread
From: Oliver.Neukum @ 2001-11-23 10:10 UTC (permalink / raw)
  To: Rick Lindsley; +Cc: Horst von Brand, Linux Kernel Mailing List

On Fri, 23 Nov 2001, Rick Lindsley wrote:

> Christoph Hellwig <hch@ns.caldera.de> said:
>
>     Nope, it's fine to remove it.  Input is racy all over the place and the list
>     are modified somewhere else without any locking anyways.
>
> Horst von Brand <vonbrand@inf.utfsm.cl> then said:
>
>     "It is broken anyway, breaking it some more makes no difference"!?
>
> No, it is more a matter of "it is not helping at all, so removing it
> makes no difference in behavior."  Removing it does, however, help
> clean up the code and remove unnecessary instances of the BKL from the
> kernel code.
>
> If you check the web page at
> http://lse.sourceforge.net/lockhier/patches.html, you'll find
> additional information on why this patch was produced.  The most common
> "no-op" was that (BKL) locking was done during release but not during
> open. In some cases, there truly are things to guard. In some cases,
> there really isn't. In all cases, nothing was really being correctly
> guarded.

While this is doubtlessly true, please don't do things like removing the
lock from interfaces like the call to open() in the input subsystem.
People may depend on the lock being held there. Having open() under BKL
simplifies writing USB device drivers.

	Regards
		Oliver



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Remove needless BKL from release functions
  2001-11-23 10:10         ` Oliver.Neukum
@ 2001-11-23 10:47           ` Christoph Hellwig
  2001-11-23 11:24             ` Oliver Neukum
  2001-11-26 17:46             ` David C. Hansen
  2001-11-23 12:08           ` Rick Lindsley
  1 sibling, 2 replies; 35+ messages in thread
From: Christoph Hellwig @ 2001-11-23 10:47 UTC (permalink / raw)
  To: Oliver.Neukum; +Cc: Horst von Brand, Linux Kernel Mailing List, Rick Lindsley

In article <Pine.SOL.4.33.0111231106530.7403-100000@sun3.lrz-muenchen.de> you wrote:
> While this is doubtlessly true, please don't do things like removing the
> lock from interfaces like the call to open() in the input subsystem.
> People may depend on the lock being held there. Having open() under BKL
> simplifies writing USB device drivers.

Beeing completly single-threaded also simplifies writing unclean drivers..

BTW, I've attached a patch that fixes the largest input races (against 2.4.6),
I don't see how to change the total lack of locking for other data structures
without an API change, though.

	Christoph

-- 
Whip me.  Beat me.  Make me maintain AIX.

--- linux-2.4.6/drivers/input/input.c	Mon Jun  4 21:17:55 2001
+++ linux/drivers/input/input.c	Sun Jul  8 22:58:10 2001
@@ -57,6 +57,7 @@
 static devfs_handle_t input_devfs_handle;
 static int input_number;
 static long input_devices[NBITS(INPUT_DEVICES)];
+static DECLARE_MUTEX(input_lock);
 
 void input_event(struct input_dev *dev, unsigned int type, unsigned int code, int value)
 {
@@ -222,6 +223,8 @@
 	struct input_handler *handler = input_handler;
 	struct input_handle *handle;
 
+	down(&input_lock);
+	
 /*
  * Initialize repeat timer to default values.
  */
@@ -257,6 +260,8 @@
 			input_link_handle(handle);
 		handler = handler->next;
 	}
+
+	up(&input_lock);
 }
 
 void input_unregister_device(struct input_dev *dev)
@@ -265,6 +270,8 @@
 	struct input_dev **devptr = &input_dev;
 	struct input_handle *dnext;
 
+	down(&input_lock);
+	
 /*
  * Kill any pending repeat timers.
  */
@@ -294,6 +301,8 @@
 
 	if (dev->number < INPUT_DEVICES)
 		clear_bit(dev->number, input_devices);
+
+	up(&input_lock);
 }
 
 void input_register_handler(struct input_handler *handler)
@@ -301,6 +310,8 @@
 	struct input_dev *dev = input_dev;
 	struct input_handle *handle;
 
+	down(&input_lock);
+	
 /*
  * Add minors if needed.
  */
@@ -324,6 +335,8 @@
 			input_link_handle(handle);
 		dev = dev->next;
 	}
+
+	up(&input_lock);
 }
 
 void input_unregister_handler(struct input_handler *handler)
@@ -332,6 +345,8 @@
 	struct input_handle *handle = handler->handle;
 	struct input_handle *hnext;
 
+	down(&input_lock);
+
 /*
  * Tell the handler to disconnect from all devices it keeps open.
  */
@@ -358,38 +373,60 @@
 
 	if (handler->fops != NULL)
 		input_table[handler->minor >> 5] = NULL;
+
+	up(&input_lock);
 }
 
 static int input_open_file(struct inode *inode, struct file *file)
 {
-	struct input_handler *handler = input_table[MINOR(inode->i_rdev) >> 5];
 	struct file_operations *old_fops, *new_fops = NULL;
+	struct input_handler *handler;
+	unsigned int minor = MINOR(inode->i_rdev), index;
	int err;
 
-	/* No load-on-demand here? */
-	if (!handler || !(new_fops = fops_get(handler->fops)))
+	if (minor >= INPUT_DEVICES)
 		return -ENODEV;
 
-	/*
-	 * That's _really_ odd. Usually NULL ->open means "nothing special",
-	 * not "no device". Oh, well...
-	 */
-	if (!new_fops->open) {
-		fops_put(new_fops);
-		return -ENODEV;
+	down(&input_lock);
+	index = minor >> 5;
+	handler = input_table[index];
+
+	if (handler)
+		new_fops = fops_get(handler->fops);
+	if (!new_fops) {
+		char modname[32];
+		
+		up(&input_lock);
+		sprintf(modname, "input-handler-%d", index);
+		request_module(modname);
+		down(&input_lock);
+
+		err = -ENODEV;
+		handler = input_table[index];
+		if (!handler)
+			goto end;
+		if (!(new_fops = fops_get(handler->fops)))
+			goto end;
 	}
+	
 	old_fops = file->f_op;
 	file->f_op = new_fops;
 
-	lock_kernel();
-	err = new_fops->open(inode, file);
-	unlock_kernel();
+	if (new_fops->open) {
+		lock_kernel();
+		err = new_fops->open(inode, file);
+		unlock_kernel();
+	} else
+		err = 0;
 
 	if (err) {
 		fops_put(file->f_op);
 		file->f_op = fops_get(old_fops);
 	}
+
 	fops_put(old_fops);
+end:
+	up(&input_lock);
 	return err;
 }
 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Remove needless BKL from release functions
  2001-11-23 10:47           ` Christoph Hellwig
@ 2001-11-23 11:24             ` Oliver Neukum
  2001-11-26 17:46             ` David C. Hansen
  1 sibling, 0 replies; 35+ messages in thread
From: Oliver Neukum @ 2001-11-23 11:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Horst von Brand, Linux Kernel Mailing List, Rick Lindsley

Am Freitag 23 November 2001 11:47 schrieb Christoph Hellwig:
> In article <Pine.SOL.4.33.0111231106530.7403-100000@sun3.lrz-muenchen.de> 
you wrote:
> > While this is doubtlessly true, please don't do things like removing the
> > lock from interfaces like the call to open() in the input subsystem.
> > People may depend on the lock being held there. Having open() under BKL
> > simplifies writing USB device drivers.
>
> Beeing completly single-threaded also simplifies writing unclean drivers..

This is true. However USB drivers have to cope with devices becoming 
unplugged at all times. The races this produces are not nice.

> BTW, I've attached a patch that fixes the largest input races (against
> 2.4.6), I don't see how to change the total lack of locking for other data
> structures without an API change, though.

This looks very good. Could you get this to the maintainer ?

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Remove needless BKL from release functions
  2001-11-23 10:10         ` Oliver.Neukum
  2001-11-23 10:47           ` Christoph Hellwig
@ 2001-11-23 12:08           ` Rick Lindsley
  1 sibling, 0 replies; 35+ messages in thread
From: Rick Lindsley @ 2001-11-23 12:08 UTC (permalink / raw)
  To: Oliver.Neukum; +Cc: Linux Kernel Mailing List

I wrote:

    If you check the web page at
    http://lse.sourceforge.net/lockhier/patches.html, you'll find
    additional information on why this patch was produced.  The most common
    "no-op" was that (BKL) locking was done during release but not during
    open. In some cases, there truly are things to guard. In some cases,
    there really isn't. In all cases, nothing was really being correctly
    guarded.

Oliver.Neukum@lrz.uni-muenchen.de replied:
    
    While this is doubtlessly true, please don't do things like removing the
    lock from interfaces like the call to open() in the input subsystem.
    People may depend on the lock being held there. Having open() under BKL
    simplifies writing USB device drivers.

The good news is, the patches addressed unnecessary BKL's in release(),
not open(), so I don't think the patches we submitted will cause you to
lose any sleep.  The better news is, Christoph has even produced a
patch to address your concerns (which it sounds like you like.)  The
best news is, the kernel is cleaner now in multiple ways.

Life is good.

Rick

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Remove needless BKL from release functions
  2001-11-23 10:47           ` Christoph Hellwig
  2001-11-23 11:24             ` Oliver Neukum
@ 2001-11-26 17:46             ` David C. Hansen
  2001-11-26 19:41               ` Flavio Stanchina
  1 sibling, 1 reply; 35+ messages in thread
From: David C. Hansen @ 2001-11-26 17:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Oliver.Neukum, Horst von Brand, Linux Kernel Mailing List, Rick Lindsley

Christoph Hellwig wrote:

>Beeing completly single-threaded also simplifies writing unclean drivers..
>
>BTW, I've attached a patch that fixes the largest input races (against 2.4.6),
>I don't see how to change the total lack of locking for other data structures
>without an API change, though.
>
The removal of the BKL in the input.c's open() was beyond the scope of 
what we were trying to do.  But, it snuck into the patch anyway.  It 
probably shouldn't have been there.  

In the patch, in the open function, you do this:
    if (handler)
        new_fops = fops_get(handler->fops);

But, the fops_get() #define already cheecks to make sure handler isn't null:
#define fops_get(fops) \
        (((fops) && (fops)->owner)      \
                ? ( try_inc_mod_count((fops)->owner) ? (fops) : NULL ) \
                : (fops))

It's nice to be careful, but is that extra check really necessary?

And, the BKL is still held during the open() call to the handler's 
open().  Is this trying to make sure that open() isn't called twice on a 
single device?  I checked all of the files in drivers/input, and didn't 
see any other uses of the BKL.  

--
David C. Hansen
haveblue@us.ibm.com


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Remove needless BKL from release functions
  2001-11-26 17:46             ` David C. Hansen
@ 2001-11-26 19:41               ` Flavio Stanchina
  2001-11-26 19:53                 ` David C. Hansen
  0 siblings, 1 reply; 35+ messages in thread
From: Flavio Stanchina @ 2001-11-26 19:41 UTC (permalink / raw)
  To: David C. Hansen; +Cc: Linux Kernel Mailing List

On Monday 26 November 2001 18:46, David C. Hansen wrote:

> In the patch, in the open function, you do this:
>     if (handler)
>         new_fops = fops_get(handler->fops);
>
> But, the fops_get() #define already cheecks to make sure handler isn't
> null: #define fops_get(fops) \
>         (((fops) && (fops)->owner)      \
>                 ? ( try_inc_mod_count((fops)->owner) ? (fops) : NULL ) \
>                 : (fops))

Look closer, it doesn't check 'handler' (it couldn't).

-- 
Ciao,
    Flavio Stanchina
    Trento - Italy

"The best defense against logic is ignorance."
http://spazioweb.inwind.it/fstanchina/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] Remove needless BKL from release functions
  2001-11-26 19:41               ` Flavio Stanchina
@ 2001-11-26 19:53                 ` David C. Hansen
  0 siblings, 0 replies; 35+ messages in thread
From: David C. Hansen @ 2001-11-26 19:53 UTC (permalink / raw)
  To: Flavio Stanchina; +Cc: Linux Kernel Mailing List

Flavio Stanchina wrote:

>Look closer, it doesn't check 'handler' (it couldn't).
>
Silly me... Thanks.



^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2001-11-26 19:56 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-11-06 20:04 [PATCH] lp.c, eexpress.c jiffies cleanup Tim Schmielau
2001-11-06 21:15 ` Andreas Dilger
2001-11-06 21:37 ` Philip Blundell
2001-11-07  0:10   ` Andreas Dilger
2001-11-06 23:58     ` Tim Hockin
  -- strict thread matches above, loose matches on Subject: below --
2001-11-21 23:32 [PATCH] Remove needless BKL from release functions David C. Hansen
2001-11-22 10:12 ` Oliver Neukum
2001-11-22 12:11   ` Christoph Hellwig
2001-11-22 12:30     ` Horst von Brand
2001-11-22 13:05       ` Christoph Hellwig
2001-11-23  9:44       ` Rick Lindsley
2001-11-23 10:10         ` Oliver.Neukum
2001-11-23 10:47           ` Christoph Hellwig
2001-11-23 11:24             ` Oliver Neukum
2001-11-26 17:46             ` David C. Hansen
2001-11-26 19:41               ` Flavio Stanchina
2001-11-26 19:53                 ` David C. Hansen
2001-11-23 12:08           ` Rick Lindsley
2001-09-20 20:07 XFS to main kernel source Gonyou, Austin
2001-09-20 20:14 ` Alan Cox
2001-09-20 20:16   ` Steve Lord
2001-09-20 20:25     ` Alan Cox
2001-09-20 20:26     ` Christoph Hellwig
2001-09-20 21:31       ` Steve Lord
2001-09-21  3:12         ` Andreas Dilger
2001-09-21  3:25           ` Steve Lord
2001-09-21  4:42             ` Nathan Scott
2001-09-21  5:58         ` Christoph Hellwig
2001-09-21  8:40           ` Narancs v1
2001-09-21 14:19         ` Alexander Viro
2001-09-21 14:45           ` Steve Lord
2001-09-20 20:40     ` Alexander Viro
2001-09-21 18:03       ` Steve Lord
2001-09-20 20:29 ` Horst von Brand
2001-09-20 20:50   ` Alan Cox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).