All of lore.kernel.org
 help / color / mirror / Atom feed
* Status of inode scaling work
@ 2011-01-14  9:02 Nick Piggin
  2011-01-25  7:10 ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Nick Piggin @ 2011-01-14  9:02 UTC (permalink / raw)
  To: Dave Chinner, Al Viro, Christoph Hellwig, Linus Torvalds, linux-fsdevel

I'd like to know about the status and plans of the inode scaling work.

Aiming for 2.6.39 or 40 might be an idea.

I'd like to be able to see what patches are proposed to finish breaking
inode_lock into its constituent sub classes, and we can go from there.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Status of inode scaling work
  2011-01-14  9:02 Status of inode scaling work Nick Piggin
@ 2011-01-25  7:10 ` Dave Chinner
  2011-01-27  0:40   ` Nick Piggin
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2011-01-25  7:10 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Dave Chinner, Al Viro, Christoph Hellwig, Linus Torvalds, linux-fsdevel

On Fri, Jan 14, 2011 at 08:02:57PM +1100, Nick Piggin wrote:
> I'd like to know about the status and plans of the inode scaling work.
> 
> Aiming for 2.6.39 or 40 might be an idea.

It was ready for 2.6.38....

> I'd like to be able to see what patches are proposed to finish breaking
> inode_lock into its constituent sub classes, and we can go from there.

Already done. I pointed you at the tree a while back:

The following changes since commit c723fdab8aa728dc2bf0da6a0de8bb9c3f588d84:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 (2011-01-25 14:23:54 +1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git inode-scale

Dave Chinner (9):
      fs: protect inode->i_state with inode->i_lock
      fs: factor inode disposal
      fs: Lock the inode LRU list separately
      fs: remove inode_lock from iput_final and prune_icache
      fs: move i_sb_list out from under inode_lock
      fs: move i_wb_list out from under inode_lock
      fs: rename inode_lock to inode_hash_lock
      fs: Clean up documentation references to inode_lock
      fs: pull inode->i_lock up out of writeback_single_inode

 Documentation/filesystems/Locking |    2 +-
 Documentation/filesystems/porting |   11 +-
 Documentation/filesystems/vfs.txt |    2 +-
 fs/block_dev.c                    |    4 +-
 fs/buffer.c                       |    2 +-
 fs/drop_caches.c                  |   18 +-
 fs/fs-writeback.c                 |  138 ++++++++-----
 fs/inode.c                        |  395 +++++++++++++++++++++----------------
 fs/internal.h                     |    7 +
 fs/logfs/inode.c                  |    2 +-
 fs/notify/inode_mark.c            |   42 +++--
 fs/notify/mark.c                  |    1 -
 fs/notify/vfsmount_mark.c         |    1 -
 fs/ntfs/inode.c                   |    4 +-
 fs/quota/dquot.c                  |   41 +++--
 include/linux/fs.h                |    2 +-
 include/linux/quotaops.h          |    2 +-
 include/linux/writeback.h         |    2 +-
 mm/backing-dev.c                  |    8 +-
 mm/filemap.c                      |   10 +-
 mm/rmap.c                         |    7 +-
 21 files changed, 409 insertions(+), 292 deletions(-)

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Status of inode scaling work
  2011-01-25  7:10 ` Dave Chinner
@ 2011-01-27  0:40   ` Nick Piggin
  2011-01-27  4:57     ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Nick Piggin @ 2011-01-27  0:40 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Dave Chinner, Al Viro, Christoph Hellwig, Linus Torvalds, linux-fsdevel

On Tue, Jan 25, 2011 at 6:10 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Jan 14, 2011 at 08:02:57PM +1100, Nick Piggin wrote:
>> I'd like to know about the status and plans of the inode scaling work.
>>
>> Aiming for 2.6.39 or 40 might be an idea.
>
> It was ready for 2.6.38....

Can it go into linux-next? Unless Al can put it in his for-next branch?

>> I'd like to be able to see what patches are proposed to finish breaking
>> inode_lock into its constituent sub classes, and we can go from there.
>
> Already done. I pointed you at the tree a while back:
>
> The following changes since commit c723fdab8aa728dc2bf0da6a0de8bb9c3f588d84:
>
>  Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 (2011-01-25 14:23:54 +1000)
>
> are available in the git repository at:
>
>  git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git inode-scale
>
> Dave Chinner (9):
>      fs: protect inode->i_state with inode->i_lock

The places where you take i_lock to initialize i_state in newly
allocated inodes should not be required, because they can't be found
until they get added to lists, and adding to the list must have the right
barriers or lock synchronisation to make sure traversals don't see
uninitialised inodes.

eg.
                         spin_lock(&inode->i_lock);
                         inode->i_state = I_NEW;
                         hlist_add_head(&inode->i_hash, head);
                         spin_unlock(&inode->i_lock);
                         inode_sb_list_add(inode);
                         spin_unlock(&inode_lock);

That lightens new inode paths a little bit.

"Also remove the unlock_new_inode() memory barrier optimisation
required to avoid taking the inode_lock when clearing I_NEW.
Simplify the code by simply taking the inode->i_lock around the
state change and wakeup. Because the wakeup is no longer tricky,
remove the wake_up_inode() function and open code the wakeup where
necessary."

Can you do that bit separately?


>      fs: factor inode disposal

-       spin_unlock(&inode->i_lock);
-
-       /*
-        * Move the inode off the IO lists and LRU once I_FREEING is
-        * set so that it won't get moved back on there if it is dirty.
-        */
        inode_lru_list_del(inode);
-       list_del_init(&inode->i_wb_list);
-
-       __inode_sb_list_del(inode);
+       spin_unlock(&inode->i_lock);
        spin_unlock(&inode_lock);
+

Why do you change i_lock to cover inode_lru_list_del in this
patch?


>      fs: Lock the inode LRU list separately

BTW. have you found any issues with nesting locks inside i_lock? We
do that extensively in dcache now, but it can be trivially made to use
another lock (eg. i_dcache_lock).


>      fs: remove inode_lock from iput_final and prune_icache
>      fs: move i_sb_list out from under inode_lock

I don't know if we should be including ../internal.h, although you could
argue that quota and notify is core code.


>      fs: move i_wb_list out from under inode_lock
>      fs: rename inode_lock to inode_hash_lock
>      fs: Clean up documentation references to inode_lock

Can these be merged with the patches as they change the locking?
Some places also call it "inode lock" (eg. find_inode())


>      fs: pull inode->i_lock up out of writeback_single_inode

I think the series looks pretty good overall. Thanks.

Nick
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Status of inode scaling work
  2011-01-27  0:40   ` Nick Piggin
@ 2011-01-27  4:57     ` Dave Chinner
  2011-01-27  5:11       ` Nick Piggin
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2011-01-27  4:57 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Dave Chinner, Al Viro, Christoph Hellwig, Linus Torvalds, linux-fsdevel

On Thu, Jan 27, 2011 at 11:40:57AM +1100, Nick Piggin wrote:
> On Tue, Jan 25, 2011 at 6:10 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Fri, Jan 14, 2011 at 08:02:57PM +1100, Nick Piggin wrote:
> >> I'd like to know about the status and plans of the inode scaling work.
> >>
> >> Aiming for 2.6.39 or 40 might be an idea.
> >
> > It was ready for 2.6.38....
> 
> Can it go into linux-next? Unless Al can put it in his for-next branch?

It'll go into linux-next whenever it is sufficiently reviewed and
tested to get pulled into an upstream branch.

> >> I'd like to be able to see what patches are proposed to finish breaking
> >> inode_lock into its constituent sub classes, and we can go from there.
> >
> > Already done. I pointed you at the tree a while back:
> >
> > The following changes since commit c723fdab8aa728dc2bf0da6a0de8bb9c3f588d84:
> >
> >  Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 (2011-01-25 14:23:54 +1000)
> >
> > are available in the git repository at:
> >
> >  git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git inode-scale
> >
> > Dave Chinner (9):
> >      fs: protect inode->i_state with inode->i_lock
> 
> The places where you take i_lock to initialize i_state in newly
> allocated inodes should not be required, because they can't be found
> until they get added to lists, and adding to the list must have the right
> barriers or lock synchronisation to make sure traversals don't see
> uninitialised inodes.
> 
> eg.
>                          spin_lock(&inode->i_lock);
>                          inode->i_state = I_NEW;
>                          hlist_add_head(&inode->i_hash, head);
>                          spin_unlock(&inode->i_lock);
>                          inode_sb_list_add(inode);
>                          spin_unlock(&inode_lock);
> 
> That lightens new inode paths a little bit.

Not sure what you are try to say here - the above code fragment is
exactly how the code ends up after the patch....

As it is, the reason it was done this way was suggested by Al:

http://marc.info/?l=linux-fsdevel&m=128832928023866&w=2

We use ->i_lock when checking inode_unhashed(), so we need to hold
the ->i_lock when adding or removing the inode from the hash list.
This is also important for when hash lookups get converted to RCU
traversals, because then the only thing protecting a newly added
inode from a concurrent traversal is the ->i_lock...

> "Also remove the unlock_new_inode() memory barrier optimisation
> required to avoid taking the inode_lock when clearing I_NEW.
> Simplify the code by simply taking the inode->i_lock around the
> state change and wakeup. Because the wakeup is no longer tricky,
> remove the wake_up_inode() function and open code the wakeup where
> necessary."
> 
> Can you do that bit separately?

We'd still need to modify that code to protect the i_state
manipulations with ->i_lock in this patch, so it doesn't really make
sense to me to undo the optimisation in one patch, then remove the
rest of it in another patch...

> >      fs: factor inode disposal
> 
> -       spin_unlock(&inode->i_lock);
> -
> -       /*
> -        * Move the inode off the IO lists and LRU once I_FREEING is
> -        * set so that it won't get moved back on there if it is dirty.
> -        */
>         inode_lru_list_del(inode);
> -       list_del_init(&inode->i_wb_list);
> -
> -       __inode_sb_list_del(inode);
> +       spin_unlock(&inode->i_lock);
>         spin_unlock(&inode_lock);
> +
> 
> Why do you change i_lock to cover inode_lru_list_del in this
> patch?

So that all LRU manipultions in the function are done consiѕtently
i.e. with the ->i_lock held.

> >      fs: Lock the inode LRU list separately
> 
> BTW. have you found any issues with nesting locks inside i_lock? We
> do that extensively in dcache now, but it can be trivially made to use
> another lock (eg. i_dcache_lock).

No new ones.

> >      fs: remove inode_lock from iput_final and prune_icache
> >      fs: move i_sb_list out from under inode_lock
> 
> I don't know if we should be including ../internal.h, although you
> could argue that quota and notify is core code.

I don't think there is much to argue about there.

> >      fs: move i_wb_list out from under inode_lock
> >      fs: rename inode_lock to inode_hash_lock
> >      fs: Clean up documentation references to inode_lock
> 
> Can these be merged with the patches as they change the locking?
> Some places also call it "inode lock" (eg. find_inode())

Yeah, probably should.

> >      fs: pull inode->i_lock up out of writeback_single_inode
> 
> I think the series looks pretty good overall. Thanks.

Don't thank me, thank Christoph, Al, Andrew, Eric and others for
reviewing it over and over to beat it into this shape.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Status of inode scaling work
  2011-01-27  4:57     ` Dave Chinner
@ 2011-01-27  5:11       ` Nick Piggin
  2011-01-27  6:20         ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Nick Piggin @ 2011-01-27  5:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Dave Chinner, Al Viro, Christoph Hellwig, Linus Torvalds, linux-fsdevel

On Thu, Jan 27, 2011 at 3:57 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Jan 27, 2011 at 11:40:57AM +1100, Nick Piggin wrote:
>> On Tue, Jan 25, 2011 at 6:10 PM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Fri, Jan 14, 2011 at 08:02:57PM +1100, Nick Piggin wrote:
>> >> I'd like to know about the status and plans of the inode scaling work.
>> >>
>> >> Aiming for 2.6.39 or 40 might be an idea.
>> >
>> > It was ready for 2.6.38....
>>
>> Can it go into linux-next? Unless Al can put it in his for-next branch?
>
> It'll go into linux-next whenever it is sufficiently reviewed and
> tested to get pulled into an upstream branch.
>
>> >> I'd like to be able to see what patches are proposed to finish breaking
>> >> inode_lock into its constituent sub classes, and we can go from there.
>> >
>> > Already done. I pointed you at the tree a while back:
>> >
>> > The following changes since commit c723fdab8aa728dc2bf0da6a0de8bb9c3f588d84:
>> >
>> >  Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 (2011-01-25 14:23:54 +1000)
>> >
>> > are available in the git repository at:
>> >
>> >  git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git inode-scale
>> >
>> > Dave Chinner (9):
>> >      fs: protect inode->i_state with inode->i_lock
>>
>> The places where you take i_lock to initialize i_state in newly
>> allocated inodes should not be required, because they can't be found
>> until they get added to lists, and adding to the list must have the right
>> barriers or lock synchronisation to make sure traversals don't see
>> uninitialised inodes.
>>
>> eg.
>>                          spin_lock(&inode->i_lock);
>>                          inode->i_state = I_NEW;
>>                          hlist_add_head(&inode->i_hash, head);
>>                          spin_unlock(&inode->i_lock);
>>                          inode_sb_list_add(inode);
>>                          spin_unlock(&inode_lock);
>>
>> That lightens new inode paths a little bit.
>
> Not sure what you are try to say here - the above code fragment is
> exactly how the code ends up after the patch....

I meant that the i_lock can be removed from that section.

>
> As it is, the reason it was done this way was suggested by Al:
>
> http://marc.info/?l=linux-fsdevel&m=128832928023866&w=2
>
> We use ->i_lock when checking inode_unhashed(), so we need to hold
> the ->i_lock when adding or removing the inode from the hash list.

Ok right. I thought you were avoiding that approach in favour of
checking i_state bits and using the data structure locks (as opposed
to i_lock).


> This is also important for when hash lookups get converted to RCU
> traversals, because then the only thing protecting a newly added
> inode from a concurrent traversal is the ->i_lock...

Well rcu_assign_pointer has a memory barrier in it. So it would be
possible for rcu lookups to do

spin_lock(&inode->i_lock);
if (inode->i_state & I_HASHED) {
  /* do stuff */
}

Even without the spin lock in the inode init code.


>> "Also remove the unlock_new_inode() memory barrier optimisation
>> required to avoid taking the inode_lock when clearing I_NEW.
>> Simplify the code by simply taking the inode->i_lock around the
>> state change and wakeup. Because the wakeup is no longer tricky,
>> remove the wake_up_inode() function and open code the wakeup where
>> necessary."
>>
>> Can you do that bit separately?
>
> We'd still need to modify that code to protect the i_state
> manipulations with ->i_lock in this patch, so it doesn't really make
> sense to me to undo the optimisation in one patch, then remove the
> rest of it in another patch...

I'm not sure that I understand. The unlock_new_inode() path, for
example, could retain the optimisation, couldn't it?


>> >      fs: factor inode disposal
>>
>> -       spin_unlock(&inode->i_lock);
>> -
>> -       /*
>> -        * Move the inode off the IO lists and LRU once I_FREEING is
>> -        * set so that it won't get moved back on there if it is dirty.
>> -        */
>>         inode_lru_list_del(inode);
>> -       list_del_init(&inode->i_wb_list);
>> -
>> -       __inode_sb_list_del(inode);
>> +       spin_unlock(&inode->i_lock);
>>         spin_unlock(&inode_lock);
>> +
>>
>> Why do you change i_lock to cover inode_lru_list_del in this
>> patch?
>
> So that all LRU manipultions in the function are done consiѕtently
> i.e. with the ->i_lock held.

Why do you need that?


>> >      fs: Lock the inode LRU list separately
>>
>> BTW. have you found any issues with nesting locks inside i_lock? We
>> do that extensively in dcache now, but it can be trivially made to use
>> another lock (eg. i_dcache_lock).
>
> No new ones.

But did you see actual problems that you thought might appear?


>> >      fs: remove inode_lock from iput_final and prune_icache
>> >      fs: move i_sb_list out from under inode_lock
>>
>> I don't know if we should be including ../internal.h, although you
>> could argue that quota and notify is core code.
>
> I don't think there is much to argue about there.

So, what is the use of internal.h? And why not just put it in fs.h (like
other things which are only to be used by core code)?

I'm not saying one way or the other is "correct", but a single policy
should be applied.


>> >      fs: move i_wb_list out from under inode_lock
>> >      fs: rename inode_lock to inode_hash_lock
>> >      fs: Clean up documentation references to inode_lock
>>
>> Can these be merged with the patches as they change the locking?
>> Some places also call it "inode lock" (eg. find_inode())
>
> Yeah, probably should.

Thanks.


>> >      fs: pull inode->i_lock up out of writeback_single_inode
>>
>> I think the series looks pretty good overall. Thanks.
>
> Don't thank me, thank Christoph, Al, Andrew, Eric and others for
> reviewing it over and over to beat it into this shape.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Status of inode scaling work
  2011-01-27  5:11       ` Nick Piggin
@ 2011-01-27  6:20         ` Dave Chinner
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2011-01-27  6:20 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Dave Chinner, Al Viro, Christoph Hellwig, Linus Torvalds, linux-fsdevel

On Thu, Jan 27, 2011 at 04:11:24PM +1100, Nick Piggin wrote:
> On Thu, Jan 27, 2011 at 3:57 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, Jan 27, 2011 at 11:40:57AM +1100, Nick Piggin wrote:
> >> On Tue, Jan 25, 2011 at 6:10 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> > On Fri, Jan 14, 2011 at 08:02:57PM +1100, Nick Piggin wrote:
> >> >> I'd like to know about the status and plans of the inode scaling work.
> >> >>
> >> >> Aiming for 2.6.39 or 40 might be an idea.
> >> >
> >> > It was ready for 2.6.38....
> >>
> >> Can it go into linux-next? Unless Al can put it in his for-next branch?
> >
> > It'll go into linux-next whenever it is sufficiently reviewed and
> > tested to get pulled into an upstream branch.
> >
> >> >> I'd like to be able to see what patches are proposed to finish breaking
> >> >> inode_lock into its constituent sub classes, and we can go from there.
> >> >
> >> > Already done. I pointed you at the tree a while back:
> >> >
> >> > The following changes since commit c723fdab8aa728dc2bf0da6a0de8bb9c3f588d84:
> >> >
> >> >  Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 (2011-01-25 14:23:54 +1000)
> >> >
> >> > are available in the git repository at:
> >> >
> >> >  git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git inode-scale
> >> >
> >> > Dave Chinner (9):
> >> >      fs: protect inode->i_state with inode->i_lock
> >>
> >> The places where you take i_lock to initialize i_state in newly
> >> allocated inodes should not be required, because they can't be found
> >> until they get added to lists, and adding to the list must have the right
> >> barriers or lock synchronisation to make sure traversals don't see
> >> uninitialised inodes.
> >>
> >> eg.
> >>                          spin_lock(&inode->i_lock);
> >>                          inode->i_state = I_NEW;
> >>                          hlist_add_head(&inode->i_hash, head);
> >>                          spin_unlock(&inode->i_lock);
> >>                          inode_sb_list_add(inode);
> >>                          spin_unlock(&inode_lock);
> >>
> >> That lightens new inode paths a little bit.
> >
> > Not sure what you are try to say here - the above code fragment is
> > exactly how the code ends up after the patch....
> 
> I meant that the i_lock can be removed from that section.
> 
> >
> > As it is, the reason it was done this way was suggested by Al:
> >
> > http://marc.info/?l=linux-fsdevel&m=128832928023866&w=2
> >
> > We use ->i_lock when checking inode_unhashed(), so we need to hold
> > the ->i_lock when adding or removing the inode from the hash list.
> 
> Ok right. I thought you were avoiding that approach in favour of
> checking i_state bits and using the data structure locks (as opposed
> to i_lock).

I've done whatever has seemed sane to do. What Al suggested
simplified the locking, fit in simply with the existing code, was
easy to understand from reading the code and consistent. Seemed like
a no-brainer to me...

> > This is also important for when hash lookups get converted to RCU
> > traversals, because then the only thing protecting a newly added
> > inode from a concurrent traversal is the ->i_lock...
> 
> Well rcu_assign_pointer has a memory barrier in it. So it would be
> possible for rcu lookups to do
> 
> spin_lock(&inode->i_lock);
> if (inode->i_state & I_HASHED) {
>   /* do stuff */
> }
> 
> Even without the spin lock in the inode init code.

Well, we don't have an I_HASHED bit, and this is the first
anyone has mentioned introducing it. Given that we have many
different ways of allocating new and hashing inodes, it seems
unnecessary to change the way we track how inodes are hashed right
now. Especially as some filesystems fake it (see the use of
hlist_add_fake()) because some core functions will only work on
hashed inodes. The proposed method handles this all just fine
without needing to change everything.

> >> "Also remove the unlock_new_inode() memory barrier optimisation
> >> required to avoid taking the inode_lock when clearing I_NEW.
> >> Simplify the code by simply taking the inode->i_lock around the
> >> state change and wakeup. Because the wakeup is no longer tricky,
> >> remove the wake_up_inode() function and open code the wakeup where
> >> necessary."
> >>
> >> Can you do that bit separately?
> >
> > We'd still need to modify that code to protect the i_state
> > manipulations with ->i_lock in this patch, so it doesn't really make
> > sense to me to undo the optimisation in one patch, then remove the
> > rest of it in another patch...
> 
> I'm not sure that I understand. The unlock_new_inode() path, for
> example, could retain the optimisation, couldn't it?

Seeing as it is not a necessary optimisation once the inode-lock is
removed, I much prefer to simplify the code first. If taking the
inode->i_lock proves to cause loss of performance or contention,
then we can re-introduce the optimisation. Otherwise it's just
complexity for the sake of saying "look how smart and tricky we can
be".

> >> >      fs: factor inode disposal
> >>
> >> -       spin_unlock(&inode->i_lock);
> >> -
> >> -       /*
> >> -        * Move the inode off the IO lists and LRU once I_FREEING is
> >> -        * set so that it won't get moved back on there if it is dirty.
> >> -        */
> >>         inode_lru_list_del(inode);
> >> -       list_del_init(&inode->i_wb_list);
> >> -
> >> -       __inode_sb_list_del(inode);
> >> +       spin_unlock(&inode->i_lock);
> >>         spin_unlock(&inode_lock);
> >> +
> >>
> >> Why do you change i_lock to cover inode_lru_list_del in this
> >> patch?
> >
> > So that all LRU manipultions in the function are done consiѕtently
> > i.e. with the ->i_lock held.
> 
> Why do you need that?

Just for consistency. Makes the code easier to understand because
when people look at the function they don't have to go digging to
work out why the differences in locking order isn't a bug....

> >> >      fs: Lock the inode LRU list separately
> >>
> >> BTW. have you found any issues with nesting locks inside i_lock? We
> >> do that extensively in dcache now, but it can be trivially made to use
> >> another lock (eg. i_dcache_lock).
> >
> > No new ones.
> 
> But did you see actual problems that you thought might appear?

What, that people can't rely on the i_lock being innermost anymore?
That's a design issue, so it will 'll take time to shake out as
people add and change code. More than likely it will mean that
filesystems grow their own spinlocks rather than using
inode->i_lock...

> >> >      fs: remove inode_lock from iput_final and prune_icache
> >> >      fs: move i_sb_list out from under inode_lock
> >>
> >> I don't know if we should be including ../internal.h, although you
> >> could argue that quota and notify is core code.
> >
> > I don't think there is much to argue about there.
> 
> So, what is the use of internal.h? And why not just put it in fs.h (like
> other things which are only to be used by core code)?

fs.h is hardly for "only to be used by core code" definitions. Every
man and his dog include it, it defines userspace interfaces (e.g.
ioctls, fcntls, etc), anything defined in it must be exported, etc.
Hence it's not the place to put an external definition for a lock
that is a) not exported and b) should only be used by internal VFS
code....

> I'm not saying one way or the other is "correct", but a single policy
> should be applied.

Yup, and I think I followed the policy for using fs/internal.h here...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-01-27  6:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-14  9:02 Status of inode scaling work Nick Piggin
2011-01-25  7:10 ` Dave Chinner
2011-01-27  0:40   ` Nick Piggin
2011-01-27  4:57     ` Dave Chinner
2011-01-27  5:11       ` Nick Piggin
2011-01-27  6:20         ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.