linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* locking rules for ->dirty_inode()
@ 2002-09-20 15:00 Nikita Danilov
  2002-09-20 15:52 ` Andrew Morton
  2002-09-20 22:41 ` Andrew Morton
  0 siblings, 2 replies; 9+ messages in thread
From: Nikita Danilov @ 2002-09-20 15:00 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Alexander Viro, Andrew Morton

Hello,

Documentation/filesystems/Locking states that all super operations may
block, but __set_page_dirty_buffers() calls

   __mark_inode_dirty()->s_op->dirty_inode()

under mapping->private_lock spin lock. This seems strange, because file
systems' ->dirty_inode() assume that they are allowed to block. For
example, ext3_dirty_inode() allocates memory in

   ext3_journal_start()->journal_start()->new_handle()->...

Nikita.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: locking rules for ->dirty_inode()
  2002-09-20 15:00 locking rules for ->dirty_inode() Nikita Danilov
@ 2002-09-20 15:52 ` Andrew Morton
  2002-09-20 16:32   ` Nikita Danilov
  2002-09-20 22:41 ` Andrew Morton
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2002-09-20 15:52 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Linux Kernel Mailing List, Alexander Viro

Nikita Danilov wrote:
> 
> Hello,
> 
> Documentation/filesystems/Locking states that all super operations may
> block, but __set_page_dirty_buffers() calls
> 
>    __mark_inode_dirty()->s_op->dirty_inode()
> 
> under mapping->private_lock spin lock. This seems strange, because file
> systems' ->dirty_inode() assume that they are allowed to block. For
> example, ext3_dirty_inode() allocates memory in
> 
>    ext3_journal_start()->journal_start()->new_handle()->...
> 

OK, thanks.

mapping->private_lock is taken there to pin page->buffers()
(Can't lock the page because set_page_dirty is called under
page_table_lock, and other locks).

I'm sure we can just move the spin_unlock up to above the
TestSetPageDirty(), but I need to zenuflect for a while over
why I did it that way.

It's necessary to expose buffer-dirtiness and page-dirtiness
to the rest of the world in the correct order.  If we set the
page dirty and then the buffers, there is a window in which writeback
could find the dirty page, try to write it, discover clean buffers
and mark the page clean.  We would end up with a !PageDirty page,
on mapping->clean_pages, with dirty buffers.  It would never be
written.

Yup.  We can move that spin_unlock up ten lines.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: locking rules for ->dirty_inode()
  2002-09-20 15:52 ` Andrew Morton
@ 2002-09-20 16:32   ` Nikita Danilov
  2002-09-20 16:47     ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Nikita Danilov @ 2002-09-20 16:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel Mailing List, Alexander Viro

Andrew Morton writes:
 > Nikita Danilov wrote:
 > > 
 > > Hello,
 > > 
 > > Documentation/filesystems/Locking states that all super operations may
 > > block, but __set_page_dirty_buffers() calls
 > > 
 > >    __mark_inode_dirty()->s_op->dirty_inode()
 > > 
 > > under mapping->private_lock spin lock. This seems strange, because file
 > > systems' ->dirty_inode() assume that they are allowed to block. For
 > > example, ext3_dirty_inode() allocates memory in
 > > 
 > >    ext3_journal_start()->journal_start()->new_handle()->...
 > > 
 > 
 > OK, thanks.
 > 
 > mapping->private_lock is taken there to pin page->buffers()
 > (Can't lock the page because set_page_dirty is called under
 > page_table_lock, and other locks).
 > 
 > I'm sure we can just move the spin_unlock up to above the
 > TestSetPageDirty(), but I need to zenuflect for a while over
 > why I did it that way.
 > 
 > It's necessary to expose buffer-dirtiness and page-dirtiness
 > to the rest of the world in the correct order.  If we set the
 > page dirty and then the buffers, there is a window in which writeback
 > could find the dirty page, try to write it, discover clean buffers
 > and mark the page clean.  We would end up with a !PageDirty page,
 > on mapping->clean_pages, with dirty buffers.  It would never be
 > written.
 > 
 > Yup.  We can move that spin_unlock up ten lines.

Actually, I came over this while trying to describe lock ordering in
reiser4 after I just started integrating other kernel locks there. I
wonder, has somebody already done this, writing up kernel lock
hierarchy, that is?

Nikita.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: locking rules for ->dirty_inode()
  2002-09-20 16:32   ` Nikita Danilov
@ 2002-09-20 16:47     ` Andrew Morton
  2002-09-20 17:32       ` Nikita Danilov
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2002-09-20 16:47 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Linux Kernel Mailing List, Alexander Viro

Nikita Danilov wrote:
> 
> ...
> Actually, I came over this while trying to describe lock ordering in
> reiser4 after I just started integrating other kernel locks there. I
> wonder, has somebody already done this, writing up kernel lock
> hierarchy, that is?
> 

I've been keeping the comment at the top of filemap.c uptodate when
I discover things.  It got smaller a while ago when certain rude
locks were spoken to.

Really, this form of representation isn't rich enough, but the
format certainly provides enough info to know when you might be
taking locks in the wrong order, and it tells you where to look
to see them being taken.

The problem with the format is that locks are only mentioned once,
and it can't describe the whole graph.  Maybe it needs something
like:


 *  ->i_shared_lock             (vmtruncate)
 *    ->private_lock            (__free_pte->__set_page_dirty_buffers)
 *      ->swap_list_lock
 *        ->swap_device_lock    (exclusive_swap_page, others)
 *          ->mapping->page_lock
 *      ->inode_lock            (__mark_inode_dirty)
 *        ->sb_lock             (fs/fs-writeback.c)
+*  ->i_shared_lock
+*    ->page_table_lock         (lots of places)
 */

Don't know.  Maybe someone somewhere has developed a notation
for this?   How are you doing it?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: locking rules for ->dirty_inode()
  2002-09-20 16:47     ` Andrew Morton
@ 2002-09-20 17:32       ` Nikita Danilov
  2002-09-20 18:21         ` Hans Reiser
  0 siblings, 1 reply; 9+ messages in thread
From: Nikita Danilov @ 2002-09-20 17:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel Mailing List, Alexander Viro

Andrew Morton writes:
 > Nikita Danilov wrote:
 > > 
 > > ...
 > > Actually, I came over this while trying to describe lock ordering in
 > > reiser4 after I just started integrating other kernel locks there. I
 > > wonder, has somebody already done this, writing up kernel lock
 > > hierarchy, that is?
 > > 
 > 
 > I've been keeping the comment at the top of filemap.c uptodate when
 > I discover things.  It got smaller a while ago when certain rude
 > locks were spoken to.
 > 
 > Really, this form of representation isn't rich enough, but the
 > format certainly provides enough info to know when you might be
 > taking locks in the wrong order, and it tells you where to look
 > to see them being taken.
 > 
 > The problem with the format is that locks are only mentioned once,
 > and it can't describe the whole graph.  Maybe it needs something
 > like:
 > 
 > 
 >  *  ->i_shared_lock             (vmtruncate)
 >  *    ->private_lock            (__free_pte->__set_page_dirty_buffers)
 >  *      ->swap_list_lock
 >  *        ->swap_device_lock    (exclusive_swap_page, others)
 >  *          ->mapping->page_lock
 >  *      ->inode_lock            (__mark_inode_dirty)
 >  *        ->sb_lock             (fs/fs-writeback.c)
 > +*  ->i_shared_lock
 > +*    ->page_table_lock         (lots of places)
 >  */
 > 
 > Don't know.  Maybe someone somewhere has developed a notation
 > for this?   How are you doing it?

I am doing it roughly in the same way: in a similar diagram where each lock is
mentioned exactly once, locks can be acquired from the left to the
right. Locks on the same indentation level are unordered and cannot be held at
the same time. This is enough to express lock *ordering*, whenever it
exists. For example, you diagram above gives:

->i_shared_lock             (vmtruncate)
  ->private_lock            (__free_pte->__set_page_dirty_buffers)
    ->swap_list_lock
      ->swap_device_lock    (exclusive_swap_page, others)
        ->mapping->page_lock
    ->inode_lock            (__mark_inode_dirty)
      ->sb_lock             (fs/fs-writeback.c)
  ->page_table_lock         (lots of places)

And it means that neither ->private_lock and ->page_table_lock nor
->swap_list_lock and ->inode_lock cannot be held at the same time.

As you mentioned (if I understood correctly) this is insufficient to
express where locks are taken and, more generally, how the lock graph is
embedded into the call graph. But lock ordering itself is quite useful. 

For example, in reiser4 we have SPIN_LOCK_FUNCTIONS() macro. If one has

typedef struct txn_handle {
  ...
  spinlock_t   guard;
  ...
} txn_handle;

then SPIN_LOCK_FUNCTIONS(txn_handle, guard) will generate functions like

spin_lock_txn_handle();
spin_unlock_txn_handle();
spin_trylock_txn_handle(); etc.

In debugging mode, these functions (aside from manipulating locks)
modify special per-thread counters. This allows one to specify lock
ordering constraints. Each spin_lock_foo() function checks
spin_ordering_pred_foo() macro before taking lock. This was really
helpful during early debugging.

Nikita.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: locking rules for ->dirty_inode()
  2002-09-20 17:32       ` Nikita Danilov
@ 2002-09-20 18:21         ` Hans Reiser
  0 siblings, 0 replies; 9+ messages in thread
From: Hans Reiser @ 2002-09-20 18:21 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Andrew Morton, Linux Kernel Mailing List, Alexander Viro

You might ask the SGI guys what they do.  I know that they have a 
systematic approach to documenting lock ordering.

Hans



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: locking rules for ->dirty_inode()
  2002-09-20 15:00 locking rules for ->dirty_inode() Nikita Danilov
  2002-09-20 15:52 ` Andrew Morton
@ 2002-09-20 22:41 ` Andrew Morton
  2002-09-23 16:32   ` Nikita Danilov
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2002-09-20 22:41 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Linux Kernel Mailing List, Alexander Viro

Nikita Danilov wrote:
> 
> Hello,
> 
> Documentation/filesystems/Locking states that all super operations may
> block, but __set_page_dirty_buffers() calls
> 
>    __mark_inode_dirty()->s_op->dirty_inode()
> 
> under mapping->private_lock spin lock.

Actually it doesn't.  We do not call down into the filesystem
for I_DIRTY_PAGES.

set_page_dirty() is already called under locks, via __free_pte (pagetable
teardown).  2.4 does this as well.

But I'll make the change anyway.  I think it removes any
ranking requirements between mapping->page_lock and 
mapping->private_lock, which is always a nice thing.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: locking rules for ->dirty_inode()
  2002-09-20 22:41 ` Andrew Morton
@ 2002-09-23 16:32   ` Nikita Danilov
  2002-09-23 16:42     ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Nikita Danilov @ 2002-09-23 16:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel Mailing List, Alexander Viro

Andrew Morton writes:
 > Nikita Danilov wrote:
 > > 
 > > Hello,
 > > 
 > > Documentation/filesystems/Locking states that all super operations may
 > > block, but __set_page_dirty_buffers() calls
 > > 
 > >    __mark_inode_dirty()->s_op->dirty_inode()
 > > 
 > > under mapping->private_lock spin lock.
 > 
 > Actually it doesn't.  We do not call down into the filesystem
 > for I_DIRTY_PAGES.
 > 
 > set_page_dirty() is already called under locks, via __free_pte (pagetable
 > teardown).  2.4 does this as well.

Cannot find __free_pte, it is only mentioned in comments in mm/filemap.c
and include/asm-generic/tlb.h.

 > 
 > But I'll make the change anyway.  I think it removes any
 > ranking requirements between mapping->page_lock and 
 > mapping->private_lock, which is always a nice thing.

Nikita.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: locking rules for ->dirty_inode()
  2002-09-23 16:32   ` Nikita Danilov
@ 2002-09-23 16:42     ` Andrew Morton
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2002-09-23 16:42 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Linux Kernel Mailing List, Alexander Viro

Nikita Danilov wrote:
> 
> Andrew Morton writes:
>  > Nikita Danilov wrote:
>  > >
>  > > Hello,
>  > >
>  > > Documentation/filesystems/Locking states that all super operations may
>  > > block, but __set_page_dirty_buffers() calls
>  > >
>  > >    __mark_inode_dirty()->s_op->dirty_inode()
>  > >
>  > > under mapping->private_lock spin lock.
>  >
>  > Actually it doesn't.  We do not call down into the filesystem
>  > for I_DIRTY_PAGES.
>  >
>  > set_page_dirty() is already called under locks, via __free_pte (pagetable
>  > teardown).  2.4 does this as well.
> 
> Cannot find __free_pte, it is only mentioned in comments in mm/filemap.c
> and include/asm-generic/tlb.h.
> 

It got moved around.  2.4: __free_pte(), 2.5: zap_pte_range().

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2002-09-23 19:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-09-20 15:00 locking rules for ->dirty_inode() Nikita Danilov
2002-09-20 15:52 ` Andrew Morton
2002-09-20 16:32   ` Nikita Danilov
2002-09-20 16:47     ` Andrew Morton
2002-09-20 17:32       ` Nikita Danilov
2002-09-20 18:21         ` Hans Reiser
2002-09-20 22:41 ` Andrew Morton
2002-09-23 16:32   ` Nikita Danilov
2002-09-23 16:42     ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).