linux-kernel.vger.kernel.org archive mirror
* possible race condition in vfs code
@ 2003-04-15 16:22 Dave Peterson
  2003-04-29 22:51 ` Mika Penttilä
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Peterson @ 2003-04-15 16:22 UTC (permalink / raw)
  To: linux-kernel

I think I found a race condition in some of the inode handling
code.  Here's where I think it is:

    1.  Task A is inside kill_super().  It clears the MS_ACTIVE
        flag of the s_flags field of the super_block struct and
        calls invalidate_inodes().  In invalidate_inodes(), it
        attempts to acquire inode_lock and spins because task B,
        executing inside iput(), just decremented the reference
        count of some inode i to 0 and acquired inode_lock.

    2.  The "if (!inode->i_nlink)" test condition then evaluates
        to false for task B (the inode still has links), so B
        executes the following else branch:

        01 } else {
        02         if (!list_empty(&inode->i_hash)) {
        03                 if (!(inode->i_state & (I_DIRTY|I_LOCK))) {
        04                         list_del(&inode->i_list);
        05                         list_add(&inode->i_list, &inode_unused);
        06                 }
        07                 inodes_stat.nr_unused++;
        08                 spin_unlock(&inode_lock);
        09                 if (!sb || (sb->s_flags & MS_ACTIVE))
        10                         return;
        11                 write_inode_now(inode, 1);
        12                 spin_lock(&inode_lock);
        13                 inodes_stat.nr_unused--;
        14                 list_del_init(&inode->i_hash);
        15         }
        16         list_del_init(&inode->i_list);
        17         inode->i_state|=I_FREEING;
        18         inodes_stat.nr_inodes--;
        19         spin_unlock(&inode_lock);
        20         if (inode->i_data.nrpages)
        21                 truncate_inode_pages(&inode->i_data, 0);
        22         clear_inode(inode);
        23 }
        24 destroy_inode(inode);

        Now the test condition on line 02 evaluates to true, so
        task B adds the inode to the inode_unused list and then
        releases inode_lock on line 08.  Because task A already
        cleared MS_ACTIVE in step 1, the test on line 09 does not
        return, and B goes on to call write_inode_now() on line 11.

    3.  Task A now acquires inode_lock, while B spins on inode_lock
        inside write_inode_now().

    4.  Task A calls invalidate_list(), transferring inode i from
        the inode_unused list to its own private throw_away list.

    5.  Task A releases inode_lock, allowing B to acquire inode_lock
        and continue executing.

    6.  A attempts to destroy inode i inside dispose_list() while B
        simultaneously attempts to destroy i, potentially causing
        all sorts of bad things to happen.  (Task A's path is
        sketched just after this list.)
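
For reference, here is roughly the path task A takes, paraphrased from
the same tree the iput() snippet above comes from (abridged and
simplified, not verbatim source, so check your own kernel for the
details):

    void kill_super(struct super_block *sb)
    {
            /* ... */
            sb->s_flags &= ~MS_ACTIVE;          /* step 1 */
            invalidate_inodes(sb);
            /* ... put_super(), etc. ... */
    }

    int invalidate_inodes(struct super_block *sb)
    {
            int busy;
            LIST_HEAD(throw_away);              /* private disposal list */

            spin_lock(&inode_lock);             /* step 1: spins while B
                                                   holds the lock */
            /*
             * invalidate_list() walks the given list and, for every
             * inode of this sb with i_count == 0, does roughly:
             *
             *      list_del_init(&inode->i_hash);
             *      list_del(&inode->i_list);
             *      list_add(&inode->i_list, &throw_away);   (step 4)
             *      inode->i_state |= I_FREEING;
             *
             * Inodes that are still referenced make it report busy.
             */
            busy = invalidate_list(&inode_unused, sb, &throw_away);
            /* ... likewise for the in-use, dirty and locked lists ... */
            spin_unlock(&inode_lock);           /* step 5: lets B go on */

            dispose_list(&throw_away);          /* step 6: destroys inode i */
            return busy;
    }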

So, did I find a bug or are my eyes playing tricks on me?
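
If this is a real race, one way the window could be closed (purely
illustrative on my part, not a patch, and an actual fix may well look
different) would be for iput() to notice, after re-taking inode_lock on
line 12, that invalidate_list() has already claimed the inode, and back
off so that dispose_list() alone tears it down:

        spin_lock(&inode_lock);                 /* line 12 */
        if (inode->i_state & I_FREEING) {
                /*
                 * invalidate_list() grabbed the inode while we were
                 * in write_inode_now(): it is already unhashed, on
                 * the throw_away list and marked I_FREEING, so let
                 * dispose_list() finish it off instead of freeing
                 * it a second time here.  (The inodes_stat
                 * accounting is glossed over in this sketch.)
                 */
                spin_unlock(&inode_lock);
                return;
        }
        inodes_stat.nr_unused--;
        list_del_init(&inode->i_hash);
        /* ... continue with lines 15-24 as before ... */

Again, that is only meant to make the window concrete, not to claim it
is the right fix.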

-Dave Peterson
 dsp@llnl.gov

P.S.  Please CC my email address when responding to this message.



* Re: possible race condition in vfs code
  2003-04-15 16:22 possible race condition in vfs code Dave Peterson
@ 2003-04-29 22:51 ` Mika Penttilä
  2003-04-29 23:03   ` viro
  0 siblings, 1 reply; 4+ messages in thread
From: Mika Penttilä @ 2003-04-29 22:51 UTC (permalink / raw)
  To: Dave Peterson; +Cc: linux-kernel

That piece of code looks wrong in other ways too... if we have
unmounted an active fs (which shouldn't be done, but happens!), at the
very least we shouldn't be writing anything back to it! The !sb test
isn't useful (we never clear i_sb in live inodes), and the MS_ACTIVE
handling is racy as hell wrt umount...

--Mika

* Re: possible race condition in vfs code
  2003-04-29 22:51 ` Mika Penttilä
@ 2003-04-29 23:03   ` viro
  2003-04-30  6:52     ` Mika Penttilä
  0 siblings, 1 reply; 4+ messages in thread
From: viro @ 2003-04-29 23:03 UTC (permalink / raw)
  To: Mika Penttilä; +Cc: Dave Peterson, linux-kernel

On Wed, Apr 30, 2003 at 01:51:42AM +0300, Mika Penttilä wrote:
> That piece of code looks wrong in other ways too... if we have
> unmounted an active fs (which shouldn't be done, but happens!), at the
> very least we shouldn't be writing anything back to it! The !sb test
> isn't useful (we never clear i_sb in live inodes), and the MS_ACTIVE
> handling is racy as hell wrt umount...

Would you mind actually _reading_ kill_super() and stuff called by it?


* Re: possible race condition in vfs code
  2003-04-29 23:03   ` viro
@ 2003-04-30  6:52     ` Mika Penttilä
  0 siblings, 0 replies; 4+ messages in thread
From: Mika Penttilä @ 2003-04-30  6:52 UTC (permalink / raw)
  To: viro; +Cc: Dave Peterson, linux-kernel


I mean busy inodes after umount doing a bogus write_inode_now().  Busy
inodes don't pin the superblock (the vfsmnt does, but it's gone;
otherwise we wouldn't be in kill_super() in the first place).

Dave Peterson's fix solves the double-free issue, but it potentially
introduces another problem: a busy inode after the sb has gone.

--Mika
