Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Martin Steigerwald <martin@lichtvoll.de>,
	Matthew Wilcox <willy@infradead.org>,
	dsterba@suse.cz, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	linux-m68k@lists.linux-m68k.org,
	Debian m68k <debian-68k@lists.debian.org>
Subject: Re: moving affs + RDB partition support to staging?
Date: Sun, 6 May 2018 08:40:00 +0100
Message-ID: <20180506073955.GJ30522@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20180506005946.GI30522@ZenIV.linux.org.uk>

On Sun, May 06, 2018 at 01:59:51AM +0100, Al Viro wrote:

> > There is nothing at the moment that needs fixing.
> 
> Funny, that...  I'd been going through the damn thing for the
> last week or so; open-by-fhandle/nfs export support is completely
> buggered.  And as for the rest... the least said about the error
> handling, the better - something like rename() hitting an IO
> error (read one) can not only screw the on-disk data into the
> ground, it can do seriously bad things to kernel data structures.

... and while we are at it, consider the following:

mkdir a
mkdir b
touch a/x
ln a/x b

<umount and mount again, to get cold dcache, or just wait for memory
pressure to evict those suckers>

process A: unlink("a/x");
process B: open("b/x");

We had two entries - one in a, another in b; the one in a is ST_FILE one,
the one in b - ST_LINKFILE, with ->original refering to the ST_FILE one.

unlink() needs to kick the entry out of a; since it can't leave an
ST_FILE not included into any directory (and since the file should live on
due to b/x still being there) it ends up removing ST_LINKFILE entry from
b and moving ST_FILE one in its place.  That happens in affs_remove_link().

However, open() gets there just as this surgery is about to begin.
It gets to affs_lookup(), which finds the entry for b/x, reads it,
stashes block number of that entry into dentry->d_fsdata, notices
that it's ST_LINKFILE one, picks the block number of ST_FILE one
out of it and proceeds to look up the inode; that doesn't end up
doing any work (the same inode is in icache due to a/x having been
looked up by unlink(2)).

In the meanwhile, since the hash table of b has been unlocked once
we'd done hash lookup, affs_remove_link() can proceed with the
surgery.  It *does* try to catch dentries with ->d_fsdata containing
the block number of sacrificed ST_LINKFILE and reset that to block
number of ST_FILE that gets moved in its place.  However, it does
so by
static void
affs_fix_dcache(struct inode *inode, u32 entry_ino)
{
        struct dentry *dentry;
        spin_lock(&inode->i_lock);
        hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias) {
                if (entry_ino == (u32)(long)dentry->d_fsdata) {
                        dentry->d_fsdata = (void *)inode->i_ino;
                        break;
                }
        }
        spin_unlock(&inode->i_lock);
}
i.e. looping through dentries that point to our inode.  Except that
*our* dentry isn't attached to inode yet, so we are SOL - it's
left with ->d_fsdata pointing to (destroyed) ST_LINKFILE.

After a while process B closes the file and unlinks it.  Take a look
at affs_remove_header() and you'll see how much fun we are in for -
it uses ->d_fsdata to pick the entry to find and remove...

That's an fs corruptor, plain and simple.  As far as I had been able
to reconstruct the history, leftover after Roman's locking changes
17 years ago.  AFAICS, I'd missed it back then - the races I'd spotted
had been dealt with about a year later (when we started to lock the
victim in vfs_unlink/vfs_rmdir/vfs_rename), but this one went unnoticed...

  reply index

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-25 15:46 Moving unmaintained filesystems to staging Matthew Wilcox
2018-04-25 15:47 ` Christoph Hellwig
2018-04-25 20:30 ` David Sterba
2018-04-26  2:57   ` Matthew Wilcox
2018-04-26 10:28     ` moving affs + RDB partition support to staging? (was: Re: Moving unmaintained filesystems to staging) Martin Steigerwald
2018-04-26 10:45       ` moving affs + RDB partition support to staging? John Paul Adrian Glaubitz
2018-04-26 10:59         ` David Sterba
2018-04-26 11:06           ` John Paul Adrian Glaubitz
2018-05-06  0:59         ` Al Viro
2018-05-06  7:40           ` Al Viro [this message]
2018-05-06 20:46             ` Al Viro
2018-05-06 20:49               ` John Paul Adrian Glaubitz
2018-05-06 21:32               ` Al Viro
2018-05-07  2:15                 ` Al Viro
2018-05-07  2:40                   ` Michael Schmitz
2018-05-07  7:08                     ` Martin Steigerwald
2018-05-07 20:50                       ` Michael Schmitz
2018-05-07 20:56                         ` Ingo Jürgensmann
2018-05-07 20:58                           ` John Paul Adrian Glaubitz
2018-05-06  8:40           ` John Paul Adrian Glaubitz
2018-05-06 10:12           ` Martin Steigerwald
2018-04-26 11:00       ` moving affs + RDB partition support to staging? (was: Re: Moving unmaintained filesystems to staging) Christoph Hellwig
2018-04-26 11:08       ` Geert Uytterhoeven
2018-04-26 23:56         ` Finn Thain
2018-04-27  1:43           ` moving affs + RDB partition support to staging? jdow
2018-04-27  1:26         ` jdow
2018-05-06  8:52           ` John Paul Adrian Glaubitz
2018-05-06 10:10             ` Martin Steigerwald
2018-05-07  4:54             ` jdow
2018-04-27  2:11         ` moving affs + RDB partition support to staging? (was: Re: Moving unmaintained filesystems to staging) Michael Schmitz
2018-06-24  9:06           ` Martin Steigerwald
2018-06-24 11:33             ` moving affs + RDB partition support to staging? jdow
2018-06-24 11:40             ` jdow
2018-06-26  2:23               ` Michael Schmitz
2018-06-26  5:17                 ` jdow
2018-06-26  8:12                   ` Martin Steigerwald
2018-06-26  9:46                     ` jdow
2018-06-26  8:31                   ` Michael Schmitz
2018-06-26  9:45                     ` jdow
2018-06-27  1:07                       ` Michael Schmitz
2018-06-27  6:24                         ` jdow
2018-06-27  8:03                           ` Martin Steigerwald
2018-06-28  2:57                             ` jdow
2018-06-28  7:40                               ` Amiga RDB partition support for disks >= 2 TB (was: Re: moving affs + RDB partition support to staging?) Martin Steigerwald
2018-06-27  9:00                           ` moving affs + RDB partition support to staging? Michael Schmitz
2018-06-28  3:44                             ` jdow
2018-06-28  5:43                               ` Michael Schmitz
2018-06-28  6:39                                 ` jdow
2018-06-28  8:16                                   ` Amiga RDB partition support for disks >= 2 TB (was: Re: moving affs + RDB partition support to staging?) Martin Steigerwald
2018-06-28 10:00                                     ` Amiga RDB partition support for disks >= 2 TB jdow
2018-06-28 11:30                                       ` Martin Steigerwald
2018-06-28 11:38                                         ` Martin Steigerwald
2018-06-28 12:31                                           ` jdow
2018-06-28  8:07                                 ` Amiga RDB partition support for disks >= 2 TB (was: Re: moving affs + RDB partition support to staging?) Martin Steigerwald
2018-06-27  7:57                         ` moving affs + RDB partition support to staging? Martin Steigerwald
2018-06-28  2:56                           ` jdow
2018-06-26  8:02                 ` Martin Steigerwald
2018-06-26  8:40                   ` Michael Schmitz
2018-06-26  9:31                   ` jdow
2018-06-25  7:53             ` moving affs + RDB partition support to staging? (was: Re: Moving unmaintained filesystems to staging) Michael Schmitz
2018-06-25  8:26               ` Martin Steigerwald
2018-06-25  8:40               ` Geert Uytterhoeven
2018-04-27  8:01         ` Martin Steigerwald
2018-04-26  4:58   ` Moving unmaintained filesystems to staging Nikolay Borisov
2018-04-26  5:30     ` Willy Tarreau
2018-04-26  6:11 ` Pavel Machek
2018-04-26 10:36   ` Martin Steigerwald
2018-05-03  9:18     ` Pavel Machek
2018-04-27  1:10   ` Luis R. Rodriguez
2018-04-29 12:07   ` Greg KH
2018-04-29 20:07     ` Ondrej Zary
2018-04-29 23:37       ` Greg KH
2018-05-01 10:14         ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180506073955.GJ30522@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=axboe@kernel.dk \
    --cc=debian-68k@lists.debian.org \
    --cc=dsterba@suse.cz \
    --cc=glaubitz@physik.fu-berlin.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-m68k@lists.linux-m68k.org \
    --cc=martin@lichtvoll.de \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git