From: Dave Chinner <david@fromorbit.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	John Hubbard <jhubbard@nvidia.com>,
	Lee Jones <lee.jones@linaro.org>,
	linux-ext4@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Dave Chinner <dchinner@redhat.com>,
	Goldwyn Rodrigues <rgoldwyn@suse.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Bob Peterson <rpeterso@redhat.com>,
	Damien Le Moal <damien.lemoal@wdc.com>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	Ritesh Harjani <riteshh@linux.ibm.com>,
	Johannes Thumshirn <jth@kernel.org>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	cluster-devel@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()
Date: Thu, 24 Feb 2022 21:29:01 +1100	[thread overview]
Message-ID: <20220224102901.GN59715@dread.disaster.area> (raw)
In-Reply-To: <YhcAcfY1pZTl3sId@mit.edu>

On Wed, Feb 23, 2022 at 10:50:09PM -0500, Theodore Ts'o wrote:
> On Thu, Feb 24, 2022 at 12:48:42PM +1100, Dave Chinner wrote:
> > > Fair enough; on the other hand, we could also view this as making ext4
> > > more robust against buggy code in other subsystems, and while other
> > > file systems may be losing user data if they are actually trying to do
> > > remote memory access to file-backed memory, apparently other file
> > > systems aren't noticing and so they're not crashing.
> > 
> > Oh, we've noticed them, no question about that.  We've got bug
> > reports going back years for systems being crashed, triggering BUGs
> > and/or corrupting data on both XFS and ext4 filesystems due to users
> > trying to run RDMA applications with file backed pages.
> 
> Is this issue causing XFS to crash?  I didn't know that.

I have no idea if it crashes nowadays - go back a few years and
search for XFS BUGging out in ->invalidatepage (or was it
->releasepage?) because of unexpected dirty pages. I think it could
also trigger BUGs in writeback when ->writepages tripped over a
dirty page without a delayed allocation mapping over the hole...

We were pretty aggressive about telling people reporting such issues
that they get to keep all the borken bits to themselves and to stop
wasting our time with unsolvable problems caused by their
broken-by-design RDMA applications. Hence people have largely
stopped bothering us with random filesystem crashes on systems using
RDMA on file-backed pages...

> I tried the Syzbot reproducer with XFS mounted, and it didn't trigger
> any crashes.  I'm sure data was getting corrupted, but I figured I
> should bring ext4 to the XFS level of "at least we're not reliably
> killing the kernel".

Oh, well, good to know XFS didn't die a horrible death immediately.
Thanks for checking, Ted.

> On ext4, an unprivileged process can use process_vm_writev(2) to crash
> the system.  I don't know how quickly we can get a fix into mm/gup.c,
> but if some other kernel path tries calling set_page_dirty() on a
> file-backed page without first asking permission from the file system,
> it seems to be nice if the file system doesn't BUG() --- as near as I
> can tell, xfs isn't crashing in this case, but ext4 is.
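
(As an aside, a minimal userspace sketch of the pattern Ted describes
might look something like the following - this is hypothetical, not the
actual syzbot reproducer, and by itself it only exercises the GUP
dirtying path rather than guaranteeing a crash:)

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("testfile", O_RDWR | O_CREAT, 0644);
	char buf[16] = "dirty me";
	struct iovec local, remote;
	char *map;

	if (fd < 0 || ftruncate(fd, 4096) < 0)
		return 1;

	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return 1;

	local.iov_base = buf;
	local.iov_len = sizeof(buf);
	remote.iov_base = map;
	remote.iov_len = sizeof(buf);

	/*
	 * Write into our own MAP_SHARED file mapping via
	 * process_vm_writev(): the kernel pins the target pages with GUP
	 * and marks them dirty with set_page_dirty() when it is done,
	 * without notifying the filesystem at that point.
	 */
	if (process_vm_writev(getpid(), &local, 1, &remote, 1, 0) < 0)
		perror("process_vm_writev");

	munmap(map, 4096);
	close(fd);
	return 0;
}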

iomap is probably refusing to map holes for writepage - we've
cleaned up most of the weird edge cases to return errors, so I'm
guessing iomap is just ignoring such pages these days.

Yeah, see iomap_writepage_map():

                error = wpc->ops->map_blocks(wpc, inode, pos);
                if (error)
                        break;
                if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
                        continue;
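                /* A hole is silently skipped rather than written back. */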
                if (wpc->iomap.type == IOMAP_HOLE)
                        continue;

Yeah, so if writeback maps a hole rather than converting a delalloc
region to IOMAP_MAPPED, it'll just skip over the block/page.  IIRC,
such pages essentially become uncleanable, and I think inode reclaim
will eventually just toss them out of memory.
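
To make that concrete, a purely hypothetical ->map_blocks implementation
that reports a hole might look something like this - example_lookup_extent()
is a made-up helper standing in for the filesystem's extent lookup, not a
real kernel function:

#include <linux/fs.h>
#include <linux/iomap.h>

/* Made-up helper, declared here only for illustration: fills *iomap and
 * returns true if an allocated or delalloc extent covers @offset. */
static bool example_lookup_extent(struct inode *inode, loff_t offset,
				  struct iomap *iomap);

static int example_map_blocks(struct iomap_writepage_ctx *wpc,
			      struct inode *inode, loff_t offset)
{
	if (!example_lookup_extent(inode, offset, &wpc->iomap)) {
		/* Nothing backing this block: report a hole. */
		wpc->iomap.type = IOMAP_HOLE;
		wpc->iomap.offset = offset;
		wpc->iomap.length = i_blocksize(inode);
	}
	/*
	 * Returning 0 with IOMAP_HOLE makes iomap_writepage_map() hit
	 * the "continue" above and silently skip the block.
	 */
	return 0;
}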

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview:
2022-02-16 16:31 [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers() Lee Jones
2022-02-18  1:06 ` John Hubbard
2022-02-18  4:08   ` Theodore Ts'o
2022-02-18  6:33     ` John Hubbard
2022-02-23 23:31       ` Theodore Ts'o
2022-02-24  0:44         ` John Hubbard
2022-02-24  4:04           ` Theodore Ts'o
2022-02-18  7:51     ` Greg Kroah-Hartman
2022-02-23 23:35       ` Theodore Ts'o
2022-02-24  1:48         ` Dave Chinner
2022-02-24  3:50           ` Theodore Ts'o
2022-02-24 10:29             ` Dave Chinner [this message]
2022-02-18  2:54 ` Theodore Ts'o
2022-02-18  4:24   ` Matthew Wilcox
2022-02-18  6:03     ` Theodore Ts'o
2022-02-25 19:24 ` [PATCH -v2] ext4: don't BUG if kernel subsystems dirty pages without asking ext4 first Theodore Ts'o
2022-02-25 20:51   ` Eric Biggers
2022-02-25 21:08     ` Theodore Ts'o
2022-02-25 21:23       ` [PATCH -v3] " Theodore Ts'o
2022-02-25 21:33         ` John Hubbard
2022-02-25 23:21           ` Theodore Ts'o
2022-02-26  0:18             ` Hillf Danton
2022-02-26  0:41             ` John Hubbard
2022-02-26  1:40               ` Theodore Ts'o
2022-02-26  2:00                 ` Theodore Ts'o
2022-02-26  2:55                 ` John Hubbard
2022-03-03  4:26         ` [PATCH -v4] " Theodore Ts'o
2022-03-03  8:21           ` Christoph Hellwig
2022-03-03  9:21           ` Lee Jones
2022-03-03 14:38           ` [PATCH -v5] ext4: don't BUG if someone " Theodore Ts'o
