From: Dave Chinner <david@fromorbit.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	John Hubbard <jhubbard@nvidia.com>,
	Lee Jones <lee.jones@linaro.org>,
	linux-ext4@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Dave Chinner <dchinner@redhat.com>,
	Goldwyn Rodrigues <rgoldwyn@suse.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Bob Peterson <rpeterso@redhat.com>,
	Damien Le Moal <damien.lemoal@wdc.com>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	Ritesh Harjani <riteshh@linux.ibm.com>,
	Johannes Thumshirn <jth@kernel.org>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	cluster-devel@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()
Date: Thu, 24 Feb 2022 21:29:01 +1100	[thread overview]
Message-ID: <20220224102901.GN59715@dread.disaster.area> (raw)
In-Reply-To: <YhcAcfY1pZTl3sId@mit.edu>

On Wed, Feb 23, 2022 at 10:50:09PM -0500, Theodore Ts'o wrote:
> On Thu, Feb 24, 2022 at 12:48:42PM +1100, Dave Chinner wrote:
> > > Fair enough; on the other hand, we could also view this as making ext4
> > > more robust against buggy code in other subsystems, and while other
> > > file systems may be losing user data if they are actually trying to do
> > > remote memory access to file-backed memory, apparently other file
> > > systems aren't noticing and so they're not crashing.
> > 
> > Oh, we've noticed them, no question about that.  We've got bug
> > reports going back years for systems being crashed, triggering BUGs
> > and/or corrupting data on both XFS and ext4 filesystems due to users
> > trying to run RDMA applications with file backed pages.
> 
> Is this issue causing XFS to crash?  I didn't know that.

I have no idea if it crashes nowadays - go back a few years and
search for XFS BUGging out in ->invalidatepage (or was it
->releasepage?) because of unexpected dirty pages. I think it could
also trigger BUGs in writeback when ->writepages tripped over a
dirty page without a delayed allocation mapping over the hole...

We were pretty aggressive about telling people reporting such issues
that they get to keep all the borken bits to themselves and to stop
wasting our time with unsolvable problems caused by their
broken-by-design RDMA applications. Hence people have largely
stopped bothering us with random filesystem crashes on systems using
RDMA on file-backed pages...

> I tried the Syzbot reproducer with XFS mounted, and it didn't trigger
> any crashes.  I'm sure data was getting corrupted, but I figured I
> should bring ext4 to the XFS level of "at least we're not reliably
> killing the kernel".

Oh, well, good to know XFS didn't die a horrible death immediately.
Thanks for checking, Ted.

> On ext4, an unprivileged process can use process_vm_writev(2) to crash
> the system.  I don't know how quickly we can get a fix into mm/gup.c,
> but if some other kernel path tries calling set_page_dirty() on a
> file-backed page without first asking permission from the file system,
> it seems to be nice if the file system doesn't BUG() --- as near as I
> can tell, xfs isn't crashing in this case, but ext4 is.
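
(As an aside, a minimal userspace sketch of the pattern Ted describes
might look something like the following - this is hypothetical, not the
actual syzbot reproducer, and by itself it only exercises the GUP
dirtying path rather than guaranteeing a crash:)

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("testfile", O_RDWR | O_CREAT, 0644);
	char buf[16] = "dirty me";
	struct iovec local, remote;
	char *map;

	if (fd < 0 || ftruncate(fd, 4096) < 0)
		return 1;

	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return 1;

	local.iov_base = buf;
	local.iov_len = sizeof(buf);
	remote.iov_base = map;
	remote.iov_len = sizeof(buf);

	/*
	 * Write into our own MAP_SHARED file mapping via
	 * process_vm_writev(): the kernel pins the target pages with GUP
	 * and marks them dirty with set_page_dirty() when it is done,
	 * without notifying the filesystem at that point.
	 */
	if (process_vm_writev(getpid(), &local, 1, &remote, 1, 0) < 0)
		perror("process_vm_writev");

	munmap(map, 4096);
	close(fd);
	return 0;
}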

iomap is probably refusing to map holes for writepage - we've
cleaned up most of the weird edge cases to return errors, so I'm
guessing iomap is just ignoring such pages these days.

Yeah, see iomap_writepage_map():

                error = wpc->ops->map_blocks(wpc, inode, pos);
                if (error)
                        break;
                if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
                        continue;
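                /* A hole is silently skipped rather than written back. */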
                if (wpc->iomap.type == IOMAP_HOLE)
                        continue;

Yeah, so if writeback maps a hole rather than converting a delalloc
region to IOMAP_MAPPED, it'll just skip over the block/page.  IIRC,
such pages essentially become uncleanable, and I think inode reclaim
will eventually just toss them out of memory.
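
To make that concrete, a purely hypothetical ->map_blocks implementation
that reports a hole might look something like this - example_lookup_extent()
is a made-up helper standing in for the filesystem's extent lookup, not a
real kernel function:

#include <linux/fs.h>
#include <linux/iomap.h>

/* Made-up helper, declared here only for illustration: fills *iomap and
 * returns true if an allocated or delalloc extent covers @offset. */
static bool example_lookup_extent(struct inode *inode, loff_t offset,
				  struct iomap *iomap);

static int example_map_blocks(struct iomap_writepage_ctx *wpc,
			      struct inode *inode, loff_t offset)
{
	if (!example_lookup_extent(inode, offset, &wpc->iomap)) {
		/* Nothing backing this block: report a hole. */
		wpc->iomap.type = IOMAP_HOLE;
		wpc->iomap.offset = offset;
		wpc->iomap.length = i_blocksize(inode);
	}
	/*
	 * Returning 0 with IOMAP_HOLE makes iomap_writepage_map() hit
	 * the "continue" above and silently skip the block.
	 */
	return 0;
}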

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview:
2022-02-16 16:31 [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers() Lee Jones
2022-02-18  1:06 ` John Hubbard
2022-02-18  4:08   ` Theodore Ts'o
2022-02-18  6:33     ` John Hubbard
2022-02-23 23:31       ` Theodore Ts'o
2022-02-24  0:44         ` John Hubbard
2022-02-24  4:04           ` Theodore Ts'o
2022-02-18  7:51     ` Greg Kroah-Hartman
2022-02-23 23:35       ` Theodore Ts'o
2022-02-24  1:48         ` Dave Chinner
2022-02-24  3:50           ` Theodore Ts'o
2022-02-24 10:29             ` Dave Chinner [this message]
2022-02-18  2:54 ` Theodore Ts'o
2022-02-18  4:24   ` Matthew Wilcox
2022-02-18  6:03     ` Theodore Ts'o
2022-02-25 19:24 ` [PATCH -v2] ext4: don't BUG if kernel subsystems dirty pages without asking ext4 first Theodore Ts'o
2022-02-25 20:51   ` Eric Biggers
2022-02-25 21:08     ` Theodore Ts'o
2022-02-25 21:23       ` [PATCH -v3] " Theodore Ts'o
2022-02-25 21:33         ` John Hubbard
2022-02-25 23:21           ` Theodore Ts'o
2022-02-26  0:18             ` Hillf Danton
2022-02-26  0:41             ` John Hubbard
2022-02-26  1:40               ` Theodore Ts'o
2022-02-26  2:00                 ` Theodore Ts'o
2022-02-26  2:55                 ` John Hubbard
2022-03-03  4:26         ` [PATCH -v4] " Theodore Ts'o
2022-03-03  8:21           ` Christoph Hellwig
2022-03-03  9:21           ` Lee Jones
2022-03-03 14:38           ` [PATCH -v5] ext4: don't BUG if someone " Theodore Ts'o
