* [PATCH 0/3] xfs, iomap: fix writeback failure handling
@ 2023-02-14  5:51 Dave Chinner
  2023-02-14  5:51 ` [PATCH 1/3] xfs: report block map corruption errors to the health tracking system Dave Chinner
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Dave Chinner @ 2023-02-14  5:51 UTC
  To: linux-xfs; +Cc: linux-fsdevel

Hi folks,

We just had a report of a WARN in the XFS writeback code where
delalloc conversion failed to find a delayed allocation extent in
the extent tree here:

https://bugzilla.kernel.org/show_bug.cgi?id=217030

Turns out that this is a regression that resulted from removing the
dirty page invalidation on writeback error behaviour that XFS had
for many, many years. Essentially, if we are not invalidating the
dirty cached data on error, we should not be invalidating the
delalloc extent that backs the dirty data. Bad things happen when we
do that.....

This series of patches first adds Darrick's code to mark inodes as
unhealthy when bad extent maps are found or corruption is detected
during allocation.

The second patch expands on this sickness detection to
cover delalloc conversion failures due to corruption detected during
allocation. It then uses this sickness to trigger removal of the
unconvertible delalloc extents after the VFS has discarded the
cached data during inode reclaim, rather than throwing warnings and
assert failures due to stray unconverted delalloc extents. Those
will still happen if the inode is healthy, hence the need for
ensuring we mark inodes sick correctly.

The last patch then removes xfs_discard_folio() as all it does is
punch the delalloc extent incorrectly. Given that there are now no
other users of ->discard_folio(), that gets removed too.

This has run for a couple of hours with the original reproducer
code, whereas without these patches a current 6.2-rc7 kernel fails
in seconds. No fstests regressions have been seen either, with both
1kB and 4kB block size auto group test runs now completed.

-Dave.




* [PATCH 1/3] xfs: report block map corruption errors to the health tracking system
  2023-02-14  5:51 [PATCH 0/3] xfs, iomap: fix writeback failure handling Dave Chinner
@ 2023-02-14  5:51 ` Dave Chinner
  2023-02-14  8:03   ` Christoph Hellwig
  2023-02-14  5:51 ` [PATCH 2/3] xfs: failed delalloc conversion results in bad extent lists Dave Chinner
  2023-02-14  5:51 ` [PATCH 3/3] xfs, iomap: ->discard_folio() is broken so remove it Dave Chinner
  2 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2023-02-14  5:51 UTC
  To: linux-xfs; +Cc: linux-fsdevel

From: "Darrick J. Wong" <djwong@kernel.org>

Whenever we encounter a corrupt block mapping, we should report that to
the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[dgc: open coded xfs_metadata_is_sick() macro]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c   | 35 +++++++++++++++++++++++++++++------
 fs/xfs/libxfs/xfs_health.h |  1 +
 fs/xfs/xfs_health.c        | 26 ++++++++++++++++++++++++++
 fs/xfs/xfs_iomap.c         | 15 ++++++++++++---
 fs/xfs/xfs_reflink.c       |  6 +++++-
 5 files changed, 73 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index c8c65387136c..958e4bb2e51e 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -36,6 +36,7 @@
 #include "xfs_refcount.h"
 #include "xfs_icache.h"
 #include "xfs_iomap.h"
+#include "xfs_health.h"
 
 struct kmem_cache		*xfs_bmap_intent_cache;
 
@@ -971,6 +972,7 @@ xfs_bmap_add_attrfork_local(
 
 	/* should only be called for types that support local format data */
 	ASSERT(0);
+	xfs_bmap_mark_sick(ip, XFS_ATTR_FORK);
 	return -EFSCORRUPTED;
 }
 
@@ -1126,6 +1128,7 @@ xfs_iread_bmbt_block(
 				(unsigned long long)ip->i_ino);
 		xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, block,
 				sizeof(*block), __this_address);
+		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
 	}
 
@@ -1141,6 +1144,7 @@ xfs_iread_bmbt_block(
 			xfs_inode_verifier_error(ip, -EFSCORRUPTED,
 					"xfs_iread_extents(2)", frp,
 					sizeof(*frp), fa);
+			xfs_bmap_mark_sick(ip, whichfork);
 			return -EFSCORRUPTED;
 		}
 		xfs_iext_insert(ip, &ir->icur, &new,
@@ -1189,6 +1193,8 @@ xfs_iread_extents(
 	ASSERT(ir.loaded == xfs_iext_count(ifp));
 	return 0;
 out:
+	if ((error == -EFSCORRUPTED) || (error == -EFSBADCRC))
+		xfs_bmap_mark_sick(ip, whichfork);
 	xfs_iext_destroy(ifp);
 	return error;
 }
@@ -1268,6 +1274,7 @@ xfs_bmap_last_before(
 		break;
 	default:
 		ASSERT(0);
+		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
 	}
 
@@ -3879,12 +3886,16 @@ xfs_bmapi_read(
 	ASSERT(!(flags & ~(XFS_BMAPI_ATTRFORK | XFS_BMAPI_ENTIRE)));
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED|XFS_ILOCK_EXCL));
 
-	if (WARN_ON_ONCE(!ifp))
+	if (WARN_ON_ONCE(!ifp)) {
+		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
+	}
 
 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) ||
-	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT))
+	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {
+		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
+	}
 
 	if (xfs_is_shutdown(mp))
 		return -EIO;
@@ -4365,6 +4376,7 @@ xfs_bmapi_write(
 
 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) ||
 	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {
+		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
 	}
 
@@ -4592,9 +4604,11 @@ xfs_bmapi_convert_delalloc(
 	error = -ENOSPC;
 	if (WARN_ON_ONCE(bma.blkno == NULLFSBLOCK))
 		goto out_finish;
-	error = -EFSCORRUPTED;
-	if (WARN_ON_ONCE(!xfs_valid_startblock(ip, bma.got.br_startblock)))
+	if (WARN_ON_ONCE(!xfs_valid_startblock(ip, bma.got.br_startblock))) {
+		xfs_bmap_mark_sick(ip, whichfork);
+		error = -EFSCORRUPTED;
 		goto out_finish;
+	}
 
 	XFS_STATS_ADD(mp, xs_xstrat_bytes, XFS_FSB_TO_B(mp, bma.length));
 	XFS_STATS_INC(mp, xs_xstrat_quick);
@@ -4653,6 +4667,7 @@ xfs_bmapi_remap(
 
 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) ||
 	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {
+		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
 	}
 
@@ -5291,8 +5306,10 @@ __xfs_bunmapi(
 	whichfork = xfs_bmapi_whichfork(flags);
 	ASSERT(whichfork != XFS_COW_FORK);
 	ifp = xfs_ifork_ptr(ip, whichfork);
-	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)))
+	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp))) {
+		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
+	}
 	if (xfs_is_shutdown(mp))
 		return -EIO;
 
@@ -5762,6 +5779,7 @@ xfs_bmap_collapse_extents(
 
 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) ||
 	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {
+		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
 	}
 
@@ -5877,6 +5895,7 @@ xfs_bmap_insert_extents(
 
 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) ||
 	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {
+		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
 	}
 
@@ -5980,6 +5999,7 @@ xfs_bmap_split_extent(
 
 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) ||
 	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {
+		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
 	}
 
@@ -6161,8 +6181,10 @@ xfs_bmap_finish_one(
 			bmap->br_startoff, bmap->br_blockcount,
 			bmap->br_state);
 
-	if (WARN_ON_ONCE(bi->bi_whichfork != XFS_DATA_FORK))
+	if (WARN_ON_ONCE(bi->bi_whichfork != XFS_DATA_FORK)) {
+		xfs_bmap_mark_sick(bi->bi_owner, bi->bi_whichfork);
 		return -EFSCORRUPTED;
+	}
 
 	if (XFS_TEST_ERROR(false, tp->t_mountp,
 			XFS_ERRTAG_BMAP_FINISH_ONE))
@@ -6180,6 +6202,7 @@ xfs_bmap_finish_one(
 		break;
 	default:
 		ASSERT(0);
+		xfs_bmap_mark_sick(bi->bi_owner, bi->bi_whichfork);
 		error = -EFSCORRUPTED;
 	}
 
diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
index 99e796256c5d..b6bfa3b17b1e 100644
--- a/fs/xfs/libxfs/xfs_health.h
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -120,6 +120,7 @@ void xfs_inode_measure_sickness(struct xfs_inode *ip, unsigned int *sick,
 		unsigned int *checked);
 
 void xfs_health_unmount(struct xfs_mount *mp);
+void xfs_bmap_mark_sick(struct xfs_inode *ip, int whichfork);
 
 /* Now some helpers. */
 
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
index 72a075bb2c10..9887fb3b9b0f 100644
--- a/fs/xfs/xfs_health.c
+++ b/fs/xfs/xfs_health.c
@@ -393,3 +393,29 @@ xfs_bulkstat_health(
 			bs->bs_sick |= m->ioctl_mask;
 	}
 }
+
+/* Mark a block mapping sick. */
+void
+xfs_bmap_mark_sick(
+	struct xfs_inode	*ip,
+	int			whichfork)
+{
+	unsigned int		mask;
+
+	switch (whichfork) {
+	case XFS_DATA_FORK:
+		mask = XFS_SICK_INO_BMBTD;
+		break;
+	case XFS_ATTR_FORK:
+		mask = XFS_SICK_INO_BMBTA;
+		break;
+	case XFS_COW_FORK:
+		mask = XFS_SICK_INO_BMBTC;
+		break;
+	default:
+		ASSERT(0);
+		return;
+	}
+
+	xfs_inode_mark_sick(ip, mask);
+}
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index fc1946f80a4a..c2ba03281daf 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -27,6 +27,7 @@
 #include "xfs_dquot_item.h"
 #include "xfs_dquot.h"
 #include "xfs_reflink.h"
+#include "xfs_health.h"
 
 #define XFS_ALLOC_ALIGN(mp, off) \
 	(((off) >> mp->m_allocsize_log) << mp->m_allocsize_log)
@@ -45,6 +46,7 @@ xfs_alert_fsblock_zero(
 		(unsigned long long)imap->br_startoff,
 		(unsigned long long)imap->br_blockcount,
 		imap->br_state);
+	xfs_bmap_mark_sick(ip, XFS_DATA_FORK);
 	return -EFSCORRUPTED;
 }
 
@@ -99,8 +101,10 @@ xfs_bmbt_to_iomap(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
 
-	if (unlikely(!xfs_valid_startblock(ip, imap->br_startblock)))
+	if (unlikely(!xfs_valid_startblock(ip, imap->br_startblock))) {
+		xfs_bmap_mark_sick(ip, XFS_DATA_FORK);
 		return xfs_alert_fsblock_zero(ip, imap);
+	}
 
 	if (imap->br_startblock == HOLESTARTBLOCK) {
 		iomap->addr = IOMAP_NULL_ADDR;
@@ -325,8 +329,10 @@ xfs_iomap_write_direct(
 		goto out_unlock;
 	}
 
-	if (unlikely(!xfs_valid_startblock(ip, imap->br_startblock)))
+	if (unlikely(!xfs_valid_startblock(ip, imap->br_startblock))) {
+		xfs_bmap_mark_sick(ip, XFS_DATA_FORK);
 		error = xfs_alert_fsblock_zero(ip, imap);
+	}
 
 out_unlock:
 	*seq = xfs_iomap_inode_sequence(ip, 0);
@@ -639,8 +645,10 @@ xfs_iomap_write_unwritten(
 		if (error)
 			return error;
 
-		if (unlikely(!xfs_valid_startblock(ip, imap.br_startblock)))
+		if (unlikely(!xfs_valid_startblock(ip, imap.br_startblock))) {
+			xfs_bmap_mark_sick(ip, XFS_DATA_FORK);
 			return xfs_alert_fsblock_zero(ip, &imap);
+		}
 
 		if ((numblks_fsb = imap.br_blockcount) == 0) {
 			/*
@@ -986,6 +994,7 @@ xfs_buffered_write_iomap_begin(
 
 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(&ip->i_df)) ||
 	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {
+		xfs_bmap_mark_sick(ip, XFS_DATA_FORK);
 		error = -EFSCORRUPTED;
 		goto out_unlock;
 	}
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 5535778a98f9..55604bbd25a4 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -29,6 +29,7 @@
 #include "xfs_iomap.h"
 #include "xfs_ag.h"
 #include "xfs_ag_resv.h"
+#include "xfs_health.h"
 
 /*
  * Copy on Write of Shared Blocks
@@ -1223,8 +1224,10 @@ xfs_reflink_remap_extent(
 	 * extent if they're both holes or both the same physical extent.
 	 */
 	if (dmap->br_startblock == smap.br_startblock) {
-		if (dmap->br_state != smap.br_state)
+		if (dmap->br_state != smap.br_state) {
+			xfs_bmap_mark_sick(ip, XFS_DATA_FORK);
 			error = -EFSCORRUPTED;
+		}
 		goto out_cancel;
 	}
 
@@ -1387,6 +1390,7 @@ xfs_reflink_remap_blocks(
 		ASSERT(nimaps == 1 && imap.br_startoff == srcoff);
 		if (imap.br_startblock == DELAYSTARTBLOCK) {
 			ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
+			xfs_bmap_mark_sick(src, XFS_DATA_FORK);
 			error = -EFSCORRUPTED;
 			break;
 		}
-- 
2.39.0



* [PATCH 2/3] xfs: failed delalloc conversion results in bad extent lists
  2023-02-14  5:51 [PATCH 0/3] xfs, iomap: fix writeback failure handling Dave Chinner
  2023-02-14  5:51 ` [PATCH 1/3] xfs: report block map corruption errors to the health tracking system Dave Chinner
@ 2023-02-14  5:51 ` Dave Chinner
  2023-02-14  8:13   ` Christoph Hellwig
  2023-02-14  5:51 ` [PATCH 3/3] xfs, iomap: ->discard_folio() is broken so remove it Dave Chinner
  2 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2023-02-14  5:51 UTC
  To: linux-xfs; +Cc: linux-fsdevel

From: Dave Chinner <dchinner@redhat.com>

If we fail delayed allocation conversion because we cannot allocate
blocks, we end up in the situation where the inode extent list is
effectively corrupt and unresolvable. We have dirty data in memory
that we cannot allocate space for, and hence we cannot write that
data back to disk. Unmounting a filesystem in this state results in
data loss.

In situations where we end up with a corrupt extent list in memory,
we can also be asked to convert a delayed region that does not have
a delalloc extent backing it. This should be considered a
corruption, too, not a "try again later" error.

These conversion failures result in the inode being sick and needing
repair, but we don't currently mark all the conditions that can lead
to bmap sickness. Make sure that the error conditions that indicate
corruption are properly marked.

Further, if we trip over these corruption conditions, we then have
to reclaim an inode that has unresolvable delayed allocation extents
attached to it. This generally happens at unmount and inode
inactivation will fire assert failures because we've left stray
delayed allocation extents behind on the inode. Hence we need to
ensure that we only trigger the stale delalloc extent checks if the
inode is fully healthy.

Even then, this may not be enough, because the inactivation code
assumes that there will be no stray delayed extents unless the
filesystem is shut down. If the inode is unhealthy, we need to
ensure we clean up delayed allocation extents within EOF because
the VFS has already tossed the data away. Hence there's no longer
any data over the delalloc extents to write back, so we need to also
toss the delayed allocation extents to ensure we release the space
reservation the delalloc extent holds. Failure to punch delalloc
extents in this case results in assert failures during unmount when
the delalloc block counter is torn down.

This all needs to be in place before the next patch which resolves a
bug in the iomap code that discards delalloc extents backing dirty
pages on writeback error without discarding the dirty data. Hence we
need to be able to handle delalloc extents in inode cleanup sanely,
rather than rely on incorrectly punching the delalloc extents on the
first writeback error that occurs.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 13 ++++++++++---
 fs/xfs/xfs_icache.c      |  4 +++-
 fs/xfs/xfs_inode.c       | 10 ++++++++++
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 958e4bb2e51e..fb718a5825d5 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4553,8 +4553,12 @@ xfs_bmapi_convert_delalloc(
 		 * should only happen for the COW fork, where another thread
 		 * might have moved the extent to the data fork in the meantime.
 		 */
-		WARN_ON_ONCE(whichfork != XFS_COW_FORK);
-		error = -EAGAIN;
+		if (whichfork != XFS_COW_FORK) {
+			xfs_bmap_mark_sick(ip, whichfork);
+			error = -EFSCORRUPTED;
+		} else {
+			error = -EAGAIN;
+		}
 		goto out_trans_cancel;
 	}
 
@@ -4598,8 +4602,11 @@ xfs_bmapi_convert_delalloc(
 		bma.prev.br_startoff = NULLFILEOFF;
 
 	error = xfs_bmapi_allocate(&bma);
-	if (error)
+	if (error) {
+		if ((error == -EFSCORRUPTED) || (error == -EFSBADCRC))
+			xfs_bmap_mark_sick(ip, whichfork);
 		goto out_finish;
+	}
 
 	error = -ENOSPC;
 	if (WARN_ON_ONCE(bma.blkno == NULLFSBLOCK))
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index ddeaccc04aec..4354b6639dec 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -24,6 +24,7 @@
 #include "xfs_ialloc.h"
 #include "xfs_ag.h"
 #include "xfs_log_priv.h"
+#include "xfs_health.h"
 
 #include <linux/iversion.h>
 
@@ -1810,7 +1811,8 @@ xfs_inodegc_set_reclaimable(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_perag	*pag;
 
-	if (!xfs_is_shutdown(mp) && ip->i_delayed_blks) {
+	if (ip->i_delayed_blks && xfs_inode_is_healthy(ip) &&
+	    !xfs_is_shutdown(mp)) {
 		xfs_check_delalloc(ip, XFS_DATA_FORK);
 		xfs_check_delalloc(ip, XFS_COW_FORK);
 		ASSERT(0);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index d354ea2b74f9..06f1229ef628 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -37,6 +37,7 @@
 #include "xfs_reflink.h"
 #include "xfs_ag.h"
 #include "xfs_log_priv.h"
+#include "xfs_health.h"
 
 struct kmem_cache *xfs_inode_cache;
 
@@ -1738,6 +1739,15 @@ xfs_inactive(
 		if (xfs_can_free_eofblocks(ip, true))
 			xfs_free_eofblocks(ip);
 
+		/*
+		 * If the inode is sick, then it might have delalloc extents
+		 * within EOF that we were unable to convert. We have to punch
+		 * them out here to release the reservation as there is no
+		 * longer any data to write back into the delalloc range now.
+		 */
+		if (!xfs_inode_is_healthy(ip))
+			xfs_bmap_punch_delalloc_range(ip, 0,
+						i_size_read(VFS_I(ip)));
 		goto out;
 	}
 
-- 
2.39.0



* [PATCH 3/3] xfs, iomap: ->discard_folio() is broken so remove it
  2023-02-14  5:51 [PATCH 0/3] xfs, iomap: fix writeback failure handling Dave Chinner
  2023-02-14  5:51 ` [PATCH 1/3] xfs: report block map corruption errors to the health tracking system Dave Chinner
  2023-02-14  5:51 ` [PATCH 2/3] xfs: failed delalloc conversion results in bad extent lists Dave Chinner
@ 2023-02-14  5:51 ` Dave Chinner
  2023-02-14  8:14   ` Christoph Hellwig
  2023-02-14 18:10   ` Brian Foster
  2 siblings, 2 replies; 14+ messages in thread
From: Dave Chinner @ 2023-02-14  5:51 UTC
  To: linux-xfs; +Cc: linux-fsdevel

From: Dave Chinner <dchinner@redhat.com>

Ever since commit e9c3a8e820ed ("iomap: don't invalidate folios
after writeback errors") XFS and iomap have been retaining dirty
folios in memory after a writeback error. XFS no longer invalidates
the folio, and iomap no longer clears the folio uptodate state.

However, iomap is still calling ->discard_folio on error, and
XFS is still punching the delayed allocation range backing the dirty
folio.

This is incorrect behaviour. The folio remains dirty and up to date,
meaning that another writeback will be attempted in the near future.
THis means that XFS is still going to have to allocate space for it
during writeback, and that means it still needs to have a delayed
allocation reservation and extent backing the dirty folio.

Failure to retain the delalloc extent (because xfs_discard_folio()
punched it out) means that the next writeback attempt does not find
an extent over the range of the write in ->map_blocks(), and
xfs_map_blocks() triggers a WARN_ON() because it should never land
in a hole for a data fork writeback request. This looks like:

[  647.356969] ------------[ cut here ]------------
[  647.359277] WARNING: CPU: 14 PID: 21913 at fs/xfs/libxfs/xfs_bmap.c:4510 xfs_bmapi_convert_delalloc+0x221/0x4e0
[  647.364551] Modules linked in:
[  647.366294] CPU: 14 PID: 21913 Comm: test_delalloc_c Not tainted 6.2.0-rc7-dgc+ #1754
[  647.370356] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[  647.374781] RIP: 0010:xfs_bmapi_convert_delalloc+0x221/0x4e0
[  647.377807] Code: e9 7d fe ff ff 80 bf 54 01 00 00 00 0f 84 68 fe ff ff 48 8d 47 70 48 89 04 24 e9 63 fe ff ff 83 fd 02 41 be f5 ff ff ff 74 a5 <0f> 0b eb a0
[  647.387242] RSP: 0018:ffffc9000aa677a8 EFLAGS: 00010293
[  647.389837] RAX: 0000000000000000 RBX: ffff88825bc4da00 RCX: 0000000000000000
[  647.393371] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88825bc4da40
[  647.396546] RBP: 0000000000000000 R08: ffffc9000aa67810 R09: ffffc9000aa67850
[  647.400186] R10: ffff88825bc4da00 R11: ffff888800a9aaac R12: ffff888101707000
[  647.403484] R13: ffffc9000aa677e0 R14: 00000000fffffff5 R15: 0000000000000004
[  647.406251] FS:  00007ff35ec24640(0000) GS:ffff88883ed00000(0000) knlGS:0000000000000000
[  647.410089] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  647.413225] CR2: 00007f7292cbc5d0 CR3: 0000000807d0e004 CR4: 0000000000060ee0
[  647.416917] Call Trace:
[  647.418080]  <TASK>
[  647.419291]  ? _raw_spin_unlock_irqrestore+0xe/0x30
[  647.421400]  xfs_map_blocks+0x1b7/0x590
[  647.422951]  iomap_do_writepage+0x1f1/0x7d0
[  647.424607]  ? __mod_lruvec_page_state+0x93/0x140
[  647.426419]  write_cache_pages+0x17b/0x4f0
[  647.428079]  ? iomap_read_end_io+0x2c0/0x2c0
[  647.429839]  iomap_writepages+0x1c/0x40
[  647.431377]  xfs_vm_writepages+0x79/0xb0
[  647.432826]  do_writepages+0xbd/0x1a0
[  647.434207]  ? obj_cgroup_release+0x73/0xb0
[  647.435769]  ? drain_obj_stock+0x130/0x290
[  647.437273]  ? avc_has_perm+0x8a/0x1a0
[  647.438746]  ? avc_has_perm_noaudit+0x8c/0x100
[  647.440223]  __filemap_fdatawrite_range+0x8e/0xa0
[  647.441960]  filemap_write_and_wait_range+0x3d/0xa0
[  647.444258]  __iomap_dio_rw+0x181/0x790
[  647.445960]  ? __schedule+0x385/0xa20
[  647.447829]  iomap_dio_rw+0xe/0x30
[  647.449284]  xfs_file_dio_write_aligned+0x97/0x150
[  647.451332]  ? selinux_file_permission+0x107/0x150
[  647.453299]  xfs_file_write_iter+0xd2/0x120
[  647.455238]  vfs_write+0x20d/0x3d0
[  647.456768]  ksys_write+0x69/0xf0
[  647.458067]  do_syscall_64+0x34/0x80
[  647.459488]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  647.461529] RIP: 0033:0x7ff3651406e9
[  647.463119] Code: 48 8d 3d 2a a1 0c 00 0f 05 eb a5 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f8
[  647.470563] RSP: 002b:00007ff35ec23df8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  647.473465] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff3651406e9
[  647.476278] RDX: 0000000000001400 RSI: 0000000020000000 RDI: 0000000000000005
[  647.478895] RBP: 00007ff35ec23e20 R08: 0000000000000005 R09: 0000000000000000
[  647.481568] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe533d8d4e
[  647.483751] R13: 00007ffe533d8d4f R14: 0000000000000000 R15: 00007ff35ec24640
[  647.486168]  </TASK>
[  647.487142] ---[ end trace 0000000000000000 ]---

Punching delalloc extents out from under dirty cached pages is wrong
and broken. We can't remove the delalloc extent until the page is
either removed from memory (i.e. invalidated) or writeback succeeds
in converting the delalloc extent to a real extent and writeback can
clean the page.

Hence we remove xfs_discard_folio() because it is only punching
delalloc blocks from under dirty pages now. With that removal,
nothing else uses ->discard_folio(), so we remove that from the
iomap infrastructure as well.

Reported-by: pengfei.xu@intel.com
Fixes: e9c3a8e820ed ("iomap: don't invalidate folios after writeback errors")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/iomap/buffered-io.c | 16 +++-------------
 fs/xfs/xfs_aops.c      | 35 -----------------------------------
 include/linux/iomap.h  |  6 ------
 3 files changed, 3 insertions(+), 54 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 356193e44cf0..502fa2d41097 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1635,19 +1635,9 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	 * completion to mark the error state of the pages under writeback
 	 * appropriately.
 	 */
-	if (unlikely(error)) {
-		/*
-		 * Let the filesystem know what portion of the current page
-		 * failed to map. If the page hasn't been added to ioend, it
-		 * won't be affected by I/O completion and we must unlock it
-		 * now.
-		 */
-		if (wpc->ops->discard_folio)
-			wpc->ops->discard_folio(folio, pos);
-		if (!count) {
-			folio_unlock(folio);
-			goto done;
-		}
+	if (unlikely(error && !count)) {
+		folio_unlock(folio);
+		goto done;
 	}
 
 	folio_start_writeback(folio);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 41734202796f..3f0dae5ca9c2 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -448,44 +448,9 @@ xfs_prepare_ioend(
 	return status;
 }
 
-/*
- * If the page has delalloc blocks on it, we need to punch them out before we
- * invalidate the page.  If we don't, we leave a stale delalloc mapping on the
- * inode that can trip up a later direct I/O read operation on the same region.
- *
- * We prevent this by truncating away the delalloc regions on the page.  Because
- * they are delalloc, we can do this without needing a transaction. Indeed - if
- * we get ENOSPC errors, we have to be able to do this truncation without a
- * transaction as there is no space left for block reservation (typically why we
- * see a ENOSPC in writeback).
- */
-static void
-xfs_discard_folio(
-	struct folio		*folio,
-	loff_t			pos)
-{
-	struct xfs_inode	*ip = XFS_I(folio->mapping->host);
-	struct xfs_mount	*mp = ip->i_mount;
-	int			error;
-
-	if (xfs_is_shutdown(mp))
-		return;
-
-	xfs_alert_ratelimited(mp,
-		"page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
-			folio, ip->i_ino, pos);
-
-	error = xfs_bmap_punch_delalloc_range(ip, pos,
-			round_up(pos, folio_size(folio)));
-
-	if (error && !xfs_is_shutdown(mp))
-		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
-}
-
 static const struct iomap_writeback_ops xfs_writeback_ops = {
 	.map_blocks		= xfs_map_blocks,
 	.prepare_ioend		= xfs_prepare_ioend,
-	.discard_folio		= xfs_discard_folio,
 };
 
 STATIC int
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0983dfc9a203..681e26a86791 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -310,12 +310,6 @@ struct iomap_writeback_ops {
 	 * conversions.
 	 */
 	int (*prepare_ioend)(struct iomap_ioend *ioend, int status);
-
-	/*
-	 * Optional, allows the file system to discard state on a page where
-	 * we failed to submit any I/O.
-	 */
-	void (*discard_folio)(struct folio *folio, loff_t pos);
 };
 
 struct iomap_writepage_ctx {
-- 
2.39.0



* Re: [PATCH 1/3] xfs: report block map corruption errors to the health tracking system
  2023-02-14  5:51 ` [PATCH 1/3] xfs: report block map corruption errors to the health tracking system Dave Chinner
@ 2023-02-14  8:03   ` Christoph Hellwig
  2023-02-14 22:21     ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2023-02-14  8:03 UTC
  To: Dave Chinner; +Cc: linux-xfs, linux-fsdevel

On Tue, Feb 14, 2023 at 04:51:12PM +1100, Dave Chinner wrote:
> From: "Darrick J. Wong" <djwong@kernel.org>
> 
> Whenever we encounter a corrupt block mapping, we should report that to
> the health monitoring system for later reporting.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> [dgc: open coded xfs_metadata_is_sick() macro]
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Just curious:  this is probably from a bigger series, which one is
that?



* Re: [PATCH 2/3] xfs: failed delalloc conversion results in bad extent lists
  2023-02-14  5:51 ` [PATCH 2/3] xfs: failed delalloc conversion results in bad extent lists Dave Chinner
@ 2023-02-14  8:13   ` Christoph Hellwig
  2023-02-14 22:26     ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2023-02-14  8:13 UTC
  To: Dave Chinner; +Cc: linux-xfs, linux-fsdevel

On Tue, Feb 14, 2023 at 04:51:13PM +1100, Dave Chinner wrote:
> index 958e4bb2e51e..fb718a5825d5 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4553,8 +4553,12 @@ xfs_bmapi_convert_delalloc(
>  		 * should only happen for the COW fork, where another thread
>  		 * might have moved the extent to the data fork in the meantime.
>  		 */
> -		WARN_ON_ONCE(whichfork != XFS_COW_FORK);
> -		error = -EAGAIN;
> +		if (whichfork != XFS_COW_FORK) {
> +			xfs_bmap_mark_sick(ip, whichfork);
> +			error = -EFSCORRUPTED;
> +		} else {
> +			error = -EAGAIN;
> +		}

The comment above should probably be expanded a bit on what this means
for a non-cow fork extent and how we'll handle it later.
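
Perhaps something along these lines, purely as a sketch of the idea
(drawing on the reasoning in the commit message):

		/*
		 * A delalloc extent should back any data fork range we
		 * are asked to convert, so landing here for the data
		 * fork means the in-memory extent list is corrupt. Mark
		 * the inode sick so that reclaim can clean up the stray
		 * delalloc reservation instead of asserting on it later.
		 */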

> +	if (error) {
> +		if ((error == -EFSCORRUPTED) || (error == -EFSBADCRC))

Nit: no need for the inner braces.

>  
> +		/*
> +		 * If the inode is sick, then it might have delalloc extents
> +		 * within EOF that we were unable to convert. We have to punch
> +		 * them out here to release the reservation as there is no
> +		 * longer any data to write back into the delalloc range now.
> +		 */
> +		if (!xfs_inode_is_healthy(ip))
> +			xfs_bmap_punch_delalloc_range(ip, 0,
> +						i_size_read(VFS_I(ip)));

Is i_size_read the right check here?  The delalloc extent could extend
past i_size if i_size is not block aligned.  Can't we just simply pass
(xfs_off_t)-1 here?
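
i.e. something like this (sketch only, untested):

		if (!xfs_inode_is_healthy(ip))
			xfs_bmap_punch_delalloc_range(ip, 0, (xfs_off_t)-1);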



* Re: [PATCH 3/3] xfs, iomap: ->discard_folio() is broken so remove it
  2023-02-14  5:51 ` [PATCH 3/3] xfs, iomap: ->discard_folio() is broken so remove it Dave Chinner
@ 2023-02-14  8:14   ` Christoph Hellwig
  2023-02-14 18:10   ` Brian Foster
  1 sibling, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2023-02-14  8:14 UTC
  To: Dave Chinner; +Cc: linux-xfs, linux-fsdevel

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 3/3] xfs, iomap: ->discard_folio() is broken so remove it
  2023-02-14  5:51 ` [PATCH 3/3] xfs, iomap: ->discard_folio() is broken so remove it Dave Chinner
  2023-02-14  8:14   ` Christoph Hellwig
@ 2023-02-14 18:10   ` Brian Foster
  2023-02-14 22:20     ` Dave Chinner
  1 sibling, 1 reply; 14+ messages in thread
From: Brian Foster @ 2023-02-14 18:10 UTC
  To: Dave Chinner; +Cc: linux-xfs, linux-fsdevel

On Tue, Feb 14, 2023 at 04:51:14PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Ever since commit e9c3a8e820ed ("iomap: don't invalidate folios
> after writeback errors") XFS and iomap have been retaining dirty
> folios in memory after a writeback error. XFS no longer invalidates
> the folio, and iomap no longer clears the folio uptodate state.
> 
> However, iomap is still calling ->discard_folio on error, and
> XFS is still punching the delayed allocation range backing the dirty
> folio.
> 
> This is incorrect behaviour. The folio remains dirty and up to date,
> meaning that another writeback will be attempted in the near future.
> This means that XFS is still going to have to allocate space for it
> during writeback, and that means it still needs to have a delayed
> allocation reservation and extent backing the dirty folio.
> 

Hmm.. I don't think that is correct. It looks like the previous patch
removes the invalidation, but writeback clears the dirty bit before
calling into the fs and we're not doing anything to redirty the folio,
so there's no guarantee of subsequent writeback. As of that patch we
presumably leave around a !dirty,uptodate folio without backing storage
(due to the discard call as you've pointed out). I would hope/think the
!dirty state would mean a redirty reallocates delalloc for the folio,
but that's not immediately clear to me.

Regardless, I can see how this prevents this sort of error in the
scenario where writeback fails due to corruption, but I don't see how it
doesn't just break error handling of writeback failures not associated
with corruption. I.e., a delalloc folio is allocated/dirtied, writeback
fails due to some random/transient error, delalloc is left around on a
!dirty page (i.e. stale), and reclaim eventually comes around and
results in the usual block accounting corruption associated with stale
delalloc blocks. This is easy enough to test/reproduce (just tried it
via error injection to delalloc conversion) that I'm kind of surprised
fstests doesn't uncover it. :/

> Failure to retain the delalloc extent (because xfs_discard_folio()
> punched it out) means that the next writeback attempt does not find
> an extent over the range of the write in ->map_blocks(), and
> xfs_map_blocks() triggers a WARN_ON() because it should never land
> in a hole for a data fork writeback request. This looks like:
> 

I'm not sure this warning makes a lot of sense either given most of this
should occur around the folio lock. Looking back at the code and the
error report for this, the same error injection used above on a 5k write
to a bsize=1k fs actually shows the punch remove fsb offsets 0-5 on a
writeback failure, so it does appear to be punching too much out. The
cause appears to be that the end offset is calculated in
xfs_discard_folio() by rounding up the start offset to 4k (folio size).
If pos == 0, this results in passing end_fsb == 0 to the punch code,
which xfs_iext_lookup_extent_before() then changes to fsb == 5 because
that's the last block of the delalloc extent that covers fsb 0.
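
That is, with pos == 0 and a 4k folio, round_up(pos, folio_size(folio))
evaluates to 0, so the punch range is not bounded to the folio at all.
A possible fix (sketch only, untested) would be to punch to the end of
the folio instead:

	error = xfs_bmap_punch_delalloc_range(ip, pos,
			folio_pos(folio) + folio_size(folio));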

I've not reproduced the warning shown below, but I do see the side
effect of losing data at fsb 5 if the first page conversion fails. This
is silent because iomap now sees a hole and just skips the page. I
suspect the warning results from a combination of this problem and
racing writeback contexts as you've described in the commit log.

Brian

> [  647.356969] ------------[ cut here ]------------
> [  647.359277] WARNING: CPU: 14 PID: 21913 at fs/xfs/libxfs/xfs_bmap.c:4510 xfs_bmapi_convert_delalloc+0x221/0x4e0
> [  647.364551] Modules linked in:
> [  647.366294] CPU: 14 PID: 21913 Comm: test_delalloc_c Not tainted 6.2.0-rc7-dgc+ #1754
> [  647.370356] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
> [  647.374781] RIP: 0010:xfs_bmapi_convert_delalloc+0x221/0x4e0
> [  647.377807] Code: e9 7d fe ff ff 80 bf 54 01 00 00 00 0f 84 68 fe ff ff 48 8d 47 70 48 89 04 24 e9 63 fe ff ff 83 fd 02 41 be f5 ff ff ff 74 a5 <0f> 0b eb a0
> [  647.387242] RSP: 0018:ffffc9000aa677a8 EFLAGS: 00010293
> [  647.389837] RAX: 0000000000000000 RBX: ffff88825bc4da00 RCX: 0000000000000000
> [  647.393371] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88825bc4da40
> [  647.396546] RBP: 0000000000000000 R08: ffffc9000aa67810 R09: ffffc9000aa67850
> [  647.400186] R10: ffff88825bc4da00 R11: ffff888800a9aaac R12: ffff888101707000
> [  647.403484] R13: ffffc9000aa677e0 R14: 00000000fffffff5 R15: 0000000000000004
> [  647.406251] FS:  00007ff35ec24640(0000) GS:ffff88883ed00000(0000) knlGS:0000000000000000
> [  647.410089] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  647.413225] CR2: 00007f7292cbc5d0 CR3: 0000000807d0e004 CR4: 0000000000060ee0
> [  647.416917] Call Trace:
> [  647.418080]  <TASK>
> [  647.419291]  ? _raw_spin_unlock_irqrestore+0xe/0x30
> [  647.421400]  xfs_map_blocks+0x1b7/0x590
> [  647.422951]  iomap_do_writepage+0x1f1/0x7d0
> [  647.424607]  ? __mod_lruvec_page_state+0x93/0x140
> [  647.426419]  write_cache_pages+0x17b/0x4f0
> [  647.428079]  ? iomap_read_end_io+0x2c0/0x2c0
> [  647.429839]  iomap_writepages+0x1c/0x40
> [  647.431377]  xfs_vm_writepages+0x79/0xb0
> [  647.432826]  do_writepages+0xbd/0x1a0
> [  647.434207]  ? obj_cgroup_release+0x73/0xb0
> [  647.435769]  ? drain_obj_stock+0x130/0x290
> [  647.437273]  ? avc_has_perm+0x8a/0x1a0
> [  647.438746]  ? avc_has_perm_noaudit+0x8c/0x100
> [  647.440223]  __filemap_fdatawrite_range+0x8e/0xa0
> [  647.441960]  filemap_write_and_wait_range+0x3d/0xa0
> [  647.444258]  __iomap_dio_rw+0x181/0x790
> [  647.445960]  ? __schedule+0x385/0xa20
> [  647.447829]  iomap_dio_rw+0xe/0x30
> [  647.449284]  xfs_file_dio_write_aligned+0x97/0x150
> [  647.451332]  ? selinux_file_permission+0x107/0x150
> [  647.453299]  xfs_file_write_iter+0xd2/0x120
> [  647.455238]  vfs_write+0x20d/0x3d0
> [  647.456768]  ksys_write+0x69/0xf0
> [  647.458067]  do_syscall_64+0x34/0x80
> [  647.459488]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [  647.461529] RIP: 0033:0x7ff3651406e9
> [  647.463119] Code: 48 8d 3d 2a a1 0c 00 0f 05 eb a5 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f8
> [  647.470563] RSP: 002b:00007ff35ec23df8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [  647.473465] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff3651406e9
> [  647.476278] RDX: 0000000000001400 RSI: 0000000020000000 RDI: 0000000000000005
> [  647.478895] RBP: 00007ff35ec23e20 R08: 0000000000000005 R09: 0000000000000000
> [  647.481568] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe533d8d4e
> [  647.483751] R13: 00007ffe533d8d4f R14: 0000000000000000 R15: 00007ff35ec24640
> [  647.486168]  </TASK>
> [  647.487142] ---[ end trace 0000000000000000 ]---
> 
> Punching delalloc extents out from under dirty cached pages is wrong
> and broken. We can't remove the delalloc extent until the page is
> either removed from memory (i.e. invalidated) or writeback succeeds
> in converting the delalloc extent to a real extent and writeback can
> clean the page.
> 
> Hence we remove xfs_discard_folio() because it is only punching
> delalloc blocks from under dirty pages now. With that removal,
> nothing else uses ->discard_folio(), so we remove that from the
> iomap infrastructure as well.
> 
> Reported-by: pengfei.xu@intel.com
> Fixes: e9c3a8e820ed ("iomap: don't invalidate folios after writeback errors")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/iomap/buffered-io.c | 16 +++-------------
>  fs/xfs/xfs_aops.c      | 35 -----------------------------------
>  include/linux/iomap.h  |  6 ------
>  3 files changed, 3 insertions(+), 54 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 356193e44cf0..502fa2d41097 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1635,19 +1635,9 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	 * completion to mark the error state of the pages under writeback
>  	 * appropriately.
>  	 */
> -	if (unlikely(error)) {
> -		/*
> -		 * Let the filesystem know what portion of the current page
> -		 * failed to map. If the page hasn't been added to ioend, it
> -		 * won't be affected by I/O completion and we must unlock it
> -		 * now.
> -		 */
> -		if (wpc->ops->discard_folio)
> -			wpc->ops->discard_folio(folio, pos);
> -		if (!count) {
> -			folio_unlock(folio);
> -			goto done;
> -		}
> +	if (unlikely(error && !count)) {
> +		folio_unlock(folio);
> +		goto done;
>  	}
>  
>  	folio_start_writeback(folio);
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 41734202796f..3f0dae5ca9c2 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -448,44 +448,9 @@ xfs_prepare_ioend(
>  	return status;
>  }
>  
> -/*
> - * If the page has delalloc blocks on it, we need to punch them out before we
> - * invalidate the page.  If we don't, we leave a stale delalloc mapping on the
> - * inode that can trip up a later direct I/O read operation on the same region.
> - *
> - * We prevent this by truncating away the delalloc regions on the page.  Because
> - * they are delalloc, we can do this without needing a transaction. Indeed - if
> - * we get ENOSPC errors, we have to be able to do this truncation without a
> - * transaction as there is no space left for block reservation (typically why we
> - * see a ENOSPC in writeback).
> - */
> -static void
> -xfs_discard_folio(
> -	struct folio		*folio,
> -	loff_t			pos)
> -{
> -	struct xfs_inode	*ip = XFS_I(folio->mapping->host);
> -	struct xfs_mount	*mp = ip->i_mount;
> -	int			error;
> -
> -	if (xfs_is_shutdown(mp))
> -		return;
> -
> -	xfs_alert_ratelimited(mp,
> -		"page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
> -			folio, ip->i_ino, pos);
> -
> -	error = xfs_bmap_punch_delalloc_range(ip, pos,
> -			round_up(pos, folio_size(folio)));
> -
> -	if (error && !xfs_is_shutdown(mp))
> -		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
> -}
> -
>  static const struct iomap_writeback_ops xfs_writeback_ops = {
>  	.map_blocks		= xfs_map_blocks,
>  	.prepare_ioend		= xfs_prepare_ioend,
> -	.discard_folio		= xfs_discard_folio,
>  };
>  
>  STATIC int
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 0983dfc9a203..681e26a86791 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -310,12 +310,6 @@ struct iomap_writeback_ops {
>  	 * conversions.
>  	 */
>  	int (*prepare_ioend)(struct iomap_ioend *ioend, int status);
> -
> -	/*
> -	 * Optional, allows the file system to discard state on a page where
> -	 * we failed to submit any I/O.
> -	 */
> -	void (*discard_folio)(struct folio *folio, loff_t pos);
>  };
>  
>  struct iomap_writepage_ctx {
> -- 
> 2.39.0
> 



* Re: [PATCH 3/3] xfs, iomap: ->discard_folio() is broken so remove it
  2023-02-14 18:10   ` Brian Foster
@ 2023-02-14 22:20     ` Dave Chinner
  2023-02-15  1:26       ` Dave Chinner
  2023-02-15 15:25       ` Brian Foster
  0 siblings, 2 replies; 14+ messages in thread
From: Dave Chinner @ 2023-02-14 22:20 UTC
  To: Brian Foster; +Cc: linux-xfs, linux-fsdevel

On Tue, Feb 14, 2023 at 01:10:05PM -0500, Brian Foster wrote:
> On Tue, Feb 14, 2023 at 04:51:14PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Ever since commit e9c3a8e820ed ("iomap: don't invalidate folios
> > after writeback errors") XFS and iomap have been retaining dirty
> > folios in memory after a writeback error. XFS no longer invalidates
> > the folio, and iomap no longer clears the folio uptodate state.
> > 
> > However, iomap is still calling ->discard_folio on error, and
> > XFS is still punching the delayed allocation range backing the dirty
> > folio.
> > 
> > This is incorrect behaviour. The folio remains dirty and up to date,
> > meaning that another writeback will be attempted in the near future.
> > This means that XFS is still going to have to allocate space for it
> > during writeback, and that means it still needs to have a delayed
> > allocation reservation and extent backing the dirty folio.
> > 
> 
> Hmm.. I don't think that is correct. It looks like the previous patch
> removes the invalidation, but writeback clears the dirty bit before
> calling into the fs and we're not doing anything to redirty the folio,
> so there's no guarantee of subsequent writeback.

Ah, right, I got confused with iomap_do_writepage() which redirties
folios it performs no action on. The case that is being tripped here
is "count == 0" which means no action has actually been taken on the
folio and it is not submitted for writeback. We don't mark the folio
with an error on submission failure like we do for errors reported
to IO completion, so the folio is just left in its current state
in the cache.

> Regardless, I can see how this prevents this sort of error in the
> scenario where writeback fails due to corruption, but I don't see how it
> doesn't just break error handling of writeback failures not associated
> with corruption.

What other cases in XFS do we have that cause mapping failure? We
can't get ENOSPC here because of delalloc reservations. We can't get
ENOMEM because all the memory allocations are blocking. That just
leaves IO errors reading metadata, or structure corruption when
parsing and modifying on-disk metadata.  I can't think (off the top
of my head) of any other type of error we can get returned from
allocation - what sort of non-corruption errors were you thinking
of here?

> fails due to some random/transient error, delalloc is left around on a
> !dirty page (i.e. stale), and reclaim eventually comes around and
> results in the usual block accounting corruption associated with stale
> delalloc blocks.

The first patches in the series fix those issues. If we get stray
delalloc extents on a healthy inode, then it will still trigger all
the warnings/asserts that we have now. But if the inode has been
marked sick by a corruption based allocation failure, it will clean
up in reclaim without leaking anything or throwing any new warnings.

> This is easy enough to test/reproduce (just tried it
> via error injection to delalloc conversion) that I'm kind of surprised
> fstests doesn't uncover it. :/

> > Failure to retain the delalloc extent (because xfs_discard_folio()
> > punched it out) means that the next writeback attempt does not find
> > an extent over the range of the write in ->map_blocks(), and
> > xfs_map_blocks() triggers a WARN_ON() because it should never land
> > in a hole for a data fork writeback request. This looks like:
> > 
> 
> I'm not sure this warning makes a lot of sense either given most of this
> should occur around the folio lock. Looking back at the code and the
> error report for this, the same error injection used above on a 5k write
> to a bsize=1k fs actually shows the punch removing fsb offsets 0-5 on a
> writeback failure, so it does appear to be punching too much out.  The
> cause appears to be that the end offset is calculated in
> xfs_discard_folio() by rounding up the start offset to 4k (folio size).
> If pos == 0, this results in passing end_fsb == 0 to the punch code,
> which xfs_iext_lookup_extent_before() then changes to fsb == 5 because
> that's the last block of the delalloc extent that covers fsb 0.

And that is the bug I could not see in commit 7348b322332d ("xfs:
xfs_bmap_punch_delalloc_range() should take a byte range") which is
what this warning was bisected down to. Thank you for identifying
the reason the bisect landed on that commit. Have you written a
fix to test out your reasoning that you can post?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 1/3] xfs: report block map corruption errors to the health tracking system
  2023-02-14  8:03   ` Christoph Hellwig
@ 2023-02-14 22:21     ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2023-02-14 22:21 UTC
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel

On Tue, Feb 14, 2023 at 12:03:18AM -0800, Christoph Hellwig wrote:
> On Tue, Feb 14, 2023 at 04:51:12PM +1100, Dave Chinner wrote:
> > From: "Darrick J. Wong" <djwong@kernel.org>
> > 
> > Whenever we encounter a corrupt block mapping, we should report that to
> > the health monitoring system for later reporting.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > [dgc: open coded xfs_metadata_is_sick() macro]
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Just curious:  this is probably from a bigger series, which one is
> that?

[14/2/23 10:36] <djwong> branch @ https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=corruption-health-reports

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 2/3] xfs: failed delalloc conversion results in bad extent lists
  2023-02-14  8:13   ` Christoph Hellwig
@ 2023-02-14 22:26     ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2023-02-14 22:26 UTC
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel

On Tue, Feb 14, 2023 at 12:13:19AM -0800, Christoph Hellwig wrote:
> On Tue, Feb 14, 2023 at 04:51:13PM +1100, Dave Chinner wrote:
> > index 958e4bb2e51e..fb718a5825d5 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -4553,8 +4553,12 @@ xfs_bmapi_convert_delalloc(
> >  		 * should only happen for the COW fork, where another thread
> >  		 * might have moved the extent to the data fork in the meantime.
> >  		 */
> > -		WARN_ON_ONCE(whichfork != XFS_COW_FORK);
> > -		error = -EAGAIN;
> > +		if (whichfork != XFS_COW_FORK) {
> > +			xfs_bmap_mark_sick(ip, whichfork);
> > +			error = -EFSCORRUPTED;
> > +		} else {
> > +			error = -EAGAIN;
> > +		}
> 
> The comment above should probably be expanded a bit on what this means
> for a non-cow fork extent and how we'll handle it later.
> 
> > +	if (error) {
> > +		if ((error == -EFSCORRUPTED) || (error == -EFSBADCRC))
> 
> Nit: no need for the inner braces.
> 
> >  
> > +		/*
> > +		 * If the inode is sick, then it might have delalloc extents
> > +		 * within EOF that we were unable to convert. We have to punch
> > +		 * them out here to release the reservation as there is no
> > +		 * longer any data to write back into the delalloc range now.
> > +		 */
> > +		if (!xfs_inode_is_healthy(ip))
> > +			xfs_bmap_punch_delalloc_range(ip, 0,
> > +						i_size_read(VFS_I(ip)));
> 
> Is i_size_read the right check here?  The delalloc extent could extend
> past i_size if i_size is not block aligned.  Can't we just simply pass
> (xfs_off_t)-1 here?

Probably, we just killed all the delalloc blocks beyond EOF via
xfs_free_eofblocks() in the line above this, so it didn't seem
necessary to try to punch blocks beyond EOF for this case. Easy
enough to do to be safe, just need a comment update to go with
it....

Cheers,

Dave.
> 
> 

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 3/3] xfs, iomap: ->discard_folio() is broken so remove it
  2023-02-14 22:20     ` Dave Chinner
@ 2023-02-15  1:26       ` Dave Chinner
  2023-02-15 15:25       ` Brian Foster
  1 sibling, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2023-02-15  1:26 UTC
  To: Brian Foster; +Cc: linux-xfs, linux-fsdevel

On Wed, Feb 15, 2023 at 09:20:00AM +1100, Dave Chinner wrote:
> On Tue, Feb 14, 2023 at 01:10:05PM -0500, Brian Foster wrote:
> > On Tue, Feb 14, 2023 at 04:51:14PM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Ever since commit e9c3a8e820ed ("iomap: don't invalidate folios
> > > after writeback errors") XFS and iomap have been retaining dirty
> > > folios in memory after a writeback error. XFS no longer invalidates
> > > the folio, and iomap no longer clears the folio uptodate state.
> > > 
> > > However, iomap is still calling ->discard_folio on error, and
> > > XFS is still punching the delayed allocation range backing the dirty
> > > folio.
> > > 
> > > This is incorrect behaviour. The folio remains dirty and up to date,
> > > meaning that another writeback will be attempted in the near future.
> > > This means that XFS is still going to have to allocate space for it
> > > during writeback, and that means it still needs to have a delayed
> > > allocation reservation and extent backing the dirty folio.
> > > 
> > 
> > Hmm.. I don't think that is correct. It looks like the previous patch
> > removes the invalidation, but writeback clears the dirty bit before
> > calling into the fs and we're not doing anything to redirty the folio,
> > so there's no guarantee of subsequent writeback.
> 
> Ah, right, I got confused with iomap_do_writepage() which redirties
> folios it performs no action on. The case that is being tripped here
> is "count == 0" which means no action has actually been taken on the
> folio and it is not submitted for writeback. We don't mark the folio
> with an error on submission failure like we do for errors reported
> to IO completion, so the folio is just left in its current state
> in the cache.

OK, so after thinking on this for a little while, and then asking
the question on #xfs:

[15/2/23 09:39] <dchinner> so, if we don't start writeback on a page
on mapping failure, should we be redirtying it?

I think this patchset is heading in the correct direction. The
discussion that followed pretty much leads to needing to redirty the
folio on any submission failure so that the VFS infrastructure will
try to write the data again in future. I've included the full log of
the discussion below so there is a record of it in the lore archives.

I also think that redirtying the page is the right thing to do when
we consider that we are going to be trying to fix corruptions
online, without users even needing to know a corruption was
encountered. In this case, we need to keep the folio dirty so that
once we've repaired the metadata corruption the user data will be
written back.

This also points out another aspect where health status should be
taken into account. When we select an AG for allocation, we should
check first that it is healthy before trying to allocate from it.
This would allow writeback to fail the first time because the AG
selected was corrupt, but on the second VFS attempt to write it back
it won't select the AG we already know is corrupt and hence may well
succeed in allocating the space needed to perform writeback.
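
Something as simple as this during AG selection, perhaps. This is
purely a hypothetical sketch - it assumes the existing per-AG health
helpers, and the surrounding AG iteration is elided:

	/* Skip AGs we already know are corrupt so a retry can succeed. */
	if (!xfs_ag_is_healthy(pag))
		continue;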

It's these sorts of conditions that lead me to think that this
patchset is going in the right direction for XFS - we just need to
ensure that the folio we failed to submit bios for (even on mixed
folio writeback submission success/failure) is redirtied so that
future writeback attempts will be made.

Hence I think all this patchset needs is an additional patch that
adds a call to folio_redirty_for_writepage() when mapping failures
occur. We may need some additional fixes to ensure these dirty pages
are discarded at unmount if they are persistent/unrecoverable
failures, but this seems to be the right approach for the failure
handling behaviour we are trying to achieve now and into the
future...
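
Roughly something like this in iomap_writepage_map(), as a sketch
against the code this series leaves behind (assuming wbc is available
at that point):

	if (unlikely(error)) {
		/*
		 * Keep the folio dirty so the VFS retries writeback of
		 * the unsubmitted data later, rather than leaving a
		 * clean folio that memory pressure can toss without the
		 * data ever reaching disk.
		 */
		folio_redirty_for_writepage(wbc, folio);
		if (!count) {
			folio_unlock(folio);
			goto done;
		}
	}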

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

[15/2/23 09:39] <dchinner> so, if we don't start writeback on a page on mapping failure, should we be redirtying it?
[15/2/23 09:43] <willy> i think so.  otherwise we're pretending to the pagecache that we wrote it
[15/2/23 09:54] <djwong> (this was the subject a UEK5 bug 3 months ago)
[15/2/23 09:54] <djwong> (albeit with buffer heads mixed in for insanity maximization)
[15/2/23 10:20] <dchinner> willy: ok, so what happens if we have multiple blocks per page, and we map some blocks to a bio before we get a mapping failure?
[15/2/23 10:20] <dchinner> we currently mark the folio as under writeback and submit the folio
[15/2/23 10:20] <dchinner> *submit the bio
[15/2/23 10:21] <dchinner> so after the IO the folio ends up clean even though there is some data on it that was not written back
[15/2/23 10:21] <willy> i think you still need to redirty it because some of it hasn't been written back
[15/2/23 10:23] <dchinner> ok, so we'd need to do the redirtying before we set the page for writeback?
[15/2/23 10:23] <dchinner> *folio
[15/2/23 10:24] <dchinner> because folio_start_writeback() will clear the PAGECACHE_TAG_DIRTY if the folio is clean when it is moved to writeback state?
[15/2/23 10:24] <willy> i don't think so.  the folio can be both dirty and writeback at the same time, and i think you want that, because you don't want to restart the writeback until the bio you submitted has finished
[15/2/23 10:25] <dchinner> write_cache_pages() handles trying to write pages currently under writeback
[15/2/23 10:26] <dchinner> (it either waits on it or skips it depending on wbc->sync_mode)
[15/2/23 10:26] <willy> makes sense
[15/2/23 10:27] <willy> yes, you should call folio_redirty_for_writepage, no matter whether you've called folio_start_writeback() or not
[15/2/23 10:29] <dchinner> ok
[15/2/23 10:30] <dchinner> that then means we really do need to get rid of ->discard_folio, because we need to keep the delalloc mappings behind the folio so that the next attempt to write the page will still have space reserved for it
[15/2/23 10:30] <willy> I'm pretty sure I would agree with you if I understood XFS well enough to have an opinion
[15/2/23 10:31] <dchinner> heh
[15/2/23 10:38] <djwong> uhhh :)
[15/2/23 10:38] <djwong> if we're going to redirty the folios, then yes, i generally think we should leave the delalloc extents
[15/2/23 10:39] <djwong> this redirtying -- this is only for the case that getting writeback mappings to construct bios fails, right?
[15/2/23 10:39] <willy> if we _don't_ redirty the folios, then the VM thinks they're clean and will drop them under memory pressure instead of trying to write them out again
[15/2/23 10:39] <djwong> or is it for handling the bios coming back with errors set?
[15/2/23 10:39] <willy> this is submission path errors
[15/2/23 10:54] <dchinner> submission path (iomap_writepage_map())


* Re: [PATCH 3/3] xfs, iomap: ->discard_folio() is broken so remove it
  2023-02-14 22:20     ` Dave Chinner
  2023-02-15  1:26       ` Dave Chinner
@ 2023-02-15 15:25       ` Brian Foster
  2023-02-15 23:03         ` Dave Chinner
  1 sibling, 1 reply; 14+ messages in thread
From: Brian Foster @ 2023-02-15 15:25 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, linux-fsdevel

On Wed, Feb 15, 2023 at 09:20:00AM +1100, Dave Chinner wrote:
> On Tue, Feb 14, 2023 at 01:10:05PM -0500, Brian Foster wrote:
> > On Tue, Feb 14, 2023 at 04:51:14PM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Ever since commit e9c3a8e820ed ("iomap: don't invalidate folios
> > > after writeback errors") XFS and iomap have been retaining dirty
> > > folios in memory after a writeback error. XFS no longer invalidates
> > > the folio, and iomap no longer clears the folio uptodate state.
> > > 
> > > However, iomap is still calling ->discard_folio on error, and
> > > XFS is still punching the delayed allocation range backing the dirty
> > > folio.
> > > 
> > > This is incorrect behaviour. The folio remains dirty and up to date,
> > > meaning that another writeback will be attempted in the near future.
> > > This means that XFS is still going to have to allocate space for it
> > > during writeback, and that means it still needs to have a delayed
> > > allocation reservation and extent backing the dirty folio.
> > > 
> > 
> > Hmm.. I don't think that is correct. It looks like the previous patch
> > removes the invalidation, but writeback clears the dirty bit before
> > calling into the fs and we're not doing anything to redirty the folio,
> > so there's no guarantee of subsequent writeback.
> 
> Ah, right, I got confused with iomap_do_writepage() which redirties
> folios it performs no action on. The case that is being tripped here
> is "count == 0" which means no action has actually been taken on the
> folio and it is not submitted for writeback. We don't mark the folio
> with an error on submission failure like we do for errors reported
> to IO completion, so the folio is just left in its current state
> in the cache.
> 
> > Regardless, I can see how this prevents this sort of error in the
> > scenario where writeback fails due to corruption, but I don't see how it
> > doesn't just break error handling of writeback failures not associated
> > with corruption.
> 
> What other cases in XFS do we have that cause mapping failure? We
> can't get ENOSPC here because of delalloc reservations. We can't get
> ENOMEM because all the memory allocations are blocking. That just
> leaves IO errors reading metadata, or structure corruption when
> parsing and modifying on-disk metadata.  I can't think (off the top
> of my head) of any other type of error we can get returned from
> allocation - what sort of non-corruption errors were you thinking
> of here?
> 
> > fails due to some random/transient error, delalloc is left around on a
> > !dirty page (i.e. stale), and reclaim eventually comes around and
> > results in the usual block accounting corruption associated with stale
> > delalloc blocks.
> 
> The first patches in the series fix those issues. If we get stray
> delalloc extents on a healthy inode, then it will still trigger all
> the warnings/asserts that we have now. But if the inode has been
> marked sick by a corruption based allocation failure, it will clean
> up in reclaim without leaking anything or throwing any new warnings.
> 

Those warnings/asserts that exist now indicate something is wrong and
that free space accounting is likely about to become corrupted, because
an otherwise clean inode is being reclaimed with stale delalloc blocks.

I see there's an error injection knob (XFS_ERRTAG_REDUCE_MAX_IEXTENTS)
tied to the max extent count checking stuff in the delalloc conversion
path. You should be able to add some (10+) extents to a file and then
turn that thing all the way up to induce a (delalloc conversion)
writeback failure and see exactly what I'm talking about [1].

Brian

[1] The following occurs with this patch, but not on mainline because the
purpose of ->discard_folio() is to prevent it.

(/mnt/file has 10+ preexisting extents beyond the 0-5k range)

# echo 1 > /sys/fs/xfs/vdb1/errortag/reduce_max_iextents
# xfs_io -fc "pwrite 0 5k" -c fsync /mnt/file
wrote 5120/5120 bytes at offset 0
5 KiB, 5 ops; 0.0000 sec (52.503 MiB/sec and 53763.4409 ops/sec)
fsync: File too large
# umount /mnt/
#
Message from syslogd@localhost at Feb 15 09:47:41 ...
 kernel:XFS: Assertion failed: 0, file: fs/xfs/xfs_icache.c, line: 1818

Message from syslogd@localhost at Feb 15 09:47:41 ...
 kernel:XFS: Assertion failed: xfs_is_shutdown(mp) || percpu_counter_sum(&mp->m_delalloc_blks) == 0, file: fs/xfs/xfs_super.c, line: 1068
#
# xfs_repair -n /dev/vdb1 
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
sb_fdblocks 20960174, counted 20960186
...
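
For context, the reduce_max_iextents errortag works by clamping the
per-fork extent count limit. Roughly (a sketch from memory of the
check in xfs_iext_count_may_overflow(); consult
fs/xfs/libxfs/xfs_inode_fork.c for the real code):

	/* Sketch: with the errortag set, the allowed extent count is
	 * clamped to a handful, so delalloc conversion on a file that
	 * already has 10+ extents fails with -EFBIG, which is the
	 * "File too large" error seen from fsync above. */
	if (XFS_TEST_ERROR(false, ip->i_mount, XFS_ERRTAG_REDUCE_MAX_IEXTENTS))
		max_exts = 10;

	nr_exts = ifp->if_nextents + nr_to_add;
	if (nr_exts < ifp->if_nextents || nr_exts > max_exts)
		return -EFBIG;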

> > This is easy enough to test/reproduce (just tried it
> > via error injection to delalloc conversion) that I'm kind of surprised
> > fstests doesn't uncover it. :/
> 
> > > Failure to retain the delalloc extent (because xfs_discard_folio()
> > > punched it out) means that the next writeback attempt does not find
> > > an extent over the range of the write in ->map_blocks(), and
> > > xfs_map_blocks() triggers a WARN_ON() because it should never land
> > > in a hole for a data fork writeback request. This looks like:
> > > 
> > 
> > I'm not sure this warning makes a lot of sense either given most of this
> > should occur around the folio lock. Looking back at the code and the
> > error report for this, the same error injection used above on a 5k write
> > to a bsize=1k fs actually shows the punch removing fsb offsets 0-5 on a
> > writeback failure, so it does appear to be punching too much out.  The
> > cause appears to be that the end offset is calculated in
> > xfs_discard_folio() by rounding up the start offset to 4k (folio size).
> > If pos == 0, this results in passing end_fsb == 0 to the punch code,
> > which xfs_iext_lookup_extent_before() then changes to fsb == 5 because
> > that's the last block of the delalloc extent that covers fsb 0.
> 
> And that is the bug I could not see in commit 7348b322332d ("xfs:
> xfs_bmap_punch_delalloc_range() should take a byte range") which is
> what this warning was bisected down to. Thank you for identifying
> the reason the bisect landed on that commit. Have you written a
> fix to test out your reasoning that you can post?
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
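
To make the end-offset calculation Brian describes above concrete,
the punch range in the 6.2-era xfs_discard_folio() looks roughly
like the first call below; the second call shows one possible
correction (a sketch, untested):

	/* Buggy: for pos == 0, round_up(0, folio_size(folio)) is also
	 * 0, so the punch range collapses, and
	 * xfs_iext_lookup_extent_before() walks back to the last block
	 * of the covering delalloc extent (fsb 5 in the bsize=1k case
	 * above). */
	xfs_bmap_punch_delalloc_range(ip, pos,
			round_up(pos, folio_size(folio)));

	/* One possible fix (untested): always punch to the folio end. */
	xfs_bmap_punch_delalloc_range(ip, pos,
			folio_pos(folio) + folio_size(folio));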


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] xfs, iomap: ->discard_folio() is broken so remove it
  2023-02-15 15:25       ` Brian Foster
@ 2023-02-15 23:03         ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2023-02-15 23:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, linux-fsdevel

On Wed, Feb 15, 2023 at 10:25:43AM -0500, Brian Foster wrote:
> On Wed, Feb 15, 2023 at 09:20:00AM +1100, Dave Chinner wrote:
> > On Tue, Feb 14, 2023 at 01:10:05PM -0500, Brian Foster wrote:
> > > On Tue, Feb 14, 2023 at 04:51:14PM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > Ever since commit e9c3a8e820ed ("iomap: don't invalidate folios
> > > > after writeback errors") XFS and iomap have been retaining dirty
> > > > folios in memory after a writeback error. XFS no longer invalidates
> > > > the folio, and iomap no longer clears the folio uptodate state.
> > > > 
> > > > However, iomap is still calling ->discard_folio on error, and
> > > > XFS is still punching the delayed allocation range backing the dirty
> > > > folio.
> > > > 
> > > > This is incorrect behaviour. The folio remains dirty and up to date,
> > > > meaning that another writeback will be attempted in the near future.
> > > > This means that XFS is still going to have to allocate space for it
> > > > during writeback, and that means it still needs to have a delayed
> > > > allocation reservation and extent backing the dirty folio.
> > > > 
> > > 
> > > Hmm.. I don't think that is correct. It looks like the previous patch
> > > removes the invalidation, but writeback clears the dirty bit before
> > > calling into the fs and we're not doing anything to redirty the folio,
> > > so there's no guarantee of subsequent writeback.
> > 
> > Ah, right, I got confused with iomap_do_writepage() which redirties
> > folios it performs no action on. The case that is being tripped here
> > is "count == 0" which means no action has actually been taken on the
> > folio and it is not submitted for writeback. We don't mark the folio
> > with an error on submission failure like we do for errors reported
> > to IO completion, so the folio is just left in its current state
> > in the cache.
> > 
> > > Regardless, I can see how this prevents this sort of error in the
> > > scenario where writeback fails due to corruption, but I don't see how it
> > > doesn't just break error handling of writeback failures not associated
> > > with corruption.
> > 
> > What other cases in XFS do we have that cause mapping failure? We
> > can't get ENOSPC here because of delalloc reservations. We can't get
> > ENOMEM because all the memory allocations are blocking. That just
> > leaves IO errors reading metadata, or structure corruption when
> > parsing and modifying on-disk metadata.  I can't think (off the top
> > of my head) of any other type of error we can get returned from
> > allocation - what sort of non-corruption errors were you thinking
> > of here?
> > 
> > > fails due to some random/transient error, delalloc is left around on a
> > > !dirty page (i.e. stale), and reclaim eventually comes around and
> > > results in the usual block accounting corruption associated with stale
> > > delalloc blocks.
> > 
> > The first patches in the series fix those issues. If we get stray
> > delalloc extents on a healthy inode, then it will still trigger all
> > the warnings/asserts that we have now. But if the inode has been
> > marked sick by a corruption based allocation failure, it will clean
> > up in reclaim without leaking anything or throwing any new warnings.
> > 
> 
> Those warnings/asserts that exist now indicate something is wrong and
> that free space accounting is likely about to become corrupted, because
> an otherwise clean inode is being reclaimed with stale delalloc blocks.

Well, yes.

> I see there's an error injection knob (XFS_ERRTAG_REDUCE_MAX_IEXTENTS)
> tied to the max extent count checking stuff in the delalloc conversion
> path. You should be able to add some (10+) extents to a file and then
> turn that thing all the way up to induce a (delalloc conversion)
> writeback failure and see exactly what I'm talking about [1].
> 
> Brian
> 
> [1] The following occurs with this patch, but not on mainline because the
> purpose of ->discard_folio() is to prevent it.

A non-corruption related writeback error has resulted in those debug
checks triggering correctly. This demonstrates the debug checks are
still working as intended. :)

Hence this isn't an argument against removing ->discard_folio(), this is
merely a demonstration that the current patch series needs more work.

Indeed, if the folio gets redirtied here instead of being left
clean, as we've already talked about, a future writeback may in fact
succeed and this specific problem goes away. We know how this retry
mechanism works - it's exactly what we do with metadata write
failures. Further, changing the behaviour of failure handling here
is exactly what we have the configurable error handling
infrastructure for. It's also why the "fail on unmount"
functionality exists, too.

That is, if we get to the point that "fail on unmount" triggers for
metadata we cannot write back due to persistent errors, we should
also perform the same trigger for data we cannot write back due to
persistent writeback allocation failures. In which case, any
allocation error should mark the inode sick and the unconverted
delalloc extents get cleaned up correctly by the final inode reclaim
pass.
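
As a rough sketch of what that could look like in the delalloc
conversion failure path (assuming the existing xfs_inode_mark_sick()
infrastructure and the XFS_SICK_INO_BMBTD flag; the exact error set
and call site are up for discussion):

	/* Sketch: a corruption-based allocation failure marks the
	 * inode's data fork sick, so final inode reclaim knows it may
	 * legitimately find, and can quietly free, unconverted
	 * delalloc extents instead of asserting. */
	if (error == -EFSCORRUPTED || error == -EFSBADCRC)
		xfs_inode_mark_sick(ip, XFS_SICK_INO_BMBTD);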

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-02-15 23:04 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-14  5:51 [PATCH 0/3] xfs, iomap: fix writeback failure handling Dave Chinner
2023-02-14  5:51 ` [PATCH 1/3] xfs: report block map corruption errors to the health tracking system Dave Chinner
2023-02-14  8:03   ` Christoph Hellwig
2023-02-14 22:21     ` Dave Chinner
2023-02-14  5:51 ` [PATCH 2/3] xfs: failed delalloc conversion results in bad extent lists Dave Chinner
2023-02-14  8:13   ` Christoph Hellwig
2023-02-14 22:26     ` Dave Chinner
2023-02-14  5:51 ` [PATCH 3/3] xfs, iomap: ->discard_folio() is broken so remove it Dave Chinner
2023-02-14  8:14   ` Christoph Hellwig
2023-02-14 18:10   ` Brian Foster
2023-02-14 22:20     ` Dave Chinner
2023-02-15  1:26       ` Dave Chinner
2023-02-15 15:25       ` Brian Foster
2023-02-15 23:03         ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).