linux-fsdevel.vger.kernel.org archive mirror
* [syzbot] Monthly xfs report
@ 2023-03-30  9:58 syzbot
  2023-04-11  1:35 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: syzbot @ 2023-03-30  9:58 UTC (permalink / raw)
  To: djwong, linux-fsdevel, linux-kernel, linux-xfs, syzkaller-bugs

Hello xfs maintainers/developers,

This is a 30-day syzbot report for the xfs subsystem.
All related reports/information can be found at:
https://syzkaller.appspot.com/upstream/s/xfs

During the period, 5 new issues were detected and 0 were fixed.
In total, 23 issues are still open and 15 have been fixed so far.

Some of the still happening issues:

Crashes Repro Title
327     Yes   INFO: task hung in xlog_grant_head_check
              https://syzkaller.appspot.com/bug?extid=568245b88fbaedcb1959
85      Yes   KASAN: stack-out-of-bounds Read in xfs_buf_lock
              https://syzkaller.appspot.com/bug?extid=0bc698a422b5e4ac988c
81      Yes   WARNING in xfs_qm_dqget_cache_insert
              https://syzkaller.appspot.com/bug?extid=6ae213503fb12e87934f
47      Yes   WARNING in xfs_bmapi_convert_delalloc
              https://syzkaller.appspot.com/bug?extid=53b443b5c64221ee8bad
44      Yes   INFO: task hung in xfs_buf_item_unpin
              https://syzkaller.appspot.com/bug?extid=3f083e9e08b726fcfba2
13      Yes   general protection fault in __xfs_free_extent
              https://syzkaller.appspot.com/bug?extid=bfbc1eecdfb9b10e5792
5       Yes   KASAN: use-after-free Read in xfs_btree_lookup_get_block
              https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994e

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.


* Re: [syzbot] Monthly xfs report
  2023-03-30  9:58 [syzbot] Monthly xfs report syzbot
@ 2023-04-11  1:35 ` Dave Chinner
  2023-04-11  4:35   ` Darrick J. Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2023-04-11  1:35 UTC (permalink / raw)
  To: syzbot; +Cc: djwong, linux-fsdevel, linux-kernel, linux-xfs, syzkaller-bugs

On Thu, Mar 30, 2023 at 02:58:43AM -0700, syzbot wrote:
> Hello xfs maintainers/developers,
> 
> This is a 30-day syzbot report for the xfs subsystem.
> All related reports/information can be found at:
> https://syzkaller.appspot.com/upstream/s/xfs
> 
> During the period, 5 new issues were detected and 0 were fixed.
> In total, 23 issues are still open and 15 have been fixed so far.
> 
> Some of the still happening issues:
> 
> Crashes Repro Title
> 327     Yes   INFO: task hung in xlog_grant_head_check
>               https://syzkaller.appspot.com/bug?extid=568245b88fbaedcb1959

[  501.289306][ T5098] XFS (loop0): Mounting V4 Filesystem 5e6273b8-2167-42bb-911b-418aa14a1261
[  501.299015][ T5098] XFS (loop0): Log size 128 blocks too small, minimum size is 2880 blocks
[  501.307608][ T5098] XFS (loop0): Log size out of supported range.
[  501.313866][ T5098] XFS (loop0): Continuing onwards, but if log hangs are experienced then please report this message in the bug report.

Syzbot doing something stupid - syzbot needs to stop testing the
deprecated and soon to be unsupported v4 filesystem format.

Invalid.

> 85      Yes   KASAN: stack-out-of-bounds Read in xfs_buf_lock
>               https://syzkaller.appspot.com/bug?extid=0bc698a422b5e4ac988c

Bisection result is garbage.

Looks like a race between dquot shrinker grabbing a dquot buffer to
write back a dquot and the dquot buffer being reclaimed before it is
submitted from the delwri list. Something is dropping a buffer
reference on the floor...

More investigation needed.

> 81      Yes   WARNING in xfs_qm_dqget_cache_insert
>               https://syzkaller.appspot.com/bug?extid=6ae213503fb12e87934f

That'll be an ENOMEM warning on radix tree insert.

No big deal, the code cleans up and retries the lookup/insert
process cleanly. Could just remove the warning.

Low priority, low severity.
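
A minimal sketch of the lookup/insert retry pattern described above, as a
userspace Python model rather than kernel code. DquotCache, dqget() and the
simulated ENOMEM failures are hypothetical names made up for illustration;
the real logic lives around xfs_qm_dqget_cache_insert() and may differ in
detail.

```python
import errno


class DquotCache:
    """Toy stand-in for the dquot radix tree (hypothetical model)."""

    def __init__(self, fail_inserts=0):
        self.tree = {}
        self.fail_inserts = fail_inserts  # inserts that will fail with ENOMEM

    def lookup(self, dq_id):
        return self.tree.get(dq_id)

    def insert(self, dq_id, dquot):
        if self.fail_inserts > 0:
            self.fail_inserts -= 1
            return -errno.ENOMEM      # simulated tree node allocation failure
        if dq_id in self.tree:
            return -errno.EEXIST      # lost a race with another inserter
        self.tree[dq_id] = dquot
        return 0


def dqget(cache, dq_id, max_retries=8):
    """Look up a dquot; on a miss, build one and insert it, retrying on error."""
    for attempt in range(max_retries):
        dquot = cache.lookup(dq_id)
        if dquot is not None:
            return dquot, attempt
        dquot = {"id": dq_id}         # stands in for reading the dquot from disk
        error = cache.insert(dq_id, dquot)
        if error == 0:
            return dquot, attempt
        # Insert failed: throw the new dquot away and restart from the lookup.
        del dquot
    raise MemoryError("dquot cache insert kept failing")
```

In this model a failed insert is harmless: the caller discards its copy and
goes around again, which is why the warning itself is the only visible
symptom.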

> 47      Yes   WARNING in xfs_bmapi_convert_delalloc
>               https://syzkaller.appspot.com/bug?extid=53b443b5c64221ee8bad

Unexpected ENOSPC because syzbot has created an inconsistency between
superblock counters and the free space btrees.  Warning is expected
as it indicates user data loss is going to occur. It doesn't happen in
typical production operation and generally requires malicious
corruption of the filesystem to trigger.

Not a bug, won't fix.
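
To illustrate why mismatched counters lead to ENOSPC at conversion time,
here is a toy Python model. ToyFs and its two fields are hypothetical; they
only stand in for the superblock free block counter and the space actually
tracked by the free space btrees.

```python
import errno


class ToyFs:
    """Two independent views of free space that are supposed to agree."""

    def __init__(self, sb_fdblocks, btree_free_blocks):
        self.sb_fdblocks = sb_fdblocks              # superblock counter
        self.btree_free_blocks = btree_free_blocks  # what the btrees hold

    def reserve_delalloc(self, nblocks):
        """Write time: space is promised against the superblock counter."""
        if nblocks > self.sb_fdblocks:
            return -errno.ENOSPC
        self.sb_fdblocks -= nblocks
        return 0

    def convert_delalloc(self, nblocks):
        """Writeback time: real blocks must come out of the btrees."""
        if nblocks > self.btree_free_blocks:
            return -errno.ENOSPC   # the promise made at write time is broken
        self.btree_free_blocks -= nblocks
        return 0
```

With consistent counters the second step cannot fail, so an ENOSPC there
means the data accepted at write time has nowhere to go.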

> 44      Yes   INFO: task hung in xfs_buf_item_unpin
>               https://syzkaller.appspot.com/bug?extid=3f083e9e08b726fcfba2

Yup, that's a deadlock on the superblock buffer.

xfs_sync_sb_buf() is called from an ioctl of some kind, gets stuck
in the log force waiting for iclogs to complete. xfs_sync_sb_buf()
holds the buffer across the transaction commit, so the sb buffer is
locked while waiting for the log force.

At just the wrong time, the filesystem gets shut down:

  [  484.946965][ T5959] syz-executor360: attempt to access beyond end of device
  [  484.946965][ T5959] loop0: rw=432129, sector=65536, nr_sectors = 64 limit=65536
  [  484.950756][   T52] XFS (loop0): log I/O error -5
  [  484.952017][   T52] XFS (loop0): Filesystem has been shut down due to log error (0x2).
  [  484.953902][   T52] XFS (loop0): Please unmount the filesystem and rectify the problem(s).
  [  714.735393][   T28] INFO: task kworker/1:1H:52 blocked for more than 143 seconds.

And the iclog IO completion tries to unpin and abort all the log
items in the current checkpoint. One of those is the superblock
buffer, and because this is an abort:

[  714.754433][   T28]  xfs_buf_lock+0x264/0xa68
[  714.755623][   T28]  xfs_buf_item_unpin+0x2c4/0xc18
[  714.756875][   T28]  xfs_trans_committed_bulk+0x2d8/0x73c
[  714.758236][   T28]  xlog_cil_committed+0x210/0xef8

The unpin code tries to lock the buffer to pass it through to IO
completion to mark it as failed.

Real deadlock, I think it might be able to occur on any synchronous
transaction commit that holds a buffer locked across it. No
immediate fix comes to mind right now. Can only occur on a journal
IO triggered shutdown, so not something that happens typically in
production systems.

Low priority, medium severity.
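
The lock ordering problem described above can be modelled in userspace. This
is only a sketch under stated assumptions: a threading.Lock stands in for
the superblock buffer lock, a threading.Event for the log force wakeup, and
timeouts replace what would be indefinite blocking in the kernel. All names
here are hypothetical.

```python
import threading


def simulate_shutdown_deadlock(timeout=0.2):
    sb_buf_lock = threading.Lock()       # models the superblock buffer lock
    log_force_done = threading.Event()   # models the iclog completion wakeup
    result = {"unpin_blocked": False, "log_force_completed": False}

    def iclog_completion_abort():
        # Abort path: unpin wants the buffer lock so it can fail the buffer.
        if sb_buf_lock.acquire(timeout=timeout):
            sb_buf_lock.release()
            log_force_done.set()         # only reached if the lock was free
        else:
            result["unpin_blocked"] = True

    # Synchronous commit path: the buffer stays locked across the log force.
    sb_buf_lock.acquire()
    worker = threading.Thread(target=iclog_completion_abort)
    worker.start()
    # Wait for the log force; it can only complete once the worker finishes.
    result["log_force_completed"] = log_force_done.wait(timeout=2 * timeout)
    worker.join()
    sb_buf_lock.release()
    return result
```

Each side waits on something only the other side can provide, so neither
makes progress until the timeouts expire.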


> 13      Yes   general protection fault in __xfs_free_extent
>               https://syzkaller.appspot.com/bug?extid=bfbc1eecdfb9b10e5792

Growfs issue. Looks like a NULL pag, which means the fsbno passed
to __xfs_free_extent() is invalid. Without looking further, this
looks like it's a corrupt AGF length or superblock size and this has
resulted in the calculated fsbno starting beyond the end of the last
AG that we are about to grow. That means the agno is beyond EOFS,
xfs_perag_get(agno) ends up NULL, and __xfs_free_extent() goes
splat.  Likely requires corruption to trigger.

Low priority, low severity.
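
A small sketch of the failure chain described above. An XFS fsbno carries
the AG number in its high bits (agno = fsbno >> agblklog), so a block number
past the end of the filesystem maps to an AG with no perag. Mount,
perag_get() and free_extent_checked() are hypothetical Python stand-ins, not
the kernel functions.

```python
def fsb_to_agno(fsbno, agblklog):
    """The AG number is the high bits of a filesystem block number."""
    return fsbno >> agblklog


class Mount:
    """Toy mount structure holding one perag per allocation group."""

    def __init__(self, agcount, agblklog):
        self.agblklog = agblklog
        self.perags = {agno: {"agno": agno} for agno in range(agcount)}


def perag_get(mp, agno):
    # Returning None models the NULL pag for an AG beyond EOFS.
    return mp.perags.get(agno)


def free_extent_checked(mp, fsbno):
    """Refuse to continue with a missing perag instead of dereferencing it."""
    agno = fsb_to_agno(fsbno, mp.agblklog)
    pag = perag_get(mp, agno)
    if pag is None:
        raise ValueError(f"fsbno {fsbno:#x} maps to AG {agno}, beyond EOFS")
    return pag["agno"]
```

The checked variant shows where a validity test would sit; the crash in the
report is what happens when the equivalent of `pag is None` goes untested.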

> 5       Yes   KASAN: use-after-free Read in xfs_btree_lookup_get_block
>               https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994e

Recovery of reflink COW extents, we have a corrupted journal

   [   52.495566][ T5067] XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791
   [   52.599681][ T5067] XFS (loop0): Torn write (CRC failure) detected at log block 0x180. Truncating head block from 0x200.
   [   52.636680][ T5067] XFS (loop0): Starting recovery (logdev: internal)

And then it looks to have a UAF on the refcountbt cursor that is
first initialised in xfs_refcount_recover_cow_leftovers(). Likely
tripping over a corrupted refcount btree of some kind. Probably one
for Darrick to look into.

Low priority, low severity.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [syzbot] Monthly xfs report
  2023-04-11  1:35 ` Dave Chinner
@ 2023-04-11  4:35   ` Darrick J. Wong
  2023-04-12 21:54     ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Darrick J. Wong @ 2023-04-11  4:35 UTC (permalink / raw)
  To: Dave Chinner
  Cc: syzbot, linux-fsdevel, linux-kernel, linux-xfs, syzkaller-bugs

On Tue, Apr 11, 2023 at 11:35:12AM +1000, Dave Chinner wrote:
> On Thu, Mar 30, 2023 at 02:58:43AM -0700, syzbot wrote:
> > Hello xfs maintainers/developers,
> > 
> > This is a 30-day syzbot report for the xfs subsystem.
> > All related reports/information can be found at:
> > https://syzkaller.appspot.com/upstream/s/xfs
> > 
> > During the period, 5 new issues were detected and 0 were fixed.
> > In total, 23 issues are still open and 15 have been fixed so far.
> > 
> > Some of the still happening issues:
> > 
> > Crashes Repro Title
> > 327     Yes   INFO: task hung in xlog_grant_head_check
> >               https://syzkaller.appspot.com/bug?extid=568245b88fbaedcb1959
> 
> [  501.289306][ T5098] XFS (loop0): Mounting V4 Filesystem 5e6273b8-2167-42bb-911b-418aa14a1261
> [  501.299015][ T5098] XFS (loop0): Log size 128 blocks too small, minimum size is 2880 blocks
> [  501.307608][ T5098] XFS (loop0): Log size out of supported range.
> [  501.313866][ T5098] XFS (loop0): Continuing onwards, but if log hangs are experienced then please report this message in the bug report.
> 
> Syzbot doing something stupid - syzbot needs to stop testing the
> deprecated and soon to be unsupported v4 filesystem format.
> 
> Invalid.
> 
> > 85      Yes   KASAN: stack-out-of-bounds Read in xfs_buf_lock
> >               https://syzkaller.appspot.com/bug?extid=0bc698a422b5e4ac988c
> 
> Bisection result is garbage.
> 
> Looks like a race between dquot shrinker grabbing a dquot buffer to
> write back a dquot and the dquot buffer being reclaimed before it is
> submitted from the delwri list. Something is dropping a buffer
> reference on the floor...
> 
> More investigation needed.
> 
> > 81      Yes   WARNING in xfs_qm_dqget_cache_insert
> >               https://syzkaller.appspot.com/bug?extid=6ae213503fb12e87934f
> 
> That'll be an ENOMEM warning on radix tree insert.
> 
> No big deal, the code cleans up and retries the lookup/insert
> process cleanly. Could just remove the warning.
> 
> Low priority, low severity.
> 
> > 47      Yes   WARNING in xfs_bmapi_convert_delalloc
> >               https://syzkaller.appspot.com/bug?extid=53b443b5c64221ee8bad
> 
> Unexpected ENOSPC because syzbot has created an inconsistency between
> superblock counters and the free space btrees.  Warning is expected
> as it indicates user data loss is going to occur. It doesn't happen in
> typical production operation and generally requires malicious
> corruption of the filesystem to trigger.
> 
> Not a bug, won't fix.
> 
> > 44      Yes   INFO: task hung in xfs_buf_item_unpin
> >               https://syzkaller.appspot.com/bug?extid=3f083e9e08b726fcfba2
> 
> Yup, that's a deadlock on the superblock buffer.
> 
> xfs_sync_sb_buf() is called from an ioctl of some kind, gets stuck
> in the log force waiting for iclogs to complete. xfs_sync_sb_buf()
> holds the buffer across the transaction commit, so the sb buffer is
> locked while waiting for the log force.
> 
> At just the wrong time, the filesystem gets shut down:
> 
>   [  484.946965][ T5959] syz-executor360: attempt to access beyond end of device
>   [  484.946965][ T5959] loop0: rw=432129, sector=65536, nr_sectors = 64 limit=65536
>   [  484.950756][   T52] XFS (loop0): log I/O error -5
>   [  484.952017][   T52] XFS (loop0): Filesystem has been shut down due to log error (0x2).
>   [  484.953902][   T52] XFS (loop0): Please unmount the filesystem and rectify the problem(s).
>   [  714.735393][   T28] INFO: task kworker/1:1H:52 blocked for more than 143 seconds.
> 
> And the iclog IO completion tries to unpin and abort all the log
> items in the current checkpoint. One of those is the superblock
> buffer, and because this is an abort:
> 
> [  714.754433][   T28]  xfs_buf_lock+0x264/0xa68
> [  714.755623][   T28]  xfs_buf_item_unpin+0x2c4/0xc18
> [  714.756875][   T28]  xfs_trans_committed_bulk+0x2d8/0x73c
> [  714.758236][   T28]  xlog_cil_committed+0x210/0xef8
> 
> The unpin code tries to lock the buffer to pass it through to IO
> completion to mark it as failed.
> 
> Real deadlock, I think it might be able to occur on any synchronous
> transaction commit that holds a buffer locked across it. No
> immediate fix comes to mind right now. Can only occur on a journal
> IO triggered shutdown, so not something that happens typically in
> production systems.

Force log, then xfs_ail_push_all_sync()?

It's SETLABEL, who cares how slow it is?

> Low priority, medium severity.
> 
> 
> > 13      Yes   general protection fault in __xfs_free_extent
> >               https://syzkaller.appspot.com/bug?extid=bfbc1eecdfb9b10e5792
> 
> Growfs issue. Looks like a NULL pag, which means the fsbno passed
> to __xfs_free_extent() is invalid. Without looking further, this
> looks like it's a corrupt AGF length or superblock size and this has
> resulted in the calculated fsbno starting beyond the end of the last
> AG that we are about to grow. That means the agno is beyond EOFS,
> xfs_perag_get(agno) ends up NULL, and __xfs_free_extent() goes
> splat.  Likely requires corruption to trigger.
> 
> Low priority, low severity.

I've been wondering for quite a while if the code that creates those
defer items ought to be shutting down the fs if they can't get a perag
to stuff in the intent.  xfs_perag_intent_get seems like a reasonable
place to shut down the fs with a corruption warning if someone feeds in
a totally garbage fsblock range.

> > 5       Yes   KASAN: use-after-free Read in xfs_btree_lookup_get_block
> >               https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994e
> 
> Recovery of reflink COW extents, we have a corrupted journal
> 
>    [   52.495566][ T5067] XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791
>    [   52.599681][ T5067] XFS (loop0): Torn write (CRC failure) detected at log block 0x180. Truncating head block from 0x200.
>    [   52.636680][ T5067] XFS (loop0): Starting recovery (logdev: internal)
> 
> And then it looks to have a UAF on the refcountbt cursor that is
> first initialised in xfs_refcount_recover_cow_leftovers(). Likely
> tripping over a corrupted refcount btree of some kind. Probably one
> for Darrick to look into.

Somehow the bogus refcount level field in the AGF is getting past the
verifiers.  I'll look into this later.
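
For illustration, a verifier of the kind mentioned above would range-check
the level field before anything builds a cursor from it. This is a sketch
only: the function names, the dict-based AGF and the max_levels bound are
hypothetical, and the real limit is derived from the filesystem geometry.

```python
def agf_refcount_level_ok(level, max_levels):
    """Return True if an on-disk btree level field is within sane bounds."""
    return 1 <= level <= max_levels


def init_refcount_cursor(agf, max_levels):
    """Build a toy cursor, rejecting an out-of-range level up front."""
    level = agf["refcount_level"]
    if not agf_refcount_level_ok(level, max_levels):
        raise ValueError(f"corrupt AGF: refcount level {level}")
    return {"nlevels": level}
```

If a bogus value slips past a check like this, later btree code sizes and
walks its cursor using a level count that does not match reality.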

--D

> Low priority, low severity.
> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [syzbot] Monthly xfs report
  2023-04-11  4:35   ` Darrick J. Wong
@ 2023-04-12 21:54     ` Dave Chinner
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2023-04-12 21:54 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: syzbot, linux-fsdevel, linux-kernel, linux-xfs, syzkaller-bugs

On Mon, Apr 10, 2023 at 09:35:17PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 11, 2023 at 11:35:12AM +1000, Dave Chinner wrote:
> > On Thu, Mar 30, 2023 at 02:58:43AM -0700, syzbot wrote:
> > 
> > > 13      Yes   general protection fault in __xfs_free_extent
> > >               https://syzkaller.appspot.com/bug?extid=bfbc1eecdfb9b10e5792
> > 
> > Growfs issue. Looks like a NULL pag, which means the fsbno passed
> > to __xfs_free_extent() is invalid. Without looking further, this
> > looks like it's a corrupt AGF length or superblock size and this has
> > resulted in the calculated fsbno starting beyond the end of the last
> > AG that we are about to grow. That means the agno is beyond EOFS,
> > xfs_perag_get(agno) ends up NULL, and __xfs_free_extent() goes
> > splat.  Likely requires corruption to trigger.
> > 
> > Low priority, low severity.
> 
> I've been wondering for quite a while if the code that creates those
> defer items ought to be shutting down the fs if they can't get a perag
> to stuff in the intent.  xfs_perag_intent_get seems like a reasonable
> place to shut down the fs with a corruption warning if someone feeds in
> a totally garbage fsblock range.

You know, I think this might be the same as the case below where
a bogus AGF field is getting past the verifiers in recovery...

> 
> > > 5       Yes   KASAN: use-after-free Read in xfs_btree_lookup_get_block
> > >               https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994e
> > 
> > Recovery of reflink COW extents, we have a corrupted journal
> > 
> >    [   52.495566][ T5067] XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791
> >    [   52.599681][ T5067] XFS (loop0): Torn write (CRC failure) detected at log block 0x180. Truncating head block from 0x200.
> >    [   52.636680][ T5067] XFS (loop0): Starting recovery (logdev: internal)
> > 
> > And then it looks to have a UAF on the refcountbt cursor that is
> > first initialised in xfs_refcount_recover_cow_leftovers(). Likely
> > tripping over a corrupted refcount btree of some kind. Probably one
> > for Darrick to look into.
> 
> Somehow the bogus refcount level field in the AGF is getting past the
> verifiers.  I'll look into this later.

... because like this one, it seems to require corruption getting
deep into the modification operation without being detected.

As for shutdown when a perag cannot be obtained by defer items, I'm
hoping that the perag get operations slowly disappear from those as
we slowly move the perag references higher up the hierarchy. The
perag should not go away in the middle of a defer chain, so I don't
think we should ever get a NULL from a lookup except in the case of
buggy code....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

