PATCH [0/4 V2] xfs: log recovery hang fixes

* PATCH [0/4 V2] xfs: log recovery hang fixes
@ 2022-03-09  1:55 Dave Chinner
  2022-03-09  1:55 ` [PATCH 1/4] xfs: log worker needs to start before intent/unlink recovery Dave Chinner
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Dave Chinner @ 2022-03-09  1:55 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

Willy reported generic/530 had started hanging on his test machines
and I've tried to reproduce the problem he reported. While I haven't
reproduced the exact hang he's been having, I've found a couple of
others while running g/530 in a tight loop on a couple of test
machines.

The first 3 patches are defensive fixes - the log worker acts as a
watchdog, and the issues in patch 2 and 3 were triggered on my
testing of g/530 and lead to 30s delays that the log worker watchdog
caught. Without the watchdog, these may actually be deadlock
triggers.

The 4th patch is the one that fixes the problem Willy reported.
It is a regression from conversion of the AIL pushing to use
non-blocking CIL flushes. It is unknown why this suddenly started
showing up on Willy's test machine right now, and why only on that
machine, but it is clearly a problem. This patch catches the state
that leads to the deadlock and breaks it with an immediate log
force to flush any pending iclogs.

Version 2:
- updated to 5.17-rc7
- tested by Willy.

Version 1:
- https://lore.kernel.org/linux-xfs/20220307053252.2534616-1-david@fromorbit.com/

^ permalink raw reply	[flat|nested] 7+ messages in thread