[PATCH v2 0/3] xfs: quotacheck vs. dquot reclaim deadlock

* [PATCH v2 0/3] xfs: quotacheck vs. dquot reclaim deadlock
@ 2017-02-24 19:53 Brian Foster
  2017-02-24 19:53 ` [PATCH 1/3] xfs: fix up quotacheck buffer list error handling Brian Foster
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Brian Foster @ 2017-02-24 19:53 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner, Martin Svec

Hi all,

Here's one more stab at the quotacheck vs. dquot reclaim deadlock. The
details of the deadlock are in the commit log. This is a completely new
approach from v1. Rather than disable dquot reclaim entirely, this
pushes on the queued buffer for already flush locked dquots to cycle the
lock and allow the flush to proceed. This is the general approach that
has been used to avoid this problem in XFS since before v3.5 (where the
regression was introduced) to as far back as I can tell.

I've also tacked on an RFC patch to submit dquot buffers after the
quotacheck buffer reset since the topic of doing so as an alternative
fix keeps coming up wrt to this issue. Note that this patch is for
reference and discussion purposes only, as it is broken. The purpose of
including it here is to separate the discussion of performance (memory
efficiency in this case) from the goal of correctness.

As it is, I don't like patch 3 because for the cost to make it actually
work, I don't think it provides enough practical benefit by itself. We
still, for example, load in the entire dquot buffer space all at once
during the buffer reset. If we're going to rework quotacheck for memory
usage, better (IMO) would be to explore an approach that allows more
granular management of both buffers and dquots, rather than do the work
to take quotacheck apart and put it back together again for an
incremental gain. Other options here might be to try and find a way to
avoid the use of dquots entirely by working directly on the buffers,
implement a dirty buffer lru and dquot analog to the inode DONTCACHE
mode (with immediate flush), implement a userspace quotacheck, more
creative batching, or perhaps other high-level ideas.

I'm also not convinced that memory consumption of the dquot traversal is
enough of a problem in practice for a quotacheck rewrite to be high
priority, as most filesystems probably won't have enough dquots for it
to be. I've been experimenting with a filesystem with 600k project
quotas and don't reproduce a problem with as little as 1GB RAM (no
swap). Thoughts?

Brian

v2:
- Added quotacheck error handling fixup patch.
- Push buffers with flush locked dquots for deadlock avoidance rather
  than bypass dquot reclaim.
- Added RFC patch for quotacheck early buffer list submission.
v1: http://www.spinics.net/lists/linux-xfs/msg04304.html

Brian Foster (3):
  xfs: fix up quotacheck buffer list error handling
  xfs: push buffer of flush locked dquot to avoid quotacheck deadlock
  [RFC] xfs: release buffer list after quotacheck buf reset

 fs/xfs/xfs_buf.c   | 39 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_buf.h   |  1 +
 fs/xfs/xfs_qm.c    | 33 ++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_trace.h |  1 +
 4 files changed, 73 insertions(+), 1 deletion(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 26+ messages in thread