From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: xfs@oss.sgi.com
Subject: [PATCH V2] xfs: truncate_setsize should be outside transactions
Date: Fri, 2 May 2014 17:00:54 +1000	[thread overview]
Message-ID: <20140502070054.GC26353@dastard> (raw)
In-Reply-To: <20140502064700.GB26353@dastard>


From: Dave Chinner <dchinner@redhat.com>

truncate_setsize() removes pages from the page cache, and hence
requires page locks to be held. It is not valid to lock a page cache
page inside a transaction context, as we can already hold page locks
when we reserve space for a transaction. If we do, then we expose an
ABBA deadlock between log space reservation and page locks.

That is, both the write path and writeback lock a page, then start a
transaction for block allocation, which means they can block waiting
for a log reservation with the page lock held. If we hold a log
reservation and then do something that locks a page (e.g.
truncate_setsize() in xfs_setattr_size()), then that page lock can
block behind a page that is already locked by a thread waiting for a
log reservation. If the transaction that is waiting for the page
lock is the only active transaction in the system that can free log
space via a commit, then writeback will never make progress and so
log space will never free up.
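
The ordering problem above can be illustrated with a small user-space
sketch. This is not kernel code: the resource names and the
order_record() helper are hypothetical, and the check is only a toy
version of the kind of acquisition-order tracking a lock validator
performs. It records which resource was held when another was
acquired and reports when the opposite order has been seen before.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical resource classes; these are not kernel identifiers. */
enum res { RES_PAGE_LOCK, RES_LOG_SPACE, RES_MAX };

/* seen[a][b] is true once some path acquired b while holding a. */
static bool seen[RES_MAX][RES_MAX];

/* Forget all recorded acquisition orders. */
static void order_reset(void)
{
	for (int a = 0; a < RES_MAX; a++)
		for (int b = 0; b < RES_MAX; b++)
			seen[a][b] = false;
}

/*
 * Record that `acquired` was taken while `held` was held.
 * Returns false if the opposite order was recorded earlier, i.e. the
 * two paths taken together can ABBA deadlock.
 */
static bool order_record(enum res held, enum res acquired)
{
	seen[held][acquired] = true;
	return !seen[acquired][held];
}
```

In this model, writeback records (RES_PAGE_LOCK, RES_LOG_SPACE). A
truncate that locks pages while holding a reservation then records
(RES_LOG_SPACE, RES_PAGE_LOCK), and order_record() returns false. A
truncate that finishes its page cache work before reserving log space
records no inverted pair.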

This issue with xfs_setattr_size() was introduced back in 2010 by
commit fa9b227 ("xfs: new truncate sequence") which moved the page
cache truncate from outside the transaction context (what was
xfs_itruncate_data()) to inside the transaction context as a call to
truncate_setsize().

The reason truncate_setsize() was placed inside the transaction is
that we can't change the file size until after we are in the
transaction context, where the operation will either succeed or shut
down the filesystem on failure. Hence we have to split
truncate_setsize() back into a page cache operation that occurs
before the transaction context, and an i_size_write() call that
happens within the transaction context.
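
As a rough illustration of the split described above, here is a
user-space model of the resulting sequence. Everything in it is an
assumption made for the sketch: struct toy_inode and the toy_*()
functions are hypothetical stand-ins, not the XFS or VFS interfaces.
It only shows the ordering: page cache work first, and the size update
only after the reservation has succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for an inode; the field names are hypothetical. */
struct toy_inode {
	long long i_size;	/* user-visible file size */
	long long cached_bytes;	/* bytes of page cache currently held */
};

/* Models the page cache half: drop cached data beyond newsize. */
static void toy_truncate_pagecache(struct toy_inode *inode, long long newsize)
{
	if (inode->cached_bytes > newsize)
		inode->cached_bytes = newsize;
}

/* Models log space reservation, which may fail with ENOMEM. */
static int toy_trans_reserve(bool fail)
{
	return fail ? -ENOMEM : 0;
}

static int toy_setattr_size(struct toy_inode *inode, long long newsize,
			    bool reserve_fails)
{
	int error;

	/* Page cache work happens first, with no reservation held. */
	toy_truncate_pagecache(inode, newsize);

	error = toy_trans_reserve(reserve_fails);
	if (error)
		return error;	/* i_size is left untouched on this path */

	/* The size changes only once the reservation has succeeded. */
	inode->i_size = newsize;
	return 0;
}
```

With a failing reservation the model returns -ENOMEM and leaves i_size
at its old value. With a successful one, i_size becomes newsize.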

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_iops.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ef1ca01..ab2dc47 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -808,22 +808,27 @@ xfs_setattr_size(
 	 */
 	inode_dio_wait(inode);
 
+	/*
+	 * Do all the page cache truncate work outside the transaction
+	 * context as the "lock" order is page lock->log space reservation.
+	 * i.e. locking pages inside the transaction can ABBA deadlock with
+	 * writeback. We have to do the inode size update inside the
+	 * transaction, however, as xfs_trans_reserve() can fail with ENOMEM
+	 * and we can't make user visible changes on non-fatal errors.
+	 */
 	error = -block_truncate_page(inode->i_mapping, newsize, xfs_get_blocks);
 	if (error)
 		return error;
+	truncate_pagecache(inode, newsize);
 
 	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_SIZE);
 	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
 	if (error)
 		goto out_trans_cancel;
 
-	truncate_setsize(inode, newsize);
-
 	commit_flags = XFS_TRANS_RELEASE_LOG_RES;
 	lock_flags |= XFS_ILOCK_EXCL;
-
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-
 	xfs_trans_ijoin(tp, ip, 0);
 
 	/*
@@ -856,6 +861,7 @@ xfs_setattr_size(
 	 * they get written to.
 	 */
 	ip->i_d.di_size = newsize;
+	i_size_write(inode, newsize);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
 	if (newsize <= oldsize) {