All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 13/14] xfs: stop holding ILOCK over filldir callbacks
Date: Mon, 15 Feb 2016 17:18:24 +1100	[thread overview]
Message-ID: <1455517105-20033-14-git-send-email-david@fromorbit.com> (raw)
In-Reply-To: <1455517105-20033-1-git-send-email-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

Source kernel commit dbad7c993053d8f482a5f76270a93307537efd8e

The recent change to the readdir locking made in 40194ec ("xfs:
reinstate the ilock in xfs_readdir") for CXFS directory sanity was
probably the wrong thing to do. Deep in the readdir code we
can take page faults in the filldir callback, and so taking a page
fault while holding an inode ilock creates a new set of locking
issues that lockdep warns all over the place about.

The locking order for regular inodes w.r.t. page faults is io_lock
-> pagefault -> mmap_sem -> ilock. The directory readdir code now
triggers ilock -> page fault -> mmap_sem. While we cannot deadlock
at this point, it inverts all the locking patterns that lockdep
normally sees on XFS inodes, and so triggers lockdep. We worked
around this with commit 93a8614 ("xfs: fix directory inode iolock
lockdep false positive"), but that then just moved the lockdep
warning to deeper in the page fault path and triggered on security
inode locks. Fixing the shmem issue there just moved the lockdep
reports somewhere else, and now we are getting false positives from
filesystem freezing annotations getting confused.

Further, if we enter memory reclaim in a readdir path, we now get
lockdep warning about potential deadlocks because the ilock is held
when we enter reclaim. This, again, is different to a regular file
in that we never allow memory reclaim to run while holding the ilock
for regular files. Hence lockdep now throws
ilock->kmalloc->reclaim->ilock warnings.

Basically, the problem is that the ilock is being used to protect
the directory data and the inode metadata, whereas for a regular
file the iolock protects the data and the ilock protects the
metadata. From the VFS perspective, the i_mutex serialises all
accesses to the directory data, and so not holding the ilock for
readdir doesn't matter. The issue is that CXFS doesn't access
directory data via the VFS, so it has no "data serialisaton"
mechanism. Hence we need to hold the IOLOCK in the correct places to
provide this low level directory data access serialisation.

The ilock can then be used just when the extent list needs to be
read, just like we do for regular files. The directory modification
code can take the iolock exclusive when the ilock is also taken,
and this then ensures that readdir is correct excluded while
modifications are in progress.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_dir2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index 383401b..89b4781 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -360,6 +360,7 @@ xfs_dir_lookup(
 	struct xfs_da_args *args;
 	int		rval;
 	int		v;		/* type-checking value */
+	int		lock_mode;
 
 	ASSERT(S_ISDIR(dp->i_d.di_mode));
 	XFS_STATS_INC(dp->i_mount, xs_dir_lookup);
@@ -385,6 +386,7 @@ xfs_dir_lookup(
 	if (ci_name)
 		args->op_flags |= XFS_DA_OP_CILOOKUP;
 
+	lock_mode = xfs_ilock_data_map_shared(dp);
 	if (dp->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
 		rval = xfs_dir2_sf_lookup(args);
 		goto out_check_rval;
@@ -417,6 +419,7 @@ out_check_rval:
 		}
 	}
 out_free:
+	xfs_iunlock(dp, lock_mode);
 	kmem_free(args);
 	return rval;
 }
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2016-02-15  6:19 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
2016-02-15  6:18 ` [PATCH 01/14] libxfs: Optimize the loop for xfs_bitmap_empty Dave Chinner
2016-02-15  6:18 ` [PATCH 02/14] xfs: add missing bmap cancel calls in error paths Dave Chinner
2016-02-15  6:18 ` [PATCH 03/14] xfs: log local to remote symlink conversions correctly on v5 supers Dave Chinner
2016-02-15  6:18 ` [PATCH 04/14] xfs: per-filesystem stats counter implementation Dave Chinner
2016-02-15  6:18 ` [PATCH 05/14] xfs: introduce BMAPI_ZERO for allocating zeroed extents Dave Chinner
2016-02-16 19:20   ` Brian Foster
2016-02-15  6:18 ` [PATCH 06/14] xfs: get mp from bma->ip in xfs_bmap code Dave Chinner
2016-02-15  6:18 ` [PATCH 07/14] xfs: bmapbt checking on debug kernels too expensive Dave Chinner
2016-02-15  6:18 ` [PATCH 08/14] xfs: eliminate committed arg from xfs_bmap_finish Dave Chinner
2016-02-15  6:18 ` [PATCH 09/14] xfs: inode recovery readahead can race with inode buffer creation Dave Chinner
2016-02-15  6:18 ` [PATCH 10/14] xfs: handle dquot buffer readahead in log recovery correctly Dave Chinner
2016-02-16 19:20   ` Brian Foster
2016-02-15  6:18 ` [PATCH 11/14] xfs: swap leaf buffer into path struct atomically during path shift Dave Chinner
2016-02-15  6:18 ` [PATCH 12/14] libxfs: fix two comment typos Dave Chinner
2016-02-15  6:18 ` Dave Chinner [this message]
2016-02-15  6:18 ` [PATCH 14/14] xfs: Validate the length of on-disk ACLs Dave Chinner
2016-02-16 19:19 ` [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1455517105-20033-14-git-send-email-david@fromorbit.com \
    --to=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.