All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v13 0/4] xfs: Remove wrappers for some semaphores
@ 2020-11-02 19:41 Pavel Reichl
  2020-11-02 19:41 ` [PATCH v13 1/4] xfs: Refactor xfs_isilocked() Pavel Reichl
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Pavel Reichl @ 2020-11-02 19:41 UTC (permalink / raw)
  To: linux-xfs

Remove some wrappers that we have in XFS around the read-write semaphore
locks.

The goal of this cleanup is to remove mrlock_t structure and its mr*()
wrapper functions and replace it with native rw_semaphore type and its
native calls.

Changes in version 8:
* Patchset was rebased so it applies cleanly.
* The patch 'xfs: replace mrlock_t with rw_semaphores' contains change in
xfs_btree.c which transfers ownership of lock so lockdep won't assert
(This was reported by Darrick and proposed change fixes this issue).

Changes in version 9:
*Fixed white space in patch 'xfs: Refactor xfs_isilocked()'
*Updated code comments as suggested by djwong (thanks!) in patch: 'xfs: replace mrlock_t with rw_semaphores'

Changes in version 10:
* Fixed use-after-free in 'xfs: replace mrlock_t with rw_semaphores' (thanks Darrick)
* Moved part of refactor of xfs_isilocked() from patch 'xfs: Refactor xfs_isilocked()' to patch 'xfs: replace mrlock_t with rw_semaphores' - to fix compilation error
* Typo in comment in 'xfs: replace mrlock_t with rw_semaphores'

Changes in version 11:
* Dropped typedef xfs_node_t from xfs_isilocked in 'xfs: Refactor xfs_isilocked()'

Changes in version 12:

xfs: Refactor xfs_isilocked()
* Moved shifting lock_flags from xfs_ilocked() to __xfs_rwsem_ilocked()
* Changed comment in __xfs_rwsem_islocked()
* Removed the arg variable from __xfs_rwsem_islocked()
* Removed the extra parentheses around the lock 'defines'

xfs: replace mrlock_t with rw_semaphores
* Moved shifting lock_flags from xfs_ilocked() to __xfs_rwsem_ilocked()
* Updated comment in xfs_btree_split() before the rwsem_release()
* Added assert to  xfs_isilocked() for IOLOCK flags

Changes in version 13:

Rebased against 5.10-rc1 as requested by Darrick.

The changes Brian asked for:
	1st patch - dropped the unnecessary parentheses
	4th patch - reduced the comments about passing the lockdep ownership 

Pavel Reichl (4):
  xfs: Refactor xfs_isilocked()
  xfs: clean up whitespace in xfs_isilocked() calls
  xfs: xfs_isilocked() can only check a single lock type
  xfs: replace mrlock_t with rw_semaphores

 fs/xfs/libxfs/xfs_bmap.c  |  8 ++--
 fs/xfs/libxfs/xfs_btree.c | 16 +++++++
 fs/xfs/mrlock.h           | 78 --------------------------------
 fs/xfs/xfs_file.c         |  3 +-
 fs/xfs/xfs_inode.c        | 95 +++++++++++++++++++++++++--------------
 fs/xfs/xfs_inode.h        | 25 +++++++----
 fs/xfs/xfs_iops.c         |  4 +-
 fs/xfs/xfs_linux.h        |  2 +-
 fs/xfs/xfs_qm.c           |  2 +-
 fs/xfs/xfs_super.c        |  6 +--
 10 files changed, 106 insertions(+), 133 deletions(-)
 delete mode 100644 fs/xfs/mrlock.h

-- 
2.26.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v13 1/4] xfs: Refactor xfs_isilocked()
  2020-11-02 19:41 [PATCH v13 0/4] xfs: Remove wrappers for some semaphores Pavel Reichl
@ 2020-11-02 19:41 ` Pavel Reichl
  2020-11-02 19:41 ` [PATCH v13 2/4] xfs: clean up whitespace in xfs_isilocked() calls Pavel Reichl
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Pavel Reichl @ 2020-11-02 19:41 UTC (permalink / raw)
  To: linux-xfs

Refactor xfs_isilocked() to use newly introduced __xfs_rwsem_islocked().
__xfs_rwsem_islocked() is a helper function which encapsulates checking
state of rw_semaphores hold by inode.

Signed-off-by: Pavel Reichl <preichl@redhat.com>
Suggested-by: Dave Chinner <dchinner@redhat.com>
Suggested-by: Eric Sandeen <sandeen@redhat.com>
Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_inode.c | 39 +++++++++++++++++++++++++++++++--------
 fs/xfs/xfs_inode.h | 21 ++++++++++++++-------
 2 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 2bfbcf28b1bd..efe4a0afa23e 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -345,9 +345,34 @@ xfs_ilock_demote(
 }
 
 #if defined(DEBUG) || defined(XFS_WARN)
-int
+static inline bool
+__xfs_rwsem_islocked(
+	struct rw_semaphore	*rwsem,
+	int			lock_flags,
+	int			shift)
+{
+	lock_flags >>= shift;
+
+	if (!debug_locks)
+		return rwsem_is_locked(rwsem);
+	/*
+	 * If the shared flag is not set, pass 0 to explicitly check for
+	 * exclusive access to the lock. If the shared flag is set, we typically
+	 * want to make sure the lock is at least held in shared mode
+	 * (i.e., shared | excl) but we don't necessarily care that it might
+	 * actually be held exclusive. Therefore, pass -1 to check whether the
+	 * lock is held in any mode rather than one of the explicit shared mode
+	 * values (1 or 2)."
+	 */
+	if (lock_flags & 1 << XFS_SHARED_LOCK_SHIFT) {
+		return lockdep_is_held_type(rwsem, -1);
+	}
+	return lockdep_is_held_type(rwsem, 0);
+}
+
+bool
 xfs_isilocked(
-	xfs_inode_t		*ip,
+	struct xfs_inode	*ip,
 	uint			lock_flags)
 {
 	if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
@@ -362,15 +387,13 @@ xfs_isilocked(
 		return rwsem_is_locked(&ip->i_mmaplock.mr_lock);
 	}
 
-	if (lock_flags & (XFS_IOLOCK_EXCL|XFS_IOLOCK_SHARED)) {
-		if (!(lock_flags & XFS_IOLOCK_SHARED))
-			return !debug_locks ||
-				lockdep_is_held_type(&VFS_I(ip)->i_rwsem, 0);
-		return rwsem_is_locked(&VFS_I(ip)->i_rwsem);
+	if (lock_flags & (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED)) {
+		return __xfs_rwsem_islocked(&VFS_I(ip)->i_rwsem, lock_flags,
+				XFS_IOLOCK_FLAG_SHIFT);
 	}
 
 	ASSERT(0);
-	return 0;
+	return false;
 }
 #endif
 
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 751a3d1d7d84..1392a9c452ae 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -242,12 +242,19 @@ static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
  * Bit ranges:	1<<1  - 1<<16-1 -- iolock/ilock modes (bitfield)
  *		1<<16 - 1<<32-1 -- lockdep annotation (integers)
  */
-#define	XFS_IOLOCK_EXCL		(1<<0)
-#define	XFS_IOLOCK_SHARED	(1<<1)
-#define	XFS_ILOCK_EXCL		(1<<2)
-#define	XFS_ILOCK_SHARED	(1<<3)
-#define	XFS_MMAPLOCK_EXCL	(1<<4)
-#define	XFS_MMAPLOCK_SHARED	(1<<5)
+
+#define XFS_IOLOCK_FLAG_SHIFT	0
+#define XFS_ILOCK_FLAG_SHIFT	2
+#define XFS_MMAPLOCK_FLAG_SHIFT	4
+
+#define XFS_SHARED_LOCK_SHIFT	1
+
+#define XFS_IOLOCK_EXCL		(1 << XFS_IOLOCK_FLAG_SHIFT)
+#define XFS_IOLOCK_SHARED	(XFS_IOLOCK_EXCL << XFS_SHARED_LOCK_SHIFT)
+#define XFS_ILOCK_EXCL		(1 << XFS_ILOCK_FLAG_SHIFT)
+#define XFS_ILOCK_SHARED	(XFS_ILOCK_EXCL << XFS_SHARED_LOCK_SHIFT)
+#define XFS_MMAPLOCK_EXCL	(1 << XFS_MMAPLOCK_FLAG_SHIFT)
+#define XFS_MMAPLOCK_SHARED	(XFS_MMAPLOCK_EXCL << XFS_SHARED_LOCK_SHIFT)
 
 #define XFS_LOCK_MASK		(XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED \
 				| XFS_ILOCK_EXCL | XFS_ILOCK_SHARED \
@@ -386,7 +393,7 @@ void		xfs_ilock(xfs_inode_t *, uint);
 int		xfs_ilock_nowait(xfs_inode_t *, uint);
 void		xfs_iunlock(xfs_inode_t *, uint);
 void		xfs_ilock_demote(xfs_inode_t *, uint);
-int		xfs_isilocked(xfs_inode_t *, uint);
+bool		xfs_isilocked(struct xfs_inode *, uint);
 uint		xfs_ilock_data_map_shared(struct xfs_inode *);
 uint		xfs_ilock_attr_map_shared(struct xfs_inode *);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v13 2/4] xfs: clean up whitespace in xfs_isilocked() calls
  2020-11-02 19:41 [PATCH v13 0/4] xfs: Remove wrappers for some semaphores Pavel Reichl
  2020-11-02 19:41 ` [PATCH v13 1/4] xfs: Refactor xfs_isilocked() Pavel Reichl
@ 2020-11-02 19:41 ` Pavel Reichl
  2020-11-02 19:41 ` [PATCH v13 3/4] xfs: xfs_isilocked() can only check a single lock type Pavel Reichl
  2020-11-02 19:41 ` [PATCH v13 4/4] xfs: replace mrlock_t with rw_semaphores Pavel Reichl
  3 siblings, 0 replies; 8+ messages in thread
From: Pavel Reichl @ 2020-11-02 19:41 UTC (permalink / raw)
  To: linux-xfs

Make whitespace follow the same pattern in all xfs_isilocked() calls.

Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 2 +-
 fs/xfs/xfs_file.c        | 3 ++-
 fs/xfs/xfs_inode.c       | 4 ++--
 fs/xfs/xfs_qm.c          | 2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index d9a692484eae..666aa47e4f4f 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3883,7 +3883,7 @@ xfs_bmapi_read(
 
 	ASSERT(*nmap >= 1);
 	ASSERT(!(flags & ~(XFS_BMAPI_ATTRFORK | XFS_BMAPI_ENTIRE)));
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED|XFS_ILOCK_EXCL));
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
 
 	if (WARN_ON_ONCE(!ifp))
 		return -EFSCORRUPTED;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5b0f93f73837..874c5edc7b4a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -796,7 +796,8 @@ xfs_break_layouts(
 	bool			retry;
 	int			error;
 
-	ASSERT(xfs_isilocked(XFS_I(inode), XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL));
+	ASSERT(xfs_isilocked(XFS_I(inode),
+			XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL));
 
 	do {
 		retry = false;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index efe4a0afa23e..16d481cf3793 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2772,7 +2772,7 @@ static void
 xfs_iunpin(
 	struct xfs_inode	*ip)
 {
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
 
 	trace_xfs_inode_unpin_nowait(ip, _RET_IP_);
 
@@ -3465,7 +3465,7 @@ xfs_iflush(
 	struct xfs_mount	*mp = ip->i_mount;
 	int			error;
 
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
 	ASSERT(xfs_iflags_test(ip, XFS_IFLUSHING));
 	ASSERT(ip->i_df.if_format != XFS_DINODE_FMT_BTREE ||
 	       ip->i_df.if_nextents > XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK));
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index b2a9abee8b2b..e0bc7c739061 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1818,7 +1818,7 @@ xfs_qm_vop_chown_reserve(
 	int			error;
 
 
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
 	ASSERT(XFS_IS_QUOTA_RUNNING(mp));
 
 	delblks = ip->i_delayed_blks;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v13 3/4] xfs: xfs_isilocked() can only check a single lock type
  2020-11-02 19:41 [PATCH v13 0/4] xfs: Remove wrappers for some semaphores Pavel Reichl
  2020-11-02 19:41 ` [PATCH v13 1/4] xfs: Refactor xfs_isilocked() Pavel Reichl
  2020-11-02 19:41 ` [PATCH v13 2/4] xfs: clean up whitespace in xfs_isilocked() calls Pavel Reichl
@ 2020-11-02 19:41 ` Pavel Reichl
  2020-11-02 19:41 ` [PATCH v13 4/4] xfs: replace mrlock_t with rw_semaphores Pavel Reichl
  3 siblings, 0 replies; 8+ messages in thread
From: Pavel Reichl @ 2020-11-02 19:41 UTC (permalink / raw)
  To: linux-xfs

In its current form, xfs_isilocked() is only able to test one lock type
at a time - ilock, iolock, or mmap lock, but combinations are not
properly handled. The intent here is to check that both XFS_IOLOCK_EXCL
and XFS_ILOCK_EXCL are held, so test them each separately.

The commit ecfea3f0c8c6 ("xfs: split xfs_bmap_shift_extents") ORed the
flags together which was an error, so this patch reverts that part of
the change and check the locks independently.

Fixes: ecfea3f0c8c6 ("xfs: split xfs_bmap_shift_extents")

Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 666aa47e4f4f..558e113e267f 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5792,7 +5792,8 @@ xfs_bmap_collapse_extents(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL));
+	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 
 	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
 		error = xfs_iread_extents(tp, ip, whichfork);
@@ -5909,7 +5910,8 @@ xfs_bmap_insert_extents(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL));
+	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 
 	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
 		error = xfs_iread_extents(tp, ip, whichfork);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v13 4/4] xfs: replace mrlock_t with rw_semaphores
  2020-11-02 19:41 [PATCH v13 0/4] xfs: Remove wrappers for some semaphores Pavel Reichl
                   ` (2 preceding siblings ...)
  2020-11-02 19:41 ` [PATCH v13 3/4] xfs: xfs_isilocked() can only check a single lock type Pavel Reichl
@ 2020-11-02 19:41 ` Pavel Reichl
  2020-11-03 19:41   ` Darrick J. Wong
  2020-11-04  0:53   ` Darrick J. Wong
  3 siblings, 2 replies; 8+ messages in thread
From: Pavel Reichl @ 2020-11-02 19:41 UTC (permalink / raw)
  To: linux-xfs

Remove mrlock_t as it does not provide any extra value over
rw_semaphores. Make i_lock and i_mmaplock native rw_semaphores and
replace mr*() functions with native rwsem calls.

Release the lock in xfs_btree_split() just before the work-queue
executing xfs_btree_split_worker() is scheduled and make
xfs_btree_split_worker() to acquire the lock as a first thing and
release it just before returning from the function. This it done so the
ownership of the lock is transfered between kernel threads and thus
lockdep won't complain about lock being held by a different kernel
thread.

Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>

---
 fs/xfs/libxfs/xfs_btree.c | 16 ++++++++
 fs/xfs/mrlock.h           | 78 ---------------------------------------
 fs/xfs/xfs_inode.c        | 52 ++++++++++++++------------
 fs/xfs/xfs_inode.h        |  4 +-
 fs/xfs/xfs_iops.c         |  4 +-
 fs/xfs/xfs_linux.h        |  2 +-
 fs/xfs/xfs_super.c        |  6 +--
 7 files changed, 51 insertions(+), 111 deletions(-)
 delete mode 100644 fs/xfs/mrlock.h

diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 2d25bab68764..181d5797c97b 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -2816,6 +2816,10 @@ xfs_btree_split_worker(
 	unsigned long		pflags;
 	unsigned long		new_pflags = PF_MEMALLOC_NOFS;
 
+	/*
+	 * Tranfer lock ownership to workqueue task.
+	 */
+	rwsem_acquire(&args->cur->bc_ino.ip->i_lock.dep_map, 0, 0, _RET_IP_);
 	/*
 	 * we are in a transaction context here, but may also be doing work
 	 * in kswapd context, and hence we may need to inherit that state
@@ -2829,6 +2833,7 @@ xfs_btree_split_worker(
 
 	args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
 					 args->key, args->curp, args->stat);
+	rwsem_release(&args->cur->bc_ino.ip->i_lock.dep_map, _THIS_IP_);
 	complete(args->done);
 
 	current_restore_flags_nested(&pflags, new_pflags);
@@ -2863,8 +2868,19 @@ xfs_btree_split(
 	args.done = &done;
 	args.kswapd = current_is_kswapd();
 	INIT_WORK_ONSTACK(&args.work, xfs_btree_split_worker);
+	/*
+	 * Update lockdep's ownership information to reflect transfer of the
+	 * ilock from the current task to the worker. Otherwise assertions that
+	 * the lock is held (such as when logging the inode) might fail due to
+	 * incorrect task owner state.
+	 */
+	rwsem_release(&cur->bc_ino.ip->i_lock.dep_map, _THIS_IP_);
 	queue_work(xfs_alloc_wq, &args.work);
 	wait_for_completion(&done);
+	/*
+	 * Tranfer lock ownership back to the thread.
+	 */
+	rwsem_acquire(&cur->bc_ino.ip->i_lock.dep_map, 0, 0, _RET_IP_);
 	destroy_work_on_stack(&args.work);
 	return args.result;
 }
diff --git a/fs/xfs/mrlock.h b/fs/xfs/mrlock.h
deleted file mode 100644
index 79155eec341b..000000000000
--- a/fs/xfs/mrlock.h
+++ /dev/null
@@ -1,78 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Copyright (c) 2000-2006 Silicon Graphics, Inc.
- * All Rights Reserved.
- */
-#ifndef __XFS_SUPPORT_MRLOCK_H__
-#define __XFS_SUPPORT_MRLOCK_H__
-
-#include <linux/rwsem.h>
-
-typedef struct {
-	struct rw_semaphore	mr_lock;
-#if defined(DEBUG) || defined(XFS_WARN)
-	int			mr_writer;
-#endif
-} mrlock_t;
-
-#if defined(DEBUG) || defined(XFS_WARN)
-#define mrinit(mrp, name)	\
-	do { (mrp)->mr_writer = 0; init_rwsem(&(mrp)->mr_lock); } while (0)
-#else
-#define mrinit(mrp, name)	\
-	do { init_rwsem(&(mrp)->mr_lock); } while (0)
-#endif
-
-#define mrlock_init(mrp, t,n,s)	mrinit(mrp, n)
-#define mrfree(mrp)		do { } while (0)
-
-static inline void mraccess_nested(mrlock_t *mrp, int subclass)
-{
-	down_read_nested(&mrp->mr_lock, subclass);
-}
-
-static inline void mrupdate_nested(mrlock_t *mrp, int subclass)
-{
-	down_write_nested(&mrp->mr_lock, subclass);
-#if defined(DEBUG) || defined(XFS_WARN)
-	mrp->mr_writer = 1;
-#endif
-}
-
-static inline int mrtryaccess(mrlock_t *mrp)
-{
-	return down_read_trylock(&mrp->mr_lock);
-}
-
-static inline int mrtryupdate(mrlock_t *mrp)
-{
-	if (!down_write_trylock(&mrp->mr_lock))
-		return 0;
-#if defined(DEBUG) || defined(XFS_WARN)
-	mrp->mr_writer = 1;
-#endif
-	return 1;
-}
-
-static inline void mrunlock_excl(mrlock_t *mrp)
-{
-#if defined(DEBUG) || defined(XFS_WARN)
-	mrp->mr_writer = 0;
-#endif
-	up_write(&mrp->mr_lock);
-}
-
-static inline void mrunlock_shared(mrlock_t *mrp)
-{
-	up_read(&mrp->mr_lock);
-}
-
-static inline void mrdemote(mrlock_t *mrp)
-{
-#if defined(DEBUG) || defined(XFS_WARN)
-	mrp->mr_writer = 0;
-#endif
-	downgrade_write(&mrp->mr_lock);
-}
-
-#endif /* __XFS_SUPPORT_MRLOCK_H__ */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 16d481cf3793..43ecfcb63f99 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -191,14 +191,15 @@ xfs_ilock(
 	}
 
 	if (lock_flags & XFS_MMAPLOCK_EXCL)
-		mrupdate_nested(&ip->i_mmaplock, XFS_MMAPLOCK_DEP(lock_flags));
+		down_write_nested(&ip->i_mmaplock,
+				XFS_MMAPLOCK_DEP(lock_flags));
 	else if (lock_flags & XFS_MMAPLOCK_SHARED)
-		mraccess_nested(&ip->i_mmaplock, XFS_MMAPLOCK_DEP(lock_flags));
+		down_read_nested(&ip->i_mmaplock, XFS_MMAPLOCK_DEP(lock_flags));
 
 	if (lock_flags & XFS_ILOCK_EXCL)
-		mrupdate_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
+		down_write_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
 	else if (lock_flags & XFS_ILOCK_SHARED)
-		mraccess_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
+		down_read_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
 }
 
 /*
@@ -242,27 +243,27 @@ xfs_ilock_nowait(
 	}
 
 	if (lock_flags & XFS_MMAPLOCK_EXCL) {
-		if (!mrtryupdate(&ip->i_mmaplock))
+		if (!down_write_trylock(&ip->i_mmaplock))
 			goto out_undo_iolock;
 	} else if (lock_flags & XFS_MMAPLOCK_SHARED) {
-		if (!mrtryaccess(&ip->i_mmaplock))
+		if (!down_read_trylock(&ip->i_mmaplock))
 			goto out_undo_iolock;
 	}
 
 	if (lock_flags & XFS_ILOCK_EXCL) {
-		if (!mrtryupdate(&ip->i_lock))
+		if (!down_write_trylock(&ip->i_lock))
 			goto out_undo_mmaplock;
 	} else if (lock_flags & XFS_ILOCK_SHARED) {
-		if (!mrtryaccess(&ip->i_lock))
+		if (!down_read_trylock(&ip->i_lock))
 			goto out_undo_mmaplock;
 	}
 	return 1;
 
 out_undo_mmaplock:
 	if (lock_flags & XFS_MMAPLOCK_EXCL)
-		mrunlock_excl(&ip->i_mmaplock);
+		up_write(&ip->i_mmaplock);
 	else if (lock_flags & XFS_MMAPLOCK_SHARED)
-		mrunlock_shared(&ip->i_mmaplock);
+		up_read(&ip->i_mmaplock);
 out_undo_iolock:
 	if (lock_flags & XFS_IOLOCK_EXCL)
 		up_write(&VFS_I(ip)->i_rwsem);
@@ -309,14 +310,14 @@ xfs_iunlock(
 		up_read(&VFS_I(ip)->i_rwsem);
 
 	if (lock_flags & XFS_MMAPLOCK_EXCL)
-		mrunlock_excl(&ip->i_mmaplock);
+		up_write(&ip->i_mmaplock);
 	else if (lock_flags & XFS_MMAPLOCK_SHARED)
-		mrunlock_shared(&ip->i_mmaplock);
+		up_read(&ip->i_mmaplock);
 
 	if (lock_flags & XFS_ILOCK_EXCL)
-		mrunlock_excl(&ip->i_lock);
+		up_write(&ip->i_lock);
 	else if (lock_flags & XFS_ILOCK_SHARED)
-		mrunlock_shared(&ip->i_lock);
+		up_read(&ip->i_lock);
 
 	trace_xfs_iunlock(ip, lock_flags, _RET_IP_);
 }
@@ -335,9 +336,9 @@ xfs_ilock_demote(
 		~(XFS_IOLOCK_EXCL|XFS_MMAPLOCK_EXCL|XFS_ILOCK_EXCL)) == 0);
 
 	if (lock_flags & XFS_ILOCK_EXCL)
-		mrdemote(&ip->i_lock);
+		downgrade_write(&ip->i_lock);
 	if (lock_flags & XFS_MMAPLOCK_EXCL)
-		mrdemote(&ip->i_mmaplock);
+		downgrade_write(&ip->i_mmaplock);
 	if (lock_flags & XFS_IOLOCK_EXCL)
 		downgrade_write(&VFS_I(ip)->i_rwsem);
 
@@ -375,19 +376,22 @@ xfs_isilocked(
 	struct xfs_inode	*ip,
 	uint			lock_flags)
 {
-	if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
-		if (!(lock_flags & XFS_ILOCK_SHARED))
-			return !!ip->i_lock.mr_writer;
-		return rwsem_is_locked(&ip->i_lock.mr_lock);
+	if (lock_flags & (XFS_ILOCK_EXCL | XFS_ILOCK_SHARED)) {
+		ASSERT(!(lock_flags & ~(XFS_ILOCK_EXCL | XFS_ILOCK_SHARED)));
+		return __xfs_rwsem_islocked(&ip->i_lock, lock_flags,
+				XFS_ILOCK_FLAG_SHIFT);
 	}
 
-	if (lock_flags & (XFS_MMAPLOCK_EXCL|XFS_MMAPLOCK_SHARED)) {
-		if (!(lock_flags & XFS_MMAPLOCK_SHARED))
-			return !!ip->i_mmaplock.mr_writer;
-		return rwsem_is_locked(&ip->i_mmaplock.mr_lock);
+	if (lock_flags & (XFS_MMAPLOCK_EXCL | XFS_MMAPLOCK_SHARED)) {
+		ASSERT(!(lock_flags &
+			~(XFS_MMAPLOCK_EXCL | XFS_MMAPLOCK_SHARED)));
+		return __xfs_rwsem_islocked(&ip->i_mmaplock, lock_flags,
+				XFS_MMAPLOCK_FLAG_SHIFT);
 	}
 
 	if (lock_flags & (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED)) {
+		ASSERT(!(lock_flags &
+			~(XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED)));
 		return __xfs_rwsem_islocked(&VFS_I(ip)->i_rwsem, lock_flags,
 				XFS_IOLOCK_FLAG_SHIFT);
 	}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 1392a9c452ae..66ceb127192e 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -39,8 +39,8 @@ typedef struct xfs_inode {
 
 	/* Transaction and locking information. */
 	struct xfs_inode_log_item *i_itemp;	/* logging information */
-	mrlock_t		i_lock;		/* inode lock */
-	mrlock_t		i_mmaplock;	/* inode mmap IO lock */
+	struct rw_semaphore	i_lock;		/* inode lock */
+	struct rw_semaphore	i_mmaplock;	/* inode mmap IO lock */
 	atomic_t		i_pincount;	/* inode pin count */
 
 	/*
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 5e165456da68..8181f6785a7a 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1336,9 +1336,9 @@ xfs_setup_inode(
 		 */
 		lockdep_set_class(&inode->i_rwsem,
 				  &inode->i_sb->s_type->i_mutex_dir_key);
-		lockdep_set_class(&ip->i_lock.mr_lock, &xfs_dir_ilock_class);
+		lockdep_set_class(&ip->i_lock, &xfs_dir_ilock_class);
 	} else {
-		lockdep_set_class(&ip->i_lock.mr_lock, &xfs_nondir_ilock_class);
+		lockdep_set_class(&ip->i_lock, &xfs_nondir_ilock_class);
 	}
 
 	/*
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 5b7a1e201559..64e28ec16cf7 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -22,7 +22,6 @@ typedef __u32			xfs_nlink_t;
 #include "xfs_types.h"
 
 #include "kmem.h"
-#include "mrlock.h"
 
 #include <linux/semaphore.h>
 #include <linux/mm.h>
@@ -61,6 +60,7 @@ typedef __u32			xfs_nlink_t;
 #include <linux/ratelimit.h>
 #include <linux/rhashtable.h>
 #include <linux/xattr.h>
+#include <linux/rwsem.h>
 
 #include <asm/page.h>
 #include <asm/div64.h>
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index e3e229e52512..380ba196b165 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -708,10 +708,8 @@ xfs_fs_inode_init_once(
 	atomic_set(&ip->i_pincount, 0);
 	spin_lock_init(&ip->i_flags_lock);
 
-	mrlock_init(&ip->i_mmaplock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER,
-		     "xfsino", ip->i_ino);
-	mrlock_init(&ip->i_lock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER,
-		     "xfsino", ip->i_ino);
+	init_rwsem(&ip->i_mmaplock);
+	init_rwsem(&ip->i_lock);
 }
 
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v13 4/4] xfs: replace mrlock_t with rw_semaphores
  2020-11-02 19:41 ` [PATCH v13 4/4] xfs: replace mrlock_t with rw_semaphores Pavel Reichl
@ 2020-11-03 19:41   ` Darrick J. Wong
  2020-11-04  0:53   ` Darrick J. Wong
  1 sibling, 0 replies; 8+ messages in thread
From: Darrick J. Wong @ 2020-11-03 19:41 UTC (permalink / raw)
  To: Pavel Reichl; +Cc: linux-xfs

On Mon, Nov 02, 2020 at 08:41:35PM +0100, Pavel Reichl wrote:
> Remove mrlock_t as it does not provide any extra value over
> rw_semaphores. Make i_lock and i_mmaplock native rw_semaphores and
> replace mr*() functions with native rwsem calls.
> 
> Release the lock in xfs_btree_split() just before the work-queue
> executing xfs_btree_split_worker() is scheduled and make
> xfs_btree_split_worker() to acquire the lock as a first thing and
> release it just before returning from the function. This it done so the
> ownership of the lock is transfered between kernel threads and thus
> lockdep won't complain about lock being held by a different kernel
> thread.
> 
> Signed-off-by: Pavel Reichl <preichl@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> ---
>  fs/xfs/libxfs/xfs_btree.c | 16 ++++++++
>  fs/xfs/mrlock.h           | 78 ---------------------------------------
>  fs/xfs/xfs_inode.c        | 52 ++++++++++++++------------
>  fs/xfs/xfs_inode.h        |  4 +-
>  fs/xfs/xfs_iops.c         |  4 +-
>  fs/xfs/xfs_linux.h        |  2 +-
>  fs/xfs/xfs_super.c        |  6 +--
>  7 files changed, 51 insertions(+), 111 deletions(-)
>  delete mode 100644 fs/xfs/mrlock.h
> 
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index 2d25bab68764..181d5797c97b 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -2816,6 +2816,10 @@ xfs_btree_split_worker(
>  	unsigned long		pflags;
>  	unsigned long		new_pflags = PF_MEMALLOC_NOFS;
>  
> +	/*
> +	 * Tranfer lock ownership to workqueue task.

"Transfer", not "tranfer".  I'll just fix that so you don't have to send
this yet again.

MR. LOCK is dead, long live MR. LOCK!
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +	 */
> +	rwsem_acquire(&args->cur->bc_ino.ip->i_lock.dep_map, 0, 0, _RET_IP_);
>  	/*
>  	 * we are in a transaction context here, but may also be doing work
>  	 * in kswapd context, and hence we may need to inherit that state
> @@ -2829,6 +2833,7 @@ xfs_btree_split_worker(
>  
>  	args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
>  					 args->key, args->curp, args->stat);
> +	rwsem_release(&args->cur->bc_ino.ip->i_lock.dep_map, _THIS_IP_);
>  	complete(args->done);
>  
>  	current_restore_flags_nested(&pflags, new_pflags);
> @@ -2863,8 +2868,19 @@ xfs_btree_split(
>  	args.done = &done;
>  	args.kswapd = current_is_kswapd();
>  	INIT_WORK_ONSTACK(&args.work, xfs_btree_split_worker);
> +	/*
> +	 * Update lockdep's ownership information to reflect transfer of the
> +	 * ilock from the current task to the worker. Otherwise assertions that
> +	 * the lock is held (such as when logging the inode) might fail due to
> +	 * incorrect task owner state.
> +	 */
> +	rwsem_release(&cur->bc_ino.ip->i_lock.dep_map, _THIS_IP_);
>  	queue_work(xfs_alloc_wq, &args.work);
>  	wait_for_completion(&done);
> +	/*
> +	 * Tranfer lock ownership back to the thread.
> +	 */
> +	rwsem_acquire(&cur->bc_ino.ip->i_lock.dep_map, 0, 0, _RET_IP_);
>  	destroy_work_on_stack(&args.work);
>  	return args.result;
>  }
> diff --git a/fs/xfs/mrlock.h b/fs/xfs/mrlock.h
> deleted file mode 100644
> index 79155eec341b..000000000000
> --- a/fs/xfs/mrlock.h
> +++ /dev/null
> @@ -1,78 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -/*
> - * Copyright (c) 2000-2006 Silicon Graphics, Inc.
> - * All Rights Reserved.
> - */
> -#ifndef __XFS_SUPPORT_MRLOCK_H__
> -#define __XFS_SUPPORT_MRLOCK_H__
> -
> -#include <linux/rwsem.h>
> -
> -typedef struct {
> -	struct rw_semaphore	mr_lock;
> -#if defined(DEBUG) || defined(XFS_WARN)
> -	int			mr_writer;
> -#endif
> -} mrlock_t;
> -
> -#if defined(DEBUG) || defined(XFS_WARN)
> -#define mrinit(mrp, name)	\
> -	do { (mrp)->mr_writer = 0; init_rwsem(&(mrp)->mr_lock); } while (0)
> -#else
> -#define mrinit(mrp, name)	\
> -	do { init_rwsem(&(mrp)->mr_lock); } while (0)
> -#endif
> -
> -#define mrlock_init(mrp, t,n,s)	mrinit(mrp, n)
> -#define mrfree(mrp)		do { } while (0)
> -
> -static inline void mraccess_nested(mrlock_t *mrp, int subclass)
> -{
> -	down_read_nested(&mrp->mr_lock, subclass);
> -}
> -
> -static inline void mrupdate_nested(mrlock_t *mrp, int subclass)
> -{
> -	down_write_nested(&mrp->mr_lock, subclass);
> -#if defined(DEBUG) || defined(XFS_WARN)
> -	mrp->mr_writer = 1;
> -#endif
> -}
> -
> -static inline int mrtryaccess(mrlock_t *mrp)
> -{
> -	return down_read_trylock(&mrp->mr_lock);
> -}
> -
> -static inline int mrtryupdate(mrlock_t *mrp)
> -{
> -	if (!down_write_trylock(&mrp->mr_lock))
> -		return 0;
> -#if defined(DEBUG) || defined(XFS_WARN)
> -	mrp->mr_writer = 1;
> -#endif
> -	return 1;
> -}
> -
> -static inline void mrunlock_excl(mrlock_t *mrp)
> -{
> -#if defined(DEBUG) || defined(XFS_WARN)
> -	mrp->mr_writer = 0;
> -#endif
> -	up_write(&mrp->mr_lock);
> -}
> -
> -static inline void mrunlock_shared(mrlock_t *mrp)
> -{
> -	up_read(&mrp->mr_lock);
> -}
> -
> -static inline void mrdemote(mrlock_t *mrp)
> -{
> -#if defined(DEBUG) || defined(XFS_WARN)
> -	mrp->mr_writer = 0;
> -#endif
> -	downgrade_write(&mrp->mr_lock);
> -}
> -
> -#endif /* __XFS_SUPPORT_MRLOCK_H__ */
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 16d481cf3793..43ecfcb63f99 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -191,14 +191,15 @@ xfs_ilock(
>  	}
>  
>  	if (lock_flags & XFS_MMAPLOCK_EXCL)
> -		mrupdate_nested(&ip->i_mmaplock, XFS_MMAPLOCK_DEP(lock_flags));
> +		down_write_nested(&ip->i_mmaplock,
> +				XFS_MMAPLOCK_DEP(lock_flags));
>  	else if (lock_flags & XFS_MMAPLOCK_SHARED)
> -		mraccess_nested(&ip->i_mmaplock, XFS_MMAPLOCK_DEP(lock_flags));
> +		down_read_nested(&ip->i_mmaplock, XFS_MMAPLOCK_DEP(lock_flags));
>  
>  	if (lock_flags & XFS_ILOCK_EXCL)
> -		mrupdate_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
> +		down_write_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
>  	else if (lock_flags & XFS_ILOCK_SHARED)
> -		mraccess_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
> +		down_read_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags));
>  }
>  
>  /*
> @@ -242,27 +243,27 @@ xfs_ilock_nowait(
>  	}
>  
>  	if (lock_flags & XFS_MMAPLOCK_EXCL) {
> -		if (!mrtryupdate(&ip->i_mmaplock))
> +		if (!down_write_trylock(&ip->i_mmaplock))
>  			goto out_undo_iolock;
>  	} else if (lock_flags & XFS_MMAPLOCK_SHARED) {
> -		if (!mrtryaccess(&ip->i_mmaplock))
> +		if (!down_read_trylock(&ip->i_mmaplock))
>  			goto out_undo_iolock;
>  	}
>  
>  	if (lock_flags & XFS_ILOCK_EXCL) {
> -		if (!mrtryupdate(&ip->i_lock))
> +		if (!down_write_trylock(&ip->i_lock))
>  			goto out_undo_mmaplock;
>  	} else if (lock_flags & XFS_ILOCK_SHARED) {
> -		if (!mrtryaccess(&ip->i_lock))
> +		if (!down_read_trylock(&ip->i_lock))
>  			goto out_undo_mmaplock;
>  	}
>  	return 1;
>  
>  out_undo_mmaplock:
>  	if (lock_flags & XFS_MMAPLOCK_EXCL)
> -		mrunlock_excl(&ip->i_mmaplock);
> +		up_write(&ip->i_mmaplock);
>  	else if (lock_flags & XFS_MMAPLOCK_SHARED)
> -		mrunlock_shared(&ip->i_mmaplock);
> +		up_read(&ip->i_mmaplock);
>  out_undo_iolock:
>  	if (lock_flags & XFS_IOLOCK_EXCL)
>  		up_write(&VFS_I(ip)->i_rwsem);
> @@ -309,14 +310,14 @@ xfs_iunlock(
>  		up_read(&VFS_I(ip)->i_rwsem);
>  
>  	if (lock_flags & XFS_MMAPLOCK_EXCL)
> -		mrunlock_excl(&ip->i_mmaplock);
> +		up_write(&ip->i_mmaplock);
>  	else if (lock_flags & XFS_MMAPLOCK_SHARED)
> -		mrunlock_shared(&ip->i_mmaplock);
> +		up_read(&ip->i_mmaplock);
>  
>  	if (lock_flags & XFS_ILOCK_EXCL)
> -		mrunlock_excl(&ip->i_lock);
> +		up_write(&ip->i_lock);
>  	else if (lock_flags & XFS_ILOCK_SHARED)
> -		mrunlock_shared(&ip->i_lock);
> +		up_read(&ip->i_lock);
>  
>  	trace_xfs_iunlock(ip, lock_flags, _RET_IP_);
>  }
> @@ -335,9 +336,9 @@ xfs_ilock_demote(
>  		~(XFS_IOLOCK_EXCL|XFS_MMAPLOCK_EXCL|XFS_ILOCK_EXCL)) == 0);
>  
>  	if (lock_flags & XFS_ILOCK_EXCL)
> -		mrdemote(&ip->i_lock);
> +		downgrade_write(&ip->i_lock);
>  	if (lock_flags & XFS_MMAPLOCK_EXCL)
> -		mrdemote(&ip->i_mmaplock);
> +		downgrade_write(&ip->i_mmaplock);
>  	if (lock_flags & XFS_IOLOCK_EXCL)
>  		downgrade_write(&VFS_I(ip)->i_rwsem);
>  
> @@ -375,19 +376,22 @@ xfs_isilocked(
>  	struct xfs_inode	*ip,
>  	uint			lock_flags)
>  {
> -	if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
> -		if (!(lock_flags & XFS_ILOCK_SHARED))
> -			return !!ip->i_lock.mr_writer;
> -		return rwsem_is_locked(&ip->i_lock.mr_lock);
> +	if (lock_flags & (XFS_ILOCK_EXCL | XFS_ILOCK_SHARED)) {
> +		ASSERT(!(lock_flags & ~(XFS_ILOCK_EXCL | XFS_ILOCK_SHARED)));
> +		return __xfs_rwsem_islocked(&ip->i_lock, lock_flags,
> +				XFS_ILOCK_FLAG_SHIFT);
>  	}
>  
> -	if (lock_flags & (XFS_MMAPLOCK_EXCL|XFS_MMAPLOCK_SHARED)) {
> -		if (!(lock_flags & XFS_MMAPLOCK_SHARED))
> -			return !!ip->i_mmaplock.mr_writer;
> -		return rwsem_is_locked(&ip->i_mmaplock.mr_lock);
> +	if (lock_flags & (XFS_MMAPLOCK_EXCL | XFS_MMAPLOCK_SHARED)) {
> +		ASSERT(!(lock_flags &
> +			~(XFS_MMAPLOCK_EXCL | XFS_MMAPLOCK_SHARED)));
> +		return __xfs_rwsem_islocked(&ip->i_mmaplock, lock_flags,
> +				XFS_MMAPLOCK_FLAG_SHIFT);
>  	}
>  
>  	if (lock_flags & (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED)) {
> +		ASSERT(!(lock_flags &
> +			~(XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED)));
>  		return __xfs_rwsem_islocked(&VFS_I(ip)->i_rwsem, lock_flags,
>  				XFS_IOLOCK_FLAG_SHIFT);
>  	}
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 1392a9c452ae..66ceb127192e 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -39,8 +39,8 @@ typedef struct xfs_inode {
>  
>  	/* Transaction and locking information. */
>  	struct xfs_inode_log_item *i_itemp;	/* logging information */
> -	mrlock_t		i_lock;		/* inode lock */
> -	mrlock_t		i_mmaplock;	/* inode mmap IO lock */
> +	struct rw_semaphore	i_lock;		/* inode lock */
> +	struct rw_semaphore	i_mmaplock;	/* inode mmap IO lock */
>  	atomic_t		i_pincount;	/* inode pin count */
>  
>  	/*
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 5e165456da68..8181f6785a7a 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1336,9 +1336,9 @@ xfs_setup_inode(
>  		 */
>  		lockdep_set_class(&inode->i_rwsem,
>  				  &inode->i_sb->s_type->i_mutex_dir_key);
> -		lockdep_set_class(&ip->i_lock.mr_lock, &xfs_dir_ilock_class);
> +		lockdep_set_class(&ip->i_lock, &xfs_dir_ilock_class);
>  	} else {
> -		lockdep_set_class(&ip->i_lock.mr_lock, &xfs_nondir_ilock_class);
> +		lockdep_set_class(&ip->i_lock, &xfs_nondir_ilock_class);
>  	}
>  
>  	/*
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index 5b7a1e201559..64e28ec16cf7 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -22,7 +22,6 @@ typedef __u32			xfs_nlink_t;
>  #include "xfs_types.h"
>  
>  #include "kmem.h"
> -#include "mrlock.h"
>  
>  #include <linux/semaphore.h>
>  #include <linux/mm.h>
> @@ -61,6 +60,7 @@ typedef __u32			xfs_nlink_t;
>  #include <linux/ratelimit.h>
>  #include <linux/rhashtable.h>
>  #include <linux/xattr.h>
> +#include <linux/rwsem.h>
>  
>  #include <asm/page.h>
>  #include <asm/div64.h>
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index e3e229e52512..380ba196b165 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -708,10 +708,8 @@ xfs_fs_inode_init_once(
>  	atomic_set(&ip->i_pincount, 0);
>  	spin_lock_init(&ip->i_flags_lock);
>  
> -	mrlock_init(&ip->i_mmaplock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER,
> -		     "xfsino", ip->i_ino);
> -	mrlock_init(&ip->i_lock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER,
> -		     "xfsino", ip->i_ino);
> +	init_rwsem(&ip->i_mmaplock);
> +	init_rwsem(&ip->i_lock);
>  }
>  
>  /*
> -- 
> 2.26.2
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v13 4/4] xfs: replace mrlock_t with rw_semaphores
  2020-11-02 19:41 ` [PATCH v13 4/4] xfs: replace mrlock_t with rw_semaphores Pavel Reichl
  2020-11-03 19:41   ` Darrick J. Wong
@ 2020-11-04  0:53   ` Darrick J. Wong
  2020-11-05 22:08     ` Dave Chinner
  1 sibling, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2020-11-04  0:53 UTC (permalink / raw)
  To: Pavel Reichl; +Cc: linux-xfs

On Mon, Nov 02, 2020 at 08:41:35PM +0100, Pavel Reichl wrote:
> Remove mrlock_t as it does not provide any extra value over
> rw_semaphores. Make i_lock and i_mmaplock native rw_semaphores and
> replace mr*() functions with native rwsem calls.
> 
> Release the lock in xfs_btree_split() just before the work-queue
> executing xfs_btree_split_worker() is scheduled and make
> xfs_btree_split_worker() to acquire the lock as a first thing and
> release it just before returning from the function. This it done so the
> ownership of the lock is transfered between kernel threads and thus
> lockdep won't complain about lock being held by a different kernel
> thread.
> 
> Signed-off-by: Pavel Reichl <preichl@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> ---
>  fs/xfs/libxfs/xfs_btree.c | 16 ++++++++
>  fs/xfs/mrlock.h           | 78 ---------------------------------------
>  fs/xfs/xfs_inode.c        | 52 ++++++++++++++------------
>  fs/xfs/xfs_inode.h        |  4 +-
>  fs/xfs/xfs_iops.c         |  4 +-
>  fs/xfs/xfs_linux.h        |  2 +-
>  fs/xfs/xfs_super.c        |  6 +--
>  7 files changed, 51 insertions(+), 111 deletions(-)
>  delete mode 100644 fs/xfs/mrlock.h
> 
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index 2d25bab68764..181d5797c97b 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -2816,6 +2816,10 @@ xfs_btree_split_worker(
w  	unsigned long		pflags;
>  	unsigned long		new_pflags = PF_MEMALLOC_NOFS;
>  
> +	/*
> +	 * Tranfer lock ownership to workqueue task.
> +	 */
> +	rwsem_acquire(&args->cur->bc_ino.ip->i_lock.dep_map, 0, 0, _RET_IP_);
>  	/*
>  	 * we are in a transaction context here, but may also be doing work
>  	 * in kswapd context, and hence we may need to inherit that state
> @@ -2829,6 +2833,7 @@ xfs_btree_split_worker(
>  
>  	args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
>  					 args->key, args->curp, args->stat);
> +	rwsem_release(&args->cur->bc_ino.ip->i_lock.dep_map, _THIS_IP_);
>  	complete(args->done);
>  
>  	current_restore_flags_nested(&pflags, new_pflags);
> @@ -2863,8 +2868,19 @@ xfs_btree_split(
>  	args.done = &done;
>  	args.kswapd = current_is_kswapd();
>  	INIT_WORK_ONSTACK(&args.work, xfs_btree_split_worker);
> +	/*
> +	 * Update lockdep's ownership information to reflect transfer of the
> +	 * ilock from the current task to the worker. Otherwise assertions that
> +	 * the lock is held (such as when logging the inode) might fail due to
> +	 * incorrect task owner state.
> +	 */
> +	rwsem_release(&cur->bc_ino.ip->i_lock.dep_map, _THIS_IP_);
>  	queue_work(xfs_alloc_wq, &args.work);
>  	wait_for_completion(&done);
> +	/*
> +	 * Tranfer lock ownership back to the thread.
> +	 */
> +	rwsem_acquire(&cur->bc_ino.ip->i_lock.dep_map, 0, 0, _RET_IP_);

So I ran all this through fstests.  On generic/324 on a fstests run with
reasonable recent xfsprogs and all the features turned on (rmap in
particular) I see the following lockdep report:

============================================
WARNING: possible recursive locking detected
5.10.0-rc1-djw #rc1 Not tainted
--------------------------------------------
xfs_fsr/5438 is trying to acquire lock:
ffff88801826b028 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_btree_make_block_unfull+0x19a/0x200 [xfs]

but task is already holding lock:
ffff88807fbc58a8 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_lock_two_inodes+0x116/0x3e0 [xfs]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xfs_nondir_ilock_class);
  lock(&xfs_nondir_ilock_class);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

7 locks held by xfs_fsr/5438:
 #0: ffff88804166a430 (sb_writers#12){.+.+}-{0:0}, at: mnt_want_write_file+0x23/0x70
 #1: ffff88801826b2d0 (&sb->s_type->i_mutex_key#13){++++}-{3:3}, at: lock_two_nondirectories+0x68/0x70
 #2: ffff88807fbc5b50 (&sb->s_type->i_mutex_key#13/4){+.+.}-{3:3}, at: xfs_swap_extents+0x4e/0x8e0 [xfs]
 #3: ffff88807fbc5940 (&ip->i_mmaplock){+.+.}-{3:3}, at: xfs_ilock+0xfa/0x270 [xfs]
 #4: ffff88801826b0c0 (&ip->i_mmaplock){+.+.}-{3:3}, at: xfs_ilock_nowait+0x17f/0x310 [xfs]
 #5: ffff88804166a620 (sb_internal){.+.+}-{0:0}, at: xfs_trans_alloc+0x1b0/0x2a0 [xfs]
 #6: ffff88807fbc58a8 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_lock_two_inodes+0x116/0x3e0 [xfs]

stack backtrace:
CPU: 1 PID: 5438 Comm: xfs_fsr Not tainted 5.10.0-rc1-djw #rc1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1 04/01/2014
Call Trace:
 dump_stack+0x7b/0x9b
 __lock_acquire.cold+0x208/0x2b9
 lock_acquire+0xcd/0x430
 ? xfs_btree_make_block_unfull+0x19a/0x200 [xfs]
 ? mark_held_locks+0x49/0x70
 xfs_btree_split+0x1a2/0x1c0 [xfs]
 ? xfs_btree_make_block_unfull+0x19a/0x200 [xfs]
 ? xfs_btree_split+0x1c0/0x1c0 [xfs]
 xfs_btree_make_block_unfull+0x19a/0x200 [xfs]
 xfs_btree_insrec+0x448/0x9a0 [xfs]
 ? xfs_btree_lookup_get_block+0xee/0x1a0 [xfs]
 xfs_btree_insert+0xa0/0x1f0 [xfs]
 xfs_bmap_add_extent_hole_real+0x272/0xae0 [xfs]
 xfs_bmapi_remap+0x1d5/0x400 [xfs]
 xfs_bmap_finish_one+0x1ac/0x2a0 [xfs]
 xfs_bmap_update_finish_item+0x53/0xd0 [xfs]
 xfs_defer_finish_noroll+0x271/0xa90 [xfs]
 xfs_defer_finish+0x15/0xa0 [xfs]
 xfs_swap_extent_rmap+0x27c/0x810 [xfs]
 xfs_swap_extents+0x7d8/0x8e0 [xfs]
 xfs_ioc_swapext+0x131/0x150 [xfs]
 xfs_file_ioctl+0x291/0xca0 [xfs]
 __x64_sys_ioctl+0x87/0xa0
 do_syscall_64+0x31/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f73472ea50b
Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc7f707778 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffc7f7078e0 RCX: 00007f73472ea50b
RDX: 00007ffc7f707780 RSI: 00000000c0c0586d RDI: 0000000000000005
RBP: 00007ffc7f7097c0 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000022 R11: 0000000000000246 R12: 00007ffc7f7097c0
R13: 00005567684f90c0 R14: 00000000000000f9 R15: 00000000000000f9

I think it's significant that xfs_lock_two_inodes figures in the lockdep
report above.  I suspect what's going on here is: xfs_swap_extents takes
the ILOCK of the two inodes it's operating on.  The first ILOCK gets
tagged with subclass 0 and the second file's ILOCK gets subclass 1.

While swapping extents, we decide that the second file requires a bmbt
split and that we have to do this via workqueue.  We tell lockdep to
drop the current thread (pid 5438) from the lockdep map and kick off the
worker.  When the worker returns, however, we don't feed the subclass
information to rwsem_acquire (the second parameter).  Therefore, lockdep
sees that pid 5438 currently holds one subclass 0 ILOCK (the first
inode, which we didn't unlock) and complains that we're asking it to
grab another subclass 0 ILOCK (the second inode).

Unfortunately, I don't think you can solve this problem solely by
passing the ILOCK subclass information to rwsem_acquire, either.
generic/558 produces the following splat:

======================================================
WARNING: possible circular locking dependency detected
5.10.0-rc1-djw #rc1 Not tainted
------------------------------------------------------
rm/25035 is trying to acquire lock:
ffff888009f5c970 (&xfs_dir_ilock_class){++++}-{3:3}, at: xfs_btree_make_block_unfull+0x19a/0x200 [xfs]

but task is already holding lock:
ffff88800af0c230 (&xfs_nondir_ilock_class/1){+.+.}-{3:3}, at: xfs_remove+0x180/0x510 [xfs]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&xfs_nondir_ilock_class/1){+.+.}-{3:3}:
       down_write_nested+0x48/0x120
       xfs_remove+0x180/0x510 [xfs]
       xfs_vn_unlink+0x57/0xa0 [xfs]
       vfs_unlink+0xf2/0x1e0
       do_unlinkat+0x1a5/0x2e0
       do_syscall_64+0x31/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&xfs_dir_ilock_class){++++}-{3:3}:
       __lock_acquire+0x1223/0x2200
       lock_acquire+0xcd/0x430
       xfs_btree_split+0x1a5/0x1c0 [xfs]
       xfs_btree_make_block_unfull+0x19a/0x200 [xfs]
       xfs_btree_insrec+0x448/0x9a0 [xfs]
       xfs_btree_insert+0xa0/0x1f0 [xfs]
       xfs_bmap_del_extent_real+0x6fc/0xc60 [xfs]
       __xfs_bunmapi+0x8ba/0x10b0 [xfs]
       xfs_bunmapi+0x19/0x30 [xfs]
       xfs_dir2_shrink_inode+0x94/0x2d0 [xfs]
       xfs_dir2_node_removename+0x87f/0x9e0 [xfs]
       xfs_dir_removename+0x1eb/0x2b0 [xfs]
       xfs_remove+0x426/0x510 [xfs]
       xfs_vn_unlink+0x57/0xa0 [xfs]
       vfs_unlink+0xf2/0x1e0
       do_unlinkat+0x1a5/0x2e0
       do_syscall_64+0x31/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&xfs_nondir_ilock_class/1);
                               lock(&xfs_dir_ilock_class);
                               lock(&xfs_nondir_ilock_class/1);
  lock(&xfs_dir_ilock_class);

 *** DEADLOCK ***

5 locks held by rm/25035:
 #0: ffff88803b9f8468 (sb_writers#12){.+.+}-{0:0}, at: mnt_want_write+0x1f/0x50
 #1: ffff888009f5cc48 (&inode->i_sb->s_type->i_mutex_dir_key/1){+.+.}-{3:3}, at: do_unlinkat+0x13c/0x2e0
 #2: ffff88800af0c508 (&sb->s_type->i_mutex_key#13){++++}-{3:3}, at: vfs_unlink+0x4c/0x1e0
 #3: ffff88803b9f8688 (sb_internal){.+.+}-{0:0}, at: xfs_trans_alloc+0x1b0/0x2a0 [xfs]
 #4: ffff88800af0c230 (&xfs_nondir_ilock_class/1){+.+.}-{3:3}, at: xfs_remove+0x180/0x510 [xfs]

stack backtrace:
CPU: 1 PID: 25035 Comm: rm Not tainted 5.10.0-rc1-djw #rc1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1 04/01/2014
Call Trace:
 dump_stack+0x7b/0x9b
 check_noncircular+0xf3/0x110
 __lock_acquire+0x1223/0x2200
 lock_acquire+0xcd/0x430
 ? xfs_btree_make_block_unfull+0x19a/0x200 [xfs]
 ? mark_held_locks+0x45/0x70
 xfs_btree_split+0x1a5/0x1c0 [xfs]
 ? xfs_btree_make_block_unfull+0x19a/0x200 [xfs]
 ? wait_for_completion+0xba/0x110
 ? xfs_btree_split+0x1c0/0x1c0 [xfs]
 xfs_btree_make_block_unfull+0x19a/0x200 [xfs]
 xfs_btree_insrec+0x448/0x9a0 [xfs]
 xfs_btree_insert+0xa0/0x1f0 [xfs]
 ? xfs_btree_increment+0x95/0x590 [xfs]
 xfs_bmap_del_extent_real+0x6fc/0xc60 [xfs]
 __xfs_bunmapi+0x8ba/0x10b0 [xfs]
 xfs_bunmapi+0x19/0x30 [xfs]
 xfs_dir2_shrink_inode+0x94/0x2d0 [xfs]
 xfs_dir2_node_removename+0x87f/0x9e0 [xfs]
 xfs_dir_removename+0x1eb/0x2b0 [xfs]
 ? xfs_iunlink+0x169/0x310 [xfs]
 xfs_remove+0x426/0x510 [xfs]
 xfs_vn_unlink+0x57/0xa0 [xfs]
 vfs_unlink+0xf2/0x1e0
 do_unlinkat+0x1a5/0x2e0
 do_syscall_64+0x31/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f892cca4e6b

Just like in the first scenario, we hold two ILOCKs -- a directory, and
a file that we're removing from the directory.  The removal triggers the
same bmbt split worker because we're shrinking the directory, and upon
its completion we call rwsem_acquire to reset the lockdep maps.

Unfortunately, in this case, we actually /are/ feeding the correct
subclass information to rwsem_acquire.  This time it's pointing out what
it thinks is an inconsistency in our locking order: the first time we
locked the directory and then the regular file inode, but now we hold
the regular file inode and we're asking it for the directory ILOCK.

(Note that we don't actually deadlock here because pid 25035 has
maintained ownership of the directory ILOCK rwsem this whole time, but
lockdep doesn't know that.)

A crappy way to bypass this problem is the following garbage patch
which disables lockdep chain checking since we never actually dropped
any of the ILOCKs that are being complained about.  Messing with low
level lockdep internals seems sketchy to me, but so it goes.

The patch also has the major flaw that it doesn't recapture the subclass
information, but doing that is left as an exercise to the reader. ;)

--D

diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 9756a63e78f4..3146932de7fe 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -2820,7 +2820,7 @@ xfs_btree_split_worker(
 	/*
 	 * Tranfer lock ownership to workqueue task.
 	 */
-	rwsem_acquire(&args->cur->bc_ino.ip->i_lock.dep_map, 0, 0, _RET_IP_);
+	lock_acquire(&args->cur->bc_ino.ip->i_lock.dep_map, 0, 0, 0, 0, NULL, _RET_IP_);
 	/*
 	 * we are in a transaction context here, but may also be doing work
 	 * in kswapd context, and hence we may need to inherit that state
@@ -2882,7 +2882,7 @@ xfs_btree_split(
 	/*
 	 * Tranfer lock ownership back to the thread.
 	 */
-	rwsem_acquire(&cur->bc_ino.ip->i_lock.dep_map, 0, 0, _RET_IP_);
+	lock_acquire(&cur->bc_ino.ip->i_lock.dep_map, 0, 0, 0, 0, NULL, _RET_IP_);
 	destroy_work_on_stack(&args.work);
 	return args.result;
 }

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v13 4/4] xfs: replace mrlock_t with rw_semaphores
  2020-11-04  0:53   ` Darrick J. Wong
@ 2020-11-05 22:08     ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2020-11-05 22:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Pavel Reichl, linux-xfs

On Tue, Nov 03, 2020 at 04:53:45PM -0800, Darrick J. Wong wrote:
> On Mon, Nov 02, 2020 at 08:41:35PM +0100, Pavel Reichl wrote:
> > Remove mrlock_t as it does not provide any extra value over
> > rw_semaphores. Make i_lock and i_mmaplock native rw_semaphores and
> > replace mr*() functions with native rwsem calls.
> > 
> > Release the lock in xfs_btree_split() just before the work-queue
> > executing xfs_btree_split_worker() is scheduled and make
> > xfs_btree_split_worker() to acquire the lock as a first thing and
> > release it just before returning from the function. This it done so the
> > ownership of the lock is transfered between kernel threads and thus
> > lockdep won't complain about lock being held by a different kernel
> > thread.
> > 
> > Signed-off-by: Pavel Reichl <preichl@redhat.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > 
> > ---
> >  fs/xfs/libxfs/xfs_btree.c | 16 ++++++++
> >  fs/xfs/mrlock.h           | 78 ---------------------------------------
> >  fs/xfs/xfs_inode.c        | 52 ++++++++++++++------------
> >  fs/xfs/xfs_inode.h        |  4 +-
> >  fs/xfs/xfs_iops.c         |  4 +-
> >  fs/xfs/xfs_linux.h        |  2 +-
> >  fs/xfs/xfs_super.c        |  6 +--
> >  7 files changed, 51 insertions(+), 111 deletions(-)
> >  delete mode 100644 fs/xfs/mrlock.h
> > 
> > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > index 2d25bab68764..181d5797c97b 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -2816,6 +2816,10 @@ xfs_btree_split_worker(
> w  	unsigned long		pflags;
> >  	unsigned long		new_pflags = PF_MEMALLOC_NOFS;
> >  
> > +	/*
> > +	 * Tranfer lock ownership to workqueue task.
> > +	 */
> > +	rwsem_acquire(&args->cur->bc_ino.ip->i_lock.dep_map, 0, 0, _RET_IP_);
> >  	/*
> >  	 * we are in a transaction context here, but may also be doing work
> >  	 * in kswapd context, and hence we may need to inherit that state
> > @@ -2829,6 +2833,7 @@ xfs_btree_split_worker(
> >  
> >  	args->result = __xfs_btree_split(args->cur, args->level, args->ptrp,
> >  					 args->key, args->curp, args->stat);
> > +	rwsem_release(&args->cur->bc_ino.ip->i_lock.dep_map, _THIS_IP_);
> >  	complete(args->done);
> >  
> >  	current_restore_flags_nested(&pflags, new_pflags);
> > @@ -2863,8 +2868,19 @@ xfs_btree_split(
> >  	args.done = &done;
> >  	args.kswapd = current_is_kswapd();
> >  	INIT_WORK_ONSTACK(&args.work, xfs_btree_split_worker);
> > +	/*
> > +	 * Update lockdep's ownership information to reflect transfer of the
> > +	 * ilock from the current task to the worker. Otherwise assertions that
> > +	 * the lock is held (such as when logging the inode) might fail due to
> > +	 * incorrect task owner state.
> > +	 */
> > +	rwsem_release(&cur->bc_ino.ip->i_lock.dep_map, _THIS_IP_);
> >  	queue_work(xfs_alloc_wq, &args.work);
> >  	wait_for_completion(&done);
> > +	/*
> > +	 * Tranfer lock ownership back to the thread.
> > +	 */
> > +	rwsem_acquire(&cur->bc_ino.ip->i_lock.dep_map, 0, 0, _RET_IP_);
> 
> So I ran all this through fstests.  On generic/324 on a fstests run with
> reasonable recent xfsprogs and all the features turned on (rmap in
> particular) I see the following lockdep report:

Why, exactly do we need to transfer the ownership of the inode lock
here? 

The code that runs in the workqueue does not attempt to lock the
inode in any way, nor should it be trying to assert that the inode
is locked because it's just doing btree work that -requires- the
inode to be locked long before we get to these layers. If anything
deep in the BMBT split code is trying to assert that the inode is
locked, just remove those assertions as we've already run these
asserts long before we hand this work off to another thread.

[....]

> Just like in the first scenario, we hold two ILOCKs -- a directory, and
> a file that we're removing from the directory.  The removal triggers the
> same bmbt split worker because we're shrinking the directory, and upon
> its completion we call rwsem_acquire to reset the lockdep maps.
> 
> Unfortunately, in this case, we actually /are/ feeding the correct
> subclass information to rwsem_acquire.  This time it's pointing out what
> it thinks is an inconsistency in our locking order: the first time we
> locked the directory and then the regular file inode, but now we hold
> the regular file inode and we're asking it for the directory ILOCK.
> 
> (Note that we don't actually deadlock here because pid 25035 has
> maintained ownership of the directory ILOCK rwsem this whole time, but
> lockdep doesn't know that.)
> 
> A crappy way to bypass this problem is the following garbage patch
> which disables lockdep chain checking since we never actually dropped
> any of the ILOCKs that are being complained about.  Messing with low
> level lockdep internals seems sketchy to me, but so it goes.
> 
> The patch also has the major flaw that it doesn't recapture the subclass
> information, but doing that is left as an exercise to the reader. ;)

lockdep is a PITA when it comes to using semaphores as they were
intended because lockdep is task context centric and semaphores are
data object centric. i.e. with semaphores, the data object owns the
lock, not the task context, and that creates a horrible impedence
mismatch between the semaphore locking and lockdep tracking models.
IOWs, lockdep has major issues with this, so the two options here are:

	1. get rid of lockdep checks in the paths where we pass
	locked semaphores to other task contexts; or

	2. add lockdep checks for "lockdep_assert_held_non_owner()"
	where it just checks that the lock is held read/write but
	without checking the owner matches.

#2 matches how our existing inode locking assert checks work - we
don't care who locked the inode, just that it is locked in the
correct mode.

This would get rid of the need to pass rwsem lockdep contexts
between tasks. That, in turn, gets rid of all the nesting and
annotation problems that this attmept to pass lockdep contexts to
non-owner tasks introduces....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-11-05 22:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-02 19:41 [PATCH v13 0/4] xfs: Remove wrappers for some semaphores Pavel Reichl
2020-11-02 19:41 ` [PATCH v13 1/4] xfs: Refactor xfs_isilocked() Pavel Reichl
2020-11-02 19:41 ` [PATCH v13 2/4] xfs: clean up whitespace in xfs_isilocked() calls Pavel Reichl
2020-11-02 19:41 ` [PATCH v13 3/4] xfs: xfs_isilocked() can only check a single lock type Pavel Reichl
2020-11-02 19:41 ` [PATCH v13 4/4] xfs: replace mrlock_t with rw_semaphores Pavel Reichl
2020-11-03 19:41   ` Darrick J. Wong
2020-11-04  0:53   ` Darrick J. Wong
2020-11-05 22:08     ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.