* [PATCH 0/3] serialize unmount against new buffer I/O
@ 2016-07-13 16:16 Brian Foster
  2016-07-13 16:16 ` [PATCH 1/3] xfs: helper to set flags on uncached buffer reads Brian Foster
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Brian Foster @ 2016-07-13 16:16 UTC (permalink / raw)
  To: xfs

This is a rework of the directory readahead panic fix based on a buffer
I/O count mechanism. See [1] for the original patch and subsequent
discussion leading to the solution as constituted.

This is split into multiple patches simply to break out some refactoring
and the exclusion of special buffers from the accounting mechanism. The
series prevents the original problem (xfs/311) and passes an initial
xfstests run.

Brian

[1] http://oss.sgi.com/pipermail/xfs/2016-June/049946.html

Brian Foster (3):
  xfs: helper to set flags on uncached buffer reads
  xfs: exclude never-released buffers from buftarg I/O accounting
  xfs: track and serialize in-flight async buffers against unmount

 fs/xfs/xfs_buf.c   | 177 ++++++++++++++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_buf.h   |   8 +++
 fs/xfs/xfs_log.c   |   4 +-
 fs/xfs/xfs_mount.c |  12 ++--
 4 files changed, 151 insertions(+), 50 deletions(-)

-- 
2.5.5

* [PATCH 1/3] xfs: helper to set flags on uncached buffer reads
  2016-07-13 16:16 [PATCH 0/3] serialize unmount against new buffer I/O Brian Foster
@ 2016-07-13 16:16 ` Brian Foster
  2016-07-14  0:01   ` Dave Chinner
  2016-07-13 16:16 ` [PATCH 2/3] xfs: exclude never-released buffers from buftarg I/O accounting Brian Foster
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Brian Foster @ 2016-07-13 16:16 UTC (permalink / raw)
  To: xfs

xfs_buf_read_uncached() allocates an uncached buffer and performs a read
in one go. As part of the upcoming buftarg I/O accounting mechanism,
some sites may need to set flags on a buffer before I/O submission.

Create a new helper to support the ability to set flags on a buffer
before it is submitted for I/O. This use case is the exception, so
create a wrapper for the original xfs_buf_read_uncached().

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf.c | 20 +++++++++++++++++---
 fs/xfs/xfs_buf.h |  4 ++++
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 4665ff6..f007713 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -693,13 +693,14 @@ xfs_buf_readahead_map(
  * buffer containing the disk contents or nothing.
  */
 int
-xfs_buf_read_uncached(
+xfs_buf_read_uncached_flags(
 	struct xfs_buftarg	*target,
 	xfs_daddr_t		daddr,
 	size_t			numblks,
 	int			flags,
 	struct xfs_buf		**bpp,
-	const struct xfs_buf_ops *ops)
+	const struct xfs_buf_ops *ops,
+	int			bflags)
 {
 	struct xfs_buf		*bp;
 
@@ -713,7 +714,7 @@ xfs_buf_read_uncached(
 	ASSERT(bp->b_map_count == 1);
 	bp->b_bn = XFS_BUF_DADDR_NULL;  /* always null for uncached buffers */
 	bp->b_maps[0].bm_bn = daddr;
-	bp->b_flags |= XBF_READ;
+	bp->b_flags |= XBF_READ | bflags;
 	bp->b_ops = ops;
 
 	xfs_buf_submit_wait(bp);
@@ -727,6 +728,19 @@ xfs_buf_read_uncached(
 	return 0;
 }
 
+int
+xfs_buf_read_uncached(
+	struct xfs_buftarg	*target,
+	xfs_daddr_t		daddr,
+	size_t			numblks,
+	int			flags,
+	struct xfs_buf		**bpp,
+	const struct xfs_buf_ops *ops)
+{
+	return xfs_buf_read_uncached_flags(target, daddr, numblks, flags, bpp,
+					   ops, 0);
+}
+
 /*
  * Return a buffer allocated as an empty buffer and associated to external
  * memory via xfs_buf_associate_memory() back to it's empty state.
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 8bfb974..a3c7ba4 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -295,6 +295,10 @@ struct xfs_buf *xfs_buf_get_uncached(struct xfs_buftarg *target, size_t numblks,
 int xfs_buf_read_uncached(struct xfs_buftarg *target, xfs_daddr_t daddr,
 			  size_t numblks, int flags, struct xfs_buf **bpp,
 			  const struct xfs_buf_ops *ops);
+int xfs_buf_read_uncached_flags(struct xfs_buftarg *target, xfs_daddr_t daddr,
+				size_t numblks, int flags, struct xfs_buf **bpp,
+				const struct xfs_buf_ops *ops, int bflags);
+
 void xfs_buf_hold(struct xfs_buf *bp);
 
 /* Releasing Buffers */
-- 
2.5.5

* [PATCH 2/3] xfs: exclude never-released buffers from buftarg I/O accounting
  2016-07-13 16:16 [PATCH 0/3] serialize unmount against new buffer I/O Brian Foster
  2016-07-13 16:16 ` [PATCH 1/3] xfs: helper to set flags on uncached buffer reads Brian Foster
@ 2016-07-13 16:16 ` Brian Foster
  2016-07-13 16:16 ` [PATCH 3/3] xfs: track and serialize in-flight async buffers against unmount Brian Foster
  2016-07-14 17:29 ` [PATCH 0/3] serialize unmount against new buffer I/O Brian Foster
  3 siblings, 0 replies; 8+ messages in thread
From: Brian Foster @ 2016-07-13 16:16 UTC (permalink / raw)
  To: xfs

The upcoming buftarg I/O accounting mechanism maintains a count of all
buffers that have undergone I/O in the current hold-release cycle.
Certain buffers associated with core infrastructure (e.g., the xfs_mount
superblock buffer, log buffers) are never released, however. This means
that accounting I/O submission on such buffers elevates the buftarg
count indefinitely and could lead to lockup on unmount.

Define a new buffer flag to explicitly exclude buffers from buftarg I/O
accounting. Set the flag on the superblock and associated log buffers.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf.h   |  1 +
 fs/xfs/xfs_log.c   |  4 +++-
 fs/xfs/xfs_mount.c | 12 +++++++-----
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index a3c7ba4..9e1cca7 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -43,6 +43,7 @@ typedef enum {
 #define XBF_READ	 (1 << 0) /* buffer intended for reading from device */
 #define XBF_WRITE	 (1 << 1) /* buffer intended for writing to device */
 #define XBF_READ_AHEAD	 (1 << 2) /* asynchronous read-ahead */
+#define XBF_NO_IOACCT	 (1 << 3) /* bypass I/O accounting (non-LRU bufs) */
 #define XBF_ASYNC	 (1 << 4) /* initiator will not wait for completion */
 #define XBF_DONE	 (1 << 5) /* all pages in the buffer uptodate */
 #define XBF_STALE	 (1 << 6) /* buffer has been staled, do not find it */
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 63dad9e..4af0fb0 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1427,7 +1427,8 @@ xlog_alloc_log(
 	ASSERT(xfs_buf_islocked(bp));
 	xfs_buf_unlock(bp);
 
-	/* use high priority wq for log I/O completion */
+	/* disable accounting and use high priority wq for log I/O completion */
+	bp->b_flags |= XBF_NO_IOACCT;
 	bp->b_ioend_wq = mp->m_log_workqueue;
 	bp->b_iodone = xlog_iodone;
 	log->l_xbuf = bp;
@@ -1457,6 +1458,7 @@ xlog_alloc_log(
 						BTOBB(log->l_iclog_size), 0);
 		if (!bp)
 			goto out_free_iclog;
+		bp->b_flags |= XBF_NO_IOACCT;
 
 		ASSERT(xfs_buf_islocked(bp));
 		xfs_buf_unlock(bp);
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index e39b023..a9234a2 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -272,13 +272,15 @@ xfs_readsb(
 	buf_ops = NULL;
 
 	/*
-	 * Allocate a (locked) buffer to hold the superblock.
-	 * This will be kept around at all times to optimize
-	 * access to the superblock.
+	 * Allocate a (locked) buffer to hold the superblock. This will be kept
+	 * around at all times to optimize access to the superblock. Therefore,
+	 * set XBF_NO_IOACCT to make sure it doesn't hold the buftarg count
+	 * elevated.
 	 */
 reread:
-	error = xfs_buf_read_uncached(mp->m_ddev_targp, XFS_SB_DADDR,
-				   BTOBB(sector_size), 0, &bp, buf_ops);
+	error = xfs_buf_read_uncached_flags(mp->m_ddev_targp, XFS_SB_DADDR,
+					    BTOBB(sector_size), 0, &bp, buf_ops,
+					    XBF_NO_IOACCT);
 	if (error) {
 		if (loud)
 			xfs_warn(mp, "SB validate failed with error %d.", error);
-- 
2.5.5

* [PATCH 3/3] xfs: track and serialize in-flight async buffers against unmount
  2016-07-13 16:16 [PATCH 0/3] serialize unmount against new buffer I/O Brian Foster
  2016-07-13 16:16 ` [PATCH 1/3] xfs: helper to set flags on uncached buffer reads Brian Foster
  2016-07-13 16:16 ` [PATCH 2/3] xfs: exclude never-released buffers from buftarg I/O accounting Brian Foster
@ 2016-07-13 16:16 ` Brian Foster
  2016-07-14  0:05   ` Dave Chinner
  2016-07-14 17:29 ` [PATCH 0/3] serialize unmount against new buffer I/O Brian Foster
  3 siblings, 1 reply; 8+ messages in thread
From: Brian Foster @ 2016-07-13 16:16 UTC (permalink / raw)
  To: xfs

Newly allocated XFS metadata buffers are added to the LRU once the hold
count is released, which typically occurs after I/O completion. There is
currently no other mechanism that tracks the existence or I/O state of
a new buffer. Further, readahead I/O tends to be submitted
asynchronously by nature, which means the I/O can remain in flight and
actually complete long after the calling context is gone. This means
that file descriptors or any other holds on the filesystem can be
released, allowing the filesystem to be unmounted while I/O is still in
flight. When I/O completion occurs, core data structures may have been
freed, causing completion to run into invalid memory accesses and likely
to panic.

This problem is reproduced on XFS via directory readahead. A filesystem
is mounted, a directory is opened/closed and the filesystem immediately
unmounted. The open/close cycle triggers a directory readahead that if
delayed long enough, runs buffer I/O completion after the unmount has
completed.
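
For illustration, the whole sequence boils down to roughly the
following userspace sketch (device, mount point and directory name are
hypothetical, error handling is elided, and in practice the readahead
I/O has to be delayed long enough, e.g. on a slow or artificially
delayed device, for the completion to land after the unmount):

#include <fcntl.h>
#include <sys/mount.h>
#include <unistd.h>

int main(void)
{
	int fd;

	mount("/dev/sdb1", "/mnt/test", "xfs", 0, NULL);

	/* open/close of a directory kicks off the async dir readahead */
	fd = open("/mnt/test/somedir", O_RDONLY | O_DIRECTORY);
	close(fd);

	/* unmount immediately; the readahead completion races with this */
	umount("/mnt/test");
	return 0;
}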

To address this problem, add a mechanism to track all in-flight,
asynchronous buffers using per-cpu counters in the buftarg. The buffer
is accounted on the first I/O submission after the current reference is
acquired and unaccounted once the buffer is returned to the LRU or
freed. Update xfs_wait_buftarg() to wait on all in-flight I/O before
walking the LRU list. Once in-flight I/O has completed and the workqueue
has drained, all new buffers should have been released onto the LRU.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf.c | 157 ++++++++++++++++++++++++++++++++++++++++---------------
 fs/xfs/xfs_buf.h |   3 ++
 2 files changed, 119 insertions(+), 41 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index f007713..3708dea 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -80,6 +80,47 @@ xfs_buf_vmap_len(
 }
 
 /*
+ * Bump the I/O in flight count on the buftarg if we haven't yet done so for
+ * this buffer. The count is incremented once per buffer (per hold cycle)
+ * because the corresponding decrement is deferred to buffer release. Buffers
+ * can undergo I/O multiple times in a hold-release cycle and per buffer I/O
+ * tracking adds unnecessary overhead. This is used for synchronization purposes
+ * with unmount (see xfs_wait_buftarg()), so all we really need is a count of
+ * in-flight buffers.
+ *
+ * Buffers that are never released (e.g., superblock, iclog buffers) must set
+ * the XBF_NO_IOACCT flag before I/O submission. Otherwise, the buftarg count
+ * never reaches zero and unmount hangs indefinitely.
+ */
+static inline void
+xfs_buf_ioacct_inc(
+	struct xfs_buf	*bp)
+{
+	if (bp->b_flags & (XBF_NO_IOACCT|_XBF_IN_FLIGHT))
+		return;
+
+	ASSERT(bp->b_flags & XBF_ASYNC);
+	bp->b_flags |= _XBF_IN_FLIGHT;
+	percpu_counter_inc(&bp->b_target->bt_io_count);
+}
+
+/*
+ * Clear the in-flight state on a buffer about to be released to the LRU or
+ * freed and unaccount from the buftarg.
+ */
+static inline void
+xfs_buf_ioacct_dec(
+	struct xfs_buf	*bp)
+{
+	if (!(bp->b_flags & _XBF_IN_FLIGHT))
+		return;
+
+	ASSERT(bp->b_flags & XBF_ASYNC);
+	bp->b_flags &= ~_XBF_IN_FLIGHT;
+	percpu_counter_dec(&bp->b_target->bt_io_count);
+}
+
+/*
  * When we mark a buffer stale, we remove the buffer from the LRU and clear the
  * b_lru_ref count so that the buffer is freed immediately when the buffer
  * reference count falls to zero. If the buffer is already on the LRU, we need
@@ -880,63 +921,85 @@ xfs_buf_hold(
 }
 
 /*
- *	Releases a hold on the specified buffer.  If the
- *	the hold count is 1, calls xfs_buf_free.
+ * Release a hold on the specified buffer. If the hold count is 1, the buffer is
+ * placed on the LRU or freed (depending on b_lru_ref).
  */
 void
 xfs_buf_rele(
 	xfs_buf_t		*bp)
 {
 	struct xfs_perag	*pag = bp->b_pag;
+	bool			release;
+	bool			freebuf = false;
 
 	trace_xfs_buf_rele(bp, _RET_IP_);
 
 	if (!pag) {
 		ASSERT(list_empty(&bp->b_lru));
 		ASSERT(RB_EMPTY_NODE(&bp->b_rbnode));
-		if (atomic_dec_and_test(&bp->b_hold))
+		if (atomic_dec_and_test(&bp->b_hold)) {
+			xfs_buf_ioacct_dec(bp);
 			xfs_buf_free(bp);
+		}
 		return;
 	}
 
 	ASSERT(!RB_EMPTY_NODE(&bp->b_rbnode));
 
 	ASSERT(atomic_read(&bp->b_hold) > 0);
-	if (atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock)) {
-		spin_lock(&bp->b_lock);
-		if (!(bp->b_flags & XBF_STALE) && atomic_read(&bp->b_lru_ref)) {
-			/*
-			 * If the buffer is added to the LRU take a new
-			 * reference to the buffer for the LRU and clear the
-			 * (now stale) dispose list state flag
-			 */
-			if (list_lru_add(&bp->b_target->bt_lru, &bp->b_lru)) {
-				bp->b_state &= ~XFS_BSTATE_DISPOSE;
-				atomic_inc(&bp->b_hold);
-			}
-			spin_unlock(&bp->b_lock);
-			spin_unlock(&pag->pag_buf_lock);
-		} else {
-			/*
-			 * most of the time buffers will already be removed from
-			 * the LRU, so optimise that case by checking for the
-			 * XFS_BSTATE_DISPOSE flag indicating the last list the
-			 * buffer was on was the disposal list
-			 */
-			if (!(bp->b_state & XFS_BSTATE_DISPOSE)) {
-				list_lru_del(&bp->b_target->bt_lru, &bp->b_lru);
-			} else {
-				ASSERT(list_empty(&bp->b_lru));
-			}
-			spin_unlock(&bp->b_lock);
 
-			ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));
-			rb_erase(&bp->b_rbnode, &pag->pag_buf_tree);
-			spin_unlock(&pag->pag_buf_lock);
-			xfs_perag_put(pag);
-			xfs_buf_free(bp);
+	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
+	spin_lock(&bp->b_lock);
+	if (!release) {
+		/*
+		 * Drop the in-flight state if the buffer is already on the LRU
+		 * and it holds the only reference. This is racy because we
+		 * haven't acquired the pag lock, but the use of _XBF_IN_FLIGHT
+		 * ensures the decrement occurs only once per-buf.
+		 */
+		if ((atomic_read(&bp->b_hold) == 1) && !list_empty(&bp->b_lru))
+			xfs_buf_ioacct_dec(bp);
+		goto out_unlock;
+	}
+
+	/* the last reference has been dropped ... */
+	xfs_buf_ioacct_dec(bp);
+	if (!(bp->b_flags & XBF_STALE) && atomic_read(&bp->b_lru_ref)) {
+		/*
+		 * If the buffer is added to the LRU take a new reference to the
+		 * buffer for the LRU and clear the (now stale) dispose list
+		 * state flag
+		 */
+		if (list_lru_add(&bp->b_target->bt_lru, &bp->b_lru)) {
+			bp->b_state &= ~XFS_BSTATE_DISPOSE;
+			atomic_inc(&bp->b_hold);
 		}
+		spin_unlock(&pag->pag_buf_lock);
+	} else {
+		/*
+		 * most of the time buffers will already be removed from the
+		 * LRU, so optimise that case by checking for the
+		 * XFS_BSTATE_DISPOSE flag indicating the last list the buffer
+		 * was on was the disposal list
+		 */
+		if (!(bp->b_state & XFS_BSTATE_DISPOSE)) {
+			list_lru_del(&bp->b_target->bt_lru, &bp->b_lru);
+		} else {
+			ASSERT(list_empty(&bp->b_lru));
+		}
+
+		ASSERT(!(bp->b_flags & _XBF_DELWRI_Q));
+		rb_erase(&bp->b_rbnode, &pag->pag_buf_tree);
+		spin_unlock(&pag->pag_buf_lock);
+		xfs_perag_put(pag);
+		freebuf = true;
 	}
+
+out_unlock:
+	spin_unlock(&bp->b_lock);
+
+	if (freebuf)
+		xfs_buf_free(bp);
 }
 
 
@@ -1355,6 +1418,7 @@ xfs_buf_submit(
 	 * xfs_buf_ioend too early.
 	 */
 	atomic_set(&bp->b_io_remaining, 1);
+	xfs_buf_ioacct_inc(bp);
 	_xfs_buf_ioapply(bp);
 
 	/*
@@ -1540,13 +1604,19 @@ xfs_wait_buftarg(
 	int loop = 0;
 
 	/*
-	 * We need to flush the buffer workqueue to ensure that all IO
-	 * completion processing is 100% done. Just waiting on buffer locks is
-	 * not sufficient for async IO as the reference count held over IO is
-	 * not released until after the buffer lock is dropped. Hence we need to
-	 * ensure here that all reference counts have been dropped before we
-	 * start walking the LRU list.
+	 * First wait on the buftarg I/O count for all in-flight buffers to be
+	 * released. This is critical as new buffers do not make the LRU until
+	 * they are released.
+	 *
+	 * Next, flush the buffer workqueue to ensure all completion processing
+	 * has finished. Just waiting on buffer locks is not sufficient for
+	 * async IO as the reference count held over IO is not released until
+	 * after the buffer lock is dropped. Hence we need to ensure here that
+	 * all reference counts have been dropped before we start walking the
+	 * LRU list.
 	 */
+	while (percpu_counter_sum(&btp->bt_io_count))
+		delay(100);
 	drain_workqueue(btp->bt_mount->m_buf_workqueue);
 
 	/* loop until there is nothing left on the lru list. */
@@ -1643,6 +1713,8 @@ xfs_free_buftarg(
 	struct xfs_buftarg	*btp)
 {
 	unregister_shrinker(&btp->bt_shrinker);
+	ASSERT(percpu_counter_sum(&btp->bt_io_count) == 0);
+	percpu_counter_destroy(&btp->bt_io_count);
 	list_lru_destroy(&btp->bt_lru);
 
 	if (mp->m_flags & XFS_MOUNT_BARRIER)
@@ -1707,6 +1779,9 @@ xfs_alloc_buftarg(
 	if (list_lru_init(&btp->bt_lru))
 		goto error;
 
+	if (percpu_counter_init(&btp->bt_io_count, 0, GFP_KERNEL))
+		goto error;
+
 	btp->bt_shrinker.count_objects = xfs_buftarg_shrink_count;
 	btp->bt_shrinker.scan_objects = xfs_buftarg_shrink_scan;
 	btp->bt_shrinker.seeks = DEFAULT_SEEKS;
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 9e1cca7..dbe4a74 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -63,6 +63,7 @@ typedef enum {
 #define _XBF_KMEM	 (1 << 21)/* backed by heap memory */
 #define _XBF_DELWRI_Q	 (1 << 22)/* buffer on a delwri queue */
 #define _XBF_COMPOUND	 (1 << 23)/* compound buffer */
+#define _XBF_IN_FLIGHT	 (1 << 25) /* I/O in flight, for accounting purposes */
 
 typedef unsigned int xfs_buf_flags_t;
 
@@ -116,6 +117,8 @@ typedef struct xfs_buftarg {
 	/* LRU control structures */
 	struct shrinker		bt_shrinker;
 	struct list_lru		bt_lru;
+
+	struct percpu_counter	bt_io_count;
 } xfs_buftarg_t;
 
 struct xfs_buf;
-- 
2.5.5

* Re: [PATCH 1/3] xfs: helper to set flags on uncached buffer reads
  2016-07-13 16:16 ` [PATCH 1/3] xfs: helper to set flags on uncached buffer reads Brian Foster
@ 2016-07-14  0:01   ` Dave Chinner
  2016-07-14 10:52     ` Brian Foster
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2016-07-14  0:01 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, Jul 13, 2016 at 12:16:33PM -0400, Brian Foster wrote:
> xfs_buf_read_uncached() allocates an uncached buffer and performs a read
> in one go. As part of the upcoming buftarg I/O accounting mechanism,
> some sites may need to set flags on a buffer before I/O submission.
> 
> Create a new helper to support the ability to set flags on a buffer
> before it is submitted for I/O. This use case is the exception, so
> create a wrapper for the original xfs_buf_read_uncached().
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_buf.c | 20 +++++++++++++++++---
>  fs/xfs/xfs_buf.h |  4 ++++
>  2 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 4665ff6..f007713 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -693,13 +693,14 @@ xfs_buf_readahead_map(
>   * buffer containing the disk contents or nothing.
>   */
>  int
> -xfs_buf_read_uncached(
> +xfs_buf_read_uncached_flags(
>  	struct xfs_buftarg	*target,
>  	xfs_daddr_t		daddr,
>  	size_t			numblks,
>  	int			flags,
>  	struct xfs_buf		**bpp,
> -	const struct xfs_buf_ops *ops)
> +	const struct xfs_buf_ops *ops,
> +	int			bflags)
>  {

We already have a flags field being passed in. Why can't that be
used to pass the XBF_NO_IOACCT flag? i.e:

>  	ASSERT(bp->b_map_count == 1);
>  	bp->b_bn = XFS_BUF_DADDR_NULL;  /* always null for uncached buffers */
>  	bp->b_maps[0].bm_bn = daddr;
> -	bp->b_flags |= XBF_READ;
> +	bp->b_flags |= XBF_READ | bflags;

	bp->b_flags |= XBF_READ;
	bp->b_flags |= (flags & XBF_NO_IOACCT);

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 3/3] xfs: track and serialize in-flight async buffers against unmount
  2016-07-13 16:16 ` [PATCH 3/3] xfs: track and serialize in-flight async buffers against unmount Brian Foster
@ 2016-07-14  0:05   ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2016-07-14  0:05 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, Jul 13, 2016 at 12:16:35PM -0400, Brian Foster wrote:
> Newly allocated XFS metadata buffers are added to the LRU once the hold
> count is released, which typically occurs after I/O completion. There is
> currently no other mechanism that tracks the existence or I/O state of
> a new buffer. Further, readahead I/O tends to be submitted
> asynchronously by nature, which means the I/O can remain in flight and
> actually complete long after the calling context is gone. This means
> that file descriptors or any other holds on the filesystem can be
> released, allowing the filesystem to be unmounted while I/O is still in
> flight. When I/O completion occurs, core data structures may have been
> freed, causing completion to run into invalid memory accesses and likely
> to panic.
> 
> This problem is reproduced on XFS via directory readahead. A filesystem
> is mounted, a directory is opened/closed and the filesystem immediately
> unmounted. The open/close cycle triggers a directory readahead that if
> delayed long enough, runs buffer I/O completion after the unmount has
> completed.
> 
> To address this problem, add a mechanism to track all in-flight,
> asynchronous buffers using per-cpu counters in the buftarg. The buffer
> is accounted on the first I/O submission after the current reference is
> acquired and unaccounted once the buffer is returned to the LRU or
> freed. Update xfs_wait_buftarg() to wait on all in-flight I/O before
> walking the LRU list. Once in-flight I/O has completed and the workqueue
> has drained, all new buffers should have been released onto the LRU.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Yup, looks good now. I'll pull it into my test tree...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 1/3] xfs: helper to set flags on uncached buffer reads
  2016-07-14  0:01   ` Dave Chinner
@ 2016-07-14 10:52     ` Brian Foster
  0 siblings, 0 replies; 8+ messages in thread
From: Brian Foster @ 2016-07-14 10:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Jul 14, 2016 at 10:01:04AM +1000, Dave Chinner wrote:
> On Wed, Jul 13, 2016 at 12:16:33PM -0400, Brian Foster wrote:
> > xfs_buf_read_uncached() allocates an uncached buffer and performs a read
> > in one go. As part of the upcoming buftarg I/O accounting mechanism,
> > some sites may need to set flags on a buffer before I/O submission.
> > 
> > Create a new helper to support the ability to set flags on a buffer
> > before it is submitted for I/O. This use case is the exception, so
> > create a wrapper for the original xfs_buf_read_uncached().
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/xfs_buf.c | 20 +++++++++++++++++---
> >  fs/xfs/xfs_buf.h |  4 ++++
> >  2 files changed, 21 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 4665ff6..f007713 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -693,13 +693,14 @@ xfs_buf_readahead_map(
> >   * buffer containing the disk contents or nothing.
> >   */
> >  int
> > -xfs_buf_read_uncached(
> > +xfs_buf_read_uncached_flags(
> >  	struct xfs_buftarg	*target,
> >  	xfs_daddr_t		daddr,
> >  	size_t			numblks,
> >  	int			flags,
> >  	struct xfs_buf		**bpp,
> > -	const struct xfs_buf_ops *ops)
> > +	const struct xfs_buf_ops *ops,
> > +	int			bflags)
> >  {
> 
> We already have a flags field being passed in. Why can't that be
> used to pass the XBF_NO_IOACCT flag? i.e:
> 

Oops, I read over that too quickly, as I saw it being passed into
alloc_page() and assumed it was for memory allocation flags. It is in
fact for buffer flags; we just use xb_to_gfp() to select GFP_* flags
based on the buffer flags. I'll drop this patch and fold the
xfs_buf_read_uncached() fixup into patch 2. Thanks.
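
Roughly, I'd expect the folded-up xfs_readsb() call site to end up
looking something like the sketch below (assuming the existing flags
argument simply carries XBF_NO_IOACCT through to b_flags as you
suggest; not the final code):

reread:
	error = xfs_buf_read_uncached(mp->m_ddev_targp, XFS_SB_DADDR,
				      BTOBB(sector_size), XBF_NO_IOACCT,
				      &bp, buf_ops);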

Brian

> >  	ASSERT(bp->b_map_count == 1);
> >  	bp->b_bn = XFS_BUF_DADDR_NULL;  /* always null for uncached buffers */
> >  	bp->b_maps[0].bm_bn = daddr;
> > -	bp->b_flags |= XBF_READ;
> > +	bp->b_flags |= XBF_READ | bflags;
> 
> 	bp->b_flags |= XBF_READ;
> 	bp->b_flags |= (flags & XBF_NO_IOACCT);
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

* Re: [PATCH 0/3] serialize unmount against new buffer I/O
  2016-07-13 16:16 [PATCH 0/3] serialize unmount against new buffer I/O Brian Foster
                   ` (2 preceding siblings ...)
  2016-07-13 16:16 ` [PATCH 3/3] xfs: track and serialize in-flight async buffers against unmount Brian Foster
@ 2016-07-14 17:29 ` Brian Foster
  3 siblings, 0 replies; 8+ messages in thread
From: Brian Foster @ 2016-07-14 17:29 UTC (permalink / raw)
  To: xfs

On Wed, Jul 13, 2016 at 12:16:32PM -0400, Brian Foster wrote:
> This is a rework of the directory readahead panic fix based on a buffer
> I/O count mechanism. See [1] for the original patch and subsequent
> discussion leading to the solution as constituted.
> 
> This is split into multiple patches simply to break out some refactoring
> and the exclusion of special buffers from the accounting mechanism. The
> series prevents the original problem (xfs/311) and passes an initial
> xfstests run.
> 

FYI, I've hit an unmount hang with this while running generic/270. It
takes a few iterations to hit. I'm not sure if it's a race issue or more
of an explicit accounting issue, but I'll hold off on v2 until I can
track it down...

Brian

> Brian
> 
> [1] http://oss.sgi.com/pipermail/xfs/2016-June/049946.html
> 
> Brian Foster (3):
>   xfs: helper to set flags on uncached buffer reads
>   xfs: exclude never-released buffers from buftarg I/O accounting
>   xfs: track and serialize in-flight async buffers against unmount
> 
>  fs/xfs/xfs_buf.c   | 177 ++++++++++++++++++++++++++++++++++++++++-------------
>  fs/xfs/xfs_buf.h   |   8 +++
>  fs/xfs/xfs_log.c   |   4 +-
>  fs/xfs/xfs_mount.c |  12 ++--
>  4 files changed, 151 insertions(+), 50 deletions(-)
> 
> -- 
> 2.5.5
> 