* use bios directly in the log code
@ 2019-05-20 16:13 Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 01/17] xfs: remove the no-op spinlock_destroy stub Christoph Hellwig
                   ` (16 more replies)
  0 siblings, 17 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

Hi all,

this series switches the log writing and log recovery code to use bios
directly, and removes various special cases from the buffer cache code.
Note that I have developed it on top of the previous series of log item
related cleanups, so if you don't have that applied there is a small
conflict.  To make life easier I have pushed out a git branch here:

    git://git.infradead.org/users/hch/xfs.git xfs-log-bio

Gitweb:

    http://git.infradead.org/users/hch/xfs.git/shortlog/refs/heads/xfs-log-bio
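
If you want to try it out locally, fetching the branch should look
roughly like this (the "hch-xfs" remote name is just a placeholder):

    git remote add hch-xfs git://git.infradead.org/users/hch/xfs.git
    git fetch hch-xfs
    git checkout -b xfs-log-bio hch-xfs/xfs-log-bio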


* [PATCH 01/17] xfs: remove the no-op spinlock_destroy stub
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 02/17] xfs: remove the never used _XBF_COMPOUND flag Christoph Hellwig
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_linux.h | 2 --
 fs/xfs/xfs_log.c   | 2 --
 2 files changed, 4 deletions(-)

diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index edbd5a210df2..b78287666309 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -110,8 +110,6 @@ typedef __u32			xfs_nlink_t;
 #define current_restore_flags_nested(sp, f)	\
 		(current->flags = ((current->flags & ~(f)) | (*(sp) & (f))))
 
-#define spinlock_destroy(lock)
-
 #define NBBY		8		/* number of bits per byte */
 
 /*
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 0d6fb374dbe8..242ab5e8aaea 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1551,7 +1551,6 @@ xlog_alloc_log(
 			xfs_buf_free(iclog->ic_bp);
 		kmem_free(iclog);
 	}
-	spinlock_destroy(&log->l_icloglock);
 	xfs_buf_free(log->l_xbuf);
 out_free_log:
 	kmem_free(log);
@@ -1998,7 +1997,6 @@ xlog_dealloc_log(
 		kmem_free(iclog);
 		iclog = next_iclog;
 	}
-	spinlock_destroy(&log->l_icloglock);
 
 	log->l_mp->m_log = NULL;
 	kmem_free(log);
-- 
2.20.1


* [PATCH 02/17] xfs: remove the never used _XBF_COMPOUND flag
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 01/17] xfs: remove the no-op spinlock_destroy stub Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 03/17] xfs: renumber XBF_WRITE_FAIL Christoph Hellwig
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 1701efee4fd4..f67024c1deaa 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -43,7 +43,6 @@
 #define _XBF_PAGES	 (1 << 20)/* backed by refcounted pages */
 #define _XBF_KMEM	 (1 << 21)/* backed by heap memory */
 #define _XBF_DELWRI_Q	 (1 << 22)/* buffer on a delwri queue */
-#define _XBF_COMPOUND	 (1 << 23)/* compound buffer */
 
 typedef unsigned int xfs_buf_flags_t;
 
@@ -63,8 +62,7 @@ typedef unsigned int xfs_buf_flags_t;
 	{ XBF_UNMAPPED,		"UNMAPPED" },	/* ditto */\
 	{ _XBF_PAGES,		"PAGES" }, \
 	{ _XBF_KMEM,		"KMEM" }, \
-	{ _XBF_DELWRI_Q,	"DELWRI_Q" }, \
-	{ _XBF_COMPOUND,	"COMPOUND" }
+	{ _XBF_DELWRI_Q,	"DELWRI_Q" }
 
 
 /*
-- 
2.20.1


* [PATCH 03/17] xfs: renumber XBF_WRITE_FAIL
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 01/17] xfs: remove the no-op spinlock_destroy stub Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 02/17] xfs: remove the never used _XBF_COMPOUND flag Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 04/17] xfs: reformat xlog_get_lowest_lsn Christoph Hellwig
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

Assigning a numerical value that is not close to the flags
defined nearby is just asking for conflicts later on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index f67024c1deaa..61691d9a5bc9 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -28,7 +28,7 @@
 #define XBF_ASYNC	 (1 << 4) /* initiator will not wait for completion */
 #define XBF_DONE	 (1 << 5) /* all pages in the buffer uptodate */
 #define XBF_STALE	 (1 << 6) /* buffer has been staled, do not find it */
-#define XBF_WRITE_FAIL	 (1 << 24)/* async writes have failed on this buffer */
+#define XBF_WRITE_FAIL	 (1 << 7) /* async writes have failed on this buffer */
 
 /* I/O hints for the BIO layer */
 #define XBF_SYNCIO	 (1 << 10)/* treat this buffer as synchronous I/O */
-- 
2.20.1


* [PATCH 04/17] xfs: reformat xlog_get_lowest_lsn
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
                   ` (2 preceding siblings ...)
  2019-05-20 16:13 ` [PATCH 03/17] xfs: renumber XBF_WRITE_FAIL Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 05/17] xfs: don't use REQ_PREFLUSH for split log writes Christoph Hellwig
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

Reformat xlog_get_lowest_lsn to our usual style.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 242ab5e8aaea..e7c9e1a653a8 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2642,27 +2642,23 @@ xlog_state_clean_log(
 
 STATIC xfs_lsn_t
 xlog_get_lowest_lsn(
-	struct xlog	*log)
+	struct xlog		*log)
 {
-	xlog_in_core_t  *lsn_log;
-	xfs_lsn_t	lowest_lsn, lsn;
+	struct xlog_in_core	*iclog = log->l_iclog;
+	xfs_lsn_t		lowest_lsn = 0, lsn;
 
-	lsn_log = log->l_iclog;
-	lowest_lsn = 0;
 	do {
-	    if (!(lsn_log->ic_state & (XLOG_STATE_ACTIVE|XLOG_STATE_DIRTY))) {
-		lsn = be64_to_cpu(lsn_log->ic_header.h_lsn);
-		if ((lsn && !lowest_lsn) ||
-		    (XFS_LSN_CMP(lsn, lowest_lsn) < 0)) {
+		if (iclog->ic_state & (XLOG_STATE_ACTIVE | XLOG_STATE_DIRTY))
+			continue;
+
+		lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+		if ((lsn && !lowest_lsn) || XFS_LSN_CMP(lsn, lowest_lsn) < 0)
 			lowest_lsn = lsn;
-		}
-	    }
-	    lsn_log = lsn_log->ic_next;
-	} while (lsn_log != log->l_iclog);
+	} while ((iclog = iclog->ic_next) != log->l_iclog);
+
 	return lowest_lsn;
 }
 
-
 STATIC void
 xlog_state_do_callback(
 	struct xlog		*log,
-- 
2.20.1


* [PATCH 05/17] xfs: don't use REQ_PREFLUSH for split log writes
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
                   ` (3 preceding siblings ...)
  2019-05-20 16:13 ` [PATCH 04/17] xfs: reformat xlog_get_lowest_lsn Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 06/17] xfs: factor out log buffer writing Christoph Hellwig
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

If we have to split a log write because it wraps the end of the log, we
can't just use REQ_PREFLUSH to flush before the first log write,
as the writes might get reordered somewhere in the I/O stack.  Issue
a manual flush in that case so that the ordering of the two log I/Os
doesn't matter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index e7c9e1a653a8..1374f5d6c372 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1912,7 +1912,7 @@ xlog_sync(
 	 * synchronously here; for an internal log we can simply use the block
 	 * layer state machine for preflushes.
 	 */
-	if (log->l_mp->m_logdev_targp != log->l_mp->m_ddev_targp)
+	if (log->l_mp->m_logdev_targp != log->l_mp->m_ddev_targp || split)
 		xfs_blkdev_issue_flush(log->l_mp->m_ddev_targp);
 	else
 		bp->b_flags |= XBF_FLUSH;
-- 
2.20.1


* [PATCH 06/17] xfs: factor out log buffer writing
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
                   ` (4 preceding siblings ...)
  2019-05-20 16:13 ` [PATCH 05/17] xfs: don't use REQ_PREFLUSH for split log writes Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 07/17] xfs: factor out splitting of an iclog from xlog_sync Christoph Hellwig
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

Replace the not very useful xlog_bdstrat wrapper with a new version that
takes care of all the common logic for writing log buffers.  Use the
opportunity to avoid overloading the buffer address with the log-relative
address, and to shed the unused return value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log.c | 125 +++++++++++++++++------------------------------
 1 file changed, 45 insertions(+), 80 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 1374f5d6c372..16159532825a 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -45,10 +45,6 @@ STATIC int
 xlog_space_left(
 	struct xlog		*log,
 	atomic64_t		*head);
-STATIC int
-xlog_sync(
-	struct xlog		*log,
-	struct xlog_in_core	*iclog);
 STATIC void
 xlog_dealloc_log(
 	struct xlog		*log);
@@ -1736,28 +1732,34 @@ xlog_cksum(
 	return xfs_end_cksum(crc);
 }
 
-/*
- * The bdstrat callback function for log bufs. This gives us a central
- * place to trap bufs in case we get hit by a log I/O error and need to
- * shutdown. Actually, in practice, even when we didn't get a log error,
- * we transition the iclogs to IOERROR state *after* flushing all existing
- * iclogs to disk. This is because we don't want anymore new transactions to be
- * started or completed afterwards.
- *
- * We lock the iclogbufs here so that we can serialise against IO completion
- * during unmount. We might be processing a shutdown triggered during unmount,
- * and that can occur asynchronously to the unmount thread, and hence we need to
- * ensure that completes before tearing down the iclogbufs. Hence we need to
- * hold the buffer lock across the log IO to acheive that.
- */
-STATIC int
-xlog_bdstrat(
-	struct xfs_buf		*bp)
+STATIC void
+xlog_write_iclog(
+	struct xlog		*log,
+	struct xlog_in_core	*iclog,
+	struct xfs_buf		*bp,
+	uint64_t		bno,
+	bool			flush)
 {
-	struct xlog_in_core	*iclog = bp->b_log_item;
+	ASSERT(bno < log->l_logBBsize);
+	ASSERT(bno + bp->b_io_length <= log->l_logBBsize);
 
+	bp->b_maps[0].bm_bn = log->l_logBBstart + bno;
+	bp->b_log_item = iclog;
+	bp->b_flags &= ~XBF_FLUSH;
+	bp->b_flags |= (XBF_ASYNC | XBF_SYNCIO | XBF_WRITE | XBF_FUA);
+	if (flush)
+		bp->b_flags |= XBF_FLUSH;
+
+	/*
+	 * We lock the iclogbufs here so that we can serialise against I/O
+	 * completion during unmount.  We might be processing a shutdown
+	 * triggered during unmount, and that can occur asynchronously to the
+	 * unmount thread, and hence we need to ensure that completes before
+	 * tearing down the iclogbufs.  Hence we need to hold the buffer lock
+	 * across the log IO to archive that.
+	 */
 	xfs_buf_lock(bp);
-	if (iclog->ic_state & XLOG_STATE_IOERROR) {
+	if (unlikely(iclog->ic_state & XLOG_STATE_IOERROR)) {
 		xfs_buf_ioerror(bp, -EIO);
 		xfs_buf_stale(bp);
 		xfs_buf_ioend(bp);
@@ -1767,11 +1769,10 @@ xlog_bdstrat(
 		 * doing it here. Similarly, IO completion will unlock the
 		 * buffer, so we don't do it here.
 		 */
-		return 0;
+		return;
 	}
 
 	xfs_buf_submit(bp);
-	return 0;
 }
 
 /*
@@ -1794,25 +1795,23 @@ xlog_bdstrat(
  * log will require grabbing the lock though.
  *
  * The entire log manager uses a logical block numbering scheme.  Only
- * log_sync (and then only bwrite()) know about the fact that the log may
- * not start with block zero on a given device.  The log block start offset
- * is added immediately before calling bwrite().
+ * xlog_write_iclog knows about the fact that the log may not start with
+ * block zero on a given device.
  */
-
-STATIC int
+STATIC void
 xlog_sync(
 	struct xlog		*log,
 	struct xlog_in_core	*iclog)
 {
-	xfs_buf_t	*bp;
 	int		i;
 	uint		count;		/* byte count of bwrite */
 	uint		count_init;	/* initial count before roundup */
 	int		roundoff;       /* roundoff to BB or stripe */
 	int		split = 0;	/* split write into two regions */
-	int		error;
 	int		v2 = xfs_sb_version_haslogv2(&log->l_mp->m_sb);
+	uint64_t	bno;
 	int		size;
+	bool		flush = true;
 
 	XFS_STATS_INC(log->l_mp, xs_log_writes);
 	ASSERT(atomic_read(&iclog->ic_refcnt) == 0);
@@ -1848,17 +1847,16 @@ xlog_sync(
 		size += roundoff;
 	iclog->ic_header.h_len = cpu_to_be32(size);
 
-	bp = iclog->ic_bp;
-	XFS_BUF_SET_ADDR(bp, BLOCK_LSN(be64_to_cpu(iclog->ic_header.h_lsn)));
-
 	XFS_STATS_ADD(log->l_mp, xs_log_blocks, BTOBB(count));
 
+	bno = BLOCK_LSN(be64_to_cpu(iclog->ic_header.h_lsn));
+
 	/* Do we need to split this write into 2 parts? */
-	if (XFS_BUF_ADDR(bp) + BTOBB(count) > log->l_logBBsize) {
+	if (bno + BTOBB(count) > log->l_logBBsize) {
 		char		*dptr;
 
-		split = count - (BBTOB(log->l_logBBsize - XFS_BUF_ADDR(bp)));
-		count = BBTOB(log->l_logBBsize - XFS_BUF_ADDR(bp));
+		split = count - (BBTOB(log->l_logBBsize - bno));
+		count = BBTOB(log->l_logBBsize - bno);
 		iclog->ic_bwritecnt = 2;
 
 		/*
@@ -1899,11 +1897,6 @@ xlog_sync(
 			 be64_to_cpu(iclog->ic_header.h_lsn));
 	}
 
-	bp->b_io_length = BTOBB(count);
-	bp->b_log_item = iclog;
-	bp->b_flags &= ~XBF_FLUSH;
-	bp->b_flags |= (XBF_ASYNC | XBF_SYNCIO | XBF_WRITE | XBF_FUA);
-
 	/*
 	 * Flush the data device before flushing the log to make sure all meta
 	 * data written back from the AIL actually made it to disk before
@@ -1912,50 +1905,22 @@ xlog_sync(
 	 * synchronously here; for an internal log we can simply use the block
 	 * layer state machine for preflushes.
 	 */
-	if (log->l_mp->m_logdev_targp != log->l_mp->m_ddev_targp || split)
+	if (log->l_mp->m_logdev_targp != log->l_mp->m_ddev_targp || split) {
 		xfs_blkdev_issue_flush(log->l_mp->m_ddev_targp);
-	else
-		bp->b_flags |= XBF_FLUSH;
+		flush = false;
+	}
 
-	ASSERT(XFS_BUF_ADDR(bp) <= log->l_logBBsize-1);
-	ASSERT(XFS_BUF_ADDR(bp) + BTOBB(count) <= log->l_logBBsize);
+	iclog->ic_bp->b_io_length = BTOBB(count);
 
 	xlog_verify_iclog(log, iclog, count, true);
+	xlog_write_iclog(log, iclog, iclog->ic_bp, bno, flush);
 
-	/* account for log which doesn't start at block #0 */
-	XFS_BUF_SET_ADDR(bp, XFS_BUF_ADDR(bp) + log->l_logBBstart);
-
-	/*
-	 * Don't call xfs_bwrite here. We do log-syncs even when the filesystem
-	 * is shutting down.
-	 */
-	error = xlog_bdstrat(bp);
-	if (error) {
-		xfs_buf_ioerror_alert(bp, "xlog_sync");
-		return error;
-	}
 	if (split) {
-		bp = iclog->ic_log->l_xbuf;
-		XFS_BUF_SET_ADDR(bp, 0);	     /* logical 0 */
-		xfs_buf_associate_memory(bp,
+		xfs_buf_associate_memory(iclog->ic_log->l_xbuf,
 				(char *)&iclog->ic_header + count, split);
-		bp->b_log_item = iclog;
-		bp->b_flags &= ~XBF_FLUSH;
-		bp->b_flags |= (XBF_ASYNC | XBF_SYNCIO | XBF_WRITE | XBF_FUA);
-
-		ASSERT(XFS_BUF_ADDR(bp) <= log->l_logBBsize-1);
-		ASSERT(XFS_BUF_ADDR(bp) + BTOBB(count) <= log->l_logBBsize);
-
-		/* account for internal log which doesn't start at block #0 */
-		XFS_BUF_SET_ADDR(bp, XFS_BUF_ADDR(bp) + log->l_logBBstart);
-		error = xlog_bdstrat(bp);
-		if (error) {
-			xfs_buf_ioerror_alert(bp, "xlog_sync (split)");
-			return error;
-		}
+		xlog_write_iclog(log, iclog, iclog->ic_log->l_xbuf, 0, false);
 	}
-	return 0;
-}	/* xlog_sync */
+}
 
 /*
  * Deallocate a log structure
@@ -3188,7 +3153,7 @@ xlog_state_release_iclog(
 	 * flags after this point.
 	 */
 	if (sync)
-		return xlog_sync(log, iclog);
+		xlog_sync(log, iclog);
 	return 0;
 }	/* xlog_state_release_iclog */
 
-- 
2.20.1


* [PATCH 07/17] xfs: factor out splitting of an iclog from xlog_sync
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
                   ` (5 preceding siblings ...)
  2019-05-20 16:13 ` [PATCH 06/17] xfs: factor out log buffer writing Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 08/17] xfs: split iclog size calculation out of xlog_sync Christoph Hellwig
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

Split out a self-contained chunk of code from xlog_sync that calculates
the split offset for an iclog that wraps the log end and bumps the cycle
numbers for the second half.

Use the chance to bring some sanity to the variables used to track the
split in xlog_sync: instead of modifying the count variable, use split
as the offset of the split point and calculate the sizes and offsets of
the two write buffers from it.
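
As a worked example (with made-up numbers): for a log with
l_logBBsize = 1000 and an iclog write of count = BBTOB(8) bytes starting
at bno = 996, split becomes BBTOB(1000 - 996) = BBTOB(4), so the first
write covers blocks 996-999 and is split bytes long, while the remaining
count - split = BBTOB(4) bytes get their cycle numbers bumped and wrap
around to be written at block 0.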

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log.c | 63 +++++++++++++++++++++++++-----------------------
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 16159532825a..acfa1b1a85c2 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1775,6 +1775,32 @@ xlog_write_iclog(
 	xfs_buf_submit(bp);
 }
 
+/*
+ * Bump the cycle numbers at the start of each block in the part of the iclog
+ * that ends up in the buffer that gets written to the start of the log.
+ *
+ * Watch out for the header magic number case, though.
+ */
+static unsigned int
+xlog_split_iclog(
+	struct xlog		*log,
+	void			*data,
+	uint64_t		bno,
+	unsigned int		count)
+{
+	unsigned int		split_offset = BBTOB(log->l_logBBsize - bno), i;
+
+	for (i = split_offset; i < count; i += BBSIZE) {
+		uint32_t cycle = get_unaligned_be32(data + i);
+
+		if (++cycle == XLOG_HEADER_MAGIC_NUM)
+			cycle++;
+		put_unaligned_be32(cycle, data + i);
+	}
+
+	return split_offset;
+}
+
 /*
  * Flush out the in-core log (iclog) to the on-disk log in an asynchronous 
  * fashion.  Previously, we should have moved the current iclog
@@ -1803,13 +1829,12 @@ xlog_sync(
 	struct xlog		*log,
 	struct xlog_in_core	*iclog)
 {
-	int		i;
 	uint		count;		/* byte count of bwrite */
 	uint		count_init;	/* initial count before roundup */
 	int		roundoff;       /* roundoff to BB or stripe */
-	int		split = 0;	/* split write into two regions */
 	int		v2 = xfs_sb_version_haslogv2(&log->l_mp->m_sb);
 	uint64_t	bno;
+	unsigned int	split = 0;
 	int		size;
 	bool		flush = true;
 
@@ -1852,32 +1877,8 @@ xlog_sync(
 	bno = BLOCK_LSN(be64_to_cpu(iclog->ic_header.h_lsn));
 
 	/* Do we need to split this write into 2 parts? */
-	if (bno + BTOBB(count) > log->l_logBBsize) {
-		char		*dptr;
-
-		split = count - (BBTOB(log->l_logBBsize - bno));
-		count = BBTOB(log->l_logBBsize - bno);
-		iclog->ic_bwritecnt = 2;
-
-		/*
-		 * Bump the cycle numbers at the start of each block in the
-		 * part of the iclog that ends up in the buffer that gets
-		 * written to the start of the log.
-		 *
-		 * Watch out for the header magic number case, though.
-		 */
-		dptr = (char *)&iclog->ic_header + count;
-		for (i = 0; i < split; i += BBSIZE) {
-			uint32_t cycle = be32_to_cpu(*(__be32 *)dptr);
-			if (++cycle == XLOG_HEADER_MAGIC_NUM)
-				cycle++;
-			*(__be32 *)dptr = cpu_to_be32(cycle);
-
-			dptr += BBSIZE;
-		}
-	} else {
-		iclog->ic_bwritecnt = 1;
-	}
+	if (bno + BTOBB(count) > log->l_logBBsize)
+		split = xlog_split_iclog(log, &iclog->ic_header, bno, count);
 
 	/* calculcate the checksum */
 	iclog->ic_header.h_crc = xlog_cksum(log, &iclog->ic_header,
@@ -1910,14 +1911,16 @@ xlog_sync(
 		flush = false;
 	}
 
-	iclog->ic_bp->b_io_length = BTOBB(count);
+	iclog->ic_bp->b_io_length = BTOBB(split ? split : count);
+	iclog->ic_bwritecnt = split ? 2 : 1;
 
 	xlog_verify_iclog(log, iclog, count, true);
 	xlog_write_iclog(log, iclog, iclog->ic_bp, bno, flush);
 
 	if (split) {
 		xfs_buf_associate_memory(iclog->ic_log->l_xbuf,
-				(char *)&iclog->ic_header + count, split);
+				(char *)&iclog->ic_header + split,
+				count - split);
 		xlog_write_iclog(log, iclog, iclog->ic_log->l_xbuf, 0, false);
 	}
 }
-- 
2.20.1


* [PATCH 08/17] xfs: split iclog size calculation out of xlog_sync
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
                   ` (6 preceding siblings ...)
  2019-05-20 16:13 ` [PATCH 07/17] xfs: factor out splitting of an iclog from xlog_sync Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 09/17] xfs: update both state counters together in xlog_sync Christoph Hellwig
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

Split out another self-contained bit of code from xlog_sync.
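
For illustration (hypothetical numbers): on a v2 log with
sb_logsunit = 4096 and count_init = 5000, the write size rounds up to
count = 8192 and roundoff = 3192, which stays below the stripe unit;
without a stripe unit, count_init = 5000 rounds up to the next basic
block boundary, count = 5120, and roundoff = 120 < BBTOB(1).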

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log.c | 64 ++++++++++++++++++++++++++++--------------------
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index acfa1b1a85c2..9950b1bb77a9 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1801,6 +1801,36 @@ xlog_split_iclog(
 	return split_offset;
 }
 
+static int
+xlog_calc_iclog_size(
+	struct xlog		*log,
+	struct xlog_in_core	*iclog,
+	uint32_t		*roundoff)
+{
+	bool			v2 = xfs_sb_version_haslogv2(&log->l_mp->m_sb);
+	uint32_t		count_init, count;
+
+	/* Add for LR header */
+	count_init = log->l_iclog_hsize + iclog->ic_offset;
+
+	/* Round out the log write size */
+	if (v2 && log->l_mp->m_sb.sb_logsunit > 1) {
+		/* we have a v2 stripe unit to use */
+		count = XLOG_LSUNITTOB(log, XLOG_BTOLSUNIT(log, count_init));
+	} else {
+		count = BBTOB(BTOBB(count_init));
+	}
+
+	ASSERT(count >= count_init);
+	*roundoff = count - count_init;
+
+	if (v2 && log->l_mp->m_sb.sb_logsunit > 1)
+		ASSERT(*roundoff < log->l_mp->m_sb.sb_logsunit);
+	else
+		ASSERT(*roundoff < BBTOB(1));
+	return count;
+}
+
 /*
  * Flush out the in-core log (iclog) to the on-disk log in an asynchronous 
  * fashion.  Previously, we should have moved the current iclog
@@ -1829,35 +1859,17 @@ xlog_sync(
 	struct xlog		*log,
 	struct xlog_in_core	*iclog)
 {
-	uint		count;		/* byte count of bwrite */
-	uint		count_init;	/* initial count before roundup */
-	int		roundoff;       /* roundoff to BB or stripe */
-	int		v2 = xfs_sb_version_haslogv2(&log->l_mp->m_sb);
-	uint64_t	bno;
-	unsigned int	split = 0;
-	int		size;
-	bool		flush = true;
+	unsigned int		count;		/* byte count of bwrite */
+	unsigned int		roundoff;       /* roundoff to BB or stripe */
+	uint64_t		bno;
+	unsigned int		split = 0;
+	unsigned int		size;
+	bool			flush = true;
 
 	XFS_STATS_INC(log->l_mp, xs_log_writes);
 	ASSERT(atomic_read(&iclog->ic_refcnt) == 0);
 
-	/* Add for LR header */
-	count_init = log->l_iclog_hsize + iclog->ic_offset;
-
-	/* Round out the log write size */
-	if (v2 && log->l_mp->m_sb.sb_logsunit > 1) {
-		/* we have a v2 stripe unit to use */
-		count = XLOG_LSUNITTOB(log, XLOG_BTOLSUNIT(log, count_init));
-	} else {
-		count = BBTOB(BTOBB(count_init));
-	}
-	roundoff = count - count_init;
-	ASSERT(roundoff >= 0);
-	ASSERT((v2 && log->l_mp->m_sb.sb_logsunit > 1 && 
-                roundoff < log->l_mp->m_sb.sb_logsunit)
-		|| 
-		(log->l_mp->m_sb.sb_logsunit <= 1 && 
-		 roundoff < BBTOB(1)));
+	count = xlog_calc_iclog_size(log, iclog, &roundoff);
 
 	/* move grant heads by roundoff in sync */
 	xlog_grant_add_space(log, &log->l_reserve_head.grant, roundoff);
@@ -1868,7 +1880,7 @@ xlog_sync(
 
 	/* real byte length */
 	size = iclog->ic_offset;
-	if (v2)
+	if (xfs_sb_version_haslogv2(&log->l_mp->m_sb))
 		size += roundoff;
 	iclog->ic_header.h_len = cpu_to_be32(size);
 
-- 
2.20.1


* [PATCH 09/17] xfs: update both state counters together in xlog_sync
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
                   ` (7 preceding siblings ...)
  2019-05-20 16:13 ` [PATCH 08/17] xfs: split iclog size calculation out of xlog_sync Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 10/17] xfs: remove the syncing argument from xlog_verify_iclog Christoph Hellwig
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

Just a small bit of code tidying up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 9950b1bb77a9..c5a78bc9f99b 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1866,7 +1866,6 @@ xlog_sync(
 	unsigned int		size;
 	bool			flush = true;
 
-	XFS_STATS_INC(log->l_mp, xs_log_writes);
 	ASSERT(atomic_read(&iclog->ic_refcnt) == 0);
 
 	count = xlog_calc_iclog_size(log, iclog, &roundoff);
@@ -1884,6 +1883,7 @@ xlog_sync(
 		size += roundoff;
 	iclog->ic_header.h_len = cpu_to_be32(size);
 
+	XFS_STATS_INC(log->l_mp, xs_log_writes);
 	XFS_STATS_ADD(log->l_mp, xs_log_blocks, BTOBB(count));
 
 	bno = BLOCK_LSN(be64_to_cpu(iclog->ic_header.h_lsn));
-- 
2.20.1


* [PATCH 10/17] xfs: remove the syncing argument from xlog_verify_iclog
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
                   ` (8 preceding siblings ...)
  2019-05-20 16:13 ` [PATCH 09/17] xfs: update both state counters together in xlog_sync Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 11/17] xfs: make use of the l_targ field in struct xlog Christoph Hellwig
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

The only caller unconditionally passes true here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index c5a78bc9f99b..6eb0b3a6f0d1 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -100,8 +100,7 @@ STATIC void
 xlog_verify_iclog(
 	struct xlog		*log,
 	struct xlog_in_core	*iclog,
-	int			count,
-	bool                    syncing);
+	int			count);
 STATIC void
 xlog_verify_tail_lsn(
 	struct xlog		*log,
@@ -110,7 +109,7 @@ xlog_verify_tail_lsn(
 #else
 #define xlog_verify_dest_ptr(a,b)
 #define xlog_verify_grant_tail(a)
-#define xlog_verify_iclog(a,b,c,d)
+#define xlog_verify_iclog(a,b,c)
 #define xlog_verify_tail_lsn(a,b,c)
 #endif
 
@@ -1926,7 +1925,7 @@ xlog_sync(
 	iclog->ic_bp->b_io_length = BTOBB(split ? split : count);
 	iclog->ic_bwritecnt = split ? 2 : 1;
 
-	xlog_verify_iclog(log, iclog, count, true);
+	xlog_verify_iclog(log, iclog, count);
 	xlog_write_iclog(log, iclog, iclog->ic_bp, bno, flush);
 
 	if (split) {
@@ -3759,8 +3758,7 @@ STATIC void
 xlog_verify_iclog(
 	struct xlog		*log,
 	struct xlog_in_core	*iclog,
-	int			count,
-	bool                    syncing)
+	int			count)
 {
 	xlog_op_header_t	*ophead;
 	xlog_in_core_t		*icptr;
@@ -3804,7 +3802,7 @@ xlog_verify_iclog(
 		/* clientid is only 1 byte */
 		p = &ophead->oh_clientid;
 		field_offset = p - base_ptr;
-		if (!syncing || (field_offset & 0x1ff)) {
+		if (field_offset & 0x1ff) {
 			clientid = ophead->oh_clientid;
 		} else {
 			idx = BTOBBT((char *)&ophead->oh_clientid - iclog->ic_datap);
@@ -3827,7 +3825,7 @@ xlog_verify_iclog(
 		/* check length */
 		p = &ophead->oh_len;
 		field_offset = p - base_ptr;
-		if (!syncing || (field_offset & 0x1ff)) {
+		if (field_offset & 0x1ff) {
 			op_len = be32_to_cpu(ophead->oh_len);
 		} else {
 			idx = BTOBBT((uintptr_t)&ophead->oh_len -
-- 
2.20.1


* [PATCH 11/17] xfs: make use of the l_targ field in struct xlog
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
                   ` (9 preceding siblings ...)
  2019-05-20 16:13 ` [PATCH 10/17] xfs: remove the syncing argument from xlog_verify_iclog Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 12/17] xfs: use bios directly to write log buffers Christoph Hellwig
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

Use the slightly shorter way to get at the buftarg for the log device
wherever we can in the log and log recovery code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log.c         | 6 +++---
 fs/xfs/xfs_log_recover.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 6eb0b3a6f0d1..42ac2201691c 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -898,7 +898,7 @@ xfs_log_unmount_write(xfs_mount_t *mp)
 	 * Or, if we are doing a forced umount (typically because of IO errors).
 	 */
 	if (mp->m_flags & XFS_MOUNT_NORECOVERY ||
-	    xfs_readonly_buftarg(log->l_mp->m_logdev_targp)) {
+	    xfs_readonly_buftarg(log->l_targ)) {
 		ASSERT(mp->m_flags & XFS_MOUNT_RDONLY);
 		return 0;
 	}
@@ -1452,7 +1452,7 @@ xlog_alloc_log(
 	 * having set it up properly.
 	 */
 	error = -ENOMEM;
-	bp = xfs_buf_alloc(mp->m_logdev_targp, XFS_BUF_DADDR_NULL,
+	bp = xfs_buf_alloc(log->l_targ, XFS_BUF_DADDR_NULL,
 			   BTOBB(log->l_iclog_size), XBF_NO_IOACCT);
 	if (!bp)
 		goto out_free_log;
@@ -1917,7 +1917,7 @@ xlog_sync(
 	 * synchronously here; for an internal log we can simply use the block
 	 * layer state machine for preflushes.
 	 */
-	if (log->l_mp->m_logdev_targp != log->l_mp->m_ddev_targp || split) {
+	if (log->l_targ != log->l_mp->m_ddev_targp || split) {
 		xfs_blkdev_issue_flush(log->l_mp->m_ddev_targp);
 		flush = false;
 	}
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 76023ea49356..39e4f774d355 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -134,7 +134,7 @@ xlog_get_bp(
 		nbblks += log->l_sectBBsize;
 	nbblks = round_up(nbblks, log->l_sectBBsize);
 
-	bp = xfs_buf_get_uncached(log->l_mp->m_logdev_targp, nbblks, 0);
+	bp = xfs_buf_get_uncached(log->l_targ, nbblks, 0);
 	if (bp)
 		xfs_buf_unlock(bp);
 	return bp;
@@ -1505,7 +1505,7 @@ xlog_find_tail(
 	 * But... if the -device- itself is readonly, just skip this.
 	 * We can't recover this device anyway, so it won't matter.
 	 */
-	if (!xfs_readonly_buftarg(log->l_mp->m_logdev_targp))
+	if (!xfs_readonly_buftarg(log->l_targ))
 		error = xlog_clear_stale_blocks(log, tail_lsn);
 
 done:
-- 
2.20.1


* [PATCH 12/17] xfs: use bios directly to write log buffers
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
                   ` (10 preceding siblings ...)
  2019-05-20 16:13 ` [PATCH 11/17] xfs: make use of the l_targ field in struct xlog Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 13/17] xfs: return an offset instead of a pointer from xlog_align Christoph Hellwig
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

Currently the XFS logging code uses the xfs_buf structure and
associated APIs to write the log buffers to disk.  This requires
various special cases in the log code and is generally not very
optimal.

Instead of using a buffer just allocate a vmalloc region for each
log buffer, and use a bio and bio_vec array embedded in the iclog
structure to write the buffer to disk.  This also allows using bio
splitting and chaining to deal with a log buffer that wraps around the
end of the log.
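
As a hypothetical sizing example: with 4k pages and a 256k iclog, each
xlog_in_core now embeds howmany(256k, PAGE_SIZE) = 64 bio_vecs, which is
why the iclog allocation below switches to a struct_size() allocation.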

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log.c      | 217 ++++++++++++++++++++----------------------
 fs/xfs/xfs_log_priv.h |  15 +--
 2 files changed, 109 insertions(+), 123 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 42ac2201691c..e51901f68f02 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1210,32 +1210,31 @@ xlog_space_left(
 }
 
 
-/*
- * Log function which is called when an io completes.
- *
- * The log manager needs its own routine, in order to control what
- * happens with the buffer after the write completes.
- */
 static void
-xlog_iodone(xfs_buf_t *bp)
+xlog_ioend_work(
+	struct work_struct	*work)
 {
-	struct xlog_in_core	*iclog = bp->b_log_item;
-	struct xlog		*l = iclog->ic_log;
+	struct xlog_in_core     *iclog =
+		container_of(work, struct xlog_in_core, ic_end_io_work);
+	struct xlog		*log = iclog->ic_log;
 	bool			aborted = false;
+	int			error;
+
+	invalidate_kernel_vmap_range(iclog->ic_data, iclog->ic_io_size);
 
 	/*
 	 * Race to shutdown the filesystem if we see an error or the iclog is in
 	 * IOABORT state. The IOABORT state is only set in DEBUG mode to inject
 	 * CRC errors into log recovery.
 	 */
-	if (XFS_TEST_ERROR(bp->b_error, l->l_mp, XFS_ERRTAG_IODONE_IOERR) ||
+	error = blk_status_to_errno(iclog->ic_bio.bi_status);
+	if (XFS_TEST_ERROR(error, log->l_mp, XFS_ERRTAG_IODONE_IOERR) ||
 	    iclog->ic_state & XLOG_STATE_IOABORT) {
 		if (iclog->ic_state & XLOG_STATE_IOABORT)
 			iclog->ic_state &= ~XLOG_STATE_IOABORT;
 
-		xfs_buf_ioerror_alert(bp, __func__);
-		xfs_buf_stale(bp);
-		xfs_force_shutdown(l->l_mp, SHUTDOWN_LOG_IO_ERROR);
+		xfs_alert(log->l_mp, "log I/O error %d", error);
+		xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
 		/*
 		 * This flag will be propagated to the trans-committed
 		 * callback routines to let them know that the log-commit
@@ -1246,17 +1245,16 @@ xlog_iodone(xfs_buf_t *bp)
 		aborted = true;
 	}
 
-	/* log I/O is always issued ASYNC */
-	ASSERT(bp->b_flags & XBF_ASYNC);
 	xlog_state_done_syncing(iclog, aborted);
+	bio_uninit(&iclog->ic_bio);
 
 	/*
-	 * drop the buffer lock now that we are done. Nothing references
-	 * the buffer after this, so an unmount waiting on this lock can now
-	 * tear it down safely. As such, it is unsafe to reference the buffer
-	 * (bp) after the unlock as we could race with it being freed.
+	 * Drop the lock to signal that we are done. Nothing references the
+	 * iclog after this, so an unmount waiting on this lock can now tear it
+	 * down safely. As such, it is unsafe to reference the iclog after the
+	 * unlock as we could race with it being freed.
 	 */
-	xfs_buf_unlock(bp);
+	up(&iclog->ic_sema);
 }
 
 /*
@@ -1388,7 +1386,6 @@ xlog_alloc_log(
 	xlog_rec_header_t	*head;
 	xlog_in_core_t		**iclogp;
 	xlog_in_core_t		*iclog, *prev_iclog=NULL;
-	xfs_buf_t		*bp;
 	int			i;
 	int			error = -ENOMEM;
 	uint			log2_size = 0;
@@ -1446,30 +1443,6 @@ xlog_alloc_log(
 
 	xlog_get_iclog_buffer_size(mp, log);
 
-	/*
-	 * Use a NULL block for the extra log buffer used during splits so that
-	 * it will trigger errors if we ever try to do IO on it without first
-	 * having set it up properly.
-	 */
-	error = -ENOMEM;
-	bp = xfs_buf_alloc(log->l_targ, XFS_BUF_DADDR_NULL,
-			   BTOBB(log->l_iclog_size), XBF_NO_IOACCT);
-	if (!bp)
-		goto out_free_log;
-
-	/*
-	 * The iclogbuf buffer locks are held over IO but we are not going to do
-	 * IO yet.  Hence unlock the buffer so that the log IO path can grab it
-	 * when appropriately.
-	 */
-	ASSERT(xfs_buf_islocked(bp));
-	xfs_buf_unlock(bp);
-
-	/* use high priority wq for log I/O completion */
-	bp->b_ioend_wq = mp->m_log_workqueue;
-	bp->b_iodone = xlog_iodone;
-	log->l_xbuf = bp;
-
 	spin_lock_init(&log->l_icloglock);
 	init_waitqueue_head(&log->l_flush_wait);
 
@@ -1483,7 +1456,9 @@ xlog_alloc_log(
 	 */
 	ASSERT(log->l_iclog_size >= 4096);
 	for (i=0; i < log->l_iclog_bufs; i++) {
-		*iclogp = kmem_zalloc(sizeof(xlog_in_core_t), KM_MAYFAIL);
+		*iclogp = kmem_zalloc(struct_size(*iclogp, ic_bvec,
+				howmany(log->l_iclog_size, PAGE_SIZE)),
+				KM_MAYFAIL);
 		if (!*iclogp)
 			goto out_free_iclog;
 
@@ -1491,20 +1466,9 @@ xlog_alloc_log(
 		iclog->ic_prev = prev_iclog;
 		prev_iclog = iclog;
 
-		bp = xfs_buf_get_uncached(mp->m_logdev_targp,
-					  BTOBB(log->l_iclog_size),
-					  XBF_NO_IOACCT);
-		if (!bp)
+		iclog->ic_data = vmalloc(log->l_iclog_size);
+		if (!iclog->ic_data)
 			goto out_free_iclog;
-
-		ASSERT(xfs_buf_islocked(bp));
-		xfs_buf_unlock(bp);
-
-		/* use high priority wq for log I/O completion */
-		bp->b_ioend_wq = mp->m_log_workqueue;
-		bp->b_iodone = xlog_iodone;
-		iclog->ic_bp = bp;
-		iclog->ic_data = bp->b_addr;
 #ifdef DEBUG
 		log->l_iclog_bak[i] = &iclog->ic_header;
 #endif
@@ -1518,7 +1482,7 @@ xlog_alloc_log(
 		head->h_fmt = cpu_to_be32(XLOG_FMT);
 		memcpy(&head->h_fs_uuid, &mp->m_sb.sb_uuid, sizeof(uuid_t));
 
-		iclog->ic_size = BBTOB(bp->b_length) - log->l_iclog_hsize;
+		iclog->ic_size = log->l_iclog_size - log->l_iclog_hsize;
 		iclog->ic_state = XLOG_STATE_ACTIVE;
 		iclog->ic_log = log;
 		atomic_set(&iclog->ic_refcnt, 0);
@@ -1528,6 +1492,8 @@ xlog_alloc_log(
 
 		init_waitqueue_head(&iclog->ic_force_wait);
 		init_waitqueue_head(&iclog->ic_write_wait);
+		INIT_WORK(&iclog->ic_end_io_work, xlog_ioend_work);
+		sema_init(&iclog->ic_sema, 1);
 
 		iclogp = &iclog->ic_next;
 	}
@@ -1542,11 +1508,9 @@ xlog_alloc_log(
 out_free_iclog:
 	for (iclog = log->l_iclog; iclog; iclog = prev_iclog) {
 		prev_iclog = iclog->ic_next;
-		if (iclog->ic_bp)
-			xfs_buf_free(iclog->ic_bp);
+		vfree(iclog->ic_data);
 		kmem_free(iclog);
 	}
-	xfs_buf_free(log->l_xbuf);
 out_free_log:
 	kmem_free(log);
 out:
@@ -1731,23 +1695,43 @@ xlog_cksum(
 	return xfs_end_cksum(crc);
 }
 
+static void
+xlog_bio_end_io(
+	struct bio		*bio)
+{
+	struct xlog_in_core	*iclog = bio->bi_private;
+
+	queue_work(iclog->ic_log->l_mp->m_log_workqueue,
+		   &iclog->ic_end_io_work);
+}
+
+static void
+xlog_map_iclog_data(
+	struct bio		*bio,
+	void			*data,
+	size_t			count)
+{
+	do {
+		struct page	*page = vmalloc_to_page(data);
+		unsigned int	off = offset_in_page(data);
+		size_t		len = min_t(size_t, count, PAGE_SIZE - off);
+
+		__bio_add_page(bio, page, len, off);
+
+		data += len;
+		count -= len;
+	} while (count);
+}
+
 STATIC void
 xlog_write_iclog(
 	struct xlog		*log,
 	struct xlog_in_core	*iclog,
-	struct xfs_buf		*bp,
 	uint64_t		bno,
+	unsigned int		count,
 	bool			flush)
 {
 	ASSERT(bno < log->l_logBBsize);
-	ASSERT(bno + bp->b_io_length <= log->l_logBBsize);
-
-	bp->b_maps[0].bm_bn = log->l_logBBstart + bno;
-	bp->b_log_item = iclog;
-	bp->b_flags &= ~XBF_FLUSH;
-	bp->b_flags |= (XBF_ASYNC | XBF_SYNCIO | XBF_WRITE | XBF_FUA);
-	if (flush)
-		bp->b_flags |= XBF_FLUSH;
 
 	/*
 	 * We lock the iclogbufs here so that we can serialise against I/O
@@ -1757,11 +1741,10 @@ xlog_write_iclog(
 	 * tearing down the iclogbufs.  Hence we need to hold the buffer lock
 	 * across the log IO to archive that.
 	 */
-	xfs_buf_lock(bp);
+	down(&iclog->ic_sema);
 	if (unlikely(iclog->ic_state & XLOG_STATE_IOERROR)) {
-		xfs_buf_ioerror(bp, -EIO);
-		xfs_buf_stale(bp);
-		xfs_buf_ioend(bp);
+		xlog_state_done_syncing(iclog, XFS_LI_ABORTED);
+		up(&iclog->ic_sema);
 		/*
 		 * It would seem logical to return EIO here, but we rely on
 		 * the log state machine to propagate I/O errors instead of
@@ -1771,7 +1754,37 @@ xlog_write_iclog(
 		return;
 	}
 
-	xfs_buf_submit(bp);
+	iclog->ic_io_size = count;
+
+	bio_init(&iclog->ic_bio, iclog->ic_bvec, howmany(count, PAGE_SIZE));
+	bio_set_dev(&iclog->ic_bio, log->l_targ->bt_bdev);
+	iclog->ic_bio.bi_iter.bi_sector = log->l_logBBstart + bno;
+	iclog->ic_bio.bi_end_io = xlog_bio_end_io;
+	iclog->ic_bio.bi_private = iclog;
+	iclog->ic_bio.bi_opf = REQ_OP_WRITE | REQ_META | REQ_SYNC | REQ_FUA;
+	if (flush)
+		iclog->ic_bio.bi_opf |= REQ_PREFLUSH;
+
+	xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, iclog->ic_io_size);
+	flush_kernel_vmap_range(iclog->ic_data, iclog->ic_io_size);
+
+	/*
+	 * If this log buffer would straddle the end of the log we will have
+	 * to split it up into two bios, so that we can continue at the start.
+	 */
+	if (bno + BTOBB(count) > log->l_logBBsize) {
+		struct bio *split;
+
+		split = bio_split(&iclog->ic_bio, log->l_logBBsize - bno,
+				  GFP_NOIO, &fs_bio_set);
+		bio_chain(split, &iclog->ic_bio);
+		submit_bio(split);
+
+		/* restart at logical offset zero for the remainder */
+		iclog->ic_bio.bi_iter.bi_sector = log->l_logBBstart;
+	}
+
+	submit_bio(&iclog->ic_bio);
 }
 
 /*
@@ -1780,7 +1793,7 @@ xlog_write_iclog(
  *
  * Watch out for the header magic number case, though.
  */
-static unsigned int
+static void
 xlog_split_iclog(
 	struct xlog		*log,
 	void			*data,
@@ -1796,8 +1809,6 @@ xlog_split_iclog(
 			cycle++;
 		put_unaligned_be32(cycle, data + i);
 	}
-
-	return split_offset;
 }
 
 static int
@@ -1861,9 +1872,8 @@ xlog_sync(
 	unsigned int		count;		/* byte count of bwrite */
 	unsigned int		roundoff;       /* roundoff to BB or stripe */
 	uint64_t		bno;
-	unsigned int		split = 0;
 	unsigned int		size;
-	bool			flush = true;
+	bool			flush = true, split = false;
 
 	ASSERT(atomic_read(&iclog->ic_refcnt) == 0);
 
@@ -1888,8 +1898,10 @@ xlog_sync(
 	bno = BLOCK_LSN(be64_to_cpu(iclog->ic_header.h_lsn));
 
 	/* Do we need to split this write into 2 parts? */
-	if (bno + BTOBB(count) > log->l_logBBsize)
-		split = xlog_split_iclog(log, &iclog->ic_header, bno, count);
+	if (bno + BTOBB(count) > log->l_logBBsize) {
+		xlog_split_iclog(log, &iclog->ic_header, bno, count);
+		split = true;
+	}
 
 	/* calculcate the checksum */
 	iclog->ic_header.h_crc = xlog_cksum(log, &iclog->ic_header,
@@ -1922,18 +1934,8 @@ xlog_sync(
 		flush = false;
 	}
 
-	iclog->ic_bp->b_io_length = BTOBB(split ? split : count);
-	iclog->ic_bwritecnt = split ? 2 : 1;
-
 	xlog_verify_iclog(log, iclog, count);
-	xlog_write_iclog(log, iclog, iclog->ic_bp, bno, flush);
-
-	if (split) {
-		xfs_buf_associate_memory(iclog->ic_log->l_xbuf,
-				(char *)&iclog->ic_header + split,
-				count - split);
-		xlog_write_iclog(log, iclog, iclog->ic_log->l_xbuf, 0, false);
-	}
+	xlog_write_iclog(log, iclog, bno, count, flush);
 }
 
 /*
@@ -1954,25 +1956,15 @@ xlog_dealloc_log(
 	 */
 	iclog = log->l_iclog;
 	for (i = 0; i < log->l_iclog_bufs; i++) {
-		xfs_buf_lock(iclog->ic_bp);
-		xfs_buf_unlock(iclog->ic_bp);
+		down(&iclog->ic_sema);
+		up(&iclog->ic_sema);
 		iclog = iclog->ic_next;
 	}
 
-	/*
-	 * Always need to ensure that the extra buffer does not point to memory
-	 * owned by another log buffer before we free it. Also, cycle the lock
-	 * first to ensure we've completed IO on it.
-	 */
-	xfs_buf_lock(log->l_xbuf);
-	xfs_buf_unlock(log->l_xbuf);
-	xfs_buf_set_empty(log->l_xbuf, BTOBB(log->l_iclog_size));
-	xfs_buf_free(log->l_xbuf);
-
 	iclog = log->l_iclog;
 	for (i = 0; i < log->l_iclog_bufs; i++) {
-		xfs_buf_free(iclog->ic_bp);
 		next_iclog = iclog->ic_next;
+		vfree(iclog->ic_data);
 		kmem_free(iclog);
 		iclog = next_iclog;
 	}
@@ -2883,8 +2875,6 @@ xlog_state_done_syncing(
 	ASSERT(iclog->ic_state == XLOG_STATE_SYNCING ||
 	       iclog->ic_state == XLOG_STATE_IOERROR);
 	ASSERT(atomic_read(&iclog->ic_refcnt) == 0);
-	ASSERT(iclog->ic_bwritecnt == 1 || iclog->ic_bwritecnt == 2);
-
 
 	/*
 	 * If we got an error, either on the first buffer, or in the case of
@@ -2892,13 +2882,8 @@ xlog_state_done_syncing(
 	 * and none should ever be attempted to be written to disk
 	 * again.
 	 */
-	if (iclog->ic_state != XLOG_STATE_IOERROR) {
-		if (--iclog->ic_bwritecnt == 1) {
-			spin_unlock(&log->l_icloglock);
-			return;
-		}
+	if (iclog->ic_state != XLOG_STATE_IOERROR)
 		iclog->ic_state = XLOG_STATE_DONE_SYNC;
-	}
 
 	/*
 	 * Someone could be sleeping prior to writing out the next
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 5c188ccb8568..8fcaa2f88465 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -178,7 +178,6 @@ typedef struct xlog_ticket {
  *	the iclog.
  * - ic_forcewait is used to implement synchronous forcing of the iclog to disk.
  * - ic_next is the pointer to the next iclog in the ring.
- * - ic_bp is a pointer to the buffer used to write this incore log to disk.
  * - ic_log is a pointer back to the global log structure.
  * - ic_size is the full size of the header plus data.
  * - ic_offset is the current number of bytes written to in this iclog.
@@ -203,11 +202,10 @@ typedef struct xlog_in_core {
 	wait_queue_head_t	ic_write_wait;
 	struct xlog_in_core	*ic_next;
 	struct xlog_in_core	*ic_prev;
-	struct xfs_buf		*ic_bp;
 	struct xlog		*ic_log;
-	int			ic_size;
-	int			ic_offset;
-	int			ic_bwritecnt;
+	u32			ic_size;
+	u32			ic_io_size;
+	u32			ic_offset;
 	unsigned short		ic_state;
 	char			*ic_datap;	/* pointer to iclog data */
 
@@ -219,6 +217,11 @@ typedef struct xlog_in_core {
 	atomic_t		ic_refcnt ____cacheline_aligned_in_smp;
 	xlog_in_core_2_t	*ic_data;
 #define ic_header	ic_data->hic_header
+
+	struct semaphore	ic_sema;
+	struct work_struct	ic_end_io_work;
+	struct bio		ic_bio;
+	struct bio_vec		ic_bvec[];
 } xlog_in_core_t;
 
 /*
@@ -346,8 +349,6 @@ struct xlog {
 	struct xfs_mount	*l_mp;	        /* mount point */
 	struct xfs_ail		*l_ailp;	/* AIL log is working with */
 	struct xfs_cil		*l_cilp;	/* CIL log is working with */
-	struct xfs_buf		*l_xbuf;        /* extra buffer for log
-						 * wrapping */
 	struct xfs_buftarg	*l_targ;        /* buftarg of log */
 	struct delayed_work	l_work;		/* background flush work */
 	uint			l_flags;
-- 
2.20.1


* [PATCH 13/17] xfs: return an offset instead of a pointer from xlog_align
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
                   ` (11 preceding siblings ...)
  2019-05-20 16:13 ` [PATCH 12/17] xfs: use bios directly to write log buffers Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 14/17] xfs: use bios directly to read and write the log recovery buffers Christoph Hellwig
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

This simplifies both the helper and its callers.  We lose a bit of size
sanity checking, but that is already covered by KASAN if needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log_recover.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 39e4f774d355..9519956b1ccf 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -151,20 +151,14 @@ xlog_put_bp(
  * Return the address of the start of the given block number's data
  * in a log buffer.  The buffer covers a log sector-aligned region.
  */
-STATIC char *
+static inline unsigned int
 xlog_align(
 	struct xlog	*log,
-	xfs_daddr_t	blk_no,
-	int		nbblks,
-	struct xfs_buf	*bp)
+	xfs_daddr_t	blk_no)
 {
-	xfs_daddr_t	offset = blk_no & ((xfs_daddr_t)log->l_sectBBsize - 1);
-
-	ASSERT(offset + nbblks <= bp->b_length);
-	return bp->b_addr + BBTOB(offset);
+	return BBTOB(blk_no & ((xfs_daddr_t)log->l_sectBBsize - 1));
 }
 
-
 /*
  * nbblks should be uint, but oh well.  Just want to catch that 32-bit length.
  */
@@ -216,7 +210,7 @@ xlog_bread(
 	if (error)
 		return error;
 
-	*offset = xlog_align(log, blk_no, nbblks, bp);
+	*offset = bp->b_addr + xlog_align(log, blk_no);
 	return 0;
 }
 
@@ -1713,7 +1707,7 @@ xlog_write_log_records(
 
 		}
 
-		offset = xlog_align(log, start_block, endcount, bp);
+		offset = bp->b_addr + xlog_align(log, start_block);
 		for (; j < endcount; j++) {
 			xlog_add_record(log, offset, cycle, i+j,
 					tail_cycle, tail_block);
-- 
2.20.1


* [PATCH 14/17] xfs: use bios directly to read and write the log recovery buffers
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
                   ` (12 preceding siblings ...)
  2019-05-20 16:13 ` [PATCH 13/17] xfs: return an offset instead of a pointer from xlog_align Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 23:32   ` Dave Chinner
  2019-05-20 16:13 ` [PATCH 15/17] xfs: remove unused buffer cache APIs Christoph Hellwig
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

The xfs_buf structure is basically used as a glorified container for
a vmalloc allocation in the log recovery code.  Replace it with a plain
vmalloc allocation and build bios directly as needed to read into or
write from it, which simplifies things a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log_recover.c | 266 +++++++++++++++++++--------------------
 1 file changed, 131 insertions(+), 135 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 9519956b1ccf..350c9a123dad 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -92,17 +92,14 @@ xlog_verify_bp(
 }
 
 /*
- * Allocate a buffer to hold log data.  The buffer needs to be able
- * to map to a range of nbblks basic blocks at any valid (basic
- * block) offset within the log.
+ * Allocate a buffer to hold log data.  The buffer needs to be able to map to
+ * a range of nbblks basic blocks at any valid offset within the log.
  */
-STATIC xfs_buf_t *
+static char *
 xlog_get_bp(
 	struct xlog	*log,
 	int		nbblks)
 {
-	struct xfs_buf	*bp;
-
 	/*
 	 * Pass log block 0 since we don't have an addr yet, buffer will be
 	 * verified on read.
@@ -115,36 +112,30 @@ xlog_get_bp(
 	}
 
 	/*
-	 * We do log I/O in units of log sectors (a power-of-2
-	 * multiple of the basic block size), so we round up the
-	 * requested size to accommodate the basic blocks required
-	 * for complete log sectors.
+	 * We do log I/O in units of log sectors (a power-of-2 multiple of the
+	 * basic block size), so we round up the requested size to accommodate
+	 * the basic blocks required for complete log sectors.
 	 *
-	 * In addition, the buffer may be used for a non-sector-
-	 * aligned block offset, in which case an I/O of the
-	 * requested size could extend beyond the end of the
-	 * buffer.  If the requested size is only 1 basic block it
-	 * will never straddle a sector boundary, so this won't be
-	 * an issue.  Nor will this be a problem if the log I/O is
-	 * done in basic blocks (sector size 1).  But otherwise we
-	 * extend the buffer by one extra log sector to ensure
-	 * there's space to accommodate this possibility.
+	 * In addition, the buffer may be used for a non-sector-aligned block
+	 * offset, in which case an I/O of the requested size could extend
+	 * beyond the end of the buffer.  If the requested size is only 1 basic
+	 * block it will never straddle a sector boundary, so this won't be an
+	 * issue.  Nor will this be a problem if the log I/O is done in basic
+	 * blocks (sector size 1).  But otherwise we extend the buffer by one
+	 * extra log sector to ensure there's space to accommodate this
+	 * possibility.
 	 */
 	if (nbblks > 1 && log->l_sectBBsize > 1)
 		nbblks += log->l_sectBBsize;
 	nbblks = round_up(nbblks, log->l_sectBBsize);
-
-	bp = xfs_buf_get_uncached(log->l_targ, nbblks, 0);
-	if (bp)
-		xfs_buf_unlock(bp);
-	return bp;
+	return vmalloc(BBTOB(nbblks));
 }
 
 STATIC void
 xlog_put_bp(
-	xfs_buf_t	*bp)
+	char		*data)
 {
-	xfs_buf_free(bp);
+	vfree(data);
 }
 
 /*
@@ -159,17 +150,71 @@ xlog_align(
 	return BBTOB(blk_no & ((xfs_daddr_t)log->l_sectBBsize - 1));
 }
 
-/*
- * nbblks should be uint, but oh well.  Just want to catch that 32-bit length.
- */
-STATIC int
-xlog_bread_noalign(
-	struct xlog	*log,
-	xfs_daddr_t	blk_no,
-	int		nbblks,
-	struct xfs_buf	*bp)
+static struct bio *
+xlog_alloc_bio(
+	struct xlog		*log,
+	xfs_daddr_t		blk_no,
+	unsigned int		count,
+	unsigned int		op)
 {
-	int		error;
+	int			vecs = howmany(count, PAGE_SIZE);
+	struct bio		*bio;
+
+	bio = bio_alloc(GFP_KERNEL, min(vecs, BIO_MAX_PAGES));
+	bio_set_dev(bio, log->l_targ->bt_bdev);
+	bio->bi_iter.bi_sector = log->l_logBBstart + blk_no;
+	bio->bi_opf = op | REQ_META | REQ_SYNC;
+	return bio;
+}
+
+static int
+xlog_submit_bio(
+	struct xlog		*log,
+	xfs_daddr_t		blk_no,
+	unsigned int		count,
+	char			*data,
+	unsigned int		op)
+
+{
+	int			error;
+	struct bio		*bio;
+
+	bio = xlog_alloc_bio(log, blk_no, count, op);
+
+	do {
+		struct page	*page = vmalloc_to_page(data);
+		unsigned int	off = offset_in_page(data);
+		unsigned int	len = min_t(unsigned, count, PAGE_SIZE - off);
+
+		while (bio_add_page(bio, page, len, off) != len) {
+			struct bio	*prev = bio;
+
+			bio = xlog_alloc_bio(log, blk_no, count, op);
+			bio_chain(bio, prev);
+			submit_bio(prev);
+		}
+
+		data += len;
+		count -= len;
+		blk_no += BTOBB(len);
+	} while (count > 0);
+
+	error = submit_bio_wait(bio);
+	bio_put(bio);
+
+	return error;
+}
+
+static int
+xlog_do_io(
+	struct xlog		*log,
+	xfs_daddr_t		blk_no,
+	unsigned int		nbblks,
+	char			*data,
+	unsigned int		op)
+{
+	unsigned int		count;
+	int			error;
 
 	if (!xlog_verify_bp(log, blk_no, nbblks)) {
 		xfs_warn(log->l_mp,
@@ -181,107 +226,59 @@ xlog_bread_noalign(
 
 	blk_no = round_down(blk_no, log->l_sectBBsize);
 	nbblks = round_up(nbblks, log->l_sectBBsize);
+	count = BBTOB(nbblks);
 
 	ASSERT(nbblks > 0);
-	ASSERT(nbblks <= bp->b_length);
 
-	XFS_BUF_SET_ADDR(bp, log->l_logBBstart + blk_no);
-	bp->b_flags |= XBF_READ;
-	bp->b_io_length = nbblks;
-	bp->b_error = 0;
+	if (op == REQ_OP_READ)
+		flush_kernel_vmap_range(data, count);
+	error = xlog_submit_bio(log, blk_no, count, data, op);
+	if (op == REQ_OP_WRITE)
+		invalidate_kernel_vmap_range(data, count);
 
-	error = xfs_buf_submit(bp);
-	if (error && !XFS_FORCED_SHUTDOWN(log->l_mp))
-		xfs_buf_ioerror_alert(bp, __func__);
+	if (error && !XFS_FORCED_SHUTDOWN(log->l_mp)) {
+		xfs_alert(log->l_mp,
+			  "log recovery %s I/O error at daddr 0x%llx len %d error %d",
+			  op == REQ_OP_WRITE ? "write" : "read",
+			  blk_no, nbblks, error);
+	}
 	return error;
 }
 
 STATIC int
-xlog_bread(
+xlog_bread_noalign(
 	struct xlog	*log,
 	xfs_daddr_t	blk_no,
 	int		nbblks,
-	struct xfs_buf	*bp,
-	char		**offset)
+	char		*data)
 {
-	int		error;
-
-	error = xlog_bread_noalign(log, blk_no, nbblks, bp);
-	if (error)
-		return error;
-
-	*offset = bp->b_addr + xlog_align(log, blk_no);
-	return 0;
+	return xlog_do_io(log, blk_no, nbblks, data, REQ_OP_READ);
 }
 
-/*
- * Read at an offset into the buffer. Returns with the buffer in it's original
- * state regardless of the result of the read.
- */
 STATIC int
-xlog_bread_offset(
+xlog_bread(
 	struct xlog	*log,
-	xfs_daddr_t	blk_no,		/* block to read from */
-	int		nbblks,		/* blocks to read */
-	struct xfs_buf	*bp,
-	char		*offset)
+	xfs_daddr_t	blk_no,
+	int		nbblks,
+	char		*data,
+	char		**offset)
 {
-	char		*orig_offset = bp->b_addr;
-	int		orig_len = BBTOB(bp->b_length);
-	int		error, error2;
-
-	error = xfs_buf_associate_memory(bp, offset, BBTOB(nbblks));
-	if (error)
-		return error;
-
-	error = xlog_bread_noalign(log, blk_no, nbblks, bp);
+	int		error;
 
-	/* must reset buffer pointer even on error */
-	error2 = xfs_buf_associate_memory(bp, orig_offset, orig_len);
-	if (error)
-		return error;
-	return error2;
+	error = xlog_do_io(log, blk_no, nbblks, data, REQ_OP_READ);
+	if (!error)
+		*offset = data + xlog_align(log, blk_no);
+	return error;
 }
 
-/*
- * Write out the buffer at the given block for the given number of blocks.
- * The buffer is kept locked across the write and is returned locked.
- * This can only be used for synchronous log writes.
- */
 STATIC int
 xlog_bwrite(
 	struct xlog	*log,
 	xfs_daddr_t	blk_no,
 	int		nbblks,
-	struct xfs_buf	*bp)
+	char		*data)
 {
-	int		error;
-
-	if (!xlog_verify_bp(log, blk_no, nbblks)) {
-		xfs_warn(log->l_mp,
-			 "Invalid log block/length (0x%llx, 0x%x) for buffer",
-			 blk_no, nbblks);
-		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_HIGH, log->l_mp);
-		return -EFSCORRUPTED;
-	}
-
-	blk_no = round_down(blk_no, log->l_sectBBsize);
-	nbblks = round_up(nbblks, log->l_sectBBsize);
-
-	ASSERT(nbblks > 0);
-	ASSERT(nbblks <= bp->b_length);
-
-	XFS_BUF_SET_ADDR(bp, log->l_logBBstart + blk_no);
-	xfs_buf_hold(bp);
-	xfs_buf_lock(bp);
-	bp->b_io_length = nbblks;
-	bp->b_error = 0;
-
-	error = xfs_bwrite(bp);
-	if (error)
-		xfs_buf_ioerror_alert(bp, __func__);
-	xfs_buf_relse(bp);
-	return error;
+	return xlog_do_io(log, blk_no, nbblks, data, REQ_OP_WRITE);
 }
 
 #ifdef DEBUG
@@ -399,7 +396,7 @@ xlog_recover_iodone(
 STATIC int
 xlog_find_cycle_start(
 	struct xlog	*log,
-	struct xfs_buf	*bp,
+	char		*bp,
 	xfs_daddr_t	first_blk,
 	xfs_daddr_t	*last_blk,
 	uint		cycle)
@@ -449,7 +446,7 @@ xlog_find_verify_cycle(
 {
 	xfs_daddr_t	i, j;
 	uint		cycle;
-	xfs_buf_t	*bp;
+	char		*bp;
 	xfs_daddr_t	bufblks;
 	char		*buf = NULL;
 	int		error = 0;
@@ -516,7 +513,7 @@ xlog_find_verify_log_record(
 	int			extra_bblks)
 {
 	xfs_daddr_t		i;
-	xfs_buf_t		*bp;
+	char			*bp;
 	char			*offset = NULL;
 	xlog_rec_header_t	*head = NULL;
 	int			error = 0;
@@ -623,7 +620,7 @@ xlog_find_head(
 	struct xlog	*log,
 	xfs_daddr_t	*return_head_blk)
 {
-	xfs_buf_t	*bp;
+	char		*bp;
 	char		*offset;
 	xfs_daddr_t	new_blk, first_blk, start_blk, last_blk, head_blk;
 	int		num_scan_bblks;
@@ -889,7 +886,7 @@ xlog_rseek_logrec_hdr(
 	xfs_daddr_t		head_blk,
 	xfs_daddr_t		tail_blk,
 	int			count,
-	struct xfs_buf		*bp,
+	char			*bp,
 	xfs_daddr_t		*rblk,
 	struct xlog_rec_header	**rhead,
 	bool			*wrapped)
@@ -963,7 +960,7 @@ xlog_seek_logrec_hdr(
 	xfs_daddr_t		head_blk,
 	xfs_daddr_t		tail_blk,
 	int			count,
-	struct xfs_buf		*bp,
+	char			*bp,
 	xfs_daddr_t		*rblk,
 	struct xlog_rec_header	**rhead,
 	bool			*wrapped)
@@ -1063,7 +1060,7 @@ xlog_verify_tail(
 	int			hsize)
 {
 	struct xlog_rec_header	*thead;
-	struct xfs_buf		*bp;
+	char			*bp;
 	xfs_daddr_t		first_bad;
 	int			error = 0;
 	bool			wrapped;
@@ -1145,13 +1142,13 @@ xlog_verify_head(
 	struct xlog		*log,
 	xfs_daddr_t		*head_blk,	/* in/out: unverified head */
 	xfs_daddr_t		*tail_blk,	/* out: tail block */
-	struct xfs_buf		*bp,
+	char			*bp,
 	xfs_daddr_t		*rhead_blk,	/* start blk of last record */
 	struct xlog_rec_header	**rhead,	/* ptr to last record */
 	bool			*wrapped)	/* last rec. wraps phys. log */
 {
 	struct xlog_rec_header	*tmp_rhead;
-	struct xfs_buf		*tmp_bp;
+	char			*tmp_bp;
 	xfs_daddr_t		first_bad;
 	xfs_daddr_t		tmp_rhead_blk;
 	int			found;
@@ -1260,7 +1257,7 @@ xlog_check_unmount_rec(
 	xfs_daddr_t		*tail_blk,
 	struct xlog_rec_header	*rhead,
 	xfs_daddr_t		rhead_blk,
-	struct xfs_buf		*bp,
+	char			*bp,
 	bool			*clean)
 {
 	struct xlog_op_header	*op_head;
@@ -1382,7 +1379,7 @@ xlog_find_tail(
 {
 	xlog_rec_header_t	*rhead;
 	char			*offset = NULL;
-	xfs_buf_t		*bp;
+	char			*bp;
 	int			error;
 	xfs_daddr_t		rhead_blk;
 	xfs_lsn_t		tail_lsn;
@@ -1531,7 +1528,7 @@ xlog_find_zeroed(
 	struct xlog	*log,
 	xfs_daddr_t	*blk_no)
 {
-	xfs_buf_t	*bp;
+	char		*bp;
 	char		*offset;
 	uint	        first_cycle, last_cycle;
 	xfs_daddr_t	new_blk, last_blk, start_blk;
@@ -1651,7 +1648,7 @@ xlog_write_log_records(
 	int		tail_block)
 {
 	char		*offset;
-	xfs_buf_t	*bp;
+	char		*bp;
 	int		balign, ealign;
 	int		sectbb = log->l_sectBBsize;
 	int		end_block = start_block + blocks;
@@ -1699,15 +1696,14 @@ xlog_write_log_records(
 		 */
 		ealign = round_down(end_block, sectbb);
 		if (j == 0 && (start_block + endcount > ealign)) {
-			offset = bp->b_addr + BBTOB(ealign - start_block);
-			error = xlog_bread_offset(log, ealign, sectbb,
-							bp, offset);
+			error = xlog_bread_noalign(log, ealign, sectbb,
+					bp + BBTOB(ealign - start_block));
 			if (error)
 				break;
 
 		}
 
-		offset = bp->b_addr + xlog_align(log, start_block);
+		offset = bp + xlog_align(log, start_block);
 		for (; j < endcount; j++) {
 			xlog_add_record(log, offset, cycle, i+j,
 					tail_cycle, tail_block);
@@ -5301,7 +5297,7 @@ xlog_do_recovery_pass(
 	xfs_daddr_t		blk_no, rblk_no;
 	xfs_daddr_t		rhead_blk;
 	char			*offset;
-	xfs_buf_t		*hbp, *dbp;
+	char			*hbp, *dbp;
 	int			error = 0, h_size, h_len;
 	int			error2 = 0;
 	int			bblks, split_bblks;
@@ -5399,7 +5395,7 @@ xlog_do_recovery_pass(
 			/*
 			 * Check for header wrapping around physical end-of-log
 			 */
-			offset = hbp->b_addr;
+			offset = hbp;
 			split_hblks = 0;
 			wrapped_hblks = 0;
 			if (blk_no + hblks <= log->l_logBBsize) {
@@ -5435,8 +5431,8 @@ xlog_do_recovery_pass(
 				 *   - order is important.
 				 */
 				wrapped_hblks = hblks - split_hblks;
-				error = xlog_bread_offset(log, 0,
-						wrapped_hblks, hbp,
+				error = xlog_bread_noalign(log, 0,
+						wrapped_hblks,
 						offset + BBTOB(split_hblks));
 				if (error)
 					goto bread_err2;
@@ -5467,7 +5463,7 @@ xlog_do_recovery_pass(
 			} else {
 				/* This log record is split across the
 				 * physical end of log */
-				offset = dbp->b_addr;
+				offset = dbp;
 				split_bblks = 0;
 				if (blk_no != log->l_logBBsize) {
 					/* some data is before the physical
@@ -5496,8 +5492,8 @@ xlog_do_recovery_pass(
 				 *   _first_, then the log start (LR header end)
 				 *   - order is important.
 				 */
-				error = xlog_bread_offset(log, 0,
-						bblks - split_bblks, dbp,
+				error = xlog_bread_noalign(log, 0,
+						bblks - split_bblks,
 						offset + BBTOB(split_bblks));
 				if (error)
 					goto bread_err2;
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 15/17] xfs: remove unused buffer cache APIs
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
                   ` (13 preceding siblings ...)
  2019-05-20 16:13 ` [PATCH 14/17] xfs: use bios directly to read and write the log recovery buffers Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 16/17] xfs: properly type the b_log_item field in struct xfs_buf Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 17/17] xfs: remove the b_io_length " Christoph Hellwig
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

Now that the log code uses bios directly we can drop various special
cases in the buffer cache code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf.c | 94 ++----------------------------------------------
 fs/xfs/xfs_buf.h | 27 --------------
 2 files changed, 2 insertions(+), 119 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index be8afa1673c7..d44097e645a5 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -213,7 +213,7 @@ xfs_buf_free_maps(
 	}
 }
 
-struct xfs_buf *
+static struct xfs_buf *
 _xfs_buf_alloc(
 	struct xfs_buftarg	*target,
 	struct xfs_buf_map	*map,
@@ -909,83 +909,6 @@ xfs_buf_read_uncached(
 	return 0;
 }
 
-/*
- * Return a buffer allocated as an empty buffer and associated to external
- * memory via xfs_buf_associate_memory() back to it's empty state.
- */
-void
-xfs_buf_set_empty(
-	struct xfs_buf		*bp,
-	size_t			numblks)
-{
-	if (bp->b_pages)
-		_xfs_buf_free_pages(bp);
-
-	bp->b_pages = NULL;
-	bp->b_page_count = 0;
-	bp->b_addr = NULL;
-	bp->b_length = numblks;
-	bp->b_io_length = numblks;
-
-	ASSERT(bp->b_map_count == 1);
-	bp->b_bn = XFS_BUF_DADDR_NULL;
-	bp->b_maps[0].bm_bn = XFS_BUF_DADDR_NULL;
-	bp->b_maps[0].bm_len = bp->b_length;
-}
-
-static inline struct page *
-mem_to_page(
-	void			*addr)
-{
-	if ((!is_vmalloc_addr(addr))) {
-		return virt_to_page(addr);
-	} else {
-		return vmalloc_to_page(addr);
-	}
-}
-
-int
-xfs_buf_associate_memory(
-	xfs_buf_t		*bp,
-	void			*mem,
-	size_t			len)
-{
-	int			rval;
-	int			i = 0;
-	unsigned long		pageaddr;
-	unsigned long		offset;
-	size_t			buflen;
-	int			page_count;
-
-	pageaddr = (unsigned long)mem & PAGE_MASK;
-	offset = (unsigned long)mem - pageaddr;
-	buflen = PAGE_ALIGN(len + offset);
-	page_count = buflen >> PAGE_SHIFT;
-
-	/* Free any previous set of page pointers */
-	if (bp->b_pages)
-		_xfs_buf_free_pages(bp);
-
-	bp->b_pages = NULL;
-	bp->b_addr = mem;
-
-	rval = _xfs_buf_get_pages(bp, page_count);
-	if (rval)
-		return rval;
-
-	bp->b_offset = offset;
-
-	for (i = 0; i < bp->b_page_count; i++) {
-		bp->b_pages[i] = mem_to_page((void *)pageaddr);
-		pageaddr += PAGE_SIZE;
-	}
-
-	bp->b_io_length = BTOBB(len);
-	bp->b_length = BTOBB(buflen);
-
-	return 0;
-}
-
 xfs_buf_t *
 xfs_buf_get_uncached(
 	struct xfs_buftarg	*target,
@@ -1269,7 +1192,7 @@ xfs_buf_ioend_async(
 	struct xfs_buf	*bp)
 {
 	INIT_WORK(&bp->b_ioend_work, xfs_buf_ioend_work);
-	queue_work(bp->b_ioend_wq, &bp->b_ioend_work);
+	queue_work(bp->b_target->bt_mount->m_buf_workqueue, &bp->b_ioend_work);
 }
 
 void
@@ -1436,21 +1359,8 @@ _xfs_buf_ioapply(
 	 */
 	bp->b_error = 0;
 
-	/*
-	 * Initialize the I/O completion workqueue if we haven't yet or the
-	 * submitter has not opted to specify a custom one.
-	 */
-	if (!bp->b_ioend_wq)
-		bp->b_ioend_wq = bp->b_target->bt_mount->m_buf_workqueue;
-
 	if (bp->b_flags & XBF_WRITE) {
 		op = REQ_OP_WRITE;
-		if (bp->b_flags & XBF_SYNCIO)
-			op_flags = REQ_SYNC;
-		if (bp->b_flags & XBF_FUA)
-			op_flags |= REQ_FUA;
-		if (bp->b_flags & XBF_FLUSH)
-			op_flags |= REQ_PREFLUSH;
 
 		/*
 		 * Run the write verifier callback function if it exists. If
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 61691d9a5bc9..2440c3ab85a8 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -30,11 +30,6 @@
 #define XBF_STALE	 (1 << 6) /* buffer has been staled, do not find it */
 #define XBF_WRITE_FAIL	 (1 << 7) /* async writes have failed on this buffer */
 
-/* I/O hints for the BIO layer */
-#define XBF_SYNCIO	 (1 << 10)/* treat this buffer as synchronous I/O */
-#define XBF_FUA		 (1 << 11)/* force cache write through mode */
-#define XBF_FLUSH	 (1 << 12)/* flush the disk cache before a write */
-
 /* flags used only as arguments to access routines */
 #define XBF_TRYLOCK	 (1 << 16)/* lock requested, but do not wait */
 #define XBF_UNMAPPED	 (1 << 17)/* do not map the buffer */
@@ -55,9 +50,6 @@ typedef unsigned int xfs_buf_flags_t;
 	{ XBF_DONE,		"DONE" }, \
 	{ XBF_STALE,		"STALE" }, \
 	{ XBF_WRITE_FAIL,	"WRITE_FAIL" }, \
-	{ XBF_SYNCIO,		"SYNCIO" }, \
-	{ XBF_FUA,		"FUA" }, \
-	{ XBF_FLUSH,		"FLUSH" }, \
 	{ XBF_TRYLOCK,		"TRYLOCK" },	/* should never be set */\
 	{ XBF_UNMAPPED,		"UNMAPPED" },	/* ditto */\
 	{ _XBF_PAGES,		"PAGES" }, \
@@ -156,7 +148,6 @@ typedef struct xfs_buf {
 	xfs_buftarg_t		*b_target;	/* buffer target (device) */
 	void			*b_addr;	/* virtual address of buffer */
 	struct work_struct	b_ioend_work;
-	struct workqueue_struct	*b_ioend_wq;	/* I/O completion wq */
 	xfs_buf_iodone_t	b_iodone;	/* I/O completion function */
 	struct completion	b_iowait;	/* queue for I/O waiters */
 	void			*b_log_item;
@@ -201,21 +192,6 @@ struct xfs_buf *xfs_buf_incore(struct xfs_buftarg *target,
 			   xfs_daddr_t blkno, size_t numblks,
 			   xfs_buf_flags_t flags);
 
-struct xfs_buf *_xfs_buf_alloc(struct xfs_buftarg *target,
-			       struct xfs_buf_map *map, int nmaps,
-			       xfs_buf_flags_t flags);
-
-static inline struct xfs_buf *
-xfs_buf_alloc(
-	struct xfs_buftarg	*target,
-	xfs_daddr_t		blkno,
-	size_t			numblks,
-	xfs_buf_flags_t		flags)
-{
-	DEFINE_SINGLE_BUF_MAP(map, blkno, numblks);
-	return _xfs_buf_alloc(target, &map, 1, flags);
-}
-
 struct xfs_buf *xfs_buf_get_map(struct xfs_buftarg *target,
 			       struct xfs_buf_map *map, int nmaps,
 			       xfs_buf_flags_t flags);
@@ -261,9 +237,6 @@ xfs_buf_readahead(
 	return xfs_buf_readahead_map(target, &map, 1, ops);
 }
 
-void xfs_buf_set_empty(struct xfs_buf *bp, size_t numblks);
-int xfs_buf_associate_memory(struct xfs_buf *bp, void *mem, size_t length);
-
 struct xfs_buf *xfs_buf_get_uncached(struct xfs_buftarg *target, size_t numblks,
 				int flags);
 int xfs_buf_read_uncached(struct xfs_buftarg *target, xfs_daddr_t daddr,
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 16/17] xfs: properly type the b_log_item field in struct xfs_buf
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
                   ` (14 preceding siblings ...)
  2019-05-20 16:13 ` [PATCH 15/17] xfs: remove unused buffer cache APIs Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  2019-05-20 16:13 ` [PATCH 17/17] xfs: remove the b_io_length " Christoph Hellwig
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

Now that the log code doesn't abuse this field any more we can
declare it as a struct xfs_buf_log_item pointer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 2440c3ab85a8..27f05db07214 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -150,7 +150,7 @@ typedef struct xfs_buf {
 	struct work_struct	b_ioend_work;
 	xfs_buf_iodone_t	b_iodone;	/* I/O completion function */
 	struct completion	b_iowait;	/* queue for I/O waiters */
-	void			*b_log_item;
+	struct xfs_buf_log_item	*b_log_item;
 	struct list_head	b_li_list;	/* Log items list head */
 	struct xfs_trans	*b_transp;
 	struct page		**b_pages;	/* array of page pointers */
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 17/17] xfs: remove the b_io_length field in struct xfs_buf
  2019-05-20 16:13 use bios directly in the log code Christoph Hellwig
                   ` (15 preceding siblings ...)
  2019-05-20 16:13 ` [PATCH 16/17] xfs: properly type the b_log_item field in struct xfs_buf Christoph Hellwig
@ 2019-05-20 16:13 ` Christoph Hellwig
  16 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-20 16:13 UTC (permalink / raw)
  To: linux-xfs

This field is now always identical to b_length.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf.c         | 5 ++---
 fs/xfs/xfs_buf.h         | 1 -
 fs/xfs/xfs_log_recover.c | 9 ++++-----
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index d44097e645a5..bb37e7a683e7 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -263,7 +263,6 @@ _xfs_buf_alloc(
 		bp->b_maps[i].bm_len = map[i].bm_len;
 		bp->b_length += map[i].bm_len;
 	}
-	bp->b_io_length = bp->b_length;
 
 	atomic_set(&bp->b_pin_count, 0);
 	init_waitqueue_head(&bp->b_waiters);
@@ -1407,7 +1406,7 @@ _xfs_buf_ioapply(
 	 * subsequent call.
 	 */
 	offset = bp->b_offset;
-	size = BBTOB(bp->b_io_length);
+	size = BBTOB(bp->b_length);
 	blk_start_plug(&plug);
 	for (i = 0; i < bp->b_map_count; i++) {
 		xfs_buf_ioapply_map(bp, i, &offset, &size, op, op_flags);
@@ -1540,7 +1539,7 @@ xfs_buf_zero(
 		page_offset = (boff + bp->b_offset) & ~PAGE_MASK;
 		page = bp->b_pages[page_index];
 		csize = min_t(size_t, PAGE_SIZE - page_offset,
-				      BBTOB(bp->b_io_length) - boff);
+				      BBTOB(bp->b_length) - boff);
 
 		ASSERT((csize + page_offset) <= PAGE_SIZE);
 
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 27f05db07214..178fdfc747b3 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -158,7 +158,6 @@ typedef struct xfs_buf {
 	struct xfs_buf_map	*b_maps;	/* compound buffer map */
 	struct xfs_buf_map	__b_map;	/* inline compound buffer map */
 	int			b_map_count;
-	int			b_io_length;	/* IO size in BBs */
 	atomic_t		b_pin_count;	/* pin count */
 	atomic_t		b_io_remaining;	/* #outstanding I/O requests */
 	unsigned int		b_page_count;	/* size of page array */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 350c9a123dad..fc968d67813f 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2152,7 +2152,7 @@ xlog_recover_do_inode_buffer(
 	if (xfs_sb_version_hascrc(&mp->m_sb))
 		bp->b_ops = &xfs_inode_buf_ops;
 
-	inodes_per_buf = BBTOB(bp->b_io_length) >> mp->m_sb.sb_inodelog;
+	inodes_per_buf = BBTOB(bp->b_length) >> mp->m_sb.sb_inodelog;
 	for (i = 0; i < inodes_per_buf; i++) {
 		next_unlinked_offset = (i * mp->m_sb.sb_inodesize) +
 			offsetof(xfs_dinode_t, di_next_unlinked);
@@ -2194,8 +2194,7 @@ xlog_recover_do_inode_buffer(
 
 		ASSERT(item->ri_buf[item_index].i_addr != NULL);
 		ASSERT((item->ri_buf[item_index].i_len % XFS_BLF_CHUNK) == 0);
-		ASSERT((reg_buf_offset + reg_buf_bytes) <=
-							BBTOB(bp->b_io_length));
+		ASSERT((reg_buf_offset + reg_buf_bytes) <= BBTOB(bp->b_length));
 
 		/*
 		 * The current logged region contains a copy of the
@@ -2660,7 +2659,7 @@ xlog_recover_do_reg_buffer(
 		ASSERT(nbits > 0);
 		ASSERT(item->ri_buf[i].i_addr != NULL);
 		ASSERT(item->ri_buf[i].i_len % XFS_BLF_CHUNK == 0);
-		ASSERT(BBTOB(bp->b_io_length) >=
+		ASSERT(BBTOB(bp->b_length) >=
 		       ((uint)bit << XFS_BLF_SHIFT) + (nbits << XFS_BLF_SHIFT));
 
 		/*
@@ -2883,7 +2882,7 @@ xlog_recover_buffer_pass2(
 	 */
 	if (XFS_DINODE_MAGIC ==
 	    be16_to_cpu(*((__be16 *)xfs_buf_offset(bp, 0))) &&
-	    (BBTOB(bp->b_io_length) != max(log->l_mp->m_sb.sb_blocksize,
+	    (BBTOB(bp->b_length) != max(log->l_mp->m_sb.sb_blocksize,
 			(uint32_t)log->l_mp->m_inode_cluster_size))) {
 		xfs_buf_stale(bp);
 		error = xfs_bwrite(bp);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 14/17] xfs: use bios directly to read and write the log recovery buffers
  2019-05-20 16:13 ` [PATCH 14/17] xfs: use bios directly to read and write the log recovery buffers Christoph Hellwig
@ 2019-05-20 23:32   ` Dave Chinner
  2019-05-21  5:09     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2019-05-20 23:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, May 20, 2019 at 06:13:44PM +0200, Christoph Hellwig wrote:
> The xfs_buf structure is basically used as a glorified container for
> a vmalloc allocation in the log recovery code.  Replace it with a
> real vmalloc implementation and just build bios directly as needed
> to read into or write from it to simplify things a bit.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I like the way the series runs in general, and the end result is a
fair bit neater, but I'm struggling to work out how we translate
this to the userspace code that uses uncached/raw buffer IO.

i.e. I'm in the process of porting the xfs_buf.c code to userspace,
and was using the uncached buffer API to provide the bits the
log code and other raw IO users (xfs_db, repair prefetch) with this
functionality through the API this patchset removes.

I wrote the patches a couple of days ago to move all this uncached
IO and kernel memory and device specific stuff to new xfs_buftarg.[ch]
files. This leaves xfs_buf.c as purely cached buffer management
code (no bio stuff in it, no memory allocation, no shrinkers,
LRUs, etc).

So while it's easy to drop the uncached buffer API from the kernel
side, it leaves me with the question of what API do we use in
userspace to provide this same functionality? I suspect that we
probably need to separate all this log-to-bio code out into a
separate file (e.g. xfs_log_io.[ch]) to leave a set of API stubs
that we can reimplement in userspace to pread/pwrite directly to
the log buftarg device fd as I've done already for the buffer
code...

Thoughts?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 14/17] xfs: use bios directly to read and write the log recovery buffers
  2019-05-20 23:32   ` Dave Chinner
@ 2019-05-21  5:09     ` Christoph Hellwig
  2019-05-21 22:24       ` Dave Chinner
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-21  5:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs

On Tue, May 21, 2019 at 09:32:33AM +1000, Dave Chinner wrote:
> So while it's easy to drop the uncached buffer API from the kernel
> side, it leaves me with the question of what API do we use in
> userspace to provide this same functionality? I suspect that we
> probably need to separate all this log-to-bio code out into a
> separate file (e.g. xfs_log_io.[ch]) to leave a set of API stubs
> that we can reimplement in userspace to pread/pwrite directly to
> the log buftarg device fd as I've done already for the buffer
> code...

For one we still keep the uncached buffers in xfs_buf.c as we have users
of that outside of the log code, but I guess that is not what you mean.

I can split the log recovery code into a separate file, as you said
it should just be malloc + pread/pwrite in userspace, so implementing
it should be trivial.  The xlog_sync case is pretty different in the
kernel as it isn't synchronous, and it also doesn't currently exist in
userspace.  I'd rather keep that as-is unless you have plans to port
the logging code to userspace?  Even in that case we'll probably want
a different abstraction that maps to aio.
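
To illustrate the "malloc + pread/pwrite" point above, a minimal
userspace sketch of that helper could look like this (fd-based, the
names are made up and not lifted from xfsprogs):

/* assumes <unistd.h>, <errno.h>, <stdbool.h> and the usual libxfs headers */
static int
xlog_do_io(
	int		log_fd,		/* log device fd */
	xfs_daddr_t	log_start,	/* log start, in 512-byte blocks */
	xfs_daddr_t	blk_no,
	int		nbblks,
	char		*data,
	bool		iswrite)
{
	off_t		offset = BBTOB(log_start + blk_no);
	size_t		count = BBTOB(nbblks);
	ssize_t		ret;

	ret = iswrite ? pwrite(log_fd, data, count, offset) :
			pread(log_fd, data, count, offset);
	if (ret < 0)
		return -errno;
	if ((size_t)ret != count)
		return -EIO;	/* treat short I/O as an error */
	return 0;
}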

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 14/17] xfs: use bios directly to read and write the log recovery buffers
  2019-05-21  5:09     ` Christoph Hellwig
@ 2019-05-21 22:24       ` Dave Chinner
  2019-05-22  5:12         ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2019-05-21 22:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Tue, May 21, 2019 at 07:09:43AM +0200, Christoph Hellwig wrote:
> On Tue, May 21, 2019 at 09:32:33AM +1000, Dave Chinner wrote:
> > So while it's easy to drop the uncached buffer API from the kernel
> > side, it leaves me with the question of what API do we use in
> > userspace to provide this same functionality? I suspect that we
> > probably need to separate all this log-to-bio code out into a
> > separate file (e.g. xfs_log_io.[ch]) to leave a set of API stubs
> > that we can reimplement in userspace to pread/pwrite directly to
> > the log buftarg device fd as I've done already for the buffer
> > code...
> 
> For one we still keep the uncached buffers in xfs_buf.c as we have users
> of that outside of the log code, but I guess that is not what you mean.
> 
> I can split the log recovery code into a separate file, as you said
> it should just be malloc + pread/pwrite in userspace, so implementing
> it should be trivial.

Yeah, the log recovery code should probably be split in three - the
kernel specific IO code/API, the log parsing code (the bit that
finds head/tail and parses it into transactions for recovery) and
then the bit that actually does the recovery. The logprint code in
userspace uses the parsing code, so that's the bit we need to share
with userspace...

> The xlog_sync case is pretty different in the
> kernel as it isn't synchronous, and it also doesn't currently exist in
> userspace.  I'd rather keep that as-is unless you have plans to port
> the logging code to userspace?

That's fine, I have no plans to pull the full logging code into
userspace right now.

> Even in that case we'll probably want
> a different abstraction that maps to aio.

I've got a rough AIO implementation backing the xfs_buf.c code in
userspace already. It works just fine and is massively faster than
the existing code on SSDs, so I don't see a problem with porting IO
code that assumes an AIO model anymore. i.e. Re-using the kernel AIO
model for all the buffer code in userspace is one of the reasons I'm
porting xfs_buf.c to userspace.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 14/17] xfs: use bios directly to read and write the log recovery buffers
  2019-05-21 22:24       ` Dave Chinner
@ 2019-05-22  5:12         ` Christoph Hellwig
  2019-05-22  6:19           ` Dave Chinner
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-22  5:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs

On Wed, May 22, 2019 at 08:24:34AM +1000, Dave Chinner wrote:
> Yeah, the log recovery code should probably be split in three - the
> kernel specific IO code/API, the log parsing code (the bit that
> finds head/tail and parses it into transactions for recovery) and
> then the bit that actually does the recovery. The logprint code in
> userspace uses the parsing code, so that's the bit we need to share
> with userspace...

Actually one thing I have on my TODO list is to move the log item type
specific recovery code first into an ops vector, and then out to the
xfs_*_item.c together with the code creating those items.  That isn't
really all of the recovery code, but it seems like a useful split.
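
Roughly the shape I have in mind, as a sketch only - the member names
here are made up and the real thing may well end up looking different:

struct xlog_recover_item_ops {
	uint16_t	item_type;		/* XFS_LI_* type handled */

	/* pass 1: note cancelled buffers, quotaoff state, etc. */
	int		(*commit_pass1)(struct xlog *log,
					struct xlog_recover_item *item);

	/* pass 2: replay the item into the filesystem */
	int		(*commit_pass2)(struct xlog *log,
					struct list_head *buffer_list,
					struct xlog_recover_item *item,
					xfs_lsn_t current_lsn);
};

/* each xfs_*_item.c would then provide its own vector, e.g.: */
extern const struct xlog_recover_item_ops xlog_buf_item_ops;
extern const struct xlog_recover_item_ops xlog_inode_item_ops;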

Note that the I/O code isn't really very log specific, it is basically
just trivial I/O to a vmalloc buffer.  In fact I wonder if
I could just generalize it a little more and move it to the block layer.

> I've got a rough AIO implementation backing the xfs_buf.c code in
> userspace already. It works just fine and is massively faster than
> the existing code on SSDs, so I don't see a problem with porting IO
> code that assumes an AIO model anymore. i.e. Re-using the kernel AIO
> model for all the buffer code in userspace is one of the reasons I'm
> porting xfs_buf.c to userspace.

Given that we:

 a) do direct I/O everywhere
 b) tend to do it on either a block device, or a file where we don't
    need to allocate over holes

aio should be a win everywhere.  The only caveat is that CONFIG_AIO
is a kernel option and could be turned off in some low end configs.
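
For reference, the submit/reap split in its simplest libaio form is
roughly this much code (a sketch, not the xfsprogs implementation; the
buffer must be suitably aligned for O_DIRECT and error handling is
abbreviated):

#include <libaio.h>
#include <errno.h>

static int
aio_read_one(int fd, void *buf, size_t count, off_t offset)
{
	io_context_t	ctx = 0;
	struct iocb	iocb, *iocbp = &iocb;
	struct io_event	event;
	int		error;

	error = io_setup(1, &ctx);
	if (error)
		return error;

	io_prep_pread(&iocb, fd, buf, count, offset);
	error = io_submit(ctx, 1, &iocbp);
	if (error == 1) {
		/* real code would reap the completion asynchronously */
		error = io_getevents(ctx, 1, 1, &event, NULL) == 1 ?
				0 : -EIO;
	} else if (error >= 0) {
		error = -EIO;
	}
	io_destroy(ctx);
	return error;
}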

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 14/17] xfs: use bios directly to read and write the log recovery buffers
  2019-05-22  5:12         ` Christoph Hellwig
@ 2019-05-22  6:19           ` Dave Chinner
  2019-05-22 17:31             ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2019-05-22  6:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, May 22, 2019 at 07:12:14AM +0200, Christoph Hellwig wrote:
> On Wed, May 22, 2019 at 08:24:34AM +1000, Dave Chinner wrote:
> > Yeah, the log recovery code should probably be split in three - the
> > kernel specific IO code/API, the log parsing code (the bit that
> > finds head/tail and parses it into transactions for recovery) and
> > then the bit that actually does the recovery. The logprint code in
> > userspace uses the parsing code, so that's the bit we need to share
> > with userspace...
> 
> Actually one thing I have on my TODO list is to move the log item type
> specific recovery code first into an ops vector, and then out to the
> xfs_*_item.c together with the code creating those items.  That isn't
> really all of the recovery code, but it seems like a useful split.

Sounds like the right place to me - it's roughly where I had in mind
to split the code as it's not until logprint decodes the
transactions and needs to parse the individual log items that it
diverges from the kernel code. So just having a set of op vectors
that we can supply from userspace to implement logprint would make
it much simpler....

> Note that the I/O code isn't really very log specific, it is basically
> just trivial I/O to a vmalloc buffer.  In fact I wonder if
> I could just generalize it a little more and move it to the block layer.

Yeah, it's not complex, just different to userspace, which is why
I thought having a simple API between it and the kernel log
code would make it easy to port...

> > I've got a rough AIO implementation backing the xfs_buf.c code in
> > userspace already. It works just fine and is massively faster than
> > the existing code on SSDs, so I don't see a problem with porting IO
> > code that assumes an AIO model anymore. i.e. Re-using the kernel AIO
> > model for all the buffer code in userspace is one of the reasons I'm
> > porting xfs_buf.c to userspace.
> 
> Given that we:
> 
>  a) do direct I/O everywhere
>  b) tend to do it on either a block device, or a file where we don't
>     need to allocate over holes
> 
> aio should be a win everywhere.

So far it is, but I haven't tested on spinning disks so I can't say
for certain that it is a win there. The biggest difference for SSDs
is that we completely bypass the prefetching code and so the
buffer cache memory footprint goes way down. Hence we save huge
amounts of CPU by avoiding allocating, freeing and faulting in
memory, so we essentially stop bashing on mmap_sem and being
limited by its contention.

> The only caveat is that CONFIG_AIO
> is a kernel option and could be turned off in some low end configs.

Should be trivial to add a configure option to turn it off and
have the IO code just call pread/pwrite directly and run the
completions synchronously. That's kind of how I'm building up the
patchset, anyway - AIO doesn't come along until after the xfs_buf.c
infrastructure is in place doing sync IO. I'll make a note to add a
--disable-aio config option when I get there....
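
i.e. the sync fallback is no more than something like this (a sketch
with made-up names, assuming <unistd.h>, <errno.h> and <stdbool.h>):

struct xbt_io {
	int	fd;
	void	*buf;
	size_t	count;
	off_t	offset;
	bool	write;
	void	(*done)(struct xbt_io *io, int error);
};

static int
xbt_io_submit(struct xbt_io *io)
{
	ssize_t	ret;

	ret = io->write ? pwrite(io->fd, io->buf, io->count, io->offset) :
			  pread(io->fd, io->buf, io->count, io->offset);

	/* run the completion synchronously instead of reaping AIO events */
	io->done(io, ret == (ssize_t)io->count ? 0 : -EIO);
	return 0;
}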

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 14/17] xfs: use bios directly to read and write the log recovery buffers
  2019-05-22  6:19           ` Dave Chinner
@ 2019-05-22 17:31             ` Christoph Hellwig
  2019-05-22 23:28               ` Dave Chinner
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-22 17:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs

Does this look fine to you?

http://git.infradead.org/users/hch/xfs.git/commitdiff/8d9600456c6674453dddd5bf36512658c39d7207

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 14/17] xfs: use bios directly to read and write the log recovery buffers
  2019-05-22 17:31             ` Christoph Hellwig
@ 2019-05-22 23:28               ` Dave Chinner
  2019-05-23  6:22                 ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2019-05-22 23:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, May 22, 2019 at 07:31:32PM +0200, Christoph Hellwig wrote:
> Does this look fine to you?
> 
> http://git.infradead.org/users/hch/xfs.git/commitdiff/8d9600456c6674453dddd5bf36512658c39d7207

Yeah, that'll make it heaps easier to deal with. I'd rename "bp" in
that patch to something like "bufp" so it's not easily confused with
a xfs_buf bp, but otherwise it looks good.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 14/17] xfs: use bios directly to read and write the log recovery buffers
  2019-05-22 23:28               ` Dave Chinner
@ 2019-05-23  6:22                 ` Christoph Hellwig
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-05-23  6:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, linux-xfs

On Thu, May 23, 2019 at 09:28:59AM +1000, Dave Chinner wrote:
> Yeah, that'll make it heaps easier to deal with. I'd rename "bp" in
> that patch to something like "bufp" so it's not easily confused with
> a xfs_buf bp, but otherwise it looks good.

I had that in my original unpublished version, but it seemed to create
a lot of churn.  I'll throw it in as an additional cleanup patch at the
end.

^ permalink raw reply	[flat|nested] 26+ messages in thread
