* [PATCH RFC 0/6] xfs: help mkfs shed its AG initialization code
@ 2019-06-21 19:56 Darrick J. Wong
  2019-06-21 19:57 ` [PATCH 1/6] xfs: refactor free space btree record initialization Darrick J. Wong
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Darrick J. Wong @ 2019-06-21 19:56 UTC
  To: sandeen, darrick.wong; +Cc: linux-xfs

Hi all,

In this series, we start by adapting the libxfs AG construction code to
be aware that an internal log can be placed within an AG that is being
initialized.  This is necessary before we can refactor mkfs to use the
libxfs AG construction functions instead of its own open-coded
initialization code.

In userspace, the next thing we have to do is to fix the uncached buffer
code so that libxfs_putbuf won't try to suck uncached buffers into the
buffer cache, and then fix delwri_{queue,submit} so that I/O errors are
returned by the submit function.

The final patch in the xfsprogs series replaces all of mkfs' AG
initialization functions with a single call to the functions in libxfs.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=mkfs-refactor

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=mkfs-refactor

* [PATCH 1/6] xfs: refactor free space btree record initialization
  2019-06-21 19:56 [PATCH RFC 0/6] xfs: help mkfs shed its AG initialization code Darrick J. Wong
@ 2019-06-21 19:57 ` Darrick J. Wong
  2019-06-25 10:37   ` Christoph Hellwig
  2019-06-21 19:57 ` [PATCH 2/6] xfs: account for log space when formatting new AGs Darrick J. Wong
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2019-06-21 19:57 UTC
  To: sandeen, darrick.wong; +Cc: linux-xfs, Allison Collins

From: Darrick J. Wong <darrick.wong@oracle.com>

Refactor the code that populates the free space btrees of a new AG so
that we can avoid code duplication once things start getting
complicated.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
---
 libxfs/xfs_ag.c |   27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)


diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
index 8ee45699..fe79693e 100644
--- a/libxfs/xfs_ag.c
+++ b/libxfs/xfs_ag.c
@@ -57,37 +57,42 @@ xfs_btroot_init(
 	xfs_btree_init_block(mp, bp, id->type, 0, 0, id->agno);
 }
 
-/*
- * Alloc btree root block init functions
- */
+/* Finish initializing a free space btree. */
 static void
-xfs_bnoroot_init(
+xfs_freesp_init_recs(
 	struct xfs_mount	*mp,
 	struct xfs_buf		*bp,
 	struct aghdr_init_data	*id)
 {
 	struct xfs_alloc_rec	*arec;
 
-	xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 1, id->agno);
 	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
 	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
 	arec->ar_blockcount = cpu_to_be32(id->agsize -
 					  be32_to_cpu(arec->ar_startblock));
 }
 
+/*
+ * Alloc btree root block init functions
+ */
 static void
-xfs_cntroot_init(
+xfs_bnoroot_init(
 	struct xfs_mount	*mp,
 	struct xfs_buf		*bp,
 	struct aghdr_init_data	*id)
 {
-	struct xfs_alloc_rec	*arec;
+	xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 1, id->agno);
+	xfs_freesp_init_recs(mp, bp, id);
+}
 
+static void
+xfs_cntroot_init(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*bp,
+	struct aghdr_init_data	*id)
+{
 	xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 1, id->agno);
-	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
-	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
-	arec->ar_blockcount = cpu_to_be32(id->agsize -
-					  be32_to_cpu(arec->ar_startblock));
+	xfs_freesp_init_recs(mp, bp, id);
 }
 
 /*

* [PATCH 2/6] xfs: account for log space when formatting new AGs
  2019-06-21 19:56 [PATCH RFC 0/6] xfs: help mkfs shed its AG initialization code Darrick J. Wong
  2019-06-21 19:57 ` [PATCH 1/6] xfs: refactor free space btree record initialization Darrick J. Wong
@ 2019-06-21 19:57 ` Darrick J. Wong
  2019-06-25 10:39   ` Christoph Hellwig
  2019-06-21 19:57 ` [PATCH 3/6] libxfs: fix uncached buffer refcounting Darrick J. Wong
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2019-06-21 19:57 UTC
  To: sandeen, darrick.wong; +Cc: linux-xfs, Allison Collins

From: Darrick J. Wong <darrick.wong@oracle.com>

When we're writing out a new AG, make sure that we don't list an
internal log as free space and that we create the rmap record for the
log region.  growfs never needs this (the AGs it adds never contain the
internal log), but we will need it when we hook up mkfs.
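The free space record layout that xfs_freesp_init_recs() produces after this change can be modelled in isolation. The sketch below is a hypothetical, host-endian reimplementation (struct extent and freesp_init_recs() are illustrative names, not libxfs API): given the AG size, the number of preallocated header blocks and, optionally, an internal log that may be stripe-aligned away from the headers, it returns the free extents of the new AG.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_alloc_rec, in host byte order. */
struct extent {
	uint32_t	start;
	uint32_t	count;
};

/*
 * Fill recs[] (room for two) with the free extents of a fresh AG and return
 * how many records are valid.  log_start/log_blocks describe an internal log
 * in this AG (AG-relative block numbers); pass log_blocks == 0 if the log
 * lives elsewhere.
 */
static int
freesp_init_recs(uint32_t agsize, uint32_t prealloc, uint32_t log_start,
		 uint32_t log_blocks, struct extent recs[2])
{
	struct extent	*arec = &recs[0];
	int		nrecs = 1;

	arec->start = prealloc;
	if (log_blocks > 0) {
		/* Mirrors the ASSERT() in the patch. */
		assert(log_start >= prealloc);
		if (log_start != prealloc) {
			/* Free padding between the AG headers and the log. */
			arec->count = log_start - prealloc;
			arec = &recs[1];
			arec->start = log_start;
			nrecs++;
		}
		/* Move the (last) record's start past the log. */
		arec->start += log_blocks;
	}

	assert(arec->start <= agsize);
	arec->count = agsize - arec->start;

	/*
	 * As in the patch: if the log consumed the rest of the AG, report an
	 * empty block instead of exposing a zero-length record.
	 */
	if (arec->count == 0)
		nrecs = 0;
	return nrecs;
}
```

For a 1000-block AG with 8 header blocks and a 100-block log starting at AG block 16, this yields two records, {start 8, count 8} and {start 116, count 884}; with the log starting right after the headers there is a single record, {start 108, count 892}.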

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
---
 libxfs/xfs_ag.c |   66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)


diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
index fe79693e..237d6c53 100644
--- a/libxfs/xfs_ag.c
+++ b/libxfs/xfs_ag.c
@@ -11,6 +11,7 @@
 #include "xfs_shared.h"
 #include "xfs_format.h"
 #include "xfs_trans_resv.h"
+#include "xfs_bit.h"
 #include "xfs_sb.h"
 #include "xfs_mount.h"
 #include "xfs_btree.h"
@@ -45,6 +46,12 @@ xfs_get_aghdr_buf(
 	return bp;
 }
 
+static inline bool is_log_ag(struct xfs_mount *mp, struct aghdr_init_data *id)
+{
+	return mp->m_sb.sb_logstart > 0 &&
+	       id->agno == XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart);
+}
+
 /*
  * Generic btree root block init function
  */
@@ -65,11 +72,50 @@ xfs_freesp_init_recs(
 	struct aghdr_init_data	*id)
 {
 	struct xfs_alloc_rec	*arec;
+	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
 
 	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
 	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
+
+	if (is_log_ag(mp, id)) {
+		struct xfs_alloc_rec	*nrec;
+		xfs_agblock_t		start = XFS_FSB_TO_AGBNO(mp,
+							mp->m_sb.sb_logstart);
+
+		ASSERT(start >= mp->m_ag_prealloc_blocks);
+		if (start != mp->m_ag_prealloc_blocks) {
+			/*
+			 * Modify first record to pad stripe align of log
+			 */
+			arec->ar_blockcount = cpu_to_be32(start -
+						mp->m_ag_prealloc_blocks);
+			nrec = arec + 1;
+			/*
+			 * Insert second record at start of internal log
+			 * which then gets trimmed.
+			 */
+			nrec->ar_startblock = cpu_to_be32(
+					be32_to_cpu(arec->ar_startblock) +
+					be32_to_cpu(arec->ar_blockcount));
+			arec = nrec;
+			be16_add_cpu(&block->bb_numrecs, 1);
+		}
+		/*
+		 * Change record start to after the internal log
+		 */
+		be32_add_cpu(&arec->ar_startblock, mp->m_sb.sb_logblocks);
+	}
+
+	/*
+	 * Calculate the record block count and check for the case where
+	 * the log might have consumed all available space in the AG. If
+	 * so, reset the record count to 0 to avoid exposure of an invalid
+	 * record start block.
+	 */
 	arec->ar_blockcount = cpu_to_be32(id->agsize -
 					  be32_to_cpu(arec->ar_startblock));
+	if (!arec->ar_blockcount)
+		block->bb_numrecs = 0;
 }
 
 /*
@@ -155,6 +201,18 @@ xfs_rmaproot_init(
 		rrec->rm_offset = 0;
 		be16_add_cpu(&block->bb_numrecs, 1);
 	}
+
+	/* account for the log space */
+	if (is_log_ag(mp, id)) {
+		rrec = XFS_RMAP_REC_ADDR(block,
+				be16_to_cpu(block->bb_numrecs) + 1);
+		rrec->rm_startblock = cpu_to_be32(
+				XFS_FSB_TO_AGBNO(mp, mp->m_sb.sb_logstart));
+		rrec->rm_blockcount = cpu_to_be32(mp->m_sb.sb_logblocks);
+		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_LOG);
+		rrec->rm_offset = 0;
+		be16_add_cpu(&block->bb_numrecs, 1);
+	}
 }
 
 /*
@@ -215,6 +273,14 @@ xfs_agfblock_init(
 		agf->agf_refcount_level = cpu_to_be32(1);
 		agf->agf_refcount_blocks = cpu_to_be32(1);
 	}
+
+	if (is_log_ag(mp, id)) {
+		int64_t	logblocks = mp->m_sb.sb_logblocks;
+
+		be32_add_cpu(&agf->agf_freeblks, -logblocks);
+		agf->agf_longest = cpu_to_be32(id->agsize -
+			XFS_FSB_TO_AGBNO(mp, mp->m_sb.sb_logstart) - logblocks);
+	}
 }
 
 static void

* [PATCH 3/6] libxfs: fix uncached buffer refcounting
  2019-06-21 19:56 [PATCH RFC 0/6] xfs: help mkfs shed its AG initialization code Darrick J. Wong
  2019-06-21 19:57 ` [PATCH 1/6] xfs: refactor free space btree record initialization Darrick J. Wong
  2019-06-21 19:57 ` [PATCH 2/6] xfs: account for log space when formatting new AGs Darrick J. Wong
@ 2019-06-21 19:57 ` Darrick J. Wong
  2019-06-25 10:39   ` Christoph Hellwig
  2019-06-21 19:57 ` [PATCH 4/6] libxfs: fix buffer refcounting in delwri_queue Darrick J. Wong
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2019-06-21 19:57 UTC
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Currently, uncached buffers in userspace are created with zero refcount
and are fed to cache_node_put when they're released.  This is totally
broken -- the refcount should be 1 (because the caller now holds a
reference) and we should never be dumping uncached buffers into the
cache.  Fix both of these problems.
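A toy model of the fixed buffer lifecycle follows. It is a sketch under simplifying assumptions: struct toy_buf, its cached flag (standing in for a non-empty b_node.cn_hash) and the helper names are all hypothetical, and the cached path only approximates what cache_node_put() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical miniature of struct xfs_buf: only what refcounting needs. */
struct toy_buf {
	bool	cached;		/* stands in for a non-empty b_node.cn_hash */
	int	refcount;	/* stands in for b_node.cn_count */
};

static int toy_freed;		/* buffers handed back to the allocator */

/* Uncached buffers start with one reference: the caller's. */
static struct toy_buf *
toy_get_uncached(void)
{
	struct toy_buf	*bp = calloc(1, sizeof(*bp));

	if (bp)
		bp->refcount = 1;
	return bp;
}

/* Cached buffers stay in the cache; uncached ones are freed at zero. */
static void
toy_putbuf(struct toy_buf *bp)
{
	assert(bp->refcount > 0);
	if (bp->cached) {
		/* Roughly cache_node_put(): drop the ref, keep the node. */
		bp->refcount--;
		return;
	}
	if (--bp->refcount == 0) {
		free(bp);
		toy_freed++;
	}
}
```

The uncached buffer is released only when its last holder puts it, and it never passes through the cache on the way out.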

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/libxfs_io.h   |   18 ++++++++++++++++++
 libxfs/libxfs_priv.h |    2 --
 libxfs/rdwr.c        |    5 ++++-
 3 files changed, 22 insertions(+), 3 deletions(-)


diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h
index 94261270..b6f8756a 100644
--- a/libxfs/libxfs_io.h
+++ b/libxfs/libxfs_io.h
@@ -225,4 +225,22 @@ xfs_buf_associate_memory(struct xfs_buf *bp, void *mem, size_t len)
 	return 0;
 }
 
+/*
+ * Allocate an uncached buffer that points nowhere.  The refcount will be 1,
+ * and the cache node hash list will be empty to indicate that it's uncached.
+ */
+static inline struct xfs_buf *
+xfs_buf_get_uncached(struct xfs_buftarg *targ, size_t bblen, int flags)
+{
+	struct xfs_buf	*bp;
+
+	bp = libxfs_getbufr(targ, XFS_BUF_DADDR_NULL, bblen);
+	if (!bp)
+		return NULL;
+
+	INIT_LIST_HEAD(&bp->b_node.cn_hash);
+	bp->b_node.cn_count = 1;
+	return bp;
+}
+
 #endif	/* __LIBXFS_IO_H__ */
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 9004ff0c..eb92942a 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -378,8 +378,6 @@ roundup_64(uint64_t x, uint32_t y)
 	(len) = __bar; /* no set-but-unused warning */	\
 	NULL;						\
 })
-#define xfs_buf_get_uncached(t,n,f)     \
-	libxfs_getbufr((t), XFS_BUF_DADDR_NULL, (n));
 #define xfs_buf_relse(bp)		libxfs_putbuf(bp)
 #define xfs_buf_get(devp,blkno,len)	(libxfs_getbuf((devp), (blkno), (len)))
 #define xfs_bwrite(bp)			libxfs_writebuf((bp), 0)
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 998862af..9c6c93a7 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -866,7 +866,10 @@ libxfs_putbuf(xfs_buf_t *bp)
 		}
 	}
 
-	cache_node_put(libxfs_bcache, (struct cache_node *)bp);
+	if (!list_empty(&bp->b_node.cn_hash))
+		cache_node_put(libxfs_bcache, (struct cache_node *)bp);
+	else if (--bp->b_node.cn_count == 0)
+		libxfs_putbufr(bp);
 }
 
 void

* [PATCH 4/6] libxfs: fix buffer refcounting in delwri_queue
  2019-06-21 19:56 [PATCH RFC 0/6] xfs: help mkfs shed its AG initialization code Darrick J. Wong
                   ` (2 preceding siblings ...)
  2019-06-21 19:57 ` [PATCH 3/6] libxfs: fix uncached buffer refcounting Darrick J. Wong
@ 2019-06-21 19:57 ` Darrick J. Wong
  2019-06-25 10:40   ` Christoph Hellwig
  2019-06-21 19:57 ` [PATCH 5/6] libxfs: make xfs_buf_delwri_submit actually do something Darrick J. Wong
  2019-06-21 19:57 ` [PATCH 6/6] mkfs: use libxfs to write out new AGs Darrick J. Wong
  5 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2019-06-21 19:57 UTC
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

In the kernel, xfs_buf_delwri_queue increments the buffer reference
count before putting the buffer on the buffer list, and the refcount is
decremented after the I/O completes, for a net refcount change of zero.

In userspace, delwri_queue calls libxfs_writebuf, which puts the buffer,
and delwri_submit is a no-op, for a net refcount change of -1.  This
creates problems for any callers that expect a net change of zero, so
increment the buffer refcount before calling writebuf.
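The reference accounting can be checked with a few lines of code. In this hypothetical model, toy_writebuf() plays the role of libxfs_writebuf() only in that it drops one reference; taking an extra reference in the queue helper leaves the caller's count unchanged.

```c
#include <assert.h>

/* Hypothetical buffer with just a refcount and a write counter. */
struct toy_buf {
	int	refcount;
	int	writes;
};

/* Like libxfs_writebuf(): handles the write, then puts one reference. */
static int
toy_writebuf(struct toy_buf *bp)
{
	assert(bp->refcount > 0);
	bp->writes++;
	bp->refcount--;
	return 0;
}

/* Patched delwri_queue: take a reference first so the put is balanced. */
static void
toy_delwri_queue(struct toy_buf *bp)
{
	bp->refcount++;
	toy_writebuf(bp);
}
```

Without the increment, a caller holding the only reference would lose it inside toy_delwri_queue(), which is the -1 net change described above.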

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/libxfs_io.h   |    7 +++++++
 libxfs/libxfs_priv.h |    1 -
 2 files changed, 7 insertions(+), 1 deletion(-)


diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h
index b6f8756a..c47a435d 100644
--- a/libxfs/libxfs_io.h
+++ b/libxfs/libxfs_io.h
@@ -243,4 +243,11 @@ xfs_buf_get_uncached(struct xfs_buftarg *targ, size_t bblen, int flags)
 	return bp;
 }
 
+static inline void
+xfs_buf_delwri_queue(struct xfs_buf *bp, struct list_head *buffer_list)
+{
+	bp->b_node.cn_count++;
+	libxfs_writebuf(bp, 0);
+}
+
 #endif	/* __LIBXFS_IO_H__ */
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index eb92942a..fd420f4f 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -381,7 +381,6 @@ roundup_64(uint64_t x, uint32_t y)
 #define xfs_buf_relse(bp)		libxfs_putbuf(bp)
 #define xfs_buf_get(devp,blkno,len)	(libxfs_getbuf((devp), (blkno), (len)))
 #define xfs_bwrite(bp)			libxfs_writebuf((bp), 0)
-#define xfs_buf_delwri_queue(bp, bl)	libxfs_writebuf((bp), 0)
 #define xfs_buf_delwri_submit(bl)	(0)
 #define xfs_buf_oneshot(bp)		((void) 0)
 

* [PATCH 5/6] libxfs: make xfs_buf_delwri_submit actually do something
  2019-06-21 19:56 [PATCH RFC 0/6] xfs: help mkfs shed its AG initialization code Darrick J. Wong
                   ` (3 preceding siblings ...)
  2019-06-21 19:57 ` [PATCH 4/6] libxfs: fix buffer refcounting in delwri_queue Darrick J. Wong
@ 2019-06-21 19:57 ` Darrick J. Wong
  2019-06-25 10:41   ` Christoph Hellwig
  2019-06-21 19:57 ` [PATCH 6/6] mkfs: use libxfs to write out new AGs Darrick J. Wong
  5 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2019-06-21 19:57 UTC
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

xfs_buf_delwri_queue doesn't report errors, which means that if the
buffer write fails we have no way of knowing that something bad
happened.  In the kernel we queue and then submit buffers, and the
submit call communicates errors to callers.  Do the same here since
we're going to start using the AG header initialization functions, which
use delwri_{queue,submit} heavily.
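The queue/submit contract can be sketched with a plain singly linked list in place of struct list_head. Everything here is hypothetical (toy_buf, toy_list, and a simulated per-buffer write that fails with a preset error); what it is meant to show is the shape of the submit loop in the patch's xfs_buf_delwri_submit(): keep writing after a failure, consume the list, and return the first error seen.

```c
#include <assert.h>
#include <stddef.h>

struct toy_buf {
	struct toy_buf	*next;		/* stands in for b_list */
	int		refcount;
	int		fail_with;	/* error the simulated write returns */
	int		written;
};

struct toy_list {
	struct toy_buf	*head;
	struct toy_buf	**tail;
};

static void
toy_list_init(struct toy_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

/* Queue: take a reference and append; no I/O happens yet. */
static void
toy_delwri_queue(struct toy_buf *bp, struct toy_list *l)
{
	bp->refcount++;
	bp->next = NULL;
	*l->tail = bp;
	l->tail = &bp->next;
}

/* Simulated per-buffer write: drops the queue's reference. */
static int
toy_write(struct toy_buf *bp)
{
	bp->written = 1;
	bp->refcount--;
	return bp->fail_with;
}

/* Submit: write every buffer, empty the list, return the first error. */
static int
toy_delwri_submit(struct toy_list *l)
{
	struct toy_buf	*bp = l->head;
	int		error = 0;

	while (bp) {
		struct toy_buf	*next = bp->next;
		int		error2;

		bp->next = NULL;
		error2 = toy_write(bp);
		if (!error)
			error = error2;
		bp = next;
	}
	toy_list_init(l);
	return error;
}
```

Because the loop never stops early, one bad buffer does not leave the rest of the queue unwritten or holding the extra reference taken at queue time.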

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/libxfs_io.h   |    6 +++++-
 libxfs/libxfs_priv.h |    1 -
 libxfs/rdwr.c        |   25 +++++++++++++++++++++++++
 3 files changed, 30 insertions(+), 2 deletions(-)


diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h
index c47a435d..d033faad 100644
--- a/libxfs/libxfs_io.h
+++ b/libxfs/libxfs_io.h
@@ -70,6 +70,7 @@ typedef struct xfs_buf {
 	struct xfs_buf_map	*b_maps;
 	struct xfs_buf_map	__b_map;
 	int			b_nmaps;
+	struct list_head	b_list;
 #ifdef XFS_BUF_TRACING
 	struct list_head	b_lock_list;
 	const char		*b_func;
@@ -243,11 +244,14 @@ xfs_buf_get_uncached(struct xfs_buftarg *targ, size_t bblen, int flags)
 	return bp;
 }
 
+/* Push a single buffer on a delwri queue. */
 static inline void
 xfs_buf_delwri_queue(struct xfs_buf *bp, struct list_head *buffer_list)
 {
 	bp->b_node.cn_count++;
-	libxfs_writebuf(bp, 0);
+	list_add_tail(&bp->b_list, buffer_list);
 }
 
+int xfs_buf_delwri_submit(struct list_head *buffer_list);
+
 #endif	/* __LIBXFS_IO_H__ */
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index fd420f4f..0233393d 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -381,7 +381,6 @@ roundup_64(uint64_t x, uint32_t y)
 #define xfs_buf_relse(bp)		libxfs_putbuf(bp)
 #define xfs_buf_get(devp,blkno,len)	(libxfs_getbuf((devp), (blkno), (len)))
 #define xfs_bwrite(bp)			libxfs_writebuf((bp), 0)
-#define xfs_buf_delwri_submit(bl)	(0)
 #define xfs_buf_oneshot(bp)		((void) 0)
 
 #define XBRW_READ			LIBXFS_BREAD
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 9c6c93a7..dd582a4e 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -1472,3 +1472,28 @@ libxfs_irele(
 	libxfs_idestroy(ip);
 	kmem_zone_free(xfs_inode_zone, ip);
 }
+
+/*
+ * Write out a buffer list synchronously.
+ *
+ * This will take the @buffer_list, write all buffers out and wait for I/O
+ * completion on all of the buffers. @buffer_list is consumed by the function,
+ * so callers must have some other way of tracking buffers if they require such
+ * functionality.
+ */
+int
+xfs_buf_delwri_submit(
+	struct list_head	*buffer_list)
+{
+	struct xfs_buf		*bp, *n;
+	int			error = 0, error2;
+
+	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
+		list_del_init(&bp->b_list);
+		error2 = libxfs_writebuf(bp, 0);
+		if (!error)
+			error = error2;
+	}
+
+	return error;
+}

* [PATCH 6/6] mkfs: use libxfs to write out new AGs
  2019-06-21 19:56 [PATCH RFC 0/6] xfs: help mkfs shed its AG initialization code Darrick J. Wong
                   ` (4 preceding siblings ...)
  2019-06-21 19:57 ` [PATCH 5/6] libxfs: make xfs_buf_delwri_submit actually do something Darrick J. Wong
@ 2019-06-21 19:57 ` Darrick J. Wong
  2019-06-25 10:41   ` Christoph Hellwig
  2019-07-01 12:25   ` Dave Chinner
  5 siblings, 2 replies; 15+ messages in thread
From: Darrick J. Wong @ 2019-06-21 19:57 UTC
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Use the libxfs AG initialization functions to write out the new
filesystem instead of open-coding everything.
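After this change, the per-AG geometry mkfs still computes itself is the AG number and size it passes in struct aghdr_init_data. A sketch of the size computation, with a hypothetical helper name: every AG is agsize blocks except the last, which receives whatever remains of dblocks, as in initialise_ag_headers().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: size in blocks of AG number agno.  All AGs are
 * agsize blocks long except the last one, which gets the remainder.
 */
static uint64_t
ag_blocks(uint64_t dblocks, uint64_t agsize, uint32_t agcount, uint32_t agno)
{
	assert(agno < agcount);
	if (agno == agcount - 1)
		return dblocks - (uint64_t)agno * agsize;
	return agsize;
}
```

For example, a 1000-block filesystem with 300-block AGs and an agcount of 4 gets AGs of 300, 300, 300 and 100 blocks.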

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/libxfs.h         |    1 
 libxfs/libxfs_api_defs.h |    3 
 mkfs/xfs_mkfs.c          |  359 +++-------------------------------------------
 3 files changed, 29 insertions(+), 334 deletions(-)


diff --git a/include/libxfs.h b/include/libxfs.h
index dd5fe542..3bf7feab 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -63,6 +63,7 @@ extern uint32_t crc32c_le(uint32_t crc, unsigned char const *p, size_t len);
 #include "xfs_bmap.h"
 #include "xfs_trace.h"
 #include "xfs_trans.h"
+#include "xfs_ag.h"
 #include "xfs_rmap_btree.h"
 #include "xfs_rmap.h"
 #include "xfs_refcount_btree.h"
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 0ae21318..645c9b1b 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -154,4 +154,7 @@
 #define LIBXFS_ATTR_CREATE		ATTR_CREATE
 #define LIBXFS_ATTR_REPLACE		ATTR_REPLACE
 
+#define xfs_ag_init_headers		libxfs_ag_init_headers
+#define xfs_buf_delwri_submit		libxfs_buf_delwri_submit
+
 #endif /* __LIBXFS_API_DEFS_H__ */
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 56ba5379..8a44bb98 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -3431,349 +3431,30 @@ initialise_ag_headers(
 	struct xfs_mount	*mp,
 	struct xfs_sb		*sbp,
 	xfs_agnumber_t		agno,
-	int			*worst_freelist)
+	int			*worst_freelist,
+	struct list_head	*buffer_list)
 {
+	struct aghdr_init_data	id = {
+		.agno		= agno,
+		.agsize		= cfg->agsize,
+	};
 	struct xfs_perag	*pag = libxfs_perag_get(mp, agno);
-	struct xfs_agfl		*agfl;
-	struct xfs_agf		*agf;
-	struct xfs_agi		*agi;
-	struct xfs_buf		*buf;
-	struct xfs_btree_block	*block;
-	struct xfs_alloc_rec	*arec;
-	struct xfs_alloc_rec	*nrec;
-	int			bucket;
-	uint64_t		agsize = cfg->agsize;
-	xfs_agblock_t		agblocks;
-	bool			is_log_ag = false;
-	int			c;
-
-	if (cfg->loginternal && agno == cfg->logagno)
-		is_log_ag = true;
-
-	/*
-	 * Superblock.
-	 */
-	buf = libxfs_getbuf(mp->m_ddev_targp,
-			XFS_AG_DADDR(mp, agno, XFS_SB_DADDR),
-			XFS_FSS_TO_BB(mp, 1));
-	buf->b_ops = &xfs_sb_buf_ops;
-	memset(buf->b_addr, 0, cfg->sectorsize);
-	libxfs_sb_to_disk(buf->b_addr, sbp);
-	libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
+	int			error;
 
-	/*
-	 * AG header block: freespace
-	 */
-	buf = libxfs_getbuf(mp->m_ddev_targp,
-			XFS_AG_DADDR(mp, agno, XFS_AGF_DADDR(mp)),
-			XFS_FSS_TO_BB(mp, 1));
-	buf->b_ops = &xfs_agf_buf_ops;
-	agf = XFS_BUF_TO_AGF(buf);
-	memset(agf, 0, cfg->sectorsize);
 	if (agno == cfg->agcount - 1)
-		agsize = cfg->dblocks - (xfs_rfsblock_t)(agno * agsize);
-	agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC);
-	agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION);
-	agf->agf_seqno = cpu_to_be32(agno);
-	agf->agf_length = cpu_to_be32(agsize);
-	agf->agf_roots[XFS_BTNUM_BNOi] = cpu_to_be32(XFS_BNO_BLOCK(mp));
-	agf->agf_roots[XFS_BTNUM_CNTi] = cpu_to_be32(XFS_CNT_BLOCK(mp));
-	agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(1);
-	agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(1);
-	pag->pagf_levels[XFS_BTNUM_BNOi] = 1;
-	pag->pagf_levels[XFS_BTNUM_CNTi] = 1;
-
-	if (xfs_sb_version_hasrmapbt(sbp)) {
-		agf->agf_roots[XFS_BTNUM_RMAPi] = cpu_to_be32(XFS_RMAP_BLOCK(mp));
-		agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(1);
-		agf->agf_rmap_blocks = cpu_to_be32(1);
-	}
+		id.agsize = cfg->dblocks - (xfs_rfsblock_t)(agno * cfg->agsize);
 
-	if (xfs_sb_version_hasreflink(sbp)) {
-		agf->agf_refcount_root = cpu_to_be32(libxfs_refc_block(mp));
-		agf->agf_refcount_level = cpu_to_be32(1);
-		agf->agf_refcount_blocks = cpu_to_be32(1);
+	INIT_LIST_HEAD(&id.buffer_list);
+	error = -libxfs_ag_init_headers(mp, &id);
+	if (error) {
+		fprintf(stderr, _("AG header init failed, error %d\n"), error);
+		exit(1);
 	}
 
-	agf->agf_flfirst = 0;
-	agf->agf_fllast = cpu_to_be32(libxfs_agfl_size(mp) - 1);
-	agf->agf_flcount = 0;
-	agblocks = (xfs_agblock_t)(agsize - libxfs_prealloc_blocks(mp));
-	agf->agf_freeblks = cpu_to_be32(agblocks);
-	agf->agf_longest = cpu_to_be32(agblocks);
-
-	if (xfs_sb_version_hascrc(sbp))
-		platform_uuid_copy(&agf->agf_uuid, &sbp->sb_uuid);
+	list_splice_tail_init(&id.buffer_list, buffer_list);
 
-	if (is_log_ag) {
-		be32_add_cpu(&agf->agf_freeblks, -(int64_t)cfg->logblocks);
-		agf->agf_longest = cpu_to_be32(agsize -
-			XFS_FSB_TO_AGBNO(mp, cfg->logstart) - cfg->logblocks);
-	}
 	if (libxfs_alloc_min_freelist(mp, pag) > *worst_freelist)
 		*worst_freelist = libxfs_alloc_min_freelist(mp, pag);
-	libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
-
-	/*
-	 * AG freelist header block
-	 */
-	buf = libxfs_getbuf(mp->m_ddev_targp,
-			XFS_AG_DADDR(mp, agno, XFS_AGFL_DADDR(mp)),
-			XFS_FSS_TO_BB(mp, 1));
-	buf->b_ops = &xfs_agfl_buf_ops;
-	agfl = XFS_BUF_TO_AGFL(buf);
-	/* setting to 0xff results in initialisation to NULLAGBLOCK */
-	memset(agfl, 0xff, cfg->sectorsize);
-	if (xfs_sb_version_hascrc(sbp)) {
-		agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
-		agfl->agfl_seqno = cpu_to_be32(agno);
-		platform_uuid_copy(&agfl->agfl_uuid, &sbp->sb_uuid);
-		for (bucket = 0; bucket < libxfs_agfl_size(mp); bucket++)
-			agfl->agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);
-	}
-
-	libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
-
-	/*
-	 * AG header block: inodes
-	 */
-	buf = libxfs_getbuf(mp->m_ddev_targp,
-			XFS_AG_DADDR(mp, agno, XFS_AGI_DADDR(mp)),
-			XFS_FSS_TO_BB(mp, 1));
-	agi = XFS_BUF_TO_AGI(buf);
-	buf->b_ops = &xfs_agi_buf_ops;
-	memset(agi, 0, cfg->sectorsize);
-	agi->agi_magicnum = cpu_to_be32(XFS_AGI_MAGIC);
-	agi->agi_versionnum = cpu_to_be32(XFS_AGI_VERSION);
-	agi->agi_seqno = cpu_to_be32(agno);
-	agi->agi_length = cpu_to_be32(agsize);
-	agi->agi_count = 0;
-	agi->agi_root = cpu_to_be32(XFS_IBT_BLOCK(mp));
-	agi->agi_level = cpu_to_be32(1);
-	if (xfs_sb_version_hasfinobt(sbp)) {
-		agi->agi_free_root = cpu_to_be32(XFS_FIBT_BLOCK(mp));
-		agi->agi_free_level = cpu_to_be32(1);
-	}
-	agi->agi_freecount = 0;
-	agi->agi_newino = cpu_to_be32(NULLAGINO);
-	agi->agi_dirino = cpu_to_be32(NULLAGINO);
-	if (xfs_sb_version_hascrc(sbp))
-		platform_uuid_copy(&agi->agi_uuid, &sbp->sb_uuid);
-	for (c = 0; c < XFS_AGI_UNLINKED_BUCKETS; c++)
-		agi->agi_unlinked[c] = cpu_to_be32(NULLAGINO);
-	libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
-
-	/*
-	 * BNO btree root block
-	 */
-	buf = libxfs_getbuf(mp->m_ddev_targp,
-			XFS_AGB_TO_DADDR(mp, agno, XFS_BNO_BLOCK(mp)),
-			BTOBB(cfg->blocksize));
-	buf->b_ops = &xfs_bnobt_buf_ops;
-	block = XFS_BUF_TO_BLOCK(buf);
-	memset(block, 0, cfg->blocksize);
-	libxfs_btree_init_block(mp, buf, XFS_BTNUM_BNO, 0, 1, agno);
-
-	arec = XFS_ALLOC_REC_ADDR(mp, block, 1);
-	arec->ar_startblock = cpu_to_be32(libxfs_prealloc_blocks(mp));
-	if (is_log_ag) {
-		xfs_agblock_t	start = XFS_FSB_TO_AGBNO(mp, cfg->logstart);
-
-		ASSERT(start >= libxfs_prealloc_blocks(mp));
-		if (start != libxfs_prealloc_blocks(mp)) {
-			/*
-			 * Modify first record to pad stripe align of log
-			 */
-			arec->ar_blockcount = cpu_to_be32(start -
-						libxfs_prealloc_blocks(mp));
-			nrec = arec + 1;
-			/*
-			 * Insert second record at start of internal log
-			 * which then gets trimmed.
-			 */
-			nrec->ar_startblock = cpu_to_be32(
-					be32_to_cpu(arec->ar_startblock) +
-					be32_to_cpu(arec->ar_blockcount));
-			arec = nrec;
-			be16_add_cpu(&block->bb_numrecs, 1);
-		}
-		/*
-		 * Change record start to after the internal log
-		 */
-		be32_add_cpu(&arec->ar_startblock, cfg->logblocks);
-	}
-	/*
-	 * Calculate the record block count and check for the case where
-	 * the log might have consumed all available space in the AG. If
-	 * so, reset the record count to 0 to avoid exposure of an invalid
-	 * record start block.
-	 */
-	arec->ar_blockcount = cpu_to_be32(agsize -
-					  be32_to_cpu(arec->ar_startblock));
-	if (!arec->ar_blockcount)
-		block->bb_numrecs = 0;
-
-	libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
-
-	/*
-	 * CNT btree root block
-	 */
-	buf = libxfs_getbuf(mp->m_ddev_targp,
-			XFS_AGB_TO_DADDR(mp, agno, XFS_CNT_BLOCK(mp)),
-			BTOBB(cfg->blocksize));
-	buf->b_ops = &xfs_cntbt_buf_ops;
-	block = XFS_BUF_TO_BLOCK(buf);
-	memset(block, 0, cfg->blocksize);
-	libxfs_btree_init_block(mp, buf, XFS_BTNUM_CNT, 0, 1, agno);
-
-	arec = XFS_ALLOC_REC_ADDR(mp, block, 1);
-	arec->ar_startblock = cpu_to_be32(libxfs_prealloc_blocks(mp));
-	if (is_log_ag) {
-		xfs_agblock_t	start = XFS_FSB_TO_AGBNO(mp, cfg->logstart);
-
-		ASSERT(start >= libxfs_prealloc_blocks(mp));
-		if (start != libxfs_prealloc_blocks(mp)) {
-			arec->ar_blockcount = cpu_to_be32(start -
-					libxfs_prealloc_blocks(mp));
-			nrec = arec + 1;
-			nrec->ar_startblock = cpu_to_be32(
-					be32_to_cpu(arec->ar_startblock) +
-					be32_to_cpu(arec->ar_blockcount));
-			arec = nrec;
-			be16_add_cpu(&block->bb_numrecs, 1);
-		}
-		be32_add_cpu(&arec->ar_startblock, cfg->logblocks);
-	}
-	/*
-	 * Calculate the record block count and check for the case where
-	 * the log might have consumed all available space in the AG. If
-	 * so, reset the record count to 0 to avoid exposure of an invalid
-	 * record start block.
-	 */
-	arec->ar_blockcount = cpu_to_be32(agsize -
-					  be32_to_cpu(arec->ar_startblock));
-	if (!arec->ar_blockcount)
-		block->bb_numrecs = 0;
-
-	libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
-
-	/*
-	 * refcount btree root block
-	 */
-	if (xfs_sb_version_hasreflink(sbp)) {
-		buf = libxfs_getbuf(mp->m_ddev_targp,
-			XFS_AGB_TO_DADDR(mp, agno, libxfs_refc_block(mp)),
-			BTOBB(cfg->blocksize));
-		buf->b_ops = &xfs_refcountbt_buf_ops;
-
-		block = XFS_BUF_TO_BLOCK(buf);
-		memset(block, 0, cfg->blocksize);
-		libxfs_btree_init_block(mp, buf, XFS_BTNUM_REFC, 0, 0, agno);
-		libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
-	}
-
-	/*
-	 * INO btree root block
-	 */
-	buf = libxfs_getbuf(mp->m_ddev_targp,
-			XFS_AGB_TO_DADDR(mp, agno, XFS_IBT_BLOCK(mp)),
-			BTOBB(cfg->blocksize));
-	buf->b_ops = &xfs_inobt_buf_ops;
-	block = XFS_BUF_TO_BLOCK(buf);
-	memset(block, 0, cfg->blocksize);
-	libxfs_btree_init_block(mp, buf, XFS_BTNUM_INO, 0, 0, agno);
-	libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
-
-	/*
-	 * Free INO btree root block
-	 */
-	if (xfs_sb_version_hasfinobt(sbp)) {
-		buf = libxfs_getbuf(mp->m_ddev_targp,
-				XFS_AGB_TO_DADDR(mp, agno, XFS_FIBT_BLOCK(mp)),
-				BTOBB(cfg->blocksize));
-		buf->b_ops = &xfs_finobt_buf_ops;
-		block = XFS_BUF_TO_BLOCK(buf);
-		memset(block, 0, cfg->blocksize);
-		libxfs_btree_init_block(mp, buf, XFS_BTNUM_FINO, 0, 0, agno);
-		libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
-	}
-
-	/* RMAP btree root block */
-	if (xfs_sb_version_hasrmapbt(sbp)) {
-		struct xfs_rmap_rec	*rrec;
-
-		buf = libxfs_getbuf(mp->m_ddev_targp,
-			XFS_AGB_TO_DADDR(mp, agno, XFS_RMAP_BLOCK(mp)),
-			BTOBB(cfg->blocksize));
-		buf->b_ops = &xfs_rmapbt_buf_ops;
-		block = XFS_BUF_TO_BLOCK(buf);
-		memset(block, 0, cfg->blocksize);
-
-		libxfs_btree_init_block(mp, buf, XFS_BTNUM_RMAP, 0, 0, agno);
-
-		/*
-		 * mark the AG header regions as static metadata
-		 * The BNO btree block is the first block after the
-		 * headers, so it's location defines the size of region
-		 * the static metadata consumes.
-		 */
-		rrec = XFS_RMAP_REC_ADDR(block, 1);
-		rrec->rm_startblock = 0;
-		rrec->rm_blockcount = cpu_to_be32(XFS_BNO_BLOCK(mp));
-		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_FS);
-		rrec->rm_offset = 0;
-		be16_add_cpu(&block->bb_numrecs, 1);
-
-		/* account freespace btree root blocks */
-		rrec = XFS_RMAP_REC_ADDR(block, 2);
-		rrec->rm_startblock = cpu_to_be32(XFS_BNO_BLOCK(mp));
-		rrec->rm_blockcount = cpu_to_be32(2);
-		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
-		rrec->rm_offset = 0;
-		be16_add_cpu(&block->bb_numrecs, 1);
-
-		/* account inode btree root blocks */
-		rrec = XFS_RMAP_REC_ADDR(block, 3);
-		rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
-		rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
-						XFS_IBT_BLOCK(mp));
-		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
-		rrec->rm_offset = 0;
-		be16_add_cpu(&block->bb_numrecs, 1);
-
-		/* account for rmap btree root */
-		rrec = XFS_RMAP_REC_ADDR(block, 4);
-		rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
-		rrec->rm_blockcount = cpu_to_be32(1);
-		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
-		rrec->rm_offset = 0;
-		be16_add_cpu(&block->bb_numrecs, 1);
-
-		/* account for refcount btree root */
-		if (xfs_sb_version_hasreflink(sbp)) {
-			rrec = XFS_RMAP_REC_ADDR(block, 5);
-			rrec->rm_startblock = cpu_to_be32(libxfs_refc_block(mp));
-			rrec->rm_blockcount = cpu_to_be32(1);
-			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
-			rrec->rm_offset = 0;
-			be16_add_cpu(&block->bb_numrecs, 1);
-		}
-
-		/* account for the log space */
-		if (is_log_ag) {
-			rrec = XFS_RMAP_REC_ADDR(block,
-					be16_to_cpu(block->bb_numrecs) + 1);
-			rrec->rm_startblock = cpu_to_be32(
-					XFS_FSB_TO_AGBNO(mp, cfg->logstart));
-			rrec->rm_blockcount = cpu_to_be32(cfg->logblocks);
-			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_LOG);
-			rrec->rm_offset = 0;
-			be16_add_cpu(&block->bb_numrecs, 1);
-		}
-
-		libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
-	}
-
 	libxfs_perag_put(pag);
 }
 
@@ -3896,6 +3577,8 @@ main(
 		},
 	};
 
+	struct list_head	buffer_list;
+
 	platform_uuid_generate(&cli.uuid);
 	progname = basename(argv[0]);
 	setlocale(LC_ALL, "");
@@ -4087,8 +3770,16 @@ main(
 	/*
 	 * Initialise all the static on disk metadata.
 	 */
+	INIT_LIST_HEAD(&buffer_list);
 	for (agno = 0; agno < cfg.agcount; agno++)
-		initialise_ag_headers(&cfg, mp, sbp, agno, &worst_freelist);
+		initialise_ag_headers(&cfg, mp, sbp, agno, &worst_freelist,
+				&buffer_list);
+
+	if (libxfs_buf_delwri_submit(&buffer_list)) {
+		fprintf(stderr, _("%s: writing AG headers failed\n"),
+				progname);
+		exit(1);
+	}
 
 	/*
 	 * Initialise the freespace freelists (i.e. AGFLs) in each AG.

* Re: [PATCH 1/6] xfs: refactor free space btree record initialization
  2019-06-21 19:57 ` [PATCH 1/6] xfs: refactor free space btree record initialization Darrick J. Wong
@ 2019-06-25 10:37   ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2019-06-25 10:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs, Allison Collins

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 2/6] xfs: account for log space when formatting new AGs
  2019-06-21 19:57 ` [PATCH 2/6] xfs: account for log space when formatting new AGs Darrick J. Wong
@ 2019-06-25 10:39   ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2019-06-25 10:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs, Allison Collins

On Fri, Jun 21, 2019 at 12:57:06PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When we're writing out a fresh new AG, make sure that we don't list an
> internal log as free and that we create the rmap for the region.  growfs
> never does this, but we will need it when we hook up mkfs.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Allison Collins <allison.henderson@oracle.com>
> ---
>  libxfs/xfs_ag.c |   66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 66 insertions(+)
> 
> 
> diff --git a/libxfs/xfs_ag.c b/libxfs/xfs_ag.c
> index fe79693e..237d6c53 100644
> --- a/libxfs/xfs_ag.c
> +++ b/libxfs/xfs_ag.c
> @@ -11,6 +11,7 @@
>  #include "xfs_shared.h"
>  #include "xfs_format.h"
>  #include "xfs_trans_resv.h"
> +#include "xfs_bit.h"
>  #include "xfs_sb.h"
>  #include "xfs_mount.h"
>  #include "xfs_btree.h"
> @@ -45,6 +46,12 @@ xfs_get_aghdr_buf(
>  	return bp;
>  }
>  
> +static inline bool is_log_ag(struct xfs_mount *mp, struct aghdr_init_data *id)
> +{
> +	return mp->m_sb.sb_logstart > 0 &&
> +	       id->agno == XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart);
> +}
> +
>  /*
>   * Generic btree root block init function
>   */
> @@ -65,11 +72,50 @@ xfs_freesp_init_recs(
>  	struct aghdr_init_data	*id)
>  {
>  	struct xfs_alloc_rec	*arec;
> +	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
>  
>  	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
>  	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> +
> +	if (is_log_ag(mp, id)) {
> +		struct xfs_alloc_rec	*nrec;
> +		xfs_agblock_t		start = XFS_FSB_TO_AGBNO(mp,
> +							mp->m_sb.sb_logstart);

This new code is pretty self-contained; maybe it should move into
a separate helper?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
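For illustration, the helper suggested above might look roughly like this sketch. It is a hypothetical, simplified version and not code from the patch: `split_free_around_log()` and `struct free_rec` are made-up names, block numbers are kept in CPU byte order instead of the big-endian `struct xfs_alloc_rec` layout, and the internal log is assumed to lie entirely inside the AG's single initial free extent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_alloc_rec, in CPU byte order. */
struct free_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
};

/*
 * Carve the log extent [log_start, log_start + log_len) out of the free
 * extent [start, start + len).  Writes up to two free records to out[]
 * and returns how many were written.  Assumes the log lies entirely
 * inside the free extent.
 */
static int
split_free_around_log(uint32_t start, uint32_t len, uint32_t log_start,
		      uint32_t log_len, struct free_rec out[2])
{
	uint32_t	end = start + len;
	uint32_t	log_end = log_start + log_len;
	int		nr = 0;

	assert(log_start >= start && log_end <= end);

	/* Free space in front of the log, if any. */
	if (log_start > start) {
		out[nr].startblock = start;
		out[nr].blockcount = log_start - start;
		nr++;
	}

	/* Free space behind the log, if any. */
	if (end > log_end) {
		out[nr].startblock = log_end;
		out[nr].blockcount = end - log_end;
		nr++;
	}
	return nr;
}
```

If the log starts at the first free block, only the record behind it survives; if the log fills the whole extent, no free record is produced.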

* Re: [PATCH 3/6] libxfs: fix uncached buffer refcounting
  2019-06-21 19:57 ` [PATCH 3/6] libxfs: fix uncached buffer refcounting Darrick J. Wong
@ 2019-06-25 10:39   ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2019-06-25 10:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Fri, Jun 21, 2019 at 12:57:12PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Currently, uncached buffers in userspace are created with zero refcount
> and are fed to cache_node_put when they're released.  This is totally
> broken -- the refcount should be 1 (because the caller now holds a
> reference) and we should never be dumping uncached buffers into the
> cache.  Fix both of these problems.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
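A minimal model of the refcounting rule described in the commit message, assuming a simplified buffer type. `struct toy_buf`, `toy_get_uncached()`, `toy_buf_release()` and the `cache_puts` counter are hypothetical and only stand in for the libxfs uncached-buffer paths: an uncached buffer starts with one reference held by the caller, and dropping the last reference frees it instead of handing it to the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified buffer; not the libxfs struct xfs_buf. */
struct toy_buf {
	int	refcount;
	bool	uncached;	/* never inserted into the buffer cache */
};

/* Counts buffers handed back to the (simulated) cache on release. */
static int	cache_puts;

/* An uncached buffer starts with one reference, owned by the caller. */
static struct toy_buf *
toy_get_uncached(void)
{
	struct toy_buf	*bp = calloc(1, sizeof(*bp));

	if (bp) {
		bp->refcount = 1;
		bp->uncached = true;
	}
	return bp;
}

/*
 * Drop one reference.  An uncached buffer is freed when its last
 * reference goes away; only cached buffers go back to the cache.
 * Returns true if the buffer was freed.
 */
static bool
toy_buf_release(struct toy_buf *bp)
{
	assert(bp->refcount > 0);
	if (--bp->refcount > 0)
		return false;
	if (bp->uncached) {
		free(bp);
		return true;
	}
	cache_puts++;	/* stand-in for cache_node_put() */
	return false;
}
```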

* Re: [PATCH 4/6] libxfs: fix buffer refcounting in delwri_queue
  2019-06-21 19:57 ` [PATCH 4/6] libxfs: fix buffer refcounting in delwri_queue Darrick J. Wong
@ 2019-06-25 10:40   ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2019-06-25 10:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Fri, Jun 21, 2019 at 12:57:24PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In the kernel, xfs_buf_delwri_queue increments the buffer reference
> count before putting the buffer on the buffer list, and the refcount is
> decremented after the io completes for a net refcount change of zero.
> 
> In userspace, delwri_queue calls libxfs_writebuf, which puts the buffer.
> delwri_submit is a no-op, for a net refcount change of -1.  This creates
> problems for any callers that expect a net change of zero, so increment
> the buffer refcount before calling writebuf.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
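The net-zero accounting can be modelled in a few lines. This is a sketch with hypothetical names (`struct rc_buf`, `rc_writebuf()`, `rc_delwri_queue()`); it assumes, as the commit message says, that userspace writebuf drops one reference after writing the buffer.

```c
#include <assert.h>

/* Hypothetical buffer that tracks only its reference count. */
struct rc_buf {
	int	refcount;
};

/* Models userspace writebuf: (pretend to) write, then drop one reference. */
static void
rc_writebuf(struct rc_buf *bp)
{
	assert(bp->refcount > 0);
	/* ... I/O would be issued here ... */
	bp->refcount--;
}

/*
 * Models the fixed delwri_queue: grab a reference first so that the one
 * dropped by rc_writebuf() cancels it out, matching the kernel's net
 * refcount change of zero.
 */
static void
rc_delwri_queue(struct rc_buf *bp)
{
	bp->refcount++;
	rc_writebuf(bp);
}
```

Without the extra increment, each queue call would consume the caller's reference, which is the net change of -1 the commit message describes.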

* Re: [PATCH 5/6] libxfs: make xfs_buf_delwri_submit actually do something
  2019-06-21 19:57 ` [PATCH 5/6] libxfs: make xfs_buf_delwri_submit actually do something Darrick J. Wong
@ 2019-06-25 10:41   ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2019-06-25 10:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
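The series relies on `delwri_submit` reporting I/O errors: the mkfs hunk checks the return value of `libxfs_buf_delwri_submit()` and exits on failure. The following hypothetical sketch shows that contract with a toy singly linked list (`struct dw_buf`, `dw_queue()`, `dw_submit()`); it is not the libxfs implementation, which, as the mkfs hunk shows, works on a `struct list_head`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical buffer on a singly linked delwri list. */
struct dw_buf {
	struct dw_buf	*next;
	int		io_error;	/* simulated write result, e.g. -EIO */
	int		written;
};

/* Push a buffer onto the head of the list. */
static void
dw_queue(struct dw_buf **list, struct dw_buf *bp)
{
	bp->next = *list;
	*list = bp;
}

/*
 * Write every queued buffer, empty the list, and return the first
 * (kernel-style, negative errno) error seen, or 0 if all writes
 * succeeded, so that a caller such as mkfs can fail loudly.
 */
static int
dw_submit(struct dw_buf **list)
{
	int		error = 0;

	assert(list != NULL);
	while (*list) {
		struct dw_buf	*bp = *list;

		*list = bp->next;
		bp->next = NULL;
		bp->written = 1;	/* stand-in for the real write */
		if (bp->io_error && !error)
			error = bp->io_error;
	}
	return error;
}
```

Every buffer is still written even after a failure, so one bad write does not leave the rest of the list unflushed.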

* Re: [PATCH 6/6] mkfs: use libxfs to write out new AGs
  2019-06-21 19:57 ` [PATCH 6/6] mkfs: use libxfs to write out new AGs Darrick J. Wong
@ 2019-06-25 10:41   ` Christoph Hellwig
  2019-07-01 12:25   ` Dave Chinner
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2019-06-25 10:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

Nice!

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 6/6] mkfs: use libxfs to write out new AGs
  2019-06-21 19:57 ` [PATCH 6/6] mkfs: use libxfs to write out new AGs Darrick J. Wong
  2019-06-25 10:41   ` Christoph Hellwig
@ 2019-07-01 12:25   ` Dave Chinner
  2019-07-01 14:14     ` Darrick J. Wong
  1 sibling, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2019-07-01 12:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Fri, Jun 21, 2019 at 12:57:39PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Use the libxfs AG initialization functions to write out the new
> filesystem instead of open-coding everything.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
.....
> @@ -4087,8 +3770,16 @@ main(
>  	/*
>  	 * Initialise all the static on disk metadata.
>  	 */
> +	INIT_LIST_HEAD(&buffer_list);
>  	for (agno = 0; agno < cfg.agcount; agno++)
> -		initialise_ag_headers(&cfg, mp, sbp, agno, &worst_freelist);
> +		initialise_ag_headers(&cfg, mp, sbp, agno, &worst_freelist,
> +				&buffer_list);
> +
> +	if (libxfs_buf_delwri_submit(&buffer_list)) {
> +		fprintf(stderr, _("%s: writing AG headers failed\n"),
> +				progname);
> +		exit(1);
> +	}

The problem I came across with this "one big delwri list" construct
when adding delwri lists for batched AIO processing is that the
memory footprint for high AG count filesystems really blows out. Did
you check what happens when you create a filesystem with a few tens
of thousands of AGs? 

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 6/6] mkfs: use libxfs to write out new AGs
  2019-07-01 12:25   ` Dave Chinner
@ 2019-07-01 14:14     ` Darrick J. Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2019-07-01 14:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: sandeen, linux-xfs

On Mon, Jul 01, 2019 at 10:25:04PM +1000, Dave Chinner wrote:
> On Fri, Jun 21, 2019 at 12:57:39PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Use the libxfs AG initialization functions to write out the new
> > filesystem instead of open-coding everything.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> .....
> > @@ -4087,8 +3770,16 @@ main(
> >  	/*
> >  	 * Initialise all the static on disk metadata.
> >  	 */
> > +	INIT_LIST_HEAD(&buffer_list);
> >  	for (agno = 0; agno < cfg.agcount; agno++)
> > -		initialise_ag_headers(&cfg, mp, sbp, agno, &worst_freelist);
> > +		initialise_ag_headers(&cfg, mp, sbp, agno, &worst_freelist,
> > +				&buffer_list);
> > +
> > +	if (libxfs_buf_delwri_submit(&buffer_list)) {
> > +		fprintf(stderr, _("%s: writing AG headers failed\n"),
> > +				progname);
> > +		exit(1);
> > +	}
> 
> The problem I came across with this "one big delwri list" construct
> when adding delwri lists for batched AIO processing is that the
> memory footprint for high AG count filesystems really blows out. Did
> you check what happens when you create a filesystem with a few tens
> of thousands of AGs? 

I did, and then amended this patch to delwri_submit every 16 or so AGs.

:)

I haven't resent the patch since I figure xfsprogs 5.3 is a ways off...

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
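The amended patch is not included in this thread, so the following is only a sketch of the batching idea under stated assumptions: the batch size of 16 is taken from the rough figure in Darrick's reply, and `init_ags_batched()` with its callback parameters is a hypothetical stand-in for the loop around `initialise_ag_headers()` and `libxfs_buf_delwri_submit()` in `main()`. Flushing after every batch bounds how many buffers sit on the delwri list at once, which addresses the memory concern for filesystems with tens of thousands of AGs.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed batch size, taken from the rough figure in the reply above. */
#define AGS_PER_BATCH	16

/*
 * Queue each AG's header buffers via queue_ag(), and flush the delwri
 * list via submit() after every AGS_PER_BATCH AGs, so at most one batch
 * of buffers is held in memory at a time.  submit() returns nonzero on
 * I/O error.  Returns 0 on success, -1 if any submission failed.
 */
static int
init_ags_batched(unsigned int agcount, void (*queue_ag)(unsigned int agno),
		 int (*submit)(void))
{
	unsigned int	agno;

	assert(queue_ag != NULL && submit != NULL);

	for (agno = 0; agno < agcount; agno++) {
		queue_ag(agno);
		if ((agno + 1) % AGS_PER_BATCH == 0 && submit())
			return -1;
	}

	/* Flush the final, partial batch, if there is one. */
	if (agcount % AGS_PER_BATCH != 0 && submit())
		return -1;
	return 0;
}

/* Demo stubs that only count calls; purely for illustration. */
static unsigned int	nr_queued, nr_submits;

static void
count_queue(unsigned int agno)
{
	(void)agno;
	nr_queued++;
}

static int
count_submit(void)
{
	nr_submits++;
	return 0;
}
```

With 40 AGs this flushes three times: after the 16th and 32nd AGs, then once more for the remaining 8.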


end of thread, other threads:[~2019-07-01 14:14 UTC | newest]

Thread overview: 15+ messages
2019-06-21 19:56 [PATCH RFC 0/6] xfs: help mkfs shed its AG initialization code Darrick J. Wong
2019-06-21 19:57 ` [PATCH 1/6] xfs: refactor free space btree record initialization Darrick J. Wong
2019-06-25 10:37   ` Christoph Hellwig
2019-06-21 19:57 ` [PATCH 2/6] xfs: account for log space when formatting new AGs Darrick J. Wong
2019-06-25 10:39   ` Christoph Hellwig
2019-06-21 19:57 ` [PATCH 3/6] libxfs: fix uncached buffer refcounting Darrick J. Wong
2019-06-25 10:39   ` Christoph Hellwig
2019-06-21 19:57 ` [PATCH 4/6] libxfs: fix buffer refcounting in delwri_queue Darrick J. Wong
2019-06-25 10:40   ` Christoph Hellwig
2019-06-21 19:57 ` [PATCH 5/6] libxfs: make xfs_buf_delwri_submit actually do something Darrick J. Wong
2019-06-25 10:41   ` Christoph Hellwig
2019-06-21 19:57 ` [PATCH 6/6] mkfs: use libxfs to write out new AGs Darrick J. Wong
2019-06-25 10:41   ` Christoph Hellwig
2019-07-01 12:25   ` Dave Chinner
2019-07-01 14:14     ` Darrick J. Wong