* [PATCH v5 00/18] xfs: sparse inode chunks
@ 2015-02-19 18:13 Brian Foster
  2015-02-19 18:13 ` [PATCH v5 01/18] xfs: create individual inode alloc. helper Brian Foster
                   ` (18 more replies)
  0 siblings, 19 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

Hi all,

Here's v5 of sparse inode chunks. The only real change here is to
convert the allocmask helpers back to using the XFS bitmap helpers
rather than the generic bitmap code. This eliminates the need for the
endian-conversion hack and extra helper to export a generic bitmap to a
native type. The former users of the generic bitmap itself have been
converted to use the native 64-bit value appropriately.

The XFS bitmap code is actually not in userspace either, so neither of
these implementations backports cleanly to userspace. As it is, I've not
included the sparse alloc/free code in my xfsprogs branch as this code
currently isn't needed. Nothing in userspace that I've seen requires the
ability to do a sparse inode allocation or free. I suspect if it is
needed in the future, we can more easily sync the XFS bitmap helpers to
userspace than the generic Linux bitmap code.

Thoughts, reviews, flames appreciated...

Brian

v5:
- Use XFS helpers for allocmask code instead of generic bitmap helpers.
v4: http://oss.sgi.com/archives/xfs/2015-02/msg00240.html
- Rename sb_spinoalignmt to sb_spino_align.
- Clean up error/warning messages.
- Use a union to differentiate old/new xfs_inobt_rec on-disk format.
  Refactor such that in-core record fields are always valid.
- Rename/move allocmap (bitmap) helper functions and provide extra
  helper for endian conv.
- Refactor sparse chunk allocation record management code.
- Clean up #ifdef and label usage for DEBUG mode sparse allocs.
- Split up and moved some generic, preparatory hunks earlier in series.
v3: http://oss.sgi.com/archives/xfs/2015-02/msg00110.html
- Rebase to latest for-next (bulkstat rework, data structure shuffling,
  etc.).
- Fix issparse helper logic.
- Update inode alignment model w/ spinodes enabled. All inode records
  are chunk size aligned, sparse allocations cluster size aligned (both
  enforced on mount).
- Reworked sparse inode record merge logic to coincide w/ new alignment
  model.
- Mark feature as experimental (warn on mount).
- Include and use block allocation agbno range limit to prevent
  allocation of invalid inode records.
- Add some DEBUG bits to improve sparse alloc. test coverage.
v2: http://oss.sgi.com/archives/xfs/2014-11/msg00007.html
- Use a manually set feature bit instead of dynamic based on the
  existence of sparse inode chunks.
- Add sb/mp fields for sparse alloc. granularity (use instead of cluster
  size).
- Undo xfs_inobt_insert() loop removal to avoid breakage of larger page
  size arches.
- Rename sparse record overlap helper and do XFS_LOOKUP_LE search.
- Use byte of pad space in inobt record for inode count field.
- Convert bitmap mgmt to use generic bitmap code.
- Rename XFS_INODES_PER_SPCHUNK to XFS_INODES_PER_HOLEMASK_BIT.
- Add fs geometry bit for sparse inodes.
- Rebase to latest for-next (bulkstat refactor).
v1: http://oss.sgi.com/archives/xfs/2014-07/msg00355.html

Brian Foster (18):
  xfs: create individual inode alloc. helper
  xfs: update free inode record logic to support sparse inode records
  xfs: support min/max agbno args in block allocator
  xfs: add sparse inode chunk alignment superblock field
  xfs: use sparse chunk alignment for min. inode allocation requirement
  xfs: sparse inode chunks feature helpers and mount requirements
  xfs: add fs geometry bit for sparse inode chunks
  xfs: introduce inode record hole mask for sparse inode chunks
  xfs: use actual inode count for sparse records in bulkstat/inumbers
  xfs: pass inode count through ordered icreate log item
  xfs: handle sparse inode chunks in icreate log recovery
  xfs: helper to convert holemask to inode alloc. bitmap
  xfs: allocate sparse inode chunks on full chunk allocation failure
  xfs: randomly do sparse inode allocations in DEBUG mode
  xfs: filter out sparse regions from individual inode allocation
  xfs: only free allocated regions of inode chunks
  xfs: skip unallocated regions of inode chunks in xfs_ifree_cluster()
  xfs: enable sparse inode chunks for v5 superblocks

 fs/xfs/libxfs/xfs_alloc.c        |  42 +++-
 fs/xfs/libxfs/xfs_alloc.h        |   2 +
 fs/xfs/libxfs/xfs_format.h       |  50 +++-
 fs/xfs/libxfs/xfs_fs.h           |   1 +
 fs/xfs/libxfs/xfs_ialloc.c       | 530 +++++++++++++++++++++++++++++++++++----
 fs/xfs/libxfs/xfs_ialloc.h       |  12 +-
 fs/xfs/libxfs/xfs_ialloc_btree.c |  93 ++++++-
 fs/xfs/libxfs/xfs_ialloc_btree.h |  10 +
 fs/xfs/libxfs/xfs_sb.c           |  30 ++-
 fs/xfs/xfs_fsops.c               |   4 +-
 fs/xfs/xfs_inode.c               |  28 ++-
 fs/xfs/xfs_itable.c              |  13 +-
 fs/xfs/xfs_log_recover.c         |  26 +-
 fs/xfs/xfs_mount.c               |  16 ++
 fs/xfs/xfs_mount.h               |   2 +
 fs/xfs/xfs_trace.h               |  47 ++++
 16 files changed, 820 insertions(+), 86 deletions(-)

-- 
1.9.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH v5 01/18] xfs: create individual inode alloc. helper
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 02/18] xfs: update free inode record logic to support sparse inode records Brian Foster
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

Inode allocation from sparse inode records must filter the ir_free mask
against ir_holemask.  In preparation for this requirement, create a
helper to allocate an individual inode from an inode record.
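
As a rough illustration of what the new helper computes, here is a
standalone sketch. The struct and function names are hypothetical
stand-ins, and the "-1 when no bit is set" behaviour is an assumption
about xfs_lowbit64() suggested by the ASSERT(offset >= 0) in the callers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_inobt_rec_incore; only ir_free is modelled. */
struct rec_sketch {
	uint64_t	ir_free;	/* bit i set => inode i of the chunk is free */
};

/* Index of the lowest set bit, or -1 if no bit is set (assumed behaviour). */
static int lowbit64_sketch(uint64_t v)
{
	int	n = 0;

	if (!v)
		return -1;
	while (!(v & 1)) {
		v >>= 1;
		n++;
	}
	return n;
}

/* Offset of the first free inode in the record, or -1 if none is free. */
static int first_free_inode_sketch(const struct rec_sketch *rec)
{
	int	offset = lowbit64_sketch(rec->ir_free);

	assert(offset < 64);
	return offset;
}
```

Keeping the lookup behind one function gives later patches a single
place to filter the free mask against the hole mask.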

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 116ef1d..12b62a34 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -731,6 +731,16 @@ xfs_ialloc_get_rec(
 }
 
 /*
+ * Return the offset of the first free inode in the record.
+ */
+STATIC int
+xfs_inobt_first_free_inode(
+	struct xfs_inobt_rec_incore	*rec)
+{
+	return xfs_lowbit64(rec->ir_free);
+}
+
+/*
  * Allocate an inode using the inobt-only algorithm.
  */
 STATIC int
@@ -960,7 +970,7 @@ newino:
 	}
 
 alloc_inode:
-	offset = xfs_lowbit64(rec.ir_free);
+	offset = xfs_inobt_first_free_inode(&rec);
 	ASSERT(offset >= 0);
 	ASSERT(offset < XFS_INODES_PER_CHUNK);
 	ASSERT((XFS_AGINO_TO_OFFSET(mp, rec.ir_startino) %
@@ -1209,7 +1219,7 @@ xfs_dialloc_ag(
 	if (error)
 		goto error_cur;
 
-	offset = xfs_lowbit64(rec.ir_free);
+	offset = xfs_inobt_first_free_inode(&rec);
 	ASSERT(offset >= 0);
 	ASSERT(offset < XFS_INODES_PER_CHUNK);
 	ASSERT((XFS_AGINO_TO_OFFSET(mp, rec.ir_startino) %
-- 
1.9.3


* [PATCH v5 02/18] xfs: update free inode record logic to support sparse inode records
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
  2015-02-19 18:13 ` [PATCH v5 01/18] xfs: create individual inode alloc. helper Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 03/18] xfs: support min/max agbno args in block allocator Brian Foster
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

xfs_difree_inobt() uses logic in a couple of places that assumes inobt
records refer to fully allocated chunks. Specifically, the use of
mp->m_ialloc_inos can cause problems for inode chunks that are sparsely
allocated. Sparse inode chunks can, by definition, define a smaller
number of inodes than a full inode chunk.

Fix the logic that determines whether an inode record should be removed
from the inobt to use the ir_free mask rather than ir_freecount.

Fix the agi counters modification to use ir_freecount to add the actual
number of inodes freed rather than assuming a full inode chunk.
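
The two changes can be sketched in isolation as follows. This is a
simplified model, not the kernel code: the names are hypothetical, and it
assumes that the bits of ir_free covering unallocated (hole) regions read
as set, which is what comparing against the all-free mask implies.

```c
#include <stdbool.h>
#include <stdint.h>

#define ALL_FREE_SKETCH	UINT64_MAX	/* stands in for XFS_INOBT_ALL_FREE */

/* Changes applied to the AGI counters when a record is removed. */
struct agi_delta_sketch {
	int	count;		/* added to agi_count */
	int	freecount;	/* added to agi_freecount */
};

/* A record may be removed once every bit of the free mask is set. */
static bool rec_removable_sketch(uint64_t ir_free, bool ikeep)
{
	return !ikeep && ir_free == ALL_FREE_SKETCH;
}

/*
 * Use the record's own free count as the number of inodes going away.
 * The free counter drops by one less, as in the hunk below.
 */
static struct agi_delta_sketch agi_delta_on_removal_sketch(int ir_freecount)
{
	struct agi_delta_sketch	d;
	int			ilen = ir_freecount;

	d.count = -ilen;
	d.freecount = -(ilen - 1);
	return d;
}
```

For a sparse record that only ever held 16 inodes, the counters move by
16 and 15 rather than by the 64 and 63 of a full chunk.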

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 12b62a34..ffac044 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -1509,7 +1509,7 @@ xfs_difree_inobt(
 	 * When an inode cluster is free, it becomes eligible for removal
 	 */
 	if (!(mp->m_flags & XFS_MOUNT_IKEEP) &&
-	    (rec.ir_freecount == mp->m_ialloc_inos)) {
+	    (rec.ir_free == XFS_INOBT_ALL_FREE)) {
 
 		*deleted = 1;
 		*first_ino = XFS_AGINO_TO_INO(mp, agno, rec.ir_startino);
@@ -1519,7 +1519,7 @@ xfs_difree_inobt(
 		 * AGI and Superblock inode counts, and mark the disk space
 		 * to be freed when the transaction is committed.
 		 */
-		ilen = mp->m_ialloc_inos;
+		ilen = rec.ir_freecount;
 		be32_add_cpu(&agi->agi_count, -ilen);
 		be32_add_cpu(&agi->agi_freecount, -(ilen - 1));
 		xfs_ialloc_log_agi(tp, agbp, XFS_AGI_COUNT | XFS_AGI_FREECOUNT);
@@ -1640,7 +1640,7 @@ xfs_difree_finobt(
 	 * keeping inode chunks permanently on disk, remove the record.
 	 * Otherwise, update the record with the new information.
 	 */
-	if (rec.ir_freecount == mp->m_ialloc_inos &&
+	if (rec.ir_free == XFS_INOBT_ALL_FREE &&
 	    !(mp->m_flags & XFS_MOUNT_IKEEP)) {
 		error = xfs_btree_delete(cur, &i);
 		if (error)
-- 
1.9.3


* [PATCH v5 03/18] xfs: support min/max agbno args in block allocator
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
  2015-02-19 18:13 ` [PATCH v5 01/18] xfs: create individual inode alloc. helper Brian Foster
  2015-02-19 18:13 ` [PATCH v5 02/18] xfs: update free inode record logic to support sparse inode records Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 04/18] xfs: add sparse inode chunk alignment superblock field Brian Foster
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

The block allocator supports various arguments to tweak block allocation
behavior and set allocation requirements. The sparse inode chunk feature
introduces a new requirement not supported by the current arguments.
Sparse inode allocations must convert or merge into an inode record that
describes a fixed length chunk (64 inodes x inodesize). Full inode chunk
allocations by definition always result in valid inode records. Sparse
chunk allocations are smaller and the associated records can refer to
blocks not owned by the inode chunk. This model can result in invalid
inode records in certain cases.

For example, if a sparse allocation occurs near the start of an AG, the
aligned inode record for that chunk might refer to agbno 0. If an
allocation occurs towards the end of the AG and the AG size is not
aligned, the inode record could refer to blocks beyond the end of the
AG. While neither of these scenarios directly results in corruption, both
insert invalid inode records that, at minimum, cause repair to complain,
are unlikely to merge into full chunks over time, and set land mines for
other areas of code.

To guarantee sparse inode chunk allocation creates valid inode records,
support the ability to specify an agbno range limit for
XFS_ALLOCTYPE_NEAR_BNO block allocations. The min/max agbno's are
specified in the allocation arguments and limit the block allocation
algorithms to that range. The starting 'agbno' hint is clamped to the
range if the specified agbno is out of range. If no sufficient extent is
available within the range, the allocation fails. For backwards
compatibility, the min/max fields can be initialized to 0 to disable
range limiting (i.e., equivalent to min=0,max=agsize-1).
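
The range handling described above can be sketched as two small pure
functions. This is only an approximation of the hunks below, using
hypothetical names and plain integers in place of xfs_alloc_arg.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical extent descriptor: start block and length within the AG. */
struct extent_sketch {
	uint32_t	bno;
	uint32_t	len;
};

/*
 * Clamp the agbno hint to [min_agbno, max_agbno]. A range of 0/0 means
 * "not set" and is widened to the whole AG.
 */
static uint32_t clamp_agbno_sketch(uint32_t agbno, uint32_t min_agbno,
				   uint32_t max_agbno, uint32_t agblocks)
{
	if (!min_agbno && !max_agbno)
		max_agbno = agblocks - 1;
	assert(min_agbno <= max_agbno);

	if (agbno < min_agbno)
		return min_agbno;
	if (agbno > max_agbno)
		return max_agbno;
	return agbno;
}

/*
 * If a free extent starts before min_agbno but extends past it, drop the
 * leading blocks so that the usable part starts at min_agbno.
 */
static struct extent_sketch shift_extent_sketch(uint32_t bno, uint32_t len,
						uint32_t min_agbno)
{
	struct extent_sketch	e = { bno, len };

	if (bno < min_agbno && bno + len > min_agbno) {
		uint32_t	diff = min_agbno - bno;

		e.bno += diff;
		e.len -= diff;
	}
	return e;
}
```

Callers that never set the range get the old behaviour, since 0/0 widens
to the full AG before any clamping happens.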

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 42 +++++++++++++++++++++++++++++++++++++-----
 fs/xfs/libxfs/xfs_alloc.h |  2 ++
 2 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index a6fbf44..0ddf6c9 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -149,13 +149,27 @@ xfs_alloc_compute_aligned(
 {
 	xfs_agblock_t	bno;
 	xfs_extlen_t	len;
+	xfs_extlen_t	diff;
 
 	/* Trim busy sections out of found extent */
 	xfs_extent_busy_trim(args, foundbno, foundlen, &bno, &len);
 
+	/*
+	 * If we have a largish extent that happens to start before min_agbno,
+	 * see if we can shift it into range...
+	 */
+	if (bno < args->min_agbno && bno + len > args->min_agbno) {
+		diff = args->min_agbno - bno;
+		if (len > diff) {
+			bno += diff;
+			len -= diff;
+		}
+	}
+
 	if (args->alignment > 1 && len >= args->minlen) {
 		xfs_agblock_t	aligned_bno = roundup(bno, args->alignment);
-		xfs_extlen_t	diff = aligned_bno - bno;
+
+		diff = aligned_bno - bno;
 
 		*resbno = aligned_bno;
 		*reslen = diff >= len ? 0 : len - diff;
@@ -790,9 +804,13 @@ xfs_alloc_find_best_extent(
 		 * The good extent is closer than this one.
 		 */
 		if (!dir) {
+			if (*sbnoa > args->max_agbno)
+				goto out_use_good;
 			if (*sbnoa >= args->agbno + gdiff)
 				goto out_use_good;
 		} else {
+			if (*sbnoa < args->min_agbno)
+				goto out_use_good;
 			if (*sbnoa <= args->agbno - gdiff)
 				goto out_use_good;
 		}
@@ -879,6 +897,17 @@ xfs_alloc_ag_vextent_near(
 	dofirst = prandom_u32() & 1;
 #endif
 
+	/* handle uninitialized agbno range so caller doesn't have to */
+	if (!args->min_agbno && !args->max_agbno)
+		args->max_agbno = args->mp->m_sb.sb_agblocks - 1;
+	ASSERT(args->min_agbno <= args->max_agbno);
+
+	/* clamp agbno to the range if it's outside */
+	if (args->agbno < args->min_agbno)
+		args->agbno = args->min_agbno;
+	if (args->agbno > args->max_agbno)
+		args->agbno = args->max_agbno;
+
 restart:
 	bno_cur_lt = NULL;
 	bno_cur_gt = NULL;
@@ -971,6 +1000,8 @@ restart:
 						  &ltbnoa, &ltlena);
 			if (ltlena < args->minlen)
 				continue;
+			if (ltbnoa < args->min_agbno || ltbnoa > args->max_agbno)
+				continue;
 			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
 			xfs_alloc_fix_len(args);
 			ASSERT(args->len >= args->minlen);
@@ -1091,11 +1122,11 @@ restart:
 			XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
 			xfs_alloc_compute_aligned(args, ltbno, ltlen,
 						  &ltbnoa, &ltlena);
-			if (ltlena >= args->minlen)
+			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
 				break;
 			if ((error = xfs_btree_decrement(bno_cur_lt, 0, &i)))
 				goto error0;
-			if (!i) {
+			if (!i || ltbnoa < args->min_agbno) {
 				xfs_btree_del_cursor(bno_cur_lt,
 						     XFS_BTREE_NOERROR);
 				bno_cur_lt = NULL;
@@ -1107,11 +1138,11 @@ restart:
 			XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
 			xfs_alloc_compute_aligned(args, gtbno, gtlen,
 						  &gtbnoa, &gtlena);
-			if (gtlena >= args->minlen)
+			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
 				break;
 			if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
 				goto error0;
-			if (!i) {
+			if (!i || gtbnoa > args->max_agbno) {
 				xfs_btree_del_cursor(bno_cur_gt,
 						     XFS_BTREE_NOERROR);
 				bno_cur_gt = NULL;
@@ -1211,6 +1242,7 @@ restart:
 	ASSERT(ltnew >= ltbno);
 	ASSERT(ltnew + rlen <= ltbnoa + ltlena);
 	ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
+	ASSERT(ltnew >= args->min_agbno && ltnew <= args->max_agbno);
 	args->agbno = ltnew;
 
 	if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno, ltlen,
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index d1b4b6a..29f27b2 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -112,6 +112,8 @@ typedef struct xfs_alloc_arg {
 	xfs_extlen_t	total;		/* total blocks needed in xaction */
 	xfs_extlen_t	alignment;	/* align answer to multiple of this */
 	xfs_extlen_t	minalignslop;	/* slop for minlen+alignment calcs */
+	xfs_agblock_t	min_agbno;	/* set an agbno range for NEAR allocs */
+	xfs_agblock_t	max_agbno;	/* ... */
 	xfs_extlen_t	len;		/* output: actual size of extent */
 	xfs_alloctype_t	type;		/* allocation type XFS_ALLOCTYPE_... */
 	xfs_alloctype_t	otype;		/* original allocation type */
-- 
1.9.3


* [PATCH v5 04/18] xfs: add sparse inode chunk alignment superblock field
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (2 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 03/18] xfs: support min/max agbno args in block allocator Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 05/18] xfs: use sparse chunk alignment for min. inode allocation requirement Brian Foster
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

Add sb_spino_align to the superblock to specify sparse inode chunk
alignment. This also currently represents the minimum allowable sparse
chunk allocation size.
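
Since the field is stored big-endian on disk, the conversions in the
hunks below amount to the following. This is a portable sketch with
hypothetical helper names, not the kernel's be32_to_cpu()/cpu_to_be32().

```c
#include <stdint.h>

/* Store a CPU-order value as four big-endian bytes. */
static void put_be32_sketch(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Read four big-endian bytes back into a CPU-order value. */
static uint32_t get_be32_sketch(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Model of in-core -> disk -> in-core for the alignment value. */
static uint32_t spino_align_roundtrip_sketch(uint32_t align)
{
	uint8_t	disk[4];

	put_be32_sketch(disk, align);
	return get_be32_sketch(disk);
}
```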

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_format.h | 6 +++---
 fs/xfs/libxfs/xfs_sb.c     | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 8eb7189..dbca93d 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -170,7 +170,7 @@ typedef struct xfs_sb {
 	__uint32_t	sb_features_log_incompat;
 
 	__uint32_t	sb_crc;		/* superblock crc */
-	__uint32_t	sb_pad;
+	xfs_extlen_t	sb_spino_align;	/* sparse inode chunk alignment */
 
 	xfs_ino_t	sb_pquotino;	/* project quota inode */
 	xfs_lsn_t	sb_lsn;		/* last write sequence */
@@ -256,7 +256,7 @@ typedef struct xfs_dsb {
 	__be32		sb_features_log_incompat;
 
 	__le32		sb_crc;		/* superblock crc */
-	__be32		sb_pad;
+	__be32		sb_spino_align;	/* sparse inode chunk alignment */
 
 	__be64		sb_pquotino;	/* project quota inode */
 	__be64		sb_lsn;		/* last write sequence */
@@ -282,7 +282,7 @@ typedef enum {
 	XFS_SBS_LOGSECTLOG, XFS_SBS_LOGSECTSIZE, XFS_SBS_LOGSUNIT,
 	XFS_SBS_FEATURES2, XFS_SBS_BAD_FEATURES2, XFS_SBS_FEATURES_COMPAT,
 	XFS_SBS_FEATURES_RO_COMPAT, XFS_SBS_FEATURES_INCOMPAT,
-	XFS_SBS_FEATURES_LOG_INCOMPAT, XFS_SBS_CRC, XFS_SBS_PAD,
+	XFS_SBS_FEATURES_LOG_INCOMPAT, XFS_SBS_CRC, XFS_SBS_SPINO_ALIGN,
 	XFS_SBS_PQUOTINO, XFS_SBS_LSN,
 	XFS_SBS_FIELDCOUNT
 } xfs_sb_field_t;
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index b0a5fe9..a461c2e 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -382,7 +382,7 @@ __xfs_sb_from_disk(
 				be32_to_cpu(from->sb_features_log_incompat);
 	/* crc is only used on disk, not in memory; just init to 0 here. */
 	to->sb_crc = 0;
-	to->sb_pad = 0;
+	to->sb_spino_align = be32_to_cpu(from->sb_spino_align);
 	to->sb_pquotino = be64_to_cpu(from->sb_pquotino);
 	to->sb_lsn = be64_to_cpu(from->sb_lsn);
 	/* Convert on-disk flags to in-memory flags? */
@@ -524,7 +524,7 @@ xfs_sb_to_disk(
 				cpu_to_be32(from->sb_features_incompat);
 		to->sb_features_log_incompat =
 				cpu_to_be32(from->sb_features_log_incompat);
-		to->sb_pad = 0;
+		to->sb_spino_align = cpu_to_be32(from->sb_spino_align);
 		to->sb_lsn = cpu_to_be64(from->sb_lsn);
 	}
 }
-- 
1.9.3


* [PATCH v5 05/18] xfs: use sparse chunk alignment for min. inode allocation requirement
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (3 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 04/18] xfs: add sparse inode chunk alignment superblock field Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 06/18] xfs: sparse inode chunks feature helpers and mount requirements Brian Foster
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

xfs_ialloc_ag_select() iterates through the allocation groups looking
for free inodes or free space to determine whether to allow an inode
allocation to proceed. If no free inodes are available, it assumes that
an AG must have an extent longer than mp->m_ialloc_blks.

Sparse inode chunk support currently allows for allocations smaller than
the traditional inode chunk size specified in m_ialloc_blks. The current
minimum sparse allocation is set in the superblock sb_spino_align field
at mkfs time. Create a new m_ialloc_min_blks field in xfs_mount and use
this to represent the minimum supported allocation size for inode
chunks. Initialize m_ialloc_min_blks at mount time based on whether
sparse inodes are supported.
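
As a sketch of the arithmetic involved, with hypothetical function names
and example geometry values that are assumptions rather than anything
required by the patch:

```c
#include <stdint.h>

#define INODES_PER_CHUNK_SKETCH	64	/* stands in for XFS_INODES_PER_CHUNK */

/* Blocks in a full inode chunk allocation (models m_ialloc_blks). */
static int ialloc_blks_sketch(int inopblock, int inopblog)
{
	int	inos = inopblock > INODES_PER_CHUNK_SKETCH ?
				inopblock : INODES_PER_CHUNK_SKETCH;

	return inos >> inopblog;
}

/*
 * Minimum blocks for an inode allocation (models m_ialloc_min_blks): the
 * sparse alignment if one is set, otherwise the full chunk size.
 */
static int ialloc_min_blks_sketch(uint32_t spino_align, int ialloc_blks)
{
	return spino_align ? (int)spino_align : ialloc_blks;
}
```

With 8 inodes per block (inopblog of 3), a full chunk needs 8 contiguous
blocks, while a sparse alignment of 2 lowers the minimum to 2 blocks.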

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c | 2 +-
 fs/xfs/libxfs/xfs_sb.c     | 5 +++++
 fs/xfs/xfs_mount.h         | 2 ++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index ffac044..07cce35 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -644,7 +644,7 @@ xfs_ialloc_ag_select(
 		 * if we fail allocation due to alignment issues then it is most
 		 * likely a real ENOSPC condition.
 		 */
-		ineed = mp->m_ialloc_blks;
+		ineed = mp->m_ialloc_min_blks;
 		if (flags && ineed > 1)
 			ineed += xfs_ialloc_cluster_alignment(mp);
 		longest = pag->pagf_longest;
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index a461c2e..2b5b4fe 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -697,6 +697,11 @@ xfs_sb_mount_common(
 	mp->m_ialloc_inos = (int)MAX((__uint16_t)XFS_INODES_PER_CHUNK,
 					sbp->sb_inopblock);
 	mp->m_ialloc_blks = mp->m_ialloc_inos >> sbp->sb_inopblog;
+
+	if (sbp->sb_spino_align)
+		mp->m_ialloc_min_blks = sbp->sb_spino_align;
+	else
+		mp->m_ialloc_min_blks = mp->m_ialloc_blks;
 }
 
 /*
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 0d8abd6..cba7afb 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -136,6 +136,8 @@ typedef struct xfs_mount {
 	__uint64_t		m_flags;	/* global mount flags */
 	int			m_ialloc_inos;	/* inodes in inode allocation */
 	int			m_ialloc_blks;	/* blocks in inode allocation */
+	int			m_ialloc_min_blks;/* min blocks in sparse inode
+						   * allocation */
 	int			m_inoalign_mask;/* mask sb_inoalignmt if used */
 	uint			m_qflags;	/* quota status flags */
 	struct xfs_trans_resv	m_resv;		/* precomputed res values */
-- 
1.9.3


* [PATCH v5 06/18] xfs: sparse inode chunks feature helpers and mount requirements
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (4 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 05/18] xfs: use sparse chunk alignment for min. inode allocation requirement Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 07/18] xfs: add fs geometry bit for sparse inode chunks Brian Foster
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

Add the xfs_sb_version_hassparseinodes() helper, which the sparse inode
chunks feature uses to enable the allocation of sparse inode chunks. The
incompatible feature bit is set on disk at mkfs time to prevent mounting
on unsupported kernels.

Also, enforce the inode alignment requirements for sparse inode chunks
at mount time. When enabled, full inode chunk (and all inode record)
alignment is increased from cluster size to inode chunk size.
Sparse inode alignment must match the cluster size of the fs. Both
superblock alignment fields are set as such by mkfs when sparse inode
support is enabled.

Finally, warn that sparse inode chunks is an experimental feature until
further notice.
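
The two alignment rules can be sketched together as below. The function
names are hypothetical, the cluster size is passed in as a plain byte
count, and the example values in the note are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define INODES_PER_CHUNK_SKETCH	64u	/* stands in for XFS_INODES_PER_CHUNK */

/* Size of a full inode chunk in filesystem blocks. */
static uint32_t chunk_blocks_sketch(uint32_t inodesize, uint8_t blocklog)
{
	return (INODES_PER_CHUNK_SKETCH * inodesize) >> blocklog;
}

/*
 * Both rules at once: full chunk alignment must equal the chunk size and
 * sparse alignment must equal the cluster size, both in blocks.
 */
static bool sparse_alignment_ok_sketch(uint32_t inoalignmt,
				       uint32_t spino_align,
				       uint32_t inodesize, uint8_t blocklog,
				       uint32_t cluster_bytes)
{
	if (inoalignmt != chunk_blocks_sketch(inodesize, blocklog))
		return false;
	return spino_align == (cluster_bytes >> blocklog);
}
```

For 512-byte inodes on 4k blocks with an 8k cluster, this accepts
sb_inoalignmt=8 with sb_spino_align=2 and rejects anything else.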

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_format.h |  7 +++++++
 fs/xfs/libxfs/xfs_sb.c     | 21 +++++++++++++++++++++
 fs/xfs/xfs_mount.c         | 16 ++++++++++++++++
 3 files changed, 44 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index dbca93d..47005b1 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -519,6 +519,7 @@ xfs_sb_has_ro_compat_feature(
 }
 
 #define XFS_SB_FEAT_INCOMPAT_FTYPE	(1 << 0)	/* filetype in dirent */
+#define XFS_SB_FEAT_INCOMPAT_SPINODES	(1 << 1)	/* sparse inode chunks */
 #define XFS_SB_FEAT_INCOMPAT_ALL \
 		(XFS_SB_FEAT_INCOMPAT_FTYPE)
 
@@ -568,6 +569,12 @@ static inline int xfs_sb_version_hasfinobt(xfs_sb_t *sbp)
 		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FINOBT);
 }
 
+static inline bool xfs_sb_version_hassparseinodes(struct xfs_sb *sbp)
+{
+	return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
+		xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_SPINODES);
+}
+
 /*
  * end of superblock version macros
  */
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 2b5b4fe..5b20a6c 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -182,6 +182,27 @@ xfs_mount_validate_sb(
 			return -EFSCORRUPTED;
 	}
 
+	/*
+	 * Full inode chunks must be aligned to inode chunk size when
+	 * sparse inodes are enabled to support the sparse chunk
+	 * allocation algorithm and prevent overlapping inode records.
+	 */
+	if (xfs_sb_version_hassparseinodes(sbp)) {
+		uint32_t	align;
+
+		xfs_alert(mp,
+	"EXPERIMENTAL sparse inode feature enabled. Use at your own risk!");
+
+		align = XFS_INODES_PER_CHUNK * sbp->sb_inodesize
+				>> sbp->sb_blocklog;
+		if (sbp->sb_inoalignmt != align) {
+			xfs_warn(mp,
+"Inode block alignment (%u) must match chunk size (%u) for sparse inodes.",
+				 sbp->sb_inoalignmt, align);
+			return -EINVAL;
+		}
+	}
+
 	if (unlikely(
 	    sbp->sb_logstart == 0 && mp->m_logdev_targp == mp->m_ddev_targp)) {
 		xfs_warn(mp,
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 4fa80e6..61fd023 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -738,6 +738,22 @@ xfs_mountfs(
 	}
 
 	/*
+	 * If enabled, sparse inode chunk alignment is expected to match the
+	 * cluster size. Full inode chunk alignment must match the chunk size,
+	 * but that is checked on sb read verification...
+	 */
+	if (xfs_sb_version_hassparseinodes(&mp->m_sb) &&
+	    mp->m_sb.sb_spino_align !=
+			XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size)) {
+		xfs_warn(mp,
+	"Sparse inode block alignment (%u) must match cluster size (%llu).",
+			 mp->m_sb.sb_spino_align,
+			 XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size));
+		error = -EINVAL;
+		goto out_remove_uuid;
+	}
+
+	/*
 	 * Set inode alignment fields
 	 */
 	xfs_set_inoalignment(mp);
-- 
1.9.3


* [PATCH v5 07/18] xfs: add fs geometry bit for sparse inode chunks
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (5 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 06/18] xfs: sparse inode chunks feature helpers and mount requirements Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 08/18] xfs: introduce inode record hole mask " Brian Foster
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

Define an fs geometry bit for sparse inode chunks such that the
characteristic of the fs can be identified by userspace.
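
From the userspace side, testing the new bit would look roughly like
this. The flag values are the ones in the hunk below; the macro and
function names are hypothetical, and fetching the flags word from the
geometry call is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from xfs_fs.h as shown in this patch. */
#define GEOM_FLAGS_V5SB_SKETCH		0x8000	/* version 5 superblock */
#define GEOM_FLAGS_FINOBT_SKETCH	0x20000	/* free inode btree */
#define GEOM_FLAGS_SPINODES_SKETCH	0x40000	/* sparse inode chunks */

/* True if the geometry flags word reports sparse inode chunk support. */
static bool geom_has_sparse_inodes_sketch(uint32_t flags)
{
	return (flags & GEOM_FLAGS_SPINODES_SKETCH) != 0;
}
```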

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_fs.h | 1 +
 fs/xfs/xfs_fsops.c     | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 18dc721..89689c6 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -239,6 +239,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_V5SB	0x8000	/* version 5 superblock */
 #define XFS_FSOP_GEOM_FLAGS_FTYPE	0x10000	/* inode directory types */
 #define XFS_FSOP_GEOM_FLAGS_FINOBT	0x20000	/* free inode btree */
+#define XFS_FSOP_GEOM_FLAGS_SPINODES	0x40000	/* sparse inode chunks	*/
 
 /*
  * Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 74efe5b..8c18da9 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -101,7 +101,9 @@ xfs_fs_geometry(
 			(xfs_sb_version_hasftype(&mp->m_sb) ?
 				XFS_FSOP_GEOM_FLAGS_FTYPE : 0) |
 			(xfs_sb_version_hasfinobt(&mp->m_sb) ?
-				XFS_FSOP_GEOM_FLAGS_FINOBT : 0);
+				XFS_FSOP_GEOM_FLAGS_FINOBT : 0) |
+			(xfs_sb_version_hassparseinodes(&mp->m_sb) ?
+				XFS_FSOP_GEOM_FLAGS_SPINODES : 0);
 		geo->logsectsize = xfs_sb_version_hassector(&mp->m_sb) ?
 				mp->m_sb.sb_logsectsize : BBSIZE;
 		geo->rtsectsize = mp->m_sb.sb_blocksize;
-- 
1.9.3


* [PATCH v5 08/18] xfs: introduce inode record hole mask for sparse inode chunks
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (6 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 07/18] xfs: add fs geometry bit for sparse inode chunks Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 09/18] xfs: use actual inode count for sparse records in bulkstat/inumbers Brian Foster
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

The inode btrees track 64 inodes per record regardless of inode size.
Thus, inode chunks on disk vary in size depending on the size of the
inodes. This creates a contiguous allocation requirement for new inode
chunks that can be difficult to satisfy on aged filesystems with
fragmented free space.

The inode record freecount currently uses 4 bytes on disk to track the
free inode count. With a maximum freecount value of 64, only one byte is
required. Convert the freecount field to a single byte and use two of
the remaining 3 higher order bytes left for the hole mask field. Use the
final leftover byte for the total count field.

The hole mask field tracks holes in the chunks of physical space that
the inode record refers to. This facilitates the sparse allocation of
inode chunks when contiguous chunks are not available and allows the
inode btrees to identify what portions of the chunk contain valid
inodes. The total count field contains the total number of valid inodes
referred to by the record. This can also be deduced from the hole mask.
The count field provides clarity and redundancy for internal record
verification.

Note that neither of the new fields can be written to disk on
filesystems without sparse inode support. Doing so writes to the
high-order bytes of freecount and causes corruption from the perspective
of older kernels.
The on-disk inobt record data structure is updated with a union to
distinguish between the original, "full" format and the new, "sparse"
format. The conversion routines to get, insert and update records are
updated to translate to and from the on-disk record accordingly such
that freecount remains a 4-byte value on non-supported fs, yet the new
fields of the in-core record are always valid with respect to the
record. This means that higher level code can refer to the current
in-core record format unconditionally and lower level code ensures that
records are translated to/from disk according to the capabilities of the
fs.
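
As a rough illustration of the translation described above (a standalone
userspace sketch, not the kernel code in this patch), the following packs
and unpacks the four bytes that used to hold the be32 freecount. The names
irec_incore, irec_to_disk(), irec_from_disk() and the has_sparse flag are
hypothetical stand-ins for the in-core record and the superblock feature
check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64

/* Hypothetical in-core record holding only the fields discussed here. */
struct irec_incore {
	uint16_t	holemask;
	uint8_t		count;
	uint8_t		freecount;
};

/* Pack the record into the 4 bytes formerly used by the be32 freecount. */
static void irec_to_disk(const struct irec_incore *irec, bool has_sparse,
			 uint8_t disk[4])
{
	if (has_sparse) {
		disk[0] = (uint8_t)(irec->holemask >> 8);	/* be16, high byte */
		disk[1] = (uint8_t)(irec->holemask & 0xff);	/* be16, low byte */
		disk[2] = irec->count;
		disk[3] = irec->freecount;
	} else {
		/* be32 freecount: the new fields never reach the disk */
		disk[0] = 0;
		disk[1] = 0;
		disk[2] = 0;
		disk[3] = irec->freecount;
	}
}

/* Unpack; the in-core fields are valid for either on-disk format. */
static void irec_from_disk(const uint8_t disk[4], bool has_sparse,
			   struct irec_incore *irec)
{
	if (has_sparse) {
		irec->holemask = (uint16_t)((disk[0] << 8) | disk[1]);
		irec->count = disk[2];
		irec->freecount = disk[3];
	} else {
		uint32_t fc = ((uint32_t)disk[0] << 24) |
			      ((uint32_t)disk[1] << 16) |
			      ((uint32_t)disk[2] << 8) | disk[3];

		irec->holemask = 0;			/* full chunk */
		irec->count = INODES_PER_CHUNK;
		irec->freecount = (uint8_t)fc;		/* at most 64 */
	}
}
```

In this sketch the 1-byte freecount occupies the same byte as the
low-order byte of the be32 freecount, which is why writing the holemask
or count on a filesystem without the feature would look like a corrupt
freecount to an older kernel.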

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_format.h       | 34 +++++++++++++++++++++++++---
 fs/xfs/libxfs/xfs_ialloc.c       | 48 +++++++++++++++++++++++++++++++++-------
 fs/xfs/libxfs/xfs_ialloc_btree.c | 11 ++++++++-
 3 files changed, 81 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 47005b1..4f2160d 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1285,26 +1285,54 @@ typedef	__uint64_t	xfs_inofree_t;
 #define	XFS_INOBT_ALL_FREE		((xfs_inofree_t)-1)
 #define	XFS_INOBT_MASK(i)		((xfs_inofree_t)1 << (i))
 
+#define XFS_INOBT_HOLEMASK_FULL		0	/* holemask for full chunk */
+#define XFS_INOBT_HOLEMASK_BITS		(NBBY * sizeof(__uint16_t))
+#define XFS_INODES_PER_HOLEMASK_BIT	\
+	(XFS_INODES_PER_CHUNK / (NBBY * sizeof(__uint16_t)))
+
 static inline xfs_inofree_t xfs_inobt_maskn(int i, int n)
 {
 	return ((n >= XFS_INODES_PER_CHUNK ? 0 : XFS_INOBT_MASK(n)) - 1) << i;
 }
 
 /*
- * Data record structure
+ * The on-disk inode record structure has two formats. The original "full"
+ * format uses a 4-byte freecount. The "sparse" format uses a 1-byte freecount
+ * and replaces the 3 high-order freecount bytes with the holemask and inode
+ * count.
+ *
+ * The holemask of the sparse record format allows an inode chunk to have holes
+ * that refer to blocks not owned by the inode record. This facilitates inode
+ * allocation in the event of severe free space fragmentation.
  */
 typedef struct xfs_inobt_rec {
 	__be32		ir_startino;	/* starting inode number */
-	__be32		ir_freecount;	/* count of free inodes (set bits) */
+	union {
+		struct {
+			__be32	ir_freecount;	/* count of free inodes */
+		} f;
+		struct {
+			__be16	ir_holemask;/* hole mask for sparse chunks */
+			__u8	ir_count;	/* total inode count */
+			__u8	ir_freecount;	/* count of free inodes */
+		} sp;
+	} ir_u;
 	__be64		ir_free;	/* free inode mask */
 } xfs_inobt_rec_t;
 
 typedef struct xfs_inobt_rec_incore {
 	xfs_agino_t	ir_startino;	/* starting inode number */
-	__int32_t	ir_freecount;	/* count of free inodes (set bits) */
+	__uint16_t	ir_holemask;	/* hole mask for sparse chunks */
+	__uint8_t	ir_count;	/* total inode count */
+	__uint8_t	ir_freecount;	/* count of free inodes (set bits) */
 	xfs_inofree_t	ir_free;	/* free inode mask */
 } xfs_inobt_rec_incore_t;
 
+static inline bool xfs_inobt_issparse(uint16_t holemask)
+{
+	/* non-zero holemask represents a sparse rec. */
+	return holemask;
+}
 
 /*
  * Key structure
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 07cce35..008cb24 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -65,6 +65,8 @@ xfs_inobt_lookup(
 	int			*stat)	/* success/failure */
 {
 	cur->bc_rec.i.ir_startino = ino;
+	cur->bc_rec.i.ir_holemask = 0;
+	cur->bc_rec.i.ir_count = 0;
 	cur->bc_rec.i.ir_freecount = 0;
 	cur->bc_rec.i.ir_free = 0;
 	return xfs_btree_lookup(cur, dir, stat);
@@ -82,7 +84,14 @@ xfs_inobt_update(
 	union xfs_btree_rec	rec;
 
 	rec.inobt.ir_startino = cpu_to_be32(irec->ir_startino);
-	rec.inobt.ir_freecount = cpu_to_be32(irec->ir_freecount);
+	if (xfs_sb_version_hassparseinodes(&cur->bc_mp->m_sb)) {
+		rec.inobt.ir_u.sp.ir_holemask = cpu_to_be16(irec->ir_holemask);
+		rec.inobt.ir_u.sp.ir_count = irec->ir_count;
+		rec.inobt.ir_u.sp.ir_freecount = irec->ir_freecount;
+	} else {
+		/* ir_holemask/ir_count not supported on-disk */
+		rec.inobt.ir_u.f.ir_freecount = cpu_to_be32(irec->ir_freecount);
+	}
 	rec.inobt.ir_free = cpu_to_be64(irec->ir_free);
 	return xfs_btree_update(cur, &rec);
 }
@@ -100,12 +109,27 @@ xfs_inobt_get_rec(
 	int			error;
 
 	error = xfs_btree_get_rec(cur, &rec, stat);
-	if (!error && *stat == 1) {
-		irec->ir_startino = be32_to_cpu(rec->inobt.ir_startino);
-		irec->ir_freecount = be32_to_cpu(rec->inobt.ir_freecount);
-		irec->ir_free = be64_to_cpu(rec->inobt.ir_free);
+	if (error || *stat == 0)
+		return error;
+
+	irec->ir_startino = be32_to_cpu(rec->inobt.ir_startino);
+	if (xfs_sb_version_hassparseinodes(&cur->bc_mp->m_sb)) {
+		irec->ir_holemask = be16_to_cpu(rec->inobt.ir_u.sp.ir_holemask);
+		irec->ir_count = rec->inobt.ir_u.sp.ir_count;
+		irec->ir_freecount = rec->inobt.ir_u.sp.ir_freecount;
+	} else {
+		/*
+		 * ir_holemask/ir_count not supported on-disk. Fill in hardcoded
+		 * values for full inode chunks.
+		 */
+		irec->ir_holemask = XFS_INOBT_HOLEMASK_FULL;
+		irec->ir_count = XFS_INODES_PER_CHUNK;
+		irec->ir_freecount =
+				be32_to_cpu(rec->inobt.ir_u.f.ir_freecount);
 	}
-	return error;
+	irec->ir_free = be64_to_cpu(rec->inobt.ir_free);
+
+	return 0;
 }
 
 /*
@@ -114,10 +138,14 @@ xfs_inobt_get_rec(
 STATIC int
 xfs_inobt_insert_rec(
 	struct xfs_btree_cur	*cur,
+	__uint16_t		holemask,
+	__uint8_t		count,
 	__int32_t		freecount,
 	xfs_inofree_t		free,
 	int			*stat)
 {
+	cur->bc_rec.i.ir_holemask = holemask;
+	cur->bc_rec.i.ir_count = count;
 	cur->bc_rec.i.ir_freecount = freecount;
 	cur->bc_rec.i.ir_free = free;
 	return xfs_btree_insert(cur, stat);
@@ -154,7 +182,9 @@ xfs_inobt_insert(
 		}
 		ASSERT(i == 0);
 
-		error = xfs_inobt_insert_rec(cur, XFS_INODES_PER_CHUNK,
+		error = xfs_inobt_insert_rec(cur, XFS_INOBT_HOLEMASK_FULL,
+					     XFS_INODES_PER_CHUNK,
+					     XFS_INODES_PER_CHUNK,
 					     XFS_INOBT_ALL_FREE, &i);
 		if (error) {
 			xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
@@ -1604,7 +1634,9 @@ xfs_difree_finobt(
 		 */
 		XFS_WANT_CORRUPTED_GOTO(ibtrec->ir_freecount == 1, error);
 
-		error = xfs_inobt_insert_rec(cur, ibtrec->ir_freecount,
+		error = xfs_inobt_insert_rec(cur, ibtrec->ir_holemask,
+					     ibtrec->ir_count,
+					     ibtrec->ir_freecount,
 					     ibtrec->ir_free, &i);
 		if (error)
 			goto error;
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 964c465..b95aac5 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -167,7 +167,16 @@ xfs_inobt_init_rec_from_cur(
 	union xfs_btree_rec	*rec)
 {
 	rec->inobt.ir_startino = cpu_to_be32(cur->bc_rec.i.ir_startino);
-	rec->inobt.ir_freecount = cpu_to_be32(cur->bc_rec.i.ir_freecount);
+	if (xfs_sb_version_hassparseinodes(&cur->bc_mp->m_sb)) {
+		rec->inobt.ir_u.sp.ir_holemask =
+					cpu_to_be16(cur->bc_rec.i.ir_holemask);
+		rec->inobt.ir_u.sp.ir_count = cur->bc_rec.i.ir_count;
+		rec->inobt.ir_u.sp.ir_freecount = cur->bc_rec.i.ir_freecount;
+	} else {
+		/* ir_holemask/ir_count not supported on-disk */
+		rec->inobt.ir_u.f.ir_freecount =
+					cpu_to_be32(cur->bc_rec.i.ir_freecount);
+	}
 	rec->inobt.ir_free = cpu_to_be64(cur->bc_rec.i.ir_free);
 }
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 09/18] xfs: use actual inode count for sparse records in bulkstat/inumbers
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (7 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 08/18] xfs: introduce inode record hole mask " Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 10/18] xfs: pass inode count through ordered icreate log item Brian Foster
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

The bulkstat and inumbers mechanisms assume in several places that an
inode record describes a full 64-inode chunk. For example, this
assumption is used to track how many inodes have been processed overall
as well as to determine whether a record has allocated inodes that must
be handled.

This assumption is invalid for sparse inode records. While sparse inodes
will be marked as free in the ir_free mask, they are not accounted as
free in ir_freecount because they cannot be allocated. Therefore,
ir_freecount may be less than 64 inodes in an inode record for which all
physically allocated inodes are free (and in turn ir_freecount < 64 does
not signify that the record has allocated inodes).

The new in-core inobt record format includes the ir_count field. This
holds the number of true, physical inodes tracked by the record. The
in-core ir_count field is always valid as it is hardcoded to
XFS_INODES_PER_CHUNK when sparse inodes are not enabled. Use ir_count to
handle inode records correctly in bulkstat in a generic manner.
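
To make the accounting difference concrete, here is a small standalone
sketch (not the xfs_itable.c code itself). The struct and the two helper
names are hypothetical; only the arithmetic comes from the description
above:

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64

/* Hypothetical subset of the in-core inobt record. */
struct irec_counts {
	uint8_t	count;		/* physical inodes tracked by the record */
	uint8_t	freecount;	/* free inodes among the physical ones */
};

/* Old assumption: every record describes a full 64-inode chunk. */
static int alloc_count_full_chunk(const struct irec_counts *r)
{
	return INODES_PER_CHUNK - r->freecount;
}

/* Generic form: valid for both full and sparse records. */
static int alloc_count_generic(const struct irec_counts *r)
{
	return r->count - r->freecount;
}
```

For a sparse record with 32 physical inodes, all of them free, the old
formula would report 32 allocated inodes while the generic one reports
none. For a full record the two agree.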

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_itable.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 82e3142..7c68058 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -252,7 +252,7 @@ xfs_bulkstat_grab_ichunk(
 		}
 
 		irec->ir_free |= xfs_inobt_maskn(0, idx);
-		*icount = XFS_INODES_PER_CHUNK - irec->ir_freecount;
+		*icount = irec->ir_count - irec->ir_freecount;
 	}
 
 	return 0;
@@ -415,6 +415,8 @@ xfs_bulkstat(
 				goto del_cursor;
 			if (icount) {
 				irbp->ir_startino = r.ir_startino;
+				irbp->ir_holemask = r.ir_holemask;
+				irbp->ir_count = r.ir_count;
 				irbp->ir_freecount = r.ir_freecount;
 				irbp->ir_free = r.ir_free;
 				irbp++;
@@ -447,13 +449,15 @@ xfs_bulkstat(
 			 * If this chunk has any allocated inodes, save it.
 			 * Also start read-ahead now for this chunk.
 			 */
-			if (r.ir_freecount < XFS_INODES_PER_CHUNK) {
+			if (r.ir_freecount < r.ir_count) {
 				xfs_bulkstat_ichunk_ra(mp, agno, &r);
 				irbp->ir_startino = r.ir_startino;
+				irbp->ir_holemask = r.ir_holemask;
+				irbp->ir_count = r.ir_count;
 				irbp->ir_freecount = r.ir_freecount;
 				irbp->ir_free = r.ir_free;
 				irbp++;
-				icount += XFS_INODES_PER_CHUNK - r.ir_freecount;
+				icount += r.ir_count - r.ir_freecount;
 			}
 			error = xfs_btree_increment(cur, 0, &stat);
 			if (error || stat == 0) {
@@ -599,8 +603,7 @@ xfs_inumbers(
 		agino = r.ir_startino + XFS_INODES_PER_CHUNK - 1;
 		buffer[bufidx].xi_startino =
 			XFS_AGINO_TO_INO(mp, agno, r.ir_startino);
-		buffer[bufidx].xi_alloccount =
-			XFS_INODES_PER_CHUNK - r.ir_freecount;
+		buffer[bufidx].xi_alloccount = r.ir_count - r.ir_freecount;
 		buffer[bufidx].xi_allocmask = ~r.ir_free;
 		if (++bufidx == bcount) {
 			long	written;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 10/18] xfs: pass inode count through ordered icreate log item
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (8 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 09/18] xfs: use actual inode count for sparse records in bulkstat/inumbers Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 11/18] xfs: handle sparse inode chunks in icreate log recovery Brian Foster
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

v5 superblocks use an ordered log item for logging the initialization of
inode chunks. The icreate log item is currently hardcoded to an inode
count of 64 inodes.

The agbno and extent length are used to initialize the inode chunk from
log recovery. While an incorrect inode count does not lead to bad inode
chunk initialization, we should pass the correct inode count such that log
recovery has enough data to perform meaningful validity checks on the
chunk.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c | 7 ++++---
 fs/xfs/libxfs/xfs_ialloc.h | 2 +-
 fs/xfs/xfs_log_recover.c   | 4 ++--
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 008cb24..7c002f2 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -250,6 +250,7 @@ xfs_ialloc_inode_init(
 	struct xfs_mount	*mp,
 	struct xfs_trans	*tp,
 	struct list_head	*buffer_list,
+	int			icount,
 	xfs_agnumber_t		agno,
 	xfs_agblock_t		agbno,
 	xfs_agblock_t		length,
@@ -305,7 +306,7 @@ xfs_ialloc_inode_init(
 		 * they track in the AIL as if they were physically logged.
 		 */
 		if (tp)
-			xfs_icreate_log(tp, agno, agbno, mp->m_ialloc_inos,
+			xfs_icreate_log(tp, agno, agbno, icount,
 					mp->m_sb.sb_inodesize, length, gen);
 	} else
 		version = 2;
@@ -524,8 +525,8 @@ xfs_ialloc_ag_alloc(
 	 * rather than a linear progression to prevent the next generation
 	 * number from being easily guessable.
 	 */
-	error = xfs_ialloc_inode_init(args.mp, tp, NULL, agno, args.agbno,
-			args.len, prandom_u32());
+	error = xfs_ialloc_inode_init(args.mp, tp, NULL, newlen, agno,
+			args.agbno, args.len, prandom_u32());
 
 	if (error)
 		return error;
diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
index 100007d..4d4b702 100644
--- a/fs/xfs/libxfs/xfs_ialloc.h
+++ b/fs/xfs/libxfs/xfs_ialloc.h
@@ -156,7 +156,7 @@ int xfs_inobt_get_rec(struct xfs_btree_cur *cur,
  * Inode chunk initialisation routine
  */
 int xfs_ialloc_inode_init(struct xfs_mount *mp, struct xfs_trans *tp,
-			  struct list_head *buffer_list,
+			  struct list_head *buffer_list, int icount,
 			  xfs_agnumber_t agno, xfs_agblock_t agbno,
 			  xfs_agblock_t length, unsigned int gen);
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index a5a945f..ecc73d5 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3091,8 +3091,8 @@ xlog_recover_do_icreate_pass2(
 			XFS_AGB_TO_DADDR(mp, agno, agbno), length, 0))
 		return 0;
 
-	xfs_ialloc_inode_init(mp, NULL, buffer_list, agno, agbno, length,
-					be32_to_cpu(icl->icl_gen));
+	xfs_ialloc_inode_init(mp, NULL, buffer_list, count, agno, agbno, length,
+			      be32_to_cpu(icl->icl_gen));
 	return 0;
 }
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 11/18] xfs: handle sparse inode chunks in icreate log recovery
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (9 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 10/18] xfs: pass inode count through ordered icreate log item Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 12/18] xfs: helper to convert holemask to inode alloc. bitmap Brian Foster
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

Recovery of icreate transactions assumes hardcoded values for the inode
count and chunk length.

Sparse inode chunks are allocated in units of m_ialloc_min_blks. Update
the icreate validity checks to allow for appropriately sized inode
chunks and verify the inode count matches what is expected based on the
extent length rather than assuming a hardcoded count.
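
The two checks can be sketched in isolation as follows. This is an
illustrative userspace version only: the function name and its parameters
are hypothetical, standing in for the mount fields m_ialloc_blks,
m_ialloc_min_blks and sb_inopblog, and the geometry in any example call
is made up:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true if an icreate item with the given inode count and extent
 * length (in blocks) passes both validity checks.
 *
 * full_blks: blocks in a full inode chunk (stand-in for m_ialloc_blks)
 * min_blks:  blocks in a sparse allocation (stand-in for m_ialloc_min_blks)
 * inopblog:  log2 of the number of inodes per block
 */
static bool icreate_geometry_ok(unsigned int count, unsigned int length,
				unsigned int full_blks, unsigned int min_blks,
				unsigned int inopblog)
{
	/* the chunk is either full or a minimum-sized sparse allocation */
	if (length != full_blks && length != min_blks)
		return false;

	/* the inode count must be consistent with the extent length */
	return (count >> inopblog) == length;
}
```

With 8 inodes per block (inopblog of 3), an 8-block full chunk and a
2-block sparse allocation, a 64-inode/8-block item and a 16-inode/2-block
item both pass, while a 64-inode/2-block item does not.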

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index ecc73d5..74d504b 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3068,12 +3068,22 @@ xlog_recover_do_icreate_pass2(
 		return -EINVAL;
 	}
 
-	/* existing allocation is fixed value */
-	ASSERT(count == mp->m_ialloc_inos);
-	ASSERT(length == mp->m_ialloc_blks);
-	if (count != mp->m_ialloc_inos ||
-	     length != mp->m_ialloc_blks) {
-		xfs_warn(log->l_mp, "xlog_recover_do_icreate_trans: bad count 2");
+	/*
+	 * The inode chunk is either full or sparse and we only support
+	 * m_ialloc_min_blks sized sparse allocations at this time.
+	 */
+	if (length != mp->m_ialloc_blks &&
+	    length != mp->m_ialloc_min_blks) {
+		xfs_warn(log->l_mp,
+			 "%s: unsupported chunk length", __FUNCTION__);
+		return -EINVAL;
+	}
+
+	/* verify inode count is consistent with extent length */
+	if ((count >> mp->m_sb.sb_inopblog) != length) {
+		xfs_warn(log->l_mp,
+			 "%s: inconsistent inode count and chunk length",
+			 __FUNCTION__);
 		return -EINVAL;
 	}
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 12/18] xfs: helper to convert holemask to inode alloc. bitmap
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (10 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 11/18] xfs: handle sparse inode chunks in icreate log recovery Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 13/18] xfs: allocate sparse inode chunks on full chunk allocation failure Brian Foster
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

The inobt record holemask field is a condensed data type designed to fit
into the existing on-disk record and is zero based (allocated regions
are set to 0, sparse regions are set to 1) to provide backwards
compatibility. This makes the type somewhat complex for use in higher
level inode manipulations such as individual inode allocation, etc.

Rather than foist the complexity of dealing with this field onto every bit
of logic that requires inode granular information, create a helper to
convert the holemask to an inode allocation bitmap. The inode allocation
bitmap has inode granularity, similar to the inobt record free mask, and
indicates which inodes of the chunk are physically allocated on disk,
irrespective of whether the inode is considered allocated or free by the
filesystem.
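
The expansion from 16 holemask bits to 64 per-inode bits can be sketched
as below. Note that this standalone version uses a plain loop over the
holemask bits instead of xfs_next_bit(), and takes the holemask directly
rather than an in-core record; the function and macro names are
hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64
#define HOLEMASK_BITS		16
#define INODES_PER_HOLEMASK_BIT	(INODES_PER_CHUNK / HOLEMASK_BITS)

/*
 * Expand a 16-bit holemask (1 = hole) into a 64-bit allocation bitmap
 * (1 = inode physically allocated on disk).
 */
static uint64_t holemask_to_allocmask(uint16_t holemask)
{
	/* mask of inode bits covered by a single holemask bit (0xf) */
	const uint64_t per_bit = (1ULL << INODES_PER_HOLEMASK_BIT) - 1;
	uint64_t allocmask = 0;
	int bit;

	for (bit = 0; bit < HOLEMASK_BITS; bit++) {
		if (holemask & (1U << bit))
			continue;	/* hole: no physical inodes here */
		allocmask |= per_bit << (bit * INODES_PER_HOLEMASK_BIT);
	}

	return allocmask;
}
```

For example, a holemask of 0xff00 marks the upper half of the chunk as a
hole, so only the low 32 bits of the resulting bitmap are set.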

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc_btree.c | 51 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_ialloc_btree.h |  3 +++
 2 files changed, 54 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index b95aac5..aa13b46 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -427,3 +427,54 @@ xfs_inobt_maxrecs(
 		return blocklen / sizeof(xfs_inobt_rec_t);
 	return blocklen / (sizeof(xfs_inobt_key_t) + sizeof(xfs_inobt_ptr_t));
 }
+
+/*
+ * Convert the inode record holemask to an inode allocation bitmap. The inode
+ * allocation bitmap is inode granularity and specifies whether an inode is
+ * physically allocated on disk (not whether the inode is considered allocated
+ * or free by the fs).
+ *
+ * A bit value of 1 means the inode is allocated; a value of 0 means it is not.
+ */
+uint64_t
+xfs_inobt_irec_to_allocmask(
+	struct xfs_inobt_rec_incore	*rec)
+{
+	uint64_t			bitmap = 0;
+	uint64_t			inodespbit;
+	int				nextbit;
+	uint				allocbitmap;
+
+	/*
+	 * The holemask has 16-bits for a 64 inode record. Therefore each
+	 * holemask bit represents multiple inodes. Create a mask of bits to set
+	 * in the allocmask for each holemask bit.
+	 */
+	inodespbit = (1 << XFS_INODES_PER_HOLEMASK_BIT) - 1;
+
+	/*
+	 * Allocated inodes are represented by 0 bits in holemask. Invert the 0
+	 * bits to 1 and convert to a uint so we can use xfs_next_bit(). Mask
+	 * anything beyond the 16 holemask bits since this casts to a larger
+	 * type.
+	 */
+	allocbitmap = ~rec->ir_holemask & ((1 << XFS_INOBT_HOLEMASK_BITS) - 1);
+
+	/*
+	 * allocbitmap is the inverted holemask so every set bit represents
+	 * allocated inodes. To expand from 16-bit holemask granularity to
+	 * 64-bit (e.g., bit-per-inode), set inodespbit bits in the target
+	 * bitmap for every holemask bit.
+	 */
+	nextbit = xfs_next_bit(&allocbitmap, 1, 0);
+	while (nextbit != -1) {
+		ASSERT(nextbit < (sizeof(rec->ir_holemask) * NBBY));
+
+		bitmap |= (inodespbit <<
+			   (nextbit * XFS_INODES_PER_HOLEMASK_BIT));
+
+		nextbit = xfs_next_bit(&allocbitmap, 1, nextbit + 1);
+	}
+
+	return bitmap;
+}
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.h b/fs/xfs/libxfs/xfs_ialloc_btree.h
index d7ebea7..2c581ba 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.h
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.h
@@ -62,4 +62,7 @@ extern struct xfs_btree_cur *xfs_inobt_init_cursor(struct xfs_mount *,
 		xfs_btnum_t);
 extern int xfs_inobt_maxrecs(struct xfs_mount *, int, int);
 
+/* ir_holemask to inode allocation bitmap conversion */
+uint64_t xfs_inobt_irec_to_allocmask(struct xfs_inobt_rec_incore *);
+
 #endif	/* __XFS_IALLOC_BTREE_H__ */
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 13/18] xfs: allocate sparse inode chunks on full chunk allocation failure
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (11 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 12/18] xfs: helper to convert holemask to inode alloc. bitmap Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 14/18] xfs: randomly do sparse inode allocations in DEBUG mode Brian Foster
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

xfs_ialloc_ag_alloc() makes several attempts to allocate a full inode
chunk. If all else fails, reduce the allocation to the sparse length and
alignment and attempt to allocate a sparse inode chunk.

If sparse chunk allocation succeeds, check whether an inobt record
already exists that can track the chunk. If so, inherit and update the
existing record. Otherwise, insert a new record for the sparse chunk.

Create helpers to align sparse chunk inode records and insert or update
existing records in the inode btrees. The xfs_inobt_insert_sprec()
helper implements the merge or update semantics required for sparse
inode records with respect to both the inobt and finobt. To update the
inobt, either insert a new record or merge with an existing record. To
update the finobt, use the updated inobt record to either insert or
replace an existing record.
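
The merge rule for two sparse records can be sketched on its own as
follows. This is a simplified userspace illustration with hypothetical
names. Unlike the patch, it tests for allocation overlap directly on the
holemasks rather than on the expanded allocation bitmaps, which should be
equivalent since each holemask bit maps to a fixed group of inodes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64

/* Hypothetical stand-in for the in-core inobt record. */
struct irec {
	uint32_t	startino;
	uint16_t	holemask;	/* 1 = hole, 0 = allocated region */
	uint8_t		count;
	uint8_t		freecount;
	uint64_t	free;		/* 1 = free (or sparse) inode */
};

static bool irec_can_merge(const struct irec *t, const struct irec *s)
{
	/* records must cover the same inode range */
	if (t->startino != s->startino)
		return false;
	/* both records must be sparse and track some inodes */
	if (!t->holemask || !s->holemask || !t->count || !s->count)
		return false;
	/* the result cannot exceed a full record */
	if (t->count + s->count > INODES_PER_CHUNK)
		return false;
	/* allocated regions (0 bits) must not overlap */
	return ((uint16_t)~t->holemask & (uint16_t)~s->holemask) == 0;
}

static void irec_merge(struct irec *t, const struct irec *s)
{
	t->count += s->count;
	t->freecount += s->freecount;
	/* 0 bits mean "allocated" in both masks, so AND combines them */
	t->holemask &= s->holemask;
	t->free &= s->free;
}
```

Merging a new all-free record that covers the low half of a chunk with an
existing record that covers the high half yields a full record with a
zero holemask.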

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c       | 329 +++++++++++++++++++++++++++++++++++++--
 fs/xfs/libxfs/xfs_ialloc_btree.c |  31 ++++
 fs/xfs/libxfs/xfs_ialloc_btree.h |   7 +
 fs/xfs/xfs_trace.h               |  47 ++++++
 4 files changed, 400 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 7c002f2..0aad400 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -378,6 +378,213 @@ xfs_ialloc_inode_init(
 }
 
 /*
+ * Align startino and allocmask for a recently allocated sparse chunk such that
+ * they are fit for insertion (or merge) into the on-disk inode btrees.
+ *
+ * Background:
+ *
+ * When enabled, sparse inode support increases the inode alignment from cluster
+ * size to inode chunk size. This means that the minimum range between two
+ * non-adjacent inode records in the inobt is large enough for a full inode
+ * record. This allows for cluster sized, cluster aligned block allocation
+ * without need to worry about whether the resulting inode record overlaps with
+ * another record in the tree. Without this basic rule, we would have to deal
+ * with the consequences of overlap by potentially undoing recent allocations in
+ * the inode allocation codepath.
+ *
+ * Because of this alignment rule (which is enforced on mount), there are two
+ * inobt possibilities for newly allocated sparse chunks. One is that the
+ * aligned inode record for the chunk covers a range of inodes not already
+ * covered in the inobt (i.e., it is safe to insert a new sparse record). The
+ * other is that a record already exists at the aligned startino that considers
+ * the newly allocated range as sparse. In the latter case, record content is
+ * merged in hope that sparse inode chunks fill to full chunks over time.
+ */
+STATIC void
+xfs_align_sparse_ino(
+	struct xfs_mount		*mp,
+	xfs_agino_t			*startino,
+	uint16_t			*allocmask)
+{
+	xfs_agblock_t			agbno;
+	xfs_agblock_t			mod;
+	int				offset;
+
+	agbno = XFS_AGINO_TO_AGBNO(mp, *startino);
+	mod = agbno % mp->m_sb.sb_inoalignmt;
+	if (!mod)
+		return;
+
+	/* calculate the inode offset and align startino */
+	offset = mod << mp->m_sb.sb_inopblog;
+	*startino -= offset;
+
+	/*
+	 * Since startino has been aligned down, left shift allocmask such that
+	 * it continues to represent the same physical inodes relative to the
+	 * new startino.
+	 */
+	*allocmask <<= offset / XFS_INODES_PER_HOLEMASK_BIT;
+}
+
+/*
+ * Determine whether the source inode record can merge into the target. Both
+ * records must be sparse, the inode ranges must match and there must be no
+ * allocation overlap between the records.
+ */
+STATIC bool
+__xfs_inobt_can_merge(
+	struct xfs_inobt_rec_incore	*trec,	/* tgt record */
+	struct xfs_inobt_rec_incore	*srec)	/* src record */
+{
+	uint64_t			talloc;
+	uint64_t			salloc;
+
+	/* records must cover the same inode range */
+	if (trec->ir_startino != srec->ir_startino)
+		return false;
+
+	/* both records must be sparse */
+	if (!xfs_inobt_issparse(trec->ir_holemask) ||
+	    !xfs_inobt_issparse(srec->ir_holemask))
+		return false;
+
+	/* both records must track some inodes */
+	if (!trec->ir_count || !srec->ir_count)
+		return false;
+
+	/* can't exceed capacity of a full record */
+	if (trec->ir_count + srec->ir_count > XFS_INODES_PER_CHUNK)
+		return false;
+
+	/* verify there is no allocation overlap */
+	talloc = xfs_inobt_irec_to_allocmask(trec);
+	salloc = xfs_inobt_irec_to_allocmask(srec);
+	if (talloc & salloc)
+		return false;
+
+	return true;
+}
+
+/*
+ * Merge the source inode record into the target. The caller must call
+ * __xfs_inobt_can_merge() to ensure the merge is valid.
+ */
+STATIC void
+__xfs_inobt_rec_merge(
+	struct xfs_inobt_rec_incore	*trec,	/* target */
+	struct xfs_inobt_rec_incore	*srec)	/* src */
+{
+	ASSERT(trec->ir_startino == srec->ir_startino);
+
+	/* combine the counts */
+	trec->ir_count += srec->ir_count;
+	trec->ir_freecount += srec->ir_freecount;
+
+	/*
+	 * Merge the holemask and free mask. For both fields, 0 bits refer to
+	 * allocated inodes. We combine the allocated ranges with bitwise AND.
+	 */
+	trec->ir_holemask &= srec->ir_holemask;
+	trec->ir_free &= srec->ir_free;
+}
+
+/*
+ * Insert a new sparse inode chunk into the associated inode btree. The inode
+ * record for the sparse chunk is pre-aligned to a startino that should match
+ * any pre-existing sparse inode record in the tree. This allows sparse chunks
+ * to fill over time.
+ *
+ * This function supports two modes of handling preexisting records depending on
+ * the merge flag. If merge is true, the provided record is merged with the
+ * existing record and updated in place. The merged record is returned in nrec.
+ * If merge is false, an existing record is replaced with the provided record.
+ * If no preexisting record exists, the provided record is always inserted.
+ *
+ * It is considered corruption if a merge is requested and not possible. Given
+ * the sparse inode alignment constraints, this should never happen.
+ */
+STATIC int
+xfs_inobt_insert_sprec(
+	struct xfs_mount		*mp,
+	struct xfs_trans		*tp,
+	struct xfs_buf			*agbp,
+	int				btnum,
+	struct xfs_inobt_rec_incore	*nrec,	/* in/out: new/merged rec. */
+	bool				merge)	/* merge or replace */
+{
+	struct xfs_btree_cur		*cur;
+	struct xfs_agi			*agi = XFS_BUF_TO_AGI(agbp);
+	xfs_agnumber_t			agno = be32_to_cpu(agi->agi_seqno);
+	int				error;
+	int				i;
+	struct xfs_inobt_rec_incore	rec;
+
+	cur = xfs_inobt_init_cursor(mp, tp, agbp, agno, btnum);
+
+	/* the new record is pre-aligned so we know where to look */
+	error = xfs_inobt_lookup(cur, nrec->ir_startino, XFS_LOOKUP_EQ, &i);
+	if (error)
+		goto error;
+	/* if nothing there, insert a new record and return */
+	if (i == 0) {
+		error = xfs_inobt_insert_rec(cur, nrec->ir_holemask,
+					     nrec->ir_count, nrec->ir_freecount,
+					     nrec->ir_free, &i);
+		if (error)
+			goto error;
+		XFS_WANT_CORRUPTED_GOTO(i == 1, error);
+
+		goto out;
+	}
+
+	/*
+	 * A record exists at this startino. Merge or replace the record
+	 * depending on what we've been asked to do.
+	 */
+	if (merge) {
+		error = xfs_inobt_get_rec(cur, &rec, &i);
+		if (error)
+			goto error;
+		XFS_WANT_CORRUPTED_GOTO(i == 1, error);
+		XFS_WANT_CORRUPTED_GOTO(rec.ir_startino == nrec->ir_startino,
+					error);
+
+		/*
+		 * This should never fail. If we have coexisting records that
+		 * cannot merge, something is seriously wrong.
+		 */
+		XFS_WANT_CORRUPTED_GOTO(__xfs_inobt_can_merge(nrec, &rec),
+					error);
+
+		trace_xfs_irec_merge_pre(mp, agno, rec.ir_startino,
+					 rec.ir_holemask, nrec->ir_startino,
+					 nrec->ir_holemask);
+
+		/* merge to nrec to output the updated record */
+		__xfs_inobt_rec_merge(nrec, &rec);
+
+		trace_xfs_irec_merge_post(mp, agno, nrec->ir_startino,
+					  nrec->ir_holemask);
+
+		error = xfs_inobt_rec_check_count(mp, nrec);
+		if (error)
+			goto error;
+	}
+
+	error = xfs_inobt_update(cur, nrec);
+	if (error)
+		goto error;
+
+out:
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	return 0;
+error:
+	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+	return error;
+}
+
+/*
  * Allocate new inodes in the allocation group specified by agbp.
  * Return 0 for success, else error code.
  */
@@ -395,6 +602,8 @@ xfs_ialloc_ag_alloc(
 	xfs_agino_t	newlen;		/* new number of inodes */
 	int		isaligned = 0;	/* inode allocation at stripe unit */
 					/* boundary */
+	uint16_t	allocmask = (uint16_t) -1; /* init. to full chunk */
+	struct xfs_inobt_rec_incore rec;
 	struct xfs_perag *pag;
 
 	memset(&args, 0, sizeof(args));
@@ -510,6 +719,45 @@ xfs_ialloc_ag_alloc(
 			return error;
 	}
 
+	/*
+	 * Finally, try a sparse allocation if the filesystem supports it and
+	 * the sparse allocation length is smaller than a full chunk.
+	 */
+	if (xfs_sb_version_hassparseinodes(&args.mp->m_sb) &&
+	    args.mp->m_ialloc_min_blks < args.mp->m_ialloc_blks &&
+	    args.fsbno == NULLFSBLOCK) {
+		args.type = XFS_ALLOCTYPE_NEAR_BNO;
+		args.agbno = be32_to_cpu(agi->agi_root);
+		args.fsbno = XFS_AGB_TO_FSB(args.mp, agno, args.agbno);
+		args.alignment = args.mp->m_sb.sb_spino_align;
+		args.prod = 1;
+
+		args.minlen = args.mp->m_ialloc_min_blks;
+		args.maxlen = args.minlen;
+
+		/*
+		 * The inode record will be aligned to full chunk size. We must
+		 * prevent sparse allocation from AG boundaries that result in
+		 * invalid inode records, such as records that start at agbno 0
+		 * or extend beyond the AG.
+		 *
+		 * Set min agbno to the first aligned, non-zero agbno and max to
+		 * the last aligned agbno that is at least one full chunk from
+		 * the end of the AG.
+		 */
+		args.min_agbno = args.mp->m_sb.sb_inoalignmt;
+		args.max_agbno = round_down(args.mp->m_sb.sb_agblocks,
+					    args.mp->m_sb.sb_inoalignmt) -
+				 args.mp->m_ialloc_blks;
+
+		error = xfs_alloc_vextent(&args);
+		if (error)
+			return error;
+
+		newlen = args.len << args.mp->m_sb.sb_inopblog;
+		allocmask = (1 << (newlen / XFS_INODES_PER_HOLEMASK_BIT)) - 1;
+	}
+
 	if (args.fsbno == NULLFSBLOCK) {
 		*alloc = 0;
 		return 0;
@@ -534,6 +782,73 @@ xfs_ialloc_ag_alloc(
 	 * Convert the results.
 	 */
 	newino = XFS_OFFBNO_TO_AGINO(args.mp, args.agbno, 0);
+
+	if (xfs_inobt_issparse(~allocmask)) {
+		/*
+		 * We've allocated a sparse chunk. Align the startino and mask.
+		 */
+		xfs_align_sparse_ino(args.mp, &newino, &allocmask);
+
+		rec.ir_startino = newino;
+		rec.ir_holemask = ~allocmask;
+		rec.ir_count = newlen;
+		rec.ir_freecount = newlen;
+		rec.ir_free = XFS_INOBT_ALL_FREE;
+
+		/*
+		 * Insert the sparse record into the inobt and allow for a merge
+		 * if necessary. If a merge does occur, rec is updated to the
+		 * merged record.
+		 */
+		error = xfs_inobt_insert_sprec(args.mp, tp, agbp, XFS_BTNUM_INO,
+					       &rec, true);
+		if (error == -EFSCORRUPTED) {
+			xfs_alert(args.mp,
+	"invalid sparse inode record: ino 0x%llx holemask 0x%x count %u",
+				  XFS_AGINO_TO_INO(args.mp, agno,
+						   rec.ir_startino),
+				  rec.ir_holemask, rec.ir_count);
+			xfs_force_shutdown(args.mp, SHUTDOWN_CORRUPT_INCORE);
+		}
+		if (error)
+			return error;
+
+		/*
+		 * Unlike for the inobt, we can't merge the part we've just
+		 * allocated into an existing finobt record, due to finobt
+		 * semantics. The original record may or may not exist there,
+		 * independent of whether physical inodes exist in this chunk.
+		 *
+		 * We must update the finobt record based on the inobt record.
+		 * rec contains the fully merged and up to date inobt record
+		 * from the previous call. Set merge false to replace any
+		 * existing record with this one.
+		 */
+		if (xfs_sb_version_hasfinobt(&args.mp->m_sb)) {
+			error = xfs_inobt_insert_sprec(args.mp, tp, agbp,
+						       XFS_BTNUM_FINO, &rec,
+						       false);
+			if (error)
+				return error;
+		}
+	} else {
+		/* full chunk - insert new records to both btrees */
+		error = xfs_inobt_insert(args.mp, tp, agbp, newino, newlen,
+					 XFS_BTNUM_INO);
+		if (error)
+			return error;
+
+		if (xfs_sb_version_hasfinobt(&args.mp->m_sb)) {
+			error = xfs_inobt_insert(args.mp, tp, agbp, newino,
+						 newlen, XFS_BTNUM_FINO);
+			if (error)
+				return error;
+		}
+	}
+
+	/*
+	 * Update AGI counts and newino.
+	 */
 	be32_add_cpu(&agi->agi_count, newlen);
 	be32_add_cpu(&agi->agi_freecount, newlen);
 	pag = xfs_perag_get(args.mp, agno);
@@ -542,20 +857,6 @@ xfs_ialloc_ag_alloc(
 	agi->agi_newino = cpu_to_be32(newino);
 
 	/*
-	 * Insert records describing the new inode chunk into the btrees.
-	 */
-	error = xfs_inobt_insert(args.mp, tp, agbp, newino, newlen,
-				 XFS_BTNUM_INO);
-	if (error)
-		return error;
-
-	if (xfs_sb_version_hasfinobt(&args.mp->m_sb)) {
-		error = xfs_inobt_insert(args.mp, tp, agbp, newino, newlen,
-					 XFS_BTNUM_FINO);
-		if (error)
-			return error;
-	}
-	/*
 	 * Log allocation group header fields
 	 */
 	xfs_ialloc_log_agi(tp, agbp,
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index aa13b46..674ad8f 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -478,3 +478,34 @@ xfs_inobt_irec_to_allocmask(
 
 	return bitmap;
 }
+
+#if defined(DEBUG) || defined(XFS_WARN)
+/*
+ * Verify that an in-core inode record has a valid inode count.
+ */
+int
+xfs_inobt_rec_check_count(
+	struct xfs_mount		*mp,
+	struct xfs_inobt_rec_incore	*rec)
+{
+	int				inocount = 0;
+	int				nextbit = 0;
+	uint64_t			allocbmap;
+	int				wordsz;
+
+	wordsz = sizeof(allocbmap) / sizeof(unsigned int);
+	allocbmap = xfs_inobt_irec_to_allocmask(rec);
+
+	nextbit = xfs_next_bit((uint *) &allocbmap, wordsz, nextbit);
+	while (nextbit != -1) {
+		inocount++;
+		nextbit = xfs_next_bit((uint *) &allocbmap, wordsz,
+				       nextbit + 1);
+	}
+
+	if (inocount != rec->ir_count)
+		return -EFSCORRUPTED;
+
+	return 0;
+}
+#endif	/* DEBUG */
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.h b/fs/xfs/libxfs/xfs_ialloc_btree.h
index 2c581ba..bd88453 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.h
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.h
@@ -65,4 +65,11 @@ extern int xfs_inobt_maxrecs(struct xfs_mount *, int, int);
 /* ir_holemask to inode allocation bitmap conversion */
 uint64_t xfs_inobt_irec_to_allocmask(struct xfs_inobt_rec_incore *);
 
+#if defined(DEBUG) || defined(XFS_WARN)
+int xfs_inobt_rec_check_count(struct xfs_mount *,
+			      struct xfs_inobt_rec_incore *);
+#else
+#define xfs_inobt_rec_check_count(mp, rec)	0
+#endif	/* DEBUG */
+
 #endif	/* __XFS_IALLOC_BTREE_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 51372e3..12a4bf4 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -734,6 +734,53 @@ TRACE_EVENT(xfs_iomap_prealloc_size,
 		  __entry->blocks, __entry->shift, __entry->writeio_blocks)
 )
 
+TRACE_EVENT(xfs_irec_merge_pre,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agino_t agino,
+		 uint16_t holemask, xfs_agino_t nagino, uint16_t nholemask),
+	TP_ARGS(mp, agno, agino, holemask, nagino, nholemask),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agino_t, agino)
+		__field(uint16_t, holemask)
+		__field(xfs_agino_t, nagino)
+		__field(uint16_t, nholemask)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->agino = agino;
+		__entry->holemask = holemask;
+		__entry->nagino = nagino;
+		__entry->nholemask = nholemask;
+	),
+	TP_printk("dev %d:%d agno %d inobt (%u:0x%x) new (%u:0x%x)",
+		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno,
+		  __entry->agino, __entry->holemask, __entry->nagino,
+		  __entry->nholemask)
+)
+
+TRACE_EVENT(xfs_irec_merge_post,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agino_t agino,
+		 uint16_t holemask),
+	TP_ARGS(mp, agno, agino, holemask),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agino_t, agino)
+		__field(uint16_t, holemask)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->agino = agino;
+		__entry->holemask = holemask;
+	),
+	TP_printk("dev %d:%d agno %d inobt (%u:0x%x)", MAJOR(__entry->dev),
+		  MINOR(__entry->dev), __entry->agno, __entry->agino,
+		  __entry->holemask)
+)
+
 #define DEFINE_IREF_EVENT(name) \
 DEFINE_EVENT(xfs_iref_class, name, \
 	TP_PROTO(struct xfs_inode *ip, unsigned long caller_ip), \
-- 
1.9.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 14/18] xfs: randomly do sparse inode allocations in DEBUG mode
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (12 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 13/18] xfs: allocate sparse inode chunks on full chunk allocation failure Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 15/18] xfs: filter out sparse regions from individual inode allocation Brian Foster
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

Sparse inode allocations generally only occur when full inode chunk
allocation fails. This requires some level of filesystem space usage and
fragmentation.

For filesystems formatted with sparse inode chunks enabled, do random
sparse inode chunk allocs when compiled in DEBUG mode to increase test
coverage.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 0aad400..f719706 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -606,9 +606,18 @@ xfs_ialloc_ag_alloc(
 	struct xfs_inobt_rec_incore rec;
 	struct xfs_perag *pag;
 
+	int		do_sparse = 0;
+
+#ifdef DEBUG
+	/* randomly do sparse inode allocations */
+	if (xfs_sb_version_hassparseinodes(&tp->t_mountp->m_sb))
+		do_sparse = prandom_u32() & 1;
+#endif
+
 	memset(&args, 0, sizeof(args));
 	args.tp = tp;
 	args.mp = tp->t_mountp;
+	args.fsbno = NULLFSBLOCK;
 
 	/*
 	 * Locking will ensure that we don't have two callers in here
@@ -629,6 +638,8 @@ xfs_ialloc_ag_alloc(
 	agno = be32_to_cpu(agi->agi_seqno);
 	args.agbno = XFS_AGINO_TO_AGBNO(args.mp, newino) +
 		     args.mp->m_ialloc_blks;
+	if (do_sparse)
+		goto sparse_alloc;
 	if (likely(newino != NULLAGINO &&
 		  (args.agbno < be32_to_cpu(agi->agi_length)))) {
 		args.fsbno = XFS_AGB_TO_FSB(args.mp, agno, args.agbno);
@@ -667,8 +678,7 @@ xfs_ialloc_ag_alloc(
 		 * subsequent requests.
 		 */
 		args.minalignslop = 0;
-	} else
-		args.fsbno = NULLFSBLOCK;
+	}
 
 	if (unlikely(args.fsbno == NULLFSBLOCK)) {
 		/*
@@ -726,6 +736,7 @@ xfs_ialloc_ag_alloc(
 	if (xfs_sb_version_hassparseinodes(&args.mp->m_sb) &&
 	    args.mp->m_ialloc_min_blks < args.mp->m_ialloc_blks &&
 	    args.fsbno == NULLFSBLOCK) {
+sparse_alloc:
 		args.type = XFS_ALLOCTYPE_NEAR_BNO;
 		args.agbno = be32_to_cpu(agi->agi_root);
 		args.fsbno = XFS_AGB_TO_FSB(args.mp, agno, args.agbno);
-- 
1.9.3

* [PATCH v5 15/18] xfs: filter out sparse regions from individual inode allocation
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (13 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 14/18] xfs: randomly do sparse inode allocations in DEBUG mode Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 16/18] xfs: only free allocated regions of inode chunks Brian Foster
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

Inode allocation from an existing record with free inodes traditionally
selects the first inode available according to the ir_free mask. With
sparse inode chunks, the ir_free mask could refer to an unallocated
region. We must mask the unallocated regions out of ir_free before using
it to select a free inode in the chunk.

Update the xfs_inobt_first_free_inode() helper to find the first free
inode within the allocated regions of the inode chunk.
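
As a rough, standalone illustration of the masking described above (a
userspace sketch, not the kernel code): it assumes a 64-inode chunk and
a 16-bit holemask, so each holemask bit covers 4 inodes, and that a set
holemask bit marks a hole. The helper names holemask_to_allocmask() and
first_free_inode() are hypothetical stand-ins for
xfs_inobt_irec_to_allocmask() and xfs_inobt_first_free_inode().

```c
#include <stdint.h>

/* Assumed geometry: 64 inodes per chunk, 16 holemask bits. */
#define INODES_PER_CHUNK	64
#define HOLEMASK_BITS		16
#define INODES_PER_HOLEMASK_BIT	(INODES_PER_CHUNK / HOLEMASK_BITS)

/*
 * Expand a 16-bit holemask (set bit == hole) into a 64-bit bitmap in
 * which a set bit means the inode is physically allocated.
 */
static uint64_t holemask_to_allocmask(uint16_t holemask)
{
	const uint64_t	per_bit = ((uint64_t)1 << INODES_PER_HOLEMASK_BIT) - 1;
	uint64_t	allocmask = 0;
	int		bit;

	for (bit = 0; bit < HOLEMASK_BITS; bit++) {
		if (holemask & (1u << bit))
			continue;	/* sparse region, no inodes here */
		allocmask |= per_bit << (bit * INODES_PER_HOLEMASK_BIT);
	}
	return allocmask;
}

/*
 * Return the offset of the first inode that is both free and physically
 * allocated, or -1 if there is none.
 */
static int first_free_inode(uint16_t holemask, uint64_t ir_free)
{
	uint64_t	realfree = holemask_to_allocmask(holemask) & ir_free;
	int		i;

	for (i = 0; i < INODES_PER_CHUNK; i++) {
		if (realfree & ((uint64_t)1 << i))
			return i;
	}
	return -1;
}
```

With a holemask of 0x0001 and every ir_free bit set, the sketch skips
the first four (unallocated) inodes and returns offset 4 rather than 0.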

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index f719706..a673da3 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -1074,13 +1074,24 @@ xfs_ialloc_get_rec(
 }
 
 /*
- * Return the offset of the first free inode in the record.
+ * Return the offset of the first free inode in the record. If the inode chunk
+ * is sparsely allocated, we convert the record holemask to inode granularity
+ * and mask off the unallocated regions from the inode free mask.
  */
 STATIC int
 xfs_inobt_first_free_inode(
 	struct xfs_inobt_rec_incore	*rec)
 {
-	return xfs_lowbit64(rec->ir_free);
+	xfs_inofree_t			realfree;
+
+	/* if there are no holes, return the first available offset */
+	if (!xfs_inobt_issparse(rec->ir_holemask))
+		return xfs_lowbit64(rec->ir_free);
+
+	realfree = xfs_inobt_irec_to_allocmask(rec);
+	realfree &= rec->ir_free;
+
+	return xfs_lowbit64(realfree);
 }
 
 /*
-- 
1.9.3

* [PATCH v5 16/18] xfs: only free allocated regions of inode chunks
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (14 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 15/18] xfs: filter out sparse regions from individual inode allocation Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 17/18] xfs: skip unallocated regions of inode chunks in xfs_ifree_cluster() Brian Foster
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

An inode chunk is currently added to the transaction free list based on
a simple fsb conversion and hardcoded chunk length. The nature of sparse
chunks is such that the physical chunk of inodes on disk may consist of
one or more discontiguous parts. Blocks that reside in the holes of the
inode chunk are not inodes and could be allocated to any other use or
not allocated at all.

Refactor the existing xfs_bmap_add_free() call into the
xfs_difree_inode_chunk() helper. The new helper uses the existing
calculation if a chunk is not sparse. Otherwise, it uses the inobt
record holemask to free each contiguous allocated region of the chunk.
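
The range walk can be sketched in plain C as follows. This is a
simplified userspace illustration, not the code in the patch: it
assumes 16 holemask bits with 4 inodes per bit, and struct extent and
holemask_to_extents() are hypothetical names. It collects one
(start block, length) pair per run of clear holemask bits instead of
calling xfs_bmap_add_free().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 16 holemask bits, 4 inodes per holemask bit. */
#define HOLEMASK_BITS		16
#define INODES_PER_HOLEMASK_BIT	4

struct extent {
	unsigned int	start_blk;	/* first block of the extent */
	unsigned int	nblks;		/* length in blocks */
};

/*
 * Emit one extent for every run of clear (allocated) holemask bits.
 * chunk_agbno is the block of the chunk's first inode. Returns the
 * number of extents written to out[], at most max_out.
 */
static int holemask_to_extents(uint16_t holemask, unsigned int chunk_agbno,
			       unsigned int inodes_per_block,
			       struct extent *out, int max_out)
{
	int	n = 0;
	int	bit = 0;

	assert(inodes_per_block > 0);

	while (bit < HOLEMASK_BITS && n < max_out) {
		int	start;

		/* skip over holes */
		while (bit < HOLEMASK_BITS && (holemask & (1u << bit)))
			bit++;
		if (bit == HOLEMASK_BITS)
			break;

		/* extend the run across contiguous allocated bits */
		start = bit;
		while (bit < HOLEMASK_BITS && !(holemask & (1u << bit)))
			bit++;

		out[n].start_blk = chunk_agbno +
			(start * INODES_PER_HOLEMASK_BIT) / inodes_per_block;
		out[n].nblks = ((bit - start) * INODES_PER_HOLEMASK_BIT) /
			       inodes_per_block;
		n++;
	}
	return n;
}
```

For example, with 4 inodes per block, a chunk starting at block 100 and
a holemask of 0x00f0, the sketch yields two extents: 4 blocks at 100
and 8 blocks at 108. The 4 blocks in between are the hole and are left
alone.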

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c | 81 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 78 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index a673da3..a29dd4e 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -1798,6 +1798,83 @@ out_error:
 	return error;
 }
 
+/*
+ * Free the blocks of an inode chunk. We must consider that the inode chunk
+ * might be sparse and only free the regions that are allocated as part of the
+ * chunk.
+ */
+STATIC void
+xfs_difree_inode_chunk(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t			agno,
+	struct xfs_inobt_rec_incore	*rec,
+	struct xfs_bmap_free		*flist)
+{
+	xfs_agblock_t	sagbno = XFS_AGINO_TO_AGBNO(mp, rec->ir_startino);
+	int		startidx, endidx;
+	int		nextbit;
+	xfs_agblock_t	agbno;
+	int		contigblk;
+	DECLARE_BITMAP(holemask, XFS_INOBT_HOLEMASK_BITS);
+
+	if (!xfs_inobt_issparse(rec->ir_holemask)) {
+		/* not sparse, calculate extent info directly */
+		xfs_bmap_add_free(XFS_AGB_TO_FSB(mp, agno,
+				  XFS_AGINO_TO_AGBNO(mp, rec->ir_startino)),
+				  mp->m_ialloc_blks, flist, mp);
+		return;
+	}
+
+	/* holemask is only 16-bits (fits in an unsigned long) */
+	ASSERT(sizeof(rec->ir_holemask) <= sizeof(holemask[0]));
+	holemask[0] = rec->ir_holemask;
+
+	/*
+	 * Find contiguous ranges of zeroes (i.e., allocated regions) in the
+	 * holemask and convert the start/end index of each range to an extent.
+	 * We start with the start and end index both pointing at the first 0 in
+	 * the mask.
+	 */
+	startidx = endidx = find_first_zero_bit(holemask,
+						XFS_INOBT_HOLEMASK_BITS);
+	nextbit = startidx + 1;
+	while (startidx < XFS_INOBT_HOLEMASK_BITS) {
+		nextbit = find_next_zero_bit(holemask, XFS_INOBT_HOLEMASK_BITS,
+					     nextbit);
+		/*
+		 * If the next zero bit is contiguous, update the end index of
+		 * the current range and continue.
+		 */
+		if (nextbit != XFS_INOBT_HOLEMASK_BITS &&
+		    nextbit == endidx + 1) {
+			endidx = nextbit;
+			goto next;
+		}
+
+		/*
+		 * nextbit is not contiguous with the current end index. Convert
+		 * the current start/end to an extent and add it to the free
+		 * list.
+		 */
+		agbno = sagbno + (startidx * XFS_INODES_PER_HOLEMASK_BIT) /
+				  mp->m_sb.sb_inopblock;
+		contigblk = ((endidx - startidx + 1) *
+			     XFS_INODES_PER_HOLEMASK_BIT) /
+			    mp->m_sb.sb_inopblock;
+
+		ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
+		ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
+		xfs_bmap_add_free(XFS_AGB_TO_FSB(mp, agno, agbno), contigblk,
+				  flist, mp);
+
+		/* reset range to current bit and carry on... */
+		startidx = endidx = nextbit;
+
+next:
+		nextbit++;
+	}
+}
+
 STATIC int
 xfs_difree_inobt(
 	struct xfs_mount		*mp,
@@ -1889,9 +1966,7 @@ xfs_difree_inobt(
 			goto error0;
 		}
 
-		xfs_bmap_add_free(XFS_AGB_TO_FSB(mp, agno,
-				  XFS_AGINO_TO_AGBNO(mp, rec.ir_startino)),
-				  mp->m_ialloc_blks, flist, mp);
+		xfs_difree_inode_chunk(mp, agno, &rec, flist);
 	} else {
 		*deleted = 0;
 
-- 
1.9.3

* [PATCH v5 17/18] xfs: skip unallocated regions of inode chunks in xfs_ifree_cluster()
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (15 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 16/18] xfs: only free allocated regions of inode chunks Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 18:13 ` [PATCH v5 18/18] xfs: enable sparse inode chunks for v5 superblocks Brian Foster
  2015-02-19 19:10 ` [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

xfs_ifree_cluster() is called to mark all in-memory inodes and inode
buffers as stale. This occurs after we've removed the inobt records and
dropped any references to inobt data. xfs_ifree_cluster() uses the
starting inode number to walk the namespace of inodes expected for a
single chunk, one cluster buffer at a time. The cluster buffer disk
addresses are calculated by decoding the sequential inode numbers
expected from the chunk.

The problem with this approach is that if the inode chunk being removed
is a sparse chunk, not all of the buffer addresses that are calculated
as part of this sequence may be inode clusters. Attempting to acquire
the buffer based on expected inode characteristics (i.e., cluster length)
can lead to errors and is generally incorrect.

We already use a couple variables to carry requisite state from
xfs_difree() to xfs_ifree_cluster(). Rather than add a third, define a
new internal structure to carry the existing parameters through these
functions. Add an alloc field that represents the physical allocation
bitmap of inodes in the chunk being removed. Modify xfs_ifree_cluster()
to check each inode against the bitmap and skip the clusters that were
never allocated as real inodes on disk.
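
The per-cluster check can be sketched in isolation like this. It is a
userspace illustration under assumed geometry (64 inodes per chunk),
and clusters_to_invalidate() is a hypothetical name. Like the loop
described above, it tests only the first inode of each cluster against
the allocation bitmap, since sparse regions are assumed to be cluster
aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 64 inodes per chunk. */
#define INODES_PER_CHUNK	64

/*
 * Record the chunk-relative inode offset of each cluster whose first
 * inode is physically allocated according to the alloc bitmap. The
 * caller must provide room for INODES_PER_CHUNK / inodes_per_cluster
 * entries in offsets[]. Returns the number of clusters recorded.
 */
static int clusters_to_invalidate(uint64_t alloc, int inodes_per_cluster,
				  int *offsets)
{
	int	n = 0;
	int	off;

	assert(inodes_per_cluster > 0);
	assert(INODES_PER_CHUNK % inodes_per_cluster == 0);

	for (off = 0; off < INODES_PER_CHUNK; off += inodes_per_cluster) {
		/* first inode of the cluster sits in a sparse region */
		if ((alloc & ((uint64_t)1 << off)) == 0)
			continue;
		offsets[n++] = off;
	}
	return n;
}
```

With 16 inodes per cluster and only the low 32 bits of alloc set, the
sketch visits the clusters at offsets 0 and 16 and skips the two that
were never allocated.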

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c | 17 +++++++----------
 fs/xfs/libxfs/xfs_ialloc.h | 10 ++++++++--
 fs/xfs/xfs_inode.c         | 28 ++++++++++++++++++++--------
 3 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index a29dd4e..b81e5e3 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -1882,8 +1882,7 @@ xfs_difree_inobt(
 	struct xfs_buf			*agbp,
 	xfs_agino_t			agino,
 	struct xfs_bmap_free		*flist,
-	int				*deleted,
-	xfs_ino_t			*first_ino,
+	struct xfs_icluster		*xic,
 	struct xfs_inobt_rec_incore	*orec)
 {
 	struct xfs_agi			*agi = XFS_BUF_TO_AGI(agbp);
@@ -1941,9 +1940,9 @@ xfs_difree_inobt(
 	 */
 	if (!(mp->m_flags & XFS_MOUNT_IKEEP) &&
 	    (rec.ir_free == XFS_INOBT_ALL_FREE)) {
-
-		*deleted = 1;
-		*first_ino = XFS_AGINO_TO_INO(mp, agno, rec.ir_startino);
+		xic->deleted = 1;
+		xic->first_ino = XFS_AGINO_TO_INO(mp, agno, rec.ir_startino);
+		xic->alloc = xfs_inobt_irec_to_allocmask(&rec);
 
 		/*
 		 * Remove the inode cluster from the AGI B+Tree, adjust the
@@ -1968,7 +1967,7 @@ xfs_difree_inobt(
 
 		xfs_difree_inode_chunk(mp, agno, &rec, flist);
 	} else {
-		*deleted = 0;
+		xic->deleted = 0;
 
 		error = xfs_inobt_update(cur, &rec);
 		if (error) {
@@ -2107,8 +2106,7 @@ xfs_difree(
 	struct xfs_trans	*tp,		/* transaction pointer */
 	xfs_ino_t		inode,		/* inode to be freed */
 	struct xfs_bmap_free	*flist,		/* extents to free */
-	int			*deleted,/* set if inode cluster was deleted */
-	xfs_ino_t		*first_ino)/* first inode in deleted cluster */
+	struct xfs_icluster	*xic)	/* cluster info if deleted */
 {
 	/* REFERENCED */
 	xfs_agblock_t		agbno;	/* block number containing inode */
@@ -2159,8 +2157,7 @@ xfs_difree(
 	/*
 	 * Fix up the inode allocation btree.
 	 */
-	error = xfs_difree_inobt(mp, tp, agbp, agino, flist, deleted, first_ino,
-				 &rec);
+	error = xfs_difree_inobt(mp, tp, agbp, agino, flist, xic, &rec);
 	if (error)
 		goto error0;
 
diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
index 4d4b702..12401fe 100644
--- a/fs/xfs/libxfs/xfs_ialloc.h
+++ b/fs/xfs/libxfs/xfs_ialloc.h
@@ -28,6 +28,13 @@ struct xfs_btree_cur;
 /* Move inodes in clusters of this size */
 #define	XFS_INODE_BIG_CLUSTER_SIZE	8192
 
+struct xfs_icluster {
+	bool		deleted;	/* record is deleted */
+	xfs_ino_t	first_ino;	/* first inode number */
+	uint64_t	alloc;		/* inode phys. allocation bitmap for
+					 * sparse chunks */
+};
+
 /* Calculate and return the number of filesystem blocks per inode cluster */
 static inline int
 xfs_icluster_size_fsb(
@@ -90,8 +97,7 @@ xfs_difree(
 	struct xfs_trans *tp,		/* transaction pointer */
 	xfs_ino_t	inode,		/* inode to be freed */
 	struct xfs_bmap_free *flist,	/* extents to free */
-	int		*deleted,	/* set if inode cluster was deleted */
-	xfs_ino_t	*first_ino);	/* first inode in deleted cluster */
+	struct xfs_icluster *xic);	/* cluster info if deleted */
 
 /*
  * Return the location of the inode in imap, for mapping it into a buffer.
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index daafa1f..a054110 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2182,9 +2182,9 @@ xfs_iunlink_remove(
  */
 STATIC int
 xfs_ifree_cluster(
-	xfs_inode_t	*free_ip,
-	xfs_trans_t	*tp,
-	xfs_ino_t	inum)
+	xfs_inode_t		*free_ip,
+	xfs_trans_t		*tp,
+	struct xfs_icluster	*xic)
 {
 	xfs_mount_t		*mp = free_ip->i_mount;
 	int			blks_per_cluster;
@@ -2197,13 +2197,26 @@ xfs_ifree_cluster(
 	xfs_inode_log_item_t	*iip;
 	xfs_log_item_t		*lip;
 	struct xfs_perag	*pag;
+	xfs_ino_t		inum;
 
+	inum = xic->first_ino;
 	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, inum));
 	blks_per_cluster = xfs_icluster_size_fsb(mp);
 	inodes_per_cluster = blks_per_cluster << mp->m_sb.sb_inopblog;
 	nbufs = mp->m_ialloc_blks / blks_per_cluster;
 
 	for (j = 0; j < nbufs; j++, inum += inodes_per_cluster) {
+		/*
+		 * The allocation bitmap tells us which inodes of the chunk were
+		 * physically allocated. Skip the cluster if an inode falls into
+		 * a sparse region.
+		 */
+		if ((xic->alloc & XFS_INOBT_MASK(inum - xic->first_ino)) == 0) {
+			ASSERT(((inum - xic->first_ino) %
+				inodes_per_cluster) == 0);
+			continue;
+		}
+
 		blkno = XFS_AGB_TO_DADDR(mp, XFS_INO_TO_AGNO(mp, inum),
 					 XFS_INO_TO_AGBNO(mp, inum));
 
@@ -2361,8 +2374,7 @@ xfs_ifree(
 	xfs_bmap_free_t	*flist)
 {
 	int			error;
-	int			delete;
-	xfs_ino_t		first_ino;
+	struct xfs_icluster	xic = { 0 };
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 	ASSERT(ip->i_d.di_nlink == 0);
@@ -2378,7 +2390,7 @@ xfs_ifree(
 	if (error)
 		return error;
 
-	error = xfs_difree(tp, ip->i_ino, flist, &delete, &first_ino);
+	error = xfs_difree(tp, ip->i_ino, flist, &xic);
 	if (error)
 		return error;
 
@@ -2395,8 +2407,8 @@ xfs_ifree(
 	ip->i_d.di_gen++;
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-	if (delete)
-		error = xfs_ifree_cluster(ip, tp, first_ino);
+	if (xic.deleted)
+		error = xfs_ifree_cluster(ip, tp, &xic);
 
 	return error;
 }
-- 
1.9.3

* [PATCH v5 18/18] xfs: enable sparse inode chunks for v5 superblocks
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (16 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 17/18] xfs: skip unallocated regions of inode chunks in xfs_ifree_cluster() Brian Foster
@ 2015-02-19 18:13 ` Brian Foster
  2015-02-19 19:10 ` [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
  18 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 18:13 UTC (permalink / raw)
  To: xfs

Enable mounting of filesystems with sparse inode support enabled. Add
the incompat. feature bit to the *_ALL mask.
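
To illustrate why the bit must be in the *_ALL mask, here is a minimal
standalone sketch of an "unknown incompat feature" test. The FEAT_*
macros and has_unknown_incompat() are local stand-ins that mirror the
shape of the definitions touched below; they are not the kernel's
names, and how the mount path acts on the result is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the incompat feature bits. */
#define FEAT_INCOMPAT_FTYPE	(1u << 0)	/* filetype in dirent */
#define FEAT_INCOMPAT_SPINODES	(1u << 1)	/* sparse inode chunks */
#define FEAT_INCOMPAT_ALL \
		(FEAT_INCOMPAT_FTYPE | \
		 FEAT_INCOMPAT_SPINODES)
#define FEAT_INCOMPAT_UNKNOWN	(~FEAT_INCOMPAT_ALL)

/* True if the superblock carries an incompat bit we do not understand. */
static bool has_unknown_incompat(uint32_t features_incompat)
{
	return (features_incompat & FEAT_INCOMPAT_UNKNOWN) != 0;
}
```

If SPINODES were left out of FEAT_INCOMPAT_ALL, a superblock with that
bit set would be reported as carrying an unknown feature.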

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_format.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 4f2160d..7122ff6 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -521,7 +521,8 @@ xfs_sb_has_ro_compat_feature(
 #define XFS_SB_FEAT_INCOMPAT_FTYPE	(1 << 0)	/* filetype in dirent */
 #define XFS_SB_FEAT_INCOMPAT_SPINODES	(1 << 1)	/* sparse inode chunks */
 #define XFS_SB_FEAT_INCOMPAT_ALL \
-		(XFS_SB_FEAT_INCOMPAT_FTYPE)
+		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
+		 XFS_SB_FEAT_INCOMPAT_SPINODES)
 
 #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
 static inline bool
-- 
1.9.3

* Re: [PATCH v5 00/18] xfs: sparse inode chunks
  2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
                   ` (17 preceding siblings ...)
  2015-02-19 18:13 ` [PATCH v5 18/18] xfs: enable sparse inode chunks for v5 superblocks Brian Foster
@ 2015-02-19 19:10 ` Brian Foster
  2015-02-19 23:01   ` Dave Chinner
  2015-06-01  0:12   ` Dave Chinner
  18 siblings, 2 replies; 27+ messages in thread
From: Brian Foster @ 2015-02-19 19:10 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: Type: text/plain, Size: 6082 bytes --]

On Thu, Feb 19, 2015 at 01:13:25PM -0500, Brian Foster wrote:
> Hi all,
> 
> Here's v5 of sparse inode chunks. The only real change here is to
> convert the allocmask helpers back to using the XFS bitmap helpers
> rather than the generic bitmap code. This eliminates the need for the
> endian-conversion hack and extra helper to export a generic bitmap to a
> native type. The former users of the generic bitmap itself have been
> converted to use the native 64-bit value appropriately.
> 
> The XFS bitmap code is actually not in userspace either so neither of
> these implementations backport cleanly to userspace. As it is, I've not
> included the sparse alloc/free code in my xfsprogs branch as this code
> currently isn't needed. Nothing in userspace that I've seen requires the
> ability to do a sparse inode allocation or free. I suspect if it is
> needed in the future, we can more easily sync the XFS bitmap helpers to
> userspace than the generic Linux bitmap code.
> 
> Thoughts, reviews, flames appreciated...
> 

Attached is a tarball of a set of xfsprogs patches to aid in testing
this patchset. I'm posting as a tarball because the core patches (e.g.,
the kernel patches) are obviously still in flux. The tarball includes
the following:

- general dependency backports
- core infrastructure backports (i.e., applicable patches from this v5
  sparse inode set)
- xfsprogs work for sparse inode support

The latter bits include support for mkfs, xfs_info, xfs_db and
xfs_repair, the fundamentals of all of which should work. Use the '-i
spalign' mkfs option to format a sparse inode enabled fs. E.g.:

	mkfs.xfs -m crc=1,finobt=1 -i spalign <dev>

Note that metadump is not yet supported. Failures from the associated
xfstests, etc. are expected. I'm not aware of anything else that is
missing support or otherwise broken, so any feedback along those lines
is appreciated.

Brian

> Brian
> 
> v5:
> - Use XFS helpers for allocmask code instead of generic bitmap helpers.
> v4: http://oss.sgi.com/archives/xfs/2015-02/msg00240.html
> - Rename sb_spinoalignmt to sb_spino_align.
> - Clean up error/warning messages.
> - Use a union to differentiate old/new xfs_inobt_rec on-disk format.
>   Refactor such that in-core record fields are always valid.
> - Rename/move allocmap (bitmap) helper functions and provide extra
>   helper for endian conv.
> - Refactor sparse chunk allocation record management code.
> - Clean up #ifdef and label usage for DEBUG mode sparse allocs.
> - Split up and moved some generic, preparatory hunks earlier in series.
> v3: http://oss.sgi.com/archives/xfs/2015-02/msg00110.html
> - Rebase to latest for-next (bulkstat rework, data structure shuffling,
>   etc.).
> - Fix issparse helper logic.
> - Update inode alignment model w/ spinodes enabled. All inode records
>   are chunk size aligned, sparse allocations cluster size aligned (both
>   enforced on mount).
> - Reworked sparse inode record merge logic to coincide w/ new alignment
>   model.
> - Mark feature as experimental (warn on mount).
> - Include and use block allocation agbno range limit to prevent
>   allocation of invalid inode records.
> - Add some DEBUG bits to improve sparse alloc. test coverage.
> v2: http://oss.sgi.com/archives/xfs/2014-11/msg00007.html
> - Use a manually set feature bit instead of dynamic based on the
>   existence of sparse inode chunks.
> - Add sb/mp fields for sparse alloc. granularity (use instead of cluster
>   size).
> - Undo xfs_inobt_insert() loop removal to avoid breakage of larger page
>   size arches.
> - Rename sparse record overlap helper and do XFS_LOOKUP_LE search.
> - Use byte of pad space in inobt record for inode count field.
> - Convert bitmap mgmt to use generic bitmap code.
> - Rename XFS_INODES_PER_SPCHUNK to XFS_INODES_PER_HOLEMASK_BIT.
> - Add fs geometry bit for sparse inodes.
> - Rebase to latest for-next (bulkstat refactor).
> v1: http://oss.sgi.com/archives/xfs/2014-07/msg00355.html
> 
> Brian Foster (18):
>   xfs: create individual inode alloc. helper
>   xfs: update free inode record logic to support sparse inode records
>   xfs: support min/max agbno args in block allocator
>   xfs: add sparse inode chunk alignment superblock field
>   xfs: use sparse chunk alignment for min. inode allocation requirement
>   xfs: sparse inode chunks feature helpers and mount requirements
>   xfs: add fs geometry bit for sparse inode chunks
>   xfs: introduce inode record hole mask for sparse inode chunks
>   xfs: use actual inode count for sparse records in bulkstat/inumbers
>   xfs: pass inode count through ordered icreate log item
>   xfs: handle sparse inode chunks in icreate log recovery
>   xfs: helper to convert holemask to inode alloc. bitmap
>   xfs: allocate sparse inode chunks on full chunk allocation failure
>   xfs: randomly do sparse inode allocations in DEBUG mode
>   xfs: filter out sparse regions from individual inode allocation
>   xfs: only free allocated regions of inode chunks
>   xfs: skip unallocated regions of inode chunks in xfs_ifree_cluster()
>   xfs: enable sparse inode chunks for v5 superblocks
> 
>  fs/xfs/libxfs/xfs_alloc.c        |  42 +++-
>  fs/xfs/libxfs/xfs_alloc.h        |   2 +
>  fs/xfs/libxfs/xfs_format.h       |  50 +++-
>  fs/xfs/libxfs/xfs_fs.h           |   1 +
>  fs/xfs/libxfs/xfs_ialloc.c       | 530 +++++++++++++++++++++++++++++++++++----
>  fs/xfs/libxfs/xfs_ialloc.h       |  12 +-
>  fs/xfs/libxfs/xfs_ialloc_btree.c |  93 ++++++-
>  fs/xfs/libxfs/xfs_ialloc_btree.h |  10 +
>  fs/xfs/libxfs/xfs_sb.c           |  30 ++-
>  fs/xfs/xfs_fsops.c               |   4 +-
>  fs/xfs/xfs_inode.c               |  28 ++-
>  fs/xfs/xfs_itable.c              |  13 +-
>  fs/xfs/xfs_log_recover.c         |  26 +-
>  fs/xfs/xfs_mount.c               |  16 ++
>  fs/xfs/xfs_mount.h               |   2 +
>  fs/xfs/xfs_trace.h               |  47 ++++
>  16 files changed, 820 insertions(+), 86 deletions(-)
> 
> -- 
> 1.9.3

[-- Attachment #2: xfsprogs-sparse-inodes-v5.tar.bz2 --]
[-- Type: application/x-bzip2, Size: 34627 bytes --]

* Re: [PATCH v5 00/18] xfs: sparse inode chunks
  2015-02-19 19:10 ` [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
@ 2015-02-19 23:01   ` Dave Chinner
  2015-02-19 23:20     ` Brian Foster
  2015-06-01  0:12   ` Dave Chinner
  1 sibling, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2015-02-19 23:01 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Thu, Feb 19, 2015 at 02:10:34PM -0500, Brian Foster wrote:
> On Thu, Feb 19, 2015 at 01:13:25PM -0500, Brian Foster wrote:
> > Hi all,
> > 
> > Here's v5 of sparse inode chunks. The only real change here is to
> > convert the allocmask helpers back to using the XFS bitmap helpers
> > rather than the generic bitmap code. This eliminates the need for the
> > endian-conversion hack and extra helper to export a generic bitmap to a
> > native type. The former users of the generic bitmap itself have been
> > converted to use the native 64-bit value appropriately.
> > 
> > The XFS bitmap code is actually not in userspace either so neither of
> > these implementations backport cleanly to userspace. As it is, I've not
> > included the sparse alloc/free code in my xfsprogs branch as this code
> > currently isn't needed. Nothing in userspace that I've seen requires the
> > ability to do a sparse inode allocation or free. I suspect if it is
> > needed in the future, we can more easily sync the XFS bitmap helpers to
> > userspace than the generic Linux bitmap code.
> > 
> > Thoughts, reviews, flames appreciated...
> > 
> 
> Attached is a tarball of a set of xfsprogs patches to aid in testing
> this patchset. I'm posting as a tarball because the core patches (e.g.,
> the kernel patches) are obviously still in flux. The tarball includes
> the following:
> 
> - general dependency backports
> - core infrastructure backports (i.e., applicable patches from this v5
>   sparse inode set)
> - xfsprogs work for sparse inode support

You should probably base it on the libxfs-3.19-update branch rather
than backport random patches into the current branch. This is what
I'm basing the current rmap-btree work I'm doing on, and having the
same libxfs structure on both sides makes it way easier to keep both
sides up to date....

Give me a couple of hours and I'll push out the latest updates to
that branch...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH v5 00/18] xfs: sparse inode chunks
  2015-02-19 23:01   ` Dave Chinner
@ 2015-02-19 23:20     ` Brian Foster
  2015-02-19 23:49       ` Dave Chinner
  0 siblings, 1 reply; 27+ messages in thread
From: Brian Foster @ 2015-02-19 23:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, Feb 20, 2015 at 10:01:50AM +1100, Dave Chinner wrote:
> On Thu, Feb 19, 2015 at 02:10:34PM -0500, Brian Foster wrote:
> > On Thu, Feb 19, 2015 at 01:13:25PM -0500, Brian Foster wrote:
> > > Hi all,
> > > 
> > > Here's v5 of sparse inode chunks. The only real change here is to
> > > convert the allocmask helpers back to using the XFS bitmap helpers
> > > rather than the generic bitmap code. This eliminates the need for the
> > > endian-conversion hack and extra helper to export a generic bitmap to a
> > > native type. The former users of the generic bitmap itself have been
> > > converted to use the native 64-bit value appropriately.
> > > 
> > > The XFS bitmap code is actually not in userspace either so neither of
> > > these implementations backport cleanly to userspace. As it is, I've not
> > > included the sparse alloc/free code in my xfsprogs branch as this code
> > > currently isn't needed. Nothing in userspace that I've seen requires the
> > > ability to do a sparse inode allocation or free. I suspect if it is
> > > needed in the future, we can more easily sync the XFS bitmap helpers to
> > > userspace than the generic Linux bitmap code.
> > > 
> > > Thoughts, reviews, flames appreciated...
> > > 
> > 
> > Attached is a tarball of a set of xfsprogs patches to aid in testing
> > this patchset. I'm posting as a tarball because the core patches (e.g.,
> > the kernel patches) are obviously still in flux. The tarball includes
> > the following:
> > 
> > - general dependency backports
> > - core infrastructure backports (i.e., applicable patches from this v5
> >   sparse inode set)
> > - xfsprogs work for sparse inode support
> 
> You should probably base it on the libxfs-3.19-update branch rather
> than backport random patches into the current branch. This is what
> I'm basing the current rmap-btree work I'm doing on, and having the
> same libxfs structure on both sides makes it way easier to keep both
> sides up to date....
> 

I wasn't aware we had such a branch... I don't see it in my xfsprogs
repo. Perhaps that's because my repo is still based on the oss.sgi.com
repo?

> Give me a couple of hours and I'll push out the latest updates to
> that branch...
> 

If you can make it available somewhere or another, I'll try to move
everything over.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

* Re: [PATCH v5 00/18] xfs: sparse inode chunks
  2015-02-19 23:20     ` Brian Foster
@ 2015-02-19 23:49       ` Dave Chinner
  0 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2015-02-19 23:49 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Thu, Feb 19, 2015 at 06:20:24PM -0500, Brian Foster wrote:
> On Fri, Feb 20, 2015 at 10:01:50AM +1100, Dave Chinner wrote:
> > On Thu, Feb 19, 2015 at 02:10:34PM -0500, Brian Foster wrote:
> > > On Thu, Feb 19, 2015 at 01:13:25PM -0500, Brian Foster wrote:
> > > > Hi all,
> > > > 
> > > > Here's v5 of sparse inode chunks. The only real change here is to
> > > > convert the allocmask helpers back to using the XFS bitmap helpers
> > > > rather than the generic bitmap code. This eliminates the need for the
> > > > endian-conversion hack and extra helper to export a generic bitmap to a
> > > > native type. The former users of the generic bitmap itself have been
> > > > converted to use the native 64-bit value appropriately.
> > > > 
> > > > The XFS bitmap code is actually not in userspace either so neither of
> > > > these implementations backport cleanly to userspace. As it is, I've not
> > > > included the sparse alloc/free code in my xfsprogs branch as this code
> > > > currently isn't needed. Nothing in userspace that I've seen requires the
> > > > ability to do a sparse inode allocation or free. I suspect if it is
> > > > needed in the future, we can more easily sync the XFS bitmap helpers to
> > > > userspace than the generic Linux bitmap code.
> > > > 
> > > > Thoughts, reviews, flames appreciated...
> > > > 
> > > 
> > > Attached is a tarball of a set of xfsprogs patches to aid in testing
> > > this patchset. I'm posting as a tarball because the core patches (e.g.,
> > > the kernel patches) are obviously still in flux. The tarball includes
> > > the following:
> > > 
> > > - general dependency backports
> > > - core infrastructure backports (i.e., applicable patches from this v5
> > >   sparse inode set)
> > > - xfsprogs work for sparse inode support
> > 
> > You should probably base it on the libxfs-3.19-update branch rather
> > than backport random patches into the current branch. This is what
> > I'm basing the current rmap-btree work I'm doing on, and having the
> > same libxfs structure on both sides makes it way easier to keep both
> > sides up to date....
> > 
> 
> I wasn't aware we had such a branch... I don't see it in my xfsprogs
> repo. Perhaps that's because my repo is still based on the oss.sgi.com
> repo?

Possibly, though I thought I pushed it there. I don't tend to look
at the oss repositories these days, and only push to them at release
time.

https://git.kernel.org/cgit/fs/xfs/xfsprogs-dev.git/log/?h=libxfs-3.19-update

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH v5 00/18] xfs: sparse inode chunks
  2015-02-19 19:10 ` [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
  2015-02-19 23:01   ` Dave Chinner
@ 2015-06-01  0:12   ` Dave Chinner
  2015-06-01 12:56     ` Brian Foster
  1 sibling, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2015-06-01  0:12 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Thu, Feb 19, 2015 at 02:10:34PM -0500, Brian Foster wrote:
> On Thu, Feb 19, 2015 at 01:13:25PM -0500, Brian Foster wrote:
> > Hi all,
> > 
> > Here's v5 of sparse inode chunks. The only real change here is to
> > convert the allocmask helpers back to using the XFS bitmap helpers
> > rather than the generic bitmap code. This eliminates the need for the
> > endian-conversion hack and extra helper to export a generic bitmap to a
> > native type. The former users of the generic bitmap itself have been
> > converted to use the native 64-bit value appropriately.
> > 
> > The XFS bitmap code is actually not in userspace either so neither of
> > these implementations backport cleanly to userspace. As it is, I've not
> > included the sparse alloc/free code in my xfsprogs branch as this code
> > currently isn't needed. Nothing in userspace that I've seen requires the
> > ability to do a sparse inode allocation or free. I suspect if it is
> > needed in the future, we can more easily sync the XFS bitmap helpers to
> > userspace than the generic Linux bitmap code.
> > 
> > Thoughts, reviews, flames appreciated...
> > 
> 
> Attached is a tarball of a set of xfsprogs patches to aid in testing
> this patchset. I'm posting as a tarball because the core patches (e.g.,
> the kernel patches) are obviously still in flux. The tarball includes
> the following:
> 
> - general dependency backports
> - core infrastructure backports (i.e., applicable patches from this v5
>   sparse inode set)
> - xfsprogs work for sparse inode support
> 
> The latter bits include support for mkfs, xfs_info, xfs_db and
> xfs_repair, the fundamentals of all of which should work. Use the '-i
> spalign' mkfs option to format a sparse inode enabled fs. E.g.:
> 
> 	mkfs.xfs -m crc=1,finobt=1 -i spalign <dev>
> 
> Note that metadump is not yet supported. Failures from the associated
> xfstests, etc. are expected. I'm not aware of anything else that is
> missing support or otherwise broken, so any feedback along those lines
> is appreciated.

Notes:

- mkfs output doesn't indicate that sparse inodes are configured.
- mkfs cli option of "-i sparse=[0|1]" makes more sense than
  "spalign"
- "SPARSE_INODES" missing from xfs_db version output
- kernel code seems to be regression from when not using sparse
  inodes
- inode allocation speed does not seem to be impacted by sparse
  inode allocation - running my fsmark tests on a debug kernel show
  no performance differential, even though sparse inode chunks
  should be created in that case.
- it smoke tests through xfstests ok
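
With the suggested option name, the example invocation quoted above would
presumably end up looking something like this. This is only a sketch: the
"sparse=1" spelling is the proposal from the note above, not what the
posted tarball accepts, and the scratch image file is a made-up stand-in
for <dev>, so whether a given mkfs.xfs takes it depends on the version:

```shell
# Hypothetical scratch image standing in for <dev>; mkfs.xfs can
# format a regular file as well as a block device.
img=$(mktemp)
truncate -s 1G "$img"

# Assumption: this mkfs.xfs understands the proposed "-i sparse=1"
# spelling; the posted tarball uses "-i spalign" instead.
if command -v mkfs.xfs >/dev/null 2>&1; then
    mkfs.xfs -f -m crc=1,finobt=1 -i sparse=1 "$img"
else
    echo "mkfs.xfs not installed; skipping format of $img"
fi
```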

I haven't really looked through the userspace code in any detail,
so I can't really comment on that side of things yet. The kernel
code looks good, there don't appear to be any regressions and the
new functionality works so far. Hence I think I'm going to merge
the kernel code in the 4.2 cycle, and we can work on getting
userspace into the current dev tree for people to test and use the
new code....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH v5 00/18] xfs: sparse inode chunks
  2015-06-01  0:12   ` Dave Chinner
@ 2015-06-01 12:56     ` Brian Foster
  2015-06-01 20:47       ` Dave Chinner
  0 siblings, 1 reply; 27+ messages in thread
From: Brian Foster @ 2015-06-01 12:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Jun 01, 2015 at 10:12:30AM +1000, Dave Chinner wrote:
> On Thu, Feb 19, 2015 at 02:10:34PM -0500, Brian Foster wrote:
> > On Thu, Feb 19, 2015 at 01:13:25PM -0500, Brian Foster wrote:
> > > Hi all,
> > > 
> > > Here's v5 of sparse inode chunks. The only real change here is to
> > > convert the allocmask helpers back to using the XFS bitmap helpers
> > > rather than the generic bitmap code. This eliminates the need for the
> > > endian-conversion hack and extra helper to export a generic bitmap to a
> > > native type. The former users of the generic bitmap itself have been
> > > converted to use the native 64-bit value appropriately.
> > > 
> > > The XFS bitmap code is actually not in userspace either so neither of
> > > these implementations backport cleanly to userspace. As it is, I've not
> > > included the sparse alloc/free code in my xfsprogs branch as this code
> > > currently isn't needed. Nothing in userspace that I've seen requires the
> > > ability to do a sparse inode allocation or free. I suspect if it is
> > > needed in the future, we can more easily sync the XFS bitmap helpers to
> > > userspace than the generic Linux bitmap code.
> > > 
> > > Thoughts, reviews, flames appreciated...
> > > 
> > 
> > Attached is a tarball of a set of xfsprogs patches to aid in testing
> > this patchset. I'm posting as a tarball because the core patches (e.g.,
> > the kernel patches) are obviously still in flux. The tarball includes
> > the following:
> > 
> > - general dependency backports
> > - core infrastructure backports (i.e., applicable patches from this v5
> >   sparse inode set)
> > - xfsprogs work for sparse inode support
> > 
> > The latter bits include support for mkfs, xfs_info, xfs_db and
> > xfs_repair, the fundamentals of all of which should work. Use the '-i
> > spalign' mkfs option to format a sparse inode enabled fs. E.g.:
> > 
> > 	mkfs.xfs -m crc=1,finobt=1 -i spalign <dev>
> > 
> > Note that metadump is not yet supported. Failures from the associated
> > xfstests, etc. are expected. I'm not aware of anything else that is
> > missing support or otherwise broken, so any feedback along those lines
> > is appreciated.
> 
> Notes:
> 
> - mkfs output doesn't indicate that sparse inodes are configured.
> - mkfs cli option of "-i sparse=[0|1]" makes more sense than
>   "spalign"
> - "SPARSE_INODES" missing from xfs_db version output

Ok...

> - kernel code seems to be regression from when not using sparse
>   inodes

What regression are you referring to?

> - inode allocation speed does not seem to be impacted by sparse
>   inode allocation - running my fsmark tests on a debug kernel show
>   no performance differential, even though sparse inode chunks
>   should be created in that case.
> - it smoke tests through xfstests ok
> 

I haven't really run into much for issues so far save for a problem
discovered with the DEBUG mode code from my recent large block size
testing. I have a patch for that lying around I need to post...

> I haven't really looked through the userspace code in any detail,
> so I can't really comment on that side of things yet. The kernel
> code looks good, there don't appear to be any regressions and the
> new functionality works so far. Hence I think I'm going to merge
> the kernel code in the 4.2 cycle, and we can work on getting
> userspace into the current dev tree for people to test and use the
> new code....
> 

Sounds good, thanks. The userspace bits have only been posted for
testing purposes to this point to avoid the churn from active review of
the core code. Since that is now merged, I'll get the latest mechanism
ported over to userspace, incorporate some of the fixes noted above and
get something posted hopefully soon.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

* Re: [PATCH v5 00/18] xfs: sparse inode chunks
  2015-06-01 12:56     ` Brian Foster
@ 2015-06-01 20:47       ` Dave Chinner
  2015-06-01 21:21         ` Brian Foster
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2015-06-01 20:47 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Mon, Jun 01, 2015 at 08:56:39AM -0400, Brian Foster wrote:
> On Mon, Jun 01, 2015 at 10:12:30AM +1000, Dave Chinner wrote:
> > On Thu, Feb 19, 2015 at 02:10:34PM -0500, Brian Foster wrote:
> > - kernel code seems to be regression from when not using sparse
> >   inodes
> 
> What regression are you referring to?

Doh! typo there. s/from/free/

> > - inode allocation speed does not seem to be impacted by sparse
> >   inode allocation - running my fsmark tests on a debug kernel show
> >   no performance differential, even though sparse inode chunks
> >   should be created in that case.
> > - it smoke tests through xfstests ok
> 
> I haven't really run into much for issues so far save for a problem
> discovered with the DEBUG mode code from my recent large block size
> testing. I have a patch for that lying around I need to post...
> 
> > I haven't really looked through the userspace code in any detail,
> > so I can't really comment on that side of things yet. The kernel
> > code looks good, there don't appear to be any regressions and the
> > new functionality works so far. Hence I think I'm going to merge
> > the kernel code in the 4.2 cycle, and we can work on getting
> > userspace into the current dev tree for people to test and use the
> > new code....
> > 
> 
> Sounds good, thanks. The userspace bits have only been posted for
> testing purposes to this point to avoid the churn from active review of
> the core code. Since that is now merged, I'll get the latest mechanism
> ported over to userspace, incorporate some of the fixes noted above and
> get something posted hopefully soon.

Can you port it to the current dev branch (libxfs-4.1-update)? That
way it will be much easier for you, and for me when it comes to merging..

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH v5 00/18] xfs: sparse inode chunks
  2015-06-01 20:47       ` Dave Chinner
@ 2015-06-01 21:21         ` Brian Foster
  0 siblings, 0 replies; 27+ messages in thread
From: Brian Foster @ 2015-06-01 21:21 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Jun 02, 2015 at 06:47:03AM +1000, Dave Chinner wrote:
> On Mon, Jun 01, 2015 at 08:56:39AM -0400, Brian Foster wrote:
> > On Mon, Jun 01, 2015 at 10:12:30AM +1000, Dave Chinner wrote:
> > > On Thu, Feb 19, 2015 at 02:10:34PM -0500, Brian Foster wrote:
> > > - kernel code seems to be regression from when not using sparse
> > >   inodes
> > 
> > What regression are you referring to?
> 
> Doh! typo there. s/from/free/
> 

Ah, that sounds better. ;)

> > > - inode allocation speed does not seem to be impacted by sparse
> > >   inode allocation - running my fsmark tests on a debug kernel show
> > >   no performance differential, even though sparse inode chunks
> > >   should be created in that case.
> > > - it smoke tests through xfstests ok
> > 
> > I haven't really run into much for issues so far save for a problem
> > discovered with the DEBUG mode code from my recent large block size
> > testing. I have a patch for that lying around I need to post...
> > 
> > > I haven't really looked through the userspace code in any detail,
> > > so I can't really comment on that side of things yet. The kernel
> > > code looks good, there don't appear to be any regressions and the
> > > new functionality works so far. Hence I think I'm going to merge
> > > the kernel code in the 4.2 cycle, and we can work on getting
> > > userspace into the current dev tree for people to test and use the
> > > new code....
> > > 
> > 
> > Sounds good, thanks. The userspace bits have only been posted for
> > testing purposes to this point to avoid the churn from active review of
> > the core code. Since that is now merged, I'll get the latest mechanism
> > ported over to userspace, incorporate some of the fixes noted above and
> > get something posted hopefully soon.
> 
> Can you port it to the current dev branch (libxfs-4.1-update)? That
> way it will be much easier for you, and for me when it comes to merging..
> 

Yeah, it's been based on the 4.1 update branch for the last tarball or
two that have been posted, which eliminated the need for the
dependencies I was carrying along with it beforehand.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


Thread overview: 27+ messages
2015-02-19 18:13 [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
2015-02-19 18:13 ` [PATCH v5 01/18] xfs: create individual inode alloc. helper Brian Foster
2015-02-19 18:13 ` [PATCH v5 02/18] xfs: update free inode record logic to support sparse inode records Brian Foster
2015-02-19 18:13 ` [PATCH v5 03/18] xfs: support min/max agbno args in block allocator Brian Foster
2015-02-19 18:13 ` [PATCH v5 04/18] xfs: add sparse inode chunk alignment superblock field Brian Foster
2015-02-19 18:13 ` [PATCH v5 05/18] xfs: use sparse chunk alignment for min. inode allocation requirement Brian Foster
2015-02-19 18:13 ` [PATCH v5 06/18] xfs: sparse inode chunks feature helpers and mount requirements Brian Foster
2015-02-19 18:13 ` [PATCH v5 07/18] xfs: add fs geometry bit for sparse inode chunks Brian Foster
2015-02-19 18:13 ` [PATCH v5 08/18] xfs: introduce inode record hole mask " Brian Foster
2015-02-19 18:13 ` [PATCH v5 09/18] xfs: use actual inode count for sparse records in bulkstat/inumbers Brian Foster
2015-02-19 18:13 ` [PATCH v5 10/18] xfs: pass inode count through ordered icreate log item Brian Foster
2015-02-19 18:13 ` [PATCH v5 11/18] xfs: handle sparse inode chunks in icreate log recovery Brian Foster
2015-02-19 18:13 ` [PATCH v5 12/18] xfs: helper to convert holemask to inode alloc. bitmap Brian Foster
2015-02-19 18:13 ` [PATCH v5 13/18] xfs: allocate sparse inode chunks on full chunk allocation failure Brian Foster
2015-02-19 18:13 ` [PATCH v5 14/18] xfs: randomly do sparse inode allocations in DEBUG mode Brian Foster
2015-02-19 18:13 ` [PATCH v5 15/18] xfs: filter out sparse regions from individual inode allocation Brian Foster
2015-02-19 18:13 ` [PATCH v5 16/18] xfs: only free allocated regions of inode chunks Brian Foster
2015-02-19 18:13 ` [PATCH v5 17/18] xfs: skip unallocated regions of inode chunks in xfs_ifree_cluster() Brian Foster
2015-02-19 18:13 ` [PATCH v5 18/18] xfs: enable sparse inode chunks for v5 superblocks Brian Foster
2015-02-19 19:10 ` [PATCH v5 00/18] xfs: sparse inode chunks Brian Foster
2015-02-19 23:01   ` Dave Chinner
2015-02-19 23:20     ` Brian Foster
2015-02-19 23:49       ` Dave Chinner
2015-06-01  0:12   ` Dave Chinner
2015-06-01 12:56     ` Brian Foster
2015-06-01 20:47       ` Dave Chinner
2015-06-01 21:21         ` Brian Foster
