* [PATCH 1/7] journaling_log: fix some typos in the section about EFDs
2016-08-25 23:26 [PATCH v8 0/7] xfs-docs: reorganize chapters, document rmap and reflink Darrick J. Wong
@ 2016-08-25 23:27 ` Darrick J. Wong
2016-08-25 23:27 ` [PATCH 2/7] xfsdocs: document known testing procedures Darrick J. Wong
` (5 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2016-08-25 23:27 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-xfs, xfs
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
.../journaling_log.asciidoc | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/design/XFS_Filesystem_Structure/journaling_log.asciidoc b/design/XFS_Filesystem_Structure/journaling_log.asciidoc
index a67fcc2..67d209f 100644
--- a/design/XFS_Filesystem_Structure/journaling_log.asciidoc
+++ b/design/XFS_Filesystem_Structure/journaling_log.asciidoc
@@ -368,7 +368,7 @@ typedef struct xfs_efd_log_format {
----
*efd_type*::
-The signature of an EFI operation, 0x1236. This value is in host-endian order,
+The signature of an EFD operation, 0x1237. This value is in host-endian order,
not big-endian like the rest of XFS.
*efd_size*::
@@ -382,9 +382,9 @@ A 64-bit number that binds the corresponding EFI log item to this EFD log item.
*efd_extents*::
Variable-length array of extents to be freed. The array length is given by
-+efi_nextents+. The record type will be either +xfs_extent_64_t+ or
++efd_nextents+. The record type will be either +xfs_extent_64_t+ or
+xfs_extent_32_t+; this can be determined from the log item size (+oh_len+) and
-the number of extents (+efi_nextents+).
+the number of extents (+efd_nextents+).
[[Inode_Log_Item]]
=== Inode Updates
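Note for readers: the corrected text says the EFD record type can be determined from the log item size (+oh_len+) and +efd_nextents+. A minimal sketch of that check follows. The byte sizes are assumptions that do not appear in this patch (a 16-byte EFD header, 12-byte packed +xfs_extent_32_t+ records, 16-byte +xfs_extent_64_t+ records), and the helper and enum names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes, not stated in the patch: 16-byte EFD header (efd_type,
 * efd_size, efd_nextents, efd_efi_id), 12-byte packed xfs_extent_32_t,
 * 16-byte xfs_extent_64_t. */
enum { EFD_HDR_BYTES = 16, EXTENT32_BYTES = 12, EXTENT64_BYTES = 16 };

enum efd_extent_fmt { EFD_FMT_UNKNOWN, EFD_FMT_EXTENT32, EFD_FMT_EXTENT64 };

/* Hypothetical helper: classify the extent array of an EFD log item from
 * the log item size (oh_len) and the extent count (efd_nextents). */
static enum efd_extent_fmt efd_extent_format(size_t oh_len,
                                             uint32_t efd_nextents)
{
    size_t n = efd_nextents;

    if (oh_len == EFD_HDR_BYTES + n * EXTENT64_BYTES)
        return EFD_FMT_EXTENT64;
    if (oh_len == EFD_HDR_BYTES + n * EXTENT32_BYTES)
        return EFD_FMT_EXTENT32;
    return EFD_FMT_UNKNOWN; /* size matches neither layout */
}
```

With zero extents both layouts have the same size, so this sketch reports the 64-bit format in that case; a real parser would probably treat that as a special case.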
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [PATCH 2/7] xfsdocs: document known testing procedures
2016-08-25 23:26 [PATCH v8 0/7] xfs-docs: reorganize chapters, document rmap and reflink Darrick J. Wong
2016-08-25 23:27 ` [PATCH 1/7] journaling_log: fix some typos in the section about EFDs Darrick J. Wong
@ 2016-08-25 23:27 ` Darrick J. Wong
2016-08-25 23:27 ` [PATCH 3/7] xfsdocs: update the on-disk format with changes for Linux 4.5 Darrick J. Wong
` (4 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2016-08-25 23:27 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-xfs, xfs
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
design/XFS_Filesystem_Structure/docinfo.xml | 14 ++++++++++++
design/XFS_Filesystem_Structure/testing.asciidoc | 23 ++++++++++++++++++++
.../xfs_filesystem_structure.asciidoc | 2 ++
3 files changed, 39 insertions(+)
create mode 100644 design/XFS_Filesystem_Structure/testing.asciidoc
diff --git a/design/XFS_Filesystem_Structure/docinfo.xml b/design/XFS_Filesystem_Structure/docinfo.xml
index ba97809..cc5596d 100644
--- a/design/XFS_Filesystem_Structure/docinfo.xml
+++ b/design/XFS_Filesystem_Structure/docinfo.xml
@@ -108,4 +108,18 @@
</simplelist>
</revdescription>
</revision>
+ <revision>
+ <revnumber>3.14</revnumber>
+ <date>January 2016</date>
+ <author>
+ <firstname>Darrick</firstname>
+ <surname>Wong</surname>
+ <email></email>
+ </author>
+ <revdescription>
+ <simplelist>
+ <member>Document disk format change testing.</member>
+ </simplelist>
+ </revdescription>
+ </revision>
</revhistory>
diff --git a/design/XFS_Filesystem_Structure/testing.asciidoc b/design/XFS_Filesystem_Structure/testing.asciidoc
new file mode 100644
index 0000000..f1c90bc
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/testing.asciidoc
@@ -0,0 +1,23 @@
+[[Testing]]
+= Testing Filesystem Changes
+
+People put a lot of trust in filesystems to preserve their data in a reliable
+fashion. To that end, it is very important that users and developers have
+access to a suite of regression tests that can be used to prove correct
+operation of any given filesystem code, or to analyze failures to fix problems
+found in the code. The XFS regression test suite, +xfstests+, is hosted at
++git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git+. Most tests apply to
+filesystems in general, but the suite also contains tests for features specific
+to each filesystem.
+
+When fixing bugs, it is important to provide a testcase exposing the bug so
+that the developers can avoid a future re-occurrence of the regression.
+Furthermore, if you're developing a new user-visible feature for XFS, please
+help the rest of the development community to sustain and maintain the whole
+codebase by providing generous test coverage to check its behavior.
+
+When altering, adding, or removing an on-disk data structure, please remember
+both to update the in-kernel structure size checks in +xfs_ondisk.h+ and to
+ensure that your changes are reflected in xfstest xfs/122. These regression
+tests enable us to detect compiler bugs, alignment problems, and anything
+else that might result in the creation of incompatible filesystem images.
diff --git a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
index 53262bf..f580aab 100644
--- a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
+++ b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
@@ -52,6 +52,8 @@ include::common_types.asciidoc[]
include::magic.asciidoc[]
+include::testing.asciidoc[]
+
// return titles to normal
:leveloffset: 0
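Note for readers: the new testing chapter mentions the structure size checks in +xfs_ondisk.h+. The general idea of such a check can be sketched with a C11 static assertion. The structure and the macro name below are hypothetical and are not copied from the kernel header.

```c
#include <assert.h>   /* C11: provides the static_assert macro */
#include <stdint.h>

/* Hypothetical on-disk record, used only for illustration. */
struct example_disk_rec {
    uint32_t magic;
    uint16_t level;
    uint16_t numrecs;
    uint64_t blkno;
};

/* Hypothetical macro: break the build if a structure's size ever changes,
 * e.g. because of a new field, compiler padding, or an alignment problem. */
#define CHECK_STRUCT_SIZE(type, size) \
    static_assert(sizeof(type) == (size), "unexpected size for " #type)

CHECK_STRUCT_SIZE(struct example_disk_rec, 16);
```

Because the check runs at compile time, a layout change is caught before any filesystem image can be written with the wrong format.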
* [PATCH 3/7] xfsdocs: update the on-disk format with changes for Linux 4.5
2016-08-25 23:26 [PATCH v8 0/7] xfs-docs: reorganize chapters, document rmap and reflink Darrick J. Wong
2016-08-25 23:27 ` [PATCH 1/7] journaling_log: fix some typos in the section about EFDs Darrick J. Wong
2016-08-25 23:27 ` [PATCH 2/7] xfsdocs: document known testing procedures Darrick J. Wong
@ 2016-08-25 23:27 ` Darrick J. Wong
2016-08-25 23:27 ` [PATCH 4/7] xfsdocs: move the discussions of short and long format btrees to a separate chapter Darrick J. Wong
` (3 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2016-08-25 23:27 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-xfs, xfs
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
.../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
index 4aabc55..dc1fad2 100644
--- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
+++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
@@ -66,9 +66,10 @@ of the literal area and +di_forkoff+. The attribute fork is located between
[[Inode_Core]]
== Inode Core
-The inode's core is 96 bytes in size and contains information about the file
-itself including most stat data information about data and attribute forks after
-the core within the inode. It uses the following structure:
+The inode's core is 96 bytes on a V4 filesystem and 176 bytes on a V5
+filesystem. It contains information about the file itself, including most stat
+data, as well as information about the data and attribute forks that follow
+the core within the inode. It uses the following structure:
[source, c]
----
@@ -313,8 +314,16 @@ Counts the number of changes made to the attributes in this inode.
Log sequence number of the last inode write.
*di_flags2*::
-Specifies extended flags associated with a v3 inode. There are no flags defined
-currently.
+Specifies extended flags associated with a v3 inode.
+
+.Version 3 Inode flags
+[options="header"]
+|=====
+| Flag | Description
+| +XFS_DIFLAG2_DAX+ |
+For a file, enable DAX to increase performance on persistent-memory storage.
+If set on a directory, files created in the directory will inherit this flag.
+|=====
*di_pad2*::
Padding for future expansion of the inode.
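Note for readers: a small sketch of how a parser might use the two facts this patch documents, the core size and the DAX flag. It assumes that v3 inodes are the ones found on V5 filesystems, that +di_flags2+ has already been converted to host byte order, and that +XFS_DIFLAG2_DAX+ is bit 0; the patch names the flag but not its value. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit 0 of di_flags2. The patch gives only the flag's name. */
#define XFS_DIFLAG2_DAX (1ULL << 0)

/* Hypothetical helper: size of the inode core for a given di_version.
 * Assumes v3 inodes (V5 filesystems) use the 176-byte core. */
static size_t inode_core_size(uint8_t di_version)
{
    return di_version >= 3 ? 176 : 96;
}

/* Hypothetical helper: di_flags2 only carries meaning on v3 inodes. */
static bool inode_has_dax_flag(uint8_t di_version, uint64_t di_flags2)
{
    return di_version >= 3 && (di_flags2 & XFS_DIFLAG2_DAX) != 0;
}
```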
* [PATCH 4/7] xfsdocs: move the discussions of short and long format btrees to a separate chapter
2016-08-25 23:26 [PATCH v8 0/7] xfs-docs: reorganize chapters, document rmap and reflink Darrick J. Wong
` (2 preceding siblings ...)
2016-08-25 23:27 ` [PATCH 3/7] xfsdocs: update the on-disk format with changes for Linux 4.5 Darrick J. Wong
@ 2016-08-25 23:27 ` Darrick J. Wong
2016-08-25 23:27 ` [PATCH 5/7] xfsdocs: reverse-mapping btree documentation Darrick J. Wong
` (2 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2016-08-25 23:27 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-xfs, xfs
Move the discussion of short and long format btrees into a separate
chapter.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
.../allocation_groups.asciidoc | 59 ------
design/XFS_Filesystem_Structure/btrees.asciidoc | 196 ++++++++++++++++++++
.../XFS_Filesystem_Structure/data_extents.asciidoc | 72 +------
.../xfs_filesystem_structure.asciidoc | 2
4 files changed, 204 insertions(+), 125 deletions(-)
create mode 100644 design/XFS_Filesystem_Structure/btrees.asciidoc
diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
index 0633175..55bbc50 100644
--- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
+++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
@@ -612,65 +612,6 @@ Checksum of the AGF sector.
*agf_spare2*::
Empty space in the unlogged part of the AGF sector.
-[[Short_Format_Btrees]]
-=== Short Format B+trees
-
-Each allocation group uses a ``short format'' B+tree to index various
-information about the allocation group. The structure is called short format
-because all block pointers are AG block numbers. The trees use the following
-header:
-
-[source, c]
-----
-struct xfs_btree_sblock {
- __be32 bb_magic;
- __be16 bb_level;
- __be16 bb_numrecs;
- __be32 bb_leftsib;
- __be32 bb_rightsib;
-
- /* version 5 filesystem fields start here */
- __be64 bb_blkno;
- __be64 bb_lsn;
- uuid_t bb_uuid;
- __be32 bb_owner;
- __le32 bb_crc;
-};
-----
-
-*bb_magic*::
-Specifies the magic number for the per-AG B+tree block.
-
-*bb_level*::
-The level of the tree in which this block is found. If this value is 0, this
-is a leaf block and contains records; otherwise, it is a node block and
-contains keys and pointers.
-
-*bb_numrecs*::
-Number of records in this block.
-
-*bb_leftsib*::
-AG block number of the left sibling of this B+tree node.
-
-*bb_rightsib*::
-AG block number of the right sibling of this B+tree node.
-
-*bb_blkno*::
-FS block number of this B+tree block.
-
-*bb_lsn*::
-Log sequence number of the last write to this block.
-
-*bb_uuid*::
-The UUID of this block, which must match either +sb_uuid+ or +sb_meta_uuid+
-depending on which features are set.
-
-*bb_owner*::
-The AG number that this B+tree block ought to be in.
-
-*bb_crc*::
-Checksum of the B+tree block.
-
[[AG_Free_Space_Btrees]]
=== AG Free Space B+trees
diff --git a/design/XFS_Filesystem_Structure/btrees.asciidoc b/design/XFS_Filesystem_Structure/btrees.asciidoc
new file mode 100644
index 0000000..306e061
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/btrees.asciidoc
@@ -0,0 +1,196 @@
+= B+trees
+
+XFS uses b+trees to index all metadata records. This well-known data structure
+is used to provide efficient random and sequential access to metadata records
+while minimizing seek times. There are two btree formats: a short format
+for records pertaining to a single allocation group, since all block pointers
+in an AG are 32 bits in size; and a long format for records pertaining to a
+file, since file data can have 64-bit block offsets. Each b+tree block is
+either a leaf node containing records, or an internal node containing keys and
+pointers to other b+tree blocks. The tree consists of a root block which may
+point to some number of other blocks; blocks in the bottom level of the b+tree
+contain only records.
+
+Leaf blocks of both types of b+trees have the same general format: a header
+describing the data in the block, and an array of records. The specific header
+formats are given in the next two sections, and the record format is provided
+by the b+tree client itself. The generic b+tree code does not have any
+specific knowledge of the record format.
+
+----
++--------+------------+------------+
+| header | record | records... |
++--------+------------+------------+
+----
+
+Internal node blocks of both types of b+trees also have the same general
+format: a header describing the data in the block, an array of keys, and an
+array of pointers. Each pointer may be associated with one or two keys. The
+first key uniquely identifies the first record accessible via the leftmost path
+down the branch of the tree.
+
+If the records in a b+tree are indexed by an interval, then a range of keys can
+uniquely identify a single record. For example, if a record covers blocks
+12-16, then any one of the keys 12, 13, 14, 15, or 16 returns the same record.
+In this case, the key for the record describing "12-16" is 12. If none of the
+records overlap, we only need to store one key.
+
+This is the format of a standard b+tree node:
+
+----
++--------+---------+---------+---------+---------+
+| header | key | keys... | ptr | ptrs... |
++--------+---------+---------+---------+---------+
+----
+
+If the b+tree records do not overlap, performing a b+tree lookup is simple.
+Start with the root. If it is a leaf block, perform a binary search of the
+records to find the record with the largest key that does not exceed our
+search key. If the block is a node block, perform a binary search of the keys
+in the same way, then follow the corresponding pointer to the next block.
+Repeat until we find a record.
+
+However, if b+tree records contain intervals and are allowed to overlap, the
+internal nodes of the b+tree become larger:
+
+----
++--------+---------+----------+---------+-------------+---------+---------+
+| header | low key | high key | low key | high key... | ptr | ptrs... |
++--------+---------+----------+---------+-------------+---------+---------+
+----
+
+The low keys are exactly the same as the keys in the non-overlapping b+tree.
+High keys, however, are a little different. Recall that a record with a key
+consisting of an interval can be referenced by a number of keys. Since the low
+key of a record indexes the low end of that key range, the high key indexes the
+high end of the key range. Returning to the example above, the high key for
+the record describing "12-16" is 16. The high key recorded in a b+tree node
+is the largest of the high keys of all records accessible under the subtree
+rooted by the pointer. For a level 1 node, this is the largest high key in
+the pointed-to leaf node; for any other node, this is the largest of the high
+keys in the pointed-to node.
+
+Nodes and leaves use the same magic numbers.
+
+[[Short_Format_Btrees]]
+== Short Format B+trees
+
+Each allocation group uses a ``short format'' B+tree to index various
+information about the allocation group. The structure is called short format
+because all block pointers are AG block numbers. The trees use the following
+header:
+
+[source, c]
+----
+struct xfs_btree_sblock {
+ __be32 bb_magic;
+ __be16 bb_level;
+ __be16 bb_numrecs;
+ __be32 bb_leftsib;
+ __be32 bb_rightsib;
+
+ /* version 5 filesystem fields start here */
+ __be64 bb_blkno;
+ __be64 bb_lsn;
+ uuid_t bb_uuid;
+ __be32 bb_owner;
+ __le32 bb_crc;
+};
+----
+
+*bb_magic*::
+Specifies the magic number for the per-AG B+tree block.
+
+*bb_level*::
+The level of the tree in which this block is found. If this value is 0, this
+is a leaf block and contains records; otherwise, it is a node block and
+contains keys and pointers.
+
+*bb_numrecs*::
+Number of records in this block.
+
+*bb_leftsib*::
+AG block number of the left sibling of this B+tree node.
+
+*bb_rightsib*::
+AG block number of the right sibling of this B+tree node.
+
+*bb_blkno*::
+FS block number of this B+tree block.
+
+*bb_lsn*::
+Log sequence number of the last write to this block.
+
+*bb_uuid*::
+The UUID of this block, which must match either +sb_uuid+ or +sb_meta_uuid+
+depending on which features are set.
+
+*bb_owner*::
+The AG number that this B+tree block ought to be in.
+
+*bb_crc*::
+Checksum of the B+tree block.
+
+[[Long_Format_Btrees]]
+== Long Format B+trees
+
+Long format B+trees are similar to short format B+trees, except that their
+block pointers are 64-bit filesystem block numbers instead of 32-bit AG block
+numbers. Because of this, long format b+trees can be (and usually are) rooted
+in an inode's data or attribute fork. The nodes and leaves of this B+tree use
+the +xfs_btree_lblock+ declaration:
+
+[source, c]
+----
+struct xfs_btree_lblock {
+ __be32 bb_magic;
+ __be16 bb_level;
+ __be16 bb_numrecs;
+ __be64 bb_leftsib;
+ __be64 bb_rightsib;
+
+ /* version 5 filesystem fields start here */
+ __be64 bb_blkno;
+ __be64 bb_lsn;
+ uuid_t bb_uuid;
+ __be64 bb_owner;
+ __le32 bb_crc;
+ __be32 bb_pad;
+};
+----
+
+*bb_magic*::
+Specifies the magic number for the btree block.
+
+*bb_level*::
+The level of the tree in which this block is found. If this value is 0, this
+is a leaf block and contains records; otherwise, it is a node block and
+contains keys and pointers.
+
+*bb_numrecs*::
+Number of records in this block.
+
+*bb_leftsib*::
+FS block number of the left sibling of this B+tree node.
+
+*bb_rightsib*::
+FS block number of the right sibling of this B+tree node.
+
+*bb_blkno*::
+FS block number of this B+tree block.
+
+*bb_lsn*::
+Log sequence number of the last write to this block.
+
+*bb_uuid*::
+The UUID of this block, which must match either +sb_uuid+ or +sb_meta_uuid+
+depending on which features are set.
+
+*bb_owner*::
+The AG number that this B+tree block ought to be in.
+
+*bb_crc*::
+Checksum of the B+tree block.
+
+*bb_pad*::
+Pads the structure to 64 bytes.
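Note for readers: the lookup procedure described in btrees.asciidoc for non-overlapping records comes down to one binary search per block. The sketch below assumes keys are plain 64-bit integers sorted in ascending order and reads "a lower key than our search key" as "the largest key that does not exceed the search key". The function name is hypothetical, and real b+tree clients supply their own key comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the last key that is <= want, or -1 if every key in
 * the block is larger. In a node block the index selects the pointer to
 * follow; in a leaf block it selects the candidate record. */
static ptrdiff_t btree_block_search(const uint64_t *keys, size_t nkeys,
                                    uint64_t want)
{
    size_t lo = 0, hi = nkeys; /* keys[0..lo) are <= want, keys[hi..) are > */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (keys[mid] <= want)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (ptrdiff_t)lo - 1;
}
```

For interval records, the caller still has to check that the search key falls inside the record it lands on, since a key that is low enough does not guarantee the interval extends that far.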
diff --git a/design/XFS_Filesystem_Structure/data_extents.asciidoc b/design/XFS_Filesystem_Structure/data_extents.asciidoc
index a39045d..4f1109b 100644
--- a/design/XFS_Filesystem_Structure/data_extents.asciidoc
+++ b/design/XFS_Filesystem_Structure/data_extents.asciidoc
@@ -203,9 +203,10 @@ u.bmx[0-1] = [startoff,startblock,blockcount,extentflag]
[[Btree_Extent_List]]
== B+tree Extent List
-To manage extent maps that cannot fit in the inode fork area, XFS uses long
-format B+trees. The root node of the B+tree is stored in the inode's data
-fork. All block pointers for extent B+trees are 64-bit absolute block numbers.
+To manage extent maps that cannot fit in the inode fork area, XFS uses
+xref:Long_Format_Btrees[long format B+trees]. The root node of the B+tree is
+stored in the inode's data fork. All block pointers for extent B+trees are
+64-bit filesystem block numbers.
For a single level B+tree, the root node points to the B+tree's leaves. Each
leaf occupies one filesystem block and contains a header and an array of extents
@@ -242,69 +243,8 @@ standard 256 byte inode before a new level of nodes is added between the root
and the leaves. This will be less if +di_forkoff+ is not zero (i.e. attributes
are in use on the inode).
-[[Long_Format_Btrees]]
-=== Long Format B+trees
-
-The subsequent nodes and leaves of the B+tree use the +xfs_btree_lblock+
-declaration:
-
-[source, c]
-----
-struct xfs_btree_lblock {
- __be32 bb_magic;
- __be16 bb_level;
- __be16 bb_numrecs;
- __be64 bb_leftsib;
- __be64 bb_rightsib;
-
- /* version 5 filesystem fields start here */
- __be64 bb_blkno;
- __be64 bb_lsn;
- uuid_t bb_uuid;
- __be64 bb_owner;
- __le32 bb_crc;
- __be32 bb_pad;
-};
-----
-
-*bb_magic*::
-Specifies the magic number for the BMBT block: ``BMAP'' (0x424d4150).
-On a v5 filesystem, this is ``BMA3'' (0x424d4133).
-
-*bb_level*::
-The level of the tree in which this block is found. If this value is 0, this
-is a leaf block and contains records; otherwise, it is a node block and
-contains keys and pointers.
-
-*bb_numrecs*::
-Number of records in this block.
-
-*bb_leftsib*::
-FS block number of the left sibling of this B+tree node.
-
-*bb_rightsib*::
-FS block number of the right sibling of this B+tree node.
-
-*bb_blkno*::
-FS block number of this B+tree block.
-
-*bb_lsn*::
-Log sequence number of the last write to this block.
-
-*bb_uuid*::
-The UUID of this block, which must match either +sb_uuid+ or +sb_meta_uuid+
-depending on which features are set.
-
-*bb_owner*::
-The AG number that this B+tree block ought to be in.
-
-*bb_crc*::
-Checksum of the B+tree block.
-
-*bb_pad*::
-Pads the structure to 64 bytes.
-
-// force-split the lists
+* The magic number for a BMBT block is ``BMAP'' (0x424d4150). On a v5
+filesystem, this is ``BMA3'' (0x424d4133).
* For intermediate nodes, the data following +xfs_btree_lblock+ is the same as
the root node: array of +xfs_bmbt_key+ value followed by an array of
diff --git a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
index f580aab..62502b3 100644
--- a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
+++ b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
@@ -62,6 +62,8 @@ Global Structures
:leveloffset: 1
+include::btrees.asciidoc[]
+
include::allocation_groups.asciidoc[]
include::journaling_log.asciidoc[]
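Note for readers: the high key rule from the overlapping-interval discussion in btrees.asciidoc can be sketched as two small maximum computations. The record layout (start block plus length) and all names are hypothetical, and each block is assumed to hold at least one entry with a non-zero length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interval record: covers blocks start .. start + len - 1. */
struct interval_rec {
    uint64_t start;
    uint64_t len;
};

/* The low key of a record is its start; the high key is its last block. */
static uint64_t rec_high_key(const struct interval_rec *r)
{
    return r->start + r->len - 1;
}

/* High key stored in a level 1 node for one leaf: the largest record high
 * key in that leaf. */
static uint64_t leaf_high_key(const struct interval_rec *recs, size_t n)
{
    uint64_t high = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t h = rec_high_key(&recs[i]);

        if (h > high)
            high = h;
    }
    return high;
}

/* High key stored in a higher-level node for one child node: the largest
 * of the high keys held in that child. */
static uint64_t node_high_key(const uint64_t *child_high_keys, size_t n)
{
    uint64_t high = 0;

    for (size_t i = 0; i < n; i++)
        if (child_high_keys[i] > high)
            high = child_high_keys[i];
    return high;
}
```

With the document's own example, a record covering blocks 12-16 has low key 12 and high key 16. If a longer record starting at block 14 sits in the same leaf, the leaf's high key comes from that record even though its low key is not the largest.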
* [PATCH 5/7] xfsdocs: reverse-mapping btree documentation
2016-08-25 23:26 [PATCH v8 0/7] xfs-docs: reorganize chapters, document rmap and reflink Darrick J. Wong
` (3 preceding siblings ...)
2016-08-25 23:27 ` [PATCH 4/7] xfsdocs: move the discussions of short and long format btrees to a separate chapter Darrick J. Wong
@ 2016-08-25 23:27 ` Darrick J. Wong
2016-08-25 23:27 ` [PATCH 6/7] xfsdocs: document refcount btree and reflink Darrick J. Wong
2016-08-25 23:27 ` [PATCH 7/7] xfsdocs: document the realtime reverse mapping btree Darrick J. Wong
6 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2016-08-25 23:27 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-xfs, xfs
Add chapters on the operation of the reverse mapping btree and future
things we could do with rmap data.
v2: Add magic number to the table.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
.../allocation_groups.asciidoc | 31 +-
design/XFS_Filesystem_Structure/docinfo.xml | 17 +
.../journaling_log.asciidoc | 122 ++++++++
design/XFS_Filesystem_Structure/magic.asciidoc | 3
.../reconstruction.asciidoc | 53 +++
design/XFS_Filesystem_Structure/rmapbt.asciidoc | 305 ++++++++++++++++++++
.../xfs_filesystem_structure.asciidoc | 4
7 files changed, 526 insertions(+), 9 deletions(-)
create mode 100644 design/XFS_Filesystem_Structure/reconstruction.asciidoc
create mode 100644 design/XFS_Filesystem_Structure/rmapbt.asciidoc
diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
index 55bbc50..9fcf975 100644
--- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
+++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
@@ -12,6 +12,7 @@ Each AG has the following characteristics:
* A super block describing overall filesystem info
* Free space management
* Inode allocation and tracking
+ * Reverse block-mapping index (optional)
Having multiple AGs allows XFS to handle most operations in parallel without
degrading performance as the number of concurrent accesses increases.
@@ -379,6 +380,12 @@ it doesn't understand the flag.
Free inode B+tree. Each allocation group contains a B+tree to track inode chunks
containing free inodes. This is a performance optimization to reduce the time
required to allocate inodes.
+
+| +XFS_SB_FEAT_RO_COMPAT_RMAPBT+ |
+Reverse mapping B+tree. Each allocation group contains a B+tree containing
+records mapping AG blocks to their owners. See the section about
+xref:Reconstruction[reconstruction] for more details.
+
|=====
*sb_features_incompat*::
@@ -529,9 +536,7 @@ struct xfs_agf {
__be32 agf_seqno;
__be32 agf_length;
__be32 agf_roots[XFS_BTNUM_AGF];
- __be32 agf_spare0;
__be32 agf_levels[XFS_BTNUM_AGF];
- __be32 agf_spare1;
__be32 agf_flfirst;
__be32 agf_fllast;
__be32 agf_flcount;
@@ -541,7 +546,9 @@ struct xfs_agf {
/* version 5 filesystem fields start here */
uuid_t agf_uuid;
- __be64 agf_spare64[16];
+ __be32 agf_rmap_blocks;
+ __be32 __pad;
+ __be64 agf_spare64[15];
/* unlogged fields, written during buffer writeback. */
__be64 agf_lsn;
@@ -550,9 +557,10 @@ struct xfs_agf {
};
----
-The rest of the bytes in the sector are zeroed. +XFS_BTNUM_AGF+ is set to 2:
-index 0 for the free space B+tree indexed by block number; and index 1 for the
-free space B+tree indexed by extent size.
+The rest of the bytes in the sector are zeroed. +XFS_BTNUM_AGF+ is set to 3:
+index 0 for the free space B+tree indexed by block number; index 1 for the free
+space B+tree indexed by extent size; and index 2 for the reverse-mapping
+B+tree.
*agf_magicnum*::
Specifies the magic number for the AGF sector: ``XAGF'' (0x58414746).
@@ -570,11 +578,13 @@ this could be less than the +sb_agblocks+ value. It is this value that should
be used to determine the size of the AG.
*agf_roots*::
-Specifies the block number for the root of the two free space B+trees.
+Specifies the block number for the root of the two free space B+trees and the
+reverse-mapping B+tree, if enabled.
*agf_levels*::
-Specifies the level or depth of the two free space B+trees. For a fresh AG, this
-will be one, and the ``roots'' will point to a single leaf of level 0.
+Specifies the level or depth of the two free space B+trees and the
+reverse-mapping B+tree, if enabled. For a fresh AG, this value will be one,
+and the ``roots'' will point to a single leaf of level 0.
*agf_flfirst*::
Specifies the index of the first ``free list'' block. Free lists are covered in
@@ -600,6 +610,9 @@ used if the +XFS_SB_VERSION2_LAZYSBCOUNTBIT+ bit is set in +sb_features2+.
The UUID of this block, which must match either +sb_uuid+ or +sb_meta_uuid+
depending on which features are set.
+*agf_rmap_blocks*::
+The size of the reverse mapping B+tree in this allocation group, in blocks.
+
*agf_spare64*::
Empty space in the logged part of the AGF sector, for use for future features.
diff --git a/design/XFS_Filesystem_Structure/docinfo.xml b/design/XFS_Filesystem_Structure/docinfo.xml
index cc5596d..44f944a 100644
--- a/design/XFS_Filesystem_Structure/docinfo.xml
+++ b/design/XFS_Filesystem_Structure/docinfo.xml
@@ -122,4 +122,21 @@
</simplelist>
</revdescription>
</revision>
+ <revision>
+ <revnumber>3.141</revnumber>
+ <date>June 2016</date>
+ <author>
+ <firstname>Darrick</firstname>
+ <surname>Wong</surname>
+ <email></email>
+ </author>
+ <revdescription>
+ <simplelist>
+ <member>Document the reverse-mapping btree.</member>
+ <member>Move the b+tree info to a separate chapter.</member>
+ <member>Discuss overlapping interval b+trees.</member>
+ <member>Discuss new log items for atomic updates.</member>
+ </simplelist>
+ </revdescription>
+ </revision>
</revhistory>
diff --git a/design/XFS_Filesystem_Structure/journaling_log.asciidoc b/design/XFS_Filesystem_Structure/journaling_log.asciidoc
index 67d209f..78ce436 100644
--- a/design/XFS_Filesystem_Structure/journaling_log.asciidoc
+++ b/design/XFS_Filesystem_Structure/journaling_log.asciidoc
@@ -209,6 +209,8 @@ magic number to distinguish themselves. Buffer data items only appear after
| +XFS_LI_DQUOT+ | 0x123d | xref:Quota_Update_Log_Item[Update Quota]
| +XFS_LI_QUOTAOFF+ | 0x123e | xref:Quota_Off_Log_Item[Quota Off]
| +XFS_LI_ICREATE+ | 0x123f | xref:Inode_Create_Log_Item[Inode Creation]
+| +XFS_LI_RUI+ | 0x1240 | xref:RUI_Log_Item[Reverse Mapping Update Intent]
+| +XFS_LI_RUD+ | 0x1241 | xref:RUD_Log_Item[Reverse Mapping Update Done]
|=====
[[Log_Transaction_Headers]]
@@ -386,6 +388,126 @@ Variable-length array of extents to be freed. The array length is given by
+xfs_extent_32_t+; this can be determined from the log item size (+oh_len+) and
the number of extents (+efd_nextents+).
+[[RUI_Log_Item]]
+=== Reverse Mapping Updates Intent
+
+The next two operation types work together to handle deferred reverse mapping
+updates. Naturally, the mappings to be updated can be expressed in terms of
+mapping extents:
+
+[source, c]
+----
+struct xfs_map_extent {
+ __uint64_t me_owner;
+ __uint64_t me_startblock;
+ __uint64_t me_startoff;
+ __uint32_t me_len;
+ __uint32_t me_flags;
+};
+----
+
+*me_owner*::
+Owner of this reverse mapping. See the values in the section about
+xref:Reverse_Mapping_Btree[reverse mapping] for more information.
+
+*me_startblock*::
+Filesystem block of this mapping.
+
+*me_startoff*::
+Logical block offset of this mapping.
+
+*me_len*::
+The length of this mapping.
+
+*me_flags*::
+The lower byte of this field is a type code indicating what sort of
+reverse mapping operation we want. The upper three bytes are flag bits.
+
+.Reverse mapping update log intent types
+[options="header"]
+|=====
+| Value | Description
+| +XFS_RMAP_EXTENT_MAP+ | Add a reverse mapping for file data.
+| +XFS_RMAP_EXTENT_MAP_SHARED+ | Add a reverse mapping for file data for a file with shared blocks.
+| +XFS_RMAP_EXTENT_UNMAP+ | Remove a reverse mapping for file data.
+| +XFS_RMAP_EXTENT_UNMAP_SHARED+ | Remove a reverse mapping for file data for a file with shared blocks.
+| +XFS_RMAP_EXTENT_CONVERT+ | Convert a reverse mapping for file data between unwritten and normal.
+| +XFS_RMAP_EXTENT_CONVERT_SHARED+ | Convert a reverse mapping for file data between unwritten and normal for a file with shared blocks.
+| +XFS_RMAP_EXTENT_ALLOC+ | Add a reverse mapping for non-file data.
+| +XFS_RMAP_EXTENT_FREE+ | Remove a reverse mapping for non-file data.
+|=====
+
+.Reverse mapping update log intent flags
+[options="header"]
+|=====
+| Value | Description
+| +XFS_RMAP_EXTENT_ATTR_FORK+ | Extent is for the attribute fork.
+| +XFS_RMAP_EXTENT_BMBT_BLOCK+ | Extent is for a block mapping btree block.
+| +XFS_RMAP_EXTENT_UNWRITTEN+ | Extent is unwritten.
+|=====
+
+The ``rmap update intent'' operation comes first; it tells the log that XFS
+wants to update some reverse mappings. This record is crucial for correct log
+recovery because it enables us to spread a complex metadata update across
+multiple transactions while ensuring that an update interrupted by a crash
+will be replayed fully during log recovery.
+
+[source, c]
+----
+struct xfs_rui_log_format {
+ __uint16_t rui_type;
+ __uint16_t rui_size;
+ __uint32_t rui_nextents;
+ __uint64_t rui_id;
+ struct xfs_map_extent rui_extents[1];
+};
+----
+
+*rui_type*::
+The signature of an RUI operation, 0x1240. This value is in host-endian order,
+not big-endian like the rest of XFS.
+
+*rui_size*::
+Size of this log item. Should be 1.
+
+*rui_nextents*::
+Number of reverse mappings.
+
+*rui_id*::
+A 64-bit number that binds the corresponding RUD log item to this RUI log item.
+
+*rui_extents*::
+Variable-length array of reverse mappings to update.
+
+[[RUD_Log_Item]]
+=== Completion of Reverse Mapping Updates
+
+The ``reverse mapping update done'' operation complements the ``reverse mapping
+update intent'' operation. This second operation indicates that the update
+actually happened, so that log recovery needn't replay the update. The RUD and
+the actual updates are typically found in a new transaction following the
+transaction in which the RUI was logged.
+
+[source, c]
+----
+struct xfs_rud_log_format {
+ __uint16_t rud_type;
+ __uint16_t rud_size;
+ __uint32_t __pad;
+ __uint64_t rud_rui_id;
+};
+----
+
+*rud_type*::
+The signature of an RUD operation, 0x1241. This value is in host-endian order,
+not big-endian like the rest of XFS.
+
+*rud_size*::
+Size of this log item. Should be 1.
+
+*rud_rui_id*::
+A 64-bit number that binds the corresponding RUI log item to this RUD log item.
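+
+The pairing of intent and done items can be illustrated with a minimal C
+sketch. The +struct log_item+ type and both helper functions are hypothetical
+simplifications for illustration only; they are not the kernel's recovery
+code, which tracks considerably more state. The sketch only shows the idea
+that an RUI without a matching RUD still needs to be replayed.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LI_RUI 0x1240 /* signature of an RUI log item */
#define LI_RUD 0x1241 /* signature of an RUD log item */

/* Hypothetical, simplified view of a recovered log item. */
struct log_item {
    uint16_t type; /* LI_RUI or LI_RUD */
    uint64_t id;   /* rui_id for an RUI, rud_rui_id for an RUD */
};

/* Return true if some RUD in the array refers to the given RUI id. */
static bool rui_is_done(const struct log_item *items, size_t n, uint64_t rui_id)
{
    for (size_t i = 0; i < n; i++)
        if (items[i].type == LI_RUD && items[i].id == rui_id)
            return true;
    return false;
}

/* Count the RUIs that have no matching RUD and so would need replay. */
static size_t count_unfinished_ruis(const struct log_item *items, size_t n)
{
    size_t pending = 0;

    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (items[i].type == LI_RUI && !rui_is_done(items, n, items[i].id))
            pending++;
    return pending;
}
```
+
+The quadratic scan keeps the sketch short; a real implementation would more
+likely index pending intents by id.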
+
[[Inode_Log_Item]]
=== Inode Updates
diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc
index 301cfa0..10fd15f 100644
--- a/design/XFS_Filesystem_Structure/magic.asciidoc
+++ b/design/XFS_Filesystem_Structure/magic.asciidoc
@@ -44,6 +44,7 @@ relevant chapters. Magic numbers tend to have consistent locations:
| +XFS_ATTR_LEAF_MAGIC+ | 0xfbee | | xref:Leaf_Attributes[Leaf Attribute]
| +XFS_ATTR3_LEAF_MAGIC+ | 0x3bee | | xref:Leaf_Attributes[Leaf Attribute], v5 only
| +XFS_ATTR3_RMT_MAGIC+ | 0x5841524d | XARM | xref:Remote_Values[Remote Attribute Value], v5 only
+| +XFS_RMAP_CRC_MAGIC+ | 0x524d4233 | RMB3 | xref:Reverse_Mapping_Btree[Reverse Mapping B+tree], v5 only
|=====
The magic numbers for log items are at offset zero in each log item, but items
@@ -61,6 +62,8 @@ are not aligned to blocks.
| +XFS_LI_DQUOT+ | 0x123d | | xref:Quota_Update_Log_Item[Update Quota Log Item]
| +XFS_LI_QUOTAOFF+ | 0x123e | | xref:Quota_Off_Log_Item[Quota Off Log Item]
| +XFS_LI_ICREATE+ | 0x123f | | xref:Inode_Create_Log_Item[Inode Creation Log Item]
+| +XFS_LI_RUI+ | 0x1240 | | xref:RUI_Log_Item[Reverse Mapping Update Intent]
+| +XFS_LI_RUD+ | 0x1241 | | xref:RUD_Log_Item[Reverse Mapping Update Done]
|=====
= Theoretical Limits
diff --git a/design/XFS_Filesystem_Structure/reconstruction.asciidoc b/design/XFS_Filesystem_Structure/reconstruction.asciidoc
new file mode 100644
index 0000000..f172e0f
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/reconstruction.asciidoc
@@ -0,0 +1,53 @@
+[[Reconstruction]]
+= Metadata Reconstruction
+
+[NOTE]
+This is a theoretical discussion of how reconstruction could work; none of this
+is implemented as of 2015.
+
+A simple UNIX filesystem can be thought of in terms of a directed acyclic graph.
+To a first approximation, there exists a root directory node, which points to
+other nodes. Those other nodes can themselves be directories or they can be
+files. Each file, in turn, points to data blocks.
+
+XFS adds a few more details to this picture:
+
+* The real root(s) of an XFS filesystem are the allocation group headers
+(superblock, AGF, AGI, AGFL);
+* Each allocation group’s headers point to various per-AG B+trees (free space,
+inode, free inodes, free list, etc.);
+* The free space B+trees point to unused extents;
+* The inode B+trees point to blocks containing inode chunks;
+* All superblocks point to the root directory and the log;
+* Hardlinks mean that multiple directories can point to a single file node;
+* File data block pointers are indexed by file offset;
+* Files and directories can have a second collection of pointers to data blocks
+which contain extended attributes;
+* Large directories require multiple data blocks to store all the subpointers;
+* Still larger directories use high-offset data blocks to store a B+tree of
+hashes to directory entries;
+* Large extended attribute forks similarly use high-offset data blocks to store
+a B+tree of hashes to attribute keys; and
+* Symbolic links can point to data blocks.
+
+The beauty of this massive graph structure is that under normal circumstances,
+everything known to the filesystem is discoverable (access controls
+notwithstanding) from the root. The major weakness of this structure of course
+is that breaking an edge in the graph can render entire subtrees inaccessible.
++xfs_repair+ “recovers” from broken directories by scanning for unlinked inodes
+and connecting them to +/lost+found+, but this isn’t sufficiently general to
+recover from breaks in other parts of the graph structure. Wouldn’t it be
+useful to have back pointers as a secondary data structure? The current repair
+strategy is to reconstruct whatever can be rebuilt, but to scrap anything that
+doesn't check out.
+
+The xref:Reverse_Mapping_Btree[reverse-mapping B+tree] fills in part of the
+puzzle. Since it contains copies of every entry in each inode’s data and
+attribute forks, we can fix a corrupted block map with these records.
+Furthermore, if the inode B+trees become corrupt, it is possible to visit all
+inode chunks using the reverse-mapping data. Should XFS ever gain the ability
+to store parent directory information in each inode, it also becomes possible
+to resurrect damaged directory trees, which should reduce the complaints about
+inodes ending up in +/lost+found+. Everything else in the per-AG primary
+metadata can already be reconstructed via +xfs_repair+. Hopefully,
+reconstruction will not turn out to be a fool's errand.
diff --git a/design/XFS_Filesystem_Structure/rmapbt.asciidoc b/design/XFS_Filesystem_Structure/rmapbt.asciidoc
new file mode 100644
index 0000000..a8a210b
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/rmapbt.asciidoc
@@ -0,0 +1,305 @@
+[[Reverse_Mapping_Btree]]
+== Reverse-Mapping B+tree
+
+[NOTE]
+This data structure is under construction! Details may change.
+
+If the feature is enabled, each allocation group has its own reverse
+block-mapping B+tree, which grows in the free space like the free space
+B+trees. As mentioned in the chapter about
+xref:Reconstruction[reconstruction], this data structure is another piece of
+the puzzle necessary to reconstruct the data or attribute fork of a file from
+reverse-mapping records; we can also use it to double-check allocations to
+ensure that we are not accidentally cross-linking blocks, which can cause
+severe damage to the filesystem.
+
+This B+tree is only present if the +XFS_SB_FEAT_RO_COMPAT_RMAPBT+
+feature is enabled. The feature requires a version 5 filesystem.
+
+Each record in the reverse-mapping B+tree has the following structure:
+
+[source, c]
+----
+struct xfs_rmap_rec {
+ __be32 rm_startblock;
+ __be32 rm_blockcount;
+ __be64 rm_owner;
+ __be64 rm_fork:1;
+ __be64 rm_bmbt:1;
+ __be64 rm_unwritten:1;
+ __be64 rm_unused:7;
+ __be64 rm_offset:54;
+};
+----
+
+*rm_startblock*::
+AG block number of this record.
+
+*rm_blockcount*::
+The length of this extent.
+
+*rm_owner*::
+A 64-bit number describing the owner of this extent. This is typically the
+absolute inode number, but can also correspond to one of the following:
+
+.Special owner values
+[options="header"]
+|=====
+| Value | Description
+| +XFS_RMAP_OWN_NULL+ | No owner. This should never appear on disk.
+| +XFS_RMAP_OWN_UNKNOWN+ | Unknown owner; for EFI recovery. This should never appear on disk.
+| +XFS_RMAP_OWN_FS+ | Allocation group headers
+| +XFS_RMAP_OWN_LOG+ | XFS log blocks
+| +XFS_RMAP_OWN_AG+ | Per-allocation group B+tree blocks. This means free space B+tree blocks, blocks on the freelist, and reverse-mapping B+tree blocks.
+| +XFS_RMAP_OWN_INOBT+ | Per-allocation group inode B+tree blocks. This includes free inode B+tree blocks.
+| +XFS_RMAP_OWN_INODES+ | Inode chunks
+|=====
+
+*rm_fork*::
+If +rm_owner+ describes an inode, this can be 1 if this record is for an
+attribute fork.
+
+*rm_bmbt*::
+If +rm_owner+ describes an inode, this can be 1 to signify that this record is
+for a block map B+tree block. In this case, +rm_offset+ has no meaning.
+
+*rm_unwritten*::
+A flag indicating that the extent is unwritten. This corresponds to the flag in
+the xref:Data_Extents[extent record] format which means +XFS_EXT_UNWRITTEN+.
+
+*rm_offset*::
+The 54-bit logical file block offset, if +rm_owner+ describes an inode.
+Meaningless otherwise.
+
+[NOTE]
+The single-bit flag values +rm_unwritten+, +rm_fork+, and +rm_bmbt+ are packed
+into the larger fields in the C structure definition.
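+
+As a rough illustration of that packing, the following C sketch splits the
+64-bit word into its flags and the 54-bit offset. It assumes the word has
+already been converted from big-endian to host order, and that the fields
+are laid out from the most significant bit downward in the order they are
+declared above. The mask names and the +rmap_offset_fields+ type are
+hypothetical names chosen for this example.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask names; bit positions follow the declaration order. */
#define RMAP_OFF_ATTR_FORK  (1ULL << 63)        /* rm_fork */
#define RMAP_OFF_BMBT_BLOCK (1ULL << 62)        /* rm_bmbt */
#define RMAP_OFF_UNWRITTEN  (1ULL << 61)        /* rm_unwritten */
#define RMAP_OFF_MASK       ((1ULL << 54) - 1)  /* rm_offset, low 54 bits */

struct rmap_offset_fields {
    bool     attr_fork;
    bool     bmbt_block;
    bool     unwritten;
    uint64_t offset;
};

/* Split a host-order 64-bit word into flags and logical offset. */
static struct rmap_offset_fields rmap_unpack_offset(uint64_t raw)
{
    struct rmap_offset_fields f;

    f.attr_fork  = (raw & RMAP_OFF_ATTR_FORK) != 0;
    f.bmbt_block = (raw & RMAP_OFF_BMBT_BLOCK) != 0;
    f.unwritten  = (raw & RMAP_OFF_UNWRITTEN) != 0;
    f.offset     = raw & RMAP_OFF_MASK;
    return f;
}

/* Inverse operation: rebuild the 64-bit word from its parts. */
static uint64_t rmap_pack_offset(const struct rmap_offset_fields *f)
{
    uint64_t raw;

    assert((f->offset & ~RMAP_OFF_MASK) == 0); /* must fit in 54 bits */
    raw = f->offset;
    if (f->attr_fork)
        raw |= RMAP_OFF_ATTR_FORK;
    if (f->bmbt_block)
        raw |= RMAP_OFF_BMBT_BLOCK;
    if (f->unwritten)
        raw |= RMAP_OFF_UNWRITTEN;
    return raw;
}
```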
+
+The key has the following structure:
+
+[source, c]
+----
+struct xfs_rmap_key {
+ __be32 rm_startblock;
+ __be64 rm_owner;
+ __be64 rm_fork:1;
+ __be64 rm_bmbt:1;
+ __be64 rm_reserved:1;
+ __be64 rm_unused:7;
+ __be64 rm_offset:54;
+};
+----
+
+For the reverse-mapping B+tree on a filesystem that supports sharing of file
+data blocks, the key definition is larger than the usual AG block number. On a
+classic XFS filesystem, each block has only one owner, which means that
++rm_startblock+ is sufficient to uniquely identify each record. However,
+shared block support (reflink) on XFS breaks that assumption; now filesystem
+blocks can be linked to any logical block offset of any file inode. Therefore,
+the key must include the owner and offset information to preserve the
+one-to-one relation between key and record.
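+
+One way to picture the wider key is as a three-part lexicographic comparison.
+The C sketch below assumes that keys are ordered by start block, then owner,
+then offset, with the owner compared as an unsigned 64-bit value; the
+simplified +struct rmap_key+ and the function name are hypothetical, and the
+flag bits are left out for brevity.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, host-order key used only for this illustration. */
struct rmap_key {
    uint32_t startblock;
    uint64_t owner;
    uint64_t offset;
};

/* Return -1, 0 or 1 if a sorts before, equal to, or after b. */
static int rmap_key_cmp(const struct rmap_key *a, const struct rmap_key *b)
{
    assert(a != NULL && b != NULL);

    if (a->startblock != b->startblock)
        return a->startblock < b->startblock ? -1 : 1;
    if (a->owner != b->owner)
        return a->owner < b->owner ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}
```
+
+With only +rm_startblock+ in the key, two files sharing the same physical
+block would compare as equal; adding the owner and offset keeps them
+distinct.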
+
+* As the reverse-mapping B+tree is AG relative, all the block numbers are only
+32 bits.
+* The +bb_magic+ value is "RMB3" (0x524d4233).
+* The +xfs_btree_sblock_t+ header is used for intermediate B+tree nodes as well
+as the leaves.
+* Each pointer is associated with two keys. The first of these is the "low
+key", which is the key of the smallest record accessible through the pointer.
+This low key has the same meaning as the key in all other btrees. The second
+key is the "high key", which is the largest key that can be used to access any
+record underneath the pointer. Recall that each record
+in the reverse mapping b+tree describes an interval of physical blocks mapped
+to an interval of logical file block offsets; therefore, it makes sense that
+a range of keys can be used to find a record.
+
+=== xfs_db rmapbt Example
+
+This example shows a reverse-mapping B+tree from a freshly populated root
+filesystem:
+
+----
+xfs_db> agf 0
+xfs_db> addr rmaproot
+xfs_db> p
+magic = 0x524d4233
+level = 1
+numrecs = 43
+leftsib = null
+rightsib = null
+bno = 56
+lsn = 0x3000004c8
+uuid = 1977221d-8345-464e-b1f4-aa2ea36895f4
+owner = 0
+crc = 0x7cf8be6f (correct)
+keys[1-43] = [startblock,owner,offset]
+keys[1-43] = [startblock,owner,offset,attrfork,bmbtblock,startblock_hi,owner_hi,
+ offset_hi,attrfork_hi,bmbtblock_hi]
+ 1:[0,-3,0,0,0,351,4418,66,0,0]
+ 2:[417,285,0,0,0,827,4419,2,0,0]
+ 3:[829,499,0,0,0,2352,573,55,0,0]
+ 4:[1292,710,0,0,0,32168,262923,47,0,0]
+ 5:[32215,-5,0,0,0,34655,2365,3411,0,0]
+ 6:[34083,1161,0,0,0,34895,265220,1,0,1]
+ 7:[34896,256191,0,0,0,36522,-9,0,0,0]
+ ...
+ 41:[50998,326734,0,0,0,51430,-5,0,0,0]
+ 42:[51431,327010,0,0,0,51600,325722,11,0,0]
+ 43:[51611,327112,0,0,0,94063,23522,28375272,0,0]
+ptrs[1-43] = 1:5 2:6 3:8 4:9 5:10 6:11 7:418 ... 41:46377 42:48784 43:49522
+----
+
+We arbitrarily pick pointer 17 to traverse downwards:
+
+----
+xfs_db> addr ptrs[17]
+xfs_db> p
+magic = 0x524d4233
+level = 0
+numrecs = 168
+leftsib = 36284
+rightsib = 37617
+bno = 294760
+lsn = 0x200002761
+uuid = 1977221d-8345-464e-b1f4-aa2ea36895f4
+owner = 0
+crc = 0x2dad3fbe (correct)
+recs[1-168] = [startblock,blockcount,owner,offset,extentflag,attrfork,bmbtblock]
+ 1:[40326,1,259615,0,0,0,0] 2:[40327,1,-5,0,0,0,0]
+ 3:[40328,2,259618,0,0,0,0] 4:[40330,1,259619,0,0,0,0]
+ ...
+ 127:[40540,1,324266,0,0,0,0] 128:[40541,1,324266,8388608,0,0,0]
+ 129:[40542,2,324266,1,0,0,0] 130:[40544,32,-7,0,0,0,0]
+----
+
+Several interesting things pop out here. The first record shows that inode
+259,615 has mapped AG block 40,326 at offset 0. We confirm this by looking at
+the block map for that inode:
+
+----
+xfs_db> inode 259615
+xfs_db> bmap
+data offset 0 startblock 40326 (0/40326) count 1 flag 0
+----
+
+Next, notice records 127 and 128, which describe neighboring AG blocks that are
+mapped to non-contiguous logical blocks in inode 324,266. Given the logical
+offset of 8,388,608 we surmise that this is a leaf directory, but let us
+confirm:
+
+----
+xfs_db> inode 324266
+xfs_db> p core.mode
+core.mode = 040755
+xfs_db> bmap
+data offset 0 startblock 40540 (0/40540) count 1 flag 0
+data offset 1 startblock 40542 (0/40542) count 2 flag 0
+data offset 3 startblock 40576 (0/40576) count 1 flag 0
+data offset 8388608 startblock 40541 (0/40541) count 1 flag 0
+xfs_db> p core.mode
+core.mode = 0100644
+xfs_db> dblock 0
+xfs_db> p dhdr.hdr.magic
+dhdr.hdr.magic = 0x58444433
+xfs_db> dblock 8388608
+xfs_db> p lhdr.info.hdr.magic
+lhdr.info.hdr.magic = 0x3df1
+----
+
+Indeed, this inode 324,266 appears to be a leaf directory, as it has regular
+directory data blocks at low offsets, and a single leaf block.
+
+Notice further the two reverse-mapping records with negative owners. An owner
+of -7 corresponds to +XFS_RMAP_OWN_INODES+, which is an inode chunk, and an
+owner code of -5 corresponds to +XFS_RMAP_OWN_AG+, which covers free space
+B+tree blocks, blocks on the freelist, and reverse-mapping B+tree blocks.
+Let's see if block 40,544 is part of an inode chunk:
+
+----
+xfs_db> blockget
+xfs_db> fsblock 40544
+xfs_db> blockuse
+block 40544 (0/40544) type inode
+xfs_db> stack
+1:
+ byte offset 166068224, length 4096
+ buffer block 324352 (fsbno 40544), 8 bbs
+ inode 324266, dir inode 324266, type data
+xfs_db> type inode
+xfs_db> p
+core.magic = 0x494e
+----
+
+Our suspicions are confirmed. Let's also see if 40,327 is one of the per-AG
+B+tree blocks:
+
+----
+xfs_db> fsblock 40327
+xfs_db> blockuse
+block 40327 (0/40327) type btrmap
+xfs_db> type rmapbt
+xfs_db> p
+magic = 0x524d4233
+----
+
+As you can see, the reverse block-mapping B+tree is an important secondary
+metadata structure, which can be used to reconstruct damaged primary metadata.
+Now let's look at an extended rmap btree:
+
+----
+xfs_db> agf 0
+xfs_db> addr rmaproot
+xfs_db> p
+magic = 0x34524d42
+level = 1
+numrecs = 5
+leftsib = null
+rightsib = null
+bno = 6368
+lsn = 0x100000d1b
+uuid = 400f0928-6b88-4c37-af1e-cef1f8911f3f
+owner = 0
+crc = 0x8d4ace05 (correct)
+keys[1-5] = [startblock,owner,offset,attrfork,bmbtblock,startblock_hi,owner_hi,offset_hi,attrfork_hi,bmbtblock_hi]
+1:[0,-3,0,0,0,705,132,681,0,0]
+2:[24,5761,0,0,0,548,5761,524,0,0]
+3:[24,5929,0,0,0,380,5929,356,0,0]
+4:[24,6097,0,0,0,212,6097,188,0,0]
+5:[24,6277,0,0,0,807,-7,0,0,0]
+ptrs[1-5] = 1:5 2:771 3:9 4:10 5:11
+----
+
+The second pointer stores both the low key [24,5761,0,0,0] and the high key
+[548,5761,524,0,0], which means that we can expect block 771 to contain records
+starting at physical block 24, inode 5761, offset zero; and that one of the
+records can be used to find a reverse mapping for physical block 548, inode
+5761, and offset 524:
+
+----
+xfs_db> addr ptrs[2]
+xfs_db> p
+magic = 0x34524d42
+level = 0
+numrecs = 168
+leftsib = 5
+rightsib = 9
+bno = 6168
+lsn = 0x100000d1b
+uuid = 400f0928-6b88-4c37-af1e-cef1f8911f3f
+owner = 0
+crc = 0xd58eff0e (correct)
+recs[1-168] = [startblock,blockcount,owner,offset,extentflag,attrfork,bmbtblock]
+1:[24,525,5761,0,0,0,0]
+2:[24,524,5762,0,0,0,0]
+3:[24,523,5763,0,0,0,0]
+...
+166:[24,360,5926,0,0,0,0]
+167:[24,359,5927,0,0,0,0]
+168:[24,358,5928,0,0,0,0]
+----
+
+Observe that the first record in the block starts at physical block 24, inode
+5761, offset zero, just as we expected. Note that this first record is also
+indexed by the highest key as provided in the node block; physical block 548,
+inode 5761, offset 524 is the very last block mapped by this record. Furthermore,
+note that record 168, despite being the last record in this block, has a lower
+maximum key (physical block 381, inode 5928, offset 357) than the first record.
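+
+The high key of the first record can be reproduced with a little arithmetic.
+The C sketch below assumes that the high key of a file data record is simply
+the last physical block and the last logical offset that the record covers;
+the simplified structures and the function name are hypothetical. For owners
+that are not inodes, the offset has no meaning and this sketch does not
+handle that case.
+
```c
#include <assert.h>
#include <stdint.h>

/* Simplified, host-order record and key used only for this illustration. */
struct rmap_rec {
    uint32_t startblock;
    uint32_t blockcount;
    uint64_t owner;
    uint64_t offset;
};

struct rmap_key {
    uint32_t startblock;
    uint64_t owner;
    uint64_t offset;
};

/* Compute the largest key that still falls inside the record's interval. */
static struct rmap_key rmap_high_key(const struct rmap_rec *rec)
{
    struct rmap_key key;

    assert(rec->blockcount > 0); /* an empty extent has no last block */
    key.startblock = rec->startblock + rec->blockcount - 1;
    key.owner      = rec->owner;
    key.offset     = rec->offset + rec->blockcount - 1;
    return key;
}
```
+
+Applied to record 1 above, [24,525,5761,0], this yields physical block
+24 + 525 - 1 = 548 and offset 0 + 525 - 1 = 524 for inode 5761, matching the
+high key stored in the node block.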
diff --git a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
index 62502b3..1b8658d 100644
--- a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
+++ b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
@@ -48,6 +48,8 @@ include::overview.asciidoc[]
include::metadata_integrity.asciidoc[]
+include::reconstruction.asciidoc[]
+
include::common_types.asciidoc[]
include::magic.asciidoc[]
@@ -66,6 +68,8 @@ include::btrees.asciidoc[]
include::allocation_groups.asciidoc[]
+include::rmapbt.asciidoc[]
+
include::journaling_log.asciidoc[]
include::internal_inodes.asciidoc[]
* [PATCH 6/7] xfsdocs: document refcount btree and reflink
2016-08-25 23:26 [PATCH v8 0/7] xfs-docs: reorganize chapters, document rmap and reflink Darrick J. Wong
` (4 preceding siblings ...)
2016-08-25 23:27 ` [PATCH 5/7] xfsdocs: reverse-mapping btree documentation Darrick J. Wong
@ 2016-08-25 23:27 ` Darrick J. Wong
2016-08-25 23:27 ` [PATCH 7/7] xfsdocs: document the realtime reverse mapping btree Darrick J. Wong
6 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2016-08-25 23:27 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-xfs, xfs
Document the reference count btree and talk a little bit about how
the reflink feature uses it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
.../allocation_groups.asciidoc | 25 ++-
.../XFS_Filesystem_Structure/directories.asciidoc | 1
design/XFS_Filesystem_Structure/docinfo.xml | 2
.../journaling_log.asciidoc | 192 ++++++++++++++++++++
design/XFS_Filesystem_Structure/magic.asciidoc | 5 +
.../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 25 ++-
.../XFS_Filesystem_Structure/refcountbt.asciidoc | 145 +++++++++++++++
design/XFS_Filesystem_Structure/reflink.asciidoc | 40 ++++
design/XFS_Filesystem_Structure/rmapbt.asciidoc | 2
.../xfs_filesystem_structure.asciidoc | 4
10 files changed, 435 insertions(+), 6 deletions(-)
create mode 100644 design/XFS_Filesystem_Structure/refcountbt.asciidoc
create mode 100644 design/XFS_Filesystem_Structure/reflink.asciidoc
diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
index 9fcf975..cafa8b7 100644
--- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
+++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
@@ -13,6 +13,7 @@ Each AG has the following characteristics:
* Free space management
* Inode allocation and tracking
* Reverse block-mapping index (optional)
+ * Data block reference count index (optional)
Having multiple AGs allows XFS to handle most operations in parallel without
degrading performance as the number of concurrent accesses increases.
@@ -386,6 +387,12 @@ Reverse mapping B+tree. Each allocation group contains a B+tree containing
records mapping AG blocks to their owners. See the section about
xref:Reconstruction[reconstruction] for more details.
+| +XFS_SB_FEAT_RO_COMPAT_REFLINK+ |
+Reference count B+tree. Each allocation group contains a B+tree to track the
+reference counts of AG blocks. This enables files to share data blocks safely.
+See the section about xref:Reflink_Deduplication[reflink and deduplication] for
+more details.
+
|=====
*sb_features_incompat*::
@@ -547,8 +554,10 @@ struct xfs_agf {
/* version 5 filesystem fields start here */
uuid_t agf_uuid;
__be32 agf_rmap_blocks;
- __be32 __pad;
- __be64 agf_spare64[15];
+ __be32 agf_refcount_blocks;
+ __be32 agf_refcount_root;
+ __be32 agf_refcount_level;
+ __be64 agf_spare64[14];
/* unlogged fields, written during buffer writeback. */
__be64 agf_lsn;
@@ -613,6 +622,15 @@ depending on which features are set.
*agf_rmap_blocks*::
The size of the reverse mapping B+tree in this allocation group, in blocks.
+*agf_refcount_blocks*::
+The size of the reference count B+tree in this allocation group, in blocks.
+
+*agf_refcount_root*::
+Block number for the root of the reference count B+tree, if enabled.
+
+*agf_refcount_level*::
+Depth of the reference count B+tree, if enabled.
+
*agf_spare64*::
Empty space in the logged part of the AGF sector, for use for future features.
@@ -1243,4 +1261,5 @@ By placing the real time device (and the journal) on separate high-performance
storage devices, it is possible to reduce most of the unpredictability in I/O
response times that come from metadata operations.
-None of the XFS per-AG B+trees are involved with real time files.
+None of the XFS per-AG B+trees are involved with real time files. It is not
+possible for real time files to share data blocks.
diff --git a/design/XFS_Filesystem_Structure/directories.asciidoc b/design/XFS_Filesystem_Structure/directories.asciidoc
index bccf912..1758c4e 100644
--- a/design/XFS_Filesystem_Structure/directories.asciidoc
+++ b/design/XFS_Filesystem_Structure/directories.asciidoc
@@ -1419,6 +1419,7 @@ The hash value of a particular record.
The directory/attribute logical block containing all entries up to the
corresponding hash value.
+//
* The freeindex's +bests+ array starts from the end of the block and grows to the
start of the block.
diff --git a/design/XFS_Filesystem_Structure/docinfo.xml b/design/XFS_Filesystem_Structure/docinfo.xml
index 44f944a..f5e62bc 100644
--- a/design/XFS_Filesystem_Structure/docinfo.xml
+++ b/design/XFS_Filesystem_Structure/docinfo.xml
@@ -136,6 +136,8 @@
<member>Move the b+tree info to a separate chapter.</member>
<member>Discuss overlapping interval b+trees.</member>
<member>Discuss new log items for atomic updates.</member>
+ <member>Document the reference-count btree.</member>
+ <member>Discuss block sharing, reflink, &amp; deduplication.</member>
</simplelist>
</revdescription>
</revision>
diff --git a/design/XFS_Filesystem_Structure/journaling_log.asciidoc b/design/XFS_Filesystem_Structure/journaling_log.asciidoc
index 78ce436..0aec036 100644
--- a/design/XFS_Filesystem_Structure/journaling_log.asciidoc
+++ b/design/XFS_Filesystem_Structure/journaling_log.asciidoc
@@ -211,6 +211,10 @@ magic number to distinguish themselves. Buffer data items only appear after
| +XFS_LI_ICREATE+ | 0x123f | xref:Inode_Create_Log_Item[Inode Creation]
| +XFS_LI_RUI+ | 0x1240 | xref:RUI_Log_Item[Reverse Mapping Update Intent]
| +XFS_LI_RUD+ | 0x1241 | xref:RUD_Log_Item[Reverse Mapping Update Done]
+| +XFS_LI_CUI+ | 0x1242 | xref:CUI_Log_Item[Reference Count Update Intent]
+| +XFS_LI_CUD+ | 0x1243 | xref:CUD_Log_Item[Reference Count Update Done]
+| +XFS_LI_BUI+ | 0x1244 | xref:BUI_Log_Item[File Block Mapping Update Intent]
+| +XFS_LI_BUD+ | 0x1245 | xref:BUD_Log_Item[File Block Mapping Update Done]
|=====
[[Log_Transaction_Headers]]
@@ -508,6 +512,194 @@ Size of this log item. Should be 1.
*rud_rui_id*::
A 64-bit number that binds the corresponding RUI log item to this RUD log item.
+[[CUI_Log_Item]]
+=== Reference Count Update Intent
+
+The next two operation types work together to handle reference count updates.
+Naturally, the ranges of extents having reference count updates can be
+expressed in terms of physical extents:
+
+[source, c]
+----
+struct xfs_phys_extent {
+ __uint64_t pe_startblock;
+ __uint32_t pe_len;
+ __uint32_t pe_flags;
+};
+----
+
+*pe_startblock*::
+Filesystem block of this extent.
+
+*pe_len*::
+The length of this extent.
+
+*pe_flags*::
+The lower byte of this field is a type code indicating what sort of
+reference count operation we want. The upper three bytes are flag bits.
+
+.Reference count update log intent types
+[options="header"]
+|=====
+| Value | Description
+| +XFS_REFCOUNT_EXTENT_INCREASE+ | Increase the reference count for this extent.
+| +XFS_REFCOUNT_EXTENT_DECREASE+ | Decrease the reference count for this extent.
+| +XFS_REFCOUNT_EXTENT_ALLOC_COW+ | Reserve an extent for staging copy on write.
+| +XFS_REFCOUNT_EXTENT_FREE_COW+ | Unreserve an extent for staging copy on write.
+|=====
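+
+Splitting the flags field into its type code and flag bits is a matter of
+masking. The C sketch below is illustrative only: the mask name and the
+helper functions are hypothetical, and no numeric values are assumed for the
+type codes in the table above.
+
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask name: the low byte carries the type code. */
#define EXTENT_TYPE_MASK 0xFFu

/* Extract the operation type code from the low byte. */
static uint32_t extent_type(uint32_t flags)
{
    return flags & EXTENT_TYPE_MASK;
}

/* Extract the flag bits held in the upper three bytes. */
static uint32_t extent_flag_bits(uint32_t flags)
{
    return flags & ~EXTENT_TYPE_MASK;
}

/* Combine a type code and flag bits into a single 32-bit field. */
static uint32_t extent_make_flags(uint32_t type, uint32_t flag_bits)
{
    assert(type <= EXTENT_TYPE_MASK);             /* type must fit in one byte */
    assert((flag_bits & EXTENT_TYPE_MASK) == 0);  /* flags live above the byte */
    return type | flag_bits;
}
```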
+
+The ``reference count update intent'' operation comes first; it tells the log
+that XFS wants to update some reference counts. This record is crucial for
+correct log recovery because it enables us to spread a complex metadata update
+across multiple transactions while ensuring that a crash midway through the
+complex update will be replayed fully during log recovery.
+
+[source, c]
+----
+struct xfs_cui_log_format {
+ __uint16_t cui_type;
+ __uint16_t cui_size;
+ __uint32_t cui_nextents;
+ __uint64_t cui_id;
+ struct xfs_phys_extent cui_extents[1];
+};
+----
+
+*cui_type*::
+The signature of a CUI operation, 0x1242. This value is in host-endian order,
+not big-endian like the rest of XFS.
+
+*cui_size*::
+Size of this log item. Should be 1.
+
+*cui_nextents*::
+Number of reference count updates.
+
+*cui_id*::
+A 64-bit number that binds the corresponding CUD log item to this CUI log item.
+
+*cui_extents*::
+Variable-length array of reference count update information.
+
+[[CUD_Log_Item]]
+=== Completion of Reference Count Updates
+
+The ``reference count update done'' operation complements the ``reference count
+update intent'' operation. This second operation indicates that the update
+actually happened, so that log recovery needn't replay the update. The CUD and
+the actual updates are typically found in a new transaction following the
+transaction in which the CUI was logged.
+
+[source, c]
+----
+struct xfs_cud_log_format {
+ __uint16_t cud_type;
+ __uint16_t cud_size;
+ __uint32_t __pad;
+ __uint64_t cud_cui_id;
+};
+----
+
+*cud_type*::
+The signature of a CUD operation, 0x1243. This value is in host-endian order,
+not big-endian like the rest of XFS.
+
+*cud_size*::
+Size of this log item. Should be 1.
+
+*cud_cui_id*::
+A 64-bit number that binds the corresponding CUI log item to this CUD log item.
+
+[[BUI_Log_Item]]
+=== File Block Mapping Intent
+
+The next two operation types work together to handle deferred file block
+mapping updates. The extents to be mapped are expressed via the
++xfs_map_extent+ structure discussed in the section about
+xref:RUI_Log_Item[reverse mapping intents].
+
+The lower byte of the +me_flags+ field is a type code indicating what sort of
+file block mapping operation we want. The upper three bytes are flag bits.
+
+.File block mapping update log intent types
+[options="header"]
+|=====
+| Value | Description
+| +XFS_BMAP_EXTENT_MAP+ | Add a mapping for file data.
+| +XFS_BMAP_EXTENT_UNMAP+ | Remove a mapping for file data.
+|=====
+
+.File block mapping update log intent flags
+[options="header"]
+|=====
+| Value | Description
+| +XFS_BMAP_EXTENT_ATTR_FORK+ | Extent is for the attribute fork.
+| +XFS_BMAP_EXTENT_UNWRITTEN+ | Extent is unwritten.
+|=====
+
+The ``file block mapping update intent'' operation comes first; it tells the
+log that XFS wants to map or unmap some extents in a file. This record is
+crucial for correct log recovery because it enables us to spread a complex
+metadata update across multiple transactions while ensuring that a crash midway
+through the complex update will be replayed fully during log recovery.
+
+[source, c]
+----
+struct xfs_bui_log_format {
+ __uint16_t bui_type;
+ __uint16_t bui_size;
+ __uint32_t bui_nextents;
+ __uint64_t bui_id;
+ struct xfs_map_extent bui_extents[1];
+};
+----
+
+*bui_type*::
+The signature of a BUI operation, 0x1244. This value is in host-endian order,
+not big-endian like the rest of XFS.
+
+*bui_size*::
+Size of this log item. Should be 1.
+
+*bui_nextents*::
+Number of file mappings. Should be 1.
+
+*bui_id*::
+A 64-bit number that binds the corresponding BUD log item to this BUI log item.
+
+*bui_extents*::
+Variable-length array of file block mappings to update. There should only
+be one mapping present.
+
+[[BUD_Log_Item]]
+=== Completion of File Block Mapping Updates
+
+The ``file block mapping update done'' operation complements the ``file block
+mapping update intent'' operation. This second operation indicates that the
+update actually happened, so that log recovery needn't replay the update. The
+BUD and the actual updates are typically found in a new transaction following
+the transaction in which the BUI was logged.
+
+[source, c]
+----
+struct xfs_bud_log_format {
+ __uint16_t bud_type;
+ __uint16_t bud_size;
+ __uint32_t __pad;
+ __uint64_t bud_bui_id;
+};
+----
+
+*bud_type*::
+The signature of a BUD operation, 0x1245. This value is in host-endian order,
+not big-endian like the rest of XFS.
+
+*bud_size*::
+Size of this log item. Should be 1.
+
+*bud_bui_id*::
+A 64-bit number that binds the corresponding BUI log item to this BUD log item.
+
[[Inode_Log_Item]]
=== Inode Updates
diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc
index 10fd15f..bc172f3 100644
--- a/design/XFS_Filesystem_Structure/magic.asciidoc
+++ b/design/XFS_Filesystem_Structure/magic.asciidoc
@@ -45,6 +45,7 @@ relevant chapters. Magic numbers tend to have consistent locations:
| +XFS_ATTR3_LEAF_MAGIC+ | 0x3bee | | xref:Leaf_Attributes[Leaf Attribute], v5 only
| +XFS_ATTR3_RMT_MAGIC+ | 0x5841524d | XARM | xref:Remote_Values[Remote Attribute Value], v5 only
| +XFS_RMAP_CRC_MAGIC+ | 0x524d4233 | RMB3 | xref:Reverse_Mapping_Btree[Reverse Mapping B+tree], v5 only
+| +XFS_REFC_CRC_MAGIC+ | 0x52334643 | R3FC | xref:Reference_Count_Btree[Reference Count B+tree], v5 only
|=====
The magic numbers for log items are at offset zero in each log item, but items
@@ -64,6 +65,10 @@ are not aligned to blocks.
| +XFS_LI_ICREATE+ | 0x123f | | xref:Inode_Create_Log_Item[Inode Creation Log Item]
| +XFS_LI_RUI+ | 0x1240 | | xref:RUI_Log_Item[Reverse Mapping Update Intent]
| +XFS_LI_RUD+ | 0x1241 | | xref:RUD_Log_Item[Reverse Mapping Update Done]
+| +XFS_LI_CUI+ | 0x1242 | | xref:CUI_Log_Item[Reference Count Update Intent]
+| +XFS_LI_CUD+ | 0x1243 | | xref:CUD_Log_Item[Reference Count Update Done]
+| +XFS_LI_BUI+ | 0x1244 | | xref:BUI_Log_Item[File Block Mapping Update Intent]
+| +XFS_LI_BUD+ | 0x1245 | | xref:BUD_Log_Item[File Block Mapping Update Done]
|=====
= Theoretical Limits
diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
index dc1fad2..4415c38 100644
--- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
+++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
@@ -109,7 +109,8 @@ struct xfs_dinode_core {
__be64 di_changecount;
__be64 di_lsn;
__be64 di_flags2;
- __u8 di_pad2[16];
+ __be32 di_cowextsize;
+ __u8 di_pad2[12];
xfs_timestamp_t di_crtime;
__be64 di_ino;
uuid_t di_uuid;
@@ -215,7 +216,7 @@ including relevant metadata like B+trees. This does not include blocks used for
extended attributes.
*di_extsize*::
-Specifies the extent size for filesystems with real-time devices and an extent
+Specifies the extent size for filesystems with real-time devices or an extent
size hint for standard filesystems. For normal filesystems, and with
directories, the +XFS_DIFLAG_EXTSZINHERIT+ flag must be set in +di_flags+ if
this field is used. Inodes created in these directories will inherit the
@@ -279,7 +280,7 @@ For directory inodes, new inodes inherit the +di_projid+ value.
For directory inodes, symlinks cannot be created.
| +XFS_DIFLAG_EXTSIZE+ |
-Specifies the extent size for real-time files or a and extent size hint for regular files.
+Specifies the extent size for real-time files or an extent size hint for regular files.
| +XFS_DIFLAG_EXTSZINHERIT+ |
For directory inodes, new inodes inherit the +di_extsize+ value.
@@ -323,8 +324,26 @@ Specifies extended flags associated with a v3 inode.
| +XFS_DIFLAG2_DAX+ |
For a file, enable DAX to increase performance on persistent-memory storage.
If set on a directory, files created in the directory will inherit this flag.
+| +XFS_DIFLAG2_REFLINK+ |
+This inode shares (or has shared) data blocks with another inode.
+| +XFS_DIFLAG2_COWEXTSIZE+ |
+For files, this is the extent size hint for copy on write operations; see
++di_cowextsize+ for details. For directories, the value in +di_cowextsize+
+will be copied to all newly created files and directories.
|=====
+*di_cowextsize*::
+Specifies the extent size hint for copy on write operations. When allocating
+extents for a copy on write operation, the allocator will be asked to align
+its allocations to either +di_cowextsize+ blocks or +di_extsize+ blocks,
+whichever is greater. The +XFS_DIFLAG2_COWEXTSIZE+ flag must be set if this
+field is used. If this field and its flag are set on a directory file, the
+value will be copied into any files or directories created within this
+directory. During a block sharing operation, this value will be copied from
+the source file to the destination file if the sharing operation completely
+overwrites the destination file's contents and the destination file does not
+already have +di_cowextsize+ set.
+
*di_pad2*::
Padding for future expansion of the inode.
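
Annotation on the +di_cowextsize+ text above: a minimal sketch of the "whichever is greater" rule, assuming both hints have already been read from the inode and converted to host-endian block counts. The names `sk_cow_alignment` and `SK_DIFLAG2_COWEXTSIZE` are hypothetical, and the bit value used for the flag is an assumption made for illustration, not taken from this patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for XFS_DIFLAG2_COWEXTSIZE; the bit value is assumed. */
#define SK_DIFLAG2_COWEXTSIZE (UINT64_C(1) << 2)

/*
 * Return the alignment, in filesystem blocks, that a copy-on-write
 * allocation would be asked to honour: the larger of the CoW extent size
 * hint and the regular extent size hint. The CoW hint is ignored unless
 * its flag is set in flags2.
 */
static uint32_t sk_cow_alignment(uint64_t flags2, uint32_t extsize,
                                 uint32_t cowextsize)
{
    uint32_t cow = (flags2 & SK_DIFLAG2_COWEXTSIZE) ? cowextsize : 0;

    return cow > extsize ? cow : extsize;
}
```

With the flag set, hints of 16 and 32 blocks yield an alignment of 32; without the flag, only the regular hint of 16 applies.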
diff --git a/design/XFS_Filesystem_Structure/refcountbt.asciidoc b/design/XFS_Filesystem_Structure/refcountbt.asciidoc
new file mode 100644
index 0000000..dbbb98e
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/refcountbt.asciidoc
@@ -0,0 +1,145 @@
+[[Reference_Count_Btree]]
+== Reference Count B+tree
+
+[NOTE]
+This data structure is under construction! Details may change.
+
+To support the sharing of file data blocks (reflink), each allocation group has
+its own reference count B+tree, which grows in the allocated space like the
+inode B+trees. This data could be gleaned by performing an interval query of
+the reverse-mapping B+tree, but doing so would come at a huge performance
+penalty. Therefore, this data structure is a cache of computable information.
+
+This B+tree is only present if the +XFS_SB_FEAT_RO_COMPAT_REFLINK+
+feature is enabled. The feature requires a version 5 filesystem.
+
+Each record in the reference count B+tree has the following structure:
+
+[source, c]
+----
+struct xfs_refcount_rec {
+ __be32 rc_startblock;
+ __be32 rc_blockcount;
+ __be32 rc_refcount;
+};
+----
+
+*rc_startblock*::
+AG block number of this record.
+
+*rc_blockcount*::
+The length of this extent.
+
+*rc_refcount*::
+Number of mappings of this filesystem extent.
+
+Node pointers are AG relative block pointers. Node keys have the following structure:
+
+[source, c]
+----
+struct xfs_refcount_key {
+ __be32 rc_startblock;
+};
+----
+
+* As the reference counting is AG relative, all the block numbers are only
+32 bits.
+* The +bb_magic+ value is "R3FC" (0x52334643).
+* The +xfs_btree_sblock_t+ header is used for intermediate B+tree nodes as well
+as the leaves.
+
+=== xfs_db refcntbt Example
+
+For this example, an XFS filesystem was populated with a root filesystem and
+a deduplication program was run to create shared blocks:
+
+----
+xfs_db> agf 0
+xfs_db> addr refcntroot
+xfs_db> p
+magic = 0x52334643
+level = 1
+numrecs = 6
+leftsib = null
+rightsib = null
+bno = 36892
+lsn = 0x200004ec2
+uuid = f1f89746-e00b-49c9-96b3-ecef0f2f14ae
+owner = 0
+crc = 0x75f35128 (correct)
+keys[1-6] = [startblock] 1:[14] 2:[65633] 3:[65780] 4:[94571] 5:[117201] 6:[152442]
+ptrs[1-6] = 1:7 2:25836 3:25835 4:18447 5:18445 6:18449
+xfs_db> addr ptrs[3]
+xfs_db> p
+magic = 0x52334643
+level = 0
+numrecs = 80
+leftsib = 25836
+rightsib = 18447
+bno = 51670
+lsn = 0x200004ec2
+uuid = f1f89746-e00b-49c9-96b3-ecef0f2f14ae
+owner = 0
+crc = 0xc3962813 (correct)
+recs[1-80] = [startblock,blockcount,refcount]
+ 1:[65780,1,2] 2:[65781,1,3] 3:[65785,2,2] 4:[66640,1,2]
+ 5:[69602,4,2] 6:[72256,16,2] 7:[72871,4,2] 8:[72879,20,2]
+ 9:[73395,4,2] 10:[75063,4,2] 11:[79093,4,2] 12:[86344,16,2]
+----
+
+Record 6 in the reference count B+tree for AG 0 indicates that the AG extent
+starting at block 72,256 and running for 16 blocks has a reference count of 2.
+This means that there are two files sharing the block:
+
+----
+xfs_db> blockget -n
+xfs_db> fsblock 72256
+xfs_db> blockuse
+block 72256 (0/72256) type rldata inode 25169197
+----
+
+The blockuse type changes to ``rldata'' to indicate that the block is shared
+data. Unfortunately, blockuse only tells us about one block owner. If we
+happen to have enabled the reverse-mapping B+tree, we can use it to find all
+inodes that own this block:
+
+----
+xfs_db> agf 0
+xfs_db> addr rmaproot
+...
+xfs_db> addr ptrs[3]
+...
+xfs_db> addr ptrs[7]
+xfs_db> p
+magic = 0x524d4233
+level = 0
+numrecs = 22
+leftsib = 65057
+rightsib = 65058
+bno = 291478
+lsn = 0x200004ec2
+uuid = f1f89746-e00b-49c9-96b3-ecef0f2f14ae
+owner = 0
+crc = 0xed7da3f7 (correct)
+recs[1-22] = [startblock,blockcount,owner,offset,extentflag,attrfork,bmbtblock]
+ 1:[68957,8,3201,0,0,0,0] 2:[68965,4,25260953,0,0,0,0]
+ ...
+ 18:[72232,58,3227,0,0,0,0] 19:[72256,16,25169197,24,0,0,0]
+ 20:[72290,75,3228,0,0,0,0] 21:[72365,46,3229,0,0,0,0]
+----
+
+Records 18 and 19 intersect the block 72,256; they tell us that inodes 3,227
+and 25,169,197 both claim ownership. Let us confirm this:
+
+----
+xfs_db> inode 25169197
+xfs_db> bmap
+data offset 0 startblock 12632259 (3/49347) count 24 flag 0
+data offset 24 startblock 72256 (0/72256) count 16 flag 0
+data offset 40 startblock 12632299 (3/49387) count 18 flag 0
+xfs_db> inode 3227
+xfs_db> bmap
+data offset 0 startblock 72232 (0/72232) count 58 flag 0
+----
+
+Inodes 25,169,197 and 3,227 both contain mappings to block 0/72,256.
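
Annotation on refcountbt.asciidoc above: a rough sketch of how a leaf's records might be decoded and searched, assuming the records in a leaf are sorted by start block and do not overlap. The `sk_` names are hypothetical helpers, not kernel or xfsprogs functions; only the three-field record layout comes from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-endian copy of one record; field order follows xfs_refcount_rec. */
struct sk_refc_rec {
    uint32_t startblock;
    uint32_t blockcount;
    uint32_t refcount;
};

/* Decode a big-endian 32-bit on-disk value. */
static uint32_t sk_get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode one 12-byte on-disk record into host-endian form. */
static struct sk_refc_rec sk_refc_decode(const unsigned char *disk)
{
    struct sk_refc_rec r;

    r.startblock = sk_get_be32(disk);
    r.blockcount = sk_get_be32(disk + 4);
    r.refcount = sk_get_be32(disk + 8);
    return r;
}

/*
 * Binary-search sorted, non-overlapping records for the one covering agbno.
 * Returns its refcount, or 0 when no record covers the block, meaning the
 * block is not shared (it is either free or has a single owner).
 */
static uint32_t sk_refc_lookup(const struct sk_refc_rec *recs, size_t n,
                               uint32_t agbno)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (agbno < recs[mid].startblock)
            hi = mid;
        else if (agbno - recs[mid].startblock >= recs[mid].blockcount)
            lo = mid + 1;
        else
            return recs[mid].refcount;
    }
    return 0;
}
```

Fed the first six records of the example leaf, a lookup of block 72256 finds record 6 and reports a refcount of 2, while block 72272 (one past the end of that extent) reports 0.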
diff --git a/design/XFS_Filesystem_Structure/reflink.asciidoc b/design/XFS_Filesystem_Structure/reflink.asciidoc
new file mode 100644
index 0000000..8f52b90
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/reflink.asciidoc
@@ -0,0 +1,40 @@
+[[Reflink_Deduplication]]
+= Sharing Data Blocks
+
+On a traditional filesystem, there is a 1:1 mapping between a logical block
+offset in a file and a physical block on disk, which is to say that physical
+blocks are not shared. However, there exist various use cases for being able
+to share blocks between files -- deduplicating files saves space on archival
+systems; creating space-efficient clones of disk images for virtual machines
+and containers facilitates efficient datacenters; and deferring the payment of
+the allocation cost of a file system tree copy as long as possible makes
+regular work faster. In all of these cases, a write to one of the shared
+copies *must* not affect the other shared copies, which means that writes to
+shared blocks must employ a copy-on-write strategy. Sharing blocks in this
+manner is commonly referred to as ``reflinking''.
+
+XFS implements block sharing in a fairly straightforward manner. All existing
+data fork structures remain unchanged, save for the addition of a
+per-allocation group xref:Reference_Count_Btree[reference count B+tree]. This
+data structure tracks reference counts for all shared physical blocks, with a
+few rules to maintain compatibility with existing code: If a block is free, it
+will be tracked in the free space B+trees. If a block is owned by a single
+file, it appears in neither the free space nor the reference count B+trees. If
+a block is shared, it will appear in the reference count B+tree with a
+reference count >= 2. The first two cases are established precedent in XFS, so
+the third case is the only behavioral change.
+
+When a filesystem block is shared, the block mapping in the destination file is
+updated to point to that filesystem block and the reference count B+tree records
+are updated to reflect the increased refcount. If a shared block is written, a
+new block will be allocated, the dirty data written to this new block, and the
+file's block mapping updated to point to the new block. If a shared block is
+unmapped, the reference count records are updated to reflect the decreased
+refcount and the block is also freed if its reference count becomes zero. This
+enables users to create space efficient clones of disk images and to copy
+filesystem subtrees quickly, using the standard Linux coreutils packages.
+
+Deduplication employs the same mechanism to share blocks and copy them at write
+time. However, the kernel confirms that the contents of both files are
+identical before updating the destination file's mapping. This enables XFS to
+be used by userspace deduplication programs such as +duperemove+.
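
Annotation on reflink.asciidoc above: the three bookkeeping rules and the copy-on-write step can be modelled with a toy per-block mapping counter. This is only a sketch under that simplification; the real code tracks extents in B+trees, and every `toy_` name here is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: how many file mappings currently point at one block. */
struct toy_block {
    uint32_t mappings;
};

/* A free block would be tracked in the free space B+trees. */
static bool toy_in_freespace(const struct toy_block *b)
{
    return b->mappings == 0;
}

/* Only shared blocks (refcount >= 2) would appear in the refcount B+tree. */
static bool toy_in_refcountbt(const struct toy_block *b)
{
    return b->mappings >= 2;
}

/* Map the block into a file: first allocation, or sharing an existing block. */
static void toy_map(struct toy_block *b)
{
    b->mappings++;
}

/* Unmap the block from a file; at zero mappings the block is free again. */
static void toy_unmap(struct toy_block *b)
{
    if (b->mappings > 0)
        b->mappings--;
}

/*
 * Write to a block through one file's mapping. If the block is shared, the
 * writer gets a newly allocated block and drops its mapping of the old one
 * (returns true). Otherwise the write can go in place (returns false).
 */
static bool toy_cow_write(struct toy_block *old, struct toy_block *fresh)
{
    if (!toy_in_refcountbt(old))
        return false;
    toy_map(fresh);
    toy_unmap(old);
    return true;
}
```

A block with exactly one mapping appears in neither structure, which matches the second rule: single-owner blocks cost no extra metadata.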
diff --git a/design/XFS_Filesystem_Structure/rmapbt.asciidoc b/design/XFS_Filesystem_Structure/rmapbt.asciidoc
index a8a210b..0ec72c1 100644
--- a/design/XFS_Filesystem_Structure/rmapbt.asciidoc
+++ b/design/XFS_Filesystem_Structure/rmapbt.asciidoc
@@ -53,6 +53,8 @@ absolute inode number, but can also correspond to one of the following:
| +XFS_RMAP_OWN_AG+ | Per-allocation group B+tree blocks. This means free space B+tree blocks, blocks on the freelist, and reverse-mapping B+tree blocks.
| +XFS_RMAP_OWN_INOBT+ | Per-allocation group inode B+tree blocks. This includes free inode B+tree blocks.
| +XFS_RMAP_OWN_INODES+ | Inode chunks
+| +XFS_RMAP_OWN_REFC+ | Per-allocation group refcount B+tree blocks. This will be used for reflink support.
+| +XFS_RMAP_OWN_COW+ | Blocks that have been reserved for a copy-on-write operation that has not completed.
|=====
*rm_fork*::
diff --git a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
index 1b8658d..7916fbe 100644
--- a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
+++ b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
@@ -48,6 +48,8 @@ include::overview.asciidoc[]
include::metadata_integrity.asciidoc[]
+include::reflink.asciidoc[]
+
include::reconstruction.asciidoc[]
include::common_types.asciidoc[]
@@ -70,6 +72,8 @@ include::allocation_groups.asciidoc[]
include::rmapbt.asciidoc[]
+include::refcountbt.asciidoc[]
+
include::journaling_log.asciidoc[]
include::internal_inodes.asciidoc[]
* [PATCH 7/7] xfsdocs: document the realtime reverse mapping btree
2016-08-25 23:26 [PATCH v8 0/7] xfs-docs: reorganize chapters, document rmap and reflink Darrick J. Wong
` (5 preceding siblings ...)
2016-08-25 23:27 ` [PATCH 6/7] xfsdocs: document refcount btree and reflink Darrick J. Wong
@ 2016-08-25 23:27 ` Darrick J. Wong
2016-09-08 1:38 ` Dave Chinner
6 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2016-08-25 23:27 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-xfs, xfs
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
.../allocation_groups.asciidoc | 8 +
design/XFS_Filesystem_Structure/docinfo.xml | 14 +
.../internal_inodes.asciidoc | 2
design/XFS_Filesystem_Structure/magic.asciidoc | 1
.../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 6 -
design/XFS_Filesystem_Structure/rtrmapbt.asciidoc | 234 ++++++++++++++++++++
6 files changed, 263 insertions(+), 2 deletions(-)
create mode 100644 design/XFS_Filesystem_Structure/rtrmapbt.asciidoc
diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
index cafa8b7..7ba636a 100644
--- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
+++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
@@ -105,6 +105,7 @@ struct xfs_sb
xfs_ino_t sb_pquotino;
xfs_lsn_t sb_lsn;
uuid_t sb_meta_uuid;
+ xfs_ino_t sb_rrmapino;
};
----
*sb_magicnum*::
@@ -449,6 +450,13 @@ If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in
all metadata blocks must match this UUID. If not, the block header UUID field
must match +sb_uuid+.
+*sb_rrmapino*::
+If the +XFS_SB_FEAT_COMPAT_RMAPBT+ feature is set and a real-time
+device is present (+sb_rblocks+ > 0), this field points to an inode
+that contains the root of the
+xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree].
+This field is zero otherwise.
+
=== xfs_db Superblock Example
A filesystem is made on a single disk with the following command:
diff --git a/design/XFS_Filesystem_Structure/docinfo.xml b/design/XFS_Filesystem_Structure/docinfo.xml
index f5e62bc..5cdcf6c 100644
--- a/design/XFS_Filesystem_Structure/docinfo.xml
+++ b/design/XFS_Filesystem_Structure/docinfo.xml
@@ -141,4 +141,18 @@
</simplelist>
</revdescription>
</revision>
+ <revision>
+ <revnumber>3.1415</revnumber>
+ <date>July 2016</date>
+ <author>
+ <firstname>Darrick</firstname>
+ <surname>Wong</surname>
+ <email></email>
+ </author>
+ <revdescription>
+ <simplelist>
+ <member>Document the real-time reverse-mapping btree.</member>
+ </simplelist>
+ </revdescription>
+ </revision>
</revhistory>
diff --git a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
index 9ace3ea..e6bf75f 100644
--- a/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
+++ b/design/XFS_Filesystem_Structure/internal_inodes.asciidoc
@@ -201,3 +201,5 @@ rtbitmap location, and positive if there are any.
This data structure is not particularly space efficient, however it is a very
fast way to provide the same data as the two free space B+trees for regular
files since the space is preallocated and metadata maintenance is minimal.
+
+include::rtrmapbt.asciidoc[]
diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc
index bc172f3..77bed6d 100644
--- a/design/XFS_Filesystem_Structure/magic.asciidoc
+++ b/design/XFS_Filesystem_Structure/magic.asciidoc
@@ -45,6 +45,7 @@ relevant chapters. Magic numbers tend to have consistent locations:
| +XFS_ATTR3_LEAF_MAGIC+ | 0x3bee | | xref:Leaf_Attributes[Leaf Attribute], v5 only
| +XFS_ATTR3_RMT_MAGIC+ | 0x5841524d | XARM | xref:Remote_Values[Remote Attribute Value], v5 only
| +XFS_RMAP_CRC_MAGIC+ | 0x524d4233 | RMB3 | xref:Reverse_Mapping_Btree[Reverse Mapping B+tree], v5 only
+| +XFS_RTRMAP_CRC_MAGIC+ | 0x4d415052 | MAPR | xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree], v5 only
| +XFS_REFC_CRC_MAGIC+ | 0x52334643 | R3FC | xref:Reference_Count_Btree[Reference Count B+tree], v5 only
|=====
diff --git a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
index 4415c38..02d44ac 100644
--- a/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
+++ b/design/XFS_Filesystem_Structure/ondisk_inode.asciidoc
@@ -141,7 +141,8 @@ the associated metadata or data; or ``btree'' where the inode contains a B+tree
root node which points to filesystem blocks containing the metadata or data.
Migration between the formats depends on the amount of metadata associated with
the inode. ``dev'' is used for character and block devices while ``uuid'' is
-currently not used.
+currently not used. ``rmap'' indicates that a reverse-mapping B+tree
+is rooted in the fork.
[source, c]
----
@@ -150,7 +151,8 @@ typedef enum xfs_dinode_fmt {
XFS_DINODE_FMT_LOCAL,
XFS_DINODE_FMT_EXTENTS,
XFS_DINODE_FMT_BTREE,
- XFS_DINODE_FMT_UUID
+ XFS_DINODE_FMT_UUID,
+ XFS_DINODE_FMT_RMAP,
} xfs_dinode_fmt_t;
----
diff --git a/design/XFS_Filesystem_Structure/rtrmapbt.asciidoc b/design/XFS_Filesystem_Structure/rtrmapbt.asciidoc
new file mode 100644
index 0000000..3a109b2
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/rtrmapbt.asciidoc
@@ -0,0 +1,234 @@
+[[Real_time_Reverse_Mapping_Btree]]
+=== Real-Time Reverse-Mapping B+tree
+
+[NOTE]
+This data structure is under construction! Details may change.
+
+If the reverse-mapping B+tree and real-time storage device features
+are enabled, the real-time device has its own reverse block-mapping
+B+tree.
+
+As mentioned in the chapter about xref:Reconstruction[reconstruction],
+this data structure is another piece of the puzzle necessary to
+reconstruct the data or attribute fork of a file from reverse-mapping
+records; we can also use it to double-check allocations to ensure that
+we are not accidentally cross-linking blocks, which can cause severe
+damage to the filesystem.
+
+This B+tree is only present if the +XFS_SB_FEAT_RO_COMPAT_RMAPBT+
+feature is enabled and a real-time device is present. The feature
+requires a version 5 filesystem.
+
+The real-time reverse mapping B+tree is rooted in an inode's data
+fork; the inode number is given by the +sb_rrmapino+ field in the
+superblock. The B+tree blocks themselves are stored in the regular
+filesystem. The structures used for an inode's B+tree root are:
+
+[source, c]
+----
+struct xfs_rtrmap_root {
+ __be16 bb_level;
+ __be16 bb_numrecs;
+};
+----
+
+* On disk, the B+tree node starts with the +xfs_rtrmap_root+ header
+followed by an array of +xfs_rtrmap_key+ values and then an array of
++xfs_rtrmap_ptr_t+ values. The size of both arrays is specified by the
+header's +bb_numrecs+ value.
+
+* The root node in the inode can only contain up to 10 key/pointer
+pairs for a standard 512 byte inode before a new level of nodes is
+added between the root and the leaves. +di_forkoff+ should always
+be zero, because there are no extended attributes.
+
+Each record in the real-time reverse-mapping B+tree has the following
+structure:
+
+[source, c]
+----
+struct xfs_rtrmap_rec {
+ __be64 rm_startblock;
+ __be64 rm_blockcount;
+ __be64 rm_owner;
+ __be64 rm_fork:1;
+ __be64 rm_bmbt:1;
+ __be64 rm_unwritten:1;
+ __be64 rm_unused:7;
+ __be64 rm_offset:54;
+};
+----
+
+*rm_startblock*::
+Real-time device block number of this record.
+
+*rm_blockcount*::
+The length of this extent, in real-time blocks.
+
+*rm_owner*::
+A 64-bit number describing the owner of this extent. This must be an
+inode number, because the real-time device is for file data only.
+
+*rm_fork*::
+If +rm_owner+ describes an inode, this can be 1 if this record is for
+an attribute fork. This value will always be zero for real-time
+extents.
+
+*rm_bmbt*::
+If +rm_owner+ describes an inode, this can be 1 to signify that this
+record is for a block map B+tree block. In this case, +rm_offset+ has
+no meaning. This value will always be zero for real-time extents.
+
+*rm_unwritten*::
+A flag indicating that the extent is unwritten. This corresponds to
+the flag in the xref:Data_Extents[extent record] format which means
++XFS_EXT_UNWRITTEN+.
+
+*rm_offset*::
+The 54-bit logical file block offset, if +rm_owner+ describes an
+inode.
+
+[NOTE]
+The single-bit flag values +rm_unwritten+, +rm_fork+, and +rm_bmbt+
+are packed into the larger fields in the C structure definition.
+
+The key has the following structure:
+
+[source, c]
+----
+struct xfs_rtrmap_key {
+ __be64 rm_startblock;
+ __be64 rm_owner;
+ __be64 rm_fork:1;
+ __be64 rm_bmbt:1;
+ __be64 rm_reserved:1;
+ __be64 rm_unused:7;
+ __be64 rm_offset:54;
+};
+----
+
+* All block numbers are 64-bit real-time device block numbers.
+
+* The +bb_magic+ value is ``MAPR'' (0x4d415052).
+
+* The +xfs_btree_lblock_t+ header is used for intermediate B+tree nodes as well
+as the leaves.
+
+* Each pointer is associated with two keys. The first of these is the
+"low key", which is the key of the smallest record accessible through
+the pointer. This low key has the same meaning as the key in all
+other btrees. The second key is the "high key", which is the largest
+key that can be used to access any record underneath
+the pointer. Recall that each record in the real-time reverse mapping
+B+tree describes an interval of physical blocks mapped to an interval
+of logical file block offsets; therefore, it makes sense that a range
+of keys can be used to find a record.
+
+==== xfs_db rtrmapbt Example
+
+This example shows a real-time reverse-mapping B+tree from a freshly
+populated root filesystem:
+
+----
+xfs_db> sb 0
+xfs_db> addr rrmapino
+xfs_db> p
+core.magic = 0x494e
+core.mode = 0100000
+core.version = 3
+core.format = 5 (rtrmapbt)
+...
+u3.rtrmapbt.level = 3
+u3.rtrmapbt.numrecs = 1
+u3.rtrmapbt.keys[1] = [startblock,owner,offset,attrfork,bmbtblock,startblock_hi,
+ owner_hi,offset_hi,attrfork_hi,bmbtblock_hi]
+ 1:[1,132,1,0,0,1705337,133,54431,0,0]
+u3.rtrmapbt.ptrs[1] = 1:671
+xfs_db> addr u3.rtrmapbt.ptrs[1]
+xfs_db> p
+magic = 0x4d415052
+level = 2
+numrecs = 8
+leftsib = null
+rightsib = null
+bno = 5368
+lsn = 0x400000000
+uuid = 98bbde42-67e7-46a5-a73e-d64a76b1b5ce
+owner = 131
+crc = 0x2560d199 (correct)
+keys[1-8] = [startblock,owner,offset,attrfork,bmbtblock,startblock_hi,owner_hi,
+ offset_hi,attrfork_hi,bmbtblock_hi]
+ 1:[1,132,1,0,0,17749,132,17749,0,0]
+ 2:[17751,132,17751,0,0,35499,132,35499,0,0]
+ 3:[35501,132,35501,0,0,53249,132,53249,0,0]
+ 4:[53251,132,53251,0,0,1658473,133,7567,0,0]
+ 5:[1658475,133,7569,0,0,1667473,133,16567,0,0]
+ 6:[1667475,133,16569,0,0,1685223,133,34317,0,0]
+ 7:[1685225,133,34319,0,0,1694223,133,43317,0,0]
+ 8:[1694225,133,43319,0,0,1705337,133,54431,0,0]
+ptrs[1-8] = 1:134 2:238 3:345 4:453 5:795 6:563 7:670 8:780
+----
+
+We arbitrarily pick pointer 7 (twice) to traverse downwards:
+
+----
+xfs_db> addr ptrs[7]
+xfs_db> p
+magic = 0x4d415052
+level = 1
+numrecs = 36
+leftsib = 563
+rightsib = 780
+bno = 5360
+lsn = 0
+uuid = 98bbde42-67e7-46a5-a73e-d64a76b1b5ce
+owner = 131
+crc = 0x6807761d (correct)
+keys[1-36] = [startblock,owner,offset,attrfork,bmbtblock,startblock_hi,owner_hi,
+ offset_hi,attrfork_hi,bmbtblock_hi]
+ 1:[1685225,133,34319,0,0,1685473,133,34567,0,0]
+ 2:[1685475,133,34569,0,0,1685723,133,34817,0,0]
+ 3:[1685725,133,34819,0,0,1685973,133,35067,0,0]
+ ...
+ 34:[1693475,133,42569,0,0,1693723,133,42817,0,0]
+ 35:[1693725,133,42819,0,0,1693973,133,43067,0,0]
+ 36:[1693975,133,43069,0,0,1694223,133,43317,0,0]
+ptrs[1-36] = 1:669 2:672 3:674...34:722 35:723 36:725
+xfs_db> addr ptrs[7]
+xfs_db> p
+magic = 0x4d415052
+level = 0
+numrecs = 125
+leftsib = 678
+rightsib = 681
+bno = 5440
+lsn = 0
+uuid = 98bbde42-67e7-46a5-a73e-d64a76b1b5ce
+owner = 131
+crc = 0xefce34d4 (correct)
+recs[1-125] = [startblock,blockcount,owner,offset,extentflag,attrfork,bmbtblock]
+ 1:[1686725,1,133,35819,0,0,0]
+ 2:[1686727,1,133,35821,0,0,0]
+ 3:[1686729,1,133,35823,0,0,0]
+ ...
+ 123:[1686969,1,133,36063,0,0,0]
+ 124:[1686971,1,133,36065,0,0,0]
+ 125:[1686973,1,133,36067,0,0,0]
+----
+
+Several interesting things pop out here. The first record shows that
+inode 133 has mapped real-time block 1,686,725 at offset 35,819. We
+confirm this by looking at the block map for that inode:
+
+----
+xfs_db> inode 133
+xfs_db> p core.realtime
+core.realtime = 1
+xfs_db> bmap
+data offset 35817 startblock 1686723 (1/638147) count 1 flag 0
+data offset 35819 startblock 1686725 (1/638149) count 1 flag 0
+data offset 35821 startblock 1686727 (1/638151) count 1 flag 0
+----
+
+Notice that inode 133 has the real-time flag set, which means that its
+data blocks are all allocated from the real-time device.
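
Annotation on rtrmapbt.asciidoc above: a sketch of unpacking one 32-byte record and deriving the high key values for it. It assumes the bitfields are packed most-significant-bit first in the order the structure lists them (fork, bmbt, unwritten, 7 unused bits, then the 54-bit offset); that layout and all `sk_` names are assumptions for illustration, not definitions from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions within the fourth 64-bit word of the record. */
#define SK_RTRMAP_ATTR_FORK (UINT64_C(1) << 63)
#define SK_RTRMAP_BMBT      (UINT64_C(1) << 62)
#define SK_RTRMAP_UNWRITTEN (UINT64_C(1) << 61)
#define SK_RTRMAP_OFF_MASK  ((UINT64_C(1) << 54) - 1)

/* Host-endian, unpacked form of one record. */
struct sk_rtrmap {
    uint64_t startblock;
    uint64_t blockcount;
    uint64_t owner;
    uint64_t offset;
    bool attr_fork;
    bool bmbt;
    bool unwritten;
};

/* Decode a big-endian 64-bit on-disk value. */
static uint64_t sk_get_be64(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Unpack a 32-byte on-disk record into its separate fields. */
static struct sk_rtrmap sk_rtrmap_decode(const unsigned char *disk)
{
    uint64_t packed = sk_get_be64(disk + 24);
    struct sk_rtrmap r;

    r.startblock = sk_get_be64(disk);
    r.blockcount = sk_get_be64(disk + 8);
    r.owner = sk_get_be64(disk + 16);
    r.offset = packed & SK_RTRMAP_OFF_MASK;
    r.attr_fork = (packed & SK_RTRMAP_ATTR_FORK) != 0;
    r.bmbt = (packed & SK_RTRMAP_BMBT) != 0;
    r.unwritten = (packed & SK_RTRMAP_UNWRITTEN) != 0;
    return r;
}

/*
 * High key values for a single record: the last block and the last file
 * offset that the record covers. Assumes blockcount is at least 1.
 */
static void sk_rtrmap_high_key(const struct sk_rtrmap *r,
                               uint64_t *hi_startblock, uint64_t *hi_offset)
{
    *hi_startblock = r->startblock + r->blockcount - 1;
    *hi_offset = r->offset + r->blockcount - 1;
}
```

For a one-block record such as record 1 in the leaf above (start block 1686725, offset 35819), the low and high values coincide; for a longer extent the high key extends to the final block and offset in the mapping.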
* Re: [PATCH 7/7] xfsdocs: document the realtime reverse mapping btree
2016-08-25 23:27 ` [PATCH 7/7] xfsdocs: document the realtime reverse mapping btree Darrick J. Wong
@ 2016-09-08 1:38 ` Dave Chinner
2016-09-08 2:03 ` Darrick J. Wong
0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2016-09-08 1:38 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs, xfs
On Thu, Aug 25, 2016 at 04:27:42PM -0700, Darrick J. Wong wrote:
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> .../allocation_groups.asciidoc | 8 +
> design/XFS_Filesystem_Structure/docinfo.xml | 14 +
> .../internal_inodes.asciidoc | 2
> design/XFS_Filesystem_Structure/magic.asciidoc | 1
> .../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 6 -
> design/XFS_Filesystem_Structure/rtrmapbt.asciidoc | 234 ++++++++++++++++++++
> 6 files changed, 263 insertions(+), 2 deletions(-)
> create mode 100644 design/XFS_Filesystem_Structure/rtrmapbt.asciidoc
>
>
> diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> index cafa8b7..7ba636a 100644
> --- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> +++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> @@ -105,6 +105,7 @@ struct xfs_sb
> xfs_ino_t sb_pquotino;
> xfs_lsn_t sb_lsn;
> uuid_t sb_meta_uuid;
> + xfs_ino_t sb_rrmapino;
> };
> ----
> *sb_magicnum*::
> @@ -449,6 +450,13 @@ If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in
> all metadata blocks must match this UUID. If not, the block header UUID field
> must match +sb_uuid+.
>
> +*sb_rrmapino*::
> +If the +XFS_SB_FEAT_COMPAT_RMAPBT+ feature is set and a real-time
XFS_SB_FEAT_RO_COMPAT_RMAPBT?
(yes, I am reading these patches!)
-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 7/7] xfsdocs: document the realtime reverse mapping btree
2016-09-08 1:38 ` Dave Chinner
@ 2016-09-08 2:03 ` Darrick J. Wong
0 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2016-09-08 2:03 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, xfs
On Thu, Sep 08, 2016 at 11:38:38AM +1000, Dave Chinner wrote:
> On Thu, Aug 25, 2016 at 04:27:42PM -0700, Darrick J. Wong wrote:
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > .../allocation_groups.asciidoc | 8 +
> > design/XFS_Filesystem_Structure/docinfo.xml | 14 +
> > .../internal_inodes.asciidoc | 2
> > design/XFS_Filesystem_Structure/magic.asciidoc | 1
> > .../XFS_Filesystem_Structure/ondisk_inode.asciidoc | 6 -
> > design/XFS_Filesystem_Structure/rtrmapbt.asciidoc | 234 ++++++++++++++++++++
> > 6 files changed, 263 insertions(+), 2 deletions(-)
> > create mode 100644 design/XFS_Filesystem_Structure/rtrmapbt.asciidoc
> >
> >
> > diff --git a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> > index cafa8b7..7ba636a 100644
> > --- a/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> > +++ b/design/XFS_Filesystem_Structure/allocation_groups.asciidoc
> > @@ -105,6 +105,7 @@ struct xfs_sb
> > xfs_ino_t sb_pquotino;
> > xfs_lsn_t sb_lsn;
> > uuid_t sb_meta_uuid;
> > + xfs_ino_t sb_rrmapino;
> > };
> > ----
> > *sb_magicnum*::
> > @@ -449,6 +450,13 @@ If the +XFS_SB_FEAT_INCOMPAT_META_UUID+ feature is set, then the UUID field in
> > all metadata blocks must match this UUID. If not, the block header UUID field
> > must match +sb_uuid+.
> >
> > +*sb_rrmapino*::
> > +If the +XFS_SB_FEAT_COMPAT_RMAPBT+ feature is set and a real-time
>
> XFS_SB_FEAT_RO_COMPAT_RMAPBT?
>
> (yes, I am reading these patches!)
Woohoo!!!
Thank you for catching this! :)
--D
>
> -Dave.
>
> --
> Dave Chinner
> david@fromorbit.com
>