From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, linux-doc@vger.kernel.org, corbet@lwn.net
Subject: [PATCH 16/22] docs: add preliminary XFS realtime rmapbt structures to the DS&A book
Date: Wed, 03 Oct 2018 21:20:05 -0700
Message-ID: <153862680580.26427.13325972708752045108.stgit@magnolia>
In-Reply-To: <153862669110.26427.16504658853992750743.stgit@magnolia>

From: Darrick J. Wong <darrick.wong@oracle.com>

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 .../xfs-data-structures/internal_inodes.rst        |    2 
 .../filesystems/xfs-data-structures/rtrmapbt.rst   |  230 ++++++++++++++++++++
 2 files changed, 232 insertions(+)
 create mode 100644 Documentation/filesystems/xfs-data-structures/rtrmapbt.rst


diff --git a/Documentation/filesystems/xfs-data-structures/internal_inodes.rst b/Documentation/filesystems/xfs-data-structures/internal_inodes.rst
index 4c3a1bf1f822..0faf58caf8f6 100644
--- a/Documentation/filesystems/xfs-data-structures/internal_inodes.rst
+++ b/Documentation/filesystems/xfs-data-structures/internal_inodes.rst
@@ -206,3 +206,5 @@ rtbitmap location, and positive if there are any.
 This data structure is not particularly space efficient, however it is a very
 fast way to provide the same data as the two free space B+trees for regular
 files since the space is preallocated and metadata maintenance is minimal.
+
+.. include:: rtrmapbt.rst
diff --git a/Documentation/filesystems/xfs-data-structures/rtrmapbt.rst b/Documentation/filesystems/xfs-data-structures/rtrmapbt.rst
new file mode 100644
index 000000000000..1573ec4f09ec
--- /dev/null
+++ b/Documentation/filesystems/xfs-data-structures/rtrmapbt.rst
@@ -0,0 +1,230 @@
+Real-Time Reverse-Mapping B+tree
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+    **Note**
+
+    This data structure is under construction! Details may change.
+
+If the reverse-mapping B+tree and real-time storage device features are
+enabled, the real-time device has its own reverse block-mapping B+tree.
+
+As mentioned in the chapter about `reconstruction <#metadata-reconstruction>`__, this
+data structure is another piece of the puzzle necessary to reconstruct the
+data fork of a real-time file from reverse-mapping records. We can also use
+it to double-check allocations to ensure that we are not accidentally
+cross-linking blocks, which can cause severe damage to the filesystem.
+
+This B+tree is only present if the XFS\_SB\_FEAT\_RO\_COMPAT\_RMAPBT feature
+is enabled and a real-time device is present. The feature requires a version 5
+filesystem.
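A minimal sketch of this presence test, assuming the superblock fields have already been read and byte-swapped to host order. The bit value used for XFS\_SB\_FEAT\_RO\_COMPAT\_RMAPBT (1 << 1) and the version mask are believed to match xfs_format.h but should be checked against the kernel headers. Treating a nonzero sb\_rblocks as "a real-time device is present" is an assumption of this sketch, and every name beginning with sketch is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed constants: believed to match xfs_format.h, verify before use. */
#define SKETCH_SB_VERSION_NUMBITS       0x000fu    /* version lives in the low nibble */
#define SKETCH_SB_VERSION_5             5u
#define SKETCH_SB_FEAT_RO_COMPAT_RMAPBT (1u << 1)  /* XFS_SB_FEAT_RO_COMPAT_RMAPBT */

/*
 * Hypothetical helper: should a filesystem with these host-order superblock
 * fields carry a real-time reverse-mapping B+tree?
 */
static bool sketch_has_rtrmapbt(uint16_t sb_versionnum,
                                uint32_t sb_features_ro_compat,
                                uint64_t sb_rblocks)
{
    bool is_v5 = (sb_versionnum & SKETCH_SB_VERSION_NUMBITS) == SKETCH_SB_VERSION_5;
    bool has_rmapbt = (sb_features_ro_compat & SKETCH_SB_FEAT_RO_COMPAT_RMAPBT) != 0;
    bool has_rtdev = sb_rblocks > 0;   /* assumption: nonzero means an rt device exists */

    return is_v5 && has_rmapbt && has_rtdev;
}
```

For example, a version 5 superblock with the rmapbt bit set and a nonzero sb\_rblocks passes the test; the same superblock with sb\_rblocks of zero does not.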
+
+The real-time reverse-mapping B+tree is rooted in an inode’s data fork; the
+inode number is given by the sb\_rrmapino field in the superblock. The B+tree
+blocks themselves are stored in the regular filesystem. The structures used
+for an inode’s B+tree root are:
+
+.. code:: c
+
+    struct xfs_rtrmap_root {
+         __be16                     bb_level;
+         __be16                     bb_numrecs;
+    };
+
+-  On disk, the B+tree node starts with the xfs\_rtrmap\_root header followed
+   by an array of xfs\_rtrmap\_key values and then an array of
+   xfs\_rtrmap\_ptr\_t values. The number of entries in each array is given by
+   the header’s bb\_numrecs value.
+
+-  The root node in the inode can contain at most 10 key/pointer pairs for
+   a standard 512-byte inode before a new level of nodes is added between the
+   root and the leaves. di\_forkoff should always be zero, because this inode
+   has no extended attributes.
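The figure of 10 can be reproduced with a little arithmetic. The sketch below assumes a 176-byte version 3 inode core, the 4-byte xfs\_rtrmap\_root header shown above, 24-byte keys (rm\_startblock, rm\_owner and the packed rm\_offset word) and 8-byte pointers, and it counts one key per pointer. Those sizes are assumptions of the sketch rather than values stated in this chapter, and the names beginning with sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed on-disk sizes in bytes (assumptions of this sketch). */
#define SKETCH_V3_DINODE_CORE   176u  /* version 3 inode core */
#define SKETCH_RTRMAP_ROOT_HDR    4u  /* bb_level + bb_numrecs */
#define SKETCH_RTRMAP_KEY        24u  /* rm_startblock, rm_owner, packed rm_offset */
#define SKETCH_RTRMAP_PTR         8u  /* 64-bit block pointer */

/* How many key/pointer pairs fit in the data fork of an inode of this size? */
static size_t sketch_rtrmap_root_maxrecs(size_t inode_size)
{
    assert(inode_size > SKETCH_V3_DINODE_CORE + SKETCH_RTRMAP_ROOT_HDR);

    /* di_forkoff is zero, so the whole literal area belongs to the data fork. */
    size_t literal = inode_size - SKETCH_V3_DINODE_CORE;

    /* Integer division rounds down to whole key/pointer pairs. */
    return (literal - SKETCH_RTRMAP_ROOT_HDR) /
           (SKETCH_RTRMAP_KEY + SKETCH_RTRMAP_PTR);
}
```

For a 512-byte inode this computes (512 - 176 - 4) / 32, which rounds down to 10; a 1024-byte inode would hold 26 pairs under the same assumptions.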
+
+Each record in the real-time reverse-mapping B+tree has the following
+structure:
+
+.. code:: c
+
+    struct xfs_rtrmap_rec {
+         __be64                     rm_startblock;
+         __be64                     rm_blockcount;
+         __be64                     rm_owner;
+         __be64                     rm_fork:1;
+         __be64                     rm_bmbt:1;
+         __be64                     rm_unwritten:1;
+         __be64                     rm_unused:7;
+         __be64                     rm_offset:54;
+    };
+
+**rm\_startblock**
+    Real-time device block number of this record.
+
+**rm\_blockcount**
+    The length of this extent, in real-time blocks.
+
+**rm\_owner**
+    A 64-bit number describing the owner of this extent. This must be an inode
+    number, because the real-time device is for file data only.
+
+**rm\_fork**
+    If rm\_owner describes an inode, this can be 1 if this record is for an
+    attribute fork. This value will always be zero for real-time extents.
+
+**rm\_bmbt**
+    If rm\_owner describes an inode, this can be 1 to signify that this record
+    is for a block map B+tree block. In this case, rm\_offset has no meaning.
+    This value will always be zero for real-time extents.
+
+**rm\_unwritten**
+    A flag indicating that the extent is unwritten. This corresponds to the
+    XFS\_EXT\_UNWRITTEN flag in the `extent record <#data-extents>`__ format.
+
+**rm\_offset**
+    The 54-bit logical file block offset, if rm\_owner describes an inode.
+
+    **Note**
+
+    The single-bit flag values rm\_unwritten, rm\_fork, and rm\_bmbt are
+    packed into the larger fields in the C structure definition.
+
+The key has the following structure:
+
+.. code:: c
+
+    struct xfs_rtrmap_key {
+         __be64                     rm_startblock;
+         __be64                     rm_owner;
+         __be64                     rm_fork:1;
+         __be64                     rm_bmbt:1;
+         __be64                     rm_reserved:1;
+         __be64                     rm_unused:7;
+         __be64                     rm_offset:54;
+    };
+
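+-  All rm\_startblock values in records and keys are 64-bit real-time device
+   block numbers. The B+tree pointers, by contrast, are block numbers in the
+   regular filesystem, where the B+tree blocks are stored.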
+
+-  The bb\_magic value is "MAPR" (0x4d415052).
+
+-  The xfs\_btree\_lblock\_t header is used for intermediate B+tree nodes as
+   well as the leaves.
+
+-  Each pointer is associated with two keys. The first of these is the "low
+   key", which is the key of the smallest record accessible through the
+   pointer. This low key has the same meaning as the key in all other btrees.
+   The second key is the "high key", which is the largest key that can be
+   used to access any record underneath the pointer. Recall that each record
+   in the real-time reverse-mapping B+tree describes an interval of physical
+   blocks mapped to an interval of logical file block offsets; therefore, a
+   range of keys, not just a single key, can be used to find a record.
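The low-key/high-key scheme can be sketched in a few lines. The model below is simplified: keys are compared lexicographically on (startblock, owner, offset) with the flag bits left out, and a record's high key is formed by advancing both startblock and offset by blockcount - 1. That rule agrees with the keys printed in the xfs\_db example that follows but is not quoted from the kernel source. A lookup descends into a pointer only if the pointer's [low, high] key range overlaps the query range. All names beginning with sketch are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified host-order key: flag bits are omitted. */
struct sketch_key {
    uint64_t startblock;
    uint64_t owner;
    uint64_t offset;
};

/* Lexicographic comparison on (startblock, owner, offset). */
static int sketch_key_cmp(const struct sketch_key *a, const struct sketch_key *b)
{
    if (a->startblock != b->startblock)
        return a->startblock < b->startblock ? -1 : 1;
    if (a->owner != b->owner)
        return a->owner < b->owner ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* High key of a record: the last physical block and last file offset it covers. */
static struct sketch_key sketch_high_key(uint64_t startblock, uint64_t blockcount,
                                         uint64_t owner, uint64_t offset)
{
    assert(blockcount > 0);
    struct sketch_key k = {
        startblock + blockcount - 1,
        owner,
        offset + blockcount - 1,
    };
    return k;
}

/*
 * Could the subtree under a pointer with keys [lo, hi] hold a record that
 * overlaps the query interval [qlo, qhi]?  Descend only if this is true.
 */
static bool sketch_may_overlap(const struct sketch_key *lo, const struct sketch_key *hi,
                               const struct sketch_key *qlo, const struct sketch_key *qhi)
{
    return sketch_key_cmp(lo, qhi) <= 0 && sketch_key_cmp(hi, qlo) >= 0;
}
```

With the level 2 keys from the example, a query for real-time block 1,686,725 with any owner and offset overlaps only pointer 7, which is the pointer the example follows.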
+
+xfs\_db rtrmapbt Example
+""""""""""""""""""""""""
+
+This example shows a real-time reverse-mapping B+tree from a freshly populated
+root filesystem:
+
+::
+
+    xfs_db> sb 0
+    xfs_db> addr rrmapino
+    xfs_db> p
+    core.magic = 0x494e
+    core.mode = 0100000
+    core.version = 3
+    core.format = 5 (rtrmapbt)
+    ...
+    u3.rtrmapbt.level = 3
+    u3.rtrmapbt.numrecs = 1
+    u3.rtrmapbt.keys[1] = [startblock,owner,offset,attrfork,bmbtblock,startblock_hi,
+                   owner_hi,offset_hi,attrfork_hi,bmbtblock_hi]
+        1:[1,132,1,0,0,1705337,133,54431,0,0]
+    u3.rtrmapbt.ptrs[1] = 1:671
+    xfs_db> addr u3.rtrmapbt.ptrs[1]
+    xfs_db> p
+    magic = 0x4d415052
+    level = 2
+    numrecs = 8
+    leftsib = null
+    rightsib = null
+    bno = 5368
+    lsn = 0x400000000
+    uuid = 98bbde42-67e7-46a5-a73e-d64a76b1b5ce
+    owner = 131
+    crc = 0x2560d199 (correct)
+    keys[1-8] = [startblock,owner,offset,attrfork,bmbtblock,startblock_hi,owner_hi,
+             offset_hi,attrfork_hi,bmbtblock_hi]
+        1:[1,132,1,0,0,17749,132,17749,0,0]
+        2:[17751,132,17751,0,0,35499,132,35499,0,0]
+        3:[35501,132,35501,0,0,53249,132,53249,0,0]
+        4:[53251,132,53251,0,0,1658473,133,7567,0,0]
+        5:[1658475,133,7569,0,0,1667473,133,16567,0,0]
+        6:[1667475,133,16569,0,0,1685223,133,34317,0,0]
+        7:[1685225,133,34319,0,0,1694223,133,43317,0,0]
+        8:[1694225,133,43319,0,0,1705337,133,54431,0,0]
+    ptrs[1-8] = 1:134 2:238 3:345 4:453 5:795 6:563 7:670 8:780
+
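Two details of the node printed above can be checked by hand. The magic 0x4d415052 is the ASCII bytes "MAPR" read as a big-endian 32-bit number, and the bno field (a 512-byte sector address) equals the pointer value shifted left by blocklog - 9. The sketch assumes 4096-byte filesystem blocks (blocklog 12) and that these blocks lie in allocation group 0, so no AG-number/AG-block split is needed; neither assumption is stated in the example, although both are consistent with its numbers. The names beginning with sketch are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* "MAPR" assembled as a big-endian 32-bit value. */
#define SKETCH_RTRMAP_MAGIC \
    (((uint32_t)'M' << 24) | ((uint32_t)'A' << 16) | \
     ((uint32_t)'P' << 8) | (uint32_t)'R')

/*
 * Convert a B+tree pointer (a filesystem block number) to the 512-byte
 * sector address that xfs_db prints as bno.  Assumes the block is in AG 0.
 */
static uint64_t sketch_fsb_to_daddr_ag0(uint64_t fsbno, unsigned int blocklog)
{
    assert(blocklog >= 9);               /* blocks are at least 512 bytes */
    return fsbno << (blocklog - 9);      /* sectors per block = 2^(blocklog - 9) */
}
```

Under these assumptions, pointer 671 maps to sector 5368, the bno shown for this node, and ptrs[7] (670) maps to 5360, the bno of the level 1 node in the next listing.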
+We arbitrarily pick pointer 7 (twice) to traverse downwards:
+
+::
+
+    xfs_db> addr ptrs[7]
+    xfs_db> p
+    magic = 0x4d415052
+    level = 1
+    numrecs = 36
+    leftsib = 563
+    rightsib = 780
+    bno = 5360
+    lsn = 0
+    uuid = 98bbde42-67e7-46a5-a73e-d64a76b1b5ce
+    owner = 131
+    crc = 0x6807761d (correct)
+    keys[1-36] = [startblock,owner,offset,attrfork,bmbtblock,startblock_hi,owner_hi,
+              offset_hi,attrfork_hi,bmbtblock_hi]
+        1:[1685225,133,34319,0,0,1685473,133,34567,0,0]
+        2:[1685475,133,34569,0,0,1685723,133,34817,0,0]
+        3:[1685725,133,34819,0,0,1685973,133,35067,0,0]
+        ...
+        34:[1693475,133,42569,0,0,1693723,133,42817,0,0]
+        35:[1693725,133,42819,0,0,1693973,133,43067,0,0]
+        36:[1693975,133,43069,0,0,1694223,133,43317,0,0]
+    ptrs[1-36] = 1:669 2:672 3:674...34:722 35:723 36:725
+    xfs_db> addr ptrs[7]
+    xfs_db> p
+    magic = 0x4d415052
+    level = 0
+    numrecs = 125
+    leftsib = 678
+    rightsib = 681
+    bno = 5440
+    lsn = 0
+    uuid = 98bbde42-67e7-46a5-a73e-d64a76b1b5ce
+    owner = 131
+    crc = 0xefce34d4 (correct)
+    recs[1-125] = [startblock,blockcount,owner,offset,extentflag,attrfork,bmbtblock]
+        1:[1686725,1,133,35819,0,0,0]
+        2:[1686727,1,133,35821,0,0,0]
+        3:[1686729,1,133,35823,0,0,0]
+        ...
+        123:[1686969,1,133,36063,0,0,0]
+        124:[1686971,1,133,36065,0,0,0]
+        125:[1686973,1,133,36067,0,0,0]
+
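+Several things stand out here. Every record in this leaf maps a single block,
+and consecutive records skip every other real-time block. The first record
+shows that inode 133 has mapped real-time block 1,686,725 at file offset
+35,819. We confirm this by looking at the block map for that inode: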
+
+::
+
+    xfs_db> inode 133
+    xfs_db> p core.realtime
+    core.realtime = 1
+    xfs_db> bmap
+    data offset 35817 startblock 1686723 (1/638147) count 1 flag 0
+    data offset 35819 startblock 1686725 (1/638149) count 1 flag 0
+    data offset 35821 startblock 1686727 (1/638151) count 1 flag 0
+
+Notice that inode 133 has the real-time flag set, which means that its data
+blocks are all allocated from the real-time device.
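As a rough illustration of the reconstruction use case mentioned at the start of this section, the sketch below rebuilds a file's data fork mappings from host-order reverse-mapping records: it keeps only the records owned by one inode, sorts them by file offset, and merges neighbours that are contiguous both physically and logically. It ignores the unwritten flag (a real rebuild would have to carry it and avoid merging extents whose states differ) and is not taken from XFS's repair code; all names beginning with sketch are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Host-order copy of one real-time rmap record (flags omitted). */
struct sketch_rmap {
    uint64_t startblock;
    uint64_t blockcount;
    uint64_t owner;
    uint64_t offset;
};

/* qsort comparator: order records by logical file offset. */
static int sketch_cmp_offset(const void *a, const void *b)
{
    const struct sketch_rmap *x = a, *y = b;
    if (x->offset != y->offset)
        return x->offset < y->offset ? -1 : 1;
    return 0;
}

/*
 * Rebuild data fork extents for inode `ino`.  Sorts `recs` in place, writes
 * at most `max_out` merged extents to `out`, and returns how many it wrote.
 */
static size_t sketch_rebuild_fork(struct sketch_rmap *recs, size_t nrecs,
                                  uint64_t ino, struct sketch_rmap *out,
                                  size_t max_out)
{
    size_t n = 0;

    assert(recs != NULL || nrecs == 0);
    qsort(recs, nrecs, sizeof(*recs), sketch_cmp_offset);
    for (size_t i = 0; i < nrecs; i++) {
        if (recs[i].owner != ino)
            continue;                    /* belongs to some other file */
        struct sketch_rmap *prev = n ? &out[n - 1] : NULL;
        /* Merge only if both the physical and logical ranges are contiguous. */
        if (prev && prev->offset + prev->blockcount == recs[i].offset &&
            prev->startblock + prev->blockcount == recs[i].startblock) {
            prev->blockcount += recs[i].blockcount;
            continue;
        }
        if (n == max_out)
            break;                       /* output array is full */
        out[n++] = recs[i];
    }
    return n;
}

/* Map a file offset to a real-time block; returns false if it is unmapped. */
static bool sketch_lookup(const struct sketch_rmap *ext, size_t n,
                          uint64_t off, uint64_t *rtbno)
{
    for (size_t i = 0; i < n; i++) {
        if (off >= ext[i].offset && off < ext[i].offset + ext[i].blockcount) {
            *rtbno = ext[i].startblock + (off - ext[i].offset);
            return true;
        }
    }
    return false;
}
```

Fed the first three leaf records from the example, the sketch produces three single-block extents, and looking up offset 35,821 yields real-time block 1,686,727, matching the bmap output above.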
