From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, linux-doc@vger.kernel.org, corbet@lwn.net
Subject: [PATCH 13/22] docs: add XFS refcount btree structure to DS&A book
Date: Wed, 03 Oct 2018 21:19:46 -0700
Message-ID: <153862678652.26427.14910212060817967947.stgit@magnolia>
In-Reply-To: <153862669110.26427.16504658853992750743.stgit@magnolia>

From: Darrick J. Wong <darrick.wong@oracle.com>

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 .../xfs-data-structures/allocation_groups.rst      |    1 
 .../filesystems/xfs-data-structures/refcountbt.rst |  154 ++++++++++++++++++++
 2 files changed, 155 insertions(+)
 create mode 100644 Documentation/filesystems/xfs-data-structures/refcountbt.rst


diff --git a/Documentation/filesystems/xfs-data-structures/allocation_groups.rst b/Documentation/filesystems/xfs-data-structures/allocation_groups.rst
index 6c0ffd3a170b..76c6ddcd02ac 100644
--- a/Documentation/filesystems/xfs-data-structures/allocation_groups.rst
+++ b/Documentation/filesystems/xfs-data-structures/allocation_groups.rst
@@ -1381,3 +1381,4 @@ None of the XFS per-AG B+trees are involved with real time files. It is not
 possible for real time files to share data blocks.
 
 .. include:: rmapbt.rst
+.. include:: refcountbt.rst
diff --git a/Documentation/filesystems/xfs-data-structures/refcountbt.rst b/Documentation/filesystems/xfs-data-structures/refcountbt.rst
new file mode 100644
index 000000000000..0f2b818959df
--- /dev/null
+++ b/Documentation/filesystems/xfs-data-structures/refcountbt.rst
@@ -0,0 +1,154 @@
+.. SPDX-License-Identifier: CC-BY-SA-4.0
+
+Reference Count B+tree
+~~~~~~~~~~~~~~~~~~~~~~
+
+To support the sharing of file data blocks (reflink), each allocation group
+has its own reference count B+tree, which grows in the allocated space like
+the inode B+trees. This data could be gleaned by performing an interval query
+of the reverse-mapping B+tree, but doing so would come at a huge performance
+penalty. Therefore, this data structure is a cache of computable information.
+
+This B+tree is only present if the XFS\_SB\_FEAT\_RO\_COMPAT\_REFLINK feature
+is enabled. The feature requires a version 5 filesystem.
+
+Each record in the reference count B+tree has the following structure:
+
+.. code:: c
+
+    struct xfs_refcount_rec {
+         __be32                     rc_startblock;
+         __be32                     rc_blockcount;
+         __be32                     rc_refcount;
+    };
+
+**rc\_startblock**
+    AG block number of this record. The high bit is set for all records
+    referring to an extent that is being used to stage a copy on write
+    operation. This reduces recovery time during mount operations. The
+    reference count of these staging extents must be exactly 1.
+
+**rc\_blockcount**
+    The length of this extent.
+
+**rc\_refcount**
+    Number of mappings of this filesystem extent.
+
+Node keys hold only the starting block; node pointers are AG relative block pointers:
+
+.. code:: c
+
+    struct xfs_refcount_key {
+         __be32                     rc_startblock;
+    };
+
+-  As the reference counting is AG relative, all the block numbers are only
+   32 bits.
+
+-  The bb\_magic value is "R3FC" (0x52334643).
+
+-  The xfs\_btree\_sblock\_t header is used for intermediate B+tree nodes as
+   well as the leaves.
+
+xfs\_db refcntbt Example
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+For this example, an XFS filesystem was populated with a root filesystem and a
+deduplication program was run to create shared blocks:
+
+::
+
+    xfs_db> agf 0
+    xfs_db> addr refcntroot
+    xfs_db> p
+    magic = 0x52334643
+    level = 1
+    numrecs = 6
+    leftsib = null
+    rightsib = null
+    bno = 36892
+    lsn = 0x200004ec2
+    uuid = f1f89746-e00b-49c9-96b3-ecef0f2f14ae
+    owner = 0
+    crc = 0x75f35128 (correct)
+    keys[1-6] = [startblock] 1:[14] 2:[65633] 3:[65780] 4:[94571] 5:[117201] 6:[152442]
+    ptrs[1-6] = 1:7 2:25836 3:25835 4:18447 5:18445 6:18449
+    xfs_db> addr ptrs[3]
+    xfs_db> p
+    magic = 0x52334643
+    level = 0
+    numrecs = 80
+    leftsib = 25836
+    rightsib = 18447
+    bno = 51670
+    lsn = 0x200004ec2
+    uuid = f1f89746-e00b-49c9-96b3-ecef0f2f14ae
+    owner = 0
+    crc = 0xc3962813 (correct)
+    recs[1-80] = [startblock,blockcount,refcount,cowflag]
+            1:[65780,1,2,0] 2:[65781,1,3,0] 3:[65785,2,2,0] 4:[66640,1,2,0]
+            5:[69602,4,2,0] 6:[72256,16,2,0] 7:[72871,4,2,0] 8:[72879,20,2,0]
+            9:[73395,4,2,0] 10:[75063,4,2,0] 11:[79093,4,2,0] 12:[86344,16,2,0]
+            ...
+            80:[35235,10,1,1]
+
+Notice record 80. The copy on write flag is set and the reference count is 1,
+which indicates that blocks 35,235 - 35,244 are being used to stage a copy
+on write operation. The "cowflag" field is the high bit of rc\_startblock.
+
+Record 6 in the reference count B+tree for AG 0 indicates that the AG extent
+starting at block 72,256 and running for 16 blocks has a reference count of 2.
+This means that there are two files sharing these blocks:
+
+::
+
+    xfs_db> blockget -n
+    xfs_db> fsblock 72256
+    xfs_db> blockuse
+    block 72256 (0/72256) type rldata inode 25169197
+
+The blockuse type changes to "rldata" to indicate that the block is shared
+data. Unfortunately, blockuse only tells us about one block owner. If we
+happen to have enabled the reverse-mapping B+tree, we can use it to find all
+inodes that own this block:
+
+::
+
+    xfs_db> agf 0
+    xfs_db> addr rmaproot
+    ...
+    xfs_db> addr ptrs[3]
+    ...
+    xfs_db> addr ptrs[7]
+    xfs_db> p
+    magic = 0x524d4233
+    level = 0
+    numrecs = 22
+    leftsib = 65057
+    rightsib = 65058
+    bno = 291478
+    lsn = 0x200004ec2
+    uuid = f1f89746-e00b-49c9-96b3-ecef0f2f14ae
+    owner = 0
+    crc = 0xed7da3f7 (correct)
+    recs[1-22] = [startblock,blockcount,owner,offset,extentflag,attrfork,bmbtblock]
+            1:[68957,8,3201,0,0,0,0] 2:[68965,4,25260953,0,0,0,0]
+            ...
+            18:[72232,58,3227,0,0,0,0] 19:[72256,16,25169197,24,0,0,0]
+            20:[72290,75,3228,0,0,0,0] 21:[72365,46,3229,0,0,0,0]
+
+Records 18 and 19 intersect the block 72,256; they tell us that inodes 3,227
+and 25,169,197 both claim ownership. Let us confirm this:
+
+::
+
+    xfs_db> inode 25169197
+    xfs_db> bmap
+    data offset 0 startblock 12632259 (3/49347) count 24 flag 0
+    data offset 24 startblock 72256 (0/72256) count 16 flag 0
+    data offset 40 startblock 12632299 (3/49387) count 18 flag 0
+    xfs_db> inode 3227
+    xfs_db> bmap
+    data offset 0 startblock 72232 (0/72232) count 58 flag 0
+
+Inodes 25,169,197 and 3,227 both contain mappings to block 0/72,256.
