From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from aserp2120.oracle.com ([141.146.126.78]:51492 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726813AbeJDLME (ORCPT ); Thu, 4 Oct 2018 07:12:04 -0400 Subject: [PATCH 20/22] docs: add XFS extended attributes structures to the DS&A book From: "Darrick J. Wong" Date: Wed, 03 Oct 2018 21:20:31 -0700 Message-ID: <153862683147.26427.3827362737993004696.stgit@magnolia> In-Reply-To: <153862669110.26427.16504658853992750743.stgit@magnolia> References: <153862669110.26427.16504658853992750743.stgit@magnolia> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit Sender: linux-xfs-owner@vger.kernel.org List-ID: List-Id: xfs To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-doc@vger.kernel.org, corbet@lwn.net From: Darrick J. Wong Signed-off-by: Darrick J. Wong --- .../filesystems/xfs-data-structures/dynamic.rst | 1 .../xfs-data-structures/extended_attributes.rst | 933 ++++++++++++++++++++ 2 files changed, 934 insertions(+) create mode 100644 Documentation/filesystems/xfs-data-structures/extended_attributes.rst diff --git a/Documentation/filesystems/xfs-data-structures/dynamic.rst b/Documentation/filesystems/xfs-data-structures/dynamic.rst index 2c12fca905fd..16755381d0f8 100644 --- a/Documentation/filesystems/xfs-data-structures/dynamic.rst +++ b/Documentation/filesystems/xfs-data-structures/dynamic.rst @@ -6,3 +6,4 @@ Dynamic Allocated Structures .. include:: ondisk_inode.rst .. include:: data_extents.rst .. include:: directories.rst +.. include:: extended_attributes.rst diff --git a/Documentation/filesystems/xfs-data-structures/extended_attributes.rst b/Documentation/filesystems/xfs-data-structures/extended_attributes.rst new file mode 100644 index 000000000000..db6de15227cd --- /dev/null +++ b/Documentation/filesystems/xfs-data-structures/extended_attributes.rst @@ -0,0 +1,933 @@ +.. 
SPDX-License-Identifier: CC-BY-SA-4.0 + +Extended Attributes +------------------- + +Extended attributes enable users and administrators to attach (name: value) +pairs to inodes within the XFS filesystem. They can be used to store +meta-information about the file. + +Attribute names can be up to 256 bytes in length, terminated by the first 0 +byte. The intent is that they be printable ASCII (or other character set) +names for the attribute. The values can contain up to 64KB of arbitrary binary +data. Some XFS internal attributes (eg. parent pointers) use non-printable +names for the attribute. + +Access Control Lists (ACLs) and Data Migration Facility (DMF) use extended +attributes to store their associated metadata with an inode. + +XFS uses two disjoint attribute name spaces associated with every inode. These +are the root and user address spaces. The root address space is accessible +only to the superuser, and then only by specifying a flag argument to the +function call. Other users will not see or be able to modify attributes in the +root address space. The user address space is protected by the normal file +permissions mechanism, so the owner of the file can decide who is able to see +and/or modify the value of attributes on any particular file. + +To view extended attributes from the command line, use the getfattr command. +To set or delete extended attributes, use the setfattr command. ACL control +should use the getfacl and setfacl commands. + +XFS attributes support three namespaces: "user", "trusted" (or "root" using +IRIX terminology), and "secure". + +See the section about `extended attributes <#extended-attribute-versions>`__ +in the inode for instructions on how to calculate the location of the +attributes. + +The following four sections describe each of the on-disk formats. 
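As a rough illustration of the three namespaces listed above, the sketch below splits a Linux-style attribute name (the form passed to setfattr) into a namespace flag and the bare name. The helper name `split_xattr_name` is hypothetical. The flag bit values and the use of the "security." prefix for the "secure" namespace are assumptions taken from the Linux kernel headers rather than from this text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed bit values, mirroring the Linux kernel's XFS on-disk headers. */
#define XFS_ATTR_ROOT   (1u << 1)   /* "trusted" (IRIX "root") namespace */
#define XFS_ATTR_SECURE (1u << 2)   /* "secure" namespace */

/*
 * Hypothetical helper: split "prefix.name" into namespace flags and the
 * bare attribute name. Returns 0 on success, or -1 if the prefix is not
 * recognised or the name part is empty.
 */
int split_xattr_name(const char *full, unsigned int *flags, const char **name)
{
    static const struct {
        const char *prefix;
        unsigned int flags;
    } namespaces[] = {
        { "user.",     0 },
        { "trusted.",  XFS_ATTR_ROOT },
        { "security.", XFS_ATTR_SECURE },   /* assumed Linux prefix */
    };
    size_t i;

    assert(full != NULL && flags != NULL && name != NULL);

    for (i = 0; i < sizeof(namespaces) / sizeof(namespaces[0]); i++) {
        size_t len = strlen(namespaces[i].prefix);

        if (strncmp(full, namespaces[i].prefix, len) == 0 &&
            full[len] != '\0') {
            *flags = namespaces[i].flags;
            *name = full + len;   /* points into the caller's string */
            return 0;
        }
    }
    return -1;
}
```

Under these assumptions, "trusted.trust" maps to XFS_ATTR_ROOT with the name "trust", and "user.empty" maps to flag value 0 with the name "empty".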
+ +Short Form Attributes +~~~~~~~~~~~~~~~~~~~~~ + +When all the extended attributes can fit within the inode’s attribute fork, +the inode’s di\_aformat is set to "local" and the attributes are stored in +the inode’s literal area starting at offset di\_forkoff × 8. + +Shortform attributes use the following structures: + +.. code:: c + + typedef struct xfs_attr_shortform { + struct xfs_attr_sf_hdr { + __be16 totsize; + __u8 count; + } hdr; + struct xfs_attr_sf_entry { + __uint8_t namelen; + __uint8_t valuelen; + __uint8_t flags; + __uint8_t nameval[1]; + } list[1]; + } xfs_attr_shortform_t; + typedef struct xfs_attr_sf_hdr xfs_attr_sf_hdr_t; + typedef struct xfs_attr_sf_entry xfs_attr_sf_entry_t; + +**totsize** + Total size of the attribute structure in bytes. + +**count** + The number of entries that can be found in this structure. + +**namelen** and **valuelen** + These values specify the size of the two byte arrays containing the name + and value pairs. valuelen is zero for extended attributes with no value. + +**nameval[]** + A single array whose size is the sum of namelen and valuelen. The names + and values are not null terminated on-disk. The value immediately follows + the name in the array. + +.. _attribute-flags: + +**flags** + A combination of the following: + +.. list-table:: + :widths: 28 52 + :header-rows: 1 + + * - Flag + - Description + + * - 0 + - The attribute's namespace is "user". + + * - XFS_ATTR_ROOT + - The attribute's namespace is "trusted". + + * - XFS_ATTR_SECURE + - The attribute's namespace is "secure". + + * - XFS_ATTR_INCOMPLETE + - This attribute is being modified. + + * - XFS_ATTR_LOCAL + - The attribute value is contained within this block. + +Table: Attribute Namespaces + +.. 
figure:: images/64.png + :alt: Short form attribute layout + + Short form attribute layout + +xfs\_db Short Form Attribute Example +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A file is created and two attributes are set: + +:: + + # setfattr -n user.empty few_attr + # setfattr -n trusted.trust -v val1 few_attr + +Using xfs\_db, we dump the inode: + +:: + + xfs_db> inode + xfs_db> p + core.magic = 0x494e + core.mode = 0100644 + ... + core.naextents = 0 + core.forkoff = 15 + core.aformat = 1 (local) + ... + a.sfattr.hdr.totsize = 24 + a.sfattr.hdr.count = 2 + a.sfattr.list[0].namelen = 5 + a.sfattr.list[0].valuelen = 0 + a.sfattr.list[0].root = 0 + a.sfattr.list[0].secure = 0 + a.sfattr.list[0].name = "empty" + a.sfattr.list[1].namelen = 5 + a.sfattr.list[1].valuelen = 4 + a.sfattr.list[1].root = 1 + a.sfattr.list[1].secure = 0 + a.sfattr.list[1].name = "trust" + a.sfattr.list[1].value = "val1" + +We can determine the actual inode offset to be 220 (15 x 8 + 100) or 0xdc. +Examining the raw dump, the second attribute is highlighted: + +:: + + xfs_db> type text + xfs_db> p + 00: 49 4e 81 a4 01 02 00 01 00 00 00 00 00 00 00 00 IN.............. + 10: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 02 ................ + 20: 44 be 19 be 38 d1 26 98 44 be 1a be 38 d1 26 98 D...8...D...8... + 30: 44 be 1a e1 3a 9a ea 18 00 00 00 00 00 00 00 04 D............... + 40: 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 01 ................ + 50: 00 00 0f 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ + 60: ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 12 ................ + 70: 53 a0 00 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ + 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ + 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ + a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ + b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
+ c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ + d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 18 02 00 ................ + ^^ hdr.totsize = 0x18 + e0: 05 00 00 65 6d 70 74 79 05 04 02 74 72 75 73 74 ...empty...trust + f0: 76 61 6c 31 00 00 00 00 00 00 00 00 00 00 00 00 val1............ + +Adding another attribute with attr1, the format is converted to extents and +di\_forkoff remains unchanged (and all those zeros in the dump above remain +unused): + +:: + + xfs_db> inode + xfs_db> p + ... + core.naextents = 1 + core.forkoff = 15 + core.aformat = 2 (extents) + ... + a.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,37534,1,0] + +Performing the same steps with attr2, adding one attribute at a time, you can +see di\_forkoff change as attributes are added: + +:: + + xfs_db> inode + xfs_db> p + ... + core.naextents = 0 + core.forkoff = 15 + core.aformat = 1 (local) + ... + a.sfattr.hdr.totsize = 17 + a.sfattr.hdr.count = 1 + a.sfattr.list[0].namelen = 10 + a.sfattr.list[0].valuelen = 0 + a.sfattr.list[0].root = 0 + a.sfattr.list[0].secure = 0 + a.sfattr.list[0].name = "empty_attr" + +Attribute added: + +:: + + xfs_db> p + ... + core.naextents = 0 + core.forkoff = 15 + core.aformat = 1 (local) + ... + a.sfattr.hdr.totsize = 31 + a.sfattr.hdr.count = 2 + a.sfattr.list[0].namelen = 10 + a.sfattr.list[0].valuelen = 0 + a.sfattr.list[0].root = 0 + a.sfattr.list[0].secure = 0 + a.sfattr.list[0].name = "empty_attr" + a.sfattr.list[1].namelen = 7 + a.sfattr.list[1].valuelen = 4 + a.sfattr.list[1].root = 1 + a.sfattr.list[1].secure = 0 + a.sfattr.list[1].name = "trust_a" + a.sfattr.list[1].value = "val1" + +Another attribute is added: + +:: + + xfs_db> p + ... + core.naextents = 0 + core.forkoff = 13 + core.aformat = 1 (local) + ... 
+ a.sfattr.hdr.totsize = 52 + a.sfattr.hdr.count = 3 + a.sfattr.list[0].namelen = 10 + a.sfattr.list[0].valuelen = 0 + a.sfattr.list[0].root = 0 + a.sfattr.list[0].secure = 0 + a.sfattr.list[0].name = "empty_attr" + a.sfattr.list[1].namelen = 7 + a.sfattr.list[1].valuelen = 4 + a.sfattr.list[1].root = 1 + a.sfattr.list[1].secure = 0 + a.sfattr.list[1].name = "trust_a" + a.sfattr.list[1].value = "val1" + a.sfattr.list[2].namelen = 6 + a.sfattr.list[2].valuelen = 12 + a.sfattr.list[2].root = 0 + a.sfattr.list[2].secure = 0 + a.sfattr.list[2].name = "second" + a.sfattr.list[2].value = "second_value" + +One more is added: + +:: + + xfs_db> p + core.naextents = 0 + core.forkoff = 10 + core.aformat = 1 (local) + ... + a.sfattr.hdr.totsize = 69 + a.sfattr.hdr.count = 4 + a.sfattr.list[0].namelen = 10 + a.sfattr.list[0].valuelen = 0 + a.sfattr.list[0].root = 0 + a.sfattr.list[0].secure = 0 + a.sfattr.list[0].name = "empty_attr" + a.sfattr.list[1].namelen = 7 + a.sfattr.list[1].valuelen = 4 + a.sfattr.list[1].root = 1 + a.sfattr.list[1].secure = 0 + a.sfattr.list[1].name = "trust_a" + a.sfattr.list[1].value = "val1" + a.sfattr.list[2].namelen = 6 + a.sfattr.list[2].valuelen = 12 + a.sfattr.list[2].root = 0 + a.sfattr.list[2].secure = 0 + a.sfattr.list[2].name = "second" + a.sfattr.list[2].value = "second_value" + a.sfattr.list[3].namelen = 6 + a.sfattr.list[3].valuelen = 8 + a.sfattr.list[3].root = 0 + a.sfattr.list[3].secure = 1 + a.sfattr.list[3].name = "policy" + a.sfattr.list[3].value = "contents" + +A raw dump is shown for comparison with the earlier attr1 dump; the header +is highlighted: + +:: + + xfs_db> type text + xfs_db> p + 00: 49 4e 81 a4 01 02 00 01 00 00 00 00 00 00 00 00 IN.............. + 10: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 05 ................ + 20: 44 be 24 cd 0f b0 96 18 44 be 24 cd 0f b0 96 18 D.......D....... + 30: 44 be 2d f5 01 62 7a 18 00 00 00 00 00 00 00 04 D....bz......... 
+ 40: 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 01 ................ + 50: 00 00 0a 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ + 60: ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 01 ................ + 70: 41 c0 00 01 00 00 00 00 00 00 00 00 00 00 00 00 A............... + 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ + 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ + a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ + b0: 00 00 00 00 00 45 04 00 0a 00 00 65 6d 70 74 79 .....E.....empty + c0: 5f 61 74 74 72 07 04 02 74 72 75 73 74 5f 61 76 .attr...trust.av + d0: 61 6c 31 06 0c 00 73 65 63 6f 6e 64 73 65 63 6f al1...secondseco + e0: 6e 64 5f 76 61 6c 75 65 06 08 04 70 6f 6c 69 63 nd.value...polic + f0: 79 63 6f 6e 74 65 6e 74 73 64 5f 76 61 6c 75 65 ycontentsd.value + +It can be clearly seen that attr2 allows many more attributes to be stored in +an inode before they are moved to another filesystem block. + +Leaf Attributes +~~~~~~~~~~~~~~~ + +When an inode’s attribute fork space is used up with shortform attributes and +more are added, the attribute format is migrated to "extents". + +Extent based attributes use hash/index pairs to speed up an attribute lookup. +The first part of the "leaf" contains an array of fixed size hash/index +pairs with the flags stored as well. The remaining part of the leaf block +contains the array of name/value pairs, where each element varies in length. + +Each leaf is based on the xfs\_da\_blkinfo\_t block header declared in the +section about `directories <#directory-attribute-block-header>`__. On a v5 +filesystem, the block header is xfs\_da3\_blkinfo\_t. The structure +encapsulating all other structures in the attribute block is +xfs\_attr\_leafblock\_t. + +The structures involved are: + +.. code:: c + + typedef struct xfs_attr_leaf_map { + __be16 base; + __be16 size; + } xfs_attr_leaf_map_t; + +**base** + Block offset of the free area, in bytes. 
+ +**size** + Size of the free area, in bytes. + +.. code:: c + + typedef struct xfs_attr_leaf_hdr { + xfs_da_blkinfo_t info; + __be16 count; + __be16 usedbytes; + __be16 firstused; + __u8 holes; + __u8 pad1; + xfs_attr_leaf_map_t freemap[3]; + } xfs_attr_leaf_hdr_t; + +**info** + Directory/attribute block header. + +**count** + Number of entries. + +**usedbytes** + Number of bytes used in the leaf block. + +**firstused** + Block offset of the first entry in use, in bytes. + +**holes** + Set to 1 if block compaction is necessary. + +**pad1** + Padding to maintain alignment to 64-bit boundaries. + +.. code:: c + + typedef struct xfs_attr_leaf_entry { + __be32 hashval; + __be16 nameidx; + __u8 flags; + __u8 pad2; + } xfs_attr_leaf_entry_t; + +**hashval** + Hash value of the attribute name. + +**nameidx** + Block offset of the name entry, in bytes. + +**flags** + Attribute flags, as specified `above <#attribute-flags>`__. + +**pad2** + Pads the structure to 64-bit boundaries. + +.. code:: c + + typedef struct xfs_attr_leaf_name_local { + __be16 valuelen; + __u8 namelen; + __u8 nameval[1]; + } xfs_attr_leaf_name_local_t; + +**valuelen** + Length of the value, in bytes. + +**namelen** + Length of the name, in bytes. + +**nameval** + The name and the value. String values are not zero-terminated. + +.. code:: c + + typedef struct xfs_attr_leaf_name_remote { + __be32 valueblk; + __be32 valuelen; + __u8 namelen; + __u8 name[1]; + } xfs_attr_leaf_name_remote_t; + +**valueblk** + The logical block in the attribute map where the value is located. + +**valuelen** + Length of the value, in bytes. + +**namelen** + Length of the name, in bytes. + +**name** + The name. String values are not zero-terminated. + +.. code:: c + + typedef struct xfs_attr_leafblock { + xfs_attr_leaf_hdr_t hdr; + xfs_attr_leaf_entry_t entries[1]; + xfs_attr_leaf_name_local_t namelist; + xfs_attr_leaf_name_remote_t valuelist; + } xfs_attr_leafblock_t; + +**hdr** + Attribute block header. 
+ +**entries** + A variable-length array of attribute entries. + +**namelist** + A variable-length array of descriptors of local attributes. The location + and size of these entries is determined dynamically. + +**valuelist** + A variable-length array of descriptors of remote attributes. The location + and size of these entries is determined dynamically. + +On a v5 filesystem, the header becomes xfs\_da3\_blkinfo\_t to accommodate the +extra metadata integrity fields: + +.. code:: c + + typedef struct xfs_attr3_leaf_hdr { + xfs_da3_blkinfo_t info; + __be16 count; + __be16 usedbytes; + __be16 firstused; + __u8 holes; + __u8 pad1; + xfs_attr_leaf_map_t freemap[3]; + __be32 pad2; + } xfs_attr3_leaf_hdr_t; + + + typedef struct xfs_attr3_leafblock { + xfs_attr3_leaf_hdr_t hdr; + xfs_attr_leaf_entry_t entries[1]; + xfs_attr_leaf_name_local_t namelist; + xfs_attr_leaf_name_remote_t valuelist; + } xfs_attr3_leafblock_t; + +Each leaf header uses the magic number XFS\_ATTR\_LEAF\_MAGIC (0xfbee). On a +v5 filesystem, the magic number is XFS\_ATTR3\_LEAF\_MAGIC (0x3bee). + +The hash/index elements in the entries[] array are packed from the top of the +block. Name/values grow from the bottom but are not packed. The freemap +contains run-length-encoded entries for the free bytes after the entries[] +array, but only the three largest runs are stored (smaller runs are dropped). +When the freemap doesn’t show enough space for an allocation, the name/value +area is compacted and allocation is tried again. If there still isn’t enough +space, then the block is split. The name/value structures (both local and +remote versions) must be 32-bit aligned. + +For attributes with small values (ie. the value can be stored within the +leaf), the XFS\_ATTR\_LOCAL flag is set for the attribute. The entry details +are stored using the xfs\_attr\_leaf\_name\_local\_t structure. 
For large +attribute values that cannot be stored within the leaf, separate filesystem +blocks are allocated to store the value. They use the +xfs\_attr\_leaf\_name\_remote\_t structure. See `Remote +Values <#remote-attribute-values>`__ for more information. + +.. ifconfig:: builder != 'latex' + + .. figure:: images/69.png + :alt: Leaf attribute layout + + Leaf attribute layout + +.. ifconfig:: builder == 'latex' + + .. figure:: images/69.png + :scale: 45% + :alt: Leaf attribute layout + + Leaf attribute layout + +Both local and remote entries can be interleaved as they are only addressed by +the hash/index entries. The flag is stored with the hash/index pairs so the +appropriate structure can be used. + +Since duplicate hash keys are possible, for each hash that matches during a +lookup, the actual name string must be compared. + +An "incomplete" bit is also used for attribute flags. It shows that an +attribute is in the middle of being created and should not be shown to the +user if we crash during the time that the bit is set. The bit is cleared when +the attribute has finished being set up. This is done because some large +attributes cannot be created inside a single transaction. + +xfs\_db Leaf Attribute Example +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A single 30KB extended attribute is added to an inode: + +:: + + xfs_db> inode + xfs_db> p + ... + core.nblocks = 9 + core.nextents = 0 + core.naextents = 1 + core.forkoff = 15 + core.aformat = 2 (extents) + ... 
+ a.bmx[0] = [startoff,startblock,blockcount,extentflag] + 0:[0,37535,9,0] + xfs_db> ablock 0 + xfs_db> p + hdr.info.forw = 0 + hdr.info.back = 0 + hdr.info.magic = 0xfbee + hdr.count = 1 + hdr.usedbytes = 20 + hdr.firstused = 4076 + hdr.holes = 0 + hdr.freemap[0-2] = [base,size] 0:[40,4036] 1:[0,0] 2:[0,0] + entries[0] = [hashval,nameidx,incomplete,root,secure,local] + 0:[0xfcf89d4f,4076,0,0,0,0] + nvlist[0].valueblk = 0x1 + nvlist[0].valuelen = 30692 + nvlist[0].namelen = 8 + nvlist[0].name = "big_attr" + +Attribute blocks 1 to 8 (filesystem blocks 37536 to 37543) contain the raw +binary value data for the attribute. + +Index 4076 (0xfec) is the offset into the block where the name/value +information is. As can be seen by the value, it’s at the end of the block: + +:: + + xfs_db> type text + xfs_db> p + + 000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 01 00 14 ................ + 010: 0f ec 00 00 00 28 0f c4 00 00 00 00 00 00 00 00 ................ + 020: fc f8 9d 4f 0f ec 00 00 00 00 00 00 00 00 00 00 ...O............ + 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ + ... + fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 ................ + ff0: 00 00 77 e4 08 62 69 67 5f 61 74 74 72 00 00 00 ..w..big.attr... + +A 30KB attribute and a couple of small attributes are added to a file: + +:: + + xfs_db> inode + xfs_db> p + ... + core.nblocks = 10 + core.extsize = 0 + core.nextents = 1 + core.naextents = 2 + core.forkoff = 15 + core.aformat = 2 (extents) + ... 
+ u.bmx[0] = [startoff,startblock,blockcount,extentflag] + 0:[0,81857,1,0] + a.bmx[0-1] = [startoff,startblock,blockcount,extentflag] + 0:[0,81858,1,0] + 1:[1,182398,8,0] + xfs_db> ablock 0 + xfs_db> p + hdr.info.forw = 0 + hdr.info.back = 0 + hdr.info.magic = 0xfbee + hdr.count = 3 + hdr.usedbytes = 52 + hdr.firstused = 4044 + hdr.holes = 0 + hdr.freemap[0-2] = [base,size] 0:[56,3988] 1:[0,0] 2:[0,0] + entries[0-2] = [hashval,nameidx,incomplete,root,secure,local] + 0:[0x1e9d3934,4044,0,0,0,1] + 1:[0x1e9d3937,4060,0,0,0,1] + 2:[0xfcf89d4f,4076,0,0,0,0] + nvlist[0].valuelen = 6 + nvlist[0].namelen = 5 + nvlist[0].name = "attr2" + nvlist[0].value = "value2" + nvlist[1].valuelen = 6 + nvlist[1].namelen = 5 + nvlist[1].name = "attr1" + nvlist[1].value = "value1" + nvlist[2].valueblk = 0x1 + nvlist[2].valuelen = 30692 + nvlist[2].namelen = 8 + nvlist[2].name = "big_attr" + +As can be seen in the entries array, the two small attributes have the local +flag set and the values are printed. + +A raw disk dump shows the attributes. The last attribute added is highlighted +(offset 4044 or 0xfcc): + +:: + + 000: 00 00 00 00 00 00 00 00 fb ee 00 00 00 03 00 34 ...............4 + 010: 0f cc 00 00 00 38 0f 94 00 00 00 00 00 00 00 00 .....8.......... + 020: 1e 9d 39 34 0f cc 01 00 1e 9d 39 37 0f dc 01 00 ..94......97.... + 030: fc f8 9d 4f 0f ec 00 00 00 00 00 00 00 00 00 00 ...O............ + 040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ + ... + fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 06 05 61 ...............a + fd0: 74 74 72 32 76 61 6c 75 65 32 00 00 00 06 05 61 ttr2value2.....a + fe0: 74 74 72 31 76 61 6c 75 65 31 00 00 00 00 00 01 ttr1value1...... + ff0: 00 00 77 e4 08 62 69 67 5f 61 74 74 72 00 00 00 ..w..big.attr... + +Node Attributes +~~~~~~~~~~~~~~~ + +When the number of attributes exceeds the space that can fit in one filesystem +block (ie. 
hash, flag, name and local values), the first attribute block +becomes the root of a B+tree where the leaves contain the hash/name/value +information that was stored in a single leaf block. The inode’s attribute +format itself remains extent based. The nodes use the xfs\_da\_intnode\_t or +xfs\_da3\_intnode\_t structures introduced in the section about +`directories <#directory-attribute-internal-node>`__. + +The location of the attribute leaf blocks can be in any order. The only way to +find an attribute is by walking the node block hash/before values. Given a +hash to look up, search the node’s btree array for the first hashval in the +array that exceeds the given hash. The entry is in the block pointed to by the +before value. + +Each attribute node block has a magic number of XFS\_DA\_NODE\_MAGIC (0xfebe). +On a v5 filesystem this is XFS\_DA3\_NODE\_MAGIC (0x3ebe). + +.. figure:: images/72.png + :alt: Node attribute layout + + Node attribute layout + +xfs\_db Node Attribute Example +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +An inode with 1000 small attributes with the naming "attribute\_n" where +'n' is a number: + +:: + + xfs_db> inode + xfs_db> p + ... + core.nblocks = 15 + core.nextents = 0 + core.naextents = 1 + core.forkoff = 15 + core.aformat = 2 (extents) + ... + a.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,525144,15,0] + xfs_db> ablock 0 + xfs_db> p + hdr.info.forw = 0 + hdr.info.back = 0 + hdr.info.magic = 0xfebe + hdr.count = 14 + hdr.level = 1 + btree[0-13] = [hashval,before] + 0:[0x3435122d,1] + 1:[0x343550a9,14] + 2:[0x343553a6,13] + 3:[0x3436122d,12] + 4:[0x343650a9,8] + 5:[0x343653a6,7] + 6:[0x343691af,6] + 7:[0x3436d0ab,11] + 8:[0x3436d3a7,10] + 9:[0x3437122d,9] + 10:[0x3437922e,3] + 11:[0x3437d22a,5] + 12:[0x3e686c25,4] + 13:[0x3e686fad,2] + +The hashes are in ascending order in the btree array, and if the hash for the +attribute we are looking up is before the entry, we go to the addressed +attribute block. 
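The root-node search described above can be sketched as a binary search over the btree[] array. This is a minimal illustration and not the kernel's implementation: the names `node_entry`, `node_lookup`, `NO_BLOCK` and `sample_root` are hypothetical, and the values in `sample_root` are copied from the xfs_db output above. The sketch selects the first entry whose hashval is greater than or equal to the lookup hash, on the assumption that each node key is the largest hash stored in the leaf it points to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory (host-endian) copy of one node btree[] element. */
struct node_entry {
    uint32_t hashval;   /* assumed: largest hash in the child block */
    uint32_t before;    /* attribute fork block number of the child */
};

#define NO_BLOCK UINT32_MAX   /* hash is larger than every key */

/* The 14 root entries from the xfs_db example above. */
const struct node_entry sample_root[14] = {
    { 0x3435122d, 1 },  { 0x343550a9, 14 }, { 0x343553a6, 13 },
    { 0x3436122d, 12 }, { 0x343650a9, 8 },  { 0x343653a6, 7 },
    { 0x343691af, 6 },  { 0x3436d0ab, 11 }, { 0x3436d3a7, 10 },
    { 0x3437122d, 9 },  { 0x3437922e, 3 },  { 0x3437d22a, 5 },
    { 0x3e686c25, 4 },  { 0x3e686fad, 2 },
};

/*
 * Return the "before" block of the first entry whose hashval is >= hash,
 * or NO_BLOCK if there is none. The array must be sorted by ascending
 * hashval, as it is in the node block.
 */
uint32_t node_lookup(const struct node_entry *btree, size_t count,
                     uint32_t hash)
{
    size_t lo = 0, hi = count;

    assert(count == 0 || btree != NULL);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (btree[mid].hashval < hash)
            lo = mid + 1;   /* key too small, search the upper half */
        else
            hi = mid;       /* candidate, keep searching the lower half */
    }
    return lo < count ? btree[lo].before : NO_BLOCK;
}
```

Because duplicate hashes are possible, a full lookup would still have to compare the name strings in the chosen leaf and may need to follow the leaf's forw pointer; the sketch stops at selecting the first candidate block.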
+ +For example, to look up attribute "attribute\_267": + +:: + + xfs_db> hash attribute_267 + 0x3437d1a8 + +In the root btree node, this falls between 0x3437922e and 0x3437d22a, +therefore btree entry 11, which points to attribute block 5, will contain the entry. + +:: + + xfs_db> ablock 5 + xfs_db> p + hdr.info.forw = 4 + hdr.info.back = 3 + hdr.info.magic = 0xfbee + hdr.count = 96 + hdr.usedbytes = 2688 + hdr.firstused = 1408 + hdr.holes = 0 + hdr.freemap[0-2] = [base,size] 0:[800,608] 1:[0,0] 2:[0,0] + entries[0-95] = [hashval,nameidx,incomplete,root,secure,local] + 0:[0x3437922f,4068,0,0,0,1] + 1:[0x343792a6,4040,0,0,0,1] + 2:[0x343792a7,4012,0,0,0,1] + 3:[0x343792a8,3984,0,0,0,1] + ... + 82:[0x3437d1a7,2892,0,0,0,1] + 83:[0x3437d1a8,2864,0,0,0,1] + 84:[0x3437d1a9,2836,0,0,0,1] + ... + 95:[0x3437d22a,2528,0,0,0,1] + nvlist[0].valuelen = 10 + nvlist[0].namelen = 13 + nvlist[0].name = "attribute_310" + nvlist[0].value = "value_310\d" + nvlist[1].valuelen = 10 + nvlist[1].namelen = 13 + nvlist[1].name = "attribute_309" + nvlist[1].value = "value_309\d" + nvlist[2].valuelen = 10 + nvlist[2].namelen = 13 + nvlist[2].name = "attribute_308" + nvlist[2].value = "value_308\d" + nvlist[3].valuelen = 10 + nvlist[3].namelen = 13 + nvlist[3].name = "attribute_307" + nvlist[3].value = "value_307\d" + ... + nvlist[82].valuelen = 10 + nvlist[82].namelen = 13 + nvlist[82].name = "attribute_268" + nvlist[82].value = "value_268\d" + nvlist[83].valuelen = 10 + nvlist[83].namelen = 13 + nvlist[83].name = "attribute_267" + nvlist[83].value = "value_267\d" + nvlist[84].valuelen = 10 + nvlist[84].namelen = 13 + nvlist[84].name = "attribute_266" + nvlist[84].value = "value_266\d" + ... + +Each of the hash entries has the XFS\_ATTR\_LOCAL flag set (1), which means the +attribute’s value follows immediately after the name. 
The raw disk dump of the +name/value pair at offset 2864 (0xb30) shows "value\_267" +following immediately after the name: + +:: + + b00: 62 75 74 65 5f 32 36 35 76 61 6c 75 65 5f 32 36 bute.265value.26 + b10: 35 0a 00 00 00 0a 0d 61 74 74 72 69 62 75 74 65 5......attribute + b20: 5f 32 36 36 76 61 6c 75 65 5f 32 36 36 0a 00 00 .266value.266... + b30: 00 0a 0d 61 74 74 72 69 62 75 74 65 5f 32 36 37 ...attribute.267 + b40: 76 61 6c 75 65 5f 32 36 37 0a 00 00 00 0a 0d 61 value.267......a + b50: 74 74 72 69 62 75 74 65 5f 32 36 38 76 61 6c 75 ttribute.268valu + b60: 65 5f 32 36 38 0a 00 00 00 0a 0d 61 74 74 72 69 e.268......attri + b70: 62 75 74 65 5f 32 36 39 76 61 6c 75 65 5f 32 36 bute.269value.26 + +Each entry starts on a 32-bit (4 byte) boundary, therefore the highlighted +entry has 2 unused bytes after it. + +B+tree Attributes +~~~~~~~~~~~~~~~~~ + +When the attribute’s extent map in an inode grows beyond the available space, +the inode’s attribute format is changed to a "btree". The inode contains the +root node of the extent B+tree, which then addresses the leaves that contain the +extent arrays for the attribute data. The attribute data itself in the +allocated filesystem blocks uses the same layout and structures as described in +`Node Attributes <#node-attributes>`__. + +Refer to the previous section on `B+tree Data Extents <#b-tree-extent-list>`__ +for more information on XFS B+tree extents. + +xfs\_db B+tree Attribute Example +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Added 2000 attributes with 729 byte values to a file: + +:: + + xfs_db> inode + xfs_db> p + ... + core.nblocks = 640 + core.extsize = 0 + core.nextents = 1 + core.naextents = 274 + core.forkoff = 15 + core.aformat = 3 (btree) + ... 
+ a.bmbt.level = 1 + a.bmbt.numrecs = 2 + a.bmbt.keys[1-2] = [startoff] 1:[0] 2:[219] + a.bmbt.ptrs[1-2] = 1:83162 2:109968 + xfs_db> fsblock 83162 + xfs_db> type bmapbtd + xfs_db> p + magic = 0x424d4150 + level = 0 + numrecs = 127 + leftsib = null + rightsib = 109968 + recs[1-127] = [startoff,startblock,blockcount,extentflag] + 1:[0,81870,1,0] + ... + xfs_db> fsblock 109968 + xfs_db> type bmapbtd + xfs_db> p + magic = 0x424d4150 + level = 0 + numrecs = 147 + leftsib = 83162 + rightsib = null + recs[1-147] = [startoff,startblock,blockcount,extentflag] + ... + (which is fsblock 81870) + xfs_db> ablock 0 + xfs_db> p + hdr.info.forw = 0 + hdr.info.back = 0 + hdr.info.magic = 0xfebe + hdr.count = 2 + hdr.level = 2 + btree[0-1] = [hashval,before] 0:[0x343612a6,513] 1:[0x3e686fad,512] + +The extent B+tree has two leaves that specify the 274 extents used for the +attributes. Looking at the first block, it can be seen that the attribute +B+tree is two levels deep. The two blocks at offset 513 and 512 (ie. access +using the ablock command) are intermediate xfs\_da\_intnode\_t nodes that +index all the attribute leaves. + +Remote Attribute Values +~~~~~~~~~~~~~~~~~~~~~~~ + +On a v5 filesystem, all remote value blocks start with this header: + +.. code:: c + + struct xfs_attr3_rmt_hdr { + __be32 rm_magic; + __be32 rm_offset; + __be32 rm_bytes; + __be32 rm_crc; + uuid_t rm_uuid; + __be64 rm_owner; + __be64 rm_blkno; + __be64 rm_lsn; + }; + +**rm\_magic** + Specifies the magic number for the remote value block: "XARM" + (0x5841524d). + +**rm\_offset** + Offset of the remote value data, in bytes. + +**rm\_bytes** + Number of bytes used to contain the remote value data. + +**rm\_crc** + Checksum of the remote value block. + +**rm\_uuid** + The UUID of this block, which must match either sb\_uuid or sb\_meta\_uuid + depending on which features are set. + +**rm\_owner** + The inode number that this remote value block belongs to. 
+ +**rm\_blkno** + Disk block number of this remote value block. + +**rm\_lsn** + Log sequence number of the last write to this block. + +Filesystems formatted prior to v5 do not have this header in the remote block. +Value data begins immediately at offset zero.
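To make the space cost of the v5 header concrete, the sketch below estimates how many filesystem blocks a remote value occupies with and without the header. It assumes, as the text above states, that every remote value block on a v5 filesystem starts with the header and that the rest of the block holds value data. The names `rmt_hdr_sketch`, `rmt_payload_per_block` and `rmt_blocks_needed` are hypothetical, and the struct is a host-endian stand-in for the on-disk layout, not the kernel definition.

```c
#include <assert.h>
#include <stdint.h>

/* Host-endian stand-in for struct xfs_attr3_rmt_hdr (layout only). */
struct rmt_hdr_sketch {
    uint32_t rm_magic;
    uint32_t rm_offset;
    uint32_t rm_bytes;
    uint32_t rm_crc;
    uint8_t  rm_uuid[16];
    uint64_t rm_owner;
    uint64_t rm_blkno;
    uint64_t rm_lsn;
};

/* 4 * 4 + 16 + 3 * 8 = 56 bytes, with no implicit padding. */
static_assert(sizeof(struct rmt_hdr_sketch) == 56,
              "remote value header sketch should be 56 bytes");

/* Bytes of value data that fit in one block. */
uint32_t rmt_payload_per_block(uint32_t blocksize, int has_header)
{
    assert(blocksize > sizeof(struct rmt_hdr_sketch));
    return has_header
        ? blocksize - (uint32_t)sizeof(struct rmt_hdr_sketch)
        : blocksize;
}

/* Number of blocks needed for a value of valuelen bytes (at most 64KB). */
uint32_t rmt_blocks_needed(uint32_t valuelen, uint32_t blocksize,
                           int has_header)
{
    uint32_t space = rmt_payload_per_block(blocksize, has_header);

    return (valuelen + space - 1) / space;   /* round up */
}
```

For the 30692-byte value in the leaf example, this gives 8 blocks for a 4096-byte block size without the header, which agrees with attribute blocks 1 to 8 in that example. Under the stated assumption the same value would also need 8 blocks on a v5 filesystem, since each block then carries 4040 bytes of data.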