linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Chandan Babu R <chandan.babu@oracle.com>
Cc: linux-xfs@vger.kernel.org, djwong@kernel.org
Subject: Re: [PATCH V3 07/12] xfs: Rename inode's extent counter fields based on their width
Date: Tue, 28 Sep 2021 09:46:37 +1000	[thread overview]
Message-ID: <20210927234637.GM1756565@dread.disaster.area> (raw)
In-Reply-To: <20210916100647.176018-8-chandan.babu@oracle.com>

On Thu, Sep 16, 2021 at 03:36:42PM +0530, Chandan Babu R wrote:
> This commit renames extent counter fields in "struct xfs_dinode" and "struct
> xfs_log_dinode" based on the width of the fields. As of this commit, the
> 32-bit field will be used to count data fork extents and the 16-bit field will
> be used to count attr fork extents.
> 
> This change is done to enable a future commit to introduce a new 64-bit extent
> counter field.
> 
> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_format.h      |  8 ++++----
>  fs/xfs/libxfs/xfs_inode_buf.c   |  4 ++--
>  fs/xfs/libxfs/xfs_log_format.h  |  4 ++--
>  fs/xfs/scrub/inode_repair.c     |  4 ++--
>  fs/xfs/scrub/trace.h            | 14 +++++++-------
>  fs/xfs/xfs_inode_item.c         |  4 ++--
>  fs/xfs/xfs_inode_item_recover.c |  8 ++++----
>  7 files changed, 23 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index dba868f2c3e3..87c927d912f6 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -802,8 +802,8 @@ typedef struct xfs_dinode {
>  	__be64		di_size;	/* number of bytes in file */
>  	__be64		di_nblocks;	/* # of direct & btree blocks used */
>  	__be32		di_extsize;	/* basic/minimum extent size for file */
> -	__be32		di_nextents;	/* number of extents in data fork */
> -	__be16		di_anextents;	/* number of extents in attribute fork*/
> +	__be32		di_nextents32;	/* number of extents in data fork */
> +	__be16		di_nextents16;	/* number of extents in attribute fork*/


Hmmm. Having the same field in the inode hold the extent count
for different inode forks based on a bit in the superblock means the
on-disk inode format is not self describing. i.e. we can't decode
the on-disk contents of an inode correctly without knowing whether a
specific feature bit is set in the superblock or not.

Right now we don't have use external configs to decode the inode.
Feature level conditional fields are determined by inode version,
not superblock bits. Optional feature fields easy to deal with -
zero if the feature is not in use, otherwise we assume it is in use
and can validity check it appropriately. IOWs, we don't need
to look at sb feature bits to decode and validate inode fields.

This change means that we can't determine if the extent counts are
correct just by looking at the on-disk inode. If we just have
di_nextents32 set to a non-zero value, does that mean we should have
data fork extents or attribute fork extents present?

Just looking at whether the attr fork is initialised is not
sufficient - it can be initialised with zero attr fork extents
present.  We can't look at the literal area contents, either,
because we don't zero that when we shrink it. We can't look at
di_nblocks, because that counts both attr and data for blocks. We
can't look at di_size, because we can have data extents beyond EOF
and hence a size of zero doesn't mean the data fork is empty.

So if both forks are in extent format, they could be either both
empty, both contain extents or only one fork contains extents but we
can't tell which state is the correct one. Hence
if di_nextents64 if zero, we don't know if di_nextents32 is a count
of attribute extents or data extents without first looking at the
superblock feature bit to determine if di_nextents64 is in use or
not. The inode format is not self describing anymore.

When XFS introduced 32 bit link counts, the inode version was bumped
from v1 to v2 because it redefined fields in the inode structure
similar to this proposal[1]. The verison number was then used to
determine if the inode was in old or new format - it was a self
describing format change. Hence If we are going to redefine
di_nextents to be able to hold either data fork extent count (old
format) or attr fork extent count (new format) we really need to
bump the inode version so that we can discriminate between the two
inode formats just by looking at the inode itself.

If we don't want to bump the version, then we need to do something
like:

-	__be32		di_nextents;	/* number of extents in data fork */
-	__be16		di_anextents;	/* number of extents in attribute fork*/
+	__be32		di_nextents_old;/* old number of extents in data fork */
+	__be16		di_anextents_old;/* old number of extents in attribute fork*/
.....
-	__u8            di_pad2[12];
+	__be64		di_nextents;	/* number of extents in data fork */
+	__be32		di_anextents;	/* number of extents in attribute fork*/
+	__u8            di_pad2[4];

So that there is no ambiguity in the on-disk format between the two
formats - if one set is non-zero, the other set must be zero in this
sort of setup.

However, I think that redefining the fields and bumping the inode
version is the better long term strategy, as it allows future reuse
of the di_anextents_old field, and it uses less of the small amount
of unused padding we have remaining in the on-disk inode core.

At which point, the feature bit in the superblock becomes "has v4
inodes", not "has big extent counts". We then use v4 inode format in
memory for everything (i.e. 64 bit extent counts) and convert
to/from the ondisk format at IO time like we do with v1/v2 inodes.

Thoughts?

-Dave.

[1] The change to v2 inodes back in 1995 removed the filesystem UUID
from the inode and was replaced with a 32 bit link counter, a project ID
value and padding:

@@ -36,10 +38,12 @@ typedef struct xfs_dinode_core
        __uint16_t      di_mode;        /* mode and type of file */
        __int8_t        di_version;     /* inode version */
        __int8_t        di_format;      /* format of di_c data */
-       __uint16_t      di_nlink;       /* number of links to file */
+       __uint16_t      di_onlink;      /* old number of links to file */
        __uint32_t      di_uid;         /* owner's user id */
        __uint32_t      di_gid;         /* owner's group id */
-       uuid_t          di_uuid;        /* file unique id */
+       __uint32_t      di_nlink;       /* number of links to file */
+       __uint16_t      di_projid;      /* owner's project id */
+       __uint8_t       di_pad[10];     /* unused, zeroed space */
        xfs_timestamp_t di_atime;       /* time last accessed */
        xfs_timestamp_t di_mtime;       /* time last modified */
        xfs_timestamp_t di_ctime;       /* time created/inode modified */
@@ -81,7 +85,13 @@ typedef struct xfs_dinode

it was the redefinition of the di_uuid variable space that required
the bumping of the inode version...
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2021-09-27 23:46 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-16 10:06 [PATCH V3 00/12] xfs: Extend per-inode extent counters Chandan Babu R
2021-09-16 10:06 ` [PATCH V3 01/12] xfs: Move extent count limits to xfs_format.h Chandan Babu R
2021-09-16 10:06 ` [PATCH V3 02/12] xfs: Introduce xfs_iext_max_nextents() helper Chandan Babu R
2021-09-16 10:06 ` [PATCH V3 03/12] xfs: Rename MAXEXTNUM, MAXAEXTNUM to XFS_IFORK_EXTCNT_MAXS32, XFS_IFORK_EXTCNT_MAXS16 Chandan Babu R
2021-09-16 10:06 ` [PATCH V3 04/12] xfs: Use xfs_extnum_t instead of basic data types Chandan Babu R
2021-09-16 10:06 ` [PATCH V3 05/12] xfs: Introduce xfs_dfork_nextents() helper Chandan Babu R
2021-09-27 22:46   ` Dave Chinner
2021-09-28  9:46     ` Chandan Babu R
2021-09-16 10:06 ` [PATCH V3 06/12] xfs: xfs_dfork_nextents: Return extent count via an out argument Chandan Babu R
2021-09-30  1:19   ` Dave Chinner
2021-09-16 10:06 ` [PATCH V3 07/12] xfs: Rename inode's extent counter fields based on their width Chandan Babu R
2021-09-27 23:46   ` Dave Chinner [this message]
2021-09-28  4:04     ` Dave Chinner
2021-09-29 17:03       ` Chandan Babu R
2021-09-30  0:40         ` Dave Chinner
2021-09-30  4:31           ` Dave Chinner
2021-09-30  7:30             ` Chandan Babu R
2021-09-30 22:55               ` Dave Chinner
2021-10-07 10:52                 ` Chandan Babu R
2021-10-10 21:49                   ` Dave Chinner
2021-10-13 14:44                     ` Chandan Babu R
2021-10-14  2:00                       ` Dave Chinner
2021-10-14 10:07                         ` Chandan Babu R
2021-10-21 10:27                       ` Chandan Babu R
2021-09-28  9:47     ` Chandan Babu R
2021-09-16 10:06 ` [PATCH V3 08/12] xfs: Promote xfs_extnum_t and xfs_aextnum_t to 64 and 32-bits respectively Chandan Babu R
2021-09-28  0:47   ` Dave Chinner
2021-09-28  9:47     ` Chandan Babu R
2021-09-28 23:08       ` Dave Chinner
2021-09-29 17:04         ` Chandan Babu R
2021-09-16 10:06 ` [PATCH V3 09/12] xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters Chandan Babu R
2021-09-27 23:06   ` Dave Chinner
2021-09-28  9:49     ` Chandan Babu R
2021-09-28 23:39       ` Dave Chinner
2021-09-29 17:04         ` Chandan Babu R
2021-09-16 10:06 ` [PATCH V3 10/12] xfs: Extend per-inode extent counter widths Chandan Babu R
2021-09-16 10:06 ` [PATCH V3 11/12] xfs: Add XFS_SB_FEAT_INCOMPAT_NREXT64 to XFS_SB_FEAT_INCOMPAT_ALL Chandan Babu R
2021-09-16 10:06 ` [PATCH V3 12/12] xfs: Define max extent length based on on-disk format definition Chandan Babu R
2021-09-28  0:33   ` Dave Chinner
2021-09-28 10:07     ` Chandan Babu R
2021-09-18  0:03 ` [PATCH V3 00/12] xfs: Extend per-inode extent counters Darrick J. Wong
2021-09-18  3:36   ` [External] : " Chandan Babu R

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210927234637.GM1756565@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=chandan.babu@oracle.com \
    --cc=djwong@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).