From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CF623C433F5 for ; Mon, 27 Sep 2021 23:46:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B7764611C9 for ; Mon, 27 Sep 2021 23:46:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238234AbhI0XsT (ORCPT ); Mon, 27 Sep 2021 19:48:19 -0400 Received: from mail106.syd.optusnet.com.au ([211.29.132.42]:39027 "EHLO mail106.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238236AbhI0XsT (ORCPT ); Mon, 27 Sep 2021 19:48:19 -0400 Received: from dread.disaster.area (pa49-195-238-16.pa.nsw.optusnet.com.au [49.195.238.16]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id BC620860093; Tue, 28 Sep 2021 09:46:38 +1000 (AEST) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1mV0KT-00HUXw-EJ; Tue, 28 Sep 2021 09:46:37 +1000 Date: Tue, 28 Sep 2021 09:46:37 +1000 From: Dave Chinner To: Chandan Babu R Cc: linux-xfs@vger.kernel.org, djwong@kernel.org Subject: Re: [PATCH V3 07/12] xfs: Rename inode's extent counter fields based on their width Message-ID: <20210927234637.GM1756565@dread.disaster.area> References: <20210916100647.176018-1-chandan.babu@oracle.com> <20210916100647.176018-8-chandan.babu@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210916100647.176018-8-chandan.babu@oracle.com> X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.3 cv=YKPhNiOx c=1 sm=1 tr=0 a=DzKKRZjfViQTE5W6EVc0VA==:117 a=DzKKRZjfViQTE5W6EVc0VA==:17 a=kj9zAlcOel0A:10 a=7QKq2e-ADPsA:10 a=yPCof4ZbAAAA:8 a=7-415B0cAAAA:8 a=4T9XfmIfzXTh88SRqEMA:9 a=CjuIK1q_8ugA:10 a=biEYGPWJfzWAr4FL6Ov7:22 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Thu, Sep 16, 2021 at 03:36:42PM +0530, Chandan Babu R wrote: > This commit renames extent counter fields in "struct xfs_dinode" and "struct > xfs_log_dinode" based on the width of the fields. As of this commit, the > 32-bit field will be used to count data fork extents and the 16-bit field will > be used to count attr fork extents. > > This change is done to enable a future commit to introduce a new 64-bit extent > counter field. > > Signed-off-by: Chandan Babu R > --- > fs/xfs/libxfs/xfs_format.h | 8 ++++---- > fs/xfs/libxfs/xfs_inode_buf.c | 4 ++-- > fs/xfs/libxfs/xfs_log_format.h | 4 ++-- > fs/xfs/scrub/inode_repair.c | 4 ++-- > fs/xfs/scrub/trace.h | 14 +++++++------- > fs/xfs/xfs_inode_item.c | 4 ++-- > fs/xfs/xfs_inode_item_recover.c | 8 ++++---- > 7 files changed, 23 insertions(+), 23 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h > index dba868f2c3e3..87c927d912f6 100644 > --- a/fs/xfs/libxfs/xfs_format.h > +++ b/fs/xfs/libxfs/xfs_format.h > @@ -802,8 +802,8 @@ typedef struct xfs_dinode { > __be64 di_size; /* number of bytes in file */ > __be64 di_nblocks; /* # of direct & btree blocks used */ > __be32 di_extsize; /* basic/minimum extent size for file */ > - __be32 di_nextents; /* number of extents in data fork */ > - __be16 di_anextents; /* number of extents in attribute fork*/ > + __be32 di_nextents32; /* number of extents in data fork */ > + __be16 di_nextents16; /* number of extents in attribute fork*/ Hmmm. Having the same field in the inode hold the extent count for different inode forks based on a bit in the superblock means the on-disk inode format is not self describing. i.e. we can't decode the on-disk contents of an inode correctly without knowing whether a specific feature bit is set in the superblock or not. Right now we don't have use external configs to decode the inode. Feature level conditional fields are determined by inode version, not superblock bits. Optional feature fields easy to deal with - zero if the feature is not in use, otherwise we assume it is in use and can validity check it appropriately. IOWs, we don't need to look at sb feature bits to decode and validate inode fields. This change means that we can't determine if the extent counts are correct just by looking at the on-disk inode. If we just have di_nextents32 set to a non-zero value, does that mean we should have data fork extents or attribute fork extents present? Just looking at whether the attr fork is initialised is not sufficient - it can be initialised with zero attr fork extents present. We can't look at the literal area contents, either, because we don't zero that when we shrink it. We can't look at di_nblocks, because that counts both attr and data for blocks. We can't look at di_size, because we can have data extents beyond EOF and hence a size of zero doesn't mean the data fork is empty. So if both forks are in extent format, they could be either both empty, both contain extents or only one fork contains extents but we can't tell which state is the correct one. Hence if di_nextents64 if zero, we don't know if di_nextents32 is a count of attribute extents or data extents without first looking at the superblock feature bit to determine if di_nextents64 is in use or not. The inode format is not self describing anymore. When XFS introduced 32 bit link counts, the inode version was bumped from v1 to v2 because it redefined fields in the inode structure similar to this proposal[1]. The verison number was then used to determine if the inode was in old or new format - it was a self describing format change. Hence If we are going to redefine di_nextents to be able to hold either data fork extent count (old format) or attr fork extent count (new format) we really need to bump the inode version so that we can discriminate between the two inode formats just by looking at the inode itself. If we don't want to bump the version, then we need to do something like: - __be32 di_nextents; /* number of extents in data fork */ - __be16 di_anextents; /* number of extents in attribute fork*/ + __be32 di_nextents_old;/* old number of extents in data fork */ + __be16 di_anextents_old;/* old number of extents in attribute fork*/ ..... - __u8 di_pad2[12]; + __be64 di_nextents; /* number of extents in data fork */ + __be32 di_anextents; /* number of extents in attribute fork*/ + __u8 di_pad2[4]; So that there is no ambiguity in the on-disk format between the two formats - if one set is non-zero, the other set must be zero in this sort of setup. However, I think that redefining the fields and bumping the inode version is the better long term strategy, as it allows future reuse of the di_anextents_old field, and it uses less of the small amount of unused padding we have remaining in the on-disk inode core. At which point, the feature bit in the superblock becomes "has v4 inodes", not "has big extent counts". We then use v4 inode format in memory for everything (i.e. 64 bit extent counts) and convert to/from the ondisk format at IO time like we do with v1/v2 inodes. Thoughts? -Dave. [1] The change to v2 inodes back in 1995 removed the filesystem UUID from the inode and was replaced with a 32 bit link counter, a project ID value and padding: @@ -36,10 +38,12 @@ typedef struct xfs_dinode_core __uint16_t di_mode; /* mode and type of file */ __int8_t di_version; /* inode version */ __int8_t di_format; /* format of di_c data */ - __uint16_t di_nlink; /* number of links to file */ + __uint16_t di_onlink; /* old number of links to file */ __uint32_t di_uid; /* owner's user id */ __uint32_t di_gid; /* owner's group id */ - uuid_t di_uuid; /* file unique id */ + __uint32_t di_nlink; /* number of links to file */ + __uint16_t di_projid; /* owner's project id */ + __uint8_t di_pad[10]; /* unused, zeroed space */ xfs_timestamp_t di_atime; /* time last accessed */ xfs_timestamp_t di_mtime; /* time last modified */ xfs_timestamp_t di_ctime; /* time created/inode modified */ @@ -81,7 +85,13 @@ typedef struct xfs_dinode it was the redefinition of the di_uuid variable space that required the bumping of the inode version... -- Dave Chinner david@fromorbit.com