From: Jan Kara <jack@suse.cz> To: Ted Tso <tytso@mit.edu> Cc: <linux-ext4@vger.kernel.org>, Jan Kara <jack@suse.cz> Subject: [PATCH 4/5] ext4: Orphan file documentation Date: Mon, 16 Aug 2021 11:23:02 +0200 [thread overview] Message-ID: <20210816092309.26842-4-jack@suse.cz> (raw) In-Reply-To: <20210816091810.16994-1-jack@suse.cz> Add documentation about the orphan file feature. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> --- Documentation/filesystems/ext4/globals.rst | 1 + Documentation/filesystems/ext4/inodes.rst | 10 ++-- Documentation/filesystems/ext4/orphan.rst | 52 +++++++++++++++++++ .../filesystems/ext4/special_inodes.rst | 17 ++++++ Documentation/filesystems/ext4/super.rst | 15 +++++- 5 files changed, 89 insertions(+), 6 deletions(-) create mode 100644 Documentation/filesystems/ext4/orphan.rst diff --git a/Documentation/filesystems/ext4/globals.rst b/Documentation/filesystems/ext4/globals.rst index 368bf7662b96..b17418974fd3 100644 --- a/Documentation/filesystems/ext4/globals.rst +++ b/Documentation/filesystems/ext4/globals.rst @@ -11,3 +11,4 @@ have static metadata at fixed locations. .. include:: bitmaps.rst .. include:: mmp.rst .. include:: journal.rst +.. include:: orphan.rst diff --git a/Documentation/filesystems/ext4/inodes.rst b/Documentation/filesystems/ext4/inodes.rst index a65baffb4ebf..6c5ce666e63f 100644 --- a/Documentation/filesystems/ext4/inodes.rst +++ b/Documentation/filesystems/ext4/inodes.rst @@ -498,11 +498,11 @@ structure -- inode change time (ctime), access time (atime), data modification time (mtime), and deletion time (dtime). The four fields are 32-bit signed integers that represent seconds since the Unix epoch (1970-01-01 00:00:00 GMT), which means that the fields will overflow in -January 2038. For inodes that are not linked from any directory but are -still open (orphan inodes), the dtime field is overloaded for use with -the orphan list. The superblock field ``s_last_orphan`` points to the -first inode in the orphan list; dtime is then the number of the next -orphaned inode, or zero if there are no more orphans. +January 2038. If the filesystem does not have orphan_file feature, inodes +that are not linked from any directory but are still open (orphan inodes) have +the dtime field overloaded for use with the orphan list. The superblock field +``s_last_orphan`` points to the first inode in the orphan list; dtime is then +the number of the next orphaned inode, or zero if there are no more orphans. If the inode structure size ``sb->s_inode_size`` is larger than 128 bytes and the ``i_inode_extra`` field is large enough to encompass the diff --git a/Documentation/filesystems/ext4/orphan.rst b/Documentation/filesystems/ext4/orphan.rst new file mode 100644 index 000000000000..bb19ecd1b626 --- /dev/null +++ b/Documentation/filesystems/ext4/orphan.rst @@ -0,0 +1,52 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Orphan file +----------- + +In unix there can inodes that are unlinked from directory hierarchy but that +are still alive because they are open. In case of crash the filesystem has to +clean up these inodes as otherwise they (and the blocks referenced from them) +would leak. Similarly if we truncate or extend the file, we need not be able +to perform the operation in a single journalling transaction. In such case we +track the inode as orphan so that in case of crash extra blocks allocated to +the file get truncated. + +Traditionally ext4 tracks orphan inodes in a form of single linked list where +superblock contains the inode number of the last orphan inode (s\_last\_orphan +field) and then each inode contains inode number of the previously orphaned +inode (we overload i\_dtime inode field for this). However this filesystem +global single linked list is a scalability bottleneck for workloads that result +in heavy creation of orphan inodes. When orphan file feature +(COMPAT\_ORPHAN\_FILE) is enabled, the filesystem has a special inode +(referenced from the superblock through s\_orphan_file_inum) with several +blocks. Each of these blocks has a structure: + +.. list-table:: + :widths: 8 8 24 40 + :header-rows: 1 + + * - Offset + - Type + - Name + - Description + * - 0x0 + - Array of \_\_le32 entries + - Orphan inode entries + - Each \_\_le32 entry is either empty (0) or it contains inode number of + an orphan inode. + * - blocksize - 8 + - \_\_le32 + - ob\_magic + - Magic value stored in orphan block tail (0x0b10ca04) + * - blocksize - 4 + - \_\_le32 + - ob\_checksum + - Checksum of the orphan block. + +When a filesystem with orphan file feature is writeably mounted, we set +RO\_COMPAT\_ORPHAN\_PRESENT feature in the superblock to indicate there may +be valid orphan entries. In case we see this feature when mounting the +filesystem, we read the whole orphan file and process all orphan inodes found +there as usual. When cleanly unmounting the filesystem we remove the +RO\_COMPAT\_ORPHAN\_PRESENT feature to avoid unnecessary scanning of the orphan +file and also make the filesystem fully compatible with older kernels. diff --git a/Documentation/filesystems/ext4/special_inodes.rst b/Documentation/filesystems/ext4/special_inodes.rst index 9061aabba827..94f304e3a0a7 100644 --- a/Documentation/filesystems/ext4/special_inodes.rst +++ b/Documentation/filesystems/ext4/special_inodes.rst @@ -36,3 +36,20 @@ ext4 reserves some inode for special features, as follows: * - 11 - Traditional first non-reserved inode. Usually this is the lost+found directory. See s\_first\_ino in the superblock. +Note that there are also some inodes allocated from non-reserved inode numbers +for other filesystem features which are not referenced from standard directory +hierarchy. These are generally reference from the superblock. They are: + +.. list-table:: + :widths: 20 50 + :header-rows: 1 + + * - Superblock field + - Description + + * - s\_lpf\_ino + - Inode number of lost+found directory. + * - s\_prj\_quota\_inum + - Inode number of quota file tracking project quotas + * - s\_orphan\_file\_inum + - Inode number of file tracking orphan inodes. diff --git a/Documentation/filesystems/ext4/super.rst b/Documentation/filesystems/ext4/super.rst index 2eb1ab20498d..f6a548e957bb 100644 --- a/Documentation/filesystems/ext4/super.rst +++ b/Documentation/filesystems/ext4/super.rst @@ -479,7 +479,11 @@ The ext4 superblock is laid out as follows in - Filename charset encoding flags. * - 0x280 - \_\_le32 - - s\_reserved[95] + - s\_orphan\_file\_inum + - Orphan file inode number. + * - 0x284 + - \_\_le32 + - s\_reserved[94] - Padding to the end of the block. * - 0x3FC - \_\_le32 @@ -603,6 +607,11 @@ following: the journal, JBD2 incompat feature (JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT) gets set (COMPAT\_FAST\_COMMIT). + * - 0x1000 + - Orphan file allocated. This is the special file for more efficient + tracking of unlinked but still open inodes. When there may be any + entries in the file, we additionally set proper rocompat feature + (RO\_COMPAT\_ORPHAN\_PRESENT). .. _super_incompat: @@ -713,6 +722,10 @@ the following: - Filesystem tracks project quotas. (RO\_COMPAT\_PROJECT) * - 0x8000 - Verity inodes may be present on the filesystem. (RO\_COMPAT\_VERITY) + * - 0x10000 + - Indicates orphan file may have valid orphan entries and thus we need + to clean them up when mounting the filesystem + (RO\_COMPAT\_ORPHAN\_PRESENT). .. _super_def_hash: -- 2.26.2
next prev parent reply other threads:[~2021-08-16 9:23 UTC|newest] Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-08-16 9:22 [PATCH 0/5 v6] ext4: Speedup orphan file handling Jan Kara 2021-08-16 9:22 ` [PATCH 1/5] ext4: Support for checksumming from journal triggers Jan Kara 2021-08-16 9:23 ` [PATCH 2/5] ext4: Move orphan inode handling into a separate file Jan Kara 2021-08-16 9:23 ` [PATCH 3/5] ext4: Speedup ext4 orphan inode handling Jan Kara 2021-08-16 9:23 ` Jan Kara [this message] 2021-08-16 9:23 ` [PATCH 5/5] ext4: Improve scalability of ext4 orphan file handling Jan Kara 2021-08-16 9:58 ` [PATCH 0/5 v6] ext4: Speedup " Jan Kara -- strict thread matches above, loose matches on Subject: below -- 2021-08-16 9:57 [PATCH 0/5 v7] " Jan Kara 2021-08-16 9:57 ` [PATCH 4/5] ext4: Orphan file documentation Jan Kara 2021-08-11 10:19 [PATCH 0/5 v5] ext4: Speedup orphan file handling Jan Kara 2021-08-11 10:19 ` [PATCH 4/5] ext4: Orphan file documentation Jan Kara 2021-08-12 15:02 ` Theodore Ts'o 2021-07-12 15:40 [PATCH 0/5 v4] ext4: Speedup orphan file handling Jan Kara 2021-07-12 15:40 ` [PATCH 4/5] ext4: Orphan file documentation Jan Kara
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20210816092309.26842-4-jack@suse.cz \ --to=jack@suse.cz \ --cc=linux-ext4@vger.kernel.org \ --cc=tytso@mit.edu \ --subject='Re: [PATCH 4/5] ext4: Orphan file documentation' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.