All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Ted Tso <tytso@mit.edu>
Cc: <linux-ext4@vger.kernel.org>, Jan Kara <jack@suse.cz>
Subject: [PATCH 4/5] ext4: Orphan file documentation
Date: Mon, 12 Jul 2021 17:40:08 +0200	[thread overview]
Message-ID: <20210712154009.9290-5-jack@suse.cz> (raw)
In-Reply-To: <20210712154009.9290-1-jack@suse.cz>

Add documentation about the orphan file feature.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 Documentation/filesystems/ext4/globals.rst    |  1 +
 Documentation/filesystems/ext4/inodes.rst     | 10 ++--
 Documentation/filesystems/ext4/orphan.rst     | 52 +++++++++++++++++++
 .../filesystems/ext4/special_inodes.rst       | 17 ++++++
 Documentation/filesystems/ext4/super.rst      | 15 +++++-
 5 files changed, 88 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/filesystems/ext4/orphan.rst

diff --git a/Documentation/filesystems/ext4/globals.rst b/Documentation/filesystems/ext4/globals.rst
index 368bf7662b96..b17418974fd3 100644
--- a/Documentation/filesystems/ext4/globals.rst
+++ b/Documentation/filesystems/ext4/globals.rst
@@ -11,3 +11,4 @@ have static metadata at fixed locations.
 .. include:: bitmaps.rst
 .. include:: mmp.rst
 .. include:: journal.rst
+.. include:: orphan.rst
diff --git a/Documentation/filesystems/ext4/inodes.rst b/Documentation/filesystems/ext4/inodes.rst
index a65baffb4ebf..6c5ce666e63f 100644
--- a/Documentation/filesystems/ext4/inodes.rst
+++ b/Documentation/filesystems/ext4/inodes.rst
@@ -498,11 +498,11 @@ structure -- inode change time (ctime), access time (atime), data
 modification time (mtime), and deletion time (dtime). The four fields
 are 32-bit signed integers that represent seconds since the Unix epoch
 (1970-01-01 00:00:00 GMT), which means that the fields will overflow in
-January 2038. For inodes that are not linked from any directory but are
-still open (orphan inodes), the dtime field is overloaded for use with
-the orphan list. The superblock field ``s_last_orphan`` points to the
-first inode in the orphan list; dtime is then the number of the next
-orphaned inode, or zero if there are no more orphans.
+January 2038. If the filesystem does not have orphan_file feature, inodes
+that are not linked from any directory but are still open (orphan inodes) have
+the dtime field overloaded for use with the orphan list. The superblock field
+``s_last_orphan`` points to the first inode in the orphan list; dtime is then
+the number of the next orphaned inode, or zero if there are no more orphans.
 
 If the inode structure size ``sb->s_inode_size`` is larger than 128
 bytes and the ``i_inode_extra`` field is large enough to encompass the
diff --git a/Documentation/filesystems/ext4/orphan.rst b/Documentation/filesystems/ext4/orphan.rst
new file mode 100644
index 000000000000..bb19ecd1b626
--- /dev/null
+++ b/Documentation/filesystems/ext4/orphan.rst
@@ -0,0 +1,52 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Orphan file
+-----------
+
+In unix there can inodes that are unlinked from directory hierarchy but that
+are still alive because they are open. In case of crash the filesystem has to
+clean up these inodes as otherwise they (and the blocks referenced from them)
+would leak. Similarly if we truncate or extend the file, we need not be able
+to perform the operation in a single journalling transaction. In such case we
+track the inode as orphan so that in case of crash extra blocks allocated to
+the file get truncated.
+
+Traditionally ext4 tracks orphan inodes in a form of single linked list where
+superblock contains the inode number of the last orphan inode (s\_last\_orphan
+field) and then each inode contains inode number of the previously orphaned
+inode (we overload i\_dtime inode field for this). However this filesystem
+global single linked list is a scalability bottleneck for workloads that result
+in heavy creation of orphan inodes. When orphan file feature
+(COMPAT\_ORPHAN\_FILE) is enabled, the filesystem has a special inode
+(referenced from the superblock through s\_orphan_file_inum) with several
+blocks. Each of these blocks has a structure:
+
+.. list-table::
+   :widths: 8 8 24 40
+   :header-rows: 1
+
+   * - Offset
+     - Type
+     - Name
+     - Description
+   * - 0x0
+     - Array of \_\_le32 entries
+     - Orphan inode entries
+     - Each \_\_le32 entry is either empty (0) or it contains inode number of
+       an orphan inode.
+   * - blocksize - 8
+     - \_\_le32
+     - ob\_magic
+     - Magic value stored in orphan block tail (0x0b10ca04)
+   * - blocksize - 4
+     - \_\_le32
+     - ob\_checksum
+     - Checksum of the orphan block.
+
+When a filesystem with orphan file feature is writeably mounted, we set
+RO\_COMPAT\_ORPHAN\_PRESENT feature in the superblock to indicate there may
+be valid orphan entries. In case we see this feature when mounting the
+filesystem, we read the whole orphan file and process all orphan inodes found
+there as usual. When cleanly unmounting the filesystem we remove the
+RO\_COMPAT\_ORPHAN\_PRESENT feature to avoid unnecessary scanning of the orphan
+file and also make the filesystem fully compatible with older kernels.
diff --git a/Documentation/filesystems/ext4/special_inodes.rst b/Documentation/filesystems/ext4/special_inodes.rst
index 9061aabba827..94f304e3a0a7 100644
--- a/Documentation/filesystems/ext4/special_inodes.rst
+++ b/Documentation/filesystems/ext4/special_inodes.rst
@@ -36,3 +36,20 @@ ext4 reserves some inode for special features, as follows:
    * - 11
      - Traditional first non-reserved inode. Usually this is the lost+found directory. See s\_first\_ino in the superblock.
 
+Note that there are also some inodes allocated from non-reserved inode numbers
+for other filesystem features which are not referenced from standard directory
+hierarchy. These are generally reference from the superblock. They are:
+
+.. list-table::
+   :widths: 20 50
+   :header-rows: 1
+
+   * - Superblock field
+     - Description
+
+   * - s\_lpf\_ino
+     - Inode number of lost+found directory.
+   * - s\_prj\_quota\_inum
+     - Inode number of quota file tracking project quotas
+   * - s\_orphan\_file\_inum
+     - Inode number of file tracking orphan inodes.
diff --git a/Documentation/filesystems/ext4/super.rst b/Documentation/filesystems/ext4/super.rst
index 2eb1ab20498d..f1d4907104f5 100644
--- a/Documentation/filesystems/ext4/super.rst
+++ b/Documentation/filesystems/ext4/super.rst
@@ -479,7 +479,11 @@ The ext4 superblock is laid out as follows in
      - Filename charset encoding flags.
    * - 0x280
      - \_\_le32
-     - s\_reserved[95]
+     - s\_orphan\_file\_inum
+     - Orphan file inode number.
+   * - 0x284
+     - \_\_le32
+     - s\_reserved[94]
      - Padding to the end of the block.
    * - 0x3FC
      - \_\_le32
@@ -603,6 +607,11 @@ following:
        the journal, JBD2 incompat feature
        (JBD2\_FEATURE\_INCOMPAT\_FAST\_COMMIT) gets
        set (COMPAT\_FAST\_COMMIT).
+   * - 0x1000
+     - Orphan file allocated. This is the special file for more efficient
+       tracking of unlinked but still open inodes. When there may be any
+       entries in the file, we additionally set proper rocompat feature
+       (RO\_COMPAT\_ORPHAN\_PRESENT).
 
 .. _super_incompat:
 
@@ -691,7 +700,9 @@ the following:
      - Indicates that large inodes exist on this filesystem
        (RO\_COMPAT\_EXTRA\_ISIZE).
    * - 0x80
-     - This filesystem has a snapshot (RO\_COMPAT\_HAS\_SNAPSHOT).
+     - Indicates orphan file may have valid orphan entries and thus we need
+       to clean them up when mounting the filesystem
+       (RO\_COMPAT\_ORPHAN\_PRESENT).
    * - 0x100
      - `Quota <Quota>`__ (RO\_COMPAT\_QUOTA).
    * - 0x200
-- 
2.26.2


  parent reply	other threads:[~2021-07-12 15:40 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-12 15:40 [PATCH 0/5 v4] ext4: Speedup orphan file handling Jan Kara
2021-07-12 15:40 ` [PATCH 1/5] ext4: Support for checksumming from journal triggers Jan Kara
2021-07-22 14:52   ` Theodore Ts'o
2021-07-12 15:40 ` [PATCH 2/5] ext4: Move orphan inode handling into a separate file Jan Kara
2021-07-22 14:53   ` Theodore Ts'o
2021-08-08 14:29   ` Theodore Ts'o
2021-08-10 13:15     ` Jan Kara
2021-07-12 15:40 ` [PATCH 3/5] ext4: Speedup ext4 orphan inode handling Jan Kara
2021-07-12 15:40 ` Jan Kara [this message]
2021-07-12 15:40 ` [PATCH 5/5] ext4: Improve scalability of ext4 orphan file handling Jan Kara
2021-08-11 10:19 [PATCH 0/5 v5] ext4: Speedup " Jan Kara
2021-08-11 10:19 ` [PATCH 4/5] ext4: Orphan file documentation Jan Kara
2021-08-12 15:02   ` Theodore Ts'o
2021-08-16  9:22 [PATCH 0/5 v6] ext4: Speedup orphan file handling Jan Kara
2021-08-16  9:23 ` [PATCH 4/5] ext4: Orphan file documentation Jan Kara
2021-08-16  9:57 [PATCH 0/5 v7] ext4: Speedup orphan file handling Jan Kara
2021-08-16  9:57 ` [PATCH 4/5] ext4: Orphan file documentation Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210712154009.9290-5-jack@suse.cz \
    --to=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    --subject='Re: [PATCH 4/5] ext4: Orphan file documentation' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.