From: Gabriel Krisman Bertazi <>
Subject: [PATCH RFC v6 11/11] docs: ext4.rst: Document encoding and case-insensitive
Date: Mon, 18 Mar 2019 16:27:45 -0400
Message-ID: <>
In-Reply-To: <>

From: Gabriel Krisman Bertazi <>

Introduce the encoding-awareness and case-insensitive features of ext4
to system administrators.  Explain the minimal set of design decisions
that matter to sysadmins who want to enable these features.

Signed-off-by: Gabriel Krisman Bertazi <>
---
 Documentation/admin-guide/ext4.rst | 41 ++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst
index e506d3dae510..4e08d0309f1e 100644
--- a/Documentation/admin-guide/ext4.rst
+++ b/Documentation/admin-guide/ext4.rst
@@ -91,10 +91,51 @@ Currently Available
 * large block (up to pagesize) support
 * efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
   the ordering)
+* Encoding-aware file names
+* Case-insensitive file name lookups

 [1] Filesystems with a block size of 1k may see a limit imposed by the
 directory hash tree having a maximum depth of two.

+Encoding-aware file names and case-insensitive lookups
+======================================================
+
+Ext4 optionally supports filesystem-wide charset knowledge when handling
+file names.  This allows the user to perform filesystem lookups using
+charset-equivalent versions of the same file name and, optionally, to
+ensure that the filesystem holds no invalid names.  Charset encoding
+awareness is also essential for performing case-insensitive lookups,
+because it is what defines the casefold operation.
+
+The case-insensitive file name lookup feature is supported at a finer
+granularity, on a per-directory basis, allowing the user to mix
+case-insensitive and case-sensitive directories in the same filesystem.
+It is enabled by flipping a file attribute on an empty directory.  For
+the reason stated above, the filesystem must have encoding enabled to
+use this feature.
+
+Both encoding-awareness and case-awareness are name-preserving on the
+disk, meaning that the file name provided by userspace is a
+byte-per-byte match to what is actually written on the disk.  The
+Unicode normalization format used by the kernel is thus an internal
+representation, exposed neither to userspace nor to the disk, with
+the important exception of disk hashes, used on large directories with
+the DX feature.  On DX directories, the hash must be calculated using
+the normalized version of the filename, meaning that the normalization
+format used actually has an impact on where the directory entry is
+stored.
+
+When we change from viewing filenames as opaque byte sequences to seeing
+them as encoded strings, we need to address what happens when a program
+tries to create a file with an invalid name.  The Unicode subsystem
+within the kernel leaves the decision of what to do in this case to the
+filesystem, which selects its preferred behavior by enabling or
+disabling strict mode.  When Ext4 encounters one of those strings and
+the filesystem did not require strict mode, it falls back to treating
+the entire string as an opaque byte sequence.  The user can still
+operate on that file, but case-insensitive and equivalent-sequence
+lookups won't work for it.
+
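As an illustration of the lookup semantics the new section describes, here is a minimal userspace sketch in Python. It is not the kernel implementation. `lookup_key` and `names_match` are hypothetical helpers, Python's `unicodedata` NFD normalization and `str.casefold` stand in for the kernel's own Unicode tables (which may differ in version and normalization form), and "invalid name" is simplified here to "not valid UTF-8".

```python
import unicodedata


def lookup_key(name: bytes, strict: bool = False) -> bytes:
    """Return the key a hypothetical case-insensitive directory compares.

    Valid UTF-8 names are normalized (NFD here, as an assumption) and
    casefolded.  Invalid names are rejected in strict mode; otherwise
    they are treated as opaque byte sequences.
    """
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        if strict:
            raise ValueError("invalid file name rejected in strict mode")
        # Non-strict fallback: compare the raw bytes, no folding at all.
        return name
    folded = unicodedata.normalize("NFD", text).casefold()
    # Casefolding can denormalize some strings, so normalize once more.
    return unicodedata.normalize("NFD", folded).encode("utf-8")


def names_match(a: bytes, b: bytes, strict: bool = False) -> bool:
    """Compare two on-disk names the way the sketch directory would."""
    return lookup_key(a, strict) == lookup_key(b, strict)


if __name__ == "__main__":
    composed = "Caf\u00e9".encode("utf-8")     # precomposed e-acute
    decomposed = "cafe\u0301".encode("utf-8")  # 'e' + combining acute
    print(names_match(composed, decomposed))      # True
    print(names_match(b"README", b"readme"))      # True
    # Invalid UTF-8 falls back to byte comparison, so case matters.
    print(names_match(b"\xffABC", b"\xffabc"))    # False
```

Note that the key is only ever used for comparison. The bytes handed in by the caller are what would be stored, which mirrors the name-preserving behavior described in the patch.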
