From: Gabriel Krisman Bertazi <krisman@collabora.com>
To: tytso@mit.edu
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	sfrench@samba.org, darrick.wong@oracle.com,
	samba-technical@lists.samba.org, jlayton@kernel.org,
	bfields@fieldses.org, paulus@samba.org,
	Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Subject: [PATCH RFC v5 11/11] docs: ext4.rst: Document encoding and case-insensitive
Date: Mon, 28 Jan 2019 16:32:23 -0500	[thread overview]
Message-ID: <20190128213223.31512-12-krisman@collabora.com> (raw)
In-Reply-To: <20190128213223.31512-1-krisman@collabora.com>

From: Gabriel Krisman Bertazi <krisman@collabora.co.uk>

Introduce the encoding-awareness and case-insensitive features of ext4
to system administrators.  Explain the minimum set of design decisions
that matter to sysadmins who want to enable these features.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
---
 Documentation/admin-guide/ext4.rst | 41 ++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
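
A minimal userspace sketch of the lookup model documented in the diff
below, for illustration only.  It assumes NFD normalization plus
Python's str.casefold() as stand-ins for the kernel's internal
normalization and casefold operations, which may differ.  CasefoldDir
and lookup_key are hypothetical names, and the dict key only loosely
plays the role of the DX hash: it decides where an entry is found,
while the stored name stays byte-for-byte what was created.

```python
import unicodedata


def lookup_key(name: bytes) -> str:
    """Derive the internal comparison key for a file name.

    Assumption: NFD + str.casefold() approximate the kernel's
    normalization and casefold; the real forms may differ.
    """
    text = unicodedata.normalize("NFD", name.decode("utf-8"))
    # Normalize again because case folding can denormalize a string.
    return unicodedata.normalize("NFD", text.casefold())


class CasefoldDir:
    """Toy case-insensitive directory that preserves names as given."""

    def __init__(self):
        # key -> name bytes exactly as passed to create()
        self._entries = {}

    def create(self, name: bytes) -> None:
        key = lookup_key(name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = name

    def lookup(self, name: bytes) -> bytes:
        # Any equivalent spelling finds the entry; the stored bytes
        # are returned unchanged.
        return self._entries[lookup_key(name)]


d = CasefoldDir()
d.create("Caf\u00e9.txt".encode("utf-8"))            # precomposed e-acute
found = d.lookup("CAFE\u0301.TXT".encode("utf-8"))   # decomposed, upper case
```

Here `found` holds the bytes of the name as it was created, not the
normalized or casefolded form used for the comparison.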

diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst
index e506d3dae510..4e08d0309f1e 100644
--- a/Documentation/admin-guide/ext4.rst
+++ b/Documentation/admin-guide/ext4.rst
@@ -91,10 +91,51 @@ Currently Available
 * large block (up to pagesize) support
 * efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
   the ordering)
+* Encoding-aware file names
+* Case-insensitive file name lookups
 
 [1] Filesystems with a block size of 1k may see a limit imposed by the
 directory hash tree having a maximum depth of two.
 
+Encoding-aware file names and case-insensitive lookups
+======================================================
+
+Ext4 optionally supports filesystem-wide charset knowledge when handling
+file names.  This lets the user perform lookups using charset-equivalent
+versions of the same file name and, optionally, ensures that the
+filesystem holds no invalid names.  Charset encoding awareness is also
+essential for case-insensitive lookups, because the encoding is what
+defines the casefold operation.
+
+The case-insensitive file name lookup feature is supported at a finer
+granularity, on a per-directory basis, allowing the user to mix
+case-insensitive and case-sensitive directories in the same filesystem.
+It is enabled by setting a file attribute on an empty directory.  For
+the reason stated above, the filesystem must have encoding enabled to
+use this feature.
+
+Both encoding-awareness and case-insensitivity are name-preserving on
+disk, meaning that the file name provided by userspace is a
+byte-for-byte match to what is actually written to the disk.  The
+Unicode normalization format used by the kernel is thus an internal
+representation, exposed neither to userspace nor to the disk, with
+the important exception of disk hashes, used on large directories with
+the DX feature.  On DX directories, the hash must be calculated using
+the normalized version of the filename, meaning that the normalization
+format used actually has an impact on where the directory entry is
+stored.
+
+When we change from viewing filenames as opaque byte sequences to seeing
+them as encoded strings, we need to address what happens when a program
+tries to create a file with an invalid name.  The Unicode subsystem
+within the kernel leaves the decision of what to do in this case to the
+filesystem, which selects its preferred behavior by enabling or
+disabling strict mode.  When Ext4 encounters one of those strings and
+the filesystem does not require strict mode, it falls back to treating
+the entire string as an opaque byte sequence.  The user can still
+operate on that file, but case-insensitive and equivalent-sequence
+lookups won't work for it.
+
 Options
 =======
 
-- 
2.20.1
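
The strict-mode fallback described in the last documentation paragraph
can be sketched in userspace roughly as follows.  This is an
illustration under assumptions, not the kernel implementation:
names_match is a hypothetical helper, "invalid" here means only
"not valid UTF-8" (the kernel's notion of an invalid name may be
broader), NFD plus str.casefold() stand in for the kernel's
normalization and casefold, and raising ValueError is just one way to
model a strict-mode rejection.

```python
import unicodedata


def _fold(text: str) -> str:
    # Assumption: NFD + casefold approximate the kernel's operations.
    folded = unicodedata.normalize("NFD", text).casefold()
    return unicodedata.normalize("NFD", folded)


def names_match(a: bytes, b: bytes, strict: bool = False) -> bool:
    """Compare two file names the way a casefolded directory might."""
    try:
        text_a = a.decode("utf-8")
        text_b = b.decode("utf-8")
    except UnicodeDecodeError:
        if strict:
            # Strict mode: invalid names are refused outright.
            raise ValueError("invalid file name rejected in strict mode")
        # Non-strict mode: fall back to an opaque byte comparison, so
        # only the exact same bytes match.
        return a == b
    return _fold(text_a) == _fold(text_b)


valid_pair = names_match(b"README", b"readme")
invalid_exact = names_match(b"\xff\xfeA", b"\xff\xfeA")
invalid_other_case = names_match(b"\xff\xfeA", b"\xff\xfea")
```

With an invalid byte sequence in the name, the file stays reachable
through its exact bytes, but a differently cased spelling no longer
matches.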

