* [PATCH e2fsprogs v3 00/12] Support encoding awareness and casefold
@ 2018-11-26 22:19 Gabriel Krisman Bertazi
  2018-11-26 22:19 ` [PATCH v3 01/12] libe2p: Helpers for configuring the encoding superblock fields Gabriel Krisman Bertazi
                   ` (11 more replies)
  0 siblings, 12 replies; 19+ messages in thread
From: Gabriel Krisman Bertazi @ 2018-11-26 22:19 UTC (permalink / raw)
  To: tytso; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

Hi Ted,

This version includes all the changes you requested as well as the
Unicode 11.0 data files.  The utf8data file is generated in the kernel
source and validated there by nls_utf8-selftest.c; I simply copied it
here.  For the 10.0 -> 11.0 migration, I also added a few tests in the
kernel source for the scripts that changed.

e2fsprogs:
  https://gitlab.collabora.com/krisman/e2fsprogs -b encoding-feature-merge_v3
linux:
  https://gitlab.collabora.com/krisman/e2fsprogs -b ext4-ci-directrory_v3
xfstests:
  https://gitlab.collabora.com/krisman/xfstests -b encoding_v3

Thanks,

----
Original cover letter message:


These are the modifications to e2fsprogs in order to support encoding
awareness and case folding.  This patch series is divided into 3 parts:

Patches 1 & 2 work on reserving superblock fields.  Patch 1 is actually
unrelated, just updating the super_block to resynchronize with the
kernel.  Patch 2 reserves the feature bit and superblock fields for this
feature.

Patches 3 through 5 implement the changes to mke2fs and
chattr/lsattr to enable the encoding feature at mkfs time and to flip
the casefold flag on demand for specific directories.

Patches 6 through 9 are where things get a bit ugly.  fsck needs to become
encoding aware, in order to calculate directory hashes correctly and
verify/fix inconsistencies.  This requires a tiny bit of plumbing to
pass the encoding information up to the point where we calculate the
hash, as well as implementing a simple nls-like interface in e2fsprogs
to do normalization/casefolding.  You'll see that in this series I've
actually dropped the utf8 part because that patch is huge and I'd rather
discuss it separately.  I did it in a hacky way now, where we import the
utf8n code from linux.  I thought about using libunistring but it
doesn't seem to support versioning and we risk being incompatible with
the kernel hashes.  I think we could follow the kernel approach and make
ucd files available in e2fsprogs and generate the data at
compilation. What do you think?

If you want to see a full utf8 capable version of this series, please
clone from:

https://gitlab.collabora.com/krisman/e2fsprogs -b encoding-feature-merge

If you don't object to patch 1 & 2, can we get them merged before the
rest of the series is ready, so I can reserve the bits in the super
block for this feature (patch 2) and avoid more rebasing on my side?


Gabriel Krisman Bertazi (12):
  libe2p: Helpers for configuring the encoding superblock fields
  mke2fs: Configure encoding during superblock initialization
  chattr/lsattr: Support casefold attribute
  lib/ext2fs: Implement NLS support
  lib/ext2fs: Support encoding when calculating dx hashes
  debugfs/htree: Support encoding when printing the file hash
  tune2fs: Prevent enabling encryption flag on encoding-aware fs
  ext2fs: nls: Support UTF-8 11.0 with NFKD normalization
  ext4.5: Add fname_encoding feature to ext4 man page
  mke2fs.8: Document fname_encoding options
  mke2fs.conf.5: Document fname_encoding configuration option
  chattr.1: Document the casefold attribute

 debugfs/Makefile.in        |    1 +
 debugfs/htree.c            |   30 +-
 e2fsck/Makefile.in         |    7 +-
 e2fsck/dx_dirinfo.c        |    4 +-
 e2fsck/e2fsck.h            |    4 +-
 e2fsck/pass1.c             |    3 +-
 e2fsck/pass2.c             |   11 +-
 e2fsck/rehash.c            |   20 +-
 e2fsck/unix.c              |   18 +
 lib/e2p/Makefile.in        |    8 +-
 lib/e2p/e2p.h              |    5 +
 lib/e2p/encoding.c         |   97 +
 lib/e2p/feature.c          |    2 +-
 lib/e2p/pf.c               |    1 +
 lib/ext2fs/Makefile.in     |   16 +-
 lib/ext2fs/dirhash.c       |   52 +
 lib/ext2fs/ext2_fs.h       |   10 +-
 lib/ext2fs/ext2fs.h        |    8 +
 lib/ext2fs/initialize.c    |    4 +
 lib/ext2fs/nls.h           |   66 +
 lib/ext2fs/nls_ascii.c     |   48 +
 lib/ext2fs/nls_utf8-norm.c |  793 +++++
 lib/ext2fs/nls_utf8.c      |   85 +
 lib/ext2fs/utf8data.h      | 6079 ++++++++++++++++++++++++++++++++++++
 lib/ext2fs/utf8n.h         |  120 +
 misc/chattr.1.in           |    8 +-
 misc/chattr.c              |    3 +-
 misc/ext4.5.in             |   10 +
 misc/mke2fs.8.in           |   25 +
 misc/mke2fs.c              |   83 +-
 misc/mke2fs.conf.5.in      |    4 +
 misc/mke2fs.conf.in        |    3 +
 misc/tune2fs.c             |    6 +
 33 files changed, 7598 insertions(+), 36 deletions(-)
 create mode 100644 lib/e2p/encoding.c
 create mode 100644 lib/ext2fs/nls.h
 create mode 100644 lib/ext2fs/nls_ascii.c
 create mode 100644 lib/ext2fs/nls_utf8-norm.c
 create mode 100644 lib/ext2fs/nls_utf8.c
 create mode 100644 lib/ext2fs/utf8data.h
 create mode 100644 lib/ext2fs/utf8n.h

-- 
2.19.2


* [PATCH v3 01/12] libe2p: Helpers for configuring the encoding superblock fields
  2018-11-26 22:19 [PATCH e2fsprogs v3 00/12] Support encoding awareness and casefold Gabriel Krisman Bertazi
@ 2018-11-26 22:19 ` Gabriel Krisman Bertazi
  2018-11-30 15:42   ` Theodore Y. Ts'o
  2018-11-26 22:19 ` [PATCH v3 02/12] mke2fs: Configure encoding during superblock initialization Gabriel Krisman Bertazi
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Gabriel Krisman Bertazi @ 2018-11-26 22:19 UTC (permalink / raw)
  To: tytso; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

From: Gabriel Krisman Bertazi <krisman@collabora.co.uk>

Implement helper functions to convert the encoding name and specific
parameters requested by the user on the command line into the format
that is written to disk.
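
The conversion described above can be sketched in isolation.  This is
a minimal, self-contained illustration and not the patch itself: the
demo_ names are hypothetical, only the "strict" flag is modelled, and
the flag bit value is the one defined later in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same bit as EXT4_ENC_STRICT_MODE_FL in this patch; the name is hypothetical. */
#define DEMO_ENC_STRICT_MODE_FL (1u << 0)

static const char *const demo_encodings[] = { "ascii", "utf8-10.0" };
#define DEMO_NR_ENCODINGS (sizeof(demo_encodings) / sizeof(demo_encodings[0]))

/* Map an encoding name to its table index (the on-disk magic), or -EINVAL. */
int demo_str2encoding(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < DEMO_NR_ENCODINGS; i++)
		if (strcmp(name, demo_encodings[i]) == 0)
			return (int)i;
	return -EINVAL;
}

/*
 * Parse a '-'-separated flag list such as "strict" or "nostrict".
 * A "no" prefix clears the flag instead of setting it.  The buffer is
 * modified in place by strtok(), as in the patch.
 */
int demo_str2flags(char *param, uint16_t *flags)
{
	char *tok;

	assert(param != NULL && flags != NULL);
	for (tok = strtok(param, "-"); tok; tok = strtok(NULL, "-")) {
		int neg = 0;

		if (strncmp(tok, "no", 2) == 0) {
			neg = 1;
			tok += 2;
		}
		if (strcmp(tok, "strict") != 0)
			return -EINVAL;	/* unknown flag name */
		if (neg)
			*flags &= (uint16_t)~DEMO_ENC_STRICT_MODE_FL;
		else
			*flags |= DEMO_ENC_STRICT_MODE_FL;
	}
	return 0;
}
```

Because strtok() writes into its argument, a caller has to pass a
writable buffer rather than a string literal.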

Changes since v2:
  - Rename defines to add EXT4_ prefix
  - Use unicode X.Y versioning scheme

Changes since v1:
  - Drop struct ext4_encoding_map name.
  - remove question mark in comment.
  - Reword 0x0 -> NULL
  - Prevent out of bound array access if requested invalid encoding

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
---
 lib/e2p/Makefile.in  |  8 +++-
 lib/e2p/e2p.h        |  5 +++
 lib/e2p/encoding.c   | 97 ++++++++++++++++++++++++++++++++++++++++++++
 lib/ext2fs/ext2_fs.h |  5 +++
 4 files changed, 113 insertions(+), 2 deletions(-)
 create mode 100644 lib/e2p/encoding.c

diff --git a/lib/e2p/Makefile.in b/lib/e2p/Makefile.in
index 2b0aa1915130..68d534cdaf11 100644
--- a/lib/e2p/Makefile.in
+++ b/lib/e2p/Makefile.in
@@ -19,7 +19,8 @@ all::	e2p.pc
 OBJS=		feature.o fgetflags.o fsetflags.o fgetversion.o fsetversion.o \
 		getflags.o getversion.o hashstr.o iod.o ls.o ljs.o mntopts.o \
 		parse_num.o pe.o pf.o ps.o setflags.o setversion.o uuid.o \
-		ostype.o percent.o crypto_mode.o fgetproject.o fsetproject.o
+		ostype.o percent.o crypto_mode.o fgetproject.o fsetproject.o \
+		encoding.o
 
 SRCS=		$(srcdir)/feature.c $(srcdir)/fgetflags.c \
 		$(srcdir)/fsetflags.c $(srcdir)/fgetversion.c \
@@ -29,7 +30,7 @@ SRCS=		$(srcdir)/feature.c $(srcdir)/fgetflags.c \
 		$(srcdir)/pe.c $(srcdir)/pf.c $(srcdir)/ps.c \
 		$(srcdir)/setflags.c $(srcdir)/setversion.c $(srcdir)/uuid.c \
 		$(srcdir)/ostype.c $(srcdir)/percent.c $(srcdir)/crypto_mode.c \
-		$(srcdir)/fgetproject.c $(srcdir)/fsetproject.c
+		$(srcdir)/fgetproject.c $(srcdir)/fsetproject.c $(srcdir)/encoding.c
 HFILES= e2p.h
 
 LIBRARY= libe2p
@@ -147,6 +148,9 @@ getversion.o: $(srcdir)/getversion.c $(top_builddir)/lib/config.h \
 hashstr.o: $(srcdir)/hashstr.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2p.h \
  $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h
+encoding.o: $(srcdir)/encoding.c $(top_builddir)/lib/config.h \
+ $(top_builddir)/lib/dirpaths.h $(srcdir)/e2p.h \
+ $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h
 iod.o: $(srcdir)/iod.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2p.h \
  $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h
diff --git a/lib/e2p/e2p.h b/lib/e2p/e2p.h
index d70b59a5d358..c3a6b2587bf6 100644
--- a/lib/e2p/e2p.h
+++ b/lib/e2p/e2p.h
@@ -80,3 +80,8 @@ unsigned int e2p_percent(int percent, unsigned int base);
 
 const char *e2p_encmode2string(int num);
 int e2p_string2encmode(char *string);
+
+int e2p_str2encoding(const char *string);
+const char *e2p_encoding2str(int encoding);
+int e2p_get_encoding_flags(int encoding);
+int e2p_str2encoding_flags(int encoding, char *param, __u16 *flags);
diff --git a/lib/e2p/encoding.c b/lib/e2p/encoding.c
new file mode 100644
index 000000000000..a441647aaea3
--- /dev/null
+++ b/lib/e2p/encoding.c
@@ -0,0 +1,97 @@
+/*
+ * encoding.c --- convert between encoding magic numbers and strings
+ *
+ * Copyright (C) 2018  Collabora Ltd.
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+
+#include "config.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <ctype.h>
+#include <errno.h>
+#include <stdio.h>
+
+#include "e2p.h"
+
+#define ARRAY_SIZE(array)			\
+        (sizeof(array) / sizeof(array[0]))
+
+static const struct {
+	char *name;
+	__u16 default_flags;
+} ext4_encoding_map[] = {
+	/* 0x0 */ { "ascii", 0},
+	/* 0x1 */ {"utf8-10.0", (EXT4_UTF8_NORMALIZATION_TYPE_NFKD |
+				 EXT4_UTF8_CASEFOLD_TYPE_NFKDCF)},
+};
+
+static const struct enc_flags {
+	__u16 flag;
+	char *param;
+} encoding_flags[] = {
+	{ EXT4_ENC_STRICT_MODE_FL, "strict" },
+};
+
+/* Return a positive number < 0xff indicating the encoding magic number
+ * or a negative value indicating error. */
+int e2p_str2encoding(const char *string)
+{
+	int i;
+
+	for (i = 0 ; i < ARRAY_SIZE(ext4_encoding_map); i++)
+		if (!strcmp(string, ext4_encoding_map[i].name))
+			return i;
+
+	return -EINVAL;
+}
+
+const char *e2p_encoding2str(int encoding)
+{
+	if (encoding < ARRAY_SIZE(ext4_encoding_map))
+		return ext4_encoding_map[encoding].name;
+	return NULL;
+}
+
+int e2p_get_encoding_flags(int encoding)
+{
+	if (encoding < ARRAY_SIZE(ext4_encoding_map))
+		return ext4_encoding_map[encoding].default_flags;
+	return 0;
+}
+
+int e2p_str2encoding_flags(int encoding, char *param, __u16 *flags)
+{
+	char *f = strtok(param, "-");
+	const struct enc_flags *fl;
+	int i, neg = 0;
+
+	while (f) {
+		neg = 0;
+		if (!strncmp("no", f, 2)) {
+			neg = 1;
+			f += 2;
+		}
+
+		for (i = 0; i < ARRAY_SIZE(encoding_flags); i++) {
+			fl = &encoding_flags[i];
+			if (!strcmp(fl->param, f)) {
+				if (neg)
+					*flags &= ~fl->flag;
+				else
+					*flags |= fl->flag;
+
+				goto next_flag;
+			}
+		}
+		return -EINVAL;
+	next_flag:
+		f = strtok(NULL, "-");
+	}
+	return 0;
+}
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index f1c405b76339..36ae7ae41c47 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -16,6 +16,7 @@
 #ifndef _LINUX_EXT2_FS_H
 #define _LINUX_EXT2_FS_H
 
+#include <stddef.h>
 #include <ext2fs/ext2_types.h>		/* Changed from linux/types.h */
 
 #ifndef __GNUC_PREREQ
@@ -1127,4 +1128,8 @@ struct mmp_struct {
  */
 #define EXT4_INLINE_DATA_DOTDOT_SIZE	(4)
 
+#define EXT4_ENC_STRICT_MODE_FL			(1 << 0) /* Reject invalid sequences */
+#define EXT4_UTF8_NORMALIZATION_TYPE_NFKD	(1 << 1)
+#define EXT4_UTF8_CASEFOLD_TYPE_NFKDCF		(1 << 4)
+
 #endif	/* _LINUX_EXT2_FS_H */
-- 
2.19.2


* [PATCH v3 02/12] mke2fs: Configure encoding during superblock initialization
  2018-11-26 22:19 [PATCH e2fsprogs v3 00/12] Support encoding awareness and casefold Gabriel Krisman Bertazi
  2018-11-26 22:19 ` [PATCH v3 01/12] libe2p: Helpers for configuring the encoding superblock fields Gabriel Krisman Bertazi
@ 2018-11-26 22:19 ` Gabriel Krisman Bertazi
  2018-11-26 22:19 ` [PATCH v3 03/12] chattr/lsattr: Support casefold attribute Gabriel Krisman Bertazi
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Gabriel Krisman Bertazi @ 2018-11-26 22:19 UTC (permalink / raw)
  To: tytso; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

From: Gabriel Krisman Bertazi <krisman@collabora.co.uk>

This patch implements two new extended options to mke2fs, allowing the
user to specify an encoding for file name operations and encoding flags
during filesystem creation.  We provide default flags for each encoding,
which the user can override by passing -E fname_encoding_flags to mkfs.

If the user doesn't specify an encoding, the default value from
options.fname_encoding in the mke2fs.conf file will be used.
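
The precedence described above (explicit option first, then the
configured default; default flags first, then user overrides) can be
sketched as follows.  This is an illustration under assumptions, not
the mke2fs code: both demo_ functions and their parameters are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Resolve which encoding name to use.  An explicit command-line value
 * wins; otherwise the configured default is used.  Returns NULL when
 * neither is set, which a caller would report as a usage error.
 */
const char *demo_resolve_encoding(const char *cli_value,
				  const char *conf_default)
{
	if (cli_value != NULL)
		return cli_value;
	return conf_default;
}

/*
 * Start from the encoding's default flags, then apply user overrides:
 * bits in `set` are turned on and bits in `clear` are turned off.
 */
unsigned demo_resolve_flags(unsigned defaults, unsigned set, unsigned clear)
{
	assert((set & clear) == 0);	/* a flag cannot be both set and cleared */
	return (defaults | set) & ~clear;
}
```

With the flag values from patch 01, the utf8-10.0 defaults are
(1 << 1) | (1 << 4) = 0x12, and adding "strict" yields 0x13.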

Changes since v2:
  - Rename feature encoding -> fname_encoding
  - Add default encoding option to mke2fs.conf.in
  - Fix behavior on -O fname_encoding
  - Prevent use of encrypt and encoding simultaneously
---
 lib/e2p/feature.c       |  2 +-
 lib/ext2fs/initialize.c |  4 ++
 misc/mke2fs.c           | 83 ++++++++++++++++++++++++++++++++++++++++-
 misc/mke2fs.conf.in     |  3 ++
 4 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/lib/e2p/feature.c b/lib/e2p/feature.c
index 294a56a40b52..ded87f5611dc 100644
--- a/lib/e2p/feature.c
+++ b/lib/e2p/feature.c
@@ -110,7 +110,7 @@ static struct feature feature_list[] = {
 	{       E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_ENCRYPT,
 			"encrypt"},
 	{       E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_FNAME_ENCODING,
-			"encoding"},
+			"fname_encoding"},
 	{	0, 0, 0 },
 };
 
diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
index 8c9e97fee831..30b1ae033340 100644
--- a/lib/ext2fs/initialize.c
+++ b/lib/ext2fs/initialize.c
@@ -186,6 +186,10 @@ errcode_t ext2fs_initialize(const char *name, int flags,
 	set_field(s_flags, 0);
 	assign_field(s_backup_bgs[0]);
 	assign_field(s_backup_bgs[1]);
+
+	assign_field(s_encoding);
+	assign_field(s_encoding_flags);
+
 	if (super->s_feature_incompat & ~EXT2_LIB_FEATURE_INCOMPAT_SUPP) {
 		retval = EXT2_ET_UNSUPP_FEATURE;
 		goto cleanup;
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index f05003fc30b9..98b116ebaad2 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -790,6 +790,8 @@ static void parse_extended_opts(struct ext2_super_block *param,
 	int	len;
 	int	r_usage = 0;
 	int	ret;
+	int	encoding = -1;
+	char 	*encoding_flags = NULL;
 
 	len = strlen(opts);
 	buf = malloc(len+1);
@@ -1056,6 +1058,31 @@ static void parse_extended_opts(struct ext2_super_block *param,
 			}
 		} else if (!strcmp(token, "android_sparse")) {
 			android_sparse_file = 1;
+		} else if (!strcmp(token, "fname_encoding")) {
+			if (!arg) {
+				profile_get_string(profile, "options",
+						   "fname_encoding", 0, 0,
+						   &arg);
+				if (!arg) {
+					r_usage++;
+					continue;
+				}
+			}
+
+			encoding = e2p_str2encoding(arg);
+			if (encoding < 0) {
+				fprintf(stderr, _("Invalid encoding: %s"), arg);
+				r_usage++;
+				continue;
+			}
+			param->s_encoding = encoding;
+			ext2fs_set_feature_fname_encoding(param);
+		} else if (!strcmp(token, "fname_encoding_flags")) {
+			if (!arg) {
+				r_usage++;
+				continue;
+			}
+			encoding_flags = arg;
 		} else {
 			r_usage++;
 			badopt = token;
@@ -1080,6 +1107,8 @@ static void parse_extended_opts(struct ext2_super_block *param,
 			"\ttest_fs\n"
 			"\tdiscard\n"
 			"\tnodiscard\n"
+			"\tfname_encoding=<encoding>\n"
+			"\tfname_encoding_flags=<flags>\n"
 			"\tquotatype=<quota type(s) to be enabled>\n\n"),
 			badopt ? badopt : "");
 		free(buf);
@@ -1091,6 +1120,25 @@ static void parse_extended_opts(struct ext2_super_block *param,
 				  "multiple of stride %u.\n\n"),
 			param->s_raid_stripe_width, param->s_raid_stride);
 
+	if (ext2fs_has_feature_fname_encoding(param)) {
+		param->s_encoding_flags =
+			e2p_get_encoding_flags(param->s_encoding);
+
+		if (encoding_flags &&
+		    e2p_str2encoding_flags(param->s_encoding, encoding_flags,
+					   &param->s_encoding_flags)) {
+			fprintf(stderr, _("error: Invalid encoding flag: %s\n"),
+				encoding_flags);
+			free(buf);
+			exit(1);
+		}
+	} else if (encoding_flags) {
+		fprintf(stderr, _("error: An encoding must be explicitly "
+				  "specified when passing encoding-flags\n"));
+		free(buf);
+		exit(1);
+	}
+
 	free(buf);
 }
 
@@ -1112,6 +1160,7 @@ static __u32 ok_features[3] = {
 		EXT4_FEATURE_INCOMPAT_64BIT|
 		EXT4_FEATURE_INCOMPAT_INLINE_DATA|
 		EXT4_FEATURE_INCOMPAT_ENCRYPT |
+		EXT4_FEATURE_INCOMPAT_FNAME_ENCODING |
 		EXT4_FEATURE_INCOMPAT_CSUM_SEED |
 		EXT4_FEATURE_INCOMPAT_LARGEDIR,
 	/* R/O compat */
@@ -1518,6 +1567,8 @@ static void PRS(int argc, char *argv[])
 	int		use_bsize;
 	char		*newpath;
 	int		pathlen = sizeof(PATH_SET) + 1;
+	char		*encoding_name = NULL;
+	int		encoding;
 
 	if (oldpath)
 		pathlen += strlen(oldpath);
@@ -2026,6 +2077,7 @@ profile_error:
 		ext2fs_clear_feature_huge_file(&fs_param);
 		ext2fs_clear_feature_metadata_csum(&fs_param);
 		ext2fs_clear_feature_ea_inode(&fs_param);
+		ext2fs_clear_feature_fname_encoding(&fs_param);
 	}
 	edit_feature(fs_features ? fs_features : tmp,
 		     &fs_param.s_feature_compat);
@@ -2341,6 +2393,26 @@ profile_error:
 	if (packed_meta_blocks)
 		journal_location = 0;
 
+	if (ext2fs_has_feature_fname_encoding(&fs_param)) {
+		profile_get_string(profile, "options", "fname_encoding",
+				   0, 0, &encoding_name);
+		if (!encoding_name) {
+			com_err(program_name, 0, "%s",
+				_("Filename encoding type must be specified\n"
+				  "Use -E fname_encoding=<name> instead"));
+			exit(1);
+		}
+		encoding = e2p_str2encoding(encoding_name);
+		if (encoding < 0) {
+			com_err(program_name, 0, "%s",
+				_("Unknown default filename encoding\n"
+				  "Use -E fname_encoding=<name> instead"));
+			exit(1);
+		}
+		fs_param.s_encoding = encoding;
+		fs_param.s_encoding_flags = e2p_get_encoding_flags(encoding);
+	}
+
 	/* Get options from profile */
 	for (cpp = fs_types; *cpp; cpp++) {
 		tmp = NULL;
@@ -2385,6 +2457,15 @@ profile_error:
 		}
 	}
 
+	if (ext2fs_has_feature_fname_encoding(&fs_param) &&
+	    ext2fs_has_feature_encrypt(&fs_param)) {
+		com_err(program_name, 0, "%s",
+			_("The encrypt and encoding features are not "
+			  "compatible.\nThey can not be both enabled "
+			  "simultaneously.\n"));
+		      exit (1);
+	}
+
 	/* Don't allow user to set both metadata_csum and uninit_bg bits. */
 	if (ext2fs_has_feature_metadata_csum(&fs_param) &&
 	    ext2fs_has_feature_gdt_csum(&fs_param))
@@ -2393,7 +2474,7 @@ profile_error:
 	/* Can't support bigalloc feature without extents feature */
 	if (ext2fs_has_feature_bigalloc(&fs_param) &&
 	    !ext2fs_has_feature_extents(&fs_param)) {
-		com_err(program_name, 0, "%s",
+		com_err(program_name, 0,
 			_("Can't support bigalloc feature without "
 			  "extents feature"));
 		exit(1);
diff --git a/misc/mke2fs.conf.in b/misc/mke2fs.conf.in
index 01e35cf83150..3330dbc810bb 100644
--- a/misc/mke2fs.conf.in
+++ b/misc/mke2fs.conf.in
@@ -45,3 +45,6 @@
 	     blocksize = 4096
 	     inode_size = 128
 	}
+
+[options]
+fname_encoding = utf8-10.0
-- 
2.19.2


* [PATCH v3 03/12] chattr/lsattr: Support casefold attribute
  2018-11-26 22:19 [PATCH e2fsprogs v3 00/12] Support encoding awareness and casefold Gabriel Krisman Bertazi
  2018-11-26 22:19 ` [PATCH v3 01/12] libe2p: Helpers for configuring the encoding superblock fields Gabriel Krisman Bertazi
  2018-11-26 22:19 ` [PATCH v3 02/12] mke2fs: Configure encoding during superblock initialization Gabriel Krisman Bertazi
@ 2018-11-26 22:19 ` Gabriel Krisman Bertazi
  2018-11-26 22:19 ` [PATCH v3 04/12] lib/ext2fs: Implement NLS support Gabriel Krisman Bertazi
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Gabriel Krisman Bertazi @ 2018-11-26 22:19 UTC (permalink / raw)
  To: tytso; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

From: Gabriel Krisman Bertazi <krisman@collabora.co.uk>

This flag can be set on directories to request case-insensitive file name
lookups.

I used the letter 'F', referring to "caseFold" for lack of a better
option.
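
The mask update in the ext2_fs.h hunk of this patch can be checked by
hand: EXT4_CASEFOLD_FL is bit 30 (0x40000000), and OR-ing it into the
old user-visible and user-modifiable masks gives the new values.  A
minimal sketch of that arithmetic; the demo_ names are hypothetical and
only the constants come from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CASEFOLD_FL		UINT32_C(0x40000000)	/* bit 30 */
#define DEMO_OLD_USER_VISIBLE		UINT32_C(0x204BDFFF)
#define DEMO_OLD_USER_MODIFIABLE	UINT32_C(0x204B80FF)

/* Add a flag bit to a mask, insisting that the bit was previously unused. */
uint32_t demo_add_flag(uint32_t mask, uint32_t flag)
{
	assert((mask & flag) == 0);
	return mask | flag;
}
```

demo_add_flag(DEMO_OLD_USER_VISIBLE, DEMO_CASEFOLD_FL) evaluates to
0x604BDFFF and the modifiable mask becomes 0x604B80FF, matching the
new defines.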

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
---
 lib/e2p/pf.c         | 1 +
 lib/ext2fs/ext2_fs.h | 5 +++--
 misc/chattr.c        | 3 ++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/lib/e2p/pf.c b/lib/e2p/pf.c
index 884f1671edae..0c6998c4b766 100644
--- a/lib/e2p/pf.c
+++ b/lib/e2p/pf.c
@@ -44,6 +44,7 @@ static struct flags_name flags_array[] = {
 	{ EXT2_TOPDIR_FL, "T", "Top_of_Directory_Hierarchies" },
 	{ EXT4_EXTENTS_FL, "e", "Extents" },
 	{ FS_NOCOW_FL, "C", "No_COW" },
+	{ EXT4_CASEFOLD_FL, "F", "Casefold" },
 	{ EXT4_INLINE_DATA_FL, "N", "Inline_Data" },
 	{ EXT4_PROJINHERIT_FL, "P", "Project_Hierarchy" },
 	{ EXT4_VERITY_FL, "V", "Verity" },
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index 36ae7ae41c47..032f83ed5ed3 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -340,10 +340,11 @@ struct ext2_dx_tail {
 #define EXT4_SNAPFILE_SHRUNK_FL		0x08000000  /* Snapshot shrink has completed */
 #define EXT4_INLINE_DATA_FL		0x10000000 /* Inode has inline data */
 #define EXT4_PROJINHERIT_FL		0x20000000 /* Create with parents projid */
+#define EXT4_CASEFOLD_FL		0x40000000 /* Casefolded file */
 #define EXT2_RESERVED_FL		0x80000000 /* reserved for ext2 lib */
 
-#define EXT2_FL_USER_VISIBLE		0x204BDFFF /* User visible flags */
-#define EXT2_FL_USER_MODIFIABLE		0x204B80FF /* User modifiable flags */
+#define EXT2_FL_USER_VISIBLE		0x604BDFFF /* User visible flags */
+#define EXT2_FL_USER_MODIFIABLE		0x604B80FF /* User modifiable flags */
 
 /*
  * ioctl commands
diff --git a/misc/chattr.c b/misc/chattr.c
index a5b401a741b7..a5d60170bdb6 100644
--- a/misc/chattr.c
+++ b/misc/chattr.c
@@ -86,7 +86,7 @@ static unsigned long sf;
 static void usage(void)
 {
 	fprintf(stderr,
-		_("Usage: %s [-pRVf] [-+=aAcCdDeijPsStTu] [-v version] files...\n"),
+		_("Usage: %s [-pRVf] [-+=aAcCdDeijPsStTuF] [-v version] files...\n"),
 		program_name);
 	exit(1);
 }
@@ -112,6 +112,7 @@ static const struct flags_char flags_array[] = {
 	{ EXT2_NOTAIL_FL, 't' },
 	{ EXT2_TOPDIR_FL, 'T' },
 	{ FS_NOCOW_FL, 'C' },
+	{ EXT4_CASEFOLD_FL, 'F' },
 	{ 0, 0 }
 };
 
-- 
2.19.2


* [PATCH v3 04/12] lib/ext2fs: Implement NLS support
  2018-11-26 22:19 [PATCH e2fsprogs v3 00/12] Support encoding awareness and casefold Gabriel Krisman Bertazi
                   ` (2 preceding siblings ...)
  2018-11-26 22:19 ` [PATCH v3 03/12] chattr/lsattr: Support casefold attribute Gabriel Krisman Bertazi
@ 2018-11-26 22:19 ` Gabriel Krisman Bertazi
  2018-11-30 15:54   ` Theodore Y. Ts'o
  2018-11-26 22:19 ` [PATCH v3 05/12] lib/ext2fs: Support encoding when calculating dx hashes Gabriel Krisman Bertazi
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Gabriel Krisman Bertazi @ 2018-11-26 22:19 UTC (permalink / raw)
  To: tytso; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

From: Gabriel Krisman Bertazi <krisman@collabora.co.uk>

Basic NLS support is required in e2fsprogs because of fsck, which
needs to calculate dx hashes for encoding-aware filesystems.  This patch
implements this infrastructure as well as ASCII support.

We don't need to do all the dance of versioning as we do in the kernel,
because we know beforehand which encodings and versions we
support (those we know how to store in the sb), so it is simpler just to
create static tables.
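
The static-table approach can be sketched on its own.  This is a
simplified illustration, not the nls.h from this patch: the demo_
names are hypothetical, the ops are flattened into the table, and the
destination bounds check is an addition made for the sketch.  As in
the patch, ASCII casefolding maps to upper case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_table {
	const char *name;
	/* Fold str[0..len) into dest; returns bytes written, or -1 on overflow. */
	int (*casefold)(const unsigned char *str, size_t len,
			unsigned char *dest, size_t dlen);
};

static int demo_ascii_casefold(const unsigned char *str, size_t len,
			       unsigned char *dest, size_t dlen)
{
	size_t i;

	if (len > dlen)
		return -1;	/* bounds check added for the sketch */
	for (i = 0; i < len; i++) {
		unsigned char c = str[i];

		dest[i] = (c >= 'a' && c <= 'z') ?
			(unsigned char)(c & ~0x20) : c;
	}
	return (int)len;
}

static const struct demo_table demo_ascii = { "ascii", demo_ascii_casefold };

/* Every supported encoding is known at build time, so a fixed array suffices. */
static const struct demo_table *const demo_tables[] = { &demo_ascii };

const struct demo_table *demo_load_table(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(demo_tables) / sizeof(demo_tables[0]); i++)
		if (strcmp(demo_tables[i]->name, name) == 0)
			return demo_tables[i];
	return NULL;
}
```

Adding another encoding then means appending one entry to the array,
with no registration or module loading step.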

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
---
 lib/ext2fs/Makefile.in | 10 +++++--
 lib/ext2fs/nls.h       | 65 ++++++++++++++++++++++++++++++++++++++++++
 lib/ext2fs/nls_ascii.c | 48 +++++++++++++++++++++++++++++++
 3 files changed, 121 insertions(+), 2 deletions(-)
 create mode 100644 lib/ext2fs/nls.h
 create mode 100644 lib/ext2fs/nls_ascii.c

diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index 4a197cdf4e4a..a2f07403c9ae 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -20,6 +20,9 @@ COMPILE_ET=	_ET_DIR_OVERRIDE=$(srcdir)/../et ../et/compile_et
 @TEST_IO_CMT@TEST_IO_LIB_OBJS = test_io.o
 @IMAGER_CMT@E2IMAGE_LIB_OBJS = imager.o
 
+NLS_OBJS=nls_ascii.o
+NLS_SRCS=nls_ascii.c
+
 DEBUG_OBJS= debug_cmds.o extent_cmds.o tst_cmds.o debugfs.o util.o \
 	ncheck.o icheck.o ls.o lsdel.o dump.o set_fields.o logdump.o \
 	htree.o unused.o e2freefrag.o filefrag.o extent_inode.o zap.o \
@@ -130,7 +133,8 @@ OBJS= $(DEBUGFS_LIB_OBJS) $(RESIZE_LIB_OBJS) $(E2IMAGE_LIB_OBJS) \
 	unlink.o \
 	valid_blk.o \
 	version.o \
-	rbtree.o
+	rbtree.o \
+	$(NLS_OBJS)
 
 SRCS= ext2_err.c \
 	$(srcdir)/alloc.c \
@@ -222,7 +226,8 @@ SRCS= ext2_err.c \
 	$(srcdir)/write_bb_file.c \
 	$(srcdir)/rbtree.c \
 	$(srcdir)/tst_libext2fs.c \
-	$(DEBUG_SRCS)
+	$(DEBUG_SRCS) \
+	$(NLS_SRCS)
 
 HFILES= bitops.h ext2fs.h ext2_io.h ext2_fs.h ext2_ext_attr.h ext3_extents.h \
 	tdb.h qcow2.h hashmap.h
@@ -1412,3 +1417,4 @@ do_journal.o: $(top_srcdir)/debugfs/do_journal.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/support/quotaio_tree.h $(srcdir)/kernel-jbd.h \
  $(srcdir)/jfs_compat.h $(srcdir)/kernel-list.h \
  $(top_srcdir)/debugfs/journal.h $(srcdir)/../../e2fsck/jfs_user.h
+$(NLS_OBJS): $(srcdir)/nls.h
diff --git a/lib/ext2fs/nls.h b/lib/ext2fs/nls.h
new file mode 100644
index 000000000000..b7f6ebcd3b25
--- /dev/null
+++ b/lib/ext2fs/nls.h
@@ -0,0 +1,65 @@
+/*
+ * nls.h - Header for encoding support functions
+ *
+ * Copyright (C) 2017 Collabora Ltd.
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 3 of the License, or (at
+ *  your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef EXT2FS_NLS_H
+#define EXT2FS_NLS_H
+
+#include <unistd.h>
+#include <string.h>
+#include <stdio.h>
+
+struct nls_table;
+
+#define ARRAY_SIZE(array)			\
+        (sizeof(array) / sizeof(array[0]))
+
+struct nls_ops {
+	int (*normalize)(const struct nls_table *charset,
+			 const unsigned char *str, size_t len,
+			 unsigned char *dest, size_t dlen);
+
+	int (*casefold)(const struct nls_table *charset,
+			const unsigned char *str, size_t len,
+			unsigned char *dest, size_t dlen);
+};
+
+struct nls_table {
+	char *name;
+	const struct nls_ops *ops;
+};
+
+extern const struct nls_table nls_ascii;
+
+static const struct nls_table *encoding_list[] = {
+	&nls_ascii
+};
+
+static const struct nls_table *nls_load_table(const char *name)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(encoding_list); i++) {
+		if (strcmp(encoding_list[i]->name, name) == 0)
+			return encoding_list[i];
+	}
+	return NULL;
+}
+
+#endif
diff --git a/lib/ext2fs/nls_ascii.c b/lib/ext2fs/nls_ascii.c
new file mode 100644
index 000000000000..22e819849f3a
--- /dev/null
+++ b/lib/ext2fs/nls_ascii.c
@@ -0,0 +1,48 @@
+#include "nls.h"
+#include <string.h>
+
+static unsigned char charset_tolower(const struct nls_table *table,
+				     unsigned int c)
+{
+	if (c >= 'A' && c <= 'Z')
+		return (c | 0x20);
+	return c;
+}
+
+static unsigned char charset_toupper(const struct nls_table *table,
+				     unsigned int c)
+{
+	if (c >= 'a' && c <= 'z')
+		return (c & ~0x20);
+	return c;
+}
+
+static int ascii_casefold(const struct nls_table *table,
+			  const unsigned char *str, size_t len,
+			  unsigned char *dest, size_t dlen)
+{
+	unsigned i;
+
+	for (i = 0; i < len; i++)
+		dest[i] = charset_toupper(table, str[i]);
+
+	return len;
+}
+
+static int ascii_normalize(const struct nls_table *table,
+			   const unsigned char *str, size_t len,
+			   unsigned char *dest, size_t dlen)
+{
+	memcpy(dest, str, len);
+	return len;
+}
+
+const static struct nls_ops ascii_ops = {
+	.casefold = ascii_casefold,
+	.normalize = ascii_normalize,
+};
+
+const struct nls_table nls_ascii = {
+	.name = "ascii",
+	.ops = &ascii_ops,
+};
-- 
2.19.2


* [PATCH v3 05/12] lib/ext2fs: Support encoding when calculating dx hashes
  2018-11-26 22:19 [PATCH e2fsprogs v3 00/12] Support encoding awareness and casefold Gabriel Krisman Bertazi
                   ` (3 preceding siblings ...)
  2018-11-26 22:19 ` [PATCH v3 04/12] lib/ext2fs: Implement NLS support Gabriel Krisman Bertazi
@ 2018-11-26 22:19 ` Gabriel Krisman Bertazi
  2018-11-26 22:19 ` [PATCH v3 06/12] debugfs/htree: Support encoding when printing the file hash Gabriel Krisman Bertazi
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Gabriel Krisman Bertazi @ 2018-11-26 22:19 UTC (permalink / raw)
  To: tytso; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

From: Gabriel Krisman Bertazi <krisman@collabora.co.uk>

fsck must be aware of the superblock encoding and the casefold directory
setting, such that it is able to correctly calculate the dentry hashes.
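
Why the hash depends on the casefold setting can be illustrated with a
toy example.  This is only a sketch under assumptions: FNV-1a stands in
for the real directory hash functions, the folding is ASCII-only, and
all demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, used here only as a stand-in for the real dx hash. */
static uint32_t demo_hash(const unsigned char *s, size_t len)
{
	uint32_t h = UINT32_C(2166136261);
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= s[i];
		h *= UINT32_C(16777619);
	}
	return h;
}

/*
 * Hash a name for a directory.  When the directory is casefolded, the
 * name is folded before hashing, so names that differ only in case get
 * the same hash.  Names longer than the buffer are rejected with -1.
 */
int demo_dirhash(int casefolded, const char *name, size_t len,
		 uint32_t *hash)
{
	unsigned char buf[255];
	size_t i;

	assert(name != NULL && hash != NULL);
	if (len > sizeof(buf))
		return -1;
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		if (casefolded && c >= 'a' && c <= 'z')
			c = (unsigned char)(c & ~0x20);
		buf[i] = c;
	}
	*hash = demo_hash(buf, len);
	return 0;
}
```

A checker that hashed the raw bytes of "Foo" in a casefolded directory
would compute a different value from the one stored in the index and
would wrongly flag the entry, which is why fsck needs both the
superblock encoding and the per-directory flag.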

Changes since V2:
  - Don't modify dirhash symbol

Changes since V1:
  - Abort if encoding is invalid.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
---
 e2fsck/Makefile.in     |  7 +++---
 e2fsck/dx_dirinfo.c    |  4 +++-
 e2fsck/e2fsck.h        |  4 +++-
 e2fsck/pass1.c         |  3 ++-
 e2fsck/pass2.c         | 11 ++++++---
 e2fsck/rehash.c        | 20 +++++++++-------
 e2fsck/unix.c          | 18 +++++++++++++++
 lib/ext2fs/Makefile.in |  3 ++-
 lib/ext2fs/dirhash.c   | 52 ++++++++++++++++++++++++++++++++++++++++++
 lib/ext2fs/ext2fs.h    |  8 +++++++
 10 files changed, 112 insertions(+), 18 deletions(-)

diff --git a/e2fsck/Makefile.in b/e2fsck/Makefile.in
index 676ab7ddcc1d..9799274fa74e 100644
--- a/e2fsck/Makefile.in
+++ b/e2fsck/Makefile.in
@@ -293,7 +293,8 @@ pass1.o: $(srcdir)/pass1.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/support/profile.h \
  $(top_builddir)/lib/support/prof_err.h $(top_srcdir)/lib/support/quotaio.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
- $(top_srcdir)/lib/support/quotaio_tree.h $(srcdir)/problem.h
+ $(top_srcdir)/lib/support/quotaio_tree.h $(srcdir)/problem.h \
+ $(top_srcdir)/lib/ext2fs/nls.h
 pass1b.o: $(srcdir)/pass1b.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(top_srcdir)/lib/et/com_err.h \
  $(srcdir)/e2fsck.h $(top_srcdir)/lib/ext2fs/ext2_fs.h \
@@ -317,7 +318,7 @@ pass2.o: $(srcdir)/pass2.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/support/prof_err.h $(top_srcdir)/lib/support/quotaio.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h $(srcdir)/problem.h \
- $(top_srcdir)/lib/support/dict.h
+ $(top_srcdir)/lib/support/dict.h $(top_srcdir)/lib/ext2fs/nls.h
 pass3.o: $(srcdir)/pass3.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/e2fsck.h \
  $(top_srcdir)/lib/ext2fs/ext2_fs.h $(top_builddir)/lib/ext2fs/ext2_types.h \
@@ -416,7 +417,7 @@ unix.o: $(srcdir)/unix.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/et/com_err.h $(top_srcdir)/lib/support/plausible.h \
  $(srcdir)/e2fsck.h $(top_srcdir)/lib/ext2fs/ext2fs.h \
  $(top_srcdir)/lib/ext2fs/ext3_extents.h $(top_srcdir)/lib/ext2fs/ext2_io.h \
- $(top_builddir)/lib/ext2fs/ext2_err.h \
+ $(top_builddir)/lib/ext2fs/ext2_err.h  $(top_srcdir)/lib/ext2fs/nls.h \
  $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
  $(top_srcdir)/lib/ext2fs/bitops.h $(top_srcdir)/lib/support/profile.h \
  $(top_builddir)/lib/support/prof_err.h $(top_srcdir)/lib/support/quotaio.h \
diff --git a/e2fsck/dx_dirinfo.c b/e2fsck/dx_dirinfo.c
index c7b605685339..c0b0e9a41235 100644
--- a/e2fsck/dx_dirinfo.c
+++ b/e2fsck/dx_dirinfo.c
@@ -13,7 +13,8 @@
  * entry.  During pass1, the passed-in parent is 0; it will get filled
  * in during pass2.
  */
-void e2fsck_add_dx_dir(e2fsck_t ctx, ext2_ino_t ino, int num_blocks)
+void e2fsck_add_dx_dir(e2fsck_t ctx, ext2_ino_t ino, struct ext2_inode *inode,
+		       int num_blocks)
 {
 	struct dx_dir_info *dir;
 	int		i, j;
@@ -72,6 +73,7 @@ void e2fsck_add_dx_dir(e2fsck_t ctx, ext2_ino_t ino, int num_blocks)
 	dir->ino = ino;
 	dir->numblocks = num_blocks;
 	dir->hashversion = 0;
+	dir->casefolded_hash = inode->i_flags & EXT4_CASEFOLD_FL;
 	dir->dx_block = e2fsck_allocate_memory(ctx, num_blocks
 				       * sizeof (struct dx_dirblock_info),
 				       "dx_block info array");
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index cd5cba2f6031..1c7a67cba1ce 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -109,6 +109,7 @@ struct dx_dir_info {
 	int			hashversion;
 	short			depth;		/* depth of tree */
 	struct dx_dirblock_info	*dx_block; 	/* Array of size numblocks */
+	int			casefolded_hash;
 };
 
 #define DX_DIRBLOCK_ROOT	1
@@ -471,7 +472,8 @@ extern int e2fsck_dir_info_get_dotdot(e2fsck_t ctx, ext2_ino_t ino,
 				      ext2_ino_t *dotdot);
 
 /* dx_dirinfo.c */
-extern void e2fsck_add_dx_dir(e2fsck_t ctx, ext2_ino_t ino, int num_blocks);
+extern void e2fsck_add_dx_dir(e2fsck_t ctx, ext2_ino_t ino,
+			      struct ext2_inode *inode, int num_blocks);
 extern struct dx_dir_info *e2fsck_get_dx_dir_info(e2fsck_t ctx, ext2_ino_t ino);
 extern void e2fsck_free_dx_dir_info(e2fsck_t ctx);
 extern int e2fsck_get_num_dx_dirinfo(e2fsck_t ctx);
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 8abf0c33a1d3..16ebec18db6f 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -48,6 +48,7 @@
 
 #include "e2fsck.h"
 #include <ext2fs/ext2_ext_attr.h>
+#include <e2p/e2p.h>
 
 #include "problem.h"
 
@@ -3381,7 +3382,7 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
 			inode->i_flags &= ~EXT2_INDEX_FL;
 			dirty_inode++;
 		} else {
-			e2fsck_add_dx_dir(ctx, ino, pb.last_block+1);
+			e2fsck_add_dx_dir(ctx, ino, inode, pb.last_block+1);
 		}
 	}
 
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index b92eec1e149f..a7d9c47dbe8e 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -933,6 +933,7 @@ static int check_dir_block(ext2_filsys fs,
 	int	filetype = 0;
 	int	encrypted = 0;
 	size_t	max_block_size;
+	int	hash_flags = 0;
 
 	cd = (struct check_dir_struct *) priv_data;
 	ibuf = buf = cd->buf;
@@ -1426,9 +1427,13 @@ skip_checksum:
 			dir_modified++;
 
 		if (dx_db) {
-			ext2fs_dirhash(dx_dir->hashversion, dirent->name,
-				       ext2fs_dirent_name_len(dirent),
-				       fs->super->s_hash_seed, &hash, 0);
+			if (dx_dir->casefolded_hash)
+				hash_flags = EXT4_CASEFOLD_FL;
+
+			ext2fs_dirhash2(dx_dir->hashversion, dirent->name,
+					ext2fs_dirent_name_len(dirent),
+					fs->encoding, hash_flags,
+					fs->super->s_hash_seed, &hash, 0);
 			if (hash < dx_db->min_hash)
 				dx_db->min_hash = hash;
 			if (hash > dx_db->max_hash)
diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index 7c4ab0836482..a5fc1be1a210 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -113,7 +113,7 @@ static int fill_dir_block(ext2_filsys fs,
 	struct ext2_dir_entry 	*dirent;
 	char			*dir;
 	unsigned int		offset, dir_offset, rec_len, name_len;
-	int			hash_alg;
+	int			hash_alg, hash_flags;
 
 	if (blockcnt < 0)
 		return 0;
@@ -139,6 +139,7 @@ static int fill_dir_block(ext2_filsys fs,
 		if (fd->err)
 			return BLOCK_ABORT;
 	}
+	hash_flags = fd->inode->i_flags & EXT4_CASEFOLD_FL;
 	hash_alg = fs->super->s_def_hash_version;
 	if ((hash_alg <= EXT2_HASH_TEA) &&
 	    (fs->super->s_flags & EXT2_FLAGS_UNSIGNED_HASH))
@@ -184,10 +185,11 @@ static int fill_dir_block(ext2_filsys fs,
 		if (fd->compress)
 			ent->hash = ent->minor_hash = 0;
 		else {
-			fd->err = ext2fs_dirhash(hash_alg, dirent->name,
-						 name_len,
-						 fs->super->s_hash_seed,
-						 &ent->hash, &ent->minor_hash);
+			fd->err = ext2fs_dirhash2(hash_alg,
+						  dirent->name, name_len,
+						  fs->encoding, hash_flags,
+						  fs->super->s_hash_seed,
+						  &ent->hash, &ent->minor_hash);
 			if (fd->err)
 				return BLOCK_ABORT;
 		}
@@ -371,6 +373,7 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
 	char			new_name[256];
 	unsigned int		new_len;
 	int			hash_alg;
+	int hash_flags = fd->inode->i_flags & EXT4_CASEFOLD_FL;
 
 	clear_problem_context(&pctx);
 	pctx.ino = ino;
@@ -415,9 +418,10 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
 		if (fix_problem(ctx, PR_2_NON_UNIQUE_FILE, &pctx)) {
 			memcpy(ent->dir->name, new_name, new_len);
 			ext2fs_dirent_set_name_len(ent->dir, new_len);
-			ext2fs_dirhash(hash_alg, new_name, new_len,
-				       fs->super->s_hash_seed,
-				       &ent->hash, &ent->minor_hash);
+			ext2fs_dirhash2(hash_alg, new_name, new_len,
+					fs->encoding, hash_flags,
+					fs->super->s_hash_seed,
+					&ent->hash, &ent->minor_hash);
 			fixed++;
 		}
 	}
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index 2df22b17146f..bb610af0956f 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -55,6 +55,7 @@ extern int optind;
 #include "problem.h"
 #include "jfs_user.h"
 #include "../version.h"
+#include <ext2fs/nls.h>
 
 /* Command line options */
 static int cflag;		/* check disk */
@@ -1381,6 +1382,7 @@ int main (int argc, char *argv[])
 	int old_bitmaps;
 	__u32 features[3];
 	char *cp;
+	const char *encoding_name;
 	enum quota_type qtype;
 
 	clear_problem_context(&pctx);
@@ -1784,6 +1786,22 @@ print_unsupp_features:
 		goto get_newer;
 	}
 
+	if (ext2fs_has_feature_fname_encoding(sb)) {
+		encoding_name = e2p_encoding2str(sb->s_encoding);
+		if (!encoding_name) {
+			log_err(ctx, _("%s has unknown encoding: 0x%X\n"),
+				ctx->filesystem_name, sb->s_encoding);
+			goto get_newer;
+		}
+
+		fs->encoding = nls_load_table(encoding_name);
+		if (!fs->encoding) {
+			log_err(ctx, _("%s has unsupported encoding: %s\n"),
+				ctx->filesystem_name, encoding_name);
+			goto get_newer;
+		}
+	}
+
 	/*
 	 * If the user specified a specific superblock, presumably the
 	 * master superblock has been trashed.  So we mark the
diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index a2f07403c9ae..b756bbdf35a5 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -779,7 +779,8 @@ dirhash.o: $(srcdir)/dirhash.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fs.h \
  $(srcdir)/ext2_fs.h $(srcdir)/ext3_extents.h $(top_srcdir)/lib/et/com_err.h \
  $(srcdir)/ext2_io.h $(top_builddir)/lib/ext2fs/ext2_err.h \
- $(srcdir)/ext2_ext_attr.h $(srcdir)/hashmap.h $(srcdir)/bitops.h
+ $(srcdir)/ext2_ext_attr.h $(srcdir)/hashmap.h $(srcdir)/bitops.h \
+ $(srcdir)/nls.h
 dir_iterate.o: $(srcdir)/dir_iterate.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(srcdir)/ext2_fs.h \
  $(top_builddir)/lib/ext2fs/ext2_types.h $(srcdir)/ext2fsP.h \
diff --git a/lib/ext2fs/dirhash.c b/lib/ext2fs/dirhash.c
index 4ba3f35c091f..7e1cb9f5b514 100644
--- a/lib/ext2fs/dirhash.c
+++ b/lib/ext2fs/dirhash.c
@@ -14,9 +14,11 @@
 #include "config.h"
 #include <stdio.h>
 #include <string.h>
+#include <limits.h>
 
 #include "ext2_fs.h"
 #include "ext2fs.h"
+#include "nls.h"
 
 /*
  * Keyed 32-bit hash function using TEA in a Davis-Meyer function
@@ -184,6 +186,11 @@ static void str2hashbuf(const char *msg, int len, __u32 *buf, int num,
  * A particular hash version specifies whether or not the seed is
  * represented, and whether or not the returned hash is 32 bits or 64
  * bits.  32 bit hashes will return 0 for the minor hash.
+ *
+ * This function doesn't do any normalization or casefolding of the
+ * input string.  To take charset encoding into account, use
+ * ext2fs_dirhash2.
+ *
  */
 errcode_t ext2fs_dirhash(int version, const char *name, int len,
 			 const __u32 *seed,
@@ -257,3 +264,48 @@ errcode_t ext2fs_dirhash(int version, const char *name, int len,
 		*ret_minor_hash = minor_hash;
 	return 0;
 }
+
+/*
+ * Returns the hash of a filename considering normalization and
+ * casefolding.  This is a wrapper around ext2fs_dirhash with string
+ * encoding support based on the nls_table and the flags. Check
+ * ext2fs_dirhash for documentation on the input and output parameters.
+ */
+errcode_t ext2fs_dirhash2(int version, const char *name, int len,
+			  const struct nls_table *charset, int hash_flags,
+			  const __u32 *seed,
+			  ext2_dirhash_t *ret_hash,
+			  ext2_dirhash_t *ret_minor_hash)
+{
+	errcode_t r;
+	int dlen;
+	unsigned char *buff;
+
+	if (len && charset) {
+		buff = calloc(sizeof (char), PATH_MAX);
+		if (!buff)
+			return -1;
+
+		if (hash_flags & EXT4_CASEFOLD_FL)
+			dlen = charset->ops->casefold(charset, name, len, buff,
+						  PATH_MAX);
+		else
+			dlen = charset->ops->normalize(charset, name, len, buff,
+						  PATH_MAX);
+
+		if (dlen < 0) {
+			free(buff);
+			goto opaque_seq;
+		}
+
+		r = ext2fs_dirhash(version, buff, dlen, seed, ret_hash,
+				   ret_minor_hash);
+
+		free(buff);
+		return r;
+	}
+
+opaque_seq:
+	return ext2fs_dirhash(version, name, len, seed, ret_hash,
+			      ret_minor_hash);
+}
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 64c5b8758a40..f7760e579508 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -307,6 +307,8 @@ struct struct_ext2_filsys {
 
 	/* hashmap for SHA of data blocks */
 	struct ext2fs_hashmap* block_sha_map;
+
+	const struct nls_table *encoding;
 };
 
 #if EXT2_FLAT_INCLUDES
@@ -1174,6 +1176,12 @@ extern errcode_t ext2fs_dirhash(int version, const char *name, int len,
 				ext2_dirhash_t *ret_hash,
 				ext2_dirhash_t *ret_minor_hash);
 
+extern errcode_t ext2fs_dirhash2(int version, const char *name, int len,
+				 const struct nls_table *charset,
+				 int hash_flags,
+				 const __u32 *seed,
+				 ext2_dirhash_t *ret_hash,
+				 ext2_dirhash_t *ret_minor_hash);
 
 /* dir_iterate.c */
 extern errcode_t ext2fs_get_rec_len(ext2_filsys fs,
-- 
2.19.2

* [PATCH v3 06/12] debugfs/htree: Support encoding when printing the file hash
  2018-11-26 22:19 [PATCH e2fsprogs v3 00/12] Support encoding awareness and casefold Gabriel Krisman Bertazi
                   ` (4 preceding siblings ...)
  2018-11-26 22:19 ` [PATCH v3 05/12] lib/ext2fs: Support encoding when calculating dx hashes Gabriel Krisman Bertazi
@ 2018-11-26 22:19 ` Gabriel Krisman Bertazi
  2018-11-26 22:19 ` [PATCH v3 07/12] tune2fs: Prevent enabling encryption flag on encoding-aware fs Gabriel Krisman Bertazi
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Gabriel Krisman Bertazi @ 2018-11-26 22:19 UTC (permalink / raw)
  To: tytso; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

From: Gabriel Krisman Bertazi <krisman@collabora.co.uk>

Implement two parameters -e and -c, to specify encoding and casefold
when printing the hash of a given file.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
---
 debugfs/Makefile.in    |  1 +
 debugfs/htree.c        | 30 +++++++++++++++++++++++-------
 lib/ext2fs/Makefile.in |  2 +-
 3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/debugfs/Makefile.in b/debugfs/Makefile.in
index bb4d1947b33b..bc59f5f97513 100644
--- a/debugfs/Makefile.in
+++ b/debugfs/Makefile.in
@@ -287,6 +287,7 @@ htree.o: $(srcdir)/htree.c $(top_builddir)/lib/config.h \
  $(top_srcdir)/lib/ext2fs/ext2_io.h $(top_builddir)/lib/ext2fs/ext2_err.h \
  $(top_srcdir)/lib/ext2fs/ext2_ext_attr.h $(top_srcdir)/lib/ext2fs/hashmap.h \
  $(top_srcdir)/lib/ext2fs/bitops.h $(srcdir)/../misc/create_inode.h \
+ $(top_srcdir)/lib/ext2fs/nls.h \
  $(top_srcdir)/lib/e2p/e2p.h $(top_srcdir)/lib/support/quotaio.h \
  $(top_srcdir)/lib/support/dqblk_v2.h \
  $(top_srcdir)/lib/support/quotaio_tree.h
diff --git a/debugfs/htree.c b/debugfs/htree.c
index 0c6a3852393e..d76dc7f0f5e8 100644
--- a/debugfs/htree.c
+++ b/debugfs/htree.c
@@ -27,6 +27,8 @@ extern char *optarg;
 #include "uuid/uuid.h"
 #include "e2p/e2p.h"
 
+#include "ext2fs/nls.h"
+
 static FILE *pager;
 
 static void htree_dump_leaf_node(ext2_filsys fs, ext2_ino_t ino,
@@ -44,6 +46,7 @@ static void htree_dump_leaf_node(ext2_filsys fs, ext2_ino_t ino,
 	ext2_dirhash_t 	hash, minor_hash;
 	unsigned int	rec_len;
 	int		hash_alg;
+	int		hash_flags = inode->i_flags & EXT4_CASEFOLD_FL;
 	int		csum_size = 0;
 
 	if (ext2fs_has_feature_metadata_csum(fs->super))
@@ -89,9 +92,10 @@ static void htree_dump_leaf_node(ext2_filsys fs, ext2_ino_t ino,
 		}
 		strncpy(name, dirent->name, thislen);
 		name[thislen] = '\0';
-		errcode = ext2fs_dirhash(hash_alg, name,
-					 thislen, fs->super->s_hash_seed,
-					 &hash, &minor_hash);
+		errcode = ext2fs_dirhash2(hash_alg, name, thislen,
+					  fs->encoding, hash_flags,
+					  fs->super->s_hash_seed,
+					  &hash, &minor_hash);
 		if (errcode)
 			com_err("htree_dump_leaf_node", errcode,
 				"while calculating hash");
@@ -306,11 +310,12 @@ errout:
 void do_dx_hash(int argc, char *argv[], int sci_idx EXT2FS_ATTR((unused)),
 		void *infop EXT2FS_ATTR((unused)))
 {
-	ext2_dirhash_t hash, minor_hash;
+	ext2_dirhash_t hash, minor_hash, hash_flags;
 	errcode_t	err;
 	int		c;
 	int		hash_version = 0;
 	__u32		hash_seed[4];
+	const struct nls_table *encoding;
 
 	hash_seed[0] = hash_seed[1] = hash_seed[2] = hash_seed[3] = 0;
 
@@ -329,6 +334,15 @@ void do_dx_hash(int argc, char *argv[], int sci_idx EXT2FS_ATTR((unused)),
 				return;
 			}
 			break;
+		case 'c':
+			hash_flags = EXT4_CASEFOLD_FL;
+			break;
+		case 'e':
+			encoding = nls_load_table(optarg);
+			if (!encoding)
+				fprintf(stderr, "Invalid encoding: %s\n",
+					optarg);
+				return;
 		default:
 			goto print_usage;
 		}
@@ -336,11 +350,13 @@ void do_dx_hash(int argc, char *argv[], int sci_idx EXT2FS_ATTR((unused)),
 	if (optind != argc-1) {
 	print_usage:
 		com_err(argv[0], 0, "usage: dx_hash [-h hash_alg] "
-			"[-s hash_seed] filename");
+			"[-s hash_seed] [-c] [-e encoding] filename");
 		return;
 	}
-	err = ext2fs_dirhash(hash_version, argv[optind], strlen(argv[optind]),
-			     hash_seed, &hash, &minor_hash);
+	err = ext2fs_dirhash2(hash_version, argv[optind],
+			      strlen(argv[optind]), encoding, hash_flags,
+			      hash_seed, &hash, &minor_hash);
+
 	if (err) {
 		com_err(argv[0], err, "while calculating hash");
 		return;
diff --git a/lib/ext2fs/Makefile.in b/lib/ext2fs/Makefile.in
index b756bbdf35a5..78c54d1a09de 100644
--- a/lib/ext2fs/Makefile.in
+++ b/lib/ext2fs/Makefile.in
@@ -1299,7 +1299,7 @@ htree.o: $(top_srcdir)/debugfs/htree.c $(top_builddir)/lib/config.h \
  $(srcdir)/hashmap.h $(srcdir)/bitops.h \
  $(top_srcdir)/debugfs/../misc/create_inode.h $(top_srcdir)/lib/e2p/e2p.h \
  $(top_srcdir)/lib/support/quotaio.h $(top_srcdir)/lib/support/dqblk_v2.h \
- $(top_srcdir)/lib/support/quotaio_tree.h
+ $(top_srcdir)/lib/support/quotaio_tree.h $(srcdir)/nls.h
 unused.o: $(top_srcdir)/debugfs/unused.c $(top_builddir)/lib/config.h \
  $(top_builddir)/lib/dirpaths.h $(top_srcdir)/debugfs/debugfs.h \
  $(top_srcdir)/lib/ss/ss.h $(top_builddir)/lib/ss/ss_err.h \
-- 
2.19.2

* [PATCH v3 07/12] tune2fs: Prevent enabling encryption flag on encoding-aware fs
  2018-11-26 22:19 [PATCH e2fsprogs v3 00/12] Support encoding awareness and casefold Gabriel Krisman Bertazi
                   ` (5 preceding siblings ...)
  2018-11-26 22:19 ` [PATCH v3 06/12] debugfs/htree: Support encoding when printing the file hash Gabriel Krisman Bertazi
@ 2018-11-26 22:19 ` Gabriel Krisman Bertazi
  2018-11-26 22:19 ` [PATCH v3 09/12] ext4.5: Add fname_encoding feature to ext4 man page Gabriel Krisman Bertazi
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Gabriel Krisman Bertazi @ 2018-11-26 22:19 UTC (permalink / raw)
  To: tytso; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

From: Gabriel Krisman Bertazi <krisman@collabora.co.uk>

The kernel will refuse to mount filesystems with the encryption and
encoding features enabled at the same time.  The encoding feature can
only be set at mkfs time, so we can just prevent encryption from being
set at a later time by tune2fs.
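
The rule can be sketched in isolation as a check on a feature word.
This is an illustration only: the SKETCH_* bit values are placeholders
rather than the values from ext2_fs.h, and sketch_enable_encrypt is a
hypothetical helper, not a tune2fs function.

```c
#include <stdint.h>
#include <stdio.h>

/* Placeholder bit values for illustration; not taken from ext2_fs.h. */
#define SKETCH_INCOMPAT_ENCRYPT		0x1u
#define SKETCH_INCOMPAT_FNAME_ENCODING	0x2u

/*
 * Returns 0 and sets the encrypt bit, or returns 1 and leaves the
 * feature word untouched when the encoding feature is already on.
 */
static int sketch_enable_encrypt(uint32_t *incompat)
{
	if (*incompat & SKETCH_INCOMPAT_FNAME_ENCODING) {
		fputs("Cannot enable encrypt feature on filesystems "
		      "with the encoding feature enabled.\n", stderr);
		return 1;
	}
	*incompat |= SKETCH_INCOMPAT_ENCRYPT;
	return 0;
}
```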

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
---
 misc/tune2fs.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index a680b461cc86..4c92bee30b38 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -1459,6 +1459,12 @@ mmp_error:
 	}
 
 	if (FEATURE_ON(E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_ENCRYPT)) {
+		if (ext2fs_has_feature_fname_encoding(sb)) {
+			fputs(_("Cannot enable encrypt feature on filesystems "
+				"with the encoding feature enabled.\n"),
+			      stderr);
+			return 1;
+		}
 		fs->super->s_encrypt_algos[0] =
 			EXT4_ENCRYPTION_MODE_AES_256_XTS;
 		fs->super->s_encrypt_algos[1] =
-- 
2.19.2

* [PATCH v3 09/12] ext4.5: Add fname_encoding feature to ext4 man page
  2018-11-26 22:19 [PATCH e2fsprogs v3 00/12] Support encoding awareness and casefold Gabriel Krisman Bertazi
                   ` (6 preceding siblings ...)
  2018-11-26 22:19 ` [PATCH v3 07/12] tune2fs: Prevent enabling encryption flag on encoding-aware fs Gabriel Krisman Bertazi
@ 2018-11-26 22:19 ` Gabriel Krisman Bertazi
  2018-11-26 22:19 ` [PATCH v3 10/12] mke2fs.8: Document fname_encoding options Gabriel Krisman Bertazi
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Gabriel Krisman Bertazi @ 2018-11-26 22:19 UTC (permalink / raw)
  To: tytso; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 misc/ext4.5.in | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/misc/ext4.5.in b/misc/ext4.5.in
index 8f4c4f51ea85..98c7b1313ab6 100644
--- a/misc/ext4.5.in
+++ b/misc/ext4.5.in
@@ -151,6 +151,16 @@ can be specified using the
 .B \-G
 option.
 .TP
+.B fname_encoding
+.br
+This ext4 feature provides file system-level character encoding support
+for file and directory names.  This feature is name-preserving on the
+disk, but it allows applications to look up a file in the file system
+using any encoding-equivalent version of the file name.
+
+This feature is required to perform in-kernel case-insensitive file
+name lookups.
+.TP
 .B has_journal
 .br
 Create a journal to ensure filesystem consistency even across unclean
-- 
2.19.2

* [PATCH v3 10/12] mke2fs.8: Document fname_encoding options
  2018-11-26 22:19 [PATCH e2fsprogs v3 00/12] Support encoding awareness and casefold Gabriel Krisman Bertazi
                   ` (7 preceding siblings ...)
  2018-11-26 22:19 ` [PATCH v3 09/12] ext4.5: Add fname_encoding feature to ext4 man page Gabriel Krisman Bertazi
@ 2018-11-26 22:19 ` Gabriel Krisman Bertazi
  2018-11-30 15:59   ` Theodore Y. Ts'o
  2018-11-26 22:19 ` [PATCH v3 11/12] mke2fs.conf.5: Document fname_encoding configuration option Gabriel Krisman Bertazi
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Gabriel Krisman Bertazi @ 2018-11-26 22:19 UTC (permalink / raw)
  To: tytso; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 misc/mke2fs.8.in | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
index 603e37e54a78..4a2aa8fd9672 100644
--- a/misc/mke2fs.8.in
+++ b/misc/mke2fs.8.in
@@ -280,6 +280,31 @@ option is still accepted for backwards compatibility, but is deprecated.
 The following extended options are supported:
 .RS 1.2i
 .TP
+.BI fname_encoding= encoding-name
+Enable the
+.I fname_encoding
+feature in the super block and set
+.I encoding-name
+as the encoding to be used.  If
+.I encoding-name
+is not specified, the encoding defined in
+.BR mke2fs.conf (5)
+is used.
+.TP
+.BI fname_encoding_flags= encoding-flags
+Define parameters for file name character encoding operations.  If a
+flag is not changed using this parameter, its default value is used.
+.I encoding-flags
+should be a comma-separated list of flags to be enabled.  To disable a
+flag, add it to the list with the prefix "no".
+
+The only flag that can be set right now is
+.I strict
+which means that invalid strings should be rejected by the file system.
+In the default configuration, the
+.I strict
+flag is disabled.
+.TP
 .BI mmp_update_interval= interval
 Adjust the initial MMP update interval to
 .I interval
-- 
2.19.2

* [PATCH v3 11/12] mke2fs.conf.5: Document fname_encoding configuration option
  2018-11-26 22:19 [PATCH e2fsprogs v3 00/12] Support encoding awareness and casefold Gabriel Krisman Bertazi
                   ` (8 preceding siblings ...)
  2018-11-26 22:19 ` [PATCH v3 10/12] mke2fs.8: Document fname_encoding options Gabriel Krisman Bertazi
@ 2018-11-26 22:19 ` Gabriel Krisman Bertazi
  2018-11-26 22:19 ` [PATCH v3 12/12] chattr.1: Document the casefold attribute Gabriel Krisman Bertazi
       [not found] ` <20181126221949.12172-9-krisman@collabora.com>
  11 siblings, 0 replies; 19+ messages in thread
From: Gabriel Krisman Bertazi @ 2018-11-26 22:19 UTC (permalink / raw)
  To: tytso; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 misc/mke2fs.conf.5.in | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/misc/mke2fs.conf.5.in b/misc/mke2fs.conf.5.in
index c086b4113b50..ab93c7761a16 100644
--- a/misc/mke2fs.conf.5.in
+++ b/misc/mke2fs.conf.5.in
@@ -97,6 +97,10 @@ The following relations are defined in the
 .I [options]
 stanza.
 .TP
+.I fname_encoding
+This relation defines the file name encoding to be used by mke2fs, in
+case the user doesn't specify an encoding on the command line.
+.TP
 .I proceed_delay
 If this relation is set to a positive integer, then mke2fs will
 wait
-- 
2.19.2

* [PATCH v3 12/12] chattr.1: Document the casefold attribute
  2018-11-26 22:19 [PATCH e2fsprogs v3 00/12] Support encoding awareness and casefold Gabriel Krisman Bertazi
                   ` (9 preceding siblings ...)
  2018-11-26 22:19 ` [PATCH v3 11/12] mke2fs.conf.5: Document fname_encoding configuration option Gabriel Krisman Bertazi
@ 2018-11-26 22:19 ` Gabriel Krisman Bertazi
       [not found] ` <20181126221949.12172-9-krisman@collabora.com>
  11 siblings, 0 replies; 19+ messages in thread
From: Gabriel Krisman Bertazi @ 2018-11-26 22:19 UTC (permalink / raw)
  To: tytso; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 misc/chattr.1.in | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/misc/chattr.1.in b/misc/chattr.1.in
index 028ae9e7ca4c..5949d096687b 100644
--- a/misc/chattr.1.in
+++ b/misc/chattr.1.in
@@ -29,7 +29,7 @@ The operator '+' causes the selected attributes to be added to the
 existing attributes of the files; '-' causes them to be removed; and '='
 causes them to be the only attributes that the files have.
 .PP
-The letters 'aAcCdDeijPsStTu' select the new attributes for the files:
+The letters 'aAcCdDeFijPsStTu' select the new attributes for the files:
 append only (a),
 no atime updates (A),
 compressed (c),
@@ -37,6 +37,7 @@ no copy on write (C),
 no dump (d),
 synchronous directory updates (D),
 extent format (e),
+case-insensitive directory lookups (F),
 immutable (i),
 data journalling (j),
 project hierarchy (P),
@@ -119,6 +120,11 @@ set or reset using
 although it can be displayed by
 .BR lsattr (1).
 .PP
+A directory with the 'F' attribute set indicates that all the path
+lookups inside that directory are made in a case-insensitive fashion.
+This attribute can only be changed in empty directories on file systems
+with the fname_encoding feature enabled.
+.PP
 A file with the 'i' attribute cannot be modified: it cannot be deleted or
 renamed, no link can be created to this file, most of the file's
 metadata can not be modified, and the file can not be opened in write mode.
-- 
2.19.2

* Re: [PATCH v3 01/12] libe2p: Helpers for configuring the encoding superblock fields
  2018-11-26 22:19 ` [PATCH v3 01/12] libe2p: Helpers for configuring the encoding superblock fields Gabriel Krisman Bertazi
@ 2018-11-30 15:42   ` Theodore Y. Ts'o
  2018-11-30 20:46     ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 19+ messages in thread
From: Theodore Y. Ts'o @ 2018-11-30 15:42 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

On Mon, Nov 26, 2018 at 05:19:38PM -0500, Gabriel Krisman Bertazi wrote:
> +	/* 0x1 */ {"utf8-10.0", (EXT4_UTF8_NORMALIZATION_TYPE_NFKD |
> +				 EXT4_UTF8_CASEFOLD_TYPE_NFKDCF)},

We're using 10.0 here even though later in the patch we're
installing Unicode 11.0.  What if we just call this utf8-10+?  Unicode
releases new versions every six months these days, and so long as the
case fold rules don't change for any existing characters, but are only
added for new characters added to the new version of Unicode, it would
definitely be OK for strict mode.

Even in relaxed mode, if someone decided to use, say, Klingon
characters not recognized by the Unicode consortium in their system,
and later on the Unicode consortium reassigns those code points to
some obscure ancient script, it would be unfortunate, but how much would
it be *our* problem?  The worst that could happen is that if case
folding were enabled, two file names that were previously unique would
be considered identical by the new case folding rules after
rebooting into the new kernel.  If hashed directories were used, one
or both of the filenames might not be accessible, but it wouldn't lead
to an actual file system level inconsistency.  And data would only be
lost if the wrong file were to get accidentally deleted in the confusion.
 
I'm curious how Windows handles this problem.  Windows and Apple are
happy to include the latest set of emoji as they become available
--- after all, that's a key competitive advantage for some markets :-)
--- so I'm guessing they must not be terribly strict about Unicode
versioning, either.  So maybe the right thing to do is to just call it
"utf8" and be done with it.  :-)

       	   					- Ted

* Re: [PATCH v3 04/12] lib/ext2fs: Implement NLS support
  2018-11-26 22:19 ` [PATCH v3 04/12] lib/ext2fs: Implement NLS support Gabriel Krisman Bertazi
@ 2018-11-30 15:54   ` Theodore Y. Ts'o
  0 siblings, 0 replies; 19+ messages in thread
From: Theodore Y. Ts'o @ 2018-11-30 15:54 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

On Mon, Nov 26, 2018 at 05:19:41PM -0500, Gabriel Krisman Bertazi wrote:
> +static int ascii_casefold(const struct nls_table *table,
> +			  const unsigned char *str, size_t len,
> +			  unsigned char *dest, size_t dlen)
> +{
> +	unsigned i;
> +
> +	for (i = 0; i < len; i++)
> +		dest[i] = charset_toupper(table, str[i]);
> +
> +	return len;
> +}
> +
> +static int ascii_normalize(const struct nls_table *table,
> +			   const unsigned char *str, size_t len,
> +			   unsigned char *dest, size_t dlen)
> +{
> +	memcpy(dest, str, len);
> +	return len;
> +}

Shouldn't these two functions at least check to make sure dlen >= len
and return an error if not?

It's not paranoia when there are people out there actually trying to
get you.  :-)
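
A minimal sketch of the check being asked for, assuming a negative
errno value is an acceptable error return.  The function names are
hypothetical, the struct nls_table argument of the quoted functions is
left out so the block stands alone, and toupper() from <ctype.h> is
swapped in for charset_toupper().

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Uppercases len bytes into dest, refusing to write past dlen. */
static int sketch_ascii_casefold(const unsigned char *str, size_t len,
				 unsigned char *dest, size_t dlen)
{
	size_t i;

	if (dlen < len)
		return -ENAMETOOLONG;

	for (i = 0; i < len; i++)
		dest[i] = (unsigned char) toupper(str[i]);

	return (int) len;
}

/* Copies len bytes unchanged into dest, refusing to write past dlen. */
static int sketch_ascii_normalize(const unsigned char *str, size_t len,
				  unsigned char *dest, size_t dlen)
{
	if (dlen < len)
		return -ENAMETOOLONG;

	memcpy(dest, str, len);
	return (int) len;
}
```

A caller that already treats a negative length as "hash the raw bytes
instead" would pick up the new error path without further changes.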

						- Ted

* Re: [PATCH v3 10/12] mke2fs.8: Document fname_encoding options
  2018-11-26 22:19 ` [PATCH v3 10/12] mke2fs.8: Document fname_encoding options Gabriel Krisman Bertazi
@ 2018-11-30 15:59   ` Theodore Y. Ts'o
  0 siblings, 0 replies; 19+ messages in thread
From: Theodore Y. Ts'o @ 2018-11-30 15:59 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: kernel, linux-ext4

In general, it's better to fold the man page updates with the patch
that adds the support.  So ideally patches 10 and 11 should be folded
into patch 2, and patch 12 should be folded into patch 3.

Thanks,

					- Ted

* Re: [PATCH v3 08/12] ext2fs: nls: Support UTF-8 11.0 with NFKD normalization
       [not found] ` <20181126221949.12172-9-krisman@collabora.com>
@ 2018-11-30 16:12   ` Theodore Y. Ts'o
  2018-11-30 16:53   ` Theodore Y. Ts'o
  1 sibling, 0 replies; 19+ messages in thread
From: Theodore Y. Ts'o @ 2018-11-30 16:12 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

On Mon, Nov 26, 2018 at 05:19:45PM -0500, Gabriel Krisman Bertazi wrote:
> +static int utf8_casefold(const struct nls_table *table,
> +			  const unsigned char *str, size_t len,
> +			  unsigned char *dest, size_t dlen)
> +{
> +	const struct utf8data *data = utf8nfkdicf(UNICODE_AGE(10,0,0));
> +	struct utf8cursor cur;
> +	size_t nlen = 0;
> +
> +	if (utf8ncursor(&cur, data, str, len) < 0)
> +		goto invalid_seq;
> +
> +	for (nlen = 0; nlen < dlen; nlen++) {
> +		dest[nlen] = utf8byte(&cur);
> +		if (!dest[nlen])
> +			return nlen;
> +		if (dest[nlen] == -1)
> +			break;
> +	}
> +invalid_seq:
> +	/* Treat the sequence as a binary blob. */
> +	memcpy(dest, str, len);
> +	return len;
> +
> +}

So it looks like the interface is if the destination buffer is too
small OR if the string is not a valid UTF-8 string, we treat it as a
binary blob.  I wonder if we would be better off if this function
actually signalled that there is a problem (buffer too small,
invalid UTF-8 string).

It's fine to treat it as a binary blob, and copy it out to the
destination buffer, but I can imagine use cases where knowing this
will be useful.  *Especially* the destination buffer too small case;
I'm actually a little nervous about having it silently ignoring that
error condition and just copying the binary blob.

Also, there *really* needs to be a check before dlen is assumed to be
>= len in the memcpy after the invalid_seq label.
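
One possible shape for such an interface, as a sketch only.  The cursor
type and sketch_next_byte are hypothetical stand-ins for the
utf8cursor/utf8byte pair, with a toy rule (only ASCII is valid, letters
fold to lower case) so that the block stands alone.  The value from the
cursor is kept in an int before it is stored: once stored in an
unsigned char, it can no longer compare equal to -1, so the error test
in the quoted loop cannot fire as written.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct utf8cursor. */
struct sketch_cursor {
	const unsigned char *s;
	size_t len;
	size_t pos;
};

/* Yields the next folded byte, 0 at end of input, -1 if invalid. */
static int sketch_next_byte(struct sketch_cursor *cur)
{
	unsigned char c;

	if (cur->pos >= cur->len)
		return 0;
	c = cur->s[cur->pos++];
	if (c >= 0x80)
		return -1;	/* toy rule: only ASCII counts as valid */
	if (c >= 'A' && c <= 'Z')
		c = (unsigned char) (c - 'A' + 'a');
	return c;
}

/*
 * Returns the folded length, -EINVAL for an invalid sequence, or
 * -ENAMETOOLONG when dest cannot hold the result.  The caller decides
 * whether to fall back to the raw bytes.
 */
static int sketch_casefold(const unsigned char *str, size_t len,
			   unsigned char *dest, size_t dlen)
{
	struct sketch_cursor cur = { str, len, 0 };
	size_t nlen;
	int c;

	for (nlen = 0; ; nlen++) {
		c = sketch_next_byte(&cur);
		if (c == 0)
			return (int) nlen;
		if (c < 0)
			return -EINVAL;
		if (nlen >= dlen)
			return -ENAMETOOLONG;
		dest[nlen] = (unsigned char) c;
	}
}
```

Keeping the two failures distinct lets a caller such as fsck treat an
invalid name as an opaque byte sequence while still reporting a buffer
that is too small.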

							- Ted

* Re: [PATCH v3 08/12] ext2fs: nls: Support UTF-8 11.0 with NFKD normalization
       [not found] ` <20181126221949.12172-9-krisman@collabora.com>
  2018-11-30 16:12   ` [PATCH v3 08/12] ext2fs: nls: Support UTF-8 11.0 with NFKD normalization Theodore Y. Ts'o
@ 2018-11-30 16:53   ` Theodore Y. Ts'o
  2018-11-30 18:48     ` Gabriel Krisman Bertazi
  1 sibling, 1 reply; 19+ messages in thread
From: Theodore Y. Ts'o @ 2018-11-30 16:53 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: kernel, linux-ext4, Gabriel Krisman Bertazi

On Mon, Nov 26, 2018 at 05:19:45PM -0500, Gabriel Krisman Bertazi wrote:
> From: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
> 
> We need this such that we can do normalization and casefolding
> compatible with the kernel, in order to properly support fsck
> verification and rehashing.
> 
> The UTF-8 11.0 implementation is copied and adapted from the kernel code
> to ensure maximum compatibility.  The decode trie in utf8data.h is
> generated using a script and the UCD sources in the kernel code.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>

One more thought.  Are there any test cases we can add here?  I assume
the SGI folks must have had some test code that they used when they
were developing their trie code.  Was any of that released?

Maybe there are some Unicode normalization and case folding test
vectors we can grab?
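
As one possible starting point (an assumption on my part, not
something that exists in the tree): the Unicode Character Database
ships NormalizationTest.txt, where each data line holds five
semicolon-separated columns of hex code points and the fifth column is
the NFKD form of the first.  A minimal, untested parser for one column
might look like this; parse_column() is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse column `col` (0-based) of one NormalizationTest.txt data line
 * into out[].  Returns the number of code points read, or -1 if the
 * line is malformed or out[] (capacity `max`) is too small.
 */
static int parse_column(const char *line, int col, uint32_t *out, int max)
{
	const char *p = line;
	int n = 0;

	assert(line != NULL && out != NULL);

	/* Skip ahead to the requested semicolon-separated column. */
	while (col-- > 0) {
		p = strchr(p, ';');
		if (!p)
			return -1;
		p++;
	}

	/* Read space-separated hex code points up to the next ';'. */
	while (*p && *p != ';') {
		char *end;
		unsigned long cp;

		if (*p == ' ') {
			p++;
			continue;
		}
		cp = strtoul(p, &end, 16);
		if (end == p || cp > 0x10FFFF || n == max)
			return -1;
		out[n++] = (uint32_t)cp;
		p = end;
	}

	/* Every column in the file is terminated by a ';'. */
	return (*p == ';') ? n : -1;
}
```

Feeding the first column through the normalization code and comparing
the result with the fifth column would then give an NFKD check.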

Thanks,

							- Ted


* Re: [PATCH v3 08/12] ext2fs: nls: Support UTF-8 11.0 with NFKD normalization
  2018-11-30 16:53   ` Theodore Y. Ts'o
@ 2018-11-30 18:48     ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 19+ messages in thread
From: Gabriel Krisman Bertazi @ 2018-11-30 18:48 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: kernel, linux-ext4

"Theodore Y. Ts'o" <tytso@mit.edu> writes:

> One more thought.  Are there any test cases we can add here?  I assume
> the SGI folks must have had some test code that they used when they
> were developing their trie code.  Was any of that released?
>
> Maybe there are some Unicode normalization and case folding test
> vectors we can grab?

Since these files are generated and imported from the kernel code, I added
the tests there instead.  Should I duplicate them or move them here?


-- 
Gabriel Krisman Bertazi


* Re: [PATCH v3 01/12] libe2p: Helpers for configuring the encoding superblock fields
  2018-11-30 15:42   ` Theodore Y. Ts'o
@ 2018-11-30 20:46     ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 19+ messages in thread
From: Gabriel Krisman Bertazi @ 2018-11-30 20:46 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: kernel, linux-ext4

"Theodore Y. Ts'o" <tytso@mit.edu> writes:

> On Mon, Nov 26, 2018 at 05:19:38PM -0500, Gabriel Krisman Bertazi wrote:
>> +	/* 0x1 */ {"utf8-10.0", (EXT4_UTF8_NORMALIZATION_TYPE_NFKD |
>> +				 EXT4_UTF8_CASEFOLD_TYPE_NFKDCF)},
>
> We're using 10.0 here even though later in the patch we're
> installing Unicode 11.0.  What if we just call this utf8-10+?  Unicode
> releases new versions every six months these days, and so long as the
> case fold rules don't change for any existing characters, but are only
> added for new characters added to the new version of Unicode, it would
> definitely be OK for strict mode.
>
> Even in relaxed mode, if someone decided to use, say, Klingon
> characters not recognized by the Unicode consortium in their system,
> and later on the Unicode consortium reassigns those code points to
> some obscure ancient script, it would be unfortunate, but how much would
> it be *our* problem?  The worst that could happen is that if case
> folding were enabled, two file names that were previously unique would
> be considered identical by the new case folding rules after
> rebooting into the new kernel.  If hashed directories were used, one
> or both of the filenames might not be accessible, but it wouldn't lead
> to an actual file system level inconsistency.  And data would only be
> lost if the wrong file were to get accidentally deleted in the
> confusion.

If this is not our problem, it does get much easier.  But we might be
able to assist the user a bit more, if we store the version in the
superblock.

We only allow the user to specify utf8 without requiring a version in
mkfs, just like you said, but we still write the unicode version in the
superblock.  The kernel will always mount using the newest unicode, but
recommend that the user run fsck if there is a version mismatch.  fsck can
then check the filesystem using the newest version first, and if an
invalid entry is found, it tries the exact superblock version.  If the
second attempt doesn't fail, we can rehash the entry, because no real
inconsistencies actually exist. If the rehash triggers a collision, we
could ask the user interactively what to do, if we can be interactive in
fsck (we can't, right?).  Otherwise, if we can't solve the collisions,
we set a flag in the superblock to force the exact version when mounting
the next time.  The user loses normalization of new scripts, but we warn
them about it, and the existing data is preserved and accessible.
Finally, if no collision is detected, or if we can solve all of them, we
write the new hashes and silently update the unicode version flag in the
superblock in fsck.
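
To make the flow above concrete, here is a rough sketch of the
decision logic.  All names are hypothetical and none of this exists in
e2fsprogs; treating an entry that fails under both versions as real
corruption is my reading of the proposal:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-entry outcome of the two-pass check. */
enum entry_action {
	ENTRY_OK,		/* hash matches under the newest version */
	ENTRY_REHASH,		/* matches only under the superblock version */
	ENTRY_COLLISION,	/* rehash would clash with another name */
	ENTRY_CORRUPT,		/* matches under neither version */
};

static enum entry_action classify_entry(bool ok_newest, bool ok_sb_version,
					bool rehash_collides)
{
	if (ok_newest)
		return ENTRY_OK;
	if (!ok_sb_version)
		return ENTRY_CORRUPT;
	return rehash_collides ? ENTRY_COLLISION : ENTRY_REHASH;
}

/* Hypothetical filesystem-wide outcome once all entries are classified. */
enum sb_action {
	SB_UPDATE_VERSION,	/* write new hashes, bump version in superblock */
	SB_PIN_VERSION,		/* unresolved collision: force exact version */
	SB_REPORT_ERROR,	/* genuine inconsistency, handled as usual */
};

static enum sb_action decide_sb(const enum entry_action *a, size_t n)
{
	enum sb_action res = SB_UPDATE_VERSION;
	size_t i;

	assert(a != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (a[i] == ENTRY_CORRUPT)
			return SB_REPORT_ERROR;
		if (a[i] == ENTRY_COLLISION)
			res = SB_PIN_VERSION;
	}
	return res;
}
```

The point of splitting it this way is that a single unresolved
collision is enough to pin the filesystem to its exact version, while
the common case falls through to the silent version update.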

The interface becomes simpler for the common user, we basically hide
unicode versioning from someone that is not playing with ancient
scripts, and they still benefit from the new version by just rebooting
to an updated kernel.  But we still give the user that actually cares
about ancient scripts a way to fix her situation.

> I'm curious how Windows handles this problem.  Windows and Apple are
> happy to include the latest set of emoji's as they become available
> --- after all, that's a key competitive advantage for some markets :-)
> --- so I'm guessing they must not be terribly strict about Unicode
> versioning, either.  So maybe the right thing to do is to just call it
> "utf8" and be done with it.  :-)

I just did some tests on a MacBook.  The machine was on xnu-4570.71.1~1,
which is pre-unicode 11, and creating a file with a unicode 11+ sequence
triggers an "illegal byte sequence" error.  After updating the system
(to xnu-4903.221.2), I can create files using new emoticons.  So, from what
I can tell, Apple is using a stricter mode that rejects invalid
sequences.

Windows seems more permissive: I can create a unicode 11 file on a
system I am sure doesn't have unicode 11 support, since it is much older
than that version, and I can check that the file name on disk is the one I
asked for.

-- 
Gabriel Krisman Bertazi


Thread overview: 19+ messages
2018-11-26 22:19 [PATCH e2fsprogs v3 00/12] Support encoding awareness and casefold Gabriel Krisman Bertazi
2018-11-26 22:19 ` [PATCH v3 01/12] libe2p: Helpers for configuring the encoding superblock fields Gabriel Krisman Bertazi
2018-11-30 15:42   ` Theodore Y. Ts'o
2018-11-30 20:46     ` Gabriel Krisman Bertazi
2018-11-26 22:19 ` [PATCH v3 02/12] mke2fs: Configure encoding during superblock initialization Gabriel Krisman Bertazi
2018-11-26 22:19 ` [PATCH v3 03/12] chattr/lsattr: Support casefold attribute Gabriel Krisman Bertazi
2018-11-26 22:19 ` [PATCH v3 04/12] lib/ext2fs: Implement NLS support Gabriel Krisman Bertazi
2018-11-30 15:54   ` Theodore Y. Ts'o
2018-11-26 22:19 ` [PATCH v3 05/12] lib/ext2fs: Support encoding when calculating dx hashes Gabriel Krisman Bertazi
2018-11-26 22:19 ` [PATCH v3 06/12] debugfs/htree: Support encoding when printing the file hash Gabriel Krisman Bertazi
2018-11-26 22:19 ` [PATCH v3 07/12] tune2fs: Prevent enabling encryption flag on encoding-aware fs Gabriel Krisman Bertazi
2018-11-26 22:19 ` [PATCH v3 09/12] ext4.5: Add fname_encoding feature to ext4 man page Gabriel Krisman Bertazi
2018-11-26 22:19 ` [PATCH v3 10/12] mke2fs.8: Document fname_encoding options Gabriel Krisman Bertazi
2018-11-30 15:59   ` Theodore Y. Ts'o
2018-11-26 22:19 ` [PATCH v3 11/12] mke2fs.conf.5: Document fname_encoding configuration option Gabriel Krisman Bertazi
2018-11-26 22:19 ` [PATCH v3 12/12] chattr.1: Document the casefold attribute Gabriel Krisman Bertazi
     [not found] ` <20181126221949.12172-9-krisman@collabora.com>
2018-11-30 16:12   ` [PATCH v3 08/12] ext2fs: nls: Support UTF-8 11.0 with NFKD normalization Theodore Y. Ts'o
2018-11-30 16:53   ` Theodore Y. Ts'o
2018-11-30 18:48     ` Gabriel Krisman Bertazi
