All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs
@ 2020-12-10 15:03 Arnaud Ferraris
  2020-12-10 15:03 ` [PATCH RESEND v2 01/12] tune2fs: Allow enabling casefold feature after fs creation Arnaud Ferraris
                   ` (12 more replies)
  0 siblings, 13 replies; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-10 15:03 UTC (permalink / raw)
  To: linux-ext4; +Cc: drosen, krisman, ebiggers, tytso, Arnaud Ferraris

Hello,

This patch series improves e2fsprogs for case-insensitive filesystems.

First, it allows tune2fs to enable the 'casefold' feature on existing
filesystems.

Then, it improves e2fsck by allowing it to:
- fix entries containing invalid UTF-8 characters
- detect duplicated entries

By default, invalid filenames are only checked when strict mode is enabled.
A new option is therefore added to allow the user to force this verification.

This series has been tested by running xfstests, and by manually corrupting
the test filesystem using debugfs as well.

Best regards,
Arnaud

---

Changes in v2:
  - added missing comment in e2fsck/pass1.c
  - added a new problem code dedicated to bad encoded file names
  - reworked a test in e2fsck/pass2.c

Arnaud Ferraris (1):
  e2fsck: add new problem for casefolded name check

Gabriel Krisman Bertazi (11):
  tune2fs: Allow enabling casefold feature after fs creation
  tune2fs: Fix casefold+encrypt error message
  ext2fs: Add method to validate casefolded strings
  ext2fs: Implement faster CI comparison of strings
  e2fsck: Fix entries with invalid encoded characters
  e2fsck: Support casefold directories when rehashing
  dict: Support comparison with context
  e2fsck: Detect duplicated casefolded direntries for rehash
  e2fsck: Add option to force encoded filename verification
  e2fsck.8.in: Document check_encoding extended option
  tests: f_bad_fname: Test fixes of invalid filenames and duplicates

 e2fsck/e2fsck.8.in         |   4 ++
 e2fsck/e2fsck.c            |   4 ++
 e2fsck/e2fsck.h            |   2 +
 e2fsck/pass1.c             |  18 ++++++++
 e2fsck/pass1b.c            |   2 +-
 e2fsck/pass2.c             |  76 +++++++++++++++++++++++++++++---
 e2fsck/problem.c           |   5 +++
 e2fsck/problem.h           |   3 ++
 e2fsck/rehash.c            |  88 ++++++++++++++++++++++++++++++-------
 e2fsck/unix.c              |   4 ++
 lib/ext2fs/ext2fs.h        |   6 +++
 lib/ext2fs/ext2fsP.h       |   6 +++
 lib/ext2fs/nls_utf8.c      |  62 ++++++++++++++++++++++++++
 lib/support/dict.c         |  22 +++++++---
 lib/support/dict.h         |   4 +-
 lib/support/mkquota.c      |   2 +-
 misc/tune2fs.c             |  18 +++++++-
 tests/f_bad_fname/expect.1 |  22 ++++++++++
 tests/f_bad_fname/expect.2 |   7 +++
 tests/f_bad_fname/image.gz | Bin 0 -> 802 bytes
 tests/f_bad_fname/name     |   1 +
 21 files changed, 322 insertions(+), 34 deletions(-)
 create mode 100644 tests/f_bad_fname/expect.1
 create mode 100644 tests/f_bad_fname/expect.2
 create mode 100644 tests/f_bad_fname/image.gz
 create mode 100644 tests/f_bad_fname/name

-- 
2.29.2


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH RESEND v2 01/12] tune2fs: Allow enabling casefold feature after fs creation
  2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
@ 2020-12-10 15:03 ` Arnaud Ferraris
  2021-01-27 22:42   ` Theodore Ts'o
  2020-12-10 15:03 ` [PATCH RESEND v2 02/12] tune2fs: Fix casefold+encrypt error message Arnaud Ferraris
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-10 15:03 UTC (permalink / raw)
  To: linux-ext4; +Cc: drosen, krisman, ebiggers, tytso, Arnaud Ferraris

From: Gabriel Krisman Bertazi <krisman@collabora.com>

The main reason we didn't allow this before was because !CASEFOLDED
directories were expected to be normalized().  Since this is no longer
the case, and as long as the encrypt feature is not enabled, it should
be safe to enable this feature.

Disabling the feature is trickier, since we need to make sure there are
no existing +F directories in the filesystem.  Leave that for a future
patch.

Also, enabling strict mode requires some filesystem-wide verification,
so ignore that for now.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
---
 misc/tune2fs.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index f942c698..0809e565 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -161,7 +161,8 @@ static __u32 ok_features[3] = {
 		EXT4_FEATURE_INCOMPAT_64BIT |
 		EXT4_FEATURE_INCOMPAT_ENCRYPT |
 		EXT4_FEATURE_INCOMPAT_CSUM_SEED |
-		EXT4_FEATURE_INCOMPAT_LARGEDIR,
+		EXT4_FEATURE_INCOMPAT_LARGEDIR |
+		EXT4_FEATURE_INCOMPAT_CASEFOLD,
 	/* R/O compat */
 	EXT2_FEATURE_RO_COMPAT_LARGE_FILE |
 		EXT4_FEATURE_RO_COMPAT_HUGE_FILE|
@@ -1513,6 +1514,19 @@ mmp_error:
 		}
 	}
 
+	if (FEATURE_ON(E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_CASEFOLD)) {
+		if (ext2fs_has_feature_encrypt(sb)) {
+			fputs(_("Cannot enable casefold feature on filesystems "
+				"with the encrypt feature enabled.\n"),
+			      stderr);
+			return 1;
+		}
+
+		sb->s_encoding = EXT4_ENC_UTF8_12_1;
+		sb->s_encoding_flags = e2p_get_encoding_flags(sb->s_encoding);
+	}
+
+
 	if (sb->s_rev_level == EXT2_GOOD_OLD_REV &&
 	    (sb->s_feature_compat || sb->s_feature_ro_compat ||
 	     sb->s_feature_incompat))
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH RESEND v2 02/12] tune2fs: Fix casefold+encrypt error message
  2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
  2020-12-10 15:03 ` [PATCH RESEND v2 01/12] tune2fs: Allow enabling casefold feature after fs creation Arnaud Ferraris
@ 2020-12-10 15:03 ` Arnaud Ferraris
  2021-01-27 22:46   ` Theodore Ts'o
  2020-12-10 15:03 ` [PATCH RESEND v2 03/12] ext2fs: Add method to validate casefolded strings Arnaud Ferraris
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-10 15:03 UTC (permalink / raw)
  To: linux-ext4; +Cc: drosen, krisman, ebiggers, tytso, Arnaud Ferraris

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Refering to EXT4_INCOMPAT_CASEFOLD as encoding is not as meaningful as
saying casefold.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
---
 misc/tune2fs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index 0809e565..c182f4d5 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -1470,7 +1470,7 @@ mmp_error:
 	if (FEATURE_ON(E2P_FEATURE_INCOMPAT, EXT4_FEATURE_INCOMPAT_ENCRYPT)) {
 		if (ext2fs_has_feature_casefold(sb)) {
 			fputs(_("Cannot enable encrypt feature on filesystems "
-				"with the encoding feature enabled.\n"),
+				"with the casefold feature enabled.\n"),
 			      stderr);
 			return 1;
 		}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH RESEND v2 03/12] ext2fs: Add method to validate casefolded strings
  2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
  2020-12-10 15:03 ` [PATCH RESEND v2 01/12] tune2fs: Allow enabling casefold feature after fs creation Arnaud Ferraris
  2020-12-10 15:03 ` [PATCH RESEND v2 02/12] tune2fs: Fix casefold+encrypt error message Arnaud Ferraris
@ 2020-12-10 15:03 ` Arnaud Ferraris
  2021-01-28  2:48   ` Theodore Ts'o
  2020-12-10 15:03 ` [PATCH RESEND v2 04/12] ext2fs: Implement faster CI comparison of strings Arnaud Ferraris
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-10 15:03 UTC (permalink / raw)
  To: linux-ext4; +Cc: drosen, krisman, ebiggers, tytso, Arnaud Ferraris

From: Gabriel Krisman Bertazi <krisman@collabora.com>

This is exported to be used by fsck.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
---
 lib/ext2fs/ext2fs.h   |  2 ++
 lib/ext2fs/ext2fsP.h  |  2 ++
 lib/ext2fs/nls_utf8.c | 29 +++++++++++++++++++++++++++++
 3 files changed, 33 insertions(+)

diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 69c8a3ff..4065cb70 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1613,6 +1613,8 @@ extern errcode_t ext2fs_new_dir_inline_data(ext2_filsys fs, ext2_ino_t dir_ino,
 
 /* nls_utf8.c */
 extern const struct ext2fs_nls_table *ext2fs_load_nls_table(int encoding);
+extern int ext2fs_check_encoded_name(const struct ext2fs_nls_table *table,
+				     char *s, size_t len, char **pos);
 
 /* mkdir.c */
 extern errcode_t ext2fs_mkdir(ext2_filsys fs, ext2_ino_t parent, ext2_ino_t inum,
diff --git a/lib/ext2fs/ext2fsP.h b/lib/ext2fs/ext2fsP.h
index ad8b7d52..30564ded 100644
--- a/lib/ext2fs/ext2fsP.h
+++ b/lib/ext2fs/ext2fsP.h
@@ -104,6 +104,8 @@ struct ext2fs_nls_ops {
 	int (*casefold)(const struct ext2fs_nls_table *charset,
 			const unsigned char *str, size_t len,
 			unsigned char *dest, size_t dlen);
+	int (*validate)(const struct ext2fs_nls_table *table,
+			char *s, size_t len, char **pos);
 };
 
 /* Function prototypes */
diff --git a/lib/ext2fs/nls_utf8.c b/lib/ext2fs/nls_utf8.c
index e4c4e7a3..903c65ba 100644
--- a/lib/ext2fs/nls_utf8.c
+++ b/lib/ext2fs/nls_utf8.c
@@ -920,8 +920,31 @@ invalid_seq:
 	return -EINVAL;
 }
 
+
+static int utf8_validate(const struct ext2fs_nls_table *table,
+			 char *s, size_t len, char **pos)
+{
+	const struct utf8data *data = utf8nfdicf(table->version);
+	utf8leaf_t	*leaf;
+	unsigned char	hangul[UTF8HANGULLEAF];
+
+	if (!data)
+		return -1;
+	while (len && *s) {
+		leaf = utf8nlookup(data, hangul, s, len);
+		if (!leaf) {
+			*pos = s;
+			return 1;
+		}
+		len -= utf8clen(s);
+		s += utf8clen(s);
+	}
+	return 0;
+}
+
 static const struct ext2fs_nls_ops utf8_ops = {
 	.casefold = utf8_casefold,
+	.validate = utf8_validate,
 };
 
 static const struct ext2fs_nls_table nls_utf8 = {
@@ -936,3 +959,9 @@ const struct ext2fs_nls_table *ext2fs_load_nls_table(int encoding)
 
 	return NULL;
 }
+
+int ext2fs_check_encoded_name(const struct ext2fs_nls_table *table,
+			      char *name, size_t len, char **pos)
+{
+	return table->ops->validate(table, name, len, pos);
+}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH RESEND v2 04/12] ext2fs: Implement faster CI comparison of strings
  2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
                   ` (2 preceding siblings ...)
  2020-12-10 15:03 ` [PATCH RESEND v2 03/12] ext2fs: Add method to validate casefolded strings Arnaud Ferraris
@ 2020-12-10 15:03 ` Arnaud Ferraris
  2021-01-28  2:49   ` Theodore Ts'o
  2020-12-10 15:03 ` [PATCH RESEND v2 05/12] e2fsck: add new problem for casefolded name check Arnaud Ferraris
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-10 15:03 UTC (permalink / raw)
  To: linux-ext4; +Cc: drosen, krisman, ebiggers, tytso, Arnaud Ferraris

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Instead of calling casefold two times and memcmp the result, which
require allocating a temporary buffer for the casefolded version, add a
strcasecmp-like method to perform the comparison of each code-point
during the casefold itself.

This method is exposed because it needs to be used directly by fsck.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
---
 lib/ext2fs/ext2fs.h   |  4 ++++
 lib/ext2fs/ext2fsP.h  |  4 ++++
 lib/ext2fs/nls_utf8.c | 33 +++++++++++++++++++++++++++++++++
 3 files changed, 41 insertions(+)

diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 4065cb70..9e96ca5c 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1615,6 +1615,10 @@ extern errcode_t ext2fs_new_dir_inline_data(ext2_filsys fs, ext2_ino_t dir_ino,
 extern const struct ext2fs_nls_table *ext2fs_load_nls_table(int encoding);
 extern int ext2fs_check_encoded_name(const struct ext2fs_nls_table *table,
 				     char *s, size_t len, char **pos);
+extern int ext2fs_casefold_cmp(const struct ext2fs_nls_table *table,
+			       const unsigned char *str1, size_t len1,
+			       const unsigned char *str2, size_t len2);
+
 
 /* mkdir.c */
 extern errcode_t ext2fs_mkdir(ext2_filsys fs, ext2_ino_t parent, ext2_ino_t inum,
diff --git a/lib/ext2fs/ext2fsP.h b/lib/ext2fs/ext2fsP.h
index 30564ded..99239be0 100644
--- a/lib/ext2fs/ext2fsP.h
+++ b/lib/ext2fs/ext2fsP.h
@@ -106,6 +106,10 @@ struct ext2fs_nls_ops {
 			unsigned char *dest, size_t dlen);
 	int (*validate)(const struct ext2fs_nls_table *table,
 			char *s, size_t len, char **pos);
+	int (*casefold_cmp)(const struct ext2fs_nls_table *table,
+			    const unsigned char *str1, size_t len1,
+			    const unsigned char *str2, size_t len2);
+
 };
 
 /* Function prototypes */
diff --git a/lib/ext2fs/nls_utf8.c b/lib/ext2fs/nls_utf8.c
index 903c65ba..1c444ca2 100644
--- a/lib/ext2fs/nls_utf8.c
+++ b/lib/ext2fs/nls_utf8.c
@@ -942,9 +942,36 @@ static int utf8_validate(const struct ext2fs_nls_table *table,
 	return 0;
 }
 
+static int utf8_casefold_cmp(const struct ext2fs_nls_table *table,
+			     const unsigned char *str1, size_t len1,
+			     const unsigned char *str2, size_t len2)
+{
+	const struct utf8data *data = utf8nfdicf(table->version);
+	int c1, c2;
+	struct utf8cursor cur1, cur2;
+
+	if (utf8ncursor(&cur1, data, (const char *) str1, len1) < 0)
+		return -1;
+	if (utf8ncursor(&cur2, data, (const char *) str2, len2) < 0)
+		return -1;
+
+	do {
+		c1 = utf8byte(&cur1);
+		c2 = utf8byte(&cur2);
+
+		if (c1 < 0 || c2 < 0)
+			return -1;
+		if (c1 != c2)
+			return c1 - c2;
+	} while (c1);
+
+	return 0;
+}
+
 static const struct ext2fs_nls_ops utf8_ops = {
 	.casefold = utf8_casefold,
 	.validate = utf8_validate,
+	.casefold_cmp = utf8_casefold_cmp,
 };
 
 static const struct ext2fs_nls_table nls_utf8 = {
@@ -965,3 +992,9 @@ int ext2fs_check_encoded_name(const struct ext2fs_nls_table *table,
 {
 	return table->ops->validate(table, name, len, pos);
 }
+int ext2fs_casefold_cmp(const struct ext2fs_nls_table *table,
+			const unsigned char *str1, size_t len1,
+			const unsigned char *str2, size_t len2)
+{
+	return table->ops->casefold_cmp(table, str1, len1, str2, len2);
+}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH RESEND v2 05/12] e2fsck: add new problem for casefolded name check
  2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
                   ` (3 preceding siblings ...)
  2020-12-10 15:03 ` [PATCH RESEND v2 04/12] ext2fs: Implement faster CI comparison of strings Arnaud Ferraris
@ 2020-12-10 15:03 ` Arnaud Ferraris
  2020-12-10 20:36   ` Gabriel Krisman Bertazi
  2020-12-10 20:38   ` Gabriel Krisman Bertazi
  2020-12-10 15:03 ` [PATCH RESEND v2 06/12] e2fsck: Fix entries with invalid encoded characters Arnaud Ferraris
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-10 15:03 UTC (permalink / raw)
  To: linux-ext4; +Cc: drosen, krisman, ebiggers, tytso, Arnaud Ferraris

---
Changes in v2:
  - added in this version

 e2fsck/problem.c | 5 +++++
 e2fsck/problem.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/e2fsck/problem.c b/e2fsck/problem.c
index e79c853b..2b596303 100644
--- a/e2fsck/problem.c
+++ b/e2fsck/problem.c
@@ -1805,6 +1805,11 @@ static struct e2fsck_problem problem_table[] = {
 	  N_("Encrypted @E references @i %Di, which has a different encryption policy.\n"),
 	  PROMPT_CLEAR, 0, 0, 0, 0 },
 
+	/* Casefolded directory entry has illegal characters in its name */
+	{ PR_2_BAD_CASEFOLDED_NAME,
+	  N_("@E has illegal UTF-8 characters in its name.\n"),
+	  PROMPT_FIX, 0, 0, 0, 0 },
+
 	/* Pass 3 errors */
 
 	/* Pass 3: Checking directory connectivity */
diff --git a/e2fsck/problem.h b/e2fsck/problem.h
index 4185e517..a8806fd4 100644
--- a/e2fsck/problem.h
+++ b/e2fsck/problem.h
@@ -1028,6 +1028,9 @@ struct problem_context {
 /* Encrypted directory contains file with different encryption policy */
 #define PR_2_INCONSISTENT_ENCRYPTION_POLICY	0x020052
 
+/* Casefolded directory entry has illegal characters in its name */
+#define PR_2_BAD_CASEFOLDED_NAME		0x0200053
+
 /*
  * Pass 3 errors
  */
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH RESEND v2 06/12] e2fsck: Fix entries with invalid encoded characters
  2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
                   ` (4 preceding siblings ...)
  2020-12-10 15:03 ` [PATCH RESEND v2 05/12] e2fsck: add new problem for casefolded name check Arnaud Ferraris
@ 2020-12-10 15:03 ` Arnaud Ferraris
  2020-12-10 20:51   ` Gabriel Krisman Bertazi
  2020-12-10 15:03 ` [PATCH RESEND v2 07/12] e2fsck: Support casefold directories when rehashing Arnaud Ferraris
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-10 15:03 UTC (permalink / raw)
  To: linux-ext4; +Cc: drosen, krisman, ebiggers, tytso, Arnaud Ferraris

From: Gabriel Krisman Bertazi <krisman@collabora.com>

On strict mode, invalid Unicode sequences are not permited.  This patch
adds a verification step to pass2 to detect and modify the entries with
the same replacement char used for non-encoding directories '.'.

After the encoding test, we still want to check the name for usual
problems, '\0', '/' in the middle of the sequence.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
---
Changes in v2:
  - added missing comment
  - uses the problem code introduced by the previous patch
  - reworked a test to ease future support of encrypted+casefolded
    directories

 e2fsck/e2fsck.c |  4 ++++
 e2fsck/e2fsck.h |  1 +
 e2fsck/pass1.c  | 18 +++++++++++++++++
 e2fsck/pass2.c  | 51 ++++++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/e2fsck/e2fsck.c b/e2fsck/e2fsck.c
index d8be566f..dc4b45e2 100644
--- a/e2fsck/e2fsck.c
+++ b/e2fsck/e2fsck.c
@@ -75,6 +75,10 @@ errcode_t e2fsck_reset_context(e2fsck_t ctx)
 		ext2fs_free_block_bitmap(ctx->block_found_map);
 		ctx->block_found_map = 0;
 	}
+	if (ctx->inode_casefold_map) {
+		ext2fs_free_block_bitmap(ctx->inode_casefold_map);
+		ctx->inode_casefold_map = 0;
+	}
 	if (ctx->inode_link_info) {
 		ext2fs_free_icount(ctx->inode_link_info);
 		ctx->inode_link_info = 0;
diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index 85f953b2..dcaab0a1 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -262,6 +262,7 @@ struct e2fsck_struct {
 	ext2fs_inode_bitmap inode_bb_map; /* Inodes which are in bad blocks */
 	ext2fs_inode_bitmap inode_imagic_map; /* AFS inodes */
 	ext2fs_inode_bitmap inode_reg_map; /* Inodes which are regular files*/
+	ext2fs_inode_bitmap inode_casefold_map; /* Inodes which are casefolded */
 
 	ext2fs_block_bitmap block_found_map; /* Blocks which are in use */
 	ext2fs_block_bitmap block_dup_map; /* Blks referenced more than once */
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 8eecd958..6909fed5 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -23,6 +23,7 @@
  * 	- A bitmap of which inodes have bad fields.	(inode_bad_map)
  * 	- A bitmap of which inodes are in bad blocks.	(inode_bb_map)
  * 	- A bitmap of which inodes are imagic inodes.	(inode_imagic_map)
+ * 	- A bitmap of which inodes are casefolded.	(inode_casefold_map)
  * 	- A bitmap of which blocks are in use.		(block_found_map)
  * 	- A bitmap of which blocks are in use by two inodes	(block_dup_map)
  * 	- The data blocks of the directory inodes.	(dir_map)
@@ -1260,6 +1261,20 @@ void e2fsck_pass1(e2fsck_t ctx)
 		ctx->flags |= E2F_FLAG_ABORT;
 		return;
 	}
+	if (casefold_fs) {
+		pctx.errcode =
+			e2fsck_allocate_inode_bitmap(fs,
+						     _("inode casefold map"),
+						     EXT2FS_BMAP64_RBTREE,
+						     "inode_casefold_map",
+						     &ctx->inode_casefold_map);
+		if (pctx.errcode) {
+			pctx.num = 1;
+			fix_problem(ctx, PR_1_ALLOCATE_IBITMAP_ERROR, &pctx);
+			ctx->flags |= E2F_FLAG_ABORT;
+			return;
+		}
+	}
 	pctx.errcode = e2fsck_setup_icount(ctx, "inode_link_info", 0, NULL,
 					   &ctx->inode_link_info);
 	if (pctx.errcode) {
@@ -1870,6 +1885,9 @@ void e2fsck_pass1(e2fsck_t ctx)
 		    add_encrypted_file(ctx, &pctx) < 0)
 			goto clear_inode;
 
+		if (casefold_fs && inode->i_flags & EXT4_CASEFOLD_FL)
+			ext2fs_mark_inode_bitmap2(ctx->inode_casefold_map, ino);
+
 		if (LINUX_S_ISDIR(inode->i_mode)) {
 			ext2fs_mark_inode_bitmap2(ctx->inode_dir_map, ino);
 			e2fsck_add_dir_info(ctx, ino, 0);
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index 4dbc44ea..b9402b24 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -36,11 +36,13 @@
  * 	- The inode_bad_map bitmap
  * 	- The inode_dir_map bitmap
  * 	- The encrypted_file_info
+ *	- The inode_casefold_map bitmap
  *
  * Pass 2 frees the following data structures
  * 	- The inode_bad_map bitmap
  * 	- The inode_reg_map bitmap
  * 	- The encrypted_file_info
+ *	- The inode_casefold_map bitmap
  */
 
 #define _GNU_SOURCE 1 /* get strnlen() */
@@ -287,6 +289,10 @@ void e2fsck_pass2(e2fsck_t ctx)
 		ext2fs_free_inode_bitmap(ctx->inode_reg_map);
 		ctx->inode_reg_map = 0;
 	}
+	if (ctx->inode_casefold_map) {
+		ext2fs_free_inode_bitmap(ctx->inode_casefold_map);
+		ctx->inode_casefold_map = 0;
+	}
 	destroy_encrypted_file_info(ctx);
 
 	clear_problem_context(&pctx);
@@ -515,6 +521,30 @@ static int encrypted_check_name(e2fsck_t ctx,
 	return 0;
 }
 
+static int encoded_check_name(e2fsck_t ctx,
+			      struct ext2_dir_entry *dirent,
+			      struct problem_context *pctx)
+{
+	const struct ext2fs_nls_table *tbl = ctx->fs->encoding;
+	int ret;
+	int len = ext2fs_dirent_name_len(dirent);
+	char *pos, *end;
+
+	ret = ext2fs_check_encoded_name(tbl, dirent->name, len, &pos);
+	if (ret < 0) {
+		fatal_error(ctx, _("NLS is broken."));
+	} else if(ret > 0) {
+		ret = fix_problem(ctx, PR_2_BAD_CASEFOLDED_NAME, pctx);
+		if (ret) {
+			end = &dirent->name[len];
+			for (; *pos && pos != end; pos++)
+				*pos = '.';
+		}
+	}
+
+	return (ret || check_name(ctx, dirent, pctx));
+}
+
 /*
  * Check the directory filetype (if present)
  */
@@ -998,11 +1028,18 @@ static int check_dir_block(ext2_filsys fs,
 	size_t	max_block_size;
 	int	hash_flags = 0;
 	static char *eop_read_dirblock = NULL;
+	int cf_dir = 0;
 
 	cd = (struct check_dir_struct *) priv_data;
 	ibuf = buf = cd->buf;
 	ctx = cd->ctx;
 
+	/* We only want filename encoding verification on strict
+	 * mode. */
+	if (ext2fs_test_inode_bitmap2(ctx->inode_casefold_map, ino) &&
+	    (ctx->fs->super->s_encoding_flags & EXT4_ENC_STRICT_MODE_FL))
+		cf_dir = 1;
+
 	if (ctx->flags & E2F_FLAG_RUN_RETURN)
 		return DIRENT_ABORT;
 
@@ -1483,11 +1520,7 @@ skip_checksum:
 		if (check_filetype(ctx, dirent, ino, &cd->pctx))
 			dir_modified++;
 
-		if (dir_encpolicy_id == NO_ENCRYPTION_POLICY) {
-			/* Unencrypted directory */
-			if (check_name(ctx, dirent, &cd->pctx))
-				dir_modified++;
-		} else {
+		if (dir_encpolicy_id != NO_ENCRYPTION_POLICY) {
 			/* Encrypted directory */
 			if (dot_state > 1 &&
 			    check_encrypted_dirent(ctx, dirent,
@@ -1497,6 +1530,14 @@ skip_checksum:
 				dir_modified++;
 				goto next;
 			}
+		} else if (cf_dir) {
+			/* Casefolded directory */
+			if (encoded_check_name(ctx, dirent, &cd->pctx))
+				dir_modified++;
+		} else {
+			/* Unencrypted and uncasefolded directory */
+			if (check_name(ctx, dirent, &cd->pctx))
+				dir_modified++;
 		}
 
 		if (dx_db) {
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH RESEND v2 07/12] e2fsck: Support casefold directories when rehashing
  2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
                   ` (5 preceding siblings ...)
  2020-12-10 15:03 ` [PATCH RESEND v2 06/12] e2fsck: Fix entries with invalid encoded characters Arnaud Ferraris
@ 2020-12-10 15:03 ` Arnaud Ferraris
  2020-12-10 20:53   ` Gabriel Krisman Bertazi
  2020-12-10 15:03 ` [PATCH RESEND v2 08/12] dict: Support comparison with context Arnaud Ferraris
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-10 15:03 UTC (permalink / raw)
  To: linux-ext4; +Cc: drosen, krisman, ebiggers, tytso, Arnaud Ferraris

From: Gabriel Krisman Bertazi <krisman@collabora.com>

When rehashing a +F directory, the casefold comparison needs to be
performed, in order to identify duplicated filenames.  Like the -F
version, This is done in two steps, first adapt the qsort comparison to
consider casefolded directories, and then iterate over the sorted list
fixing dups.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
---
 e2fsck/rehash.c | 88 ++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 72 insertions(+), 16 deletions(-)

diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
index 30e510a6..14215011 100644
--- a/e2fsck/rehash.c
+++ b/e2fsck/rehash.c
@@ -214,6 +214,23 @@ static EXT2_QSORT_TYPE ino_cmp(const void *a, const void *b)
 	return (he_a->ino - he_b->ino);
 }
 
+struct name_cmp_ctx
+{
+	int casefold;
+	const struct ext2fs_nls_table *tbl;
+};
+
+
+static int same_name(const struct name_cmp_ctx *cmp_ctx, char *s1,
+		     int len1, char *s2, int len2)
+{
+	if (!cmp_ctx->casefold)
+		return (len1 == len2 &&	!memcmp(s1, s2, len1));
+	else
+		return !ext2fs_casefold_cmp(cmp_ctx->tbl,
+					    s1, len1, s2, len2);
+}
+
 /* Used for sorting the hash entry */
 static EXT2_QSORT_TYPE name_cmp(const void *a, const void *b)
 {
@@ -240,9 +257,35 @@ static EXT2_QSORT_TYPE name_cmp(const void *a, const void *b)
 	return ret;
 }
 
+static EXT2_QSORT_TYPE name_cf_cmp(const struct name_cmp_ctx *ctx,
+				   const void *a, const void *b)
+{
+	const struct hash_entry *he_a = (const struct hash_entry *) a;
+	const struct hash_entry *he_b = (const struct hash_entry *) b;
+	unsigned int he_a_len, he_b_len, min_len;
+	int ret;
+
+	he_a_len = ext2fs_dirent_name_len(he_a->dir);
+	he_b_len = ext2fs_dirent_name_len(he_b->dir);
+
+	ret = ext2fs_casefold_cmp(ctx->tbl, he_a->dir->name, he_a_len,
+				  he_b->dir->name, he_b_len);
+	if (ret == 0) {
+		if (he_a_len > he_b_len)
+			ret = 1;
+		else if (he_a_len < he_b_len)
+			ret = -1;
+		else
+			ret = he_b->dir->inode - he_a->dir->inode;
+	}
+	return ret;
+}
+
+
 /* Used for sorting the hash entry */
-static EXT2_QSORT_TYPE hash_cmp(const void *a, const void *b)
+static EXT2_QSORT_TYPE hash_cmp(const void *a, const void *b, void *arg)
 {
+	const struct name_cmp_ctx *ctx = (struct name_cmp_ctx *) arg;
 	const struct hash_entry *he_a = (const struct hash_entry *) a;
 	const struct hash_entry *he_b = (const struct hash_entry *) b;
 	int	ret;
@@ -256,8 +299,12 @@ static EXT2_QSORT_TYPE hash_cmp(const void *a, const void *b)
 			ret = 1;
 		else if (he_a->minor_hash < he_b->minor_hash)
 			ret = -1;
-		else
-			ret = name_cmp(a, b);
+		else {
+			if (ctx->casefold)
+				ret = name_cf_cmp(ctx, a, b);
+			else
+				ret = name_cmp(a, b);
+		}
 	}
 	return ret;
 }
@@ -380,7 +427,8 @@ static void mutate_name(char *str, unsigned int *len)
 
 static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
 				    ext2_ino_t ino,
-				    struct fill_dir_struct *fd)
+				    struct fill_dir_struct *fd,
+				    const struct name_cmp_ctx *cmp_ctx)
 {
 	struct problem_context	pctx;
 	struct hash_entry	*ent, *prev;
@@ -403,11 +451,12 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
 		ent = fd->harray + i;
 		prev = ent - 1;
 		if (!ent->dir->inode ||
-		    (ext2fs_dirent_name_len(ent->dir) !=
-		     ext2fs_dirent_name_len(prev->dir)) ||
-		    memcmp(ent->dir->name, prev->dir->name,
-			     ext2fs_dirent_name_len(ent->dir)))
+		    !same_name(cmp_ctx, ent->dir->name,
+			       ext2fs_dirent_name_len(ent->dir),
+			       prev->dir->name,
+			       ext2fs_dirent_name_len(prev->dir)))
 			continue;
+
 		pctx.dirent = ent->dir;
 		if ((ent->dir->inode == prev->dir->inode) &&
 		    fix_problem(ctx, PR_2_DUPLICATE_DIRENT, &pctx)) {
@@ -426,10 +475,11 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
 		mutate_name(new_name, &new_len);
 		for (j=0; j < fd->num_array; j++) {
 			if ((i==j) ||
-			    (new_len !=
-			     (unsigned) ext2fs_dirent_name_len(fd->harray[j].dir)) ||
-			    memcmp(new_name, fd->harray[j].dir->name, new_len))
+			    !same_name(cmp_ctx, new_name, new_len,
+				       fd->harray[j].dir->name,
+				       ext2fs_dirent_name_len(fd->harray[j].dir))) {
 				continue;
+			}
 			mutate_name(new_name, &new_len);
 
 			j = -1;
@@ -894,6 +944,7 @@ errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
 	struct fill_dir_struct	fd = { NULL, NULL, 0, 0, 0, NULL,
 				       0, 0, 0, 0, 0, 0 };
 	struct out_dir		outdir = { 0, 0, 0, 0 };
+	struct name_cmp_ctx name_cmp_ctx = {0, NULL};
 
 	e2fsck_read_inode(ctx, ino, &inode, "rehash_dir");
 
@@ -921,6 +972,11 @@ errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
 		fd.compress = 1;
 	fd.parent = 0;
 
+	if (fs->encoding && (inode.i_flags & EXT4_CASEFOLD_FL)) {
+		name_cmp_ctx.casefold = 1;
+		name_cmp_ctx.tbl = fs->encoding;
+	}
+
 retry_nohash:
 	/* Read in the entire directory into memory */
 	retval = ext2fs_block_iterate3(fs, ino, 0, 0,
@@ -949,16 +1005,16 @@ retry_nohash:
 	/* Sort the list */
 resort:
 	if (fd.compress && fd.num_array > 1)
-		qsort(fd.harray+2, fd.num_array-2, sizeof(struct hash_entry),
-		      hash_cmp);
+		qsort_r(fd.harray+2, fd.num_array-2, sizeof(struct hash_entry),
+			hash_cmp, &name_cmp_ctx);
 	else
-		qsort(fd.harray, fd.num_array, sizeof(struct hash_entry),
-		      hash_cmp);
+		qsort_r(fd.harray, fd.num_array, sizeof(struct hash_entry),
+			hash_cmp, &name_cmp_ctx);
 
 	/*
 	 * Look for duplicates
 	 */
-	if (duplicate_search_and_fix(ctx, fs, ino, &fd))
+	if (duplicate_search_and_fix(ctx, fs, ino, &fd, &name_cmp_ctx))
 		goto resort;
 
 	if (ctx->options & E2F_OPT_NO) {
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH RESEND v2 08/12] dict: Support comparison with context
  2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
                   ` (6 preceding siblings ...)
  2020-12-10 15:03 ` [PATCH RESEND v2 07/12] e2fsck: Support casefold directories when rehashing Arnaud Ferraris
@ 2020-12-10 15:03 ` Arnaud Ferraris
  2020-12-10 15:03 ` [PATCH RESEND v2 09/12] e2fsck: Detect duplicated casefolded direntries for rehash Arnaud Ferraris
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-10 15:03 UTC (permalink / raw)
  To: linux-ext4; +Cc: drosen, krisman, ebiggers, tytso, Arnaud Ferraris

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
---
 e2fsck/pass1b.c       |  2 +-
 e2fsck/pass2.c        |  2 +-
 lib/support/dict.c    | 22 ++++++++++++++++------
 lib/support/dict.h    |  4 +++-
 lib/support/mkquota.c |  2 +-
 5 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/e2fsck/pass1b.c b/e2fsck/pass1b.c
index 3352f9bd..2f8c14c8 100644
--- a/e2fsck/pass1b.c
+++ b/e2fsck/pass1b.c
@@ -104,7 +104,7 @@ static dict_t clstr_dict, ino_dict;
 
 static ext2fs_inode_bitmap inode_dup_map;
 
-static int dict_int_cmp(const void *a, const void *b)
+static int dict_int_cmp(const void* cmp_ctx, const void *a, const void *b)
 {
 	intptr_t	ia, ib;
 
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index b9402b24..b62bfcb1 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -328,7 +328,7 @@ static short htree_depth(struct dx_dir_info *dx_dir,
 	return depth;
 }
 
-static int dict_de_cmp(const void *a, const void *b)
+static int dict_de_cmp(const void *cmp_ctx, const void *a, const void *b)
 {
 	const struct ext2_dir_entry *de_a, *de_b;
 	int	a_len, b_len;
diff --git a/lib/support/dict.c b/lib/support/dict.c
index 6a5c15ce..f8277c4a 100644
--- a/lib/support/dict.c
+++ b/lib/support/dict.c
@@ -267,6 +267,7 @@ dict_t *dict_create(dictcount_t maxcount, dict_comp_t comp)
 	new->allocnode = dnode_alloc;
 	new->freenode = dnode_free;
 	new->context = NULL;
+	new->cmp_ctx = NULL;
 	new->nodecount = 0;
 	new->maxcount = maxcount;
 	new->nilnode.left = &new->nilnode;
@@ -294,6 +295,14 @@ void dict_set_allocator(dict_t *dict, dnode_alloc_t al,
     dict->context = context;
 }
 
+void dict_set_cmp_context(dict_t *dict, void *cmp_ctx)
+{
+    dict_assert (!dict->cmp_ctx);
+    dict_assert (dict_count(dict) == 0);
+
+    dict->cmp_ctx = cmp_ctx;
+}
+
 #ifdef E2FSCK_NOTUSED
 /*
  * Free a dynamically allocated dictionary object. Removing the nodes
@@ -467,7 +476,7 @@ dnode_t *dict_lookup(dict_t *dict, const void *key)
     /* simple binary search adapted for trees that contain duplicate keys */
 
     while (root != nil) {
-	result = dict->compare(key, root->key);
+	result = dict->compare(dict->cmp_ctx, key, root->key);
 	if (result < 0)
 	    root = root->left;
 	else if (result > 0)
@@ -479,7 +488,8 @@ dnode_t *dict_lookup(dict_t *dict, const void *key)
 		do {
 		    saved = root;
 		    root = root->left;
-		    while (root != nil && dict->compare(key, root->key))
+		    while (root != nil
+			   && dict->compare(dict->cmp_ctx, key, root->key))
 			root = root->right;
 		} while (root != nil);
 		return saved;
@@ -503,7 +513,7 @@ dnode_t *dict_lower_bound(dict_t *dict, const void *key)
     dnode_t *tentative = 0;
 
     while (root != nil) {
-	int result = dict->compare(key, root->key);
+	int result = dict->compare(dict->cmp_ctx, key, root->key);
 
 	if (result > 0) {
 	    root = root->right;
@@ -535,7 +545,7 @@ dnode_t *dict_upper_bound(dict_t *dict, const void *key)
     dnode_t *tentative = 0;
 
     while (root != nil) {
-	int result = dict->compare(key, root->key);
+	int result = dict->compare(dict->cmp_ctx, key, root->key);
 
 	if (result < 0) {
 	    root = root->left;
@@ -580,7 +590,7 @@ void dict_insert(dict_t *dict, dnode_t *node, const void *key)
 
     while (where != nil) {
 	parent = where;
-	result = dict->compare(key, where->key);
+	result = dict->compare(dict->cmp_ctx, key, where->key);
 	/* trap attempts at duplicate key insertion unless it's explicitly allowed */
 	dict_assert (dict->dupes || result != 0);
 	if (result < 0)
@@ -1261,7 +1271,7 @@ static int tokenize(char *string, ...)
     return tokcount;
 }
 
-static int comparef(const void *key1, const void *key2)
+static int comparef(const void *cmp_ctx, const void *key1, const void *key2)
 {
     return strcmp(key1, key2);
 }
diff --git a/lib/support/dict.h b/lib/support/dict.h
index 838079d6..d9462a33 100644
--- a/lib/support/dict.h
+++ b/lib/support/dict.h
@@ -56,7 +56,7 @@ typedef struct dnode_t {
 #endif
 } dnode_t;
 
-typedef int (*dict_comp_t)(const void *, const void *);
+typedef int (*dict_comp_t)(const void *, const void *, const void *);
 typedef dnode_t *(*dnode_alloc_t)(void *);
 typedef void (*dnode_free_t)(dnode_t *, void *);
 
@@ -69,6 +69,7 @@ typedef struct dict_t {
     dnode_alloc_t dict_allocnode;
     dnode_free_t dict_freenode;
     void *dict_context;
+    void *cmp_ctx;
     int dict_dupes;
 #else
     int dict_dummmy;
@@ -88,6 +89,7 @@ typedef struct dict_load_t {
 
 extern dict_t *dict_create(dictcount_t, dict_comp_t);
 extern void dict_set_allocator(dict_t *, dnode_alloc_t, dnode_free_t, void *);
+extern void dict_set_cmp_context(dict_t *, void *);
 extern void dict_destroy(dict_t *);
 extern void dict_free_nodes(dict_t *);
 extern void dict_free(dict_t *);
diff --git a/lib/support/mkquota.c b/lib/support/mkquota.c
index 6f7ae6d6..fbc3833a 100644
--- a/lib/support/mkquota.c
+++ b/lib/support/mkquota.c
@@ -234,7 +234,7 @@ out:
 /* Helper functions for computing quota in memory.                */
 /******************************************************************/
 
-static int dict_uint_cmp(const void *a, const void *b)
+static int dict_uint_cmp(const void *cmp_ctx, const void *a, const void *b)
 {
 	unsigned int	c, d;
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH RESEND v2 09/12] e2fsck: Detect duplicated casefolded direntries for rehash
  2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
                   ` (7 preceding siblings ...)
  2020-12-10 15:03 ` [PATCH RESEND v2 08/12] dict: Support comparison with context Arnaud Ferraris
@ 2020-12-10 15:03 ` Arnaud Ferraris
  2020-12-10 15:03 ` [PATCH RESEND v2 10/12] e2fsck: Add option to force encoded filename verification Arnaud Ferraris
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-10 15:03 UTC (permalink / raw)
  To: linux-ext4; +Cc: drosen, krisman, ebiggers, tytso, Arnaud Ferraris

From: Gabriel Krisman Bertazi <krisman@collabora.com>

On pass2, support casefolded directories when looking for duplicated
entries.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
---
 e2fsck/pass2.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index b62bfcb1..b6514de8 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -344,6 +344,20 @@ static int dict_de_cmp(const void *cmp_ctx, const void *a, const void *b)
 	return memcmp(de_a->name, de_b->name, a_len);
 }
 
+static int dict_de_cf_cmp(const void *cmp_ctx, const void *a, const void *b)
+{
+	const struct ext2fs_nls_table *tbl = cmp_ctx;
+	const struct ext2_dir_entry *de_a, *de_b;
+	int	a_len, b_len;
+
+	de_a = (const struct ext2_dir_entry *) a;
+	a_len = ext2fs_dirent_name_len(de_a);
+	de_b = (const struct ext2_dir_entry *) b;
+	b_len = ext2fs_dirent_name_len(de_b);
+
+	return ext2fs_casefold_cmp(tbl, de_a->name, a_len, de_b->name, b_len);
+}
+
 /*
  * This is special sort function that makes sure that directory blocks
  * with a dirblock of zero are sorted to the beginning of the list.
@@ -1255,7 +1269,13 @@ skip_checksum:
 
 	dir_encpolicy_id = find_encryption_policy(ctx, ino);
 
-	dict_init(&de_dict, DICTCOUNT_T_MAX, dict_de_cmp);
+	if (cf_dir) {
+		dict_init(&de_dict, DICTCOUNT_T_MAX, dict_de_cf_cmp);
+		dict_set_cmp_context(&de_dict, (void *)ctx->fs->encoding);
+	} else {
+		dict_init(&de_dict, DICTCOUNT_T_MAX, dict_de_cmp);
+	}
+
 	prev = 0;
 	do {
 		dgrp_t group;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH RESEND v2 10/12] e2fsck: Add option to force encoded filename verification
  2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
                   ` (8 preceding siblings ...)
  2020-12-10 15:03 ` [PATCH RESEND v2 09/12] e2fsck: Detect duplicated casefolded direntries for rehash Arnaud Ferraris
@ 2020-12-10 15:03 ` Arnaud Ferraris
  2020-12-10 20:48   ` Gabriel Krisman Bertazi
  2020-12-10 15:03 ` [PATCH RESEND v2 11/12] e2fsck.8.in: Document check_encoding extended option Arnaud Ferraris
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-10 15:03 UTC (permalink / raw)
  To: linux-ext4; +Cc: drosen, krisman, ebiggers, tytso, Arnaud Ferraris

From: Gabriel Krisman Bertazi <krisman@collabora.com>

This is interesting for !strict filesystems as part of the encoding
update procedure. Once the filesystem is known to not have badly encoded
filenames, the update is trivial, thanks to the stability of assigned
code points in the unicode specification.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
---
 e2fsck/e2fsck.h | 1 +
 e2fsck/pass2.c  | 5 +++--
 e2fsck/unix.c   | 4 ++++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
index dcaab0a1..f324e92c 100644
--- a/e2fsck/e2fsck.h
+++ b/e2fsck/e2fsck.h
@@ -177,6 +177,7 @@ struct resource_track {
 #define E2F_OPT_ICOUNT_FULLMAP	0x20000 /* use an array for inode counts */
 #define E2F_OPT_UNSHARE_BLOCKS  0x40000
 #define E2F_OPT_CLEAR_UNINIT	0x80000 /* Hack to clear the uninit bit */
+#define E2F_OPT_CHECK_ENCODING  0x100000 /* Force verification of encoded filenames */
 
 /*
  * E2fsck flags
diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
index b6514de8..f93e2b53 100644
--- a/e2fsck/pass2.c
+++ b/e2fsck/pass2.c
@@ -1049,9 +1049,10 @@ static int check_dir_block(ext2_filsys fs,
 	ctx = cd->ctx;
 
 	/* We only want filename encoding verification on strict
-	 * mode. */
+	 * mode or if explicitly requested by user. */
 	if (ext2fs_test_inode_bitmap2(ctx->inode_casefold_map, ino) &&
-	    (ctx->fs->super->s_encoding_flags & EXT4_ENC_STRICT_MODE_FL))
+	    ((ctx->fs->super->s_encoding_flags & EXT4_ENC_STRICT_MODE_FL) ||
+	     (ctx->options & E2F_OPT_CHECK_ENCODING)))
 		cf_dir = 1;
 
 	if (ctx->flags & E2F_FLAG_RUN_RETURN)
diff --git a/e2fsck/unix.c b/e2fsck/unix.c
index 1cb51672..0a9012e5 100644
--- a/e2fsck/unix.c
+++ b/e2fsck/unix.c
@@ -753,6 +753,9 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 			ctx->options |= E2F_OPT_UNSHARE_BLOCKS;
 			ctx->options |= E2F_OPT_FORCE;
 			continue;
+		} else if (strcmp(token, "check_encoding") == 0) {
+			ctx->options |= E2F_OPT_CHECK_ENCODING;
+			continue;
 #ifdef CONFIG_DEVELOPER_FEATURES
 		} else if (strcmp(token, "clear_all_uninit_bits") == 0) {
 			ctx->options |= E2F_OPT_CLEAR_UNINIT;
@@ -784,6 +787,7 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
 		fputs("\tbmap2extent\n", stderr);
 		fputs("\tunshare_blocks\n", stderr);
 		fputs("\tfixes_only\n", stderr);
+		fputs("\tcheck_encoding\n", stderr);
 		fputc('\n', stderr);
 		exit(1);
 	}
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH RESEND v2 11/12] e2fsck.8.in: Document check_encoding extended option
  2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
                   ` (9 preceding siblings ...)
  2020-12-10 15:03 ` [PATCH RESEND v2 10/12] e2fsck: Add option to force encoded filename verification Arnaud Ferraris
@ 2020-12-10 15:03 ` Arnaud Ferraris
  2020-12-10 15:03 ` [PATCH RESEND v2 12/12] tests: f_bad_fname: Test fixes of invalid filenames and duplicates Arnaud Ferraris
  2021-01-28  2:52 ` [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Theodore Ts'o
  12 siblings, 0 replies; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-10 15:03 UTC (permalink / raw)
  To: linux-ext4; +Cc: drosen, krisman, ebiggers, tytso, Arnaud Ferraris

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
---
 e2fsck/e2fsck.8.in | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/e2fsck/e2fsck.8.in b/e2fsck/e2fsck.8.in
index 4e3890b2..019a34ec 100644
--- a/e2fsck/e2fsck.8.in
+++ b/e2fsck/e2fsck.8.in
@@ -267,6 +267,10 @@ Only fix damaged metadata; do not optimize htree directories or compress
 extent trees.  This option is incompatible with the -D and -E bmap2extent
 options.
 .TP
+.BI check_encoding
+Force verification of encoded filenames in case-insensitive directories.
+This is the default mode if the filesystem has the strict flag enabled.
+.TP
 .BI unshare_blocks
 If the filesystem has shared blocks, with the shared blocks read-only feature
 enabled, then this will unshare all shared blocks and unset the read-only
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH RESEND v2 12/12] tests: f_bad_fname: Test fixes of invalid filenames and duplicates
  2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
                   ` (10 preceding siblings ...)
  2020-12-10 15:03 ` [PATCH RESEND v2 11/12] e2fsck.8.in: Document check_encoding extended option Arnaud Ferraris
@ 2020-12-10 15:03 ` Arnaud Ferraris
  2021-01-28  2:52 ` [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Theodore Ts'o
  12 siblings, 0 replies; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-10 15:03 UTC (permalink / raw)
  To: linux-ext4; +Cc: drosen, krisman, ebiggers, tytso, Arnaud Ferraris

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
---
 tests/f_bad_fname/expect.1 |  22 ++++++++++++++++++++++
 tests/f_bad_fname/expect.2 |   7 +++++++
 tests/f_bad_fname/image.gz | Bin 0 -> 802 bytes
 tests/f_bad_fname/name     |   1 +
 4 files changed, 30 insertions(+)
 create mode 100644 tests/f_bad_fname/expect.1
 create mode 100644 tests/f_bad_fname/expect.2
 create mode 100644 tests/f_bad_fname/image.gz
 create mode 100644 tests/f_bad_fname/name

diff --git a/tests/f_bad_fname/expect.1 b/tests/f_bad_fname/expect.1
new file mode 100644
index 00000000..1d860b22
--- /dev/null
+++ b/tests/f_bad_fname/expect.1
@@ -0,0 +1,22 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Entry 'AM-^?' in /ci_dir (12) has illegal characters in its name.
+Fix? yes
+
+Entry 'AM-~' in /ci_dir (12) has illegal characters in its name.
+Fix? yes
+
+Duplicate entry 'A.' found.
+	Marking /ci_dir (12) to be rebuilt.
+
+Pass 3: Checking directory connectivity
+Pass 3A: Optimizing directories
+Entry 'A.' in /ci_dir (12) has a non-unique filename.
+Rename to A.~0? yes
+
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 14/16 files (0.0% non-contiguous), 22/100 blocks
+Exit status is 1
diff --git a/tests/f_bad_fname/expect.2 b/tests/f_bad_fname/expect.2
new file mode 100644
index 00000000..13de1c08
--- /dev/null
+++ b/tests/f_bad_fname/expect.2
@@ -0,0 +1,7 @@
+Pass 1: Checking inodes, blocks, and sizes
+Pass 2: Checking directory structure
+Pass 3: Checking directory connectivity
+Pass 4: Checking reference counts
+Pass 5: Checking group summary information
+test_filesys: 14/16 files (0.0% non-contiguous), 22/100 blocks
+Exit status is 0
diff --git a/tests/f_bad_fname/image.gz b/tests/f_bad_fname/image.gz
new file mode 100644
index 0000000000000000000000000000000000000000..a8b3fc6b8397a7859d9697c462f24f498bb57fd8
GIT binary patch
literal 802
zcmb2|=HU3ZwK|T0IWspgJ(c0@9p4PuP!Wa)#-G*9mUQmd6)h1gP)&N{wkF^Lhf?9g
zMNtKcnpXpOe4{cJM+7ff`jtKW--4vV=~{YsIv=@RXj&kBGDu*_q5yNH8;i;k=a=78
z@%4!p{B((@Y#;x**}v1?o!R+$Msd2?XQhT^yX=m-bnO%AUveTi<+L?Vc;L6)A5Olh
zkgUC!Xa6hu>if4lZ+<+wuKRbcPoc`uZ6eD**?bcdtk2o`_gzv|PMx0hpO>@$rmu^(
z?AyKU``I-=Pp&oC_H)|fqtg6-`@Wz0F1_p<&)<1R{~GSt@b*UM($5jzk<&|Eqb!S5
zC;dI25?e2O?evSo@-lf#pZaR%b#MCjIm7kh&2Lv9-_JR}c*edv`_`Z5uit04m}T9R
zvg*S}s~^PYF7Q0{{O`dNpFUM?$$I|tgvVE&{hDl_R{mvAe|KZ^|CzPI7iX&P%Zr_C
zZ@M!(>662efM5S2S5LLK+`4<&P40-^kkj^ie!o3`-2B-7*Z-%7h5kQqb@iJ6?SA`q
zEzOyqebrVf!FKA``uEYh>$GmalAq=J*SUCQeTLpw{<BN}E!=s<-emf(_Gdx=FT?;P
zy@8Ut>+2WHWLy0EcfE6{wREm~%<NmA*IwBd@H(K;H0a*{^XGG{ukTy`&U`}Su8{AQ
z@BceZp7Q<wb+ftkJKq&Dpn&P+Z;rhAk+dfKoOr+6AFEkXu17w<Ki@Zh-s(lSqDt<$
z_q}+p!+tI;b$j=wtnKr7?5{m$tlIMVebwjxKa=<Ve7|$+y+8WCt}>q26LNIByx0HF
zdH5@Ouk^p<-#3bF)m~g&Irr<c^Y&}Z{~y2aS`&EAK6>}r`pP?ZBEOZ}g`cy3`d{tq
z*BSM{UoQRmzSZQvY*fe5KXYH}*Z*qVr}y*p&-cgVKG&N_7JkaOdT;h8y<<UbLG{Z`
z3(CHJe*4V2-2Qg%U+dGK<aa&!b6u~#rZUZa`&H|C5pNk8Q9MX0=fSHl{Bc6hCNNB3
GWB>rT_?(Xb

literal 0
HcmV?d00001

diff --git a/tests/f_bad_fname/name b/tests/f_bad_fname/name
new file mode 100644
index 00000000..675165a6
--- /dev/null
+++ b/tests/f_bad_fname/name
@@ -0,0 +1 @@
+Case-insensitive directory with broken file names
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND v2 05/12] e2fsck: add new problem for casefolded name check
  2020-12-10 15:03 ` [PATCH RESEND v2 05/12] e2fsck: add new problem for casefolded name check Arnaud Ferraris
@ 2020-12-10 20:36   ` Gabriel Krisman Bertazi
  2020-12-10 20:38   ` Gabriel Krisman Bertazi
  1 sibling, 0 replies; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-10 20:36 UTC (permalink / raw)
  To: Arnaud Ferraris; +Cc: linux-ext4, drosen, ebiggers, tytso

Arnaud Ferraris <arnaud.ferraris@collabora.com> writes:

> ---
> Changes in v2:
>   - added in this version
>
>  e2fsck/problem.c | 5 +++++
>  e2fsck/problem.h | 3 +++
>  2 files changed, 8 insertions(+)
>
> diff --git a/e2fsck/problem.c b/e2fsck/problem.c
> index e79c853b..2b596303 100644
> --- a/e2fsck/problem.c
> +++ b/e2fsck/problem.c
> @@ -1805,6 +1805,11 @@ static struct e2fsck_problem problem_table[] = {
>  	  N_("Encrypted @E references @i %Di, which has a different encryption policy.\n"),
>  	  PROMPT_CLEAR, 0, 0, 0, 0 },
>  
> +	/* Casefolded directory entry has illegal characters in its name */
> +	{ PR_2_BAD_CASEFOLDED_NAME,
> +	  N_("@E has illegal UTF-8 characters in its name.\n"),
> +	  PROMPT_FIX, 0, 0, 0, 0 },
> +
>  	/* Pass 3 errors */
>  
>  	/* Pass 3: Checking directory connectivity */
> diff --git a/e2fsck/problem.h b/e2fsck/problem.h
> index 4185e517..a8806fd4 100644
> --- a/e2fsck/problem.h
> +++ b/e2fsck/problem.h
> @@ -1028,6 +1028,9 @@ struct problem_context {
>  /* Encrypted directory contains file with different encryption policy */
>  #define PR_2_INCONSISTENT_ENCRYPTION_POLICY	0x020052
>  
> +/* Casefolded directory entry has illegal characters in its name */
> +#define PR_2_BAD_CASEFOLDED_NAME		0x0200053

This should be 0x020053 (yours has an extra 0)

> +
>  /*
>   * Pass 3 errors
>   */

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND v2 05/12] e2fsck: add new problem for casefolded name check
  2020-12-10 15:03 ` [PATCH RESEND v2 05/12] e2fsck: add new problem for casefolded name check Arnaud Ferraris
  2020-12-10 20:36   ` Gabriel Krisman Bertazi
@ 2020-12-10 20:38   ` Gabriel Krisman Bertazi
  1 sibling, 0 replies; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-10 20:38 UTC (permalink / raw)
  To: Arnaud Ferraris; +Cc: linux-ext4, drosen, ebiggers, tytso

Arnaud Ferraris <arnaud.ferraris@collabora.com> writes:

> ---
> Changes in v2:
>   - added in this version
>
>  e2fsck/problem.c | 5 +++++
>  e2fsck/problem.h | 3 +++
>  2 files changed, 8 insertions(+)
>
> diff --git a/e2fsck/problem.c b/e2fsck/problem.c
> index e79c853b..2b596303 100644
> --- a/e2fsck/problem.c
> +++ b/e2fsck/problem.c
> @@ -1805,6 +1805,11 @@ static struct e2fsck_problem problem_table[] = {
>  	  N_("Encrypted @E references @i %Di, which has a different encryption policy.\n"),
>  	  PROMPT_CLEAR, 0, 0, 0, 0 },
>  
> +	/* Casefolded directory entry has illegal characters in its name */
> +	{ PR_2_BAD_CASEFOLDED_NAME,
> +	  N_("@E has illegal UTF-8 characters in its name.\n"),
> +	  PROMPT_FIX, 0, 0, 0, 0 },
> +
>  	/* Pass 3 errors */
>  
>  	/* Pass 3: Checking directory connectivity */
> diff --git a/e2fsck/problem.h b/e2fsck/problem.h
> index 4185e517..a8806fd4 100644
> --- a/e2fsck/problem.h
> +++ b/e2fsck/problem.h
> @@ -1028,6 +1028,9 @@ struct problem_context {
>  /* Encrypted directory contains file with different encryption policy */
>  #define PR_2_INCONSISTENT_ENCRYPTION_POLICY	0x020052
>  
> +/* Casefolded directory entry has illegal characters in its name */
> +#define PR_2_BAD_CASEFOLDED_NAME		0x0200053

Also, PR_2_BAD_ENCODED_NAME makes more sense than CASEFOLDED.  The
name is encoded in utf-8 but not casefolded on-disk.

> +
>  /*
>   * Pass 3 errors
>   */

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND v2 10/12] e2fsck: Add option to force encoded filename verification
  2020-12-10 15:03 ` [PATCH RESEND v2 10/12] e2fsck: Add option to force encoded filename verification Arnaud Ferraris
@ 2020-12-10 20:48   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-10 20:48 UTC (permalink / raw)
  To: Arnaud Ferraris; +Cc: linux-ext4, drosen, ebiggers, tytso

Arnaud Ferraris <arnaud.ferraris@collabora.com> writes:

> From: Gabriel Krisman Bertazi <krisman@collabora.com>
>
> This is interesting for !strict filesystems as part of the encoding
> update procedure. Once the filesystem is known to not have badly encoded
> filenames, the update is trivial, thanks to the stability of assigned
> code points in the unicode specification.

Ted,

I thought of this after your comment on LSFMM 2019 that strict mode
made the update of the unicode version trivial.  I think this patch is a
dependency for ext4 to support a newer unicode version, which I'd like
to support in the near feature, such that we don't lag too much behind
the Unicode specification.

>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
> ---
>  e2fsck/e2fsck.h | 1 +
>  e2fsck/pass2.c  | 5 +++--
>  e2fsck/unix.c   | 4 ++++
>  3 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
> index dcaab0a1..f324e92c 100644
> --- a/e2fsck/e2fsck.h
> +++ b/e2fsck/e2fsck.h
> @@ -177,6 +177,7 @@ struct resource_track {
>  #define E2F_OPT_ICOUNT_FULLMAP	0x20000 /* use an array for inode counts */
>  #define E2F_OPT_UNSHARE_BLOCKS  0x40000
>  #define E2F_OPT_CLEAR_UNINIT	0x80000 /* Hack to clear the uninit bit */
> +#define E2F_OPT_CHECK_ENCODING  0x100000 /* Force verification of encoded filenames */
>  
>  /*
>   * E2fsck flags
> diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
> index b6514de8..f93e2b53 100644
> --- a/e2fsck/pass2.c
> +++ b/e2fsck/pass2.c
> @@ -1049,9 +1049,10 @@ static int check_dir_block(ext2_filsys fs,
>  	ctx = cd->ctx;
>  
>  	/* We only want filename encoding verification on strict
> -	 * mode. */
> +	 * mode or if explicitly requested by user. */
>  	if (ext2fs_test_inode_bitmap2(ctx->inode_casefold_map, ino) &&
> -	    (ctx->fs->super->s_encoding_flags & EXT4_ENC_STRICT_MODE_FL))
> +	    ((ctx->fs->super->s_encoding_flags & EXT4_ENC_STRICT_MODE_FL) ||
> +	     (ctx->options & E2F_OPT_CHECK_ENCODING)))
>  		cf_dir = 1;
>  
>  	if (ctx->flags & E2F_FLAG_RUN_RETURN)
> diff --git a/e2fsck/unix.c b/e2fsck/unix.c
> index 1cb51672..0a9012e5 100644
> --- a/e2fsck/unix.c
> +++ b/e2fsck/unix.c
> @@ -753,6 +753,9 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
>  			ctx->options |= E2F_OPT_UNSHARE_BLOCKS;
>  			ctx->options |= E2F_OPT_FORCE;
>  			continue;
> +		} else if (strcmp(token, "check_encoding") == 0) {
> +			ctx->options |= E2F_OPT_CHECK_ENCODING;
> +			continue;
>  #ifdef CONFIG_DEVELOPER_FEATURES
>  		} else if (strcmp(token, "clear_all_uninit_bits") == 0) {
>  			ctx->options |= E2F_OPT_CLEAR_UNINIT;
> @@ -784,6 +787,7 @@ static void parse_extended_opts(e2fsck_t ctx, const char *opts)
>  		fputs("\tbmap2extent\n", stderr);
>  		fputs("\tunshare_blocks\n", stderr);
>  		fputs("\tfixes_only\n", stderr);
> +		fputs("\tcheck_encoding\n", stderr);
>  		fputc('\n', stderr);
>  		exit(1);
>  	}

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND v2 06/12] e2fsck: Fix entries with invalid encoded characters
  2020-12-10 15:03 ` [PATCH RESEND v2 06/12] e2fsck: Fix entries with invalid encoded characters Arnaud Ferraris
@ 2020-12-10 20:51   ` Gabriel Krisman Bertazi
  2020-12-15 17:16     ` Arnaud Ferraris
  0 siblings, 1 reply; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-10 20:51 UTC (permalink / raw)
  To: Arnaud Ferraris; +Cc: linux-ext4, drosen, ebiggers, tytso

Arnaud Ferraris <arnaud.ferraris@collabora.com> writes:

> From: Gabriel Krisman Bertazi <krisman@collabora.com>
>
> On strict mode, invalid Unicode sequences are not permited.  This patch
> adds a verification step to pass2 to detect and modify the entries with
> the same replacement char used for non-encoding directories '.'.
>
> After the encoding test, we still want to check the name for usual
> problems, '\0', '/' in the middle of the sequence.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
> ---
> Changes in v2:
>   - added missing comment
>   - uses the problem code introduced by the previous patch
>   - reworked a test to ease future support of encrypted+casefolded
>     directories
>
>  e2fsck/e2fsck.c |  4 ++++
>  e2fsck/e2fsck.h |  1 +
>  e2fsck/pass1.c  | 18 +++++++++++++++++
>  e2fsck/pass2.c  | 51 ++++++++++++++++++++++++++++++++++++++++++++-----
>  4 files changed, 69 insertions(+), 5 deletions(-)
>
> diff --git a/e2fsck/e2fsck.c b/e2fsck/e2fsck.c
> index d8be566f..dc4b45e2 100644
> --- a/e2fsck/e2fsck.c
> +++ b/e2fsck/e2fsck.c
> @@ -75,6 +75,10 @@ errcode_t e2fsck_reset_context(e2fsck_t ctx)
>  		ext2fs_free_block_bitmap(ctx->block_found_map);
>  		ctx->block_found_map = 0;
>  	}
> +	if (ctx->inode_casefold_map) {
> +		ext2fs_free_block_bitmap(ctx->inode_casefold_map);
> +		ctx->inode_casefold_map = 0;
> +	}
>  	if (ctx->inode_link_info) {
>  		ext2fs_free_icount(ctx->inode_link_info);
>  		ctx->inode_link_info = 0;
> diff --git a/e2fsck/e2fsck.h b/e2fsck/e2fsck.h
> index 85f953b2..dcaab0a1 100644
> --- a/e2fsck/e2fsck.h
> +++ b/e2fsck/e2fsck.h
> @@ -262,6 +262,7 @@ struct e2fsck_struct {
>  	ext2fs_inode_bitmap inode_bb_map; /* Inodes which are in bad blocks */
>  	ext2fs_inode_bitmap inode_imagic_map; /* AFS inodes */
>  	ext2fs_inode_bitmap inode_reg_map; /* Inodes which are regular files*/
> +	ext2fs_inode_bitmap inode_casefold_map; /* Inodes which are casefolded */
>  
>  	ext2fs_block_bitmap block_found_map; /* Blocks which are in use */
>  	ext2fs_block_bitmap block_dup_map; /* Blks referenced more than once */
> diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
> index 8eecd958..6909fed5 100644
> --- a/e2fsck/pass1.c
> +++ b/e2fsck/pass1.c
> @@ -23,6 +23,7 @@
>   * 	- A bitmap of which inodes have bad fields.	(inode_bad_map)
>   * 	- A bitmap of which inodes are in bad blocks.	(inode_bb_map)
>   * 	- A bitmap of which inodes are imagic inodes.	(inode_imagic_map)
> + * 	- A bitmap of which inodes are casefolded.	(inode_casefold_map)
>   * 	- A bitmap of which blocks are in use.		(block_found_map)
>   * 	- A bitmap of which blocks are in use by two inodes	(block_dup_map)
>   * 	- The data blocks of the directory inodes.	(dir_map)
> @@ -1260,6 +1261,20 @@ void e2fsck_pass1(e2fsck_t ctx)
>  		ctx->flags |= E2F_FLAG_ABORT;
>  		return;
>  	}
> +	if (casefold_fs) {
> +		pctx.errcode =
> +			e2fsck_allocate_inode_bitmap(fs,
> +						     _("inode casefold map"),
> +						     EXT2FS_BMAP64_RBTREE,
> +						     "inode_casefold_map",
> +						     &ctx->inode_casefold_map);
> +		if (pctx.errcode) {
> +			pctx.num = 1;
> +			fix_problem(ctx, PR_1_ALLOCATE_IBITMAP_ERROR, &pctx);
> +			ctx->flags |= E2F_FLAG_ABORT;
> +			return;
> +		}
> +	}
>  	pctx.errcode = e2fsck_setup_icount(ctx, "inode_link_info", 0, NULL,
>  					   &ctx->inode_link_info);
>  	if (pctx.errcode) {
> @@ -1870,6 +1885,9 @@ void e2fsck_pass1(e2fsck_t ctx)
>  		    add_encrypted_file(ctx, &pctx) < 0)
>  			goto clear_inode;
>  
> +		if (casefold_fs && inode->i_flags & EXT4_CASEFOLD_FL)
> +			ext2fs_mark_inode_bitmap2(ctx->inode_casefold_map, ino);
> +
>  		if (LINUX_S_ISDIR(inode->i_mode)) {
>  			ext2fs_mark_inode_bitmap2(ctx->inode_dir_map, ino);
>  			e2fsck_add_dir_info(ctx, ino, 0);
> diff --git a/e2fsck/pass2.c b/e2fsck/pass2.c
> index 4dbc44ea..b9402b24 100644
> --- a/e2fsck/pass2.c
> +++ b/e2fsck/pass2.c
> @@ -36,11 +36,13 @@
>   * 	- The inode_bad_map bitmap
>   * 	- The inode_dir_map bitmap
>   * 	- The encrypted_file_info
> + *	- The inode_casefold_map bitmap
>   *
>   * Pass 2 frees the following data structures
>   * 	- The inode_bad_map bitmap
>   * 	- The inode_reg_map bitmap
>   * 	- The encrypted_file_info
> + *	- The inode_casefold_map bitmap
>   */
>  
>  #define _GNU_SOURCE 1 /* get strnlen() */
> @@ -287,6 +289,10 @@ void e2fsck_pass2(e2fsck_t ctx)
>  		ext2fs_free_inode_bitmap(ctx->inode_reg_map);
>  		ctx->inode_reg_map = 0;
>  	}
> +	if (ctx->inode_casefold_map) {
> +		ext2fs_free_inode_bitmap(ctx->inode_casefold_map);
> +		ctx->inode_casefold_map = 0;
> +	}
>  	destroy_encrypted_file_info(ctx);
>  
>  	clear_problem_context(&pctx);
> @@ -515,6 +521,30 @@ static int encrypted_check_name(e2fsck_t ctx,
>  	return 0;
>  }
>  
> +static int encoded_check_name(e2fsck_t ctx,
> +			      struct ext2_dir_entry *dirent,
> +			      struct problem_context *pctx)
> +{
> +	const struct ext2fs_nls_table *tbl = ctx->fs->encoding;
> +	int ret;
> +	int len = ext2fs_dirent_name_len(dirent);
> +	char *pos, *end;
> +
> +	ret = ext2fs_check_encoded_name(tbl, dirent->name, len, &pos);
> +	if (ret < 0) {
> +		fatal_error(ctx, _("NLS is broken."));
> +	} else if(ret > 0) {
> +		ret = fix_problem(ctx, PR_2_BAD_CASEFOLDED_NAME, pctx);
> +		if (ret) {
> +			end = &dirent->name[len];
> +			for (; *pos && pos != end; pos++)
> +				*pos = '.';
> +		}
> +	}
> +
> +	return (ret || check_name(ctx, dirent, pctx));
> +}
> +
>  /*
>   * Check the directory filetype (if present)
>   */
> @@ -998,11 +1028,18 @@ static int check_dir_block(ext2_filsys fs,
>  	size_t	max_block_size;
>  	int	hash_flags = 0;
>  	static char *eop_read_dirblock = NULL;
> +	int cf_dir = 0;
>  
>  	cd = (struct check_dir_struct *) priv_data;
>  	ibuf = buf = cd->buf;
>  	ctx = cd->ctx;
>  
> +	/* We only want filename encoding verification on strict
> +	 * mode. */
> +	if (ext2fs_test_inode_bitmap2(ctx->inode_casefold_map, ino) &&
> +	    (ctx->fs->super->s_encoding_flags & EXT4_ENC_STRICT_MODE_FL))
> +		cf_dir = 1;
> +
>  	if (ctx->flags & E2F_FLAG_RUN_RETURN)
>  		return DIRENT_ABORT;
>  
> @@ -1483,11 +1520,7 @@ skip_checksum:
>  		if (check_filetype(ctx, dirent, ino, &cd->pctx))
>  			dir_modified++;
>  
> -		if (dir_encpolicy_id == NO_ENCRYPTION_POLICY) {
> -			/* Unencrypted directory */
> -			if (check_name(ctx, dirent, &cd->pctx))
> -				dir_modified++;
> -		} else {
> +		if (dir_encpolicy_id != NO_ENCRYPTION_POLICY) {
>  			/* Encrypted directory */
>  			if (dot_state > 1 &&
>  			    check_encrypted_dirent(ctx, dirent,
> @@ -1497,6 +1530,14 @@ skip_checksum:
>  				dir_modified++;
>  				goto next;
>  			}
> +		} else if (cf_dir) {
> +			/* Casefolded directory */
> +			if (encoded_check_name(ctx, dirent, &cd->pctx))
> +				dir_modified++;
> +		} else {
> +			/* Unencrypted and uncasefolded directory */
> +			if (check_name(ctx, dirent, &cd->pctx))
> +				dir_modified++;
>  		}

This won't do for encrypted+casefolded directories, right?

>  
>  		if (dx_db) {

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND v2 07/12] e2fsck: Support casefold directories when rehashing
  2020-12-10 15:03 ` [PATCH RESEND v2 07/12] e2fsck: Support casefold directories when rehashing Arnaud Ferraris
@ 2020-12-10 20:53   ` Gabriel Krisman Bertazi
  2020-12-15 17:17     ` Arnaud Ferraris
  0 siblings, 1 reply; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-10 20:53 UTC (permalink / raw)
  To: Arnaud Ferraris; +Cc: linux-ext4, drosen, ebiggers, tytso

Arnaud Ferraris <arnaud.ferraris@collabora.com> writes:

> From: Gabriel Krisman Bertazi <krisman@collabora.com>
>
> When rehashing a +F directory, the casefold comparison needs to be
> performed, in order to identify duplicated filenames.  Like the -F
> version, This is done in two steps, first adapt the qsort comparison to
> consider casefolded directories, and then iterate over the sorted list
> fixing dups.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
> ---
>  e2fsck/rehash.c | 88 ++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 72 insertions(+), 16 deletions(-)
>
> diff --git a/e2fsck/rehash.c b/e2fsck/rehash.c
> index 30e510a6..14215011 100644
> --- a/e2fsck/rehash.c
> +++ b/e2fsck/rehash.c
> @@ -214,6 +214,23 @@ static EXT2_QSORT_TYPE ino_cmp(const void *a, const void *b)
>  	return (he_a->ino - he_b->ino);
>  }
>  
> +struct name_cmp_ctx
> +{
> +	int casefold;
> +	const struct ext2fs_nls_table *tbl;
> +};
> +
> +
> +static int same_name(const struct name_cmp_ctx *cmp_ctx, char *s1,
> +		     int len1, char *s2, int len2)
> +{
> +	if (!cmp_ctx->casefold)
> +		return (len1 == len2 &&	!memcmp(s1, s2, len1));
> +	else
> +		return !ext2fs_casefold_cmp(cmp_ctx->tbl,
> +					    s1, len1, s2, len2);
> +}
> +
>  /* Used for sorting the hash entry */
>  static EXT2_QSORT_TYPE name_cmp(const void *a, const void *b)
>  {
> @@ -240,9 +257,35 @@ static EXT2_QSORT_TYPE name_cmp(const void *a, const void *b)
>  	return ret;
>  }
>  
> +static EXT2_QSORT_TYPE name_cf_cmp(const struct name_cmp_ctx *ctx,
> +				   const void *a, const void *b)
> +{
> +	const struct hash_entry *he_a = (const struct hash_entry *) a;
> +	const struct hash_entry *he_b = (const struct hash_entry *) b;
> +	unsigned int he_a_len, he_b_len, min_len;
> +	int ret;
> +
> +	he_a_len = ext2fs_dirent_name_len(he_a->dir);
> +	he_b_len = ext2fs_dirent_name_len(he_b->dir);
> +
> +	ret = ext2fs_casefold_cmp(ctx->tbl, he_a->dir->name, he_a_len,
> +				  he_b->dir->name, he_b_len);
> +	if (ret == 0) {
> +		if (he_a_len > he_b_len)
> +			ret = 1;
> +		else if (he_a_len < he_b_len)
> +			ret = -1;
> +		else
> +			ret = he_b->dir->inode - he_a->dir->inode;
> +	}
> +	return ret;
> +}
> +
> +

extra line

>  /* Used for sorting the hash entry */
> -static EXT2_QSORT_TYPE hash_cmp(const void *a, const void *b)
> +static EXT2_QSORT_TYPE hash_cmp(const void *a, const void *b, void *arg)
>  {
> +	const struct name_cmp_ctx *ctx = (struct name_cmp_ctx *) arg;
>  	const struct hash_entry *he_a = (const struct hash_entry *) a;
>  	const struct hash_entry *he_b = (const struct hash_entry *) b;
>  	int	ret;
> @@ -256,8 +299,12 @@ static EXT2_QSORT_TYPE hash_cmp(const void *a, const void *b)
>  			ret = 1;
>  		else if (he_a->minor_hash < he_b->minor_hash)
>  			ret = -1;
> -		else
> -			ret = name_cmp(a, b);
> +		else {
> +			if (ctx->casefold)
> +				ret = name_cf_cmp(ctx, a, b);
> +			else
> +				ret = name_cmp(a, b);
> +		}
>  	}
>  	return ret;
>  }
> @@ -380,7 +427,8 @@ static void mutate_name(char *str, unsigned int *len)
>  
>  static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
>  				    ext2_ino_t ino,
> -				    struct fill_dir_struct *fd)
> +				    struct fill_dir_struct *fd,
> +				    const struct name_cmp_ctx *cmp_ctx)
>  {
>  	struct problem_context	pctx;
>  	struct hash_entry	*ent, *prev;
> @@ -403,11 +451,12 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
>  		ent = fd->harray + i;
>  		prev = ent - 1;
>  		if (!ent->dir->inode ||
> -		    (ext2fs_dirent_name_len(ent->dir) !=
> -		     ext2fs_dirent_name_len(prev->dir)) ||
> -		    memcmp(ent->dir->name, prev->dir->name,
> -			     ext2fs_dirent_name_len(ent->dir)))
> +		    !same_name(cmp_ctx, ent->dir->name,
> +			       ext2fs_dirent_name_len(ent->dir),
> +			       prev->dir->name,
> +			       ext2fs_dirent_name_len(prev->dir)))
>  			continue;
> +

noise.

Other than that, I think this is still good.

>  		pctx.dirent = ent->dir;
>  		if ((ent->dir->inode == prev->dir->inode) &&
>  		    fix_problem(ctx, PR_2_DUPLICATE_DIRENT, &pctx)) {
> @@ -426,10 +475,11 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
>  		mutate_name(new_name, &new_len);
>  		for (j=0; j < fd->num_array; j++) {
>  			if ((i==j) ||
> -			    (new_len !=
> -			     (unsigned) ext2fs_dirent_name_len(fd->harray[j].dir)) ||
> -			    memcmp(new_name, fd->harray[j].dir->name, new_len))
> +			    !same_name(cmp_ctx, new_name, new_len,
> +				       fd->harray[j].dir->name,
> +				       ext2fs_dirent_name_len(fd->harray[j].dir))) {
>  				continue;
> +			}
>  			mutate_name(new_name, &new_len);
>  
>  			j = -1;
> @@ -894,6 +944,7 @@ errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
>  	struct fill_dir_struct	fd = { NULL, NULL, 0, 0, 0, NULL,
>  				       0, 0, 0, 0, 0, 0 };
>  	struct out_dir		outdir = { 0, 0, 0, 0 };
> +	struct name_cmp_ctx name_cmp_ctx = {0, NULL};
>  
>  	e2fsck_read_inode(ctx, ino, &inode, "rehash_dir");
>  
> @@ -921,6 +972,11 @@ errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
>  		fd.compress = 1;
>  	fd.parent = 0;
>  
> +	if (fs->encoding && (inode.i_flags & EXT4_CASEFOLD_FL)) {
> +		name_cmp_ctx.casefold = 1;
> +		name_cmp_ctx.tbl = fs->encoding;
> +	}
> +
>  retry_nohash:
>  	/* Read in the entire directory into memory */
>  	retval = ext2fs_block_iterate3(fs, ino, 0, 0,
> @@ -949,16 +1005,16 @@ retry_nohash:
>  	/* Sort the list */
>  resort:
>  	if (fd.compress && fd.num_array > 1)
> -		qsort(fd.harray+2, fd.num_array-2, sizeof(struct hash_entry),
> -		      hash_cmp);
> +		qsort_r(fd.harray+2, fd.num_array-2, sizeof(struct hash_entry),
> +			hash_cmp, &name_cmp_ctx);
>  	else
> -		qsort(fd.harray, fd.num_array, sizeof(struct hash_entry),
> -		      hash_cmp);
> +		qsort_r(fd.harray, fd.num_array, sizeof(struct hash_entry),
> +			hash_cmp, &name_cmp_ctx);
>  
>  	/*
>  	 * Look for duplicates
>  	 */
> -	if (duplicate_search_and_fix(ctx, fs, ino, &fd))
> +	if (duplicate_search_and_fix(ctx, fs, ino, &fd, &name_cmp_ctx))
>  		goto resort;
>  
>  	if (ctx->options & E2F_OPT_NO) {

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND v2 06/12] e2fsck: Fix entries with invalid encoded characters
  2020-12-10 20:51   ` Gabriel Krisman Bertazi
@ 2020-12-15 17:16     ` Arnaud Ferraris
  0 siblings, 0 replies; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-15 17:16 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: linux-ext4, drosen, ebiggers, tytso

Hi Gabriel,

Le 10/12/2020 à 21:51, Gabriel Krisman Bertazi a écrit :
> Arnaud Ferraris <arnaud.ferraris@collabora.com> writes:
> 
>> From: Gabriel Krisman Bertazi <krisman@collabora.com>
>>
>> @@ -1483,11 +1520,7 @@ skip_checksum:
>>  		if (check_filetype(ctx, dirent, ino, &cd->pctx))
>>  			dir_modified++;
>>  
>> -		if (dir_encpolicy_id == NO_ENCRYPTION_POLICY) {
>> -			/* Unencrypted directory */
>> -			if (check_name(ctx, dirent, &cd->pctx))
>> -				dir_modified++;
>> -		} else {
>> +		if (dir_encpolicy_id != NO_ENCRYPTION_POLICY) {
>>  			/* Encrypted directory */
>>  			if (dot_state > 1 &&
>>  			    check_encrypted_dirent(ctx, dirent,
>> @@ -1497,6 +1530,14 @@ skip_checksum:
>>  				dir_modified++;
>>  				goto next;
>>  			}
>> +		} else if (cf_dir) {
>> +			/* Casefolded directory */
>> +			if (encoded_check_name(ctx, dirent, &cd->pctx))
>> +				dir_modified++;
>> +		} else {
>> +			/* Unencrypted and uncasefolded directory */
>> +			if (check_name(ctx, dirent, &cd->pctx))
>> +				dir_modified++;
>>  		}
> 
> This won't do for encrypted+casefolded directories, right?

Indeed, as encrypted+casefolded isn't supported right now, it's just a
re-arrangement to ease future support, as suggested by Eric.

Arnaud

> 
>>  
>>  		if (dx_db) {
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND v2 07/12] e2fsck: Support casefold directories when rehashing
  2020-12-10 20:53   ` Gabriel Krisman Bertazi
@ 2020-12-15 17:17     ` Arnaud Ferraris
  2020-12-15 17:34       ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 26+ messages in thread
From: Arnaud Ferraris @ 2020-12-15 17:17 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: linux-ext4, drosen, ebiggers, tytso



Le 10/12/2020 à 21:53, Gabriel Krisman Bertazi a écrit :
> Arnaud Ferraris <arnaud.ferraris@collabora.com> writes:
> 
>> From: Gabriel Krisman Bertazi <krisman@collabora.com>
>>
>> @@ -403,11 +451,12 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
>>  		ent = fd->harray + i;
>>  		prev = ent - 1;
>>  		if (!ent->dir->inode ||
>> -		    (ext2fs_dirent_name_len(ent->dir) !=
>> -		     ext2fs_dirent_name_len(prev->dir)) ||
>> -		    memcmp(ent->dir->name, prev->dir->name,
>> -			     ext2fs_dirent_name_len(ent->dir)))
>> +		    !same_name(cmp_ctx, ent->dir->name,
>> +			       ext2fs_dirent_name_len(ent->dir),
>> +			       prev->dir->name,
>> +			       ext2fs_dirent_name_len(prev->dir)))
>>  			continue;
>> +
> 
> noise.

Could you please be more specific?

Arnaud

> 
> Other than that, I think this is still good.
> 
>>  		pctx.dirent = ent->dir;
>>  		if ((ent->dir->inode == prev->dir->inode) &&
>>  		    fix_problem(ctx, PR_2_DUPLICATE_DIRENT, &pctx)) {
>> @@ -426,10 +475,11 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
>>  		mutate_name(new_name, &new_len);
>>  		for (j=0; j < fd->num_array; j++) {
>>  			if ((i==j) ||
>> -			    (new_len !=
>> -			     (unsigned) ext2fs_dirent_name_len(fd->harray[j].dir)) ||
>> -			    memcmp(new_name, fd->harray[j].dir->name, new_len))
>> +			    !same_name(cmp_ctx, new_name, new_len,
>> +				       fd->harray[j].dir->name,
>> +				       ext2fs_dirent_name_len(fd->harray[j].dir))) {
>>  				continue;
>> +			}
>>  			mutate_name(new_name, &new_len);
>>  
>>  			j = -1;
>> @@ -894,6 +944,7 @@ errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
>>  	struct fill_dir_struct	fd = { NULL, NULL, 0, 0, 0, NULL,
>>  				       0, 0, 0, 0, 0, 0 };
>>  	struct out_dir		outdir = { 0, 0, 0, 0 };
>> +	struct name_cmp_ctx name_cmp_ctx = {0, NULL};
>>  
>>  	e2fsck_read_inode(ctx, ino, &inode, "rehash_dir");
>>  
>> @@ -921,6 +972,11 @@ errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
>>  		fd.compress = 1;
>>  	fd.parent = 0;
>>  
>> +	if (fs->encoding && (inode.i_flags & EXT4_CASEFOLD_FL)) {
>> +		name_cmp_ctx.casefold = 1;
>> +		name_cmp_ctx.tbl = fs->encoding;
>> +	}
>> +
>>  retry_nohash:
>>  	/* Read in the entire directory into memory */
>>  	retval = ext2fs_block_iterate3(fs, ino, 0, 0,
>> @@ -949,16 +1005,16 @@ retry_nohash:
>>  	/* Sort the list */
>>  resort:
>>  	if (fd.compress && fd.num_array > 1)
>> -		qsort(fd.harray+2, fd.num_array-2, sizeof(struct hash_entry),
>> -		      hash_cmp);
>> +		qsort_r(fd.harray+2, fd.num_array-2, sizeof(struct hash_entry),
>> +			hash_cmp, &name_cmp_ctx);
>>  	else
>> -		qsort(fd.harray, fd.num_array, sizeof(struct hash_entry),
>> -		      hash_cmp);
>> +		qsort_r(fd.harray, fd.num_array, sizeof(struct hash_entry),
>> +			hash_cmp, &name_cmp_ctx);
>>  
>>  	/*
>>  	 * Look for duplicates
>>  	 */
>> -	if (duplicate_search_and_fix(ctx, fs, ino, &fd))
>> +	if (duplicate_search_and_fix(ctx, fs, ino, &fd, &name_cmp_ctx))
>>  		goto resort;
>>  
>>  	if (ctx->options & E2F_OPT_NO) {
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND v2 07/12] e2fsck: Support casefold directories when rehashing
  2020-12-15 17:17     ` Arnaud Ferraris
@ 2020-12-15 17:34       ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 26+ messages in thread
From: Gabriel Krisman Bertazi @ 2020-12-15 17:34 UTC (permalink / raw)
  To: Arnaud Ferraris; +Cc: linux-ext4, drosen, ebiggers, tytso

Arnaud Ferraris <arnaud.ferraris@collabora.com> writes:

> Le 10/12/2020 à 21:53, Gabriel Krisman Bertazi a écrit :
>> Arnaud Ferraris <arnaud.ferraris@collabora.com> writes:
>> 
>>> From: Gabriel Krisman Bertazi <krisman@collabora.com>
>>>
>>> @@ -403,11 +451,12 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
>>>  		ent = fd->harray + i;
>>>  		prev = ent - 1;
>>>  		if (!ent->dir->inode ||
>>> -		    (ext2fs_dirent_name_len(ent->dir) !=
>>> -		     ext2fs_dirent_name_len(prev->dir)) ||
>>> -		    memcmp(ent->dir->name, prev->dir->name,
>>> -			     ext2fs_dirent_name_len(ent->dir)))
>>> +		    !same_name(cmp_ctx, ent->dir->name,
>>> +			       ext2fs_dirent_name_len(ent->dir),
>>> +			       prev->dir->name,
>>> +			       ext2fs_dirent_name_len(prev->dir)))
>>>  			continue;
>>> +
   ^^^^^^^

>> 
>> noise.
>
> Could you please be more specific?

the patch is adding an empty line for no reason.

>
> Arnaud
>
>> 
>> Other than that, I think this is still good.
>> 
>>>  		pctx.dirent = ent->dir;
>>>  		if ((ent->dir->inode == prev->dir->inode) &&
>>>  		    fix_problem(ctx, PR_2_DUPLICATE_DIRENT, &pctx)) {
>>> @@ -426,10 +475,11 @@ static int duplicate_search_and_fix(e2fsck_t ctx, ext2_filsys fs,
>>>  		mutate_name(new_name, &new_len);
>>>  		for (j=0; j < fd->num_array; j++) {
>>>  			if ((i==j) ||
>>> -			    (new_len !=
>>> -			     (unsigned) ext2fs_dirent_name_len(fd->harray[j].dir)) ||
>>> -			    memcmp(new_name, fd->harray[j].dir->name, new_len))
>>> +			    !same_name(cmp_ctx, new_name, new_len,
>>> +				       fd->harray[j].dir->name,
>>> +				       ext2fs_dirent_name_len(fd->harray[j].dir))) {
>>>  				continue;
>>> +			}
>>>  			mutate_name(new_name, &new_len);
>>>  
>>>  			j = -1;
>>> @@ -894,6 +944,7 @@ errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
>>>  	struct fill_dir_struct	fd = { NULL, NULL, 0, 0, 0, NULL,
>>>  				       0, 0, 0, 0, 0, 0 };
>>>  	struct out_dir		outdir = { 0, 0, 0, 0 };
>>> +	struct name_cmp_ctx name_cmp_ctx = {0, NULL};
>>>  
>>>  	e2fsck_read_inode(ctx, ino, &inode, "rehash_dir");
>>>  
>>> @@ -921,6 +972,11 @@ errcode_t e2fsck_rehash_dir(e2fsck_t ctx, ext2_ino_t ino,
>>>  		fd.compress = 1;
>>>  	fd.parent = 0;
>>>  
>>> +	if (fs->encoding && (inode.i_flags & EXT4_CASEFOLD_FL)) {
>>> +		name_cmp_ctx.casefold = 1;
>>> +		name_cmp_ctx.tbl = fs->encoding;
>>> +	}
>>> +
>>>  retry_nohash:
>>>  	/* Read in the entire directory into memory */
>>>  	retval = ext2fs_block_iterate3(fs, ino, 0, 0,
>>> @@ -949,16 +1005,16 @@ retry_nohash:
>>>  	/* Sort the list */
>>>  resort:
>>>  	if (fd.compress && fd.num_array > 1)
>>> -		qsort(fd.harray+2, fd.num_array-2, sizeof(struct hash_entry),
>>> -		      hash_cmp);
>>> +		qsort_r(fd.harray+2, fd.num_array-2, sizeof(struct hash_entry),
>>> +			hash_cmp, &name_cmp_ctx);
>>>  	else
>>> -		qsort(fd.harray, fd.num_array, sizeof(struct hash_entry),
>>> -		      hash_cmp);
>>> +		qsort_r(fd.harray, fd.num_array, sizeof(struct hash_entry),
>>> +			hash_cmp, &name_cmp_ctx);
>>>  
>>>  	/*
>>>  	 * Look for duplicates
>>>  	 */
>>> -	if (duplicate_search_and_fix(ctx, fs, ino, &fd))
>>> +	if (duplicate_search_and_fix(ctx, fs, ino, &fd, &name_cmp_ctx))
>>>  		goto resort;
>>>  
>>>  	if (ctx->options & E2F_OPT_NO) {
>> 

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND v2 01/12] tune2fs: Allow enabling casefold feature after fs creation
  2020-12-10 15:03 ` [PATCH RESEND v2 01/12] tune2fs: Allow enabling casefold feature after fs creation Arnaud Ferraris
@ 2021-01-27 22:42   ` Theodore Ts'o
  0 siblings, 0 replies; 26+ messages in thread
From: Theodore Ts'o @ 2021-01-27 22:42 UTC (permalink / raw)
  To: Arnaud Ferraris; +Cc: linux-ext4, drosen, krisman, ebiggers

On Thu, Dec 10, 2020 at 04:03:42PM +0100, Arnaud Ferraris wrote:
> From: Gabriel Krisman Bertazi <krisman@collabora.com>
> 
> The main reason we didn't allow this before was because !CASEFOLDED
> directories were expected to be normalized().  Since this is no longer
> the case, and as long as the encrypt feature is not enabled, it should
> be safe to enable this feature.
> 
> Disabling the feature is trickier, since we need to make sure there are
> no existing +F directories in the filesystem.  Leave that for a future
> patch.
> 
> Also, enabling strict mode requires some filesystem-wide verification,
> so ignore that for now.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND v2 02/12] tune2fs: Fix casefold+encrypt error message
  2020-12-10 15:03 ` [PATCH RESEND v2 02/12] tune2fs: Fix casefold+encrypt error message Arnaud Ferraris
@ 2021-01-27 22:46   ` Theodore Ts'o
  0 siblings, 0 replies; 26+ messages in thread
From: Theodore Ts'o @ 2021-01-27 22:46 UTC (permalink / raw)
  To: Arnaud Ferraris; +Cc: linux-ext4, drosen, krisman, ebiggers

On Thu, Dec 10, 2020 at 04:03:43PM +0100, Arnaud Ferraris wrote:
> From: Gabriel Krisman Bertazi <krisman@collabora.com>
> 
> Refering to EXT4_INCOMPAT_CASEFOLD as encoding is not as meaningful as
> saying casefold.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND v2 03/12] ext2fs: Add method to validate casefolded strings
  2020-12-10 15:03 ` [PATCH RESEND v2 03/12] ext2fs: Add method to validate casefolded strings Arnaud Ferraris
@ 2021-01-28  2:48   ` Theodore Ts'o
  0 siblings, 0 replies; 26+ messages in thread
From: Theodore Ts'o @ 2021-01-28  2:48 UTC (permalink / raw)
  To: Arnaud Ferraris; +Cc: linux-ext4, drosen, krisman, ebiggers

On Thu, Dec 10, 2020 at 04:03:44PM +0100, Arnaud Ferraris wrote:
> From: Gabriel Krisman Bertazi <krisman@collabora.com>
> 
> This is exported to be used by fsck.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND v2 04/12] ext2fs: Implement faster CI comparison of strings
  2020-12-10 15:03 ` [PATCH RESEND v2 04/12] ext2fs: Implement faster CI comparison of strings Arnaud Ferraris
@ 2021-01-28  2:49   ` Theodore Ts'o
  0 siblings, 0 replies; 26+ messages in thread
From: Theodore Ts'o @ 2021-01-28  2:49 UTC (permalink / raw)
  To: Arnaud Ferraris; +Cc: linux-ext4, drosen, krisman, ebiggers

On Thu, Dec 10, 2020 at 04:03:45PM +0100, Arnaud Ferraris wrote:
> From: Gabriel Krisman Bertazi <krisman@collabora.com>
> 
> Instead of calling casefold two times and memcmp the result, which
> require allocating a temporary buffer for the casefolded version, add a
> strcasecmp-like method to perform the comparison of each code-point
> during the casefold itself.
> 
> This method is exposed because it needs to be used directly by fsck.
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs
  2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
                   ` (11 preceding siblings ...)
  2020-12-10 15:03 ` [PATCH RESEND v2 12/12] tests: f_bad_fname: Test fixes of invalid filenames and duplicates Arnaud Ferraris
@ 2021-01-28  2:52 ` Theodore Ts'o
  12 siblings, 0 replies; 26+ messages in thread
From: Theodore Ts'o @ 2021-01-28  2:52 UTC (permalink / raw)
  To: Arnaud Ferraris; +Cc: linux-ext4, drosen, krisman, ebiggers

Oops, I started applying the wrong version.  I'm aborting and
restarting with the v3 of this patch series.

	   	       	       - Ted

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-01-28  2:53 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
2020-12-10 15:03 ` [PATCH RESEND v2 01/12] tune2fs: Allow enabling casefold feature after fs creation Arnaud Ferraris
2021-01-27 22:42   ` Theodore Ts'o
2020-12-10 15:03 ` [PATCH RESEND v2 02/12] tune2fs: Fix casefold+encrypt error message Arnaud Ferraris
2021-01-27 22:46   ` Theodore Ts'o
2020-12-10 15:03 ` [PATCH RESEND v2 03/12] ext2fs: Add method to validate casefolded strings Arnaud Ferraris
2021-01-28  2:48   ` Theodore Ts'o
2020-12-10 15:03 ` [PATCH RESEND v2 04/12] ext2fs: Implement faster CI comparison of strings Arnaud Ferraris
2021-01-28  2:49   ` Theodore Ts'o
2020-12-10 15:03 ` [PATCH RESEND v2 05/12] e2fsck: add new problem for casefolded name check Arnaud Ferraris
2020-12-10 20:36   ` Gabriel Krisman Bertazi
2020-12-10 20:38   ` Gabriel Krisman Bertazi
2020-12-10 15:03 ` [PATCH RESEND v2 06/12] e2fsck: Fix entries with invalid encoded characters Arnaud Ferraris
2020-12-10 20:51   ` Gabriel Krisman Bertazi
2020-12-15 17:16     ` Arnaud Ferraris
2020-12-10 15:03 ` [PATCH RESEND v2 07/12] e2fsck: Support casefold directories when rehashing Arnaud Ferraris
2020-12-10 20:53   ` Gabriel Krisman Bertazi
2020-12-15 17:17     ` Arnaud Ferraris
2020-12-15 17:34       ` Gabriel Krisman Bertazi
2020-12-10 15:03 ` [PATCH RESEND v2 08/12] dict: Support comparison with context Arnaud Ferraris
2020-12-10 15:03 ` [PATCH RESEND v2 09/12] e2fsck: Detect duplicated casefolded direntries for rehash Arnaud Ferraris
2020-12-10 15:03 ` [PATCH RESEND v2 10/12] e2fsck: Add option to force encoded filename verification Arnaud Ferraris
2020-12-10 20:48   ` Gabriel Krisman Bertazi
2020-12-10 15:03 ` [PATCH RESEND v2 11/12] e2fsck.8.in: Document check_encoding extended option Arnaud Ferraris
2020-12-10 15:03 ` [PATCH RESEND v2 12/12] tests: f_bad_fname: Test fixes of invalid filenames and duplicates Arnaud Ferraris
2021-01-28  2:52 ` [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Theodore Ts'o

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.