All of lore.kernel.org
 help / color / mirror / Atom feed
From: Arnaud Ferraris <arnaud.ferraris@collabora.com>
To: linux-ext4@vger.kernel.org
Cc: drosen@google.com, krisman@collabora.com, ebiggers@kernel.org,
	tytso@mit.edu, Arnaud Ferraris <arnaud.ferraris@collabora.com>
Subject: [PATCH RESEND v2 04/12] ext2fs: Implement faster CI comparison of strings
Date: Thu, 10 Dec 2020 16:03:45 +0100	[thread overview]
Message-ID: <20201210150353.91843-5-arnaud.ferraris@collabora.com> (raw)
In-Reply-To: <20201210150353.91843-1-arnaud.ferraris@collabora.com>

From: Gabriel Krisman Bertazi <krisman@collabora.com>

Instead of calling casefold two times and memcmp the result, which
require allocating a temporary buffer for the casefolded version, add a
strcasecmp-like method to perform the comparison of each code-point
during the casefold itself.

This method is exposed because it needs to be used directly by fsck.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>
---
 lib/ext2fs/ext2fs.h   |  4 ++++
 lib/ext2fs/ext2fsP.h  |  4 ++++
 lib/ext2fs/nls_utf8.c | 33 +++++++++++++++++++++++++++++++++
 3 files changed, 41 insertions(+)

diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 4065cb70..9e96ca5c 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -1615,6 +1615,10 @@ extern errcode_t ext2fs_new_dir_inline_data(ext2_filsys fs, ext2_ino_t dir_ino,
 extern const struct ext2fs_nls_table *ext2fs_load_nls_table(int encoding);
 extern int ext2fs_check_encoded_name(const struct ext2fs_nls_table *table,
 				     char *s, size_t len, char **pos);
+extern int ext2fs_casefold_cmp(const struct ext2fs_nls_table *table,
+			       const unsigned char *str1, size_t len1,
+			       const unsigned char *str2, size_t len2);
+
 
 /* mkdir.c */
 extern errcode_t ext2fs_mkdir(ext2_filsys fs, ext2_ino_t parent, ext2_ino_t inum,
diff --git a/lib/ext2fs/ext2fsP.h b/lib/ext2fs/ext2fsP.h
index 30564ded..99239be0 100644
--- a/lib/ext2fs/ext2fsP.h
+++ b/lib/ext2fs/ext2fsP.h
@@ -106,6 +106,10 @@ struct ext2fs_nls_ops {
 			unsigned char *dest, size_t dlen);
 	int (*validate)(const struct ext2fs_nls_table *table,
 			char *s, size_t len, char **pos);
+	int (*casefold_cmp)(const struct ext2fs_nls_table *table,
+			    const unsigned char *str1, size_t len1,
+			    const unsigned char *str2, size_t len2);
+
 };
 
 /* Function prototypes */
diff --git a/lib/ext2fs/nls_utf8.c b/lib/ext2fs/nls_utf8.c
index 903c65ba..1c444ca2 100644
--- a/lib/ext2fs/nls_utf8.c
+++ b/lib/ext2fs/nls_utf8.c
@@ -942,9 +942,36 @@ static int utf8_validate(const struct ext2fs_nls_table *table,
 	return 0;
 }
 
+static int utf8_casefold_cmp(const struct ext2fs_nls_table *table,
+			     const unsigned char *str1, size_t len1,
+			     const unsigned char *str2, size_t len2)
+{
+	const struct utf8data *data = utf8nfdicf(table->version);
+	int c1, c2;
+	struct utf8cursor cur1, cur2;
+
+	if (utf8ncursor(&cur1, data, (const char *) str1, len1) < 0)
+		return -1;
+	if (utf8ncursor(&cur2, data, (const char *) str2, len2) < 0)
+		return -1;
+
+	do {
+		c1 = utf8byte(&cur1);
+		c2 = utf8byte(&cur2);
+
+		if (c1 < 0 || c2 < 0)
+			return -1;
+		if (c1 != c2)
+			return c1 - c2;
+	} while (c1);
+
+	return 0;
+}
+
 static const struct ext2fs_nls_ops utf8_ops = {
 	.casefold = utf8_casefold,
 	.validate = utf8_validate,
+	.casefold_cmp = utf8_casefold_cmp,
 };
 
 static const struct ext2fs_nls_table nls_utf8 = {
@@ -965,3 +992,9 @@ int ext2fs_check_encoded_name(const struct ext2fs_nls_table *table,
 {
 	return table->ops->validate(table, name, len, pos);
 }
+int ext2fs_casefold_cmp(const struct ext2fs_nls_table *table,
+			const unsigned char *str1, size_t len1,
+			const unsigned char *str2, size_t len2)
+{
+	return table->ops->casefold_cmp(table, str1, len1, str2, len2);
+}
-- 
2.29.2


  parent reply	other threads:[~2020-12-10 15:08 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-10 15:03 [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Arnaud Ferraris
2020-12-10 15:03 ` [PATCH RESEND v2 01/12] tune2fs: Allow enabling casefold feature after fs creation Arnaud Ferraris
2021-01-27 22:42   ` Theodore Ts'o
2020-12-10 15:03 ` [PATCH RESEND v2 02/12] tune2fs: Fix casefold+encrypt error message Arnaud Ferraris
2021-01-27 22:46   ` Theodore Ts'o
2020-12-10 15:03 ` [PATCH RESEND v2 03/12] ext2fs: Add method to validate casefolded strings Arnaud Ferraris
2021-01-28  2:48   ` Theodore Ts'o
2020-12-10 15:03 ` Arnaud Ferraris [this message]
2021-01-28  2:49   ` [PATCH RESEND v2 04/12] ext2fs: Implement faster CI comparison of strings Theodore Ts'o
2020-12-10 15:03 ` [PATCH RESEND v2 05/12] e2fsck: add new problem for casefolded name check Arnaud Ferraris
2020-12-10 20:36   ` Gabriel Krisman Bertazi
2020-12-10 20:38   ` Gabriel Krisman Bertazi
2020-12-10 15:03 ` [PATCH RESEND v2 06/12] e2fsck: Fix entries with invalid encoded characters Arnaud Ferraris
2020-12-10 20:51   ` Gabriel Krisman Bertazi
2020-12-15 17:16     ` Arnaud Ferraris
2020-12-10 15:03 ` [PATCH RESEND v2 07/12] e2fsck: Support casefold directories when rehashing Arnaud Ferraris
2020-12-10 20:53   ` Gabriel Krisman Bertazi
2020-12-15 17:17     ` Arnaud Ferraris
2020-12-15 17:34       ` Gabriel Krisman Bertazi
2020-12-10 15:03 ` [PATCH RESEND v2 08/12] dict: Support comparison with context Arnaud Ferraris
2020-12-10 15:03 ` [PATCH RESEND v2 09/12] e2fsck: Detect duplicated casefolded direntries for rehash Arnaud Ferraris
2020-12-10 15:03 ` [PATCH RESEND v2 10/12] e2fsck: Add option to force encoded filename verification Arnaud Ferraris
2020-12-10 20:48   ` Gabriel Krisman Bertazi
2020-12-10 15:03 ` [PATCH RESEND v2 11/12] e2fsck.8.in: Document check_encoding extended option Arnaud Ferraris
2020-12-10 15:03 ` [PATCH RESEND v2 12/12] tests: f_bad_fname: Test fixes of invalid filenames and duplicates Arnaud Ferraris
2021-01-28  2:52 ` [PATCH RESEND v2 00/12] e2fsprogs: improve case-insensitive fs Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201210150353.91843-5-arnaud.ferraris@collabora.com \
    --to=arnaud.ferraris@collabora.com \
    --cc=drosen@google.com \
    --cc=ebiggers@kernel.org \
    --cc=krisman@collabora.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.