* [PATCHSET v2 0/4] xfs: fix ascii-ci problems, then kill it
@ 2023-04-06  0:02 Darrick J. Wong
  2023-04-06  0:02 ` [PATCH 1/4] xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation Darrick J. Wong
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-06  0:02 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, david

Hi all,

Last week, I was fiddling around with the metadump name obfuscation code
while writing a debugger command to generate directories full of names
that all have the same hash value.  I had a few questions about how well
all that worked with ascii-ci mode, and discovered a nasty discrepancy
between the kernel and glibc's implementations of the tolower()
function.

I discovered that I could create a directory that is large enough to
require separate leaf index blocks.  The hashes stored in the dabtree
use the ascii-ci-specific hash function, which uses a library function
to convert the name to lowercase before hashing.  If the kernel and C
library's versions of tolower do not behave identically,
xfs_ascii_ci_hashname will not produce the same results for the same
inputs.  xfs_repair will deem the leaf information corrupt and rebuild
the directory.  After that, lookups in the kernel will fail because the
rebuilt hash index no longer matches the hashes the kernel computes.

The kernel's tolower function will convert extended ascii uppercase
letters (e.g. A-with-umlaut) to extended ascii lowercase letters (e.g.
a-with-umlaut), whereas glibc's will only do that if the program has
switched to a latin1 (ISO 8859-1) locale.  Tiny embedded libc
implementations just plain won't do it at
all, and the result is a mess.  Stabilize the behavior of the hash
function by encoding the name transformation function in libxfs, add it
to the selftest, and fix all the userspace tools, none of which handle
this transformation correctly.

The v1 series generated a /lot/ of discussion, in which several things
became very clear: (1) Linus is not enamored of case folding of any
kind; (2) Dave and Christoph don't seem to agree on whether the feature
is supposed to work for 7-bit ascii or latin1; (3) it trashes UTF8
encoded names if those happen to show up; and (4) I don't want to
maintain this mess any longer than I have to.  Kill it in 2030.

v2: rename the functions to make it clear we're moving away from the
letters t, o, l, o, w, e, and r; and deprecate the whole feature once
we've fixed the bugs and added tests.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fix-asciici-bugs-6.3
---
 Documentation/admin-guide/xfs.rst |    1 
 fs/xfs/Kconfig                    |   27 +++++
 fs/xfs/libxfs/xfs_dir2.c          |    4 -
 fs/xfs/libxfs/xfs_dir2.h          |   31 +++++
 fs/xfs/scrub/dir.c                |    7 +
 fs/xfs/xfs_dahash_test.c          |  211 +++++++++++++++++++------------------
 fs/xfs/xfs_super.c                |   13 ++
 7 files changed, 191 insertions(+), 103 deletions(-)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/4] xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation
  2023-04-06  0:02 [PATCHSET v2 0/4] xfs: fix ascii-ci problems, then kill it Darrick J. Wong
@ 2023-04-06  0:02 ` Darrick J. Wong
  2023-04-11  4:50   ` Christoph Hellwig
  2023-04-06  0:03 ` [PATCH 2/4] xfs: test the ascii case-insensitive hash Darrick J. Wong
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-06  0:02 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

Back in the old days, the "ascii-ci" feature was created to implement
case-insensitive directory entry lookups for latin1-encoded names and
remove the large overhead of Samba's case-insensitive lookup code.  UTF8
names were not allowed, but nobody explicitly wrote in the documentation
that this was only expected to work if the system used latin1 names.
The kernel tolower function was selected to prepare names for hashed
lookups.

There's a major discrepancy in the function that computes directory entry
hashes for filesystems that have ASCII case-insensitive lookups enabled.
The root of this is that the kernel and glibc's tolower implementations
have differing behavior for extended ASCII accented characters.  I wrote
a program to spit out characters for which the tolower() return value is
different from the input:

glibc tolower:
65:A 66:B 67:C 68:D 69:E 70:F 71:G 72:H 73:I 74:J 75:K 76:L 77:M 78:N
79:O 80:P 81:Q 82:R 83:S 84:T 85:U 86:V 87:W 88:X 89:Y 90:Z

kernel tolower:
65:A 66:B 67:C 68:D 69:E 70:F 71:G 72:H 73:I 74:J 75:K 76:L 77:M 78:N
79:O 80:P 81:Q 82:R 83:S 84:T 85:U 86:V 87:W 88:X 89:Y 90:Z 192:À 193:Á
194:Â 195:Ã 196:Ä 197:Å 198:Æ 199:Ç 200:È 201:É 202:Ê 203:Ë 204:Ì 205:Í
206:Î 207:Ï 208:Ð 209:Ñ 210:Ò 211:Ó 212:Ô 213:Õ 214:Ö 215:× 216:Ø 217:Ù
218:Ú 219:Û 220:Ü 221:Ý 222:Þ

This means that the kernel and userspace do not agree on the hash value
for a directory filename that contains those higher values.  The hash
values are written into the leaf index block of directories that are
larger than two blocks in size, so xfs_repair will flag these
directories as having corrupted hash indexes and rewrite the index with
hash values that the kernel will then fail to recognize.

Because the ascii-ci feature is not frequently enabled and the kernel
touches filesystems far more frequently than xfs_repair does, fix this
by encoding the kernel's isupper() predicate and tolower() function into
libxfs.  Give the new functions less provocative names to make it really
obvious that this is a pre-hash name preparation function, and nothing
else.  This change makes userspace's behavior consistent with the
kernel.

Found by auditing obfuscate_name in xfs_metadump as part of working on
parent pointers, wondering how it could possibly work correctly with ci
filesystems, writing a test tool to create a directory with
hash-colliding names, and watching xfs_repair flag it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_dir2.c |    4 ++--
 fs/xfs/libxfs/xfs_dir2.h |   31 +++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 92bac3373f1f..0c5b92d17f5f 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -64,7 +64,7 @@ xfs_ascii_ci_hashname(
 	int			i;
 
 	for (i = 0, hash = 0; i < name->len; i++)
-		hash = tolower(name->name[i]) ^ rol32(hash, 7);
+		hash = xfs_ascii_ci_xfrm(name->name[i]) ^ rol32(hash, 7);
 
 	return hash;
 }
@@ -85,7 +85,7 @@ xfs_ascii_ci_compname(
 	for (i = 0; i < len; i++) {
 		if (args->name[i] == name[i])
 			continue;
-		if (tolower(args->name[i]) != tolower(name[i]))
+		if (xfs_ascii_ci_xfrm(args->name[i]) != xfs_ascii_ci_xfrm(name[i]))
 			return XFS_CMP_DIFFERENT;
 		result = XFS_CMP_CASE;
 	}
diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
index dd39f17dd9a9..19af22a16c41 100644
--- a/fs/xfs/libxfs/xfs_dir2.h
+++ b/fs/xfs/libxfs/xfs_dir2.h
@@ -248,4 +248,35 @@ unsigned int xfs_dir3_data_end_offset(struct xfs_da_geometry *geo,
 		struct xfs_dir2_data_hdr *hdr);
 bool xfs_dir2_namecheck(const void *name, size_t length);
 
+/*
+ * The "ascii-ci" feature was created to speed up case-insensitive lookups for
+ * a Samba product.  Because of the inherent problems with CI and UTF-8
+ * encoding, etc, it was decided that Samba would be configured to export
+ * latin1/iso 8859-1 encodings as that covered >90% of the target markets for
+ * the product.  Hence the "ascii-ci" casefolding code could be encoded into
+ * the XFS directory operations and remove all the overhead of casefolding from
+ * Samba.
+ *
+ * To provide consistent hashing behavior between the userspace and kernel,
+ * these functions prepare names for hashing by transforming specific bytes
+ * to other bytes.  Robustness with other encodings is not guaranteed.
+ */
+static inline bool xfs_ascii_ci_need_xfrm(unsigned char c)
+{
+	if (c >= 0x41 && c <= 0x5a)	/* A-Z */
+		return true;
+	if (c >= 0xc0 && c <= 0xd6)	/* latin A-O with accents */
+		return true;
+	if (c >= 0xd8 && c <= 0xde)	/* latin O-Y with accents */
+		return true;
+	return false;
+}
+
+static inline unsigned char xfs_ascii_ci_xfrm(unsigned char c)
+{
+	if (xfs_ascii_ci_need_xfrm(c))
+		c -= 'A' - 'a';
+	return c;
+}
+
 #endif	/* __XFS_DIR2_H__ */



* [PATCH 2/4] xfs: test the ascii case-insensitive hash
  2023-04-06  0:02 [PATCHSET v2 0/4] xfs: fix ascii-ci problems, then kill it Darrick J. Wong
  2023-04-06  0:02 ` [PATCH 1/4] xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation Darrick J. Wong
@ 2023-04-06  0:03 ` Darrick J. Wong
  2023-04-11  4:50   ` Christoph Hellwig
  2023-04-06  0:03 ` [PATCH 3/4] xfs: use the directory name hash function for dir scrubbing Darrick J. Wong
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-06  0:03 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

Now that the kernel and userspace use the same name transformation code
for computing directory index hashes, add it to the selftest code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_dahash_test.c |  211 ++++++++++++++++++++++++----------------------
 1 file changed, 111 insertions(+), 100 deletions(-)


diff --git a/fs/xfs/xfs_dahash_test.c b/fs/xfs/xfs_dahash_test.c
index 230651ab5ce4..0dab5941e080 100644
--- a/fs/xfs/xfs_dahash_test.c
+++ b/fs/xfs/xfs_dahash_test.c
@@ -9,6 +9,9 @@
 #include "xfs_format.h"
 #include "xfs_da_format.h"
 #include "xfs_da_btree.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_dir2_priv.h"
 #include "xfs_dahash_test.h"
 
 /* 4096 random bytes */
@@ -533,108 +536,109 @@ static struct dahash_test {
 	uint16_t	start;	/* random 12 bit offset in buf */
 	uint16_t	length;	/* random 8 bit length of test */
 	xfs_dahash_t	dahash;	/* expected dahash result */
+	xfs_dahash_t	ascii_ci_dahash; /* expected ascii-ci dahash result */
 } test[] __initdata =
 {
-	{0x0567, 0x0097, 0x96951389},
-	{0x0869, 0x0055, 0x6455ab4f},
-	{0x0c51, 0x00be, 0x8663afde},
-	{0x044a, 0x00fc, 0x98fbe432},
-	{0x0f29, 0x0079, 0x42371997},
-	{0x08ba, 0x0052, 0x942be4f7},
-	{0x01f2, 0x0013, 0x5262687e},
-	{0x09e3, 0x00e2, 0x8ffb0908},
-	{0x007c, 0x0051, 0xb3158491},
-	{0x0854, 0x001f, 0x83bb20d9},
-	{0x031b, 0x0008, 0x98970bdf},
-	{0x0de7, 0x0027, 0xbfbf6f6c},
-	{0x0f76, 0x0005, 0x906a7105},
-	{0x092e, 0x00d0, 0x86631850},
-	{0x0233, 0x0082, 0xdbdd914e},
-	{0x04c9, 0x0075, 0x5a400a9e},
-	{0x0b66, 0x0099, 0xae128b45},
-	{0x000d, 0x00ed, 0xe61c216a},
-	{0x0a31, 0x003d, 0xf69663b9},
-	{0x00a3, 0x0052, 0x643c39ae},
-	{0x0125, 0x00d5, 0x7c310b0d},
-	{0x0105, 0x004a, 0x06a77e74},
-	{0x0858, 0x008e, 0x265bc739},
-	{0x045e, 0x0095, 0x13d6b192},
-	{0x0dab, 0x003c, 0xc4498704},
-	{0x00cd, 0x00b5, 0x802a4e2d},
-	{0x069b, 0x008c, 0x5df60f71},
-	{0x0454, 0x006c, 0x5f03d8bb},
-	{0x040e, 0x0032, 0x0ce513b5},
-	{0x0874, 0x00e2, 0x6a811fb3},
-	{0x0521, 0x00b4, 0x93296833},
-	{0x0ddc, 0x00cf, 0xf9305338},
-	{0x0a70, 0x0023, 0x239549ea},
-	{0x083e, 0x0027, 0x2d88ba97},
-	{0x0241, 0x00a7, 0xfe0b32e1},
-	{0x0dfc, 0x0096, 0x1a11e815},
-	{0x023e, 0x001e, 0xebc9a1f3},
-	{0x067e, 0x0066, 0xb1067f81},
-	{0x09ea, 0x000e, 0x46fd7247},
-	{0x036b, 0x008c, 0x1a39acdf},
-	{0x078f, 0x0030, 0x964042ab},
-	{0x085c, 0x008f, 0x1829edab},
-	{0x02ec, 0x009f, 0x6aefa72d},
-	{0x043b, 0x00ce, 0x65642ff5},
-	{0x0a32, 0x00b8, 0xbd82759e},
-	{0x0d3c, 0x0087, 0xf4d66d54},
-	{0x09ec, 0x008a, 0x06bfa1ff},
-	{0x0902, 0x0015, 0x755025d2},
-	{0x08fe, 0x000e, 0xf690ce2d},
-	{0x00fb, 0x00dc, 0xe55f1528},
-	{0x0eaa, 0x003a, 0x0fe0a8d7},
-	{0x05fb, 0x0006, 0x86281cfb},
-	{0x0dd1, 0x00a7, 0x60ab51b4},
-	{0x0005, 0x001b, 0xf51d969b},
-	{0x077c, 0x00dd, 0xc2fed268},
-	{0x0575, 0x00f5, 0x432c0b1a},
-	{0x05be, 0x0088, 0x78baa04b},
-	{0x0c89, 0x0068, 0xeda9e428},
-	{0x0f5c, 0x0068, 0xec143c76},
-	{0x06a8, 0x0009, 0xd72651ce},
-	{0x060f, 0x008e, 0x765426cd},
-	{0x07b1, 0x0047, 0x2cfcfa0c},
-	{0x04f1, 0x0041, 0x55b172f9},
-	{0x0e05, 0x00ac, 0x61efde93},
-	{0x0bf7, 0x0097, 0x05b83eee},
-	{0x04e9, 0x00f3, 0x9928223a},
-	{0x023a, 0x0005, 0xdfada9bc},
-	{0x0acb, 0x000e, 0x2217cecd},
-	{0x0148, 0x0060, 0xbc3f7405},
-	{0x0764, 0x0059, 0xcbc201b1},
-	{0x021f, 0x0059, 0x5d6b2256},
-	{0x0f1e, 0x006c, 0xdefeeb45},
-	{0x071c, 0x00b9, 0xb9b59309},
-	{0x0564, 0x0063, 0xae064271},
-	{0x0b14, 0x0044, 0xdb867d9b},
-	{0x0e5a, 0x0055, 0xff06b685},
-	{0x015e, 0x00ba, 0x1115ccbc},
-	{0x0379, 0x00e6, 0x5f4e58dd},
-	{0x013b, 0x0067, 0x4897427e},
-	{0x0e64, 0x0071, 0x7af2b7a4},
-	{0x0a11, 0x0050, 0x92105726},
-	{0x0109, 0x0055, 0xd0d000f9},
-	{0x00aa, 0x0022, 0x815d229d},
-	{0x09ac, 0x004f, 0x02f9d985},
-	{0x0e1b, 0x00ce, 0x5cf92ab4},
-	{0x08af, 0x00d8, 0x17ca72d1},
-	{0x0e33, 0x000a, 0xda2dba6b},
-	{0x0ee3, 0x006a, 0xb00048e5},
-	{0x0648, 0x001a, 0x2364b8cb},
-	{0x0315, 0x0085, 0x0596fd0d},
-	{0x0fbb, 0x003e, 0x298230ca},
-	{0x0422, 0x006a, 0x78ada4ab},
-	{0x04ba, 0x0073, 0xced1fbc2},
-	{0x007d, 0x0061, 0x4b7ff236},
-	{0x070b, 0x00d0, 0x261cf0ae},
-	{0x0c1a, 0x0035, 0x8be92ee2},
-	{0x0af8, 0x0063, 0x824dcf03},
-	{0x08f8, 0x006d, 0xd289710c},
-	{0x021b, 0x00ee, 0x6ac1c41d},
-	{0x05b5, 0x00da, 0x8e52f0e2},
+	{0x0567, 0x0097, 0x96951389, 0xc153aa0d},
+	{0x0869, 0x0055, 0x6455ab4f, 0xd07f69bf},
+	{0x0c51, 0x00be, 0x8663afde, 0xf9add90c},
+	{0x044a, 0x00fc, 0x98fbe432, 0xbf2abb76},
+	{0x0f29, 0x0079, 0x42371997, 0x282588b3},
+	{0x08ba, 0x0052, 0x942be4f7, 0x2e023547},
+	{0x01f2, 0x0013, 0x5262687e, 0x5266287e},
+	{0x09e3, 0x00e2, 0x8ffb0908, 0x1da892f3},
+	{0x007c, 0x0051, 0xb3158491, 0xb67f9e63},
+	{0x0854, 0x001f, 0x83bb20d9, 0x22bb21db},
+	{0x031b, 0x0008, 0x98970bdf, 0x9cd70adf},
+	{0x0de7, 0x0027, 0xbfbf6f6c, 0xae3f296c},
+	{0x0f76, 0x0005, 0x906a7105, 0x906a7105},
+	{0x092e, 0x00d0, 0x86631850, 0xa3f6ac04},
+	{0x0233, 0x0082, 0xdbdd914e, 0x5d8c7aac},
+	{0x04c9, 0x0075, 0x5a400a9e, 0x12f60711},
+	{0x0b66, 0x0099, 0xae128b45, 0x7551310d},
+	{0x000d, 0x00ed, 0xe61c216a, 0xc22d3c4c},
+	{0x0a31, 0x003d, 0xf69663b9, 0x51960bf8},
+	{0x00a3, 0x0052, 0x643c39ae, 0xa93c73a8},
+	{0x0125, 0x00d5, 0x7c310b0d, 0xf221cbb3},
+	{0x0105, 0x004a, 0x06a77e74, 0xa4ef4561},
+	{0x0858, 0x008e, 0x265bc739, 0xd6c36d9b},
+	{0x045e, 0x0095, 0x13d6b192, 0x5f5c1d62},
+	{0x0dab, 0x003c, 0xc4498704, 0x10414654},
+	{0x00cd, 0x00b5, 0x802a4e2d, 0xfbd17c9d},
+	{0x069b, 0x008c, 0x5df60f71, 0x91ddca5f},
+	{0x0454, 0x006c, 0x5f03d8bb, 0x5c59fce0},
+	{0x040e, 0x0032, 0x0ce513b5, 0xa8cd99b1},
+	{0x0874, 0x00e2, 0x6a811fb3, 0xca028316},
+	{0x0521, 0x00b4, 0x93296833, 0x2c4d4880},
+	{0x0ddc, 0x00cf, 0xf9305338, 0x2c94210d},
+	{0x0a70, 0x0023, 0x239549ea, 0x22b561aa},
+	{0x083e, 0x0027, 0x2d88ba97, 0x5cd8bb9d},
+	{0x0241, 0x00a7, 0xfe0b32e1, 0x17b506b8},
+	{0x0dfc, 0x0096, 0x1a11e815, 0xee4141bd},
+	{0x023e, 0x001e, 0xebc9a1f3, 0x5689a1f3},
+	{0x067e, 0x0066, 0xb1067f81, 0xd9952571},
+	{0x09ea, 0x000e, 0x46fd7247, 0x42b57245},
+	{0x036b, 0x008c, 0x1a39acdf, 0x58bf1586},
+	{0x078f, 0x0030, 0x964042ab, 0xb04218b9},
+	{0x085c, 0x008f, 0x1829edab, 0x9ceca89c},
+	{0x02ec, 0x009f, 0x6aefa72d, 0x634cc2a7},
+	{0x043b, 0x00ce, 0x65642ff5, 0x6c8a584e},
+	{0x0a32, 0x00b8, 0xbd82759e, 0x0f96a34f},
+	{0x0d3c, 0x0087, 0xf4d66d54, 0xb71ba5f4},
+	{0x09ec, 0x008a, 0x06bfa1ff, 0x576ca80f},
+	{0x0902, 0x0015, 0x755025d2, 0x517225c2},
+	{0x08fe, 0x000e, 0xf690ce2d, 0xf690cf3d},
+	{0x00fb, 0x00dc, 0xe55f1528, 0x707d7d92},
+	{0x0eaa, 0x003a, 0x0fe0a8d7, 0x87638cc5},
+	{0x05fb, 0x0006, 0x86281cfb, 0x86281cf9},
+	{0x0dd1, 0x00a7, 0x60ab51b4, 0xe28ef00c},
+	{0x0005, 0x001b, 0xf51d969b, 0xe71dd6d3},
+	{0x077c, 0x00dd, 0xc2fed268, 0xdc30c555},
+	{0x0575, 0x00f5, 0x432c0b1a, 0x81dd7d16},
+	{0x05be, 0x0088, 0x78baa04b, 0xd69b433e},
+	{0x0c89, 0x0068, 0xeda9e428, 0xe9b4fa0a},
+	{0x0f5c, 0x0068, 0xec143c76, 0x9947067a},
+	{0x06a8, 0x0009, 0xd72651ce, 0xd72651ee},
+	{0x060f, 0x008e, 0x765426cd, 0x2099626f},
+	{0x07b1, 0x0047, 0x2cfcfa0c, 0x1a4baa07},
+	{0x04f1, 0x0041, 0x55b172f9, 0x15331a79},
+	{0x0e05, 0x00ac, 0x61efde93, 0x320568cc},
+	{0x0bf7, 0x0097, 0x05b83eee, 0xc72fb7a3},
+	{0x04e9, 0x00f3, 0x9928223a, 0xe8c77de2},
+	{0x023a, 0x0005, 0xdfada9bc, 0xdfadb9be},
+	{0x0acb, 0x000e, 0x2217cecd, 0x0017d6cd},
+	{0x0148, 0x0060, 0xbc3f7405, 0xf5fd6615},
+	{0x0764, 0x0059, 0xcbc201b1, 0xbb089bf4},
+	{0x021f, 0x0059, 0x5d6b2256, 0xa16a0a59},
+	{0x0f1e, 0x006c, 0xdefeeb45, 0xfc34f9d6},
+	{0x071c, 0x00b9, 0xb9b59309, 0xb645eae2},
+	{0x0564, 0x0063, 0xae064271, 0x954dc6d1},
+	{0x0b14, 0x0044, 0xdb867d9b, 0xdf432309},
+	{0x0e5a, 0x0055, 0xff06b685, 0xa65ff257},
+	{0x015e, 0x00ba, 0x1115ccbc, 0x11c365f4},
+	{0x0379, 0x00e6, 0x5f4e58dd, 0x2d176d31},
+	{0x013b, 0x0067, 0x4897427e, 0xc40532fe},
+	{0x0e64, 0x0071, 0x7af2b7a4, 0x1fb7bf43},
+	{0x0a11, 0x0050, 0x92105726, 0xb1185e51},
+	{0x0109, 0x0055, 0xd0d000f9, 0x60a60bfd},
+	{0x00aa, 0x0022, 0x815d229d, 0x215d379c},
+	{0x09ac, 0x004f, 0x02f9d985, 0x10b90b20},
+	{0x0e1b, 0x00ce, 0x5cf92ab4, 0x6a477573},
+	{0x08af, 0x00d8, 0x17ca72d1, 0x385af156},
+	{0x0e33, 0x000a, 0xda2dba6b, 0xda2dbb69},
+	{0x0ee3, 0x006a, 0xb00048e5, 0xa9a2decc},
+	{0x0648, 0x001a, 0x2364b8cb, 0x3364b1cb},
+	{0x0315, 0x0085, 0x0596fd0d, 0xa651740f},
+	{0x0fbb, 0x003e, 0x298230ca, 0x7fc617c7},
+	{0x0422, 0x006a, 0x78ada4ab, 0xc576ae2a},
+	{0x04ba, 0x0073, 0xced1fbc2, 0xaac8455b},
+	{0x007d, 0x0061, 0x4b7ff236, 0x347d5739},
+	{0x070b, 0x00d0, 0x261cf0ae, 0xc7fb1c10},
+	{0x0c1a, 0x0035, 0x8be92ee2, 0x8be9b4e1},
+	{0x0af8, 0x0063, 0x824dcf03, 0x53010388},
+	{0x08f8, 0x006d, 0xd289710c, 0x30418edd},
+	{0x021b, 0x00ee, 0x6ac1c41d, 0x2557e9a3},
+	{0x05b5, 0x00da, 0x8e52f0e2, 0x98531012},
 };
 
 int __init
@@ -644,12 +648,19 @@ xfs_dahash_test(void)
 	unsigned int	errors = 0;
 
 	for (i = 0; i < ARRAY_SIZE(test); i++) {
+		struct xfs_name	xname = { };
 		xfs_dahash_t	hash;
 
 		hash = xfs_da_hashname(test_buf + test[i].start,
 				test[i].length);
 		if (hash != test[i].dahash)
 			errors++;
+
+		xname.name = test_buf + test[i].start;
+		xname.len = test[i].length;
+		hash = xfs_ascii_ci_hashname(&xname);
+		if (hash != test[i].ascii_ci_dahash)
+			errors++;
 	}
 
 	if (errors) {



* [PATCH 3/4] xfs: use the directory name hash function for dir scrubbing
  2023-04-06  0:02 [PATCHSET v2 0/4] xfs: fix ascii-ci problems, then kill it Darrick J. Wong
  2023-04-06  0:02 ` [PATCH 1/4] xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation Darrick J. Wong
  2023-04-06  0:03 ` [PATCH 2/4] xfs: test the ascii case-insensitive hash Darrick J. Wong
@ 2023-04-06  0:03 ` Darrick J. Wong
  2023-04-11  4:51   ` Christoph Hellwig
  2023-04-06  0:03 ` [PATCH 4/4] xfs: deprecate the ascii-ci feature Darrick J. Wong
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-06  0:03 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

The directory code has a directory-specific hash computation function
that includes a modified hash function for case-insensitive lookups.
Hence we must use that function (and not the raw da_hashname) when
checking the dabtree structure.

Found by accidentally breaking xfs/188 to create an abnormally huge
case-insensitive directory and watching scrub break.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/dir.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index d1b0f23c2c59..aeb815a483ff 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -201,6 +201,7 @@ xchk_dir_rec(
 	struct xchk_da_btree		*ds,
 	int				level)
 {
+	struct xfs_name			dname = { };
 	struct xfs_da_state_blk		*blk = &ds->state->path.blk[level];
 	struct xfs_mount		*mp = ds->state->mp;
 	struct xfs_inode		*dp = ds->dargs.dp;
@@ -297,7 +298,11 @@ xchk_dir_rec(
 		xchk_fblock_set_corrupt(ds->sc, XFS_DATA_FORK, rec_bno);
 		goto out_relse;
 	}
-	calc_hash = xfs_da_hashname(dent->name, dent->namelen);
+
+	/* Does the directory hash match? */
+	dname.name = dent->name;
+	dname.len = dent->namelen;
+	calc_hash = xfs_dir2_hashname(mp, &dname);
 	if (calc_hash != hash)
 		xchk_fblock_set_corrupt(ds->sc, XFS_DATA_FORK, rec_bno);
 



* [PATCH 4/4] xfs: deprecate the ascii-ci feature
  2023-04-06  0:02 [PATCHSET v2 0/4] xfs: fix ascii-ci problems, then kill it Darrick J. Wong
                   ` (2 preceding siblings ...)
  2023-04-06  0:03 ` [PATCH 3/4] xfs: use the directory name hash function for dir scrubbing Darrick J. Wong
@ 2023-04-06  0:03 ` Darrick J. Wong
  2023-04-11  4:52   ` Christoph Hellwig
  2023-04-06  0:09 ` [PATCHSET v2 0/6] xfsprogs: fix ascii-ci problems, then kill it Darrick J. Wong
  2023-04-06  0:11 ` [PATCH] fstests: add a couple more tests for ascii-ci problems Darrick J. Wong
  5 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-06  0:03 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

This feature is a mess -- the hash function has been broken for the
entire 15 years of its existence if you create names with extended ascii
bytes; metadump name obfuscation has silently failed for just as long;
and the feature clashes horribly with the UTF8 encodings that most
systems use today.  There is exactly one fstest for this feature.

In other words, this feature is crap.  Let's deprecate it now so we can
remove it from the codebase in 2030.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 Documentation/admin-guide/xfs.rst |    1 +
 fs/xfs/Kconfig                    |   27 +++++++++++++++++++++++++++
 fs/xfs/xfs_super.c                |   13 +++++++++++++
 3 files changed, 41 insertions(+)


diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst
index e2561416391c..e85a9404d5c0 100644
--- a/Documentation/admin-guide/xfs.rst
+++ b/Documentation/admin-guide/xfs.rst
@@ -240,6 +240,7 @@ Deprecated Mount Options
   Name				Removal Schedule
 ===========================     ================
 Mounting with V4 filesystem     September 2030
+Mounting ascii-ci filesystem    September 2030
 ikeep/noikeep			September 2025
 attr2/noattr2			September 2025
 ===========================     ================
diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index 9fac5ea8d0e4..09c5ff136f22 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -47,6 +47,33 @@ config XFS_SUPPORT_V4
 	  To continue supporting the old V4 format (crc=0), say Y.
 	  To close off an attack surface, say N.
 
+config XFS_SUPPORT_ASCII_CI
+	bool "Support deprecated case-insensitive ascii (ascii-ci=1) format"
+	depends on XFS_FS
+	default y
+	help
+	  The ASCII case insensitivity filesystem feature only works correctly
+	  on systems that have been coerced into using ISO 8859-1, and it does
+	  not work on extended attributes.  The kernel has no visibility into
+	  the locale settings in userspace, so it corrupts UTF-8 names.
+	  Enabling this feature makes XFS vulnerable to mixed case sensitivity
+	  attacks.  Because of this, the feature is deprecated.  All users
+	  should upgrade by backing up their files, reformatting, and restoring
+	  from the backup.
+
+	  Administrators and users can detect such a filesystem by running
+	  xfs_info against a filesystem mountpoint and checking for a string
+	  beginning with "ascii-ci=".  If the string "ascii-ci=1" is found, the
+	  filesystem is a case-insensitive filesystem.  If no such string is
+	  found, please upgrade xfsprogs to the latest version and try again.
+
+	  This option will become default N in September 2025.  Support for the
+	  feature will be removed entirely in September 2030.  Distributors
+	  can say N here to withdraw support earlier.
+
+	  To continue supporting case-insensitivity (ascii-ci=1), say Y.
+	  To close off an attack surface, say N.
+
 config XFS_QUOTA
 	bool "XFS Quota support"
 	depends on XFS_FS
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 4f814f9e12ab..4d2e87462ac4 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1548,6 +1548,19 @@ xfs_fs_fill_super(
 #endif
 	}
 
+	/* ASCII case insensitivity is undergoing deprecation. */
+	if (xfs_has_asciici(mp)) {
+#ifdef CONFIG_XFS_SUPPORT_ASCII_CI
+		xfs_warn_once(mp,
+	"Deprecated ASCII case-insensitivity feature (ascii-ci=1) will not be supported after September 2030.");
+#else
+		xfs_warn(mp,
+	"Deprecated ASCII case-insensitivity feature (ascii-ci=1) not supported by kernel.");
+		error = -EINVAL;
+		goto out_free_sb;
+#endif
+	}
+
 	/* Filesystem claims it needs repair, so refuse the mount. */
 	if (xfs_has_needsrepair(mp)) {
 		xfs_warn(mp, "Filesystem needs repair.  Please run xfs_repair.");



* [PATCHSET v2 0/6] xfsprogs: fix ascii-ci problems, then kill it
  2023-04-06  0:02 [PATCHSET v2 0/4] xfs: fix ascii-ci problems, then kill it Darrick J. Wong
                   ` (3 preceding siblings ...)
  2023-04-06  0:03 ` [PATCH 4/4] xfs: deprecate the ascii-ci feature Darrick J. Wong
@ 2023-04-06  0:09 ` Darrick J. Wong
  2023-04-06  0:09   ` [PATCH 1/6] xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation Darrick J. Wong
                     ` (7 more replies)
  2023-04-06  0:11 ` [PATCH] fstests: add a couple more tests for ascii-ci problems Darrick J. Wong
  5 siblings, 8 replies; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-06  0:09 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, david

Hi all,

Last week, I was fiddling around with the metadump name obfuscation code
while writing a debugger command to generate directories full of names
that all have the same hash value.  I had a few questions about how well
all that worked with ascii-ci mode, and discovered a nasty discrepancy
between the kernel and glibc's implementations of the tolower()
function.

I discovered that I could create a directory that is large enough to
require separate leaf index blocks.  The hashes stored in the dabtree
use the ascii-ci-specific hash function, which uses a library function
to convert the name to lowercase before hashing.  If the kernel and C
library's versions of tolower do not behave identically,
xfs_ascii_ci_hashname will not produce the same results for the same
inputs.  xfs_repair will deem the leaf information corrupt and rebuild
the directory.  After that, lookups in the kernel will fail because the
rebuilt hash index no longer matches the hashes the kernel computes.

The kernel's tolower function will convert extended ascii uppercase
letters (e.g. A-with-umlaut) to extended ascii lowercase letters (e.g.
a-with-umlaut), whereas glibc's will only do that if the program has
switched to a latin1 (ISO 8859-1) locale.  Tiny embedded libc
implementations just plain won't do it at
all, and the result is a mess.  Stabilize the behavior of the hash
function by encoding the name transformation function in libxfs, add it
to the selftest, and fix all the userspace tools, none of which handle
this transformation correctly.

The v1 series generated a /lot/ of discussion, in which several things
became very clear: (1) Linus is not enamored of case folding of any
kind; (2) Dave and Christoph don't seem to agree on whether the feature
is supposed to work for 7-bit ascii or latin1; (3) it trashes UTF8
encoded names if those happen to show up; and (4) I don't want to
maintain this mess any longer than I have to.  Kill it in 2030.

v2: rename the functions to make it clear we're moving away from the
letters t, o, l, o, w, e, and r; and deprecate the whole feature once
we've fixed the bugs and added tests.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fix-asciici-bugs-6.3

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=fix-asciici-bugs-6.3
---
 db/metadump.c            |   79 +++++++++++++++--
 libfrog/dahashselftest.h |  208 ++++++++++++++++++++++++----------------------
 libxfs/libxfs_api_defs.h |    2 
 libxfs/xfs_dir2.c        |    4 -
 libxfs/xfs_dir2.h        |   31 +++++++
 man/man8/mkfs.xfs.8.in   |   23 ++++-
 mkfs/xfs_mkfs.c          |   11 ++
 7 files changed, 243 insertions(+), 115 deletions(-)



* [PATCH 1/6] xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation
  2023-04-06  0:09 ` [PATCHSET v2 0/6] xfsprogs: fix ascii-ci problems, then kill it Darrick J. Wong
@ 2023-04-06  0:09   ` Darrick J. Wong
  2023-04-06  0:09   ` [PATCH 2/6] xfs: test the ascii case-insensitive hash Darrick J. Wong
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-06  0:09 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

Back in the old days, the "ascii-ci" feature was created to implement
case-insensitive directory entry lookups for latin1-encoded names and
remove the large overhead of Samba's case-insensitive lookup code.  UTF8
names were not allowed, but nobody explicitly wrote in the documentation
that this was only expected to work if the system used latin1 names.
The kernel tolower function was selected to prepare names for hashed
lookups.

There's a major discrepancy in the function that computes directory entry
hashes for filesystems that have ASCII case-insensitive lookups enabled.
The root of this is that the kernel and glibc's tolower implementations
have differing behavior for extended ASCII accented characters.  I wrote
a program to spit out characters for which the tolower() return value is
different from the input:

glibc tolower:
65:A 66:B 67:C 68:D 69:E 70:F 71:G 72:H 73:I 74:J 75:K 76:L 77:M 78:N
79:O 80:P 81:Q 82:R 83:S 84:T 85:U 86:V 87:W 88:X 89:Y 90:Z

kernel tolower:
65:A 66:B 67:C 68:D 69:E 70:F 71:G 72:H 73:I 74:J 75:K 76:L 77:M 78:N
79:O 80:P 81:Q 82:R 83:S 84:T 85:U 86:V 87:W 88:X 89:Y 90:Z 192:À 193:Á
194:Â 195:Ã 196:Ä 197:Å 198:Æ 199:Ç 200:È 201:É 202:Ê 203:Ë 204:Ì 205:Í
206:Î 207:Ï 208:Ð 209:Ñ 210:Ò 211:Ó 212:Ô 213:Õ 214:Ö 215:× 216:Ø 217:Ù
218:Ú 219:Û 220:Ü 221:Ý 222:Þ

This means that the kernel and userspace do not agree on the hash value
for a directory filename that contains those higher values.  The hash
values are written into the leaf index block of directories that are
larger than two blocks in size, so xfs_repair will flag these
directories as having corrupted hash indexes and rewrite the index with
hash values that the kernel will then fail to recognize.

Because the ascii-ci feature is not frequently enabled and the kernel
touches filesystems far more frequently than xfs_repair does, fix this
by encoding the kernel's isupper() predicate and tolower() function into
libxfs.  Give the new functions less provocative names to make it really
obvious that this is a pre-hash name preparation function, and nothing
else.  This change makes userspace's behavior consistent with the
kernel.

Found by auditing obfuscate_name in xfs_metadump as part of working on
parent pointers, wondering how it could possibly work correctly with ci
filesystems, writing a test tool to create a directory with
hash-colliding names, and watching xfs_repair flag it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_dir2.c |    4 ++--
 libxfs/xfs_dir2.h |   31 +++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 2 deletions(-)


diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index d6a19296..ae21ee34 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -63,7 +63,7 @@ xfs_ascii_ci_hashname(
 	int			i;
 
 	for (i = 0, hash = 0; i < name->len; i++)
-		hash = tolower(name->name[i]) ^ rol32(hash, 7);
+		hash = xfs_ascii_ci_xfrm(name->name[i]) ^ rol32(hash, 7);
 
 	return hash;
 }
@@ -84,7 +84,7 @@ xfs_ascii_ci_compname(
 	for (i = 0; i < len; i++) {
 		if (args->name[i] == name[i])
 			continue;
-		if (tolower(args->name[i]) != tolower(name[i]))
+		if (xfs_ascii_ci_xfrm(args->name[i]) != xfs_ascii_ci_xfrm(name[i]))
 			return XFS_CMP_DIFFERENT;
 		result = XFS_CMP_CASE;
 	}
diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h
index dd39f17d..19af22a1 100644
--- a/libxfs/xfs_dir2.h
+++ b/libxfs/xfs_dir2.h
@@ -248,4 +248,35 @@ unsigned int xfs_dir3_data_end_offset(struct xfs_da_geometry *geo,
 		struct xfs_dir2_data_hdr *hdr);
 bool xfs_dir2_namecheck(const void *name, size_t length);
 
+/*
+ * The "ascii-ci" feature was created to speed up case-insensitive lookups for
+ * a Samba product.  Because of the inherent problems with CI and UTF-8
+ * encoding, etc, it was decided that Samba would be configured to export
+ * latin1/iso 8859-1 encodings as that covered >90% of the target markets for
+ * the product.  Hence the "ascii-ci" casefolding code could be encoded into
+ * the XFS directory operations and remove all the overhead of casefolding from
+ * Samba.
+ *
+ * To provide consistent hashing behavior between the userspace and kernel,
+ * these functions prepare names for hashing by transforming specific bytes
+ * to other bytes.  Robustness with other encodings is not guaranteed.
+ */
+static inline bool xfs_ascii_ci_need_xfrm(unsigned char c)
+{
+	if (c >= 0x41 && c <= 0x5a)	/* A-Z */
+		return true;
+	if (c >= 0xc0 && c <= 0xd6)	/* latin A-O with accents */
+		return true;
+	if (c >= 0xd8 && c <= 0xde)	/* latin O-Y with accents */
+		return true;
+	return false;
+}
+
+static inline unsigned char xfs_ascii_ci_xfrm(unsigned char c)
+{
+	if (xfs_ascii_ci_need_xfrm(c))
+		c -= 'A' - 'a';
+	return c;
+}
+
 #endif	/* __XFS_DIR2_H__ */


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/6] xfs: test the ascii case-insensitive hash
  2023-04-06  0:09 ` [PATCHSET v2 0/6] xfsprogs: fix ascii-ci problems, then kill it Darrick J. Wong
  2023-04-06  0:09   ` [PATCH 1/6] xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation Darrick J. Wong
@ 2023-04-06  0:09   ` Darrick J. Wong
  2023-04-06  0:09   ` [PATCH 3/6] xfs_db: move obfuscate_name assertion to callers Darrick J. Wong
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-06  0:09 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

Now that we've made kernel and userspace use the same name transformation
code for computing directory index hashes, add ascii-ci hash checks to
the selftest code.
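The selftest pairs each input with two expected hashes.  A minimal
table-driven sketch of that pattern, using short hand-computed vectors
instead of the random buffer offsets in dahash_tests[]; struct
hash_vector, hash_bytes and run_selftest are hypothetical names, and the
byte-at-a-time recurrence is assumed to be equivalent to the unfolded
dahash for these inputs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static unsigned char ci_xfrm(unsigned char c)
{
	if ((c >= 0x41 && c <= 0x5a) || (c >= 0xc0 && c <= 0xd6) ||
	    (c >= 0xd8 && c <= 0xde))
		return (unsigned char)(c + 0x20);
	return c;
}

static uint32_t rol32(uint32_t w, unsigned int s)
{
	return (w << s) | (w >> (32 - s));
}

/* h = c ^ rol32(h, 7), optionally folding each byte first. */
static uint32_t hash_bytes(const unsigned char *n, size_t len, bool fold)
{
	uint32_t hash = 0;

	for (size_t i = 0; i < len; i++)
		hash = (fold ? ci_xfrm(n[i]) : n[i]) ^ rol32(hash, 7);
	return hash;
}

struct hash_vector {
	const char	*name;
	size_t		len;
	uint32_t	dahash;		/* expected hash without folding */
	uint32_t	ci_dahash;	/* expected hash with folding */
};

/* Expected values worked out by hand from the recurrence above. */
static const struct hash_vector vectors[] = {
	{ "ab",   2, 0x000030e2, 0x000030e2 },
	{ "AB",   2, 0x000020c2, 0x000030e2 },
	{ "abc",  3, 0x00187163, 0x00187163 },
	{ "\xc4", 1, 0x000000c4, 0x000000e4 },
};

/* Return the number of mismatches, like dahash_test() counts errors. */
static int run_selftest(void)
{
	int errors = 0;

	for (size_t i = 0; i < sizeof(vectors) / sizeof(vectors[0]); i++) {
		const unsigned char *n = (const unsigned char *)vectors[i].name;

		if (hash_bytes(n, vectors[i].len, false) != vectors[i].dahash)
			errors++;
		if (hash_bytes(n, vectors[i].len, true) != vectors[i].ci_dahash)
			errors++;
	}
	return errors;
}
```

Names with no foldable bytes produce the same value in both columns,
which matches the rows in dahash_tests[] where the two expected hashes
are equal.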

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/dahashselftest.h |  208 ++++++++++++++++++++++++----------------------
 libxfs/libxfs_api_defs.h |    2 
 2 files changed, 110 insertions(+), 100 deletions(-)


diff --git a/libfrog/dahashselftest.h b/libfrog/dahashselftest.h
index 7dda5303..ea9d925b 100644
--- a/libfrog/dahashselftest.h
+++ b/libfrog/dahashselftest.h
@@ -13,108 +13,109 @@ static struct dahash_test {
 	uint16_t	start;	/* random 12 bit offset in buf */
 	uint16_t	length;	/* random 8 bit length of test */
 	xfs_dahash_t	dahash;	/* expected dahash result */
+	xfs_dahash_t	ascii_ci_dahash; /* expected ascii-ci dahash result */
 } dahash_tests[] =
 {
-	{0x0567, 0x0097, 0x96951389},
-	{0x0869, 0x0055, 0x6455ab4f},
-	{0x0c51, 0x00be, 0x8663afde},
-	{0x044a, 0x00fc, 0x98fbe432},
-	{0x0f29, 0x0079, 0x42371997},
-	{0x08ba, 0x0052, 0x942be4f7},
-	{0x01f2, 0x0013, 0x5262687e},
-	{0x09e3, 0x00e2, 0x8ffb0908},
-	{0x007c, 0x0051, 0xb3158491},
-	{0x0854, 0x001f, 0x83bb20d9},
-	{0x031b, 0x0008, 0x98970bdf},
-	{0x0de7, 0x0027, 0xbfbf6f6c},
-	{0x0f76, 0x0005, 0x906a7105},
-	{0x092e, 0x00d0, 0x86631850},
-	{0x0233, 0x0082, 0xdbdd914e},
-	{0x04c9, 0x0075, 0x5a400a9e},
-	{0x0b66, 0x0099, 0xae128b45},
-	{0x000d, 0x00ed, 0xe61c216a},
-	{0x0a31, 0x003d, 0xf69663b9},
-	{0x00a3, 0x0052, 0x643c39ae},
-	{0x0125, 0x00d5, 0x7c310b0d},
-	{0x0105, 0x004a, 0x06a77e74},
-	{0x0858, 0x008e, 0x265bc739},
-	{0x045e, 0x0095, 0x13d6b192},
-	{0x0dab, 0x003c, 0xc4498704},
-	{0x00cd, 0x00b5, 0x802a4e2d},
-	{0x069b, 0x008c, 0x5df60f71},
-	{0x0454, 0x006c, 0x5f03d8bb},
-	{0x040e, 0x0032, 0x0ce513b5},
-	{0x0874, 0x00e2, 0x6a811fb3},
-	{0x0521, 0x00b4, 0x93296833},
-	{0x0ddc, 0x00cf, 0xf9305338},
-	{0x0a70, 0x0023, 0x239549ea},
-	{0x083e, 0x0027, 0x2d88ba97},
-	{0x0241, 0x00a7, 0xfe0b32e1},
-	{0x0dfc, 0x0096, 0x1a11e815},
-	{0x023e, 0x001e, 0xebc9a1f3},
-	{0x067e, 0x0066, 0xb1067f81},
-	{0x09ea, 0x000e, 0x46fd7247},
-	{0x036b, 0x008c, 0x1a39acdf},
-	{0x078f, 0x0030, 0x964042ab},
-	{0x085c, 0x008f, 0x1829edab},
-	{0x02ec, 0x009f, 0x6aefa72d},
-	{0x043b, 0x00ce, 0x65642ff5},
-	{0x0a32, 0x00b8, 0xbd82759e},
-	{0x0d3c, 0x0087, 0xf4d66d54},
-	{0x09ec, 0x008a, 0x06bfa1ff},
-	{0x0902, 0x0015, 0x755025d2},
-	{0x08fe, 0x000e, 0xf690ce2d},
-	{0x00fb, 0x00dc, 0xe55f1528},
-	{0x0eaa, 0x003a, 0x0fe0a8d7},
-	{0x05fb, 0x0006, 0x86281cfb},
-	{0x0dd1, 0x00a7, 0x60ab51b4},
-	{0x0005, 0x001b, 0xf51d969b},
-	{0x077c, 0x00dd, 0xc2fed268},
-	{0x0575, 0x00f5, 0x432c0b1a},
-	{0x05be, 0x0088, 0x78baa04b},
-	{0x0c89, 0x0068, 0xeda9e428},
-	{0x0f5c, 0x0068, 0xec143c76},
-	{0x06a8, 0x0009, 0xd72651ce},
-	{0x060f, 0x008e, 0x765426cd},
-	{0x07b1, 0x0047, 0x2cfcfa0c},
-	{0x04f1, 0x0041, 0x55b172f9},
-	{0x0e05, 0x00ac, 0x61efde93},
-	{0x0bf7, 0x0097, 0x05b83eee},
-	{0x04e9, 0x00f3, 0x9928223a},
-	{0x023a, 0x0005, 0xdfada9bc},
-	{0x0acb, 0x000e, 0x2217cecd},
-	{0x0148, 0x0060, 0xbc3f7405},
-	{0x0764, 0x0059, 0xcbc201b1},
-	{0x021f, 0x0059, 0x5d6b2256},
-	{0x0f1e, 0x006c, 0xdefeeb45},
-	{0x071c, 0x00b9, 0xb9b59309},
-	{0x0564, 0x0063, 0xae064271},
-	{0x0b14, 0x0044, 0xdb867d9b},
-	{0x0e5a, 0x0055, 0xff06b685},
-	{0x015e, 0x00ba, 0x1115ccbc},
-	{0x0379, 0x00e6, 0x5f4e58dd},
-	{0x013b, 0x0067, 0x4897427e},
-	{0x0e64, 0x0071, 0x7af2b7a4},
-	{0x0a11, 0x0050, 0x92105726},
-	{0x0109, 0x0055, 0xd0d000f9},
-	{0x00aa, 0x0022, 0x815d229d},
-	{0x09ac, 0x004f, 0x02f9d985},
-	{0x0e1b, 0x00ce, 0x5cf92ab4},
-	{0x08af, 0x00d8, 0x17ca72d1},
-	{0x0e33, 0x000a, 0xda2dba6b},
-	{0x0ee3, 0x006a, 0xb00048e5},
-	{0x0648, 0x001a, 0x2364b8cb},
-	{0x0315, 0x0085, 0x0596fd0d},
-	{0x0fbb, 0x003e, 0x298230ca},
-	{0x0422, 0x006a, 0x78ada4ab},
-	{0x04ba, 0x0073, 0xced1fbc2},
-	{0x007d, 0x0061, 0x4b7ff236},
-	{0x070b, 0x00d0, 0x261cf0ae},
-	{0x0c1a, 0x0035, 0x8be92ee2},
-	{0x0af8, 0x0063, 0x824dcf03},
-	{0x08f8, 0x006d, 0xd289710c},
-	{0x021b, 0x00ee, 0x6ac1c41d},
-	{0x05b5, 0x00da, 0x8e52f0e2},
+	{0x0567, 0x0097, 0x96951389, 0xc153aa0d},
+	{0x0869, 0x0055, 0x6455ab4f, 0xd07f69bf},
+	{0x0c51, 0x00be, 0x8663afde, 0xf9add90c},
+	{0x044a, 0x00fc, 0x98fbe432, 0xbf2abb76},
+	{0x0f29, 0x0079, 0x42371997, 0x282588b3},
+	{0x08ba, 0x0052, 0x942be4f7, 0x2e023547},
+	{0x01f2, 0x0013, 0x5262687e, 0x5266287e},
+	{0x09e3, 0x00e2, 0x8ffb0908, 0x1da892f3},
+	{0x007c, 0x0051, 0xb3158491, 0xb67f9e63},
+	{0x0854, 0x001f, 0x83bb20d9, 0x22bb21db},
+	{0x031b, 0x0008, 0x98970bdf, 0x9cd70adf},
+	{0x0de7, 0x0027, 0xbfbf6f6c, 0xae3f296c},
+	{0x0f76, 0x0005, 0x906a7105, 0x906a7105},
+	{0x092e, 0x00d0, 0x86631850, 0xa3f6ac04},
+	{0x0233, 0x0082, 0xdbdd914e, 0x5d8c7aac},
+	{0x04c9, 0x0075, 0x5a400a9e, 0x12f60711},
+	{0x0b66, 0x0099, 0xae128b45, 0x7551310d},
+	{0x000d, 0x00ed, 0xe61c216a, 0xc22d3c4c},
+	{0x0a31, 0x003d, 0xf69663b9, 0x51960bf8},
+	{0x00a3, 0x0052, 0x643c39ae, 0xa93c73a8},
+	{0x0125, 0x00d5, 0x7c310b0d, 0xf221cbb3},
+	{0x0105, 0x004a, 0x06a77e74, 0xa4ef4561},
+	{0x0858, 0x008e, 0x265bc739, 0xd6c36d9b},
+	{0x045e, 0x0095, 0x13d6b192, 0x5f5c1d62},
+	{0x0dab, 0x003c, 0xc4498704, 0x10414654},
+	{0x00cd, 0x00b5, 0x802a4e2d, 0xfbd17c9d},
+	{0x069b, 0x008c, 0x5df60f71, 0x91ddca5f},
+	{0x0454, 0x006c, 0x5f03d8bb, 0x5c59fce0},
+	{0x040e, 0x0032, 0x0ce513b5, 0xa8cd99b1},
+	{0x0874, 0x00e2, 0x6a811fb3, 0xca028316},
+	{0x0521, 0x00b4, 0x93296833, 0x2c4d4880},
+	{0x0ddc, 0x00cf, 0xf9305338, 0x2c94210d},
+	{0x0a70, 0x0023, 0x239549ea, 0x22b561aa},
+	{0x083e, 0x0027, 0x2d88ba97, 0x5cd8bb9d},
+	{0x0241, 0x00a7, 0xfe0b32e1, 0x17b506b8},
+	{0x0dfc, 0x0096, 0x1a11e815, 0xee4141bd},
+	{0x023e, 0x001e, 0xebc9a1f3, 0x5689a1f3},
+	{0x067e, 0x0066, 0xb1067f81, 0xd9952571},
+	{0x09ea, 0x000e, 0x46fd7247, 0x42b57245},
+	{0x036b, 0x008c, 0x1a39acdf, 0x58bf1586},
+	{0x078f, 0x0030, 0x964042ab, 0xb04218b9},
+	{0x085c, 0x008f, 0x1829edab, 0x9ceca89c},
+	{0x02ec, 0x009f, 0x6aefa72d, 0x634cc2a7},
+	{0x043b, 0x00ce, 0x65642ff5, 0x6c8a584e},
+	{0x0a32, 0x00b8, 0xbd82759e, 0x0f96a34f},
+	{0x0d3c, 0x0087, 0xf4d66d54, 0xb71ba5f4},
+	{0x09ec, 0x008a, 0x06bfa1ff, 0x576ca80f},
+	{0x0902, 0x0015, 0x755025d2, 0x517225c2},
+	{0x08fe, 0x000e, 0xf690ce2d, 0xf690cf3d},
+	{0x00fb, 0x00dc, 0xe55f1528, 0x707d7d92},
+	{0x0eaa, 0x003a, 0x0fe0a8d7, 0x87638cc5},
+	{0x05fb, 0x0006, 0x86281cfb, 0x86281cf9},
+	{0x0dd1, 0x00a7, 0x60ab51b4, 0xe28ef00c},
+	{0x0005, 0x001b, 0xf51d969b, 0xe71dd6d3},
+	{0x077c, 0x00dd, 0xc2fed268, 0xdc30c555},
+	{0x0575, 0x00f5, 0x432c0b1a, 0x81dd7d16},
+	{0x05be, 0x0088, 0x78baa04b, 0xd69b433e},
+	{0x0c89, 0x0068, 0xeda9e428, 0xe9b4fa0a},
+	{0x0f5c, 0x0068, 0xec143c76, 0x9947067a},
+	{0x06a8, 0x0009, 0xd72651ce, 0xd72651ee},
+	{0x060f, 0x008e, 0x765426cd, 0x2099626f},
+	{0x07b1, 0x0047, 0x2cfcfa0c, 0x1a4baa07},
+	{0x04f1, 0x0041, 0x55b172f9, 0x15331a79},
+	{0x0e05, 0x00ac, 0x61efde93, 0x320568cc},
+	{0x0bf7, 0x0097, 0x05b83eee, 0xc72fb7a3},
+	{0x04e9, 0x00f3, 0x9928223a, 0xe8c77de2},
+	{0x023a, 0x0005, 0xdfada9bc, 0xdfadb9be},
+	{0x0acb, 0x000e, 0x2217cecd, 0x0017d6cd},
+	{0x0148, 0x0060, 0xbc3f7405, 0xf5fd6615},
+	{0x0764, 0x0059, 0xcbc201b1, 0xbb089bf4},
+	{0x021f, 0x0059, 0x5d6b2256, 0xa16a0a59},
+	{0x0f1e, 0x006c, 0xdefeeb45, 0xfc34f9d6},
+	{0x071c, 0x00b9, 0xb9b59309, 0xb645eae2},
+	{0x0564, 0x0063, 0xae064271, 0x954dc6d1},
+	{0x0b14, 0x0044, 0xdb867d9b, 0xdf432309},
+	{0x0e5a, 0x0055, 0xff06b685, 0xa65ff257},
+	{0x015e, 0x00ba, 0x1115ccbc, 0x11c365f4},
+	{0x0379, 0x00e6, 0x5f4e58dd, 0x2d176d31},
+	{0x013b, 0x0067, 0x4897427e, 0xc40532fe},
+	{0x0e64, 0x0071, 0x7af2b7a4, 0x1fb7bf43},
+	{0x0a11, 0x0050, 0x92105726, 0xb1185e51},
+	{0x0109, 0x0055, 0xd0d000f9, 0x60a60bfd},
+	{0x00aa, 0x0022, 0x815d229d, 0x215d379c},
+	{0x09ac, 0x004f, 0x02f9d985, 0x10b90b20},
+	{0x0e1b, 0x00ce, 0x5cf92ab4, 0x6a477573},
+	{0x08af, 0x00d8, 0x17ca72d1, 0x385af156},
+	{0x0e33, 0x000a, 0xda2dba6b, 0xda2dbb69},
+	{0x0ee3, 0x006a, 0xb00048e5, 0xa9a2decc},
+	{0x0648, 0x001a, 0x2364b8cb, 0x3364b1cb},
+	{0x0315, 0x0085, 0x0596fd0d, 0xa651740f},
+	{0x0fbb, 0x003e, 0x298230ca, 0x7fc617c7},
+	{0x0422, 0x006a, 0x78ada4ab, 0xc576ae2a},
+	{0x04ba, 0x0073, 0xced1fbc2, 0xaac8455b},
+	{0x007d, 0x0061, 0x4b7ff236, 0x347d5739},
+	{0x070b, 0x00d0, 0x261cf0ae, 0xc7fb1c10},
+	{0x0c1a, 0x0035, 0x8be92ee2, 0x8be9b4e1},
+	{0x0af8, 0x0063, 0x824dcf03, 0x53010388},
+	{0x08f8, 0x006d, 0xd289710c, 0x30418edd},
+	{0x021b, 0x00ee, 0x6ac1c41d, 0x2557e9a3},
+	{0x05b5, 0x00da, 0x8e52f0e2, 0x98531012},
 };
 
 /* Don't print anything to stdout. */
@@ -127,6 +128,7 @@ dahash_test(
 	int		i;
 	int		errors = 0;
 	int		bytes = 0;
+	struct xfs_name	xname = { };
 	struct timeval	start, stop;
 	uint64_t	usec;
 
@@ -150,6 +152,12 @@ dahash_test(
 				dahash_tests[i].length);
 		if (hash != dahash_tests[i].dahash)
 			errors++;
+
+		xname.name = randbytes_test_buf + dahash_tests[i].start;
+		xname.len = dahash_tests[i].length;
+		hash = libxfs_ascii_ci_hashname(&xname);
+		if (hash != dahash_tests[i].ascii_ci_dahash)
+			errors++;
 	}
 	gettimeofday(&stop, NULL);
 
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index f8efcce7..ef3624ae 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -33,6 +33,8 @@
 #define xfs_alloc_read_agf		libxfs_alloc_read_agf
 #define xfs_alloc_vextent		libxfs_alloc_vextent
 
+#define xfs_ascii_ci_hashname		libxfs_ascii_ci_hashname
+
 #define xfs_attr_get			libxfs_attr_get
 #define xfs_attr_leaf_newentsize	libxfs_attr_leaf_newentsize
 #define xfs_attr_namecheck		libxfs_attr_namecheck



* [PATCH 3/6] xfs_db: move obfuscate_name assertion to callers
  2023-04-06  0:09 ` [PATCHSET v2 0/6] xfsprogs: fix ascii-ci problems, then kill it Darrick J. Wong
  2023-04-06  0:09   ` [PATCH 1/6] xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation Darrick J. Wong
  2023-04-06  0:09   ` [PATCH 2/6] xfs: test the ascii case-insensitive hash Darrick J. Wong
@ 2023-04-06  0:09   ` Darrick J. Wong
  2023-04-11  4:52     ` Christoph Hellwig
  2023-04-06  0:09   ` [PATCH 4/6] xfs_db: fix metadump name obfuscation for ascii-ci filesystems Darrick J. Wong
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-06  0:09 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

Currently, obfuscate_name asserts that the hash of the new name is the
same as the old name.  To enable bug fixes in the next patch, move this
assertion to the callers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/metadump.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/db/metadump.c b/db/metadump.c
index 27d1df43..317ff728 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -882,7 +882,6 @@ obfuscate_name(
 		*first ^= 0x10;
 		ASSERT(!is_invalid_char(*first));
 	}
-	ASSERT(libxfs_da_hashname(name, name_len) == hash);
 }
 
 /*
@@ -1208,6 +1207,7 @@ generate_obfuscated_name(
 
 	hash = libxfs_da_hashname(name, namelen);
 	obfuscate_name(hash, namelen, name);
+	ASSERT(hash == libxfs_da_hashname(name, namelen));
 
 	/*
 	 * Make sure the name is not something already seen.  If we
@@ -1321,6 +1321,7 @@ obfuscate_path_components(
 			namelen = strnlen((char *)comp, len);
 			hash = libxfs_da_hashname(comp, namelen);
 			obfuscate_name(hash, namelen, comp);
+			ASSERT(hash == libxfs_da_hashname(comp, namelen));
 			break;
 		}
 		namelen = slash - (char *)comp;
@@ -1332,6 +1333,7 @@ obfuscate_path_components(
 		}
 		hash = libxfs_da_hashname(comp, namelen);
 		obfuscate_name(hash, namelen, comp);
+		ASSERT(hash == libxfs_da_hashname(comp, namelen));
 		comp += namelen + 1;
 		len -= namelen + 1;
 	}



* [PATCH 4/6] xfs_db: fix metadump name obfuscation for ascii-ci filesystems
  2023-04-06  0:09 ` [PATCHSET v2 0/6] xfsprogs: fix ascii-ci problems, then kill it Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-04-06  0:09   ` [PATCH 3/6] xfs_db: move obfuscate_name assertion to callers Darrick J. Wong
@ 2023-04-06  0:09   ` Darrick J. Wong
  2023-04-11  4:58     ` Christoph Hellwig
  2023-04-06  0:10   ` [PATCH 5/6] mkfs.xfs.8: warn about the version=ci feature Darrick J. Wong
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-06  0:09 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

Now that we've stabilized the dirent hash function for ascii-ci
filesystems, adapt the metadump name obfuscation code to detect when
it's obfuscating a directory entry name on an ascii-ci filesystem and
spit out names that actually have the same hash.
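A minimal sketch of why the retry is needed.  obfuscate_name() fills all
but the last 5 bytes with random characters, hashing them with the fold
on ci filesystems, and then derives the trailing correction bytes
arithmetically as if they were hashed verbatim.  The helper names below
(ci_hash, assumed_hash, tail_is_ci_safe) are hypothetical and only model
that assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool need_xfrm(unsigned char c)
{
	return (c >= 0x41 && c <= 0x5a) || (c >= 0xc0 && c <= 0xd6) ||
	       (c >= 0xd8 && c <= 0xde);
}

static unsigned char xfrm(unsigned char c)
{
	return need_xfrm(c) ? (unsigned char)(c + 0x20) : c;
}

static uint32_t rol32(uint32_t w, unsigned int s)
{
	return (w << s) | (w >> (32 - s));
}

/* What the kernel computes on an ascii-ci filesystem: fold every byte. */
static uint32_t ci_hash(const unsigned char *n, size_t len)
{
	uint32_t hash = 0;

	for (size_t i = 0; i < len; i++)
		hash = xfrm(n[i]) ^ rol32(hash, 7);
	return hash;
}

/* What the obfuscator assumes: fold the random prefix, but hash the
 * last `tail` correction bytes exactly as written. */
static uint32_t assumed_hash(const unsigned char *n, size_t len,
			     size_t tail)
{
	uint32_t hash = 0;

	for (size_t i = 0; i < len; i++) {
		unsigned char c = (i < len - tail) ? xfrm(n[i]) : n[i];

		hash = c ^ rol32(hash, 7);
	}
	return hash;
}

/* The two hashes agree when no correction byte would be folded; the
 * patch retries with a new prefix whenever this check fails. */
static bool tail_is_ci_safe(const unsigned char *n, size_t len,
			    size_t tail)
{
	for (size_t i = len - tail; i < len; i++)
		if (need_xfrm(n[i]))
			return false;
	return true;
}
```

Since XOR and rotation are both invertible, a single folded correction
byte always changes the final hash, so there is no way to "fix up" an
uppercase correction byte in place; picking a different prefix is the
simple way out.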

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/metadump.c |   77 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 68 insertions(+), 9 deletions(-)


diff --git a/db/metadump.c b/db/metadump.c
index 317ff728..4f8b3adb 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -817,13 +817,17 @@ static void
 obfuscate_name(
 	xfs_dahash_t	hash,
 	size_t		name_len,
-	unsigned char	*name)
+	unsigned char	*name,
+	bool		is_dirent)
 {
-	unsigned char	*newp = name;
+	unsigned char	*oldname = NULL;
+	unsigned char	*newp;
 	int		i;
-	xfs_dahash_t	new_hash = 0;
+	xfs_dahash_t	new_hash;
 	unsigned char	*first;
 	unsigned char	high_bit;
+	int		tries = 0;
+	bool		is_ci_name = is_dirent && xfs_has_asciici(mp);
 	int		shift;
 
 	/*
@@ -836,6 +840,24 @@ obfuscate_name(
 	if (name_len < 5)
 		return;
 
+	if (is_ci_name) {
+		oldname = alloca(name_len);
+		memcpy(oldname, name, name_len);
+	}
+
+again:
+	newp = name;
+	new_hash = 0;
+
+	/*
+	 * If we cannot generate a ci-compatible obfuscated name after 1000
+	 * tries, don't bother obfuscating the name.
+	 */
+	if (tries++ > 1000) {
+		memcpy(name, oldname, name_len);
+		return;
+	}
+
 	/*
 	 * The beginning of the obfuscated name can be pretty much
 	 * anything, so fill it in with random characters.
@@ -843,7 +865,11 @@ obfuscate_name(
 	 */
 	for (i = 0; i < name_len - 5; i++) {
 		*newp = random_filename_char();
-		new_hash = *newp ^ rol32(new_hash, 7);
+		if (is_ci_name)
+			new_hash = xfs_ascii_ci_xfrm(*newp) ^
+							rol32(new_hash, 7);
+		else
+			new_hash = *newp ^ rol32(new_hash, 7);
 		newp++;
 	}
 
@@ -867,6 +893,17 @@ obfuscate_name(
 			high_bit = 0x80;
 		} else
 			high_bit = 0;
+
+		/*
+		 * If ascii-ci is enabled, uppercase characters are converted
+		 * to lowercase characters while computing the name hash.  If
+		 * any of the necessary correction bytes are uppercase, the
+		 * hash of the new name will not match.  Try again with a
+		 * different prefix.
+		 */
+		if (is_ci_name && xfs_ascii_ci_need_xfrm(*newp))
+			goto again;
+
 		ASSERT(!is_invalid_char(*newp));
 		newp++;
 	}
@@ -880,6 +917,10 @@ obfuscate_name(
 	 */
 	if (high_bit) {
 		*first ^= 0x10;
+
+		if (is_ci_name && xfs_ascii_ci_need_xfrm(*first))
+			goto again;
+
 		ASSERT(!is_invalid_char(*first));
 	}
 }
@@ -1177,6 +1218,24 @@ handle_duplicate_name(xfs_dahash_t hash, size_t name_len, unsigned char *name)
 	return 1;
 }
 
+static inline xfs_dahash_t
+dirattr_hashname(
+	bool		is_dirent,
+	const uint8_t	*name,
+	int		namelen)
+{
+	if (is_dirent) {
+		struct xfs_name	xname = {
+			.name	= name,
+			.len	= namelen,
+		};
+
+		return libxfs_dir2_hashname(mp, &xname);
+	}
+
+	return libxfs_da_hashname(name, namelen);
+}
+
 static void
 generate_obfuscated_name(
 	xfs_ino_t		ino,
@@ -1205,9 +1264,9 @@ generate_obfuscated_name(
 
 	/* Obfuscate the name (if possible) */
 
-	hash = libxfs_da_hashname(name, namelen);
-	obfuscate_name(hash, namelen, name);
-	ASSERT(hash == libxfs_da_hashname(name, namelen));
+	hash = dirattr_hashname(ino != 0, name, namelen);
+	obfuscate_name(hash, namelen, name, ino != 0);
+	ASSERT(hash == dirattr_hashname(ino != 0, name, namelen));
 
 	/*
 	 * Make sure the name is not something already seen.  If we
@@ -1320,7 +1379,7 @@ obfuscate_path_components(
 			/* last (or single) component */
 			namelen = strnlen((char *)comp, len);
 			hash = libxfs_da_hashname(comp, namelen);
-			obfuscate_name(hash, namelen, comp);
+			obfuscate_name(hash, namelen, comp, false);
 			ASSERT(hash == libxfs_da_hashname(comp, namelen));
 			break;
 		}
@@ -1332,7 +1391,7 @@ obfuscate_path_components(
 			continue;
 		}
 		hash = libxfs_da_hashname(comp, namelen);
-		obfuscate_name(hash, namelen, comp);
+		obfuscate_name(hash, namelen, comp, false);
 		ASSERT(hash == libxfs_da_hashname(comp, namelen));
 		comp += namelen + 1;
 		len -= namelen + 1;



* [PATCH 5/6] mkfs.xfs.8: warn about the version=ci feature
  2023-04-06  0:09 ` [PATCHSET v2 0/6] xfsprogs: fix ascii-ci problems, then kill it Darrick J. Wong
                     ` (3 preceding siblings ...)
  2023-04-06  0:09   ` [PATCH 4/6] xfs_db: fix metadump name obfuscation for ascii-ci filesystems Darrick J. Wong
@ 2023-04-06  0:10   ` Darrick J. Wong
  2023-04-11  4:59     ` Christoph Hellwig
  2023-04-06  0:10   ` [PATCH 6/6] mkfs: deprecate the ascii-ci feature Darrick J. Wong
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-06  0:10 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

Document the exact byte transformations that happen during directory
name lookup when the version=ci feature is enabled.  Warn that these
transformations are not compatible with other encodings such as UTF-8,
and that people should not use this feature.
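The three ranges listed in the man page can be derived mechanically from
the fold predicate.  A minimal sketch, assuming a standalone copy of the
predicate from xfs_dir2.h; fold_ranges and print_fold_table are
hypothetical helpers that scan all 256 byte values and report each
maximal run of folded bytes.

```c
#include <stdbool.h>
#include <stdio.h>

static bool need_xfrm(unsigned char c)
{
	return (c >= 0x41 && c <= 0x5a) || (c >= 0xc0 && c <= 0xd6) ||
	       (c >= 0xd8 && c <= 0xde);
}

/* Store the first and last byte of each run of folded bytes in
 * ranges[i][0] and ranges[i][1]; return the total number of runs. */
static int fold_ranges(int ranges[][2], int max)
{
	int runs = 0;
	int c = 0;

	while (c < 256) {
		if (!need_xfrm((unsigned char)c)) {
			c++;
			continue;
		}
		int start = c;

		while (c < 256 && need_xfrm((unsigned char)c))
			c++;
		if (runs < max) {
			ranges[runs][0] = start;
			ranges[runs][1] = c - 1;
		}
		runs++;
	}
	return runs;
}

/* Print the runs in the "lo-hi -> lo'-hi'" form used by mkfs.xfs(8). */
static void print_fold_table(void)
{
	int r[8][2];
	int n = fold_ranges(r, 8);

	for (int i = 0; i < n && i < 8; i++)
		printf("0x%02x-0x%02x -> 0x%02x-0x%02x\n", r[i][0], r[i][1],
		       r[i][0] + 0x20, r[i][1] + 0x20);
}
```

Calling print_fold_table() should reproduce the three lines in the man
page text, which gives a cheap way to keep the documentation and the
predicate in sync.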

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man8/mkfs.xfs.8.in |   22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)


diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
index 49e64d47..6fc7708b 100644
--- a/man/man8/mkfs.xfs.8.in
+++ b/man/man8/mkfs.xfs.8.in
@@ -809,11 +809,25 @@ can be either 2 or 'ci', defaulting to 2 if unspecified.
 With version 2 directories, the directory block size can be
 any power of 2 size from the filesystem block size up to 65536.
 .IP
-The
+If the
 .B version=ci
-option enables ASCII only case-insensitive filename lookup and version
-2 directories. Filenames are case-preserving, that is, the names
-are stored in directories using the case they were created with.
+option is specified, the kernel will transform certain bytes in filenames
+before performing lookup-related operations.
+The byte sequence given to create a directory entry is persisted without
+alterations.
+The lookup transformations are defined as follows:
+
+    0x41-0x5a -> 0x61-0x7a
+
+    0xc0-0xd6 -> 0xe0-0xf6
+
+    0xd8-0xde -> 0xf8-0xfe
+
+This transformation roughly corresponds to case insensitivity in ISO
+8859-1.
+The transformations are not compatible with other encodings (e.g. UTF8).
+Do not enable this feature unless your entire environment has been coerced
+to ISO 8859-1.
 .IP
 Note: Version 1 directories are not supported.
 .TP



* [PATCH 6/6] mkfs: deprecate the ascii-ci feature
  2023-04-06  0:09 ` [PATCHSET v2 0/6] xfsprogs: fix ascii-ci problems, then kill it Darrick J. Wong
                     ` (4 preceding siblings ...)
  2023-04-06  0:10   ` [PATCH 5/6] mkfs.xfs.8: warn about the version=ci feature Darrick J. Wong
@ 2023-04-06  0:10   ` Darrick J. Wong
  2023-04-11  4:59     ` Christoph Hellwig
  2023-04-13 15:19   ` [RFC PATCH 7/6] xfs_db: hoist name obfuscation code out of metadump.c Darrick J. Wong
  2023-04-13 15:20   ` [RFC PATCH 8/6] xfs_db: create dirents and xattrs with colliding names Darrick J. Wong
  7 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-06  0:10 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

Deprecate the ascii-ci feature, since it is broken.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man8/mkfs.xfs.8.in |    1 +
 mkfs/xfs_mkfs.c        |   11 +++++++++++
 2 files changed, 12 insertions(+)


diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
index 6fc7708b..01f9dc6e 100644
--- a/man/man8/mkfs.xfs.8.in
+++ b/man/man8/mkfs.xfs.8.in
@@ -828,6 +828,7 @@ This transformation roughly corresponds to case insensitivity in ISO
 The transformations are not compatible with other encodings (e.g. UTF8).
 Do not enable this feature unless your entire environment has been coerced
 to ISO 8859-1.
+This feature is deprecated and will be removed in September 2030.
 .IP
 Note: Version 1 directories are not supported.
 .TP
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 6dc0f335..64f17a8f 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2150,6 +2150,17 @@ validate_sb_features(
 	struct mkfs_params	*cfg,
 	struct cli_params	*cli)
 {
+	if (cli->sb_feat.nci) {
+		/*
+		 * The ascii-ci feature is deprecated in the upstream Linux
+		 * kernel.  In September 2025 it will be turned off by default
+		 * in the kernel and in September 2030 support will be removed
+		 * entirely.
+		 */
+		fprintf(stdout,
+_("ascii-ci filesystems are deprecated and will not be supported by future versions.\n"));
+	}
+
 	/*
 	 * Now we have blocks and sector sizes set up, check parameters that are
 	 * no longer optional for CRC enabled filesystems.  Catch them up front



* [PATCH] fstests: add a couple more tests for ascii-ci problems
  2023-04-06  0:02 [PATCHSET v2 0/4] xfs: fix ascii-ci problems, then kill it Darrick J. Wong
                   ` (4 preceding siblings ...)
  2023-04-06  0:09 ` [PATCHSET v2 0/6] xfsprogs: fix ascii-ci problems, then kill it Darrick J. Wong
@ 2023-04-06  0:11 ` Darrick J. Wong
  5 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-06  0:11 UTC (permalink / raw)
  To: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

Add some tests to make sure that userspace and the kernel actually
agree on how to do ascii case-insensitive directory lookups, and that
metadump can actually obfuscate such filesystems.
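The expected failures in 859.out can be predicted from the fold alone.  A
minimal sketch, assuming the fold from xfs_dir2.h and the test's loop
over bytes 192-246: a single-byte link fails with "File exists" when an
earlier name in the loop folds to the same byte.  The other names in
that directory are "autoexec.bat" and 255-byte names, so they cannot
collide with single-byte names.  predict_collisions is a hypothetical
helper.

```c
#include <stdbool.h>
#include <stddef.h>

static unsigned char xfrm(unsigned char c)
{
	if ((c >= 0x41 && c <= 0x5a) || (c >= 0xc0 && c <= 0xd6) ||
	    (c >= 0xd8 && c <= 0xde))
		return (unsigned char)(c + 0x20);
	return c;
}

/* Walk bytes first..last in order, as the test's awk loop does, and
 * record each byte whose folded form was already created.  Returns the
 * number of collisions; up to `max` of them are stored in out[]. */
static int predict_collisions(int first, int last, int *out, int max)
{
	bool seen[256] = { false };
	int n = 0;

	for (int c = first; c <= last; c++) {
		unsigned char f = xfrm((unsigned char)c);

		if (seen[f]) {
			if (n < max)
				out[n] = c;
			n++;
		} else {
			seen[f] = true;
		}
	}
	return n;
}
```

For the range 192-246 this yields the 23 bytes 224-246 (octal \340 to
\366), matching the lines in 859.out; 215 and 216-222 do not collide
because their folded partners are outside the loop or not folded.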

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/859     |   64 ++++++++++++++++++++++++++++++++++
 tests/xfs/859.out |   24 +++++++++++++
 tests/xfs/860     |  100 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/860.out |    9 +++++
 tests/xfs/861     |   90 ++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/861.out |    2 +
 6 files changed, 289 insertions(+)
 create mode 100755 tests/xfs/859
 create mode 100644 tests/xfs/859.out
 create mode 100755 tests/xfs/860
 create mode 100644 tests/xfs/860.out
 create mode 100755 tests/xfs/861
 create mode 100644 tests/xfs/861.out

diff --git a/tests/xfs/859 b/tests/xfs/859
new file mode 100755
index 0000000000..d9e662ad11
--- /dev/null
+++ b/tests/xfs/859
@@ -0,0 +1,64 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2023 Oracle.  All Rights Reserved.
+#
+# FS QA Test 859
+#
+# Make sure that the kernel and userspace agree on which byte sequences are
+# ASCII uppercase letters, and how to convert them.
+#
+. ./common/preamble
+_begin_fstest auto ci dir
+
+# Override the default cleanup function.
+# _cleanup()
+# {
+# 	cd /
+# 	rm -r -f $tmp.*
+# }
+
+# Import common functions.
+. ./common/filter
+
+_fixed_by_kernel_commit XXXXXXXXXXXX "xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation"
+_fixed_by_kernel_commit XXXXXXXXXXXX "xfs: use the directory name hash function for dir scrubbing"
+
+_supported_fs xfs
+_require_scratch
+_require_xfs_mkfs_ciname
+
+_scratch_mkfs -n version=ci > $seqres.full
+_scratch_mount
+
+# Create a two-block directory to force leaf format
+mkdir "$SCRATCH_MNT/lol"
+touch "$SCRATCH_MNT/lol/autoexec.bat"
+i=0
+dblksz=$(_xfs_get_dir_blocksize "$SCRATCH_MNT")
+nr_dirents=$((dblksz * 2 / 256))
+
+for ((i = 0; i < nr_dirents; i++)); do
+	name="$(printf "y%0254d" $i)"
+	ln "$SCRATCH_MNT/lol/autoexec.bat" "$SCRATCH_MNT/lol/$name"
+done
+
+dirsz=$(stat -c '%s' $SCRATCH_MNT/lol)
+test $dirsz -gt $dblksz || echo "dir size $dirsz, expected at least $dblksz?"
+stat $SCRATCH_MNT/lol >> $seqres.full
+
+# Create names with extended ascii characters in them to exploit the fact
+# that the Linux kernel will transform extended ASCII uppercase characters
+# but libc won't.  Need to force LANG=C here so that awk doesn't spit out utf8
+# sequences.
+old_lang="$LANG"
+LANG=C
+awk 'END { for (i = 192; i < 247; i++) printf("%c\n", i); }' < /dev/null | while read name; do
+	ln "$SCRATCH_MNT/lol/autoexec.bat" "$SCRATCH_MNT/lol/$name" 2>&1 | _filter_scratch
+done
+
+LANG=$old_lang
+
+# Now just let repair run
+
+status=0
+exit
diff --git a/tests/xfs/859.out b/tests/xfs/859.out
new file mode 100644
index 0000000000..a4939ba670
--- /dev/null
+++ b/tests/xfs/859.out
@@ -0,0 +1,24 @@
+QA output created by 859
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\340': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\341': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\342': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\343': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\344': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\345': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\346': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\347': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\350': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\351': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\352': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\353': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\354': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\355': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\356': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\357': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\360': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\361': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\362': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\363': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\364': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\365': File exists
+ln: failed to create hard link 'SCRATCH_MNT/lol/'$'\366': File exists
diff --git a/tests/xfs/860 b/tests/xfs/860
new file mode 100755
index 0000000000..d26d197bd9
--- /dev/null
+++ b/tests/xfs/860
@@ -0,0 +1,100 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2023 Oracle.  All Rights Reserved.
+#
+# FS QA Test 860
+#
+# Make sure that metadump obfuscation works for filesystems with ascii-ci
+# enabled.
+#
+. ./common/preamble
+_begin_fstest auto dir ci
+
+_cleanup()
+{
+      cd /
+      rm -r -f $tmp.* $testdir
+}
+
+_fixed_by_git_commit xfsprogs XXXXXXXXXXX "xfs_db: fix metadump name obfuscation for ascii-ci filesystems"
+
+_supported_fs xfs
+_require_test
+_require_scratch
+_require_xfs_mkfs_ciname
+
+_scratch_mkfs -n version=ci > $seqres.full
+_scratch_mount
+
+# Create a two-block directory to force leaf format
+mkdir "$SCRATCH_MNT/lol"
+touch "$SCRATCH_MNT/lol/autoexec.bat"
+i=0
+dblksz=$(_xfs_get_dir_blocksize "$SCRATCH_MNT")
+nr_dirents=$((dblksz * 2 / 256))
+
+for ((i = 0; i < nr_dirents; i++)); do
+	name="$(printf "y%0254d" $i)"
+	ln "$SCRATCH_MNT/lol/autoexec.bat" "$SCRATCH_MNT/lol/$name"
+done
+
+dirsz=$(stat -c '%s' $SCRATCH_MNT/lol)
+test $dirsz -gt $dblksz || echo "dir size $dirsz, expected at least $dblksz?"
+stat $SCRATCH_MNT/lol >> $seqres.full
+
+# Create a two-block attr to force leaf format
+i=0
+for ((i = 0; i < nr_dirents; i++)); do
+	name="$(printf "user.%0250d" $i)"
+	$SETFATTR_PROG -n "$name" -v 1 "$SCRATCH_MNT/lol/autoexec.bat"
+done
+stat $SCRATCH_MNT/lol/autoexec.bat >> $seqres.full
+
+_scratch_unmount
+
+testdir=$TEST_DIR/$seq.metadumps
+mkdir -p $testdir
+metadump_file=$testdir/scratch.md
+metadump_file_a=${metadump_file}.a
+metadump_file_o=${metadump_file}.o
+metadump_file_ao=${metadump_file}.ao
+
+echo metadump
+_scratch_xfs_metadump $metadump_file >> $seqres.full
+
+echo metadump a
+_scratch_xfs_metadump $metadump_file_a -a >> $seqres.full
+
+echo metadump o
+_scratch_xfs_metadump $metadump_file_o -o >> $seqres.full
+
+echo metadump ao
+_scratch_xfs_metadump $metadump_file_ao -a -o >> $seqres.full
+
+echo mdrestore
+_scratch_xfs_mdrestore $metadump_file
+_scratch_mount
+_check_scratch_fs
+_scratch_unmount
+
+echo mdrestore a
+_scratch_xfs_mdrestore $metadump_file_a
+_scratch_mount
+_check_scratch_fs
+_scratch_unmount
+
+echo mdrestore o
+_scratch_xfs_mdrestore $metadump_file_o
+_scratch_mount
+_check_scratch_fs
+_scratch_unmount
+
+echo mdrestore ao
+_scratch_xfs_mdrestore $metadump_file_ao
+_scratch_mount
+_check_scratch_fs
+_scratch_unmount
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/860.out b/tests/xfs/860.out
new file mode 100644
index 0000000000..136fc5f7d6
--- /dev/null
+++ b/tests/xfs/860.out
@@ -0,0 +1,9 @@
+QA output created by 860
+metadump
+metadump a
+metadump o
+metadump ao
+mdrestore
+mdrestore a
+mdrestore o
+mdrestore ao
diff --git a/tests/xfs/861 b/tests/xfs/861
new file mode 100755
index 0000000000..7b0a37a3f1
--- /dev/null
+++ b/tests/xfs/861
@@ -0,0 +1,90 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2023 Oracle.  All Rights Reserved.
+#
+# FS QA Test 861
+#
+# Make sure that the kernel and utilities can handle large numbers of dirhash
+# collisions in both the directory and extended attribute structures.
+#
+# This started as a regression test for the new 'hashcoll' function in xfs_db,
+# but became a regression test for an xfs_repair bug affecting hashval checks
+# applied to the second and higher node levels of a dabtree.
+#
+. ./common/preamble
+_begin_fstest auto dir
+
+_fixed_by_git_commit xfsprogs b7b81f336ac "xfs_repair: fix incorrect dabtree hashval comparison"
+
+_supported_fs xfs
+_require_xfs_db_command "hashcoll"
+_require_xfs_db_command "path"
+_require_scratch
+
+_scratch_mkfs > $seqres.full
+_scratch_mount
+
+crash_dir=$SCRATCH_MNT/lol/
+crash_attrs=$SCRATCH_MNT/hah
+
+mkdir -p "$crash_dir"
+touch "$crash_attrs"
+
+# Create enough dirents to fill two dabtree node blocks with names that all
+# hash to the same value.  Each dirent gets its own record in the dabtree,
+# so we must create enough dirents to get a dabtree of at least height 2.
+dblksz=$(_xfs_get_dir_blocksize "$SCRATCH_MNT")
+
+da_records_per_block=$((dblksz / 8))	# 32-bit hash and 32-bit before
+nr_dirents=$((da_records_per_block * 2))
+
+longname="$(mktemp --dry-run "$(perl -e 'print "X" x 255;')" | tr ' ' 'X')"
+echo "creating $nr_dirents dirents from '$longname'" >> $seqres.full
+_scratch_xfs_db -r -c "hashcoll -n $nr_dirents -p $crash_dir $longname"
+
+# Create enough xattrs to fill two dabtree nodes.  Each attribute leaf block
+# gets its own record in the dabtree, so we have to create enough attr blocks
+# (each full of attrs) to get a dabtree of at least height 2.
+blksz=$(_get_block_size "$SCRATCH_MNT")
+
+attr_records_per_block=$((blksz / 255))
+da_records_per_block=$((blksz / 8))	# 32-bit hash and 32-bit before
+nr_attrs=$((da_records_per_block * attr_records_per_block * 2))
+
+longname="$(mktemp --dry-run "$(perl -e 'print "X" x 249;')" | tr ' ' 'X')"
+echo "creating $nr_attrs attrs from '$longname'" >> $seqres.full
+_scratch_xfs_db -r -c "hashcoll -a -n $nr_attrs -p $crash_attrs $longname"
+
+_scratch_unmount
+
+# Make sure that there's one hash value dominating the dabtree block.
+# We don't require 100% because directories create dabtree records for dot
+# and dotdot.
+filter_hashvals() {
+	uniq -c | awk -v seqres_full="$seqres.full" \
+		'{print $0 >> seqres_full; tot += $1; if ($1 > biggest) biggest = $1;} END {if (biggest >= (tot - 2)) exit(0); exit(1);}'
+	test "${PIPESTATUS[1]}" -eq 0 || \
+		echo "Scattered dabtree hashes?  See seqres.full"
+}
+
+# Did we actually get a two-level dabtree for the directory?  Does it contain a
+# long run of hashes?
+echo "dir check" >> $seqres.full
+da_node_block_offset=$(( (2 ** 35) / blksz ))
+dir_db_args=(-c 'path /lol/' -c "dblock $da_node_block_offset" -c 'addr nbtree[0].before')
+dir_count="$(_scratch_xfs_db "${dir_db_args[@]}" -c 'print lhdr.count' | awk '{print $3}')"
+_scratch_xfs_db "${dir_db_args[@]}" -c "print lents[0-$((dir_count - 1))].hashval" | sed -e 's/lents\[[0-9]*\]/lents[NN]/g' | filter_hashvals
+
+# Did we actually get a two-level dabtree for the attrs?  Does it contain a
+# long run of hashes?
+echo "attr check" >> $seqres.full
+attr_db_args=(-c 'path /hah' -c "ablock 0" -c 'addr btree[0].before')
+attr_count="$(_scratch_xfs_db "${attr_db_args[@]}" -c 'print hdr.count' | awk '{print $3}')"
+_scratch_xfs_db "${attr_db_args[@]}" -c "print btree[0-$((attr_count - 1))].hashval" | sed -e 's/btree\[[0-9]*\]/btree[NN]/g' | filter_hashvals
+
+# Remount to get some coverage of xfs_scrub before seeing if xfs_repair
+# will trip over the large dabtrees.
+echo Silence is golden
+_scratch_mount
+status=0
+exit
diff --git a/tests/xfs/861.out b/tests/xfs/861.out
new file mode 100644
index 0000000000..d11b76c82e
--- /dev/null
+++ b/tests/xfs/861.out
@@ -0,0 +1,2 @@
+QA output created by 861
+Silence is golden

* Re: [PATCH 1/4] xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation
  2023-04-06  0:02 ` [PATCH 1/4] xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation Darrick J. Wong
@ 2023-04-11  4:50   ` Christoph Hellwig
From: Christoph Hellwig @ 2023-04-11  4:50 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david

On Wed, Apr 05, 2023 at 05:02:59PM -0700, Darrick J. Wong wrote:
> -		if (tolower(args->name[i]) != tolower(name[i]))
> +		if (xfs_ascii_ci_xfrm(args->name[i]) != xfs_ascii_ci_xfrm(name[i]))

Please avoid the overly long line name here.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 2/4] xfs: test the ascii case-insensitive hash
  2023-04-06  0:03 ` [PATCH 2/4] xfs: test the ascii case-insensitive hash Darrick J. Wong
@ 2023-04-11  4:50   ` Christoph Hellwig
From: Christoph Hellwig @ 2023-04-11  4:50 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 3/4] xfs: use the directory name hash function for dir scrubbing
  2023-04-06  0:03 ` [PATCH 3/4] xfs: use the directory name hash function for dir scrubbing Darrick J. Wong
@ 2023-04-11  4:51   ` Christoph Hellwig
From: Christoph Hellwig @ 2023-04-11  4:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 4/4] xfs: deprecate the ascii-ci feature
  2023-04-06  0:03 ` [PATCH 4/4] xfs: deprecate the ascii-ci feature Darrick J. Wong
@ 2023-04-11  4:52   ` Christoph Hellwig
From: Christoph Hellwig @ 2023-04-11  4:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 3/6] xfs_db: move obfuscate_name assertion to callers
  2023-04-06  0:09   ` [PATCH 3/6] xfs_db: move obfuscate_name assertion to callers Darrick J. Wong
@ 2023-04-11  4:52     ` Christoph Hellwig
From: Christoph Hellwig @ 2023-04-11  4:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, david

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 4/6] xfs_db: fix metadump name obfuscation for ascii-ci filesystems
  2023-04-06  0:09   ` [PATCH 4/6] xfs_db: fix metadump name obfuscation for ascii-ci filesystems Darrick J. Wong
@ 2023-04-11  4:58     ` Christoph Hellwig
  2023-04-11 15:35       ` Darrick J. Wong
From: Christoph Hellwig @ 2023-04-11  4:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, david

On Wed, Apr 05, 2023 at 05:09:55PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Now that we've stabilized the dirent hash function for ascii-ci
> filesystems, adapt the metadump name obfuscation code to detect when
> it's obfuscating a directory entry name on an ascii-ci filesystem and
> spit out names that actually have the same hash.

Between the alloc use, the goto jumping back and the failure to
obfuscate some names this really seems horribly ugly.  I could
come up with ideas to fix some of that, but they'd be fairly invasive.

Is there any reason we need to support obfuscation for ascii-ci,
or could we just say we require "-o" to metadump ascii-ci file systems
and not deal with this at all given that it never actually worked?

* Re: [PATCH 5/6] mkfs.xfs.8: warn about the version=ci feature
  2023-04-06  0:10   ` [PATCH 5/6] mkfs.xfs.8: warn about the version=ci feature Darrick J. Wong
@ 2023-04-11  4:59     ` Christoph Hellwig
From: Christoph Hellwig @ 2023-04-11  4:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, david

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 6/6] mkfs: deprecate the ascii-ci feature
  2023-04-06  0:10   ` [PATCH 6/6] mkfs: deprecate the ascii-ci feature Darrick J. Wong
@ 2023-04-11  4:59     ` Christoph Hellwig
From: Christoph Hellwig @ 2023-04-11  4:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, david

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCH 4/6] xfs_db: fix metadump name obfuscation for ascii-ci filesystems
  2023-04-11  4:58     ` Christoph Hellwig
@ 2023-04-11 15:35       ` Darrick J. Wong
  2023-04-12 12:09         ` Christoph Hellwig
From: Darrick J. Wong @ 2023-04-11 15:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs, david

On Mon, Apr 10, 2023 at 09:58:46PM -0700, Christoph Hellwig wrote:
> On Wed, Apr 05, 2023 at 05:09:55PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Now that we've stabilized the dirent hash function for ascii-ci
> > filesystems, adapt the metadump name obfuscation code to detect when
> > it's obfuscating a directory entry name on an ascii-ci filesystem and
> > spit out names that actually have the same hash.
> 
> Between the alloc use, the goto jumping back and the failure to
> obfuscate some names this really seems horribly ugly.  I could
> come up with ideas to fix some of that, but they'd be fairly invasive.

Given that it's rol7 and xoring, I'd love it if someone came up with a
gentler obfuscate_name() that at least tried to generate obfuscated
names that weren't full of control characters and other junk that make
ls output horrible.

Buuuut doing that requires a deep understanding of how the math works.
I think I've almost grokked it, but applied math has never been my
specialty.  Mark Adler's crc spoof looked promising if we ever follow
through on Dave's suggestion to change the dahash to crc32c, but that's
a whole different discussion.

> Is there any reason we need to support obfuscation for ascii-ci,
> or could we just say we require "-o" to metadump ascii-ci file systems
> and not deal with this at all given that it never actually worked?

That would be simpler for metadump, yes.

I'm going to introduce a followup series that adds a new xfs_db command
to generate obfuscated filenames/attrs to exercise the dabtree hash
collision resolution code.  I should probably do that now, since I
already sent xfs/861 that uses it.

It wouldn't be the end of the world if hashcoll didn't work on asciici
filesystems, but that /would/ be a testing gap.

--D

* Re: [PATCH 4/6] xfs_db: fix metadump name obfuscation for ascii-ci filesystems
  2023-04-11 15:35       ` Darrick J. Wong
@ 2023-04-12 12:09         ` Christoph Hellwig
  2023-04-12 22:04           ` Darrick J. Wong
From: Christoph Hellwig @ 2023-04-12 12:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs, david

On Tue, Apr 11, 2023 at 08:35:46AM -0700, Darrick J. Wong wrote:
> > obfuscate some names this really seems horribly ugly.  I could
> > come up with ideas to fix some of that, but they'd be fairly invasive.
> 
> Given that it's rol7 and xoring, I'd love it if someone came up with a
> gentler obfuscate_name() that at least tried to generate obfuscated
> names that weren't full of control characters and other junk that make
> ls output horrible.
> 
> Buuuut doing that requires a deep understanding of how the math works.
> I think I've almost grokked it, but applied math has never been my
> specialty.  Mark Adler's crc spoof looked promising if we ever follow
> through on Dave's suggestion to change the dahash to crc32c, but that's
> a whole different discussion.

Agreed on all counts.

> > Is there any reason we need to support obfuscation for ascii-ci,
> > or could we just say we require "-o" to metadump ascii-ci file systems
> > and not deal with this at all given that it never actually worked?
> 
> That would be simpler for metadump, yes.
> 
> I'm going to introduce a followup series that adds a new xfs_db command
> to generate obfuscated filenames/attrs to exercise the dabtree hash
> collision resolution code.  I should probably do that now, since I
> already sent xfs/861 that uses it.
> 
> It wouldn't be the end of the world if hashcoll didn't work on asciici
> filesystems, but that /would/ be a testing gap.

Do we really care about that testing gap for a feature you just
deprecated and which has been pretty broken all this time?

* Re: [PATCH 4/6] xfs_db: fix metadump name obfuscation for ascii-ci filesystems
  2023-04-12 12:09         ` Christoph Hellwig
@ 2023-04-12 22:04           ` Darrick J. Wong
From: Darrick J. Wong @ 2023-04-12 22:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs, david

On Wed, Apr 12, 2023 at 05:09:56AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 11, 2023 at 08:35:46AM -0700, Darrick J. Wong wrote:
> > > obfuscate some names this really seems horribly ugly.  I could
> > > come up with ideas to fix some of that, but they'd be fairly invasive.
> > 
> > Given that it's rol7 and xoring, I'd love it if someone came up with a
> > gentler obfuscate_name() that at least tried to generate obfuscated
> > names that weren't full of control characters and other junk that make
> > ls output horrible.
> > 
> > Buuuut doing that requires a deep understanding of how the math works.
> > I think I've almost grokked it, but applied math has never been my
> > specialty.  Mark Adler's crc spoof looked promising if we ever follow
> > through on Dave's suggestion to change the dahash to crc32c, but that's
> > a whole different discussion.
> 
> Agreed on all counts.
> 
> > > Is there any reason we need to support obfuscation for ascii-ci,
> > > or could we just say we require "-o" to metadump ascii-ci file systems
> > > and not deal with this at all given that it never actually worked?
> > 
> > That would be simpler for metadump, yes.
> > 
> > I'm going to introduce a followup series that adds a new xfs_db command
> > to generate obfuscated filenames/attrs to exercise the dabtree hash
> > collision resolution code.  I should probably do that now, since I
> > already sent xfs/861 that uses it.
> > 
> > It wouldn't be the end of the world if hashcoll didn't work on asciici
> > filesystems, but that /would/ be a testing gap.
> 
> Do we really care about that testing gap for a feature you just
> deprecated and which has been pretty broken all this time?

I don't, and am perfectly happy to send an alternate patch that errors
out if you try to obfuscate an asciici filesystem.  Or maybe doesn't
even error out, since names less than 5 letters aren't obfuscated, so
it's not like we're hiding things effectively anyway.

That said, Carlos is the maintainer, so let's let him decide. :D

1) Gross loopy code; or
2) Less test coverage of broken code; or
3) Control gross loopy code with a flag so that debugger commands can
   still do gross things, but metadump won't.

--D

* [RFC PATCH 7/6] xfs_db: hoist name obfuscation code out of metadump.c
  2023-04-06  0:09 ` [PATCHSET v2 0/6] xfsprogs: fix ascii-ci problems, then kill it Darrick J. Wong
                     ` (5 preceding siblings ...)
  2023-04-06  0:10   ` [PATCH 6/6] mkfs: deprecate the ascii-ci feature Darrick J. Wong
@ 2023-04-13 15:19   ` Darrick J. Wong
  2023-04-13 15:20   ` [RFC PATCH 8/6] xfs_db: create dirents and xattrs with colliding names Darrick J. Wong
From: Darrick J. Wong @ 2023-04-13 15:19 UTC (permalink / raw)
  To: cem; +Cc: linux-xfs, david

From: Darrick J. Wong <djwong@kernel.org>

We want to create a debugger command that will create obfuscated names
for directory and xattr names, so hoist the name obfuscation code into a
separate file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/Makefile    |    2 
 db/metadump.c  |  383 -------------------------------------------------------
 db/obfuscate.c |  389 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 db/obfuscate.h |   17 ++
 4 files changed, 408 insertions(+), 383 deletions(-)
 create mode 100644 db/obfuscate.c
 create mode 100644 db/obfuscate.h

diff --git a/db/Makefile b/db/Makefile
index dbe79a9a1b1..e4f05c5cd76 100644
--- a/db/Makefile
+++ b/db/Makefile
@@ -13,7 +13,7 @@ HFILES = addr.h agf.h agfl.h agi.h attr.h attrshort.h bit.h block.h bmap.h \
 	flist.h fprint.h frag.h freesp.h hash.h help.h init.h inode.h input.h \
 	io.h logformat.h malloc.h metadump.h output.h print.h quit.h sb.h \
 	sig.h strvec.h text.h type.h write.h attrset.h symlink.h fsmap.h \
-	fuzz.h
+	fuzz.h obfuscate.h
 CFILES = $(HFILES:.h=.c) btdump.c btheight.c convert.c info.c namei.c \
 	timelimit.c bmap_inflate.c unlinked.c
 LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh
diff --git a/db/metadump.c b/db/metadump.c
index 4f8b3adb163..d9a616a9296 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -19,6 +19,7 @@
 #include "faddr.h"
 #include "field.h"
 #include "dir2.h"
+#include "obfuscate.h"
 
 #define DEFAULT_MAX_EXT_SIZE	XFS_MAX_BMBT_EXTLEN
 
@@ -736,19 +737,6 @@ nametable_add(xfs_dahash_t hash, int namelen, unsigned char *name)
 	return ent;
 }
 
-#define is_invalid_char(c)	((c) == '/' || (c) == '\0')
-#define rol32(x,y)		(((x) << (y)) | ((x) >> (32 - (y))))
-
-static inline unsigned char
-random_filename_char(void)
-{
-	static unsigned char filename_alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
-						"abcdefghijklmnopqrstuvwxyz"
-						"0123456789-_";
-
-	return filename_alphabet[random() % (sizeof filename_alphabet - 1)];
-}
-
 #define	ORPHANAGE	"lost+found"
 #define	ORPHANAGE_LEN	(sizeof (ORPHANAGE) - 1)
 
@@ -808,375 +796,6 @@ in_lost_found(
 	return slen == namelen && !memcmp(name, s, namelen);
 }
 
-/*
- * Given a name and its hash value, massage the name in such a way
- * that the result is another name of equal length which shares the
- * same hash value.
- */
-static void
-obfuscate_name(
-	xfs_dahash_t	hash,
-	size_t		name_len,
-	unsigned char	*name,
-	bool		is_dirent)
-{
-	unsigned char	*oldname = NULL;
-	unsigned char	*newp;
-	int		i;
-	xfs_dahash_t	new_hash;
-	unsigned char	*first;
-	unsigned char	high_bit;
-	int		tries = 0;
-	bool		is_ci_name = is_dirent && xfs_has_asciici(mp);
-	int		shift;
-
-	/*
-	 * Our obfuscation algorithm requires at least 5-character
-	 * names, so don't bother if the name is too short.  We
-	 * work backward from a hash value to determine the last
-	 * five bytes in a name required to produce a new name
-	 * with the same hash.
-	 */
-	if (name_len < 5)
-		return;
-
-	if (is_ci_name) {
-		oldname = alloca(name_len);
-		memcpy(oldname, name, name_len);
-	}
-
-again:
-	newp = name;
-	new_hash = 0;
-
-	/*
-	 * If we cannot generate a ci-compatible obfuscated name after 1000
-	 * tries, don't bother obfuscating the name.
-	 */
-	if (tries++ > 1000) {
-		memcpy(name, oldname, name_len);
-		return;
-	}
-
-	/*
-	 * The beginning of the obfuscated name can be pretty much
-	 * anything, so fill it in with random characters.
-	 * Accumulate its new hash value as we go.
-	 */
-	for (i = 0; i < name_len - 5; i++) {
-		*newp = random_filename_char();
-		if (is_ci_name)
-			new_hash = xfs_ascii_ci_xfrm(*newp) ^
-							rol32(new_hash, 7);
-		else
-			new_hash = *newp ^ rol32(new_hash, 7);
-		newp++;
-	}
-
-	/*
-	 * Compute which five bytes need to be used at the end of
-	 * the name so the hash of the obfuscated name is the same
-	 * as the hash of the original.  If any result in an invalid
-	 * character, flip a bit and arrange for a corresponding bit
-	 * in a neighboring byte to be flipped as well.  For the
-	 * last byte, the "neighbor" to change is the first byte
-	 * we're computing here.
-	 */
-	new_hash = rol32(new_hash, 3) ^ hash;
-
-	first = newp;
-	high_bit = 0;
-	for (shift = 28; shift >= 0; shift -= 7) {
-		*newp = (new_hash >> shift & 0x7f) ^ high_bit;
-		if (is_invalid_char(*newp)) {
-			*newp ^= 1;
-			high_bit = 0x80;
-		} else
-			high_bit = 0;
-
-		/*
-		 * If ascii-ci is enabled, uppercase characters are converted
-		 * to lowercase characters while computing the name hash.  If
-		 * any of the necessary correction bytes are uppercase, the
-		 * hash of the new name will not match.  Try again with a
-		 * different prefix.
-		 */
-		if (is_ci_name && xfs_ascii_ci_need_xfrm(*newp))
-			goto again;
-
-		ASSERT(!is_invalid_char(*newp));
-		newp++;
-	}
-
-	/*
-	 * If we flipped a bit on the last byte, we need to fix up
-	 * the matching bit in the first byte.  The result will
-	 * be a valid character, because we know that first byte
-	 * has 0's in its upper four bits (it was produced by a
-	 * 28-bit right-shift of a 32-bit unsigned value).
-	 */
-	if (high_bit) {
-		*first ^= 0x10;
-
-		if (is_ci_name && xfs_ascii_ci_need_xfrm(*first))
-			goto again;
-
-		ASSERT(!is_invalid_char(*first));
-	}
-}
-
-/*
- * Flip a bit in each of two bytes at the end of the given name.
- * This is used in generating a series of alternate names to be used
- * in the event a duplicate is found.
- *
- * The bits flipped are selected such that they both affect the same
- * bit in the name's computed hash value, so flipping them both will
- * preserve the hash.
- *
- * The following diagram aims to show the portion of a computed
- * hash that a given byte of a name affects.
- *
- *	   31    28      24    21	     14		  8 7       3     0
- *	   +-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+
- * hash:   | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
- *	   +-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+
- *	  last-4 ->|	       |<-- last-2 --->|	   |<--- last ---->|
- *		 |<-- last-3 --->|	     |<-- last-1 --->|     |<- last-4
- *			 |<-- last-7 --->|	     |<-- last-5 --->|
- *	   |<-- last-8 --->|	       |<-- last-6 --->|
- *			. . . and so on
- *
- * The last byte of the name directly affects the low-order byte of
- * the hash.  The next-to-last affects bits 7-14, the next one back
- * affects bits 14-21, and so on.  The effect wraps around when it
- * goes beyond the top of the hash (as happens for byte last-4).
- *
- * Bits that are flipped together "overlap" on the hash value.  As
- * an example of overlap, the last two bytes both affect bit 7 in
- * the hash.  That pair of bytes (and their overlapping bits) can be
- * used for this "flip bit" operation (it's the first pair tried,
- * actually).
- *
- * A table defines overlapping pairs--the bytes involved and bits
- * within them--that can be used this way.  The byte offset is
- * relative to a starting point within the name, which will be set
- * to affect the bytes at the end of the name.  The function is
- * called with a "bitseq" value which indicates which bit flip is
- * desired, and this translates directly into selecting which entry
- * in the bit_to_flip[] table to apply.
- *
- * The function returns 1 if the operation was successful.  It
- * returns 0 if the result produced a character that's not valid in
- * a name (either '/' or a '\0').  Finally, it returns -1 if the bit
- * sequence number is beyond what is supported for a name of this
- * length.
- *
- * Discussion
- * ----------
- * (Also see the discussion above find_alternate(), below.)
- *
- * In order to make this function work for any length name, the
- * table is ordered by increasing byte offset, so that the earliest
- * entries can apply to the shortest strings.  This way all names
- * are done consistently.
- *
- * When bit flips occur, they can convert printable characters
- * into non-printable ones.  In an effort to reduce the impact of
- * this, the first bit flips are chosen to affect bytes the end of
- * the name (and furthermore, toward the low bits of a byte).  Those
- * bytes are often non-printable anyway because of the way they are
- * initially selected by obfuscate_name()).  This is accomplished,
- * using later table entries first.
- *
- * Each row in the table doubles the number of alternates that
- * can be generated.  A two-byte name is limited to using only
- * the first row, so it's possible to generate two alternates
- * (the original name, plus the alternate produced by flipping
- * the one pair of bits).  In a 5-byte name, the effect of the
- * first byte overlaps the last by 4 its, and there are 8 bits
- * to flip, allowing for 256 possible alternates.
- *
- * Short names (less than 5 bytes) are never even obfuscated, so for
- * such names the relatively small number of alternates should never
- * really be a problem.
- *
- * Long names (more than 6 bytes, say) are not likely to exhaust
- * the number of available alternates.  In fact, the table could
- * probably have stopped at 8 entries, on the assumption that 256
- * alternates should be enough for most any situation.  The entries
- * beyond those are present mostly for demonstration of how it could
- * be populated with more entries, should it ever be necessary to do
- * so.
- */
-static int
-flip_bit(
-	size_t		name_len,
-	unsigned char	*name,
-	uint32_t	bitseq)
-{
-	int	index;
-	size_t	offset;
-	unsigned char *p0, *p1;
-	unsigned char m0, m1;
-	struct {
-	    int		byte;	/* Offset from start within name */
-	    unsigned char bit;	/* Bit within that byte */
-	} bit_to_flip[][2] = {	/* Sorted by second entry's byte */
-	    { { 0, 0 }, { 1, 7 } },	/* Each row defines a pair */
-	    { { 1, 0 }, { 2, 7 } },	/* of bytes and a bit within */
-	    { { 2, 0 }, { 3, 7 } },	/* each byte.  Each bit in */
-	    { { 0, 4 }, { 4, 0 } },	/* a pair affects the same */
-	    { { 0, 5 }, { 4, 1 } },	/* bit in the hash, so flipping */
-	    { { 0, 6 }, { 4, 2 } },	/* both will change the name */
-	    { { 0, 7 }, { 4, 3 } },	/* while preserving the hash. */
-	    { { 3, 0 }, { 4, 7 } },
-	    { { 0, 0 }, { 5, 3 } },	/* The first entry's byte offset */
-	    { { 0, 1 }, { 5, 4 } },	/* must be less than the second. */
-	    { { 0, 2 }, { 5, 5 } },
-	    { { 0, 3 }, { 5, 6 } },	/* The table can be extended to */
-	    { { 0, 4 }, { 5, 7 } },	/* an arbitrary number of entries */
-	    { { 4, 0 }, { 5, 7 } },	/* but there's not much point. */
-		/* . . . */
-	};
-
-	/* Find the first entry *not* usable for name of this length */
-
-	for (index = 0; index < ARRAY_SIZE(bit_to_flip); index++)
-		if (bit_to_flip[index][1].byte >= name_len)
-			break;
-
-	/*
-	 * Back up to the last usable entry.  If that number is
-	 * smaller than the bit sequence number, inform the caller
-	 * that nothing this large (or larger) will work.
-	 */
-	if (bitseq > --index)
-		return -1;
-
-	/*
-	 * We will be switching bits at the end of name, with a
-	 * preference for affecting the last bytes first.  Compute
-	 * where in the name we'll start applying the changes.
-	 */
-	offset = name_len - (bit_to_flip[index][1].byte + 1);
-	index -= bitseq;	/* Use later table entries first */
-
-	p0 = name + offset + bit_to_flip[index][0].byte;
-	p1 = name + offset + bit_to_flip[index][1].byte;
-	m0 = 1 << bit_to_flip[index][0].bit;
-	m1 = 1 << bit_to_flip[index][1].bit;
-
-	/* Only change the bytes if it produces valid characters */
-
-	if (is_invalid_char(*p0 ^ m0) || is_invalid_char(*p1 ^ m1))
-		return 0;
-
-	*p0 ^= m0;
-	*p1 ^= m1;
-
-	return 1;
-}
-
-/*
- * This function generates a well-defined sequence of "alternate"
- * names for a given name.  An alternate is a name having the same
- * length and same hash value as the original name.  This is needed
- * because the algorithm produces only one obfuscated name to use
- * for a given original name, and it's possible that result matches
- * a name already seen.  This function checks for this, and if it
- * occurs, finds another suitable obfuscated name to use.
- *
- * Each bit in the binary representation of the sequence number is
- * used to select one possible "bit flip" operation to perform on
- * the name.  So for example:
- *    seq = 0:	selects no bits to flip
- *    seq = 1:	selects the 0th bit to flip
- *    seq = 2:	selects the 1st bit to flip
- *    seq = 3:	selects the 0th and 1st bit to flip
- *    ... and so on.
- *
- * The flip_bit() function takes care of the details of the bit
- * flipping within the name.  Note that the "1st bit" in this
- * context is a bit sequence number; i.e. it doesn't necessarily
- * mean bit 0x02 will be changed.
- *
- * If a valid name (one that contains no '/' or '\0' characters) is
- * produced by this process for the given sequence number, this
- * function returns 1.  If the result is not valid, it returns 0.
- * Returns -1 if the sequence number is beyond the the maximum for
- * names of the given length.
- *
- *
- * Discussion
- * ----------
- * The number of alternates available for a given name is dependent
- * on its length.  A "bit flip" involves inverting two bits in
- * a name--the two bits being selected such that their values
- * affect the name's hash value in the same way.  Alternates are
- * thus generated by inverting the value of pairs of such
- * "overlapping" bits in the original name.  Each byte after the
- * first in a name adds at least one bit of overlap to work with.
- * (See comments above flip_bit() for more discussion on this.)
- *
- * So the number of alternates is dependent on the number of such
- * overlapping bits in a name.  If there are N bit overlaps, there
- * 2^N alternates for that hash value.
- *
- * Here are the number of overlapping bits available for generating
- * alternates for names of specific lengths:
- *	1	0	(must have 2 bytes to have any overlap)
- *	2	1	One bit overlaps--so 2 possible alternates
- *	3	2	Two bits overlap--so 4 possible alternates
- *	4	4	Three bits overlap, so 2^3 alternates
- *	5	8	8 bits overlap (due to wrapping), 256 alternates
- *	6	18	2^18 alternates
- *	7	28	2^28 alternates
- *	   ...
- * It's clear that the number of alternates grows very quickly with
- * the length of the name.  But note that the set of alternates
- * includes invalid names.  And for certain (contrived) names, the
- * number of valid names is a fairly small fraction of the total
- * number of alternates.
- *
- * The main driver for this infrastructure for coming up with
- * alternate names is really related to names 5 (or possibly 6)
- * bytes in length.  5-byte obfuscated names contain no randomly-
- * generated bytes in them, and the chance of an obfuscated name
- * matching an already-seen name is too high to just ignore.  This
- * methodical selection of alternates ensures we don't produce
- * duplicate names unless we have exhausted our options.
- */
-static int
-find_alternate(
-	size_t		name_len,
-	unsigned char	*name,
-	uint32_t	seq)
-{
-	uint32_t	bitseq = 0;
-	uint32_t	bits = seq;
-
-	if (!seq)
-		return 1;	/* alternate 0 is the original name */
-	if (name_len < 2)	/* Must have 2 bytes to flip */
-		return -1;
-
-	for (bitseq = 0; bits; bitseq++) {
-		uint32_t	mask = 1 << bitseq;
-		int		fb;
-
-		if (!(bits & mask))
-			continue;
-
-		fb = flip_bit(name_len, name, bitseq);
-		if (fb < 1)
-			return fb ? -1 : 0;
-		bits ^= mask;
-	}
-
-	return 1;
-}
-
 /*
  * Look up the given name in the name table.  If it is already
  * present, iterate through a well-defined sequence of alternate
diff --git a/db/obfuscate.c b/db/obfuscate.c
new file mode 100644
index 00000000000..249f22b52ce
--- /dev/null
+++ b/db/obfuscate.c
@@ -0,0 +1,389 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2007, 2011 SGI
+ * All Rights Reserved.
+ */
+#include "libxfs.h"
+#include "init.h"
+#include "obfuscate.h"
+
+static inline unsigned char
+random_filename_char(void)
+{
+	static unsigned char filename_alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+						"abcdefghijklmnopqrstuvwxyz"
+						"0123456789-_";
+
+	return filename_alphabet[random() % (sizeof filename_alphabet - 1)];
+}
+
+#define rol32(x,y)		(((x) << (y)) | ((x) >> (32 - (y))))
+
+/*
+ * Given a name and its hash value, massage the name in such a way
+ * that the result is another name of equal length which shares the
+ * same hash value.
+ */
+void
+obfuscate_name(
+	xfs_dahash_t	hash,
+	size_t		name_len,
+	unsigned char	*name,
+	bool		is_dirent)
+{
+	unsigned char	*oldname = NULL;
+	unsigned char	*newp;
+	int		i;
+	xfs_dahash_t	new_hash;
+	unsigned char	*first;
+	unsigned char	high_bit;
+	int		tries = 0;
+	bool		is_ci_name = is_dirent && xfs_has_asciici(mp);
+	int		shift;
+
+	/*
+	 * Our obfuscation algorithm requires at least 5-character
+	 * names, so don't bother if the name is too short.  We
+	 * work backward from a hash value to determine the last
+	 * five bytes in a name required to produce a new name
+	 * with the same hash.
+	 */
+	if (name_len < 5)
+		return;
+
+	if (is_ci_name) {
+		oldname = alloca(name_len);
+		memcpy(oldname, name, name_len);
+	}
+
+again:
+	newp = name;
+	new_hash = 0;
+
+	/*
+	 * If we cannot generate a ci-compatible obfuscated name after 1000
+	 * tries, don't bother obfuscating the name.
+	 */
+	if (tries++ > 1000) {
+		memcpy(name, oldname, name_len);
+		return;
+	}
+
+	/*
+	 * The beginning of the obfuscated name can be pretty much
+	 * anything, so fill it in with random characters.
+	 * Accumulate its new hash value as we go.
+	 */
+	for (i = 0; i < name_len - 5; i++) {
+		*newp = random_filename_char();
+		if (is_ci_name)
+			new_hash = xfs_ascii_ci_xfrm(*newp) ^
+							rol32(new_hash, 7);
+		else
+			new_hash = *newp ^ rol32(new_hash, 7);
+		newp++;
+	}
+
+	/*
+	 * Compute which five bytes need to be used at the end of
+	 * the name so the hash of the obfuscated name is the same
+	 * as the hash of the original.  If any result in an invalid
+	 * character, flip a bit and arrange for a corresponding bit
+	 * in a neighboring byte to be flipped as well.  For the
+	 * last byte, the "neighbor" to change is the first byte
+	 * we're computing here.
+	 */
+	new_hash = rol32(new_hash, 3) ^ hash;
+
+	first = newp;
+	high_bit = 0;
+	for (shift = 28; shift >= 0; shift -= 7) {
+		*newp = (new_hash >> shift & 0x7f) ^ high_bit;
+		if (is_invalid_char(*newp)) {
+			*newp ^= 1;
+			high_bit = 0x80;
+		} else
+			high_bit = 0;
+
+		/*
+		 * If ascii-ci is enabled, uppercase characters are converted
+		 * to lowercase characters while computing the name hash.  If
+		 * any of the necessary correction bytes are uppercase, the
+		 * hash of the new name will not match.  Try again with a
+		 * different prefix.
+		 */
+		if (is_ci_name && xfs_ascii_ci_need_xfrm(*newp))
+			goto again;
+
+		ASSERT(!is_invalid_char(*newp));
+		newp++;
+	}
+
+	/*
+	 * If we flipped a bit on the last byte, we need to fix up
+	 * the matching bit in the first byte.  The result will
+	 * be a valid character, because we know that first byte
+	 * has 0's in its upper four bits (it was produced by a
+	 * 28-bit right-shift of a 32-bit unsigned value).
+	 */
+	if (high_bit) {
+		*first ^= 0x10;
+
+		if (is_ci_name && xfs_ascii_ci_need_xfrm(*first))
+			goto again;
+
+		ASSERT(!is_invalid_char(*first));
+	}
+}
+
+/*
+ * Flip a bit in each of two bytes at the end of the given name.
+ * This is used in generating a series of alternate names to be used
+ * in the event a duplicate is found.
+ *
+ * The bits flipped are selected such that they both affect the same
+ * bit in the name's computed hash value, so flipping them both will
+ * preserve the hash.
+ *
+ * The following diagram aims to show the portion of a computed
+ * hash that a given byte of a name affects.
+ *
+ *	   31    28      24    21	     14		  8 7       3     0
+ *	   +-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+
+ * hash:   | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+ *	   +-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+
+ *	  last-4 ->|	       |<-- last-2 --->|	   |<--- last ---->|
+ *		 |<-- last-3 --->|	     |<-- last-1 --->|     |<- last-4
+ *			 |<-- last-7 --->|	     |<-- last-5 --->|
+ *	   |<-- last-8 --->|	       |<-- last-6 --->|
+ *			. . . and so on
+ *
+ * The last byte of the name directly affects the low-order byte of
+ * the hash.  The next-to-last affects bits 7-14, the next one back
+ * affects bits 14-21, and so on.  The effect wraps around when it
+ * goes beyond the top of the hash (as happens for byte last-4).
+ *
+ * Bits that are flipped together "overlap" on the hash value.  As
+ * an example of overlap, the last two bytes both affect bit 7 in
+ * the hash.  That pair of bytes (and their overlapping bits) can be
+ * used for this "flip bit" operation (it's the first pair tried,
+ * actually).
+ *
+ * A table defines overlapping pairs--the bytes involved and bits
+ * within them--that can be used this way.  The byte offset is
+ * relative to a starting point within the name, which will be set
+ * to affect the bytes at the end of the name.  The function is
+ * called with a "bitseq" value which indicates which bit flip is
+ * desired, and this translates directly into selecting which entry
+ * in the bit_to_flip[] table to apply.
+ *
+ * The function returns 1 if the operation was successful.  It
+ * returns 0 if the result produced a character that's not valid in
+ * a name (either '/' or a '\0').  Finally, it returns -1 if the bit
+ * sequence number is beyond what is supported for a name of this
+ * length.
+ *
+ * Discussion
+ * ----------
+ * (Also see the discussion above find_alternate(), below.)
+ *
+ * In order to make this function work for any length name, the
+ * table is ordered by increasing byte offset, so that the earliest
+ * entries can apply to the shortest strings.  This way all names
+ * are done consistently.
+ *
+ * When bit flips occur, they can convert printable characters
+ * into non-printable ones.  In an effort to reduce the impact of
+ * this, the first bit flips are chosen to affect bytes at the end of
+ * the name (and furthermore, toward the low bits of a byte).  Those
+ * bytes are often non-printable anyway because of the way they are
+ * initially selected by obfuscate_name().  This is accomplished by
+ * using later table entries first.
+ *
+ * Each row in the table doubles the number of alternates that
+ * can be generated.  A two-byte name is limited to using only
+ * the first row, so it's possible to generate two alternates
+ * (the original name, plus the alternate produced by flipping
+ * the one pair of bits).  In a 5-byte name, the effect of the
+ * first byte overlaps the last by 4 bits, and there are 8 bits
+ * to flip, allowing for 256 possible alternates.
+ *
+ * Short names (less than 5 bytes) are never even obfuscated, so for
+ * such names the relatively small number of alternates should never
+ * really be a problem.
+ *
+ * Long names (more than 6 bytes, say) are not likely to exhaust
+ * the number of available alternates.  In fact, the table could
+ * probably have stopped at 8 entries, on the assumption that 256
+ * alternates should be enough for most any situation.  The entries
+ * beyond those are present mostly for demonstration of how it could
+ * be populated with more entries, should it ever be necessary to do
+ * so.
+ */
+static int
+flip_bit(
+	size_t		name_len,
+	unsigned char	*name,
+	uint32_t	bitseq)
+{
+	int	index;
+	size_t	offset;
+	unsigned char *p0, *p1;
+	unsigned char m0, m1;
+	struct {
+	    int		byte;	/* Offset from start within name */
+	    unsigned char bit;	/* Bit within that byte */
+	} bit_to_flip[][2] = {	/* Sorted by second entry's byte */
+	    { { 0, 0 }, { 1, 7 } },	/* Each row defines a pair */
+	    { { 1, 0 }, { 2, 7 } },	/* of bytes and a bit within */
+	    { { 2, 0 }, { 3, 7 } },	/* each byte.  Each bit in */
+	    { { 0, 4 }, { 4, 0 } },	/* a pair affects the same */
+	    { { 0, 5 }, { 4, 1 } },	/* bit in the hash, so flipping */
+	    { { 0, 6 }, { 4, 2 } },	/* both will change the name */
+	    { { 0, 7 }, { 4, 3 } },	/* while preserving the hash. */
+	    { { 3, 0 }, { 4, 7 } },
+	    { { 0, 0 }, { 5, 3 } },	/* The first entry's byte offset */
+	    { { 0, 1 }, { 5, 4 } },	/* must be less than the second. */
+	    { { 0, 2 }, { 5, 5 } },
+	    { { 0, 3 }, { 5, 6 } },	/* The table can be extended to */
+	    { { 0, 4 }, { 5, 7 } },	/* an arbitrary number of entries */
+	    { { 4, 0 }, { 5, 7 } },	/* but there's not much point. */
+		/* . . . */
+	};
+
+	/* Find the first entry *not* usable for name of this length */
+
+	for (index = 0; index < ARRAY_SIZE(bit_to_flip); index++)
+		if (bit_to_flip[index][1].byte >= name_len)
+			break;
+
+	/*
+	 * Back up to the last usable entry.  If that number is
+	 * smaller than the bit sequence number, inform the caller
+	 * that nothing this large (or larger) will work.
+	 */
+	if (bitseq > --index)
+		return -1;
+
+	/*
+	 * We will be switching bits at the end of name, with a
+	 * preference for affecting the last bytes first.  Compute
+	 * where in the name we'll start applying the changes.
+	 */
+	offset = name_len - (bit_to_flip[index][1].byte + 1);
+	index -= bitseq;	/* Use later table entries first */
+
+	p0 = name + offset + bit_to_flip[index][0].byte;
+	p1 = name + offset + bit_to_flip[index][1].byte;
+	m0 = 1 << bit_to_flip[index][0].bit;
+	m1 = 1 << bit_to_flip[index][1].bit;
+
+	/* Only change the bytes if it produces valid characters */
+
+	if (is_invalid_char(*p0 ^ m0) || is_invalid_char(*p1 ^ m1))
+		return 0;
+
+	*p0 ^= m0;
+	*p1 ^= m1;
+
+	return 1;
+}
+
+/*
+ * This function generates a well-defined sequence of "alternate"
+ * names for a given name.  An alternate is a name having the same
+ * length and same hash value as the original name.  This is needed
+ * because the algorithm produces only one obfuscated name to use
+ * for a given original name, and it's possible that result matches
+ * a name already seen.  This function checks for this, and if it
+ * occurs, finds another suitable obfuscated name to use.
+ *
+ * Each bit in the binary representation of the sequence number is
+ * used to select one possible "bit flip" operation to perform on
+ * the name.  So for example:
+ *    seq = 0:	selects no bits to flip
+ *    seq = 1:	selects the 0th bit to flip
+ *    seq = 2:	selects the 1st bit to flip
+ *    seq = 3:	selects the 0th and 1st bit to flip
+ *    ... and so on.
+ *
+ * The flip_bit() function takes care of the details of the bit
+ * flipping within the name.  Note that the "1st bit" in this
+ * context is a bit sequence number; i.e. it doesn't necessarily
+ * mean bit 0x02 will be changed.
+ *
+ * If a valid name (one that contains no '/' or '\0' characters) is
+ * produced by this process for the given sequence number, this
+ * function returns 1.  If the result is not valid, it returns 0.
+ * Returns -1 if the sequence number is beyond the maximum for
+ * names of the given length.
+ *
+ *
+ * Discussion
+ * ----------
+ * The number of alternates available for a given name is dependent
+ * on its length.  A "bit flip" involves inverting two bits in
+ * a name--the two bits being selected such that their values
+ * affect the name's hash value in the same way.  Alternates are
+ * thus generated by inverting the value of pairs of such
+ * "overlapping" bits in the original name.  Each byte after the
+ * first in a name adds at least one bit of overlap to work with.
+ * (See comments above flip_bit() for more discussion on this.)
+ *
+ * So the number of alternates is dependent on the number of such
+ * overlapping bits in a name.  If there are N bit overlaps, there
+ * are 2^N alternates for that hash value.
+ *
+ * Here is the number of overlapping bits (second column) available
+ * for generating alternates for names of each length (first column):
+ *	1	0	(must have 2 bytes to have any overlap)
+ *	2	1	One bit overlaps--so 2 possible alternates
+ *	3	2	Two bits overlap--so 4 possible alternates
+ *	4	3	Three bits overlap, so 2^3 alternates
+ *	5	8	8 bits overlap (due to wrapping), 256 alternates
+ *	6	18	2^18 alternates
+ *	7	28	2^28 alternates
+ *	   ...
+ * It's clear that the number of alternates grows very quickly with
+ * the length of the name.  But note that the set of alternates
+ * includes invalid names.  And for certain (contrived) names, the
+ * number of valid names is a fairly small fraction of the total
+ * number of alternates.
+ *
+ * The main driver for this infrastructure for coming up with
+ * alternate names is really related to names 5 (or possibly 6)
+ * bytes in length.  5-byte obfuscated names contain no randomly-
+ * generated bytes in them, and the chance of an obfuscated name
+ * matching an already-seen name is too high to just ignore.  This
+ * methodical selection of alternates ensures we don't produce
+ * duplicate names unless we have exhausted our options.
+ */
+int
+find_alternate(
+	size_t		name_len,
+	unsigned char	*name,
+	uint32_t	seq)
+{
+	uint32_t	bitseq = 0;
+	uint32_t	bits = seq;
+
+	if (!seq)
+		return 1;	/* alternate 0 is the original name */
+	if (name_len < 2)	/* Must have 2 bytes to flip */
+		return -1;
+
+	for (bitseq = 0; bits; bitseq++) {
+		uint32_t	mask = 1 << bitseq;
+		int		fb;
+
+		if (!(bits & mask))
+			continue;
+
+		fb = flip_bit(name_len, name, bitseq);
+		if (fb < 1)
+			return fb ? -1 : 0;
+		bits ^= mask;
+	}
+
+	return 1;
+}
diff --git a/db/obfuscate.h b/db/obfuscate.h
new file mode 100644
index 00000000000..afaaca37154
--- /dev/null
+++ b/db/obfuscate.h
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2007, 2011 SGI
+ * All Rights Reserved.
+ */
+#ifndef __DB_OBFUSCATE_H__
+#define __DB_OBFUSCATE_H__
+
+/* Routines to obfuscate directory filenames and xattr names. */
+
+#define is_invalid_char(c)	((c) == '/' || (c) == '\0')
+
+void obfuscate_name(xfs_dahash_t hash, size_t name_len, unsigned char *name,
+		bool is_dirent);
+int find_alternate(size_t name_len, unsigned char *name, uint32_t seq);
+
+#endif /* __DB_OBFUSCATE_H__ */
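
The comment in obfuscate_name() describes working backward from the hash to choose the last five bytes of the new name. A minimal standalone sketch of that step follows, assuming the byte-at-a-time hash form `h = c ^ rol32(h, 7)` that obfuscate_name() accumulates for non-ascii-ci names, and a fixed prefix of 'x' bytes in place of random_filename_char(). The helpers dahash_bytes(), invalid_char() and solve_suffix() are hypothetical, not xfsprogs functions, and the ascii-ci retry loop is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left by y bits (1 <= y <= 31). */
static uint32_t rol32(uint32_t x, unsigned int y)
{
	return (x << y) | (x >> (32 - y));
}

/* Byte-at-a-time form of the name hash accumulated in obfuscate_name(). */
static uint32_t dahash_bytes(const unsigned char *name, size_t len)
{
	uint32_t h = 0;

	for (size_t i = 0; i < len; i++)
		h = name[i] ^ rol32(h, 7);
	return h;
}

/* Same test as is_invalid_char() in obfuscate.h. */
static int invalid_char(unsigned char c)
{
	return c == '/' || c == '\0';
}

/*
 * Overwrite every byte of name[0..len) so the result hashes to 'hash'.
 * Assumes len >= 5 and no ascii-ci case folding.  The prefix is a fixed
 * (hypothetical) pattern where obfuscate_name() uses random characters.
 */
static void solve_suffix(uint32_t hash, size_t len, unsigned char *name)
{
	unsigned char *p = name, *first, high_bit = 0;
	uint32_t h = 0;

	assert(len >= 5);
	for (size_t i = 0; i + 5 < len; i++) {
		*p = 'x';
		h = *p++ ^ rol32(h, 7);
	}

	/* Five more bytes rotate the prefix hash by 35 == 3 (mod 32). */
	h = rol32(h, 3) ^ hash;

	/* Each suffix byte supplies one 7-bit slice: 28-31, 21-27, ... 0-6. */
	first = p;
	for (int shift = 28; shift >= 0; shift -= 7) {
		*p = ((h >> shift) & 0x7f) ^ high_bit;
		if (invalid_char(*p)) {
			/* Bit 0 here and bit 7 of the next byte hit the same hash bit. */
			*p ^= 1;
			high_bit = 0x80;
		} else {
			high_bit = 0;
		}
		p++;
	}

	/* Bit 0 of the last byte overlaps bit 4 of the first suffix byte. */
	if (high_bit)
		*first ^= 0x10;
}
```

Because the five suffix bytes cover disjoint 7-bit slices of the rotated target, they always XOR back to it; the bit 0 / bit 7 correction only moves one bit between two bytes that land on the same hash bit, so the hash is unchanged.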

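To check the length-5 row of the alternates table in the find_alternate() comment (8 overlapping bits, 256 alternates), the sketch below applies the first eight rows of bit_to_flip[] to a 5-byte name, for which flip_bit() computes an offset of 0. It assumes the same byte-at-a-time hash form as above; apply_flips() is a hypothetical helper that maps sequence bits to rows in forward order (flip_bit() uses later rows first, but the set of 256 combinations is the same) and skips the invalid-character check, so it enumerates every alternate, including ones a real name could not use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rol32(uint32_t x, unsigned int y)
{
	return (x << y) | (x >> (32 - y));
}

/* Byte-at-a-time form of the XFS name hash. */
static uint32_t dahash_bytes(const unsigned char *name, size_t len)
{
	uint32_t h = 0;

	for (size_t i = 0; i < len; i++)
		h = name[i] ^ rol32(h, 7);
	return h;
}

/* First eight rows of bit_to_flip[] from flip_bit(): {byte, bit} pairs. */
static const struct { int byte; int bit; } pairs[8][2] = {
	{ { 0, 0 }, { 1, 7 } }, { { 1, 0 }, { 2, 7 } },
	{ { 2, 0 }, { 3, 7 } }, { { 0, 4 }, { 4, 0 } },
	{ { 0, 5 }, { 4, 1 } }, { { 0, 6 }, { 4, 2 } },
	{ { 0, 7 }, { 4, 3 } }, { { 3, 0 }, { 4, 7 } },
};

/*
 * Flip both bits of every row whose bit is set in seq (0..255).  Both bits
 * of a row land on the same hash bit, so the pair cancels out in the hash.
 */
static void apply_flips(unsigned char name[5], unsigned int seq)
{
	for (int row = 0; row < 8; row++) {
		if (!(seq & (1u << row)))
			continue;
		name[pairs[row][0].byte] ^= (unsigned char)(1u << pairs[row][0].bit);
		name[pairs[row][1].byte] ^= (unsigned char)(1u << pairs[row][1].bit);
	}
}
```

All sixteen {byte, bit} positions in these rows are distinct, so each nonzero sequence number yields a name different from the original while the hash stays fixed.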

* [RFC PATCH 8/6] xfs_db: create dirents and xattrs with colliding names
  2023-04-06  0:09 ` [PATCHSET v2 0/6] xfsprogs: fix ascii-ci problems, then kill it Darrick J. Wong
                     ` (6 preceding siblings ...)
  2023-04-13 15:19   ` [RFC PATCH 7/6] xfs_db: hoist name obfuscation code out of metadump.c Darrick J. Wong
@ 2023-04-13 15:20   ` Darrick J. Wong
  7 siblings, 0 replies; 26+ messages in thread
From: Darrick J. Wong @ 2023-04-13 15:20 UTC (permalink / raw)
  To: cem; +Cc: linux-xfs, david, Christoph Hellwig

From: Darrick J. Wong <djwong@kernel.org>

Create a new debugger command that will create dirent and xattr names
that induce dahash collisions.  This is the driver program that xfs/861
uses to reproduce dabtree node block checking errors.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/hash.c         |  376 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 man/man8/xfs_db.8 |   31 ++++
 2 files changed, 407 insertions(+)
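
hashcoll prints NUL-separated names for use with xargs -0. As a hedged sketch of how a consumer might sanity-check that output, the function below walks such a buffer and confirms that every name has the same hash. It assumes dirent names on a filesystem without ascii-ci (so the dirent hash reduces to the plain byte hash) and the byte-at-a-time hash form; names_collide() is hypothetical. xattr output would need the "user." prefix skipped first, because collide_xattrs() hashes only the part after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rol32(uint32_t x, unsigned int y)
{
	return (x << y) | (x >> (32 - y));
}

/* Byte-at-a-time form of the XFS name hash (no ascii-ci folding). */
static uint32_t dahash_bytes(const unsigned char *name, size_t len)
{
	uint32_t h = 0;

	for (size_t i = 0; i < len; i++)
		h = name[i] ^ rol32(h, 7);
	return h;
}

/*
 * Walk a buffer of NUL-terminated names, as hashcoll prints them, and
 * report whether every name hashes to the same value.  Returns false for
 * an empty buffer or one whose last name lacks its terminating NUL.
 */
static bool names_collide(const unsigned char *buf, size_t len)
{
	size_t start = 0;
	bool have_first = false;
	uint32_t first = 0;

	for (size_t i = 0; i < len; i++) {
		if (buf[i] != '\0')
			continue;

		uint32_t h = dahash_bytes(buf + start, i - start);

		if (!have_first) {
			first = h;
			have_first = true;
		} else if (h != first) {
			return false;
		}
		start = i + 1;
	}
	return have_first && start == len;
}
```

For example, "abcde" and "qbcdd" differ only in the {0,4}/{4,0} pair from flip_bit()'s table, so a buffer holding both passes this check.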

diff --git a/db/hash.c b/db/hash.c
index 68c53e7f9bc..79a250526e9 100644
--- a/db/hash.c
+++ b/db/hash.c
@@ -5,12 +5,15 @@
  */
 
 #include "libxfs.h"
+#include "init.h"
 #include "addr.h"
 #include "command.h"
 #include "type.h"
 #include "io.h"
 #include "output.h"
 #include "hash.h"
+#include "obfuscate.h"
+#include <sys/xattr.h>
 
 static int hash_f(int argc, char **argv);
 static void hash_help(void);
@@ -46,8 +49,381 @@ hash_f(
 	return 0;
 }
 
+static void
+hashcoll_help(void)
+{
+	printf(_(
+"\n"
+" Generate obfuscated variants of the provided name.  Each variant will have\n"
+" the same dahash value.  Names are written to stdout with a NUL separating\n"
+" each name.\n"
+"\n"
+" -a -- create extended attributes.\n"
+" -i -- read standard input for the name, up to %d bytes.\n"
+" -n -- create this many names.\n"
+" -p -- create directory entries or extended attributes in this file.\n"
+" -s -- seed the rng with this value.\n"
+"\n"),
+			MAXNAMELEN - 1);
+}
+
+struct name_dup {
+	struct name_dup	*next;
+	uint32_t	crc;
+	uint8_t		namelen;
+	uint8_t		name[];
+};
+
+static inline size_t
+name_dup_sizeof(
+	unsigned int	namelen)
+{
+	return sizeof(struct name_dup) + namelen;
+}
+
+#define MAX_DUP_TABLE_BUCKETS	(1048575)
+
+struct dup_table {
+	unsigned int	nr_buckets;
+	struct name_dup	*buckets[];
+};
+
+static inline size_t
+dup_table_sizeof(
+	unsigned int	nr_buckets)
+{
+	return sizeof(struct dup_table) +
+				(nr_buckets * sizeof(struct name_dup *));
+}
+
+static int
+dup_table_alloc(
+	unsigned long		nr_names,
+	struct dup_table	**tabp)
+{
+	struct dup_table	*t;
+
+	*tabp = NULL;
+
+	if (nr_names == 1)
+		return 0;
+
+	nr_names = min(MAX_DUP_TABLE_BUCKETS, nr_names);
+	t = calloc(1, dup_table_sizeof(nr_names));
+	if (!t)
+		return ENOMEM;
+
+	t->nr_buckets = nr_names;
+	*tabp = t;
+	return 0;
+}
+
+static void
+dup_table_free(
+	struct dup_table	*tab)
+{
+	struct name_dup		*ent, *next;
+	unsigned int		i;
+
+	if (!tab)
+		return;
+
+	for (i = 0; i < tab->nr_buckets; i++) {
+		ent = tab->buckets[i];
+
+		while (ent) {
+			next = ent->next;
+			free(ent);
+			ent = next;
+		}
+	}
+	free(tab);
+}
+
+static struct name_dup *
+dup_table_find(
+	struct dup_table	*tab,
+	unsigned char		*name,
+	size_t			namelen)
+{
+	struct name_dup		*ent;
+	uint32_t		crc = crc32c(~0, name, namelen);
+
+	ent = tab->buckets[crc % tab->nr_buckets];
+	while (ent) {
+		if (ent->crc == crc &&
+		    ent->namelen == namelen &&
+		    !memcmp(ent->name, name, namelen))
+			return ent;
+
+		ent = ent->next;
+	}
+
+	return NULL;
+}
+
+static int
+dup_table_store(
+	struct dup_table	*tab,
+	unsigned char		*name,
+	size_t			namelen)
+{
+	struct name_dup		*dup;
+	uint32_t		seq = 1;
+
+	ASSERT(namelen < MAXNAMELEN);
+
+	while ((dup = dup_table_find(tab, name, namelen)) != NULL) {
+		int		ret;
+
+		do {
+			ret = find_alternate(namelen, name, seq++);
+		} while (ret == 0);
+		if (ret < 0)
+			return EEXIST;
+	}
+
+	dup = malloc(name_dup_sizeof(namelen));
+	if (!dup)
+		return ENOMEM;
+
+	dup->crc = crc32c(~0, name, namelen);
+	dup->namelen = namelen;
+	memcpy(dup->name, name, namelen);
+	dup->next = tab->buckets[dup->crc % tab->nr_buckets];
+
+	tab->buckets[dup->crc % tab->nr_buckets] = dup;
+	return 0;
+}
+
+static int
+collide_dirents(
+	unsigned long		nr,
+	const unsigned char	*name,
+	size_t			namelen,
+	int			fd)
+{
+	struct xfs_name		dname = {
+		.name		= name,
+		.len		= namelen,
+	};
+	unsigned char		direntname[MAXNAMELEN + 1];
+	struct dup_table	*tab = NULL;
+	xfs_dahash_t		old_hash;
+	unsigned long		i;
+	int			error = 0;
+
+	old_hash = libxfs_dir2_hashname(mp, &dname);
+
+	if (fd >= 0) {
+		int		newfd;
+
+		/*
+		 * User passed in a fd, so we'll use the directory to detect
+		 * duplicate names.  First create the name that we are passed
+		 * in; the new names will be hardlinks to the first file.
+		 */
+		newfd = openat(fd, name, O_CREAT, 0600);
+		if (newfd < 0)
+			return errno;
+		close(newfd);
+	} else if (nr > 1) {
+		/*
+		 * Track every name we create so that we don't emit duplicates.
+		 */
+		error = dup_table_alloc(nr, &tab);
+		if (error)
+			return error;
+	}
+
+	dname.name = direntname;
+	for (i = 0; i < nr; i++) {
+		strncpy(direntname, name, MAXNAMELEN);
+		obfuscate_name(old_hash, namelen, direntname, true);
+		ASSERT(old_hash == libxfs_dir2_hashname(mp, &dname));
+
+		if (fd >= 0) {
+			error = linkat(fd, name, fd, direntname, 0);
+			if (error && errno != EEXIST)
+				return errno;
+
+			/* don't print names to stdout */
+			continue;
+		} else if (tab) {
+			error = dup_table_store(tab, direntname, namelen);
+			if (error)
+				break;
+		}
+
+		printf("%s%c", direntname, 0);
+	}
+
+	dup_table_free(tab);
+	return error;
+}
+
+static int
+collide_xattrs(
+	unsigned long		nr,
+	const unsigned char	*name,
+	size_t			namelen,
+	int			fd)
+{
+	unsigned char		xattrname[MAXNAMELEN + 5];
+	struct dup_table	*tab = NULL;
+	xfs_dahash_t		old_hash;
+	unsigned long		i;
+	int			error = 0;
+
+	old_hash = libxfs_da_hashname(name, namelen);
+
+	if (fd >= 0) {
+		/*
+		 * User passed in a fd, so we'll use the xattr structure to
+		 * detect duplicate names.  First create the attribute that we
+		 * are passed in.
+		 */
+		snprintf(xattrname, MAXNAMELEN + 5, "user.%s", name);
+		error = fsetxattr(fd, xattrname, "1", 1, 0);
+		if (error)
+			return errno;
+	} else if (nr > 1) {
+		/*
+		 * Track every name we create so that we don't emit duplicates.
+		 */
+		error = dup_table_alloc(nr, &tab);
+		if (error)
+			return error;
+	}
+
+	for (i = 0; i < nr; i++) {
+		snprintf(xattrname, MAXNAMELEN + 5, "user.%s", name);
+		obfuscate_name(old_hash, namelen, xattrname + 5, false);
+		ASSERT(old_hash == libxfs_da_hashname(xattrname + 5, namelen));
+
+		if (fd >= 0) {
+			error = fsetxattr(fd, xattrname, "1", 1, 0);
+			if (error)
+				return errno;
+
+			/* don't print names to stdout */
+			continue;
+		} else if (tab) {
+			error = dup_table_store(tab, xattrname, namelen + 5);
+			if (error)
+				break;
+		}
+
+		printf("%s%c", xattrname, 0);
+	}
+
+	dup_table_free(tab);
+	return error;
+}
+
+static int
+hashcoll_f(
+	int		argc,
+	char		**argv)
+{
+	const char	*path = NULL;
+	bool		read_stdin = false;
+	bool		create_xattr = false;
+	unsigned long	nr = 1, seed = 0;
+	int		fd = -1;
+	int		c;
+	int		error;
+
+	while ((c = getopt(argc, argv, "ain:p:s:")) != EOF) {
+		switch (c) {
+		case 'a':
+			create_xattr = true;
+			break;
+		case 'i':
+			read_stdin = true;
+			break;
+		case 'n':
+			nr = strtoul(optarg, NULL, 10);
+			break;
+		case 'p':
+			path = optarg;
+			break;
+		case 's':
+			seed = strtoul(optarg, NULL, 10);
+			break;
+		default:
+			exitcode = 1;
+			hashcoll_help();
+			return 0;
+		}
+	}
+
+	if (path) {
+		int	oflags = O_RDWR;
+
+		if (!create_xattr)
+			oflags = O_RDONLY | O_DIRECTORY;
+
+		fd = open(path, oflags);
+		if (fd < 0) {
+			perror(path);
+			exitcode = 1;
+			return 0;
+		}
+	}
+
+	if (seed)
+		srandom(seed);
+
+	if (read_stdin) {
+		char	buf[MAXNAMELEN] = { 0 };
+		size_t	len;
+
+		len = fread(buf, 1, MAXNAMELEN - 1, stdin);
+
+		if (create_xattr)
+			error = collide_xattrs(nr, buf, len, fd);
+		else
+			error = collide_dirents(nr, buf, len, fd);
+		if (error) {
+			printf(_("hashcoll: %s\n"), strerror(error));
+			exitcode = 1;
+		}
+		goto done;
+	}
+
+	for (c = optind; c < argc; c++) {
+		size_t	len = strlen(argv[c]);
+
+		if (create_xattr)
+			error = collide_xattrs(nr, argv[c], len, fd);
+		else
+			error = collide_dirents(nr, argv[c], len, fd);
+		if (error) {
+			printf(_("hashcoll: %s\n"), strerror(error));
+			exitcode = 1;
+		}
+	}
+
+done:
+	if (fd >= 0)
+		close(fd);
+	return 0;
+}
+
+static cmdinfo_t	hashcoll_cmd = {
+	.name		= "hashcoll",
+	.cfunc		= hashcoll_f,
+	.argmin		= 0,
+	.argmax		= -1,
+	.args		= N_("[-a] [-s seed] [-n nr] [-p path] -i|names..."),
+	.oneline	= N_("create names that produce dahash collisions"),
+	.help		= hashcoll_help,
+};
+
 void
 hash_init(void)
 {
 	add_command(&hash_cmd);
+	add_command(&hashcoll_cmd);
 }
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index 43c7db5e225..793d0042319 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -791,6 +791,37 @@ Prints the hash value of
 .I string
 using the hash function of the XFS directory and attribute implementation.
 .TP
+.BI "hashcoll [-a] [-s seed] [-n " nr "] [-p " path "] -i | " names...
+Create directory entry or extended attribute names that all have the same
+hash value.
+The metadump name obfuscation algorithm is used here.
+Names are written to standard output, with a NUL between each name for use
+with xargs -0.
+.RS 1.0i
+.PD 0
+.TP 0.4i
+.TP 0.4i
+.B \-a
+Create extended attribute names.
+.TP 0.4i
+.B \-i
+Read the first name to create from standard input.
+Up to 255 bytes are read.
+If this option is not specified, the initial names are taken from the command line.
+.TP 0.4i
+.BI \-n " nr"
+Create this many names with the same hash value.
+The default is to create one name.
+.TP 0.4i
+.BI \-p " path"
+Create directory entries or extended attributes in this file instead of
+writing the names to standard output.
+.TP 0.4i
+.BI \-s " seed"
+Seed the random number generator with this value.
+.PD
+.RE
+.TP
 .BI "help [" command ]
 Print help for one or all commands.
 .TP

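dup_table_find() and dup_table_store() implement a chained hash table keyed on a checksum of the name. A reduced sketch of the same structure follows, with FNV-1a standing in for crc32c() (which standard C does not provide) purely as a bucket selector, and without the find_alternate() retry loop: dup_set_add() only reports whether the name was new. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One stored name, chained off a bucket like struct name_dup. */
struct dup_ent {
	struct dup_ent	*next;
	uint32_t	key;
	uint8_t		namelen;
	unsigned char	name[];
};

struct dup_set {
	unsigned int	nr_buckets;
	struct dup_ent	*buckets[];
};

/* FNV-1a, standing in for crc32c() as a bucket selector. */
static uint32_t fnv1a(const unsigned char *p, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

static struct dup_set *dup_set_alloc(unsigned int nr_buckets)
{
	struct dup_set *s = calloc(1, sizeof(struct dup_set) +
			(size_t)nr_buckets * sizeof(struct dup_ent *));

	if (s)
		s->nr_buckets = nr_buckets;
	return s;
}

/* Return 1 if the name was added, 0 if already present, -1 on ENOMEM. */
static int dup_set_add(struct dup_set *s, const unsigned char *name,
		size_t len)
{
	uint32_t key = fnv1a(name, len);
	struct dup_ent **head = &s->buckets[key % s->nr_buckets];

	assert(len < 256);	/* namelen is a uint8_t, as in name_dup */
	for (struct dup_ent *e = *head; e; e = e->next)
		if (e->key == key && e->namelen == len &&
		    !memcmp(e->name, name, len))
			return 0;

	struct dup_ent *e = malloc(sizeof(*e) + len);

	if (!e)
		return -1;
	e->key = key;
	e->namelen = (uint8_t)len;
	memcpy(e->name, name, len);
	e->next = *head;	/* push onto the front of the chain */
	*head = e;
	return 1;
}

static void dup_set_free(struct dup_set *s)
{
	if (!s)
		return;
	for (unsigned int i = 0; i < s->nr_buckets; i++) {
		struct dup_ent *e = s->buckets[i];

		while (e) {
			struct dup_ent *next = e->next;

			free(e);
			e = next;
		}
	}
	free(s);
}
```

Comparing the stored length and bytes, not just the key, is what keeps two different names that share a bucket (or a checksum) from being treated as duplicates.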

Thread overview: 26+ messages
2023-04-06  0:02 [PATCHSET v2 0/4] xfs: fix ascii-ci problems, then kill it Darrick J. Wong
2023-04-06  0:02 ` [PATCH 1/4] xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation Darrick J. Wong
2023-04-11  4:50   ` Christoph Hellwig
2023-04-06  0:03 ` [PATCH 2/4] xfs: test the ascii case-insensitive hash Darrick J. Wong
2023-04-11  4:50   ` Christoph Hellwig
2023-04-06  0:03 ` [PATCH 3/4] xfs: use the directory name hash function for dir scrubbing Darrick J. Wong
2023-04-11  4:51   ` Christoph Hellwig
2023-04-06  0:03 ` [PATCH 4/4] xfs: deprecate the ascii-ci feature Darrick J. Wong
2023-04-11  4:52   ` Christoph Hellwig
2023-04-06  0:09 ` [PATCHSET v2 0/6] xfsprogs: fix ascii-ci problems, then kill it Darrick J. Wong
2023-04-06  0:09   ` [PATCH 1/6] xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation Darrick J. Wong
2023-04-06  0:09   ` [PATCH 2/6] xfs: test the ascii case-insensitive hash Darrick J. Wong
2023-04-06  0:09   ` [PATCH 3/6] xfs_db: move obfuscate_name assertion to callers Darrick J. Wong
2023-04-11  4:52     ` Christoph Hellwig
2023-04-06  0:09   ` [PATCH 4/6] xfs_db: fix metadump name obfuscation for ascii-ci filesystems Darrick J. Wong
2023-04-11  4:58     ` Christoph Hellwig
2023-04-11 15:35       ` Darrick J. Wong
2023-04-12 12:09         ` Christoph Hellwig
2023-04-12 22:04           ` Darrick J. Wong
2023-04-06  0:10   ` [PATCH 5/6] mkfs.xfs.8: warn about the version=ci feature Darrick J. Wong
2023-04-11  4:59     ` Christoph Hellwig
2023-04-06  0:10   ` [PATCH 6/6] mkfs: deprecate the ascii-ci feature Darrick J. Wong
2023-04-11  4:59     ` Christoph Hellwig
2023-04-13 15:19   ` [RFC PATCH 7/6] xfs_db: hoist name obfuscation code out of metadump.c Darrick J. Wong
2023-04-13 15:20   ` [RFC PATCH 8/6] xfs_db: create dirents and xattrs with colliding names Darrick J. Wong
2023-04-06  0:11 ` [PATCH] fstests: add a couple more tests for ascii-ci problems Darrick J. Wong
