All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support
@ 2015-12-24 16:25 Andrew Gabbasov
  2015-12-24 16:25 ` [PATCH v2 1/7] udf: Prevent buffer overrun with multi-byte characters Andrew Gabbasov
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Andrew Gabbasov @ 2015-12-24 16:25 UTC (permalink / raw)
  To: Jan Kara, linux-kernel

V2:

The single patch was split into several commits for separate logical
steps. Also, some minor fixes were done in the code of the patches.

V1:

Current implementation has several issues in unicode.c, mostly related
to handling multi-bytes characters in file names:

- loop ending conditions in udf_CS0toUTF8 and udf_CS0toNLS functions do not
properly catch the end of output buffer in case of multi-bytes characters,
allowing out-of-bounds writing and memory corruption;

- udf_UTF8toCS0 and udf_NLStoCS0 do not check the right boundary of output
buffer at all, also allowing out-of-bounds writing and memory corruption;

- udf_translate_to_linux does not take into account multi-bytes characters
at all (although it is called after converting to UTF8 or NLS): maximal
length of extension is counted as 5 bytes, that may be incorrect with
multi-bytes characters; when inserting CRC and extension for long names
(near the end of the buffer), they are inserted at fixed place at the end,
that can break into the middle of the multi-bytes character;

- when being converted from CS0 to UTF8 (or NLS), the name can be truncated
(even if the sizes in bytes of input and output buffers are the same),
but the following translating function does not know about it and does not
insert CRC, as it is assumed by the specs.

Because of the last item above, it looks like all the checks and
conversions (re-coding and possible CRC insertions) should be done
simultaneously in the single function. This means that the listed
issues can not be fixed independently and separately. So, the whole
conversion and translation support should be reworked.

The proposed implementation below fixes the listed issues, and also has
some additional features:

- it gets rid of "struct ustr", since it actually just makes an unneeded
extra copying of the buffer and does not have any other significant
advantage;

- it unifies UTF8 and NLS conversions support, since there is no much
sense to separate these cases;

- UDF_NAME_LEN constant adjusted to better reflect actual restrictions.


Andrew Gabbasov (7):
  udf: Prevent buffer overrun with multi-byte characters
  udf: Check output buffer length when converting name to CS0
  udf: Parameterize output length in udf_put_filename
  udf: Join functions for UTF8 and NLS conversions
  udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
  udf: Remove struct ustr as non-needed intermediate storage
  udf: Merge linux specific translation into CS0 conversion function

 fs/udf/namei.c   |  16 +-
 fs/udf/super.c   |  38 ++--
 fs/udf/udfdecl.h |  21 +-
 fs/udf/unicode.c | 611 ++++++++++++++++++++++---------------------------------
 4 files changed, 274 insertions(+), 412 deletions(-)

-- 
2.1.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/7] udf: Prevent buffer overrun with multi-byte characters
  2015-12-24 16:25 [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support Andrew Gabbasov
@ 2015-12-24 16:25 ` Andrew Gabbasov
  2015-12-24 16:25 ` [PATCH v2 2/7] udf: Check output buffer length when converting name to CS0 Andrew Gabbasov
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Andrew Gabbasov @ 2015-12-24 16:25 UTC (permalink / raw)
  To: Jan Kara, linux-kernel

udf_CS0toUTF8 function stops the conversion when the output buffer
length reaches UDF_NAME_LEN-2, which is correct maximum name length,
but, when checking, it leaves the space for a single byte only,
while multi-bytes output characters can take more space, causing
buffer overflow.

Similar error exists in udf_CS0toNLS function, that restricts
the output length to UDF_NAME_LEN, while actual maximum allowed
length is UDF_NAME_LEN-2.

In these cases the output can override not only the current buffer
length field, causing corruption of the name buffer itself, but also
following allocation structures, causing kernel crash.

Adjust the output length checks in both functions to prevent buffer
overruns in case of multi-bytes UTF8 or NLS characters.

Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
---
 fs/udf/unicode.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/udf/unicode.c b/fs/udf/unicode.c
index ab478e6..95a224b 100644
--- a/fs/udf/unicode.c
+++ b/fs/udf/unicode.c
@@ -128,11 +128,15 @@ int udf_CS0toUTF8(struct ustr *utf_o, const struct ustr *ocu_i)
 		if (c < 0x80U)
 			utf_o->u_name[utf_o->u_len++] = (uint8_t)c;
 		else if (c < 0x800U) {
+			if (utf_o->u_len > (UDF_NAME_LEN - 4))
+				break;
 			utf_o->u_name[utf_o->u_len++] =
 						(uint8_t)(0xc0 | (c >> 6));
 			utf_o->u_name[utf_o->u_len++] =
 						(uint8_t)(0x80 | (c & 0x3f));
 		} else {
+			if (utf_o->u_len > (UDF_NAME_LEN - 5))
+				break;
 			utf_o->u_name[utf_o->u_len++] =
 						(uint8_t)(0xe0 | (c >> 12));
 			utf_o->u_name[utf_o->u_len++] =
@@ -277,7 +281,7 @@ static int udf_CS0toNLS(struct nls_table *nls, struct ustr *utf_o,
 			c = (c << 8) | ocu[i++];
 
 		len = nls->uni2char(c, &utf_o->u_name[utf_o->u_len],
-				    UDF_NAME_LEN - utf_o->u_len);
+				    UDF_NAME_LEN - 2 - utf_o->u_len);
 		/* Valid character? */
 		if (len >= 0)
 			utf_o->u_len += len;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/7] udf: Check output buffer length when converting name to CS0
  2015-12-24 16:25 [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support Andrew Gabbasov
  2015-12-24 16:25 ` [PATCH v2 1/7] udf: Prevent buffer overrun with multi-byte characters Andrew Gabbasov
@ 2015-12-24 16:25 ` Andrew Gabbasov
  2016-01-04 17:19   ` Jan Kara
  2015-12-24 16:25 ` [PATCH v2 3/7] udf: Parameterize output length in udf_put_filename Andrew Gabbasov
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Andrew Gabbasov @ 2015-12-24 16:25 UTC (permalink / raw)
  To: Jan Kara, linux-kernel

If a name contains at least some characters with Unicode values
exceeding single byte, the CS0 output should have 2 bytes per character.
And if other input characters have single byte Unicode values, then
the single input byte is converted to 2 output bytes, and the length
of output becomes larger than the length of input. And if the input
name is long enough, the output length may exceed the allocated buffer
length.

All this means that conversion from UTF8 or NLS to CS0 requires
checking of output length in order to stop when it exceeds the given
output buffer size.

Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
---
 fs/udf/unicode.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/udf/unicode.c b/fs/udf/unicode.c
index 95a224b..155f912 100644
--- a/fs/udf/unicode.c
+++ b/fs/udf/unicode.c
@@ -177,17 +177,18 @@ int udf_CS0toUTF8(struct ustr *utf_o, const struct ustr *ocu_i)
 static int udf_UTF8toCS0(dstring *ocu, struct ustr *utf, int length)
 {
 	unsigned c, i, max_val, utf_char;
-	int utf_cnt, u_len;
+	int utf_cnt, u_len, u_ch;
 
 	memset(ocu, 0, sizeof(dstring) * length);
 	ocu[0] = 8;
 	max_val = 0xffU;
+	u_ch = 1;
 
 try_again:
 	u_len = 0U;
 	utf_char = 0U;
 	utf_cnt = 0U;
-	for (i = 0U; i < utf->u_len; i++) {
+	for (i = 0U; (i < utf->u_len) && ((u_len + 1 + u_ch) < length); i++) {
 		c = (uint8_t)utf->u_name[i];
 
 		/* Complete a multi-byte UTF-8 character */
@@ -229,6 +230,7 @@ try_again:
 			if (max_val == 0xffU) {
 				max_val = 0xffffU;
 				ocu[0] = (uint8_t)0x10U;
+				u_ch = 2;
 				goto try_again;
 			}
 			goto error_out;
@@ -299,15 +301,16 @@ static int udf_NLStoCS0(struct nls_table *nls, dstring *ocu, struct ustr *uni,
 	int len;
 	unsigned i, max_val;
 	uint16_t uni_char;
-	int u_len;
+	int u_len, u_ch;
 
 	memset(ocu, 0, sizeof(dstring) * length);
 	ocu[0] = 8;
 	max_val = 0xffU;
+	u_ch = 1;
 
 try_again:
 	u_len = 0U;
-	for (i = 0U; i < uni->u_len; i++) {
+	for (i = 0U; (i < uni->u_len) && ((u_len + 1 + u_ch) < length); i++) {
 		len = nls->char2uni(&uni->u_name[i], uni->u_len - i, &uni_char);
 		if (!len)
 			continue;
@@ -320,6 +323,7 @@ try_again:
 		if (uni_char > max_val) {
 			max_val = 0xffffU;
 			ocu[0] = (uint8_t)0x10U;
+			u_ch = 2;
 			goto try_again;
 		}
 
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/7] udf: Parameterize output length in udf_put_filename
  2015-12-24 16:25 [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support Andrew Gabbasov
  2015-12-24 16:25 ` [PATCH v2 1/7] udf: Prevent buffer overrun with multi-byte characters Andrew Gabbasov
  2015-12-24 16:25 ` [PATCH v2 2/7] udf: Check output buffer length when converting name to CS0 Andrew Gabbasov
@ 2015-12-24 16:25 ` Andrew Gabbasov
  2015-12-24 16:25 ` [PATCH v2 4/7] udf: Join functions for UTF8 and NLS conversions Andrew Gabbasov
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Andrew Gabbasov @ 2015-12-24 16:25 UTC (permalink / raw)
  To: Jan Kara, linux-kernel

Make the desired output length a parameter rather than have it
hard-coded to UDF_NAME_LEN. Although all call sites still have
this length the same, this parameterization will make the function
more universal and also consistent with udf_get_filename.

Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
---
 fs/udf/namei.c   | 10 ++++++----
 fs/udf/udfdecl.h |  4 ++--
 fs/udf/unicode.c | 10 +++++-----
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index c97b5a8..6192070 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -362,8 +362,9 @@ static struct fileIdentDesc *udf_add_entry(struct inode *dir,
 			*err = -EINVAL;
 			goto out_err;
 		}
-		namelen = udf_put_filename(sb, dentry->d_name.name, name,
-						 dentry->d_name.len);
+		namelen = udf_put_filename(sb, dentry->d_name.name,
+					   dentry->d_name.len,
+					   name, UDF_NAME_LEN);
 		if (!namelen) {
 			*err = -ENAMETOOLONG;
 			goto out_err;
@@ -996,8 +997,9 @@ static int udf_symlink(struct inode *dir, struct dentry *dentry,
 		}
 
 		if (pc->componentType == 5) {
-			namelen = udf_put_filename(sb, compstart, name,
-						   symname - compstart);
+			namelen = udf_put_filename(sb, compstart,
+						   symname - compstart,
+						   name, UDF_NAME_LEN);
 			if (!namelen)
 				goto out_no_entry;
 
diff --git a/fs/udf/udfdecl.h b/fs/udf/udfdecl.h
index 47bb3f5..35591e3 100644
--- a/fs/udf/udfdecl.h
+++ b/fs/udf/udfdecl.h
@@ -213,8 +213,8 @@ udf_get_lb_pblock(struct super_block *sb, struct kernel_lb_addr *loc,
 /* unicode.c */
 extern int udf_get_filename(struct super_block *, uint8_t *, int, uint8_t *,
 			    int);
-extern int udf_put_filename(struct super_block *, const uint8_t *, uint8_t *,
-			    int);
+extern int udf_put_filename(struct super_block *, const uint8_t *, int,
+			    uint8_t *, int);
 extern int udf_build_ustr(struct ustr *, dstring *, int);
 extern int udf_CS0toUTF8(struct ustr *, const struct ustr *);
 
diff --git a/fs/udf/unicode.c b/fs/udf/unicode.c
index 155f912..13e8b69 100644
--- a/fs/udf/unicode.c
+++ b/fs/udf/unicode.c
@@ -388,22 +388,22 @@ out1:
 	return ret;
 }
 
-int udf_put_filename(struct super_block *sb, const uint8_t *sname,
-		     uint8_t *dname, int flen)
+int udf_put_filename(struct super_block *sb, const uint8_t *sname, int slen,
+		     uint8_t *dname, int dlen)
 {
 	struct ustr unifilename;
 	int namelen;
 
-	if (!udf_char_to_ustr(&unifilename, sname, flen))
+	if (!udf_char_to_ustr(&unifilename, sname, slen))
 		return 0;
 
 	if (UDF_QUERY_FLAG(sb, UDF_FLAG_UTF8)) {
-		namelen = udf_UTF8toCS0(dname, &unifilename, UDF_NAME_LEN);
+		namelen = udf_UTF8toCS0(dname, &unifilename, dlen);
 		if (!namelen)
 			return 0;
 	} else if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP)) {
 		namelen = udf_NLStoCS0(UDF_SB(sb)->s_nls_map, dname,
-					&unifilename, UDF_NAME_LEN);
+					&unifilename, dlen);
 		if (!namelen)
 			return 0;
 	} else
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/7] udf: Join functions for UTF8 and NLS conversions
  2015-12-24 16:25 [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support Andrew Gabbasov
                   ` (2 preceding siblings ...)
  2015-12-24 16:25 ` [PATCH v2 3/7] udf: Parameterize output length in udf_put_filename Andrew Gabbasov
@ 2015-12-24 16:25 ` Andrew Gabbasov
  2015-12-24 16:25 ` [PATCH v2 5/7] udf: Adjust UDF_NAME_LEN to better reflect actual restrictions Andrew Gabbasov
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Andrew Gabbasov @ 2015-12-24 16:25 UTC (permalink / raw)
  To: Jan Kara, linux-kernel

There is no much sense to have separate functions for UTF8 and
NLS conversions, since UTF8 encoding is actually the special case
of NLS.

However, although UTF8 is also supported by general NLS framework,
it would be good to have separate UTF8 character conversion functions
(char2uni and uni2char) locally in UDF code, so that they could be
used even if NLS support is not enabled in the kernel configuration.

Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
---
 fs/udf/unicode.c | 272 ++++++++++++++++++-------------------------------------
 1 file changed, 88 insertions(+), 184 deletions(-)

diff --git a/fs/udf/unicode.c b/fs/udf/unicode.c
index 13e8b69..21a8cfb 100644
--- a/fs/udf/unicode.c
+++ b/fs/udf/unicode.c
@@ -76,147 +76,72 @@ static void udf_build_ustr_exact(struct ustr *dest, dstring *ptr, int exactsize)
 	memcpy(dest->u_name, ptr + 1, exactsize - 1);
 }
 
-/*
- * udf_CS0toUTF8
- *
- * PURPOSE
- *	Convert OSTA Compressed Unicode to the UTF-8 equivalent.
- *
- * PRE-CONDITIONS
- *	utf			Pointer to UTF-8 output buffer.
- *	ocu			Pointer to OSTA Compressed Unicode input buffer
- *				of size UDF_NAME_LEN bytes.
- * 				both of type "struct ustr *"
- *
- * POST-CONDITIONS
- *	<return>		>= 0 on success.
- *
- * HISTORY
- *	November 12, 1997 - Andrew E. Mileski
- *	Written, tested, and released.
- */
-int udf_CS0toUTF8(struct ustr *utf_o, const struct ustr *ocu_i)
+static int udf_uni2char_utf8(wchar_t uni,
+			     unsigned char *out,
+			     int boundlen)
 {
-	const uint8_t *ocu;
-	uint8_t cmp_id, ocu_len;
-	int i;
-
-	ocu_len = ocu_i->u_len;
-	if (ocu_len == 0) {
-		memset(utf_o, 0, sizeof(struct ustr));
-		return 0;
-	}
-
-	cmp_id = ocu_i->u_cmpID;
-	if (cmp_id != 8 && cmp_id != 16) {
-		memset(utf_o, 0, sizeof(struct ustr));
-		pr_err("unknown compression code (%d) stri=%s\n",
-		       cmp_id, ocu_i->u_name);
-		return -EINVAL;
-	}
-
-	ocu = ocu_i->u_name;
-	utf_o->u_len = 0;
-	for (i = 0; (i < ocu_len) && (utf_o->u_len <= (UDF_NAME_LEN - 3));) {
-
-		/* Expand OSTA compressed Unicode to Unicode */
-		uint32_t c = ocu[i++];
-		if (cmp_id == 16)
-			c = (c << 8) | ocu[i++];
-
-		/* Compress Unicode to UTF-8 */
-		if (c < 0x80U)
-			utf_o->u_name[utf_o->u_len++] = (uint8_t)c;
-		else if (c < 0x800U) {
-			if (utf_o->u_len > (UDF_NAME_LEN - 4))
-				break;
-			utf_o->u_name[utf_o->u_len++] =
-						(uint8_t)(0xc0 | (c >> 6));
-			utf_o->u_name[utf_o->u_len++] =
-						(uint8_t)(0x80 | (c & 0x3f));
-		} else {
-			if (utf_o->u_len > (UDF_NAME_LEN - 5))
-				break;
-			utf_o->u_name[utf_o->u_len++] =
-						(uint8_t)(0xe0 | (c >> 12));
-			utf_o->u_name[utf_o->u_len++] =
-						(uint8_t)(0x80 |
-							  ((c >> 6) & 0x3f));
-			utf_o->u_name[utf_o->u_len++] =
-						(uint8_t)(0x80 | (c & 0x3f));
-		}
+	int u_len = 0;
+
+	if (boundlen <= 0)
+		return -ENAMETOOLONG;
+
+	if (uni < 0x80) {
+		out[u_len++] = (unsigned char)uni;
+	} else if (uni < 0x800) {
+		if (boundlen < 2)
+			return -ENAMETOOLONG;
+		out[u_len++] = (unsigned char)(0xc0 | (uni >> 6));
+		out[u_len++] = (unsigned char)(0x80 | (uni & 0x3f));
+	} else {
+		if (boundlen < 3)
+			return -ENAMETOOLONG;
+		out[u_len++] = (unsigned char)(0xe0 | (uni >> 12));
+		out[u_len++] = (unsigned char)(0x80 | ((uni >> 6) & 0x3f));
+		out[u_len++] = (unsigned char)(0x80 | (uni & 0x3f));
 	}
-	utf_o->u_cmpID = 8;
-
-	return utf_o->u_len;
+	return u_len;
 }
 
-/*
- *
- * udf_UTF8toCS0
- *
- * PURPOSE
- *	Convert UTF-8 to the OSTA Compressed Unicode equivalent.
- *
- * DESCRIPTION
- *	This routine is only called by udf_lookup().
- *
- * PRE-CONDITIONS
- *	ocu			Pointer to OSTA Compressed Unicode output
- *				buffer of size UDF_NAME_LEN bytes.
- *	utf			Pointer to UTF-8 input buffer.
- *	utf_len			Length of UTF-8 input buffer in bytes.
- *
- * POST-CONDITIONS
- *	<return>		Zero on success.
- *
- * HISTORY
- *	November 12, 1997 - Andrew E. Mileski
- *	Written, tested, and released.
- */
-static int udf_UTF8toCS0(dstring *ocu, struct ustr *utf, int length)
+static int udf_char2uni_utf8(const unsigned char *in,
+			     int boundlen,
+			     wchar_t *uni)
 {
-	unsigned c, i, max_val, utf_char;
-	int utf_cnt, u_len, u_ch;
+	unsigned int utf_char;
+	unsigned char c;
+	int utf_cnt, u_len;
 
-	memset(ocu, 0, sizeof(dstring) * length);
-	ocu[0] = 8;
-	max_val = 0xffU;
-	u_ch = 1;
-
-try_again:
-	u_len = 0U;
-	utf_char = 0U;
-	utf_cnt = 0U;
-	for (i = 0U; (i < utf->u_len) && ((u_len + 1 + u_ch) < length); i++) {
-		c = (uint8_t)utf->u_name[i];
+	utf_char = 0;
+	utf_cnt = 0;
+	for (u_len = 0; u_len < boundlen;) {
+		c = in[u_len++];
 
 		/* Complete a multi-byte UTF-8 character */
 		if (utf_cnt) {
-			utf_char = (utf_char << 6) | (c & 0x3fU);
+			utf_char = (utf_char << 6) | (c & 0x3f);
 			if (--utf_cnt)
 				continue;
 		} else {
 			/* Check for a multi-byte UTF-8 character */
-			if (c & 0x80U) {
+			if (c & 0x80) {
 				/* Start a multi-byte UTF-8 character */
-				if ((c & 0xe0U) == 0xc0U) {
-					utf_char = c & 0x1fU;
+				if ((c & 0xe0) == 0xc0) {
+					utf_char = c & 0x1f;
 					utf_cnt = 1;
-				} else if ((c & 0xf0U) == 0xe0U) {
-					utf_char = c & 0x0fU;
+				} else if ((c & 0xf0) == 0xe0) {
+					utf_char = c & 0x0f;
 					utf_cnt = 2;
-				} else if ((c & 0xf8U) == 0xf0U) {
-					utf_char = c & 0x07U;
+				} else if ((c & 0xf8) == 0xf0) {
+					utf_char = c & 0x07;
 					utf_cnt = 3;
-				} else if ((c & 0xfcU) == 0xf8U) {
-					utf_char = c & 0x03U;
+				} else if ((c & 0xfc) == 0xf8) {
+					utf_char = c & 0x03;
 					utf_cnt = 4;
-				} else if ((c & 0xfeU) == 0xfcU) {
-					utf_char = c & 0x01U;
+				} else if ((c & 0xfe) == 0xfc) {
+					utf_char = c & 0x01;
 					utf_cnt = 5;
 				} else {
-					goto error_out;
+					utf_cnt = -1;
+					break;
 				}
 				continue;
 			} else {
@@ -224,36 +149,19 @@ try_again:
 				utf_char = c;
 			}
 		}
-
-		/* Choose no compression if necessary */
-		if (utf_char > max_val) {
-			if (max_val == 0xffU) {
-				max_val = 0xffffU;
-				ocu[0] = (uint8_t)0x10U;
-				u_ch = 2;
-				goto try_again;
-			}
-			goto error_out;
-		}
-
-		if (max_val == 0xffffU)
-			ocu[++u_len] = (uint8_t)(utf_char >> 8);
-		ocu[++u_len] = (uint8_t)(utf_char & 0xffU);
+		*uni = utf_char;
+		break;
 	}
-
 	if (utf_cnt) {
-error_out:
-		ocu[++u_len] = '?';
-		printk(KERN_DEBUG pr_fmt("bad UTF-8 character\n"));
+		*uni = '?';
+		return -EINVAL;
 	}
-
-	ocu[length - 1] = (uint8_t)u_len + 1;
-
-	return u_len + 1;
+	return u_len;
 }
 
-static int udf_CS0toNLS(struct nls_table *nls, struct ustr *utf_o,
-			const struct ustr *ocu_i)
+static int udf_name_from_CS0(struct ustr *utf_o,
+			     const struct ustr *ocu_i,
+			     int (*conv_f)(wchar_t, unsigned char *, int))
 {
 	const uint8_t *ocu;
 	uint8_t cmp_id, ocu_len;
@@ -282,8 +190,8 @@ static int udf_CS0toNLS(struct nls_table *nls, struct ustr *utf_o,
 		if (cmp_id == 16)
 			c = (c << 8) | ocu[i++];
 
-		len = nls->uni2char(c, &utf_o->u_name[utf_o->u_len],
-				    UDF_NAME_LEN - 2 - utf_o->u_len);
+		len = conv_f(c, &utf_o->u_name[utf_o->u_len],
+			     UDF_NAME_LEN - 2 - utf_o->u_len);
 		/* Valid character? */
 		if (len >= 0)
 			utf_o->u_len += len;
@@ -295,23 +203,23 @@ static int udf_CS0toNLS(struct nls_table *nls, struct ustr *utf_o,
 	return utf_o->u_len;
 }
 
-static int udf_NLStoCS0(struct nls_table *nls, dstring *ocu, struct ustr *uni,
-			int length)
+static int udf_name_to_CS0(dstring *ocu, struct ustr *uni, int length,
+			   int (*conv_f)(const unsigned char *, int, wchar_t *))
 {
-	int len;
-	unsigned i, max_val;
-	uint16_t uni_char;
+	int i, len;
+	unsigned int max_val;
+	wchar_t uni_char;
 	int u_len, u_ch;
 
 	memset(ocu, 0, sizeof(dstring) * length);
 	ocu[0] = 8;
-	max_val = 0xffU;
+	max_val = 0xff;
 	u_ch = 1;
 
 try_again:
-	u_len = 0U;
-	for (i = 0U; (i < uni->u_len) && ((u_len + 1 + u_ch) < length); i++) {
-		len = nls->char2uni(&uni->u_name[i], uni->u_len - i, &uni_char);
+	u_len = 0;
+	for (i = 0; (i < uni->u_len) && ((u_len + 1 + u_ch) < length); i++) {
+		len = conv_f(&uni->u_name[i], uni->u_len - i, &uni_char);
 		if (!len)
 			continue;
 		/* Invalid character, deal with it */
@@ -321,15 +229,15 @@ try_again:
 		}
 
 		if (uni_char > max_val) {
-			max_val = 0xffffU;
-			ocu[0] = (uint8_t)0x10U;
+			max_val = 0xffff;
+			ocu[0] = 0x10;
 			u_ch = 2;
 			goto try_again;
 		}
 
-		if (max_val == 0xffffU)
+		if (max_val == 0xffff)
 			ocu[++u_len] = (uint8_t)(uni_char >> 8);
-		ocu[++u_len] = (uint8_t)(uni_char & 0xffU);
+		ocu[++u_len] = (uint8_t)(uni_char & 0xff);
 		i += len - 1;
 	}
 
@@ -337,10 +245,16 @@ try_again:
 	return u_len + 1;
 }
 
+int udf_CS0toUTF8(struct ustr *utf_o, const struct ustr *ocu_i)
+{
+	return udf_name_from_CS0(utf_o, ocu_i, udf_uni2char_utf8);
+}
+
 int udf_get_filename(struct super_block *sb, uint8_t *sname, int slen,
 		     uint8_t *dname, int dlen)
 {
 	struct ustr *filename, *unifilename;
+	int (*conv_f)(wchar_t, unsigned char *, int);
 	int ret;
 
 	if (!slen)
@@ -358,23 +272,18 @@ int udf_get_filename(struct super_block *sb, uint8_t *sname, int slen,
 
 	udf_build_ustr_exact(unifilename, sname, slen);
 	if (UDF_QUERY_FLAG(sb, UDF_FLAG_UTF8)) {
-		ret = udf_CS0toUTF8(filename, unifilename);
-		if (ret < 0) {
-			udf_debug("Failed in udf_get_filename: sname = %s\n",
-				  sname);
-			goto out2;
-		}
+		conv_f = udf_uni2char_utf8;
 	} else if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP)) {
-		ret = udf_CS0toNLS(UDF_SB(sb)->s_nls_map, filename,
-				   unifilename);
-		if (ret < 0) {
-			udf_debug("Failed in udf_get_filename: sname = %s\n",
-				  sname);
-			goto out2;
-		}
+		conv_f = UDF_SB(sb)->s_nls_map->uni2char;
 	} else
 		BUG();
 
+	ret = udf_name_from_CS0(filename, unifilename, conv_f);
+	if (ret < 0) {
+		udf_debug("Failed in udf_get_filename: sname = %s\n", sname);
+		goto out2;
+	}
+
 	ret = udf_translate_to_linux(dname, dlen,
 				     filename->u_name, filename->u_len,
 				     unifilename->u_name, unifilename->u_len);
@@ -392,24 +301,19 @@ int udf_put_filename(struct super_block *sb, const uint8_t *sname, int slen,
 		     uint8_t *dname, int dlen)
 {
 	struct ustr unifilename;
-	int namelen;
+	int (*conv_f)(const unsigned char *, int, wchar_t *);
 
 	if (!udf_char_to_ustr(&unifilename, sname, slen))
 		return 0;
 
 	if (UDF_QUERY_FLAG(sb, UDF_FLAG_UTF8)) {
-		namelen = udf_UTF8toCS0(dname, &unifilename, dlen);
-		if (!namelen)
-			return 0;
+		conv_f = udf_char2uni_utf8;
 	} else if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP)) {
-		namelen = udf_NLStoCS0(UDF_SB(sb)->s_nls_map, dname,
-					&unifilename, dlen);
-		if (!namelen)
-			return 0;
+		conv_f = UDF_SB(sb)->s_nls_map->char2uni;
 	} else
-		return 0;
+		BUG();
 
-	return namelen;
+	return udf_name_to_CS0(dname, &unifilename, dlen, conv_f);
 }
 
 #define ILLEGAL_CHAR_MARK	'_'
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 5/7] udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
  2015-12-24 16:25 [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support Andrew Gabbasov
                   ` (3 preceding siblings ...)
  2015-12-24 16:25 ` [PATCH v2 4/7] udf: Join functions for UTF8 and NLS conversions Andrew Gabbasov
@ 2015-12-24 16:25 ` Andrew Gabbasov
  2015-12-24 16:25 ` [PATCH v2 6/7] udf: Remove struct ustr as non-needed intermediate storage Andrew Gabbasov
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Andrew Gabbasov @ 2015-12-24 16:25 UTC (permalink / raw)
  To: Jan Kara, linux-kernel

Actual name length restriction is 254 bytes, this is used in 'ustr'
structure, and this is what fits into UDF File Ident structures.
And in most cases the constant is used as UDF_NAME_LEN-2.
So, it's better to just modify the constant to make it closer
to reality.

Also, in some cases it's useful to have a separate constant for
the maximum length of file name field in CS0 encoding in UDF File
Ident structures.

Also, remove the unused UDF_PATH_LEN constant.

Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
---
 fs/udf/namei.c   | 10 +++++-----
 fs/udf/super.c   |  2 +-
 fs/udf/udfdecl.h |  6 +++---
 fs/udf/unicode.c |  6 +++---
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index 6192070..8527368 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -291,7 +291,7 @@ static struct dentry *udf_lookup(struct inode *dir, struct dentry *dentry,
 	struct udf_fileident_bh fibh;
 	struct fileIdentDesc *fi;
 
-	if (dentry->d_name.len > UDF_NAME_LEN - 2)
+	if (dentry->d_name.len > UDF_NAME_LEN)
 		return ERR_PTR(-ENAMETOOLONG);
 
 #ifdef UDF_RECOVERY
@@ -351,7 +351,7 @@ static struct fileIdentDesc *udf_add_entry(struct inode *dir,
 	struct udf_inode_info *dinfo;
 
 	fibh->sbh = fibh->ebh = NULL;
-	name = kmalloc(UDF_NAME_LEN, GFP_NOFS);
+	name = kmalloc(UDF_NAME_LEN_CS0, GFP_NOFS);
 	if (!name) {
 		*err = -ENOMEM;
 		goto out_err;
@@ -364,7 +364,7 @@ static struct fileIdentDesc *udf_add_entry(struct inode *dir,
 		}
 		namelen = udf_put_filename(sb, dentry->d_name.name,
 					   dentry->d_name.len,
-					   name, UDF_NAME_LEN);
+					   name, UDF_NAME_LEN_CS0);
 		if (!namelen) {
 			*err = -ENAMETOOLONG;
 			goto out_err;
@@ -915,7 +915,7 @@ static int udf_symlink(struct inode *dir, struct dentry *dentry,
 
 	iinfo = UDF_I(inode);
 	down_write(&iinfo->i_data_sem);
-	name = kmalloc(UDF_NAME_LEN, GFP_NOFS);
+	name = kmalloc(UDF_NAME_LEN_CS0, GFP_NOFS);
 	if (!name) {
 		err = -ENOMEM;
 		goto out_no_entry;
@@ -999,7 +999,7 @@ static int udf_symlink(struct inode *dir, struct dentry *dentry,
 		if (pc->componentType == 5) {
 			namelen = udf_put_filename(sb, compstart,
 						   symname - compstart,
-						   name, UDF_NAME_LEN);
+						   name, UDF_NAME_LEN_CS0);
 			if (!namelen)
 				goto out_no_entry;
 
diff --git a/fs/udf/super.c b/fs/udf/super.c
index 81155b9..a801721 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -2348,7 +2348,7 @@ static int udf_statfs(struct dentry *dentry, struct kstatfs *buf)
 					  le32_to_cpu(lvidiu->numDirs)) : 0)
 			+ buf->f_bfree;
 	buf->f_ffree = buf->f_bfree;
-	buf->f_namelen = UDF_NAME_LEN - 2;
+	buf->f_namelen = UDF_NAME_LEN;
 	buf->f_fsid.val[0] = (u32)id;
 	buf->f_fsid.val[1] = (u32)(id >> 32);
 
diff --git a/fs/udf/udfdecl.h b/fs/udf/udfdecl.h
index 35591e3..2a70a63 100644
--- a/fs/udf/udfdecl.h
+++ b/fs/udf/udfdecl.h
@@ -49,8 +49,8 @@ extern __printf(3, 4) void _udf_warn(struct super_block *sb,
 #define UDF_EXTENT_FLAG_MASK	0xC0000000
 
 #define UDF_NAME_PAD		4
-#define UDF_NAME_LEN		256
-#define UDF_PATH_LEN		1023
+#define UDF_NAME_LEN		254
+#define UDF_NAME_LEN_CS0	255
 
 static inline size_t udf_file_entry_alloc_offset(struct inode *inode)
 {
@@ -109,7 +109,7 @@ struct generic_desc {
 
 struct ustr {
 	uint8_t u_cmpID;
-	uint8_t u_name[UDF_NAME_LEN - 2];
+	uint8_t u_name[UDF_NAME_LEN];
 	uint8_t u_len;
 };
 
diff --git a/fs/udf/unicode.c b/fs/udf/unicode.c
index 21a8cfb..7eaa865 100644
--- a/fs/udf/unicode.c
+++ b/fs/udf/unicode.c
@@ -33,7 +33,7 @@ static int udf_translate_to_linux(uint8_t *, int, uint8_t *, int, uint8_t *,
 
 static int udf_char_to_ustr(struct ustr *dest, const uint8_t *src, int strlen)
 {
-	if ((!dest) || (!src) || (!strlen) || (strlen > UDF_NAME_LEN - 2))
+	if ((!dest) || (!src) || (!strlen) || (strlen > UDF_NAME_LEN))
 		return 0;
 
 	memset(dest, 0, sizeof(struct ustr));
@@ -184,14 +184,14 @@ static int udf_name_from_CS0(struct ustr *utf_o,
 
 	ocu = ocu_i->u_name;
 	utf_o->u_len = 0;
-	for (i = 0; (i < ocu_len) && (utf_o->u_len <= (UDF_NAME_LEN - 3));) {
+	for (i = 0; (i < ocu_len) && (utf_o->u_len < UDF_NAME_LEN);) {
 		/* Expand OSTA compressed Unicode to Unicode */
 		uint32_t c = ocu[i++];
 		if (cmp_id == 16)
 			c = (c << 8) | ocu[i++];
 
 		len = conv_f(c, &utf_o->u_name[utf_o->u_len],
-			     UDF_NAME_LEN - 2 - utf_o->u_len);
+			     UDF_NAME_LEN - utf_o->u_len);
 		/* Valid character? */
 		if (len >= 0)
 			utf_o->u_len += len;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 6/7] udf: Remove struct ustr as non-needed intermediate storage
  2015-12-24 16:25 [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support Andrew Gabbasov
                   ` (4 preceding siblings ...)
  2015-12-24 16:25 ` [PATCH v2 5/7] udf: Adjust UDF_NAME_LEN to better reflect actual restrictions Andrew Gabbasov
@ 2015-12-24 16:25 ` Andrew Gabbasov
  2016-01-04 12:32   ` Jan Kara
  2015-12-24 16:25 ` [PATCH v2 7/7] udf: Merge linux specific translation into CS0 conversion function Andrew Gabbasov
  2016-01-04 13:30 ` [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support Jan Kara
  7 siblings, 1 reply; 17+ messages in thread
From: Andrew Gabbasov @ 2015-12-24 16:25 UTC (permalink / raw)
  To: Jan Kara, linux-kernel

Although 'struct ustr' tries to structurize the data by combining
the string and its length, it doesn't actually make much benefit,
since it saves only one parameter, but introduces an extra copying
of the whole buffer, serving as an intermediate storage. It looks
quite inefficient and not actually needed.

This commit gets rid of the struct ustr by changing the parameters
of some functions appropriately.

Also, it removes using 'dstring' type, since it doesn't make much
sense too.

Just using the occasion, add a 'const' qualifier to udf_get_filename
to make consistent parameters sets.

Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
---
 fs/udf/super.c   |  36 +++++---------
 fs/udf/udfdecl.h |  13 ++---
 fs/udf/unicode.c | 143 +++++++++++++++++--------------------------------------
 3 files changed, 59 insertions(+), 133 deletions(-)

diff --git a/fs/udf/super.c b/fs/udf/super.c
index a801721..c1e9d45 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -891,18 +891,14 @@ static int udf_find_fileset(struct super_block *sb,
 static int udf_load_pvoldesc(struct super_block *sb, sector_t block)
 {
 	struct primaryVolDesc *pvoldesc;
-	struct ustr *instr, *outstr;
+	uint8_t *outstr;
 	struct buffer_head *bh;
 	uint16_t ident;
 	int ret = -ENOMEM;
 
-	instr = kmalloc(sizeof(struct ustr), GFP_NOFS);
-	if (!instr)
-		return -ENOMEM;
-
-	outstr = kmalloc(sizeof(struct ustr), GFP_NOFS);
+	outstr = kmalloc(128, GFP_NOFS);
 	if (!outstr)
-		goto out1;
+		return -ENOMEM;
 
 	bh = udf_read_tagged(sb, block, block, &ident);
 	if (!bh) {
@@ -927,31 +923,25 @@ static int udf_load_pvoldesc(struct super_block *sb, sector_t block)
 #endif
 	}
 
-	if (!udf_build_ustr(instr, pvoldesc->volIdent, 32)) {
-		ret = udf_CS0toUTF8(outstr, instr);
-		if (ret < 0)
-			goto out_bh;
+	ret = udf_CS0toUTF8(outstr, 31, pvoldesc->volIdent, 32);
+	if (ret < 0)
+		goto out_bh;
 
-		strncpy(UDF_SB(sb)->s_volume_ident, outstr->u_name,
-			outstr->u_len > 31 ? 31 : outstr->u_len);
-		udf_debug("volIdent[] = '%s'\n", UDF_SB(sb)->s_volume_ident);
-	}
+	strncpy(UDF_SB(sb)->s_volume_ident, outstr, ret);
+	udf_debug("volIdent[] = '%s'\n", UDF_SB(sb)->s_volume_ident);
 
-	if (!udf_build_ustr(instr, pvoldesc->volSetIdent, 128)) {
-		ret = udf_CS0toUTF8(outstr, instr);
-		if (ret < 0)
-			goto out_bh;
+	ret = udf_CS0toUTF8(outstr, 127, pvoldesc->volSetIdent, 128);
+	if (ret < 0)
+		goto out_bh;
 
-		udf_debug("volSetIdent[] = '%s'\n", outstr->u_name);
-	}
+	outstr[ret] = 0;
+	udf_debug("volSetIdent[] = '%s'\n", outstr);
 
 	ret = 0;
 out_bh:
 	brelse(bh);
 out2:
 	kfree(outstr);
-out1:
-	kfree(instr);
 	return ret;
 }
 
diff --git a/fs/udf/udfdecl.h b/fs/udf/udfdecl.h
index 2a70a63..fbd0e7f 100644
--- a/fs/udf/udfdecl.h
+++ b/fs/udf/udfdecl.h
@@ -107,12 +107,6 @@ struct generic_desc {
 	__le32		volDescSeqNum;
 };
 
-struct ustr {
-	uint8_t u_cmpID;
-	uint8_t u_name[UDF_NAME_LEN];
-	uint8_t u_len;
-};
-
 
 /* super.c */
 
@@ -211,12 +205,11 @@ udf_get_lb_pblock(struct super_block *sb, struct kernel_lb_addr *loc,
 }
 
 /* unicode.c */
-extern int udf_get_filename(struct super_block *, uint8_t *, int, uint8_t *,
-			    int);
+extern int udf_get_filename(struct super_block *, const uint8_t *, int,
+			    uint8_t *, int);
 extern int udf_put_filename(struct super_block *, const uint8_t *, int,
 			    uint8_t *, int);
-extern int udf_build_ustr(struct ustr *, dstring *, int);
-extern int udf_CS0toUTF8(struct ustr *, const struct ustr *);
+extern int udf_CS0toUTF8(uint8_t *, int, const uint8_t *, int);
 
 /* ialloc.c */
 extern void udf_free_inode(struct inode *);
diff --git a/fs/udf/unicode.c b/fs/udf/unicode.c
index 7eaa865..f1cdeac 100644
--- a/fs/udf/unicode.c
+++ b/fs/udf/unicode.c
@@ -28,53 +28,8 @@
 
 #include "udf_sb.h"
 
-static int udf_translate_to_linux(uint8_t *, int, uint8_t *, int, uint8_t *,
-				  int);
-
-static int udf_char_to_ustr(struct ustr *dest, const uint8_t *src, int strlen)
-{
-	if ((!dest) || (!src) || (!strlen) || (strlen > UDF_NAME_LEN))
-		return 0;
-
-	memset(dest, 0, sizeof(struct ustr));
-	memcpy(dest->u_name, src, strlen);
-	dest->u_cmpID = 0x08;
-	dest->u_len = strlen;
-
-	return strlen;
-}
-
-/*
- * udf_build_ustr
- */
-int udf_build_ustr(struct ustr *dest, dstring *ptr, int size)
-{
-	int usesize;
-
-	if (!dest || !ptr || !size)
-		return -1;
-	BUG_ON(size < 2);
-
-	usesize = min_t(size_t, ptr[size - 1], sizeof(dest->u_name));
-	usesize = min(usesize, size - 2);
-	dest->u_cmpID = ptr[0];
-	dest->u_len = usesize;
-	memcpy(dest->u_name, ptr + 1, usesize);
-	memset(dest->u_name + usesize, 0, sizeof(dest->u_name) - usesize);
-
-	return 0;
-}
-
-/*
- * udf_build_ustr_exact
- */
-static void udf_build_ustr_exact(struct ustr *dest, dstring *ptr, int exactsize)
-{
-	memset(dest, 0, sizeof(struct ustr));
-	dest->u_cmpID = ptr[0];
-	dest->u_len = exactsize - 1;
-	memcpy(dest->u_name, ptr + 1, exactsize - 1);
-}
+static int udf_translate_to_linux(uint8_t *, int, const uint8_t *, int,
+				  const uint8_t *, int);
 
 static int udf_uni2char_utf8(wchar_t uni,
 			     unsigned char *out,
@@ -159,51 +114,48 @@ static int udf_char2uni_utf8(const unsigned char *in,
 	return u_len;
 }
 
-static int udf_name_from_CS0(struct ustr *utf_o,
-			     const struct ustr *ocu_i,
+static int udf_name_from_CS0(uint8_t *str_o, int str_max_len,
+			     const uint8_t *ocu, int ocu_len,
 			     int (*conv_f)(wchar_t, unsigned char *, int))
 {
-	const uint8_t *ocu;
-	uint8_t cmp_id, ocu_len;
+	uint8_t cmp_id;
 	int i, len;
+	int str_o_len = 0;
 
+	if (str_max_len <= 0)
+		return 0;
 
-	ocu_len = ocu_i->u_len;
 	if (ocu_len == 0) {
-		memset(utf_o, 0, sizeof(struct ustr));
+		memset(str_o, 0, str_max_len);
 		return 0;
 	}
 
-	cmp_id = ocu_i->u_cmpID;
+	cmp_id = ocu[0];
 	if (cmp_id != 8 && cmp_id != 16) {
-		memset(utf_o, 0, sizeof(struct ustr));
-		pr_err("unknown compression code (%d) stri=%s\n",
-		       cmp_id, ocu_i->u_name);
+		memset(str_o, 0, str_max_len);
+		pr_err("unknown compression code (%d) stri=%s\n", cmp_id, ocu);
 		return -EINVAL;
 	}
 
-	ocu = ocu_i->u_name;
-	utf_o->u_len = 0;
-	for (i = 0; (i < ocu_len) && (utf_o->u_len < UDF_NAME_LEN);) {
+	for (i = 1; (i < ocu_len) && (str_o_len < str_max_len);) {
 		/* Expand OSTA compressed Unicode to Unicode */
 		uint32_t c = ocu[i++];
 		if (cmp_id == 16)
 			c = (c << 8) | ocu[i++];
 
-		len = conv_f(c, &utf_o->u_name[utf_o->u_len],
-			     UDF_NAME_LEN - utf_o->u_len);
+		len = conv_f(c, &str_o[str_o_len], str_max_len - str_o_len);
 		/* Valid character? */
 		if (len >= 0)
-			utf_o->u_len += len;
+			str_o_len += len;
 		else
-			utf_o->u_name[utf_o->u_len++] = '?';
+			str_o[str_o_len++] = '?';
 	}
-	utf_o->u_cmpID = 8;
 
-	return utf_o->u_len;
+	return str_o_len;
 }
 
-static int udf_name_to_CS0(dstring *ocu, struct ustr *uni, int length,
+static int udf_name_to_CS0(uint8_t *ocu, int ocu_max_len,
+			   const uint8_t *str_i, int str_len,
 			   int (*conv_f)(const unsigned char *, int, wchar_t *))
 {
 	int i, len;
@@ -211,15 +163,18 @@ static int udf_name_to_CS0(dstring *ocu, struct ustr *uni, int length,
 	wchar_t uni_char;
 	int u_len, u_ch;
 
-	memset(ocu, 0, sizeof(dstring) * length);
+	if (ocu_max_len <= 0)
+		return 0;
+
+	memset(ocu, 0, ocu_max_len);
 	ocu[0] = 8;
 	max_val = 0xff;
 	u_ch = 1;
 
 try_again:
-	u_len = 0;
-	for (i = 0; (i < uni->u_len) && ((u_len + 1 + u_ch) < length); i++) {
-		len = conv_f(&uni->u_name[i], uni->u_len - i, &uni_char);
+	u_len = 1;
+	for (i = 0; (i < str_len) && ((u_len + u_ch) <= ocu_max_len); i++) {
+		len = conv_f(&str_i[i], str_len - i, &uni_char);
 		if (!len)
 			continue;
 		/* Invalid character, deal with it */
@@ -236,41 +191,37 @@ try_again:
 		}
 
 		if (max_val == 0xffff)
-			ocu[++u_len] = (uint8_t)(uni_char >> 8);
-		ocu[++u_len] = (uint8_t)(uni_char & 0xff);
+			ocu[u_len++] = (uint8_t)(uni_char >> 8);
+		ocu[u_len++] = (uint8_t)(uni_char & 0xff);
 		i += len - 1;
 	}
 
-	ocu[length - 1] = (uint8_t)u_len + 1;
-	return u_len + 1;
+	return u_len;
 }
 
-int udf_CS0toUTF8(struct ustr *utf_o, const struct ustr *ocu_i)
+int udf_CS0toUTF8(uint8_t *utf_o, int o_len, const uint8_t *ocu_i, int i_len)
 {
-	return udf_name_from_CS0(utf_o, ocu_i, udf_uni2char_utf8);
+	return udf_name_from_CS0(utf_o, o_len, ocu_i, i_len,
+				 udf_uni2char_utf8);
 }
 
-int udf_get_filename(struct super_block *sb, uint8_t *sname, int slen,
+int udf_get_filename(struct super_block *sb, const uint8_t *sname, int slen,
 		     uint8_t *dname, int dlen)
 {
-	struct ustr *filename, *unifilename;
+	uint8_t *filename;
 	int (*conv_f)(wchar_t, unsigned char *, int);
 	int ret;
 
 	if (!slen)
 		return -EIO;
 
-	filename = kmalloc(sizeof(struct ustr), GFP_NOFS);
+	if (dlen <= 0)
+		return 0;
+
+	filename = kmalloc(dlen, GFP_NOFS);
 	if (!filename)
 		return -ENOMEM;
 
-	unifilename = kmalloc(sizeof(struct ustr), GFP_NOFS);
-	if (!unifilename) {
-		ret = -ENOMEM;
-		goto out1;
-	}
-
-	udf_build_ustr_exact(unifilename, sname, slen);
 	if (UDF_QUERY_FLAG(sb, UDF_FLAG_UTF8)) {
 		conv_f = udf_uni2char_utf8;
 	} else if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP)) {
@@ -278,21 +229,17 @@ int udf_get_filename(struct super_block *sb, uint8_t *sname, int slen,
 	} else
 		BUG();
 
-	ret = udf_name_from_CS0(filename, unifilename, conv_f);
+	ret = udf_name_from_CS0(filename, dlen, sname, slen, conv_f);
 	if (ret < 0) {
 		udf_debug("Failed in udf_get_filename: sname = %s\n", sname);
 		goto out2;
 	}
 
-	ret = udf_translate_to_linux(dname, dlen,
-				     filename->u_name, filename->u_len,
-				     unifilename->u_name, unifilename->u_len);
+	ret = udf_translate_to_linux(dname, dlen, filename, dlen, sname, slen);
 	/* Zero length filename isn't valid... */
 	if (ret == 0)
 		ret = -EINVAL;
 out2:
-	kfree(unifilename);
-out1:
 	kfree(filename);
 	return ret;
 }
@@ -300,12 +247,8 @@ out1:
 int udf_put_filename(struct super_block *sb, const uint8_t *sname, int slen,
 		     uint8_t *dname, int dlen)
 {
-	struct ustr unifilename;
 	int (*conv_f)(const unsigned char *, int, wchar_t *);
 
-	if (!udf_char_to_ustr(&unifilename, sname, slen))
-		return 0;
-
 	if (UDF_QUERY_FLAG(sb, UDF_FLAG_UTF8)) {
 		conv_f = udf_char2uni_utf8;
 	} else if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP)) {
@@ -313,7 +256,7 @@ int udf_put_filename(struct super_block *sb, const uint8_t *sname, int slen,
 	} else
 		BUG();
 
-	return udf_name_to_CS0(dname, &unifilename, dlen, conv_f);
+	return udf_name_to_CS0(dname, dlen, sname, slen, conv_f);
 }
 
 #define ILLEGAL_CHAR_MARK	'_'
@@ -324,8 +267,8 @@ int udf_put_filename(struct super_block *sb, const uint8_t *sname, int slen,
 #define CRC_LEN			5
 
 static int udf_translate_to_linux(uint8_t *newName, int newLen,
-				  uint8_t *udfName, int udfLen,
-				  uint8_t *fidName, int fidNameLen)
+				  const uint8_t *udfName, int udfLen,
+				  const uint8_t *fidName, int fidNameLen)
 {
 	int index, newIndex = 0, needsCRC = 0;
 	int extIndex = 0, newExtIndex = 0, hasExt = 0;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 7/7] udf: Merge linux specific translation into CS0 conversion function
  2015-12-24 16:25 [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support Andrew Gabbasov
                   ` (5 preceding siblings ...)
  2015-12-24 16:25 ` [PATCH v2 6/7] udf: Remove struct ustr as non-needed intermediate storage Andrew Gabbasov
@ 2015-12-24 16:25 ` Andrew Gabbasov
  2016-01-04 13:25   ` Jan Kara
  2016-01-04 13:30 ` [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support Jan Kara
  7 siblings, 1 reply; 17+ messages in thread
From: Andrew Gabbasov @ 2015-12-24 16:25 UTC (permalink / raw)
  To: Jan Kara, linux-kernel

Current implementation of udf_translate_to_linux function does not
support multi-bytes characters at all: it counts bytes while calculating
extension length, when inserting CRC inside the name it doesn't
take into account inter-character boundaries and can break into
the middle of the character.

The most efficient way to properly support multi-bytes characters is
merging of translation operations directly into conversion function.
This can help to avoid extra passes along the string or parsing
the multi-bytes character back into unicode to find out it's length.

Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
---
 fs/udf/unicode.c | 260 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 141 insertions(+), 119 deletions(-)

diff --git a/fs/udf/unicode.c b/fs/udf/unicode.c
index f1cdeac..1dc967d 100644
--- a/fs/udf/unicode.c
+++ b/fs/udf/unicode.c
@@ -28,9 +28,6 @@
 
 #include "udf_sb.h"
 
-static int udf_translate_to_linux(uint8_t *, int, const uint8_t *, int,
-				  const uint8_t *, int);
-
 static int udf_uni2char_utf8(wchar_t uni,
 			     unsigned char *out,
 			     int boundlen)
@@ -114,13 +111,32 @@ static int udf_char2uni_utf8(const unsigned char *in,
 	return u_len;
 }
 
+#define ILLEGAL_CHAR_MARK	'_'
+#define EXT_MARK		'.'
+#define CRC_MARK		'#'
+#define EXT_SIZE		5
+/* Number of chars we need to store generated CRC to make filename unique */
+#define CRC_LEN			5
+
 static int udf_name_from_CS0(uint8_t *str_o, int str_max_len,
 			     const uint8_t *ocu, int ocu_len,
-			     int (*conv_f)(wchar_t, unsigned char *, int))
+			     int (*conv_f)(wchar_t, unsigned char *, int),
+			     int translate)
 {
+	uint32_t c;
 	uint8_t cmp_id;
 	int i, len;
-	int str_o_len = 0;
+	int u_ch;
+	int firstDots = 0, needsCRC = 0, illChar;
+	int ext_i_len, ext_max_len;
+	int str_o_len = 0;	/* Length of resulting output */
+	int ext_o_len = 0;	/* Extension output length */
+	int ext_crc_len = 0;	/* Extension output length if used with CRC */
+	int i_ext = -1;		/* Extension position in input buffer */
+	int o_crc = 0;		/* Rightmost possible output pos for CRC+ext */
+	unsigned short valueCRC;
+	uint8_t ext[EXT_SIZE * NLS_MAX_CHARSET_SIZE + 1];
+	uint8_t crc[CRC_LEN];
 
 	if (str_max_len <= 0)
 		return 0;
@@ -133,22 +149,134 @@ static int udf_name_from_CS0(uint8_t *str_o, int str_max_len,
 	cmp_id = ocu[0];
 	if (cmp_id != 8 && cmp_id != 16) {
 		memset(str_o, 0, str_max_len);
-		pr_err("unknown compression code (%d) stri=%s\n", cmp_id, ocu);
+		pr_err("unknown compression code (%d)\n", cmp_id);
 		return -EINVAL;
 	}
+	u_ch = cmp_id >> 3;
+
+	ocu++;
+	ocu_len--;
+
+	if (translate) {
+		/* Look for extension */
+		for (i = (ocu_len & ~(u_ch - 1)) - u_ch, ext_i_len = 0;
+		     (i >= 0) && (ext_i_len < EXT_SIZE);
+		     i -= u_ch, ext_i_len++) {
+
+			c = ocu[i];
+			if (u_ch > 1)
+				c = (c << 8) | ocu[i + 1];
+
+			if (c == EXT_MARK) {
+				if (ext_i_len)
+					i_ext = i;
+				break;
+			}
+		}
+		if (i_ext >= 0) {
+			/* Convert extension */
+			ext_max_len = min_t(int, sizeof(ext), str_max_len);
+			ext[ext_o_len++] = EXT_MARK;
+			illChar = 0;
+			for (i = i_ext + u_ch; i < ocu_len;) {
+
+				c = ocu[i++];
+				if (u_ch > 1)
+					c = (c << 8) | ocu[i++];
+
+				if (c == '/' || c == 0) {
+					if (illChar)
+						continue;
+					illChar = 1;
+					needsCRC = 1;
+					c = ILLEGAL_CHAR_MARK;
+				} else {
+					illChar = 0;
+				}
+
+				len = conv_f(c, &ext[ext_o_len],
+					     ext_max_len - ext_o_len);
+				/* Valid character? */
+				if (len >= 0) {
+					ext_o_len += len;
+				} else {
+					ext[ext_o_len++] = '?';
+					needsCRC = 1;
+				}
+				if ((ext_o_len + CRC_LEN) < str_max_len)
+					ext_crc_len = ext_o_len;
+			}
+		}
+	}
+
+	illChar = 0;
+	for (i = 0; i < ocu_len;) {
+
+		if (str_o_len >= str_max_len) {
+			needsCRC = 1;
+			break;
+		}
+
+		if (translate && (i == i_ext)) {
+			if (str_o_len > (str_max_len - ext_o_len))
+				needsCRC = 1;
+			break;
+		}
 
-	for (i = 1; (i < ocu_len) && (str_o_len < str_max_len);) {
 		/* Expand OSTA compressed Unicode to Unicode */
-		uint32_t c = ocu[i++];
-		if (cmp_id == 16)
+		c = ocu[i++];
+		if (u_ch > 1)
 			c = (c << 8) | ocu[i++];
 
+		if (translate) {
+			if ((c == '.') && (firstDots >= 0))
+				firstDots++;
+			else
+				firstDots = -1;
+
+			if (c == '/' || c == 0) {
+				if (illChar)
+					continue;
+				illChar = 1;
+				needsCRC = 1;
+				c = ILLEGAL_CHAR_MARK;
+			} else {
+				illChar = 0;
+			}
+		}
+
 		len = conv_f(c, &str_o[str_o_len], str_max_len - str_o_len);
 		/* Valid character? */
-		if (len >= 0)
+		if (len >= 0) {
 			str_o_len += len;
-		else
+		} else {
 			str_o[str_o_len++] = '?';
+			needsCRC = 1;
+		}
+		if (str_o_len <= (str_max_len - ext_o_len - CRC_LEN))
+			o_crc = str_o_len;
+	}
+
+	if (translate) {
+		if ((firstDots == 1) || (firstDots == 2))
+			needsCRC = 1;
+		if (needsCRC) {
+			str_o_len = o_crc;
+			valueCRC = crc_itu_t(0, ocu, ocu_len);
+			crc[0] = CRC_MARK;
+			crc[1] = hex_asc_upper_hi(valueCRC >> 8);
+			crc[2] = hex_asc_upper_lo(valueCRC >> 8);
+			crc[3] = hex_asc_upper_hi(valueCRC);
+			crc[4] = hex_asc_upper_lo(valueCRC);
+			len = min_t(int, CRC_LEN, str_max_len - str_o_len);
+			memcpy(&str_o[str_o_len], crc, len);
+			str_o_len += len;
+			ext_o_len = ext_crc_len;
+		}
+		if (ext_o_len > 0) {
+			memcpy(&str_o[str_o_len], ext, ext_o_len);
+			str_o_len += ext_o_len;
+		}
 	}
 
 	return str_o_len;
@@ -202,13 +330,12 @@ try_again:
 int udf_CS0toUTF8(uint8_t *utf_o, int o_len, const uint8_t *ocu_i, int i_len)
 {
 	return udf_name_from_CS0(utf_o, o_len, ocu_i, i_len,
-				 udf_uni2char_utf8);
+				 udf_uni2char_utf8, 0);
 }
 
 int udf_get_filename(struct super_block *sb, const uint8_t *sname, int slen,
 		     uint8_t *dname, int dlen)
 {
-	uint8_t *filename;
 	int (*conv_f)(wchar_t, unsigned char *, int);
 	int ret;
 
@@ -218,10 +345,6 @@ int udf_get_filename(struct super_block *sb, const uint8_t *sname, int slen,
 	if (dlen <= 0)
 		return 0;
 
-	filename = kmalloc(dlen, GFP_NOFS);
-	if (!filename)
-		return -ENOMEM;
-
 	if (UDF_QUERY_FLAG(sb, UDF_FLAG_UTF8)) {
 		conv_f = udf_uni2char_utf8;
 	} else if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP)) {
@@ -229,18 +352,10 @@ int udf_get_filename(struct super_block *sb, const uint8_t *sname, int slen,
 	} else
 		BUG();
 
-	ret = udf_name_from_CS0(filename, dlen, sname, slen, conv_f);
-	if (ret < 0) {
-		udf_debug("Failed in udf_get_filename: sname = %s\n", sname);
-		goto out2;
-	}
-
-	ret = udf_translate_to_linux(dname, dlen, filename, dlen, sname, slen);
+	ret = udf_name_from_CS0(dname, dlen, sname, slen, conv_f, 1);
 	/* Zero length filename isn't valid... */
 	if (ret == 0)
 		ret = -EINVAL;
-out2:
-	kfree(filename);
 	return ret;
 }
 
@@ -259,96 +374,3 @@ int udf_put_filename(struct super_block *sb, const uint8_t *sname, int slen,
 	return udf_name_to_CS0(dname, dlen, sname, slen, conv_f);
 }
 
-#define ILLEGAL_CHAR_MARK	'_'
-#define EXT_MARK		'.'
-#define CRC_MARK		'#'
-#define EXT_SIZE 		5
-/* Number of chars we need to store generated CRC to make filename unique */
-#define CRC_LEN			5
-
-static int udf_translate_to_linux(uint8_t *newName, int newLen,
-				  const uint8_t *udfName, int udfLen,
-				  const uint8_t *fidName, int fidNameLen)
-{
-	int index, newIndex = 0, needsCRC = 0;
-	int extIndex = 0, newExtIndex = 0, hasExt = 0;
-	unsigned short valueCRC;
-	uint8_t curr;
-
-	if (udfName[0] == '.' &&
-	    (udfLen == 1 || (udfLen == 2 && udfName[1] == '.'))) {
-		needsCRC = 1;
-		newIndex = udfLen;
-		memcpy(newName, udfName, udfLen);
-	} else {
-		for (index = 0; index < udfLen; index++) {
-			curr = udfName[index];
-			if (curr == '/' || curr == 0) {
-				needsCRC = 1;
-				curr = ILLEGAL_CHAR_MARK;
-				while (index + 1 < udfLen &&
-						(udfName[index + 1] == '/' ||
-						 udfName[index + 1] == 0))
-					index++;
-			}
-			if (curr == EXT_MARK &&
-					(udfLen - index - 1) <= EXT_SIZE) {
-				if (udfLen == index + 1)
-					hasExt = 0;
-				else {
-					hasExt = 1;
-					extIndex = index;
-					newExtIndex = newIndex;
-				}
-			}
-			if (newIndex < newLen)
-				newName[newIndex++] = curr;
-			else
-				needsCRC = 1;
-		}
-	}
-	if (needsCRC) {
-		uint8_t ext[EXT_SIZE];
-		int localExtIndex = 0;
-
-		if (hasExt) {
-			int maxFilenameLen;
-			for (index = 0;
-			     index < EXT_SIZE && extIndex + index + 1 < udfLen;
-			     index++) {
-				curr = udfName[extIndex + index + 1];
-
-				if (curr == '/' || curr == 0) {
-					needsCRC = 1;
-					curr = ILLEGAL_CHAR_MARK;
-					while (extIndex + index + 2 < udfLen &&
-					      (index + 1 < EXT_SIZE &&
-						(udfName[extIndex + index + 2] == '/' ||
-						 udfName[extIndex + index + 2] == 0)))
-						index++;
-				}
-				ext[localExtIndex++] = curr;
-			}
-			maxFilenameLen = newLen - CRC_LEN - localExtIndex;
-			if (newIndex > maxFilenameLen)
-				newIndex = maxFilenameLen;
-			else
-				newIndex = newExtIndex;
-		} else if (newIndex > newLen - CRC_LEN)
-			newIndex = newLen - CRC_LEN;
-		newName[newIndex++] = CRC_MARK;
-		valueCRC = crc_itu_t(0, fidName, fidNameLen);
-		newName[newIndex++] = hex_asc_upper_hi(valueCRC >> 8);
-		newName[newIndex++] = hex_asc_upper_lo(valueCRC >> 8);
-		newName[newIndex++] = hex_asc_upper_hi(valueCRC);
-		newName[newIndex++] = hex_asc_upper_lo(valueCRC);
-
-		if (hasExt) {
-			newName[newIndex++] = EXT_MARK;
-			for (index = 0; index < localExtIndex; index++)
-				newName[newIndex++] = ext[index];
-		}
-	}
-
-	return newIndex;
-}
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 6/7] udf: Remove struct ustr as non-needed intermediate storage
  2015-12-24 16:25 ` [PATCH v2 6/7] udf: Remove struct ustr as non-needed intermediate storage Andrew Gabbasov
@ 2016-01-04 12:32   ` Jan Kara
  2016-01-11 13:31     ` Andrew Gabbasov
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2016-01-04 12:32 UTC (permalink / raw)
  To: Andrew Gabbasov; +Cc: Jan Kara, linux-kernel

On Thu 24-12-15 10:25:37, Andrew Gabbasov wrote:
> Although 'struct ustr' tries to structurize the data by combining
> the string and its length, it doesn't actually make much benefit,
> since it saves only one parameter, but introduces an extra copying
> of the whole buffer, serving as an intermediate storage. It looks
> quite inefficient and not actually needed.
> 
> This commit gets rid of the struct ustr by changing the parameters
> of some functions appropriately.
> 
> Also, it removes using 'dstring' type, since it doesn't make much
> sense too.
> 
> Just using the occasion, add a 'const' qualifier to udf_get_filename
> to make consistent parameters sets.

Some comments below:

> @@ -211,15 +163,18 @@ static int udf_name_to_CS0(dstring *ocu, struct ustr *uni, int length,
>  	wchar_t uni_char;
>  	int u_len, u_ch;
>  
> -	memset(ocu, 0, sizeof(dstring) * length);
> +	if (ocu_max_len <= 0)
> +		return 0;
> +
> +	memset(ocu, 0, ocu_max_len);
>  	ocu[0] = 8;
>  	max_val = 0xff;
>  	u_ch = 1;
>  
>  try_again:
> -	u_len = 0;
> -	for (i = 0; (i < uni->u_len) && ((u_len + 1 + u_ch) < length); i++) {
> -		len = conv_f(&uni->u_name[i], uni->u_len - i, &uni_char);
> +	u_len = 1;
> +	for (i = 0; (i < str_len) && ((u_len + u_ch) <= ocu_max_len); i++) {
> +		len = conv_f(&str_i[i], str_len - i, &uni_char);
>  		if (!len)
>  			continue;
>  		/* Invalid character, deal with it */
> @@ -236,41 +191,37 @@ try_again:
>  		}
>  
>  		if (max_val == 0xffff)
> -			ocu[++u_len] = (uint8_t)(uni_char >> 8);
> -		ocu[++u_len] = (uint8_t)(uni_char & 0xff);
> +			ocu[u_len++] = (uint8_t)(uni_char >> 8);
> +		ocu[u_len++] = (uint8_t)(uni_char & 0xff);
>  		i += len - 1;
>  	}
>  
> -	ocu[length - 1] = (uint8_t)u_len + 1;
> -	return u_len + 1;
> +	return u_len;

It seems you removed setting of the length in the resulting CS0 string.

> -int udf_CS0toUTF8(struct ustr *utf_o, const struct ustr *ocu_i)
> +int udf_CS0toUTF8(uint8_t *utf_o, int o_len, const uint8_t *ocu_i, int i_len)
>  {
> -	return udf_name_from_CS0(utf_o, ocu_i, udf_uni2char_utf8);
> +	return udf_name_from_CS0(utf_o, o_len, ocu_i, i_len,
> +				 udf_uni2char_utf8);
>  }
>  
> -int udf_get_filename(struct super_block *sb, uint8_t *sname, int slen,
> +int udf_get_filename(struct super_block *sb, const uint8_t *sname, int slen,
>  		     uint8_t *dname, int dlen)
>  {
> -	struct ustr *filename, *unifilename;
> +	uint8_t *filename;
>  	int (*conv_f)(wchar_t, unsigned char *, int);
>  	int ret;
>  
>  	if (!slen)
>  		return -EIO;
>  
> -	filename = kmalloc(sizeof(struct ustr), GFP_NOFS);
> +	if (dlen <= 0)
> +		return 0;
> +
> +	filename = kmalloc(dlen, GFP_NOFS);
>  	if (!filename)
>  		return -ENOMEM;
>  
> -	unifilename = kmalloc(sizeof(struct ustr), GFP_NOFS);
> -	if (!unifilename) {
> -		ret = -ENOMEM;
> -		goto out1;
> -	}
> -
> -	udf_build_ustr_exact(unifilename, sname, slen);
>  	if (UDF_QUERY_FLAG(sb, UDF_FLAG_UTF8)) {
>  		conv_f = udf_uni2char_utf8;
>  	} else if (UDF_QUERY_FLAG(sb, UDF_FLAG_NLS_MAP)) {
> @@ -278,21 +229,17 @@ int udf_get_filename(struct super_block *sb, uint8_t *sname, int slen,
>  	} else
>  		BUG();
>  
> -	ret = udf_name_from_CS0(filename, unifilename, conv_f);
> +	ret = udf_name_from_CS0(filename, dlen, sname, slen, conv_f);
>  	if (ret < 0) {
>  		udf_debug("Failed in udf_get_filename: sname = %s\n", sname);
>  		goto out2;
>  	}
>  
> -	ret = udf_translate_to_linux(dname, dlen,
> -				     filename->u_name, filename->u_len,
> -				     unifilename->u_name, unifilename->u_len);
> +	ret = udf_translate_to_linux(dname, dlen, filename, dlen, sname, slen);

Umm, but here you pass CS0 name to udf_translate_to_linux() including
beginning encoding character which didn't happen before your patch. So in
case we have to compute CRC it will be different.


								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 7/7] udf: Merge linux specific translation into CS0 conversion function
  2015-12-24 16:25 ` [PATCH v2 7/7] udf: Merge linux specific translation into CS0 conversion function Andrew Gabbasov
@ 2016-01-04 13:25   ` Jan Kara
  2016-01-11 13:31     ` Andrew Gabbasov
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2016-01-04 13:25 UTC (permalink / raw)
  To: Andrew Gabbasov; +Cc: Jan Kara, linux-kernel

On Thu 24-12-15 10:25:38, Andrew Gabbasov wrote:
> Current implementation of udf_translate_to_linux function does not
> support multi-bytes characters at all: it counts bytes while calculating
> extension length, when inserting CRC inside the name it doesn't
> take into account inter-character boundaries and can break into
> the middle of the character.
> 
> The most efficient way to properly support multi-bytes characters is
> merging of translation operations directly into conversion function.
> This can help to avoid extra passes along the string or parsing
> the multi-bytes character back into unicode to find out it's length.
> 
> Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>

Looks mostly good. A few comments below.

> ---
>  fs/udf/unicode.c | 260 ++++++++++++++++++++++++++++++-------------------------
>  1 file changed, 141 insertions(+), 119 deletions(-)
> 
> diff --git a/fs/udf/unicode.c b/fs/udf/unicode.c
> index f1cdeac..1dc967d 100644
> --- a/fs/udf/unicode.c
> +++ b/fs/udf/unicode.c
> @@ -28,9 +28,6 @@
>  
>  #include "udf_sb.h"
>  
> -static int udf_translate_to_linux(uint8_t *, int, const uint8_t *, int,
> -				  const uint8_t *, int);
> -
>  static int udf_uni2char_utf8(wchar_t uni,
>  			     unsigned char *out,
>  			     int boundlen)
> @@ -114,13 +111,32 @@ static int udf_char2uni_utf8(const unsigned char *in,
>  	return u_len;
>  }
>  
> +#define ILLEGAL_CHAR_MARK	'_'
> +#define EXT_MARK		'.'
> +#define CRC_MARK		'#'
> +#define EXT_SIZE		5
> +/* Number of chars we need to store generated CRC to make filename unique */
> +#define CRC_LEN			5
> +
>  static int udf_name_from_CS0(uint8_t *str_o, int str_max_len,
>  			     const uint8_t *ocu, int ocu_len,
> -			     int (*conv_f)(wchar_t, unsigned char *, int))
> +			     int (*conv_f)(wchar_t, unsigned char *, int),
> +			     int translate)
>  {
> +	uint32_t c;
>  	uint8_t cmp_id;
>  	int i, len;
> -	int str_o_len = 0;
> +	int u_ch;
> +	int firstDots = 0, needsCRC = 0, illChar;
> +	int ext_i_len, ext_max_len;
> +	int str_o_len = 0;	/* Length of resulting output */
> +	int ext_o_len = 0;	/* Extension output length */
> +	int ext_crc_len = 0;	/* Extension output length if used with CRC */
> +	int i_ext = -1;		/* Extension position in input buffer */
> +	int o_crc = 0;		/* Rightmost possible output pos for CRC+ext */
> +	unsigned short valueCRC;
> +	uint8_t ext[EXT_SIZE * NLS_MAX_CHARSET_SIZE + 1];
> +	uint8_t crc[CRC_LEN];
>  
>  	if (str_max_len <= 0)
>  		return 0;
> @@ -133,22 +149,134 @@ static int udf_name_from_CS0(uint8_t *str_o, int str_max_len,
>  	cmp_id = ocu[0];
>  	if (cmp_id != 8 && cmp_id != 16) {
>  		memset(str_o, 0, str_max_len);
> -		pr_err("unknown compression code (%d) stri=%s\n", cmp_id, ocu);
> +		pr_err("unknown compression code (%d)\n", cmp_id);
>  		return -EINVAL;
>  	}
> +	u_ch = cmp_id >> 3;
> +
> +	ocu++;
> +	ocu_len--;
> +

Can we just check whether ocu_len is a multiple of u_ch here and bail out
if not? That would save us some rounding below and also fix possible access
beyond the end of the string in case fs is corrupted.

> +	if (translate) {
> +		/* Look for extension */
> +		for (i = (ocu_len & ~(u_ch - 1)) - u_ch, ext_i_len = 0;
> +		     (i >= 0) && (ext_i_len < EXT_SIZE);
> +		     i -= u_ch, ext_i_len++) {
> +

Just remove the empty line above please.

> +			c = ocu[i];
> +			if (u_ch > 1)
> +				c = (c << 8) | ocu[i + 1];
> +
> +			if (c == EXT_MARK) {
> +				if (ext_i_len)
> +					i_ext = i;
> +				break;
> +			}
> +		}
> +		if (i_ext >= 0) {
> +			/* Convert extension */
> +			ext_max_len = min_t(int, sizeof(ext), str_max_len);
> +			ext[ext_o_len++] = EXT_MARK;
> +			illChar = 0;
> +			for (i = i_ext + u_ch; i < ocu_len;) {
> +

Remove the empty line here please. Also how about using i += u_ch in the
for() above instead of i++ in the two lines below?

> +				c = ocu[i++];
> +				if (u_ch > 1)
> +					c = (c << 8) | ocu[i++];
> +
> +				if (c == '/' || c == 0) {
> +					if (illChar)
> +						continue;
> +					illChar = 1;
> +					needsCRC = 1;
> +					c = ILLEGAL_CHAR_MARK;
> +				} else {
> +					illChar = 0;
> +				}
> +
> +				len = conv_f(c, &ext[ext_o_len],
> +					     ext_max_len - ext_o_len);
> +				/* Valid character? */
> +				if (len >= 0) {
> +					ext_o_len += len;
> +				} else {
> +					ext[ext_o_len++] = '?';
> +					needsCRC = 1;
> +				}
> +				if ((ext_o_len + CRC_LEN) < str_max_len)
> +					ext_crc_len = ext_o_len;
> +			}
> +		}
> +	}
> +
> +	illChar = 0;
> +	for (i = 0; i < ocu_len;) {
> +

Again, remove the empty line, i += u_ch.

> +		if (str_o_len >= str_max_len) {
> +			needsCRC = 1;
> +			break;
> +		}
> +
> +		if (translate && (i == i_ext)) {
> +			if (str_o_len > (str_max_len - ext_o_len))
> +				needsCRC = 1;
> +			break;
> +		}
>  
> -	for (i = 1; (i < ocu_len) && (str_o_len < str_max_len);) {
>  		/* Expand OSTA compressed Unicode to Unicode */
> -		uint32_t c = ocu[i++];
> -		if (cmp_id == 16)
> +		c = ocu[i++];
> +		if (u_ch > 1)
>  			c = (c << 8) | ocu[i++];
>  
> +		if (translate) {
> +			if ((c == '.') && (firstDots >= 0))
> +				firstDots++;
> +			else
> +				firstDots = -1;
> +
> +			if (c == '/' || c == 0) {
> +				if (illChar)
> +					continue;
> +				illChar = 1;
> +				needsCRC = 1;
> +				c = ILLEGAL_CHAR_MARK;
> +			} else {
> +				illChar = 0;
> +			}
> +		}
> +
>  		len = conv_f(c, &str_o[str_o_len], str_max_len - str_o_len);
>  		/* Valid character? */
> -		if (len >= 0)
> +		if (len >= 0) {
>  			str_o_len += len;
> -		else
> +		} else {
>  			str_o[str_o_len++] = '?';
> +			needsCRC = 1;
> +		}

Umm, can you try factoring out body of this loop into a helper function and
using it both during extension parsing and full name parsing? Also I think
that checking whether the name is '.' or '..' could be moved just into the
if (translate) below where you could just directly check it like:
	if (str_o_len <= 2 && str_o[0] == '.' &&
	    (str_o_len == 1 || str_o[1] == '.'))

> +		if (str_o_len <= (str_max_len - ext_o_len - CRC_LEN))
> +			o_crc = str_o_len;
> +	}
> +
> +	if (translate) {
> +		if ((firstDots == 1) || (firstDots == 2))
> +			needsCRC = 1;
> +		if (needsCRC) {
> +			str_o_len = o_crc;
> +			valueCRC = crc_itu_t(0, ocu, ocu_len);
> +			crc[0] = CRC_MARK;
> +			crc[1] = hex_asc_upper_hi(valueCRC >> 8);
> +			crc[2] = hex_asc_upper_lo(valueCRC >> 8);
> +			crc[3] = hex_asc_upper_hi(valueCRC);
> +			crc[4] = hex_asc_upper_lo(valueCRC);
> +			len = min_t(int, CRC_LEN, str_max_len - str_o_len);
> +			memcpy(&str_o[str_o_len], crc, len);
> +			str_o_len += len;
> +			ext_o_len = ext_crc_len;
> +		}
> +		if (ext_o_len > 0) {
> +			memcpy(&str_o[str_o_len], ext, ext_o_len);
> +			str_o_len += ext_o_len;
> +		}
>  	}
>  
>  	return str_o_len;

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support
  2015-12-24 16:25 [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support Andrew Gabbasov
                   ` (6 preceding siblings ...)
  2015-12-24 16:25 ` [PATCH v2 7/7] udf: Merge linux specific translation into CS0 conversion function Andrew Gabbasov
@ 2016-01-04 13:30 ` Jan Kara
  7 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2016-01-04 13:30 UTC (permalink / raw)
  To: Andrew Gabbasov; +Cc: Jan Kara, linux-kernel

On Thu 24-12-15 10:25:31, Andrew Gabbasov wrote:
> V2:
> 
> The single patch was split into several commits for separate logical
> steps. Also, some minor fixes were done in the code of the patches.

Thanks for the patches! I have taken patches 1 and 2 right away so that
they make it for this merge window and CCed them for stable. Patches 3-5
are fine as well but since they are only code cleanups, there is no problem
in them waiting until patches 6 and 7 are updated.

								Honza
> 
> V1:
> 
> Current implementation has several issues in unicode.c, mostly related
> to handling multi-bytes characters in file names:
> 
> - loop ending conditions in udf_CS0toUTF8 and udf_CS0toNLS functions do not
> properly catch the end of output buffer in case of multi-bytes characters,
> allowing out-of-bounds writing and memory corruption;
> 
> - udf_UTF8toCS0 and udf_NLStoCS0 do not check the right boundary of output
> buffer at all, also allowing out-of-bounds writing and memory corruption;
> 
> - udf_translate_to_linux does not take into account multi-bytes characters
> at all (although it is called after converting to UTF8 or NLS): maximal
> length of extension is counted as 5 bytes, that may be incorrect with
> multi-bytes characters; when inserting CRC and extension for long names
> (near the end of the buffer), they are inserted at fixed place at the end,
> that can break into the middle of the multi-bytes character;
> 
> - when being converted from CS0 to UTF8 (or NLS), the name can be truncated
> (even if the sizes in bytes of input and output buffers are the same),
> but the following translating function does not know about it and does not
> insert CRC, as it is assumed by the specs.
> 
> Because of the last item above, it looks like all the checks and
> conversions (re-coding and possible CRC insertions) should be done
> simultaneously in the single function. This means that the listed
> issues can not be fixed independently and separately. So, the whole
> conversion and translation support should be reworked.
> 
> The proposed implementation below fixes the listed issues, and also has
> some additional features:
> 
> - it gets rid of "struct ustr", since it actually just makes an unneeded
> extra copying of the buffer and does not have any other significant
> advantage;
> 
> - it unifies UTF8 and NLS conversions support, since there is no much
> sense to separate these cases;
> 
> - UDF_NAME_LEN constant adjusted to better reflect actual restrictions.
> 
> 
> Andrew Gabbasov (7):
>   udf: Prevent buffer overrun with multi-byte characters
>   udf: Check output buffer length when converting name to CS0
>   udf: Parameterize output length in udf_put_filename
>   udf: Join functions for UTF8 and NLS conversions
>   udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
>   udf: Remove struct ustr as non-needed intermediate storage
>   udf: Merge linux specific translation into CS0 conversion function
> 
>  fs/udf/namei.c   |  16 +-
>  fs/udf/super.c   |  38 ++--
>  fs/udf/udfdecl.h |  21 +-
>  fs/udf/unicode.c | 611 ++++++++++++++++++++++---------------------------------
>  4 files changed, 274 insertions(+), 412 deletions(-)
> 
> -- 
> 2.1.0
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/7] udf: Check output buffer length when converting name to CS0
  2015-12-24 16:25 ` [PATCH v2 2/7] udf: Check output buffer length when converting name to CS0 Andrew Gabbasov
@ 2016-01-04 17:19   ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2016-01-04 17:19 UTC (permalink / raw)
  To: Andrew Gabbasov; +Cc: Jan Kara, linux-kernel

On Thu 24-12-15 10:25:33, Andrew Gabbasov wrote:
> If a name contains at least some characters with Unicode values
> exceeding single byte, the CS0 output should have 2 bytes per character.
> And if other input characters have single byte Unicode values, then
> the single input byte is converted to 2 output bytes, and the length
> of output becomes larger than the length of input. And if the input
> name is long enough, the output length may exceed the allocated buffer
> length.
> 
> All this means that conversion from UTF8 or NLS to CS0 requires
> checking of output length in order to stop when it exceeds the given
> output buffer size.
> 
> Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>

I have taken this patch to my tree with a slight modification that
udf_xxxtoCS0 functions return 0 when they would need to truncate the name.
That way we properly return ENAMETOOLONG when user tries to create name we
cannot store instead of silently truncating it.

								Honza
> ---
>  fs/udf/unicode.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/udf/unicode.c b/fs/udf/unicode.c
> index 95a224b..155f912 100644
> --- a/fs/udf/unicode.c
> +++ b/fs/udf/unicode.c
> @@ -177,17 +177,18 @@ int udf_CS0toUTF8(struct ustr *utf_o, const struct ustr *ocu_i)
>  static int udf_UTF8toCS0(dstring *ocu, struct ustr *utf, int length)
>  {
>  	unsigned c, i, max_val, utf_char;
> -	int utf_cnt, u_len;
> +	int utf_cnt, u_len, u_ch;
>  
>  	memset(ocu, 0, sizeof(dstring) * length);
>  	ocu[0] = 8;
>  	max_val = 0xffU;
> +	u_ch = 1;
>  
>  try_again:
>  	u_len = 0U;
>  	utf_char = 0U;
>  	utf_cnt = 0U;
> -	for (i = 0U; i < utf->u_len; i++) {
> +	for (i = 0U; (i < utf->u_len) && ((u_len + 1 + u_ch) < length); i++) {
>  		c = (uint8_t)utf->u_name[i];
>  
>  		/* Complete a multi-byte UTF-8 character */
> @@ -229,6 +230,7 @@ try_again:
>  			if (max_val == 0xffU) {
>  				max_val = 0xffffU;
>  				ocu[0] = (uint8_t)0x10U;
> +				u_ch = 2;
>  				goto try_again;
>  			}
>  			goto error_out;
> @@ -299,15 +301,16 @@ static int udf_NLStoCS0(struct nls_table *nls, dstring *ocu, struct ustr *uni,
>  	int len;
>  	unsigned i, max_val;
>  	uint16_t uni_char;
> -	int u_len;
> +	int u_len, u_ch;
>  
>  	memset(ocu, 0, sizeof(dstring) * length);
>  	ocu[0] = 8;
>  	max_val = 0xffU;
> +	u_ch = 1;
>  
>  try_again:
>  	u_len = 0U;
> -	for (i = 0U; i < uni->u_len; i++) {
> +	for (i = 0U; (i < uni->u_len) && ((u_len + 1 + u_ch) < length); i++) {
>  		len = nls->char2uni(&uni->u_name[i], uni->u_len - i, &uni_char);
>  		if (!len)
>  			continue;
> @@ -320,6 +323,7 @@ try_again:
>  		if (uni_char > max_val) {
>  			max_val = 0xffffU;
>  			ocu[0] = (uint8_t)0x10U;
> +			u_ch = 2;
>  			goto try_again;
>  		}
>  
> -- 
> 2.1.0
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v2 6/7] udf: Remove struct ustr as non-needed intermediate storage
  2016-01-04 12:32   ` Jan Kara
@ 2016-01-11 13:31     ` Andrew Gabbasov
  2016-01-12 13:39       ` Jan Kara
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Gabbasov @ 2016-01-11 13:31 UTC (permalink / raw)
  To: 'Jan Kara'; +Cc: Jan Kara, linux-kernel

Hi Jan,

Thank you for your review!
And sorry for a delay with response, we had several days of holidays.

> -----Original Message-----
> From: Jan Kara [mailto:jack@suse.cz] 
> Sent: Monday, January 04, 2016 3:33 PM
> To: Gabbasov, Andrew
> Cc: Jan Kara; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 6/7] udf: Remove struct ustr as non-needed
intermediate storage

[skipped]

> >  
> > -	ocu[length - 1] = (uint8_t)u_len + 1;
> > -	return u_len + 1;
> > +	return u_len;
>
> It seems you removed setting of the length in the resulting CS0 string.

Yes, and it was done deliberately.
udf_name_to_CS0 and its caller udf_put_filename functions are used for
writing File Identifier fields only, which are not "dstrings", that is
not containing the length in last byte. The last byte with the length
from these functions would not be copied to filesystem or used in any
other way.

[skipped]

> > -	ret = udf_translate_to_linux(dname, dlen,
> > -				     filename->u_name, filename->u_len,
> > -				     unifilename->u_name,
unifilename->u_len);
> > +	ret = udf_translate_to_linux(dname, dlen, filename, dlen, sname,
slen);
>
> Umm, but here you pass CS0 name to udf_translate_to_linux() including
> beginning encoding character which didn't happen before your patch.
> So in case we have to compute CRC it will be different.

Yes, you are correct, this is a mistake (introduced when splitting
the original single patch). I will correct it in version 3 of the patches
(I will use sname + 1 and slen - 1).
Thank you for catching it!

I will prepare an update soon.

Thanks.

Best regards,
Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v2 7/7] udf: Merge linux specific translation into CS0 conversion function
  2016-01-04 13:25   ` Jan Kara
@ 2016-01-11 13:31     ` Andrew Gabbasov
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Gabbasov @ 2016-01-11 13:31 UTC (permalink / raw)
  To: 'Jan Kara'; +Cc: Jan Kara, linux-kernel

Hi Jan,

> -----Original Message-----
> From: Jan Kara [mailto:jack@suse.cz] 
> Sent: Monday, January 04, 2016 4:25 PM
> To: Gabbasov, Andrew
> Cc: Jan Kara; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 7/7] udf: Merge linux specific translation into CS0
conversion function

[skipped]

> > +	u_ch = cmp_id >> 3;
> > +
> > +	ocu++;
> > +	ocu_len--;
> > +
>
> Can we just check whether ocu_len is a multiple of u_ch here
> and bail out if not? That would save us some rounding below
> and also fix possible access beyond the end of the string
> in case fs is corrupted.

Yes, I think it can be done. I just wasn't sure if having
unaligned length is allowed or not, and tried to support
this case too. We can definitely return -EINVAL if ocu_len
is not a multiple of u_ch.

[skipped]

> > -	for (i = 1; (i < ocu_len) && (str_o_len < str_max_len);) {
> >  		/* Expand OSTA compressed Unicode to Unicode */
> > -		uint32_t c = ocu[i++];
> > -		if (cmp_id == 16)
> > +		c = ocu[i++];
> > +		if (u_ch > 1)
> >  			c = (c << 8) | ocu[i++];
> >  
> > +		if (translate) {
> > +			if ((c == '.') && (firstDots >= 0))
> > +				firstDots++;
> > +			else
> > +				firstDots = -1;
> > +
> > +			if (c == '/' || c == 0) {
> > +				if (illChar)
> > +					continue;
> > +				illChar = 1;
> > +				needsCRC = 1;
> > +				c = ILLEGAL_CHAR_MARK;
> > +			} else {
> > +				illChar = 0;
> > +			}
> > +		}
> > +
> >  		len = conv_f(c, &str_o[str_o_len], str_max_len - str_o_len);
> >  		/* Valid character? */
> > -		if (len >= 0)
> > +		if (len >= 0) {
> >  			str_o_len += len;
> > -		else
> > +		} else {
> >  			str_o[str_o_len++] = '?';
> > +			needsCRC = 1;
> > +		}
>
> Umm, can you try factoring out body of this loop into a helper function
> and using it both during extension parsing and full name parsing?

I'll try to think about it. Although I'm not sure if it could become
a little overcomplicated since that helper could require too much internals
of this calling function.

> Also I think that checking whether the name is '.' or '..' could be moved
> just into the if (translate) below where you could just directly check it
like:
>	if (str_o_len <= 2 && str_o[0] == '.' &&
>	    (str_o_len == 1 || str_o[1] == '.'))

Yes, you are right, it will be simpler.
I'll do it (and get rid of firstDots) in version 3 of the patch.

I'll also make all your suggested changes with empty lines and += u_ch.

Thanks.

Best regards,
Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 6/7] udf: Remove struct ustr as non-needed intermediate storage
  2016-01-11 13:31     ` Andrew Gabbasov
@ 2016-01-12 13:39       ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2016-01-12 13:39 UTC (permalink / raw)
  To: Andrew Gabbasov; +Cc: 'Jan Kara', Jan Kara, linux-kernel

  Hi Andrew,

On Mon 11-01-16 16:31:40, Andrew Gabbasov wrote:
> > >  
> > > -	ocu[length - 1] = (uint8_t)u_len + 1;
> > > -	return u_len + 1;
> > > +	return u_len;
> >
> > It seems you removed setting of the length in the resulting CS0 string.
> 
> Yes, and it was done deliberately.
> udf_name_to_CS0 and its caller udf_put_filename functions are used for
> writing File Identifier fields only, which are not "dstrings", that is
> not containing the length in last byte. The last byte with the length
> from these functions would not be copied to filesystem or used in any
> other way.

I see, you are right. OK.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support
  2016-01-15  8:44 Andrew Gabbasov
@ 2016-01-22 16:41 ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2016-01-22 16:41 UTC (permalink / raw)
  To: Andrew Gabbasov; +Cc: Jan Kara, linux-kernel

On Fri 15-01-16 02:44:18, Andrew Gabbasov wrote:
> V3:
> 
> Patches 1 and 2 skipped from sending since they are already accepted
> by the maintainer (patch 2 with some changes comparing to V2).
> 
> Patches 3 - 5 rebased on top of updated patch 2.
> 
> Patch 6: Fixed a mistake in passing parameters to translate_to_linux():
> the third buffer and length, used for CRC calculation, should be
> passed without leading encoding character.

Thanks! For now I've taken patches 3-6 into my tree. I'll have a look at
patch 7 next week since I need a fresh mind for that.

								Honza
> 
> Patch 7: Main part of body of converting loops extracted to a separate
> helper function. Also, some other modifications addressing maintainer's
> comments to V2.
> 
> V2:
> 
> The single patch was split into several commits for separate logical
> steps. Also, some minor fixes were done in the code of the patches.
> 
> V1:
> 
> Current implementation has several issues in unicode.c, mostly related
> to handling multi-bytes characters in file names:
> 
> - loop ending conditions in udf_CS0toUTF8 and udf_CS0toNLS functions do not
> properly catch the end of output buffer in case of multi-bytes characters,
> allowing out-of-bounds writing and memory corruption;
> 
> - udf_UTF8toCS0 and udf_NLStoCS0 do not check the right boundary of output
> buffer at all, also allowing out-of-bounds writing and memory corruption;
> 
> - udf_translate_to_linux does not take into account multi-bytes characters
> at all (although it is called after converting to UTF8 or NLS): maximal
> length of extension is counted as 5 bytes, that may be incorrect with
> multi-bytes characters; when inserting CRC and extension for long names
> (near the end of the buffer), they are inserted at fixed place at the end,
> that can break into the middle of the multi-bytes character;
> 
> - when being converted from CS0 to UTF8 (or NLS), the name can be truncated
> (even if the sizes in bytes of input and output buffers are the same),
> but the following translating function does not know about it and does not
> insert CRC, as it is assumed by the specs.
> 
> Because of the last item above, it looks like all the checks and
> conversions (re-coding and possible CRC insertions) should be done
> simultaneously in the single function. This means that the listed
> issues can not be fixed independently and separately. So, the whole
> conversion and translation support should be reworked.
> 
> The proposed implementation below fixes the listed issues, and also has
> some additional features:
> 
> - it gets rid of "struct ustr", since it actually just makes an unneeded
> extra copying of the buffer and does not have any other significant
> advantage;
> 
> - it unifies UTF8 and NLS conversions support, since there is no much
> sense to separate these cases;
> 
> - UDF_NAME_LEN constant adjusted to better reflect actual restrictions.
> 
> 
> Andrew Gabbasov (7):
>   udf: Prevent buffer overrun with multi-byte characters
>   udf: Check output buffer length when converting name to CS0
>   udf: Parameterize output length in udf_put_filename
>   udf: Join functions for UTF8 and NLS conversions
>   udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
>   udf: Remove struct ustr as non-needed intermediate storage
>   udf: Merge linux specific translation into CS0 conversion function
> 
>  fs/udf/namei.c   |  16 +-
>  fs/udf/super.c   |  38 ++--
>  fs/udf/udfdecl.h |  21 +-
>  fs/udf/unicode.c | 620 ++++++++++++++++++++++---------------------------------
>  4 files changed, 281 insertions(+), 414 deletions(-)
> 
> -- 
> 2.1.0
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support
@ 2016-01-15  8:44 Andrew Gabbasov
  2016-01-22 16:41 ` Jan Kara
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Gabbasov @ 2016-01-15  8:44 UTC (permalink / raw)
  To: Jan Kara, linux-kernel

V3:

Patches 1 and 2 skipped from sending since they are already accepted
by the maintainer (patch 2 with some changes comparing to V2).

Patches 3 - 5 rebased on top of updated patch 2.

Patch 6: Fixed a mistake in passing parameters to translate_to_linux():
the third buffer and length, used for CRC calculation, should be
passed without leading encoding character.

Patch 7: Main part of body of converting loops extracted to a separate
helper function. Also, some other modifications addressing maintainer's
comments to V2.

V2:

The single patch was split into several commits for separate logical
steps. Also, some minor fixes were done in the code of the patches.

V1:

Current implementation has several issues in unicode.c, mostly related
to handling multi-bytes characters in file names:

- loop ending conditions in udf_CS0toUTF8 and udf_CS0toNLS functions do not
properly catch the end of output buffer in case of multi-bytes characters,
allowing out-of-bounds writing and memory corruption;

- udf_UTF8toCS0 and udf_NLStoCS0 do not check the right boundary of output
buffer at all, also allowing out-of-bounds writing and memory corruption;

- udf_translate_to_linux does not take into account multi-bytes characters
at all (although it is called after converting to UTF8 or NLS): maximal
length of extension is counted as 5 bytes, that may be incorrect with
multi-bytes characters; when inserting CRC and extension for long names
(near the end of the buffer), they are inserted at fixed place at the end,
that can break into the middle of the multi-bytes character;

- when being converted from CS0 to UTF8 (or NLS), the name can be truncated
(even if the sizes in bytes of input and output buffers are the same),
but the following translating function does not know about it and does not
insert CRC, as it is assumed by the specs.

Because of the last item above, it looks like all the checks and
conversions (re-coding and possible CRC insertions) should be done
simultaneously in the single function. This means that the listed
issues can not be fixed independently and separately. So, the whole
conversion and translation support should be reworked.

The proposed implementation below fixes the listed issues, and also has
some additional features:

- it gets rid of "struct ustr", since it actually just makes an unneeded
extra copying of the buffer and does not have any other significant
advantage;

- it unifies UTF8 and NLS conversions support, since there is no much
sense to separate these cases;

- UDF_NAME_LEN constant adjusted to better reflect actual restrictions.


Andrew Gabbasov (7):
  udf: Prevent buffer overrun with multi-byte characters
  udf: Check output buffer length when converting name to CS0
  udf: Parameterize output length in udf_put_filename
  udf: Join functions for UTF8 and NLS conversions
  udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
  udf: Remove struct ustr as non-needed intermediate storage
  udf: Merge linux specific translation into CS0 conversion function

 fs/udf/namei.c   |  16 +-
 fs/udf/super.c   |  38 ++--
 fs/udf/udfdecl.h |  21 +-
 fs/udf/unicode.c | 620 ++++++++++++++++++++++---------------------------------
 4 files changed, 281 insertions(+), 414 deletions(-)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-01-22 16:41 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-24 16:25 [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support Andrew Gabbasov
2015-12-24 16:25 ` [PATCH v2 1/7] udf: Prevent buffer overrun with multi-byte characters Andrew Gabbasov
2015-12-24 16:25 ` [PATCH v2 2/7] udf: Check output buffer length when converting name to CS0 Andrew Gabbasov
2016-01-04 17:19   ` Jan Kara
2015-12-24 16:25 ` [PATCH v2 3/7] udf: Parameterize output length in udf_put_filename Andrew Gabbasov
2015-12-24 16:25 ` [PATCH v2 4/7] udf: Join functions for UTF8 and NLS conversions Andrew Gabbasov
2015-12-24 16:25 ` [PATCH v2 5/7] udf: Adjust UDF_NAME_LEN to better reflect actual restrictions Andrew Gabbasov
2015-12-24 16:25 ` [PATCH v2 6/7] udf: Remove struct ustr as non-needed intermediate storage Andrew Gabbasov
2016-01-04 12:32   ` Jan Kara
2016-01-11 13:31     ` Andrew Gabbasov
2016-01-12 13:39       ` Jan Kara
2015-12-24 16:25 ` [PATCH v2 7/7] udf: Merge linux specific translation into CS0 conversion function Andrew Gabbasov
2016-01-04 13:25   ` Jan Kara
2016-01-11 13:31     ` Andrew Gabbasov
2016-01-04 13:30 ` [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support Jan Kara
2016-01-15  8:44 Andrew Gabbasov
2016-01-22 16:41 ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.