* [PATCH 1/2] unit: add utf8_validate_test80 test
@ 2019-03-20 23:31 Michael Tretter
  2019-03-20 23:31 ` [PATCH 2/2] utf8: Fix expected bytes in l_utf8_get_codepoint Michael Tretter
  0 siblings, 1 reply; 3+ messages in thread
From: Michael Tretter @ 2019-03-20 23:31 UTC
  To: ell

UTF-8 requires the form 10xxxxxx for the second, third and fourth bytes
of a well-formed byte sequence.

Add a test for the string "ße" encoded as single Latin-1 bytes
(0xdf 0x65; ß is U+00DF in the Latin-1 Supplement block). This is a
relatively common German letter combination and valid Latin-1, but not
well-formed UTF-8.
---
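For illustration only (not part of the patch): a minimal sketch of why
the byte pair 0xdf 0x65 is ill-formed. The helper names
utf8_announced_len and utf8_is_continuation are hypothetical and do not
exist in ell; the sketch looks only at bit patterns and does not check
for overlong encodings or out-of-range code points.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes announced by a UTF-8 lead byte,
 * or 0 if the byte cannot start a sequence. Only the bit pattern is
 * inspected; overlong and out-of-range leads are not filtered here.
 */
static size_t utf8_announced_len(unsigned char lead)
{
	if (lead < 0x80)
		return 1;	/* 0xxxxxxx: ASCII */
	if ((lead & 0xe0) == 0xc0)
		return 2;	/* 110xxxxx */
	if ((lead & 0xf0) == 0xe0)
		return 3;	/* 1110xxxx */
	if ((lead & 0xf8) == 0xf0)
		return 4;	/* 11110xxx */

	return 0;		/* 10xxxxxx or 11111xxx */
}

/* Hypothetical helper: non-zero if b has the form 10xxxxxx. */
static int utf8_is_continuation(unsigned char b)
{
	return (b & 0xc0) == 0x80;
}
```

With these helpers, utf8_announced_len(0xdf) is 2, so a continuation
byte must follow, but utf8_is_continuation(0x65) is 0 because 'e' is
01100101. The well-formed UTF-8 encoding of ß is 0xc3 0x9f.
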
 unit/test-utf8.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/unit/test-utf8.c b/unit/test-utf8.c
index 3768506..9b5bc6e 100644
--- a/unit/test-utf8.c
+++ b/unit/test-utf8.c
@@ -815,6 +815,17 @@ static struct utf8_validate_test utf8_validate_test79 = {
 	.ucs4_len = 5,
 };
 
+static const char utf8_80[] = { 0xdf, 0x65 };
+static const wchar_t ucs4_80[] = { 0xffff };
+
+static struct utf8_validate_test utf8_validate_test80 = {
+	.utf8 = utf8_80,
+	.utf8_len = 2,
+	.type = UTF8_VALIDATE_TYPE_NOTUNICODE,
+	.ucs4 = ucs4_80,
+	.ucs4_len = 1,
+};
+
 static void test_utf8_codepoint(const struct utf8_validate_test *test)
 {
 	unsigned int i, pos;
@@ -1085,6 +1096,8 @@ int main(int argc, char *argv[])
 					&utf8_validate_test78);
 	l_test_add("Validate UTF 79", test_utf8_validate,
 					&utf8_validate_test79);
+	l_test_add("Validate UTF 80", test_utf8_validate,
+					&utf8_validate_test80);
 
 	l_test_add("Strlen UTF 1", test_utf8_strlen,
 					&utf8_strlen_test1);
-- 
2.21.0



* [PATCH 2/2] utf8: Fix expected bytes in l_utf8_get_codepoint
  2019-03-20 23:31 [PATCH 1/2] unit: add utf8_validate_test80 test Michael Tretter
@ 2019-03-20 23:31 ` Michael Tretter
  2019-03-21  1:09   ` Denis Kenzior
  0 siblings, 1 reply; 3+ messages in thread
From: Michael Tretter @ 2019-03-20 23:31 UTC
  To: ell

UTF-8 requires the form 10xxxxxx for the second, third and fourth bytes
of a well-formed byte sequence. Therefore, comparing the masked byte
with 0 is not sufficient to exclude ill-formed byte sequences; the
first two bits must follow the specified form.

Without this check, iwd crashes if it encounters Latin-1 Supplement
encoded SSIDs during scanning, because they are erroneously accepted as
valid UTF-8.
---
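For illustration only (not part of the patch): a small sketch that
isolates the two conditions from the hunk below so they can be compared
byte by byte. The function names old_check_rejects and
new_check_rejects are hypothetical; only the two expressions are taken
from the diff.

```c
#include <stdbool.h>

/* Condition before the patch: rejects only bytes of the form 00xxxxxx. */
static bool old_check_rejects(unsigned char b)
{
	return (b & 0xc0) == 0;
}

/* Condition after the patch: rejects every byte that is not 10xxxxxx. */
static bool new_check_rejects(unsigned char b)
{
	return (b & 0xc0) != 0x80;
}
```

For 0x65 ('e', 01100101) the masked value is 0x40, so the old condition
lets the byte through as a continuation byte while the new one rejects
it. A real continuation byte such as 0x9f passes both, and a lead byte
such as 0xc3 in continuation position is likewise caught only by the
new condition.
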
 ell/utf8.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ell/utf8.c b/ell/utf8.c
index e9998f7..f8750d8 100644
--- a/ell/utf8.c
+++ b/ell/utf8.c
@@ -109,7 +109,7 @@ LIB_EXPORT int l_utf8_get_codepoint(const char *str, size_t len, wchar_t *cp)
 	val = str[0] & (0xff >> (expect_bytes + 1));
 
 	for (i = 1; i < expect_bytes; i++) {
-		if ((str[i] & 0xc0) == 0)
+		if ((str[i] & 0xc0) != 0x80)
 			goto error;
 
 		val <<= 6;
-- 
2.21.0



* Re: [PATCH 2/2] utf8: Fix expected bytes in l_utf8_get_codepoint
  2019-03-20 23:31 ` [PATCH 2/2] utf8: Fix expected bytes in l_utf8_get_codepoint Michael Tretter
@ 2019-03-21  1:09   ` Denis Kenzior
  0 siblings, 0 replies; 3+ messages in thread
From: Denis Kenzior @ 2019-03-21  1:09 UTC
  To: ell

Hi Michael,

On 03/20/2019 06:31 PM, Michael Tretter wrote:
> UTF-8 requires the form 10xxxxxx for the second, third and fourth bytes
> of a well-formed byte sequence. Therefore, comparing the masked byte
> with 0 is not sufficient to exclude ill-formed byte sequences; the
> first two bits must follow the specified form.
> 
> Without this check, iwd crashes if it encounters Latin-1 Supplement
> encoded SSIDs during scanning, because they are erroneously accepted as
> valid UTF-8.
> ---
>   ell/utf8.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 

Good catch! That bug has been in there since 2011.

Both applied, thanks.

Regards,
-Denis
