From: Arvind Sankar <nivedita@alum.mit.edu>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-efi@vger.kernel.org
Subject: [PATCH 21/24] efi/libstub: Add UTF-8 decoding to efi_puts
Date: Mon, 18 May 2020 15:07:13 -0400
Message-ID: <20200518190716.751506-22-nivedita@alum.mit.edu>
In-Reply-To: <20200518190716.751506-1-nivedita@alum.mit.edu>

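To make use of the UTF-16 support added to vsprintf in the previous
commit, teach efi_puts to decode its UTF-8 input and emit UTF-16. Code
points outside plane 0 are emitted as surrogate pairs. Octets that do
not form a valid UTF-8 sequence are passed through unchanged, one octet
at a time.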

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
---
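Note for reviewers: the decoding rules are easier to check outside the
stub, so here is a minimal userspace sketch of the same logic. It is an
illustration under stated assumptions, not the code being applied: the
sketch_* names are hypothetical, kernel types are replaced with
<stdint.h> types, and the surrogate test is written as a plain range
check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace mirror of the decoder: returns the code point and
 * advances *s8 past it. On an invalid sequence it returns the first octet
 * and advances by exactly one octet.
 */
static uint32_t sketch_utf8_to_utf32(const uint8_t **s8)
{
	uint32_t c32;
	uint8_t c0, cx;
	size_t clen, i;

	c0 = cx = *(*s8)++;
	/* Count the leading 1 bits of the first octet. */
	for (clen = 0; cx & 0x80; ++clen)
		cx <<= 1;
	/* 0 leading ones: ASCII. 1 or more than 4: not a valid lead octet. */
	if (clen < 2 || clen > 4)
		return c0;
	/* Payload bits of the first octet; clen becomes the trailer count. */
	c32 = cx >> clen--;
	for (i = 0; i < clen; ++i) {
		/* Trailing octets must look like 10xxxxxx. */
		cx = (*s8)[i] ^ 0x80;
		if (cx & 0xc0)
			return c0;
		c32 = (c32 << 6) | cx;
	}
	/* Reject out-of-range values, surrogates and overlong forms. */
	if (c32 > 0x10ffff ||
	    (c32 >= 0xd800 && c32 <= 0xdfff) ||
	    clen != (size_t)((c32 >= 0x80) + (c32 >= 0x800) + (c32 >= 0x10000)))
		return c0;
	*s8 += clen;
	return c32;
}

/* Encode one code point as UTF-16; returns the number of 16-bit units. */
static size_t sketch_utf32_to_utf16(uint32_t c32, uint16_t out[2])
{
	if (c32 < 0x10000) {
		/* Plane 0 fits in a single unit. */
		out[0] = (uint16_t)c32;
		return 1;
	}
	/* Other planes: subtract 0x10000 and split into two 10-bit halves. */
	out[0] = (uint16_t)((0xd800 - (0x10000 >> 10)) + (c32 >> 10));
	out[1] = (uint16_t)(0xdc00 + (c32 & 0x3ff));
	return 2;
}

/* Small self-check covering one valid and one invalid input. */
static void sketch_selftest(void)
{
	static const uint8_t smile[] = "\xf0\x9f\x98\x80";	/* U+1F600 */
	static const uint8_t overlong[] = "\xc0\x80";		/* overlong NUL */
	const uint8_t *p;
	uint16_t w[2];

	p = smile;
	assert(sketch_utf8_to_utf32(&p) == 0x1f600 && p == smile + 4);
	assert(sketch_utf32_to_utf16(0x1f600, w) == 2);
	assert(w[0] == 0xd83d && w[1] == 0xde00);

	p = overlong;
	assert(sketch_utf8_to_utf32(&p) == 0xc0 && p == overlong + 1);
}
```

Calling sketch_selftest() from a test harness exercises both paths: a
four-octet sequence that becomes a surrogate pair, and an overlong
encoding that is handed back as its first octet with the pointer moved
by one, so the caller resynchronizes on the next octet.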
 .../firmware/efi/libstub/efi-stub-helper.c    | 67 +++++++++++++++++--
 1 file changed, 62 insertions(+), 5 deletions(-)

diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index a36f3af6e130..48242bc982a3 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -36,17 +36,74 @@ void efi_char16_puts(efi_char16_t *str)
 		       output_string, str);
 }
 
+static
+u32 utf8_to_utf32(const u8 **s8)
+{
+	u32 c32;
+	u8 c0, cx;
+	size_t clen, i;
+
+	c0 = cx = *(*s8)++;
+	/*
+	 * The position of the most-significant 0 bit gives us the length of
+	 * a multi-octet encoding.
+	 */
+	for (clen = 0; cx & 0x80; ++clen)
+		cx <<= 1;
+	/*
+	 * If the 0 bit is in position 8, this is a valid single-octet
+	 * encoding. If the 0 bit is in position 7 or positions 1-3, or if
+	 * there is no 0 bit at all, the encoding is invalid.
+	 * In either case, we just return the first octet.
+	 */
+	if (clen < 2 || clen > 4)
+		return c0;
+	/* Get the bits from the first octet. */
+	c32 = cx >> clen--;
+	for (i = 0; i < clen; ++i) {
+		/* Trailing octets must have 10 in most significant bits. */
+		cx = (*s8)[i] ^ 0x80;
+		if (cx & 0xc0)
+			return c0;
+		c32 = (c32 << 6) | cx;
+	}
+	/*
+	 * Check for validity:
+	 * - The character must be in the Unicode range.
+	 * - It must not be a surrogate.
+	 * - It must be encoded using the correct number of octets.
+	 */
+	if (c32 > 0x10ffff ||
+	    (c32 & 0x1ff800) == 0xd800 ||
+	    clen != (c32 >= 0x80) + (c32 >= 0x800) + (c32 >= 0x10000))
+		return c0;
+	*s8 += clen;
+	return c32;
+}
+
 void efi_puts(const char *str)
 {
 	efi_char16_t buf[128];
 	size_t pos = 0, lim = ARRAY_SIZE(buf);
+	const u8 *s8 = (const u8 *)str;
+	u32 c32;
 
-	while (*str) {
-		if (*str == '\n')
+	while (*s8) {
+		if (*s8 == '\n')
 			buf[pos++] = L'\r';
-		/* Cast to unsigned char to avoid sign-extension */
-		buf[pos++] = (unsigned char)(*str++);
-		if (*str == '\0' || pos >= lim - 2) {
+		c32 = utf8_to_utf32(&s8);
+		if (c32 < 0x10000)
+			/* Characters in plane 0 use a single word. */
+			buf[pos++] = c32;
+		else {
+			/*
+			 * Characters in other planes encode into a surrogate
+			 * pair.
+			 */
+			buf[pos++] = (0xd800 - (0x10000 >> 10)) + (c32 >> 10);
+			buf[pos++] = 0xdc00 + (c32 & 0x3ff);
+		}
+		if (*s8 == '\0' || pos >= lim - 2) {
 			buf[pos] = L'\0';
 			efi_char16_puts(buf);
 			pos = 0;
-- 
2.26.2

