From: Peter Krefting <peter@softwolves.pp.se>
To: Junio C Hamano <gitster@pobox.com>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Git Mailing List <git@vger.kernel.org>
Subject: [PATCH] commit: reject non-characters
Date: Tue, 9 Jul 2013 12:16:33 +0100 (CET)
Message-ID: <alpine.DEB.2.00.1307091213090.2313@ds9.cixit.se>
In-Reply-To: <7vfvvozvx4.fsf@alter.siamese.dyndns.org>

Unicode clause D14 defines all code points U+nFFFE and U+nFFFF (where
0 <= n <= 0x10) as well as the range U+FDD0..U+FDEF as non-characters,
reserved for internal use only.  Disallow these code points in commit
messages as they are normally not recommended for interchange.

Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
---
Junio C Hamano:

> Yeah, while we are at it, doing this may not hurt.  I think Brian's
> two patches are in fairly good shape otherwise, so perhaps you can
> do this as a follow-up patch on top of the tip of the topic,
> e82bd6cc (commit: reject overlong UTF-8 sequences, 2013-07-04)?

OK, here you are. Enjoy :)
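
For anyone who wants to check the rule outside of commit.c, here is a
minimal standalone sketch. The helper names (is_noncharacter,
decode_utf8, check_test_vectors) are made up for illustration and are
not part of git. It assumes that masking with 0xfffe, which looks only
at the low 16 bits, is the intended way to match U+nFFFE and U+nFFFF in
every plane, and the decoder does no validation at all; it only exists
to turn the octal byte strings used in the tests back into code points.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of git: returns 1 if codepoint is one
 * of the 66 Unicode non-characters, 0 otherwise.
 */
static int is_noncharacter(unsigned int codepoint)
{
	/* U+nFFFE and U+nFFFF: low 16 bits are 0xFFFE or 0xFFFF in any plane. */
	if ((codepoint & 0xfffe) == 0xfffe)
		return 1;
	/* The contiguous block U+FDD0..U+FDEF. */
	if (codepoint >= 0xfdd0 && codepoint <= 0xfdef)
		return 1;
	return 0;
}

/*
 * Hypothetical helper: decode one well-formed 3- or 4-byte UTF-8
 * sequence.  No validation is done; this is for test vectors only.
 */
static unsigned int decode_utf8(const unsigned char *s, size_t len)
{
	if (len == 3)
		return ((s[0] & 0x0fu) << 12) | ((s[1] & 0x3fu) << 6) |
		       (s[2] & 0x3fu);
	return ((s[0] & 0x07u) << 18) | ((s[1] & 0x3fu) << 12) |
	       ((s[2] & 0x3fu) << 6) | (s[3] & 0x3fu);
}

/* Sanity checks on the byte sequences exercised by the tests. */
static void check_test_vectors(void)
{
	/* \364\217\277\276 is U+10FFFE, \357\267\220 is U+FDD0. */
	assert(is_noncharacter(decode_utf8(
		(const unsigned char *)"\364\217\277\276", 4)));
	assert(is_noncharacter(decode_utf8(
		(const unsigned char *)"\357\267\220", 3)));
	/* U+FFFD (replacement character) is an ordinary character. */
	assert(!is_noncharacter(0xfffd));
}
```

Calling check_test_vectors() from a small main() is enough to confirm
that both test strings decode to non-characters. Looping the predicate
over 0..0x10FFFF should count 66 hits: two per plane for 17 planes, plus
the 32 code points in U+FDD0..U+FDEF.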

  commit.c               |  7 +++++--
  t/t3900-i18n-commit.sh | 18 ++++++++++++++++++
  2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/commit.c b/commit.c
index 5097dba..0587732 100644
--- a/commit.c
+++ b/commit.c
@@ -1305,8 +1305,11 @@ static int find_invalid_utf8(const char *buf, int len)
  		/* Surrogates are only for UTF-16 and cannot be encoded in UTF-8. */
  		if ((codepoint & 0x1ff800) == 0xd800)
  			return bad_offset;
-		/* U+FFFE and U+FFFF are guaranteed non-characters. */
-		if ((codepoint & 0x1ffffe) == 0xfffe)
+		/* U+xxFFFE and U+xxFFFF are guaranteed non-characters. */
+		if ((codepoint & 0xfffe) == 0xfffe)
+			return bad_offset;
+		/* So is anything in the range U+FDD0..U+FDEF. */
+		if (codepoint >= 0xfdd0 && codepoint <= 0xfdef)
  			return bad_offset;
  	}
  	return -1;
diff --git a/t/t3900-i18n-commit.sh b/t/t3900-i18n-commit.sh
index 051ea9d..38b00c3 100755
--- a/t/t3900-i18n-commit.sh
+++ b/t/t3900-i18n-commit.sh
@@ -58,6 +58,24 @@ test_expect_success 'UTF-8 overlong sequences rejected' '
  	grep "did not conform" "$HOME"/stderr
  '

+test_expect_success 'UTF-8 non-characters refused (U+10FFFE)' '
+	test_when_finished "rm -f $HOME/stderr $HOME/invalid" &&
+	echo "UTF-8 non-character 1" >F &&
+	printf "Commit message\n\nNon-character:\364\217\277\276\n" \
+		>"$HOME/invalid" &&
+	git commit -a -F "$HOME/invalid" 2>"$HOME"/stderr &&
+	grep "did not conform" "$HOME"/stderr
+'
+
+test_expect_success 'UTF-8 non-characters refused (U+FDD0)' '
+	test_when_finished "rm -f $HOME/stderr $HOME/invalid" &&
+	echo "UTF-8 non-character 2." >F &&
+	printf "Commit message\n\nNon-character:\357\267\220\n" \
+		>"$HOME/invalid" &&
+	git commit -a -F "$HOME/invalid" 2>"$HOME"/stderr &&
+	grep "did not conform" "$HOME"/stderr
+'
+
  for H in ISO8859-1 eucJP ISO-2022-JP
  do
  	test_expect_success "$H setup" '
-- 
1.8.3.1


Thread overview: 11+ messages
2013-07-04 17:17 [PATCH v2 0/2] commit: improve UTF-8 validation brian m. carlson
2013-07-04 17:19 ` [PATCH v2 1/2] commit: reject invalid UTF-8 codepoints brian m. carlson
2013-07-04 19:58   ` Torsten Bögershausen
2013-07-04 20:39     ` brian m. carlson
2013-07-05 12:51   ` Peter Krefting
2013-07-08 19:36     ` Junio C Hamano
2013-07-09 11:16       ` Peter Krefting [this message]
2013-08-05 12:48         ` [PATCH] commit: reject non-characters Peter Krefting
2013-08-05 16:54           ` Junio C Hamano
2013-08-06  7:03             ` Peter Krefting
2013-07-04 17:20 ` [PATCH v2 2/2] commit: reject overlong UTF-8 sequences brian m. carlson
