All of lore.kernel.org
 help / color / mirror / Atom feed
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: "Torsten Bögershausen" <tboegi@web.de>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH v2 1/2] commit: reject invalid UTF-8 codepoints
Date: Thu, 4 Jul 2013 20:39:21 +0000	[thread overview]
Message-ID: <20130704203921.GR862789@vauxhall.crustytoothpaste.net> (raw)
In-Reply-To: <51D5D3D0.3030102@web.de>

[-- Attachment #1: Type: text/plain, Size: 1921 bytes --]

On Thu, Jul 04, 2013 at 09:58:08PM +0200, Torsten Bögershausen wrote:
> On 2013-07-04 19.19, brian m. carlson wrote:
> > The commit code already contains code for validating UTF-8, but it does not
> > check for invalid values, such as guaranteed non-characters and surrogates.  Fix
> s/guaranteed non-characters/code points out of range/

The "such as" is meant to be illustrative, not all-inclusive, and my
patch does check for U+FFFE and U+FFFF, which are guaranteed
non-characters.

> > this by explicitly checking for and rejecting such characters.
> Do we really reject them, or do we (only) warn about them ? 

Well, find_invalid_utf8 rejects them as invalid, and verify_utf8 fixes
them up as if they were Latin-1, and commit_tree_extended warns about
them.  My interpretation was from the point of view of the code that I
touched (find_invalid_utf8), not the binary.  It would be nice if the
binary actually rejected it, too, but that isn't within the scope of
this patch.

> Other question:
> Now that we have a check for codepoints out of range, beyond U+10FFFF,
> do we want to have an additional testcase ?

Sure, why not?

> > +test_expect_success 'UTF-8 invalid characters refused' '
> May be:
>  test_expect_success 'UTF-8 invalid surrogate' '

Since I'll be adding at least one more unit test, as you requested, I'll
change the name.  I suppose I might as well add a test for the
non-characters as well.

> Does it make sense to "grep on the fly", like this:
> git commit -a -F "$HOME/invalid" 2>&1  | grep "did not conform"

I am interested in making sure that git commit succeeds, and using a
pipe will cause any failure of git commit to be ignored.

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

  reply	other threads:[~2013-07-04 20:39 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-04 17:17 [PATCH v2 0/2] commit: improve UTF-8 validation brian m. carlson
2013-07-04 17:19 ` [PATCH v2 1/2] commit: reject invalid UTF-8 codepoints brian m. carlson
2013-07-04 19:58   ` Torsten Bögershausen
2013-07-04 20:39     ` brian m. carlson [this message]
2013-07-05 12:51   ` Peter Krefting
2013-07-08 19:36     ` Junio C Hamano
2013-07-09 11:16       ` [PATCH] commit: reject non-characters Peter Krefting
2013-08-05 12:48         ` Peter Krefting
2013-08-05 16:54           ` Junio C Hamano
2013-08-06  7:03             ` Peter Krefting
2013-07-04 17:20 ` [PATCH v2 2/2] commit: reject overlong UTF-8 sequences brian m. carlson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130704203921.GR862789@vauxhall.crustytoothpaste.net \
    --to=sandals@crustytoothpaste.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.