All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lars Schneider <larsxschneider@gmail.com>
To: "Torsten Bögershausen" <tboegi@web.de>
Cc: lars.schneider@autodesk.com, git@vger.kernel.org,
	gitster@pobox.com, j6t@kdbg.org, sunshine@sunshineco.com,
	peff@peff.net, ramsay@ramsayjones.plus.com,
	Johannes.Schindelin@gmx.de, pclouds@gmail.com
Subject: Re: [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute
Date: Sun, 1 Apr 2018 15:24:54 +0200	[thread overview]
Message-ID: <0FEBEFB2-46D6-4688-AF07-654B56FFF9D8@gmail.com> (raw)
In-Reply-To: <20180318072435.GA24190@tor.lan>


> On 18 Mar 2018, at 08:24, Torsten Bögershausen <tboegi@web.de> wrote:
> 
> Some comments inline
> 
> On Fri, Mar 09, 2018 at 06:35:32PM +0100, lars.schneider@autodesk.com wrote:
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> Git recognizes files encoded with ASCII or one of its supersets (e.g.
>> UTF-8 or ISO-8859-1) as text files. All other encodings are usually
>> interpreted as binary and consequently built-in Git text processing
>> tools (e.g. 'git diff') as well as most Git web front ends do not
>> visualize the content.
>> 
>> Add an attribute to tell Git what encoding the user has defined for a
>> given file. If the content is added to the index, then Git converts the
> 
> Minor comment:
> "Git converts the content"
> Everywhere else (?) "encodes or reencodes" is used.
> "Git reencodes the content" may be more consistent.

OK, will change.


>> 
>> +static const char *default_encoding = "UTF-8";
>> +
>> +static int encode_to_git(const char *path, const char *src, size_t src_len,
>> +			 struct strbuf *buf, const char *enc, int conv_flags)
>> +{
>> +	char *dst;
>> +	int dst_len;
>> +	int die_on_error = conv_flags & CONV_WRITE_OBJECT;
>> +
>> +	/*
>> +	 * No encoding is specified or there is nothing to encode.
>> +	 * Tell the caller that the content was not modified.
>> +	 */
>> +	if (!enc || (src && !src_len))
>> +		return 0;
> 
> (This may have been discussed before.
> As we checked (enc != NULL) I think we can add here:)
> 	if (is_encoding_utf8(enc))
> 		return 0;

This should be covered in git_path_check_encoding(),
introduced in v12:

        /* Don't encode to the default encoding */
	if (same_encoding(value, default_encoding))
		return NULL;

In that function the encoding of a certain file is read from
the .gitattributes. If the encoding matches the compile-time
defined default encoding (= UTF-8), then the encoding is set
to NULL.


>> 
>> +
>> +static int encode_to_worktree(const char *path, const char *src, size_t src_len,
>> +			      struct strbuf *buf, const char *enc)
>> +{
>> +	char *dst;
>> +	int dst_len;
>> +
>> +	/*
>> +	 * No encoding is specified or there is nothing to encode.
>> +	 * Tell the caller that the content was not modified.
>> +	 */
>> +	if (!enc || (src && !src_len))
>> +		return 0;
> 
> Same as above:
> 	if (is_encoding_utf8(enc))
> 		return 0;
> 
>> +
>> +	dst = reencode_string_len(src, src_len, enc, default_encoding,
>> +				  &dst_len);
>> +	if (!dst) {
>> +		error("failed to encode '%s' from %s to %s",
>> +			path, default_encoding, enc);
>> +		return 0;
>> +	}
>> +
>> +	strbuf_attach(buf, dst, dst_len, dst_len + 1);
>> +	return 1;
>> +}
>> +
>> static int crlf_to_git(const struct index_state *istate,
>> 		       const char *path, const char *src, size_t len,
>> 		       struct strbuf *buf,
>> @@ -978,6 +1051,25 @@ static int ident_to_worktree(const char *path, const char *src, size_t len,
>> 	return 1;
>> }
>> 
>> +static const char *git_path_check_encoding(struct attr_check_item *check)
>> +{
>> +	const char *value = check->value;
>> +
>> +	if (ATTR_UNSET(value) || !strlen(value))
>> +		return NULL;
>> +
> 
> 
>> +	if (ATTR_TRUE(value) || ATTR_FALSE(value)) {
>> +		error(_("working-tree-encoding attribute requires a value"));
>> +		return NULL;
>> +	}
> 
> TRUE or false are values, but just wrong ones.
> If this test is removed, the user will see "failed to encode "TRUE" to "UTF-8",
> which should give enough information to fix it.

I see your point. However, I would like to stop the processing right
there for these invalid values. How about 

  error(_("true/false are no valid working-tree-encodings"));

I think that is the most straight forward/helpful error message
for the enduser (I consider the term "boolean" but dismissed it
as potentially confusing to folks not familiar with the term).

OK with you?

> 
>> +
>> +	/* Don't encode to the default encoding */
>> +	if (!strcasecmp(value, default_encoding))
>> +		return NULL;
> Same as above ?:
> 	if (is_encoding_utf8(value))
> 		return 0;

Yes, that was fixed in v12 as mentioned above :-)

- Lars

  reply	other threads:[~2018-04-01 13:25 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-09 17:35 [PATCH v11 00/10] convert: add support for different encodings lars.schneider
2018-03-09 17:35 ` [PATCH v11 01/10] strbuf: remove unnecessary NUL assignment in xstrdup_tolower() lars.schneider
2018-03-09 17:35 ` [PATCH v11 02/10] strbuf: add xstrdup_toupper() lars.schneider
2018-03-09 17:35 ` [PATCH v11 03/10] strbuf: add a case insensitive starts_with() lars.schneider
2018-03-09 17:35 ` [PATCH v11 04/10] utf8: add function to detect prohibited UTF-16/32 BOM lars.schneider
2018-03-09 17:35 ` [PATCH v11 05/10] utf8: add function to detect a missing " lars.schneider
2018-03-09 17:35 ` [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute lars.schneider
2018-03-09 19:10   ` Junio C Hamano
2018-03-15 21:23     ` Lars Schneider
2018-03-18  7:24   ` Torsten Bögershausen
2018-04-01 13:24     ` Lars Schneider [this message]
2018-04-05 16:41       ` Torsten Bögershausen
2018-04-15 16:54         ` Lars Schneider
2018-03-09 17:35 ` [PATCH v11 07/10] convert: check for detectable errors in UTF encodings lars.schneider
2018-03-09 19:00   ` Junio C Hamano
2018-03-09 19:04     ` Lars Schneider
2018-03-09 19:10   ` Junio C Hamano
2018-03-09 17:35 ` [PATCH v11 08/10] convert: advise canonical UTF encoding names lars.schneider
2018-03-09 19:11   ` Junio C Hamano
2018-03-15 22:42     ` Lars Schneider
2018-03-09 17:35 ` [PATCH v11 09/10] convert: add tracing for 'working-tree-encoding' attribute lars.schneider
2018-03-09 17:35 ` [PATCH v11 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding' lars.schneider
2018-03-09 20:18   ` Eric Sunshine
2018-03-09 20:22     ` Junio C Hamano
2018-03-09 20:27       ` Eric Sunshine

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0FEBEFB2-46D6-4688-AF07-654B56FFF9D8@gmail.com \
    --to=larsxschneider@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=lars.schneider@autodesk.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=sunshine@sunshineco.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.