All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Keeping <john@keeping.me.uk>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: git@vger.kernel.org, "Eric S. Raymond" <esr@thyrsus.com>,
	Felipe Contreras <felipe.contreras@gmail.com>,
	Sverre Rabbelier <srabbelier@gmail.com>
Subject: Re: [PATCH 2/8] git_remote_helpers: fix input when running under Python 3
Date: Mon, 14 Jan 2013 09:47:21 +0000	[thread overview]
Message-ID: <20130114094721.GQ4574@serenity.lan> (raw)
In-Reply-To: <50F38E12.6090207@alum.mit.edu>

On Mon, Jan 14, 2013 at 05:48:18AM +0100, Michael Haggerty wrote:
> On 01/13/2013 05:17 PM, John Keeping wrote:
>> On Sun, Jan 13, 2013 at 04:26:39AM +0100, Michael Haggerty wrote:
>>> On 01/12/2013 08:23 PM, John Keeping wrote:
>>>> Although 2to3 will fix most issues in Python 2 code to make it run under
>>>> Python 3, it does not handle the new strict separation between byte
>>>> strings and unicode strings.  There is one instance in
>>>> git_remote_helpers where we are caught by this.
>>>>
>>>> Fix it by explicitly decoding the incoming byte string into a unicode
>>>> string.  In this instance, use the locale under which the application is
>>>> running.
>>>>
>>>> Signed-off-by: John Keeping <john@keeping.me.uk>
>>>> ---
>>>>  git_remote_helpers/git/importer.py | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/git_remote_helpers/git/importer.py b/git_remote_helpers/git/importer.py
>>>> index e28cc8f..6814003 100644
>>>> --- a/git_remote_helpers/git/importer.py
>>>> +++ b/git_remote_helpers/git/importer.py
>>>> @@ -20,7 +20,7 @@ class GitImporter(object):
>>>>          """Returns a dictionary with refs.
>>>>          """
>>>>          args = ["git", "--git-dir=" + gitdir, "for-each-ref", "refs/heads"]
>>>> -        lines = check_output(args).strip().split('\n')
>>>> +        lines = check_output(args).decode().strip().split('\n')
>>>>          refs = {}
>>>>          for line in lines:
>>>>              value, name = line.split(' ')
>>>>
>>>
>>> Won't this change cause an exception if the branch names are not all
>>> valid strings in the current locale's encoding?  I don't see how this
>>> assumption is justified (e.g., see git-check-ref-format(1) for the rules
>>> governing reference names).
>> 
>> Yes it will.  The problem is that for Python 3 we need to decode the
>> byte string into a unicode string, which means we need to know what
>> encoding it is.
>> 
>> I don't think we can just say "git-for-each-ref will print refs in
>> UTF-8" since AFAIK git doesn't care what encoding the refs are in - I
>> suspect that's determined by the filesystem which in the end probably
>> maps to whatever bytes the shell fed git when the ref was created.
>> 
>> That's why I chose the current locale in this case.  I'm hoping someone
>> here will correct me if we can do better, but I don't see any way of
>> avoiding choosing some encoding here if we want to support Python 3
>> (which I think we will, even if we don't right now).
> 
> I'm not just trying to be a nuisance here;

You're not being - I think this is a difficult issue and I don't know
myself what the right answer is.

>                                            I'm struggling myself to
> understand how a program that cares about strings-vs-bytes (e.g., a
> Python3 script) should coexist with a program that doesn't (e.g., git
> [1]).  I think this will become a big issue if my Python version of the
> commit email script ever gets integrated and then made compatible with
> Python3.
> 
> You claim "for Python 3 we need to decode the byte string into a unicode
> string".  I understand that Python 3 strings are Unicode, but why/when
> is it necessary to decode data into a Unicode string as opposed to
> leaving it as a byte sequence?
> 
> In this particular case (from a cursory look over the code) it seems to
> me that (1) decoding to Unicode will sometimes fail for data that git
> considers valid and (2) there is no obvious reason that the data cannot
> be processed as byte sequences.

I've been thinking about this overnight and I think you're right that
treating them as byte strings in Python is most correct.  Having said
that, when I'm programming in Python I would find it quite surprising
that I had to treat ref strings specially - and as soon as I want to use
one in a string context (e.g. printing it as part of a message to the
user) I'm back to the same problem.

So I think we should try to solve the problem once rather than forcing
everyone who wants to use the library to solve it individually.  I just
wish it was obvious what we should do!


John

  reply	other threads:[~2013-01-14  9:47 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-12 19:23 [PATCH 0/8] Initial support for Python 3 John Keeping
2013-01-12 19:23 ` [PATCH 1/8] git_remote_helpers: Allow building with " John Keeping
2013-01-12 19:23 ` [PATCH 2/8] git_remote_helpers: fix input when running under " John Keeping
2013-01-13  3:26   ` Michael Haggerty
2013-01-13 16:17     ` John Keeping
2013-01-14  4:48       ` Michael Haggerty
2013-01-14  9:47         ` John Keeping [this message]
2013-01-15 19:48           ` [RFC/PATCH 2/8 v2] " John Keeping
2013-01-15 20:51             ` Junio C Hamano
2013-01-15 21:54               ` John Keeping
2013-01-15 22:04                 ` Junio C Hamano
2013-01-15 22:40                   ` [RFC/PATCH 2/8 v3] " John Keeping
2013-01-16  0:03                     ` Pete Wyckoff
2013-01-16  9:45                       ` John Keeping
2013-01-17  0:29                         ` Pete Wyckoff
2013-01-12 19:23 ` [PATCH 3/8] git_remote_helpers: Force rebuild if python version changes John Keeping
2013-01-12 23:30   ` Pete Wyckoff
2013-01-13 16:26     ` John Keeping
2013-01-13 17:14       ` Pete Wyckoff
2013-01-13 17:52         ` John Keeping
2013-01-15 22:58           ` John Keeping
2013-01-17  0:27             ` Pete Wyckoff
2013-01-12 19:23 ` [PATCH 4/8] git_remote_helpers: Use 2to3 if building with Python 3 John Keeping
2013-01-12 19:23 ` [PATCH 5/8] svn-fe: allow svnrdump_sim.py to run " John Keeping
2013-01-12 19:23 ` [PATCH 6/8] git-remote-testpy: hash bytes explicitly John Keeping
2013-01-12 19:23 ` [PATCH 7/8] git-remote-testpy: don't do unbuffered text I/O John Keeping
2013-01-12 19:23 ` [PATCH 8/8] git-remote-testpy: call print as a function John Keeping
2013-01-12 23:43 ` [PATCH 0/8] Initial support for Python 3 Pete Wyckoff
2013-01-13  0:41   ` John Keeping
2013-01-13 12:34     ` John Keeping
2013-01-13 16:40     ` Pete Wyckoff
2013-01-13 17:35       ` John Keeping
2013-01-17 18:53 ` [PATCH v2 0/8] Initial Python 3 support John Keeping
2013-01-17 18:53 ` [PATCH v2 1/8] git_remote_helpers: allow building with Python 3 John Keeping
2013-01-17 18:53 ` [PATCH v2 2/8] git_remote_helpers: fix input when running under " John Keeping
2013-01-17 18:53 ` [PATCH v2 3/8] git_remote_helpers: force rebuild if python version changes John Keeping
2013-01-17 18:53 ` [PATCH v2 4/8] git_remote_helpers: use 2to3 if building with Python 3 John Keeping
2013-01-18  5:15   ` Sverre Rabbelier
2013-01-18 10:32     ` John Keeping
2013-01-19  7:52       ` Sverre Rabbelier
2013-01-17 18:53 ` [PATCH v2 5/8] svn-fe: allow svnrdump_sim.py to run " John Keeping
2013-01-17 18:53 ` [PATCH v2 6/8] git-remote-testpy: hash bytes explicitly John Keeping
2013-01-17 20:36   ` Junio C Hamano
2013-01-17 20:43     ` Junio C Hamano
2013-01-17 21:00     ` John Keeping
2013-01-17 21:05       ` John Keeping
2013-01-17 22:24       ` Junio C Hamano
2013-01-17 22:30         ` John Keeping
2013-01-17 22:57           ` Junio C Hamano
2013-01-17 18:54 ` [PATCH v2 7/8] git-remote-testpy: don't do unbuffered text I/O John Keeping
2013-01-18  3:50   ` Sverre Rabbelier
2013-01-17 18:54 ` [PATCH v2 8/8] git-remote-testpy: call print as a function John Keeping
2013-01-18  3:48   ` Sverre Rabbelier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130114094721.GQ4574@serenity.lan \
    --to=john@keeping.me.uk \
    --cc=esr@thyrsus.com \
    --cc=felipe.contreras@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=mhagger@alum.mit.edu \
    --cc=srabbelier@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.