All of lore.kernel.org
 help / color / mirror / Atom feed
* STGIT: Deathpatch in linus tree
@ 2012-02-07 13:02 "Andy Green (林安廸)"
  2012-02-07 17:37 ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: "Andy Green (林安廸)" @ 2012-02-07 13:02 UTC (permalink / raw)
  To: git

Hi -

Linus just pushed 105e5180936d69b1aee46ead8a5fc6c68f4d5f65 to
linux-2.6... along the lines of the Monty Python joke that was so funny
it kills anyone who hears it, if I have a stgit branch based on a HEAD
that includes this commit then stgit dies when pushing on top of it.

So...

[agreen@build linux-2.6]$ stg pop --all
[agreen@build linux-2.6]$ git reset --hard 96e02d1
HEAD is now at 96e02d1 exec: fix use-after-free bug in setup_new_exec()
[agreen@build linux-2.6]$ stg push
Pushing patch "subject-patch-1-3-arm-dt-add-p" ... done (empty)
Now at patch "subject-patch-1-3-arm-dt-add-p"
[agreen@build linux-2.6]$ stg pop
Popped subject-patch-1-3-arm-dt-add-p
No patch applied

So the commit just before the bad guy is happy.  Then -->

[agreen@build linux-2.6]$ git reset --hard 105e518
HEAD is now at 105e518 Merge tag 'hwmon-fixes-for-3.3-rc3' of
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
[agreen@build linux-2.6]$ stg push
Error: Unhandled exception:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/stgit/main.py", line 152, in _main
    ret = command.func(parser, options, args)
  File "/usr/lib/python2.7/site-packages/stgit/commands/push.py", line
68, in func
    check_clean_iw = clean_iw)
  File "/usr/lib/python2.7/site-packages/stgit/lib/transaction.py", line
95, in __init__
    self.__current_tree = self.__stack.head.data.tree
  File "/usr/lib/python2.7/site-packages/stgit/lib/git.py", line 426, in
data
    self.__repository.cat_object(self.sha1))
  File "/usr/lib/python2.7/site-packages/stgit/lib/git.py", line 408, in
parse
    assert False
AssertionError

It also dies if I use Linus' current HEAD as the basis I am stgitting on
top of, which is one patch ahead of the deathpatch.

I'm using Fedora rawhide versions of git and stgit

git-1.7.9-1.fc17.x86_64
stgit-0.15-2.fc17.noarch

-Andy

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: STGIT: Deathpatch in linus tree
  2012-02-07 13:02 STGIT: Deathpatch in linus tree "Andy Green (林安廸)"
@ 2012-02-07 17:37 ` Junio C Hamano
  2012-02-08  7:33   ` [StGit PATCH] Parse commit object header correctly Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2012-02-07 17:37 UTC (permalink / raw)
  To: Andy Green (林安廸), Catalin Marinas; +Cc: git

"Andy Green (林安廸)" <andy@warmcat.com> writes:

> [agreen@build linux-2.6]$ git reset --hard 105e518
> HEAD is now at 105e518 Merge tag 'hwmon-fixes-for-3.3-rc3' of
> git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
> [agreen@build linux-2.6]$ stg push
> Error: Unhandled exception:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/stgit/main.py", line 152, in _main
>     ret = command.func(parser, options, args)
>   File "/usr/lib/python2.7/site-packages/stgit/commands/push.py", line
> 68, in func
>     check_clean_iw = clean_iw)
>   File "/usr/lib/python2.7/site-packages/stgit/lib/transaction.py", line
> 95, in __init__
>     self.__current_tree = self.__stack.head.data.tree
>   File "/usr/lib/python2.7/site-packages/stgit/lib/git.py", line 426, in
> data
>     self.__repository.cat_object(self.sha1))
>   File "/usr/lib/python2.7/site-packages/stgit/lib/git.py", line 408, in
> parse
>     assert False
> AssertionError

Stgit at least at version 0.15 (which is you used, and which is what I
checked its source for) assumes it knows every possible kind of header
that can be recorded in the commit object and hits this assert when it
seems something it does not understand.

Version 0.16 seems to have removed this assert, like this hunk on the file
between 0.15 and 0.16:

@@ -404,8 +404,6 @@ class CommitData(Immutable, Repr):
                 cd = cd.set_author(Person.parse(value))
             elif key == 'committer':
                 cd = cd.set_committer(Person.parse(value))
-            else:
-                assert False
         assert False

The above code in StGit that tries to parse commit object header is
broken, and I am surprised that nobody caught it back in late 2006 when
the optional encoding header was added to the commit object, which the
above does not understand.

I am not sure if that removal of the extra assertion is enough, though.
It does not skip lines that it does not understand until the first blank
line, i.e. the end of the header, as it should.  Instead it has this
before starting to inspect if a line in the header is what it knows about:

            line = lines[i].strip()
            if not line:
                return cd.set_message(''.join(lines[i+1:]))
            key, value = line.split(None, 1)

I do not think this will work on a continuation line inside a header that
has only one token, and it will also ignore the leading whitespace that
protects the continuation line.

I think the loop should at least changed line this, until StGit starts
caring about multi-line headers with continuation lines.

diff --git a/stgit/lib/git.py b/stgit/lib/git.py
index 56287f6..f2b284d 100644
--- a/stgit/lib/git.py
+++ b/stgit/lib/git.py
@@ -392,18 +392,20 @@ class CommitData(Immutable, Repr):
         cd = cls(parents = [])
         lines = list(s.splitlines(True))
         for i in xrange(len(lines)):
-            line = lines[i].strip()
+            line = lines[i]
             if not line:
                 return cd.set_message(''.join(lines[i+1:]))
-            key, value = line.split(None, 1)
-            if key == 'tree':
-                cd = cd.set_tree(repository.get_tree(value))
-            elif key == 'parent':
-                cd = cd.add_parent(repository.get_commit(value))
-            elif key == 'author':
-                cd = cd.set_author(Person.parse(value))
-            elif key == 'committer':
-                cd = cd.set_committer(Person.parse(value))
+	    ix = line.find(' ')
+	    if 0 < ix:
+		key, value = line[0:ix], line[ix+1:]
+		if key == 'tree':
+		    cd = cd.set_tree(repository.get_tree(value))
+		elif key == 'parent':
+		    cd = cd.add_parent(repository.get_commit(value))
+		elif key == 'author':
+		    cd = cd.set_author(Person.parse(value))
+		elif key == 'committer':
+		    cd = cd.set_committer(Person.parse(value))
         assert False
 
 class Commit(GitObject):

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [StGit PATCH] Parse commit object header correctly
  2012-02-07 17:37 ` Junio C Hamano
@ 2012-02-08  7:33   ` Junio C Hamano
  2012-02-08 10:00     ` Michael Haggerty
  2012-02-09  9:38     ` Catalin Marinas
  0 siblings, 2 replies; 15+ messages in thread
From: Junio C Hamano @ 2012-02-08  7:33 UTC (permalink / raw)
  To: Karl Hasselström, Catalin Marinas
  Cc: Andy Green (林安廸), git

To allow parsing the header produced by versions of Git newer than the
code written to parse it, all commit parsers are expected to skip unknown
header lines, so that newer types of header lines can be added safely.
The only three things that are promised are:

 (1) the header ends with an empty line (just an LF, not "a blank line"),
 (2) unknown lines can be skipped, and
 (3) a header "field" begins with the field name, followed by a single SP
     followed by the value.

The parser used by StGit, introduced by commit cbe4567 (New StGit core
infrastructure: repository operations, 2007-12-19), was accidentally a bit
too loose to lose information, and a bit too strict to raise exception
when dealing with a line it does not understand.

 - It used "strip()" to lose whitespaces from both ends, risking a line
   with only whitespaces to be mistaken as the end of the header.

 - It used "k, v = line.split(None, 1)", blindly assuming that all header
   lines (including the ones that the version of StGit may not understand)
   can safely be split without raising an exception, which is not true if
   there is no SP on the line.

This patch changes the parsing logic so that it:

 (1) detects end of the hedaer correctly by treating only an empty line as
     such;
 (2) handles multi-line fields (a header line that begins with a single SP
     is appended to the previous line after removing that leading SP but
     retaining the LF between the line and the previous line) correctly;
 (3) splits a line at the first SP to find the field name, but only does
     so when there actually is SP on the line; and
 (4) ignores lines that cannot be understood without barfing.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * Earlier I sent a minimum parser fix that ignores multi-line fields, as
   the fields StGit cares about are all single line.  This patch also
   teaches multi-line fields to the parser, so that later versions of
   StGit can parse and use them if they choose to.

   Python is not my primary language, so please take this with a grain of
   salt.

   Thanks.

 stgit/lib/git.py |   41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/stgit/lib/git.py b/stgit/lib/git.py
index 56287f6..f19371b 100644
--- a/stgit/lib/git.py
+++ b/stgit/lib/git.py
@@ -390,21 +390,34 @@ class CommitData(Immutable, Repr):
         @return: A new L{CommitData} object
         @rtype: L{CommitData}"""
         cd = cls(parents = [])
-        lines = list(s.splitlines(True))
+        raw_lines = list(s.splitlines(True))
+        lines = []
+        # Collapse multi-line header lines
+        for i in xrange(len(raw_lines)):
+            line = raw_lines[i]
+            if line == '\n':
+                cd.set_message(''.join(raw_lines[i+1:]))
+                break
+            if line[0] == ' ':
+                # continuation line
+                lines[-1] += '\n' + line[1:]
+            else:
+                lines.append(line)
         for i in xrange(len(lines)):
-            line = lines[i].strip()
-            if not line:
-                return cd.set_message(''.join(lines[i+1:]))
-            key, value = line.split(None, 1)
-            if key == 'tree':
-                cd = cd.set_tree(repository.get_tree(value))
-            elif key == 'parent':
-                cd = cd.add_parent(repository.get_commit(value))
-            elif key == 'author':
-                cd = cd.set_author(Person.parse(value))
-            elif key == 'committer':
-                cd = cd.set_committer(Person.parse(value))
-        assert False
+            line = lines[i].rstrip('\n')
+            ix = line.find(' ')
+            if 0 <= ix:
+                key, value = line[0:ix], line[ix+1:]
+                if key == 'tree':
+                    cd = cd.set_tree(repository.get_tree(value))
+                elif key == 'parent':
+                    cd = cd.add_parent(repository.get_commit(value))
+                elif key == 'author':
+                    cd = cd.set_author(Person.parse(value))
+                elif key == 'committer':
+                    cd = cd.set_committer(Person.parse(value))
+        return cd
+
 
 class Commit(GitObject):
     """Represents a git commit object. All the actual data contents of the

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [StGit PATCH] Parse commit object header correctly
  2012-02-08  7:33   ` [StGit PATCH] Parse commit object header correctly Junio C Hamano
@ 2012-02-08 10:00     ` Michael Haggerty
  2012-02-08 10:43       ` Frans Klaver
  2012-02-15 12:24       ` Catalin Marinas
  2012-02-09  9:38     ` Catalin Marinas
  1 sibling, 2 replies; 15+ messages in thread
From: Michael Haggerty @ 2012-02-08 10:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Karl Hasselström, Catalin Marinas,
	"Andy Green (林安廸)",
	git

On 02/08/2012 08:33 AM, Junio C Hamano wrote:
> To allow parsing the header produced by versions of Git newer than the
> code written to parse it, all commit parsers are expected to skip unknown
> header lines, so that newer types of header lines can be added safely.
> The only three things that are promised are:
> 
>  (1) the header ends with an empty line (just an LF, not "a blank line"),
>  (2) unknown lines can be skipped, and
>  (3) a header "field" begins with the field name, followed by a single SP
>      followed by the value.
> 
> The parser used by StGit, introduced by commit cbe4567 (New StGit core
> infrastructure: repository operations, 2007-12-19), was accidentally a bit
> too loose to lose information, and a bit too strict to raise exception
> when dealing with a line it does not understand.
> 
>  - It used "strip()" to lose whitespaces from both ends, risking a line
>    with only whitespaces to be mistaken as the end of the header.
> 
>  - It used "k, v = line.split(None, 1)", blindly assuming that all header
>    lines (including the ones that the version of StGit may not understand)
>    can safely be split without raising an exception, which is not true if
>    there is no SP on the line.
> 
> This patch changes the parsing logic so that it:
> 
>  (1) detects end of the hedaer correctly by treating only an empty line as
>      such;
>  (2) handles multi-line fields (a header line that begins with a single SP
>      is appended to the previous line after removing that leading SP but
>      retaining the LF between the line and the previous line) correctly;
>  (3) splits a line at the first SP to find the field name, but only does
>      so when there actually is SP on the line; and
>  (4) ignores lines that cannot be understood without barfing.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> 
>  * Earlier I sent a minimum parser fix that ignores multi-line fields, as
>    the fields StGit cares about are all single line.  This patch also
>    teaches multi-line fields to the parser, so that later versions of
>    StGit can parse and use them if they choose to.
> 
>    Python is not my primary language, so please take this with a grain of
>    salt.
> 
>    Thanks.
> 
>  stgit/lib/git.py |   41 +++++++++++++++++++++++++++--------------
>  1 file changed, 27 insertions(+), 14 deletions(-)
> 
> diff --git a/stgit/lib/git.py b/stgit/lib/git.py
> index 56287f6..f19371b 100644
> --- a/stgit/lib/git.py
> +++ b/stgit/lib/git.py
> @@ -390,21 +390,34 @@ class CommitData(Immutable, Repr):
>          @return: A new L{CommitData} object
>          @rtype: L{CommitData}"""
>          cd = cls(parents = [])
> -        lines = list(s.splitlines(True))
> +        raw_lines = list(s.splitlines(True))

str.splitlines() splits lines at any EOL pattern ('\n', '\r\n' or '\r'
alone).  If you want to be sure to split only on '\n', I think the
simplest alternative is

    raw_lines = s.split('\n')

str.split() and str.splitlines() already return lists, so it is not
necessary to wrap the result in list().

But please note that str.split() discards the split characters, and if
the last character in the string is '\n' then the last string in the
result list is the empty string.

> +        lines = []
> +        # Collapse multi-line header lines
> +        for i in xrange(len(raw_lines)):
> +            line = raw_lines[i]

The two previous lines can be written

           for (i, line) in enumerate(raw_lines):

> +            if line == '\n':
> +                cd.set_message(''.join(raw_lines[i+1:]))
> +                break
> +            if line[0] == ' ':
> +                # continuation line
> +                lines[-1] += '\n' + line[1:]

In your original version, lines[-1] would already be LF-terminated, so
this line would create a double-LF in the string.

> +            else:
> +                lines.append(line)
>          for i in xrange(len(lines)):
> -            line = lines[i].strip()
> -            if not line:
> -                return cd.set_message(''.join(lines[i+1:]))
> -            key, value = line.split(None, 1)
> -            if key == 'tree':
> -                cd = cd.set_tree(repository.get_tree(value))
> -            elif key == 'parent':
> -                cd = cd.add_parent(repository.get_commit(value))
> -            elif key == 'author':
> -                cd = cd.set_author(Person.parse(value))
> -            elif key == 'committer':
> -                cd = cd.set_committer(Person.parse(value))
> -        assert False
> +            line = lines[i].rstrip('\n')
> +            ix = line.find(' ')
> +            if 0 <= ix:
> +                key, value = line[0:ix], line[ix+1:]

The above five lines can be written

           for line in lines:
               if ' ' in line:
                   key, value = line.rstrip('\n').split(' ', 1)

or (if the lack of a space should be treated more like an exception)

           for line in lines:
               try:
                   key, value = line.rstrip('\n').split(' ', 1)
               except ValueError:
                   continue

> +                if key == 'tree':
> +                    cd = cd.set_tree(repository.get_tree(value))
> +                elif key == 'parent':
> +                    cd = cd.add_parent(repository.get_commit(value))
> +                elif key == 'author':
> +                    cd = cd.set_author(Person.parse(value))
> +                elif key == 'committer':
> +                    cd = cd.set_committer(Person.parse(value))

All in all, I would recommend something like (untested):

        @return: A new L{CommitData} object
        @rtype: L{CommitData}"""
        cd = cls(parents = [])
        lines = []
        raw_lines = s.split('\n')
        # Collapse multi-line header lines
        for i, line in enumerate(raw_lines):
            if not line:
                cd.set_message('\n'.join(raw_lines[i+1:]))
                break
            if line.startswith(' '):
                # continuation line
                lines[-1] += '\n' + line[1:]
            else:
                lines.append(line)

        for line in lines:
            if ' ' in line:
                key, value = line.split(' ', 1)
                if key == 'tree':
                    cd = cd.set_tree(repository.get_tree(value))
                elif key == 'parent':
                    cd = cd.add_parent(repository.get_commit(value))
                elif key == 'author':
                    cd = cd.set_author(Person.parse(value))
                elif key == 'committer':
                    cd = cd.set_committer(Person.parse(value))
        return cd

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [StGit PATCH] Parse commit object header correctly
  2012-02-08 10:00     ` Michael Haggerty
@ 2012-02-08 10:43       ` Frans Klaver
  2012-02-08 16:17         ` Michael Haggerty
  2012-02-15 12:24       ` Catalin Marinas
  1 sibling, 1 reply; 15+ messages in thread
From: Frans Klaver @ 2012-02-08 10:43 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Karl Hasselström, Catalin Marinas,
	"Andy Green (林安廸)",
	git

On Wed, Feb 8, 2012 at 11:00 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> On 02/08/2012 08:33 AM, Junio C Hamano wrote:
>
>>  (1) detects end of the hedaer correctly by treating only an empty line as
>>      such;

s/hedaer/header/;


>> +            line = lines[i].rstrip('\n')
>> +            ix = line.find(' ')
>> +            if 0 <= ix:
>> +                key, value = line[0:ix], line[ix+1:]
>
> The above five lines can be written
>
>           for line in lines:
>               if ' ' in line:
>                   key, value = line.rstrip('\n').split(' ', 1)
>
> or (if the lack of a space should be treated more like an exception)
>
>           for line in lines:
>               try:
>                   key, value = line.rstrip('\n').split(' ', 1)
>               except ValueError:
>                   continue

This is generally considered more pythonic: "It's easier to ask for
forgiveness than to get permission".


>
>> +                if key == 'tree':
>> +                    cd = cd.set_tree(repository.get_tree(value))
>> +                elif key == 'parent':
>> +                    cd = cd.add_parent(repository.get_commit(value))
>> +                elif key == 'author':
>> +                    cd = cd.set_author(Person.parse(value))
>> +                elif key == 'committer':
>> +                    cd = cd.set_committer(Person.parse(value))
>
> All in all, I would recommend something like (untested):
>
>        @return: A new L{CommitData} object
>        @rtype: L{CommitData}"""
>        cd = cls(parents = [])
>        lines = []
>        raw_lines = s.split('\n')
>        # Collapse multi-line header lines
>        for i, line in enumerate(raw_lines):
>            if not line:
>                cd.set_message('\n'.join(raw_lines[i+1:]))
>                break
>            if line.startswith(' '):
>                # continuation line
>                lines[-1] += '\n' + line[1:]
>            else:
>                lines.append(line)
>
>        for line in lines:
>            if ' ' in line:
>                key, value = line.split(' ', 1)
>                if key == 'tree':
>                    cd = cd.set_tree(repository.get_tree(value))
>                elif key == 'parent':
>                    cd = cd.add_parent(repository.get_commit(value))
>                elif key == 'author':
>                    cd = cd.set_author(Person.parse(value))
>                elif key == 'committer':
>                    cd = cd.set_committer(Person.parse(value))
>        return cd

One could also take the recommended python approach for
switch/case-like if/elif/else statements:

updater = { 'tree': lambda cd, value: cd.set_tree(repository.get_tree(value),
                 'parent': lambda cd, value:
cd.add_parent(repository.get_commit(value)),
                 'author': lambda cd, value: cd.set_author(Person.parse(value)),
                 'committer': lambda cd, value:
cd.set_committer(Person.parse(value))
              }
for line in lines:
    try:
        key, value = line.split(' ', 1)
        cd = updater[key](cd, value)
    except ValueError:
        continue
    except KeyError:
        continue

It documents about the same, but adds checking on double 'case'
statements. The resulting for loop is rather cleaner and the exception
approach becomes even more logical. I rather like the result, but I
guess it's mostly a matter of taste.

Cheers,
Frans

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [StGit PATCH] Parse commit object header correctly
  2012-02-08 10:43       ` Frans Klaver
@ 2012-02-08 16:17         ` Michael Haggerty
  2012-02-08 20:04           ` Frans Klaver
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Haggerty @ 2012-02-08 16:17 UTC (permalink / raw)
  To: Frans Klaver
  Cc: Junio C Hamano, Karl Hasselström, Catalin Marinas,
	"Andy Green (林安廸)",
	git

On 02/08/2012 11:43 AM, Frans Klaver wrote:
> On Wed, Feb 8, 2012 at 11:00 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> On 02/08/2012 08:33 AM, Junio C Hamano wrote:
>>> +            line = lines[i].rstrip('\n')
>>> +            ix = line.find(' ')
>>> +            if 0 <= ix:
>>> +                key, value = line[0:ix], line[ix+1:]
>>
>> The above five lines can be written
>>
>>           for line in lines:
>>               if ' ' in line:
>>                   key, value = line.rstrip('\n').split(' ', 1)
>>
>> or (if the lack of a space should be treated more like an exception)
>>
>>           for line in lines:
>>               try:
>>                   key, value = line.rstrip('\n').split(' ', 1)
>>               except ValueError:
>>                   continue
> 
> This is generally considered more pythonic: "It's easier to ask for
> forgiveness than to get permission".

Given that Junio explicitly wanted to allow lines with no spaces, I
assume that lack of a space is not an error but rather a conceivable
future extension.  If my assumption is correct, then it is misleading
(and inefficient) to handle it via an exception.

>>> +                if key == 'tree':
>>> +                    cd = cd.set_tree(repository.get_tree(value))
>>> +                elif key == 'parent':
>>> +                    cd = cd.add_parent(repository.get_commit(value))
>>> +                elif key == 'author':
>>> +                    cd = cd.set_author(Person.parse(value))
>>> +                elif key == 'committer':
>>> +                    cd = cd.set_committer(Person.parse(value))
>>
>> All in all, I would recommend something like (untested):
>>
>>        @return: A new L{CommitData} object
>>        @rtype: L{CommitData}"""
>>        cd = cls(parents = [])
>>        lines = []
>>        raw_lines = s.split('\n')
>>        # Collapse multi-line header lines
>>        for i, line in enumerate(raw_lines):
>>            if not line:
>>                cd.set_message('\n'.join(raw_lines[i+1:]))
>>                break
>>            if line.startswith(' '):
>>                # continuation line
>>                lines[-1] += '\n' + line[1:]
>>            else:
>>                lines.append(line)
>>
>>        for line in lines:
>>            if ' ' in line:
>>                key, value = line.split(' ', 1)
>>                if key == 'tree':
>>                    cd = cd.set_tree(repository.get_tree(value))
>>                elif key == 'parent':
>>                    cd = cd.add_parent(repository.get_commit(value))
>>                elif key == 'author':
>>                    cd = cd.set_author(Person.parse(value))
>>                elif key == 'committer':
>>                    cd = cd.set_committer(Person.parse(value))
>>        return cd
> 
> One could also take the recommended python approach for
> switch/case-like if/elif/else statements:
>
> updater = { 'tree': lambda cd, value: cd.set_tree(repository.get_tree(value),
>                  'parent': lambda cd, value:
> cd.add_parent(repository.get_commit(value)),
>                  'author': lambda cd, value: cd.set_author(Person.parse(value)),
>                  'committer': lambda cd, value:
> cd.set_committer(Person.parse(value))
>               }
> for line in lines:
>     try:
>         key, value = line.split(' ', 1)
>         cd = updater[key](cd, value)
>     except ValueError:
>         continue
>     except KeyError:
>         continue
> 
> It documents about the same, but adds checking on double 'case'
> statements. The resulting for loop is rather cleaner and the exception
> approach becomes even more logical. I rather like the result, but I
> guess it's mostly a matter of taste.

I know this approach and use it frequently, but when one has to resort
to lambdas and there are only four cases, it becomes IMHO less readable
than the if..else version.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [StGit PATCH] Parse commit object header correctly
  2012-02-08 16:17         ` Michael Haggerty
@ 2012-02-08 20:04           ` Frans Klaver
  2012-02-09  3:58             ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Frans Klaver @ 2012-02-08 20:04 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Karl Hasselström, Catalin Marinas,
	Andy Green (林安廸),
	git

On Wed, 08 Feb 2012 17:17:24 +0100, Michael Haggerty  
<mhagger@alum.mit.edu> wrote:

> On 02/08/2012 11:43 AM, Frans Klaver wrote:
>> On Wed, Feb 8, 2012 at 11:00 AM, Michael Haggerty  
>> <mhagger@alum.mit.edu> wrote:
>>> On 02/08/2012 08:33 AM, Junio C Hamano wrote:
>>>> +            line = lines[i].rstrip('\n')
>>>> +            ix = line.find(' ')
>>>> +            if 0 <= ix:
>>>> +                key, value = line[0:ix], line[ix+1:]
>>>
>>> The above five lines can be written
>>>
>>>           for line in lines:
>>>               if ' ' in line:
>>>                   key, value = line.rstrip('\n').split(' ', 1)
>>>
>>> or (if the lack of a space should be treated more like an exception)
>>>
>>>           for line in lines:
>>>               try:
>>>                   key, value = line.rstrip('\n').split(' ', 1)
>>>               except ValueError:
>>>                   continue
>>
>> This is generally considered more pythonic: "It's easier to ask for
>> forgiveness than to get permission".
>
> Given that Junio explicitly wanted to allow lines with no spaces, I
> assume that lack of a space is not an error but rather a conceivable
> future extension.  If my assumption is correct, then it is misleading
> (and inefficient) to handle it via an exception.

I find the documenting more convincing than the efficiency, but from the  
phrasing I think you do too.


>>>        for line in lines:
>>>            if ' ' in line:
>>>                key, value = line.split(' ', 1)
>>>                if key == 'tree':
>>>                    cd = cd.set_tree(repository.get_tree(value))
>>>                elif key == 'parent':
>>>                    cd = cd.add_parent(repository.get_commit(value))
>>>                elif key == 'author':
>>>                    cd = cd.set_author(Person.parse(value))
>>>                elif key == 'committer':
>>>                    cd = cd.set_committer(Person.parse(value))
>>>        return cd
>>
>> One could also take the recommended python approach for
>> switch/case-like if/elif/else statements:
>>
>> updater = { 'tree': lambda cd, value:  
>> cd.set_tree(repository.get_tree(value),
>>                  'parent': lambda cd, value:
>> cd.add_parent(repository.get_commit(value)),
>>                  'author': lambda cd, value:  
>> cd.set_author(Person.parse(value)),
>>                  'committer': lambda cd, value:
>> cd.set_committer(Person.parse(value))
>>               }
>> for line in lines:
>>     try:
>>         key, value = line.split(' ', 1)
>>         cd = updater[key](cd, value)
>>     except ValueError:
>>         continue
>>     except KeyError:
>>         continue
>>
>> It documents about the same, but adds checking on double 'case'
>> statements. The resulting for loop is rather cleaner and the exception
>> approach becomes even more logical. I rather like the result, but I
>> guess it's mostly a matter of taste.
>
> I know this approach and use it frequently, but when one has to resort
> to lambdas and there are only four cases, it becomes IMHO less readable
> than the if..else version.

Well, as I said, its largely a matter of taste; four items is a corner  
case to me when thinking maintainability vs. readability. On the other  
hand, this doesn't seem like an oft-changing piece of code, so a longer  
list of if..elif..else shouldn't be a problem either.

Frans

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [StGit PATCH] Parse commit object header correctly
  2012-02-08 20:04           ` Frans Klaver
@ 2012-02-09  3:58             ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2012-02-09  3:58 UTC (permalink / raw)
  To: Frans Klaver
  Cc: Michael Haggerty, Karl Hasselström, Catalin Marinas,
	Andy Green (林安廸),
	git

"Frans Klaver" <fransklaver@gmail.com> writes:

>>>>           for line in lines:
>>>>               try:
>>>>                   key, value = line.rstrip('\n').split(' ', 1)
>>>>               except ValueError:
>>>>                   continue
>>>
>>> This is generally considered more pythonic: "It's easier to ask for
>>> forgiveness than to get permission".
>>
>> Given that Junio explicitly wanted to allow lines with no spaces, I
>> assume that lack of a space is not an error but rather a conceivable
>> future extension.  If my assumption is correct, then it is misleading
>> (and inefficient) to handle it via an exception.
>
> I find the documenting more convincing than the efficiency, but from
> the phrasing I think you do too.

A line that consists entirely of non-SP may or may not a conceivable
future extension, but the point is to "skip without barfing anything you
do not understand".

I wouldn't oppose the rewrite that uses try/except ValueError if
"everything in this try block will parse what I understand correctly, and
any ValueError exception this try block throws is an indication that I
encountered what I do not understand and I must skip" is the more pythonic
way to express that principle. Python is not my primary language as I
said, and in addition StGit may have its own style I haven't learned.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [StGit PATCH] Parse commit object header correctly
  2012-02-08  7:33   ` [StGit PATCH] Parse commit object header correctly Junio C Hamano
  2012-02-08 10:00     ` Michael Haggerty
@ 2012-02-09  9:38     ` Catalin Marinas
  2012-02-09 17:51       ` Jonathan Nieder
  2012-02-09 19:04       ` Junio C Hamano
  1 sibling, 2 replies; 15+ messages in thread
From: Catalin Marinas @ 2012-02-09  9:38 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Karl Hasselström, Andy Green (林安廸), git

Hi Junio,

On 8 February 2012 07:33, Junio C Hamano <gitster@pobox.com> wrote:
> To allow parsing the header produced by versions of Git newer than the
> code written to parse it, all commit parsers are expected to skip unknown
> header lines, so that newer types of header lines can be added safely.
> The only three things that are promised are:
>
>  (1) the header ends with an empty line (just an LF, not "a blank line"),
>  (2) unknown lines can be skipped, and
>  (3) a header "field" begins with the field name, followed by a single SP
>     followed by the value.

Thanks for looking into this. Is this the same as an email header? If
yes, we could just use the python's email.Header.decode_header()
function (I haven't tried yet).

BTW, does Git allow custom headers to be inserted by tools like StGit?

-- 
Catalin

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [StGit PATCH] Parse commit object header correctly
  2012-02-09  9:38     ` Catalin Marinas
@ 2012-02-09 17:51       ` Jonathan Nieder
  2012-02-10  4:27         ` Nguyen Thai Ngoc Duy
  2012-02-09 19:04       ` Junio C Hamano
  1 sibling, 1 reply; 15+ messages in thread
From: Jonathan Nieder @ 2012-02-09 17:51 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Junio C Hamano, Karl Hasselström,
	Andy Green (林安廸),
	git

Hi Catalin,

Catalin Marinas wrote:

> Thanks for looking into this. Is this the same as an email header? If
> yes, we could just use the python's email.Header.decode_header()
> function (I haven't tried yet).

They look like this:

	encoding ISO8859-1

> BTW, does Git allow custom headers to be inserted by tools like StGit?

No.  There is one list of supported headers, and this list is the
standards body that maintains it[*].  So if you end up needing an
extension to the commit object format, that can be done, but it needs
to be accepted here (and ideally checked by "git fsck", though it's
lagging a bit in that respect lately).

By the way, headers have a standard order to avoid spurious changes in
commit names from reordering.  Additions so far have always happened
at the end, which is what makes checks by "git fsck" possible --- it
can't rule out an unrecognized header line being a standard field from
a future version of git, but it would be allowed to complain about
unrecognized fields before 'encoding', for example.

Thanks,
Jonathan

[*] http://thread.gmane.org/gmane.comp.version-control.git/138848/focus=138892

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [StGit PATCH] Parse commit object header correctly
  2012-02-09  9:38     ` Catalin Marinas
  2012-02-09 17:51       ` Jonathan Nieder
@ 2012-02-09 19:04       ` Junio C Hamano
  1 sibling, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2012-02-09 19:04 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Karl Hasselström, Andy Green (林安廸), git

Catalin Marinas <catalin.marinas@gmail.com> writes:

> On 8 February 2012 07:33, Junio C Hamano <gitster@pobox.com> wrote:
>> To allow parsing the header produced by versions of Git newer than the
>> code written to parse it, all commit parsers are expected to skip unknown
>> header lines, so that newer types of header lines can be added safely.
>> The only three things that are promised are:
>>
>>  (1) the header ends with an empty line (just an LF, not "a blank line"),
>>  (2) unknown lines can be skipped, and
>>  (3) a header "field" begins with the field name, followed by a single SP
>>     followed by the value.
>
> Thanks for looking into this. Is this the same as an email header? If
> yes, we could just use the python's email.Header.decode_header()
> function (I haven't tried yet).

If you are thinking about feeding everything up to the first empty line to
whatever is designed to parse email header, please don't.  I do not think
they obey "skip unknown lines without barfing" rule [*1*], so we would be
back to square one if you did so.

The fix posted in this thread is necessary because the change to StGit
between v0.15 and v0.16 made to ignore lines starting with "encoding " was
a wrong way to work around the broken parser in v0.15 in the first place.
The parser assumed that (1) all whitespaces around the header lines can be
stripped, (2) the result after such stripping will always have at least
one whitespace so that line.split(None, 1) will never barf, and (3)
between the field name and its value there may be arbitrary number of
whitespace characters that can be ignored so that line.split(None, 1) is a
safe way to split it into a (key,value) pair.  None of which is a safe
thing to assume.  The rule for safe parsing is to ignore all lines it does
not understand without assuming anything, and I wrote the patch in this
thread to make sure it makes no such unwarranted assumption.

> BTW, does Git allow custom headers to be inserted by tools like StGit?

The header format is designed in such a way that it is safe for a parser
to silently ignore unknown cruft, but that also means tools that work on
an existing commit and produce a similar one, like "commit --amend", are
free to either ignore and drop them when creating a new commit out of the
original one, or replay it verbatim without adjusting them to the new
context they appear in.  In that sense, they are technically "allowed",
but depending on the nature of the information you are putting there, it
semantically may or may not produce the desired result [*2*].  I would say
it is strongly discouraged to invent new types of header lines without
first consulting the people who maintain tools you must interoperate with,
so that they will also be aware of them, and hopefully their tools can be
adjusted to help you use them.

Sorry for the breakage and making you to deal with this post release. We
observed that recent StGit did not barf after we added "encoding " field
to Git, and assumed that StGit correctly ignored lines that it did not
understand, without inspecting its code.

At least we should have Cc'ed you guys directly when the change was being
discussed on the list.


[Footnote]

*1* I also suspect that it will handle a line that begins with a single SP
differently if you use email parsing rules. In a commit object header, the
content of such a line is appended to the value of the previous field
after turning that leading single SP into a LF, and the resulting value
will be a string that consists of multiple lines. The header folding rule
used for e-mail in RFC2822 is a way to represent a (logically) single line
as physically multiple lines, so the result of unfolding will become a
single line.  This difference may not matter for the purpose of the
current StGit that understands nothing but tree/parent/author/committer,
but because we are discussing a forward-looking fix for its parser, I
wouldn't recommend "it does not matter because we do not currently care"
approach.

*2* For example, a line that begins with "gpgsig " is a field that records
the GPG signature of a commit itself (using "git commit -S"), and it
should not survive across "commit --amend".  A line that begins with
"mergetag " is a field that records the tag information that was merged
from a side branch, and amending such a merge does not change what was
merged, so it should survive across "commit --amend".

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [StGit PATCH] Parse commit object header correctly
  2012-02-09 17:51       ` Jonathan Nieder
@ 2012-02-10  4:27         ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 15+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-02-10  4:27 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Catalin Marinas, Junio C Hamano, Karl Hasselström,
	Andy Green (林安廸),
	git

On Fri, Feb 10, 2012 at 12:51 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> No.  There is one list of supported headers, and this list is the
> standards body that maintains it[*].  So if you end up needing an
> extension to the commit object format, that can be done, but it needs
> to be accepted here (and ideally checked by "git fsck", though it's
> lagging a bit in that respect lately).
>
> [*] http://thread.gmane.org/gmane.comp.version-control.git/138848/focus=138892

Doesn't this deserve a document in Documentation/technical? It's also
a good opportunity to document tree and tag object format in addition
to commit object. I did not check thoroughly but the commit that
introduced encoding field, 4b2bced, did not come with any document
updates, so I assume it has not been documented ever since.
-- 
Duy

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [StGit PATCH] Parse commit object header correctly
  2012-02-08 10:00     ` Michael Haggerty
  2012-02-08 10:43       ` Frans Klaver
@ 2012-02-15 12:24       ` Catalin Marinas
  2012-02-15 18:13         ` Junio C Hamano
  2012-02-15 18:40         ` "Andy Green (林安廸)"
  1 sibling, 2 replies; 15+ messages in thread
From: Catalin Marinas @ 2012-02-15 12:24 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Karl Hasselström,
	"Andy Green (林安廸)",
	git

On 8 February 2012 10:00, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> On 02/08/2012 08:33 AM, Junio C Hamano wrote:
>> To allow parsing the header produced by versions of Git newer than the
>> code written to parse it, all commit parsers are expected to skip unknown
>> header lines, so that newer types of header lines can be added safely.
>> The only three things that are promised are:
>>
>>  (1) the header ends with an empty line (just an LF, not "a blank line"),
>>  (2) unknown lines can be skipped, and
>>  (3) a header "field" begins with the field name, followed by a single SP
>>      followed by the value.
>>
>> The parser used by StGit, introduced by commit cbe4567 (New StGit core
>> infrastructure: repository operations, 2007-12-19), was accidentally a bit
>> too loose to lose information, and a bit too strict to raise exception
>> when dealing with a line it does not understand.
...
> All in all, I would recommend something like (untested):
>
>        @return: A new L{CommitData} object
>        @rtype: L{CommitData}"""
>        cd = cls(parents = [])
>        lines = []
>        raw_lines = s.split('\n')
>        # Collapse multi-line header lines
>        for i, line in enumerate(raw_lines):
>            if not line:
>                cd.set_message('\n'.join(raw_lines[i+1:]))
>                break
>            if line.startswith(' '):
>                # continuation line
>                lines[-1] += '\n' + line[1:]
>            else:
>                lines.append(line)
>
>        for line in lines:
>            if ' ' in line:
>                key, value = line.split(' ', 1)
>                if key == 'tree':
>                    cd = cd.set_tree(repository.get_tree(value))
>                elif key == 'parent':
>                    cd = cd.add_parent(repository.get_commit(value))
>                elif key == 'author':
>                    cd = cd.set_author(Person.parse(value))
>                elif key == 'committer':
>                    cd = cd.set_committer(Person.parse(value))
>        return cd

Thank you all for comments and patches. I used a combination of
Junio's patch with the comments from Michael and a fix from me. I'll
publish it to the 'master' branch shortly and release a 0.16.1
hopefully this week.

-- 
Catalin

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [StGit PATCH] Parse commit object header correctly
  2012-02-15 12:24       ` Catalin Marinas
@ 2012-02-15 18:13         ` Junio C Hamano
  2012-02-15 18:40         ` "Andy Green (林安廸)"
  1 sibling, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2012-02-15 18:13 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Michael Haggerty, Karl Hasselström,
	Andy Green (林安廸),
	git

Catalin Marinas <catalin.marinas@gmail.com> writes:

> ... I'll
> publish it to the 'master' branch shortly and release a 0.16.1
> hopefully this week.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [StGit PATCH] Parse commit object header correctly
  2012-02-15 12:24       ` Catalin Marinas
  2012-02-15 18:13         ` Junio C Hamano
@ 2012-02-15 18:40         ` "Andy Green (林安廸)"
  1 sibling, 0 replies; 15+ messages in thread
From: "Andy Green (林安廸)" @ 2012-02-15 18:40 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Michael Haggerty, Junio C Hamano, Karl Hasselström, git

On 02/15/2012 04:24 AM, Somebody in the thread at some point said:

Hi -

> Thank you all for comments and patches. I used a combination of
> Junio's patch with the comments from Michael and a fix from me. I'll
> publish it to the 'master' branch shortly and release a 0.16.1
> hopefully this week.

I cloned the master branch and installed it locally, it's working well.

Thanks a lot to the guys who spent time on this bug and stgit overall,
which I rely heavily on!

-Andy

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-02-15 18:40 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-07 13:02 STGIT: Deathpatch in linus tree "Andy Green (林安廸)"
2012-02-07 17:37 ` Junio C Hamano
2012-02-08  7:33   ` [StGit PATCH] Parse commit object header correctly Junio C Hamano
2012-02-08 10:00     ` Michael Haggerty
2012-02-08 10:43       ` Frans Klaver
2012-02-08 16:17         ` Michael Haggerty
2012-02-08 20:04           ` Frans Klaver
2012-02-09  3:58             ` Junio C Hamano
2012-02-15 12:24       ` Catalin Marinas
2012-02-15 18:13         ` Junio C Hamano
2012-02-15 18:40         ` "Andy Green (林安廸)"
2012-02-09  9:38     ` Catalin Marinas
2012-02-09 17:51       ` Jonathan Nieder
2012-02-10  4:27         ` Nguyen Thai Ngoc Duy
2012-02-09 19:04       ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.