All of lore.kernel.org
 help / color / mirror / Atom feed
* fast-import on existing branches
@ 2019-03-08 10:50 Norbert Nemec
  2019-03-08 15:40 ` Elijah Newren
  0 siblings, 1 reply; 5+ messages in thread
From: Norbert Nemec @ 2019-03-08 10:50 UTC (permalink / raw)
  To: git

Hi there,

I've struggled for quite some time to sort out documented, intended and actual behavior of git fast-import. Unless I'm completely mistaken, it seems to be a straightforward bug, but if that is the case, I am really surprised why nobody else has stumbled over it before:

I managed to use fast-import for a chain of commits onto a new branch into an empty repository.
I managed to use fast-import to create a new branch starting from an existing parent using the 'from' command as documented.

What I failed to do is to add commits on top of an existing branch in a new fast-import stream. As it seems, the variant of using 'commit' without 'from' only works on branches that were created within the same fast-import stream!

The different approaches I tried (each with new fast-import stream on existing repo with existing branch)
* 'commit' without 'from'
-> Error: "Not updating <branch> (new tip <hash> does not contain <hash>)
And indeed looking into the repo afterwards, a new commit exists without any parent.
* 'commit' with 'from' both naming the same branch
-> Error: "Can't create a branch from itself"
The only workarounds that I could find are to either explicitly looking up the top commit on the target branch and hand that to fast-import or create a temporary branch with a different name.

Looking through the code of fast-import.c, I can indeed lookup_branch and new_branch only deal with internal data structures and the only point were read_ref is called to actually read existing branches from the repo is in update_branch to check whether the parent was set correctly. What is missing is a call to read_ref in either lookup_branch or new_branch (probably both have to be reworked in some way to handle this cleanly). From all I can see a fix should be fairly straightforward to implement, but I am really not sure whether I have the full picture on this.

(I found all of this struggling with git-p4.py which appears to contains a complex and not fully correct mechanism to determine the 'initalParent' that appears to implement just such a workaround.)

I would be grateful for any input on this issue! Greetings,
Norbert


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: fast-import on existing branches
  2019-03-08 10:50 fast-import on existing branches Norbert Nemec
@ 2019-03-08 15:40 ` Elijah Newren
  2019-03-08 17:33   ` Norbert Nemec
  0 siblings, 1 reply; 5+ messages in thread
From: Elijah Newren @ 2019-03-08 15:40 UTC (permalink / raw)
  To: Norbert Nemec; +Cc: git

Hi Norbert,

On Fri, Mar 8, 2019 at 2:51 AM Norbert Nemec
<Norbert.Nemec@microsoft.com> wrote:
>
> Hi there,
>
> I've struggled for quite some time to sort out documented, intended and actual behavior of git fast-import. Unless I'm completely mistaken, it seems to be a straightforward bug, but if that is the case, I am really surprised why nobody else has stumbled over it before:
>
> I managed to use fast-import for a chain of commits onto a new branch into an empty repository.
> I managed to use fast-import to create a new branch starting from an existing parent using the 'from' command as documented.
>
> What I failed to do is to add commits on top of an existing branch in a new fast-import stream. As it seems, the variant of using 'commit' without 'from' only works on branches that were created within the same fast-import stream!
>
> The different approaches I tried (each with new fast-import stream on existing repo with existing branch)
> * 'commit' without 'from'
> -> Error: "Not updating <branch> (new tip <hash> does not contain <hash>)
> And indeed looking into the repo afterwards, a new commit exists without any parent.
> * 'commit' with 'from' both naming the same branch
> -> Error: "Can't create a branch from itself"
> The only workarounds that I could find are to either explicitly looking up the top commit on the target branch and hand that to fast-import or create a temporary branch with a different name.

I would have just used "from <sha1>" where <sha1> is something I look
up from the current branch I want to update.  But, re-looking at the
docs, it appears git-fast-import.txt covers this already with a
possibly easier syntax:

"""
The special case of restarting an incremental import from the
current branch value should be written as:
----
        from refs/heads/branch^0
----
The `^0` suffix is necessary as fast-import does not permit a branch to
start from itself, and the branch is created in memory before the
`from` command is even read from the input.  Adding `^0` will force
fast-import to resolve the commit through Git's revision parsing library,
rather than its internal branch table, thereby loading in the
existing value of the branch.
"""

Perhaps try that?

> Looking through the code of fast-import.c, I can indeed lookup_branch and new_branch only deal with internal data structures and the only point were read_ref is called to actually read existing branches from the repo is in update_branch to check whether the parent was set correctly. What is missing is a call to read_ref in either lookup_branch or new_branch (probably both have to be reworked in some way to handle this cleanly). From all I can see a fix should be fairly straightforward to implement, but I am really not sure whether I have the full picture on this.
>
> (I found all of this struggling with git-p4.py which appears to contains a complex and not fully correct mechanism to determine the 'initalParent' that appears to implement just such a workaround.)
>
> I would be grateful for any input on this issue! Greetings,
> Norbert

Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: fast-import on existing branches
  2019-03-08 15:40 ` Elijah Newren
@ 2019-03-08 17:33   ` Norbert Nemec
  2019-03-11 19:46     ` Elijah Newren
  0 siblings, 1 reply; 5+ messages in thread
From: Norbert Nemec @ 2019-03-08 17:33 UTC (permalink / raw)
  To: git

Thanks, Elijah, I had indeed missed that block about the ^0 handling.

I still don't get why this awkward workaround is required. Why isn't that lookup done by default? Performance can't be the reason, since the same lookup is done lateron anyway, just as correctness check. The way I read the documentation, providing no "from" should continue committing to a branch in any case. I would never have seen the continuation of an incremental import a "special case". There is a number of tools around that sync a git repo from some other source and would regularly need to continue an existing branch.

Greetings,
Norbert


-----Original Message-----
From: Elijah Newren <newren@gmail.com> 
Sent: Friday, March 8, 2019 4:41 PM
To: Norbert Nemec <Norbert.Nemec@microsoft.com>
Cc: git@vger.kernel.org
Subject: Re: fast-import on existing branches

Hi Norbert,

On Fri, Mar 8, 2019 at 2:51 AM Norbert Nemec <Norbert.Nemec@microsoft.com> wrote:
>
> Hi there,
>
> I've struggled for quite some time to sort out documented, intended and actual behavior of git fast-import. Unless I'm completely mistaken, it seems to be a straightforward bug, but if that is the case, I am really surprised why nobody else has stumbled over it before:
>
> I managed to use fast-import for a chain of commits onto a new branch into an empty repository.
> I managed to use fast-import to create a new branch starting from an existing parent using the 'from' command as documented.
>
> What I failed to do is to add commits on top of an existing branch in a new fast-import stream. As it seems, the variant of using 'commit' without 'from' only works on branches that were created within the same fast-import stream!
>
> The different approaches I tried (each with new fast-import stream on 
> existing repo with existing branch)
> * 'commit' without 'from'
> -> Error: "Not updating <branch> (new tip <hash> does not contain 
> -> <hash>)
> And indeed looking into the repo afterwards, a new commit exists without any parent.
> * 'commit' with 'from' both naming the same branch
> -> Error: "Can't create a branch from itself"
> The only workarounds that I could find are to either explicitly looking up the top commit on the target branch and hand that to fast-import or create a temporary branch with a different name.

I would have just used "from <sha1>" where <sha1> is something I look up from the current branch I want to update.  But, re-looking at the docs, it appears git-fast-import.txt covers this already with a possibly easier syntax:

"""
The special case of restarting an incremental import from the current branch value should be written as:
----
        from refs/heads/branch^0
----
The `^0` suffix is necessary as fast-import does not permit a branch to start from itself, and the branch is created in memory before the `from` command is even read from the input.  Adding `^0` will force fast-import to resolve the commit through Git's revision parsing library, rather than its internal branch table, thereby loading in the existing value of the branch.
"""

Perhaps try that?

> Looking through the code of fast-import.c, I can indeed lookup_branch and new_branch only deal with internal data structures and the only point were read_ref is called to actually read existing branches from the repo is in update_branch to check whether the parent was set correctly. What is missing is a call to read_ref in either lookup_branch or new_branch (probably both have to be reworked in some way to handle this cleanly). From all I can see a fix should be fairly straightforward to implement, but I am really not sure whether I have the full picture on this.
>
> (I found all of this struggling with git-p4.py which appears to 
> contains a complex and not fully correct mechanism to determine the 
> 'initalParent' that appears to implement just such a workaround.)
>
> I would be grateful for any input on this issue! Greetings, Norbert

Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: fast-import on existing branches
  2019-03-08 17:33   ` Norbert Nemec
@ 2019-03-11 19:46     ` Elijah Newren
  2019-03-12  9:18       ` Norbert Nemec
  0 siblings, 1 reply; 5+ messages in thread
From: Elijah Newren @ 2019-03-11 19:46 UTC (permalink / raw)
  To: Norbert Nemec; +Cc: git

Hi Norbert,

On Fri, Mar 8, 2019 at 9:38 AM Norbert Nemec
<Norbert.Nemec@microsoft.com> wrote:
>
> Thanks, Elijah, I had indeed missed that block about the ^0 handling.
>
> I still don't get why this awkward workaround is required. Why isn't that lookup done by default? Performance can't be the reason, since the same lookup is done lateron anyway, just as correctness check. The way I read the documentation, providing no "from" should continue committing to a branch in any case. I would never have seen the continuation of an incremental import a "special case". There is a number of tools around that sync a git repo from some other source and would regularly need to continue an existing branch.
>
> Greetings,
> Norbert

If this "awkward workaround", as you put it, were removed it would
make it impossible to create a commit with no parent without using a
different branch name.  I really like being able to export, modify,
and re-import history, using something of the form:

   git fast-export --all | <edit the stream somehow> | git fast-import --force

which would no longer work if fast-import automatically assumed a
parent for every from-less commit in the input based on the reference
name.  Personally, I'm more on the side of not understanding why
"from" isn't required whenever you want your commit to have a parent;
users can specify either a sha or a mark-id easily enough; I don't see
what it saves to allow omitting it, and it inevitably leads to other
confusion like yours.  But I'm well over a decade too late to advocate
for that.


Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: fast-import on existing branches
  2019-03-11 19:46     ` Elijah Newren
@ 2019-03-12  9:18       ` Norbert Nemec
  0 siblings, 0 replies; 5+ messages in thread
From: Norbert Nemec @ 2019-03-12  9:18 UTC (permalink / raw)
  To: git

Hi Elijah,

thanks for explaining the motivation behind the current solution!

I still believe the situation could be improved without breaking compatibility:
* in the documentation the paragraph about "Omitting the from command" should change "existing branches" into something like "existing branches within the cache of the current fast-import stream". The current phrasing is simply wrong.
* the documentation of the "from branch^0" variant currently looks like a solution for a rare situation that is easily overlooked. Maybe it could be integrated with the paragraph about "Omitting" since it is very closely related.
* in update_branch the warning could hint at the possible solution (explicitly supply a from parent_branch^0 argument)
* I'm still not sure why it would hurt to change fatal error in parse_from about creating a branch from itself to simply fall back to the ^0 behavior?

A much better solution in my view would be to aim for real statelessness and make the caching of branch pointers completely transparent. For all I can see, fast-import nearly follows this paradigm but violates it in this subtle point. Once way to achieve this would be to offer an explicit way to state that a commit should have no parent and deprecate the variant without "from" argument, issuing a warning. The great practical advantage of that would be that interrupting and continuing the fast-import stream would be guaranteed to deliver the same result. Far more important is the conceptual simplicity: A developer could completely forget about caching when it comes to correctness and only think of it when it comes to performance optimization.

Anyway: these are all improvements for future developers. Personally, I am satisfied with everything I have.

Greetings,
Norbert


-----Original Message-----
From: Elijah Newren <newren@gmail.com> 
Sent: Monday, March 11, 2019 8:46 PM
To: Norbert Nemec <Norbert.Nemec@microsoft.com>
Cc: git@vger.kernel.org
Subject: Re: fast-import on existing branches

Hi Norbert,

On Fri, Mar 8, 2019 at 9:38 AM Norbert Nemec <Norbert.Nemec@microsoft.com> wrote:
>
> Thanks, Elijah, I had indeed missed that block about the ^0 handling.
>
> I still don't get why this awkward workaround is required. Why isn't that lookup done by default? Performance can't be the reason, since the same lookup is done lateron anyway, just as correctness check. The way I read the documentation, providing no "from" should continue committing to a branch in any case. I would never have seen the continuation of an incremental import a "special case". There is a number of tools around that sync a git repo from some other source and would regularly need to continue an existing branch.
>
> Greetings,
> Norbert

If this "awkward workaround", as you put it, were removed it would make it impossible to create a commit with no parent without using a different branch name.  I really like being able to export, modify, and re-import history, using something of the form:

   git fast-export --all | <edit the stream somehow> | git fast-import --force

which would no longer work if fast-import automatically assumed a parent for every from-less commit in the input based on the reference name.  Personally, I'm more on the side of not understanding why "from" isn't required whenever you want your commit to have a parent; users can specify either a sha or a mark-id easily enough; I don't see what it saves to allow omitting it, and it inevitably leads to other confusion like yours.  But I'm well over a decade too late to advocate for that.


Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-03-12  9:18 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-08 10:50 fast-import on existing branches Norbert Nemec
2019-03-08 15:40 ` Elijah Newren
2019-03-08 17:33   ` Norbert Nemec
2019-03-11 19:46     ` Elijah Newren
2019-03-12  9:18       ` Norbert Nemec

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.