* Patch (apply) vs. Pull
@ 2005-06-20 16:19 Darrin Thompson
  2005-06-20 17:22 ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Darrin Thompson @ 2005-06-20 16:19 UTC (permalink / raw)
  To: git

In trying to understand git I've adopted a mental model where everybody
duplicates locally the remote history in which they are interested.
These histories have various points in common, allowing for intelligent
merging. That makes perfect sense to me, until I look at the list
archives filled with patches.

How exactly are these patches being generated? Is there a right-way(tm)
which causes the recipient's and later pullers' histories to be
intelligently handled in the future?

TIA.

--
Darrin





* Re: Patch (apply) vs. Pull
  2005-06-20 16:19 Patch (apply) vs. Pull Darrin Thompson
@ 2005-06-20 17:22 ` Junio C Hamano
  2005-06-20 23:01   ` Darrin Thompson
                     ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Junio C Hamano @ 2005-06-20 17:22 UTC (permalink / raw)
  To: Darrin Thompson; +Cc: git

>>>>> "DT" == Darrin Thompson <darrint@progeny.com> writes:

DT> How exactly are these patches being generated? Is there a right-way(tm)
DT> which causes the recipient's and later pullers' histories to be
DT> intelligently handled in the future?

Those patches are, as far as GIT is concerned, out of band
communications.  You could still merge the result if the patch
was picked up by more than one tree independently, but the
behaviour is a bit less than ideal.  Usually the merge still ends up
being manageable.

FYI, here is what I have been doing:

 (1) Start from Linus HEAD.

 (2) Repeat develop-and-commit cycle.

 (3) Run "git format-patch" (not in Linus tree) to generate
     patches.

 (4) Send them out and wait to see which one sticks.

 (5) Pull from Linus.

 (6) Throw away my HEAD, making Linus HEAD my HEAD, while
     preserving changes I have made since I forked from him.  I
     use "jit-rewind" for this.

 (7) Examine patches that Linus rejected, and apply ones that I
     still consider good, making one commit per patch.  I use
     "jit-patch" and "jit-commit -m" for this.

 (8) Go back to step 2.



* Re: Patch (apply) vs. Pull
  2005-06-20 17:22 ` Junio C Hamano
@ 2005-06-20 23:01   ` Darrin Thompson
  2005-06-21 18:02     ` Daniel Barkalow
  2005-06-21 22:09   ` Linus Torvalds
  2005-06-22 17:04   ` Patch (apply) vs. Pull Catalin Marinas
  2 siblings, 1 reply; 33+ messages in thread
From: Darrin Thompson @ 2005-06-20 23:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, 2005-06-20 at 10:22 -0700, Junio C Hamano wrote:
>  (6) Throw away my HEAD, making Linus HEAD my HEAD, while
>      preserving changes I have made since I forked from him.  I
>      use "jit-rewind" for this.

When you say it that way it sounds so _bad_. :-)

Would it make sense to come up with a way to make an emailed series of
patches represent a series of commits? Could patches still be
cherrypicked?

--
Darrin




* Re: Patch (apply) vs. Pull
  2005-06-20 23:01   ` Darrin Thompson
@ 2005-06-21 18:02     ` Daniel Barkalow
  2005-06-22  8:47       ` Junio C Hamano
  2005-06-22  9:56       ` Catalin Marinas
  0 siblings, 2 replies; 33+ messages in thread
From: Daniel Barkalow @ 2005-06-21 18:02 UTC (permalink / raw)
  To: Darrin Thompson; +Cc: Junio C Hamano, git

On Mon, 20 Jun 2005, Darrin Thompson wrote:

> On Mon, 2005-06-20 at 10:22 -0700, Junio C Hamano wrote:
> >  (6) Throw away my HEAD, making Linus HEAD my HEAD, while
> >      preserving changes I have made since I forked from him.  I
> >      use "jit-rewind" for this.
> 
> When you say it that way it sounds so _bad_. :-)

The reason is actually that he has to end up with a different history,
which is the history of the project mainline, rather than the history of
his tree. He could, of course, follow his own history, but then
communication with people who use a different history becomes difficult.

> Would it make sense to come up with a way to make an emailed series of
> patches represent a series of commits? Could patches still be
> cherrypicked?

Commits are fundamentally resistant to cherrypicking, because they give
the state of the tree rather than expressing changes in that
state. Long-term, I think that something like StGIT should be integrated
into the system and deal with generating the HEAD you get after getting
patches in.

	-Daniel
*This .sig left intentionally blank*



* Re: Patch (apply) vs. Pull
  2005-06-20 17:22 ` Junio C Hamano
  2005-06-20 23:01   ` Darrin Thompson
@ 2005-06-21 22:09   ` Linus Torvalds
  2005-06-22  9:08     ` Junio C Hamano
                       ` (3 more replies)
  2005-06-22 17:04   ` Patch (apply) vs. Pull Catalin Marinas
  2 siblings, 4 replies; 33+ messages in thread
From: Linus Torvalds @ 2005-06-21 22:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Darrin Thompson, git



On Mon, 20 Jun 2005, Junio C Hamano wrote:
>
> FYI, here is what I have been doing:
> 
>  (1) Start from Linus HEAD.
> 
>  (2) Repeat develop-and-commit cycle.
> 
>  (3) Run "git format-patch" (not in Linus tree) to generate
>      patches.
> 
>  (4) Send them out and wait to see which one sticks.
> 
>  (5) Pull from Linus.
> 
>  (6) Throw away my HEAD, making Linus HEAD my HEAD, while
>      preserving changes I have made since I forked from him.  I
>      use "jit-rewind" for this.
> 
>  (7) Examine patches that Linus rejected, and apply ones that I
>      still consider good, making one commit per patch.  I use
>      "jit-patch" and "jit-commit -m" for this.
> 
>  (8) Go back to step 2.

Btw, I'd like to help automate the 6-7 stage with a different kind of 
merge logic.

The current "real merge" is the global history merge, and that's the kind
that I personally want to use, since that's what makes sense from a
"project lead" standpoint and for the people around me in the kernel space
that are project leaders of their own.

However, as you point out, it's not necessarily the best kind of merge for
the "individual developer" standpoint. Most individual developers don't
necessarily want to merge their work, rather they want to "bring it
forward" to the current tip. And I think git could help with that too.

It would be somewhat akin to the current git-merge-script, but instead of 
merging it based on the common parent, it would instead try to re-base all 
the local commits from the common parent onwards on top of the new remote 
head. That often makes more sense from the standpoint of an individual 
developer who wants to update his work to the remote head.

Something like this (this assumes FETCH_HEAD is the remote head that we 
just fetched with "git fetch xxx" and that we want to re-base to):

 - get the different HEAD info set up, and save the original head in 
   ORIG_HEAD, the way "git resolve" does for real merges:

	: ${GIT_DIR=.git}

	orig=$(git-rev-parse HEAD)
	new=$(git-rev-parse FETCH_HEAD)
	common=$(git-merge-base $orig $new)

	echo $orig > $GIT_DIR/ORIG_HEAD

 - fast-forward to the new HEAD. We'll want to re-base everything off 
   that. If that fails, exit out - we've got dirty state

	git-read-tree -m -u $orig $new || exit 1

 - for each commit that we had in our old tree but not in the common part, 
   try to re-base it:

	> FAILED_TO_CHERRYPICK
	for i in $(git-rev-list $orig ^$common)
	do
		git-cherry-pick $i ||
			(echo $i >> FAILED_TO_CHERRYPICK)
	done
	if [ -s FAILED_TO_CHERRYPICK ]; then
		echo Some commits could not be cherry-picked, check by hand:
		cat FAILED_TO_CHERRYPICK
	fi

and here the "git-cherry-pick" thing is just a script that basically takes
an old commit ID, and tries to re-apply it as a patch (with author data
and commit messages, of course) on top of the current head. It would 
basically be nothing more than a "git-diff-tree $1" followed by trying to 
figure out whether it had already been applied or whether it can be 
applied now.
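
A minimal sketch of the "already applied or can be applied now" check, assuming a present-day git in which "git diff" and "git apply --check" exist. The helper name classify_commit is hypothetical and does not come from this thread; it only tests the commit's diff against the working tree and does not create a commit:

```shell
# classify_commit is a hypothetical helper name, not an existing git command.
# Prints one of: applicable, already-applied, conflict.
classify_commit() {
    commit=$1
    # Does the commit's change (diff against its first parent) apply
    # cleanly to the current working tree?
    if git diff "$commit^" "$commit" | git apply --check 2>/dev/null; then
        echo applicable
    # Does it apply in reverse? Then the tree already contains the change.
    elif git diff "$commit^" "$commit" | git apply -R --check 2>/dev/null; then
        echo already-applied
    else
        echo conflict
    fi
}
```

Because both checks are exact, a change that was edited before it was applied upstream would typically be reported as a conflict rather than as already applied.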

What do you think?

		Linus


* Re: Patch (apply) vs. Pull
  2005-06-21 18:02     ` Daniel Barkalow
@ 2005-06-22  8:47       ` Junio C Hamano
  2005-06-22  9:56       ` Catalin Marinas
  1 sibling, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2005-06-22  8:47 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

>>>>> "DB" == Daniel Barkalow <barkalow@iabervon.org> writes:

DB> On Mon, 20 Jun 2005, Darrin Thompson wrote:
>> On Mon, 2005-06-20 at 10:22 -0700, Junio C Hamano wrote:
>> >  (6) Throw away my HEAD, making Linus HEAD my HEAD, while
>> >      preserving changes I have made since I forked from him.  I
>> >      use "jit-rewind" for this.
>> 
>> When you say it that way it sounds so _bad_. :-)

DB> The reason is actually that he has to end up with a different history,

Exactly.  I _want_ to, not just _have_ to, end up with a
different history.



* Re: Patch (apply) vs. Pull
  2005-06-21 22:09   ` Linus Torvalds
@ 2005-06-22  9:08     ` Junio C Hamano
  2005-06-22 17:21       ` Linus Torvalds
  2005-06-22 16:23     ` Darrin Thompson
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2005-06-22  9:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> and here the "git-cherry-pick" thing is just a script that basically takes
LT> an old commit ID, and tries to re-apply it as a patch (with author data
LT> and commit messages, of course) on top of the current head. It would 
LT> basically be nothing more than a "git-diff-tree $1" followed by trying to 
LT> figure out whether it had already been applied or whether it can be 
LT> applied now.

LT> What do you think?

What you outlined is essentially what I already do by using
jit-rewind, followed by a repeated use of (jit-patch and
jit-commit with -m flag).  The reason I have not automated the
"repeat" part is _not_ because I am lazy, but because typically
the rejected things really need manual intervention, not for
mechanical (read: merge conflict) reasons, but for semantic
reasons, when some patches are accepted while some others are
not.  Especially if I am not the sole supplier of patches to
your tree, my older patches usually need not just rebasing but
_rethinking_, so I myself do not find much need for automating
things further than what I already have.

Having said that, one automation I would benefit from is to
automatically find patches that _have_ been accepted and drop
them from my snapshot pool --- that part should be very easy to
automate and I have not done so primarily because I _am_ lazy.
I could call it git-cherry-drop ;-).




* Re: Patch (apply) vs. Pull
  2005-06-21 18:02     ` Daniel Barkalow
  2005-06-22  8:47       ` Junio C Hamano
@ 2005-06-22  9:56       ` Catalin Marinas
  1 sibling, 0 replies; 33+ messages in thread
From: Catalin Marinas @ 2005-06-22  9:56 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Darrin Thompson, Junio C Hamano, git

Daniel Barkalow <barkalow@iabervon.org> wrote:
> On Mon, 20 Jun 2005, Darrin Thompson wrote:
>> Would it make sense to come up with a way to make an emailed series of
>> patches represent a series of commits? Could patches still be
>> cherrypicked?

While you could cherry-pick a changeset generated by a commit
(i.e. the diff between the commit's tree and its parent), I found this
not to always be convenient. For example, a fix might need more than
one commit but there is no way to know how they relate and which
changesets to cherry-pick, unless somebody tells you exactly.

> Commits are fundamentally resistant to cherrypicking, because they give
> the state of the tree rather than expressing changes in that
> state. Long-term, I think that something like StGIT should be integrated
> into the system and deal with generating the HEAD you get after getting
> patches in.

With StGIT, you can gather many related commits into a patch. This
patch (i.e. a series of related commits) could be pulled into your
tree and pushed onto your stack of patches. StGIT should also allow
one to upgrade the pulled patch and re-apply it onto the stack.

Thanks to Daniel's suggestion for multiple heads support, StGIT will
(in a future release) support pulling changes from a remote tree
together with the patch series information. After this is done,
applying patches from a different branch (head) would be quite simple.

One problem is patch dependency tracking (i.e. you cannot push a patch
onto the stack if it expects a certain patch to be already
applied). Darcs does this by checking whether two patches can be
commuted. I have to think a bit more about how StGIT could handle
this.

-- 
Catalin



* Re: Patch (apply) vs. Pull
  2005-06-21 22:09   ` Linus Torvalds
  2005-06-22  9:08     ` Junio C Hamano
@ 2005-06-22 16:23     ` Darrin Thompson
  2005-06-23  8:36     ` Martin Langhoff
  2005-06-23 23:21     ` [PATCH 0/3] Rebasing for "individual developer" usage Junio C Hamano
  3 siblings, 0 replies; 33+ messages in thread
From: Darrin Thompson @ 2005-06-22 16:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

On Tue, 2005-06-21 at 15:09 -0700, Linus Torvalds wrote:
> However, as you point out, it's not necessarily the best kind of merge for
> the "individual developer" standpoint. Most individual developers don't
> necessarily want to merge their work, rather they want to "bring it
> forward" to the current tip. And I think git could help with that too.
> 

That's a good way to put it.

> and here the "git-cherry-pick" thing is just a script that basically takes
> an old commit ID, and tries to re-apply it as a patch (with author data
> and commit messages, of course) on top of the current head. It would 
> basically be nothing more than a "git-diff-tree $1" followed by trying to 
> figure out whether it had already been applied or whether it can be 
> applied now.
> 
> What do you think?

Here are two desirable things that might be tough to reconcile:

- The merging mechanism might benefit from knowing that your commit was
really originally my commit _if_ my history is relevant to the merge and
present.

- The rest of the world does _not_ want to have to keep my commits on
hand just to follow the mainline.

I imagine if those could be reconciled you'd hit a sweet spot.

A mechanism where those two were true might also provide better hooks
for knowing other things. For instance, which of these particular
commits of mine are not in the mainline tree? 

Perhaps your mainline commit might refer to my humble commit as some
kind of sibling. You don't need to have it to follow the mainline, but
the data is there if it helps anybody.

--
Darrin




* Re: Patch (apply) vs. Pull
  2005-06-20 17:22 ` Junio C Hamano
  2005-06-20 23:01   ` Darrin Thompson
  2005-06-21 22:09   ` Linus Torvalds
@ 2005-06-22 17:04   ` Catalin Marinas
  2 siblings, 0 replies; 33+ messages in thread
From: Catalin Marinas @ 2005-06-22 17:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Darrin Thompson, git

Have a look at StGIT for this, it might help.

Junio C Hamano <junkio@cox.net> wrote:
> FYI, here is what I have been doing:
>
>  (1) Start from Linus HEAD.
>
>  (2) Repeat develop-and-commit cycle.

Gather the related commits into an StGIT patch. It's actually easier
to only update a set of existing patches, similar to the quilt way.

stg new patch1
modify...
stg commit
modify...
stg commit

stg push/pop/new

etc.

>  (3) Run "git format-patch" (not in Linus tree) to generate
>      patches.

stg export. The problem with this one is that it doesn't preserve any
of the commit information but it can be adapted (though I'm not sure
it is worth it, since the patch won't be that readable).

>  (4) Send them out and wait to see which one sticks.
>
>  (5) Pull from Linus.
>
>  (6) Throw away my HEAD, making Linus HEAD my HEAD, while
>      preserving changes I have made since I forked from him.  I
>      use "jit-rewind" for this.

stg pop -a. This will remove all your changes grouped in stgit
patches. The HEAD is now Linus' old HEAD. Pull/merge will advance the
HEAD to Linus' latest HEAD.

>  (7) Examine patches that Linus rejected, and apply ones that I
>      still consider good, making one commit per patch.  I use
>      "jit-patch" and "jit-commit -m" for this.

stg push -a. This step will do a diff3 between the current HEAD and
the top of the patch as the two branches, and the bottom of the patch
as an ancestor.

If the patch was merged unmodified, stgit detects this and warns you
that the patch is now empty (it detects it even if a file contains
other modifications apart from yours). If you modified it in the
meantime or Linus modified it when merging (or some other third
party patch modifies yours), you will get a conflict you can resolve.

>  (8) Go back to step 2.

-- 
Catalin



* Re: Patch (apply) vs. Pull
  2005-06-22  9:08     ` Junio C Hamano
@ 2005-06-22 17:21       ` Linus Torvalds
  2005-06-22 20:08         ` Daniel Barkalow
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2005-06-22 17:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Wed, 22 Jun 2005, Junio C Hamano wrote:
> 
> Having said that, one automation I would benefit from is to
> automatically find patches that _have_ been accepted and drop
> them from my snapshot pool --- that part should be very easy to
> automate and I have not done so primarily because I _am_ lazy.
> I could call it git-cherry-drop ;-).

Andrew Morton does all of this with quilt, and it ends up being very
effective. Of course, the kernel has had almost 15 years of people making 
it more and more modular, so the kernel really gets rejects very seldom 
indeed, and it would probably not work as well for other projects 
(including git) that tend to have tighter couplings, if for no other 
reason than the fact that they are usually smaller.

Anyway, it's not a trivial problem. You can search for patch matches in
the patch history, but the thing is, that will fail whenever a patch was
edited for whitespace etc (which does end up happening often enough to
matter). The same goes even more for the commit messages (I routinely not
only add my sign-off, of course, but also fix typos I notice and often
end up reformatting whitespace).

So at the very least, you'd have to have a fuzzy search (but exact enough 
that you'd see if a part of a patch was dropped etc).
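
One rough way to sketch such a comparison, as an illustration only: fingerprint just the added and removed lines of a unified diff with all whitespace stripped, so that hunk line numbers and whitespace edits do not matter, while a dropped or changed line still alters the result. The helper name patch_fingerprint is hypothetical, and cksum stands in for a stronger hash:

```shell
# patch_fingerprint is a hypothetical helper; reads a unified diff on stdin.
# Heuristic: a removed line whose text starts with "-- " would be mistaken
# for a file header and ignored.
patch_fingerprint() {
    grep '^[+-]' |                      # keep only added/removed lines
        grep -v -e '^+++ ' -e '^--- ' | # drop the file header lines
        tr -d ' \t\r' |                 # ignore all whitespace differences
        grep -v '^[+-]$' |              # ignore added/removed blank lines
        cksum                           # prints "<crc> <length>"
}
```

Two patches with the same fingerprint differ at most in whitespace, context and line numbers. A different fingerprint only says that something changed, not how much, so this is a filter rather than a real fuzzy match.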

			Linus


* Re: Patch (apply) vs. Pull
  2005-06-22 17:21       ` Linus Torvalds
@ 2005-06-22 20:08         ` Daniel Barkalow
  2005-06-22 20:22           ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Barkalow @ 2005-06-22 20:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

On Wed, 22 Jun 2005, Linus Torvalds wrote:

> Anyway, it's not a trivial problem. You can search for patch matches in
> the patch history, but the thing is, that will fail whenever a patch was
> edited for whitespace etc (which does end up happening often enough to
> matter). The same goes even more for the commit messages (I routinely not
> only add my sign-off, of course, but also fix typos I notice and often
> end up reformatting whitespace).

If each patch is given an ID by the author (not committer) which is
preserved across various later modifications, we could at least recognize
which patch it was. Then we could present the author with the information
that the patch was accepted and was modified, and give the author the
option of replacing it in the series with one that reverts some or all of
the modifications (including re-adding dropped portions). I don't see any
reason that it would be better to search for similarity rather than just
having things explicitly tagged (I think, although I'm not sure, that
authors will care about the difference between someone applying an
alternative patch and someone applying the author's patch with
modifications).

	-Daniel
*This .sig left intentionally blank*



* Re: Patch (apply) vs. Pull
  2005-06-22 20:08         ` Daniel Barkalow
@ 2005-06-22 20:22           ` Linus Torvalds
  2005-06-22 21:54             ` Daniel Barkalow
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2005-06-22 20:22 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, git



On Wed, 22 Jun 2005, Daniel Barkalow wrote:
>
> If each patch is given an ID by the author (not committer) which is
> preserved across various later modifications, we could at least recognize
> which patch it was.

I really don't like it.

I do realize that people use patch ID's inside various companies already, 
because it's a nice way to track things. But the fact is, especially with 
the patch going outside the SCM (which is the whole _point_ here, after 
all), any modifications will make that ID be dubious.

And if it _isn't_ modified, then the ID is pointless - you might as well 
use the SHA1 of the patch itself as its ID, ie not use an explicit ID at 
all.
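
As a small illustration of that point, a sketch assuming the sha1sum utility from GNU coreutils is available (the helper name patch_sha1 is hypothetical): the ID of an unmodified patch is just the hash of its bytes, and any edit at all, even to whitespace, produces a different ID.

```shell
# patch_sha1 is a hypothetical helper: SHA1 of the exact patch bytes on stdin.
patch_sha1() {
    sha1sum | cut -d' ' -f1
}

# Usage: patch_sha1 < some.patch
```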

So I think introducing extra ID's in the process only creates the
possibility for more confusion. Either the patch is unmodified (and the ID
is not needed in the first place) or the patch is modified (and the ID
doesn't convey that). Not to mention the fact that the ID then becomes 
just another thing that can get corrupted or lost or just plain mistakenly 
edited from another patch..

So while I do accept ID's in my workflow (the XFS guys use them to track
commits between their own internal system and the kernel releases), I
really don't like it as a primary mechanism. I think it's useful for
specific projects, but any such usefulness is exactly the fact that then
the tracking of the ID's is totally outside of git itself or any of the
processes of git users.

Which of course is ok, but it's _not_ what I'm interested in if we're
discussing trying to make git itself have some support for "end-developer
merges" (re-write history) as opposed to "maintainer merges" (merge
history).

		Linus


* Re: Patch (apply) vs. Pull
  2005-06-22 20:22           ` Linus Torvalds
@ 2005-06-22 21:54             ` Daniel Barkalow
  2005-06-22 22:21               ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Barkalow @ 2005-06-22 21:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

On Wed, 22 Jun 2005, Linus Torvalds wrote:

> I do realize that people use patch ID's inside various companies already, 
> because it's a nice way to track things. But the fact is, especially with 
> the patch going outside the SCM (which is the whole _point_ here, after 
> all), any modifications will make that ID be dubious.

If this were done within the system, the maintainer would merge a commit
from the author, commit, fix up the whitespace, commit, and push out the
result. The developer then sees that the head was merged, and that
whitespace changes were applied on top of that.

The benefit to an ID that doesn't get changed is that the developer (or
the developer's scripts) can tell that the patch is an ancestor of the new
base tree, which is really important information about maintainer
intent: the remnant in this case is a revert of the maintainer's
changes to the patch, not an independent but conflicting patch.

> And if it _isn't_ modified, then the ID is pointless - you might as well 
> use the SHA1 of the patch itself as its ID, ie not use an explicit ID at 
> all.

My thought was actually to use the hash as the ID, and add headers for
"this patch is a descendant of <other-ID>" as it gets tweaked.

But even in the case where the patch is not modified, the developer can't
retrieve the hash of the applied patch, because it may have applied with
fuzz, in which case diffing the parent's tree with the tree wouldn't
generate a byte-for-byte identical patch. So it would be worth having the
commit store the hash of the submitted patch in a header anyway.

> So I think introducing extra ID's in the process only creates the
> possibility for more confusion. Either the patch is unmodified (and the ID
> is not needed in the first place) or the patch is modified (and the ID
> doesn't convey that).

Do you actually modify patches before applying them, rather than applying
them and then fixing the resulting files? I've never managed to modify
content in a patch (aside from dropping hunks) without upsetting patch. If
you apply them and then fix things, the ID (and for that matter, the 
hash) will be safely conveyed from my system to yours and available for
the commit to mention.

> Which of course is ok, but it's _not_ what I'm interested in if we're
> discussing trying to make git itself have some support for "end-developer
> merges" (re-write history) as opposed to "maintainer merges" (merge
> history).

I believe there are separate issues here: 

 1. pure history rewriting: maintainer merges a developer's
    intermediate head; developer generates a new history in which
    everything later is based on the new mainline.
 2. patches: changes are transferred as diffs over SMTP instead of trees
    inside git.
 3. cherry-picking: maintainer applies some set of changes from a
    developer which is not merging a head the developer created.

I think (1) is easily handled as a merge script that goes through a series
of commits and makes a new series out of merging each of them, rather than
merging the last of them only.

I think (2) should be as transparent as possible, and, in cases where
there was no cherry-picking, be equivalent in the system's behavior to the
result of pull and merge (with the possibility for various cleanup 
happening on top of or along with the merge in either method).

The tricky part is (3), which is currently only possible by going outside
of git. But I think that this is something to tackle separately from
(1) and (2) (where (2) does not involve doing (3)).

	-Daniel
*This .sig left intentionally blank*



* Re: Patch (apply) vs. Pull
  2005-06-22 21:54             ` Daniel Barkalow
@ 2005-06-22 22:21               ` Linus Torvalds
  2005-06-23  3:32                 ` Daniel Barkalow
  2005-06-23  8:47                 ` Martin Langhoff
  0 siblings, 2 replies; 33+ messages in thread
From: Linus Torvalds @ 2005-06-22 22:21 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, git



On Wed, 22 Jun 2005, Daniel Barkalow wrote:
> 
> Do you actually modify patches before applying them, rather than applying
> them and then fixing the resulting files? I've never managed to modify
> content in a patch (aside from dropping hunks) without upsetting patch.

I've been doing unified diffs for a _loong_ time, and I edit patches in my 
sleep. The rules for line numbers etc are really quite simple, and yes, I 
do edit patches before I apply them.

The most common form of editing is just removing single fragments, or
fixing up whitespace. Quite often people send me patches that don't even
apply, because whitespace got corrupted either because their mailer ate
it, or simply because they cut-and-pasted the patch. So I end up fixing
things up, and the end result may not actually match the original one
byte-for-byte, even if it matches visually.

Similarly, the "remove patch fragments" is often because somebody mixes up
two things in a patch, and I decide to take one of them but there's some
problem with the other.  Sometimes that patch fragment thing means that I
have to edit even within one fragment, and fix up the patch line-counters
etc.
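
For readers less fluent in unified diffs, the line-counter rule can be sketched as follows. In a hunk header "@@ -A,B +C,D @@", B counts the context lines plus the removed lines and D counts the context lines plus the added lines. The helper name hunk_header is hypothetical, and the start lines are taken as given rather than recomputed:

```shell
# hunk_header OLD_START NEW_START < hunk-body
# Hypothetical helper: recompute the counters of a hunk header after the
# hunk body (the lines below the "@@" line) has been edited by hand.
hunk_header() {
    awk -v os="$1" -v ns="$2" '
        /^ /   { oldc++; newc++ }   # context line: present on both sides
        /^-/   { oldc++ }           # removed line: old side only
        /^[+]/ { newc++ }           # added line: new side only
        END    { printf "@@ -%d,%d +%d,%d @@\n", os, oldc, ns, newc }
    '
}
```

For a body with two context lines, one removed line and two added lines, "hunk_header 10 10" prints "@@ -10,3 +10,4 @@". Removing added lines from an earlier hunk also shifts the new-side start line of the hunks after it, which this sketch does not adjust.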

NOTE! This is "rare" in the sense that it doesn't happen for most patches,
but that said, I get a _lot_ of patches, and it's not rare in the sense
that it doesn't happen weekly.

So for example, I did one such edit just Monday in the "sparse" project,
where Peter Jones sent me a patch that added parsing for a lot of new gcc
attributes (good), but he had also done some other things that weren't
quite ready yet (bad). And because they were easy to separate out in the
patch, I just did it right there instead of asking him to do it for me and
re-sending.

> If you apply them and then fix things, the ID (and for that matter, the
> hash) will be safely conveyed from my system to yours and available for
> the commit to mention.

I don't want crap in my code. I disagree very strongly with people who say
that "you can just fix it later". That's not how people work, and worst of
all, not only does it not get fixed up, even _if_ it is fixed up the wrong
version inevitably ends up showing up somewhere else, just because
somebody ended up using the original patch.

So I want things to be cleaned up before they hit the tree, rather than 
have a really dirty history. A dirty history just makes it harder to read, 
and I don't believe in a second that it's "closer to reality" like some 
people claim.

I don't believe anybody wants to see the "true" history. If they did, we'd 
be logging keystrokes, and sending all of that around. Nope, people want 
(and need, in order to be able to follow it) an "idealized" history.

> I believe there are separate issues here: 
> 
>  1. pure history rewriting: maintainer merges a developer's
>     intermediate head; developer generates a new history in which
>     everything later is based on the new mainline.
>  2. patches: changes are transferred as diffs over SMTP instead of trees
>     inside git.
>  3. cherry-picking: maintainer applies some set of changes from a
>     developer which is not merging a head the developer created.
> 
> I think (1) is easily handled as a merge script that goes through a series
> of commits and makes a new series out of merging each of them, rather than
> merging the last of them only.

Yes. And I think (1) is pretty useful on its own, and that git could 
support that with a nice helper script.

> I think (2) should be as transparent as possible, and, in cases where
> there was no cherry-picking, be equivalent in the system's behavior to the
> result of pull and merge (with the possibility for various cleanup 
> happening on top of or along with the merge in either method).

I really see patches as something totally different than merging. I 
literally see them as a way to move between different systems.

For example, the git model just doesn't work very well for "fluid" 
development: git ends up setting history entirely in stone, which means 
that you can't fix up mistakes later. And this is where patches come in: 
they work as a way to transfer the information, while at the same time 
totally breaking the connection with the original messy tree.

So I really think our view on what "patches" are is very fundamentally 
different. I don't think SMTP is a good medium for merges: BK actually 
supports that with "bk send" (or something like that) and I refused to use 
it. It was a "worst of both worlds" thing.

> The tricky part is (3), which is currently only possible by going outside
> of git. But I think that this is something to tackle separately from
> (1) and (2) (where (2) does not involve doing (3)).

I think the cherry-picking kind of goes hand in hand with 1/2, though. 
Patches are really the perfect form of cherry-picking, exactly because 
they do _not_ imply a very strong ordering or even a very strong 
dependency on the state of the rest of the sources. So patches end up 
being the perfect medium for cherry-picking, and SMTP ends up being one of 
the best ways to transport them and let them evolve.

And then (1) ends up often being the way the patches actually get 
generated in the first place. So these things are intertwined, I think.

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Patch (apply) vs. Pull
  2005-06-22 22:21               ` Linus Torvalds
@ 2005-06-23  3:32                 ` Daniel Barkalow
  2005-06-23  4:23                   ` Linus Torvalds
  2005-06-23  8:47                 ` Martin Langhoff
  1 sibling, 1 reply; 33+ messages in thread
From: Daniel Barkalow @ 2005-06-23  3:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

On Wed, 22 Jun 2005, Linus Torvalds wrote:

> I've been doing unified diffs for a _loong_ time, and I edit patches in my 
> sleep. The rules for line numbers etc are really quite simple, and yes, I 
> do edit patches before I apply them.

Ah, okay. I actually know the rules for line numbers, and I still can't
change the lines and line numbers reliably in sync.

> Similarly, the "remove patch fragments" is often because somebody mixes up
> two things in a patch, and I decide to take one of them but there's some
> problem with the other.  Sometimes that patch fragment thing means that I
> have to edit even within one fragment, and fix up the patch line-counters
> etc.

Right; I do this myself, when I turn a mass of changes into a set of
patches. I think it would be worth having a good tool for applying a
patch while discarding undesired hunks and editing the regions it applies
to. This is, of course, not immediately relevant, aside from it making the
tracking more useful, because more people would be able to reliably modify
patches. In any case, though, after you've fixed a patch, you apply it
rather than sending it to anyone or archiving it or something, right?

> NOTE! This is "rare" in the sense that it doesn't happen for most patches,
> but that said, I get a _lot_ of patches, and it's not rare in the sense
> that it doesn't happen weekly.

I think it would be worthwhile for the common case for the system to
recognize that a patch went in unmodified, but potentially after a
different patch which caused it to have fuzz, and have a header that would
get the developer's update to recognize what happened.

> I don't want crap in my code. I disagree very strongly with people who say
> that "you can just fix it later". That's not how people work, and worst of
> all, not only does it not get fixed up, even _if_ it is fixed up the wrong
> version inevitably ends up showing up somewhere else, just because
> somebody ended up using the original patch.

If you can modify patches, that's better; I think most maintainers would
fall back to applying the patch and fixing things in the working directory
before committing; the history still doesn't get dirty, which is what
actually matters.

> > I believe there are separate issues here: 
> > 
> >  1. pure history rewriting: maintainer merges a developer's
> >     intermediate head; developer generates a new history in which
> >     everything later is based on the new mainline.
> >  2. patches: changes are transferred as diffs over SMTP instead of trees
> >     inside git.
> >  3. cherry-picking: maintainer applies some set of changes from a
> >     developer which is not merging a head the developer created.
> > 
> > I think (1) is easily handled as a merge script that goes through a series
> > of commits and makes a new series out of merging each of them, rather than
> > merging the last of them only.
> 
> Yes. And I think (1) is pretty useful on its own, and that git could 
> support that with a nice helper script.

I think that this, by itself, is likely to be a sufficiently common case
to be worth just doing. Once the script exists, it makes it worthwhile for
developers to organize things such that it works.

E.g., yesterday, I'd have had:

You -- A1 -- A2
 \             \
  ----- B ------ Stuff I've sent -- Stuff I'm working on

If you pulled A1 or A2, and/or B, the script would take care of everything
without any fuss. I didn't actually make this structure of commits,
however, because I didn't have the script that would make it useful. 

> > I think (2) should be as transparent as possible, and, in cases where
> > there was no cherry-picking, be equivalent in the system's behavior to the
> > result of pull and merge (with the possibility for various cleanup 
> > happening on top of or along with the merge in either method).
> 
> I really see patches as something totally different than merging. I 
> literally see them as a way to move between different systems.
> 
> For example, the git model just doesn't work very well for "fluid" 
> development: git ends up setting history entirely in stone, which means 
> that you can't fix up mistakes later. And this is where patches come in: 
> they work as a way to transfer the information, while at the same time 
> totally breaking the connection with the original messy tree.

I certainly use patches internally as a way of going from my messy
development tree to my clean patch tree. But that's mostly a developer's
own usage. (I suppose developers might diff their latest commit against
the mainline, and then hand-edit it into a series without regenerating
patches, but I think that would be really awkward and failure-prone for
non-experts in unified diff.)

But I think that the real value of patches for submission as opposed to
merges is that you can review them conveniently. It's not just that
they're machine-readable and can apply with fuzz; they're also pretty easy
for humans to read, which is why unified diffs are better than context
diffs, despite having the same expressive power. In this case, then, they
aren't being used for cherry-picking or any other history cleaning; you'll
tend to apply the patch straight (or reject it), and then it would be
useful to have it act like a merge, with respect to further operations
understanding what happened.

> So I really think our view on what "patches" are is very fundamentally 
> different. I don't think SMTP is a good medium for merges: BK actually 
> supports that with "bk send" (or something like that) and I refused to use 
> it. It was a "worst of both worlds" thing.

Certainly, if you trust the result enough to merge, there's no need to
review anything and therefore no need to go through email.

> > The tricky part is (3), which is currently only possible by going outside
> > of git. But I think that this is something to tackle separately from
> > (1) and (2) (where (2) does not involve doing (3)).
> 
> I think the cherry-picking kind of goes hand in hand with 1/2, though. 
> Patches are really the perfect form of cherry-picking, exactly because 
> they do _not_ imply a very strong ordering or even a very strong 
> dependency on the state of the rest of the sources. So patches end up 
> being the perfect medium for cherry-picking, and SMTP ends up being one of 
> the best ways to transport them and let them evolve.

I think that cherry-picking depends on 1/2, but that each of them has a
non-cherry-picking case worth supporting specifically.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Patch (apply) vs. Pull
  2005-06-23  3:32                 ` Daniel Barkalow
@ 2005-06-23  4:23                   ` Linus Torvalds
  2005-06-23  5:15                     ` Daniel Barkalow
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2005-06-23  4:23 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, git



On Wed, 22 Jun 2005, Daniel Barkalow wrote:
> 
> Ah, okay. I actually know the rules for line numbers, and I still can't
> change the lines and line numbers reliably in sync.

Heh. I usually just edit the fragment, and ignore the line numbers, and 
then I go back and just count ;)

So I don't even try to keep things in sync, I just fix them up 
after-the-fact.

> Right; I do this myself, when I turn a mass of changes into a set of
> patches. I think it would be worth having a good tool for applying a
> patch while discarding undesired hunks and editing the regions it applies
> to.

There is at least one GNU emacs "patch mode" editing thing to help you, and I
think there is even a tool specifically geared to splitting up a patch
into several sub-patches. I've been pointed at it occasionally, I just do 
it by hand.

> I think it would be worthwhile for the common case for the system to
> recognize that a patch went in unmodified, but potentially after a
> different patch which caused it to have fuzz, and have a header that would
> get the developer's update to recognize what happened.

Note that I don't even apply patches with fuzz at all. "git-apply" refuses 
to recognize anything with fuzz, although it _will_ move the patch around 
to make it match.

I've got this theory that if you apply a thousand patches, and one of them 
applies with fuzz, you want to stop right there and see what's wrong.

This was what I did before I wrote "git-apply":

	patch -E -u --no-backup-if-mismatch -f -p1 --fuzz=0 --input=$PATCHFILE

ie pretty unambiguous. I don't trust the "automatically guess the depth of 
the patch" thing, for example, since it, together with allowing fuzz, has 
several times caused nasty problems for me with things like the wrong 
Makefile being modified by a patch..

So as far as I'm concerned, you really could just take the SHA1 of the
patch (leave out the '@@' lines with line numbers), and you'd have a
reliable ID for it.

In fact, you could probably replace every run of contiguous whitespace
with a single space, and then you'd not have to worry about whitespace
differences either. That would be very simple to do, and quite workable: I
certainly think it sounds more reliable than just hoping that people
always pass on a "patch ID" in their emails..
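
As a rough illustration of that idea (a sketch only, written in Python, and not
an existing git tool; the function name patch_id is made up here), hashing a
diff while skipping the '@@' lines and collapsing whitespace runs could look
something like this:

```python
import hashlib
import re


def patch_id(patch_text: str) -> str:
    """Hash a unified diff, ignoring hunk headers and whitespace differences.

    Hypothetical helper: the name and the exact normalization are assumptions
    made for illustration.
    """
    digest = hashlib.sha1()
    for line in patch_text.splitlines():
        if line.startswith("@@"):
            # Hunk header: carries only line numbers, which change whenever
            # an unrelated patch shifts the hunk's offset.
            continue
        # Replace every run of contiguous whitespace with a single space.
        normalized = re.sub(r"\s+", " ", line)
        digest.update(normalized.encode("utf-8") + b"\n")
    return digest.hexdigest()
```

Two copies of the same change that ended up at different offsets, or that had
their whitespace mangled in transit, would then hash to the same ID, while a
change to the actual content would not.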

> > I don't want crap in my code. I disagree very strongly with people who say
> > that "you can just fix it later". That's not how people work, and worst of
> > all, not only does it not get fixed up, even _if_ it is fixed up the wrong
> > version inevitably ends up showing up somewhere else, just because
> > somebody ended up using the original patch.
> 
> If you can modify patches, that's better; I think most maintainers would
> fall back to applying the patch and fixing things in the working directory
> before committing; the history still doesn't get dirty, which is what
> actually matters.

The reason I much prefer editing patches is my batch-mode approach to then
applying them. If I were to fix things up after applying a patch, I'd have
to break up the series: apply <n> patches, fix up, apply <m> patches, fix
up, etc. In contrast, now I just fix up the mbox directly (at a minimum,
add the sign-off, even if I don't actually touch the patch itself), and
just apply it all in one go.

But yes, if it's a nasty case, I'll just apply it, edit it, re-create a
diff, and then re-apply it with the re-created diff (since all my tools
are geared towards getting the log message etc with the patch, I don't
just commit it after fixing it up: that would screw up the author
information etc).

> > Yes. And I think (1) is pretty useful on its own, and that git could 
> > support that with a nice helper script.
> 
> I think that this, by itself, is likely to be a sufficiently common case
> to be worth just doing. Once the script exists, it makes it worthwhile for
> developers to organize things such that it works.

Yeah. It probably works well in 99% of the cases to just do a simple
"export as patch" + "apply on top with old commit message, author and
author-date".

> But I think that the real value of patches for submission as opposed to
> merges is that you can review them conveniently.

Yes, agreed (with the exception of how I tend to merge with Andrew, where 
it's really more of a regular merge in the sense of "I pull from Andrew 
because I trust him, not because I look at every patch").

> It's not just that
> they're machine-readable and can apply with fuzz; they're also pretty
> easy for humans to read, which is why unified diffs are better than
> context diffs, despite having the same expressive power. In this case,
> then, they aren't being used for cherry-picking or any other history
> cleaning; you'll tend to apply the patch straight (or reject it), and
> then it would be useful to have it act like a merge, with respect to
> further operations understanding what happened.

I've _occasionally_ wanted patches to work that way, just because they 
don't apply, but they'd apply to the right version and then I could just 
merge them. So yes, sometimes a patch might be more of a merge thing. Most 
of the time, the patch has really been around the block several times, and 
it's really lost its position in the history tree.. So it really ends up
being "just apply it to the top" 99% of the time anyway.

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Patch (apply) vs. Pull
  2005-06-23  4:23                   ` Linus Torvalds
@ 2005-06-23  5:15                     ` Daniel Barkalow
  2005-06-23  6:09                       ` Linus Torvalds
  2005-06-23 12:10                       ` Catalin Marinas
  0 siblings, 2 replies; 33+ messages in thread
From: Daniel Barkalow @ 2005-06-23  5:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

On Wed, 22 Jun 2005, Linus Torvalds wrote:

> On Wed, 22 Jun 2005, Daniel Barkalow wrote:
> > I think it would be worthwhile for the common case for the system to
> > recognize that a patch went in unmodified, but potentially after a
> > different patch which caused it to have fuzz, and have a header that would
> > get the developer's update to recognize what happened.
> 
> Note that I don't even apply patches with fuzz at all. "git-apply" refuses 
> to recognize anything with fuzz, although it _will_ move the patch around 
> to make it match.

I bet I'm misunderstanding fuzz; what I actually mean is that, if a patch
applies after moving it, then regenerating it from the result would give
a patch with different line numbers; if these affect the hash, the
author's tools will be sad.

> So as far as I'm concerned, you really could just take the SHA1 of the
> patch (leave out the '@@' lines with line numbers), and you'd have a
> reliable ID for it.
>
> In fact, you could probably replace every run of contiguous whitespace
> with a single space, and then you'd not have to worry about whitespace
> differences either. That would be very simple to do, and quite workable: I
> certainly think it sounds more reliable than just hoping that people
> always pass on a "patch ID" in their emails..

That's actually quite plausible. The only case it wouldn't handle is when
you actually discard parts, and I'm not sure at this point what other
people should see there.

> But yes, if it's a nasty case, I'll just apply it, edit it, re-create a
> diff, and then re-apply it with the re-created diff (since all my tools
> are geared towards getting the log message etc with the patch, I don't
> just commit it after fixing it up: that would screw up the author
> information etc).

I think most people's scripts stash the information needed for the commit
somewhere, and pick it back up at commit time, at least for merges.

> > > Yes. And I think (1) is pretty useful on its own, and that git could 
> > > support that with a nice helper script.
> > 
> > I think that this, by itself, is likely to be a sufficiently common case
> > to be worth just doing. Once the script exists, it makes it worthwhile for
> > developers to organize things such that it works.
> 
> Yeah. It probably works well in 99% of the cases to just do a simple
> "export as patch" + "apply on top with old commit message, author and
> author-date".

I think that you'll get better results out of "merge with top" + "commit
with old commit info, but not listing old commit as a parent". At least,
that's what StGIT is doing, IIRC, and using merge instead of patch seems
like it'll make the remaining 1% a lot more pleasant. In fact, isn't it
necessary if you want to make sense out of "half of my patch got applied",
as a bunch of "still needed" hunks and a bunch of "already applied" hunks 
that disappear after the message?

> > It's not just that
> > they're machine-readable and can apply with fuzz; they're also pretty
> > easy for humans to read, which is why unified diffs are better than
> > context diffs, despite having the same expressive power. In this case,
> > then, they aren't being used for cherry-picking or any other history
> > cleaning; you'll tend to apply the patch straight (or reject it), and
> > then it would be useful to have it act like a merge, with respect to
> > further operations understanding what happened.
> 
> I've _occasionally_ wanted patches to work that way, just because they 
> don't apply, but they'd apply to the right version and then I could just 
> merge them. So yes, sometimes a patch might be more of a merge thing. Most 
> of the time, the patch has really been around the block several times, and 
> it's really lost its position in the history tree.. So it really ends up
> being "just apply it to the top" 99% of the time anyway.

It should be fine as a merge if you apply it to the top; the case that's
cherry-picking is when you apply a patch that was second in a series
without applying the first. By "like a merge" I really mean "someone
changed <old> to <patched>; I want to make that change to <new>, such that
future merges aren't confused." In this case, you'd actually generate only
an "apply" commit, not recreate (or fetch) "patched-old" and generate a
merge commit. But, if you put the hash of the patch (as above) in a commit
header, other people who have the same patch (including the author) can
identify the commonality and not be confused. (I think putting
it in a header is likely necessary for efficiency reasons, so that it
isn't necessary to unpack/diff/hash all of the trees while updating.)

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Patch (apply) vs. Pull
  2005-06-23  5:15                     ` Daniel Barkalow
@ 2005-06-23  6:09                       ` Linus Torvalds
  2005-06-23 16:45                         ` Daniel Barkalow
  2005-06-23 12:10                       ` Catalin Marinas
  1 sibling, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2005-06-23  6:09 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, git



On Thu, 23 Jun 2005, Daniel Barkalow wrote:
> 
> I bet I'm misunderstanding fuzz; what I actually mean is that, if a patch
> applies after moving it, then regenerating it from the result would give
> a patch with different line numbers; if these affect the hash, the
> author's tools will be sad.

What GNU patch calls "fuzz" is how badly the context can "not match". A 
"fuzz factor" of one allows the patch to apply even if the "outermost" of 
the context lines don't match up. See "man patch".

What you talk about is what they (and I) call "offset", and yes, you must
ignore the line numbers when considering two patches identical, exactly
because other patches may change their offsets.

So "git-apply" does apply patches that are offset from where the patch 
claims (and the "claimed position" is really nothing more than a "start 
searching here" parameter), but git-apply does not allow any fuzz.
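
To make the offset-versus-fuzz distinction concrete, here is a minimal Python
sketch of "allow offset, allow no fuzz". It is an illustration under stated
assumptions, not how git-apply is actually implemented: the function name
find_hunk_offset and the outward search order are invented for this example.

```python
from typing import List, Optional


def find_hunk_offset(file_lines: List[str],
                     old_lines: List[str],
                     claimed_start: int) -> Optional[int]:
    """Return the offset at which a hunk's old lines match exactly, or None.

    claimed_start is the 0-based position taken from the hunk header; it is
    only a "start searching here" hint. Every context and removed line must
    match, so this allows offset but zero fuzz.
    """
    n = len(old_lines)
    last = len(file_lines) - n  # last index where the hunk could still fit
    if last < 0:
        return None
    max_distance = max(claimed_start, last - claimed_start)
    for distance in range(max_distance + 1):
        # Try the claimed position first, then widen the search one line at
        # a time in both directions.
        candidates = (0,) if distance == 0 else (-distance, distance)
        for offset in candidates:
            start = claimed_start + offset
            if 0 <= start <= last and file_lines[start:start + n] == old_lines:
                return offset
    return None
```

A tool that tolerated fuzz would additionally retry with the outermost context
lines dropped from old_lines; the sketch above deliberately never does that.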

> > In fact, you could probably replace every run of contiguous whitespace
> > with a single space, and then you'd not have to worry about whitespace
> > differences either. That would be very simple to do, and quite workable: I
> > certainly think it sounds more reliable than just hoping that people
> > always pass on a "patch ID" in their emails..
> 
> That's actually quite plausible. The only case it wouldn't handle is when
> you actually discard parts, and I'm not sure at this point what other
> people should see there.

Yes. One small note of warning: different "diff" algorithms may under some
(mostly unlikely) circumstances result in different patches for the
difference between the same two files. So when comparing SHA1's of diffs
this way, you should also hopefully have the same diff generation
algorithm.

That's not likely to be a problem in practice, but it might be something to
keep in mind as a _possible_ source of confusion, where a patch isn't 
recognized only because it was generated differently from the one that we 
compare against.

In practice, this can happen today with the "-C" and "-M" flags to diff,
of course: two patches look different (and get different SHA1 values) just
because one was generated with "rename logic" turned on and the other
wasn't..

> > Yeah. It probably works well in 99% of the cases to just do a simple
> > "export as patch" + "apply on top with old commit message, author and
> > author-date".
> 
> I think that you'll get better results out of "merge with top" + "commit
> with old commit info, but not listing old commit as a parent".

If I understand you correctly, that assumes that you followed the whole
chain, though, and that there was no cherry-picking.

I'd like to keep the door open here for cherry-picking or other 
transformations ("recreate tree _without_ that one commit"), because it 
would seem to also be a potentially good way to clean up history, not 
just move it forward, no?

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Patch (apply) vs. Pull
  2005-06-21 22:09   ` Linus Torvalds
  2005-06-22  9:08     ` Junio C Hamano
  2005-06-22 16:23     ` Darrin Thompson
@ 2005-06-23  8:36     ` Martin Langhoff
  2005-06-23 23:21     ` [PATCH 0/3] Rebasing for "individual developer" usage Junio C Hamano
  3 siblings, 0 replies; 33+ messages in thread
From: Martin Langhoff @ 2005-06-23  8:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Darrin Thompson, git

On 6/22/05, Linus Torvalds <torvalds@osdl.org> wrote:
> Btw, I'd like to help automate the 6-7 stage with a different kind of
> merge logic.
(...)
>  - get the different HEAD info set up, and save the original head in
>    ORIG_HEAD, the way "git resolve" does for real merges:
> 
>         : ${GIT_DIR=.git}
> 
>         orig=$(git-rev-parse HEAD)
>         new=$(git-rev-parse FETCH_HEAD)
>         common=$(git-merge-base $orig $new)
> 
>         echo $orig > $GIT_DIR/ORIG_HEAD
> 
>  - fast-forward to the new HEAD. We'll want to re-base everything off
>    that. If that fails, exit out - we've got dirty state
> 
>         git-read-tree -m -u $orig $new || exit 1
> 
>  - for each commit that we had in our old tree but not in the common part,
>    try to re-base it:
> 
>         > FAILED_TO_CHERRYPICK
>         for i in $(git-rev-list $orig ^$common)
>         do
>                 git-cherry-pick $i ||
>                         (echo $i >> FAILED_TO_CHERRYPICK)
>         done
>         if [ -s FAILED_TO_CHERRYPICK ]; then
>                 echo Some commits could not be cherry-picked, check by hand:
>                 cat FAILED_TO_CHERRYPICK
>         fi

Re-base and replay local history is an approach I've been using
successfully with Arch (though it takes literally ages). Ideally, the
process should be able to be restarted after one call to
git-cherry-pick fails. It is usually a handful of patches in a series
of a few hundred that will break, usually because it's been fed
upstream. You want to resolve the conflict and resume somehow.

> and here the "git-cherry-pick" thing is just a script that basically takes
> an old commit ID, and tries to re-apply it as a patch (with author data
> and commit messages, of course) on top of the current head. It would
> basically be nothing more than a "git-diff-tree $1" followed by trying to
> figure out whether it had already been applied or whether it can be
> applied now.
> 
> What do you think?

Sounds great. 

It might be useful to provide it a "skip" list, so that it skips
applying selected patches (that have presumably made it upstream).

And perhaps --stop-at <commit-sha> so that if a large replay fails or
yields a broken tree (not at the git level, but at the /does it
compile and run/ level), I can throw away the temporary repo where I'm
working and try again in shorter batches, stopping at "strategic"
points.
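
Something along these lines, as a Python sketch of just the selection logic
(the name plan_replay and the choice to make the stop point inclusive are
assumptions for illustration; nothing here calls git):

```python
from typing import AbstractSet, Iterable, List, Optional


def plan_replay(commits: Iterable[str],
                skip: AbstractSet[str] = frozenset(),
                stop_at: Optional[str] = None) -> List[str]:
    """Pick which commits to replay, given oldest-first commit ids.

    Commits in `skip` are left out (presumably already upstream). Replay
    ends after `stop_at` has been reached, inclusive, if it is given.
    """
    plan = []
    for commit in commits:
        if commit not in skip:
            plan.append(commit)
        if commit == stop_at:
            break
    return plan
```

The caller would then cherry-pick each entry of the returned list in order,
and could run the next batch later starting from the commit after the stop
point.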

cheers,


martin

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Patch (apply) vs. Pull
  2005-06-22 22:21               ` Linus Torvalds
  2005-06-23  3:32                 ` Daniel Barkalow
@ 2005-06-23  8:47                 ` Martin Langhoff
  1 sibling, 0 replies; 33+ messages in thread
From: Martin Langhoff @ 2005-06-23  8:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Daniel Barkalow, Junio C Hamano, git

On 6/23/05, Linus Torvalds <torvalds@osdl.org> wrote:
> I've been doing unified diffs for a _loong_ time, and I edit patches in my
> sleep. The rules for line numbers etc are really quite simple, and yes, I
> do edit patches before I apply them.

We often do unified diff editing when dealing with merge conflicts. It
isn't too hard once you've got the hang of it, usually the hard thing
is resolving the conflict in the first place.

Emacs has a diff editing mode that is really good:
http://wiki.gnuarch.org/Process_20_2a_2erej_20files

cheers,


martin

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Patch (apply) vs. Pull
  2005-06-23  5:15                     ` Daniel Barkalow
  2005-06-23  6:09                       ` Linus Torvalds
@ 2005-06-23 12:10                       ` Catalin Marinas
  2005-06-23 17:05                         ` Daniel Barkalow
  1 sibling, 1 reply; 33+ messages in thread
From: Catalin Marinas @ 2005-06-23 12:10 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Linus Torvalds, Junio C Hamano, git

Daniel Barkalow <barkalow@iabervon.org> wrote:
> I think that you'll get better results out of "merge with top" + "commit
> with old commit info, but not listing old commit as a parent". At least,
> that's what StGIT is doing, IIRC, and using merge instead of patch seems
> like it'll make the remaining 1% a lot more pleasant.

Actually StGIT still lists the old commit as the 2nd parent since I
want to implement a log command which can also show only the commits
against a single patch. If this 2nd parent were not stored,
pushing a patch onto the stack when its base was changed would reset
all the history for that patch.

Of course, there are other ways of doing this like storing all the
commit ids in a file but I found this to be the simplest.

-- 
Catalin


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Patch (apply) vs. Pull
  2005-06-23  6:09                       ` Linus Torvalds
@ 2005-06-23 16:45                         ` Daniel Barkalow
  2005-06-23 18:43                           ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Barkalow @ 2005-06-23 16:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

On Wed, 22 Jun 2005, Linus Torvalds wrote:

> On Thu, 23 Jun 2005, Daniel Barkalow wrote:
> 
> > > In fact, you could probably replace every run of contiguous whitespace
> > > with a single space, and then you'd not have to worry about whitespace
> > > differences either. That would be very simple to do, and quite workable: I
> > > certainly think it sounds more reliable than just hoping that people
> > > always pass on a "patch ID" in their emails..
> > 
> > That's actually quite plausible. The only case it wouldn't handle is when
> > you actually discard parts, and I'm not sure at this point what other
> > people should see there.
> 
> Yes. One small note of warning: different "diff" algorithms may under some
> (mostly unlikely) circumstances result in different patches for the
> difference between the same two files. So when comparing SHA1's of diffs
> this way, you should also hopefully have the same diff generation
> algorithm.

I think that, if we care much about the hashes of diffs, we should hash
the diff that's being applied, not hash a regenerated diff (unless, of
course, the diff that's being applied has been modified in hash-relevant
ways, in which case it's not going to match anything anyway).

GNU diff sometimes generates patches which are really terrible to try to
read, because it finds some line of punctuation from a removed block that
matches a line in an added block and interleaves unrelated content. It
would be nice to not have to worry about confusion if there's a mismatch.

Wouldn't the only case where this would be a problem be if we had the
committer apply the patch, generate a diff, hash it, and stick the hash in
the commit? The no-cache version has the diffs for hashing done by the
same person with the same program, and the hash-the-applied-patch version
has the hashes done on the same patch.

> > > Yeah. It probably works well in 99% of the cases to just do a simple
> > > "export as patch" + "apply on top with old commit message, author and
> > > author-date".
> > 
> > I think that you'll get better results out of "merge with top" + "commit
> > with old commit info, but not listing old commit as a parent".
> 
> If I understand you correctly, that assumes that you followed the whole
> chain, though, and that there was no cherry-picking.

I think that the whole-chain case is sufficiently common to make special
and extra smooth; developers will be making their history forwards every
day, and only cherry-picking on occasion.

This is also still in the developer's messy history, and I don't think
those commits are really worth much as history or organization (although
they're wonderful as checkpointing). The cherry-picking there will
generally be content-based rather than commit-based, and will mostly be
picking out hunks to form a clean patch sequence to replace the history.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Patch (apply) vs. Pull
  2005-06-23 12:10                       ` Catalin Marinas
@ 2005-06-23 17:05                         ` Daniel Barkalow
  2005-06-24 13:41                           ` Catalin Marinas
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Barkalow @ 2005-06-23 17:05 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Linus Torvalds, Junio C Hamano, git

On Thu, 23 Jun 2005, Catalin Marinas wrote:

> Actually StGIT still lists the old commit as the 2nd parent since I
> want to implement a log command which can also show only the commits
> against a single patch. If this 2nd parent would not be stored,
> pushing a patch onto the stack when its base was changed would reset
> all the history for that patch.

I think that it's important to avoid having the array of "rebased the
patch" commits be reachable from the final series if that series is going
to be merged into the mainline at the end.

If you want to keep the history of a patch, you should be able to do it by
rebasing that history as well as the latest patch, so you'd get a
two-parent commit with two rebased parents when you rebased a two-parent
commit.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Patch (apply) vs. Pull
  2005-06-23 16:45                         ` Daniel Barkalow
@ 2005-06-23 18:43                           ` Linus Torvalds
  2005-06-23 19:59                             ` Daniel Barkalow
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2005-06-23 18:43 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, git



On Thu, 23 Jun 2005, Daniel Barkalow wrote:
> 
> I think that, if we care much about the hashes of diffs, we should hash
> the diff that's being applied, not hash a regenerated diff (unless, of
> course, the diff that's being applied has been modified in hash-relevant
> ways, in which case it's not going to match anything anyway).

But this whole algorithm depends on comparing two git commits, so we don't
actually _have_ a diff: we must generate it ourselves from the commits we
have in the history (ie we'd compare the commit we're "moving forward"
with each commit that we're moving it forward past - in order to see
whether it's already been applied).

Now, we control both of these patch generations, so in that sense it's 
easy to just make sure we generate both diffs the same way, but I just 
wanted to point this fact out.

> Wouldn't the only case where this would be a problem be if we had the
> committer apply the patch, generate a diff, hash it, and stick the hash in
> the commit?

Right, that would mean that we don't control the hash generation at both 
points, and would make it fundamentally harder. But since the whole point 
is that we should be able to generate the ID _without_ it actually being 
stored away anywhere, this hash approach should work fine.
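
To illustrate why generating both diffs the same way is enough, here is a minimal sketch of a position-independent fingerprint for a diff. It is a toy under stated assumptions, not the actual git-patch-id algorithm (whose normalization rules are not spelled out in this thread): `toy_patch_id` is a hypothetical helper, the two sample patches are made up, and `cksum` merely stands in for a real hash.

```shell
#!/bin/sh
# Toy fingerprint for a unified diff (hypothetical helper, NOT git-patch-id).
# Drops lines that depend on where the change lands (hunk headers, index and
# diff header lines), removes all whitespace, and checksums what remains.
toy_patch_id () {
    sed -e '/^@@ /d' -e '/^index /d' -e '/^diff /d' |
    tr -d ' \t\n' |
    cksum |
    cut -d' ' -f1
}

# The same change, recorded at two different offsets in the file.
patch_a='--- a/hello.c
+++ b/hello.c
@@ -10,3 +10,3 @@
 int main(void) {
-    return 1;
+    return 0;
 }'
patch_b='--- a/hello.c
+++ b/hello.c
@@ -42,3 +42,3 @@
 int main(void) {
-    return 1;
+    return 0;
 }'

id_a=$(printf '%s\n' "$patch_a" | toy_patch_id)
id_b=$(printf '%s\n' "$patch_b" | toy_patch_id)

if [ "$id_a" = "$id_b" ]; then
    echo "same change"
else
    echo "different change"
fi
```

Because nothing is stored in the commit, whoever runs the check controls both sides of the comparison, which is the property discussed above.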

> I think that the whole-chain case is sufficiently common to make special
> and extra smooth; developers will be making their history forwards every
> day, and only cherry-picking on occasion.

I'm not convinced. I think a _lot_ of people would want to do re-ordering 
of patches, combining them, and cherry-picking them if they just had a 
good way to do that.

That would then be a good way of cleaning up messy history too. Just prune 
the old tree once you've moved it forward.

		Linus


* Re: Patch (apply) vs. Pull
  2005-06-23 18:43                           ` Linus Torvalds
@ 2005-06-23 19:59                             ` Daniel Barkalow
  2005-06-23 22:20                               ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Barkalow @ 2005-06-23 19:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

On Thu, 23 Jun 2005, Linus Torvalds wrote:

> Right, that would mean that we don't control the hash generation at both 
> points, and would make it fundamentally harder. But since the whole point 
> is that we should be able to generate the ID _without_ it actually being 
> stored away anywhere, this hash approach should work fine.

("We" in this case being the developer checking whether the patch went in,
not the maintainer)

We'll need to see if the performance is okay this way, both in terms of
time spent searching for the patch you're checking on and in terms of
recognizing the patch as being the same. No point in designing anything
more complex until we determine if we can't just recognize it already.

> > I think that the whole-chain case is sufficiently common to make special
> > and extra smooth; developers will be making their history forwards every
> > day, and only cherry-picking on occasion.
> 
> I'm not convinced. I think a _lot_ of people would want to do re-ordering 
> of patches, combining them, and cherry-picking them if they just had a 
> good way to do that.

Oh, I think they'd want to do that. I just don't think they'd want to do
it as often as they want to update their tree for changes in mainline,
just because no one developer will do more work than the rest of the
developers combined. And it's worthwhile, because you need to use
diff/patch for the complex cases, and can use diff3 for the simple case,
and you need diff3 if you want the system to treat "everything in patch 5
is already in the new base" as success rather than conflict.

(This is, incidentally, the case that drives me crazy about arch: if I
make a change in one working directory, copy that change into another
working directory, commit it from one of those directories, and update in
the other, I get a reject, and I then have to figure out if the changes 
were actually the same or if one of them has further modifications.)

	-Daniel
*This .sig left intentionally blank*



* Re: Patch (apply) vs. Pull
  2005-06-23 19:59                             ` Daniel Barkalow
@ 2005-06-23 22:20                               ` Linus Torvalds
  2005-06-23 22:49                                 ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2005-06-23 22:20 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, git



On Thu, 23 Jun 2005, Daniel Barkalow wrote:
>
> On Thu, 23 Jun 2005, Linus Torvalds wrote:
> > 
> > Right, that would mean that we don't control the hash generation at both 
> > points, and would make it fundamentally harder. But since the whole point 
> > is that we should be able to generate the ID _without_ it actually being 
> > stored away anywhere, this hash approach should work fine.
> 
> ("We" in this case being the developer checking whether the patch went in,
> not the maintainer)

Yes.

> We'll need to see if the performance is okay this way, both in terms of
> time spent searching for the patch you're checking on and in terms of
> recognizing the patch as being the same. No point in designing anything
> more complex until we determine if we can't just recognize it already.

Actually, this is a _very_ efficient check to do, because I'm just 
incredibly smart.

What you do is to download the current git tree (give it a while to mirror 
out, actually), and you get a magic "git-patch-id" program.

Now, the reason this is very efficient is magical. Let's say that you have
my tree as branch "linus", and you have your own tree (branch "daniel"),
and you want to see what commits are in both (but are not the _same_
commit - you just wonder if they have the same diff). That's a big clue 
that you should drop your copy, because I applied it as a diff.

What you do is simply:

	git-whatchanged -p linus..daniel | git-patch-id | sort > daniel-patch-id
	git-whatchanged -p daniel..linus | git-patch-id | sort > linus-patch-id
	join -j 1 linus-patch-id daniel-patch-id

and you're done. You've just created a list of commits that have the same 
patch ID (and it will list the patch ID _and_ the two commit ID's, just 
to be really nice about it).

It's all totally linear in number of patches involved (yeah, the "sort" is
obviously really N log N, and technically the git-rev-list between the two
trees _could_ be more than linear, but in practice they are very cheap and
very close to linear anyway).
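
The join step can be tried without any repository at all. The following is a minimal sketch with made-up data: the patch and commit IDs are short hypothetical placeholders (real ones are 40 hex digits), and `join -1 1 -2 1` is used as the portable spelling of joining both files on their first field.

```shell
#!/bin/sh
# Sort and join must agree on collation, so pin the locale.
LC_ALL=C; export LC_ALL

tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# Hypothetical "<patch-id> <commit-id>" lists, one per branch.
printf '%s\n' 'aaa1 d-commit-1' 'eee5 d-commit-3' 'ccc3 d-commit-2' |
sort >"$tmp/daniel-patch-id"
printf '%s\n' 'fff6 l-commit-4' 'ccc3 l-commit-2' 'bbb2 l-commit-1' 'eee5 l-commit-3' |
sort >"$tmp/linus-patch-id"

# Join on field 1 (the patch ID). Each output line is
# "<patch-id> <linus-commit> <daniel-commit>".
common=$(join -1 1 -2 1 "$tmp/linus-patch-id" "$tmp/daniel-patch-id")
printf '%s\n' "$common"
```

Only the patch IDs present in both lists survive the join, each paired with the commit from either side.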

It's also obviously totally untested, but hey, I always write perfect
code, so what could _possibly_ go wrong..

		Linus


* Re: Patch (apply) vs. Pull
  2005-06-23 22:20                               ` Linus Torvalds
@ 2005-06-23 22:49                                 ` Linus Torvalds
  0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2005-06-23 22:49 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, Git Mailing List, Andrew Morton



On Thu, 23 Jun 2005, Linus Torvalds wrote:
> 
> It's also obviously totally untested, but hey, I always write perfect
> code, so what could _possibly_ go wrong..

Actually, for once it actually does work.

Doing this for the whole current git kernel history shows the following
patch ID's that are duplicates:

 - fix ia64 syscall auditing:
	0d9a1c0b1d4009f81f1c3d558961cc1eaf89c43d 3ac3ed555bec5b1f92bb22cb94823a0e99d0f320
	0d9a1c0b1d4009f81f1c3d558961cc1eaf89c43d 446b8831f5acf2076fa58a66286789eb84f3df2c

 - remove outdated print_* functions:
	1a830d81e7580b55eb0d445a5bf43920e1dba211 1409277c4aad2e87ad27b2b8a6901ce78eaf8081
	1a830d81e7580b55eb0d445a5bf43920e1dba211 db9dff366ba78085d0323364fadbf09bec0e77ee

 - zfcp: add point-2-point support:
	3969795b9b61425ef7e7ac52e9576203dc1ca71a 6f71d9bc025b02a8cbc2be83b0226a7043a507a5
	3969795b9b61425ef7e7ac52e9576203dc1ca71a 91bbfbda8d41f834c70c47d6f8c95245c90019e5

 - ppc32: add 405EP cpu_spec entry:
	59a8f295cf4c44cae74db41699392dc22db64df1 7fbdf1a23be1837b8bc5bcec096015ca99e00aa7
	59a8f295cf4c44cae74db41699392dc22db64df1 ad95d6098dd1e94a09d2a1fdf39fd8281fcd8958
	59a8f295cf4c44cae74db41699392dc22db64df1 beb9e1c3f32a0f878765c7c1142f91083739c5bd

 - Convert i2o to compat_ioctl:
	6ab2bc457d0a736d97f26f2ca75666bb981fdc42 83363ea074504f9005e28cd6209923637bb74de5
	6ab2bc457d0a736d97f26f2ca75666bb981fdc42 f4c2c15b930b23edaa633b09fe3f4c01b4ecce9f

 - ppc32: Fix Alsa PowerMac driver on old machines:
	7c82fcb20679f5c24c8db37b6dc554bf841744e9 5218064c885af5c49e380d09d54f3cc86891a580
	7c82fcb20679f5c24c8db37b6dc554bf841744e9 9ae250d175e1cbff82223ce2c07897c790c5b948

 - scsi: remove unused scsi_cmnd->internal_timeout field:
	883a1e3ca52a071eaf9d53e66d13b250c6d60942 97665e9c22991401dc56968619c6b8b9c09f3268
	883a1e3ca52a071eaf9d53e66d13b250c6d60942 d3a933dc9851e74581f9f4c8e703e77901ae8d01

 - scsi: remove meaningless scsi_cmnd->serial_number_at_timeout field:
	8d3c15861cc08388e610940506255d3a19109aee 84011ae88da62a20b3ae7b48e2ae3b1ef0fc810a
	8d3c15861cc08388e610940506255d3a19109aee c6295cdf656de63d6d1123def71daba6cd91939c

 - kill old EH constants:
	9b0f17e44f99618902fa0cdc5cd73711060f3650 0db7157ca47e21c7623a59e710b807ad06fce161
	9b0f17e44f99618902fa0cdc5cd73711060f3650 2bc474c3646efba67bdc83b7fc7d8ee7562e0106

 - input: Fix fast scrolling scancodes in atkbd.c:
	bb6b1e4d2d979e1cccf96d0dd494f608d5143bee 5212dd58e67e4b8009107d69a9de45dd2e687496
	bb6b1e4d2d979e1cccf96d0dd494f608d5143bee 7d6064d44bc79e328f2794ee7322ba2676511e2b

 - scsi: add DID_REQUEUE to the error handling:
	c62473f1990c475b087fee7280a1b2e9b9589bc5 686579d95d48c713bdb7008cc76af8398219e687
	c62473f1990c475b087fee7280a1b2e9b9589bc5 bf341919dbc1fbcbb565fb3224c840760ebd9f85

 - consolidate timeout defintions in scsi.h:
	d56aa0d90bced261ddf3b58038fca3cbff59a189 0890d74f295be849032fd4390ee00422dfda83b1
	d56aa0d90bced261ddf3b58038fca3cbff59a189 b6651129cc27d56a9cbefcb5f713cea7706fd6b7

 - kill gratitious includes of major.h:
	e938d763a9c231acbf93f1140c911cbec447c89d 5523662c4cd585b892811d7bb3e25d9a787e19b3
	e938d763a9c231acbf93f1140c911cbec447c89d b453257f057b834fdf9f4a6ad6133598b79bd982

 - drivers/scsi/sym53c416.c: fix a wrong check:
	eb2d067160e40137cd00590e59e800dd09c4f62f 380c3877ae5de888cfb7a59990b9aee5a415295f
	eb2d067160e40137cd00590e59e800dd09c4f62f b6f0b0d016a254ff583fec26f2c9e21c1ae2fdf3

 - ultrastor: fix compile failure:
	ed19cee5e4439c92e4e711fe1c10569b58415c46 4e33bd874bce8b3df2ab52538db59730196383c3
	ed19cee5e4439c92e4e711fe1c10569b58415c46 69aa3f71580990f39e387d96ed1001d2f5fb04b1

 - qla trivial iomem annotation:
	ff16fe82338d2dd7d8baf6a3b38780e0eac2b1f0 766f2fa170e65948053b06c6106c8dc8526c3e14
	ff16fe82338d2dd7d8baf6a3b38780e0eac2b1f0 93fc4294fc112ce4e518a3f62dea8681dc39d9cf

where the first number is the patch ID, and the second number is the
commit ID.

What's interesting to note here is the following:

 - it actually did find duplicates in the current kernel that had just 
   merged perfectly and come in through two different trees.

   It's also interesting to note that while most of them had largely the
   same commit message (ie they clearly got applied from the same emailed
   diff), the algorithm doesn't even look at the message. So it doesn't
   matter that some had extra comments ("I have tested this patch and have
   seen no problems with it."), while others had not just different
   sign-offs (due to the different paths they took), but different authors
   noted.

   Ie this really doesn't care at all about how the patch came there, it 
   just notices that the patch is identical to some other patch.

 - it didn't have a single false positive (the above is the full list of 
   duplicate patch ID's - I just did a

	git-whatchanged -p | git-patch-id | LANG=C sort > sorted-patch-list

   followed by

	cut -d' ' -f1 sorted-patch-list | uniq -d | join -j 1 - sorted-patch-list

   and then when you have that list, you can check the commits themselves 
   with

	cut -d' ' -f2 dup-list | git-diff-tree --pretty -p --stdin | less -S

   and verify the output.

 - it actually found the patch that got incorrectly applied _three_times_ 
   ("ppc32: add 405EP cpu_spec entry") because it continued to apply, and
   thus Andrew's scripts didn't notice it.
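
The duplicate search from the second bullet (sort, then `cut | uniq -d | join`) can be sketched on made-up data. The IDs below are short hypothetical placeholders, and `join -1 1 -2 1` is again the portable spelling of joining on the first field.

```shell
#!/bin/sh
# Sort, uniq and join must agree on collation, so pin the locale.
LC_ALL=C; export LC_ALL

tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# Hypothetical "<patch-id> <commit-id>" pairs for a whole history.
printf '%s\n' \
    'p2 commit-b' 'p1 commit-a' 'p3 commit-c' \
    'p2 commit-d' 'p3 commit-e' 'p3 commit-f' |
sort >"$tmp/sorted-patch-list"

# Keep only patch IDs that occur more than once, then join them back
# against the full list to recover every commit carrying that patch.
dups=$(cut -d' ' -f1 "$tmp/sorted-patch-list" | uniq -d |
       join -1 1 -2 1 - "$tmp/sorted-patch-list")
printf '%s\n' "$dups"
```

A patch applied three times, like p3 here, shows up as three lines with the same leading ID.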

In other words, this "check patch ID's" thing is not only efficient, it
actually seems to work. It literally checked the last 2+ months of kernel
history (2788 patches) for duplicate patches in just under a minute, and
might be useful to Andrew etc as a way to see "yup, that patch got
applied".

Doing the same for just the difference between two branches is pretty much
instantaneous. You can basically check ~50 patches a second using this.
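
As a rough cross-check of those figures, assuming the whole-history run took at most 60 seconds ("just under a minute"), the integer arithmetic gives a lower bound on the rate:

```shell
#!/bin/sh
patches=2788   # number of patches quoted above
seconds=60     # assumed upper bound on the run time
rate=$((patches / seconds))
echo "at least $rate patches per second"
```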

		Linus


* [PATCH 0/3] Rebasing for "individual developer" usage
  2005-06-21 22:09   ` Linus Torvalds
                       ` (2 preceding siblings ...)
  2005-06-23  8:36     ` Martin Langhoff
@ 2005-06-23 23:21     ` Junio C Hamano
  2005-06-23 23:27       ` [PATCH 1/3] git-commit-script: get commit message from an existing one Junio C Hamano
                         ` (2 more replies)
  3 siblings, 3 replies; 33+ messages in thread
From: Junio C Hamano @ 2005-06-23 23:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Darrin Thompson, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Something like this...
LT> What do you think?

I'll be sending these three patches as my answer to that question.

    [PATCH 1/3] git-commit-script: get commit message from an existing one.
    [PATCH 2/3] git-cherry: find commits not merged upstream.
    [PATCH 3/3] git-rebase-script: rebase local commits to new upstream head.



* [PATCH 1/3] git-commit-script: get commit message from an existing one.
  2005-06-23 23:21     ` [PATCH 0/3] Rebasing for "individual developer" usage Junio C Hamano
@ 2005-06-23 23:27       ` Junio C Hamano
  2005-06-23 23:28       ` [PATCH 2/3] git-cherry: find commits not merged upstream Junio C Hamano
  2005-06-23 23:29       ` [PATCH 3/3] git-rebase-script: rebase local commits to new upstream head Junio C Hamano
  2 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2005-06-23 23:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

With -m flag specified, git-commit-script takes the commit
message along with author information from an existing commit.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
*** Linus, I suspect that date.c mishandles the raw date; I am
*** consistently getting 7 hours off and my machine runs in
*** US/Pacific (-0700) timezone.

 git-commit-script |   75 +++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/git-commit-script b/git-commit-script
--- a/git-commit-script
+++ b/git-commit-script
@@ -1,12 +1,37 @@
 #!/bin/sh
+#
+# Copyright (c) 2005 Linus Torvalds
+#
+
+usage () {
+    echo 'git commit [-m existing-commit] [<path>...]'
+    exit 1
+}
+
 : ${GIT_DIR=.git}
-if [ ! -d $GIT_DIR ]; then
+if [ ! -d "$GIT_DIR" ]; then
 	echo Not a git directory 1>&2
 	exit 1
 fi
+while case "$#" in 0) break ;; esac
+do
+    case "$1" in
+    -m) shift
+        case "$#" in
+	0) usage ;;
+	*) use_commit=`git-rev-parse "$1"` ||
+	   exit ;;
+	esac
+	;;
+    *)  break
+        ;;
+    esac
+    shift
+done
+
 git-update-cache -q --refresh -- "$@" || exit 1
 PARENTS="-p HEAD"
-if [ ! -r $GIT_DIR/HEAD ]; then
+if [ ! -r "$GIT_DIR/HEAD" ]; then
 	if [ -z "$(git-ls-files)" ]; then
 		echo Nothing to commit 1>&2
 		exit 1
@@ -20,7 +45,7 @@ if [ ! -r $GIT_DIR/HEAD ]; then
 	) > .editmsg
 	PARENTS=""
 else
-	if [ -f $GIT_DIR/MERGE_HEAD ]; then
+	if [ -f "$GIT_DIR/MERGE_HEAD" ]; then
 		echo "#"
 		echo "# It looks like your may be committing a MERGE."
 		echo "# If this is not correct, please remove the file"
@@ -28,8 +53,38 @@ else
 		echo "# and try again"
 		echo "#"
 		PARENTS="-p HEAD -p MERGE_HEAD"
-	fi > .editmsg
-	git-status-script >> .editmsg
+	elif test "$use_commit" != ""
+	then
+		pick_author_script='
+		/^author /{
+			h
+			s/^author \([^<]*\) <[^>]*> .*$/\1/
+			s/'\''/'\''\'\'\''/g
+			s/.*/GIT_AUTHOR_NAME='\''&'\''/p
+
+			g
+			s/^author [^<]* <\([^>]*\)> .*$/\1/
+			s/'\''/'\''\'\'\''/g
+			s/.*/GIT_AUTHOR_EMAIL='\''&'\''/p
+
+			g
+			s/^author [^<]* <[^>]*> \(.*\)$/\1/
+			s/'\''/'\''\'\'\''/g
+			s/.*/GIT_AUTHOR_DATE='\''&'\''/p
+
+			q
+		}
+		'
+		set_author_env=`git-cat-file commit "$use_commit" |
+		sed -ne "$pick_author_script"`
+		eval "$set_author_env"
+		export GIT_AUTHOR_NAME
+		export GIT_AUTHOR_EMAIL
+		export GIT_AUTHOR_DATE
+		git-cat-file commit "$use_commit" |
+		sed -e '1,/^$/d'
+	fi >.editmsg
+	git-status-script >>.editmsg
 fi
 if [ "$?" != "0" ]
 then
@@ -37,13 +92,17 @@ then
 	rm .editmsg
 	exit 1
 fi
-${VISUAL:-${EDITOR:-vi}} .editmsg
+case "$use_commit" in
+'')
+	${VISUAL:-${EDITOR:-vi}} .editmsg
+	;;
+esac
 grep -v '^#' < .editmsg | git-stripspace > .cmitmsg
 [ -s .cmitmsg ] && 
 	tree=$(git-write-tree) &&
 	commit=$(cat .cmitmsg | git-commit-tree $tree $PARENTS) &&
-	echo $commit > $GIT_DIR/HEAD &&
-	rm -f -- $GIT_DIR/MERGE_HEAD
+	echo $commit > "$GIT_DIR/HEAD" &&
+	rm -f -- "$GIT_DIR/MERGE_HEAD"
 ret="$?"
 rm -f .cmitmsg .editmsg
 exit "$ret"
------------
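
For readers who want to see what the sed expressions in the patch pick out, here is a minimal sketch run against a made-up commit object. The sample header, the addresses in it, and the `author_field` helper are hypothetical. The patch itself additionally escapes single quotes so that the result is safe to pass to `eval`, which this sketch leaves out.

```shell
#!/bin/sh
# Hypothetical raw commit, shaped like the output of git-cat-file commit.
sample_commit='tree 0123456789abcdef0123456789abcdef01234567
author A U Thor <author@example.com> 1119569220 -0700
committer C O Mitter <committer@example.com> 1119569300 -0700

Example subject line'

# Apply one substitution (passed as $1) to the first "author" line only.
author_field () {
    printf '%s\n' "$sample_commit" |
    sed -n -e '/^author /{' -e "$1" -e 'p' -e 'q' -e '}'
}

name=$(author_field 's/^author \([^<]*\) <[^>]*> .*$/\1/')
email=$(author_field 's/^author [^<]* <\([^>]*\)> .*$/\1/')
author_date=$(author_field 's/^author [^<]* <[^>]*> \(.*\)$/\1/')

# Everything after the first blank line is the commit message.
message=$(printf '%s\n' "$sample_commit" | sed -e '1,/^$/d')

echo "name:    $name"
echo "email:   $email"
echo "date:    $author_date"
echo "message: $message"
```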



* [PATCH 2/3] git-cherry: find commits not merged upstream.
  2005-06-23 23:21     ` [PATCH 0/3] Rebasing for "individual developer" usage Junio C Hamano
  2005-06-23 23:27       ` [PATCH 1/3] git-commit-script: get commit message from an existing one Junio C Hamano
@ 2005-06-23 23:28       ` Junio C Hamano
  2005-06-23 23:29       ` [PATCH 3/3] git-rebase-script: rebase local commits to new upstream head Junio C Hamano
  2 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2005-06-23 23:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

This script helps the git-rebase script by finding commits that
have not been merged upstream.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 Makefile   |    2 +
 git-cherry |   90 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+), 1 deletions(-)
 create mode 100755 git-cherry

diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -25,7 +25,7 @@ SCRIPTS=git git-apply-patch-script git-m
 	git-deltafy-script git-fetch-script git-status-script git-commit-script \
 	git-log-script git-shortlog git-cvsimport-script git-diff-script \
 	git-reset-script git-add-script git-checkout-script git-clone-script \
-	gitk
+	gitk git-cherry
 
 PROG=   git-update-cache git-diff-files git-init-db git-write-tree \
 	git-read-tree git-commit-tree git-cat-file git-fsck-cache \
diff --git a/git-cherry b/git-cherry
new file mode 100755
--- /dev/null
+++ b/git-cherry
@@ -0,0 +1,90 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano.
+#
+
+usage="usage: $0 "'<upstream> [<head>]
+
+             __*__*__*__*__> <upstream>
+            /
+  fork-point
+            \__+__+__+__+__+__+__+__> <head>
+
+Each commit between the fork-point and <head> is examined, and
+compared against the change each commit between the fork-point and
+<upstream> introduces.  If the change does not seem to be in the
+upstream, it is shown on the standard output.
+
+The output is intended to be used as:
+
+    OLD_HEAD=$(git-rev-parse HEAD)
+    git-rev-parse linus >${GIT_DIR-.git}/HEAD
+    git-cherry linus $OLD_HEAD |
+    while read commit
+    do
+        GIT_EXTERNAL_DIFF=git-apply-patch-script git-diff-tree -p "$commit" &&
+	git-commit-script -m "$commit"
+    done
+'
+
+case "$#" in
+1) linus=`git-rev-parse "$1"` &&
+   junio=`git-rev-parse HEAD` || exit
+   ;;
+2) linus=`git-rev-parse "$1"` &&
+   junio=`git-rev-parse "$2"` || exit
+   ;;
+*) echo >&2 "$usage"; exit 1 ;;
+esac
+
+# Note that these list commits in reverse order;
+# not that the order in inup matters...
+inup=`git-rev-list ^$junio $linus` &&
+ours=`git-rev-list $junio ^$linus` || exit
+
+tmp=.cherry-tmp$$
+patch=$tmp-patch
+diff=$tmp-diff
+mkdir $patch
+trap "rm -rf $tmp-*" 0 1 2 3 15
+
+_x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
+_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40"
+
+for c in $inup $ours
+do
+	git-diff-tree -p $c |
+	sed -e "/^$_x40 (from $_x40)\$/d;/^--- /d;/^+++ /d;/^@@ /d" >$patch/$c
+	git-diff-tree -r $c |
+	sed -e "/^$_x40 (from $_x40)\$/d;s/ $_x40 $_x40 / X X /" >$patch/$c.s
+done
+
+LF='
+'
+O=
+for c in $ours
+do
+	found=
+	for d in $inup
+	do
+		cmp $patch/$c.s $patch/$d.s >/dev/null ||
+		continue
+
+		diff --unified=0 $patch/$c $patch/$d >$diff
+		cmp /dev/null $diff >/dev/null && {
+			found=t
+			break
+		}
+	done
+	case "$found,$O" in
+	t,*)	;;
+	,)
+		O="$c" ;;
+	,*)
+		O="$c$LF$O" ;;
+	esac
+done
+case "$O" in
+'') ;;
+*)  echo "$O" ;;
+esac
------------
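
The final loop of the script builds its output by prepending, which turns the newest-first order of git-rev-list into the oldest-first order needed for replaying commits. A minimal sketch of just that part, with hypothetical commit names and a hard-coded list standing in for the "already upstream" comparison:

```shell
#!/bin/sh
# Hypothetical inputs: local commits newest-first, as git-rev-list prints them.
ours='c4 c3 c2 c1'
merged='c3'    # pretend the diff comparison found this one upstream

LF='
'
O=
for c in $ours
do
    case " $merged " in
    *" $c "*) continue ;;    # already upstream, nothing to report
    esac
    # Prepend, so that the final list runs oldest-first.
    case "$O" in
    '') O="$c" ;;
    *)  O="$c$LF$O" ;;
    esac
done
printf '%s\n' "$O"
```

The output lists c1, c2 and c4 in that order, one per line.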



* [PATCH 3/3] git-rebase-script: rebase local commits to new upstream head.
  2005-06-23 23:21     ` [PATCH 0/3] Rebasing for "individual developer" usage Junio C Hamano
  2005-06-23 23:27       ` [PATCH 1/3] git-commit-script: get commit message from an existing one Junio C Hamano
  2005-06-23 23:28       ` [PATCH 2/3] git-cherry: find commits not merged upstream Junio C Hamano
@ 2005-06-23 23:29       ` Junio C Hamano
  2 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2005-06-23 23:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Using git-cherry, forward port local commits missing from the
new upstream head.  This depends on "-m" flag support in
git-commit-script.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 Makefile          |    2 +-
 git-rebase-script |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 1 deletions(-)
 create mode 100755 git-rebase-script

diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -25,7 +25,7 @@ SCRIPTS=git git-apply-patch-script git-m
 	git-deltafy-script git-fetch-script git-status-script git-commit-script \
 	git-log-script git-shortlog git-cvsimport-script git-diff-script \
 	git-reset-script git-add-script git-checkout-script git-clone-script \
-	gitk git-cherry
+	gitk git-cherry git-rebase-script
 
 PROG=   git-update-cache git-diff-files git-init-db git-write-tree \
 	git-read-tree git-commit-tree git-cat-file git-fsck-cache \
diff --git a/git-rebase-script b/git-rebase-script
new file mode 100755
--- /dev/null
+++ b/git-rebase-script
@@ -0,0 +1,46 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano.
+#
+
+usage="usage: $0 "'<upstream> [<head>]
+
+Uses output from git-cherry to rebase local commits to the new head of
+upstream tree.'
+
+: ${GIT_DIR=.git}
+
+case "$#" in
+1) linus=`git-rev-parse "$1"` &&
+   junio=`git-rev-parse HEAD` || exit
+   ;;
+2) linus=`git-rev-parse "$1"` &&
+   junio=`git-rev-parse "$2"` || exit
+   ;;
+*) echo >&2 "$usage"; exit 1 ;;
+esac
+
+git-read-tree -m -u $junio $linus &&
+echo "$linus" >"$GIT_DIR/HEAD" || exit
+
+tmp=.rebase-tmp$$
+fail=$tmp-fail
+trap "rm -rf $tmp-*" 0 1 2 3 15
+
+>$fail
+
+git-cherry $linus $junio |
+while read commit
+do
+	S=`cat "$GIT_DIR/HEAD"` &&
+        GIT_EXTERNAL_DIFF=git-apply-patch-script git-diff-tree -p $commit &&
+	git-commit-script -m "$commit" || {
+		echo $commit >>$fail
+		git-read-tree --reset -u $S
+	}
+done
+if test -s $fail
+then
+	echo Some commits could not be rebased, check by hand:
+	cat $fail
+fi
------------
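
One detail worth noting in the script is that failures are collected in a file rather than in a shell variable. A minimal sketch of that pattern, where `apply_commit` is a hypothetical stand-in for "apply the diff, then commit" and the commit names are made up:

```shell
#!/bin/sh
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
fail="$tmp/fail"
: >"$fail"

# Hypothetical stand-in for the apply-and-commit step;
# it fails for any commit whose name starts with "bad-".
apply_commit () {
    case "$1" in
    bad-*) return 1 ;;
    *)     return 0 ;;
    esac
}

printf '%s\n' good-1 bad-2 good-3 bad-4 |
while read -r commit
do
    apply_commit "$commit" || echo "$commit" >>"$fail"
done

# The loop is the tail of a pipeline, so in most shells it ran in a
# subshell: a file survives that, a variable set inside would not.
failed=$(cat "$fail")
if test -s "$fail"
then
    echo "Some commits could not be rebased, check by hand:"
    cat "$fail"
fi
```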



* Re: Patch (apply) vs. Pull
  2005-06-23 17:05                         ` Daniel Barkalow
@ 2005-06-24 13:41                           ` Catalin Marinas
  0 siblings, 0 replies; 33+ messages in thread
From: Catalin Marinas @ 2005-06-24 13:41 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Linus Torvalds, Junio C Hamano, git

Daniel Barkalow <barkalow@iabervon.org> wrote:
> I think that it's important to avoid having the array of "rebased the
> patch" commits be reachable from the final series if that series is going
> to be merged into the mainline at the end.

True. I will remove that. Any commit will have the new base of the
patch as a parent.

> If you want to keep the history of a patch, you should be able to do it by
> rebasing that history as well as the latest patch, so you'd get a
> two-parent commit with two rebased parents when you rebased a two-parent
> commit.

I can have two commits, one of them accessible via HEAD and the other
stored somewhere under .git/patches. The latter is just a normal
commit where the parent is the current HEAD. This will not be
generated when the patch is re-based, but only when a patch is
modified.

-- 
Catalin



end of thread, other threads:[~2005-06-24 13:37 UTC | newest]

Thread overview: 33+ messages
2005-06-20 16:19 Patch (apply) vs. Pull Darrin Thompson
2005-06-20 17:22 ` Junio C Hamano
2005-06-20 23:01   ` Darrin Thompson
2005-06-21 18:02     ` Daniel Barkalow
2005-06-22  8:47       ` Junio C Hamano
2005-06-22  9:56       ` Catalin Marinas
2005-06-21 22:09   ` Linus Torvalds
2005-06-22  9:08     ` Junio C Hamano
2005-06-22 17:21       ` Linus Torvalds
2005-06-22 20:08         ` Daniel Barkalow
2005-06-22 20:22           ` Linus Torvalds
2005-06-22 21:54             ` Daniel Barkalow
2005-06-22 22:21               ` Linus Torvalds
2005-06-23  3:32                 ` Daniel Barkalow
2005-06-23  4:23                   ` Linus Torvalds
2005-06-23  5:15                     ` Daniel Barkalow
2005-06-23  6:09                       ` Linus Torvalds
2005-06-23 16:45                         ` Daniel Barkalow
2005-06-23 18:43                           ` Linus Torvalds
2005-06-23 19:59                             ` Daniel Barkalow
2005-06-23 22:20                               ` Linus Torvalds
2005-06-23 22:49                                 ` Linus Torvalds
2005-06-23 12:10                       ` Catalin Marinas
2005-06-23 17:05                         ` Daniel Barkalow
2005-06-24 13:41                           ` Catalin Marinas
2005-06-23  8:47                 ` Martin Langhoff
2005-06-22 16:23     ` Darrin Thompson
2005-06-23  8:36     ` Martin Langhoff
2005-06-23 23:21     ` [PATCH 0/3] Rebasing for "individual developer" usage Junio C Hamano
2005-06-23 23:27       ` [PATCH 1/3] git-commit-script: get commit message from an existing one Junio C Hamano
2005-06-23 23:28       ` [PATCH 2/3] git-cherry: find commits not merged upstream Junio C Hamano
2005-06-23 23:29       ` [PATCH 3/3] git-rebase-script: rebase local commits to new upstream head Junio C Hamano
2005-06-22 17:04   ` Patch (apply) vs. Pull Catalin Marinas
