* Removing useless merge commit with "filter-branch"
From: Anatol Pomozov @ 2012-03-08 23:21 UTC
  To: Git Mailing List

Hi,

I have a large project (~100K commits), and I need to split part of
it into a separate project. What I usually do in this case is

git filter-branch --prune-empty --index-filter \
    'git rm -rfq --cached --ignore-unmatch UNNEEDED_DIRECTORIES' HEAD

That works more or less fine for me.

The original project has a lot of merge commits (don't ask me why).
Basically, every non-merge commit is merged back into the master branch
instead of being rebased on top of master. The --prune-empty parameter
in the command above removes empty commits, but not the merges that
brought them in. This leaves a lot of "useless commit points" like this:

|
o      - merge commit that previously merged feature X
|\
| \
|  \
o  |   - real commit
|   |
|  /
|/
|


As far as I'm concerned, such merge leftovers are completely useless, and
I would like to remove them. This task can be split into 2 steps:
1) Remove useless parents. A useless parent is one that points to a
commit that is *already* reachable from some other parent. This step
converts useless merge commits into regular empty commits.
2) Run filter-branch with --prune-empty, which removes such empty commits.
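Step 1 can be illustrated on a toy in-memory commit graph. This is a minimal sketch, not how git itself does it: it assumes a hypothetical Hash that maps each commit id to its parent ids, and the helper names reachable? and independent_parents are made up for illustration. A parent is dropped when a plain depth-first search starting from another parent reaches it.

```ruby
# Hypothetical toy model: a commit graph as a Hash that maps a commit id to
# the array of its parent ids. The ids are placeholders, not real SHA-1s.

# Returns true if `target` can be reached from `start` by following parent links.
def reachable?(graph, start, target)
  stack = [start]
  seen = {}
  until stack.empty?
    id = stack.pop
    return true if id == target
    next if seen[id]

    seen[id] = true
    stack.concat(graph.fetch(id, []))
  end
  false
end

# Step 1: drop every parent that is already reachable from some other parent.
# Surviving parents keep their relative order.
def independent_parents(graph, parents)
  candidates = parents.uniq
  candidates.reject do |p|
    candidates.any? { |q| q != p && reachable?(graph, q, p) }
  end
end

# Example: B's parent is A, and a merge has parents [B, A].
# A is reachable from B, so A is a useless parent of that merge.
graph = { 'A' => [], 'B' => ['A'] }
p independent_parents(graph, %w[B A]) # => ["B"]
```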


So my questions are:
1) What is the best way to remove "useless parents" as in the algorithm above?
2) Should such behavior (removing useless parents/merge commits) be
enabled when the --prune-empty flag is used?


* Re: Removing useless merge commit with "filter-branch"
From: Junio C Hamano @ 2012-03-08 23:30 UTC
  To: Anatol Pomozov; +Cc: Git Mailing List

Anatol Pomozov <anatol.pomozov@gmail.com> writes:

> |
> o      - merge commit that previously merged feature X
> |\
> | \
> |  \
> o  |   - real commit
> |   |
> |  /
> |/
> |

It is unclear how many commits are drawn in the above picture and
what "feature X" is about in the above picture.  Care to redraw the
commit DAG to explain what you are trying to do a bit better?

The way I read it is that you start from a history like this (note
that when we draw an ASCII-art history we often write it sideways;
time flows from left to right):

    ---A-----B-----M---
        \         /
         C-------D

where a side branch implementing "feature X", consisting of C and D,
forked at A and was merged at M after somebody else committed B on the
mainline.  When you filtered out some parts of the tree, it turned
out that C and D are totally uninteresting because their changes
touch only parts outside of your interest, i.e. the history is:

    ---A-----B-----M---
        \         /
         o-------o

where the 'o' commits are now no-ops.

Is that what you are talking about?

I think "log --simplify-merges A..M -- path" may already have logic
that deals with this, so it may help if you study what it does and
how it does what it does.


* Re: Removing useless merge commit with "filter-branch"
From: Anatol Pomozov @ 2012-03-13 22:27 UTC
  To: Junio C Hamano; +Cc: Git Mailing List

Hi

On Thu, Mar 8, 2012 at 3:30 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Anatol Pomozov <anatol.pomozov@gmail.com> writes:
>
>> |
>> o      - merge commit that previously merged feature X
>> |\
>> | \
>> |  \
>> o  |   - real commit
>> |   |
>> |  /
>> |/
>> |
>
> It is unclear how many commits are drawn in the above picture and
> what "feature X" is about in the above picture.  Care to redraw the
> commit DAG to explain what you are trying to do a bit better?
>
> The way I read it is that you start from a history like this (note
> that when we draw an ASCII-art history we often write it sideways;
> time flows from left to right):
>
>    ---A-----B-----M---
>        \         /
>         C-------D
>
> where a side branch implementing "feature X", consisting of C and D,
> forked at A and was merged at M after somebody else committed B on the
> mainline.  When you filtered out some parts of the tree, it turned
> out that C and D are totally uninteresting because their changes
> touch only parts outside of your interest, i.e. the history is:
>
>    ---A-----B-----M---
>        \         /
>         o-------o
>
> where the 'o' commits are now no-ops.
>
> Is that what you are talking about?

Yes. In fact, the --prune-empty flag removes the empty commits, so the history looks like

-----A-------B-------M--------
       \               /
        --------------


So M is a merge with 2 parents, A and B. I would like to remove
this merge M and leave the history as

-----A-----B-----

because only these commits contain changes to the library I am trying to extract.

I think some trickery with "git filter-branch --parent-filter" should help here.

First, run filter-branch with --parent-filter to remove useless
parents from merges (in this example, that would be the A---M parent
link); this converts such merges into regular empty commits.

Then run filter-branch one more time with --prune-empty, which removes
the empty commits.
>
> I think "log --simplify-merges A..M -- path" may already have logic
> that deals with this, so it may help if you study what it does and
> how it does what it does.
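The plan above (reduce the parents of merges, then prune the resulting empty commits) can be checked on a toy model of the history drawn earlier in this thread. This is a sketch under stated assumptions, not filter-branch itself: commits are a hypothetical Hash listed parents-first, "trees" are placeholder strings where equality means the kept paths did not change, and both steps run in one parents-first pass in which each commit's parents are reduced before the prune rule (drop a single-parent commit whose tree equals its parent's tree) is applied. The helper names are made up for illustration.

```ruby
# Toy simulation of one history-rewriting pass (NOT filter-branch itself).
# Each commit is id => { parents: [...], tree: "..." }, listed parents-first.
# Tree strings are stand-ins: equal strings mean the kept paths did not change.

def reachable?(graph, start, target)
  stack = [start]
  seen = {}
  until stack.empty?
    id = stack.pop
    return true if id == target
    next if seen[id]

    seen[id] = true
    stack.concat(graph.fetch(id, []))
  end
  false
end

# Keeps only parents not reachable from another parent (relative order kept).
def independent_parents(graph, parents)
  candidates = parents.uniq
  candidates.reject { |p| candidates.any? { |q| q != p && reachable?(graph, q, p) } }
end

# Returns the rewritten graph (id => parents). With reduce_parents: true a
# merge that is left with one parent can then be pruned like any empty commit.
def rewrite(history, reduce_parents:)
  map = {}     # original id => id it was rewritten (or collapsed) to
  result = {}  # surviving id => rewritten parent list
  history.each do |id, commit|
    parents = commit[:parents].map { |par| map.fetch(par) }.uniq
    parents = independent_parents(result, parents) if reduce_parents
    if parents.size == 1 && commit[:tree] == history[parents.first][:tree]
      map[id] = parents.first # empty single-parent commit: prune it
    else
      map[id] = id
      result[id] = parents
    end
  end
  result
end

# The example history after filtering: C and D no longer change anything.
HISTORY = {
  'A' => { parents: [],         tree: 't1' },
  'B' => { parents: ['A'],      tree: 't2' },
  'C' => { parents: ['A'],      tree: 't1' },
  'D' => { parents: ['C'],      tree: 't1' },
  'M' => { parents: ['B', 'D'], tree: 't2' }
}.freeze

p rewrite(HISTORY, reduce_parents: false)['M'] # => ["B", "A"]
p rewrite(HISTORY, reduce_parents: true).keys  # => ["A", "B"]
```

With reduce_parents: false, M keeps the two parents B and A, matching the picture with the empty side branch; with reduce_parents: true, A is found to be reachable from B, M becomes an empty single-parent commit and is pruned, leaving A---B.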


* Re: Removing useless merge commit with "filter-branch"
From: Anatol Pomozov @ 2012-03-29 18:26 UTC
  To: Junio C Hamano; +Cc: Git Mailing List

Hi,

I solved my issue by using "git filter-branch --parent-filter". The
idea is to visit all commits and remove all "dependent" parents (those
reachable from another parent). To find the independent parents I used
the "git show-branch --independent PARENT_1 ... PARENT_N" command. In my
case (we have a lot of short-term development branches), this converted
~80% of all merges into "empty non-merge commits", and then
"git filter-branch --prune-empty" removed them. This made the history of
my project much simpler and more linear.

I think such functionality should be available in git and enabled when
the "--prune-empty" flag is used, so that "--prune-empty" removes not only
simple empty commits but also useless empty merge commits. Or maybe add a
--prune-empty-merges flag?

Anyway here is the script that I use, future readers might find it useful:

$ git filter-branch -f --prune-empty --parent-filter \
    PATH_TO/rewrite_parent.rb master
$ cat rewrite_parent.rb
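#!/usr/bin/ruby
# filter-branch passes the parent list on stdin as "-p SHA1 -p SHA2 ..."
# (empty for a root commit); strip the "-p" markers.
old_parents = gets.chomp.gsub('-p ', ' ')

if old_parents.empty?
  # Root commit: there are no parents to reduce.
  new_parents = []
else
  # Keep only the parents that are not reachable from any other parent.
  new_parents = `git show-branch --independent #{old_parents}`.split
end

# Print the reduced list back in "-p SHA1 -p SHA2" form.
puts new_parents.map { |p| '-p ' + p }.join(' ')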

The script could most likely be rewritten as a one-line shell script.
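The script depends on the parent-filter protocol documented in git-filter-branch(1): the filter reads the parent string on stdin (empty for a root commit, "-p P1" for a normal commit, "-p P1 -p P2 ..." for a merge) and prints a new one in the same format. Below is a minimal sketch that separates parsing and formatting from the choice of parents, so it can be exercised without a repository. The helper names are hypothetical, and the regex-based parsing is an alternative to the gsub in the script above, not what that script does.

```ruby
# Sketch of the --parent-filter contract: stdin carries "" for a root commit,
# "-p P1" for a normal commit, or "-p P1 -p P2 ..." for a merge, and the
# filter prints a string in the same format. Helper names are hypothetical.

# Extracts the parent ids from a parent string.
def parse_parent_string(line)
  line.scan(/-p\s+(\S+)/).flatten
end

# Builds a parent string from a list of ids.
def format_parent_string(ids)
  ids.map { |id| "-p #{id}" }.join(' ')
end

# Rewrites one parent string; the block chooses which parsed ids to keep.
def rewrite_parent_string(line)
  ids = parse_parent_string(line)
  return '' if ids.empty? # root commit: nothing to rewrite

  format_parent_string(yield(ids))
end

# Stubbed selection so the sketch runs without a repository.
p rewrite_parent_string(" -p aaa -p bbb\n") { |ids| ids.take(1) } # => "-p aaa"
p rewrite_parent_string("\n") { |ids| ids }                        # => ""
```

In a real filter the block would pick the parents, for example by calling show-branch as the script does: puts rewrite_parent_string($stdin.read) { |ids| %x(git show-branch --independent #{ids.join(' ')}).split }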

