From: Derrick Stolee <stolee@gmail.com>
To: Elijah Newren <newren@gmail.com>, git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Eric Wong" <e@80x24.org>,
"Jeff King" <peff@peff.net>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
"Lars Schneider" <larsxschneider@gmail.com>,
"Jonathan Nieder" <jrnieder@gmail.com>
Subject: Re: [RFC PATCH 4/5] Recommend git-filter-repo instead of git-filter-branch in documentation
Date: Mon, 26 Aug 2019 21:32:58 -0400
Message-ID: <e7df2ce3-f772-54b5-4e81-70510a897352@gmail.com>
In-Reply-To: <20190826235226.15386-5-newren@gmail.com>

On 8/26/2019 7:52 PM, Elijah Newren wrote:
> filter-branch suffers from a huge number of pitfalls that can result in
> incorrectly rewritten history, and many of the problems can easily go
> undetected until the new repository is in use. This can result in
> problems ranging from an even messier history than what led folks to
> filter-branch in the first place, to data loss or corruption. These
> issues cannot be backward compatibly fixed, so add a warning to the
> filter-branch manpage about this and recommend that another tool (such
> as filter-repo) be used instead.
>
> Also, update other manpages that referenced filter-branch. Several of
> these needed updates even if we could continue recommending
> filter-branch, either due to implying that something was unique to
> filter-branch when it applied more generally to all history rewriting
> tools (e.g. BFG, reposurgeon, fast-import, filter-repo), or because
> something about filter-branch was used as an example despite other more
> commonly known examples now existing. Reword these sections to fix
> these issues and to avoid recommending filter-branch.
>
> Finally, remove the section explaining BFG Repo Cleaner as an
> alternative to filter-branch. I feel somewhat bad about this,
> especially since I feel like I learned so much from BFG that I put to
> good use in filter-repo (which is much more than I can say for
> filter-branch), but keeping that section presented a few problems:
> * In order to recommend that people quit using filter-branch, we need
> to provide them a recommendation for something else to use that
> can handle all the same types of rewrites. To my knowledge,
> filter-repo is the only such tool. So it needs to be mentioned.
> * I don't want to give conflicting recommendations to users
> * If we recommend two tools, we shouldn't expect users to learn both
> and pick which one to use; we should explain which problems one
> can solve that the other can't or when one is much faster than
> the other.
> * BFG and filter-repo have similar performance
> * All filtering types that BFG can do, filter-repo can also do. In
> fact, filter-repo comes with a reimplementation of BFG named
> bfg-ish which provides the same user-interface as BFG but with
> several bugfixes and new features that are hard to implement in
> BFG due to its technical underpinnings.
> While I could still mention both tools, it seems like I would need to
> provide some kind of comparison and I would ultimately just say that
> filter-repo can do everything BFG can, so ultimately it seems that it
> is just better to remove that section altogether.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> Documentation/git-fast-export.txt | 6 ++---
> Documentation/git-filter-branch.txt | 42 ++++++++---------------------
> Documentation/git-gc.txt | 17 ++++++------
> Documentation/git-rebase.txt | 2 +-
> Documentation/git-replace.txt | 10 +++----
> Documentation/git-svn.txt | 4 +--
> Documentation/githooks.txt | 7 ++---
> contrib/svn-fe/svn-fe.txt | 4 +--
> 8 files changed, 36 insertions(+), 56 deletions(-)
>
> diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
> index cc940eb9ad..784e934009 100644
> --- a/Documentation/git-fast-export.txt
> +++ b/Documentation/git-fast-export.txt
> @@ -17,9 +17,9 @@ This program dumps the given revisions in a form suitable to be piped
> into 'git fast-import'.
>
> You can use it as a human-readable bundle replacement (see
> -linkgit:git-bundle[1]), or as a kind of an interactive
> -'git filter-branch'.
> -
> +linkgit:git-bundle[1]), or as a format that can be edited before being
> +fed to 'git fast-import' in order to do history rewrites (an ability
> +relied on by tools like 'git filter-repo').
>
> OPTIONS
> -------
> diff --git a/Documentation/git-filter-branch.txt b/Documentation/git-filter-branch.txt
> index 6b53dd7e06..8c586eed55 100644
> --- a/Documentation/git-filter-branch.txt
> +++ b/Documentation/git-filter-branch.txt
> @@ -16,6 +16,17 @@ SYNOPSIS
> [--original <namespace>] [-d <directory>] [-f | --force]
> [--state-branch <branch>] [--] [<rev-list options>...]
>
> +WARNING
> +-------
> +'git filter-branch' has a litany of gotchas that can and will cause
> +history to be rewritten incorrectly (in addition to abysmal
> +performance). These issues cannot be backward compatibly fixed and as
> +such, its use is not recommended. Please use an alternative history
> +filtering tool such as 'git filter-repo'. If you still need to use
> +'git filter-branch', please carefully read the "Safety" section of
> +https://public-inbox.org/git/CABPp-BEDOH-row-hxY4u_cP30ptqOpcCvPibwyZ2wBu142qUbA@mail.gmail.com/

Is it possible to present this URL as a hyperlink with a succinct
description? Maybe 'carefully read [the "Safety" section of this
message on the Git mailing list](url).' (I'm using Markdown notation
here as I don't know the equivalent for our docs.)

> +and avoid as many of the pitfalls listed there as reasonably possible.
> +
> DESCRIPTION
> -----------
> Lets you rewrite Git revision history by rewriting the branches mentioned
> @@ -445,37 +456,6 @@ warned.
> (or if your git-gc is not new enough to support arguments to
> `--prune`, use `git repack -ad; git prune` instead).
>
> -NOTES
> ------
> -
> -git-filter-branch allows you to make complex shell-scripted rewrites
> -of your Git history, but you probably don't need this flexibility if
> -you're simply _removing unwanted data_ like large files or passwords.
> -For those operations you may want to consider
> -http://rtyley.github.io/bfg-repo-cleaner/[The BFG Repo-Cleaner],
> -a JVM-based alternative to git-filter-branch, typically at least
> -10-50x faster for those use-cases, and with quite different
> -characteristics:
> -
> -* Any particular version of a file is cleaned exactly _once_. The BFG,
> - unlike git-filter-branch, does not give you the opportunity to
> - handle a file differently based on where or when it was committed
> - within your history. This constraint gives the core performance
> - benefit of The BFG, and is well-suited to the task of cleansing bad
> - data - you don't care _where_ the bad data is, you just want it
> - _gone_.
> -
> -* By default The BFG takes full advantage of multi-core machines,
> - cleansing commit file-trees in parallel. git-filter-branch cleans
> - commits sequentially (i.e. in a single-threaded manner), though it
> - _is_ possible to write filters that include their own parallelism,
> - in the scripts executed against each commit.
> -
> -* The http://rtyley.github.io/bfg-repo-cleaner/#examples[command options]
> - are much more restrictive than git-filter branch, and dedicated just
> - to the tasks of removing unwanted data- e.g:
> - `--strip-blobs-bigger-than 1M`.
> -
> GIT
> ---
> Part of the linkgit:git[1] suite
> diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
> index 247f765604..0c114ad1ca 100644
> --- a/Documentation/git-gc.txt
> +++ b/Documentation/git-gc.txt
> @@ -115,15 +115,14 @@ NOTES
> -----
>
> 'git gc' tries very hard not to delete objects that are referenced
> -anywhere in your repository. In
> -particular, it will keep not only objects referenced by your current set
> -of branches and tags, but also objects referenced by the index,
> -remote-tracking branches, refs saved by 'git filter-branch' in
> -refs/original/, reflogs (which may reference commits in branches
> -that were later amended or rewound), and anything else in the refs/* namespace.
> -If you are expecting some objects to be deleted and they aren't, check
> -all of those locations and decide whether it makes sense in your case to
> -remove those references.
> +anywhere in your repository. In particular, it will keep not only
> +objects referenced by your current set of branches and tags, but also
> +objects referenced by the index, remote-tracking branches, notes saved
> +by 'git notes' under refs/notes/, reflogs (which may reference commits
> +in branches that were later amended or rewound), and anything else in
> +the refs/* namespace. If you are expecting some objects to be deleted
> +and they aren't, check all of those locations and decide whether it
> +makes sense in your case to remove those references.
>
> On the other hand, when 'git gc' runs concurrently with another process,
> there is a risk of it deleting an object that the other process is using
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index 6156609cf7..2f201d85d4 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -832,7 +832,7 @@ Hard case: The changes are not the same.::
> This happens if the 'subsystem' rebase had conflicts, or used
> `--interactive` to omit, edit, squash, or fixup commits; or
> if the upstream used one of `commit --amend`, `reset`, or
> - `filter-branch`.
> + a full history rewriting command like `filter-repo`.
>
>
> The easy case
> diff --git a/Documentation/git-replace.txt b/Documentation/git-replace.txt
> index 246dc9943c..35595a2cd3 100644
> --- a/Documentation/git-replace.txt
> +++ b/Documentation/git-replace.txt
> @@ -123,10 +123,10 @@ The following format are available:
> CREATING REPLACEMENT OBJECTS
> ----------------------------
>
> -linkgit:git-filter-branch[1], linkgit:git-hash-object[1] and
> -linkgit:git-rebase[1], among other git commands, can be used to create
> -replacement objects from existing objects. The `--edit` option can
> -also be used with 'git replace' to create a replacement object by
> +linkgit:git-hash-object[1], linkgit:git-rebase[1], and
> +linkgit:git-filter-repo[1], among other git commands, can be used to
> +create replacement objects from existing objects. The `--edit` option
> +can also be used with 'git replace' to create a replacement object by
> editing an existing object.
>
> If you want to replace many blobs, trees or commits that are part of a
> @@ -148,8 +148,8 @@ pending objects.
> SEE ALSO
> --------
> linkgit:git-hash-object[1]
> -linkgit:git-filter-branch[1]
> linkgit:git-rebase[1]
> +linkgit:git-filter-repo[1]
> linkgit:git-tag[1]
> linkgit:git-branch[1]
> linkgit:git-commit[1]
> diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
> index 30711625fd..f2762dd5d4 100644
> --- a/Documentation/git-svn.txt
> +++ b/Documentation/git-svn.txt
> @@ -769,9 +769,9 @@ option for (hopefully) obvious reasons.
> +
> This option is NOT recommended as it makes it difficult to track down
> old references to SVN revision numbers in existing documentation, bug
> -reports and archives. If you plan to eventually migrate from SVN to Git
> +reports, and archives. If you plan to eventually migrate from SVN to Git
> and are certain about dropping SVN history, consider
> -linkgit:git-filter-branch[1] instead. filter-branch also allows
> +linkgit:git-filter-repo[1] instead. filter-repo also allows
> reformatting of metadata for ease-of-reading and rewriting authorship
> info for non-"svn.authorsFile" users.
>
> diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
> index 82cd573776..997548f5ed 100644
> --- a/Documentation/githooks.txt
> +++ b/Documentation/githooks.txt
> @@ -425,9 +425,10 @@ post-rewrite
>
> This hook is invoked by commands that rewrite commits
> (linkgit:git-commit[1] when called with `--amend` and
> -linkgit:git-rebase[1]; currently `git filter-branch` does 'not' call
> -it!). Its first argument denotes the command it was invoked by:
> -currently one of `amend` or `rebase`. Further command-dependent
> +linkgit:git-rebase[1]; however, full-history (re)writing tools like
> +linkgit:git-fast-import[1] or linkgit:git-filter-repo[1] typically do
> +not call it!). Its first argument denotes the command it was invoked
> +by: currently one of `amend` or `rebase`. Further command-dependent
> arguments may be passed in the future.
>
> The hook receives a list of the rewritten commits on stdin, in the
> diff --git a/contrib/svn-fe/svn-fe.txt b/contrib/svn-fe/svn-fe.txt
> index a3425f4770..19333fc8df 100644
> --- a/contrib/svn-fe/svn-fe.txt
> +++ b/contrib/svn-fe/svn-fe.txt
> @@ -56,7 +56,7 @@ line. This line has the form `git-svn-id: URL@REVNO UUID`.
>
> The resulting repository will generally require further processing
> to put each project in its own repository and to separate the history
> -of each branch. The 'git filter-branch --subdirectory-filter' command
> +of each branch. The 'git filter-repo --subdirectory-filter' command
> may be useful for this purpose.
>
> BUGS
> @@ -67,5 +67,5 @@ The exit status does not reflect whether an error was detected.
>
> SEE ALSO
> --------
> -git-svn(1), svn2git(1), svk(1), git-filter-branch(1), git-fast-import(1),
> +git-svn(1), svn2git(1), svk(1), git-filter-repo(1), git-fast-import(1),
> https://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt
>