git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, "Derrick Stolee" <stolee@gmail.com>,
	"Eric Wong" <e@80x24.org>, "Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Lars Schneider" <larsxschneider@gmail.com>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Eric Sunshine" <sunshine@sunshineco.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v5 0/4] Warn about git-filter-branch usage and avoid it
Date: Tue,  3 Sep 2019 11:55:20 -0700	[thread overview]
Message-ID: <20190903185524.13467-1-newren@gmail.com> (raw)
In-Reply-To: <20190828002210.8862-1-newren@gmail.com>

It's been about 5 days with no further feedback, other than some timings
from Dscho for Windows showing that my fixes help there too.  So, I did
one last re-read, made a couple small wording tweaks, and am resending as
ready for inclusion.

Changes since v4:
  * Included the windows timings from Dscho in the commit messages for
    the first two perf patches
  * A few slight wording tweaks to the manpage

Elijah Newren (4):
  t6006: simplify and optimize empty message test
  t3427: accelerate this test by using fast-export and fast-import
  Recommend git-filter-repo instead of git-filter-branch
  t9902: use a non-deprecated command for testing

 Documentation/git-fast-export.txt   |   6 +-
 Documentation/git-filter-branch.txt | 273 +++++++++++++++++++++++++---
 Documentation/git-gc.txt            |  17 +-
 Documentation/git-rebase.txt        |   3 +-
 Documentation/git-replace.txt       |  10 +-
 Documentation/git-svn.txt           |  10 +-
 Documentation/githooks.txt          |  10 +-
 contrib/svn-fe/svn-fe.txt           |   4 +-
 git-filter-branch.sh                |  13 ++
 t/t3427-rebase-subtree.sh           |  24 ++-
 t/t6006-rev-list-format.sh          |   5 +-
 t/t9902-completion.sh               |  12 +-
 12 files changed, 310 insertions(+), 77 deletions(-)

Range-diff:
1:  7ddbeea2ca ! 1:  ccea0e5846 t6006: simplify and optimize empty message test
    @@ Commit message
         Despite only being one piece of the 71st test and there being 73 tests
         overall, this small change to just this one test speeds up the overall
         execution time of t6006 (as measured by the best of 3 runs of `time
    -    ./t6006-rev-list-format.sh`) by about 11% on Linux and by 13% on
    -    Mac.
    +    ./t6006-rev-list-format.sh`) by about 11% on Linux, 13% on Mac, and
    +    about 15% on Windows.
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
     
2:  e1e63189c1 ! 2:  6d73135006 t3427: accelerate this test by using fast-export and fast-import
    @@ Commit message
     
         fast-export and fast-import can easily handle the simple rewrite that
         was being done by filter-branch, and should be significantly faster on
    -    systems with a slow fork.  Timings from before and after on two laptops
    -    that I have access to (measured via `time ./t3427-rebase-subtree.sh`,
    -    i.e. including everything in this test -- not just the filter-branch or
    -    fast-export/fast-import pair):
    +    systems with a slow fork.  Timings from before and after on a few
    +    laptops that I or others measured on (measured via `time
    +    ./t3427-rebase-subtree.sh`, i.e. including everything in this test --
    +    not just the filter-branch or fast-export/fast-import pair):
     
    -       Linux:  4.305s -> 3.684s (~17% speedup)
    -       Mac:   10.128s -> 7.038s (~30% speedup)
    +       Linux:    4.305s -> 3.684s (~17% speedup)
    +       Mac:     10.128s -> 7.038s (~30% speedup)
    +       Windows:  1m 37s -> 1m 17s (~26% speedup)
     
         Signed-off-by: Elijah Newren <newren@gmail.com>
     
3:  ed6505584f ! 3:  2f225c8697 Recommend git-filter-repo instead of git-filter-branch
    @@ Documentation/git-filter-branch.txt: warned.
     +document or provide to a coworker, who then runs them on a different OS
     +where the same commands are not working/tested (some examples in the
     +git-filter-branch manpage are also affected by this).  BSD vs. GNU
    -+userland differences can really bite.  If you're lucky, you get ugly
    -+error messages spewed.  But just as likely, the commands either don't do
    -+the filtering requested, or silently corrupt making some unwanted
    -+change.  The unwanted change may only affect a few commits, so it's not
    -+necessarily obvious either.  (The fact that problems won't necessarily
    -+be obvious means they are likely to go unnoticed until the rewritten
    -+history is in use for quite a while, at which point it's really hard to
    -+justify another flag-day for another rewrite.)
    ++userland differences can really bite.  If lucky, error messages are
    ++spewed.  But just as likely, the commands either don't do the filtering
    ++requested, or silently corrupt by making some unwanted change.  The
    ++unwanted change may only affect a few commits, so it's not necessarily
    ++obvious either.  (The fact that problems won't necessarily be obvious
    ++means they are likely to go unnoticed until the rewritten history is in
    ++use for quite a while, at which point it's really hard to justify
    ++another flag-day for another rewrite.)
     +
     +* Filenames with spaces are often mishandled by shell snippets since
     +they cause problems for shell pipelines.  Not everyone is familiar with
     +find -print0, xargs -0, git-ls-files -z, etc.  Even people who are
    -+familiar with these may assume such needs are not relevant because
    ++familiar with these may assume such flags are not relevant because
     +someone else renamed any such files in their repo back before the person
    -+doing the filtering joined the project.  And, often, even those familiar
    -+with handling arguments with spaces my not do so just because they
    ++doing the filtering joined the project.  And often, even those familiar
    ++with handling arguments with spaces may not do so just because they
     +aren't in the mindset of thinking about everything that could possibly
     +go wrong.
     +
     +* Non-ascii filenames can be silently removed despite being in a desired
    -+directory.  The desire to select paths to keep often use pipelines like
    ++directory.  Keeping only wanted paths is often done using pipelines like
     +`git ls-files | grep -v ^WANTED_DIR/ | xargs git rm`.  ls-files will
    -+only quote filenames if needed so folks may not notice that one of the
    -+files didn't match the regex, again until it's much too late.  Yes,
    -+someone who knows about core.quotePath can avoid this (unless they have
    -+other special characters like \t, \n, or "), and people who use ls-files
    -+-z with something other than grep can avoid this, but that doesn't mean
    -+they will.
    ++only quote filenames if needed, so folks may not notice that one of the
    ++files didn't match the regex (at least not until it's much too late).
    ++Yes, someone who knows about core.quotePath can avoid this (unless they
    ++have other special characters like \t, \n, or "), and people who use
    ++ls-files -z with something other than grep can avoid this, but that
    ++doesn't mean they will.
     +
     +* Similarly, when moving files around, one can find that filenames with
     +non-ascii or special characters end up in a different directory, one
    @@ Documentation/git-filter-branch.txt: warned.
     +the same name, no warning or error is provided; git-filter-branch simply
     +overwrites each tag in some undocumented pre-defined order resulting in
     +only one tag at the end.  (A git-filter-branch regression test requires
    -+this.)
    ++this surprising behavior.)
     +
    -+Also, the poor performance of git-filter-branch often leads to safety issues:
    ++Also, the poor performance of git-filter-branch often leads to safety
    ++issues:
     +
     +* Coming up with the correct shell snippet to do the filtering you want
     +is sometimes difficult unless you're just doing a trivial modification
4:  ca8e124cb3 = 4:  048eba375b t9902: use a non-deprecated command for testing
-- 
2.23.0.39.gf92d9de5c3


  parent reply	other threads:[~2019-09-03 18:55 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-22 18:26 RFC: Proposing git-filter-repo for inclusion in git.git Elijah Newren
2019-08-22 20:23 ` Junio C Hamano
2019-08-22 21:12   ` Elijah Newren
2019-08-22 21:34     ` Junio C Hamano
2019-08-26 23:52       ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Elijah Newren
2019-08-26 23:52         ` [RFC PATCH 1/5] t6006: simplify and optimize empty message test Elijah Newren
2019-08-27  1:23           ` Derrick Stolee
2019-08-26 23:52         ` [RFC PATCH 2/5] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-08-27  1:25           ` Derrick Stolee
2019-08-26 23:52         ` [RFC PATCH 3/5] git-sh-i18n: work with external scripts Elijah Newren
2019-08-27  1:28           ` Derrick Stolee
2019-08-26 23:52         ` [RFC PATCH 4/5] Recommend git-filter-repo instead of git-filter-branch in documentation Elijah Newren
2019-08-27  1:32           ` Derrick Stolee
2019-08-27  6:23             ` Elijah Newren
2019-08-26 23:52         ` [RFC PATCH 5/5] Remove git-filter-branch, it is now external to git.git Elijah Newren
2019-08-27  1:39         ` [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere Derrick Stolee
2019-08-27  6:17           ` Elijah Newren
2019-08-27  7:03         ` Eric Wong
2019-08-27  8:43           ` Sergey Organov
2019-08-27 22:18             ` Elijah Newren
2019-08-28  8:52               ` Sergey Organov
2019-08-28 17:16                 ` Elijah Newren
2019-08-28 19:03                   ` Sergey Organov
2019-08-30 20:40                   ` Johannes Schindelin
2019-08-30 23:22                     ` Elijah Newren
2019-09-02  9:29                       ` Johannes Schindelin
2019-09-03 17:37                         ` Elijah Newren
2019-08-28  0:22         ` [PATCH v2 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
2019-08-28  0:22           ` [PATCH v2 1/4] t6006: simplify and optimize empty message test Elijah Newren
2019-08-28  0:22           ` [PATCH v2 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-08-28  6:00             ` Eric Sunshine
2019-08-28  0:22           ` [PATCH v2 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-08-28  6:17             ` Eric Sunshine
2019-08-28 21:48               ` Elijah Newren
2019-08-28  0:22           ` [RFC PATCH v2 4/4] Remove git-filter-branch, it is now external to git.git Elijah Newren
2019-08-29  0:06           ` [PATCH v3 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
2019-08-29  0:06             ` [PATCH v3 1/4] t6006: simplify and optimize empty message test Elijah Newren
2019-08-29  0:06             ` [PATCH v3 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-08-29  0:06             ` [PATCH v3 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-08-29 18:10               ` Eric Sunshine
2019-08-30  0:04                 ` Elijah Newren
2019-08-29  0:06             ` [PATCH v3 4/4] t9902: use a non-deprecated command for testing Elijah Newren
2019-08-30  5:57             ` [PATCH v4 0/4] Warn about git-filter-branch usage and avoid it Elijah Newren
2019-08-30  5:57               ` [PATCH v4 1/4] t6006: simplify and optimize empty message test Elijah Newren
2019-09-02 14:47                 ` Johannes Schindelin
2019-08-30  5:57               ` [PATCH v4 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-09-02 14:45                 ` Johannes Schindelin
2019-08-30  5:57               ` [PATCH v4 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-08-30  5:57               ` [PATCH v4 4/4] t9902: use a non-deprecated command for testing Elijah Newren
2019-09-03 18:55           ` Elijah Newren [this message]
2019-09-03 18:55             ` [PATCH v5 1/4] t6006: simplify and optimize empty message test Elijah Newren
2019-09-03 21:08               ` Junio C Hamano
2019-09-03 21:58                 ` Elijah Newren
2019-09-03 22:25                   ` Junio C Hamano
2019-09-03 18:55             ` [PATCH v5 2/4] t3427: accelerate this test by using fast-export and fast-import Elijah Newren
2019-09-03 21:26               ` Junio C Hamano
2019-09-03 22:46                 ` Junio C Hamano
2019-09-04 20:32                   ` Elijah Newren
2019-09-03 18:55             ` [PATCH v5 3/4] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-09-03 21:40               ` Junio C Hamano
2019-09-04 20:30                 ` Elijah Newren
2019-09-03 18:55             ` [PATCH v5 4/4] t9902: use a non-deprecated command for testing Elijah Newren
2019-09-04 22:32             ` [PATCH v6 0/3] Warn about git-filter-branch usage and avoid it Elijah Newren
2019-09-04 22:32               ` [PATCH v6 1/3] t6006: simplify, fix, and optimize empty message test Elijah Newren
2019-09-04 22:32               ` [PATCH v6 2/3] Recommend git-filter-repo instead of git-filter-branch Elijah Newren
2019-09-04 22:32               ` [PATCH v6 3/3] t9902: use a non-deprecated command for testing Elijah Newren
2019-08-23  3:00     ` RFC: Proposing git-filter-repo for inclusion in git.git Eric Wong
2019-08-23 18:06       ` Elijah Newren
2019-08-23 18:29         ` Elijah Newren
2019-08-28 11:09         ` Johannes Schindelin
2019-08-28 15:06           ` Junio C Hamano
2019-08-23 12:02     ` Derrick Stolee
2019-08-26 19:56   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190903185524.13467-1-newren@gmail.com \
    --to=newren@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=larsxschneider@gmail.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).