git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sebastian Thiel <sebastian.thiel@icloud.com>
To: Junio C Hamano <gitster@pobox.com>,
	Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Josh Triplett <josh@joshtriplett.org>,
	Elijah Newren <newren@gmail.com>,
	Phillip Wood <phillip.wood123@gmail.com>
Subject: Re: [PATCH] precious-files.txt: new document proposing new precious file type
Date: Sun, 11 Feb 2024 23:08:02 +0100	[thread overview]
Message-ID: <2A762405-1A2B-472E-9A2F-D068A25F65C1@icloud.com> (raw)
In-Reply-To: <xmqq8r5gfc3j.fsf@gitster.g>

I didn't know where I would best reply to give an update on my work
on precious file support, but here I go.

On my journey to daring implementing precious files in Git, I decided
to implement it in Gitoxide first to ease myself into it.

After what felt like months of work on the Gitoxide-equivalent of
dir.c, it just took 2 days to cobble together a 'gix clean' with
precious files support.

You might say that something as destructive as a 'clean' subcommand
would better not be rushed, but it was surprisingly straightforward
to implement. It was so inviting even that I could spend the second
day, today, entirely on polishing, yielding a 'gix clean' which is
fun to use, with some extras I never knew I wanted until I had full
control over it and could play around easily.

What I found myself do immediately by the way is adjust `.gitignore`
files of the project to have precious declarations right after
their non-precious counterparts for backwards compatibility.

It works perfectly, from what I can tell, and it is truly wonderful
to be able to wipe a repo clean without fear of destroying anything
valuable. And I am aware that we all know that, but wanted to write
it to underline how psychologically valuable this feature is.

Without further ado, I invite you all to give it a go yourself
for first experiences with precious files maybe.

    git clone https://github.com/Byron/gitoxide
    cd gitoxide
    cargo build --release --bin gix --no-default-features --features max-pure
	target/release/gix clean

This should do the trick - from there the program should guide the
user.

If you want to see some more interesting features besides precious
files, you can run 'cargo test -p gix' and follow the 'gix clean -xd'
instructions along with the `--debug` flag.

A word about performance: It is slower.
It started out to be only about 1% slower even on the biggest repositories
and under optimal conditions (i.e. precomposeUnicode and ignoreCase off
and skipHash true). But as I improved correctness and added features,
that was lost and it's now about 15% slower on bigger repositories.

I appended a benchmark run on the Linux kernel at the end, and it shows
that Gitoxide definitely spends more time in userland. I can only
assume that some performance was lost when I started to deviate from
the 'only do the work you need' recipe that I learned from Git to
'always provide a consistent set of information about directory entries'.

On top of that, there is multiple major shortcomings in this realm:

- Gitoxide doesn't actually get faster when reading indices with multiple
  threads for some reason.
- the icase-hashtable is created only with a single thread.
- the precompose-unicode conversion is very slow and easily costs 25%
  performance.

But that's details, some of which you can see yourself when running
'gix --trace -v clean'.

Now I hope you will have fun trying 'gix clean' with precious files in your
repositories. Also, I am particularly interested in learning how it fares
in situations where you know 'git clean' might have difficulties.
I tried very hard to achieve correctness, and any problem you find
will be fixed ASAP.

With this experience, I think I am in a good position to get precious
files support for 'git clean' implemented, once I get to make the start.

Cheers,
Sebastian

----

Here is the benchmark result (and before I forget, Gitoxide also uses about 25% more memory
for some reason, so really has some catchup to do, eventually)

linux (ffc2532) +369 -819 [!] took 2s
❯ hyperfine -N -w1 -r4  'gix clean -xd --skip-hidden-repositories=non-bare' 'gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare' 'git clean -nxd'
Benchmark 1: gix clean -xd --skip-hidden-repositories=non-bare
  Time (mean ± σ):     171.7 ms ±   3.0 ms    [User: 70.4 ms, System: 101.4 ms]
  Range (min … max):   167.4 ms … 174.2 ms    4 runs

Benchmark 2: gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare
  Time (mean ± σ):     156.3 ms ±   3.1 ms    [User: 56.9 ms, System: 99.3 ms]
  Range (min … max):   154.1 ms … 160.8 ms    4 runs

Benchmark 3: git clean -nxd
  Time (mean ± σ):     138.4 ms ±   2.7 ms    [User: 40.5 ms, System: 103.7 ms]
  Range (min … max):   136.1 ms … 142.0 ms    4 runs

Summary
  git clean -nxd ran
    1.13 ± 0.03 times faster than gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare
    1.24 ± 0.03 times faster than gix clean -xd --skip-hidden-repositories=non-bare


On 27 Dec 2023, at 6:28, Junio C Hamano wrote:

> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Elijah Newren <newren@gmail.com>
>>
>> We have traditionally considered all ignored files to be expendable, but
>> users occasionally want ignored files that are not considered
>> expendable.  Add a design document covering how to split ignored files
>> into two types: 'trashable' (what all ignored files are currently
>> considered) and 'precious' (the new type of ignored file).
>
> The proposed syntax is a bit different from what I personally prefer
> (which is Phillip's [P14] or something like it), but I consider that
> the more valuable parts of this document is about how various
> commands ought to interact with precious paths, which shouldn't
> change regardless of the syntax.
>
> Thanks for putting this together.

      parent reply	other threads:[~2024-02-11 22:08 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-27  2:25 [PATCH] precious-files.txt: new document proposing new precious file type Elijah Newren via GitGitGadget
2023-12-27  5:28 ` Junio C Hamano
2023-12-27  6:54   ` Elijah Newren
2023-12-27 22:15     ` Junio C Hamano
2024-01-18  7:51       ` Sebastian Thiel
2024-01-18 19:14         ` Junio C Hamano
2024-01-18 21:33           ` Sebastian Thiel
2024-01-19  2:37             ` Elijah Newren
2024-01-19  7:51               ` Sebastian Thiel
2024-01-19 18:45                 ` Junio C Hamano
2024-01-19  2:58           ` Elijah Newren
2024-01-19 16:53             ` Phillip Wood
2024-01-19 17:17               ` Junio C Hamano
2024-01-24  6:50               ` Elijah Newren
2024-02-11 22:08   ` Sebastian Thiel [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2A762405-1A2B-472E-9A2F-D068A25F65C1@icloud.com \
    --to=sebastian.thiel@icloud.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=josh@joshtriplett.org \
    --cc=newren@gmail.com \
    --cc=phillip.wood123@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).