From: Sebastian Thiel <sebastian.thiel@icloud.com>
To: Junio C Hamano <gitster@pobox.com>,
Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org, Josh Triplett <josh@joshtriplett.org>,
Elijah Newren <newren@gmail.com>,
Phillip Wood <phillip.wood123@gmail.com>
Subject: Re: [PATCH] precious-files.txt: new document proposing new precious file type
Date: Sun, 11 Feb 2024 23:08:02 +0100
Message-ID: <2A762405-1A2B-472E-9A2F-D068A25F65C1@icloud.com>
In-Reply-To: <xmqq8r5gfc3j.fsf@gitster.g>
I didn't know where I would best reply to give an update on my work
on precious file support, but here I go.
On my journey toward daring to implement precious files in Git, I decided
to implement them in Gitoxide first to ease myself into it.
After what felt like months of work on the Gitoxide equivalent of
dir.c, it took just 2 days to cobble together a 'gix clean' with
precious files support.
You might say that something as destructive as a 'clean' subcommand
would better not be rushed, but it was surprisingly straightforward
to implement. It was so inviting, in fact, that I could spend the second
day, today, entirely on polishing, yielding a 'gix clean' which is
fun to use, with some extras I never knew I wanted until I had full
control over it and could play around easily.
What I found myself doing immediately, by the way, was adjusting the
project's `.gitignore` files to have precious declarations right after
their non-precious counterparts for backwards compatibility.
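As a rough sketch of what such a pair of declarations could look like: the snippet below assumes that a leading `$` marks a pattern as precious, as in the syntax proposed in this thread, and uses a made-up file name (`.env`) and a throwaway directory. It only writes and prints the file; it does not run any clean command.

```shell
# Assumption: a leading '$' marks a precious pattern, following the syntax
# proposed in this thread. '.env' and demo_dir are hypothetical names.
demo_dir=$(mktemp -d)

# The quoted 'EOF' keeps the shell from expanding '$' inside the here-document.
cat > "$demo_dir/.gitignore" <<'EOF'
# Plain ignore pattern: keeps the file ignored by tools without precious support.
.env
# Precious counterpart: a tool without precious support reads this as a
# pattern for a file literally named '$.env', so the line above is still needed.
$.env
EOF

cat "$demo_dir/.gitignore"
```

Placing the precious line second relies on the usual rule that the last matching pattern decides the outcome, so a precious-aware tool ends up treating the file as precious.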
It works perfectly, from what I can tell, and it is truly wonderful
to be able to wipe a repo clean without fear of destroying anything
valuable. I am aware that we all know that, but I wanted to write
it down to underline how psychologically valuable this feature is.
Without further ado, I invite you all to give it a go yourselves
to get a first experience with precious files.
    git clone https://github.com/Byron/gitoxide
    cd gitoxide
    cargo build --release --bin gix --no-default-features --features max-pure
    target/release/gix clean
This should do the trick - from there the program should guide the
user.
If you want to see some more interesting features besides precious
files, you can run 'cargo test -p gix' and follow the 'gix clean -xd'
instructions along with the `--debug` flag.
A word about performance: It is slower.
It started out only about 1% slower, even on the biggest repositories
and under optimal conditions (i.e. precomposeUnicode and ignoreCase off
and skipHash true). But as I improved correctness and added features,
that was lost and it's now about 15% slower on bigger repositories.
I appended a benchmark run on the Linux kernel at the end, and it shows
that Gitoxide definitely spends more time in userland. I can only
assume that some performance was lost when I started to move away from
the 'only do the work you need' recipe that I learned from Git and toward
'always provide a consistent set of information about directory entries'.
On top of that, there are multiple major shortcomings in this realm:
- Gitoxide doesn't actually get faster when reading indices with multiple
  threads for some reason.
- the icase-hashtable is created only with a single thread.
- the precompose-unicode conversion is very slow and easily costs 25%
  performance.
But those are details, some of which you can see yourself when running
'gix --trace -v clean'.
Now I hope you will have fun trying 'gix clean' with precious files in your
repositories. Also, I am particularly interested in learning how it fares
in situations where you know 'git clean' might have difficulties.
I tried very hard to achieve correctness, and any problem you find
will be fixed ASAP.
With this experience, I think I am in a good position to get precious
files support for 'git clean' implemented, once I get to make a start.
Cheers,
Sebastian
----
Here is the benchmark result (and before I forget, Gitoxide also uses about 25% more memory
for some reason, so it really has some catching up to do, eventually):
linux (ffc2532) +369 -819 [!] took 2s
❯ hyperfine -N -w1 -r4 'gix clean -xd --skip-hidden-repositories=non-bare' 'gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare' 'git clean -nxd'
Benchmark 1: gix clean -xd --skip-hidden-repositories=non-bare
Time (mean ± σ): 171.7 ms ± 3.0 ms [User: 70.4 ms, System: 101.4 ms]
Range (min … max): 167.4 ms … 174.2 ms 4 runs
Benchmark 2: gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare
Time (mean ± σ): 156.3 ms ± 3.1 ms [User: 56.9 ms, System: 99.3 ms]
Range (min … max): 154.1 ms … 160.8 ms 4 runs
Benchmark 3: git clean -nxd
Time (mean ± σ): 138.4 ms ± 2.7 ms [User: 40.5 ms, System: 103.7 ms]
Range (min … max): 136.1 ms … 142.0 ms 4 runs
Summary
git clean -nxd ran
1.13 ± 0.03 times faster than gix -c index.skipHash=1 -c core.ignoreCase=0 -c core.precomposeUnicode=0 clean -xd --skip-hidden-repositories=non-bare
1.24 ± 0.03 times faster than gix clean -xd --skip-hidden-repositories=non-bare
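For reference, the two ratios in the summary can be recomputed from the mean times reported above. This is only a small sketch using awk; the variable names are illustrative and the numbers are copied from the hyperfine output.

```shell
# Mean wall-clock times in milliseconds, copied from the hyperfine output above.
gix_default=171.7   # gix clean with default configuration
gix_tuned=156.3     # gix clean with skipHash on, ignoreCase and precomposeUnicode off
git_baseline=138.4  # git clean -nxd

# Divide each gix mean by the git mean to get the relative slowdown.
ratios=$(awk -v a="$gix_default" -v b="$gix_tuned" -v g="$git_baseline" \
  'BEGIN { printf "tuned: %.2fx, default: %.2fx", b / g, a / g }')
echo "$ratios"
```

In this particular run, that works out to roughly 13% slower under the tuned configuration and roughly 24% slower with defaults.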
On 27 Dec 2023, at 6:28, Junio C Hamano wrote:
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Elijah Newren <newren@gmail.com>
>>
>> We have traditionally considered all ignored files to be expendable, but
>> users occasionally want ignored files that are not considered
>> expendable. Add a design document covering how to split ignored files
>> into two types: 'trashable' (what all ignored files are currently
>> considered) and 'precious' (the new type of ignored file).
>
> The proposed syntax is a bit different from what I personally prefer
> (which is Phillip's [P14] or something like it), but I consider that
> the more valuable parts of this document is about how various
> commands ought to interact with precious paths, which shouldn't
> change regardless of the syntax.
>
> Thanks for putting this together.