All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: Taylor Blau <ttaylorr@github.com>, Sun Chao <16657101987@163.com>,
	Sun Chao via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH v2] packfile: freshen the mtime of packfile by configuration
Date: Wed, 14 Jul 2021 21:32:26 +0200	[thread overview]
Message-ID: <87y2a8zntw.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <YO87ax2JpLndc5Ly@nand.local>


On Wed, Jul 14 2021, Taylor Blau wrote:

> On Wed, Jul 14, 2021 at 08:19:15PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >> The reason why we want to avoid freshen the mtime of ".pack" file is to
>> >> improve the reading speed of Git Servers.
>> >
>> > That's surprising behavior to me. Are you saying that calling utime(2)
>> > causes the *page* cache to be invalidated and that most reads are
>> > cache-misses lowering overall IOPS?
>>
>> I think you may be right narrowly, but wrong in this context :)
>>
>> I.e. my understanding of this problem is that they have some incremental
>> backup job, e.g. rsync without --checksum (not that doing that would
>> help, chicken & egg issue)..
>
> Ah, thanks for explaining. That's helpful, and changes my thinking.
>
> Ideally, Sun would be able to use --checksum (if they are using rsync)
> or some equivalent (if they are not). In other words, this seems like a
> problem that Git shouldn't be bending over backwards for.

Even with my strong opinions about rsync being bad for this use-case, in
practice it does work for a lot of people, e.g. with nightly jobs
etc. Not everyone's repository is insanely busy, where I've mainly seen
this sort of corruption.

In that case making people use --checksum is borderline inhumane :)

> But if that isn't possible, then I find introducing a new file to
> redefine the pack's mtime just to accommodate a backup system that
> doesn't know better to be a poor justification for adding this
> complexity. Especially since we agree that rsync-ing live Git
> repositories is a bad idea in the first place ;).
>
> If it were me, I would probably stop here and avoid pursuing this
> further. But an OK middle ground might be core.freshenPackfiles=<bool>
> to indicate whether or not packs can be freshened, or the objects
> contained within them should just be rewritten loose.
>
> Sun could then set this configuration to "false", implying:
>
>   - That they would have more random loose objects, leading to some
>     redundant work by their backup system.
>   - But they wouldn't have to resync their huge packfiles.
>
> ...and we wouldn't have to introduce any new formats/file types to do
> it. To me, that seems like a net-positive outcome.

This approach is getting quite close to my core.checkCollisions patch,
to the point of perhaps being indistinguishable in practice:
https://lore.kernel.org/git/20181028225023.26427-5-avarab@gmail.com/

I.e. if you're happy to re-write out duplicate objects then you're going
to be ignoring the collision check and don't need to do it. It's not the
same in that you might skip writing objects you know are reachable, and
with the collisions check off and not-so-thin packs you will/might get
more redundancy than you asked for.

But in practice with modern clients mostly/entirely sending you just the
things you need in the common case it might be close enough.

I mean it addresses the expiry race that a unreachable-becomes-reachable
again race would be "solved" by just re-writing that data, hrm, but
there's probably aspects of that race I'm not considering.

Anyway, in the sense that for a lot of systems syncing file additions is
a lot cheaper than rewrites it might get you what you want...

  reply	other threads:[~2021-07-14 19:44 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-10 19:01 [PATCH] packfile: enhance the mtime of packfile by idx file Sun Chao via GitGitGadget
2021-07-11 23:44 ` Ævar Arnfjörð Bjarmason
2021-07-12 16:17   ` Sun Chao
2021-07-14  1:28 ` [PATCH v2] packfile: freshen the mtime of packfile by configuration Sun Chao via GitGitGadget
2021-07-14  1:39   ` Ævar Arnfjörð Bjarmason
2021-07-14  2:52     ` Taylor Blau
2021-07-14 16:46       ` Sun Chao
2021-07-14 17:04         ` Taylor Blau
2021-07-14 18:19           ` Ævar Arnfjörð Bjarmason
2021-07-14 19:11             ` Martin Fick
2021-07-14 19:41               ` Ævar Arnfjörð Bjarmason
2021-07-14 20:20                 ` Martin Fick
2021-07-20  6:32                   ` Ævar Arnfjörð Bjarmason
2021-07-15  8:23                 ` Son Luong Ngoc
2021-07-20  6:29                   ` Ævar Arnfjörð Bjarmason
2021-07-14 19:30             ` Taylor Blau
2021-07-14 19:32               ` Ævar Arnfjörð Bjarmason [this message]
2021-07-14 19:52                 ` Taylor Blau
2021-07-14 21:40               ` Junio C Hamano
2021-07-15 16:30           ` Sun Chao
2021-07-15 16:42             ` Taylor Blau
2021-07-15 16:48               ` Sun Chao
2021-07-14 16:11     ` Sun Chao
2021-07-19 19:53   ` [PATCH v3] " Sun Chao via GitGitGadget
2021-07-19 20:51     ` Taylor Blau
2021-07-20  0:07       ` Junio C Hamano
2021-07-20 15:07         ` Sun Chao
2021-07-20  6:19       ` Ævar Arnfjörð Bjarmason
2021-07-20 15:34         ` Sun Chao
2021-07-20 15:00       ` Sun Chao
2021-07-20 16:53         ` Taylor Blau
2021-08-15 17:08     ` [PATCH v4 0/2] " Sun Chao via GitGitGadget
2021-08-15 17:08       ` [PATCH v4 1/2] packfile: rename `derive_filename()` to `derive_pack_filename()` Sun Chao via GitGitGadget
2021-08-15 17:08       ` [PATCH v4 2/2] packfile: freshen the mtime of packfile by bump file Sun Chao via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y2a8zntw.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=16657101987@163.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=ttaylorr@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.