All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Martin Fick <mfick@codeaurora.org>
Cc: Taylor Blau <ttaylorr@github.com>, Sun Chao <16657101987@163.com>,
	Taylor Blau <me@ttaylorr.com>,
	Sun Chao via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Subject: Re: [PATCH v2] packfile: freshen the mtime of packfile by configuration
Date: Tue, 20 Jul 2021 08:32:35 +0200	[thread overview]
Message-ID: <875yx5wkt2.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <3112447.ymCj9SdLpg@mfick-lnx>


On Wed, Jul 14 2021, Martin Fick wrote:

> On Wednesday, July 14, 2021 9:41:42 PM MDT you wrote:
>> On Wed, Jul 14 2021, Martin Fick wrote:
>> > On Wednesday, July 14, 2021 8:19:15 PM MDT Ævar Arnfjörð Bjarmason wrote:
>> >> The best way to get backups of git repositories you know are correct are
>> >> is to use git's own transport mechanisms, i.e. fetch/pull the data, or
>> >> create bundles from it.
>> > 
>> > I don't think this is a fair recommendation since unfortunately, this
>> > cannot be used to create a full backup. This can be used to back up the
>> > version controlled data, but not the repositories meta-data, i.e.
>> > configs, reflogs, alternate setups...
>> 
>> *nod*
>> 
>> FWIW at an ex-job I helped systems administrators who'd produced such a
>> broken backup-via-rsync create a hybrid version as an interim
>> solution. I.e. it would sync the objects via git transport, and do an
>> rsync on a whitelist (or blacklist), so pickup config, but exclude
>> objects.
>> 
>> "Hybrid" because it was in a state of needing to deal with manual
>> tweaking of config.
>> 
>> But usually someone who's needing to thoroughly solve this backup
>> problem will inevitably end up with wanting to drive everything that's
>> not in the object or refstore from some external system, i.e. have
>> config be generated from puppet, a database etc., ditto for alternates
>> etc.
>> 
>> But even if you can't get to that point (or don't want to) I'd say aim
>> for the hybrid system.
>> 
>> This isn't some purely theoretical concern b.t.w., the system using
>> rsync like this was producing repos that wouldn't fsck all the time, and
>> it wasn't such a busy site.
>> 
>> I suspect (but haven't tried) that for someone who can't easily change
>> their backup solution they'd get most of the benefits of git-native
>> transport by having their "rsync" sync refs, then objects, not the other
>> way around. Glob order dictates that most backup systems will do
>> objects, then refs (which will of course, at that point, refer to
>> nonexisting objects).
>> 
>> It's still not safe, you'll still be subject to races, but probably a
>> lot better in practice.
>
> It would be great if git provided a command to do a reliable incremental 
> backup, maybe it could copy things in the order that you mention?

I don't think we can or want to support this sort of thing ever, for the
same reason that you probably won't convince MySQL,PostgreSQL etc. that
they should support "cp -r" as a mode for backing up their live database
services.

I mean, there is the topic of git being lazy about fsync() etc, but even
if all of that were 100% solved you'd still get bad things if you picked
an arbitrary time to snapshot a running git directory, e.g. your
"master" branch might have a "master.lock" because it was in the middle
of an update.

If you used "fetch/clone/bundle" etc. to get the data no problem, but if
your snapshot happens then you'd need to manually clean that up, a
situation which in practice wouldn't persist, but would be persistent
with a snapshot approach.

> However, most people will want to use the backup system they have and not a 
> special git tool. Maybe git fsck should gain a switch that would rewind any 
> refs to an older point that is no broken (using reflogs)? That way, most 
> backups would just work and be rewound to the point at which the backup 
> started?

I think the main problem in the wild is not the inability of using a
special tool, but one of education. Most people wouldn't think of "cp
-r" as a first approach to say backing up a live mysql server, they'd
use mysqldump and the like.

But for some reason git is considered "not a database" enough that those
same people would just use rsync/tar/whatever, and are then surprised
when their data is corrupt or in some weird or inconsistent state...

Anyway, see also my just-posted:
https://lore.kernel.org/git/878s21wl4z.fsf@evledraar.gmail.com/

I.e. I'm not saying "never use rsync", there's cases where that's fine,
but for a live "real" server I'd say solutions in that class shouldn't
be considered/actively migrated away from.

  reply	other threads:[~2021-07-20  6:40 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-10 19:01 [PATCH] packfile: enhance the mtime of packfile by idx file Sun Chao via GitGitGadget
2021-07-11 23:44 ` Ævar Arnfjörð Bjarmason
2021-07-12 16:17   ` Sun Chao
2021-07-14  1:28 ` [PATCH v2] packfile: freshen the mtime of packfile by configuration Sun Chao via GitGitGadget
2021-07-14  1:39   ` Ævar Arnfjörð Bjarmason
2021-07-14  2:52     ` Taylor Blau
2021-07-14 16:46       ` Sun Chao
2021-07-14 17:04         ` Taylor Blau
2021-07-14 18:19           ` Ævar Arnfjörð Bjarmason
2021-07-14 19:11             ` Martin Fick
2021-07-14 19:41               ` Ævar Arnfjörð Bjarmason
2021-07-14 20:20                 ` Martin Fick
2021-07-20  6:32                   ` Ævar Arnfjörð Bjarmason [this message]
2021-07-15  8:23                 ` Son Luong Ngoc
2021-07-20  6:29                   ` Ævar Arnfjörð Bjarmason
2021-07-14 19:30             ` Taylor Blau
2021-07-14 19:32               ` Ævar Arnfjörð Bjarmason
2021-07-14 19:52                 ` Taylor Blau
2021-07-14 21:40               ` Junio C Hamano
2021-07-15 16:30           ` Sun Chao
2021-07-15 16:42             ` Taylor Blau
2021-07-15 16:48               ` Sun Chao
2021-07-14 16:11     ` Sun Chao
2021-07-19 19:53   ` [PATCH v3] " Sun Chao via GitGitGadget
2021-07-19 20:51     ` Taylor Blau
2021-07-20  0:07       ` Junio C Hamano
2021-07-20 15:07         ` Sun Chao
2021-07-20  6:19       ` Ævar Arnfjörð Bjarmason
2021-07-20 15:34         ` Sun Chao
2021-07-20 15:00       ` Sun Chao
2021-07-20 16:53         ` Taylor Blau
2021-08-15 17:08     ` [PATCH v4 0/2] " Sun Chao via GitGitGadget
2021-08-15 17:08       ` [PATCH v4 1/2] packfile: rename `derive_filename()` to `derive_pack_filename()` Sun Chao via GitGitGadget
2021-08-15 17:08       ` [PATCH v4 2/2] packfile: freshen the mtime of packfile by bump file Sun Chao via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875yx5wkt2.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=16657101987@163.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=mfick@codeaurora.org \
    --cc=ttaylorr@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.