On 2021-11-09 at 00:10:05, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Doesn't rsync work the same per-file fashion, and the only reason
> > why it is a better fit is because it is not continuous, not
> > attempting to "sync" while the repository is in use, until the user
> > explicitly says "OK, I am ready to go home, so let's stop working
> > here and send everything over to home with rsync"?
> 
> OK, so not "per-file" but "continuous" is the root problem, and
> "cloud" would be a good word because all the popular ones share that
> "continuous" trait.
> 
> This part of the proposed patch text may need rethinking a bit.
> 
> > +or added files, broken refs, and a wide variety of other corruption.  These
> > +services tend to sync file by file and don't understand the structure of a Git
> > +repository.  This is especially bad if they sync the repository in the middle of
> 
> That is, "file by file" is not a problem per-se, "don't understand
> the structure" is a huge problem, and "continuous" may contribute to
> the problem further.

"File by file" is not a problem per se, but it is a problem if you don't
snapshot the repository or it's in use.  For example, if you used an LVM
snapshot on an in-use repository and then synced that, we'd guarantee
that the repository was fully consistent.  Practically nobody does that,
though.

But to clarify: the goal of this type of software is to sync single,
independent documents.  As such, these tools don't necessarily consider
that it's important to sync other files in the same directory at the
same time.  You can end up with cases where some parts of your
repository get synced, then some other files in other directories, then
other parts of the repository, resulting in different states over
different times.  There's just no guarantee about how they behave with
all of the files in a given repository because they aren't considered a
relevant unit.  That's why I mention that they sync file by file.  If
they did syncing directory by directory, we'd likely have a lot fewer
problems.

> I wonder if you let the "cloud" services to continuously sync your
> repository, then go quiescent for a while and then start touching
> the destination, it would be sufficient, though.  The refs with
> funny "2" suffixes and the like are the symptom of two sides
> simultanously mucking with the same file (e.g ".git/refs/main") and
> the "cloud sync" could not decide what the right final outcome is,
> right?

I think this is very risky.  Yes, usually the duplicate files are caused
by competing changes on both sides, but having seen users with this
problem before, I'm not sure if we can safely guarantee this is the only
case.

> I also wonder if we add a way to transfer reflog entries, that imply
> the object reachability, say "git push --with-reflog", over the
> wire, it would be sufficient to do everything with Git.

Yes, please.  I'd love to see this.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA