All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org,
	Christian Couder <christian.couder@gmail.com>,
	Jeff King <peff@peff.net>
Subject: Re: [PATCH v2 4/4] fetch: implement support for atomic reference updates
Date: Tue, 12 Jan 2021 13:22:40 +0100	[thread overview]
Message-ID: <X/2UkKS0AhT6jLXj@ncase> (raw)
In-Reply-To: <xmqq5z438ddv.fsf@gitster.c.googlers.com>

[-- Attachment #1: Type: text/plain, Size: 3858 bytes --]

On Mon, Jan 11, 2021 at 11:47:08AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> >> It would be much more preferrable to see this series go like so:
> >> 
> >>     [1/4] create append_fetch_head() that writes out to
> >>           fetch_head->fp
> >> 
> >>     [1.5/4] convert append_fetch_head() to ALWAYS accumulate in
> >>             fetch_head->buf, and ALWAYS flushes what is accumulated
> >>             at the end.
> >
> > This is a change I'm hesitant to make. The thing is that FETCH_HEAD is
> > quite weird as the current design allows different `git fetch --append`
> > processes to write to FETCH_HEAD at the same time.
> 
> Hmph, do you mean 
> 
> 	git fetch --append remote-a & git fetch --append remote-b
> 
> or something else [*1*]?

That's what I mean.

> With the current write-out codepath, there is no guarantee that ...
> 
> > > +		fprintf(fetch_head->fp, "%s\t%s\t%s",
> > > +			old_oid, merge_status_marker, note);
> > > +		for (i = 0; i < url_len; ++i)
> > > +			if ('\n' == url[i])
> > > +				fputs("\\n", fetch_head->fp);
> > > +			else
> > > +				fputc(url[i], fetch_head->fp);
> > > +		fputc('\n', fetch_head->fp);
> 
> ... these stdio operations for a single record would result in a
> single atomic write() that would not compete with writes by other
> processes.  So I wouldn't call "the current design allows ... at the
> same time."  It has been broken for years and nobody complained.

Fair enough. I somehow blindly assumed that `git fetch --all`, which
does invoke `git fetch --append`, did perform the fetch in parallel. If
that's not the case: all the better.

> > If we change to
> > always accumulate first and flush once we're done, then we essentially
> > have a change in behaviour as FETCH_HEADs aren't written in an
> > interleaving fashion anymore, but only at the end.
> 
> I view it a good thing, actually, for another reason [*2*], but that
> would take a follow-up fix that is much more involved, so let's not
> go there (yet).  At least buffering a single line's worth of output
> in a buffer and writing it out in a single write() would get us much
> closer to solving the "mixed up output" from multiple processes
> problem the current code seems to already have.

The buffering of a single line is exactly what we're doing now in the
non-atomic case. Previously there had been multiple writes, now we only
call `strbuf_write()` once on the buffer for each reference.

> > Also, if there is any
> > unexpected error in between updating references which causes us to die,
> > then we wouldn't have written the already updated references to
> > FETCH_HEAD now.
> 
> That one may be a valid concern.
> 
> Thanks.

Just to be explicit: are you fine with changes in this patch as-is? They
don't solve concurrent writes to FETCH_HEAD, but they should make it
easier to solve that if we ever need to.

Patrick

> 
> [Footnote]
> 
> *1* "git fetch --all/--multiple" was strictly serial operation to
>     avoid such a mixed-up output problem, but perhaps we weren't
>     careful enough when we introduced parallel fetch and broke it?
> 
> *2* When fetching from a single remote, the code makes sure that a
>     record that is not marked as 'not-for-merge' is listed first by
>     reordering the records, but it does not work well when fetching
>     from multiple remotes.  Queuing all in the buffer before showing
>     them would give us an opportunity to sort, but as I said, it
>     takes coordination among the processes---instead of each process
>     writing to FETCH_HEAD on their own, somebody has to collect the
>     records from them, reorder as needed and write them all out.
> 
>     cf. https://lore.kernel.org/git/X8oL190Vl03B0cQ%2F@coredump.intra.peff.net/
> 
>     

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2021-01-12 12:23 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-07 13:51 [PATCH 0/2] fetch: implement support for atomic reference updates Patrick Steinhardt
2021-01-07 13:51 ` [PATCH 1/2] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
2021-01-07 22:59   ` Junio C Hamano
2021-01-08  0:45     ` Christian Couder
2021-01-08  7:18       ` Junio C Hamano
2021-01-07 13:51 ` [PATCH 2/2] fetch: implement support for atomic reference updates Patrick Steinhardt
2021-01-08  0:19   ` Junio C Hamano
2021-01-08 12:11 ` [PATCH v2 0/4] " Patrick Steinhardt
2021-01-08 12:11   ` [PATCH v2 1/4] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
2021-01-08 23:40     ` Junio C Hamano
2021-01-11 10:26       ` Patrick Steinhardt
2021-01-11 19:24         ` Junio C Hamano
2021-01-08 12:11   ` [PATCH v2 2/4] fetch: refactor `s_update_ref` to use common exit path Patrick Steinhardt
2021-01-08 23:50     ` Junio C Hamano
2021-01-11 10:28       ` Patrick Steinhardt
2021-01-08 12:11   ` [PATCH v2 3/4] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
2021-01-08 23:53     ` Junio C Hamano
2021-01-08 12:11   ` [PATCH v2 4/4] fetch: implement support for atomic reference updates Patrick Steinhardt
2021-01-09  0:05     ` Junio C Hamano
2021-01-11 10:42       ` Patrick Steinhardt
2021-01-11 19:47         ` Junio C Hamano
2021-01-12 12:22           ` Patrick Steinhardt [this message]
2021-01-12 12:29             ` Patrick Steinhardt
2021-01-12 19:19             ` Junio C Hamano
2021-01-11 11:05 ` [PATCH v3 0/5] " Patrick Steinhardt
2021-01-11 11:05   ` [PATCH v3 1/5] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
2021-01-11 23:16     ` Junio C Hamano
2021-01-11 11:05   ` [PATCH v3 2/5] fetch: use strbuf to format FETCH_HEAD updates Patrick Steinhardt
2021-01-11 11:10     ` Patrick Steinhardt
2021-01-11 11:17     ` Christian Couder
2021-01-11 23:21     ` Junio C Hamano
2021-01-11 11:05   ` [PATCH v3 3/5] fetch: refactor `s_update_ref` to use common exit path Patrick Steinhardt
2021-01-11 11:05   ` [PATCH v3 4/5] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
2021-01-11 11:05   ` [PATCH v3 5/5] fetch: implement support for atomic reference updates Patrick Steinhardt
2021-01-12  0:04   ` [PATCH v3 0/5] " Junio C Hamano
2021-01-12 12:27 ` [PATCH v4 " Patrick Steinhardt
2021-01-12 12:27   ` [PATCH v4 1/5] fetch: extract writing to FETCH_HEAD Patrick Steinhardt
2021-01-12 12:27   ` [PATCH v4 2/5] fetch: use strbuf to format FETCH_HEAD updates Patrick Steinhardt
2021-01-12 12:27   ` [PATCH v4 3/5] fetch: refactor `s_update_ref` to use common exit path Patrick Steinhardt
2021-01-12 12:27   ` [PATCH v4 4/5] fetch: allow passing a transaction to `s_update_ref()` Patrick Steinhardt
2021-01-12 12:27   ` [PATCH v4 5/5] fetch: implement support for atomic reference updates Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=X/2UkKS0AhT6jLXj@ncase \
    --to=ps@pks.im \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.