From: Jeff King <peff@peff.net>
To: Rolf Eike Beer <eb@emlix.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Junio C Hamano <gitster@pobox.com>,
	Git List Mailing <git@vger.kernel.org>,
	Tobias Ulmer <tu@emlix.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: data loss when doing ls-remote and piped to command
Date: Fri, 17 Sep 2021 15:13:24 -0400
Message-ID: <YUTo1BTp7BXOw6K9@coredump.intra.peff.net>
In-Reply-To: <2722184.bRktqFsmb4@devpool47>

On Fri, Sep 17, 2021 at 08:59:07AM +0200, Rolf Eike Beer wrote:

> What you need is a _fast_ git server. kernel.org or github.com seem to be too 
> slow for this if you don't sit somewhere in their datacenter. Use something in 
> your local network, a Xeon E5 with lots of RAM and connected with 1GBit/s 
> Ethernet in my case.

One thing that puzzled me here: is the data getting lost between the
server and ls-remote, or between ls-remote and its output pipe?

I'd guess it has to be the latter, since otherwise ls-remote itself
would barf with an error message.

In that case, I'd think "git ls-remote ." would give you the fastest
outcome, because it's talking to upload-pack on the local box. But I'm
also confused how the speed could matter, as ls-remote reads the entire
input into an in-memory array, and then formats it.

We do the write using printf(). Is it possible your libc's stdio may
drop bytes when the pipe is full, rather than blocking? In general, I'd
expect write() to block, so libc doesn't have to care at all. But might
there be something in your environment putting the pipe into
non-blocking mode, and we get EAGAIN or something? If so, I'd expect
stdio to return the error.
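
One way to test the non-blocking theory directly is to ask the kernel
for the status flags on the descriptor. The following is a minimal
sketch, not code from Git; fd_is_nonblocking() is a hypothetical helper
name. Note that O_NONBLOCK lives on the open file description, so any
other process sharing the same pipe end can switch it on behind our
back.

```c
#include <fcntl.h>

/*
 * Hypothetical helper (not part of Git): report whether fd is in
 * non-blocking mode.
 *
 * Returns 1 if O_NONBLOCK is set, 0 if it is not, and -1 if
 * fcntl() fails (errno is left as fcntl() set it).
 */
int fd_is_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return (flags & O_NONBLOCK) ? 1 : 0;
}
```

Calling fd_is_nonblocking(1) just before the output loop, and again
after it, would show whether stdout was ever switched to non-blocking
mode while ls-remote was running.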

Maybe patching Git like this would help:

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index f4fd823af8..5936b2b42c 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -146,7 +146,8 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 		const struct ref_array_item *ref = ref_array.items[i];
 		if (show_symref_target && ref->symref)
 			printf("ref: %s\t%s\n", ref->symref, ref->refname);
-		printf("%s\t%s\n", oid_to_hex(&ref->objectname), ref->refname);
+		if (printf("%s\t%s\n", oid_to_hex(&ref->objectname), ref->refname) < 0)
+			die_errno("printf failed");
 		status = 0; /* we found something */
 	}
 

> And the reader must be "somewhat" slow. Using sha256sum works reliably for me. 
> Using "wc -l" does not, also md5sum and sha1sum are too fast as it seems.

If a slow pipe is involved, maybe:

  git ls-remote . | (sleep 5; cat) | sha256sum

would help reproduce. Assuming ls-remote's output is bigger than your
system pipe buffer (which is another interesting thing to check), then
it should block for 5 seconds on write() midway through the output,
which you can verify with strace.
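
To check the pipe buffer size, one portable option is to fill a
non-blocking pipe and count the bytes accepted before write() returns
EAGAIN. This is only a sketch, and measure_pipe_capacity() is a
hypothetical name. It measures a freshly created pipe with default
settings, which may differ from the pipe your shell sets up if
something has resized it.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: estimate how many bytes a fresh pipe accepts
 * before a writer would block. Returns the byte count, or -1 on error.
 */
long measure_pipe_capacity(void)
{
	int fds[2];
	char chunk[512]; /* small enough that each write is all-or-nothing */
	long total = 0;
	int flags;

	if (pipe(fds) < 0)
		return -1;

	/* Make the write end non-blocking so a full pipe yields EAGAIN. */
	flags = fcntl(fds[1], F_GETFL);
	if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	memset(chunk, 'x', sizeof(chunk));
	for (;;) {
		ssize_t n = write(fds[1], chunk, sizeof(chunk));

		if (n < 0) {
			if (errno == EINTR)
				continue;
			if (errno != EAGAIN && errno != EWOULDBLOCK)
				total = -1; /* unexpected failure */
			break;
		}
		total += n;
	}

	close(fds[0]);
	close(fds[1]);
	return total;
}
```

On Linux the default is commonly 64 KiB, so if "git ls-remote . | wc -c"
reports less than whatever this returns on your box, the sleep trick
above would never make the writer block.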

-Peff
