git.vger.kernel.org archive mirror
* [BUGREPORT] Why is git-push fetching content?
@ 2023-02-21 22:01 Sean Allred
  2023-02-21 23:02 ` brian m. carlson
  0 siblings, 1 reply; 7+ messages in thread
From: Sean Allred @ 2023-02-21 22:01 UTC
  To: Sean Allred, Kyle VandeWalle, git

What did you do before the bug happened? (Steps to reproduce your issue)

    # in a new directory,
    cd $(mktemp -d)

    # initialize a new repository
    git init

    # fetch a single commit from a remote
    git fetch --filter=tree:0 --depth=1 $REMOTE $COMMIT_OID

    # create a ref on that remote
    git push --no-verify $REMOTE $COMMIT_OID:$REFNAME

What did you expect to happen? (Expected behavior)

    I expected this process to complete very, very quickly. We believe
    the version where it had been doing so was ~2.37.

What happened instead? (Actual behavior)

    The fetch completes nearly instantly as expected. We receive ~200B
    from the remote for the commit object itself. What's truly bizarre
    is what happens during the push. It starts receiving objects from
    the remote! By the end of this process, the local repository is a
    whopping ~700MB -- though interestingly only about a tenth of the
    full repository size.

    This result in particular is strange in context. I would expect to
    either see 'almost all' the repository content, 'about half' (we
    have two trunks and fetching a single commit would at most fetch one
    of them), or 'virtually none at all'. There isn't a straightforward
    explanation for why 'one tenth' would make sense.

What's different between what you expected and what actually happened?

    Why should git-push ever be fetching objects? This doesn't map well
    to my mental model of the relationship between push/fetch.

    I would expect the local repository to stay in that 'git init'+200B
    range.
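
One way to check this expectation is to record the repository's object
storage before and after the push. The following is only a minimal
sketch and was not part of the original report: it uses a throwaway
repository, and the variable names are illustrative.

```shell
# Sketch: report how much object data a repository holds.
# Run once after the fetch and once after the push to compare.
repo=$(mktemp -d)            # throwaway repository for illustration
git init -q "$repo"

# count-objects -v prints "key: value" lines; size-pack is in KiB.
stats=$(git -C "$repo" count-objects -v)
loose_count=$(printf '%s\n' "$stats" | awk -F': ' '$1 == "count" {print $2}')
pack_kib=$(printf '%s\n' "$stats" | awk -F': ' '$1 == "size-pack" {print $2}')
echo "loose objects: $loose_count, packed size: ${pack_kib} KiB"
```

In the scenario described here, the packed size would be expected to
stay near zero after the push if no objects were fetched.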

Anything else you want to add:

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.

    I've truncated the system information normally included by
    git-bugreport as I am sending this email from a different machine.

    Versions of Git that can reproduce:

      - 2.39.2.windows.1     (Windows 10)

        git version:
        git version 2.39.2.windows.1
        cpu: x86_64
        built from commit: a82fa99b36ddfd643e61ed45e52abe314687df67
        sizeof-long: 4
        sizeof-size_t: 8
        shell-path: /bin/sh
        feature: fsmonitor--daemon
        uname: Windows 10.0 19044
        compiler info: gnuc: 12.2
        libc info: no libc information available
        $SHELL (typically, interactive shell): C:\Program Files\Git\usr\bin\bash.exe

      - 2.31.1               (AIX UNIX 7.2)

        git version:
        git version 2.31.1
        cpu: 00F905E64C00
        no commit associated with this build
        sizeof-long: 8
        sizeof-size_t: 8
        shell-path: /opt/freeware/bin/bash
        uname: AIX 2 7 00FBC37A4C00
        compiler info: gnuc: 8.3
        libc info: no libc information available
        $SHELL (typically, interactive shell): /usr/bin/ksh

--
Sean Allred


* Re: [BUGREPORT] Why is git-push fetching content?
  2023-02-21 22:01 [BUGREPORT] Why is git-push fetching content? Sean Allred
@ 2023-02-21 23:02 ` brian m. carlson
  2023-02-22 15:04   ` Sean Allred
       [not found]   ` <7bfb7ecd4a4c78668f97b00d5f06af0c9b2878269476e89c3311eeb8071b1ab3@mu.id>
  0 siblings, 2 replies; 7+ messages in thread
From: brian m. carlson @ 2023-02-21 23:02 UTC
  To: Sean Allred; +Cc: Sean Allred, Kyle VandeWalle, git


On 2023-02-21 at 22:01:04, Sean Allred wrote:
> What did you do before the bug happened? (Steps to reproduce your issue)
> 
>     # in a new directory,
>     cd $(mktemp -d)
> 
>     # initialize a new repository
>     git init
> 
>     # fetch a single commit from a remote
>     git fetch --filter=tree:0 --depth=1 $REMOTE $COMMIT_OID
> 
>     # create a ref on that remote
>     git push --no-verify $REMOTE $COMMIT_OID:$REFNAME
> 
> What did you expect to happen? (Expected behavior)
> 
>     I expected this process to complete very, very quickly. We believe
>     the version where it had been doing so was ~2.37.
> 
> What happened instead? (Actual behavior)
> 
>     The fetch completes nearly instantly as expected. We receive ~200B
>     from the remote for the commit object itself. What's truly bizarre
>     is what happens during the push. It starts receiving objects from
>     the remote! By the end of this process, the local repository is a
>     whopping ~700MB -- though interestingly only about a tenth of the
>     full repository size.
> 
>     This result in particular is strange in context. I would expect to
>     either see 'almost all' the repository content, 'about half' (we
>     have two trunks and fetching a single commit would at most fetch one
>     of them), or 'virtually none at all'. There isn't a straightforward
>     explanation for why 'one tenth' would make sense.

It's hard to know for certain what's going on here, but it depends on
your history.  You did a partial clone with no trees, so you've likely
received a single commit object and no trees or blobs.

However, when you push a commit, that necessitates pushing the trees and
blobs as well, and you don't have those.  If the remote said that it
already had the commit, then it might push no objects at all (which I've
seen before) and thus just update the references.  However, if it pushes
even one commit, it may need to walk the history and find common
commits, which will necessitate fetching objects, and it will have to
push any trees and blobs as well, which also will require objects to be
fetched.

My guess is that this is probably made worse by the fact that this is
shallow, and that necessitates certain additional computations, which
means more objects are fetched. However, I'm not super sure how that
code works, so I think it may be helpful for someone else to chime in
who's more familiar with this.

If you want to see what's going on, you can run with
`GIT_TRACE=1 GIT_TRACE_PACKET=1`, which may show interesting information
about the negotiation.
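
As a rough illustration of that suggestion, the sketch below captures
both trace streams from a push into a file. It is an assumption-laden
example rather than the reporter's setup: a local bare repository stands
in for the real remote, and the paths and ref name are made up.

```shell
# Sketch: capture command and packet traces from a push into a log file.
# A local bare repository stands in for the real remote (an assumption).
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"
git init -q "$tmp/work"
git -C "$tmp/work" -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m "initial commit"

# With a value of 1, both trace streams go to stderr; redirect them to a file.
GIT_TRACE=1 GIT_TRACE_PACKET=1 \
    git -C "$tmp/work" push -q "$tmp/remote.git" HEAD:refs/heads/example \
    2>"$tmp/trace.log"

# Show only the pkt-line exchange, which is where negotiation appears.
grep 'packet:' "$tmp/trace.log" | head -n 5
```

Against a large remote the log can run to tens of thousands of lines, so
writing it to a file and filtering afterwards tends to be more practical
than reading it on the terminal.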
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA


* Re: [BUGREPORT] Why is git-push fetching content?
  2023-02-21 23:02 ` brian m. carlson
@ 2023-02-22 15:04   ` Sean Allred
  2023-06-20 11:26     ` Tao Klerks
       [not found]   ` <7bfb7ecd4a4c78668f97b00d5f06af0c9b2878269476e89c3311eeb8071b1ab3@mu.id>
  1 sibling, 1 reply; 7+ messages in thread
From: Sean Allred @ 2023-02-22 15:04 UTC
  To: brian m. carlson; +Cc: Sean Allred, Kyle VandeWalle, git


"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> It's hard to know for certain what's going on here, but it depends on
> your history.  You did a partial clone with no trees, so you've likely
> received a single commit object and no trees or blobs.

Yup, this was the intention behind `--depth=1 --filter=tree:0`. The
server doing this ref update needs to be faster than having the full
history would allow.

> However, when you push a commit, that necessitates pushing the trees and
> blobs as well, and you don't have those.  If the remote said that it
> already had the commit, then it might push no objects at all (which I've
> seen before) and thus just update the references.  However, if it pushes
> even one commit, it may need to walk the history and find common
> commits, which will necessitate fetching objects, and it will have to
> push any trees and blobs as well, which also will require objects to be
> fetched.

Absolutely. The commit in question was fetched from the same remote to
which we're pushing, so it would seem by definition that git-push should
not need to push *any* object content whatsoever.

> My guess is that this is probably made worse by the fact that this is
> shallow, and that necessitates certain additional computations, which
> means more objects are fetched. However, I'm not super sure how that
> code works, so I think it may be helpful for someone else to chime in
> who's more familiar with this.

I'm certain this is just an unforeseen interaction between all these
pieces. I wouldn't be too surprised if we're among only a handful of
folks using git in this way.

> If you want to see what's going on, you can run with
> `GIT_TRACE=1 GIT_TRACE_PACKET=1`, which may show interesting information
> about the negotiation.

I'm not sure of the best way to include this information, so I'm just
going to inline it. I've edited this log file to remove several tens of
thousands of lines of object hashes and operational refnames. I've
annotated it with some guesses of what things might mean. I'm still
*relatively* new to reading such log files for serious debugging.

    08:30:47.655623 exec-cmd.c:237          trace: resolved executable dir: C:/Program Files/Git/mingw64/bin
    08:30:47.655623 git.c:460               trace: built-in: git push --no-verify -o emc2.enable-logging git@$REPO.git FETCH_HEAD:refs/tags/hswebrec/app/stage1/latest

This is the command actually run in the foreground. As you might
surmise, we're using Git for Windows. It's worth noting that FETCH_HEAD
here is 0962f6cd9b1f2b5a012581823c12f0f0619bd3f5.

    08:30:47.655623 run-command.c:655       trace: run_command: unset GIT_PREFIX; ssh git@$REPO_HOST 'git-receive-pack '\''$REPO_PROJECT.git'\'''
    08:30:48.100544 pkt-line.c:80           packet:         push< 4edfa5e150857e21c686826e1e430f6b014ed173 refs/archive/app/devnull\0report-status report-status-v2 delete-refs side-band-64k quiet atomic ofs-delta push-options object-format=sha1 agent=git/2.38.4.gl1
    08:30:48.100544 pkt-line.c:80           packet:         push< 5d27e7331f08365c9bb3d342ae020807a386f42a refs/heads/app/10.1/stage1

    ---8<--- literally thousands of refs removed...

    08:30:49.121310 pkt-line.c:80           packet:         push< 90c12d8c0ad0559047b3b9de78d948901fcffac3 refs/tags/zrb/I10121236/726711
    08:30:49.121310 pkt-line.c:80           packet:         push< 90c12d8c0ad0559047b3b9de78d948901fcffac3 refs/tags/zrb/I10121236/726712
    08:30:49.121310 pkt-line.c:80           packet:         push< 0000

Looks like the above was the remote telling the client what objects it
has by virtue of what refs it is tracking.

    08:30:49.193113 pkt-line.c:80           packet:         push> shallow 0962f6cd9b1f2b5a012581823c12f0f0619bd3f5
    08:30:49.193113 pkt-line.c:80           packet:         push> 0000000000000000000000000000000000000000 0962f6cd9b1f2b5a012581823c12f0f0619bd3f5 refs/tags/hswebrec/app/stage1/latest\0 report-status-v2 side-band-64k quiet push-options object-format=sha1 agent=git/2.39.2.windows.1
    08:30:49.193113 pkt-line.c:80           packet:         push> 0000

And this is the client telling the remote what objects it has ('shallow
0962f6...'?) and what changes it would like to make.

    08:30:49.193113 pkt-line.c:80           packet:         push> emc2.enable-logging
    08:30:49.193113 pkt-line.c:80           packet:         push> 0000

...as well as our push-option that enabled more logging for us (nothing
relevant to Git communication -- mostly just internal web services and
print-statement debugging).

    08:30:49.193113 run-command.c:655       trace: run_command: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset -q --shallow
    08:30:49.224949 exec-cmd.c:237          trace: resolved executable dir: C:/Program Files/Git/mingw64/libexec/git-core
    08:30:49.224949 git.c:460               trace: built-in: git pack-objects --all-progress-implied --revs --stdout --thin --delta-base-offset -q --shallow
    08:30:49.224949 run-command.c:655       trace: run_command: git -c fetch.negotiationAlgorithm=noop fetch git@$REPO.git --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin
    08:30:49.246673 exec-cmd.c:237          trace: resolved executable dir: C:/Program Files/Git/mingw64/libexec/git-core
    08:30:49.256718 git.c:460               trace: built-in: git fetch git@$REPO.git --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin
    08:30:49.256718 run-command.c:655       trace: run_command: unset GIT_CONFIG_PARAMETERS GIT_PREFIX; GIT_PROTOCOL=version=2 ssh -o SendEnv=GIT_PROTOCOL git@tracklab.epic.com 'git-upload-pack '\''epic/test/trackdev/mono-23/app.git'\'''

It looks like this is the client initiating a fetch.

    08:30:49.699720 pkt-line.c:80           packet:        fetch< version 2
    08:30:49.699720 pkt-line.c:80           packet:        fetch< agent=git/2.38.4.gl1
    08:30:49.699720 pkt-line.c:80           packet:        fetch< ls-refs=unborn
    08:30:49.699720 pkt-line.c:80           packet:        fetch< fetch=shallow wait-for-done filter
    08:30:49.699720 pkt-line.c:80           packet:        fetch< server-option
    08:30:49.699720 pkt-line.c:80           packet:        fetch< object-format=sha1
    08:30:49.699720 pkt-line.c:80           packet:        fetch< object-info

The remote tells the client what version it is so the client can send a
request the remote understands.

    08:30:49.699720 pkt-line.c:80           packet:        fetch< 0000
    08:30:49.699720 pkt-line.c:80           packet:        fetch> command=fetch
    08:30:49.699720 pkt-line.c:80           packet:        fetch> agent=git/2.39.2.windows.1
    08:30:49.699720 pkt-line.c:80           packet:        fetch> object-format=sha1
    08:30:49.699720 pkt-line.c:80           packet:        fetch> 0001
    08:30:49.699720 pkt-line.c:80           packet:        fetch> thin-pack
    08:30:49.699720 pkt-line.c:80           packet:        fetch> no-progress
    08:30:49.699720 pkt-line.c:80           packet:        fetch> ofs-delta
    08:30:49.699720 pkt-line.c:80           packet:        fetch> shallow 0962f6cd9b1f2b5a012581823c12f0f0619bd3f5
    08:30:49.699720 pkt-line.c:80           packet:        fetch> filter blob:none
    08:30:49.699720 pkt-line.c:80           packet:        fetch> want 8da2fa849db733188b1820865deb800d8e6abfc6
    08:30:49.699720 pkt-line.c:80           packet:        fetch> done
    08:30:49.699720 pkt-line.c:80           packet:        fetch> 0000

The client asks the remote for content. Looks like the filter here got
changed from tree:0 to blob:none. This could be the bug -- and could
explain the 'weird' amount of content that was actually downloaded. A
fully-fleshed-out clone would be about 8GB, but I could certainly see a
blobless history being ~700MB. Interesting to note here that
0962f6^{tree} is 8da2fa.
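
For a log this large, a small filter can pull out the same two facts
(which filter the client sent and how many objects it asked for). This
is a minimal sketch that assumes the field layout seen in the trace
above; the sample lines are abridged from it.

```shell
# Sketch: summarise the lazy-fetch requests recorded in a GIT_TRACE_PACKET log.
# The sample lines are abridged from the trace in this message.
log=$(mktemp)
cat >"$log" <<'EOF'
08:30:49.699720 pkt-line.c:80           packet:        fetch> shallow 0962f6cd9b1f2b5a012581823c12f0f0619bd3f5
08:30:49.699720 pkt-line.c:80           packet:        fetch> filter blob:none
08:30:49.699720 pkt-line.c:80           packet:        fetch> want 8da2fa849db733188b1820865deb800d8e6abfc6
08:30:49.699720 pkt-line.c:80           packet:        fetch> done
EOF

# Fields: time, source location, "packet:", direction, keyword, argument.
filters=$(awk '$4 == "fetch>" && $5 == "filter" {print $6}' "$log" | sort -u)
wants=$(awk '$4 == "fetch>" && $5 == "want"' "$log" | wc -l | tr -d ' ')
echo "filters sent: $filters"
echo "objects wanted: $wants"
```

Run against the full trace, the second number would show how many
objects each fetch round requested.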

    08:30:49.715316 pkt-line.c:80           packet:        fetch< shallow-info
    08:30:49.715316 pkt-line.c:80           packet:        fetch< 0001
    08:30:49.715316 pkt-line.c:80           packet:        fetch< packfile

Not sure what this bit is, to be honest.

    08:30:50.527739 pkt-line.c:80           packet:     sideband< PACK ...
    08:30:50.542281 run-command.c:655       trace: run_command: git index-pack --stdin --fix-thin '--keep=fetch-pack 6448 on win-pool7447' --promisor --pack_header=2,71814
    08:30:50.574053 exec-cmd.c:237          trace: resolved executable dir: C:/Program Files/Git/mingw64/libexec/git-core
    08:30:50.590271 git.c:460               trace: built-in: git index-pack --stdin --fix-thin '--keep=fetch-pack 6448 on win-pool7447' --promisor --pack_header=2,71814
    08:30:53.027721 pkt-line.c:80           packet:     sideband< 0000
    08:30:53.200395 run-command.c:655       trace: run_command: git maintenance run --auto --no-quiet
    08:30:53.246231 exec-cmd.c:237          trace: resolved executable dir: C:/Program Files/Git/mingw64/libexec/git-core
    08:30:53.248264 git.c:460               trace: built-in: git maintenance run --auto --no-quiet
    08:30:59.362381 run-command.c:655       trace: run_command: git -c fetch.negotiationAlgorithm=noop fetch git@$REPO.git --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin
    08:30:59.378458 exec-cmd.c:237          trace: resolved executable dir: C:/Program Files/Git/mingw64/libexec/git-core
    08:30:59.394133 git.c:460               trace: built-in: git fetch git@$REPO.git --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin
    08:33:28.966748 run-command.c:655       trace: run_command: unset GIT_CONFIG_PARAMETERS GIT_PREFIX; GIT_PROTOCOL=version=2 ssh -o SendEnv=GIT_PROTOCOL git@tracklab.epic.com 'git-upload-pack '\''epic/test/trackdev/mono-23/app.git'\'''

Nor why we'd go through a separate round of fetching.

    08:33:29.530124 pkt-line.c:80           packet:        fetch< version 2
    08:33:29.530124 pkt-line.c:80           packet:        fetch< agent=git/2.38.4.gl1
    08:33:29.530124 pkt-line.c:80           packet:        fetch< ls-refs=unborn
    08:33:29.530124 pkt-line.c:80           packet:        fetch< fetch=shallow wait-for-done filter
    08:33:29.530124 pkt-line.c:80           packet:        fetch< server-option
    08:33:29.530124 pkt-line.c:80           packet:        fetch< object-format=sha1
    08:33:29.530124 pkt-line.c:80           packet:        fetch< object-info
    08:33:29.530124 pkt-line.c:80           packet:        fetch< 0000
    08:33:51.764098 pkt-line.c:80           packet:        fetch> command=fetch
    08:33:51.764098 pkt-line.c:80           packet:        fetch> agent=git/2.39.2.windows.1
    08:33:51.764098 pkt-line.c:80           packet:        fetch> object-format=sha1
    08:33:51.764098 pkt-line.c:80           packet:        fetch> 0001
    08:33:51.764098 pkt-line.c:80           packet:        fetch> thin-pack
    08:33:51.764098 pkt-line.c:80           packet:        fetch> no-progress
    08:33:51.764098 pkt-line.c:80           packet:        fetch> ofs-delta
    08:33:51.764098 pkt-line.c:80           packet:        fetch> shallow 0962f6cd9b1f2b5a012581823c12f0f0619bd3f5
    08:33:51.764098 pkt-line.c:80           packet:        fetch> filter blob:none

But we can see this blob:none 'mistake' again...

    08:33:51.764098 pkt-line.c:80           packet:        fetch> want 0000063ae70e4385b0527df060daf0a81b306c8d
    08:33:51.764098 pkt-line.c:80           packet:        fetch> want 00003efd5bd3b3795950588f7d051e8d0b42def3

    ---8<--- many, many thousands of objects removed

    08:33:54.154726 pkt-line.c:80           packet:        fetch> want ffffd8aad00ab11c0672096203f57564b286da08
    08:33:54.154726 pkt-line.c:80           packet:        fetch> want ffffd967eaed43bf87d40a84f2f9e12c59575abe
    08:33:54.154726 pkt-line.c:80           packet:        fetch> done
    08:33:54.154726 pkt-line.c:80           packet:        fetch> 0000

... with all of those trees

    08:33:56.058314 pkt-line.c:80           packet:        fetch< shallow-info
    08:33:56.058314 pkt-line.c:80           packet:        fetch< 0001
    08:33:56.058314 pkt-line.c:80           packet:        fetch< packfile
    08:33:57.783631 pkt-line.c:80           packet:     sideband< PACK ...
    08:33:57.799321 run-command.c:655       trace: run_command: git index-pack --stdin --fix-thin '--keep=fetch-pack 6540 on win-pool7447' --promisor --pack_header=2,344628
    08:33:57.830596 exec-cmd.c:237          trace: resolved executable dir: C:/Program Files/Git/mingw64/libexec/git-core
    08:33:57.830596 git.c:460               trace: built-in: git index-pack --stdin --fix-thin '--keep=fetch-pack 6540 on win-pool7447' --promisor --pack_header=2,344628
    08:34:35.033275 pkt-line.c:80           packet:     sideband< 0000
    08:34:46.298777 run-command.c:655       trace: run_command: git maintenance run --auto --no-quiet
    08:34:46.330052 exec-cmd.c:237          trace: resolved executable dir: C:/Program Files/Git/mingw64/libexec/git-core
    08:34:46.345653 git.c:460               trace: built-in: git maintenance run --auto --no-quiet
    08:45:14.094158 pkt-line.c:80           packet:     sideband< \1
    08:45:18.281129 pkt-line.c:80           packet:     sideband< \2pre-receive started at 1677077118283
    remote: pre-receive started at 1677077118283
    08:45:18.297266 pkt-line.c:80           packet:     sideband< \2Received line '0000000000000000000000000000000000000000 0962f6cd9b1f2b5a012581823c12f0f0619bd3f5 refs/tags/hswebrec/app/stage1/l
    08:45:18.297266 pkt-line.c:80           packet:     sideband< \2atest'
    remote: Received line '0000000000000000000000000000000000000000 0962f6cd9b1f2b5a012581823c12f0f0619bd3f5 refs/tags/hswebrec/app/stage1/latest'

    ---8<--- clipped pre-receive output

    08:45:18.550847 pkt-line.c:80           packet:     sideband< \2pre-receive finished at 1677077118556 (273 ms)
    remote: pre-receive finished at 1677077118556 (273 ms)
    08:45:21.448647 pkt-line.c:80           packet:     sideband< \1000eunpack ok002cok refs/tags/hswebrec/app/stage1/latest0000
    08:45:21.448647 pkt-line.c:80           packet:         push< unpack ok
    08:45:21.448647 pkt-line.c:80           packet:         push< ok refs/tags/hswebrec/app/stage1/latest
    08:45:21.448647 pkt-line.c:80           packet:         push< 0000
    08:45:21.798337 pkt-line.c:80           packet:     sideband< \2post-receive started at 1677077121804
    remote: post-receive started at 1677077121804
    08:45:21.814043 pkt-line.c:80           packet:     sideband< \2Received line '0000000000000000000000000000000000000000 0962f6cd9b1f2b5a012581823c12f0f0619bd3f5 refs/tags/hswebrec/app/stage1/l
    08:45:21.814043 pkt-line.c:80           packet:     sideband< \2atest'
    remote: Received line '0000000000000000000000000000000000000000 0962f6cd9b1f2b5a012581823c12f0f0619bd3f5 refs/tags/hswebrec/app/stage1/latest'

    ---8<--- clipped post-receive output

    08:45:21.972423 pkt-line.c:80           packet:     sideband< \23.0000; path=/; Httponly; Secure"]}}post-receive finished at 1677077121979 (175 ms)
    remote: post-receive finished at 1677077121979 (175 ms)
    08:45:22.051793 pkt-line.c:80           packet:     sideband< 0000
    To $REPO.git
     * [new tag]         FETCH_HEAD -> hswebrec/app/stage1/latest

--
Sean Allred


* Re: [BUGREPORT] Why is git-push fetching content?
       [not found]   ` <7bfb7ecd4a4c78668f97b00d5f06af0c9b2878269476e89c3311eeb8071b1ab3@mu.id>
@ 2023-02-22 15:48     ` Sean Allred
  0 siblings, 0 replies; 7+ messages in thread
From: Sean Allred @ 2023-02-22 15:48 UTC
  To: brian m. carlson; +Cc: Sean Allred, Kyle VandeWalle, git


Apologies for the double-email; in switching between desktops, I
prematurely sent my last message. Luckily I was very nearly done.

Sean Allred <allred.sean@gmail.com> writes:
> But we can see this blob:none 'mistake' again...
>
>     08:33:51.764098 pkt-line.c:80           packet:        fetch> want 0000063ae70e4385b0527df060daf0a81b306c8d
>     08:33:51.764098 pkt-line.c:80           packet:        fetch> want 00003efd5bd3b3795950588f7d051e8d0b42def3
>
>     ---8<--- many, many thousands of objects removed
>
>     08:33:54.154726 pkt-line.c:80           packet:        fetch> want ffffd8aad00ab11c0672096203f57564b286da08
>     08:33:54.154726 pkt-line.c:80           packet:        fetch> want ffffd967eaed43bf87d40a84f2f9e12c59575abe
>     08:33:54.154726 pkt-line.c:80           packet:        fetch> done
>     08:33:54.154726 pkt-line.c:80           packet:        fetch> 0000
>
> ... with all of those trees

I was verifying my suspicion in the other desktop -- but my suspicion
was incorrect. These aren't all trees; in fact, the objects listed above
are *just* blobs -- no other object types. One could assume that these
are all the blobs in the 8da2fa tree, but I would expect that we'd get
all the subtrees in 8da2fa as well in that case.

I'm not sure how much more information I can extract from this list of
blobs, but I'm open to suggestions if we think there's a pattern here to
be discovered.
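
For anyone wanting to repeat this kind of check, one possible approach
is to feed the object IDs to `git cat-file --batch-check` and tally the
types. The sketch below is illustrative only: it builds a tiny throwaway
repository to supply the IDs, whereas with a real trace the list would
come from the `want` lines. Note that in a partial clone, asking about
an object that is still missing may itself trigger a lazy fetch, so this
is best run after the objects have already arrived.

```shell
# Sketch: tally a list of object IDs by type with cat-file --batch-check.
# A tiny throwaway repository provides the OIDs here (an assumption).
repo=$(mktemp -d)
git init -q "$repo"
echo "hello" >"$repo/file"
git -C "$repo" add file
git -C "$repo" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "add file"

# rev-list --objects prints "<oid> [path]"; keep only the OID column.
oids=$(git -C "$repo" rev-list --objects HEAD | cut -d' ' -f1)

# %(objecttype) prints commit, tree, blob, or tag for each OID on stdin.
summary=$(printf '%s\n' "$oids" |
    git -C "$repo" cat-file --batch-check='%(objecttype)' |
    sort | uniq -c | awk '{print $2 "=" $1}' | paste -sd' ' -)
echo "$summary"
```

A result consisting only of `blob=N` would match the observation above
that the requested objects were all blobs.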

--
Sean Allred


* Re: [BUGREPORT] Why is git-push fetching content?
  2023-02-22 15:04   ` Sean Allred
@ 2023-06-20 11:26     ` Tao Klerks
  2023-07-08  6:27       ` Sean Allred
  0 siblings, 1 reply; 7+ messages in thread
From: Tao Klerks @ 2023-06-20 11:26 UTC
  To: Sean Allred; +Cc: brian m. carlson, Sean Allred, Kyle VandeWalle, git

On Wed, Feb 22, 2023 at 4:45 PM Sean Allred <allred.sean@gmail.com> wrote:
>
>
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> > It's hard to know for certain what's going on here, but it depends on
> > your history.  You did a partial clone with no trees, so you've likely
> > received a single commit object and no trees or blobs.
>
> Yup, this was the intention behind `--depth=1 --filter=tree:0`. The
> server doing this ref update needs to be faster than having the full
> history would allow.
>

FWIW, you're not alone - we do exactly the same thing, for the same
reasons, and get the same outcome: We want to create a tag in a CI
job, that particular CI job has no reason to check out the code, all
we know is we want ref XXXXX to point to commit YYYYY.

The most logical way to achieve that seems to be to do a shallow
partial no-checkout clone of commit YYYYY, and then push to remote ref
XXXXX, but the push ends up doing extra seemingly-unnecessary
jit-fetching work.

In our case it's still better than any alternative we've found, but
wastes a few seconds that we'd love to see optimized away.


* Re: [BUGREPORT] Why is git-push fetching content?
  2023-06-20 11:26     ` Tao Klerks
@ 2023-07-08  6:27       ` Sean Allred
  2023-07-08  8:39         ` Sean Allred
  0 siblings, 1 reply; 7+ messages in thread
From: Sean Allred @ 2023-07-08  6:27 UTC
  To: Tao Klerks
  Cc: Sean Allred, brian m. carlson, Sean Allred, Kyle VandeWalle, git


Thanks for the replies. I'd like to bump this up again. This has come up
in a new context and I don't see a viable workaround for us that doesn't
involve a rewrite of the process and an excessive amount of new
infrastructure.

I have a feeling this is somehow a general issue with promisor remotes,
though I don't know enough about how they work to know where to start
investigation. I've got what I believe to be minimal reproduction steps
below.

Tao Klerks <tao@klerks.biz> writes:
> On Wed, Feb 22, 2023 at 4:45 PM Sean Allred <allred.sean@gmail.com> wrote:
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>> > It's hard to know for certain what's going on here, but it depends on
>> > your history.  You did a partial clone with no trees, so you've likely
>> > received a single commit object and no trees or blobs.
>>
>> Yup, this was the intention behind `--depth=1 --filter=tree:0`. The
>> server doing this ref update needs to be faster than having the full
>> history would allow.
>>
>
> FWIW, you're not alone - we do exactly the same thing, for the same
> reasons, and get the same outcome: We want to create a tag in a CI
> job, that particular CI job has no reason to check out the code, all
> we know is we want ref XXXXX to point to commit YYYYY.
>
> [...]
>
> In our case it's still better than any alternative we've found, but
> wastes a few seconds that we'd love to see optimized away.

Unfortunately in our case, 'a few seconds' is tens of minutes (I'm
working with a repository of several million commits) and is timing out
the remote host.

----

I devised some minimal steps to reproduce what I believe to be a related
issue: rev-list fetching content. I've prepared a public repository on
github.com to demonstrate, but you should be able to recreate this
repository if needed by just making a handful of commits to a couple
arbitrary files.

    (cwd:tmp)
    $ git clone --no-checkout --depth=1 --no-tags --filter=tree:0 https://github.com/vermiculus/testibus.git
    Cloning into 'testibus'...
    remote: Enumerating objects: 1, done.
    remote: Counting objects: 100% (1/1), done.
    remote: Total 1 (delta 0), reused 1 (delta 0), pack-reused 0
    Receiving objects: 100% (1/1), done.

Sweet, I've only received one object from the remote. This makes sense
per what I want: a treeless, blobless fetch of a single commit. Let's
double-check.

    (cwd:testibus)
    $ git fsck
    Checking object directories: 100% (256/256), done.
    Checking objects: 100% (2/2), done.

I have two objects? How'd that second one get in there? What is it?
Let's try to find out...

    (cwd:testibus)
    $ git rev-list --objects --all
    d86642e7ae089b69e8a0b20a3e39337435833f92

Alright, I've got the commit object. That makes sense.

    c0fa909c5f67047abc027d9b06e1352954ee33f7

Weird, I also got the tree on the commit, even though I specified that
this should be a treeless clone.

    remote: Enumerating objects: 1, done.
    remote: Counting objects: 100% (1/1), done.
    remote: Total 1 (delta 0), reused 1 (delta 0), pack-reused 0
    Receiving objects: 100% (1/1), 54 bytes | 54.00 KiB/s, done.
    94b334d80405218e281a6f5b48d31f73cd3af4be file

Woah woah! All I did was rev-list; why are we fetching content?

This is why I believe this is related to the push issue I'm ultimately
facing -- I'm not familiar with the specifics, but it stands to reason
that git-push needs to (somehow) iterate through objects in order to
negotiate a packfile with the remote. I suspect these two issues have
the same root cause.
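
As an aside for anyone reproducing this, rev-list has a documented debug
option, `--missing=print`, that lists absent objects with a leading `?`
rather than fetching them. The sketch below is a hedged illustration,
not a fix: it uses a throwaway local repository served over `file://` in
place of the GitHub repository, and all paths are made up.

```shell
# Sketch: list reachable objects in a partial clone without lazy fetching.
# A local repository served over file:// stands in for the real remote.
tmp=$(mktemp -d)
git init -q "$tmp/remote"
echo "content" >"$tmp/remote/file"
git -C "$tmp/remote" add file
git -C "$tmp/remote" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "add file"
# The serving side must opt in to partial-clone filters.
git -C "$tmp/remote" config uploadpack.allowFilter true

git clone -q --no-checkout --depth=1 --no-tags --filter=tree:0 \
    "file://$tmp/remote" "$tmp/clone"

# --missing=print reports absent objects with a leading "?" instead of
# fetching them from the promisor remote.
listing=$(git -C "$tmp/clone" rev-list --objects --missing=print HEAD)
printf '%s\n' "$listing"
missing=$(printf '%s\n' "$listing" | grep -c '^?')
```

This only changes how rev-list itself behaves; it says nothing about
whether git-push can be made to skip the same lookups.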

I believe the following can be used with git-bisect to determine if this
truly ever worked or is a regression:

    setup:
        #!/bin/bash

        repo="https://github.com/vermiculus/testibus.git"
        repo_dir="$HOME/path/to/repo"  # $HOME, since a quoted ~ is not expanded

        git clone --no-checkout --depth=1 --no-tags --filter=tree:0 "$repo" "$repo_dir"
        git -C "$repo_dir" remote set-url origin unreachable

    bisect script:
        git -C "$repo_dir" rev-list --objects --all

        (obviously using the just-built git)

I'm going to start running this bisect, but I suspect it will take a
while, so I wanted to get this out there.

--
Sean Allred


* Re: [BUGREPORT] Why is git-push fetching content?
  2023-07-08  6:27       ` Sean Allred
@ 2023-07-08  8:39         ` Sean Allred
  0 siblings, 0 replies; 7+ messages in thread
From: Sean Allred @ 2023-07-08  8:39 UTC
  To: Sean Allred
  Cc: Tao Klerks, brian m. carlson, Sean Allred, Kyle VandeWalle, git


Following up with the results of my bisect (more discussion below). I'm
forced to conclude this may somehow have never worked as I'm expecting
(even though I do recall it working well in a long-gone environment),
but I'm very much hoping I just did the bisect incorrectly. (It's not a
feature I need to use much.)

So, is this a bug or is this working as intended for a good reason?

Sean Allred <allred.sean@gmail.com> writes:
> Thanks for the replies. I'd like to bump this up again. This has come up
> in a new context and I don't see a viable workaround for us that doesn't
> involve a rewrite of the process and an excessive amount of new
> infrastructure.
>
> I have a feeling this is somehow a general issue with promisor remotes,
> though I don't know enough about how they work to know where to start
> investigation. I've got what I believe to be minimal reproduction steps
> below.
>
> [...]
>
> I believe the following can be used with git-bisect to determine if this
> truly ever worked or is a regression:
>
>     setup:
>         #!/bin/bash
>
>         repo="https://github.com/vermiculus/testibus.git"
>         repo_dir="$HOME/path/to/repo"
>
>         git clone --no-checkout --depth=1 --no-tags --filter=tree:0 "$repo" "$repo_dir"
>         git -C "$repo_dir" remote set-url origin unreachable
>
>     bisect script:
>         git -C "$repo_dir" rev-list --objects --all
>
>         (obviously using the just-built git)
>
> I'm going to start running this bisect, but I suspect it will take a
> while, so I wanted to get this out there.

I ended up using a bisect script that looks like this

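    #!/bin/bash
    # Rebuild git at the commit under test; skip this commit if the build fails.
    make clean
    NO_GETTEXT=1 make -j8 || exit 125
    # Mark the commit bad if rev-list fails (e.g. on an attempted lazy fetch).
    ./bin-wrappers/git -C "$1" rev-list --objects --all || exit 1
    # Otherwise record the commit as good.
    git rev-parse HEAD >> ../good-commits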

and running

    git bisect start main 637fc4467e57872008171958eda0428818a7ee03
    git bisect run ../bisect-script.sh ~/tmp/testibus/

It took less time than I thought, but unfortunately I was never able to
actually find a 'good' commit. As the presumed-good starting point I
arbitrarily chose "partial-clone: design doc" (Jeff Hostetler, Dec 14
2017), the first commit to the partial-clone design document, under the
assumption that the feature worked at some point. If potentially lying
to git-bisect in this way is especially liable to bust it, I can start
the exponentially-more-expensive process of testing every commit along
--first-parent, but I suspect this may have never worked as I'm
expecting.
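
For readers less familiar with `git bisect run`, the exit statuses in
the script above follow its documented convention: 0 means good, 125
means skip, any other status from 1 to 127 means bad, and anything else
aborts the bisection. The sketch below restates that mapping;
`classify_exit` is a hypothetical helper written purely for
illustration.

```shell
# Sketch: how `git bisect run` interprets a script's exit status.
# classify_exit is a hypothetical helper written for illustration.
classify_exit() {
    case "$1" in
        0)   echo good ;;
        125) echo skip ;;                  # e.g. the build failed
        *)   if [ "$1" -ge 1 ] && [ "$1" -le 127 ]; then
                 echo bad
             else
                 echo abort                # bisect run stops entirely
             fi ;;
    esac
}

for status in 0 1 125 128; do
    echo "exit $status -> $(classify_exit "$status")"
done
```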

--
Sean Allred
