git.vger.kernel.org archive mirror
* Bug: fetch with deepen shortens history
From: Benjamin Stein @ 2023-06-26 21:24 UTC
  To: git

Hello git gurus,

Here's an atypical bug report for you. I'm sorry for not starting with the template, but the context/setup are longer than felt useful in that format.

I have what I believe to be a (relatively) simple, reproducible test case (repo setup/steps below) around shallow checkouts at merge commits and deepening where the behavior is quite surprising - I end up with a smaller history after a fetch operation than when I started!

I've tried this on multiple OSes (Linux, Mac/Darwin), shells (zsh, bash) and various git versions ranging from 2.34.1 to 2.40.1, and I suspect the bug is older than that.

Scenario:
I'm using GitHub Actions to look through some commit history and generate a report of commits relative to another branch. The specifics aren't important, only that I start off in a shallow repo (depth=1) because (1) that's what GHA drops me into by default, and (2) this is a large mono-repo, so I want to minimize the amount of data fetched rather than fetch all history every time.
So I used `fetch --shallow-exclude=other-branch HEAD` to get the relevant commits, and ran into my bug: when I do an extra `fetch --deepen` I end up with only a single commit in history instead of the N I had right before the call. If that's not super clear, I think the reproduction steps below should help.
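
The workflow described above can be sketched roughly as follows. The function name `commits_since_branch` and its argument order are illustrative assumptions, not part of the original report; it relies only on `git fetch --shallow-exclude` and `git log`, and it assumes it is run inside a shallow clone whose remote still has the excluded ref.

```shell
# Hypothetical helper (name and arguments are assumptions for illustration).
# Fetch only the commits of <target-ref> that are not reachable from
# <exclude-ref>, then list them one per line.
# Usage: commits_since_branch <remote> <exclude-ref> <target-ref>
commits_since_branch() {
    local remote="$1" exclude="$2" target="$3"
    # Move the shallow boundary so history stops just short of anything
    # reachable from the excluded ref on the remote.
    git fetch --quiet --shallow-exclude="$exclude" "$remote" "$target" || return 1
    # FETCH_HEAD names the tip that was just fetched; in a shallow
    # repository the walk ends at the shallow boundary.
    git log --oneline FETCH_HEAD
}
```

Called as `commits_since_branch origin other-branch HEAD`, this would print the report input described above, one commit per line.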

Even more confusingly, if I check out a merge commit into trunk (the default) it misbehaves, but if I instead start out on the branch I'm inspecting, the same sequence of commands works correctly! This detail threw me for a long time as I tried to understand why behavior after a merge wasn't consistent.

My hunch is that this has to do with .git/shallow and the way some of the SHAs are coalescing when histories intersect/are combined, but I'm not too familiar with the inner workings of shallow. I even tried running the same sequence of steps with various combinations of --update-shallow on the fetches, but that doesn't seem to address the underlying issue of history shortening.
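
One way to test that hunch is to look at the shallow boundary directly. The sketch below is only an inspection aid; the function name `show_shallow_boundary` is an assumption made for illustration, while `git rev-parse --is-shallow-repository` and `git rev-parse --git-path` are existing git options.

```shell
# Hypothetical helper (the name is an assumption for illustration).
# Print the commits recorded as the shallow boundary, one per line,
# or print nothing if the repository is not shallow.
show_shallow_boundary() {
    if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
        # The "shallow" file lists commits whose parents are
        # deliberately absent from this repository.
        cat "$(git rev-parse --git-path shallow)"
    fi
}
```

Capturing the output before and after a `git fetch --deepen=1` and comparing the two would show which boundary entries were added or dropped by that fetch.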

I hope that's all clear. If I can give more detail, definitely let me know, and I'm happy to try and explore some solutions, but I'm not certain where to begin.
I'm also not sure if it's an issue on the client or server - since I was initially testing with GitHub - but in my reduced example everything is local.

Thanks for your time and help with this!

-Benji


------ requisite bugreport answers ------

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)
See setup below.
In my shallow checkout, I ran:
    git log --oneline | wc -l => N commits of history.
Then I ran
    git fetch --deepen=1 <branch>
Followed by
    git log --oneline | wc -l => now just one commit.

What did you expect to happen? (Expected behavior)
I expected to see N+1 commits of history because I deepened by 1.

What happened instead? (Actual behavior)
I instead had just one commit of history.

What's different between what you expected and what actually happened?
I expected my local history not to be shortened by ~N commits, since I used --deepen rather than --depth.

Anything else you want to add:
See additional notes/steps in this email and attached script to reproduce
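
The before/after comparison in the answers above can be wrapped in a small check. This is a minimal sketch: the function name `check_deepen` and its return codes are assumptions for illustration, and it assumes the branch already has a remote-tracking ref under `refs/remotes/<remote>/`.

```shell
# Hypothetical helper (name and exit codes are assumptions for illustration).
# Run one `git fetch --deepen` and report whether the visible history of
# the remote-tracking ref shrank.
# Usage: check_deepen <remote> <branch> [<n>]
check_deepen() {
    local remote="$1" branch="$2" n="${3:-1}"
    local before after
    before=$(git rev-list --count "refs/remotes/$remote/$branch") || return 2
    git fetch --quiet --deepen="$n" "$remote" "$branch" || return 2
    after=$(git rev-list --count "refs/remotes/$remote/$branch") || return 2
    echo "before=$before after=$after"
    # Succeeds only if deepening left at least as many commits visible.
    [ "$after" -ge "$before" ]
}
```

A non-zero exit status of 1 then flags exactly the situation reported here, where a deepen leaves fewer commits visible than before.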


[System Info]
git version:
git version 2.40.1
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 6.2.6-76060206-generic #202303130630~1681329778~22.04~d824cd4 SMP PREEMPT_DYNAMIC Wed A x86_64
compiler info: gnuc: 11.3
libc info: glibc: 2.35
$SHELL (typically, interactive shell): /bin/zsh


[Enabled Hooks]
not run from a git repository - no hooks to show
(it's irrelevant, my example starts from nothing, so no hooks)


------- bug-setup.sh ----- (better to run in chunks, rather than all at once, but provided for convenience)

set -x
# Setup working folder for easy cleanup
mkdir git-test && cd git-test

# Setup sample repo
mkdir source-repo && cd source-repo
git init
git branch -m trunk
for i in {01..05}; do echo "start${i}" >> start; git add start; git commit -m "start${i}"; done
git branch old-checkpoint
for i in {01..10}; do echo "new${i}" >> new; git add new; git commit -m "new${i}"; done
git checkout -b feature HEAD~2
for i in {01..03}; do echo "feature${i}" >> feature; git add feature; git commit -m "feature${i}"; done
git checkout trunk
git merge --no-edit feature
cd ..
sleep 1


# Checkout shallow clone at feature branch - this works as desired
git clone --no-local source-repo --depth=1 --branch feature shallow-clone-feature
cd shallow-clone-feature
git remote set-branches --add origin '*'
git fetch origin --shallow-exclude=old-checkpoint feature
git log --oneline origin/feature | wc -l # 11, expected
git fetch --deepen=1 origin feature
git log --oneline origin/feature | wc -l # 12, also as expected
cd ..
sleep 1


# Checkout shallow clone at merge commit - this illustrates the bug
git clone --no-local source-repo --depth=1 --branch trunk shallow-clone-merge
cd shallow-clone-merge
git remote set-branches --add origin '*'
git fetch origin --shallow-exclude=old-checkpoint feature
git log --oneline origin/feature | wc -l # 11, expected
git fetch --deepen=1 origin feature
git log --oneline origin/feature | wc -l # 1, unexpected

# Wait, what? Let's try that again
git fetch origin --shallow-exclude=old-checkpoint feature
git log --oneline origin/feature | wc -l # 11, still as expected
git fetch --deepen=1 origin feature
git log --oneline origin/feature | wc -l # 13, different this time. Unexpected.
cd ..
sleep 1


# What if we expand depth first?
git clone --no-local source-repo --depth=1 --branch trunk shallow-clone-with-depth
cd shallow-clone-with-depth
git remote set-branches --add origin '*'
git fetch --depth=2 origin feature
git fetch origin --shallow-exclude=old-checkpoint feature
git log --oneline origin/feature | wc -l # 11, expected
git fetch --deepen=1 origin feature
git log --oneline origin/feature | wc -l # 1, still unexpected
cd ..
sleep 1


# It turns out the depth query sometimes works if I also manually include HEAD, but still strangely.
# If I use depth=2, it's fine, but if I keep depth=3 it's not.
git clone --no-local source-repo --depth=1 --branch trunk shallow-clone-with-depth
cd shallow-clone-with-depth
git remote set-branches --add origin '*'
git fetch --depth=3 origin HEAD feature # it works if I use or include HEAD, but *not* if I include old-checkpoint
git fetch origin --shallow-exclude=old-checkpoint feature
git log --oneline origin/feature | wc -l # 11, expected
git fetch --deepen=1 origin feature
git log --oneline origin/feature | wc -l # 4, different from before, still wrong
cd ..
sleep 1

# If we start with deepen, instead
git clone --no-local ./source-repo --depth=1 --branch trunk shallow-clone-deepen
cd shallow-clone-deepen
git remote set-branches --add origin '*'
git fetch --deepen=1 origin HEAD feature # this also works if we use feature
git fetch origin --shallow-exclude=old-checkpoint feature
git log --oneline origin/feature | wc -l # 11, expected
git fetch --deepen=1 origin feature
git log --oneline origin/feature | wc -l # 12, that's finally correct
cd ..
sleep 1

# manually clear everything out by running
# cd .. && rm -rf git-test


* Re: Bug: fetch with deepen shortens history
From: Benjamin Stein @ 2023-07-25  0:03 UTC
  To: git

> Hello git gurus,

> Here's an atypical bug report for you. I'm sorry for not starting with the template, but the context/setup are longer 
> than felt useful in that format.
> 
> I have what I believe to be a (relatively) simple, reproducible test case (repo setup/steps below) around shallow 
> checkouts at merge commits and deepening where the behavior is quite surprising - I end up with a smaller history 
> after a fetch operation than when I started!
>
>  [...snip...]

Hello again. It's been a month, and I didn't even get a "yes, we tested this and confirm the problem", so I thought I'd check in on this.

I also found a commit setup where even my "working" steps (using only --deepen) still end up with the unexpected behavior, so I'm adding it here as a simpler scenario to experiment with. It happens when both sides of the merge are the same number of commits away from the merge base.
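
For anyone who wants to measure the two sides of a merge in the full source repository, a minimal sketch follows. The function name `merge_side_lengths` and its output labels are assumptions for illustration; it uses only `git rev-parse`, `git merge-base` and `git rev-list --count`, and it must be run where the complete history is available (not in a shallow clone).

```shell
# Hypothetical helper (name and output labels are assumptions).
# For a two-parent merge commit, print how many commits each parent
# has beyond the merge base of the two parents.
# Usage: merge_side_lengths <merge-commit>
merge_side_lengths() {
    local merge="$1" p1 p2 base
    p1=$(git rev-parse --verify --quiet "$merge^1") || return 1
    # Fails here if the commit has no second parent (not a merge).
    p2=$(git rev-parse --verify --quiet "$merge^2") || return 1
    base=$(git merge-base "$p1" "$p2") || return 1
    echo "first-parent side:  $(git rev-list --count "$base..$p1")"
    echo "second-parent side: $(git rev-list --count "$base..$p2")"
}
```

In the setup script this would be run inside source-repo as `merge_side_lengths trunk`.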

Let me know if there's any additional information I can provide or something I can do to help resolve.

Thanks,
Benji

---------simple-bug-setup.sh---------
set -x
# Setup working folder for easy cleanup
mkdir git-test && cd git-test

# Setup repo
mkdir source-repo &&  cd source-repo
git init
git branch -m trunk
for i in {01..05}; do echo "start${i}" >> start; git add start; git commit -m "start${i}"; done
git branch old-checkpoint
for i in {01..10}; do echo "new${i}" >> new; git add new; git commit -m "new${i}"; done
git checkout -b feature HEAD~4
for i in {01..03}; do echo "feature${i}" >> feature; git add feature; git commit -m "feature${i}"; done
git checkout trunk
git merge --no-edit feature
cd ..
sleep 1

# simple checkout
git clone --no-local source-repo --depth=1 --branch trunk shallow-clone-only-deepen
cd shallow-clone-only-deepen
git remote set-branches --add origin '*'
git fetch --deepen=4 origin HEAD feature # this also works if we use feature
git fetch origin --shallow-exclude=old-checkpoint feature
git log --oneline origin/feature | wc -l # 9, expected
git fetch --deepen=1 origin feature
git log --oneline origin/feature | wc -l # 4, unexpected
cd ..
sleep 1



* Re: Bug: fetch with deepen shortens history
From: Bagas Sanjaya @ 2023-07-31 10:56 UTC
  To: Benjamin Stein; +Cc: Git Mailing List, Junio C Hamano, Taylor Blau


On Tue, Jul 25, 2023 at 12:03:31AM +0000, Benjamin Stein wrote:
> > Hello git gurus,
> 
> > Here's an atypical bug report for you. I'm sorry for not starting with the template, but the context/setup are longer 
> > than felt useful in that format.
> > 
> > I have what I believe to be a (relatively) simple, reproducible test case (repo setup/steps below) around shallow 
> > checkouts at merge commits and deepening where the behavior is quite surprising - I end up with a smaller history 
> > after a fetch operation than when I started!
> >
> >  [...snip...]
> 
> Hello again. It's been a month, and I didn't even get a "yes, we tested this and confirm the problem", so I thought I'd check in on this.
> 
> I also found a commit setup where even my "working" steps (using only --deepen) still end up with the unexpected behavior, so I'm adding it here as a simpler scenario to experiment with. It happens when both sides of the merge are the same number of commits away from the merge base.
> 
> Let me know if there's any additional information I can provide or something I can do to help resolve.

(also Cc: Junio and Taylor.)

I think in most cases, people make shallow clones to test the mainline
branch of whatever project they're hacking on. If at some later date they
wish to deepen their clones, they can do that back to a specified date:

```
$ git clone --depth=1 <clone url> project && cd project
(hack, hack, hack)
$ git checkout main
$ git fetch --shallow-since=2023-01-01 && git repack -A -d
```

I often use this method when cloning quite large repos when I'm on mobile
data (via WiFi tethering) due to bandwidth limitation.

> 
> Thanks,
> Benji
> 
> ---------simple-bug-setup.sh---------
> set -x
> # Setup working folder for easy cleanup
> mkdir git-test && cd git-test
> 
> # Setup repo
> mkdir source-repo &&  cd source-repo
> git init
> git branch -m trunk
> for i in {01..05}; do echo "start${i}" >> start; git add start; git commit -m "start${i}"; done
> git branch old-checkpoint
> for i in {01..10}; do echo "new${i}" >> new; git add new; git commit -m "new${i}"; done
> git checkout -b feature HEAD~4
> for i in {01..03}; do echo "feature${i}" >> feature; git add feature; git commit -m "feature${i}"; done
> git checkout trunk
> git merge --no-edit feature
> cd ..
> sleep 1
> 
> # simple checkout
> git clone --no-local source-repo --depth=1 --branch trunk shallow-clone-only-deepen
> cd shallow-clone-only-deepen
> git remote set-branches --add origin '*'
> git fetch --deepen=4 origin HEAD feature # this also works if we use feature
> git fetch origin --shallow-exclude=old-checkpoint feature
> git log --oneline origin/feature | wc -l # 9, expected
> git fetch --deepen=1 origin feature
> git log --oneline origin/feature | wc -l # 4, unexpected
> cd ..
> sleep 1
> 

Thanks for the script that you provided.

-- 
An old man doll... just what I always wanted! - Clara


* Re: Bug: fetch with deepen shortens history
From: Benjamin Stein @ 2023-08-04 18:22 UTC
  To: Bagas Sanjaya; +Cc: Git Mailing List, Junio C Hamano, Taylor Blau

> I think in most cases, people make shallow clones to test the mainline
> branch of whatever project they're hacking on. If at some later date they
> wish to deepen their clones, they can do that back to a specified date:

Yup, that makes sense to me, and it would work for me if I wanted history back to a specified date. But I specifically want to reference a branch (which could move forward) so we can list the commits made since that branch.
You could imagine it being useful for "docs since last released version", where you don't need a tag; you can reference the release branch instead.

It's nice to be able to rely on the fact that git refs can change, and so my script can continue to work without needing an update every time.
I guess I could pull the ref and then use the date, but that (a) relies on nothing else happening around then, and (b) seems to be exactly what --shallow-exclude wants to do.
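
For reference, that date-based fallback could look roughly like the sketch below. Both function names (`ref_tip_date`, `fetch_since_ref`) are assumptions for illustration, and the approach carries the caveats already mentioned: it selects by commit date, not by ancestry, so it may include or miss commits made around the same time as the referenced tip.

```shell
# Hypothetical helper (name is an assumption for illustration).
# Print the committer date of <ref>'s tip on <remote> in git's raw
# "<epoch> <tz>" form, fetching only that single commit.
ref_tip_date() {
    local remote="$1" ref="$2"
    git fetch --quiet --depth=1 "$remote" "$ref" || return 1
    git log -1 --format=%cd --date=raw FETCH_HEAD
}

# Hypothetical wrapper: deepen <target> back to the date of <ref>'s tip.
# Usage: fetch_since_ref <remote> <ref> <target>
fetch_since_ref() {
    local remote="$1" ref="$2" target="$3" since
    since=$(ref_tip_date "$remote" "$ref") || return 1
    git fetch --quiet --shallow-since="$since" "$remote" "$target"
}
```

For example, `fetch_since_ref origin old-checkpoint feature` would approximate the `--shallow-exclude=old-checkpoint` fetch by date alone.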

> Thanks for the script that you provided.

You bet. As a fellow engineer, I know how helpful perfectly reproducible steps can be. :)

-Benji


