From: Elijah Newren <newren@gmail.com>
To: "Priedhorsky, Reid" <reidpr@lanl.gov>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: bug? round-trip through fast-import/fast-export loses files
Date: Mon, 20 Mar 2023 18:57:21 -0700
Message-ID: <CABPp-BEG+vp-UcpVfcZecPBnfcuTjO6JYCo7wEU5ZrDUHBUd9g@mail.gmail.com>
In-Reply-To: <BBB169A5-0665-47C9-819B-6409A22AB699@lanl.gov>

Hi,

On Mon, Mar 20, 2023 at 11:23 AM Priedhorsky, Reid <reidpr@lanl.gov> wrote:
>
>   Hello,
>
>   I believe I’ve found a bug in Git. It seems that (1) round-tripping through
>   fast-export/fast-import a repository (2) that contains a commit that changes
>   a file to a directory (3) deletes the contents of that directory from the
>   repository.
>
> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
>
> What did you do before the bug happened? (Steps to reproduce your issue)
>
>   Run this shell script:
>
>   ~~~~
>   #!/bin/bash
>
>   set -ex
>
>   mkdir -p /tmp/weirdal
>   cd /tmp/weirdal
>   git --version
>
>   # init repo
>   rm -Rf wd
>   mkdir wd
>   cd wd
>   git init -b main
>
>   # first commit - foo is a file
>   touch foo
>   git add -A
>   git commit -m 'file'
>
>   # second commit - foo is a directory
>   rm foo
>   mkdir foo
>   touch foo/bar
>   git add -A
>   git commit -m 'directory'
>
>   # the contents of foo are in the working dir and the repo
>   git status
>   ls -lR
>   git ls-tree --name-only -r HEAD
>
>   # import/export repository (add --full-tree to work around bug)
>   git fast-export --no-data -- --all > ../export
>   cat ../export
>   git fast-import --force --quiet < ../export
>
>   # bug: foo is still in the WD but not the repo; should still be both
>   git status
>   ls -lR
>   git ls-tree --name-only -r HEAD
>   #git fast-export --no-data -- --all | diff -u --text ../export - || true
>   ~~~~
>
> What did you expect to happen? (Expected behavior)
>
>   Repo should be unchanged, i.e.:
>
>   + git status
>   On branch main
>   nothing to commit, working tree clean
>
> What happened instead? (Actual behavior)
>
>   Git thinks foo/bar has been staged:
>
>   + git status
>   On branch main
>   Changes to be committed:
>     (use "git restore --staged <file>..." to unstage)
>           new file:   foo/bar
>
> What's different between what you expected and what actually happened?
>
>   File foo/bar is staged when it should be unchanged.
>
> Anything else you want to add:
>
>   This also happens in 2.38.1 built from source.
>
>   The bad behavior can be worked around with “--full-tree” on fast-export, but
>   the real repo where I want to do this is pretty large, so I’d prefer not to.
>
>   Note the “git fast-export” output:
>
>     commit refs/heads/main
>     mark :2
>     author Reid Priedhorsky <reidpr@lanl.gov> 1679330805 -0600
>     committer Reid Priedhorsky <reidpr@lanl.gov> 1679330805 -0600
>     data 10
>     directory
>     from :1
>     M 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 foo/bar
>     D foo
>
>   It looks to me like the “M ... foo/bar” is being processed before “D foo”
>   when it should happen in the opposite order.

Thanks for the well-written bug report, including not only a test case
but even the relevant bits of the fast-export output.  I thought I had
fixed D/F issues in fast-export & fast-import before, and indeed a
search turns up both of these commits:

253fb5f889 (fast-import: Improve robustness when D->F changes provided
in wrong order, 2010-07-09)
060df62422 (fast-export: Fix output order of D/F changes, 2010-07-09)

However, it looks like both of those only considered D->F (directory
becomes a file) changes, whereas you specifically have a case of F->D
(file becoming a directory).

Honestly, looking back at those two patches of mine, I think both were
rather suboptimal.  A better solution that would handle both F->D and
D->F would be having fast-export sort the diff_filepairs such that it
processes the deletes before the modifies.  Another improved solution
would be having fast-import sort the files given to it and handle
deletes first.  Either should fix this.
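
To illustrate the "deletes before modifies" ordering, here is a minimal
sketch in shell.  It is not the proposed patch: a real fix would live in
the C code of fast-export or fast-import.  The helper name deletes_first
is hypothetical, and the sketch assumes its input is only the file-change
lines of a single commit, limited to plain "M" and "D" commands with
unquoted paths (renames, copies, and the rest of the stream are ignored).

```shell
#!/bin/sh

# Hypothetical helper: read the file-change lines of ONE commit on stdin
# and print them with every "D <path>" line first, followed by all other
# lines.  Relative order within each group is preserved.
deletes_first() {
    awk '
        /^D / { print; next }
        { others[n++] = $0 }
        END { for (i = 0; i < n; i++) print others[i] }
    '
}

# The two file-change lines from the reported export, in their original order.
printf '%s\n' \
    'M 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 foo/bar' \
    'D foo' | deletes_first
# Prints "D foo" first, then the "M ... foo/bar" line.
```

With that ordering, the removal of the file foo would be applied before
foo/bar is added, so the new directory entry would not be deleted along
with the old file.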

Might be a good task for a new contributor.  Any takers?  (Tagging as
#leftoverbits.)

