All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yann E. MORIN <yann.morin.1998@free.fr>
To: buildroot@busybox.net
Subject: [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
Date: Sun, 15 Apr 2018 14:02:30 +0200	[thread overview]
Message-ID: <20180415120230.GA21958@scaer> (raw)
In-Reply-To: <5ad0f6ccb81f_19073fa45f41a968589e3@ultri4.mail>

Ricardo, All,

On 2018-04-13 15:28 -0300, Ricardo Martincoski spake thusly:
> On Thu, Apr 12, 2018 at 02:48 PM, Yann E. MORIN wrote:
> > On 2018-04-12 06:28 -0300, Ricardo Martincoski spake thusly:
> >> When performing the checkout, use -f to allow it to occur even if there
> >> are untracked files that would be overwritten or removed.
> >> 
> >> Use git clean to remove any untracked files before generating the
> >> tarball. Use the second -f to ensure the repo is pristine even if
> >> previous checked out ref contained a submodule that is not present in
> >> the ref just checked out.
> [snip]
> > 
> > I'm not very happy of those tricks... If the repos is broken, it may be
> > in a state that repairing it is not even possible at all, in case a
> > critical git data is missing/damaged.
> 
> I created the series thinking only on git repos that are sane but have a dirty
> worktree. I should have used "dirty worktree" in the commit log.

Right, these are indeed two different cases.

> Rethinking now, taking into account a broken repo, and also the fact that the
> default download directory is inside buildroot, this patch is a bit dangerous.
> 
> With a completely broken repo in the git cache, the commands called by the
> download script end up executed in any repo that eventually contains that file
> (parent \(of parent \)\+ directory), so:
> $ cd buildroot
> $ git init dl/package/git
> $ cd dl/package/git
> $ rm .git/HEAD # simulate a completely broken repo
> $ git <command>
> The git command would be executed in the buildroot tree.
> A fetch to the wrong repo is bad, but not that bad.
> A checkout would fail.
> But checkout -f and git clean would succeed, potentially making the user to lose
> changes.

Arg, indeed that is dangerous...

> I will change this patch to Changes Requested.
> 
> A way to avoid this, once the script entered the directory, is to force the
> git-dir:
> $ git --git-dir=.git <command>
> This command would fail for a too-much-broken repo.

We recently reverted use of -C becasue it was not supported by old git
versions. What is the oldest git version to support --git-dir? It seems
it was introduced in v.1.4.2, dated 2006-08-12. Is that old enough that
all the distros we care about have that version or later?

> This is important also if we test the repo with git fsck before ditch+restart,
> otherwise it would be possible to return OK based on the wrong repo.
> 
> > 
> > If the repo is broken, either:
> > 
> >   - we ditch the repository and restart frm scratch,
> > 
> >   - or we just bail out and tell the user what hapened, so they can take
> >     action (e.g. remove the broken repo manually).
> 
> When the repo is broken (detectable by git fsck) I agree this is the best
> approach.
> 
> But I think a sane git repo (detectable by git fsck) that is in a dirty state
> (in the sense of git clean) is a different case.
> It can happen for example if a previous checkout operation was abruptly
> interrupted during a git checkout. In this case it seems to me a bit extreme
> (but not incorrect of course) to remove the repo and restart from scratch.
> 
> For the case a file from the worktree is missing the current 'git checkout'
> succeeds and the problem shows up as a hash mismatch.
> Using 'git checkout -f' the repo is recovered.
> For the case a git object needed by the checkout is missing the current 'git
> checkout' succeeds and the problem shows up as a hash mismatch.
> Using 'git checkout -f' the checkout fails like this:
>  error: invalid object
> 
> The git clean makes sure, among other things, that the worktree is clean even in
> the case the previous checkout contained a submodule that is no longer present
> in the new checkout. Maybe there is another command that ensures this, but I
> don't know about yet.
> Maybe I am missing something, but as I see, for a package with submodule, if
> a submodule is removed from the upstream project, it would trigger a
> ditch+restart too. Again, I think this is a bit extreme (more than git clean
> -ffdx) but not wrong.
> 
> In order to detect broken repos (and trigger the ditch) we could start the
> script by running fsck. Right?

So, basically, we would do something along those lines:

    _git() {
        eval ${GIT} --git-dir=.git "${@}"
    }

    _git fsck || { rm -rf ${git_cache}; exec "${0}" "${@}"; exit 1; }
    _git fetch
    _git fetch -t
    _git checkout ${ref}
    _git clean -dX
    _git checkout -- .
    _git submodules update

Of course, that would require using appropriate options to fsck to bail
out.

But what to do if any if the following actions fails? Should we simply
exit, or should we clean up and clone again?

I can see shere that could go wrong: the ref does not exist, so the
first checkout fails, so we ditch the repository, clone again, and
checkout again fails...

My opinion, for what it's worth, is to clan only on the fsck. Any other
failure should be left to the user to handle. Maybe with just a little
message saying something like:

    If you are not sure how to solve this, remove ${cache_dir}.

Thoughts?

> Do you have a local patch for this ditch+restart? Or should I create one?

I am working on that...

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

  reply	other threads:[~2018-04-15 12:02 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-12  9:28 [Buildroot] [RFC PATCH 0/3] fix some corner cases for download/git v1 Ricardo Martincoski
2018-04-12  9:28 ` [Buildroot] [RFC PATCH 1/3] download/git: fix fetch all refs for old git Ricardo Martincoski
2018-04-12 17:42   ` Yann E. MORIN
2018-04-13 18:23     ` Ricardo Martincoski
2018-04-15 12:12       ` Yann E. MORIN
2018-04-16  2:17         ` Ricardo Martincoski
2018-04-12  9:28 ` [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache Ricardo Martincoski
2018-04-12 17:48   ` Yann E. MORIN
2018-04-13 18:28     ` Ricardo Martincoski
2018-04-15 12:02       ` Yann E. MORIN [this message]
2018-04-16  2:54         ` Ricardo Martincoski
2018-04-16 16:01           ` Yann E. MORIN
2018-04-16 20:56             ` Yann E. MORIN
     [not found]               ` <1ffe206e-4b1e-c233-a511-ba4c3a8cb5f0@armadeus.com>
2018-04-17 10:42                 ` Yann E. MORIN
2018-04-17 11:30                   ` Thomas Petazzoni
2018-04-17  4:45             ` Ricardo Martincoski
2018-04-17  7:04               ` Ricardo Martincoski
2018-04-17  8:10                 ` Arnout Vandecappelle
2018-04-17  8:56                   ` Ricardo Martincoski
2018-04-17 10:36                     ` Yann E. MORIN
2018-04-17 11:58                       ` Arnout Vandecappelle
2018-04-12  9:28 ` [Buildroot] [RFC PATCH 3/3] download/git: unshallow when fetching all refs Ricardo Martincoski
2018-04-12 11:28   ` Thomas Petazzoni
2018-04-12 17:33     ` Yann E. MORIN
2018-04-13 18:32       ` Ricardo Martincoski
2018-04-15 12:08         ` Yann E. MORIN
2018-04-16  3:04           ` Ricardo Martincoski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180415120230.GA21958@scaer \
    --to=yann.morin.1998@free.fr \
    --cc=buildroot@busybox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.