* Git gc removes all packs
@ 2015-02-05 15:13 Dmitry Neverov
  2015-02-05 20:03 ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Neverov @ 2015-02-05 15:13 UTC
  To: git

Hi,

I'm experiencing strange behavior from automatic git gc, which corrupts a
local repository. Git version 2.2.2 on Mac OS X 10.10.1.

I'm using git p4 for synchronization with Perforce. Sometimes after 'git
p4 rebase' git starts a garbage collection. When gc finishes, the local
repository contains no pack files, only loose objects, so I have to
re-import the repository from Perforce. It also doesn't contain the
temporary pack git gc was creating.

Command line history looks like this:

> git p4 rebase
Performing incremental import into refs/remotes/p4/master git branch
Depot paths: //XXX/YYY/
Import destination: refs/remotes/p4/master
Importing revision 352157 (100%)
Rebasing the current branch onto remotes/p4/master
First, rewinding head to replay your work on top of it...
Fast-forwarded master to remotes/p4/master.
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.

> ps aux | grep git
nd              14335  95.0  1.4  4643292 114788   ??  R     8:52PM
0:05.79 git pack-objects --keep-true-parents --honor-pack-keep
--non-empty --all --reflog --indexed-objects
--unpack-unreachable=2.weeks.ago --local --delta-base-offset
/path/to/repo/.git/objects/pack/.tmp-14333-pack
nd              14333   0.0  0.0  2452420    920   ??  S     8:52PM
0:00.00 git repack -d -l -A --unpack-unreachable=2.weeks.ago
nd              14331   0.0  0.0  2436036    744   ??  Ss    8:52PM
0:00.00 git gc --auto

After process 14331 terminates, all packs are gone.

One more thing about my setup: since git p4 promotes the use of a linear
history, I use a separate repository for another branch in Perforce. In
order to be able to cherry-pick between repositories, I added the other
repo's objects dir as an alternate and also added a ref which is a
symbolic link to a branch in the other repo (so I don't have to do any
fetches).

How do I troubleshoot the problem? Is there any way to enable some
kind of logging for automatic git gc? Can the use of alternates or
symbolic links in refs cause such behavior?

--
Dmitry

* Re: Git gc removes all packs
  2015-02-05 15:13 Git gc removes all packs Dmitry Neverov
@ 2015-02-05 20:03 ` Jeff King
  2015-02-17 16:39   ` Michael Haggerty
  2015-02-27 10:16   ` Dmitry Neverov
  0 siblings, 2 replies; 10+ messages in thread
From: Jeff King @ 2015-02-05 20:03 UTC
  To: Dmitry Neverov; +Cc: git

On Thu, Feb 05, 2015 at 04:13:03PM +0100, Dmitry Neverov wrote:

> I'm using git p4 for synchronization with perforce. Sometimes after 'git
> p4 rebase' git starts a garbage collection. When gc finishes a local
> repository contains no pack files only loose objects, so I have to
> re-import repository from perforce. It also doesn't contain a temporary
> pack git gc was creating.

It sounds like git didn't find any refs; it will pack only objects which
are reachable. Unreachable objects are either:

  1. Exploded into loose objects if the mtime on the pack that contains
     them is less than 2 weeks old (and will eventually expire when they
     become 2 weeks old).

  2. Dropped completely if older than 2 weeks.
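
To check whether git really finds no refs, one rough approach is to
count the refs it can see before letting gc run. This is only a sketch:
the helper name count_visible_refs is made up for illustration, and a
count of 0 merely suggests that every object would be treated as
unreachable.

```shell
# Hypothetical helper: print how many refs git can enumerate in a repo.
# Usage: count_visible_refs /path/to/repo   (defaults to the current dir)
count_visible_refs() {
    repo=${1:-.}
    # for-each-ref lists one ref per line; tr strips the padding that
    # some wc implementations (e.g. on Mac OS X) put before the number.
    git -C "$repo" for-each-ref --format='%(refname)' | wc -l | tr -d ' '
}
```

Running it right before and right after a 'git p4 rebase' would show
whether the refs disappear at that point.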

> One more thing about my setup: since git p4 promotes a use of a linear
> history I use a separate repository for another branch in perforce. In
> order to be able to cherry-pick between repositories I added this
> another repo objects dir as an alternate and also added a ref which is a
> symbolic link to a branch in another repo (so I don't have to do any
> fetches).

You can't symlink refs like this. The loose refs in the filesystem may
be migrated into the "packed-refs" file, at which point your symlink
will be broken. That is a likely reason why git would not find any refs.

So your setup will not ever work reliably.  But IMHO, it is a bug that
git does not notice the broken symlink and abort an operation which is
computing reachability in order to drop objects. As you noticed, it
means a misconfiguration or filesystem error results in data loss.
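
Until git complains about this itself, a repository can be checked by
hand for symlinks under refs. The following is a minimal sketch, not
anything git ships: the function name report_symlink_refs is invented
here, and it only looks at loose refs in the given git directory.

```shell
# Hypothetical helper: list symlinks under <git-dir>/refs and flag the
# ones whose target no longer exists.
# Usage: report_symlink_refs /path/to/repo/.git   (defaults to .git)
report_symlink_refs() {
    git_dir=${1:-.git}
    find "$git_dir/refs" -type l 2>/dev/null | while IFS= read -r link; do
        # [ -e ] follows the link, so it fails for a dangling symlink.
        if [ -e "$link" ]; then
            printf 'symlink: %s\n' "$link"
        else
            printf 'BROKEN symlink: %s\n' "$link"
        fi
    done
}
```

Any line it prints points at a ref that should probably be replaced by
a regular fetch.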

-Peff

* Re: Git gc removes all packs
  2015-02-05 20:03 ` Jeff King
@ 2015-02-17 16:39   ` Michael Haggerty
  2015-02-17 16:55     ` Jeff King
  2015-02-27 10:16   ` Dmitry Neverov
  1 sibling, 1 reply; 10+ messages in thread
From: Michael Haggerty @ 2015-02-17 16:39 UTC
  To: Jeff King, Dmitry Neverov; +Cc: git

On 02/05/2015 09:03 PM, Jeff King wrote:
> On Thu, Feb 05, 2015 at 04:13:03PM +0100, Dmitry Neverov wrote:
>> [...]
>> One more thing about my setup: since git p4 promotes a use of a linear
>> history I use a separate repository for another branch in perforce. In
>> order to be able to cherry-pick between repositories I added this
>> another repo objects dir as an alternate and also added a ref which is a
>> symbolic link to a branch in another repo (so I don't have to do any
>> fetches).
> 
> You can't symlink refs like this. The loose refs in the filesystem may
> be migrated into the "packed-refs" file, at which point your symlink
> will be broken. That is a likely reason why git would not find any refs.
> 
> So your setup will not ever work reliably.  But IMHO, it is a bug that
> git does not notice the broken symlink and abort an operation which is
> computing reachability in order to drop objects. As you noticed, it
> means a misconfiguration or filesystem error results in data loss.

There's a bunch of code in refs.c that is there explicitly for reading
loose references that are symlinks. If the link contents literally start
with "refs/", then they are read and treated as a symbolic ref.
Otherwise, the symlink is just followed.

It is still possible to write symbolic refs that are represented as
symlinks (see core.preferSymlinkRefs), but that backwards-compatibility
code was added in 2006(!) Maybe it's time to deprecate it. And maybe we
should start working towards a future where any symlinks under "refs"
cause git to complain.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

* Re: Git gc removes all packs
  2015-02-17 16:39   ` Michael Haggerty
@ 2015-02-17 16:55     ` Jeff King
  2015-02-17 20:37       ` Michael Haggerty
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2015-02-17 16:55 UTC
  To: Michael Haggerty; +Cc: Dmitry Neverov, git

On Tue, Feb 17, 2015 at 05:39:27PM +0100, Michael Haggerty wrote:

> > You can't symlink refs like this. The loose refs in the filesystem may
> > be migrated into the "packed-refs" file, at which point your symlink
> > will be broken. That is a likely reason why git would not find any refs.
> > 
> > So your setup will not ever work reliably.  But IMHO, it is a bug that
> > git does not notice the broken symlink and abort an operation which is
> > computing reachability in order to drop objects. As you noticed, it
> > means a misconfiguration or filesystem error results in data loss.
> 
> There's a bunch of code in refs.c that is there explicitly for reading
> loose references that are symlinks. If the link contents literally start
> with "refs/", then they are read and treated as a symbolic ref.
> Otherwise, the symlink is just followed.

Right, but we should be able to notice that:

  1. We found a symlink.

  2. We couldn't read its ref value (because it's a broken link).

I think we _do_ notice that at the lowest level, and set REF_ISBROKEN.
But the problem is that the reachability code in prune and in
pack-objects (triggered by "repack -ad") uses for_each_ref, and not
for_each_rawref. So they ignore "broken" refs rather than complaining,
even though failing to read a ref may mean we could drop objects which
were only mentioned by that ref.

> It is still possible to write symbolic refs that are represented as
> symlinks (see core.preferSymlinkRefs), but that backwards-compatibility
> code was added in 2006(!) Maybe it's time to deprecate it. And maybe we
> should start working towards a future where any symlinks under "refs"
> cause git to complain.

I wouldn't mind seeing all of the symlink code go away, but I think it
is orthogonal to the problem I mentioned.

-Peff

* Re: Git gc removes all packs
  2015-02-17 16:55     ` Jeff King
@ 2015-02-17 20:37       ` Michael Haggerty
  2015-02-17 21:57         ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Haggerty @ 2015-02-17 20:37 UTC
  To: Jeff King; +Cc: Dmitry Neverov, git

On 02/17/2015 05:55 PM, Jeff King wrote:
> On Tue, Feb 17, 2015 at 05:39:27PM +0100, Michael Haggerty wrote:
> 
>>> You can't symlink refs like this. The loose refs in the filesystem may
>>> be migrated into the "packed-refs" file, at which point your symlink
>>> will be broken. That is a likely reason why git would not find any refs.
>>>
>>> So your setup will not ever work reliably.  But IMHO, it is a bug that
>>> git does not notice the broken symlink and abort an operation which is
>>> computing reachability in order to drop objects. As you noticed, it
>>> means a misconfiguration or filesystem error results in data loss.
>>
>> There's a bunch of code in refs.c that is there explicitly for reading
>> loose references that are symlinks. If the link contents literally start
>> with "refs/", then they are read and treated as a symbolic ref.
>> Otherwise, the symlink is just followed.
> 
> Right, but we should be able to notice that:
> 
>   1. We found a symlink.
> 
>   2. We couldn't read its ref value (because it's a broken link).
> 
> I think we _do_ notice that at the lowest level, and set REF_ISBROKEN.
> But the problem is that the reachability code in prune and in
> pack-objects (triggered by "repack -ad") uses for_each_ref, and not
> for_each_rawref. So they ignore "broken" refs rather than complaining,
> even though failing to read a ref may mean we could drop objects which
> were only mentioned by that ref.

Yes, this makes sense too. But my point was that sticking symlinks to
random files in your refs hierarchy is pretty questionable even *before*
the symlink gets broken. If we would warn the user as soon as we saw
such a thing, then the user's problem would never have advanced as far
as it did. Do you think that emitting warnings on *intact* symlinks is
too draconian?

> [...]

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

* Re: Git gc removes all packs
  2015-02-17 20:37       ` Michael Haggerty
@ 2015-02-17 21:57         ` Junio C Hamano
  2015-02-17 22:19           ` Michael Haggerty
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2015-02-17 21:57 UTC
  To: Michael Haggerty; +Cc: Jeff King, Dmitry Neverov, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> On 02/17/2015 05:55 PM, Jeff King wrote:
>> On Tue, Feb 17, 2015 at 05:39:27PM +0100, Michael Haggerty wrote:
>> 
>>> There's a bunch of code in refs.c that is there explicitly for reading
>>> loose references that are symlinks. If the link contents literally start
>>> with "refs/", then they are read and treated as a symbolic ref.
>>> Otherwise, the symlink is just followed.
>> ...
> Yes, this makes sense too. But my point was that sticking symlinks to
> random files in your refs hierarchy is pretty questionable even *before*
> the symlink gets broken. If we would warn the user as soon as we saw
> such a thing, then the user's problem would never have advanced as far
> as it did. Do you think that emitting warnings on *intact* symlinks is
> too draconian?

Do you mean that we would end up reading refs/heads/hold if the user
did this:

    git rev-parse --verify HEAD -- >precious
    ln -s ../../../precious .git/refs/heads/hold

because that symbolic link does not begin with "refs/", and is an
accident waiting to happen so we should forbid it in the longer
term and warning when we see it would be the first step?

* Re: Git gc removes all packs
  2015-02-17 21:57         ` Junio C Hamano
@ 2015-02-17 22:19           ` Michael Haggerty
  2015-02-18  7:13             ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Haggerty @ 2015-02-17 22:19 UTC
  To: Junio C Hamano; +Cc: Jeff King, Dmitry Neverov, git

On 02/17/2015 10:57 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> On 02/17/2015 05:55 PM, Jeff King wrote:
>>> On Tue, Feb 17, 2015 at 05:39:27PM +0100, Michael Haggerty wrote:
>>>
>>>> There's a bunch of code in refs.c that is there explicitly for reading
>>>> loose references that are symlinks. If the link contents literally start
>>>> with "refs/", then they are read and treated as a symbolic ref.
>>>> Otherwise, the symlink is just followed.
>>> ...
>> Yes, this makes sense too. But my point was that sticking symlinks to
>> random files in your refs hierarchy is pretty questionable even *before*
>> the symlink gets broken. If we would warn the user as soon as we saw
>> such a thing, then the user's problem would never have advanced as far
>> as it did. Do you think that emitting warnings on *intact* symlinks is
>> too draconian?
> 
> Do you mean that we would end up reading refs/heads/hold if the user
> did this:
> 
>     git rev-parse --verify HEAD -- >precious
>     ln -s ../../../precious .git/refs/heads/hold
> 
> because that symbolic link does not begin with "refs/",

Correct, you can do exactly that. The "hold" reference is resolvable and
listable using "for-each-ref". But if I try to update it, the contents
of the "precious" file are overwritten. On the other hand, if I run
"pack-refs", then the current value of the "hold" reference is moved to
"packed-refs" and the symlink is removed. This behavior is not sane.

> and is an
> accident waiting to happen so we should forbid it in the longer
> term and warning when we see it would be the first step?

Yes, I am proposing that approach, though if somebody can suggest a use
case I'm willing to be convinced otherwise. The only thing I can imagine
symlinks being useful for might be to temporarily create a fake repo,
run one or two specific known-safe commands, then delete the repo again.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

* Re: Git gc removes all packs
  2015-02-17 22:19           ` Michael Haggerty
@ 2015-02-18  7:13             ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2015-02-18  7:13 UTC
  To: Michael Haggerty; +Cc: Jeff King, Dmitry Neverov, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> On 02/17/2015 10:57 PM, Junio C Hamano wrote:
> ...
>> Do you mean that we would end up reading refs/heads/hold if the user
>> did this:
>> 
>>     git rev-parse --verify HEAD -- >precious
>>     ln -s ../../../precious .git/refs/heads/hold
>> 
>> because that symbolic link does not begin with "refs/",
>
> Correct, you can do exactly that. The "hold" reference is resolvable and
> listable using "for-each-ref". But if I try to update it, the contents
> of the "precious" file are overwritten. On the other hand, if I run
> "pack-refs", then the current value of the "hold" reference is moved to
> "packed-refs" and the symlink is removed. This behavior is not sane.
>
>> and is an
>> accident waiting to happen so we should forbid it in the longer
>> term and warning when we see it would be the first step?
>
> Yes, I am proposing that approach, though if somebody can suggest a use
> case I'm willing to be convinced otherwise.

Thanks.  I agree the proposed tightening is probably harmless, but I
too would want to see if somebody comes up with a valid use case.  I
do not think of anything offhand.

* Re: Git gc removes all packs
  2015-02-05 20:03 ` Jeff King
  2015-02-17 16:39   ` Michael Haggerty
@ 2015-02-27 10:16   ` Dmitry Neverov
  2015-02-27 13:14     ` Jeff King
  1 sibling, 1 reply; 10+ messages in thread
From: Dmitry Neverov @ 2015-02-27 10:16 UTC
  To: Jeff King; +Cc: git

I followed your advice and removed the symlink ref from my repository,
but it didn't help: automatic GC has just removed all packs again. Can
alternates cause such behavior? Is there any way to make gc log
somewhere why it removes packs?

On Thu, Feb 5, 2015 at 9:03 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Feb 05, 2015 at 04:13:03PM +0100, Dmitry Neverov wrote:
>
>> I'm using git p4 for synchronization with perforce. Sometimes after 'git
>> p4 rebase' git starts a garbage collection. When gc finishes a local
>> repository contains no pack files only loose objects, so I have to
>> re-import repository from perforce. It also doesn't contain a temporary
>> pack git gc was creating.
>
> It sounds like git didn't find any refs; it will pack only objects which
> are reachable. Unreachable objects are either:
>
>   1. Exploded into loose objects if the mtime on the pack they contain
>      is less than 2 weeks old (and will eventually expire when they
>      become 2 weeks old).
>
>   2. Dropped completely if older than 2 weeks.
>
>> One more thing about my setup: since git p4 promotes a use of a linear
>> history I use a separate repository for another branch in perforce. In
>> order to be able to cherry-pick between repositories I added this
>> another repo objects dir as an alternate and also added a ref which is a
>> symbolic link to a branch in another repo (so I don't have to do any
>> fetches).
>
> You can't symlink refs like this. The loose refs in the filesystem may
> be migrated into the "packed-refs" file, at which point your symlink
> will be broken. That is a likely reason why git would not find any refs.
>
> So your setup will not ever work reliably.  But IMHO, it is a bug that
> git does not notice the broken symlink and abort an operation which is
> computing reachability in order to drop objects. As you noticed, it
> means a misconfiguration or filesystem error results in data loss.
>
> -Peff

* Re: Git gc removes all packs
  2015-02-27 10:16   ` Dmitry Neverov
@ 2015-02-27 13:14     ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2015-02-27 13:14 UTC
  To: Dmitry Neverov; +Cc: git

On Fri, Feb 27, 2015 at 11:16:09AM +0100, Dmitry Neverov wrote:

> I followed your advice and removed a symlink ref from my repository.
> But didn't help.. automatic GC has just removed all packs again. May
> alternates cause such a behavior? Are any ways to make gc log
> somewhere why it removes packs?

If you have two repositories, A and B, and A points to B via alternates,
then you cannot safely run "git gc" in B unless it knows about all of
the refs in A. As we discussed before, symlinking the refs is not
enough, because those symlinks get stale. But nor is removing the
symlinks and just not knowing about the refs. :)

The only safe thing to do is to fetch all of the refs from A into B just
before running the gc (and consequently, you probably want to disable
gc.auto in B).
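
As a rough sketch of that sequence (the helper name gc_shared_repo and
the refs/borrowers/a/ namespace are made up for illustration; any
namespace that does not clash with B's own refs should do):

```shell
# Hypothetical helper: gc repository B when repository A borrows
# objects from B through objects/info/alternates.
# Usage: gc_shared_repo /path/to/B /path/to/A
gc_shared_repo() {
    shared=$1      # B: the repository whose objects are borrowed
    borrower=$2    # A: the repository that lists B as an alternate

    # Keep "git gc --auto" from repacking B on its own.
    git -C "$shared" config gc.auto 0 || return 1

    # Copy every ref of A into a private namespace in B, so that the
    # objects A depends on are reachable from B's own refs.
    git -C "$shared" fetch --quiet "$borrower" \
        '+refs/*:refs/borrowers/a/*' || return 1

    # Only now repack and prune B.
    git -C "$shared" gc --quiet
}
```

The fetch has to be repeated before every gc, since A's refs keep
moving after each copy.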

-Peff
