From: Martin Scherer <m.scherer@fu-berlin.de>
To: git@vger.kernel.org
Subject: Blobs not referenced by file (anymore) are not removed by GC
Date: Mon, 08 Dec 2014 17:22:23 +0100
Message-ID: <5485D03F.3060008@fu-berlin.de>

Hi,

After running BFG on a repo with certain directory globs, all of those
file names are gone from history, but garbage collection does not
remove the underlying blobs: the blobs are not deleted, only the file
names are no longer associated with them. I wonder whether I have
discovered a bug (at least in BFG). I would expect git to notice that
these blobs are not used in any way (so they must still be associated
with something, right?).

# invoke "bfg --delete-folders something" multiple times with different patterns

# try to clean up

git gc --aggressive --prune=now # big blobs still in history
git fsck # no results
git fsck --full --unreachable --dangling # no results
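If the blobs survive "git gc --prune=now", they are presumably still reachable from something. One way to check (a sketch, not part of the original report) is to list every object reachable from all refs, and optionally from reflog entries, with "git rev-list --objects --all --reflog", then look for the blob's SHA-1. The helper names below are hypothetical; the parsing is split out so it can be tried without a repository.

```python
import subprocess
from typing import Dict, Optional


def parse_rev_list_objects(output: str) -> Dict[str, Optional[str]]:
    """Map object SHA-1 -> path (or None) from `git rev-list --objects` output.

    Commit lines carry only "<sha>"; tree and blob lines carry "<sha> <path>".
    Only the first space is split on, so paths containing spaces survive.
    """
    objects: Dict[str, Optional[str]] = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        sha, _, path = line.partition(" ")
        objects[sha] = path or None  # empty path (e.g. root tree) -> None
    return objects


def reachable_objects(repo: str = ".", include_reflog: bool = True) -> Dict[str, Optional[str]]:
    """Run `git rev-list --objects --all` (plus --reflog if requested) in repo."""
    cmd = ["git", "-C", repo, "rev-list", "--objects", "--all"]
    if include_reflog:
        cmd.append("--reflog")  # also treat objects mentioned by reflogs as roots
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_rev_list_objects(result.stdout)
```

Usage: "9451427d7335395779b91864418630d2f0af780a" in reachable_objects(".") tells whether that blob is still reachable; comparing the result with include_reflog=False and include_reflog=True would hint whether only reflog entries keep it alive (an assumption worth ruling out, not a diagnosis).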

To verify whether the blobs are still there, check the output of:

git gc && git verify-pack -v .git/objects/pack/pack-*.idx | egrep "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | sort -k 3 -n -r > bigobjects.txt

head bigobjects.txt # outputs:
9451427d7335395779b91864418630d2f0af780a blob   7895212 1869047 7657491
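For readers who prefer a script over the egrep/sort pipeline, here is a rough Python equivalent (a sketch; the function names are hypothetical). It keeps only non-deltified blob lines of "git verify-pack -v" output, whose columns are SHA-1, type, size, size-in-packfile and offset-in-packfile, and sorts them by size, largest first, as "sort -k 3 -n -r" does.

```python
import re
import subprocess
from typing import List, NamedTuple


class PackEntry(NamedTuple):
    sha: str
    size: int          # uncompressed object size in bytes
    packed_size: int   # size of the entry inside the packfile
    offset: int        # offset of the entry inside the packfile


# Matches non-delta blob lines such as
# "9451427d... blob   7895212 1869047 7657491".
# Deltified entries carry two extra columns (depth, base SHA-1) and are
# skipped, like the "$" anchor in the egrep pattern.
BLOB_LINE = re.compile(r"^(\w+) blob\s+(\d+) (\d+) (\d+)$")


def biggest_blobs(verify_pack_output: str) -> List[PackEntry]:
    """Return blob entries sorted by uncompressed size, largest first."""
    entries = []
    for line in verify_pack_output.splitlines():
        m = BLOB_LINE.match(line)
        if m:
            sha, size, packed, offset = m.groups()
            entries.append(PackEntry(sha, int(size), int(packed), int(offset)))
    return sorted(entries, key=lambda e: e.size, reverse=True)


def verify_pack(idx_path: str) -> str:
    """Run `git verify-pack -v` on one pack index and return its stdout."""
    result = subprocess.run(["git", "verify-pack", "-v", idx_path],
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Usage: for each path from glob.glob(".git/objects/pack/pack-*.idx"), biggest_blobs(verify_pack(path))[:10] gives roughly what "head bigobjects.txt" shows for that pack.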


Also, if BFG is told to remove the biggest blob (bfg -B 1) with
--no-blob-protection, it does not succeed in removing it.

--- output of bfg -B 1

Found 1 blob ids for large blobs - biggest=7895212 smallest=7895212
....

BFG aborting: No refs to update - no dirty commits found??
---
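BFG's message suggests it found no commit whose tree contains that blob. A rough way to cross-check this by hand (a hypothetical helper, not BFG's or git's own logic) is to walk every commit reachable from any ref with "git rev-list --all" and list its tree with "git ls-tree -r". An empty result while the blob is still in the pack would suggest something other than a ref-reachable commit tree is keeping it; that is a guess, not a diagnosis.

```python
import subprocess
from typing import Dict, List


def paths_for_blob(ls_tree_output: str, blob_sha: str) -> List[str]:
    """Return paths whose `git ls-tree -r` entry points at blob_sha.

    Each line has the form "<mode> <type> <sha>\t<path>".
    """
    paths = []
    for line in ls_tree_output.splitlines():
        meta, sep, path = line.partition("\t")
        if not sep:
            continue  # not an ls-tree entry line
        fields = meta.split()
        if len(fields) == 3 and fields[1] == "blob" and fields[2] == blob_sha:
            paths.append(path)
    return paths


def commits_containing_blob(blob_sha: str, repo: str = ".") -> Dict[str, List[str]]:
    """Map each commit reachable from any ref to the paths holding blob_sha.

    Runs one ls-tree per commit, so it is slow on long histories.
    """
    def git(*args: str) -> str:
        return subprocess.run(["git", "-C", repo, *args],
                              check=True, capture_output=True, text=True).stdout

    hits: Dict[str, List[str]] = {}
    for commit in git("rev-list", "--all").split():
        paths = paths_for_blob(git("ls-tree", "-r", commit), blob_sha)
        if paths:
            hits[commit] = paths
    return hits
```

Note that "git ls-tree" may quote unusual path names (see core.quotePath), so matched paths can appear in quoted form.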

The repo can be found here:

https://github.com/marscher/stallone_stale_objects

I will start over to clean up the history, but I guess this might
be interesting for git developers.


Best,
Martin

Thread overview: 9+ messages
2014-12-08 16:22 Martin Scherer [this message]
     [not found] ` <CAFY1edaEq1zYV0vgSfiPAXU6bqVBzaA-apVnSn8DBMbzcAa2tQ@mail.gmail.com>
2014-12-08 16:47   ` Blobs not referenced by file (anymore) are not removed by GC Roberto Tyley
2014-12-09 14:14 ` Jeff King
2014-12-09 16:01   ` Roberto Tyley
2014-12-09 16:11     ` Jeff King
2014-12-09 22:15       ` Roberto Tyley
2014-12-10  7:11         ` Jeff King
2014-12-10 16:07           ` Junio C Hamano
2014-12-10 23:41             ` Roberto Tyley
