All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/16] make prune mtime-checking more careful
@ 2014-10-03 20:20 Jeff King
  2014-10-03 20:21 ` [PATCH 01/16] foreach_alt_odb: propagate return value from callback Jeff King
                   ` (17 more replies)
  0 siblings, 18 replies; 58+ messages in thread
From: Jeff King @ 2014-10-03 20:20 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty

At GitHub we've occasionally run across repos getting corrupted by trees
and blobs near the tip going missing. We do a lot of "test merges"
between branches and HEAD (this is what feeds the "OK to merge" button
on the web interface), and the objects are almost always related to
these merges. The objects are removed by prune, which doesn't realize
that they are part of an ongoing operation. Prune uses the filesystem
mtime to determine this, but we are not very thorough in making sure
that is kept up to date.

This series tries to fix that with two techniques:

  1. When we try to write an object to disk, we optimize out the write
     if we already have the object. Instead, we should still update the
     mtime of the object in this case.

  2. Treat objects reachable from "recent" objects as recent themselves.

     When we check that we have an object, we do not check whether we
     have all of the objects it can reach. If you have some new objects
     that refer to some old objects (e.g., you create and delete a tree
     on day 1, and then create a new tree referring to the blob on day
     2), then prune may delete the old object but not the new (in this
     case, we delete the blob but not the tree).

     Any subsequent use of the new object will check that we have it
     (e.g., commit-tree makes sure we have the tree we feed it), but not
     other objects it can reach. This can lead to referencing a
     half-formed part of the graph.

Note that this does not make prune race-free. For example, you could
check for and update the mtime of an object just as prune is deleting
it, and think that it is written when it is not. Fixing that would
require some atomic mechanism for prune to check the mtime and delete.
But I do think this series cuts us down to "real" race conditions, with
millisecond-ish timing. The problems we're fixing here are much worse
than that. The distance between an object being written and being
referred may operate on human timescales (e.g., writing a commit
message). Or the time distance between two objects that refer to each
other may be days or weeks; a prune where one falls in the "recent"
boundary and another does not can be disastrous.

There's quite a lot of patches here, but most of them are preparatory
cleanups. The meat is in patches 13, 15, and 16.

  [01/16]: foreach_alt_odb: propagate return value from callback
  [02/16]: isxdigit: cast input to unsigned char
  [03/16]: object_array: factor out slopbuf-freeing logic
  [04/16]: object_array: add a "clear" function
  [05/16]: clean up name allocation in prepare_revision_walk
  [06/16]: reachable: clear pending array after walking it
  [07/16]: t5304: use test_path_is_* instead of "test -f"
  [08/16]: t5304: use helper to report failure of "test foo = bar"
  [09/16]: prune: factor out loose-object directory traversal
  [10/16]: count-objects: do not use xsize_t when counting object size
  [11/16]: count-objects: use for_each_loose_file_in_objdir
  [12/16]: sha1_file: add for_each iterators for loose and packed objects
  [13/16]: prune: keep objects reachable from recent objects
  [14/16]: pack-objects: refactor unpack-unreachable expiration check
  [15/16]: pack-objects: match prune logic for discarding objects
  [16/16]: write_sha1_file: freshen existing objects

Note that these aren't yet running on GitHub servers. I know that they
fix real potential problems (see the new t6501 for examples), but I
don't know for sure if they will catch the problems we have seen. The
frequency of these issues is relatively rare, so even once deployed, we
won't know for sure until a few weeks or months have passed.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2014-10-08 17:03 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-03 20:20 [PATCH 0/16] make prune mtime-checking more careful Jeff King
2014-10-03 20:21 ` [PATCH 01/16] foreach_alt_odb: propagate return value from callback Jeff King
2014-10-03 22:55   ` René Scharfe
2014-10-04  0:31     ` Jeff King
2014-10-03 20:22 ` [PATCH 02/16] isxdigit: cast input to unsigned char Jeff King
2014-10-03 20:22 ` [PATCH 03/16] object_array: factor out slopbuf-freeing logic Jeff King
2014-10-07 11:25   ` Michael Haggerty
2014-10-08  7:36     ` Jeff King
2014-10-08  8:40       ` Michael Haggerty
2014-10-08  8:55         ` Jeff King
2014-10-03 20:22 ` [PATCH 04/16] object_array: add a "clear" function Jeff King
2014-10-03 20:23 ` [PATCH 05/16] clean up name allocation in prepare_revision_walk Jeff King
2014-10-03 20:24 ` [PATCH 06/16] reachable: clear pending array after walking it Jeff King
2014-10-03 20:25 ` [PATCH 07/16] t5304: use test_path_is_* instead of "test -f" Jeff King
2014-10-03 22:12   ` Junio C Hamano
2014-10-03 20:27 ` [PATCH 08/16] t5304: use helper to report failure of "test foo = bar" Jeff King
2014-10-03 22:17   ` Junio C Hamano
2014-10-04  0:13     ` Jeff King
2014-10-07 13:21   ` Michael Haggerty
2014-10-07 17:29     ` Junio C Hamano
2014-10-07 20:18       ` Jeff King
2014-10-07 20:35         ` Junio C Hamano
2014-10-07 21:29           ` Jeff King
2014-10-07 21:53             ` Junio C Hamano
2014-10-07 22:17               ` Michael Haggerty
2014-10-08  1:13                 ` Jeff King
2014-10-08 16:58                   ` Junio C Hamano
2014-10-07 21:16         ` Junio C Hamano
2014-10-03 20:29 ` [PATCH 09/16] prune: factor out loose-object directory traversal Jeff King
2014-10-03 22:19   ` Junio C Hamano
2014-10-04  0:24     ` Jeff King
2014-10-07 14:07   ` Michael Haggerty
2014-10-08  7:33     ` Jeff King
2014-10-03 20:31 ` [PATCH 10/16] count-objects: do not use xsize_t when counting object size Jeff King
2014-10-03 20:31 ` [PATCH 11/16] count-objects: use for_each_loose_file_in_objdir Jeff King
2014-10-03 20:32 ` [PATCH 12/16] sha1_file: add for_each iterators for loose and packed objects Jeff King
2014-10-05  8:15   ` René Scharfe
2014-10-05 10:47     ` Ramsay Jones
2014-10-03 20:39 ` [PATCH 13/16] prune: keep objects reachable from recent objects Jeff King
2014-10-03 21:47   ` Junio C Hamano
2014-10-04  0:09     ` Jeff King
2014-10-04  0:30     ` Jeff King
2014-10-04  3:04       ` Junio C Hamano
2014-10-07 16:29   ` Michael Haggerty
2014-10-08  7:19     ` Jeff King
2014-10-08 10:37       ` Michael Haggerty
2014-10-03 20:39 ` [PATCH 14/16] pack-objects: refactor unpack-unreachable expiration check Jeff King
2014-10-03 20:40 ` [PATCH 15/16] pack-objects: match prune logic for discarding objects Jeff King
2014-10-03 20:41 ` [PATCH 16/16] write_sha1_file: freshen existing objects Jeff King
2014-10-03 21:29   ` Junio C Hamano
2014-10-04  0:01     ` Jeff King
2014-10-05  9:12   ` René Scharfe
2014-10-03 22:20 ` [PATCH 0/16] make prune mtime-checking more careful Junio C Hamano
2014-10-04 22:22 ` Junio C Hamano
2014-10-05  9:19   ` René Scharfe
2014-10-06  1:42   ` Jeff King
2014-10-08  8:31     ` Jeff King
2014-10-08 17:03       ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.