git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/32] Split index mode for very large indexes
@ 2014-04-28 10:55 Nguyễn Thái Ngọc Duy
  2014-04-28 10:55 ` [PATCH 01/32] ewah: fix constness of ewah_read_mmap Nguyễn Thái Ngọc Duy
                   ` (34 more replies)
  0 siblings, 35 replies; 75+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2014-04-28 10:55 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

I hinted about it earlier [1]. It now passes the test suite and with a
design that I'm happy with (thanks to Junio for a suggestion about the
rename problem).

From the user point of view, this reduces the writable size of index
down to the number of updated files. For example my webkit index v4 is
14MB. With a fresh split, I only have to update an index of 200KB.
Every file I touch will add about 80 bytes to that. As long as I don't
touch every single tracked file in my worktree, I should not pay
penalty for writing 14MB index file on every operation.

The read penalty is not addressed here, so I still pay 14MB hashing
cost. But that's an easy problem. We could cache the validated index
in a daemon. Whenever git needs to load an index, it pokes the daemon.
The daemon verifies that the on-disk index still has the same
signature, then sends the in-mem index to git. When git updates the
index, it pokes the daemon again to update in-mem index. Next time git
reads the index, it does not have to pay I/O cost any more (actually
it does but the cost is hidden away when you do not have to read it
yet).

The forth patch is not really necessary. I started out with a
different approach that needed that abstraction. But I think it's
still a nice thing to keep. The real meat starts from 0017 to 0025. In
essence, the new index is more like a journal, where the real index is
put away unchanged.

Doing this in other implementations should be easy (at least the
reading part) and with small code change. The whole index format is
retained. All you need is to read a new extension that contains two
ewah-bitmaps and apply the changes to create the final index.

This is a preparation step for my untracked file cache. With writing
(and later on reading) index becoming cheap, I can start to put more
things in there.

[1] http://thread.gmane.org/gmane.comp.version-control.git/246471/focus=247031

Nguyễn Thái Ngọc Duy (32):
  ewah: fix constness of ewah_read_mmap
  ewah: delete unused ewah_read_mmap_native declaration
  sequencer: do not update/refresh index if the lock cannot be held
  read-cache: new API write_locked_index instead of write_index/write_cache
  read-cache: relocate and unexport commit_locked_index()
  read-cache: store in-memory flags in the first 12 bits of ce_flags
  read-cache: be strict about "changed" in remove_marked_cache_entries()
  read-cache: be specific what part of the index has changed
  update-index: be specific what part of the index has changed
  resolve-undo: be specific what part of the index has changed
  unpack-trees: be specific what part of the index has changed
  cache-tree: mark istate->cache_changed on cache tree invalidation
  cache-tree: mark istate->cache_changed on cache tree update
  cache-tree: mark istate->cache_changed on prime_cache_tree()
  entry.c: update cache_changed if refresh_cache is set in checkout_entry()
  read-cache: save index SHA-1 after reading
  read-cache: split-index mode
  read-cache: mark new entries for split index
  read-cache: save deleted entries in split index
  read-cache: mark updated entries for split index
  split-index: the writing part
  split-index: the reading part
  split-index: do not invalidate cache-tree at read time
  split-index: strip pathname of on-disk replaced entries
  update-index: new options to enable/disable split index mode
  update-index --split-index: do not split if $GIT_DIR is read only
  rev-parse: add --shared-index-path to get shared index path
  read-tree: force split-index mode off on --index-output
  read-tree: note about dropping split-index mode or index version
  read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
  t2104: make sure split index mode is off for the version test
  t1700: new tests for split-index mode
-- 
1.9.1.346.ga2b5940

^ permalink raw reply	[flat|nested] 75+ messages in thread

end of thread, other threads:[~2014-05-22 16:38 UTC | newest]

Thread overview: 75+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-28 10:55 [PATCH 00/32] Split index mode for very large indexes Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 01/32] ewah: fix constness of ewah_read_mmap Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 02/32] ewah: delete unused ewah_read_mmap_native declaration Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 03/32] sequencer: do not update/refresh index if the lock cannot be held Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 04/32] read-cache: new API write_locked_index instead of write_index/write_cache Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 05/32] read-cache: relocate and unexport commit_locked_index() Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 06/32] read-cache: store in-memory flags in the first 12 bits of ce_flags Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 07/32] read-cache: be strict about "changed" in remove_marked_cache_entries() Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 08/32] read-cache: be specific what part of the index has changed Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 09/32] update-index: " Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 10/32] resolve-undo: " Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 11/32] unpack-trees: " Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 12/32] cache-tree: mark istate->cache_changed on cache tree invalidation Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 13/32] cache-tree: mark istate->cache_changed on cache tree update Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 14/32] cache-tree: mark istate->cache_changed on prime_cache_tree() Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 15/32] entry.c: update cache_changed if refresh_cache is set in checkout_entry() Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 16/32] read-cache: save index SHA-1 after reading Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 17/32] read-cache: split-index mode Nguyễn Thái Ngọc Duy
2014-04-28 22:46   ` Junio C Hamano
2014-04-29  1:43     ` Duy Nguyen
2014-04-29 17:23       ` Junio C Hamano
2014-04-29 22:45         ` Duy Nguyen
2014-04-30 13:57           ` Junio C Hamano
2014-04-28 10:55 ` [PATCH 18/32] read-cache: mark new entries for split index Nguyễn Thái Ngọc Duy
2014-04-30 20:35   ` Eric Sunshine
2014-04-28 10:55 ` [PATCH 19/32] read-cache: save deleted entries in " Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 20/32] read-cache: mark updated entries for " Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 21/32] split-index: the writing part Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 22/32] split-index: the reading part Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 23/32] split-index: do not invalidate cache-tree at read time Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 24/32] split-index: strip pathname of on-disk replaced entries Nguyễn Thái Ngọc Duy
2014-04-29 20:25   ` Junio C Hamano
2014-04-28 10:55 ` [PATCH 25/32] update-index: new options to enable/disable split index mode Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 26/32] update-index --split-index: do not split if $GIT_DIR is read only Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 27/32] rev-parse: add --shared-index-path to get shared index path Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 28/32] read-tree: force split-index mode off on --index-output Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 29/32] read-tree: note about dropping split-index mode or index version Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 30/32] read-cache: force split index mode with GIT_TEST_SPLIT_INDEX Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 31/32] t2104: make sure split index mode is off for the version test Nguyễn Thái Ngọc Duy
2014-04-28 10:55 ` [PATCH 32/32] t1700: new tests for split-index mode Nguyễn Thái Ngọc Duy
2014-04-28 21:18 ` [PATCH 00/32] Split index mode for very large indexes Shawn Pearce
2014-04-29  1:52   ` Duy Nguyen
2014-05-09 10:27   ` Duy Nguyen
2014-05-09 17:55     ` Junio C Hamano
2014-05-13 11:15   ` [PATCH 0/8] Speed up cache loading time Nguyễn Thái Ngọc Duy
2014-05-13 11:15     ` [PATCH 1/8] read-cache: allow to keep mmap'd memory after reading Nguyễn Thái Ngọc Duy
2014-05-13 11:15     ` [PATCH 2/3] Add read-cache--daemon Nguyễn Thái Ngọc Duy
2014-05-13 11:52       ` Erik Faye-Lund
2014-05-13 12:01         ` Duy Nguyen
2014-05-13 13:01         ` Duy Nguyen
2014-05-13 13:37           ` Erik Faye-Lund
2014-05-13 13:49             ` Duy Nguyen
2014-05-13 14:06               ` Erik Faye-Lund
2014-05-13 14:10                 ` Duy Nguyen
2014-05-13 14:16                   ` Erik Faye-Lund
2014-05-13 11:15     ` [PATCH 2/8] unix-socket: stub impl. for platforms with no unix socket support Nguyễn Thái Ngọc Duy
2014-05-13 11:59       ` Erik Faye-Lund
2014-05-13 12:03         ` Erik Faye-Lund
2014-05-13 11:15     ` [PATCH 3/8] daemonize: set a flag before exiting the main process Nguyễn Thái Ngọc Duy
2014-05-13 11:15     ` [PATCH 3/3] read-cache: try index data from shared memory Nguyễn Thái Ngọc Duy
2014-05-13 11:15     ` [PATCH 4/8] Add read-cache--daemon for caching index and related stuff Nguyễn Thái Ngọc Duy
2014-05-13 11:56       ` Erik Faye-Lund
2014-05-13 11:15     ` [PATCH 5/8] read-cache: try index data from shared memory Nguyễn Thái Ngọc Duy
2014-05-13 12:13       ` Erik Faye-Lund
2014-05-13 11:15     ` [PATCH 6/8] read-cache--daemon: do not read index " Nguyễn Thái Ngọc Duy
2014-05-13 11:15     ` [PATCH 7/8] read-cache: skip verifying trailing SHA-1 on cached index Nguyễn Thái Ngọc Duy
2014-05-13 11:15     ` [PATCH 8/8] read-cache: inform the daemon that the index has been updated Nguyễn Thái Ngọc Duy
2014-05-13 12:17       ` Erik Faye-Lund
2014-05-22 16:38       ` David Turner
2014-05-13 14:24     ` [PATCH 0/8] Speed up cache loading time Stefan Beller
2014-05-13 14:35       ` Duy Nguyen
2014-05-13 11:20   ` [PATCH 9/8] even faster loading time with index version 254 Nguyễn Thái Ngọc Duy
2014-04-28 22:23 ` [PATCH 00/32] Split index mode for very large indexes Junio C Hamano
2014-04-30 20:48 ` Richard Hansen
2014-05-01  0:09   ` Duy Nguyen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).