All of lore.kernel.org
 help / color / mirror / Atom feed
* What's cooking in git.git (Oct 2013, #06; Fri, 25)
@ 2013-10-25 23:23 Junio C Hamano
  2013-10-26 10:30 ` Duy Nguyen
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2013-10-25 23:23 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.

You can find the changes described here in the integration branches
of the repositories listed at

    http://git-blame.blogspot.com/p/git-public-repositories.html

--------------------------------------------------
[New Topics]

* bc/http-100-continue (2013-10-23) 1 commit
 - http: add option to enable 100 Continue responses

 Conditionally allow "100 Continue" responses to help use of
 GSS-Negotiate authentication scheme over HTTP transport.

 Seems to be still under discussion.


* jc/merge-base-reflog (2013-10-25) 2 commits
 - merge-base: teach "--fork-point" mode
 - merge-base: use OPT_CMDMODE and clarify the command line parsing

 Code the logic in "pull --rebase" that figures out a fork point
 from reflog entries in C.


* jk/date-c-double-semicolon (2013-10-24) 1 commit
 - drop redundant semicolon in empty while

 Will merge to 'next'.


* jk/for-each-ref-skip-parsing (2013-10-24) 1 commit
 - for-each-ref: avoid loading objects to print %(objectname)

 Will merge to 'next'.


* jk/pack-bitmap (2013-10-25) 19 commits
 - pack-bitmap: implement optional name_hash cache
 - t: add basic bitmap functionality tests
 - repack: consider bitmaps when performing repacks
 - repack: handle optional files created by pack-objects
 - repack: turn exts array into array-of-struct
 - repack: stop using magic number for ARRAY_SIZE(exts)
 - pack-objects: implement bitmap writing
 - rev-list: add bitmap mode to speed up object lists
 - pack-objects: use bitmaps when packing objects
 - pack-bitmap: add support for bitmap indexes
 - documentation: add documentation for the bitmap format
 - ewah: compressed bitmap implementation
 - compat: add endianness helpers
 - sha1_file: export `git_open_noatime`
 - revision: allow setting custom limiter function
 - pack-objects: factor out name_hash
 - pack-objects: refactor the packing list
 - revindex: export new APIs
 - sha1write: make buffer const-correct

 Borrows the bitmap index into packfiles from JGit to speed up
 enumeration of objects involved in a commit range without having to
 fully traverse the history.


* jk/refs-c-squelch-gcc (2013-10-24) 1 commit
 - silence gcc array-bounds warning

 Will merge to 'next'.


* jk/robustify-parse-commit (2013-10-24) 6 commits
 - checkout: do not die when leaving broken detached HEAD
 - use parse_commit_or_die instead of custom message
 - use parse_commit_or_die instead of segfaulting
 - assume parse_commit checks for NULL commit
 - assume parse_commit checks commit->object.parsed
 - log_tree_diff: die when we fail to parse a commit

 Will merge to 'next' after taking another look.


* mh/fetch-tags-in-addition-to-normal-refs (2013-10-24) 16 commits
 - fetch, remote: properly convey --no-prune options to subprocesses
 - builtin/remote.c:update(): use struct argv_array
 - builtin/remote.c: reorder function definitions
 - query_refspecs(): move some constants out of the loop
 - fetch --prune: prune only based on explicit refspecs
 - SQUASH??? --tags is no longer a short-hand
 - fetch --tags: fetch tags *in addition to* other stuff
 - builtin/fetch.c: reorder function definitions
 - ref_remove_duplicates(): improve documentation comment
 - ref_remove_duplicates(): simplify function
 - ref_remove_duplicates(): avoid redundant bisection
 - get_ref_map(): rename local variables
 - api-remote.txt: correct section "struct refspec"
 - t5510: check that "git fetch --prune --tags" does not prune branches
 - t5510: prepare test refs more straightforwardly
 - t5510: use the correct tag name in test

 Some questionable paragraphs in the doc updates, but other than
 that looks reasonably solid.


* nd/lift-path-max (2013-10-24) 2 commits
 - checkout_entry(): clarify the use of topath[] parameter
 - entry.c: convert checkout_entry to use strbuf

 Will merge to 'next'.


* jk/pack-corruption-post-mortem (2013-10-25) 1 commit
 - howto: add article on recovering a corrupted object

 Will merge to 'next'.


* jk/reset-p-current-head-fix (2013-10-25) 2 commits
 - reset: pass real rev name to add--interactive
 - add-interactive: handle unborn branch in patch mode

 "git reset -p HEAD" has codepath to special case it from resetting
 to contents of other commits, but recent change broke it.

 Will merge to 'next'.


* mf/graph-show-root (2013-10-25) 1 commit
 - graph.c: mark root commit differently

 Needs adjustments to some tests.


* nv/parseopt-opt-arg (2013-10-25) 1 commit
 - rev-parse --parseopt: add the --sticked-long mode

 Enhance "rev-parse --parseopt" mode to help parsing options with
 an optional parameter.

--------------------------------------------------
[Stalled]

* np/pack-v4 (2013-09-18) 90 commits
 . packv4-parse.c: add tree offset caching
 . t1050: replace one instance of show-index with verify-pack
 . index-pack, pack-objects: allow creating .idx v2 with .pack v4
 . unpack-objects: decode v4 trees
 . unpack-objects: allow to save processed bytes to a buffer
 - ...

 Nico and Duy advancing the eternal vaporware pack-v4.  This is here
 primarily for wider distribution of the preview edition.

 Temporarily ejected from 'pu', to try out jk/pack-bitmap, which
 this topic conflicts with.


* sc/doc-howto-dumb-http (2013-10-16) 1 commit
 . doc/howto: warn about (dumb)http server document being too old

 The new text needs to go somewhere in the body of the document,
 not before the title line.


* tg/perf-lib-test-perf-cleanup (2013-09-19) 2 commits
 - perf-lib: add test_perf_cleanup target
 - perf-lib: split starting the test from the execution

 Add test_perf_cleanup shell function to the perf suite, that allows
 the script writers to define a test with a clean-up action.

 Holding until needed.


* yt/shortened-rename (2013-10-18) 2 commits
 - SQUASH??? style fixes and s/omit/shorten/ where appropriate
 - diff.c: keep arrow(=>) on show_stats()'s shortened filename part to make rename visible

 Attempts to give more weight on the fact that a filepair represents
 a rename than showing substring of the actual path when diffstat
 lines are not wide enough.

 I am not sure if that is solving a right problem, though.


* tr/merge-recursive-index-only (2013-07-07) 3 commits
 - merge-recursive: -Xindex-only to leave worktree unchanged
 - merge-recursive: untangle double meaning of o->call_depth
 - merge-recursive: remove dead conditional in update_stages()

 Holding until there is a caller to learn from.


* jc/ref-excludes (2013-09-03) 2 commits
 - document --exclude option
 - revision: introduce --exclude=<glob> to tame wildcards

 People often wished a way to tell "git log --branches" (and "git
 log --remotes --not --branches") to exclude some local branches
 from the expansion of "--branches" (similarly for "--tags", "--all"
 and "--glob=<pattern>").  Now they have one.

 Needs a matching change to rev-parse.


* rv/send-email-cache-generated-mid (2013-08-21) 2 commits
 - git-send-email: Cache generated message-ids, use them when prompting
 - git-send-email: add optional 'choices' parameter to the ask sub


* rj/read-default-config-in-show-ref-pack-refs (2013-06-17) 3 commits
 - ### DONTMERGE: needs better explanation on what config they need
 - pack-refs.c: Add missing call to git_config()
 - show-ref.c: Add missing call to git_config()

 The changes themselves are probably good, but it is unclear what
 basic setting needs to be read for which exact operation.

 Waiting for clarification.
 $gmane/228294


* jc/format-patch (2013-04-22) 2 commits
 - format-patch: --inline-single
 - format-patch: rename "no_inline" field

 A new option to send a single patch to the standard output to be
 appended at the bottom of a message.  I personally have no need for
 this, but it was easy enough to cobble together.  Tests, docs and
 stripping out more MIMEy stuff are left as exercises to interested
 parties.


* jk/gitweb-utf8 (2013-04-08) 4 commits
 - gitweb: Fix broken blob action parameters on blob/commitdiff pages
 - gitweb: Don't append ';js=(0|1)' to external links
 - gitweb: Make feed title valid utf8
 - gitweb: Fix utf8 encoding for blob_plain, blobdiff_plain, commitdiff_plain, and patch

 Various fixes to gitweb.

 Drew Northup volunteered to take a look into this.
 $gmane/226216


* jc/show-branch (2013-06-07) 5 commits
 - show-branch: use commit slab to represent bitflags of arbitrary width
 - show-branch.c: remove "all_mask"
 - show-branch.c: abstract out "flags" operation
 - show-branch.c: lift all_mask/all_revs to a global static
 - show-branch.c: update comment style

 Waiting for the final step to lift the hard-limit before sending it out.

--------------------------------------------------
[Cooking]

* ap/remote-hg-unquote-cquote (2013-10-23) 1 commit
 - remote-hg: unquote C-style paths when exporting

 A fast-import stream expresses a pathname with funny characters by
 quoting them in C style; remote-hg remote helper forgot to unquote
 such a path.

 Will merge to 'next'.


* jl/pack-transfer-avoid-double-close (2013-10-23) 1 commit
 - Clear fd after closing to avoid double-close error

 The codepath that send_pack() calls pack_objects() mistakenly
 closed the same file descriptor twice, leading to potentially
 closing a wrong file descriptor that was opened in the meantime.

 Will merge to 'next'.
 Needs to be merged later to 'maint'.


* nd/magic-pathspec (2013-10-22) 1 commit
 - Fix calling parse_pathspec with no paths nor PATHSPEC_PREFER_* flags

 All callers to parse_pathspec() must choose between getting no
 pathspec or one path that is limited to the current directory
 when there is no paths given on the command line, but there were
 two callers that violated this rule, triggering a BUG().

 Will merge to 'next'.


* sb/git-svn-docs-indent-with-ht (2013-10-22) 1 commit
 - git-svn docs: Use tabs consistently within the ascii doc

 Will merge to 'next'.


* tr/gitk-doc-update (2013-10-22) 1 commit
 - Documentation: revamp gitk(1)

 Will merge to 'next'.


* tr/valgrind-test-fix (2013-10-22) 2 commits
 - Revert "test-lib: allow prefixing a custom string before "ok N" etc."
 - Revert "test-lib: support running tests under valgrind in parallel"

 Will merge to 'next'.


* sb/repack-in-c (2013-10-22) 1 commit
  (merged to 'next' on 2013-10-23 at 5d7ac72)
 + Reword repack documentation to no longer state it's a script

 Finishing touches to update documentation.

 Will merge to 'master'.


* mm/checkout-auto-track-fix (2013-10-18) 2 commits
 - checkout: proper error message on 'git checkout foo bar --'
 - checkout: allow dwim for branch creation for "git checkout $branch --"

 "git checkout topic", when there is not yet a local "topic" branch
 but there is a unique remote-tracking branch for a remote "topic"
 branch, pretended as if "git checkout -t -b topic remote/$r/topic"
 (for that unique remote $r) was run. This hack however was not
 implemented for "git checkout topic --".

 Will merge to 'next'.


* hn/log-graph-color-octopus (2013-10-18) 1 commit
 - graph: fix coloring around octopus merges

 Will merge to 'next'.


* nd/gc-lock-against-each-other (2013-10-18) 1 commit
 - gc: remove gc.pid file at end of execution

 Will merge to 'next'.


* fc/styles (2013-10-16) 7 commits
 - block-sha1/sha1.c: have SP around arithmetic operators
 - base85.c: have SP around arithmetic operators
 - archive.c: have SP around arithmetic operators
 - alloc.c: have SP around arithmetic operators
 - abspath.c: have SP around arithmetic operators
 - alias: have SP around arithmetic operators
 - C: have space around && and || operators

 C coding style fixes.  The ones near the tip have not been sent to
 the list yet (they cover the same kind of style violation as the
 second one) and I should send them to the list.

 Will merge to 'next'.


* jk/remote-literal-string-leakfix (2013-10-15) 1 commit
  (merged to 'next' on 2013-10-18 at 6abddac)
 + remote: do not copy "origin" string literal

 Will merge to 'master'.


* jk/split-broken-ident (2013-10-15) 1 commit
  (merged to 'next' on 2013-10-18 at 8f4b8b7)
 + split_ident: parse timestamp from end of line

 Make the fall-back parsing of commit objects with broken author or
 committer lines more robust to pick up the timestamps.

 Will merge to 'master'.


* sg/prompt-svn-remote-fix (2013-10-15) 1 commit
  (merged to 'next' on 2013-10-18 at 20b47eb)
 + bash prompt: don't use '+=' operator in show upstream code path

 Bash portability fix.

 Will merge to 'master'.


* sg/t3600-nul-sha1-fix (2013-10-16) 1 commit
 - t3600: fix broken "choking git rm" test

 Will merge to 'next'.


* ak/submodule-foreach-quoting (2013-09-27) 1 commit
  (merged to 'next' on 2013-10-14 at d77c5f1)
 + submodule foreach: skip eval for more than one argument

 A behavior change, but a worthwhile one: "git submodule foreach"
 was treating its arguments as part of a single command to be
 concatenated and passed to a shell, making writing buggy
 scripts too easy.

 This patch preserves the old "just pass it to the shell" behavior
 when a single argument is passed to 'git submodule foreach' and
 moves to a new "skip the shell and use the arguments passed
 unmolested" behavior when more than one argument is passed.

 The old behavior (always concatenating and passing to the shell)
 was similar to the 'ssh' command, while the new behavior (switching
 on the number of arguments) is what 'xterm -e' does.

 May need more thought to make sure this change is advertised well
 so that scripts that used multiple arguments but added their own
 extra layer of quoting are not broken.

 Will cook in 'next' for the rest of this cycle.


* ew/keepalive (2013-10-16) 2 commits
  (merged to 'next' on 2013-10-16 at 56fd9f3)
 + http: use curl's tcp keepalive if available
  (merged to 'next' on 2013-10-14 at 24d786f)
 + http: enable keepalive on TCP sockets

 Will merge to 'master'.


* jk/http-auth-redirects (2013-10-24) 10 commits
  (merged to 'next' on 2013-10-24 at 4bebb66)
 + http.c: Spell the null pointer as NULL
 + remote-curl: rewrite base url from info/refs redirects
 + remote-curl: store url as a strbuf
 + remote-curl: make refs_url a strbuf
 + http: update base URLs when we see redirects
 + http: provide effective url to callers
 + http: hoist credential request out of handle_curl_result
  (merged to 'next' on 2013-10-14 at a0642be)
 + http: refactor options to http_get_*
 + http_request: factor out curlinfo_strbuf
 + http_get_file: style fixes

 Handle the case where http transport gets redirected during the
 authorization request better.

 Will merge to 'master'.


* jl/submodule-mv (2013-10-13) 1 commit
 - mv: Fix spurious warning when moving a file in presence of submodules

 Moving a regular file in a repository with a .gitmodules file was
 producing a warning 'Could not find section in .gitmodules where
 path=<filename>'.

 Will merge to 'next'.


* kb/fast-hashmap (2013-10-22) 12 commits
 - remove old hash.[ch] implementation
 - read-cache.c: fix memory leaks caused by removed cache entries
 - name-hash.c: remove cache entries instead of marking them CE_UNHASHED
 - name-hash.c: use new hash map implementation for cache entries
 - name-hash.c: remove unreferenced directory entries
 - name-hash.c: use new hash map implementation for directories
 - diffcore-rename.c: use new hash map implementation
 - diffcore-rename.c: simplify finding exact renames
 - diffcore-rename.c: move code around to prepare for the next patch
 - buitin/describe.c: use new hash map implementation
 - add a hashtable implementation that supports O(1) removal
 - submodule: don't access the .gitmodules cache entry after removing it

 Improvements to our hash table to get it to meet the needs of the
 msysgit fscache project, with some nice performance improvements.

 The preparatory clean-up to submodule from Jens is at the bottom. I
 also squashed in a fix-up by Karsten found at $gmane/236468 (please
 double-check the result).

 Will merge to 'next'.


* jc/revision-range-unpeel (2013-10-15) 1 commit
  (merged to 'next' on 2013-10-16 at d04ddfe)
 + revision: do not peel tags used in range notation

 "git rev-list --objects ^v1.0^ v1.0" gave v1.0 tag itself in the
 output, but "git rev-list --objects v1.0^..v1.0" did not.

 Will merge to 'master'.


* jc/upload-pack-send-symref (2013-10-22) 10 commits
  (merged to 'next' on 2013-10-23 at 8ef5660)
 + t5570: Update for clone-progress-to-stderr branch
 + Merge branch 'jk/clone-progress-to-stderr' into jc/upload-pack-send-symref
 + t5570: Update for symref capability
  (merged to 'next' on 2013-10-16 at eb1ae25)
 + clone: test the new HEAD detection logic
 + connect: annotate refs with their symref information in get_remote_head()
 + connect.c: make parse_feature_value() static
 + upload-pack: send non-HEAD symbolic refs
 + upload-pack: send symbolic ref information as capability
 + upload-pack.c: do not pass confusing cb_data to mark_our_ref()
 + t5505: fix "set-head --auto with ambiguous HEAD" test

 One long-standing flaw in the pack transfer protocol used by "git
 clone" was that there was no way to tell the other end which branch
 "HEAD" points at, and the receiving end needed to guess.  A new
 capability has been defined in the pack protocol to convey this
 information so that cloning from a repository with more than one
 branches pointing at the same commit where the HEAD is at now
 reliably sets the initial branch in the resulting repository.

 Will merge to 'master'.


* jx/relative-path-regression-fix (2013-10-14) 3 commits
  (merged to 'next' on 2013-10-18 at b4af45f)
 + Use simpler relative_path when set_git_dir
  (merged to 'next' on 2013-10-14 at 704b9ee)
 + relative_path should honor dos-drive-prefix
 + test: use unambigous leading path (/foo) for MSYS

 Will merge to 'master' and later to 'maint'.


* jn/add-2.0-u-A-sans-pathspec (2013-04-26) 1 commit
 - git add: -u/-A now affects the entire working tree

 Will merge to and cook in 'next' until Git 2.0.


* jc/core-checkstat-2.0 (2013-05-06) 1 commit
 - core.statinfo: remove as promised in Git 2.0

 Will merge to and cook in 'next' until Git 2.0.


* jc/push-2.0-default-to-simple (2013-06-18) 1 commit
 - push: switch default from "matching" to "simple"

 Will merge to and cook in 'next' until Git 2.0.


* jc/add-2.0-ignore-removal (2013-04-22) 1 commit
 - git add <pathspec>... defaults to "-A"

 Updated endgame for "git add <pathspec>" that defaults to "--all"
 aka "--no-ignore-removal".

 Will merge to and cook in 'next' until Git 2.0.


* jc/hold-diff-remove-q-synonym-for-no-deletion (2013-07-19) 1 commit
 - diff: remove "diff-files -q" in a version of Git in a distant future

 Will merge to and cook in 'next' until a distant future.

--------------------------------------------------
[Discarded]

* jh/shorten-refname (2013-05-07) 4 commits
 . t1514: refname shortening is done after dereferencing symbolic refs
 . shorten_unambiguous_ref(): Fix shortening refs/remotes/origin/HEAD to origin
 . t1514: Demonstrate failure to correctly shorten "refs/remotes/origin/HEAD"
 . t1514: Add tests of shortening refnames in strict/loose mode

 When remotes/origin/HEAD is not a symbolic ref, "rev-parse
 --abbrev-ref remotes/origin/HEAD" ought to show "origin", not
 "origin/HEAD", which is fixed with this series (if it is a symbolic
 ref that points at remotes/origin/something, then it should show
 "origin/something" and it already does).

 Has been expecting a reroll, as an early part of a larger series.
 $gmane/225137

 Discarded due to inactivity, without prejudice.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: What's cooking in git.git (Oct 2013, #06; Fri, 25)
  2013-10-25 23:23 What's cooking in git.git (Oct 2013, #06; Fri, 25) Junio C Hamano
@ 2013-10-26 10:30 ` Duy Nguyen
  2013-10-28 15:48   ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Duy Nguyen @ 2013-10-26 10:30 UTC (permalink / raw)
  To: Karsten Blees; +Cc: Git Mailing List, Junio C Hamano, Jeff King

On Sat, Oct 26, 2013 at 6:23 AM, Junio C Hamano <gitster@pobox.com> wrote:
> * kb/fast-hashmap (2013-10-22) 12 commits
>  - remove old hash.[ch] implementation
>  - read-cache.c: fix memory leaks caused by removed cache entries
>  - name-hash.c: remove cache entries instead of marking them CE_UNHASHED
>  - name-hash.c: use new hash map implementation for cache entries
>  - name-hash.c: remove unreferenced directory entries
>  - name-hash.c: use new hash map implementation for directories
>  - diffcore-rename.c: use new hash map implementation
>  - diffcore-rename.c: simplify finding exact renames
>  - diffcore-rename.c: move code around to prepare for the next patch
>  - buitin/describe.c: use new hash map implementation
>  - add a hashtable implementation that supports O(1) removal
>  - submodule: don't access the .gitmodules cache entry after removing it
>
>  Improvements to our hash table to get it to meet the needs of the
>  msysgit fscache project, with some nice performance improvements.
>
>  The preparatory clean-up to submodule from Jens is at the bottom. I
>  also squashed in a fix-up by Karsten found at $gmane/236468 (please
>  double-check the result).

jk/pack-bitmap adds khash.h, which from a first glance looks like yet
another hash table implementation. I was just wondering if kb's new
hash tables can cover the need of pack-bitmap.c too so we can remove
khash.h later..
-- 
Duy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: What's cooking in git.git (Oct 2013, #06; Fri, 25)
  2013-10-26 10:30 ` Duy Nguyen
@ 2013-10-28 15:48   ` Junio C Hamano
  2013-10-28 16:16     ` Vicent Martí
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2013-10-28 15:48 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Karsten Blees, Git Mailing List, Jeff King

Duy Nguyen <pclouds@gmail.com> writes:

> On Sat, Oct 26, 2013 at 6:23 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> * kb/fast-hashmap (2013-10-22) 12 commits
>>  - remove old hash.[ch] implementation
>>  - read-cache.c: fix memory leaks caused by removed cache entries
>>  - name-hash.c: remove cache entries instead of marking them CE_UNHASHED
>>  - name-hash.c: use new hash map implementation for cache entries
>>  - name-hash.c: remove unreferenced directory entries
>>  - name-hash.c: use new hash map implementation for directories
>>  - diffcore-rename.c: use new hash map implementation
>>  - diffcore-rename.c: simplify finding exact renames
>>  - diffcore-rename.c: move code around to prepare for the next patch
>>  - buitin/describe.c: use new hash map implementation
>>  - add a hashtable implementation that supports O(1) removal
>>  - submodule: don't access the .gitmodules cache entry after removing it
>>
>>  Improvements to our hash table to get it to meet the needs of the
>>  msysgit fscache project, with some nice performance improvements.
>>
>>  The preparatory clean-up to submodule from Jens is at the bottom. I
>>  also squashed in a fix-up by Karsten found at $gmane/236468 (please
>>  double-check the result).
>
> jk/pack-bitmap adds khash.h, which from a first glance looks like yet
> another hash table implementation. I was just wondering if kb's new
> hash tables can cover the need of pack-bitmap.c too so we can remove
> khash.h later..

Good thinking ;-).

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: What's cooking in git.git (Oct 2013, #06; Fri, 25)
  2013-10-28 15:48   ` Junio C Hamano
@ 2013-10-28 16:16     ` Vicent Martí
  2013-10-28 16:41       ` Junio C Hamano
  2013-10-28 19:45       ` Karsten Blees
  0 siblings, 2 replies; 9+ messages in thread
From: Vicent Martí @ 2013-10-28 16:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Duy Nguyen, Karsten Blees, Git Mailing List, Jeff King

On Mon, Oct 28, 2013 at 4:48 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> jk/pack-bitmap adds khash.h, which from a first glance looks like yet
>> another hash table implementation. I was just wondering if kb's new
>> hash tables can cover the need of pack-bitmap.c too so we can remove
>> khash.h later..
>
> Good thinking ;-).

We use the khash tables to map:

    - sha1 (const char *) to (void *)
    - sha1 (const char *) to int

The new `hashmap.c` covers the first case quite well (albeit slightly
more verbosely than I'd like), but in the second case it doesn't quite
work. Since the new hash needs to embed the "struct hashmap_entry" on
all its values (to allow for separate chaining), having it map to
`int` keys requires a struct like this:

    struct sha1_position {
        struct hashmap_entry {
            struct hashmap_entry *next;
            unsigned int hash;
        };
        int position;
    }

khash on the other hand is capable of storing the position values as
part of the hash table itself (i.e. `int **buckets`), and saves us
from thousands of bytes of allocations + indirection.

I am not sure whether the consistency of having a single hash map
warrants the performance and memory hits when operating on the
extended index.

Please advice.

luv,
vmg

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: What's cooking in git.git (Oct 2013, #06; Fri, 25)
  2013-10-28 16:16     ` Vicent Martí
@ 2013-10-28 16:41       ` Junio C Hamano
  2013-10-28 19:45       ` Karsten Blees
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2013-10-28 16:41 UTC (permalink / raw)
  To: Vicent Martí; +Cc: Duy Nguyen, Karsten Blees, Git Mailing List, Jeff King

Vicent Martí <tanoku@gmail.com> writes:

> On Mon, Oct 28, 2013 at 4:48 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> jk/pack-bitmap adds khash.h, which from a first glance looks like yet
>>> another hash table implementation. I was just wondering if kb's new
>>> hash tables can cover the need of pack-bitmap.c too so we can remove
>>> khash.h later..
>> ...
> khash on the other hand is capable of storing the position values as
> part of the hash table itself (i.e. `int **buckets`), and saves us
> from thousands of bytes of allocations + indirection.

My "Good thinking ;-)" comment was primarily meant as "somebody
needs to at least think about the possibility and consider pros and
cons", and you thought about it already ;-).

In short, kb's hash table does not cover the need for pack-bitmap,
so we should keep two at least for now, until (and/or unless) either
side can be shown (and/or extended) to cover the need for the other
one as well.

Thanks. 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: What's cooking in git.git (Oct 2013, #06; Fri, 25)
  2013-10-28 16:16     ` Vicent Martí
  2013-10-28 16:41       ` Junio C Hamano
@ 2013-10-28 19:45       ` Karsten Blees
  2013-10-28 21:04         ` Vicent Martí
  1 sibling, 1 reply; 9+ messages in thread
From: Karsten Blees @ 2013-10-28 19:45 UTC (permalink / raw)
  To: Vicent Martí, Junio C Hamano
  Cc: Duy Nguyen, Karsten Blees, Git Mailing List, Jeff King

Am 28.10.2013 17:16, schrieb Vicent Martí:
> On Mon, Oct 28, 2013 at 4:48 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> jk/pack-bitmap adds khash.h, which from a first glance looks like yet
>>> another hash table implementation. I was just wondering if kb's new
>>> hash tables can cover the need of pack-bitmap.c too so we can remove
>>> khash.h later..
>>
>> Good thinking ;-).
> 
> We use the khash tables to map:
> 
>     - sha1 (const char *) to (void *)
>     - sha1 (const char *) to int
> 
> The new `hashmap.c` covers the first case quite well (albeit slightly
> more verbosely than I'd like), but in the second case it doesn't quite
> work. Since the new hash needs to embed the "struct hashmap_entry" on
> all its values (to allow for separate chaining), having it map to
> `int` keys requires a struct like this:
> 
>     struct sha1_position {
>         struct hashmap_entry {
>             struct hashmap_entry *next;
>             unsigned int hash;
>         };
>         int position;
>     }
> 

Hmm...isn't that position rather an index into two separately maintained arrays? So you'd rather have:

    struct sha1_position {
        struct hashmap_entry {
            struct hashmap_entry *next;
            unsigned int hash;
        };
        uint32_t pack_name_hash;
        struct object *object;
     }

> khash on the other hand is capable of storing the position values as
> part of the hash table itself (i.e. `int **buckets`), and saves us
> from thousands of bytes of allocations + indirection.
> 

However, khash keeps separate arrays for flags, keys and values, all of them overallocated by 1 / load factor (so there's lots of unused space). ext_index.objects and ext_index.hashes are also overallocated by the usual alloc_nr() factor 1.5.

Regarding memory consumption, I think both implementations will be pretty similar. Hashmap allocates many small regions while khash re-allocates a few big ones...I really don't know which is better, ideally entries would be allocated in chunks to minimize both heap overhead and memcpy disadvantes.

Regarding performance, khash uses open addressing, which requires more key comparisons (O(1/(1-load_factor))) than chaining (O(1+load_factor)). However, any measurable differences will most likely be dwarfed by IO costs in this particular use case.


Btw., pack-objects.c::rehash_objects() in patch 03 unnecessarily checks for duplicates. That's probably the reason for the high hashcmp times you found in the first round of the patch series.

Cheers,
Karsten

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: What's cooking in git.git (Oct 2013, #06; Fri, 25)
  2013-10-28 19:45       ` Karsten Blees
@ 2013-10-28 21:04         ` Vicent Martí
  2013-10-29  9:09           ` Karsten Blees
  0 siblings, 1 reply; 9+ messages in thread
From: Vicent Martí @ 2013-10-28 21:04 UTC (permalink / raw)
  To: Karsten Blees
  Cc: Junio C Hamano, Duy Nguyen, Karsten Blees, Git Mailing List, Jeff King

On Mon, Oct 28, 2013 at 8:45 PM, Karsten Blees <karsten.blees@gmail.com> wrote:
>> The new `hashmap.c` covers the first case quite well (albeit slightly
>> more verbosely than I'd like), but in the second case it doesn't quite
>> work. Since the new hash needs to embed the "struct hashmap_entry" on
>> all its values (to allow for separate chaining), having it map to
>> `int` keys requires a struct like this:
>>
>>     struct sha1_position {
>>         struct hashmap_entry {
>>             struct hashmap_entry *next;
>>             unsigned int hash;
>>         };
>>         int position;
>>     }
>>
>
> Hmm...isn't that position rather an index into two separately maintained arrays? So you'd rather have:
>
>     struct sha1_position {
>         struct hashmap_entry {
>             struct hashmap_entry *next;
>             unsigned int hash;
>         };
>         uint32_t pack_name_hash;
>         struct object *object;
>      }

No, this is not quite right. We use the separate arrays because the
normal operation mode is to index by position (e.g. we need the nth
object in the extended index); the hash table is an auxiliary
structure to reverse that indexing (e.g. what position does this SHA1
have on the extended index). The information which is always required
to construct bitmaps is position, so we need to store the indexes in a
map.

>> khash on the other hand is capable of storing the position values as
>> part of the hash table itself (i.e. `int **buckets`), and saves us
>> from thousands of bytes of allocations + indirection.
>>
>
> However, khash keeps separate arrays for flags, keys and values, all of them overallocated by 1 / load factor (so there's lots of unused space). ext_index.objects and ext_index.hashes are also overallocated by the usual alloc_nr() factor 1.5.

FWIW The flags for khash are compacted, so they are stored much more
tightly than pointers, even when overallocated.

>
> Regarding memory consumption, I think both implementations will be pretty similar. Hashmap allocates many small regions while khash re-allocates a few big ones...I really don't know which is better, ideally entries would be allocated in chunks to minimize both heap overhead and memcpy disadvantes.

I agree, both implementations probably have very similar memory
characteristics, probably enough not to matter.

> Regarding performance, khash uses open addressing, which requires more key comparisons (O(1/(1-load_factor))) than chaining (O(1+load_factor)). However, any measurable differences will most likely be dwarfed by IO costs in this particular use case.

I don't think this is true. If you actually run a couple insertion and
lookup benchmarks comparing the two implementations, you'll find khash
to be around ~30% faster for most workloads (venturing a guess from
past experience). I am obviously not the author of khash, but I've
found that the theoretical increase in key comparisons is completely
dwarfed by the benefit of increased cache locality during the probing
fase. I still haven't found a faster C hash table implementation than
khash for the general case, that's why I normally use it despite the
worrisome preprocessor crash-party going on in that header file.

> Btw., pack-objects.c::rehash_objects() in patch 03 unnecessarily checks for duplicates. That's probably the reason for the high hashcmp times you found in the first round of the patch series.

Patch 03 is a refactoring -- the duplicate checking code has been in
pack-objects.c for years. I am not sure duplicate checking is
superfluous at all, there are many cases where you could be
double-inserting objects in a pack-objects run, and you really don't
want to generate packfiles with dupe objects.

Thanks for the feedback!

luv,
vmg

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: What's cooking in git.git (Oct 2013, #06; Fri, 25)
  2013-10-28 21:04         ` Vicent Martí
@ 2013-10-29  9:09           ` Karsten Blees
  2013-11-14 19:38             ` Karsten Blees
  0 siblings, 1 reply; 9+ messages in thread
From: Karsten Blees @ 2013-10-29  9:09 UTC (permalink / raw)
  To: Vicent Martí
  Cc: Junio C Hamano, Duy Nguyen, Karsten Blees, Git Mailing List, Jeff King

On Mon, Oct 28, 2013 at 10:04 PM, Vicent Martí <tanoku@gmail.com> wrote:
>
> On Mon, Oct 28, 2013 at 8:45 PM, Karsten Blees <karsten.blees@gmail.com> wrote:
>
> > Regarding performance, khash uses open addressing, which requires more key comparisons (O(1/(1-load_factor))) than chaining (O(1+load_factor)). However, any measurable differences will most likely be dwarfed by IO costs in this particular use case.
>
> I don't think this is true. If you actually run a couple insertion and
> lookup benchmarks comparing the two implementations, you'll find khash
> to be around ~30% faster for most workloads (venturing a guess from
> past experience). I am obviously not the author of khash, but I've
> found that the theoretical increase in key comparisons is completely
> dwarfed by the benefit of increased cache locality during the probing
> fase. I still haven't found a faster C hash table implementation than
> khash for the general case, that's why I normally use it despite the
> worrisome preprocessor crash-party going on in that header file.

Yes, cache locality is where open addressing shines, however, you
loose that benefit when the keys are pointers (e.g. sha1's).

>
>
> > Btw., pack-objects.c::rehash_objects() in patch 03 unnecessarily checks for duplicates. That's probably the reason for the high hashcmp times you found in the first round of the patch series.
>
> Patch 03 is a refactoring -- the duplicate checking code has been in
> pack-objects.c for years. I am not sure duplicate checking is
> superfluous at all, there are many cases where you could be
> double-inserting objects in a pack-objects run, and you really don't
> want to generate packfiles with dupe objects.

The point is that its in _rehash_. Duplicate checking should be in
add/put. Rehash only rearranges entries that are alread _in_ the map,
and it usually only needs the hash code for that. So pack-objects
implements rehash in terms of a full clear + add-all instead, which is
of course slower than what khash, hashmap etc. would do.

Ciao,
Karsten

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: What's cooking in git.git (Oct 2013, #06; Fri, 25)
  2013-10-29  9:09           ` Karsten Blees
@ 2013-11-14 19:38             ` Karsten Blees
  0 siblings, 0 replies; 9+ messages in thread
From: Karsten Blees @ 2013-11-14 19:38 UTC (permalink / raw)
  To: Vicent Martí
  Cc: Junio C Hamano, Duy Nguyen, Karsten Blees, Git Mailing List, Jeff King

Am 29.10.2013 10:09, schrieb Karsten Blees:
> On Mon, Oct 28, 2013 at 10:04 PM, Vicent Martí <tanoku@gmail.com> wrote:
>>
>> On Mon, Oct 28, 2013 at 8:45 PM, Karsten Blees <karsten.blees@gmail.com> wrote:
>>
>>> Regarding performance, khash uses open addressing, which requires more key comparisons (O(1/(1-load_factor))) than chaining (O(1+load_factor)). However, any measurable differences will most likely be dwarfed by IO costs in this particular use case.
>>
>> I don't think this is true. If you actually run a couple insertion and
>> lookup benchmarks comparing the two implementations, you'll find khash
>> to be around ~30% faster for most workloads (venturing a guess from
>> past experience). I am obviously not the author of khash, but I've
...

Just out of curiosity, I added performance test code for khash to the test in my current hashmap patch series [1]. It turns out that khash is by far the slowest of the bunch, especially with many collisions.

Again, I don't think that performance matters all that much (or in other words: _any_ hash table implementation will probably be fast enough compared to the rest that's going on). Its more a question of whether we really need two different hash table implementations (and a queasy feeling about the macro kludge in khash.h...).


Khash doesn't store the hash codes along with the entries (as both hash.[ch] and hashmap.[ch] do), so it needs to re-calculate hash codes on every resize. For a fair comparison, the "khash" test uses keys with pre-calculated hash codes in the key structure. This should be similar to a hash function that just copies 4 bytes from a sha1 key. Khash maps with more complex hash functions will be slower (see khstr).

The "khstr" test uses khash's predefined string map and khash's X31 hash function for strings (therefore no separate values for different hash functions here).

The table is similar to what I posted for hashmap-v2 [2] (i.e. real time in seconds for 1,000 rounds á 100,000 entries). I just turned it around a bit to make room for khash columns.

test | hash_fn | hashmap |  hash   |  khash  | khstr  |
-----+---------+---------+---------+---------+--------+
     | FNV     |   2.429 |  14.366 |  11.780 | 18.677 |
     | FNV  x2 |   2.946 |  14.558 |  10.922 |        |
 add | i       |   1.708 |   7.419 |   4.132 |        |
     | i    x2 |   1.791 |   8.565 |   4.502 |        |
     | i/10    |   1.555 |   1.805 | 344.691 |        |
     | i/10 x2 |   1.543 |   1.808 | 319.559 |        |
-----+---------+---------+---------+---------+--------+
     | FNV     |   1.822 |   3.452 |   4.922 |  8.309 |
get  | FNV  x2 |   2.298 |   3.194 |   4.473 |        |
100% | i       |   1.252 |   1.344 |   0.944 |        |
hits | i    x2 |   1.286 |   1.434 |   1.220 |        |
     | i/10    |   6.720 |   5.138 | 281.815 |        |
     | i/10 x2 |   6.297 |   5.188 | 257.021 |        |
-----+---------+---------+---------+---------+--------+
     | FNV     |   1.023 |   3.949 |   4.115 |  4.878 |
get  | FNV  x2 |   1.538 |   3.915 |   4.571 |        |
10%  | i       |   0.654 | 397.457 |  38.125 |        |
hits | i    x2 |   0.718 |   0.722 |   9.111 |        |
     | i/10    |   1.128 |  30.235 |  60.376 |        |
     | i/10 x2 |   1.260 |   1.082 |  43.354 |        |
-----+---------+---------+---------+---------+--------+

[1] https://github.com/kblees/git/commits/kb/hashmap-v5-khash
[2] http://article.gmane.org/gmane.comp.version-control.git/235290

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-11-14 19:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-25 23:23 What's cooking in git.git (Oct 2013, #06; Fri, 25) Junio C Hamano
2013-10-26 10:30 ` Duy Nguyen
2013-10-28 15:48   ` Junio C Hamano
2013-10-28 16:16     ` Vicent Martí
2013-10-28 16:41       ` Junio C Hamano
2013-10-28 19:45       ` Karsten Blees
2013-10-28 21:04         ` Vicent Martí
2013-10-29  9:09           ` Karsten Blees
2013-11-14 19:38             ` Karsten Blees

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.