* What's cooking in git.git (Jan 2019, #01; Mon, 7) @ 2019-01-07 23:34 Junio C Hamano 2019-01-08 9:50 ` tg/checkout-no-overlay, was " Thomas Gummerer ` (5 more replies) 0 siblings, 6 replies; 41+ messages in thread From: Junio C Hamano @ 2019-01-07 23:34 UTC (permalink / raw) To: git Here are the topics that have been cooking. Commits prefixed with '-' are only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'. The ones marked with '.' do not appear in any of the integration branches, but I am still holding onto them. The 'master' branch now has the first batch of topics (many of which have been cooking in 'next' during the pre-release freeze). You can find the changes described here in the integration branches of the repositories listed at http://git-blame.blogspot.com/p/git-public-repositories.html -------------------------------------------------- [Graduated to "master"] * ab/push-dwim-dst (2018-11-14) 7 commits (merged to 'next' on 2018-12-28 at d9f618de10) + push doc: document the DWYM behavior pushing to unqualified <dst> + push: test that <src> doesn't DWYM if <dst> is unqualified + push: add an advice on unqualified <dst> push + push: move unqualified refname error into a function + push: improve the error shown on unqualified <dst> push + i18n: remote.c: mark error(...) messages for translation + remote.c: add braces in anticipation of a follow-up change Originally merged to 'next' on 2018-11-18 "git push $there $src:$dst" rejects when $dst is not a fully qualified refname and not clear what the end user meant. The codepath has been taught to give a clearer error message, and also guess where the push should go by taking the type of the pushed object into account (e.g. a tag object would want to go under refs/tags/). * en/fast-export-import (2018-11-17) 11 commits (merged to 'next' on 2018-12-28 at a1b09cf515) + fast-export: add a --show-original-ids option to show original names + fast-import: remove unmaintained duplicate documentation + fast-export: add --reference-excluded-parents option + fast-export: ensure we export requested refs + fast-export: when using paths, avoid corrupt stream with non-existent mark + fast-export: move commit rewriting logic into a function for reuse + fast-export: avoid dying when filtering by paths and old tags exist + fast-export: use value from correct enum + git-fast-export.txt: clarify misleading documentation about rev-list args + git-fast-import.txt: fix documentation for --quiet option + fast-export: convert sha1 to oid Originally merged to 'next' on 2018-11-18 Small fixes and features for fast-export and fast-import, mostly on the fast-export side. * en/merge-path-collision (2018-12-01) 11 commits (merged to 'next' on 2018-12-28 at b50d3eee25) + t6036: avoid non-portable "cp -a" + merge-recursive: combine error handling + t6036, t6043: increase code coverage for file collision handling + merge-recursive: improve rename/rename(1to2)/add[/add] handling + merge-recursive: use handle_file_collision for add/add conflicts + merge-recursive: improve handling for rename/rename(2to1) conflicts + merge-recursive: fix rename/add conflict handling + merge-recursive: new function for better colliding conflict resolutions + merge-recursive: increase marker length with depth of recursion + t6036, t6042: testcases for rename collision of already conflicting files + t6042: add tests for consistency in file collision conflict handling Originally merged to 'next' on 2018-12-01 Updates for corner cases in merge-recursive. * fc/http-version (2018-11-09) 1 commit (merged to 'next' on 2018-12-28 at 56bcbb0fa9) + http: add support selecting http version Originally merged to 'next' on 2018-11-18 The "http.version" configuration variable can be used with recent enough cURL library to force the version of HTTP used to talk when fetching and pushing. * jk/loose-object-cache (2018-11-24) 10 commits (merged to 'next' on 2018-12-28 at 5a5faf384e) + odb_load_loose_cache: fix strbuf leak + fetch-pack: drop custom loose object cache + sha1-file: use loose object cache for quick existence check + object-store: provide helpers for loose_objects_cache + sha1-file: use an object_directory for the main object dir + handle alternates paths the same as the main object dir + sha1_file_name(): overwrite buffer instead of appending + rename "alternate_object_database" to "object_directory" + submodule--helper: prefer strip_suffix() to ends_with() + fsck: do not reuse child_process structs Originally merged to 'next' on 2018-11-24 Code clean-up with optimization for the codepath that checks (non-)existence of loose objects. * mk/http-backend-kill-children-before-exit (2018-11-26) 1 commit (merged to 'next' on 2018-12-28 at 81188d93c3) + http-backend: enable cleaning up forked upload/receive-pack on exit Originally merged to 'next' on 2018-11-29 The http-backend CGI process did not correctly clean up the child processes it spawns to run upload-pack etc. when it dies itself, which has been corrected. * nd/checkout-dwim-fix (2018-11-14) 1 commit (merged to 'next' on 2018-12-28 at 3183c9305b) + checkout: disambiguate dwim tracking branches and local files Originally merged to 'next' on 2018-11-18 "git checkout frotz" (without any double-dash) avoids ambiguity by making sure 'frotz' cannot be interpreted as a revision and as a path at the same time. This safety has been updated to check also a unique remote-tracking branch 'frotz' in a remote, when dwimming to create a local branch 'frotz' out of a remote-tracking branch 'frotz' from a remote. * nd/i18n (2018-11-12) 16 commits (merged to 'next' on 2018-12-28 at 8e2de8338e) + fsck: mark strings for translation + fsck: reduce word legos to help i18n + parse-options.c: mark more strings for translation + parse-options.c: turn some die() to BUG() + parse-options: replace opterror() with optname() + repack: mark more strings for translation + remote.c: mark messages for translation + remote.c: turn some error() or die() to BUG() + reflog: mark strings for translation + read-cache.c: add missing colon separators + read-cache.c: mark more strings for translation + read-cache.c: turn die("internal error") to BUG() + attr.c: mark more string for translation + archive.c: mark more strings for translation + alias.c: mark split_cmdline_strerror() strings for translation + git.c: mark more strings for translation Originally merged to 'next' on 2018-11-18 More _("i18n") markings. * nd/the-index (2018-11-12) 22 commits (merged to 'next' on 2018-12-28 at 6bbd3befbe) + rebase-interactive.c: remove the_repository references + rerere.c: remove the_repository references + pack-*.c: remove the_repository references + pack-check.c: remove the_repository references + notes-cache.c: remove the_repository references + line-log.c: remove the_repository reference + diff-lib.c: remove the_repository references + delta-islands.c: remove the_repository references + cache-tree.c: remove the_repository references + bundle.c: remove the_repository references + branch.c: remove the_repository reference + bisect.c: remove the_repository reference + blame.c: remove implicit dependency the_repository + sequencer.c: remove implicit dependency on the_repository + sequencer.c: remove implicit dependency on the_index + transport.c: remove implicit dependency on the_index + notes-merge.c: remove implicit dependency the_repository + notes-merge.c: remove implicit dependency on the_index + list-objects.c: reduce the_repository references + list-objects-filter.c: remove implicit dependency on the_index + wt-status.c: remove implicit dependency the_repository + wt-status.c: remove implicit dependency on the_index (this branch is used by md/list-objects-filter-by-depth.) Originally merged to 'next' on 2018-11-18 More codepaths become aware of working with in-core repository instance other than the default "the_repository". * sd/stash-wo-user-name (2018-11-19) 1 commit (merged to 'next' on 2018-12-28 at 99197ef5a1) + stash: tolerate missing user identity (this branch is used by ps/stash-in-c.) Originally merged to 'next' on 2018-11-19 A properly configured username/email is required under user.useConfigOnly in order to create commits; now "git stash" (even though it creates commit objects to represent stash entries) command is exempt from the requirement. * sg/clone-initial-fetch-configuration (2018-11-16) 3 commits (merged to 'next' on 2018-12-28 at 82e104f221) + Documentation/clone: document ignored configuration variables + clone: respect additional configured fetch refspecs during initial fetch + clone: use a more appropriate variable name for the default refspec Originally merged to 'next' on 2018-11-18 Refspecs configured with "git -c var=val clone" did not propagate to the resulting repository, which has been corrected. -------------------------------------------------- [New Topics] * ab/proto-v2-fixes (2018-12-14) 5 commits - tests: mark tests broken under GIT_TEST_PROTOCOL_VERSION=2 - builtin/fetch-pack: support protocol version 2 - tests: mark & fix tests broken under GIT_TEST_PROTOCOL_VERSION=1 - tests: add a special setup where for protocol.version - tests: add a check for unportable env --unset This is (just a part of) v2, and I see v3 posted on the list, but it seems that v3 needs further updates. cf. <20181218124852.GD30471@sigill.intra.peff.net> I'll drop the whole thing and wait for an update. * jn/stripspace-wo-repository (2018-12-26) 1 commit - stripspace: allow -s/-c outside git repository "git stripspace" should be usable outside a git repository, but under the "-s" or "-c" mode, it didn't. Will merge to 'next'. * ma/asciidoctor (2018-12-26) 3 commits - git-status.txt: render tables correctly under Asciidoctor - Documentation: do not nest open blocks - git-column.txt: fix section header Some of the documentation pages formatted incorrectly with Asciidoctor, which have been fixed. Will merge to 'next'. * nd/worktree-remove-with-uninitialized-submodules (2019-01-07) 1 commit - worktree: allow to (re)move worktrees with uninitialized submodules "git worktree remove" and "git worktree move" refused to work when there is a submodule involved. This has been loosened to ignore uninitialized submodules. Will merge to 'next'. * sb/submodule-unset-core-worktree-when-worktree-is-lost (2018-12-26) 4 commits - submodule deinit: unset core.worktree - submodule--helper: fix BUG message in ensure_core_worktree - submodule: unset core.worktree if no working tree is present - submodule update: add regression test with old style setups The core.worktree setting in a submodule repository should not be pointing at a directory when the submodule loses its working tree (e.g. getting deinit'ed), but the code did not properly maintain this invariant. Will merge to 'next'. * so/cherry-pick-always-allow-m1 (2019-01-07) 4 commits - t3506: validate '-m 1 -ff' is now accepted for non-merge commits - t3502: validate '-m 1' argument is now accepted for non-merge commits - cherry-pick: do not error on non-merge commits when '-m 1' is specified - t3510: stop using '-m 1' to force failure mid-sequence of cherry-picks "git cherry-pick -m1" was forbidden when picking a non-merge commit, even though there _is_ parent number 1 for such a commit. This was done to avoid mistakes back when "cherry-pick" was about picking a single commit, but is no longer useful with "cherry-pick" that can pick a range of commits. Now the "-m$num" option is allowed when picking any commit, as long as $num names an existing parent of the commit. Technically this is a backward incompatible change; hopefully nobody is relying on the error-checking behaviour. Will merge to 'next'. * cy/completion-typofix (2019-01-03) 1 commit - completion: fix typo in git-completion.bash Typofix. Will merge to 'next'. * cy/zsh-completion-SP-in-path (2019-01-03) 2 commits - completion: treat results of git ls-tree as file paths - zsh: complete unquoted paths with spaces correctly With zsh, "git cmd path<TAB>" was completed to "git cmd path name" when the completed path has a special character like SP in it, without any attempt to keep "path name" a single filename. This has been fixed to complete it to "git cmd path\ name" just like Bash completion does. Will merge to 'next'. * ds/commit-graph-assert-missing-parents (2019-01-02) 1 commit - commit-graph: writing missing parents is a BUG Tightening error checking in commit-graph writer. Will merge to 'next'. * ed/simplify-setup-git-dir (2019-01-03) 1 commit - Simplify handling of setup_git_directory_gently() failure cases. Code simplification. Will merge to 'next'. * es/doc-worktree-guessremote-config (2018-12-28) 1 commit - doc/config: do a better job of introducing 'worktree.guessRemote' Doc clarification. Will merge to 'next'. * ew/ban-strncat (2019-01-02) 1 commit - banned.h: mark strncat() as banned The "strncat()" function is now among the banned functions. Will merge to 'next'. * jk/dev-build-format-security (2019-01-07) 1 commit - config.mak.dev: add -Wall, primarily for -Wformat, to help autoconf users Earlier we added "-Wformat-security" to developer builds, assuming that "-Wall" (which includes "-Wformat" which in turn is required to use "-Wformat-security") is always in effect. This is not true when config.mak.autogen is in use, unfortunately. This has been fixed by unconditionally adding "-Wall" to developer builds. Will merge to 'next'. * jp/author-committer-config (2019-01-02) 2 commits - DONTMERGE - Add author and committer configuration settings Four new configuration variables {author,committer}.{name,email} have been introduced to override user.{name,email} in more specific cases. Expecting a reroll. cf. <xmqq1s5uk6qh.fsf@gitster-ct.c.googlers.com> * js/rebase-am (2019-01-03) 4 commits - built-in rebase: call `git am` directly - rebase: teach `reset_head()` to optionally skip the worktree - rebase: avoid double reflog entry when switching branches - rebase: move `reset_head()` into a better spot Instead of going through "git-rebase--am" scriptlet to use the "am" backend, the built-in version of "git rebase" learned to drive the "am" backend directly. Waiting for a review response. Looked almost ready. * ms/packet-err-check (2019-01-02) 2 commits - pack-protocol.txt: accept error packets in any context - Use packet_reader instead of packet_read_line Error checking of data sent over the pack-protocol has been revamped so that error packets are always diagnosed properly. * os/rebase-runs-post-checkout-hook (2019-01-02) 2 commits - rebase: run post-checkout hook on checkout - t5403: simplify by using a single repository "git rebase" internally runs "checkout" to switch between branches, and the command used to call the post-checkout hook, but the reimplementation stopped doing so, which is getting fixed. * rb/hpe (2019-01-03) 5 commits - compat/regex/regcomp.c: define intptr_t and uintptr_t on NonStop - git-compat-util.h: add FLOSS headers for HPE NonStop - config.mak.uname: support for modern HPE NonStop config. - transport-helper: drop read/write errno checks - transport-helper: use xread instead of read Portability updates for the HPE NonStop platform. Will merge to 'next'. * sg/test-bash-version-fix (2019-01-03) 2 commits - Merge branch 'sg/test-bash-version-fix' - test-lib: check Bash version for '-x' without using shell arrays (this branch is used by sg/stress-test.) The test suite tried to see if it is run under bash, but the check itself failed under some other implementations of shell (notably under NetBSD). This has been corrected. Will merge to 'next'. * sm/http-no-more-failonerror (2019-01-03) 2 commits - Unset CURLOPT_FAILONERROR - Change how HTTP response body is returned Waiting for clarifications. * tg/t5570-drop-racy-test (2019-01-07) 2 commits - Revert "t/lib-git-daemon: record daemon log" - t5570: drop racy test An inherently racy test that caused intermittent failures has been removed. Will merge to 'next'. * tt/bisect-in-c (2019-01-02) 7 commits - bisect--helper: `bisect_start` shell function partially in C - bisect--helper: `get_terms` & `bisect_terms` shell function in C - bisect--helper: `bisect_next_check` shell function in C - bisect--helper: `check_and_set_terms` shell function in C - wrapper: move is_empty_file() and rename it as is_empty_or_missing_file() - bisect--helper: `bisect_write` shell function in C - bisect--helper: `bisect_reset` shell function in C More code in "git bisect" has been rewritten in C. * ja/doc-build-l10n (2019-01-07) 1 commit - Documentation/Makefile add optional targets for l10n Prepare Documentation/Makefile so that manpage localization can reuse it by overriding and tweaking the list of build products. Will merge to 'next'. * jk/loose-object-cache-oid (2019-01-07) 5 commits - sha1-file: modernize loose header/stream functions - sha1-file: modernize loose object file functions - http: use struct object_id instead of bare sha1 - update comment references to sha1_object_info() - sha1-file: fix outdated sha1 comment references (this branch uses rs/loose-object-cache-perffix.) Code clean-up. Will merge to 'next'. * mm/multimail-1.5 (2019-01-07) 1 commit - git-multimail: update to release 1.5.0 Update "git multimail" from the upstream. Will merge to 'next'. * po/git-p4-wo-login (2019-01-07) 1 commit - git-p4: fix problem when p4 login is not necessary "git p4" update. Will merge to 'next'. * rs/loose-object-cache-perffix (2019-01-07) 4 commits - object-store: retire odb_load_loose_cache() - object-store: use one oid_array per subdirectory for loose cache - object-store: factor out odb_clear_loose_cache() - object-store: factor out odb_loose_cache() (this branch is used by jk/loose-object-cache-oid.) The loose object cache used to optimize existence look-up has been updated. Will merge to 'next'. * rs/sha1-file-close-mapped-file-on-error (2019-01-07) 1 commit - sha1-file: close fd of empty file in map_sha1_file_1() Code clean-up. Will merge to 'next'. -------------------------------------------------- [Cooking] * lt/date-human (2019-01-02) 3 commits - t0006-date.sh: add `human` date format tests. - Add 'human' date format documentation - Add 'human' date format A new date format "--date=human" that morphs its output depending on how far the time is from the current time has been introduced. "--date=auto" can be used to use this new format when the output is goint to the pager or to the terminal and otherwise the default format. The design around "auto" may need to be rethought. The tests need to be updated, too. cf. <20190104075034.GA26014@sigill.intra.peff.net> cf. <a5412274-028f-3662-e4f5-dbbcad4d9a40@iee.org> * ds/midx-expire-repack (2019-01-02) 7 commits - midx: implement midx_repack() - multi-pack-index: prepare 'repack' subcommand - multi-pack-index: implement 'expire' verb - midx: refactor permutation logic - multi-pack-index: prepare for 'expire' subcommand - Docs: rearrange subcommands for multi-pack-index - repack: refactor pack deletion for future use * ds/push-sparse-tree-walk (2018-12-11) 6 commits - pack-objects: create GIT_TEST_PACK_SPARSE - pack-objects: create pack.useSparse setting - revision: implement sparse algorithm - pack-objects: add --sparse option - list-objects: consume sparse tree walk - revision: add mark_tree_uninteresting_sparse * js/rebase-i-redo-exec (2018-12-11) 3 commits - rebase: introduce a shortcut for --reschedule-failed-exec - rebase: add a config option to default to --reschedule-failed-exec - rebase: introduce --reschedule-failed-exec * md/list-objects-filter-by-depth (2018-12-28) 4 commits - tree:<depth>: skip some trees even when collecting omits - list-objects-filter: teach tree:# how to handle >0 - Merge branch 'nd/the-index' into md/list-objects-filter-by-depth - Merge branch 'sb/more-repo-in-api' into md/list-objects-filter-by-depth (this branch uses sb/more-repo-in-api; is tangled with jt/get-reference-with-commit-graph.) * nd/backup-log (2018-12-10) 24 commits - FIXME - rebase: keep backup of overwritten files on --skip or --abort - am: keep backup of overwritten files on --skip or --abort - checkout -f: keep backup of overwritten files - reset --hard: keep backup of overwritten files - unpack-trees.c: keep backup of ignored files being overwritten - refs: keep backup of deleted reflog - config --edit: support backup log - sha1-file.c: let index_path() accept NULL istate - backup-log: keep all blob references around - gc: prune backup logs - backup-log: add prune command - backup-log: add log command - backup-log: add diff command - backup-log: add cat command - backup-log.c: add API for walking backup log - add--interactive: support backup log - apply: support backup log with --keep-backup - commit: support backup log - update-index: support backup log with --keep-backup - add: support backup log - read-cache.c: new flag for add_index_entry() to write to backup log - backup-log: add "update" subcommand - doc: introduce new "backup log" concept * nd/style-opening-brace (2018-12-10) 1 commit - style: the opening '{' of a function is in a separate line Code clean-up. Will merge to 'next'. * sg/stress-test (2019-01-07) 8 commits - test-lib: add the '--stress' option to run a test repeatedly under load - test-lib-functions: introduce the 'test_set_port' helper function - test-lib: set $TRASH_DIRECTORY earlier - test-lib: consolidate naming of test-results paths - test-lib: parse command line options earlier - test-lib: parse options in a for loop to keep $@ intact - test-lib: extract Bash version check for '-x' tracing - test-lib: translate SIGTERM and SIGHUP to an exit (this branch uses sg/test-bash-version-fix.) Flaky tests can now be repeatedly run under load with the "--stress" option. Will merge to 'next'. * tg/checkout-no-overlay (2019-01-02) 8 commits - checkout: introduce checkout.overlayMode config - checkout: introduce --{,no-}overlay option - checkout: factor out mark_cache_entry_for_checkout function - checkout: clarify comment - read-cache: add invalidate parameter to remove_marked_cache_entries - entry: support CE_WT_REMOVE flag in checkout_entry - entry: factor out unlink_entry function - move worktree tests to t24* "git checkout --no-overlay" can be used to trigger a new mode of checking out paths out of the tree-ish, that allows paths that match the pathspec that are in the current index and working tree and are not in the tree-ish. Will merge to 'next'. * jk/proto-v2-hidden-refs-fix (2018-12-14) 3 commits - upload-pack: support hidden refs with protocol v2 - parse_hide_refs_config: handle NULL section - serve: pass "config context" through to individual commands The v2 upload-pack protocol implementation failed to honor hidden-ref configuration, which has been corrected. Will merge to 'next'. * la/quiltimport-keep-non-patch (2018-12-14) 1 commit - git-quiltimport: Add --keep-non-patch option "git quiltimport" learned "--keep-non-patch" option. Will merge to 'next'. * sb/submodule-fetchjobs-default-to-one (2018-12-14) 1 commit - submodule update: run at most one fetch job unless otherwise set "git submodule update" ought to use a single job unless asked, but by mistake used multiple jobs, which has been fixed. Will merge to 'next'. * cb/openbsd-allows-reading-directory (2018-12-03) 1 commit (merged to 'next' on 2019-01-04 at 9d865107fd) + config.mak.uname: OpenBSD uses BSD semantics with fread for directories BSD port update. Will merge to 'master'. * cb/t5004-empty-tar-archive-fix (2018-12-03) 1 commit (merged to 'next' on 2019-01-04 at 39f4cf94ce) + t5004: avoid using tar for empty packages BSD port update. Will merge to 'master'. * cb/test-lint-cp-a (2018-12-03) 1 commit (merged to 'next' on 2019-01-04 at d13e6cfcb0) + tests: add lint for non portable cp -a BSD port update. Will merge to 'master'. * hb/t0061-dot-in-path-fix (2018-12-03) 1 commit (merged to 'next' on 2019-01-04 at 789f990c4e) + t0061: do not fail test if '.' is part of $PATH Test update. Will merge to 'master'. * hn/highlight-sideband-keywords (2018-12-04) 1 commit (merged to 'next' on 2019-01-04 at b039601533) + sideband: color lines with keyword only Lines that begin with a certain keyword that come over the wire, as well as lines that consist only of one of these keywords, ought to be painted in color for easier eyeballing, but the latter was broken ever since the feature was introduced in 2.19, which has been corrected. Will merge to 'master'. * js/commit-graph-chunk-table-fix (2018-12-14) 3 commits - Makefile: correct example fuzz build - commit-graph: fix buffer read-overflow - commit-graph, fuzz: add fuzzer for commit-graph The codepath to read from the commit-graph file attempted to read past the end of it when the file's table-of-contents was corrupt. * jt/get-reference-with-commit-graph (2018-12-28) 1 commit - revision: use commit graph in get_reference() (this branch uses sb/more-repo-in-api; is tangled with md/list-objects-filter-by-depth.) Micro-optimize the code that prepares commit objects to be walked by "git rev-list" when the commit-graph is available. Needs to be rebuilt when sb/more-repo-in-api is rewound. * md/exclude-promisor-objects-fix-cleanup (2018-12-06) 1 commit (merged to 'next' on 2019-01-04 at c15362832d) + revision.c: put promisor option in specialized struct Code clean-up. Will merge to 'master'. * bw/mailmap (2018-12-09) 1 commit (merged to 'next' on 2019-01-04 at 02b6e83231) + mailmap: update brandon williams's email address Will merge to 'master'. * do/gitweb-strict-export-conf-doc (2018-12-09) 1 commit (merged to 'next' on 2019-01-04 at 5249c9386a) + docs: fix $strict_export text in gitweb.conf.txt Doc update. Will merge to 'master'. * en/directory-renames-nothanks-doc-update (2018-12-09) 1 commit (merged to 'next' on 2019-01-04 at cb7134b54c) + git-rebase.txt: update note about directory rename detection and am Doc update. Will merge to 'master'. * fd/gitweb-snapshot-conf-doc-fix (2018-12-09) 1 commit (merged to 'next' on 2019-01-04 at 7ba71fca17) + docs/gitweb.conf: config variable typo Doc update. Will merge to 'master'. * km/rebase-doc-typofix (2018-12-10) 1 commit (merged to 'next' on 2019-01-04 at c89f646e8f) + rebase docs: drop stray word in merge command description Doc update. Will merge to 'master'. * nd/indentation-fix (2018-12-09) 1 commit (merged to 'next' on 2019-01-04 at 738b17d365) + Indent code with TABs Code cleanup. Will merge to 'master'. * tb/use-common-win32-pathfuncs-on-cygwin (2018-12-26) 1 commit (merged to 'next' on 2019-01-04 at c3b2b1f3c3) + git clone <url> C:\cygwin\home\USER\repo' is working (again) Cygwin update. Will merge to 'master'. * tb/log-G-binary (2018-12-26) 1 commit (merged to 'next' on 2019-01-04 at a713ef389c) + log -G: ignore binary files "git log -G<regex>" looked for a hunk in the "git log -p" patch output that contained a string that matches the given pattern. Optimize this code to ignore binary files, which by default will not show any hunk that would match any pattern (unless textconv or the --text option is in effect, that is). Will merge to 'master'. * dl/merge-cleanup-scissors-fix (2018-11-21) 2 commits - merge: add scissors line on merge conflict - t7600: clean up 'merge --squash c3 with c7' test The list of conflicted paths shown in the editor while concluding a conflicted merge was shown above the scissors line when the clean-up mode is set to "scissors", even though it was commented out just like the list of updated paths and other information to help the user explain the merge better. Kicked out of 'next', to replace with a newer iteration. cf. <cover.1545745331.git.liu.denton@gmail.com> * aw/pretty-trailers (2018-12-09) 7 commits - pretty: add support for separator option in %(trailers) - strbuf: separate callback for strbuf_expand:ing literals - pretty: add support for "valueonly" option in %(trailers) - pretty: allow showing specific trailers - pretty: single return path in %(trailers) handling - pretty: allow %(trailers) options with explicit value - doc: group pretty-format.txt placeholders descriptions The %(trailers) formatter in "git log --format=..." now allows to optionally pick trailers selectively by keyword, show only values, etc. How's the doneness of this one? * nd/attr-pathspec-in-tree-walk (2018-11-19) 5 commits (merged to 'next' on 2019-01-04 at 6a07e5b905) + tree-walk: support :(attr) matching + dir.c: move, rename and export match_attrs() + pathspec.h: clean up "extern" in function declarations + tree-walk.c: make tree_entry_interesting() take an index + tree.c: make read_tree*() take 'struct repository *' The traversal over tree objects has learned to honor ":(attr:label)" pathspec match, which has been implemented only for enumerating paths on the filesystem. Will merge to 'master'. * ab/commit-graph-progress-fix (2018-11-20) 1 commit (merged to 'next' on 2019-01-04 at 405a1a2cf5) + commit-graph: split up close_reachable() progress output Will merge to 'master'. * jn/unknown-index-extensions (2018-11-21) 2 commits - index: offer advice for unknown index extensions - index: do not warn about unrecognized extensions A bit too alarming warning given when unknown index extensions exist is getting revamped. Expecting a reroll. * nd/checkout-noisy (2018-11-20) 2 commits (merged to 'next' on 2019-01-04 at 480172d3d7) + t0027: squelch checkout path run outside test_expect_* block + checkout: print something when checking out paths "git checkout [<tree-ish>] path..." learned to report the number of paths that have been checked out of the index or the tree-ish, which gives it the same degree of noisy-ness as the case in which the command checks out a branch. Will merge to 'master'. * en/rebase-merge-on-sequencer (2019-01-07) 8 commits - rebase: implement --merge via the interactive machinery - rebase: define linearization ordering and enforce it - git-legacy-rebase: simplify unnecessary triply-nested if - git-rebase, sequencer: extend --quiet option for the interactive machinery - am, rebase--merge: do not overlook --skip'ed commits with post-rewrite - t5407: add a test demonstrating how interactive handles --skip differently - rebase: fix incompatible options error message - rebase: make builtin and legacy script error messages the same "git rebase --merge" as been reimplemented by reusing the internal machinery used for "git rebase -i". * dl/remote-save-to-push (2018-12-11) 1 commit - remote: add --save-to-push option to git remote set-url "git remote set-url" learned a new option that moves existing value of the URL field to pushURL field of the remote before replacing the URL field with a new value. I am personally not yet quite convinced if this is worth pursuing. * js/protocol-advertise-multi (2018-12-28) 1 commit - protocol: advertise multiple supported versions The transport layer has been updated so that the protocol version used can be negotiated between the parties, by the initiator listing the protocol versions it is willing to talk, and the other side choosing from one of them. * js/smart-http-detect-remote-error (2019-01-07) 3 commits - remote-curl: die on server-side errors - remote-curl: tighten "version 2" check for smart-http - remote-curl: refactor smart-http discovery Some errors from the other side coming over smart HTTP transport were not noticed, which has been corrected. * nb/branch-show-other-worktrees-head (2019-01-07) 4 commits - branch: add an extra verbose output displaying worktree path for checked out branch - branch: mark and color a branch that is checked out in a linked worktree differently - SQUASH??? - ref-filter: add worktreepath atom "git branch --list" learned to show branches that are checked out in other worktrees connected to the same repository prefixed with '+', similar to the way the currently checked out branch is shown with '*' in front. * ot/ref-filter-object-info (2018-12-28) 6 commits - ref-filter: add docs for new options - ref-filter: add tests for deltabase - ref-filter: add deltabase option - ref-filter: add tests for objectsize:disk - ref-filter: add check for negative file size - ref-filter: add objectsize:disk option The "--format=<placeholder>" option of for-each-ref, branch and tag learned to show a few more traits of objects that can be learned by the object_info API. Will merge to 'next'. * sb/diff-color-moved-config-option-fixup (2018-11-14) 1 commit (merged to 'next' on 2019-01-04 at 46de5f42d1) + diff: align move detection error handling with other options Minor inconsistency fix. Will merge to 'master'. * md/list-lazy-objects-fix (2018-12-06) 1 commit (merged to 'next' on 2019-01-04 at 93bd38fff9) + list-objects.c: don't segfault for missing cmdline objects "git rev-list --exclude-promisor-objects" had to take an object that does not exist locally (and is lazily available) from the command line without barfing, but the code dereferenced NULL. Will merge to 'master'. * sb/more-repo-in-api (2018-12-28) 23 commits - t/helper/test-repository: celebrate independence from the_repository - path.h: make REPO_GIT_PATH_FUNC repository agnostic - commit: prepare free_commit_buffer and release_commit_memory for any repo - commit-graph: convert remaining functions to handle any repo - submodule: don't add submodule as odb for push - submodule: use submodule repos for object lookup - pretty: prepare format_commit_message to handle arbitrary repositories - commit: prepare logmsg_reencode to handle arbitrary repositories - commit: prepare repo_unuse_commit_buffer to handle any repo - commit: prepare get_commit_buffer to handle any repo - commit-reach: prepare in_merge_bases[_many] to handle any repo - commit-reach: prepare get_merge_bases to handle any repo - commit-reach.c: allow get_merge_bases_many_0 to handle any repo - commit-reach.c: allow remove_redundant to handle any repo - commit-reach.c: allow merge_bases_many to handle any repo - commit-reach.c: allow paint_down_to_common to handle any repo - commit: allow parse_commit* to handle any repo - object: parse_object to honor its repository argument - object-store: prepare has_{sha1, object}_file to handle any repo - object-store: prepare read_object_file to deal with any repo - object-store: allow read_object_file_extended to read from any repo - packfile: allow has_packed_and_bad to handle arbitrary repositories - sha1_file: allow read_object to read objects in arbitrary repositories (this branch is used by jt/get-reference-with-commit-graph and md/list-objects-filter-by-depth.) The in-core repository instances are passed through more codepaths. * bc/sha-256 (2018-11-14) 12 commits - hash: add an SHA-256 implementation using OpenSSL - sha256: add an SHA-256 implementation using libgcrypt - Add a base implementation of SHA-256 support - commit-graph: convert to using the_hash_algo - t/helper: add a test helper to compute hash speed - sha1-file: add a constant for hash block size - t: make the sha1 test-tool helper generic - t: add basic tests for our SHA-1 implementation - cache: make hashcmp and hasheq work with larger hashes - hex: introduce functions to print arbitrary hashes - sha1-file: provide functions to look up hash algorithms - sha1-file: rename algorithm to "sha1" Add sha-256 hash and plug it through the code to allow building Git with the "NewHash". * js/vsts-ci (2018-10-16) 13 commits . travis: fix skipping tagged releases . README: add a build badge (status of the Azure Pipelines build) . tests: record more stderr with --write-junit-xml in case of failure . tests: include detailed trace logs with --write-junit-xml upon failure . git-p4: use `test_atexit` to kill the daemon . git-daemon: use `test_atexit` in the tests . tests: introduce `test_atexit` . ci: add a build definition for Azure DevOps . ci/lib.sh: add support for Azure Pipelines . tests: optionally write results as JUnit-style .xml . test-date: add a subcommand to measure times in shell scripts . ci/lib.sh: encapsulate Travis-specific things . ci: rename the library of common functions Prepare to run test suite on Azure DevOps. Ejected out of 'pu', as doing so seems to help other topics get tested at TravisCI. https://travis-ci.org/git/git/builds/452713184 is a sample of a build whose tests on 4 hang (with this series in). Ejecting it gave us https://travis-ci.org/git/git/builds/452778963 which still shows breakages from other topics not yet in 'next', but at least the tests do not stall. * du/branch-show-current (2018-10-26) 1 commit - branch: introduce --show-current display option "git branch" learned a new subcommand "--show-current". I am personally not yet quite convinced if this is worth pursuing. * mk/use-size-t-in-zlib (2018-10-15) 1 commit - zlib.c: use size_t for size The wrapper to call into zlib followed our long tradition to use "unsigned long" for sizes of regions in memory, which have been updated to use "size_t". * ag/sequencer-reduce-rewriting-todo (2018-11-12) 16 commits . rebase--interactive: move transform_todo_file() to rebase--interactive.c . sequencer: fix a call to error() in transform_todo_file() . sequencer: use edit_todo_list() in complete_action() . rebase-interactive: rewrite edit_todo_list() to handle the initial edit . rebase-interactive: append_todo_help() changes . rebase-interactive: use todo_list_write_to_file() in edit_todo_list() . sequencer: refactor skip_unnecessary_picks() to work on a todo_list . sequencer: change complete_action() to use the refactored functions . sequencer: make sequencer_make_script() write its script to a strbuf . sequencer: refactor rearrange_squash() to work on a todo_list . sequencer: refactor sequencer_add_exec_commands() to work on a todo_list . sequencer: refactor check_todo_list() to work on a todo_list . sequencer: introduce todo_list_write_to_file() . sequencer: refactor transform_todos() to work on a todo_list . sequencer: make the todo_list structure public . sequencer: changes in parse_insn_buffer() The scripted version of "git rebase -i" wrote and rewrote the todo list many times during a single step of its operation, and the recent C-rewrite made a faithful conversion of the logic to C. The implementation has been updated to carry necessary information around in-core to avoid rewriting the same file over and over unnecessarily. With too many topics in-flight that touch sequencer and rebaser, this need to wait giving precedence to other topics that fix bugs. * sb/submodule-recursive-fetch-gets-the-tip (2018-12-09) 9 commits - fetch: ensure submodule objects fetched - submodule.c: fetch in submodules git directory instead of in worktree - submodule: migrate get_next_submodule to use repository structs - repository: repo_submodule_init to take a submodule struct - submodule: store OIDs in changed_submodule_names - submodule.c: tighten scope of changed_submodule_names struct - submodule.c: sort changed_submodule_names before searching it - submodule.c: fix indentation - sha1-array: provide oid_array_filter "git fetch --recurse-submodules" may not fetch the necessary commit that is bound to the superproject, which is getting corrected. Ready? * js/add-i-coalesce-after-editing-hunk (2018-08-28) 1 commit - add -p: coalesce hunks before testing applicability Applicability check after a patch is edited in a "git add -i/p" session has been improved. Will hold. cf. <e5b2900a-0558-d3bf-8ea1-d526b078bbc2@talktalk.net> * ps/stash-in-c (2019-01-04) 27 commits - tests: add a special setup where stash.useBuiltin is off - stash: optionally use the scripted version again - stash: add back the original, scripted `git stash` - stash: convert `stash--helper.c` into `stash.c` - stash: replace all `write-tree` child processes with API calls - stash: optimize `get_untracked_files()` and `check_changes()` - stash: convert save to builtin - stash: make push -q quiet - stash: convert push to builtin - stash: convert create to builtin - stash: convert store to builtin - stash: convert show to builtin - stash: convert list to builtin - stash: convert pop to builtin - stash: convert branch to builtin - stash: convert drop and clear to builtin - stash: convert apply to builtin - stash: mention options in `show` synopsis - stash: add tests for `git stash show` config - stash: rename test cases to be more descriptive - t3903: modernize style - stash: improve option parsing test coverage - ident: add the ability to provide a "fallback identity" - strbuf.c: add `strbuf_insertf()` and `strbuf_vinsertf()` - strbuf.c: add `strbuf_join_argv()` - sha1-name.c: add `get_oidf()` which acts like `get_oid()` - Merge branch 'sd/stash-wo-user-name' "git stash" rewritten in C. * pw/add-p-select (2018-07-26) 4 commits - add -p: optimize line selection for short hunks - add -p: allow line selection to be inverted - add -p: select modified lines correctly - add -p: select individual hunk lines "git add -p" interactive interface learned to let users choose individual added/removed lines to be used in the operation, instead of accepting or rejecting a whole hunk. Will discard. No further feedbacks on the topic for quite some time. cf. <d622a95b-7302-43d4-4ec9-b2cf3388c653@talktalk.net> I found the feature to be hard to explain, and may result in more end-user complaints, but let's see. -------------------------------------------------- [Discarded] * ab/reject-alias-loop (2018-10-19) 1 commit . alias: detect loops in mixed execution mode Two (or more) aliases that mutually refer to each other can form an infinite loop; we now attempt to notice and stop. Discarded. Reverted out of 'next'. cf. <87sh0slvxm.fsf@evledraar.gmail.com> * gl/bundle-unlock-before-aborting (2018-11-14) 1 commit . bundle: rollback lock file while refusing to create an empty bundle Superseded by jk/close-duped-fd-before-unlock-for-bundle * js/remote-archive-v2 (2018-09-28) 4 commits . archive: allow archive over HTTP(S) with proto v2 . archive: implement protocol v2 archive command . archive: use packet_reader for communications . archive: follow test standards around assertions The original implementation of "git archive --remote" more or less bypassed the transport layer and did not work over http(s). The version 2 of the protocol is defined to allow going over http(s) as well as Git native transport. Retracted; reverted out of next. cf. <20181114195142.GI126896@google.com> * ab/format-patch-rangediff-not-stat (2018-11-24) 1 commit . format-patch: don't include --stat with --range-diff output The "--rangediff" option recently added to "format-patch" interspersed a bogus and useless "--stat" information by mistake, which is being corrected. Reverted out of 'next'. * jc/postpone-rebase-in-c (2018-11-26) 1 commit . rebase: mark the C reimplementation as an experimental opt-in feature People seem to be still finding latent bugs in the "rebase in C" reimplementation. For the upcoming release, use the scripted version by default and adjust the documentation accordingly. Reverted out of 'next'. ^ permalink raw reply [flat|nested] 41+ messages in thread
* tg/checkout-no-overlay, was Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano @ 2019-01-08 9:50 ` Thomas Gummerer 2019-01-08 17:51 ` Junio C Hamano 2019-01-08 17:30 ` ag/sequencer-reduce-rewriting-todo " Alban Gruin ` (4 subsequent siblings) 5 siblings, 1 reply; 41+ messages in thread From: Thomas Gummerer @ 2019-01-08 9:50 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git Mailing List On Mon, Jan 7, 2019 at 11:34 PM Junio C Hamano <gitster@pobox.com> wrote: > * tg/checkout-no-overlay (2019-01-02) 8 commits > - checkout: introduce checkout.overlayMode config > - checkout: introduce --{,no-}overlay option > - checkout: factor out mark_cache_entry_for_checkout function > - checkout: clarify comment > - read-cache: add invalidate parameter to remove_marked_cache_entries > - entry: support CE_WT_REMOVE flag in checkout_entry > - entry: factor out unlink_entry function > - move worktree tests to t24* > > "git checkout --no-overlay" can be used to trigger a new mode of > checking out paths out of the tree-ish, that allows paths that > match the pathspec that are in the current index and working tree > and are not in the tree-ish. > > Will merge to 'next'. Please hold off on merging this to 'next'. There's still an outstanding comment from Duy and Eric [*1*], that should be addressed before this goes to next. (Their comments also apply to the documentation in 8/8). I'll send v3 of the series with this fixed today. *1*: <CAPig+cSOyCQZXiG7sJWb12WzzujM-nsqqpt+cFZTFvXB1+-SVQ@mail.gmail.com> ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: tg/checkout-no-overlay, was Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-08 9:50 ` tg/checkout-no-overlay, was " Thomas Gummerer @ 2019-01-08 17:51 ` Junio C Hamano 0 siblings, 0 replies; 41+ messages in thread From: Junio C Hamano @ 2019-01-08 17:51 UTC (permalink / raw) To: Thomas Gummerer; +Cc: Git Mailing List Thomas Gummerer <t.gummerer@gmail.com> writes: > On Mon, Jan 7, 2019 at 11:34 PM Junio C Hamano <gitster@pobox.com> wrote: >> * tg/checkout-no-overlay (2019-01-02) 8 commits >> - checkout: introduce checkout.overlayMode config >> - checkout: introduce --{,no-}overlay option >> - checkout: factor out mark_cache_entry_for_checkout function >> - checkout: clarify comment >> - read-cache: add invalidate parameter to remove_marked_cache_entries >> - entry: support CE_WT_REMOVE flag in checkout_entry >> - entry: factor out unlink_entry function >> - move worktree tests to t24* >> >> "git checkout --no-overlay" can be used to trigger a new mode of >> checking out paths out of the tree-ish, that allows paths that >> match the pathspec that are in the current index and working tree >> and are not in the tree-ish. >> >> Will merge to 'next'. > > Please hold off on merging this to 'next'. Thanks, will do so. ^ permalink raw reply [flat|nested] 41+ messages in thread
* ag/sequencer-reduce-rewriting-todo Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano 2019-01-08 9:50 ` tg/checkout-no-overlay, was " Thomas Gummerer @ 2019-01-08 17:30 ` Alban Gruin 2019-01-08 21:20 ` sb/more-repo-in-api, was " Jonathan Tan ` (3 subsequent siblings) 5 siblings, 0 replies; 41+ messages in thread From: Alban Gruin @ 2019-01-08 17:30 UTC (permalink / raw) To: Junio C Hamano, git Hi Junio, Le 08/01/2019 à 00:34, Junio C Hamano a écrit : > * ag/sequencer-reduce-rewriting-todo (2018-11-12) 16 commits > . rebase--interactive: move transform_todo_file() to rebase--interactive.c > . sequencer: fix a call to error() in transform_todo_file() > . sequencer: use edit_todo_list() in complete_action() > . rebase-interactive: rewrite edit_todo_list() to handle the initial edit > . rebase-interactive: append_todo_help() changes > . rebase-interactive: use todo_list_write_to_file() in edit_todo_list() > . sequencer: refactor skip_unnecessary_picks() to work on a todo_list > . sequencer: change complete_action() to use the refactored functions > . sequencer: make sequencer_make_script() write its script to a strbuf > . sequencer: refactor rearrange_squash() to work on a todo_list > . sequencer: refactor sequencer_add_exec_commands() to work on a todo_list > . sequencer: refactor check_todo_list() to work on a todo_list > . sequencer: introduce todo_list_write_to_file() > . sequencer: refactor transform_todos() to work on a todo_list > . sequencer: make the todo_list structure public > . sequencer: changes in parse_insn_buffer() > > The scripted version of "git rebase -i" wrote and rewrote the todo > list many times during a single step of its operation, and the > recent C-rewrite made a faithful conversion of the logic to C. The > implementation has been updated to carry necessary information > around in-core to avoid rewriting the same file over and over > unnecessarily. > > With too many topics in-flight that touch sequencer and rebaser, > this need to wait giving precedence to other topics that fix bugs. > > I submitted a new version of this topic a week ago, you may have missed it: <20181229160413.19333-1-alban.gruin@gmail.com>. Cheers, Alban ^ permalink raw reply [flat|nested] 41+ messages in thread
* sb/more-repo-in-api, was Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano 2019-01-08 9:50 ` tg/checkout-no-overlay, was " Thomas Gummerer 2019-01-08 17:30 ` ag/sequencer-reduce-rewriting-todo " Alban Gruin @ 2019-01-08 21:20 ` Jonathan Tan 2019-01-08 21:35 ` Junio C Hamano 2019-01-09 7:37 ` Martin Ågren ` (2 subsequent siblings) 5 siblings, 1 reply; 41+ messages in thread From: Jonathan Tan @ 2019-01-08 21:20 UTC (permalink / raw) To: gitster; +Cc: git, stolee, Jonathan Tan > * sb/more-repo-in-api (2018-12-28) 23 commits > - t/helper/test-repository: celebrate independence from the_repository > - path.h: make REPO_GIT_PATH_FUNC repository agnostic > - commit: prepare free_commit_buffer and release_commit_memory for any repo > - commit-graph: convert remaining functions to handle any repo > - submodule: don't add submodule as odb for push > - submodule: use submodule repos for object lookup > - pretty: prepare format_commit_message to handle arbitrary repositories > - commit: prepare logmsg_reencode to handle arbitrary repositories > - commit: prepare repo_unuse_commit_buffer to handle any repo > - commit: prepare get_commit_buffer to handle any repo > - commit-reach: prepare in_merge_bases[_many] to handle any repo > - commit-reach: prepare get_merge_bases to handle any repo > - commit-reach.c: allow get_merge_bases_many_0 to handle any repo > - commit-reach.c: allow remove_redundant to handle any repo > - commit-reach.c: allow merge_bases_many to handle any repo > - commit-reach.c: allow paint_down_to_common to handle any repo > - commit: allow parse_commit* to handle any repo > - object: parse_object to honor its repository argument > - object-store: prepare has_{sha1, object}_file to handle any repo > - object-store: prepare read_object_file to deal with any repo > - object-store: allow read_object_file_extended to read from any repo > - packfile: allow has_packed_and_bad to handle arbitrary repositories > - sha1_file: allow read_object to read objects in arbitrary repositories > (this branch is used by jt/get-reference-with-commit-graph and md/list-objects-filter-by-depth.) > > The in-core repository instances are passed through more codepaths. I think this is ready to be considered for merging to next. This series looks good both to Stolee [1] and to me (I replied to a previous version with comments on patch 18 [2] which Stefan has addressed, as can be seen in the inter-diff provided by Junio [3] - I probably should have replied to the latest version stating this too). [1] https://public-inbox.org/git/b2ff842d-4d60-0db7-c11d-dcc006dade18@gmail.com/ [2] https://public-inbox.org/git/20181115221254.45373-1-jonathantanmy@google.com/ [3] https://public-inbox.org/git/xmqq36qho1gf.fsf@gitster-ct.c.googlers.com/ ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sb/more-repo-in-api, was Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-08 21:20 ` sb/more-repo-in-api, was " Jonathan Tan @ 2019-01-08 21:35 ` Junio C Hamano 2019-01-09 21:28 ` Stefan Beller 0 siblings, 1 reply; 41+ messages in thread From: Junio C Hamano @ 2019-01-08 21:35 UTC (permalink / raw) To: Jonathan Tan; +Cc: git, stolee Jonathan Tan <jonathantanmy@google.com> writes: >> The in-core repository instances are passed through more codepaths. > > I think this is ready to be considered for merging to next. This series looks > good both to Stolee [1] and to me (I replied to a previous version with > comments on patch 18 [2] which Stefan has addressed, as can be seen in the > inter-diff provided by Junio [3] - I probably should have replied to the latest > version stating this too). Alright, thanks. While attempting to resolve conflicts with this and Peff's "get rid of has_sha1_file()" topic, I found it somewhat disturbing that sha1-file.c still was a mixture of functions that do and do not take the repository pointer, but perhaps it's something we need to polish over time on top. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sb/more-repo-in-api, was Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-08 21:35 ` Junio C Hamano @ 2019-01-09 21:28 ` Stefan Beller 0 siblings, 0 replies; 41+ messages in thread From: Stefan Beller @ 2019-01-09 21:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonathan Tan, git, Derrick Stolee On Tue, Jan 8, 2019 at 1:36 PM Junio C Hamano <gitster@pobox.com> wrote: > > Jonathan Tan <jonathantanmy@google.com> writes: > > >> The in-core repository instances are passed through more codepaths. > > > > I think this is ready to be considered for merging to next. This series looks > > good both to Stolee [1] and to me (I replied to a previous version with > > comments on patch 18 [2] which Stefan has addressed, as can be seen in the > > inter-diff provided by Junio [3] - I probably should have replied to the latest > > version stating this too). > > Alright, thanks. > > While attempting to resolve conflicts with this and Peff's "get rid > of has_sha1_file()" topic, I found it somewhat disturbing that > sha1-file.c still was a mixture of functions that do and do not take > the repository pointer, I agree that the current state is less than optimal, but that comes down to the workflow? Most of the later series regarding the repo refactoring are crafting some minimal set of changes that make sense to reach one specific goal. It would have been easier if we could have refactored the whole code base at once. > but perhaps it's something we need to polish > over time on top. Yeah, that is my current take. And best we'd have more patches as "t/helper/test-repository: celebrate independence from the_repository" to make sure we don't have bugs creeping into the code base. Thanks, Stefan ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano ` (2 preceding siblings ...) 2019-01-08 21:20 ` sb/more-repo-in-api, was " Jonathan Tan @ 2019-01-09 7:37 ` Martin Ågren 2019-01-09 21:06 ` Martin Ågren 2019-01-09 10:28 ` What's cooking in git.git (Jan 2019, #01; Mon, 7) Jeff King 2019-01-10 18:02 ` Stefan Beller 5 siblings, 1 reply; 41+ messages in thread From: Martin Ågren @ 2019-01-09 7:37 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git Mailing List, brian m. carlson On Tue, 8 Jan 2019 at 00:34, Junio C Hamano <gitster@pobox.com> wrote: > * bc/sha-256 (2018-11-14) 12 commits > - hash: add an SHA-256 implementation using OpenSSL > - sha256: add an SHA-256 implementation using libgcrypt > - Add a base implementation of SHA-256 support > - commit-graph: convert to using the_hash_algo > - t/helper: add a test helper to compute hash speed > - sha1-file: add a constant for hash block size > - t: make the sha1 test-tool helper generic > - t: add basic tests for our SHA-1 implementation > - cache: make hashcmp and hasheq work with larger hashes > - hex: introduce functions to print arbitrary hashes > - sha1-file: provide functions to look up hash algorithms > - sha1-file: rename algorithm to "sha1" > > Add sha-256 hash and plug it through the code to allow building Git > with the "NewHash". AddressSanitizer barks at current pu (855f98be272f19d16564e) for a handful of tests. One example is t5702-protocol-v2.sh. I've tracked this one down to ce1a82c251 ("Merge branch 'bc/sha-256' into jch", 2019-01-08). Reverting that merge makes the problem go away, at least in t5702. Notably bc/sha-256 on its own has no problems with t5702, possibly because there has been quite some work done on the test script itself in bc/sha-256..pu. There are a few failing tests with AddressSanitizer on pu and they might be caused by different topics. I've got to run, but thought I'd just mention this one for now. Here's AddressSanitizer's first complaint for t5702: ==1691823==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6040000004f2 at pc 0x0000004ea0fd bp 0x7ffc53082590 sp 0x7ffc53081d40 READ of size 32 at 0x6040000004f2 thread T0 #0 0x4ea0fc in __asan_memcpy llvm/projects/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cc:23 #1 0x8603ec in oidset_insert oidset.c #2 0x86c977 in add_promisor_object packfile.c:2129:4 #3 0x86c07a in for_each_object_in_pack packfile.c:2070:7 #4 0x86c535 in for_each_packed_object packfile.c:2095:7 #5 0x86c651 in is_promisor_object packfile.c:2151:4 #6 0x5ae77d in mark_object builtin/fsck.c:173:6 #7 0x5b0824 in mark_object_reachable builtin/fsck.c:200:2 #8 0x5b0824 in fsck_handle_ref builtin/fsck.c:509 #9 0x8e6987 in do_for_each_repo_ref_iterator refs/iterator.c:418:12 #10 0x8cb84f in do_for_each_ref refs.c:1466:9 #11 0x8cb84f in refs_for_each_rawref refs.c:1532 #12 0x8cb84f in for_each_rawref refs.c:1538 #13 0x5acd1f in get_default_heads builtin/fsck.c:524:2 #14 0x5acd1f in cmd_fsck builtin/fsck.c:846 #15 0x52f022 in run_builtin git.c:422:11 #16 0x52d583 in handle_builtin git.c:655:8 #17 0x52c00b in run_argv git.c:709:4 #18 0x52c00b in cmd_main git.c:806 #19 0x6c4569 in main common-main.c:45:9 #20 0x7f5fd6c23b96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310 #21 0x41c799 in _start (git+0x41c799) 0x6040000004f2 is located 0 bytes to the right of 34-byte region [0x6040000004d0,0x6040000004f2) allocated by thread T0 here: #0 0x4eb4cf in malloc llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:146 #1 0x9fa1db in do_xmalloc wrapper.c:60:8 #2 0x9fa2fd in do_xmallocz wrapper.c:100:8 #3 0x9fa2fd in xmallocz_gently wrapper.c:113 #4 0x86a877 in unpack_compressed_entry packfile.c:1588:11 #5 0x86a02e in unpack_entry packfile.c:1737:11 #6 0x867431 in cache_or_unpack_entry packfile.c:1439:10 #7 0x867431 in packed_object_info packfile.c:1506 #8 0x96b7be in oid_object_info_extended sha1-file.c:1394:10 #9 0x96d7d0 in read_object sha1-file.c:1434:6 #10 0x96d7d0 in read_object_file_extended sha1-file.c:1476 #11 0x85cf40 in repo_read_object_file ./object-store.h:174:9 #12 0x85cf40 in parse_object object.c:273 #13 0x86c752 in add_promisor_object packfile.c:2108:23 #14 0x86c07a in for_each_object_in_pack packfile.c:2070:7 #15 0x86c535 in for_each_packed_object packfile.c:2095:7 #16 0x86c651 in is_promisor_object packfile.c:2151:4 #17 0x5ae77d in mark_object builtin/fsck.c:173:6 #18 0x5b0824 in mark_object_reachable builtin/fsck.c:200:2 #19 0x5b0824 in fsck_handle_ref builtin/fsck.c:509 #20 0x8e6987 in do_for_each_repo_ref_iterator refs/iterator.c:418:12 #21 0x8cb84f in do_for_each_ref refs.c:1466:9 #22 0x8cb84f in refs_for_each_rawref refs.c:1532 #23 0x8cb84f in for_each_rawref refs.c:1538 #24 0x5acd1f in get_default_heads builtin/fsck.c:524:2 #25 0x5acd1f in cmd_fsck builtin/fsck.c:846 #26 0x52f022 in run_builtin git.c:422:11 #27 0x52d583 in handle_builtin git.c:655:8 #28 0x52c00b in run_argv git.c:709:4 #29 0x52c00b in cmd_main git.c:806 #30 0x6c4569 in main common-main.c:45:9 #31 0x7f5fd6c23b96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310 ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-09 7:37 ` Martin Ågren @ 2019-01-09 21:06 ` Martin Ågren 2019-01-10 1:02 ` brian m. carlson ` (2 more replies) 0 siblings, 3 replies; 41+ messages in thread From: Martin Ågren @ 2019-01-09 21:06 UTC (permalink / raw) To: Git Mailing List; +Cc: brian m. carlson, Junio C Hamano On Wed, 9 Jan 2019 at 08:37, Martin Ågren <martin.agren@gmail.com> wrote: > > On Tue, 8 Jan 2019 at 00:34, Junio C Hamano <gitster@pobox.com> wrote: > > * bc/sha-256 (2018-11-14) 12 commits > > - hash: add an SHA-256 implementation using OpenSSL > > - sha256: add an SHA-256 implementation using libgcrypt > > - Add a base implementation of SHA-256 support > > - commit-graph: convert to using the_hash_algo > > - t/helper: add a test helper to compute hash speed > > - sha1-file: add a constant for hash block size > > - t: make the sha1 test-tool helper generic > > - t: add basic tests for our SHA-1 implementation > > - cache: make hashcmp and hasheq work with larger hashes > > - hex: introduce functions to print arbitrary hashes > > - sha1-file: provide functions to look up hash algorithms > > - sha1-file: rename algorithm to "sha1" > > > > Add sha-256 hash and plug it through the code to allow building Git > > with the "NewHash". > > AddressSanitizer barks at current pu (855f98be272f19d16564e) for a > handful of tests. > > One example is t5702-protocol-v2.sh. [...] > > ==1691823==ERROR: AddressSanitizer: heap-buffer-overflow on address > 0x6040000004f2 at pc 0x0000004ea0fd bp 0x7ffc53082590 sp > 0x7ffc53081d40 > READ of size 32 at 0x6040000004f2 thread T0 > #0 0x4ea0fc in __asan_memcpy > llvm/projects/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cc:23 > #1 0x8603ec in oidset_insert oidset.c > #2 0x86c977 in add_promisor_object packfile.c:2129:4 > #3 0x86c07a in for_each_object_in_pack packfile.c:2070:7 > #4 0x86c535 in for_each_packed_object packfile.c:2095:7 > #5 0x86c651 in is_promisor_object packfile.c:2151:4 > 0x6040000004f2 is located 0 bytes to the right of 34-byte region > [0x6040000004d0,0x6040000004f2) > allocated by thread T0 here: > #0 0x4eb4cf in malloc > llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:146 > #1 0x9fa1db in do_xmalloc wrapper.c:60:8 > #2 0x9fa2fd in do_xmallocz wrapper.c:100:8 > #3 0x9fa2fd in xmallocz_gently wrapper.c:113 > #4 0x86a877 in unpack_compressed_entry packfile.c:1588:11 > #5 0x86a02e in unpack_entry packfile.c:1737:11 > #6 0x867431 in cache_or_unpack_entry packfile.c:1439:10 > #7 0x867431 in packed_object_info packfile.c:1506 > #8 0x96b7be in oid_object_info_extended sha1-file.c:1394:10 > #9 0x96d7d0 in read_object sha1-file.c:1434:6 > #10 0x96d7d0 in read_object_file_extended sha1-file.c:1476 > #11 0x85cf40 in repo_read_object_file ./object-store.h:174:9 > #12 0x85cf40 in parse_object object.c:273 > #13 0x86c752 in add_promisor_object packfile.c:2108:23 > #14 0x86c07a in for_each_object_in_pack packfile.c:2070:7 > #15 0x86c535 in for_each_packed_object packfile.c:2095:7 > #16 0x86c651 in is_promisor_object packfile.c:2151:4 I found some more time to look into this. It seems we have a buffer with raw data and we set up a `struct object_id *` pointing into it, at a (supposed) OID value. Then `update_tree_entry_internal()` verifies that the buffer contains sufficiently many bytes, i.e., at least `the_hash_algo->rawsz` (=20). We immediately call `oidset_insert()` which copies an entire struct, i.e., we copy sizeof(struct object_id) (=32) bytes. Which is 12 more than what is known to be safe. For this particular input data, we read outside allocated memory. I can think of three possible approaches: * Allocate with a margin (GIT_MAX_RAWSZ - the_hash_algo->rawsz) where "necessary" (TM). Maybe not so maintainable. * Teach `oidset_insert()` (i.e., khash) to only copy `the_hash_algo->rawsz` bytes. Maybe not so good for performance. * Ignore. I wonder which of these is the least awful, or if there are other ideas. Martin ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-09 21:06 ` Martin Ågren @ 2019-01-10 1:02 ` brian m. carlson 2019-01-10 18:55 ` Junio C Hamano 2019-01-10 19:03 ` Martin Ågren 2019-01-10 4:25 ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson 2019-01-15 0:39 ` [PATCH v2 " brian m. carlson 2 siblings, 2 replies; 41+ messages in thread From: brian m. carlson @ 2019-01-10 1:02 UTC (permalink / raw) To: Martin Ågren; +Cc: Git Mailing List, Junio C Hamano [-- Attachment #1: Type: text/plain, Size: 1964 bytes --] On Wed, Jan 09, 2019 at 10:06:08PM +0100, Martin Ågren wrote: > I found some more time to look into this. > > It seems we have a buffer with raw data and we set up a `struct > object_id *` pointing into it, at a (supposed) OID value. Then > `update_tree_entry_internal()` verifies that the buffer contains > sufficiently many bytes, i.e., at least `the_hash_algo->rawsz` (=20). > We immediately call `oidset_insert()` which copies an entire struct, > i.e., we copy sizeof(struct object_id) (=32) bytes. Which is 12 more > than what is known to be safe. For this particular input data, we read > outside allocated memory. Anything pointing to a struct object_id has to support at least GIT_MAX_RAWSZ bytes, and that code doesn't, because it's a tree buffer. I ran into this later on in my SHA-256 work and have a series that fixes the tree-walk code, but it's a bit involved and requires copying the struct object_id out of the buffer. I thought we were going to be triggering this case only with some new code I was introducing, but apparently somebody else got there first. > I can think of three possible approaches: > > * Allocate with a margin (GIT_MAX_RAWSZ - the_hash_algo->rawsz) where > "necessary" (TM). Maybe not so maintainable. I think there are actually several places where we allocate for these buffers, so this is not likely to be a great solution. Even worse, in some cases, we intentionally use a too-short buffer knowing that we'll never dereference the data. > * Teach `oidset_insert()` (i.e., khash) to only copy > `the_hash_algo->rawsz` bytes. Maybe not so good for performance. This is probably the best fix for the moment if you want an immediate fix. As for my series, I'll need to run the testsuite on it, but I'll try to get it out tonight or at the latest tomorrow if people want to use that instead. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 868 bytes --] ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-10 1:02 ` brian m. carlson @ 2019-01-10 18:55 ` Junio C Hamano 2019-01-10 19:03 ` Martin Ågren 1 sibling, 0 replies; 41+ messages in thread From: Junio C Hamano @ 2019-01-10 18:55 UTC (permalink / raw) To: brian m. carlson; +Cc: Martin Ågren, Git Mailing List "brian m. carlson" <sandals@crustytoothpaste.net> writes: >> I can think of three possible approaches: >> >> * Allocate with a margin (GIT_MAX_RAWSZ - the_hash_algo->rawsz) where >> "necessary" (TM). Maybe not so maintainable. > > I think there are actually several places where we allocate for these > buffers, so this is not likely to be a great solution. Even worse, in > some cases, we intentionally use a too-short buffer knowing that we'll > never dereference the data. > >> * Teach `oidset_insert()` (i.e., khash) to only copy >> `the_hash_algo->rawsz` bytes. Maybe not so good for performance. > > This is probably the best fix for the moment if you want an immediate > fix. > > As for my series, I'll need to run the testsuite on it, but I'll try to > get it out tonight or at the latest tomorrow if people want to use that > instead. Thanks. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-10 1:02 ` brian m. carlson 2019-01-10 18:55 ` Junio C Hamano @ 2019-01-10 19:03 ` Martin Ågren 1 sibling, 0 replies; 41+ messages in thread From: Martin Ågren @ 2019-01-10 19:03 UTC (permalink / raw) To: brian m. carlson; +Cc: Git Mailing List, Junio C Hamano On Thu, 10 Jan 2019 at 02:03, brian m. carlson <sandals@crustytoothpaste.net> wrote: > > On Wed, Jan 09, 2019 at 10:06:08PM +0100, Martin Ågren wrote: > > i.e., we copy sizeof(struct object_id) (=32) bytes. Which is 12 more > > than what is known to be safe. For this particular input data, we read > > outside allocated memory. > > Anything pointing to a struct object_id has to support at least > GIT_MAX_RAWSZ bytes, and that code doesn't, because it's a tree buffer. > > I ran into this later on in my SHA-256 work and have a series that fixes > the tree-walk code, but it's a bit involved and requires copying the > struct object_id out of the buffer. > > I thought we were going to be triggering this case only with some new > code I was introducing, but apparently somebody else got there first. > As for my series, I'll need to run the testsuite on it, but I'll try to > get it out tonight or at the latest tomorrow if people want to use that > instead. Cool. I should have known that you had something in the pipeline. Thanks for working on this. ^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 0/5] tree-walk object_id refactor 2019-01-09 21:06 ` Martin Ågren 2019-01-10 1:02 ` brian m. carlson @ 2019-01-10 4:25 ` brian m. carlson 2019-01-10 4:25 ` [PATCH 1/5] tree-walk: copy object ID before use brian m. carlson ` (5 more replies) 2019-01-15 0:39 ` [PATCH v2 " brian m. carlson 2 siblings, 6 replies; 41+ messages in thread From: brian m. carlson @ 2019-01-10 4:25 UTC (permalink / raw) To: git; +Cc: Martin Ågren, Junio C Hamano There are a small number of places in our codebase where we cast a buffer of unsigned char to a struct object_id pointer. When we have GIT_MAX_RAWSZ set to 32 (because we have SHA-256), one of these places (the buffer for tree objects) can lead to us copying too much data when using SHA-1 as the hash, since there are only 20 bytes to read. This was not expected to be a problem before future code was introduced, but due to a combination of series the issue became noticeable. This series introduces a refactor to avoid referencing the struct object_id directly from a buffer and instead storing an additional struct object_id (and an int) in struct name_entry and referring to that. This series, while based on master, addresses the interactions seen on pu between the SHA-256 series and the oidset series. There are a small number of conflicts, both textual and logical, when merging this series and pu, but they should be easily resolved. This series contains a final patch which will become necessary at some point for hygienic code, but which could be deferred until later if desired. The testsuite passes with AddressSanitizer at each stage and when merged into pu. brian m. carlson (5): tree-walk: copy object ID before use match-trees: compute buffer offset correctly when splicing match-trees: use hashcpy to splice trees tree-walk: store object_id in a separate member cache: make oidcpy always copy GIT_MAX_RAWSZ bytes builtin/grep.c | 8 ++++---- builtin/merge-tree.c | 20 ++++++++++---------- builtin/pack-objects.c | 4 ++-- builtin/reflog.c | 4 ++-- cache-tree.c | 4 ++-- cache.h | 2 +- contrib/coccinelle/object_id.cocci | 30 ------------------------------ delta-islands.c | 2 +- fsck.c | 4 ++-- http-push.c | 4 ++-- list-objects.c | 6 +++--- match-trees.c | 11 ++++++----- notes.c | 4 ++-- packfile.c | 2 +- revision.c | 4 ++-- tree-diff.c | 6 +++--- tree-walk.c | 21 ++++++++++++--------- tree-walk.h | 9 ++++++--- tree.c | 10 +++++----- unpack-trees.c | 6 +++--- walker.c | 4 ++-- 21 files changed, 71 insertions(+), 94 deletions(-) ^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 1/5] tree-walk: copy object ID before use 2019-01-10 4:25 ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson @ 2019-01-10 4:25 ` brian m. carlson 2019-01-10 4:25 ` [PATCH 2/5] match-trees: compute buffer offset correctly when splicing brian m. carlson ` (4 subsequent siblings) 5 siblings, 0 replies; 41+ messages in thread From: brian m. carlson @ 2019-01-10 4:25 UTC (permalink / raw) To: git; +Cc: Martin Ågren, Junio C Hamano In a future commit, the pointer returned by tree_entry_extract will point into the struct tree_desc, causing its lifetime to be bound to that of the struct tree_desc itself. To ensure this code path keeps working, copy the object_id into a local variable so that it lives long enough. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- tree-walk.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/tree-walk.c b/tree-walk.c index 79bafbd1a2..1e040fc20e 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -498,10 +498,10 @@ static int find_tree_entry(struct tree_desc *t, const char *name, struct object_ int namelen = strlen(name); while (t->size) { const char *entry; - const struct object_id *oid; + struct object_id oid; int entrylen, cmp; - oid = tree_entry_extract(t, &entry, mode); + oidcpy(&oid, tree_entry_extract(t, &entry, mode)); entrylen = tree_entry_len(&t->entry); update_tree_entry(t); if (entrylen > namelen) @@ -512,7 +512,7 @@ static int find_tree_entry(struct tree_desc *t, const char *name, struct object_ if (cmp < 0) break; if (entrylen == namelen) { - oidcpy(result, oid); + oidcpy(result, &oid); return 0; } if (name[entrylen] != '/') @@ -520,10 +520,10 @@ static int find_tree_entry(struct tree_desc *t, const char *name, struct object_ if (!S_ISDIR(*mode)) break; if (++entrylen == namelen) { - oidcpy(result, oid); + oidcpy(result, &oid); return 0; } - return get_tree_entry(oid, name + entrylen, result, mode); + return get_tree_entry(&oid, name + entrylen, result, mode); } return -1; } ^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH 2/5] match-trees: compute buffer offset correctly when splicing 2019-01-10 4:25 ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson 2019-01-10 4:25 ` [PATCH 1/5] tree-walk: copy object ID before use brian m. carlson @ 2019-01-10 4:25 ` brian m. carlson 2019-01-10 4:25 ` [PATCH 3/5] match-trees: use hashcpy to splice trees brian m. carlson ` (3 subsequent siblings) 5 siblings, 0 replies; 41+ messages in thread From: brian m. carlson @ 2019-01-10 4:25 UTC (permalink / raw) To: git; +Cc: Martin Ågren, Junio C Hamano Currently, the struct object_id pointer returned from tree_entry_extract lives directly inside the parsed tree buffer. In a future commit, this will change so that it instead points to a dedicated struct member. Since in this code path, we want to modify the buffer directly, compute the buffer offset we want to modify by using the pointer to the path instead. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- match-trees.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/match-trees.c b/match-trees.c index 2b6d31ef9d..feca48a5fd 100644 --- a/match-trees.c +++ b/match-trees.c @@ -199,15 +199,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, while (desc.size) { const char *name; unsigned mode; - const struct object_id *oid; - oid = tree_entry_extract(&desc, &name, &mode); + tree_entry_extract(&desc, &name, &mode); if (strlen(name) == toplen && !memcmp(name, prefix, toplen)) { if (!S_ISDIR(mode)) die("entry %s in tree %s is not a tree", name, oid_to_hex(oid1)); - rewrite_here = (struct object_id *)oid; + rewrite_here = (struct object_id *)(desc.entry.path + + strlen(desc.entry.path) + + 1); break; } update_tree_entry(&desc); ^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH 3/5] match-trees: use hashcpy to splice trees 2019-01-10 4:25 ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson 2019-01-10 4:25 ` [PATCH 1/5] tree-walk: copy object ID before use brian m. carlson 2019-01-10 4:25 ` [PATCH 2/5] match-trees: compute buffer offset correctly when splicing brian m. carlson @ 2019-01-10 4:25 ` brian m. carlson 2019-01-10 6:45 ` Jeff King 2019-01-10 4:25 ` [PATCH 4/5] tree-walk: store object_id in a separate member brian m. carlson ` (2 subsequent siblings) 5 siblings, 1 reply; 41+ messages in thread From: brian m. carlson @ 2019-01-10 4:25 UTC (permalink / raw) To: git; +Cc: Martin Ågren, Junio C Hamano When we're splicing trees, we're writing directly from one location into a buffer that is exactly the same size as a tree object. If the current hash algorithm is SHA-1, we may not have a full 32 (GIT_MAX_RAWSZ) bytes available to write, nor would we want to write that many bytes even if we did. In a future commit, we'll split out hashcpy to respect the_hash_algo while oidcpy uses GIT_MAX_RAWSZ, so convert the oidcpy to a hashcpy so we copy the right number of bytes. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- match-trees.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/match-trees.c b/match-trees.c index feca48a5fd..b1fbd022d1 100644 --- a/match-trees.c +++ b/match-trees.c @@ -224,7 +224,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, } else { rewrite_with = oid2; } - oidcpy(rewrite_here, rewrite_with); + hashcpy(rewrite_here->hash, rewrite_with->hash); status = write_object_file(buf, sz, tree_type, result); free(buf); return status; ^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH 3/5] match-trees: use hashcpy to splice trees 2019-01-10 4:25 ` [PATCH 3/5] match-trees: use hashcpy to splice trees brian m. carlson @ 2019-01-10 6:45 ` Jeff King 2019-01-10 23:55 ` brian m. carlson 0 siblings, 1 reply; 41+ messages in thread From: Jeff King @ 2019-01-10 6:45 UTC (permalink / raw) To: brian m. carlson; +Cc: git, Martin Ågren, Junio C Hamano On Thu, Jan 10, 2019 at 04:25:49AM +0000, brian m. carlson wrote: > When we're splicing trees, we're writing directly from one location into > a buffer that is exactly the same size as a tree object. If the current > hash algorithm is SHA-1, we may not have a full 32 (GIT_MAX_RAWSZ) bytes > available to write, nor would we want to write that many bytes even if > we did. In a future commit, we'll split out hashcpy to respect > the_hash_algo while oidcpy uses GIT_MAX_RAWSZ, so convert the oidcpy to > a hashcpy so we copy the right number of bytes. > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > --- > match-trees.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/match-trees.c b/match-trees.c > index feca48a5fd..b1fbd022d1 100644 > --- a/match-trees.c > +++ b/match-trees.c > @@ -224,7 +224,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, > } else { > rewrite_with = oid2; > } > - oidcpy(rewrite_here, rewrite_with); > + hashcpy(rewrite_here->hash, rewrite_with->hash); Hrm. Our coccinelle patches will want to convert this back to oidcpy(), though I see you drop them in the final patch. However, I wonder if it points to another mismatch. Isn't the point that we _don't_ actually have "struct object_id"s here? I.e., rewrite_here and rewrite_with should actually be "const unsigned char *" that we happen to know are the_hash_algo->raw_sz? I think the only reason they are "struct object_id" is because that's what tree_entry_extract() returns. Should we be providing another function to allow more byte-oriented access? -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 3/5] match-trees: use hashcpy to splice trees 2019-01-10 6:45 ` Jeff King @ 2019-01-10 23:55 ` brian m. carlson 2019-01-11 14:51 ` Jeff King 0 siblings, 1 reply; 41+ messages in thread From: brian m. carlson @ 2019-01-10 23:55 UTC (permalink / raw) To: Jeff King; +Cc: git, Martin Ågren, Junio C Hamano [-- Attachment #1: Type: text/plain, Size: 1601 bytes --] On Thu, Jan 10, 2019 at 01:45:20AM -0500, Jeff King wrote: > On Thu, Jan 10, 2019 at 04:25:49AM +0000, brian m. carlson wrote: > > diff --git a/match-trees.c b/match-trees.c > > index feca48a5fd..b1fbd022d1 100644 > > --- a/match-trees.c > > +++ b/match-trees.c > > @@ -224,7 +224,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, > > } else { > > rewrite_with = oid2; > > } > > - oidcpy(rewrite_here, rewrite_with); > > + hashcpy(rewrite_here->hash, rewrite_with->hash); > > Hrm. Our coccinelle patches will want to convert this back to oidcpy(), > though I see you drop them in the final patch. > > However, I wonder if it points to another mismatch. Isn't the point that > we _don't_ actually have "struct object_id"s here? I.e., rewrite_here > and rewrite_with should actually be "const unsigned char *" that we > happen to know are the_hash_algo->raw_sz? Correct. > I think the only reason they are "struct object_id" is because that's > what tree_entry_extract() returns. Should we be providing another > function to allow more byte-oriented access? The reason is we recursively call splice_tree, passing rewrite_here as the first argument. That argument is then used for read_object_file, which requires a struct object_id * (because it is, logically, an object ID). I *think* we could fix this by copying an unsigned char *rewrite_here into a temporary struct object_id before we recurse, but it's not obvious to me if that's safe to do. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 868 bytes --] ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 3/5] match-trees: use hashcpy to splice trees 2019-01-10 23:55 ` brian m. carlson @ 2019-01-11 14:51 ` Jeff King 2019-01-11 14:54 ` Jeff King 0 siblings, 1 reply; 41+ messages in thread From: Jeff King @ 2019-01-11 14:51 UTC (permalink / raw) To: brian m. carlson, git, Martin Ågren, Junio C Hamano On Thu, Jan 10, 2019 at 11:55:51PM +0000, brian m. carlson wrote: > > I think the only reason they are "struct object_id" is because that's > > what tree_entry_extract() returns. Should we be providing another > > function to allow more byte-oriented access? > > The reason is we recursively call splice_tree, passing rewrite_here as > the first argument. That argument is then used for read_object_file, > which requires a struct object_id * (because it is, logically, an object > ID). > > I *think* we could fix this by copying an unsigned char *rewrite_here > into a temporary struct object_id before we recurse, but it's not > obvious to me if that's safe to do. I think rewrite_here needs to be a direct pointer into the buffer, since we plan to modify it. I think rewrite_with is correct to be an object_id. It's either the oid passed in from the caller, or the subtree we generated (in which case it's the result from write_object_file). So the "most correct" thing is probably something like this: diff --git a/match-trees.c b/match-trees.c index 03e81b56e1..129b13a970 100644 --- a/match-trees.c +++ b/match-trees.c @@ -179,7 +179,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, char *buf; unsigned long sz; struct tree_desc desc; - struct object_id *rewrite_here; + unsigned char *rewrite_here; const struct object_id *rewrite_with; struct object_id subtree; enum object_type type; @@ -206,9 +206,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, if (!S_ISDIR(mode)) die("entry %s in tree %s is not a tree", name, oid_to_hex(oid1)); - rewrite_here = (struct object_id *)(desc.entry.path + - strlen(desc.entry.path) + - 1); + /* + * We cast here for two reasons: + * + * - to flip the "char *" (for the path) to "unsigned + * char *" (for the hash stored after it) + * + * - to discard the "const"; this is OK because we + * know it points into our non-const "buf" + */ + rewrite_here = (unsigned char *)desc.entry.path + strlen(desc.entry.path) + 1; break; } update_tree_entry(&desc); @@ -224,7 +231,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, } else { rewrite_with = oid2; } - hashcpy(rewrite_here->hash, rewrite_with->hash); + hashcpy(rewrite_here, rewrite_with->hash); status = write_object_file(buf, sz, tree_type, result); free(buf); return status; I think if I were trying to write this in a less-subtle way, I'd probably stop trying to do it in-place, and have a copy loop more like: for entry in src_tree if match_entry(entry, prefix) entry = rewrite_entry(entry) /* either oid2 or subtree */ push_entry(dst_tree) We may even have to go that way eventually if we might ever be rewriting to a tree with a different hash size (i.e., there is a hidden assumption here that rewrite_here points to exactly the_hash_algo->rawsz bytes of hash). But I think for now it's not necessary, and it's way outside the scope of what you're trying to do here. -Peff ^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH 3/5] match-trees: use hashcpy to splice trees 2019-01-11 14:51 ` Jeff King @ 2019-01-11 14:54 ` Jeff King 2019-01-14 1:30 ` brian m. carlson 0 siblings, 1 reply; 41+ messages in thread From: Jeff King @ 2019-01-11 14:54 UTC (permalink / raw) To: brian m. carlson, git, Martin Ågren, Junio C Hamano On Fri, Jan 11, 2019 at 09:51:06AM -0500, Jeff King wrote: > I think rewrite_here needs to be a direct pointer into the buffer, since > we plan to modify it. > > I think rewrite_with is correct to be an object_id. It's either the oid > passed in from the caller, or the subtree we generated (in which case > it's the result from write_object_file). > > So the "most correct" thing is probably something like this: Of course, it would be nice if what I sent compiled. ;) rewrite_here does double duty: it's the pointer in the tree that we need to rewrite, and it's also the oid we pass down recursively. So we have to handle both cases, like so: diff --git a/match-trees.c b/match-trees.c index 03e81b56e1..3e0ed889b4 100644 --- a/match-trees.c +++ b/match-trees.c @@ -179,7 +179,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, char *buf; unsigned long sz; struct tree_desc desc; - struct object_id *rewrite_here; + unsigned char *rewrite_here; const struct object_id *rewrite_with; struct object_id subtree; enum object_type type; @@ -206,9 +206,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, if (!S_ISDIR(mode)) die("entry %s in tree %s is not a tree", name, oid_to_hex(oid1)); - rewrite_here = (struct object_id *)(desc.entry.path + - strlen(desc.entry.path) + - 1); + /* + * We cast here for two reasons: + * + * - to flip the "char *" (for the path) to "unsigned + * char *" (for the hash stored after it) + * + * - to discard the "const"; this is OK because we + * know it points into our non-const "buf" + */ + rewrite_here = (unsigned char *)desc.entry.path + strlen(desc.entry.path) + 1; break; } update_tree_entry(&desc); @@ -217,14 +224,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, die("entry %.*s not found in tree %s", toplen, prefix, oid_to_hex(oid1)); if (*subpath) { - status = splice_tree(rewrite_here, subpath, oid2, &subtree); + struct object_id tree_oid; + hashcpy(tree_oid.hash, rewrite_here); + status = splice_tree(&tree_oid, subpath, oid2, &subtree); if (status) return status; rewrite_with = &subtree; } else { rewrite_with = oid2; } - hashcpy(rewrite_here->hash, rewrite_with->hash); + hashcpy(rewrite_here, rewrite_with->hash); status = write_object_file(buf, sz, tree_type, result); free(buf); return status; ^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH 3/5] match-trees: use hashcpy to splice trees 2019-01-11 14:54 ` Jeff King @ 2019-01-14 1:30 ` brian m. carlson 2019-01-14 15:40 ` Jeff King 0 siblings, 1 reply; 41+ messages in thread From: brian m. carlson @ 2019-01-14 1:30 UTC (permalink / raw) To: Jeff King; +Cc: git, Martin Ågren, Junio C Hamano [-- Attachment #1: Type: text/plain, Size: 944 bytes --] On Fri, Jan 11, 2019 at 09:54:46AM -0500, Jeff King wrote: > On Fri, Jan 11, 2019 at 09:51:06AM -0500, Jeff King wrote: > > > I think rewrite_here needs to be a direct pointer into the buffer, since > > we plan to modify it. > > > > I think rewrite_with is correct to be an object_id. It's either the oid > > passed in from the caller, or the subtree we generated (in which case > > it's the result from write_object_file). > > > > So the "most correct" thing is probably something like this: > > Of course, it would be nice if what I sent compiled. ;) > > rewrite_here does double duty: it's the pointer in the tree that we need > to rewrite, and it's also the oid we pass down recursively. So we have > to handle both cases, like so: Since I took most of the patch you wrote, may I apply your sign-off to the resulting patch I send out? -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 868 bytes --] ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 3/5] match-trees: use hashcpy to splice trees 2019-01-14 1:30 ` brian m. carlson @ 2019-01-14 15:40 ` Jeff King 0 siblings, 0 replies; 41+ messages in thread From: Jeff King @ 2019-01-14 15:40 UTC (permalink / raw) To: brian m. carlson, git, Martin Ågren, Junio C Hamano On Mon, Jan 14, 2019 at 01:30:17AM +0000, brian m. carlson wrote: > > > So the "most correct" thing is probably something like this: > > > > Of course, it would be nice if what I sent compiled. ;) > > > > rewrite_here does double duty: it's the pointer in the tree that we need > > to rewrite, and it's also the oid we pass down recursively. So we have > > to handle both cases, like so: > > Since I took most of the patch you wrote, may I apply your sign-off to > the resulting patch I send out? Yes, definitely. -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 4/5] tree-walk: store object_id in a separate member 2019-01-10 4:25 ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson ` (2 preceding siblings ...) 2019-01-10 4:25 ` [PATCH 3/5] match-trees: use hashcpy to splice trees brian m. carlson @ 2019-01-10 4:25 ` brian m. carlson 2019-01-10 6:49 ` Jeff King 2019-01-10 4:25 ` [PATCH 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson 2019-01-10 6:40 ` [PATCH 0/5] tree-walk object_id refactor Jeff King 5 siblings, 1 reply; 41+ messages in thread From: brian m. carlson @ 2019-01-10 4:25 UTC (permalink / raw) To: git; +Cc: Martin Ågren, Junio C Hamano When parsing a tree, we read the object ID directly out of the tree buffer. This is normally fine, but such an object ID cannot be used with oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1, there may not be that many bytes to copy. Instead, store the object ID in a separate struct member. Since we can no longer efficiently compute the path length, store that information as well in struct name_entry. Ensure we only copy the object ID into the new buffer if the path length is nonzero, as some callers will pass us an empty path with no object ID following it, and we will not want to read past the end of the buffer. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- builtin/grep.c | 8 ++++---- builtin/merge-tree.c | 20 ++++++++++---------- builtin/pack-objects.c | 4 ++-- builtin/reflog.c | 4 ++-- cache-tree.c | 4 ++-- delta-islands.c | 2 +- fsck.c | 4 ++-- http-push.c | 4 ++-- list-objects.c | 6 +++--- match-trees.c | 2 +- notes.c | 4 ++-- packfile.c | 2 +- revision.c | 4 ++-- tree-diff.c | 6 +++--- tree-walk.c | 11 +++++++---- tree-walk.h | 9 ++++++--- tree.c | 10 +++++----- unpack-trees.c | 6 +++--- walker.c | 4 ++-- 19 files changed, 60 insertions(+), 54 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index bad9c0a3d5..6fb93c0370 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -566,7 +566,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_add(base, entry.path, te_len); if (S_ISREG(entry.mode)) { - hit |= grep_oid(opt, entry.oid, base->buf, tn_len, + hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, check_attr ? base->buf + tn_len : NULL); } else if (S_ISDIR(entry.mode)) { enum object_type type; @@ -574,10 +574,10 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, void *data; unsigned long size; - data = lock_and_read_oid_file(entry.oid, &type, &size); + data = lock_and_read_oid_file(&entry.oid, &type, &size); if (!data) die(_("unable to read tree (%s)"), - oid_to_hex(entry.oid)); + oid_to_hex(&entry.oid)); strbuf_addch(base, '/'); init_tree_desc(&sub, data, size); @@ -585,7 +585,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, check_attr, repo); free(data); } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { - hit |= grep_submodule(opt, repo, pathspec, entry.oid, + hit |= grep_submodule(opt, repo, pathspec, &entry.oid, base->buf, base->buf + tn_len); } diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c index 70f6fc9167..4500c41e94 100644 --- a/builtin/merge-tree.c +++ b/builtin/merge-tree.c @@ -154,15 +154,15 @@ static void show_result(void) /* An empty entry never compares same, not even to another empty entry */ static int same_entry(struct name_entry *a, struct name_entry *b) { - return a->oid && - b->oid && - oideq(a->oid, b->oid) && + return !is_null_oid(&a->oid) && + !is_null_oid(&b->oid) && + oideq(&a->oid, &b->oid) && a->mode == b->mode; } static int both_empty(struct name_entry *a, struct name_entry *b) { - return !(a->oid || b->oid); + return is_null_oid(&a->oid) && is_null_oid(&b->oid); } static struct merge_list *create_entry(unsigned stage, unsigned mode, const struct object_id *oid, const char *path) @@ -178,7 +178,7 @@ static struct merge_list *create_entry(unsigned stage, unsigned mode, const stru static char *traverse_path(const struct traverse_info *info, const struct name_entry *n) { - char *path = xmallocz(traverse_path_len(info, n)); + char *path = xmallocz(traverse_path_len(info, n) + the_hash_algo->rawsz); return make_traverse_path(path, info, n); } @@ -192,8 +192,8 @@ static void resolve(const struct traverse_info *info, struct name_entry *ours, s return; path = traverse_path(info, result); - orig = create_entry(2, ours->mode, ours->oid, path); - final = create_entry(0, result->mode, result->oid, path); + orig = create_entry(2, ours->mode, &ours->oid, path); + final = create_entry(0, result->mode, &result->oid, path); final->link = orig; @@ -217,7 +217,7 @@ static void unresolved_directory(const struct traverse_info *info, newbase = traverse_path(info, p); -#define ENTRY_OID(e) (((e)->mode && S_ISDIR((e)->mode)) ? (e)->oid : NULL) +#define ENTRY_OID(e) (((e)->mode && S_ISDIR((e)->mode)) ? &(e)->oid : NULL) buf0 = fill_tree_descriptor(t + 0, ENTRY_OID(n + 0)); buf1 = fill_tree_descriptor(t + 1, ENTRY_OID(n + 1)); buf2 = fill_tree_descriptor(t + 2, ENTRY_OID(n + 2)); @@ -243,7 +243,7 @@ static struct merge_list *link_entry(unsigned stage, const struct traverse_info path = entry->path; else path = traverse_path(info, n); - link = create_entry(stage, n->mode, n->oid, path); + link = create_entry(stage, n->mode, &n->oid, path); link->link = entry; return link; } @@ -318,7 +318,7 @@ static int threeway_callback(int n, unsigned long mask, unsigned long dirmask, s } if (same_entry(entry+0, entry+1)) { - if (entry[2].oid && !S_ISDIR(entry[2].mode)) { + if (!is_null_oid(&entry[2].oid) && !S_ISDIR(entry[2].mode)) { /* We did not touch, they modified -- take theirs */ resolve(info, entry+1, entry+2); return mask; diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 24bba8147f..327d9170c4 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1334,7 +1334,7 @@ static void add_pbase_object(struct tree_desc *tree, if (cmp < 0) return; if (name[cmplen] != '/') { - add_object_entry(entry.oid, + add_object_entry(&entry.oid, object_type(entry.mode), fullname, 1); return; @@ -1345,7 +1345,7 @@ static void add_pbase_object(struct tree_desc *tree, const char *down = name+cmplen+1; int downlen = name_cmp_len(down); - tree = pbase_tree_get(entry.oid); + tree = pbase_tree_get(&entry.oid); if (!tree) return; init_tree_desc(&sub, tree->tree_data, tree->tree_size); diff --git a/builtin/reflog.c b/builtin/reflog.c index 64a8df4f25..1f1010e2d9 100644 --- a/builtin/reflog.c +++ b/builtin/reflog.c @@ -94,8 +94,8 @@ static int tree_is_complete(const struct object_id *oid) init_tree_desc(&desc, tree->buffer, tree->size); complete = 1; while (tree_entry(&desc, &entry)) { - if (!has_sha1_file(entry.oid->hash) || - (S_ISDIR(entry.mode) && !tree_is_complete(entry.oid))) { + if (!has_sha1_file(entry.oid.hash) || + (S_ISDIR(entry.mode) && !tree_is_complete(&entry.oid))) { tree->object.flags |= INCOMPLETE; complete = 0; } diff --git a/cache-tree.c b/cache-tree.c index 190c6e5aa6..98cb66587b 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -675,7 +675,7 @@ static void prime_cache_tree_rec(struct repository *r, cnt++; else { struct cache_tree_sub *sub; - struct tree *subtree = lookup_tree(r, entry.oid); + struct tree *subtree = lookup_tree(r, &entry.oid); if (!subtree->object.parsed) parse_tree(subtree); sub = cache_tree_sub(it, entry.path); @@ -724,7 +724,7 @@ int cache_tree_matches_traversal(struct cache_tree *root, it = find_cache_tree_from_traversal(root, info); it = cache_tree_find(it, ent->path); - if (it && it->entry_count > 0 && oideq(ent->oid, &it->oid)) + if (it && it->entry_count > 0 && oideq(&ent->oid, &it->oid)) return it->entry_count; return 0; } diff --git a/delta-islands.c b/delta-islands.c index 191a930705..2186bd0738 100644 --- a/delta-islands.c +++ b/delta-islands.c @@ -296,7 +296,7 @@ void resolve_tree_islands(struct repository *r, if (S_ISGITLINK(entry.mode)) continue; - obj = lookup_object(r, entry.oid->hash); + obj = lookup_object(r, entry.oid.hash); if (!obj) continue; diff --git a/fsck.c b/fsck.c index 68502ce85b..2260adb71e 100644 --- a/fsck.c +++ b/fsck.c @@ -410,14 +410,14 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op continue; if (S_ISDIR(entry.mode)) { - obj = (struct object *)lookup_tree(the_repository, entry.oid); + obj = (struct object *)lookup_tree(the_repository, &entry.oid); if (name && obj) put_object_name(options, obj, "%s%s/", name, entry.path); result = options->walk(obj, OBJ_TREE, data, options); } else if (S_ISREG(entry.mode) || S_ISLNK(entry.mode)) { - obj = (struct object *)lookup_blob(the_repository, entry.oid); + obj = (struct object *)lookup_blob(the_repository, &entry.oid); if (name && obj) put_object_name(options, obj, "%s%s", name, entry.path); diff --git a/http-push.c b/http-push.c index cd48590912..bb802d80ee 100644 --- a/http-push.c +++ b/http-push.c @@ -1311,11 +1311,11 @@ static struct object_list **process_tree(struct tree *tree, while (tree_entry(&desc, &entry)) switch (object_type(entry.mode)) { case OBJ_TREE: - p = process_tree(lookup_tree(the_repository, entry.oid), + p = process_tree(lookup_tree(the_repository, &entry.oid), p); break; case OBJ_BLOB: - p = process_blob(lookup_blob(the_repository, entry.oid), + p = process_blob(lookup_blob(the_repository, &entry.oid), p); break; default: diff --git a/list-objects.c b/list-objects.c index cf7f25bed3..ee6aff0cef 100644 --- a/list-objects.c +++ b/list-objects.c @@ -123,15 +123,15 @@ static void process_tree_contents(struct traversal_context *ctx, } if (S_ISDIR(entry.mode)) { - struct tree *t = lookup_tree(ctx->revs->repo, entry.oid); + struct tree *t = lookup_tree(ctx->revs->repo, &entry.oid); t->object.flags |= NOT_USER_GIVEN; process_tree(ctx, t, base, entry.path); } else if (S_ISGITLINK(entry.mode)) - process_gitlink(ctx, entry.oid->hash, + process_gitlink(ctx, entry.oid.hash, base, entry.path); else { - struct blob *b = lookup_blob(ctx->revs->repo, entry.oid); + struct blob *b = lookup_blob(ctx->revs->repo, &entry.oid); b->object.flags |= NOT_USER_GIVEN; process_blob(ctx, b, base, entry.path); } diff --git a/match-trees.c b/match-trees.c index b1fbd022d1..03e81b56e1 100644 --- a/match-trees.c +++ b/match-trees.c @@ -106,7 +106,7 @@ static int score_trees(const struct object_id *hash1, const struct object_id *ha update_tree_entry(&two); } else { /* path appears in both */ - if (!oideq(one.entry.oid, two.entry.oid)) { + if (!oideq(&one.entry.oid, &two.entry.oid)) { /* they are different */ score += score_differs(one.entry.mode, two.entry.mode, diff --git a/notes.c b/notes.c index 25cdce28b7..7f7cc4d511 100644 --- a/notes.c +++ b/notes.c @@ -450,7 +450,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree, l = xcalloc(1, sizeof(*l)); oidcpy(&l->key_oid, &object_oid); - oidcpy(&l->val_oid, entry.oid); + oidcpy(&l->val_oid, &entry.oid); if (note_tree_insert(t, node, n, l, type, combine_notes_concatenate)) die("Failed to load %s %s into notes tree " @@ -481,7 +481,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree, } strbuf_addstr(&non_note_path, entry.path); add_non_note(t, strbuf_detach(&non_note_path, NULL), - entry.mode, entry.oid->hash); + entry.mode, entry.oid.hash); } } free(buf); diff --git a/packfile.c b/packfile.c index 8c6b47cc77..e38af1a275 100644 --- a/packfile.c +++ b/packfile.c @@ -2100,7 +2100,7 @@ static int add_promisor_object(const struct object_id *oid, */ return 0; while (tree_entry_gently(&desc, &entry)) - oidset_insert(set, entry.oid); + oidset_insert(set, &entry.oid); } else if (obj->type == OBJ_COMMIT) { struct commit *commit = (struct commit *) obj; struct commit_list *parents = commit->parents; diff --git a/revision.c b/revision.c index 13e0519c02..05fb390d55 100644 --- a/revision.c +++ b/revision.c @@ -67,10 +67,10 @@ static void mark_tree_contents_uninteresting(struct repository *r, while (tree_entry(&desc, &entry)) { switch (object_type(entry.mode)) { case OBJ_TREE: - mark_tree_uninteresting(r, lookup_tree(r, entry.oid)); + mark_tree_uninteresting(r, lookup_tree(r, &entry.oid)); break; case OBJ_BLOB: - mark_blob_uninteresting(lookup_blob(r, entry.oid)); + mark_blob_uninteresting(lookup_blob(r, &entry.oid)); break; default: /* Subproject commit - not in this repository */ diff --git a/tree-diff.c b/tree-diff.c index 0e54324610..4ffa00ae0c 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -239,7 +239,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p, DIFF_STATUS_ADDED; if (tpi_valid) { - oid_i = tp[i].entry.oid; + oid_i = &tp[i].entry.oid; mode_i = tp[i].entry.mode; } else { @@ -280,7 +280,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p, /* same rule as in emitthis */ int tpi_valid = tp && !(tp[i].entry.mode & S_IFXMIN_NEQ); - parents_oid[i] = tpi_valid ? tp[i].entry.oid : NULL; + parents_oid[i] = tpi_valid ? &tp[i].entry.oid : NULL; } strbuf_add(base, path, pathlen); @@ -491,7 +491,7 @@ static struct combine_diff_path *ll_diff_tree_paths( continue; /* diff(t,pi) != ø */ - if (!oideq(t.entry.oid, tp[i].entry.oid) || + if (!oideq(&t.entry.oid, &tp[i].entry.oid) || (t.entry.mode != tp[i].entry.mode)) continue; diff --git a/tree-walk.c b/tree-walk.c index 1e040fc20e..b6daeab16d 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -48,7 +48,8 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l /* Initialize the descriptor entry */ desc->entry.path = path; desc->entry.mode = canon_mode(mode); - desc->entry.oid = (const struct object_id *)(path + len); + desc->entry.pathlen = len - 1; + memcpy(desc->entry.oid.hash, path + len, the_hash_algo->rawsz); return 0; } @@ -107,7 +108,7 @@ static void entry_extract(struct tree_desc *t, struct name_entry *a) static int update_tree_entry_internal(struct tree_desc *desc, struct strbuf *err) { const void *buf = desc->buffer; - const unsigned char *end = desc->entry.oid->hash + the_hash_algo->rawsz; + const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + the_hash_algo->rawsz; unsigned long size = desc->size; unsigned long len = end - (const unsigned char *)buf; @@ -175,9 +176,11 @@ void setup_traverse_info(struct traverse_info *info, const char *base) pathlen--; info->pathlen = pathlen ? pathlen + 1 : 0; info->name.path = base; - info->name.oid = (void *)(base + pathlen + 1); - if (pathlen) + info->name.pathlen = pathlen; + if (pathlen) { + memcpy(info->name.oid.hash, (base + pathlen + 1), the_hash_algo->rawsz); info->prev = &dummy; + } } char *make_traverse_path(char *path, const struct traverse_info *info, const struct name_entry *n) diff --git a/tree-walk.h b/tree-walk.h index 196831007e..08d349c54f 100644 --- a/tree-walk.h +++ b/tree-walk.h @@ -1,11 +1,14 @@ #ifndef TREE_WALK_H #define TREE_WALK_H +#include "cache.h" + struct strbuf; struct name_entry { - const struct object_id *oid; + struct object_id oid; const char *path; + int pathlen; unsigned int mode; }; @@ -19,12 +22,12 @@ static inline const struct object_id *tree_entry_extract(struct tree_desc *desc, { *pathp = desc->entry.path; *modep = desc->entry.mode; - return desc->entry.oid; + return &desc->entry.oid; } static inline int tree_entry_len(const struct name_entry *ne) { - return (const char *)ne->oid - ne->path - 1; + return ne->pathlen; } /* diff --git a/tree.c b/tree.c index 215d3fdc7c..7badea0009 100644 --- a/tree.c +++ b/tree.c @@ -84,7 +84,7 @@ static int read_tree_1(struct tree *tree, struct strbuf *base, continue; } - switch (fn(entry.oid, base, + switch (fn(&entry.oid, base, entry.path, entry.mode, stage, context)) { case 0: continue; @@ -95,19 +95,19 @@ static int read_tree_1(struct tree *tree, struct strbuf *base, } if (S_ISDIR(entry.mode)) - oidcpy(&oid, entry.oid); + oidcpy(&oid, &entry.oid); else if (S_ISGITLINK(entry.mode)) { struct commit *commit; - commit = lookup_commit(the_repository, entry.oid); + commit = lookup_commit(the_repository, &entry.oid); if (!commit) die("Commit %s in submodule path %s%s not found", - oid_to_hex(entry.oid), + oid_to_hex(&entry.oid), base->buf, entry.path); if (parse_commit(commit)) die("Invalid commit %s in submodule path %s%s", - oid_to_hex(entry.oid), + oid_to_hex(&entry.oid), base->buf, entry.path); oidcpy(&oid, get_commit_tree_oid(commit)); diff --git a/unpack-trees.c b/unpack-trees.c index 6d53cbfc86..734828fda2 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -679,7 +679,7 @@ static int switch_cache_bottom(struct traverse_info *info) static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k) { - return name_j->oid && name_k->oid && oideq(name_j->oid, name_k->oid); + return !is_null_oid(&name_j->oid) && !is_null_oid(&name_k->oid) && oideq(&name_j->oid, &name_k->oid); } static int all_trees_same_as_cache_tree(int n, unsigned long dirmask, @@ -857,7 +857,7 @@ static int traverse_trees_recursive(int n, unsigned long dirmask, else { const struct object_id *oid = NULL; if (dirmask & 1) - oid = names[i].oid; + oid = &names[i].oid; buf[nr_buf++] = fill_tree_descriptor(t + i, oid); } } @@ -981,7 +981,7 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info, ce->ce_mode = create_ce_mode(n->mode); ce->ce_flags = create_ce_flags(stage); ce->ce_namelen = len; - oidcpy(&ce->oid, n->oid); + oidcpy(&ce->oid, &n->oid); make_traverse_path(ce->name, info, n); return ce; diff --git a/walker.c b/walker.c index 96990d84da..d74ae59c77 100644 --- a/walker.c +++ b/walker.c @@ -50,13 +50,13 @@ static int process_tree(struct walker *walker, struct tree *tree) continue; if (S_ISDIR(entry.mode)) { struct tree *tree = lookup_tree(the_repository, - entry.oid); + &entry.oid); if (tree) obj = &tree->object; } else { struct blob *blob = lookup_blob(the_repository, - entry.oid); + &entry.oid); if (blob) obj = &blob->object; } ^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH 4/5] tree-walk: store object_id in a separate member 2019-01-10 4:25 ` [PATCH 4/5] tree-walk: store object_id in a separate member brian m. carlson @ 2019-01-10 6:49 ` Jeff King 2019-01-10 23:57 ` brian m. carlson 0 siblings, 1 reply; 41+ messages in thread From: Jeff King @ 2019-01-10 6:49 UTC (permalink / raw) To: brian m. carlson; +Cc: git, Martin Ågren, Junio C Hamano On Thu, Jan 10, 2019 at 04:25:50AM +0000, brian m. carlson wrote: > diff --git a/tree-walk.c b/tree-walk.c > index 1e040fc20e..b6daeab16d 100644 > --- a/tree-walk.c > +++ b/tree-walk.c > @@ -48,7 +48,8 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l > /* Initialize the descriptor entry */ > desc->entry.path = path; > desc->entry.mode = canon_mode(mode); > - desc->entry.oid = (const struct object_id *)(path + len); > + desc->entry.pathlen = len - 1; > + memcpy(desc->entry.oid.hash, path + len, the_hash_algo->rawsz); > > return 0; > } This one really is a hashcpy() now, right, even after your final patch? I guess using rawsz explicitly makes it match the computation here: > @@ -107,7 +108,7 @@ static void entry_extract(struct tree_desc *t, struct name_entry *a) > static int update_tree_entry_internal(struct tree_desc *desc, struct strbuf *err) > { > const void *buf = desc->buffer; > - const unsigned char *end = desc->entry.oid->hash + the_hash_algo->rawsz; > + const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + the_hash_algo->rawsz; > unsigned long size = desc->size; > unsigned long len = end - (const unsigned char *)buf; So maybe it's better to be explicit as you have here. (Mostly just as I was reading it, I was looking for a use of hashcpy and was surprised not to find it ;) ). > @@ -175,9 +176,11 @@ void setup_traverse_info(struct traverse_info *info, const char *base) > pathlen--; > info->pathlen = pathlen ? pathlen + 1 : 0; > info->name.path = base; > - info->name.oid = (void *)(base + pathlen + 1); > - if (pathlen) > + info->name.pathlen = pathlen; > + if (pathlen) { > + memcpy(info->name.oid.hash, (base + pathlen + 1), the_hash_algo->rawsz); > info->prev = &dummy; > + } > } > Ditto here, I think. -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 4/5] tree-walk: store object_id in a separate member 2019-01-10 6:49 ` Jeff King @ 2019-01-10 23:57 ` brian m. carlson 0 siblings, 0 replies; 41+ messages in thread From: brian m. carlson @ 2019-01-10 23:57 UTC (permalink / raw) To: Jeff King; +Cc: git, Martin Ågren, Junio C Hamano [-- Attachment #1: Type: text/plain, Size: 1055 bytes --] On Thu, Jan 10, 2019 at 01:49:45AM -0500, Jeff King wrote: > This one really is a hashcpy() now, right, even after your final patch? > I guess using rawsz explicitly makes it match the computation here: > > > @@ -107,7 +108,7 @@ static void entry_extract(struct tree_desc *t, struct name_entry *a) > > static int update_tree_entry_internal(struct tree_desc *desc, struct strbuf *err) > > { > > const void *buf = desc->buffer; > > - const unsigned char *end = desc->entry.oid->hash + the_hash_algo->rawsz; > > + const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + the_hash_algo->rawsz; > > unsigned long size = desc->size; > > unsigned long len = end - (const unsigned char *)buf; > > So maybe it's better to be explicit as you have here. (Mostly just as I > was reading it, I was looking for a use of hashcpy and was surprised not > to find it ;) ). Yeah, I think a hashcpy is a better choice. Will fix. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 868 bytes --] ^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes 2019-01-10 4:25 ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson ` (3 preceding siblings ...) 2019-01-10 4:25 ` [PATCH 4/5] tree-walk: store object_id in a separate member brian m. carlson @ 2019-01-10 4:25 ` brian m. carlson 2019-01-10 6:50 ` Jeff King 2019-01-10 6:40 ` [PATCH 0/5] tree-walk object_id refactor Jeff King 5 siblings, 1 reply; 41+ messages in thread From: brian m. carlson @ 2019-01-10 4:25 UTC (permalink / raw) To: git; +Cc: Martin Ågren, Junio C Hamano There are some situations in which we want to store an object ID into struct object_id without the_hash_algo necessarily being set correctly. One such case is when cloning a repository, where we must read refs from the remote side without having a repository from which to read the preferred algorithm. In this cases, we may have the_hash_algo set to SHA-1, which is the default, but read refs into struct object_id that are SHA-256. When copying these values, we will want to copy them completely, not just the first 20 bytes. Consequently, make sure that oidcpy copies the maximum number of bytes at all times, regardless of the setting of the_hash_algo. Since oidcpy and hashcpy are no longer functionally identical, remove the Cocinelle object_id transformations that convert from one into the other. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- cache.h | 2 +- contrib/coccinelle/object_id.cocci | 30 ------------------------------ 2 files changed, 1 insertion(+), 31 deletions(-) diff --git a/cache.h b/cache.h index ca36b44ee0..8b8c6a1a2a 100644 --- a/cache.h +++ b/cache.h @@ -1072,7 +1072,7 @@ static inline void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src) static inline void oidcpy(struct object_id *dst, const struct object_id *src) { - hashcpy(dst->hash, src->hash); + memcpy(dst->hash, src->hash, GIT_MAX_RAWSZ); } static inline struct object_id *oiddup(const struct object_id *src) diff --git a/contrib/coccinelle/object_id.cocci b/contrib/coccinelle/object_id.cocci index 6a7cf3e02d..3e536a9834 100644 --- a/contrib/coccinelle/object_id.cocci +++ b/contrib/coccinelle/object_id.cocci @@ -86,36 +86,6 @@ struct object_id OID; - hashcmp(OID.hash, OIDPTR->hash) + oidcmp(&OID, OIDPTR) -@@ -struct object_id OID1, OID2; -@@ -- hashcpy(OID1.hash, OID2.hash) -+ oidcpy(&OID1, &OID2) - -@@ -identifier f != oidcpy; -struct object_id *OIDPTR1; -struct object_id *OIDPTR2; -@@ - f(...) {<... -- hashcpy(OIDPTR1->hash, OIDPTR2->hash) -+ oidcpy(OIDPTR1, OIDPTR2) - ...>} - -@@ -struct object_id *OIDPTR; -struct object_id OID; -@@ -- hashcpy(OIDPTR->hash, OID.hash) -+ oidcpy(OIDPTR, &OID) - -@@ -struct object_id *OIDPTR; -struct object_id OID; -@@ -- hashcpy(OID.hash, OIDPTR->hash) -+ oidcpy(&OID, OIDPTR) - @@ struct object_id *OIDPTR1; struct object_id *OIDPTR2; ^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes 2019-01-10 4:25 ` [PATCH 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson @ 2019-01-10 6:50 ` Jeff King 0 siblings, 0 replies; 41+ messages in thread From: Jeff King @ 2019-01-10 6:50 UTC (permalink / raw) To: brian m. carlson; +Cc: git, Martin Ågren, Junio C Hamano On Thu, Jan 10, 2019 at 04:25:51AM +0000, brian m. carlson wrote: > Since oidcpy and hashcpy are no longer functionally identical, remove > the Cocinelle object_id transformations that convert from one into the > other. Unfortunately this means we'll no longer automatically find cases where "foo" got converted into "struct object_id" and could be updated. I guess at some point we assume that most everything has been converted anyway, and that coccinelle rule loses its usefulness. -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 0/5] tree-walk object_id refactor 2019-01-10 4:25 ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson ` (4 preceding siblings ...) 2019-01-10 4:25 ` [PATCH 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson @ 2019-01-10 6:40 ` Jeff King 2019-01-11 0:17 ` brian m. carlson 5 siblings, 1 reply; 41+ messages in thread From: Jeff King @ 2019-01-10 6:40 UTC (permalink / raw) To: brian m. carlson; +Cc: git, Martin Ågren, Junio C Hamano On Thu, Jan 10, 2019 at 04:25:46AM +0000, brian m. carlson wrote: > There are a small number of places in our codebase where we cast a > buffer of unsigned char to a struct object_id pointer. When we have > GIT_MAX_RAWSZ set to 32 (because we have SHA-256), one of these places > (the buffer for tree objects) can lead to us copying too much data when > using SHA-1 as the hash, since there are only 20 bytes to read. > > This was not expected to be a problem before future code was introduced, > but due to a combination of series the issue became noticeable. > > This series introduces a refactor to avoid referencing the struct > object_id directly from a buffer and instead storing an additional > struct object_id (and an int) in struct name_entry and referring to > that. I think this is really the only safe and sane solution. We resisted it because of the cost of the extra copies (especially the update_tree_entry() one). But I don't know that anybody actually measured it. Do you have any performance numbers before/after this series? -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 0/5] tree-walk object_id refactor 2019-01-10 6:40 ` [PATCH 0/5] tree-walk object_id refactor Jeff King @ 2019-01-11 0:17 ` brian m. carlson 2019-01-11 14:17 ` Jeff King 0 siblings, 1 reply; 41+ messages in thread From: brian m. carlson @ 2019-01-11 0:17 UTC (permalink / raw) To: Jeff King; +Cc: git, Martin Ågren, Junio C Hamano [-- Attachment #1: Type: text/plain, Size: 1793 bytes --] On Thu, Jan 10, 2019 at 01:40:31AM -0500, Jeff King wrote: > On Thu, Jan 10, 2019 at 04:25:46AM +0000, brian m. carlson wrote: > > > There are a small number of places in our codebase where we cast a > > buffer of unsigned char to a struct object_id pointer. When we have > > GIT_MAX_RAWSZ set to 32 (because we have SHA-256), one of these places > > (the buffer for tree objects) can lead to us copying too much data when > > using SHA-1 as the hash, since there are only 20 bytes to read. > > > > This was not expected to be a problem before future code was introduced, > > but due to a combination of series the issue became noticeable. > > > > This series introduces a refactor to avoid referencing the struct > > object_id directly from a buffer and instead storing an additional > > struct object_id (and an int) in struct name_entry and referring to > > that. > > I think this is really the only safe and sane solution. We resisted it > because of the cost of the extra copies (especially the > update_tree_entry() one). But I don't know that anybody actually > measured it. Do you have any performance numbers before/after this > series? Unfortunately, I don't. I'm not really sure in what situations we hit this code path a lot, so I'm not sure what exactly we should performance test. If you have suggestions, I can set up some perf tests. I will say that I resisted writing this series for a long time, since I knew it was going to be a difficult slog to get everything working (and it was). If there had been a way to avoid it, I would have done it. However, writing this series led to about ten tests in my SHA-256 work suddenly passing where they hadn't before. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 868 bytes --] ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH 0/5] tree-walk object_id refactor 2019-01-11 0:17 ` brian m. carlson @ 2019-01-11 14:17 ` Jeff King 0 siblings, 0 replies; 41+ messages in thread From: Jeff King @ 2019-01-11 14:17 UTC (permalink / raw) To: brian m. carlson, git, Martin Ågren, Junio C Hamano On Fri, Jan 11, 2019 at 12:17:50AM +0000, brian m. carlson wrote: > > I think this is really the only safe and sane solution. We resisted it > > because of the cost of the extra copies (especially the > > update_tree_entry() one). But I don't know that anybody actually > > measured it. Do you have any performance numbers before/after this > > series? > > Unfortunately, I don't. I'm not really sure in what situations we hit > this code path a lot, so I'm not sure what exactly we should performance > test. If you have suggestions, I can set up some perf tests. I think the interesting one here is the copy in update_tree_entry(), which is called for every entry of every tree we parse for a diff. So the command with the most noticeable impact, I think, would be something like: git log -- pathspec I couldn't demonstrate any measurable slowdown (I didn't compute the mean to see if it gets worse, but certainly any change is within the run-to-run noise). I guess that is competing with the cost to access the commit objects. Perhaps a pure tree diff would be a more precise test. Here's the most pathological case I could come up with: git init for i in $(seq 10000); do echo $i >$i; done git add . git commit -m base echo change >9999 git commit -am change time git diff-tree HEAD^ HEAD which should really be spending 99% of its time inflating and walking through the trees. And even there I couldn't measure anything. So I'd say it's probably fine. -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH v2 0/5] tree-walk object_id refactor 2019-01-09 21:06 ` Martin Ågren 2019-01-10 1:02 ` brian m. carlson 2019-01-10 4:25 ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson @ 2019-01-15 0:39 ` brian m. carlson 2019-01-15 0:39 ` [PATCH v2 1/5] tree-walk: copy object ID before use brian m. carlson ` (5 more replies) 2 siblings, 6 replies; 41+ messages in thread From: brian m. carlson @ 2019-01-15 0:39 UTC (permalink / raw) To: git; +Cc: Martin Ågren, Junio C Hamano, Jeff King There are a small number of places in our codebase where we cast a buffer of unsigned char to a struct object_id pointer. When we have GIT_MAX_RAWSZ set to 32 (because we have SHA-256), one of these places (the buffer for tree objects) can lead to us copying too much data when using SHA-1 as the hash, since there are only 20 bytes to read. Changes from v1: * Use hashcpy instead of memcpy. * Adopt Peff's suggestion for improving patch 3. brian m. carlson (5): tree-walk: copy object ID before use match-trees: compute buffer offset correctly when splicing match-trees: use hashcpy to splice trees tree-walk: store object_id in a separate member cache: make oidcpy always copy GIT_MAX_RAWSZ bytes builtin/grep.c | 8 ++++---- builtin/merge-tree.c | 20 ++++++++++---------- builtin/pack-objects.c | 4 ++-- builtin/reflog.c | 4 ++-- cache-tree.c | 4 ++-- cache.h | 2 +- contrib/coccinelle/object_id.cocci | 30 ------------------------------ delta-islands.c | 2 +- fsck.c | 4 ++-- http-push.c | 4 ++-- list-objects.c | 6 +++--- match-trees.c | 27 ++++++++++++++++++++------- notes.c | 4 ++-- packfile.c | 2 +- revision.c | 4 ++-- tree-diff.c | 6 +++--- tree-walk.c | 21 ++++++++++++--------- tree-walk.h | 9 ++++++--- tree.c | 10 +++++----- unpack-trees.c | 6 +++--- walker.c | 4 ++-- 21 files changed, 85 insertions(+), 96 deletions(-) ^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH v2 1/5] tree-walk: copy object ID before use 2019-01-15 0:39 ` [PATCH v2 " brian m. carlson @ 2019-01-15 0:39 ` brian m. carlson 2019-01-15 0:39 ` [PATCH v2 2/5] match-trees: compute buffer offset correctly when splicing brian m. carlson ` (4 subsequent siblings) 5 siblings, 0 replies; 41+ messages in thread From: brian m. carlson @ 2019-01-15 0:39 UTC (permalink / raw) To: git; +Cc: Martin Ågren, Junio C Hamano, Jeff King In a future commit, the pointer returned by tree_entry_extract will point into the struct tree_desc, causing its lifetime to be bound to that of the struct tree_desc itself. To ensure this code path keeps working, copy the object_id into a local variable so that it lives long enough. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- tree-walk.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/tree-walk.c b/tree-walk.c index 79bafbd1a2..1e040fc20e 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -498,10 +498,10 @@ static int find_tree_entry(struct tree_desc *t, const char *name, struct object_ int namelen = strlen(name); while (t->size) { const char *entry; - const struct object_id *oid; + struct object_id oid; int entrylen, cmp; - oid = tree_entry_extract(t, &entry, mode); + oidcpy(&oid, tree_entry_extract(t, &entry, mode)); entrylen = tree_entry_len(&t->entry); update_tree_entry(t); if (entrylen > namelen) @@ -512,7 +512,7 @@ static int find_tree_entry(struct tree_desc *t, const char *name, struct object_ if (cmp < 0) break; if (entrylen == namelen) { - oidcpy(result, oid); + oidcpy(result, &oid); return 0; } if (name[entrylen] != '/') @@ -520,10 +520,10 @@ static int find_tree_entry(struct tree_desc *t, const char *name, struct object_ if (!S_ISDIR(*mode)) break; if (++entrylen == namelen) { - oidcpy(result, oid); + oidcpy(result, &oid); return 0; } - return get_tree_entry(oid, name + entrylen, result, mode); + return get_tree_entry(&oid, name + entrylen, result, mode); } return -1; } ^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v2 2/5] match-trees: compute buffer offset correctly when splicing 2019-01-15 0:39 ` [PATCH v2 " brian m. carlson 2019-01-15 0:39 ` [PATCH v2 1/5] tree-walk: copy object ID before use brian m. carlson @ 2019-01-15 0:39 ` brian m. carlson 2019-01-15 0:39 ` [PATCH v2 3/5] match-trees: use hashcpy to splice trees brian m. carlson ` (3 subsequent siblings) 5 siblings, 0 replies; 41+ messages in thread From: brian m. carlson @ 2019-01-15 0:39 UTC (permalink / raw) To: git; +Cc: Martin Ågren, Junio C Hamano, Jeff King Currently, the struct object_id pointer returned from tree_entry_extract lives directly inside the parsed tree buffer. In a future commit, this will change so that it instead points to a dedicated struct member. Since in this code path, we want to modify the buffer directly, compute the buffer offset we want to modify by using the pointer to the path instead. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- match-trees.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/match-trees.c b/match-trees.c index 2b6d31ef9d..feca48a5fd 100644 --- a/match-trees.c +++ b/match-trees.c @@ -199,15 +199,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, while (desc.size) { const char *name; unsigned mode; - const struct object_id *oid; - oid = tree_entry_extract(&desc, &name, &mode); + tree_entry_extract(&desc, &name, &mode); if (strlen(name) == toplen && !memcmp(name, prefix, toplen)) { if (!S_ISDIR(mode)) die("entry %s in tree %s is not a tree", name, oid_to_hex(oid1)); - rewrite_here = (struct object_id *)oid; + rewrite_here = (struct object_id *)(desc.entry.path + + strlen(desc.entry.path) + + 1); break; } update_tree_entry(&desc); ^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v2 3/5] match-trees: use hashcpy to splice trees 2019-01-15 0:39 ` [PATCH v2 " brian m. carlson 2019-01-15 0:39 ` [PATCH v2 1/5] tree-walk: copy object ID before use brian m. carlson 2019-01-15 0:39 ` [PATCH v2 2/5] match-trees: compute buffer offset correctly when splicing brian m. carlson @ 2019-01-15 0:39 ` brian m. carlson 2019-01-15 0:39 ` [PATCH v2 4/5] tree-walk: store object_id in a separate member brian m. carlson ` (2 subsequent siblings) 5 siblings, 0 replies; 41+ messages in thread From: brian m. carlson @ 2019-01-15 0:39 UTC (permalink / raw) To: git; +Cc: Martin Ågren, Junio C Hamano, Jeff King When we splice trees together, we operate in place on the tree buffer. If we're using SHA-1 for the hash algorithm, we may not have a full GIT_MAX_RAWSZ (32) bytes to copy. Consequently, it doesn't logically make sense for us to use a struct object_id to represent this type, since it isn't a complete object. Represent this value as a unsigned char pointer instead and copy it when necessary. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- match-trees.c | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/match-trees.c b/match-trees.c index feca48a5fd..c2b7329e09 100644 --- a/match-trees.c +++ b/match-trees.c @@ -179,7 +179,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, char *buf; unsigned long sz; struct tree_desc desc; - struct object_id *rewrite_here; + unsigned char *rewrite_here; const struct object_id *rewrite_with; struct object_id subtree; enum object_type type; @@ -206,9 +206,19 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, if (!S_ISDIR(mode)) die("entry %s in tree %s is not a tree", name, oid_to_hex(oid1)); - rewrite_here = (struct object_id *)(desc.entry.path + - strlen(desc.entry.path) + - 1); + + /* + * We cast here for two reasons: + * + * - to flip the "char *" (for the path) to "unsigned + * char *" (for the hash stored after it) + * + * - to discard the "const"; this is OK because we + * know it points into our non-const "buf" + */ + rewrite_here = (unsigned char *)(desc.entry.path + + strlen(desc.entry.path) + + 1); break; } update_tree_entry(&desc); @@ -217,14 +227,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix, die("entry %.*s not found in tree %s", toplen, prefix, oid_to_hex(oid1)); if (*subpath) { - status = splice_tree(rewrite_here, subpath, oid2, &subtree); + struct object_id tree_oid; + hashcpy(tree_oid.hash, rewrite_here); + status = splice_tree(&tree_oid, subpath, oid2, &subtree); if (status) return status; rewrite_with = &subtree; } else { rewrite_with = oid2; } - oidcpy(rewrite_here, rewrite_with); + hashcpy(rewrite_here, rewrite_with->hash); status = write_object_file(buf, sz, tree_type, result); free(buf); return status; ^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v2 4/5] tree-walk: store object_id in a separate member 2019-01-15 0:39 ` [PATCH v2 " brian m. carlson ` (2 preceding siblings ...) 2019-01-15 0:39 ` [PATCH v2 3/5] match-trees: use hashcpy to splice trees brian m. carlson @ 2019-01-15 0:39 ` brian m. carlson 2019-01-15 0:39 ` [PATCH v2 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson 2019-01-15 17:51 ` [PATCH v2 0/5] tree-walk object_id refactor Junio C Hamano 5 siblings, 0 replies; 41+ messages in thread From: brian m. carlson @ 2019-01-15 0:39 UTC (permalink / raw) To: git; +Cc: Martin Ågren, Junio C Hamano, Jeff King When parsing a tree, we read the object ID directly out of the tree buffer. This is normally fine, but such an object ID cannot be used with oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1, there may not be that many bytes to copy. Instead, store the object ID in a separate struct member. Since we can no longer efficiently compute the path length, store that information as well in struct name_entry. Ensure we only copy the object ID into the new buffer if the path length is nonzero, as some callers will pass us an empty path with no object ID following it, and we will not want to read past the end of the buffer. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- builtin/grep.c | 8 ++++---- builtin/merge-tree.c | 20 ++++++++++---------- builtin/pack-objects.c | 4 ++-- builtin/reflog.c | 4 ++-- cache-tree.c | 4 ++-- delta-islands.c | 2 +- fsck.c | 4 ++-- http-push.c | 4 ++-- list-objects.c | 6 +++--- match-trees.c | 2 +- notes.c | 4 ++-- packfile.c | 2 +- revision.c | 4 ++-- tree-diff.c | 6 +++--- tree-walk.c | 11 +++++++---- tree-walk.h | 9 ++++++--- tree.c | 10 +++++----- unpack-trees.c | 6 +++--- walker.c | 4 ++-- 19 files changed, 60 insertions(+), 54 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index bad9c0a3d5..6fb93c0370 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -566,7 +566,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_add(base, entry.path, te_len); if (S_ISREG(entry.mode)) { - hit |= grep_oid(opt, entry.oid, base->buf, tn_len, + hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, check_attr ? base->buf + tn_len : NULL); } else if (S_ISDIR(entry.mode)) { enum object_type type; @@ -574,10 +574,10 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, void *data; unsigned long size; - data = lock_and_read_oid_file(entry.oid, &type, &size); + data = lock_and_read_oid_file(&entry.oid, &type, &size); if (!data) die(_("unable to read tree (%s)"), - oid_to_hex(entry.oid)); + oid_to_hex(&entry.oid)); strbuf_addch(base, '/'); init_tree_desc(&sub, data, size); @@ -585,7 +585,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, check_attr, repo); free(data); } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { - hit |= grep_submodule(opt, repo, pathspec, entry.oid, + hit |= grep_submodule(opt, repo, pathspec, &entry.oid, base->buf, base->buf + tn_len); } diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c index 70f6fc9167..4500c41e94 100644 --- a/builtin/merge-tree.c +++ b/builtin/merge-tree.c @@ -154,15 +154,15 @@ static void show_result(void) /* An empty entry never compares same, not even to another empty entry */ static int same_entry(struct name_entry *a, struct name_entry *b) { - return a->oid && - b->oid && - oideq(a->oid, b->oid) && + return !is_null_oid(&a->oid) && + !is_null_oid(&b->oid) && + oideq(&a->oid, &b->oid) && a->mode == b->mode; } static int both_empty(struct name_entry *a, struct name_entry *b) { - return !(a->oid || b->oid); + return is_null_oid(&a->oid) && is_null_oid(&b->oid); } static struct merge_list *create_entry(unsigned stage, unsigned mode, const struct object_id *oid, const char *path) @@ -178,7 +178,7 @@ static struct merge_list *create_entry(unsigned stage, unsigned mode, const stru static char *traverse_path(const struct traverse_info *info, const struct name_entry *n) { - char *path = xmallocz(traverse_path_len(info, n)); + char *path = xmallocz(traverse_path_len(info, n) + the_hash_algo->rawsz); return make_traverse_path(path, info, n); } @@ -192,8 +192,8 @@ static void resolve(const struct traverse_info *info, struct name_entry *ours, s return; path = traverse_path(info, result); - orig = create_entry(2, ours->mode, ours->oid, path); - final = create_entry(0, result->mode, result->oid, path); + orig = create_entry(2, ours->mode, &ours->oid, path); + final = create_entry(0, result->mode, &result->oid, path); final->link = orig; @@ -217,7 +217,7 @@ static void unresolved_directory(const struct traverse_info *info, newbase = traverse_path(info, p); -#define ENTRY_OID(e) (((e)->mode && S_ISDIR((e)->mode)) ? (e)->oid : NULL) +#define ENTRY_OID(e) (((e)->mode && S_ISDIR((e)->mode)) ? &(e)->oid : NULL) buf0 = fill_tree_descriptor(t + 0, ENTRY_OID(n + 0)); buf1 = fill_tree_descriptor(t + 1, ENTRY_OID(n + 1)); buf2 = fill_tree_descriptor(t + 2, ENTRY_OID(n + 2)); @@ -243,7 +243,7 @@ static struct merge_list *link_entry(unsigned stage, const struct traverse_info path = entry->path; else path = traverse_path(info, n); - link = create_entry(stage, n->mode, n->oid, path); + link = create_entry(stage, n->mode, &n->oid, path); link->link = entry; return link; } @@ -318,7 +318,7 @@ static int threeway_callback(int n, unsigned long mask, unsigned long dirmask, s } if (same_entry(entry+0, entry+1)) { - if (entry[2].oid && !S_ISDIR(entry[2].mode)) { + if (!is_null_oid(&entry[2].oid) && !S_ISDIR(entry[2].mode)) { /* We did not touch, they modified -- take theirs */ resolve(info, entry+1, entry+2); return mask; diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 24bba8147f..327d9170c4 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1334,7 +1334,7 @@ static void add_pbase_object(struct tree_desc *tree, if (cmp < 0) return; if (name[cmplen] != '/') { - add_object_entry(entry.oid, + add_object_entry(&entry.oid, object_type(entry.mode), fullname, 1); return; @@ -1345,7 +1345,7 @@ static void add_pbase_object(struct tree_desc *tree, const char *down = name+cmplen+1; int downlen = name_cmp_len(down); - tree = pbase_tree_get(entry.oid); + tree = pbase_tree_get(&entry.oid); if (!tree) return; init_tree_desc(&sub, tree->tree_data, tree->tree_size); diff --git a/builtin/reflog.c b/builtin/reflog.c index 64a8df4f25..1f1010e2d9 100644 --- a/builtin/reflog.c +++ b/builtin/reflog.c @@ -94,8 +94,8 @@ static int tree_is_complete(const struct object_id *oid) init_tree_desc(&desc, tree->buffer, tree->size); complete = 1; while (tree_entry(&desc, &entry)) { - if (!has_sha1_file(entry.oid->hash) || - (S_ISDIR(entry.mode) && !tree_is_complete(entry.oid))) { + if (!has_sha1_file(entry.oid.hash) || + (S_ISDIR(entry.mode) && !tree_is_complete(&entry.oid))) { tree->object.flags |= INCOMPLETE; complete = 0; } diff --git a/cache-tree.c b/cache-tree.c index 190c6e5aa6..98cb66587b 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -675,7 +675,7 @@ static void prime_cache_tree_rec(struct repository *r, cnt++; else { struct cache_tree_sub *sub; - struct tree *subtree = lookup_tree(r, entry.oid); + struct tree *subtree = lookup_tree(r, &entry.oid); if (!subtree->object.parsed) parse_tree(subtree); sub = cache_tree_sub(it, entry.path); @@ -724,7 +724,7 @@ int cache_tree_matches_traversal(struct cache_tree *root, it = find_cache_tree_from_traversal(root, info); it = cache_tree_find(it, ent->path); - if (it && it->entry_count > 0 && oideq(ent->oid, &it->oid)) + if (it && it->entry_count > 0 && oideq(&ent->oid, &it->oid)) return it->entry_count; return 0; } diff --git a/delta-islands.c b/delta-islands.c index 191a930705..2186bd0738 100644 --- a/delta-islands.c +++ b/delta-islands.c @@ -296,7 +296,7 @@ void resolve_tree_islands(struct repository *r, if (S_ISGITLINK(entry.mode)) continue; - obj = lookup_object(r, entry.oid->hash); + obj = lookup_object(r, entry.oid.hash); if (!obj) continue; diff --git a/fsck.c b/fsck.c index 68502ce85b..2260adb71e 100644 --- a/fsck.c +++ b/fsck.c @@ -410,14 +410,14 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op continue; if (S_ISDIR(entry.mode)) { - obj = (struct object *)lookup_tree(the_repository, entry.oid); + obj = (struct object *)lookup_tree(the_repository, &entry.oid); if (name && obj) put_object_name(options, obj, "%s%s/", name, entry.path); result = options->walk(obj, OBJ_TREE, data, options); } else if (S_ISREG(entry.mode) || S_ISLNK(entry.mode)) { - obj = (struct object *)lookup_blob(the_repository, entry.oid); + obj = (struct object *)lookup_blob(the_repository, &entry.oid); if (name && obj) put_object_name(options, obj, "%s%s", name, entry.path); diff --git a/http-push.c b/http-push.c index cd48590912..bb802d80ee 100644 --- a/http-push.c +++ b/http-push.c @@ -1311,11 +1311,11 @@ static struct object_list **process_tree(struct tree *tree, while (tree_entry(&desc, &entry)) switch (object_type(entry.mode)) { case OBJ_TREE: - p = process_tree(lookup_tree(the_repository, entry.oid), + p = process_tree(lookup_tree(the_repository, &entry.oid), p); break; case OBJ_BLOB: - p = process_blob(lookup_blob(the_repository, entry.oid), + p = process_blob(lookup_blob(the_repository, &entry.oid), p); break; default: diff --git a/list-objects.c b/list-objects.c index cf7f25bed3..ee6aff0cef 100644 --- a/list-objects.c +++ b/list-objects.c @@ -123,15 +123,15 @@ static void process_tree_contents(struct traversal_context *ctx, } if (S_ISDIR(entry.mode)) { - struct tree *t = lookup_tree(ctx->revs->repo, entry.oid); + struct tree *t = lookup_tree(ctx->revs->repo, &entry.oid); t->object.flags |= NOT_USER_GIVEN; process_tree(ctx, t, base, entry.path); } else if (S_ISGITLINK(entry.mode)) - process_gitlink(ctx, entry.oid->hash, + process_gitlink(ctx, entry.oid.hash, base, entry.path); else { - struct blob *b = lookup_blob(ctx->revs->repo, entry.oid); + struct blob *b = lookup_blob(ctx->revs->repo, &entry.oid); b->object.flags |= NOT_USER_GIVEN; process_blob(ctx, b, base, entry.path); } diff --git a/match-trees.c b/match-trees.c index c2b7329e09..18ab825bef 100644 --- a/match-trees.c +++ b/match-trees.c @@ -106,7 +106,7 @@ static int score_trees(const struct object_id *hash1, const struct object_id *ha update_tree_entry(&two); } else { /* path appears in both */ - if (!oideq(one.entry.oid, two.entry.oid)) { + if (!oideq(&one.entry.oid, &two.entry.oid)) { /* they are different */ score += score_differs(one.entry.mode, two.entry.mode, diff --git a/notes.c b/notes.c index 25cdce28b7..7f7cc4d511 100644 --- a/notes.c +++ b/notes.c @@ -450,7 +450,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree, l = xcalloc(1, sizeof(*l)); oidcpy(&l->key_oid, &object_oid); - oidcpy(&l->val_oid, entry.oid); + oidcpy(&l->val_oid, &entry.oid); if (note_tree_insert(t, node, n, l, type, combine_notes_concatenate)) die("Failed to load %s %s into notes tree " @@ -481,7 +481,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree, } strbuf_addstr(&non_note_path, entry.path); add_non_note(t, strbuf_detach(&non_note_path, NULL), - entry.mode, entry.oid->hash); + entry.mode, entry.oid.hash); } } free(buf); diff --git a/packfile.c b/packfile.c index 8c6b47cc77..e38af1a275 100644 --- a/packfile.c +++ b/packfile.c @@ -2100,7 +2100,7 @@ static int add_promisor_object(const struct object_id *oid, */ return 0; while (tree_entry_gently(&desc, &entry)) - oidset_insert(set, entry.oid); + oidset_insert(set, &entry.oid); } else if (obj->type == OBJ_COMMIT) { struct commit *commit = (struct commit *) obj; struct commit_list *parents = commit->parents; diff --git a/revision.c b/revision.c index 13e0519c02..05fb390d55 100644 --- a/revision.c +++ b/revision.c @@ -67,10 +67,10 @@ static void mark_tree_contents_uninteresting(struct repository *r, while (tree_entry(&desc, &entry)) { switch (object_type(entry.mode)) { case OBJ_TREE: - mark_tree_uninteresting(r, lookup_tree(r, entry.oid)); + mark_tree_uninteresting(r, lookup_tree(r, &entry.oid)); break; case OBJ_BLOB: - mark_blob_uninteresting(lookup_blob(r, entry.oid)); + mark_blob_uninteresting(lookup_blob(r, &entry.oid)); break; default: /* Subproject commit - not in this repository */ diff --git a/tree-diff.c b/tree-diff.c index 0e54324610..4ffa00ae0c 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -239,7 +239,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p, DIFF_STATUS_ADDED; if (tpi_valid) { - oid_i = tp[i].entry.oid; + oid_i = &tp[i].entry.oid; mode_i = tp[i].entry.mode; } else { @@ -280,7 +280,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p, /* same rule as in emitthis */ int tpi_valid = tp && !(tp[i].entry.mode & S_IFXMIN_NEQ); - parents_oid[i] = tpi_valid ? tp[i].entry.oid : NULL; + parents_oid[i] = tpi_valid ? &tp[i].entry.oid : NULL; } strbuf_add(base, path, pathlen); @@ -491,7 +491,7 @@ static struct combine_diff_path *ll_diff_tree_paths( continue; /* diff(t,pi) != ø */ - if (!oideq(t.entry.oid, tp[i].entry.oid) || + if (!oideq(&t.entry.oid, &tp[i].entry.oid) || (t.entry.mode != tp[i].entry.mode)) continue; diff --git a/tree-walk.c b/tree-walk.c index 1e040fc20e..56b951d3cb 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -48,7 +48,8 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l /* Initialize the descriptor entry */ desc->entry.path = path; desc->entry.mode = canon_mode(mode); - desc->entry.oid = (const struct object_id *)(path + len); + desc->entry.pathlen = len - 1; + hashcpy(desc->entry.oid.hash, (const unsigned char *)path + len); return 0; } @@ -107,7 +108,7 @@ static void entry_extract(struct tree_desc *t, struct name_entry *a) static int update_tree_entry_internal(struct tree_desc *desc, struct strbuf *err) { const void *buf = desc->buffer; - const unsigned char *end = desc->entry.oid->hash + the_hash_algo->rawsz; + const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + the_hash_algo->rawsz; unsigned long size = desc->size; unsigned long len = end - (const unsigned char *)buf; @@ -175,9 +176,11 @@ void setup_traverse_info(struct traverse_info *info, const char *base) pathlen--; info->pathlen = pathlen ? pathlen + 1 : 0; info->name.path = base; - info->name.oid = (void *)(base + pathlen + 1); - if (pathlen) + info->name.pathlen = pathlen; + if (pathlen) { + hashcpy(info->name.oid.hash, (const unsigned char *)base + pathlen + 1); info->prev = &dummy; + } } char *make_traverse_path(char *path, const struct traverse_info *info, const struct name_entry *n) diff --git a/tree-walk.h b/tree-walk.h index 196831007e..08d349c54f 100644 --- a/tree-walk.h +++ b/tree-walk.h @@ -1,11 +1,14 @@ #ifndef TREE_WALK_H #define TREE_WALK_H +#include "cache.h" + struct strbuf; struct name_entry { - const struct object_id *oid; + struct object_id oid; const char *path; + int pathlen; unsigned int mode; }; @@ -19,12 +22,12 @@ static inline const struct object_id *tree_entry_extract(struct tree_desc *desc, { *pathp = desc->entry.path; *modep = desc->entry.mode; - return desc->entry.oid; + return &desc->entry.oid; } static inline int tree_entry_len(const struct name_entry *ne) { - return (const char *)ne->oid - ne->path - 1; + return ne->pathlen; } /* diff --git a/tree.c b/tree.c index 215d3fdc7c..7badea0009 100644 --- a/tree.c +++ b/tree.c @@ -84,7 +84,7 @@ static int read_tree_1(struct tree *tree, struct strbuf *base, continue; } - switch (fn(entry.oid, base, + switch (fn(&entry.oid, base, entry.path, entry.mode, stage, context)) { case 0: continue; @@ -95,19 +95,19 @@ static int read_tree_1(struct tree *tree, struct strbuf *base, } if (S_ISDIR(entry.mode)) - oidcpy(&oid, entry.oid); + oidcpy(&oid, &entry.oid); else if (S_ISGITLINK(entry.mode)) { struct commit *commit; - commit = lookup_commit(the_repository, entry.oid); + commit = lookup_commit(the_repository, &entry.oid); if (!commit) die("Commit %s in submodule path %s%s not found", - oid_to_hex(entry.oid), + oid_to_hex(&entry.oid), base->buf, entry.path); if (parse_commit(commit)) die("Invalid commit %s in submodule path %s%s", - oid_to_hex(entry.oid), + oid_to_hex(&entry.oid), base->buf, entry.path); oidcpy(&oid, get_commit_tree_oid(commit)); diff --git a/unpack-trees.c b/unpack-trees.c index 6d53cbfc86..734828fda2 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -679,7 +679,7 @@ static int switch_cache_bottom(struct traverse_info *info) static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k) { - return name_j->oid && name_k->oid && oideq(name_j->oid, name_k->oid); + return !is_null_oid(&name_j->oid) && !is_null_oid(&name_k->oid) && oideq(&name_j->oid, &name_k->oid); } static int all_trees_same_as_cache_tree(int n, unsigned long dirmask, @@ -857,7 +857,7 @@ static int traverse_trees_recursive(int n, unsigned long dirmask, else { const struct object_id *oid = NULL; if (dirmask & 1) - oid = names[i].oid; + oid = &names[i].oid; buf[nr_buf++] = fill_tree_descriptor(t + i, oid); } } @@ -981,7 +981,7 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info, ce->ce_mode = create_ce_mode(n->mode); ce->ce_flags = create_ce_flags(stage); ce->ce_namelen = len; - oidcpy(&ce->oid, n->oid); + oidcpy(&ce->oid, &n->oid); make_traverse_path(ce->name, info, n); return ce; diff --git a/walker.c b/walker.c index 96990d84da..d74ae59c77 100644 --- a/walker.c +++ b/walker.c @@ -50,13 +50,13 @@ static int process_tree(struct walker *walker, struct tree *tree) continue; if (S_ISDIR(entry.mode)) { struct tree *tree = lookup_tree(the_repository, - entry.oid); + &entry.oid); if (tree) obj = &tree->object; } else { struct blob *blob = lookup_blob(the_repository, - entry.oid); + &entry.oid); if (blob) obj = &blob->object; } ^ permalink raw reply related [flat|nested] 41+ messages in thread
* [PATCH v2 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes 2019-01-15 0:39 ` [PATCH v2 " brian m. carlson ` (3 preceding siblings ...) 2019-01-15 0:39 ` [PATCH v2 4/5] tree-walk: store object_id in a separate member brian m. carlson @ 2019-01-15 0:39 ` brian m. carlson 2019-01-15 17:51 ` [PATCH v2 0/5] tree-walk object_id refactor Junio C Hamano 5 siblings, 0 replies; 41+ messages in thread From: brian m. carlson @ 2019-01-15 0:39 UTC (permalink / raw) To: git; +Cc: Martin Ågren, Junio C Hamano, Jeff King There are some situations in which we want to store an object ID into struct object_id without the_hash_algo necessarily being set correctly. One such case is when cloning a repository, where we must read refs from the remote side without having a repository from which to read the preferred algorithm. In this cases, we may have the_hash_algo set to SHA-1, which is the default, but read refs into struct object_id that are SHA-256. When copying these values, we will want to copy them completely, not just the first 20 bytes. Consequently, make sure that oidcpy copies the maximum number of bytes at all times, regardless of the setting of the_hash_algo. Since oidcpy and hashcpy are no longer functionally identical, remove the Cocinelle object_id transformations that convert from one into the other. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- cache.h | 2 +- contrib/coccinelle/object_id.cocci | 30 ------------------------------ 2 files changed, 1 insertion(+), 31 deletions(-) diff --git a/cache.h b/cache.h index ca36b44ee0..8b8c6a1a2a 100644 --- a/cache.h +++ b/cache.h @@ -1072,7 +1072,7 @@ static inline void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src) static inline void oidcpy(struct object_id *dst, const struct object_id *src) { - hashcpy(dst->hash, src->hash); + memcpy(dst->hash, src->hash, GIT_MAX_RAWSZ); } static inline struct object_id *oiddup(const struct object_id *src) diff --git a/contrib/coccinelle/object_id.cocci b/contrib/coccinelle/object_id.cocci index 6a7cf3e02d..3e536a9834 100644 --- a/contrib/coccinelle/object_id.cocci +++ b/contrib/coccinelle/object_id.cocci @@ -86,36 +86,6 @@ struct object_id OID; - hashcmp(OID.hash, OIDPTR->hash) + oidcmp(&OID, OIDPTR) -@@ -struct object_id OID1, OID2; -@@ -- hashcpy(OID1.hash, OID2.hash) -+ oidcpy(&OID1, &OID2) - -@@ -identifier f != oidcpy; -struct object_id *OIDPTR1; -struct object_id *OIDPTR2; -@@ - f(...) {<... -- hashcpy(OIDPTR1->hash, OIDPTR2->hash) -+ oidcpy(OIDPTR1, OIDPTR2) - ...>} - -@@ -struct object_id *OIDPTR; -struct object_id OID; -@@ -- hashcpy(OIDPTR->hash, OID.hash) -+ oidcpy(OIDPTR, &OID) - -@@ -struct object_id *OIDPTR; -struct object_id OID; -@@ -- hashcpy(OID.hash, OIDPTR->hash) -+ oidcpy(&OID, OIDPTR) - @@ struct object_id *OIDPTR1; struct object_id *OIDPTR2; ^ permalink raw reply related [flat|nested] 41+ messages in thread
* Re: [PATCH v2 0/5] tree-walk object_id refactor 2019-01-15 0:39 ` [PATCH v2 " brian m. carlson ` (4 preceding siblings ...) 2019-01-15 0:39 ` [PATCH v2 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson @ 2019-01-15 17:51 ` Junio C Hamano 5 siblings, 0 replies; 41+ messages in thread From: Junio C Hamano @ 2019-01-15 17:51 UTC (permalink / raw) To: brian m. carlson; +Cc: git, Martin Ågren, Jeff King "brian m. carlson" <sandals@crustytoothpaste.net> writes: > There are a small number of places in our codebase where we cast a > buffer of unsigned char to a struct object_id pointer. When we have > GIT_MAX_RAWSZ set to 32 (because we have SHA-256), one of these places > (the buffer for tree objects) can lead to us copying too much data when > using SHA-1 as the hash, since there are only 20 bytes to read. Thanks. And thanks for a pleasant-to-follow discussion during the review of the previous round. > Changes from v1: > * Use hashcpy instead of memcpy. > * Adopt Peff's suggestion for improving patch 3. > > brian m. carlson (5): > tree-walk: copy object ID before use > match-trees: compute buffer offset correctly when splicing > match-trees: use hashcpy to splice trees > tree-walk: store object_id in a separate member > cache: make oidcpy always copy GIT_MAX_RAWSZ bytes > > builtin/grep.c | 8 ++++---- > builtin/merge-tree.c | 20 ++++++++++---------- > builtin/pack-objects.c | 4 ++-- > builtin/reflog.c | 4 ++-- > cache-tree.c | 4 ++-- > cache.h | 2 +- > contrib/coccinelle/object_id.cocci | 30 ------------------------------ > delta-islands.c | 2 +- > fsck.c | 4 ++-- > http-push.c | 4 ++-- > list-objects.c | 6 +++--- > match-trees.c | 27 ++++++++++++++++++++------- > notes.c | 4 ++-- > packfile.c | 2 +- > revision.c | 4 ++-- > tree-diff.c | 6 +++--- > tree-walk.c | 21 ++++++++++++--------- > tree-walk.h | 9 ++++++--- > tree.c | 10 +++++----- > unpack-trees.c | 6 +++--- > walker.c | 4 ++-- > 21 files changed, 85 insertions(+), 96 deletions(-) ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano ` (3 preceding siblings ...) 2019-01-09 7:37 ` Martin Ågren @ 2019-01-09 10:28 ` Jeff King 2019-01-10 19:05 ` Junio C Hamano 2019-01-10 19:46 ` Junio C Hamano 2019-01-10 18:02 ` Stefan Beller 5 siblings, 2 replies; 41+ messages in thread From: Jeff King @ 2019-01-09 10:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonathan Nieder, git On Mon, Jan 07, 2019 at 03:34:10PM -0800, Junio C Hamano wrote: > * jk/proto-v2-hidden-refs-fix (2018-12-14) 3 commits > - upload-pack: support hidden refs with protocol v2 > - parse_hide_refs_config: handle NULL section > - serve: pass "config context" through to individual commands > > The v2 upload-pack protocol implementation failed to honor > hidden-ref configuration, which has been corrected. > > Will merge to 'next'. Sorry I didn't catch this before it hit 'next', but I think we were planning to scrap this in favor of: https://public-inbox.org/git/20181218124750.GC30471@sigill.intra.peff.net/ -Peff ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-09 10:28 ` What's cooking in git.git (Jan 2019, #01; Mon, 7) Jeff King @ 2019-01-10 19:05 ` Junio C Hamano 2019-01-10 19:46 ` Junio C Hamano 1 sibling, 0 replies; 41+ messages in thread From: Junio C Hamano @ 2019-01-10 19:05 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Nieder, git Jeff King <peff@peff.net> writes: > On Mon, Jan 07, 2019 at 03:34:10PM -0800, Junio C Hamano wrote: > >> * jk/proto-v2-hidden-refs-fix (2018-12-14) 3 commits >> - upload-pack: support hidden refs with protocol v2 >> - parse_hide_refs_config: handle NULL section >> - serve: pass "config context" through to individual commands >> >> The v2 upload-pack protocol implementation failed to honor >> hidden-ref configuration, which has been corrected. >> >> Will merge to 'next'. > > Sorry I didn't catch this before it hit 'next', but I think we were > planning to scrap this in favor of: > > https://public-inbox.org/git/20181218124750.GC30471@sigill.intra.peff.net/ Ouch, I ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-09 10:28 ` What's cooking in git.git (Jan 2019, #01; Mon, 7) Jeff King 2019-01-10 19:05 ` Junio C Hamano @ 2019-01-10 19:46 ` Junio C Hamano 1 sibling, 0 replies; 41+ messages in thread From: Junio C Hamano @ 2019-01-10 19:46 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Nieder, git Jeff King <peff@peff.net> writes: > On Mon, Jan 07, 2019 at 03:34:10PM -0800, Junio C Hamano wrote: > >> * jk/proto-v2-hidden-refs-fix (2018-12-14) 3 commits >> - upload-pack: support hidden refs with protocol v2 >> - parse_hide_refs_config: handle NULL section >> - serve: pass "config context" through to individual commands >> >> The v2 upload-pack protocol implementation failed to honor >> hidden-ref configuration, which has been corrected. >> >> Will merge to 'next'. > > Sorry I didn't catch this before it hit 'next', but I think we were > planning to scrap this in favor of: > > https://public-inbox.org/git/20181218124750.GC30471@sigill.intra.peff.net/ Ouch, indeed. I did recall that <20181218124852.GD30471@sigill.intra.peff.net> was the final verdict, which did mention that message, but somehow I final onefailed to act on it. Let me revert the merge and redo the topic. Thanks. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7) 2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano ` (4 preceding siblings ...) 2019-01-09 10:28 ` What's cooking in git.git (Jan 2019, #01; Mon, 7) Jeff King @ 2019-01-10 18:02 ` Stefan Beller 5 siblings, 0 replies; 41+ messages in thread From: Stefan Beller @ 2019-01-10 18:02 UTC (permalink / raw) To: Junio C Hamano; +Cc: git > * sb/submodule-recursive-fetch-gets-the-tip (2018-12-09) 9 commits > - fetch: ensure submodule objects fetched > - submodule.c: fetch in submodules git directory instead of in worktree > - submodule: migrate get_next_submodule to use repository structs > - repository: repo_submodule_init to take a submodule struct > - submodule: store OIDs in changed_submodule_names > - submodule.c: tighten scope of changed_submodule_names struct > - submodule.c: sort changed_submodule_names before searching it > - submodule.c: fix indentation > - sha1-array: provide oid_array_filter > > "git fetch --recurse-submodules" may not fetch the necessary commit > that is bound to the superproject, which is getting corrected. > > Ready? I checked the last discussion at https://public-inbox.org/git/20181129002756.167615-1-sbeller@google.com/ and I think it is ready as I did not see any outstanding issues. Thanks, Stefan ^ permalink raw reply [flat|nested] 41+ messages in thread
end of thread, other threads:[~2019-01-15 17:52 UTC | newest] Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano 2019-01-08 9:50 ` tg/checkout-no-overlay, was " Thomas Gummerer 2019-01-08 17:51 ` Junio C Hamano 2019-01-08 17:30 ` ag/sequencer-reduce-rewriting-todo " Alban Gruin 2019-01-08 21:20 ` sb/more-repo-in-api, was " Jonathan Tan 2019-01-08 21:35 ` Junio C Hamano 2019-01-09 21:28 ` Stefan Beller 2019-01-09 7:37 ` Martin Ågren 2019-01-09 21:06 ` Martin Ågren 2019-01-10 1:02 ` brian m. carlson 2019-01-10 18:55 ` Junio C Hamano 2019-01-10 19:03 ` Martin Ågren 2019-01-10 4:25 ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson 2019-01-10 4:25 ` [PATCH 1/5] tree-walk: copy object ID before use brian m. carlson 2019-01-10 4:25 ` [PATCH 2/5] match-trees: compute buffer offset correctly when splicing brian m. carlson 2019-01-10 4:25 ` [PATCH 3/5] match-trees: use hashcpy to splice trees brian m. carlson 2019-01-10 6:45 ` Jeff King 2019-01-10 23:55 ` brian m. carlson 2019-01-11 14:51 ` Jeff King 2019-01-11 14:54 ` Jeff King 2019-01-14 1:30 ` brian m. carlson 2019-01-14 15:40 ` Jeff King 2019-01-10 4:25 ` [PATCH 4/5] tree-walk: store object_id in a separate member brian m. carlson 2019-01-10 6:49 ` Jeff King 2019-01-10 23:57 ` brian m. carlson 2019-01-10 4:25 ` [PATCH 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson 2019-01-10 6:50 ` Jeff King 2019-01-10 6:40 ` [PATCH 0/5] tree-walk object_id refactor Jeff King 2019-01-11 0:17 ` brian m. carlson 2019-01-11 14:17 ` Jeff King 2019-01-15 0:39 ` [PATCH v2 " brian m. carlson 2019-01-15 0:39 ` [PATCH v2 1/5] tree-walk: copy object ID before use brian m. carlson 2019-01-15 0:39 ` [PATCH v2 2/5] match-trees: compute buffer offset correctly when splicing brian m. carlson 2019-01-15 0:39 ` [PATCH v2 3/5] match-trees: use hashcpy to splice trees brian m. carlson 2019-01-15 0:39 ` [PATCH v2 4/5] tree-walk: store object_id in a separate member brian m. carlson 2019-01-15 0:39 ` [PATCH v2 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson 2019-01-15 17:51 ` [PATCH v2 0/5] tree-walk object_id refactor Junio C Hamano 2019-01-09 10:28 ` What's cooking in git.git (Jan 2019, #01; Mon, 7) Jeff King 2019-01-10 19:05 ` Junio C Hamano 2019-01-10 19:46 ` Junio C Hamano 2019-01-10 18:02 ` Stefan Beller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).