All of lore.kernel.org
 help / color / mirror / Atom feed
From: Karsten Blees <karsten.blees@gmail.com>
To: Git List <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>
Cc: Thomas Rast <tr@thomasrast.ch>,
	Jens Lehmann <Jens.Lehmann@web.de>,
	Karsten Blees <karsten.blees@gmail.com>
Subject: [PATCH v4 00/14] New hash table implementation
Date: Thu, 07 Nov 2013 15:32:35 +0100	[thread overview]
Message-ID: <527BA483.6040803@gmail.com> (raw)

Also here:
https://github.com/kblees/git/commits/kb/hashmap-v4

Sorry for the delay, but the promised static use-after-free analysis [1] turned out to be much more involved than I had anticipated.

The Good News is that the static analysis independently found the four use-after-free issues that we already knew about (two fixed in my initial patch 10, and two more found by Thomas' valgrind tests). These are all fixed in this round.

Valgrind tests ran over night and only reported a single unrelated problem (glibc strlen() working in 4-byte chunks and thus exceeding the malloced space in builtin/fetch.c:462).

The Bad News is that I'm a bit clueless regarding unpack-trees.c, so I'd welcome if someone more familiar with that part of git could have a look (see analysis below).


The first 11 patches in this version are exactly as currently in pu, just rebased to next. The problematic memory-leaks patch is at the very end, so we could merge the rest and cook this one a bit longer...

Changes since v3:
- included Jens Lehmann's submodule use-after-free fix [2] (#01)
- added minor fix for 'git update-index --verbose' output (#12)
- added minor cleanup patch for builtin/update-index.c (#13)
- moved the problematic 'fix memory leaks' patch to the end (#14)
- included fix for first valgrind breakage [3] (already in pu) (#14)
- added fix for second valgrind breakage reported in [4] (#14)

[1] http://article.gmane.org/gmane.comp.version-control.git/236467
[2] http://article.gmane.org/gmane.comp.version-control.git/236370
[3] http://article.gmane.org/gmane.comp.version-control.git/236468
[4] http://article.gmane.org/gmane.comp.version-control.git/236869

Jens Lehmann (1):
  submodule: don't access the .gitmodules cache entry after removing it

Karsten Blees (13):
  add a hashtable implementation that supports O(1) removal
  buitin/describe.c: use new hash map implementation
  diffcore-rename.c: move code around to prepare for the next patch
  diffcore-rename.c: simplify finding exact renames
  diffcore-rename.c: use new hash map implementation
  name-hash.c: use new hash map implementation for directories
  name-hash.c: remove unreferenced directory entries
  name-hash.c: use new hash map implementation for cache entries
  name-hash.c: remove cache entries instead of marking them CE_UNHASHED
  remove old hash.[ch] implementation
  fix 'git update-index --verbose --again' output
  builtin/update-index.c: cleanup update_one
  read-cache.c: fix memory leaks caused by removed cache entries

 Documentation/technical/api-hash.txt    |  52 -------
 Documentation/technical/api-hashmap.txt | 237 +++++++++++++++++++++++++++++
 Makefile                                |   5 +-
 builtin/describe.c                      |  53 +++----
 builtin/rm.c                            |   2 +-
 builtin/update-index.c                  |  39 +++--
 cache.h                                 |  18 +--
 diffcore-rename.c                       | 185 ++++++++---------------
 hash.c                                  | 110 --------------
 hash.h                                  |  50 -------
 hashmap.c                               | 212 ++++++++++++++++++++++++++
 hashmap.h                               |  72 +++++++++
 name-hash.c                             | 156 +++++++------------
 read-cache.c                            |  10 +-
 resolve-undo.c                          |   7 +-
 submodule.c                             |  25 +---
 t/t0011-hashmap.sh                      | 236 +++++++++++++++++++++++++++++
 test-hashmap.c                          | 256 ++++++++++++++++++++++++++++++++
 unpack-trees.c                          |   3 +-
 19 files changed, 1199 insertions(+), 529 deletions(-)
 delete mode 100644 Documentation/technical/api-hash.txt
 create mode 100644 Documentation/technical/api-hashmap.txt
 delete mode 100644 hash.c
 delete mode 100644 hash.h
 create mode 100644 hashmap.c
 create mode 100644 hashmap.h
 create mode 100755 t/t0011-hashmap.sh
 create mode 100644 test-hashmap.c

===

Potential use-after-free problems in unpack-trees.c:
----------------------------------------------------

Basic analysis of unpack-trees.c is that most callers go through add_entry or merged_entry, which both add a copy of the cache_entry to a separate index_state (unpack_trees_options.result). This may replace and free entries in unpack_trees_options.result, which is (mostly) ok because unpack_trees_options.result is (mostly) not used otherwise. Adding copies to a separate index is completely safe, so we don't even have to look at most of the other functions in unpack-trees.c.

The two call hierarchies that I'm worried about are as follows:

1.  unpack_callback may pass a cache_entry from unpack_trees_options.src_index
    to unpack_nondirectories (as src[0]), which _may_ add it to
    unpack_trees_options.result without making a copy.
1a. adding a cache_entry from src_index to result without making a copy
    destroys hashtable chaining in src_index (do_add_entry clears ce->next).
1b. if this cache_entry is later replaced and freed in result, it will
    implicitly free an entry that's still in src_index.

  read-cache.c::add_index_entry
    unpack-trees.c::do_add_entry
      unpack-trees.c::unpack_nondirectories
        unpack-trees.c::unpack_callback

2. unpack_trees iterates over unpack_trees_options.result and uses the
   cache_entry after calling verify_absent / apply_sparse_checkout, which in
   turn _may_ replace and free that cache_entry

  read-cache.c::add_index_entry
    unpack-trees.c::do_add_entry
      unpack-trees.c::add_entry
        unpack-trees.c::verify_clean_subdirectory
          unpack-trees.c::check_ok_to_remove
            unpack-trees.c::verify_absent_1
              unpack-trees.c::verify_absent_sparse
                unpack-trees.c::apply_sparse_checkout
                  unpack-trees.c::unpack_trees
              unpack-trees.c::verify_absent
                unpack-trees.c::unpack_trees


Static use-after-free analysis:
-------------------------------

What I did so far is this:

1. Get the reverse call hierarchy for remove_name_hash, up to cmd_* (thanks
   to Eclipse) - results in ~180 functions.
2. Get functions that reference struct cache_entry - results in ~200
   functions.
3. Intersect the two lists - results in 54 functions that _may_ use
   cache_entry after removing / freeing it:

ok builtin/apply.c::add_conflicted_stages_file
ok builtin/apply.c::add_index_file
ok builtin/apply.c::build_fake_ancestor
ok builtin/blame.c::fake_working_tree_commit
ok builtin/checkout.c::checkout_paths
ok builtin/checkout.c::update_some
ok builtin/commit.c::list_paths
ok builtin/ls-files.c::overlay_tree_on_cache
ok builtin/merge.c::suggest_conflicts
ok builtin/reset.c::update_index_from_diff
not-ok builtin/rm.c::cmd_rm (ce->name added to list without strdup)
ok builtin/update-index.c::add_cacheinfo
ok builtin/update-index.c::add_one_path
not-ok builtin/update-index.c::do_reupdate (ce->name passed to update_one,
   which uses it after remove_file_from_cache)
ok builtin/update-index.c::process_directory
ok builtin/update-index.c::process_path
ok builtin/update-index.c::unresolve_one
ok merge-recursive.c::add_cacheinfo
ok read-cache.c::add_index_entry
ok read-cache.c::add_index_entry_with_check
ok read-cache.c::add_to_index
ok read-cache.c::check_file_directory_conflict (called from add_index_entry,
   so ce is not yet in the index)
ok read-cache.c::has_dir_name (called from add_index_entry, so ce is not yet
   in the index)
ok read-cache.c::has_file_name (called from add_index_entry, so ce is not yet
   in the index)
not-ok read-cache.c::read_index_unmerged (uses old ce after replacing
   it with new_ce)
ok read-cache.c::refresh_index
ok read-cache.c::rename_index_entry_at
ok resolve-undo.c::unmerge_index
not-ok resolve-undo.c::unmerge_index_entry_at (uses ce->name after
   remove_index_entry_at)
ok resolve-undo.c::unmerge_marked_index
ok sequencer.c::do_recursive_merge
ok tree.c::read_one_entry_opt
ok tree.c::read_tree
ok unpack-trees.c::add_entry (adds a copy to unpack_trees_options.result)
-- unpack-trees.c::add_same_unmerged
?? unpack-trees.c::apply_sparse_checkout
-- unpack-trees.c::bind_merge
?? unpack-trees.c::check_ok_to_remove
ok unpack-trees.c::check_updates (ok - holds no cache_entry references
   while calling remove_marked_cache_entries)
-- unpack-trees.c::deleted_entry
ok unpack-trees.c::do_add_entry (adds ce to unpack_trees_options.result)
-- unpack-trees.c::keep_entry
ok unpack-trees.c::merged_entry (adds a copy to unpack_trees_options.result)
-- unpack-trees.c::oneway_merge
-- unpack-trees.c::threeway_merge
-- unpack-trees.c::twoway_merge
?? unpack-trees.c::unpack_callback (see unpack_nondirectories)
-- unpack-trees.c::unpack_index_entry
?? unpack-trees.c::unpack_nondirectories (may call do_add_entry without
   copying src[0])
?? unpack-trees.c::unpack_trees (may replace entries in
   unpack_trees_options.result while iterating over it)
?? unpack-trees.c::verify_absent (see unpack_trees)
?? unpack-trees.c::verify_absent_1 (see unpack_trees)
?? unpack-trees.c::verify_absent_sparse (see unpack_trees)
?? unpack-trees.c::verify_clean_subdirectory (see unpack_trees)

             reply	other threads:[~2013-11-07 14:32 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-07 14:32 Karsten Blees [this message]
2013-11-07 14:33 ` [PATCH v4 01/14] submodule: don't access the .gitmodules cache entry after removing it Karsten Blees
2013-11-07 22:27   ` Heiko Voigt
2013-11-07 14:34 ` [PATCH v4 02/14] add a hashtable implementation that supports O(1) removal Karsten Blees
2013-11-07 21:40   ` Junio C Hamano
2013-11-08 10:27     ` Karsten Blees
2013-11-08 16:45       ` Philip Oakley
2013-11-08 17:08       ` Junio C Hamano
2013-11-13 16:37         ` Karsten Blees
2013-11-07 14:35 ` [PATCH v4 03/14] buitin/describe.c: use new hash map implementation Karsten Blees
2013-11-07 14:36 ` [PATCH v4 04/14] diffcore-rename.c: move code around to prepare for the next patch Karsten Blees
2013-11-07 14:36 ` [PATCH v4 05/14] diffcore-rename.c: simplify finding exact renames Karsten Blees
2013-11-07 14:37 ` [PATCH v4 06/14] diffcore-rename.c: use new hash map implementation Karsten Blees
2013-11-07 14:38 ` [PATCH v4 07/14] name-hash.c: use new hash map implementation for directories Karsten Blees
2013-11-07 14:38 ` [PATCH v4 08/14] name-hash.c: remove unreferenced directory entries Karsten Blees
2013-11-07 14:39 ` [PATCH v4 09/14] name-hash.c: use new hash map implementation for cache entries Karsten Blees
2013-11-07 14:39 ` [PATCH v4 10/14] name-hash.c: remove cache entries instead of marking them CE_UNHASHED Karsten Blees
2013-11-07 14:40 ` [PATCH v4 11/14] remove old hash.[ch] implementation Karsten Blees
2013-11-07 14:43 ` [PATCH v4 12/14] fix 'git update-index --verbose --again' output Karsten Blees
2013-11-07 22:12   ` Junio C Hamano
2013-11-08 10:27     ` Karsten Blees
2013-11-07 14:44 ` [PATCH v4 13/14] builtin/update-index.c: cleanup update_one Karsten Blees
2013-11-07 21:40   ` Junio C Hamano
2013-11-08 10:27     ` Karsten Blees
2013-11-07 14:45 ` [PATCH v4 14/14] read-cache.c: fix memory leaks caused by removed cache entries Karsten Blees
2013-11-07 21:40   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=527BA483.6040803@gmail.com \
    --to=karsten.blees@gmail.com \
    --cc=Jens.Lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=tr@thomasrast.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.