git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Thomas Gummerer <t.gummerer@gmail.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>, git@vger.kernel.org
Subject: playing with MSan, was Re: [PATCH 0/3] fixes for running the test suite with --valgrind
Date: Wed, 4 Oct 2017 06:19:32 -0400	[thread overview]
Message-ID: <20171004101932.pai6wzcv2eohsicr@sigill.intra.peff.net> (raw)
In-Reply-To: <20171003234153.pmslizidgbifl6en@sigill.intra.peff.net>

On Tue, Oct 03, 2017 at 07:41:54PM -0400, Jeff King wrote:

> I think using SANITIZE=memory would catch these, but it needs some
> suppressions tuning. The weird "zlib reads uninitialized memory" error
> is a problem (valgrind sees this, too, but we have suppressions).

I dug into this a little more. You can blacklist certain functions from
getting MSan treatment, but that's not quite what we want. We want to
mark bytes from certain _sources_ as being initialized, even if MSan
doesn't agree.

And indeed, you can do that. As far as I can tell, MSan works by keeping
a shadow map of memory and setting flags when it believes it has been
initialized, and then checking that map when we make decisions based on
the memory. But it can only do that if it instruments all writes. So the
MSan documentation recommends that you build _everything_, including
libraries, with it. Which obviously we don't do if we're using a system
zlib. Or a system libc for that matter (though they intercept many
common libc functions to handle this).

So one strategy is to "cheat" a bit at the library interfaces, and claim
whatever they send us is properly initialized. The patch below tries
that with zlib, and it does seem to work. It would fail to notice a real
problem with any input we send _to_ the library (since the library isn't
instrumented, and we claim that whatever comes out of it is legitimate).
I could probably live with that.

But there are quite a few test failures that would still need
investigating and annotating:

  - Certainly it's confused by looking at regmatch_t results from
    regexec(). We can fix that by building with NO_REGEX. But pcre has
    a similar problem.

  - Ditto curl and openssl, whose exit points would need annotations.

  - For some reason test-sigchain segfaults when it raise()s in the
    signal handler and recurses. Not sure if this is an MSan bug or
    what.

So I dunno. This approach is a _lot_ more convenient than trying to
rebuild all the dependencies from scratch, and it runs way faster than
valgrind. It did find the cases that led to the patches in this
series, and at least one more: if the lstat() at the end of
entry.c:write_entry() fails, we write nonsense into the cache_entry.

I think we could probably get it to zero false positives without _too_
much effort. I'll stop here for tonight, but I may pick it up again
later (of course anybody else is welcome to fool around with it, too).

Below is the patch that let me run:

  make SANITIZE=memory CC=clang-6.0 NO_REGEX=1

and get a tractable number of errors.

-- >8 --
diff --git a/Makefile b/Makefile
index b143e4eea3..1da5c01211 100644
--- a/Makefile
+++ b/Makefile
@@ -1047,6 +1047,9 @@ endif
 ifneq ($(filter leak,$(SANITIZERS)),)
 BASIC_CFLAGS += -DSUPPRESS_ANNOTATED_LEAKS
 endif
+ifneq ($(filter memory,$(SANITIZERS)),)
+BASIC_CFLAGS += -DENABLE_MSAN_UNPOISON
+endif
 endif
 
 ifndef sysconfdir
diff --git a/git-compat-util.h b/git-compat-util.h
index cedad4d581..836a4c0b54 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1191,4 +1191,11 @@ extern void unleak_memory(const void *ptr, size_t len);
 #define UNLEAK(var) do {} while (0)
 #endif
 
+#ifdef ENABLE_MSAN_UNPOISON
+#include <sanitizer/msan_interface.h>
+#define msan_unpoison(ptr, len) __msan_unpoison(ptr, len)
+#else
+#define msan_unpoison(ptr, len) do {} while (0)
+#endif
+
 #endif
diff --git a/zlib.c b/zlib.c
index 4223f1a8c5..5fa8f12507 100644
--- a/zlib.c
+++ b/zlib.c
@@ -56,6 +56,8 @@ static void zlib_post_call(git_zstream *s)
 	if (s->z.total_in != s->total_in + bytes_consumed)
 		die("BUG: total_in mismatch");
 
+	msan_unpoison(s->next_out, bytes_produced);
+
 	s->total_out = s->z.total_out;
 	s->total_in = s->z.total_in;
 	s->next_in = s->z.next_in;

  parent reply	other threads:[~2017-10-04 10:19 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-03 19:57 [PATCH 0/3] fixes for running the test suite with --valgrind Thomas Gummerer
2017-10-03 19:57 ` [PATCH 1/3] path.c: fix uninitialized memory access Thomas Gummerer
2017-10-03 22:45   ` Jonathan Nieder
2017-10-03 23:30     ` Jeff King
2017-10-03 23:37       ` Jonathan Nieder
2017-10-04  4:47         ` Junio C Hamano
2017-10-04  5:21           ` Jeff King
2017-10-04 19:22           ` Thomas Gummerer
2017-10-04 19:36           ` Jonathan Nieder
2017-10-03 19:57 ` [PATCH 2/3] http-push: fix construction of hex value from path Thomas Gummerer
2017-10-03 22:53   ` Jonathan Nieder
2017-10-03 23:36     ` Jeff King
2017-10-04  4:48       ` Junio C Hamano
2017-10-04  5:20         ` Junio C Hamano
2017-10-04  5:26           ` Jeff King
2017-10-04  6:26             ` Junio C Hamano
2017-10-03 19:57 ` [PATCH 3/3] sub-process: allocate argv on the heap Thomas Gummerer
2017-10-03 20:24   ` Johannes Sixt
2017-10-04  4:59     ` Junio C Hamano
2017-10-04  5:32       ` Jeff King
2017-10-04  5:58       ` Johannes Sixt
2017-10-04 19:31       ` Thomas Gummerer
2017-10-03 20:25   ` Stefan Beller
2017-10-03 23:41 ` [PATCH 0/3] fixes for running the test suite with --valgrind Jeff King
2017-10-03 23:50   ` Jonathan Nieder
2017-10-03 23:54     ` Jeff King
2017-10-04 10:19   ` Jeff King [this message]
2017-10-04 19:30     ` playing with MSan, was " Thomas Gummerer
2017-10-05  3:46       ` lstat-ing delayed-filter output, was Re: playing with MSan Jeff King
2017-10-05 10:47         ` Lars Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171004101932.pai6wzcv2eohsicr@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=t.gummerer@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).