All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexander Bulekov <alxndr@bu.edu>
To: Qiuhao Li <Qiuhao.Li@outlook.com>, qemu-devel@nongnu.org
Cc: darren.kenny@oracle.com, bsd@redhat.com, thuth@redhat.com,
	stefanha@redhat.com, pbonzini@redhat.com
Subject: Re: [PATCH 1/4] fuzz: refine crash detection mechanism
Date: Tue, 22 Dec 2020 11:47:59 -0500	[thread overview]
Message-ID: <87a6u5kcsg.fsf@stormtrooper.vrmnet> (raw)
In-Reply-To: <ME3P282MB14924A6558A105B7FBFA579DFCC20@ME3P282MB1492.AUSP282.PROD.OUTLOOK.COM>


Oops let me try to resend this..

Qiuhao Li <Qiuhao.Li@outlook.com> writes:

> The original crash detection method is to fork a process to test our new
> trace input. If the child process exits in time and the second-to-last line
> is the same as the first crash, we think it is a crash triggered by the same
> bug. However, in some situations, it doesn't work since it is a
> hardcoded-offset string comparison.
>
> For example, suppose an assertion failure makes the crash. In that case, the
> second-to-last line will be 'timeout: the monitored command dumped core',
>

Ah - I have not encountered this message. Are you running an
--enable-sanitizers build? I believe ASAN disables coredumps, by
default. I have to turn them on with:
ASAN_OPTIONS=abort_on_error=1:disable_coredump=0:unmap_shadow_on_exit=1

Maybe this is a matter of setting the correct env variables/disabling
coredumps?

I like the idea of switching out CRASH_TOKEN for a regex, however I am
not sure about using the hardcoded crash_patterns to perform matching:

1.) You risk missing some crash pattern. E.g. I don't think
abort()/hw_error() are covered right now.
2.) At some point ASAN/compiler-rt might change the way it outputs
crashes.

I think the current lines[-2] approach is ugly, but it is small, works
in most cases (when coredumps are disabled), and has a simple
CRASH_TOKEN fallback mechanism. We should fix the coredump problem.

Is there any way to do this without hardcoding patterns (or at least
fall-back to something if you don't find a pattern)?

-Alex

> which doesn't contain any information about the assertion failure like where
> it happened or the assertion statement. This may lead to a minimized input
> triggers assertion failure but may indicate another bug. As for some
> sanitizers' crashes, the direct string comparison may stop us from getting a
> smaller input, since they may have a different leaf stack frame.
>
> Perhaps we can detect crashes using both precise output string comparison
> and rough pattern string match and info the user when the trace input
> triggers different but a seminar output.
>
> Tested:
> Assertion failure, https://bugs.launchpad.net/qemu/+bug/1908062
> AddressSanitizer, https://bugs.launchpad.net/qemu/+bug/1907497
> Trace input that doesn't crash
> Trace input that crashes Qtest
>
> Signed-off-by: Qiuhao Li <Qiuhao.Li@outlook.com>
> ---
>  scripts/oss-fuzz/minimize_qtest_trace.py | 59 ++++++++++++++++++------
>  1 file changed, 46 insertions(+), 13 deletions(-)
>
> diff --git a/scripts/oss-fuzz/minimize_qtest_trace.py b/scripts/oss-fuzz/minimize_qtest_trace.py
> index 5e405a0d5f..d3b09e6567 100755
> --- a/scripts/oss-fuzz/minimize_qtest_trace.py
> +++ b/scripts/oss-fuzz/minimize_qtest_trace.py
> @@ -10,11 +10,16 @@ import os
>  import subprocess
>  import time
>  import struct
> +import re
>
>  QEMU_ARGS = None
>  QEMU_PATH = None
>  TIMEOUT = 5
> -CRASH_TOKEN = None
> +
> +crash_patterns = ("Assertion.+failed",
> +                  "SUMMARY.+Sanitizer")
> +crash_pattern = None
> +crash_string = None
>
>  write_suffix_lookup = {"b": (1, "B"),
>                         "w": (2, "H"),
> @@ -24,13 +29,12 @@ write_suffix_lookup = {"b": (1, "B"),
>  def usage():
>      sys.exit("""\
>  Usage: QEMU_PATH="/path/to/qemu" QEMU_ARGS="args" {} input_trace output_trace
> -By default, will try to use the second-to-last line in the output to identify
> -whether the crash occred. Optionally, manually set a string that idenitifes the
> -crash by setting CRASH_TOKEN=
> +By default, we will try to search predefined crash patterns through the
> +tracing output to see whether the crash occred. Optionally, manually set a
> +string that idenitifes the crash by setting CRASH_PATTERN=
>  """.format((sys.argv[0])))
>
>  def check_if_trace_crashes(trace, path):
> -    global CRASH_TOKEN
>      with open(path, "w") as tracefile:
>          tracefile.write("".join(trace))
>
> @@ -42,17 +46,47 @@ def check_if_trace_crashes(trace, path):
>                            shell=True,
>                            stdin=subprocess.PIPE,
>                            stdout=subprocess.PIPE)
> +    if rc.returncode == 137:    # Timed Out
> +        return False
> +
>      stdo = rc.communicate()[0]
>      output = stdo.decode('unicode_escape')
>      > -    if rc.returncode == 137:    # Timed Out
> -        return False
> -    if len(output.splitlines()) < 2:
> +    output_lines = output.splitlines()
> +    # Usually we care about the summary info in the last few lines, reverse.
> +    output_lines.reverse()
> +
> +    global crash_pattern, crash_patterns, crash_string
> +    if crash_pattern is None: # Initialization
> +        for line in output_lines:
> +            for c in crash_patterns:
> +                if re.search(c, line) is not None:
> +                    crash_pattern = c
> +                    crash_string = line
> +                    print("Identifying crash pattern by this string: ",\
> +                          crash_string)
> +                    print("Using regex pattern: ", crash_pattern)
> +                    return True
> +        print("Failed to initialize crash pattern: no match.")
>          return False
>
> -    if CRASH_TOKEN is None:
> -        CRASH_TOKEN = output.splitlines()[-2]
> +    # First, we search exactly the previous crash string.
> +    for line in output_lines:
> +        if crash_string == line:
> +            return True
> +
> +    # Then we decide whether a similar (same pattern) crash happened.
> +    # Slower now :(
> +    for line in output_lines:
> +        if re.search(crash_pattern, line) is not None:
> +            print("\nINFO: The crash string changed during our minimization process.")
> +            print("Before: ", crash_string)
> +            print("After: ", line)
> +            print("The original regex pattern can still match, updated the crash string.")
> +            crash_string = line
> +            return True
>
> -    return CRASH_TOKEN in output
> +    # The input did not trigger (the same type) bug.
> +    return False
>
>
>  def minimize_trace(inpath, outpath):
> @@ -66,7 +100,6 @@ def minimize_trace(inpath, outpath):
>      print("Crashed in {} seconds".format(end-start))
>      TIMEOUT = (end-start)*5
>      print("Setting the timeout for {} seconds".format(TIMEOUT))
> -    print("Identifying Crashes by this string: {}".format(CRASH_TOKEN))
>
>      i = 0
>      newtrace = trace[:]
> @@ -152,6 +185,6 @@ if __name__ == '__main__':
>          usage()
>      # if "accel" not in QEMU_ARGS:
>      #     QEMU_ARGS += " -accel qtest"
> -    CRASH_TOKEN = os.getenv("CRASH_TOKEN")
> +    crash_pattern = os.getenv("CRASH_PATTERN")
>      QEMU_ARGS += " -qtest stdio -monitor none -serial none "
>      minimize_trace(sys.argv[1], sys.argv[2])
> --
> 2.25.1


  parent reply	other threads:[~2020-12-22 16:49 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-19 18:39 [PATCH 0/4] improve crash case minimization Qiuhao Li
2020-12-19 18:56 ` [PATCH 1/4] fuzz: refine crash detection mechanism Qiuhao Li
2020-12-21 18:46   ` Alexander Bulekov
2020-12-22 11:18     ` Qiuhao Li
2020-12-22 16:47   ` Alexander Bulekov [this message]
2020-12-23  5:58     ` Li Qiuhao
2020-12-19 18:56 ` [PATCH 2/4] fuzz: split QTest writes from the rightmost byte Qiuhao Li
2020-12-21 20:01   ` Alexander Bulekov
2020-12-22 11:20     ` Qiuhao Li
2020-12-19 18:56 ` [PATCH 3/4] fuzz: setting bits in operand of out/write to zero Qiuhao Li
2020-12-21 20:35   ` Alexander Bulekov
2020-12-22 11:21     ` Qiuhao Li
2020-12-19 18:56 ` [PATCH 4/4] fuzz: delay IO until they can't trigger the crash Qiuhao Li
2020-12-21 21:17   ` Alexander Bulekov
2020-12-22 11:22     ` Qiuhao Li
2020-12-22 18:30       ` Alexander Bulekov
2020-12-23  9:20         ` Qiuhao Li
2020-12-25  0:24           ` Alexander Bulekov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a6u5kcsg.fsf@stormtrooper.vrmnet \
    --to=alxndr@bu.edu \
    --cc=Qiuhao.Li@outlook.com \
    --cc=bsd@redhat.com \
    --cc=darren.kenny@oracle.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.