From: Yifan Lu <me@yifanlu.com>
To: Richard Henderson <richard.henderson@linaro.org>
Cc: "Bug 1863025" <1863025@bugs.launchpad.net>,
"Richard Henderson" <rth@twiddle.net>,
"Alex Bennée" <alex.bennee@linaro.org>,
qemu-devel@nongnu.org, "Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [PATCH] accel/tcg: fix race in cpu_exec_step_atomic (bug 1863025)
Date: Fri, 14 Feb 2020 16:01:17 -0800
Message-ID: <CAP4MwtfP_+pxb96WQ4coe187A4e2HasXqTXsobTn1UH+8RFK8Q@mail.gmail.com>
In-Reply-To: <dc224902-b8bb-934e-947a-4417449566ea@linaro.org>
What race are you thinking of in my patch? The obvious race I can
think of is benign:
Case 1:
A: does TB flush
B: read tb_flush_count
A: increment tb_flush_count
A: end_exclusive
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (increment seen)
B: retries
Case 2:
B: read tb_flush_count
A: does TB flush
A: increment tb_flush_count
A: end_exclusive
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (increment seen)
B: retries
Case 3:
A: does TB flush
A: increment tb_flush_count
A: end_exclusive
B: read tb_flush_count
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (no increment seen)
B: proceeds
Case 1 is the expected case. In case 2, we treat the TB as stale even
though it is not, so we just fetch it again with tb_lookup__cpu_state
at minimal extra overhead.
Case 3 looks bad because we could read tb_flush_count and find it
already incremented. But that means thread A is at the end of
do_tb_flush, so the lookup tables are already cleared and the TCG
context is already reset. It should therefore be safe for thread B to
call tb_lookup__cpu_state or tb_gen_code.
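To make the cases concrete, here is a minimal single-threaded sketch of
the read/lookup/re-check/retry scheme. It is not the actual patch: TB,
flush_count, flushes_to_inject and the model_* functions are
hypothetical stand-ins, and the racing flush is injected by a test hook
instead of a second thread. Real code would also need an atomic read of
the counter.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not QEMU's definitions. */
typedef struct TB {
    unsigned generation;   /* flush count at the time of the lookup */
} TB;

static unsigned flush_count;    /* models tb_ctx.tb_flush_count */
static TB tb_storage;           /* the one TB this model hands out */
static int flushes_to_inject;   /* test hook: flushes racing a lookup */

/* Models thread A: the flush ends by incrementing the counter. */
static void model_tb_flush(void)
{
    flush_count++;
}

/* Models tb_lookup__cpu_state/tb_gen_code on thread B. */
static TB *model_tb_lookup(void)
{
    tb_storage.generation = flush_count;
    if (flushes_to_inject > 0) {
        flushes_to_inject--;
        model_tb_flush();       /* a flush lands right after the lookup */
    }
    return &tb_storage;
}

/* Returns how many lookups were needed before the TB could be used. */
static int model_step_atomic(TB **out)
{
    int attempts = 0;

    for (;;) {
        unsigned seen = flush_count;    /* B: read tb_flush_count */
        TB *tb = model_tb_lookup();     /* B: lookup or generate */
        attempts++;
        /* B: start_exclusive would be taken here */
        if (flush_count == seen) {      /* no increment seen: proceed */
            *out = tb;
            return attempts;
        }
        /* increment seen: leave the exclusive section and retry */
    }
}
```

With flushes_to_inject set to 0 the loop proceeds after one lookup;
with it set to 2 the first two lookups are discarded and the third one
is used.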
Yifan
On Fri, Feb 14, 2020 at 3:31 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/14/20 6:49 AM, Alex Bennée wrote:
> > The bug describes a race whereby cpu_exec_step_atomic can acquire a TB
> > which is invalidated by a tb_flush before we execute it. This doesn't
> > affect the other cpu_exec modes as a tb_flush by its nature can only
> > occur on a quiescent system. The race was described as:
> >
> > B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
> > B3. tcg_tb_alloc obtains a new TB
> >
> > C3. TB obtained with tb_lookup__cpu_state or tb_gen_code
> > (same TB as B2)
> >
> > A3. start_exclusive critical section entered
> > A4. do_tb_flush is called, TB memory freed/re-allocated
> > A5. end_exclusive exits critical section
> >
> > B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
> > B3. tcg_tb_alloc reallocates TB from B2
> >
> > C4. start_exclusive critical section entered
> > C5. cpu_tb_exec executes the TB code that was freed in A4
> >
> > The simplest fix is to widen the exclusive period to include the TB
> > lookup. As a result we can drop the complication of checking we are in
> > the exclusive region before we end it.
>
> I'm not 100% keen on having the tb_gen_code within the exclusive region. It
> implies a much larger delay on (at least) the first execution of the atomic
> operation.
>
> But I suppose until recently we had a global lock around code generation, and
> this is only slightly worse. Plus, it has the advantage of being dead simple,
> and without the races vs tb_ctx.tb_flush_count that exist in Yifan's patch.
>
> Applied to tcg-next.
>
>
> r~
--
https://bugs.launchpad.net/bugs/1863025
Title:
Use-after-free after flush in TCG accelerator
Status in QEMU:
Confirmed
Bug description:
I believe I found a use-after-free (UAF) in TCG that can lead to a
guest VM escape. The security list informed me "This can not be treated
as a security issue." and told me to post it here. I am looking at the
4.2.0 source code.
The issue requires a race and I will try to describe it in terms of
three concurrent threads.
Thread A:
A1. qemu_tcg_cpu_thread_fn runs work loop
A2. qemu_wait_io_event => qemu_wait_io_event_common => process_queued_cpu_work
A3. start_exclusive critical section entered
A4. do_tb_flush is called, TB memory freed/re-allocated
A5. end_exclusive exits critical section
Thread B:
B1. qemu_tcg_cpu_thread_fn runs work loop
B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
B3. tcg_tb_alloc obtains a new TB
Thread C:
C1. qemu_tcg_cpu_thread_fn runs work loop
C2. cpu_exec_step_atomic executes
C3. TB obtained with tb_lookup__cpu_state or tb_gen_code
C4. start_exclusive critical section entered
C5. cpu_tb_exec executes the TB code
C6. end_exclusive exits critical section
Consider the following sequence of events:
B2 => B3 => C3 (same TB as B2) => A3 => A4 (TB freed) => A5 => B2 =>
B3 (re-allocates TB from B2) => C4 => C5 (freed/reused TB now executing) => C6
In short, because thread C obtains the TB before entering the critical
section and only uses it inside, there is no guarantee that the pointer
has not been "freed" in between (strictly, the memory is marked as
re-usable), and therefore a use-after-free occurs.
Since the TCG generated code can be in the same memory as the TB data
structure, it is possible for an attacker to overwrite the UAF pointer
with code generated from TCG. This can overwrite key pointer values
and could lead to code execution on the host outside of the TCG
sandbox.
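The "marked as re-usable" part can be shown with a small single-threaded
model. This is only a sketch under stated assumptions: ModelTB, arena,
arena_tb_alloc, arena_flush and replay_race are hypothetical names, and
the arena is a plain static array, not QEMU's real code buffer. It
replays B3 => C3 => A4 => B3 => C5 in order on one thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a bump-allocated TB buffer; not QEMU code. */
typedef struct ModelTB {
    char code[16];   /* stands in for the translated code of one TB */
} ModelTB;

static ModelTB arena[8];
static size_t arena_used;

/* Models tcg_tb_alloc: hand out the next unused slot. */
static ModelTB *arena_tb_alloc(const char *code)
{
    if (arena_used == sizeof(arena) / sizeof(arena[0])) {
        return NULL;                 /* buffer full */
    }
    ModelTB *tb = &arena[arena_used++];
    strncpy(tb->code, code, sizeof(tb->code) - 1);
    tb->code[sizeof(tb->code) - 1] = '\0';
    return tb;
}

/* Models do_tb_flush: nothing is released, slots just become reusable. */
static void arena_flush(void)
{
    arena_used = 0;
}

/* Replays the racy ordering and returns what thread C ends up seeing. */
static const char *replay_race(void)
{
    arena_flush();                                    /* clean start */
    ModelTB *c_tb = arena_tb_alloc("atomic step");    /* B3, C3 */
    arena_flush();                                    /* A4 */
    arena_tb_alloc("other block");                    /* B3 again */
    return c_tb->code;                                /* C5 */
}
```

In this model replay_race() returns "other block": the pointer thread C
kept from C3 is still addressable, but the slot now holds whatever
thread B generated after the flush.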