From: Alexey G <x1917x@gmail.com>
To: hrg <hrgstephen@gmail.com>
Cc: anthony.perard@citrix.com, xen-devel@lists.xensource.com,
	qemu-devel@nongnu.org, jun.nakajima@intel.com, agraf@suse.de,
	sstabellini@kernel.org, xen-devel@lists.xenproject.org,
	wangxinxin.wang@huawei.com,
	"Herongguang (Stephen)" <herongguang.he@huawei.com>,
	xen-devel@lists.xen.org
Subject: Re: [Qemu-devel] [Xen-devel] [RFC/BUG] xen-mapcache: buggy invalidate map cache?
Date: Mon, 10 Apr 2017 03:52:05 +1000	[thread overview]
Message-ID: <20170410035205.000050b1@gmail.com> (raw)
In-Reply-To: <CADZi59xEg0SvScgo=Jg0MxTbGYY=nTX18Qo-4GMkfayThDU9zw@mail.gmail.com>

On Mon, 10 Apr 2017 00:36:02 +0800
hrg <hrgstephen@gmail.com> wrote:

Hi,

> On Sun, Apr 9, 2017 at 11:55 PM, hrg <hrgstephen@gmail.com> wrote:
> > On Sun, Apr 9, 2017 at 11:52 PM, hrg <hrgstephen@gmail.com> wrote:  
> >> Hi,
> >>
> >> In xen_map_cache_unlocked(), a mapping of guest memory may end up in
> >> entry->next rather than in the first-level entry (if a mapping of a
> >> ROM, rather than guest memory, comes first). However, when the VM
> >> balloons out memory, xen_invalidate_map_cache() does not invalidate
> >> the cache entries in the linked list (entry->next). So when the VM
> >> balloons the memory back in, the gfns are probably mapped to
> >> different mfns, and if the guest then asks a device to DMA to these
> >> GPAs, QEMU may DMA to stale MFNs.
> >>
> >> So I think the linked lists should also be checked and invalidated
> >> in xen_invalidate_map_cache().
> >>
> >> What’s your opinion? Is this a bug? Is my analysis correct?  
> >
> > Added Jun Nakajima and Alexander Graf  
> And correct Stefano Stabellini's email address.
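
For reference, the structure in question looks roughly like this (the
field names approximate QEMU's xen-mapcache.c of that time and are meant
as an illustration only, not a copy of the source):

typedef struct MapCacheEntry {
    hwaddr paddr_index;           /* which guest address range this maps */
    uint8_t *vaddr_base;          /* host-side mapping, NULL if unused    */
    unsigned long *valid_mapping; /* per-page validity bitmap             */
    uint8_t lock;                 /* >0 while a mapping is still in use   */
    hwaddr size;
    struct MapCacheEntry *next;   /* overflow chain within one bucket     */
} MapCacheEntry;

xen_map_cache_unlocked() hashes the guest address into a bucket and, if
the bucket's head entry already maps something else (e.g. a ROM mapped
earlier), falls back to an entry on the ->next chain - and that chain is
what the invalidation pass skips.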

There is indeed a real issue with xen-mapcache corruption. I encountered
it a few months ago while experimenting with Q35 support on Xen. Q35
emulation uses an AHCI controller by default, with NCQ enabled. The issue
can be reproduced (somewhat) easily there; it might also be reproducible
with normal i440 emulation using dedicated test code on the guest side,
but with Q35+NCQ it can be reproduced "as is".

The issue occurs when a guest domain performs intensive disk I/O, e.g.
while the guest OS is booting. QEMU crashes with a "Bad ram offset
980aa000" message logged, where the address differs each time. The hard
part is that the issue has a very low reproducibility rate.

The corruption happens when there are multiple I/O commands in the NCQ
queue. With overlapping emulated DMA operations in flight, QEMU issues a
sequence of mapcache actions that can be executed in the "wrong" order,
leaving the xen-mapcache inconsistent, so a bad address from the wrong
entry is returned.

The unfortunate part is that a QEMU crash on "Bad ram offset" is actually
the relatively good outcome, in the sense that the error gets caught.
There may be a much worse (if contrived) situation where the returned
address looks valid but points to different mapped memory.

The fix itself is not hard (e.g. an additional checked field in
MapCacheEntry), but some reliable way to test it is needed, given the low
reproducibility rate.
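
To make the idea concrete, an invalidation pass that also walks the chain
could look roughly like the following. This is only a sketch against the
illustrative MapCacheEntry above, with an assumed function name; the real
code also has to deal with reserved/locked entries, and
mapcache->nr_buckets and the bucket array layout are assumptions here:

static void invalidate_whole_map_cache(void)   /* illustrative name */
{
    unsigned long i;

    for (i = 0; i < mapcache->nr_buckets; i++) {
        MapCacheEntry *entry;

        /* Walk the full chain, not only the bucket's head entry. */
        for (entry = &mapcache->entry[i]; entry != NULL; entry = entry->next) {
            if (entry->vaddr_base == NULL || entry->lock > 0) {
                continue;   /* unused, or pinned by an in-flight mapping */
            }
            if (munmap(entry->vaddr_base, entry->size) != 0) {
                perror("invalidate_whole_map_cache: munmap");
                exit(-1);
            }
            entry->paddr_index = 0;
            entry->vaddr_base = NULL;
            entry->size = 0;
            g_free(entry->valid_mapping);
            entry->valid_mapping = NULL;
            /* A real fix would probably also unlink and free emptied
             * chained nodes here. */
        }
    }
}

The "additional checked field" I mentioned would be complementary to
this: store, say, the originally requested address range in the entry and
verify it on lookup, so that a stale chained entry is detected and
remapped instead of being silently reused.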

Regards,
Alex

