Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way

From: George Dunlap <george.dunlap@citrix.com>
To: Marc Zyngier <marc.zyngier@arm.com>,
	Julien Grall <julien.grall@linaro.org>,
	Jan Beulich <JBeulich@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Andre Przywara <andre.przywara@arm.com>, Tim Deegan <tim@xen.org>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
Date: Fri, 8 Dec 2017 10:56:35 +0000
Message-ID: <3fdd243a-a40d-3f02-e64e-2253f9456b3f@citrix.com>
In-Reply-To: <f1f6271d-d3fb-1a47-72a0-d110fcc0d99b@arm.com>

On 12/07/2017 07:21 PM, Marc Zyngier wrote:
> On 07/12/17 18:06, George Dunlap wrote:
>> On 12/07/2017 04:58 PM, Marc Zyngier wrote:
>>> On 07/12/17 16:44, George Dunlap wrote:
>>>> On 12/07/2017 04:04 PM, Julien Grall wrote:
>>>>> Hi Jan,
>>>>>
>>>>> On 07/12/17 15:45, Jan Beulich wrote:
>>>>>>>>> On 07.12.17 at 15:53, <marc.zyngier@arm.com> wrote:
>>>>>>> On 07/12/17 13:52, Julien Grall wrote:
>>>>>>> There is exactly one case where set/way makes sense, and that's when
>>>>>>> you're the only CPU left in the system, your MMU is off, and you're
>>>>>>> about to go down.
>>>>>>
>>>>>> With this and ...
>>>>>>
>>>>>>> On top of bypassing the coherency, S/W CMOs do not prevent lines from
>>>>>>> migrating from one CPU to another. So you could happily be flushing by
>>>>>>> S/W, and still end up with dirty lines in your cache. Success!
>>>>>>
>>>>>> ... this I wonder what value emulating those insns then has in the first
>>>>>> place. Can't you as well simply skip and ignore them, with the same
>>>>>> (bad) result?
>>>>>
>>>>> The result will be much, much worse. Here is a concrete example with Linux
>>>>> on 32-bit Arm:
>>>>>
>>>>>     1) Cache enabled
>>>>>     2) Decompress
>>>>>     3) Nuke cache (S/W)
>>>>>     4) Cache off
>>>>>     5) Access new kernel
>>>>>
>>>>> If you skip #3, the decompressed data may not have reached memory, so
>>>>> you would access stale data.
>>>>>
>>>>> This would effectively mean we don't support Linux Arm 32-bit.
>>>>
>>>> So Marc said that #3 "doesn't make sense", since although it might be
>>>> the only cpu on in the system, you're not "about to go down"; but Linux
>>>> 32-bit is doing that anyway.
>>>
>>> "Doesn't make sense" on an ARMv7+ with SMP. That code dates back to
>>> ARMv4, and has been left untouched ever since. "If it ain't broke..."
>>>
>>>> It sounds like from the slides the purpose of #3 might be to get stuff
>>>> out of the D-cache into the I-cache.  But why is the cache turned off?
>>>
>>> Linux mandates that the kernel is entered with the MMU off. Which has
>>> the effect of disabling the caches too (VIVT caches and all that jazz).
>>>
>>>> And why doesn't Linux use the VA-based flushes rather than the S/W flushes?
>>>
>>> Linux/arm64 does. Changing the 32bit port to use VA CMOs would probably
>>> break stuff from the late 90s, so that's not going to happen. These
>>> days, I tend to pick my battles... ;-)
>>
>> OK, so let me try to state this "forwards" for those of us not familiar
>> with the situation:
>>
>> 1. Linux expects to start in 'linear' mode, with the MMU disabled.
>>
>> 2. On ARM, disabling the MMU disables caching (!).  But disabling
>> caching doesn't flush the cache; it just means the cache is bypassed (!).
>>
>> 3. Which means for Linux on ARM, after unzipping the kernel image, you
>> need to flush the cache before disabling the MMU and starting Linux proper.
>>
>> 4. For historical reasons, 32-bit ARM Linux uses the S/W instructions to
>> flush the cache.  This still works on 32-bit hardware, and so the Linux
>> maintainers are loath to change it, even though more reliable VA-based
>> instructions are available (?).
> 
> It also works on 64bit HW. It is just not easily virtualizable, which is
> why we've removed all S/W from the 64bit Linux port a while ago.

From the diagram in your talk, it looked like the "flush the cache"
operation *doesn't* work anywhere that has a "system cache", even on
bare metal.

>> 6. Rather than fix this in Linux, KVM has added a work-around in which
>> the *hypervisor* flushes the caches at certain points (!!!).  Julien is
>> looking into doing the same with Xen.
> 
> The "at certain points" doesn't quite describe it. We fully emulate S/W
> instructions using the biggest hammer we can find.

Oh, I thought Julien was saying something about flushing the guest's RAM
every time caching was enabled or disabled.

>> Given the variety of hardware that Linux has to run on, it's hard to
>> understand why 1) 32-bit ARM Linux couldn't detect if it would be
>> appropriate to use VA-based instructions rather than S/W instructions 2)
>> There couldn't at least be a Kconfig option to use VA instructions
>> instead of S/W instructions.
> 
> [Linux hat on]
> 
> 1) There is hardly anything to detect. Both sets of CMOs are available
> on a moderately recent implementation. What you'd want to detect is that
> the kernel is "virtualizable", which is not an easy task.
<snip>
> An alternative option would be to switch to VA CMOs if compiled for
> ARMv7 (and maybe v6), assuming that doesn't have any horrible side
> effect with broken cache implementations (and there are a few out there).
> You'll have to check that this doesn't regress on any existing HW.

So the idea would be to use the VA-based operations if available, and
then special-case specific chipsets known to have issues.  Linux (and
Xen and...) end up doing this for lots of different kinds of hardware;
this would be no different.
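
To make that concrete, here is a rough sketch in C of the "use VA
operations unless the chip is on a known-bad list" selection.  It is only
an illustration of the shape of the logic, not code from Linux or Xen:
the names (cmo_quirk, broken_va_cmo, pick_cmo) and the ID values in the
table are made up, and a real table would have to be populated from
actual reports of broken cache implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum cmo_kind { CMO_BY_VA, CMO_BY_SET_WAY };

/* Hypothetical quirk entry: a CPU matches if (id & id_mask) == id_value. */
struct cmo_quirk {
    uint32_t id_mask;
    uint32_t id_value;
};

/* Placeholder entries only; these IDs do not correspond to real parts. */
static const struct cmo_quirk broken_va_cmo[] = {
    { 0xfffffff0u, 0xdead0000u },   /* a whole family of revisions */
    { 0xffffffffu, 0xbeef0001u },   /* one exact part */
};

/*
 * Prefer VA-based maintenance when the CPU has it, and fall back to
 * set/way only when VA operations are missing or the CPU is listed
 * in the quirk table.
 */
static enum cmo_kind pick_cmo(uint32_t cpu_id, bool has_va_cmo)
{
    size_t n = sizeof(broken_va_cmo) / sizeof(broken_va_cmo[0]);

    if (!has_va_cmo)
        return CMO_BY_SET_WAY;

    for (size_t i = 0; i < n; i++) {
        const struct cmo_quirk *q = &broken_va_cmo[i];

        /* A table entry must not ask for bits outside its own mask. */
        assert((q->id_value & ~q->id_mask) == 0);

        if ((cpu_id & q->id_mask) == q->id_value)
            return CMO_BY_SET_WAY;
    }
    return CMO_BY_VA;
}
```

The point of the table is that the default becomes the VA path, and each
known-bad implementation costs one line rather than a Kconfig option.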

> 2) Kconfig options are the way to hell. It took us 5 years to get a
> 32bit kernel that would boot on about anything, and we're not going to
> go back.

Well, at the moment you *don't* have a 32-bit kernel that will boot on
anything.  It won't boot (it sounds like) on any 32-bit system that has
a system cache, including a 64-bit hypervisor providing a 32-bit guest.

Alternatively, would it make sense to have a PV "cache flush" operation
for hypervisors?  x86 has a way to expose hypervisor capabilities via
specific CPUID leaves.  Does anything like this exist for ARM?  If so,
the code could be, "If virtualized and hypervisor provides PV cache
flush, use that.  Otherwise, fall back to S/W operation."
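
Something like the following, as a minimal sketch in C.  Everything in
it is hypothetical: I am not claiming ARM has a capability-discovery
interface of this shape, and hyp_info, HYP_CAP_PV_CACHE_FLUSH and
pick_flush_method are names invented for the example.  It only shows the
decision, not the flush itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit a hypervisor might advertise to guests. */
#define HYP_CAP_PV_CACHE_FLUSH (1u << 0)

enum flush_method { FLUSH_PV, FLUSH_SET_WAY };

/* Hypothetical result of whatever discovery mechanism ends up existing. */
struct hyp_info {
    bool     virtualized;   /* are we running under a hypervisor? */
    uint32_t caps;          /* capability bits advertised by it */
};

/*
 * If virtualized and the hypervisor provides a PV cache flush, use that;
 * otherwise fall back to the existing set/way operation.
 */
static enum flush_method pick_flush_method(const struct hyp_info *hi)
{
    assert(hi != NULL);

    if (hi->virtualized && (hi->caps & HYP_CAP_PV_CACHE_FLUSH))
        return FLUSH_PV;

    return FLUSH_SET_WAY;
}
```

On bare metal, or under a hypervisor that doesn't advertise the
capability, this keeps today's behaviour unchanged.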

> Of course, none of that will solve the most important issue, which is to
> boot an unmodified kernel from yesterday to install a distribution. If
> you want to be able to do that, you'll have to use the aforementioned
> hammer.

Well it will take time to code up a solution and get *that* into users'
hands as well.  I would think the fastest way to get *most* distros
working would be to open a ticket saying it's broken on virtual
hardware, and asking them to apply a patch.  Getting the more
"enterprisey" distros working could then be prioritized if and when
needed.

Just to be clear -- I'm just trying to help push to explore other
options here.  I'm not opposed to Julien or someone making a work-around
in Xen.  But it's quite a bit of effort to achieve a pretty crappy end,
so I think it's worth exploring what kind of effort we could spend
achieving a "proper" fix first.

(Thanks also for taking the time to help explain this.)

 -George
