* [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: Julien Grall @ 2017-12-05 18:39 UTC (permalink / raw)
  To: xen-devel, Jan Beulich, Andrew Cooper, George Dunlap,
	Stefano Stabellini, Andre Przywara, Tim Deegan

Hi all,

Even though this is an Arm-specific failure, I have CCed the x86 folks 
to get feedback on the approach. I have a WIP branch I could share if 
that interests people.

A few months ago, we noticed a heisenbug on jobs run by osstest on the 
cubietrucks (see [1]). From the logs, we figured out that guest vCPU 0 
is in a data/prefetch abort state at early boot. I have been able to 
reproduce it reliably, and from the little information I have, I think 
it is related to a cache issue, because we don't trap cache 
maintenance instructions by set/way.

This is a set of three instructions (clean, clean & invalidate, 
invalidate) working on a given cache level by set/way (S/W). Because 
the OS is not allowed to infer the S/W to PA mapping, it can only use 
S/W to nuke the whole cache. "The expected usage of the cache 
maintenance that operate by set/way is associated with powerdown and 
powerup of caches, if this is required by the implementation" (see 
D3-2020 ARM DDI 0487B.b).

These instructions target the local processor and are usually issued 
in a batch to nuke the whole cache. This means that if the vCPU is 
migrated to another pCPU in the middle of the sequence, the cache may 
not be fully cleaned. This would result in data corruption and a 
potential crash of the OS.
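
To illustrate the batch usage, here is roughly what such a full clean 
looks like (a sketch only; the helper names are made up, cf. the arm32 
v7_flush_dcache_all routine):

/* Sketch: how an OS nukes the whole D-cache by set/way. */
static void flush_dcache_all_by_set_way(void)
{
    unsigned int level, set, way;

    for ( level = 0; level < num_cache_levels(); level++ )
        for ( set = 0; set < num_sets(level); set++ )
            for ( way = 0; way < num_ways(level); way++ )
                /* DC CISW: clean & invalidate one set/way of one cache
                 * level, on the *local* CPU only. Migrating the vCPU in
                 * the middle of this loop leaves part of the old pCPU's
                 * cache unmaintained. */
                dc_cisw(encode_set_way(level, set, way));
}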

Thankfully, the Arm architecture offers a way to trap all the cache 
maintenance instructions by S/W (HCR_EL2.TSW). Xen will need to set 
that bit and handle S/W itself.

The major question now is how to handle them. S/W instructions are 
difficult to virtualize (see ARMv7 ARM B1.14.4).

The suggested policy is based on the KVM one:
	- If we trap an S/W instruction, we enable VM register trapping 
(HCR_EL2.TVM) to detect the cache being turned on/off, and do a full clean.
	- We flush the caches both when the caches are turned on and when 
they are turned off.
	- Once the caches are enabled, we stop trapping VM register accesses.
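
A rough sketch of that flow (the helper and flag names below are 
hypothetical, not existing Xen code):

/* Trapped DC ISW/CSW/CISW (HCR_EL2.TSW is set). */
void vcpu_handle_sw_cache_op(struct vcpu *v)
{
    if ( !v->arch.trapping_vm_regs )
    {
        /* Start watching SCTLR writes to spot the cache on/off switch. */
        WRITE_SYSREG(READ_SYSREG(HCR_EL2) | HCR_TVM, HCR_EL2);
        v->arch.trapping_vm_regs = true;
    }
    p2m_full_clean(v->domain);     /* preemptible full clean, see below */
}

/* Trapped write to SCTLR while HCR_EL2.TVM is set. */
void vcpu_handle_sctlr_write(struct vcpu *v, register_t val)
{
    p2m_full_clean(v->domain);     /* flush on both enable and disable */
    if ( val & SCTLR_C )           /* caches are now on: stop trapping */
    {
        WRITE_SYSREG(READ_SYSREG(HCR_EL2) & ~HCR_TVM, HCR_EL2);
        v->arch.trapping_vm_regs = false;
    }
}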

Doing a full clean will require going through the P2M and flushing the 
entries one by one. At the moment, all of the guest memory is mapped. 
As you can imagine, flushing a guest with hundreds of MB of RAM will 
take a very long time (Linux times out during CPU bring-up).

Therefore, we need a way to limit the number of entries we need to 
flush. The suggested solution here is to introduce Populate On Demand 
(PoD) on Arm.

The guest would boot with no RAM mapped in the stage-2 page-tables. On 
every prefetch/data abort, the RAM would be mapped, preferably in 2MB 
chunks, otherwise 4KB. This means that, by the time S/W is used, the 
number of entries mapped would be very limited. However, for safety, 
the flush should be preemptible.
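
Roughly (again a sketch, with made-up helper names):

/* Stage-2 abort handler: demand-populate guest RAM. */
bool p2m_pod_handle_fault(struct domain *d, paddr_t gpa)
{
    gfn_t gfn = gaddr_to_gfn(gpa);

    if ( !gfn_is_guest_ram(d, gfn) )
        return false;                   /* not RAM: a genuine abort */

    /* Prefer a 2MB superpage mapping; fall back to a 4KB page. */
    if ( !pod_populate_2m(d, gfn_align_2m(gfn)) )
        pod_populate_4k(d, gfn);

    return true;                        /* retry the faulting access */
}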

For those worried about the performance impact, I have looked at the 
current use of S/W instructions:
	- Linux arm64: the last use in the kernel dates from the beginning 
of 2015.
	- Linux arm32: still uses S/W for boot and secondary CPU bring-up. 
No plan to change.
	- UEFI: a couple of uses, but I have heard they plan to remove them 
(needs confirmation).

I haven't looked at all the OSes. However, given that the Arm Arm 
clearly states that S/W instructions are not easily virtualizable, I 
would expect guest OS developers to try their best to limit the use of 
these instructions.

To limit the performance impact, we could introduce a guest option to 
tell whether the guest will use S/W. If it does plan to use S/W, PoD 
will be disabled.

Now regarding the hardware domain: at the moment, its RAM is direct 
mapped. Supporting direct mapping in PoD would be quite a pain for a 
limited benefit (see above). In that case I would suggest imposing 
vCPU pinning for the hardware domain if S/W instructions are expected 
to be used. Again, a command line option could be introduced here.

Any feedback on the approach is welcome.

Cheers,

[1] 
https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg03191.html

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: Stefano Stabellini @ 2017-12-05 22:35 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	Jan Beulich, Andrew Cooper, xen-devel

On Tue, 5 Dec 2017, Julien Grall wrote:
> [...]
> For those worried about the performance impact, I have looked at the
> current use of S/W instructions:
> 	- Linux arm64: the last use in the kernel dates from the beginning of
> 2015.
> 	- Linux arm32: still uses S/W for boot and secondary CPU bring-up. No
> plan to change.
> 	- UEFI: a couple of uses, but I have heard they plan to remove them
> (needs confirmation).
> [...]
> Now regarding the hardware domain: at the moment, its RAM is direct
> mapped. Supporting direct mapping in PoD would be quite a pain for a
> limited benefit (see above). In that case I would suggest imposing vCPU
> pinning for the hardware domain if S/W instructions are expected to be
> used. Again, a command line option could be introduced here.
> 
> Any feedback on the approach is welcome.
 
Could we pin the hwdom vCPUs only at boot time, until all S/W operations
have been issued, and then "release" them, if we can detect the last
expected S/W operation with some sort of heuristic?

Given the information provided above, would it make sense to consider
avoiding PoD for arm64 kernel direct boots?


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: Julien Grall @ 2017-12-05 22:54 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: George Dunlap, Andre Przywara, Tim Deegan, Jan Beulich,
	Andrew Cooper, xen-devel



On 05/12/2017 22:35, Stefano Stabellini wrote:
> On Tue, 5 Dec 2017, Julien Grall wrote:
>> [...]
>   
> Could we pin the hwdom vCPUs only at boot time, until all S/W operations
> are issued, and then "release" them, if we can detect the last expected
> S/W operation with some sort of heuristic?

Feel free to suggest a way. I haven't found one. But to be honest, you 
have seen how much people care about a 32-bit hwdom today. So I would 
not spend too much time thinking about optimizing it.

> 
> Given the information provided above, would it make sense to consider
> avoiding PoD for arm64 kernel direct boots?

Please suggest a way to tell that an arm64 kernel direct boot will not 
be using S/W. I don't see any.

The only solution I can see is to provide a configuration option at 
boot time, as I suggested a bit above:

"To limit the performance impact, we could introduce a guest option to 
tell whether the guest will use S/W. If it does plan to use S/W, PoD 
will be disabled."

But at this stage, my concern is fixing a blatant bug in Xen; 
performance is a second step.

Cheers,

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: Jan Beulich @ 2017-12-06  9:15 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	Andrew Cooper, xen-devel

>>> On 05.12.17 at 19:39, <julien.grall@linaro.org> wrote:
> The suggested policy is based on the KVM one:
> 	- If we trap an S/W instruction, we enable VM register trapping 
> (HCR_EL2.TVM) to detect the cache being turned on/off, and do a full clean.
> 	- We flush the caches both when the caches are turned on and when 
> they are turned off.
> 	- Once the caches are enabled, we stop trapping VM register accesses.
> 
> Doing a full clean will require going through the P2M and flushing the 
> entries one by one. At the moment, all of the guest memory is mapped. As 
> you can imagine, flushing a guest with hundreds of MB of RAM will take a 
> very long time (Linux times out during CPU bring-up).
> 
> Therefore, we need a way to limit the number of entries we need to 
> flush. The suggested solution here is to introduce Populate On Demand 
> (PoD) on Arm.
> 
> The guest would boot with no RAM mapped in the stage-2 page-tables. On 
> every prefetch/data abort, the RAM would be mapped, preferably in 2MB 
> chunks, otherwise 4KB. This means that, by the time S/W is used, the 
> number of entries mapped would be very limited. However, for safety, 
> the flush should be preemptible.

For my own understanding: here you suggest using PoD in order
to deal with S/W insn interception.

> To limit the performance impact, we could introduce a guest option to 
> tell whether the guest will use S/W. If it does plan to use S/W, PoD 
> will be disabled.

Therefore I'm wondering if here you mean "If it doesn't plan to ..."

Independent of this I'm pretty unclear about your conclusion that
there will be only a very limited number of P2M entries at the time
S/W insns would be used by the guest. Are you ignoring potentially
malicious guests for the moment? Otoh you admit that things would
need to be preemptible, so perhaps the argumentation is that you
simply expect well-behaved guests to only have such a limited amount
of P2M entries.

Am I, btw, understanding correctly that other than on x86 you
intend PoD to not be used for maxmem > memory scenarios, at
least for the time being?

Jan



* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: Julien Grall @ 2017-12-06 12:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	Andrew Cooper, xen-devel

Hi Jan,

On 12/06/2017 09:15 AM, Jan Beulich wrote:
>>>> On 05.12.17 at 19:39, <julien.grall@linaro.org> wrote:
>> [...]
> 
> For my own understanding: here you suggest using PoD in order
> to deal with S/W insn interception.

That's right. PoD would limit the number of entries to flush.

> 
>> To limit the performance impact, we could introduce a guest option to
>> tell whether the guest will use S/W. If it does plan to use S/W, PoD
>> will be disabled.
> 
> Therefore I'm wondering if here you mean "If it doesn't plan to ..."

Whoops. I meant "If it doesn't plan".

> 
> Independent of this I'm pretty unclear about your conclusion that
> there will be only a very limited number of P2M entries at the time
> S/W insns would be used by the guest. Are you ignoring potentially
> malicious guests for the moment? Otoh you admit that things would
> need to be preemptible, so perhaps the argumentation is that you
> simply expect well-behaved guests to only have such a limited amount
> of P2M entries.

The preemption is there to cover malicious guests, and potentially 
well-behaved guest use cases I have missed. But TBH, the latter would 
be a call for the OS to be reworked, as fast emulation of S/W will be 
really difficult.

> 
> Am I, btw, understanding correctly that other than on x86 you
> intend PoD to not be used for maxmem > memory scenarios, at
> least for the time being?

Yes. I don't think it would be difficult to add that support for Arm as 
well.

Also, at the moment, the PoD code is nearly a verbatim copy of the x86 
version, and this is only because of its interface with the rest of the 
p2m code. I am planning to discuss on the ML the possibility of sharing 
the PoD code.

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: George Dunlap @ 2017-12-06 12:28 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Jan Beulich, Andrew Cooper,
	George Dunlap, Stefano Stabellini, Andre Przywara, Tim Deegan

On 12/05/2017 06:39 PM, Julien Grall wrote:
> [...]
> These instructions target the local processor and are usually issued in
> a batch to nuke the whole cache. This means that if the vCPU is migrated
> to another pCPU in the middle of the sequence, the cache may not be fully
> cleaned. This would result in data corruption and a potential crash of
> the OS.

I don't quite understand the failure mode here: Why does vCPU migration
cause cache inconsistency in the middle of one of these "cleans", but
not under normal operation?

> [...]
> Now regarding the hardware domain: at the moment, its RAM is direct
> mapped. Supporting direct mapping in PoD would be quite a pain for a
> limited benefit (see above). In that case I would suggest imposing vCPU
> pinning for the hardware domain if S/W instructions are expected to be
> used. Again, a command line option could be introduced here.
> 
> Any feedback on the approach is welcome.

I still don't entirely understand the underlying failure mode, but there
are a couple of things we could consider:

1. Automatically disabling 'vcpu migration' when caching is turned off.
This wouldn't prevent a vcpu from being preempted, just from being run
somewhere else.

2. It sounds like rather than using PoD, you could use the
"misconfigured p2m table" technique that x86 uses: set bits in the p2m
entry which cause a specific kind of HAP fault when accessed.  The fault
handler then looks in the p2m entry, and if it finds an otherwise valid
entry, it just fixes the "misconfigured" bits and continues.
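
Something like this (a sketch; the names are illustrative, not the
actual x86 code):

/* HAP fault handler path for a "misconfigured" entry. */
int p2m_fix_misconfig(struct p2m_domain *p2m, gfn_t gfn)
{
    pte_t *entry = p2m_lookup_entry(p2m, gfn);

    if ( !pte_is_misconfigured(*entry) )
        return -EINVAL;             /* a real fault, handled elsewhere */

    /* The entry is otherwise valid: repair the bits and resume. */
    pte_clear_misconfig(entry);
    p2m_flush_tlb_entry(p2m, gfn);
    return 0;
}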

 -George


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: Julien Grall @ 2017-12-06 12:58 UTC (permalink / raw)
  To: George Dunlap, xen-devel, Jan Beulich, Andrew Cooper,
	George Dunlap, Stefano Stabellini, Andre Przywara, Tim Deegan

Hi George,

On 12/06/2017 12:28 PM, George Dunlap wrote:
> On 12/05/2017 06:39 PM, Julien Grall wrote:
>> [...]
>> These instructions target the local processor and are usually issued in
>> a batch to nuke the whole cache. This means that if the vCPU is migrated
>> to another pCPU in the middle of the sequence, the cache may not be fully
>> cleaned. This would result in data corruption and a potential crash of
>> the OS.
> 
> I don't quite understand the failure mode here: Why does vCPU migration
> cause cache inconsistency in the middle of one of these "cleans", but
> not under normal operation?

Because they target a specific cache level by set/way, whereas the 
other cache operations work on VAs.

To make it short, the VA cache instructions work to the Point of 
Coherency/Point of Unification and guarantee that the caches will be 
consistent. For more details see B2.2.6 in ARM DDI 0406C.c.
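
For illustration (AArch64 mnemonics; the wrappers are made up):

/* By VA: clean one line, by address, to the Point of Coherency. The
 * effect is broadcast to the other CPUs in the shareability domain. */
static inline void clean_dcache_line_poc(const void *p)
{
    asm volatile ("dc cvac, %0" :: "r" (p) : "memory");
}

/* By set/way: clean one line of one level of the *local* cache only.
 * No VA, no broadcast: the OS has to loop over every set/way itself. */
static inline void clean_dcache_line_set_way(unsigned long setway)
{
    asm volatile ("dc csw, %0" :: "r" (setway) : "memory");
}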

> 
>> [...]
> 
> I still don't entirely understand the underlying failure mode, but there
> are a couple of things we could consider:
> 
> 1. Automatically disabling 'vcpu migration' when caching is turned off.
> This wouldn't prevent a vcpu from being preempted, just from being run
> somewhere else.

This suggests the guest will directly perform S/W, right? So you leave 
the guest the possibility to flush all the caches its vCPU can access. 
This is an easy way for the guest to affect the cache entries of other 
guests.

I think this would help some potential data attacks.

> 
> 2. It sounds like rather than using PoD, you could use the
> "misconfigured p2m table" technique that x86 uses: set bits in the p2m
> entry which cause a specific kind of HAP fault when accessed.  The fault
> handler then looks in the p2m entry, and if it finds an otherwise valid
> entry, it just fixes the "misconfigured" bits and continues.

I thought about this. But when do you set the entry to misconfigured?

If you take the example of 32-bit Linux, there are a couple of full 
cache cleans during a uni-processor boot. So you would need to go 
through the p2m multiple times and reset the access bits.

Cheers,

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: Julien Grall @ 2017-12-06 13:01 UTC (permalink / raw)
  To: George Dunlap, xen-devel, Jan Beulich, Andrew Cooper,
	George Dunlap, Stefano Stabellini, Andre Przywara, Tim Deegan



On 12/06/2017 12:58 PM, Julien Grall wrote:
> Hi George,
> 
> On 12/06/2017 12:28 PM, George Dunlap wrote:
>> [...]
>> 2. It sounds like rather than using PoD, you could use the
>> "misconfigured p2m table" technique that x86 uses: set bits in the p2m
>> entry which cause a specific kind of HAP fault when accessed.  The fault
>> handler then looks in the p2m entry, and if it finds an otherwise valid
>> entry, it just fixes the "misconfigured" bits and continues.
> 
> I thought about this. But when do you set the entry to misconfigured?
> 
> If you take the example of 32-bit Linux, there are a couple of full
> cache cleans during a uni-processor boot. So you would need to go
> through the p2m multiple times and reset the access bits.

To complete my answer here: I agree that using PoD to emulate S/W is 
not great. But after looking at all the other solutions, it was the 
only one that could provide better isolation of the guests and some 
decent performance.

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: Konrad Rzeszutek Wilk @ 2017-12-06 15:10 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	Jan Beulich, Andrew Cooper, xen-devel

.snip..
> The suggested policy is based on the KVM one:
> 	- If we trap an S/W instruction, we enable VM register trapping
> (HCR_EL2.TVM) to detect the cache being turned on/off, and do a full clean.
> 	- We flush the caches both when the caches are turned on and when they
> are turned off.
> 	- Once the caches are enabled, we stop trapping VM register accesses.
> 
> Doing a full clean will require going through the P2M and flushing the
> entries one by one. At the moment, all of the guest memory is mapped. As
> you can imagine, flushing a guest with hundreds of MB of RAM will take a
> very long time (Linux times out during CPU bring-up).

Yikes. Since you mention 'based on the KVM one' - did they solve this particular
problem or do they also have the same issue?


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: Jan Beulich @ 2017-12-06 15:15 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	George Dunlap, Andrew Cooper, xen-devel

>>> On 06.12.17 at 13:58, <julien.grall@linaro.org> wrote:
> On 12/06/2017 12:28 PM, George Dunlap wrote:
>> 2. It sounds like rather than using PoD, you could use the
>> "misconfigured p2m table" technique that x86 uses: set bits in the p2m
>> entry which cause a specific kind of HAP fault when accessed.  The fault
>> handler then looks in the p2m entry, and if it finds an otherwise valid
>> entry, it just fixes the "misconfigured" bits and continues.
> 
> I thought about this. But when do you set the entry to misconfigured?

What we do in x86 is that we flag all entries at the top level as
misconfigured at any time where otherwise we would have to
walk the full tree. Upon access, the misconfigured flag is being
propagated down the page table hierarchy, with only the
intermediate and leaf entries needed for the current access
becoming properly configured again. In your case, as long as
only a limited set of leaf entries are being touched before any
S/W emulation is needed, you'd be able to skip all misconfigured
entries in your traversal, just like with PoD you'd skip
unpopulated ones.
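
In (heavily simplified, illustrative) code, the idea is:

/* Invalidate cheaply: touch only the root table entries. */
void p2m_mark_all_misconfigured(struct p2m_domain *p2m)
{
    for ( unsigned int i = 0; i < ROOT_ENTRIES; i++ )
        pte_set_misconfigured(&p2m->root[i]);
}

/* On a fault, reconfigure only the path used by this access. */
void p2m_resolve_misconfig(struct p2m_domain *p2m, gfn_t gfn)
{
    for ( unsigned int lvl = 0; lvl < NR_LEVELS; lvl++ )
    {
        pte_t *e = p2m_entry_at(p2m, gfn, lvl);

        if ( !pte_is_misconfigured(*e) )
            continue;
        pte_clear_misconfig(e);              /* fix this level... */
        if ( lvl < NR_LEVELS - 1 )
            mark_children_misconfigured(e);  /* ...defer the rest */
    }
}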

> If you take the example of 32-bit Linux, there are a couple of full 
> cache cleans during a uni-processor boot. So you would need to go 
> through the p2m multiple times and reset the access bits.

The proposed mechanism isn't really similar to traditional accessed
bit handling. If there is no other use for the accessed bit (assuming
there is one in ARM PTEs in the first place), and as long as the bit
being clear gives you some sort of signal (on x86 this and the dirty
bit are being updated by hardware, as kind of a side effect of a
page table walk), it could of course be used for the purpose here.

Jan



* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: Julien Grall @ 2017-12-06 15:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	Jan Beulich, Andrew Cooper, xen-devel

Hi Konrad,

On 12/06/2017 03:10 PM, Konrad Rzeszutek Wilk wrote:
> .snip..
>> The suggested policy is based on the KVM one:
>> 	- If we trap an S/W instruction, we enable VM register trapping
>> (HCR_EL2.TVM) to detect the cache being turned on/off, and do a full clean.
>> 	- We flush the caches both when the caches are turned on and when they
>> are turned off.
>> 	- Once the caches are enabled, we stop trapping VM register accesses.
>>
>> Doing a full clean will require going through the P2M and flushing the
>> entries one by one. At the moment, all of the guest memory is mapped. As
>> you can imagine, flushing a guest with hundreds of MB of RAM will take a
>> very long time (Linux times out during CPU bring-up).
> 
> Yikes. Since you mention 'based on the KVM one' - did they solve this particular
> problem or do they also have the same issue?

KVM is using populate on demand by default.

Cheers,

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: George Dunlap @ 2017-12-06 15:24 UTC (permalink / raw)
  To: Julien Grall, Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	Jan Beulich, Andrew Cooper, xen-devel

On 12/06/2017 03:19 PM, Julien Grall wrote:
> Hi Konrad,
> 
> On 12/06/2017 03:10 PM, Konrad Rzeszutek Wilk wrote:
>> .snip..
>>> [...]
>>
>> Yikes. Since you mention 'based on the KVM one' - did they solve this
>> particular
>> problem or do they also have the same issue?
> 
> KVM is using populate on demand by default.

If I understand properly, it's probably more accurate to say that KVM
uses "allocate on demand".  The complicated part of populate-on-demand
is the fact that it's not allowed to allocate anything.

 -George


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: Julien Grall @ 2017-12-06 15:26 UTC (permalink / raw)
  To: George Dunlap, Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	Jan Beulich, Andrew Cooper, xen-devel



On 12/06/2017 03:24 PM, George Dunlap wrote:
> On 12/06/2017 03:19 PM, Julien Grall wrote:
>> [...]
>> KVM is using populate on demand by default.
> 
> If I understand properly, it's probably more accurate to say that KVM
> uses "allocate on demand".  The complicated part of populate-on-demand
> is the fact that it's not allowed to allocate anything.

Hmmm yes. You are right on the wording.

Cheers,

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: George Dunlap @ 2017-12-06 17:49 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Jan Beulich, Andrew Cooper,
	George Dunlap, Stefano Stabellini, Andre Przywara, Tim Deegan

On 12/06/2017 12:58 PM, Julien Grall wrote:
> Hi George,
> 
> On 12/06/2017 12:28 PM, George Dunlap wrote:
>> On 12/05/2017 06:39 PM, Julien Grall wrote:
>>> [...]
>>> These instructions target the local processor and are usually issued in
>>> a batch to nuke the whole cache. This means that if the vCPU is migrated
>>> to another pCPU in the middle of the sequence, the cache may not be fully
>>> cleaned. This would result in data corruption and a potential crash of
>>> the OS.
>>
>> I don't quite understand the failure mode here: Why does vCPU migration
>> cause cache inconsistency in the middle of one of these "cleans", but
>> not under normal operation?
> 
> Because they target a specific cache level by set/way, whereas the
> other cache operations work on VAs.
> 
> To make it short, the VA cache instructions work to the Point of
> Coherency/Point of Unification and guarantee that the caches will be
> consistent. For more details see B2.2.6 in ARM DDI 0406C.c.

I skimmed that section, and I'm not much the wiser.

Just to be clear, this is my question.

Suppose we have the following sequence of events (where vN[pM] means
vcpu N running on pcpu M):

Start with A == 0

1. v0[p1] Read A
  p1 has 'A==0' in the cache
2. scheduler migrates v0 to p0
3. v0[p0] A=2
  p0 has 'A==2' in the cache
4. scheduler migrates v0 to p1
5. v0[p1] Read A

Now, I presume that with the guest not doing anything, the Read of A at
#5 will end up as '2'; i.e., behind the scenes somewhere, either by Xen
or by the hardware, between #1 and #5, p0's version of A gets "cleaned"
and p1's version of A gets "invalidated" (to use the terminology from
the section mentioned above).

So my question is, how does *adding* cache flushing of any sort end up
violating the integrity in a situation like the above?

>>> [...]
>>> Now regarding the hardware domain: at the moment, its RAM is direct
>>> mapped. Supporting direct mapping in PoD would be quite a pain for a
>>> limited benefit (see above). In that case I would suggest imposing
>>> vCPU pinning for the hardware domain if S/W instructions are expected
>>> to be used. Again, a command line option could be introduced here.
>>>
>>> Any feedback on the approach is welcome.
>>
>> I still don't entirely understand the underlying failure mode, but there
>> are a couple of things we could consider:
>>
>> 1. Automatically disabling 'vcpu migration' when caching is turned off.
>> This wouldn't prevent a vcpu from being preempted, just from being run
>> somewhere else.
> 
> This suggests the guest will directly perform S/W, right? So you leave
> the guest the possibility to flush all the caches its vCPU can access.
> This is an easy way for the guest to affect the cache entries of other
> guests.
> 
> I think this would help some potential data attacks.

Well, it's the equivalent of your "imposing vcpu pinning" solution
above, but only temporary.  Was that suggestion meant to allow the
hardware domain to directly perform S/W?

>> 2. It sounds like rather than using PoD, you could use the
>> "misconfigured p2m table" technique that x86 uses: set bits in the p2m
>> entry which cause a specific kind of HAP fault when accessed.  The fault
>> handler then looks in the p2m entry, and if it finds an otherwise valid
>> entry, it just fixes the "misconfigured" bits and continues.
> 
> I thought about this. But when do you set the entry to misconfigured?
> 
> If you take the example of 32-bit Linux, there are a couple of full
> cache cleans during a uni-processor boot. So you would need to go
> through the p2m multiple times and reset the access bits.

Do you want to reset the p2m multiple times?  I thought the goal was
simply to keep the amount of p2m space you need to flush to a minimum;
if you expect the memory which has been faulted in since the *last* flush
to be relatively small, you could just always flush all memory that had
been touched to that point.

If you *do* need to go through the p2m multiple times, then
misconfiguration is a much better option than PoD.  In PoD, once a page
has data on it, it can't be removed from the p2m anymore.  For the
misconfiguration technique, you can go through and misconfigure the
entries in the top-level p2m table as many times as you want.  The whole
reason for doing it on x86 is that it's a relatively lightweight
operation: we use it to modify MMIO mappings, to enable or disable
logdirty for migrate, &c.

(This of course depends on being able to effectively misconfigure
top-level entries of the p2m on ARM.)

 -George


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: Julien Grall @ 2017-12-06 17:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	George Dunlap, Andrew Cooper, xen-devel

Hi Jan,

On 12/06/2017 03:15 PM, Jan Beulich wrote:
>>>> On 06.12.17 at 13:58, <julien.grall@linaro.org> wrote:
>> On 12/06/2017 12:28 PM, George Dunlap wrote:
>>> 2. It sounds like rather than using PoD, you could use the
>>> "misconfigured p2m table" technique that x86 uses: set bits in the p2m
>>> entry which cause a specific kind of HAP fault when accessed.  The fault
>>> handler then looks in the p2m entry, and if it finds an otherwise valid
>>> entry, it just fixes the "misconfigured" bits and continues.
>>
>> I thought about this. But when do you set the entry to misconfigured?
> 
> What we do in x86 is that we flag all entries at the top level as
> misconfigured at any time where otherwise we would have to
> walk the full tree. Upon access, the misconfigured flag is being
> propagated down the page table hierarchy, with only the
> intermediate and leaf entries needed for the current access
> becoming properly configured again. In your case, as long as
> only a limited set of leaf entries are being touched before any
> S/W emulation is needed, you'd be able to skip all misconfigured
> entries in your traversal, just like with PoD you'd skip
> unpopulated ones.

Oh, what you call "misconfigured bits" would be clearing the valid bit 
of an entry on Arm. The entry would be considered invalid, but it is 
still possible to store information in it (the rest of the bits are 
ignored by the hardware).

But I think this brings another class of problem: when a misconfigured 
entry is accessed, we would need to clean & invalidate the cache for 
that region.

At the moment, Xen only supports 4KB page granularity, so a region 
would be either 4KB, 2MB or 1GB. Flushing a 2MB or 1GB region will 
take time, because you can only clean & invalidate one cache line at a 
time. On Arm, the cache line size can range from 16 bytes to 2048 
bytes. This means we may want to preempt even in the middle of a 
region to avoid blocking Xen for too long.
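
I.e. something along these lines (a sketch; names made up):

/* Clean & invalidate a region by VA, one cache line at a time. */
int clean_inval_region(vaddr_t start, size_t size, size_t *progress)
{
    size_t line = dcache_line_bytes();        /* 16 to 2048 on Arm */
    vaddr_t va;

    for ( va = start + *progress; va < start + size; va += line )
    {
        asm volatile ("dc civac, %0" :: "r" (va) : "memory");

        /* Every 64KB, check whether we should give back the CPU. */
        if ( !((va - start) & 0xffff) &&
             softirq_pending(smp_processor_id()) )
        {
            *progress = va - start;
            return -ERESTART;                 /* caller restarts later */
        }
    }
    dsb(sy);
    return 0;
}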

I think we need to clean & invalidate the region at least in the 
following places:
	1) when the guest is reading/writing the region;
	2) when Xen is accessing the region because of a hypercall.

I will leave 1) aside, as I think the reason for the clean & 
invalidate, as well as how to preempt, is clear to everyone.

For 2), if we access a "misconfigured" page, we would need to clean it 
to avoid stale data. I think in that case preemption would be 
difficult: we would need to modify all the hypercalls to report back 
the preemption and restart again.

On a side note, soon we will need to support 64KB page granularity, 
because this is the only way to handle 52-bit PAs on Arm. In that 
case, regions would be 64KB, 512MB or 4TB. Whether we will support 4TB 
is not decided, but I think 512MB should be.

Cheers,

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
From: Jan Beulich @ 2017-12-07  9:39 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	George Dunlap, Andrew Cooper, xen-devel

>>> On 06.12.17 at 18:52, <julien.grall@linaro.org> wrote:
> On 12/06/2017 03:15 PM, Jan Beulich wrote:
>> [...]
> 
> Oh, what you call "misconfigured bits" would be clearing the valid bit 
> of an entry on Arm. The entry would be considered invalid, but it is 
> still possible to store information in it (the rest of the bits are 
> ignored by the hardware).

Well, on x86 we don't always have a separate "valid" bit, hence
we set something else to a value which will cause a suitable VM
exit when being accessed by the guest.

> But I think this brings another class of problem: when a misconfigured 
> entry is accessed, we would need to clean & invalidate the cache 
> for that region.

Why? (Please remember that I'm an x86 person, so may simply
not be aware of extra constraints ARM has.) The data in the
cache (if any) doesn't change while the mapping is invalid (unless
Xen modifies it, but if there was a coherency problem between
Xen and guest accesses, you'd have the issue with hypercalls
which you describe later independent of the approach suggested
here).

Jan



* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-06 17:49     ` George Dunlap
@ 2017-12-07 13:52       ` Julien Grall
  2017-12-07 14:25         ` Jan Beulich
  2017-12-07 14:53         ` Marc Zyngier
  0 siblings, 2 replies; 41+ messages in thread
From: Julien Grall @ 2017-12-07 13:52 UTC (permalink / raw)
  To: George Dunlap, xen-devel, Jan Beulich, Andrew Cooper,
	George Dunlap, Stefano Stabellini, Andre Przywara, Tim Deegan
  Cc: Marc Zyngier

(+ Marc)

Hi,

@Marc: My Arm cache knowledge is somewhat limited. Feel free to correct 
me if I am wrong.

Before answering the rest of the e-mail, let me reinforce what I said 
in my first e-mail. Set/Way are very complex to emulate and an OS using 
them should never expect good performance in a virtualization context. 
The difficulty is clearly spelled out in the Arm Arm.

So the main goal here is to work around such software.

On 06/12/17 17:49, George Dunlap wrote:
> On 12/06/2017 12:58 PM, Julien Grall wrote:
>> Hi George,
>>
>> On 12/06/2017 12:28 PM, George Dunlap wrote:
>>> On 12/05/2017 06:39 PM, Julien Grall wrote:
>>>> Hi all,
>>>>
>>>> Even though it is an Arm failure, I have CCed x86 folks to get feedback
>>>> on the approach. I have a WIP branch I could share if that interest
>>>> people.
>>>>
>>>> Few months ago, we noticed an heisenbug on jobs run by osstest on the
>>>> cubietrucks (see [1]). From the log, we figured out that the guest vCPU
>>>> 0 is in data/prefetch abort state at early boot. I have been able to
>>>> reproduce it reliably, although from the little information I have I
>>>> think it is related to a cache issue because we don't trap cache
>>>> maintenance instructions by set/way.
>>>>
>>>> This is a set of 3 instructions (clean, clean & invalidate, invalidate)
>>>> working on a given cache level by S/W. Because the OS is not allowed to
>>>> infer the S/W to PA mapping, it can only use S/W to nuke the whole
>>>> cache. "The expected usage of the cache maintenance that operate by
>>>> set/way is associated with powerdown and powerup of caches, if this is
>>>> required by the implementation" (see D3-2020 ARM DDI 0487B.b).
>>>>
>>>> Those instructions will target a local processor and usually working in
>>>> batch for nuking the cache. This means if the vCPU is migrated to
>>>> another pCPU in the middle of the process, the cache may not be cleaned.
>>>> This would result to data corruption and potential crash of the OS.
>>>
>>> I don't quite understand the failure mode here: Why does vCPU migration
>>> cause cache inconsistency in the middle of one of these "cleans", but
>>> not under normal operation?
>>
>> Because they target a specific S/W cache level whereas other cache
>> operations are working with VA.
>>
>> To make it short, the other VA cache instructions will work to Point of
>> Coherency/Point of Unification and guarantee that the caches will be
>> consistent. For more details see B2.2.6 in ARM DDI 0406C.c.
> 
> I skimmed that section, and I'm not much the wiser.
> 
> Just to be clear, this is my question.
> 
> Suppose we have the following sequence of events (where vN[pM] means
> vcpu N running on pcpu M):
> 
> Start with A == 0
> 
> 1. v0[p1] Read A
>    p1 has 'A==0' in the cache
> 2. scheduler migrates v0 to p0
> 3. v0[p0] A=2
>    p0 has 'A==2' in the cache
> 4 scheduler migrates v0 to p1
> 5 v0[p1] Read A
> 
> Now, I presume that with the guest not doing anything, the Read of A at
> #5 will end up as '2'; i.e., behind the scenes somewhere, either by Xen
> or by the hardware, between #1 and #5, p0's version of A gets "cleaned"
> and p1's version of A gets "invalidated" (to use the terminology from
> the section mentioned above).

Caches on Arm are coherent and are controlled by the attributes in the 
page-tables. Imagine the region is Normal Cacheable and Inner Shareable: 
a data synchronization barrier in #4 will ensure the visibility of A 
to p1. So A will be read as 2.
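
Roughly, in arm64 terms (a sketch only, assuming A lives in a Normal, 
Inner Shareable, cacheable page):

    #include <stdint.h>

    volatile uint32_t A;            /* starts at 0, as in your scenario */

    void step3_on_p0(void)
    {
        A = 2;                      /* may sit dirty in p0's cache */
        /* #4, on the context-switch path: complete the write for the
         * whole Inner Shareable domain. */
        asm volatile("dsb ish" ::: "memory");
    }

    uint32_t step5_on_p1(void)
    {
        return A;                   /* the coherent caches return 2 */
    }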

> 
> So my question is, how does *adding* cache flushing of any sort end up
> violating the integrity in a situation like the above?

Because the integrity is based on the memory attributes in the 
page-tables. S/W instructions work directly on the cache and will break 
the coherency. Marc pointed me to his talk [1] that explains caches on 
Arm and also the set/way problem (see slide 8 onwards).
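
For the x86 folks on CC, the two flavours side by side (the arm64 
mnemonics are real; the wrappers are just for illustration):

    #include <stdint.h>

    void flush_line_by_va(uintptr_t va)
    {
        /* Participates in the coherency protocol: broadcast to other
         * cores as required and applied down to the Point of Coherency. */
        asm volatile("dc civac, %0" :: "r" (va) : "memory");
    }

    void flush_line_by_set_way(uint64_t sw)  /* encodes level/set/way */
    {
        /* Acts on one way of one set of one level of the *local* cache
         * only: no broadcast, and a system cache may not implement it
         * at all. */
        asm volatile("dc cisw, %0" :: "r" (sw) : "memory");
    }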

> 
>>>> For those been worry about the performance impact, I have looked at the
>>>> current use of S/W instructions:
>>>>       - Linux Arm64: The last used in the kernel was beginning of 2015
>>>>       - Linux Arm32: Still use S/W for boot and secondary CPU
>>>> bring-up. No
>>>> plan to change.
>>>>       - UEFI: A couple of use in UEFI, but I have heard they plan to
>>>> remove them (need confirmation).
>>>>
>>>> I haven't looked at all the OSes. However, given the Arm Arm clearly
>>>> state S/W instructions are not easily virtualizable, I would expect
>>>> guest OSes developers to try there best to limit the use of the
>>>> instructions.
>>>>
>>>> To limit the performance impact, we could introduce a guest option to
>>>> tell whether the guest will use S/W. If it does plan to use S/W, PoD
>>>> will be disabled.
>>>>
>>>> Now regarding the hardware domain. At the moment, it has its RAM direct
>>>> mapped. Supporting direct mapping in PoD will be quite a pain for a
>>>> limited benefits (see why above). In that case I would suggest to impose
>>>> vCPU pinning for the hardware domain if the S/W are expected to be used.
>>>> Again, a command line option could be introduced here.
>>>>
>>>> Any feedbacks on the approach will be welcomed.
>>>
>>> I still don't entirely understand the underlying failure mode, but there
>>> are a couple of things we could consider:
>>>
>>> 1. Automatically disabling 'vcpu migration' when caching is turned off.
>>> This wouldn't prevent a vcpu from being preempted, just from being run
>>> somewhere else.
>>
>> This suggests the guest will directly perform S/W, right? So you leave
>> the possibility to the guest to flush all caches the vCPU can access.
>> This is an easy way for the guest to affect the cache entries of other guests.
>>
>> I think this could enable some potential data attacks.
> 
> Well, it's the equivalent of your "imposing vcpu pinning" solution
> above, but only temporary.  Was that suggestion meant to allow the
> hardware domain to directly perform S/W?

Yes, for the hardware domain only, because it is more trusted IMHO. I 
thought you meant for every guest. The problem I can see here is that 
you would need to trap cache-toggling. When trapping that, you have to 
take all the virtual memory traps. This means trapping:

Non-secure EL1 using AArch64: SCTLR_EL1, TTBR0_EL1, TTBR1_EL1, TCR_EL1, 
ESR_EL1,
FAR_EL1, AFSR0_EL1, AFSR1_EL1, MAIR_EL1, AMAIR_EL1, CONTEXTIDR_EL1.
Non-secure EL1 using AArch32: SCTLR, TTBR0, TTBR1, TTBCR, TTBCR2, DACR, 
DFSR,
IFSR, DFAR, IFAR, ADFSR, AIFSR, PRRR, NMRR, MAIR0, MAIR1, AMAIR0, AMAIR1,
CONTEXTIDR.

Those registers are accessed very often, so you will have a performance 
impact for the whole life of the guest.

However, looking at Marc's slides, this would not work when booting a 
32-bit hardware domain on ARMv8 because system caches might be present.
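
For completeness, a sketch of what the trap dance would look like 
(HCR_EL2.TVM and SCTLR.C are architectural; the macro spellings follow 
Xen's arm headers, and the helpers are invented):

    /* First trapped S/W instruction: start trapping the VM registers
     * so we can see the caches being toggled, and do a full clean. */
    static void vcpu_handle_sw_cmo(struct vcpu *v)
    {
        WRITE_SYSREG(READ_SYSREG(HCR_EL2) | HCR_TVM, HCR_EL2);
        flush_guest_mapped_ram(v->domain);        /* invented helper */
    }

    /* Trapped write to SCTLR_EL1 (or the AArch32 SCTLR): flush on a
     * cache on/off transition, and stop paying the trap cost once the
     * caches are enabled again. */
    static void vcpu_handle_sctlr_write(struct vcpu *v, register_t val)
    {
        if ( cache_bit_toggled(v, val) )          /* invented helper */
            flush_guest_mapped_ram(v->domain);

        if ( val & SCTLR_C )                      /* caches back on */
            WRITE_SYSREG(READ_SYSREG(HCR_EL2) & ~HCR_TVM, HCR_EL2);

        update_guest_sctlr(v, val);               /* invented helper */
    }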

> 
>>> 2. It sounds like rather than using PoD, you could use the
>>> "misconfigured p2m table" technique that x86 uses: set bits in the p2m
>>> entry which cause a specific kind of HAP fault when accessed.  The fault
>>> handler then looks in the p2m entry, and if it finds an otherwise valid
>>> entry, it just fixes the "misconfigured" bits and continues.
>>
>> I thought about this. But when do you set the entry to misconfigured?
>>
>> If you take the example of Linux 32-bit, there are a couple of full
>> cache cleans during the boot of a uni-processor. So you would need to go
>> through the p2m multiple times and reset the access bits.
> 
> Do you want to reset the p2m multiple times?  I thought the goal was
> simply to keep the amount of p2m space you need to flush to a minimum;
> if you expect the memory which has been faulted in by the *last* flush
> to be relatively small, you could just always flush all memory that had
> been touched to that point.
> 
> If you *do* need to go through the p2m multiple times, then
> misconfiguration is a much better option than PoD.  In PoD, once a page
> has data on it, it can't be removed from the p2m anymore.  For the
> misconfiguration technique, you can go through and misconfigure the
> entries in the top-level p2m table as many times as you want.  The whole
> reason for doing it on x86 is that it's a relatively lightweight
> operation: we use it to modify MMIO mappings, to enable or disable
> logdirty for migrate, &c.

Does this also work when you share the page-tables with the IOMMU? It 
just occurred to me that for both PoD and "misconfigured bits" we would 
get into trouble because page-tables are shared with the IOMMU.

But I guess it would be acceptable to say "you use S/W instructions in 
your OS, so you have to pay a worse performance price unless you fix 
your OS".

> 
> (This of course depends on being able to effectively misconfigure
> top-level entries of the p2m on ARM.)

More on that in the answer to Jan's e-mail.

Cheers,

[1] 
https://events.linuxfoundation.org/sites/events/files/slides/slides_10.pdf

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-07 13:52       ` Julien Grall
@ 2017-12-07 14:25         ` Jan Beulich
  2017-12-07 14:53         ` Marc Zyngier
  1 sibling, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-12-07 14:25 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Marc Zyngier, Andre Przywara, Tim Deegan,
	George Dunlap, George Dunlap, Andrew Cooper, xen-devel

>>> On 07.12.17 at 14:52, <julien.grall@linaro.org> wrote:
> On 06/12/17 17:49, George Dunlap wrote:
>> Do you want to reset the p2m multiple times?  I thought the goal was
>> simply to keep the amount of p2m space you need to flush to a minimum;
>> if you expect the memory which has been faulted in by the *last* flush
>> to be relatively small, you could just always flush all memory that had
>> been touched to that point.
>> 
>> If you *do* need to go through the p2m multiple times, then
>> misconfiguration is a much better option than PoD.  In PoD, once a page
>> has data on it, it can't be removed from the p2m anymore.  For the
>> misconfiguration technique, you can go through and misconfigure the
>> entries in the top-level p2m table as many times as you want.  The whole
>> reason for doing it on x86 is that it's a relatively lightweight
>> operation: we use it to modify MMIO mappings, to enable or disable
>> logdirty for migrate, &c.
> 
> Does this also work when you share the page-tables with the IOMMU? It 
> just occurred to me that for both PoD and "misconfigured bits" we would 
> get into trouble because page-tables are shared with the IOMMU.

PoD and IOMMU are incompatible on x86 at present.

The bits we use for "mis-configuring" entries are ignored by the IOMMU,
which is not a problem since all we use this approach for (right now) is
to update the memory type (i.e. cacheability) for possibly huge ranges.

Jan



* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-07 13:52       ` Julien Grall
  2017-12-07 14:25         ` Jan Beulich
@ 2017-12-07 14:53         ` Marc Zyngier
  2017-12-07 15:45           ` Jan Beulich
  1 sibling, 1 reply; 41+ messages in thread
From: Marc Zyngier @ 2017-12-07 14:53 UTC (permalink / raw)
  To: Julien Grall, George Dunlap, xen-devel, Jan Beulich,
	Andrew Cooper, George Dunlap, Stefano Stabellini, Andre Przywara,
	Tim Deegan

On 07/12/17 13:52, Julien Grall wrote:
> (+ Marc)
> 
> Hi,
> 
> @Marc: My Arm cache knowledge is somewhat limited. Feel free to correct 
> me if I am wrong.
> 
> Before answering the rest of the e-mail, let me reinforce what I said 
> in my first e-mail. Set/Way are very complex to emulate and an OS using 
> them should never expect good performance in a virtualization context. 
> The difficulty is clearly spelled out in the Arm Arm.

It is actually even worse than that. Software using set/way operations
is simply not virtualizable, full stop. Yes, we paper over it in ugly
ways, but nobody should really use set/way.

There is exactly one case where set/way makes sense, and that's when
you're the only CPU left in the system, your MMU is off, and you're
about to go down.

> So the main goal here is to work around such software.

Quite. Said SW is usually a 32bit Linux kernel.

> 
> On 06/12/17 17:49, George Dunlap wrote:
>> On 12/06/2017 12:58 PM, Julien Grall wrote:
>>> Hi George,
>>>
>>> On 12/06/2017 12:28 PM, George Dunlap wrote:
>>>> On 12/05/2017 06:39 PM, Julien Grall wrote:
>>>>> Hi all,
>>>>>
>>>>> Even though it is an Arm failure, I have CCed x86 folks to get feedback
>>>>> on the approach. I have a WIP branch I could share if that interest
>>>>> people.
>>>>>
>>>>> Few months ago, we noticed an heisenbug on jobs run by osstest on the
>>>>> cubietrucks (see [1]). From the log, we figured out that the guest vCPU
>>>>> 0 is in data/prefetch abort state at early boot. I have been able to
>>>>> reproduce it reliably, although from the little information I have I
>>>>> think it is related to a cache issue because we don't trap cache
>>>>> maintenance instructions by set/way.
>>>>>
>>>>> This is a set of 3 instructions (clean, clean & invalidate, invalidate)
>>>>> working on a given cache level by S/W. Because the OS is not allowed to
>>>>> infer the S/W to PA mapping, it can only use S/W to nuke the whole
>>>>> cache. "The expected usage of the cache maintenance that operate by
>>>>> set/way is associated with powerdown and powerup of caches, if this is
>>>>> required by the implementation" (see D3-2020 ARM DDI 0487B.b).
>>>>>
>>>>> Those instructions will target a local processor and usually working in
>>>>> batch for nuking the cache. This means if the vCPU is migrated to
>>>>> another pCPU in the middle of the process, the cache may not be cleaned.
>>>>> This would result to data corruption and potential crash of the OS.
>>>>
>>>> I don't quite understand the failure mode here: Why does vCPU migration
>>>> cause cache inconsistency in the middle of one of these "cleans", but
>>>> not under normal operation?
>>>
>>> Because they target a specific S/W cache level whereas other cache
>>> operations are working with VA.
>>>
>>> To make it short, the other VA cache instructions will work to Point of
>>> Coherency/Point of Unification and guarantee that the caches will be
>>> consistent. For more details see B2.2.6 in ARM DDI 0406C.c.
>>
>> I skimmed that section, and I'm not much the wiser.
>>
>> Just to be clear, this is my question.
>>
>> Suppose we have the following sequence of events (where vN[pM] means
>> vcpu N running on pcpu M):
>>
>> Start with A == 0
>>
>> 1. v0[p1] Read A
>>    p1 has 'A==0' in the cache
>> 2. scheduler migrates v0 to p0
>> 3. v0[p0] A=2
>>    p0 has 'A==2' in the cache
>> 4 scheduler migrates v0 to p1
>> 5 v0[p1] Read A
>>
>> Now, I presume that with the guest not doing anything, the Read of A at
>> #5 will end up as '2'; i.e., behind the scenes somewhere, either by Xen
>> or by the hardware, between #1 and #5, p0's version of A gets "cleaned"
>> and p1's version of A gets "invalidated" (to use the terminology from
>> the section mentioned above).
> 
> Caches on Arm are coherent and are controlled by the attributes in the 
> page-tables. Imagine the region is Normal Cacheable and Inner Shareable: 
> a data synchronization barrier in #4 will ensure the visibility of A 
> to p1. So A will be read as 2.
> 
>>
>> So my question is, how does *adding* cache flushing of any sort end up
>> violating the integrity in a situation like the above?
> 
> Because the integrity is based on the memory attributes in the 
> page-tables. S/W instructions work directly on the cache and will break 
> the coherency. Marc pointed me to his talk [1] that explains caches on 
> Arm and also the set/way problem (see slide 8 onwards).

On top of bypassing the coherency, S/W CMOs do not prevent lines from
migrating from one CPU to another. So you could happily be flushing by
S/W, and still end up with dirty lines in your cache. Success!

At that point, performance is the least of your worries.

> 
>>
>>>>> For those been worry about the performance impact, I have looked at the
>>>>> current use of S/W instructions:
>>>>>       - Linux Arm64: The last used in the kernel was beginning of 2015
>>>>>       - Linux Arm32: Still use S/W for boot and secondary CPU
>>>>> bring-up. No
>>>>> plan to change.
>>>>>       - UEFI: A couple of use in UEFI, but I have heard they plan to
>>>>> remove them (need confirmation).
>>>>>
>>>>> I haven't looked at all the OSes. However, given the Arm Arm clearly
>>>>> state S/W instructions are not easily virtualizable, I would expect
>>>>> guest OSes developers to try there best to limit the use of the
>>>>> instructions.
>>>>>
>>>>> To limit the performance impact, we could introduce a guest option to
>>>>> tell whether the guest will use S/W. If it does plan to use S/W, PoD
>>>>> will be disabled.
>>>>>
>>>>> Now regarding the hardware domain. At the moment, it has its RAM direct
>>>>> mapped. Supporting direct mapping in PoD will be quite a pain for a
>>>>> limited benefits (see why above). In that case I would suggest to impose
>>>>> vCPU pinning for the hardware domain if the S/W are expected to be used.
>>>>> Again, a command line option could be introduced here.
>>>>>
>>>>> Any feedbacks on the approach will be welcomed.
>>>>
>>>> I still don't entirely understand the underlying failure mode, but there
>>>> are a couple of things we could consider:
>>>>
>>>> 1. Automatically disabling 'vcpu migration' when caching is turned off.
>>>> This wouldn't prevent a vcpu from being preempted, just from being run
>>>> somewhere else.
>>>
>>> This suggests the guest will directly perform S/W, right? So you leave
>>> the possibility to the guest to flush all caches the vCPU can access.
>>> This is an easy way for the guest to affect the cache entries of other guests.
>>>
>>> I think this could enable some potential data attacks.
>>
>> Well, it's the equivalent of your "imposing vcpu pinning" solution
>> above, but only temporary.  Was that suggestion meant to allow the
>> hardware domain to directly perform S/W?
> 
> Yes, for the hardware domain only, because it is more trusted IMHO. I 
> thought you meant for every guest. The problem I can see here is that 
> you would need to trap cache-toggling. When trapping that, you have to 
> take all the virtual memory traps. This means trapping:
> 
> Non-secure EL1 using AArch64: SCTLR_EL1, TTBR0_EL1, TTBR1_EL1, TCR_EL1, 
> ESR_EL1,
> FAR_EL1, AFSR0_EL1, AFSR1_EL1, MAIR_EL1, AMAIR_EL1, CONTEXTIDR_EL1.
> Non-secure EL1 using AArch32: SCTLR, TTBR0, TTBR1, TTBCR, TTBCR2, DACR, 
> DFSR,
> IFSR, DFAR, IFAR, ADFSR, AIFSR, PRRR, NMRR, MAIR0, MAIR1, AMAIR0, AMAIR1,
> CONTEXTIDR.
> 
> Those registers are accessed very often, so you will have a performance 
> impact for the whole life of the guest.
> 
> However, looking at Marc's slides, this would not work when booting a 
> 32-bit hardware domain on ARMv8 because system caches might be present.

Yes, and this further outlines why using S/W is b0rken. You're not
guaranteed that all your cache hierarchy will implement S/W.

> 
>>
>>>> 2. It sounds like rather than using PoD, you could use the
>>>> "misconfigured p2m table" technique that x86 uses: set bits in the p2m
>>>> entry which cause a specific kind of HAP fault when accessed.  The fault
>>>> handler then looks in the p2m entry, and if it finds an otherwise valid
>>>> entry, it just fixes the "misconfigured" bits and continues.
>>>
>>> I thought about this. But when do you set the entry to misconfigured?
>>>
>>> If you take the example of Linux 32-bit, there are a couple of full
>>> cache cleans during the boot of a uni-processor. So you would need to go
>>> through the p2m multiple times and reset the access bits.
>>
>> Do you want to reset the p2m multiple times?  I thought the goal was
>> simply to keep the amount of p2m space you need to flush to a minimum;
>> if you expect the memory which has been faulted in by the *last* flush
>> to be relatively small, you could just always flush all memory that had
>> been touched to that point.
>>
>> If you *do* need to go through the p2m multiple times, then
>> misconfiguration is a much better option than PoD.  In PoD, once a page
>> has data on it, it can't be removed from the p2m anymore.  For the
>> misconfiguration technique, you can go through and misconfigure the
>> entries in the top-level p2m table as many times as you want.  The whole
>> reason for doing it on x86 is that it's a relatively lightweight
>> operation: we use it to modify MMIO mappings, to enable or disable
>> logdirty for migrate, &c.
> 
> Does this also work when you share the page-tables with the IOMMU? It 
> just occurred to me that for both PoD and "misconfigured bits" we would 
> get into trouble because page-tables are shared with the IOMMU.
> 
> But I guess it would be acceptable to say "you use S/W instructions in 
> your OS, so you have to pay a worse performance price unless you fix 
> your OS".

I think that's a very valid argument. It is definitely a case of "Don't
do that". Yes, a 32bit Linux kernel will be slow to boot under Xen. If
people care about speed, they will fix it (or boot a non compressed
guest kernel). I think correctness matters a lot more than speed.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-07  9:39         ` Jan Beulich
@ 2017-12-07 15:22           ` Julien Grall
  2017-12-07 15:49             ` Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: Julien Grall @ 2017-12-07 15:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	George Dunlap, Marc Zyngier, Andrew Cooper, xen-devel

(+ Marc)

@Marc: My Arm cache knowledge is somewhat limited. Feel free to correct 
me if I am wrong.

On 07/12/17 09:39, Jan Beulich wrote:
>>>> On 06.12.17 at 18:52, <julien.grall@linaro.org> wrote:
>> On 12/06/2017 03:15 PM, Jan Beulich wrote:
>>> What we do in x86 is that we flag all entries at the top level as
>>> misconfigured at any time where otherwise we would have to
>>> walk the full tree. Upon access, the misconfigured flag is being
>>> propagated down the page table hierarchy, with only the
>>> intermediate and leaf entries needed for the current access
>>> becoming properly configured again. In your case, as long as
>>> only a limited set of leaf entries are being touched before any
>>> S/W emulation is needed, you'd be able to skip all misconfigured
>>> entries in your traversal, just like with PoD you'd skip
>>> unpopulated ones.
>>
>> Oh, what you call "misconfigured bits" would be clearing the valid bit
>> of an entry on Arm. The entry would be considered invalid, but it is
>> still possible to store information (the rest of the bits are ignored
>> by the hardware).
> 
> Well, on x86 we don't always have a separate "valid" bit, hence
> we set something else to a value which will cause a suitable VM
> exit when being accessed by the guest.
> 
>> But I think this is bringing another class of problem. When a
>> misconfigured entry is accessed, we would need to clean & invalidate the cache
>> for that region.
> 
> Why? (Please remember that I'm an x86 person, so may simply
> not be aware of extra constraints ARM has.) The data in the
> cache (if any) doesn't change while the mapping is invalid (unless
> Xen modifies it, but if there was a coherency problem between
> Xen and guest accesses, you'd have the issue with hypercalls
> which you describe later independent of the approach suggested
> here).

Caches on Arm are coherent and are controlled by attributes in the 
page-tables. The coherency is lost if you access a region with different 
memory attributes.

To take the hypercall case, we require memory shared with the hypervisor 
or any other guest to have specific memory attributes. So this will 
ensure cache coherency. This applies to:
	- hypercall arguments passed via a pointer to guest memory
	- memory shared via the grant table mechanism
	- memory shared with the hypervisor (shared_info, vcpu_info, grant 
table...).

Now regarding access by a guest. Even though the entry is 
"misconfigured" in the guest page-tables, this same physical address may 
have been mapped in other places (e.g. Xen, guests...). Because of 
speculation, a line could have been pulled into the cache. As we don't 
know the memory attributes used by the guest, we have to clean & 
invalidate that region on a guest access.

Getting back to the hypercall case, I am still trying to figure out if 
we need to clean & invalidate the buffer used when the guest entry is 
"misconfigured". I can't convince myself why this would not be 
necessary. I need to have a more thorough think.

Cheers,

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-07 14:53         ` Marc Zyngier
@ 2017-12-07 15:45           ` Jan Beulich
  2017-12-07 16:04             ` Marc Zyngier
  2017-12-07 16:04             ` Julien Grall
  0 siblings, 2 replies; 41+ messages in thread
From: Jan Beulich @ 2017-12-07 15:45 UTC (permalink / raw)
  To: Marc Zyngier, Julien Grall
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	George Dunlap, Andrew Cooper, xen-devel

>>> On 07.12.17 at 15:53, <marc.zyngier@arm.com> wrote:
> On 07/12/17 13:52, Julien Grall wrote:
> There is exactly one case where set/way makes sense, and that's when
> you're the only CPU left in the system, your MMU is off, and you're
> about to go down.

With this and ...

> On top of bypassing the coherency, S/W CMOs do not prevent lines from
> migrating from one CPU to another. So you could happily be flushing by
> S/W, and still end up with dirty lines in your cache. Success!

... this I wonder what value emulating those insns then has in the first
place. Can't you as well simply skip and ignore them, with the same
(bad) result?

Jan



* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-07 15:22           ` Julien Grall
@ 2017-12-07 15:49             ` Jan Beulich
  0 siblings, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-12-07 15:49 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Marc Zyngier, Andre Przywara, Tim Deegan,
	George Dunlap, George Dunlap, Andrew Cooper, xen-devel

>>> On 07.12.17 at 16:22, <julien.grall@linaro.org> wrote:
> On 07/12/17 09:39, Jan Beulich wrote:
>>>>> On 06.12.17 at 18:52, <julien.grall@linaro.org> wrote:
>>> But I think this is bringing another class of problem. When a
>>> misconfigured entry is accessed, we would need to clean & invalidate the cache
>>> for that region.
>> 
>> Why? (Please remember that I'm an x86 person, so may simply
>> not be aware of extra constraints ARM has.) The data in the
>> cache (if any) doesn't change while the mapping is invalid (unless
>> Xen modifies it, but if there was a coherency problem between
>> Xen and guest accesses, you'd have the issue with hypercalls
>> which you describe later independent of the approach suggested
>> here).
> 
> Caches on Arm are coherent and are controlled by attributes in the 
> page-tables. The coherency is lost if you access a region with different 
> memory attributes.
> 
> To take the hypercall case, we require memory shared with the hypervisor 
> or any other guest to have specific memory attributes. So this will 
> ensure cache coherency. This applies to:
> 	- hypercall arguments passed via a pointer to guest memory
> 	- memory shared via the grant table mechanism
> 	- memory shared with the hypervisor (shared_info, vcpu_info, grant 
> table...).
> 
> Now regarding access by a guest. Even though the entry is 
> "misconfigured" in the guest page-tables, this same physical address may 
> be have been mapped in other places (e.g Xen, guests...).

But that's not an issue specific to the situation here, i.e. multiple
mappings with different memory attributes would always be a
problem. Hence I assume you have code in place to deal with that.
By retaining the entry contents except for the valid bit (or
something else to allow you to gain control upon access) nothing
should really change for the rest of the hypervisor logic, provided
such entries are not explicitly being ignored by any of the involved
logic.

Jan



* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-07 15:45           ` Jan Beulich
@ 2017-12-07 16:04             ` Marc Zyngier
  2017-12-07 16:04             ` Julien Grall
  1 sibling, 0 replies; 41+ messages in thread
From: Marc Zyngier @ 2017-12-07 16:04 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	George Dunlap, Andrew Cooper, xen-devel

On 07/12/17 15:45, Jan Beulich wrote:
>>>> On 07.12.17 at 15:53, <marc.zyngier@arm.com> wrote:
>> On 07/12/17 13:52, Julien Grall wrote:
>> There is exactly one case where set/way makes sense, and that's when
>> you're the only CPU left in the system, your MMU is off, and you're
>> about to go down.
> 
> With this and ...
> 
>> On top of bypassing the coherency, S/W CMOs do not prevent lines from
>> migrating from one CPU to another. So you could happily be flushing by
>> S/W, and still end up with dirty lines in your cache. Success!
> 
> ... this I wonder what value emulating those insns then has in the first
> place. Can't you as well simply skip and ignore them, with the same
> (bad) result?

Your call. You could perfectly decide not to emulate them and let the
guest shoot itself in the foot. That will make the validation of 32bit
Linux guests pretty simple (they will fail to boot on most platforms).

The choice we made in KVM is to emulate them slowly but safely, by 
converting them into VA CMOs over the full address space. Not pretty, 
and quite invasive. But at least I can boot a 32bit kernel with 
guarantees similar to what it would have had on bare metal without any 
system cache.
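
In Xen terms the hammer would boil down to something like this (a 
sketch only; the iterator is invented, the other helper names are 
approximations of Xen/arm ones, and the real thing needs the preemption 
discussed elsewhere in the thread):

    /* Trap handler for any of the three S/W instructions: the set/way
     * argument is ignored (its PA mapping is unknowable), and instead
     * every page of RAM mapped in stage-2 is cleaned+invalidated by VA. */
    static void emulate_sw_cmo(struct domain *d)
    {
        unsigned long gfn;

        for_each_mapped_gfn ( d, gfn )             /* invented iterator */
            flush_page_to_ram(gfn_to_mfn(d, gfn)); /* names approximate */
    }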

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-07 15:45           ` Jan Beulich
  2017-12-07 16:04             ` Marc Zyngier
@ 2017-12-07 16:04             ` Julien Grall
  2017-12-07 16:44               ` George Dunlap
  1 sibling, 1 reply; 41+ messages in thread
From: Julien Grall @ 2017-12-07 16:04 UTC (permalink / raw)
  To: Jan Beulich, Marc Zyngier
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	George Dunlap, Andrew Cooper, xen-devel

Hi Jan,

On 07/12/17 15:45, Jan Beulich wrote:
>>>> On 07.12.17 at 15:53, <marc.zyngier@arm.com> wrote:
>> On 07/12/17 13:52, Julien Grall wrote:
>> There is exactly one case where set/way makes sense, and that's when
>> you're the only CPU left in the system, your MMU is off, and you're
>> about to go down.
> 
> With this and ...
> 
>> On top of bypassing the coherency, S/W CMOs do not prevent lines from
>> migrating from one CPU to another. So you could happily be flushing by
>> S/W, and still end up with dirty lines in your cache. Success!
> 
> ... this I wonder what value emulating those insns then has in the first
> place. Can't you as well simply skip and ignore them, with the same
> (bad) result?

The result will be much, much worse. Here is a concrete example with 
Linux Arm 32-bit:

	1) Cache enabled
	2) Decompress
	3) Nuke cache (S/W)
	4) Cache off
	5) Access new kernel

If you skip #3, the decompressed data may not have reached the memory, 
so you would access stale data.

This would effectively mean we don't support Linux Arm 32-bit.

Cheers,

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-07 16:04             ` Julien Grall
@ 2017-12-07 16:44               ` George Dunlap
  2017-12-07 16:58                 ` Marc Zyngier
  0 siblings, 1 reply; 41+ messages in thread
From: George Dunlap @ 2017-12-07 16:44 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich, Marc Zyngier
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	Andrew Cooper, xen-devel

On 12/07/2017 04:04 PM, Julien Grall wrote:
> Hi Jan,
> 
> On 07/12/17 15:45, Jan Beulich wrote:
>>>>> On 07.12.17 at 15:53, <marc.zyngier@arm.com> wrote:
>>> On 07/12/17 13:52, Julien Grall wrote:
>>> There is exactly one case where set/way makes sense, and that's when
>>> you're the only CPU left in the system, your MMU is off, and you're
>>> about to go down.
>>
>> With this and ...
>>
>>> On top of bypassing the coherency, S/W CMOs do not prevent lines from
>>> migrating from one CPU to another. So you could happily be flushing by
>>> S/W, and still end up with dirty lines in your cache. Success!
>>
>> ... this I wonder what value emulating those insns then has in the first
>> place. Can't you as well simply skip and ignore them, with the same
>> (bad) result?
> 
> The result will be much, much worse. Here is a concrete example with Linux
> Arm 32-bit:
> 
>     1) Cache enabled
>     2) Decompress
>     3) Nuke cache (S/W)
>     4) Cache off
>     5) Access new kernel
> 
> If you skip #3, the decompressed data may not have reached the memory, so
> you would access stale data.
> 
> This would effectively mean we don't support Linux Arm 32-bit.

So Marc said that #3 "doesn't make sense", since although it might be
the only cpu on in the system, you're not "about to go down"; but Linux
32-bit is doing that anyway.

It sounds like from the slides the purpose of #3 might be to get stuff
out of the D-cache into the I-cache.  But why is the cache turned off?
And why doesn't Linux use the VA-based flushes rather than the S/W flushes?

 -George



* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-07 16:44               ` George Dunlap
@ 2017-12-07 16:58                 ` Marc Zyngier
  2017-12-07 18:06                   ` George Dunlap
  0 siblings, 1 reply; 41+ messages in thread
From: Marc Zyngier @ 2017-12-07 16:58 UTC (permalink / raw)
  To: George Dunlap, Julien Grall, Jan Beulich
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	Andrew Cooper, xen-devel

On 07/12/17 16:44, George Dunlap wrote:
> On 12/07/2017 04:04 PM, Julien Grall wrote:
>> Hi Jan,
>>
>> On 07/12/17 15:45, Jan Beulich wrote:
>>>>>> On 07.12.17 at 15:53, <marc.zyngier@arm.com> wrote:
>>>> On 07/12/17 13:52, Julien Grall wrote:
>>>> There is exactly one case where set/way makes sense, and that's when
>>>> you're the only CPU left in the system, your MMU is off, and you're
>>>> about to go down.
>>>
>>> With this and ...
>>>
>>>> On top of bypassing the coherency, S/W CMOs do not prevent lines from
>>>> migrating from one CPU to another. So you could happily be flushing by
>>>> S/W, and still end up with dirty lines in your cache. Success!
>>>
>>> ... this I wonder what value emulating those insns then has in the first
>>> place. Can't you as well simply skip and ignore them, with the same
>>> (bad) result?
>>
>> The result will be much, much worse. Here is a concrete example with Linux
>> Arm 32-bit:
>>
>>     1) Cache enabled
>>     2) Decompress
>>     3) Nuke cache (S/W)
>>     4) Cache off
>>     5) Access new kernel
>>
>> If you skip #3, the decompressed data may not have reached the memory, so
>> you would access stale data.
>>
>> This would effectively mean we don't support Linux Arm 32-bit.
> 
> So Marc said that #3 "doesn't make sense", since although it might be
> the only cpu on in the system, you're not "about to go down"; but Linux
> 32-bit is doing that anyway.

"Doesn't make sense" on an ARMv7+ with SMP. That code dates back to
ARMv4, and has been left untouched ever since. "If it ain't broke..."

> It sounds like from the slides the purpose of #3 might be to get stuff
> out of the D-cache into the I-cache.  But why is the cache turned off?

Linux mandates that the kernel is entered with the MMU off. Which has 
the effect of disabling the caches too (VIVT caches and all that jazz).

> And why doesn't Linux use the VA-based flushes rather than the S/W flushes?

Linux/arm64 does. Changing the 32bit port to use VA CMOs would probably
break stuff from the late 90s, so that's not going to happen. These
days, I tend to pick my battles... ;-)

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-07 16:58                 ` Marc Zyngier
@ 2017-12-07 18:06                   ` George Dunlap
  2017-12-07 19:21                     ` Marc Zyngier
  0 siblings, 1 reply; 41+ messages in thread
From: George Dunlap @ 2017-12-07 18:06 UTC (permalink / raw)
  To: Marc Zyngier, Julien Grall, Jan Beulich
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	Andrew Cooper, xen-devel

On 12/07/2017 04:58 PM, Marc Zyngier wrote:
> On 07/12/17 16:44, George Dunlap wrote:
>> On 12/07/2017 04:04 PM, Julien Grall wrote:
>>> Hi Jan,
>>>
>>> On 07/12/17 15:45, Jan Beulich wrote:
>>>>>>> On 07.12.17 at 15:53, <marc.zyngier@arm.com> wrote:
>>>>> On 07/12/17 13:52, Julien Grall wrote:
>>>>> There is exactly one case where set/way makes sense, and that's when
>>>>> you're the only CPU left in the system, your MMU is off, and you're
>>>>> about to go down.
>>>>
>>>> With this and ...
>>>>
>>>>> On top of bypassing the coherency, S/W CMOs do not prevent lines from
>>>>> migrating from one CPU to another. So you could happily be flushing by
>>>>> S/W, and still end up with dirty lines in your cache. Success!
>>>>
>>>> ... this I wonder what value emulating those insns then has in the first
>>>> place. Can't you as well simply skip and ignore them, with the same
>>>> (bad) result?
>>>
>>> The result will be much, much worse. Here is a concrete example with Linux
>>> Arm 32-bit:
>>>
>>>     1) Cache enabled
>>>     2) Decompress
>>>     3) Nuke cache (S/W)
>>>     4) Cache off
>>>     5) Access new kernel
>>>
>>> If you skip #3, the decompressed data may not have reached the memory, so
>>> you would access stale data.
>>>
>>> This would effectively mean we don't support Linux Arm 32-bit.
>>
>> So Marc said that #3 "doesn't make sense", since although it might be
>> the only cpu on in the system, you're not "about to go down"; but Linux
>> 32-bit is doing that anyway.
> 
> "Doesn't make sense" on an ARMv7+ with SMP. That code dates back to
> ARMv4, and has been left untouched ever since. "If it ain't broke..."
> 
>> It sounds like from the slides the purpose of #3 might be to get stuff
>> out of the D-cache into the I-cache.  But why is the cache turned off?
> 
> Linux mandates that the kernel is entered with the MMU off. Which has
> the effect of disabling the caches too (VIVT caches and all that jazz).
> 
>> And why doesn't Linux use the VA-based flushes rather than the S/W flushes?
> 
> Linux/arm64 does. Changing the 32bit port to use VA CMOs would probably
> break stuff from the late 90s, so that's not going to happen. These
> days, I tend to pick my battles... ;-)

OK, so let me try to state this "forwards" for those of us not familiar
with the situation:

1. Linux expects to start in 'linear' mode, with the MMU disabled.

2. On ARM, disabling the MMU disables caching (!).  But disabling
caching doesn't flush the cache; it just means the cache is bypassed (!).

3. Which means for Linux on ARM, after unzipping the kernel image, you
need to flush the cache before disabling the MMU and starting Linux proper

4. For historical reasons, 32-bit ARM Linux uses the S/W instructions to
flush the cache.  This still works on 32-bit hardware, and so the Linux
maintainers are loath to change it, even though more reliable VA-based
instructions are available (?).

5. For 64-bit hardware, the S/W instructions don't affect the L3 cache
[1] (?!).  So for a 32-bit guest on a 64-bit host, the above is entirely broken.

6. Rather than fix this in Linux, KVM has added a work-around in which
the *hypervisor* flushes the caches at certain points (!!!).  Julien is
looking into doing the same with Xen.

Is that about right?

Given the variety of hardware that Linux has to run on, it's hard to
understand why 1) 32-bit ARM Linux couldn't detect whether it would be
appropriate to use VA-based instructions rather than S/W instructions,
and 2) there couldn't at least be a Kconfig option to use VA
instructions instead of S/W instructions.

 -George

[1]
https://events.linuxfoundation.org/sites/events/files/slides/slides_10.pdf,
slide 9


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-07 18:06                   ` George Dunlap
@ 2017-12-07 19:21                     ` Marc Zyngier
  2017-12-08 10:56                       ` George Dunlap
  0 siblings, 1 reply; 41+ messages in thread
From: Marc Zyngier @ 2017-12-07 19:21 UTC (permalink / raw)
  To: George Dunlap, Julien Grall, Jan Beulich
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	Andrew Cooper, xen-devel

On 07/12/17 18:06, George Dunlap wrote:
> On 12/07/2017 04:58 PM, Marc Zyngier wrote:
>> On 07/12/17 16:44, George Dunlap wrote:
>>> On 12/07/2017 04:04 PM, Julien Grall wrote:
>>>> Hi Jan,
>>>>
>>>> On 07/12/17 15:45, Jan Beulich wrote:
>>>>>>>> On 07.12.17 at 15:53, <marc.zyngier@arm.com> wrote:
>>>>>> On 07/12/17 13:52, Julien Grall wrote:
>>>>>> There is exactly one case where set/way makes sense, and that's when
>>>>>> you're the only CPU left in the system, your MMU is off, and you're
>>>>>> about to go down.
>>>>>
>>>>> With this and ...
>>>>>
>>>>>> On top of bypassing the coherency, S/W CMOs do not prevent lines from
>>>>>> migrating from one CPU to another. So you could happily be flushing by
>>>>>> S/W, and still end up with dirty lines in your cache. Success!
>>>>>
>>>>> ... this I wonder what value emulating those insns then has in the first
>>>>> place. Can't you as well simply skip and ignore them, with the same
>>>>> (bad) result?
>>>>
>>>> The result will be much, much worse. Here is a concrete example with Linux
>>>> Arm 32-bit:
>>>>
>>>>     1) Cache enabled
>>>>     2) Decompress
>>>>     3) Nuke cache (S/W)
>>>>     4) Cache off
>>>>     5) Access new kernel
>>>>
>>>> If you skip #3, the decompressed data may not have reached the memory, so
>>>> you would access stale data.
>>>>
>>>> This would effectively mean we don't support Linux Arm 32-bit.
>>>
>>> So Marc said that #3 "doesn't make sense", since although it might be
>>> the only cpu on in the system, you're not "about to go down"; but Linux
>>> 32-bit is doing that anyway.
>>
>> "Doesn't make sense" on an ARMv7+ with SMP. That code dates back to
>> ARMv4, and has been left untouched ever since. "If it ain't broke..."
>>
>>> It sounds like from the slides the purpose of #3 might be to get stuff
>>> out of the D-cache into the I-cache.  But why is the cache turned off?
>>
>> Linux mandates that the kernel is entered with the MMU off. Which has
>> the effect of disabling the caches too (VIVT caches and all that jazz).
>>
>>> And why doesn't Linux use the VA-based flushes rather than the S/W flushes?
>>
>> Linux/arm64 does. Changing the 32bit port to use VA CMOs would probably
>> break stuff from the late 90s, so that's not going to happen. These
>> days, I tend to pick my battles... ;-)
> 
> OK, so let me try to state this "forwards" for those of us not familiar
> with the situation:
> 
> 1. Linux expects to start in 'linear' mode, with the MMU disabled.
> 
> 2. On ARM, disabling the MMU disables caching (!).  But disabling
> caching doesn't flush the cache; it just means the cache is bypassed (!).
> 
> 3. Which means for Linux on ARM, after unzipping the kernel image, you
> need to flush the cache before disabling the MMU and starting Linux proper
> 
> 4. For historical reasons, 32-bit ARM Linux uses the S/W instructions to
> flush the cache.  This still works on 32-bit hardware, and so the Linux
>> maintainers are loath to change it, even though more reliable VA-based
> instructions are available (?).

It also works on 64bit HW. It is just not easily virtualizable, which is
why we've removed all S/W from the 64bit Linux port a while ago.

> 
> 5. For 64-bit hardware, the S/W instructions don't affect the L3 cache
> [1] (?!).  So for a 32-bit guest on a 64-bit host, the above is entirely broken.

System caches in general can avoid implementing S/W. That's not specific
to 64bit. It is just that in general, 32bit systems do not have a very
deep cache hierarchy (there are of course a number of exceptions to this
rule). 64bit systems, on the other hand, can be much bigger and are
quite happily stacking a deep cache hierarchy.

> 6. Rather than fix this in Linux, KVM has added a work-around in which
> the *hypervisor* flushes the caches at certain points (!!!).  Julien is
> looking into doing the same with Xen.

The "at certain points" doesn't quite describe it. We fully emulate S/W
instruction using the biggest hammer we can find.

> Is that about right?

I think you got the gist of it.

> Given the variety of hardware that Linux has to run on, it's hard to
> understand why 1) 32-bit ARM Linux couldn't detect if it would be
> appropriate to use VA-based instructions rather than S/W instructions 2)
> There couldn't at least be a Kconfig option to use VA instructions
> instead of S/W instructions.

[Linux hat on]

1) There is hardly anything to detect. Both sets of CMOs are available
on a moderately recent implementation. What you'd want to detect is that
the kernel is "virtualizable", which is not an easy task.

2) Kconfig options are the way to hell. It took us 5 years to get a
32bit kernel that would boot on about anything, and we're not going to
go back.

An alternative option would be to switch to VA CMOs if compiled for
ARMv7 (and maybe v6), assuming that doesn't have any horrible side
effect with broken cache implementations (and there are a few out there).
You'll have to check that this doesn't regress on any existing HW.

Of course, none of that will solve the most important issue, which is to
boot an unmodified kernel from yesterday to install a distribution. If
you want to be able to do that, you'll have to use the aforementioned
hammer.

In the end, it really depends how much you care about 32bit Linux guests
(on both 32 and 64bit Xen), and what your user base expects as a level
of support.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-06 12:58   ` Julien Grall
                       ` (2 preceding siblings ...)
  2017-12-06 17:49     ` George Dunlap
@ 2017-12-08  8:03     ` Tim Deegan
  2017-12-08 14:38       ` Julien Grall
  3 siblings, 1 reply; 41+ messages in thread
From: Tim Deegan @ 2017-12-08  8:03 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, George Dunlap,
	Jan Beulich, Andre Przywara, xen-devel

Hi,

At 12:58 +0000 on 06 Dec (1512565090), Julien Grall wrote:
> On 12/06/2017 12:28 PM, George Dunlap wrote:
> > 2. It sounds like rather than using PoD, you could use the
> > "misconfigured p2m table" technique that x86 uses: set bits in the p2m
> > entry which cause a specific kind of HAP fault when accessed.  The fault
> > handler then looks in the p2m entry, and if it finds an otherwise valid
> > entry, it just fixes the "misconfigured" bits and continues.
> 
> I thought about this. But when do you set the entry to misconfigured?
> 
> If you take the example of Linux 32-bit, there are a couple of full 
> cache cleans during the boot of a uni-processor. So you would need to go 
> through the p2m multiple times and reset the access bits.

My 2c (echoing what some others have already said):

+1 for avoiding the full majesty of PoD if you don't need it.

It should be possible to do something like the misconfigured-entry bit
trick by _allocating_ the memory up-front and building the p2m entries
but only making them usable by the {IO}MMUs on first access.  That
would make these early p2m walks shorter (because they can skip whole
subtrees that aren't marked present yet) without making major changes
to domain build or introducing run-time failures.

Also beware of DoS conditions -- a guest that touches all its memory
and then flushes by set/way mustn't be allowed to hurt the rest of the
system.  That probably means the set/way flush has to be preemptable.
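A sketch of what a preemptable flush could look like, with a stored
continuation point (the domain field and the two lookup helpers are
invented; hypercall_preempt_check() and -ERESTART are the existing Xen
idiom, modulo the fact that a trapped S/W instruction is not a
hypercall):

    static int flush_guest_ram_preemptible(struct domain *d)
    {
        unsigned long gfn = d->arch.sw_flush_next;   /* invented field */

        for ( ; gfn < domain_get_maximum_gpfn(d); gfn++ )
        {
            if ( gfn_is_mapped(d, gfn) )             /* invented helper */
                flush_page_to_ram(gfn_to_mfn(d, gfn));

            if ( hypercall_preempt_check() )
            {
                d->arch.sw_flush_next = gfn + 1;     /* resume point */
                return -ERESTART;
            }
        }

        d->arch.sw_flush_next = 0;                   /* all done */
        return 0;
    }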

Tim.


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-07 19:21                     ` Marc Zyngier
@ 2017-12-08 10:56                       ` George Dunlap
  2017-12-11 11:10                         ` Andre Przywara
  0 siblings, 1 reply; 41+ messages in thread
From: George Dunlap @ 2017-12-08 10:56 UTC (permalink / raw)
  To: Marc Zyngier, Julien Grall, Jan Beulich
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	Andrew Cooper, xen-devel

On 12/07/2017 07:21 PM, Marc Zyngier wrote:
> On 07/12/17 18:06, George Dunlap wrote:
>> On 12/07/2017 04:58 PM, Marc Zyngier wrote:
>>> On 07/12/17 16:44, George Dunlap wrote:
>>>> On 12/07/2017 04:04 PM, Julien Grall wrote:
>>>>> Hi Jan,
>>>>>
>>>>> On 07/12/17 15:45, Jan Beulich wrote:
>>>>>>>>> On 07.12.17 at 15:53, <marc.zyngier@arm.com> wrote:
>>>>>>> On 07/12/17 13:52, Julien Grall wrote:
>>>>>>> There is exactly one case where set/way makes sense, and that's when
>>>>>>> you're the only CPU left in the system, your MMU is off, and you're
>>>>>>> about to go down.
>>>>>>
>>>>>> With this and ...
>>>>>>
>>>>>>> On top of bypassing the coherency, S/W CMOs do not prevent lines from
>>>>>>> migrating from one CPU to another. So you could happily be flushing by
>>>>>>> S/W, and still end up with dirty lines in your cache. Success!
>>>>>>
>>>>>> ... this I wonder what value emulating those insns then has in the first
>>>>>> place. Can't you as well simply skip and ignore them, with the same
>>>>>> (bad) result?
>>>>>
>>>>> The result will be much, much worse. Here is a concrete example with Linux
>>>>> Arm 32-bit:
>>>>>
>>>>>     1) Cache enabled
>>>>>     2) Decompress
>>>>>     3) Nuke cache (S/W)
>>>>>     4) Cache off
>>>>>     5) Access new kernel
>>>>>
>>>>> If you skip #3, the decompressed data may not have reached the memory, so
>>>>> you would access stale data.
>>>>>
>>>>> This would effectively mean we don't support Linux Arm 32-bit.
>>>>
>>>> So Marc said that #3 "doesn't make sense", since although it might be
>>>> the only cpu on in the system, you're not "about to go down"; but Linux
>>>> 32-bit is doing that anyway.
>>>
>>> "Doesn't make sense" on an ARMv7+ with SMP. That code dates back to
>>> ARMv4, and has been left untouched ever since. "If it ain't broke..."
>>>
>>>> It sounds like from the slides the purpose of #3 might be to get stuff
>>>> out of the D-cache into the I-cache.  But why is the cache turned off?
>>>
>>> Linux mandates that the kernel is entered with the MMU off. Which has
>>> the effect of disabling the caches too (VIVT caches and all that jazz).
>>>
>>>> And why doesn't Linux use the VA-based flushes rather than the S/W flushes?
>>>
>>> Linux/arm64 does. Changing the 32bit port to use VA CMOs would probably
>>> break stuff from the late 90s, so that's not going to happen. These
>>> days, I tend to pick my battles... ;-)
>>
>> OK, so let me try to state this "forwards" for those of us not familiar
>> with the situation:
>>
>> 1. Linux expects to start in 'linear' mode, with the MMU disabled.
>>
>> 2. On ARM, disabling the MMU disables caching (!).  But disabling
>> caching doesn't flush the cache; it just means the cache is bypassed (!).
>>
>> 3. Which means for Linux on ARM, after unzipping the kernel image, you
>> need to flush the cache before disabling the MMU and starting Linux proper
>>
>> 4. For historical reasons, 32-bit ARM Linux uses the S/W instructions to
>> flush the cache.  This still works on 32-bit hardware, and so the Linux
>>> maintainers are loath to change it, even though more reliable VA-based
>> instructions are available (?).
> 
> It also works on 64bit HW. It is just not easily virtualizable, which is
> why we've removed all S/W from the 64bit Linux port a while ago.

From the diagram in your talk, it looked like the "flush the cache"
operation *doesn't* work anywhere that has a "system cache", even on
bare metal.

>> 6. Rather than fix this in Linux, KVM has added a work-around in which
>> the *hypervisor* flushes the caches at certain points (!!!).  Julien is
>> looking into doing the same with Xen.
> 
> The "at certain points" doesn't quite describe it. We fully emulate S/W
> instructions using the biggest hammer we can find.

Oh, I thought Julien was saying something about flushing the guest's RAM
every time caching was enabled or disabled.

>> Given the variety of hardware that Linux has to run on, it's hard to
>> understand why 1) 32-bit ARM Linux couldn't detect if it would be
>> appropriate to use VA-based instructions rather than S/W instructions 2)
>> There couldn't at least be a Kconfig option to use VA instructions
>> instead of S/W instructions.
> 
> [Linux hat on]
> 
> 1) There is hardly anything to detect. Both sets of CMOs are available
> on a moderately recent implementation. What you'd want to detect is that
> the kernel is "virtualizable", which is not an easy task.
<snip>
> An alternative option would be to switch to VA CMOs if compiled for
> ARMv7 (and maybe v6), assuming that doesn't have any horrible side
> effect with broken cache implementations (and there are a few out there).
> You'll have to check that this doesn't regress on any existing HW.

So the idea would be to use the VA-based operations if available, and
then special-case specific chipsets known to have issues.  Linux (and
Xen and...) end up doing this for lots of different kinds of hardware;
this would be no different.

> 2) Kconfig options are the way to hell. It took us 5 years to get a
> 32bit kernel that would boot on about anything, and we're not going to
> go back.

Well, at the moment you *don't* have a 32-bit kernel that will boot on
anything.  It won't boot (it sounds like) on any 32-bit system that has
a system cache, including a 64-bit hypervisor providing a 32-bit guest.

Alternately, would it make sense to have a PV "cache flush" operation
for hypervisors?  x86 has a way to expose hypervisor capabilities via
specific CPUID leaves.  Does anything like this exist for ARM?  If so,
the code could be, "If virtualized and hypervisor provides PV cache
flush, use that.  Otherwise, fall back to S/W operation."
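
Purely as a sketch of that fallback logic -- every name below is
invented, since no such PV interface exists today:

/* Hypothetical feature bit and wrappers, for illustration only. */
#define HV_FEAT_PV_CACHE_FLUSH  (1u << 0)

extern unsigned int hv_get_features(void);  /* e.g. probed via DT/SMCCC */
extern void hv_pv_flush_all_caches(void);   /* hypothetical hypercall */
extern void flush_cache_by_set_way(void);   /* the existing 32-bit path */

static void flush_all_caches(void)
{
    if (hv_get_features() & HV_FEAT_PV_CACHE_FLUSH)
        hv_pv_flush_all_caches();  /* the hypervisor does the work */
    else
        flush_cache_by_set_way();  /* bare metal / older hypervisor */
}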

> Of course, none of that will solve the most important issue, which is to
> boot an unmodified kernel from yesterday to install a distribution. If
> you want to be able to do that, you'll have to use the aforementioned
> hammer.

Well, it will take time to code up a solution and get *that* into users'
hands as well.  I would think the fastest way to get *most* distros
working would be to open a ticket saying it's broken on virtual
hardware, and asking them to apply a patch.  Then prioritize getting
more "enterprisey" distros working if and when needed.

Just to be clear -- I'm just trying to help push to explore other
options here.  I'm not opposed to Julien or someone making a work-around
in Xen.  But it's quite a bit of effort to achieve a pretty crappy end,
so I think it's worth exploring what kind of effort we could spend
achieving a "proper" fix first.

(Thanks also for taking the time to help explain this.)

 -George


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-08  8:03     ` Tim Deegan
@ 2017-12-08 14:38       ` Julien Grall
  2017-12-10 15:22         ` Tim Deegan
  2017-12-11 10:06         ` Jan Beulich
  0 siblings, 2 replies; 41+ messages in thread
From: Julien Grall @ 2017-12-08 14:38 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, George Dunlap,
	Jan Beulich, Andre Przywara, xen-devel

On 08/12/17 08:03, Tim Deegan wrote:
> Hi,

Hi Tim,

Somehow your e-mail was marked as spam by Gmail.

> At 12:58 +0000 on 06 Dec (1512565090), Julien Grall wrote:
>> On 12/06/2017 12:28 PM, George Dunlap wrote:
>>> 2. It sounds like rather than using PoD, you could use the
>>> "misconfigured p2m table" technique that x86 uses: set bits in the p2m
>>> entry which cause a specific kind of HAP fault when accessed.  The fault
>>> handler then looks in the p2m entry, and if it finds an otherwise valid
>>> entry, it just fixes the "misconfigured" bits and continues.
>>
>> I thought about this. But when do you set the entry to misconfigured?
>>
>> Take the example of 32-bit Linux: there are a couple of full cache
>> cleans during a uniprocessor boot. So you would need to go through
>> the p2m multiple times and reset the access bits.
> 
> My 2c (echoing what some others have already said):
> 
> +1 for avoiding the full majesty of PoD if you don't need it.
> 
> It should be possible to do something like the misconfigured-entry bit
> trick by _allocating_ the memory up-front and building the p2m entries
> but only making them usable by the {IO}MMUs on first access.  That
> would make these early p2m walks shorter (because they can skip whole
> subtrees that aren't marked present yet) without making major changes
> to domain build or introducing run-time failures.

I am not aware of any way on Arm to misconfigure an entry. We do have 
valid and access bits, although they will affect the IOMMU as well. So 
it will not be possible to get page-table sharing with this "feature" 
enabled.

At the moment, I am thinking of providing a per-guest option to turn 
on/off the use of the valid/access bits. That would come at the expense 
of doing a full invalidate on S/W.
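
As a very rough sketch of that idea (a simplified descriptor and
helpers, not the actual Xen p2m code):

typedef uint64_t lpae_t;           /* simplified stage-2 descriptor */
#define S2_VALID  (1ULL << 0)      /* architectural valid bit */

/*
 * Sketch only: the entry was built at domain creation but left
 * invalid. On the first stage-2 fault we make it live and retry --
 * unlike PoD, no memory is allocated here.
 */
static bool handle_s2_fault(lpae_t *entry)
{
    if ( *entry == 0 )
        return false;              /* truly unmapped: a real fault */

    if ( !(*entry & S2_VALID) )
    {
        *entry |= S2_VALID;        /* first access: mark the entry valid */
        /* TLB maintenance for this IPA would also be needed here. */
        return true;               /* retry the faulting access */
    }

    return false;
}

A trapped S/W op would then only need to clean the entries that have
been made valid so far.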

> Also beware of DoS conditions -- a guest that touches all its memory
> and then flushes by set/way mustn't be allowed to hurt the rest of the
> system.  That probably means the set/way flush has to be preemptable.

I am fully aware of it :). It was actually mentioned in my first 
e-mail.

Cheers,

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-08 14:38       ` Julien Grall
@ 2017-12-10 15:22         ` Tim Deegan
  2017-12-11 19:50           ` Julien Grall
  2017-12-11 10:06         ` Jan Beulich
  1 sibling, 1 reply; 41+ messages in thread
From: Tim Deegan @ 2017-12-10 15:22 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, George Dunlap,
	Jan Beulich, Andre Przywara, xen-devel

At 14:38 +0000 on 08 Dec (1512743913), Julien Grall wrote:
> On 08/12/17 08:03, Tim Deegan wrote:
> > +1 for avoiding the full majesty of PoD if you don't need it.
> > 
> > It should be possible to do something like the misconfigured-entry bit
> > trick by _allocating_ the memory up-front and building the p2m entries
> > but only making them usable by the {IO}MMUs on first access.  That
> > would make these early p2m walks shorter (because they can skip whole
> > subtrees that aren't marked present yet) without making major changes
> > to domain build or introducing run-time failures.
> 
> I am not aware of any way on Arm to misconfigure an entry. We do have 
> valid and access bits, although they will affect the IOMMU as well. So 
> it will not be possible to get page-table sharing with this "feature" 
> enabled.

How unfortunate.  How does KVM's demand-population scheme handle the IOMMU? 

Tim.


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-08 14:38       ` Julien Grall
  2017-12-10 15:22         ` Tim Deegan
@ 2017-12-11 10:06         ` Jan Beulich
  2017-12-11 11:11           ` Andrew Cooper
  2017-12-11 20:26           ` Julien Grall
  1 sibling, 2 replies; 41+ messages in thread
From: Jan Beulich @ 2017-12-11 10:06 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	George Dunlap, Andrew Cooper, xen-devel

>>> On 08.12.17 at 15:38, <julien.grall@linaro.org> wrote:
> On 08/12/17 08:03, Tim Deegan wrote:
>> It should be possible to do something like the misconfigured-entry bit
>> trick by _allocating_ the memory up-front and building the p2m entries
>> but only making them usable by the {IO}MMUs on first access.  That
>> would make these early p2m walks shorter (because they can skip whole
>> subtrees that aren't marked present yet) without making major changes
>> to domain build or introducing run-time failures.
> 
> I am not aware of any way on Arm to misconfigure an entry. We do have 
> valid and access bits, although they will affect the IOMMU as well. So 
> it will not be possible to get page-table sharing with this "feature" 
> enabled.

How would you intend to solve the IOMMU part of the problem with
PoD? As was pointed out before - IOMMU and PoD are incompatible
on x86.

Jan



* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-08 10:56                       ` George Dunlap
@ 2017-12-11 11:10                         ` Andre Przywara
  2017-12-11 12:15                           ` George Dunlap
  2017-12-11 21:11                           ` Julien Grall
  0 siblings, 2 replies; 41+ messages in thread
From: Andre Przywara @ 2017-12-11 11:10 UTC (permalink / raw)
  To: George Dunlap, Marc Zyngier, Julien Grall, Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Stefano Stabellini, Tim Deegan, xen-devel

Hi,

On 08/12/17 10:56, George Dunlap wrote:
> On 12/07/2017 07:21 PM, Marc Zyngier wrote:
>> On 07/12/17 18:06, George Dunlap wrote:
>>> On 12/07/2017 04:58 PM, Marc Zyngier wrote:
>>>> On 07/12/17 16:44, George Dunlap wrote:
>>>>> On 12/07/2017 04:04 PM, Julien Grall wrote:
>>>>>> Hi Jan,
>>>>>>
>>>>>> On 07/12/17 15:45, Jan Beulich wrote:
>>>>>>>>>> On 07.12.17 at 15:53, <marc.zyngier@arm.com> wrote:
>>>>>>>> On 07/12/17 13:52, Julien Grall wrote:
>>>>>>>> There is exactly one case where set/way makes sense, and that's when
>>>>>>>> you're the only CPU left in the system, your MMU is off, and you're
>>>>>>>> about to go down.
>>>>>>>
>>>>>>> With this and ...
>>>>>>>
>>>>>>>> On top of bypassing the coherency, S/W CMOs do not prevent lines from
>>>>>>>> migrating from one CPU to another. So you could happily be flushing by
>>>>>>>> S/W, and still end up with dirty lines in your cache. Success!
>>>>>>>
>>>>>>> ... this I wonder what value emulating those insns then has in the first
>>>>>>> place. Can't you as well simply skip and ignore them, with the same
>>>>>>> (bad) result?
>>>>>>
>>>>>> The result will be much, much worse. Here is a concrete example with
>>>>>> 32-bit Arm Linux:
>>>>>>
>>>>>>     1) Cache enabled
>>>>>>     2) Decompress
>>>>>>     3) Nuke cache (S/W)
>>>>>>     4) Cache off
>>>>>>     5) Access new kernel
>>>>>>
>>>>>> If you skip #3, the decompressed data may not have reached memory, so
>>>>>> you would access stale data.
>>>>>>
>>>>>> This would effectively mean we don't support Linux Arm 32-bit.
>>>>>
>>>>> So Marc said that #3 "doesn't make sense", since although it might be
>>>>> the only CPU still on in the system, you're not "about to go down"; but Linux
>>>>> 32-bit is doing that anyway.
>>>>
>>>> "Doesn't make sense" on an ARMv7+ with SMP. That code dates back to
>>>> ARMv4, and has been left untouched ever since. "If it ain't broke..."
>>>>
>>>>> It sounds like from the slides the purpose of #3 might be to get stuff
>>>>> out of the D-cache into the I-cache.  But why is the cache turned off?
>>>>
>>>> Linux mandates that the kernel is entered with the MMU off. Which has
>>>> the effect of disabling the caches too (VIVT caches and all that jazz).
>>>>
>>>>> And why doesn't Linux use the VA-based flushes rather than the S/W flushes?
>>>>
>>>> Linux/arm64 does. Changing the 32bit port to use VA CMOs would probably
>>>> break stuff from the late 90s, so that's not going to happen. These
>>>> days, I tend to pick my battles... ;-)
>>>
>>> OK, so let me try to state this "forwards" for those of us not familiar
>>> with the situation:
>>>
>>> 1. Linux expects to start in 'linear' mode, with the MMU disabled.
>>>
>>> 2. On ARM, disabling the MMU disables caching (!).  But disabling
>>> caching doesn't flush the cache; it just means the cache is bypassed (!).
>>>
>>> 3. Which means for Linux on ARM, after unzipping the kernel image, you
>>> need to flush the cache before disabling the MMU and starting Linux proper.
>>>
>>> 4. For historical reasons, 32-bit ARM Linux uses the S/W instructions to
>>> flush the cache.  This still works on 32-bit hardware, and so the Linux
>>> maintainers are loath to change it, even though more reliable VA-based
>>> instructions are available (?).
>>
>> It also works on 64bit HW. It is just not easily virtualizable, which is
>> why we've removed all S/W from the 64bit Linux port a while ago.
> 
> From the diagram in your talk, it looked like the "flush the cache"
> operation *doesn't* work anywhere that has a "system cache", even on
> bare metal.

What Marc probably meant is that they still work *within the
architectural limits* that s/w operations provide:
- S/W CMOs are not broadcasted, so in a live SMP system they are
probably not doing what you expect them to do. This isn't an issue for a
32-bit Linux kernel decompressor, because it is still UP at this point.
- S/W CMOs are optional to implement for system caches. As Marc
mentioned, there are not many 32-bit systems with a system cache out
there. And on those systems you can still boot an uncompressed kernel or
use a gzip-ed kernel and let the bootloader (grub, U-Boot) decompress it.
On the other hand there seem to be a substantial number of (older)
32-bit systems where VA CMOs have issues.

The problem now is that for the "32-bit kernel on a 64-bit hypervisor"
case those two assumptions are not true: the system has multiple CPUs
running already, also 64-bit hardware is much more likely to have system
caches.
So this is mostly a virtualization problem and thus should be solved here.

To help assess the benefits of adding PoD to Xen:
I did some tracing on Friday with a 32-bit kernel on a (64-bit) Juno
with KVM. I see *four* full cache cleans very early on each boot (first
s/w op + caches turned on, twice), plus one cache clean when each (v)CPU
is brought online (due to the initial "turn MMU and cache on" operation).
During the runtime of the kernel there are no s/w ops, except for (v)CPU
off/on-lining (echo [01] > /sys/devices/system/cpu/cpu<n>/online).
I believe these are bogus, as I see the caches still being on, but
that's how it is. Also this is probably not performance critical due to
the nature of this operation.

Having PoD at this point would be quite helpful, as very early at boot
we don't expect much memory to be in use yet, so the "full VA space
cache clean" doesn't have much to do. As a result, a 32-bit kernel boot
in KVM is not noticeably slower than a 64-bit kernel boot.

But on the other hand we had PoD naturally already in KVM, so this came
at no cost.
So I believe it would be worth investigating what the actual impact is
of booting a 32-bit kernel while emulating s/w ops like KVM does (see
below), but cleaning the *whole VA space*. If this is somewhat
acceptable (I assume we have no more than 2GB for a typical ARM32
guest), it might be worth ignoring PoD, at least for now, to solve
this problem (and the IOMMU consequences).

This assumes that a single "full VA flush" cannot be abused as a DoS by
a malicious guest, which should be investigated independently (as this
applies to a PoD implementation as well).
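
For reference, a preemptible whole-guest clean could be structured
roughly as below. Every helper here is a placeholder for whatever the
real p2m iterator would look like; the map/unmap steps are where
Xen/arm32 pays extra, since domheap memory is not permanently mapped:

#define FLUSH_BATCH 64

/* Sketch only, not Xen code. */
static int clean_guest_ram(struct domain *d, unsigned long *gfn)
{
    unsigned long g = *gfn;        /* resume point across preemptions */
    unsigned int count = 0;

    for ( ; p2m_next_mapped(d, &g); g++ )
    {
        void *va = map_domain_page(gfn_to_mfn(d, g));

        clean_dcache_va_range(va, PAGE_SIZE);  /* VA-based CMO, not S/W */
        unmap_domain_page(va);

        if ( ++count == FLUSH_BATCH )
        {
            count = 0;
            if ( hypercall_preempt_check() )   /* bound the DoS window */
            {
                *gfn = g + 1;                  /* restart here next time */
                return -ERESTART;
            }
        }
    }

    return 0;
}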



Somewhat optional read for the background of how KVM optimized this ([1]):

KVM's solution to this problem works under the assumption that s/w
operations with the caches (and MMU) on are not really meaningful, so we
don't bother emulating them to the letter. Also we assume that the
purpose of s/w CMOs is to clean the whole cache. So KVM does two things
to avoid too much work:
- The first trapped s/w op flushes the whole guest VA space. It then
turns "VM op" traps on, to detect when the caches get turned on.
This basically does the work ("flush my whole cache") already on the
first s/w op. Further trapped s/w ops are treated as NOPs then.
- When a trapped VM op signals that the caches are turned on again, we
also clean the whole cache. We then turn VM op trapping *off* again. The
next trapped s/w op would turn it back on.

Those two features are pretty straightforward to implement, avoid
actual s/w operations most of the time (all but the first s/w op are
emulated as NOPs), and still keep things safe within the architectural
limits. Plus, this code is not normally triggered during the actual
kernel runtime, but only at early boot (decompressor plus SMP bring-up).
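
Condensed into pseudocode, that is just a two-state machine. This is a
paraphrase of the behaviour described above, not KVM's actual code --
see [1] for that:

struct vm {
    bool trapping_vm_ops;    /* are SCTLR writes currently trapped? */
};

/* Trapped S/W cache op (HCR.TSW set). */
static void handle_sw_cmo(struct vm *vm)
{
    if (!vm->trapping_vm_ops) {
        flush_whole_guest_ram(vm);   /* the big hammer, once */
        vm->trapping_vm_ops = true;  /* now watch for caches on/off */
    }
    /* Otherwise: NOP -- the earlier flush already did the work. */
}

/* Trapped system-register write (HCR.TVM set). */
static void handle_sctlr_write(struct vm *vm, bool caches_now_on)
{
    if (caches_now_on) {
        flush_whole_guest_ram(vm);    /* clean once more, then relax */
        vm->trapping_vm_ops = false;  /* the next S/W op re-arms the trap */
    }
}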

>>> 6. Rather than fix this in Linux, KVM has added a work-around in which
>>> the *hypervisor* flushes the caches at certain points (!!!).  Julien is
>>> looking into doing the same with Xen.
>>
>> The "at certain points" doesn't quite describe it. We fully emulate S/W
>> instructions using the biggest hammer we can find.
> 
> Oh, I thought Julien was saying something about flushing the guest's RAM
> every time caching was enabled or disabled.

Yes, that's what it does ([2]), but usually that's at early boot and we
don't have many pages actually populated at this point. Hence Julien's
PoD proposal to allow using the same optimization.

Cheers,
Andre.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/virt/kvm/arm/mmu.c#n1960
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/virt/kvm/arm/mmu.c#n382

>>> Given the variety of hardware that Linux has to run on, it's hard to
>>> understand why 1) 32-bit ARM Linux couldn't detect if it would be
>>> appropriate to use VA-based instructions rather than S/W instructions 2)
>>> There couldn't at least be a Kconfig option to use VA instructions
>>> instead of S/W instructions.
>>
>> [Linux hat on]
>>
>> 1) There is hardly anything to detect. Both sets of CMOs are available
>> on a moderately recent implementation. What you'd want to detect is whether
>> the kernel is "virtualizable", which is not an easy task.
> <snip>
>> An alternative option would be to switch to VA CMOs if compiled for
>> ARMv7 (and maybe v6), assuming that doesn't have any horrible side
>> effect with broken cache implementations (and there are a few out there).
>> You'll have to check that this doesn't regress on any existing HW.
> 
> So the idea would be to use the VA-based operations if available, and
> then special-case specific chipsets known to have issues.  Linux (and
> Xen and...) end up doing this for lots of different kinds of hardware;
> this would be no different.
> 
>> 2) Kconfig options are the way to hell. It took us 5 years to get a
>> 32bit kernel that would boot on about anything, and we're not going to
>> go back.
> 
> Well, at the moment you *don't* have a 32-bit kernel that will boot on
> anything.  It won't boot (it sounds like) on any 32-bit system that has
> a system cache, including a 64-bit hypervisor providing a 32-bit guest.
> 
> Alternately, would it make sense to have a PV "cache flush" operation
> for hypervisors?  x86 has a way to expose hypervisor capabilities via
> specific CPUID leaves.  Does anything like this exist for ARM?  If so,
> the code could be, "If virtualized and hypervisor provides PV cache
> flush, use that.  Otherwise, fall back to S/W operation."
> 
>> Of course, none of that will solve the most important issue, which is to
>> boot an unmodified kernel from yesterday to install a distribution. If
>> you want to be able to do that, you'll have to use the aforementioned
>> hammer.
> 
> Well, it will take time to code up a solution and get *that* into users'
> hands as well.  I would think the fastest way to get *most* distros
> working would be to open a ticket saying it's broken on virtual
> hardware, and asking them to apply a patch.  Then prioritize getting
> more "enterprisey" distros working if and when needed.
> 
> Just to be clear -- I'm just trying to help push to explore other
> options here.  I'm not opposed to Julien or someone making a work-around
> in Xen.  But it's quite a bit of effort to achieve a pretty crappy end,
> so I think it's worth exploring what kind of effort we could spend
> achieving a "proper" fix first.
> 
> (Thanks also for taking the time to help explain this.)
> 
>  -George
> 


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-11 10:06         ` Jan Beulich
@ 2017-12-11 11:11           ` Andrew Cooper
  2017-12-11 11:58             ` Jan Beulich
  2017-12-11 20:26           ` Julien Grall
  1 sibling, 1 reply; 41+ messages in thread
From: Andrew Cooper @ 2017-12-11 11:11 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	George Dunlap, xen-devel

On 11/12/17 10:06, Jan Beulich wrote:
>>>> On 08.12.17 at 15:38, <julien.grall@linaro.org> wrote:
>> On 08/12/17 08:03, Tim Deegan wrote:
>>> It should be possible to do something like the misconfigured-entry bit
>>> trick by _allocating_ the memory up-front and building the p2m entries
>>> but only making them usable by the {IO}MMUs on first access.  That
>>> would make these early p2m walks shorter (because they can skip whole
>>> subtrees that aren't marked present yet) without making major changes
>>> to domain build or introducing run-time failures.
>> I am not aware of any way on Arm to misconfigure an entry. We do have 
>> valid and access bits, although they will affect the IOMMU as well. So 
>> it will not be possible to get page-table sharing with this "feature" 
>> enabled.
> How would you intend to solve the IOMMU part of the problem with
> PoD? As was pointed out before - IOMMU and PoD are incompatible
> on x86.

Not only that.

The use of an IOMMU is incompatible with any HAP scheme using EPT/NPT
violations to trigger hypervisor work, and this will remain the case
until such time as IOMMUs gain restartable page faults.  The chances of
that happening are essentially zero, due to timing requirements in the
PCI(e) spec.

~Andrew


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-11 11:11           ` Andrew Cooper
@ 2017-12-11 11:58             ` Jan Beulich
  0 siblings, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-12-11 11:58 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Julien Grall,
	Tim Deegan, George Dunlap, xen-devel

>>> On 11.12.17 at 12:11, <andrew.cooper3@citrix.com> wrote:
> On 11/12/17 10:06, Jan Beulich wrote:
>>>>> On 08.12.17 at 15:38, <julien.grall@linaro.org> wrote:
>>> On 08/12/17 08:03, Tim Deegan wrote:
>>>> It should be possible to do something like the misconfigured-entry bit
>>>> trick by _allocating_ the memory up-front and building the p2m entries
>>>> but only making them usable by the {IO}MMUs on first access.  That
>>>> would make these early p2m walks shorter (because they can skip whole
>>>> subtrees that aren't marked present yet) without making major changes
>>>> to domain build or introducing run-time failures.
>>> I am not aware of any way on Arm to misconfigure an entry. We do have 
>>> valid and access bits, although they will affect the IOMMU as well. So 
>>> it will not be possible to get page-table sharing with this "feature" 
>>> enabled.
>> How would you intend to solve the IOMMU part of the problem with
>> PoD? As was pointed out before - IOMMU and PoD are incompatible
>> on x86.
> 
> Not only that.
> 
> The use of an IOMMU is incompatible with any HAP scheme using EPT/NPT
> violations to trigger hypervisor work,

For many forms of "hypervisor work" I agree, but our misconfig
scheme demonstrates that there are exceptions where the IOMMU
continues to work fine.

Jan



* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-11 11:10                         ` Andre Przywara
@ 2017-12-11 12:15                           ` George Dunlap
  2017-12-11 21:11                           ` Julien Grall
  1 sibling, 0 replies; 41+ messages in thread
From: George Dunlap @ 2017-12-11 12:15 UTC (permalink / raw)
  To: Andre Przywara, Marc Zyngier, Julien Grall, Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Stefano Stabellini, Tim Deegan, xen-devel

On 12/11/2017 11:10 AM, Andre Przywara wrote:
> Hi,
> 
> On 08/12/17 10:56, George Dunlap wrote:
>> On 12/07/2017 07:21 PM, Marc Zyngier wrote:
>>> On 07/12/17 18:06, George Dunlap wrote:
>>>> On 12/07/2017 04:58 PM, Marc Zyngier wrote:
>>>>> On 07/12/17 16:44, George Dunlap wrote:
>>>>>> On 12/07/2017 04:04 PM, Julien Grall wrote:
>>>>>>> Hi Jan,
>>>>>>>
>>>>>>> On 07/12/17 15:45, Jan Beulich wrote:
>>>>>>>>>>> On 07.12.17 at 15:53, <marc.zyngier@arm.com> wrote:
>>>>>>>>> On 07/12/17 13:52, Julien Grall wrote:
>>>>>>>>> There is exactly one case where set/way makes sense, and that's when
>>>>>>>>> you're the only CPU left in the system, your MMU is off, and you're
>>>>>>>>> about to go down.
>>>>>>>>
>>>>>>>> With this and ...
>>>>>>>>
>>>>>>>>> On top of bypassing the coherency, S/W CMOs do not prevent lines from
>>>>>>>>> migrating from one CPU to another. So you could happily be flushing by
>>>>>>>>> S/W, and still end up with dirty lines in your cache. Success!
>>>>>>>>
>>>>>>>> ... this I wonder what value emulating those insns then has in the first
>>>>>>>> place. Can't you as well simply skip and ignore them, with the same
>>>>>>>> (bad) result?
>>>>>>>
>>>>>>> The result will be much, much worse. Here is a concrete example with
>>>>>>> 32-bit Arm Linux:
>>>>>>>
>>>>>>>     1) Cache enabled
>>>>>>>     2) Decompress
>>>>>>>     3) Nuke cache (S/W)
>>>>>>>     4) Cache off
>>>>>>>     5) Access new kernel
>>>>>>>
>>>>>>> If you skip #3, the decompressed data may not have reached memory, so
>>>>>>> you would access stale data.
>>>>>>>
>>>>>>> This would effectively mean we don't support Linux Arm 32-bit.
>>>>>>
>>>>>> So Marc said that #3 "doesn't make sense", since although it might be
>>>>>> the only CPU still on in the system, you're not "about to go down"; but Linux
>>>>>> 32-bit is doing that anyway.
>>>>>
>>>>> "Doesn't make sense" on an ARMv7+ with SMP. That code dates back to
>>>>> ARMv4, and has been left untouched ever since. "If it ain't broke..."
>>>>>
>>>>>> It sounds like from the slides the purpose of #3 might be to get stuff
>>>>>> out of the D-cache into the I-cache.  But why is the cache turned off?
>>>>>
>>>>> Linux mandates that the kernel is entered with the MMU off. Which has
>>>>> the effect of disabling the caches too (VIVT caches and all that jazz).
>>>>>
>>>>>> And why doesn't Linux use the VA-based flushes rather than the S/W flushes?
>>>>>
>>>>> Linux/arm64 does. Changing the 32bit port to use VA CMOs would probably
>>>>> break stuff from the late 90s, so that's not going to happen. These
>>>>> days, I tend to pick my battles... ;-)
>>>>
>>>> OK, so let me try to state this "forwards" for those of us not familiar
>>>> with the situation:
>>>>
>>>> 1. Linux expects to start in 'linear' mode, with the MMU disabled.
>>>>
>>>> 2. On ARM, disabling the MMU disables caching (!).  But disabling
>>>> caching doesn't flush the cache; it just means the cache is bypassed (!).
>>>>
>>>> 3. Which means for Linux on ARM, after unzipping the kernel image, you
>>>> need to flush the cache before disabling the MMU and starting Linux proper.
>>>>
>>>> 4. For historical reasons, 32-bit ARM Linux uses the S/W instructions to
>>>> flush the cache.  This still works on 32-bit hardware, and so the Linux
>>>> maintainers are loath to change it, even though more reliable VA-based
>>>> instructions are available (?).
>>>
>>> It also works on 64bit HW. It is just not easily virtualizable, which is
>>> why we've removed all S/W from the 64bit Linux port a while ago.
>>
>> From the diagram in your talk, it looked like the "flush the cache"
>> operation *doesn't* work anywhere that has a "system cache", even on
>> bare metal.
> 
> What Marc probably meant is that they still work *within the
> architectural limits* that s/w operations provide:
> - S/W CMOs are not broadcasted, so in a live SMP system they are
> probably not doing what you expect them to do. This isn't an issue for a
> 32-bit Linux kernel decompressor, because it is still UP at this point.
> - S/W CMOs are optional to implement for system caches. As Marc
> mentioned, there are not many 32-bit systems with a system cache out
> there.

Right, that's what I said -- on any 32-bit system with a system cache
that doesn't implement the S/W functionality, using S/W to flush the
cache won't work, even on bare metal.

> And on those systems you can still boot an uncompressed kernel or
> use a gzip-ed kernel and let the bootloader (grub, U-Boot) decompress it.
> On the other hand there seem to be a substantial number of (older)
> 32-bit systems where VA CMOs have issues.

OK, good to know.

> The problem now is that for the "32-bit kernel on a 64-bit hypervisor"
> case those two assumptions are not true: the system has multiple CPUs
> running already, also 64-bit hardware is much more likely to have system
> caches.
> So this is mostly a virtualization problem and thus should be solved here.

Right.

> To help assess the benefits of adding PoD to Xen:

Can we come up with a different terminology for this functionality than
'PoD'?  On x86 populate-on-demand is quite different in functionality
and in target goal than what Julien is describing.

The goal of PoD on x86 is being able to boot a guest that actually uses
(say) 1GiB of RAM, but allow it to balloon up later to use 2GiB of
RAM, in circumstances where memory hotplug is not
available.  This means telling a guest it has 2GiB of RAM, but only
allocating 1GiB of host RAM for it, and shuffling memory around
behind-the-scenes until the balloon driver can come up and "free" 1GiB
of empty space back to Xen.

On x86 in PoD, the p2m table is initialized with entries which are
'empty' from the hardware point of view (no mfn).  Memory is allocated
to a per-domain "PoD pool" on domain creation, then assigned to the p2m
as it's used.  If the memory remains zero, then it may be reclaimed
under certain circumstances and moved somewhere else.  Once the memory
becomes non-zero, it must never be moved.  If a guest ever "dirties" all
of its initial allocation (i.e., makes it non-zero), then Xen will crash
it rather than allocate more memory.
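
In rough pseudocode, that fault path is (deliberately simplified -- the
real x86 code also handles superpages, sweeping heuristics, locking and
so on, and the helper names here are illustrative):

static bool pod_populate(struct domain *d, unsigned long gfn)
{
    struct page_info *pg = pod_pool_get(d);  /* filled at domain creation */

    if ( !pg )
        pg = pod_reclaim_zero_page(d);       /* sweep for a still-zero page */

    if ( !pg )
    {
        domain_crash(d);   /* guest dirtied more than its allocation */
        return false;
    }

    p2m_set_entry(d, gfn, page_to_mfn(pg));  /* back the gfn with real RAM */
    return true;
}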

What Julien is describing is different.  For one thing, for many dom0's,
it's not appropriate to put memory in arbitrary places; you need a 1-1
mapping, so the "populate with random memory from a pool" isn't
appropriate.  For another, Julien will (I think?) want a way to detect
reads and writes to memory pages which have non-zero data.  This is not
something that the current PoD code has anything to do with.

It also seems like in the future, ARM may want something like the x86
PoD (i.e., the ability to boot a guest with 1GiB of RAM and then balloon
it up to 2GiB).  So keeping the 'PoD' name reserved for that
functionality makes more sense.

In fact, this sounds an awful lot like 'logdirty', except that you want
to log read accesses in addition to write accesses (to determine what
might be in the cache).  Maybe 'logaccess' mode?
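
Speculating a little about what 'logaccess' could look like if it
mirrored the logdirty machinery (none of this exists today; all names
are hypothetical):

/* Hypothetical p2m types -- p2m_ram_logaccess does not exist. */
typedef enum {
    p2m_ram_rw,          /* normal RAM */
    p2m_ram_logdirty,    /* write faults logged (x86 today) */
    p2m_ram_logaccess,   /* read *and* write faults logged */
} p2m_type_t;

/* On an access fault against a logaccess entry: record the gfn (it may
 * now have cache footprint), flip it back to plain RAM, and retry. */
static void logaccess_fault(struct domain *d, unsigned long gfn)
{
    mark_gfn_accessed(d, gfn);
    p2m_change_type(d, gfn, p2m_ram_logaccess, p2m_ram_rw);
}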

> But on the other hand we had PoD naturally already in KVM, so this came
> at no cost.

As I've said in another thread, it's not accurate to say that KVM uses
PoD.  In PoD, the memory is pre-allocated to the domain before the guest
starts; I assume on KVM the memory isn't allocated until it's used (like
a normal process).  In PoD, if the total amount of non-zero memory in
the guest exceeds this amount, then Xen will crash the guest.  In KVM, I
assume that there is no implicit limit: if it doesn't have free host ram
when the allocation happens, then it evicts something from a buffer or
swaps some process / VM memory out to disk.

Hope I'm not being too pedantic here, but "the devil is in the details",
so I think it's important when comparing KVM and Xen's solutions to be
aware of the differences. :-)

In any case, if Julien wants to emulate the S/W instructions, it seems
like having 'logaccess' functionality in Xen is probably the only
reasonable way to accomplish that (as 'full VA flush' will quickly
become unworkable as the guest size grows).

> So I believe it would be worth investigating what the actual impact is
> of booting a 32-bit kernel while emulating s/w ops like KVM does (see
> below), but cleaning the *whole VA space*. If this is somewhat
> acceptable (I assume we have no more than 2GB for a typical ARM32
> guest), it might be worth ignoring PoD, at least for now, to solve
> this problem (and the IOMMU consequences).
> 
> This assumes that a single "full VA flush" cannot be abused as a DoS by
> a malicious guest, which should be investigated independently (as this
> applies to a PoD implementation as well).

Well the flush itself would need to be preemptible.  And it sounds like
you'd need to handle migration specially somehow too.  For one, you'd
need to make sure at least that all the cache on the current pcpu was
"cleaned" before running a vcpu anywhere else; and you'd also need to
make sure that any pcpu on which the vcpu had ever run had its entries
"invalidated" before the vcpu was run there again.

> Somewhat optional read for the background of how KVM optimized this ([1]):
> 
> KVM's solution to this problem works under the assumption that s/w
> operations with the caches (and MMU) on are not really meaningful, so we
> don't bother emulating them to the letter. 

Right -- so even on KVM, you're not actually following the ARM spec wrt
the S/W instructions: you're only handling the case that's fairly common
(i.e., flushing the cache with the MMU off).

Thanks,
 -George


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-10 15:22         ` Tim Deegan
@ 2017-12-11 19:50           ` Julien Grall
  0 siblings, 0 replies; 41+ messages in thread
From: Julien Grall @ 2017-12-11 19:50 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, George Dunlap,
	Marc Zyngier, Jan Beulich, Andre Przywara, xen-devel

Hi,

On 12/10/2017 03:22 PM, Tim Deegan wrote:
> At 14:38 +0000 on 08 Dec (1512743913), Julien Grall wrote:
>> On 08/12/17 08:03, Tim Deegan wrote:
>>> +1 for avoiding the full majesty of PoD if you don't need it.
>>>
>>> It should be possible to do something like the misconfigured-entry bit
>>> trick by _allocating_ the memory up-front and building the p2m entries
>>> but only making them usable by the {IO}MMUs on first access.  That
>>> would make these early p2m walks shorter (because they can skip whole
>>> subtrees that aren't marked present yet) without making major changes
>>> to domain build or introducing run-time failures.
>>
>> I am not aware of any way on Arm to misconfigure an entry. We do have
>> valid and access bits, although they will affect the IOMMU as well. So
>> it will not be possible to get page-table sharing with this "feature"
>> enabled.
> 
> How unfortunate.  How does KVM's demand-population scheme handle the IOMMU?

From what I have heard, when using an IOMMU all the memory is pinned.
They also don't share page tables.

But I am not a KVM expert, maybe Andre/Marc can confirm here?

Cheers,

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-11 10:06         ` Jan Beulich
  2017-12-11 11:11           ` Andrew Cooper
@ 2017-12-11 20:26           ` Julien Grall
  2017-12-12  7:52             ` Jan Beulich
  1 sibling, 1 reply; 41+ messages in thread
From: Julien Grall @ 2017-12-11 20:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	George Dunlap, Andrew Cooper, xen-devel

Hi Jan,

On 12/11/2017 10:06 AM, Jan Beulich wrote:
>>>> On 08.12.17 at 15:38, <julien.grall@linaro.org> wrote:
>> On 08/12/17 08:03, Tim Deegan wrote:
>>> It should be possible to do something like the misconfigured-entry bit
>>> trick by _allocating_ the memory up-front and building the p2m entries
>>> but only making them usable by the {IO}MMUs on first access.  That
>>> would make these early p2m walks shorter (because they can skip whole
>>> subtrees that aren't marked present yet) without making major changes
>>> to domain build or introducing run-time failures.
>>
>> I am not aware of any way on Arm to misconfigure an entry. We do have
>> valid and access bits, although they will affect the IOMMU as well. So
>> it will not be possible to get page-table sharing with this "feature"
>> enabled.
> 
> How would you intend to solve the IOMMU part of the problem with
> PoD? As was pointed out before - IOMMU and PoD are incompatible
> on x86.

I am not sure why you ask about PoD here when I acknowledged I will look
at a different solution. And again, misconfiguring an entry is not
possible on Arm.

But to answer your question, the IOMMU will be supported with neither PoD
nor the access/valid-bit solution. And that's fine: because S/W ops are
not easily virtualizable, I take that as a hint that "all the features
may not be available when using S/W in a guest".

Cheers,

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-11 11:10                         ` Andre Przywara
  2017-12-11 12:15                           ` George Dunlap
@ 2017-12-11 21:11                           ` Julien Grall
  1 sibling, 0 replies; 41+ messages in thread
From: Julien Grall @ 2017-12-11 21:11 UTC (permalink / raw)
  To: Andre Przywara, George Dunlap, Marc Zyngier, Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Stefano Stabellini, Tim Deegan, xen-devel

On 12/11/2017 11:10 AM, Andre Przywara wrote:
> Hi,

Hi Andre,

> But on the other hand we had PoD naturally already in KVM, so this came
> at no cost.
> So I believe it would be worth investigating what the actual impact is
> of booting a 32-bit kernel while emulating s/w ops like KVM does (see
> below), but cleaning the *whole VA space*. If this is somewhat
> acceptable (I assume we have no more than 2GB for a typical ARM32
> guest), it might be worth ignoring PoD, at least for now, to solve
> this problem (and the IOMMU consequences).

I am fairly surprised you think I came up with this solution without any
investigation. I actually clearly stated in my first e-mail that
Linux is not able to bring up a CPU with a flush of the "whole VA space".

At the moment, 32-bit Linux has a 1-second timeout to bring up a
secondary CPU. Within that second we need to do at least one full flush
(I think there is a second one). In the case of Xen Arm32, the domain
heap (where domain memory belongs) is not mapped in the hypervisor. So
you end up creating a mapping for every page table and the final memory.
To that, you add the cost of doing the cache maintenance. Then, you
finally add the potential cost of preemption (the vCPU might be
scheduled out).

During my initial investigation, I was not able to boot Dom0 with 512MB.
I tried to optimize the mapping path, but it didn't show much
improvement in general.

Regarding the IOMMU consequences, S/W ops are not easily virtualizable.
If you use them, that is the price to pay. It is better than not being
able to boot a current kernel, or randomly crashing.

Cheers,

-- 
Julien Grall


* Re: [RFC] xen/arm: Handling cache maintenance instructions by set/way
  2017-12-11 20:26           ` Julien Grall
@ 2017-12-12  7:52             ` Jan Beulich
  0 siblings, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2017-12-12  7:52 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, George Dunlap, Andre Przywara, Tim Deegan,
	George Dunlap, Andrew Cooper, xen-devel

>>> On 11.12.17 at 21:26, <julien.grall@linaro.org> wrote:
> On 12/11/2017 10:06 AM, Jan Beulich wrote:
>>>>> On 08.12.17 at 15:38, <julien.grall@linaro.org> wrote:
>>> On 08/12/17 08:03, Tim Deegan wrote:
>>>> It should be possible to do something like the misconfigured-entry bit
>>>> trick by _allocating_ the memory up-front and building the p2m entries
>>>> but only making them usable by the {IO}MMUs on first access.  That
>>>> would make these early p2m walks shorter (because they can skip whole
>>>> subtrees that aren't marked present yet) without making major changes
>>>> to domain build or introducing run-time failures.
>>>
>>> I am not aware of any way on Arm to misconfigure an entry. We do have
>>> valid and access bits, although they will affect the IOMMU as well. So
>>> it will not be possible to get page-table sharing with this "feature"
>>> enabled.
>> 
>> How would you intend to solve the IOMMU part of the problem with
>> PoD? As was pointed out before - IOMMU and PoD are incompatible
>> on x86.
> 
> I am not sure why you ask about PoD here when I acknowledge I will look 
> at a different solution. And again, misconfiguring an entry is not 
> possible on Arm.

I'm sorry if I've overlooked any such acknowledgment; it's certainly
not in context above.

Jan


