All of lore.kernel.org
 help / color / mirror / Atom feed
From: christoffer.dall@linaro.org (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache.
Date: Fri, 16 Aug 2013 10:42:04 -0700	[thread overview]
Message-ID: <20130816174204.GD20246@cbox> (raw)
In-Reply-To: <20130816172806.GC20246@cbox>

On Fri, Aug 16, 2013 at 10:28:06AM -0700, Christoffer Dall wrote:
> On Fri, Aug 16, 2013 at 10:48:39PM +0530, Anup Patel wrote:
> > On 16 August 2013 22:44, Christoffer Dall <christoffer.dall@linaro.org>wrote:
> > 
> > > On Fri, Aug 16, 2013 at 10:32:30AM +0530, Anup Patel wrote:
> > > > On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
> > > > <christoffer.dall@linaro.org> wrote:
> > > > > On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
> > > > >> On 2013-08-15 16:13, Anup Patel wrote:
> > > > >> > Hi Marc,
> > > > >> >
> > > > >> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com
> > > >
> > > > >> > wrote:
> > > > >> >> On 2013-08-15 14:31, Anup Patel wrote:
> > > > >> >>>
> > > > >> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier
> > > > >> >>> <marc.zyngier@arm.com>
> > > > >> >>> wrote:
> > > > >> >>>>
> > > > >> >>>> On 2013-08-15 07:26, Anup Patel wrote:
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> Hi Marc,
> > > > >> >>>>>
> > > > >> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier
> > > > >> >>>>> <marc.zyngier@arm.com>
> > > > >> >>>>> wrote:
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>> Hi Anup,
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier
> > > > >> >>>>>>> <marc.zyngier@arm.com>
> > > > >> >>>>>>> wrote:
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> Hi Pranav,
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> Systems with large external L3-cache (few MBs), might have
> > > > >> >>>>>>>>> dirty
> > > > >> >>>>>>>>> content belonging to the guest page in L3-cache. To tackle
> > > > >> >>>>>>>>> this,
> > > > >> >>>>>>>>> we need to flush such dirty content from d-cache so that
> > > > >> >>>>>>>>> guest
> > > > >> >>>>>>>>> will see correct contents of guest page when guest MMU is
> > > > >> >>>>>>>>> disabled.
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> The patch fixes coherent_icache_guest_page() for external
> > > > >> >>>>>>>>> L3-cache.
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar
> > > > >> >>>>>>>>> <pranavkumar@linaro.org>
> > > > >> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
> > > > >> >>>>>>>>> ---
> > > > >> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
> > > > >> >>>>>>>>>  1 file changed, 2 insertions(+)
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
> > > > >> >>>>>>>>> b/arch/arm64/include/asm/kvm_mmu.h
> > > > >> >>>>>>>>> index efe609c..5129038 100644
> > > > >> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
> > > > >> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
> > > > >> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void
> > > > >> >>>>>>>>> coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
> > > > >> >>>>>>>>>       if (!icache_is_aliasing()) {            /* PIPT */
> > > > >> >>>>>>>>>               unsigned long hva = gfn_to_hva(kvm, gfn);
> > > > >> >>>>>>>>>               flush_icache_range(hva, hva + PAGE_SIZE);
> > > > >> >>>>>>>>> +             /* Flush d-cache for systems with external
> > > > >> >>>>>>>>> caches. */
> > > > >> >>>>>>>>> +             __flush_dcache_area((void *) hva, PAGE_SIZE);
> > > > >> >>>>>>>>>       } else if (!icache_is_aivivt()) {       /* non
> > > > >> >>>>>>>>> ASID-tagged
> > > > >> >>>>>>>>> VIVT
> > > > >> >>>>>>>>> */
> > > > >> >>>>>>>>>               /* any kind of VIPT cache */
> > > > >> >>>>>>>>>               __flush_icache_all();
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> [adding Will to the discussion as we talked about this in the
> > > > >> >>>>>>>> past]
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to
> > > hit
> > > > >> >>>>>>>> the
> > > > >> >>>>>>>> data
> > > > >> >>>>>>>> cache on each page mapping. It looks overkill.
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
> > > > >> >>>>>>>> kvmtools
> > > > >> >>>>>>>> knows which bits of the guest memory have been touched, and
> > > > >> >>>>>>>> can do a
> > > > >> >>>>>>>> "DC
> > > > >> >>>>>>>> DVAC" on this region.
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>> It seems a bit unnatural to have cache cleaning is user-space.
> > > > >> >>>>>>> I am
> > > > >> >>>>>>> sure
> > > > >> >>>>>>> other architectures don't have such cache cleaning in
> > > > >> >>>>>>> user-space for
> > > > >> >>>>>>> KVM.
> > > > >> >>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> The alternative is do it in the kernel before running any
> > > vcpu
> > > > >> >>>>>>>> - but
> > > > >> >>>>>>>> that's not very nice either (have to clean the whole of the
> > > > >> >>>>>>>> guest
> > > > >> >>>>>>>> memory, which makes a full dcache clean more appealing).
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>> Actually, cleaning full d-cache by set/way upon first run of
> > > > >> >>>>>>> VCPU was
> > > > >> >>>>>>> our second option but current approach seemed very simple
> > > hence
> > > > >> >>>>>>> we went for this.
> > > > >> >>>>>>>
> > > > >> >>>>>>> If more people vote for full d-cache clean upon first run of
> > > > >> >>>>>>> VCPU then
> > > > >> >>>>>>> we should revise this patch.
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>> Can you please give the attached patch a spin on your HW? I've
> > > > >> >>>>>> boot-tested
> > > > >> >>>>>> it on a model, but of course I can't really verify that it
> > > fixes
> > > > >> >>>>>> your
> > > > >> >>>>>> issue.
> > > > >> >>>>>>
> > > > >> >>>>>> As far as I can see, it should fix it without any additional
> > > > >> >>>>>> flushing.
> > > > >> >>>>>>
> > > > >> >>>>>> Please let me know how it goes.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> HCR_EL2.DC=1 means all memory access for Stage1 MMU off are
> > > > >> >>>>> treated as "Normal Non-shareable, Inner Write-Back
> > > > >> >>>>> Write-Allocate,
> > > > >> >>>>> Outer Write-Back Write-Allocate memory"
> > > > >> >>>>>
> > > > >> >>>>> HCR_EL2.DC=0 means all memory access for Stage1 MMU off are
> > > > >> >>>>> treated as "Strongly-ordered device memory"
> > > > >> >>>>>
> > > > >> >>>>> Now if Guest/VM access hardware MMIO devices directly (such as
> > > > >> >>>>> VGIC CPU interface) with MMU off then MMIO devices will be
> > > > >> >>>>> accessed as normal memory when HCR_EL2.DC=1.
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>> I don't think so. Stage-2 still applies, and should force MMIO to
> > > > >> >>>> be
> > > > >> >>>> accessed as device memory.
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>>> The HCR_EL2.DC=1 makes sense only if we have all software
> > > > >> >>>>> emulated devices for Guest/VM which is not true for KVM ARM or
> > > > >> >>>>> KVM ARM64 because we use VGIC.
> > > > >> >>>>>
> > > > >> >>>>> IMHO, this patch enforces incorrect memory attribute for
> > > Guest/VM
> > > > >> >>>>> when Stage1 MMU is off.
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>> See above. My understanding is that HCR.DC controls the default
> > > > >> >>>> output of
> > > > >> >>>> Stage-1, and Stage-2 overrides still apply.
> > > > >> >>>
> > > > >> >>>
> > > > >> >>> You had mentioned that PAGE_S2_DEVICE attribute was redundant
> > > > >> >>> and wanted guest to decide the memory attribute. In other words,
> > > > >> >>> you
> > > > >> >>> did not want to enforce any memory attribute in Stage2.
> > > > >> >>>
> > > > >> >>> Please refer to this patch
> > > > >> >>> https://patchwork.kernel.org/patch/2543201/
> > > > >> >>
> > > > >> >>
> > > > >> >> This patch has never been merged. If you carry on following the
> > > > >> >> discussion,
> > > > >> >> you will certainly notice it was dropped for a very good reason:
> > > > >> >>
> > > https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
> > > > >> >>
> > > > >> >> So Stage-2 memory attributes are used, they are not going away, and
> > > > >> >> they are
> > > > >> >> essential to the patch I sent this morning.
> > > > >> >>
> > > > >> >>
> > > > >> >>         M.
> > > > >> >> --
> > > > >> >> Fast, cheap, reliable. Pick two.
> > > > >> >
> > > > >> > HCR_EL2.DC=1 will break Guest/VM bootloader/firmware if the Guest/VM
> > > > >> > is
> > > > >> > provided a DMA-capable device in pass-through mode. The reason being
> > > > >> > bootloader/firmware typically don't enable MMU and such
> > > > >> > bootloader/firmware
> > > > >> > will programme a pass-through DMA-capable device without any flushes
> > > > >> > to
> > > > >> > guest RAM (because it has not enabled MMU).
> > > > >> >
> > > > >> > A good example of such a device would be SATA AHCI controller given
> > > > >> > to a
> > > > >> > Guest/VM as direct access (using SystemMMU) and Guest
> > > > >> > bootloader/firmware
> > > > >> > accessing this SATA AHCI controller to load kernel images from SATA
> > > > >> > disk.
> > > > >> > In this situation, we will have to hack Guest bootloader/firmware
> > > > >> > AHCI driver to
> > > > >> > explicitly flushes to Guest RAM (because have HCR_EL2.DC=1).
> > > > >>
> > > > >> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out
> > > of
> > > > >> curiosity: is that a made up example or something you actually have?
> > > > >>
> > > > >> Back to square one:
> > > > >> Can you please benchmark the various cache cleaning options (global at
> > > > >> startup time, per-page on S2 translation fault, and user-space)?
> > > > >>
> > > > > Eh, why is this a more valid argument than the vgic?  The device
> > > > > passthrough Stage-2 mappings would still have the Stage-2 memory
> > > > > attributes to configure that memory region as device memory.  Why is it
> > > > > relevant if the device is DMA-capable in this context?
> > > > >
> > > >
> > > > Most ARM bootloader/firmware run with MMU off hence, they will not do
> > > > explicit cache flushes when programming DMA-capable device. Now If
> > > > HCR_EL2.DC=1 then Guest RAM updates done by bootloader/firmware
> > > > will not be visible to DMA-capable device.
> > > >
> > > Read my question again: The bootloaders running with the MMU off in a VM
> > > will only disable the MMU for Stage-1 translations.  Stage-2
> > > translations are still enforced using hte Stage-2 page tables and their
> > > attributes for all mappings to devices will still enforce
> > > strongly-ordered device type memory.
> > >
> > 
> > Please read my reply again. Also try to read-up SATA AHCI spec.
> > 
> >
> 
> Can you please explain how the specs realate to my question?
> 
> The only case I can see is if the guest puts data in a DMA region that
> it expects the device to read.  Is this the case?
> 
> Just so we're clear, this is quite different from programming the device
> through the device memory.
> 

ok, I see from the specs.  It's not just DMA-capable, but actually
DMA-programmable.  In that case setting DC=1 would be problematic.

Thanks for being so helpful in explaining this ;)

-Christoffer

  reply	other threads:[~2013-08-16 17:42 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-14 11:47 [PATCH] ARM64: KVM: Fix coherent_icache_guest_page() for host with external L3-cache Pranavkumar Sawargaonkar
2013-08-14 12:04 ` Marc Zyngier
2013-08-14 14:22   ` Anup Patel
2013-08-14 15:06     ` Alexander Graf
2013-08-14 15:34       ` Marc Zyngier
2013-08-14 15:41         ` Peter Maydell
2013-08-14 15:57           ` Marc Zyngier
2013-08-14 16:36             ` Anup Patel
2013-08-14 15:23     ` Marc Zyngier
2013-08-14 15:35       ` Peter Maydell
2013-08-14 15:49         ` Marc Zyngier
2013-08-14 17:34           ` Christoffer Dall
2013-08-15  4:44             ` Marc Zyngier
2013-08-15 16:58               ` Christoffer Dall
2013-08-14 15:36       ` Anup Patel
2013-08-15  4:52     ` Marc Zyngier
2013-08-15  6:26       ` Anup Patel
2013-08-15  8:31         ` Marc Zyngier
2013-08-15 13:31           ` Anup Patel
2013-08-15 14:47             ` Marc Zyngier
2013-08-15 15:13               ` Anup Patel
2013-08-15 15:37                 ` Marc Zyngier
2013-08-15 15:45                   ` Anup Patel
2013-08-15 16:53                   ` Christoffer Dall
2013-08-16  5:02                     ` Anup Patel
2013-08-16  6:57                       ` Anup Patel
2013-08-16 17:19                         ` Christoffer Dall
2013-08-16 17:42                           ` Anup Patel
2013-08-16 17:50                             ` Christoffer Dall
2013-08-16 18:06                               ` Christoffer Dall
2013-08-16 18:20                                 ` Anup Patel
2013-08-16 18:11                               ` Anup Patel
2013-08-16 18:20                                 ` Christoffer Dall
2013-08-30  9:52                                 ` Catalin Marinas
2013-08-30 10:44                                   ` Anup Patel
2013-08-30 13:02                                     ` Catalin Marinas
2013-08-30 13:21                                     ` Marc Zyngier
2013-08-30 14:04                                       ` Catalin Marinas
2013-08-30 14:22                                         ` Marc Zyngier
2013-08-30 14:30                                           ` Will Deacon
2013-08-30 14:52                                             ` Anup Patel
2013-08-30 15:12                                             ` Marc Zyngier
2013-08-29 10:52                         ` Catalin Marinas
2013-08-29 12:31                           ` Anup Patel
2013-08-29 12:53                             ` Catalin Marinas
2013-08-29 16:02                               ` Anup Patel
2013-08-30  9:44                                 ` Catalin Marinas
2013-08-30 10:36                                   ` Anup Patel
2013-08-30 12:52                                     ` Catalin Marinas
2013-08-16 17:14                       ` Christoffer Dall
     [not found]                         ` <CALrVBkvEP1Q0mKpv8ViOTLRvW2ks18MQXgmurSBHn+aJcz+=gw@mail.gmail.com>
2013-08-16 17:28                           ` Christoffer Dall
2013-08-16 17:42                             ` Christoffer Dall [this message]
2013-08-15  8:39       ` Pranavkumar Sawargaonkar
2013-08-15 15:42         ` Marc Zyngier
2013-08-14 14:23 ` Sudeep KarkadaNagesha
2013-08-14 14:35   ` Anup Patel
2013-08-14 17:37 ` Christoffer Dall
  -- strict thread matches above, loose matches on Subject: below --
2013-08-14 11:45 Pranavkumar Sawargaonkar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130816174204.GD20246@cbox \
    --to=christoffer.dall@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.