* [PATCH] KVM: arm64: Handle CMOs on Read Only memslots
From: Marc Zyngier @ 2021-02-11 14:27 UTC
  To: kvmarm, linux-arm-kernel, kvm
  Cc: kernel-team, Suzuki K Poulose, Will Deacon, Jianyong Wu, James Morse,
      Alexandru Elisei, Julien Thierry

It appears that when a guest traps into KVM because it is
performing a CMO on a Read Only memslot, our handling of
this operation is "slightly suboptimal", as we treat it as
an MMIO access without a valid syndrome.

The chances that userspace is adequately equipped to deal
with such an exception being slim, it would be better to
handle it in the kernel.

What we need to provide is roughly as follows:

(a) if a CMO hits writeable memory, handle it as a normal memory access
(b) if a CMO hits non-memory, skip it
(c) if a CMO hits R/O memory, that's where things become fun:
  (1) if the CMO is DC IVAC, the architecture says this should result
      in a permission fault
  (2) if the CMO is DC CIVAC, it should work similarly to (a)

We already perform (a) and (b) correctly, but (c) is a total mess.
Hence we need to distinguish between IVAC (c.1) and CIVAC (c.2).

One way to do it is to treat CMOs generating a translation fault as
a *read*, even when they are on a RW memslot. This allows us to
further triage things:

If they come back with a permission fault, that is because this is
a DC IVAC instruction:
- inside a RW memslot: no problem, treat it as a write (a)(c.2)
- inside a RO memslot: inject a data abort in the guest (c.1)

The only drawback is that DC IVAC on a yet unmapped page faults
twice: once for the initial translation fault that results in a RO
mapping, and once for the permission fault. I think we can live with
that.

Reported-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---

Notes:
    I have taken the option to inject an abort in the guest when
    it issues a DC IVAC on a R/O memslot, but another option would
    be to just perform the invalidation ourselves as a DC CIVAC.

    This would have the advantage of being consistent with what we
    do for emulated MMIO.

 arch/arm64/kvm/mmu.c | 53 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 41 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 7d2257cc5438..c7f4388bea45 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -760,7 +760,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct kvm_pgtable *pgt;
 
 	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
-	write_fault = kvm_is_write_fault(vcpu);
+	/*
+	 * Treat translation faults on CMOs as read faults. Should
+	 * this further generate a permission fault on a R/O memslot,
+	 * it will be caught in kvm_handle_guest_abort(), with
+	 * prejudice. Permission faults on non-R/O memslot will be
+	 * gracefully handled as writes.
+	 */
+	if (fault_status == FSC_FAULT && kvm_vcpu_dabt_is_cm(vcpu))
+		write_fault = false;
+	else
+		write_fault = kvm_is_write_fault(vcpu);
 	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
 	VM_BUG_ON(write_fault && exec_fault);
 
@@ -1013,19 +1023,37 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	}
 
 	/*
-	 * Check for a cache maintenance operation. Since we
-	 * ended-up here, we know it is outside of any memory
-	 * slot. But we can't find out if that is for a device,
-	 * or if the guest is just being stupid. The only thing
-	 * we know for sure is that this range cannot be cached.
+	 * Check for a cache maintenance operation. Three cases:
+	 *
+	 * - It is outside of any memory slot. But we can't find out
+	 *   if that is for a device, or if the guest is just being
+	 *   stupid. The only thing we know for sure is that this
+	 *   range cannot be cached. So let's assume that the guest
+	 *   is just being cautious, and skip the instruction.
+	 *
+	 * - Otherwise, check whether this is a permission fault.
+	 *   If so, that's a DC IVAC on a R/O memslot, which is a
+	 *   pretty bad idea, and we tell the guest so.
 	 *
-	 * So let's assume that the guest is just being
-	 * cautious, and skip the instruction.
+	 * - If this wasn't a permission fault, pass it along for
+	 *   further handling (including faulting the page in if it
+	 *   was a translation fault).
 	 */
-	if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu)) {
-		kvm_incr_pc(vcpu);
-		ret = 1;
-		goto out_unlock;
+	if (kvm_vcpu_dabt_is_cm(vcpu)) {
+		if (kvm_is_error_hva(hva)) {
+			kvm_incr_pc(vcpu);
+			ret = 1;
+			goto out_unlock;
+		}
+
+		if (fault_status == FSC_PERM) {
+			/* DC IVAC on a R/O memslot */
+			kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
+			ret = 1;
+			goto out_unlock;
+		}
+
+		goto handle_access;
 	}
 
 	/*
@@ -1039,6 +1067,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 	}
 
+handle_access:
 	/* Userspace should not be able to register out-of-bounds IPAs */
 	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
 
-- 
2.30.0

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
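The triage described in the commit message can be modelled outside the kernel to make the fault sequence concrete. What follows is a minimal, hypothetical sketch, not kernel code. It takes the commit message's premise that, on a read-only stage 2 mapping, DC IVAC takes a permission fault while DC CIVAC does not. It also assumes, as the commit message describes, that a translation fault on a CMO is always resolved with a read-only mapping. Every `model_` name is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the CMO triage in the patch.
 * Premises (from the commit message, not derived from the kernel):
 *  - a translation fault on a CMO is resolved with a read-only S2 mapping;
 *  - on a read-only S2 mapping, DC IVAC faults and DC CIVAC does not.
 */
enum model_cmo { MODEL_DC_IVAC, MODEL_DC_CIVAC };
enum model_s2 { MODEL_S2_UNMAPPED, MODEL_S2_RO, MODEL_S2_RW };
enum model_result { MODEL_RETIRED, MODEL_SKIPPED, MODEL_DABT_INJECTED };

struct model_vcpu {
	bool in_memslot;     /* false: the IPA is outside any memslot */
	bool slot_readonly;  /* true: the memslot is read-only */
	enum model_s2 s2;    /* current stage 2 mapping of the page */
	int traps;           /* exits taken to the (modelled) hypervisor */
};

/* Does the current stage 2 mapping let this CMO complete in the guest? */
static bool model_s2_permits(enum model_s2 s2, enum model_cmo op)
{
	if (s2 == MODEL_S2_UNMAPPED)
		return false;
	return op == MODEL_DC_CIVAC || s2 == MODEL_S2_RW;
}

/* Execute one CMO, re-running it after every trap the model resolves. */
static enum model_result model_run_cmo(struct model_vcpu *v, enum model_cmo op)
{
	for (;;) {
		if (v->in_memslot && model_s2_permits(v->s2, op))
			return MODEL_RETIRED;          /* completes in the guest */

		v->traps++;

		if (!v->in_memslot)
			return MODEL_SKIPPED;          /* case (b): skip it */

		if (v->s2 == MODEL_S2_UNMAPPED) {
			/* Translation fault on a CMO: handled as a read. */
			v->s2 = MODEL_S2_RO;
			continue;
		}

		/* Permission fault: under the premises, only DC IVAC. */
		assert(op == MODEL_DC_IVAC);
		if (v->slot_readonly)
			return MODEL_DABT_INJECTED;    /* case (c.1) */
		v->s2 = MODEL_S2_RW;                   /* RW slot: a write */
	}
}
```

In this model, a DC IVAC on an unmapped page in a RW memslot traps twice before it retires: first on the translation fault, then on the permission fault. That is the drawback the commit message accepts. On a read-only memslot the second trap ends in an injected data abort, whereas a DC CIVAC there completes after a single trap.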
* Re: [PATCH] KVM: arm64: Handle CMOs on Read Only memslots
From: Alexandru Elisei @ 2021-02-12 17:12 UTC
  To: Marc Zyngier, kvmarm, linux-arm-kernel, kvm
  Cc: Suzuki K Poulose, kernel-team, Jianyong Wu, James Morse, Will Deacon,
      Julien Thierry

Hi Marc,

I've been trying to get my head around what the architecture says about CMOs, so
please bear with me if I misunderstood some things.

On 2/11/21 2:27 PM, Marc Zyngier wrote:
> It appears that when a guest traps into KVM because it is
> performing a CMO on a Read Only memslot, our handling of
> this operation is "slightly suboptimal", as we treat it as
> an MMIO access without a valid syndrome.
>
> The chances that userspace is adequately equipped to deal
> with such an exception being slim, it would be better to
> handle it in the kernel.
>
> What we need to provide is roughly as follows:
>
> (a) if a CMO hits writeable memory, handle it as a normal memory access
> (b) if a CMO hits non-memory, skip it
> (c) if a CMO hits R/O memory, that's where things become fun:
>   (1) if the CMO is DC IVAC, the architecture says this should result
>       in a permission fault
>   (2) if the CMO is DC CIVAC, it should work similarly to (a)

When you say it should work similarly to (a), you mean it should be handled as a
normal memory access, without the "CMO hits writeable memory" part, right?

>
> We already perform (a) and (b) correctly, but (c) is a total mess.
> Hence we need to distinguish between IVAC (c.1) and CIVAC (c.2).
>
> One way to do it is to treat CMOs generating a translation fault as
> a *read*, even when they are on a RW memslot. This allows us to
> further triage things:
>
> If they come back with a permission fault, that is because this is
> a DC IVAC instruction:
> - inside a RW memslot: no problem, treat it as a write (a)(c.2)
> - inside a RO memslot: inject a data abort in the guest (c.1)
>
> The only drawback is that DC IVAC on a yet unmapped page faults
> twice: once for the initial translation fault that results in a RO
> mapping, and once for the permission fault. I think we can live with
> that.

I'm trying to make sure I understand what the problem is.

gfn_to_pfn_prot() returns KVM_HVA_ERR_RO_BAD if the write is to a RO memslot.
KVM_HVA_ERR_RO_BAD is PAGE_OFFSET + PAGE_SIZE, which means that
is_error_noslot_pfn() returns true. In that case we exit to userspace with -EFAULT
for DC IVAC and DC CIVAC. But what we should be doing is this:

- For DC IVAC, inject a dabt with ISS = 0x10, meaning an external abort (that's
what kvm_inject_dabt() does).

- For DC CIVAC, exit to userspace with -EFAULT.

Did I get that right?

Thanks,

Alex
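Alex's point that KVM_HVA_ERR_RO_BAD sits at PAGE_OFFSET + PAGE_SIZE can be illustrated with the generic error-HVA range check. The sketch below assumes the generic definitions in include/linux/kvm_host.h (around v5.11), where KVM_HVA_ERR_BAD is PAGE_OFFSET, KVM_HVA_ERR_RO_BAD is PAGE_OFFSET + PAGE_SIZE, and kvm_is_error_hva() tests `addr >= PAGE_OFFSET`. It re-creates that check in user space. The PAGE_SIZE and PAGE_OFFSET values are illustrative assumptions (4KiB pages, 48-bit arm64 kernel VA range), and the `MODEL_`/`model_` names are hypothetical. The sketch only covers the HVA sentinels, not the pfn error encoding behind is_error_noslot_pfn().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only: 4KiB pages, 48-bit arm64 kernel VA range. */
#define MODEL_PAGE_SIZE		0x1000ULL
#define MODEL_PAGE_OFFSET	0xffff000000000000ULL

/* Assumed to mirror the generic KVM_HVA_ERR_BAD / KVM_HVA_ERR_RO_BAD. */
#define MODEL_HVA_ERR_BAD	(MODEL_PAGE_OFFSET)
#define MODEL_HVA_ERR_RO_BAD	(MODEL_PAGE_OFFSET + MODEL_PAGE_SIZE)

/* The read-only sentinel sits exactly one page above the generic one. */
static_assert(MODEL_HVA_ERR_RO_BAD - MODEL_HVA_ERR_BAD == MODEL_PAGE_SIZE,
	      "RO_BAD is one page above BAD");

/* Any value in the kernel half of the address space is an error HVA. */
static bool model_is_error_hva(uint64_t hva)
{
	return hva >= MODEL_PAGE_OFFSET;
}
```

Both sentinels land at or above PAGE_OFFSET, so a single range check classifies them as errors without comparing against each value individually. An ordinary userspace HVA stays below the threshold.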
* Re: [PATCH] KVM: arm64: Handle CMOs on Read Only memslots
From: Marc Zyngier @ 2021-02-12 18:18 UTC
  To: Alexandru Elisei
  Cc: kvm, Suzuki K Poulose, kernel-team, Jianyong Wu, James Morse,
      linux-arm-kernel, Will Deacon, kvmarm, Julien Thierry

Hi Alex,

On 2021-02-12 17:12, Alexandru Elisei wrote:
> Hi Marc,
>
> I've been trying to get my head around what the architecture says about
> CMOs, so please bear with me if I misunderstood some things.

No worries. I've had this patch for a few weeks now, and can't
make up my mind about it. It does address an actual issue though,
so I couldn't just discard it... ;-)

> On 2/11/21 2:27 PM, Marc Zyngier wrote:
>> [...]
>> (a) if a CMO hits writeable memory, handle it as a normal memory access
>> (b) if a CMO hits non-memory, skip it
>> (c) if a CMO hits R/O memory, that's where things become fun:
>>   (1) if the CMO is DC IVAC, the architecture says this should result
>>       in a permission fault
>>   (2) if the CMO is DC CIVAC, it should work similarly to (a)
>
> When you say it should work similarly to (a), you mean it should be
> handled as a normal memory access, without the "CMO hits writeable
> memory" part, right?

What I mean is that the cache invalidation should take place,
preferably without involving KVM at all (other than populating
S2 if required).

>> [...]
>
> I'm trying to make sure I understand what the problem is.
>
> gfn_to_pfn_prot() returns KVM_HVA_ERR_RO_BAD if the write is to a RO
> memslot. KVM_HVA_ERR_RO_BAD is PAGE_OFFSET + PAGE_SIZE, which means that
> is_error_noslot_pfn() returns true. In that case we exit to userspace
> with -EFAULT for DC IVAC and DC CIVAC. But what we should be doing is
> this:
>
> - For DC IVAC, inject a dabt with ISS = 0x10, meaning an external abort
> (that's what kvm_inject_dabt() does).
>
> - For DC CIVAC, exit to userspace with -EFAULT.
>
> Did I get that right?

Not quite. What I *think* we should do is:

- DC CIVAC should just work, without going to userspace. I can't imagine
  a reason why we'd involve userspace for this, and we currently don't
  really have a good way to describe this to userspace.

- DC IVAC is more nuanced: we could either inject an exception (which
  is what this patch does), or perform the CMO ourselves as a DC CIVAC
  (consistent with the IVA->CIVA upgrade caused by having a S2 translation).
  This second approach is comparable to what we do when the guest
  issues a CMO on an emulated MMIO address (we don't inject a fault).

Thanks,

        M.
-- 
Jazz is not dead. It just smells funny...
* Re: [PATCH] KVM: arm64: Handle CMOs on Read Only memslots
From: Alexandru Elisei @ 2021-02-16 12:19 UTC
  To: Marc Zyngier
  Cc: kvm, Suzuki K Poulose, kernel-team, Jianyong Wu, James Morse,
      linux-arm-kernel, Will Deacon, kvmarm, Julien Thierry

Hi Marc,

Thank you for the explanations!

On 2/12/21 6:18 PM, Marc Zyngier wrote:
> Hi Alex,
>
> On 2021-02-12 17:12, Alexandru Elisei wrote:
>> Hi Marc,
>>
>> I've been trying to get my head around what the architecture says about CMOs, so
>> please bear with me if I misunderstood some things.
>
> No worries. I've had this patch for a few weeks now, and can't
> make up my mind about it. It does address an actual issue though,
> so I couldn't just discard it... ;-)
>
>> On 2/11/21 2:27 PM, Marc Zyngier wrote:
>>> [...]
>>> (a) if a CMO hits writeable memory, handle it as a normal memory access
>>> (b) if a CMO hits non-memory, skip it
>>> (c) if a CMO hits R/O memory, that's where things become fun:
>>>   (1) if the CMO is DC IVAC, the architecture says this should result
>>>       in a permission fault

For KVM to get a stage 2 fault, the IPA must already be mapped as writable in the
guest's stage 1 tables. If I read that right and you are suggesting that the guest
should get a permission fault, I don't think that's correct from the guest's
viewpoint.

>>>   (2) if the CMO is DC CIVAC, it should work similarly to (a)
>>
>> When you say it should work similarly to (a), you mean it should be handled as a
>> normal memory access, without the "CMO hits writeable memory" part, right?
>
> What I mean is that the cache invalidation should take place,
> preferably without involving KVM at all (other than populating
> S2 if required).
>
>> [...]
>>
>> Did I get that right?
>
> Not quite. What I *think* we should do is:
>
> - DC CIVAC should just work, without going to userspace. I can't imagine
>   a reason why we'd involve userspace for this, and we currently don't
>   really have a good way to describe this to userspace.
>
> - DC IVAC is more nuanced: we could either inject an exception (which
>   is what this patch does), or perform the CMO ourselves as a DC CIVAC
>   (consistent with the IVA->CIVA upgrade caused by having a S2 translation).
>   This second approach is comparable to what we do when the guest
>   issues a CMO on an emulated MMIO address (we don't inject a fault).

Here are my thoughts about this.

There is nothing that userspace can do regarding the CMOs, so I agree that we
should handle this in the kernel.

If there is no memslot associated with the faulting IPA, then I don't think we
can do the CMO because there is no PA associated with the IPA.

Assuming the memslot associated with the faulting IPA is read-only:

Writes coming from the guest are emulated, so whatever the guest writes will never
be in a dirty cache line. Cleaning that address would match what the
KVM_MEM_READONLY API guarantees: "[..] In this case, writes to this memory will be
posted to userspace as KVM_EXIT_MMIO exits." No dirty cache line (from the guest's
point of view), nothing written to memory.

The cache line might be dirty for two reasons:

- This is the first time the guest accesses that memory location. No need to do
  anything (neither cleaning, nor mapping at stage 2), because the subsequent read
  from the guest will map it at stage 2, and that will trigger the dcache cleaning
  in user_mem_abort().

- Userspace wrote to the physical address as part of device emulation. It is
  entirely reasonable for host userspace to assume that the RO memslot is mapped
  as device memory by the guest, which means that the guest reads from main
  memory, while host userspace writes to cache (assuming no FWB). In this case, I
  think it's the host userspace's duty to do the dcache cleaning.

Because of the two reasons above, I think cleaning the dcache will have no effect
from a correctness perspective.

As for invalidating the cache line, besides the two scenarios above, a clean cache
line could have been allocated by a read, done either by the guest (if it mapped
the IPA as Normal cacheable) or by the host (CPU speculating loads or
userspace/kernel reading from the address). I think invalidating, just like
cleaning, would have no effect on the correctness of the emulation.

My opinion is that we should simply skip CMOs on read-only memslots. What do you
think?

Thanks,

Alex
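At this point the thread has put three candidate behaviours on the table for a CMO that faults on a read-only memslot:

- inject a data abort on the permission fault (the patch as posted);
- perform the maintenance as a DC CIVAC on the guest's behalf (Marc's alternative);
- skip the CMO altogether (Alex's proposal).

The sketch below is a hypothetical decision table contrasting the three. The enum and function names are invented, and the sketch does not model how the hypervisor would actually perform the clean and invalidate. Following the patch, it does not decode which CMO was issued. It infers DC IVAC from the fact that a permission fault occurred.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; a sketch of the options discussed, not kernel code. */
enum ro_cmo_policy {
	POLICY_INJECT_DABT,      /* the patch as posted */
	POLICY_UPGRADE_CIVAC,    /* do the CMO as DC CIVAC, then skip it */
	POLICY_SKIP,             /* skip CMOs on read-only memslots */
};

enum cmo_action {
	ACTION_MAP_AND_RETRY,    /* populate stage 2 read-only, re-run the CMO */
	ACTION_INJECT_DABT,      /* inject a data abort into the guest */
	ACTION_CLEAN_INVAL_SKIP, /* hypervisor cleans+invalidates, then skips */
	ACTION_SKIP,             /* just advance the guest PC */
};

/*
 * Decide what to do with a CMO that trapped on a read-only memslot.
 * perm_fault: true for a stage 2 permission fault, false for a
 * translation fault. As in the patch, a permission fault is taken to
 * mean DC IVAC, since DC CIVAC completes on a read-only mapping.
 */
static enum cmo_action ro_cmo_action(enum ro_cmo_policy policy, bool perm_fault)
{
	assert(policy == POLICY_INJECT_DABT || policy == POLICY_UPGRADE_CIVAC ||
	       policy == POLICY_SKIP);

	/* Skipping needs neither a stage 2 mapping nor any maintenance. */
	if (policy == POLICY_SKIP)
		return ACTION_SKIP;

	/* Translation fault: fault the page in and let the guest retry. */
	if (!perm_fault)
		return ACTION_MAP_AND_RETRY;

	return policy == POLICY_INJECT_DABT ? ACTION_INJECT_DABT
					    : ACTION_CLEAN_INVAL_SKIP;
}
```

The first two policies differ only on the permission-fault path, which is where the posted patch and Marc's alternative diverge. The skip policy never populates stage 2 for the CMO, which matches Alex's argument that no mapping or maintenance is needed for correctness.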
* Re: [PATCH] KVM: arm64: Handle CMOs on Read Only memslots
From: Alexandru Elisei @ 2021-02-16 12:18 UTC
  To: Marc Zyngier, kvmarm, linux-arm-kernel, kvm
  Cc: Suzuki K Poulose, kernel-team, Jianyong Wu, James Morse, Will Deacon,
      Julien Thierry

Hi Marc,

Played with this for a bit to try to understand the problem better, wrote a simple
MMIO device in kvmtool which maps the memory as a read-only memslot [1] and poked
it with kvm-unit-tests [2].

[1] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/mmiodev-wip1

[2] https://gitlab.arm.com/linux-arm/kvm-unit-tests-ae/-/tree/mmiodev-wip1

On 2/11/21 2:27 PM, Marc Zyngier wrote:
> [...]
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 7d2257cc5438..c7f4388bea45 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -760,7 +760,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	struct kvm_pgtable *pgt;
>  
>  	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
> -	write_fault = kvm_is_write_fault(vcpu);
> +	/*
> +	 * Treat translation faults on CMOs as read faults. Should
> +	 * this further generate a permission fault on a R/O memslot,
> +	 * it will be caught in kvm_handle_guest_abort(), with
> +	 * prejudice. Permission faults on non-R/O memslot will be
> +	 * gracefully handled as writes.
> +	 */
> +	if (fault_status == FSC_FAULT && kvm_vcpu_dabt_is_cm(vcpu))
> +		write_fault = false;

This means that every DC CIVAC will map the IPA with read permissions in the
stage 2 tables, regardless of the IPA being already mapped. It's harmless, but a
bit unexpected.

> +	else
> +		write_fault = kvm_is_write_fault(vcpu);
>  	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
>  	VM_BUG_ON(write_fault && exec_fault);
>  
> @@ -1013,19 +1023,37 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> [...]
> -	if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu)) {
> -		kvm_incr_pc(vcpu);
> -		ret = 1;
> -		goto out_unlock;
> +	if (kvm_vcpu_dabt_is_cm(vcpu)) {
> +		if (kvm_is_error_hva(hva)) {
> +			kvm_incr_pc(vcpu);
> +			ret = 1;
> +			goto out_unlock;
> +		}
> +
> +		if (fault_status == FSC_PERM) {
> +			/* DC IVAC on a R/O memslot */
> +			kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> +			ret = 1;
> +			goto out_unlock;
> +		}

I don't like the inconsistency. We go from exiting to userspace for both DC
IVAC/DC CIVAC to mapping the IPA with read permissions for DC CIVAC, but injecting
a DABT for a DC IVAC. DC IVAC acts just like a DC CIVAC and requires the same
permissions when executed by a guest, so I'm not sure we should be handling them
differently.

Thanks,

Alex
* Re: [PATCH] KVM: arm64: Handle CMOs on Read Only memslots
From: Andrew Jones @ 2021-02-17 10:43 UTC
  To: Alexandru Elisei
  Cc: kernel-team, kvm, Marc Zyngier, Will Deacon, kvmarm, linux-arm-kernel

On Tue, Feb 16, 2021 at 12:18:31PM +0000, Alexandru Elisei wrote:
> Hi Marc,
>
> Played with this for a bit to try to understand the problem better, wrote a simple
> MMIO device in kvmtool which maps the memory as a read-only memslot [1] and poked
> it with kvm-unit-tests [2].
>
> [1] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/mmiodev-wip1
>
> [2] https://gitlab.arm.com/linux-arm/kvm-unit-tests-ae/-/tree/mmiodev-wip1

Looks like you forgot to add arm/mmiodev.c to your commit.

Thanks,
drew

> [...]
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
* Re: [PATCH] KVM: arm64: Handle CMOs on Read Only memslots
From: Alexandru Elisei @ 2021-02-17 11:12 UTC
  To: Andrew Jones
  Cc: kernel-team, kvm, Marc Zyngier, Will Deacon, kvmarm, linux-arm-kernel

Hi Drew,

On 2/17/21 10:43 AM, Andrew Jones wrote:
> On Tue, Feb 16, 2021 at 12:18:31PM +0000, Alexandru Elisei wrote:
>> Hi Marc,
>>
>> Played with this for a bit to try to understand the problem better, wrote a simple
>> MMIO device in kvmtool which maps the memory as a read-only memslot [1] and poked
>> it with kvm-unit-tests [2].
>>
>> [1] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/mmiodev-wip1
>>
>> [2] https://gitlab.arm.com/linux-arm/kvm-unit-tests-ae/-/tree/mmiodev-wip1
> Looks like you forgot to add arm/mmiodev.c to your commit.

Fixed, thanks for pointing that out!

Thanks,
Alex
end of thread

Thread overview: 7+ messages
2021-02-11 14:27 [PATCH] KVM: arm64: Handle CMOs on Read Only memslots Marc Zyngier
2021-02-12 17:12 ` Alexandru Elisei
2021-02-12 18:18 ` Marc Zyngier
2021-02-16 12:19 ` Alexandru Elisei
2021-02-16 12:18 ` Alexandru Elisei
2021-02-17 10:43 ` Andrew Jones
2021-02-17 11:12 ` Alexandru Elisei