* [PATCH v5 0/2] MTE support for KVM guest @ 2020-11-19 15:38 Steven Price 2020-11-19 15:39 ` [PATCH v5 1/2] arm64: kvm: Save/restore MTE registers Steven Price ` (2 more replies) 0 siblings, 3 replies; 38+ messages in thread From: Steven Price @ 2020-11-19 15:38 UTC (permalink / raw) To: Catalin Marinas, Marc Zyngier, Will Deacon Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel, Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela, Dr. David Alan Gilbert, Richard Henderson, Peter Maydell, Haibo Xu, Andrew Jones This series adds support for Arm's Memory Tagging Extension (MTE) to KVM, allowing KVM guests to make use of it. This builds on the existing user space support already in v5.10-rc1, see [1] for an overview. [1] https://lwn.net/Articles/834289/ Changes since v4[2]: * Rebased on v5.10-rc4. * Require the VMM to map all guest memory PROT_MTE if MTE is enabled for the guest. * Add a kvm_has_mte() accessor. [2] http://lkml.kernel.org/r/20201026155727.36685-1-steven.price%40arm.com The change to require the VMM to map all guest memory PROT_MTE is significant as it means that the VMM has to deal with the MTE tags even if it doesn't care about them (e.g. for virtual devices or if the VMM doesn't support migration). Also unfortunately because the VMM can change the memory layout at any time the check for PROT_MTE/VM_MTE has to be done very late (at the point of faulting pages into stage 2). The alternative would be to modify the set_pte_at() handler to always check if there is MTE data relating to a swap page even if the PTE doesn't have the MTE bit set. I haven't initially done this because of ordering issues during early boot, but could investigate further if the above VMM requirement is too strict. 
Steven Price (2): arm64: kvm: Save/restore MTE registers arm64: kvm: Introduce MTE VCPU feature arch/arm64/include/asm/kvm_emulate.h | 3 +++ arch/arm64/include/asm/kvm_host.h | 8 ++++++++ arch/arm64/include/asm/sysreg.h | 3 ++- arch/arm64/kvm/arm.c | 9 +++++++++ arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++++++++++++++ arch/arm64/kvm/mmu.c | 6 ++++++ arch/arm64/kvm/sys_regs.c | 20 +++++++++++++++----- include/uapi/linux/kvm.h | 1 + 8 files changed, 58 insertions(+), 6 deletions(-) -- 2.20.1 ^ permalink raw reply [flat|nested] 38+ messages in thread
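[Editorial note: the cover letter above talks about MTE tags without spelling out the layout. As background, a minimal host-side sketch of the MTE addressing model this series relies on: a 4-bit logical tag lives in bits [59:56] of a pointer, and one 4-bit allocation tag covers each 16-byte granule of memory. The helper names below are illustrative, not kernel API.]

```c
#include <stdint.h>

/* MTE logical tags occupy bits [59:56] of a 64-bit pointer; allocation
 * tags cover 16-byte granules of memory. */
#define MTE_TAG_SHIFT    56
#define MTE_TAG_MASK     (0xfULL << MTE_TAG_SHIFT)
#define MTE_GRANULE_SIZE 16ULL

/* Insert a 4-bit logical tag into a pointer-sized value. */
static inline uint64_t mte_set_ptr_tag(uint64_t ptr, unsigned int tag)
{
	return (ptr & ~MTE_TAG_MASK) | ((uint64_t)(tag & 0xf) << MTE_TAG_SHIFT);
}

/* Extract the 4-bit logical tag from a pointer-sized value. */
static inline unsigned int mte_get_ptr_tag(uint64_t ptr)
{
	return (ptr >> MTE_TAG_SHIFT) & 0xf;
}

/* Number of allocation tags needed to cover a memory range, which is
 * what a VMM would have to save/restore per page for migration. */
static inline uint64_t mte_tags_for_range(uint64_t len)
{
	return (len + MTE_GRANULE_SIZE - 1) / MTE_GRANULE_SIZE;
}
```

A 4 KiB page therefore carries 256 allocation tags, which is the per-page metadata the swap and migration discussions in this thread are about.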
* [PATCH v5 1/2] arm64: kvm: Save/restore MTE registers 2020-11-19 15:38 [PATCH v5 0/2] MTE support for KVM guest Steven Price @ 2020-11-19 15:39 ` Steven Price 2020-11-19 15:39 ` [PATCH v5 2/2] arm64: kvm: Introduce MTE VCPU feature Steven Price 2020-11-19 15:45 ` [PATCH v5 0/2] MTE support for KVM guest Peter Maydell 2 siblings, 0 replies; 38+ messages in thread From: Steven Price @ 2020-11-19 15:39 UTC (permalink / raw) To: Catalin Marinas, Marc Zyngier, Will Deacon Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel, Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela, Dr. David Alan Gilbert, Richard Henderson, Peter Maydell, Haibo Xu, Andrew Jones Define the new system registers that MTE introduces and context switch them. The MTE feature is still hidden from the ID register as it isn't supported in a VM yet. Signed-off-by: Steven Price <steven.price@arm.com> --- arch/arm64/include/asm/kvm_host.h | 4 ++++ arch/arm64/include/asm/sysreg.h | 3 ++- arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++++++++++++++ arch/arm64/kvm/sys_regs.c | 14 ++++++++++---- 4 files changed, 30 insertions(+), 5 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 0cd9f0f75c13..d3e136343468 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -136,6 +136,8 @@ enum vcpu_sysreg { SCTLR_EL1, /* System Control Register */ ACTLR_EL1, /* Auxiliary Control Register */ CPACR_EL1, /* Coprocessor Access Control */ + RGSR_EL1, /* Random Allocation Tag Seed Register */ + GCR_EL1, /* Tag Control Register */ ZCR_EL1, /* SVE Control */ TTBR0_EL1, /* Translation Table Base Register 0 */ TTBR1_EL1, /* Translation Table Base Register 1 */ @@ -152,6 +154,8 @@ enum vcpu_sysreg { TPIDR_EL1, /* Thread ID, Privileged */ AMAIR_EL1, /* Aux Memory Attribute Indirection Register */ CNTKCTL_EL1, /* Timer Control Register (EL1) */ + TFSRE0_EL1, /* Tag Fault 
Status Register (EL0) */ + TFSR_EL1, /* Tag Fault Status Register (EL1) */ PAR_EL1, /* Physical Address Register */ MDSCR_EL1, /* Monitor Debug System Control Register */ MDCCINT_EL1, /* Monitor Debug Comms Channel Interrupt Enable Reg */ diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index e2ef4c2edf06..b6668ffa04d9 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -569,7 +569,8 @@ #define SCTLR_ELx_M (BIT(0)) #define SCTLR_ELx_FLAGS (SCTLR_ELx_M | SCTLR_ELx_A | SCTLR_ELx_C | \ - SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB) + SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB | \ + SCTLR_ELx_ITFSB) /* SCTLR_EL2 specific flags. */ #define SCTLR_EL2_RES1 ((BIT(4)) | (BIT(5)) | (BIT(11)) | (BIT(16)) | \ diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h index cce43bfe158f..45255ba60152 100644 --- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h +++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h @@ -18,6 +18,11 @@ static inline void __sysreg_save_common_state(struct kvm_cpu_context *ctxt) { ctxt_sys_reg(ctxt, MDSCR_EL1) = read_sysreg(mdscr_el1); + if (system_supports_mte()) { + ctxt_sys_reg(ctxt, RGSR_EL1) = read_sysreg_s(SYS_RGSR_EL1); + ctxt_sys_reg(ctxt, GCR_EL1) = read_sysreg_s(SYS_GCR_EL1); + ctxt_sys_reg(ctxt, TFSRE0_EL1) = read_sysreg_s(SYS_TFSRE0_EL1); + } } static inline void __sysreg_save_user_state(struct kvm_cpu_context *ctxt) @@ -45,6 +50,8 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt) ctxt_sys_reg(ctxt, CNTKCTL_EL1) = read_sysreg_el1(SYS_CNTKCTL); ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg_par(); ctxt_sys_reg(ctxt, TPIDR_EL1) = read_sysreg(tpidr_el1); + if (system_supports_mte()) + ctxt_sys_reg(ctxt, TFSR_EL1) = read_sysreg_el1(SYS_TFSR); ctxt_sys_reg(ctxt, SP_EL1) = read_sysreg(sp_el1); ctxt_sys_reg(ctxt, ELR_EL1) = read_sysreg_el1(SYS_ELR); @@ -63,6 +70,11 @@ static inline void __sysreg_save_el2_return_state(struct 
kvm_cpu_context *ctxt) static inline void __sysreg_restore_common_state(struct kvm_cpu_context *ctxt) { write_sysreg(ctxt_sys_reg(ctxt, MDSCR_EL1), mdscr_el1); + if (system_supports_mte()) { + write_sysreg_s(ctxt_sys_reg(ctxt, RGSR_EL1), SYS_RGSR_EL1); + write_sysreg_s(ctxt_sys_reg(ctxt, GCR_EL1), SYS_GCR_EL1); + write_sysreg_s(ctxt_sys_reg(ctxt, TFSRE0_EL1), SYS_TFSRE0_EL1); + } } static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt) @@ -106,6 +118,8 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt) write_sysreg_el1(ctxt_sys_reg(ctxt, CNTKCTL_EL1), SYS_CNTKCTL); write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1), par_el1); write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1), tpidr_el1); + if (system_supports_mte()) + write_sysreg_el1(ctxt_sys_reg(ctxt, TFSR_EL1), SYS_TFSR); if (!has_vhe() && cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT) && diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index c1fac9836af1..4792d5249f07 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1366,6 +1366,12 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, return true; } +static unsigned int mte_visibility(const struct kvm_vcpu *vcpu, + const struct sys_reg_desc *rd) +{ + return REG_HIDDEN; +} + /* sys_reg_desc initialiser for known cpufeature ID registers */ #define ID_SANITISED(name) { \ SYS_DESC(SYS_##name), \ @@ -1534,8 +1540,8 @@ static const struct sys_reg_desc sys_reg_descs[] = { { SYS_DESC(SYS_ACTLR_EL1), access_actlr, reset_actlr, ACTLR_EL1 }, { SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 }, - { SYS_DESC(SYS_RGSR_EL1), undef_access }, - { SYS_DESC(SYS_GCR_EL1), undef_access }, + { SYS_DESC(SYS_RGSR_EL1), undef_access, reset_unknown, RGSR_EL1, .visibility = mte_visibility }, + { SYS_DESC(SYS_GCR_EL1), undef_access, reset_unknown, GCR_EL1, .visibility = mte_visibility }, { SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = sve_visibility }, { 
SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 }, @@ -1561,8 +1567,8 @@ static const struct sys_reg_desc sys_reg_descs[] = { { SYS_DESC(SYS_ERXMISC0_EL1), trap_raz_wi }, { SYS_DESC(SYS_ERXMISC1_EL1), trap_raz_wi }, - { SYS_DESC(SYS_TFSR_EL1), undef_access }, - { SYS_DESC(SYS_TFSRE0_EL1), undef_access }, + { SYS_DESC(SYS_TFSR_EL1), undef_access, reset_unknown, TFSR_EL1, .visibility = mte_visibility }, + { SYS_DESC(SYS_TFSRE0_EL1), undef_access, reset_unknown, TFSRE0_EL1, .visibility = mte_visibility }, { SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 }, { SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 }, -- 2.20.1 ^ permalink raw reply related [flat|nested] 38+ messages in thread
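[Editorial note: the core of patch 1/2 is the guard pattern in sysreg-sr.h — the MTE registers are only touched when `system_supports_mte()` is true, so non-MTE CPUs never execute accesses that would UNDEF. A minimal host-side analogue of that pattern, with stand-in names that are illustrative rather than kernel API:]

```c
#include <stdint.h>
#include <stdbool.h>

/* The four MTE system registers context-switched by patch 1/2. */
struct mte_regs {
	uint64_t rgsr_el1, gcr_el1, tfsre0_el1, tfsr_el1;
};

static bool have_mte;        /* stand-in for system_supports_mte() */
static struct mte_regs hw;   /* stand-in for the physical registers */

/* Save only when the feature exists, mirroring
 * __sysreg_save_common_state()'s conditional block. */
static void mte_save(struct mte_regs *ctxt)
{
	if (have_mte)
		*ctxt = hw;
}

/* Restore under the same guard, mirroring
 * __sysreg_restore_common_state(). */
static void mte_restore(const struct mte_regs *ctxt)
{
	if (have_mte)
		hw = *ctxt;
}

/* Round-trip check: returns 1 when the guard skips the registers on a
 * non-MTE system and save/restore preserves state on an MTE one. */
static int mte_ctx_demo(void)
{
	struct mte_regs saved = {0};

	have_mte = false;
	hw.gcr_el1 = 0x1f;
	mte_save(&saved);
	if (saved.gcr_el1 != 0)		/* guard must have skipped the copy */
		return 0;

	have_mte = true;
	mte_save(&saved);
	hw.gcr_el1 = 0;
	mte_restore(&saved);
	return hw.gcr_el1 == 0x1f;
}
```

The same shape appears twice in the patch: RGSR_EL1/GCR_EL1/TFSRE0_EL1 in the common-state paths and TFSR_EL1 in the EL1-state paths.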
* [PATCH v5 2/2] arm64: kvm: Introduce MTE VCPU feature 2020-11-19 15:38 [PATCH v5 0/2] MTE support for KVM guest Steven Price 2020-11-19 15:39 ` [PATCH v5 1/2] arm64: kvm: Save/restore MTE registers Steven Price @ 2020-11-19 15:39 ` Steven Price 2020-11-19 15:45 ` [PATCH v5 0/2] MTE support for KVM guest Peter Maydell 2 siblings, 0 replies; 38+ messages in thread From: Steven Price @ 2020-11-19 15:39 UTC (permalink / raw) To: Catalin Marinas, Marc Zyngier, Will Deacon Cc: Steven Price, James Morse, Julien Thierry, Suzuki K Poulose, kvmarm, linux-arm-kernel, linux-kernel, Dave Martin, Mark Rutland, Thomas Gleixner, qemu-devel, Juan Quintela, Dr. David Alan Gilbert, Richard Henderson, Peter Maydell, Haibo Xu, Andrew Jones Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging for a VM. This exposes the feature to the guest and requires the VMM to have set PROT_MTE on all mappings exposed to the guest. This ensures that the guest cannot see stale tags, and that the tags are correctly saved/restored across swap. 
Signed-off-by: Steven Price <steven.price@arm.com> --- arch/arm64/include/asm/kvm_emulate.h | 3 +++ arch/arm64/include/asm/kvm_host.h | 4 ++++ arch/arm64/kvm/arm.c | 9 +++++++++ arch/arm64/kvm/mmu.c | 6 ++++++ arch/arm64/kvm/sys_regs.c | 6 +++++- include/uapi/linux/kvm.h | 1 + 6 files changed, 28 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h index 5ef2669ccd6c..7791ef044b7f 100644 --- a/arch/arm64/include/asm/kvm_emulate.h +++ b/arch/arm64/include/asm/kvm_emulate.h @@ -79,6 +79,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu) if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) || vcpu_el1_is_32bit(vcpu)) vcpu->arch.hcr_el2 |= HCR_TID2; + + if (kvm_has_mte(vcpu->kvm)) + vcpu->arch.hcr_el2 |= HCR_ATA; } static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index d3e136343468..aeff10bc5b31 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -120,6 +120,8 @@ struct kvm_arch { unsigned int pmuver; u8 pfr0_csv2; + /* Memory Tagging Extension enabled for the guest */ + bool mte_enabled; }; struct kvm_vcpu_fault_info { @@ -658,4 +660,6 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu); #define kvm_arm_vcpu_sve_finalized(vcpu) \ ((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED) +#define kvm_has_mte(kvm) (system_supports_mte() && (kvm)->arch.mte_enabled) + #endif /* __ARM64_KVM_HOST_H__ */ diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index c0ffb019ca8b..da4aeba1855c 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -89,6 +89,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, r = 0; kvm->arch.return_nisv_io_abort_to_user = true; break; + case KVM_CAP_ARM_MTE: + if (!system_supports_mte() || kvm->created_vcpus) + return -EINVAL; + r = 0; + kvm->arch.mte_enabled = true; + break; default: r = -EINVAL; break; @@ -226,6 +232,9 @@ int 
kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) */ r = 1; break; + case KVM_CAP_ARM_MTE: + r = system_supports_mte(); + break; case KVM_CAP_STEAL_TIME: r = kvm_arm_pvtime_supported(); break; diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 1a01da9fdc99..f804d2109b8c 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -815,6 +815,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) fault_ipa &= ~(vma_pagesize - 1); + /* VMA regions must be mapped with PROT_MTE when VM has MTE enabled */ + if (kvm_has_mte(kvm) && !(vma->vm_flags & VM_MTE)) { + pr_err_ratelimited("Page not mapped VM_MTE in MTE guest\n"); + return -EFAULT; + } + gfn = fault_ipa >> PAGE_SHIFT; mmap_read_unlock(current->mm); diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 4792d5249f07..469b0ef3eb07 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1123,7 +1123,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu, val &= ~(0xfUL << ID_AA64PFR0_CSV2_SHIFT); val |= ((u64)vcpu->kvm->arch.pfr0_csv2 << ID_AA64PFR0_CSV2_SHIFT); } else if (id == SYS_ID_AA64PFR1_EL1) { - val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT); + if (!kvm_has_mte(vcpu->kvm)) + val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT); } else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) { val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) | (0xfUL << ID_AA64ISAR1_API_SHIFT) | @@ -1369,6 +1370,9 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, static unsigned int mte_visibility(const struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd) { + if (kvm_has_mte(vcpu->kvm)) + return 0; + return REG_HIDDEN; } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index ca41220b40b8..3e6fb5b580a9 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1053,6 +1053,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_X86_USER_SPACE_MSR 188 #define 
KVM_CAP_X86_MSR_FILTER 189 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190 +#define KVM_CAP_ARM_MTE 191 #ifdef KVM_CAP_IRQ_ROUTING -- 2.20.1 ^ permalink raw reply related [flat|nested] 38+ messages in thread
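[Editorial note: the sys_regs.c hunk in patch 2/2 stops clearing the MTE field of ID_AA64PFR1_EL1 once the VM has MTE enabled, which is how the feature becomes visible to the guest. A sketch of that 4-bit field masking, using the kernel's `ID_AA64PFR1_MTE_SHIFT` value of 8 (bits [11:8]); the function name is illustrative:]

```c
#include <stdint.h>
#include <stdbool.h>

#define ID_AA64PFR1_MTE_SHIFT 8	/* MTE field, bits [11:8] */

/* Mirror of the read_id_reg() change in patch 2/2: the 4-bit MTE field
 * is cleared from the guest's view of ID_AA64PFR1_EL1 unless the VM
 * has MTE enabled; all other fields pass through untouched. */
static uint64_t sanitise_pfr1(uint64_t val, bool vm_has_mte)
{
	if (!vm_has_mte)
		val &= ~(0xfULL << ID_AA64PFR1_MTE_SHIFT);
	return val;
}
```

With the field value 2 (0x200, i.e. FEAT_MTE2) in the host register, a guest without the capability reads 0 in that field and so never attempts to use MTE.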
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-11-19 15:38 [PATCH v5 0/2] MTE support for KVM guest Steven Price 2020-11-19 15:39 ` [PATCH v5 1/2] arm64: kvm: Save/restore MTE registers Steven Price 2020-11-19 15:39 ` [PATCH v5 2/2] arm64: kvm: Introduce MTE VCPU feature Steven Price @ 2020-11-19 15:45 ` Peter Maydell 2020-11-19 15:57 ` Steven Price ` (2 more replies) 2 siblings, 3 replies; 38+ messages in thread From: Peter Maydell @ 2020-11-19 15:45 UTC (permalink / raw) To: Steven Price Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse, Julien Thierry, Suzuki K Poulose, kvmarm, arm-mail-list, lkml - Kernel Mailing List, Dave Martin, Mark Rutland, Thomas Gleixner, QEMU Developers, Juan Quintela, Dr. David Alan Gilbert, Richard Henderson, Haibo Xu, Andrew Jones On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> wrote: > This series adds support for Arm's Memory Tagging Extension (MTE) to > KVM, allowing KVM guests to make use of it. This builds on the existing > user space support already in v5.10-rc1, see [1] for an overview. > The change to require the VMM to map all guest memory PROT_MTE is > significant as it means that the VMM has to deal with the MTE tags even > if it doesn't care about them (e.g. for virtual devices or if the VMM > doesn't support migration). Also unfortunately because the VMM can > change the memory layout at any time the check for PROT_MTE/VM_MTE has > to be done very late (at the point of faulting pages into stage 2). I'm a bit dubious about requiring the VMM to map the guest memory PROT_MTE unless somebody's done at least a sketch of the design for how this would work on the QEMU side. Currently QEMU just assumes the guest memory is guest memory and it can access it without special precautions... thanks -- PMM ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-11-19 15:45 ` [PATCH v5 0/2] MTE support for KVM guest Peter Maydell @ 2020-11-19 15:57 ` Steven Price 2020-11-19 16:39 ` Peter Maydell 2020-11-19 18:42 ` Andrew Jones 2020-11-23 12:16 ` Dr. David Alan Gilbert 2 siblings, 1 reply; 38+ messages in thread From: Steven Price @ 2020-11-19 15:57 UTC (permalink / raw) To: Peter Maydell Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse, Julien Thierry, Suzuki K Poulose, kvmarm, arm-mail-list, lkml - Kernel Mailing List, Dave Martin, Mark Rutland, Thomas Gleixner, QEMU Developers, Juan Quintela, Dr. David Alan Gilbert, Richard Henderson, Haibo Xu, Andrew Jones On 19/11/2020 15:45, Peter Maydell wrote: > On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> wrote: >> This series adds support for Arm's Memory Tagging Extension (MTE) to >> KVM, allowing KVM guests to make use of it. This builds on the existing >> user space support already in v5.10-rc1, see [1] for an overview. > >> The change to require the VMM to map all guest memory PROT_MTE is >> significant as it means that the VMM has to deal with the MTE tags even >> if it doesn't care about them (e.g. for virtual devices or if the VMM >> doesn't support migration). Also unfortunately because the VMM can >> change the memory layout at any time the check for PROT_MTE/VM_MTE has >> to be done very late (at the point of faulting pages into stage 2). > > I'm a bit dubious about requring the VMM to map the guest memory > PROT_MTE unless somebody's done at least a sketch of the design > for how this would work on the QEMU side. Currently QEMU just > assumes the guest memory is guest memory and it can access it > without special precautions... I agree this needs some investigation - I'm hoping Haibo will be able to provide some feedback here as he has been looking at the QEMU support. 
However the VMM is likely going to require some significant changes to ensure that migration doesn't break, so either way there's work to be done. Fundamentally most memory will need a mapping with PROT_MTE just so the VMM can get at the tags for migration purposes, so QEMU is going to have to learn how to treat guest memory specially if it wants to be able to enable MTE for both itself and the guest. I'll also hunt down what's happening with my attempts to fix the set_pte_at() handling for swap and I'll post that as an alternative if it turns out to be a reasonable approach. But I don't think that solves the QEMU issue above. The other alternative would be to implement a new kernel interface to fetch tags from the guest and not require the VMM to maintain a PROT_MTE mapping. But we need some real feedback from someone familiar with QEMU to know what that interface should look like. So I'm holding off on that until there's a 'real' PoC implementation. Thanks, Steve ^ permalink raw reply [flat|nested] 38+ messages in thread
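[Editorial note: the message above says the VMM needs "to get at the tags for migration purposes". One hypothetical wire encoding a VMM could use, purely for illustration: one 4-bit allocation tag per 16-byte granule, packed two tags per byte. Nothing in this series defines such a format; the names below are invented.]

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical migration-stream encoding: n 4-bit tags packed two per
 * output byte, low nibble first. Not a kernel or QEMU interface. */
static void pack_tags(const uint8_t *tags, size_t n, uint8_t *out)
{
	for (size_t i = 0; i < n; i += 2) {
		uint8_t lo = tags[i] & 0xf;
		uint8_t hi = (i + 1 < n) ? (tags[i + 1] & 0xf) : 0;

		out[i / 2] = lo | (hi << 4);
	}
}

/* Recover tag number idx from the packed stream on the receive side. */
static uint8_t unpack_tag(const uint8_t *packed, size_t idx)
{
	uint8_t b = packed[idx / 2];

	return (idx & 1) ? (b >> 4) : (b & 0xf);
}

/* Round-trip check over an odd-length tag run; returns 1 on success. */
static int pack_roundtrip_demo(void)
{
	uint8_t tags[5] = {1, 2, 3, 4, 5};
	uint8_t packed[3] = {0};

	pack_tags(tags, 5, packed);
	for (size_t i = 0; i < 5; i++)
		if (unpack_tag(packed, i) != tags[i])
			return 0;
	return 1;
}
```

At this density a 4 KiB page costs 128 bytes of tag metadata on the wire, which gives a feel for the migration overhead being discussed.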
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-11-19 15:57 ` Steven Price @ 2020-11-19 16:39 ` Peter Maydell 0 siblings, 0 replies; 38+ messages in thread From: Peter Maydell @ 2020-11-19 16:39 UTC (permalink / raw) To: Steven Price Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse, Julien Thierry, Suzuki K Poulose, kvmarm, arm-mail-list, lkml - Kernel Mailing List, Dave Martin, Mark Rutland, Thomas Gleixner, QEMU Developers, Juan Quintela, Dr. David Alan Gilbert, Richard Henderson, Haibo Xu, Andrew Jones On Thu, 19 Nov 2020 at 15:57, Steven Price <steven.price@arm.com> wrote: > On 19/11/2020 15:45, Peter Maydell wrote: > > I'm a bit dubious about requring the VMM to map the guest memory > > PROT_MTE unless somebody's done at least a sketch of the design > > for how this would work on the QEMU side. Currently QEMU just > > assumes the guest memory is guest memory and it can access it > > without special precautions... > > I agree this needs some investigation - I'm hoping Haibo will be able to > provide some feedback here as he has been looking at the QEMU support. > However the VMM is likely going to require some significant changes to > ensure that migration doesn't break, so either way there's work to be done. > > Fundamentally most memory will need a mapping with PROT_MTE just so the > VMM can get at the tags for migration purposes, so QEMU is going to have > to learn how to treat guest memory specially if it wants to be able to > enable MTE for both itself and the guest. If the only reason the VMM needs tag access is for migration it feels like there must be a nicer way to do it than by requiring it to map the whole of the guest address space twice (once for normal use and once to get the tags)... Anyway, maybe "must map PROT_MTE" is workable, but it seems a bit premature to fix the kernel ABI as working that way until we are at least reasonably sure that it is the right design. 
thanks -- PMM ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-11-19 15:45 ` [PATCH v5 0/2] MTE support for KVM guest Peter Maydell 2020-11-19 15:57 ` Steven Price @ 2020-11-19 18:42 ` Andrew Jones 2020-11-19 19:11 ` Marc Zyngier 2020-11-23 12:16 ` Dr. David Alan Gilbert 2 siblings, 1 reply; 38+ messages in thread From: Andrew Jones @ 2020-11-19 18:42 UTC (permalink / raw) To: Peter Maydell Cc: Steven Price, Mark Rutland, Dr. David Alan Gilbert, Haibo Xu, Suzuki K Poulose, QEMU Developers, Catalin Marinas, Juan Quintela, Richard Henderson, lkml - Kernel Mailing List, Dave Martin, James Morse, arm-mail-list, Marc Zyngier, Thomas Gleixner, Will Deacon, kvmarm, Julien Thierry On Thu, Nov 19, 2020 at 03:45:40PM +0000, Peter Maydell wrote: > On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> wrote: > > This series adds support for Arm's Memory Tagging Extension (MTE) to > > KVM, allowing KVM guests to make use of it. This builds on the existing > > user space support already in v5.10-rc1, see [1] for an overview. > > > The change to require the VMM to map all guest memory PROT_MTE is > > significant as it means that the VMM has to deal with the MTE tags even > > if it doesn't care about them (e.g. for virtual devices or if the VMM > > doesn't support migration). Also unfortunately because the VMM can > > change the memory layout at any time the check for PROT_MTE/VM_MTE has > > to be done very late (at the point of faulting pages into stage 2). > > I'm a bit dubious about requring the VMM to map the guest memory > PROT_MTE unless somebody's done at least a sketch of the design > for how this would work on the QEMU side. Currently QEMU just > assumes the guest memory is guest memory and it can access it > without special precautions... > There are two statements being made here: 1) Requiring the use of PROT_MTE when mapping guest memory may not fit QEMU well. 
2) New KVM features should be accompanied with supporting QEMU code in order to prove that the APIs make sense. I strongly agree with (2). While kvmtool supports some quick testing, it doesn't support migration. We must test all new features with a migration supporting VMM. I'm not sure about (1). I don't feel like it should be a major problem, but (2). I'd be happy to help with the QEMU prototype, but preferably when there's hardware available. Has all the current MTE testing just been done on simulators? And, if so, are there regression tests regularly running on the simulators too? And can they test migration? If hardware doesn't show up quickly and simulators aren't used for regression tests, then all this code will start rotting from day one. Thanks, drew ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-11-19 18:42 ` Andrew Jones @ 2020-11-19 19:11 ` Marc Zyngier 2020-11-20 9:50 ` Steven Price 0 siblings, 1 reply; 38+ messages in thread From: Marc Zyngier @ 2020-11-19 19:11 UTC (permalink / raw) To: Andrew Jones Cc: Peter Maydell, Steven Price, Mark Rutland, Dr. David Alan Gilbert, Haibo Xu, Suzuki K Poulose, QEMU Developers, Catalin Marinas, Juan Quintela, Richard Henderson, lkml - Kernel Mailing List, Dave Martin, James Morse, arm-mail-list, Thomas Gleixner, Will Deacon, kvmarm, Julien Thierry On 2020-11-19 18:42, Andrew Jones wrote: > On Thu, Nov 19, 2020 at 03:45:40PM +0000, Peter Maydell wrote: >> On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> >> wrote: >> > This series adds support for Arm's Memory Tagging Extension (MTE) to >> > KVM, allowing KVM guests to make use of it. This builds on the existing >> > user space support already in v5.10-rc1, see [1] for an overview. >> >> > The change to require the VMM to map all guest memory PROT_MTE is >> > significant as it means that the VMM has to deal with the MTE tags even >> > if it doesn't care about them (e.g. for virtual devices or if the VMM >> > doesn't support migration). Also unfortunately because the VMM can >> > change the memory layout at any time the check for PROT_MTE/VM_MTE has >> > to be done very late (at the point of faulting pages into stage 2). >> >> I'm a bit dubious about requring the VMM to map the guest memory >> PROT_MTE unless somebody's done at least a sketch of the design >> for how this would work on the QEMU side. Currently QEMU just >> assumes the guest memory is guest memory and it can access it >> without special precautions... >> > > There are two statements being made here: > > 1) Requiring the use of PROT_MTE when mapping guest memory may not fit > QEMU well. > > 2) New KVM features should be accompanied with supporting QEMU code in > order to prove that the APIs make sense. > > I strongly agree with (2). 
While kvmtool supports some quick testing, > it > doesn't support migration. We must test all new features with a > migration > supporting VMM. > > I'm not sure about (1). I don't feel like it should be a major problem, > but (2). > > I'd be happy to help with the QEMU prototype, but preferably when > there's > hardware available. Has all the current MTE testing just been done on > simulators? And, if so, are there regression tests regularly running on > the simulators too? And can they test migration? If hardware doesn't > show up quickly and simulators aren't used for regression tests, then > all this code will start rotting from day one. While I agree with the sentiment, the reality is pretty bleak. I'm pretty sure nobody will ever run a migration on emulation. I also doubt there is much overlap between MTE users and migration users, unfortunately. No HW is available today, and when it becomes available, it will be in the form of a closed system on which QEMU doesn't run, either because we are locked out of EL2 (as usual), or because migration is not part of the use case (like KVM on Android, for example). So we can wait another two (five?) years until general purpose HW becomes available, or we start merging what we can test today. I'm inclined to do the latter. And I think it is absolutely fine for QEMU to say "no MTE support with KVM" (we can remove all userspace visibility, except for the capability). M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-11-19 19:11 ` Marc Zyngier @ 2020-11-20 9:50 ` Steven Price 2020-11-20 9:56 ` Marc Zyngier 2020-12-04 8:25 ` Haibo Xu 0 siblings, 2 replies; 38+ messages in thread From: Steven Price @ 2020-11-20 9:50 UTC (permalink / raw) To: Marc Zyngier, Andrew Jones Cc: Peter Maydell, Mark Rutland, Dr. David Alan Gilbert, Haibo Xu, Suzuki K Poulose, QEMU Developers, Catalin Marinas, Juan Quintela, Richard Henderson, lkml - Kernel Mailing List, Dave Martin, James Morse, arm-mail-list, Thomas Gleixner, Will Deacon, kvmarm, Julien Thierry On 19/11/2020 19:11, Marc Zyngier wrote: > On 2020-11-19 18:42, Andrew Jones wrote: >> On Thu, Nov 19, 2020 at 03:45:40PM +0000, Peter Maydell wrote: >>> On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> wrote: >>> > This series adds support for Arm's Memory Tagging Extension (MTE) to >>> > KVM, allowing KVM guests to make use of it. This builds on the >>> existing >>> > user space support already in v5.10-rc1, see [1] for an overview. >>> >>> > The change to require the VMM to map all guest memory PROT_MTE is >>> > significant as it means that the VMM has to deal with the MTE tags >>> even >>> > if it doesn't care about them (e.g. for virtual devices or if the VMM >>> > doesn't support migration). Also unfortunately because the VMM can >>> > change the memory layout at any time the check for PROT_MTE/VM_MTE has >>> > to be done very late (at the point of faulting pages into stage 2). >>> >>> I'm a bit dubious about requring the VMM to map the guest memory >>> PROT_MTE unless somebody's done at least a sketch of the design >>> for how this would work on the QEMU side. Currently QEMU just >>> assumes the guest memory is guest memory and it can access it >>> without special precautions... >>> >> >> There are two statements being made here: >> >> 1) Requiring the use of PROT_MTE when mapping guest memory may not fit >> QEMU well. 
>> >> 2) New KVM features should be accompanied with supporting QEMU code in >> order to prove that the APIs make sense. >> >> I strongly agree with (2). While kvmtool supports some quick testing, it >> doesn't support migration. We must test all new features with a migration >> supporting VMM. >> >> I'm not sure about (1). I don't feel like it should be a major problem, >> but (2). (1) seems to be contentious whichever way we go. Either PROT_MTE isn't required in which case it's easy to accidentally screw up migration, or it is required in which case it's difficult to handle normal guest memory from the VMM. I get the impression that probably I should go back to the previous approach - sorry for the distraction with this change. (2) isn't something I'm trying to skip, but I'm limited in what I can do myself so would appreciate help here. Haibo is looking into this. >> >> I'd be happy to help with the QEMU prototype, but preferably when there's >> hardware available. Has all the current MTE testing just been done on >> simulators? And, if so, are there regression tests regularly running on >> the simulators too? And can they test migration? If hardware doesn't >> show up quickly and simulators aren't used for regression tests, then >> all this code will start rotting from day one. As Marc says, hardware isn't available. Testing is either via the Arm FVP model (that I've been using for most of my testing) or QEMU full system emulation. > > While I agree with the sentiment, the reality is pretty bleak. > > I'm pretty sure nobody will ever run a migration on emulation. I also doubt > there is much overlap between MTE users and migration users, unfortunately. > > No HW is available today, and when it becomes available, it will be in > the form of a closed system on which QEMU doesn't run, either because > we are locked out of EL2 (as usual), or because migration is not part of > the use case (like KVM on Android, for example). > > So we can wait another two (five?) 
years until general purpose HW becomes > available, or we start merging what we can test today. I'm inclined to > do the latter. > > And I think it is absolutely fine for QEMU to say "no MTE support with KVM" > (we can remove all userspace visibility, except for the capability). What I'm trying to achieve is a situation where KVM+MTE without migration works and we leave ourselves a clear path where migration can be added. With hindsight I think this version of the series was a wrong turn - if we return to not requiring PROT_MTE then we have the following two potential options to explore for migration in the future: * The VMM can choose to enable PROT_MTE if it needs to, and if desired we can add a flag to enforce this in the kernel. * If needed a new kernel interface can be provided to fetch/set tags from guest memory which isn't mapped PROT_MTE. Does this sound reasonable? I'll clean up the set_pte_at() change and post a v6 later today. ^ permalink raw reply [flat|nested] 38+ messages in thread
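[Editorial note: the second option above — "a new kernel interface can be provided to fetch/set tags from guest memory" — is left unspecified in the thread. One hypothetical shape such an interface might take, sketched only to make the discussion concrete; no ioctl like this exists in this series, and every name here is invented:]

```c
#include <stdint.h>

/* Hypothetical argument block for a "copy tags in/out of the guest"
 * ioctl, as floated in the discussion above. */
struct mte_copy_tags {
	uint64_t guest_ipa;	/* must be page aligned */
	uint64_t length;	/* must be a whole number of pages */
	uint64_t user_addr;	/* buffer, one tag byte per 16-byte granule */
	uint64_t flags;		/* direction: 0 = read tags, 1 = write tags */
};

#define DEMO_PAGE_SIZE 4096ULL	/* assumes 4 KiB pages for the sketch */
#define MTE_GRANULE    16ULL

/* Validate a request the way a kernel implementation might before
 * touching stage 2: reject unaligned or empty ranges up front. */
static int copy_tags_args_valid(const struct mte_copy_tags *a)
{
	if (a->guest_ipa & (DEMO_PAGE_SIZE - 1))
		return 0;
	if (!a->length || (a->length & (DEMO_PAGE_SIZE - 1)))
		return 0;
	return 1;
}

/* Bytes of userspace tag storage a request needs. */
static uint64_t copy_tags_buf_size(const struct mte_copy_tags *a)
{
	return a->length / MTE_GRANULE;
}
```

Such an interface would let QEMU migrate tags without holding a second PROT_MTE mapping of all guest memory, which is exactly the trade-off Peter and Steven are weighing above.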
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-11-20 9:50 ` Steven Price @ 2020-11-20 9:56 ` Marc Zyngier 2020-11-20 9:58 ` Steven Price 2020-12-04 8:25 ` Haibo Xu 1 sibling, 1 reply; 38+ messages in thread From: Marc Zyngier @ 2020-11-20 9:56 UTC (permalink / raw) To: Steven Price Cc: Andrew Jones, Peter Maydell, Mark Rutland, Dr. David Alan Gilbert, Haibo Xu, Suzuki K Poulose, QEMU Developers, Catalin Marinas, Juan Quintela, Richard Henderson, lkml - Kernel Mailing List, Dave Martin, James Morse, arm-mail-list, Thomas Gleixner, Will Deacon, kvmarm, Julien Thierry On 2020-11-20 09:50, Steven Price wrote: > On 19/11/2020 19:11, Marc Zyngier wrote: > Does this sound reasonable? > > I'll clean up the set_pte_at() change and post a v6 later today. Please hold on. I still haven't reviewed your v5, nor have I had time to read your reply to my comments on v4. Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-11-20 9:56 ` Marc Zyngier @ 2020-11-20 9:58 ` Steven Price 0 siblings, 0 replies; 38+ messages in thread From: Steven Price @ 2020-11-20 9:58 UTC (permalink / raw) To: Marc Zyngier Cc: Andrew Jones, Peter Maydell, Mark Rutland, Dr. David Alan Gilbert, Haibo Xu, Suzuki K Poulose, QEMU Developers, Catalin Marinas, Juan Quintela, Richard Henderson, lkml - Kernel Mailing List, Dave Martin, James Morse, arm-mail-list, Thomas Gleixner, Will Deacon, kvmarm, Julien Thierry On 20/11/2020 09:56, Marc Zyngier wrote: > On 2020-11-20 09:50, Steven Price wrote: >> On 19/11/2020 19:11, Marc Zyngier wrote: > >> Does this sound reasonable? >> >> I'll clean up the set_pte_at() change and post a v6 later today. > > Please hold on. I still haven't reviewed your v5, nor have I had time > to read your reply to my comments on v4. Sure, no problem ;) Steve ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-11-20 9:50 ` Steven Price 2020-11-20 9:56 ` Marc Zyngier @ 2020-12-04 8:25 ` Haibo Xu 2020-12-07 14:48 ` Steven Price 1 sibling, 1 reply; 38+ messages in thread From: Haibo Xu @ 2020-12-04 8:25 UTC (permalink / raw) To: Steven Price Cc: Marc Zyngier, Andrew Jones, Catalin Marinas, Juan Quintela, Richard Henderson, QEMU Developers, Dr. David Alan Gilbert, arm-mail-list, kvmarm, Thomas Gleixner, Will Deacon, Dave Martin, lkml - Kernel Mailing List On Fri, 20 Nov 2020 at 17:51, Steven Price <steven.price@arm.com> wrote: > > On 19/11/2020 19:11, Marc Zyngier wrote: > > On 2020-11-19 18:42, Andrew Jones wrote: > >> On Thu, Nov 19, 2020 at 03:45:40PM +0000, Peter Maydell wrote: > >>> On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> wrote: > >>> > This series adds support for Arm's Memory Tagging Extension (MTE) to > >>> > KVM, allowing KVM guests to make use of it. This builds on the > >>> existing > >>> > user space support already in v5.10-rc1, see [1] for an overview. > >>> > >>> > The change to require the VMM to map all guest memory PROT_MTE is > >>> > significant as it means that the VMM has to deal with the MTE tags > >>> even > >>> > if it doesn't care about them (e.g. for virtual devices or if the VMM > >>> > doesn't support migration). Also unfortunately because the VMM can > >>> > change the memory layout at any time the check for PROT_MTE/VM_MTE has > >>> > to be done very late (at the point of faulting pages into stage 2). > >>> > >>> I'm a bit dubious about requring the VMM to map the guest memory > >>> PROT_MTE unless somebody's done at least a sketch of the design > >>> for how this would work on the QEMU side. Currently QEMU just > >>> assumes the guest memory is guest memory and it can access it > >>> without special precautions... > >>> > >> > >> There are two statements being made here: > >> > >> 1) Requiring the use of PROT_MTE when mapping guest memory may not fit > >> QEMU well. 
> >> > >> 2) New KVM features should be accompanied with supporting QEMU code in > >> order to prove that the APIs make sense. > >> > >> I strongly agree with (2). While kvmtool supports some quick testing, it > >> doesn't support migration. We must test all new features with a migration > >> supporting VMM. > >> > >> I'm not sure about (1). I don't feel like it should be a major problem, > >> but (2). > > (1) seems to be contentious whichever way we go. Either PROT_MTE isn't > required in which case it's easy to accidentally screw up migration, or > it is required in which case it's difficult to handle normal guest > memory from the VMM. I get the impression that probably I should go back > to the previous approach - sorry for the distraction with this change. > > (2) isn't something I'm trying to skip, but I'm limited in what I can do > myself so would appreciate help here. Haibo is looking into this. > Hi Steven, Sorry for the late reply! I have finished the POC for the MTE migration support with the assumption that all the memory is mapped with PROT_MTE. But I got stuck in the test with an FVP setup. Previously, I successfully compiled a test case to verify the basic function of MTE in a guest. But these days, the re-compiled test can't be executed by the guest (very weird). The short plan to verify the migration is to set the MTE tags on one page in the guest, and try to dump the migrated memory contents. I will update the status later next week! Regards, Haibo > >> > >> I'd be happy to help with the QEMU prototype, but preferably when there's > >> hardware available. Has all the current MTE testing just been done on > >> simulators? And, if so, are there regression tests regularly running on > >> the simulators too? And can they test migration? If hardware doesn't > >> show up quickly and simulators aren't used for regression tests, then > >> all this code will start rotting from day one. > > As Marc says, hardware isn't available. 
Testing is either via the Arm > FVP model (that I've been using for most of my testing) or QEMU full > system emulation. > > > > > While I agree with the sentiment, the reality is pretty bleak. > > > > I'm pretty sure nobody will ever run a migration on emulation. I also doubt > > there is much overlap between MTE users and migration users, unfortunately. > > > > No HW is available today, and when it becomes available, it will be in > > the form of a closed system on which QEMU doesn't run, either because > > we are locked out of EL2 (as usual), or because migration is not part of > > the use case (like KVM on Android, for example). > > > > So we can wait another two (five?) years until general purpose HW becomes > > available, or we start merging what we can test today. I'm inclined to > > do the latter. > > > > And I think it is absolutely fine for QEMU to say "no MTE support with KVM" > > (we can remove all userspace visibility, except for the capability). > > What I'm trying to achieve is a situation where KVM+MTE without > migration works and we leave ourselves a clear path where migration can > be added. With hindsight I think this version of the series was a wrong > turn - if we return to not requiring PROT_MTE then we have the following > two potential options to explore for migration in the future: > > * The VMM can choose to enable PROT_MTE if it needs to, and if desired > we can add a flag to enforce this in the kernel. > > * If needed a new kernel interface can be provided to fetch/set tags > from guest memory which isn't mapped PROT_MTE. > > Does this sound reasonable? > > I'll clean up the set_pte_at() change and post a v6 later today. > _______________________________________________ > kvmarm mailing list > kvmarm@lists.cs.columbia.edu > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-04 8:25 ` Haibo Xu @ 2020-12-07 14:48 ` Steven Price 2020-12-07 15:27 ` Peter Maydell ` (2 more replies) 0 siblings, 3 replies; 38+ messages in thread From: Steven Price @ 2020-12-07 14:48 UTC (permalink / raw) To: Haibo Xu Cc: Marc Zyngier, Andrew Jones, Catalin Marinas, Juan Quintela, Richard Henderson, QEMU Developers, Dr. David Alan Gilbert, arm-mail-list, kvmarm, Thomas Gleixner, Will Deacon, Dave Martin, lkml - Kernel Mailing List On 04/12/2020 08:25, Haibo Xu wrote: > On Fri, 20 Nov 2020 at 17:51, Steven Price <steven.price@arm.com> wrote: >> >> On 19/11/2020 19:11, Marc Zyngier wrote: >>> On 2020-11-19 18:42, Andrew Jones wrote: >>>> On Thu, Nov 19, 2020 at 03:45:40PM +0000, Peter Maydell wrote: >>>>> On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> wrote: >>>>>> This series adds support for Arm's Memory Tagging Extension (MTE) to >>>>>> KVM, allowing KVM guests to make use of it. This builds on the >>>>> existing >>>>>> user space support already in v5.10-rc1, see [1] for an overview. >>>>> >>>>>> The change to require the VMM to map all guest memory PROT_MTE is >>>>>> significant as it means that the VMM has to deal with the MTE tags >>>>> even >>>>>> if it doesn't care about them (e.g. for virtual devices or if the VMM >>>>>> doesn't support migration). Also unfortunately because the VMM can >>>>>> change the memory layout at any time the check for PROT_MTE/VM_MTE has >>>>>> to be done very late (at the point of faulting pages into stage 2). >>>>> >>>>> I'm a bit dubious about requring the VMM to map the guest memory >>>>> PROT_MTE unless somebody's done at least a sketch of the design >>>>> for how this would work on the QEMU side. Currently QEMU just >>>>> assumes the guest memory is guest memory and it can access it >>>>> without special precautions... 
>>>>> >>>> >>>> There are two statements being made here: >>>> >>>> 1) Requiring the use of PROT_MTE when mapping guest memory may not fit >>>> QEMU well. >>>> >>>> 2) New KVM features should be accompanied with supporting QEMU code in >>>> order to prove that the APIs make sense. >>>> >>>> I strongly agree with (2). While kvmtool supports some quick testing, it >>>> doesn't support migration. We must test all new features with a migration >>>> supporting VMM. >>>> >>>> I'm not sure about (1). I don't feel like it should be a major problem, >>>> but (2). >> >> (1) seems to be contentious whichever way we go. Either PROT_MTE isn't >> required in which case it's easy to accidentally screw up migration, or >> it is required in which case it's difficult to handle normal guest >> memory from the VMM. I get the impression that probably I should go back >> to the previous approach - sorry for the distraction with this change. >> >> (2) isn't something I'm trying to skip, but I'm limited in what I can do >> myself so would appreciate help here. Haibo is looking into this. >> > > Hi Steven, > > Sorry for the later reply! > > I have finished the POC for the MTE migration support with the assumption > that all the memory is mapped with PROT_MTE. But I got stuck in the test > with a FVP setup. Previously, I successfully compiled a test case to verify > the basic function of MTE in a guest. But these days, the re-compiled test > can't be executed by the guest(very weird). The short plan to verify > the migration > is to set the MTE tags on one page in the guest, and try to dump the migrated > memory contents. Hi Haibo, Sounds like you are making good progress - thanks for the update. Have you thought about how the PROT_MTE mappings might work if QEMU itself were to use MTE? My worry is that we end up with MTE in a guest preventing QEMU from using MTE itself (because of the PROT_MTE mappings). 
I'm hoping QEMU can wrap its use of guest memory in a sequence which disables tag checking (something similar will be needed for the "protected VM" use case anyway), but this isn't something I've looked into. > I will update the status later next week! Great, I look forward to hearing how it goes. Thanks, Steve ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-07 14:48 ` Steven Price @ 2020-12-07 15:27 ` Peter Maydell 2020-12-07 15:45 ` Steven Price 2020-12-08 9:51 ` Haibo Xu 2020-12-16 7:31 ` Haibo Xu 2 siblings, 1 reply; 38+ messages in thread From: Peter Maydell @ 2020-12-07 15:27 UTC (permalink / raw) To: Steven Price Cc: Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, Marc Zyngier, Richard Henderson, QEMU Developers, Dr. David Alan Gilbert, Catalin Marinas, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On Mon, 7 Dec 2020 at 14:48, Steven Price <steven.price@arm.com> wrote: > Sounds like you are making good progress - thanks for the update. Have > you thought about how the PROT_MTE mappings might work if QEMU itself > were to use MTE? My worry is that we end up with MTE in a guest > preventing QEMU from using MTE itself (because of the PROT_MTE > mappings). I'm hoping QEMU can wrap its use of guest memory in a > sequence which disables tag checking (something similar will be needed > for the "protected VM" use case anyway), but this isn't something I've > looked into. It's not entirely the same as the "protected VM" case. For that the patches currently on list basically special case "this is a debug access (eg from gdbstub/monitor)" which then either gets to go via "decrypt guest RAM for debug" or gets failed depending on whether the VM has a debug-is-ok flag enabled. For an MTE guest the common case will be guests doing standard DMA operations to or from guest memory. The ideal API for that from QEMU's point of view would be "accesses to guest RAM don't do tag checks, even if tag checks are enabled for accesses QEMU does to memory it has allocated itself as a normal userspace program". thanks -- PMM ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-07 15:27 ` Peter Maydell @ 2020-12-07 15:45 ` Steven Price 2020-12-07 16:05 ` Marc Zyngier 2020-12-07 16:44 ` Dr. David Alan Gilbert 0 siblings, 2 replies; 38+ messages in thread From: Steven Price @ 2020-12-07 15:45 UTC (permalink / raw) To: Peter Maydell Cc: Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, Marc Zyngier, Richard Henderson, QEMU Developers, Dr. David Alan Gilbert, Catalin Marinas, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On 07/12/2020 15:27, Peter Maydell wrote: > On Mon, 7 Dec 2020 at 14:48, Steven Price <steven.price@arm.com> wrote: >> Sounds like you are making good progress - thanks for the update. Have >> you thought about how the PROT_MTE mappings might work if QEMU itself >> were to use MTE? My worry is that we end up with MTE in a guest >> preventing QEMU from using MTE itself (because of the PROT_MTE >> mappings). I'm hoping QEMU can wrap its use of guest memory in a >> sequence which disables tag checking (something similar will be needed >> for the "protected VM" use case anyway), but this isn't something I've >> looked into. > > It's not entirely the same as the "protected VM" case. For that > the patches currently on list basically special case "this is a > debug access (eg from gdbstub/monitor)" which then either gets > to go via "decrypt guest RAM for debug" or gets failed depending > on whether the VM has a debug-is-ok flag enabled. For an MTE > guest the common case will be guests doing standard DMA operations > to or from guest memory. The ideal API for that from QEMU's > point of view would be "accesses to guest RAM don't do tag > checks, even if tag checks are enabled for accesses QEMU does to > memory it has allocated itself as a normal userspace program". Sorry, I know I simplified it rather by saying it's similar to protected VM. 
Basically as I see it there are three types of memory access: 1) Debug case - has to go via a special case for decryption or ignoring the MTE tag value. Hopefully this can be abstracted in the same way. 2) Migration - for a protected VM there's likely to be a special method to allow the VMM access to the encrypted memory (AFAIK memory is usually kept inaccessible to the VMM). For MTE this again has to be special cased as we actually want both the data and the tag values. 3) Device DMA - for a protected VM it's usual to unencrypt a small area of memory (with the permission of the guest) and use that as a bounce buffer. This is possible with MTE: have an area the VMM purposefully maps with PROT_MTE. The issue is that this has a performance overhead and we can do better with MTE because it's trivial for the VMM to disable the protection for any memory. The part I'm unsure on is how easy it is for QEMU to deal with (3) without the overhead of bounce buffers. Ideally there'd already be a wrapper for guest memory accesses and that could just be wrapped with setting TCO during the access. I suspect the actual situation is more complex though, and I'm hoping Haibo's investigations will help us understand this. Thanks, Steve ^ permalink raw reply [flat|nested] 38+ messages in thread
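The TCO wrapping described in (3) can be sketched with a toy model (plain Python standing in for the hardware; `TaggedMemory`, the `tco` flag and the fault behaviour are illustrative, not a real kernel or QEMU API): each access compares the pointer's logical tag against the granule's allocation tag, and an override bit skips the check entirely.

```python
# Toy model of MTE tag checking (NOT the real architecture or kernel API):
# memory is tagged per 16-byte granule; a load via a tagged pointer faults
# unless the pointer's logical tag matches the granule's allocation tag,
# the pointer carries the match-all tag 0xF, or tag-check override is set.
GRANULE = 16

class TaggedMemory:
    def __init__(self, size):
        self.data = bytearray(size)
        self.tags = [0] * (size // GRANULE)  # one 4-bit allocation tag per granule
        self.tco = False                     # stand-in for PSTATE.TCO

    def set_tag(self, addr, tag):
        self.tags[addr // GRANULE] = tag & 0xF

    def load(self, addr, ptr_tag):
        # Match-all tag 0xF and tag-check override both bypass the check.
        if not self.tco and ptr_tag != 0xF and ptr_tag != self.tags[addr // GRANULE]:
            raise MemoryError("tag check fault")
        return self.data[addr]

mem = TaggedMemory(64)
mem.set_tag(0, 0x3)            # the guest tagged this granule with 3

assert mem.load(0, 0x3) == 0   # matching tag: access allowed
try:
    mem.load(0, 0x7)           # VMM access with the wrong tag: faults
except MemoryError:
    pass

mem.tco = True                 # wrap the access with the override set...
assert mem.load(0, 0x7) == 0   # ...and it succeeds, no bounce buffer needed
```

In a real VMM the equivalent of the last two lines would be toggling PSTATE.TCO (or a prctl() tag-check mode) around the guest-memory access, rather than setting a flag on an object.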
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-07 15:45 ` Steven Price @ 2020-12-07 16:05 ` Marc Zyngier 2020-12-07 16:34 ` Catalin Marinas 2020-12-07 16:44 ` Dr. David Alan Gilbert 1 sibling, 1 reply; 38+ messages in thread From: Marc Zyngier @ 2020-12-07 16:05 UTC (permalink / raw) To: Steven Price Cc: Peter Maydell, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, Richard Henderson, QEMU Developers, Dr. David Alan Gilbert, Catalin Marinas, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On 2020-12-07 15:45, Steven Price wrote: > On 07/12/2020 15:27, Peter Maydell wrote: >> On Mon, 7 Dec 2020 at 14:48, Steven Price <steven.price@arm.com> >> wrote: >>> Sounds like you are making good progress - thanks for the update. >>> Have >>> you thought about how the PROT_MTE mappings might work if QEMU itself >>> were to use MTE? My worry is that we end up with MTE in a guest >>> preventing QEMU from using MTE itself (because of the PROT_MTE >>> mappings). I'm hoping QEMU can wrap its use of guest memory in a >>> sequence which disables tag checking (something similar will be >>> needed >>> for the "protected VM" use case anyway), but this isn't something >>> I've >>> looked into. >> >> It's not entirely the same as the "protected VM" case. For that >> the patches currently on list basically special case "this is a >> debug access (eg from gdbstub/monitor)" which then either gets >> to go via "decrypt guest RAM for debug" or gets failed depending >> on whether the VM has a debug-is-ok flag enabled. For an MTE >> guest the common case will be guests doing standard DMA operations >> to or from guest memory. The ideal API for that from QEMU's >> point of view would be "accesses to guest RAM don't do tag >> checks, even if tag checks are enabled for accesses QEMU does to >> memory it has allocated itself as a normal userspace program". > > Sorry, I know I simplified it rather by saying it's similar to > protected VM. 
Basically as I see it there are three types of memory > access: > > 1) Debug case - has to go via a special case for decryption or > ignoring the MTE tag value. Hopefully this can be abstracted in the > same way. > > 2) Migration - for a protected VM there's likely to be a special > method to allow the VMM access to the encrypted memory (AFAIK memory > is usually kept inaccessible to the VMM). For MTE this again has to be > special cased as we actually want both the data and the tag values. > > 3) Device DMA - for a protected VM it's usual to unencrypt a small > area of memory (with the permission of the guest) and use that as a > bounce buffer. This is possible with MTE: have an area the VMM > purposefully maps with PROT_MTE. The issue is that this has a > performance overhead and we can do better with MTE because it's > trivial for the VMM to disable the protection for any memory. > > The part I'm unsure on is how easy it is for QEMU to deal with (3) > without the overhead of bounce buffers. Ideally there'd already be a > wrapper for guest memory accesses and that could just be wrapped with > setting TCO during the access. I suspect the actual situation is more > complex though, and I'm hoping Haibo's investigations will help us > understand this. What I'd really like to see is a description of how shared memory is, in general, supposed to work with MTE. My gut feeling is that it doesn't, and that you need to turn MTE off when sharing memory (either implicitly or explicitly). Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-07 16:05 ` Marc Zyngier @ 2020-12-07 16:34 ` Catalin Marinas 2020-12-07 19:03 ` Marc Zyngier 0 siblings, 1 reply; 38+ messages in thread From: Catalin Marinas @ 2020-12-07 16:34 UTC (permalink / raw) To: Marc Zyngier Cc: Steven Price, Peter Maydell, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, Richard Henderson, QEMU Developers, Dr. David Alan Gilbert, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On Mon, Dec 07, 2020 at 04:05:55PM +0000, Marc Zyngier wrote: > On 2020-12-07 15:45, Steven Price wrote: > > On 07/12/2020 15:27, Peter Maydell wrote: > > > On Mon, 7 Dec 2020 at 14:48, Steven Price <steven.price@arm.com> > > > wrote: > > > > Sounds like you are making good progress - thanks for the > > > > update. Have > > > > you thought about how the PROT_MTE mappings might work if QEMU itself > > > > were to use MTE? My worry is that we end up with MTE in a guest > > > > preventing QEMU from using MTE itself (because of the PROT_MTE > > > > mappings). I'm hoping QEMU can wrap its use of guest memory in a > > > > sequence which disables tag checking (something similar will be > > > > needed > > > > for the "protected VM" use case anyway), but this isn't > > > > something I've > > > > looked into. > > > > > > It's not entirely the same as the "protected VM" case. For that > > > the patches currently on list basically special case "this is a > > > debug access (eg from gdbstub/monitor)" which then either gets > > > to go via "decrypt guest RAM for debug" or gets failed depending > > > on whether the VM has a debug-is-ok flag enabled. For an MTE > > > guest the common case will be guests doing standard DMA operations > > > to or from guest memory. The ideal API for that from QEMU's > > > point of view would be "accesses to guest RAM don't do tag > > > checks, even if tag checks are enabled for accesses QEMU does to > > > memory it has allocated itself as a normal userspace program". 
> > > > Sorry, I know I simplified it rather by saying it's similar to > > protected VM. Basically as I see it there are three types of memory > > access: > > > > 1) Debug case - has to go via a special case for decryption or > > ignoring the MTE tag value. Hopefully this can be abstracted in the > > same way. > > > > 2) Migration - for a protected VM there's likely to be a special > > method to allow the VMM access to the encrypted memory (AFAIK memory > > is usually kept inaccessible to the VMM). For MTE this again has to be > > special cased as we actually want both the data and the tag values. > > > > 3) Device DMA - for a protected VM it's usual to unencrypt a small > > area of memory (with the permission of the guest) and use that as a > > bounce buffer. This is possible with MTE: have an area the VMM > > purposefully maps with PROT_MTE. The issue is that this has a > > performance overhead and we can do better with MTE because it's > > trivial for the VMM to disable the protection for any memory. > > > > The part I'm unsure on is how easy it is for QEMU to deal with (3) > > without the overhead of bounce buffers. Ideally there'd already be a > > wrapper for guest memory accesses and that could just be wrapped with > > setting TCO during the access. I suspect the actual situation is more > > complex though, and I'm hoping Haibo's investigations will help us > > understand this. > > What I'd really like to see is a description of how shared memory > is, in general, supposed to work with MTE. My gut feeling is that > it doesn't, and that you need to turn MTE off when sharing memory > (either implicitly or explicitly). The allocation tag (in-memory tag) is a property assigned to a physical address range and it can be safely shared between different processes as long as they access it via pointers with the same allocation tag (bits 59:56). The kernel enables such tagged shared memory for user processes (anonymous, tmpfs, shmem). 
What we don't have in the architecture is a memory type which allows access to tags but no tag checking. To access the data when the tags aren't known, the tag checking would have to be disabled via either a prctl() or by setting the PSTATE.TCO bit. The kernel accesses the user memory via the linear map using a match-all tag 0xf, so no TCO bit toggling. For userspace, however, we disabled the match-all tag and it cannot be enabled at run-time (at least not easily, it's cached in the TLB). However, we already have two modes to disable tag checking which QEMU could use when migrating data+tags. -- Catalin ^ permalink raw reply [flat|nested] 38+ messages in thread
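The tag placement described above can be sketched in a few lines (an illustrative Python model of the architecture's rules, not kernel code): the logical tag occupies address bits 59:56, and the match-all tag 0xf passes every check.

```python
# Illustrative model of MTE logical tags in a 64-bit pointer (bits 59:56).
TAG_SHIFT = 56
TAG_MASK = 0xF << TAG_SHIFT

def set_logical_tag(ptr, tag):
    """Return ptr with a 4-bit logical tag placed in bits 59:56."""
    return (ptr & ~TAG_MASK) | ((tag & 0xF) << TAG_SHIFT)

def logical_tag(ptr):
    return (ptr >> TAG_SHIFT) & 0xF

def tag_check(ptr, allocation_tag):
    """Per-access check: the match-all tag 0xF always passes."""
    t = logical_tag(ptr)
    return t == 0xF or t == allocation_tag

p = set_logical_tag(0x0000_AAAA_BBBB_0000, 0x5)
assert logical_tag(p) == 0x5
assert tag_check(p, 0x5)                        # same allocation tag: allowed
assert not tag_check(p, 0x9)                    # mismatch: tag check fault
assert tag_check(set_logical_tag(p, 0xF), 0x9)  # kernel-style match-all access
```

This is also why sharing works between cooperating processes: the allocation tag belongs to the physical granule, so any mapping that accesses it with the same logical tag (or the match-all tag) passes the check.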
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-07 16:34 ` Catalin Marinas @ 2020-12-07 19:03 ` Marc Zyngier 2020-12-08 17:21 ` Catalin Marinas 0 siblings, 1 reply; 38+ messages in thread From: Marc Zyngier @ 2020-12-07 19:03 UTC (permalink / raw) To: Catalin Marinas Cc: Steven Price, Peter Maydell, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, Richard Henderson, QEMU Developers, Dr. David Alan Gilbert, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On Mon, 07 Dec 2020 16:34:05 +0000, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Mon, Dec 07, 2020 at 04:05:55PM +0000, Marc Zyngier wrote: > > What I'd really like to see is a description of how shared memory > > is, in general, supposed to work with MTE. My gut feeling is that > > it doesn't, and that you need to turn MTE off when sharing memory > > (either implicitly or explicitly). > > The allocation tag (in-memory tag) is a property assigned to a physical > address range and it can be safely shared between different processes as > long as they access it via pointers with the same allocation tag (bits > 59:56). The kernel enables such tagged shared memory for user processes > (anonymous, tmpfs, shmem). I think that's one case where the shared memory scheme breaks, as we have two kernels in charge of their own tags, and they obviously can't be synchronised. > What we don't have in the architecture is a memory type which allows > access to tags but no tag checking. To access the data when the tags > aren't known, the tag checking would have to be disabled via either a > prctl() or by setting the PSTATE.TCO bit. I guess that's point (3) in Steven's taxonomy. It's still a bit ugly to fit in an existing piece of userspace, especially if it wants to use MTE for its own benefit. > The kernel accesses the user memory via the linear map using a match-all > tag 0xf, so no TCO bit toggling. 
For user, however, we disabled such > match-all tag and it cannot be enabled at run-time (at least not easily, > it's cached in the TLB). However, we already have two modes to disable > tag checking which Qemu could use when migrating data+tags. I wonder whether we will have to have something kernel side to dump/reload tags in a way that matches the patterns used by live migration. M. -- Without deviation from the norm, progress is not possible. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-07 19:03 ` Marc Zyngier @ 2020-12-08 17:21 ` Catalin Marinas 2020-12-08 18:21 ` Marc Zyngier 0 siblings, 1 reply; 38+ messages in thread From: Catalin Marinas @ 2020-12-08 17:21 UTC (permalink / raw) To: Marc Zyngier Cc: Steven Price, Peter Maydell, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, Richard Henderson, QEMU Developers, Dr. David Alan Gilbert, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On Mon, Dec 07, 2020 at 07:03:13PM +0000, Marc Zyngier wrote: > On Mon, 07 Dec 2020 16:34:05 +0000, > Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Mon, Dec 07, 2020 at 04:05:55PM +0000, Marc Zyngier wrote: > > > What I'd really like to see is a description of how shared memory > > > is, in general, supposed to work with MTE. My gut feeling is that > > > it doesn't, and that you need to turn MTE off when sharing memory > > > (either implicitly or explicitly). > > > > The allocation tag (in-memory tag) is a property assigned to a physical > > address range and it can be safely shared between different processes as > > long as they access it via pointers with the same allocation tag (bits > > 59:56). The kernel enables such tagged shared memory for user processes > > (anonymous, tmpfs, shmem). > > I think that's one case where the shared memory scheme breaks, as we > have two kernels in charge of their own tags, and they obviously can't > be synchronised Yes, if you can't trust the other entity to not change the tags, the only option is to do an untagged access. > > What we don't have in the architecture is a memory type which allows > > access to tags but no tag checking. To access the data when the tags > > aren't known, the tag checking would have to be disabled via either a > > prctl() or by setting the PSTATE.TCO bit. > > I guess that's point (3) in Steven's taxonomy. 
It still a bit ugly to > fit in an existing piece of userspace, specially if it wants to use > MTE for its own benefit. I agree it's ugly. For the device DMA emulation case, the only sane way is to mimic what a real device does - no tag checking. For a generic implementation, this means that such shared memory should not be mapped with PROT_MTE on the VMM side. I guess this leads to your point that sharing doesn't work for this scenario ;). > > The kernel accesses the user memory via the linear map using a match-all > > tag 0xf, so no TCO bit toggling. For user, however, we disabled such > > match-all tag and it cannot be enabled at run-time (at least not easily, > > it's cached in the TLB). However, we already have two modes to disable > > tag checking which Qemu could use when migrating data+tags. > > I wonder whether we will have to have something kernel side to > dump/reload tags in a way that matches the patterns used by live > migration. We have something related - ptrace dumps/restores the tags. Can the same concept be expanded to a KVM ioctl? -- Catalin ^ permalink raw reply [flat|nested] 38+ messages in thread
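Whatever interface ends up dumping/restoring the tags, the payload is compact: one 4-bit allocation tag per 16-byte granule, so a 4KiB page's tags pack into 128 bytes. A sketch of that packing (illustrative Python; the actual ptrace or ioctl wire format may differ, e.g. it could carry one tag per byte):

```python
# Illustrative packing of MTE allocation tags for a dump/restore interface:
# one 4-bit tag per 16-byte granule, two tags per byte in the stream.
GRANULE = 16
PAGE = 4096

def pack_tags(tags):
    """Pack a page's worth of 4-bit tags into a byte stream."""
    out = bytearray()
    for lo, hi in zip(tags[0::2], tags[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)

def unpack_tags(blob):
    """Inverse of pack_tags()."""
    tags = []
    for b in blob:
        tags.append(b & 0xF)
        tags.append(b >> 4)
    return tags

tags = [(g * 7) % 16 for g in range(PAGE // GRANULE)]  # 256 granule tags
blob = pack_tags(tags)
assert len(blob) == 128            # 4KiB of guest data -> 128 bytes of tags
assert unpack_tags(blob) == tags   # lossless round trip
```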
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-08 17:21 ` Catalin Marinas @ 2020-12-08 18:21 ` Marc Zyngier 2020-12-09 12:44 ` Catalin Marinas 0 siblings, 1 reply; 38+ messages in thread From: Marc Zyngier @ 2020-12-08 18:21 UTC (permalink / raw) To: Catalin Marinas Cc: Steven Price, Peter Maydell, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, Richard Henderson, QEMU Developers, Dr. David Alan Gilbert, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On 2020-12-08 17:21, Catalin Marinas wrote: > On Mon, Dec 07, 2020 at 07:03:13PM +0000, Marc Zyngier wrote: >> On Mon, 07 Dec 2020 16:34:05 +0000, >> Catalin Marinas <catalin.marinas@arm.com> wrote: >> > On Mon, Dec 07, 2020 at 04:05:55PM +0000, Marc Zyngier wrote: >> > > What I'd really like to see is a description of how shared memory >> > > is, in general, supposed to work with MTE. My gut feeling is that >> > > it doesn't, and that you need to turn MTE off when sharing memory >> > > (either implicitly or explicitly). >> > >> > The allocation tag (in-memory tag) is a property assigned to a physical >> > address range and it can be safely shared between different processes as >> > long as they access it via pointers with the same allocation tag (bits >> > 59:56). The kernel enables such tagged shared memory for user processes >> > (anonymous, tmpfs, shmem). >> >> I think that's one case where the shared memory scheme breaks, as we >> have two kernels in charge of their own tags, and they obviously can't >> be synchronised > > Yes, if you can't trust the other entity to not change the tags, the > only option is to do an untagged access. > >> > What we don't have in the architecture is a memory type which allows >> > access to tags but no tag checking. To access the data when the tags >> > aren't known, the tag checking would have to be disabled via either a >> > prctl() or by setting the PSTATE.TCO bit. >> >> I guess that's point (3) in Steven's taxonomy. 
It still a bit ugly to >> fit in an existing piece of userspace, specially if it wants to use >> MTE for its own benefit. > > I agree it's ugly. For the device DMA emulation case, the only sane way > is to mimic what a real device does - no tag checking. For a generic > implementation, this means that such shared memory should not be mapped > with PROT_MTE on the VMM side. I guess this leads to your point that > sharing doesn't work for this scenario ;). Exactly ;-) >> > The kernel accesses the user memory via the linear map using a match-all >> > tag 0xf, so no TCO bit toggling. For user, however, we disabled such >> > match-all tag and it cannot be enabled at run-time (at least not easily, >> > it's cached in the TLB). However, we already have two modes to disable >> > tag checking which Qemu could use when migrating data+tags. >> >> I wonder whether we will have to have something kernel side to >> dump/reload tags in a way that matches the patterns used by live >> migration. > > We have something related - ptrace dumps/resores the tags. Can the same > concept be expanded to a KVM ioctl? Yes, although I wonder whether we should integrate this deeply into the dirty-log mechanism: it would be really interesting to dump the tags at the point where the page is flagged as clean from a dirty-log point of view. As the page is dirtied, discard the saved tags. It is probably expensive, but it ensures that the VMM sees consistent tags (if the page is clean, the tags are valid). Of course, it comes with the added requirement that the VMM allocates enough memory to store the tags, which may be a tall order. I'm not sure how to give a consistent view to userspace otherwise. It'd be worth looking at how much we can reuse from the ptrace (and I expect swap?) code to implement this. Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-08 18:21 ` Marc Zyngier @ 2020-12-09 12:44 ` Catalin Marinas 2020-12-09 13:25 ` Marc Zyngier 0 siblings, 1 reply; 38+ messages in thread From: Catalin Marinas @ 2020-12-09 12:44 UTC (permalink / raw) To: Marc Zyngier Cc: Steven Price, Peter Maydell, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, Richard Henderson, QEMU Developers, Dr. David Alan Gilbert, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On Tue, Dec 08, 2020 at 06:21:12PM +0000, Marc Zyngier wrote: > On 2020-12-08 17:21, Catalin Marinas wrote: > > On Mon, Dec 07, 2020 at 07:03:13PM +0000, Marc Zyngier wrote: > > > I wonder whether we will have to have something kernel side to > > > dump/reload tags in a way that matches the patterns used by live > > > migration. > > > > We have something related - ptrace dumps/resores the tags. Can the same > > concept be expanded to a KVM ioctl? > > Yes, although I wonder whether we should integrate this deeply into > the dirty-log mechanism: it would be really interesting to dump the > tags at the point where the page is flagged as clean from a dirty-log > point of view. As the page is dirtied, discard the saved tags. From the VMM perspective, the tags can be treated just like additional (meta)data in a page. We'd only need the tags when copying over. It can race with the VM dirtying the page (writing tags would dirty it) but I don't think the current migration code cares about this. If dirtied, it copies it again. The only downside I see is an extra syscall per page both on the origin VMM and the destination one to dump/restore the tags. Is this a performance issue? -- Catalin ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-09 12:44 ` Catalin Marinas @ 2020-12-09 13:25 ` Marc Zyngier 2020-12-09 15:27 ` Catalin Marinas 0 siblings, 1 reply; 38+ messages in thread From: Marc Zyngier @ 2020-12-09 13:25 UTC (permalink / raw) To: Catalin Marinas Cc: Steven Price, Peter Maydell, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, Richard Henderson, QEMU Developers, Dr. David Alan Gilbert, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On 2020-12-09 12:44, Catalin Marinas wrote: > On Tue, Dec 08, 2020 at 06:21:12PM +0000, Marc Zyngier wrote: >> On 2020-12-08 17:21, Catalin Marinas wrote: >> > On Mon, Dec 07, 2020 at 07:03:13PM +0000, Marc Zyngier wrote: >> > > I wonder whether we will have to have something kernel side to >> > > dump/reload tags in a way that matches the patterns used by live >> > > migration. >> > >> > We have something related - ptrace dumps/resores the tags. Can the same >> > concept be expanded to a KVM ioctl? >> >> Yes, although I wonder whether we should integrate this deeply into >> the dirty-log mechanism: it would be really interesting to dump the >> tags at the point where the page is flagged as clean from a dirty-log >> point of view. As the page is dirtied, discard the saved tags. > > From the VMM perspective, the tags can be treated just like additional > (meta)data in a page. We'd only need the tags when copying over. It can > race with the VM dirtying the page (writing tags would dirty it) but I > don't think the current migration code cares about this. If dirtied, it > copies it again. > > The only downside I see is an extra syscall per page both on the origin > VMM and the destination one to dump/restore the tags. Is this a > performance issue? I'm not sure. Migrating VMs already has a massive overhead, so an extra syscall per page isn't terrifying. 
But that's the point where I admit not knowing enough about what the VMM expects, nor whether that matches what happens on other architectures that deal with per-page metadata. Would this syscall operate on the guest address space? Or on the VMM's own mapping? M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-09 13:25 ` Marc Zyngier @ 2020-12-09 15:27 ` Catalin Marinas 2020-12-09 18:27 ` Richard Henderson 0 siblings, 1 reply; 38+ messages in thread From: Catalin Marinas @ 2020-12-09 15:27 UTC (permalink / raw) To: Marc Zyngier Cc: Steven Price, Peter Maydell, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, Richard Henderson, QEMU Developers, Dr. David Alan Gilbert, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On Wed, Dec 09, 2020 at 01:25:18PM +0000, Marc Zyngier wrote: > On 2020-12-09 12:44, Catalin Marinas wrote: > > On Tue, Dec 08, 2020 at 06:21:12PM +0000, Marc Zyngier wrote: > > > On 2020-12-08 17:21, Catalin Marinas wrote: > > > > On Mon, Dec 07, 2020 at 07:03:13PM +0000, Marc Zyngier wrote: > > > > > I wonder whether we will have to have something kernel side to > > > > > dump/reload tags in a way that matches the patterns used by live > > > > > migration. > > > > > > > > We have something related - ptrace dumps/resores the tags. Can the same > > > > concept be expanded to a KVM ioctl? > > > > > > Yes, although I wonder whether we should integrate this deeply into > > > the dirty-log mechanism: it would be really interesting to dump the > > > tags at the point where the page is flagged as clean from a dirty-log > > > point of view. As the page is dirtied, discard the saved tags. > > > > From the VMM perspective, the tags can be treated just like additional > > (meta)data in a page. We'd only need the tags when copying over. It can > > race with the VM dirtying the page (writing tags would dirty it) but I > > don't think the current migration code cares about this. If dirtied, it > > copies it again. > > > > The only downside I see is an extra syscall per page both on the origin > > VMM and the destination one to dump/restore the tags. Is this a > > performance issue? > > I'm not sure. Migrating VMs already has a massive overhead, so an extra > syscall per page isn't terrifying. 
But that's the point where I admit > not knowing enough about what the VMM expects, nor whether that matches > what happens on other architectures that deal with per-page metadata. > > Would this syscall operate on the guest address space? Or on the VMM's > own mapping? Whatever is easier for the VMM, I don't think it matters as long as the host kernel can get the actual physical address (and linear map correspondent). Maybe simpler if it's the VMM address space as the kernel can check the access permissions in case you want to hide the guest memory from the VMM for other reasons (migration is also off the table). Without syscalls, an option would be for the VMM to create two mappings: one with PROT_MTE for migration and the other without for normal DMA etc. That's achievable using memfd_create() or shm_open() and two mmap() calls, only one having PROT_MTE. The VMM address space should be sufficiently large to map two guest IPAs. -- Catalin ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-09 15:27 ` Catalin Marinas @ 2020-12-09 18:27 ` Richard Henderson 2020-12-09 18:39 ` Catalin Marinas 0 siblings, 1 reply; 38+ messages in thread From: Richard Henderson @ 2020-12-09 18:27 UTC (permalink / raw) To: Catalin Marinas, Marc Zyngier Cc: Steven Price, Peter Maydell, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, QEMU Developers, Dr. David Alan Gilbert, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On 12/9/20 9:27 AM, Catalin Marinas wrote: > On Wed, Dec 09, 2020 at 01:25:18PM +0000, Marc Zyngier wrote: >> Would this syscall operate on the guest address space? Or on the VMM's >> own mapping? ... > Whatever is easier for the VMM, I don't think it matters as long as the > host kernel can get the actual physical address (and linear map > correspondent). Maybe simpler if it's the VMM address space as the > kernel can check the access permissions in case you want to hide the > guest memory from the VMM for other reasons (migration is also off the > table). Indeed, such a syscall is no longer specific to vmm's and may be used for any bulk move of tags that userland might want. > Without syscalls, an option would be for the VMM to create two mappings: > one with PROT_MTE for migration and the other without for normal DMA > etc. That's achievable using memfd_create() or shm_open() and two mmap() > calls, only one having PROT_MTE. The VMM address space should be > sufficiently large to map two guest IPAs. I would have thought that the best way is to use TCO, so that we don't have to have dual mappings (and however many MB of extra page tables that might imply). r~ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-09 18:27 ` Richard Henderson @ 2020-12-09 18:39 ` Catalin Marinas 2020-12-09 20:13 ` Richard Henderson 0 siblings, 1 reply; 38+ messages in thread From: Catalin Marinas @ 2020-12-09 18:39 UTC (permalink / raw) To: Richard Henderson Cc: Marc Zyngier, Steven Price, Peter Maydell, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, QEMU Developers, Dr. David Alan Gilbert, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On Wed, Dec 09, 2020 at 12:27:59PM -0600, Richard Henderson wrote: > On 12/9/20 9:27 AM, Catalin Marinas wrote: > > On Wed, Dec 09, 2020 at 01:25:18PM +0000, Marc Zyngier wrote: > >> Would this syscall operate on the guest address space? Or on the VMM's > >> own mapping? > ... > > Whatever is easier for the VMM, I don't think it matters as long as the > > host kernel can get the actual physical address (and linear map > > correspondent). Maybe simpler if it's the VMM address space as the > > kernel can check the access permissions in case you want to hide the > > guest memory from the VMM for other reasons (migration is also off the > > table). > > Indeed, such a syscall is no longer specific to vmm's and may be used for any > bulk move of tags that userland might want. For CRIU, I think the current ptrace interface would do. With VMMs, the same remote VM model doesn't apply (the "remote" VM is actually the guest memory). I'd keep this under a KVM ioctl() number rather than a new, specific syscall. > > Without syscalls, an option would be for the VMM to create two mappings: > > one with PROT_MTE for migration and the other without for normal DMA > > etc. That's achievable using memfd_create() or shm_open() and two mmap() > > calls, only one having PROT_MTE. The VMM address space should be > > sufficiently large to map two guest IPAs. 
> > I would have thought that the best way is to use TCO, so that we don't have to > have dual mappings (and however many MB of extra page tables that might imply). The problem appears when the VMM wants to use MTE itself (e.g. linked against an MTE-aware glibc), toggling TCO is no longer generic enough, especially when it comes to device emulation. -- Catalin ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-09 18:39 ` Catalin Marinas @ 2020-12-09 20:13 ` Richard Henderson 2020-12-09 20:20 ` Peter Maydell 0 siblings, 1 reply; 38+ messages in thread From: Richard Henderson @ 2020-12-09 20:13 UTC (permalink / raw) To: Catalin Marinas Cc: Marc Zyngier, Steven Price, Peter Maydell, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, QEMU Developers, Dr. David Alan Gilbert, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On 12/9/20 12:39 PM, Catalin Marinas wrote: >> I would have thought that the best way is to use TCO, so that we don't have to >> have dual mappings (and however many MB of extra page tables that might imply). > > The problem appears when the VMM wants to use MTE itself (e.g. linked > against an MTE-aware glibc), toggling TCO is no longer generic enough, > especially when it comes to device emulation. But we do know exactly when we're manipulating guest memory -- we have special routines for that. So the special routines gain a toggle of TCO around the exact guest memory manipulation, not a blanket disable of MTE across large swaths of QEMU. r~ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-09 20:13 ` Richard Henderson @ 2020-12-09 20:20 ` Peter Maydell 0 siblings, 0 replies; 38+ messages in thread From: Peter Maydell @ 2020-12-09 20:20 UTC (permalink / raw) To: Richard Henderson Cc: Catalin Marinas, Marc Zyngier, Steven Price, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, QEMU Developers, Dr. David Alan Gilbert, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On Wed, 9 Dec 2020 at 20:13, Richard Henderson <richard.henderson@linaro.org> wrote: > > On 12/9/20 12:39 PM, Catalin Marinas wrote: > >> I would have thought that the best way is to use TCO, so that we don't have to > >> have dual mappings (and however many MB of extra page tables that might imply). > > > > The problem appears when the VMM wants to use MTE itself (e.g. linked > > against an MTE-aware glibc), toggling TCO is no longer generic enough, > > especially when it comes to device emulation. > > But we do know exactly when we're manipulating guest memory -- we have special > routines for that. Well, yes and no. It's not like every access to guest memory is through a specific set of "load from guest"/"store from guest" functions, and in some situations there's a "get a pointer to guest RAM, keep using it over a long-ish sequence of QEMU code, then be done with it" pattern. It's because it's not that trivial to isolate when something is accessing guest RAM that I don't want to just have it be mapped PROT_MTE into QEMU. I think we'd end up spending a lot of time hunting down "whoops, turns out this is accessing guest RAM and sometimes it trips over the tags in a hard-to-debug way" bugs. I'd much rather the kernel just provided us with an API for what we want, which is (1) the guest RAM as just RAM with no tag checking and separately (2) some mechanism yet-to-be-designed which lets us bulk copy a page's worth of tags for migration. thanks -- PMM ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-07 15:45 ` Steven Price 2020-12-07 16:05 ` Marc Zyngier @ 2020-12-07 16:44 ` Dr. David Alan Gilbert 2020-12-07 17:10 ` Peter Maydell 2020-12-08 10:05 ` Haibo Xu 1 sibling, 2 replies; 38+ messages in thread From: Dr. David Alan Gilbert @ 2020-12-07 16:44 UTC (permalink / raw) To: Steven Price, dgibson Cc: Peter Maydell, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, Marc Zyngier, Richard Henderson, QEMU Developers, Catalin Marinas, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin * Steven Price (steven.price@arm.com) wrote: > On 07/12/2020 15:27, Peter Maydell wrote: > > On Mon, 7 Dec 2020 at 14:48, Steven Price <steven.price@arm.com> wrote: > > > Sounds like you are making good progress - thanks for the update. Have > > > you thought about how the PROT_MTE mappings might work if QEMU itself > > > were to use MTE? My worry is that we end up with MTE in a guest > > > preventing QEMU from using MTE itself (because of the PROT_MTE > > > mappings). I'm hoping QEMU can wrap its use of guest memory in a > > > sequence which disables tag checking (something similar will be needed > > > for the "protected VM" use case anyway), but this isn't something I've > > > looked into. > > > > It's not entirely the same as the "protected VM" case. For that > > the patches currently on list basically special case "this is a > > debug access (eg from gdbstub/monitor)" which then either gets > > to go via "decrypt guest RAM for debug" or gets failed depending > > on whether the VM has a debug-is-ok flag enabled. For an MTE > > guest the common case will be guests doing standard DMA operations > > to or from guest memory. The ideal API for that from QEMU's > > point of view would be "accesses to guest RAM don't do tag > > checks, even if tag checks are enabled for accesses QEMU does to > > memory it has allocated itself as a normal userspace program". 
> > Sorry, I know I simplified it rather by saying it's similar to protected VM. > Basically as I see it there are three types of memory access: > > 1) Debug case - has to go via a special case for decryption or ignoring the > MTE tag value. Hopefully this can be abstracted in the same way. > > 2) Migration - for a protected VM there's likely to be a special method to > allow the VMM access to the encrypted memory (AFAIK memory is usually kept > inaccessible to the VMM). For MTE this again has to be special cased as we > actually want both the data and the tag values. > > 3) Device DMA - for a protected VM it's usual to unencrypt a small area of > memory (with the permission of the guest) and use that as a bounce buffer. > This is possible with MTE: have an area the VMM purposefully maps with > PROT_MTE. The issue is that this has a performance overhead and we can do > better with MTE because it's trivial for the VMM to disable the protection > for any memory. Those all sound very similar to the AMD SEV world; there's the special case for Debug that Peter mentioned; migration is ...complicated and needs special case that's still being figured out, and as I understand Device DMA also uses a bounce buffer (and swiotlb in the guest to make that happen). I'm not sure about the stories for the IBM hardware equivalents. Dave > The part I'm unsure on is how easy it is for QEMU to deal with (3) without > the overhead of bounce buffers. Ideally there'd already be a wrapper for > guest memory accesses and that could just be wrapped with setting TCO during > the access. I suspect the actual situation is more complex though, and I'm > hoping Haibo's investigations will help us understand this. > > Thanks, > > Steve > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-07 16:44 ` Dr. David Alan Gilbert @ 2020-12-07 17:10 ` Peter Maydell 2020-12-07 17:44 ` Dr. David Alan Gilbert 2020-12-08 10:05 ` Haibo Xu 1 sibling, 1 reply; 38+ messages in thread From: Peter Maydell @ 2020-12-07 17:10 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Steven Price, David Gibson, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, Marc Zyngier, Richard Henderson, QEMU Developers, Catalin Marinas, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On Mon, 7 Dec 2020 at 16:44, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > * Steven Price (steven.price@arm.com) wrote: > > Sorry, I know I simplified it rather by saying it's similar to protected VM. > > Basically as I see it there are three types of memory access: > > > > 1) Debug case - has to go via a special case for decryption or ignoring the > > MTE tag value. Hopefully this can be abstracted in the same way. > > > > 2) Migration - for a protected VM there's likely to be a special method to > > allow the VMM access to the encrypted memory (AFAIK memory is usually kept > > inaccessible to the VMM). For MTE this again has to be special cased as we > > actually want both the data and the tag values. > > > > 3) Device DMA - for a protected VM it's usual to unencrypt a small area of > > memory (with the permission of the guest) and use that as a bounce buffer. > > This is possible with MTE: have an area the VMM purposefully maps with > > PROT_MTE. The issue is that this has a performance overhead and we can do > > better with MTE because it's trivial for the VMM to disable the protection > > for any memory. > > Those all sound very similar to the AMD SEV world; there's the special > case for Debug that Peter mentioned; migration is ...complicated and > needs special case that's still being figured out, and as I understand > Device DMA also uses a bounce buffer (and swiotlb in the guest to make > that happen). 
Mmm, but for encrypted VMs the VM has to jump through all these hoops because "don't let the VM directly access arbitrary guest RAM" is the whole point of the feature. For MTE, we don't want in general to be doing tag-checked accesses to guest RAM and there is nothing in the feature "allow guests to use MTE" that requires that the VMM's guest RAM accesses must do tag-checking. So we should avoid having a design that require us to jump through all the hoops. Even if it happens that handling encrypted VMs means that QEMU has to grow some infrastructure for carefully positioning hoops in appropriate places, we shouldn't use it unnecessarily... All we actually need is a mechanism for migrating the tags: I don't think there's ever a situation where you want tag-checking enabled for the VMM's accesses to the guest RAM. thanks -- PMM ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-07 17:10 ` Peter Maydell @ 2020-12-07 17:44 ` Dr. David Alan Gilbert 0 siblings, 0 replies; 38+ messages in thread From: Dr. David Alan Gilbert @ 2020-12-07 17:44 UTC (permalink / raw) To: Peter Maydell Cc: Steven Price, David Gibson, Haibo Xu, lkml - Kernel Mailing List, Juan Quintela, Marc Zyngier, Richard Henderson, QEMU Developers, Catalin Marinas, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin * Peter Maydell (peter.maydell@linaro.org) wrote: > On Mon, 7 Dec 2020 at 16:44, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > * Steven Price (steven.price@arm.com) wrote: > > > Sorry, I know I simplified it rather by saying it's similar to protected VM. > > > Basically as I see it there are three types of memory access: > > > > > > 1) Debug case - has to go via a special case for decryption or ignoring the > > > MTE tag value. Hopefully this can be abstracted in the same way. > > > > > > 2) Migration - for a protected VM there's likely to be a special method to > > > allow the VMM access to the encrypted memory (AFAIK memory is usually kept > > > inaccessible to the VMM). For MTE this again has to be special cased as we > > > actually want both the data and the tag values. > > > > > > 3) Device DMA - for a protected VM it's usual to unencrypt a small area of > > > memory (with the permission of the guest) and use that as a bounce buffer. > > > This is possible with MTE: have an area the VMM purposefully maps with > > > PROT_MTE. The issue is that this has a performance overhead and we can do > > > better with MTE because it's trivial for the VMM to disable the protection > > > for any memory. 
> > > > Those all sound very similar to the AMD SEV world; there's the special > > case for Debug that Peter mentioned; migration is ...complicated and > > needs special case that's still being figured out, and as I understand > > Device DMA also uses a bounce buffer (and swiotlb in the guest to make > > that happen). > > Mmm, but for encrypted VMs the VM has to jump through all these > hoops because "don't let the VM directly access arbitrary guest RAM" > is the whole point of the feature. For MTE, we don't want in general > to be doing tag-checked accesses to guest RAM and there is nothing > in the feature "allow guests to use MTE" that requires that the VMM's > guest RAM accesses must do tag-checking. So we should avoid having > a design that require us to jump through all the hoops. Yes agreed, that's a fair distinction. Dave Even if > it happens that handling encrypted VMs means that QEMU has to grow > some infrastructure for carefully positioning hoops in appropriate > places, we shouldn't use it unnecessarily... All we actually need is > a mechanism for migrating the tags: I don't think there's ever a > situation where you want tag-checking enabled for the VMM's accesses > to the guest RAM. > > thanks > -- PMM > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-07 16:44 ` Dr. David Alan Gilbert 2020-12-07 17:10 ` Peter Maydell @ 2020-12-08 10:05 ` Haibo Xu 1 sibling, 0 replies; 38+ messages in thread From: Haibo Xu @ 2020-12-08 10:05 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Steven Price, dgibson, Peter Maydell, lkml - Kernel Mailing List, Juan Quintela, Marc Zyngier, Richard Henderson, QEMU Developers, Catalin Marinas, Thomas Gleixner, Will Deacon, kvmarm, arm-mail-list, Dave Martin On Tue, 8 Dec 2020 at 00:44, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > * Steven Price (steven.price@arm.com) wrote: > > On 07/12/2020 15:27, Peter Maydell wrote: > > > On Mon, 7 Dec 2020 at 14:48, Steven Price <steven.price@arm.com> wrote: > > > > Sounds like you are making good progress - thanks for the update. Have > > > > you thought about how the PROT_MTE mappings might work if QEMU itself > > > > were to use MTE? My worry is that we end up with MTE in a guest > > > > preventing QEMU from using MTE itself (because of the PROT_MTE > > > > mappings). I'm hoping QEMU can wrap its use of guest memory in a > > > > sequence which disables tag checking (something similar will be needed > > > > for the "protected VM" use case anyway), but this isn't something I've > > > > looked into. > > > > > > It's not entirely the same as the "protected VM" case. For that > > > the patches currently on list basically special case "this is a > > > debug access (eg from gdbstub/monitor)" which then either gets > > > to go via "decrypt guest RAM for debug" or gets failed depending > > > on whether the VM has a debug-is-ok flag enabled. For an MTE > > > guest the common case will be guests doing standard DMA operations > > > to or from guest memory. The ideal API for that from QEMU's > > > point of view would be "accesses to guest RAM don't do tag > > > checks, even if tag checks are enabled for accesses QEMU does to > > > memory it has allocated itself as a normal userspace program". 
> > > > Sorry, I know I simplified it rather by saying it's similar to protected VM. > > Basically as I see it there are three types of memory access: > > > > 1) Debug case - has to go via a special case for decryption or ignoring the > > MTE tag value. Hopefully this can be abstracted in the same way. > > > > 2) Migration - for a protected VM there's likely to be a special method to > > allow the VMM access to the encrypted memory (AFAIK memory is usually kept > > inaccessible to the VMM). For MTE this again has to be special cased as we > > actually want both the data and the tag values. > > > > 3) Device DMA - for a protected VM it's usual to unencrypt a small area of > > memory (with the permission of the guest) and use that as a bounce buffer. > > This is possible with MTE: have an area the VMM purposefully maps with > > PROT_MTE. The issue is that this has a performance overhead and we can do > > better with MTE because it's trivial for the VMM to disable the protection > > for any memory. > > Those all sound very similar to the AMD SEV world; there's the special > case for Debug that Peter mentioned; migration is ...complicated and > needs special case that's still being figured out, and as I understand > Device DMA also uses a bounce buffer (and swiotlb in the guest to make > that happen). > > > I'm not sure about the stories for the IBM hardware equivalents. Like s390-skeys(storage keys) support in Qemu? I have read the migration support for the s390-skeys in Qemu and found that the logic is very similar to that of MTE, except the difference that the s390-skeys were migrated separately from that of the guest memory data while for MTE, I think the guest memory tags should go with the memory data. > > Dave > > > The part I'm unsure on is how easy it is for QEMU to deal with (3) without > > the overhead of bounce buffers. Ideally there'd already be a wrapper for > > guest memory accesses and that could just be wrapped with setting TCO during > > the access. 
I suspect the actual situation is more complex though, and I'm > > hoping Haibo's investigations will help us understand this. > > > > Thanks, > > > > Steve > > > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v5 0/2] MTE support for KVM guest 2020-12-07 14:48 ` Steven Price 2020-12-07 15:27 ` Peter Maydell @ 2020-12-08 9:51 ` Haibo Xu 2020-12-08 10:01 ` Marc Zyngier 2020-12-16 7:31 ` Haibo Xu 2 siblings, 1 reply; 38+ messages in thread From: Haibo Xu @ 2020-12-08 9:51 UTC (permalink / raw) To: Steven Price Cc: Marc Zyngier, Andrew Jones, Catalin Marinas, Juan Quintela, Richard Henderson, QEMU Developers, Dr. David Alan Gilbert, arm-mail-list, kvmarm, Thomas Gleixner, Will Deacon, Dave Martin, lkml - Kernel Mailing List On Mon, 7 Dec 2020 at 22:48, Steven Price <steven.price@arm.com> wrote: > > On 04/12/2020 08:25, Haibo Xu wrote: > > On Fri, 20 Nov 2020 at 17:51, Steven Price <steven.price@arm.com> wrote: > >> > >> On 19/11/2020 19:11, Marc Zyngier wrote: > >>> On 2020-11-19 18:42, Andrew Jones wrote: > >>>> On Thu, Nov 19, 2020 at 03:45:40PM +0000, Peter Maydell wrote: > >>>>> On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> wrote: > >>>>>> This series adds support for Arm's Memory Tagging Extension (MTE) to > >>>>>> KVM, allowing KVM guests to make use of it. This builds on the > >>>>> existing > >>>>>> user space support already in v5.10-rc1, see [1] for an overview. > >>>>> > >>>>>> The change to require the VMM to map all guest memory PROT_MTE is > >>>>>> significant as it means that the VMM has to deal with the MTE tags > >>>>> even > >>>>>> if it doesn't care about them (e.g. for virtual devices or if the VMM > >>>>>> doesn't support migration). Also unfortunately because the VMM can > >>>>>> change the memory layout at any time the check for PROT_MTE/VM_MTE has > >>>>>> to be done very late (at the point of faulting pages into stage 2). > >>>>> > >>>>> I'm a bit dubious about requring the VMM to map the guest memory > >>>>> PROT_MTE unless somebody's done at least a sketch of the design > >>>>> for how this would work on the QEMU side. 
> >>>>> Currently QEMU just
> >>>>> assumes the guest memory is guest memory and it can access it
> >>>>> without special precautions...
> >>>>>
> >>>>
> >>>> There are two statements being made here:
> >>>>
> >>>> 1) Requiring the use of PROT_MTE when mapping guest memory may not fit
> >>>> QEMU well.
> >>>>
> >>>> 2) New KVM features should be accompanied with supporting QEMU code in
> >>>> order to prove that the APIs make sense.
> >>>>
> >>>> I strongly agree with (2). While kvmtool supports some quick testing, it
> >>>> doesn't support migration. We must test all new features with a
> >>>> migration-supporting VMM.
> >>>>
> >>>> I'm not sure about (1). I don't feel like it should be a major problem,
> >>>> but (2).
> >>
> >> (1) seems to be contentious whichever way we go. Either PROT_MTE isn't
> >> required in which case it's easy to accidentally screw up migration, or
> >> it is required in which case it's difficult to handle normal guest
> >> memory from the VMM. I get the impression that probably I should go back
> >> to the previous approach - sorry for the distraction with this change.
> >>
> >> (2) isn't something I'm trying to skip, but I'm limited in what I can do
> >> myself so would appreciate help here. Haibo is looking into this.
> >>
> >
> > Hi Steven,
> >
> > Sorry for the late reply!
> >
> > I have finished the POC for the MTE migration support with the assumption
> > that all the memory is mapped with PROT_MTE. But I got stuck in the test
> > with a FVP setup. Previously, I successfully compiled a test case to verify
> > the basic function of MTE in a guest. But these days, the re-compiled test
> > can't be executed by the guest (very weird). The short plan to verify the
> > migration is to set the MTE tags on one page in the guest, and try to dump
> > the migrated memory contents.
>
> Hi Haibo,
>
> Sounds like you are making good progress - thanks for the update.
> Have
> you thought about how the PROT_MTE mappings might work if QEMU itself
> were to use MTE? My worry is that we end up with MTE in a guest
> preventing QEMU from using MTE itself (because of the PROT_MTE
> mappings). I'm hoping QEMU can wrap its use of guest memory in a
> sequence which disables tag checking (something similar will be needed
> for the "protected VM" use case anyway), but this isn't something I've
> looked into.

As far as I can see, mapping all the guest memory with PROT_MTE in the
VMM is a little weird, and lots of APIs would have to be changed to
include this flag. IMHO, it would be better if KVM could provide new
APIs to load/store the guest memory tags, which may make it easier to
enable the QEMU migration support.

> > I will update the status later next week!
>
> Great, I look forward to hearing how it goes.
>
> Thanks,
>
> Steve
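[Steven's suggestion above, wrapping the VMM's guest-memory accesses in a sequence that disables tag checking, can be modelled in miniature. The sketch below is purely illustrative Python, not arm64 code: on real hardware the toggle would be something like PSTATE.TCO, and every name here is invented.]

```python
from contextlib import contextmanager

GRANULE = 16  # MTE assigns one 4-bit allocation tag per 16-byte granule

class TaggedMemory:
    """Toy model of an MTE-tagged guest memory region."""
    def __init__(self, size):
        self.data = bytearray(size)
        self.tags = bytearray(size // GRANULE)  # one tag per granule
        self.checking = True  # models tag checking being enabled

    def store(self, addr, tag, value):
        # A tagged access faults when the pointer tag mismatches the
        # allocation tag, unless checking is suspended.
        if self.checking and self.tags[addr // GRANULE] != tag:
            raise MemoryError(f"tag check fault at {addr:#x}")
        self.data[addr] = value

    @contextmanager
    def tag_checks_suspended(self):
        """Models the VMM disabling tag checks around guest-memory access."""
        prev, self.checking = self.checking, False
        try:
            yield
        finally:
            self.checking = prev

mem = TaggedMemory(4096)
mem.tags[0] = 7            # the guest tagged granule 0 with tag 7
try:
    mem.store(0, 3, 0xAB)  # VMM access with the wrong tag: faults
except MemoryError:
    pass
with mem.tag_checks_suspended():
    mem.store(0, 3, 0xAB)  # the same access passes while checks are off
assert mem.data[0] == 0xAB
```

The point of the model is the shape of the fix Steven describes: rather than requiring the VMM to know every guest tag, it brackets its incidental accesses (virtual devices, migration) in a region where tag checking is off, and checking is restored afterwards.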
* Re: [PATCH v5 0/2] MTE support for KVM guest
  2020-12-08  9:51 ` Haibo Xu
@ 2020-12-08 10:01 ` Marc Zyngier
  2020-12-08 10:10 ` Haibo Xu
  0 siblings, 1 reply; 38+ messages in thread
From: Marc Zyngier @ 2020-12-08 10:01 UTC (permalink / raw)
To: Haibo Xu
Cc: Steven Price, Andrew Jones, Catalin Marinas, Juan Quintela,
    Richard Henderson, QEMU Developers, Dr. David Alan Gilbert,
    arm-mail-list, kvmarm, Thomas Gleixner, Will Deacon, Dave Martin,
    lkml - Kernel Mailing List

On 2020-12-08 09:51, Haibo Xu wrote:
> On Mon, 7 Dec 2020 at 22:48, Steven Price <steven.price@arm.com> wrote:
>>

[...]

>> Sounds like you are making good progress - thanks for the update. Have
>> you thought about how the PROT_MTE mappings might work if QEMU itself
>> were to use MTE? My worry is that we end up with MTE in a guest
>> preventing QEMU from using MTE itself (because of the PROT_MTE
>> mappings). I'm hoping QEMU can wrap its use of guest memory in a
>> sequence which disables tag checking (something similar will be needed
>> for the "protected VM" use case anyway), but this isn't something I've
>> looked into.
>
> As far as I can see, mapping all the guest memory with PROT_MTE in the
> VMM is a little weird, and lots of APIs would have to be changed to
> include this flag.
> IMHO, it would be better if KVM could provide new APIs to load/store
> the guest memory tags, which may make it easier to enable the QEMU
> migration support.

On what granularity? To what storage? How do you plan to synchronise
this with the dirty-log interface?

Thanks,

M.
--
Jazz is not dead. It just smells funny...
* Re: [PATCH v5 0/2] MTE support for KVM guest
  2020-12-08 10:01 ` Marc Zyngier
@ 2020-12-08 10:10 ` Haibo Xu
  0 siblings, 0 replies; 38+ messages in thread
From: Haibo Xu @ 2020-12-08 10:10 UTC (permalink / raw)
To: Marc Zyngier
Cc: Steven Price, Andrew Jones, Catalin Marinas, Juan Quintela,
    Richard Henderson, QEMU Developers, Dr. David Alan Gilbert,
    arm-mail-list, kvmarm, Thomas Gleixner, Will Deacon, Dave Martin,
    lkml - Kernel Mailing List

On Tue, 8 Dec 2020 at 18:01, Marc Zyngier <maz@kernel.org> wrote:
>
> On 2020-12-08 09:51, Haibo Xu wrote:
> > On Mon, 7 Dec 2020 at 22:48, Steven Price <steven.price@arm.com> wrote:
> >>
>
> [...]
>
> >> Sounds like you are making good progress - thanks for the update. Have
> >> you thought about how the PROT_MTE mappings might work if QEMU itself
> >> were to use MTE? My worry is that we end up with MTE in a guest
> >> preventing QEMU from using MTE itself (because of the PROT_MTE
> >> mappings). I'm hoping QEMU can wrap its use of guest memory in a
> >> sequence which disables tag checking (something similar will be needed
> >> for the "protected VM" use case anyway), but this isn't something I've
> >> looked into.
> >
> > As far as I can see, mapping all the guest memory with PROT_MTE in the
> > VMM is a little weird, and lots of APIs would have to be changed to
> > include this flag.
> > IMHO, it would be better if KVM could provide new APIs to load/store
> > the guest memory tags, which may make it easier to enable the QEMU
> > migration support.
>
> On what granularity? To what storage? How do you plan to synchronise
> this with the dirty-log interface?

QEMU migrates memory page by page, and if a page that has already been
migrated becomes dirty again, the migration process re-sends that dirty
page. The current MTE migration POC code sends the page tags just after
the page data; if a page becomes dirty again, the page data and the tags
are re-sent together.

>
> Thanks,
>
> M.
> --
> Jazz is not dead.
> It just smells funny...
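[The scheme Haibo describes, where a page's tags travel immediately behind its data so that a dirtied page automatically re-sends both, can be sketched as a small simulation. This is illustrative Python under invented names, not the actual QEMU migration code.]

```python
GRANULE = 16
PAGE = 4096

def send_page(stream, pfn, data, tags):
    # The POC scheme: a page's tags ride right behind its data, so any
    # re-send of a dirty page carries the current tags as well.
    stream.append((pfn, bytes(data), bytes(tags)))

def migrate(pages, dirty_log):
    """pages: pfn -> (data, tags); dirty_log yields sets of dirty pfns."""
    stream = []
    for pfn, (data, tags) in pages.items():   # initial bulk copy
        send_page(stream, pfn, data, tags)
    for dirty in dirty_log:                   # re-send whatever got dirtied
        for pfn in dirty:
            data, tags = pages[pfn]
            send_page(stream, pfn, data, tags)
    return stream

def receive(stream):
    # Later copies of a page overwrite earlier ones, so the destination
    # always ends up with the final data and the matching tags.
    dest = {}
    for pfn, data, tags in stream:
        dest[pfn] = (data, tags)
    return dest
```

Marc's questions about granularity and dirty-log synchronisation show up directly in this sketch: tags move at page granularity, and the only synchronisation point is that a dirty page's data and tags are captured and sent as one unit.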
* Re: [PATCH v5 0/2] MTE support for KVM guest
  2020-12-07 14:48 ` Steven Price
  2020-12-07 15:27 ` Peter Maydell
  2020-12-08  9:51 ` Haibo Xu
@ 2020-12-16  7:31 ` Haibo Xu
  2020-12-16 10:22 ` Steven Price
  2 siblings, 1 reply; 38+ messages in thread
From: Haibo Xu @ 2020-12-16  7:31 UTC (permalink / raw)
To: Steven Price
Cc: Marc Zyngier, Andrew Jones, Catalin Marinas, Juan Quintela,
    Richard Henderson, QEMU Developers, Dr. David Alan Gilbert,
    arm-mail-list, kvmarm, Thomas Gleixner, Will Deacon, Dave Martin,
    lkml - Kernel Mailing List

On Mon, 7 Dec 2020 at 22:48, Steven Price <steven.price@arm.com> wrote:
>
> On 04/12/2020 08:25, Haibo Xu wrote:
> > On Fri, 20 Nov 2020 at 17:51, Steven Price <steven.price@arm.com> wrote:
> >>
> >> On 19/11/2020 19:11, Marc Zyngier wrote:
> >>> On 2020-11-19 18:42, Andrew Jones wrote:
> >>>> On Thu, Nov 19, 2020 at 03:45:40PM +0000, Peter Maydell wrote:
> >>>>> On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> wrote:
> >>>>>> This series adds support for Arm's Memory Tagging Extension (MTE) to
> >>>>>> KVM, allowing KVM guests to make use of it. This builds on the existing
> >>>>>> user space support already in v5.10-rc1, see [1] for an overview.
> >>>>>
> >>>>>> The change to require the VMM to map all guest memory PROT_MTE is
> >>>>>> significant as it means that the VMM has to deal with the MTE tags even
> >>>>>> if it doesn't care about them (e.g. for virtual devices or if the VMM
> >>>>>> doesn't support migration). Also unfortunately because the VMM can
> >>>>>> change the memory layout at any time the check for PROT_MTE/VM_MTE has
> >>>>>> to be done very late (at the point of faulting pages into stage 2).
> >>>>>
> >>>>> I'm a bit dubious about requiring the VMM to map the guest memory
> >>>>> PROT_MTE unless somebody's done at least a sketch of the design
> >>>>> for how this would work on the QEMU side.
> >>>>> Currently QEMU just
> >>>>> assumes the guest memory is guest memory and it can access it
> >>>>> without special precautions...
> >>>>>
> >>>>
> >>>> There are two statements being made here:
> >>>>
> >>>> 1) Requiring the use of PROT_MTE when mapping guest memory may not fit
> >>>> QEMU well.
> >>>>
> >>>> 2) New KVM features should be accompanied with supporting QEMU code in
> >>>> order to prove that the APIs make sense.
> >>>>
> >>>> I strongly agree with (2). While kvmtool supports some quick testing, it
> >>>> doesn't support migration. We must test all new features with a
> >>>> migration-supporting VMM.
> >>>>
> >>>> I'm not sure about (1). I don't feel like it should be a major problem,
> >>>> but (2).
> >>
> >> (1) seems to be contentious whichever way we go. Either PROT_MTE isn't
> >> required in which case it's easy to accidentally screw up migration, or
> >> it is required in which case it's difficult to handle normal guest
> >> memory from the VMM. I get the impression that probably I should go back
> >> to the previous approach - sorry for the distraction with this change.
> >>
> >> (2) isn't something I'm trying to skip, but I'm limited in what I can do
> >> myself so would appreciate help here. Haibo is looking into this.
> >>
> >
> > Hi Steven,
> >
> > Sorry for the late reply!
> >
> > I have finished the POC for the MTE migration support with the assumption
> > that all the memory is mapped with PROT_MTE. But I got stuck in the test
> > with a FVP setup. Previously, I successfully compiled a test case to verify
> > the basic function of MTE in a guest. But these days, the re-compiled test
> > can't be executed by the guest (very weird). The short plan to verify the
> > migration is to set the MTE tags on one page in the guest, and try to dump
> > the migrated memory contents.
>
> Hi Haibo,
>
> Sounds like you are making good progress - thanks for the update.
> Have
> you thought about how the PROT_MTE mappings might work if QEMU itself
> were to use MTE? My worry is that we end up with MTE in a guest
> preventing QEMU from using MTE itself (because of the PROT_MTE
> mappings). I'm hoping QEMU can wrap its use of guest memory in a
> sequence which disables tag checking (something similar will be needed
> for the "protected VM" use case anyway), but this isn't something I've
> looked into.
>
> > I will update the status later next week!
>
> Great, I look forward to hearing how it goes.

Hi Steve,

I have finished verifying the POC on a FVP setup, and the MTE test case
can be migrated from one VM to another successfully. Since the test case
is very simple - it just maps one page with MTE enabled and does some
memory accesses - I can't say it's OK for other cases.

BTW, I noticed that you have sent out patch set v6, which mentions that
mapping all the guest memory with PROT_MTE was not feasible. So what's
the plan for the next step? Will new KVM APIs which can facilitate
storing and restoring the tags be available?

Regards,
Haibo

>
> Thanks,
>
> Steve
* Re: [PATCH v5 0/2] MTE support for KVM guest
  2020-12-16  7:31 ` Haibo Xu
@ 2020-12-16 10:22 ` Steven Price
  2020-12-17  1:47 ` Haibo Xu
  0 siblings, 1 reply; 38+ messages in thread
From: Steven Price @ 2020-12-16 10:22 UTC (permalink / raw)
To: Haibo Xu
Cc: Marc Zyngier, Andrew Jones, Catalin Marinas, Juan Quintela,
    Richard Henderson, QEMU Developers, Dr. David Alan Gilbert,
    arm-mail-list, kvmarm, Thomas Gleixner, Will Deacon, Dave Martin,
    lkml - Kernel Mailing List

On 16/12/2020 07:31, Haibo Xu wrote:
[...]
> Hi Steve,

Hi Haibo

> I have finished verifying the POC on a FVP setup, and the MTE test case
> can be migrated from one VM to another successfully. Since the test case
> is very simple - it just maps one page with MTE enabled and does some
> memory accesses - I can't say it's OK for other cases.

That's great progress.

> BTW, I noticed that you have sent out patch set v6, which mentions that
> mapping all the guest memory with PROT_MTE was not feasible. So what's
> the plan for the next step? Will new KVM APIs which can facilitate
> storing and restoring the tags be available?

I'm currently rebasing on top of the KASAN MTE patch series. My plan for
now is to switch back to not requiring the VMM to supply PROT_MTE (so
KVM 'upgrades' the pages as necessary) and I'll add an RFC patch on the
end of the series to add a KVM API for doing bulk read/write of tags.
That way the VMM can map guest memory without PROT_MTE (so device 'DMA'
accesses will be unchecked), and use the new API for migration.

Thanks,

Steve
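[For sizing a bulk tag read/write API like the one Steven plans: MTE keeps one 4-bit allocation tag per 16-byte granule, so a 4 KiB page carries 256 tags. The toy model below shows how a VMM might use a pair of bulk tag calls on the migration source and destination; all names are hypothetical, and this is not the eventual KVM interface.]

```python
GRANULE = 16
PAGE = 4096
TAGS_PER_PAGE = PAGE // GRANULE  # 256 tags for a 4 KiB page

def tag_buffer_size(length):
    """Bytes of user buffer needed for the tags of `length` bytes of guest
    memory, assuming the API exposes one tag per byte of buffer."""
    if length % GRANULE:
        raise ValueError("length must be a multiple of the tag granule")
    return length // GRANULE

class FakeKVM:
    """Stand-in for a guest: holds per-granule tags for its memory."""
    def __init__(self, mem_size):
        self.tags = bytearray(tag_buffer_size(mem_size))

    def copy_tags_to_user(self, gpa, length):
        """Models a 'read guest tags' call used on the migration source."""
        start = gpa // GRANULE
        return bytes(self.tags[start:start + tag_buffer_size(length)])

    def copy_tags_from_user(self, gpa, buf):
        """Models a 'write guest tags' call used on the destination."""
        start = gpa // GRANULE
        self.tags[start:start + len(buf)] = buf

src, dst = FakeKVM(2 * PAGE), FakeKVM(2 * PAGE)
src.tags[TAGS_PER_PAGE] = 0xF          # guest tagged the first granule of page 1
blob = src.copy_tags_to_user(PAGE, PAGE)
dst.copy_tags_from_user(PAGE, blob)
assert dst.tags == src.tags
```

The attraction of this shape is exactly what Steven notes: the VMM's ordinary mappings stay tag-unchecked, and tags only need to be touched explicitly, page by page, during migration.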
* Re: [PATCH v5 0/2] MTE support for KVM guest
  2020-12-16 10:22 ` Steven Price
@ 2020-12-17  1:47 ` Haibo Xu
  0 siblings, 0 replies; 38+ messages in thread
From: Haibo Xu @ 2020-12-17  1:47 UTC (permalink / raw)
To: Steven Price
Cc: Marc Zyngier, Andrew Jones, Catalin Marinas, Juan Quintela,
    Richard Henderson, QEMU Developers, Dr. David Alan Gilbert,
    arm-mail-list, kvmarm, Thomas Gleixner, Will Deacon, Dave Martin,
    lkml - Kernel Mailing List

On Wed, 16 Dec 2020 at 18:23, Steven Price <steven.price@arm.com> wrote:
>
> On 16/12/2020 07:31, Haibo Xu wrote:
> [...]
> > Hi Steve,
>
> Hi Haibo
>
> > I have finished verifying the POC on a FVP setup, and the MTE test case
> > can be migrated from one VM to another successfully. Since the test case
> > is very simple - it just maps one page with MTE enabled and does some
> > memory accesses - I can't say it's OK for other cases.
>
> That's great progress.
>
> > BTW, I noticed that you have sent out patch set v6, which mentions that
> > mapping all the guest memory with PROT_MTE was not feasible. So what's
> > the plan for the next step? Will new KVM APIs which can facilitate
> > storing and restoring the tags be available?
>
> I'm currently rebasing on top of the KASAN MTE patch series. My plan for
> now is to switch back to not requiring the VMM to supply PROT_MTE (so
> KVM 'upgrades' the pages as necessary) and I'll add an RFC patch on the
> end of the series to add a KVM API for doing bulk read/write of tags.
> That way the VMM can map guest memory without PROT_MTE (so device 'DMA'
> accesses will be unchecked), and use the new API for migration.

Great! Will have a try with the new API in my POC!

> Thanks,
>
> Steve
* Re: [PATCH v5 0/2] MTE support for KVM guest
  2020-11-19 15:45 ` [PATCH v5 0/2] MTE support for KVM guest Peter Maydell
  2020-11-19 15:57 ` Steven Price
  2020-11-19 18:42 ` Andrew Jones
@ 2020-11-23 12:16 ` Dr. David Alan Gilbert
  2 siblings, 0 replies; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2020-11-23 12:16 UTC (permalink / raw)
To: Peter Maydell
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
    James Morse, Julien Thierry, Suzuki K Poulose, kvmarm,
    arm-mail-list, lkml - Kernel Mailing List, Dave Martin,
    Mark Rutland, Thomas Gleixner, QEMU Developers, Juan Quintela,
    Richard Henderson, Haibo Xu, Andrew Jones

* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@arm.com> wrote:
> > This series adds support for Arm's Memory Tagging Extension (MTE) to
> > KVM, allowing KVM guests to make use of it. This builds on the existing
> > user space support already in v5.10-rc1, see [1] for an overview.
>
> > The change to require the VMM to map all guest memory PROT_MTE is
> > significant as it means that the VMM has to deal with the MTE tags even
> > if it doesn't care about them (e.g. for virtual devices or if the VMM
> > doesn't support migration). Also unfortunately because the VMM can
> > change the memory layout at any time the check for PROT_MTE/VM_MTE has
> > to be done very late (at the point of faulting pages into stage 2).
>
> I'm a bit dubious about requiring the VMM to map the guest memory
> PROT_MTE unless somebody's done at least a sketch of the design
> for how this would work on the QEMU side. Currently QEMU just
> assumes the guest memory is guest memory and it can access it
> without special precautions...

Although that is also changing because of the encrypted/protected memory
in things like SEV.

Dave

> thanks
> -- PMM

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
end of thread, other threads:[~2020-12-17 1:48 UTC | newest] Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-11-19 15:38 [PATCH v5 0/2] MTE support for KVM guest Steven Price 2020-11-19 15:39 ` [PATCH v5 1/2] arm64: kvm: Save/restore MTE registers Steven Price 2020-11-19 15:39 ` [PATCH v5 2/2] arm64: kvm: Introduce MTE VCPU feature Steven Price 2020-11-19 15:45 ` [PATCH v5 0/2] MTE support for KVM guest Peter Maydell 2020-11-19 15:57 ` Steven Price 2020-11-19 16:39 ` Peter Maydell 2020-11-19 18:42 ` Andrew Jones 2020-11-19 19:11 ` Marc Zyngier 2020-11-20 9:50 ` Steven Price 2020-11-20 9:56 ` Marc Zyngier 2020-11-20 9:58 ` Steven Price 2020-12-04 8:25 ` Haibo Xu 2020-12-07 14:48 ` Steven Price 2020-12-07 15:27 ` Peter Maydell 2020-12-07 15:45 ` Steven Price 2020-12-07 16:05 ` Marc Zyngier 2020-12-07 16:34 ` Catalin Marinas 2020-12-07 19:03 ` Marc Zyngier 2020-12-08 17:21 ` Catalin Marinas 2020-12-08 18:21 ` Marc Zyngier 2020-12-09 12:44 ` Catalin Marinas 2020-12-09 13:25 ` Marc Zyngier 2020-12-09 15:27 ` Catalin Marinas 2020-12-09 18:27 ` Richard Henderson 2020-12-09 18:39 ` Catalin Marinas 2020-12-09 20:13 ` Richard Henderson 2020-12-09 20:20 ` Peter Maydell 2020-12-07 16:44 ` Dr. David Alan Gilbert 2020-12-07 17:10 ` Peter Maydell 2020-12-07 17:44 ` Dr. David Alan Gilbert 2020-12-08 10:05 ` Haibo Xu 2020-12-08 9:51 ` Haibo Xu 2020-12-08 10:01 ` Marc Zyngier 2020-12-08 10:10 ` Haibo Xu 2020-12-16 7:31 ` Haibo Xu 2020-12-16 10:22 ` Steven Price 2020-12-17 1:47 ` Haibo Xu 2020-11-23 12:16 ` Dr. David Alan Gilbert