* [PATCH v10 01/50] KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-10-16 15:12 ` Greg KH
2023-10-16 13:27 ` [PATCH v10 02/50] KVM: SVM: Fix TSC_AUX virtualization setup Michael Roth
` (48 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, stable
From: Paolo Bonzini <pbonzini@redhat.com>
svm_recalc_instruction_intercepts() is always called at least once
before the vCPU is started, so the setting or clearing of the RDTSCP
intercept can be dropped from the TSC_AUX virtualization support.
Extracted from a patch by Tom Lendacky.
Cc: stable@vger.kernel.org
Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e8d93d5d93f85949e7299be289c6e7e1154b2f78)
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/svm/sev.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b9a0a939d59f..fa1fb81323b5 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3027,11 +3027,8 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
if (boot_cpu_has(X86_FEATURE_V_TSC_AUX) &&
(guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP) ||
- guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDPID))) {
+ guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDPID)))
set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, 1, 1);
- if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
- svm_clr_intercept(svm, INTERCEPT_RDTSCP);
- }
}
void sev_init_vmcb(struct vcpu_svm *svm)
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 01/50] KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway
2023-10-16 13:27 ` [PATCH v10 01/50] KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway Michael Roth
@ 2023-10-16 15:12 ` Greg KH
2023-10-16 15:14 ` Paolo Bonzini
0 siblings, 1 reply; 158+ messages in thread
From: Greg KH @ 2023-10-16 15:12 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, stable
On Mon, Oct 16, 2023 at 08:27:30AM -0500, Michael Roth wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> svm_recalc_instruction_intercepts() is always called at least once
> before the vCPU is started, so the setting or clearing of the RDTSCP
> intercept can be dropped from the TSC_AUX virtualization support.
>
> Extracted from a patch by Tom Lendacky.
>
> Cc: stable@vger.kernel.org
> Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> (cherry picked from commit e8d93d5d93f85949e7299be289c6e7e1154b2f78)
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> arch/x86/kvm/svm/sev.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
What stable tree(s) are you wanting this applied to (same for the others
in this series)? It's already in the 6.1.56 release, and the Fixes tag
is for 5.19, so I don't see where it could be missing from?
thanks,
greg k-h
* Re: [PATCH v10 01/50] KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway
2023-10-16 15:12 ` Greg KH
@ 2023-10-16 15:14 ` Paolo Bonzini
2023-10-16 15:21 ` Michael Roth
0 siblings, 1 reply; 158+ messages in thread
From: Paolo Bonzini @ 2023-10-16 15:14 UTC (permalink / raw)
To: Greg KH, Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, stable
On 10/16/23 17:12, Greg KH wrote:
> On Mon, Oct 16, 2023 at 08:27:30AM -0500, Michael Roth wrote:
>> From: Paolo Bonzini <pbonzini@redhat.com>
>>
>> svm_recalc_instruction_intercepts() is always called at least once
>> before the vCPU is started, so the setting or clearing of the RDTSCP
>> intercept can be dropped from the TSC_AUX virtualization support.
>>
>> Extracted from a patch by Tom Lendacky.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> (cherry picked from commit e8d93d5d93f85949e7299be289c6e7e1154b2f78)
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> ---
>> arch/x86/kvm/svm/sev.c | 5 +----
>> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> What stable tree(s) are you wanting this applied to (same for the others
> in this series)? It's already in the 6.1.56 release, and the Fixes tag
> is for 5.19, so I don't see where it could be missing from?
I think it's missing in the (destined for 6.7) tree that Michael is
basing this series on, so he's cherry-picking it from Linus's tree.
Paolo
* Re: [PATCH v10 01/50] KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway
2023-10-16 15:14 ` Paolo Bonzini
@ 2023-10-16 15:21 ` Michael Roth
0 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 15:21 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Greg KH, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
seanjc, vkuznets, jmattson, luto, dave.hansen, slp, pgonda,
peterz, srinivas.pandruvada, rientjes, dovmurik, tobin, bp,
vbabka, kirill, ak, tony.luck, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, stable
On Mon, Oct 16, 2023 at 05:14:38PM +0200, Paolo Bonzini wrote:
> On 10/16/23 17:12, Greg KH wrote:
> > On Mon, Oct 16, 2023 at 08:27:30AM -0500, Michael Roth wrote:
> > > From: Paolo Bonzini <pbonzini@redhat.com>
> > >
> > > svm_recalc_instruction_intercepts() is always called at least once
> > > before the vCPU is started, so the setting or clearing of the RDTSCP
> > > intercept can be dropped from the TSC_AUX virtualization support.
> > >
> > > Extracted from a patch by Tom Lendacky.
> > >
> > > Cc: stable@vger.kernel.org
> > > Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > (cherry picked from commit e8d93d5d93f85949e7299be289c6e7e1154b2f78)
> > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > ---
> > > arch/x86/kvm/svm/sev.c | 5 +----
> > > 1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > What stable tree(s) are you wanting this applied to (same for the others
> > in this series)? It's already in the 6.1.56 release, and the Fixes tag
> > is for 5.19, so I don't see where it could be missing from?
>
> I think it's missing in the (destined for 6.7) tree that Michael is basing
> this series on, so he's cherry-picking it from Linus's tree.
Yes, this and PATCH #2 are both prereqs that have already been applied
upstream, and are only being included in this series because they are
prereqs for PATCH #3, which is new. Sorry for any confusion.
-Mike
>
> Paolo
>
* [PATCH v10 02/50] KVM: SVM: Fix TSC_AUX virtualization setup
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
2023-10-16 13:27 ` [PATCH v10 01/50] KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-10-16 13:27 ` [PATCH v10 03/50] KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests Michael Roth
` (47 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, stable
From: Tom Lendacky <thomas.lendacky@amd.com>
The checks for virtualizing TSC_AUX occur during the vCPU reset processing
path. However, at the time of initial vCPU reset processing, when the vCPU
is first created, not all of the guest CPUID information has been set. In
this case the RDTSCP and RDPID feature support for the guest is not in
place and so TSC_AUX virtualization is not established.
This continues for each vCPU created for the guest. On the first boot of
an AP, vCPU reset processing is executed as a result of an APIC INIT
event, this time with all of the guest CPUID information set, resulting
in TSC_AUX virtualization being enabled, but only for the APs. The BSP
always sees a TSC_AUX value of 0, which probably went unnoticed because,
at least for Linux, the BSP TSC_AUX value is 0.
Move the TSC_AUX virtualization enablement out of the init_vmcb() path and
into the vcpu_after_set_cpuid() path to allow for proper initialization of
the support after the guest CPUID information has been set.
With the TSC_AUX virtualization support now in the vcpu_after_set_cpuid()
path, the intercepts must be either cleared or set based on the guest
CPUID input.
Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4137fbcb9008951ab5f0befa74a0399d2cce809a.1694811272.git.thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e0096d01c4fcb8c96c05643cfc2c20ab78eae4da)
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/svm/sev.c | 31 ++++++++++++++++++++++++++-----
arch/x86/kvm/svm/svm.c | 9 ++-------
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 29 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index fa1fb81323b5..4900c078045a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2962,6 +2962,32 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
count, in);
}
+static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+
+ if (boot_cpu_has(X86_FEATURE_V_TSC_AUX)) {
+ bool v_tsc_aux = guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) ||
+ guest_cpuid_has(vcpu, X86_FEATURE_RDPID);
+
+ set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, v_tsc_aux, v_tsc_aux);
+ }
+}
+
+void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm_cpuid_entry2 *best;
+
+ /* For sev guests, the memory encryption bit is not reserved in CR3. */
+ best = kvm_find_cpuid_entry(vcpu, 0x8000001F);
+ if (best)
+ vcpu->arch.reserved_gpa_bits &= ~(1UL << (best->ebx & 0x3f));
+
+ if (sev_es_guest(svm->vcpu.kvm))
+ sev_es_vcpu_after_set_cpuid(svm);
+}
+
static void sev_es_init_vmcb(struct vcpu_svm *svm)
{
struct vmcb *vmcb = svm->vmcb01.ptr;
@@ -3024,11 +3050,6 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1);
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1);
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1);
-
- if (boot_cpu_has(X86_FEATURE_V_TSC_AUX) &&
- (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP) ||
- guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDPID)))
- set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, 1, 1);
}
void sev_init_vmcb(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f283eb47f6ac..aef1ddf0b705 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4284,7 +4284,6 @@ static bool svm_has_emulated_msr(struct kvm *kvm, u32 index)
static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- struct kvm_cpuid_entry2 *best;
/*
* SVM doesn't provide a way to disable just XSAVES in the guest, KVM
@@ -4328,12 +4327,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
!!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
- /* For sev guests, the memory encryption bit is not reserved in CR3. */
- if (sev_guest(vcpu->kvm)) {
- best = kvm_find_cpuid_entry(vcpu, 0x8000001F);
- if (best)
- vcpu->arch.reserved_gpa_bits &= ~(1UL << (best->ebx & 0x3f));
- }
+ if (sev_guest(vcpu->kvm))
+ sev_vcpu_after_set_cpuid(svm);
init_vmcb_after_set_cpuid(vcpu);
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f41253958357..be67ab7fdd10 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -684,6 +684,7 @@ void __init sev_hardware_setup(void);
void sev_hardware_unsetup(void);
int sev_cpu_init(struct svm_cpu_data *sd);
void sev_init_vmcb(struct vcpu_svm *svm);
+void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
void sev_free_vcpu(struct kvm_vcpu *vcpu);
int sev_handle_vmgexit(struct kvm_vcpu *vcpu);
int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
--
2.25.1
* [PATCH v10 03/50] KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
2023-10-16 13:27 ` [PATCH v10 01/50] KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway Michael Roth
2023-10-16 13:27 ` [PATCH v10 02/50] KVM: SVM: Fix TSC_AUX virtualization setup Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-12-13 12:50 ` Paolo Bonzini
2023-10-16 13:27 ` [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
` (46 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Alexey Kardashevskiy
When intercepts are enabled for MSR_IA32_XSS, the host will swap in/out
the guest-defined values while context-switching to/from guest mode.
However, in the case of SEV-ES, vcpu->arch.guest_state_protected is set,
so the guest-defined value is effectively ignored when switching to
guest mode with the understanding that the VMSA will handle swapping
in/out this register state.
However, SVM is still configured to intercept these accesses for SEV-ES
guests, so the values in the initial MSR_IA32_XSS are effectively
read-only, and a guest will experience undefined behavior if it actually
tries to write to this MSR. Fortunately, on current SEV-ES-capable
systems only CET/shadow stack makes use of this register, and that
feature isn't yet widely used, but this may become more of an issue in
the future.
Additionally, enabling intercepts of MSR_IA32_XSS results in #VC
exceptions in the guest in certain paths that can lead to unexpected #VC
nesting levels. One example is SEV-SNP guests when handling #VC
exceptions for CPUID instructions involving leaf 0xD, subleaf 0x1, since
they will access MSR_IA32_XSS as part of servicing the CPUID #VC, then
generate another #VC when accessing MSR_IA32_XSS, which can lead to
guest crashes if an NMI occurs at that point in time. Running perf on a
guest while it is issuing such a sequence is one example where these can
be problematic.
Address this by disabling intercepts of MSR_IA32_XSS for SEV-ES guests
if the host/guest configuration allows it. If the host/guest
configuration doesn't allow for MSR_IA32_XSS, leave it intercepted so
that it can be caught by the existing checks in
kvm_{set,get}_msr_common() if the guest still attempts to access it.
Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Cc: Alexey Kardashevskiy <aik@amd.com>
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/svm/sev.c | 19 +++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 2 +-
3 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4900c078045a..6ee925d66648 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2972,6 +2972,25 @@ static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, v_tsc_aux, v_tsc_aux);
}
+
+ /*
+ * For SEV-ES, accesses to MSR_IA32_XSS should not be intercepted if
+ * the host/guest supports its use.
+ *
+ * guest_can_use() checks a number of requirements on the host/guest to
+ * ensure that MSR_IA32_XSS is available, but it might report true even
+ * if X86_FEATURE_XSAVES isn't configured in the guest to ensure host
+ * MSR_IA32_XSS is always properly restored. For SEV-ES, it is better
+ * to further check that the guest CPUID actually supports
+ * X86_FEATURE_XSAVES so that accesses to MSR_IA32_XSS by misbehaved
+ * guests will still get intercepted and caught in the normal
+ * kvm_emulate_rdmsr()/kvm_emulate_wrmsr() paths.
+ */
+ if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
+ else
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 0, 0);
}
void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index aef1ddf0b705..1e7fb1ea45f7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -103,6 +103,7 @@ static const struct svm_direct_access_msrs {
{ .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
{ .index = MSR_IA32_LASTINTFROMIP, .always = false },
{ .index = MSR_IA32_LASTINTTOIP, .always = false },
+ { .index = MSR_IA32_XSS, .always = false },
{ .index = MSR_EFER, .always = false },
{ .index = MSR_IA32_CR_PAT, .always = false },
{ .index = MSR_AMD64_SEV_ES_GHCB, .always = true },
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index be67ab7fdd10..c409f934c377 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -30,7 +30,7 @@
#define IOPM_SIZE PAGE_SIZE * 3
#define MSRPM_SIZE PAGE_SIZE * 2
-#define MAX_DIRECT_ACCESS_MSRS 46
+#define MAX_DIRECT_ACCESS_MSRS 47
#define MSRPM_OFFSETS 32
extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
extern bool npt_enabled;
--
2.25.1
* Re: [PATCH v10 03/50] KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
2023-10-16 13:27 ` [PATCH v10 03/50] KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests Michael Roth
@ 2023-12-13 12:50 ` Paolo Bonzini
2023-12-13 17:30 ` Sean Christopherson
0 siblings, 1 reply; 158+ messages in thread
From: Paolo Bonzini @ 2023-12-13 12:50 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Alexey Kardashevskiy
On 10/16/23 15:27, Michael Roth wrote:
> Address this by disabling intercepts of MSR_IA32_XSS for SEV-ES guests
> if the host/guest configuration allows it. If the host/guest
> configuration doesn't allow for MSR_IA32_XSS, leave it intercepted so
> that it can be caught by the existing checks in
> kvm_{set,get}_msr_common() if the guest still attempts to access it.
This is wrong, because it allows the guest to do untrapped writes to
MSR_IA32_XSS and therefore (via XRSTORS) to MSRs that the host might not
save or restore.
If the processor cannot let the host validate writes to MSR_IA32_XSS,
KVM simply cannot expose XSAVES to SEV-ES (and SEV-SNP) guests.
Because SVM doesn't provide a way to disable just XSAVES in the guest,
all that KVM can do is keep on trapping MSR_IA32_XSS (which the guest
shouldn't read or write to). In other words the crash on accesses to
MSR_IA32_XSS is not a bug but a feature (of the hypervisor, that
wants/needs to protect itself just as much as the guest wants to).
The bug is that there is no API to tell userspace "do not enable this
and that CPUID for SEV guests", there is only the extremely limited
KVM_GET_SUPPORTED_CPUID system ioctl.
For now, all we can do is document our wishes, with which userspace had
better comply. Please send a patch to QEMU that makes it obey.
Paolo
--------------------------- 8< -----------------------
From 303e66472ddf54c2a945588b133d34eaab291257 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Wed, 13 Dec 2023 07:45:08 -0500
Subject: [PATCH] Documentation: KVM: suggest disabling XSAVES on SEV-ES guests
When intercepts are enabled for MSR_IA32_XSS, the host will swap in/out
the guest-defined values while context-switching to/from guest mode.
However, in the case of SEV-ES, vcpu->arch.guest_state_protected is set,
so the guest-defined value is effectively ignored when switching to
guest mode with the understanding that the VMSA will handle swapping
in/out this register state.
However, SVM is still configured to intercept these accesses for SEV-ES
guests, so the values in the initial MSR_IA32_XSS are effectively
read-only, and a guest will experience undefined behavior if it actually
tries to write to this MSR. Fortunately, on current SEV-ES-capable
systems only CET/shadow stack makes use of this register, and that
feature isn't yet widely used, but this may become more of an issue in
the future.
Additionally, enabling intercepts of MSR_IA32_XSS results in #VC
exceptions in the guest in certain paths that can lead to unexpected #VC
nesting levels. One example is SEV-SNP guests when handling #VC
exceptions for CPUID instructions involving leaf 0xD, subleaf 0x1, since
they will access MSR_IA32_XSS as part of servicing the CPUID #VC, then
generate another #VC when accessing MSR_IA32_XSS, which can lead to
guest crashes if an NMI occurs at that point in time. Running perf on a
guest while it is issuing such a sequence is one example where these can
be problematic.
Unfortunately, there is not really a way to fix this issue; allowing
unfiltered access to MSR_IA32_XSS also lets the guest write (via
XRSTORS) MSRs that the host might not be ready to save or restore.
Because SVM doesn't provide a way to disable just XSAVES in the guest,
all that KVM can do to protect itself is keep on trapping MSR_IA32_XSS.
Userspace has to comply and not enable XSAVES in CPUID, so that the
guest has no business accessing MSR_IA32_XSS at all.
Unfortunately^2, there is no API to tell userspace "do not enable this
and that CPUID for SEV guests", there is only the extremely limited
KVM_GET_SUPPORTED_CPUID system ioctl. So all we can do for now is
document it.
Reported-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst
index 49a05f24747b..0c91916c0164 100644
--- a/Documentation/virt/kvm/x86/errata.rst
+++ b/Documentation/virt/kvm/x86/errata.rst
@@ -33,6 +33,15 @@ Note however that any software (e.g ``WIN87EM.DLL``) expecting these features
to be present likely predates these CPUID feature bits, and therefore
doesn't know to check for them anyway.
+Encrypted guests
+~~~~~~~~~~~~~~~~
+
+For SEV-ES guests, it is impossible for KVM to validate writes for MSRs that
+are part of the VMSA. In the case of MSR_IA32_XSS, however, KVM needs to
+validate writes to the MSR in order to prevent the guest from using XRSTORS
+to overwrite host MSRs. Therefore, the XSAVES feature should never be exposed
+to SEV-ES guests.
+
Nested virtualization features
------------------------------
* Re: [PATCH v10 03/50] KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
2023-12-13 12:50 ` Paolo Bonzini
@ 2023-12-13 17:30 ` Sean Christopherson
2023-12-13 17:40 ` Paolo Bonzini
0 siblings, 1 reply; 158+ messages in thread
From: Sean Christopherson @ 2023-12-13 17:30 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Alexey Kardashevskiy
On Wed, Dec 13, 2023, Paolo Bonzini wrote:
> On 10/16/23 15:27, Michael Roth wrote:
> > Address this by disabling intercepts of MSR_IA32_XSS for SEV-ES guests
> > if the host/guest configuration allows it. If the host/guest
> > configuration doesn't allow for MSR_IA32_XSS, leave it intercepted so
> > that it can be caught by the existing checks in
> > kvm_{set,get}_msr_common() if the guest still attempts to access it.
>
> This is wrong, because it allows the guest to do untrapped writes to
> MSR_IA32_XSS and therefore (via XRSTORS) to MSRs that the host might not
> save or restore.
>
> If the processor cannot let the host validate writes to MSR_IA32_XSS,
> KVM simply cannot expose XSAVES to SEV-ES (and SEV-SNP) guests.
>
> Because SVM doesn't provide a way to disable just XSAVES in the guest,
> all that KVM can do is keep on trapping MSR_IA32_XSS (which the guest
> shouldn't read or write to). In other words the crash on accesses to
> MSR_IA32_XSS is not a bug but a feature (of the hypervisor, that
> wants/needs to protect itself just as much as the guest wants to).
>
> The bug is that there is no API to tell userspace "do not enable this
> and that CPUID for SEV guests", there is only the extremely limited
> KVM_GET_SUPPORTED_CPUID system ioctl.
>
> For now, all we can do is document our wishes, with which userspace had
> better comply. Please send a patch to QEMU that makes it obey.
Discussed this earlier today with Paolo at PUCK and pointed out that (a) the CPU
context switches the underlying state, (b) SVM doesn't allow intercepting *just*
XSAVES, and (c) SNP's AP creation can bypass XSS interception.
So while we all (all == KVM folks) agree that this is rather terrifying, e.g.
gives KVM zero option if there is a hardware issue, it's "fine" to let the guest
use XSAVES/XSS.
See also https://lore.kernel.org/all/ZUQvNIE9iU5TqJfw@google.com
* Re: [PATCH v10 03/50] KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests
2023-12-13 17:30 ` Sean Christopherson
@ 2023-12-13 17:40 ` Paolo Bonzini
0 siblings, 0 replies; 158+ messages in thread
From: Paolo Bonzini @ 2023-12-13 17:40 UTC (permalink / raw)
To: Sean Christopherson
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Alexey Kardashevskiy
On 12/13/23 18:30, Sean Christopherson wrote:
>> For now, all we can do is document our wishes, with which userspace had
>> better comply. Please send a patch to QEMU that makes it obey.
> Discussed this earlier today with Paolo at PUCK and pointed out that (a) the CPU
> context switches the underlying state, (b) SVM doesn't allow intercepting *just*
> XSAVES, and (c) SNP's AP creation can bypass XSS interception.
>
> So while we all (all == KVM folks) agree that this is rather terrifying, e.g.
> gives KVM zero option if there is a hardware issue, it's "fine" to let the guest
> use XSAVES/XSS.
Indeed; looks like I've got to queue this for 6.7 after all.
Paolo
* [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (2 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 03/50] KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-12-13 12:51 ` Paolo Bonzini
2023-10-16 13:27 ` [PATCH v10 05/50] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled Michael Roth
` (45 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen,
Ashish Kalra
From: Brijesh Singh <brijesh.singh@amd.com>
Add CPU feature detection for Secure Encrypted Virtualization with
Secure Nested Paging. This feature adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks like
data replay, memory re-mapping, and more.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/amd.c | 5 +++--
tools/arch/x86/include/asm/cpufeatures.h | 1 +
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 58cb9495e40f..1640cedd77f1 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -437,6 +437,7 @@
#define X86_FEATURE_SEV (19*32+ 1) /* AMD Secure Encrypted Virtualization */
#define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* "" VM Page Flush MSR is supported */
#define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
#define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
#define X86_FEATURE_DEBUG_SWAP (19*32+14) /* AMD SEV-ES full debug state swap support */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index dd8379d84445..14ee7f750cc7 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -630,8 +630,8 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
* SME feature (set in scattered.c).
* If the kernel has not enabled SME via any means then
* don't advertise the SME feature.
- * For SEV: If BIOS has not enabled SEV then don't advertise the
- * SEV and SEV_ES feature (set in scattered.c).
+ * For SEV: If BIOS has not enabled SEV then don't advertise SEV and
+ * any additional functionality based on it.
*
* In all cases, since support for SME and SEV requires long mode,
* don't advertise the feature under CONFIG_X86_32.
@@ -666,6 +666,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
clear_sev:
setup_clear_cpu_cap(X86_FEATURE_SEV);
setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+ setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
}
}
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index 798e60b5454b..669f45eefa0c 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -432,6 +432,7 @@
#define X86_FEATURE_SEV (19*32+ 1) /* AMD Secure Encrypted Virtualization */
#define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* "" VM Page Flush MSR is supported */
#define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
#define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
#define X86_FEATURE_DEBUG_SWAP (19*32+14) /* AMD SEV-ES full debug state swap support */
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature
2023-10-16 13:27 ` [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
@ 2023-12-13 12:51 ` Paolo Bonzini
2023-12-13 13:13 ` Borislav Petkov
0 siblings, 1 reply; 158+ messages in thread
From: Paolo Bonzini @ 2023-12-13 12:51 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen
On 10/16/23 15:27, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> Add CPU feature detection for Secure Encrypted Virtualization with
> Secure Nested Paging. This feature adds a strong memory integrity
> protection to help prevent malicious hypervisor-based attacks like
> data replay, memory re-mapping, and more.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
> Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
Queued, thanks.
Paolo
> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/kernel/cpu/amd.c | 5 +++--
> tools/arch/x86/include/asm/cpufeatures.h | 1 +
> 3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 58cb9495e40f..1640cedd77f1 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -437,6 +437,7 @@
> #define X86_FEATURE_SEV (19*32+ 1) /* AMD Secure Encrypted Virtualization */
> #define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* "" VM Page Flush MSR is supported */
> #define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
> +#define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
> #define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
> #define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
> #define X86_FEATURE_DEBUG_SWAP (19*32+14) /* AMD SEV-ES full debug state swap support */
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index dd8379d84445..14ee7f750cc7 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -630,8 +630,8 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> * SME feature (set in scattered.c).
> * If the kernel has not enabled SME via any means then
> * don't advertise the SME feature.
> - * For SEV: If BIOS has not enabled SEV then don't advertise the
> - * SEV and SEV_ES feature (set in scattered.c).
> + * For SEV: If BIOS has not enabled SEV then don't advertise SEV and
> + * any additional functionality based on it.
> *
> * In all cases, since support for SME and SEV requires long mode,
> * don't advertise the feature under CONFIG_X86_32.
> @@ -666,6 +666,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> clear_sev:
> setup_clear_cpu_cap(X86_FEATURE_SEV);
> setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
> + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> }
> }
>
> diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
> index 798e60b5454b..669f45eefa0c 100644
> --- a/tools/arch/x86/include/asm/cpufeatures.h
> +++ b/tools/arch/x86/include/asm/cpufeatures.h
> @@ -432,6 +432,7 @@
> #define X86_FEATURE_SEV (19*32+ 1) /* AMD Secure Encrypted Virtualization */
> #define X86_FEATURE_VM_PAGE_FLUSH (19*32+ 2) /* "" VM Page Flush MSR is supported */
> #define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
> +#define X86_FEATURE_SEV_SNP (19*32+ 4) /* AMD Secure Encrypted Virtualization - Secure Nested Paging */
> #define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
> #define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
> #define X86_FEATURE_DEBUG_SWAP (19*32+14) /* AMD SEV-ES full debug state swap support */
* Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature
2023-12-13 12:51 ` Paolo Bonzini
@ 2023-12-13 13:13 ` Borislav Petkov
2023-12-13 13:31 ` Paolo Bonzini
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-12-13 13:13 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
seanjc, vkuznets, jmattson, luto, dave.hansen, slp, pgonda,
peterz, srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen
On Wed, Dec 13, 2023 at 01:51:58PM +0100, Paolo Bonzini wrote:
> On 10/16/23 15:27, Michael Roth wrote:
> > From: Brijesh Singh <brijesh.singh@amd.com>
> >
> > Add CPU feature detection for Secure Encrypted Virtualization with
> > Secure Nested Paging. This feature adds a strong memory integrity
> > protection to help prevent malicious hypervisor-based attacks like
> > data replay, memory re-mapping, and more.
> >
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
> > Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
>
> Queued, thanks.
Paolo, please stop queueing x86 patches through your tree. I'll give you
an immutable branch with the x86 bits when the stuff has been reviewed.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature
2023-12-13 13:13 ` Borislav Petkov
@ 2023-12-13 13:31 ` Paolo Bonzini
2023-12-13 13:36 ` Borislav Petkov
0 siblings, 1 reply; 158+ messages in thread
From: Paolo Bonzini @ 2023-12-13 13:31 UTC (permalink / raw)
To: Borislav Petkov
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
seanjc, vkuznets, jmattson, luto, dave.hansen, slp, pgonda,
peterz, srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen
On 12/13/23 14:13, Borislav Petkov wrote:
> On Wed, Dec 13, 2023 at 01:51:58PM +0100, Paolo Bonzini wrote:
>> On 10/16/23 15:27, Michael Roth wrote:
>>> From: Brijesh Singh <brijesh.singh@amd.com>
>>>
>>> Add CPU feature detection for Secure Encrypted Virtualization with
>>> Secure Nested Paging. This feature adds a strong memory integrity
>>> protection to help prevent malicious hypervisor-based attacks like
>>> data replay, memory re-mapping, and more.
>>>
>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>> Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
>>> Signed-off-by: Ashish Kalra <Ashish.Kalra@amd.com>
>>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>>
>> Queued, thanks.
>
> Paolo, please stop queueing x86 patches through your tree. I'll give you
> an immutable branch with the x86 bits when the stuff has been reviewed.
Sure, I only queued it because you gave Acked-by for 05/50 and this is
an obvious dependency. I would like to get things in as they are ready
(whenever it makes sense), so if you want to include those two in the
x86 tree for 6.8, that would work for me.
Paolo
* Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature
2023-12-13 13:31 ` Paolo Bonzini
@ 2023-12-13 13:36 ` Borislav Petkov
2023-12-13 13:40 ` Paolo Bonzini
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-12-13 13:36 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
seanjc, vkuznets, jmattson, luto, dave.hansen, slp, pgonda,
peterz, srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen
On Wed, Dec 13, 2023 at 02:31:05PM +0100, Paolo Bonzini wrote:
> Sure, I only queued it because you gave Acked-by for 05/50 and this is an
> obvious dependency. I would like to get things in as they are ready
> (whenever it makes sense), so if you want to include those two in the x86
> tree for 6.8, that would work for me.
It doesn't make sense to include them in 6.8 because the two alone are
simply dead code in 6.8.
The plan is to put the x86 patches first in the next submission, I'll
pick them up for 6.9 and then give you an immutable branch to apply the
KVM bits ontop. This way it all goes together.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature
2023-12-13 13:36 ` Borislav Petkov
@ 2023-12-13 13:40 ` Paolo Bonzini
2023-12-13 13:49 ` Borislav Petkov
0 siblings, 1 reply; 158+ messages in thread
From: Paolo Bonzini @ 2023-12-13 13:40 UTC (permalink / raw)
To: Borislav Petkov
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
seanjc, vkuznets, jmattson, luto, dave.hansen, slp, pgonda,
peterz, srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen
On 12/13/23 14:36, Borislav Petkov wrote:
>> Sure, I only queued it because you gave Acked-by for 05/50 and this is an
>> obvious dependency. I would like to get things in as they are ready
>> (whenever it makes sense), so if you want to include those two in the x86
>> tree for 6.8, that would work for me.
>
> It doesn't make sense to include them into 6.8 because the two alone are
> simply dead code in 6.8.
Why are they dead code? X86_FEATURE_SEV_SNP is set automatically based
on CPUID, therefore patch 5 is a performance improvement on all
processors that support SEV-SNP. This is independent of whether KVM can
create SEV-SNP guests or not.
If this is wrong, there is a problem in the commit messages.
Paolo
> The plan is to put the x86 patches first in the next submission, I'll
> pick them up for 6.9 and then give you an immutable branch to apply the
> KVM bits ontop. This way it all goes together.
* Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature
2023-12-13 13:40 ` Paolo Bonzini
@ 2023-12-13 13:49 ` Borislav Petkov
2023-12-13 14:18 ` Paolo Bonzini
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-12-13 13:49 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
seanjc, vkuznets, jmattson, luto, dave.hansen, slp, pgonda,
peterz, srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen
On Wed, Dec 13, 2023 at 02:40:24PM +0100, Paolo Bonzini wrote:
> Why are they dead code? X86_FEATURE_SEV_SNP is set automatically based on
> CPUID, therefore patch 5 is a performance improvement on all processors that
> support SEV-SNP. This is independent of whether KVM can create SEV-SNP
> guests or not.
No, it is not. This CPUID bit means:
"RMP table can be enabled to protect memory even from hypervisor."
Without the SNP host patches, it is dead code.
And regardless, arch/x86/kvm/ patches go through the kvm tree. The rest
of arch/x86/ through the tip tree. We've been over this a bunch of times
already.
If you don't agree with this split, let's discuss it offlist with all
tip and kvm maintainers, reach an agreement who picks up what and to put
an end to this nonsense.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature
2023-12-13 13:49 ` Borislav Petkov
@ 2023-12-13 14:18 ` Paolo Bonzini
2023-12-13 15:41 ` Borislav Petkov
0 siblings, 1 reply; 158+ messages in thread
From: Paolo Bonzini @ 2023-12-13 14:18 UTC (permalink / raw)
To: Borislav Petkov
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
seanjc, vkuznets, jmattson, luto, dave.hansen, slp, pgonda,
peterz, srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen
On 12/13/23 14:49, Borislav Petkov wrote:
> On Wed, Dec 13, 2023 at 02:40:24PM +0100, Paolo Bonzini wrote:
>> Why are they dead code? X86_FEATURE_SEV_SNP is set automatically based on
>> CPUID, therefore patch 5 is a performance improvement on all processors that
>> support SEV-SNP. This is independent of whether KVM can create SEV-SNP
>> guests or not.
>
> No, it is not. This CPUID bit means:
>
> "RMP table can be enabled to protect memory even from hypervisor."
>
> Without the SNP host patches, it is dead code.
- if ((ia32_cap & ARCH_CAP_IBRS_ALL) || cpu_has(c, X86_FEATURE_AUTOIBRS)) {
+ if ((ia32_cap & ARCH_CAP_IBRS_ALL) ||
+ (cpu_has(c, X86_FEATURE_AUTOIBRS) &&
+ !cpu_feature_enabled(X86_FEATURE_SEV_SNP))) {
Surely we can agree that cpu_feature_enabled(X86_FEATURE_SEV_SNP) has nothing
to do with SEV-SNP host patches being present? And that therefore retpolines
are preferred even without any SEV-SNP support in KVM?
And can we agree that "Acked-by" means "feel free and take it if you wish,
I don't care enough to merge it through my tree or provide a topic branch"?
I'm asking because I'm not sure if we agree on these two things, but they
really seem basic to me?
Paolo
> And regardless, arch/x86/kvm/ patches go through the kvm tree. The rest
> of arch/x86/ through the tip tree. We've been over this a bunch of times
> already.
> If you don't agree with this split, let's discuss it offlist with all
> tip and kvm maintainers, reach an agreement who picks up what and to put
> an end to this nonsense.
>
> Thx.
>
* Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature
2023-12-13 14:18 ` Paolo Bonzini
@ 2023-12-13 15:41 ` Borislav Petkov
2023-12-13 17:35 ` Paolo Bonzini
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-12-13 15:41 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
seanjc, vkuznets, jmattson, luto, dave.hansen, slp, pgonda,
peterz, srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen
On Wed, Dec 13, 2023 at 03:18:17PM +0100, Paolo Bonzini wrote:
> Surely we can agree that cpu_feature_enabled(X86_FEATURE_SEV_SNP) has nothing
> to do with SEV-SNP host patches being present?
It does - we're sanitizing the meaning of a CPUID flag present in
/proc/cpuinfo, see here:
https://git.kernel.org/tip/79c603ee43b2674fba0257803bab265147821955
> And that therefore retpolines are preferred even without any SEV-SNP
> support in KVM?
No, automatic IBRS should be disabled when SNP is enabled. Not CPUID
present - enabled. We clear that bit on a couple of occasions in the SNP
host patchset if we determine that SNP host support is not possible so
4/50 needs to go together with the rest to mean something.
> And can we agree that "Acked-by" means "feel free and take it if you wish,
I can see how it can mean that and I'm sorry for the misunderstanding
I caused. Two things here:
* I acked it because I did some lengthy digging internally on whether
disabling AIBRS makes sense on SNP, and this was more a note to myself to
say, yes, that's a good change.
* If I wanted you to pick it up, I would've acked 4/50 too. Which
I haven't.
> I'm asking because I'm not sure if we agree on these two things, but they
> really seem basic to me?
I think KVM and x86 maintainers should sit down and discuss who picks up
what and through which tree so that there's no more confusion in the
future. It seems things need discussion...
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature
2023-12-13 15:41 ` Borislav Petkov
@ 2023-12-13 17:35 ` Paolo Bonzini
2023-12-13 18:53 ` Borislav Petkov
0 siblings, 1 reply; 158+ messages in thread
From: Paolo Bonzini @ 2023-12-13 17:35 UTC (permalink / raw)
To: Borislav Petkov, Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen
On 12/13/23 16:41, Borislav Petkov wrote:
> On Wed, Dec 13, 2023 at 03:18:17PM +0100, Paolo Bonzini wrote:
>> Surely we can agree that cpu_feature_enabled(X86_FEATURE_SEV_SNP) has nothing
>> to do with SEV-SNP host patches being present?
>
> It does - we're sanitizing the meaning of a CPUID flag present in
> /proc/cpuinfo, see here:
>
> https://git.kernel.org/tip/79c603ee43b2674fba0257803bab265147821955
>
>> And that therefore retpolines are preferred even without any SEV-SNP
>> support in KVM?
>
> No, automatic IBRS should be disabled when SNP is enabled. Not CPUID
> present - enabled.
Ok, so the root cause of the problem is commit message/patch ordering:
1) patch 4 should have unconditionally cleared the feature (until the
initialization code comes around in patch 6); and it should have
mentioned in the commit message that we don't want X86_FEATURE_SEV_SNP
to be set, unless SNP can be enabled via MSR_AMD64_SYSCFG.
2) possibly, the commit message of patch 5 could have said something
like "at this point in the kernel SNP is never enabled".
3) Patch 23 should have been placed before the SNP initialization,
because as things stand the patches (mildly) break bisectability.
> We clear that bit on a couple of occasions in the SNP
> host patchset if we determine that SNP host support is not possible so
> 4/50 needs to go together with the rest to mean something.
Understood now. With the patch ordering and commit message edits I
suggested above, indeed I would not have picked up patch 4.
But with your explanation, I would even say that "4/50 needs to go
together with the rest" *for correctness*, not just to mean something.
Paolo
* Re: [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature
2023-12-13 17:35 ` Paolo Bonzini
@ 2023-12-13 18:53 ` Borislav Petkov
0 siblings, 0 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-12-13 18:53 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
seanjc, vkuznets, jmattson, luto, dave.hansen, slp, pgonda,
peterz, srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen
On Wed, Dec 13, 2023 at 06:35:35PM +0100, Paolo Bonzini wrote:
> 1) patch 4 should have unconditionally cleared the feature (until the
> initialization code comes around in patch 6); and it should have mentioned
> in the commit message that we don't want X86_FEATURE_SEV_SNP to be set,
> unless SNP can be enabled via MSR_AMD64_SYSCFG.
I guess.
> 2) possibly, the commit message of patch 5 could have said something like
> "at this point in the kernel SNP is never enabled".
Sure.
> 3) Patch 23 should have been placed before the SNP initialization, because
> as things stand the patches (mildly) break bisectability.
Ok, I still haven't reached that one.
> Understood now. With the patch ordering and commit message edits I
> suggested above, indeed I would not have picked up patch 4.
In the future, please simply refrain from picking up x86 patches if you
haven't gotten an explicit ACK.
We could have conflicting changes in tip, we could be reworking that
part of the code and the thing the patch touches could be completely
gone, and so on and so on...
Also, we want to have a full picture of what goes in. Exactly the same
as how you'd like to have a full picture of what goes into kvm and how
we don't apply kvm patches unless there's some extraordinary
circumstance or we have received an explicit ACK.
> But with your explanation, I would even say that "4/50 needs to go
> together with the rest" *for correctness*, not just to mean something.
Yes, but for ease of integration it would be easier if they go in two
groups - kvm and x86 bits. Not: some x86 bits first, then kvm bits
through your tree and then some more x86 bits. That would be
a logistical nightmare.
And even if you bisect and land at 4/50 and you disable AIBRS even
without SNP being really enabled, that is not a big deal - you're only
bisecting and not really using that kernel and it's not like it breaks
builds so...
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* [PATCH v10 05/50] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (3 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 04/50] x86/cpufeatures: Add SEV-SNP CPU feature Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-10-25 17:33 ` Borislav Petkov
` (2 more replies)
2023-10-16 13:27 ` [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support Michael Roth
` (44 subsequent siblings)
49 siblings, 3 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Kim Phillips
From: Kim Phillips <kim.phillips@amd.com>
Without SEV-SNP, Automatic IBRS protects only the kernel. But when
SEV-SNP is enabled, the Automatic IBRS protection umbrella widens to all
host-side code, including userspace. This protection comes at a cost:
reduced userspace indirect branch performance.
To avoid this performance loss, don't use Automatic IBRS on SEV-SNP
hosts. Fall back to retpolines instead.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
[mdr: squash in changes from review discussion]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kernel/cpu/common.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 382d4e6b848d..11fae89b799e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1357,8 +1357,13 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
/*
* AMD's AutoIBRS is equivalent to Intel's eIBRS - use the Intel feature
* flag and protect from vendor-specific bugs via the whitelist.
+ *
+ * Don't use AutoIBRS when SNP is enabled because it degrades host
+ * userspace indirect branch performance.
*/
- if ((ia32_cap & ARCH_CAP_IBRS_ALL) || cpu_has(c, X86_FEATURE_AUTOIBRS)) {
+ if ((ia32_cap & ARCH_CAP_IBRS_ALL) ||
+ (cpu_has(c, X86_FEATURE_AUTOIBRS) &&
+ !cpu_feature_enabled(X86_FEATURE_SEV_SNP))) {
setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
if (!cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
!(ia32_cap & ARCH_CAP_PBRSB_NO))
--
2.25.1
* Re: [PATCH v10 05/50] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled
2023-10-16 13:27 ` [PATCH v10 05/50] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled Michael Roth
@ 2023-10-25 17:33 ` Borislav Petkov
2023-10-27 21:50 ` Dave Hansen
2023-12-13 12:52 ` Paolo Bonzini
2 siblings, 0 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-10-25 17:33 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Kim Phillips
On Mon, Oct 16, 2023 at 08:27:34AM -0500, Michael Roth wrote:
> From: Kim Phillips <kim.phillips@amd.com>
>
> Without SEV-SNP, Automatic IBRS protects only the kernel. But when
> SEV-SNP is enabled, the Automatic IBRS protection umbrella widens to all
> host-side code, including userspace. This protection comes at a cost:
> reduced userspace indirect branch performance.
>
> To avoid this performance loss, don't use Automatic IBRS on SEV-SNP
> hosts. Fall back to retpolines instead.
>
> Signed-off-by: Kim Phillips <kim.phillips@amd.com>
> [mdr: squash in changes from review discussion]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> arch/x86/kernel/cpu/common.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v10 05/50] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled
2023-10-16 13:27 ` [PATCH v10 05/50] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled Michael Roth
2023-10-25 17:33 ` Borislav Petkov
@ 2023-10-27 21:50 ` Dave Hansen
2023-12-13 12:52 ` Paolo Bonzini
2 siblings, 0 replies; 158+ messages in thread
From: Dave Hansen @ 2023-10-27 21:50 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Kim Phillips
On 10/16/23 06:27, Michael Roth wrote:
> Without SEV-SNP, Automatic IBRS protects only the kernel. But when
> SEV-SNP is enabled, the Automatic IBRS protection umbrella widens to all
> host-side code, including userspace. This protection comes at a cost:
> reduced userspace indirect branch performance.
>
> To avoid this performance loss, don't use Automatic IBRS on SEV-SNP
> hosts. Fall back to retpolines instead.
Thanks for the updated changelog:
Acked-by: Dave Hansen <dave.hansen@intel.com>
BTW, have you given your hardware folks a hard time about this? It
seems _kinda_ silly to be using retpolines when the hardware has a
perfectly good IBRS implementation for the kernel.
Just please make sure there's a good underlying reason for this behavior,
as opposed to it being some kind of inadvertent side effect.
I assume Auto-IBRS and SEV-SNP are going to be with us for a long time,
so it would be nice to have a long term solution here.
* Re: [PATCH v10 05/50] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled
2023-10-16 13:27 ` [PATCH v10 05/50] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled Michael Roth
2023-10-25 17:33 ` Borislav Petkov
2023-10-27 21:50 ` Dave Hansen
@ 2023-12-13 12:52 ` Paolo Bonzini
2 siblings, 0 replies; 158+ messages in thread
From: Paolo Bonzini @ 2023-12-13 12:52 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Kim Phillips
On 10/16/23 15:27, Michael Roth wrote:
> From: Kim Phillips <kim.phillips@amd.com>
>
> Without SEV-SNP, Automatic IBRS protects only the kernel. But when
> SEV-SNP is enabled, the Automatic IBRS protection umbrella widens to all
> host-side code, including userspace. This protection comes at a cost:
> reduced userspace indirect branch performance.
>
> To avoid this performance loss, don't use Automatic IBRS on SEV-SNP
> hosts. Fall back to retpolines instead.
>
> Signed-off-by: Kim Phillips <kim.phillips@amd.com>
> [mdr: squash in changes from review discussion]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
Queued, thanks.
Paolo
> ---
> arch/x86/kernel/cpu/common.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 382d4e6b848d..11fae89b799e 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1357,8 +1357,13 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
> /*
> * AMD's AutoIBRS is equivalent to Intel's eIBRS - use the Intel feature
> * flag and protect from vendor-specific bugs via the whitelist.
> + *
> + * Don't use AutoIBRS when SNP is enabled because it degrades host
> + * userspace indirect branch performance.
> */
> - if ((ia32_cap & ARCH_CAP_IBRS_ALL) || cpu_has(c, X86_FEATURE_AUTOIBRS)) {
> + if ((ia32_cap & ARCH_CAP_IBRS_ALL) ||
> + (cpu_has(c, X86_FEATURE_AUTOIBRS) &&
> + !cpu_feature_enabled(X86_FEATURE_SEV_SNP))) {
> setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
> if (!cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
> !(ia32_cap & ARCH_CAP_PBRSB_NO))
* [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (4 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 05/50] x86/speculation: Do not enable Automatic IBRS if SEV SNP is enabled Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-10-25 18:19 ` Tom Lendacky
2023-11-07 16:31 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 07/50] x86/sev: Add RMP entry lookup helpers Michael Roth
` (43 subsequent siblings)
49 siblings, 2 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
The memory integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). The RMP is a single data
structure shared across the system that contains one entry for every 4K
page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
a number of steps needed to detect/enable SEV-SNP and RMP table support
on the host:
- Detect SEV-SNP support based on CPUID bit
- Initialize the RMP table memory reported by the RMP base/end MSR
registers and configure IOMMU to be compatible with RMP access
restrictions
- Set the MtrrFixDramModEn bit in SYSCFG MSR
- Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
- Configure IOMMU
The RMP table entry format is non-architectural and can vary by
processor; it is defined by the PPR. Restrict SNP support to CPU
models/families that are compatible with the current RMP table entry
format to guard against undefined behavior when running on other
system types. Future models will handle this through an
architectural mechanism to allow for broader compatibility.
SNP host code depends on CONFIG_KVM_AMD_SEV config flag, which may be
enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
instead of CONFIG_AMD_MEM_ENCRYPT.
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: rework commit message to be clearer about what patch does, squash
in early_rmptable_check() handling from Tom]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/Kbuild | 2 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/msr-index.h | 11 +-
arch/x86/include/asm/sev.h | 6 +
arch/x86/kernel/cpu/amd.c | 19 ++
arch/x86/virt/svm/Makefile | 3 +
arch/x86/virt/svm/sev.c | 239 +++++++++++++++++++++++
drivers/iommu/amd/init.c | 2 +-
include/linux/amd-iommu.h | 2 +-
9 files changed, 288 insertions(+), 4 deletions(-)
create mode 100644 arch/x86/virt/svm/Makefile
create mode 100644 arch/x86/virt/svm/sev.c
diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
index 5a83da703e87..6a1f36df6a18 100644
--- a/arch/x86/Kbuild
+++ b/arch/x86/Kbuild
@@ -28,5 +28,7 @@ obj-y += net/
obj-$(CONFIG_KEXEC_FILE) += purgatory/
+obj-y += virt/svm/
+
# for cleaning
subdir- += boot tools
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 702d93fdd10e..83efd407033b 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -117,6 +117,12 @@
#define DISABLE_IBT (1 << (X86_FEATURE_IBT & 31))
#endif
+#ifdef CONFIG_KVM_AMD_SEV
+# define DISABLE_SEV_SNP 0
+#else
+# define DISABLE_SEV_SNP (1 << (X86_FEATURE_SEV_SNP & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -141,7 +147,7 @@
DISABLE_ENQCMD)
#define DISABLED_MASK17 0
#define DISABLED_MASK18 (DISABLE_IBT)
-#define DISABLED_MASK19 0
+#define DISABLED_MASK19 (DISABLE_SEV_SNP)
#define DISABLED_MASK20 0
#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 1d111350197f..2be74afb4cbd 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -589,6 +589,8 @@
#define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
#define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
#define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
+#define MSR_AMD64_RMP_BASE 0xc0010132
+#define MSR_AMD64_RMP_END 0xc0010133
/* SNP feature bits enabled by the hypervisor */
#define MSR_AMD64_SNP_VTOM BIT_ULL(3)
@@ -690,7 +692,14 @@
#define MSR_K8_TOP_MEM2 0xc001001d
#define MSR_AMD64_SYSCFG 0xc0010010
#define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT 23
-#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_AMD64_SYSCFG_SNP_EN_BIT 24
+#define MSR_AMD64_SYSCFG_SNP_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
+#define MSR_AMD64_SYSCFG_SNP_VMPL_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
+#define MSR_AMD64_SYSCFG_MFDM_BIT 19
+#define MSR_AMD64_SYSCFG_MFDM BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
+
#define MSR_K8_INT_PENDING_MSG 0xc0010055
/* C1E active bits in int pending message */
#define K8_INTP_C1E_ACTIVE_MASK 0x18000000
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 5b4a1ce3d368..b05fcd0ab7e4 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -243,4 +243,10 @@ static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
static inline u64 sev_get_status(void) { return 0; }
#endif
+#ifdef CONFIG_KVM_AMD_SEV
+bool snp_get_rmptable_info(u64 *start, u64 *len);
+#else
+static inline bool snp_get_rmptable_info(u64 *start, u64 *len) { return false; }
+#endif
+
#endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 14ee7f750cc7..6cc2074fcea3 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -20,6 +20,7 @@
#include <asm/delay.h>
#include <asm/debugreg.h>
#include <asm/resctrl.h>
+#include <asm/sev.h>
#ifdef CONFIG_X86_64
# include <asm/mmconfig.h>
@@ -618,6 +619,20 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
resctrl_cpu_detect(c);
}
+static bool early_rmptable_check(void)
+{
+ u64 rmp_base, rmp_size;
+
+ /*
+ * For early BSP initialization, max_pfn won't be set up yet; wait until
+ * it is set before performing the RMP table calculations.
+ */
+ if (!max_pfn)
+ return true;
+
+ return snp_get_rmptable_info(&rmp_base, &rmp_size);
+}
+
static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
{
u64 msr;
@@ -659,6 +674,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
if (!(msr & MSR_K7_HWCR_SMMLOCK))
goto clear_sev;
+ if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check())
+ goto clear_snp;
+
return;
clear_all:
@@ -666,6 +684,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
clear_sev:
setup_clear_cpu_cap(X86_FEATURE_SEV);
setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+clear_snp:
setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
}
}
diff --git a/arch/x86/virt/svm/Makefile b/arch/x86/virt/svm/Makefile
new file mode 100644
index 000000000000..ef2a31bdcc70
--- /dev/null
+++ b/arch/x86/virt/svm/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_KVM_AMD_SEV) += sev.o
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
new file mode 100644
index 000000000000..8b9ed72489e4
--- /dev/null
+++ b/arch/x86/virt/svm/sev.c
@@ -0,0 +1,239 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AMD SVM-SEV Host Support.
+ *
+ * Copyright (C) 2023 Advanced Micro Devices, Inc.
+ *
+ * Author: Ashish Kalra <ashish.kalra@amd.com>
+ *
+ */
+
+#include <linux/cc_platform.h>
+#include <linux/printk.h>
+#include <linux/mm_types.h>
+#include <linux/set_memory.h>
+#include <linux/memblock.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/cpumask.h>
+#include <linux/iommu.h>
+#include <linux/amd-iommu.h>
+
+#include <asm/sev.h>
+#include <asm/processor.h>
+#include <asm/setup.h>
+#include <asm/svm.h>
+#include <asm/smp.h>
+#include <asm/cpu.h>
+#include <asm/apic.h>
+#include <asm/cpuid.h>
+#include <asm/cmdline.h>
+#include <asm/iommu.h>
+
+/*
+ * The RMP entry format is not architectural. The format is defined in PPR
+ * Family 19h Model 01h, Rev B1 processor.
+ */
+struct rmpentry {
+ u64 assigned : 1,
+ pagesize : 1,
+ immutable : 1,
+ rsvd1 : 9,
+ gpa : 39,
+ asid : 10,
+ vmsa : 1,
+ validated : 1,
+ rsvd2 : 1;
+ u64 rsvd3;
+} __packed;
+
+/*
+ * The first 16KB from RMP_BASE is used by the processor for bookkeeping;
+ * this size must be added in when performing RMP entry lookups.
+ */
+#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
+
+static struct rmpentry *rmptable_start __ro_after_init;
+static u64 rmptable_max_pfn __ro_after_init;
+
+#undef pr_fmt
+#define pr_fmt(fmt) "SEV-SNP: " fmt
+
+static int __mfd_enable(unsigned int cpu)
+{
+ u64 val;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+
+ val |= MSR_AMD64_SYSCFG_MFDM;
+
+ wrmsrl(MSR_AMD64_SYSCFG, val);
+
+ return 0;
+}
+
+static __init void mfd_enable(void *arg)
+{
+ __mfd_enable(smp_processor_id());
+}
+
+static int __snp_enable(unsigned int cpu)
+{
+ u64 val;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+
+ val |= MSR_AMD64_SYSCFG_SNP_EN;
+ val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;
+
+ wrmsrl(MSR_AMD64_SYSCFG, val);
+
+ return 0;
+}
+
+static __init void snp_enable(void *arg)
+{
+ __snp_enable(smp_processor_id());
+}
+
+#define RMP_ADDR_MASK GENMASK_ULL(51, 13)
+
+bool snp_get_rmptable_info(u64 *start, u64 *len)
+{
+ u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
+
+ rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
+ rdmsrl(MSR_AMD64_RMP_END, rmp_end);
+
+ if (!(rmp_base & RMP_ADDR_MASK) || !(rmp_end & RMP_ADDR_MASK)) {
+ pr_err("Memory for the RMP table has not been reserved by BIOS\n");
+ return false;
+ }
+
+ if (rmp_base > rmp_end) {
+ pr_err("RMP configuration not valid: base=%#llx, end=%#llx\n", rmp_base, rmp_end);
+ return false;
+ }
+
+ rmp_sz = rmp_end - rmp_base + 1;
+
+ /*
+ * Calculate the amount of memory that must be reserved by the BIOS to
+ * address the whole RAM, including the bookkeeping area. The RMP itself
+ * must also be covered.
+ */
+ max_rmp_pfn = max_pfn;
+ if (PHYS_PFN(rmp_end) > max_pfn)
+ max_rmp_pfn = PHYS_PFN(rmp_end);
+
+ calc_rmp_sz = (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
+
+ if (calc_rmp_sz > rmp_sz) {
+ pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
+ calc_rmp_sz, rmp_sz);
+ return false;
+ }
+
+ *start = rmp_base;
+ *len = rmp_sz;
+
+ return true;
+}
+
+static __init int __snp_rmptable_init(void)
+{
+ u64 rmp_base, rmp_size;
+ void *rmp_start;
+ u64 val;
+
+ if (!snp_get_rmptable_info(&rmp_base, &rmp_size))
+ return 1;
+
+ pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n",
+ rmp_base, rmp_base + rmp_size - 1);
+
+ rmp_start = memremap(rmp_base, rmp_size, MEMREMAP_WB);
+ if (!rmp_start) {
+ pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, rmp_size);
+ return 1;
+ }
+
+ /*
+ * Check if SEV-SNP is already enabled, this can happen in case of
+ * kexec boot.
+ */
+ rdmsrl(MSR_AMD64_SYSCFG, val);
+ if (val & MSR_AMD64_SYSCFG_SNP_EN)
+ goto skip_enable;
+
+ /* Initialize the RMP table to zero */
+ memset(rmp_start, 0, rmp_size);
+
+ /* Flush the caches to ensure that data is written before SNP is enabled. */
+ wbinvd_on_all_cpus();
+
+ /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
+ on_each_cpu(mfd_enable, NULL, 1);
+
+ /* Enable SNP on all CPUs. */
+ on_each_cpu(snp_enable, NULL, 1);
+
+skip_enable:
+ rmp_start += RMPTABLE_CPU_BOOKKEEPING_SZ;
+ rmp_size -= RMPTABLE_CPU_BOOKKEEPING_SZ;
+
+ rmptable_start = (struct rmpentry *)rmp_start;
+ rmptable_max_pfn = rmp_size / sizeof(struct rmpentry) - 1;
+
+ return 0;
+}
+
+static int __init snp_rmptable_init(void)
+{
+ int family, model;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return 0;
+
+ family = boot_cpu_data.x86;
+ model = boot_cpu_data.x86_model;
+
+ /*
+ * The RMP table entry format is not architectural; it can vary by processor
+ * and is defined by the per-processor PPR. Restrict SNP support to the known
+ * CPU models and families for which the entry format is currently defined.
+ */
+ if (family != 0x19 || model > 0xaf)
+ goto nosnp;
+
+ if (amd_iommu_snp_enable())
+ goto nosnp;
+
+ if (__snp_rmptable_init())
+ goto nosnp;
+
+ cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
+
+ return 0;
+
+nosnp:
+ setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
+ return -ENOSYS;
+}
+
+/*
+ * This must be called after the PCI subsystem. This is because amd_iommu_snp_enable()
+ * is called to ensure the IOMMU supports the SEV-SNP feature, which can only be
+ * called after subsys_initcall().
+ *
+ * NOTE: IOMMU is enforced by SNP to ensure that hypervisor cannot program DMA
+ * directly into guest private memory. In case of SNP, the IOMMU ensures that
+ * the page(s) used for DMA are hypervisor owned.
+ */
+fs_initcall(snp_rmptable_init);
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 45efb7e5d725..1c9924de607a 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3802,7 +3802,7 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64
return iommu_pc_get_set_reg(iommu, bank, cntr, fxn, value, true);
}
-#ifdef CONFIG_AMD_MEM_ENCRYPT
+#ifdef CONFIG_KVM_AMD_SEV
int amd_iommu_snp_enable(void)
{
/*
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 99a5201d9e62..55fc03cb3968 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -205,7 +205,7 @@ int amd_iommu_pc_get_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn,
u64 *value);
struct amd_iommu *get_amd_iommu(unsigned int idx);
-#ifdef CONFIG_AMD_MEM_ENCRYPT
+#ifdef CONFIG_KVM_AMD_SEV
int amd_iommu_snp_enable(void);
#endif
--
2.25.1
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-10-16 13:27 ` [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support Michael Roth
@ 2023-10-25 18:19 ` Tom Lendacky
2023-11-07 16:31 ` Borislav Petkov
1 sibling, 0 replies; 158+ messages in thread
From: Tom Lendacky @ 2023-10-25 18:19 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
rientjes, dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
marcorr, sathyanarayanan.kuppuswamy, alpergun, jarkko,
ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang, Brijesh Singh
On 10/16/23 08:27, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> The memory integrity guarantees of SEV-SNP are enforced through a new
> structure called the Reverse Map Table (RMP). The RMP is a single data
> structure shared across the system that contains one entry for every 4K
> page of DRAM that may be used by SEV-SNP VMs. APM2 section 15.36 details
> a number of steps needed to detect/enable SEV-SNP and RMP table support
> on the host:
>
> - Detect SEV-SNP support based on CPUID bit
> - Initialize the RMP table memory reported by the RMP base/end MSR
> registers and configure IOMMU to be compatible with RMP access
> restrictions
> - Set the MtrrFixDramModEn bit in SYSCFG MSR
> - Set the SecureNestedPagingEn and VMPLEn bits in the SYSCFG MSR
> - Configure IOMMU
>
> RMP table entry format is non-architectural and it can vary by
> processor. It is defined by the PPR. Restrict SNP support to CPU
> models/families which are compatible with the current RMP table entry
> format to guard against any undefined behavior when running on other
> system types. Future models/support will handle this through an
> architectural mechanism to allow for broader compatibility.
>
> SNP host code depends on CONFIG_KVM_AMD_SEV config flag, which may be
> enabled even when CONFIG_AMD_MEM_ENCRYPT isn't set, so update the
> SNP-specific IOMMU helpers used here to rely on CONFIG_KVM_AMD_SEV
> instead of CONFIG_AMD_MEM_ENCRYPT.
>
> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> [mdr: rework commit message to be clearer about what patch does, squash
> in early_rmptable_check() handling from Tom]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> arch/x86/Kbuild | 2 +
> arch/x86/include/asm/disabled-features.h | 8 +-
> arch/x86/include/asm/msr-index.h | 11 +-
> arch/x86/include/asm/sev.h | 6 +
> arch/x86/kernel/cpu/amd.c | 19 ++
> arch/x86/virt/svm/Makefile | 3 +
> arch/x86/virt/svm/sev.c | 239 +++++++++++++++++++++++
> drivers/iommu/amd/init.c | 2 +-
> include/linux/amd-iommu.h | 2 +-
> 9 files changed, 288 insertions(+), 4 deletions(-)
> create mode 100644 arch/x86/virt/svm/Makefile
> create mode 100644 arch/x86/virt/svm/sev.c
>
> diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
> index 5a83da703e87..6a1f36df6a18 100644
> --- a/arch/x86/Kbuild
> +++ b/arch/x86/Kbuild
> @@ -28,5 +28,7 @@ obj-y += net/
>
> obj-$(CONFIG_KEXEC_FILE) += purgatory/
>
> +obj-y += virt/svm/
> +
> # for cleaning
> subdir- += boot tools
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index 702d93fdd10e..83efd407033b 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -117,6 +117,12 @@
> #define DISABLE_IBT (1 << (X86_FEATURE_IBT & 31))
> #endif
>
> +#ifdef CONFIG_KVM_AMD_SEV
> +# define DISABLE_SEV_SNP 0
> +#else
> +# define DISABLE_SEV_SNP (1 << (X86_FEATURE_SEV_SNP & 31))
> +#endif
> +
> /*
> * Make sure to add features to the correct mask
> */
> @@ -141,7 +147,7 @@
> DISABLE_ENQCMD)
> #define DISABLED_MASK17 0
> #define DISABLED_MASK18 (DISABLE_IBT)
> -#define DISABLED_MASK19 0
> +#define DISABLED_MASK19 (DISABLE_SEV_SNP)
> #define DISABLED_MASK20 0
> #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 1d111350197f..2be74afb4cbd 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -589,6 +589,8 @@
> #define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
> #define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
> #define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
> +#define MSR_AMD64_RMP_BASE 0xc0010132
> +#define MSR_AMD64_RMP_END 0xc0010133
>
> /* SNP feature bits enabled by the hypervisor */
> #define MSR_AMD64_SNP_VTOM BIT_ULL(3)
> @@ -690,7 +692,14 @@
> #define MSR_K8_TOP_MEM2 0xc001001d
> #define MSR_AMD64_SYSCFG 0xc0010010
> #define MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT 23
> -#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_AMD64_SYSCFG_MEM_ENCRYPT_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_EN_BIT 24
> +#define MSR_AMD64_SYSCFG_SNP_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_EN_BIT)
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT 25
> +#define MSR_AMD64_SYSCFG_SNP_VMPL_EN BIT_ULL(MSR_AMD64_SYSCFG_SNP_VMPL_EN_BIT)
> +#define MSR_AMD64_SYSCFG_MFDM_BIT 19
> +#define MSR_AMD64_SYSCFG_MFDM BIT_ULL(MSR_AMD64_SYSCFG_MFDM_BIT)
> +
> #define MSR_K8_INT_PENDING_MSG 0xc0010055
> /* C1E active bits in int pending message */
> #define K8_INTP_C1E_ACTIVE_MASK 0x18000000
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 5b4a1ce3d368..b05fcd0ab7e4 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -243,4 +243,10 @@ static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
> static inline u64 sev_get_status(void) { return 0; }
> #endif
>
> +#ifdef CONFIG_KVM_AMD_SEV
> +bool snp_get_rmptable_info(u64 *start, u64 *len);
> +#else
> +static inline bool snp_get_rmptable_info(u64 *start, u64 *len) { return false; }
> +#endif
> +
> #endif
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 14ee7f750cc7..6cc2074fcea3 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -20,6 +20,7 @@
> #include <asm/delay.h>
> #include <asm/debugreg.h>
> #include <asm/resctrl.h>
> +#include <asm/sev.h>
>
> #ifdef CONFIG_X86_64
> # include <asm/mmconfig.h>
> @@ -618,6 +619,20 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
> resctrl_cpu_detect(c);
> }
>
> +static bool early_rmptable_check(void)
> +{
> + u64 rmp_base, rmp_size;
> +
> + /*
> + * For early BSP initialization, max_pfn won't be set up yet, wait until
> + * it is set before performing the RMP table calculations.
> + */
> + if (!max_pfn)
> + return true;
So that AutoIBRS isn't disabled when an RMP table hasn't been allocated
by the BIOS, let's delete the above check. It then becomes just a check
for whether the RMP table has been allocated by the BIOS (enabled by
selecting a BIOS option), which shows intent to run SNP guests.
This way, the AutoIBRS mitigation can be used if SNP is not possible on
the system.
Thanks,
Tom
> +
> + return snp_get_rmptable_info(&rmp_base, &rmp_size);
> +}
> [...]
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-10-16 13:27 ` [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support Michael Roth
2023-10-25 18:19 ` Tom Lendacky
@ 2023-11-07 16:31 ` Borislav Petkov
2023-11-07 18:32 ` Tom Lendacky
` (2 more replies)
1 sibling, 3 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-11-07 16:31 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Mon, Oct 16, 2023 at 08:27:35AM -0500, Michael Roth wrote:
> +static bool early_rmptable_check(void)
> +{
> + u64 rmp_base, rmp_size;
> +
> + /*
> + * For early BSP initialization, max_pfn won't be set up yet, wait until
> + * it is set before performing the RMP table calculations.
> + */
> + if (!max_pfn)
> + return true;
This already says that this is called at the wrong point during init.
Right now we have
early_identify_cpu -> early_init_amd -> early_detect_mem_encrypt
which runs only on the BSP but then early_init_amd() is called in
init_amd() too so that it takes care of the APs too.
Which ends up doing a lot of unnecessary work on each AP in
early_detect_mem_encrypt(), like calculating the RMP size on every AP
when this needs to happen exactly once.
Is there any reason why this function cannot be moved to init_amd()
where it'll do the normal, per-AP init?
And the stuff that needs to happen once, needs to be called once too.
> +
> + return snp_get_rmptable_info(&rmp_base, &rmp_size);
> +}
> +
> static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> {
> u64 msr;
> @@ -659,6 +674,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> if (!(msr & MSR_K7_HWCR_SMMLOCK))
> goto clear_sev;
>
> + if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check())
> + goto clear_snp;
> +
> return;
>
> clear_all:
> @@ -666,6 +684,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> clear_sev:
> setup_clear_cpu_cap(X86_FEATURE_SEV);
> setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
> +clear_snp:
> setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> }
> }
...
> +bool snp_get_rmptable_info(u64 *start, u64 *len)
> +{
> + u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
> +
> + rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
> + rdmsrl(MSR_AMD64_RMP_END, rmp_end);
> +
> + if (!(rmp_base & RMP_ADDR_MASK) || !(rmp_end & RMP_ADDR_MASK)) {
> + pr_err("Memory for the RMP table has not been reserved by BIOS\n");
> + return false;
> + }
If you're masking off bits 0-12 above...
> +
> + if (rmp_base > rmp_end) {
... why aren't you using the masked out vars further on?
I know, the hw will say, yeah, those bits are 0 but still. IOW, do:
rmp_base &= RMP_ADDR_MASK;
rmp_end &= RMP_ADDR_MASK;
after reading them.
> + pr_err("RMP configuration not valid: base=%#llx, end=%#llx\n", rmp_base, rmp_end);
> + return false;
> + }
> +
> + rmp_sz = rmp_end - rmp_base + 1;
> +
> + /*
> + * Calculate the amount the memory that must be reserved by the BIOS to
> + * address the whole RAM, including the bookkeeping area. The RMP itself
> + * must also be covered.
> + */
> + max_rmp_pfn = max_pfn;
> + if (PHYS_PFN(rmp_end) > max_pfn)
> + max_rmp_pfn = PHYS_PFN(rmp_end);
> +
> + calc_rmp_sz = (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
> +
> + if (calc_rmp_sz > rmp_sz) {
> + pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
> + calc_rmp_sz, rmp_sz);
> + return false;
> + }
> +
> + *start = rmp_base;
> + *len = rmp_sz;
> +
> + return true;
> +}
> +
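For illustration, the sizing check in the quoted snp_get_rmptable_info() boils down to simple arithmetic: one 16-byte RMP entry per 4K page of RAM (hence the `<< 4`), plus a fixed bookkeeping header. A minimal sketch - the entry size comes from the quoted code, while the bookkeeping size here is an assumed placeholder, not the patch's actual RMPTABLE_CPU_BOOKKEEPING_SZ:

```c
#include <stdint.h>

/*
 * Illustrative only: the entry size (16 bytes, hence the << 4 in the
 * patch) matches the quoted code; the bookkeeping size below is an
 * assumed placeholder, not taken from the patch.
 */
#define RMP_ENTRY_SHIFT          4       /* one 16-byte entry per 4K page */
#define RMP_BOOKKEEPING_SZ_GUESS 0x4000  /* assumed fixed header size */

/* Minimum RMP reservation needed to cover max_pfn pages of RAM. */
static inline uint64_t rmp_size_needed(uint64_t max_pfn)
{
	return (max_pfn << RMP_ENTRY_SHIFT) + RMP_BOOKKEEPING_SZ_GUESS;
}
```

So, e.g., 16 GiB of RAM (4M pages) needs roughly 64 MiB of RMP entries plus the header, which is why the check compares against the BIOS-reserved window rather than assuming it fits.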
> +static __init int __snp_rmptable_init(void)
> +{
> + u64 rmp_base, rmp_size;
> + void *rmp_start;
> + u64 val;
> +
> + if (!snp_get_rmptable_info(&rmp_base, &rmp_size))
> + return 1;
> +
> + pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n",
That's "RMP table physical range"
> + rmp_base, rmp_base + rmp_size - 1);
> +
> + rmp_start = memremap(rmp_base, rmp_size, MEMREMAP_WB);
> + if (!rmp_start) {
> + pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, rmp_size);
No need to dump rmp_base and rmp_size again here - you're dumping them
above.
> + return 1;
> + }
> +
> + /*
> + * Check if SEV-SNP is already enabled, this can happen in case of
> + * kexec boot.
> + */
> + rdmsrl(MSR_AMD64_SYSCFG, val);
> + if (val & MSR_AMD64_SYSCFG_SNP_EN)
> + goto skip_enable;
> +
> + /* Initialize the RMP table to zero */
Again: useless comment.
> + memset(rmp_start, 0, rmp_size);
> +
> + /* Flush the caches to ensure that data is written before SNP is enabled. */
> + wbinvd_on_all_cpus();
> +
> + /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
First of all, use the APM bit name here pls: MtrrFixDramModEn.
And then, for the life of me, I can't find any mention in the APM why
this bit is needed. Neither in "15.36.2 Enabling SEV-SNP" nor in
"15.34.3 Enabling SEV".
Looking at the bit definitions of WrMem and RdMem - read and write
requests get directed to system memory instead of MMIO - I guess you
don't want to be able to write MMIO for certain physical ranges when SNP
is enabled, but it'll be good to have this properly explained instead of
a "this must happen" information-less sentence.
> + on_each_cpu(mfd_enable, NULL, 1);
> +
> + /* Enable SNP on all CPUs. */
Useless comment.
> + on_each_cpu(snp_enable, NULL, 1);
> +
> +skip_enable:
> + rmp_start += RMPTABLE_CPU_BOOKKEEPING_SZ;
> + rmp_size -= RMPTABLE_CPU_BOOKKEEPING_SZ;
> +
> + rmptable_start = (struct rmpentry *)rmp_start;
> + rmptable_max_pfn = rmp_size / sizeof(struct rmpentry) - 1;
> +
> + return 0;
> +}
> +
> +static int __init snp_rmptable_init(void)
> +{
> + int family, model;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return 0;
> +
> + family = boot_cpu_data.x86;
> + model = boot_cpu_data.x86_model;
Looks useless - just use boot_cpu_data directly below.
As mentioned here already https://lore.kernel.org/all/Y9ubi0i4Z750gdMm@zn.tnic/
And I already mentioned that for v9:
https://lore.kernel.org/r/20230621094236.GZZJLGDAicp1guNPvD@fat_crate.local
Next time I'm NAKing this patch until you incorporate all review
comments or you give a technical reason why you disagree with them.
> + /*
> + * RMP table entry format is not architectural and it can vary by processor and
> + * is defined by the per-processor PPR. Restrict SNP support on the known CPU
> + * model and family for which the RMP table entry format is currently defined for.
> + */
> + if (family != 0x19 || model > 0xaf)
> + goto nosnp;
> +
> + if (amd_iommu_snp_enable())
> + goto nosnp;
> +
> + if (__snp_rmptable_init())
> + goto nosnp;
> +
> + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
> +
> + return 0;
> +
> +nosnp:
> + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> + return -ENOSYS;
> +}
> +
> +/*
> + * This must be called after the PCI subsystem. This is because amd_iommu_snp_enable()
> + * is called to ensure the IOMMU supports the SEV-SNP feature, which can only be
> + * called after subsys_initcall().
> + *
> + * NOTE: IOMMU is enforced by SNP to ensure that hypervisor cannot program DMA
> + * directly into guest private memory. In case of SNP, the IOMMU ensures that
> + * the page(s) used for DMA are hypervisor owned.
> + */
> +fs_initcall(snp_rmptable_init);
This looks backwards. AFAICT, the IOMMU code should call arch code to
enable SNP at the right time, not the other way around - arch code
calling driver code.
Especially if the SNP table enablement depends on some exact IOMMU
init_state:
if (init_state > IOMMU_ENABLED) {
pr_err("SNP: Too late to enable SNP for IOMMU.\n");
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-07 16:31 ` Borislav Petkov
@ 2023-11-07 18:32 ` Tom Lendacky
2023-11-07 19:13 ` Borislav Petkov
2023-11-08 8:21 ` Jeremi Piotrowski
2023-11-07 19:00 ` Kalra, Ashish
2023-12-20 7:07 ` Michael Roth
2 siblings, 2 replies; 158+ messages in thread
From: Tom Lendacky @ 2023-11-07 18:32 UTC (permalink / raw)
To: Borislav Petkov, Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
rientjes, dovmurik, tobin, vbabka, kirill, ak, tony.luck,
marcorr, sathyanarayanan.kuppuswamy, alpergun, jarkko,
ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang, Brijesh Singh
On 11/7/23 10:31, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:35AM -0500, Michael Roth wrote:
>> +static bool early_rmptable_check(void)
>> +{
>> + u64 rmp_base, rmp_size;
>> +
>> + /*
>> + * For early BSP initialization, max_pfn won't be set up yet, wait until
>> + * it is set before performing the RMP table calculations.
>> + */
>> + if (!max_pfn)
>> + return true;
>
> This already says that this is called at the wrong point during init.
(Just addressing some of your comments, Mike to address others)
I commented earlier that we can remove this check and then it becomes
purely a check for whether the RMP table has been pre-allocated by the
BIOS. It is done early here in order to allow for AutoIBRS to be used as a
Spectre mitigation. If an RMP table has not been allocated by BIOS then
the SNP feature can be cleared, allowing AutoIBRS to be used, if available.
>
> Right now we have
>
> early_identify_cpu -> early_init_amd -> early_detect_mem_encrypt
>
> which runs only on the BSP but then early_init_amd() is called in
> init_amd() too so that it takes care of the APs too.
>
> Which ends up doing a lot of unnecessary work on each AP in
> early_detect_mem_encrypt() like calculating the RMP size on each AP
> unnecessarily where this needs to happen exactly once.
>
> Is there any reason why this function cannot be moved to init_amd()
> where it'll do the normal, per-AP init?
It needs to be called early enough to allow for AutoIBRS to not be
disabled just because SNP is supported. By calling it where it is
currently called, the SNP feature can be cleared if, even though
supported, SNP can't be used, allowing AutoIBRS to be used as a more
performant Spectre mitigation.
>
> And the stuff that needs to happen once, needs to be called once too.
>
>> +
>> + return snp_get_rmptable_info(&rmp_base, &rmp_size);
>> +}
>> +
>> static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>> {
>> u64 msr;
>> @@ -659,6 +674,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>> if (!(msr & MSR_K7_HWCR_SMMLOCK))
>> goto clear_sev;
>>
>> + if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check())
>> + goto clear_snp;
>> +
>> return;
>>
>> clear_all:
>> @@ -666,6 +684,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>> clear_sev:
>> setup_clear_cpu_cap(X86_FEATURE_SEV);
>> setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
>> +clear_snp:
>> setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>> }
>> }
>
> ...
>
>> +bool snp_get_rmptable_info(u64 *start, u64 *len)
>> +{
>> + u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
>> +
>> + rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
>> + rdmsrl(MSR_AMD64_RMP_END, rmp_end);
>> +
>> + if (!(rmp_base & RMP_ADDR_MASK) || !(rmp_end & RMP_ADDR_MASK)) {
>> + pr_err("Memory for the RMP table has not been reserved by BIOS\n");
>> + return false;
>> + }
>
> If you're masking off bits 0-12 above...
Because the RMP_END MSR, most specifically, has a default value of 0x1fff,
where bits [12:0] are reserved. So to specifically check whether the MSR has
been set to a non-zero end value, the bits are masked off. However, ...
>
>> +
>> + if (rmp_base > rmp_end) {
>
> ... why aren't you using the masked out vars further on?
... the full values can be used once they have been determined to not be zero.
>
> I know, the hw will say, yeah, those bits are 0 but still. IOW, do:
>
> rmp_base &= RMP_ADDR_MASK;
> rmp_end &= RMP_ADDR_MASK;
>
> after reading them.
You can't for RMP_END since it will always have bits 12:0 set to one and
you shouldn't clear them once you know that the MSR has truly been set.
Thanks,
Tom
>
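The distinction Tom describes - a masked check that rejects an MSR still holding only reserved low bits, while a genuinely programmed value passes even though those bits stay set - can be sketched as follows. The mask here assumes bits [12:0] are reserved, mirroring the 0x1fff value under discussion; it is an illustration, not the patch's actual RMP_ADDR_MASK definition:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the check discussed above. RMP_ADDR_MASK_GUESS assumes
 * bits [12:0] are reserved, mirroring the 0x1fff value under
 * discussion; it is not the patch's actual RMP_ADDR_MASK definition.
 */
#define RMP_RESERVED_LOW_BITS 0x1fffULL
#define RMP_ADDR_MASK_GUESS   (~RMP_RESERVED_LOW_BITS)

/* True once firmware has written a real address into the MSR. */
static bool rmp_msr_programmed(uint64_t msr_val)
{
	return (msr_val & RMP_ADDR_MASK_GUESS) != 0;
}
```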
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-07 18:32 ` Tom Lendacky
@ 2023-11-07 19:13 ` Borislav Petkov
2023-11-08 8:21 ` Jeremi Piotrowski
1 sibling, 0 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-11-07 19:13 UTC (permalink / raw)
To: Tom Lendacky
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Tue, Nov 07, 2023 at 12:32:58PM -0600, Tom Lendacky wrote:
> It needs to be called early enough to allow for AutoIBRS to not be disabled
> just because SNP is supported. By calling it where it is currently called,
> the SNP feature can be cleared if, even though supported, SNP can't be used,
> allowing AutoIBRS to be used as a more performant Spectre mitigation.
So far so good.
However, early_rmptable_check -> snp_get_rmptable_info is unnecessary
work which happens on every AP for no reason whatsoever. That's reading
RMP_BASE and RMP_END, doing the same checks which it did on the BSP and
then throwing away the computed rmp_base and rmp_sz, all once per AP.
I don't mind doing early work which needs to be done only once.
I mind doing work which needs to be done only once, on every AP.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-07 18:32 ` Tom Lendacky
2023-11-07 19:13 ` Borislav Petkov
@ 2023-11-08 8:21 ` Jeremi Piotrowski
2023-11-08 15:19 ` Tom Lendacky
1 sibling, 1 reply; 158+ messages in thread
From: Jeremi Piotrowski @ 2023-11-08 8:21 UTC (permalink / raw)
To: Tom Lendacky, Borislav Petkov, Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
rientjes, dovmurik, tobin, vbabka, kirill, ak, tony.luck,
marcorr, sathyanarayanan.kuppuswamy, alpergun, jarkko,
ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang, Brijesh Singh
On 07/11/2023 19:32, Tom Lendacky wrote:
> On 11/7/23 10:31, Borislav Petkov wrote:
>>
>> And the stuff that needs to happen once, needs to be called once too.
>>
>>> +
>>> + return snp_get_rmptable_info(&rmp_base, &rmp_size);
>>> +}
>>> +
>>> static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>>> {
>>> u64 msr;
>>> @@ -659,6 +674,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>>> if (!(msr & MSR_K7_HWCR_SMMLOCK))
>>> goto clear_sev;
>>> + if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check())
>>> + goto clear_snp;
>>> +
>>> return;
>>> clear_all:
>>> @@ -666,6 +684,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>>> clear_sev:
>>> setup_clear_cpu_cap(X86_FEATURE_SEV);
>>> setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
>>> +clear_snp:
>>> setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>>> }
>>> }
>>
>> ...
>>
>>> +bool snp_get_rmptable_info(u64 *start, u64 *len)
>>> +{
>>> + u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
>>> +
>>> + rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
>>> + rdmsrl(MSR_AMD64_RMP_END, rmp_end);
>>> +
>>> + if (!(rmp_base & RMP_ADDR_MASK) || !(rmp_end & RMP_ADDR_MASK)) {
>>> + pr_err("Memory for the RMP table has not been reserved by BIOS\n");
>>> + return false;
>>> + }
>>
>> If you're masking off bits 0-12 above...
>
> Because the RMP_END MSR, most specifically, has a default value of 0x1fff, where bits [12:0] are reserved. So to specifically check whether the MSR has been set to a non-zero end value, the bits are masked off. However, ...
>
Do you have a source for this? Because the APM vol. 2, table A.7 says the reset value of RMP_END is all zeros.
Thanks,
Jeremi
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-08 8:21 ` Jeremi Piotrowski
@ 2023-11-08 15:19 ` Tom Lendacky
0 siblings, 0 replies; 158+ messages in thread
From: Tom Lendacky @ 2023-11-08 15:19 UTC (permalink / raw)
To: Jeremi Piotrowski, Borislav Petkov, Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, hpa, ardb, pbonzini, seanjc, vkuznets, jmattson,
luto, dave.hansen, slp, pgonda, peterz, srinivas.pandruvada,
rientjes, dovmurik, tobin, vbabka, kirill, ak, tony.luck,
marcorr, sathyanarayanan.kuppuswamy, alpergun, jarkko,
ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang, Brijesh Singh
On 11/8/23 02:21, Jeremi Piotrowski wrote:
> On 07/11/2023 19:32, Tom Lendacky wrote:
>> On 11/7/23 10:31, Borislav Petkov wrote:
>>>
>>> And the stuff that needs to happen once, needs to be called once too.
>>>
>>>> +
>>>> + return snp_get_rmptable_info(&rmp_base, &rmp_size);
>>>> +}
>>>> +
>>>> static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>>>> {
>>>> u64 msr;
>>>> @@ -659,6 +674,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>>>> if (!(msr & MSR_K7_HWCR_SMMLOCK))
>>>> goto clear_sev;
>>>> + if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check())
>>>> + goto clear_snp;
>>>> +
>>>> return;
>>>> clear_all:
>>>> @@ -666,6 +684,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>>>> clear_sev:
>>>> setup_clear_cpu_cap(X86_FEATURE_SEV);
>>>> setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
>>>> +clear_snp:
>>>> setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>>>> }
>>>> }
>>>
>>> ...
>>>
>>>> +bool snp_get_rmptable_info(u64 *start, u64 *len)
>>>> +{
>>>> + u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
>>>> +
>>>> + rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
>>>> + rdmsrl(MSR_AMD64_RMP_END, rmp_end);
>>>> +
>>>> + if (!(rmp_base & RMP_ADDR_MASK) || !(rmp_end & RMP_ADDR_MASK)) {
>>>> + pr_err("Memory for the RMP table has not been reserved by BIOS\n");
>>>> + return false;
>>>> + }
>>>
>>> If you're masking off bits 0-12 above...
>>
>> Because the RMP_END MSR, most specifically, has a default value of 0x1fff, where bits [12:0] are reserved. So to specifically check whether the MSR has been set to a non-zero end value, the bits are masked off. However, ...
>>
>
> Do you have a source for this? Because the APM vol. 2, table A.7 says the reset value of RMP_END is all zeros.
Ah, good catch. Let me work on getting the APM updated.
Thanks,
Tom
>
> Thanks,
> Jeremi
>
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-07 16:31 ` Borislav Petkov
2023-11-07 18:32 ` Tom Lendacky
@ 2023-11-07 19:00 ` Kalra, Ashish
2023-11-07 19:19 ` Borislav Petkov
2023-12-08 17:09 ` Jeremi Piotrowski
2023-12-20 7:07 ` Michael Roth
2 siblings, 2 replies; 158+ messages in thread
From: Kalra, Ashish @ 2023-11-07 19:00 UTC (permalink / raw)
To: Borislav Petkov, Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
Hello Boris,
Addressing of some of the remaining comments:
On 11/7/2023 10:31 AM, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:35AM -0500, Michael Roth wrote:
>> +static bool early_rmptable_check(void)
>> +{
>> + u64 rmp_base, rmp_size;
>> +
>> + /*
>> + * For early BSP initialization, max_pfn won't be set up yet, wait until
>> + * it is set before performing the RMP table calculations.
>> + */
>> + if (!max_pfn)
>> + return true;
>
> This already says that this is called at the wrong point during init.
>
> Right now we have
>
> early_identify_cpu -> early_init_amd -> early_detect_mem_encrypt
>
> which runs only on the BSP but then early_init_amd() is called in
> init_amd() too so that it takes care of the APs too.
>
> Which ends up doing a lot of unnecessary work on each AP in
> early_detect_mem_encrypt() like calculating the RMP size on each AP
> unnecessarily where this needs to happen exactly once.
>
> Is there any reason why this function cannot be moved to init_amd()
> where it'll do the normal, per-AP init?
>
> And the stuff that needs to happen once, needs to be called once too.
>
>> +
>> + return snp_get_rmptable_info(&rmp_base, &rmp_size);
>> +}
>> +
>> static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>> {
>> u64 msr;
>> @@ -659,6 +674,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>> if (!(msr & MSR_K7_HWCR_SMMLOCK))
>> goto clear_sev;
>>
>> + if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check())
>> + goto clear_snp;
>> +
>> return;
>>
>> clear_all:
>> @@ -666,6 +684,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>> clear_sev:
>> setup_clear_cpu_cap(X86_FEATURE_SEV);
>> setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
>> +clear_snp:
>> setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>> }
>> }
>
> ...
>
>> +bool snp_get_rmptable_info(u64 *start, u64 *len)
>> +{
>> + u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
>> +
>> + rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
>> + rdmsrl(MSR_AMD64_RMP_END, rmp_end);
>> +
>> + if (!(rmp_base & RMP_ADDR_MASK) || !(rmp_end & RMP_ADDR_MASK)) {
>> + pr_err("Memory for the RMP table has not been reserved by BIOS\n");
>> + return false;
>> + }
>
> If you're masking off bits 0-12 above...
>
>> +
>> + if (rmp_base > rmp_end) {
>
> ... why aren't you using the masked out vars further on?
>
> I know, the hw will say, yeah, those bits are 0 but still. IOW, do:
>
> rmp_base &= RMP_ADDR_MASK;
> rmp_end &= RMP_ADDR_MASK;
>
> after reading them.
>
>> + pr_err("RMP configuration not valid: base=%#llx, end=%#llx\n", rmp_base, rmp_end);
>> + return false;
>> + }
>> +
>> + rmp_sz = rmp_end - rmp_base + 1;
>> +
>> + /*
>> + * Calculate the amount the memory that must be reserved by the BIOS to
>> + * address the whole RAM, including the bookkeeping area. The RMP itself
>> + * must also be covered.
>> + */
>> + max_rmp_pfn = max_pfn;
>> + if (PHYS_PFN(rmp_end) > max_pfn)
>> + max_rmp_pfn = PHYS_PFN(rmp_end);
>> +
>> + calc_rmp_sz = (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
>> +
>> + if (calc_rmp_sz > rmp_sz) {
>> + pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
>> + calc_rmp_sz, rmp_sz);
>> + return false;
>> + }
>> +
>> + *start = rmp_base;
>> + *len = rmp_sz;
>> +
>> + return true;
>> +}
>> +
>> +static __init int __snp_rmptable_init(void)
>> +{
>> + u64 rmp_base, rmp_size;
>> + void *rmp_start;
>> + u64 val;
>> +
>> + if (!snp_get_rmptable_info(&rmp_base, &rmp_size))
>> + return 1;
>> +
>> + pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n",
>
> That's "RMP table physical range"
>
>> + rmp_base, rmp_base + rmp_size - 1);
>> +
>> + rmp_start = memremap(rmp_base, rmp_size, MEMREMAP_WB);
>> + if (!rmp_start) {
>> + pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, rmp_size);
>
> No need to dump rmp_base and rmp_size again here - you're dumping them
> above.
>
>> + return 1;
>> + }
>> +
>> + /*
>> + * Check if SEV-SNP is already enabled, this can happen in case of
>> + * kexec boot.
>> + */
>> + rdmsrl(MSR_AMD64_SYSCFG, val);
>> + if (val & MSR_AMD64_SYSCFG_SNP_EN)
>> + goto skip_enable;
>> +
>> + /* Initialize the RMP table to zero */
>
> Again: useless comment.
>
>> + memset(rmp_start, 0, rmp_size);
>> +
>> + /* Flush the caches to ensure that data is written before SNP is enabled. */
>> + wbinvd_on_all_cpus();
>> +
>> + /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
>
> First of all, use the APM bit name here pls: MtrrFixDramModEn.
>
> And then, for the life of me, I can't find any mention in the APM why
> this bit is needed. Neither in "15.36.2 Enabling SEV-SNP" nor in
> "15.34.3 Enabling SEV".
>
> Looking at the bit defintions of WrMem an RdMem - read and write
> requests get directed to system memory instead of MMIO so I guess you
> don't want to be able to write MMIO for certain physical ranges when SNP
> is enabled but it'll be good to have this properly explained instead of
> a "this must happen" information-less sentence.
This is a prerequisite for SNP_INIT as per the SNP Firmware ABI
specification, section 8.8.2.
From the SNP FW ABI specs:
If INIT_RMP is 1, then the firmware ensures the following system
requirements are met:
• SYSCFG[MemoryEncryptionModEn] must be set to 1 across all cores. (SEV
  must be enabled.)
• SYSCFG[SecureNestedPagingEn] must be set to 1 across all cores.
• SYSCFG[VMPLEn] must be set to 1 across all cores.
• SYSCFG[MFDM] must be set to 1 across all cores.
• VM_HSAVE_PA (MSR C001_0117) must be set to 0h across all cores.
• HWCR[SmmLock] (MSR C001_0015) must be set to 1 across all cores.
So, this platform enabling code for SNP needs to ensure that these
conditions are met before SNP_INIT is called.
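For illustration, those per-core SYSCFG requirements amount to a single bitmask check. A minimal sketch - the bit positions are assumptions based on the APM's SYSCFG layout, not taken from this patch:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the per-core SYSCFG state the firmware spec (section 8.8.2)
 * requires before SNP_INIT. Bit positions are assumptions from the
 * APM's SYSCFG layout, not taken from this patch.
 */
#define SYSCFG_MFDM_BIT        19 /* MtrrFixDramModEn */
#define SYSCFG_MEM_ENCRYPT_BIT 23 /* MemoryEncryptionModEn */
#define SYSCFG_SNP_EN_BIT      24 /* SecureNestedPagingEn */
#define SYSCFG_VMPL_EN_BIT     25 /* VMPLEn */

static bool syscfg_ready_for_snp_init(uint64_t syscfg)
{
	const uint64_t required = (1ULL << SYSCFG_MFDM_BIT) |
				  (1ULL << SYSCFG_MEM_ENCRYPT_BIT) |
				  (1ULL << SYSCFG_SNP_EN_BIT) |
				  (1ULL << SYSCFG_VMPL_EN_BIT);

	/* All four bits must be set on every core before SNP_INIT. */
	return (syscfg & required) == required;
}
```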
>
>> + on_each_cpu(mfd_enable, NULL, 1);
>> +
>> + /* Enable SNP on all CPUs. */
>
> Useless comment.
>
>> + on_each_cpu(snp_enable, NULL, 1);
>> +
>> +skip_enable:
>> + rmp_start += RMPTABLE_CPU_BOOKKEEPING_SZ;
>> + rmp_size -= RMPTABLE_CPU_BOOKKEEPING_SZ;
>> +
>> + rmptable_start = (struct rmpentry *)rmp_start;
>> + rmptable_max_pfn = rmp_size / sizeof(struct rmpentry) - 1;
>> +
>> + return 0;
>> +}
>> +
>> +static int __init snp_rmptable_init(void)
>> +{
>> + int family, model;
>> +
>> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>> + return 0;
>> +
>> + family = boot_cpu_data.x86;
>> + model = boot_cpu_data.x86_model;
>
> Looks useless - just use boot_cpu_data directly below.
>
> As mentioned here already https://lore.kernel.org/all/Y9ubi0i4Z750gdMm@zn.tnic/
>
> And I already mentioned that for v9:
>
> https://lore.kernel.org/r/20230621094236.GZZJLGDAicp1guNPvD@fat_crate.local
>
> Next time I'm NAKing this patch until you incorporate all review
> comments or you give a technical reason why you disagree with them.
>
>> + /*
>> + * RMP table entry format is not architectural and it can vary by processor and
>> + * is defined by the per-processor PPR. Restrict SNP support on the known CPU
>> + * model and family for which the RMP table entry format is currently defined for.
>> + */
>> + if (family != 0x19 || model > 0xaf)
>> + goto nosnp;
>> +
>> + if (amd_iommu_snp_enable())
>> + goto nosnp;
>> +
>> + if (__snp_rmptable_init())
>> + goto nosnp;
>> +
>> + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
>> +
>> + return 0;
>> +
>> +nosnp:
>> + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>> + return -ENOSYS;
>> +}
>> +
>> +/*
>> + * This must be called after the PCI subsystem. This is because amd_iommu_snp_enable()
>> + * is called to ensure the IOMMU supports the SEV-SNP feature, which can only be
>> + * called after subsys_initcall().
>> + *
>> + * NOTE: IOMMU is enforced by SNP to ensure that hypervisor cannot program DMA
>> + * directly into guest private memory. In case of SNP, the IOMMU ensures that
>> + * the page(s) used for DMA are hypervisor owned.
>> + */
>> +fs_initcall(snp_rmptable_init);
>
> This looks backwards. AFAICT, the IOMMU code should call arch code to
> enable SNP at the right time, not the other way around - arch code
> calling driver code.
>
> Especially if the SNP table enablement depends on some exact IOMMU
> init_state:
>
> if (init_state > IOMMU_ENABLED) {
> pr_err("SNP: Too late to enable SNP for IOMMU.\n");
>
>
This is again per the SNP_INIT requirements: SNP support must be enabled
in the IOMMU before SNP_INIT is done. The above function
snp_rmptable_init() only calls the IOMMU driver to enable SNP support
once it has detected and enabled platform support for SNP.
It is not that the IOMMU driver has to call the arch code to enable SNP
at the right time; it is the other way around: once platform support
for SNP is enabled, the IOMMU driver has to be called to enable the
same for the IOMMU, and this needs to be done before the CCP driver is
loaded and does SNP_INIT.
Thanks,
Ashish
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-07 19:00 ` Kalra, Ashish
@ 2023-11-07 19:19 ` Borislav Petkov
2023-11-07 20:27 ` Borislav Petkov
2023-12-08 17:09 ` Jeremi Piotrowski
1 sibling, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-11-07 19:19 UTC (permalink / raw)
To: Kalra, Ashish
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
pankaj.gupta, liam.merwick, zhi.a.wang, Brijesh Singh
On Tue, Nov 07, 2023 at 01:00:00PM -0600, Kalra, Ashish wrote:
> > First of all, use the APM bit name here pls: MtrrFixDramModEn.
> >
> > And then, for the life of me, I can't find any mention in the APM why
> > this bit is needed. Neither in "15.36.2 Enabling SEV-SNP" nor in
> > "15.34.3 Enabling SEV".
> >
> > Looking at the bit defintions of WrMem an RdMem - read and write
> > requests get directed to system memory instead of MMIO so I guess you
> > don't want to be able to write MMIO for certain physical ranges when SNP
> > is enabled but it'll be good to have this properly explained instead of
> > a "this must happen" information-less sentence.
>
> This is a per-requisite for SNP_INIT as per the SNP Firmware ABI
> specifications, section 8.8.2:
Did you even read the text you're responding to?
> > This looks backwards. AFAICT, the IOMMU code should call arch code to
> > enable SNP at the right time, not the other way around - arch code
> > calling driver code.
> >
> > Especially if the SNP table enablement depends on some exact IOMMU
> > init_state:
> >
> > if (init_state > IOMMU_ENABLED) {
> > pr_err("SNP: Too late to enable SNP for IOMMU.\n");
> >
> >
>
> This is again as per SNP_INIT requirements, that SNP support is enabled in
> the IOMMU before SNP_INIT is done. The above function snp_rmptable_init()
> only calls the IOMMU driver to enable SNP support when it has detected and
> enabled platform support for SNP.
>
> It is not that IOMMU driver has to call the arch code to enable SNP at the
> right time but it is the other way around that once platform support for SNP
> is enabled then the IOMMU driver has to be called to enable the same for the
> IOMMU and this needs to be done before the CCP driver is loaded and does
> SNP_INIT.
You again didn't read the text you're responding to.
Arch code does not call drivers - arch code sets up the arch and
provides facilities which the drivers use.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-07 19:19 ` Borislav Petkov
@ 2023-11-07 20:27 ` Borislav Petkov
2023-11-07 21:21 ` Kalra, Ashish
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-11-07 20:27 UTC (permalink / raw)
To: Kalra, Ashish
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
pankaj.gupta, liam.merwick, zhi.a.wang, Brijesh Singh
On Tue, Nov 07, 2023 at 08:19:31PM +0100, Borislav Petkov wrote:
> Arch code does not call drivers - arch code sets up the arch and
> provides facilities which the drivers use.
IOW (just an example diff):
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1c9924de607a..00cdbc844961 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3290,6 +3290,7 @@ static int __init state_next(void)
break;
case IOMMU_ENABLED:
register_syscore_ops(&amd_iommu_syscore_ops);
+ amd_iommu_snp_enable();
ret = amd_iommu_init_pci();
init_state = ret ? IOMMU_INIT_ERROR : IOMMU_PCI_INIT;
break;
@@ -3814,16 +3815,6 @@ int amd_iommu_snp_enable(void)
return -EINVAL;
}
- /*
- * Prevent enabling SNP after IOMMU_ENABLED state because this process
- * affect how IOMMU driver sets up data structures and configures
- * IOMMU hardware.
- */
- if (init_state > IOMMU_ENABLED) {
- pr_err("SNP: Too late to enable SNP for IOMMU.\n");
- return -EINVAL;
- }
-
amd_iommu_snp_en = check_feature_on_all_iommus(FEATURE_SNP);
if (!amd_iommu_snp_en)
return -EINVAL;
and now you only need to line up snp_rmptable_init() after IOMMU init,
instead of having it be an fs_initcall which happens right after
pci_subsys_init() so that PCI is there, but at the right time when the iommu
init state is at IOMMU_ENABLED and no later, because then it is too late.
And there you need to test amd_iommu_snp_en, which is already exported
anyway.
Ok?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-07 20:27 ` Borislav Petkov
@ 2023-11-07 21:21 ` Kalra, Ashish
2023-11-07 21:27 ` Borislav Petkov
0 siblings, 1 reply; 158+ messages in thread
From: Kalra, Ashish @ 2023-11-07 21:21 UTC (permalink / raw)
To: Borislav Petkov
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
pankaj.gupta, liam.merwick, zhi.a.wang, Brijesh Singh
On 11/7/2023 2:27 PM, Borislav Petkov wrote:
> On Tue, Nov 07, 2023 at 08:19:31PM +0100, Borislav Petkov wrote:
>> Arch code does not call drivers - arch code sets up the arch and
>> provides facilities which the drivers use.
>
> IOW (just an example diff):
>
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 1c9924de607a..00cdbc844961 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -3290,6 +3290,7 @@ static int __init state_next(void)
> break;
> case IOMMU_ENABLED:
> register_syscore_ops(&amd_iommu_syscore_ops);
> + amd_iommu_snp_enable();
> ret = amd_iommu_init_pci();
> init_state = ret ? IOMMU_INIT_ERROR : IOMMU_PCI_INIT;
> break;
> @@ -3814,16 +3815,6 @@ int amd_iommu_snp_enable(void)
> return -EINVAL;
> }
>
> - /*
> - * Prevent enabling SNP after IOMMU_ENABLED state because this process
> - * affect how IOMMU driver sets up data structures and configures
> - * IOMMU hardware.
> - */
> - if (init_state > IOMMU_ENABLED) {
> - pr_err("SNP: Too late to enable SNP for IOMMU.\n");
> - return -EINVAL;
> - }
> -
> amd_iommu_snp_en = check_feature_on_all_iommus(FEATURE_SNP);
> if (!amd_iommu_snp_en)
> return -EINVAL;
>
> and now you only need to line up snp_rmptable_init() after IOMMU init
> instead of having it be a fs_initcall which happens right after
> pci_subsys_init() so that PCI is there but at the right time when iommu
> init state is at IOMMU_ENABLED but no later because then it is too late.
>
> And there you need to test amd_iommu_snp_en which is already exported
> anyway.
>
> Ok?
>
No, this is not correct, as this will always enable SNP support on the IOMMU
even when SNP support is not supported and enabled on the platform, and then
we will do stuff like forcing IOMMU v1 pagetables, which we really don't want
to do if SNP is not supported and enabled on the platform.
Having snp_rmptable_init() call amd_iommu_snp_enable() ensures that SNP on
the IOMMU is *only* enabled when platform/arch support for it is detected and
enabled.
And isn't the IOMMU driver always going to be built-in, and isn't it part of
the platform support (not arch code, but surely platform-specific code)?
(IOMMU enablement is a requirement for SNP.)
Thanks,
Ashish
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-07 21:21 ` Kalra, Ashish
@ 2023-11-07 21:27 ` Borislav Petkov
2023-11-07 22:08 ` Borislav Petkov
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-11-07 21:27 UTC (permalink / raw)
To: Kalra, Ashish
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
pankaj.gupta, liam.merwick, zhi.a.wang, Brijesh Singh
On Tue, Nov 07, 2023 at 03:21:29PM -0600, Kalra, Ashish wrote:
> No, this is not correct as this will always enable SNP support on
> IOMMU even when SNP support is not supported and enabled on the
> platform,
You see that we set or clear X86_FEATURE_SEV_SNP depending on support,
right?
Which means, you need to test that bit in amd_iommu_snp_enable() first.
> And isn't IOMMU driver always going to be built-in and isn't it part of the
> platform support (not arch code, but surely platform specific code)?
> (IOMMU enablement is requirement for SNP).
Read the note again about the fragile ordering in my previous mail.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-07 21:27 ` Borislav Petkov
@ 2023-11-07 22:08 ` Borislav Petkov
2023-11-07 22:33 ` Kalra, Ashish
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-11-07 22:08 UTC (permalink / raw)
To: Kalra, Ashish
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
pankaj.gupta, liam.merwick, zhi.a.wang, Brijesh Singh
Ontop. Only build-tested.
---
diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index 2fd52b65deac..3be2451e7bc8 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -10,6 +10,7 @@ extern int force_iommu, no_iommu;
extern int iommu_detected;
extern int iommu_merge;
extern int panic_on_overflow;
+extern bool amd_iommu_snp_en;
#ifdef CONFIG_SWIOTLB
extern bool x86_swiotlb_enable;
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 8b9ed72489e4..9237c327ad6d 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -196,23 +196,15 @@ static __init int __snp_rmptable_init(void)
static int __init snp_rmptable_init(void)
{
- int family, model;
-
- if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ if (!amd_iommu_snp_en)
return 0;
- family = boot_cpu_data.x86;
- model = boot_cpu_data.x86_model;
-
/*
* RMP table entry format is not architectural and it can vary by processor and
* is defined by the per-processor PPR. Restrict SNP support on the known CPU
* model and family for which the RMP table entry format is currently defined for.
*/
- if (family != 0x19 || model > 0xaf)
- goto nosnp;
-
- if (amd_iommu_snp_enable())
+ if (boot_cpu_data.x86 != 0x19 || boot_cpu_data.x86_model > 0xaf)
goto nosnp;
if (__snp_rmptable_init())
@@ -228,12 +220,10 @@ static int __init snp_rmptable_init(void)
}
/*
- * This must be called after the PCI subsystem. This is because amd_iommu_snp_enable()
- * is called to ensure the IOMMU supports the SEV-SNP feature, which can only be
- * called after subsys_initcall().
+ * This must be called after the IOMMU has been initialized.
*
* NOTE: IOMMU is enforced by SNP to ensure that hypervisor cannot program DMA
* directly into guest private memory. In case of SNP, the IOMMU ensures that
* the page(s) used for DMA are hypervisor owned.
*/
-fs_initcall(snp_rmptable_init);
+device_initcall(snp_rmptable_init);
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index e2857109e966..353d68b25fe2 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -148,6 +148,4 @@ struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
extern u64 amd_iommu_efr;
extern u64 amd_iommu_efr2;
-
-extern bool amd_iommu_snp_en;
#endif
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1c9924de607a..9e72cd8413bb 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3255,6 +3255,35 @@ static bool __init detect_ivrs(void)
return true;
}
+#ifdef CONFIG_KVM_AMD_SEV
+static void iommu_snp_enable(void)
+{
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return;
+
+ /*
+ * The SNP support requires that IOMMU must be enabled, and is
+ * not configured in the passthrough mode.
+ */
+ if (no_iommu || iommu_default_passthrough()) {
+ pr_err("SNP: IOMMU is disabled or configured in passthrough mode, SNP cannot be supported");
+ return;
+ }
+
+ amd_iommu_snp_en = check_feature_on_all_iommus(FEATURE_SNP);
+ if (!amd_iommu_snp_en)
+ return;
+
+ pr_info("SNP enabled\n");
+
+ /* Enforce IOMMU v1 pagetable when SNP is enabled. */
+ if (amd_iommu_pgtable != AMD_IOMMU_V1) {
+ pr_warn("Force to using AMD IOMMU v1 page table due to SNP\n");
+ amd_iommu_pgtable = AMD_IOMMU_V1;
+ }
+}
+#endif
+
/****************************************************************************
*
* AMD IOMMU Initialization State Machine
@@ -3290,6 +3319,7 @@ static int __init state_next(void)
break;
case IOMMU_ENABLED:
register_syscore_ops(&amd_iommu_syscore_ops);
+ iommu_snp_enable();
ret = amd_iommu_init_pci();
init_state = ret ? IOMMU_INIT_ERROR : IOMMU_PCI_INIT;
break;
@@ -3802,40 +3832,4 @@ int amd_iommu_pc_set_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64
return iommu_pc_get_set_reg(iommu, bank, cntr, fxn, value, true);
}
-#ifdef CONFIG_KVM_AMD_SEV
-int amd_iommu_snp_enable(void)
-{
- /*
- * The SNP support requires that IOMMU must be enabled, and is
- * not configured in the passthrough mode.
- */
- if (no_iommu || iommu_default_passthrough()) {
- pr_err("SNP: IOMMU is disabled or configured in passthrough mode, SNP cannot be supported");
- return -EINVAL;
- }
-
- /*
- * Prevent enabling SNP after IOMMU_ENABLED state because this process
- * affect how IOMMU driver sets up data structures and configures
- * IOMMU hardware.
- */
- if (init_state > IOMMU_ENABLED) {
- pr_err("SNP: Too late to enable SNP for IOMMU.\n");
- return -EINVAL;
- }
-
- amd_iommu_snp_en = check_feature_on_all_iommus(FEATURE_SNP);
- if (!amd_iommu_snp_en)
- return -EINVAL;
-
- pr_info("SNP enabled\n");
-
- /* Enforce IOMMU v1 pagetable when SNP is enabled. */
- if (amd_iommu_pgtable != AMD_IOMMU_V1) {
- pr_warn("Force to using AMD IOMMU v1 page table due to SNP\n");
- amd_iommu_pgtable = AMD_IOMMU_V1;
- }
- return 0;
-}
-#endif
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-07 22:08 ` Borislav Petkov
@ 2023-11-07 22:33 ` Kalra, Ashish
2023-11-08 6:14 ` Borislav Petkov
0 siblings, 1 reply; 158+ messages in thread
From: Kalra, Ashish @ 2023-11-07 22:33 UTC (permalink / raw)
To: Borislav Petkov
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
pankaj.gupta, liam.merwick, zhi.a.wang, Brijesh Singh
On 11/7/2023 4:08 PM, Borislav Petkov wrote:
> static int __init snp_rmptable_init(void)
> {
> - int family, model;
> -
> - if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + if (!amd_iommu_snp_en)
> return 0;
>
We will still need some method to tell the IOMMU driver if SNP
support/feature is disabled by this function, for example, when the CPU
family and model are not supported by SNP and we jump to the nosnp label.
The reliable way for this to work is to ensure snp_rmptable_init() is
called before IOMMU initialization, and then IOMMU initialization depends
on the SNP feature flag set up by snp_rmptable_init() to decide whether to
enable SNP support on the IOMMU.
If snp_rmptable_init() is called after IOMMU initialization and it
detects an issue with SNP support, it will clear the SNP feature, but the
IOMMU driver does not get notified about it. Therefore,
snp_rmptable_init() should be called before IOMMU initialization, or as
part of IOMMU initialization, for example, amd_iommu_enable() calling
snp_rmptable_init() before calling iommu_snp_enable().
Thanks,
Ashish
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-07 22:33 ` Kalra, Ashish
@ 2023-11-08 6:14 ` Borislav Petkov
2023-11-08 9:11 ` Jeremi Piotrowski
2023-11-08 19:53 ` Kalra, Ashish
0 siblings, 2 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-11-08 6:14 UTC (permalink / raw)
To: Kalra, Ashish
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
pankaj.gupta, liam.merwick, zhi.a.wang, Brijesh Singh
On Tue, Nov 07, 2023 at 04:33:41PM -0600, Kalra, Ashish wrote:
> We will still need some method to tell the IOMMU driver if SNP
> support/feature is disabled by this function, for example, when CPU family
> and model is not supported by SNP and we jump to no_snp label.
See below.
> The reliable way for this to work is to ensure snp_rmptable_init() is called
> before IOMMU initialization and then IOMMU initialization depends on SNP
> feature flag setup by snp_rmptable_init() to enable SNP support on IOMMU or
> not.
Yes, this whole SNP initialization needs to be reworked and split this
way:
- early detection work which needs to be done once goes to
bsp_init_amd(): that's basically your early_detect_mem_encrypt() stuff
which needs to happen exactly only once and early.
- Any work like:
c->x86_phys_bits -= (cpuid_ebx(0x8000001f) >> 6) & 0x3f;
and the like which needs to happen on each AP, gets put in a function
which gets called by init_amd().
By the time IOMMU gets to init, you already know whether it should
enable SNP and check X86_FEATURE_SEV_SNP.
Finally, you call __snp_rmptable_init() which does the *per-CPU* init
work which is still pending.
Ok?
Ontop of the previous ontop patch:
---
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 6cc2074fcea3..a9c95e5d6b06 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -674,8 +674,19 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
if (!(msr & MSR_K7_HWCR_SMMLOCK))
goto clear_sev;
- if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check())
- goto clear_snp;
+ if (cpu_has(c, X86_FEATURE_SEV_SNP)) {
+ /*
+ * RMP table entry format is not architectural and it can vary by processor
+ * and is defined by the per-processor PPR. Restrict SNP support on the known
+ * CPU model and family for which the RMP table entry format is currently
+ * defined for.
+ */
+ if (c->x86 != 0x19 || c->x86_model > 0xaf)
+ goto clear_snp;
+
+ if (!early_rmptable_check())
+ goto clear_snp;
+ }
return;
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 9237c327ad6d..5a71df9ae4cb 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -199,14 +199,6 @@ static int __init snp_rmptable_init(void)
if (!amd_iommu_snp_en)
return 0;
- /*
- * RMP table entry format is not architectural and it can vary by processor and
- * is defined by the per-processor PPR. Restrict SNP support on the known CPU
- * model and family for which the RMP table entry format is currently defined for.
- */
- if (boot_cpu_data.x86 != 0x19 || boot_cpu_data.x86_model > 0xaf)
- goto nosnp;
-
if (__snp_rmptable_init())
goto nosnp;
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-08 6:14 ` Borislav Petkov
@ 2023-11-08 9:11 ` Jeremi Piotrowski
2023-11-08 19:53 ` Kalra, Ashish
1 sibling, 0 replies; 158+ messages in thread
From: Jeremi Piotrowski @ 2023-11-08 9:11 UTC (permalink / raw)
To: Borislav Petkov, Kalra, Ashish
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
pankaj.gupta, liam.merwick, zhi.a.wang, Brijesh Singh
On 08/11/2023 07:14, Borislav Petkov wrote:
> On Tue, Nov 07, 2023 at 04:33:41PM -0600, Kalra, Ashish wrote:
>> We will still need some method to tell the IOMMU driver if SNP
>> support/feature is disabled by this function, for example, when CPU family
>> and model is not supported by SNP and we jump to no_snp label.
>
> See below.
>
>> The reliable way for this to work is to ensure snp_rmptable_init() is called
>> before IOMMU initialization and then IOMMU initialization depends on SNP
>> feature flag setup by snp_rmptable_init() to enable SNP support on IOMMU or
>> not.
>
> Yes, this whole SNP initialization needs to be reworked and split this
> way:
I agree with Borislav and have some comments of my own.
>
> - early detection work which needs to be done once goes to
> bsp_init_amd(): that's basically your early_detect_mem_encrypt() stuff
> which needs to happen exactly only once and early.
>
> - Any work like:
>
> c->x86_phys_bits -= (cpuid_ebx(0x8000001f) >> 6) & 0x3f;
>
> and the like which needs to happen on each AP, gets put in a function
> which gets called by init_amd().
>
> By the time IOMMU gets to init, you already know whether it should
> enable SNP and check X86_FEATURE_SEV_SNP.
This flow would suit me better too. In SNP-host capable Hyper-V VMs there
is no IOMMU and I've had to resort to early return from amd_iommu_snp_enable()
to prevent it from disabling SNP [1]. In addition to what Borislav posted,
you'd just need to enforce that if IOMMU is detected it actually gets enabled.
[1]: https://lore.kernel.org/lkml/20230213103402.1189285-6-jpiotrowski@linux.microsoft.com/
>
> Finally, you call __snp_rmptable_init() which does the *per-CPU* init
> work which is still pending.
Yes please, and the only rmp thing left to do per-CPU would be to check that the MSRs are
set the same as the value read from CPU0.
Running the early_rmptable_check() from early_detect_mem_encrypt() and on every CPU
makes it difficult to support a kernel allocated RMP table. If you look at what I did
for the mentioned Hyper-V SNP-host VMs [2] (which I think is reasonable) the RMP table
is allocated in init_mem_mapping() and the addresses are propagated to other CPUs through
hv_cpu_init(), which is called from cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, ...). So
it would be great if any init work plays nice with cpu hotplug notifiers.
[2]: https://lore.kernel.org/lkml/20230213103402.1189285-2-jpiotrowski@linux.microsoft.com/
Thanks,
Jeremi
>
> Ok?
>
> Ontop of the previous ontop patch:
>
> ---
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 6cc2074fcea3..a9c95e5d6b06 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -674,8 +674,19 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> if (!(msr & MSR_K7_HWCR_SMMLOCK))
> goto clear_sev;
>
> - if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check())
> - goto clear_snp;
> + if (cpu_has(c, X86_FEATURE_SEV_SNP)) {
> + /*
> + * RMP table entry format is not architectural and it can vary by processor
> + * and is defined by the per-processor PPR. Restrict SNP support on the known
> + * CPU model and family for which the RMP table entry format is currently
> + * defined for.
> + */
> + if (c->x86 != 0x19 || c->x86_model > 0xaf)
> + goto clear_snp;
> +
> + if (!early_rmptable_check())
> + goto clear_snp;
> + }
>
> return;
>
> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> index 9237c327ad6d..5a71df9ae4cb 100644
> --- a/arch/x86/virt/svm/sev.c
> +++ b/arch/x86/virt/svm/sev.c
> @@ -199,14 +199,6 @@ static int __init snp_rmptable_init(void)
> if (!amd_iommu_snp_en)
> return 0;
>
> - /*
> - * RMP table entry format is not architectural and it can vary by processor and
> - * is defined by the per-processor PPR. Restrict SNP support on the known CPU
> - * model and family for which the RMP table entry format is currently defined for.
> - */
> - if (boot_cpu_data.x86 != 0x19 || boot_cpu_data.x86_model > 0xaf)
> - goto nosnp;
> -
> if (__snp_rmptable_init())
> goto nosnp;
>
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-08 6:14 ` Borislav Petkov
2023-11-08 9:11 ` Jeremi Piotrowski
@ 2023-11-08 19:53 ` Kalra, Ashish
1 sibling, 0 replies; 158+ messages in thread
From: Kalra, Ashish @ 2023-11-08 19:53 UTC (permalink / raw)
To: Borislav Petkov
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
pankaj.gupta, liam.merwick, zhi.a.wang, Brijesh Singh
On 11/8/2023 12:14 AM, Borislav Petkov wrote:
> On Tue, Nov 07, 2023 at 04:33:41PM -0600, Kalra, Ashish wrote:
>> We will still need some method to tell the IOMMU driver if SNP
>> support/feature is disabled by this function, for example, when CPU family
>> and model is not supported by SNP and we jump to no_snp label.
>
> See below.
>
>> The reliable way for this to work is to ensure snp_rmptable_init() is called
>> before IOMMU initialization and then IOMMU initialization depends on SNP
>> feature flag setup by snp_rmptable_init() to enable SNP support on IOMMU or
>> not.
>
> Yes, this whole SNP initialization needs to be reworked and split this
> way:
>
> - early detection work which needs to be done once goes to
> bsp_init_amd(): that's basically your early_detect_mem_encrypt() stuff
> which needs to happen exactly only once and early.
>
> - Any work like:
>
> c->x86_phys_bits -= (cpuid_ebx(0x8000001f) >> 6) & 0x3f;
>
> and the like which needs to happen on each AP, gets put in a function
> which gets called by init_amd().
>
> By the time IOMMU gets to init, you already know whether it should
> enable SNP and check X86_FEATURE_SEV_SNP.
>
> Finally, you call __snp_rmptable_init() which does the *per-CPU* init
> work which is still pending.
>
> Ok?
Yes, we will need to rework the SNP initialization stuff; the important
point is that we want to do the snp_rmptable_init() work before IOMMU
initialization, because for things like an incorrectly set up RMP table
we don't want IOMMU initialization to enable SNP on the IOMMUs.
Thanks,
Ashish
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-07 19:00 ` Kalra, Ashish
2023-11-07 19:19 ` Borislav Petkov
@ 2023-12-08 17:09 ` Jeremi Piotrowski
2023-12-08 23:21 ` Kalra, Ashish
1 sibling, 1 reply; 158+ messages in thread
From: Jeremi Piotrowski @ 2023-12-08 17:09 UTC (permalink / raw)
To: Kalra, Ashish, Borislav Petkov, Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
On 07/11/2023 20:00, Kalra, Ashish wrote:
> Hello Boris,
>
> Addressing of some of the remaining comments:
>
> On 11/7/2023 10:31 AM, Borislav Petkov wrote:
>> On Mon, Oct 16, 2023 at 08:27:35AM -0500, Michael Roth wrote:
>>> +static bool early_rmptable_check(void)
>>> +{
>>> + u64 rmp_base, rmp_size;
>>> +
>>> + /*
>>> + * For early BSP initialization, max_pfn won't be set up yet, wait until
>>> + * it is set before performing the RMP table calculations.
>>> + */
>>> + if (!max_pfn)
>>> + return true;
>>
>> This already says that this is called at the wrong point during init.
>>
>> Right now we have
>>
>> early_identify_cpu -> early_init_amd -> early_detect_mem_encrypt
>>
>> which runs only on the BSP but then early_init_amd() is called in
>> init_amd() too so that it takes care of the APs too.
>>
>> Which ends up doing a lot of unnecessary work on each AP in
>> early_detect_mem_encrypt() like calculating the RMP size on each AP
>> unnecessarily where this needs to happen exactly once.
>>
>> Is there any reason why this function cannot be moved to init_amd()
>> where it'll do the normal, per-AP init?
>>
>> And the stuff that needs to happen once, needs to be called once too.
>>
>>> +
>>> + return snp_get_rmptable_info(&rmp_base, &rmp_size);
>>> +}
>>> +
>>> static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>>> {
>>> u64 msr;
>>> @@ -659,6 +674,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>>> if (!(msr & MSR_K7_HWCR_SMMLOCK))
>>> goto clear_sev;
>>> + if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check())
>>> + goto clear_snp;
>>> +
>>> return;
>>> clear_all:
>>> @@ -666,6 +684,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>>> clear_sev:
>>> setup_clear_cpu_cap(X86_FEATURE_SEV);
>>> setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
>>> +clear_snp:
>>> setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>>> }
>>> }
>>
>> ...
>>
>>> +bool snp_get_rmptable_info(u64 *start, u64 *len)
>>> +{
>>> + u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
>>> +
>>> + rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
>>> + rdmsrl(MSR_AMD64_RMP_END, rmp_end);
>>> +
>>> + if (!(rmp_base & RMP_ADDR_MASK) || !(rmp_end & RMP_ADDR_MASK)) {
>>> + pr_err("Memory for the RMP table has not been reserved by BIOS\n");
>>> + return false;
>>> + }
>>
>> If you're masking off bits 0-12 above...
>>
>>> +
>>> + if (rmp_base > rmp_end) {
>>
>> ... why aren't you using the masked out vars further on?
>>
>> I know, the hw will say, yeah, those bits are 0 but still. IOW, do:
>>
>> rmp_base &= RMP_ADDR_MASK;
>> rmp_end &= RMP_ADDR_MASK;
>>
>> after reading them.
>>
>>> + pr_err("RMP configuration not valid: base=%#llx, end=%#llx\n", rmp_base, rmp_end);
>>> + return false;
>>> + }
>>> +
>>> + rmp_sz = rmp_end - rmp_base + 1;
>>> +
>>> + /*
>>> + * Calculate the amount the memory that must be reserved by the BIOS to
>>> + * address the whole RAM, including the bookkeeping area. The RMP itself
>>> + * must also be covered.
>>> + */
>>> + max_rmp_pfn = max_pfn;
>>> + if (PHYS_PFN(rmp_end) > max_pfn)
>>> + max_rmp_pfn = PHYS_PFN(rmp_end);
>>> +
>>> + calc_rmp_sz = (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
>>> +
>>> + if (calc_rmp_sz > rmp_sz) {
>>> + pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
>>> + calc_rmp_sz, rmp_sz);
>>> + return false;
>>> + }
>>> +
>>> + *start = rmp_base;
>>> + *len = rmp_sz;
>>> +
>>> + return true;
>>> +}
>>> +
>>> +static __init int __snp_rmptable_init(void)
>>> +{
>>> + u64 rmp_base, rmp_size;
>>> + void *rmp_start;
>>> + u64 val;
>>> +
>>> + if (!snp_get_rmptable_info(&rmp_base, &rmp_size))
>>> + return 1;
>>> +
>>> + pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n",
>>
>> That's "RMP table physical range"
>>
>>> + rmp_base, rmp_base + rmp_size - 1);
>>> +
>>> + rmp_start = memremap(rmp_base, rmp_size, MEMREMAP_WB);
>>> + if (!rmp_start) {
>>> + pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, rmp_size);
>>
>> No need to dump rmp_base and rmp_size again here - you're dumping them
>> above.
>>
>>> + return 1;
>>> + }
>>> +
>>> + /*
>>> + * Check if SEV-SNP is already enabled, this can happen in case of
>>> + * kexec boot.
>>> + */
>>> + rdmsrl(MSR_AMD64_SYSCFG, val);
>>> + if (val & MSR_AMD64_SYSCFG_SNP_EN)
>>> + goto skip_enable;
>>> +
>>> + /* Initialize the RMP table to zero */
>>
>> Again: useless comment.
>>
>>> + memset(rmp_start, 0, rmp_size);
>>> +
>>> + /* Flush the caches to ensure that data is written before SNP is enabled. */
>>> + wbinvd_on_all_cpus();
>>> +
>>> + /* MFDM must be enabled on all the CPUs prior to enabling SNP. */
>>
>> First of all, use the APM bit name here pls: MtrrFixDramModEn.
>>
>> And then, for the life of me, I can't find any mention in the APM why
>> this bit is needed. Neither in "15.36.2 Enabling SEV-SNP" nor in
>> "15.34.3 Enabling SEV".
>>
>> Looking at the bit defintions of WrMem an RdMem - read and write
>> requests get directed to system memory instead of MMIO so I guess you
>> don't want to be able to write MMIO for certain physical ranges when SNP
>> is enabled but it'll be good to have this properly explained instead of
>> a "this must happen" information-less sentence.
>
> This is a per-requisite for SNP_INIT as per the SNP Firmware ABI specifications, section 8.8.2:
>
> From the SNP FW ABI specs:
>
> If INIT_RMP is 1, then the firmware ensures the following system requirements are met:
> • SYSCFG[MemoryEncryptionModEn] must be set to 1 across all cores. (SEV must be
> enabled.)
> • SYSCFG[SecureNestedPagingEn] must be set to 1 across all cores.
> • SYSCFG[VMPLEn] must be set to 1 across all cores.
> • SYSCFG[MFDM] must be set to 1 across all cores.
Hi Ashish,
I just noticed that the kernel shouts at me about this bit when I offline->online a CPU in
an SNP host:
[2692586.589194] smpboot: CPU 63 is now offline
[2692589.366822] [Firmware Warn]: MTRR: CPU 0: SYSCFG[MtrrFixDramModEn] not cleared by BIOS, clearing this bit
[2692589.376582] smpboot: Booting Node 0 Processor 63 APIC 0x3f
[2692589.378070] [Firmware Warn]: MTRR: CPU 63: SYSCFG[MtrrFixDramModEn] not cleared by BIOS, clearing this bit
[2692589.388845] microcode: CPU63: new patch_level=0x0a0011d1
Now I understand if you say "CPU offlining is not supported" but there's nothing currently
blocking it.
Best wishes,
Jeremi
> • VM_HSAVE_PA (MSR C001_0117) must be set to 0h across all cores.
> • HWCR[SmmLock] (MSR C001_0015) must be set to 1 across all cores.
>
> So, this platform enabling code for SNP needs to ensure that these conditions are met before SNP_INIT is called.
>
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-12-08 17:09 ` Jeremi Piotrowski
@ 2023-12-08 23:21 ` Kalra, Ashish
0 siblings, 0 replies; 158+ messages in thread
From: Kalra, Ashish @ 2023-12-08 23:21 UTC (permalink / raw)
To: Jeremi Piotrowski, Borislav Petkov, Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
Hello Jeremi,
> Hi Ashish,
>
> I just noticed that the kernel shouts at me about this bit when I offline->online a CPU in
> an SNP host:
Yes, I also observe the same warning when I bring a CPU back online.
>
> [2692586.589194] smpboot: CPU 63 is now offline
> [2692589.366822] [Firmware Warn]: MTRR: CPU 0: SYSCFG[MtrrFixDramModEn] not cleared by BIOS, clearing this bit
> [2692589.376582] smpboot: Booting Node 0 Processor 63 APIC 0x3f
> [2692589.378070] [Firmware Warn]: MTRR: CPU 63: SYSCFG[MtrrFixDramModEn] not cleared by BIOS, clearing this bit
> [2692589.388845] microcode: CPU63: new patch_level=0x0a0011d1
>
> Now I understand if you say "CPU offlining is not supported" but there's nothing currently
> blocking it.
>
There is CPU hotplug support on the SNP platform to do __snp_enable() when
bringing a CPU back online, but I'm not really sure what needs to be done
for the MtrrFixDramModEn bit in SYSCFG; I'm discussing this with the FW developers.
Thanks,
Ashish
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support
2023-11-07 16:31 ` Borislav Petkov
2023-11-07 18:32 ` Tom Lendacky
2023-11-07 19:00 ` Kalra, Ashish
@ 2023-12-20 7:07 ` Michael Roth
2 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-12-20 7:07 UTC (permalink / raw)
To: Borislav Petkov
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Tue, Nov 07, 2023 at 05:31:42PM +0100, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:35AM -0500, Michael Roth wrote:
> > +static bool early_rmptable_check(void)
> > +{
> > + u64 rmp_base, rmp_size;
> > +
> > + /*
> > + * For early BSP initialization, max_pfn won't be set up yet, wait until
> > + * it is set before performing the RMP table calculations.
> > + */
> > + if (!max_pfn)
> > + return true;
>
> This already says that this is called at the wrong point during init.
>
> Right now we have
>
> early_identify_cpu -> early_init_amd -> early_detect_mem_encrypt
>
> which runs only on the BSP but then early_init_amd() is called in
> init_amd() too so that it takes care of the APs too.
>
> Which ends up doing a lot of unnecessary work on each AP in
> early_detect_mem_encrypt() like calculating the RMP size on each AP
> unnecessarily where this needs to happen exactly once.
>
> Is there any reason why this function cannot be moved to init_amd()
> where it'll do the normal, per-AP init?
>
> And the stuff that needs to happen once, needs to be called once too.
I've renamed/repurposed snp_get_rmptable_info() to
snp_probe_rmptable_info(). It now reads the MSRs, sanity-checks them,
and stores the values into ro_after_init variables on success.
Subsequent code uses those values to initialize the RMP table mapping
instead of re-reading the MSRs.
I've moved the call-site for snp_probe_rmptable_info() to
bsp_init_amd(), which gets called right after early_init_amd(), so
should still be early enough to clear X86_FEATURE_SEV_SNP such that
AutoIBRS doesn't get disabled if SNP isn't available on the system. APs
don't call bsp_init_amd(), so that should avoid doing multiple MSR reads.
And I think Ashish has all the other review comments addressed now.
Thanks,
Mike
^ permalink raw reply [flat|nested] 158+ messages in thread
* [PATCH v10 07/50] x86/sev: Add RMP entry lookup helpers
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (5 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 06/50] x86/sev: Add the host SEV-SNP initialization support Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-11-14 14:24 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 08/50] x86/fault: Add helper for dumping RMP entries Michael Roth
` (42 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
The snp_lookup_page_in_rmptable() can be used by the host to read the RMP
entry for a given page. The RMP entry format is documented in AMD PPR, see
https://bugzilla.kernel.org/attachment.cgi?id=296015.
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: separate 'assigned' indicator from return code]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/sev-common.h | 4 +++
arch/x86/include/asm/sev-host.h | 22 +++++++++++++
arch/x86/virt/svm/sev.c | 53 +++++++++++++++++++++++++++++++
3 files changed, 79 insertions(+)
create mode 100644 arch/x86/include/asm/sev-host.h
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index b463fcbd4b90..1e6fb93d8ab0 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -173,4 +173,8 @@ struct snp_psc_desc {
#define GHCB_ERR_INVALID_INPUT 5
#define GHCB_ERR_INVALID_EVENT 6
+/* RMP page size */
+#define RMP_PG_SIZE_4K 0
+#define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+
#endif
diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
new file mode 100644
index 000000000000..4c487ce8457f
--- /dev/null
+++ b/arch/x86/include/asm/sev-host.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * AMD SVM-SEV Host Support.
+ *
+ * Copyright (C) 2023 Advanced Micro Devices, Inc.
+ *
+ * Author: Ashish Kalra <ashish.kalra@amd.com>
+ *
+ */
+
+#ifndef __ASM_X86_SEV_HOST_H
+#define __ASM_X86_SEV_HOST_H
+
+#include <asm/sev-common.h>
+
+#ifdef CONFIG_KVM_AMD_SEV
+int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
+#else
+static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENXIO; }
+#endif
+
+#endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 8b9ed72489e4..7d3802605376 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -53,6 +53,9 @@ struct rmpentry {
*/
#define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
+/* Mask to apply to a PFN to get the first PFN of a 2MB page */
+#define PFN_PMD_MASK (~((1ULL << (PMD_SHIFT - PAGE_SHIFT)) - 1))
+
static struct rmpentry *rmptable_start __ro_after_init;
static u64 rmptable_max_pfn __ro_after_init;
@@ -237,3 +240,53 @@ static int __init snp_rmptable_init(void)
* the page(s) used for DMA are hypervisor owned.
*/
fs_initcall(snp_rmptable_init);
+
+static int rmptable_entry(u64 pfn, struct rmpentry *entry)
+{
+ if (WARN_ON_ONCE(pfn > rmptable_max_pfn))
+ return -EFAULT;
+
+ *entry = rmptable_start[pfn];
+
+ return 0;
+}
+
+static int __snp_lookup_rmpentry(u64 pfn, struct rmpentry *entry, int *level)
+{
+ struct rmpentry large_entry;
+ int ret;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENXIO;
+
+ ret = rmptable_entry(pfn, entry);
+ if (ret)
+ return ret;
+
+ /*
+ * Find the authoritative RMP entry for a PFN. This can be either a 4K
+ * RMP entry or a special large RMP entry that is authoritative for a
+ * whole 2M area.
+ */
+ ret = rmptable_entry(pfn & PFN_PMD_MASK, &large_entry);
+ if (ret)
+ return ret;
+
+ *level = RMP_TO_X86_PG_LEVEL(large_entry.pagesize);
+
+ return 0;
+}
+
+int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
+{
+ struct rmpentry e;
+ int ret;
+
+ ret = __snp_lookup_rmpentry(pfn, &e, level);
+ if (ret)
+ return ret;
+
+ *assigned = !!e.assigned;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 07/50] x86/sev: Add RMP entry lookup helpers
2023-10-16 13:27 ` [PATCH v10 07/50] x86/sev: Add RMP entry lookup helpers Michael Roth
@ 2023-11-14 14:24 ` Borislav Petkov
2023-12-19 3:31 ` Michael Roth
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-11-14 14:24 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Mon, Oct 16, 2023 at 08:27:36AM -0500, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> The snp_lookup_page_in_rmptable() can be used by the host to read the RMP
$ git grep snp_lookup_page_in_rmptable
$
Stale commit message. And not very telling. Please rewrite.
> entry for a given page. The RMP entry format is documented in AMD PPR, see
> https://bugzilla.kernel.org/attachment.cgi?id=296015.
<--- Brijesh's SOB comes first here if he's the primary author.
> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> [mdr: separate 'assigned' indicator from return code]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> arch/x86/include/asm/sev-common.h | 4 +++
> arch/x86/include/asm/sev-host.h | 22 +++++++++++++
> arch/x86/virt/svm/sev.c | 53 +++++++++++++++++++++++++++++++
> 3 files changed, 79 insertions(+)
> create mode 100644 arch/x86/include/asm/sev-host.h
>
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index b463fcbd4b90..1e6fb93d8ab0 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -173,4 +173,8 @@ struct snp_psc_desc {
> #define GHCB_ERR_INVALID_INPUT 5
> #define GHCB_ERR_INVALID_EVENT 6
>
> +/* RMP page size */
> +#define RMP_PG_SIZE_4K 0
RMP_PG_LEVEL_4K just like the generic ones.
> +#define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
What else is there besides X86 PG level?
IOW, RMP_TO_PG_LEVEL simply.
> +
> #endif
> diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
Nah, we don't need a third sev header:
arch/x86/include/asm/sev-common.h
arch/x86/include/asm/sev.h
arch/x86/include/asm/sev-host.h
Put it in sev.h pls.
sev-common.h should be merged into sev.h too unless there's a compelling
reason not to which I don't see atm.
> new file mode 100644
> index 000000000000..4c487ce8457f
> --- /dev/null
> +++ b/arch/x86/include/asm/sev-host.h
...
> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> index 8b9ed72489e4..7d3802605376 100644
> --- a/arch/x86/virt/svm/sev.c
> +++ b/arch/x86/virt/svm/sev.c
> @@ -53,6 +53,9 @@ struct rmpentry {
> */
> #define RMPTABLE_CPU_BOOKKEEPING_SZ 0x4000
>
> +/* Mask to apply to a PFN to get the first PFN of a 2MB page */
> +#define PFN_PMD_MASK (~((1ULL << (PMD_SHIFT - PAGE_SHIFT)) - 1))
GENMASK_ULL
> static struct rmpentry *rmptable_start __ro_after_init;
> static u64 rmptable_max_pfn __ro_after_init;
>
> @@ -237,3 +240,53 @@ static int __init snp_rmptable_init(void)
> * the page(s) used for DMA are hypervisor owned.
> */
> fs_initcall(snp_rmptable_init);
> +
> +static int rmptable_entry(u64 pfn, struct rmpentry *entry)
The signature of this one should be:
static struct rmpentry *get_rmp_entry(u64 pfn)
and the callers should use the IS_ERR* macros to check whether it
returns a valid pointer or a negative value for error.
Ditto for the other two functions here.
> + if (WARN_ON_ONCE(pfn > rmptable_max_pfn))
> + return -EFAULT;
> +
> + *entry = rmptable_start[pfn];
This wants to be called rmptable[] then.
> +
> + return 0;
> +}
> +
> +static int __snp_lookup_rmpentry(u64 pfn, struct rmpentry *entry, int *level)
> +{
> + struct rmpentry large_entry;
> + int ret;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return -ENXIO;
ENODEV or so.
> +
> + ret = rmptable_entry(pfn, entry);
> + if (ret)
> + return ret;
> +
> + /*
> + * Find the authoritative RMP entry for a PFN. This can be either a 4K
> + * RMP entry or a special large RMP entry that is authoritative for a
> + * whole 2M area.
> + */
> + ret = rmptable_entry(pfn & PFN_PMD_MASK, &large_entry);
> + if (ret)
> + return ret;
> +
> + *level = RMP_TO_X86_PG_LEVEL(large_entry.pagesize);
> +
> + return 0;
> +}
> +
> +int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
> +{
> + struct rmpentry e;
> + int ret;
> +
> + ret = __snp_lookup_rmpentry(pfn, &e, level);
> + if (ret)
> + return ret;
> +
> + *assigned = !!e.assigned;
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
> --
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 07/50] x86/sev: Add RMP entry lookup helpers
2023-11-14 14:24 ` Borislav Petkov
@ 2023-12-19 3:31 ` Michael Roth
2024-01-09 22:07 ` Borislav Petkov
0 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-12-19 3:31 UTC (permalink / raw)
To: Borislav Petkov
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang, Brijesh Singh
On Tue, Nov 14, 2023 at 03:24:42PM +0100, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:36AM -0500, Michael Roth wrote:
> > From: Brijesh Singh <brijesh.singh@amd.com>
> >
> > The snp_lookup_page_in_rmptable() can be used by the host to read the RMP
>
> $ git grep snp_lookup_page_in_rmptable
> $
>
> Stale commit message. And not very telling. Please rewrite.
>
> > entry for a given page. The RMP entry format is documented in AMD PPR, see
> > https://bugzilla.kernel.org/attachment.cgi?id=296015.
>
> <--- Brijesh's SOB comes first here if he's the primary author.
>
> > Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > [mdr: separate 'assigned' indicator from return code]
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> > arch/x86/include/asm/sev-common.h | 4 +++
> > arch/x86/include/asm/sev-host.h | 22 +++++++++++++
> > arch/x86/virt/svm/sev.c | 53 +++++++++++++++++++++++++++++++
> > 3 files changed, 79 insertions(+)
> > create mode 100644 arch/x86/include/asm/sev-host.h
> >
> > diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> > index b463fcbd4b90..1e6fb93d8ab0 100644
> > --- a/arch/x86/include/asm/sev-common.h
> > +++ b/arch/x86/include/asm/sev-common.h
> > @@ -173,4 +173,8 @@ struct snp_psc_desc {
> > #define GHCB_ERR_INVALID_INPUT 5
> > #define GHCB_ERR_INVALID_EVENT 6
> >
> > +/* RMP page size */
> > +#define RMP_PG_SIZE_4K 0
>
> RMP_PG_LEVEL_4K just like the generic ones.
I've moved this to sev.h, but RMP_PG_SIZE_4K is already defined there
and used by a bunch of guest code so it's a bit out-of-place to update
those as part of this patchset. I can send a follow-up series to clean up
some of the naming and get rid of sev-common.h
>
> > +#define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
>
> What else is there besides X86 PG level?
>
> IOW, RMP_TO_PG_LEVEL simply.
Makes sense.
>
> > +
> > #endif
> > diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
>
> Nah, we don't need a third sev header:
>
> arch/x86/include/asm/sev-common.h
> arch/x86/include/asm/sev.h
> arch/x86/include/asm/sev-host.h
>
> Put it in sev.h pls.
Done.
>
> sev-common.h should be merged into sev.h too unless there's a compelling
> reason not to which I don't see atm.
Doesn't seem like it would be an issue; maybe some fallout from files
that previously only included sev-common.h and now need to pull in
guest struct definitions as well, but those definitions don't have a lot
of external dependencies, so I don't anticipate any header-include
hellishness. I'll send that as a separate follow-up, along with some of
the renames you suggested above, since they'll touch guest code and
create unnecessary churn for SNP host support.
Thanks,
Mike
> > --
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 07/50] x86/sev: Add RMP entry lookup helpers
2023-12-19 3:31 ` Michael Roth
@ 2024-01-09 22:07 ` Borislav Petkov
0 siblings, 0 replies; 158+ messages in thread
From: Borislav Petkov @ 2024-01-09 22:07 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang, Brijesh Singh
On Mon, Dec 18, 2023 at 09:31:50PM -0600, Michael Roth wrote:
> I've moved this to sev.h, but it RMP_PG_SIZE_4K is already defined there
> and used by a bunch of guest code so it's a bit out-of-place to update
> those as part of this patchset. I can send a follow-up series to clean up
> some of the naming and get rid of sev-common.h
Yap, good idea.
> Doesn't seem like it would be an issue, maybe some fallout from any
> files that previously only included sev-common.h and now need to pull in
> guest struct definitions as well, but those definitions don't have a lot
> of external dependencies so don't anticipate any header include
> hellishness. I'll send that as a separate follow-up, along with some of
> the renames you suggested above since they'll touch guest code and
> create unecessary churn for SNP host support.
OTOH, people recently have started looking at including only that stuff
which is really used so having a single header would cause more
preprocessing effort. I'm not too crazy about it as the preprocessing
overhead is barely measurable so might as well have a single header and
then split it later...
Definitely something for the after-burner and not important right now.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 158+ messages in thread
* [PATCH v10 08/50] x86/fault: Add helper for dumping RMP entries
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (6 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 07/50] x86/sev: Add RMP entry lookup helpers Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-11-15 16:08 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 09/50] x86/traps: Define RMP violation #PF error code Michael Roth
` (41 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
This information will be useful for debugging things like page faults
due to RMP access violations and RMPUPDATE failures.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: move helper to standalone patch]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/sev-host.h | 2 +
arch/x86/virt/svm/sev.c | 77 +++++++++++++++++++++++++++++++++
2 files changed, 79 insertions(+)
diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
index 4c487ce8457f..bb06c57f2909 100644
--- a/arch/x86/include/asm/sev-host.h
+++ b/arch/x86/include/asm/sev-host.h
@@ -15,8 +15,10 @@
#ifdef CONFIG_KVM_AMD_SEV
int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
+void sev_dump_hva_rmpentry(unsigned long address);
#else
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENXIO; }
+static inline void sev_dump_hva_rmpentry(unsigned long address) {}
#endif
#endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 7d3802605376..cac3e311c38f 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -290,3 +290,80 @@ int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level)
return 0;
}
EXPORT_SYMBOL_GPL(snp_lookup_rmpentry);
+
+/*
+ * Dump the raw RMP entry for a particular PFN. These bits are documented in the
+ * PPR for a particular CPU model and provide useful information about how a
+ * particular PFN is being utilized by the kernel/firmware at the time certain
+ * unexpected events occur, such as RMP faults.
+ */
+static void sev_dump_rmpentry(u64 dumped_pfn)
+{
+ struct rmpentry e;
+ u64 pfn, pfn_end;
+ int level, ret;
+ u64 *e_data;
+
+ ret = __snp_lookup_rmpentry(dumped_pfn, &e, &level);
+ if (ret) {
+ pr_info("Failed to read RMP entry for PFN 0x%llx, error %d\n",
+ dumped_pfn, ret);
+ return;
+ }
+
+ e_data = (u64 *)&e;
+ if (e.assigned) {
+ pr_info("RMP entry for PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
+ dumped_pfn, e_data[1], e_data[0]);
+ return;
+ }
+
+ /*
+ * If the RMP entry for a particular PFN is not in an assigned state,
+ * then it is sometimes useful to get an idea of whether or not any RMP
+ * entries for other PFNs within the same 2MB region are assigned, since
+ * those too can affect the ability to access a particular PFN in
+ * certain situations, such as when the PFN is being accessed via a 2MB
+ * mapping in the host page table.
+ */
+ pfn = ALIGN(dumped_pfn, PTRS_PER_PMD);
+ pfn_end = pfn + PTRS_PER_PMD;
+
+ while (pfn < pfn_end) {
+ ret = __snp_lookup_rmpentry(pfn, &e, &level);
+ if (ret) {
+ pr_info_ratelimited("Failed to read RMP entry for PFN 0x%llx\n", pfn);
+ pfn++;
+ continue;
+ }
+
+ if (e_data[0] || e_data[1]) {
+ pr_info("No assigned RMP entry for PFN 0x%llx, but the 2MB region contains populated RMP entries, e.g.: PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
+ dumped_pfn, pfn, e_data[1], e_data[0]);
+ return;
+ }
+ pfn++;
+ }
+
+ pr_info("No populated RMP entries in the 2MB region containing PFN 0x%llx\n",
+ dumped_pfn);
+}
+
+void sev_dump_hva_rmpentry(unsigned long hva)
+{
+ unsigned int level;
+ pgd_t *pgd;
+ pte_t *pte;
+
+ pgd = __va(read_cr3_pa());
+ pgd += pgd_index(hva);
+ pte = lookup_address_in_pgd(pgd, hva, &level);
+
+ if (pte) {
+ pr_info("Can't dump RMP entry for HVA %lx: no PTE/PFN found\n", hva);
+ return;
+ }
+
+ sev_dump_rmpentry(pte_pfn(*pte));
+}
+EXPORT_SYMBOL_GPL(sev_dump_hva_rmpentry);
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 08/50] x86/fault: Add helper for dumping RMP entries
2023-10-16 13:27 ` [PATCH v10 08/50] x86/fault: Add helper for dumping RMP entries Michael Roth
@ 2023-11-15 16:08 ` Borislav Petkov
2023-12-19 6:08 ` Michael Roth
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-11-15 16:08 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Mon, Oct 16, 2023 at 08:27:37AM -0500, Michael Roth wrote:
> +/*
> + * Dump the raw RMP entry for a particular PFN. These bits are documented in the
> + * PPR for a particular CPU model and provide useful information about how a
> + * particular PFN is being utilized by the kernel/firmware at the time certain
> + * unexpected events occur, such as RMP faults.
> + */
> +static void sev_dump_rmpentry(u64 dumped_pfn)
Just "dump_rmpentry"
s/dumped_pfn/pfn/g
> + struct rmpentry e;
> + u64 pfn, pfn_end;
> + int level, ret;
> + u64 *e_data;
> +
> + ret = __snp_lookup_rmpentry(dumped_pfn, &e, &level);
> + if (ret) {
> + pr_info("Failed to read RMP entry for PFN 0x%llx, error %d\n",
> + dumped_pfn, ret);
> + return;
> + }
> +
> + e_data = (u64 *)&e;
> + if (e.assigned) {
> + pr_info("RMP entry for PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
> + dumped_pfn, e_data[1], e_data[0]);
> + return;
> + }
> +
> + /*
> + * If the RMP entry for a particular PFN is not in an assigned state,
> + * then it is sometimes useful to get an idea of whether or not any RMP
> + * entries for other PFNs within the same 2MB region are assigned, since
> + * those too can affect the ability to access a particular PFN in
> + * certain situations, such as when the PFN is being accessed via a 2MB
> + * mapping in the host page table.
> + */
> + pfn = ALIGN(dumped_pfn, PTRS_PER_PMD);
> + pfn_end = pfn + PTRS_PER_PMD;
> +
> + while (pfn < pfn_end) {
> + ret = __snp_lookup_rmpentry(pfn, &e, &level);
> + if (ret) {
> + pr_info_ratelimited("Failed to read RMP entry for PFN 0x%llx\n", pfn);
Why ratelimited?
No need to print anything if you fail to read it - simply dump the range
[pfn, pfn_end], e_data[0], e_data[1] exactly *once* before the loop and
inside the loop dump only the ones you can look up...
> + pfn++;
> + continue;
> + }
> +
> + if (e_data[0] || e_data[1]) {
> + pr_info("No assigned RMP entry for PFN 0x%llx, but the 2MB region contains populated RMP entries, e.g.: PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
> + dumped_pfn, pfn, e_data[1], e_data[0]);
> + return;
> + }
> + pfn++;
> + }
> +
> + pr_info("No populated RMP entries in the 2MB region containing PFN 0x%llx\n",
> + dumped_pfn);
... and then you don't need this one either.
> +}
> +
> +void sev_dump_hva_rmpentry(unsigned long hva)
> +{
> + unsigned int level;
> + pgd_t *pgd;
> + pte_t *pte;
> +
> + pgd = __va(read_cr3_pa());
> + pgd += pgd_index(hva);
> + pte = lookup_address_in_pgd(pgd, hva, &level);
If this is using the current CR3, why aren't you simply using
lookup_address() here without the need to read pgd?
> +
> + if (pte) {
if (!pte)
Doh.
> + pr_info("Can't dump RMP entry for HVA %lx: no PTE/PFN found\n", hva);
> + return;
> + }
> +
> + sev_dump_rmpentry(pte_pfn(*pte));
> +}
> +EXPORT_SYMBOL_GPL(sev_dump_hva_rmpentry);
Who's going to use this, kvm?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 08/50] x86/fault: Add helper for dumping RMP entries
2023-11-15 16:08 ` Borislav Petkov
@ 2023-12-19 6:08 ` Michael Roth
0 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-12-19 6:08 UTC (permalink / raw)
To: Borislav Petkov
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Wed, Nov 15, 2023 at 05:08:52PM +0100, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:37AM -0500, Michael Roth wrote:
> > +/*
> > + * Dump the raw RMP entry for a particular PFN. These bits are documented in the
> > + * PPR for a particular CPU model and provide useful information about how a
> > + * particular PFN is being utilized by the kernel/firmware at the time certain
> > + * unexpected events occur, such as RMP faults.
> > + */
> > +static void sev_dump_rmpentry(u64 dumped_pfn)
>
> Just "dump_rmentry"
>
> s/dumped_pfn/pfn/g
>
> > + struct rmpentry e;
> > + u64 pfn, pfn_end;
> > + int level, ret;
> > + u64 *e_data;
> > +
> > + ret = __snp_lookup_rmpentry(dumped_pfn, &e, &level);
> > + if (ret) {
> > + pr_info("Failed to read RMP entry for PFN 0x%llx, error %d\n",
> > + dumped_pfn, ret);
> > + return;
> > + }
> > +
> > + e_data = (u64 *)&e;
> > + if (e.assigned) {
> > + pr_info("RMP entry for PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
> > + dumped_pfn, e_data[1], e_data[0]);
> > + return;
> > + }
> > +
> > + /*
> > + * If the RMP entry for a particular PFN is not in an assigned state,
> > + * then it is sometimes useful to get an idea of whether or not any RMP
> > + * entries for other PFNs within the same 2MB region are assigned, since
> > + * those too can affect the ability to access a particular PFN in
> > + * certain situations, such as when the PFN is being accessed via a 2MB
> > + * mapping in the host page table.
> > + */
> > + pfn = ALIGN(dumped_pfn, PTRS_PER_PMD);
> > + pfn_end = pfn + PTRS_PER_PMD;
> > +
> > + while (pfn < pfn_end) {
> > + ret = __snp_lookup_rmpentry(pfn, &e, &level);
> > + if (ret) {
> > + pr_info_ratelimited("Failed to read RMP entry for PFN 0x%llx\n", pfn);
>
> Why ratelmited?
Dave had some concerns about potentially printing out ~512 messages
for a particular PFN dump, and this seemed like a potential case where
that might still occur if there was some issue with RMP table access.
But I still wanted to print some indicator if we did hit that case,
since it might be related to whatever caused the dump to get triggered.
>
> No need to print anything if you fail to read it - simply dump the range
> [pfn, pfn_end], _data[0], e_data[1] exactly *once* before the loop and
> inside the loop dump only the ones you can lookup...
Similar to above, the loop used to print every populated entry in the
2M range if the dumped PFN wasn't itself in an assigned state, but Dave
had some concerns about flooding. So now the loop only prints 1
populated entry to provide some indication that there are entries
present that could explain things like RMP faults for the PFN that caused
the dump.
That makes it a bit awkward to print a header statement, since you end
up with something like:
PFN is not assigned, so dumping the first populated RMP entry found within the 2MB range (if any)
PFN_x is populated, contents [high=... low=...]
Or if nothing found:
PFN is not assigned, so dumping the first populated RMP entry found within the 2MB range (if any)
Whereas the current logic just prints 1 self-contained statement which
fully explains each of the above cases and doesn't require the user to
infer there was nothing present in the range based on the lack of
statement. It's a little clearer, a little less verbose, and a little easier
to grep for either situation without needing to get context from surrounding
statements.
>
> > + pfn++;
> > + continue;
> > + }
> > +
> > + if (e_data[0] || e_data[1]) {
> > + pr_info("No assigned RMP entry for PFN 0x%llx, but the 2MB region contains populated RMP entries, e.g.: PFN 0x%llx: [high=0x%016llx low=0x%016llx]\n",
> > + dumped_pfn, pfn, e_data[1], e_data[0]);
> > + return;
> > + }
> > + pfn++;
> > + }
> > +
> > + pr_info("No populated RMP entries in the 2MB region containing PFN 0x%llx\n",
> > + dumped_pfn);
>
> ... and then you don't need this one either.
>
> > +}
> > +
> > +void sev_dump_hva_rmpentry(unsigned long hva)
> > +{
> > + unsigned int level;
> > + pgd_t *pgd;
> > + pte_t *pte;
> > +
> > + pgd = __va(read_cr3_pa());
> > + pgd += pgd_index(hva);
> > + pte = lookup_address_in_pgd(pgd, hva, &level);
>
> If this is using the current CR3, why aren't you simply using
> lookup_address() here without the need to read pgd?
>
> > +
> > + if (pte) {
>
> if (!pte)
>
> Doh.
Yikes. Thanks for the catch.
>
> > + pr_info("Can't dump RMP entry for HVA %lx: no PTE/PFN found\n", hva);
> > + return;
> > + }
> > +
> > + sev_dump_rmpentry(pte_pfn(*pte));
> > +}
> > +EXPORT_SYMBOL_GPL(sev_dump_hva_rmpentry);
>
> Who's going to use this, kvm?
This is mainly used by the host #PF handler via show_fault_oops(). It
can happen for both kernel and userspace accesses if there's a bug,
which is why the read_cr3_pa() is needed, since these may be userspace
HVAs. Though I just realized the patch that uses this (the next one in
the series) claims to only be for kernel #PFs, so that might cause some
confusion. I'll get that commit message fixed up.
Thanks,
Mike
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>
^ permalink raw reply [flat|nested] 158+ messages in thread
* [PATCH v10 09/50] x86/traps: Define RMP violation #PF error code
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (7 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 08/50] x86/fault: Add helper for dumping RMP entries Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-10-16 14:14 ` Dave Hansen
2023-10-16 13:27 ` [PATCH v10 10/50] x86/fault: Report RMP page faults for kernel addresses Michael Roth
` (40 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Dave Hansen
From: Brijesh Singh <brijesh.singh@amd.com>
Bit 31 in the page fault error code will be set when the processor
encounters an RMP violation.
While at it, use the BIT() macro.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/trap_pf.h | 4 ++++
arch/x86/mm/fault.c | 1 +
2 files changed, 5 insertions(+)
diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index afa524325e55..136707d7a961 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -2,6 +2,8 @@
#ifndef _ASM_X86_TRAP_PF_H
#define _ASM_X86_TRAP_PF_H
+#include <linux/bits.h> /* BIT() macro */
+
/*
* Page fault error code bits:
*
@@ -13,6 +15,7 @@
* bit 5 == 1: protection keys block access
* bit 6 == 1: shadow stack access fault
* bit 15 == 1: SGX MMU page-fault
+ * bit 31 == 1: fault was due to RMP violation
*/
enum x86_pf_error_code {
X86_PF_PROT = 1 << 0,
@@ -23,6 +26,7 @@ enum x86_pf_error_code {
X86_PF_PK = 1 << 5,
X86_PF_SHSTK = 1 << 6,
X86_PF_SGX = 1 << 15,
+ X86_PF_RMP = 1 << 31,
};
#endif /* _ASM_X86_TRAP_PF_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index ab778eac1952..7858b9515d4a 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -547,6 +547,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
!(error_code & X86_PF_PROT) ? "not-present page" :
(error_code & X86_PF_RSVD) ? "reserved bit violation" :
(error_code & X86_PF_PK) ? "protection keys violation" :
+ (error_code & X86_PF_RMP) ? "RMP violation" :
"permissions violation");
if (!(error_code & X86_PF_USER) && user_mode(regs)) {
--
2.25.1
* Re: [PATCH v10 09/50] x86/traps: Define RMP violation #PF error code
2023-10-16 13:27 ` [PATCH v10 09/50] x86/traps: Define RMP violation #PF error code Michael Roth
@ 2023-10-16 14:14 ` Dave Hansen
2023-10-16 14:55 ` Michael Roth
0 siblings, 1 reply; 158+ messages in thread
From: Dave Hansen @ 2023-10-16 14:14 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On 10/16/23 06:27, Michael Roth wrote:
> Bit 31 in the page fault error code will be set when the processor
> encounters an RMP violation. While at it, use the BIT() macro.
Any idea where the BIT() use went? I remember seeing it in earlier
versions.
* Re: [PATCH v10 09/50] x86/traps: Define RMP violation #PF error code
2023-10-16 14:14 ` Dave Hansen
@ 2023-10-16 14:55 ` Michael Roth
0 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 14:55 UTC (permalink / raw)
To: Dave Hansen
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Mon, Oct 16, 2023 at 07:14:07AM -0700, Dave Hansen wrote:
> On 10/16/23 06:27, Michael Roth wrote:
> > Bit 31 in the page fault error code will be set when the processor
> > encounters an RMP violation. While at it, use the BIT() macro.
>
> Any idea where the BIT() use went? I remember seeing it in earlier
> versions.
Yah... this patch used to convert all the previous definitions over to
using BIT() as part of introducing the new RMP bit. I'm not sure what
happened, but a likely possibility is I hit a merge conflict at some
point due to upstream commit fd5439e0c9, which introduced this change:
X86_PF_SHSTK = 1 << 6,
and my brain probably defaulted to using the existing pattern to
resolve it. I'll get this fixed up.
-Mike
* [PATCH v10 10/50] x86/fault: Report RMP page faults for kernel addresses
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (8 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 09/50] x86/traps: Define RMP violation #PF error code Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-11-21 15:23 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 11/50] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
` (39 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
RMP #PFs on kernel addresses are fatal and should never happen in
practice. They indicate a bug in the host kernel somewhere, so dump some
information about any RMP entries related to the faulting address to aid
with debugging.
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/mm/fault.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7858b9515d4a..9f154beef9c7 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -34,6 +34,7 @@
#include <asm/kvm_para.h> /* kvm_handle_async_pf */
#include <asm/vdso.h> /* fixup_vdso_exception() */
#include <asm/irq_stack.h>
+#include <asm/sev-host.h> /* sev_dump_rmpentry() */
#define CREATE_TRACE_POINTS
#include <asm/trace/exceptions.h>
@@ -580,6 +581,9 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
}
dump_pagetable(address);
+
+ if (error_code & X86_PF_RMP)
+ sev_dump_hva_rmpentry(address);
}
static noinline void
--
2.25.1
* Re: [PATCH v10 10/50] x86/fault: Report RMP page faults for kernel addresses
2023-10-16 13:27 ` [PATCH v10 10/50] x86/fault: Report RMP page faults for kernel addresses Michael Roth
@ 2023-11-21 15:23 ` Borislav Petkov
0 siblings, 0 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-11-21 15:23 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
On Mon, Oct 16, 2023 at 08:27:39AM -0500, Michael Roth wrote:
> RMP #PFs on kernel addresses are fatal and should never happen in
s/#PFs/faults/
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* [PATCH v10 11/50] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (9 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 10/50] x86/fault: Report RMP page faults for kernel addresses Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-11-21 16:21 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 12/50] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
` (38 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
hypervisor will use the instruction to add pages to the RMP table. See
APM3 for details on the instruction operations.
The PSMASH instruction expands a 2MB RMP entry into a corresponding set
of contiguous 4KB-Page RMP entries. The hypervisor will use this
instruction to adjust the RMP entry without invalidating the previous
RMP entry.
Add the following external interface API functions:
psmash():
Used to smash a 2MB aligned page into 4K pages while preserving the
Validated bit in the RMP.
rmp_make_private():
Used to assign a page to guest using the RMPUPDATE instruction.
rmp_make_shared():
Used to transition a page to hypervisor/shared state using the
RMPUPDATE instruction.
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: add RMPUPDATE retry logic for transient FAIL_OVERLAP errors]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/sev-common.h | 14 +++++
arch/x86/include/asm/sev-host.h | 10 ++++
arch/x86/virt/svm/sev.c | 89 +++++++++++++++++++++++++++++++
3 files changed, 113 insertions(+)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 1e6fb93d8ab0..93ec8c12c91d 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -173,8 +173,22 @@ struct snp_psc_desc {
#define GHCB_ERR_INVALID_INPUT 5
#define GHCB_ERR_INVALID_EVENT 6
+/* RMPUPDATE detected 4K page and 2MB page overlap. */
+#define RMPUPDATE_FAIL_OVERLAP 4
+
/* RMP page size */
#define RMP_PG_SIZE_4K 0
+#define RMP_PG_SIZE_2M 1
#define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
+#define X86_TO_RMP_PG_LEVEL(level) (((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
+
+struct rmp_state {
+ u64 gpa;
+ u8 assigned;
+ u8 pagesize;
+ u8 immutable;
+ u8 rsvd;
+ u32 asid;
+} __packed;
#endif
diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
index bb06c57f2909..1df989411334 100644
--- a/arch/x86/include/asm/sev-host.h
+++ b/arch/x86/include/asm/sev-host.h
@@ -16,9 +16,19 @@
#ifdef CONFIG_KVM_AMD_SEV
int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
void sev_dump_hva_rmpentry(unsigned long address);
+int psmash(u64 pfn);
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
+int rmp_make_shared(u64 pfn, enum pg_level level);
#else
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENXIO; }
static inline void sev_dump_hva_rmpentry(unsigned long address) {}
+static inline int psmash(u64 pfn) { return -ENXIO; }
+static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
+ bool immutable)
+{
+ return -ENXIO;
+}
+static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENXIO; }
#endif
#endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index cac3e311c38f..24a695af13a5 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -367,3 +367,92 @@ void sev_dump_hva_rmpentry(unsigned long hva)
sev_dump_rmpentry(pte_pfn(*pte));
}
EXPORT_SYMBOL_GPL(sev_dump_hva_rmpentry);
+
+/*
+ * PSMASH a 2MB aligned page into 4K pages in the RMP table while preserving the
+ * Validated bit.
+ */
+int psmash(u64 pfn)
+{
+ unsigned long paddr = pfn << PAGE_SHIFT;
+ int ret;
+
+ pr_debug("%s: PFN: 0x%llx\n", __func__, pfn);
+
+ if (!pfn_valid(pfn))
+ return -EINVAL;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENXIO;
+
+ /* Binutils version 2.36 supports the PSMASH mnemonic. */
+ asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
+ : "=a"(ret)
+ : "a"(paddr)
+ : "memory", "cc");
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(psmash);
+
+static int rmpupdate(u64 pfn, struct rmp_state *val)
+{
+ unsigned long paddr = pfn << PAGE_SHIFT;
+ int ret, level, npages;
+ int attempts = 0;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENXIO;
+
+ do {
+ /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
+ asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
+ : "=a"(ret)
+ : "a"(paddr), "c"((unsigned long)val)
+ : "memory", "cc");
+
+ attempts++;
+ } while (ret == RMPUPDATE_FAIL_OVERLAP);
+
+ if (ret) {
+ pr_err("RMPUPDATE failed after %d attempts, ret: %d, pfn: %llx, npages: %d, level: %d\n",
+ attempts, ret, pfn, npages, level);
+ sev_dump_rmpentry(pfn);
+ dump_stack();
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+/*
+ * Assign a page to guest using the RMPUPDATE instruction.
+ */
+int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable)
+{
+ struct rmp_state val;
+
+ memset(&val, 0, sizeof(val));
+ val.assigned = 1;
+ val.asid = asid;
+ val.immutable = immutable;
+ val.gpa = gpa;
+ val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+ return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_private);
+
+/*
+ * Transition a page to hypervisor/shared state using the RMPUPDATE instruction.
+ */
+int rmp_make_shared(u64 pfn, enum pg_level level)
+{
+ struct rmp_state val;
+
+ memset(&val, 0, sizeof(val));
+ val.pagesize = X86_TO_RMP_PG_LEVEL(level);
+
+ return rmpupdate(pfn, &val);
+}
+EXPORT_SYMBOL_GPL(rmp_make_shared);
--
2.25.1
* Re: [PATCH v10 11/50] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
2023-10-16 13:27 ` [PATCH v10 11/50] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
@ 2023-11-21 16:21 ` Borislav Petkov
2023-12-19 16:20 ` Michael Roth
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-11-21 16:21 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Mon, Oct 16, 2023 at 08:27:40AM -0500, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
> hypervisor will use the instruction to add pages to the RMP table. See
> APM3 for details on the instruction operations.
>
> The PSMASH instruction expands a 2MB RMP entry into a corresponding set
> of contiguous 4KB-Page RMP entries. The hypervisor will use this
s/-Page//
> instruction to adjust the RMP entry without invalidating the previous
> RMP entry.
"... without invalidating it."
This below is useless text in the commit message - that should be all
visible from the patch itself.
> Add the following external interface API functions:
>
> psmash(): Used to smash a 2MB aligned page into 4K pages while
> preserving the Validated bit in the RMP.
>
> rmp_make_private(): Used to assign a page to guest using the RMPUPDATE
> instruction.
>
> rmp_make_shared(): Used to transition a page to hypervisor/shared
> state using the RMPUPDATE instruction.
>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Since Brijesh is the author, first comes his SOB, then Ashish's and then
yours.
> [mdr: add RMPUPDATE retry logic for transient FAIL_OVERLAP errors]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> arch/x86/include/asm/sev-common.h | 14 +++++
> arch/x86/include/asm/sev-host.h | 10 ++++
> arch/x86/virt/svm/sev.c | 89 +++++++++++++++++++++++++++++++
> 3 files changed, 113 insertions(+)
>
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 1e6fb93d8ab0..93ec8c12c91d 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -173,8 +173,22 @@ struct snp_psc_desc {
> #define GHCB_ERR_INVALID_INPUT 5
> #define GHCB_ERR_INVALID_EVENT 6
>
> +/* RMPUPDATE detected 4K page and 2MB page overlap. */
> +#define RMPUPDATE_FAIL_OVERLAP 4
> +
> /* RMP page size */
> #define RMP_PG_SIZE_4K 0
> +#define RMP_PG_SIZE_2M 1
RMP_PG_LEVEL_2M
> #define RMP_TO_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? PG_LEVEL_4K : PG_LEVEL_2M)
> +#define X86_TO_RMP_PG_LEVEL(level) (((level) == PG_LEVEL_4K) ? RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
> +
> +struct rmp_state {
> + u64 gpa;
> + u8 assigned;
> + u8 pagesize;
> + u8 immutable;
> + u8 rsvd;
> + u32 asid;
> +} __packed;
>
> #endif
> diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
> index bb06c57f2909..1df989411334 100644
> --- a/arch/x86/include/asm/sev-host.h
> +++ b/arch/x86/include/asm/sev-host.h
> @@ -16,9 +16,19 @@
> #ifdef CONFIG_KVM_AMD_SEV
> int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level);
> void sev_dump_hva_rmpentry(unsigned long address);
> +int psmash(u64 pfn);
> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
> +int rmp_make_shared(u64 pfn, enum pg_level level);
> #else
> static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENXIO; }
> static inline void sev_dump_hva_rmpentry(unsigned long address) {}
> +static inline int psmash(u64 pfn) { return -ENXIO; }
> +static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid,
> + bool immutable)
> +{
> + return -ENXIO;
> +}
> +static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENXIO; }
> #endif
>
> #endif
> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> index cac3e311c38f..24a695af13a5 100644
> --- a/arch/x86/virt/svm/sev.c
> +++ b/arch/x86/virt/svm/sev.c
> @@ -367,3 +367,92 @@ void sev_dump_hva_rmpentry(unsigned long hva)
> sev_dump_rmpentry(pte_pfn(*pte));
> }
> EXPORT_SYMBOL_GPL(sev_dump_hva_rmpentry);
> +
> +/*
> + * PSMASH a 2MB aligned page into 4K pages in the RMP table while preserving the
> + * Validated bit.
> + */
> +int psmash(u64 pfn)
> +{
> + unsigned long paddr = pfn << PAGE_SHIFT;
> + int ret;
> +
> + pr_debug("%s: PFN: 0x%llx\n", __func__, pfn);
Left over?
> +
> + if (!pfn_valid(pfn))
> + return -EINVAL;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return -ENXIO;
That needs to happen first in the function.
> +
> + /* Binutils version 2.36 supports the PSMASH mnemonic. */
> + asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
> + : "=a"(ret)
> + : "a"(paddr)
Add an empty space between the " and the (
> + : "memory", "cc");
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(psmash);
> +
> +static int rmpupdate(u64 pfn, struct rmp_state *val)
rmp_state *state
so that it is clear what this is.
> +{
> + unsigned long paddr = pfn << PAGE_SHIFT;
> + int ret, level, npages;
> + int attempts = 0;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return -ENXIO;
> +
> + do {
> + /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> + asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> + : "=a"(ret)
> + : "a"(paddr), "c"((unsigned long)val)
Add an empty space between the " and the (
> + : "memory", "cc");
> +
> + attempts++;
> + } while (ret == RMPUPDATE_FAIL_OVERLAP);
What's the logic here? Loop as long as it says "overlap"?
How "transient" is that overlapping condition?
What's the upper limit of that loop?
This loop should check a generously chosen upper limit of attempts and
then break if that limit is reached.
> + if (ret) {
> + pr_err("RMPUPDATE failed after %d attempts, ret: %d, pfn: %llx, npages: %d, level: %d\n",
> + attempts, ret, pfn, npages, level);
You're dumping here uninitialized stack variables npages and level.
Looks like leftover from some prior version of this function.
> + sev_dump_rmpentry(pfn);
> + dump_stack();
This is going to become real noisy on a huge machine with a lot of SNP
guests.
> + return -EFAULT;
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * Assign a page to guest using the RMPUPDATE instruction.
> + */
One-line comment works too.
/* Assign ... */
> +int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable)
> +{
> + struct rmp_state val;
> +
> + memset(&val, 0, sizeof(val));
> + val.assigned = 1;
> + val.asid = asid;
> + val.immutable = immutable;
> + val.gpa = gpa;
> + val.pagesize = X86_TO_RMP_PG_LEVEL(level);
> +
> + return rmpupdate(pfn, &val);
> +}
> +EXPORT_SYMBOL_GPL(rmp_make_private);
> +
> +/*
> + * Transition a page to hypervisor/shared state using the RMPUPDATE instruction.
> + */
Ditto.
> +int rmp_make_shared(u64 pfn, enum pg_level level)
> +{
> + struct rmp_state val;
> +
> + memset(&val, 0, sizeof(val));
> + val.pagesize = X86_TO_RMP_PG_LEVEL(level);
> +
> + return rmpupdate(pfn, &val);
> +}
> +EXPORT_SYMBOL_GPL(rmp_make_shared);
> --
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v10 11/50] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
2023-11-21 16:21 ` Borislav Petkov
@ 2023-12-19 16:20 ` Michael Roth
0 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-12-19 16:20 UTC (permalink / raw)
To: Borislav Petkov
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Tue, Nov 21, 2023 at 05:21:49PM +0100, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:40AM -0500, Michael Roth wrote:
> > +static int rmpupdate(u64 pfn, struct rmp_state *val)
>
> rmp_state *state
>
> so that it is clear what this is.
>
> > +{
> > + unsigned long paddr = pfn << PAGE_SHIFT;
> > + int ret, level, npages;
> > + int attempts = 0;
> > +
> > + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> > + return -ENXIO;
> > +
> > + do {
> > + /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> > + asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> > + : "=a"(ret)
> > + : "a"(paddr), "c"((unsigned long)val)
>
> Add an empty space between the " and the (
>
> > + : "memory", "cc");
> > +
> > + attempts++;
> > + } while (ret == RMPUPDATE_FAIL_OVERLAP);
>
> What's the logic here? Loop as long as it says "overlap"?
>
> How "transient" is that overlapping condition?
>
> What's the upper limit of that loop?
>
> This loop should check a generously chosen upper limit of attempts and
> then break if that limit is reached.
We've raised similar questions to David Kaplan and discussed this to a
fair degree.
The transient condition here is due to firmware locking the 2MB-aligned
RMP entry for the range to handle atomic updates. There is no upper bound
on retries or the amount of time spent, but it is always transient since
multiple hypervisor implementations now depend on this and any deviation
from this assurance would constitute a firmware regression.
A good torture test for this path is lots of 4K-only guests doing
concurrent boot/shutdowns in a tight loop. With week-long runs the
longest delay seen was on the order of 100ns, but there's no real
correlation between time spent and number of retries, sometimes
100ns delays only involve 1 retry, sometimes much smaller time delays
involve hundreds of retries, and it all depends on what firmware is
doing, so there's no way to infer a safe retry limit based on that
data.
All that said, there are unfortunately other conditions that can
trigger non-transient RMPUPDATE_FAIL_OVERLAP failures, and these will
result in an infinite loop. Those are the result of host misbehavior
however, like trying to set up 2MB private RMP entries when there are
already private 4K entries in the range. Ideally these would be separate
error codes, but even if that were changed in firmware we'd still need
code to support older firmwares that don't disambiguate so not sure this
situation can be improved much.
>
> > + if (ret) {
> > + pr_err("RMPUPDATE failed after %d attempts, ret: %d, pfn: %llx, npages: %d, level: %d\n",
> > + attempts, ret, pfn, npages, level);
>
> You're dumping here uninitialized stack variables npages and level.
> Looks like leftover from some prior version of this function.
Yah, I'll clean this up. I think logging the attempts probably doesn't
have much use anymore either.
>
> > + sev_dump_rmpentry(pfn);
> > + dump_stack();
>
> This is going to become real noisy on a huge machine with a lot of SNP
> guests.
Since the transient case will eventually resolve to ret==0, we will only
get here on a kernel oops sort of condition where a stack dump seems
appropriate. rmpupdate() shouldn't error during normal operation, and if
it ever does it will likely be a fatal situation where those stack dumps
will be useful.
Thanks,
Mike
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>
* [PATCH v10 12/50] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (10 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 11/50] x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-11-24 14:20 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 13/50] crypto: ccp: Define the SEV-SNP commands Michael Roth
` (37 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
The integrity guarantee of SEV-SNP is enforced through the RMP table.
The RMP is used with standard x86 and IOMMU page tables to enforce
memory restrictions and page access rights. The RMP check is enforced as
soon as SEV-SNP is enabled globally in the system. When hardware
encounters an RMP-check failure, it raises a page-fault exception.
The rmp_make_private() and rmp_make_shared() helpers are used to add
or remove the pages from the RMP table. Improve rmp_make_private() so
that pages are invalidated in the direct map after they are added to
the RMP table, and restored to their default valid permissions after
they are removed from the RMP table.
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
arch/x86/virt/svm/sev.c | 62 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index 24a695af13a5..bf9b97046e05 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -395,6 +395,42 @@ int psmash(u64 pfn)
}
EXPORT_SYMBOL_GPL(psmash);
+static int restore_direct_map(u64 pfn, int npages)
+{
+ int i, ret = 0;
+
+ for (i = 0; i < npages; i++) {
+ ret = set_direct_map_default_noflush(pfn_to_page(pfn + i));
+ if (ret)
+ break;
+ }
+
+ if (ret)
+ pr_warn("Failed to restore direct map for pfn 0x%llx, ret: %d\n",
+ pfn + i, ret);
+
+ return ret;
+}
+
+static int invalidate_direct_map(u64 pfn, int npages)
+{
+ int i, ret = 0;
+
+ for (i = 0; i < npages; i++) {
+ ret = set_direct_map_invalid_noflush(pfn_to_page(pfn + i));
+ if (ret)
+ break;
+ }
+
+ if (ret) {
+ pr_warn("Failed to invalidate direct map for pfn 0x%llx, ret: %d\n",
+ pfn + i, ret);
+ restore_direct_map(pfn, i);
+ }
+
+ return ret;
+}
+
static int rmpupdate(u64 pfn, struct rmp_state *val)
{
unsigned long paddr = pfn << PAGE_SHIFT;
@@ -404,6 +440,21 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
return -ENXIO;
+ level = RMP_TO_X86_PG_LEVEL(val->pagesize);
+ npages = page_level_size(level) / PAGE_SIZE;
+
+ /*
+ * If page is getting assigned in the RMP table then unmap it from the
+ * direct map.
+ */
+ if (val->assigned) {
+ if (invalidate_direct_map(pfn, npages)) {
+ pr_err("Failed to unmap %d pages at pfn 0x%llx from the direct_map\n",
+ npages, pfn);
+ return -EFAULT;
+ }
+ }
+
do {
/* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
@@ -422,6 +473,17 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
return -EFAULT;
}
+ /*
+ * Restore the direct map after the page is removed from the RMP table.
+ */
+ if (!val->assigned) {
+ if (restore_direct_map(pfn, npages)) {
+ pr_err("Failed to map %d pages at pfn 0x%llx into the direct_map\n",
+ npages, pfn);
+ return -EFAULT;
+ }
+ }
+
return 0;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 12/50] x86/sev: Invalidate pages from the direct map when adding them to the RMP table
2023-10-16 13:27 ` [PATCH v10 12/50] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
@ 2023-11-24 14:20 ` Borislav Petkov
0 siblings, 0 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-11-24 14:20 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Mon, Oct 16, 2023 at 08:27:41AM -0500, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> The integrity guarantee of SEV-SNP is enforced through the RMP table.
> The RMP is used with standard x86 and IOMMU page tables to enforce
> memory restrictions and page access rights. The RMP check is enforced as
> soon as SEV-SNP is enabled globally in the system. When hardware
> encounters an RMP-check failure, it raises a page-fault exception.
>
> The rmp_make_private() and rmp_make_shared() helpers are used to add
> or remove the pages from the RMP table. Improve the rmp_make_private()
> to invalidate state so that pages cannot be used in the direct-map after
> they are added to the RMP table, and restored to their default valid
> permissions after the pages are removed from the RMP table.
Brijesh's SOB comes
<--- here,
then Ashish's two tags.
Please audit your whole set for such inconsistencies.
> @@ -404,6 +440,21 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
> if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> return -ENXIO;
>
> + level = RMP_TO_X86_PG_LEVEL(val->pagesize);
> + npages = page_level_size(level) / PAGE_SIZE;
> +
> + /*
> + * If page is getting assigned in the RMP table then unmap it from the
> + * direct map.
Here I'm missing the explanation *why* the pages need to be unmapped
from the direct map.
What happens if not?
> + */
> + if (val->assigned) {
> + if (invalidate_direct_map(pfn, npages)) {
> + pr_err("Failed to unmap %d pages at pfn 0x%llx from the direct_map\n",
> + npages, pfn);
invalidate_direct_map() already dumps an error message - no need to do
that here too.
> + return -EFAULT;
> + }
> + }
> +
> do {
> /* Binutils version 2.36 supports the RMPUPDATE mnemonic. */
> asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
> @@ -422,6 +473,17 @@ static int rmpupdate(u64 pfn, struct rmp_state *val)
> return -EFAULT;
> }
>
> + /*
> + * Restore the direct map after the page is removed from the RMP table.
> + */
> + if (!val->assigned) {
> + if (restore_direct_map(pfn, npages)) {
> + pr_err("Failed to map %d pages at pfn 0x%llx into the direct_map\n",
> + npages, pfn);
Ditto.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 158+ messages in thread
* [PATCH v10 13/50] crypto: ccp: Define the SEV-SNP commands
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (11 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 12/50] x86/sev: Invalidate pages from the direct map when adding them to the RMP table Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-11-24 14:36 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
` (36 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
AMD introduced the next generation of SEV called SEV-SNP (Secure Nested
Paging). SEV-SNP builds upon existing SEV and SEV-ES functionality
while adding new hardware security protection.
Define the commands and structures used to communicate with the AMD-SP
when creating and managing the SEV-SNP guests. The SEV-SNP firmware spec
is available at developer.amd.com/sev.
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: update SNP command list and SNP status struct based on current
spec, use C99 flexible arrays]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
drivers/crypto/ccp/sev-dev.c | 16 +++
include/linux/psp-sev.h | 246 +++++++++++++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 53 ++++++++
3 files changed, 315 insertions(+)
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index f97166fba9d9..c2da92f19ccd 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -130,6 +130,8 @@ static int sev_cmd_buffer_len(int cmd)
switch (cmd) {
case SEV_CMD_INIT: return sizeof(struct sev_data_init);
case SEV_CMD_INIT_EX: return sizeof(struct sev_data_init_ex);
+ case SEV_CMD_SNP_SHUTDOWN_EX: return sizeof(struct sev_data_snp_shutdown_ex);
+ case SEV_CMD_SNP_INIT_EX: return sizeof(struct sev_data_snp_init_ex);
case SEV_CMD_PLATFORM_STATUS: return sizeof(struct sev_user_data_status);
case SEV_CMD_PEK_CSR: return sizeof(struct sev_data_pek_csr);
case SEV_CMD_PEK_CERT_IMPORT: return sizeof(struct sev_data_pek_cert_import);
@@ -158,6 +160,20 @@ static int sev_cmd_buffer_len(int cmd)
case SEV_CMD_GET_ID: return sizeof(struct sev_data_get_id);
case SEV_CMD_ATTESTATION_REPORT: return sizeof(struct sev_data_attestation_report);
case SEV_CMD_SEND_CANCEL: return sizeof(struct sev_data_send_cancel);
+ case SEV_CMD_SNP_GCTX_CREATE: return sizeof(struct sev_data_snp_addr);
+ case SEV_CMD_SNP_LAUNCH_START: return sizeof(struct sev_data_snp_launch_start);
+ case SEV_CMD_SNP_LAUNCH_UPDATE: return sizeof(struct sev_data_snp_launch_update);
+ case SEV_CMD_SNP_ACTIVATE: return sizeof(struct sev_data_snp_activate);
+ case SEV_CMD_SNP_DECOMMISSION: return sizeof(struct sev_data_snp_addr);
+ case SEV_CMD_SNP_PAGE_RECLAIM: return sizeof(struct sev_data_snp_page_reclaim);
+ case SEV_CMD_SNP_GUEST_STATUS: return sizeof(struct sev_data_snp_guest_status);
+ case SEV_CMD_SNP_LAUNCH_FINISH: return sizeof(struct sev_data_snp_launch_finish);
+ case SEV_CMD_SNP_DBG_DECRYPT: return sizeof(struct sev_data_snp_dbg);
+ case SEV_CMD_SNP_DBG_ENCRYPT: return sizeof(struct sev_data_snp_dbg);
+ case SEV_CMD_SNP_PAGE_UNSMASH: return sizeof(struct sev_data_snp_page_unsmash);
+ case SEV_CMD_SNP_PLATFORM_STATUS: return sizeof(struct sev_data_snp_addr);
+ case SEV_CMD_SNP_GUEST_REQUEST: return sizeof(struct sev_data_snp_guest_request);
+ case SEV_CMD_SNP_CONFIG: return sizeof(struct sev_user_data_snp_config);
default: return 0;
}
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 7fd17e82bab4..a7f92e74564d 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -78,6 +78,36 @@ enum sev_cmd {
SEV_CMD_DBG_DECRYPT = 0x060,
SEV_CMD_DBG_ENCRYPT = 0x061,
+ /* SNP specific commands */
+ SEV_CMD_SNP_INIT = 0x81,
+ SEV_CMD_SNP_SHUTDOWN = 0x82,
+ SEV_CMD_SNP_PLATFORM_STATUS = 0x83,
+ SEV_CMD_SNP_DF_FLUSH = 0x84,
+ SEV_CMD_SNP_INIT_EX = 0x85,
+ SEV_CMD_SNP_SHUTDOWN_EX = 0x86,
+ SEV_CMD_SNP_DECOMMISSION = 0x90,
+ SEV_CMD_SNP_ACTIVATE = 0x91,
+ SEV_CMD_SNP_GUEST_STATUS = 0x92,
+ SEV_CMD_SNP_GCTX_CREATE = 0x93,
+ SEV_CMD_SNP_GUEST_REQUEST = 0x94,
+ SEV_CMD_SNP_ACTIVATE_EX = 0x95,
+ SEV_CMD_SNP_LAUNCH_START = 0xA0,
+ SEV_CMD_SNP_LAUNCH_UPDATE = 0xA1,
+ SEV_CMD_SNP_LAUNCH_FINISH = 0xA2,
+ SEV_CMD_SNP_DBG_DECRYPT = 0xB0,
+ SEV_CMD_SNP_DBG_ENCRYPT = 0xB1,
+ SEV_CMD_SNP_PAGE_SWAP_OUT = 0xC0,
+ SEV_CMD_SNP_PAGE_SWAP_IN = 0xC1,
+ SEV_CMD_SNP_PAGE_MOVE = 0xC2,
+ SEV_CMD_SNP_PAGE_MD_INIT = 0xC3,
+ SEV_CMD_SNP_PAGE_SET_STATE = 0xC6,
+ SEV_CMD_SNP_PAGE_RECLAIM = 0xC7,
+ SEV_CMD_SNP_PAGE_UNSMASH = 0xC8,
+ SEV_CMD_SNP_CONFIG = 0xC9,
+ SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX = 0xCA,
+ SEV_CMD_SNP_COMMIT = 0xCB,
+ SEV_CMD_SNP_VLEK_LOAD = 0xCD,
+
SEV_CMD_MAX,
};
@@ -523,6 +553,222 @@ struct sev_data_attestation_report {
u32 len; /* In/Out */
} __packed;
+/**
+ * struct sev_data_snp_download_firmware - SNP_DOWNLOAD_FIRMWARE command params
+ *
+ * @address: physical address of firmware image
+ * @len: length of the firmware image
+ */
+struct sev_data_snp_download_firmware {
+ u64 address; /* In */
+ u32 len; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_activate - SNP_ACTIVATE command params
+ *
+ * @gctx_paddr: system physical address of the guest context page
+ * @asid: ASID to bind to the guest
+ */
+struct sev_data_snp_activate {
+ u64 gctx_paddr; /* In */
+ u32 asid; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_addr - generic SNP command params
+ *
+ * @gctx_paddr: system physical address of the guest context page
+ */
+struct sev_data_snp_addr {
+ u64 gctx_paddr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_start - SNP_LAUNCH_START command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @policy: guest policy
+ * @ma_gctx_paddr: system physical address of migration agent context page
+ * @ma_en: the guest is associated with a migration agent
+ * @imi_en: launch flow is launching an IMI (Incoming Migration Image) for
+ * the purpose of guest-assisted migration
+ * @gosvw: guest OS-visible workarounds, as defined by the hypervisor
+struct sev_data_snp_launch_start {
+ u64 gctx_paddr; /* In */
+ u64 policy; /* In */
+ u64 ma_gctx_paddr; /* In */
+ u32 ma_en:1; /* In */
+ u32 imi_en:1; /* In */
+ u32 rsvd:30;
+ u8 gosvw[16]; /* In */
+} __packed;
+
+/* SNP support page type */
+enum {
+ SNP_PAGE_TYPE_NORMAL = 0x1,
+ SNP_PAGE_TYPE_VMSA = 0x2,
+ SNP_PAGE_TYPE_ZERO = 0x3,
+ SNP_PAGE_TYPE_UNMEASURED = 0x4,
+ SNP_PAGE_TYPE_SECRET = 0x5,
+ SNP_PAGE_TYPE_CPUID = 0x6,
+
+ SNP_PAGE_TYPE_MAX
+};
+
+/**
+ * struct sev_data_snp_launch_update - SNP_LAUNCH_UPDATE command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @imi_page: indicates that this page is part of the IMI of the guest
+ * @page_type: encoded page type
+ * @page_size: page size: 0 indicates 4K and 1 indicates 2MB page
+ * @address: system physical address of destination page to encrypt
+ * @vmpl1_perms: VMPL permission mask for VMPL1
+ * @vmpl2_perms: VMPL permission mask for VMPL2
+ * @vmpl3_perms: VMPL permission mask for VMPL3
+ */
+struct sev_data_snp_launch_update {
+ u64 gctx_paddr; /* In */
+ u32 page_size:1; /* In */
+ u32 page_type:3; /* In */
+ u32 imi_page:1; /* In */
+ u32 rsvd:27;
+ u32 rsvd2;
+ u64 address; /* In */
+ u32 rsvd3:8;
+ u32 vmpl1_perms:8; /* In */
+ u32 vmpl2_perms:8; /* In */
+ u32 vmpl3_perms:8; /* In */
+ u32 rsvd4;
+} __packed;
+
+/**
+ * struct sev_data_snp_launch_finish - SNP_LAUNCH_FINISH command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @id_block_paddr: system physical address of ID block
+ * @id_auth_paddr: system physical address of ID block authentication structure
+ * @id_block_en: indicates whether an ID block is present
+ * @auth_key_en: indicates whether an author key is present in the
+ * authentication structure
+ * @host_data: host-supplied data for the guest, not interpreted by firmware
+ */
+struct sev_data_snp_launch_finish {
+ u64 gctx_paddr;
+ u64 id_block_paddr;
+ u64 id_auth_paddr;
+ u8 id_block_en:1;
+ u8 auth_key_en:1;
+ u64 rsvd:62;
+ u8 host_data[32];
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_status - SNP_GUEST_STATUS command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @address: system physical address of guest status page
+ */
+struct sev_data_snp_guest_status {
+ u64 gctx_paddr;
+ u64 address;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_reclaim - SNP_PAGE_RECLAIM command params
+ *
+ * @paddr: system physical address of page to be claimed. The 0th bit
+ * in the address indicates the page size. 0h indicates 4 kB and
+ * 1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_reclaim {
+ u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_page_unsmash - SNP_PAGE_UNSMASH command params
+ *
+ * @paddr: system physical address of page to be unsmashed. The 0th bit
+ * in the address indicates the page size. 0h indicates 4 kB and
+ * 1h indicates 2 MB page.
+ */
+struct sev_data_snp_page_unsmash {
+ u64 paddr;
+} __packed;
+
+/**
+ * struct sev_data_snp_dbg - SNP_DBG_ENCRYPT/SNP_DBG_DECRYPT command parameters
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @src_addr: source address of data to operate on
+ * @dst_addr: destination address of data to operate on
+ */
+struct sev_data_snp_dbg {
+ u64 gctx_paddr; /* In */
+ u64 src_addr; /* In */
+ u64 dst_addr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_guest_request - SNP_GUEST_REQUEST command params
+ *
+ * @gctx_paddr: system physical address of guest context page
+ * @req_paddr: system physical address of request page
+ * @res_paddr: system physical address of response page
+ */
+struct sev_data_snp_guest_request {
+ u64 gctx_paddr; /* In */
+ u64 req_paddr; /* In */
+ u64 res_paddr; /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_init_ex - SNP_INIT_EX structure
+ *
+ * @init_rmp: indicate that the RMP should be initialized.
+ * @list_paddr_en: indicate that list_paddr is valid
+ * @list_paddr: system physical address of range list
+ */
+struct sev_data_snp_init_ex {
+ u32 init_rmp:1;
+ u32 list_paddr_en:1;
+ u32 rsvd:30;
+ u32 rsvd1;
+ u64 list_paddr;
+ u8 rsvd2[48];
+} __packed;
+
+/**
+ * struct sev_data_range - RANGE structure
+ *
+ * @base: system physical address of first byte of range
+ * @page_count: number of 4KB pages in this range
+ */
+struct sev_data_range {
+ u64 base;
+ u32 page_count;
+ u32 rsvd;
+} __packed;
+
+/**
+ * struct sev_data_range_list - RANGE_LIST structure
+ *
+ * @num_elements: number of elements in RANGE_ARRAY
+ * @ranges: array of num_elements of type RANGE
+ */
+struct sev_data_range_list {
+ u32 num_elements;
+ u32 rsvd;
+ struct sev_data_range ranges[];
+} __packed;
+
+/**
+ * struct sev_data_snp_shutdown_ex - SNP_SHUTDOWN_EX structure
+ *
+ * @length: length of the command buffer read by the PSP
+ * @iommu_snp_shutdown: Disable enforcement of SNP in the IOMMU
+ */
+struct sev_data_snp_shutdown_ex {
+ u32 length;
+ u32 iommu_snp_shutdown:1;
+ u32 rsvd1:31;
+} __packed;
+
#ifdef CONFIG_CRYPTO_DEV_SP_PSP
/**
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 1c9da485318f..48e3ef91559c 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -68,6 +68,13 @@ typedef enum {
SEV_RET_INVALID_PARAM,
SEV_RET_RESOURCE_LIMIT,
SEV_RET_SECURE_DATA_INVALID,
+ SEV_RET_INVALID_PAGE_SIZE,
+ SEV_RET_INVALID_PAGE_STATE,
+ SEV_RET_INVALID_MDATA_ENTRY,
+ SEV_RET_INVALID_PAGE_OWNER,
+ SEV_RET_INVALID_PAGE_AEAD_OFLOW,
+ SEV_RET_RMP_INIT_REQUIRED,
+
SEV_RET_MAX,
} sev_ret_code;
@@ -154,6 +161,52 @@ struct sev_user_data_get_id2 {
__u32 length; /* In/Out */
} __packed;
+/**
+ * struct sev_user_data_snp_status - SNP status
+ *
+ * @api_major: API major version
+ * @api_minor: API minor version
+ * @state: current platform state
+ * @is_rmp_initialized: whether RMP is initialized or not
+ * @build_id: firmware build id for the API version
+ * @mask_chip_id: whether chip id is present in attestation reports or not
+ * @mask_chip_key: whether attestation reports are signed or not
+ * @vlek_en: whether a VLEK hashstick is loaded
+ * @guest_count: the number of guests currently managed by the firmware
+ * @current_tcb_version: current TCB version
+ * @reported_tcb_version: reported TCB version
+ */
+struct sev_user_data_snp_status {
+ __u8 api_major; /* Out */
+ __u8 api_minor; /* Out */
+ __u8 state; /* Out */
+ __u8 is_rmp_initialized:1; /* Out */
+ __u8 rsvd:7;
+ __u32 build_id; /* Out */
+ __u32 mask_chip_id:1; /* Out */
+ __u32 mask_chip_key:1; /* Out */
+ __u32 vlek_en:1; /* Out */
+ __u32 rsvd1:29;
+ __u32 guest_count; /* Out */
+ __u64 current_tcb_version; /* Out */
+ __u64 reported_tcb_version; /* Out */
+} __packed;
+
+/**
+ * struct sev_user_data_snp_config - system wide configuration value for SNP.
+ *
+ * @reported_tcb: The TCB version to report in the guest attestation report.
+ * @mask_chip_id: Indicates that the CHIP_ID field in the attestation report
+ * will always be zero.
+ * @mask_chip_key: Indicates that the VCEK is not used in attestation and
+ * guest key derivation.
+ */
+struct sev_user_data_snp_config {
+ __u64 reported_tcb; /* In */
+ __u32 mask_chip_id:1; /* In */
+ __u32 mask_chip_key:1; /* In */
+ __u32 rsvd:30; /* In */
+ __u8 rsvd1[52];
+} __packed;
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 13/50] crypto: ccp: Define the SEV-SNP commands
2023-10-16 13:27 ` [PATCH v10 13/50] crypto: ccp: Define the SEV-SNP commands Michael Roth
@ 2023-11-24 14:36 ` Borislav Petkov
0 siblings, 0 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-11-24 14:36 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Mon, Oct 16, 2023 at 08:27:42AM -0500, Michael Roth wrote:
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index 7fd17e82bab4..a7f92e74564d 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -78,6 +78,36 @@ enum sev_cmd {
> SEV_CMD_DBG_DECRYPT = 0x060,
> SEV_CMD_DBG_ENCRYPT = 0x061,
>
> + /* SNP specific commands */
> + SEV_CMD_SNP_INIT = 0x81,
The other commands start with "0x0" - pls do that too here or unify with
a pre-patch.
> + SEV_CMD_SNP_SHUTDOWN = 0x82,
> + SEV_CMD_SNP_PLATFORM_STATUS = 0x83,
> + SEV_CMD_SNP_DF_FLUSH = 0x84,
> + SEV_CMD_SNP_INIT_EX = 0x85,
> + SEV_CMD_SNP_SHUTDOWN_EX = 0x86,
> + SEV_CMD_SNP_DECOMMISSION = 0x90,
> + SEV_CMD_SNP_ACTIVATE = 0x91,
> + SEV_CMD_SNP_GUEST_STATUS = 0x92,
> + SEV_CMD_SNP_GCTX_CREATE = 0x93,
> + SEV_CMD_SNP_GUEST_REQUEST = 0x94,
> + SEV_CMD_SNP_ACTIVATE_EX = 0x95,
> + SEV_CMD_SNP_LAUNCH_START = 0xA0,
> + SEV_CMD_SNP_LAUNCH_UPDATE = 0xA1,
> + SEV_CMD_SNP_LAUNCH_FINISH = 0xA2,
> + SEV_CMD_SNP_DBG_DECRYPT = 0xB0,
> + SEV_CMD_SNP_DBG_ENCRYPT = 0xB1,
> + SEV_CMD_SNP_PAGE_SWAP_OUT = 0xC0,
> + SEV_CMD_SNP_PAGE_SWAP_IN = 0xC1,
> + SEV_CMD_SNP_PAGE_MOVE = 0xC2,
> + SEV_CMD_SNP_PAGE_MD_INIT = 0xC3,
> + SEV_CMD_SNP_PAGE_SET_STATE = 0xC6,
> + SEV_CMD_SNP_PAGE_RECLAIM = 0xC7,
> + SEV_CMD_SNP_PAGE_UNSMASH = 0xC8,
> + SEV_CMD_SNP_CONFIG = 0xC9,
> + SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX = 0xCA,
You don't have to vertically align those to a different column due to
this command's name not fitting - just do:
SEV_CMD_SNP_CONFIG = 0x0C9,
SEV_CMD_SNP_DOWNLOAD_FIRMWARE_EX = 0x0CA,
SEV_CMD_SNP_COMMIT = 0x0CB,
> + SEV_CMD_SNP_COMMIT = 0xCB,
> + SEV_CMD_SNP_VLEK_LOAD = 0xCD,
> +
> SEV_CMD_MAX,
> };
...
> +/**
> + * struct sev_data_snp_launch_start - SNP_LAUNCH_START command params
> + *
> + * @gctx_addr: system physical address of guest context page
> + * @policy: guest policy
> + * @ma_gctx_addr: system physical address of migration agent
> + * @imi_en: launch flow is launching an IMI for the purpose of
What is an "IMI"?
Define it once for the readers pls.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 158+ messages in thread
* [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (12 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 13/50] crypto: ccp: Define the SEV-SNP commands Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-11-27 9:59 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 15/50] crypto: ccp: Provide API to issue SEV and SNP commands Michael Roth
` (35 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen
From: Brijesh Singh <brijesh.singh@amd.com>
Before SNP VMs can be launched, the platform must be appropriately
configured and initialized. Platform initialization is accomplished via
the SNP_INIT command. Make sure to do a WBINVD and issue DF_FLUSH
command to prepare for the first SNP guest launch after INIT.
During the execution of SNP_INIT command, the firmware configures
and enables SNP security policy enforcement in many system components.
Some system components write to regions of memory reserved by early
x86 firmware (e.g. UEFI). Other system components write to regions
provided by the operating system, hypervisor, or x86 firmware.
Such system components can only write to HV-fixed pages or Default
pages. They will fail when attempting to write to pages in other states
after SNP_INIT enables their SNP enforcement.
Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
system physical address ranges to convert into the HV-fixed page states
during the RMP initialization. If INIT_RMP is 1, hypervisors should
provide all system physical address ranges that the hypervisor will
never assign to a guest until the next RMP re-initialization.
For instance, the memory that UEFI reserves should be included in the
range list. This allows system components that occasionally write to
memory (e.g. logging to UEFI reserved regions) to not fail due to
RMP initialization and SNP enablement.
Note that SNP_INIT(_EX) must not be executed while non-SEV guests are
executing, otherwise it is possible that the system could reset or hang.
The psp_init_on_probe module parameter was added for SEV/SEV-ES support
and the init_ex_path module parameter to allow time for the necessary
file system to be mounted/available.
use the file associated with init_ex_path. So, to avoid running into
issues where SNP_INIT(_EX) is called while there are other running
guests, issue it during module probe regardless of the psp_init_on_probe
setting, but maintain the previous deferrable handling for SEV/SEV-ES
initialization.
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[mdr: squash in psp_init_on_probe changes from Tom]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
drivers/crypto/ccp/sev-dev.c | 272 +++++++++++++++++++++++++++++++++--
drivers/crypto/ccp/sev-dev.h | 2 +
2 files changed, 259 insertions(+), 15 deletions(-)
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index c2da92f19ccd..fae1fd45eccd 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -29,6 +29,7 @@
#include <asm/smp.h>
#include <asm/cacheflush.h>
+#include <asm/e820/types.h>
#include "psp-dev.h"
#include "sev-dev.h"
@@ -37,6 +38,10 @@
#define SEV_FW_FILE "amd/sev.fw"
#define SEV_FW_NAME_SIZE 64
+/* Minimum firmware version required for the SEV-SNP support */
+#define SNP_MIN_API_MAJOR 1
+#define SNP_MIN_API_MINOR 51
+
static DEFINE_MUTEX(sev_cmd_mutex);
static struct sev_misc_dev *misc_dev;
@@ -80,6 +85,14 @@ static void *sev_es_tmr;
#define NV_LENGTH (32 * 1024)
static void *sev_init_ex_buffer;
+/*
+ * SEV_DATA_RANGE_LIST:
+ * Array containing range of pages that firmware transitions to HV-fixed
+ * page state.
+ */
+struct sev_data_range_list *snp_range_list;
+static int __sev_snp_init_locked(int *error);
+
static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
{
struct sev_device *sev = psp_master->sev_data;
@@ -466,9 +479,9 @@ static inline int __sev_do_init_locked(int *psp_ret)
return __sev_init_locked(psp_ret);
}
-static int __sev_platform_init_locked(int *error)
+static int ___sev_platform_init_locked(int *error, bool probe)
{
- int rc = 0, psp_ret = SEV_RET_NO_FW_CALL;
+ int rc, psp_ret = SEV_RET_NO_FW_CALL;
struct psp_device *psp = psp_master;
struct sev_device *sev;
@@ -480,6 +493,34 @@ static int __sev_platform_init_locked(int *error)
if (sev->state == SEV_STATE_INIT)
return 0;
+ /*
+ * Legacy guests cannot be running while SNP_INIT(_EX) is executing,
+ * so perform SEV-SNP initialization at probe time.
+ */
+ rc = __sev_snp_init_locked(error);
+ if (rc && rc != -ENODEV) {
+ /*
+ * Don't abort the probe if SNP INIT failed,
+ * continue to initialize the legacy SEV firmware.
+ */
+ dev_err(sev->dev, "SEV-SNP: failed to INIT rc %d, error %#x\n", rc, *error);
+ }
+
+ /* Delay SEV/SEV-ES support initialization */
+ if (probe && !psp_init_on_probe)
+ return 0;
+
+ if (!sev_es_tmr) {
+ /* Obtain the TMR memory area for SEV-ES use */
+ sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
+ if (sev_es_tmr)
+ /* Must flush the cache before giving it to the firmware */
+ clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
+ else
+ dev_warn(sev->dev,
+ "SEV: TMR allocation failed, SEV-ES support unavailable\n");
+ }
+
if (sev_init_ex_buffer) {
rc = sev_read_init_ex_file();
if (rc)
@@ -522,6 +563,11 @@ static int __sev_platform_init_locked(int *error)
return 0;
}
+static int __sev_platform_init_locked(int *error)
+{
+ return ___sev_platform_init_locked(error, false);
+}
+
int sev_platform_init(int *error)
{
int rc;
@@ -534,6 +580,17 @@ int sev_platform_init(int *error)
}
EXPORT_SYMBOL_GPL(sev_platform_init);
+static int sev_platform_init_on_probe(int *error)
+{
+ int rc;
+
+ mutex_lock(&sev_cmd_mutex);
+ rc = ___sev_platform_init_locked(error, true);
+ mutex_unlock(&sev_cmd_mutex);
+
+ return rc;
+}
+
static int __sev_platform_shutdown_locked(int *error)
{
struct sev_device *sev = psp_master->sev_data;
@@ -838,6 +895,191 @@ static int sev_update_firmware(struct device *dev)
return ret;
}
+static void snp_set_hsave_pa(void *arg)
+{
+ wrmsrl(MSR_VM_HSAVE_PA, 0);
+}
+
+static int snp_filter_reserved_mem_regions(struct resource *rs, void *arg)
+{
+ struct sev_data_range_list *range_list = arg;
+ struct sev_data_range *range = &range_list->ranges[range_list->num_elements];
+ size_t size;
+
+ if ((range_list->num_elements * sizeof(struct sev_data_range) +
+ sizeof(struct sev_data_range_list)) > PAGE_SIZE)
+ return -E2BIG;
+
+ switch (rs->desc) {
+ case E820_TYPE_RESERVED:
+ case E820_TYPE_PMEM:
+ case E820_TYPE_ACPI:
+ range->base = rs->start & PAGE_MASK;
+ size = (rs->end + 1) - rs->start;
+ range->page_count = size >> PAGE_SHIFT;
+ range_list->num_elements++;
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+static int __sev_snp_init_locked(int *error)
+{
+ struct psp_device *psp = psp_master;
+ struct sev_data_snp_init_ex data;
+ struct sev_device *sev;
+ int rc = 0;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return -ENODEV;
+
+ if (!psp || !psp->sev_data)
+ return -ENODEV;
+
+ sev = psp->sev_data;
+
+ if (sev->snp_initialized)
+ return 0;
+
+ if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
+ dev_dbg(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
+ SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
+ return 0;
+ }
+
+ /*
+ * SNP_INIT requires MSR_VM_HSAVE_PA to be set to 0h
+ * across all cores.
+ */
+ on_each_cpu(snp_set_hsave_pa, NULL, 1);
+
+ /*
+ * Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
+ * system physical address ranges to convert into the HV-fixed page states
+ * during the RMP initialization. For instance, the memory that UEFI
+ * reserves should be included in the range list. This allows system
+ * components that occasionally write to memory (e.g. logging to UEFI
+ * reserved regions) to not fail due to RMP initialization and SNP enablement.
+ */
+ if (sev_version_greater_or_equal(SNP_MIN_API_MAJOR, 52)) {
+ /*
+ * Firmware checks that the pages containing the ranges enumerated
+ * in the RANGES structure are either in the Default page state or in the
+ * firmware page state.
+ */
+ snp_range_list = kzalloc(PAGE_SIZE, GFP_KERNEL);
+ if (!snp_range_list) {
+ dev_err(sev->dev,
+ "SEV: SNP_INIT_EX range list memory allocation failed\n");
+ return -ENOMEM;
+ }
+
+ /*
+ * Retrieve all reserved memory regions setup by UEFI from the e820 memory map
+ * to be setup as HV-fixed pages.
+ */
+
+ rc = walk_iomem_res_desc(IORES_DESC_NONE, IORESOURCE_MEM, 0, ~0,
+ snp_range_list, snp_filter_reserved_mem_regions);
+ if (rc) {
+ dev_err(sev->dev,
+ "SEV: SNP_INIT_EX walk_iomem_res_desc failed rc = %d\n", rc);
+ return rc;
+ }
+
+ memset(&data, 0, sizeof(data));
+ data.init_rmp = 1;
+ data.list_paddr_en = 1;
+ data.list_paddr = __psp_pa(snp_range_list);
+
+ /*
+ * Before invoking SNP_INIT_EX with INIT_RMP=1, make sure that
+ * all dirty cache lines containing the RMP are flushed.
+ *
+ * NOTE: that includes writes via RMPUPDATE instructions, which
+ * are also cacheable writes.
+ */
+ wbinvd_on_all_cpus();
+
+ rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT_EX, &data, error);
+ if (rc)
+ return rc;
+ } else {
+ /*
+ * SNP_INIT is equivalent to SNP_INIT_EX with INIT_RMP=1, so
+ * just as with that case, make sure all dirty cache lines
+ * containing the RMP are flushed.
+ */
+ wbinvd_on_all_cpus();
+
+ rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT, NULL, error);
+ if (rc)
+ return rc;
+ }
+
+ /* Prepare for first SNP guest launch after INIT */
+ wbinvd_on_all_cpus();
+ rc = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, error);
+ if (rc)
+ return rc;
+
+ sev->snp_initialized = true;
+ dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
+
+ return rc;
+}
+
+static int __sev_snp_shutdown_locked(int *error)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_data_snp_shutdown_ex data;
+ int ret;
+
+ if (!sev->snp_initialized)
+ return 0;
+
+ memset(&data, 0, sizeof(data));
+ data.length = sizeof(data);
+ data.iommu_snp_shutdown = 1;
+
+ wbinvd_on_all_cpus();
+
+retry:
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN_EX, &data, error);
+ /* SHUTDOWN may require DF_FLUSH */
+ if (*error == SEV_RET_DFFLUSH_REQUIRED) {
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
+ if (ret) {
+ dev_err(sev->dev, "SEV-SNP DF_FLUSH failed\n");
+ return ret;
+ }
+ goto retry;
+ }
+ if (ret) {
+ dev_err(sev->dev, "SEV-SNP firmware shutdown failed\n");
+ return ret;
+ }
+
+ sev->snp_initialized = false;
+ dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
+
+ return ret;
+}
+
+static int sev_snp_shutdown(int *error)
+{
+ int rc;
+
+ mutex_lock(&sev_cmd_mutex);
+ rc = __sev_snp_shutdown_locked(error);
+ mutex_unlock(&sev_cmd_mutex);
+
+ return rc;
+}
+
static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
{
struct sev_device *sev = psp_master->sev_data;
@@ -1285,6 +1527,8 @@ int sev_dev_init(struct psp_device *psp)
static void sev_firmware_shutdown(struct sev_device *sev)
{
+ int error;
+
sev_platform_shutdown(NULL);
if (sev_es_tmr) {
@@ -1301,6 +1545,13 @@ static void sev_firmware_shutdown(struct sev_device *sev)
get_order(NV_LENGTH));
sev_init_ex_buffer = NULL;
}
+
+ if (snp_range_list) {
+ kfree(snp_range_list);
+ snp_range_list = NULL;
+ }
+
+ sev_snp_shutdown(&error);
}
void sev_dev_destroy(struct psp_device *psp)
@@ -1356,24 +1607,15 @@ void sev_pci_init(void)
}
}
- /* Obtain the TMR memory area for SEV-ES use */
- sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
- if (sev_es_tmr)
- /* Must flush the cache before giving it to the firmware */
- clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
- else
- dev_warn(sev->dev,
- "SEV: TMR allocation failed, SEV-ES support unavailable\n");
-
- if (!psp_init_on_probe)
- return;
-
/* Initialize the platform */
- rc = sev_platform_init(&error);
+ rc = sev_platform_init_on_probe(&error);
if (rc)
dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
error, rc);
+ dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
+ "-SNP" : "", sev->api_major, sev->api_minor, sev->build);
+
return;
err:
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 778c95155e74..85506325051a 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -52,6 +52,8 @@ struct sev_device {
u8 build;
void *cmd_buf;
+
+ bool snp_initialized;
};
int sev_dev_init(struct psp_device *psp);
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
2023-10-16 13:27 ` [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
@ 2023-11-27 9:59 ` Borislav Petkov
2023-11-30 2:13 ` Kalra, Ashish
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-11-27 9:59 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Jarkko Sakkinen
On Mon, Oct 16, 2023 at 08:27:43AM -0500, Michael Roth wrote:
> +/*
> + * SEV_DATA_RANGE_LIST:
> + * Array containing range of pages that firmware transitions to HV-fixed
> + * page state.
> + */
> +struct sev_data_range_list *snp_range_list;
> +static int __sev_snp_init_locked(int *error);
Put the function above the caller instead of doing a forward
declaration.
> static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
> {
> struct sev_device *sev = psp_master->sev_data;
> @@ -466,9 +479,9 @@ static inline int __sev_do_init_locked(int *psp_ret)
> return __sev_init_locked(psp_ret);
> }
>
> -static int __sev_platform_init_locked(int *error)
> +static int ___sev_platform_init_locked(int *error, bool probe)
> {
> - int rc = 0, psp_ret = SEV_RET_NO_FW_CALL;
> + int rc, psp_ret = SEV_RET_NO_FW_CALL;
> struct psp_device *psp = psp_master;
> struct sev_device *sev;
>
> @@ -480,6 +493,34 @@ static int __sev_platform_init_locked(int *error)
> if (sev->state == SEV_STATE_INIT)
> return 0;
>
> + /*
> + * Legacy guests cannot be running while SNP_INIT(_EX) is executing,
> + * so perform SEV-SNP initialization at probe time.
> + */
> + rc = __sev_snp_init_locked(error);
> + if (rc && rc != -ENODEV) {
> + /*
> + * Don't abort the probe if SNP INIT failed,
> + * continue to initialize the legacy SEV firmware.
> + */
> + dev_err(sev->dev, "SEV-SNP: failed to INIT rc %d, error %#x\n", rc, *error);
> + }
> +
> + /* Delay SEV/SEV-ES support initialization */
> + if (probe && !psp_init_on_probe)
> + return 0;
> +
> + if (!sev_es_tmr) {
> + /* Obtain the TMR memory area for SEV-ES use */
> + sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
> + if (sev_es_tmr)
> + /* Must flush the cache before giving it to the firmware */
> + clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
> + else
> + dev_warn(sev->dev,
> + "SEV: TMR allocation failed, SEV-ES support unavailable\n");
> + }
> +
> if (sev_init_ex_buffer) {
> rc = sev_read_init_ex_file();
> if (rc)
> @@ -522,6 +563,11 @@ static int __sev_platform_init_locked(int *error)
> return 0;
> }
>
> +static int __sev_platform_init_locked(int *error)
> +{
> + return ___sev_platform_init_locked(error, false);
> +}
Uff, this is silly. And it makes the code hard to follow and that meat
of the platform init functionality in the ___-prefixed function a mess.
And the problem is that that "probe" functionality is replicated from
the one place where it is actually needed - sev_pci_init() which calls
that new sev_platform_init_on_probe() function - to everything that
calls __sev_platform_init_locked() for which you've added a wrapper.
What you should do, instead, is split the code around
__sev_snp_init_locked() in a separate function which does only that and
is called something like __sev_platform_init_snp_locked() or so which
does that unconditional work. And then you define:
_sev_platform_init_locked(int *error, bool probe)
note the *one* '_' - i.e., first layer:
_sev_platform_init_locked(int *error, bool probe):
{
__sev_platform_init_snp_locked(error);
if (!probe)
return 0;
if (psp_init_on_probe)
__sev_platform_init_locked(error);
...
}
and you do the probing in that function only so that it doesn't get lost
in the bunch of things __sev_platform_init_locked() does.
And then you call _sev_platform_init_locked() everywhere and no need for
a second sev_platform_init_on_probe().
> +
> int sev_platform_init(int *error)
> {
> int rc;
> @@ -534,6 +580,17 @@ int sev_platform_init(int *error)
> }
> EXPORT_SYMBOL_GPL(sev_platform_init);
>
> +static int sev_platform_init_on_probe(int *error)
> +{
> + int rc;
> +
> + mutex_lock(&sev_cmd_mutex);
> + rc = ___sev_platform_init_locked(error, true);
> + mutex_unlock(&sev_cmd_mutex);
> +
> + return rc;
> +}
> +
> static int __sev_platform_shutdown_locked(int *error)
> {
> struct sev_device *sev = psp_master->sev_data;
> @@ -838,6 +895,191 @@ static int sev_update_firmware(struct device *dev)
> return ret;
> }
>
> +static void snp_set_hsave_pa(void *arg)
> +{
> + wrmsrl(MSR_VM_HSAVE_PA, 0);
> +}
> +
> +static int snp_filter_reserved_mem_regions(struct resource *rs, void *arg)
> +{
> + struct sev_data_range_list *range_list = arg;
> + struct sev_data_range *range = &range_list->ranges[range_list->num_elements];
> + size_t size;
> +
> + if ((range_list->num_elements * sizeof(struct sev_data_range) +
> + sizeof(struct sev_data_range_list)) > PAGE_SIZE)
> + return -E2BIG;
Why? A comment would be helpful like with the rest this patch adds.
> + switch (rs->desc) {
> + case E820_TYPE_RESERVED:
> + case E820_TYPE_PMEM:
> + case E820_TYPE_ACPI:
> + range->base = rs->start & PAGE_MASK;
> + size = (rs->end + 1) - rs->start;
> + range->page_count = size >> PAGE_SHIFT;
> + range_list->num_elements++;
> + break;
> + default:
> + break;
> + }
> +
> + return 0;
> +}
> +
> +static int __sev_snp_init_locked(int *error)
> +{
> + struct psp_device *psp = psp_master;
> + struct sev_data_snp_init_ex data;
> + struct sev_device *sev;
> + int rc = 0;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return -ENODEV;
> +
> + if (!psp || !psp->sev_data)
> + return -ENODEV;
Only caller checks this already.
> + sev = psp->sev_data;
> +
> + if (sev->snp_initialized)
Do we really need this silly boolean or is there a way to query the
platform whether SNP has been initialized?
> + return 0;
> +
> + if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
> + dev_dbg(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
> + SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
> + return 0;
> + }
> +
> + /*
> + * The SNP_INIT requires the MSR_VM_HSAVE_PA must be set to 0h
> + * across all cores.
> + */
> + on_each_cpu(snp_set_hsave_pa, NULL, 1);
> +
> + /*
> + * Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
> + * system physical address ranges to convert into the HV-fixed page states
> + * during the RMP initialization. For instance, the memory that UEFI
> + * reserves should be included in the range list. This allows system
> + * components that occasionally write to memory (e.g. logging to UEFI
> + * reserved regions) to not fail due to RMP initialization and SNP enablement.
> + */
> + if (sev_version_greater_or_equal(SNP_MIN_API_MAJOR, 52)) {
Is there a generic way to probe SNP_INIT_EX presence in the firmware or
are FW version numbers the only way?
> + /*
> + * Firmware checks that the pages containing the ranges enumerated
> + * in the RANGES structure are either in the Default page state or in the
"default"
> + * firmware page state.
> + */
> + snp_range_list = kzalloc(PAGE_SIZE, GFP_KERNEL);
> + if (!snp_range_list) {
> + dev_err(sev->dev,
> + "SEV: SNP_INIT_EX range list memory allocation failed\n");
> + return -ENOMEM;
> + }
> +
> + /*
> + * Retrieve all reserved memory regions setup by UEFI from the e820 memory map
> + * to be setup as HV-fixed pages.
> + */
> +
^ Superfluous newline.
> + rc = walk_iomem_res_desc(IORES_DESC_NONE, IORESOURCE_MEM, 0, ~0,
> + snp_range_list, snp_filter_reserved_mem_regions);
> + if (rc) {
> + dev_err(sev->dev,
> + "SEV: SNP_INIT_EX walk_iomem_res_desc failed rc = %d\n", rc);
> + return rc;
> + }
> +
> + memset(&data, 0, sizeof(data));
> + data.init_rmp = 1;
> + data.list_paddr_en = 1;
> + data.list_paddr = __psp_pa(snp_range_list);
> +
> + /*
> + * Before invoking SNP_INIT_EX with INIT_RMP=1, make sure that
> + * all dirty cache lines containing the RMP are flushed.
> + *
> + * NOTE: that includes writes via RMPUPDATE instructions, which
> + * are also cacheable writes.
> + */
> + wbinvd_on_all_cpus();
> +
> + rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT_EX, &data, error);
> + if (rc)
> + return rc;
> + } else {
> + /*
> + * SNP_INIT is equivalent to SNP_INIT_EX with INIT_RMP=1, so
> + * just as with that case, make sure all dirty cache lines
> + * containing the RMP are flushed.
> + */
> + wbinvd_on_all_cpus();
> +
> + rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT, NULL, error);
> + if (rc)
> + return rc;
> + }
So instead of duplicating the code here at the end of the if-else
branching, you can do:
void *arg = &data;
if () {
...
cmd = SEV_CMD_SNP_INIT_EX;
} else {
cmd = SEV_CMD_SNP_INIT;
arg = NULL;
}
wbinvd_on_all_cpus();
rc = __sev_do_cmd_locked(cmd, arg, error);
if (rc)
return rc;
> + /* Prepare for first SNP guest launch after INIT */
> + wbinvd_on_all_cpus();
Why is that WBINVD needed?
> + rc = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, error);
> + if (rc)
> + return rc;
> +
> + sev->snp_initialized = true;
> + dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
> +
> + return rc;
> +}
> +
> +static int __sev_snp_shutdown_locked(int *error)
> +{
> + struct sev_device *sev = psp_master->sev_data;
> + struct sev_data_snp_shutdown_ex data;
> + int ret;
> +
> + if (!sev->snp_initialized)
> + return 0;
> +
> + memset(&data, 0, sizeof(data));
> + data.length = sizeof(data);
> + data.iommu_snp_shutdown = 1;
> +
> + wbinvd_on_all_cpus();
> +
> +retry:
> + ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN_EX, &data, error);
> + /* SHUTDOWN may require DF_FLUSH */
> + if (*error == SEV_RET_DFFLUSH_REQUIRED) {
> + ret = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
> + if (ret) {
> + dev_err(sev->dev, "SEV-SNP DF_FLUSH failed\n");
> + return ret;
When you return here, sev->snp_initialized is still true but, in
reality, it probably is in some half-broken state after issuing those
commands, so it is not really initialized anymore.
> + }
> + goto retry;
This needs an upper limit from which to break out and not potentially
endless-loop.
> + }
> + if (ret) {
> + dev_err(sev->dev, "SEV-SNP firmware shutdown failed\n");
> + return ret;
> + }
> +
> + sev->snp_initialized = false;
> + dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
> +
> + return ret;
> +}
> +
> +static int sev_snp_shutdown(int *error)
> +{
> + int rc;
> +
> + mutex_lock(&sev_cmd_mutex);
> + rc = __sev_snp_shutdown_locked(error);
Why is this "locked" version even there if it is called only here?
IOW, put all the logic in here - no need for
__sev_snp_shutdown_locked().
> + mutex_unlock(&sev_cmd_mutex);
> +
> + return rc;
> +}
...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
2023-11-27 9:59 ` Borislav Petkov
@ 2023-11-30 2:13 ` Kalra, Ashish
2023-12-06 17:08 ` Borislav Petkov
0 siblings, 1 reply; 158+ messages in thread
From: Kalra, Ashish @ 2023-11-30 2:13 UTC (permalink / raw)
To: Borislav Petkov, Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh, Jarkko Sakkinen
Hello Boris,
>> +static int ___sev_platform_init_locked(int *error, bool probe)
>> {
>> - int rc = 0, psp_ret = SEV_RET_NO_FW_CALL;
>> + int rc, psp_ret = SEV_RET_NO_FW_CALL;
>> struct psp_device *psp = psp_master;
>> struct sev_device *sev;
>>
>> @@ -480,6 +493,34 @@ static int __sev_platform_init_locked(int *error)
>> if (sev->state == SEV_STATE_INIT)
>> return 0;
>>
>> + /*
>> + * Legacy guests cannot be running while SNP_INIT(_EX) is executing,
>> + * so perform SEV-SNP initialization at probe time.
>> + */
>> + rc = __sev_snp_init_locked(error);
>> + if (rc && rc != -ENODEV) {
>> + /*
>> + * Don't abort the probe if SNP INIT failed,
>> + * continue to initialize the legacy SEV firmware.
>> + */
>> + dev_err(sev->dev, "SEV-SNP: failed to INIT rc %d, error %#x\n", rc, *error);
>> + }
>> +
>> + /* Delay SEV/SEV-ES support initialization */
>> + if (probe && !psp_init_on_probe)
>> + return 0;
>> +
>> + if (!sev_es_tmr) {
>> + /* Obtain the TMR memory area for SEV-ES use */
>> + sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
>> + if (sev_es_tmr)
>> + /* Must flush the cache before giving it to the firmware */
>> + clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
>> + else
>> + dev_warn(sev->dev,
>> + "SEV: TMR allocation failed, SEV-ES support unavailable\n");
>> + }
>> +
>> if (sev_init_ex_buffer) {
>> rc = sev_read_init_ex_file();
>> if (rc)
>> @@ -522,6 +563,11 @@ static int __sev_platform_init_locked(int *error)
>> return 0;
>> }
>>
>> +static int __sev_platform_init_locked(int *error)
>> +{
>> + return ___sev_platform_init_locked(error, false);
>> +}
>
> Uff, this is silly. And it makes the code hard to follow and that meat
> of the platform init functionality in the ___-prefixed function a mess.
>
> And the problem is that that "probe" functionality is replicated from
> the one place where it is actually needed - sev_pci_init() which calls
> that new sev_platform_init_on_probe() function - to everything that
> calls __sev_platform_init_locked() for which you've added a wrapper.
>
> What you should do, instead, is split the code around
> __sev_snp_init_locked() in a separate function which does only that and
> is called something like __sev_platform_init_snp_locked() or so which
> does that unconditional work. And then you define:
>
> _sev_platform_init_locked(int *error, bool probe)
>
> note the *one* '_' - i.e., first layer:
>
> _sev_platform_init_locked(int *error, bool probe):
> {
> __sev_platform_init_snp_locked(error);
>
> if (!probe)
> return 0;
>
> if (psp_init_on_probe)
> __sev_platform_init_locked(error);
>
> ...
> }
>
> and you do the probing in that function only so that it doesn't get lost
> in the bunch of things __sev_platform_init_locked() does.
>
> And then you call _sev_platform_init_locked() everywhere and no need for
> a second sev_platform_init_on_probe().
>
It surely seems hard to follow, so I am going to clean it up anyway by:
Adding the "probe" parameter to sev_platform_init() where the parameter
being true indicates that we only want to do SNP initialization on
probe, the same parameter will get passed on to
__sev_platform_init_locked().
So eventually there won't be a second sev_platform_init_on_probe() and
also there is no need for a ___sev_platform_init_locked().
We will only have sev_platform_init() and _sev_platform_init_locked().
>> +
>> +static int snp_filter_reserved_mem_regions(struct resource *rs, void *arg)
>> +{
>> + struct sev_data_range_list *range_list = arg;
>> + struct sev_data_range *range = &range_list->ranges[range_list->num_elements];
>> + size_t size;
>> +
>> + if ((range_list->num_elements * sizeof(struct sev_data_range) +
>> + sizeof(struct sev_data_range_list)) > PAGE_SIZE)
>> + return -E2BIG;
>
> Why? A comment would be helpful like with the rest this patch adds.
>
Ok.
>> + switch (rs->desc) {
>> + case E820_TYPE_RESERVED:
>> + case E820_TYPE_PMEM:
>> + case E820_TYPE_ACPI:
>> + range->base = rs->start & PAGE_MASK;
>> + size = (rs->end + 1) - rs->start;
>> + range->page_count = size >> PAGE_SHIFT;
>> + range_list->num_elements++;
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int __sev_snp_init_locked(int *error)
>> +{
>> + struct psp_device *psp = psp_master;
>> + struct sev_data_snp_init_ex data;
>> + struct sev_device *sev;
>> + int rc = 0;
>> +
>> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>> + return -ENODEV;
>> +
>> + if (!psp || !psp->sev_data)
>> + return -ENODEV;
>
> Only caller checks this already.
>
Ok.
>> + sev = psp->sev_data;
>> +
>> + if (sev->snp_initialized)
>
> Do we really need this silly boolean or is there a way to query the
> platform whether SNP has been initialized?
>
Yes, it makes sense to have it, as any platform-specific way to query
whether SNP has been initialized will be much more expensive than
simply checking this boolean.
>> + return 0;
>> +
>> + if (!sev_version_greater_or_equal(SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR)) {
>> + dev_dbg(sev->dev, "SEV-SNP support requires firmware version >= %d:%d\n",
>> + SNP_MIN_API_MAJOR, SNP_MIN_API_MINOR);
>> + return 0;
>> + }
>> +
>> + /*
>> + * The SNP_INIT requires the MSR_VM_HSAVE_PA must be set to 0h
>> + * across all cores.
>> + */
>> + on_each_cpu(snp_set_hsave_pa, NULL, 1);
>> +
>> + /*
>> + * Starting in SNP firmware v1.52, the SNP_INIT_EX command takes a list of
>> + * system physical address ranges to convert into the HV-fixed page states
>> + * during the RMP initialization. For instance, the memory that UEFI
>> + * reserves should be included in the range list. This allows system
>> + * components that occasionally write to memory (e.g. logging to UEFI
>> + * reserved regions) to not fail due to RMP initialization and SNP enablement.
>> + */
>> + if (sev_version_greater_or_equal(SNP_MIN_API_MAJOR, 52)) {
>
> Is there a generic way to probe SNP_INIT_EX presence in the firmware or
> are FW version numbers the only way?
It is not only about the presence of SNP_INIT_EX; this check is
specific to passing the HV_Fixed pages list to SNP_INIT_EX, which is
only supported with SNP FW versions 1.52 and later, so the FW version
check is the only way.
>
>> + /*
>> + * Firmware checks that the pages containing the ranges enumerated
>> + * in the RANGES structure are either in the Default page state or in the
>
> "default"
>
>> + * firmware page state.
>> + */
>> + snp_range_list = kzalloc(PAGE_SIZE, GFP_KERNEL);
>> + if (!snp_range_list) {
>> + dev_err(sev->dev,
>> + "SEV: SNP_INIT_EX range list memory allocation failed\n");
>> + return -ENOMEM;
>> + }
>> +
>> + /*
>> + * Retrieve all reserved memory regions setup by UEFI from the e820 memory map
>> + * to be setup as HV-fixed pages.
>> + */
>> +
>
>
> ^ Superfluous newline.
>
>> + rc = walk_iomem_res_desc(IORES_DESC_NONE, IORESOURCE_MEM, 0, ~0,
>> + snp_range_list, snp_filter_reserved_mem_regions);
>> + if (rc) {
>> + dev_err(sev->dev,
>> + "SEV: SNP_INIT_EX walk_iomem_res_desc failed rc = %d\n", rc);
>> + return rc;
>> + }
>> +
>> + memset(&data, 0, sizeof(data));
>> + data.init_rmp = 1;
>> + data.list_paddr_en = 1;
>> + data.list_paddr = __psp_pa(snp_range_list);
>> +
>> + /*
>> + * Before invoking SNP_INIT_EX with INIT_RMP=1, make sure that
>> + * all dirty cache lines containing the RMP are flushed.
>> + *
>> + * NOTE: that includes writes via RMPUPDATE instructions, which
>> + * are also cacheable writes.
>> + */
>> + wbinvd_on_all_cpus();
>> +
>> + rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT_EX, &data, error);
>> + if (rc)
>> + return rc;
>> + } else {
>> + /*
>> + * SNP_INIT is equivalent to SNP_INIT_EX with INIT_RMP=1, so
>> + * just as with that case, make sure all dirty cache lines
>> + * containing the RMP are flushed.
>> + */
>> + wbinvd_on_all_cpus();
>> +
>> + rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT, NULL, error);
>> + if (rc)
>> + return rc;
>> + }
>
> So instead of duplicating the code here at the end of the if-else
> branching, you can do:
>
> void *arg = &data;
>
> if () {
> ...
> cmd = SEV_CMD_SNP_INIT_EX;
> } else {
> cmd = SEV_CMD_SNP_INIT;
> arg = NULL;
> }
>
> wbinvd_on_all_cpus();
> rc = __sev_do_cmd_locked(cmd, arg, error);
> if (rc)
> return rc;
Yes, makes sense, will fix it.
>
>> + /* Prepare for first SNP guest launch after INIT */
>> + wbinvd_on_all_cpus();
>
> Why is that WBINVD needed?
As the comment above mentions, WBINVD + DF_FLUSH is needed before the
first guest launch.
>
>> + rc = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, error);
>> + if (rc)
>> + return rc;
>> +
>> + sev->snp_initialized = true;
>> + dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
>> +
>> + return rc;
>> +}
>> +
>> +static int __sev_snp_shutdown_locked(int *error)
>> +{
>> + struct sev_device *sev = psp_master->sev_data;
>> + struct sev_data_snp_shutdown_ex data;
>> + int ret;
>> +
>> + if (!sev->snp_initialized)
>> + return 0;
>> +
>> + memset(&data, 0, sizeof(data));
>> + data.length = sizeof(data);
>> + data.iommu_snp_shutdown = 1;
>> +
>> + wbinvd_on_all_cpus();
>> +
>> +retry:
>> + ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN_EX, &data, error);
>> + /* SHUTDOWN may require DF_FLUSH */
>> + if (*error == SEV_RET_DFFLUSH_REQUIRED) {
>> + ret = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, NULL);
>> + if (ret) {
>> + dev_err(sev->dev, "SEV-SNP DF_FLUSH failed\n");
>> + return ret;
>
> When you return here, sev->snp_initialized is still true but, in
> reality, it probably is in some half-broken state after issuing those
> commands, so it is not really initialized anymore.
Yes, this needs to be fixed.
>
>> + }
>> + goto retry;
>
> This needs an upper limit from which to break out and not potentially
> endless-loop.
>
Ok.
>> + }
>> + if (ret) {
>> + dev_err(sev->dev, "SEV-SNP firmware shutdown failed\n");
>> + return ret;
>> + }
>> +
>> + sev->snp_initialized = false;
>> + dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
>> +
>> + return ret;
>> +}
>> +
>> +static int sev_snp_shutdown(int *error)
>> +{
>> + int rc;
>> +
>> + mutex_lock(&sev_cmd_mutex);
>> + rc = __sev_snp_shutdown_locked(error);
>
> Why is this "locked" version even there if it is called only here?
>
> IOW, put all the logic in here - no need for
> __sev_snp_shutdown_locked().
In the latest code base, _sev_snp_shutdown_locked() is called from
__sev_firmware_shutdown().
Thanks,
Ashish
>
>> + mutex_unlock(&sev_cmd_mutex);
>> +
>> + return rc;
>> +}
>
> ...
>
* Re: [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
2023-11-30 2:13 ` Kalra, Ashish
@ 2023-12-06 17:08 ` Borislav Petkov
2023-12-06 20:35 ` Kalra, Ashish
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-12-06 17:08 UTC (permalink / raw)
To: Kalra, Ashish
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
pankaj.gupta, liam.merwick, zhi.a.wang, Brijesh Singh,
Jarkko Sakkinen
On Wed, Nov 29, 2023 at 08:13:52PM -0600, Kalra, Ashish wrote:
> It surely seems hard to follow, so I am going to clean it up anyway by:
>
> Adding the "probe" parameter to sev_platform_init() where the parameter
> being true indicates that we only want to do SNP initialization on probe,
> the same parameter will get passed on to
> __sev_platform_init_locked().
That's exactly what you should *not* do - the probe parameter controls
whether
if (psp_init_on_probe)
__sev_platform_init_locked(error);
and so on should get executed or not.
If it is unclear, lemme know and I'll do a diff to show you what I mean.
> > > + /* Prepare for first SNP guest launch after INIT */
> > > + wbinvd_on_all_cpus();
> >
> > Why is that WBINVD needed?
>
> As the comment above mentions, WBINVD + DF_FLUSH is needed before the first
> guest launch.
Lemme see if I get this straight. The correct order is:
WBINVD
SNP_INIT_*
WBINVD
DF_FLUSH
If so, do a comment which goes like this:
/*
* The order of commands to execute before the first guest
* launch is the following:
*
* bla...
*/
> In the latest code base, _sev_snp_shutdown_locked() is called from
> __sev_firmware_shutdown().
Then carve that function out only when needed - do not do changes
preemptively. This is not helping during review.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
2023-12-06 17:08 ` Borislav Petkov
@ 2023-12-06 20:35 ` Kalra, Ashish
2023-12-09 16:20 ` Borislav Petkov
0 siblings, 1 reply; 158+ messages in thread
From: Kalra, Ashish @ 2023-12-06 20:35 UTC (permalink / raw)
To: Borislav Petkov
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
pankaj.gupta, liam.merwick, zhi.a.wang, Brijesh Singh,
Jarkko Sakkinen
Hello Boris,
On 12/6/2023 11:08 AM, Borislav Petkov wrote:
> On Wed, Nov 29, 2023 at 08:13:52PM -0600, Kalra, Ashish wrote:
>> It surely seems hard to follow, so I am going to clean it up anyway by:
>>
>> Adding the "probe" parameter to sev_platform_init() where the parameter
>> being true indicates that we only want to do SNP initialization on probe,
>> the same parameter will get passed on to
>> __sev_platform_init_locked().
>
> That's exactly what you should *not* do - the probe parameter controls
> whether
>
> if (psp_init_on_probe)
> __sev_platform_init_locked(error);
>
> and so on should get executed or not.
>
Not actually.
The main use case for the probe parameter is to control if we want to do
legacy SEV/SEV-ES INIT during probe. There is a use case where we want
to delay legacy SEV INIT till an actual SEV/SEV-ES guest is being
launched. So essentially the probe parameter controls if we want to
execute __sev_do_init_locked() or not.
We always want to do SNP INIT at probe time.
Thanks,
Ashish
* Re: [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
2023-12-06 20:35 ` Kalra, Ashish
@ 2023-12-09 16:20 ` Borislav Petkov
2023-12-11 21:11 ` Kalra, Ashish
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-12-09 16:20 UTC (permalink / raw)
To: Kalra, Ashish
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
pankaj.gupta, liam.merwick, zhi.a.wang, Brijesh Singh,
Jarkko Sakkinen
On Wed, Dec 06, 2023 at 02:35:28PM -0600, Kalra, Ashish wrote:
> The main use case for the probe parameter is to control if we want to do
> legacy SEV/SEV-ES INIT during probe. There is a usage case where we want to
> delay legacy SEV INIT till an actual SEV/SEV-ES guest is being launched. So
> essentially the probe parameter controls if we want to
> execute __sev_do_init_locked() or not.
>
> We always want to do SNP INIT at probe time.
Here's what I mean (diff ontop):
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index fae1fd45eccd..830d74fcf950 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -479,11 +479,16 @@ static inline int __sev_do_init_locked(int *psp_ret)
return __sev_init_locked(psp_ret);
}
-static int ___sev_platform_init_locked(int *error, bool probe)
+/*
+ * Legacy guests cannot be running while SNP_INIT(_EX) is executing,
+ * so perform SEV-SNP initialization at probe time.
+ */
+static int __sev_platform_init_snp_locked(int *error)
{
- int rc, psp_ret = SEV_RET_NO_FW_CALL;
+
struct psp_device *psp = psp_master;
struct sev_device *sev;
+ int rc;
if (!psp || !psp->sev_data)
return -ENODEV;
@@ -493,10 +498,6 @@ static int ___sev_platform_init_locked(int *error, bool probe)
if (sev->state == SEV_STATE_INIT)
return 0;
- /*
- * Legacy guests cannot be running while SNP_INIT(_EX) is executing,
- * so perform SEV-SNP initialization at probe time.
- */
rc = __sev_snp_init_locked(error);
if (rc && rc != -ENODEV) {
/*
@@ -506,8 +507,21 @@ static int ___sev_platform_init_locked(int *error, bool probe)
dev_err(sev->dev, "SEV-SNP: failed to INIT rc %d, error %#x\n", rc, *error);
}
- /* Delay SEV/SEV-ES support initialization */
- if (probe && !psp_init_on_probe)
+ return rc;
+}
+
+static int __sev_platform_init_locked(int *error)
+{
+ int rc, psp_ret = SEV_RET_NO_FW_CALL;
+ struct psp_device *psp = psp_master;
+ struct sev_device *sev;
+
+ if (!psp || !psp->sev_data)
+ return -ENODEV;
+
+ sev = psp->sev_data;
+
+ if (sev->state == SEV_STATE_INIT)
return 0;
if (!sev_es_tmr) {
@@ -563,33 +577,32 @@ static int ___sev_platform_init_locked(int *error, bool probe)
return 0;
}
-static int __sev_platform_init_locked(int *error)
-{
- return ___sev_platform_init_locked(error, false);
-}
-
-int sev_platform_init(int *error)
+static int _sev_platform_init_locked(int *error, bool probe)
{
int rc;
- mutex_lock(&sev_cmd_mutex);
- rc = __sev_platform_init_locked(error);
- mutex_unlock(&sev_cmd_mutex);
+ rc = __sev_platform_init_snp_locked(error);
+ if (rc)
+ return rc;
- return rc;
+ /* Delay SEV/SEV-ES support initialization */
+ if (probe && !psp_init_on_probe)
+ return 0;
+
+ return __sev_platform_init_locked(error);
}
-EXPORT_SYMBOL_GPL(sev_platform_init);
-static int sev_platform_init_on_probe(int *error)
+int sev_platform_init(int *error)
{
int rc;
mutex_lock(&sev_cmd_mutex);
- rc = ___sev_platform_init_locked(error, true);
+ rc = _sev_platform_init_locked(error, false);
mutex_unlock(&sev_cmd_mutex);
return rc;
}
+EXPORT_SYMBOL_GPL(sev_platform_init);
static int __sev_platform_shutdown_locked(int *error)
{
@@ -691,7 +704,7 @@ static int sev_ioctl_do_pek_pdh_gen(int cmd, struct sev_issue_cmd *argp, bool wr
return -EPERM;
if (sev->state == SEV_STATE_UNINIT) {
- rc = __sev_platform_init_locked(&argp->error);
+ rc = _sev_platform_init_locked(&argp->error, false);
if (rc)
return rc;
}
@@ -734,7 +747,7 @@ static int sev_ioctl_do_pek_csr(struct sev_issue_cmd *argp, bool writable)
cmd:
if (sev->state == SEV_STATE_UNINIT) {
- ret = __sev_platform_init_locked(&argp->error);
+ ret = _sev_platform_init_locked(&argp->error, false);
if (ret)
goto e_free_blob;
}
@@ -1115,7 +1128,7 @@ static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
/* If platform is not in INIT state then transition it to INIT */
if (sev->state != SEV_STATE_INIT) {
- ret = __sev_platform_init_locked(&argp->error);
+ ret = _sev_platform_init_locked(&argp->error, false);
if (ret)
goto e_free_oca;
}
@@ -1246,7 +1259,7 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
if (!writable)
return -EPERM;
- ret = __sev_platform_init_locked(&argp->error);
+ ret = _sev_platform_init_locked(&argp->error, false);
if (ret)
return ret;
}
@@ -1608,7 +1621,9 @@ void sev_pci_init(void)
}
/* Initialize the platform */
- rc = sev_platform_init_on_probe(&error);
+ mutex_lock(&sev_cmd_mutex);
+ rc = _sev_platform_init_locked(&error, true);
+ mutex_unlock(&sev_cmd_mutex);
if (rc)
dev_err(sev->dev, "SEV: failed to INIT error %#x, rc %d\n",
error, rc);
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
2023-12-09 16:20 ` Borislav Petkov
@ 2023-12-11 21:11 ` Kalra, Ashish
2023-12-12 6:52 ` Borislav Petkov
0 siblings, 1 reply; 158+ messages in thread
From: Kalra, Ashish @ 2023-12-11 21:11 UTC (permalink / raw)
To: Borislav Petkov
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
pankaj.gupta, liam.merwick, zhi.a.wang, Brijesh Singh,
Jarkko Sakkinen
Hello Boris,
On 12/9/2023 10:20 AM, Borislav Petkov wrote:
> On Wed, Dec 06, 2023 at 02:35:28PM -0600, Kalra, Ashish wrote:
>> The main use case for the probe parameter is to control if we want to do
>> legacy SEV/SEV-ES INIT during probe. There is a use case where we want to
>> delay legacy SEV INIT till an actual SEV/SEV-ES guest is being launched. So
>> essentially the probe parameter controls if we want to
>> execute __sev_do_init_locked() or not.
>>
>> We always want to do SNP INIT at probe time.
>
> Here's what I mean (diff ontop):
>
See my comments below on this patch:
> +int sev_platform_init(int *error)
> {
> int rc;
>
> mutex_lock(&sev_cmd_mutex);
> - rc = ___sev_platform_init_locked(error, true);
> + rc = _sev_platform_init_locked(error, false);
> mutex_unlock(&sev_cmd_mutex);
>
> return rc;
> }
> +EXPORT_SYMBOL_GPL(sev_platform_init);
>
What we need is a mechanism to do legacy SEV/SEV-ES INIT only if an
SEV/SEV-ES guest is being launched; hence, we want an additional
parameter added to the exported sev_platform_init() interface so that
the kvm_amd module can call it during guest launch and indicate
whether an SNP or a legacy guest is being launched.
That's the reason we want to add the probe parameter to
sev_platform_init().
And to address your previous comments, this will remain a clean
interface: there are going to be only two functions,
sev_platform_init() and __sev_platform_init_locked().
Thanks,
Ashish
* Re: [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
2023-12-11 21:11 ` Kalra, Ashish
@ 2023-12-12 6:52 ` Borislav Petkov
0 siblings, 0 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-12-12 6:52 UTC (permalink / raw)
To: Kalra, Ashish
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, seanjc, vkuznets, jmattson, luto, dave.hansen, slp,
pgonda, peterz, srinivas.pandruvada, rientjes, dovmurik, tobin,
vbabka, kirill, ak, tony.luck, sathyanarayanan.kuppuswamy,
alpergun, jarkko, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang, Brijesh Singh, Jarkko Sakkinen
On Mon, Dec 11, 2023 at 03:11:17PM -0600, Kalra, Ashish wrote:
> What we need is a mechanism to do legacy SEV/SEV-ES INIT only if a
> SEV/SEV-ES guest is being launched, hence, we want an additional parameter
> added to sev_platform_init() exported interface so that kvm_amd module can
> call this interface during guest launch and indicate if SNP/legacy guest is
> being launched.
>
> That's the reason we want to add the probe parameter to
> sev_platform_init().
That's not what your original patch does and nowhere in the whole
patchset do I see this new requirement for KVM to be able to control the
probing.
The probe param is added to ___sev_platform_init_locked() which is
called by this new sev_platform_init_on_probe() thing to signal that
whatever calls this, it wants the probing.
And "whatever" is sev_pci_init() which is called from the bowels of the
secure processor drivers. Suffice it to say, this is some sort of an
init path.
So, it wants to init SNP stuff which is unconditional during driver init
- not when KVM starts guests - and probe too on driver init time, *iff*
that psp_init_on_probe thing is set. Which looks suspicious to me:
"Add psp_init_on_probe module parameter that allows for skipping the
PSP's SEV platform initialization during module init. User may decouple
module init from PSP init due to use of the INIT_EX support in upcoming
patch which allows for users to save PSP's internal state to file."
From b64fa5fc9f44 ("crypto: ccp - Add psp_init_on_probe module
parameter").
And reading about INIT_EX, "This command loads the SEV related
persistent data from user-supplied data and initializes the platform
context."
So it sounds like HV vendor wants to supply something itself. But then
looking at init_ex_path and open_file_as_root() makes me cringe.
I would've never done it this way: we have request_firmware* etc helpers
for loading blobs from userspace which are widely used. But then reading
3d725965f836 ("crypto: ccp - Add SEV_INIT_EX support")
that increases the cringe factor even more because that also wants to
*write* into that file. Maybe there were good reasons to do it this way
- it is still yucky for my taste tho...
But I digress - whatever you want to do, the right approach is to split
the functionality:
SNP init
legacy SEV init
and to call them from a wrapper function around it which determines
which ones need to get called depending on that delayed probe thing.
Lumping everything together and handing a silly bool downwards is
already turning into a mess.
Now, looking at sev_guest_init() which calls sev_platform_init() and if
you want to pass back'n'forth more information than just that &error
pointer, then you can define your own struct sev_platform_init_info or
so which you preset before calling sev_platform_init() and pass in
a pointer to it.
And in it you can stick &error, bool probe or whatever else you need to
control what the platform needs to do upon init. And if you need to
extend that in the future, you can add new struct members and so on.
HTH.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* [PATCH v10 15/50] crypto: ccp: Provide API to issue SEV and SNP commands
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (13 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 14/50] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-12-06 20:21 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list Michael Roth
` (34 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
Make sev_do_cmd() a generic API interface for the hypervisor
to issue commands to manage SEV and SNP guests. The commands
for SEV and SNP are defined in the SEV and SEV-SNP firmware
specifications.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
drivers/crypto/ccp/sev-dev.c | 3 ++-
include/linux/psp-sev.h | 17 +++++++++++++++++
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index fae1fd45eccd..613b25f81498 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -418,7 +418,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
return ret;
}
-static int sev_do_cmd(int cmd, void *data, int *psp_ret)
+int sev_do_cmd(int cmd, void *data, int *psp_ret)
{
int rc;
@@ -428,6 +428,7 @@ static int sev_do_cmd(int cmd, void *data, int *psp_ret)
return rc;
}
+EXPORT_SYMBOL_GPL(sev_do_cmd);
static int __sev_init_locked(int *error)
{
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index a7f92e74564d..61bb5849ebf2 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -883,6 +883,20 @@ int sev_guest_df_flush(int *error);
*/
int sev_guest_decommission(struct sev_data_decommission *data, int *error);
+/**
+ * sev_do_cmd - perform SEV command
+ *
+ * @error: SEV command return code
+ *
+ * Returns:
+ * 0 if the SEV successfully processed the command
+ * -%ENODEV if the SEV device is not available
+ * -%ENOTSUPP if the SEV does not support SEV
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO if the SEV returned a non-zero return code
+ */
+int sev_do_cmd(int cmd, void *data, int *psp_ret);
+
void *psp_copy_user_blob(u64 uaddr, u32 len);
#else /* !CONFIG_CRYPTO_DEV_SP_PSP */
@@ -898,6 +912,9 @@ sev_guest_deactivate(struct sev_data_deactivate *data, int *error) { return -ENO
static inline int
sev_guest_decommission(struct sev_data_decommission *data, int *error) { return -ENODEV; }
+static inline int
+sev_do_cmd(int cmd, void *data, int *psp_ret) { return -ENODEV; }
+
static inline int
sev_guest_activate(struct sev_data_activate *data, int *error) { return -ENODEV; }
--
2.25.1
* Re: [PATCH v10 15/50] crypto: ccp: Provide API to issue SEV and SNP commands
2023-10-16 13:27 ` [PATCH v10 15/50] crypto: ccp: Provide API to issue SEV and SNP commands Michael Roth
@ 2023-12-06 20:21 ` Borislav Petkov
0 siblings, 0 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-12-06 20:21 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Mon, Oct 16, 2023 at 08:27:44AM -0500, Michael Roth wrote:
> Subject: Re: [PATCH v10 15/50] crypto: ccp: Provide API to issue SEV and SNP commands
"...: Export sev_do_cmd() as a generic API..."
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> Make sev_do_cmd() a generic API interface for the hypervisor
> to issue commands to manage an SEV and SNP guest. The commands
> for SEV and SNP are defined in the SEV and SEV-SNP firmware
> specifications.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
...
> diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
> index a7f92e74564d..61bb5849ebf2 100644
> --- a/include/linux/psp-sev.h
> +++ b/include/linux/psp-sev.h
> @@ -883,6 +883,20 @@ int sev_guest_df_flush(int *error);
> */
> int sev_guest_decommission(struct sev_data_decommission *data, int *error);
>
> +/**
See below for the output of
./scripts/kernel-doc -none include/linux/psp-sev.h
I understand that you want to kernel-doc stuff but you should do it
right.
> + * sev_do_cmd - perform SEV command
"Issue an SEV or an SEV-SNP command"
> + *
> + * @error: SEV command return code
That must be @psp_ret.
And to quote the abovementioned script:
include/linux/psp-sev.h:898: warning: Function parameter or member 'cmd' not described in 'sev_do_cmd'
include/linux/psp-sev.h:898: warning: Function parameter or member 'data' not described in 'sev_do_cmd'
include/linux/psp-sev.h:898: warning: Function parameter or member 'psp_ret' not described in 'sev_do_cmd'
include/linux/psp-sev.h:898: warning: Excess function parameter 'error' description in 'sev_do_cmd'
> + *
> + * Returns:
> + * 0 if the SEV successfully processed the command
"the SEV"?
You mean the "SEV device"?
> + * -%ENODEV if the SEV device is not available
> + * -%ENOTSUPP if the SEV does not support SEV
> + * -%ETIMEDOUT if the SEV command timed out
> + * -%EIO if the SEV returned a non-zero return code
> + */
> +int sev_do_cmd(int cmd, void *data, int *psp_ret);
> +
> void *psp_copy_user_blob(u64 uaddr, u32 len);
>
> #else /* !CONFIG_CRYPTO_DEV_SP_PSP */
> @@ -898,6 +912,9 @@ sev_guest_deactivate(struct sev_data_deactivate *data, int *error) { return -ENO
> static inline int
> sev_guest_decommission(struct sev_data_decommission *data, int *error) { return -ENODEV; }
>
> +static inline int
> +sev_do_cmd(int cmd, void *data, int *psp_ret) { return -ENODEV; }
> +
> static inline int
> sev_guest_activate(struct sev_data_activate *data, int *error) { return -ENODEV; }
>
include/linux/psp-sev.h:20: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* SEV platform state
include/linux/psp-sev.h:31: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* SEV platform and guest management commands
include/linux/psp-sev.h:126: warning: Function parameter or member 'reserved' not described in 'sev_data_init'
include/linux/psp-sev.h:146: warning: Function parameter or member 'reserved' not described in 'sev_data_init_ex'
include/linux/psp-sev.h:175: warning: expecting prototype for struct sev_data_cert_import. Prototype was for struct sev_data_pek_cert_import instead
include/linux/psp-sev.h:212: warning: Function parameter or member 'pdh_cert_address' not described in 'sev_data_pdh_cert_export'
include/linux/psp-sev.h:212: warning: Function parameter or member 'pdh_cert_len' not described in 'sev_data_pdh_cert_export'
include/linux/psp-sev.h:212: warning: Function parameter or member 'reserved' not described in 'sev_data_pdh_cert_export'
include/linux/psp-sev.h:276: warning: Function parameter or member 'reserved' not described in 'sev_data_launch_start'
include/linux/psp-sev.h:290: warning: Function parameter or member 'reserved' not described in 'sev_data_launch_update_data'
include/linux/psp-sev.h:304: warning: Function parameter or member 'reserved' not described in 'sev_data_launch_update_vmsa'
include/linux/psp-sev.h:318: warning: Function parameter or member 'reserved' not described in 'sev_data_launch_measure'
include/linux/psp-sev.h:342: warning: Function parameter or member 'reserved1' not described in 'sev_data_launch_secret'
include/linux/psp-sev.h:342: warning: Function parameter or member 'reserved2' not described in 'sev_data_launch_secret'
include/linux/psp-sev.h:342: warning: Function parameter or member 'reserved3' not described in 'sev_data_launch_secret'
include/linux/psp-sev.h:381: warning: Function parameter or member 'reserved1' not described in 'sev_data_send_start'
include/linux/psp-sev.h:381: warning: Function parameter or member 'reserved2' not described in 'sev_data_send_start'
include/linux/psp-sev.h:381: warning: Function parameter or member 'reserved3' not described in 'sev_data_send_start'
include/linux/psp-sev.h:405: warning: expecting prototype for struct sev_data_send_update. Prototype was for struct sev_data_send_update_data instead
include/linux/psp-sev.h:428: warning: expecting prototype for struct sev_data_send_update. Prototype was for struct sev_data_send_update_vmsa instead
include/linux/psp-sev.h:465: warning: Function parameter or member 'policy' not described in 'sev_data_receive_start'
include/linux/psp-sev.h:465: warning: Function parameter or member 'reserved1' not described in 'sev_data_receive_start'
include/linux/psp-sev.h:489: warning: Function parameter or member 'reserved1' not described in 'sev_data_receive_update_data'
include/linux/psp-sev.h:489: warning: Function parameter or member 'reserved2' not described in 'sev_data_receive_update_data'
include/linux/psp-sev.h:489: warning: Function parameter or member 'reserved3' not described in 'sev_data_receive_update_data'
include/linux/psp-sev.h:513: warning: Function parameter or member 'reserved1' not described in 'sev_data_receive_update_vmsa'
include/linux/psp-sev.h:513: warning: Function parameter or member 'reserved2' not described in 'sev_data_receive_update_vmsa'
include/linux/psp-sev.h:513: warning: Function parameter or member 'reserved3' not described in 'sev_data_receive_update_vmsa'
include/linux/psp-sev.h:538: warning: Function parameter or member 'reserved' not described in 'sev_data_dbg'
include/linux/psp-sev.h:554: warning: Function parameter or member 'reserved' not described in 'sev_data_attestation_report'
include/linux/psp-sev.h:585: warning: Function parameter or member 'gctx_paddr' not described in 'sev_data_snp_addr'
include/linux/psp-sev.h:605: warning: Function parameter or member 'gctx_paddr' not described in 'sev_data_snp_launch_start'
include/linux/psp-sev.h:605: warning: Function parameter or member 'ma_gctx_paddr' not described in 'sev_data_snp_launch_start'
include/linux/psp-sev.h:605: warning: Function parameter or member 'rsvd' not described in 'sev_data_snp_launch_start'
include/linux/psp-sev.h:605: warning: Function parameter or member 'gosvw' not described in 'sev_data_snp_launch_start'
include/linux/psp-sev.h:644: warning: Function parameter or member 'gctx_paddr' not described in 'sev_data_snp_launch_update'
include/linux/psp-sev.h:644: warning: Function parameter or member 'rsvd' not described in 'sev_data_snp_launch_update'
include/linux/psp-sev.h:644: warning: Function parameter or member 'rsvd2' not described in 'sev_data_snp_launch_update'
include/linux/psp-sev.h:644: warning: Function parameter or member 'rsvd3' not described in 'sev_data_snp_launch_update'
include/linux/psp-sev.h:644: warning: Function parameter or member 'rsvd4' not described in 'sev_data_snp_launch_update'
include/linux/psp-sev.h:659: warning: Function parameter or member 'gctx_paddr' not described in 'sev_data_snp_launch_finish'
include/linux/psp-sev.h:659: warning: Function parameter or member 'id_block_paddr' not described in 'sev_data_snp_launch_finish'
include/linux/psp-sev.h:659: warning: Function parameter or member 'id_auth_paddr' not described in 'sev_data_snp_launch_finish'
include/linux/psp-sev.h:659: warning: Function parameter or member 'id_block_en' not described in 'sev_data_snp_launch_finish'
include/linux/psp-sev.h:659: warning: Function parameter or member 'auth_key_en' not described in 'sev_data_snp_launch_finish'
include/linux/psp-sev.h:659: warning: Function parameter or member 'rsvd' not described in 'sev_data_snp_launch_finish'
include/linux/psp-sev.h:659: warning: Function parameter or member 'host_data' not described in 'sev_data_snp_launch_finish'
include/linux/psp-sev.h:705: warning: expecting prototype for struct sev_data_dbg. Prototype was for struct sev_data_snp_dbg instead
include/linux/psp-sev.h:718: warning: expecting prototype for struct sev_snp_guest_request. Prototype was for struct sev_data_snp_guest_request instead
include/linux/psp-sev.h:734: warning: expecting prototype for struct sev_data_snp_init. Prototype was for struct sev_data_snp_init_ex instead
include/linux/psp-sev.h:746: warning: Function parameter or member 'rsvd' not described in 'sev_data_range'
include/linux/psp-sev.h:758: warning: Function parameter or member 'rsvd' not described in 'sev_data_range_list'
include/linux/psp-sev.h:770: warning: Function parameter or member 'rsvd1' not described in 'sev_data_snp_shutdown_ex'
include/linux/psp-sev.h:825: warning: Function parameter or member 'filep' not described in 'sev_issue_cmd_external_user'
include/linux/psp-sev.h:825: warning: Function parameter or member 'id' not described in 'sev_issue_cmd_external_user'
include/linux/psp-sev.h:825: warning: Function parameter or member 'data' not described in 'sev_issue_cmd_external_user'
include/linux/psp-sev.h:840: warning: Function parameter or member 'data' not described in 'sev_guest_deactivate'
include/linux/psp-sev.h:840: warning: Function parameter or member 'error' not described in 'sev_guest_deactivate'
include/linux/psp-sev.h:840: warning: Excess function parameter 'deactivate' description in 'sev_guest_deactivate'
include/linux/psp-sev.h:840: warning: Excess function parameter 'sev_ret' description in 'sev_guest_deactivate'
include/linux/psp-sev.h:855: warning: Function parameter or member 'data' not described in 'sev_guest_activate'
include/linux/psp-sev.h:855: warning: Function parameter or member 'error' not described in 'sev_guest_activate'
include/linux/psp-sev.h:855: warning: Excess function parameter 'activate' description in 'sev_guest_activate'
include/linux/psp-sev.h:855: warning: Excess function parameter 'sev_ret' description in 'sev_guest_activate'
include/linux/psp-sev.h:869: warning: Function parameter or member 'error' not described in 'sev_guest_df_flush'
include/linux/psp-sev.h:869: warning: Excess function parameter 'sev_ret' description in 'sev_guest_df_flush'
include/linux/psp-sev.h:884: warning: Function parameter or member 'data' not described in 'sev_guest_decommission'
include/linux/psp-sev.h:884: warning: Function parameter or member 'error' not described in 'sev_guest_decommission'
include/linux/psp-sev.h:884: warning: Excess function parameter 'decommission' description in 'sev_guest_decommission'
include/linux/psp-sev.h:884: warning: Excess function parameter 'sev_ret' description in 'sev_guest_decommission'
include/linux/psp-sev.h:898: warning: Function parameter or member 'cmd' not described in 'sev_do_cmd'
include/linux/psp-sev.h:898: warning: Function parameter or member 'data' not described in 'sev_do_cmd'
include/linux/psp-sev.h:898: warning: Function parameter or member 'psp_ret' not described in 'sev_do_cmd'
include/linux/psp-sev.h:898: warning: Excess function parameter 'error' description in 'sev_do_cmd'
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (14 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 15/50] crypto: ccp: Provide API to issue SEV and SNP commands Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-12-06 20:42 ` Borislav Petkov
2023-12-07 16:20 ` Vlastimil Babka
2023-10-16 13:27 ` [PATCH v10 17/50] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Michael Roth
` (33 subsequent siblings)
49 siblings, 2 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
From: Ashish Kalra <ashish.kalra@amd.com>
Pages are unsafe to release back to the page allocator if they
have been transitioned to firmware/guest state and can't be reclaimed
or transitioned back to hypervisor/shared state. In this case, add
them to an internal leaked-pages list to ensure that they are not freed
or touched/accessed, which would cause fatal page faults.
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: relocate to arch/x86/coco/sev/host.c]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/sev-host.h | 3 +++
arch/x86/virt/svm/sev.c | 28 ++++++++++++++++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
index 1df989411334..7490a665e78f 100644
--- a/arch/x86/include/asm/sev-host.h
+++ b/arch/x86/include/asm/sev-host.h
@@ -19,6 +19,8 @@ void sev_dump_hva_rmpentry(unsigned long address);
int psmash(u64 pfn);
int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
int rmp_make_shared(u64 pfn, enum pg_level level);
+void snp_leak_pages(u64 pfn, unsigned int npages);
+
#else
static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENXIO; }
static inline void sev_dump_hva_rmpentry(unsigned long address) {}
@@ -29,6 +31,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
return -ENXIO;
}
static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENXIO; }
+static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
#endif
#endif
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index bf9b97046e05..29a69f4b8cfb 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -59,6 +59,12 @@ struct rmpentry {
static struct rmpentry *rmptable_start __ro_after_init;
static u64 rmptable_max_pfn __ro_after_init;
+/* list of pages which are leaked and cannot be reclaimed */
+static LIST_HEAD(snp_leaked_pages_list);
+static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
+
+static atomic_long_t snp_nr_leaked_pages = ATOMIC_LONG_INIT(0);
+
#undef pr_fmt
#define pr_fmt(fmt) "SEV-SNP: " fmt
@@ -518,3 +524,25 @@ int rmp_make_shared(u64 pfn, enum pg_level level)
return rmpupdate(pfn, &val);
}
EXPORT_SYMBOL_GPL(rmp_make_shared);
+
+void snp_leak_pages(u64 pfn, unsigned int npages)
+{
+ struct page *page = pfn_to_page(pfn);
+
+ pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
+
+ spin_lock(&snp_leaked_pages_list_lock);
+ while (npages--) {
+ /*
+ * Reuse the page's buddy list for chaining into the leaked
+ * pages list. This page should not be on a free list currently
+ * and is also unsafe to be added to a free list.
+ */
+ list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
+ sev_dump_rmpentry(pfn);
+ pfn++;
+ }
+ spin_unlock(&snp_leaked_pages_list_lock);
+ atomic_long_inc(&snp_nr_leaked_pages);
+}
+EXPORT_SYMBOL_GPL(snp_leak_pages);
--
2.25.1
* Re: [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list
2023-10-16 13:27 ` [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list Michael Roth
@ 2023-12-06 20:42 ` Borislav Petkov
2023-12-08 20:54 ` Kalra, Ashish
2023-12-07 16:20 ` Vlastimil Babka
1 sibling, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-12-06 20:42 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
On Mon, Oct 16, 2023 at 08:27:45AM -0500, Michael Roth wrote:
> + spin_lock(&snp_leaked_pages_list_lock);
> + while (npages--) {
> + /*
> + * Reuse the page's buddy list for chaining into the leaked
> + * pages list. This page should not be on a free list currently
> + * and is also unsafe to be added to a free list.
> + */
> + list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
> + sev_dump_rmpentry(pfn);
> + pfn++;
> + }
> + spin_unlock(&snp_leaked_pages_list_lock);
> + atomic_long_inc(&snp_nr_leaked_pages);
How is this supposed to count?
You're leaking @npages as the function's parameter but are incrementing
snp_nr_leaked_pages only once?
Just make it a bog-normal unsigned long and increment it inside the
locked section.
Or do at the beginning of the function:
atomic_long_add(npages, &snp_nr_leaked_pages);
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list
2023-12-06 20:42 ` Borislav Petkov
@ 2023-12-08 20:54 ` Kalra, Ashish
0 siblings, 0 replies; 158+ messages in thread
From: Kalra, Ashish @ 2023-12-08 20:54 UTC (permalink / raw)
To: Borislav Petkov, Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang
On 12/6/2023 2:42 PM, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:45AM -0500, Michael Roth wrote:
>> + spin_lock(&snp_leaked_pages_list_lock);
>> + while (npages--) {
>> + /*
>> + * Reuse the page's buddy list for chaining into the leaked
>> + * pages list. This page should not be on a free list currently
>> + * and is also unsafe to be added to a free list.
>> + */
>> + list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
>> + sev_dump_rmpentry(pfn);
>> + pfn++;
>> + }
>> + spin_unlock(&snp_leaked_pages_list_lock);
>> + atomic_long_inc(&snp_nr_leaked_pages);
>
> How is this supposed to count?
>
> You're leaking @npages as the function's parameter but are incrementing
> snp_nr_leaked_pages only once?
>
> Just make it a bog-normal unsigned long and increment it inside the
> locked section.
>
> Or do at the beginning of the function:
>
> atomic_long_add(npages, &snp_nr_leaked_pages);
>
Yes, will fix accordingly by incrementing it inside the locked section.
Thanks,
Ashish
* Re: [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list
2023-10-16 13:27 ` [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list Michael Roth
2023-12-06 20:42 ` Borislav Petkov
@ 2023-12-07 16:20 ` Vlastimil Babka
2023-12-08 22:10 ` Kalra, Ashish
1 sibling, 1 reply; 158+ messages in thread
From: Vlastimil Babka @ 2023-12-07 16:20 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun, jarkko,
ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang
On 10/16/23 15:27, Michael Roth wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> Pages are unsafe to be released back to the page-allocator, if they
> have been transitioned to firmware/guest state and can't be reclaimed
> or transitioned back to hypervisor/shared state. In this case add
> them to an internal leaked pages list to ensure that they are not freed
Note that adding them to the list doesn't ensure anything like that; not dropping
the refcount to zero does. But tracking them might indeed not be bad for,
e.g., crashdump investigations, so no objection there.
> or touched/accessed to cause fatal page faults.
>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> [mdr: relocate to arch/x86/coco/sev/host.c]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> arch/x86/include/asm/sev-host.h | 3 +++
> arch/x86/virt/svm/sev.c | 28 ++++++++++++++++++++++++++++
> 2 files changed, 31 insertions(+)
>
> diff --git a/arch/x86/include/asm/sev-host.h b/arch/x86/include/asm/sev-host.h
> index 1df989411334..7490a665e78f 100644
> --- a/arch/x86/include/asm/sev-host.h
> +++ b/arch/x86/include/asm/sev-host.h
> @@ -19,6 +19,8 @@ void sev_dump_hva_rmpentry(unsigned long address);
> int psmash(u64 pfn);
> int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int asid, bool immutable);
> int rmp_make_shared(u64 pfn, enum pg_level level);
> +void snp_leak_pages(u64 pfn, unsigned int npages);
> +
> #else
> static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENXIO; }
> static inline void sev_dump_hva_rmpentry(unsigned long address) {}
> @@ -29,6 +31,7 @@ static inline int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, int as
> return -ENXIO;
> }
> static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENXIO; }
> +static inline void snp_leak_pages(u64 pfn, unsigned int npages) {}
> #endif
>
> #endif
> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> index bf9b97046e05..29a69f4b8cfb 100644
> --- a/arch/x86/virt/svm/sev.c
> +++ b/arch/x86/virt/svm/sev.c
> @@ -59,6 +59,12 @@ struct rmpentry {
> static struct rmpentry *rmptable_start __ro_after_init;
> static u64 rmptable_max_pfn __ro_after_init;
>
> +/* list of pages which are leaked and cannot be reclaimed */
> +static LIST_HEAD(snp_leaked_pages_list);
> +static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
> +
> +static atomic_long_t snp_nr_leaked_pages = ATOMIC_LONG_INIT(0);
> +
> #undef pr_fmt
> #define pr_fmt(fmt) "SEV-SNP: " fmt
>
> @@ -518,3 +524,25 @@ int rmp_make_shared(u64 pfn, enum pg_level level)
> return rmpupdate(pfn, &val);
> }
> EXPORT_SYMBOL_GPL(rmp_make_shared);
> +
> +void snp_leak_pages(u64 pfn, unsigned int npages)
> +{
> + struct page *page = pfn_to_page(pfn);
> +
> + pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
> +
> + spin_lock(&snp_leaked_pages_list_lock);
> + while (npages--) {
> + /*
> + * Reuse the page's buddy list for chaining into the leaked
> + * pages list. This page should not be on a free list currently
> + * and is also unsafe to be added to a free list.
> + */
> + list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
> + sev_dump_rmpentry(pfn);
> + pfn++;
You increment pfn, but not page, which keeps pointing to the page of the
initial pfn, so you need to do page++ too.
But that assumes these are all order-0 pages (hard for me to tell whether
that's true, as we start with a pfn). If there can be compound pages, it would
be best to add only the head page and skip the tail pages - page->buddy_list
of tail pages is not expected to be used.
> + }
> + spin_unlock(&snp_leaked_pages_list_lock);
> + atomic_long_inc(&snp_nr_leaked_pages);
> +}
> +EXPORT_SYMBOL_GPL(snp_leak_pages);
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list
2023-12-07 16:20 ` Vlastimil Babka
@ 2023-12-08 22:10 ` Kalra, Ashish
2023-12-11 13:08 ` Vlastimil Babka
0 siblings, 1 reply; 158+ messages in thread
From: Kalra, Ashish @ 2023-12-08 22:10 UTC (permalink / raw)
To: Vlastimil Babka, Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun, jarkko,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang
Hello Vlastimil,
On 12/7/2023 10:20 AM, Vlastimil Babka wrote:
>> +
>> +void snp_leak_pages(u64 pfn, unsigned int npages)
>> +{
>> + struct page *page = pfn_to_page(pfn);
>> +
>> + pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn, pfn + npages);
>> +
>> + spin_lock(&snp_leaked_pages_list_lock);
>> + while (npages--) {
>> + /*
>> + * Reuse the page's buddy list for chaining into the leaked
>> + * pages list. This page should not be on a free list currently
>> + * and is also unsafe to be added to a free list.
>> + */
>> + list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
>> + sev_dump_rmpentry(pfn);
>> + pfn++;
>
> You increment pfn, but not page, which is always pointing to the page of the
> initial pfn, so need to do page++ too.
Yes, that is a bug and needs to be fixed.
> But that assumes it's all order-0 pages (hard to tell for me whether that's
> true as we start with a pfn), if there can be compound pages, it would be
> best to only add the head page and skip the tail pages - it's not expected
> to use page->buddy_list of tail pages.
Can't we use PageCompound() to check if the page is a compound page, and
then use page->compound_head to get the head page and add it to the leaked
pages list? I understand the tail pages of compound pages are really limited
in how they can be used.
Thanks,
Ashish
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list
2023-12-08 22:10 ` Kalra, Ashish
@ 2023-12-11 13:08 ` Vlastimil Babka
2023-12-12 23:26 ` Kalra, Ashish
0 siblings, 1 reply; 158+ messages in thread
From: Vlastimil Babka @ 2023-12-11 13:08 UTC (permalink / raw)
To: Kalra, Ashish, Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun, jarkko,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang
On 12/8/23 23:10, Kalra, Ashish wrote:
> Hello Vlastimil,
>
> On 12/7/2023 10:20 AM, Vlastimil Babka wrote:
>
>>> +
>>> +void snp_leak_pages(u64 pfn, unsigned int npages)
>>> +{
>>> + struct page *page = pfn_to_page(pfn);
>>> +
>>> + pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn,
>>> pfn + npages);
>>> +
>>> + spin_lock(&snp_leaked_pages_list_lock);
>>> + while (npages--) {
>>> + /*
>>> + * Reuse the page's buddy list for chaining into the leaked
>>> + * pages list. This page should not be on a free list currently
>>> + * and is also unsafe to be added to a free list.
>>> + */
>>> + list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
>>> + sev_dump_rmpentry(pfn);
>>> + pfn++;
>>
>> You increment pfn, but not page, which is always pointing to the page
>> of the
>> initial pfn, so need to do page++ too.
>
> Yes, that is a bug and needs to be fixed.
>
>> But that assumes it's all order-0 pages (hard to tell for me whether
>> that's
>> true as we start with a pfn), if there can be compound pages, it would be
>> best to only add the head page and skip the tail pages - it's not
>> expected
>> to use page->buddy_list of tail pages.
>
> Can't we use PageCompound() to check if the page is a compound page and
> then use page->compound_head to get and add the head page to leaked
> pages list. I understand the tail pages for compound pages are really
> limited for usage.
Yeah, that should work. Need to be careful though: we should probably only
process head pages and check whether the whole compound_order() is within
the range we are to leak, then leak the head page and advance the loop by
compound_order(). And if we encounter a tail page, it should probably just
be skipped. I'm looking at snp_reclaim_pages(), which seems to process a
number of pages with SEV_CMD_SNP_PAGE_RECLAIM and, once any fails, calls
snp_leak_pages() on the rest. Could that invoke snp_leak_pages() with the
first pfn being a tail page?
> Thanks,
> Ashish
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list
2023-12-11 13:08 ` Vlastimil Babka
@ 2023-12-12 23:26 ` Kalra, Ashish
0 siblings, 0 replies; 158+ messages in thread
From: Kalra, Ashish @ 2023-12-12 23:26 UTC (permalink / raw)
To: Vlastimil Babka, Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun, jarkko,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang
Hello Vlastimil,
On 12/11/2023 7:08 AM, Vlastimil Babka wrote:
>
>
> On 12/8/23 23:10, Kalra, Ashish wrote:
>> Hello Vlastimil,
>>
>> On 12/7/2023 10:20 AM, Vlastimil Babka wrote:
>>
>>>> +
>>>> +void snp_leak_pages(u64 pfn, unsigned int npages)
>>>> +{
>>>> + struct page *page = pfn_to_page(pfn);
>>>> +
>>>> + pr_debug("%s: leaking PFN range 0x%llx-0x%llx\n", __func__, pfn,
>>>> pfn + npages);
>>>> +
>>>> + spin_lock(&snp_leaked_pages_list_lock);
>>>> + while (npages--) {
>>>> + /*
>>>> + * Reuse the page's buddy list for chaining into the leaked
>>>> + * pages list. This page should not be on a free list currently
>>>> + * and is also unsafe to be added to a free list.
>>>> + */
>>>> + list_add_tail(&page->buddy_list, &snp_leaked_pages_list);
>>>> + sev_dump_rmpentry(pfn);
>>>> + pfn++;
>>>
>>> You increment pfn, but not page, which is always pointing to the page
>>> of the
>>> initial pfn, so need to do page++ too.
>>
>> Yes, that is a bug and needs to be fixed.
>>
>>> But that assumes it's all order-0 pages (hard to tell for me whether
>>> that's
>>> true as we start with a pfn), if there can be compound pages, it would be
>>> best to only add the head page and skip the tail pages - it's not
>>> expected
>>> to use page->buddy_list of tail pages.
>>
>> Can't we use PageCompound() to check if the page is a compound page and
>> then use page->compound_head to get and add the head page to leaked
>> pages list. I understand the tail pages for compound pages are really
>> limited for usage.
>
> Yeah that should work. Need to be careful though, should probably only
> process head pages and check if the whole compound_order() is within the
> range we are to leak, and then leak the head page and advance the loop
> by compound_order(). And if we encounter a tail page, it should probably
> be just skipped. I'm looking at snp_reclaim_pages() which seems to
> process a number of pages with SEV_CMD_SNP_PAGE_RECLAIM and once any
> fails, call snp_leak_pages() on the rest. Could that invoke
> snp_leak_pages with the first pfn being a tail page?
Yes, I don't think we can assume that the first pfn will not be a tail
page. But then this becomes complex, as we might have already reclaimed
the head page and one or more tail pages successfully, or perhaps never
transitioned the head page to FW state at all, since
alloc_page()/alloc_pages() would have returned subpage(s) of a largepage.
But then we really can't use the buddy_list of a tail page to insert it
into the snp leaked pages list, right?
These non-reclaimed pages are not usable anymore anyway - any access to
them will cause a fatal RMP #PF - so I don't know if I can use the
buddy_list to insert tail pages, as that would corrupt the page metadata.
We initially used to invoke memory_failure() here to try to gracefully
handle failure of these non-reclaimed pages, and that used to handle
hugepages, etc., but as pointed out in previous review feedback, that is
not a logical approach here, as it is meant more for the RAS stuff.
Maybe a simpler approach is to have our own container object on top, with
the page pointer and a list_head in it, and use that list_head to insert
into the snp leaked pages list instead of reusing buddy_list for chaining
into the leaked pages list?
Thanks,
Ashish
^ permalink raw reply [flat|nested] 158+ messages in thread
* [PATCH v10 17/50] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (15 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 16/50] x86/sev: Introduce snp leaked pages list Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-12-08 13:05 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 18/50] crypto: ccp: Handle the legacy SEV command " Michael Roth
` (32 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
The behavior and requirement for the SEV-legacy command is altered when
the SNP firmware is in the INIT state. See SEV-SNP firmware specification
for more details.
Allocate the Trusted Memory Region (TMR) as a 2mb sized/aligned region
when SNP is enabled to satisfy new requirements for the SNP. Continue
allocating a 1mb region for !SNP configuration.
While at it, provide API that can be used by others to allocate a page
that can be used by the firmware. The immediate user for this API will
be the KVM driver. The KVM driver to need to allocate a firmware context
page during the guest creation. The context page need to be updated
by the firmware. See the SEV-SNP specification for further details.
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: use struct sev_data_snp_page_reclaim instead of passing paddr
directly to SEV_CMD_SNP_PAGE_RECLAIM]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
drivers/crypto/ccp/sev-dev.c | 151 ++++++++++++++++++++++++++++++++---
include/linux/psp-sev.h | 9 +++
2 files changed, 151 insertions(+), 9 deletions(-)
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 613b25f81498..ea21307a2b34 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -30,6 +30,7 @@
#include <asm/smp.h>
#include <asm/cacheflush.h>
#include <asm/e820/types.h>
+#include <asm/sev-host.h>
#include "psp-dev.h"
#include "sev-dev.h"
@@ -93,6 +94,13 @@ static void *sev_init_ex_buffer;
struct sev_data_range_list *snp_range_list;
static int __sev_snp_init_locked(int *error);
+/* When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB size. */
+#define SEV_SNP_ES_TMR_SIZE (2 * 1024 * 1024)
+
+static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
+
+static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
+
static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
{
struct sev_device *sev = psp_master->sev_data;
@@ -193,11 +201,131 @@ static int sev_cmd_buffer_len(int cmd)
return 0;
}
+static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
+{
+ /* Cbit maybe set in the paddr */
+ unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+ int ret, err, i, n = 0;
+
+ for (i = 0; i < npages; i++, pfn++, n++) {
+ struct sev_data_snp_page_reclaim data = {0};
+
+ data.paddr = pfn << PAGE_SHIFT;
+
+ if (locked)
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+ else
+ ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+
+ if (ret)
+ goto cleanup;
+
+ ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+ if (ret)
+ goto cleanup;
+ }
+
+ return 0;
+
+cleanup:
+ /*
+ * If failed to reclaim the page then page is no longer safe to
+ * be release back to the system, leak it.
+ */
+ snp_leak_pages(pfn, npages - n);
+ return ret;
+}
+
+static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, bool locked)
+{
+ /* Cbit maybe set in the paddr */
+ unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+ int rc, n = 0, i;
+
+ for (i = 0; i < npages; i++, n++, pfn++) {
+ rc = rmp_make_private(pfn, 0, PG_LEVEL_4K, 0, true);
+ if (rc)
+ goto cleanup;
+ }
+
+ return 0;
+
+cleanup:
+ /*
+ * Try unrolling the firmware state changes by
+ * reclaiming the pages which were already changed to the
+ * firmware state.
+ */
+ snp_reclaim_pages(paddr, n, locked);
+
+ return rc;
+}
+
+static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
+{
+ unsigned long npages = 1ul << order, paddr;
+ struct sev_device *sev;
+ struct page *page;
+
+ if (!psp_master || !psp_master->sev_data)
+ return NULL;
+
+ page = alloc_pages(gfp_mask, order);
+ if (!page)
+ return NULL;
+
+ /* If SEV-SNP is initialized then add the page in RMP table. */
+ sev = psp_master->sev_data;
+ if (!sev->snp_initialized)
+ return page;
+
+ paddr = __pa((unsigned long)page_address(page));
+ if (rmp_mark_pages_firmware(paddr, npages, locked))
+ return NULL;
+
+ return page;
+}
+
+void *snp_alloc_firmware_page(gfp_t gfp_mask)
+{
+ struct page *page;
+
+ page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
+
+ return page ? page_address(page) : NULL;
+}
+EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
+
+static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ unsigned long paddr, npages = 1ul << order;
+
+ if (!page)
+ return;
+
+ paddr = __pa((unsigned long)page_address(page));
+ if (sev->snp_initialized &&
+ snp_reclaim_pages(paddr, npages, locked))
+ return;
+
+ __free_pages(page, order);
+}
+
+void snp_free_firmware_page(void *addr)
+{
+ if (!addr)
+ return;
+
+ __snp_free_firmware_pages(virt_to_page(addr), 0, false);
+}
+EXPORT_SYMBOL_GPL(snp_free_firmware_page);
+
static void *sev_fw_alloc(unsigned long len)
{
struct page *page;
- page = alloc_pages(GFP_KERNEL, get_order(len));
+ page = __snp_alloc_firmware_pages(GFP_KERNEL, get_order(len), false);
if (!page)
return NULL;
@@ -443,7 +571,7 @@ static int __sev_init_locked(int *error)
data.tmr_address = __pa(sev_es_tmr);
data.flags |= SEV_INIT_FLAGS_SEV_ES;
- data.tmr_len = SEV_ES_TMR_SIZE;
+ data.tmr_len = sev_es_tmr_size;
}
return __sev_do_cmd_locked(SEV_CMD_INIT, &data, error);
@@ -466,7 +594,7 @@ static int __sev_init_ex_locked(int *error)
data.tmr_address = __pa(sev_es_tmr);
data.flags |= SEV_INIT_FLAGS_SEV_ES;
- data.tmr_len = SEV_ES_TMR_SIZE;
+ data.tmr_len = sev_es_tmr_size;
}
return __sev_do_cmd_locked(SEV_CMD_INIT_EX, &data, error);
@@ -513,14 +641,16 @@ static int ___sev_platform_init_locked(int *error, bool probe)
if (!sev_es_tmr) {
/* Obtain the TMR memory area for SEV-ES use */
- sev_es_tmr = sev_fw_alloc(SEV_ES_TMR_SIZE);
- if (sev_es_tmr)
+ sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
+ if (sev_es_tmr) {
/* Must flush the cache before giving it to the firmware */
- clflush_cache_range(sev_es_tmr, SEV_ES_TMR_SIZE);
- else
+ if (!sev->snp_initialized)
+ clflush_cache_range(sev_es_tmr, sev_es_tmr_size);
+ } else {
dev_warn(sev->dev,
"SEV: TMR allocation failed, SEV-ES support unavailable\n");
}
+ }
if (sev_init_ex_buffer) {
rc = sev_read_init_ex_file();
@@ -1030,6 +1160,8 @@ static int __sev_snp_init_locked(int *error)
sev->snp_initialized = true;
dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
+ sev_es_tmr_size = SEV_SNP_ES_TMR_SIZE;
+
return rc;
}
@@ -1536,8 +1668,9 @@ static void sev_firmware_shutdown(struct sev_device *sev)
/* The TMR area was encrypted, flush it from the cache */
wbinvd_on_all_cpus();
- free_pages((unsigned long)sev_es_tmr,
- get_order(SEV_ES_TMR_SIZE));
+ __snp_free_firmware_pages(virt_to_page(sev_es_tmr),
+ get_order(sev_es_tmr_size),
+ false);
sev_es_tmr = NULL;
}
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 61bb5849ebf2..9342cee1a1e6 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -898,6 +898,8 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
int sev_do_cmd(int cmd, void *data, int *psp_ret);
void *psp_copy_user_blob(u64 uaddr, u32 len);
+void *snp_alloc_firmware_page(gfp_t mask);
+void snp_free_firmware_page(void *addr);
#else /* !CONFIG_CRYPTO_DEV_SP_PSP */
@@ -925,6 +927,13 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int
static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }
+static inline void *snp_alloc_firmware_page(gfp_t mask)
+{
+ return NULL;
+}
+
+static inline void snp_free_firmware_page(void *addr) { }
+
#endif /* CONFIG_CRYPTO_DEV_SP_PSP */
#endif /* __PSP_SEV_H__ */
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 17/50] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
2023-10-16 13:27 ` [PATCH v10 17/50] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Michael Roth
@ 2023-12-08 13:05 ` Borislav Petkov
2023-12-19 23:46 ` Michael Roth
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-12-08 13:05 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Mon, Oct 16, 2023 at 08:27:46AM -0500, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> The behavior and requirement for the SEV-legacy command is altered when
> the SNP firmware is in the INIT state. See SEV-SNP firmware specification
> for more details.
>
> Allocate the Trusted Memory Region (TMR) as a 2mb sized/aligned region
> when SNP is enabled to satisfy new requirements for the SNP. Continue
s/the //
> allocating a 1mb region for !SNP configuration.
>
> While at it, provide API that can be used by others to allocate a page
"...an API... ... to allocate a firmware page."
Simple.
> that can be used by the firmware.
> The immediate user for this API will be the KVM driver.
Delete that sentence.
> The KVM driver to need to allocate a firmware context
"The KVM driver needs to allocate ...
> page during the guest creation. The context page need to be updated
"needs"
> by the firmware. See the SEV-SNP specification for further details.
>
> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> [mdr: use struct sev_data_snp_page_reclaim instead of passing paddr
> directly to SEV_CMD_SNP_PAGE_RECLAIM]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> drivers/crypto/ccp/sev-dev.c | 151 ++++++++++++++++++++++++++++++++---
> include/linux/psp-sev.h | 9 +++
> 2 files changed, 151 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 613b25f81498..ea21307a2b34 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -30,6 +30,7 @@
> #include <asm/smp.h>
> #include <asm/cacheflush.h>
> #include <asm/e820/types.h>
> +#include <asm/sev-host.h>
>
> #include "psp-dev.h"
> #include "sev-dev.h"
> @@ -93,6 +94,13 @@ static void *sev_init_ex_buffer;
> struct sev_data_range_list *snp_range_list;
> static int __sev_snp_init_locked(int *error);
>
> +/* When SEV-SNP is enabled the TMR needs to be 2MB aligned and 2MB size. */
> +#define SEV_SNP_ES_TMR_SIZE (2 * 1024 * 1024)
There's "SEV", "SNP" *and* "ES". Wow.
Let's do this:
#define SEV_TMR_SIZE SZ_1M
#define SNP_TMR_SIZE SZ_2M
Done.
> +static size_t sev_es_tmr_size = SEV_ES_TMR_SIZE;
> +
> +static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret);
Instead of doing forward declarations, move the whole logic around
__sev_do_cmd_locked() up here in the file so that you can call that
function by other functions without forward declarations.
The move should probably be a pre-patch.
> static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
> {
> struct sev_device *sev = psp_master->sev_data;
> @@ -193,11 +201,131 @@ static int sev_cmd_buffer_len(int cmd)
> return 0;
> }
>
> +static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
> +{
> + /* Cbit maybe set in the paddr */
> + unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
> + int ret, err, i, n = 0;
> +
> + for (i = 0; i < npages; i++, pfn++, n++) {
> + struct sev_data_snp_page_reclaim data = {0};
> +
> + data.paddr = pfn << PAGE_SHIFT;
This shifting back'n'forth between paddr and pfn makes this function
hard to read. Let's use only paddr (diff ontop):
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index ea21307a2b34..25078b0253bd 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -203,14 +203,15 @@ static int sev_cmd_buffer_len(int cmd)
static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool locked)
{
- /* Cbit maybe set in the paddr */
- unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
int ret, err, i, n = 0;
- for (i = 0; i < npages; i++, pfn++, n++) {
+ /* C-bit maybe set, clear it: */
+ paddr = __sme_clr(paddr);
+
+ for (i = 0; i < npages; i++, paddr += PAGE_SIZE, n++) {
struct sev_data_snp_page_reclaim data = {0};
- data.paddr = pfn << PAGE_SHIFT;
+ data.paddr = paddr;
if (locked)
ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
@@ -220,7 +221,7 @@ static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool lock
if (ret)
goto cleanup;
- ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+ ret = rmp_make_shared(__phys_to_pfn(paddr), PG_LEVEL_4K);
if (ret)
goto cleanup;
}
@@ -232,7 +233,7 @@ static int snp_reclaim_pages(unsigned long paddr, unsigned int npages, bool lock
* If failed to reclaim the page then page is no longer safe to
* be release back to the system, leak it.
*/
- snp_leak_pages(pfn, npages - n);
+ snp_leak_pages(__phys_to_pfn(paddr), npages - n);
return ret;
}
> +
> + if (locked)
> + ret = __sev_do_cmd_locked(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> + else
> + ret = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
> +
> + if (ret)
> + goto cleanup;
> +
> + ret = rmp_make_shared(pfn, PG_LEVEL_4K);
> + if (ret)
> + goto cleanup;
> + }
> +
> + return 0;
> +
> +cleanup:
> + /*
> + * If failed to reclaim the page then page is no longer safe to
> + * be release back to the system, leak it.
"released"
> + */
> + snp_leak_pages(pfn, npages - n);
> + return ret;
> +}
> +
> +static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, bool locked)
> +{
> + /* Cbit maybe set in the paddr */
> + unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
> + int rc, n = 0, i;
That n looks like it can be replaced by i.
> +
> + for (i = 0; i < npages; i++, n++, pfn++) {
> + rc = rmp_make_private(pfn, 0, PG_LEVEL_4K, 0, true);
> + if (rc)
> + goto cleanup;
> + }
> +
> + return 0;
> +
> +cleanup:
> + /*
> + * Try unrolling the firmware state changes by
> + * reclaiming the pages which were already changed to the
> + * firmware state.
> + */
> + snp_reclaim_pages(paddr, n, locked);
> +
> + return rc;
> +}
> +
> +static struct page *__snp_alloc_firmware_pages(gfp_t gfp_mask, int order, bool locked)
AFAICT, @locked is always false. So it can go.
> +{
> + unsigned long npages = 1ul << order, paddr;
> + struct sev_device *sev;
> + struct page *page;
> +
> + if (!psp_master || !psp_master->sev_data)
> + return NULL;
> +
> + page = alloc_pages(gfp_mask, order);
> + if (!page)
> + return NULL;
> +
> + /* If SEV-SNP is initialized then add the page in RMP table. */
> + sev = psp_master->sev_data;
> + if (!sev->snp_initialized)
> + return page;
> +
> + paddr = __pa((unsigned long)page_address(page));
> + if (rmp_mark_pages_firmware(paddr, npages, locked))
> + return NULL;
> +
> + return page;
> +}
> +
> +void *snp_alloc_firmware_page(gfp_t gfp_mask)
> +{
> + struct page *page;
> +
> + page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
> +
> + return page ? page_address(page) : NULL;
> +}
> +EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
> +
> +static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
This @locked too is always false. It becomes true later in
Subject: [PATCH v10 50/50] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
which talks about some panic notifier running in atomic context. But
then you can't take locks in atomic context.
Looks like this whole dance around the locked thing needs a cleanup.
...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 17/50] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
2023-12-08 13:05 ` Borislav Petkov
@ 2023-12-19 23:46 ` Michael Roth
0 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-12-19 23:46 UTC (permalink / raw)
To: Borislav Petkov
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang, Brijesh Singh
On Fri, Dec 08, 2023 at 02:05:20PM +0100, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:46AM -0500, Michael Roth wrote:
> > From: Brijesh Singh <brijesh.singh@amd.com>
> >
> > The behavior and requirement for the SEV-legacy command is altered when
> > the SNP firmware is in the INIT state. See SEV-SNP firmware specification
> > for more details.
> >
> > Allocate the Trusted Memory Region (TMR) as a 2mb sized/aligned region
> > when SNP is enabled to satisfy new requirements for the SNP. Continue
>
> s/the //
>
> > allocating a 1mb region for !SNP configuration.
> >
> > While at it, provide API that can be used by others to allocate a page
>
> "...an API... ... to allocate a firmware page."
>
> Simple.
>
> > that can be used by the firmware.
>
> > The immediate user for this API will be the KVM driver.
>
> Delete that sentence.
>
> > The KVM driver to need to allocate a firmware context
>
> "The KVM driver needs to allocate ...
>
> > page during the guest creation. The context page need to be updated
>
> "needs"
>
> > by the firmware. See the SEV-SNP specification for further details.
> >
> > Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > [mdr: use struct sev_data_snp_page_reclaim instead of passing paddr
> > directly to SEV_CMD_SNP_PAGE_RECLAIM]
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> > drivers/crypto/ccp/sev-dev.c | 151 ++++++++++++++++++++++++++++++++---
> > include/linux/psp-sev.h | 9 +++
> > 2 files changed, 151 insertions(+), 9 deletions(-)
> >
> > +static int rmp_mark_pages_firmware(unsigned long paddr, unsigned int npages, bool locked)
> > +{
> > + /* Cbit maybe set in the paddr */
> > + unsigned long pfn = __sme_clr(paddr) >> PAGE_SHIFT;
> > + int rc, n = 0, i;
>
> That n looks like it can be replaced by i.
Indeed, and for snp_reclaim_pages() too by the looks of it. Will fix that up,
along with all the other suggestions.
> > +
> > +void *snp_alloc_firmware_page(gfp_t gfp_mask)
> > +{
> > + struct page *page;
> > +
> > + page = __snp_alloc_firmware_pages(gfp_mask, 0, false);
> > +
> > + return page ? page_address(page) : NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(snp_alloc_firmware_page);
> > +
> > +static void __snp_free_firmware_pages(struct page *page, int order, bool locked)
>
> This @locked too is always false. It becomes true later in
>
> Subject: [PATCH v10 50/50] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
>
> which talks about some panic notifier running in atomic context. But
> then you can't take locks in atomic context.
In that case, the lock isn't actually taken. locked==true is basically
used to tell the code not to try to acquire the lock; the caller is
relying on the fact that all the other CPUs are stopped at that point,
so there's no need to protect against multiple concurrent firmware
commands being issued.
>
> Looks like this whole dance around the locked thing needs a cleanup.
There's another case that will be introduced in the next version of this
series (likely right after this patch) to handle a bug where the buffer used
to access INIT_EX non-volatile data needs to be transitioned to
firmware-owned beforehand. In that case, the CCP cleanup path introduces
another caller of __snp_free_firmware_pages() where locked==true. Maybe this
can be revisited in that context.
Thanks,
Mike
>
> ...
>
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 158+ messages in thread
* [PATCH v10 18/50] crypto: ccp: Handle the legacy SEV command when SNP is enabled
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (16 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 17/50] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-12-09 15:36 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 19/50] crypto: ccp: Add the SNP_PLATFORM_STATUS command Michael Roth
` (31 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
The behavior of the SEV-legacy commands is altered when the SNP firmware
is in the INIT state. When SNP is in INIT state, all the SEV-legacy
commands that cause the firmware to write to memory must be in the
firmware state before issuing the command..
A command buffer may contains a system physical address that the firmware
may write to. There are two cases that need to be handled:
1) system physical address points to a guest memory
2) system physical address points to a host memory
To handle the case #1, change the page state to the firmware in the RMP
table before issuing the command and restore the state to shared after the
command completes.
For the case #2, use a bounce buffer to complete the request.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
drivers/crypto/ccp/sev-dev.c | 346 ++++++++++++++++++++++++++++++++++-
drivers/crypto/ccp/sev-dev.h | 12 ++
2 files changed, 348 insertions(+), 10 deletions(-)
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index ea21307a2b34..b574b0ef2b1f 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -462,12 +462,295 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
return sev_write_init_ex_file();
}
+static int alloc_snp_host_map(struct sev_device *sev)
+{
+ struct page *page;
+ int i;
+
+ for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+ struct snp_host_map *map = &sev->snp_host_map[i];
+
+ memset(map, 0, sizeof(*map));
+
+ page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
+ if (!page)
+ return -ENOMEM;
+
+ map->host = page_address(page);
+ }
+
+ return 0;
+}
+
+static void free_snp_host_map(struct sev_device *sev)
+{
+ int i;
+
+ for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
+ struct snp_host_map *map = &sev->snp_host_map[i];
+
+ if (map->host) {
+ __free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
+ memset(map, 0, sizeof(*map));
+ }
+ }
+}
+
+static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+ unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+ map->active = false;
+
+ if (!paddr || !len)
+ return 0;
+
+ map->paddr = *paddr;
+ map->len = len;
+
+ /* If paddr points to a guest memory then change the page state to firmwware. */
+ if (guest) {
+ if (rmp_mark_pages_firmware(*paddr, npages, true))
+ return -EFAULT;
+
+ goto done;
+ }
+
+ if (!map->host)
+ return -ENOMEM;
+
+ /* Check if the pre-allocated buffer can be used to fullfil the request. */
+ if (len > SEV_FW_BLOB_MAX_SIZE)
+ return -EINVAL;
+
+ /* Transition the pre-allocated buffer to the firmware state. */
+ if (rmp_mark_pages_firmware(__pa(map->host), npages, true))
+ return -EFAULT;
+
+ /* Set the paddr to use pre-allocated firmware buffer */
+ *paddr = __psp_pa(map->host);
+
+done:
+ map->active = true;
+ return 0;
+}
+
+static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
+{
+ unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
+
+ if (!map->active)
+ return 0;
+
+ /* If paddr points to a guest memory then restore the page state to hypervisor. */
+ if (guest) {
+ if (snp_reclaim_pages(*paddr, npages, true))
+ return -EFAULT;
+
+ goto done;
+ }
+
+ /*
+ * Transition the pre-allocated buffer to hypervisor state before the access.
+ *
+ * This is because while changing the page state to firmware, the kernel unmaps
+ * the pages from the direct map, and to restore the direct map the pages must
+ * be transitioned back to the shared state.
+ */
+ if (snp_reclaim_pages(__pa(map->host), npages, true))
+ return -EFAULT;
+
+ /* Copy the response data firmware buffer to the callers buffer. */
+ memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
+ *paddr = map->paddr;
+
+done:
+ map->active = false;
+ return 0;
+}
+
+static bool sev_legacy_cmd_buf_writable(int cmd)
+{
+ switch (cmd) {
+ case SEV_CMD_PLATFORM_STATUS:
+ case SEV_CMD_GUEST_STATUS:
+ case SEV_CMD_LAUNCH_START:
+ case SEV_CMD_RECEIVE_START:
+ case SEV_CMD_LAUNCH_MEASURE:
+ case SEV_CMD_SEND_START:
+ case SEV_CMD_SEND_UPDATE_DATA:
+ case SEV_CMD_SEND_UPDATE_VMSA:
+ case SEV_CMD_PEK_CSR:
+ case SEV_CMD_PDH_CERT_EXPORT:
+ case SEV_CMD_GET_ID:
+ case SEV_CMD_ATTESTATION_REPORT:
+ return true;
+ default:
+ return false;
+ }
+}
+
+#define prep_buffer(name, addr, len, guest, map) \
+ func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
+
+static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
+{
+ int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
+ struct sev_device *sev = psp_master->sev_data;
+ bool from_fw = !to_fw;
+
+ /*
+ * After the command is completed, change the command buffer memory to
+ * hypervisor state.
+ *
+ * The immutable bit is automatically cleared by the firmware, so
+ * no not need to reclaim the page.
+ */
+ if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
+ if (snp_reclaim_pages(__pa(cmd_buf), 1, true))
+ return -EFAULT;
+
+ /* No need to go further if firmware failed to execute command. */
+ if (fw_err)
+ return 0;
+ }
+
+ if (to_fw)
+ func = map_firmware_writeable;
+ else
+ func = unmap_firmware_writeable;
+
+ /*
+ * A command buffer may contains a system physical address. If the address
+ * points to a host memory then use an intermediate firmware page otherwise
+ * change the page state in the RMP table.
+ */
+ switch (cmd) {
+ case SEV_CMD_PDH_CERT_EXPORT:
+ if (prep_buffer(struct sev_data_pdh_cert_export, pdh_cert_address,
+ pdh_cert_len, false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_pdh_cert_export, cert_chain_address,
+ cert_chain_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_GET_ID:
+ if (prep_buffer(struct sev_data_get_id, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_PEK_CSR:
+ if (prep_buffer(struct sev_data_pek_csr, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_launch_update_data, address, len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_launch_update_vmsa, address, len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_MEASURE:
+ if (prep_buffer(struct sev_data_launch_measure, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_LAUNCH_UPDATE_SECRET:
+ if (prep_buffer(struct sev_data_launch_secret, guest_address, guest_len,
+ true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_DBG_DECRYPT:
+ if (prep_buffer(struct sev_data_dbg, dst_addr, len, false,
+ &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_DBG_ENCRYPT:
+ if (prep_buffer(struct sev_data_dbg, dst_addr, len, true,
+ &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_ATTESTATION_REPORT:
+ if (prep_buffer(struct sev_data_attestation_report, address, len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_START:
+ if (prep_buffer(struct sev_data_send_start, session_address,
+ session_len, false, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_send_update_data, hdr_address, hdr_len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_send_update_data, trans_address,
+ trans_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_SEND_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_send_update_vmsa, hdr_address, hdr_len,
+ false, &sev->snp_host_map[0]))
+ goto err;
+ if (prep_buffer(struct sev_data_send_update_vmsa, trans_address,
+ trans_len, false, &sev->snp_host_map[1]))
+ goto err;
+ break;
+ case SEV_CMD_RECEIVE_UPDATE_DATA:
+ if (prep_buffer(struct sev_data_receive_update_data, guest_address,
+ guest_len, true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ case SEV_CMD_RECEIVE_UPDATE_VMSA:
+ if (prep_buffer(struct sev_data_receive_update_vmsa, guest_address,
+ guest_len, true, &sev->snp_host_map[0]))
+ goto err;
+ break;
+ default:
+ break;
+ }
+
+ /* The command buffer need to be in the firmware state. */
+ if (to_fw && sev_legacy_cmd_buf_writable(cmd)) {
+ if (rmp_mark_pages_firmware(__pa(cmd_buf), 1, true))
+ return -EFAULT;
+ }
+
+ return 0;
+
+err:
+ return -EINVAL;
+}
+
+static inline bool need_firmware_copy(int cmd)
+{
+ struct sev_device *sev = psp_master->sev_data;
+
+ /* After SNP is INIT'ed, the behavior of legacy SEV command is changed. */
+ return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_initialized) ? true : false;
+}
+
+static int snp_aware_copy_to_firmware(int cmd, void *data)
+{
+ return __snp_cmd_buf_copy(cmd, data, true, 0);
+}
+
+static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
+{
+ return __snp_cmd_buf_copy(cmd, data, false, fw_err);
+}
+
static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
{
struct psp_device *psp = psp_master;
struct sev_device *sev;
unsigned int phys_lsb, phys_msb;
unsigned int reg, ret = 0;
+ void *cmd_buf;
int buf_len;
if (!psp || !psp->sev_data)
@@ -487,12 +770,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
* work for some memory, e.g. vmalloc'd addresses, and @data may not be
* physically contiguous.
*/
- if (data)
- memcpy(sev->cmd_buf, data, buf_len);
+ if (data) {
+ if (sev->cmd_buf_active > 2)
+ return -EBUSY;
+
+ cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
+
+ memcpy(cmd_buf, data, buf_len);
+ sev->cmd_buf_active++;
+
+ /*
+ * The behavior of the SEV-legacy commands is altered when the
+ * SNP firmware is in the INIT state.
+ */
+ if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, cmd_buf))
+ return -EFAULT;
+ } else {
+ cmd_buf = sev->cmd_buf;
+ }
/* Get the physical address of the command buffer */
- phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
- phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
+ phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
+ phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
cmd, phys_msb, phys_lsb, psp_timeout);
@@ -533,15 +832,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
ret = sev_write_init_ex_file_if_required(cmd);
}
- print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
- buf_len, false);
-
/*
* Copy potential output from the PSP back to data. Do this even on
* failure in case the caller wants to glean something from the error.
*/
- if (data)
- memcpy(data, sev->cmd_buf, buf_len);
+ if (data) {
+ /*
+ * Restore the page state after the command completes.
+ */
+ if (need_firmware_copy(cmd) &&
+ snp_aware_copy_from_firmware(cmd, cmd_buf, ret))
+ return -EFAULT;
+
+ memcpy(data, cmd_buf, buf_len);
+ sev->cmd_buf_active--;
+ }
+
+ print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
+ buf_len, false);
return ret;
}
@@ -639,6 +947,14 @@ static int ___sev_platform_init_locked(int *error, bool probe)
if (probe && !psp_init_on_probe)
return 0;
+ /*
+ * Allocate the intermediate buffers used for the legacy command handling.
+ */
+ if (rc != -ENODEV && alloc_snp_host_map(sev)) {
+ dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
+ goto skip_legacy;
+ }
+
if (!sev_es_tmr) {
/* Obtain the TMR memory area for SEV-ES use */
sev_es_tmr = sev_fw_alloc(sev_es_tmr_size);
@@ -691,6 +1007,7 @@ static int ___sev_platform_init_locked(int *error, bool probe)
dev_info(sev->dev, "SEV API:%d.%d build:%d\n", sev->api_major,
sev->api_minor, sev->build);
+skip_legacy:
return 0;
}
@@ -1616,10 +1933,12 @@ int sev_dev_init(struct psp_device *psp)
if (!sev)
goto e_err;
- sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 0);
+ sev->cmd_buf = (void *)devm_get_free_pages(dev, GFP_KERNEL, 1);
if (!sev->cmd_buf)
goto e_sev;
+ sev->cmd_buf_backup = (uint8_t *)sev->cmd_buf + PAGE_SIZE;
+
psp->sev_data = sev;
sev->dev = dev;
@@ -1685,6 +2004,12 @@ static void sev_firmware_shutdown(struct sev_device *sev)
snp_range_list = NULL;
}
+ /*
+ * The host map need to clear the immutable bit so it must be free'd before the
+ * SNP firmware shutdown.
+ */
+ free_snp_host_map(sev);
+
sev_snp_shutdown(&error);
}
@@ -1753,6 +2078,7 @@ void sev_pci_init(void)
return;
err:
+ free_snp_host_map(sev);
psp_master->sev_data = NULL;
}
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 85506325051a..2c2fe42189a5 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -29,11 +29,20 @@
#define SEV_CMD_COMPLETE BIT(1)
#define SEV_CMDRESP_IOC BIT(0)
+#define MAX_SNP_HOST_MAP_BUFS 2
+
struct sev_misc_dev {
struct kref refcount;
struct miscdevice misc;
};
+struct snp_host_map {
+ u64 paddr;
+ u32 len;
+ void *host;
+ bool active;
+};
+
struct sev_device {
struct device *dev;
struct psp_device *psp;
@@ -52,8 +61,11 @@ struct sev_device {
u8 build;
void *cmd_buf;
+ void *cmd_buf_backup;
+ int cmd_buf_active;
bool snp_initialized;
+ struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
};
int sev_dev_init(struct psp_device *psp);
--
2.25.1
* Re: [PATCH v10 18/50] crypto: ccp: Handle the legacy SEV command when SNP is enabled
2023-10-16 13:27 ` [PATCH v10 18/50] crypto: ccp: Handle the legacy SEV command " Michael Roth
@ 2023-12-09 15:36 ` Borislav Petkov
2023-12-29 21:38 ` Michael Roth
0 siblings, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-12-09 15:36 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Mon, Oct 16, 2023 at 08:27:47AM -0500, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> The behavior of the SEV-legacy commands is altered when the SNP firmware
> is in the INIT state. When SNP is in INIT state, all the SEV-legacy
> commands that cause the firmware to write to memory must be in the
> firmware state before issuing the command..
I think this is trying to say that the *memory* must be in firmware
state before the command. Needs massaging.
> A command buffer may contains a system physical address that the firmware
"contain"
> may write to. There are two cases that need to be handled:
>
> 1) system physical address points to a guest memory
> 2) system physical address points to a host memory
s/a //g
>
> To handle the case #1, change the page state to the firmware in the RMP
> table before issuing the command and restore the state to shared after the
> command completes.
>
> For the case #2, use a bounce buffer to complete the request.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> drivers/crypto/ccp/sev-dev.c | 346 ++++++++++++++++++++++++++++++++++-
> drivers/crypto/ccp/sev-dev.h | 12 ++
> 2 files changed, 348 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index ea21307a2b34..b574b0ef2b1f 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -462,12 +462,295 @@ static int sev_write_init_ex_file_if_required(int cmd_id)
> return sev_write_init_ex_file();
> }
>
> +static int alloc_snp_host_map(struct sev_device *sev)
If this is allocating intermediary bounce buffers, then name the
function to say exactly that. Or what is the name "host_map"
referring to?
> +{
> + struct page *page;
> + int i;
> +
> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
> + struct snp_host_map *map = &sev->snp_host_map[i];
> +
> + memset(map, 0, sizeof(*map));
> +
> + page = alloc_pages(GFP_KERNEL_ACCOUNT, get_order(SEV_FW_BLOB_MAX_SIZE));
> + if (!page)
> + return -ENOMEM;
If the second allocation fails, you just leaked the first one.
> + map->host = page_address(page);
> + }
> +
> + return 0;
> +}
> +
> +static void free_snp_host_map(struct sev_device *sev)
> +{
> + int i;
> +
> + for (i = 0; i < MAX_SNP_HOST_MAP_BUFS; i++) {
> + struct snp_host_map *map = &sev->snp_host_map[i];
> +
> + if (map->host) {
> + __free_pages(virt_to_page(map->host), get_order(SEV_FW_BLOB_MAX_SIZE));
> + memset(map, 0, sizeof(*map));
> + }
> + }
> +}
> +
> +static int map_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
Why is paddr a pointer? You simply pass an "unsigned long paddr" like the
rest of the gazillion functions dealing with addresses.
And then you do the ERR_PTR, PTR_ERR thing for the return value of this
function, see include/linux/err.h.
> +{
> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
> +
> + map->active = false;
This toggling of active on function entry and exit is silly.
The usual way to do those things is to mark it as active as the last
step of the map function, when everything has succeeded and to mark it
as inactive (active == false) as the first step in the unmap function.
> +
> + if (!paddr || !len)
> + return 0;
> +
> + map->paddr = *paddr;
> + map->len = len;
> +
> + /* If paddr points to a guest memory then change the page state to firmwware. */
> + if (guest) {
> + if (rmp_mark_pages_firmware(*paddr, npages, true))
> + return -EFAULT;
> +
> + goto done;
> + }
This is where it tells you that this function wants splitting:
map_guest_firmware_pages
map_host_firmware_pages
or so.
And then you lose the @guest argument too and you call the different
functions depending on the SEV cmd.
> +
> + if (!map->host)
What in the hell is ->host?! SPA is host memory?
Comments please.
> + return -ENOMEM;
> +
> + /* Check if the pre-allocated buffer can be used to fullfil the request. */
"fulfill"
> + if (len > SEV_FW_BLOB_MAX_SIZE)
> + return -EINVAL;
> +
> + /* Transition the pre-allocated buffer to the firmware state. */
> + if (rmp_mark_pages_firmware(__pa(map->host), npages, true))
> + return -EFAULT;
> +
> + /* Set the paddr to use pre-allocated firmware buffer */
> + *paddr = __psp_pa(map->host);
> +
> +done:
> + map->active = true;
> + return 0;
> +}
> +
> +static int unmap_firmware_writeable(u64 *paddr, u32 len, bool guest, struct snp_host_map *map)
> +{
> + unsigned int npages = PAGE_ALIGN(len) >> PAGE_SHIFT;
> +
> + if (!map->active)
Same comments as above for that one.
> + return 0;
> +
> + /* If paddr points to a guest memory then restore the page state to hypervisor. */
> + if (guest) {
> + if (snp_reclaim_pages(*paddr, npages, true))
> + return -EFAULT;
> +
> + goto done;
> + }
> +
> + /*
> + * Transition the pre-allocated buffer to hypervisor state before the access.
> + *
> + * This is because while changing the page state to firmware, the kernel unmaps
> + * the pages from the direct map, and to restore the direct map the pages must
> + * be transitioned back to the shared state.
> + */
> + if (snp_reclaim_pages(__pa(map->host), npages, true))
> + return -EFAULT;
> +
> + /* Copy the response data firmware buffer to the callers buffer. */
> + memcpy(__va(__sme_clr(map->paddr)), map->host, min_t(size_t, len, map->len));
This is not testing whether map->host is NULL as the above counterpart.
> + *paddr = map->paddr;
> +
> +done:
> + map->active = false;
> + return 0;
> +}
> +
> +static bool sev_legacy_cmd_buf_writable(int cmd)
> +{
> + switch (cmd) {
> + case SEV_CMD_PLATFORM_STATUS:
> + case SEV_CMD_GUEST_STATUS:
> + case SEV_CMD_LAUNCH_START:
> + case SEV_CMD_RECEIVE_START:
> + case SEV_CMD_LAUNCH_MEASURE:
> + case SEV_CMD_SEND_START:
> + case SEV_CMD_SEND_UPDATE_DATA:
> + case SEV_CMD_SEND_UPDATE_VMSA:
> + case SEV_CMD_PEK_CSR:
> + case SEV_CMD_PDH_CERT_EXPORT:
> + case SEV_CMD_GET_ID:
> + case SEV_CMD_ATTESTATION_REPORT:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> +#define prep_buffer(name, addr, len, guest, map) \
> + func(&((typeof(name *))cmd_buf)->addr, ((typeof(name *))cmd_buf)->len, guest, map)
> +
> +static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
> +{
> + int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
> + struct sev_device *sev = psp_master->sev_data;
> + bool from_fw = !to_fw;
> +
> + /*
> + * After the command is completed, change the command buffer memory to
> + * hypervisor state.
> + *
> + * The immutable bit is automatically cleared by the firmware, so
> + * no not need to reclaim the page.
> + */
> + if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
> + if (snp_reclaim_pages(__pa(cmd_buf), 1, true))
> + return -EFAULT;
> +
> + /* No need to go further if firmware failed to execute command. */
> + if (fw_err)
> + return 0;
> + }
> +
> + if (to_fw)
> + func = map_firmware_writeable;
> + else
> + func = unmap_firmware_writeable;
Eww, ugly and with the macro above even worse. And completely
unnecessary.
Define prep_buffer() as a normal function which selects which @func to
call and then does it. Not like this.
...
> +static inline bool need_firmware_copy(int cmd)
> +{
> + struct sev_device *sev = psp_master->sev_data;
> +
> + /* After SNP is INIT'ed, the behavior of legacy SEV command is changed. */
"initialized"
> + return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_initialized) ? true : false;
redundant ternary conditional:
return cmd < SEV_CMD_SNP_INIT && sev->snp_initialized;
> +}
> +
> +static int snp_aware_copy_to_firmware(int cmd, void *data)
What does "SNP aware" even mean?
> +{
> + return __snp_cmd_buf_copy(cmd, data, true, 0);
> +}
> +
> +static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
> +{
> + return __snp_cmd_buf_copy(cmd, data, false, fw_err);
> +}
> +
> static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> {
> struct psp_device *psp = psp_master;
> struct sev_device *sev;
> unsigned int phys_lsb, phys_msb;
> unsigned int reg, ret = 0;
> + void *cmd_buf;
> int buf_len;
>
> if (!psp || !psp->sev_data)
> @@ -487,12 +770,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> * work for some memory, e.g. vmalloc'd addresses, and @data may not be
> * physically contiguous.
> */
> - if (data)
> - memcpy(sev->cmd_buf, data, buf_len);
> + if (data) {
> + if (sev->cmd_buf_active > 2)
What is that silly counter supposed to mean?
Nested SNP commands?
> + return -EBUSY;
> +
> + cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
> +
> + memcpy(cmd_buf, data, buf_len);
> + sev->cmd_buf_active++;
> +
> + /*
> + * The behavior of the SEV-legacy commands is altered when the
> + * SNP firmware is in the INIT state.
> + */
> + if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, cmd_buf))
Move that need_firmware_copy() check inside snp_aware_copy_to_firmware()
and the other one.
> + return -EFAULT;
> + } else {
> + cmd_buf = sev->cmd_buf;
> + }
>
> /* Get the physical address of the command buffer */
> - phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
> - phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
> + phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
> + phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
>
> dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
> cmd, phys_msb, phys_lsb, psp_timeout);
...
> @@ -639,6 +947,14 @@ static int ___sev_platform_init_locked(int *error, bool probe)
> if (probe && !psp_init_on_probe)
> return 0;
>
> + /*
> + * Allocate the intermediate buffers used for the legacy command handling.
> + */
> + if (rc != -ENODEV && alloc_snp_host_map(sev)) {
Why isn't this
if (!rc && ...)
> + dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
> + goto skip_legacy;
No need for that skip_legacy silly label. Just "return 0" here.
...
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v10 18/50] crypto: ccp: Handle the legacy SEV command when SNP is enabled
2023-12-09 15:36 ` Borislav Petkov
@ 2023-12-29 21:38 ` Michael Roth
0 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-12-29 21:38 UTC (permalink / raw)
To: Borislav Petkov
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Sat, Dec 09, 2023 at 04:36:56PM +0100, Borislav Petkov wrote:
> > +static int __snp_cmd_buf_copy(int cmd, void *cmd_buf, bool to_fw, int fw_err)
> > +{
> > + int (*func)(u64 *paddr, u32 len, bool guest, struct snp_host_map *map);
> > + struct sev_device *sev = psp_master->sev_data;
> > + bool from_fw = !to_fw;
> > +
> > + /*
> > + * After the command is completed, change the command buffer memory to
> > + * hypervisor state.
> > + *
> > + * The immutable bit is automatically cleared by the firmware, so
> > + * no not need to reclaim the page.
> > + */
> > + if (from_fw && sev_legacy_cmd_buf_writable(cmd)) {
> > + if (snp_reclaim_pages(__pa(cmd_buf), 1, true))
> > + return -EFAULT;
> > +
> > + /* No need to go further if firmware failed to execute command. */
> > + if (fw_err)
> > + return 0;
> > + }
> > +
> > + if (to_fw)
> > + func = map_firmware_writeable;
> > + else
> > + func = unmap_firmware_writeable;
>
> Eww, ugly and with the macro above even worse. And completely
> unnecessary.
>
> Define prep_buffer() as a normal function which selects which @func to
> call and then does it. Not like this.
I've rewritten this using a descriptor array to handle buffers for
various command parameters, and switched to allocating bounce buffers
on-demand to avoid some of the init/cleanup coordination. I don't think
any of these are really performance critical, and it's only for legacy
support, but it would be straightforward to add a cache of pre-allocated
buffers later if needed.
I've tried to document/name the helpers so the flow is a bit clearer.
-Mike
>
> ...
>
> > +static inline bool need_firmware_copy(int cmd)
> > +{
> > + struct sev_device *sev = psp_master->sev_data;
> > +
> > + /* After SNP is INIT'ed, the behavior of legacy SEV command is changed. */
>
> "initialized"
>
> > + return ((cmd < SEV_CMD_SNP_INIT) && sev->snp_initialized) ? true : false;
>
> redundant ternary conditional:
>
> return cmd < SEV_CMD_SNP_INIT && sev->snp_initialized;
>
> > +}
> > +
> > +static int snp_aware_copy_to_firmware(int cmd, void *data)
>
> What does "SNP aware" even mean?
>
> > +{
> > + return __snp_cmd_buf_copy(cmd, data, true, 0);
> > +}
> > +
> > +static int snp_aware_copy_from_firmware(int cmd, void *data, int fw_err)
> > +{
> > + return __snp_cmd_buf_copy(cmd, data, false, fw_err);
> > +}
> > +
> > static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> > {
> > struct psp_device *psp = psp_master;
> > struct sev_device *sev;
> > unsigned int phys_lsb, phys_msb;
> > unsigned int reg, ret = 0;
> > + void *cmd_buf;
> > int buf_len;
> >
> > if (!psp || !psp->sev_data)
> > @@ -487,12 +770,28 @@ static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
> > * work for some memory, e.g. vmalloc'd addresses, and @data may not be
> > * physically contiguous.
> > */
> > - if (data)
> > - memcpy(sev->cmd_buf, data, buf_len);
> > + if (data) {
> > + if (sev->cmd_buf_active > 2)
>
> What is that silly counter supposed to mean?
>
> Nested SNP commands?
>
> > + return -EBUSY;
> > +
> > + cmd_buf = sev->cmd_buf_active ? sev->cmd_buf_backup : sev->cmd_buf;
> > +
> > + memcpy(cmd_buf, data, buf_len);
> > + sev->cmd_buf_active++;
> > +
> > + /*
> > + * The behavior of the SEV-legacy commands is altered when the
> > + * SNP firmware is in the INIT state.
> > + */
> > + if (need_firmware_copy(cmd) && snp_aware_copy_to_firmware(cmd, cmd_buf))
>
> Move that need_firmware_copy() check inside snp_aware_copy_to_firmware()
> and the other one.
>
> > + return -EFAULT;
> > + } else {
> > + cmd_buf = sev->cmd_buf;
> > + }
> >
> > /* Get the physical address of the command buffer */
> > - phys_lsb = data ? lower_32_bits(__psp_pa(sev->cmd_buf)) : 0;
> > - phys_msb = data ? upper_32_bits(__psp_pa(sev->cmd_buf)) : 0;
> > + phys_lsb = data ? lower_32_bits(__psp_pa(cmd_buf)) : 0;
> > + phys_msb = data ? upper_32_bits(__psp_pa(cmd_buf)) : 0;
> >
> > dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
> > cmd, phys_msb, phys_lsb, psp_timeout);
>
> ...
>
> > @@ -639,6 +947,14 @@ static int ___sev_platform_init_locked(int *error, bool probe)
> > if (probe && !psp_init_on_probe)
> > return 0;
> >
> > + /*
> > + * Allocate the intermediate buffers used for the legacy command handling.
> > + */
> > + if (rc != -ENODEV && alloc_snp_host_map(sev)) {
>
> Why isn't this
>
> if (!rc && ...)
>
> > + dev_notice(sev->dev, "Failed to alloc host map (disabling legacy SEV)\n");
> > + goto skip_legacy;
>
> No need for that skip_legacy silly label. Just "return 0" here.
>
> ...
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>
* [PATCH v10 19/50] crypto: ccp: Add the SNP_PLATFORM_STATUS command
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (17 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 18/50] crypto: ccp: Handle the legacy SEV command " Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-12-12 16:45 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 20/50] KVM: SEV: Select CONFIG_KVM_SW_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y Michael Roth
` (30 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
The command can be used by the userspace to query the SNP platform status
report. See the SEV-SNP spec for more details.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
Documentation/virt/coco/sev-guest.rst | 27 ++++++++++++++++
drivers/crypto/ccp/sev-dev.c | 45 +++++++++++++++++++++++++++
include/uapi/linux/psp-sev.h | 1 +
3 files changed, 73 insertions(+)
diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index 68b0d2363af8..e828c5326936 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -67,6 +67,22 @@ counter (e.g. counter overflow), then -EIO will be returned.
};
};
+The host ioctl should be called to /dev/sev device. The ioctl accepts command
+id and command input structure.
+
+::
+ struct sev_issue_cmd {
+ /* Command ID */
+ __u32 cmd;
+
+ /* Command request structure */
+ __u64 data;
+
+ /* firmware error code on failure (see psp-sev.h) */
+ __u32 error;
+ };
+
+
2.1 SNP_GET_REPORT
------------------
@@ -124,6 +140,17 @@ be updated with the expected value.
See GHCB specification for further detail on how to parse the certificate blob.
+2.4 SNP_PLATFORM_STATUS
+-----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_platform_status
+:Returns (out): 0 on success, -negative on error
+
+The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
+status includes API major, minor version and more. See the SEV-SNP
+specification for further details.
+
3. SEV-SNP CPUID Enforcement
============================
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index b574b0ef2b1f..679b8d6fc09a 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1772,6 +1772,48 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
return ret;
}
+static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_data_snp_addr buf;
+ struct page *status_page;
+ void *data;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ status_page = alloc_page(GFP_KERNEL_ACCOUNT);
+ if (!status_page)
+ return -ENOMEM;
+
+ data = page_address(status_page);
+ if (rmp_mark_pages_firmware(__pa(data), 1, true)) {
+ __free_pages(status_page, 0);
+ return -EFAULT;
+ }
+
+ buf.gctx_paddr = __psp_pa(data);
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &argp->error);
+
+ /* Change the page state before accessing it */
+ if (snp_reclaim_pages(__pa(data), 1, true)) {
+ snp_leak_pages(__pa(data) >> PAGE_SHIFT, 1);
+ return -EFAULT;
+ }
+
+ if (ret)
+ goto cleanup;
+
+ if (copy_to_user((void __user *)argp->data, data,
+ sizeof(struct sev_user_data_snp_status)))
+ ret = -EFAULT;
+
+cleanup:
+ __free_pages(status_page, 0);
+ return ret;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -1823,6 +1865,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SEV_GET_ID2:
ret = sev_ioctl_do_get_id2(&input);
break;
+ case SNP_PLATFORM_STATUS:
+ ret = sev_ioctl_snp_platform_status(&input);
+ break;
default:
ret = -EINVAL;
goto out;
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index 48e3ef91559c..b94b3687edbb 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -28,6 +28,7 @@ enum {
SEV_PEK_CERT_IMPORT,
SEV_GET_ID, /* This command is deprecated, use SEV_GET_ID2 */
SEV_GET_ID2,
+ SNP_PLATFORM_STATUS,
SEV_MAX,
};
--
2.25.1
* Re: [PATCH v10 19/50] crypto: ccp: Add the SNP_PLATFORM_STATUS command
2023-10-16 13:27 ` [PATCH v10 19/50] crypto: ccp: Add the SNP_PLATFORM_STATUS command Michael Roth
@ 2023-12-12 16:45 ` Borislav Petkov
0 siblings, 0 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-12-12 16:45 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Mon, Oct 16, 2023 at 08:27:48AM -0500, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> The command can be used by the userspace to query the SNP platform status
s/by the userspace //
> report. See the SEV-SNP spec for more details.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Mike, this doesn't have your SOB at the end. The whole set should have
it if you're sending it. Please go through the whole thing.
> ---
> Documentation/virt/coco/sev-guest.rst | 27 ++++++++++++++++
> drivers/crypto/ccp/sev-dev.c | 45 +++++++++++++++++++++++++++
> include/uapi/linux/psp-sev.h | 1 +
> 3 files changed, 73 insertions(+)
>
> diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
> index 68b0d2363af8..e828c5326936 100644
> --- a/Documentation/virt/coco/sev-guest.rst
> +++ b/Documentation/virt/coco/sev-guest.rst
> @@ -67,6 +67,22 @@ counter (e.g. counter overflow), then -EIO will be returned.
> };
> };
>
> +The host ioctl should be called to /dev/sev device. The ioctl accepts command
"... should be sent to the... "
> +id and command input structure.
> +
> +::
> + struct sev_issue_cmd {
> + /* Command ID */
> + __u32 cmd;
> +
> + /* Command request structure */
> + __u64 data;
> +
> + /* firmware error code on failure (see psp-sev.h) */
> + __u32 error;
> + };
> +
> +
> 2.1 SNP_GET_REPORT
> ------------------
>
> @@ -124,6 +140,17 @@ be updated with the expected value.
>
> See GHCB specification for further detail on how to parse the certificate blob.
>
> +2.4 SNP_PLATFORM_STATUS
> +-----------------------
> +:Technology: sev-snp
> +:Type: hypervisor ioctl cmd
> +:Parameters (in): struct sev_data_snp_platform_status
> +:Returns (out): 0 on success, -negative on error
> +
> +The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
> +status includes API major, minor version and more. See the SEV-SNP
> +specification for further details.
> +
> 3. SEV-SNP CPUID Enforcement
> ============================
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index b574b0ef2b1f..679b8d6fc09a 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -1772,6 +1772,48 @@ static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp, bool writable)
> return ret;
> }
>
> +static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
sev_ioctl_do_snp_platform_status like the others.
> +{
> + struct sev_device *sev = psp_master->sev_data;
> + struct sev_data_snp_addr buf;
> + struct page *status_page;
> + void *data;
> + int ret;
> +
> + if (!sev->snp_initialized || !argp->data)
> + return -EINVAL;
> +
> + status_page = alloc_page(GFP_KERNEL_ACCOUNT);
> + if (!status_page)
> + return -ENOMEM;
> +
> + data = page_address(status_page);
> + if (rmp_mark_pages_firmware(__pa(data), 1, true)) {
> + __free_pages(status_page, 0);
> + return -EFAULT;
ret = -EFAULT;
goto cleanup;
instead.
...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* [PATCH v10 20/50] KVM: SEV: Select CONFIG_KVM_SW_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (18 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 19/50] crypto: ccp: Add the SNP_PLATFORM_STATUS command Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-12-13 12:54 ` Paolo Bonzini
2023-12-18 10:13 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 21/50] KVM: SEV: Add support to handle AP reset MSR protocol Michael Roth
` (29 subsequent siblings)
49 siblings, 2 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
SEV-SNP relies on the restricted/protected memory support to run guests,
so make sure to enable that support with the
CONFIG_KVM_SW_PROTECTED_VM build option.
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 8452ed0228cb..71dc506aa3fb 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -126,6 +126,7 @@ config KVM_AMD_SEV
bool "AMD Secure Encrypted Virtualization (SEV) support"
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
+ select KVM_SW_PROTECTED_VM
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
--
2.25.1
* Re: [PATCH v10 20/50] KVM: SEV: Select CONFIG_KVM_SW_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y
2023-10-16 13:27 ` [PATCH v10 20/50] KVM: SEV: Select CONFIG_KVM_SW_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y Michael Roth
@ 2023-12-13 12:54 ` Paolo Bonzini
2023-12-29 21:41 ` Michael Roth
2023-12-18 10:13 ` Borislav Petkov
1 sibling, 1 reply; 158+ messages in thread
From: Paolo Bonzini @ 2023-12-13 12:54 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
On 10/16/23 15:27, Michael Roth wrote:
> SEV-SNP relies on the restricted/protected memory support to run guests,
> so make sure to enable that support with the
> CONFIG_KVM_SW_PROTECTED_VM build option.
>
> Signed-off-by: Michael Roth<michael.roth@amd.com>
> ---
Why select KVM_SW_PROTECTED_VM and not KVM_GENERIC_PRIVATE_MEM?
Paolo
* Re: [PATCH v10 20/50] KVM: SEV: Select CONFIG_KVM_SW_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y
2023-12-13 12:54 ` Paolo Bonzini
@ 2023-12-29 21:41 ` Michael Roth
0 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-12-29 21:41 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
On Wed, Dec 13, 2023 at 01:54:55PM +0100, Paolo Bonzini wrote:
> On 10/16/23 15:27, Michael Roth wrote:
> > SEV-SNP relies on the restricted/protected memory support to run guests,
> > so make sure to enable that support with the
> > CONFIG_KVM_SW_PROTECTED_VM build option.
> >
> > Signed-off-by: Michael Roth<michael.roth@amd.com>
> > ---
>
> Why select KVM_SW_PROTECTED_VM and not KVM_GENERIC_PRIVATE_MEM?
I'm not sure, maybe there were previous iterations where it made sense
but KVM_GENERIC_PRIVATE_MEM definitely seems more appropriate now. I've
changed it accordingly.
-Mike
>
> Paolo
>
>
* Re: [PATCH v10 20/50] KVM: SEV: Select CONFIG_KVM_SW_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y
2023-10-16 13:27 ` [PATCH v10 20/50] KVM: SEV: Select CONFIG_KVM_SW_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y Michael Roth
2023-12-13 12:54 ` Paolo Bonzini
@ 2023-12-18 10:13 ` Borislav Petkov
2023-12-29 21:40 ` Michael Roth
1 sibling, 1 reply; 158+ messages in thread
From: Borislav Petkov @ 2023-12-18 10:13 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
On Mon, Oct 16, 2023 at 08:27:49AM -0500, Michael Roth wrote:
> SEV-SNP relies on the restricted/protected memory support to run guests,
> so make sure to enable that support with the
> CONFIG_KVM_SW_PROTECTED_VM build option.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> arch/x86/kvm/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 8452ed0228cb..71dc506aa3fb 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -126,6 +126,7 @@ config KVM_AMD_SEV
> bool "AMD Secure Encrypted Virtualization (SEV) support"
> depends on KVM_AMD && X86_64
> depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
> + select KVM_SW_PROTECTED_VM
> help
> Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
> with Encrypted State (SEV-ES) on AMD processors.
> --
Kconfig doesn't like this one:
WARNING: unmet direct dependencies detected for KVM_SW_PROTECTED_VM
Depends on [n]: VIRTUALIZATION [=y] && EXPERT [=n] && X86_64 [=y]
Selected by [m]:
- KVM_AMD_SEV [=y] && VIRTUALIZATION [=y] && KVM_AMD [=m] && X86_64 [=y] && CRYPTO_DEV_SP_PSP [=y] && (KVM_AMD [=m]!=y || CRYPTO_DEV_CCP_DD [=m]!=m)
WARNING: unmet direct dependencies detected for KVM_SW_PROTECTED_VM
Depends on [n]: VIRTUALIZATION [=y] && EXPERT [=n] && X86_64 [=y]
Selected by [m]:
- KVM_AMD_SEV [=y] && VIRTUALIZATION [=y] && KVM_AMD [=m] && X86_64 [=y] && CRYPTO_DEV_SP_PSP [=y] && (KVM_AMD [=m]!=y || CRYPTO_DEV_CCP_DD [=m]!=m)
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v10 20/50] KVM: SEV: Select CONFIG_KVM_SW_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y
2023-12-18 10:13 ` Borislav Petkov
@ 2023-12-29 21:40 ` Michael Roth
0 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-12-29 21:40 UTC (permalink / raw)
To: Borislav Petkov
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
On Mon, Dec 18, 2023 at 11:13:50AM +0100, Borislav Petkov wrote:
> On Mon, Oct 16, 2023 at 08:27:49AM -0500, Michael Roth wrote:
> > SEV-SNP relies on the restricted/protected memory support to run guests,
> > so make sure to enable that support with the
> > CONFIG_KVM_SW_PROTECTED_VM build option.
> >
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> > arch/x86/kvm/Kconfig | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> > index 8452ed0228cb..71dc506aa3fb 100644
> > --- a/arch/x86/kvm/Kconfig
> > +++ b/arch/x86/kvm/Kconfig
> > @@ -126,6 +126,7 @@ config KVM_AMD_SEV
> > bool "AMD Secure Encrypted Virtualization (SEV) support"
> > depends on KVM_AMD && X86_64
> > depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
> > + select KVM_SW_PROTECTED_VM
> > help
> > Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
> > with Encrypted State (SEV-ES) on AMD processors.
> > --
>
> Kconfig doesn't like this one:
>
> WARNING: unmet direct dependencies detected for KVM_SW_PROTECTED_VM
> Depends on [n]: VIRTUALIZATION [=y] && EXPERT [=n] && X86_64 [=y]
> Selected by [m]:
> - KVM_AMD_SEV [=y] && VIRTUALIZATION [=y] && KVM_AMD [=m] && X86_64 [=y] && CRYPTO_DEV_SP_PSP [=y] && (KVM_AMD [=m]!=y || CRYPTO_DEV_CCP_DD [=m]!=m)
>
> WARNING: unmet direct dependencies detected for KVM_SW_PROTECTED_VM
> Depends on [n]: VIRTUALIZATION [=y] && EXPERT [=n] && X86_64 [=y]
> Selected by [m]:
> - KVM_AMD_SEV [=y] && VIRTUALIZATION [=y] && KVM_AMD [=m] && X86_64 [=y] && CRYPTO_DEV_SP_PSP [=y] && (KVM_AMD [=m]!=y || CRYPTO_DEV_CCP_DD [=m]!=m)
I think this is because KVM_SW_PROTECTED_VM requires EXPERT, which has
to be set explicitly. But I think Paolo is right that
KVM_GENERIC_PRIVATE_MEM is more appropriate, which does not require
EXPERT.
-Mike
>
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>
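Putting Paolo's and Boris's comments together, the resolution Mike describes is to keep the hunk but select the non-EXPERT symbol instead, which sidesteps the unmet-dependency warning. A sketch of the revised Kconfig hunk (the exact symbol set in the merged version may differ):

```kconfig
config KVM_AMD_SEV
	bool "AMD Secure Encrypted Virtualization (SEV) support"
	depends on KVM_AMD && X86_64
	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
	select KVM_GENERIC_PRIVATE_MEM
	help
	  Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
	  with Encrypted State (SEV-ES) on AMD processors.
```

Note why the warning fired in the first place: `select` forces a symbol on without checking its `depends on` clause, and `KVM_SW_PROTECTED_VM` depends on `EXPERT`, so selecting it from a non-EXPERT config violated its dependencies.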
* [PATCH v10 21/50] KVM: SEV: Add support to handle AP reset MSR protocol
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (19 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 20/50] KVM: SEV: Select CONFIG_KVM_SW_PROTECTED_VM when CONFIG_KVM_AMD_SEV=y Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-12-12 17:02 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 22/50] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests Michael Roth
` (28 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Tom Lendacky <thomas.lendacky@amd.com>
Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/include/asm/sev-common.h | 2 ++
arch/x86/kvm/svm/sev.c | 56 ++++++++++++++++++++++++++-----
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 51 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 93ec8c12c91d..57ced29264ce 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -56,6 +56,8 @@
/* AP Reset Hold */
#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
#define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
+#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)
/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6ee925d66648..4f895a7201ed 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -65,6 +65,10 @@ module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
#define sev_es_debug_swap_enabled false
#endif /* CONFIG_KVM_AMD_SEV */
+#define AP_RESET_HOLD_NONE 0
+#define AP_RESET_HOLD_NAE_EVENT 1
+#define AP_RESET_HOLD_MSR_PROTO 2
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2594,6 +2598,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
void sev_es_unmap_ghcb(struct vcpu_svm *svm)
{
+ /* Clear any indication that the vCPU is in a type of AP Reset Hold */
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;
+
if (!svm->sev_es.ghcb)
return;
@@ -2805,6 +2812,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_AP_RESET_HOLD_REQ:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
+ ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+
+ /*
+ * Preset the result to a non-SIPI return and then only set
+ * the result to non-zero when delivering a SIPI.
+ */
+ set_ghcb_msr_bits(svm, 0,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
+
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;
@@ -2904,6 +2927,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
case SVM_VMGEXIT_AP_HLT_LOOP:
+ svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NAE_EVENT;
ret = kvm_emulate_ap_reset_hold(vcpu);
break;
case SVM_VMGEXIT_AP_JUMP_TABLE: {
@@ -3147,13 +3171,29 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
return;
}
- /*
- * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
- * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
- * non-zero value.
- */
- if (!svm->sev_es.ghcb)
- return;
+ /* Subsequent SIPI */
+ switch (svm->sev_es.ap_reset_hold_type) {
+ case AP_RESET_HOLD_NAE_EVENT:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+ */
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ break;
+ case AP_RESET_HOLD_MSR_PROTO:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set GHCB data field to a non-zero value.
+ */
+ set_ghcb_msr_bits(svm, 1,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
+ GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
- ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
+ set_ghcb_msr_bits(svm, GHCB_MSR_AP_RESET_HOLD_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ default:
+ break;
+ }
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c409f934c377..b74231511493 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -195,6 +195,7 @@ struct vcpu_sev_es_state {
u8 valid_bitmap[16];
struct kvm_host_map ghcb_map;
bool received_first_sipi;
+ unsigned int ap_reset_hold_type;
/* SEV-ES scratch area support */
u64 sw_scratch;
--
2.25.1
* Re: [PATCH v10 21/50] KVM: SEV: Add support to handle AP reset MSR protocol
2023-10-16 13:27 ` [PATCH v10 21/50] KVM: SEV: Add support to handle AP reset MSR protocol Michael Roth
@ 2023-12-12 17:02 ` Borislav Petkov
0 siblings, 0 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-12-12 17:02 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Mon, Oct 16, 2023 at 08:27:50AM -0500, Michael Roth wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
>
> Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
> available in version 2 of the GHCB specification.
>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
> arch/x86/include/asm/sev-common.h | 2 ++
> arch/x86/kvm/svm/sev.c | 56 ++++++++++++++++++++++++++-----
> arch/x86/kvm/svm/svm.h | 1 +
> 3 files changed, 51 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index 93ec8c12c91d..57ced29264ce 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -56,6 +56,8 @@
> /* AP Reset Hold */
> #define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
> #define GHCB_MSR_AP_RESET_HOLD_RESP 0x007
> +#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
> +#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)
Align vertically pls.
> /* GHCB GPA Register */
> #define GHCB_MSR_REG_GPA_REQ 0x012
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 6ee925d66648..4f895a7201ed 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -65,6 +65,10 @@ module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
> #define sev_es_debug_swap_enabled false
> #endif /* CONFIG_KVM_AMD_SEV */
>
> +#define AP_RESET_HOLD_NONE 0
> +#define AP_RESET_HOLD_NAE_EVENT 1
> +#define AP_RESET_HOLD_MSR_PROTO 2
> +
> static u8 sev_enc_bit;
> static DECLARE_RWSEM(sev_deactivate_lock);
> static DEFINE_MUTEX(sev_bitmap_lock);
> @@ -2594,6 +2598,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>
> void sev_es_unmap_ghcb(struct vcpu_svm *svm)
> {
> + /* Clear any indication that the vCPU is in a type of AP Reset Hold */
> + svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_NONE;
> +
> if (!svm->sev_es.ghcb)
> return;
>
> @@ -2805,6 +2812,22 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
> GHCB_MSR_INFO_POS);
> break;
> }
> + case GHCB_MSR_AP_RESET_HOLD_REQ:
> + svm->sev_es.ap_reset_hold_type = AP_RESET_HOLD_MSR_PROTO;
> + ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
> +
> + /*
> + * Preset the result to a non-SIPI return and then only set
> + * the result to non-zero when delivering a SIPI.
> + */
> + set_ghcb_msr_bits(svm, 0,
> + GHCB_MSR_AP_RESET_HOLD_RESULT_MASK,
> + GHCB_MSR_AP_RESET_HOLD_RESULT_POS);
Yikes, those defines are a mouthful.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* [PATCH v10 22/50] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (20 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 21/50] KVM: SEV: Add support to handle AP reset MSR protocol Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-12-18 10:23 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
` (27 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
Version 2 of the GHCB specification introduced advertisement of features
that are supported by the Hypervisor.
Now that KVM supports version 2 of the GHCB specification, bump the
maximum supported protocol version.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/include/asm/sev-common.h | 2 ++
arch/x86/kvm/svm/sev.c | 14 ++++++++++++++
arch/x86/kvm/svm/svm.h | 3 ++-
3 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 57ced29264ce..9ba88973a187 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -101,6 +101,8 @@ enum psc_op {
/* GHCB Hypervisor Feature Request/Response */
#define GHCB_MSR_HV_FT_REQ 0x080
#define GHCB_MSR_HV_FT_RESP 0x081
+#define GHCB_MSR_HV_FT_POS 12
+#define GHCB_MSR_HV_FT_MASK GENMASK_ULL(51, 0)
#define GHCB_MSR_HV_FT_RESP_VAL(v) \
/* GHCBData[63:12] */ \
(((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4f895a7201ed..088b32657f46 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2568,6 +2568,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+ case SVM_VMGEXIT_HV_FEATURES:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -2828,6 +2829,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_MASK,
GHCB_MSR_INFO_POS);
break;
+ case GHCB_MSR_HV_FT_REQ: {
+ set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
+ GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
+ GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;
@@ -2952,6 +2960,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_HV_FEATURES: {
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_HV_FT_SUPPORTED);
+
+ ret = 1;
+ break;
+ }
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index b74231511493..c13070d00910 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -663,9 +663,10 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
/* sev.c */
-#define GHCB_VERSION_MAX 1ULL
+#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL
+#define GHCB_HV_FT_SUPPORTED GHCB_HV_FT_SNP
extern unsigned int max_sev_asid;
--
2.25.1
* Re: [PATCH v10 22/50] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
2023-10-16 13:27 ` [PATCH v10 22/50] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests Michael Roth
@ 2023-12-18 10:23 ` Borislav Petkov
0 siblings, 0 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-12-18 10:23 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On Mon, Oct 16, 2023 at 08:27:51AM -0500, Michael Roth wrote:
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 4f895a7201ed..088b32657f46 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2568,6 +2568,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> case SVM_VMGEXIT_AP_HLT_LOOP:
> case SVM_VMGEXIT_AP_JUMP_TABLE:
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> + case SVM_VMGEXIT_HV_FEATURES:
> break;
> default:
> reason = GHCB_ERR_INVALID_EVENT;
> @@ -2828,6 +2829,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
> GHCB_MSR_INFO_MASK,
> GHCB_MSR_INFO_POS);
> break;
> + case GHCB_MSR_HV_FT_REQ: {
^^^
No need to have a statement block here. Neither below.
> + set_ghcb_msr_bits(svm, GHCB_HV_FT_SUPPORTED,
> + GHCB_MSR_HV_FT_MASK, GHCB_MSR_HV_FT_POS);
> + set_ghcb_msr_bits(svm, GHCB_MSR_HV_FT_RESP,
> + GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
> + break;
> + }
> case GHCB_MSR_TERM_REQ: {
> u64 reason_set, reason_code;
>
> @@ -2952,6 +2960,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
> ret = 1;
> break;
> }
> + case SVM_VMGEXIT_HV_FEATURES: {
^^^^
> + ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_HV_FT_SUPPORTED);
> +
> + ret = 1;
> + break;
> + }
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> vcpu_unimpl(vcpu,
> "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 158+ messages in thread
* [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (21 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 22/50] KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-12-11 13:24 ` Vlastimil Babka
` (3 more replies)
2023-10-16 13:27 ` [PATCH v10 24/50] KVM: SEV: Add initial SEV-SNP support Michael Roth
` (26 subsequent siblings)
49 siblings, 4 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
Implement a workaround for an SNP erratum where the CPU will incorrectly
signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
RMP entry of a VMCB, VMSA or AVIC backing page.
When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
backing pages as "in-use" via a reserved bit in the corresponding RMP
entry after a successful VMRUN. This is done for _all_ VMs, not just
SNP-Active VMs.
If the hypervisor accesses an in-use page through a writable
translation, the CPU will throw an RMP violation #PF. On early SNP
hardware, if an in-use page is 2mb aligned and software accesses any
part of the associated 2mb region with a hugepage, the CPU will
incorrectly treat the entire 2mb region as in-use and signal a spurious
RMP violation #PF.
The recommended workaround is to not use a hugepage for the VMCB, VMSA or
AVIC backing page for similar reasons. Add a generic allocator that will
ensure that the page returned is not a hugepage (2mb or 1gb) and is safe to
be used when SEV-SNP is enabled. Also implement similar handling for the
VMCB/VMSA pages of nested guests.
Co-developed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Marc Orr <marcorr@google.com>
Reported-by: Alper Gun <alpergun@google.com> # for nested VMSA case
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: squash in nested guest handling from Ashish]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/lapic.c | 5 ++++-
arch/x86/kvm/svm/nested.c | 2 +-
arch/x86/kvm/svm/sev.c | 33 ++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 17 ++++++++++++---
arch/x86/kvm/svm/svm.h | 1 +
7 files changed, 55 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index f1505a5fa781..4ef2eca14287 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -136,6 +136,7 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
KVM_X86_OP_OPTIONAL(gmem_invalidate)
+KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
#undef KVM_X86_OP
#undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fa401cb1a552..a3983271ea28 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1763,6 +1763,7 @@ struct kvm_x86_ops {
int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
+ void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
};
struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dcd60b39e794..631a554c0f48 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2810,7 +2810,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
vcpu->arch.apic = apic;
- apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+ if (kvm_x86_ops.alloc_apic_backing_page)
+ apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
+ else
+ apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!apic->regs) {
printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
vcpu->vcpu_id);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dd496c9e5f91..1f9a3f9eb985 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1194,7 +1194,7 @@ int svm_allocate_nested(struct vcpu_svm *svm)
if (svm->nested.initialized)
return 0;
- vmcb02_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ vmcb02_page = snp_safe_alloc_page(&svm->vcpu);
if (!vmcb02_page)
return -ENOMEM;
svm->nested.vmcb02.ptr = page_address(vmcb02_page);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 088b32657f46..1cfb9232fc74 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3211,3 +3211,36 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
break;
}
}
+
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
+{
+ unsigned long pfn;
+ struct page *p;
+
+ if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+
+ /*
+ * Allocate an SNP safe page to workaround the SNP erratum where
+ * the CPU will incorrectly signal an RMP violation #PF if a
+ * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
+ * or AVIC backing page. The recommended workaround is to not use the
+ * hugepage.
+ *
+ * Allocate one extra page, use a page which is not 2mb aligned
+ * and free the other.
+ */
+ p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
+ if (!p)
+ return NULL;
+
+ split_page(p, 1);
+
+ pfn = page_to_pfn(p);
+ if (IS_ALIGNED(pfn, PTRS_PER_PMD))
+ __free_page(p++);
+ else
+ __free_page(p + 1);
+
+ return p;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1e7fb1ea45f7..8e4ef0cd968a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -706,7 +706,7 @@ static int svm_cpu_init(int cpu)
int ret = -ENOMEM;
memset(sd, 0, sizeof(struct svm_cpu_data));
- sd->save_area = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ sd->save_area = snp_safe_alloc_page(NULL);
if (!sd->save_area)
return ret;
@@ -1425,7 +1425,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
svm = to_svm(vcpu);
err = -ENOMEM;
- vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ vmcb01_page = snp_safe_alloc_page(vcpu);
if (!vmcb01_page)
goto out;
@@ -1434,7 +1434,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
* SEV-ES guests require a separate VMSA page used to contain
* the encrypted register state of the guest.
*/
- vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ vmsa_page = snp_safe_alloc_page(vcpu);
if (!vmsa_page)
goto error_free_vmcb_page;
@@ -4876,6 +4876,16 @@ static int svm_vm_init(struct kvm *kvm)
return 0;
}
+static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
+{
+ struct page *page = snp_safe_alloc_page(vcpu);
+
+ if (!page)
+ return NULL;
+
+ return page_address(page);
+}
+
static struct kvm_x86_ops svm_x86_ops __initdata = {
.name = KBUILD_MODNAME,
@@ -5007,6 +5017,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
+ .alloc_apic_backing_page = svm_alloc_apic_backing_page,
};
/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c13070d00910..b7b8bf73cbb9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -694,6 +694,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
/* vmenter.S */
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
2023-10-16 13:27 ` [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
@ 2023-12-11 13:24 ` Vlastimil Babka
2023-12-12 0:00 ` Kalra, Ashish
2023-12-13 13:31 ` Paolo Bonzini
` (2 subsequent siblings)
3 siblings, 1 reply; 158+ messages in thread
From: Vlastimil Babka @ 2023-12-11 13:24 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun, jarkko,
ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang, Brijesh Singh
On 10/16/23 15:27, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> Implement a workaround for an SNP erratum where the CPU will incorrectly
> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> RMP entry of a VMCB, VMSA or AVIC backing page.
>
> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
> backing pages as "in-use" via a reserved bit in the corresponding RMP
> entry after a successful VMRUN. This is done for _all_ VMs, not just
> SNP-Active VMs.
>
> If the hypervisor accesses an in-use page through a writable
> translation, the CPU will throw an RMP violation #PF. On early SNP
> hardware, if an in-use page is 2mb aligned and software accesses any
> part of the associated 2mb region with a hupage, the CPU will
> incorrectly treat the entire 2mb region as in-use and signal a spurious
> RMP violation #PF.
>
> The recommended is to not use the hugepage for the VMCB, VMSA or
> AVIC backing page for similar reasons. Add a generic allocator that will
> ensure that the page returns is not hugepage (2mb or 1gb) and is safe to
This is a bit confusing wording as we are not avoiding "using a
hugepage" but AFAIU, avoiding using a (4k) page that has a hugepage
aligned physical address, right?
> be used when SEV-SNP is enabled. Also implement similar handling for the
> VMCB/VMSA pages of nested guests.
>
> Co-developed-by: Marc Orr <marcorr@google.com>
> Signed-off-by: Marc Orr <marcorr@google.com>
> Reported-by: Alper Gun <alpergun@google.com> # for nested VMSA case
> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> [mdr: squash in nested guest handling from Ashish]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
<snip>
> +
> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> +{
> + unsigned long pfn;
> + struct page *p;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +
> + /*
> + * Allocate an SNP safe page to workaround the SNP erratum where
> + * the CPU will incorrectly signal an RMP violation #PF if a
> + * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
> + * or AVIC backing page. The recommeded workaround is to not use the
> + * hugepage.
Same here "not use the hugepage"
> + *
> + * Allocate one extra page, use a page which is not 2mb aligned
> + * and free the other.
This makes more sense.
> + */
> + p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> + if (!p)
> + return NULL;
> +
> + split_page(p, 1);
Yeah I think that's a sensible use of split_page(), as we don't have
support for forcefully non-aligned allocations or specific "page
coloring" in the page allocator.
So even with my wording concerns:
Acked-by: Vlastimil Babka <vbabka@suse.cz>
> +
> + pfn = page_to_pfn(p);
> + if (IS_ALIGNED(pfn, PTRS_PER_PMD))
> + __free_page(p++);
> + else
> + __free_page(p + 1);
> +
> + return p;
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 1e7fb1ea45f7..8e4ef0cd968a 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -706,7 +706,7 @@ static int svm_cpu_init(int cpu)
> int ret = -ENOMEM;
>
> memset(sd, 0, sizeof(struct svm_cpu_data));
> - sd->save_area = alloc_page(GFP_KERNEL | __GFP_ZERO);
> + sd->save_area = snp_safe_alloc_page(NULL);
> if (!sd->save_area)
> return ret;
>
> @@ -1425,7 +1425,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
> svm = to_svm(vcpu);
>
> err = -ENOMEM;
> - vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> + vmcb01_page = snp_safe_alloc_page(vcpu);
> if (!vmcb01_page)
> goto out;
>
> @@ -1434,7 +1434,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
> * SEV-ES guests require a separate VMSA page used to contain
> * the encrypted register state of the guest.
> */
> - vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> + vmsa_page = snp_safe_alloc_page(vcpu);
> if (!vmsa_page)
> goto error_free_vmcb_page;
>
> @@ -4876,6 +4876,16 @@ static int svm_vm_init(struct kvm *kvm)
> return 0;
> }
>
> +static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
> +{
> + struct page *page = snp_safe_alloc_page(vcpu);
> +
> + if (!page)
> + return NULL;
> +
> + return page_address(page);
> +}
> +
> static struct kvm_x86_ops svm_x86_ops __initdata = {
> .name = KBUILD_MODNAME,
>
> @@ -5007,6 +5017,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>
> .vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> .vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
> + .alloc_apic_backing_page = svm_alloc_apic_backing_page,
> };
>
> /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index c13070d00910..b7b8bf73cbb9 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -694,6 +694,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
> void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
> void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
> void sev_es_unmap_ghcb(struct vcpu_svm *svm);
> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
>
> /* vmenter.S */
>
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
2023-12-11 13:24 ` Vlastimil Babka
@ 2023-12-12 0:00 ` Kalra, Ashish
0 siblings, 0 replies; 158+ messages in thread
From: Kalra, Ashish @ 2023-12-12 0:00 UTC (permalink / raw)
To: Vlastimil Babka, Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, kirill, ak,
tony.luck, marcorr, sathyanarayanan.kuppuswamy, alpergun, jarkko,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
Hello Vlastimil,
On 12/11/2023 7:24 AM, Vlastimil Babka wrote:
> On 10/16/23 15:27, Michael Roth wrote:
>> From: Brijesh Singh <brijesh.singh@amd.com>
>>
>> Implement a workaround for an SNP erratum where the CPU will incorrectly
>> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
>> RMP entry of a VMCB, VMSA or AVIC backing page.
>>
>> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
>> backing pages as "in-use" via a reserved bit in the corresponding RMP
>> entry after a successful VMRUN. This is done for _all_ VMs, not just
>> SNP-Active VMs.
>>
>> If the hypervisor accesses an in-use page through a writable
>> translation, the CPU will throw an RMP violation #PF. On early SNP
>> hardware, if an in-use page is 2mb aligned and software accesses any
>> part of the associated 2mb region with a hupage, the CPU will
>> incorrectly treat the entire 2mb region as in-use and signal a spurious
>> RMP violation #PF.
>>
>> The recommended is to not use the hugepage for the VMCB, VMSA or
>> AVIC backing page for similar reasons. Add a generic allocator that will
>> ensure that the page returns is not hugepage (2mb or 1gb) and is safe to
>
> This is a bit confusing wording as we are not avoiding "using a
> hugepage" but AFAIU, avoiding using a (4k) page that has a hugepage
> aligned physical address, right?
Yes.
>
>> be used when SEV-SNP is enabled. Also implement similar handling for the
>> VMCB/VMSA pages of nested guests.
>>
>> Co-developed-by: Marc Orr <marcorr@google.com>
>> Signed-off-by: Marc Orr <marcorr@google.com>
>> Reported-by: Alper Gun <alpergun@google.com> # for nested VMSA case
>> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> [mdr: squash in nested guest handling from Ashish]
>> Signed-off-by: Michael Roth <michael.roth@amd.com>
>> ---
>
> <snip>
>
>> +
>> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
>> +{
>> + unsigned long pfn;
>> + struct page *p;
>> +
>> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
>> + return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
>> +
>> + /*
>> + * Allocate an SNP safe page to workaround the SNP erratum where
>> + * the CPU will incorrectly signal an RMP violation #PF if a
>> + * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
>> + * or AVIC backing page. The recommeded workaround is to not use the
>> + * hugepage.
>
> Same here "not use the hugepage"
>
>> + *
>> + * Allocate one extra page, use a page which is not 2mb aligned
>> + * and free the other.
>
> This makes more sense.
>
>> + */
>> + p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
>> + if (!p)
>> + return NULL;
>> +
>> + split_page(p, 1);
> Yeah I think that's a sensible use of split_page(), as we don't have
> support for forcefully non-aligned allocations or specific "page
> coloring" in the page allocator.
Yes, using split_page() allows us to free the additionally allocated
page individually.
Thanks,
Ashish
> So even with my wording concerns:
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
2023-10-16 13:27 ` [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
2023-12-11 13:24 ` Vlastimil Babka
@ 2023-12-13 13:31 ` Paolo Bonzini
2023-12-13 18:45 ` Paolo Bonzini
2023-12-18 14:57 ` Borislav Petkov
3 siblings, 0 replies; 158+ messages in thread
From: Paolo Bonzini @ 2023-12-13 13:31 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On 10/16/23 15:27, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> Implement a workaround for an SNP erratum where the CPU will incorrectly
> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> RMP entry of a VMCB, VMSA or AVIC backing page.
>
> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
> backing pages as "in-use" via a reserved bit in the corresponding RMP
> entry after a successful VMRUN. This is done for _all_ VMs, not just
> SNP-Active VMs.
>
> If the hypervisor accesses an in-use page through a writable
> translation, the CPU will throw an RMP violation #PF. On early SNP
> hardware, if an in-use page is 2mb aligned and software accesses any
> part of the associated 2mb region with a hupage, the CPU will
> incorrectly treat the entire 2mb region as in-use and signal a spurious
> RMP violation #PF.
I don't understand if this can happen even if SEV-SNP is not in use,
just because it is supported on the host? If so, should this be Cc'd to
stable? (I can tweak the wording and submit it).
Paolo
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
2023-10-16 13:27 ` [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
2023-12-11 13:24 ` Vlastimil Babka
2023-12-13 13:31 ` Paolo Bonzini
@ 2023-12-13 18:45 ` Paolo Bonzini
2023-12-18 14:57 ` Borislav Petkov
3 siblings, 0 replies; 158+ messages in thread
From: Paolo Bonzini @ 2023-12-13 18:45 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, seanjc, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
On 10/16/23 15:27, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> Implement a workaround for an SNP erratum where the CPU will incorrectly
> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> RMP entry of a VMCB, VMSA or AVIC backing page.
>
> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
> backing pages as "in-use" via a reserved bit in the corresponding RMP
> entry after a successful VMRUN. This is done for _all_ VMs, not just
> SNP-Active VMs.
>
> If the hypervisor accesses an in-use page through a writable
> translation, the CPU will throw an RMP violation #PF. On early SNP
> hardware, if an in-use page is 2mb aligned and software accesses any
> part of the associated 2mb region with a hupage, the CPU will
> incorrectly treat the entire 2mb region as in-use and signal a spurious
> RMP violation #PF.
>
> The recommended is to not use the hugepage for the VMCB, VMSA or
> AVIC backing page for similar reasons. Add a generic allocator that will
> ensure that the page returns is not hugepage (2mb or 1gb) and is safe to
> be used when SEV-SNP is enabled. Also implement similar handling for the
> VMCB/VMSA pages of nested guests.
>
> Co-developed-by: Marc Orr <marcorr@google.com>
> Signed-off-by: Marc Orr <marcorr@google.com>
> Reported-by: Alper Gun <alpergun@google.com> # for nested VMSA case
> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> [mdr: squash in nested guest handling from Ashish]
> Signed-off-by: Michael Roth <michael.roth@amd.com>
Based on the discussion with Borislav, please move this earlier in the
series, before patch 6.
Paolo
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
2023-10-16 13:27 ` [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
` (2 preceding siblings ...)
2023-12-13 18:45 ` Paolo Bonzini
@ 2023-12-18 14:57 ` Borislav Petkov
3 siblings, 0 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-12-18 14:57 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang, Brijesh Singh
Just typos:
On Mon, Oct 16, 2023 at 08:27:52AM -0500, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> Implement a workaround for an SNP erratum where the CPU will incorrectly
> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> RMP entry of a VMCB, VMSA or AVIC backing page.
>
> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
> backing pages as "in-use" via a reserved bit in the corresponding RMP
> entry after a successful VMRUN. This is done for _all_ VMs, not just
> SNP-Active VMs.
>
> If the hypervisor accesses an in-use page through a writable
> translation, the CPU will throw an RMP violation #PF. On early SNP
> hardware, if an in-use page is 2mb aligned and software accesses any
> part of the associated 2mb region with a hupage, the CPU will
"hugepage"
> incorrectly treat the entire 2mb region as in-use and signal a spurious
> RMP violation #PF.
>
> The recommended is to not use the hugepage for the VMCB, VMSA or
s/recommended/recommendation/
s/the hugepage/a hugepage/
> AVIC backing page for similar reasons. Add a generic allocator that will
> ensure that the page returns is not hugepage (2mb or 1gb) and is safe to
"... the page returned is not a hugepage..."
...
> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> +{
> + unsigned long pfn;
> + struct page *p;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> + return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +
> + /*
> + * Allocate an SNP safe page to workaround the SNP erratum where
> + * the CPU will incorrectly signal an RMP violation #PF if a
> + * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
> + * or AVIC backing page. The recommeded workaround is to not use the
"recommended"
> + * hugepage.
> + *
> + * Allocate one extra page, use a page which is not 2mb aligned
> + * and free the other.
> + */
> + p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> + if (!p)
> + return NULL;
> +
> + split_page(p, 1);
> +
> + pfn = page_to_pfn(p);
> + if (IS_ALIGNED(pfn, PTRS_PER_PMD))
> + __free_page(p++);
> + else
> + __free_page(p + 1);
> +
> + return p;
> +}
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 158+ messages in thread
* [PATCH v10 24/50] KVM: SEV: Add initial SEV-SNP support
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (22 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 23/50] KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-12-18 17:43 ` Borislav Petkov
2023-10-16 13:27 ` [PATCH v10 25/50] KVM: SEV: Add KVM_SNP_INIT command Michael Roth
` (25 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
The next generation of SEV is called SEV-SNP (Secure Nested Paging).
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding new
hardware-based security protection. SEV-SNP adds strong memory encryption and
integrity protection to help prevent malicious hypervisor-based attacks
such as data replay, memory re-mapping, and more, to create an isolated
execution environment.
The SNP feature is added incrementally; later patches add a new module
parameter that can be used to enable SEV-SNP in KVM.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/kvm/svm/sev.c | 10 ++++++++++
arch/x86/kvm/svm/svm.h | 8 ++++++++
2 files changed, 18 insertions(+)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1cfb9232fc74..4eefc168ebb3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -59,10 +59,14 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
/* enable/disable SEV-ES DebugSwap support */
static bool sev_es_debug_swap_enabled = true;
module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
+
+/* enable/disable SEV-SNP support */
+static bool sev_snp_enabled;
#else
#define sev_enabled false
#define sev_es_enabled false
#define sev_es_debug_swap_enabled false
+#define sev_snp_enabled false
#endif /* CONFIG_KVM_AMD_SEV */
#define AP_RESET_HOLD_NONE 0
@@ -2186,6 +2190,7 @@ void __init sev_hardware_setup(void)
{
#ifdef CONFIG_KVM_AMD_SEV
unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
+ bool sev_snp_supported = false;
bool sev_es_supported = false;
bool sev_supported = false;
@@ -2261,6 +2266,10 @@ void __init sev_hardware_setup(void)
sev_es_asid_count = min_sev_asid - 1;
WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count));
sev_es_supported = true;
+ sev_snp_supported = sev_snp_enabled && cpu_feature_enabled(X86_FEATURE_SEV_SNP);
+
+ pr_info("SEV-ES %ssupported: %u ASIDs\n",
+ sev_snp_supported ? "and SEV-SNP " : "", sev_es_asid_count);
out:
if (boot_cpu_has(X86_FEATURE_SEV))
@@ -2277,6 +2286,7 @@ void __init sev_hardware_setup(void)
if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
!cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
sev_es_debug_swap_enabled = false;
+ sev_snp_enabled = sev_snp_supported;
#endif
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index b7b8bf73cbb9..635430fa641b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -79,6 +79,7 @@ enum {
struct kvm_sev_info {
bool active; /* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
+ bool snp_active; /* SEV-SNP enabled guest */
unsigned int asid; /* ASID used for this guest */
unsigned int handle; /* SEV firmware handle */
int fd; /* SEV device fd */
@@ -339,6 +340,13 @@ static __always_inline bool sev_es_guest(struct kvm *kvm)
#endif
}
+static __always_inline bool sev_snp_guest(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ return sev_es_guest(kvm) && sev->snp_active;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 24/50] KVM: SEV: Add initial SEV-SNP support
2023-10-16 13:27 ` [PATCH v10 24/50] KVM: SEV: Add initial SEV-SNP support Michael Roth
@ 2023-12-18 17:43 ` Borislav Petkov
0 siblings, 0 replies; 158+ messages in thread
From: Borislav Petkov @ 2023-12-18 17:43 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, vbabka, kirill,
ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun, jarkko,
ashish.kalra, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang, Brijesh Singh
On Mon, Oct 16, 2023 at 08:27:53AM -0500, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
>
> The next generation of SEV is called SEV-SNP (Secure Nested Paging).
> SEV-SNP builds upon existing SEV and SEV-ES functionality while adding new
> hardware based security protection. SEV-SNP adds strong memory encryption
> integrity protection to help prevent malicious hypervisor-based attacks
> such as data replay, memory re-mapping, and more, to create an isolated
> execution environment.
>
> The SNP feature is added incrementally, the later patches adds a new module
> parameters that can be used to enabled SEV-SNP in the KVM.
This sentence can simply go to /dev/null.
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
> arch/x86/kvm/svm/sev.c | 10 ++++++++++
> arch/x86/kvm/svm/svm.h | 8 ++++++++
> 2 files changed, 18 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 1cfb9232fc74..4eefc168ebb3 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -59,10 +59,14 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
> /* enable/disable SEV-ES DebugSwap support */
> static bool sev_es_debug_swap_enabled = true;
> module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
> +
> +/* enable/disable SEV-SNP support */
Useless comment.
> +static bool sev_snp_enabled;
> #else
> #define sev_enabled false
> #define sev_es_enabled false
> #define sev_es_debug_swap_enabled false
> +#define sev_snp_enabled false
> #endif /* CONFIG_KVM_AMD_SEV */
>
> #define AP_RESET_HOLD_NONE 0
> @@ -2186,6 +2190,7 @@ void __init sev_hardware_setup(void)
> {
> #ifdef CONFIG_KVM_AMD_SEV
> unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
> + bool sev_snp_supported = false;
> bool sev_es_supported = false;
> bool sev_supported = false;
>
> @@ -2261,6 +2266,10 @@ void __init sev_hardware_setup(void)
> sev_es_asid_count = min_sev_asid - 1;
> WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count));
> sev_es_supported = true;
> + sev_snp_supported = sev_snp_enabled && cpu_feature_enabled(X86_FEATURE_SEV_SNP);
> +
> + pr_info("SEV-ES %ssupported: %u ASIDs\n",
> + sev_snp_supported ? "and SEV-SNP " : "", sev_es_asid_count);
Why like this?
>
> out:
Here, below the "out:" label you're already dumping SEV and -ES status.
Just do SNP exactly the same.
> if (boot_cpu_has(X86_FEATURE_SEV))
> @@ -2277,6 +2286,7 @@ void __init sev_hardware_setup(void)
> if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
> !cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
> sev_es_debug_swap_enabled = false;
> + sev_snp_enabled = sev_snp_supported;
> #endif
> }
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 158+ messages in thread
* [PATCH v10 25/50] KVM: SEV: Add KVM_SNP_INIT command
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (23 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 24/50] KVM: SEV: Add initial SEV-SNP support Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-10-16 13:27 ` [PATCH v10 26/50] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
` (24 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Pavan Kumar Paluri
From: Brijesh Singh <brijesh.singh@amd.com>
The KVM_SNP_INIT command is used by the hypervisor to initialize the
SEV-SNP platform context. In a typical workflow, this command should be the
first command issued. When creating an SEV-SNP guest, the VMM must use this
command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.
The flags value must be zero; it will be extended in future SNP support to
communicate optional features (such as restricted interrupt injection).
Co-developed-by: Pavan Kumar Paluri <papaluri@amd.com>
Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 27 +++++++++++++
arch/x86/include/asm/svm.h | 1 +
arch/x86/kvm/svm/sev.c | 39 ++++++++++++++++++-
arch/x86/kvm/svm/svm.h | 4 ++
include/uapi/linux/kvm.h | 13 +++++++
5 files changed, 83 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index 995780088eb2..b1a19c9a577a 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -434,6 +434,33 @@ issued by the hypervisor to make the guest ready for execution.
Returns: 0 on success, -negative on error
+18. KVM_SNP_INIT
+----------------
+
+The KVM_SNP_INIT command can be used by the hypervisor to initialize the SEV-SNP
+context. In a typical workflow, this command should be the first command issued.
+
+Parameters (in/out): struct kvm_snp_init
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_snp_init {
+ __u64 flags;
+ };
+
+The flags bitmap is defined as::
+
+ /* enable the restricted injection */
+ #define KVM_SEV_SNP_RESTRICTED_INJET (1<<0)
+
+ /* enable the restricted injection timer */
+ #define KVM_SEV_SNP_RESTRICTED_TIMER_INJET (1<<1)
+
+If the specified flags are not supported, the call returns -EOPNOTSUPP and the
+supported flags are reported back to userspace.
+
References
==========
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 19bf955b67e0..a901f1daaefc 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -289,6 +289,7 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
+#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
struct vmcb_seg {
u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4eefc168ebb3..0cd2a850cb45 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -251,6 +251,25 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
sev_decommission(handle);
}
+static int verify_snp_init_flags(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_snp_init params;
+ int ret = 0;
+
+ if (copy_from_user(¶ms, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ if (params.flags & ~SEV_SNP_SUPPORTED_FLAGS)
+ ret = -EOPNOTSUPP;
+
+ params.flags = SEV_SNP_SUPPORTED_FLAGS;
+
+ if (copy_to_user((void __user *)(uintptr_t)argp->data, ¶ms, sizeof(params)))
+ ret = -EFAULT;
+
+ return ret;
+}
+
static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -264,12 +283,19 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
sev->active = true;
- sev->es_active = argp->id == KVM_SEV_ES_INIT;
+ sev->es_active = (argp->id == KVM_SEV_ES_INIT || argp->id == KVM_SEV_SNP_INIT);
+ sev->snp_active = argp->id == KVM_SEV_SNP_INIT;
asid = sev_asid_new(sev);
if (asid < 0)
goto e_no_asid;
sev->asid = asid;
+ if (sev->snp_active) {
+ ret = verify_snp_init_flags(kvm, argp);
+ if (ret)
+ goto e_free;
+ }
+
ret = sev_platform_init(&argp->error);
if (ret)
goto e_free;
@@ -285,6 +311,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
sev_asid_free(sev);
sev->asid = 0;
e_no_asid:
+ sev->snp_active = false;
sev->es_active = false;
sev->active = false;
return ret;
@@ -623,6 +650,10 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
if (sev_es_debug_swap_enabled)
save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
+ /* Enable the SEV-SNP feature */
+ if (sev_snp_guest(svm->vcpu.kvm))
+ save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+
pr_debug("Virtual Machine Save Area (VMSA):\n");
print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
@@ -1881,6 +1912,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
}
switch (sev_cmd.id) {
+ case KVM_SEV_SNP_INIT:
+ if (!sev_snp_enabled) {
+ r = -ENOTTY;
+ goto out;
+ }
+ fallthrough;
case KVM_SEV_ES_INIT:
if (!sev_es_enabled) {
r = -ENOTTY;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 635430fa641b..71f56bee0b90 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -76,6 +76,9 @@ enum {
/* TPR and CR2 are always written before VMRUN */
#define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))
+/* Supported init feature flags */
+#define SEV_SNP_SUPPORTED_FLAGS 0x0
+
struct kvm_sev_info {
bool active; /* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
@@ -91,6 +94,7 @@ struct kvm_sev_info {
struct list_head mirror_entry; /* Use as a list entry of mirrors */
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
+ u64 snp_init_flags;
};
struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 65fc983af840..a98a77f4fc4c 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1962,6 +1962,9 @@ enum sev_cmd_id {
/* Guest Migration Extension */
KVM_SEV_SEND_CANCEL,
+ /* SNP specific commands */
+ KVM_SEV_SNP_INIT,
+
KVM_SEV_NR_MAX,
};
@@ -2058,6 +2061,16 @@ struct kvm_sev_receive_update_data {
__u32 trans_len;
};
+/* enable the restricted injection */
+#define KVM_SEV_SNP_RESTRICTED_INJET (1 << 0)
+
+/* enable the restricted injection timer */
+#define KVM_SEV_SNP_RESTRICTED_TIMER_INJET (1 << 1)
+
+struct kvm_snp_init {
+ __u64 flags;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* [PATCH v10 26/50] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (24 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 25/50] KVM: SEV: Add KVM_SNP_INIT command Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-10-16 13:27 ` [PATCH v10 27/50] KVM: Add HVA range operator Michael Roth
` (23 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct
the measurement of the guest. If the guest is expected to be migrated,
the command also binds a migration agent (MA) to the guest.
For more information see the SEV-SNP specification.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: hold sev_deactivate_lock when calling SEV_CMD_SNP_DECOMMISSION]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 24 +++
arch/x86/kvm/svm/sev.c | 144 +++++++++++++++++-
arch/x86/kvm/svm/svm.h | 1 +
include/uapi/linux/kvm.h | 10 ++
4 files changed, 176 insertions(+), 3 deletions(-)
diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index b1a19c9a577a..b1beb2fe8766 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -461,6 +461,30 @@ The flags bitmap is defined as::
If the specified flags are not supported, the call returns -EOPNOTSUPP and the
supported flags are reported back to userspace.
+19. KVM_SNP_LAUNCH_START
+------------------------
+
+The KVM_SNP_LAUNCH_START command is used for creating the memory encryption
+context for the SEV-SNP guest. To create the encryption context, the user must
+provide a guest policy, a migration agent (if any) and a guest OS visible
+workarounds value as defined in the SEV-SNP specification.
+
+Parameters (in): struct kvm_snp_launch_start
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_start {
+ __u64 policy; /* Guest policy to use. */
+ __u64 ma_uaddr; /* userspace address of migration agent */
+ __u8 ma_en; /* 1 if the migration agent is enabled */
+ __u8 imi_en; /* set IMI to 1. */
+ __u8 gosvw[16]; /* guest OS visible workarounds */
+ };
+
+See the SEV-SNP specification for further detail on the launch input.
+
References
==========
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0cd2a850cb45..a4efd1858a9c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -24,6 +24,7 @@
#include <asm/trapnr.h>
#include <asm/fpu/xcr.h>
#include <asm/debugreg.h>
+#include <asm/sev-host.h>
#include "mmu.h"
#include "x86.h"
@@ -73,6 +74,10 @@ static bool sev_snp_enabled;
#define AP_RESET_HOLD_NAE_EVENT 1
#define AP_RESET_HOLD_MSR_PROTO 2
+/* As defined by SEV-SNP Firmware ABI, under "Guest Policy". */
+#define SNP_POLICY_MASK_SMT BIT_ULL(16)
+#define SNP_POLICY_MASK_SINGLE_SOCKET BIT_ULL(20)
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -83,6 +88,8 @@ static unsigned int nr_asids;
static unsigned long *sev_asid_bitmap;
static unsigned long *sev_reclaim_asid_bitmap;
+static int snp_decommission_context(struct kvm *kvm);
+
struct enc_region {
struct list_head list;
unsigned long npages;
@@ -108,12 +115,17 @@ static int sev_flush_asids(int min_asid, int max_asid)
down_write(&sev_deactivate_lock);
wbinvd_on_all_cpus();
- ret = sev_guest_df_flush(&error);
+
+ if (sev_snp_enabled)
+ ret = sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, &error);
+ else
+ ret = sev_guest_df_flush(&error);
up_write(&sev_deactivate_lock);
if (ret)
- pr_err("SEV: DF_FLUSH failed, ret=%d, error=%#x\n", ret, error);
+ pr_err("SEV%s: DF_FLUSH failed, ret=%d, error=%#x\n",
+ sev_snp_enabled ? "-SNP" : "", ret, error);
return ret;
}
@@ -1888,6 +1900,94 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}
+/*
+ * The guest context contains all the information, keys and metadata
+ * associated with the guest that the firmware tracks to implement SEV
+ * and SNP features. The firmware stores the guest context in a
+ * hypervisor-provided page via the SNP_GCTX_CREATE command.
+ */
+static void *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct sev_data_snp_addr data = {};
+ void *context;
+ int rc;
+
+ /* Allocate memory for context page */
+ context = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT);
+ if (!context)
+ return NULL;
+
+ data.gctx_paddr = __psp_pa(context);
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, &data, &argp->error);
+ if (rc) {
+ snp_free_firmware_page(context);
+ return NULL;
+ }
+
+ return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_activate data = {0};
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.asid = sev_get_asid(kvm);
+ return sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, &data, error);
+}
+
+static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_start start = {0};
+ struct kvm_sev_snp_launch_start params;
+ int rc;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (copy_from_user(¶ms, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ /* Don't allow userspace to allocate memory for more than 1 SNP context. */
+ if (sev->snp_context)
+ return -EINVAL;
+
+ sev->snp_context = snp_context_create(kvm, argp);
+ if (!sev->snp_context)
+ return -ENOTTY;
+
+ if (params.policy & SNP_POLICY_MASK_SINGLE_SOCKET) {
+ pr_warn("SEV-SNP hypervisor does not support limiting guests to a single socket.");
+ return -EINVAL;
+ }
+
+ if (!(params.policy & SNP_POLICY_MASK_SMT)) {
+ pr_warn("SEV-SNP hypervisor does not support limiting guests to a single SMT thread.");
+ return -EINVAL;
+ }
+
+ start.gctx_paddr = __psp_pa(sev->snp_context);
+ start.policy = params.policy;
+ memcpy(start.gosvw, params.gosvw, sizeof(params.gosvw));
+ rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_START, &start, &argp->error);
+ if (rc)
+ goto e_free_context;
+
+ sev->fd = argp->sev_fd;
+ rc = snp_bind_asid(kvm, &argp->error);
+ if (rc)
+ goto e_free_context;
+
+ return 0;
+
+e_free_context:
+ snp_decommission_context(kvm);
+
+ return rc;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -1978,6 +2078,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_RECEIVE_FINISH:
r = sev_receive_finish(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_START:
+ r = snp_launch_start(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2170,6 +2273,33 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
return ret;
}
+static int snp_decommission_context(struct kvm *kvm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_addr data = {};
+ int ret;
+
+ /* If context is not created then do nothing */
+ if (!sev->snp_context)
+ return 0;
+
+ data.gctx_paddr = __sme_pa(sev->snp_context);
+ down_write(&sev_deactivate_lock);
+ ret = sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, &data, NULL);
+ if (WARN_ONCE(ret, "failed to release guest context")) {
+ up_write(&sev_deactivate_lock);
+ return ret;
+ }
+
+ up_write(&sev_deactivate_lock);
+
+ /* free the context page now */
+ snp_free_firmware_page(sev->snp_context);
+ sev->snp_context = NULL;
+
+ return 0;
+}
+
void sev_vm_destroy(struct kvm *kvm)
{
struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
@@ -2211,7 +2341,15 @@ void sev_vm_destroy(struct kvm *kvm)
}
}
- sev_unbind_asid(kvm, sev->handle);
+ if (sev_snp_guest(kvm)) {
+ if (snp_decommission_context(kvm)) {
+ WARN_ONCE(1, "Failed to free SNP guest context, leaking asid!\n");
+ return;
+ }
+ } else {
+ sev_unbind_asid(kvm, sev->handle);
+ }
+
sev_asid_free(sev);
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 71f56bee0b90..f86dd7d09441 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -95,6 +95,7 @@ struct kvm_sev_info {
struct misc_cg *misc_cg; /* For misc cgroup accounting */
atomic_t migration_in_progress;
u64 snp_init_flags;
+ void *snp_context; /* SNP guest context page */
};
struct kvm_svm {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a98a77f4fc4c..e92da3d4f569 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1964,6 +1964,7 @@ enum sev_cmd_id {
/* SNP specific commands */
KVM_SEV_SNP_INIT,
+ KVM_SEV_SNP_LAUNCH_START,
KVM_SEV_NR_MAX,
};
@@ -2071,6 +2072,15 @@ struct kvm_snp_init {
__u64 flags;
};
+struct kvm_sev_snp_launch_start {
+ __u64 policy;
+ __u64 ma_uaddr;
+ __u8 ma_en;
+ __u8 imi_en;
+ __u8 gosvw[16];
+ __u8 pad[6];
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* [PATCH v10 27/50] KVM: Add HVA range operator
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (25 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 26/50] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-10-16 13:27 ` [PATCH v10 28/50] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
` (22 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Vishal Annapurve
From: Vishal Annapurve <vannapurve@google.com>
Introduce an HVA range operator so that other KVM subsystems
can operate on an HVA range.
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
[mdr: minor checkpatch alignment fixups]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
include/linux/kvm_host.h | 6 +++++
virt/kvm/kvm_main.c | 49 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 840a5be5962a..f5453006b98d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1431,6 +1431,12 @@ void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
void kvm_mmu_invalidate_end(struct kvm *kvm);
bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
+typedef int (*kvm_hva_range_op_t)(struct kvm *kvm,
+ struct kvm_gfn_range *range, void *data);
+
+int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
+ unsigned long hva_end, kvm_hva_range_op_t handler, void *data);
+
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
long kvm_arch_vcpu_ioctl(struct file *filp,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 959e866c84f0..2ad452a13d82 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -676,6 +676,55 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
return r;
}
+int kvm_vm_do_hva_range_op(struct kvm *kvm, unsigned long hva_start,
+ unsigned long hva_end, kvm_hva_range_op_t handler, void *data)
+{
+ int ret = 0;
+ struct kvm_gfn_range gfn_range;
+ struct kvm_memory_slot *slot;
+ struct kvm_memslots *slots;
+ int i, idx;
+
+ if (WARN_ON_ONCE(hva_end <= hva_start))
+ return -EINVAL;
+
+ idx = srcu_read_lock(&kvm->srcu);
+
+ for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
+ struct interval_tree_node *node;
+
+ slots = __kvm_memslots(kvm, i);
+ kvm_for_each_memslot_in_hva_range(node, slots,
+ hva_start, hva_end - 1) {
+ unsigned long start, end;
+
+ slot = container_of(node, struct kvm_memory_slot,
+ hva_node[slots->node_idx]);
+ start = max(hva_start, slot->userspace_addr);
+ end = min(hva_end, slot->userspace_addr +
+ (slot->npages << PAGE_SHIFT));
+
+ /*
+ * {gfn(page) | page intersects with [hva_start, hva_end)} =
+ * {gfn_start, gfn_start+1, ..., gfn_end-1}.
+ */
+ gfn_range.start = hva_to_gfn_memslot(start, slot);
+ gfn_range.end = hva_to_gfn_memslot(end + PAGE_SIZE - 1, slot);
+ gfn_range.slot = slot;
+
+ ret = handler(kvm, &gfn_range, data);
+ if (ret)
+ goto e_ret;
+ }
+ }
+
+e_ret:
+ srcu_read_unlock(&kvm->srcu, idx);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_vm_do_hva_range_op);
+
static __always_inline int kvm_handle_hva_range(struct mmu_notifier *mn,
unsigned long start,
unsigned long end,
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* [PATCH v10 28/50] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (26 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 27/50] KVM: Add HVA range operator Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-10-16 13:27 ` [PATCH v10 29/50] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
` (21 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
guest's memory. The data is encrypted with the cryptographic context
created with KVM_SEV_SNP_LAUNCH_START.
In addition to inserting data, it can insert two special pages
into the guest's memory: the secrets page and the CPUID page.
When terminating the guest, reclaim the guest pages added to the RMP
table. If the reclaim fails, the page is no longer safe to release back
to the system, so leak it instead.
For more information see the SEV-SNP specification.
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 28 +++
arch/x86/kvm/svm/sev.c | 181 ++++++++++++++++++
include/uapi/linux/kvm.h | 19 ++
3 files changed, 228 insertions(+)
diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index b1beb2fe8766..d4325b26724c 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -485,6 +485,34 @@ Returns: 0 on success, -negative on error
See the SEV-SNP specification for further detail on the launch input.
+20. KVM_SNP_LAUNCH_UPDATE
+-------------------------
+
+The KVM_SNP_LAUNCH_UPDATE command is used for encrypting a memory region. It also
+calculates a measurement of the memory contents. The measurement is a signature
+of the memory contents that can be sent to the guest owner as an attestation
+that the memory was encrypted correctly by the firmware.
+
+Parameters (in): struct kvm_snp_launch_update
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_update {
+ __u64 start_gfn; /* Guest page number to start from. */
+ __u64 uaddr; /* userspace address of the memory to be encrypted */
+ __u32 len; /* length of memory region */
+ __u8 imi_page; /* 1 if memory is part of the IMI */
+ __u8 page_type; /* page type */
+ __u8 vmpl3_perms; /* VMPL3 permission mask */
+ __u8 vmpl2_perms; /* VMPL2 permission mask */
+ __u8 vmpl1_perms; /* VMPL1 permission mask */
+ };
+
+See the SEV-SNP spec for further details on how to build the VMPL permission
+mask and page type.
+
References
==========
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a4efd1858a9c..c505e4620456 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -246,6 +246,36 @@ static void sev_decommission(unsigned int handle)
sev_guest_decommission(&decommission, NULL);
}
+static int snp_page_reclaim(u64 pfn)
+{
+ struct sev_data_snp_page_reclaim data = {0};
+ int err, rc;
+
+ data.paddr = __sme_set(pfn << PAGE_SHIFT);
+ rc = sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, &data, &err);
+ if (rc) {
+ /*
+ * If the reclaim failed, then page is no longer safe
+ * to use.
+ */
+ snp_leak_pages(pfn, 1);
+ }
+
+ return rc;
+}
+
+static int host_rmp_make_shared(u64 pfn, enum pg_level level, bool leak)
+{
+ int rc;
+
+ rc = rmp_make_shared(pfn, level);
+ if (rc && leak)
+ snp_leak_pages(pfn,
+ page_level_size(level) >> PAGE_SHIFT);
+
+ return rc;
+}
+
static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
{
struct sev_data_deactivate deactivate;
@@ -1988,6 +2018,154 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
return rc;
}
+static int snp_launch_update_gfn_handler(struct kvm *kvm,
+ struct kvm_gfn_range *range,
+ void *opaque)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_memory_slot *memslot = range->slot;
+ struct sev_data_snp_launch_update data = {0};
+ struct kvm_sev_snp_launch_update params;
+ struct kvm_sev_cmd *argp = opaque;
+ int *error = &argp->error;
+ int i, n = 0, ret = 0;
+ unsigned long npages;
+ kvm_pfn_t *pfns;
+ gfn_t gfn;
+
+ if (!kvm_slot_can_be_private(memslot)) {
+ pr_err("SEV-SNP requires private memory support via guest_memfd.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params))) {
+ pr_err("Failed to copy user parameters for SEV-SNP launch.\n");
+ return -EFAULT;
+ }
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+
+ npages = range->end - range->start;
+ pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL_ACCOUNT);
+ if (!pfns)
+ return -ENOMEM;
+
+ pr_debug("%s: GFN range 0x%llx-0x%llx, type %d\n", __func__,
+ range->start, range->end, params.page_type);
+
+ for (gfn = range->start, i = 0; gfn < range->end; gfn++, i++) {
+ int order, level;
+ bool assigned;
+ void *kvaddr;
+
+ ret = __kvm_gmem_get_pfn(kvm, memslot, gfn, &pfns[i], &order, false);
+ if (ret)
+ goto e_release;
+
+ n++;
+ ret = snp_lookup_rmpentry((u64)pfns[i], &assigned, &level);
+ if (ret || assigned) {
+ pr_err("Failed to ensure GFN 0x%llx is in initial shared state, ret: %d, assigned: %d\n",
+ gfn, ret, assigned);
+ ret = -EFAULT;
+ goto e_release;
+ }
+
+ kvaddr = pfn_to_kaddr(pfns[i]);
+ if (!virt_addr_valid(kvaddr)) {
+ pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn);
+ ret = -EINVAL;
+ goto e_release;
+ }
+
+ ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+ if (ret) {
+ pr_err("Guest read failed, ret: 0x%x\n", ret);
+ goto e_release;
+ }
+
+ ret = rmp_make_private(pfns[i], gfn << PAGE_SHIFT, PG_LEVEL_4K,
+ sev_get_asid(kvm), true);
+ if (ret) {
+ ret = -EFAULT;
+ goto e_release;
+ }
+
+ data.address = __sme_set(pfns[i] << PAGE_SHIFT);
+ data.page_size = X86_TO_RMP_PG_LEVEL(PG_LEVEL_4K);
+ data.page_type = params.page_type;
+ data.vmpl3_perms = params.vmpl3_perms;
+ data.vmpl2_perms = params.vmpl2_perms;
+ data.vmpl1_perms = params.vmpl1_perms;
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &data, error);
+ if (ret) {
+ pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n",
+ ret, *error);
+ snp_page_reclaim(pfns[i]);
+
+ /*
+ * When invalid CPUID function entries are detected, the firmware
+ * corrects these entries for debugging purposes and leaves the
+ * page unencrypted so it can be provided to users for debugging
+ * and error-reporting.
+ *
+ * Copy the corrected CPUID page back to shared memory so
+ * userspace can retrieve this information.
+ */
+ if (params.page_type == SNP_PAGE_TYPE_CPUID &&
+ *error == SEV_RET_INVALID_PARAM) {
+ int ret;
+
+ host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+ ret = kvm_write_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE);
+ if (ret)
+ pr_err("Failed to write CPUID page back to userspace, ret: 0x%x\n",
+ ret);
+ }
+
+ goto e_release;
+ }
+ }
+
+e_release:
+ /* Content of memory is updated, mark pages dirty */
+ for (i = 0; i < n; i++) {
+ set_page_dirty(pfn_to_page(pfns[i]));
+ mark_page_accessed(pfn_to_page(pfns[i]));
+
+ /*
+ * If it's an error, then update the RMP entry to change page ownership
+ * to the hypervisor.
+ */
+ if (ret)
+ host_rmp_make_shared(pfns[i], PG_LEVEL_4K, true);
+
+ put_page(pfn_to_page(pfns[i]));
+ }
+
+ kvfree(pfns);
+ return ret;
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_launch_update params;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ return kvm_vm_do_hva_range_op(kvm, params.uaddr, params.uaddr + params.len,
+ snp_launch_update_gfn_handler, argp);
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2081,6 +2259,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_START:
r = snp_launch_start(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_UPDATE:
+ r = snp_launch_update(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e92da3d4f569..264e6acb7947 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1965,6 +1965,7 @@ enum sev_cmd_id {
/* SNP specific commands */
KVM_SEV_SNP_INIT,
KVM_SEV_SNP_LAUNCH_START,
+ KVM_SEV_SNP_LAUNCH_UPDATE,
KVM_SEV_NR_MAX,
};
@@ -2081,6 +2082,24 @@ struct kvm_sev_snp_launch_start {
__u8 pad[6];
};
+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL 0x1
+#define KVM_SEV_SNP_PAGE_TYPE_VMSA 0x2
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO 0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED 0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS 0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID 0x6
+
+struct kvm_sev_snp_launch_update {
+ __u64 start_gfn;
+ __u64 uaddr;
+ __u32 len;
+ __u8 imi_page;
+ __u8 page_type;
+ __u8 vmpl3_perms;
+ __u8 vmpl2_perms;
+ __u8 vmpl1_perms;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* [PATCH v10 29/50] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (27 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 28/50] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-10-16 13:27 ` [PATCH v10 30/50] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT Michael Roth
` (20 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Harald Hoyer
From: Brijesh Singh <brijesh.singh@amd.com>
The KVM_SEV_SNP_LAUNCH_FINISH command finalizes the cryptographic digest and stores
it as the measurement of the guest at launch.
While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
to encrypt the VMSA pages.
For an SNP guest, the VMSA page was added to the RMP table as a
guest-owned page and was also removed from the kernel direct map,
so flush it only after it has been transitioned back to hypervisor
state and restored in the direct map.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Harald Hoyer <harald@profian.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: always measure BSP first to get consistent launch measurements]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 24 +++
arch/x86/kvm/svm/sev.c | 146 ++++++++++++++++++
include/uapi/linux/kvm.h | 14 ++
3 files changed, 184 insertions(+)
diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index d4325b26724c..b89634cfcc06 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -513,6 +513,30 @@ Returns: 0 on success, -negative on error
See the SEV-SNP spec for further details on how to build the VMPL permission
mask and page type.
+21. KVM_SNP_LAUNCH_FINISH
+-------------------------
+
+After completion of the SNP guest launch flow, the KVM_SNP_LAUNCH_FINISH command can be
+issued to make the guest ready for execution.
+
+Parameters (in): struct kvm_sev_snp_launch_finish
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 host_data[32];
+ __u8 pad[6];
+ };
+
+
+See the SEV-SNP specification for further details on the launch finish input parameters.
+
References
==========
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c505e4620456..ae9f765dfa95 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -78,6 +78,8 @@ static bool sev_snp_enabled;
#define SNP_POLICY_MASK_SMT BIT_ULL(16)
#define SNP_POLICY_MASK_SINGLE_SOCKET BIT_ULL(20)
+#define INITIAL_VMSA_GPA 0xFFFFFFFFF000
+
static u8 sev_enc_bit;
static DECLARE_RWSEM(sev_deactivate_lock);
static DEFINE_MUTEX(sev_bitmap_lock);
@@ -747,7 +749,29 @@ static int sev_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
if (!sev_es_guest(kvm))
return -ENOTTY;
+ /* Handle boot vCPU first to ensure consistent measurement of initial state. */
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (vcpu->vcpu_id != 0)
+ continue;
+
+ ret = mutex_lock_killable(&vcpu->mutex);
+ if (ret)
+ return ret;
+
+ ret = __sev_launch_update_vmsa(kvm, vcpu, &argp->error);
+
+ mutex_unlock(&vcpu->mutex);
+ if (ret)
+ return ret;
+
+ break;
+ }
+
+ /* Handle remaining vCPUs. */
kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (vcpu->vcpu_id == 0)
+ continue;
+
ret = mutex_lock_killable(&vcpu->mutex);
if (ret)
return ret;
@@ -2166,6 +2190,109 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
snp_launch_update_gfn_handler, argp);
}
+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct sev_data_snp_launch_update data = {};
+ struct kvm_vcpu *vcpu;
+ unsigned long i;
+ int ret;
+
+ data.gctx_paddr = __psp_pa(sev->snp_context);
+ data.page_type = SNP_PAGE_TYPE_VMSA;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ struct vcpu_svm *svm = to_svm(vcpu);
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ /* Perform some pre-encryption checks against the VMSA */
+ ret = sev_es_sync_vmsa(svm);
+ if (ret)
+ return ret;
+
+ /* Transition the VMSA page to a firmware state. */
+ ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
+ if (ret)
+ return ret;
+
+ /* Issue the SNP command to encrypt the VMSA */
+ data.address = __sme_pa(svm->sev_es.vmsa);
+ ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE,
+ &data, &argp->error);
+ if (ret) {
+ snp_page_reclaim(pfn);
+ return ret;
+ }
+
+ svm->vcpu.arch.guest_state_protected = true;
+ }
+
+ return 0;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_launch_finish params;
+ struct sev_data_snp_launch_finish *data;
+ void *id_block = NULL, *id_auth = NULL;
+ int ret;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, sizeof(params)))
+ return -EFAULT;
+
+ /* Measure all vCPUs using LAUNCH_UPDATE before finalizing the launch flow. */
+ ret = snp_launch_update_vmsa(kvm, argp);
+ if (ret)
+ return ret;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+ if (!data)
+ return -ENOMEM;
+
+ if (params.id_block_en) {
+ id_block = psp_copy_user_blob(params.id_block_uaddr, KVM_SEV_SNP_ID_BLOCK_SIZE);
+ if (IS_ERR(id_block)) {
+ ret = PTR_ERR(id_block);
+ goto e_free;
+ }
+
+ data->id_block_en = 1;
+ data->id_block_paddr = __sme_pa(id_block);
+
+ id_auth = psp_copy_user_blob(params.id_auth_uaddr, KVM_SEV_SNP_ID_AUTH_SIZE);
+ if (IS_ERR(id_auth)) {
+ ret = PTR_ERR(id_auth);
+ goto e_free_id_block;
+ }
+
+ data->id_auth_paddr = __sme_pa(id_auth);
+
+ if (params.auth_key_en)
+ data->auth_key_en = 1;
+ }
+
+ memcpy(data->host_data, params.host_data, KVM_SEV_SNP_FINISH_DATA_SIZE);
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, &argp->error);
+
+ kfree(id_auth);
+
+e_free_id_block:
+ kfree(id_block);
+
+e_free:
+ kfree(data);
+
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2262,6 +2389,9 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_UPDATE:
r = snp_launch_update(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_LAUNCH_FINISH:
+ r = snp_launch_finish(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2730,11 +2860,27 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
svm = to_svm(vcpu);
+ /*
+ * If it's an SNP guest, the VMSA was added to the RMP table as
+ * a guest-owned page. Transition the page to hypervisor state
+ * before releasing it back to the system.
+ * The page was also removed from the kernel direct map, so flush it
+ * later, after it is transitioned back to hypervisor state and
+ * restored in the direct map.
+ */
+ if (sev_snp_guest(vcpu->kvm)) {
+ u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
+
+ if (host_rmp_make_shared(pfn, PG_LEVEL_4K, true))
+ goto skip_vmsa_free;
+ }
+
if (vcpu->arch.guest_state_protected)
sev_flush_encrypted_page(vcpu, svm->sev_es.vmsa);
__free_page(virt_to_page(svm->sev_es.vmsa));
+skip_vmsa_free:
if (svm->sev_es.ghcb_sa_free)
kvfree(svm->sev_es.ghcb_sa);
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 264e6acb7947..6f7b44b32497 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1966,6 +1966,7 @@ enum sev_cmd_id {
KVM_SEV_SNP_INIT,
KVM_SEV_SNP_LAUNCH_START,
KVM_SEV_SNP_LAUNCH_UPDATE,
+ KVM_SEV_SNP_LAUNCH_FINISH,
KVM_SEV_NR_MAX,
};
@@ -2100,6 +2101,19 @@ struct kvm_sev_snp_launch_update {
__u8 vmpl1_perms;
};
+#define KVM_SEV_SNP_ID_BLOCK_SIZE 96
+#define KVM_SEV_SNP_ID_AUTH_SIZE 4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE 32
+
+struct kvm_sev_snp_launch_finish {
+ __u64 id_block_uaddr;
+ __u64 id_auth_uaddr;
+ __u8 id_block_en;
+ __u8 auth_key_en;
+ __u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+ __u8 pad[6];
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1
* [PATCH v10 30/50] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (28 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 29/50] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command Michael Roth
@ 2023-10-16 13:27 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 31/50] KVM: SEV: Add KVM_EXIT_VMGEXIT Michael Roth
` (19 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:27 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
SEV-SNP guests are required to perform GHCB GPA registration. Before
using a GHCB GPA for a vCPU for the first time, a guest must register the
vCPU's GHCB GPA. If the hypervisor can work with the guest-requested GPA,
then it must respond back with the same GPA; otherwise, it returns -1.
On VMEXIT, verify that the GHCB GPA matches the registered value. If a
mismatch is detected, then abort the guest.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/sev-common.h | 8 ++++++++
arch/x86/kvm/svm/sev.c | 28 ++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 7 +++++++
3 files changed, 43 insertions(+)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 9ba88973a187..9febc1474a30 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -59,6 +59,14 @@
#define GHCB_MSR_AP_RESET_HOLD_RESULT_POS 12
#define GHCB_MSR_AP_RESET_HOLD_RESULT_MASK GENMASK_ULL(51, 0)
+/* Preferred GHCB GPA Request */
+#define GHCB_MSR_PREF_GPA_REQ 0x010
+#define GHCB_MSR_GPA_VALUE_POS 12
+#define GHCB_MSR_GPA_VALUE_MASK GENMASK_ULL(51, 0)
+
+#define GHCB_MSR_PREF_GPA_RESP 0x011
+#define GHCB_MSR_PREF_GPA_NONE 0xfffffffffffff
+
/* GHCB GPA Register */
#define GHCB_MSR_REG_GPA_REQ 0x012
#define GHCB_MSR_REG_GPA_REQ_VAL(v) \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ae9f765dfa95..d9c3ecef2710 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3348,6 +3348,27 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_PREF_GPA_REQ: {
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
+ case GHCB_MSR_REG_GPA_REQ: {
+ u64 gfn;
+
+ gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+
+ svm->sev_es.ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+ set_ghcb_msr_bits(svm, gfn, GHCB_MSR_GPA_VALUE_MASK,
+ GHCB_MSR_GPA_VALUE_POS);
+ set_ghcb_msr_bits(svm, GHCB_MSR_REG_GPA_RESP, GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+ break;
+ }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;
@@ -3411,6 +3432,13 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
trace_kvm_vmgexit_enter(vcpu->vcpu_id, svm->sev_es.ghcb);
sev_es_sync_from_ghcb(svm);
+
+ /* SEV-SNP guest requires that the GHCB GPA must be registered */
+ if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, ghcb_gpa)) {
+ vcpu_unimpl(&svm->vcpu, "vmgexit: GHCB GPA [%#llx] is not registered.\n", ghcb_gpa);
+ return -EINVAL;
+ }
+
ret = sev_es_validate_vmgexit(svm);
if (ret)
return ret;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f86dd7d09441..c4449a88e629 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -209,6 +209,8 @@ struct vcpu_sev_es_state {
u32 ghcb_sa_len;
bool ghcb_sa_sync;
bool ghcb_sa_free;
+
+ u64 ghcb_registered_gpa;
};
struct vcpu_svm {
@@ -352,6 +354,11 @@ static __always_inline bool sev_snp_guest(struct kvm *kvm)
return sev_es_guest(kvm) && sev->snp_active;
}
+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+ return svm->sev_es.ghcb_registered_gpa == val;
+}
+
static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
{
vmcb->control.clean = 0;
--
2.25.1
* [PATCH v10 31/50] KVM: SEV: Add KVM_EXIT_VMGEXIT
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (29 preceding siblings ...)
2023-10-16 13:27 ` [PATCH v10 30/50] KVM: SEV: Add support to handle GHCB GPA register VMGEXIT Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 32/50] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
` (18 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
For private memslots, GHCB page state change requests will be forwarded
to userspace for processing. Define a new KVM_EXIT_VMGEXIT for exits of
this type, as well as other potential userspace handling for VMGEXITs in
the future.
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
Documentation/virt/kvm/api.rst | 34 ++++++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 6 ++++++
2 files changed, 40 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 5e08f2a157ef..e84c62423ab7 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6847,6 +6847,40 @@ Please note that the kernel is allowed to use the kvm_run structure as the
primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
+::
+
+ /* KVM_EXIT_VMGEXIT */
+ struct {
+ __u64 ghcb_msr; /* GHCB MSR contents */
+ __u64 ret; /* user -> kernel return value */
+ } memory;
+
+If exit reason is KVM_EXIT_VMGEXIT then it indicates that an SEV-SNP guest has
+issued a VMGEXIT instruction (as documented by the AMD Architecture
+Programmer's Manual (APM)) to the hypervisor that needs to be serviced by
+userspace. This is generally handled via the Guest-Hypervisor Communication
+Block (GHCB) specification. The value of 'ghcb_msr' will be the contents of
+the GHCB MSR register at the time of the VMGEXIT, which can either be the GPA
+of the GHCB page for page-based GHCB requests, or an encoding of an MSR-based
+GHCB request. The mechanism to distinguish between these two and determine the
+type of request is the same as what is documented in the GHCB specification.
+
+Not all VMGEXITs or GHCB requests will be forwarded to userspace. Currently
+this will only be the case for "SNP Page State Change" requests (PSCs), and
+only for the subset of these which involve actual shared <-> private
+transitions. Userspace is expected to process these requests in accordance
+with the GHCB specification and issue KVM_SET_MEMORY_ATTRIBUTE ioctls to
+perform the shared/private transitions.
+
+GHCB page-based PSC requests require returning a 64-bit return value to the
+guest via the SW_EXITINFO2 field of the vCPU's VMCB structure, as documented
+in the GHCB. Userspace must set 'ret' to what the GHCB specification documents
+the SW_EXITINFO2 VMCB field should be set to after processing a PSC request.
+
+For MSR-based PSC requests, userspace must set the value of 'ghcb_msr' to be
+the same as what the GHCB specification documents the actual GHCB MSR register
+should be set to after processing a PSC request.
+
6. Capabilities that can be enabled on vCPUs
============================================
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6f7b44b32497..3af546adb962 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -279,6 +279,7 @@ struct kvm_xen_exit {
#define KVM_EXIT_RISCV_CSR 36
#define KVM_EXIT_NOTIFY 37
#define KVM_EXIT_MEMORY_FAULT 38
+#define KVM_EXIT_VMGEXIT 50
/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -525,6 +526,11 @@ struct kvm_run {
#define KVM_NOTIFY_CONTEXT_INVALID (1 << 0)
__u32 flags;
} notify;
+ /* KVM_EXIT_VMGEXIT */
+ struct {
+ __u64 ghcb_msr; /* GHCB MSR contents */
+ __u64 ret; /* user -> kernel */
+ } vmgexit;
/* Fix the size of the union. */
char padding[256];
};
--
2.25.1
* [PATCH v10 32/50] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (30 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 31/50] KVM: SEV: Add KVM_EXIT_VMGEXIT Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 33/50] KVM: SEV: Add support to handle " Michael Roth
` (17 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification.
When using gmem, private/shared memory is allocated through separate
pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
ioctl to tell the KVM MMU whether a particular GFN should be backed by
private memory.
Forward these page state change requests to userspace so that it can
issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
entries when it is ready to map a private page into a guest.
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/kvm/svm/sev.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d9c3ecef2710..4890e910e6e0 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3269,6 +3269,15 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}
+static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ set_ghcb_msr(svm, vcpu->run->vmgexit.ghcb_msr);
+
+ return 1; /* resume */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3369,6 +3378,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
GHCB_MSR_INFO_POS);
break;
}
+ case GHCB_MSR_PSC_REQ:
+ vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+ vcpu->run->vmgexit.ghcb_msr = control->ghcb_gpa;
+ vcpu->arch.complete_userspace_io = snp_complete_psc_msr_protocol;
+
+ ret = -1;
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;
--
2.25.1
* [PATCH v10 33/50] KVM: SEV: Add support to handle Page State Change VMGEXIT
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (31 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 32/50] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 34/50] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Michael Roth
` (16 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change NAE event
as defined in the GHCB specification version 2.
Forward these requests to userspace as KVM_EXIT_VMGEXITs, similar to how
it is done for requests that don't use a GHCB page.
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/kvm/svm/sev.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4890e910e6e0..0287fadeae76 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3081,6 +3081,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
case SVM_VMGEXIT_HV_FEATURES:
+ case SVM_VMGEXIT_PSC:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3278,6 +3279,15 @@ static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
return 1; /* resume */
}
+static int snp_complete_psc(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, vcpu->run->vmgexit.ret);
+
+ return 1; /* resume */
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3522,6 +3532,12 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_PSC:
+ /* Let userspace handle allocating/deallocating backing pages. */
+ vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+ vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
+ vcpu->arch.complete_userspace_io = snp_complete_psc;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
--
2.25.1
* [PATCH v10 34/50] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (32 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 33/50] KVM: SEV: Add support to handle " Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 35/50] KVM: SEV: Add support to handle RMP nested page faults Michael Roth
` (15 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
While resolving the RMP page fault, there may be cases where the page
level between the RMP entry and the TDP does not match and the 2M RMP
entry must be split into 4K RMP entries, or a 2M TDP page needs to be
broken into multiple 4K pages.
To keep the RMP and TDP page levels in sync, zap the gfn range after
splitting the pages in the RMP entry. The zap forces the TDP mappings
to be rebuilt with the new page level.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/mmu.h | 2 --
arch/x86/kvm/mmu/mmu.c | 1 +
3 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a3983271ea28..b9e783d34e94 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1872,6 +1872,8 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
const struct kvm_memory_slot *memslot);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+
int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 253fb2093d5d..40111f4dae9e 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -237,8 +237,6 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
return -(u32)fault & errcode;
}
-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 10c323e2faa4..8c78807e0f45 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6702,6 +6702,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
return need_tlb_flush;
}
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);
static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
const struct kvm_memory_slot *slot)
--
2.25.1
* [PATCH v10 35/50] KVM: SEV: Add support to handle RMP nested page faults
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (33 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 34/50] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 36/50] KVM: SEV: Use a VMSA physical address variable for populating VMCB Michael Roth
` (14 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
When SEV-SNP is enabled in the guest, the hardware places restrictions
on all memory accesses based on the contents of the RMP table. When the
hardware encounters an RMP check failure caused by a guest memory
access, it raises a #NPF. The error code contains additional
information on the access type. See APM volume 2 for details.
When using gmem, RMP faults resulting from mismatches between the state
in the RMP table vs. what the guest expects via its page table result
in KVM_EXIT_MEMORY_FAULTs being forwarded to userspace to handle. This
means the only expected case that needs to be handled in the kernel is
when the page size of the entry in the RMP table is larger than the
mapping in the nested page table, in which case a PSMASH instruction
needs to be issued to split the large RMP entry into individual 4K
entries so that subsequent accesses can succeed.
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/include/asm/sev-common.h | 3 +
arch/x86/kvm/svm/sev.c | 92 +++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 21 +++++--
arch/x86/kvm/svm/svm.h | 1 +
4 files changed, 113 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 9febc1474a30..15d8e9805963 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -188,6 +188,9 @@ struct snp_psc_desc {
/* RMUPDATE detected 4K page and 2MB page overlap. */
#define RMPUPDATE_FAIL_OVERLAP 4
+/* PSMASH failed due to concurrent access by another CPU */
+#define PSMASH_FAIL_INUSE 3
+
/* RMP page size */
#define RMP_PG_SIZE_4K 0
#define RMP_PG_SIZE_2M 1
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0287fadeae76..0a45031386c2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3270,6 +3270,13 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
svm->vmcb->control.ghcb_gpa = value;
}
+static int snp_rmptable_psmash(kvm_pfn_t pfn)
+{
+ pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+ return psmash(pfn);
+}
+
static int snp_complete_psc_msr_protocol(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3816,3 +3823,88 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
return p;
}
+
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+ struct kvm_memory_slot *slot;
+ struct kvm *kvm = vcpu->kvm;
+ int order, rmp_level, ret;
+ bool assigned;
+ kvm_pfn_t pfn;
+ gfn_t gfn;
+
+ gfn = gpa >> PAGE_SHIFT;
+
+ /*
+ * The only time RMP faults occur for shared pages is when the guest is
+ * triggering an RMP fault for an implicit page-state change from
+ * shared->private. Implicit page-state changes are forwarded to
+ * userspace via KVM_EXIT_MEMORY_FAULT events, however, so RMP faults
+ * for shared pages should not end up here.
+ */
+ if (!kvm_mem_is_private(kvm, gfn)) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, size-mismatch for non-private GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!kvm_slot_can_be_private(slot)) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, &order);
+ if (ret) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, no private backing page for GPA 0x%llx\n",
+ gpa);
+ return;
+ }
+
+ ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (ret || !assigned) {
+ pr_warn_ratelimited("SEV: Unexpected RMP fault, no assigned RMP entry found for GPA 0x%llx PFN 0x%llx error %d\n",
+ gpa, pfn, ret);
+ goto out;
+ }
+
+ /*
+ * There are 2 cases where a PSMASH may be needed to resolve an #NPF
+ * with PFERR_GUEST_RMP_BIT set:
+ *
+ * 1) RMPADJUST/PVALIDATE can trigger an #NPF with PFERR_GUEST_SIZEM
+ * bit set if the guest issues them with a smaller granularity than
+ * what is indicated by the page-size bit in the 2MB-aligned RMP
+ * entry for the PFN that backs the GPA.
+ *
+ * 2) Guest access via NPT can trigger an #NPF if the NPT mapping is
+ * smaller than what is indicated by the 2MB-aligned RMP entry for
+ * the PFN that backs the GPA.
+ *
+ * In both these cases, the corresponding 2M RMP entry needs to
+ * be PSMASH'd to 512 4K RMP entries. If the RMP entry is already
+ * split into 4K RMP entries, then this is likely a spurious case which
+ * can occur when there are concurrent accesses by the guest to a 2MB
+ * GPA range that is backed by a 2MB-aligned PFN whose RMP entry is in
+ * the process of being PSMASH'd into 4K entries. These cases should
+ * resolve automatically on subsequent accesses, so just ignore them
+ * here.
+ */
+ if (rmp_level == PG_LEVEL_4K) {
+ pr_debug_ratelimited("%s: Spurious RMP fault for GPA 0x%llx, error_code 0x%llx",
+ __func__, gpa, error_code);
+ goto out;
+ }
+
+ pr_debug_ratelimited("%s: Splitting 2M RMP entry for GPA 0x%llx, error_code 0x%llx",
+ __func__, gpa, error_code);
+ ret = snp_rmptable_psmash(pfn);
+ if (ret && ret != PSMASH_FAIL_INUSE)
+ pr_err_ratelimited("SEV: Unable to split RMP entry for GPA 0x%llx PFN 0x%llx ret %d\n",
+ gpa, pfn, ret);
+
+ kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
+out:
+ put_page(pfn_to_page(pfn));
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8e4ef0cd968a..563c9839428d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2046,15 +2046,28 @@ static int pf_interception(struct kvm_vcpu *vcpu)
static int npf_interception(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ int rc;
u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;
trace_kvm_page_fault(vcpu, fault_address, error_code);
- return kvm_mmu_page_fault(vcpu, fault_address, error_code,
- static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
- svm->vmcb->control.insn_bytes : NULL,
- svm->vmcb->control.insn_len);
+ rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+ static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
+ svm->vmcb->control.insn_bytes : NULL,
+ svm->vmcb->control.insn_len);
+
+ /*
+ * rc == 0 indicates a userspace exit is needed to handle page
+ * transitions, so do that first before updating the RMP table.
+ */
+ if (error_code & PFERR_GUEST_RMP_MASK) {
+ if (rc == 0)
+ return rc;
+ handle_rmp_page_fault(vcpu, fault_address, error_code);
+ }
+
+ return rc;
}
static int db_interception(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c4449a88e629..c3a37136fa30 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -715,6 +715,7 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
/* vmenter.S */
--
2.25.1
* [PATCH v10 36/50] KVM: SEV: Use a VMSA physical address variable for populating VMCB
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (34 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 35/50] KVM: SEV: Add support to handle RMP nested page faults Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 37/50] KVM: SEV: Support SEV-SNP AP Creation NAE event Michael Roth
` (13 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
From: Tom Lendacky <thomas.lendacky@amd.com>
In preparation to support SEV-SNP AP Creation, use a variable that holds
the VMSA physical address rather than converting the virtual address.
This will allow SEV-SNP AP Creation to set the new physical address that
will be used should the vCPU reset path be taken.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/kvm/svm/sev.c | 3 +--
arch/x86/kvm/svm/svm.c | 9 ++++++++-
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0a45031386c2..f36d72ca2cf7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3639,8 +3639,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
* the VMSA will be NULL if this vCPU is the destination for intrahost
* migration, and will be copied later.
*/
- if (svm->sev_es.vmsa)
- svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
+ svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
/* Can't intercept CR register access, HV can't modify CR registers */
svm_clr_intercept(svm, INTERCEPT_CR0_READ);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 563c9839428d..c04c554e5675 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1463,9 +1463,16 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
svm_switch_vmcb(svm, &svm->vmcb01);
- if (vmsa_page)
+ if (vmsa_page) {
svm->sev_es.vmsa = page_address(vmsa_page);
+ /*
+ * Do not include the encryption mask on the VMSA physical
+ * address since hardware will access it using the guest key.
+ */
+ svm->sev_es.vmsa_pa = __pa(svm->sev_es.vmsa);
+ }
+
svm->guest_state_loaded = false;
return 0;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c3a37136fa30..0ad76ed4d625 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -200,6 +200,7 @@ struct vcpu_sev_es_state {
struct ghcb *ghcb;
u8 valid_bitmap[16];
struct kvm_host_map ghcb_map;
+ hpa_t vmsa_pa;
bool received_first_sipi;
unsigned int ap_reset_hold_type;
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* [PATCH v10 37/50] KVM: SEV: Support SEV-SNP AP Creation NAE event
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (35 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 36/50] KVM: SEV: Use a VMSA physical address variable for populating VMCB Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 38/50] KVM: SEV: Add support for GHCB-based termination requests Michael Roth
` (12 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Tom Lendacky <thomas.lendacky@amd.com>
Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
guests to alter the register state of the APs on their own, giving the
guest a way to simulate INIT-SIPI.
A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
so as to avoid updating the VMSA pointer while the vCPU is running.
For CREATE:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID. The GPA is saved in the svm struct of the
target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
to the vCPU and then the vCPU is kicked.
For CREATE_ON_INIT:
The guest supplies the GPA of the VMSA to be used for the vCPU with
the specified APIC ID the next time an INIT is performed. The GPA is
saved in the svm struct of the target vCPU.
For DESTROY:
The guest indicates it wishes to stop the vCPU. The GPA is cleared
from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
added to the vCPU and then the vCPU is kicked.
The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
as a result of the event or as a result of an INIT. The handler sets the
vCPU to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will
leave the vCPU as not runnable. Any previous VMSA pages that were
installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If
a new VMSA is to be installed, the VMSA guest page is pinned and set as
the VMSA in the vCPU VMCB and the vCPU state is set to
KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is
cleared in the vCPU VMCB and the vCPU state is left as
KVM_MP_STATE_UNINITIALIZED to prevent it from being run.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: add handling for restrictedmem]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/svm.h | 5 +
arch/x86/kvm/svm/sev.c | 219 ++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 3 +
arch/x86/kvm/svm/svm.h | 8 +-
arch/x86/kvm/x86.c | 11 ++
6 files changed, 246 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b9e783d34e94..cd4bfe0b7deb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -113,6 +113,7 @@
KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_HV_TLB_FLUSH \
KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)
#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index a901f1daaefc..3d5e61352290 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -290,6 +290,11 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
#define SVM_SEV_FEAT_SNP_ACTIVE BIT(0)
+#define SVM_SEV_FEAT_RESTRICTED_INJECTION BIT(3)
+#define SVM_SEV_FEAT_ALTERNATE_INJECTION BIT(4)
+#define SVM_SEV_FEAT_INT_INJ_MODES \
+ (SVM_SEV_FEAT_RESTRICTED_INJECTION | \
+ SVM_SEV_FEAT_ALTERNATE_INJECTION)
struct vmcb_seg {
u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f36d72ca2cf7..e547adddacfa 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -650,6 +650,7 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
static int sev_es_sync_vmsa(struct vcpu_svm *svm)
{
+ struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
struct sev_es_save_area *save = svm->sev_es.vmsa;
/* Check some debug related fields before encrypting the VMSA */
@@ -698,6 +699,12 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
if (sev_snp_guest(svm->vcpu.kvm))
save->sev_features |= SVM_SEV_FEAT_SNP_ACTIVE;
+ /*
+ * Save the VMSA synced SEV features. For now, they are the same for
+ * all vCPUs, so just save each time.
+ */
+ sev->sev_features = save->sev_features;
+
pr_debug("Virtual Machine Save Area (VMSA):\n");
print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
@@ -3076,6 +3083,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
if (!kvm_ghcb_sw_scratch_is_valid(svm))
goto vmgexit_err;
break;
+ case SVM_VMGEXIT_AP_CREATION:
+ if (lower_32_bits(control->exit_info_1) != SVM_VMGEXIT_AP_DESTROY)
+ if (!kvm_ghcb_rax_is_valid(svm))
+ goto vmgexit_err;
+ break;
case SVM_VMGEXIT_NMI_COMPLETE:
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
@@ -3295,6 +3307,202 @@ static int snp_complete_psc(struct kvm_vcpu *vcpu)
return 1; /* resume */
}
+static int __sev_snp_update_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ hpa_t cur_pa;
+
+ WARN_ON(!mutex_is_locked(&svm->sev_es.snp_vmsa_mutex));
+
+ /* Save off the current VMSA PA for later checks */
+ cur_pa = svm->sev_es.vmsa_pa;
+
+ /* Mark the vCPU as offline and not runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+
+ /* Clear use of the VMSA */
+ svm->sev_es.vmsa_pa = INVALID_PAGE;
+ svm->vmcb->control.vmsa_pa = INVALID_PAGE;
+
+ /*
+ * svm->sev_es.vmsa holds the virtual address of the VMSA initially
+ * allocated by the host. If the guest specified a new VMSA via
+ * AP_CREATION, it will have been pinned to avoid future issues
+ * with things like page migration support. Make sure to un-pin it
+ * before switching to a newer guest-specified VMSA.
+ */
+ if (cur_pa != __pa(svm->sev_es.vmsa) && VALID_PAGE(cur_pa))
+ kvm_release_pfn_dirty(__phys_to_pfn(cur_pa));
+
+ if (VALID_PAGE(svm->sev_es.snp_vmsa_gpa)) {
+ gfn_t gfn = gpa_to_gfn(svm->sev_es.snp_vmsa_gpa);
+ struct kvm_memory_slot *slot;
+ kvm_pfn_t pfn;
+
+ slot = gfn_to_memslot(vcpu->kvm, gfn);
+ if (!slot)
+ return -EINVAL;
+
+ /*
+ * The new VMSA will be private guest memory, so
+ * retrieve the PFN from the gmem backend, and leave the ref
+ * count of the associated folio elevated to ensure it won't
+ * ever be migrated.
+ */
+ if (kvm_gmem_get_pfn(vcpu->kvm, slot, gfn, &pfn, NULL))
+ return -EINVAL;
+
+ /* Use the new VMSA */
+ svm->sev_es.vmsa_pa = pfn_to_hpa(pfn);
+ svm->vmcb->control.vmsa_pa = svm->sev_es.vmsa_pa;
+
+ /* Mark the vCPU as runnable */
+ vcpu->arch.pv.pv_unhalted = false;
+ vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+ svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ }
+
+ /*
+ * When replacing the VMSA during SEV-SNP AP creation,
+ * mark the VMCB dirty so that full state is always reloaded.
+ */
+ vmcb_mark_all_dirty(svm->vmcb);
+
+ return 0;
+}
+
+/*
+ * Invoked as part of svm_vcpu_reset() processing of an init event.
+ */
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ int ret;
+
+ if (!sev_snp_guest(vcpu->kvm))
+ return;
+
+ mutex_lock(&svm->sev_es.snp_vmsa_mutex);
+
+ if (!svm->sev_es.snp_ap_create)
+ goto unlock;
+
+ svm->sev_es.snp_ap_create = false;
+
+ ret = __sev_snp_update_protected_guest_state(vcpu);
+ if (ret)
+ vcpu_unimpl(vcpu, "snp: AP state update on init failed\n");
+
+unlock:
+ mutex_unlock(&svm->sev_es.snp_vmsa_mutex);
+}
+
+static int sev_snp_ap_creation(struct vcpu_svm *svm)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm_vcpu *target_vcpu;
+ struct vcpu_svm *target_svm;
+ unsigned int request;
+ unsigned int apic_id;
+ bool kick;
+ int ret;
+
+ request = lower_32_bits(svm->vmcb->control.exit_info_1);
+ apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
+
+ /* Validate the APIC ID */
+ target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);
+ if (!target_vcpu) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
+ apic_id);
+ return -EINVAL;
+ }
+
+ ret = 0;
+
+ target_svm = to_svm(target_vcpu);
+
+ /*
+ * The target vCPU is valid, so the vCPU will be kicked unless the
+ * request is for CREATE_ON_INIT. For any errors at this stage, the
+ * kick will place the vCPU in a non-runnable state.
+ */
+ kick = true;
+
+ mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
+ target_svm->sev_es.snp_ap_create = true;
+
+ /* Interrupt injection mode shouldn't change for AP creation */
+ if (request < SVM_VMGEXIT_AP_DESTROY) {
+ u64 sev_features;
+
+ sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
+ sev_features ^= sev->sev_features;
+ if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
+ vcpu->arch.regs[VCPU_REGS_RAX]);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ switch (request) {
+ case SVM_VMGEXIT_AP_CREATE_ON_INIT:
+ kick = false;
+ fallthrough;
+ case SVM_VMGEXIT_AP_CREATE:
+ if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ /*
+ * A malicious guest can RMPADJUST a large page into a VMSA, which
+ * would hit the SNP erratum where the CPU incorrectly signals an
+ * RMP violation #PF if a hugepage collides with the RMP entry of
+ * the VMSA page. Reject the AP CREATE request if the VMSA address
+ * from the guest is 2M-aligned.
+ */
+ if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
+ vcpu_unimpl(vcpu,
+ "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
+ svm->vmcb->control.exit_info_2);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
+ break;
+ case SVM_VMGEXIT_AP_DESTROY:
+ break;
+ default:
+ vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
+ request);
+ ret = -EINVAL;
+ break;
+ }
+
+out:
+ if (kick) {
+ if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
+ target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+
+ kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
+ kvm_vcpu_kick(target_vcpu);
+ }
+
+ mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
+
+ return ret;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3545,6 +3753,15 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
vcpu->run->vmgexit.ghcb_msr = ghcb_gpa;
vcpu->arch.complete_userspace_io = snp_complete_psc;
break;
+ case SVM_VMGEXIT_AP_CREATION:
+ ret = sev_snp_ap_creation(svm);
+ if (ret) {
+ ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 2);
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, GHCB_ERR_INVALID_INPUT);
+ }
+
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
@@ -3711,6 +3928,8 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
GHCB_VERSION_MIN,
sev_enc_bit));
+
+ mutex_init(&svm->sev_es.snp_vmsa_mutex);
}
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c04c554e5675..f5cdcbd1ba67 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1402,6 +1402,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
svm->spec_ctrl = 0;
svm->virt_spec_ctrl = 0;
+ if (init_event)
+ sev_snp_init_protected_guest_state(vcpu);
+
init_vmcb(vcpu);
if (!init_event)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0ad76ed4d625..f81dfa1594f6 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -96,6 +96,7 @@ struct kvm_sev_info {
atomic_t migration_in_progress;
u64 snp_init_flags;
void *snp_context; /* SNP guest context page */
+ u64 sev_features; /* Features set at VMSA creation */
};
struct kvm_svm {
@@ -212,6 +213,10 @@ struct vcpu_sev_es_state {
bool ghcb_sa_free;
u64 ghcb_registered_gpa;
+
+ struct mutex snp_vmsa_mutex; /* Used to handle concurrent updates of VMSA. */
+ gpa_t snp_vmsa_gpa;
+ bool snp_ap_create;
};
struct vcpu_svm {
@@ -687,7 +692,7 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL
-#define GHCB_HV_FT_SUPPORTED GHCB_HV_FT_SNP
+#define GHCB_HV_FT_SUPPORTED (GHCB_HV_FT_SNP | GHCB_HV_FT_SNP_AP_CREATION)
extern unsigned int max_sev_asid;
@@ -717,6 +722,7 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
void sev_es_unmap_ghcb(struct vcpu_svm *svm);
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
+void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
/* vmenter.S */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 12f9e99c7ad0..8977f7f12a4a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10660,6 +10660,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
+
+ if (kvm_check_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu)) {
+ kvm_vcpu_reset(vcpu, true);
+ if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE) {
+ r = 1;
+ goto out;
+ }
+ }
}
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -12871,6 +12879,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
return true;
#endif
+ if (kvm_test_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, vcpu))
+ return true;
+
if (kvm_arch_interrupt_allowed(vcpu) &&
(kvm_cpu_has_interrupt(vcpu) ||
kvm_guest_apic_has_interrupt(vcpu)))
--
2.25.1
* [PATCH v10 38/50] KVM: SEV: Add support for GHCB-based termination requests
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (36 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 37/50] KVM: SEV: Support SEV-SNP AP Creation NAE event Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-19 12:20 ` Liam Merwick
2023-10-16 13:28 ` [PATCH v10 39/50] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
` (11 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
GHCB version 2 adds support for a GHCB-based termination request that
a guest can issue when it reaches an error state and wishes to inform
the hypervisor that it should be terminated. Implement support for that
similarly to GHCB MSR-based termination requests that are already
available to SEV-ES guests via earlier versions of the GHCB protocol.
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/svm/sev.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e547adddacfa..9c38fe796e00 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3094,6 +3094,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
case SVM_VMGEXIT_HV_FEATURES:
case SVM_VMGEXIT_PSC:
+ case SVM_VMGEXIT_TERM_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3762,6 +3763,14 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
+ case SVM_VMGEXIT_TERM_REQUEST:
+ pr_info("SEV-ES guess requested termination: reason %#llx info %#llx\n",
+ control->exit_info_1, control->exit_info_1);
+ vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+ vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
+ vcpu->run->system_event.ndata = 1;
+ vcpu->run->system_event.data[0] = control->ghcb_gpa;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
--
2.25.1
* Re: [PATCH v10 38/50] KVM: SEV: Add support for GHCB-based termination requests
2023-10-16 13:28 ` [PATCH v10 38/50] KVM: SEV: Add support for GHCB-based termination requests Michael Roth
@ 2023-10-19 12:20 ` Liam Merwick
0 siblings, 0 replies; 158+ messages in thread
From: Liam Merwick @ 2023-10-19 12:20 UTC (permalink / raw)
To: Michael Roth, kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
zhi.a.wang, Liam Merwick
On 16/10/2023 14:28, Michael Roth wrote:
> GHCB version 2 adds support for a GHCB-based termination request that
> a guest can issue when it reaches an error state and wishes to inform
> the hypervisor that it should be terminated. Implement support for that
> similarly to GHCB MSR-based termination requests that are already
> available to SEV-ES guests via earlier versions of the GHCB protocol.
Maybe add
See 'Termination Request' in the 'Invoking VMGEXIT' section of AMD's
GHCB spec for more details.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
> arch/x86/kvm/svm/sev.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index e547adddacfa..9c38fe796e00 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3094,6 +3094,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> case SVM_VMGEXIT_HV_FEATURES:
> case SVM_VMGEXIT_PSC:
> + case SVM_VMGEXIT_TERM_REQUEST:
> break;
> default:
> reason = GHCB_ERR_INVALID_EVENT;
> @@ -3762,6 +3763,14 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>
> ret = 1;
> break;
> + case SVM_VMGEXIT_TERM_REQUEST:
> + pr_info("SEV-ES guess requested termination: reason %#llx info %#llx\n",
> + control->exit_info_1, control->exit_info_1);
typo: "guess" -> "guest"
It prints exit_info_1 twice - was one of those meant to be exit_info_2?
Otherwise
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
> + vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> + vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
> + vcpu->run->system_event.ndata = 1;
> + vcpu->run->system_event.data[0] = control->ghcb_gpa;
> + break;
> case SVM_VMGEXIT_UNSUPPORTED_EVENT:
> vcpu_unimpl(vcpu,
> "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
^ permalink raw reply [flat|nested] 158+ messages in thread
* [PATCH v10 39/50] KVM: SEV: Implement gmem hook for initializing private pages
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (37 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 38/50] KVM: SEV: Add support for GHCB-based termination requests Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 40/50] KVM: SEV: Implement gmem hook for invalidating " Michael Roth
` (10 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
This will handle RMP table updates and direct map changes needed to put
a page into a private state before mapping it into an SEV-SNP guest.
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/svm/sev.c | 95 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/kvm/svm/svm.h | 1 +
4 files changed, 99 insertions(+)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 71dc506aa3fb..8caf2eb6add8 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -127,6 +127,7 @@ config KVM_AMD_SEV
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select KVM_SW_PROTECTED_VM
+ select HAVE_KVM_GMEM_PREPARE
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9c38fe796e00..8cf2d19597b1 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4135,3 +4135,98 @@ void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
out:
put_page(pfn_to_page(pfn));
}
+
+static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
+{
+ kvm_pfn_t pfn = start;
+
+ while (pfn < end) {
+ int ret, rmp_level;
+ bool assigned;
+
+ ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (ret) {
+ pr_warn_ratelimited("SEV: Failed to retrieve RMP entry: PFN 0x%llx GFN start 0x%llx GFN end 0x%llx RMP level %d error %d\n",
+ pfn, start, end, rmp_level, ret);
+ return false;
+ }
+
+ if (assigned) {
+ pr_debug("%s: overlap detected, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d\n",
+ __func__, pfn, start, end, rmp_level);
+ return false;
+ }
+
+ pfn++;
+ }
+
+ return true;
+}
+
+static u8 max_level_for_order(int order)
+{
+ if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+ return PG_LEVEL_2M;
+
+ return PG_LEVEL_4K;
+}
+
+static bool is_large_rmp_possible(struct kvm *kvm, kvm_pfn_t pfn, int order)
+{
+ kvm_pfn_t pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+
+ /*
+ * If this is a large folio, and the entire 2M range containing the
+ * PFN is currently shared, then the entire 2M-aligned range can be
+ * set to private via a single 2M RMP entry.
+ */
+ if (max_level_for_order(order) > PG_LEVEL_4K &&
+ is_pfn_range_shared(pfn_aligned, pfn_aligned + PTRS_PER_PMD))
+ return true;
+
+ return false;
+}
+
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ kvm_pfn_t pfn_aligned;
+ gfn_t gfn_aligned;
+ int level, rc;
+ bool assigned;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+ if (rc)
+ return rc;
+
+ if (assigned) {
+ pr_debug("%s: already assigned: gfn %llx pfn %llx max_order %d level %d\n",
+ __func__, gfn, pfn, max_order, level);
+ return 0;
+ }
+
+ if (is_large_rmp_possible(kvm, pfn, max_order)) {
+ level = PG_LEVEL_2M;
+ pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+ gfn_aligned = ALIGN_DOWN(gfn, PTRS_PER_PMD);
+ } else {
+ level = PG_LEVEL_4K;
+ pfn_aligned = pfn;
+ gfn_aligned = gfn;
+ }
+
+ rc = rmp_make_private(pfn_aligned, gfn_to_gpa(gfn_aligned), level, sev->asid, false);
+ if (rc) {
+ pr_err_ratelimited("SEV: Failed to update RMP entry: GFN %llx PFN %llx level %d error %d\n",
+ gfn, pfn, level, rc);
+ return -EINVAL;
+ }
+
+ pr_debug("%s: updated: gfn %llx pfn %llx pfn_aligned %llx max_order %d level %d\n",
+ __func__, gfn, pfn, pfn_aligned, max_order, level);
+
+ return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f5cdcbd1ba67..b3ed424533b0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5041,6 +5041,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+
+ .gmem_prepare = sev_gmem_prepare,
};
/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f81dfa1594f6..c5cee554176e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -723,6 +723,7 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm);
struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
+int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
/* vmenter.S */
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* [PATCH v10 40/50] KVM: SEV: Implement gmem hook for invalidating private pages
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (38 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 39/50] KVM: SEV: Implement gmem hook for initializing private pages Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 41/50] KVM: x86: Add gmem hook for determining max NPT mapping level Michael Roth
` (9 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
Implement a platform hook to do the work of restoring the direct map
entries of gmem-managed pages and transitioning the corresponding RMP
table entries back to the default shared/hypervisor-owned state.
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/svm/sev.c | 63 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 2 ++
4 files changed, 67 insertions(+)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 8caf2eb6add8..dfc857db389f 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -128,6 +128,7 @@ config KVM_AMD_SEV
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
select KVM_SW_PROTECTED_VM
select HAVE_KVM_GMEM_PREPARE
+ select HAVE_KVM_GMEM_INVALIDATE
help
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 8cf2d19597b1..5b3a3bbfebee 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4230,3 +4230,66 @@ int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
return 0;
}
+
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+{
+ kvm_pfn_t pfn;
+
+ pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
+
+ for (pfn = start; pfn < end;) {
+ bool use_2m_update = false;
+ int rc, rmp_level;
+ bool assigned;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+ if (rc) {
+ pr_debug_ratelimited("SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc);
+ goto next_pfn;
+ }
+
+ if (!assigned)
+ goto next_pfn;
+
+ use_2m_update = IS_ALIGNED(pfn, PTRS_PER_PMD) &&
+ end >= (pfn + PTRS_PER_PMD) &&
+ rmp_level > PG_LEVEL_4K;
+
+ /*
+ * If an unaligned PFN corresponds to a 2M region assigned as a
+ * large page in the RMP table, PSMASH the region into individual
+ * 4K RMP entries before attempting to convert a 4K sub-page.
+ */
+ if (!use_2m_update && rmp_level > PG_LEVEL_4K) {
+ rc = snp_rmptable_psmash(pfn);
+ if (rc)
+ pr_err_ratelimited("SEV: Failed to PSMASH RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc);
+ }
+
+ rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
+ if (WARN_ON_ONCE(rc)) {
+ pr_err_ratelimited("SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
+ pfn, rc);
+ goto next_pfn;
+ }
+
+ /*
+ * SEV-ES avoids host/guest cache coherency issues through
+ * WBINVD hooks issued via MMU notifiers during run-time, and
+ * KVM's VM destroy path at shutdown. Those MMU notifier events
+ * don't cover gmem since there is no requirement to map pages
+ * to a HVA in order to use them for a running guest. While the
+ * shutdown path would still likely cover things for SNP guests,
+ * userspace may also free gmem pages during run-time via
+ * hole-punching operations on the guest_memfd, so flush the
+ * cache entries for these pages before free'ing them back to
+ * the host.
+ */
+ clflush_cache_range(__va(pfn_to_hpa(pfn)),
+ use_2m_update ? PMD_SIZE : PAGE_SIZE);
+next_pfn:
+ pfn += use_2m_update ? PTRS_PER_PMD : 1;
+ }
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b3ed424533b0..9cff302b4402 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5043,6 +5043,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.alloc_apic_backing_page = svm_alloc_apic_backing_page,
.gmem_prepare = sev_gmem_prepare,
+ .gmem_invalidate = sev_gmem_invalidate,
};
/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c5cee554176e..1fd90a88b0db 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -724,6 +724,8 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
+int sev_gmem_max_level(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
/* vmenter.S */
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* [PATCH v10 41/50] KVM: x86: Add gmem hook for determining max NPT mapping level
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (39 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 40/50] KVM: SEV: Implement gmem hook for invalidating " Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 42/50] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP Michael Roth
` (8 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
2MB mapping in the guest's nested page table depends on whether or not
any subpages within the range have already been initialized as private
in the RMP table. The existing mixed-attribute tracking in KVM is
insufficient here, for instance:
- gmem allocates 2MB page
- guest issues PVALIDATE on 2MB page
- guest later converts a subpage to shared
- SNP host code issues PSMASH to split 2MB RMP mapping to 4K
- KVM MMU splits NPT mapping to 4K
At this point there are no mixed attributes, and KVM would normally
allow for 2MB NPT mappings again, but this is actually not allowed
because the RMP table mappings are 4K and cannot be promoted on the
hypervisor side, so the NPT mappings must still be limited to 4K to
match this.
Add a hook to determine the max NPT mapping size in situations like
this.
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu/mmu.c | 12 ++++++++++--
arch/x86/kvm/svm/sev.c | 27 +++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 1 +
5 files changed, 40 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 4ef2eca14287..7f2e00c48d3b 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -135,6 +135,7 @@ KVM_X86_OP(complete_emulated_msr)
KVM_X86_OP(vcpu_deliver_sipi_vector)
KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
+KVM_X86_OP_OPTIONAL_RET0(gmem_max_level)
KVM_X86_OP_OPTIONAL(gmem_invalidate)
KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cd4bfe0b7deb..6dda4d24dbef 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1764,6 +1764,7 @@ struct kvm_x86_ops {
int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
+ int (*gmem_max_level)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
};
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8c78807e0f45..64f6cb428b32 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4304,6 +4304,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
int max_order, r;
+ u8 max_level;
if (!kvm_slot_can_be_private(fault->slot)) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
@@ -4317,8 +4318,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
return r;
}
- fault->max_level = min(kvm_max_level_for_order(max_order),
- fault->max_level);
+ max_level = kvm_max_level_for_order(max_order);
+ r = static_call(kvm_x86_gmem_max_level)(vcpu->kvm, fault->pfn,
+ fault->gfn, &max_level);
+ if (r) {
+ kvm_release_pfn_clean(fault->pfn);
+ return r;
+ }
+
+ fault->max_level = min(max_level, fault->max_level);
fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
return RET_PF_CONTINUE;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5b3a3bbfebee..6c6d5a320d72 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4293,3 +4293,30 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
pfn += use_2m_update ? PTRS_PER_PMD : 1;
}
}
+
+int sev_gmem_max_level(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, u8 *max_level)
+{
+ int level, rc;
+ bool assigned;
+
+ if (!sev_snp_guest(kvm))
+ return 0;
+
+ rc = snp_lookup_rmpentry(pfn, &assigned, &level);
+ if (rc) {
+ pr_err_ratelimited("SEV: RMP entry not found: GFN %llx PFN %llx level %d error %d\n",
+ gfn, pfn, level, rc);
+ return -ENOENT;
+ }
+
+ if (!assigned) {
+ pr_err_ratelimited("SEV: RMP entry is not assigned: GFN %llx PFN %llx level %d\n",
+ gfn, pfn, level);
+ return -EINVAL;
+ }
+
+ if (level < *max_level)
+ *max_level = level;
+
+ return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9cff302b4402..d97ec673b63d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5043,6 +5043,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.alloc_apic_backing_page = svm_alloc_apic_backing_page,
.gmem_prepare = sev_gmem_prepare,
+ .gmem_max_level = sev_gmem_max_level,
.gmem_invalidate = sev_gmem_invalidate,
};
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* [PATCH v10 42/50] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (40 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 41/50] KVM: x86: Add gmem hook for determining max NPT mapping level Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 43/50] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
` (7 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
From: Ashish Kalra <ashish.kalra@amd.com>
With SNP/guest_memfd, private/encrypted memory should not be mappable,
and MMU notifications for HVA-mapped memory will only be relevant to
unencrypted guest memory. Therefore, the rationale behind issuing a
wbinvd_on_all_cpus() in sev_guest_memory_reclaimed() should not apply
for SNP guests and can be ignored.
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: Add some clarifications in commit]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/svm/sev.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6c6d5a320d72..f027def3a79e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2852,7 +2852,14 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
void sev_guest_memory_reclaimed(struct kvm *kvm)
{
- if (!sev_guest(kvm))
+ /*
+ * With SNP+gmem, private/encrypted memory should be
+ * unreachable via the hva-based mmu notifiers. Additionally,
+ * for shared->private translations, H/W coherency will ensure
+ * that the first guest access to the page clears out any
+ * existing dirty copies of that cacheline.
+ */
+ if (!sev_guest(kvm) || sev_snp_guest(kvm))
return;
wbinvd_on_all_cpus();
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* [PATCH v10 43/50] KVM: SVM: Add module parameter to enable the SEV-SNP
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (41 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 42/50] KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 44/50] iommu/amd: Add IOMMU_SNP_SHUTDOWN support Michael Roth
` (6 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
Add a module parameter that can be used to enable or disable the SEV-SNP
feature. Now that KVM contains support for SNP, set the GHCB hypervisor
feature flag to indicate that SNP is supported.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/kvm/svm/sev.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f027def3a79e..efe879524b6c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -62,7 +62,8 @@ static bool sev_es_debug_swap_enabled = true;
module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
/* enable/disable SEV-SNP support */
-static bool sev_snp_enabled;
+static bool sev_snp_enabled = true;
+module_param_named(sev_snp, sev_snp_enabled, bool, 0444);
#else
#define sev_enabled false
#define sev_es_enabled false
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* [PATCH v10 44/50] iommu/amd: Add IOMMU_SNP_SHUTDOWN support
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (42 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 43/50] KVM: SVM: Add module parameter to enable the SEV-SNP Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 45/50] iommu/amd: Report all cases inhibiting SNP enablement Michael Roth
` (5 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
From: Ashish Kalra <ashish.kalra@amd.com>
Add a new IOMMU API interface amd_iommu_snp_disable() to transition
IOMMU pages to Hypervisor state from Reclaim state after SNP_SHUTDOWN_EX
command. Invoke this API from the CCP driver after SNP_SHUTDOWN_EX
command.
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
drivers/crypto/ccp/sev-dev.c | 20 +++++++++++++
drivers/iommu/amd/init.c | 55 ++++++++++++++++++++++++++++++++++++
include/linux/amd-iommu.h | 3 ++
3 files changed, 78 insertions(+)
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 679b8d6fc09a..0626c0feff9b 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -26,6 +26,7 @@
#include <linux/fs.h>
#include <linux/fs_struct.h>
#include <linux/psp.h>
+#include <linux/amd-iommu.h>
#include <asm/smp.h>
#include <asm/cacheflush.h>
@@ -1513,6 +1514,25 @@ static int __sev_snp_shutdown_locked(int *error)
return ret;
}
+ /*
+ * SNP_SHUTDOWN_EX with IOMMU_SNP_SHUTDOWN set to 1 disables SNP
+ * enforcement by the IOMMU and also transitions all pages
+ * associated with the IOMMU to the Reclaim state.
+ * Firmware was transitioning the IOMMU pages to Hypervisor state
+ * before version 1.53. But, accounting for the number of assigned
+ * 4kB pages in a 2M page was done incorrectly by not transitioning
+ * to the Reclaim state. This resulted in RMP #PF when later accessing
+ * the 2M page containing those pages during kexec boot. Hence, the
+ * firmware now transitions these pages to Reclaim state and hypervisor
+ * needs to transition these pages to shared state. SNP Firmware
+ * version 1.53 and above are needed for kexec boot.
+ */
+ ret = amd_iommu_snp_disable();
+ if (ret) {
+ dev_err(sev->dev, "SNP IOMMU shutdown failed\n");
+ return ret;
+ }
+
sev->snp_initialized = false;
dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1c9924de607a..6af208a4f66b 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -30,6 +30,7 @@
#include <asm/io_apic.h>
#include <asm/irq_remapping.h>
#include <asm/set_memory.h>
+#include <asm/sev-host.h>
#include <linux/crash_dump.h>
@@ -3838,4 +3839,58 @@ int amd_iommu_snp_enable(void)
return 0;
}
+
+static int iommu_page_make_shared(void *page)
+{
+ unsigned long paddr, pfn;
+
+ paddr = iommu_virt_to_phys(page);
+ /* The C-bit may be set in the paddr */
+ pfn = __sme_clr(paddr) >> PAGE_SHIFT;
+ return rmp_make_shared(pfn, PG_LEVEL_4K);
+}
+
+static int iommu_make_shared(void *va, size_t size)
+{
+ void *page;
+ int ret;
+
+ if (!va)
+ return 0;
+
+ for (page = va; page < (va + size); page += PAGE_SIZE) {
+ ret = iommu_page_make_shared(page);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+int amd_iommu_snp_disable(void)
+{
+ struct amd_iommu *iommu;
+ int ret;
+
+ if (!amd_iommu_snp_en)
+ return 0;
+
+ for_each_iommu(iommu) {
+ ret = iommu_make_shared(iommu->evt_buf, EVT_BUFFER_SIZE);
+ if (ret)
+ return ret;
+
+ ret = iommu_make_shared(iommu->ppr_log, PPR_LOG_SIZE);
+ if (ret)
+ return ret;
+
+ ret = iommu_make_shared((void *)iommu->cmd_sem, PAGE_SIZE);
+ if (ret)
+ return ret;
+ }
+
+ amd_iommu_snp_en = false;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(amd_iommu_snp_disable);
#endif
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 55fc03cb3968..b04f2d3201b1 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -207,6 +207,9 @@ struct amd_iommu *get_amd_iommu(unsigned int idx);
#ifdef CONFIG_KVM_AMD_SEV
int amd_iommu_snp_enable(void);
+int amd_iommu_snp_disable(void);
+#else
+static inline int amd_iommu_snp_disable(void) { return 0; }
#endif
#endif /* _ASM_X86_AMD_IOMMU_H */
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* [PATCH v10 45/50] iommu/amd: Report all cases inhibiting SNP enablement
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (43 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 44/50] iommu/amd: Add IOMMU_SNP_SHUTDOWN support Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 46/50] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Michael Roth
` (4 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
Enabling SNP relies on various IOMMU-related checks in
amd_iommu_snp_enable(). In most cases, when the host supports SNP, any
IOMMU-related details that prevent enabling SNP are reported. One case
where it is not reported is when the IOMMU doesn't support the SNP
feature. Often this is the result of the corresponding BIOS option not
being enabled, so report that case along with the others.
While here, fix up the reporting to be more consistent about using
periods to end sentences, and always printing a newline afterward.
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
drivers/iommu/amd/init.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 6af208a4f66b..121092f0a48a 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3811,7 +3811,7 @@ int amd_iommu_snp_enable(void)
* not configured in the passthrough mode.
*/
if (no_iommu || iommu_default_passthrough()) {
- pr_err("SNP: IOMMU is disabled or configured in passthrough mode, SNP cannot be supported");
+ pr_err("SNP: IOMMU is disabled or configured in passthrough mode, SNP cannot be supported.\n");
return -EINVAL;
}
@@ -3826,14 +3826,16 @@ int amd_iommu_snp_enable(void)
}
amd_iommu_snp_en = check_feature_on_all_iommus(FEATURE_SNP);
- if (!amd_iommu_snp_en)
+ if (!amd_iommu_snp_en) {
+ pr_err("SNP: IOMMU SNP feature is not enabled, SNP cannot be supported.\n");
return -EINVAL;
+ }
pr_info("SNP enabled\n");
/* Enforce IOMMU v1 pagetable when SNP is enabled. */
if (amd_iommu_pgtable != AMD_IOMMU_V1) {
- pr_warn("Force to using AMD IOMMU v1 page table due to SNP\n");
+ pr_warn("Force to using AMD IOMMU v1 page table due to SNP.\n");
amd_iommu_pgtable = AMD_IOMMU_V1;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* [PATCH v10 46/50] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (44 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 45/50] iommu/amd: Report all cases inhibiting SNP enablement Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 23:11 ` Dionna Amalie Glaze
2023-10-16 13:28 ` [PATCH v10 47/50] x86/sev: Add KVM commands for per-instance certs Michael Roth
` (3 subsequent siblings)
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Alexey Kardashevskiy,
Dionna Glaze
From: Brijesh Singh <brijesh.singh@amd.com>
The SEV-SNP firmware provides the SNP_CONFIG command used to set the
system-wide configuration value for SNP guests. The information includes
the TCB version string to be reported in guest attestation reports.
Version 2 of the GHCB specification adds an NAE (SNP extended guest
request) that a guest can use to query the reports that include additional
certificates.
In both cases, additional data provided by userspace is included in the
attestation reports. Userspace will use the SNP_SET_EXT_CONFIG command
to supply the certificate blob and the reported TCB version string at
once. Note that the specification defines the certificate blob with a
specific GUID format; userspace is responsible for building the proper
certificate blob. The ioctl treats it as an opaque blob.
While it is not defined in the spec, also add an SNP_GET_EXT_CONFIG
command that can be used to obtain the data programmed through
SNP_SET_EXT_CONFIG.
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: squash in doc patch from Dionna]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
Documentation/virt/coco/sev-guest.rst | 27 ++++
drivers/crypto/ccp/sev-dev.c | 173 ++++++++++++++++++++++++++
drivers/crypto/ccp/sev-dev.h | 2 +
include/linux/psp-sev.h | 10 ++
include/uapi/linux/psp-sev.h | 17 +++
5 files changed, 229 insertions(+)
diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst
index e828c5326936..7cabf54395e5 100644
--- a/Documentation/virt/coco/sev-guest.rst
+++ b/Documentation/virt/coco/sev-guest.rst
@@ -151,6 +151,33 @@ The SNP_PLATFORM_STATUS command is used to query the SNP platform status. The
status includes API major, minor version and more. See the SEV-SNP
specification for further details.
+2.5 SNP_SET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_ext_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_SET_EXT_CONFIG command is used to set the system-wide configuration,
+such as the reported TCB version in the attestation report. The command is
+similar to the SNP_CONFIG command defined in the SEV-SNP spec. The main
+difference is that it also accepts an additional certificate blob defined in
+the GHCB specification.
+
+If certs_address is zero, then the previous certificate blob will be deleted.
+For more information on the certificate blob layout, see the GHCB spec
+(extended guest request message).
+
+2.6 SNP_GET_EXT_CONFIG
+----------------------
+:Technology: sev-snp
+:Type: hypervisor ioctl cmd
+:Parameters (in): struct sev_data_snp_ext_config
+:Returns (out): 0 on success, -negative on error
+
+The SNP_GET_EXT_CONFIG command is used to query the system-wide configuration
+set through SNP_SET_EXT_CONFIG.
+
3. SEV-SNP CPUID Enforcement
============================
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 0626c0feff9b..4807ddd6ec52 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1496,6 +1496,10 @@ static int __sev_snp_shutdown_locked(int *error)
data.length = sizeof(data);
data.iommu_snp_shutdown = 1;
+ /* Free the memory used for caching the certificate data */
+ sev_snp_certs_put(sev->snp_certs);
+ sev->snp_certs = NULL;
+
wbinvd_on_all_cpus();
retry:
@@ -1834,6 +1838,121 @@ static int sev_ioctl_snp_platform_status(struct sev_issue_cmd *argp)
return ret;
}
+static int sev_ioctl_snp_get_config(struct sev_issue_cmd *argp)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_user_data_ext_snp_config input;
+ struct sev_snp_certs *snp_certs;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+ return -EFAULT;
+
+ /* Copy the TCB version programmed through the SET_CONFIG to userspace */
+ if (input.config_address) {
+ if (copy_to_user((void __user *)input.config_address,
+ &sev->snp_config, sizeof(struct sev_user_data_snp_config)))
+ return -EFAULT;
+ }
+
+ snp_certs = sev_snp_certs_get(sev->snp_certs);
+
+ /* Copy the extended certs programmed through the SNP_SET_CONFIG */
+ if (input.certs_address && snp_certs) {
+ if (input.certs_len < snp_certs->len) {
+ /* Return the certs length to userspace */
+ input.certs_len = snp_certs->len;
+
+ ret = -EIO;
+ goto e_done;
+ }
+
+ if (copy_to_user((void __user *)input.certs_address,
+ snp_certs->data, snp_certs->len)) {
+ ret = -EFAULT;
+ goto put_exit;
+ }
+ }
+
+ ret = 0;
+
+e_done:
+ if (copy_to_user((void __user *)argp->data, &input, sizeof(input)))
+ ret = -EFAULT;
+
+put_exit:
+ sev_snp_certs_put(snp_certs);
+
+ return ret;
+}
+
+static int sev_ioctl_snp_set_config(struct sev_issue_cmd *argp, bool writable)
+{
+ struct sev_device *sev = psp_master->sev_data;
+ struct sev_user_data_ext_snp_config input;
+ struct sev_user_data_snp_config config;
+ struct sev_snp_certs *snp_certs = NULL;
+ void *certs = NULL;
+ int ret;
+
+ if (!sev->snp_initialized || !argp->data)
+ return -EINVAL;
+
+ if (!writable)
+ return -EPERM;
+
+ if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+ return -EFAULT;
+
+ /* Copy the certs from userspace */
+ if (input.certs_address) {
+ if (!input.certs_len || !IS_ALIGNED(input.certs_len, PAGE_SIZE))
+ return -EINVAL;
+
+ certs = psp_copy_user_blob(input.certs_address, input.certs_len);
+ if (IS_ERR(certs))
+ return PTR_ERR(certs);
+ }
+
+ /* Issue the PSP command to update the TCB version using the SNP_CONFIG. */
+ if (input.config_address) {
+ if (copy_from_user(&config,
+ (void __user *)input.config_address, sizeof(config))) {
+ ret = -EFAULT;
+ goto e_free;
+ }
+
+ ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
+ if (ret)
+ goto e_free;
+
+ memcpy(&sev->snp_config, &config, sizeof(config));
+ }
+
+ /*
+ * If new certs are passed, cache them; otherwise drop the old certs.
+ */
+ if (input.certs_len) {
+ snp_certs = sev_snp_certs_new(certs, input.certs_len);
+ if (!snp_certs) {
+ ret = -ENOMEM;
+ goto e_free;
+ }
+ }
+
+ sev_snp_certs_put(sev->snp_certs);
+ sev->snp_certs = snp_certs;
+
+ return 0;
+
+e_free:
+ kfree(certs);
+ return ret;
+}
+
static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
{
void __user *argp = (void __user *)arg;
@@ -1888,6 +2007,12 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SNP_PLATFORM_STATUS:
ret = sev_ioctl_snp_platform_status(&input);
break;
+ case SNP_SET_EXT_CONFIG:
+ ret = sev_ioctl_snp_set_config(&input, writable);
+ break;
+ case SNP_GET_EXT_CONFIG:
+ ret = sev_ioctl_snp_get_config(&input);
+ break;
default:
ret = -EINVAL;
goto out;
@@ -1936,6 +2061,54 @@ int sev_guest_df_flush(int *error)
}
EXPORT_SYMBOL_GPL(sev_guest_df_flush);
+static void sev_snp_certs_release(struct kref *kref)
+{
+ struct sev_snp_certs *certs = container_of(kref, struct sev_snp_certs, kref);
+
+ kfree(certs->data);
+ kfree(certs);
+}
+
+struct sev_snp_certs *sev_snp_certs_new(void *data, u32 len)
+{
+ struct sev_snp_certs *certs;
+
+ if (!len || !data)
+ return NULL;
+
+ certs = kzalloc(sizeof(*certs), GFP_KERNEL);
+ if (!certs)
+ return NULL;
+
+ certs->data = data;
+ certs->len = len;
+ kref_init(&certs->kref);
+
+ return certs;
+}
+EXPORT_SYMBOL_GPL(sev_snp_certs_new);
+
+struct sev_snp_certs *sev_snp_certs_get(struct sev_snp_certs *certs)
+{
+ if (!certs)
+ return NULL;
+
+ if (!kref_get_unless_zero(&certs->kref))
+ return NULL;
+
+ return certs;
+}
+EXPORT_SYMBOL_GPL(sev_snp_certs_get);
+
+void sev_snp_certs_put(struct sev_snp_certs *certs)
+{
+ if (!certs)
+ return;
+
+ kref_put(&certs->kref, sev_snp_certs_release);
+}
+EXPORT_SYMBOL_GPL(sev_snp_certs_put);
+
static void sev_exit(struct kref *ref)
{
misc_deregister(&misc_dev->misc);
diff --git a/drivers/crypto/ccp/sev-dev.h b/drivers/crypto/ccp/sev-dev.h
index 2c2fe42189a5..71eac493fd56 100644
--- a/drivers/crypto/ccp/sev-dev.h
+++ b/drivers/crypto/ccp/sev-dev.h
@@ -66,6 +66,8 @@ struct sev_device {
bool snp_initialized;
struct snp_host_map snp_host_map[MAX_SNP_HOST_MAP_BUFS];
+ struct sev_snp_certs *snp_certs;
+ struct sev_user_data_snp_config snp_config;
};
int sev_dev_init(struct psp_device *psp);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 9342cee1a1e6..3c605856ef4f 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -16,6 +16,16 @@
#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
+struct sev_snp_certs {
+ void *data;
+ u32 len;
+ struct kref kref;
+};
+
+struct sev_snp_certs *sev_snp_certs_new(void *data, u32 len);
+struct sev_snp_certs *sev_snp_certs_get(struct sev_snp_certs *certs);
+void sev_snp_certs_put(struct sev_snp_certs *certs);
+
/**
* SEV platform state
*/
diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
index b94b3687edbb..b70db9ab7e44 100644
--- a/include/uapi/linux/psp-sev.h
+++ b/include/uapi/linux/psp-sev.h
@@ -29,6 +29,8 @@ enum {
SEV_GET_ID, /* This command is deprecated, use SEV_GET_ID2 */
SEV_GET_ID2,
SNP_PLATFORM_STATUS,
+ SNP_SET_EXT_CONFIG,
+ SNP_GET_EXT_CONFIG,
SEV_MAX,
};
@@ -208,6 +210,21 @@ struct sev_user_data_snp_config {
__u8 rsvd1[52];
} __packed;
+/**
+ * struct sev_user_data_ext_snp_config - system-wide configuration value for SNP.
+ *
+ * @config_address: address of the struct sev_user_data_snp_config or 0 when
+ * reported_tcb does not need to be updated.
+ * @certs_address: address of extended guest request certificate chain or
+ * 0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
+ * @certs_len: length of the certs
+ */
+struct sev_user_data_ext_snp_config {
+ __u64 config_address; /* In */
+ __u64 certs_address; /* In */
+ __u32 certs_len; /* In */
+} __packed;
+
/**
* struct sev_issue_cmd - SEV ioctl parameters
*
--
2.25.1
^ permalink raw reply related [flat|nested] 158+ messages in thread
* Re: [PATCH v10 46/50] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
2023-10-16 13:28 ` [PATCH v10 46/50] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Michael Roth
@ 2023-10-16 23:11 ` Dionna Amalie Glaze
0 siblings, 0 replies; 158+ messages in thread
From: Dionna Amalie Glaze @ 2023-10-16 23:11 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Alexey Kardashevskiy
> +/**
> + * struct sev_data_snp_ext_config - system wide configuration value for SNP.
> + *
> + * @config_address: address of the struct sev_user_data_snp_config or 0 when
> + * reported_tcb does not need to be updated.
> + * @certs_address: address of extended guest request certificate chain or
> + * 0 when previous certificate should be removed on SNP_SET_EXT_CONFIG.
> + * @certs_len: length of the certs
> + */
> +struct sev_user_data_ext_snp_config {
> + __u64 config_address; /* In */
> + __u64 certs_address; /* In */
> + __u32 certs_len; /* In */
> +} __packed;
> +
Can we add a generation number to this? Whenever user space sets the
certs blob, it will invalidate the instance-specific certificates that
are settable in KVM.
The VMM will need to weave the instance-specific data with the new
certs installed at the machine level since we're not adding
interpretation of the cert blob to KVM.
--
-Dionna Glaze, PhD (she/her)
* [PATCH v10 47/50] x86/sev: Add KVM commands for per-instance certs
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (45 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 46/50] crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
` (2 subsequent siblings)
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Dionna Glaze, Tom Lendacky,
Alexey Kardashevskiy
From: Dionna Glaze <dionnaglaze@google.com>
The /dev/sev device has the ability to store host-wide certificates for
the key used by the AMD-SP for SEV-SNP attestation report signing,
but for hosts that want to specify additional certificates that are
specific to the image launched in a VM, a different way is needed to
communicate those certificates.
Add two new KVM ioctls to handle this: KVM_SEV_SNP_{GET,SET}_CERTS.
The certificates that are set with this command are expected to follow
the same format as the host certificates, but that format is opaque
to the kernel.
The new behavior for custom certificates is that the extended guest
request command will now return the overridden certificates if they
were installed for the instance. The error condition for a too-small
data buffer is changed to return the overridden certificate data size
if an overridden certificate set is installed.
Setting a zero-length certificate returns the system state to returning
only the host certificates on an extended guest request.
Also increase SEV_FW_BLOB_MAX_SIZE by another 4K page to allow space
for an extra certificate.
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: remove used of "we" and "this patch" in commit log, squash in
documentation patch]
Signed-off-by: Michael Roth <michael.roth@amd.com>
[aik: snp_handle_ext_guest_request() now uses the CCP's cert object
without copying things over, only refcounting needed.]
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
---
.../virt/kvm/x86/amd-memory-encryption.rst | 44 +++++++
arch/x86/kvm/svm/sev.c | 115 ++++++++++++++++++
arch/x86/kvm/svm/svm.h | 1 +
include/linux/psp-sev.h | 2 +-
include/uapi/linux/kvm.h | 12 ++
5 files changed, 173 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
index b89634cfcc06..2ce6c90f07d4 100644
--- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst
@@ -537,6 +537,50 @@ Returns: 0 on success, -negative on error
See SEV-SNP specification for further details on launch finish input parameters.
+22. KVM_SEV_SNP_GET_CERTS
+-------------------------
+
+After the SNP guest launch flow has started, the KVM_SEV_SNP_GET_CERTS command
+can be issued to request the data that has been installed with the
+KVM_SEV_SNP_SET_CERTS command.
+
+Parameters (in/out): struct kvm_sev_snp_get_certs
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_get_certs {
+ __u64 certs_uaddr;
+ __u64 certs_len;
+ };
+
+If no certs have been installed, then the return value is -ENOENT.
+If the buffer specified in the struct is too small, the certs_len field will
+be overwritten with the number of bytes required to receive all the
+certificate bytes, and the return value will be -EINVAL.
+
+23. KVM_SEV_SNP_SET_CERTS
+-------------------------
+
+After the SNP guest launch flow has started, the KVM_SEV_SNP_SET_CERTS command
+can be issued to override the /dev/sev certs data that is returned when a
+guest issues an extended guest request. This is useful for instance-specific
+extensions to the host certificates.
+
+Parameters (in/out): struct kvm_sev_snp_set_certs
+
+Returns: 0 on success, -negative on error
+
+::
+
+ struct kvm_sev_snp_set_certs {
+ __u64 certs_uaddr;
+ __u64 certs_len;
+ };
+
+The certs_len field may not exceed SEV_FW_BLOB_MAX_SIZE.
+
References
==========
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index efe879524b6c..602aaf82eef3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2301,6 +2301,113 @@ static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
return ret;
}
+static int snp_get_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ struct kvm_sev_snp_get_certs params;
+ struct sev_snp_certs *snp_certs;
+ int rc = 0;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+ sizeof(params)))
+ return -EFAULT;
+
+ snp_certs = sev_snp_certs_get(sev->snp_certs);
+ /* No instance certs set. */
+ if (!snp_certs)
+ return -ENOENT;
+
+ if (params.certs_len < snp_certs->len) {
+ /* Output buffer too small. Return the required size. */
+ params.certs_len = snp_certs->len;
+
+ if (copy_to_user((void __user *)(uintptr_t)argp->data, &params,
+ sizeof(params)))
+ rc = -EFAULT;
+ else
+ rc = -EINVAL; /* May be ENOSPC? */
+ } else {
+ if (copy_to_user((void __user *)(uintptr_t)params.certs_uaddr,
+ snp_certs->data, snp_certs->len))
+ rc = -EFAULT;
+ }
+
+ sev_snp_certs_put(snp_certs);
+
+ return rc;
+}
+
+static void snp_replace_certs(struct kvm_sev_info *sev, struct sev_snp_certs *snp_certs)
+{
+ sev_snp_certs_put(sev->snp_certs);
+ sev->snp_certs = snp_certs;
+}
+
+static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+ unsigned long length = SEV_FW_BLOB_MAX_SIZE;
+ struct kvm_sev_snp_set_certs params;
+ struct sev_snp_certs *snp_certs;
+ void *to_certs;
+ int ret;
+
+ if (!sev_snp_guest(kvm))
+ return -ENOTTY;
+
+ if (!sev->snp_context)
+ return -EINVAL;
+
+ if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data,
+ sizeof(params)))
+ return -EFAULT;
+
+ if (params.certs_len > SEV_FW_BLOB_MAX_SIZE)
+ return -EINVAL;
+
+ /*
+ * Setting a length of 0 is the same as "uninstalling" instance-
+ * specific certificates.
+ */
+ if (params.certs_len == 0) {
+ snp_replace_certs(sev, NULL);
+ return 0;
+ }
+
+ /* Page-align the length */
+ length = ALIGN(params.certs_len, PAGE_SIZE);
+
+ to_certs = kmalloc(length, GFP_KERNEL | __GFP_ZERO);
+ if (!to_certs)
+ return -ENOMEM;
+
+ if (copy_from_user(to_certs,
+ (void __user *)(uintptr_t)params.certs_uaddr,
+ params.certs_len)) {
+ ret = -EFAULT;
+ goto error_exit;
+ }
+
+ snp_certs = sev_snp_certs_new(to_certs, length);
+ if (!snp_certs) {
+ ret = -ENOMEM;
+ goto error_exit;
+ }
+
+ snp_replace_certs(sev, snp_certs);
+
+ return 0;
+error_exit:
+ kfree(to_certs);
+ return ret;
+}
+
int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
{
struct kvm_sev_cmd sev_cmd;
@@ -2400,6 +2507,12 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_FINISH:
r = snp_launch_finish(kvm, &sev_cmd);
break;
+ case KVM_SEV_SNP_GET_CERTS:
+ r = snp_get_instance_certs(kvm, &sev_cmd);
+ break;
+ case KVM_SEV_SNP_SET_CERTS:
+ r = snp_set_instance_certs(kvm, &sev_cmd);
+ break;
default:
r = -EINVAL;
goto out;
@@ -2616,6 +2729,8 @@ static int snp_decommission_context(struct kvm *kvm)
snp_free_firmware_page(sev->snp_context);
sev->snp_context = NULL;
+ sev_snp_certs_put(sev->snp_certs);
+
return 0;
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 1fd90a88b0db..bdf792ba06e1 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -97,6 +97,7 @@ struct kvm_sev_info {
u64 snp_init_flags;
void *snp_context; /* SNP guest context page */
u64 sev_features; /* Features set at VMSA creation */
+ struct sev_snp_certs *snp_certs;
};
struct kvm_svm {
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 3c605856ef4f..722e26d28d2f 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -14,7 +14,7 @@
#include <uapi/linux/psp-sev.h>
-#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
+#define SEV_FW_BLOB_MAX_SIZE 0x5000 /* 20KB */
struct sev_snp_certs {
void *data;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3af546adb962..0444e122ac5e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1973,6 +1973,8 @@ enum sev_cmd_id {
KVM_SEV_SNP_LAUNCH_START,
KVM_SEV_SNP_LAUNCH_UPDATE,
KVM_SEV_SNP_LAUNCH_FINISH,
+ KVM_SEV_SNP_GET_CERTS,
+ KVM_SEV_SNP_SET_CERTS,
KVM_SEV_NR_MAX,
};
@@ -2120,6 +2122,16 @@ struct kvm_sev_snp_launch_finish {
__u8 pad[6];
};
+struct kvm_sev_snp_get_certs {
+ __u64 certs_uaddr;
+ __u64 certs_len;
+};
+
+struct kvm_sev_snp_set_certs {
+ __u64 certs_uaddr;
+ __u64 certs_len;
+};
+
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX (1 << 2)
--
2.25.1
* [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (46 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 47/50] x86/sev: Add KVM commands for per-instance certs Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 23:18 ` Dionna Amalie Glaze
2023-10-16 13:28 ` [PATCH v10 49/50] crypto: ccp: Add debug support for decrypting pages Michael Roth
2023-10-16 13:28 ` [PATCH v10 50/50] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump Michael Roth
49 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Alexey Kardashevskiy
From: Brijesh Singh <brijesh.singh@amd.com>
Version 2 of the GHCB specification added support for two SNP Guest
Request Message NAE events. These events allow an SEV-SNP guest to
make requests to the SEV-SNP firmware through the hypervisor using the
SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification.
The SNP_EXT_GUEST_REQUEST is similar to SNP_GUEST_REQUEST, with the
difference being an additional certificate blob that can be passed
through the SNP_SET_CONFIG ioctl defined in the CCP driver. The CCP
driver provides snp_guest_ext_guest_request(), which is used by KVM to
get both the report and certificate data at once.
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: ensure FW command failures are indicated to guest]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kvm/svm/sev.c | 176 +++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 1 +
drivers/crypto/ccp/sev-dev.c | 15 +++
include/linux/psp-sev.h | 1 +
4 files changed, 193 insertions(+)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 602aaf82eef3..d71ec257debb 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -19,6 +19,7 @@
#include <linux/misc_cgroup.h>
#include <linux/processor.h>
#include <linux/trace_events.h>
+#include <uapi/linux/sev-guest.h>
#include <asm/pkru.h>
#include <asm/trapnr.h>
@@ -339,6 +340,8 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
ret = verify_snp_init_flags(kvm, argp);
if (ret)
goto e_free;
+
+ mutex_init(&sev->guest_req_lock);
}
ret = sev_platform_init(&argp->error);
@@ -2345,8 +2348,10 @@ static int snp_get_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
static void snp_replace_certs(struct kvm_sev_info *sev, struct sev_snp_certs *snp_certs)
{
+ mutex_lock(&sev->guest_req_lock);
sev_snp_certs_put(sev->snp_certs);
sev->snp_certs = snp_certs;
+ mutex_unlock(&sev->guest_req_lock);
}
static int snp_set_instance_certs(struct kvm *kvm, struct kvm_sev_cmd *argp)
@@ -3218,6 +3223,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_HV_FEATURES:
case SVM_VMGEXIT_PSC:
case SVM_VMGEXIT_TERM_REQUEST:
+ case SVM_VMGEXIT_GUEST_REQUEST:
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
break;
default:
reason = GHCB_ERR_INVALID_EVENT;
@@ -3627,6 +3634,163 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
return ret;
}
+static unsigned long snp_setup_guest_buf(struct vcpu_svm *svm,
+ struct sev_data_snp_guest_request *data,
+ gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ kvm_pfn_t req_pfn, resp_pfn;
+ struct kvm_sev_info *sev;
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ if (!IS_ALIGNED(req_gpa, PAGE_SIZE) || !IS_ALIGNED(resp_gpa, PAGE_SIZE))
+ return SEV_RET_INVALID_PARAM;
+
+ req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
+ if (is_error_noslot_pfn(req_pfn))
+ return SEV_RET_INVALID_ADDRESS;
+
+ resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
+ if (is_error_noslot_pfn(resp_pfn))
+ return SEV_RET_INVALID_ADDRESS;
+
+ if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true))
+ return SEV_RET_INVALID_ADDRESS;
+
+ data->gctx_paddr = __psp_pa(sev->snp_context);
+ data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
+ data->res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
+
+ return 0;
+}
+
+static void snp_cleanup_guest_buf(struct sev_data_snp_guest_request *data, unsigned long *rc)
+{
+ u64 pfn = __sme_clr(data->res_paddr) >> PAGE_SHIFT;
+ int ret;
+
+ ret = snp_page_reclaim(pfn);
+ if (ret)
+ *rc = SEV_RET_INVALID_ADDRESS;
+
+ ret = rmp_make_shared(pfn, PG_LEVEL_4K);
+ if (ret)
+ *rc = SEV_RET_INVALID_ADDRESS;
+}
+
+static void snp_handle_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct sev_data_snp_guest_request data = {0};
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_sev_info *sev;
+ unsigned long rc;
+ int err;
+
+ if (!sev_snp_guest(vcpu->kvm)) {
+ rc = SEV_RET_INVALID_GUEST;
+ goto e_fail;
+ }
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ mutex_lock(&sev->guest_req_lock);
+
+ rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
+ if (rc)
+ goto unlock;
+
+ rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
+ if (rc)
+ /* Ensure an error value is returned to guest. */
+ rc = err ? err : SEV_RET_INVALID_ADDRESS;
+
+ snp_cleanup_guest_buf(&data, &rc);
+
+unlock:
+ mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
+}
+
+static void snp_handle_ext_guest_request(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
+{
+ struct sev_data_snp_guest_request req = {0};
+ struct sev_snp_certs *snp_certs = NULL;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+ struct kvm *kvm = vcpu->kvm;
+ unsigned long data_npages;
+ struct kvm_sev_info *sev;
+ unsigned long exitcode = 0;
+ u64 data_gpa;
+ int err, rc;
+
+ if (!sev_snp_guest(vcpu->kvm)) {
+ rc = SEV_RET_INVALID_GUEST;
+ goto e_fail;
+ }
+
+ sev = &to_kvm_svm(kvm)->sev_info;
+
+ data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+ data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+ if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
+ exitcode = SEV_RET_INVALID_ADDRESS;
+ goto e_fail;
+ }
+
+ mutex_lock(&sev->guest_req_lock);
+
+ rc = snp_setup_guest_buf(svm, &req, req_gpa, resp_gpa);
+ if (rc)
+ goto unlock;
+
+ /*
+ * If a VMM-specific certificate blob hasn't been provided, grab the
+ * host-wide one.
+ */
+ snp_certs = sev_snp_certs_get(sev->snp_certs);
+ if (!snp_certs)
+ snp_certs = sev_snp_global_certs_get();
+
+ /*
+ * If there is a host-wide or VMM-specific certificate blob available,
+ * make sure the guest has allocated enough space to store it.
+ * Otherwise, inform the guest how much space is needed.
+ */
+ if (snp_certs && (data_npages << PAGE_SHIFT) < snp_certs->len) {
+ vcpu->arch.regs[VCPU_REGS_RBX] = snp_certs->len >> PAGE_SHIFT;
+ exitcode = SNP_GUEST_VMM_ERR(SNP_GUEST_VMM_ERR_INVALID_LEN);
+ goto cleanup;
+ }
+
+ rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &req, &err);
+ if (rc) {
+ /* pass the firmware error code */
+ exitcode = err;
+ goto cleanup;
+ }
+
+ /* Copy the certificate blob in the guest memory */
+ if (snp_certs &&
+ kvm_write_guest(kvm, data_gpa, snp_certs->data, snp_certs->len))
+ exitcode = SEV_RET_INVALID_ADDRESS;
+
+cleanup:
+ sev_snp_certs_put(snp_certs);
+ snp_cleanup_guest_buf(&req, &exitcode);
+
+unlock:
+ mutex_unlock(&sev->guest_req_lock);
+
+e_fail:
+ ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -3894,6 +4058,18 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
vcpu->run->system_event.ndata = 1;
vcpu->run->system_event.data[0] = control->ghcb_gpa;
break;
+ case SVM_VMGEXIT_GUEST_REQUEST:
+ snp_handle_guest_request(svm, control->exit_info_1, control->exit_info_2);
+
+ ret = 1;
+ break;
+ case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+ snp_handle_ext_guest_request(svm,
+ control->exit_info_1,
+ control->exit_info_2);
+
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bdf792ba06e1..3673a6e4e22e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -98,6 +98,7 @@ struct kvm_sev_info {
void *snp_context; /* SNP guest context page */
u64 sev_features; /* Features set at VMSA creation */
struct sev_snp_certs *snp_certs;
+ struct mutex guest_req_lock; /* Lock for guest request handling */
};
struct kvm_svm {
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 4807ddd6ec52..f9c75c561c4e 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2109,6 +2109,21 @@ void sev_snp_certs_put(struct sev_snp_certs *certs)
}
EXPORT_SYMBOL_GPL(sev_snp_certs_put);
+struct sev_snp_certs *sev_snp_global_certs_get(void)
+{
+ struct sev_device *sev;
+
+ if (!psp_master || !psp_master->sev_data)
+ return NULL;
+
+ sev = psp_master->sev_data;
+ if (!sev->snp_initialized)
+ return NULL;
+
+ return sev_snp_certs_get(sev->snp_certs);
+}
+EXPORT_SYMBOL_GPL(sev_snp_global_certs_get);
+
static void sev_exit(struct kref *ref)
{
misc_deregister(&misc_dev->misc);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 722e26d28d2f..3b294ccbbec9 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -25,6 +25,7 @@ struct sev_snp_certs {
struct sev_snp_certs *sev_snp_certs_new(void *data, u32 len);
struct sev_snp_certs *sev_snp_certs_get(struct sev_snp_certs *certs);
void sev_snp_certs_put(struct sev_snp_certs *certs);
+struct sev_snp_certs *sev_snp_global_certs_get(void);
/**
* SEV platform state
--
2.25.1
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-16 13:28 ` [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
@ 2023-10-16 23:18 ` Dionna Amalie Glaze
2023-10-17 16:27 ` Sean Christopherson
0 siblings, 1 reply; 158+ messages in thread
From: Dionna Amalie Glaze @ 2023-10-16 23:18 UTC (permalink / raw)
To: Michael Roth
Cc: kvm, linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, Alexey Kardashevskiy
> +
> + /*
> + * If a VMM-specific certificate blob hasn't been provided, grab the
> + * host-wide one.
> + */
> + snp_certs = sev_snp_certs_get(sev->snp_certs);
> + if (!snp_certs)
> + snp_certs = sev_snp_global_certs_get();
> +
This is where the generation I suggested adding would get checked. If
the instance certs' generation is not the global generation, then I
think we need a way to return to the VMM to make that right before
continuing to provide outdated certificates.
This might be an unreasonable request, but the fact that the certs and
reported_tcb can be set while a VM is running makes this an issue.
--
-Dionna Glaze, PhD (she/her)
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-16 23:18 ` Dionna Amalie Glaze
@ 2023-10-17 16:27 ` Sean Christopherson
2023-10-18 2:28 ` Alexey Kardashevskiy
0 siblings, 1 reply; 158+ messages in thread
From: Sean Christopherson @ 2023-10-17 16:27 UTC (permalink / raw)
To: Dionna Amalie Glaze
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, vkuznets, jmattson, luto, dave.hansen, slp, pgonda,
peterz, srinivas.pandruvada, rientjes, dovmurik, tobin, bp,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh, Alexey Kardashevskiy
On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
> > +
> > + /*
> > + * If a VMM-specific certificate blob hasn't been provided, grab the
> > + * host-wide one.
> > + */
> > + snp_certs = sev_snp_certs_get(sev->snp_certs);
> > + if (!snp_certs)
> > + snp_certs = sev_snp_global_certs_get();
> > +
>
> This is where the generation I suggested adding would get checked. If
> the instance certs' generation is not the global generation, then I
> think we need a way to return to the VMM to make that right before
> continuing to provide outdated certificates.
> This might be an unreasonable request, but the fact that the certs and
> reported_tcb can be set while a VM is running makes this an issue.
Before we get that far, the changelogs need to explain why the kernel is storing
userspace blobs in the first place. The whole thing is a bit of a mess.
sev_snp_global_certs_get() has data races that could lead to variations of TOCTOU
bugs: sev_ioctl_snp_set_config() can overwrite psp_master->sev_data->snp_certs
while sev_snp_global_certs_get() is running. If the compiler reloads snp_certs
between bumping the refcount and grabbing the pointer, KVM will end up leaking a
refcount and consuming a pointer without a refcount.
if (!kref_get_unless_zero(&certs->kref))
return NULL;
return certs;
If allocating memory for the certs fails, the kernel will have set the config
but not stored the corresponding certs.
ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
if (ret)
goto e_free;
memcpy(&sev->snp_config, &config, sizeof(config));
}
/*
* If the new certs are passed then cache it else free the old certs.
*/
if (input.certs_len) {
snp_certs = sev_snp_certs_new(certs, input.certs_len);
if (!snp_certs) {
ret = -ENOMEM;
goto e_free;
}
}
Reasoning about ordering is also difficult, e.g. what is KVM's contract with
userspace in terms of recognizing new global certs?
I don't understand why the kernel needs to manage the certs. AFAICT the so called
global certs aren't an input to SEV_CMD_SNP_CONFIG, i.e. SNP_SET_EXT_CONFIG is
purely a software defined thing.
The easiest solution I can think of is to have KVM provide a chunk of memory in
kvm_sev_info for SNP guests that userspace can mmap(), a la vcpu->run.
struct sev_snp_certs {
u8 data[KVM_MAX_SEV_SNP_CERT_SIZE];
u32 size;
u8 pad[<size to make the struct page aligned>];
};
When the guest requests the certs, KVM does something like:
certs_size = READ_ONCE(sev->snp_certs->size);
if (certs_size > sizeof(sev->snp_certs->data) ||
!IS_ALIGNED(certs_size, PAGE_SIZE))
certs_size = 0;
if (certs_size && (data_npages << PAGE_SHIFT) < certs_size) {
vcpu->arch.regs[VCPU_REGS_RBX] = certs_size >> PAGE_SHIFT;
exitcode = SNP_GUEST_VMM_ERR(SNP_GUEST_VMM_ERR_INVALID_LEN);
goto cleanup;
}
...
if (certs_size &&
kvm_write_guest(kvm, data_gpa, sev->snp_certs->data, certs_size))
exitcode = SEV_RET_INVALID_ADDRESS;
If userspace wants to provide garbage to the guest, so be it, not KVM's problem.
That way, whether the VM gets the global cert or a per-VM cert is purely a userspace
concern.
If userspace needs to *stall* cert requests, e.g. while the certs are being updated,
then that's a different issue entirely. If the GHCB allows telling the guest to
retry the request, then it should be trivially easy to solve, e.g. add a flag in
sev_snp_certs. If KVM must "immediately" handle the request, then we'll need more
elaborate uAPI.
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-17 16:27 ` Sean Christopherson
@ 2023-10-18 2:28 ` Alexey Kardashevskiy
2023-10-18 13:48 ` Sean Christopherson
0 siblings, 1 reply; 158+ messages in thread
From: Alexey Kardashevskiy @ 2023-10-18 2:28 UTC (permalink / raw)
To: Sean Christopherson, Dionna Amalie Glaze
Cc: Michael Roth, kvm, linux-coco, linux-mm, linux-crypto, x86,
linux-kernel, tglx, mingo, jroedel, thomas.lendacky, hpa, ardb,
pbonzini, vkuznets, jmattson, luto, dave.hansen, slp, pgonda,
peterz, srinivas.pandruvada, rientjes, dovmurik, tobin, bp,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
On 18/10/23 03:27, Sean Christopherson wrote:
> On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
>>> +
>>> + /*
>>> + * If a VMM-specific certificate blob hasn't been provided, grab the
>>> + * host-wide one.
>>> + */
>>> + snp_certs = sev_snp_certs_get(sev->snp_certs);
>>> + if (!snp_certs)
>>> + snp_certs = sev_snp_global_certs_get();
>>> +
>>
>> This is where the generation I suggested adding would get checked. If
>> the instance certs' generation is not the global generation, then I
>> think we need a way to return to the VMM to make that right before
>> continuing to provide outdated certificates.
>> This might be an unreasonable request, but the fact that the certs and
>> reported_tcb can be set while a VM is running makes this an issue.
>
> Before we get that far, the changelogs need to explain why the kernel is storing
> userspace blobs in the first place. The whole thing is a bit of a mess.
>
> sev_snp_global_certs_get() has data races that could lead to variations of TOCTOU
> bugs: sev_ioctl_snp_set_config() can overwrite psp_master->sev_data->snp_certs
> while sev_snp_global_certs_get() is running. If the compiler reloads snp_certs
> between bumping the refcount and grabbing the pointer, KVM will end up leaking a
> refcount and consuming a pointer without a refcount.
>
> if (!kref_get_unless_zero(&certs->kref))
> return NULL;
>
> return certs;
I'm missing something here. The @certs pointer is on the stack; if it is
being released elsewhere, kref_get_unless_zero() is going to fail and
return NULL. How can this @certs not have the refcount incremented?
> If allocating memory for the certs fails, the kernel will have set the config
> but not store the corresponding certs.
Ah true.
> ret = __sev_do_cmd_locked(SEV_CMD_SNP_CONFIG, &config, &argp->error);
> if (ret)
> goto e_free;
>
> memcpy(&sev->snp_config, &config, sizeof(config));
> }
>
> /*
> * If the new certs are passed then cache it else free the old certs.
> */
> if (input.certs_len) {
> snp_certs = sev_snp_certs_new(certs, input.certs_len);
> if (!snp_certs) {
> ret = -ENOMEM;
> goto e_free;
> }
> }
>
> Reasoning about ordering is also difficult, e.g. what is KVM's contract with
> userspace in terms of recognizing new global certs?
>
> I don't understand why the kernel needs to manage the certs. AFAICT the so called
> global certs aren't an input to SEV_CMD_SNP_CONFIG, i.e. SNP_SET_EXT_CONFIG is
> purely a software defined thing.
>
> The easiest solution I can think of is to have KVM provide a chunk of memory in
> kvm_sev_info for SNP guests that userspace can mmap(), a la vcpu->run.
>
> struct sev_snp_certs {
> u8 data[KVM_MAX_SEV_SNP_CERT_SIZE];
> u32 size;
> u8 pad[<size to make the struct page aligned>];
> };
>
> When the guest requests the certs, KVM does something like:
>
> certs_size = READ_ONCE(sev->snp_certs->size);
> if (certs_size > sizeof(sev->snp_certs->data) ||
> !IS_ALIGNED(certs_size, PAGE_SIZE))
> certs_size = 0;
>
> if (certs_size && (data_npages << PAGE_SHIFT) < certs_size) {
> vcpu->arch.regs[VCPU_REGS_RBX] = certs_size >> PAGE_SHIFT;
> exitcode = SNP_GUEST_VMM_ERR(SNP_GUEST_VMM_ERR_INVALID_LEN);
> goto cleanup;
> }
>
> ...
>
> if (certs_size &&
> kvm_write_guest(kvm, data_gpa, sev->snp_certs->data, certs_size))
> exitcode = SEV_RET_INVALID_ADDRESS;
>
> If userspace wants to provide garbage to the guest, so be it, not KVM's problem.
> That way, whether the VM gets the global cert or a per-VM cert is purely a userspace
> concern.
The global cert lives in CCP (/dev/sev); the per-VM cert lives in
kvmvm_fd. "A la vcpu->run" is fine for the latter, but for the former we
need something else. And there is a scenario where one global certs blob is
what is needed, and copying it over multiple VMs seems suboptimal.
> If userspace needs to *stall* cert requests, e.g. while the certs are being updated,
afaik it does not need to.
> then that's a different issue entirely. If the GHCB allows telling the guest to
> retry the request, then it should be trivially easy to solve, e.g. add a flag in
> sev_snp_certs. If KVM must "immediately" handle the request, then we'll need more
> elaborate uAPI.
--
Alexey
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-18 2:28 ` Alexey Kardashevskiy
@ 2023-10-18 13:48 ` Sean Christopherson
2023-10-18 20:27 ` Kalra, Ashish
` (2 more replies)
0 siblings, 3 replies; 158+ messages in thread
From: Sean Christopherson @ 2023-10-18 13:48 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Dionna Amalie Glaze, Michael Roth, kvm, linux-coco, linux-mm,
linux-crypto, x86, linux-kernel, tglx, mingo, jroedel,
thomas.lendacky, hpa, ardb, pbonzini, vkuznets, jmattson, luto,
dave.hansen, slp, pgonda, peterz, srinivas.pandruvada, rientjes,
dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
On Wed, Oct 18, 2023, Alexey Kardashevskiy wrote:
>
> On 18/10/23 03:27, Sean Christopherson wrote:
> > On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
> > > > +
> > > > + /*
> > > > + * If a VMM-specific certificate blob hasn't been provided, grab the
> > > > + * host-wide one.
> > > > + */
> > > > + snp_certs = sev_snp_certs_get(sev->snp_certs);
> > > > + if (!snp_certs)
> > > > + snp_certs = sev_snp_global_certs_get();
> > > > +
> > >
> > > This is where the generation I suggested adding would get checked. If
> > > the instance certs' generation is not the global generation, then I
> > > think we need a way to return to the VMM to make that right before
> > > continuing to provide outdated certificates.
> > > This might be an unreasonable request, but the fact that the certs and
> > > reported_tcb can be set while a VM is running makes this an issue.
> >
> > Before we get that far, the changelogs need to explain why the kernel is storing
> > userspace blobs in the first place. The whole thing is a bit of a mess.
> >
> > sev_snp_global_certs_get() has data races that could lead to variations of TOCTOU
> > bugs: sev_ioctl_snp_set_config() can overwrite psp_master->sev_data->snp_certs
> > while sev_snp_global_certs_get() is running. If the compiler reloads snp_certs
> > between bumping the refcount and grabbing the pointer, KVM will end up leaking a
> > refcount and consuming a pointer without a refcount.
> >
> > if (!kref_get_unless_zero(&certs->kref))
> > return NULL;
> >
> > return certs;
>
> I'm missing something here. The @certs pointer is on the stack,
No, nothing guarantees that @certs is on the stack and will never be reloaded.
sev_snp_certs_get() is in full view of sev_snp_global_certs_get(), so it's entirely
possible that it can be inlined. Then you end up with:
struct sev_device *sev;
if (!psp_master || !psp_master->sev_data)
return NULL;
sev = psp_master->sev_data;
if (!sev->snp_initialized)
return NULL;
if (!sev->snp_certs)
return NULL;
if (!kref_get_unless_zero(&sev->snp_certs->kref))
return NULL;
return sev->snp_certs;
At which point the compiler could choose to omit a local variable entirely; it
could store @certs in a register and reload after kref_get_unless_zero(), etc.
If psp_master->sev_data->snp_certs is changed at any point, odd things can happen.
That atomic operation in kref_get_unless_zero() might prevent a reload between
getting the kref and the return, but it wouldn't prevent a reload between the
!NULL check and kref_get_unless_zero().
> > If userspace wants to provide garbage to the guest, so be it, not KVM's problem.
> > That way, whether the VM gets the global cert or a per-VM cert is purely a userspace
> > concern.
>
> The global cert lives in CCP (/dev/sev), the per VM cert lives in kvmvm_fd.
> "A la vcpu->run" is fine for the latter but for the former we need something
> else.
Why? The cert ultimately comes from userspace, no? Make userspace deal with it.
> And there is scenario when one global certs blob is what is needed and
> copying it over multiple VMs seems suboptimal.
That's a solvable problem. I'm not sure I like the most obvious solution, but it
is a solution: let userspace define a KVM-wide blob pointer, either via .mmap()
or via an ioctl().
FWIW, there's no need to do .mmap() shenanigans, e.g. an ioctl() to set the
userspace pointer would suffice. The benefit of a kernel controlled pointer is
that it doesn't require copying to a kernel buffer (or special code to copy from
userspace into guest).
Actually, looking at the flow again, AFAICT there's nothing special about the
target DATA_PAGE. It must be SHARED *before* SVM_VMGEXIT_EXT_GUEST_REQUEST, i.e.
KVM doesn't need to do conversions, there are no kernel privileges required, etc.
And the GHCB doesn't dictate ordering between storing the certificates and doing
the request. That means the certificate stuff can be punted entirely to userspace.
Heh, typing up the below, there's another bug: KVM will incorrectly "return" '0'
for non-SNP guests:
unsigned long exitcode = 0;
u64 data_gpa;
int err, rc;
if (!sev_snp_guest(vcpu->kvm)) {
rc = SEV_RET_INVALID_GUEST; <= sets "rc", not "exitcode"
goto e_fail;
}
e_fail:
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);
Which really highlights that we need to get test infrastructure up and running
for SEV-ES, SNP, and TDX.
Anyways, back to punting to userspace. Here's a rough sketch. The only new uAPI
is the definition of KVM_HC_SNP_GET_CERTS and its arguments.
static void snp_handle_guest_request(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
struct sev_data_snp_guest_request data = {0};
struct kvm_vcpu *vcpu = &svm->vcpu;
struct kvm *kvm = vcpu->kvm;
struct kvm_sev_info *sev;
gpa_t req_gpa = control->exit_info_1;
gpa_t resp_gpa = control->exit_info_2;
unsigned long rc;
int err;
if (!sev_snp_guest(vcpu->kvm)) {
rc = SEV_RET_INVALID_GUEST;
goto e_fail;
}
sev = &to_kvm_svm(kvm)->sev_info;
mutex_lock(&sev->guest_req_lock);
rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
if (rc)
goto unlock;
rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
if (rc)
/* Ensure an error value is returned to guest. */
rc = err ? err : SEV_RET_INVALID_ADDRESS;
snp_cleanup_guest_buf(&data, &rc);
unlock:
mutex_unlock(&sev->guest_req_lock);
e_fail:
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
}
static int snp_complete_ext_guest_request(struct kvm_vcpu *vcpu)
{
u64 certs_exitcode = vcpu->run->hypercall.args[2];
struct vcpu_svm *svm = to_svm(vcpu);
if (certs_exitcode)
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, certs_exitcode);
else
snp_handle_guest_request(svm);
return 1;
}
static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
struct kvm *kvm = vcpu->kvm;
struct kvm_sev_info *sev;
unsigned long exitcode;
u64 data_gpa;
if (!sev_snp_guest(vcpu->kvm)) {
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
return 1;
}
data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
return 1;
}
vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
vcpu->run->hypercall.args[0] = data_gpa;
vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
return 0;
}
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-18 13:48 ` Sean Christopherson
@ 2023-10-18 20:27 ` Kalra, Ashish
2023-10-18 20:38 ` Sean Christopherson
2023-10-19 2:48 ` Alexey Kardashevskiy
2023-11-10 22:07 ` Michael Roth
2 siblings, 1 reply; 158+ messages in thread
From: Kalra, Ashish @ 2023-10-18 20:27 UTC (permalink / raw)
To: Sean Christopherson, Alexey Kardashevskiy
Cc: Dionna Amalie Glaze, Michael Roth, kvm, linux-coco, linux-mm,
linux-crypto, x86, linux-kernel, tglx, mingo, jroedel,
thomas.lendacky, hpa, ardb, pbonzini, vkuznets, jmattson, luto,
dave.hansen, slp, pgonda, peterz, srinivas.pandruvada, rientjes,
dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, nikunj.dadhania,
pankaj.gupta, liam.merwick, zhi.a.wang, Brijesh Singh
On 10/18/2023 8:48 AM, Sean Christopherson wrote:
> On Wed, Oct 18, 2023, Alexey Kardashevskiy wrote:
>>
>> On 18/10/23 03:27, Sean Christopherson wrote:
>>> On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
>>>>> +
>>>>> + /*
>>>>> + * If a VMM-specific certificate blob hasn't been provided, grab the
>>>>> + * host-wide one.
>>>>> + */
>>>>> + snp_certs = sev_snp_certs_get(sev->snp_certs);
>>>>> + if (!snp_certs)
>>>>> + snp_certs = sev_snp_global_certs_get();
>>>>> +
>>>>
>>>> This is where the generation I suggested adding would get checked. If
>>>> the instance certs' generation is not the global generation, then I
>>>> think we need a way to return to the VMM to make that right before
>>>> continuing to provide outdated certificates.
>>>> This might be an unreasonable request, but the fact that the certs and
>>>> reported_tcb can be set while a VM is running makes this an issue.
>>>
>>> Before we get that far, the changelogs need to explain why the kernel is storing
>>> userspace blobs in the first place. The whole thing is a bit of a mess.
>>>
>>> sev_snp_global_certs_get() has data races that could lead to variations of TOCTOU
>>> bugs: sev_ioctl_snp_set_config() can overwrite psp_master->sev_data->snp_certs
>>> while sev_snp_global_certs_get() is running. If the compiler reloads snp_certs
>>> between bumping the refcount and grabbing the pointer, KVM will end up leaking a
>>> refcount and consuming a pointer without a refcount.
>>>
>>> if (!kref_get_unless_zero(&certs->kref))
>>> return NULL;
>>>
>>> return certs;
>>
>> I'm missing something here. The @certs pointer is on the stack,
>
> No, nothing guarantees that @certs is on the stack and will never be reloaded.
> sev_snp_certs_get() is in full view of sev_snp_global_certs_get(), so it's entirely
> possible that it can be inlined. Then you end up with:
>
> struct sev_device *sev;
>
> if (!psp_master || !psp_master->sev_data)
> return NULL;
>
> sev = psp_master->sev_data;
> if (!sev->snp_initialized)
> return NULL;
>
> if (!sev->snp_certs)
> return NULL;
>
> if (!kref_get_unless_zero(&sev->snp_certs->kref))
> return NULL;
>
> return sev->snp_certs;
>
> At which point the compiler could choose to omit a local variable entirely; it
> could store @certs in a register and reload after kref_get_unless_zero(), etc.
> If psp_master->sev_data->snp_certs is changed at any point, odd things can happen.
>
> That atomic operation in kref_get_unless_zero() might prevent a reload between
> getting the kref and the return, but it wouldn't prevent a reload between the
> !NULL check and kref_get_unless_zero().
>
>>> If userspace wants to provide garbage to the guest, so be it, not KVM's problem.
>>> That way, whether the VM gets the global cert or a per-VM cert is purely a userspace
>>> concern.
>>
>> The global cert lives in CCP (/dev/sev), the per VM cert lives in kvmvm_fd.
>> "A la vcpu->run" is fine for the latter but for the former we need something
>> else.
>
> Why? The cert ultimately comes from userspace, no? Make userspace deal with it.
>
>> And there is scenario when one global certs blob is what is needed and
>> copying it over multiple VMs seems suboptimal.
>
> That's a solvable problem. I'm not sure I like the most obvious solution, but it
> is a solution: let userspace define a KVM-wide blob pointer, either via .mmap()
> or via an ioctl().
>
> FWIW, there's no need to do .mmap() shenanigans, e.g. an ioctl() to set the
> userspace pointer would suffice. The benefit of a kernel controlled pointer is
> that it doesn't require copying to a kernel buffer (or special code to copy from
> userspace into guest).
>
> Actually, looking at the flow again, AFAICT there's nothing special about the
> target DATA_PAGE. It must be SHARED *before* SVM_VMGEXIT_EXT_GUEST_REQUEST, i.e.
> KVM doesn't need to do conversions, there are no kernel privileges required, etc.
> And the GHCB doesn't dictate ordering between storing the certificates and doing
> the request.
That's true.
> That means the certificate stuff can be punted entirely to userspace.
>
> Heh, typing up the below, there's another bug: KVM will incorrectly "return" '0'
> for non-SNP guests:
>
> unsigned long exitcode = 0;
> u64 data_gpa;
> int err, rc;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> rc = SEV_RET_INVALID_GUEST; <= sets "rc", not "exitcode"
> goto e_fail;
> }
>
> e_fail:
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);
>
> Which really highlights that we need to get test infrastructure up and running
> for SEV-ES, SNP, and TDX.
>
> Anyways, back to punting to userspace. Here's a rough sketch. The only new uAPI
> is the definition of KVM_HC_SNP_GET_CERTS and its arguments.
>
> static void snp_handle_guest_request(struct vcpu_svm *svm)
> {
> struct vmcb_control_area *control = &svm->vmcb->control;
> struct sev_data_snp_guest_request data = {0};
> struct kvm_vcpu *vcpu = &svm->vcpu;
> struct kvm *kvm = vcpu->kvm;
> struct kvm_sev_info *sev;
> gpa_t req_gpa = control->exit_info_1;
> gpa_t resp_gpa = control->exit_info_2;
> unsigned long rc;
> int err;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> rc = SEV_RET_INVALID_GUEST;
> goto e_fail;
> }
>
> sev = &to_kvm_svm(kvm)->sev_info;
>
> mutex_lock(&sev->guest_req_lock);
>
> rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
> if (rc)
> goto unlock;
>
> rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> if (rc)
> /* Ensure an error value is returned to guest. */
> rc = err ? err : SEV_RET_INVALID_ADDRESS;
>
> snp_cleanup_guest_buf(&data, &rc);
>
> unlock:
> mutex_unlock(&sev->guest_req_lock);
>
> e_fail:
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
> }
>
> static int snp_complete_ext_guest_request(struct kvm_vcpu *vcpu)
> {
> u64 certs_exitcode = vcpu->run->hypercall.args[2];
> struct vcpu_svm *svm = to_svm(vcpu);
>
> if (certs_exitcode)
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, certs_exitcode);
> else
> snp_handle_guest_request(svm);
> return 1;
> }
>
> static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
> {
> struct kvm_vcpu *vcpu = &svm->vcpu;
> struct kvm *kvm = vcpu->kvm;
> struct kvm_sev_info *sev;
> unsigned long exitcode;
> u64 data_gpa;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
> return 1;
> }
>
> data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
> return 1;
> }
>
> vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
> vcpu->run->hypercall.args[0] = data_gpa;
> vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
> vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
> vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
> return 0;
> }
>
IIRC, the important consideration here is to ensure that getting the
attestation report and retrieving the certificates appear atomic to the
guest. When SNP live migration is supported, we don't want a case where
the guest could have migrated between the call to obtain the
certificates and obtaining the attestation report, which can potentially
cause validation of the attestation report to fail.
Thanks,
Ashish
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-18 20:27 ` Kalra, Ashish
@ 2023-10-18 20:38 ` Sean Christopherson
2023-10-18 21:27 ` Kalra, Ashish
0 siblings, 1 reply; 158+ messages in thread
From: Sean Christopherson @ 2023-10-18 20:38 UTC (permalink / raw)
To: Ashish Kalra
Cc: Alexey Kardashevskiy, Dionna Amalie Glaze, Michael Roth, kvm,
linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang, Brijesh Singh
On Wed, Oct 18, 2023, Ashish Kalra wrote:
> > static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
> > {
> > struct kvm_vcpu *vcpu = &svm->vcpu;
> > struct kvm *kvm = vcpu->kvm;
> > struct kvm_sev_info *sev;
> > unsigned long exitcode;
> > u64 data_gpa;
> >
> > if (!sev_snp_guest(vcpu->kvm)) {
> > ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
> > return 1;
> > }
> >
> > data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> > if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> > ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
> > return 1;
> > }
> >
> > vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
> > vcpu->run->hypercall.args[0] = data_gpa;
> > vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
> > vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
> > vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
> > return 0;
> > }
> >
>
> IIRC, the important consideration here is to ensure that getting the
> attestation report and retrieving the certificates appears atomic to the
> guest. When SNP live migration is supported we don't want a case where the
> guest could have migrated between the call to obtain the certificates and
> obtaining the attestation report, which can potentially cause failure of
> validation of the attestation report.
Where does "obtaining the attestation report" happen? I see the guest request
and the certificate stuff, I don't see anything about attestation reports (though
I'm not looking very closely).
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-18 20:38 ` Sean Christopherson
@ 2023-10-18 21:27 ` Kalra, Ashish
2023-10-18 21:43 ` Sean Christopherson
0 siblings, 1 reply; 158+ messages in thread
From: Kalra, Ashish @ 2023-10-18 21:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Alexey Kardashevskiy, Dionna Amalie Glaze, Michael Roth, kvm,
linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang, Brijesh Singh
On 10/18/2023 3:38 PM, Sean Christopherson wrote:
> On Wed, Oct 18, 2023, Ashish Kalra wrote:
>>> static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
>>> {
>>> struct kvm_vcpu *vcpu = &svm->vcpu;
>>> struct kvm *kvm = vcpu->kvm;
>>> struct kvm_sev_info *sev;
>>> unsigned long exitcode;
>>> u64 data_gpa;
>>>
>>> if (!sev_snp_guest(vcpu->kvm)) {
>>> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
>>> return 1;
>>> }
>>>
>>> data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
>>> if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
>>> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
>>> return 1;
>>> }
>>>
>>> vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
>>> vcpu->run->hypercall.args[0] = data_gpa;
>>> vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
>>> vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
>>> vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
>>> return 0;
>>> }
>>>
>>
>> IIRC, the important consideration here is to ensure that getting the
>> attestation report and retrieving the certificates appears atomic to the
>> guest. When SNP live migration is supported we don't want a case where the
>> guest could have migrated between the call to obtain the certificates and
>> obtaining the attestation report, which can potentially cause failure of
>> validation of the attestation report.
>
> Where does "obtaining the attestation report" happen? I see the guest request
> and the certificate stuff, I don't see anything about attestation reports (though
> I'm not looking very closely).
>
The guest requests that the firmware construct an attestation report via
the SNP_GUEST_REQUEST command. The certificates are piggy-backed to the
guest along with the attestation report (retrieved from the FW via the
SNP_GUEST_REQUEST command) as part of the SNP Extended Guest Request NAE
handling.
Thanks,
Ashish
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-18 21:27 ` Kalra, Ashish
@ 2023-10-18 21:43 ` Sean Christopherson
0 siblings, 0 replies; 158+ messages in thread
From: Sean Christopherson @ 2023-10-18 21:43 UTC (permalink / raw)
To: Ashish Kalra
Cc: Alexey Kardashevskiy, Dionna Amalie Glaze, Michael Roth, kvm,
linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, nikunj.dadhania, pankaj.gupta, liam.merwick,
zhi.a.wang, Brijesh Singh
On Wed, Oct 18, 2023, Ashish Kalra wrote:
>
> On 10/18/2023 3:38 PM, Sean Christopherson wrote:
> > On Wed, Oct 18, 2023, Ashish Kalra wrote:
> > > > static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
> > > > {
> > > > struct kvm_vcpu *vcpu = &svm->vcpu;
> > > > struct kvm *kvm = vcpu->kvm;
> > > > struct kvm_sev_info *sev;
> > > > unsigned long exitcode;
> > > > u64 data_gpa;
> > > >
> > > > if (!sev_snp_guest(vcpu->kvm)) {
> > > > ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
> > > > return 1;
> > > > }
> > > >
> > > > data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> > > > if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> > > > ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
> > > > return 1;
> > > > }
> > > >
Doh, I forgot to set
vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> > > > vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
> > > > vcpu->run->hypercall.args[0] = data_gpa;
> > > > vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
> > > > vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
> > > > vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
> > > > return 0;
> > > > }
> > > >
> > >
> > > IIRC, the important consideration here is to ensure that getting the
> > > attestation report and retrieving the certificates appears atomic to the
> > > guest. When SNP live migration is supported we don't want a case where the
> > > guest could have migrated between the call to obtain the certificates and
> > > obtaining the attestation report, which can potentially cause failure of
> > > validation of the attestation report.
> >
> > Where does "obtaining the attestation report" happen? I see the guest request
> > and the certificate stuff, I don't see anything about attestation reports (though
> > I'm not looking very closely).
> >
>
> The guest requests that the firmware construct an attestation report via the
> SNP_GUEST_REQUEST command. The certificates are piggy-backed to the guest
> along with the attestation report (retrieved from the FW via the
> SNP_GUEST_REQUEST command) as part of the SNP Extended Guest Request NAE
> handling.
Ah, thanks!
In that case, my proposal should more or less Just Work™, we simply need to define
KVM's ABI to be that userspace is responsible for doing KVM_RUN with
vcpu->run->immediate_exit set before migrating if the previous exit was
KVM_EXIT_HYPERCALL with KVM_HC_SNP_GET_CERTS. This is standard operating procedure
for userspace exits where KVM needs to "complete" the VM-Exit, e.g. for MMIO, I/O,
etc. that are punted to userspace.
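The userspace obligation described above can be reduced to a tiny predicate on the VMM's migration path. This is an illustrative sketch, not actual KVM uAPI: KVM_HC_SNP_GET_CERTS and its value are hypothetical, since the hypercall is only being proposed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative constants: KVM_EXIT_HYPERCALL comes from <linux/kvm.h>;
 * KVM_HC_SNP_GET_CERTS and its value are hypothetical, as the hypercall
 * is only being proposed in this thread.
 */
#define KVM_EXIT_HYPERCALL   3
#define KVM_HC_SNP_GET_CERTS 13

/*
 * If the vCPU's previous exit was the certificate hypercall, the VMM must
 * re-enter it with run->immediate_exit = 1 so KVM can complete the pending
 * SVM_VMGEXIT_EXT_GUEST_REQUEST before the VM state is saved for migration.
 */
static bool must_complete_before_migrate(uint32_t last_exit_reason,
                                         uint64_t last_hypercall_nr)
{
    return last_exit_reason == KVM_EXIT_HYPERCALL &&
           last_hypercall_nr == KVM_HC_SNP_GET_CERTS;
}
```

This mirrors how VMMs already handle MMIO and I/O exits that must be completed before saving vCPU state.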
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-18 13:48 ` Sean Christopherson
2023-10-18 20:27 ` Kalra, Ashish
@ 2023-10-19 2:48 ` Alexey Kardashevskiy
2023-10-19 14:57 ` Sean Christopherson
2023-10-20 18:37 ` Tom Lendacky
2023-11-10 22:07 ` Michael Roth
2 siblings, 2 replies; 158+ messages in thread
From: Alexey Kardashevskiy @ 2023-10-19 2:48 UTC (permalink / raw)
To: Sean Christopherson
Cc: Dionna Amalie Glaze, Michael Roth, kvm, linux-coco, linux-mm,
linux-crypto, x86, linux-kernel, tglx, mingo, jroedel,
thomas.lendacky, hpa, ardb, pbonzini, vkuznets, jmattson, luto,
dave.hansen, slp, pgonda, peterz, srinivas.pandruvada, rientjes,
dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
On 19/10/23 00:48, Sean Christopherson wrote:
> On Wed, Oct 18, 2023, Alexey Kardashevskiy wrote:
>>
>> On 18/10/23 03:27, Sean Christopherson wrote:
>>> On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
>>>>> +
>>>>> + /*
>>>>> + * If a VMM-specific certificate blob hasn't been provided, grab the
>>>>> + * host-wide one.
>>>>> + */
>>>>> + snp_certs = sev_snp_certs_get(sev->snp_certs);
>>>>> + if (!snp_certs)
>>>>> + snp_certs = sev_snp_global_certs_get();
>>>>> +
>>>>
>>>> This is where the generation I suggested adding would get checked. If
>>>> the instance certs' generation is not the global generation, then I
>>>> think we need a way to return to the VMM to make that right before
>>>> continuing to provide outdated certificates.
>>>> This might be an unreasonable request, but the fact that the certs and
>>>> reported_tcb can be set while a VM is running makes this an issue.
>>>
>>> Before we get that far, the changelogs need to explain why the kernel is storing
>>> userspace blobs in the first place. The whole thing is a bit of a mess.
>>>
>>> sev_snp_global_certs_get() has data races that could lead to variations of TOCTOU
>>> bugs: sev_ioctl_snp_set_config() can overwrite psp_master->sev_data->snp_certs
>>> while sev_snp_global_certs_get() is running. If the compiler reloads snp_certs
>>> between bumping the refcount and grabbing the pointer, KVM will end up leaking a
>>> refcount and consuming a pointer without a refcount.
>>>
>>> if (!kref_get_unless_zero(&certs->kref))
>>> return NULL;
>>>
>>> return certs;
>>
>> I'm missing something here. The @certs pointer is on the stack,
>
> No, nothing guarantees that @certs is on the stack and will never be reloaded.
> sev_snp_certs_get() is in full view of sev_snp_global_certs_get(), so it's entirely
> possible that it can be inlined. Then you end up with:
>
> struct sev_device *sev;
>
> if (!psp_master || !psp_master->sev_data)
> return NULL;
>
> sev = psp_master->sev_data;
> if (!sev->snp_initialized)
> return NULL;
>
> if (!sev->snp_certs)
> return NULL;
>
> if (!kref_get_unless_zero(&sev->snp_certs->kref))
> return NULL;
>
> return sev->snp_certs;
>
> At which point the compiler could choose to omit a local variable entirely, it
> could store @certs in a register and reload after kref_get_unless_zero(), etc.
> If psp_master->sev_data->snp_certs is changed at any point, odd things can happen.
>
> That atomic operation in kref_get_unless_zero() might prevent a reload between
> getting the kref and the return, but it wouldn't prevent a reload between the
> !NULL check and kref_get_unless_zero().
Oh. The function is exported, so I thought gcc would not go that far, but
yeah, it is possible. So this needs an explicit READ_ONCE().
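The race under discussion can be illustrated outside the kernel. The sketch below is a hypothetical userspace model using C11 atomics (not the actual CCP driver code): it shows why the global pointer must be snapshotted exactly once into a local, which is what READ_ONCE() enforces in kernel code, combined with kref_get_unless_zero()-style refcounting.

```c
#include <stdatomic.h>
#include <stddef.h>

struct snp_certs {
    atomic_int refcount;  /* stands in for the kernel's struct kref */
    /* ... cert blob would live here ... */
};

/* Overwritable by an updater (sev_ioctl_snp_set_config() in the kernel). */
static struct snp_certs *_Atomic global_certs;

/*
 * Take the pointer snapshot exactly once. Checking, ref-counting and
 * returning the global pointer directly would let the compiler reload it
 * between the NULL check, the refcount bump, and the return, so the caller
 * could receive a different object than the one it took a reference on.
 */
static struct snp_certs *certs_get(void)
{
    struct snp_certs *c = atomic_load(&global_certs);
    int old;

    if (!c)
        return NULL;

    old = atomic_load(&c->refcount);
    do {
        if (old == 0)  /* kref_get_unless_zero() semantics */
            return NULL;
    } while (!atomic_compare_exchange_weak(&c->refcount, &old, old + 1));

    return c;  /* the same snapshot we took the reference on */
}
```

The key point is that every access after the initial load goes through the local `c`, never back through `global_certs`.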
>>> If userspace wants to provide garbage to the guest, so be it, not KVM's problem.
>>> That way, whether the VM gets the global cert or a per-VM cert is purely a userspace
>>> concern.
>>
>> The global cert lives in CCP (/dev/sev), the per VM cert lives in kvmvm_fd.
>> "A la vcpu->run" is fine for the latter but for the former we need something
>> else.
>
> Why? The cert ultimately comes from userspace, no? Make userspace deal with it.
>
>> And there is scenario when one global certs blob is what is needed and
>> copying it over multiple VMs seems suboptimal.
>
> That's a solvable problem. I'm not sure I like the most obvious solution, but it
> is a solution: let userspace define a KVM-wide blob pointer, either via .mmap()
> or via an ioctl().
>
> FWIW, there's no need to do .mmap() shenanigans, e.g. an ioctl() to set the
> userspace pointer would suffice. The benefit of a kernel controlled pointer is
> that it doesn't require copying to a kernel buffer (or special code to copy from
> userspace into guest).
Just to clarify: something like a small userspace non-QEMU program which just
holds a pointer to the certs blob, or embedding it into libvirt or systemd?
> Actually, looking at the flow again, AFAICT there's nothing special about the
> target DATA_PAGE. It must be SHARED *before* SVM_VMGEXIT_EXT_GUEST_REQUEST, i.e.
> KVM doesn't need to do conversions, there are no kernel privileges required, etc.
> And the GHCB doesn't dictate ordering between storing the certificates and doing
> the request. That means the certificate stuff can be punted entirely to userspace.
All true.
> Heh, typing up the below, there's another bug: KVM will incorrectly "return" '0'
> for non-SNP guests:
>
> unsigned long exitcode = 0;
> u64 data_gpa;
> int err, rc;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> rc = SEV_RET_INVALID_GUEST; <= sets "rc", not "exitcode"
> goto e_fail;
> }
>
> e_fail:
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);
>
> Which really highlights that we need to get test infrastructure up and running
> for SEV-ES, SNP, and TDX.
>
> Anyways, back to punting to userspace. Here's a rough sketch. The only new uAPI
> is the definition of KVM_HC_SNP_GET_CERTS and its arguments.
>
> static void snp_handle_guest_request(struct vcpu_svm *svm)
> {
> struct vmcb_control_area *control = &svm->vmcb->control;
> struct sev_data_snp_guest_request data = {0};
> struct kvm_vcpu *vcpu = &svm->vcpu;
> struct kvm *kvm = vcpu->kvm;
> struct kvm_sev_info *sev;
> gpa_t req_gpa = control->exit_info_1;
> gpa_t resp_gpa = control->exit_info_2;
> unsigned long rc;
> int err;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> rc = SEV_RET_INVALID_GUEST;
> goto e_fail;
> }
>
> sev = &to_kvm_svm(kvm)->sev_info;
>
> mutex_lock(&sev->guest_req_lock);
>
> rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
> if (rc)
> goto unlock;
>
> rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> if (rc)
> /* Ensure an error value is returned to guest. */
> rc = err ? err : SEV_RET_INVALID_ADDRESS;
>
> snp_cleanup_guest_buf(&data, &rc);
>
> unlock:
> mutex_unlock(&sev->guest_req_lock);
>
> e_fail:
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
> }
>
> static int snp_complete_ext_guest_request(struct kvm_vcpu *vcpu)
> {
> u64 certs_exitcode = vcpu->run->hypercall.args[2];
> struct vcpu_svm *svm = to_svm(vcpu);
>
> if (certs_exitcode)
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, certs_exitcode);
> else
> snp_handle_guest_request(svm);
> return 1;
> }
>
> static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
> {
> struct kvm_vcpu *vcpu = &svm->vcpu;
> struct kvm *kvm = vcpu->kvm;
> struct kvm_sev_info *sev;
> unsigned long exitcode;
> u64 data_gpa;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
> return 1;
> }
>
> data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
> return 1;
> }
>
> vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
> vcpu->run->hypercall.args[0] = data_gpa;
> vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
> vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
btw why is it _LONG_MODE and not just _64? :)
> vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
> return 0;
> }
This should work nicely for the KVM-stored certs, but not for the global
certs. Although I am not at all convinced that the global certs are all that
valuable, I do not know the history there; it happened before I joined, so I
will let others comment on that. Thanks,
--
Alexey
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-19 2:48 ` Alexey Kardashevskiy
@ 2023-10-19 14:57 ` Sean Christopherson
2023-10-19 23:55 ` Alexey Kardashevskiy
2023-10-20 18:37 ` Tom Lendacky
1 sibling, 1 reply; 158+ messages in thread
From: Sean Christopherson @ 2023-10-19 14:57 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Dionna Amalie Glaze, Michael Roth, kvm, linux-coco, linux-mm,
linux-crypto, x86, linux-kernel, tglx, mingo, jroedel,
thomas.lendacky, hpa, ardb, pbonzini, vkuznets, jmattson, luto,
dave.hansen, slp, pgonda, peterz, srinivas.pandruvada, rientjes,
dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
On Thu, Oct 19, 2023, Alexey Kardashevskiy wrote:
>
> On 19/10/23 00:48, Sean Christopherson wrote:
> > static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
> > {
> > struct kvm_vcpu *vcpu = &svm->vcpu;
> > struct kvm *kvm = vcpu->kvm;
> > struct kvm_sev_info *sev;
> > unsigned long exitcode;
> > u64 data_gpa;
> >
> > if (!sev_snp_guest(vcpu->kvm)) {
> > ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
> > return 1;
> > }
> >
> > data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> > if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
> > ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
> > return 1;
> > }
> >
> > vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
> > vcpu->run->hypercall.args[0] = data_gpa;
> > vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
> > vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
>
> btw why is it _LONG_MODE and not just _64? :)
I'm pretty sure it got copied from Xen when KVM started adding support for
emulating Xen's hypercalls. I assume Xen PV actually has a need for identifying
long mode as opposed to just 64-bit mode, but KVM, not so much.
> > vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
> > return 0;
> > }
>
> This should work nicely for the KVM-stored certs, but not for the global certs.
> Although I am not at all convinced that the global certs are all that valuable,
> I do not know the history there; it happened before I joined, so I will let
> others comment on that. Thanks,
Aren't the global certs provided by userspace too though? If all certs are
ultimately controlled by userspace, I don't see any reason to make the kernel a
middle-man.
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-19 14:57 ` Sean Christopherson
@ 2023-10-19 23:55 ` Alexey Kardashevskiy
2023-10-20 0:13 ` Sean Christopherson
0 siblings, 1 reply; 158+ messages in thread
From: Alexey Kardashevskiy @ 2023-10-19 23:55 UTC (permalink / raw)
To: Sean Christopherson
Cc: Dionna Amalie Glaze, Michael Roth, kvm, linux-coco, linux-mm,
linux-crypto, x86, linux-kernel, tglx, mingo, jroedel,
thomas.lendacky, hpa, ardb, pbonzini, vkuznets, jmattson, luto,
dave.hansen, slp, pgonda, peterz, srinivas.pandruvada, rientjes,
dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
On 20/10/23 01:57, Sean Christopherson wrote:
> On Thu, Oct 19, 2023, Alexey Kardashevskiy wrote:
>>
>> On 19/10/23 00:48, Sean Christopherson wrote:
>>> static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
>>> {
>>> struct kvm_vcpu *vcpu = &svm->vcpu;
>>> struct kvm *kvm = vcpu->kvm;
>>> struct kvm_sev_info *sev;
>>> unsigned long exitcode;
>>> u64 data_gpa;
>>>
>>> if (!sev_snp_guest(vcpu->kvm)) {
>>> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
>>> return 1;
>>> }
>>>
>>> data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
>>> if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
>>> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
>>> return 1;
>>> }
>>>
>>> vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
>>> vcpu->run->hypercall.args[0] = data_gpa;
>>> vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
>>> vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
>>
>> btw why is it _LONG_MODE and not just _64? :)
>
> I'm pretty sure it got copied from Xen when KVM started adding support for
> emulating Xen's hypercalls. I assume Xen PV actually has a need for identifying
> long mode as opposed to just 64-bit mode, but KVM, not so much.
>
>>> vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
>>> return 0;
>>> }
>>
>> This should work nicely for the KVM-stored certs, but not for the global certs.
>> Although I am not at all convinced that the global certs are all that valuable,
>> I do not know the history there; it happened before I joined, so I will let
>> others comment on that. Thanks,
>
> Aren't the global certs provided by userspace too though? If all certs are
> ultimately controlled by userspace, I don't see any reason to make the kernel a
> middle-man.
The max blob size is 32KB or so, and for 200 VMs it is:
- 6.5MB, all in userspace and so swappable, vs
- 32KB, but in the kernel and so not swappable.
Sure, a box capable of running 200 VMs must have plenty of RAM, but still :)
Plus, the GHCB now has to go via userspace before talking to the PSP, which
was not the case so far (though I cannot think of an immediate implication
right now).
--
Alexey
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-19 23:55 ` Alexey Kardashevskiy
@ 2023-10-20 0:13 ` Sean Christopherson
2023-10-20 0:43 ` Alexey Kardashevskiy
0 siblings, 1 reply; 158+ messages in thread
From: Sean Christopherson @ 2023-10-20 0:13 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Dionna Amalie Glaze, Michael Roth, kvm, linux-coco, linux-mm,
linux-crypto, x86, linux-kernel, tglx, mingo, jroedel,
thomas.lendacky, hpa, ardb, pbonzini, vkuznets, jmattson, luto,
dave.hansen, slp, pgonda, peterz, srinivas.pandruvada, rientjes,
dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
On Fri, Oct 20, 2023, Alexey Kardashevskiy wrote:
>
> On 20/10/23 01:57, Sean Christopherson wrote:
> > On Thu, Oct 19, 2023, Alexey Kardashevskiy wrote:
> > > > vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
> > > > return 0;
> > > > }
> > >
> > > This should work nicely for the KVM-stored certs, but not for the global certs.
> > > Although I am not at all convinced that the global certs are all that valuable,
> > > I do not know the history there; it happened before I joined, so I will let
> > > others comment on that. Thanks,
> >
> > Aren't the global certs provided by userspace too though? If all certs are
> > ultimately controlled by userspace, I don't see any reason to make the kernel a
> > middle-man.
>
> The max blob size is 32KB or so and for 200 VMs it is:
Not according to include/linux/psp-sev.h:
#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
Ugh, and I see in another patch:
Also increase the SEV_FW_BLOB_MAX_SIZE another 4K page to allow space
for an extra certificate.
-#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
+#define SEV_FW_BLOB_MAX_SIZE 0x5000 /* 20KB */
That's gross and just asking for ABI problems, because then there's this:
+::
+
+ struct kvm_sev_snp_set_certs {
+ __u64 certs_uaddr;
+ __u64 certs_len
+ };
+
+The certs_len field may not exceed SEV_FW_BLOB_MAX_SIZE.
> - 6.5MB, all in userspace and so swappable, vs
> - 32KB, but in the kernel and so not swappable.
> Sure, a box capable of running 200 VMs must have plenty of RAM, but still :)
That's making quite a few assumptions.
1) That the global cert will be 32KiB (which clearly isn't the case today).
2) That every VM will want the global cert.
3) That userspace can't figure out a way to share the global cert.
Even in that absolutely worst case scenario, I am not remotely convinced that it
justifies taking on the necessary complexity to manage certs in-kernel.
> Plus, the GHCB now has to go via userspace before talking to the PSP, which
> was not the case so far (though I cannot think of an immediate implication
> right now).
Any argument along the lines of "because that's how we've always done it" is going
to fall on deaf ears. If there's a real performance bottleneck with kicking out
to userspace, then I'll happily work to figure out a solution. If.
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-20 0:13 ` Sean Christopherson
@ 2023-10-20 0:43 ` Alexey Kardashevskiy
2023-10-20 15:13 ` Sean Christopherson
0 siblings, 1 reply; 158+ messages in thread
From: Alexey Kardashevskiy @ 2023-10-20 0:43 UTC (permalink / raw)
To: Sean Christopherson
Cc: Dionna Amalie Glaze, Michael Roth, kvm, linux-coco, linux-mm,
linux-crypto, x86, linux-kernel, tglx, mingo, jroedel,
thomas.lendacky, hpa, ardb, pbonzini, vkuznets, jmattson, luto,
dave.hansen, slp, pgonda, peterz, srinivas.pandruvada, rientjes,
dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
On 20/10/23 11:13, Sean Christopherson wrote:
> On Fri, Oct 20, 2023, Alexey Kardashevskiy wrote:
>>
>> On 20/10/23 01:57, Sean Christopherson wrote:
>>> On Thu, Oct 19, 2023, Alexey Kardashevskiy wrote:
>>>>> vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
>>>>> return 0;
>>>>> }
>>>>
>>>> This should work nicely for the KVM-stored certs, but not for the global certs.
>>>> Although I am not at all convinced that the global certs are all that valuable,
>>>> I do not know the history there; it happened before I joined, so I will let
>>>> others comment on that. Thanks,
>>>
>>> Aren't the global certs provided by userspace too though? If all certs are
>>> ultimately controlled by userspace, I don't see any reason to make the kernel a
>>> middle-man.
>>
>> The max blob size is 32KB or so and for 200 VMs it is:
>
> Not according to include/linux/psp-sev.h:
>
> #define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
>
> Ugh, and I see in another patch:
>
> Also increase the SEV_FW_BLOB_MAX_SIZE another 4K page to allow space
> for an extra certificate.
>
> -#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */
> +#define SEV_FW_BLOB_MAX_SIZE 0x5000 /* 20KB */
>
> That's gross and just asking for ABI problems, because then there's this:
>
> +::
> +
> + struct kvm_sev_snp_set_certs {
> + __u64 certs_uaddr;
> + __u64 certs_len
> + };
> +
> +The certs_len field may not exceed SEV_FW_BLOB_MAX_SIZE.
>
>> - 6.5MB, all in userspace and so swappable, vs
>> - 32KB, but in the kernel and so not swappable.
>> Sure, a box capable of running 200 VMs must have plenty of RAM, but still :)
>
> That's making quite a few assumptions.
>
> 1) That the global cert will be 32KiB (which clearly isn't the case today).
> 2) That every VM will want the global cert.
> 3) That userspace can't figure out a way to share the global cert.
>
> Even in that absolutely worst case scenario, I am not remotely convinced that it
> justifies taking on the necessary complexity to manage certs in-kernel.
>
>> Plus, the GHCB now has to go via userspace before talking to the PSP, which
>> was not the case so far (though I cannot think of an immediate implication
>> right now).
>
> Any argument along the lines of "because that's how we've always done it" is going
> to fall on deaf ears. If there's a real performance bottleneck with kicking out
> to userspace, then I'll happily work to figure out a solution. If.
No, not performance; I was trying to imagine what could go wrong if multiple
vCPUs are making this call, all exiting to QEMU, in a loop, racing, something
like that.
--
Alexey
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-20 0:43 ` Alexey Kardashevskiy
@ 2023-10-20 15:13 ` Sean Christopherson
0 siblings, 0 replies; 158+ messages in thread
From: Sean Christopherson @ 2023-10-20 15:13 UTC (permalink / raw)
To: Alexey Kardashevskiy
Cc: Dionna Amalie Glaze, Michael Roth, kvm, linux-coco, linux-mm,
linux-crypto, x86, linux-kernel, tglx, mingo, jroedel,
thomas.lendacky, hpa, ardb, pbonzini, vkuznets, jmattson, luto,
dave.hansen, slp, pgonda, peterz, srinivas.pandruvada, rientjes,
dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
On Fri, Oct 20, 2023, Alexey Kardashevskiy wrote:
>
> On 20/10/23 11:13, Sean Christopherson wrote:
> > On Fri, Oct 20, 2023, Alexey Kardashevskiy wrote:
> > > Plus, the GHCB now has to go via userspace before talking to the PSP, which
> > > was not the case so far (though I cannot think of an immediate implication
> > > right now).
> >
> > Any argument along the lines of "because that's how we've always done it" is going
> > to fall on deaf ears. If there's a real performance bottleneck with kicking out
> > to userspace, then I'll happily work to figure out a solution. If.
>
> No, not performance; I was trying to imagine what could go wrong if multiple
> vCPUs are making this call, all exiting to QEMU, in a loop, racing, something
> like that.
I am not at all concerned about userspace being able to handle parallel requests
to get a certificate. Per-vCPU exits that access global/shared resources might
not be super common, but they're certainly not rare. E.g. a guest access to an
option ROM can trigger memslot updates in QEMU, which requires at least taking a
mutex to guard KVM_SET_USER_MEMORY_REGION, and IIRC QEMU also uses RCU to protect
QEMU accesses to address spaces.
Given that we know there will be scenarios where certificates are changed/updated,
I wouldn't be at all surprised if handling this in userspace is actually easier
as it will give userspace more control and options, and make it easier to reason
about the resulting behavior. E.g. userspace could choose between a lockless
scheme and a r/w lock if there's a need to ensure per-VM and global certs are
updated atomically from the guest's perspective.
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-19 2:48 ` Alexey Kardashevskiy
2023-10-19 14:57 ` Sean Christopherson
@ 2023-10-20 18:37 ` Tom Lendacky
1 sibling, 0 replies; 158+ messages in thread
From: Tom Lendacky @ 2023-10-20 18:37 UTC (permalink / raw)
To: Alexey Kardashevskiy, Sean Christopherson
Cc: Dionna Amalie Glaze, Michael Roth, kvm, linux-coco, linux-mm,
linux-crypto, x86, linux-kernel, tglx, mingo, jroedel, hpa, ardb,
pbonzini, vkuznets, jmattson, luto, dave.hansen, slp, pgonda,
peterz, srinivas.pandruvada, rientjes, dovmurik, tobin, bp,
vbabka, kirill, ak, tony.luck, marcorr,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
On 10/18/23 21:48, Alexey Kardashevskiy wrote:
>
> On 19/10/23 00:48, Sean Christopherson wrote:
>> On Wed, Oct 18, 2023, Alexey Kardashevskiy wrote:
>>>
>>> On 18/10/23 03:27, Sean Christopherson wrote:
>>>> On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
>>>>>> +
>>>>>> + /*
>>>>>> + * If a VMM-specific certificate blob hasn't been provided, grab the
>>>>>> + * host-wide one.
>>>>>> + */
>>>>>> + snp_certs = sev_snp_certs_get(sev->snp_certs);
>>>>>> + if (!snp_certs)
>>>>>> + snp_certs = sev_snp_global_certs_get();
>>>>>> +
>>>>>
>>>>> This is where the generation I suggested adding would get checked. If
>>>>> the instance certs' generation is not the global generation, then I
>>>>> think we need a way to return to the VMM to make that right before
>>>>> continuing to provide outdated certificates.
>>>>> This might be an unreasonable request, but the fact that the certs and
>>>>> reported_tcb can be set while a VM is running makes this an issue.
>>>>
>>>> Before we get that far, the changelogs need to explain why the kernel
>>>> is storing
>>>> userspace blobs in the first place. The whole thing is a bit of a mess.
>>>>
>>>> sev_snp_global_certs_get() has data races that could lead to
>>>> variations of TOCTOU
>>>> bugs: sev_ioctl_snp_set_config() can overwrite
>>>> psp_master->sev_data->snp_certs
>>>> while sev_snp_global_certs_get() is running. If the compiler reloads
>>>> snp_certs
>>>> between bumping the refcount and grabbing the pointer, KVM will end up
>>>> leaking a
>>>> refcount and consuming a pointer without a refcount.
>>>>
>>>> if (!kref_get_unless_zero(&certs->kref))
>>>> return NULL;
>>>>
>>>> return certs;
>>>
>>> I'm missing something here. The @certs pointer is on the stack,
>>
>> No, nothing guarantees that @certs is on the stack and will never be
>> reloaded.
>> sev_snp_certs_get() is in full view of sev_snp_global_certs_get(), so
>> it's entirely
>> possible that it can be inlined. Then you end up with:
>>
>> struct sev_device *sev;
>>
>> if (!psp_master || !psp_master->sev_data)
>> return NULL;
>>
>> sev = psp_master->sev_data;
>> if (!sev->snp_initialized)
>> return NULL;
>>
>> if (!sev->snp_certs)
>> return NULL;
>>
>> if (!kref_get_unless_zero(&sev->snp_certs->kref))
>> return NULL;
>>
>> return sev->snp_certs;
>>
>> At which point the compiler could choose to omit a local variable
>> entirely, it
>> could store @certs in a register and reload after
>> kref_get_unless_zero(), etc.
>> If psp_master->sev_data->snp_certs is changed at any point, odd things
>> can happen.
>>
>> That atomic operation in kref_get_unless_zero() might prevent a reload
>> between
>> getting the kref and the return, but it wouldn't prevent a reload
>> between the
>> !NULL check and kref_get_unless_zero().
>
> Oh. The function is exported, so I thought gcc would not go that far, but
> yeah, it is possible. So this needs an explicit READ_ONCE().
>
>
>>>> If userspace wants to provide garbage to the guest, so be it, not
>>>> KVM's problem.
>>>> That way, whether the VM gets the global cert or a per-VM cert is
>>>> purely a userspace
>>>> concern.
>>>
>>> The global cert lives in CCP (/dev/sev), the per VM cert lives in
>>> kvmvm_fd.
>>> "A la vcpu->run" is fine for the latter but for the former we need
>>> something
>>> else.
>>
>> Why? The cert ultimately comes from userspace, no? Make userspace deal
>> with it.
>>
>>> And there is scenario when one global certs blob is what is needed and
>>> copying it over multiple VMs seems suboptimal.
>>
>> That's a solvable problem. I'm not sure I like the most obvious
>> solution, but it
>> is a solution: let userspace define a KVM-wide blob pointer, either via
>> .mmap()
>> or via an ioctl().
>>
>> FWIW, there's no need to do .mmap() shenanigans, e.g. an ioctl() to set the
>> userspace pointer would suffice. The benefit of a kernel controlled
>> pointer is
>> that it doesn't require copying to a kernel buffer (or special code to
>> copy from
>> userspace into guest).
>
> Just to clarify: something like a small userspace non-QEMU program which just
> holds a pointer to the certs blob, or embedding it into libvirt or systemd?
>
>
>> Actually, looking at the flow again, AFAICT there's nothing special
>> about the
>> target DATA_PAGE. It must be SHARED *before*
>> SVM_VMGEXIT_EXT_GUEST_REQUEST, i.e.
>> KVM doesn't need to do conversions, there are no kernel privileges
>> required, etc.
>> And the GHCB doesn't dictate ordering between storing the certificates
>> and doing
>> the request. That means the certificate stuff can be punted entirely to
>> userspace.
>
> All true.
>
>> Heh, typing up the below, there's another bug: KVM will incorrectly
>> "return" '0'
>> for non-SNP guests:
>>
>> unsigned long exitcode = 0;
>> u64 data_gpa;
>> int err, rc;
>>
>> if (!sev_snp_guest(vcpu->kvm)) {
>> rc = SEV_RET_INVALID_GUEST; <= sets "rc", not "exitcode"
>> goto e_fail;
>> }
>>
>> e_fail:
>> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);
>>
>> Which really highlights that we need to get test infrastructure up and
>> running
>> for SEV-ES, SNP, and TDX.
>>
>> Anyways, back to punting to userspace. Here's a rough sketch. The only
>> new uAPI
>> is the definition of KVM_HC_SNP_GET_CERTS and its arguments.
>>
>> static void snp_handle_guest_request(struct vcpu_svm *svm)
>> {
>> struct vmcb_control_area *control = &svm->vmcb->control;
>> struct sev_data_snp_guest_request data = {0};
>> struct kvm_vcpu *vcpu = &svm->vcpu;
>> struct kvm *kvm = vcpu->kvm;
>> struct kvm_sev_info *sev;
>> gpa_t req_gpa = control->exit_info_1;
>> gpa_t resp_gpa = control->exit_info_2;
>> unsigned long rc;
>> int err;
>>
>> if (!sev_snp_guest(vcpu->kvm)) {
>> rc = SEV_RET_INVALID_GUEST;
>> goto e_fail;
>> }
>>
>> sev = &to_kvm_svm(kvm)->sev_info;
>>
>> mutex_lock(&sev->guest_req_lock);
>>
>> rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
>> if (rc)
>> goto unlock;
>>
>> rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
>> if (rc)
>> /* Ensure an error value is returned to guest. */
>> rc = err ? err : SEV_RET_INVALID_ADDRESS;
>>
>> snp_cleanup_guest_buf(&data, &rc);
>>
>> unlock:
>> mutex_unlock(&sev->guest_req_lock);
>>
>> e_fail:
>> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
>> }
>>
>> static int snp_complete_ext_guest_request(struct kvm_vcpu *vcpu)
>> {
>> u64 certs_exitcode = vcpu->run->hypercall.args[2];
>> struct vcpu_svm *svm = to_svm(vcpu);
>>
>> if (certs_exitcode)
>> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, certs_exitcode);
>> else
>> snp_handle_guest_request(svm);
>> return 1;
>> }
>>
>> static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
>> {
>> struct kvm_vcpu *vcpu = &svm->vcpu;
>> struct kvm *kvm = vcpu->kvm;
>> struct kvm_sev_info *sev;
>> unsigned long exitcode;
>> u64 data_gpa;
>>
>> if (!sev_snp_guest(vcpu->kvm)) {
>> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
>> return 1;
>> }
>>
>> data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
>> if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
>> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
>> return 1;
>> }
>>
>> vcpu->run->hypercall.nr = KVM_HC_SNP_GET_CERTS;
>> vcpu->run->hypercall.args[0] = data_gpa;
>> vcpu->run->hypercall.args[1] = vcpu->arch.regs[VCPU_REGS_RBX];
>> vcpu->run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE;
>
> btw why is it _LONG_MODE and not just _64? :)
>
>> vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
>> return 0;
>> }
>
> This should work nicely for the KVM-stored certs, but not for the global certs.
> Although I am not at all convinced that the global certs are all that valuable,
> I do not know the history there; it happened before I joined, so I will let
> others comment on that. Thanks,
Global certs was the original implementation because it was intended to
provide the VCEK, ASK, and ARK. These will be the same for all SNP guests
that are launched. The original intention was also to avoid making the kernel
manage multiple certificates, and instead just treat the data as a blob
provided from user space.
The per-VM change was added to allow per-VM certificates. If a provider
has no need for this, then only the global certs blob is needed, which
reduces the amount of memory needed for the VM.
Thanks,
Tom
>
>
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-10-18 13:48 ` Sean Christopherson
2023-10-18 20:27 ` Kalra, Ashish
2023-10-19 2:48 ` Alexey Kardashevskiy
@ 2023-11-10 22:07 ` Michael Roth
2023-11-10 22:47 ` Sean Christopherson
2 siblings, 1 reply; 158+ messages in thread
From: Michael Roth @ 2023-11-10 22:07 UTC (permalink / raw)
To: Sean Christopherson
Cc: Alexey Kardashevskiy, Dionna Amalie Glaze, kvm, linux-coco,
linux-mm, linux-crypto, x86, linux-kernel, tglx, mingo, jroedel,
thomas.lendacky, hpa, ardb, pbonzini, vkuznets, jmattson, luto,
dave.hansen, slp, pgonda, peterz, srinivas.pandruvada, rientjes,
dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
On Wed, Oct 18, 2023 at 06:48:59AM -0700, Sean Christopherson wrote:
> On Wed, Oct 18, 2023, Alexey Kardashevskiy wrote:
> >
> > On 18/10/23 03:27, Sean Christopherson wrote:
> > > On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
> > > > > +
> > > > > + /*
> > > > > + * If a VMM-specific certificate blob hasn't been provided, grab the
> > > > > + * host-wide one.
> > > > > + */
> > > > > + snp_certs = sev_snp_certs_get(sev->snp_certs);
> > > > > + if (!snp_certs)
> > > > > + snp_certs = sev_snp_global_certs_get();
> > > > > +
> > > >
> > > > This is where the generation I suggested adding would get checked. If
> > > > the instance certs' generation is not the global generation, then I
> > > > think we need a way to return to the VMM to make that right before
> > > > continuing to provide outdated certificates.
> > > > This might be an unreasonable request, but the fact that the certs and
> > > > reported_tcb can be set while a VM is running makes this an issue.
> > >
> > > Before we get that far, the changelogs need to explain why the kernel is storing
> > > userspace blobs in the first place. The whole thing is a bit of a mess.
> > >
> > > sev_snp_global_certs_get() has data races that could lead to variations of TOCTOU
> > > bugs: sev_ioctl_snp_set_config() can overwrite psp_master->sev_data->snp_certs
> > > while sev_snp_global_certs_get() is running. If the compiler reloads snp_certs
> > > between bumping the refcount and grabbing the pointer, KVM will end up leaking a
> > > refcount and consuming a pointer without a refcount.
> > >
> > > if (!kref_get_unless_zero(&certs->kref))
> > > return NULL;
> > >
> > > return certs;
> >
> > I'm missing something here. The @certs pointer is on the stack,
>
> No, nothing guarantees that @certs is on the stack and will never be reloaded.
> sev_snp_certs_get() is in full view of sev_snp_global_certs_get(), so it's entirely
> possible that it can be inlined. Then you end up with:
>
> struct sev_device *sev;
>
> if (!psp_master || !psp_master->sev_data)
> return NULL;
>
> sev = psp_master->sev_data;
> if (!sev->snp_initialized)
> return NULL;
>
> if (!sev->snp_certs)
> return NULL;
>
> if (!kref_get_unless_zero(&sev->snp_certs->kref))
> return NULL;
>
> return sev->snp_certs;
>
> At which point the compiler could choose to omit a local variable entirely, it
> could store @certs in a register and reload after kref_get_unless_zero(), etc.
> If psp_master->sev_data->snp_certs is changed at any point, odd things can happen.
>
> That atomic operation in kref_get_unless_zero() might prevent a reload between
> getting the kref and the return, but it wouldn't prevent a reload between the
> !NULL check and kref_get_unless_zero().
>
> > > If userspace wants to provide garbage to the guest, so be it, not KVM's problem.
> > > That way, whether the VM gets the global cert or a per-VM cert is purely a userspace
> > > concern.
> >
> > The global cert lives in CCP (/dev/sev), the per VM cert lives in kvmvm_fd.
> > "A la vcpu->run" is fine for the latter but for the former we need something
> > else.
>
> Why? The cert ultimately comes from userspace, no? Make userspace deal with it.
>
> > And there is a scenario where one global certs blob is what is needed, and
> > copying it over multiple VMs seems suboptimal.
>
> That's a solvable problem. I'm not sure I like the most obvious solution, but it
> is a solution: let userspace define a KVM-wide blob pointer, either via .mmap()
> or via an ioctl().
>
> FWIW, there's no need to do .mmap() shenanigans, e.g. an ioctl() to set the
> userspace pointer would suffice. The benefit of a kernel controlled pointer is
> that it doesn't require copying to a kernel buffer (or special code to copy from
> userspace into guest).
>
> Actually, looking at the flow again, AFAICT there's nothing special about the
> target DATA_PAGE. It must be SHARED *before* SVM_VMGEXIT_EXT_GUEST_REQUEST, i.e.
> KVM doesn't need to do conversions, no kernel privileges are required, etc.
> And the GHCB doesn't dictate ordering between storing the certificates and doing
> the request. That means the certificate stuff can be punted entirely to userspace.
>
> Heh, typing up the below, there's another bug: KVM will incorrectly "return" '0'
> for non-SNP guests:
Thanks for the catch, will fix that up.
>
> unsigned long exitcode = 0;
> u64 data_gpa;
> int err, rc;
>
> if (!sev_snp_guest(vcpu->kvm)) {
> rc = SEV_RET_INVALID_GUEST; <= sets "rc", not "exitcode"
> goto e_fail;
> }
>
> e_fail:
> ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);
>
> Which really highlights that we need to get test infrastructure up and running
> for SEV-ES, SNP, and TDX.
>
> Anyways, back to punting to userspace. Here's a rough sketch. The only new uAPI
> is the definition of KVM_HC_SNP_GET_CERTS and its arguments.
This sketch seems like a good, flexible way to handle per-VM certs, but
it does complicate things from a userspace perspective. As a basic
requirement, all userspaces will need to provide a way to specify the
initial blob (either a very verbose base64-encoded userspace cmdline param,
or a filepath that needs additional management to store and handle
permissions/etc.), and also a means to update it (e.g. an HMP/QMP command
for QEMU, some libvirt wrappers, etc.).
That's all well and good if you want to make use of per-VM certs, but we
don't expect that most deployments will necessarily want to deal with
per-VM certs; many would be happy with a system-wide one where they could
simply issue the /dev/sev ioctl to inject one automatically for all guests.
So we're sort of complicating the more common case to support a more niche
one (as far as userspace is concerned anyway; as far as kernel goes, your
approach is certainly simplest :)).
Instead, maybe a compromise is warranted so the requirements on userspace
side are less complicated for a more basic deployment:
1) If /dev/sev is used to set a global certificate, then that will be
used unconditionally by KVM, protected by a simple dumb mutex during
usage/update.
2) If /dev/sev is not used to set the global certificate, i.e. the value
is NULL, we assume userspace wants full responsibility for managing
certificates and exit to userspace to request the certs in the manner
you suggested.
Sean, Dionna, would this cover your concerns and address the certificate
update use-case?
-Mike
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-11-10 22:07 ` Michael Roth
@ 2023-11-10 22:47 ` Sean Christopherson
2023-11-16 5:31 ` Dionna Amalie Glaze
0 siblings, 1 reply; 158+ messages in thread
From: Sean Christopherson @ 2023-11-10 22:47 UTC (permalink / raw)
To: Michael Roth
Cc: Alexey Kardashevskiy, Dionna Amalie Glaze, kvm, linux-coco,
linux-mm, linux-crypto, x86, linux-kernel, tglx, mingo, jroedel,
thomas.lendacky, hpa, ardb, pbonzini, vkuznets, jmattson, luto,
dave.hansen, slp, pgonda, peterz, srinivas.pandruvada, rientjes,
dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh
On Fri, Nov 10, 2023, Michael Roth wrote:
> On Wed, Oct 18, 2023 at 06:48:59AM -0700, Sean Christopherson wrote:
> > On Wed, Oct 18, 2023, Alexey Kardashevskiy wrote:
> > Anyways, back to punting to userspace. Here's a rough sketch. The only new uAPI
> > is the definition of KVM_HC_SNP_GET_CERTS and its arguments.
>
> This sketch seems like a good, flexible way to handle per-VM certs, but
> it does complicate things from a userspace perspective. As a basic
> requirement, all userspaces will need to provide a way to specify the
> initial blob (either a very verbose base64-encoded userspace cmdline param,
> or a filepath that needs additional management to store and handle
> permissions/etc.), and also a means to update it (e.g. a HMP/QMP command
> for QEMU, some libvirt wrappers, etc.).
>
> That's all well and good if you want to make use of per-VM certs, but we
> don't expect that most deployments will necessarily want to deal
> with per-VM certs, and would be happy with a system-wide one where they could
> simply issue the /dev/sev ioctl to inject one automatically for all guests.
>
> So we're sort of complicating the more common case to support a more niche
> one (as far as userspace is concerned anyway; as far as kernel goes, your
> approach is certainly simplest :)).
>
> Instead, maybe a compromise is warranted so the requirements on userspace
> side are less complicated for a more basic deployment:
>
> 1) If /dev/sev is used to set a global certificate, then that will be
> used unconditionally by KVM, protected by simple dumb mutex during
> usage/update.
> 2) If /dev/sev is not used to set the global certificate, i.e. the value
> is NULL, we assume userspace wants full responsibility for managing
> certificates and exit to userspace to request the certs in the manner
> you suggested.
>
> Sean, Dionna, would this cover your concerns and address the certificate
> update use-case?
Honestly, no. I see zero reason for the kernel to be involved. IIUC, there are no
privileged operations that require kernel intervention, which means that shoving
a global cert into /dev/sev is using the CCP driver as middleman. Just use a
userspace daemon. I have a very hard time believing that passing around large-ish
blobs of data in userspace isn't already a solved problem.
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-11-10 22:47 ` Sean Christopherson
@ 2023-11-16 5:31 ` Dionna Amalie Glaze
2023-12-05 0:30 ` Dan Williams
0 siblings, 1 reply; 158+ messages in thread
From: Dionna Amalie Glaze @ 2023-11-16 5:31 UTC (permalink / raw)
To: Sean Christopherson
Cc: Michael Roth, Alexey Kardashevskiy, kvm, linux-coco, linux-mm,
linux-crypto, x86, linux-kernel, tglx, mingo, jroedel,
thomas.lendacky, hpa, ardb, pbonzini, vkuznets, jmattson, luto,
dave.hansen, slp, pgonda, peterz, srinivas.pandruvada, rientjes,
dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh, Dan Williams
> > So we're sort of complicating the more common case to support a more niche
> > one (as far as userspace is concerned anyway; as far as kernel goes, your
> > approach is certainly simplest :)).
> >
> > Instead, maybe a compromise is warranted so the requirements on userspace
> > side are less complicated for a more basic deployment:
> >
> > 1) If /dev/sev is used to set a global certificate, then that will be
> > used unconditionally by KVM, protected by simple dumb mutex during
> > usage/update.
> > 2) If /dev/sev is not used to set the global certificate, i.e. the value
> > is NULL, we assume userspace wants full responsibility for managing
> > certificates and exit to userspace to request the certs in the manner
> > you suggested.
> >
> > Sean, Dionna, would this cover your concerns and address the certificate
> > update use-case?
>
> Honestly, no. I see zero reason for the kernel to be involved. IIUC, there's no
> privileged operations that require kernel intervention, which means that shoving
> a global cert into /dev/sev is using the CCP driver as middleman. Just use a
> userspace daemon. I have a very hard time believing that passing around large-ish
> blobs of data in userspace isn't already a solved problem.
ping sathyanarayanan.kuppuswamy@linux.intel.com and +Dan Williams
I think for a uniform experience for all coco technologies, we need
someone from Intel to weigh in on supporting auxblob through a similar
vmexit. Whereas the quoting enclave gets its PCK cert installed by the
host, something like the firmware's SBOM [1] could be delivered in
auxblob. The proposal to embed the compressed SBOM binary in a COFF
section of the UEFI image doesn't get it communicated to user space, so this
is a good place to get that info about the expected TDMR in. The SBOM
proposal itself would need additional modeling in the coRIM profile to
have extra coco-specific measurements or we need to find some other
method of getting this info bundled with the attestation report.
My own plan for SEV-SNP was to have a bespoke signed measurement of
the UEFI in the GUID table, but that doesn't extend to TDX. If we're
looking more at an industry alignment on coRIM for SBOM formats (yes
please), then it'd be great to start getting that kind of info plumbed
to the user in a uniform way that doesn't have to rely on servers
providing the endorsements.
[1] https://uefi.org/blog/firmware-sbom-proposal
--
-Dionna Glaze, PhD (she/her)
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-11-16 5:31 ` Dionna Amalie Glaze
@ 2023-12-05 0:30 ` Dan Williams
2023-12-05 0:48 ` Dionna Amalie Glaze
0 siblings, 1 reply; 158+ messages in thread
From: Dan Williams @ 2023-12-05 0:30 UTC (permalink / raw)
To: Dionna Amalie Glaze, Sean Christopherson
Cc: Michael Roth, Alexey Kardashevskiy, kvm, linux-coco, linux-mm,
linux-crypto, x86, linux-kernel, tglx, mingo, jroedel,
thomas.lendacky, hpa, ardb, pbonzini, vkuznets, jmattson, luto,
dave.hansen, slp, pgonda, peterz, srinivas.pandruvada, rientjes,
dovmurik, tobin, bp, vbabka, kirill, ak, tony.luck,
sathyanarayanan.kuppuswamy, alpergun, jarkko, ashish.kalra,
nikunj.dadhania, pankaj.gupta, liam.merwick, zhi.a.wang,
Brijesh Singh, Dan Williams, dan.middleton
[ add Dan Middleton for his awareness ]
Dionna Amalie Glaze wrote:
> > > So we're sort of complicating the more common case to support a more niche
> > > one (as far as userspace is concerned anyway; as far as kernel goes, your
> > > approach is certainly simplest :)).
> > >
> > > Instead, maybe a compromise is warranted so the requirements on userspace
> > > side are less complicated for a more basic deployment:
> > >
> > > 1) If /dev/sev is used to set a global certificate, then that will be
> > > used unconditionally by KVM, protected by simple dumb mutex during
> > > usage/update.
> > > 2) If /dev/sev is not used to set the global certificate, i.e. the value
> > > is NULL, we assume userspace wants full responsibility for managing
> > > certificates and exit to userspace to request the certs in the manner
> > > you suggested.
> > >
> > > Sean, Dionna, would this cover your concerns and address the certificate
> > > update use-case?
> >
> > Honestly, no. I see zero reason for the kernel to be involved. IIUC, there's no
> > privileged operations that require kernel intervention, which means that shoving
> > a global cert into /dev/sev is using the CCP driver as middleman. Just use a
> > userspace daemon. I have a very hard time believing that passing around large-ish
> > blobs of data in userspace isn't already a solved problem.
>
> ping sathyanarayanan.kuppuswamy@linux.intel.com and +Dan Williams
Apologies Dionna, I missed this earlier.
>
> I think for a uniform experience for all coco technologies, we need
> someone from Intel to weigh in on supporting auxblob through a similar
> vmexit. Whereas the quoting enclave gets its PCK cert installed by the
> host, something like the firmware's SBOM [1] could be delivered in
> auxblob. The proposal to embed the compressed SBOM binary in a coff
> section of the UEFI doesn't get it communicated to user space, so this
> is a good place to get that info about the expected TDMR in. The SBOM
> proposal itself would need additional modeling in the coRIM profile to
> have extra coco-specific measurements or we need to find some other
> method of getting this info bundled with the attestation report.
SBOM looks different than the SEV use case of @auxblob to convey a
certificate chain.
Are you asking for @auxblob to be SBOM on TDX and a certchain on SEV, or
unifying the @auxblob format on SBOM?
> My own plan for SEV-SNP was to have a bespoke signed measurement of
> the UEFI in the GUID table, but that doesn't extend to TDX. If we're
> looking more at an industry alignment on coRIM for SBOM formats (yes
> please), then it'd be great to start getting that kind of info plumbed
> to the user in a uniform way that doesn't have to rely on servers
> providing the endorsements.
>
> [1] https://uefi.org/blog/firmware-sbom-proposal
Honestly my first reaction for this ABI would be for a new file under
/sys/firmware/efi/efivars or similar.
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-12-05 0:30 ` Dan Williams
@ 2023-12-05 0:48 ` Dionna Amalie Glaze
2023-12-05 20:06 ` Dan Williams
0 siblings, 1 reply; 158+ messages in thread
From: Dionna Amalie Glaze @ 2023-12-05 0:48 UTC (permalink / raw)
To: Dan Williams
Cc: Sean Christopherson, Michael Roth, Alexey Kardashevskiy, kvm,
linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, dan.middleton
On Mon, Dec 4, 2023 at 4:30 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> [ add Dan Middleton for his awareness ]
>
> Dionna Amalie Glaze wrote:
> > > > So we're sort of complicating the more common case to support a more niche
> > > > one (as far as userspace is concerned anyway; as far as kernel goes, your
> > > > approach is certainly simplest :)).
> > > >
> > > > Instead, maybe a compromise is warranted so the requirements on userspace
> > > > side are less complicated for a more basic deployment:
> > > >
> > > > 1) If /dev/sev is used to set a global certificate, then that will be
> > > > used unconditionally by KVM, protected by simple dumb mutex during
> > > > usage/update.
> > > > 2) If /dev/sev is not used to set the global certificate, i.e. the value
> > > > is NULL, we assume userspace wants full responsibility for managing
> > > > certificates and exit to userspace to request the certs in the manner
> > > > you suggested.
> > > >
> > > > Sean, Dionna, would this cover your concerns and address the certificate
> > > > update use-case?
> > >
> > > Honestly, no. I see zero reason for the kernel to be involved. IIUC, there's no
> > > privileged operations that require kernel intervention, which means that shoving
> > > a global cert into /dev/sev is using the CCP driver as middleman. Just use a
> > > userspace daemon. I have a very hard time believing that passing around large-ish
> > > blobs of data in userspace isn't already a solved problem.
> >
> > ping sathyanarayanan.kuppuswamy@linux.intel.com and +Dan Williams
>
> Apologies Dionna, I missed this earlier.
>
No worries, I've been sick anyway.
> >
> > I think for a uniform experience for all coco technologies, we need
> > someone from Intel to weigh in on supporting auxblob through a similar
> > vmexit. Whereas the quoting enclave gets its PCK cert installed by the
> > host, something like the firmware's SBOM [1] could be delivered in
> > auxblob. The proposal to embed the compressed SBOM binary in a coff
> > section of the UEFI doesn't get it communicated to user space, so this
> > is a good place to get that info about the expected TDMR in. The SBOM
> > proposal itself would need additional modeling in the coRIM profile to
> > have extra coco-specific measurements or we need to find some other
> > method of getting this info bundled with the attestation report.
>
> SBOM looks different than the SEV use case of @auxblob to convey a
> certificate chain.
The SEV use case has a GUID table in which we're allowed to provide
more than just the VCEK certificate chain. I'm using it to deliver a
UEFI endorsement document as well.
>
> Are you asking for @auxblob to be SBOM on TDX and a certchain on SEV, or
> unifying the @auxblob format on SBOM?
Given SEV is both certchain and SBOM and TDX doesn't need a certchain
in auxblob, I'd just be looking at delivering SBOM in auxblob for TDX.
It's probably better to have something extensible though, like SEV's
GUID table format. We may want to provide cached TDI RIMs too, for
example.
>
> > My own plan for SEV-SNP was to have a bespoke signed measurement of
> > the UEFI in the GUID table, but that doesn't extend to TDX. If we're
> > looking more at an industry alignment on coRIM for SBOM formats (yes
> > please), then it'd be great to start getting that kind of info plumbed
> > to the user in a uniform way that doesn't have to rely on servers
> > providing the endorsements.
> >
> > [1] https://uefi.org/blog/firmware-sbom-proposal
>
> Honestly my first reaction for this ABI would be for a new file under
> /sys/firmware/efi/efivars or similar.
For UEFI specifically that could make sense, yes. Not everyone has
been mounting efivars, so it's been a bit of an uphill battle for that
one. Still there's the matter of cached TDI RIMs. NVIDIA would have
everyone send attestation requests to their servers on every quote
request in the NRAS architecture, but we're looking at other ways to
provide reliable attestation without a third party service, albeit
with slightly different security properties.
--
-Dionna Glaze, PhD (she/her)
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-12-05 0:48 ` Dionna Amalie Glaze
@ 2023-12-05 20:06 ` Dan Williams
2023-12-05 22:04 ` Dionna Amalie Glaze
0 siblings, 1 reply; 158+ messages in thread
From: Dan Williams @ 2023-12-05 20:06 UTC (permalink / raw)
To: Dionna Amalie Glaze, Dan Williams
Cc: Sean Christopherson, Michael Roth, Alexey Kardashevskiy, kvm,
linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, dan.middleton
[ add Ard for the SBOM sysfs ABI commentary ]
Dionna Amalie Glaze wrote:
[..]
> > > My own plan for SEV-SNP was to have a bespoke signed measurement of
> > > the UEFI in the GUID table, but that doesn't extend to TDX. If we're
> > > looking more at an industry alignment on coRIM for SBOM formats (yes
> > > please), then it'd be great to start getting that kind of info plumbed
> > > to the user in a uniform way that doesn't have to rely on servers
> > > providing the endorsements.
> > >
> > > [1] https://uefi.org/blog/firmware-sbom-proposal
> >
> > Honestly my first reaction for this ABI would be for a new file under
> > /sys/firmware/efi/efivars or similar.
>
> For UEFI specifically that could make sense, yes. Not everyone has
> been mounting efivars, so it's been a bit of an uphill battle for that
> one.
I wonder what the concern is with mounting efivarfs vs configfs? In any
event this seems distinct enough to be its own /sys/firmware/efi/sbom
file. I would defer to Ard, but I think SBOM is a generally useful
concept that would be out of place as a blob returned from configfs-tsm.
> Still there's the matter of cached TDI RIMs. NVIDIA would have
I am not immediately sure what a "TDI RIM" is?
> everyone send attestation requests to their servers every quote
> request in the NRAS architecture, but we're looking at other ways to
"NRAS" does not parse for me either.
> provide reliable attestation without a third party service, albeit
> with slightly different security properties.
Setting the above confusion aside, I would just say that in general yes,
the kernel needs to understand its role in an end-to-end attestation
architecture that is not beholden to a single vendor, but also allows
the kernel to enforce ABI stability / mitigate regressions based on
binary format changes.
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-12-05 20:06 ` Dan Williams
@ 2023-12-05 22:04 ` Dionna Amalie Glaze
2023-12-05 23:11 ` Dan Williams
0 siblings, 1 reply; 158+ messages in thread
From: Dionna Amalie Glaze @ 2023-12-05 22:04 UTC (permalink / raw)
To: Dan Williams
Cc: Sean Christopherson, Michael Roth, Alexey Kardashevskiy, kvm,
linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, dan.middleton
On Tue, Dec 5, 2023 at 12:06 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> [ add Ard for the SBOM sysfs ABI commentary ]
>
> Dionna Amalie Glaze wrote:
> [..]
> > > > My own plan for SEV-SNP was to have a bespoke signed measurement of
> > > > the UEFI in the GUID table, but that doesn't extend to TDX. If we're
> > > > looking more at an industry alignment on coRIM for SBOM formats (yes
> > > > please), then it'd be great to start getting that kind of info plumbed
> > > > to the user in a uniform way that doesn't have to rely on servers
> > > > providing the endorsements.
> > > >
> > > > [1] https://uefi.org/blog/firmware-sbom-proposal
> > >
> > > Honestly my first reaction for this ABI would be for a new file under
> > > /sys/firmware/efi/efivars or similar.
> >
> > For UEFI specifically that could make sense, yes. Not everyone has
> > been mounting efivars, so it's been a bit of an uphill battle for that
> > one.
>
> I wonder what the concern is with mounting efivarfs vs configfs? In any
> event this seems distinct enough to be its own /sys/firmware/efi/sbom
> file. I would defer to Ard, but I think SBOM is a generally useful
> concept that would be out of place as a blob returned from configfs-tsm.
>
> > Still there's the matter of cached TDI RIMs. NVIDIA would have
>
> I am not immediately sure what a "TDI RIM" is?
>
I might just be making up terms. Any trusted hardware device that has
its own attestation will (hopefully) have signed reference
measurements, or a Reference Integrity Manifest as TCG calls them.
> > everyone send attestation requests to their servers every quote
> > request in the NRAS architecture, but we're looking at other ways to
>
> "NRAS" does not parse for me either.
>
That would be this https://docs.attestation.nvidia.com/api-docs/nras.html
> > provide reliable attestation without a third party service, albeit
> > with slightly different security properties.
>
> Setting the above confusion aside, I would just say that in general yes,
> the kernel needs to understand its role in an end-to-end attestation
> architecture that is not beholden to a single vendor, but also allows
> the kernel to enforce ABI stability / mitigate regressions based on
> binary format changes.
>
I'm mainly holding on to hope that I don't have to introduce a new
runtime dependency on a service that gives a source of truth about the
software that's running in the VM.
If we can have a GUID table with a flexible size that the host can
request of the guest, then we can version ABI changes with new GUID
entries.
It's a big enough value space without vanity naming opportunities that
we can pretty easily make changes without incurring any guest kernel
changes.
--
-Dionna Glaze, PhD (she/her)
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-12-05 22:04 ` Dionna Amalie Glaze
@ 2023-12-05 23:11 ` Dan Williams
2023-12-06 0:43 ` Dionna Amalie Glaze
0 siblings, 1 reply; 158+ messages in thread
From: Dan Williams @ 2023-12-05 23:11 UTC (permalink / raw)
To: Dionna Amalie Glaze, Dan Williams
Cc: Sean Christopherson, Michael Roth, Alexey Kardashevskiy, kvm,
linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, dan.middleton
Dionna Amalie Glaze wrote:
> On Tue, Dec 5, 2023 at 12:06 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > [ add Ard for the SBOM sysfs ABI commentary ]
> >
> > Dionna Amalie Glaze wrote:
> > [..]
> > > > > My own plan for SEV-SNP was to have a bespoke signed measurement of
> > > > > the UEFI in the GUID table, but that doesn't extend to TDX. If we're
> > > > > looking more at an industry alignment on coRIM for SBOM formats (yes
> > > > > please), then it'd be great to start getting that kind of info plumbed
> > > > > to the user in a uniform way that doesn't have to rely on servers
> > > > > providing the endorsements.
> > > > >
> > > > > [1] https://uefi.org/blog/firmware-sbom-proposal
> > > >
> > > > Honestly my first reaction for this ABI would be for a new file under
> > > > /sys/firmware/efi/efivars or similar.
> > >
> > > For UEFI specifically that could make sense, yes. Not everyone has
> > > been mounting efivars, so it's been a bit of an uphill battle for that
> > > one.
> >
> > I wonder what the concern is with mounting efivarfs vs configfs? In any
> > event this seems distinct enough to be its own /sys/firmware/efi/sbom
> > file. I would defer to Ard, but I think SBOM is a generally useful
> > concept that would be out of place as a blob returned from configfs-tsm.
> >
> > > Still there's the matter of cached TDI RIMs. NVIDIA would have
> >
> > I am not immediately sure what a "TDI RIM" is?
> >
>
> I might just be making up terms. Any trusted hardware device that has
> its own attestation will (hopefully) have signed reference
> measurements, or a Reference Integrity Manifest as TCG calls them.
Ah, ok.
>
> > > everyone send attestation requests to their servers every quote
> > > request in the NRAS architecture, but we're looking at other ways to
> >
> > "NRAS" does not parse for me either.
> >
>
> That would be this https://docs.attestation.nvidia.com/api-docs/nras.html
Thanks!
> > > provide reliable attestation without a third party service, albeit
> > > with slightly different security properties.
> >
> > Setting the above confusion aside, I would just say that in general yes,
> > the kernel needs to understand its role in an end-to-end attestation
> > architecture that is not beholden to a single vendor, but also allows
> > the kernel to enforce ABI stability / mitigate regressions based on
> > binary format changes.
> >
>
> I'm mainly holding on to hope that I don't have to introduce a new
> runtime dependency on a service that gives a source of truth about the
> software that's running in the VM.
> If we can have a GUID table with a flexible size that the host can
> request of the guest, then we can version ABI changes with new GUID
> entries.
> It's a big enough value space without vanity naming opportunities that
> we can pretty easily make changes without incurring any guest kernel
> changes.
So it's not only SBOM that you are concerned about; you instead want a
one-stop shop for auxiliary evidence and to get the vendors to agree on
following the same GUID+blob precedent that already exists for the AMD
cert chain? That sounds reasonable, but I still feel it should be
limited to things that do not fit into an existing ABI namespace.
...unless it's evidence / material that only a TVM would ever need.
^ permalink raw reply [flat|nested] 158+ messages in thread
* Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
2023-12-05 23:11 ` Dan Williams
@ 2023-12-06 0:43 ` Dionna Amalie Glaze
0 siblings, 0 replies; 158+ messages in thread
From: Dionna Amalie Glaze @ 2023-12-06 0:43 UTC (permalink / raw)
To: Dan Williams
Cc: Sean Christopherson, Michael Roth, Alexey Kardashevskiy, kvm,
linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, vkuznets,
jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, sathyanarayanan.kuppuswamy, alpergun,
jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh, dan.middleton
> So it's not only SBOM that you are concerned about; you instead want a
> one-stop shop for auxiliary evidence and to get the vendors to agree on
> following the same GUID+blob precedent that already exists for the AMD
> cert chain? That sounds reasonable, but I still feel it should be
> limited to things that do not fit into an existing ABI namespace.
>
The tl;dr is I want something simple for a hard problem and I'll
probably lose this argument.
The unfortunate state of affairs here is that whatever pathway vendors
choose to provide reference material for attestations is "vendor
dependent". Even with the RATS framework, "reference value provider"
is just a bubble in a diagram, not a federated service protocol
that we've all agreed on. TCG's recommendation of delivering the RIM
in the EFI volume doesn't quite work in the cloud setting, since
images own that, not us as the firmware provider. There's no
standard for telling a user where to get a RIM other than that
:(
> ...unless it's evidence / material that only a TVM would ever need.
There's the rub. Evidence may be provided to the TVM to forward to
verifiers, but really it's the verifiers that are tasked with
gathering all this attestation collateral. The TVM doesn't have to do
anything, even provide machine-specific certificates. That's pretty
dreadful system design though, since you end up with global services
in your hot path rather than getting cached data from the machine
you're getting an attestation from. My first stab at it is to just
have a storage bucket with objects named based on expected measurement
values, so you just construct a URL and download the endorsement if
you need it. I'd really rather just make that part of what the guest
can get from auxblob since they pay the bandwidth of forwarding it to
verifiers rather than my cost center paying for the storage bucket
bandwidth (is this the real reason I'm pushing on this? I'm unsure).
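[The "bucket keyed by expected measurement" idea above could be sketched like this — the bucket host and URL layout are invented purely for illustration; no such scheme is standardized:]

```c
/*
 * Sketch: hex-encode the expected launch measurement and use it as
 * the object name under a well-known bucket URL. Host name and URL
 * layout are made up for illustration.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build https://<bucket>/<hex(measurement)> into buf; returns buf. */
static const char *endorsement_url(char *buf, size_t buflen,
				   const char *bucket,
				   const uint8_t *measurement, size_t mlen)
{
	size_t off = (size_t)snprintf(buf, buflen, "https://%s/", bucket);

	for (size_t i = 0; i < mlen && off + 2 < buflen; i++)
		off += (size_t)snprintf(buf + off, buflen - off,
					"%02x", measurement[i]);
	return buf;
}
```

[A verifier that already knows the expected measurement can then fetch the endorsement directly, with no machine-specific service in the hot path.]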
If you're instead asking if this information would need to get piped
to a non-TVM (say, a non-confidential VM with a virtual TPM), then the
answer is maa~aaa~aaybe but probably not. PCR0 in the cloud really
needs its own profile, since the TCG platform firmware profile (PFP)
is unsuitable. There's not a whole lot of point delivering a signed
endorsement of a firmware measurement that we don't include in the
event log anyway for stability reasons, even if that's against the PFP
spec. So probably not. I think we're pretty clear that host-cached
data would only need to be for TVMs.
If you ask the folks I've been talking to at Intel or NVIDIA, they'd
probably say to put a service in your dependencies and be done with
it; it's the vendor's responsibility to provide an available enough
service to provide all the evidence that an attestation verifier may
want. That's quite unfriendly to the smaller players in the field, but
maybe it's easy to integrate with something like Veraison's
endorsement provisioning API [1] or Intel's Trust Authority (née
Project Amber). I haven't done it.
[1] https://github.com/veraison/docs/tree/main/api/endorsement-provisioning
--
-Dionna Glaze, PhD (she/her)
* [PATCH v10 49/50] crypto: ccp: Add debug support for decrypting pages
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (47 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
2023-10-16 13:28 ` [PATCH v10 50/50] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump Michael Roth
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang, Brijesh Singh
From: Brijesh Singh <brijesh.singh@amd.com>
Add support for decrypting guest encrypted memory. This API can be
used, for example, to dump VMCBs on SNP guest exit.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: minor commit fixups]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
drivers/crypto/ccp/sev-dev.c | 32 ++++++++++++++++++++++++++++++++
include/linux/psp-sev.h | 19 +++++++++++++++++++
2 files changed, 51 insertions(+)
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index f9c75c561c4e..26218df1371e 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -2061,6 +2061,38 @@ int sev_guest_df_flush(int *error)
}
EXPORT_SYMBOL_GPL(sev_guest_df_flush);
+int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
+{
+ struct sev_data_snp_dbg data = {0};
+ struct sev_device *sev;
+ int ret;
+
+ if (!psp_master || !psp_master->sev_data)
+ return -ENODEV;
+
+ sev = psp_master->sev_data;
+
+ if (!sev->snp_initialized)
+ return -EINVAL;
+
+ data.gctx_paddr = sme_me_mask | (gctx_pfn << PAGE_SHIFT);
+ data.src_addr = sme_me_mask | (src_pfn << PAGE_SHIFT);
+ data.dst_addr = sme_me_mask | (dst_pfn << PAGE_SHIFT);
+
+ /* The destination page must be in the firmware state. */
+ if (rmp_mark_pages_firmware(data.dst_addr, 1, false))
+ return -EIO;
+
+ ret = sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, &data, error);
+
+ /* Restore the page state */
+ if (snp_reclaim_pages(data.dst_addr, 1, false))
+ ret = -EIO;
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(snp_guest_dbg_decrypt_page);
+
static void sev_snp_certs_release(struct kref *kref)
{
struct sev_snp_certs *certs = container_of(kref, struct sev_snp_certs, kref);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 3b294ccbbec9..eb2c8a2b2a02 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -908,6 +908,20 @@ int sev_guest_decommission(struct sev_data_decommission *data, int *error);
*/
int sev_do_cmd(int cmd, void *data, int *psp_ret);
+/**
+ * snp_guest_dbg_decrypt_page - perform SEV SNP_DBG_DECRYPT command
+ *
+ * @error: SEV command return code
+ *
+ * Returns:
+ * 0 if the PSP successfully processed the command
+ * -%ENODEV if the SEV device is not available
+ * -%EINVAL if SNP has not been initialized
+ * -%ETIMEDOUT if the SEV command timed out
+ * -%EIO if the PSP returned a non-zero return code
+ */
+int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error);
+
void *psp_copy_user_blob(u64 uaddr, u32 len);
void *snp_alloc_firmware_page(gfp_t mask);
void snp_free_firmware_page(void *addr);
@@ -938,6 +952,11 @@ sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int
static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }
+static inline int snp_guest_dbg_decrypt_page(u64 gctx_pfn, u64 src_pfn, u64 dst_pfn, int *error)
+{
+ return -ENODEV;
+}
+
static inline void *snp_alloc_firmware_page(gfp_t mask)
{
return NULL;
--
2.25.1
* [PATCH v10 50/50] crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump
2023-10-16 13:27 [PATCH v10 00/50] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support Michael Roth
` (48 preceding siblings ...)
2023-10-16 13:28 ` [PATCH v10 49/50] crypto: ccp: Add debug support for decrypting pages Michael Roth
@ 2023-10-16 13:28 ` Michael Roth
49 siblings, 0 replies; 158+ messages in thread
From: Michael Roth @ 2023-10-16 13:28 UTC (permalink / raw)
To: kvm
Cc: linux-coco, linux-mm, linux-crypto, x86, linux-kernel, tglx,
mingo, jroedel, thomas.lendacky, hpa, ardb, pbonzini, seanjc,
vkuznets, jmattson, luto, dave.hansen, slp, pgonda, peterz,
srinivas.pandruvada, rientjes, dovmurik, tobin, bp, vbabka,
kirill, ak, tony.luck, marcorr, sathyanarayanan.kuppuswamy,
alpergun, jarkko, ashish.kalra, nikunj.dadhania, pankaj.gupta,
liam.merwick, zhi.a.wang
From: Ashish Kalra <ashish.kalra@amd.com>
Add a kdump-safe version of sev_firmware_shutdown() registered as a
crash_kexec_post_notifier, which is invoked during panic/crash to do
SEV/SNP shutdown. This is required to transition all IOMMU pages to
reclaim/hypervisor state; otherwise, re-initialization of the IOMMU
pages during crashdump kernel boot fails and panics the crashdump
kernel. This panic notifier runs in atomic context, so it avoids
acquiring any locks/mutexes and polls for PSP command completion
instead of depending on the PSP command completion interrupt.
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[mdr: remove use of "we" in comments]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
arch/x86/kernel/crash.c | 7 +++
drivers/crypto/ccp/sev-dev.c | 112 +++++++++++++++++++++++++----------
2 files changed, 89 insertions(+), 30 deletions(-)
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index c92d88680dbf..23ede774d31b 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -59,6 +59,13 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
*/
cpu_emergency_stop_pt();
+ /*
+ * For SNP, do wbinvd() on remote CPUs so that
+ * SNP_SHUTDOWN can safely be done on the local CPU.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
+ wbinvd();
+
disable_local_APIC();
}
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 26218df1371e..21a3064f30c9 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -21,6 +21,7 @@
#include <linux/hw_random.h>
#include <linux/ccp.h>
#include <linux/firmware.h>
+#include <linux/panic_notifier.h>
#include <linux/gfp.h>
#include <linux/cpufeature.h>
#include <linux/fs.h>
@@ -137,6 +138,26 @@ static int sev_wait_cmd_ioc(struct sev_device *sev,
{
int ret;
+ /*
+ * If invoked during panic handling, local interrupts are disabled,
+ * so the PSP command completion interrupt can't be used. Poll for
+ * PSP command completion instead.
+ */
+ if (irqs_disabled()) {
+ unsigned long timeout_usecs = (timeout * USEC_PER_SEC) / 10;
+
+ /* Poll for SEV command completion: */
+ while (timeout_usecs--) {
+ *reg = ioread32(sev->io_regs + sev->vdata->cmdresp_reg);
+ if (*reg & PSP_CMDRESP_RESP)
+ return 0;
+
+ udelay(10);
+ }
+
+ return -ETIMEDOUT;
+ }
+
ret = wait_event_timeout(sev->int_queue,
sev->int_rcvd, timeout * HZ);
if (!ret)
@@ -1058,17 +1079,6 @@ static int __sev_platform_shutdown_locked(int *error)
return ret;
}
-static int sev_platform_shutdown(int *error)
-{
- int rc;
-
- mutex_lock(&sev_cmd_mutex);
- rc = __sev_platform_shutdown_locked(NULL);
- mutex_unlock(&sev_cmd_mutex);
-
- return rc;
-}
-
static int sev_get_platform_state(int *state, int *error)
{
struct sev_user_data_status data;
@@ -1483,7 +1493,7 @@ static int __sev_snp_init_locked(int *error)
return rc;
}
-static int __sev_snp_shutdown_locked(int *error)
+static int __sev_snp_shutdown_locked(int *error, bool in_panic)
{
struct sev_device *sev = psp_master->sev_data;
struct sev_data_snp_shutdown_ex data;
@@ -1500,7 +1510,16 @@ static int __sev_snp_shutdown_locked(int *error)
sev_snp_certs_put(sev->snp_certs);
sev->snp_certs = NULL;
- wbinvd_on_all_cpus();
+ /*
+ * If invoked during panic handling, local interrupts are disabled
+ * and all CPUs are stopped, so wbinvd_on_all_cpus() can't be called.
+ * In that case, a wbinvd() is done on remote CPUs via the NMI
+ * callback, so only a local wbinvd() is needed here.
+ */
+ if (!in_panic)
+ wbinvd_on_all_cpus();
+ else
+ wbinvd();
retry:
ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN_EX, &data, error);
@@ -1543,17 +1562,6 @@ static int __sev_snp_shutdown_locked(int *error)
return ret;
}
-static int sev_snp_shutdown(int *error)
-{
- int rc;
-
- mutex_lock(&sev_cmd_mutex);
- rc = __sev_snp_shutdown_locked(error);
- mutex_unlock(&sev_cmd_mutex);
-
- return rc;
-}
-
static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
{
struct sev_device *sev = psp_master->sev_data;
@@ -2262,19 +2270,29 @@ int sev_dev_init(struct psp_device *psp)
return ret;
}
-static void sev_firmware_shutdown(struct sev_device *sev)
+static void __sev_firmware_shutdown(struct sev_device *sev, bool in_panic)
{
int error;
- sev_platform_shutdown(NULL);
+ __sev_platform_shutdown_locked(NULL);
if (sev_es_tmr) {
- /* The TMR area was encrypted, flush it from the cache */
- wbinvd_on_all_cpus();
+ /*
+ * The TMR area was encrypted, flush it from the cache
+ *
+ * If invoked during panic handling, local interrupts are
+ * disabled and all CPUs are stopped, so wbinvd_on_all_cpus()
+ * can't be used. In that case, wbinvd() is done on remote CPUs
+ * via the NMI callback, so a local wbinvd() is sufficient here.
+ */
+ if (!in_panic)
+ wbinvd_on_all_cpus();
+ else
+ wbinvd();
__snp_free_firmware_pages(virt_to_page(sev_es_tmr),
get_order(sev_es_tmr_size),
- false);
+ true);
sev_es_tmr = NULL;
}
@@ -2295,7 +2313,14 @@ static void sev_firmware_shutdown(struct sev_device *sev)
*/
free_snp_host_map(sev);
- sev_snp_shutdown(&error);
+ __sev_snp_shutdown_locked(&error, in_panic);
+}
+
+static void sev_firmware_shutdown(struct sev_device *sev)
+{
+ mutex_lock(&sev_cmd_mutex);
+ __sev_firmware_shutdown(sev, false);
+ mutex_unlock(&sev_cmd_mutex);
}
void sev_dev_destroy(struct psp_device *psp)
@@ -2313,6 +2338,28 @@ void sev_dev_destroy(struct psp_device *psp)
psp_clear_sev_irq_handler(psp);
}
+static int sev_snp_shutdown_on_panic(struct notifier_block *nb,
+ unsigned long reason, void *arg)
+{
+ struct sev_device *sev = psp_master->sev_data;
+
+ /*
+ * Panic callbacks are executed with all other CPUs stopped,
+ * so don't wait for sev_cmd_mutex to be released since it
+ * would block here forever.
+ */
+ if (mutex_is_locked(&sev_cmd_mutex))
+ return NOTIFY_DONE;
+
+ __sev_firmware_shutdown(sev, true);
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block sev_snp_panic_notifier = {
+ .notifier_call = sev_snp_shutdown_on_panic,
+};
+
int sev_issue_cmd_external_user(struct file *filep, unsigned int cmd,
void *data, int *error)
{
@@ -2360,6 +2407,8 @@ void sev_pci_init(void)
dev_info(sev->dev, "SEV%s API:%d.%d build:%d\n", sev->snp_initialized ?
"-SNP" : "", sev->api_major, sev->api_minor, sev->build);
+ atomic_notifier_chain_register(&panic_notifier_list,
+ &sev_snp_panic_notifier);
return;
err:
@@ -2375,4 +2424,7 @@ void sev_pci_exit(void)
return;
sev_firmware_shutdown(sev);
+
+ atomic_notifier_chain_unregister(&panic_notifier_list,
+ &sev_snp_panic_notifier);
}
--
2.25.1