linux-kernel.vger.kernel.org archive mirror
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "Rafael J . Wysocki" <rjw@rjwysocki.net>,
	"H . Peter Anvin" <hpa@zytor.com>,
	Tony Luck <tony.luck@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	Kuppuswamy Sathyanarayanan <knsathya@kernel.org>,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org
Subject: Re: [PATCH v2] x86: Skip WBINVD instruction for VM guest
Date: Sat, 4 Dec 2021 02:49:15 +0300
Message-ID: <20211203234915.jw6kdd2qnfrionch@black.fi.intel.com>
In-Reply-To: <87lf126010.ffs@tglx>

On Fri, Dec 03, 2021 at 12:48:43AM +0100, Thomas Gleixner wrote:
> Kirill,
> 
> On Fri, Dec 03 2021 at 01:21, Kirill A. Shutemov wrote:
> > On Thu, Nov 25, 2021 at 01:40:24AM +0100, Thomas Gleixner wrote:
> >> Kuppuswamy,
> >> Either that or you provide patches with arguments which are based on
> >> proper analysis and not on 'appears to' observations.
> >
> > I think the right solution to the WBINVD would be to add a #VE handler
> > that does nothing. We don't have a reasonable way to handle it from within
> > the guest. We can call the VMM in hope that it would handle it, but VMM is
> > untrusted and it can ignore the request.
> >
> > Dave suggested that we need to do a code audit to make sure that there's no
> > user inside the TDX guest environment that relies on WBINVD to work correctly.
> >
> > Below is full call tree of WBINVD. It is substantially larger than I
> > anticipated from initial grep.
> >
> > Conclusions:
> >
> >   - Most of the callers are in ACPI code that changes S-states. Ignoring
> >     the cache flush for an S-state change in a virtual machine should be safe.
> >
> >   - The only WBINVD I was able to trigger is on poweroff from ACPI code.
> >     Reboot also should trigger it, but for some reason I don't see it.
> >
> >   - A few callers are in CPU offline code. TDX does not allow offlining a
> >     CPU as we cannot bring it back -- we don't have SIPI. And even if offline
> >     worked for a vCPU, it should be safe to ignore WBINVD there.
> >
> >   - NVDIMMs are not supported inside TDX. If that changes, we would need
> >     to deal with cache flushing for this case. Hopefully, we would be able
> >     to avoid WBINVD.
> >
> >   - Cache QoS and MTRR use WBINVD. They are disabled in TDX, but it is
> >     controlled by VMM if the feature is advertised. We would need to
> >     filter CPUID/MSRs to make sure VMM would not mess with them.
> >
> > Is this good enough justification for a do-nothing #VE WBINVD handler?
> 
> First of all, thank you very much for this very profound analysis.
> 
> This is really what I was asking for and you probably went even a step
> deeper than that. Much appreciated.
> 
> What we should do instead of a wholesale "let's ignore WBINVD" is to
> have a separate function/macro:
> 
>  ACPI_FLUSH_CPU_CACHE_PHYS()
> 
> and invoke that from the functions which are considered to be safe.
> 
> That would default to ACPI_FLUSH_CPU_CACHE() for other architectures
> obviously.
> 
> Then you can rightfully do:
> 
> #define ACPI_FLUSH_CPU_CACHE_PHYS()     \
>         if (!cpu_feature_enabled(XXX))	\
>         	wbinvd();
> 
> where $XXX might be FEATURE_TDX_GUEST for paranoia's sake and then
> extended to X86_FEATURE_HYPERVISOR if everyone agrees.
> 
> Then you have the #VE handler which just acts on any other wbinvd
> invocation via warn, panic, whatever, no?

I found another angle on the problem. According to the ACPI spec v6.4,
section 16.2, cache flushing is required on the way to S1, S2 and S3.
And according to section 8.2, it is also required on the way to C3.

TDX doesn't support these S- and C-states. TDX only supports S0 and S5.

Adjusting code to match the spec would make TDX work automagically.
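
The rule from the spec boils down to two predicates. Here is a minimal
standalone sketch of them. The helper names are hypothetical and are not
part of the patch, which open-codes the same comparisons. The constants
are defined locally and are assumed to match the ACPICA definitions of
the same names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies; assumed to match the ACPICA values of the same names. */
#define ACPI_STATE_S0 0
#define ACPI_STATE_S1 1
#define ACPI_STATE_S2 2
#define ACPI_STATE_S3 3
#define ACPI_STATE_S4 4
#define ACPI_STATE_S5 5
#define ACPI_STATE_C3 3

/* Hypothetical helper: flush only on the way to S1, S2 or S3. */
static bool sleep_state_needs_cache_flush(uint8_t sleep_state)
{
	return sleep_state >= ACPI_STATE_S1 && sleep_state <= ACPI_STATE_S3;
}

/* Hypothetical helper: flush only on the way to C3. */
static bool idle_state_needs_cache_flush(uint8_t cx_type)
{
	return cx_type == ACPI_STATE_C3;
}
```

With S0 and S5 being the only sleep states a TDX guest can use, the first
predicate is never true there, so no WBINVD is issued on these paths.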

Any opinions on the patch below?

I didn't touch the ACPI_FLUSH_CPU_CACHE() users in cpufreq/longhaul.c
because they might be outside the scope of the ACPI spec; I don't know.

diff --git a/drivers/acpi/acpica/hwesleep.c b/drivers/acpi/acpica/hwesleep.c
index 808fdf54aeeb..b004a72a426e 100644
--- a/drivers/acpi/acpica/hwesleep.c
+++ b/drivers/acpi/acpica/hwesleep.c
@@ -104,7 +104,8 @@ acpi_status acpi_hw_extended_sleep(u8 sleep_state)
 
 	/* Flush caches, as per ACPI specification */
 
-	ACPI_FLUSH_CPU_CACHE();
+	if (sleep_state >= ACPI_STATE_S1 && sleep_state <= ACPI_STATE_S3)
+		ACPI_FLUSH_CPU_CACHE();
 
 	status = acpi_os_enter_sleep(sleep_state, sleep_control, 0);
 	if (status == AE_CTRL_TERMINATE) {
diff --git a/drivers/acpi/acpica/hwsleep.c b/drivers/acpi/acpica/hwsleep.c
index 34a3825f25d3..bfcd66efeb48 100644
--- a/drivers/acpi/acpica/hwsleep.c
+++ b/drivers/acpi/acpica/hwsleep.c
@@ -110,7 +110,8 @@ acpi_status acpi_hw_legacy_sleep(u8 sleep_state)
 
 	/* Flush caches, as per ACPI specification */
 
-	ACPI_FLUSH_CPU_CACHE();
+	if (sleep_state >= ACPI_STATE_S1 && sleep_state <= ACPI_STATE_S3)
+		ACPI_FLUSH_CPU_CACHE();
 
 	status = acpi_os_enter_sleep(sleep_state, pm1a_control, pm1b_control);
 	if (status == AE_CTRL_TERMINATE) {
diff --git a/drivers/acpi/acpica/hwxfsleep.c b/drivers/acpi/acpica/hwxfsleep.c
index e4cde23a2906..ba77598ee43e 100644
--- a/drivers/acpi/acpica/hwxfsleep.c
+++ b/drivers/acpi/acpica/hwxfsleep.c
@@ -162,8 +162,6 @@ acpi_status acpi_enter_sleep_state_s4bios(void)
 		return_ACPI_STATUS(status);
 	}
 
-	ACPI_FLUSH_CPU_CACHE();
-
 	status = acpi_hw_write_port(acpi_gbl_FADT.smi_command,
 				    (u32)acpi_gbl_FADT.s4_bios_request, 8);
 	if (ACPI_FAILURE(status)) {
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 76ef1bcc8848..01495aca850e 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -567,7 +567,8 @@ static int acpi_idle_play_dead(struct cpuidle_device *dev, int index)
 {
 	struct acpi_processor_cx *cx = per_cpu(acpi_cstate[index], dev->cpu);
 
-	ACPI_FLUSH_CPU_CACHE();
+	if (cx->type == ACPI_STATE_C3)
+		ACPI_FLUSH_CPU_CACHE();
 
 	while (1) {
 
diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
index eaa47753b758..a81d08b762c2 100644
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -73,7 +73,9 @@ static int acpi_sleep_prepare(u32 acpi_state)
 		acpi_set_waking_vector(acpi_wakeup_address);
 
 	}
-	ACPI_FLUSH_CPU_CACHE();
+
+	if (acpi_state >= ACPI_STATE_S1 && acpi_state <= ACPI_STATE_S3)
+		ACPI_FLUSH_CPU_CACHE();
 #endif
 	pr_info("Preparing to enter system sleep state S%d\n", acpi_state);
 	acpi_enable_wakeup_devices(acpi_state);
@@ -566,7 +568,8 @@ static int acpi_suspend_enter(suspend_state_t pm_state)
 	u32 acpi_state = acpi_target_sleep_state;
 	int error;
 
-	ACPI_FLUSH_CPU_CACHE();
+	if (acpi_state >= ACPI_STATE_S1 && acpi_state <= ACPI_STATE_S3)
+		ACPI_FLUSH_CPU_CACHE();
 
 	trace_suspend_resume(TPS("acpi_suspend"), acpi_state, true);
 	switch (acpi_state) {
@@ -903,8 +906,6 @@ static int acpi_hibernation_enter(void)
 {
 	acpi_status status = AE_OK;
 
-	ACPI_FLUSH_CPU_CACHE();
-
 	/* This shouldn't return.  If it returns, we have a problem */
 	status = acpi_enter_sleep_state(ACPI_STATE_S4);
 	/* Reprogram control registers */
-- 
 Kirill A. Shutemov

Thread overview: 32+ messages
2021-11-16  0:50 [PATCH v1 1/1] x86: Skip WBINVD instruction for VM guest Kuppuswamy Sathyanarayanan
2021-11-16 16:24 ` Borislav Petkov
2021-11-16 16:36   ` Sathyanarayanan Kuppuswamy
2021-11-19  4:03   ` [PATCH v2] " Kuppuswamy Sathyanarayanan
2021-11-25  0:40     ` Thomas Gleixner
2021-12-02 22:21       ` Kirill A. Shutemov
2021-12-02 22:38         ` Dave Hansen
2021-12-02 23:48         ` Thomas Gleixner
2021-12-03 23:49           ` Kirill A. Shutemov [this message]
2021-12-04  0:20             ` Dave Hansen
2021-12-04  0:54               ` Kirill A. Shutemov
2021-12-06 15:35                 ` Dave Hansen
2021-12-06 16:39                   ` Dan Williams
2021-12-06 16:53                     ` Dave Hansen
2021-12-06 17:51                       ` Dan Williams
2021-12-04 20:27             ` Rafael J. Wysocki
2021-12-06 12:29               ` [PATCH 0/4] ACPI/ACPICA: Only flush caches on S1/S2/S3 and C3 Kirill A. Shutemov
2021-12-06 12:29                 ` [PATCH 1/4] ACPICA: Do not flush cache for on entering S4 and S5 Kirill A. Shutemov
2021-12-08 14:58                   ` Rafael J. Wysocki
2021-12-06 12:29                 ` [PATCH 2/4] ACPI: PM: Remove redundant cache flushing Kirill A. Shutemov
2021-12-07 16:35                   ` Rafael J. Wysocki
2021-12-09 13:32                     ` Kirill A. Shutemov
2021-12-17 18:04                       ` Rafael J. Wysocki
2021-12-06 12:29                 ` [PATCH 3/4] ACPI: processor idle: Only flush cache on entering C3 Kirill A. Shutemov
2021-12-06 15:03                   ` Peter Zijlstra
2021-12-08 16:26                     ` Rafael J. Wysocki
2021-12-09 13:33                       ` Kirill A. Shutemov
2021-12-17 17:58                         ` Rafael J. Wysocki
2021-12-06 12:29                 ` [PATCH 4/4] ACPI: PM: Avoid cache flush on entering S4 Kirill A. Shutemov
2021-12-08 15:10                   ` Rafael J. Wysocki
2021-12-08 16:04                     ` Kirill A. Shutemov
2021-12-08 16:16                       ` Rafael J. Wysocki
