linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kai Huang <kai.huang@intel.com>
To: Dave Hansen <dave.hansen@intel.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, len.brown@intel.com,
	tony.luck@intel.com, rafael.j.wysocki@intel.com,
	reinette.chatre@intel.com, dan.j.williams@intel.com,
	peterz@infradead.org, ak@linux.intel.com,
	kirill.shutemov@linux.intel.com,
	sathyanarayanan.kuppuswamy@linux.intel.com,
	isaku.yamahata@intel.com
Subject: Re: [PATCH v5 07/22] x86/virt/tdx: Implement SEAMCALL function
Date: Thu, 04 Aug 2022 22:06:13 +1200	[thread overview]
Message-ID: <7e4bdbf988addfa811e3317e6da7ef691bd46c3a.camel@intel.com> (raw)
In-Reply-To: <28ece806443e4de04d7e587d7e678d58259f9c5b.camel@intel.com>

On Thu, 2022-08-04 at 10:35 +1200, Kai Huang wrote:
> On Wed, 2022-08-03 at 07:20 -0700, Dave Hansen wrote:
> > On 8/2/22 19:37, Kai Huang wrote:
> > > On Thu, 2022-07-21 at 13:52 +1200, Kai Huang wrote:
> > > > Also, if I understand correctly above, your suggestion is we want to prevent any
> > > > CMR memory going offline so it won't be hot-removed (assuming we can get CMRs
> > > > during boot).  This looks contradicts to the requirement of being able to allow
> > > > moving memory from core-mm to driver.  When we offline the memory, we cannot
> > > > know whether the memory will be used by driver, or later hot-removed.
> > > Hi Dave,
> > > 
> > > The high level flow of device hot-removal is:
> > > 
> > > acpi_scan_hot_remove()
> > > 	-> acpi_scan_try_to_offline()
> > > 		-> acpi_bus_offline()
> > > 			-> device_offline()
> > > 				-> memory_subsys_offline()
> > > 	-> acpi_bus_trim()
> > > 		-> acpi_memory_device_remove()
> > > 
> > > 
> > > And memory_subsys_offline() can also be triggered via /sysfs:
> > > 
> > > 	echo 0 > /sys/devices/system/memory/memory30/online
> > > 
> > > After the memory block is offline, my understanding is kernel can theoretically
> > > move it to, i.e. ZONE_DEVICE via memremap_pages().
> > > 
> > > As you can see memory_subsys_offline() is the entry point of memory device
> > > offline (before it the code is generic for all ACPI device), and it cannot
> > > distinguish whether the removal is from ACPI event, or from /sysfs, so it seems
> > > we are unable to refuse to offline memory in  memory_subsys_offline() when it is
> > > called from ACPI event.
> > > 
> > > Any comments?
> > 
> > I suggest refactoring the code in a way that makes it possible to
> > distinguish the two cases.
> > 
> > It's not like you have some binary kernel.  You have the source code for
> > the whole thing and can propose changes *ANYWHERE* you need.  Even better:
> > 
> > $ grep -A2 ^ACPI\$ MAINTAINERS
> > ACPI
> > M:	"Rafael J. Wysocki" <rafael@kernel.org>
> > R:	Len Brown <lenb@kernel.org>
> > 
> > The maintainer of ACPI works for our employer.  Plus, he's a nice
> > helpful guy that you can go ask how you might refactor this or
> > approaches you might take.  Have you talked to Rafael about this issue?
> 
> Rafael once also suggested to set hotplug.enabled to 0 as your code shows below,
> but we just got the TDX architecture behaviour of memory hotplug clarified from
> Intel TDX guys recently. 
> 
> > Also, from a two-minute grepping session, I noticed this:
> > 
> > > static acpi_status acpi_bus_offline(acpi_handle handle, u32 lvl, void *data,
> > >                                     void **ret_p)
> > > {
> > ...
> > >         if (device->handler && !device->handler->hotplug.enabled) {
> > >                 *ret_p = &device->dev;
> > >                 return AE_SUPPORT;
> > >         }
> > 
> > It looks to me like if you simply set:
> > 
> > 	memory_device_handler->hotplug.enabled = false;
> > 
> > you'll get most of the behavior you want.  ACPI memory hotplug would not
> > work and the changes would be confined to the ACPI world.  The
> > "lower-level" bus-based hotplug would be unaffected.
> > 
> > Now, I don't know what kind of locking would be needed to muck with a
> > global structure like that.  But, it's a start.
> 
> This has two problems:
> 
> 1) This approach cannot distinguish non-CMR memory hotplug and CMR memory
> hotplug, as it disables ACPI memory hotplug for all.  But this is fine as we
> want to reject non-CMR memory hotplug anyway.  We just need to explain clearly
> in changelog.
> 
> 2) This won't allow the kernel to speak out "BIOS  bug" when CMR memory hotplug
> actually happens.  Instead, we can only print out "hotplug is disabled due to
> TDX is enabled by BIOS." when we set hotplug.enable to false.
> 
> Assuming above is OK, I'll explore this option.  I'll also do some research to
> see if it's still possible to speak out "BIOS bug" in this approach but it's not
> a mandatory requirement to me now.
> 
> Also, if print out "BIOS bug" for CMR memory hotplug isn't mandatory, then we
> can just detect TDX during kernel boot, and disable hotplug when TDX is enabled
> by BIOS, but don't need to use "winner-take-all" approach.  The former is
> clearer and easier to implement.  I'll go with the former approach if I don't
> hear objection from you.
> 
> And ACPI CPU hotplug can also use the same way.
> 
> Please let me know any comments.  Thanks!
> 

One more reason why "winner-take-all" approach doesn't work: 

If we allow ACPI memory hotplug to happen but choose to disable it in the
handler using "winner-take-all", then at the beginning the ACPI code will
actually create a /sysfs entry for hotplug.enabled to allow userspace to change
it:

	/sys/firmware/acpi/hotplug/memory/enabled

Which means even we set hotplug.enabled to false at some point, userspace can
turn it on again.  The only way is to not create this /sysfs entry at the
beginning.

With "winner-take-all" approach, I don't think we should avoid creating the
/sysfs entry.  Nor we should introduce arch-specific hook to, i.e. prevent
/sysfs entry being changed by userspace.

So instead of "winner-take-all" approach, I'll introduce a new kernel command
line to allow user to choose between ACPI CPU/memory hotplug vs TDX.  This
command line should not impact the "software" CPU/memory hotplug even when user
choose to use TDX.  In this case, this is similar to "winner-take-all" anyway.

  reply	other threads:[~2022-08-04 10:06 UTC|newest]

Thread overview: 114+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-22 11:15 [PATCH v5 00/22] TDX host kernel support Kai Huang
2022-06-22 11:15 ` [PATCH v5 01/22] x86/virt/tdx: Detect TDX during kernel boot Kai Huang
2022-06-23  5:57   ` Chao Gao
2022-06-23  9:23     ` Kai Huang
2022-08-02  2:01   ` [PATCH v5 1/22] " Wu, Binbin
2022-08-03  9:25     ` Kai Huang
2022-06-22 11:15 ` [PATCH v5 02/22] cc_platform: Add new attribute to prevent ACPI CPU hotplug Kai Huang
2022-06-22 11:42   ` Rafael J. Wysocki
2022-06-23  0:01     ` Kai Huang
2022-06-27  8:01       ` Igor Mammedov
2022-06-28 10:04         ` Kai Huang
2022-06-28 11:52           ` Igor Mammedov
2022-06-28 17:33           ` Rafael J. Wysocki
2022-06-28 23:41             ` Kai Huang
2022-06-24 18:57   ` Dave Hansen
2022-06-27  5:05     ` Kai Huang
2022-07-13 11:09       ` Kai Huang
2022-07-19 17:46         ` Dave Hansen
2022-07-19 23:54           ` Kai Huang
2022-08-03  3:40       ` Binbin Wu
2022-08-03  9:20         ` Kai Huang
2022-06-29  5:33   ` Christoph Hellwig
2022-06-29  9:09     ` Kai Huang
2022-08-03  3:55   ` Binbin Wu
2022-08-03  9:21     ` Kai Huang
2022-06-22 11:15 ` [PATCH v5 03/22] cc_platform: Add new attribute to prevent ACPI memory hotplug Kai Huang
2022-06-22 11:45   ` Rafael J. Wysocki
2022-06-23  0:08     ` Kai Huang
2022-06-28 17:55       ` Rafael J. Wysocki
2022-06-28 12:01     ` Igor Mammedov
2022-06-28 23:49       ` Kai Huang
2022-06-29  8:48         ` Igor Mammedov
2022-06-29  9:13           ` Kai Huang
2022-06-22 11:16 ` [PATCH v5 04/22] x86/virt/tdx: Prevent ACPI CPU hotplug and " Kai Huang
2022-06-24  1:41   ` Chao Gao
2022-06-24 11:21     ` Kai Huang
2022-06-29  8:35       ` Yuan Yao
2022-06-29  9:17         ` Kai Huang
2022-06-29 14:22       ` Dave Hansen
2022-06-29 23:02         ` Kai Huang
2022-06-30 15:44           ` Dave Hansen
2022-06-30 22:45             ` Kai Huang
2022-06-22 11:16 ` [PATCH v5 05/22] x86/virt/tdx: Prevent hot-add driver managed memory Kai Huang
2022-06-24  2:12   ` Chao Gao
2022-06-24 11:23     ` Kai Huang
2022-06-24 19:01   ` Dave Hansen
2022-06-27  5:27     ` Kai Huang
2022-06-22 11:16 ` [PATCH v5 06/22] x86/virt/tdx: Add skeleton to initialize TDX on demand Kai Huang
2022-06-24  2:39   ` Chao Gao
2022-06-24 11:27     ` Kai Huang
2022-06-22 11:16 ` [PATCH v5 07/22] x86/virt/tdx: Implement SEAMCALL function Kai Huang
2022-06-24 18:38   ` Dave Hansen
2022-06-27  5:23     ` Kai Huang
2022-06-27 20:58       ` Dave Hansen
2022-06-27 22:10         ` Kai Huang
2022-07-19 19:39           ` Dan Williams
2022-07-19 23:28             ` Kai Huang
2022-07-20 10:18           ` Kai Huang
2022-07-20 16:48             ` Dave Hansen
2022-07-21  1:52               ` Kai Huang
2022-07-27  0:34                 ` Kai Huang
2022-07-27  0:50                   ` Dave Hansen
2022-07-27 12:46                     ` Kai Huang
2022-08-03  2:37                 ` Kai Huang
2022-08-03 14:20                   ` Dave Hansen
2022-08-03 22:35                     ` Kai Huang
2022-08-04 10:06                       ` Kai Huang [this message]
2022-06-22 11:16 ` [PATCH v5 08/22] x86/virt/tdx: Shut down TDX module in case of error Kai Huang
2022-06-24 18:50   ` Dave Hansen
2022-06-27  5:26     ` Kai Huang
2022-06-27 20:46       ` Dave Hansen
2022-06-27 22:34         ` Kai Huang
2022-06-27 22:56           ` Dave Hansen
2022-06-27 23:59             ` Kai Huang
2022-06-28  0:03               ` Dave Hansen
2022-06-28  0:11                 ` Kai Huang
2022-06-22 11:16 ` [PATCH v5 09/22] x86/virt/tdx: Detect TDX module by doing module global initialization Kai Huang
2022-06-22 11:16 ` [PATCH v5 10/22] x86/virt/tdx: Do logical-cpu scope TDX module initialization Kai Huang
2022-06-22 11:17 ` [PATCH v5 11/22] x86/virt/tdx: Get information about TDX module and TDX-capable memory Kai Huang
2022-06-22 11:17 ` [PATCH v5 12/22] x86/virt/tdx: Convert all memory regions in memblock to TDX memory Kai Huang
2022-06-24 19:40   ` Dave Hansen
2022-06-27  6:16     ` Kai Huang
2022-07-07  2:37       ` Kai Huang
2022-07-07 14:26       ` Dave Hansen
2022-07-07 14:36         ` Juergen Gross
2022-07-07 23:42           ` Kai Huang
2022-07-07 23:34         ` Kai Huang
2022-08-03  1:30           ` Kai Huang
2022-08-03 14:22             ` Dave Hansen
2022-08-03 22:14               ` Kai Huang
2022-06-22 11:17 ` [PATCH v5 13/22] x86/virt/tdx: Add placeholder to construct TDMRs based on memblock Kai Huang
2022-06-22 11:17 ` [PATCH v5 14/22] x86/virt/tdx: Create TDMRs to cover all memblock memory regions Kai Huang
2022-06-22 11:17 ` [PATCH v5 15/22] x86/virt/tdx: Allocate and set up PAMTs for TDMRs Kai Huang
2022-06-24 20:13   ` Dave Hansen
2022-06-27 10:31     ` Kai Huang
2022-06-27 20:41       ` Dave Hansen
2022-06-27 22:50         ` Kai Huang
2022-06-27 22:57           ` Dave Hansen
2022-06-27 23:05             ` Kai Huang
2022-06-28  0:48         ` Xiaoyao Li
2022-06-28 17:03           ` Dave Hansen
2022-08-17 22:46   ` Sagi Shahar
2022-08-17 23:43     ` Huang, Kai
2022-06-22 11:17 ` [PATCH v5 16/22] x86/virt/tdx: Set up reserved areas for all TDMRs Kai Huang
2022-06-22 11:17 ` [PATCH v5 17/22] x86/virt/tdx: Reserve TDX module global KeyID Kai Huang
2022-06-22 11:17 ` [PATCH v5 18/22] x86/virt/tdx: Configure TDX module with TDMRs and " Kai Huang
2022-06-22 11:17 ` [PATCH v5 19/22] x86/virt/tdx: Configure global KeyID on all packages Kai Huang
2022-06-22 11:17 ` [PATCH v5 20/22] x86/virt/tdx: Initialize all TDMRs Kai Huang
2022-06-22 11:17 ` [PATCH v5 21/22] x86/virt/tdx: Support kexec() Kai Huang
2022-06-22 11:17 ` [PATCH v5 22/22] Documentation/x86: Add documentation for TDX host support Kai Huang
2022-08-18  4:07   ` Bagas Sanjaya
2022-08-18  9:33     ` Huang, Kai
2022-06-24 19:47 ` [PATCH v5 00/22] TDX host kernel support Dave Hansen
2022-06-27  4:09   ` Kai Huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7e4bdbf988addfa811e3317e6da7ef691bd46c3a.camel@intel.com \
    --to=kai.huang@intel.com \
    --cc=ak@linux.intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=reinette.chatre@intel.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=seanjc@google.com \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).