linux-csky.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug
@ 2024-01-31 16:48 Russell King (Oracle)
  2024-01-31 16:49 ` [PATCH RFC v4 01/15] ACPI: Only enumerate enabled (or functional) processor devices Russell King
                   ` (14 more replies)
  0 siblings, 15 replies; 42+ messages in thread
From: Russell King (Oracle) @ 2024-01-31 16:48 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse

Hi,

This is another iteration of the Arm64 virtual CPU hotplug support,
updated for the review comments on the v3 smaller series posted back
in December.

I believe all feedback has been addressed - if I have missed something
then please accept my apologies. Several patches have been dropped from
the original series, including patches 2 and 3, the ACPICA patch adding
the new GICC bits (now merged), the patch to rename ACPI_HOTPLUG_CPU
and the _OSC bits. The arch_unregister_cpu() fix has been merged into
its appropriate commit.

One change that has not been addressed yet is that it is likely that
"ACPI: add support to (un)register CPUs based on the _STA enabled bit"
needs to do more cleanup than it is doing when unregistering the CPU -
much of acpi_processor_make_not_present() probably needs to be done to
properly clean up - hence why this is still RFC for now.

It is hoped that we have reached agreement with the remainder of the
patches, and we are getting close to having something that can be
merged once the above is addressed.

This is from my aarch64/hotplug-vcpu/head branch, minus the top two
commits, and is based on v6.8-rc2.

 Documentation/ABI/testing/sysfs-devices-system-cpu |   6 +
 Documentation/arch/arm64/cpu-hotplug.rst           |  79 ++++++++++
 Documentation/arch/arm64/index.rst                 |   1 +
 arch/arm64/include/asm/acpi.h                      |  11 ++
 arch/arm64/kernel/acpi_numa.c                      |  11 --
 arch/arm64/kernel/psci.c                           |   2 +-
 arch/arm64/kernel/smp.c                            |   3 +-
 drivers/acpi/acpi_processor.c                      |  99 +++++++++++--
 drivers/acpi/device_pm.c                           |   2 +-
 drivers/acpi/device_sysfs.c                        |   2 +-
 drivers/acpi/internal.h                            |   4 +-
 drivers/acpi/property.c                            |   2 +-
 drivers/acpi/scan.c                                | 162 ++++++++++++++-------
 drivers/base/cpu.c                                 |  16 +-
 drivers/irqchip/irq-gic-v3.c                       |  32 ++--
 include/acpi/acpi_bus.h                            |   1 +
 include/linux/acpi.h                               |   5 +-
 include/linux/cpumask.h                            |  25 ++++
 kernel/cpu.c                                       |   3 +
 19 files changed, 370 insertions(+), 96 deletions(-)
 create mode 100644 Documentation/arch/arm64/cpu-hotplug.rst

On Wed, Dec 13, 2023 at 12:47:31PM +0000, Russell King (Oracle) wrote:
> Hi,
> 
> This is this remaining patches for ARM64 virtual cpu hotplug, which
> follows on from the previous set of 21 patches that GregKH has
> recently queued up, and "x86: intel_epb: Don't rely on link order"
> which can be found at:
> 
> https://lore.kernel.org/r/E1r6SeD-00DCuK-M6@rmk-PC.armlinux.org.uk
> https://lore.kernel.org/r/ZVyz/Ve5pPu8AWoA@shell.armlinux.org.uk
> 
> The entire series can be found at:
> 
>  git://git.armlinux.org.uk/~rmk/linux-arm.git aarch64/hotplug-vcpu/head
> 
> The original cover message from the entire series is below the
> diffstat.
> 
>  Documentation/arch/arm64/cpu-hotplug.rst   |  79 ++++++++++++++++
>  Documentation/arch/arm64/index.rst         |   1 +
>  arch/arm64/include/asm/acpi.h              |  11 +++
>  arch/arm64/kernel/acpi_numa.c              |  11 ---
>  arch/arm64/kernel/psci.c                   |   2 +-
>  arch/arm64/kernel/smp.c                    |   3 +-
>  arch/loongarch/Kconfig                     |   2 +-
>  arch/loongarch/configs/loongson3_defconfig |   2 +-
>  arch/loongarch/kernel/acpi.c               |   4 +-
>  arch/x86/Kconfig                           |   3 +-
>  arch/x86/kernel/acpi/boot.c                |   4 +-
>  drivers/acpi/Kconfig                       |  13 ++-
>  drivers/acpi/acpi_processor.c              | 141 ++++++++++++++++++++++++++---
>  drivers/acpi/bus.c                         |  16 ++++
>  drivers/acpi/device_pm.c                   |   2 +-
>  drivers/acpi/device_sysfs.c                |   2 +-
>  drivers/acpi/internal.h                    |   1 -
>  drivers/acpi/property.c                    |   2 +-
>  drivers/acpi/scan.c                        | 140 ++++++++++++++++++----------
>  drivers/base/cpu.c                         |  16 +++-
>  drivers/irqchip/irq-gic-v3.c               |  32 ++++---
>  include/acpi/acpi_bus.h                    |   1 +
>  include/acpi/actbl2.h                      |   1 +
>  include/linux/acpi.h                       |  10 +-
>  include/linux/cpumask.h                    |  25 +++++
>  kernel/cpu.c                               |   3 +
>  26 files changed, 421 insertions(+), 106 deletions(-)
> 
> On Tue, Oct 24, 2023 at 04:15:28PM +0100, Russell King (Oracle) wrote:
> > Hi,
> > 
> > I'm posting James' patch set updated with most of the review comments
> > from his RFC v2 series back in September. Individual patches have a
> > changelog attached at the bottom of the commit message. Those which
> > I have finished updating have my S-o-b on them, those which still have
> > outstanding review comments from RFC v2 do not. In some of these cases
> > I've asked questions and am waiting for responses.
> > 
> > I'm posting this as RFC v3 because there's still some unaddressed
> > comments and it's clearly not ready for merging. Even if it was ready
> > to be merged, it is too late in this development cycle to be taking
> > this change in, so there would be little point posting it non-RFC.
> > Also James stated that he's waiting for confirmation from the
> > Kubernetes/Kata folk - I have no idea what the status is there.
> > 
> > I will be sending each patch individually to a wider audience
> > appropriate for that patch - apologies to those missing out on this
> > cover message. I have added more mailing lists to the series with the
> > exception of the acpica list in a hope of this cover message also
> > reaching those folk.
> > 
> > The changes that aren't included are:
> > 
> > 1. Updates for my patch that was merged via Thomas (thanks!):
> >    c4dd854f740c cpu-hotplug: Provide prototypes for arch CPU registration
> >    rather than having this change spread through James' patches.
> > 
> > 2. New patch - simplification of PA-RISC's smp_prepare_boot_cpu()
> > 
> > 3. Moved "ACPI: Use the acpi_device_is_present() helper in more places"
> >    and "ACPI: Rename acpi_scan_device_not_present() to be about
> >    enumeration" to the beginning of the series - these two patches are
> >    already queued up for merging into 6.7.
> > 
> > 4. Moved "arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into
> >    a helper" to the beginning of the series, which has been submitted,
> >    but as yet the fate of that posting isn't known.
> > 
> > The first four patches in this series are provided for completness only.
> > 
> > There is an additional patch in James' git tree that isn't in the set
> > of patches that James posted: "ACPI: processor: Only call
> > arch_unregister_cpu() if HOTPLUG_CPU is selected" which looks to me to
> > be a workaround for arch_unregister_cpu() being under the ifdef. I've
> > commented on this on the RFC v2 posting making a suggestion, but as yet
> > haven't had any response.
> > 
> > I've included almost all of James' original covering body below the
> > diffstat.
> > 
> > The reason that I'm doing this is to help move this code forward so
> > hopefully it can be merged - which is why I have been keen to dig out
> > from James' patches anything that can be merged and submit it
> > separately, since this is a feature for which some users have a
> > definite need for.
> > 
> > Please note that I haven't tested this beyond building for aarch64 at
> > the present time.
> > 
> > The series can be found at:
> > 
> >  git://git.armlinux.org.uk/~rmk/linux-arm.git aarch64/hotplug-vcpu/v6.6-rc7
> > 
> >  Documentation/arch/arm64/cpu-hotplug.rst   |  79 +++++++++++++++
> >  Documentation/arch/arm64/index.rst         |   1 +
> >  arch/arm64/Kconfig                         |   1 +
> >  arch/arm64/include/asm/acpi.h              |  11 +++
> >  arch/arm64/include/asm/cpu.h               |   1 -
> >  arch/arm64/kernel/acpi_numa.c              |  11 ---
> >  arch/arm64/kernel/psci.c                   |   2 +-
> >  arch/arm64/kernel/setup.c                  |  13 +--
> >  arch/arm64/kernel/smp.c                    |   5 +-
> >  arch/ia64/Kconfig                          |   3 +
> >  arch/ia64/include/asm/acpi.h               |   2 +-
> >  arch/ia64/include/asm/cpu.h                |   6 --
> >  arch/ia64/kernel/acpi.c                    |   6 +-
> >  arch/ia64/kernel/setup.c                   |   2 +-
> >  arch/ia64/kernel/topology.c                |  35 +------
> >  arch/loongarch/Kconfig                     |   2 +
> >  arch/loongarch/configs/loongson3_defconfig |   2 +-
> >  arch/loongarch/kernel/acpi.c               |   4 +-
> >  arch/loongarch/kernel/topology.c           |  38 +-------
> >  arch/parisc/kernel/smp.c                   |   8 +-
> >  arch/riscv/Kconfig                         |   1 +
> >  arch/riscv/kernel/setup.c                  |  19 +---
> >  arch/x86/Kconfig                           |   3 +
> >  arch/x86/include/asm/cpu.h                 |   4 -
> >  arch/x86/kernel/acpi/boot.c                |   4 +-
> >  arch/x86/kernel/cpu/intel_epb.c            |   2 +-
> >  arch/x86/kernel/topology.c                 |  27 +-----
> >  drivers/acpi/Kconfig                       |  14 ++-
> >  drivers/acpi/acpi_processor.c              | 151 +++++++++++++++++++++++------
> >  drivers/acpi/bus.c                         |  16 +++
> >  drivers/acpi/device_pm.c                   |   2 +-
> >  drivers/acpi/device_sysfs.c                |   2 +-
> >  drivers/acpi/internal.h                    |   1 -
> >  drivers/acpi/processor_core.c              |   2 +-
> >  drivers/acpi/property.c                    |   2 +-
> >  drivers/acpi/scan.c                        | 148 ++++++++++++++++++----------
> >  drivers/base/arch_topology.c               |  38 +++++---
> >  drivers/base/cpu.c                         |  44 +++++++--
> >  drivers/base/init.c                        |   2 +-
> >  drivers/base/node.c                        |   7 --
> >  drivers/firmware/psci/psci.c               |   2 +
> >  drivers/irqchip/irq-gic-v3.c               |  38 +++++---
> >  include/acpi/acpi_bus.h                    |   1 +
> >  include/acpi/actbl2.h                      |   1 +
> >  include/linux/acpi.h                       |  13 ++-
> >  include/linux/cpu.h                        |   4 +
> >  include/linux/cpumask.h                    |  25 +++++
> >  kernel/cpu.c                               |   3 +
> >  48 files changed, 516 insertions(+), 292 deletions(-)
> > 
> > 
> > On Wed, Sep 13, 2023 at 04:37:48PM +0000, James Morse wrote:
> > > Hello!
> > > 
> > > Changes since RFC-v1:
> > >  * riscv is new, ia64 is gone
> > >  * The KVM support is different, and upstream - no need to patch the host.
> > > 
> > > ---
> > > 
> > > This series adds what looks like cpuhotplug support to arm64 for use in
> > > virtual machines. It does this by moving the cpu_register() calls for
> > > architectures that support ACPI out of the arch code by using
> > > GENERIC_CPU_DEVICES, then into the ACPI processor driver.
> > > 
> > > The kubernetes folk really want to be able to add CPUs to an existing VM,
> > > in exactly the same way they do on x86. The use-case is pre-booting guests
> > > with one CPU, then adding the number that were actually needed when the
> > > workload is provisioned.
> > > 
> > > Wait? Doesn't arm64 support cpuhotplug already!?
> > > In the arm world, cpuhotplug gets used to mean removing the power from a CPU.
> > > The CPU is offline, and remains present. For x86, and ACPI, cpuhotplug
> > > has the additional step of physically removing the CPU, so that it isn't
> > > present anymore.
> > > 
> > > Arm64 doesn't support this, and can't support it: CPUs are really a slice
> > > of the SoC, and there is not enough information in the existing ACPI tables
> > > to describe which bits of the slice also got removed. Without a reference
> > > machine: adding this support to the spec is a wild goose chase.
> > > 
> > > Critically: everything described in the firmware tables must remain present.
> > > 
> > > For a virtual machine this is easy as all the other bits of 'virtual SoC'
> > > are emulated, so they can (and do) remain present when a vCPU is 'removed'.
> > > 
> > > On a system that supports cpuhotplug the MADT has to describe every possible
> > > CPU at boot. Under KVM, the vGIC needs to know about every possible vCPU before
> > > the guest is started.
> > > With these constraints, virtual-cpuhotplug is really just a hypervisor/firmware
> > > policy about which CPUs can be brought online.
> > > 
> > > This series adds support for virtual-cpuhotplug as exactly that: firmware
> > > policy. This may even work on a physical machine too; for a guest the part of
> > > firmware is played by the VMM. (typically Qemu).
> > > 
> > > PSCI support is modified to return 'DENIED' if the CPU can't be brought
> > > online/enabled yet. The CPU object's _STA method's enabled bit is used to
> > > indicate firmware's current disposition. If the CPU has its enabled bit clear,
> > > it will not be registered with sysfs, and attempts to bring it online will
> > > fail. The notifications that _STA has changed its value then work in the same
> > > way as physical hotplug, and firmware can cause the CPU to be registered some
> > > time later, allowing it to be brought online.
> > > 
> > > This creates something that looks like cpuhotplug to user-space, as the sysfs
> > > files appear and disappear, and the udev notifications look the same.
> > > 
> > > One notable difference is the CPU present mask, which is exposed via sysfs.
> > > Because the CPUs remain present throughout, they can still be seen in that mask.
> > > This value does get used by webbrowsers to estimate the number of CPUs
> > > as the CPU online mask is constantly changed on mobile phones.
> > > 
> > > Linux is tolerant of PSCI returning errors, as its always been allowed to do
> > > that. To avoid confusing OS that can't tolerate this, we needed an additional
> > > bit in the MADT GICC flags. This series copies ACPI_MADT_ONLINE_CAPABLE, which
> > > appears to be for this purpose, but calls it ACPI_MADT_GICC_CPU_CAPABLE as it
> > > has a different bit position in the GICC.
> > > 
> > > This code is unconditionally enabled for all ACPI architectures.
> > > If there are problems with firmware tables on some devices, the CPUs will
> > > already be online by the time the acpi_processor_make_enabled() is called.
> > > A mismatch here causes a firmware-bug message and kernel taint. This should
> > > only affect people with broken firmware who also boot with maxcpus=1, and
> > > bring CPUs online later.
> > > 
> > > I had a go at switching the remaining architectures over to GENERIC_CPU_DEVICES,
> > > so that the Kconfig symbol can be removed, but I got stuck with powerpc
> > > and s390.
> > > 
> > > I've only build tested Loongarch and riscv. I've removed the ia64 specific
> > > patches, but left the changes in other patches to make git-grep review of
> > > renames easier.
> > > 
> > > If folk want to play along at home, you'll need a copy of Qemu that supports this.
> > > https://github.com/salil-mehta/qemu.git salil/virt-cpuhp-armv8/rfc-v2-rc6
> > > 
> > > Replace your '-smp' argument with something like:
> > > | -smp cpus=1,maxcpus=3,cores=3,threads=1,sockets=1
> > > 
> > > then feed the following to the Qemu montior;
> > > | (qemu) device_add driver=host-arm-cpu,core-id=1,id=cpu1
> > > | (qemu) device_del cpu1
> > > 
> > > 
> > > Why is this still an RFC? I'm still looking for confirmation from the
> > > kubernetes/kata folk that this works for them. Because of this I've culled
> > > the CC list...
> > > 
> > > 
> > > This series is based on v6.6-rc1, and can be retrieved from:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/ virtual_cpu_hotplug/rfc/v2
> > > 
> > > 
> > > Thanks,
> > > 
> > > James Morse (34):
> > >   ACPI: Move ACPI_HOTPLUG_CPU to be disabled on arm64 and riscv
> > >   drivers: base: Use present CPUs in GENERIC_CPU_DEVICES
> > >   drivers: base: Allow parts of GENERIC_CPU_DEVICES to be overridden
> > >   drivers: base: Move cpu_dev_init() after node_dev_init()
> > >   drivers: base: Print a warning instead of panic() when register_cpu()
> > >     fails
> > >   arm64: setup: Switch over to GENERIC_CPU_DEVICES using
> > >     arch_register_cpu()
> > >   x86: intel_epb: Don't rely on link order
> > >   x86/topology: Switch over to GENERIC_CPU_DEVICES
> > >   LoongArch: Switch over to GENERIC_CPU_DEVICES
> > >   riscv: Switch over to GENERIC_CPU_DEVICES
> > >   arch_topology: Make register_cpu_capacity_sysctl() tolerant to late
> > >     CPUs
> > >   ACPI: Use the acpi_device_is_present() helper in more places
> > >   ACPI: Rename acpi_scan_device_not_present() to be about enumeration
> > >   ACPI: Only enumerate enabled (or functional) devices
> > >   ACPI: processor: Add support for processors described as container
> > >     packages
> > >   ACPI: processor: Register CPUs that are online, but not described in
> > >     the DSDT
> > >   ACPI: processor: Register all CPUs from acpi_processor_get_info()
> > >   ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'
> > >   ACPI: Move acpi_bus_trim_one() before acpi_scan_hot_remove()
> > >   ACPI: Rename acpi_processor_hotadd_init and remove pre-processor
> > >     guards
> > >   ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug
> > >   ACPI: Check _STA present bit before making CPUs not present
> > >   ACPI: Warn when the present bit changes but the feature is not enabled
> > >   drivers: base: Implement weak arch_unregister_cpu()
> > >   LoongArch: Use the __weak version of arch_unregister_cpu()
> > >   arm64: acpi: Move get_cpu_for_acpi_id() to a header
> > >   ACPICA: Add new MADT GICC flags fields [code first?]
> > >   arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a
> > >     helper
> > >   irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()
> > >   irqchip/gic-v3: Add support for ACPI's disabled but 'online capable'
> > >     CPUs
> > >   ACPI: add support to register CPUs based on the _STA enabled bit
> > >   arm64: document virtual CPU hotplug's expectations
> > >   ACPI: Add _OSC bits to advertise OS support for toggling CPU
> > >     present/enabled
> > >   cpumask: Add enabled cpumask for present CPUs that can be brought
> > >     online
> > > 
> > > Jean-Philippe Brucker (1):
> > >   arm64: psci: Ignore DENIED CPUs
> > 
> > -- 
> > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
> > 
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 01/15] ACPI: Only enumerate enabled (or functional) processor devices
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
@ 2024-01-31 16:49 ` Russell King
  2024-01-31 17:25   ` Miguel Luis
                     ` (2 more replies)
  2024-01-31 16:49 ` [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info() Russell King
                   ` (13 subsequent siblings)
  14 siblings, 3 replies; 42+ messages in thread
From: Russell King @ 2024-01-31 16:49 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

From: James Morse <james.morse@arm.com>

Today the ACPI enumeration code 'visits' all devices that are present.

This is a problem for arm64, where CPUs are always present, but not
always enabled. When a device-check occurs because the firmware-policy
has changed and a CPU is now enabled, the following error occurs:
| acpi ACPI0007:48: Enumeration failure

This is ultimately because acpi_dev_ready_for_enumeration() returns
true for a device that is not enabled. The ACPI Processor driver
will not register such CPUs as they are not 'decoding their resources'.

ACPI allows a device to be functional instead of maintaining the
present and enabled bit, but we can't simply check the enabled bit
for all devices since firmware can be buggy.

If ACPI indicates that the device is present and enabled, then all well
and good, we can enumate it. However, if the device is present and not
enabled, then we also check whether the device is a processor device
to limit the impact of this new check to just processor devices.

This avoids enumerating present && functional processor devices that
are not enabled.

Signed-off-by: James Morse <james.morse@arm.com>
Co-developed-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
Changes since RFC v2:
 * Incorporate comment suggestion by Gavin Shan.
Changes since RFC v3:
 * Fixed "sert" typo.
Changes since RFC v3 (smaller series):
 * Restrict checking the enabled bit to processor devices, update
   commit comments.
 * Use Rafael's suggestion in
   https://lore.kernel.org/r/5760569.DvuYhMxLoT@kreacher
 * Updated with a fix - see:
   https://lore.kernel.org/all/Zbe8WQRASx6D6RaG@shell.armlinux.org.uk/
---
 drivers/acpi/acpi_processor.c | 11 +++++++++
 drivers/acpi/device_pm.c      |  2 +-
 drivers/acpi/device_sysfs.c   |  2 +-
 drivers/acpi/internal.h       |  4 ++-
 drivers/acpi/property.c       |  2 +-
 drivers/acpi/scan.c           | 46 +++++++++++++++++++++++++++--------
 6 files changed, 53 insertions(+), 14 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 4fe2ef54088c..cf7c1cca69dd 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -626,6 +626,17 @@ static struct acpi_scan_handler processor_handler = {
 	},
 };
 
+bool acpi_device_is_processor(const struct acpi_device *adev)
+{
+	if (adev->device_type == ACPI_BUS_TYPE_PROCESSOR)
+		return true;
+
+	if (adev->device_type != ACPI_BUS_TYPE_DEVICE)
+		return false;
+
+	return acpi_scan_check_handler(adev, &processor_handler);
+}
+
 static int acpi_processor_container_attach(struct acpi_device *dev,
 					   const struct acpi_device_id *id)
 {
diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
index 3b4d048c4941..e3c80f3b3b57 100644
--- a/drivers/acpi/device_pm.c
+++ b/drivers/acpi/device_pm.c
@@ -313,7 +313,7 @@ int acpi_bus_init_power(struct acpi_device *device)
 		return -EINVAL;
 
 	device->power.state = ACPI_STATE_UNKNOWN;
-	if (!acpi_device_is_present(device)) {
+	if (!acpi_dev_ready_for_enumeration(device)) {
 		device->flags.initialized = false;
 		return -ENXIO;
 	}
diff --git a/drivers/acpi/device_sysfs.c b/drivers/acpi/device_sysfs.c
index 23373faa35ec..a0256d2493a7 100644
--- a/drivers/acpi/device_sysfs.c
+++ b/drivers/acpi/device_sysfs.c
@@ -141,7 +141,7 @@ static int create_pnp_modalias(const struct acpi_device *acpi_dev, char *modalia
 	struct acpi_hardware_id *id;
 
 	/* Avoid unnecessarily loading modules for non present devices. */
-	if (!acpi_device_is_present(acpi_dev))
+	if (!acpi_dev_ready_for_enumeration(acpi_dev))
 		return 0;
 
 	/*
diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
index 6588525c45ef..1bc8b6db60c5 100644
--- a/drivers/acpi/internal.h
+++ b/drivers/acpi/internal.h
@@ -62,6 +62,8 @@ void acpi_sysfs_add_hotplug_profile(struct acpi_hotplug_profile *hotplug,
 int acpi_scan_add_handler_with_hotplug(struct acpi_scan_handler *handler,
 				       const char *hotplug_profile_name);
 void acpi_scan_hotplug_enabled(struct acpi_hotplug_profile *hotplug, bool val);
+bool acpi_scan_check_handler(const struct acpi_device *adev,
+			     struct acpi_scan_handler *handler);
 
 #ifdef CONFIG_DEBUG_FS
 extern struct dentry *acpi_debugfs_dir;
@@ -121,7 +123,6 @@ int acpi_device_setup_files(struct acpi_device *dev);
 void acpi_device_remove_files(struct acpi_device *dev);
 void acpi_device_add_finalize(struct acpi_device *device);
 void acpi_free_pnp_ids(struct acpi_device_pnp *pnp);
-bool acpi_device_is_present(const struct acpi_device *adev);
 bool acpi_device_is_battery(struct acpi_device *adev);
 bool acpi_device_is_first_physical_node(struct acpi_device *adev,
 					const struct device *dev);
@@ -133,6 +134,7 @@ int acpi_bus_register_early_device(int type);
 const struct acpi_device *acpi_companion_match(const struct device *dev);
 int __acpi_device_uevent_modalias(const struct acpi_device *adev,
 				  struct kobj_uevent_env *env);
+bool acpi_device_is_processor(const struct acpi_device *adev);
 
 /* --------------------------------------------------------------------------
                                   Power Resource
diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
index a6ead5204046..9f8d54038770 100644
--- a/drivers/acpi/property.c
+++ b/drivers/acpi/property.c
@@ -1486,7 +1486,7 @@ static bool acpi_fwnode_device_is_available(const struct fwnode_handle *fwnode)
 	if (!is_acpi_device_node(fwnode))
 		return false;
 
-	return acpi_device_is_present(to_acpi_device_node(fwnode));
+	return acpi_dev_ready_for_enumeration(to_acpi_device_node(fwnode));
 }
 
 static const void *
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index e6ed1ba91e5c..fd2e8b3a5749 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -304,7 +304,7 @@ static int acpi_scan_device_check(struct acpi_device *adev)
 	int error;
 
 	acpi_bus_get_status(adev);
-	if (acpi_device_is_present(adev)) {
+	if (acpi_dev_ready_for_enumeration(adev)) {
 		/*
 		 * This function is only called for device objects for which
 		 * matching scan handlers exist.  The only situation in which
@@ -338,7 +338,7 @@ static int acpi_scan_bus_check(struct acpi_device *adev, void *not_used)
 	int error;
 
 	acpi_bus_get_status(adev);
-	if (!acpi_device_is_present(adev)) {
+	if (!acpi_dev_ready_for_enumeration(adev)) {
 		acpi_scan_device_not_enumerated(adev);
 		return 0;
 	}
@@ -1917,11 +1917,6 @@ static bool acpi_device_should_be_hidden(acpi_handle handle)
 	return true;
 }
 
-bool acpi_device_is_present(const struct acpi_device *adev)
-{
-	return adev->status.present || adev->status.functional;
-}
-
 static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
 				       const char *idstr,
 				       const struct acpi_device_id **matchid)
@@ -1942,6 +1937,18 @@ static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
 	return false;
 }
 
+bool acpi_scan_check_handler(const struct acpi_device *adev,
+			     struct acpi_scan_handler *handler)
+{
+	struct acpi_hardware_id *hwid;
+
+	list_for_each_entry(hwid, &adev->pnp.ids, list)
+		if (acpi_scan_handler_matching(handler, hwid->id, NULL))
+			return true;
+
+	return false;
+}
+
 static struct acpi_scan_handler *acpi_scan_match_handler(const char *idstr,
 					const struct acpi_device_id **matchid)
 {
@@ -2405,16 +2412,35 @@ EXPORT_SYMBOL_GPL(acpi_dev_clear_dependencies);
  * acpi_dev_ready_for_enumeration - Check if the ACPI device is ready for enumeration
  * @device: Pointer to the &struct acpi_device to check
  *
- * Check if the device is present and has no unmet dependencies.
+ * Check if the device is functional or enabled and has no unmet dependencies.
  *
- * Return true if the device is ready for enumeratino. Otherwise, return false.
+ * Return true if the device is ready for enumeration. Otherwise, return false.
  */
 bool acpi_dev_ready_for_enumeration(const struct acpi_device *device)
 {
 	if (device->flags.honor_deps && device->dep_unmet)
 		return false;
 
-	return acpi_device_is_present(device);
+	/*
+	 * ACPI 6.5's 6.3.7 "_STA (Device Status)" allows firmware to return
+	 * (!present && functional) for certain types of devices that should be
+	 * enumerated. Note that the enabled bit should not be set unless the
+	 * present bit is set.
+	 *
+	 * However, limit this only to processor devices to reduce possible
+	 * regressions with firmware.
+	 */
+	if (!device->status.present)
+		return device->status.functional;
+
+	/*
+	 * Fast path - if enabled is set, avoid the more expensive test to
+	 * check whether this device is a processor.
+	 */
+	if (device->status.enabled)
+		return true;
+
+	return !acpi_device_is_processor(device);
 }
 EXPORT_SYMBOL_GPL(acpi_dev_ready_for_enumeration);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
  2024-01-31 16:49 ` [PATCH RFC v4 01/15] ACPI: Only enumerate enabled (or functional) processor devices Russell King
@ 2024-01-31 16:49 ` Russell King
  2024-02-15 19:22   ` Rafael J. Wysocki
  2024-01-31 16:49 ` [PATCH RFC v4 03/15] ACPI: Move acpi_bus_trim_one() before acpi_scan_hot_remove() Russell King
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Russell King @ 2024-01-31 16:49 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

From: James Morse <james.morse@arm.com>

To allow ACPI to skip the call to arch_register_cpu() when the _STA
value indicates the CPU can't be brought online right now, move the
arch_register_cpu() call into acpi_processor_get_info().

Systems can still be booted with 'acpi=off', or not include an
ACPI description at all. For these, the CPUs continue to be
registered by cpu_dev_register_generic().

This moves the CPU register logic back to a subsys_initcall(),
while the memory nodes will have been registered earlier.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Jianyong Wu <jianyong.wu@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
Changes since RFC v2:
 * Fixup comment in acpi_processor_get_info() (Gavin Shan)
 * Add comment in cpu_dev_register_generic() (Gavin Shan)
---
 drivers/acpi/acpi_processor.c | 12 ++++++++++++
 drivers/base/cpu.c            |  6 +++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index cf7c1cca69dd..a68c475cdea5 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -314,6 +314,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
 			cpufreq_add_device("acpi-cpufreq");
 	}
 
+	/*
+	 * Register CPUs that are present. get_cpu_device() is used to skip
+	 * duplicate CPU descriptions from firmware.
+	 */
+	if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
+	    !get_cpu_device(pr->id)) {
+		int ret = arch_register_cpu(pr->id);
+
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 *  Extra Processor objects may be enumerated on MP systems with
 	 *  less than the max # of CPUs. They should be ignored _iff
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 47de0f140ba6..13d052bf13f4 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -553,7 +553,11 @@ static void __init cpu_dev_register_generic(void)
 {
 	int i, ret;
 
-	if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
+	/*
+	 * When ACPI is enabled, CPUs are registered via
+	 * acpi_processor_get_info().
+	 */
+	if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES) || !acpi_disabled)
 		return;
 
 	for_each_present_cpu(i) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 03/15] ACPI: Move acpi_bus_trim_one() before acpi_scan_hot_remove()
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
  2024-01-31 16:49 ` [PATCH RFC v4 01/15] ACPI: Only enumerate enabled (or functional) processor devices Russell King
  2024-01-31 16:49 ` [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info() Russell King
@ 2024-01-31 16:49 ` Russell King
  2024-01-31 16:49 ` [PATCH RFC v4 04/15] ACPI: Rename acpi_processor_hotadd_init and remove pre-processor guards Russell King
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Russell King @ 2024-01-31 16:49 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

From: James Morse <james.morse@arm.com>

A subsequent patch will change acpi_scan_hot_remove() to call
acpi_bus_trim_one() instead of acpi_bus_trim(), meaning it can no longer
rely on the prototype in the header file.

Move these functions further up the file.
No change in behaviour.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 drivers/acpi/scan.c | 76 ++++++++++++++++++++++-----------------------
 1 file changed, 38 insertions(+), 38 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index fd2e8b3a5749..2c8ba4526278 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -244,6 +244,44 @@ static int acpi_scan_try_to_offline(struct acpi_device *device)
 	return 0;
 }
 
+static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
+{
+	struct acpi_scan_handler *handler = adev->handler;
+
+	acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, NULL);
+
+	adev->flags.match_driver = false;
+	if (handler) {
+		if (handler->detach)
+			handler->detach(adev);
+
+		adev->handler = NULL;
+	} else {
+		device_release_driver(&adev->dev);
+	}
+	/*
+	 * Most likely, the device is going away, so put it into D3cold before
+	 * that.
+	 */
+	acpi_device_set_power(adev, ACPI_STATE_D3_COLD);
+	adev->flags.initialized = false;
+	acpi_device_clear_enumerated(adev);
+
+	return 0;
+}
+
+/**
+ * acpi_bus_trim - Detach scan handlers and drivers from ACPI device objects.
+ * @adev: Root of the ACPI namespace scope to walk.
+ *
+ * Must be called under acpi_scan_lock.
+ */
+void acpi_bus_trim(struct acpi_device *adev)
+{
+	acpi_bus_trim_one(adev, NULL);
+}
+EXPORT_SYMBOL_GPL(acpi_bus_trim);
+
 static int acpi_scan_hot_remove(struct acpi_device *device)
 {
 	acpi_handle handle = device->handle;
@@ -2576,44 +2614,6 @@ int acpi_bus_scan(acpi_handle handle)
 }
 EXPORT_SYMBOL(acpi_bus_scan);
 
-static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
-{
-	struct acpi_scan_handler *handler = adev->handler;
-
-	acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, NULL);
-
-	adev->flags.match_driver = false;
-	if (handler) {
-		if (handler->detach)
-			handler->detach(adev);
-
-		adev->handler = NULL;
-	} else {
-		device_release_driver(&adev->dev);
-	}
-	/*
-	 * Most likely, the device is going away, so put it into D3cold before
-	 * that.
-	 */
-	acpi_device_set_power(adev, ACPI_STATE_D3_COLD);
-	adev->flags.initialized = false;
-	acpi_device_clear_enumerated(adev);
-
-	return 0;
-}
-
-/**
- * acpi_bus_trim - Detach scan handlers and drivers from ACPI device objects.
- * @adev: Root of the ACPI namespace scope to walk.
- *
- * Must be called under acpi_scan_lock.
- */
-void acpi_bus_trim(struct acpi_device *adev)
-{
-	acpi_bus_trim_one(adev, NULL);
-}
-EXPORT_SYMBOL_GPL(acpi_bus_trim);
-
 int acpi_bus_register_early_device(int type)
 {
 	struct acpi_device *device = NULL;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 04/15] ACPI: Rename acpi_processor_hotadd_init and remove pre-processor guards
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
                   ` (2 preceding siblings ...)
  2024-01-31 16:49 ` [PATCH RFC v4 03/15] ACPI: Move acpi_bus_trim_one() before acpi_scan_hot_remove() Russell King
@ 2024-01-31 16:49 ` Russell King
  2024-01-31 16:50 ` [PATCH RFC v4 05/15] ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug Russell King
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Russell King @ 2024-01-31 16:49 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

From: James Morse <james.morse@arm.com>

acpi_processor_hotadd_init() will make a CPU present by mapping it
based on its hardware id.

'hotadd_init' is ambiguous once there are two different behaviours
for cpu hotplug. This is for toggling the _STA present bit. Subsequent
patches will add support for toggling the _STA enabled bit, named
acpi_processor_make_enabled().

Rename it acpi_processor_make_present() to make it clear this is
for CPUs that were not previously present.

Expose the function prototypes it uses to allow the preprocessor
guards to be removed. The IS_ENABLED() check will let the compiler
dead-code elimination pass remove this if it isn't going to be
used.

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Jianyong Wu <jianyong.wu@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 drivers/acpi/acpi_processor.c | 14 +++++---------
 include/linux/acpi.h          |  2 --
 2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index a68c475cdea5..fd8f3a0572cb 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -183,13 +183,15 @@ static void __init acpi_pcc_cpufreq_init(void) {}
 #endif /* CONFIG_X86 */
 
 /* Initialization */
-#ifdef CONFIG_ACPI_HOTPLUG_CPU
-static int acpi_processor_hotadd_init(struct acpi_processor *pr)
+static int acpi_processor_make_present(struct acpi_processor *pr)
 {
 	unsigned long long sta;
 	acpi_status status;
 	int ret;
 
+	if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_CPU))
+		return -ENODEV;
+
 	if (invalid_phys_cpuid(pr->phys_id))
 		return -ENODEV;
 
@@ -223,12 +225,6 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
 	cpu_maps_update_done();
 	return ret;
 }
-#else
-static inline int acpi_processor_hotadd_init(struct acpi_processor *pr)
-{
-	return -ENODEV;
-}
-#endif /* CONFIG_ACPI_HOTPLUG_CPU */
 
 static int acpi_processor_get_info(struct acpi_device *device)
 {
@@ -335,7 +331,7 @@ static int acpi_processor_get_info(struct acpi_device *device)
 	 *  because cpuid <-> apicid mapping is persistent now.
 	 */
 	if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
-		int ret = acpi_processor_hotadd_init(pr);
+		int ret = acpi_processor_make_present(pr);
 
 		if (ret)
 			return ret;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index b7165e52b3c6..76ad43f7860b 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -302,12 +302,10 @@ static inline int acpi_processor_evaluate_cst(acpi_handle handle, u32 cpu,
 }
 #endif
 
-#ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Arch dependent functions for cpu hotplug support */
 int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 acpi_id,
 		 int *pcpu);
 int acpi_unmap_cpu(int cpu);
-#endif /* CONFIG_ACPI_HOTPLUG_CPU */
 
 #ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
 int acpi_get_ioapic_id(acpi_handle handle, u32 gsi_base, u64 *phys_addr);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 05/15] ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
                   ` (3 preceding siblings ...)
  2024-01-31 16:49 ` [PATCH RFC v4 04/15] ACPI: Rename acpi_processor_hotadd_init and remove pre-processor guards Russell King
@ 2024-01-31 16:50 ` Russell King
  2024-01-31 16:50 ` [PATCH RFC v4 06/15] ACPI: convert acpi_processor_post_eject() to use IS_ENABLED() Russell King (Oracle)
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Russell King @ 2024-01-31 16:50 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

From: James Morse <james.morse@arm.com>

struct acpi_scan_handler has a detach callback that is used to remove
a driver when a bus is changed. When interacting with an eject-request,
the detach callback is called before _EJ0.

This means the ACPI processor driver can't use _STA to determine if a
CPU has been made not-present, or some of the other _STA bits have been
changed. acpi_processor_remove() needs to know the value of _STA after
_EJ0 has been called.

Add a post_eject callback to struct acpi_scan_handler. This is called
after acpi_scan_hot_remove() has successfully called _EJ0. Because
acpi_bus_trim_one() also clears the handler pointer, it needs to be
told if the caller will go on to call acpi_bus_post_eject(), so
that acpi_device_clear_enumerated() and clearing the handler pointer
can be deferred. The existing not-used pointer is used for this.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Jianyong Wu <jianyong.wu@arm.com>
---
Outstanding comments:
 https://lore.kernel.org/r/20230914152809.00003152@Huawei.com
 https://lore.kernel.org/r/c3ef8123-1fcc-7289-c475-c753de44d564@redhat.com
---
 drivers/acpi/acpi_processor.c |  4 +--
 drivers/acpi/scan.c           | 52 ++++++++++++++++++++++++++++++-----
 include/acpi/acpi_bus.h       |  1 +
 3 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index fd8f3a0572cb..00809c3b09f7 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -459,7 +459,7 @@ static int acpi_processor_add(struct acpi_device *device,
 
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Removal */
-static void acpi_processor_remove(struct acpi_device *device)
+static void acpi_processor_post_eject(struct acpi_device *device)
 {
 	struct acpi_processor *pr;
 
@@ -627,7 +627,7 @@ static struct acpi_scan_handler processor_handler = {
 	.ids = processor_device_ids,
 	.attach = acpi_processor_add,
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
-	.detach = acpi_processor_remove,
+	.post_eject = acpi_processor_post_eject,
 #endif
 	.hotplug = {
 		.enabled = true,
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 2c8ba4526278..e4f4542f29af 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -244,18 +244,28 @@ static int acpi_scan_try_to_offline(struct acpi_device *device)
 	return 0;
 }
 
-static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
+/**
+ * acpi_bus_trim_one() - Detach scan handlers and drivers from ACPI device
+ *                       objects.
+ * @adev:       Root of the ACPI namespace scope to walk.
+ * @eject:      Pointer to a bool that indicates if this was due to an
+ *              eject-request.
+ *
+ * Must be called under acpi_scan_lock.
+ * If @eject points to true, clearing the device enumeration is deferred until
+ * acpi_bus_post_eject() is called.
+ */
+static int acpi_bus_trim_one(struct acpi_device *adev, void *eject)
 {
 	struct acpi_scan_handler *handler = adev->handler;
+	bool is_eject = *(bool *)eject;
 
-	acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, NULL);
+	acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, eject);
 
 	adev->flags.match_driver = false;
 	if (handler) {
 		if (handler->detach)
 			handler->detach(adev);
-
-		adev->handler = NULL;
 	} else {
 		device_release_driver(&adev->dev);
 	}
@@ -265,7 +275,12 @@ static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
 	 */
 	acpi_device_set_power(adev, ACPI_STATE_D3_COLD);
 	adev->flags.initialized = false;
-	acpi_device_clear_enumerated(adev);
+
+	/* For eject this is deferred to acpi_bus_post_eject() */
+	if (!is_eject) {
+		adev->handler = NULL;
+		acpi_device_clear_enumerated(adev);
+	}
 
 	return 0;
 }
@@ -278,15 +293,36 @@ static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
  */
 void acpi_bus_trim(struct acpi_device *adev)
 {
-	acpi_bus_trim_one(adev, NULL);
+	bool eject = false;
+
+	acpi_bus_trim_one(adev, &eject);
 }
 EXPORT_SYMBOL_GPL(acpi_bus_trim);
 
+static int acpi_bus_post_eject(struct acpi_device *adev, void *not_used)
+{
+	struct acpi_scan_handler *handler = adev->handler;
+
+	acpi_dev_for_each_child_reverse(adev, acpi_bus_post_eject, NULL);
+
+	if (handler) {
+		if (handler->post_eject)
+			handler->post_eject(adev);
+
+		adev->handler = NULL;
+	}
+
+	acpi_device_clear_enumerated(adev);
+
+	return 0;
+}
+
 static int acpi_scan_hot_remove(struct acpi_device *device)
 {
 	acpi_handle handle = device->handle;
 	unsigned long long sta;
 	acpi_status status;
+	bool eject = true;
 
 	if (device->handler && device->handler->hotplug.demand_offline) {
 		if (!acpi_scan_is_offline(device, true))
@@ -299,7 +335,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
 
 	acpi_handle_debug(handle, "Ejecting\n");
 
-	acpi_bus_trim(device);
+	acpi_bus_trim_one(device, &eject);
 
 	acpi_evaluate_lck(handle, 0);
 	/*
@@ -322,6 +358,8 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
 	} else if (sta & ACPI_STA_DEVICE_ENABLED) {
 		acpi_handle_warn(handle,
 			"Eject incomplete - status 0x%llx\n", sta);
+	} else {
+		acpi_bus_post_eject(device, NULL);
 	}
 
 	return 0;
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index e4d24d3f9abb..436e7e022e4e 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -129,6 +129,7 @@ struct acpi_scan_handler {
 	bool (*match)(const char *idstr, const struct acpi_device_id **matchid);
 	int (*attach)(struct acpi_device *dev, const struct acpi_device_id *id);
 	void (*detach)(struct acpi_device *dev);
+	void (*post_eject)(struct acpi_device *dev);
 	void (*bind)(struct device *phys_dev);
 	void (*unbind)(struct device *phys_dev);
 	struct acpi_hotplug_profile hotplug;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 06/15] ACPI: convert acpi_processor_post_eject() to use IS_ENABLED()
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
                   ` (4 preceding siblings ...)
  2024-01-31 16:50 ` [PATCH RFC v4 05/15] ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug Russell King
@ 2024-01-31 16:50 ` Russell King (Oracle)
  2024-01-31 16:50 ` [PATCH RFC v4 07/15] ACPI: Check _STA present bit before making CPUs not present Russell King
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Russell King (Oracle) @ 2024-01-31 16:50 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

Rather than ifdef'ing acpi_processor_post_eject() and its use site, use
IS_ENABLED() to increase compile coverage.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 drivers/acpi/acpi_processor.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 00809c3b09f7..5351095b6968 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -457,12 +457,14 @@ static int acpi_processor_add(struct acpi_device *device,
 	return result;
 }
 
-#ifdef CONFIG_ACPI_HOTPLUG_CPU
 /* Removal */
 static void acpi_processor_post_eject(struct acpi_device *device)
 {
 	struct acpi_processor *pr;
 
+	if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_CPU))
+		return;
+
 	if (!device || !acpi_driver_data(device))
 		return;
 
@@ -501,7 +503,6 @@ static void acpi_processor_post_eject(struct acpi_device *device)
 	free_cpumask_var(pr->throttling.shared_cpu_map);
 	kfree(pr);
 }
-#endif /* CONFIG_ACPI_HOTPLUG_CPU */
 
 #ifdef CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC
 bool __init processor_physically_present(acpi_handle handle)
@@ -626,9 +627,7 @@ static const struct acpi_device_id processor_device_ids[] = {
 static struct acpi_scan_handler processor_handler = {
 	.ids = processor_device_ids,
 	.attach = acpi_processor_add,
-#ifdef CONFIG_ACPI_HOTPLUG_CPU
 	.post_eject = acpi_processor_post_eject,
-#endif
 	.hotplug = {
 		.enabled = true,
 	},
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 07/15] ACPI: Check _STA present bit before making CPUs not present
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
                   ` (5 preceding siblings ...)
  2024-01-31 16:50 ` [PATCH RFC v4 06/15] ACPI: convert acpi_processor_post_eject() to use IS_ENABLED() Russell King (Oracle)
@ 2024-01-31 16:50 ` Russell King
  2024-01-31 16:50 ` [PATCH RFC v4 08/15] ACPI: Warn when the present bit changes but the feature is not enabled Russell King
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Russell King @ 2024-01-31 16:50 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

From: James Morse <james.morse@arm.com>

When called acpi_processor_post_eject() unconditionally make a CPU
not-present and unregisters it.

To add support for AML events where the CPU has become disabled, but
remains present, the _STA method should be checked before calling
acpi_processor_remove().

Rename acpi_processor_post_eject() acpi_processor_remove_possible(), and
check the _STA before calling.

Adding the function prototype for arch_unregister_cpu() allows the
preprocessor guards to be removed.

After this change CPUs will remain registered and visible to
user-space as offline if buggy firmware triggers an eject-request,
but doesn't clear the corresponding _STA bits after _EJ0 has been
called.

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Jianyong Wu <jianyong.wu@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
Changes since RFC v3:
 * Move IS_ENABLED(CONFIG_ACPI_HOTPLUG_PRESENT_CPU) into separate patch.
---
 drivers/acpi/acpi_processor.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 5351095b6968..77d47e6c2474 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -458,16 +458,13 @@ static int acpi_processor_add(struct acpi_device *device,
 }
 
 /* Removal */
-static void acpi_processor_post_eject(struct acpi_device *device)
+static void acpi_processor_make_not_present(struct acpi_device *device)
 {
 	struct acpi_processor *pr;
 
 	if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_CPU))
 		return;
 
-	if (!device || !acpi_driver_data(device))
-		return;
-
 	pr = acpi_driver_data(device);
 	if (pr->id >= nr_cpu_ids)
 		goto out;
@@ -504,6 +501,29 @@ static void acpi_processor_post_eject(struct acpi_device *device)
 	kfree(pr);
 }
 
+static void acpi_processor_post_eject(struct acpi_device *device)
+{
+	struct acpi_processor *pr;
+	unsigned long long sta;
+	acpi_status status;
+
+	if (!device)
+		return;
+
+	pr = acpi_driver_data(device);
+	if (!pr || pr->id >= nr_cpu_ids || invalid_phys_cpuid(pr->phys_id))
+		return;
+
+	status = acpi_evaluate_integer(pr->handle, "_STA", NULL, &sta);
+	if (ACPI_FAILURE(status))
+		return;
+
+	if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_PRESENT)) {
+		acpi_processor_make_not_present(device);
+		return;
+	}
+}
+
 #ifdef CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC
 bool __init processor_physically_present(acpi_handle handle)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 08/15] ACPI: Warn when the present bit changes but the feature is not enabled
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
                   ` (6 preceding siblings ...)
  2024-01-31 16:50 ` [PATCH RFC v4 07/15] ACPI: Check _STA present bit before making CPUs not present Russell King
@ 2024-01-31 16:50 ` Russell King
  2024-01-31 16:50 ` [PATCH RFC v4 09/15] arm64: acpi: Move get_cpu_for_acpi_id() to a header Russell King
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Russell King @ 2024-01-31 16:50 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

From: James Morse <james.morse@arm.com>

ACPI firmware can trigger the events to add and remove CPUs, but the
OS may not support this.

Print an error message when this happens.

This gives early warning on arm64 systems that don't support
CONFIG_ACPI_HOTPLUG_PRESENT_CPU, as making CPUs not present has
side effects for other parts of the system.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
Changes since RFC v2:
 * Update commit message with suggestion from Gavin Shan
---
 drivers/acpi/acpi_processor.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 77d47e6c2474..d1d33e74216c 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -189,8 +189,10 @@ static int acpi_processor_make_present(struct acpi_processor *pr)
 	acpi_status status;
 	int ret;
 
-	if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_CPU))
+	if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_CPU)) {
+		pr_err_once("Changing CPU present bit is not supported\n");
 		return -ENODEV;
+	}
 
 	if (invalid_phys_cpuid(pr->phys_id))
 		return -ENODEV;
@@ -462,8 +464,10 @@ static void acpi_processor_make_not_present(struct acpi_device *device)
 {
 	struct acpi_processor *pr;
 
-	if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_CPU))
+	if (!IS_ENABLED(CONFIG_ACPI_HOTPLUG_CPU)) {
+		pr_err_once("Changing CPU present bit is not supported");
 		return;
+	}
 
 	pr = acpi_driver_data(device);
 	if (pr->id >= nr_cpu_ids)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 09/15] arm64: acpi: Move get_cpu_for_acpi_id() to a header
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
                   ` (7 preceding siblings ...)
  2024-01-31 16:50 ` [PATCH RFC v4 08/15] ACPI: Warn when the present bit changes but the feature is not enabled Russell King
@ 2024-01-31 16:50 ` Russell King
  2024-01-31 16:50 ` [PATCH RFC v4 10/15] irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc() Russell King
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 42+ messages in thread
From: Russell King @ 2024-01-31 16:50 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

From: James Morse <james.morse@arm.com>

ACPI identifies CPUs by UID. get_cpu_for_acpi_id() maps the ACPI UID
to the linux CPU number.

The helper to retrieve this mapping is only available in arm64's numa
code.

Move it to live next to get_acpi_id_for_cpu().

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 arch/arm64/include/asm/acpi.h | 11 +++++++++++
 arch/arm64/kernel/acpi_numa.c | 11 -----------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 6792a1f83f2a..bc9a6656fc0c 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -119,6 +119,17 @@ static inline u32 get_acpi_id_for_cpu(unsigned int cpu)
 	return	acpi_cpu_get_madt_gicc(cpu)->uid;
 }
 
+static inline int get_cpu_for_acpi_id(u32 uid)
+{
+	int cpu;
+
+	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
+		if (uid == get_acpi_id_for_cpu(cpu))
+			return cpu;
+
+	return -EINVAL;
+}
+
 static inline void arch_fix_phys_package_id(int num, u32 slot) { }
 void __init acpi_init_cpus(void);
 int apei_claim_sea(struct pt_regs *regs);
diff --git a/arch/arm64/kernel/acpi_numa.c b/arch/arm64/kernel/acpi_numa.c
index e51535a5f939..0c036a9a3c33 100644
--- a/arch/arm64/kernel/acpi_numa.c
+++ b/arch/arm64/kernel/acpi_numa.c
@@ -34,17 +34,6 @@ int __init acpi_numa_get_nid(unsigned int cpu)
 	return acpi_early_node_map[cpu];
 }
 
-static inline int get_cpu_for_acpi_id(u32 uid)
-{
-	int cpu;
-
-	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
-		if (uid == get_acpi_id_for_cpu(cpu))
-			return cpu;
-
-	return -EINVAL;
-}
-
 static int __init acpi_parse_gicc_pxm(union acpi_subtable_headers *header,
 				      const unsigned long end)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 10/15] irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
                   ` (8 preceding siblings ...)
  2024-01-31 16:50 ` [PATCH RFC v4 09/15] arm64: acpi: Move get_cpu_for_acpi_id() to a header Russell King
@ 2024-01-31 16:50 ` Russell King
  2024-02-02 16:44   ` Jonathan Cameron
  2024-01-31 16:50 ` [PATCH RFC v4 11/15] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs Russell King
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Russell King @ 2024-01-31 16:50 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

From: James Morse <james.morse@arm.com>

gic_acpi_match_gicc() is only called via gic_acpi_count_gicr_regions().
It should only count the number of enabled redistributors, but it
also tries to sanity check the GICC entry, currently returning an
error if the Enabled bit is set, but the gicr_base_address is zero.

Adding support for the online-capable bit to the sanity check will
complicate it, for no benefit. The existing check implicitly depends on
gic_acpi_count_gicr_regions() previous failing to find any GICR regions
(as it is valid to have gicr_base_address of zero if the redistributors
are described via a GICR entry).

Instead of complicating the check, remove it. Failures that happen at
this point cause the irqchip not to register, meaning no irqs can be
requested. The kernel grinds to a panic() pretty quickly.

Without the check, MADT tables that exhibit this problem are still
caught by gic_populate_rdist(), which helpfully also prints what went
wrong:
| CPU4: mpidr 100 has no re-distributor!

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
Changes since RFC v3 (smaller series):
 * Move acpi_gicc_is_usable() change in gic_acpi_match_gicc() into
   following patch so this is only about the error return.
 * Tweak commit message
---
 drivers/irqchip/irq-gic-v3.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 98b0329b7154..8deb671c4ff7 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -2422,19 +2422,10 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
 	 * If GICC is enabled and has valid gicr base address, then it means
 	 * GICR base is presented via GICC
 	 */
-	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address) {
+	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address)
 		acpi_data.enabled_rdists++;
-		return 0;
-	}
 
-	/*
-	 * It's perfectly valid firmware can pass disabled GICC entry, driver
-	 * should not treat as errors, skip the entry instead of probe fail.
-	 */
-	if (!acpi_gicc_is_usable(gicc))
-		return 0;
-
-	return -ENODEV;
+	return 0;
 }
 
 static int __init gic_acpi_count_gicr_regions(void)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 11/15] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
                   ` (9 preceding siblings ...)
  2024-01-31 16:50 ` [PATCH RFC v4 10/15] irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc() Russell King
@ 2024-01-31 16:50 ` Russell King
  2024-02-02 16:47   ` Jonathan Cameron
  2024-01-31 16:50 ` [PATCH RFC v4 12/15] arm64: psci: Ignore DENIED CPUs Russell King
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Russell King @ 2024-01-31 16:50 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

From: James Morse <james.morse@arm.com>

To support virtual CPU hotplug, ACPI has added an 'online capable' bit
to the MADT GICC entries. This indicates a disabled CPU entry may not
be possible to online via PSCI until firmware has set enabled bit in
_STA.

This means that a "usable" GIC is one that is marked as either enabled,
or online capable. Therefore, change acpi_gicc_is_usable() to check both
bits. However, we need to change the test in gic_acpi_match_gicc() back
to testing just the enabled bit so the count of enabled distributors is
correct.

What about the redistributor in the GICC entry? ACPI doesn't want to say.
Assume the worst: When a redistributor is described in the GICC entry,
but the entry is marked as disabled at boot, assume the redistributor
is inaccessible.

The GICv3 driver doesn't support late online of redistributors, so this
means the corresponding CPU can't be brought online either. Clear the
possible and present bits.

Systems that want CPU hotplug in a VM can ensure their redistributors
are always-on, and describe them that way with a GICR entry in the MADT.

When mapping redistributors found via GICC entries, handle the case
where the arch code believes the CPU is present and possible, but it
does not have an accessible redistributor. Print a warning and clear
the present and possible bits.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
----
Disabled but online-capable CPUs cause this message to be printed
if their redistributors are described via GICC:
| GICv3: CPU 3's redistributor is inaccessible: this CPU can't be brought online

If ACPI's _STA tries to make the cpu present later, this message is printed:
| Changing CPU present bit is not supported

Changes since RFC v2:
 * use gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_CPU_CAPABLE)
Changes since RFC v3 (smaller series):
 * Update ACPI_MADT_GICC_CPU_CAPABLE to ACPI_MADT_GICC_ONLINE_CAPABLE as
   per Lorenzo's ACPICA pull request.
 * Move acpi_gicc_is_usable() change in gic_acpi_match_gicc() into this
   patch from the previous one, and update the commit message to
   describe this change.
---
 drivers/irqchip/irq-gic-v3.c | 21 +++++++++++++++++++--
 include/linux/acpi.h         |  3 ++-
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 8deb671c4ff7..6d0f98d3540e 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -2370,11 +2370,25 @@ gic_acpi_parse_madt_gicc(union acpi_subtable_headers *header,
 				(struct acpi_madt_generic_interrupt *)header;
 	u32 reg = readl_relaxed(acpi_data.dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK;
 	u32 size = reg == GIC_PIDR2_ARCH_GICv4 ? SZ_64K * 4 : SZ_64K * 2;
+	int cpu = get_cpu_for_acpi_id(gicc->uid);
 	void __iomem *redist_base;
 
 	if (!acpi_gicc_is_usable(gicc))
 		return 0;
 
+	/*
+	 * Capable but disabled CPUs can be brought online later. What about
+	 * the redistributor? ACPI doesn't want to say!
+	 * Virtual hotplug systems can use the MADT's "always-on" GICR entries.
+	 * Otherwise, prevent such CPUs from being brought online.
+	 */
+	if (!(gicc->flags & ACPI_MADT_ENABLED)) {
+		pr_warn_once("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu);
+		set_cpu_present(cpu, false);
+		set_cpu_possible(cpu, false);
+		return 0;
+	}
+
 	redist_base = ioremap(gicc->gicr_base_address, size);
 	if (!redist_base)
 		return -ENOMEM;
@@ -2420,9 +2434,12 @@ static int __init gic_acpi_match_gicc(union acpi_subtable_headers *header,
 
 	/*
 	 * If GICC is enabled and has valid gicr base address, then it means
-	 * GICR base is presented via GICC
+	 * GICR base is presented via GICC. The redistributor is only known to
+	 * be accessible if the GICC is marked as enabled. If this bit is not
+	 * set, we'd need to add the redistributor at runtime, which isn't
+	 * supported.
 	 */
-	if (acpi_gicc_is_usable(gicc) && gicc->gicr_base_address)
+	if (gicc->flags & ACPI_MADT_ENABLED && gicc->gicr_base_address)
 		acpi_data.enabled_rdists++;
 
 	return 0;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 76ad43f7860b..cf018d42e632 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -239,7 +239,8 @@ void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
 
 static inline bool acpi_gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
 {
-	return gicc->flags & ACPI_MADT_ENABLED;
+	return gicc->flags & (ACPI_MADT_ENABLED |
+			      ACPI_MADT_GICC_ONLINE_CAPABLE);
 }
 
 /* the following numa functions are architecture-dependent */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 12/15] arm64: psci: Ignore DENIED CPUs
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
                   ` (10 preceding siblings ...)
  2024-01-31 16:50 ` [PATCH RFC v4 11/15] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs Russell King
@ 2024-01-31 16:50 ` Russell King
  2024-04-11 11:35   ` Jonathan Cameron
  2024-01-31 16:50 ` [PATCH RFC v4 13/15] ACPI: add support to (un)register CPUs based on the _STA enabled bit Russell King
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 42+ messages in thread
From: Russell King @ 2024-01-31 16:50 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

From: Jean-Philippe Brucker <jean-philippe@linaro.org>

When a CPU is marked as disabled, but online capable in the MADT, PSCI
applies some firmware policy to control when it can be brought online.
PSCI returns DENIED to a CPU_ON request if this is not currently
permitted. The OS can learn the current policy from the _STA enabled bit.

Handle the PSCI DENIED return code gracefully instead of printing an
error.

See https://developer.arm.com/documentation/den0022/f/?lang=en page 58.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
[ morse: Rewrote commit message ]
Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Jianyong Wu <jianyong.wu@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
Changes since RFC v2
 * Add specification reference
 * Use EPERM rather than EPROBE_DEFER
Changes since RFC v3:
 * Use EPERM everywhere
 * Drop unnecessary changes to drivers/firmware/psci/psci.c
---
 arch/arm64/kernel/psci.c | 2 +-
 arch/arm64/kernel/smp.c  | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
index 29a8e444db83..fabd732d0a2d 100644
--- a/arch/arm64/kernel/psci.c
+++ b/arch/arm64/kernel/psci.c
@@ -40,7 +40,7 @@ static int cpu_psci_cpu_boot(unsigned int cpu)
 {
 	phys_addr_t pa_secondary_entry = __pa_symbol(secondary_entry);
 	int err = psci_ops.cpu_on(cpu_logical_map(cpu), pa_secondary_entry);
-	if (err)
+	if (err && err != -EPERM)
 		pr_err("failed to boot CPU%d (%d)\n", cpu, err);
 
 	return err;
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 4ced34f62dab..dc0e0b3ec2d4 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -132,7 +132,8 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 	/* Now bring the CPU into our world */
 	ret = boot_secondary(cpu, idle);
 	if (ret) {
-		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
+		if (ret != -EPERM)
+			pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
 		return ret;
 	}
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 13/15] ACPI: add support to (un)register CPUs based on the _STA enabled bit
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
                   ` (11 preceding siblings ...)
  2024-01-31 16:50 ` [PATCH RFC v4 12/15] arm64: psci: Ignore DENIED CPUs Russell King
@ 2024-01-31 16:50 ` Russell King
  2024-01-31 16:50 ` [PATCH RFC v4 14/15] arm64: document virtual CPU hotplug's expectations Russell King
  2024-01-31 16:50 ` [PATCH RFC v4 15/15] cpumask: Add enabled cpumask for present CPUs that can be brought online Russell King
  14 siblings, 0 replies; 42+ messages in thread
From: Russell King @ 2024-01-31 16:50 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

From: James Morse <james.morse@arm.com>

acpi_processor_get_info() registers all present CPUs. Registering a
CPU is what creates the sysfs entries and triggers the udev
notifications.

arm64 virtual machines that support 'virtual cpu hotplug' use the
enabled bit to indicate whether the CPU can be brought online, as
the existing ACPI tables require all hardware to be described and
present.

If firmware describes a CPU as present, but disabled, skip the
registration. Such CPUs are present, but can't be brought online for
whatever reason. (e.g. firmware/hypervisor policy).

Once firmware sets the enabled bit, the CPU can be registered and
brought online by user-space. Online CPUs, or CPUs that are missing
an _STA method must always be registered.

When firmware clears the enabled bit, we need to unregister the CPU
for symetry. As this is dependent on hotplug CPU being support, and
arch_unregister_cpu() only exists when hotplug CPU is supported,
we need to add a check for that configuration symbol.

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
Changes since RFC v3 (smaller series):
 * Squash "ACPI: processor: Only call arch_unregister_cpu() if
   HOTPLUG_CPU is selected" into this patch.
---
 drivers/acpi/acpi_processor.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index d1d33e74216c..5e641180c45a 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -228,6 +228,32 @@ static int acpi_processor_make_present(struct acpi_processor *pr)
 	return ret;
 }
 
+static int acpi_processor_make_enabled(struct acpi_processor *pr)
+{
+	unsigned long long sta;
+	acpi_status status;
+	bool present, enabled;
+
+	if (!acpi_has_method(pr->handle, "_STA"))
+		return arch_register_cpu(pr->id);
+
+	status = acpi_evaluate_integer(pr->handle, "_STA", NULL, &sta);
+	if (ACPI_FAILURE(status))
+		return -ENODEV;
+
+	present = sta & ACPI_STA_DEVICE_PRESENT;
+	enabled = sta & ACPI_STA_DEVICE_ENABLED;
+
+	if (cpu_online(pr->id) && (!present || !enabled)) {
+		pr_err_once(FW_BUG "CPU %u is online, but described as not present or disabled!\n", pr->id);
+		add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
+	} else if (!present || !enabled) {
+		return -ENODEV;
+	}
+
+	return arch_register_cpu(pr->id);
+}
+
 static int acpi_processor_get_info(struct acpi_device *device)
 {
 	union acpi_object object = { 0 };
@@ -318,7 +344,7 @@ static int acpi_processor_get_info(struct acpi_device *device)
 	 */
 	if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
 	    !get_cpu_device(pr->id)) {
-		int ret = arch_register_cpu(pr->id);
+		int ret = acpi_processor_make_enabled(pr);
 
 		if (ret)
 			return ret;
@@ -511,7 +537,7 @@ static void acpi_processor_post_eject(struct acpi_device *device)
 	unsigned long long sta;
 	acpi_status status;
 
-	if (!device)
+	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU) || !device)
 		return;
 
 	pr = acpi_driver_data(device);
@@ -526,6 +552,9 @@ static void acpi_processor_post_eject(struct acpi_device *device)
 		acpi_processor_make_not_present(device);
 		return;
 	}
+
+	if (cpu_present(pr->id) && !(sta & ACPI_STA_DEVICE_ENABLED))
+		arch_unregister_cpu(pr->id);
 }
 
 #ifdef CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 14/15] arm64: document virtual CPU hotplug's expectations
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
                   ` (12 preceding siblings ...)
  2024-01-31 16:50 ` [PATCH RFC v4 13/15] ACPI: add support to (un)register CPUs based on the _STA enabled bit Russell King
@ 2024-01-31 16:50 ` Russell King
  2024-02-02 17:04   ` Jonathan Cameron
  2024-01-31 16:50 ` [PATCH RFC v4 15/15] cpumask: Add enabled cpumask for present CPUs that can be brought online Russell King
  14 siblings, 1 reply; 42+ messages in thread
From: Russell King @ 2024-01-31 16:50 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

From: James Morse <james.morse@arm.com>

Add a description of physical and virtual CPU hotplug, explain the
differences and elaborate on what is required in ACPI for a working
virtual hotplug system.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
Outstanding comment:
 https://lore.kernel.org/r/20230914174137.00000a62@Huawei.com
 https://lore.kernel.org/r/20231215170428.00000d81@Huawei.com
---
 Documentation/arch/arm64/cpu-hotplug.rst | 79 ++++++++++++++++++++++++
 Documentation/arch/arm64/index.rst       |  1 +
 2 files changed, 80 insertions(+)
 create mode 100644 Documentation/arch/arm64/cpu-hotplug.rst

diff --git a/Documentation/arch/arm64/cpu-hotplug.rst b/Documentation/arch/arm64/cpu-hotplug.rst
new file mode 100644
index 000000000000..76ba8d932c72
--- /dev/null
+++ b/Documentation/arch/arm64/cpu-hotplug.rst
@@ -0,0 +1,79 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. _cpuhp_index:
+
+====================
+CPU Hotplug and ACPI
+====================
+
+CPU hotplug in the arm64 world is commonly used to describe the kernel taking
+CPUs online/offline using PSCI. This document is about ACPI firmware allowing
+CPUs that were not available during boot to be added to the system later.
+
+``possible`` and ``present`` refer to the state of the CPU as seen by linux.
+
+
+CPU Hotplug on physical systems - CPUs not present at boot
+----------------------------------------------------------
+
+Physical systems need to mark a CPU that is ``possible`` but not ``present`` as
+being ``present``. An example would be a dual socket machine, where the package
+in one of the sockets can be replaced while the system is running.
+
+This is not supported.
+
+In the arm64 world CPUs are not a single device but a slice of the system.
+There are no systems that support the physical addition (or removal) of CPUs
+while the system is running, and ACPI is not able to sufficiently describe
+them.
+
+e.g. New CPUs come with new caches, but the platform's cache toplogy is
+described in a static table, the PPTT. How caches are shared between CPUs is
+not discoverable, and must be described by firmware.
+
+e.g. The GIC redistributor for each CPU must be accessed by the driver during
+boot to discover the system wide supported features. ACPI's MADT GICC
+structures can describe a redistributor associated with a disabled CPU, but
+can't describe whether the redistributor is accessible, only that it is not
+'always on'.
+
+arm64's ACPI tables assume that everything described is ``present``.
+
+
+CPU Hotplug on virtual systems - CPUs not enabled at boot
+---------------------------------------------------------
+
+Virtual systems have the advantage that all the properties the system will
+ever have can be described at boot. There are no power-domain considerations
+as such devices are emulated.
+
+CPU Hotplug on virtual systems is supported. It is distinct from physical
+CPU Hotplug as all resources are described as ``present``, but CPUs may be
+marked as disabled by firmware. Only the CPU's online/offline behaviour is
+influenced by firmware. An example is where a virtual machine boots with a
+single CPU, and additional CPUs are added once a cloud orchestrator deploys
+the workload.
+
+For a virtual machine, the VMM (e.g. Qemu) plays the part of firmware.
+
+Virtual hotplug is implemented as a firmware policy affecting which CPUs can be
+brought online. Firmware can enforce its policy via PSCI's return codes. e.g.
+``DENIED``.
+
+The ACPI tables must describe all the resources of the virtual machine. CPUs
+that firmware wishes to disable either from boot (or later) should not be
+``enabled`` in the MADT GICC structures, but should have the ``online capable``
+bit set, to indicate they can be enabled later. The boot CPU must be marked as
+``enabled``.  The 'always on' GICR structure must be used to describe the
+redistributors.
+
+CPUs described as ``online capable`` but not ``enabled`` can be set to enabled
+by the DSDT's Processor object's _STA method. On virtual systems the _STA method
+must always report the CPU as ``present``. Changes to the firmware policy can
+be notified to the OS via device-check or eject-request.
+
+CPUs described as ``enabled`` in the static table, should not have their _STA
+modified dynamically by firmware. Soft-restart features such as kexec will
+re-read the static properties of the system from these static tables, and
+may malfunction if these no longer describe the running system. Linux will
+re-discover the dynamic properties of the system from the _STA method later
+during boot.
diff --git a/Documentation/arch/arm64/index.rst b/Documentation/arch/arm64/index.rst
index d08e924204bf..78544de0a8a9 100644
--- a/Documentation/arch/arm64/index.rst
+++ b/Documentation/arch/arm64/index.rst
@@ -13,6 +13,7 @@ ARM64 Architecture
     asymmetric-32bit
     booting
     cpu-feature-registers
+    cpu-hotplug
     elf_hwcaps
     hugetlbpage
     kdump
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH RFC v4 15/15] cpumask: Add enabled cpumask for present CPUs that can be brought online
  2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
                   ` (13 preceding siblings ...)
  2024-01-31 16:50 ` [PATCH RFC v4 14/15] arm64: document virtual CPU hotplug's expectations Russell King
@ 2024-01-31 16:50 ` Russell King
  14 siblings, 0 replies; 42+ messages in thread
From: Russell King @ 2024-01-31 16:50 UTC (permalink / raw)
  To: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc
  Cc: Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Jonathan Cameron, Rafael J. Wysocki

From: James Morse <james.morse@arm.com>

The 'offline' file in sysfs shows all offline CPUs, including those
that aren't present. User-space is expected to remove not-present CPUs
from this list to learn which CPUs could be brought online.

CPUs can be present but not-enabled. These CPUs can't be brought online
until the firmware policy changes, which comes with an ACPI notification
that will register the CPUs.

With only the offline and present files, user-space is unable to
determine which CPUs it can try to bring online. Add a new CPU mask
that shows this based on all the registered CPUs.

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Jianyong Wu <jianyong.wu@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
Changes since RFCv3 (smaller series);
 * Added Documentation/ABI/testing/sysfs-devices-system-cpu entry
---
 .../ABI/testing/sysfs-devices-system-cpu      |  6 +++++
 drivers/base/cpu.c                            | 10 ++++++++
 include/linux/cpumask.h                       | 25 +++++++++++++++++++
 kernel/cpu.c                                  |  3 +++
 4 files changed, 44 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index a1db6db47505..59482c10e0ad 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -693,3 +693,9 @@ Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
 		(RO) indicates whether or not the kernel directly supports
 		modifying the crash elfcorehdr for CPU hot un/plug and/or
 		on/offline changes.
+
+What:		/sys/devices/system/cpu/enabled
+Date:		Nov 2022
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:
+		(RO) the list of CPUs that can be brought online.
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 13d052bf13f4..a6e96a0a92b7 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -95,6 +95,7 @@ void unregister_cpu(struct cpu *cpu)
 {
 	int logical_cpu = cpu->dev.id;
 
+	set_cpu_enabled(logical_cpu, false);
 	unregister_cpu_under_node(logical_cpu, cpu_to_node(logical_cpu));
 
 	device_unregister(&cpu->dev);
@@ -273,6 +274,13 @@ static ssize_t print_cpus_offline(struct device *dev,
 }
 static DEVICE_ATTR(offline, 0444, print_cpus_offline, NULL);
 
+static ssize_t print_cpus_enabled(struct device *dev,
+				  struct device_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(cpu_enabled_mask));
+}
+static DEVICE_ATTR(enabled, 0444, print_cpus_enabled, NULL);
+
 static ssize_t print_cpus_isolated(struct device *dev,
 				  struct device_attribute *attr, char *buf)
 {
@@ -413,6 +421,7 @@ int register_cpu(struct cpu *cpu, int num)
 	register_cpu_under_node(num, cpu_to_node(num));
 	dev_pm_qos_expose_latency_limit(&cpu->dev,
 					PM_QOS_RESUME_LATENCY_NO_CONSTRAINT);
+	set_cpu_enabled(num, true);
 
 	return 0;
 }
@@ -494,6 +503,7 @@ static struct attribute *cpu_root_attrs[] = {
 	&cpu_attrs[2].attr.attr,
 	&dev_attr_kernel_max.attr,
 	&dev_attr_offline.attr,
+	&dev_attr_enabled.attr,
 	&dev_attr_isolated.attr,
 #ifdef CONFIG_NO_HZ_FULL
 	&dev_attr_nohz_full.attr,
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index cfb545841a2c..cc72a0887f04 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -92,6 +92,7 @@ static inline void set_nr_cpu_ids(unsigned int nr)
  *
  *     cpu_possible_mask- has bit 'cpu' set iff cpu is populatable
  *     cpu_present_mask - has bit 'cpu' set iff cpu is populated
+ *     cpu_enabled_mask  - has bit 'cpu' set iff cpu can be brought online
  *     cpu_online_mask  - has bit 'cpu' set iff cpu available to scheduler
  *     cpu_active_mask  - has bit 'cpu' set iff cpu available to migration
  *
@@ -124,11 +125,13 @@ static inline void set_nr_cpu_ids(unsigned int nr)
 
 extern struct cpumask __cpu_possible_mask;
 extern struct cpumask __cpu_online_mask;
+extern struct cpumask __cpu_enabled_mask;
 extern struct cpumask __cpu_present_mask;
 extern struct cpumask __cpu_active_mask;
 extern struct cpumask __cpu_dying_mask;
 #define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
 #define cpu_online_mask   ((const struct cpumask *)&__cpu_online_mask)
+#define cpu_enabled_mask   ((const struct cpumask *)&__cpu_enabled_mask)
 #define cpu_present_mask  ((const struct cpumask *)&__cpu_present_mask)
 #define cpu_active_mask   ((const struct cpumask *)&__cpu_active_mask)
 #define cpu_dying_mask    ((const struct cpumask *)&__cpu_dying_mask)
@@ -993,6 +996,7 @@ extern const DECLARE_BITMAP(cpu_all_bits, NR_CPUS);
 #else
 #define for_each_possible_cpu(cpu) for_each_cpu((cpu), cpu_possible_mask)
 #define for_each_online_cpu(cpu)   for_each_cpu((cpu), cpu_online_mask)
+#define for_each_enabled_cpu(cpu)   for_each_cpu((cpu), cpu_enabled_mask)
 #define for_each_present_cpu(cpu)  for_each_cpu((cpu), cpu_present_mask)
 #endif
 
@@ -1015,6 +1019,15 @@ set_cpu_possible(unsigned int cpu, bool possible)
 		cpumask_clear_cpu(cpu, &__cpu_possible_mask);
 }
 
+static inline void
+set_cpu_enabled(unsigned int cpu, bool can_be_onlined)
+{
+	if (can_be_onlined)
+		cpumask_set_cpu(cpu, &__cpu_enabled_mask);
+	else
+		cpumask_clear_cpu(cpu, &__cpu_enabled_mask);
+}
+
 static inline void
 set_cpu_present(unsigned int cpu, bool present)
 {
@@ -1096,6 +1109,7 @@ static __always_inline unsigned int num_online_cpus(void)
 	return raw_atomic_read(&__num_online_cpus);
 }
 #define num_possible_cpus()	cpumask_weight(cpu_possible_mask)
+#define num_enabled_cpus()	cpumask_weight(cpu_enabled_mask)
 #define num_present_cpus()	cpumask_weight(cpu_present_mask)
 #define num_active_cpus()	cpumask_weight(cpu_active_mask)
 
@@ -1104,6 +1118,11 @@ static inline bool cpu_online(unsigned int cpu)
 	return cpumask_test_cpu(cpu, cpu_online_mask);
 }
 
+static inline bool cpu_enabled(unsigned int cpu)
+{
+	return cpumask_test_cpu(cpu, cpu_enabled_mask);
+}
+
 static inline bool cpu_possible(unsigned int cpu)
 {
 	return cpumask_test_cpu(cpu, cpu_possible_mask);
@@ -1128,6 +1147,7 @@ static inline bool cpu_dying(unsigned int cpu)
 
 #define num_online_cpus()	1U
 #define num_possible_cpus()	1U
+#define num_enabled_cpus()	1U
 #define num_present_cpus()	1U
 #define num_active_cpus()	1U
 
@@ -1141,6 +1161,11 @@ static inline bool cpu_possible(unsigned int cpu)
 	return cpu == 0;
 }
 
+static inline bool cpu_enabled(unsigned int cpu)
+{
+	return cpu == 0;
+}
+
 static inline bool cpu_present(unsigned int cpu)
 {
 	return cpu == 0;
diff --git a/kernel/cpu.c b/kernel/cpu.c
index e6ec3ba4950b..23b5568fb378 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -3117,6 +3117,9 @@ EXPORT_SYMBOL(__cpu_possible_mask);
 struct cpumask __cpu_online_mask __read_mostly;
 EXPORT_SYMBOL(__cpu_online_mask);
 
+struct cpumask __cpu_enabled_mask __read_mostly;
+EXPORT_SYMBOL(__cpu_enabled_mask);
+
 struct cpumask __cpu_present_mask __read_mostly;
 EXPORT_SYMBOL(__cpu_present_mask);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 01/15] ACPI: Only enumerate enabled (or functional) processor devices
  2024-01-31 16:49 ` [PATCH RFC v4 01/15] ACPI: Only enumerate enabled (or functional) processor devices Russell King
@ 2024-01-31 17:25   ` Miguel Luis
  2024-02-15 20:10   ` Rafael J. Wysocki
  2024-02-21 13:01   ` Rafael J. Wysocki
  2 siblings, 0 replies; 42+ messages in thread
From: Miguel Luis @ 2024-01-31 17:25 UTC (permalink / raw)
  To: Russell King
  Cc: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse,
	Jonathan Cameron, Rafael J. Wysocki

Hi

> On 31 Jan 2024, at 15:49, Russell King <rmk+kernel@armlinux.org.uk> wrote:
> 
> From: James Morse <james.morse@arm.com>
> 
> Today the ACPI enumeration code 'visits' all devices that are present.
> 
> This is a problem for arm64, where CPUs are always present, but not
> always enabled. When a device-check occurs because the firmware-policy
> has changed and a CPU is now enabled, the following error occurs:
> | acpi ACPI0007:48: Enumeration failure
> 
> This is ultimately because acpi_dev_ready_for_enumeration() returns
> true for a device that is not enabled. The ACPI Processor driver
> will not register such CPUs as they are not 'decoding their resources'.
> 
> ACPI allows a device to be functional instead of maintaining the
> present and enabled bit, but we can't simply check the enabled bit
> for all devices since firmware can be buggy.
> 
> If ACPI indicates that the device is present and enabled, then all well
> and good, we can enumate it. However, if the device is present and not

enumerate

> enabled, then we also check whether the device is a processor device
> to limit the impact of this new check to just processor devices.
> 
> This avoids enumerating present && functional processor devices that
> are not enabled.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Co-developed-by: Rafael J. Wysocki <rjw@rjwysocki.net>
> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
> Changes since RFC v2:
> * Incorporate comment suggestion by Gavin Shan.
> Changes since RFC v3:
> * Fixed "sert" typo.
> Changes since RFC v3 (smaller series):
> * Restrict checking the enabled bit to processor devices, update
>   commit comments.
> * Use Rafael's suggestion in
>   https://lore.kernel.org/r/5760569.DvuYhMxLoT@kreacher
> * Updated with a fix - see:
>   https://lore.kernel.org/all/Zbe8WQRASx6D6RaG@shell.armlinux.org.uk/
> ---
> drivers/acpi/acpi_processor.c | 11 +++++++++
> drivers/acpi/device_pm.c      |  2 +-
> drivers/acpi/device_sysfs.c   |  2 +-
> drivers/acpi/internal.h       |  4 ++-
> drivers/acpi/property.c       |  2 +-
> drivers/acpi/scan.c           | 46 +++++++++++++++++++++++++++--------
> 6 files changed, 53 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 4fe2ef54088c..cf7c1cca69dd 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -626,6 +626,17 @@ static struct acpi_scan_handler processor_handler = {
> },
> };
> 
> +bool acpi_device_is_processor(const struct acpi_device *adev)
> +{
> + if (adev->device_type == ACPI_BUS_TYPE_PROCESSOR)
> + return true;
> +
> + if (adev->device_type != ACPI_BUS_TYPE_DEVICE)
> + return false;
> +
> + return acpi_scan_check_handler(adev, &processor_handler);
> +}
> +
> static int acpi_processor_container_attach(struct acpi_device *dev,
>   const struct acpi_device_id *id)
> {
> diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> index 3b4d048c4941..e3c80f3b3b57 100644
> --- a/drivers/acpi/device_pm.c
> +++ b/drivers/acpi/device_pm.c
> @@ -313,7 +313,7 @@ int acpi_bus_init_power(struct acpi_device *device)
> return -EINVAL;
> 
> device->power.state = ACPI_STATE_UNKNOWN;
> - if (!acpi_device_is_present(device)) {
> + if (!acpi_dev_ready_for_enumeration(device)) {
> device->flags.initialized = false;
> return -ENXIO;
> }
> diff --git a/drivers/acpi/device_sysfs.c b/drivers/acpi/device_sysfs.c
> index 23373faa35ec..a0256d2493a7 100644
> --- a/drivers/acpi/device_sysfs.c
> +++ b/drivers/acpi/device_sysfs.c
> @@ -141,7 +141,7 @@ static int create_pnp_modalias(const struct acpi_device *acpi_dev, char *modalia
> struct acpi_hardware_id *id;
> 
> /* Avoid unnecessarily loading modules for non present devices. */
> - if (!acpi_device_is_present(acpi_dev))
> + if (!acpi_dev_ready_for_enumeration(acpi_dev))
> return 0;
> 
> /*
> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
> index 6588525c45ef..1bc8b6db60c5 100644
> --- a/drivers/acpi/internal.h
> +++ b/drivers/acpi/internal.h
> @@ -62,6 +62,8 @@ void acpi_sysfs_add_hotplug_profile(struct acpi_hotplug_profile *hotplug,
> int acpi_scan_add_handler_with_hotplug(struct acpi_scan_handler *handler,
>       const char *hotplug_profile_name);
> void acpi_scan_hotplug_enabled(struct acpi_hotplug_profile *hotplug, bool val);
> +bool acpi_scan_check_handler(const struct acpi_device *adev,
> +     struct acpi_scan_handler *handler);
> 
> #ifdef CONFIG_DEBUG_FS
> extern struct dentry *acpi_debugfs_dir;
> @@ -121,7 +123,6 @@ int acpi_device_setup_files(struct acpi_device *dev);
> void acpi_device_remove_files(struct acpi_device *dev);
> void acpi_device_add_finalize(struct acpi_device *device);
> void acpi_free_pnp_ids(struct acpi_device_pnp *pnp);
> -bool acpi_device_is_present(const struct acpi_device *adev);
> bool acpi_device_is_battery(struct acpi_device *adev);
> bool acpi_device_is_first_physical_node(struct acpi_device *adev,
> const struct device *dev);
> @@ -133,6 +134,7 @@ int acpi_bus_register_early_device(int type);
> const struct acpi_device *acpi_companion_match(const struct device *dev);
> int __acpi_device_uevent_modalias(const struct acpi_device *adev,
>  struct kobj_uevent_env *env);
> +bool acpi_device_is_processor(const struct acpi_device *adev);
> 
> /* --------------------------------------------------------------------------
>                                   Power Resource
> diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
> index a6ead5204046..9f8d54038770 100644
> --- a/drivers/acpi/property.c
> +++ b/drivers/acpi/property.c
> @@ -1486,7 +1486,7 @@ static bool acpi_fwnode_device_is_available(const struct fwnode_handle *fwnode)
> if (!is_acpi_device_node(fwnode))
> return false;
> 
> - return acpi_device_is_present(to_acpi_device_node(fwnode));
> + return acpi_dev_ready_for_enumeration(to_acpi_device_node(fwnode));
> }
> 
> static const void *
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index e6ed1ba91e5c..fd2e8b3a5749 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -304,7 +304,7 @@ static int acpi_scan_device_check(struct acpi_device *adev)
> int error;
> 
> acpi_bus_get_status(adev);
> - if (acpi_device_is_present(adev)) {
> + if (acpi_dev_ready_for_enumeration(adev)) {
> /*
> * This function is only called for device objects for which
> * matching scan handlers exist.  The only situation in which
> @@ -338,7 +338,7 @@ static int acpi_scan_bus_check(struct acpi_device *adev, void *not_used)
> int error;
> 
> acpi_bus_get_status(adev);
> - if (!acpi_device_is_present(adev)) {
> + if (!acpi_dev_ready_for_enumeration(adev)) {
> acpi_scan_device_not_enumerated(adev);
> return 0;
> }
> @@ -1917,11 +1917,6 @@ static bool acpi_device_should_be_hidden(acpi_handle handle)
> return true;
> }
> 
> -bool acpi_device_is_present(const struct acpi_device *adev)
> -{
> - return adev->status.present || adev->status.functional;
> -}
> -
> static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
>       const char *idstr,
>       const struct acpi_device_id **matchid)
> @@ -1942,6 +1937,18 @@ static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
> return false;
> }
> 
> +bool acpi_scan_check_handler(const struct acpi_device *adev,
> +     struct acpi_scan_handler *handler)
> +{
> + struct acpi_hardware_id *hwid;
> +
> + list_for_each_entry(hwid, &adev->pnp.ids, list)
> + if (acpi_scan_handler_matching(handler, hwid->id, NULL))
> + return true;
> +
> + return false;
> +}
> +
> static struct acpi_scan_handler *acpi_scan_match_handler(const char *idstr,
> const struct acpi_device_id **matchid)
> {
> @@ -2405,16 +2412,35 @@ EXPORT_SYMBOL_GPL(acpi_dev_clear_dependencies);
>  * acpi_dev_ready_for_enumeration - Check if the ACPI device is ready for enumeration
>  * @device: Pointer to the &struct acpi_device to check
>  *
> - * Check if the device is present and has no unmet dependencies.
> + * Check if the device is functional or enabled and has no unmet dependencies.
>  *
> - * Return true if the device is ready for enumeratino. Otherwise, return false.
> + * Return true if the device is ready for enumeration. Otherwise, return false.
>  */
> bool acpi_dev_ready_for_enumeration(const struct acpi_device *device)
> {
> if (device->flags.honor_deps && device->dep_unmet)
> return false;
> 
> - return acpi_device_is_present(device);
> + /*
> + * ACPI 6.5's 6.3.7 "_STA (Device Status)" allows firmware to return
> + * (!present && functional) for certain types of devices that should be
> + * enumerated. Note that the enabled bit should not be set unless the
> + * present bit is set.
> + *
> + * However, limit this only to processor devices to reduce possible
> + * regressions with firmware.
> + */
> + if (!device->status.present)
> + return device->status.functional;
> +
> + /*
> + * Fast path - if enabled is set, avoid the more expensive test to
> + * check whether this device is a processor.
> + */
> + if (device->status.enabled)
> + return true;
> +
> + return !acpi_device_is_processor(device);

Otherwise, feel free to add:

Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Miguel Luis <miguel.luis@oracle.com>

Thanks

Miguel

> }
> EXPORT_SYMBOL_GPL(acpi_dev_ready_for_enumeration);
> 
> -- 
> 2.30.2
> 
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 10/15] irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()
  2024-01-31 16:50 ` [PATCH RFC v4 10/15] irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc() Russell King
@ 2024-02-02 16:44   ` Jonathan Cameron
  0 siblings, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2024-02-02 16:44 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse,
	Rafael J. Wysocki

On Wed, 31 Jan 2024 16:50:27 +0000
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> wrote:

> From: James Morse <james.morse@arm.com>
> 
> gic_acpi_match_gicc() is only called via gic_acpi_count_gicr_regions().
> It should only count the number of enabled redistributors, but it
> also tries to sanity check the GICC entry, currently returning an
> error if the Enabled bit is set, but the gicr_base_address is zero.
> 
> Adding support for the online-capable bit to the sanity check will
> complicate it, for no benefit. The existing check implicitly depends on
> gic_acpi_count_gicr_regions() previous failing to find any GICR regions
> (as it is valid to have gicr_base_address of zero if the redistributors
> are described via a GICR entry).
> 
> Instead of complicating the check, remove it. Failures that happen at
> this point cause the irqchip not to register, meaning no irqs can be
> requested. The kernel grinds to a panic() pretty quickly.
> 
> Without the check, MADT tables that exhibit this problem are still
> caught by gic_populate_rdist(), which helpfully also prints what went
> wrong:
> | CPU4: mpidr 100 has no re-distributor!
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Does what it says on the tin.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 11/15] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs
  2024-01-31 16:50 ` [PATCH RFC v4 11/15] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs Russell King
@ 2024-02-02 16:47   ` Jonathan Cameron
  0 siblings, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2024-02-02 16:47 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse,
	Rafael J. Wysocki

On Wed, 31 Jan 2024 16:50:32 +0000
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> wrote:

> From: James Morse <james.morse@arm.com>
> 
> To support virtual CPU hotplug, ACPI has added an 'online capable' bit
> to the MADT GICC entries. This indicates a disabled CPU entry may not
> be possible to online via PSCI until firmware has set enabled bit in
> _STA.
> 
> This means that a "usable" GIC is one that is marked as either enabled,
> or online capable. Therefore, change acpi_gicc_is_usable() to check both
> bits. However, we need to change the test in gic_acpi_match_gicc() back
> to testing just the enabled bit so the count of enabled distributors is
> correct.
> 
> What about the redistributor in the GICC entry? ACPI doesn't want to say.
> Assume the worst: When a redistributor is described in the GICC entry,
> but the entry is marked as disabled at boot, assume the redistributor
> is inaccessible.
> 
> The GICv3 driver doesn't support late online of redistributors, so this
> means the corresponding CPU can't be brought online either. Clear the
> possible and present bits.
> 
> Systems that want CPU hotplug in a VM can ensure their redistributors
> are always-on, and describe them that way with a GICR entry in the MADT.
> 
> When mapping redistributors found via GICC entries, handle the case
> where the arch code believes the CPU is present and possible, but it
> does not have an accessible redistributor. Print a warning and clear
> the present and possible bits.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 14/15] arm64: document virtual CPU hotplug's expectations
  2024-01-31 16:50 ` [PATCH RFC v4 14/15] arm64: document virtual CPU hotplug's expectations Russell King
@ 2024-02-02 17:04   ` Jonathan Cameron
  0 siblings, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2024-02-02 17:04 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse,
	Rafael J. Wysocki

On Wed, 31 Jan 2024 16:50:48 +0000
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> wrote:

> From: James Morse <james.morse@arm.com>
> 
> Add a description of physical and virtual CPU hotplug, explain the
> differences and elaborate on what is required in ACPI for a working
> virtual hotplug system.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
> Outstanding comment:
>  https://lore.kernel.org/r/20230914174137.00000a62@Huawei.com

Outstanding but an ACPI Spec question rather than one about this description.
I've finally regained access to Mantis etc so can follow that up at some point.

If we get clarify on how to tell that CPUs that are present and enabled at boot
can be removed later we can relax this text to allow for that.


>  https://lore.kernel.org/r/20231215170428.00000d81@Huawei.com
That's a comment on the above comment to say I'm fine with it as is :)

This @Huawei guy is really annoying ;)

Jonathan

> ---
>  Documentation/arch/arm64/cpu-hotplug.rst | 79 ++++++++++++++++++++++++
>  Documentation/arch/arm64/index.rst       |  1 +
>  2 files changed, 80 insertions(+)
>  create mode 100644 Documentation/arch/arm64/cpu-hotplug.rst
> 
> diff --git a/Documentation/arch/arm64/cpu-hotplug.rst b/Documentation/arch/arm64/cpu-hotplug.rst
> new file mode 100644
> index 000000000000..76ba8d932c72
> --- /dev/null
> +++ b/Documentation/arch/arm64/cpu-hotplug.rst
> @@ -0,0 +1,79 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. _cpuhp_index:
> +
> +====================
> +CPU Hotplug and ACPI
> +====================
> +
> +CPU hotplug in the arm64 world is commonly used to describe the kernel taking
> +CPUs online/offline using PSCI. This document is about ACPI firmware allowing
> +CPUs that were not available during boot to be added to the system later.
> +
> +``possible`` and ``present`` refer to the state of the CPU as seen by linux.
> +
> +
> +CPU Hotplug on physical systems - CPUs not present at boot
> +----------------------------------------------------------
> +
> +Physical systems need to mark a CPU that is ``possible`` but not ``present`` as
> +being ``present``. An example would be a dual socket machine, where the package
> +in one of the sockets can be replaced while the system is running.
> +
> +This is not supported.
> +
> +In the arm64 world CPUs are not a single device but a slice of the system.
> +There are no systems that support the physical addition (or removal) of CPUs
> +while the system is running, and ACPI is not able to sufficiently describe
> +them.
> +
> +e.g. New CPUs come with new caches, but the platform's cache toplogy is
> +described in a static table, the PPTT. How caches are shared between CPUs is
> +not discoverable, and must be described by firmware.
> +
> +e.g. The GIC redistributor for each CPU must be accessed by the driver during
> +boot to discover the system wide supported features. ACPI's MADT GICC
> +structures can describe a redistributor associated with a disabled CPU, but
> +can't describe whether the redistributor is accessible, only that it is not
> +'always on'.
> +
> +arm64's ACPI tables assume that everything described is ``present``.
> +
> +
> +CPU Hotplug on virtual systems - CPUs not enabled at boot
> +---------------------------------------------------------
> +
> +Virtual systems have the advantage that all the properties the system will
> +ever have can be described at boot. There are no power-domain considerations
> +as such devices are emulated.
> +
> +CPU Hotplug on virtual systems is supported. It is distinct from physical
> +CPU Hotplug as all resources are described as ``present``, but CPUs may be
> +marked as disabled by firmware. Only the CPU's online/offline behaviour is
> +influenced by firmware. An example is where a virtual machine boots with a
> +single CPU, and additional CPUs are added once a cloud orchestrator deploys
> +the workload.
> +
> +For a virtual machine, the VMM (e.g. Qemu) plays the part of firmware.
> +
> +Virtual hotplug is implemented as a firmware policy affecting which CPUs can be
> +brought online. Firmware can enforce its policy via PSCI's return codes. e.g.
> +``DENIED``.
> +
> +The ACPI tables must describe all the resources of the virtual machine. CPUs
> +that firmware wishes to disable either from boot (or later) should not be
> +``enabled`` in the MADT GICC structures, but should have the ``online capable``
> +bit set, to indicate they can be enabled later. The boot CPU must be marked as
> +``enabled``.  The 'always on' GICR structure must be used to describe the
> +redistributors.
> +
> +CPUs described as ``online capable`` but not ``enabled`` can be set to enabled
> +by the DSDT's Processor object's _STA method. On virtual systems the _STA method
> +must always report the CPU as ``present``. Changes to the firmware policy can
> +be notified to the OS via device-check or eject-request.
> +
> +CPUs described as ``enabled`` in the static table, should not have their _STA
> +modified dynamically by firmware. Soft-restart features such as kexec will
> +re-read the static properties of the system from these static tables, and
> +may malfunction if these no longer describe the running system. Linux will
> +re-discover the dynamic properties of the system from the _STA method later
> +during boot.
> diff --git a/Documentation/arch/arm64/index.rst b/Documentation/arch/arm64/index.rst
> index d08e924204bf..78544de0a8a9 100644
> --- a/Documentation/arch/arm64/index.rst
> +++ b/Documentation/arch/arm64/index.rst
> @@ -13,6 +13,7 @@ ARM64 Architecture
>      asymmetric-32bit
>      booting
>      cpu-feature-registers
> +    cpu-hotplug
>      elf_hwcaps
>      hugetlbpage
>      kdump


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-01-31 16:49 ` [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info() Russell King
@ 2024-02-15 19:22   ` Rafael J. Wysocki
  2024-02-20 11:27     ` Russell King (Oracle)
                       ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-02-15 19:22 UTC (permalink / raw)
  To: Russell King
  Cc: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse,
	Jonathan Cameron, Rafael J. Wysocki

On Wed, Jan 31, 2024 at 5:50 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
>
> From: James Morse <james.morse@arm.com>
>
> To allow ACPI to skip the call to arch_register_cpu() when the _STA
> value indicates the CPU can't be brought online right now, move the
> arch_register_cpu() call into acpi_processor_get_info().
>
> Systems can still be booted with 'acpi=off', or not include an
> ACPI description at all. For these, the CPUs continue to be
> registered by cpu_dev_register_generic().
>
> This moves the CPU register logic back to a subsys_initcall(),
> while the memory nodes will have been registered earlier.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Tested-by: Miguel Luis <miguel.luis@oracle.com>
> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
> Changes since RFC v2:
>  * Fixup comment in acpi_processor_get_info() (Gavin Shan)
>  * Add comment in cpu_dev_register_generic() (Gavin Shan)
> ---
>  drivers/acpi/acpi_processor.c | 12 ++++++++++++
>  drivers/base/cpu.c            |  6 +++++-
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index cf7c1cca69dd..a68c475cdea5 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -314,6 +314,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
>                         cpufreq_add_device("acpi-cpufreq");
>         }
>
> +       /*
> +        * Register CPUs that are present. get_cpu_device() is used to skip
> +        * duplicate CPU descriptions from firmware.
> +        */
> +       if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> +           !get_cpu_device(pr->id)) {
> +               int ret = arch_register_cpu(pr->id);
> +
> +               if (ret)
> +                       return ret;
> +       }
> +
>         /*
>          *  Extra Processor objects may be enumerated on MP systems with
>          *  less than the max # of CPUs. They should be ignored _iff

This is interesting, because right below there is the following code:

    if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
        int ret = acpi_processor_hotadd_init(pr);

        if (ret)
            return ret;
    }

and acpi_processor_hotadd_init() essentially calls arch_register_cpu()
with some extra things around it (more about that below).

I do realize that acpi_processor_hotadd_init() is defined under
CONFIG_ACPI_HOTPLUG_CPU, so for the sake of the argument let's
consider an architecture where CONFIG_ACPI_HOTPLUG_CPU is set.

So why are the two conditionals that almost contradict each other both
needed?  It looks like the new code could be combined with
acpi_processor_hotadd_init() to do the right thing in all cases.

Now, acpi_processor_hotadd_init() does some extra things that look
like they should be done by the new code too.

1. It checks invalid_phys_cpuid() which appears to be a good idea to me.

2. It uses locking around arch_register_cpu() which doesn't seem
unreasonable either.

3. It calls acpi_map_cpu() and I'm not sure why this is not done by
the new code.

The only thing that can be dropped from it is the _STA check AFAICS,
because acpi_processor_add() won't even be called if the CPU is not
present (and not enabled after the first patch).

So why does the code not do 1 - 3 above?

> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index 47de0f140ba6..13d052bf13f4 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -553,7 +553,11 @@ static void __init cpu_dev_register_generic(void)
>  {
>         int i, ret;
>
> -       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
> +       /*
> +        * When ACPI is enabled, CPUs are registered via
> +        * acpi_processor_get_info().
> +        */
> +       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES) || !acpi_disabled)
>                 return;

Honestly, this looks like a quick hack to me and it absolutely
requires an ACK from the x86 maintainers to go anywhere.

>
>         for_each_present_cpu(i) {
> --

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 01/15] ACPI: Only enumerate enabled (or functional) processor devices
  2024-01-31 16:49 ` [PATCH RFC v4 01/15] ACPI: Only enumerate enabled (or functional) processor devices Russell King
  2024-01-31 17:25   ` Miguel Luis
@ 2024-02-15 20:10   ` Rafael J. Wysocki
  2024-02-19  9:45     ` Jonathan Cameron
  2024-02-20 11:30     ` Russell King (Oracle)
  2024-02-21 13:01   ` Rafael J. Wysocki
  2 siblings, 2 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-02-15 20:10 UTC (permalink / raw)
  To: Russell King
  Cc: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse,
	Jonathan Cameron, Rafael J. Wysocki

On Wed, Jan 31, 2024 at 5:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
>
> From: James Morse <james.morse@arm.com>
>
> Today the ACPI enumeration code 'visits' all devices that are present.
>
> This is a problem for arm64, where CPUs are always present, but not
> always enabled. When a device-check occurs because the firmware-policy
> has changed and a CPU is now enabled, the following error occurs:
> | acpi ACPI0007:48: Enumeration failure
>
> This is ultimately because acpi_dev_ready_for_enumeration() returns
> true for a device that is not enabled. The ACPI Processor driver
> will not register such CPUs as they are not 'decoding their resources'.
>
> ACPI allows a device to be functional instead of maintaining the
> present and enabled bit, but we can't simply check the enabled bit
> for all devices since firmware can be buggy.
>
> If ACPI indicates that the device is present and enabled, then all well
> and good, we can enumate it. However, if the device is present and not
> enabled, then we also check whether the device is a processor device
> to limit the impact of this new check to just processor devices.
>
> This avoids enumerating present && functional processor devices that
> are not enabled.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Co-developed-by: Rafael J. Wysocki <rjw@rjwysocki.net>
> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
> Changes since RFC v2:
>  * Incorporate comment suggestion by Gavin Shan.
> Changes since RFC v3:
>  * Fixed "sert" typo.
> Changes since RFC v3 (smaller series):
>  * Restrict checking the enabled bit to processor devices, update
>    commit comments.
>  * Use Rafael's suggestion in
>    https://lore.kernel.org/r/5760569.DvuYhMxLoT@kreacher
>  * Updated with a fix - see:
>    https://lore.kernel.org/all/Zbe8WQRASx6D6RaG@shell.armlinux.org.uk/
> ---
>  drivers/acpi/acpi_processor.c | 11 +++++++++
>  drivers/acpi/device_pm.c      |  2 +-
>  drivers/acpi/device_sysfs.c   |  2 +-
>  drivers/acpi/internal.h       |  4 ++-
>  drivers/acpi/property.c       |  2 +-
>  drivers/acpi/scan.c           | 46 +++++++++++++++++++++++++++--------
>  6 files changed, 53 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 4fe2ef54088c..cf7c1cca69dd 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -626,6 +626,17 @@ static struct acpi_scan_handler processor_handler = {
>         },
>  };
>
> +bool acpi_device_is_processor(const struct acpi_device *adev)
> +{
> +       if (adev->device_type == ACPI_BUS_TYPE_PROCESSOR)
> +               return true;
> +
> +       if (adev->device_type != ACPI_BUS_TYPE_DEVICE)
> +               return false;
> +
> +       return acpi_scan_check_handler(adev, &processor_handler);
> +}
> +
>  static int acpi_processor_container_attach(struct acpi_device *dev,
>                                            const struct acpi_device_id *id)
>  {
> diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> index 3b4d048c4941..e3c80f3b3b57 100644
> --- a/drivers/acpi/device_pm.c
> +++ b/drivers/acpi/device_pm.c
> @@ -313,7 +313,7 @@ int acpi_bus_init_power(struct acpi_device *device)
>                 return -EINVAL;
>
>         device->power.state = ACPI_STATE_UNKNOWN;
> -       if (!acpi_device_is_present(device)) {
> +       if (!acpi_dev_ready_for_enumeration(device)) {
>                 device->flags.initialized = false;
>                 return -ENXIO;
>         }
> diff --git a/drivers/acpi/device_sysfs.c b/drivers/acpi/device_sysfs.c
> index 23373faa35ec..a0256d2493a7 100644
> --- a/drivers/acpi/device_sysfs.c
> +++ b/drivers/acpi/device_sysfs.c
> @@ -141,7 +141,7 @@ static int create_pnp_modalias(const struct acpi_device *acpi_dev, char *modalia
>         struct acpi_hardware_id *id;
>
>         /* Avoid unnecessarily loading modules for non present devices. */
> -       if (!acpi_device_is_present(acpi_dev))
> +       if (!acpi_dev_ready_for_enumeration(acpi_dev))
>                 return 0;
>
>         /*
> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
> index 6588525c45ef..1bc8b6db60c5 100644
> --- a/drivers/acpi/internal.h
> +++ b/drivers/acpi/internal.h
> @@ -62,6 +62,8 @@ void acpi_sysfs_add_hotplug_profile(struct acpi_hotplug_profile *hotplug,
>  int acpi_scan_add_handler_with_hotplug(struct acpi_scan_handler *handler,
>                                        const char *hotplug_profile_name);
>  void acpi_scan_hotplug_enabled(struct acpi_hotplug_profile *hotplug, bool val);
> +bool acpi_scan_check_handler(const struct acpi_device *adev,
> +                            struct acpi_scan_handler *handler);
>
>  #ifdef CONFIG_DEBUG_FS
>  extern struct dentry *acpi_debugfs_dir;
> @@ -121,7 +123,6 @@ int acpi_device_setup_files(struct acpi_device *dev);
>  void acpi_device_remove_files(struct acpi_device *dev);
>  void acpi_device_add_finalize(struct acpi_device *device);
>  void acpi_free_pnp_ids(struct acpi_device_pnp *pnp);
> -bool acpi_device_is_present(const struct acpi_device *adev);
>  bool acpi_device_is_battery(struct acpi_device *adev);
>  bool acpi_device_is_first_physical_node(struct acpi_device *adev,
>                                         const struct device *dev);
> @@ -133,6 +134,7 @@ int acpi_bus_register_early_device(int type);
>  const struct acpi_device *acpi_companion_match(const struct device *dev);
>  int __acpi_device_uevent_modalias(const struct acpi_device *adev,
>                                   struct kobj_uevent_env *env);
> +bool acpi_device_is_processor(const struct acpi_device *adev);
>
>  /* --------------------------------------------------------------------------
>                                    Power Resource
> diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
> index a6ead5204046..9f8d54038770 100644
> --- a/drivers/acpi/property.c
> +++ b/drivers/acpi/property.c
> @@ -1486,7 +1486,7 @@ static bool acpi_fwnode_device_is_available(const struct fwnode_handle *fwnode)
>         if (!is_acpi_device_node(fwnode))
>                 return false;
>
> -       return acpi_device_is_present(to_acpi_device_node(fwnode));
> +       return acpi_dev_ready_for_enumeration(to_acpi_device_node(fwnode));
>  }
>
>  static const void *
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index e6ed1ba91e5c..fd2e8b3a5749 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -304,7 +304,7 @@ static int acpi_scan_device_check(struct acpi_device *adev)
>         int error;
>
>         acpi_bus_get_status(adev);
> -       if (acpi_device_is_present(adev)) {
> +       if (acpi_dev_ready_for_enumeration(adev)) {
>                 /*
>                  * This function is only called for device objects for which
>                  * matching scan handlers exist.  The only situation in which
> @@ -338,7 +338,7 @@ static int acpi_scan_bus_check(struct acpi_device *adev, void *not_used)
>         int error;
>
>         acpi_bus_get_status(adev);
> -       if (!acpi_device_is_present(adev)) {
> +       if (!acpi_dev_ready_for_enumeration(adev)) {
>                 acpi_scan_device_not_enumerated(adev);
>                 return 0;
>         }
> @@ -1917,11 +1917,6 @@ static bool acpi_device_should_be_hidden(acpi_handle handle)
>         return true;
>  }
>
> -bool acpi_device_is_present(const struct acpi_device *adev)
> -{
> -       return adev->status.present || adev->status.functional;
> -}
> -
>  static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
>                                        const char *idstr,
>                                        const struct acpi_device_id **matchid)
> @@ -1942,6 +1937,18 @@ static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
>         return false;
>  }
>
> +bool acpi_scan_check_handler(const struct acpi_device *adev,
> +                            struct acpi_scan_handler *handler)
> +{
> +       struct acpi_hardware_id *hwid;
> +
> +       list_for_each_entry(hwid, &adev->pnp.ids, list)
> +               if (acpi_scan_handler_matching(handler, hwid->id, NULL))
> +                       return true;
> +
> +       return false;
> +}
> +
>  static struct acpi_scan_handler *acpi_scan_match_handler(const char *idstr,
>                                         const struct acpi_device_id **matchid)
>  {
> @@ -2405,16 +2412,35 @@ EXPORT_SYMBOL_GPL(acpi_dev_clear_dependencies);
>   * acpi_dev_ready_for_enumeration - Check if the ACPI device is ready for enumeration
>   * @device: Pointer to the &struct acpi_device to check
>   *
> - * Check if the device is present and has no unmet dependencies.
> + * Check if the device is functional or enabled and has no unmet dependencies.
>   *
> - * Return true if the device is ready for enumeratino. Otherwise, return false.
> + * Return true if the device is ready for enumeration. Otherwise, return false.
>   */
>  bool acpi_dev_ready_for_enumeration(const struct acpi_device *device)
>  {
>         if (device->flags.honor_deps && device->dep_unmet)
>                 return false;
>
> -       return acpi_device_is_present(device);
> +       /*
> +        * ACPI 6.5's 6.3.7 "_STA (Device Status)" allows firmware to return
> +        * (!present && functional) for certain types of devices that should be
> +        * enumerated. Note that the enabled bit should not be set unless the
> +        * present bit is set.
> +        *
> +        * However, limit this only to processor devices to reduce possible
> +        * regressions with firmware.
> +        */
> +       if (!device->status.present)
> +               return device->status.functional;
> +
> +       /*
> +        * Fast path - if enabled is set, avoid the more expensive test to
> +        * check whether this device is a processor.
> +        */
> +       if (device->status.enabled)
> +               return true;
> +
> +       return !acpi_device_is_processor(device);
>  }
>  EXPORT_SYMBOL_GPL(acpi_dev_ready_for_enumeration);
>
> --

I can queue this up for 6.9 as it looks like the rest of the series
will still need some work.  What do you think?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 01/15] ACPI: Only enumerate enabled (or functional) processor devices
  2024-02-15 20:10   ` Rafael J. Wysocki
@ 2024-02-19  9:45     ` Jonathan Cameron
  2024-02-20 11:30     ` Russell King (Oracle)
  1 sibling, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2024-02-19  9:45 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Russell King, linux-pm, loongarch, linux-acpi, linux-arch,
	linux-kernel, linux-arm-kernel, linux-riscv, kvmarm, x86,
	acpica-devel, linux-csky, linux-doc, linux-ia64, linux-parisc,
	Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse

On Thu, 15 Feb 2024 21:10:39 +0100
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Wed, Jan 31, 2024 at 5:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
> >
> > From: James Morse <james.morse@arm.com>
> >
> > Today the ACPI enumeration code 'visits' all devices that are present.
> >
> > This is a problem for arm64, where CPUs are always present, but not
> > always enabled. When a device-check occurs because the firmware-policy
> > has changed and a CPU is now enabled, the following error occurs:
> > | acpi ACPI0007:48: Enumeration failure
> >
> > This is ultimately because acpi_dev_ready_for_enumeration() returns
> > true for a device that is not enabled. The ACPI Processor driver
> > will not register such CPUs as they are not 'decoding their resources'.
> >
> > ACPI allows a device to be functional instead of maintaining the
> > present and enabled bit, but we can't simply check the enabled bit
> > for all devices since firmware can be buggy.
> >
> > If ACPI indicates that the device is present and enabled, then all well
> > and good, we can enumate it. However, if the device is present and not
> > enabled, then we also check whether the device is a processor device
> > to limit the impact of this new check to just processor devices.
> >
> > This avoids enumerating present && functional processor devices that
> > are not enabled.
> >
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Co-developed-by: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > ---
> > Changes since RFC v2:
> >  * Incorporate comment suggestion by Gavin Shan.
> > Changes since RFC v3:
> >  * Fixed "sert" typo.
> > Changes since RFC v3 (smaller series):
> >  * Restrict checking the enabled bit to processor devices, update
> >    commit comments.
> >  * Use Rafael's suggestion in
> >    https://lore.kernel.org/r/5760569.DvuYhMxLoT@kreacher
> >  * Updated with a fix - see:
> >    https://lore.kernel.org/all/Zbe8WQRASx6D6RaG@shell.armlinux.org.uk/
> > ---
> >  drivers/acpi/acpi_processor.c | 11 +++++++++
> >  drivers/acpi/device_pm.c      |  2 +-
> >  drivers/acpi/device_sysfs.c   |  2 +-
> >  drivers/acpi/internal.h       |  4 ++-
> >  drivers/acpi/property.c       |  2 +-
> >  drivers/acpi/scan.c           | 46 +++++++++++++++++++++++++++--------
> >  6 files changed, 53 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index 4fe2ef54088c..cf7c1cca69dd 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -626,6 +626,17 @@ static struct acpi_scan_handler processor_handler = {
> >         },
> >  };
> >
> > +bool acpi_device_is_processor(const struct acpi_device *adev)
> > +{
> > +       if (adev->device_type == ACPI_BUS_TYPE_PROCESSOR)
> > +               return true;
> > +
> > +       if (adev->device_type != ACPI_BUS_TYPE_DEVICE)
> > +               return false;
> > +
> > +       return acpi_scan_check_handler(adev, &processor_handler);
> > +}
> > +
> >  static int acpi_processor_container_attach(struct acpi_device *dev,
> >                                            const struct acpi_device_id *id)
> >  {
> > diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> > index 3b4d048c4941..e3c80f3b3b57 100644
> > --- a/drivers/acpi/device_pm.c
> > +++ b/drivers/acpi/device_pm.c
> > @@ -313,7 +313,7 @@ int acpi_bus_init_power(struct acpi_device *device)
> >                 return -EINVAL;
> >
> >         device->power.state = ACPI_STATE_UNKNOWN;
> > -       if (!acpi_device_is_present(device)) {
> > +       if (!acpi_dev_ready_for_enumeration(device)) {
> >                 device->flags.initialized = false;
> >                 return -ENXIO;
> >         }
> > diff --git a/drivers/acpi/device_sysfs.c b/drivers/acpi/device_sysfs.c
> > index 23373faa35ec..a0256d2493a7 100644
> > --- a/drivers/acpi/device_sysfs.c
> > +++ b/drivers/acpi/device_sysfs.c
> > @@ -141,7 +141,7 @@ static int create_pnp_modalias(const struct acpi_device *acpi_dev, char *modalia
> >         struct acpi_hardware_id *id;
> >
> >         /* Avoid unnecessarily loading modules for non present devices. */
> > -       if (!acpi_device_is_present(acpi_dev))
> > +       if (!acpi_dev_ready_for_enumeration(acpi_dev))
> >                 return 0;
> >
> >         /*
> > diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
> > index 6588525c45ef..1bc8b6db60c5 100644
> > --- a/drivers/acpi/internal.h
> > +++ b/drivers/acpi/internal.h
> > @@ -62,6 +62,8 @@ void acpi_sysfs_add_hotplug_profile(struct acpi_hotplug_profile *hotplug,
> >  int acpi_scan_add_handler_with_hotplug(struct acpi_scan_handler *handler,
> >                                        const char *hotplug_profile_name);
> >  void acpi_scan_hotplug_enabled(struct acpi_hotplug_profile *hotplug, bool val);
> > +bool acpi_scan_check_handler(const struct acpi_device *adev,
> > +                            struct acpi_scan_handler *handler);
> >
> >  #ifdef CONFIG_DEBUG_FS
> >  extern struct dentry *acpi_debugfs_dir;
> > @@ -121,7 +123,6 @@ int acpi_device_setup_files(struct acpi_device *dev);
> >  void acpi_device_remove_files(struct acpi_device *dev);
> >  void acpi_device_add_finalize(struct acpi_device *device);
> >  void acpi_free_pnp_ids(struct acpi_device_pnp *pnp);
> > -bool acpi_device_is_present(const struct acpi_device *adev);
> >  bool acpi_device_is_battery(struct acpi_device *adev);
> >  bool acpi_device_is_first_physical_node(struct acpi_device *adev,
> >                                         const struct device *dev);
> > @@ -133,6 +134,7 @@ int acpi_bus_register_early_device(int type);
> >  const struct acpi_device *acpi_companion_match(const struct device *dev);
> >  int __acpi_device_uevent_modalias(const struct acpi_device *adev,
> >                                   struct kobj_uevent_env *env);
> > +bool acpi_device_is_processor(const struct acpi_device *adev);
> >
> >  /* --------------------------------------------------------------------------
> >                                    Power Resource
> > diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
> > index a6ead5204046..9f8d54038770 100644
> > --- a/drivers/acpi/property.c
> > +++ b/drivers/acpi/property.c
> > @@ -1486,7 +1486,7 @@ static bool acpi_fwnode_device_is_available(const struct fwnode_handle *fwnode)
> >         if (!is_acpi_device_node(fwnode))
> >                 return false;
> >
> > -       return acpi_device_is_present(to_acpi_device_node(fwnode));
> > +       return acpi_dev_ready_for_enumeration(to_acpi_device_node(fwnode));
> >  }
> >
> >  static const void *
> > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > index e6ed1ba91e5c..fd2e8b3a5749 100644
> > --- a/drivers/acpi/scan.c
> > +++ b/drivers/acpi/scan.c
> > @@ -304,7 +304,7 @@ static int acpi_scan_device_check(struct acpi_device *adev)
> >         int error;
> >
> >         acpi_bus_get_status(adev);
> > -       if (acpi_device_is_present(adev)) {
> > +       if (acpi_dev_ready_for_enumeration(adev)) {
> >                 /*
> >                  * This function is only called for device objects for which
> >                  * matching scan handlers exist.  The only situation in which
> > @@ -338,7 +338,7 @@ static int acpi_scan_bus_check(struct acpi_device *adev, void *not_used)
> >         int error;
> >
> >         acpi_bus_get_status(adev);
> > -       if (!acpi_device_is_present(adev)) {
> > +       if (!acpi_dev_ready_for_enumeration(adev)) {
> >                 acpi_scan_device_not_enumerated(adev);
> >                 return 0;
> >         }
> > @@ -1917,11 +1917,6 @@ static bool acpi_device_should_be_hidden(acpi_handle handle)
> >         return true;
> >  }
> >
> > -bool acpi_device_is_present(const struct acpi_device *adev)
> > -{
> > -       return adev->status.present || adev->status.functional;
> > -}
> > -
> >  static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
> >                                        const char *idstr,
> >                                        const struct acpi_device_id **matchid)
> > @@ -1942,6 +1937,18 @@ static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
> >         return false;
> >  }
> >
> > +bool acpi_scan_check_handler(const struct acpi_device *adev,
> > +                            struct acpi_scan_handler *handler)
> > +{
> > +       struct acpi_hardware_id *hwid;
> > +
> > +       list_for_each_entry(hwid, &adev->pnp.ids, list)
> > +               if (acpi_scan_handler_matching(handler, hwid->id, NULL))
> > +                       return true;
> > +
> > +       return false;
> > +}
> > +
> >  static struct acpi_scan_handler *acpi_scan_match_handler(const char *idstr,
> >                                         const struct acpi_device_id **matchid)
> >  {
> > @@ -2405,16 +2412,35 @@ EXPORT_SYMBOL_GPL(acpi_dev_clear_dependencies);
> >   * acpi_dev_ready_for_enumeration - Check if the ACPI device is ready for enumeration
> >   * @device: Pointer to the &struct acpi_device to check
> >   *
> > - * Check if the device is present and has no unmet dependencies.
> > + * Check if the device is functional or enabled and has no unmet dependencies.
> >   *
> > - * Return true if the device is ready for enumeratino. Otherwise, return false.
> > + * Return true if the device is ready for enumeration. Otherwise, return false.
> >   */
> >  bool acpi_dev_ready_for_enumeration(const struct acpi_device *device)
> >  {
> >         if (device->flags.honor_deps && device->dep_unmet)
> >                 return false;
> >
> > -       return acpi_device_is_present(device);
> > +       /*
> > +        * ACPI 6.5's 6.3.7 "_STA (Device Status)" allows firmware to return
> > +        * (!present && functional) for certain types of devices that should be
> > +        * enumerated. Note that the enabled bit should not be set unless the
> > +        * present bit is set.
> > +        *
> > +        * However, limit this only to processor devices to reduce possible
> > +        * regressions with firmware.
> > +        */
> > +       if (!device->status.present)
> > +               return device->status.functional;
> > +
> > +       /*
> > +        * Fast path - if enabled is set, avoid the more expensive test to
> > +        * check whether this device is a processor.
> > +        */
> > +       if (device->status.enabled)
> > +               return true;
> > +
> > +       return !acpi_device_is_processor(device);
> >  }
> >  EXPORT_SYMBOL_GPL(acpi_dev_ready_for_enumeration);
> >
> > --  
> 
> I can queue this up for 6.9 as it looks like the rest of the series
> will still need some work.  What do you think?

The sooner this goes in the sooner we discover if some of the bios bug
workarounds we have dropped form the series are in reality necessary
(i.e. get it into big board test farms).

So I'm definitely keen to see this go in for 6.9.

Hopefully we can make rapid progress on the rest of the series and
hammer out which of the remaining subtle differences between
the two flows are real vs code evolution issues.

Jonathan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-02-15 19:22   ` Rafael J. Wysocki
@ 2024-02-20 11:27     ` Russell King (Oracle)
  2024-02-20 15:13       ` Russell King (Oracle)
  2024-02-20 20:59     ` Thomas Gleixner
  2024-03-22 18:53     ` Jonathan Cameron
  2 siblings, 1 reply; 42+ messages in thread
From: Russell King (Oracle) @ 2024-02-20 11:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse,
	Jonathan Cameron

On Thu, Feb 15, 2024 at 08:22:29PM +0100, Rafael J. Wysocki wrote:
> On Wed, Jan 31, 2024 at 5:50 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index cf7c1cca69dd..a68c475cdea5 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -314,6 +314,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> >                         cpufreq_add_device("acpi-cpufreq");
> >         }
> >
> > +       /*
> > +        * Register CPUs that are present. get_cpu_device() is used to skip
> > +        * duplicate CPU descriptions from firmware.
> > +        */
> > +       if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > +           !get_cpu_device(pr->id)) {
> > +               int ret = arch_register_cpu(pr->id);
> > +
> > +               if (ret)
> > +                       return ret;
> > +       }
> > +
> >         /*
> >          *  Extra Processor objects may be enumerated on MP systems with
> >          *  less than the max # of CPUs. They should be ignored _iff
> 
> This is interesting, because right below there is the following code:
> 
>     if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
>         int ret = acpi_processor_hotadd_init(pr);
> 
>         if (ret)
>             return ret;
>     }
> 
> and acpi_processor_hotadd_init() essentially calls arch_register_cpu()
> with some extra things around it (more about that below).
> 
> I do realize that acpi_processor_hotadd_init() is defined under
> CONFIG_ACPI_HOTPLUG_CPU, so for the sake of the argument let's
> consider an architecture where CONFIG_ACPI_HOTPLUG_CPU is set.
> 
> So why are the two conditionals that almost contradict each other both
> needed?  It looks like the new code could be combined with
> acpi_processor_hotadd_init() to do the right thing in all cases.
> 
> Now, acpi_processor_hotadd_init() does some extra things that look
> like they should be done by the new code too.
> 
> 1. It checks invalid_phys_cpuid() which appears to be a good idea to me.
> 
> 2. It uses locking around arch_register_cpu() which doesn't seem
> unreasonable either.
> 
> 3. It calls acpi_map_cpu() and I'm not sure why this is not done by
> the new code.
> 
> The only thing that can be dropped from it is the _STA check AFAICS,
> because acpi_processor_add() won't even be called if the CPU is not
> present (and not enabled after the first patch).
> 
> So why does the code not do 1 - 3 above?

Honestly, I'm out of my depth with this and can't answer your
questions - and I really don't want to try fiddling with this code
because it's just too icky (even in its current form in mainline)
to be understandable to anyone who hasn't gained a detailed knowledge
of this code.

It's going to require a lot of analysis - how acpi_map_cpuid() behaves
in all circumstances, what this means for invalid_logical_cpuid() and
invalid_phys_cpuid(), what paths will be taken in each case. This code
is already just too hairy for someone who isn't an experienced ACPI
hacker to be able to follow and I don't see an obvious way to make it
more readable.

James' additions make it even more complex and less readable.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 01/15] ACPI: Only enumerate enabled (or functional) processor devices
  2024-02-15 20:10   ` Rafael J. Wysocki
  2024-02-19  9:45     ` Jonathan Cameron
@ 2024-02-20 11:30     ` Russell King (Oracle)
  1 sibling, 0 replies; 42+ messages in thread
From: Russell King (Oracle) @ 2024-02-20 11:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse,
	Jonathan Cameron

On Thu, Feb 15, 2024 at 09:10:39PM +0100, Rafael J. Wysocki wrote:
> On Wed, Jan 31, 2024 at 5:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
> >
> > From: James Morse <james.morse@arm.com>
> >
> > Today the ACPI enumeration code 'visits' all devices that are present.
> >
> > This is a problem for arm64, where CPUs are always present, but not
> > always enabled. When a device-check occurs because the firmware-policy
> > has changed and a CPU is now enabled, the following error occurs:
> > | acpi ACPI0007:48: Enumeration failure
> >
> > This is ultimately because acpi_dev_ready_for_enumeration() returns
> > true for a device that is not enabled. The ACPI Processor driver
> > will not register such CPUs as they are not 'decoding their resources'.
> >
> > ACPI allows a device to be functional instead of maintaining the
> > present and enabled bit, but we can't simply check the enabled bit
> > for all devices since firmware can be buggy.
> >
> > If ACPI indicates that the device is present and enabled, then all well
> > and good, we can enumate it. However, if the device is present and not
> > enabled, then we also check whether the device is a processor device
> > to limit the impact of this new check to just processor devices.
> >
> > This avoids enumerating present && functional processor devices that
> > are not enabled.
> >
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Co-developed-by: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> 
> I can queue this up for 6.9 as it looks like the rest of the series
> will still need some work.  What do you think?

That seems to be the only way we can make some progress with this
series. I've no idea how we progress from here because I can't answer
your questions on patch 2.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-02-20 11:27     ` Russell King (Oracle)
@ 2024-02-20 15:13       ` Russell King (Oracle)
  2024-02-20 16:24         ` Jonathan Cameron
  0 siblings, 1 reply; 42+ messages in thread
From: Russell King (Oracle) @ 2024-02-20 15:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse,
	Jonathan Cameron

On Tue, Feb 20, 2024 at 11:27:15AM +0000, Russell King (Oracle) wrote:
> On Thu, Feb 15, 2024 at 08:22:29PM +0100, Rafael J. Wysocki wrote:
> > On Wed, Jan 31, 2024 at 5:50 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
> > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > index cf7c1cca69dd..a68c475cdea5 100644
> > > --- a/drivers/acpi/acpi_processor.c
> > > +++ b/drivers/acpi/acpi_processor.c
> > > @@ -314,6 +314,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> > >                         cpufreq_add_device("acpi-cpufreq");
> > >         }
> > >
> > > +       /*
> > > +        * Register CPUs that are present. get_cpu_device() is used to skip
> > > +        * duplicate CPU descriptions from firmware.
> > > +        */
> > > +       if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > > +           !get_cpu_device(pr->id)) {
> > > +               int ret = arch_register_cpu(pr->id);
> > > +
> > > +               if (ret)
> > > +                       return ret;
> > > +       }
> > > +
> > >         /*
> > >          *  Extra Processor objects may be enumerated on MP systems with
> > >          *  less than the max # of CPUs. They should be ignored _iff
> > 
> > This is interesting, because right below there is the following code:
> > 
> >     if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> >         int ret = acpi_processor_hotadd_init(pr);
> > 
> >         if (ret)
> >             return ret;
> >     }
> > 
> > and acpi_processor_hotadd_init() essentially calls arch_register_cpu()
> > with some extra things around it (more about that below).
> > 
> > I do realize that acpi_processor_hotadd_init() is defined under
> > CONFIG_ACPI_HOTPLUG_CPU, so for the sake of the argument let's
> > consider an architecture where CONFIG_ACPI_HOTPLUG_CPU is set.
> > 
> > So why are the two conditionals that almost contradict each other both
> > needed?  It looks like the new code could be combined with
> > acpi_processor_hotadd_init() to do the right thing in all cases.
> > 
> > Now, acpi_processor_hotadd_init() does some extra things that look
> > like they should be done by the new code too.
> > 
> > 1. It checks invalid_phys_cpuid() which appears to be a good idea to me.
> > 
> > 2. It uses locking around arch_register_cpu() which doesn't seem
> > unreasonable either.
> > 
> > 3. It calls acpi_map_cpu() and I'm not sure why this is not done by
> > the new code.
> > 
> > The only thing that can be dropped from it is the _STA check AFAICS,
> > because acpi_processor_add() won't even be called if the CPU is not
> > present (and not enabled after the first patch).
> > 
> > So why does the code not do 1 - 3 above?
> 
> Honestly, I'm out of my depth with this and can't answer your
> questions - and I really don't want to try fiddling with this code
> because it's just too icky (even in its current form in mainline)
> to be understandable to anyone who hasn't gained a detailed knowledge
> of this code.
> 
> It's going to require a lot of analysis - how acpi_map_cpuid() behaves
> in all circumstances, what this means for invalid_logical_cpuid() and
> invalid_phys_cpuid(), what paths will be taken in each case. This code
> is already just too hairy for someone who isn't an experienced ACPI
> hacker to be able to follow and I don't see an obvious way to make it
> more readable.
> 
> James' additions make it even more complex and less readable.

As an illustration of the problems I'm having here, I was just writing
a reply to this with a suggestion of transforming this code ultimately
to:

	if (!get_cpu_device(pr->id)) {
		int ret;

		if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id))
			ret = acpi_processor_make_enabled(pr);
		else
			ret = acpi_processor_make_present(pr);

		if (ret)
			return ret;
	}

(acpi_processor_make_present() would be acpi_processor_hotadd_init()
and acpi_processor_make_enabled() would be arch_register_cpu() at this
point.)

Then I realised that's a bad idea - because we really need to check
that pr->id is valid before calling get_cpu_device() on it, so this
won't work. That leaves us with:

	int ret;

	if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
		/* x86 et.al. path */
		ret = acpi_processor_make_present(pr);
	} else if (!get_cpu_device(pr->id)) {
		/* Arm64 path */
		ret = acpi_processor_make_enabled(pr);
	} else {
		ret = 0;
	}

	if (ret)
		return ret;

Now, the next transformation would be to move !get_cpu_device(pr->id)
into acpi_processor_make_enabled() which would eliminate one of those
if() legs.

Now, if we want to somehow make the call to arch_regster_cpu() common
in these two paths, the next question is what are the _precise_
semantics of acpi_map_cpu(), particularly with respect to it
modifying pr->id. Is it guaranteed to always give the same result
for the same processor described in ACPI? What acpi_map_cpu() anyway,
I can find no documentation for it.

Then there's the question whether calling acpi_unmap_cpu() should be
done on the failure path if arch_register_cpu() fails, which is done
for the x86 path but not the Arm64 path. Should it be done for the
Arm64 path? I've no idea, but as Arm64 doesn't implement either of
these two functions, I guess they could be stubbed out and thus be
no-ops - but then we open a hole where if pr->id is invalid, we
end up passing that invalid value to arch_register_cpu() which I'm
quite sure will explode with a negative CPU number.

So, to my mind, what you're effectively asking for is a total rewrite
of all the code in and called by acpi_processor_get_info()... and that
is not something I am willing to do (because it's too far outside of
my knowledge area.)

As I said in my reply to patch 1, I think your comments on patch 2
make Arm64 vcpu hotplug unachievable in a reasonable time frame, and
certainly outside the bounds of what I can do to progress this.

So, at this point I'm going to stand down from further participation
with this patch set as I believe I've reached the limit of what I can
do to progress it.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-02-20 15:13       ` Russell King (Oracle)
@ 2024-02-20 16:24         ` Jonathan Cameron
  2024-02-20 19:59           ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Jonathan Cameron @ 2024-02-20 16:24 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Rafael J. Wysocki, linux-pm, loongarch, linux-acpi, linux-arch,
	linux-kernel, linux-arm-kernel, linux-riscv, kvmarm, x86,
	acpica-devel, linux-csky, linux-doc, linux-ia64, linux-parisc,
	Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse

On Tue, 20 Feb 2024 15:13:58 +0000
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Tue, Feb 20, 2024 at 11:27:15AM +0000, Russell King (Oracle) wrote:
> > On Thu, Feb 15, 2024 at 08:22:29PM +0100, Rafael J. Wysocki wrote:  
> > > On Wed, Jan 31, 2024 at 5:50 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:  
> > > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > > index cf7c1cca69dd..a68c475cdea5 100644
> > > > --- a/drivers/acpi/acpi_processor.c
> > > > +++ b/drivers/acpi/acpi_processor.c
> > > > @@ -314,6 +314,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> > > >                         cpufreq_add_device("acpi-cpufreq");
> > > >         }
> > > >
> > > > +       /*
> > > > +        * Register CPUs that are present. get_cpu_device() is used to skip
> > > > +        * duplicate CPU descriptions from firmware.
> > > > +        */
> > > > +       if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > > > +           !get_cpu_device(pr->id)) {
> > > > +               int ret = arch_register_cpu(pr->id);
> > > > +
> > > > +               if (ret)
> > > > +                       return ret;
> > > > +       }
> > > > +
> > > >         /*
> > > >          *  Extra Processor objects may be enumerated on MP systems with
> > > >          *  less than the max # of CPUs. They should be ignored _iff  
> > > 
> > > This is interesting, because right below there is the following code:
> > > 
> > >     if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> > >         int ret = acpi_processor_hotadd_init(pr);
> > > 
> > >         if (ret)
> > >             return ret;
> > >     }
> > > 
> > > and acpi_processor_hotadd_init() essentially calls arch_register_cpu()
> > > with some extra things around it (more about that below).
> > > 
> > > I do realize that acpi_processor_hotadd_init() is defined under
> > > CONFIG_ACPI_HOTPLUG_CPU, so for the sake of the argument let's
> > > consider an architecture where CONFIG_ACPI_HOTPLUG_CPU is set.
> > > 
> > > So why are the two conditionals that almost contradict each other both
> > > needed?  It looks like the new code could be combined with
> > > acpi_processor_hotadd_init() to do the right thing in all cases.
> > > 
> > > Now, acpi_processor_hotadd_init() does some extra things that look
> > > like they should be done by the new code too.
> > > 
> > > 1. It checks invalid_phys_cpuid() which appears to be a good idea to me.
> > > 
> > > 2. It uses locking around arch_register_cpu() which doesn't seem
> > > unreasonable either.
> > > 
> > > 3. It calls acpi_map_cpu() and I'm not sure why this is not done by
> > > the new code.
> > > 
> > > The only thing that can be dropped from it is the _STA check AFAICS,
> > > because acpi_processor_add() won't even be called if the CPU is not
> > > present (and not enabled after the first patch).
> > > 
> > > So why does the code not do 1 - 3 above?  
> > 
> > Honestly, I'm out of my depth with this and can't answer your
> > questions - and I really don't want to try fiddling with this code
> > because it's just too icky (even in its current form in mainline)
> > to be understandable to anyone who hasn't gained a detailed knowledge
> > of this code.
> > 
> > It's going to require a lot of analysis - how acpi_map_cpuid() behaves
> > in all circumstances, what this means for invalid_logical_cpuid() and
> > invalid_phys_cpuid(), what paths will be taken in each case. This code
> > is already just too hairy for someone who isn't an experienced ACPI
> > hacker to be able to follow and I don't see an obvious way to make it
> > more readable.
> > 
> > James' additions make it even more complex and less readable.  
> 
> As an illustration of the problems I'm having here, I was just writing
> a reply to this with a suggestion of transforming this code ultimately
> to:
> 
> 	if (!get_cpu_device(pr->id)) {
> 		int ret;
> 
> 		if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id))
> 			ret = acpi_processor_make_enabled(pr);
> 		else
> 			ret = acpi_processor_make_present(pr);
> 
> 		if (ret)
> 			return ret;
> 	}
> 
> (acpi_processor_make_present() would be acpi_processor_hotadd_init()
> and acpi_processor_make_enabled() would be arch_register_cpu() at this
> point.)
> 
> Then I realised that's a bad idea - because we really need to check
> that pr->id is valid before calling get_cpu_device() on it, so this
> won't work. That leaves us with:
> 
> 	int ret;
> 
> 	if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> 		/* x86 et.al. path */
> 		ret = acpi_processor_make_present(pr);
> 	} else if (!get_cpu_device(pr->id)) {
> 		/* Arm64 path */
> 		ret = acpi_processor_make_enabled(pr);
> 	} else {
> 		ret = 0;
> 	}
> 
> 	if (ret)
> 		return ret;
> 
> Now, the next transformation would be to move !get_cpu_device(pr->id)
> into acpi_processor_make_enabled() which would eliminate one of those
> if() legs.
> 
> Now, if we want to somehow make the call to arch_regster_cpu() common
> in these two paths, the next question is what are the _precise_
> semantics of acpi_map_cpu(), particularly with respect to it
> modifying pr->id. Is it guaranteed to always give the same result
> for the same processor described in ACPI? What acpi_map_cpu() anyway,
> I can find no documentation for it.
> 
> Then there's the question whether calling acpi_unmap_cpu() should be
> done on the failure path if arch_register_cpu() fails, which is done
> for the x86 path but not the Arm64 path. Should it be done for the
> Arm64 path? I've no idea, but as Arm64 doesn't implement either of
> these two functions, I guess they could be stubbed out and thus be
> no-ops - but then we open a hole where if pr->id is invalid, we
> end up passing that invalid value to arch_register_cpu() which I'm
> quite sure will explode with a negative CPU number.
> 
> So, to my mind, what you're effectively asking for is a total rewrite
> of all the code in and called by acpi_processor_get_info()... and that
> is not something I am willing to do (because it's too far outside of
> my knowledge area.)
> 
> As I said in my reply to patch 1, I think your comments on patch 2
> make Arm64 vcpu hotplug unachievable in a reasonable time frame, and
> certainly outside the bounds of what I can do to progress this.
> 
> So, at this point I'm going to stand down from further participation
> with this patch set as I believe I've reached the limit of what I can
> do to progress it.
> 

Thanks for your hard work on this Russell - we have moved forwards.

Short of anyone else stepping up I'll pick this up with
the help of some my colleagues. As such I'm keen on getting patch
1 upstream ASAP so that we can exclude the need for some of the
other workarounds from earlier versions of this series (the ones
dropped before now).

We will need a little time to get up to speed on the current status
and discussion points Russell raises above.

Jonathan



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-02-20 16:24         ` Jonathan Cameron
@ 2024-02-20 19:59           ` Rafael J. Wysocki
  2024-02-21 12:04             ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-02-20 19:59 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Russell King (Oracle),
	Rafael J. Wysocki, linux-pm, loongarch, linux-acpi, linux-arch,
	linux-kernel, linux-arm-kernel, linux-riscv, kvmarm, x86,
	acpica-devel, linux-csky, linux-doc, linux-ia64, linux-parisc,
	Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse

On Tue, Feb 20, 2024 at 5:24 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Tue, 20 Feb 2024 15:13:58 +0000
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
>
> > On Tue, Feb 20, 2024 at 11:27:15AM +0000, Russell King (Oracle) wrote:
> > > On Thu, Feb 15, 2024 at 08:22:29PM +0100, Rafael J. Wysocki wrote:
> > > > On Wed, Jan 31, 2024 at 5:50 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
> > > > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > > > index cf7c1cca69dd..a68c475cdea5 100644
> > > > > --- a/drivers/acpi/acpi_processor.c
> > > > > +++ b/drivers/acpi/acpi_processor.c
> > > > > @@ -314,6 +314,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> > > > >                         cpufreq_add_device("acpi-cpufreq");
> > > > >         }
> > > > >
> > > > > +       /*
> > > > > +        * Register CPUs that are present. get_cpu_device() is used to skip
> > > > > +        * duplicate CPU descriptions from firmware.
> > > > > +        */
> > > > > +       if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > > > > +           !get_cpu_device(pr->id)) {
> > > > > +               int ret = arch_register_cpu(pr->id);
> > > > > +
> > > > > +               if (ret)
> > > > > +                       return ret;
> > > > > +       }
> > > > > +
> > > > >         /*
> > > > >          *  Extra Processor objects may be enumerated on MP systems with
> > > > >          *  less than the max # of CPUs. They should be ignored _iff
> > > >
> > > > This is interesting, because right below there is the following code:
> > > >
> > > >     if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> > > >         int ret = acpi_processor_hotadd_init(pr);
> > > >
> > > >         if (ret)
> > > >             return ret;
> > > >     }
> > > >
> > > > and acpi_processor_hotadd_init() essentially calls arch_register_cpu()
> > > > with some extra things around it (more about that below).
> > > >
> > > > I do realize that acpi_processor_hotadd_init() is defined under
> > > > CONFIG_ACPI_HOTPLUG_CPU, so for the sake of the argument let's
> > > > consider an architecture where CONFIG_ACPI_HOTPLUG_CPU is set.
> > > >
> > > > So why are the two conditionals that almost contradict each other both
> > > > needed?  It looks like the new code could be combined with
> > > > acpi_processor_hotadd_init() to do the right thing in all cases.
> > > >
> > > > Now, acpi_processor_hotadd_init() does some extra things that look
> > > > like they should be done by the new code too.
> > > >
> > > > 1. It checks invalid_phys_cpuid() which appears to be a good idea to me.
> > > >
> > > > 2. It uses locking around arch_register_cpu() which doesn't seem
> > > > unreasonable either.
> > > >
> > > > 3. It calls acpi_map_cpu() and I'm not sure why this is not done by
> > > > the new code.
> > > >
> > > > The only thing that can be dropped from it is the _STA check AFAICS,
> > > > because acpi_processor_add() won't even be called if the CPU is not
> > > > present (and not enabled after the first patch).
> > > >
> > > > So why does the code not do 1 - 3 above?
> > >
> > > Honestly, I'm out of my depth with this and can't answer your
> > > questions - and I really don't want to try fiddling with this code
> > > because it's just too icky (even in its current form in mainline)
> > > to be understandable to anyone who hasn't gained a detailed knowledge
> > > of this code.
> > >
> > > It's going to require a lot of analysis - how acpi_map_cpuid() behaves
> > > in all circumstances, what this means for invalid_logical_cpuid() and
> > > invalid_phys_cpuid(), what paths will be taken in each case. This code
> > > is already just too hairy for someone who isn't an experienced ACPI
> > > hacker to be able to follow and I don't see an obvious way to make it
> > > more readable.
> > >
> > > James' additions make it even more complex and less readable.
> >
> > As an illustration of the problems I'm having here, I was just writing
> > a reply to this with a suggestion of transforming this code ultimately
> > to:
> >
> >       if (!get_cpu_device(pr->id)) {
> >               int ret;
> >
> >               if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id))
> >                       ret = acpi_processor_make_enabled(pr);
> >               else
> >                       ret = acpi_processor_make_present(pr);
> >
> >               if (ret)
> >                       return ret;
> >       }
> >
> > (acpi_processor_make_present() would be acpi_processor_hotadd_init()
> > and acpi_processor_make_enabled() would be arch_register_cpu() at this
> > point.)
> >
> > Then I realised that's a bad idea - because we really need to check
> > that pr->id is valid before calling get_cpu_device() on it, so this
> > won't work. That leaves us with:
> >
> >       int ret;
> >
> >       if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> >               /* x86 et.al. path */
> >               ret = acpi_processor_make_present(pr);
> >       } else if (!get_cpu_device(pr->id)) {
> >               /* Arm64 path */
> >               ret = acpi_processor_make_enabled(pr);
> >       } else {
> >               ret = 0;
> >       }
> >
> >       if (ret)
> >               return ret;
> >
> > Now, the next transformation would be to move !get_cpu_device(pr->id)
> > into acpi_processor_make_enabled() which would eliminate one of those
> > if() legs.
> >
> > Now, if we want to somehow make the call to arch_regster_cpu() common
> > in these two paths, the next question is what are the _precise_
> > semantics of acpi_map_cpu(), particularly with respect to it
> > modifying pr->id. Is it guaranteed to always give the same result
> > for the same processor described in ACPI? What acpi_map_cpu() anyway,
> > I can find no documentation for it.
> >
> > Then there's the question whether calling acpi_unmap_cpu() should be
> > done on the failure path if arch_register_cpu() fails, which is done
> > for the x86 path but not the Arm64 path. Should it be done for the
> > Arm64 path? I've no idea, but as Arm64 doesn't implement either of
> > these two functions, I guess they could be stubbed out and thus be
> > no-ops - but then we open a hole where if pr->id is invalid, we
> > end up passing that invalid value to arch_register_cpu() which I'm
> > quite sure will explode with a negative CPU number.
> >
> > So, to my mind, what you're effectively asking for is a total rewrite
> > of all the code in and called by acpi_processor_get_info()... and that
> > is not something I am willing to do (because it's too far outside of
> > my knowledge area.)
> >
> > As I said in my reply to patch 1, I think your comments on patch 2
> > make Arm64 vcpu hotplug unachievable in a reasonable time frame, and
> > certainly outside the bounds of what I can do to progress this.
> >
> > So, at this point I'm going to stand down from further participation
> > with this patch set as I believe I've reached the limit of what I can
> > do to progress it.
> >
>
> Thanks for your hard work on this Russell - we have moved forwards.
>
> Short of anyone else stepping up I'll pick this up with
> the help of some my colleagues. As such I'm keen on getting patch
> 1 upstream ASAP so that we can exclude the need for some of the
> other workarounds from earlier versions of this series (the ones
> dropped before now).

Applied (as 6.9 material).

> We will need a little time to get up to speed on the current status
> and discussion points Russell raises above.

Sure.

I'm planning to send comments for some other patches in the series this week.

Thanks!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-02-15 19:22   ` Rafael J. Wysocki
  2024-02-20 11:27     ` Russell King (Oracle)
@ 2024-02-20 20:59     ` Thomas Gleixner
  2024-03-22 18:53     ` Jonathan Cameron
  2 siblings, 0 replies; 42+ messages in thread
From: Thomas Gleixner @ 2024-02-20 20:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, Russell King
  Cc: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse,
	Jonathan Cameron, Rafael J. Wysocki

On Thu, Feb 15 2024 at 20:22, Rafael J. Wysocki wrote:
> On Wed, Jan 31, 2024 at 5:50 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
>> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
>> index 47de0f140ba6..13d052bf13f4 100644
>> --- a/drivers/base/cpu.c
>> +++ b/drivers/base/cpu.c
>> @@ -553,7 +553,11 @@ static void __init cpu_dev_register_generic(void)
>>  {
>>         int i, ret;
>>
>> -       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
>> +       /*
>> +        * When ACPI is enabled, CPUs are registered via
>> +        * acpi_processor_get_info().
>> +        */
>> +       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES) || !acpi_disabled)
>>                 return;
>
> Honestly, this looks like a quick hack to me and it absolutely
> requires an ACK from the x86 maintainers to go anywhere.

That should work, but yes I agree it's all but pretty.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-02-20 19:59           ` Rafael J. Wysocki
@ 2024-02-21 12:04             ` Rafael J. Wysocki
  0 siblings, 0 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-02-21 12:04 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Russell King (Oracle),
	linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse

On Tue, Feb 20, 2024 at 8:59 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Feb 20, 2024 at 5:24 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Tue, 20 Feb 2024 15:13:58 +0000
> > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> >
> > > On Tue, Feb 20, 2024 at 11:27:15AM +0000, Russell King (Oracle) wrote:
> > > > On Thu, Feb 15, 2024 at 08:22:29PM +0100, Rafael J. Wysocki wrote:
> > > > > On Wed, Jan 31, 2024 at 5:50 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
> > > > > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > > > > index cf7c1cca69dd..a68c475cdea5 100644
> > > > > > --- a/drivers/acpi/acpi_processor.c
> > > > > > +++ b/drivers/acpi/acpi_processor.c
> > > > > > @@ -314,6 +314,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> > > > > >                         cpufreq_add_device("acpi-cpufreq");
> > > > > >         }
> > > > > >
> > > > > > +       /*
> > > > > > +        * Register CPUs that are present. get_cpu_device() is used to skip
> > > > > > +        * duplicate CPU descriptions from firmware.
> > > > > > +        */
> > > > > > +       if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > > > > > +           !get_cpu_device(pr->id)) {
> > > > > > +               int ret = arch_register_cpu(pr->id);
> > > > > > +
> > > > > > +               if (ret)
> > > > > > +                       return ret;
> > > > > > +       }
> > > > > > +
> > > > > >         /*
> > > > > >          *  Extra Processor objects may be enumerated on MP systems with
> > > > > >          *  less than the max # of CPUs. They should be ignored _iff
> > > > >
> > > > > This is interesting, because right below there is the following code:
> > > > >
> > > > >     if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> > > > >         int ret = acpi_processor_hotadd_init(pr);
> > > > >
> > > > >         if (ret)
> > > > >             return ret;
> > > > >     }
> > > > >
> > > > > and acpi_processor_hotadd_init() essentially calls arch_register_cpu()
> > > > > with some extra things around it (more about that below).
> > > > >
> > > > > I do realize that acpi_processor_hotadd_init() is defined under
> > > > > CONFIG_ACPI_HOTPLUG_CPU, so for the sake of the argument let's
> > > > > consider an architecture where CONFIG_ACPI_HOTPLUG_CPU is set.
> > > > >
> > > > > So why are the two conditionals that almost contradict each other both
> > > > > needed?  It looks like the new code could be combined with
> > > > > acpi_processor_hotadd_init() to do the right thing in all cases.
> > > > >
> > > > > Now, acpi_processor_hotadd_init() does some extra things that look
> > > > > like they should be done by the new code too.
> > > > >
> > > > > 1. It checks invalid_phys_cpuid() which appears to be a good idea to me.
> > > > >
> > > > > 2. It uses locking around arch_register_cpu() which doesn't seem
> > > > > unreasonable either.
> > > > >
> > > > > 3. It calls acpi_map_cpu() and I'm not sure why this is not done by
> > > > > the new code.
> > > > >
> > > > > The only thing that can be dropped from it is the _STA check AFAICS,
> > > > > because acpi_processor_add() won't even be called if the CPU is not
> > > > > present (and not enabled after the first patch).
> > > > >
> > > > > So why does the code not do 1 - 3 above?
> > > >
> > > > Honestly, I'm out of my depth with this and can't answer your
> > > > questions - and I really don't want to try fiddling with this code
> > > > because it's just too icky (even in its current form in mainline)
> > > > to be understandable to anyone who hasn't gained a detailed knowledge
> > > > of this code.
> > > >
> > > > It's going to require a lot of analysis - how acpi_map_cpuid() behaves
> > > > in all circumstances, what this means for invalid_logical_cpuid() and
> > > > invalid_phys_cpuid(), what paths will be taken in each case. This code
> > > > is already just too hairy for someone who isn't an experienced ACPI
> > > > hacker to be able to follow and I don't see an obvious way to make it
> > > > more readable.
> > > >
> > > > James' additions make it even more complex and less readable.
> > >
> > > As an illustration of the problems I'm having here, I was just writing
> > > a reply to this with a suggestion of transforming this code ultimately
> > > to:
> > >
> > >       if (!get_cpu_device(pr->id)) {
> > >               int ret;
> > >
> > >               if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id))
> > >                       ret = acpi_processor_make_enabled(pr);
> > >               else
> > >                       ret = acpi_processor_make_present(pr);
> > >
> > >               if (ret)
> > >                       return ret;
> > >       }
> > >
> > > (acpi_processor_make_present() would be acpi_processor_hotadd_init()
> > > and acpi_processor_make_enabled() would be arch_register_cpu() at this
> > > point.)
> > >
> > > Then I realised that's a bad idea - because we really need to check
> > > that pr->id is valid before calling get_cpu_device() on it, so this
> > > won't work. That leaves us with:
> > >
> > >       int ret;
> > >
> > >       if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> > >               /* x86 et.al. path */
> > >               ret = acpi_processor_make_present(pr);
> > >       } else if (!get_cpu_device(pr->id)) {
> > >               /* Arm64 path */
> > >               ret = acpi_processor_make_enabled(pr);
> > >       } else {
> > >               ret = 0;
> > >       }
> > >
> > >       if (ret)
> > >               return ret;
> > >
> > > Now, the next transformation would be to move !get_cpu_device(pr->id)
> > > into acpi_processor_make_enabled() which would eliminate one of those
> > > if() legs.
> > >
> > > Now, if we want to somehow make the call to arch_regster_cpu() common
> > > in these two paths, the next question is what are the _precise_
> > > semantics of acpi_map_cpu(), particularly with respect to it
> > > modifying pr->id. Is it guaranteed to always give the same result
> > > for the same processor described in ACPI? What acpi_map_cpu() anyway,
> > > I can find no documentation for it.
> > >
> > > Then there's the question whether calling acpi_unmap_cpu() should be
> > > done on the failure path if arch_register_cpu() fails, which is done
> > > for the x86 path but not the Arm64 path. Should it be done for the
> > > Arm64 path? I've no idea, but as Arm64 doesn't implement either of
> > > these two functions, I guess they could be stubbed out and thus be
> > > no-ops - but then we open a hole where if pr->id is invalid, we
> > > end up passing that invalid value to arch_register_cpu() which I'm
> > > quite sure will explode with a negative CPU number.
> > >
> > > So, to my mind, what you're effectively asking for is a total rewrite
> > > of all the code in and called by acpi_processor_get_info()... and that
> > > is not something I am willing to do (because it's too far outside of
> > > my knowledge area.)
> > >
> > > As I said in my reply to patch 1, I think your comments on patch 2
> > > make Arm64 vcpu hotplug unachievable in a reasonable time frame, and
> > > certainly outside the bounds of what I can do to progress this.
> > >
> > > So, at this point I'm going to stand down from further participation
> > > with this patch set as I believe I've reached the limit of what I can
> > > do to progress it.
> > >
> >
> > Thanks for your hard work on this Russell - we have moved forwards.
> >
> > Short of anyone else stepping up I'll pick this up with
> > the help of some my colleagues. As such I'm keen on getting patch
> > 1 upstream ASAP so that we can exclude the need for some of the
> > other workarounds from earlier versions of this series (the ones
> > dropped before now).
>
> Applied (as 6.9 material).

And I'm going to drop it, because it is not correct.

The problem is that it is going to affect non-processor devices, but
let me comment on that patch itself.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 01/15] ACPI: Only enumerate enabled (or functional) processor devices
  2024-01-31 16:49 ` [PATCH RFC v4 01/15] ACPI: Only enumerate enabled (or functional) processor devices Russell King
  2024-01-31 17:25   ` Miguel Luis
  2024-02-15 20:10   ` Rafael J. Wysocki
@ 2024-02-21 13:01   ` Rafael J. Wysocki
  2 siblings, 0 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-02-21 13:01 UTC (permalink / raw)
  To: Russell King, Jonathan Cameron
  Cc: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse

On Wed, Jan 31, 2024 at 5:49 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
>
> From: James Morse <james.morse@arm.com>
>
> Today the ACPI enumeration code 'visits' all devices that are present.
>
> This is a problem for arm64, where CPUs are always present, but not
> always enabled. When a device-check occurs because the firmware-policy
> has changed and a CPU is now enabled, the following error occurs:
> | acpi ACPI0007:48: Enumeration failure
>
> This is ultimately because acpi_dev_ready_for_enumeration() returns
> true for a device that is not enabled. The ACPI Processor driver
> will not register such CPUs as they are not 'decoding their resources'.
>
> ACPI allows a device to be functional instead of maintaining the
> present and enabled bit, but we can't simply check the enabled bit
> for all devices since firmware can be buggy.
>
> If ACPI indicates that the device is present and enabled, then all well
> and good, we can enumate it. However, if the device is present and not
> enabled, then we also check whether the device is a processor device
> to limit the impact of this new check to just processor devices.
>
> This avoids enumerating present && functional processor devices that
> are not enabled.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Co-developed-by: Rafael J. Wysocki <rjw@rjwysocki.net>
> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
> Changes since RFC v2:
>  * Incorporate comment suggestion by Gavin Shan.
> Changes since RFC v3:
>  * Fixed "sert" typo.
> Changes since RFC v3 (smaller series):
>  * Restrict checking the enabled bit to processor devices, update
>    commit comments.
>  * Use Rafael's suggestion in
>    https://lore.kernel.org/r/5760569.DvuYhMxLoT@kreacher
>  * Updated with a fix - see:
>    https://lore.kernel.org/all/Zbe8WQRASx6D6RaG@shell.armlinux.org.uk/
> ---
>  drivers/acpi/acpi_processor.c | 11 +++++++++
>  drivers/acpi/device_pm.c      |  2 +-
>  drivers/acpi/device_sysfs.c   |  2 +-
>  drivers/acpi/internal.h       |  4 ++-
>  drivers/acpi/property.c       |  2 +-
>  drivers/acpi/scan.c           | 46 +++++++++++++++++++++++++++--------
>  6 files changed, 53 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 4fe2ef54088c..cf7c1cca69dd 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -626,6 +626,17 @@ static struct acpi_scan_handler processor_handler = {
>         },
>  };
>
> +bool acpi_device_is_processor(const struct acpi_device *adev)
> +{
> +       if (adev->device_type == ACPI_BUS_TYPE_PROCESSOR)
> +               return true;
> +
> +       if (adev->device_type != ACPI_BUS_TYPE_DEVICE)
> +               return false;
> +
> +       return acpi_scan_check_handler(adev, &processor_handler);
> +}
> +
>  static int acpi_processor_container_attach(struct acpi_device *dev,
>                                            const struct acpi_device_id *id)
>  {
> diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
> index 3b4d048c4941..e3c80f3b3b57 100644
> --- a/drivers/acpi/device_pm.c
> +++ b/drivers/acpi/device_pm.c
> @@ -313,7 +313,7 @@ int acpi_bus_init_power(struct acpi_device *device)
>                 return -EINVAL;
>
>         device->power.state = ACPI_STATE_UNKNOWN;
> -       if (!acpi_device_is_present(device)) {
> +       if (!acpi_dev_ready_for_enumeration(device)) {

Sorry for failing to catch this earlier, but this change affects
non-processor devices possibly adversely.

Namely, one of the differences between acpi_device_is_present() and
acpi_dev_ready_for_enumeration() is the (device->flags.honor_deps &&
device->dep_unmet) check in the latter which is not present in the
former which may cause the power_manageable flag to be unset for
devices with dependencies, although they are in fact power-manageable.

The replacement here cannot be made.

>                 device->flags.initialized = false;
>                 return -ENXIO;
>         }
> diff --git a/drivers/acpi/device_sysfs.c b/drivers/acpi/device_sysfs.c
> index 23373faa35ec..a0256d2493a7 100644
> --- a/drivers/acpi/device_sysfs.c
> +++ b/drivers/acpi/device_sysfs.c
> @@ -141,7 +141,7 @@ static int create_pnp_modalias(const struct acpi_device *acpi_dev, char *modalia
>         struct acpi_hardware_id *id;
>
>         /* Avoid unnecessarily loading modules for non present devices. */
> -       if (!acpi_device_is_present(acpi_dev))
> +       if (!acpi_dev_ready_for_enumeration(acpi_dev))

The replacement here is incorrect for an analogous reason as above: it
may cause modalias creation to be skipped for devices with unmet
dependencies that are not processors and matching modules for them
should be loaded.

In fact, this replacement doesn't even have a functional effect on
processors, because there are no modules matching the processor device
ID AFAICS.

>                 return 0;
>
>         /*
> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
> index 6588525c45ef..1bc8b6db60c5 100644
> --- a/drivers/acpi/internal.h
> +++ b/drivers/acpi/internal.h
> @@ -62,6 +62,8 @@ void acpi_sysfs_add_hotplug_profile(struct acpi_hotplug_profile *hotplug,
>  int acpi_scan_add_handler_with_hotplug(struct acpi_scan_handler *handler,
>                                        const char *hotplug_profile_name);
>  void acpi_scan_hotplug_enabled(struct acpi_hotplug_profile *hotplug, bool val);
> +bool acpi_scan_check_handler(const struct acpi_device *adev,
> +                            struct acpi_scan_handler *handler);
>
>  #ifdef CONFIG_DEBUG_FS
>  extern struct dentry *acpi_debugfs_dir;
> @@ -121,7 +123,6 @@ int acpi_device_setup_files(struct acpi_device *dev);
>  void acpi_device_remove_files(struct acpi_device *dev);
>  void acpi_device_add_finalize(struct acpi_device *device);
>  void acpi_free_pnp_ids(struct acpi_device_pnp *pnp);
> -bool acpi_device_is_present(const struct acpi_device *adev);
>  bool acpi_device_is_battery(struct acpi_device *adev);
>  bool acpi_device_is_first_physical_node(struct acpi_device *adev,
>                                         const struct device *dev);
> @@ -133,6 +134,7 @@ int acpi_bus_register_early_device(int type);
>  const struct acpi_device *acpi_companion_match(const struct device *dev);
>  int __acpi_device_uevent_modalias(const struct acpi_device *adev,
>                                   struct kobj_uevent_env *env);
> +bool acpi_device_is_processor(const struct acpi_device *adev);
>
>  /* --------------------------------------------------------------------------
>                                    Power Resource
> diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
> index a6ead5204046..9f8d54038770 100644
> --- a/drivers/acpi/property.c
> +++ b/drivers/acpi/property.c
> @@ -1486,7 +1486,7 @@ static bool acpi_fwnode_device_is_available(const struct fwnode_handle *fwnode)
>         if (!is_acpi_device_node(fwnode))
>                 return false;
>
> -       return acpi_device_is_present(to_acpi_device_node(fwnode));
> +       return acpi_dev_ready_for_enumeration(to_acpi_device_node(fwnode));

This, again, may break non-processor devices with dependencies in
subtle ways and it doesn't have a functional effect on processors
AFAICS.

>  }
>
>  static const void *
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index e6ed1ba91e5c..fd2e8b3a5749 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -304,7 +304,7 @@ static int acpi_scan_device_check(struct acpi_device *adev)
>         int error;
>
>         acpi_bus_get_status(adev);
> -       if (acpi_device_is_present(adev)) {
> +       if (acpi_dev_ready_for_enumeration(adev)) {

It looks to me like there are two purposes of this replacement.  One
is to handle the removal case which is analogous to the
acpi_scan_bus_check() case below.

The other purpose seems to be to avoid the dev_warn() message printed
when acpi_processor_add() does not return 1 for a processor device
that is not enabled.

However, this message arguably should not be printed at all so long as
acpi_bus_scan() succeeds, because hot-adding a device without a
matching scan handler is entirely valid.

I'll send a patch to fix this shortly.

In addition to that, it would suffice to make acpi_processor_add()
check the enabled bit and return 0 early when it is clear.  I'll send
a patch for this either.

>                 /*
>                  * This function is only called for device objects for which
>                  * matching scan handlers exist.  The only situation in which
> @@ -338,7 +338,7 @@ static int acpi_scan_bus_check(struct acpi_device *adev, void *not_used)
>         int error;
>
>         acpi_bus_get_status(adev);
> -       if (!acpi_device_is_present(adev)) {
> +       if (!acpi_dev_ready_for_enumeration(adev)) {

Indeed, the enabled bit should be checked here, along with the present
and functional bits, but it would be better to move that check to
acpi_bus_trim_one() or even acpi_processor_remove(), so the
not-present-but-functional case is handled correctly.  And the
acpi_scan_device_check() case could then be handled analogously.
Another patch to be sent.

>                 acpi_scan_device_not_enumerated(adev);
>                 return 0;
>         }
> @@ -1917,11 +1917,6 @@ static bool acpi_device_should_be_hidden(acpi_handle handle)
>         return true;
>  }
>
> -bool acpi_device_is_present(const struct acpi_device *adev)
> -{
> -       return adev->status.present || adev->status.functional;
> -}
> -
>  static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
>                                        const char *idstr,
>                                        const struct acpi_device_id **matchid)
> @@ -1942,6 +1937,18 @@ static bool acpi_scan_handler_matching(struct acpi_scan_handler *handler,
>         return false;
>  }
>
> +bool acpi_scan_check_handler(const struct acpi_device *adev,
> +                            struct acpi_scan_handler *handler)
> +{
> +       struct acpi_hardware_id *hwid;
> +
> +       list_for_each_entry(hwid, &adev->pnp.ids, list)
> +               if (acpi_scan_handler_matching(handler, hwid->id, NULL))
> +                       return true;
> +
> +       return false;
> +}
> +
>  static struct acpi_scan_handler *acpi_scan_match_handler(const char *idstr,
>                                         const struct acpi_device_id **matchid)
>  {
> @@ -2405,16 +2412,35 @@ EXPORT_SYMBOL_GPL(acpi_dev_clear_dependencies);
>   * acpi_dev_ready_for_enumeration - Check if the ACPI device is ready for enumeration
>   * @device: Pointer to the &struct acpi_device to check
>   *
> - * Check if the device is present and has no unmet dependencies.
> + * Check if the device is functional or enabled and has no unmet dependencies.
>   *
> - * Return true if the device is ready for enumeratino. Otherwise, return false.
> + * Return true if the device is ready for enumeration. Otherwise, return false.
>   */
>  bool acpi_dev_ready_for_enumeration(const struct acpi_device *device)
>  {
>         if (device->flags.honor_deps && device->dep_unmet)
>                 return false;
>
> -       return acpi_device_is_present(device);
> +       /*
> +        * ACPI 6.5's 6.3.7 "_STA (Device Status)" allows firmware to return
> +        * (!present && functional) for certain types of devices that should be
> +        * enumerated. Note that the enabled bit should not be set unless the
> +        * present bit is set.
> +        *
> +        * However, limit this only to processor devices to reduce possible
> +        * regressions with firmware.
> +        */
> +       if (!device->status.present)
> +               return device->status.functional;
> +
> +       /*
> +        * Fast path - if enabled is set, avoid the more expensive test to
> +        * check whether this device is a processor.
> +        */
> +       if (device->status.enabled)
> +               return true;
> +
> +       return !acpi_device_is_processor(device);
>  }
>  EXPORT_SYMBOL_GPL(acpi_dev_ready_for_enumeration);
>
> --

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-02-15 19:22   ` Rafael J. Wysocki
  2024-02-20 11:27     ` Russell King (Oracle)
  2024-02-20 20:59     ` Thomas Gleixner
@ 2024-03-22 18:53     ` Jonathan Cameron
  2024-04-10 12:43       ` Jonathan Cameron
  2 siblings, 1 reply; 42+ messages in thread
From: Jonathan Cameron @ 2024-03-22 18:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Russell King, linux-pm, loongarch, linux-acpi, linux-arch,
	linux-kernel, linux-arm-kernel, linux-riscv, kvmarm, x86,
	acpica-devel, linux-csky, linux-doc, linux-ia64, linux-parisc,
	Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse

On Thu, 15 Feb 2024 20:22:29 +0100
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Wed, Jan 31, 2024 at 5:50 PM Russell King <rmk+kernel@armlinux.org.uk> wrote:
> >
> > From: James Morse <james.morse@arm.com>
> >
> > To allow ACPI to skip the call to arch_register_cpu() when the _STA
> > value indicates the CPU can't be brought online right now, move the
> > arch_register_cpu() call into acpi_processor_get_info().
> >
> > Systems can still be booted with 'acpi=off', or not include an
> > ACPI description at all. For these, the CPUs continue to be
> > registered by cpu_dev_register_generic().
> >
> > This moves the CPU register logic back to a subsys_initcall(),
> > while the memory nodes will have been registered earlier.
> >
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > Tested-by: Miguel Luis <miguel.luis@oracle.com>
> > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> > Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > ---
> > Changes since RFC v2:
> >  * Fixup comment in acpi_processor_get_info() (Gavin Shan)
> >  * Add comment in cpu_dev_register_generic() (Gavin Shan)
> > ---
> >  drivers/acpi/acpi_processor.c | 12 ++++++++++++
> >  drivers/base/cpu.c            |  6 +++++-
> >  2 files changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index cf7c1cca69dd..a68c475cdea5 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -314,6 +314,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> >                         cpufreq_add_device("acpi-cpufreq");
> >         }
> >
> > +       /*
> > +        * Register CPUs that are present. get_cpu_device() is used to skip
> > +        * duplicate CPU descriptions from firmware.
> > +        */
> > +       if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > +           !get_cpu_device(pr->id)) {
> > +               int ret = arch_register_cpu(pr->id);
> > +
> > +               if (ret)
> > +                       return ret;
> > +       }
> > +
> >         /*
> >          *  Extra Processor objects may be enumerated on MP systems with
> >          *  less than the max # of CPUs. They should be ignored _iff  
> 
> This is interesting, because right below there is the following code:
> 
>     if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
>         int ret = acpi_processor_hotadd_init(pr);
> 
>         if (ret)
>             return ret;
>     }
> 
> and acpi_processor_hotadd_init() essentially calls arch_register_cpu()
> with some extra things around it (more about that below).
> 
> I do realize that acpi_processor_hotadd_init() is defined under
> CONFIG_ACPI_HOTPLUG_CPU, so for the sake of the argument let's
> consider an architecture where CONFIG_ACPI_HOTPLUG_CPU is set.
> 
> So why are the two conditionals that almost contradict each other both
> needed?  It looks like the new code could be combined with
> acpi_processor_hotadd_init() to do the right thing in all cases.

I jumped on to the end of this series to look at this as the two legs
look more similar at that point. I'll figure out how to drive
any changes through the series once the end goal is clear.

To make testing easy I made the acpi_process_make_enabled() look as
much like acpi_process_make_present() as possible.

> 
> Now, acpi_processor_hotadd_init() does some extra things that look
> like they should be done by the new code too.
> 
> 1. It checks invalid_phys_cpuid() which appears to be a good idea to me.

Indeed that is sensible. Not sure there is a path to here where it fails,
but defense in depth is good.

> 
> 2. It uses locking around arch_register_cpu() which doesn't seem
> unreasonable either.

Seems reasonable, though exactly what this protecting is unclear to me
- is the arch_register_cpu() and/or the acpi_map_cpu().
Whilst it would be nice to be sure, appears harmless, so let us
take it for consistency if nothing else.

The cpu_maps_update_begin()/end() calls though aren't necessary as
we aren't touching the cpu_present or cpu_online masks.


> 
> 3. It calls acpi_map_cpu() and I'm not sure why this is not done by
> the new code.

Doesn't exist except on x86 and longarch as Russell mentioned. So let's
see what it does (on x86)  So we are into the realm of interfaces that
look generic but really aren't :(  I particularly like the
generic_processor_info() which isn't particularly generic.

1. cpu = acpi_register_lapic()

Docs say: Register a local apic and generates a logic cpu number

2. generic_processor_info() in arch/x86/kernel/acpi/acpi.c

Checks against nr_cpus_ids - maybe that bit is useful

Allocate_logical_cpuid().
Digging in, it seems to do similar to setting __cpu_logical_map on arm64.
That's done in acpi_map_gic_cpu_interface, which happens when MADT is
parsed and I believe it's one of the the things we need to do whether
or not the CPU is enabled at boot. So already done.

acpi_processor_set_pdc() -- configure _PDC support (which I'd never heard
of before now).  Deprecated in ACPI 3.0. Given we are using stuff only added
in 6.5 we can probably skip that even if it would be harmless.

acpi_map_cpu2node() -- evalulate _PXM and set __apicid_to_node[]
entry. That is only used from x86 code. Not sure what equivalent would be.
Also numa_set_node(cpu, nid);  Which again sounds a lot more generic than
it is. Load of x86 specific stuff + set_cpu_numa_node() which is generic
and for ARM64 (and anything using CONFIG_GENERIC_ARCH_NUMA) is called
by numa_store_cpu_info() either from early_map_cpu_to_node() or smp_prepare_cpus()
which is called for_each_possible_cpu() and hence has already been done.

So conclusion on this one is there doesn't seem to be anything to do.
We could provide a __weak function or an ARM64 specific one that does
nothing or gate it on an appropriate config variable.  However, given
I presume 'future' ARM64 support for CPU hotplug will want to do something
in these calls, perhaps a better bet is to pass a bool into the function
to indicate these should be skipped if present is not changing.

Having done that, we end up with code that is messy enough we are
better off keeping them as separate functions, though they may
look a little more similar than in this version.

There is a final thing in here you didn't mention
setting pr->flags.need_hotplug_init
which causes extra stuff to occur in processor_driver.c
The extra stuff doesn't seem to be necessary for the enable case
despite being needed for change of present status.
I haven't figured this bit out yet (I need to mess around on x86
to understand what goes wrong if you don't use that flag).


> 
> The only thing that can be dropped from it is the _STA check AFAICS,
> because acpi_processor_add() won't even be called if the CPU is not
> present (and not enabled after the first patch).
> 
> So why does the code not do 1 - 3 above?
I agree with 1 and 2, reasoning for 3 given above.

> 
> > diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> > index 47de0f140ba6..13d052bf13f4 100644
> > --- a/drivers/base/cpu.c
> > +++ b/drivers/base/cpu.c
> > @@ -553,7 +553,11 @@ static void __init cpu_dev_register_generic(void)
> >  {
> >         int i, ret;
> >
> > -       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
> > +       /*
> > +        * When ACPI is enabled, CPUs are registered via
> > +        * acpi_processor_get_info().
> > +        */
> > +       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES) || !acpi_disabled)
> >                 return;  
> 
> Honestly, this looks like a quick hack to me and it absolutely
> requires an ACK from the x86 maintainers to go anywhere.
Will address this separately.

> 
> >
> >         for_each_present_cpu(i) {
> > --  


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-03-22 18:53     ` Jonathan Cameron
@ 2024-04-10 12:43       ` Jonathan Cameron
  2024-04-10 13:28         ` Rafael J. Wysocki
  0 siblings, 1 reply; 42+ messages in thread
From: Jonathan Cameron @ 2024-04-10 12:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Russell King, linux-pm, loongarch, linux-acpi, linux-arch,
	linux-kernel, linux-arm-kernel, linux-riscv, kvmarm, x86,
	acpica-devel, linux-csky, linux-doc, linux-ia64, linux-parisc,
	Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse

> >   
> > > diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> > > index 47de0f140ba6..13d052bf13f4 100644
> > > --- a/drivers/base/cpu.c
> > > +++ b/drivers/base/cpu.c
> > > @@ -553,7 +553,11 @@ static void __init cpu_dev_register_generic(void)
> > >  {
> > >         int i, ret;
> > >
> > > -       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
> > > +       /*
> > > +        * When ACPI is enabled, CPUs are registered via
> > > +        * acpi_processor_get_info().
> > > +        */
> > > +       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES) || !acpi_disabled)
> > >                 return;    
> > 
> > Honestly, this looks like a quick hack to me and it absolutely
> > requires an ACK from the x86 maintainers to go anywhere.  
> Will address this separately.
> 

So do people prefer this hack, or something along lines of the following?

static int __init cpu_dev_register_generic(void)
{
        int i, ret = 0;

        for_each_online_cpu(i) {
                if (!get_cpu_device(i)) {
                        ret = arch_register_cpu(i);
                        if (ret)
                                pr_warn("register_cpu %d failed (%d)\n", i, ret);
                }
        }
	//Probably just eat the error.
        return 0;
}
subsys_initcall_sync(cpu_dev_register_generic);

Which may look familiar at it's effectively patch 3 from v3 which was dealing
with CPUs missing from DSDT (something we think doesn't happen).

It might be possible to elide the arch_register_cpu() in
make_present() but that will mean we use different flows in this patch set
for the hotplug and initially present cases which is a bit messy.

I've tested this lightly on arm64 and x86 ACPI + DT booting and it "seems" fine.

Jonathan

> >   
> > >
> > >         for_each_present_cpu(i) {
> > > --    
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-04-10 12:43       ` Jonathan Cameron
@ 2024-04-10 13:28         ` Rafael J. Wysocki
  2024-04-10 13:50           ` Jonathan Cameron
  0 siblings, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-04-10 13:28 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Rafael J. Wysocki, Russell King, linux-pm, loongarch, linux-acpi,
	linux-arch, linux-kernel, linux-arm-kernel, linux-riscv, kvmarm,
	x86, acpica-devel, linux-csky, linux-doc, linux-ia64,
	linux-parisc, Salil Mehta, Jean-Philippe Brucker, jianyong.wu,
	justin.he, James Morse

On Wed, Apr 10, 2024 at 2:43 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> > >
> > > > diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> > > > index 47de0f140ba6..13d052bf13f4 100644
> > > > --- a/drivers/base/cpu.c
> > > > +++ b/drivers/base/cpu.c
> > > > @@ -553,7 +553,11 @@ static void __init cpu_dev_register_generic(void)
> > > >  {
> > > >         int i, ret;
> > > >
> > > > -       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
> > > > +       /*
> > > > +        * When ACPI is enabled, CPUs are registered via
> > > > +        * acpi_processor_get_info().
> > > > +        */
> > > > +       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES) || !acpi_disabled)
> > > >                 return;
> > >
> > > Honestly, this looks like a quick hack to me and it absolutely
> > > requires an ACK from the x86 maintainers to go anywhere.
> > Will address this separately.
> >
>
> So do people prefer this hack, or something along lines of the following?
>
> static int __init cpu_dev_register_generic(void)
> {
>         int i, ret = 0;
>
>         for_each_online_cpu(i) {
>                 if (!get_cpu_device(i)) {
>                         ret = arch_register_cpu(i);
>                         if (ret)
>                                 pr_warn("register_cpu %d failed (%d)\n", i, ret);
>                 }
>         }
>         //Probably just eat the error.
>         return 0;
> }
> subsys_initcall_sync(cpu_dev_register_generic);

I would prefer something like the above.

I actually thought that arch_register_cpu() might return something
like -EPROBE_DEFER when it cannot determine whether or not the CPU is
really available.

Then, the ACPI processor enumeration path may take care of registering
CPU that have not been registered so far and in the more-or-less the
same way regardless of the architecture (modulo some arch-specific
stuff).

In the end, it should be possible to avoid changing the behavior of
x86 and loongarch in this series.

> Which may look familiar at it's effectively patch 3 from v3 which was dealing
> with CPUs missing from DSDT (something we think doesn't happen).
>
> It might be possible to elide the arch_register_cpu() in
> make_present() but that will mean we use different flows in this patch set
> for the hotplug and initially present cases which is a bit messy.
>
> I've tested this lightly on arm64 and x86 ACPI + DT booting and it "seems" fine.

Sounds promising.

Thanks!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-04-10 13:28         ` Rafael J. Wysocki
@ 2024-04-10 13:50           ` Jonathan Cameron
  2024-04-10 14:19             ` Rafael J. Wysocki
  2024-04-10 18:56             ` Russell King (Oracle)
  0 siblings, 2 replies; 42+ messages in thread
From: Jonathan Cameron @ 2024-04-10 13:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Russell King, linux-pm, loongarch, linux-acpi, linux-arch,
	linux-kernel, linux-arm-kernel, linux-riscv, kvmarm, x86,
	acpica-devel, linux-csky, linux-doc, linux-ia64, linux-parisc,
	Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse

On Wed, 10 Apr 2024 15:28:18 +0200
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Wed, Apr 10, 2024 at 2:43 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >  
> > > >  
> > > > > diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> > > > > index 47de0f140ba6..13d052bf13f4 100644
> > > > > --- a/drivers/base/cpu.c
> > > > > +++ b/drivers/base/cpu.c
> > > > > @@ -553,7 +553,11 @@ static void __init cpu_dev_register_generic(void)
> > > > >  {
> > > > >         int i, ret;
> > > > >
> > > > > -       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
> > > > > +       /*
> > > > > +        * When ACPI is enabled, CPUs are registered via
> > > > > +        * acpi_processor_get_info().
> > > > > +        */
> > > > > +       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES) || !acpi_disabled)
> > > > >                 return;  
> > > >
> > > > Honestly, this looks like a quick hack to me and it absolutely
> > > > requires an ACK from the x86 maintainers to go anywhere.  
> > > Will address this separately.
> > >  
> >
> > So do people prefer this hack, or something along lines of the following?
> >
> > static int __init cpu_dev_register_generic(void)
> > {
> >         int i, ret = 0;
> >
> >         for_each_online_cpu(i) {
> >                 if (!get_cpu_device(i)) {
> >                         ret = arch_register_cpu(i);
> >                         if (ret)
> >                                 pr_warn("register_cpu %d failed (%d)\n", i, ret);
> >                 }
> >         }
> >         //Probably just eat the error.
> >         return 0;
> > }
> > subsys_initcall_sync(cpu_dev_register_generic);  
> 
> I would prefer something like the above.
> 
> I actually thought that arch_register_cpu() might return something
> like -EPROBE_DEFER when it cannot determine whether or not the CPU is
> really available.

Ok. That would end up looking much more like the original code I think.
So we wouldn't have this late registration at all, or keep it for DT
on arm64?  I'm not sure that's a clean solution though leaves
the x86 path alone.

If we get rid of this catch all, solution would be to move the
!acpi_disabled check into the arm64 version of arch_cpu_register()
because we would only want the delayed registration path to be
used on ACPI cases where the question of CPU availability can't
yet be resolved.

> 
> Then, the ACPI processor enumeration path may take care of registering
> CPU that have not been registered so far and in the more-or-less the
> same way regardless of the architecture (modulo some arch-specific
> stuff).

If I understand correctly, in acpi_processor_get_info() we'd end up
with a similar check on whether it was already registered (the x86 path)
or had be deferred (arm64 / acpi).
 
> 
> In the end, it should be possible to avoid changing the behavior of
> x86 and loongarch in this series.

Possible, yes, but result if I understand correctly is we end up with
very different flows and replication of functionality between the
early registration and the late one. I'm fine with that if you prefer it!

> 
> > Which may look familiar at it's effectively patch 3 from v3 which was dealing
> > with CPUs missing from DSDT (something we think doesn't happen).
> >
> > It might be possible to elide the arch_register_cpu() in
> > make_present() but that will mean we use different flows in this patch set
> > for the hotplug and initially present cases which is a bit messy.
> >
> > I've tested this lightly on arm64 and x86 ACPI + DT booting and it "seems" fine.  
> 
> Sounds promising.

Possibly not that relevant though if proposal is to drop this approach. :(
At least I now have test setups!

Jonathan
> 
> Thanks!


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-04-10 13:50           ` Jonathan Cameron
@ 2024-04-10 14:19             ` Rafael J. Wysocki
  2024-04-10 15:58               ` Jonathan Cameron
  2024-04-10 18:56             ` Russell King (Oracle)
  1 sibling, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-04-10 14:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Rafael J. Wysocki, Russell King, linux-pm, loongarch, linux-acpi,
	linux-arch, linux-kernel, linux-arm-kernel, linux-riscv, kvmarm,
	x86, acpica-devel, linux-csky, linux-doc, linux-ia64,
	linux-parisc, Salil Mehta, Jean-Philippe Brucker, jianyong.wu,
	justin.he, James Morse

On Wed, Apr 10, 2024 at 3:50 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Wed, 10 Apr 2024 15:28:18 +0200
> "Rafael J. Wysocki" <rafael@kernel.org> wrote:
>
> > On Wed, Apr 10, 2024 at 2:43 PM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:
> > >
> > > > >
> > > > > > diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> > > > > > index 47de0f140ba6..13d052bf13f4 100644
> > > > > > --- a/drivers/base/cpu.c
> > > > > > +++ b/drivers/base/cpu.c
> > > > > > @@ -553,7 +553,11 @@ static void __init cpu_dev_register_generic(void)
> > > > > >  {
> > > > > >         int i, ret;
> > > > > >
> > > > > > -       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
> > > > > > +       /*
> > > > > > +        * When ACPI is enabled, CPUs are registered via
> > > > > > +        * acpi_processor_get_info().
> > > > > > +        */
> > > > > > +       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES) || !acpi_disabled)
> > > > > >                 return;
> > > > >
> > > > > Honestly, this looks like a quick hack to me and it absolutely
> > > > > requires an ACK from the x86 maintainers to go anywhere.
> > > > Will address this separately.
> > > >
> > >
> > > So do people prefer this hack, or something along lines of the following?
> > >
> > > static int __init cpu_dev_register_generic(void)
> > > {
> > >         int i, ret = 0;
> > >
> > >         for_each_online_cpu(i) {
> > >                 if (!get_cpu_device(i)) {
> > >                         ret = arch_register_cpu(i);
> > >                         if (ret)
> > >                                 pr_warn("register_cpu %d failed (%d)\n", i, ret);
> > >                 }
> > >         }
> > >         //Probably just eat the error.
> > >         return 0;
> > > }
> > > subsys_initcall_sync(cpu_dev_register_generic);
> >
> > I would prefer something like the above.
> >
> > I actually thought that arch_register_cpu() might return something
> > like -EPROBE_DEFER when it cannot determine whether or not the CPU is
> > really available.
>
> Ok. That would end up looking much more like the original code I think.
> So we wouldn't have this late registration at all, or keep it for DT
> on arm64?  I'm not sure that's a clean solution though leaves
> the x86 path alone.

I'm not sure why DT on arm64 would need to do late registration.

There is this chain of calls in the mainline today:

driver_init()
  cpu_dev_init()
    cpu_dev_register_generic()

the last of which registers CPUs on arm64/DT systems IIUC. I don't see
a need to change this behavior.

On arm64/ACPI, though, arch_register_cpu() cannot make progress until
it can look into the ACPI Namespace, so I would just make it return
-EPROBE_DEFER or equivalent then and the ACPI enumeration will find
the CPU and basically treat it as one that has just appeared.

> If we get rid of this catch all, solution would be to move the
> !acpi_disabled check into the arm64 version of arch_cpu_register()
> because we would only want the delayed registration path to be
> used on ACPI cases where the question of CPU availability can't
> yet be resolved.

Exactly.

This is similar (if not equivalent even) to a CPU becoming available
between the cpu_dev_register_generic() call and the ACPI enumeration.

> >
> > Then, the ACPI processor enumeration path may take care of registering
> > CPU that have not been registered so far and in the more-or-less the
> > same way regardless of the architecture (modulo some arch-specific
> > stuff).
>
> If I understand correctly, in acpi_processor_get_info() we'd end up
> with a similar check on whether it was already registered (the x86 path)
> or had be deferred (arm64 / acpi).
>
> >
> > In the end, it should be possible to avoid changing the behavior of
> > x86 and loongarch in this series.
>
> Possible, yes, but result if I understand correctly is we end up with
> very different flows and replication of functionality between the
> early registration and the late one. I'm fine with that if you prefer it!

But that's what is there today, isn't it?

I think this can be changed to reduce the duplication, but I'd prefer
to do that later, when the requisite functionality is in place and we
just do the consolidation.  In that case, if anything goes wrong, we
can take a step back and reconsider without deferring the arm64 CPU
hotplug support.

> >
> > > Which may look familiar at it's effectively patch 3 from v3 which was dealing
> > > with CPUs missing from DSDT (something we think doesn't happen).
> > >
> > > It might be possible to elide the arch_register_cpu() in
> > > make_present() but that will mean we use different flows in this patch set
> > > for the hotplug and initially present cases which is a bit messy.
> > >
> > > I've tested this lightly on arm64 and x86 ACPI + DT booting and it "seems" fine.
> >
> > Sounds promising.
>
> Possibly not that relevant though if proposal is to drop this approach. :(
> At least I now have test setups!

Great!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-04-10 14:19             ` Rafael J. Wysocki
@ 2024-04-10 15:58               ` Jonathan Cameron
  0 siblings, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2024-04-10 15:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Russell King, linux-pm, loongarch, linux-acpi, linux-arch,
	linux-kernel, linux-arm-kernel, linux-riscv, kvmarm, x86,
	acpica-devel, linux-csky, linux-doc, linux-ia64, linux-parisc,
	Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse, Miguel Luis

On Wed, 10 Apr 2024 16:19:50 +0200
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Wed, Apr 10, 2024 at 3:50 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Wed, 10 Apr 2024 15:28:18 +0200
> > "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> >  
> > > On Wed, Apr 10, 2024 at 2:43 PM Jonathan Cameron
> > > <Jonathan.Cameron@huawei.com> wrote:  
> > > >  
> > > > > >  
> > > > > > > diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> > > > > > > index 47de0f140ba6..13d052bf13f4 100644
> > > > > > > --- a/drivers/base/cpu.c
> > > > > > > +++ b/drivers/base/cpu.c
> > > > > > > @@ -553,7 +553,11 @@ static void __init cpu_dev_register_generic(void)
> > > > > > >  {
> > > > > > >         int i, ret;
> > > > > > >
> > > > > > > -       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
> > > > > > > +       /*
> > > > > > > +        * When ACPI is enabled, CPUs are registered via
> > > > > > > +        * acpi_processor_get_info().
> > > > > > > +        */
> > > > > > > +       if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES) || !acpi_disabled)
> > > > > > >                 return;  
> > > > > >
> > > > > > Honestly, this looks like a quick hack to me and it absolutely
> > > > > > requires an ACK from the x86 maintainers to go anywhere.  
> > > > > Will address this separately.
> > > > >  
> > > >
> > > > So do people prefer this hack, or something along lines of the following?
> > > >
> > > > static int __init cpu_dev_register_generic(void)
> > > > {
> > > >         int i, ret = 0;
> > > >
> > > >         for_each_online_cpu(i) {
> > > >                 if (!get_cpu_device(i)) {
> > > >                         ret = arch_register_cpu(i);
> > > >                         if (ret)
> > > >                                 pr_warn("register_cpu %d failed (%d)\n", i, ret);
> > > >                 }
> > > >         }
> > > >         //Probably just eat the error.
> > > >         return 0;
> > > > }
> > > > subsys_initcall_sync(cpu_dev_register_generic);  
> > >
> > > I would prefer something like the above.
> > >
> > > I actually thought that arch_register_cpu() might return something
> > > like -EPROBE_DEFER when it cannot determine whether or not the CPU is
> > > really available.  
> >
> > Ok. That would end up looking much more like the original code I think.
> > So we wouldn't have this late registration at all, or keep it for DT
> > on arm64?  I'm not sure that's a clean solution though leaves
> > the x86 path alone.  
> 
> I'm not sure why DT on arm64 would need to do late registration.

This was me falsely thinking better to do it close together for
DT and ACPI. It definitely doesn't need to (or it wouldn't work today!)

> 
> There is this chain of calls in the mainline today:
> 
> driver_init()
>   cpu_dev_init()
>     cpu_dev_register_generic()
> 
> the last of which registers CPUs on arm64/DT systems IIUC. I don't see
> a need to change this behavior.
> 
> On arm64/ACPI, though, arch_register_cpu() cannot make progress until
> it can look into the ACPI Namespace, so I would just make it return
> -EPROBE_DEFER or equivalent then and the ACPI enumeration will find
> the CPU and basically treat it as one that has just appeared.

Ok so giving this a go...

Arm 64 version of arch_register_cpu() ended up as the following
(obviously needs cleaning up, bikeshedding of naming etc)

int arch_register_cpu(int cpu)
{
        struct cpu *c = &per_cpu(cpu_devices, cpu);
        acpi_handle acpi_handle = ACPI_HANDLE(&c->dev);
        int ret;

	printk("!!!!! INTO arch_register_cpu() %px\n", ACPI_HANDLE(&c->dev));

        if (!acpi_disabled && !acpi_handle)
                return -EPROBE_DEFER;
        if (acpi_handle) {
                ret = acpi_sta_enabled(acpi_handle);
                if (ret) {
                        printk("Have handle, not enabled\n");
                        /* Not enabled */
                        return ret;
                }
        }
        printk("!!!!! onwards arch_register_cpu()\n");

        c->hotpluggable = arch_cpu_is_hotpluggable(cpu);

        return register_cpu(c, cpu);
}

with new utility function in drivers/acpi/utils.c

int acpi_sta_enabled(acpi_handle handle)
{
       unsigned long long sta;
       bool present, enabled;
       acpi_status status;

       if (acpi_has_method(handle, "_STA")) {
               status = acpi_evaluate_integer(handle, "_STA", NULL, &sta);
               if (ACPI_FAILURE(status))
                       return -ENODEV;

               present = sta & ACPI_STA_DEVICE_PRESENT;
               enabled = sta & ACPI_STA_DEVICE_ENABLED;
               if (!present || !enabled) {
                       return -EPROBE_DEFER;
               }
               return 0;
       }
       return 0; /* No _STA means always on! */
}
	struct cpu *c = &per_cpu(cpu_devices, pr->id);	
	ACPI_COMPANION_SET(&c->dev, device);

in acpi_processor_get_info() and that calls

static int acpi_processor_make_enabled(struct acpi_processor *pr)
{
        int ret;

        if (invalid_phys_cpuid(pr->phys_id))
                return -ENODEV;

        cpus_write_lock();
        ret = arch_register_cpu(pr->id);
        cpus_write_unlock();

        return ret;
}

I think setting the ACPI handle should be harmless on other architectures.
It seems like the obvious one to set it to for cpu->dev.

Brief tests on same set of DT and ACPI on x86 and arm64 seem fine.

> 
> > If we get rid of this catch all, solution would be to move the
> > !acpi_disabled check into the arm64 version of arch_cpu_register()
> > because we would only want the delayed registration path to be
> > used on ACPI cases where the question of CPU availability can't
> > yet be resolved.  
> 
> Exactly.
> 
> This is similar (if not equivalent even) to a CPU becoming available
> between the cpu_dev_register_generic() call and the ACPI enumeration.

> 
> > >
> > > Then, the ACPI processor enumeration path may take care of registering
> > > CPU that have not been registered so far and in the more-or-less the
> > > same way regardless of the architecture (modulo some arch-specific
> > > stuff).  
> >
> > If I understand correctly, in acpi_processor_get_info() we'd end up
> > with a similar check on whether it was already registered (the x86 path)
> > or had be deferred (arm64 / acpi).
> >  
> > >
> > > In the end, it should be possible to avoid changing the behavior of
> > > x86 and loongarch in this series.  
> >
> > Possible, yes, but result if I understand correctly is we end up with
> > very different flows and replication of functionality between the
> > early registration and the late one. I'm fine with that if you prefer it!  
> 
> But that's what is there today, isn't it?

Agreed - but I was previously thinking we could move everything late.
I'm fine with just keeping the two flows separate.

> 
> I think this can be changed to reduce the duplication, but I'd prefer
> to do that later, when the requisite functionality is in place and we
> just do the consolidation.  In that case, if anything goes wrong, we
> can take a step back and reconsider without deferring the arm64 CPU
> hotplug support.

Great. That plan certainly works for me :)

Thanks for quick replies and help getting this headed in right direction.

+CC Miguel who is also looking at some of this series. Sorry Miguel,
was assuming you were on the thread and never checked :(

Jonathan



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-04-10 13:50           ` Jonathan Cameron
  2024-04-10 14:19             ` Rafael J. Wysocki
@ 2024-04-10 18:56             ` Russell King (Oracle)
  2024-04-10 19:08               ` Rafael J. Wysocki
  1 sibling, 1 reply; 42+ messages in thread
From: Russell King (Oracle) @ 2024-04-10 18:56 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Rafael J. Wysocki, linux-pm, loongarch, linux-acpi, linux-arch,
	linux-kernel, linux-arm-kernel, linux-riscv, kvmarm, x86,
	acpica-devel, linux-csky, linux-doc, linux-ia64, linux-parisc,
	Salil Mehta, Jean-Philippe Brucker, jianyong.wu, justin.he,
	James Morse

On Wed, Apr 10, 2024 at 02:50:05PM +0100, Jonathan Cameron wrote:
> If we get rid of this catch all, solution would be to move the
> !acpi_disabled check into the arm64 version of arch_cpu_register()
> because we would only want the delayed registration path to be
> used on ACPI cases where the question of CPU availability can't
> yet be resolved.

Aren't we then needing two arch_register_cpu() implementations?
I'm assuming that you're suggesting that the !acpi_disabled, then
do nothing check is moved into arch_register_cpu() - or to put it
another way, arch_register_cpu() does nothing if ACPI is enabled.

If arch_register_cpu() does nothing if ACPI is enabled, how do
CPUs get registered (and sysfs files get created to control them)
on ACPI systems? ACPI wouldn't be able to call arch_register_cpu(),
so I suspect you'll need an ACPI-specific version of this function.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-04-10 18:56             ` Russell King (Oracle)
@ 2024-04-10 19:08               ` Rafael J. Wysocki
  2024-04-10 21:07                 ` Jonathan Cameron
  0 siblings, 1 reply; 42+ messages in thread
From: Rafael J. Wysocki @ 2024-04-10 19:08 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Jonathan Cameron, Rafael J. Wysocki, linux-pm, loongarch,
	linux-acpi, linux-arch, linux-kernel, linux-arm-kernel,
	linux-riscv, kvmarm, x86, acpica-devel, linux-csky, linux-doc,
	linux-ia64, linux-parisc, Salil Mehta, Jean-Philippe Brucker,
	jianyong.wu, justin.he, James Morse

On Wed, Apr 10, 2024 at 8:56 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> On Wed, Apr 10, 2024 at 02:50:05PM +0100, Jonathan Cameron wrote:
> > If we get rid of this catch all, solution would be to move the
> > !acpi_disabled check into the arm64 version of arch_cpu_register()
> > because we would only want the delayed registration path to be
> > used on ACPI cases where the question of CPU availability can't
> > yet be resolved.
>
> Aren't we then needing two arch_register_cpu() implementations?
> I'm assuming that you're suggesting that the !acpi_disabled, then
> do nothing check is moved into arch_register_cpu() - or to put it
> another way, arch_register_cpu() does nothing if ACPI is enabled.
>
> If arch_register_cpu() does nothing if ACPI is enabled, how do
> CPUs get registered (and sysfs files get created to control them)
> on ACPI systems? ACPI wouldn't be able to call arch_register_cpu(),
> so I suspect you'll need an ACPI-specific version of this function.

arch_register_cpu() will do what it does, but it will check (upfront)
if ACPI is enabled and if so, if the ACPI Namespace is available.  In
the case when ACPI is enabled and the ACPI Namespace is not ready, it
will return -EPROBE_DEFER (say).

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info()
  2024-04-10 19:08               ` Rafael J. Wysocki
@ 2024-04-10 21:07                 ` Jonathan Cameron
  0 siblings, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2024-04-10 21:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Russell King (Oracle),
	linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse,
	Miguel Luis

On Wed, 10 Apr 2024 21:08:06 +0200
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Wed, Apr 10, 2024 at 8:56 PM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Wed, Apr 10, 2024 at 02:50:05PM +0100, Jonathan Cameron wrote:  
> > > If we get rid of this catch all, solution would be to move the
> > > !acpi_disabled check into the arm64 version of arch_cpu_register()
> > > because we would only want the delayed registration path to be
> > > used on ACPI cases where the question of CPU availability can't
> > > yet be resolved.  
> >
> > Aren't we then needing two arch_register_cpu() implementations?
> > I'm assuming that you're suggesting that the !acpi_disabled, then
> > do nothing check is moved into arch_register_cpu() - or to put it
> > another way, arch_register_cpu() does nothing if ACPI is enabled.
> >
> > If arch_register_cpu() does nothing if ACPI is enabled, how do
> > CPUs get registered (and sysfs files get created to control them)
> > on ACPI systems? ACPI wouldn't be able to call arch_register_cpu(),
> > so I suspect you'll need an ACPI-specific version of this function.  
> 
> arch_register_cpu() will do what it does, but it will check (upfront)
> if ACPI is enabled and if so, if the ACPI Namespace is available.  In
> the case when ACPI is enabled and the ACPI Namespace is not ready, it
> will return -EPROBE_DEFER (say).

Exactly.  I oversimplified and wasn't clear enough.
The check is there in the arch_register_cpu() and is one of the ways
that function can decide to actually register the cpu but not the only one.

I think we may later want to consider breaking it into 2 arch calls
(check if ready to register + register) to reduce code duplication
in with the hotplug path where there is a little extra to do
inbetween.

Hopefully that can wait though.

Jonathan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 12/15] arm64: psci: Ignore DENIED CPUs
  2024-01-31 16:50 ` [PATCH RFC v4 12/15] arm64: psci: Ignore DENIED CPUs Russell King
@ 2024-04-11 11:35   ` Jonathan Cameron
  2024-04-11 13:25     ` Jonathan Cameron
  0 siblings, 1 reply; 42+ messages in thread
From: Jonathan Cameron @ 2024-04-11 11:35 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse,
	Rafael J. Wysocki, Miguel Luis

On Wed, 31 Jan 2024 16:50:38 +0000
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> wrote:

> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> 
> When a CPU is marked as disabled, but online capable in the MADT, PSCI
> applies some firmware policy to control when it can be brought online.
> PSCI returns DENIED to a CPU_ON request if this is not currently
> permitted. The OS can learn the current policy from the _STA enabled bit.
> 
> Handle the PSCI DENIED return code gracefully instead of printing an
> error.
> 
> See https://developer.arm.com/documentation/den0022/f/?lang=en page 58.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> [ morse: Rewrote commit message ]
> Signed-off-by: James Morse <james.morse@arm.com>
> Tested-by: Miguel Luis <miguel.luis@oracle.com>
> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---

This change to return failure from __cpu_up in non error cases exposes
an possible issue with cpu_up() in kernel/cpu.c in that it brings the numa node
before we try (and fail) to bring up CPUs that may be denied.

We could try offlining the numa node on error, or just register it later.

Currently I'm testing this change which I think is harmless for cases that don't
fail the cpu_up()

For the cpu hotplug path note the node only comes online wiht the cpu online, not
the earlier hotplug. Reasonable given there is nothing online in the node before
that point.

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 537099bf5d02..a4730396ccea 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1742,10 +1742,6 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
                return -EINVAL;
        }

-       err = try_online_node(cpu_to_node(cpu));
-       if (err)
-               return err;
-
        cpu_maps_update_begin();

        if (cpu_hotplug_disabled) {
@@ -1760,7 +1756,10 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
        err = _cpu_up(cpu, 0, target);
 out:
        cpu_maps_update_done();
-       return err;
+       if (err)
+               return err;
+
+       return try_online_node(cpu_to_node(cpu));
 }

There is a kicker in the remove path where currently check_cpu_on_node()
checks for_each_present_cpu() whereas to work for us we need to use
for_each_online_cpu() or the node is never removed.

Now my current view is that we should only show
nodes in /sys/bus/nodes/devices/ if there is a CPU online (assuming no other
reasons the node should be online such as memory).
That's easy enough to make work but all I'm really learning is that the semantics
of what is an online form a node point of view is not consistent.

Fixing this will create a minor change on x86 but does anyone really care
about what happens in the offline path wrt to 'when' the node disappears?
I think the corner case is.

1. Add 2 CPUs (A, B) in a CPU only node.
2. Online CPU A - this brings /sys/bus/devices/nodeX online
3. Remove CPU A - no effect because the check for try_remove is on presence and CPU B is
   still present.
4. Online CPU B - no change.
5. Offline CPU B - no change.
4. Remove CPU B - /sys/bus/device/nodeX offline (disappears)

To make it work on arm64 where we never make CPUs not present.

1. Add 2 CPUs (A, B) in a CPU only node.
2. Online CPU A - this brings /sys/bus/devices/nodeX online
3. Remove CPU A - /sys/bus/devices/nodeX offline (disappears)
4. Online CPU B - this brings /sys/bus/device/nodeX online
5. Offline CPU B - no change (node updates only happen in hotplug code)
6. Remove CPU B - /sys/bus/device/nodeX offline (disappears).

Step 5 may seem weird but I think we can't mess with nodes there because
userspace may well rely on them still being around for some reason
(it's a much more common situation).

My assumption is that step3 removing the node isn't going to hurt anyone?

If no one shouts, i'll go ahead with rolling a v5 patch set where this is done.


Jonathan





> Changes since RFC v2
>  * Add specification reference
>  * Use EPERM rather than EPROBE_DEFER
> Changes since RFC v3:
>  * Use EPERM everywhere
>  * Drop unnecessary changes to drivers/firmware/psci/psci.c
> ---
>  arch/arm64/kernel/psci.c | 2 +-
>  arch/arm64/kernel/smp.c  | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
> index 29a8e444db83..fabd732d0a2d 100644
> --- a/arch/arm64/kernel/psci.c
> +++ b/arch/arm64/kernel/psci.c
> @@ -40,7 +40,7 @@ static int cpu_psci_cpu_boot(unsigned int cpu)
>  {
>  	phys_addr_t pa_secondary_entry = __pa_symbol(secondary_entry);
>  	int err = psci_ops.cpu_on(cpu_logical_map(cpu), pa_secondary_entry);
> -	if (err)
> +	if (err && err != -EPERM)
>  		pr_err("failed to boot CPU%d (%d)\n", cpu, err);
>  
>  	return err;
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 4ced34f62dab..dc0e0b3ec2d4 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -132,7 +132,8 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  	/* Now bring the CPU into our world */
>  	ret = boot_secondary(cpu, idle);
>  	if (ret) {
> -		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
> +		if (ret != -EPERM)
> +			pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
>  		return ret;
>  	}
>  


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH RFC v4 12/15] arm64: psci: Ignore DENIED CPUs
  2024-04-11 11:35   ` Jonathan Cameron
@ 2024-04-11 13:25     ` Jonathan Cameron
  0 siblings, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2024-04-11 13:25 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: linux-pm, loongarch, linux-acpi, linux-arch, linux-kernel,
	linux-arm-kernel, linux-riscv, kvmarm, x86, acpica-devel,
	linux-csky, linux-doc, linux-ia64, linux-parisc, Salil Mehta,
	Jean-Philippe Brucker, jianyong.wu, justin.he, James Morse,
	Rafael J. Wysocki, Miguel Luis

On Thu, 11 Apr 2024 12:35:23 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Wed, 31 Jan 2024 16:50:38 +0000
> Russell King (Oracle) <rmk+kernel@armlinux.org.uk> wrote:
> 
> > From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > 
> > When a CPU is marked as disabled, but online capable in the MADT, PSCI
> > applies some firmware policy to control when it can be brought online.
> > PSCI returns DENIED to a CPU_ON request if this is not currently
> > permitted. The OS can learn the current policy from the _STA enabled bit.
> > 
> > Handle the PSCI DENIED return code gracefully instead of printing an
> > error.
> > 
> > See https://developer.arm.com/documentation/den0022/f/?lang=en page 58.
> > 
> > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > [ morse: Rewrote commit message ]
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Tested-by: Miguel Luis <miguel.luis@oracle.com>
> > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> > Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > ---  
> 
> This change to return failure from __cpu_up in non error cases exposes
> an possible issue with cpu_up() in kernel/cpu.c in that it brings the numa node
> before we try (and fail) to bring up CPUs that may be denied.
> 
> We could try offlining the numa node on error, or just register it later.
> 
> Currently I'm testing this change which I think is harmless for cases that don't
> fail the cpu_up()
> 
> For the cpu hotplug path note the node only comes online wiht the cpu online, not
> the earlier hotplug. Reasonable given there is nothing online in the node before
> that point.
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 537099bf5d02..a4730396ccea 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1742,10 +1742,6 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
>                 return -EINVAL;
>         }
> 
> -       err = try_online_node(cpu_to_node(cpu));
> -       if (err)
> -               return err;
> -
>         cpu_maps_update_begin();
> 
>         if (cpu_hotplug_disabled) {
> @@ -1760,7 +1756,10 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
>         err = _cpu_up(cpu, 0, target);
>  out:
>         cpu_maps_update_done();
> -       return err;
> +       if (err)
> +               return err;
> +
> +       return try_online_node(cpu_to_node(cpu));
>  }
> 
> There is a kicker in the remove path where currently check_cpu_on_node()
> checks for_each_present_cpu() whereas to work for us we need to use
> for_each_online_cpu() or the node is never removed.
> 
> Now my current view is that we should only show
> nodes in /sys/bus/nodes/devices/ if there is a CPU online (assuming no other
> reasons the node should be online such as memory).
> That's easy enough to make work but all I'm really learning is that the semantics
> of what is an online form a node point of view is not consistent.
> 
> Fixing this will create a minor change on x86 but does anyone really care
> about what happens in the offline path wrt to 'when' the node disappears?
> I think the corner case is.
> 
> 1. Add 2 CPUs (A, B) in a CPU only node.
> 2. Online CPU A - this brings /sys/bus/devices/nodeX online
> 3. Remove CPU A - no effect because the check for try_remove is on presence and CPU B is
>    still present.
> 4. Online CPU B - no change.
> 5. Offline CPU B - no change.
> 4. Remove CPU B - /sys/bus/device/nodeX offline (disappears)
> 
> To make it work on arm64 where we never make CPUs not present.
> 
> 1. Add 2 CPUs (A, B) in a CPU only node.
> 2. Online CPU A - this brings /sys/bus/devices/nodeX online
> 3. Remove CPU A - /sys/bus/devices/nodeX offline (disappears)
> 4. Online CPU B - this brings /sys/bus/device/nodeX online
> 5. Offline CPU B - no change (node updates only happen in hotplug code)
> 6. Remove CPU B - /sys/bus/device/nodeX offline (disappears).
> 
> Step 5 may seem weird but I think we can't mess with nodes there because
> userspace may well rely on them still being around for some reason
> (it's a much more common situation).
> 
> My assumption is that step3 removing the node isn't going to hurt anyone?
> 
> If no one shouts, i'll go ahead with rolling a v5 patch set where this is done.

This needs a little more thought as if we follow above sequence there isn't
currently any cleanup of
/sys/bus/cpu/devices/cpuB/node link.

If you follow the sequence above an attempt is made to create it again.
Whilst I have a WIP bit of dancing code that deals with this it is ugly as
the check on whether a node is online needs to be dropped on this path.
The problem coming back to the tear down and setup paths being in entirely different
states and not remotely symmetric. 

So my proposal for now is to treat the whole numa node problem as something
for another day.

Step 1)
For ARM64 cpu hp - NUMA nodes created at boot for any present CPUs - so all
of them.  They never go away, hence no annoying tear down to do.

Step 2)
We can consider moving to dynamic numa node handling either via the messy
approach I have working, or via more major surgery.
I have a bunch of other work in this area for memory hotplug, so maybe
I can role those 2 activities together.

Jonathan
> 
> 
> Jonathan
> 
> 
> 
> 
> 
> > Changes since RFC v2
> >  * Add specification reference
> >  * Use EPERM rather than EPROBE_DEFER
> > Changes since RFC v3:
> >  * Use EPERM everywhere
> >  * Drop unnecessary changes to drivers/firmware/psci/psci.c
> > ---
> >  arch/arm64/kernel/psci.c | 2 +-
> >  arch/arm64/kernel/smp.c  | 3 ++-
> >  2 files changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
> > index 29a8e444db83..fabd732d0a2d 100644
> > --- a/arch/arm64/kernel/psci.c
> > +++ b/arch/arm64/kernel/psci.c
> > @@ -40,7 +40,7 @@ static int cpu_psci_cpu_boot(unsigned int cpu)
> >  {
> >  	phys_addr_t pa_secondary_entry = __pa_symbol(secondary_entry);
> >  	int err = psci_ops.cpu_on(cpu_logical_map(cpu), pa_secondary_entry);
> > -	if (err)
> > +	if (err && err != -EPERM)
> >  		pr_err("failed to boot CPU%d (%d)\n", cpu, err);
> >  
> >  	return err;
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index 4ced34f62dab..dc0e0b3ec2d4 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -132,7 +132,8 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> >  	/* Now bring the CPU into our world */
> >  	ret = boot_secondary(cpu, idle);
> >  	if (ret) {
> > -		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
> > +		if (ret != -EPERM)
> > +			pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
> >  		return ret;
> >  	}
> >    
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2024-04-11 13:25 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-31 16:48 [RFC PATCH v4 00/15] ACPI/arm64: add support for virtual cpu hotplug Russell King (Oracle)
2024-01-31 16:49 ` [PATCH RFC v4 01/15] ACPI: Only enumerate enabled (or functional) processor devices Russell King
2024-01-31 17:25   ` Miguel Luis
2024-02-15 20:10   ` Rafael J. Wysocki
2024-02-19  9:45     ` Jonathan Cameron
2024-02-20 11:30     ` Russell King (Oracle)
2024-02-21 13:01   ` Rafael J. Wysocki
2024-01-31 16:49 ` [PATCH RFC v4 02/15] ACPI: processor: Register all CPUs from acpi_processor_get_info() Russell King
2024-02-15 19:22   ` Rafael J. Wysocki
2024-02-20 11:27     ` Russell King (Oracle)
2024-02-20 15:13       ` Russell King (Oracle)
2024-02-20 16:24         ` Jonathan Cameron
2024-02-20 19:59           ` Rafael J. Wysocki
2024-02-21 12:04             ` Rafael J. Wysocki
2024-02-20 20:59     ` Thomas Gleixner
2024-03-22 18:53     ` Jonathan Cameron
2024-04-10 12:43       ` Jonathan Cameron
2024-04-10 13:28         ` Rafael J. Wysocki
2024-04-10 13:50           ` Jonathan Cameron
2024-04-10 14:19             ` Rafael J. Wysocki
2024-04-10 15:58               ` Jonathan Cameron
2024-04-10 18:56             ` Russell King (Oracle)
2024-04-10 19:08               ` Rafael J. Wysocki
2024-04-10 21:07                 ` Jonathan Cameron
2024-01-31 16:49 ` [PATCH RFC v4 03/15] ACPI: Move acpi_bus_trim_one() before acpi_scan_hot_remove() Russell King
2024-01-31 16:49 ` [PATCH RFC v4 04/15] ACPI: Rename acpi_processor_hotadd_init and remove pre-processor guards Russell King
2024-01-31 16:50 ` [PATCH RFC v4 05/15] ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug Russell King
2024-01-31 16:50 ` [PATCH RFC v4 06/15] ACPI: convert acpi_processor_post_eject() to use IS_ENABLED() Russell King (Oracle)
2024-01-31 16:50 ` [PATCH RFC v4 07/15] ACPI: Check _STA present bit before making CPUs not present Russell King
2024-01-31 16:50 ` [PATCH RFC v4 08/15] ACPI: Warn when the present bit changes but the feature is not enabled Russell King
2024-01-31 16:50 ` [PATCH RFC v4 09/15] arm64: acpi: Move get_cpu_for_acpi_id() to a header Russell King
2024-01-31 16:50 ` [PATCH RFC v4 10/15] irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc() Russell King
2024-02-02 16:44   ` Jonathan Cameron
2024-01-31 16:50 ` [PATCH RFC v4 11/15] irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs Russell King
2024-02-02 16:47   ` Jonathan Cameron
2024-01-31 16:50 ` [PATCH RFC v4 12/15] arm64: psci: Ignore DENIED CPUs Russell King
2024-04-11 11:35   ` Jonathan Cameron
2024-04-11 13:25     ` Jonathan Cameron
2024-01-31 16:50 ` [PATCH RFC v4 13/15] ACPI: add support to (un)register CPUs based on the _STA enabled bit Russell King
2024-01-31 16:50 ` [PATCH RFC v4 14/15] arm64: document virtual CPU hotplug's expectations Russell King
2024-02-02 17:04   ` Jonathan Cameron
2024-01-31 16:50 ` [PATCH RFC v4 15/15] cpumask: Add enabled cpumask for present CPUs that can be brought online Russell King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).