linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [Question] Memory hotplug clarification for Qemu ARM/virt
@ 2019-05-08 10:15 Shameerali Kolothum Thodi
  2019-05-08 12:50 ` Robin Murphy
  2019-05-08 20:08 ` Laszlo Ersek
  0 siblings, 2 replies; 11+ messages in thread
From: Shameerali Kolothum Thodi @ 2019-05-08 10:15 UTC (permalink / raw)
  To: robin.murphy, will.deacon, Catalin Marinas, Anshuman Khandual,
	linux-arm-kernel, linux-mm
  Cc: qemu-devel, qemu-arm, eric.auger, Igor Mammedov, Laszlo Ersek,
	peter.maydell, Linuxarm, ard.biesheuvel, Jonathan Cameron,
	xuwei (O)

Hi,

This series here[0] attempts to add support for PCDIMM in QEMU for
ARM/Virt platform and has stumbled upon an issue as it is not clear(at least
from Qemu/EDK2 point of view) how in physical world the hotpluggable
memory is handled by kernel.

The proposed implementation in Qemu, builds the SRAT and DSDT parts
and uses GED device to trigger the hotplug. This works fine.

But when we added the DT node corresponding to the PCDIMM(cold plug
scenario), we noticed that Guest kernel see this memory during early boot
even if we are booting with ACPI. Because of this, hotpluggable memory
may end up in zone normal and make it non-hot-un-pluggable even if Guest
boots with ACPI.

Further discussions[1] revealed that, EDK2 UEFI has no means to interpret the
ACPI content from Qemu(this is designed to do so) and uses DT info to
build the GetMemoryMap(). To solve this, introduced "hotpluggable" property
to DT memory node(patches #7 & #8 from [0]) so that UEFI can differentiate
the nodes and exclude the hotpluggable ones from GetMemoryMap().

But then Laszlo rightly pointed out that in order to accommodate the changes
into UEFI we need to know how exactly Linux expects/handles all the 
hotpluggable memory scenarios. Please find the discussion here[2].

For ease, I am just copying the relevant comment from Laszlo below,

/******
"Given patches #7 and #8, as I understand them, the firmware cannot distinguish
 hotpluggable & present, from hotpluggable & absent. The firmware can only
 skip both hotpluggable cases. That's fine in that the firmware will hog neither
 type -- but is that OK for the OS as well, for both ACPI boot and DT boot?

Consider in particular the "hotpluggable & present, ACPI boot" case. Assuming
we modify the firmware to skip "hotpluggable" altogether, the UEFI memmap
will not include the range despite it being present at boot. Presumably, ACPI
will refer to the range somehow, however. Will that not confuse the OS?

When Igor raised this earlier, I suggested that hotpluggable-and-present should
be added by the firmware, but also allocated immediately, as EfiBootServicesData
type memory. This will prevent other drivers in the firmware from allocating AcpiNVS
or Reserved chunks from the same memory range, the UEFI memmap will contain
the range as EfiBootServicesData, and then the OS can release that allocation in
one go early during boot.

But this really has to be clarified from the Linux kernel's expectations. Please
formalize all of the following cases:

OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as  DT/ACPI should report as
-----------------  ------------------  -------------------------------  ------------------------
DT                 present             ?                                ?
DT                 absent              ?                                ?
ACPI               present             ?                                ?
ACPI               absent              ?                                ?

Again, this table is dictated by Linux."

******/

Could you please take a look at this and let us know what is expected here from
a Linux kernel view point.

(Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed any valid
points above).

Thanks,
Shameer
[0] https://patchwork.kernel.org/cover/10890919/
[1] https://patchwork.kernel.org/patch/10863299/
[2] https://patchwork.kernel.org/patch/10890937/



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Question] Memory hotplug clarification for Qemu ARM/virt
  2019-05-08 10:15 [Question] Memory hotplug clarification for Qemu ARM/virt Shameerali Kolothum Thodi
@ 2019-05-08 12:50 ` Robin Murphy
  2019-05-08 20:26   ` Laszlo Ersek
  2019-05-08 20:08 ` Laszlo Ersek
  1 sibling, 1 reply; 11+ messages in thread
From: Robin Murphy @ 2019-05-08 12:50 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, will.deacon, Catalin Marinas,
	Anshuman Khandual, linux-arm-kernel, linux-mm
  Cc: qemu-devel, qemu-arm, eric.auger, Igor Mammedov, Laszlo Ersek,
	peter.maydell, Linuxarm, ard.biesheuvel, Jonathan Cameron,
	xuwei (O)

Hi Shameer,

On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
> Hi,
> 
> This series here[0] attempts to add support for PCDIMM in QEMU for
> ARM/Virt platform and has stumbled upon an issue as it is not clear(at least
> from Qemu/EDK2 point of view) how in physical world the hotpluggable
> memory is handled by kernel.
> 
> The proposed implementation in Qemu, builds the SRAT and DSDT parts
> and uses GED device to trigger the hotplug. This works fine.
> 
> But when we added the DT node corresponding to the PCDIMM(cold plug
> scenario), we noticed that Guest kernel see this memory during early boot
> even if we are booting with ACPI. Because of this, hotpluggable memory
> may end up in zone normal and make it non-hot-un-pluggable even if Guest
> boots with ACPI.
> 
> Further discussions[1] revealed that, EDK2 UEFI has no means to interpret the
> ACPI content from Qemu(this is designed to do so) and uses DT info to
> build the GetMemoryMap(). To solve this, introduced "hotpluggable" property
> to DT memory node(patches #7 & #8 from [0]) so that UEFI can differentiate
> the nodes and exclude the hotpluggable ones from GetMemoryMap().
> 
> But then Laszlo rightly pointed out that in order to accommodate the changes
> into UEFI we need to know how exactly Linux expects/handles all the
> hotpluggable memory scenarios. Please find the discussion here[2].
> 
> For ease, I am just copying the relevant comment from Laszlo below,
> 
> /******
> "Given patches #7 and #8, as I understand them, the firmware cannot distinguish
>   hotpluggable & present, from hotpluggable & absent. The firmware can only
>   skip both hotpluggable cases. That's fine in that the firmware will hog neither
>   type -- but is that OK for the OS as well, for both ACPI boot and DT boot?
> 
> Consider in particular the "hotpluggable & present, ACPI boot" case. Assuming
> we modify the firmware to skip "hotpluggable" altogether, the UEFI memmap
> will not include the range despite it being present at boot. Presumably, ACPI
> will refer to the range somehow, however. Will that not confuse the OS?
> 
> When Igor raised this earlier, I suggested that hotpluggable-and-present should
> be added by the firmware, but also allocated immediately, as EfiBootServicesData
> type memory. This will prevent other drivers in the firmware from allocating AcpiNVS
> or Reserved chunks from the same memory range, the UEFI memmap will contain
> the range as EfiBootServicesData, and then the OS can release that allocation in
> one go early during boot.
> 
> But this really has to be clarified from the Linux kernel's expectations. Please
> formalize all of the following cases:
> 
> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as  DT/ACPI should report as
> -----------------  ------------------  -------------------------------  ------------------------
> DT                 present             ?                                ?
> DT                 absent              ?                                ?
> ACPI               present             ?                                ?
> ACPI               absent              ?                                ?
> 
> Again, this table is dictated by Linux."
> 
> ******/
> 
> Could you please take a look at this and let us know what is expected here from
> a Linux kernel view point.

For arm64, so far we've not even been considering DT-based hotplug - as 
far as I'm aware there would still be a big open question there around 
notification mechanisms and how to describe them. The DT stuff so far 
has come from the PowerPC folks, so it's probably worth seeing what 
their ideas are.

ACPI-wise I've always assumed/hoped that hotplug-related things should 
be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do" 
would be enough for us.

Robin.

> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed any valid
> points above).
> 
> Thanks,
> Shameer
> [0] https://patchwork.kernel.org/cover/10890919/
> [1] https://patchwork.kernel.org/patch/10863299/
> [2] https://patchwork.kernel.org/patch/10890937/
> 
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Question] Memory hotplug clarification for Qemu ARM/virt
  2019-05-08 10:15 [Question] Memory hotplug clarification for Qemu ARM/virt Shameerali Kolothum Thodi
  2019-05-08 12:50 ` Robin Murphy
@ 2019-05-08 20:08 ` Laszlo Ersek
  1 sibling, 0 replies; 11+ messages in thread
From: Laszlo Ersek @ 2019-05-08 20:08 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, robin.murphy, will.deacon,
	Catalin Marinas, Anshuman Khandual, linux-arm-kernel, linux-mm
  Cc: qemu-devel, qemu-arm, eric.auger, Igor Mammedov, peter.maydell,
	Linuxarm, ard.biesheuvel, Jonathan Cameron, xuwei (O)

On 05/08/19 12:15, Shameerali Kolothum Thodi wrote:
> Hi,
> 
> This series here[0] attempts to add support for PCDIMM in QEMU for
> ARM/Virt platform and has stumbled upon an issue as it is not clear(at least
> from Qemu/EDK2 point of view) how in physical world the hotpluggable
> memory is handled by kernel.
> 
> The proposed implementation in Qemu, builds the SRAT and DSDT parts
> and uses GED device to trigger the hotplug. This works fine.
> 
> But when we added the DT node corresponding to the PCDIMM(cold plug
> scenario), we noticed that Guest kernel see this memory during early boot
> even if we are booting with ACPI. Because of this, hotpluggable memory
> may end up in zone normal and make it non-hot-un-pluggable even if Guest
> boots with ACPI.
> 
> Further discussions[1] revealed that, EDK2 UEFI has no means to interpret the
> ACPI content from Qemu(this is designed to do so) and uses DT info to
> build the GetMemoryMap(). To solve this, introduced "hotpluggable" property
> to DT memory node(patches #7 & #8 from [0]) so that UEFI can differentiate
> the nodes and exclude the hotpluggable ones from GetMemoryMap().
> 
> But then Laszlo rightly pointed out that in order to accommodate the changes
> into UEFI we need to know how exactly Linux expects/handles all the 
> hotpluggable memory scenarios. Please find the discussion here[2].
> 
> For ease, I am just copying the relevant comment from Laszlo below,
> 
> /******
> "Given patches #7 and #8, as I understand them, the firmware cannot distinguish
>  hotpluggable & present, from hotpluggable & absent. The firmware can only
>  skip both hotpluggable cases. That's fine in that the firmware will hog neither
>  type -- but is that OK for the OS as well, for both ACPI boot and DT boot?
> 
> Consider in particular the "hotpluggable & present, ACPI boot" case. Assuming
> we modify the firmware to skip "hotpluggable" altogether, the UEFI memmap
> will not include the range despite it being present at boot. Presumably, ACPI
> will refer to the range somehow, however. Will that not confuse the OS?
> 
> When Igor raised this earlier, I suggested that hotpluggable-and-present should
> be added by the firmware, but also allocated immediately, as EfiBootServicesData
> type memory. This will prevent other drivers in the firmware from allocating AcpiNVS
> or Reserved chunks from the same memory range, the UEFI memmap will contain
> the range as EfiBootServicesData, and then the OS can release that allocation in
> one go early during boot.
> 
> But this really has to be clarified from the Linux kernel's expectations. Please
> formalize all of the following cases:
> 
> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as  DT/ACPI should report as
> -----------------  ------------------  -------------------------------  ------------------------
> DT                 present             ?                                ?
> DT                 absent              ?                                ?
> ACPI               present             ?                                ?
> ACPI               absent              ?                                ?
> 
> Again, this table is dictated by Linux."
> 
> ******/
> 
> Could you please take a look at this and let us know what is expected here from
> a Linux kernel view point.
> 
> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed any valid
> points above).

I'm happy with your summary, thank you!
Laszlo

> 
> Thanks,
> Shameer
> [0] https://patchwork.kernel.org/cover/10890919/
> [1] https://patchwork.kernel.org/patch/10863299/
> [2] https://patchwork.kernel.org/patch/10890937/
> 
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Question] Memory hotplug clarification for Qemu ARM/virt
  2019-05-08 12:50 ` Robin Murphy
@ 2019-05-08 20:26   ` Laszlo Ersek
  2019-05-09 16:35     ` Igor Mammedov
  0 siblings, 1 reply; 11+ messages in thread
From: Laszlo Ersek @ 2019-05-08 20:26 UTC (permalink / raw)
  To: Robin Murphy, Shameerali Kolothum Thodi, will.deacon,
	Catalin Marinas, Anshuman Khandual, linux-arm-kernel, linux-mm
  Cc: qemu-devel, qemu-arm, eric.auger, Igor Mammedov, peter.maydell,
	Linuxarm, ard.biesheuvel, Jonathan Cameron, xuwei (O)

On 05/08/19 14:50, Robin Murphy wrote:
> Hi Shameer,
> 
> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
>> Hi,
>>
>> This series here[0] attempts to add support for PCDIMM in QEMU for
>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at
>> least
>> from Qemu/EDK2 point of view) how in physical world the hotpluggable
>> memory is handled by kernel.
>>
>> The proposed implementation in Qemu, builds the SRAT and DSDT parts
>> and uses GED device to trigger the hotplug. This works fine.
>>
>> But when we added the DT node corresponding to the PCDIMM(cold plug
>> scenario), we noticed that Guest kernel see this memory during early boot
>> even if we are booting with ACPI. Because of this, hotpluggable memory
>> may end up in zone normal and make it non-hot-un-pluggable even if Guest
>> boots with ACPI.
>>
>> Further discussions[1] revealed that, EDK2 UEFI has no means to
>> interpret the
>> ACPI content from Qemu(this is designed to do so) and uses DT info to
>> build the GetMemoryMap(). To solve this, introduced "hotpluggable"
>> property
>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can
>> differentiate
>> the nodes and exclude the hotpluggable ones from GetMemoryMap().
>>
>> But then Laszlo rightly pointed out that in order to accommodate the
>> changes
>> into UEFI we need to know how exactly Linux expects/handles all the
>> hotpluggable memory scenarios. Please find the discussion here[2].
>>
>> For ease, I am just copying the relevant comment from Laszlo below,
>>
>> /******
>> "Given patches #7 and #8, as I understand them, the firmware cannot
>> distinguish
>>   hotpluggable & present, from hotpluggable & absent. The firmware can
>> only
>>   skip both hotpluggable cases. That's fine in that the firmware will
>> hog neither
>>   type -- but is that OK for the OS as well, for both ACPI boot and DT
>> boot?
>>
>> Consider in particular the "hotpluggable & present, ACPI boot" case.
>> Assuming
>> we modify the firmware to skip "hotpluggable" altogether, the UEFI memmap
>> will not include the range despite it being present at boot.
>> Presumably, ACPI
>> will refer to the range somehow, however. Will that not confuse the OS?
>>
>> When Igor raised this earlier, I suggested that
>> hotpluggable-and-present should
>> be added by the firmware, but also allocated immediately, as
>> EfiBootServicesData
>> type memory. This will prevent other drivers in the firmware from
>> allocating AcpiNVS
>> or Reserved chunks from the same memory range, the UEFI memmap will
>> contain
>> the range as EfiBootServicesData, and then the OS can release that
>> allocation in
>> one go early during boot.
>>
>> But this really has to be clarified from the Linux kernel's
>> expectations. Please
>> formalize all of the following cases:
>>
>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report
>> as  DT/ACPI should report as
>> -----------------  ------------------ 
>> -------------------------------  ------------------------
>> DT                 present             ?                                ?
>> DT                 absent              ?                                ?
>> ACPI               present             ?                                ?
>> ACPI               absent              ?                                ?
>>
>> Again, this table is dictated by Linux."
>>
>> ******/
>>
>> Could you please take a look at this and let us know what is expected
>> here from
>> a Linux kernel view point.
> 
> For arm64, so far we've not even been considering DT-based hotplug - as
> far as I'm aware there would still be a big open question there around
> notification mechanisms and how to describe them. The DT stuff so far
> has come from the PowerPC folks, so it's probably worth seeing what
> their ideas are.
> 
> ACPI-wise I've always assumed/hoped that hotplug-related things should
> be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do"
> would be enough for us.

As far as I can see in UEFI v2.8 -- and I had checked the spec before
dumping the table with the many question marks on Shameer --, all the
hot-plug language in the spec refers to USB and PCI hot-plug in the
preboot environment. There is not a single word about hot-plug at OS
runtime (regarding any device or component type), nor about memory
hot-plug (at any time).

Looking to x86 appears valid -- so what does the Linux kernel expect on
that architecture, in the "ACPI" rows of the table?

Shameer: if you (Huawei) are represented on the USWG / ASWG, I suggest
re-raising the question on those lists too; at least the "ACPI" rows of
the table.

Thanks!
Laszlo

> 
> Robin.
> 
>> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed
>> any valid
>> points above).
>>
>> Thanks,
>> Shameer
>> [0] https://patchwork.kernel.org/cover/10890919/
>> [1] https://patchwork.kernel.org/patch/10863299/
>> [2] https://patchwork.kernel.org/patch/10890937/
>>
>>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Question] Memory hotplug clarification for Qemu ARM/virt
  2019-05-08 20:26   ` Laszlo Ersek
@ 2019-05-09 16:35     ` Igor Mammedov
  2019-05-09 21:48       ` Laszlo Ersek
  0 siblings, 1 reply; 11+ messages in thread
From: Igor Mammedov @ 2019-05-09 16:35 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Robin Murphy, Shameerali Kolothum Thodi, will.deacon,
	Catalin Marinas, Anshuman Khandual, linux-arm-kernel, linux-mm,
	qemu-devel, qemu-arm, eric.auger, peter.maydell, Linuxarm,
	ard.biesheuvel, Jonathan Cameron, xuwei (O)

On Wed, 8 May 2019 22:26:12 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> On 05/08/19 14:50, Robin Murphy wrote:
> > Hi Shameer,
> > 
> > On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
> >> Hi,
> >>
> >> This series here[0] attempts to add support for PCDIMM in QEMU for
> >> ARM/Virt platform and has stumbled upon an issue as it is not clear(at
> >> least
> >> from Qemu/EDK2 point of view) how in physical world the hotpluggable
> >> memory is handled by kernel.
> >>
> >> The proposed implementation in Qemu, builds the SRAT and DSDT parts
> >> and uses GED device to trigger the hotplug. This works fine.
> >>
> >> But when we added the DT node corresponding to the PCDIMM(cold plug
> >> scenario), we noticed that Guest kernel see this memory during early boot
> >> even if we are booting with ACPI. Because of this, hotpluggable memory
> >> may end up in zone normal and make it non-hot-un-pluggable even if Guest
> >> boots with ACPI.
> >>
> >> Further discussions[1] revealed that, EDK2 UEFI has no means to
> >> interpret the
> >> ACPI content from Qemu(this is designed to do so) and uses DT info to
> >> build the GetMemoryMap(). To solve this, introduced "hotpluggable"
> >> property
> >> to DT memory node(patches #7 & #8 from [0]) so that UEFI can
> >> differentiate
> >> the nodes and exclude the hotpluggable ones from GetMemoryMap().
> >>
> >> But then Laszlo rightly pointed out that in order to accommodate the
> >> changes
> >> into UEFI we need to know how exactly Linux expects/handles all the
> >> hotpluggable memory scenarios. Please find the discussion here[2].
> >>
> >> For ease, I am just copying the relevant comment from Laszlo below,
> >>
> >> /******
> >> "Given patches #7 and #8, as I understand them, the firmware cannot
> >> distinguish
> >>   hotpluggable & present, from hotpluggable & absent. The firmware can
> >> only
> >>   skip both hotpluggable cases. That's fine in that the firmware will
> >> hog neither
> >>   type -- but is that OK for the OS as well, for both ACPI boot and DT
> >> boot?
> >>
> >> Consider in particular the "hotpluggable & present, ACPI boot" case.
> >> Assuming
> >> we modify the firmware to skip "hotpluggable" altogether, the UEFI memmap
> >> will not include the range despite it being present at boot.
> >> Presumably, ACPI
> >> will refer to the range somehow, however. Will that not confuse the OS?
> >>
> >> When Igor raised this earlier, I suggested that
> >> hotpluggable-and-present should
> >> be added by the firmware, but also allocated immediately, as
> >> EfiBootServicesData
> >> type memory. This will prevent other drivers in the firmware from
> >> allocating AcpiNVS
> >> or Reserved chunks from the same memory range, the UEFI memmap will
> >> contain
> >> the range as EfiBootServicesData, and then the OS can release that
> >> allocation in
> >> one go early during boot.
> >>
> >> But this really has to be clarified from the Linux kernel's
> >> expectations. Please
> >> formalize all of the following cases:
> >>
> >> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report
> >> as  DT/ACPI should report as
> >> -----------------  ------------------ 
> >> -------------------------------  ------------------------
> >> DT                 present             ?                                ?
> >> DT                 absent              ?                                ?
> >> ACPI               present             ?                                ?
> >> ACPI               absent              ?                                ?
> >>
> >> Again, this table is dictated by Linux."
> >>
> >> ******/
> >>
> >> Could you please take a look at this and let us know what is expected
> >> here from
> >> a Linux kernel view point.
> > 
> > For arm64, so far we've not even been considering DT-based hotplug - as
> > far as I'm aware there would still be a big open question there around
> > notification mechanisms and how to describe them. The DT stuff so far
> > has come from the PowerPC folks, so it's probably worth seeing what
> > their ideas are.
> > 
> > ACPI-wise I've always assumed/hoped that hotplug-related things should
> > be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do"
> > would be enough for us.
> 
> As far as I can see in UEFI v2.8 -- and I had checked the spec before
> dumping the table with the many question marks on Shameer --, all the
> hot-plug language in the spec refers to USB and PCI hot-plug in the
> preboot environment. There is not a single word about hot-plug at OS
> runtime (regarding any device or component type), nor about memory
> hot-plug (at any time).
>
> Looking to x86 appears valid -- so what does the Linux kernel expect on
> that architecture, in the "ACPI" rows of the table?

I could only answer from QEMU x86 perspective.
QEMU for x86 guests currently doesn't add hot-pluggable RAM into E820
because of different linux guests tend to cannibalize it, making it non
unpluggable. The last culprit I recall was KASLR.

So I'd refrain from reporting hotpluggable RAM in GetMemoryMap() if
it's possible (it's probably hack (spec deosn't say anything about it)
but it mostly works for Linux (plug/unplug) and Windows guest also
fine with plug part (no unplug there)).

As for physical systems, there are out there ones that do report
hotpluggable RAM in GetMemoryMap().

> Shameer: if you (Huawei) are represented on the USWG / ASWG, I suggest
> re-raising the question on those lists too; at least the "ACPI" rows of
> the table.
> 
> Thanks!
> Laszlo
> 
> > 
> > Robin.
> > 
> >> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed
> >> any valid
> >> points above).
> >>
> >> Thanks,
> >> Shameer
> >> [0] https://patchwork.kernel.org/cover/10890919/
> >> [1] https://patchwork.kernel.org/patch/10863299/
> >> [2] https://patchwork.kernel.org/patch/10890937/
> >>
> >>
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Question] Memory hotplug clarification for Qemu ARM/virt
  2019-05-09 16:35     ` Igor Mammedov
@ 2019-05-09 21:48       ` Laszlo Ersek
  2019-05-10  8:34         ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 11+ messages in thread
From: Laszlo Ersek @ 2019-05-09 21:48 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Robin Murphy, Shameerali Kolothum Thodi, will.deacon,
	Catalin Marinas, Anshuman Khandual, linux-arm-kernel, linux-mm,
	qemu-devel, qemu-arm, eric.auger, peter.maydell, Linuxarm,
	ard.biesheuvel, Jonathan Cameron, xuwei (O)

On 05/09/19 18:35, Igor Mammedov wrote:
> On Wed, 8 May 2019 22:26:12 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> On 05/08/19 14:50, Robin Murphy wrote:
>>> Hi Shameer,
>>>
>>> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
>>>> Hi,
>>>>
>>>> This series here[0] attempts to add support for PCDIMM in QEMU for
>>>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at
>>>> least
>>>> from Qemu/EDK2 point of view) how in physical world the hotpluggable
>>>> memory is handled by kernel.
>>>>
>>>> The proposed implementation in Qemu, builds the SRAT and DSDT parts
>>>> and uses GED device to trigger the hotplug. This works fine.
>>>>
>>>> But when we added the DT node corresponding to the PCDIMM(cold plug
>>>> scenario), we noticed that Guest kernel see this memory during early boot
>>>> even if we are booting with ACPI. Because of this, hotpluggable memory
>>>> may end up in zone normal and make it non-hot-un-pluggable even if Guest
>>>> boots with ACPI.
>>>>
>>>> Further discussions[1] revealed that, EDK2 UEFI has no means to
>>>> interpret the
>>>> ACPI content from Qemu(this is designed to do so) and uses DT info to
>>>> build the GetMemoryMap(). To solve this, introduced "hotpluggable"
>>>> property
>>>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can
>>>> differentiate
>>>> the nodes and exclude the hotpluggable ones from GetMemoryMap().
>>>>
>>>> But then Laszlo rightly pointed out that in order to accommodate the
>>>> changes
>>>> into UEFI we need to know how exactly Linux expects/handles all the
>>>> hotpluggable memory scenarios. Please find the discussion here[2].
>>>>
>>>> For ease, I am just copying the relevant comment from Laszlo below,
>>>>
>>>> /******
>>>> "Given patches #7 and #8, as I understand them, the firmware cannot
>>>> distinguish
>>>>   hotpluggable & present, from hotpluggable & absent. The firmware can
>>>> only
>>>>   skip both hotpluggable cases. That's fine in that the firmware will
>>>> hog neither
>>>>   type -- but is that OK for the OS as well, for both ACPI boot and DT
>>>> boot?
>>>>
>>>> Consider in particular the "hotpluggable & present, ACPI boot" case.
>>>> Assuming
>>>> we modify the firmware to skip "hotpluggable" altogether, the UEFI memmap
>>>> will not include the range despite it being present at boot.
>>>> Presumably, ACPI
>>>> will refer to the range somehow, however. Will that not confuse the OS?
>>>>
>>>> When Igor raised this earlier, I suggested that
>>>> hotpluggable-and-present should
>>>> be added by the firmware, but also allocated immediately, as
>>>> EfiBootServicesData
>>>> type memory. This will prevent other drivers in the firmware from
>>>> allocating AcpiNVS
>>>> or Reserved chunks from the same memory range, the UEFI memmap will
>>>> contain
>>>> the range as EfiBootServicesData, and then the OS can release that
>>>> allocation in
>>>> one go early during boot.
>>>>
>>>> But this really has to be clarified from the Linux kernel's
>>>> expectations. Please
>>>> formalize all of the following cases:
>>>>
>>>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report
>>>> as  DT/ACPI should report as
>>>> -----------------  ------------------ 
>>>> -------------------------------  ------------------------
>>>> DT                 present             ?                                ?
>>>> DT                 absent              ?                                ?
>>>> ACPI               present             ?                                ?
>>>> ACPI               absent              ?                                ?
>>>>
>>>> Again, this table is dictated by Linux."
>>>>
>>>> ******/
>>>>
>>>> Could you please take a look at this and let us know what is expected
>>>> here from
>>>> a Linux kernel view point.
>>>
>>> For arm64, so far we've not even been considering DT-based hotplug - as
>>> far as I'm aware there would still be a big open question there around
>>> notification mechanisms and how to describe them. The DT stuff so far
>>> has come from the PowerPC folks, so it's probably worth seeing what
>>> their ideas are.
>>>
>>> ACPI-wise I've always assumed/hoped that hotplug-related things should
>>> be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do"
>>> would be enough for us.
>>
>> As far as I can see in UEFI v2.8 -- and I had checked the spec before
>> dumping the table with the many question marks on Shameer --, all the
>> hot-plug language in the spec refers to USB and PCI hot-plug in the
>> preboot environment. There is not a single word about hot-plug at OS
>> runtime (regarding any device or component type), nor about memory
>> hot-plug (at any time).
>>
>> Looking to x86 appears valid -- so what does the Linux kernel expect on
>> that architecture, in the "ACPI" rows of the table?
> 
> I could only answer from QEMU x86 perspective.
> QEMU for x86 guests currently doesn't add hot-pluggable RAM into E820
> because of different linux guests tend to cannibalize it, making it non
> unpluggable. The last culprit I recall was KASLR.
> 
> So I'd refrain from reporting hotpluggable RAM in GetMemoryMap() if
> it's possible (it's probably hack (spec deosn't say anything about it)
> but it mostly works for Linux (plug/unplug) and Windows guest also
> fine with plug part (no unplug there)).

I can accept this as a perfectly valid design. Which would mean, QEMU should mark each hotpluggable RAM range in the DTB for the firmware with the special new property, regardless of its initial ("cold") plugged-ness, and then the firmware will not expose the range in the GCD memory space map, and consequently in the UEFI memmap either.

IOW, our table is, thus far:

OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as  DT/ACPI should report as
-----------------  ------------------  -------------------------------  ------------------------
DT                 present             ABSENT                           ?
DT                 absent              ABSENT                           ?
ACPI               present             ABSENT                           PRESENT
ACPI               absent              ABSENT                           ABSENT

In the firmware, I only need to care about the GetMemoryMap() column, so I can work with this. Can someone please file a feature request at <https://bugzilla.tianocore.org/>, for the ArmVirtPkg Package, with these detais?

Thanks
Laszlo

> 
> As for physical systems, there are out there ones that do report
> hotpluggable RAM in GetMemoryMap().
> 
>> Shameer: if you (Huawei) are represented on the USWG / ASWG, I suggest
>> re-raising the question on those lists too; at least the "ACPI" rows of
>> the table.
>>
>> Thanks!
>> Laszlo
>>
>>>
>>> Robin.
>>>
>>>> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed
>>>> any valid
>>>> points above).
>>>>
>>>> Thanks,
>>>> Shameer
>>>> [0] https://patchwork.kernel.org/cover/10890919/
>>>> [1] https://patchwork.kernel.org/patch/10863299/
>>>> [2] https://patchwork.kernel.org/patch/10890937/
>>>>
>>>>
>>
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [Question] Memory hotplug clarification for Qemu ARM/virt
  2019-05-09 21:48       ` Laszlo Ersek
@ 2019-05-10  8:34         ` Shameerali Kolothum Thodi
  2019-05-10  9:15           ` [Qemu-devel] " Auger Eric
  0 siblings, 1 reply; 11+ messages in thread
From: Shameerali Kolothum Thodi @ 2019-05-10  8:34 UTC (permalink / raw)
  To: Laszlo Ersek, Igor Mammedov
  Cc: Robin Murphy, will.deacon, Catalin Marinas, Anshuman Khandual,
	linux-arm-kernel, linux-mm, qemu-devel, qemu-arm, eric.auger,
	peter.maydell, Linuxarm, ard.biesheuvel, Jonathan Cameron,
	xuwei (O)



> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: 09 May 2019 22:48
> To: Igor Mammedov <imammedo@redhat.com>
> Cc: Robin Murphy <robin.murphy@arm.com>; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; will.deacon@arm.com; Catalin
> Marinas <Catalin.Marinas@arm.com>; Anshuman Khandual
> <anshuman.khandual@arm.com>; linux-arm-kernel@lists.infradead.org;
> linux-mm <linux-mm@kvack.org>; qemu-devel@nongnu.org;
> qemu-arm@nongnu.org; eric.auger@redhat.com; peter.maydell@linaro.org;
> Linuxarm <linuxarm@huawei.com>; ard.biesheuvel@linaro.org; Jonathan
> Cameron <jonathan.cameron@huawei.com>; xuwei (O) <xuwei5@huawei.com>
> Subject: Re: [Question] Memory hotplug clarification for Qemu ARM/virt
> 
> On 05/09/19 18:35, Igor Mammedov wrote:
> > On Wed, 8 May 2019 22:26:12 +0200
> > Laszlo Ersek <lersek@redhat.com> wrote:
> >
> >> On 05/08/19 14:50, Robin Murphy wrote:
> >>> Hi Shameer,
> >>>
> >>> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
> >>>> Hi,
> >>>>
> >>>> This series here[0] attempts to add support for PCDIMM in QEMU for
> >>>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at
> >>>> least
> >>>> from Qemu/EDK2 point of view) how in physical world the hotpluggable
> >>>> memory is handled by kernel.
> >>>>
> >>>> The proposed implementation in Qemu, builds the SRAT and DSDT parts
> >>>> and uses GED device to trigger the hotplug. This works fine.
> >>>>
> >>>> But when we added the DT node corresponding to the PCDIMM(cold plug
> >>>> scenario), we noticed that Guest kernel see this memory during early
> boot
> >>>> even if we are booting with ACPI. Because of this, hotpluggable memory
> >>>> may end up in zone normal and make it non-hot-un-pluggable even if
> Guest
> >>>> boots with ACPI.
> >>>>
> >>>> Further discussions[1] revealed that, EDK2 UEFI has no means to
> >>>> interpret the
> >>>> ACPI content from Qemu(this is designed to do so) and uses DT info to
> >>>> build the GetMemoryMap(). To solve this, introduced "hotpluggable"
> >>>> property
> >>>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can
> >>>> differentiate
> >>>> the nodes and exclude the hotpluggable ones from GetMemoryMap().
> >>>>
> >>>> But then Laszlo rightly pointed out that in order to accommodate the
> >>>> changes
> >>>> into UEFI we need to know how exactly Linux expects/handles all the
> >>>> hotpluggable memory scenarios. Please find the discussion here[2].
> >>>>
> >>>> For ease, I am just copying the relevant comment from Laszlo below,
> >>>>
> >>>> /******
> >>>> "Given patches #7 and #8, as I understand them, the firmware cannot
> >>>> distinguish
> >>>>   hotpluggable & present, from hotpluggable & absent. The firmware
> can
> >>>> only
> >>>>   skip both hotpluggable cases. That's fine in that the firmware will
> >>>> hog neither
> >>>>   type -- but is that OK for the OS as well, for both ACPI boot and DT
> >>>> boot?
> >>>>
> >>>> Consider in particular the "hotpluggable & present, ACPI boot" case.
> >>>> Assuming
> >>>> we modify the firmware to skip "hotpluggable" altogether, the UEFI
> memmap
> >>>> will not include the range despite it being present at boot.
> >>>> Presumably, ACPI
> >>>> will refer to the range somehow, however. Will that not confuse the OS?
> >>>>
> >>>> When Igor raised this earlier, I suggested that
> >>>> hotpluggable-and-present should
> >>>> be added by the firmware, but also allocated immediately, as
> >>>> EfiBootServicesData
> >>>> type memory. This will prevent other drivers in the firmware from
> >>>> allocating AcpiNVS
> >>>> or Reserved chunks from the same memory range, the UEFI memmap will
> >>>> contain
> >>>> the range as EfiBootServicesData, and then the OS can release that
> >>>> allocation in
> >>>> one go early during boot.
> >>>>
> >>>> But this really has to be clarified from the Linux kernel's
> >>>> expectations. Please
> >>>> formalize all of the following cases:
> >>>>
> >>>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report
> >>>> as  DT/ACPI should report as
> >>>> -----------------  ------------------
> >>>> -------------------------------  ------------------------
> >>>>
> DT                 present             ?
>               ?
> >>>>
> DT                 absent              ?
>                ?
> >>>>
> ACPI               present             ?
>               ?
> >>>>
> ACPI               absent              ?
>               ?
> >>>>
> >>>> Again, this table is dictated by Linux."
> >>>>
> >>>> ******/
> >>>>
> >>>> Could you please take a look at this and let us know what is expected
> >>>> here from
> >>>> a Linux kernel view point.
> >>>
> >>> For arm64, so far we've not even been considering DT-based hotplug - as
> >>> far as I'm aware there would still be a big open question there around
> >>> notification mechanisms and how to describe them. The DT stuff so far
> >>> has come from the PowerPC folks, so it's probably worth seeing what
> >>> their ideas are.
> >>>
> >>> ACPI-wise I've always assumed/hoped that hotplug-related things should
> >>> be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do"
> >>> would be enough for us.
> >>
> >> As far as I can see in UEFI v2.8 -- and I had checked the spec before
> >> dumping the table with the many question marks on Shameer --, all the
> >> hot-plug language in the spec refers to USB and PCI hot-plug in the
> >> preboot environment. There is not a single word about hot-plug at OS
> >> runtime (regarding any device or component type), nor about memory
> >> hot-plug (at any time).
> >>
> >> Looking to x86 appears valid -- so what does the Linux kernel expect on
> >> that architecture, in the "ACPI" rows of the table?
> >
> > I could only answer from QEMU x86 perspective.
> > QEMU for x86 guests currently doesn't add hot-pluggable RAM into E820
> > because of different linux guests tend to cannibalize it, making it non
> > unpluggable. The last culprit I recall was KASLR.
> >
> > So I'd refrain from reporting hotpluggable RAM in GetMemoryMap() if
> > it's possible (it's probably hack (spec deosn't say anything about it)
> > but it mostly works for Linux (plug/unplug) and Windows guest also
> > fine with plug part (no unplug there)).
> 
> I can accept this as a perfectly valid design. Which would mean, QEMU should
> mark each hotpluggable RAM range in the DTB for the firmware with the
> special new property, regardless of its initial ("cold") plugged-ness, and then
> the firmware will not expose the range in the GCD memory space map, and
> consequently in the UEFI memmap either.
> 
> IOW, our table is, thus far:
> 
> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as
> DT/ACPI should report as
> -----------------  ------------------  -------------------------------  ------------------------
> DT                 present
> ABSENT                           ?
> DT                 absent
> ABSENT                           ?
> ACPI               present             ABSENT
> PRESENT
> ACPI               absent              ABSENT
> ABSENT
> In the firmware, I only need to care about the GetMemoryMap() column, so I
> can work with this.

Thank you all for the inputs.

I assume we will still report the DT cold plug case to kernel(hotpluggable & present).
so the table will be something like this,

OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as  DT/ACPI should report as
-----------------  ------------------  -------------------------------  ------------------------
DT                 present             ABSENT                           PRESENT
DT                 absent              ABSENT                           ABSENT
ACPI               present             ABSENT                           PRESENT
ACPI               absent              ABSENT                           ABSENT 


 Can someone please file a feature request at
> <https://bugzilla.tianocore.org/>, for the ArmVirtPkg Package, with these
> detais?

Ok. I will do that.

Thanks,
Shameer

> Thanks
> Laszlo
> 
> >
> > As for physical systems, there are out there ones that do report
> > hotpluggable RAM in GetMemoryMap().
> >
> >> Shameer: if you (Huawei) are represented on the USWG / ASWG, I suggest
> >> re-raising the question on those lists too; at least the "ACPI" rows of
> >> the table.
> >>
> >> Thanks!
> >> Laszlo
> >>
> >>>
> >>> Robin.
> >>>
> >>>> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed
> >>>> any valid
> >>>> points above).
> >>>>
> >>>> Thanks,
> >>>> Shameer
> >>>> [0] https://patchwork.kernel.org/cover/10890919/
> >>>> [1] https://patchwork.kernel.org/patch/10863299/
> >>>> [2] https://patchwork.kernel.org/patch/10890937/
> >>>>
> >>>>
> >>
> >


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [Question] Memory hotplug clarification for Qemu ARM/virt
  2019-05-10  8:34         ` Shameerali Kolothum Thodi
@ 2019-05-10  9:15           ` Auger Eric
  2019-05-10  9:27             ` Shameerali Kolothum Thodi
  0 siblings, 1 reply; 11+ messages in thread
From: Auger Eric @ 2019-05-10  9:15 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Laszlo Ersek, Igor Mammedov
  Cc: peter.maydell, xuwei (O),
	Anshuman Khandual, Catalin Marinas, ard.biesheuvel, will.deacon,
	qemu-devel, Linuxarm, linux-mm, qemu-arm, Jonathan Cameron,
	Robin Murphy, linux-arm-kernel

Hi Shameer,

On 5/10/19 10:34 AM, Shameerali Kolothum Thodi wrote:
> 
> 
>> -----Original Message-----
>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>> Sent: 09 May 2019 22:48
>> To: Igor Mammedov <imammedo@redhat.com>
>> Cc: Robin Murphy <robin.murphy@arm.com>; Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; will.deacon@arm.com; Catalin
>> Marinas <Catalin.Marinas@arm.com>; Anshuman Khandual
>> <anshuman.khandual@arm.com>; linux-arm-kernel@lists.infradead.org;
>> linux-mm <linux-mm@kvack.org>; qemu-devel@nongnu.org;
>> qemu-arm@nongnu.org; eric.auger@redhat.com; peter.maydell@linaro.org;
>> Linuxarm <linuxarm@huawei.com>; ard.biesheuvel@linaro.org; Jonathan
>> Cameron <jonathan.cameron@huawei.com>; xuwei (O) <xuwei5@huawei.com>
>> Subject: Re: [Question] Memory hotplug clarification for Qemu ARM/virt
>>
>> On 05/09/19 18:35, Igor Mammedov wrote:
>>> On Wed, 8 May 2019 22:26:12 +0200
>>> Laszlo Ersek <lersek@redhat.com> wrote:
>>>
>>>> On 05/08/19 14:50, Robin Murphy wrote:
>>>>> Hi Shameer,
>>>>>
>>>>> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> This series here[0] attempts to add support for PCDIMM in QEMU for
>>>>>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at
>>>>>> least
>>>>>> from Qemu/EDK2 point of view) how in physical world the hotpluggable
>>>>>> memory is handled by kernel.
>>>>>>
>>>>>> The proposed implementation in Qemu, builds the SRAT and DSDT parts
>>>>>> and uses GED device to trigger the hotplug. This works fine.
>>>>>>
>>>>>> But when we added the DT node corresponding to the PCDIMM(cold plug
>>>>>> scenario), we noticed that Guest kernel see this memory during early
>> boot
>>>>>> even if we are booting with ACPI. Because of this, hotpluggable memory
>>>>>> may end up in zone normal and make it non-hot-un-pluggable even if
>> Guest
>>>>>> boots with ACPI.
>>>>>>
>>>>>> Further discussions[1] revealed that, EDK2 UEFI has no means to
>>>>>> interpret the
>>>>>> ACPI content from Qemu(this is designed to do so) and uses DT info to
>>>>>> build the GetMemoryMap(). To solve this, introduced "hotpluggable"
>>>>>> property
>>>>>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can
>>>>>> differentiate
>>>>>> the nodes and exclude the hotpluggable ones from GetMemoryMap().
>>>>>>
>>>>>> But then Laszlo rightly pointed out that in order to accommodate the
>>>>>> changes
>>>>>> into UEFI we need to know how exactly Linux expects/handles all the
>>>>>> hotpluggable memory scenarios. Please find the discussion here[2].
>>>>>>
>>>>>> For ease, I am just copying the relevant comment from Laszlo below,
>>>>>>
>>>>>> /******
>>>>>> "Given patches #7 and #8, as I understand them, the firmware cannot
>>>>>> distinguish
>>>>>>   hotpluggable & present, from hotpluggable & absent. The firmware
>> can
>>>>>> only
>>>>>>   skip both hotpluggable cases. That's fine in that the firmware will
>>>>>> hog neither
>>>>>>   type -- but is that OK for the OS as well, for both ACPI boot and DT
>>>>>> boot?
>>>>>>
>>>>>> Consider in particular the "hotpluggable & present, ACPI boot" case.
>>>>>> Assuming
>>>>>> we modify the firmware to skip "hotpluggable" altogether, the UEFI
>> memmap
>>>>>> will not include the range despite it being present at boot.
>>>>>> Presumably, ACPI
>>>>>> will refer to the range somehow, however. Will that not confuse the OS?
>>>>>>
>>>>>> When Igor raised this earlier, I suggested that
>>>>>> hotpluggable-and-present should
>>>>>> be added by the firmware, but also allocated immediately, as
>>>>>> EfiBootServicesData
>>>>>> type memory. This will prevent other drivers in the firmware from
>>>>>> allocating AcpiNVS
>>>>>> or Reserved chunks from the same memory range, the UEFI memmap will
>>>>>> contain
>>>>>> the range as EfiBootServicesData, and then the OS can release that
>>>>>> allocation in
>>>>>> one go early during boot.
>>>>>>
>>>>>> But this really has to be clarified from the Linux kernel's
>>>>>> expectations. Please
>>>>>> formalize all of the following cases:
>>>>>>
>>>>>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report
>>>>>> as  DT/ACPI should report as
>>>>>> -----------------  ------------------
>>>>>> -------------------------------  ------------------------
>>>>>>
>> DT                 present             ?
>>               ?
>>>>>>
>> DT                 absent              ?
>>                ?
>>>>>>
>> ACPI               present             ?
>>               ?
>>>>>>
>> ACPI               absent              ?
>>               ?
>>>>>>
>>>>>> Again, this table is dictated by Linux."
>>>>>>
>>>>>> ******/
>>>>>>
>>>>>> Could you please take a look at this and let us know what is expected
>>>>>> here from
>>>>>> a Linux kernel view point.
>>>>>
>>>>> For arm64, so far we've not even been considering DT-based hotplug - as
>>>>> far as I'm aware there would still be a big open question there around
>>>>> notification mechanisms and how to describe them. The DT stuff so far
>>>>> has come from the PowerPC folks, so it's probably worth seeing what
>>>>> their ideas are.
>>>>>
>>>>> ACPI-wise I've always assumed/hoped that hotplug-related things should
>>>>> be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do"
>>>>> would be enough for us.
>>>>
>>>> As far as I can see in UEFI v2.8 -- and I had checked the spec before
>>>> dumping the table with the many question marks on Shameer --, all the
>>>> hot-plug language in the spec refers to USB and PCI hot-plug in the
>>>> preboot environment. There is not a single word about hot-plug at OS
>>>> runtime (regarding any device or component type), nor about memory
>>>> hot-plug (at any time).
>>>>
>>>> Looking to x86 appears valid -- so what does the Linux kernel expect on
>>>> that architecture, in the "ACPI" rows of the table?
>>>
>>> I could only answer from QEMU x86 perspective.
>>> QEMU for x86 guests currently doesn't add hot-pluggable RAM into E820
>>> because of different linux guests tend to cannibalize it, making it non
>>> unpluggable. The last culprit I recall was KASLR.
>>>
>>> So I'd refrain from reporting hotpluggable RAM in GetMemoryMap() if
>>> it's possible (it's probably hack (spec deosn't say anything about it)
>>> but it mostly works for Linux (plug/unplug) and Windows guest also
>>> fine with plug part (no unplug there)).
>>
>> I can accept this as a perfectly valid design. Which would mean, QEMU should
>> mark each hotpluggable RAM range in the DTB for the firmware with the
>> special new property, regardless of its initial ("cold") plugged-ness, and then
>> the firmware will not expose the range in the GCD memory space map, and
>> consequently in the UEFI memmap either.
>>
>> IOW, our table is, thus far:
>>
>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as
>> DT/ACPI should report as
>> -----------------  ------------------  -------------------------------  ------------------------
>> DT                 present
>> ABSENT                           ?
>> DT                 absent
>> ABSENT                           ?
>> ACPI               present             ABSENT
>> PRESENT
>> ACPI               absent              ABSENT
>> ABSENT
>> In the firmware, I only need to care about the GetMemoryMap() column, so I
>> can work with this.
> 
> Thank you all for the inputs.
> 
> I assume we will still report the DT cold plug case to kernel(hotpluggable & present).
> so the table will be something like this,
> 
> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as  DT/ACPI should report as
> -----------------  ------------------  -------------------------------  ------------------------
> DT                 present             ABSENT                           PRESENT
> DT                 absent              ABSENT                           ABSENT
With DT boot, how does the OS get to know if thehotpluggable memory is
present or absent? Or maybe I misunderstand the last column.

Thanks

Eric
> ACPI               present             ABSENT                           PRESENT
> ACPI               absent              ABSENT                           ABSENT 
> 
> 
>  Can someone please file a feature request at
>> <https://bugzilla.tianocore.org/>, for the ArmVirtPkg Package, with these
>> detais?
> 
> Ok. I will do that.
> 
> Thanks,
> Shameer
> 
>> Thanks
>> Laszlo
>>
>>>
>>> As for physical systems, there are out there ones that do report
>>> hotpluggable RAM in GetMemoryMap().
>>>
>>>> Shameer: if you (Huawei) are represented on the USWG / ASWG, I suggest
>>>> re-raising the question on those lists too; at least the "ACPI" rows of
>>>> the table.
>>>>
>>>> Thanks!
>>>> Laszlo
>>>>
>>>>>
>>>>> Robin.
>>>>>
>>>>>> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed
>>>>>> any valid
>>>>>> points above).
>>>>>>
>>>>>> Thanks,
>>>>>> Shameer
>>>>>> [0] https://patchwork.kernel.org/cover/10890919/
>>>>>> [1] https://patchwork.kernel.org/patch/10863299/
>>>>>> [2] https://patchwork.kernel.org/patch/10890937/
>>>>>>
>>>>>>
>>>>
>>>
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [Qemu-devel] [Question] Memory hotplug clarification for Qemu ARM/virt
  2019-05-10  9:15           ` [Qemu-devel] " Auger Eric
@ 2019-05-10  9:27             ` Shameerali Kolothum Thodi
  2019-05-10  9:58               ` Auger Eric
  0 siblings, 1 reply; 11+ messages in thread
From: Shameerali Kolothum Thodi @ 2019-05-10  9:27 UTC (permalink / raw)
  To: Auger Eric, Laszlo Ersek, Igor Mammedov
  Cc: peter.maydell, xuwei (O),
	Anshuman Khandual, Catalin Marinas, ard.biesheuvel, will.deacon,
	qemu-devel, Linuxarm, linux-mm, qemu-arm, Jonathan Cameron,
	Robin Murphy, linux-arm-kernel

Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.auger@redhat.com]
> Sent: 10 May 2019 10:16
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> Laszlo Ersek <lersek@redhat.com>; Igor Mammedov
> <imammedo@redhat.com>
> Cc: peter.maydell@linaro.org; xuwei (O) <xuwei5@huawei.com>; Anshuman
> Khandual <anshuman.khandual@arm.com>; Catalin Marinas
> <Catalin.Marinas@arm.com>; ard.biesheuvel@linaro.org;
> will.deacon@arm.com; qemu-devel@nongnu.org; Linuxarm
> <linuxarm@huawei.com>; linux-mm <linux-mm@kvack.org>;
> qemu-arm@nongnu.org; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Robin Murphy <robin.murphy@arm.com>;
> linux-arm-kernel@lists.infradead.org
> Subject: Re: [Qemu-devel] [Question] Memory hotplug clarification for Qemu
> ARM/virt
> 
> Hi Shameer,
> 
> On 5/10/19 10:34 AM, Shameerali Kolothum Thodi wrote:
> >
> >
> >> -----Original Message-----
> >> From: Laszlo Ersek [mailto:lersek@redhat.com]
> >> Sent: 09 May 2019 22:48
> >> To: Igor Mammedov <imammedo@redhat.com>
> >> Cc: Robin Murphy <robin.murphy@arm.com>; Shameerali Kolothum Thodi
> >> <shameerali.kolothum.thodi@huawei.com>; will.deacon@arm.com; Catalin
> >> Marinas <Catalin.Marinas@arm.com>; Anshuman Khandual
> >> <anshuman.khandual@arm.com>; linux-arm-kernel@lists.infradead.org;
> >> linux-mm <linux-mm@kvack.org>; qemu-devel@nongnu.org;
> >> qemu-arm@nongnu.org; eric.auger@redhat.com;
> peter.maydell@linaro.org;
> >> Linuxarm <linuxarm@huawei.com>; ard.biesheuvel@linaro.org; Jonathan
> >> Cameron <jonathan.cameron@huawei.com>; xuwei (O)
> <xuwei5@huawei.com>
> >> Subject: Re: [Question] Memory hotplug clarification for Qemu ARM/virt
> >>
> >> On 05/09/19 18:35, Igor Mammedov wrote:
> >>> On Wed, 8 May 2019 22:26:12 +0200
> >>> Laszlo Ersek <lersek@redhat.com> wrote:
> >>>
> >>>> On 05/08/19 14:50, Robin Murphy wrote:
> >>>>> Hi Shameer,
> >>>>>
> >>>>> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> This series here[0] attempts to add support for PCDIMM in QEMU for
> >>>>>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at
> >>>>>> least
> >>>>>> from Qemu/EDK2 point of view) how in physical world the hotpluggable
> >>>>>> memory is handled by kernel.
> >>>>>>
> >>>>>> The proposed implementation in Qemu, builds the SRAT and DSDT parts
> >>>>>> and uses GED device to trigger the hotplug. This works fine.
> >>>>>>
> >>>>>> But when we added the DT node corresponding to the PCDIMM(cold
> plug
> >>>>>> scenario), we noticed that Guest kernel see this memory during early
> >> boot
> >>>>>> even if we are booting with ACPI. Because of this, hotpluggable
> memory
> >>>>>> may end up in zone normal and make it non-hot-un-pluggable even if
> >> Guest
> >>>>>> boots with ACPI.
> >>>>>>
> >>>>>> Further discussions[1] revealed that, EDK2 UEFI has no means to
> >>>>>> interpret the
> >>>>>> ACPI content from Qemu(this is designed to do so) and uses DT info to
> >>>>>> build the GetMemoryMap(). To solve this, introduced "hotpluggable"
> >>>>>> property
> >>>>>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can
> >>>>>> differentiate
> >>>>>> the nodes and exclude the hotpluggable ones from GetMemoryMap().
> >>>>>>
> >>>>>> But then Laszlo rightly pointed out that in order to accommodate the
> >>>>>> changes
> >>>>>> into UEFI we need to know how exactly Linux expects/handles all the
> >>>>>> hotpluggable memory scenarios. Please find the discussion here[2].
> >>>>>>
> >>>>>> For ease, I am just copying the relevant comment from Laszlo below,
> >>>>>>
> >>>>>> /******
> >>>>>> "Given patches #7 and #8, as I understand them, the firmware cannot
> >>>>>> distinguish
> >>>>>>   hotpluggable & present, from hotpluggable & absent. The firmware
> >> can
> >>>>>> only
> >>>>>>   skip both hotpluggable cases. That's fine in that the firmware will
> >>>>>> hog neither
> >>>>>>   type -- but is that OK for the OS as well, for both ACPI boot and DT
> >>>>>> boot?
> >>>>>>
> >>>>>> Consider in particular the "hotpluggable & present, ACPI boot" case.
> >>>>>> Assuming
> >>>>>> we modify the firmware to skip "hotpluggable" altogether, the UEFI
> >> memmap
> >>>>>> will not include the range despite it being present at boot.
> >>>>>> Presumably, ACPI
> >>>>>> will refer to the range somehow, however. Will that not confuse the
> OS?
> >>>>>>
> >>>>>> When Igor raised this earlier, I suggested that
> >>>>>> hotpluggable-and-present should
> >>>>>> be added by the firmware, but also allocated immediately, as
> >>>>>> EfiBootServicesData
> >>>>>> type memory. This will prevent other drivers in the firmware from
> >>>>>> allocating AcpiNVS
> >>>>>> or Reserved chunks from the same memory range, the UEFI memmap
> will
> >>>>>> contain
> >>>>>> the range as EfiBootServicesData, and then the OS can release that
> >>>>>> allocation in
> >>>>>> one go early during boot.
> >>>>>>
> >>>>>> But this really has to be clarified from the Linux kernel's
> >>>>>> expectations. Please
> >>>>>> formalize all of the following cases:
> >>>>>>
> >>>>>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should
> report
> >>>>>> as  DT/ACPI should report as
> >>>>>> -----------------  ------------------
> >>>>>> -------------------------------  ------------------------
> >>>>>>
> >> DT                 present             ?
> >>               ?
> >>>>>>
> >> DT                 absent              ?
> >>                ?
> >>>>>>
> >> ACPI               present             ?
> >>               ?
> >>>>>>
> >> ACPI               absent              ?
> >>               ?
> >>>>>>
> >>>>>> Again, this table is dictated by Linux."
> >>>>>>
> >>>>>> ******/
> >>>>>>
> >>>>>> Could you please take a look at this and let us know what is expected
> >>>>>> here from
> >>>>>> a Linux kernel view point.
> >>>>>
> >>>>> For arm64, so far we've not even been considering DT-based hotplug - as
> >>>>> far as I'm aware there would still be a big open question there around
> >>>>> notification mechanisms and how to describe them. The DT stuff so far
> >>>>> has come from the PowerPC folks, so it's probably worth seeing what
> >>>>> their ideas are.
> >>>>>
> >>>>> ACPI-wise I've always assumed/hoped that hotplug-related things
> should
> >>>>> be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do"
> >>>>> would be enough for us.
> >>>>
> >>>> As far as I can see in UEFI v2.8 -- and I had checked the spec before
> >>>> dumping the table with the many question marks on Shameer --, all the
> >>>> hot-plug language in the spec refers to USB and PCI hot-plug in the
> >>>> preboot environment. There is not a single word about hot-plug at OS
> >>>> runtime (regarding any device or component type), nor about memory
> >>>> hot-plug (at any time).
> >>>>
> >>>> Looking to x86 appears valid -- so what does the Linux kernel expect on
> >>>> that architecture, in the "ACPI" rows of the table?
> >>>
> >>> I could only answer from QEMU x86 perspective.
> >>> QEMU for x86 guests currently doesn't add hot-pluggable RAM into E820
> >>> because of different linux guests tend to cannibalize it, making it non
> >>> unpluggable. The last culprit I recall was KASLR.
> >>>
> >>> So I'd refrain from reporting hotpluggable RAM in GetMemoryMap() if
> >>> it's possible (it's probably hack (spec deosn't say anything about it)
> >>> but it mostly works for Linux (plug/unplug) and Windows guest also
> >>> fine with plug part (no unplug there)).
> >>
> >> I can accept this as a perfectly valid design. Which would mean, QEMU
> should
> >> mark each hotpluggable RAM range in the DTB for the firmware with the
> >> special new property, regardless of its initial ("cold") plugged-ness, and then
> >> the firmware will not expose the range in the GCD memory space map, and
> >> consequently in the UEFI memmap either.
> >>
> >> IOW, our table is, thus far:
> >>
> >> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as
> >> DT/ACPI should report as
> >> -----------------  ------------------  -------------------------------  ------------------------
> >> DT                 present
> >> ABSENT                           ?
> >> DT                 absent
> >> ABSENT                           ?
> >> ACPI               present             ABSENT
> >> PRESENT
> >> ACPI               absent              ABSENT
> >> ABSENT
> >> In the firmware, I only need to care about the GetMemoryMap() column, so
> I
> >> can work with this.
> >
> > Thank you all for the inputs.
> >
> > I assume we will still report the DT cold plug case to kernel(hotpluggable &
> present).
> > so the table will be something like this,
> >
> > OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as
> DT/ACPI should report as
> > -----------------  ------------------  -------------------------------  ------------------------
> > DT                 present             ABSENT
> PRESENT
> > DT                 absent              ABSENT
> ABSENT
> With DT boot, how does the OS get to know if thehotpluggable memory is
> present or absent? Or maybe I misunderstand the last column.

It doesn't. For hotpluggable & present case it will be just like normal memory and
for absent case no memory node(hotpluaggble) is populated in DT. Is this acceptable?

On another note, if there are no strong case for DT cold plug for PCDIMM we can drop
it altogether which will make everything much simpler and no change required for
UEFI as well.

Thanks,
Shameer


> Thanks
> 
> Eric
> > ACPI               present             ABSENT
> PRESENT
> > ACPI               absent              ABSENT
> ABSENT
> >
> >
> >  Can someone please file a feature request at
> >> <https://bugzilla.tianocore.org/>, for the ArmVirtPkg Package, with these
> >> detais?
> >
> > Ok. I will do that.
> >
> > Thanks,
> > Shameer
> >
> >> Thanks
> >> Laszlo
> >>
> >>>
> >>> As for physical systems, there are out there ones that do report
> >>> hotpluggable RAM in GetMemoryMap().
> >>>
> >>>> Shameer: if you (Huawei) are represented on the USWG / ASWG, I
> suggest
> >>>> re-raising the question on those lists too; at least the "ACPI" rows of
> >>>> the table.
> >>>>
> >>>> Thanks!
> >>>> Laszlo
> >>>>
> >>>>>
> >>>>> Robin.
> >>>>>
> >>>>>> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed
> >>>>>> any valid
> >>>>>> points above).
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Shameer
> >>>>>> [0] https://patchwork.kernel.org/cover/10890919/
> >>>>>> [1] https://patchwork.kernel.org/patch/10863299/
> >>>>>> [2] https://patchwork.kernel.org/patch/10890937/
> >>>>>>
> >>>>>>
> >>>>
> >>>
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [Question] Memory hotplug clarification for Qemu ARM/virt
  2019-05-10  9:27             ` Shameerali Kolothum Thodi
@ 2019-05-10  9:58               ` Auger Eric
  2019-05-10 15:05                 ` Igor Mammedov
  0 siblings, 1 reply; 11+ messages in thread
From: Auger Eric @ 2019-05-10  9:58 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi, Laszlo Ersek, Igor Mammedov
  Cc: peter.maydell, xuwei (O),
	Anshuman Khandual, Catalin Marinas, ard.biesheuvel, will.deacon,
	qemu-devel, Linuxarm, linux-mm, qemu-arm, Jonathan Cameron,
	Robin Murphy, linux-arm-kernel

Hi Shameer,

On 5/10/19 11:27 AM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.auger@redhat.com]
>> Sent: 10 May 2019 10:16
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
>> Laszlo Ersek <lersek@redhat.com>; Igor Mammedov
>> <imammedo@redhat.com>
>> Cc: peter.maydell@linaro.org; xuwei (O) <xuwei5@huawei.com>; Anshuman
>> Khandual <anshuman.khandual@arm.com>; Catalin Marinas
>> <Catalin.Marinas@arm.com>; ard.biesheuvel@linaro.org;
>> will.deacon@arm.com; qemu-devel@nongnu.org; Linuxarm
>> <linuxarm@huawei.com>; linux-mm <linux-mm@kvack.org>;
>> qemu-arm@nongnu.org; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; Robin Murphy <robin.murphy@arm.com>;
>> linux-arm-kernel@lists.infradead.org
>> Subject: Re: [Qemu-devel] [Question] Memory hotplug clarification for Qemu
>> ARM/virt
>>
>> Hi Shameer,
>>
>> On 5/10/19 10:34 AM, Shameerali Kolothum Thodi wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>>>> Sent: 09 May 2019 22:48
>>>> To: Igor Mammedov <imammedo@redhat.com>
>>>> Cc: Robin Murphy <robin.murphy@arm.com>; Shameerali Kolothum Thodi
>>>> <shameerali.kolothum.thodi@huawei.com>; will.deacon@arm.com; Catalin
>>>> Marinas <Catalin.Marinas@arm.com>; Anshuman Khandual
>>>> <anshuman.khandual@arm.com>; linux-arm-kernel@lists.infradead.org;
>>>> linux-mm <linux-mm@kvack.org>; qemu-devel@nongnu.org;
>>>> qemu-arm@nongnu.org; eric.auger@redhat.com;
>> peter.maydell@linaro.org;
>>>> Linuxarm <linuxarm@huawei.com>; ard.biesheuvel@linaro.org; Jonathan
>>>> Cameron <jonathan.cameron@huawei.com>; xuwei (O)
>> <xuwei5@huawei.com>
>>>> Subject: Re: [Question] Memory hotplug clarification for Qemu ARM/virt
>>>>
>>>> On 05/09/19 18:35, Igor Mammedov wrote:
>>>>> On Wed, 8 May 2019 22:26:12 +0200
>>>>> Laszlo Ersek <lersek@redhat.com> wrote:
>>>>>
>>>>>> On 05/08/19 14:50, Robin Murphy wrote:
>>>>>>> Hi Shameer,
>>>>>>>
>>>>>>> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This series here[0] attempts to add support for PCDIMM in QEMU for
>>>>>>>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at
>>>>>>>> least
>>>>>>>> from Qemu/EDK2 point of view) how in physical world the hotpluggable
>>>>>>>> memory is handled by kernel.
>>>>>>>>
>>>>>>>> The proposed implementation in Qemu, builds the SRAT and DSDT parts
>>>>>>>> and uses GED device to trigger the hotplug. This works fine.
>>>>>>>>
>>>>>>>> But when we added the DT node corresponding to the PCDIMM(cold
>> plug
>>>>>>>> scenario), we noticed that Guest kernel see this memory during early
>>>> boot
>>>>>>>> even if we are booting with ACPI. Because of this, hotpluggable
>> memory
>>>>>>>> may end up in zone normal and make it non-hot-un-pluggable even if
>>>> Guest
>>>>>>>> boots with ACPI.
>>>>>>>>
>>>>>>>> Further discussions[1] revealed that, EDK2 UEFI has no means to
>>>>>>>> interpret the
>>>>>>>> ACPI content from Qemu(this is designed to do so) and uses DT info to
>>>>>>>> build the GetMemoryMap(). To solve this, introduced "hotpluggable"
>>>>>>>> property
>>>>>>>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can
>>>>>>>> differentiate
>>>>>>>> the nodes and exclude the hotpluggable ones from GetMemoryMap().
>>>>>>>>
>>>>>>>> But then Laszlo rightly pointed out that in order to accommodate the
>>>>>>>> changes
>>>>>>>> into UEFI we need to know how exactly Linux expects/handles all the
>>>>>>>> hotpluggable memory scenarios. Please find the discussion here[2].
>>>>>>>>
>>>>>>>> For ease, I am just copying the relevant comment from Laszlo below,
>>>>>>>>
>>>>>>>> /******
>>>>>>>> "Given patches #7 and #8, as I understand them, the firmware cannot
>>>>>>>> distinguish
>>>>>>>>   hotpluggable & present, from hotpluggable & absent. The firmware
>>>> can
>>>>>>>> only
>>>>>>>>   skip both hotpluggable cases. That's fine in that the firmware will
>>>>>>>> hog neither
>>>>>>>>   type -- but is that OK for the OS as well, for both ACPI boot and DT
>>>>>>>> boot?
>>>>>>>>
>>>>>>>> Consider in particular the "hotpluggable & present, ACPI boot" case.
>>>>>>>> Assuming
>>>>>>>> we modify the firmware to skip "hotpluggable" altogether, the UEFI
>>>> memmap
>>>>>>>> will not include the range despite it being present at boot.
>>>>>>>> Presumably, ACPI
>>>>>>>> will refer to the range somehow, however. Will that not confuse the
>> OS?
>>>>>>>>
>>>>>>>> When Igor raised this earlier, I suggested that
>>>>>>>> hotpluggable-and-present should
>>>>>>>> be added by the firmware, but also allocated immediately, as
>>>>>>>> EfiBootServicesData
>>>>>>>> type memory. This will prevent other drivers in the firmware from
>>>>>>>> allocating AcpiNVS
>>>>>>>> or Reserved chunks from the same memory range, the UEFI memmap
>> will
>>>>>>>> contain
>>>>>>>> the range as EfiBootServicesData, and then the OS can release that
>>>>>>>> allocation in
>>>>>>>> one go early during boot.
>>>>>>>>
>>>>>>>> But this really has to be clarified from the Linux kernel's
>>>>>>>> expectations. Please
>>>>>>>> formalize all of the following cases:
>>>>>>>>
>>>>>>>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should
>> report
>>>>>>>> as  DT/ACPI should report as
>>>>>>>> -----------------  ------------------
>>>>>>>> -------------------------------  ------------------------
>>>>>>>>
>>>> DT                 present             ?
>>>>               ?
>>>>>>>>
>>>> DT                 absent              ?
>>>>                ?
>>>>>>>>
>>>> ACPI               present             ?
>>>>               ?
>>>>>>>>
>>>> ACPI               absent              ?
>>>>               ?
>>>>>>>>
>>>>>>>> Again, this table is dictated by Linux."
>>>>>>>>
>>>>>>>> ******/
>>>>>>>>
>>>>>>>> Could you please take a look at this and let us know what is expected
>>>>>>>> here from
>>>>>>>> a Linux kernel view point.
>>>>>>>
>>>>>>> For arm64, so far we've not even been considering DT-based hotplug - as
>>>>>>> far as I'm aware there would still be a big open question there around
>>>>>>> notification mechanisms and how to describe them. The DT stuff so far
>>>>>>> has come from the PowerPC folks, so it's probably worth seeing what
>>>>>>> their ideas are.
>>>>>>>
>>>>>>> ACPI-wise I've always assumed/hoped that hotplug-related things
>> should
>>>>>>> be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do"
>>>>>>> would be enough for us.
>>>>>>
>>>>>> As far as I can see in UEFI v2.8 -- and I had checked the spec before
>>>>>> dumping the table with the many question marks on Shameer --, all the
>>>>>> hot-plug language in the spec refers to USB and PCI hot-plug in the
>>>>>> preboot environment. There is not a single word about hot-plug at OS
>>>>>> runtime (regarding any device or component type), nor about memory
>>>>>> hot-plug (at any time).
>>>>>>
>>>>>> Looking to x86 appears valid -- so what does the Linux kernel expect on
>>>>>> that architecture, in the "ACPI" rows of the table?
>>>>>
>>>>> I could only answer from QEMU x86 perspective.
>>>>> QEMU for x86 guests currently doesn't add hot-pluggable RAM into E820
>>>>> because of different linux guests tend to cannibalize it, making it non
>>>>> unpluggable. The last culprit I recall was KASLR.
>>>>>
>>>>> So I'd refrain from reporting hotpluggable RAM in GetMemoryMap() if
>>>>> it's possible (it's probably hack (spec deosn't say anything about it)
>>>>> but it mostly works for Linux (plug/unplug) and Windows guest also
>>>>> fine with plug part (no unplug there)).
>>>>
>>>> I can accept this as a perfectly valid design. Which would mean, QEMU
>> should
>>>> mark each hotpluggable RAM range in the DTB for the firmware with the
>>>> special new property, regardless of its initial ("cold") plugged-ness, and then
>>>> the firmware will not expose the range in the GCD memory space map, and
>>>> consequently in the UEFI memmap either.
>>>>
>>>> IOW, our table is, thus far:
>>>>
>>>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as
>>>> DT/ACPI should report as
>>>> -----------------  ------------------  -------------------------------  ------------------------
>>>> DT                 present
>>>> ABSENT                           ?
>>>> DT                 absent
>>>> ABSENT                           ?
>>>> ACPI               present             ABSENT
>>>> PRESENT
>>>> ACPI               absent              ABSENT
>>>> ABSENT
>>>> In the firmware, I only need to care about the GetMemoryMap() column, so
>> I
>>>> can work with this.
>>>
>>> Thank you all for the inputs.
>>>
>>> I assume we will still report the DT cold plug case to kernel(hotpluggable &
>> present).
>>> so the table will be something like this,
>>>
>>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as
>> DT/ACPI should report as
>>> -----------------  ------------------  -------------------------------  ------------------------
>>> DT                 present             ABSENT
>> PRESENT
>>> DT                 absent              ABSENT
>> ABSENT
>> With DT boot, how does the OS get to know if thehotpluggable memory is
>> present or absent? Or maybe I misunderstand the last column.
> 
> It doesn't. For hotpluggable & present case it will be just like normal memory and
> for absent case no memory node(hotpluaggble) is populated in DT. Is this acceptable?
OK I get it now. Yes it makes sense.
> 
> On another note, if there are no strong case for DT cold plug for PCDIMM we can drop
> it altogether which will make everything much simpler and no change required for
> UEFI as well.
I don't think we have strong requirements for PCDIMM in DT mode (initial
RAM can be used). As long as we can detect an attempt to use PCDIMM in
DT only mode and reject it (-no-acpi or !firmware_loaded ?), personally
I don't have any objection.

Thanks

Eric
> 
> Thanks,
> Shameer
> 
> 
>> Thanks
>>
>> Eric
>>> ACPI               present             ABSENT
>> PRESENT
>>> ACPI               absent              ABSENT
>> ABSENT
>>>
>>>
>>>  Can someone please file a feature request at
>>>> <https://bugzilla.tianocore.org/>, for the ArmVirtPkg Package, with these
>>>> detais?
>>>
>>> Ok. I will do that.
>>>
>>> Thanks,
>>> Shameer
>>>
>>>> Thanks
>>>> Laszlo
>>>>
>>>>>
>>>>> As for physical systems, there are out there ones that do report
>>>>> hotpluggable RAM in GetMemoryMap().
>>>>>
>>>>>> Shameer: if you (Huawei) are represented on the USWG / ASWG, I
>> suggest
>>>>>> re-raising the question on those lists too; at least the "ACPI" rows of
>>>>>> the table.
>>>>>>
>>>>>> Thanks!
>>>>>> Laszlo
>>>>>>
>>>>>>>
>>>>>>> Robin.
>>>>>>>
>>>>>>>> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed
>>>>>>>> any valid
>>>>>>>> points above).
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Shameer
>>>>>>>> [0] https://patchwork.kernel.org/cover/10890919/
>>>>>>>> [1] https://patchwork.kernel.org/patch/10863299/
>>>>>>>> [2] https://patchwork.kernel.org/patch/10890937/
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [Question] Memory hotplug clarification for Qemu ARM/virt
  2019-05-10  9:58               ` Auger Eric
@ 2019-05-10 15:05                 ` Igor Mammedov
  0 siblings, 0 replies; 11+ messages in thread
From: Igor Mammedov @ 2019-05-10 15:05 UTC (permalink / raw)
  To: Auger Eric
  Cc: Shameerali Kolothum Thodi, Laszlo Ersek, peter.maydell, xuwei (O),
	Anshuman Khandual, Catalin Marinas, ard.biesheuvel, will.deacon,
	qemu-devel, Linuxarm, linux-mm, qemu-arm, Jonathan Cameron,
	Robin Murphy, linux-arm-kernel

On Fri, 10 May 2019 11:58:38 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Shameer,
> 
> On 5/10/19 11:27 AM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> > 
> >> -----Original Message-----
> >> From: Auger Eric [mailto:eric.auger@redhat.com]
> >> Sent: 10 May 2019 10:16
> >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> >> Laszlo Ersek <lersek@redhat.com>; Igor Mammedov
> >> <imammedo@redhat.com>
> >> Cc: peter.maydell@linaro.org; xuwei (O) <xuwei5@huawei.com>; Anshuman
> >> Khandual <anshuman.khandual@arm.com>; Catalin Marinas
> >> <Catalin.Marinas@arm.com>; ard.biesheuvel@linaro.org;
> >> will.deacon@arm.com; qemu-devel@nongnu.org; Linuxarm
> >> <linuxarm@huawei.com>; linux-mm <linux-mm@kvack.org>;
> >> qemu-arm@nongnu.org; Jonathan Cameron
> >> <jonathan.cameron@huawei.com>; Robin Murphy <robin.murphy@arm.com>;
> >> linux-arm-kernel@lists.infradead.org
> >> Subject: Re: [Qemu-devel] [Question] Memory hotplug clarification for Qemu
> >> ARM/virt
> >>
> >> Hi Shameer,
> >>
> >> On 5/10/19 10:34 AM, Shameerali Kolothum Thodi wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Laszlo Ersek [mailto:lersek@redhat.com]
> >>>> Sent: 09 May 2019 22:48
> >>>> To: Igor Mammedov <imammedo@redhat.com>
> >>>> Cc: Robin Murphy <robin.murphy@arm.com>; Shameerali Kolothum Thodi
> >>>> <shameerali.kolothum.thodi@huawei.com>; will.deacon@arm.com; Catalin
> >>>> Marinas <Catalin.Marinas@arm.com>; Anshuman Khandual
> >>>> <anshuman.khandual@arm.com>; linux-arm-kernel@lists.infradead.org;
> >>>> linux-mm <linux-mm@kvack.org>; qemu-devel@nongnu.org;
> >>>> qemu-arm@nongnu.org; eric.auger@redhat.com;
> >> peter.maydell@linaro.org;
> >>>> Linuxarm <linuxarm@huawei.com>; ard.biesheuvel@linaro.org; Jonathan
> >>>> Cameron <jonathan.cameron@huawei.com>; xuwei (O)
> >> <xuwei5@huawei.com>
> >>>> Subject: Re: [Question] Memory hotplug clarification for Qemu ARM/virt
> >>>>
> >>>> On 05/09/19 18:35, Igor Mammedov wrote:
> >>>>> On Wed, 8 May 2019 22:26:12 +0200
> >>>>> Laszlo Ersek <lersek@redhat.com> wrote:
> >>>>>
> >>>>>> On 05/08/19 14:50, Robin Murphy wrote:
> >>>>>>> Hi Shameer,
> >>>>>>>
> >>>>>>> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> This series here[0] attempts to add support for PCDIMM in QEMU for
> >>>>>>>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at
> >>>>>>>> least
> >>>>>>>> from Qemu/EDK2 point of view) how in physical world the hotpluggable
> >>>>>>>> memory is handled by kernel.
> >>>>>>>>
> >>>>>>>> The proposed implementation in Qemu, builds the SRAT and DSDT parts
> >>>>>>>> and uses GED device to trigger the hotplug. This works fine.
> >>>>>>>>
> >>>>>>>> But when we added the DT node corresponding to the PCDIMM(cold
> >> plug
> >>>>>>>> scenario), we noticed that Guest kernel see this memory during early
> >>>> boot
> >>>>>>>> even if we are booting with ACPI. Because of this, hotpluggable
> >> memory
> >>>>>>>> may end up in zone normal and make it non-hot-un-pluggable even if
> >>>> Guest
> >>>>>>>> boots with ACPI.
> >>>>>>>>
> >>>>>>>> Further discussions[1] revealed that, EDK2 UEFI has no means to
> >>>>>>>> interpret the
> >>>>>>>> ACPI content from Qemu(this is designed to do so) and uses DT info to
> >>>>>>>> build the GetMemoryMap(). To solve this, introduced "hotpluggable"
> >>>>>>>> property
> >>>>>>>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can
> >>>>>>>> differentiate
> >>>>>>>> the nodes and exclude the hotpluggable ones from GetMemoryMap().
> >>>>>>>>
> >>>>>>>> But then Laszlo rightly pointed out that in order to accommodate the
> >>>>>>>> changes
> >>>>>>>> into UEFI we need to know how exactly Linux expects/handles all the
> >>>>>>>> hotpluggable memory scenarios. Please find the discussion here[2].
> >>>>>>>>
> >>>>>>>> For ease, I am just copying the relevant comment from Laszlo below,
> >>>>>>>>
> >>>>>>>> /******
> >>>>>>>> "Given patches #7 and #8, as I understand them, the firmware cannot
> >>>>>>>> distinguish
> >>>>>>>>   hotpluggable & present, from hotpluggable & absent. The firmware
> >>>> can
> >>>>>>>> only
> >>>>>>>>   skip both hotpluggable cases. That's fine in that the firmware will
> >>>>>>>> hog neither
> >>>>>>>>   type -- but is that OK for the OS as well, for both ACPI boot and DT
> >>>>>>>> boot?
> >>>>>>>>
> >>>>>>>> Consider in particular the "hotpluggable & present, ACPI boot" case.
> >>>>>>>> Assuming
> >>>>>>>> we modify the firmware to skip "hotpluggable" altogether, the UEFI
> >>>> memmap
> >>>>>>>> will not include the range despite it being present at boot.
> >>>>>>>> Presumably, ACPI
> >>>>>>>> will refer to the range somehow, however. Will that not confuse the
> >> OS?
> >>>>>>>>
> >>>>>>>> When Igor raised this earlier, I suggested that
> >>>>>>>> hotpluggable-and-present should
> >>>>>>>> be added by the firmware, but also allocated immediately, as
> >>>>>>>> EfiBootServicesData
> >>>>>>>> type memory. This will prevent other drivers in the firmware from
> >>>>>>>> allocating AcpiNVS
> >>>>>>>> or Reserved chunks from the same memory range, the UEFI memmap
> >> will
> >>>>>>>> contain
> >>>>>>>> the range as EfiBootServicesData, and then the OS can release that
> >>>>>>>> allocation in
> >>>>>>>> one go early during boot.
> >>>>>>>>
> >>>>>>>> But this really has to be clarified from the Linux kernel's
> >>>>>>>> expectations. Please
> >>>>>>>> formalize all of the following cases:
> >>>>>>>>
> >>>>>>>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should
> >> report
> >>>>>>>> as  DT/ACPI should report as
> >>>>>>>> -----------------  ------------------
> >>>>>>>> -------------------------------  ------------------------
> >>>>>>>>
> >>>> DT                 present             ?
> >>>>               ?
> >>>>>>>>
> >>>> DT                 absent              ?
> >>>>                ?
> >>>>>>>>
> >>>> ACPI               present             ?
> >>>>               ?
> >>>>>>>>
> >>>> ACPI               absent              ?
> >>>>               ?
> >>>>>>>>
> >>>>>>>> Again, this table is dictated by Linux."
> >>>>>>>>
> >>>>>>>> ******/
> >>>>>>>>
> >>>>>>>> Could you please take a look at this and let us know what is expected
> >>>>>>>> here from
> >>>>>>>> a Linux kernel view point.
> >>>>>>>
> >>>>>>> For arm64, so far we've not even been considering DT-based hotplug - as
> >>>>>>> far as I'm aware there would still be a big open question there around
> >>>>>>> notification mechanisms and how to describe them. The DT stuff so far
> >>>>>>> has come from the PowerPC folks, so it's probably worth seeing what
> >>>>>>> their ideas are.
> >>>>>>>
> >>>>>>> ACPI-wise I've always assumed/hoped that hotplug-related things
> >> should
> >>>>>>> be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do"
> >>>>>>> would be enough for us.
> >>>>>>
> >>>>>> As far as I can see in UEFI v2.8 -- and I had checked the spec before
> >>>>>> dumping the table with the many question marks on Shameer --, all the
> >>>>>> hot-plug language in the spec refers to USB and PCI hot-plug in the
> >>>>>> preboot environment. There is not a single word about hot-plug at OS
> >>>>>> runtime (regarding any device or component type), nor about memory
> >>>>>> hot-plug (at any time).
> >>>>>>
> >>>>>> Looking to x86 appears valid -- so what does the Linux kernel expect on
> >>>>>> that architecture, in the "ACPI" rows of the table?
> >>>>>
> >>>>> I could only answer from QEMU x86 perspective.
> >>>>> QEMU for x86 guests currently doesn't add hot-pluggable RAM into E820
> >>>>> because of different linux guests tend to cannibalize it, making it non
> >>>>> unpluggable. The last culprit I recall was KASLR.
> >>>>>
> >>>>> So I'd refrain from reporting hotpluggable RAM in GetMemoryMap() if
> >>>>> it's possible (it's probably hack (spec deosn't say anything about it)
> >>>>> but it mostly works for Linux (plug/unplug) and Windows guest also
> >>>>> fine with plug part (no unplug there)).
> >>>>
> >>>> I can accept this as a perfectly valid design. Which would mean, QEMU
> >> should
> >>>> mark each hotpluggable RAM range in the DTB for the firmware with the
> >>>> special new property, regardless of its initial ("cold") plugged-ness, and then
> >>>> the firmware will not expose the range in the GCD memory space map, and
> >>>> consequently in the UEFI memmap either.
> >>>>
> >>>> IOW, our table is, thus far:
> >>>>
> >>>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as
> >>>> DT/ACPI should report as
> >>>> -----------------  ------------------  -------------------------------  ------------------------
> >>>> DT                 present
> >>>> ABSENT                           ?
> >>>> DT                 absent
> >>>> ABSENT                           ?
> >>>> ACPI               present             ABSENT
> >>>> PRESENT
> >>>> ACPI               absent              ABSENT
> >>>> ABSENT
> >>>> In the firmware, I only need to care about the GetMemoryMap() column, so
> >> I
> >>>> can work with this.
> >>>
> >>> Thank you all for the inputs.
> >>>
> >>> I assume we will still report the DT cold plug case to kernel(hotpluggable &
> >> present).
> >>> so the table will be something like this,
> >>>
> >>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as
> >> DT/ACPI should report as
> >>> -----------------  ------------------  -------------------------------  ------------------------
> >>> DT                 present             ABSENT
> >> PRESENT
> >>> DT                 absent              ABSENT
> >> ABSENT
> >> With DT boot, how does the OS get to know if thehotpluggable memory is
> >> present or absent? Or maybe I misunderstand the last column.
> > 
> > It doesn't. For hotpluggable & present case it will be just like normal memory and
> > for absent case no memory node(hotpluaggble) is populated in DT. Is this acceptable?
> OK I get it now. Yes it makes sense.
> > 
> > On another note, if there are no strong case for DT cold plug for PCDIMM we can drop
> > it altogether which will make everything much simpler and no change required for
> > UEFI as well.
> I don't think we have strong requirements for PCDIMM in DT mode (initial
> RAM can be used). As long as we can detect an attempt to use PCDIMM in
> DT only mode and reject it (-no-acpi or !firmware_loaded ?), personally
> I don't have any objection.
It seems we are in agreement here, let's skip DT part for now.
We can add it later if there is demand for it (I don't see any issues wrt migration here).

> 
> Thanks
> 
> Eric
> > 
> > Thanks,
> > Shameer
> > 
> > 
> >> Thanks
> >>
> >> Eric
> >>> ACPI               present             ABSENT
> >> PRESENT
> >>> ACPI               absent              ABSENT
> >> ABSENT
> >>>
> >>>
> >>>  Can someone please file a feature request at
> >>>> <https://bugzilla.tianocore.org/>, for the ArmVirtPkg Package, with these
> >>>> detais?
> >>>
> >>> Ok. I will do that.
> >>>
> >>> Thanks,
> >>> Shameer
> >>>
> >>>> Thanks
> >>>> Laszlo
> >>>>
> >>>>>
> >>>>> As for physical systems, there are out there ones that do report
> >>>>> hotpluggable RAM in GetMemoryMap().
> >>>>>
> >>>>>> Shameer: if you (Huawei) are represented on the USWG / ASWG, I
> >> suggest
> >>>>>> re-raising the question on those lists too; at least the "ACPI" rows of
> >>>>>> the table.
> >>>>>>
> >>>>>> Thanks!
> >>>>>> Laszlo
> >>>>>>
> >>>>>>>
> >>>>>>> Robin.
> >>>>>>>
> >>>>>>>> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed
> >>>>>>>> any valid
> >>>>>>>> points above).
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Shameer
> >>>>>>>> [0] https://patchwork.kernel.org/cover/10890919/
> >>>>>>>> [1] https://patchwork.kernel.org/patch/10863299/
> >>>>>>>> [2] https://patchwork.kernel.org/patch/10890937/
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-05-10 15:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-08 10:15 [Question] Memory hotplug clarification for Qemu ARM/virt Shameerali Kolothum Thodi
2019-05-08 12:50 ` Robin Murphy
2019-05-08 20:26   ` Laszlo Ersek
2019-05-09 16:35     ` Igor Mammedov
2019-05-09 21:48       ` Laszlo Ersek
2019-05-10  8:34         ` Shameerali Kolothum Thodi
2019-05-10  9:15           ` [Qemu-devel] " Auger Eric
2019-05-10  9:27             ` Shameerali Kolothum Thodi
2019-05-10  9:58               ` Auger Eric
2019-05-10 15:05                 ` Igor Mammedov
2019-05-08 20:08 ` Laszlo Ersek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).