QEMU-Devel Archive on lore.kernel.org
* [Qemu-devel] CPU hotplug using SMM with QEMU+OVMF
@ 2019-08-13 14:16 Laszlo Ersek
  2019-08-13 16:09 ` Laszlo Ersek
From: Laszlo Ersek @ 2019-08-13 14:16 UTC (permalink / raw)
  To: edk2-devel-groups-io
  Cc: Yingwen Chen, Phillip Goerl, qemu devel list, Jiewen Yao,
	Jun Nakajima, Paolo Bonzini, Igor Mammedov, Boris Ostrovsky,
	edk2-rfc-groups-io, Joao Marcal Lemos Martins

Hi,

this message is a problem statement, and an initial recommendation for
solving it, from Jiewen, Paolo, Yingwen, and others. I'm cross-posting
the thread starter to the <devel@edk2.groups.io>, <rfc@edk2.groups.io>
and <qemu-devel@nongnu.org> lists. Please use "Reply All" when
commenting.

In response to the initial posting, I plan to ask a number of questions.

The related TianoCore bugzillas are:

  https://bugzilla.tianocore.org/show_bug.cgi?id=1512
  https://bugzilla.tianocore.org/show_bug.cgi?id=1515

SMM is used as a security barrier between the OS kernel and the
firmware. When a CPU is hot-plugged into a running system where this
barrier is otherwise intact, the new CPU can serve as a means to attack
SMM. When the next SMI is raised (globally, or targeted at the new CPU),
the SMBASE for that CPU is still at 0x30000, which is normal RAM, not
SMRAM. Therefore the OS could place attack code in that area prior to
the SMI. Once in SMM, the new CPU would execute OS-owned code (from
normal RAM) with access to SMRAM and to other SMM-protected assets, such
as flash. [I stole a few words from Paolo here.]
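
(To make the memory map concrete -- these are the architectural
defaults from the Intel SDM, shown only for orientation:)

  /* Default SMBASE layout (Intel SDM); illustration only. */
  #define DEFAULT_SMBASE     0x30000u /* power-on default, every CPU      */
  #define SMI_ENTRY_OFFSET   0x8000u  /* SMI handler entry: SMBASE+8000h  */
  #define SAVE_STATE_OFFSET  0xFE00u  /* save state area: SMBASE+FE00h    */
  /*
   * So the first SMI on the hot-added CPU starts executing at 0x38000,
   * in normal RAM -- whatever the OS planted there runs in SMM.
   */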

Jiewen summarized the problem as follows:

- Asset: SMM

- Adversary:

  - System Software Attacker, who can control any OS memory or silicon
    register from the OS level, or read/write BIOS data.

  - Simple hardware attacker, who can hot-add or hot-remove a CPU.

  - Non-adversary: the attacker cannot modify the flash BIOS code or the
    read-only BIOS data. The flash part itself is treated as TCB and
    protected.

- Threat: The attacker may hot-add or hot-remove a CPU, then modify
  system memory to tamper with the SMRAM content, or trigger an SMI to
  escalate privileges by executing code in SMM mode.

We'd like to solve this problem for QEMU/KVM and OVMF.

(At the moment, CPU hotplug doesn't work with OVMF *iff* OVMF was built
with -D SMM_REQUIRE. SMBASE relocation never happens for the new CPU,
the SMM infrastructure in edk2 doesn't know about the new CPU, and so
when the first SMI is broadcast afterwards, we crash. We'd like this
functionality to *work*, in the first place -- but securely at that, so
that an actively malicious guest kernel can't break into SMM.)

Yingwen and Jiewen suggested the following process.

Legend:

- "New CPU":  CPU being hot-added
- "Host CPU": existing CPU
- (Flash):    code running from flash
- (SMM):      code running from SMRAM

Steps:

(01) New CPU: (Flash) enter reset vector, Global SMI disabled by
     default.

(02) New CPU: (Flash) configure memory control to let it access global
     host memory.

(03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI) --
     I am waiting for hot-add message. (NOTE: Host CPU can only send
     instruction in SMM mode. -- The register is SMM only)

(04) Host CPU: (OS) get message from board that a new CPU is added.
     (GPIO -> SCI)

(05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU will
     not enter SMM because SMI is disabled)

(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM rebase
     code.

(07) Host CPU: (SMM) Send message to New CPU to Enable SMI.

(08) New CPU: (Flash) Get message - Enable SMI.

(09) Host CPU: (SMM) Send SMI to the new CPU only.

(10) New CPU: (SMM) Respond to the first SMI at 38000, and rebase SMBASE
     to TSEG.

(11) Host CPU: (SMM) Restore 38000.

(12) Host CPU: (SMM) Update located data structure to add the new CPU
     information. (This step will involve CPU_SERVICE protocol)

===================== (now, the next SMI will bring all CPU into TSEG)

(13) New CPU: (Flash) run MRC code, to init its own memory.

(14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.

(15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.
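
(For illustration of steps (06), (10) and (11): SMBASE relocation
classically works by placing a small stub at the default SMI entry
point; the stub patches the SMBASE field in the save state area and
executes RSM. Minimal sketch in edk2 style -- the stub symbols and
NEW_SMBASE_IN_TSEG are hypothetical, the save-state offset is the
32-bit save-map offset from the Intel SDM:)

  #include <Base.h>
  #include <Library/BaseMemoryLib.h>

  #define DEFAULT_SMBASE  0x30000u
  #define SMI_ENTRY       (DEFAULT_SMBASE + 0x8000u) /* 0x38000            */
  #define SMBASE_FIELD    (DEFAULT_SMBASE + 0xFEF8u) /* save-state SMBASE  */

  extern UINT8  gRebaseStub[];    /* hypothetical 16-bit stub binary */
  extern UINTN  gRebaseStubSize;

  STATIC UINT8  mBackup[0x8000];  /* step (06): save the overwritten range */

  VOID
  InstallRebaseStub (
    VOID
    )
  {
    CopyMem (mBackup, (VOID *)(UINTN)SMI_ENTRY, sizeof mBackup);
    CopyMem ((VOID *)(UINTN)SMI_ENTRY, gRebaseStub, gRebaseStubSize);
    //
    // Step (10), executed by the new CPU inside SMM, is equivalent to:
    //
    //   *(volatile UINT32 *)SMBASE_FIELD = NEW_SMBASE_IN_TSEG;
    //   execute RSM;  // leave SMM; new SMBASE takes effect on next SMI
    //
  }

  VOID
  RestoreDefaultRange (
    VOID
    )
  {
    CopyMem ((VOID *)(UINTN)SMI_ENTRY, mBackup, sizeof mBackup); /* (11) */
  }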

Thanks
Laszlo



* Re: [Qemu-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-13 14:16 [Qemu-devel] CPU hotplug using SMM with QEMU+OVMF Laszlo Ersek
@ 2019-08-13 16:09 ` Laszlo Ersek
  2019-08-13 16:18   ` Laszlo Ersek
  2019-08-14 13:20   ` Yao, Jiewen
From: Laszlo Ersek @ 2019-08-13 16:09 UTC (permalink / raw)
  To: edk2-devel-groups-io
  Cc: Yingwen Chen, Phillip Goerl, qemu devel list, Jiewen Yao,
	Jun Nakajima, Paolo Bonzini, Igor Mammedov, Boris Ostrovsky,
	edk2-rfc-groups-io, Joao Marcal Lemos Martins

On 08/13/19 16:16, Laszlo Ersek wrote:

> Yingwen and Jiewen suggested the following process.
>
> Legend:
>
> - "New CPU":  CPU being hot-added
> - "Host CPU": existing CPU
> - (Flash):    code running from flash
> - (SMM):      code running from SMRAM
>
> Steps:
>
> (01) New CPU: (Flash) enter reset vector, Global SMI disabled by
>      default.

- What does "Global SMI disabled by default" mean? In particular, what
  is "global" here?

  Do you mean that the CPU being hot-plugged should mask (by default)
  broadcast SMIs? What about directed SMIs? (An attacker could try that
  too.)

  And what about other processors? (I'd assume step (01) is not
  relevant for other processors, but "global" is quite confusing here.)

- Does this part require a new branch somewhere in the OVMF SEC code?
  How do we determine whether the CPU executing SEC is BSP or
  hot-plugged AP?

- How do we tell the hot-plugged AP where to start execution? (I.e. that
  it should execute code at a particular pflash location.)

  For example, in MpInitLib, we start a specific AP with INIT-SIPI-SIPI,
  where "SIPI" stores the startup address in the "Interrupt Command
  Register" (which is memory-mapped in xAPIC mode, and an MSR in x2APIC
  mode, apparently). That doesn't apply here -- should QEMU auto-start
  the new CPU?

- What memory is used as stack by the new CPU, when it runs code from
  flash?

  QEMU does not emulate CAR (Cache As RAM). The new CPU doesn't have
  access to SMRAM. And we cannot use AcpiNVS or Reserved memory, because
  a malicious OS could use other CPUs -- or PCI device DMA -- to attack
  the stack (unless QEMU forcibly paused other CPUs upon hotplug; I'm
  not sure).

- If an attempt is made to hotplug multiple CPUs in quick succession,
  does something serialize those attempts?

  Again, stack usage could be a concern, even with Cache-As-RAM --
  HyperThreads (logical processors) on a single core don't have
  dedicated cache.

  Does CPU hotplug apply only at the socket level? If the CPU is
  multi-core, what is responsible for hot-plugging all cores present in
  the socket?


> (02) New CPU: (Flash) configure memory control to let it access global
>      host memory.

In QEMU/KVM guests, we don't have to enable memory explicitly; it just
exists and works.

In OVMF X64 SEC, we can't access RAM above 4GB, but that shouldn't be an
issue per se.


> (03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI)
>      -- I am waiting for hot-add message.

Maybe we can simplify this in QEMU by broadcasting an SMI to existent
processors immediately upon plugging the new CPU.
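
(For what it's worth, the CPU_FOREACH() / cpu_interrupt() pattern that
hw/isa/lpc_ich9.c already uses for APM-triggered SMIs would cover this;
a rough sketch, with the hook name invented:)

  #include "qemu/osdep.h"
  #include "hw/core/cpu.h"
  /* i386 target code: CPU_INTERRUPT_SMI comes from target/i386/cpu.h */

  /* Hypothetical hook, called by QEMU when a new CPU is hot-plugged:
   * raise an SMI on every CPU that existed before the hotplug.
   */
  static void x86_broadcast_smi_on_hotplug(CPUState *new_cpu)
  {
      CPUState *cs;

      CPU_FOREACH(cs) {
          if (cs != new_cpu) {
              cpu_interrupt(cs, CPU_INTERRUPT_SMI);
          }
      }
  }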


>                                        (NOTE: Host CPU can only send
>      instruction in SMM mode. -- The register is SMM only)

Sorry, I don't follow -- what register are we talking about here, and
why is the BSP needed to send anything at all? What "instruction" do you
have in mind?


> (04) Host CPU: (OS) get message from board that a new CPU is added.
>      (GPIO -> SCI)
>
> (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
>      will not enter CPU because SMI is disabled)

I don't understand the OS involvement here. But, again, perhaps QEMU can
force all existent CPUs into SMM immediately upon adding the new CPU.


> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>      rebase code.
>
> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI.

Aha, so this is the SMM-only register you mention in step (03). Is the
register specified in the Intel SDM?


> (08) New CPU: (Flash) Get message - Enable SMI.
>
> (09) Host CPU: (SMM) Send SMI to the new CPU only.
>
> (10) New CPU: (SMM) Response first SMI at 38000, and rebase SMBASE to
>      TSEG.

What code does the new CPU execute after it completes step (10)? Does it
halt?


> (11) Host CPU: (SMM) Restore 38000.

These steps (i.e., (06) through (11)) don't appear RAS-specific. The
only platform-specific feature seems to be the SMI masking register,
which could be extracted into a new SmmCpuFeaturesLib API.

Thus, would you please consider open sourcing firmware code for steps
(06) through (11)?
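
(To make the library suggestion concrete, the new API could look like
this -- name and prototype invented here, not existing edk2 code:)

  /**
    Unmask SMI delivery for a hot-added processor, via whatever
    platform-specific register the board provides.

    @param[in] CpuIndex  Index of the hot-added processor.
  **/
  VOID
  EFIAPI
  SmmCpuFeaturesEnableNewCpuSmi (
    IN UINTN  CpuIndex
    );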


Alternatively -- and in particular because the stack for step (01)
concerns me --, we could approach this from a high-level, functional
perspective. The states that really matter are the relocated SMBASE for
the new CPU, and the state of the full system, right at the end of step
(11).

When the SMM setup quiesces during normal firmware boot, OVMF could use
existent (finalized) SMBASE information to *pre-program* some virtual
QEMU hardware, with such state that would be expected, as "final" state,
of any new hotplugged CPU. Afterwards, if / when the hotplug actually
happens, QEMU could blanket-apply this state to the new CPU, and
broadcast a hardware SMI to all CPUs except the new one.

The hardware SMI should tell the firmware that the rest of the process
-- step (12) below, and onward -- is being requested.

If I understand right, this approach would produce a firmware & system
state that's identical to what's expected right after step (11):

- all SMBASEs relocated
- all preexistent CPUs in SMM
- new CPU halted / blocked from launch
- DRAM at 0x30000 / 0x38000 contains OS-owned data

Is my understanding correct that this is the expected state after step
(11)?

Three more comments on the "SMBASE pre-config" approach:

- the virtual hardware providing this feature should become locked after
  the configuration, until next platform reset

- the pre-config should occur via simple hardware accesses, so that it
  can be replayed at S3 resume, i.e. as part of the S3 boot script

- from the pre-configured state, and the APIC ID, QEMU itself could
  perhaps calculate the SMI stack location for the new processor.
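
(Regarding the second bullet, here is a sketch of how the pre-config
could be captured for replay, using the existing S3BootScriptLib API;
the two I/O ports are invented placeholders for whatever interface QEMU
would expose:)

  #include <Library/S3BootScriptLib.h>

  #define SMBASE_SELECT_PORT  0x7C8 /* hypothetical: select CPU by APIC ID */
  #define SMBASE_VALUE_PORT   0x7CC /* hypothetical: SMBASE for that CPU   */

  RETURN_STATUS
  SaveSmbasePreconfigForS3 (
    IN UINT32  ApicId,
    IN UINT32  Smbase
    )
  {
    RETURN_STATUS  Status;

    Status = S3BootScriptSaveIoWrite (
               S3BootScriptWidthUint32,
               SMBASE_SELECT_PORT,
               1,
               &ApicId
               );
    if (RETURN_ERROR (Status)) {
      return Status;
    }
    return S3BootScriptSaveIoWrite (
             S3BootScriptWidthUint32,
             SMBASE_VALUE_PORT,
             1,
             &Smbase
             );
  }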


> (12) Host CPU: (SMM) Update located data structure to add the new CPU
>      information. (This step will involve CPU_SERVICE protocol)

I commented on EFI_SMM_CPU_SERVICE_PROTOCOL under bullet (4) of
<https://bugzilla.tianocore.org/show_bug.cgi?id=1512#c4>.

Calling EFI_SMM_ADD_PROCESSOR looks justified.

What are some of the other member functions used for? The scary one is
EFI_SMM_REGISTER_EXCEPTION_HANDLER.


> ===================== (now, the next SMI will bring all CPU into TSEG)

OK... but what component injects that SMI, and when?


> (13) New CPU: (Flash) run MRC code, to init its own memory.

Why is this needed esp. after step (10)? The new CPU has accessed DRAM
already. And why are we executing code from pflash, rather than from
SMRAM, given that we're past SMBASE relocation?


> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
>
> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.

I'm confused by these steps. I thought that step (12) would complete the
hotplug, by updating the administrative data structures internally. And
the next SMI -- raised for the usual purposes, such as a software SMI
for variable access -- would be handled like it always is, except it
would also pull the new CPU into SMM too.

Thanks!
Laszlo



* Re: [Qemu-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-13 16:09 ` Laszlo Ersek
@ 2019-08-13 16:18   ` Laszlo Ersek
  2019-08-14 13:20   ` Yao, Jiewen
From: Laszlo Ersek @ 2019-08-13 16:18 UTC (permalink / raw)
  To: edk2-devel-groups-io
  Cc: Yingwen Chen, Phillip Goerl, qemu devel list, Jiewen Yao,
	Jun Nakajima, Paolo Bonzini, Igor Mammedov, Boris Ostrovsky,
	edk2-rfc-groups-io, Joao Marcal Lemos Martins

On 08/13/19 18:09, Laszlo Ersek wrote:
> On 08/13/19 16:16, Laszlo Ersek wrote:

>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>>      rebase code.
>>
>> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI.
> 
> Aha, so this is the SMM-only register you mention in step (03). Is the
> register specified in the Intel SDM?
> 
> 
>> (08) New CPU: (Flash) Get message - Enable SMI.
>>
>> (09) Host CPU: (SMM) Send SMI to the new CPU only.
>>
>> (10) New CPU: (SMM) Response first SMI at 38000, and rebase SMBASE to
>>      TSEG.
> 
> What code does the new CPU execute after it completes step (10)? Does it
> halt?
> 
> 
>> (11) Host CPU: (SMM) Restore 38000.
> 
> These steps (i.e., (06) through (11)) don't appear RAS-specific. The
> only platform-specific feature seems to be SMI masking register, which
> could be extracted into a new SmmCpuFeaturesLib API.
> 
> Thus, would you please consider open sourcing firmware code for steps
> (06) through (11)?
> 
> 
> Alternatively -- and in particular because the stack for step (01)
> concerns me --, we could approach this from a high-level, functional
> perspective. The states that really matter are the relocated SMBASE for
> the new CPU, and the state of the full system, right at the end of step
> (11).
> 
> When the SMM setup quiesces during normal firmware boot, OVMF could use
> existent (finalized) SMBASE information to *pre-program* some virtual
> QEMU hardware, with such state that would be expected, as "final" state,
> of any new hotplugged CPU. Afterwards, if / when the hotplug actually
> happens, QEMU could blanket-apply this state to the new CPU, and
> broadcast a hardware SMI to all CPUs except the new one.
> 
> The hardware SMI should tell the firmware that the rest of the process
> -- step (12) below, and onward -- is being requested.
> 
> If I understand right, this approach would produce a firmware & system
> state that's identical to what's expected right after step (11):
> 
> - all SMBASEs relocated
> - all preexistent CPUs in SMM
> - new CPU halted / blocked from launch
> - DRAM at 0x30000 / 0x38000 contains OS-owned data
> 
> Is my understanding correct that this is the expected state after step
> (11)?

Revisiting some of my notes from earlier, such as
<https://bugzilla.redhat.com/show_bug.cgi?id=1454803#c46> -- apologies,
private BZ... --, we discussed some of this stuff with Mike on the phone
in April.

And, it looked like generating a hardware SMI in QEMU, in association
with the hotplug action that was being requested through the QEMU
monitor, would be the right approach.

By now I have forgotten about that discussion -- hence "revisiting my
notes"--, but luckily, it seems consistent with what I've proposed
above, under "alternatively".

Thanks,
Laszlo



* Re: [Qemu-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-13 16:09 ` Laszlo Ersek
  2019-08-13 16:18   ` Laszlo Ersek
@ 2019-08-14 13:20   ` Yao, Jiewen
  2019-08-14 14:04     ` Paolo Bonzini
From: Yao, Jiewen @ 2019-08-14 13:20 UTC (permalink / raw)
  To: Laszlo Ersek, edk2-devel-groups-io
  Cc: Chen, Yingwen, Phillip Goerl, qemu devel list, Nakajima,  Jun,
	Paolo Bonzini, Igor Mammedov, Boris Ostrovsky,
	edk2-rfc-groups-io, Joao Marcal Lemos Martins

My comments below.

> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Wednesday, August 14, 2019 12:09 AM
> To: edk2-devel-groups-io <devel@edk2.groups.io>
> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
> <qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
> Paolo Bonzini <pbonzini@redhat.com>; Yao, Jiewen
> <jiewen.yao@intel.com>; Chen, Yingwen <yingwen.chen@intel.com>;
> Nakajima, Jun <jun.nakajima@intel.com>; Boris Ostrovsky
> <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins
> <joao.m.martins@oracle.com>; Phillip Goerl <phillip.goerl@oracle.com>
> Subject: Re: CPU hotplug using SMM with QEMU+OVMF
> 
> On 08/13/19 16:16, Laszlo Ersek wrote:
> 
> > Yingwen and Jiewen suggested the following process.
> >
> > Legend:
> >
> > - "New CPU":  CPU being hot-added
> > - "Host CPU": existing CPU
> > - (Flash):    code running from flash
> > - (SMM):      code running from SMRAM
> >
> > Steps:
> >
> > (01) New CPU: (Flash) enter reset vector, Global SMI disabled by
> >      default.
> 
> - What does "Global SMI disabled by default" mean? In particular, what
>   is "global" here?
[Jiewen] OK. Let's not use the term "global".


>   Do you mean that the CPU being hot-plugged should mask (by default)
>   broadcast SMIs? What about directed SMIs? (An attacker could try that
>   too.)
[Jiewen] I mean all SMIs are blocked for this specific hot-added CPU.


>   And what about other processors? (I'd assume step (01) is not
>   relevant for other processors, but "global" is quite confusing here.)
[Jiewen] No impact to other processors.


> - Does this part require a new branch somewhere in the OVMF SEC code?
>   How do we determine whether the CPU executing SEC is BSP or
>   hot-plugged AP?
[Jiewen] I think this is blocked from the hardware perspective, since the first instruction.
There are some hardware-specific registers that can be used to determine whether the CPU is newly added.
I don't think this must be the same as on real hardware.
You are free to invent some registers in the device model, to be used in the OVMF hotplug driver.


> - How do we tell the hot-plugged AP where to start execution? (I.e. that
>   it should execute code at a particular pflash location.)
[Jiewen] Same real mode reset vector at FFFF:FFF0.


>   For example, in MpInitLib, we start a specific AP with INIT-SIPI-SIPI,
>   where "SIPI" stores the startup address in the "Interrupt Command
>   Register" (which is memory-mapped in xAPIC mode, and an MSR in x2APIC
>   mode, apparently). That doesn't apply here -- should QEMU auto-start
>   the new CPU?
[Jiewen] You can send INIT-SIPI-SIPI to the new CPU only after it can access memory.
SIPI needs to give the AP a below-1MB memory address as the waking vector.
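
(In edk2 terms this is what LocalApicLib already wraps -- a sketch; the
wake vector address is illustrative, and must be 4 KiB aligned and
below 1 MB:)

  #include <Library/LocalApicLib.h>

  VOID
  WakeNewCpu (
    IN UINT32  NewCpuApicId
    )
  {
    //
    // Hypothetical below-1MB, 4 KiB aligned buffer holding real-mode
    // startup code for the new CPU.
    //
    UINT32  WakeVector = 0x9F000;

    SendInitSipiSipi (NewCpuApicId, WakeVector);  /* INIT, then SIPI x2 */
  }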


> - What memory is used as stack by the new CPU, when it runs code from
>   flash?
[Jiewen] Same as the other CPUs in normal boot. You can use special reserved memory.


>   QEMU does not emulate CAR (Cache As RAM). The new CPU doesn't have
>   access to SMRAM. And we cannot use AcpiNVS or Reserved memory,
> because
>   a malicious OS could use other CPUs -- or PCI device DMA -- to attack
>   the stack (unless QEMU forcibly paused other CPUs upon hotplug; I'm
>   not sure).
[Jiewen] Excellent point!
I don't think there is a problem for real hardware, which always has CAR.
Can QEMU provide some CPU-specific space, such as an MMIO region?


> - If an attempt is made to hotplug multiple CPUs in quick succession,
>   does something serialize those attempts?
[Jiewen] The BIOS needs to treat this as an availability requirement.
I don't have a strong opinion.
You can design a system that requires hotplugs to happen one-by-one, and fails the hot-add otherwise.
Or you can design a system without such a restriction.
Again, all we need to do is maintain the integrity of SMM.
Availability should be considered a separate requirement.


>   Again, stack usage could be a concern, even with Cache-As-RAM --
>   HyperThreads (logical processors) on a single core don't have
>   dedicated cache.
[Jiewen] Agree with you on the virtual environment.
For real hardware, we do socket-level hot-add only, so HT is not the concern.
But if you want to do that in a virtual environment, a processor-specific
memory should be considered.


>   Does CPU hotplug apply only at the socket level? If the CPU is
>   multi-core, what is responsible for hot-plugging all cores present in
>   the socket?
[Jiewen] Ditto.


> > (02) New CPU: (Flash) configure memory control to let it access global
> >      host memory.
> 
> In QEMU/KVM guests, we don't have to enable memory explicitly, it just
> exists and works.
> 
> In OVMF X64 SEC, we can't access RAM above 4GB, but that shouldn't be an
> issue per se.
[Jiewen] Agree. I do not see the issue.


> > (03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI)
> >      -- I am waiting for hot-add message.
> 
> Maybe we can simplify this in QEMU by broadcasting an SMI to existent
> processors immediately upon plugging the new CPU.
> 
> 
> >                                        (NOTE: Host CPU can only
> send
> >      instruction in SMM mode. -- The register is SMM only)
> 
> Sorry, I don't follow -- what register are we talking about here, and
> why is the BSP needed to send anything at all? What "instruction" do you
> have in mind?
[Jiewen] The new CPU does not enable SMI at reset.
At some point later, the CPU needs to enable SMI, right?
The "instruction" here means: the host CPUs need to tell the new CPU to enable SMI.


> > (04) Host CPU: (OS) get message from board that a new CPU is added.
> >      (GPIO -> SCI)
> >
> > (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
> >      will not enter CPU because SMI is disabled)
> 
> I don't understand the OS involvement here. But, again, perhaps QEMU can
> force all existent CPUs into SMM immediately upon adding the new CPU.
[Jiewen] OS here means the host CPU running code in the OS environment, not in the SMM environment.


> > (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
> >      rebase code.
> >
> > (07) Host CPU: (SMM) Send message to New CPU to Enable SMI.
> 
> Aha, so this is the SMM-only register you mention in step (03). Is the
> register specified in the Intel SDM?
[Jiewen] Right. That is the register that lets the host CPU tell the new CPU to enable SMI.
It is a platform-specific register, not defined in the SDM.
You may invent one in the device model.
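
(A sketch of what such an invented register might look like on the QEMU
side -- a write-only byte that unblocks SMI delivery for the hot-added
CPU; the names, the port semantics and the helper are all hypothetical:)

  #include "qemu/osdep.h"
  #include "exec/memory.h"

  /* hypothetical helper: clear the "SMI blocked" flag of the CPU with
   * the given APIC ID */
  static void hotplugged_cpu_unblock_smi(uint32_t apic_id);

  static void smi_enable_write(void *opaque, hwaddr addr,
                               uint64_t val, unsigned size)
  {
      /* val = APIC ID of the hot-added CPU whose SMI mask to clear */
      hotplugged_cpu_unblock_smi((uint32_t)val);
  }

  static const MemoryRegionOps smi_enable_ops = {
      .write = smi_enable_write,
      .endianness = DEVICE_LITTLE_ENDIAN,
  };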


> > (08) New CPU: (Flash) Get message - Enable SMI.
> >
> > (09) Host CPU: (SMM) Send SMI to the new CPU only.
> >
> > (10) New CPU: (SMM) Response first SMI at 38000, and rebase SMBASE to
> >      TSEG.
> 
> What code does the new CPU execute after it completes step (10)? Does it
> halt?
[Jiewen] The new CPU exits SMM and returns to the original place -- where it
was interrupted to enter SMM -- running code on the flash.


> > (11) Host CPU: (SMM) Restore 38000.
> 
> These steps (i.e., (06) through (11)) don't appear RAS-specific. The
> only platform-specific feature seems to be SMI masking register, which
> could be extracted into a new SmmCpuFeaturesLib API.
> 
> Thus, would you please consider open sourcing firmware code for steps
> (06) through (11)?
> 
> Alternatively -- and in particular because the stack for step (01)
> concerns me --, we could approach this from a high-level, functional
> perspective. The states that really matter are the relocated SMBASE for
> the new CPU, and the state of the full system, right at the end of step
> (11).
> 
> When the SMM setup quiesces during normal firmware boot, OVMF could
> use
> existent (finalized) SMBASE information to *pre-program* some virtual
> QEMU hardware, with such state that would be expected, as "final" state,
> of any new hotplugged CPU. Afterwards, if / when the hotplug actually
> happens, QEMU could blanket-apply this state to the new CPU, and
> broadcast a hardware SMI to all CPUs except the new one.
> 
> The hardware SMI should tell the firmware that the rest of the process
> -- step (12) below, and onward -- is being requested.
> 
> If I understand right, this approach would produce a firmware & system
> state that's identical to what's expected right after step (11):
> 
> - all SMBASEs relocated
> - all preexistent CPUs in SMM
> - new CPU halted / blocked from launch
> - DRAM at 0x30000 / 0x38000 contains OS-owned data
> 
> Is my understanding correct that this is the expected state after step
> (11)?
[Jiewen] I think you are correct.


> Three more comments on the "SMBASE pre-config" approach:
> 
> - the virtual hardware providing this feature should become locked after
>   the configuration, until next platform reset
> 
> - the pre-config should occur via simple hardware accesses, so that it
>   can be replayed at S3 resume, i.e. as part of the S3 boot script
> 
> - from the pre-configured state, and the APIC ID, QEMU itself could
>   perhaps calculate the SMI stack location for the new processor.
> 
> 
> > (12) Host CPU: (SMM) Update located data structure to add the new CPU
> >      information. (This step will involve CPU_SERVICE protocol)
> 
> I commented on EFI_SMM_CPU_SERVICE_PROTOCOL under bullet (4) of
> <https://bugzilla.tianocore.org/show_bug.cgi?id=1512#c4>.
> 
> Calling EFI_SMM_ADD_PROCESSOR looks justified.
[Jiewen] I think you are correct.
Also, REMOVE_PROCESSOR will be used for the hot-remove action.


> What are some of the other member functions used for? The scary one is
> EFI_SMM_REGISTER_EXCEPTION_HANDLER.
[Jiewen] This is to register a new exception handler in SMM.
I don’t think this API is involved in hot-add.


> > ===================== (now, the next SMI will bring all CPU into TSEG)
> 
> OK... but what component injects that SMI, and when?
[Jiewen] Any SMI event. It could be a synchronous SMI or an asynchronous SMI.
It could come from software, such as an IO write, or from hardware, such as a thermal event.


> > (13) New CPU: (Flash) run MRC code, to init its own memory.
> 
> Why is this needed esp. after step (10)? The new CPU has accessed DRAM
> already. And why are we executing code from pflash, rather than from
> SMRAM, given that we're past SMBASE relocation?
[Jiewen] On real hardware it is needed, because different CPUs may have different capabilities for accessing different DIMMs.
I do not think your virtual platform needs it.


> > (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
> >
> > (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.
> 
> I'm confused by these steps. I thought that step (12) would complete the
> hotplug, by updating the administrative data structures internally. And
> the next SMI -- raised for the usual purposes, such as a software SMI
> for variable access -- would be handled like it always is, except it
> would also pull the new CPU into SMM too.
[Jiewen] The OS needs to use the new CPU at some point, right?
As such, the OS needs to pull the new CPU into its own environment with INIT-SIPI-SIPI.


> Thanks!
> Laszlo


* Re: [Qemu-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-14 13:20   ` Yao, Jiewen
@ 2019-08-14 14:04     ` Paolo Bonzini
  2019-08-15  9:55       ` Yao, Jiewen
                         ` (2 more replies)
From: Paolo Bonzini @ 2019-08-14 14:04 UTC (permalink / raw)
  To: Yao, Jiewen, Laszlo Ersek, edk2-devel-groups-io
  Cc: Chen, Yingwen, Phillip Goerl, qemu devel list, Nakajima, Jun,
	Igor Mammedov, Boris Ostrovsky, edk2-rfc-groups-io,
	Joao Marcal Lemos Martins

On 14/08/19 15:20, Yao, Jiewen wrote:
>> - Does this part require a new branch somewhere in the OVMF SEC code?
>>   How do we determine whether the CPU executing SEC is BSP or
>>   hot-plugged AP?
> [Jiewen] I think this is blocked from hardware perspective, since the first instruction.
> There are some hardware specific registers can be used to determine if the CPU is new added.
> I don’t think this must be same as the real hardware.
> You are free to invent some registers in device model to be used in OVMF hot plug driver.

Yes, this would be a new operation mode for QEMU, that only applies to
hot-plugged CPUs.  In this mode the AP doesn't reply to INIT or SMI, in
fact it doesn't reply to anything at all.

>> - How do we tell the hot-plugged AP where to start execution? (I.e. that
>>   it should execute code at a particular pflash location.)
> [Jiewen] Same real mode reset vector at FFFF:FFF0.

You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
QEMU.  The AP does not start execution at all when it is unplugged, so
no cache-as-RAM etc.

We only need to modify QEMU so that hot-plugged APs do not reply to
INIT/SIPI/SMI.

> I don’t think there is problem for real hardware, who always has CAR.
> Can QEMU provide some CPU specific space, such as MMIO region?

Why is a CPU-specific region needed, if every other processor is in SMM
and thus trusted?

>>   Does CPU hotplug apply only at the socket level? If the CPU is
>>   multi-core, what is responsible for hot-plugging all cores present in
>>   the socket?

I can answer this: the SMM handler would interact with the hotplug
controller in the same way that ACPI DSDT does normally.  This supports
multiple hotplugs already.

Writes to the hotplug controller from outside SMM would be ignored.
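
(One concrete way to get the "ignored outside SMM" behavior in QEMU: on
x86, SMM-mode memory transactions carry MemTxAttrs.secure -- see
cpu_get_mem_attrs() in target/i386 -- so the controller's
write_with_attrs callback could simply check that bit. Sketch only:)

  #include "qemu/osdep.h"
  #include "exec/memattrs.h"

  static MemTxResult cpu_hotplug_write_with_attrs(void *opaque,
                                                  hwaddr addr,
                                                  uint64_t data,
                                                  unsigned size,
                                                  MemTxAttrs attrs)
  {
      if (!attrs.secure) {
          /* write did not originate in SMM: silently discard it */
          return MEMTX_OK;
      }
      /* ... otherwise apply the write to the hotplug controller ... */
      return MEMTX_OK;
  }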

>>> (03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI)
>>>      -- I am waiting for hot-add message.
>>
>> Maybe we can simplify this in QEMU by broadcasting an SMI to existent
>> processors immediately upon plugging the new CPU.

The QEMU DSDT could be modified (when secure boot is in effect) to OUT
to 0xB2 when hotplug happens.  It could write a well-known value to
0xB2, to be read by an SMI handler in edk2.
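
(On the edk2 side, the handler for that well-known value would hang off
the standard SW dispatch protocol; a sketch, with the value itself a
placeholder:)

  #include <PiSmm.h>
  #include <Protocol/SmmSwDispatch2.h>

  #define CPU_HOTPLUG_SWSMI_VALUE  0x04  /* placeholder well-known value */

  STATIC
  EFI_STATUS
  EFIAPI
  CpuHotplugSmiHandler (
    IN EFI_HANDLE  DispatchHandle,
    IN CONST VOID  *Context        OPTIONAL,
    IN OUT VOID    *CommBuffer     OPTIONAL,
    IN OUT UINTN   *CommBufferSize OPTIONAL
    )
  {
    //
    // All pre-existing CPUs are in SMM at this point; continue with
    // steps (06) through (12).
    //
    return EFI_SUCCESS;
  }

  EFI_STATUS
  RegisterCpuHotplugHandler (
    IN EFI_SMM_SW_DISPATCH2_PROTOCOL  *SwDispatch
    )
  {
    EFI_SMM_SW_REGISTER_CONTEXT  RegisterContext;
    EFI_HANDLE                   DispatchHandle;

    RegisterContext.SwSmiInputValue = CPU_HOTPLUG_SWSMI_VALUE;
    return SwDispatch->Register (
                         SwDispatch,
                         CpuHotplugSmiHandler,
                         &RegisterContext,
                         &DispatchHandle
                         );
  }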


>>
>>>                                        (NOTE: Host CPU can only
>> send
>>>      instruction in SMM mode. -- The register is SMM only)
>>
>> Sorry, I don't follow -- what register are we talking about here, and
>> why is the BSP needed to send anything at all? What "instruction" do you
>> have in mind?
> [Jiewen] The new CPU does not enable SMI at reset.
> At some point of time later, the CPU need enable SMI, right?
> The "instruction" here means, the host CPUs need tell to CPU to enable SMI.

Right, this would be a write to the CPU hotplug controller.

>>> (04) Host CPU: (OS) get message from board that a new CPU is added.
>>>      (GPIO -> SCI)
>>>
>>> (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
>>>      will not enter CPU because SMI is disabled)
>>
>> I don't understand the OS involvement here. But, again, perhaps QEMU can
>> force all existent CPUs into SMM immediately upon adding the new CPU.
> [Jiewen] OS here means the Host CPU running code in OS environment, not in SMM environment.

See above.

>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>>>      rebase code.
>>>
>>> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI.
>>
>> Aha, so this is the SMM-only register you mention in step (03). Is the
>> register specified in the Intel SDM?
> [Jiewen] Right. That is the register to let host CPU tell new CPU to enable SMI.
> It is platform specific register. Not defined in SDM.
> You may invent one in device model.

See above.

>>> (10) New CPU: (SMM) Response first SMI at 38000, and rebase SMBASE to
>>>      TSEG.
>>
>> What code does the new CPU execute after it completes step (10)? Does it
>> halt?
>
> [Jiewen] The new CPU exits SMM and return to original place - where it is
> interrupted to enter SMM - running code on the flash.

So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and (07).

>>> (11) Host CPU: (SMM) Restore 38000.
>>
>> These steps (i.e., (06) through (11)) don't appear RAS-specific. The
>> only platform-specific feature seems to be SMI masking register, which
>> could be extracted into a new SmmCpuFeaturesLib API.
>>
>> Thus, would you please consider open sourcing firmware code for steps
>> (06) through (11)?
>>
>> Alternatively -- and in particular because the stack for step (01)
>> concerns me --, we could approach this from a high-level, functional
>> perspective. The states that really matter are the relocated SMBASE for
>> the new CPU, and the state of the full system, right at the end of step
>> (11).
>>
>> When the SMM setup quiesces during normal firmware boot, OVMF could
>> use
>> existent (finalized) SMBASE infomation to *pre-program* some virtual
>> QEMU hardware, with such state that would be expected, as "final" state,
>> of any new hotplugged CPU. Afterwards, if / when the hotplug actually
>> happens, QEMU could blanket-apply this state to the new CPU, and
>> broadcast a hardware SMI to all CPUs except the new one.

I'd rather avoid this and stay as close as possible to real hardware.

Paolo



* Re: [Qemu-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-14 14:04     ` Paolo Bonzini
@ 2019-08-15  9:55       ` Yao, Jiewen
  2019-08-15 16:04         ` Paolo Bonzini
  2019-08-15 15:00       ` [Qemu-devel] [edk2-devel] " Laszlo Ersek
  2019-08-15 16:07       ` [Qemu-devel] " Igor Mammedov
From: Yao, Jiewen @ 2019-08-15  9:55 UTC (permalink / raw)
  To: Paolo Bonzini, Laszlo Ersek, edk2-devel-groups-io
  Cc: Chen, Yingwen, Phillip Goerl, qemu devel list, Nakajima, Jun,
	Igor Mammedov, Boris Ostrovsky, edk2-rfc-groups-io,
	Joao Marcal Lemos Martins

Hi Paolo
I am not sure what you mean by "You do not need a reset vector ...".
If so, where is the first instruction of the new CPU in the virtualization environment?
Please help me understand that first. Then we can continue the discussion.

Thank you
Yao Jiewen

> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Wednesday, August 14, 2019 10:05 PM
> To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek
> <lersek@redhat.com>; edk2-devel-groups-io <devel@edk2.groups.io>
> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
> <qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
> <jun.nakajima@intel.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>;
> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl
> <phillip.goerl@oracle.com>
> Subject: Re: CPU hotplug using SMM with QEMU+OVMF
> 
> On 14/08/19 15:20, Yao, Jiewen wrote:
> >> - Does this part require a new branch somewhere in the OVMF SEC code?
> >>   How do we determine whether the CPU executing SEC is BSP or
> >>   hot-plugged AP?
> > [Jiewen] I think this is blocked from hardware perspective, since the first
> instruction.
> > There are some hardware specific registers can be used to determine if the
> CPU is new added.
> > I don’t think this must be same as the real hardware.
> > You are free to invent some registers in device model to be used in OVMF
> hot plug driver.
> 
> Yes, this would be a new operation mode for QEMU, that only applies to
> hot-plugged CPUs.  In this mode the AP doesn't reply to INIT or SMI, in
> fact it doesn't reply to anything at all.
> 
> >> - How do we tell the hot-plugged AP where to start execution? (I.e. that
> >>   it should execute code at a particular pflash location.)
> > [Jiewen] Same real mode reset vector at FFFF:FFF0.
> 
> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
> QEMU.  The AP does not start execution at all when it is unplugged, so
> no cache-as-RAM etc.

> We only need to modify QEMU so that hot-plugged APs do not reply to
> INIT/SIPI/SMI.
> 
> > I don’t think there is problem for real hardware, who always has CAR.
> > Can QEMU provide some CPU specific space, such as MMIO region?
> 
> Why is a CPU-specific region needed if every other processor is in SMM
> and thus trusted.
> >>   Does CPU hotplug apply only at the socket level? If the CPU is
> >>   multi-core, what is responsible for hot-plugging all cores present in
> >>   the socket?
> 
> I can answer this: the SMM handler would interact with the hotplug
> controller in the same way that ACPI DSDT does normally.  This supports
> multiple hotplugs already.
> 
> Writes to the hotplug controller from outside SMM would be ignored.
> 
> >>> (03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI)
> >>>      -- I am waiting for hot-add message.
> >>
> >> Maybe we can simplify this in QEMU by broadcasting an SMI to existent
> >> processors immediately upon plugging the new CPU.
> 
> The QEMU DSDT could be modified (when secure boot is in effect) to OUT
> to 0xB2 when hotplug happens.  It could write a well-known value to
> 0xB2, to be read by an SMI handler in edk2.
> 
> 
> >>
> >>>                                        (NOTE: Host CPU can
> only
> >> send
> >>>      instruction in SMM mode. -- The register is SMM only)
> >>
> >> Sorry, I don't follow -- what register are we talking about here, and
> >> why is the BSP needed to send anything at all? What "instruction" do you
> >> have in mind?
> > [Jiewen] The new CPU does not enable SMI at reset.
> > At some point of time later, the CPU need enable SMI, right?
> > The "instruction" here means, the host CPUs need tell to CPU to enable
> SMI.
> 
> Right, this would be a write to the CPU hotplug controller
> 
> >>> (04) Host CPU: (OS) get message from board that a new CPU is added.
> >>>      (GPIO -> SCI)
> >>>
> >>> (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
> >>>      will not enter CPU because SMI is disabled)
> >>
> >> I don't understand the OS involvement here. But, again, perhaps QEMU
> can
> >> force all existent CPUs into SMM immediately upon adding the new CPU.
> > [Jiewen] OS here means the Host CPU running code in OS environment, not
> in SMM environment.
> 
> See above.
> 
> >>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
> >>>      rebase code.
> >>>
> >>> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI.
> >>
> >> Aha, so this is the SMM-only register you mention in step (03). Is the
> >> register specified in the Intel SDM?
> > [Jiewen] Right. That is the register to let host CPU tell new CPU to enable
> SMI.
> > It is platform specific register. Not defined in SDM.
> > You may invent one in device model.
> 
> See above.
> 
> >>> (10) New CPU: (SMM) Response first SMI at 38000, and rebase SMBASE
> to
> >>>      TSEG.
> >>
> >> What code does the new CPU execute after it completes step (10)? Does
> it
> >> halt?
> >
> > [Jiewen] The new CPU exits SMM and return to original place - where it is
> > interrupted to enter SMM - running code on the flash.
> 
> So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and (07).
> 
> >>> (11) Host CPU: (SMM) Restore 38000.
> >>
> >> These steps (i.e., (06) through (11)) don't appear RAS-specific. The
> >> only platform-specific feature seems to be SMI masking register, which
> >> could be extracted into a new SmmCpuFeaturesLib API.
> >>
> >> Thus, would you please consider open sourcing firmware code for steps
> >> (06) through (11)?
> >>
> >> Alternatively -- and in particular because the stack for step (01)
> >> concerns me --, we could approach this from a high-level, functional
> >> perspective. The states that really matter are the relocated SMBASE for
> >> the new CPU, and the state of the full system, right at the end of step
> >> (11).
> >>
> >> When the SMM setup quiesces during normal firmware boot, OVMF could
> >> use
> >> existent (finalized) SMBASE information to *pre-program* some virtual
> >> QEMU hardware, with such state that would be expected, as "final" state,
> >> of any new hotplugged CPU. Afterwards, if / when the hotplug actually
> >> happens, QEMU could blanket-apply this state to the new CPU, and
> >> broadcast a hardware SMI to all CPUs except the new one.
> 
> I'd rather avoid this and stay as close as possible to real hardware.
> 
> Paolo


* Re: [Qemu-devel] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-14 14:04     ` Paolo Bonzini
  2019-08-15  9:55       ` Yao, Jiewen
@ 2019-08-15 15:00       ` " Laszlo Ersek
  2019-08-15 16:16         ` Igor Mammedov
  2019-08-15 16:21         ` Paolo Bonzini
  2019-08-15 16:07       ` [Qemu-devel] " Igor Mammedov
From: Laszlo Ersek @ 2019-08-15 15:00 UTC (permalink / raw)
  To: devel, pbonzini, Yao, Jiewen
  Cc: Chen, Yingwen, Phillip Goerl, qemu devel list, Nakajima, Jun,
	Igor Mammedov, Boris Ostrovsky, edk2-rfc-groups-io,
	Joao Marcal Lemos Martins

On 08/14/19 16:04, Paolo Bonzini wrote:
> On 14/08/19 15:20, Yao, Jiewen wrote:
>>> - Does this part require a new branch somewhere in the OVMF SEC code?
>>>   How do we determine whether the CPU executing SEC is BSP or
>>>   hot-plugged AP?
>> [Jiewen] I think this is blocked from hardware perspective, since the first instruction.
>> There are some hardware specific registers can be used to determine if the CPU is new added.
>> I don’t think this must be same as the real hardware.
>> You are free to invent some registers in device model to be used in OVMF hot plug driver.
> 
> Yes, this would be a new operation mode for QEMU, that only applies to
> hot-plugged CPUs.  In this mode the AP doesn't reply to INIT or SMI, in
> fact it doesn't reply to anything at all.
> 
>>> - How do we tell the hot-plugged AP where to start execution? (I.e. that
>>>   it should execute code at a particular pflash location.)
>> [Jiewen] Same real mode reset vector at FFFF:FFF0.
> 
> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
> QEMU.  The AP does not start execution at all when it is unplugged, so
> no cache-as-RAM etc.
> 
> We only need to modify QEMU so that hot-plugged APs do not reply to
> INIT/SIPI/SMI.
> 
>> I don’t think there is problem for real hardware, who always has CAR.
>> Can QEMU provide some CPU specific space, such as MMIO region?
> 
> Why is a CPU-specific region needed if every other processor is in SMM
> and thus trusted.

I was going through the steps Jiewen and Yingwen recommended.

In step (02), the new CPU is expected to set up RAM access. In step
(03), the new CPU, executing code from flash, is expected to "send board
message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
message." For that action, the new CPU may need a stack (minimally if we
want to use C function calls).

Until step (03), there had been no word about any other (= pre-plugged)
CPUs (more precisely, Jiewen even confirmed "No impact to other
processors"), so I didn't assume that other CPUs had entered SMM.

Paolo, I've attempted to read Jiewen's response, and yours, as carefully
as I can. I'm still very confused. If you have a better understanding,
could you please write up the 15-step process from the thread starter
again, with all QEMU customizations applied -- that is, with unnecessary
steps removed, and platform specifics filled in?

One more comment below:

> 
>>>   Does CPU hotplug apply only at the socket level? If the CPU is
>>>   multi-core, what is responsible for hot-plugging all cores present in
>>>   the socket?
> 
> I can answer this: the SMM handler would interact with the hotplug
> controller in the same way that ACPI DSDT does normally.  This supports
> multiple hotplugs already.
> 
> Writes to the hotplug controller from outside SMM would be ignored.
> 
>>>> (03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI)
>>>>      -- I am waiting for hot-add message.
>>>
>>> Maybe we can simplify this in QEMU by broadcasting an SMI to existent
>>> processors immediately upon plugging the new CPU.
> 
> The QEMU DSDT could be modified (when secure boot is in effect) to OUT
> to 0xB2 when hotplug happens.  It could write a well-known value to
> 0xB2, to be read by an SMI handler in edk2.

(My comment below is general, and may not apply to this particular
situation. I'm too confused to figure that out myself, sorry!)

I dislike involving QEMU's generated DSDT in anything SMM (even
injecting the SMI), because the AML interpreter runs in the OS.

If a malicious OS kernel is a bit too enlightened about the DSDT, it
could willfully diverge from the process that we design. If QEMU
broadcast the SMI internally, the guest OS could not interfere with that.

If the purpose of the SMI is specifically to force all CPUs into SMM
(and thereby force them into trusted state), then the OS would be
explicitly counter-interested in carrying out the AML operations from
QEMU's DSDT.

I'd be OK with an SMM / SMI involvement in QEMU's DSDT if, by diverging
from that DSDT, the OS kernel could only mess with its own state, and
not with the firmware's.

Thanks
Laszlo

> 
> 
>>>
>>>>                                        (NOTE: Host CPU can only
>>> send
>>>>      instruction in SMM mode. -- The register is SMM only)
>>>
>>> Sorry, I don't follow -- what register are we talking about here, and
>>> why is the BSP needed to send anything at all? What "instruction" do you
>>> have in mind?
>> [Jiewen] The new CPU does not enable SMI at reset.
>> At some point of time later, the CPU need enable SMI, right?
>> The "instruction" here means, the host CPUs need tell to CPU to enable SMI.
> 
> Right, this would be a write to the CPU hotplug controller
> 
>>>> (04) Host CPU: (OS) get message from board that a new CPU is added.
>>>>      (GPIO -> SCI)
>>>>
>>>> (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
>>>>      will not enter CPU because SMI is disabled)
>>>
>>> I don't understand the OS involvement here. But, again, perhaps QEMU can
>>> force all existent CPUs into SMM immediately upon adding the new CPU.
>> [Jiewen] OS here means the Host CPU running code in OS environment, not in SMM environment.
> 
> See above.
> 
>>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>>>>      rebase code.
>>>>
>>>> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI.
>>>
>>> Aha, so this is the SMM-only register you mention in step (03). Is the
>>> register specified in the Intel SDM?
>> [Jiewen] Right. That is the register to let host CPU tell new CPU to enable SMI.
>> It is platform specific register. Not defined in SDM.
>> You may invent one in device model.
> 
> See above.
> 
>>>> (10) New CPU: (SMM) Response first SMI at 38000, and rebase SMBASE to
>>>>      TSEG.
>>>
>>> What code does the new CPU execute after it completes step (10)? Does it
>>> halt?
>>
>> [Jiewen] The new CPU exits SMM and return to original place - where it is
>> interrupted to enter SMM - running code on the flash.
> 
> So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and (07).
> 
>>>> (11) Host CPU: (SMM) Restore 38000.
>>>
>>> These steps (i.e., (06) through (11)) don't appear RAS-specific. The
>>> only platform-specific feature seems to be SMI masking register, which
>>> could be extracted into a new SmmCpuFeaturesLib API.
>>>
>>> Thus, would you please consider open sourcing firmware code for steps
>>> (06) through (11)?
>>>
>>> Alternatively -- and in particular because the stack for step (01)
>>> concerns me --, we could approach this from a high-level, functional
>>> perspective. The states that really matter are the relocated SMBASE for
>>> the new CPU, and the state of the full system, right at the end of step
>>> (11).
>>>
>>> When the SMM setup quiesces during normal firmware boot, OVMF could
>>> use
>>> existent (finalized) SMBASE information to *pre-program* some virtual
>>> QEMU hardware, with such state that would be expected, as "final" state,
>>> of any new hotplugged CPU. Afterwards, if / when the hotplug actually
>>> happens, QEMU could blanket-apply this state to the new CPU, and
>>> broadcast a hardware SMI to all CPUs except the new one.
> 
> I'd rather avoid this and stay as close as possible to real hardware.
> 
> Paolo




* Re: [Qemu-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-15  9:55       ` Yao, Jiewen
@ 2019-08-15 16:04         ` Paolo Bonzini
From: Paolo Bonzini @ 2019-08-15 16:04 UTC (permalink / raw)
  To: Yao, Jiewen, Laszlo Ersek, edk2-devel-groups-io
  Cc: Chen, Yingwen, Phillip Goerl, qemu devel list, Nakajima, Jun,
	Igor Mammedov, Boris Ostrovsky, edk2-rfc-groups-io,
	Joao Marcal Lemos Martins

On 15/08/19 11:55, Yao, Jiewen wrote:
> Hi Paolo
> I am not sure what do you mean - "You do not need a reset vector ...".
> If so, where is the first instruction of the new CPU in the virtualization environment?
> Please help me understand that at first. Then we can continue the discussion.

The BSP starts running from 0xFFFFFFF0.  APs do not start running at all
and just sit waiting for an INIT-SIPI-SIPI sequence.  Please see my
proposal in the reply to Laszlo.

Paolo



* Re: [Qemu-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-14 14:04     ` Paolo Bonzini
  2019-08-15  9:55       ` Yao, Jiewen
  2019-08-15 15:00       ` [Qemu-devel] [edk2-devel] " Laszlo Ersek
@ 2019-08-15 16:07       ` " Igor Mammedov
  2019-08-15 16:24         ` Paolo Bonzini
From: Igor Mammedov @ 2019-08-15 16:07 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Chen, Yingwen, edk2-devel-groups-io, Phillip Goerl,
	qemu devel list, Yao, Jiewen, Nakajima, Jun, Boris Ostrovsky,
	edk2-rfc-groups-io, Laszlo Ersek, Joao Marcal Lemos Martins

On Wed, 14 Aug 2019 16:04:50 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 14/08/19 15:20, Yao, Jiewen wrote:
> >> - Does this part require a new branch somewhere in the OVMF SEC code?
> >>   How do we determine whether the CPU executing SEC is BSP or
> >>   hot-plugged AP?  
> > [Jiewen] I think this is blocked from hardware perspective, since the first instruction.
> > There are some hardware specific registers can be used to determine if the CPU is new added.
> > I don’t think this must be same as the real hardware.
> > You are free to invent some registers in device model to be used in OVMF hot plug driver.  
> 
> Yes, this would be a new operation mode for QEMU, that only applies to
> hot-plugged CPUs.  In this mode the AP doesn't reply to INIT or SMI, in
> fact it doesn't reply to anything at all.
> 
> >> - How do we tell the hot-plugged AP where to start execution? (I.e. that
> >>   it should execute code at a particular pflash location.)  
> > [Jiewen] Same real mode reset vector at FFFF:FFF0.  
> 
> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
> QEMU.  The AP does not start execution at all when it is unplugged, so
> no cache-as-RAM etc.
> 
> We only need to modify QEMU so that hot-plugged APs do not reply to
> INIT/SIPI/SMI.
> 
> > I don’t think there is problem for real hardware, who always has CAR.
> > Can QEMU provide some CPU specific space, such as MMIO region?  
> 
> Why is a CPU-specific region needed if every other processor is in SMM
> and thus trusted.
> 
> >>   Does CPU hotplug apply only at the socket level? If the CPU is
> >>   multi-core, what is responsible for hot-plugging all cores present in
> >>   the socket?  
> 
> I can answer this: the SMM handler would interact with the hotplug
> controller in the same way that ACPI DSDT does normally.  This supports
> multiple hotplugs already.
> 
> Writes to the hotplug controller from outside SMM would be ignored.
> 
> >>> (03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI)
> >>>      -- I am waiting for hot-add message.  
> >>
> >> Maybe we can simplify this in QEMU by broadcasting an SMI to existent
> >> processors immediately upon plugging the new CPU.  
> 
> The QEMU DSDT could be modified (when secure boot is in effect) to OUT
> to 0xB2 when hotplug happens.  It could write a well-known value to
> 0xB2, to be read by an SMI handler in edk2.
> 
> 
> >>  
> >>>                                        (NOTE: Host CPU can only  
> >> send  
> >>>      instruction in SMM mode. -- The register is SMM only)  
> >>
> >> Sorry, I don't follow -- what register are we talking about here, and
> >> why is the BSP needed to send anything at all? What "instruction" do you
> >> have in mind?  
> > [Jiewen] The new CPU does not enable SMI at reset.
> > At some point of time later, the CPU need enable SMI, right?
> > The "instruction" here means, the host CPUs need tell to CPU to enable SMI.  
> 
> Right, this would be a write to the CPU hotplug controller
> 
> >>> (04) Host CPU: (OS) get message from board that a new CPU is added.
> >>>      (GPIO -> SCI)
> >>>
> >>> (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
> >>>      will not enter CPU because SMI is disabled)  
> >>
> >> I don't understand the OS involvement here. But, again, perhaps QEMU can
> >> force all existent CPUs into SMM immediately upon adding the new CPU.  
> > [Jiewen] OS here means the Host CPU running code in OS environment, not in SMM environment.  
> 
> See above.
> 
> >>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
> >>>      rebase code.
> >>>
> >>> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI.  
> >>
> >> Aha, so this is the SMM-only register you mention in step (03). Is the
> >> register specified in the Intel SDM?  
> > [Jiewen] Right. That is the register to let host CPU tell new CPU to enable SMI.
> > It is platform specific register. Not defined in SDM.
> > You may invent one in device model.  
> 
> See above.
> 
> >>> (10) New CPU: (SMM) Response first SMI at 38000, and rebase SMBASE to
> >>>      TSEG.  
> >>
> >> What code does the new CPU execute after it completes step (10)? Does it
> >> halt?  
> >
> > [Jiewen] The new CPU exits SMM and return to original place - where it is
> > interrupted to enter SMM - running code on the flash.  
> 
> So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and (07).

Looking at the Q35 code and the SeaBIOS SMM relocation as an example, if
I see it right, QEMU has:
    - SMRAM aliased from DRAM at 0xa0000
    - and TSEG stealing from the top of low RAM, when configured

Now the problem is that the default SMBASE at 0x30000 isn't backed by
anything in the SMRAM address space, and the default SMI entry falls
through to the same location in the System address space.

The latter is not trusted, and entry into SMM mode will corrupt the
area + might jump to a 'random' SMI handler (hence the save/restore
code in SeaBIOS).

Here is an idea: can we map a memory region at 0x30000 in the SMRAM
address space, with relocation space/code reserved? It could be a part
of TSEG (so we don't have to invent an ABI to configure that).

In that case we would not have to care about the System address space
content anymore, and untrusted code shouldn't be able to supply a rogue
SMI handler. (That would cross out one of the reasons for inventing a
disabled-INIT/SMI state.)
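
(A sketch of the mapping, in terms of QEMU's existing memory region
API; the choice of stealing the top 64 KiB of TSEG and the region name
are illustrative:)

  #include "qemu/osdep.h"
  #include "exec/memory.h"

  static MemoryRegion smbase_window;

  /* Alias the top 64 KiB of TSEG to 0x30000 in the SMRAM address
   * space, so the default SMBASE range is backed by trusted SMRAM
   * instead of falling through to system RAM.
   */
  static void map_default_smbase(MemoryRegion *smram_root,
                                 MemoryRegion *tseg)
  {
      uint64_t tseg_size = memory_region_size(tseg);

      memory_region_init_alias(&smbase_window, NULL, "smbase-window",
                               tseg, tseg_size - 0x10000, 0x10000);
      memory_region_add_subregion(smram_root, 0x30000, &smbase_window);
  }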


> >>> (11) Host CPU: (SMM) Restore 38000.  
> >>
> >> These steps (i.e., (06) through (11)) don't appear RAS-specific. The
> >> only platform-specific feature seems to be SMI masking register, which
> >> could be extracted into a new SmmCpuFeaturesLib API.
> >>
> >> Thus, would you please consider open sourcing firmware code for steps
> >> (06) through (11)?
> >>
> >> Alternatively -- and in particular because the stack for step (01)
> >> concerns me --, we could approach this from a high-level, functional
> >> perspective. The states that really matter are the relocated SMBASE for
> >> the new CPU, and the state of the full system, right at the end of step
> >> (11).
> >>
> >> When the SMM setup quiesces during normal firmware boot, OVMF could
> >> use
> >> existent (finalized) SMBASE information to *pre-program* some virtual
> >> QEMU hardware, with such state that would be expected, as "final" state,
> >> of any new hotplugged CPU. Afterwards, if / when the hotplug actually
> >> happens, QEMU could blanket-apply this state to the new CPU, and
> >> broadcast a hardware SMI to all CPUs except the new one.  
> 
> I'd rather avoid this and stay as close as possible to real hardware.
> 
> Paolo




* Re: [Qemu-devel] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-15 15:00       ` [Qemu-devel] [edk2-devel] " Laszlo Ersek
@ 2019-08-15 16:16         ` Igor Mammedov
  2019-08-15 16:21         ` Paolo Bonzini
From: Igor Mammedov @ 2019-08-15 16:16 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Chen, Yingwen, devel, Phillip Goerl, qemu devel list, Yao,
	Jiewen, Nakajima, Jun, pbonzini, Boris Ostrovsky,
	edk2-rfc-groups-io, Joao Marcal Lemos Martins

On Thu, 15 Aug 2019 17:00:16 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> On 08/14/19 16:04, Paolo Bonzini wrote:
> > On 14/08/19 15:20, Yao, Jiewen wrote:  
> >>> - Does this part require a new branch somewhere in the OVMF SEC code?
> >>>   How do we determine whether the CPU executing SEC is BSP or
> >>>   hot-plugged AP?  
> >> [Jiewen] I think this is blocked from hardware perspective, since the first instruction.
> >> There are some hardware-specific registers that can be used to determine if the CPU is newly added.
> >> I don’t think this must be the same as the real hardware.
> >> You are free to invent some registers in device model to be used in OVMF hot plug driver.  
> > 
> > Yes, this would be a new operation mode for QEMU, that only applies to
> > hot-plugged CPUs.  In this mode the AP doesn't reply to INIT or SMI, in
> > fact it doesn't reply to anything at all.
> >   
> >>> - How do we tell the hot-plugged AP where to start execution? (I.e. that
> >>>   it should execute code at a particular pflash location.)  
> >> [Jiewen] Same real mode reset vector at FFFF:FFF0.  
> > 
> > You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
> > QEMU.  The AP does not start execution at all when it is unplugged, so
> > no cache-as-RAM etc.
> > 
> > We only need to modify QEMU so that hot-plugged APs do not reply to
> > INIT/SIPI/SMI.
> >   
> >> I don’t think there is a problem for real hardware, which always has CAR.
> >> Can QEMU provide some CPU specific space, such as MMIO region?  
> > 
> > Why is a CPU-specific region needed if every other processor is in SMM
> > and thus trusted.  
> 
> I was going through the steps Jiewen and Yingwen recommended.
> 
> In step (02), the new CPU is expected to set up RAM access. In step
> (03), the new CPU, executing code from flash, is expected to "send board
> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
> message." For that action, the new CPU may need a stack (minimally if we
> want to use C function calls).
> 
> Until step (03), there had been no word about any other (= pre-plugged)
> CPUs (more precisely, Jiewen even confirmed "No impact to other
> processors"), so I didn't assume that other CPUs had entered SMM.
> 
> Paolo, I've attempted to read Jiewen's response, and yours, as carefully
> as I can. I'm still very confused. If you have a better understanding,
> could you please write up the 15-step process from the thread starter
> again, with all QEMU customizations applied? Such as, unnecessary steps
> removed, and platform specifics filled in.
> 
> One more comment below:
> 
> >   
> >>>   Does CPU hotplug apply only at the socket level? If the CPU is
> >>>   multi-core, what is responsible for hot-plugging all cores present in
> >>>   the socket?  
> > 
> > I can answer this: the SMM handler would interact with the hotplug
> > controller in the same way that ACPI DSDT does normally.  This supports
> > multiple hotplugs already.
> > 
> > Writes to the hotplug controller from outside SMM would be ignored.
> >   
> >>>> (03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI)
> >>>>      -- I am waiting for hot-add message.  
> >>>
> >>> Maybe we can simplify this in QEMU by broadcasting an SMI to existent
> >>> processors immediately upon plugging the new CPU.  
> > 
> > The QEMU DSDT could be modified (when secure boot is in effect) to OUT
> > to 0xB2 when hotplug happens.  It could write a well-known value to
> > 0xB2, to be read by an SMI handler in edk2.  
> 
> (My comment below is general, and may not apply to this particular
> situation. I'm too confused to figure that out myself, sorry!)
> 
> I dislike involving QEMU's generated DSDT in anything SMM (even
> injecting the SMI), because the AML interpreter runs in the OS.
> 
> If a malicious OS kernel is a bit too enlightened about the DSDT, it
> could willfully diverge from the process that we design. If QEMU
> broadcast the SMI internally, the guest OS could not interfere with that.
> 
> If the purpose of the SMI is specifically to force all CPUs into SMM
> (and thereby force them into trusted state), then the OS would be
> explicitly counter-interested in carrying out the AML operations from
> QEMU's DSDT.
It shouldn't matter where the management SMI comes from, if the OS won't be able
to actually trigger an SMI with untrusted content at SMBASE on the hotplugged (parked) CPU.
The worst that could happen is that the new CPU will stay parked.

> I'd be OK with an SMM / SMI involvement in QEMU's DSDT if, by diverging
> from that DSDT, the OS kernel could only mess with its own state, and
> not with the firmware's.
> 
> Thanks
> Laszlo
> 
> > 
> >   
> >>>  
> >>>>                                        (NOTE: Host CPU can only  
> >>> send  
> >>>>      instruction in SMM mode. -- The register is SMM only)  
> >>>
> >>> Sorry, I don't follow -- what register are we talking about here, and
> >>> why is the BSP needed to send anything at all? What "instruction" do you
> >>> have in mind?  
> >> [Jiewen] The new CPU does not enable SMI at reset.
> >> At some later point in time, the CPU needs to enable SMI, right?
> >> The "instruction" here means: the host CPU needs to tell the new CPU to enable SMI.  
> > 
> > Right, this would be a write to the CPU hotplug controller
> >   
> >>>> (04) Host CPU: (OS) get message from board that a new CPU is added.
> >>>>      (GPIO -> SCI)
> >>>>
> >>>> (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
> >>>>      will not enter SMM because SMI is disabled)  
> >>>
> >>> I don't understand the OS involvement here. But, again, perhaps QEMU can
> >>> force all existent CPUs into SMM immediately upon adding the new CPU.  
> >> [Jiewen] OS here means the Host CPU running code in OS environment, not in SMM environment.  
> > 
> > See above.
> >   
> >>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
> >>>>      rebase code.
> >>>>
> >>>> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI.  
> >>>
> >>> Aha, so this is the SMM-only register you mention in step (03). Is the
> >>> register specified in the Intel SDM?  
> >> [Jiewen] Right. That is the register that lets the host CPU tell the new CPU to enable SMI.
> >> It is a platform-specific register, not defined in the SDM.
> >> You may invent one in the device model.  
> > 
> > See above.
> >   
> >>>> (10) New CPU: (SMM) Respond to first SMI at 38000, and rebase SMBASE to
> >>>>      TSEG.  
> >>>
> >>> What code does the new CPU execute after it completes step (10)? Does it
> >>> halt?  
> >>
> >> [Jiewen] The new CPU exits SMM and returns to the original place - where it was
> >> interrupted to enter SMM - running code on the flash.  
> > 
> > So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and (07).
> >   
> >>>> (11) Host CPU: (SMM) Restore 38000.  
> >>>
> >>> These steps (i.e., (06) through (11)) don't appear RAS-specific. The
> >>> only platform-specific feature seems to be the SMI masking register, which
> >>> could be extracted into a new SmmCpuFeaturesLib API.
> >>>
> >>> Thus, would you please consider open sourcing firmware code for steps
> >>> (06) through (11)?
> >>>
> >>> Alternatively -- and in particular because the stack for step (01)
> >>> concerns me -- we could approach this from a high-level, functional
> >>> perspective. The states that really matter are the relocated SMBASE for
> >>> the new CPU, and the state of the full system, right at the end of step
> >>> (11).
> >>>
> >>> When the SMM setup quiesces during normal firmware boot, OVMF could use
> >>> the existing (finalized) SMBASE information to *pre-program* some virtual
> >>> QEMU hardware, with the state that would be expected, as the "final" state,
> >>> of any new hotplugged CPU. Afterwards, if / when the hotplug actually
> >>> happens, QEMU could blanket-apply this state to the new CPU, and
> >>> broadcast a hardware SMI to all CPUs except the new one.  
> > 
> > I'd rather avoid this and stay as close as possible to real hardware.
> > 
> > Paolo
> > 
> >   
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-15 15:00       ` [Qemu-devel] [edk2-devel] " Laszlo Ersek
  2019-08-15 16:16         ` Igor Mammedov
@ 2019-08-15 16:21         ` Paolo Bonzini
  2019-08-16  2:46           ` Yao, Jiewen
  2019-08-16 20:00           ` Laszlo Ersek
  1 sibling, 2 replies; 23+ messages in thread
From: Paolo Bonzini @ 2019-08-15 16:21 UTC (permalink / raw)
  To: Laszlo Ersek, devel, Yao, Jiewen
  Cc: Chen, Yingwen, Phillip Goerl, qemu devel list, Nakajima, Jun,
	Igor Mammedov, Boris Ostrovsky, edk2-rfc-groups-io,
	Joao Marcal Lemos Martins

On 15/08/19 17:00, Laszlo Ersek wrote:
> On 08/14/19 16:04, Paolo Bonzini wrote:
>> On 14/08/19 15:20, Yao, Jiewen wrote:
>>>> - Does this part require a new branch somewhere in the OVMF SEC code?
>>>>   How do we determine whether the CPU executing SEC is BSP or
>>>>   hot-plugged AP?
>>> [Jiewen] I think this is blocked from hardware perspective, since the first instruction.
>>> There are some hardware-specific registers that can be used to determine if the CPU is newly added.
>>> I don’t think this must be the same as the real hardware.
>>> You are free to invent some registers in device model to be used in OVMF hot plug driver.
>>
>> Yes, this would be a new operation mode for QEMU, that only applies to
>> hot-plugged CPUs.  In this mode the AP doesn't reply to INIT or SMI, in
>> fact it doesn't reply to anything at all.
>>
>>>> - How do we tell the hot-plugged AP where to start execution? (I.e. that
>>>>   it should execute code at a particular pflash location.)
>>> [Jiewen] Same real mode reset vector at FFFF:FFF0.
>>
>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
>> QEMU.  The AP does not start execution at all when it is unplugged, so
>> no cache-as-RAM etc.
>>
>> We only need to modify QEMU so that hot-plugged APs do not reply to
>> INIT/SIPI/SMI.
>>
>>> I don’t think there is a problem for real hardware, which always has CAR.
>>> Can QEMU provide some CPU specific space, such as MMIO region?
>>
>> Why is a CPU-specific region needed if every other processor is in SMM
>> and thus trusted.
> 
> I was going through the steps Jiewen and Yingwen recommended.
> 
> In step (02), the new CPU is expected to set up RAM access. In step
> (03), the new CPU, executing code from flash, is expected to "send board
> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
> message." For that action, the new CPU may need a stack (minimally if we
> want to use C function calls).
> 
> Until step (03), there had been no word about any other (= pre-plugged)
> CPUs (more precisely, Jiewen even confirmed "No impact to other
> processors"), so I didn't assume that other CPUs had entered SMM.
> 
> Paolo, I've attempted to read Jiewen's response, and yours, as carefully
> as I can. I'm still very confused. If you have a better understanding,
> could you please write up the 15-step process from the thread starter
> again, with all QEMU customizations applied? Such as, unnecessary steps
> removed, and platform specifics filled in.

Sure.

(01a) QEMU: create new CPU.  The CPU already exists, but it does not
     start running code until unparked by the CPU hotplug controller.

(01b) QEMU: trigger SCI

(02-03) no equivalent

(04) Host CPU: (OS) execute GPE handler from DSDT

(05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
     will not enter SMM because SMI is disabled)

(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
     rebase code.

(07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
     new CPU

(07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.

(08a) New CPU: (Low RAM) Enter protected mode.

(08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.

(09) Host CPU: (SMM) Send SMI to the new CPU only.

(10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
     TSEG.

(11) Host CPU: (SMM) Restore 38000.

(12) Host CPU: (SMM) Update located data structure to add the new CPU
     information. (This step will involve CPU_SERVICE protocol)

(13) New CPU: (Flash) do whatever other initialization is needed

(14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.

(15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in..


In other words, the cache-as-RAM phase of 02-03 is replaced by the
INIT-SIPI-SIPI sequence of 07b-08a-08b.
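
To make steps (01a)/(07a) concrete, here is a minimal sketch of what the
hotplug controller's "unpark" register could look like in QEMU. The register
layout, the cpu_unpark_* names and the parked flag are all invented for
illustration; only the memory-API plumbing is standard:

    /* A write of a CPU index unparks that CPU. The backing MemoryRegion
     * would be mapped only into the SMM address space, so only code
     * already executing in SMM can reach this register. */
    static void cpu_unpark_write(void *opaque, hwaddr addr,
                                 uint64_t val, unsigned size)
    {
        CPUState *cs = qemu_get_cpu((int)val);

        if (cs) {
            /* Hypothetical flag, checked on the INIT/SIPI/SMI delivery
             * paths; while set, the hot-plugged CPU ignores all of them
             * (step 01a). Clearing it lets 07b and 09 take effect. */
            cs->parked = false;
        }
    }

    static const MemoryRegionOps cpu_unpark_ops = {
        .write = cpu_unpark_write,
        .endianness = DEVICE_LITTLE_ENDIAN,
    };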


>> The QEMU DSDT could be modified (when secure boot is in effect) to OUT
>> to 0xB2 when hotplug happens.  It could write a well-known value to
>> 0xB2, to be read by an SMI handler in edk2.
> 
> I dislike involving QEMU's generated DSDT in anything SMM (even
> injecting the SMI), because the AML interpreter runs in the OS.
> 
> If a malicious OS kernel is a bit too enlightened about the DSDT, it
> could willfully diverge from the process that we design. If QEMU
> broadcast the SMI internally, the guest OS could not interfere with that.
> 
> If the purpose of the SMI is specifically to force all CPUs into SMM
> (and thereby force them into trusted state), then the OS would be
> explicitly counter-interested in carrying out the AML operations from
> QEMU's DSDT.

But since the hotplug controller would only be accessible from SMM,
there would be no other way to invoke it than to follow the DSDT's
instruction and write to 0xB2.  FWIW, real hardware also has plenty of
0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store
access).
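
On the edk2 side, the receiving end of such a write could look roughly like
this (a sketch only: the command value and the handler name are made up, but
IoRead8() and the PI SMM handler prototype are the standard ones):

    #define APM_CNT_PORT         0xB2
    #define HOTPLUG_SMI_COMMAND  0x04   // hypothetical well-known value

    EFI_STATUS
    EFIAPI
    CpuHotplugSmiHandler (
      IN EFI_HANDLE  DispatchHandle,
      IN CONST VOID  *Context        OPTIONAL,
      IN OUT VOID    *CommBuffer     OPTIONAL,
      IN OUT UINTN   *CommBufferSize OPTIONAL
      )
    {
      if (IoRead8 (APM_CNT_PORT) != HOTPLUG_SMI_COMMAND) {
        //
        // Not our command -- let other root SMI handlers examine it.
        //
        return EFI_WARN_INTERRUPT_SOURCE_PENDING;
      }
      //
      // Steps (06)-(12): save/patch 0x38000, unpark and SMI the new CPU,
      // let it relocate SMBASE to TSEG, restore 0x38000, register the CPU.
      //
      return EFI_SUCCESS;
    }
    // (Registration via gSmst->SmiHandlerRegister() is omitted here.)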

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-15 16:07       ` [Qemu-devel] " Igor Mammedov
@ 2019-08-15 16:24         ` Paolo Bonzini
  2019-08-16  7:42           ` Igor Mammedov
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2019-08-15 16:24 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Chen, Yingwen, edk2-devel-groups-io, Phillip Goerl,
	qemu devel list, Yao, Jiewen, Nakajima, Jun, Boris Ostrovsky,
	edk2-rfc-groups-io, Laszlo Ersek, Joao Marcal Lemos Martins

On 15/08/19 18:07, Igor Mammedov wrote:
> Looking at the Q35 code and the SeaBIOS SMM relocation as an example, if I
> see it right, QEMU has:
>     - SMRAM is aliased from DRAM at 0xa0000
>     - and TSEG steals from the top of low RAM when configured
> 
> Now the problem is that the default SMBASE at 0x30000 isn't backed by
> anything in the SMRAM address space, and the default SMI entry falls
> through to the same location in the System address space.
> 
> The latter is not trusted, and entry into SMM mode will corrupt the area +
> might jump to a 'random' SMI handler (hence the save/restore code in SeaBIOS).
> 
> Here is an idea: can we map a memory region at 0x30000 in the SMRAM address
> space, with relocation space/code reserved? It could be a part of TSEG
> (so we don't have to invent an ABI to configure that).

No, there could be real mode code using it.  What we _could_ do is
initialize SMBASE to 0xa0000, but I think it's better to not deviate too
much from processor behavior (even if it's admittedly a 20-year legacy
that doesn't make any sense).

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-15 16:21         ` Paolo Bonzini
@ 2019-08-16  2:46           ` Yao, Jiewen
  2019-08-16  7:20             ` Paolo Bonzini
  2019-08-16 20:00           ` Laszlo Ersek
  1 sibling, 1 reply; 23+ messages in thread
From: Yao, Jiewen @ 2019-08-16  2:46 UTC (permalink / raw)
  To: Paolo Bonzini, Laszlo Ersek, devel
  Cc: Chen, Yingwen, Phillip Goerl, qemu devel list, Nakajima, Jun,
	Igor Mammedov, Boris Ostrovsky, edk2-rfc-groups-io,
	Joao Marcal Lemos Martins

Comment below:


> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Friday, August 16, 2019 12:21 AM
> To: Laszlo Ersek <lersek@redhat.com>; devel@edk2.groups.io; Yao, Jiewen
> <jiewen.yao@intel.com>
> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
> <qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
> <jun.nakajima@intel.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>;
> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl
> <phillip.goerl@oracle.com>
> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
> 
> On 15/08/19 17:00, Laszlo Ersek wrote:
> > On 08/14/19 16:04, Paolo Bonzini wrote:
> >> On 14/08/19 15:20, Yao, Jiewen wrote:
> >>>> - Does this part require a new branch somewhere in the OVMF SEC
> code?
> >>>>   How do we determine whether the CPU executing SEC is BSP or
> >>>>   hot-plugged AP?
> >>> [Jiewen] I think this is blocked from hardware perspective, since the first
> instruction.
> >>> There are some hardware-specific registers that can be used to determine if
> the CPU is newly added.
> >>> I don’t think this must be the same as the real hardware.
> >>> You are free to invent some registers in device model to be used in
> OVMF hot plug driver.
> >>
> >> Yes, this would be a new operation mode for QEMU, that only applies to
> >> hot-plugged CPUs.  In this mode the AP doesn't reply to INIT or SMI, in
> >> fact it doesn't reply to anything at all.
> >>
> >>>> - How do we tell the hot-plugged AP where to start execution? (I.e.
> that
> >>>>   it should execute code at a particular pflash location.)
> >>> [Jiewen] Same real mode reset vector at FFFF:FFF0.
> >>
> >> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
> >> QEMU.  The AP does not start execution at all when it is unplugged, so
> >> no cache-as-RAM etc.
> >>
> >> We only need to modify QEMU so that hot-plugged APs do not reply
> >> INIT/SIPI/SMI.
> >>
> >>> I don’t think there is a problem for real hardware, which always has CAR.
> >>> Can QEMU provide some CPU specific space, such as MMIO region?
> >>
> >> Why is a CPU-specific region needed if every other processor is in SMM
> >> and thus trusted.
> >
> > I was going through the steps Jiewen and Yingwen recommended.
> >
> > In step (02), the new CPU is expected to set up RAM access. In step
> > (03), the new CPU, executing code from flash, is expected to "send board
> > message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
> > message." For that action, the new CPU may need a stack (minimally if we
> > want to use C function calls).
> >
> > Until step (03), there had been no word about any other (= pre-plugged)
> > CPUs (more precisely, Jiewen even confirmed "No impact to other
> > processors"), so I didn't assume that other CPUs had entered SMM.
> >
> > Paolo, I've attempted to read Jiewen's response, and yours, as carefully
> > as I can. I'm still very confused. If you have a better understanding,
> > could you please write up the 15-step process from the thread starter
> > again, with all QEMU customizations applied? Such as, unnecessary steps
> > removed, and platform specifics filled in.
> 
> Sure.
> 
> (01a) QEMU: create new CPU.  The CPU already exists, but it does not
>      start running code until unparked by the CPU hotplug controller.
> 
> (01b) QEMU: trigger SCI
> 
> (02-03) no equivalent
> 
> (04) Host CPU: (OS) execute GPE handler from DSDT
> 
> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
>      will not enter SMM because SMI is disabled)
> 
> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>      rebase code.
> 
> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
>      new CPU
> 
> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
[Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no
restriction that INIT/SIPI/SIPI can only be sent in SMM.



> (08a) New CPU: (Low RAM) Enter protected mode.
[Jiewen] NOTE: The new CPU still cannot use any physical memory, because
the INIT/SIPI/SIPI may be sent by malicious CPU in non-SMM environment.



> (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
> 
> (09) Host CPU: (SMM) Send SMI to the new CPU only.
> 
> (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
>      TSEG.
> 
> (11) Host CPU: (SMM) Restore 38000.
> 
> (12) Host CPU: (SMM) Update located data structure to add the new CPU
>      information. (This step will involve CPU_SERVICE protocol)
> 
> (13) New CPU: (Flash) do whatever other initialization is needed
> 
> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
> 
> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in..
> 
> 
> In other words, the cache-as-RAM phase of 02-03 is replaced by the
> INIT-SIPI-SIPI sequence of 07b-08a-08b.
[Jiewen] I am OK with this proposal.
I think the rule is the same - the new CPU CANNOT touch any system memory,
no matter whether it comes from the reset vector or from INIT/SIPI/SIPI.
Or I would say: if the new CPU wants to touch some memory before the first SMI, the memory should be
CPU-specific or on the flash.



> >> The QEMU DSDT could be modified (when secure boot is in effect) to OUT
> >> to 0xB2 when hotplug happens.  It could write a well-known value to
> >> 0xB2, to be read by an SMI handler in edk2.
> >
> > I dislike involving QEMU's generated DSDT in anything SMM (even
> > injecting the SMI), because the AML interpreter runs in the OS.
> >
> > If a malicious OS kernel is a bit too enlightened about the DSDT, it
> > could willfully diverge from the process that we design. If QEMU
> > broadcast the SMI internally, the guest OS could not interfere with that.
> >
> > If the purpose of the SMI is specifically to force all CPUs into SMM
> > (and thereby force them into trusted state), then the OS would be
> > explicitly counter-interested in carrying out the AML operations from
> > QEMU's DSDT.
> 
> But since the hotplug controller would only be accessible from SMM,
> there would be no other way to invoke it than to follow the DSDT's
> instruction and write to 0xB2.  FWIW, real hardware also has plenty of
> 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store
> access).
> 
> Paolo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-16  2:46           ` Yao, Jiewen
@ 2019-08-16  7:20             ` Paolo Bonzini
  2019-08-16  7:49               ` Yao, Jiewen
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2019-08-16  7:20 UTC (permalink / raw)
  To: Yao, Jiewen, Laszlo Ersek, devel
  Cc: Chen, Yingwen, Phillip Goerl, qemu devel list, Nakajima, Jun,
	Igor Mammedov, Boris Ostrovsky, edk2-rfc-groups-io,
	Joao Marcal Lemos Martins

On 16/08/19 04:46, Yao, Jiewen wrote:
> Comment below:
> 
> 
>> -----Original Message-----
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> Sent: Friday, August 16, 2019 12:21 AM
>> To: Laszlo Ersek <lersek@redhat.com>; devel@edk2.groups.io; Yao, Jiewen
>> <jiewen.yao@intel.com>
>> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
>> <qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
>> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
>> <jun.nakajima@intel.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>;
>> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl
>> <phillip.goerl@oracle.com>
>> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
>>
>> On 15/08/19 17:00, Laszlo Ersek wrote:
>>> On 08/14/19 16:04, Paolo Bonzini wrote:
>>>> On 14/08/19 15:20, Yao, Jiewen wrote:
>>>>>> - Does this part require a new branch somewhere in the OVMF SEC
>> code?
>>>>>>   How do we determine whether the CPU executing SEC is BSP or
>>>>>>   hot-plugged AP?
>>>>> [Jiewen] I think this is blocked from hardware perspective, since the first
>> instruction.
>>>>> There are some hardware-specific registers that can be used to determine if
>> the CPU is newly added.
>>>>> I don’t think this must be the same as the real hardware.
>>>>> You are free to invent some registers in device model to be used in
>> OVMF hot plug driver.
>>>>
>>>> Yes, this would be a new operation mode for QEMU, that only applies to
>>>> hot-plugged CPUs.  In this mode the AP doesn't reply to INIT or SMI, in
>>>> fact it doesn't reply to anything at all.
>>>>
>>>>>> - How do we tell the hot-plugged AP where to start execution? (I.e.
>> that
>>>>>>   it should execute code at a particular pflash location.)
>>>>> [Jiewen] Same real mode reset vector at FFFF:FFF0.
>>>>
>>>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
>>>> QEMU.  The AP does not start execution at all when it is unplugged, so
>>>> no cache-as-RAM etc.
>>>>
>>>> We only need to modify QEMU so that hot-plugged APs do not reply to
>>>> INIT/SIPI/SMI.
>>>>
>>>>> I don’t think there is a problem for real hardware, which always has CAR.
>>>>> Can QEMU provide some CPU specific space, such as MMIO region?
>>>>
>>>> Why is a CPU-specific region needed if every other processor is in SMM
>>>> and thus trusted.
>>>
>>> I was going through the steps Jiewen and Yingwen recommended.
>>>
>>> In step (02), the new CPU is expected to set up RAM access. In step
>>> (03), the new CPU, executing code from flash, is expected to "send board
>>> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
>>> message." For that action, the new CPU may need a stack (minimally if we
>>> want to use C function calls).
>>>
>>> Until step (03), there had been no word about any other (= pre-plugged)
>>> CPUs (more precisely, Jiewen even confirmed "No impact to other
>>> processors"), so I didn't assume that other CPUs had entered SMM.
>>>
>>> Paolo, I've attempted to read Jiewen's response, and yours, as carefully
>>> as I can. I'm still very confused. If you have a better understanding,
>>> could you please write up the 15-step process from the thread starter
>>> again, with all QEMU customizations applied? Such as, unnecessary steps
>>> removed, and platform specifics filled in.
>>
>> Sure.
>>
>> (01a) QEMU: create new CPU.  The CPU already exists, but it does not
>>      start running code until unparked by the CPU hotplug controller.
>>
>> (01b) QEMU: trigger SCI
>>
>> (02-03) no equivalent
>>
>> (04) Host CPU: (OS) execute GPE handler from DSDT
>>
>> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
>>      will not enter SMM because SMI is disabled)
>>
>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>>      rebase code.
>>
>> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
>>      new CPU
>>
>> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
> [Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no
> restriction that INIT/SIPI/SIPI can only be sent in SMM.

All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded
before 07a, so this is okay.

However I do see a problem, because a PCI device's DMA could overwrite
0x38000 between (06) and (10) and hijack the code that is executed in
SMM.  How is this avoided on real hardware?  By the time the new CPU
enters SMM, it doesn't run off cache-as-RAM anymore.

Paolo

>> (08a) New CPU: (Low RAM) Enter protected mode.
>
> [Jiewen] NOTE: The new CPU still cannot use any physical memory, because
> the INIT/SIPI/SIPI may be sent by malicious CPU in non-SMM environment.
> 
>> (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
>>
>> (09) Host CPU: (SMM) Send SMI to the new CPU only.
>>
>> (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
>>      TSEG.
>>
>> (11) Host CPU: (SMM) Restore 38000.
>>
>> (12) Host CPU: (SMM) Update located data structure to add the new CPU
>>      information. (This step will involve CPU_SERVICE protocol)
>>
>> (13) New CPU: (Flash) do whatever other initialization is needed
>>
>> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
>>
>> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in..
>>
>>
>> In other words, the cache-as-RAM phase of 02-03 is replaced by the
>> INIT-SIPI-SIPI sequence of 07b-08a-08b.
> [Jiewen] I am OK with this proposal.
> I think the rule is the same - the new CPU CANNOT touch any system memory,
> no matter whether it comes from the reset vector or from INIT/SIPI/SIPI.
> Or I would say: if the new CPU wants to touch some memory before the first SMI, the memory should be
> CPU-specific or on the flash.
> 
> 
> 
>>>> The QEMU DSDT could be modified (when secure boot is in effect) to OUT
>>>> to 0xB2 when hotplug happens.  It could write a well-known value to
>>>> 0xB2, to be read by an SMI handler in edk2.
>>>
>>> I dislike involving QEMU's generated DSDT in anything SMM (even
>>> injecting the SMI), because the AML interpreter runs in the OS.
>>>
>>> If a malicious OS kernel is a bit too enlightened about the DSDT, it
>>> could willfully diverge from the process that we design. If QEMU
>>> broadcast the SMI internally, the guest OS could not interfere with that.
>>>
>>> If the purpose of the SMI is specifically to force all CPUs into SMM
>>> (and thereby force them into trusted state), then the OS would be
>>> explicitly counter-interested in carrying out the AML operations from
>>> QEMU's DSDT.
>>
>> But since the hotplug controller would only be accessible from SMM,
>> there would be no other way to invoke it than to follow the DSDT's
>> instruction and write to 0xB2.  FWIW, real hardware also has plenty of
>> 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store
>> access).
>>
>> Paolo



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-15 16:24         ` Paolo Bonzini
@ 2019-08-16  7:42           ` Igor Mammedov
  0 siblings, 0 replies; 23+ messages in thread
From: Igor Mammedov @ 2019-08-16  7:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Chen, Yingwen, edk2-devel-groups-io, Phillip Goerl,
	qemu devel list, Yao, Jiewen, Nakajima, Jun, Boris Ostrovsky,
	edk2-rfc-groups-io, Laszlo Ersek, Joao Marcal Lemos Martins

On Thu, 15 Aug 2019 18:24:53 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 15/08/19 18:07, Igor Mammedov wrote:
> > Looking at the Q35 code and the SeaBIOS SMM relocation as an example, if I
> > see it right, QEMU has:
> >     - SMRAM is aliased from DRAM at 0xa0000
> >     - and TSEG steals from the top of low RAM when configured
> > 
> > Now the problem is that the default SMBASE at 0x30000 isn't backed by
> > anything in the SMRAM address space, and the default SMI entry falls
> > through to the same location in the System address space.
> > 
> > The latter is not trusted, and entry into SMM mode will corrupt the area +
> > might jump to a 'random' SMI handler (hence the save/restore code in SeaBIOS).
> > 
> > Here is an idea: can we map a memory region at 0x30000 in the SMRAM address
> > space, with relocation space/code reserved? It could be a part of TSEG
> > (so we don't have to invent an ABI to configure that).  
> 
> No, there could be real mode code using it.

My impression was that QEMU/KVM's SMM address space is accessible only from
a CPU in SMM mode, so an SMM CPU would access an independent SMRAM at 0x30000
in the SMM address space, while non-SMM CPUs (including real mode) would
access 0x30000 in normal system RAM.


> What we _could_ do is
> initialize SMBASE to 0xa0000, but I think it's better to not deviate too
> much from processor behavior (even if it's admittedly a 20-year legacy
> that doesn't make any sense).

Agreed, it's better to follow the spec; that's one of the reasons why I was toying
with the idea of using a separate SMRAM at 0x30000 mapped only in the SMM address space.

Practically, we would be following the spec (SDM 34.4, SMRAM):
"
System logic can use the SMI acknowledge transaction or the assertion of the SMIACT# pin to decode accesses to
the SMRAM and redirect them (if desired) to specific SMRAM memory. If a separate RAM memory is used for
SMRAM, system logic should provide a programmable method of mapping the SMRAM into system memory space
when the processor is not in SMM. This mechanism will enable start-up procedures to initialize the SMRAM space
(that is, load the SMI handler) before executing the SMI handler during SMM.
"

Another benefit that gives us is that we won't have to pull
all existing CPUs into SMM (essentially another stop_machine) to
guarantee exclusive access to 0x30000 in normal RAM.

> 
> Paolo



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-16  7:20             ` Paolo Bonzini
@ 2019-08-16  7:49               ` Yao, Jiewen
  2019-08-16 20:15                 ` Laszlo Ersek
  0 siblings, 1 reply; 23+ messages in thread
From: Yao, Jiewen @ 2019-08-16  7:49 UTC (permalink / raw)
  To: Paolo Bonzini, Laszlo Ersek, devel
  Cc: Chen, Yingwen, Phillip Goerl, qemu devel list, Nakajima, Jun,
	Igor Mammedov, Boris Ostrovsky, edk2-rfc-groups-io,
	Joao Marcal Lemos Martins

below

> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Friday, August 16, 2019 3:20 PM
> To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek
> <lersek@redhat.com>; devel@edk2.groups.io
> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
> <qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
> <jun.nakajima@intel.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>;
> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl
> <phillip.goerl@oracle.com>
> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
> 
> On 16/08/19 04:46, Yao, Jiewen wrote:
> > Comment below:
> >
> >
> >> -----Original Message-----
> >> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >> Sent: Friday, August 16, 2019 12:21 AM
> >> To: Laszlo Ersek <lersek@redhat.com>; devel@edk2.groups.io; Yao,
> Jiewen
> >> <jiewen.yao@intel.com>
> >> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
> >> <qemu-devel@nongnu.org>; Igor Mammedov
> <imammedo@redhat.com>;
> >> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
> >> <jun.nakajima@intel.com>; Boris Ostrovsky
> <boris.ostrovsky@oracle.com>;
> >> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl
> >> <phillip.goerl@oracle.com>
> >> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
> >>
> >> On 15/08/19 17:00, Laszlo Ersek wrote:
> >>> On 08/14/19 16:04, Paolo Bonzini wrote:
> >>>> On 14/08/19 15:20, Yao, Jiewen wrote:
> >>>>>> - Does this part require a new branch somewhere in the OVMF SEC
> >> code?
> >>>>>>   How do we determine whether the CPU executing SEC is BSP or
> >>>>>>   hot-plugged AP?
> >>>>> [Jiewen] I think this is blocked from hardware perspective, since the
> first
> >> instruction.
> >>>>> There are some hardware-specific registers that can be used to determine
> if
> >> the CPU is newly added.
> >>>>> I don’t think this must be the same as the real hardware.
> >>>>> You are free to invent some registers in device model to be used in
> >> OVMF hot plug driver.
> >>>>
> >>>> Yes, this would be a new operation mode for QEMU, that only applies
> to
> >>>> hot-plugged CPUs.  In this mode the AP doesn't reply to INIT or SMI,
> in
> >>>> fact it doesn't reply to anything at all.
> >>>>
> >>>>>> - How do we tell the hot-plugged AP where to start execution? (I.e.
> >> that
> >>>>>>   it should execute code at a particular pflash location.)
> >>>>> [Jiewen] Same real mode reset vector at FFFF:FFF0.
> >>>>
> >>>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
> >>>> QEMU.  The AP does not start execution at all when it is unplugged,
> so
> >>>> no cache-as-RAM etc.
> >>>>
> >>>> We only need to modify QEMU so that hot-plugged APs do not reply
> to
> >>>> INIT/SIPI/SMI.
> >>>>
> >>>>> I don’t think there is a problem for real hardware, which always has CAR.
> >>>>> Can QEMU provide some CPU specific space, such as MMIO region?
> >>>>
> >>>> Why is a CPU-specific region needed if every other processor is in SMM
> >>>> and thus trusted.
> >>>
> >>> I was going through the steps Jiewen and Yingwen recommended.
> >>>
> >>> In step (02), the new CPU is expected to set up RAM access. In step
> >>> (03), the new CPU, executing code from flash, is expected to "send
> board
> >>> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
> >>> message." For that action, the new CPU may need a stack (minimally if
> we
> >>> want to use C function calls).
> >>>
> >>> Until step (03), there had been no word about any other (= pre-plugged)
> >>> CPUs (more precisely, Jiewen even confirmed "No impact to other
> >>> processors"), so I didn't assume that other CPUs had entered SMM.
> >>>
> >>> Paolo, I've attempted to read Jiewen's response, and yours, as carefully
> >>> as I can. I'm still very confused. If you have a better understanding,
> >>> could you please write up the 15-step process from the thread starter
> >>> again, with all QEMU customizations applied? Such as, unnecessary
> steps
> >>> removed, and platform specifics filled in.
> >>
> >> Sure.
> >>
> >> (01a) QEMU: create new CPU.  The CPU already exists, but it does not
> >>      start running code until unparked by the CPU hotplug controller.
> >>
> >> (01b) QEMU: trigger SCI
> >>
> >> (02-03) no equivalent
> >>
> >> (04) Host CPU: (OS) execute GPE handler from DSDT
> >>
> >> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
> >>      will not enter SMM because SMI is disabled)
> >>
> >> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
> >>      rebase code.
> >>
> >> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
> >>      new CPU
> >>
> >> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
> > [Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no
> > restriction that INIT/SIPI/SIPI can only be sent in SMM.
> 
> All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded
> before 07a, so this is okay.
[Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is delivered at 07a?
I don’t see any extra step between 06 and 07a.
What is the magic here?



> However I do see a problem, because a PCI device's DMA could overwrite
> 0x38000 between (06) and (10) and hijack the code that is executed in
> SMM.  How is this avoided on real hardware?  By the time the new CPU
> enters SMM, it doesn't run off cache-as-RAM anymore.
[Jiewen] Interesting question.
I don’t think the DMA attack is considered in the threat model for the virtual environment. We only list the adversaries below:
-- Adversary: System Software Attacker, who can control any OS memory or silicon register from OS level, or read write BIOS data.
-- Adversary: Simple hardware attacker, who can hot add or hot remove a CPU.

I agree it is a threat from the real hardware perspective. SMM may check VT-d to make sure 0x38000 is blocked.
I doubt it is a threat in a virtual environment. Do we have a way to block DMA in a virtual environment?



> Paolo
> 
> >> (08a) New CPU: (Low RAM) Enter protected mode.
> >
> > [Jiewen] NOTE: The new CPU still cannot use any physical memory,
> because
> > the INIT/SIPI/SIPI may be sent by malicious CPU in non-SMM environment.
> >
> >> (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
> >>
> >> (09) Host CPU: (SMM) Send SMI to the new CPU only.
> >>
> >> (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
> >>      TSEG.
> >>
> >> (11) Host CPU: (SMM) Restore 38000.
> >>
> >> (12) Host CPU: (SMM) Update located data structure to add the new CPU
> >>      information. (This step will involve CPU_SERVICE protocol)
> >>
> >> (13) New CPU: (Flash) do whatever other initialization is needed
> >>
> >> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
> >>
> >> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in..
> >>
> >>
> >> In other words, the cache-as-RAM phase of 02-03 is replaced by the
> >> INIT-SIPI-SIPI sequence of 07b-08a-08b.
> > [Jiewen] I am OK with this proposal.
> > I think the rule is the same - the new CPU CANNOT touch any system memory,
> > no matter whether it comes from the reset vector or from INIT/SIPI/SIPI.
> > Or I would say: if the new CPU wants to touch some memory before the first
> SMI, the memory should be
> > CPU-specific or on the flash.
> >
> >
> >
> >>>> The QEMU DSDT could be modified (when secure boot is in effect) to
> OUT
> >>>> to 0xB2 when hotplug happens.  It could write a well-known value to
> >>>> 0xB2, to be read by an SMI handler in edk2.
> >>>
> >>> I dislike involving QEMU's generated DSDT in anything SMM (even
> >>> injecting the SMI), because the AML interpreter runs in the OS.
> >>>
> >>> If a malicious OS kernel is a bit too enlightened about the DSDT, it
> >>> could willfully diverge from the process that we design. If QEMU
> >>> broadcast the SMI internally, the guest OS could not interfere with that.
> >>>
> >>> If the purpose of the SMI is specifically to force all CPUs into SMM
> >>> (and thereby force them into trusted state), then the OS would be
> >>> explicitly counter-interested in carrying out the AML operations from
> >>> QEMU's DSDT.
> >>
> >> But since the hotplug controller would only be accessible from SMM,
> >> there would be no other way to invoke it than to follow the DSDT's
> >> instruction and write to 0xB2.  FWIW, real hardware also has plenty of
> >> 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store
> >> access).
> >>
> >> Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-15 16:21         ` Paolo Bonzini
  2019-08-16  2:46           ` Yao, Jiewen
@ 2019-08-16 20:00           ` Laszlo Ersek
  1 sibling, 0 replies; 23+ messages in thread
From: Laszlo Ersek @ 2019-08-16 20:00 UTC (permalink / raw)
  To: Paolo Bonzini, devel, Yao, Jiewen
  Cc: Chen, Yingwen, Phillip Goerl, qemu devel list, Nakajima, Jun,
	Igor Mammedov, Boris Ostrovsky, edk2-rfc-groups-io,
	Joao Marcal Lemos Martins

On 08/15/19 18:21, Paolo Bonzini wrote:
> On 15/08/19 17:00, Laszlo Ersek wrote:
>> On 08/14/19 16:04, Paolo Bonzini wrote:
>>> On 14/08/19 15:20, Yao, Jiewen wrote:
>>>>> - Does this part require a new branch somewhere in the OVMF SEC code?
>>>>>   How do we determine whether the CPU executing SEC is BSP or
>>>>>   hot-plugged AP?
>>>> [Jiewen] I think this is blocked from hardware perspective, since the first instruction.
>>>> There are some hardware-specific registers that can be used to determine if the CPU is newly added.
>>>> I don’t think this must be the same as the real hardware.
>>>> You are free to invent some registers in device model to be used in OVMF hot plug driver.
>>>
>>> Yes, this would be a new operation mode for QEMU, that only applies to
>>> hot-plugged CPUs.  In this mode the AP doesn't reply to INIT or SMI, in
>>> fact it doesn't reply to anything at all.
>>>
>>>>> - How do we tell the hot-plugged AP where to start execution? (I.e. that
>>>>>   it should execute code at a particular pflash location.)
>>>> [Jiewen] Same real mode reset vector at FFFF:FFF0.
>>>
>>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
>>> QEMU.  The AP does not start execution at all when it is unplugged, so
>>> no cache-as-RAM etc.
>>>
>>> We only need to modify QEMU so that hot-plugged APs do not reply to
>>> INIT/SIPI/SMI.
>>>
>>>> I don’t think there is a problem for real hardware, which always has CAR.
>>>> Can QEMU provide some CPU specific space, such as MMIO region?
>>>
>>> Why is a CPU-specific region needed if every other processor is in SMM
>>> and thus trusted.
>>
>> I was going through the steps Jiewen and Yingwen recommended.
>>
>> In step (02), the new CPU is expected to set up RAM access. In step
>> (03), the new CPU, executing code from flash, is expected to "send board
>> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
>> message." For that action, the new CPU may need a stack (minimally if we
>> want to use C function calls).
>>
>> Until step (03), there had been no word about any other (= pre-plugged)
>> CPUs (more precisely, Jiewen even confirmed "No impact to other
>> processors"), so I didn't assume that other CPUs had entered SMM.
>>
>> Paolo, I've attempted to read Jiewen's response, and yours, as carefully
>> as I can. I'm still very confused. If you have a better understanding,
>> could you please write up the 15-step process from the thread starter
>> again, with all QEMU customizations applied? Such as, unnecessary steps
>> removed, and platform specifics filled in.
> 
> Sure.
> 
> (01a) QEMU: create new CPU.  The CPU already exists, but it does not
>      start running code until unparked by the CPU hotplug controller.
> 
> (01b) QEMU: trigger SCI
> 
> (02-03) no equivalent
> 
> (04) Host CPU: (OS) execute GPE handler from DSDT
> 
> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
>      will not enter SMM because SMI is disabled)
> 
> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>      rebase code.

(Could Intel open source code for this?)

> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
>      new CPU
> 
> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
> 
> (08a) New CPU: (Low RAM) Enter protected mode.

PCI DMA attack might be relevant (but yes, I see you've mentioned that
too, down-thread)

> 
> (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
> 
> (09) Host CPU: (SMM) Send SMI to the new CPU only.
> 
> (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
>      TSEG.

I wish we could simply wake the new CPU -- after step 07a -- with an
SMI. IOW, if we could excise steps 07b, 08a, 08b.

Our CPU hotplug controller, and the initial parked state in 01a for the
new CPU, are going to be home-brewed anyway.

On the other hand...

> (11) Host CPU: (SMM) Restore 38000.
> 
> (12) Host CPU: (SMM) Update located data structure to add the new CPU
>      information. (This step will involve CPU_SERVICE protocol)
> 
> (13) New CPU: (Flash) do whatever other initialization is needed
> 
> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.

basically step 08b is the environment to which the new CPU returns in
13/14, after the RSM.

Do we absolutely need low RAM for 08a (for entering protected mode)? We
could execute from pflash, no? OTOH we'd still need RAM for the stack,
and that could be attacked with PCI DMA similarly, I believe.

> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in..
> 
> 
> In other words, the cache-as-RAM phase of 02-03 is replaced by the
> INIT-SIPI-SIPI sequence of 07b-08a-08b.
> 
> 
>>> The QEMU DSDT could be modified (when secure boot is in effect) to OUT
>>> to 0xB2 when hotplug happens.  It could write a well-known value to
>>> 0xB2, to be read by an SMI handler in edk2.
>>
>> I dislike involving QEMU's generated DSDT in anything SMM (even
>> injecting the SMI), because the AML interpreter runs in the OS.
>>
>> If a malicious OS kernel is a bit too enlightened about the DSDT, it
>> could willfully diverge from the process that we design. If QEMU
>> broadcast the SMI internally, the guest OS could not interfere with that.
>>
>> If the purpose of the SMI is specifically to force all CPUs into SMM
>> (and thereby force them into trusted state), then the OS would be
>> explicitly counter-interested in carrying out the AML operations from
>> QEMU's DSDT.
> 
> But since the hotplug controller would only be accessible from SMM,
> there would be no other way to invoke it than to follow the DSDT's
> instruction and write to 0xB2.

Right.

> FWIW, real hardware also has plenty of
> 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store
> access).

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-16  7:49               ` Yao, Jiewen
@ 2019-08-16 20:15                 ` Laszlo Ersek
  2019-08-16 22:19                   ` Alex Williamson
  0 siblings, 1 reply; 23+ messages in thread
From: Laszlo Ersek @ 2019-08-16 20:15 UTC (permalink / raw)
  To: Yao, Jiewen, Paolo Bonzini, devel
  Cc: Chen, Yingwen, Phillip Goerl, qemu devel list, Alex Williamson,
	Nakajima, Jun, Igor Mammedov, Boris Ostrovsky,
	edk2-rfc-groups-io, Joao Marcal Lemos Martins

+Alex (direct question at the bottom)

On 08/16/19 09:49, Yao, Jiewen wrote:
> below
> 
>> -----Original Message-----
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> Sent: Friday, August 16, 2019 3:20 PM
>> To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek
>> <lersek@redhat.com>; devel@edk2.groups.io
>> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
>> <qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
>> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
>> <jun.nakajima@intel.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>;
>> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl
>> <phillip.goerl@oracle.com>
>> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
>>
>> On 16/08/19 04:46, Yao, Jiewen wrote:
>>> Comment below:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>>>> Sent: Friday, August 16, 2019 12:21 AM
>>>> To: Laszlo Ersek <lersek@redhat.com>; devel@edk2.groups.io; Yao,
>> Jiewen
>>>> <jiewen.yao@intel.com>
>>>> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
>>>> <qemu-devel@nongnu.org>; Igor Mammedov
>> <imammedo@redhat.com>;
>>>> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
>>>> <jun.nakajima@intel.com>; Boris Ostrovsky
>> <boris.ostrovsky@oracle.com>;
>>>> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl
>>>> <phillip.goerl@oracle.com>
>>>> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
>>>>
>>>> On 15/08/19 17:00, Laszlo Ersek wrote:
>>>>> On 08/14/19 16:04, Paolo Bonzini wrote:
>>>>>> On 14/08/19 15:20, Yao, Jiewen wrote:
>>>>>>>> - Does this part require a new branch somewhere in the OVMF SEC
>>>> code?
>>>>>>>>   How do we determine whether the CPU executing SEC is BSP or
>>>>>>>>   hot-plugged AP?
>>>>>>> [Jiewen] I think this is blocked from hardware perspective, since the
>> first
>>>> instruction.
>>>>>>> There are some hardware-specific registers that can be used to determine
>> if
>>>> the CPU is newly added.
>>>>>>> I don’t think this must be the same as the real hardware.
>>>>>>> You are free to invent some registers in device model to be used in
>>>> OVMF hot plug driver.
>>>>>>
>>>>>> Yes, this would be a new operation mode for QEMU, that only applies
>> to
>>>>>> hot-plugged CPUs.  In this mode the AP doesn't reply to INIT or SMI,
>> in
>>>>>> fact it doesn't reply to anything at all.
>>>>>>
>>>>>>>> - How do we tell the hot-plugged AP where to start execution? (I.e.
>>>> that
>>>>>>>>   it should execute code at a particular pflash location.)
>>>>>>> [Jiewen] Same real mode reset vector at FFFF:FFF0.
>>>>>>
>>>>>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
>>>>>> QEMU.  The AP does not start execution at all when it is unplugged,
>> so
>>>>>> no cache-as-RAM etc.
>>>>>>
>>>>>> We only need to modify QEMU so that hot-plugged APs do not reply
>> to
>>>>>> INIT/SIPI/SMI.
>>>>>>
>>>>>>> I don’t think there is a problem for real hardware, which always has CAR.
>>>>>>> Can QEMU provide some CPU specific space, such as MMIO region?
>>>>>>
>>>>>> Why is a CPU-specific region needed if every other processor is in SMM
>>>>>> and thus trusted.
>>>>>
>>>>> I was going through the steps Jiewen and Yingwen recommended.
>>>>>
>>>>> In step (02), the new CPU is expected to set up RAM access. In step
>>>>> (03), the new CPU, executing code from flash, is expected to "send
>> board
>>>>> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
>>>>> message." For that action, the new CPU may need a stack (minimally if
>> we
>>>>> want to use C function calls).
>>>>>
>>>>> Until step (03), there had been no word about any other (= pre-plugged)
>>>>> CPUs (more precisely, Jiewen even confirmed "No impact to other
>>>>> processors"), so I didn't assume that other CPUs had entered SMM.
>>>>>
>>>>> Paolo, I've attempted to read Jiewen's response, and yours, as carefully
>>>>> as I can. I'm still very confused. If you have a better understanding,
>>>>> could you please write up the 15-step process from the thread starter
>>>>> again, with all QEMU customizations applied? Such as, unnecessary
>> steps
>>>>> removed, and platform specifics filled in.
>>>>
>>>> Sure.
>>>>
>>>> (01a) QEMU: create new CPU.  The CPU already exists, but it does not
>>>>      start running code until unparked by the CPU hotplug controller.
>>>>
>>>> (01b) QEMU: trigger SCI
>>>>
>>>> (02-03) no equivalent
>>>>
>>>> (04) Host CPU: (OS) execute GPE handler from DSDT
>>>>
>>>> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
>>>>      will not enter SMM because SMI is disabled)
>>>>
>>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>>>>      rebase code.
>>>>
>>>> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
>>>>      new CPU
>>>>
>>>> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
>>> [Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no
>>> restriction that INIT/SIPI/SIPI can only be sent in SMM.
>>
>> All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded
>> before 07a, so this is okay.
> [Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is delivered at 07a?
> I don’t see any extra step between 06 and 07a.
> What is the magic here?

The magic is 07a itself, IIUC. The CPU hotplug controller would be
accessible only in SMM. And until 07a happens, the new CPU ignores
INIT/SIPI/SIPI even if another CPU sends it those, simply because QEMU
would implement the new CPU's behavior like that.

> 
> 
> 
>> However I do see a problem, because a PCI device's DMA could overwrite
>> 0x38000 between (06) and (10) and hijack the code that is executed in
>> SMM.  How is this avoided on real hardware?  By the time the new CPU
>> enters SMM, it doesn't run off cache-as-RAM anymore.
> [Jiewen] Interesting question.
> I don’t think the DMA attack is considered in the threat model for the virtual environment. We only list the adversaries below:
> -- Adversary: System Software Attacker, who can control any OS memory or silicon register from OS level, or read write BIOS data.
> -- Adversary: Simple hardware attacker, who can hot add or hot remove a CPU.

We do have physical PCI(e) device assignment; sorry for not highlighting
that earlier. That feature (VFIO) does rely on the (physical) IOMMU, and
it makes sure that the assigned device can only access physical frames
that belong to the virtual machine that the device is assigned to.

However, as far as I know, VFIO doesn't try to restrict PCI DMA to
subsets of guest RAM... I could be wrong about that, I vaguely recall
RMRR support, which seems somewhat related.

> I agree it is a threat from the real hardware perspective. SMM may check VT-d to make sure 0x38000 is blocked.
> I doubt it is a threat in a virtual environment. Do we have a way to block DMA in a virtual environment?

I think that would be a VFIO feature.

Alex: if we wanted to block PCI(e) DMA to a specific part of guest RAM
(expressed with guest-physical RAM addresses), perhaps permanently,
perhaps just for a while -- not sure about coordination though -- could
VFIO accommodate that (I guess by "punching holes" in the IOMMU page
tables)?
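
(For reference, the hole-punching primitive I have in mind would be the
VFIO type1 unmap ioctl; a sketch only, ignoring the coordination problem
above, and noting that type1 traditionally wants unmaps to match the
originally mapped ranges:)

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Drop the IOMMU translation for one guest-physical range, so
     * assigned-device DMA to it faults instead of reaching guest RAM.
     * Restoring it later takes a matching VFIO_IOMMU_MAP_DMA. */
    static int block_dma_window(int container_fd, uint64_t gpa, uint64_t size)
    {
        struct vfio_iommu_type1_dma_unmap unmap = {
            .argsz = sizeof(unmap),
            .iova  = gpa,    /* e.g. 0x30000 */
            .size  = size,   /* e.g. 0x10000 */
        };
        return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
    }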

Thanks
Laszlo

> 
> 
> 
>> Paolo
>>
>>>> (08a) New CPU: (Low RAM) Enter protected mode.
>>>
>>> [Jiewen] NOTE: The new CPU still cannot use any physical memory,
>> because
>>> the INIT/SIPI/SIPI may be sent by malicious CPU in non-SMM environment.
>>>
>>>> (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
>>>>
>>>> (09) Host CPU: (SMM) Send SMI to the new CPU only.
>>>>
>>>> (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
>>>>      TSEG.
>>>>
>>>> (11) Host CPU: (SMM) Restore 38000.
>>>>
>>>> (12) Host CPU: (SMM) Update located data structure to add the new CPU
>>>>      information. (This step will involve CPU_SERVICE protocol)
>>>>
>>>> (13) New CPU: (Flash) do whatever other initialization is needed
>>>>
>>>> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
>>>>
>>>> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in..
>>>>
>>>>
>>>> In other words, the cache-as-RAM phase of 02-03 is replaced by the
>>>> INIT-SIPI-SIPI sequence of 07b-08a-08b.
>>> [Jiewen] I am OK with this proposal.
>>> I think the rule is the same - the new CPU CANNOT touch any system memory,
>>> no matter whether it comes from the reset vector or from INIT/SIPI/SIPI.
>>> Or I would say: if the new CPU wants to touch some memory before the first
>>> SMI, the memory should be CPU-specific or on the flash.
>>>
>>>
>>>
>>>>>> The QEMU DSDT could be modified (when secure boot is in effect) to
>>>>>> OUT to 0xB2 when hotplug happens.  It could write a well-known value
>>>>>> to 0xB2, to be read by an SMI handler in edk2.
>>>>>
>>>>> I dislike involving QEMU's generated DSDT in anything SMM (even
>>>>> injecting the SMI), because the AML interpreter runs in the OS.
>>>>>
>>>>> If a malicious OS kernel is a bit too enlightened about the DSDT, it
>>>>> could willfully diverge from the process that we design. If QEMU
>>>>> broadcast the SMI internally, the guest OS could not interfere with that.
>>>>>
>>>>> If the purpose of the SMI is specifically to force all CPUs into SMM
>>>>> (and thereby force them into trusted state), then the OS would be
>>>>> explicitly counter-interested in carrying out the AML operations from
>>>>> QEMU's DSDT.
>>>>
>>>> But since the hotplug controller would only be accessible from SMM,
>>>> there would be no other way to invoke it than to follow the DSDT's
>>>> instruction and write to 0xB2.  FWIW, real hardware also has plenty of
>>>> 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store
>>>> access).
>>>>
>>>> Paolo
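
For reference, the edk2 side of such a 0xB2 command could look roughly
like the following, using the SW SMI dispatch protocol (the command
value 0x04 and HandleCpuHotplug() are made-up placeholders here, not an
agreed-upon interface):

  #include <PiSmm.h>
  #include <Protocol/SmmSwDispatch2.h>
  #include <Library/SmmServicesTableLib.h>

  #define CPU_HOTPLUG_SMI_COMMAND  0x04   // hypothetical 0xB2 value

  STATIC
  EFI_STATUS
  EFIAPI
  CpuHotplugSmiHandler (
    IN EFI_HANDLE  DispatchHandle,
    IN CONST VOID  *Context,
    IN OUT VOID    *CommBuffer,
    IN OUT UINTN   *CommBufferSize
    )
  {
    //
    // Runs in SMM, with all host CPUs already pulled in by the broadcast
    // SMI; steps (06)..(12) -- patch 0x38000, unpark the new CPU,
    // relocate its SMBASE -- would be driven from here.
    //
    // HandleCpuHotplug ();   // made-up placeholder
    return EFI_SUCCESS;
  }

  EFI_STATUS
  RegisterCpuHotplugHandler (
    VOID
    )
  {
    EFI_SMM_SW_DISPATCH2_PROTOCOL  *SwDispatch;
    EFI_SMM_SW_REGISTER_CONTEXT    SwContext;
    EFI_HANDLE                     Handle;
    EFI_STATUS                     Status;

    Status = gSmst->SmmLocateProtocol (
               &gEfiSmmSwDispatch2ProtocolGuid,
               NULL,
               (VOID **)&SwDispatch
               );
    if (EFI_ERROR (Status)) {
      return Status;
    }

    SwContext.SwSmiInputValue = CPU_HOTPLUG_SMI_COMMAND;
    return SwDispatch->Register (
             SwDispatch,
             CpuHotplugSmiHandler,
             &SwContext,
             &Handle
             );
  }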
> 




* Re: [Qemu-devel] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-16 20:15                 ` Laszlo Ersek
@ 2019-08-16 22:19                   ` Alex Williamson
  2019-08-17  0:20                     ` Yao, Jiewen
  0 siblings, 1 reply; 23+ messages in thread
From: Alex Williamson @ 2019-08-16 22:19 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Chen, Yingwen, devel, Phillip Goerl, qemu devel list, Yao,
	Jiewen, Nakajima, Jun, Igor Mammedov, Paolo Bonzini,
	Boris Ostrovsky, edk2-rfc-groups-io, Joao Marcal Lemos Martins

On Fri, 16 Aug 2019 22:15:15 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> +Alex (direct question at the bottom)
> 
> On 08/16/19 09:49, Yao, Jiewen wrote:
> > below
> >   
> >> -----Original Message-----
> >> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >> Sent: Friday, August 16, 2019 3:20 PM
> >> To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek
> >> <lersek@redhat.com>; devel@edk2.groups.io
> >> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
> >> <qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
> >> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
> >> <jun.nakajima@intel.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>;
> >> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl
> >> <phillip.goerl@oracle.com>
> >> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
> >>
> >> On 16/08/19 04:46, Yao, Jiewen wrote:  
> >>> Comment below:
> >>>
> >>>  
> >>>> -----Original Message-----
> >>>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >>>> Sent: Friday, August 16, 2019 12:21 AM
> >>>> To: Laszlo Ersek <lersek@redhat.com>; devel@edk2.groups.io; Yao,  
> >> Jiewen  
> >>>> <jiewen.yao@intel.com>
> >>>> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
> >>>> <qemu-devel@nongnu.org>; Igor Mammedov  
> >> <imammedo@redhat.com>;  
> >>>> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
> >>>> <jun.nakajima@intel.com>; Boris Ostrovsky  
> >> <boris.ostrovsky@oracle.com>;  
> >>>> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl
> >>>> <phillip.goerl@oracle.com>
> >>>> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
> >>>>
> >>>> On 15/08/19 17:00, Laszlo Ersek wrote:  
> >>>>> On 08/14/19 16:04, Paolo Bonzini wrote:  
> >>>>>> On 14/08/19 15:20, Yao, Jiewen wrote:  
> >>>>>>>> - Does this part require a new branch somewhere in the OVMF SEC  
> >>>> code?  
> >>>>>>>>   How do we determine whether the CPU executing SEC is BSP or
> >>>>>>>>   hot-plugged AP?  
> >>>>>>> [Jiewen] I think this is blocked from hardware perspective, since the  
> >> first  
> >>>> instruction.  
> >>>>>>> There are some hardware specific registers can be used to determine  
> >> if  
> >>>> the CPU is new added.  
> >>>>>>> I don’t think this must be same as the real hardware.
> >>>>>>> You are free to invent some registers in device model to be used in  
> >>>> OVMF hot plug driver.  
> >>>>>>
> >>>>>> Yes, this would be a new operation mode for QEMU, that only applies  
> >> to  
> >>>>>> hot-plugged CPUs.  In this mode the AP doesn't reply to INIT or SMI,  
> >> in  
> >>>>>> fact it doesn't reply to anything at all.
> >>>>>>  
> >>>>>>>> - How do we tell the hot-plugged AP where to start execution? (I.e.  
> >>>> that  
> >>>>>>>>   it should execute code at a particular pflash location.)  
> >>>>>>> [Jiewen] Same real mode reset vector at FFFF:FFF0.  
> >>>>>>
> >>>>>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
> >>>>>> QEMU.  The AP does not start execution at all when it is unplugged,  
> >> so  
> >>>>>> no cache-as-RAM etc.
> >>>>>>
> >>>>>> We only need to modify QEMU so that hot-plugged APs do not reply to
> >>>>>> INIT/SIPI/SMI.
> >>>>>>  
> >>>>>>> I don’t think there is problem for real hardware, who always has CAR.
> >>>>>>> Can QEMU provide some CPU specific space, such as MMIO region?  
> >>>>>>
> >>>>>> Why is a CPU-specific region needed if every other processor is in SMM
> >>>>>> and thus trusted.  
> >>>>>
> >>>>> I was going through the steps Jiewen and Yingwen recommended.
> >>>>>
> >>>>> In step (02), the new CPU is expected to set up RAM access. In step
> >>>>> (03), the new CPU, executing code from flash, is expected to "send  
> >> board  
> >>>>> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
> >>>>> message." For that action, the new CPU may need a stack (minimally if  
> >> we  
> >>>>> want to use C function calls).
> >>>>>
> >>>>> Until step (03), there had been no word about any other (= pre-plugged)
> >>>>> CPUs (more precisely, Jiewen even confirmed "No impact to other
> >>>>> processors"), so I didn't assume that other CPUs had entered SMM.
> >>>>>
> >>>>> Paolo, I've attempted to read Jiewen's response, and yours, as carefully
> >>>>> as I can. I'm still very confused. If you have a better understanding,
> >>>>> could you please write up the 15-step process from the thread starter
> >>>>> again, with all QEMU customizations applied? Such as, unnecessary  
> >> steps  
> >>>>> removed, and platform specifics filled in.  
> >>>>
> >>>> Sure.
> >>>>
> >>>> (01a) QEMU: create new CPU.  The CPU already exists, but it does not
> >>>>      start running code until unparked by the CPU hotplug controller.
> >>>>
> >>>> (01b) QEMU: trigger SCI
> >>>>
> >>>> (02-03) no equivalent
> >>>>
> >>>> (04) Host CPU: (OS) execute GPE handler from DSDT
> >>>>
> >>>> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
> >>>>      will not enter SMM because SMI is disabled)
> >>>>
> >>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
> >>>>      rebase code.
> >>>>
> >>>> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
> >>>>      new CPU
> >>>>
> >>>> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.  
> >>> [Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no
> >>> restriction that INIT/SIPI/SIPI can only be sent in SMM.  
> >>
> >> All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded
> >> before 07a, so this is okay.  
> > [Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is delivered at 07a?
> > I don’t see any extra step between 06 and 07a.
> > What is the magic here?  
> 
> The magic is 07a itself, IIUC. The CPU hotplug controller would be
> accessible only in SMM. And until 07a happens, the new CPU ignores
> INIT/SIPI/SIPI even if another CPU sends it those, simply because QEMU
> would implement the new CPU's behavior like that.
> 
> > 
> > 
> >   
> >> However I do see a problem, because a PCI device's DMA could overwrite
> >> 0x38000 between (06) and (10) and hijack the code that is executed in
> >> SMM.  How is this avoided on real hardware?  By the time the new CPU
> >> enters SMM, it doesn't run off cache-as-RAM anymore.  
> > [Jiewen] Interesting question.
> > I don’t think the DMA attack is considered in threat model for the virtual environment. We only list adversary below:
> > -- Adversary: System Software Attacker, who can control any OS memory or silicon register from OS level, or read write BIOS data.
> > -- Adversary: Simple hardware attacker, who can hot add or hot remove a CPU.  
> 
> We do have physical PCI(e) device assignment; sorry for not highlighting
> that earlier. That feature (VFIO) does rely on the (physical) IOMMU, and
> it makes sure that the assigned device can only access physical frames
> that belong to the virtual machine that the device is assigned to.
> 
> However, as far as I know, VFIO doesn't try to restrict PCI DMA to
> subsets of guest RAM... I could be wrong about that, I vaguely recall
> RMRR support, which seems somewhat related.
> 
> > I agree it is a threat from real hardware perspective. SMM may check VTd to make sure the 38000 is blocked.
> > I doubt if it is a threat in virtual environment. Do we have a way to block DMA in virtual environment?  
> 
> I think that would be a VFIO feature.
> 
> Alex: if we wanted to block PCI(e) DMA to a specific part of guest RAM
> (expressed with guest-physical RAM addresses), perhaps permanently,
> perhaps just for a while -- not sure about coordination though --, could
> VFIO accommodate that (I guess by "punching holes" in the IOMMU page
> tables)?

It depends.  For starters, the vfio mapping API does not allow
unmapping arbitrary sub-ranges of previous mappings.  So the hole you
want to punch would need to be independently mapped.  From there you
get into the issue of whether this range is a potential DMA target.  If
it is, then this is the path to data corruption.  We cannot interfere
with the operation of the device and we have little to no visibility of
active DMA targets.
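
To illustrate that granularity constraint with the type1 ioctls (error
handling omitted; 'container' is assumed to be an open, already
configured VFIO container fd):

  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  static int map_range(int container, void *vaddr,
                       unsigned long long iova, unsigned long long size)
  {
      struct vfio_iommu_type1_dma_map map = {
          .argsz = sizeof(map),
          .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
          .vaddr = (unsigned long long)(unsigned long)vaddr,
          .iova  = iova,
          .size  = size,
      };
      return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
  }

  static int unmap_range(int container,
                         unsigned long long iova, unsigned long long size)
  {
      struct vfio_iommu_type1_dma_unmap unmap = {
          .argsz = sizeof(unmap),
          .iova  = iova,
          .size  = size,
      };
      return ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap);
  }

  /* If low memory was mapped as one big chunk:
   *     map_range(container, ram, 0x0, lowmem_size);
   * then unmap_range(container, 0x38000, 0x1000) cannot carve the page
   * out of it. To punch the hole later, 0x38000 would have to have been
   * its own mapping from the start, e.g.:
   *     map_range(container, ram,           0x00000, 0x38000);
   *     map_range(container, ram + 0x38000, 0x38000, 0x1000);
   *     map_range(container, ram + 0x39000, 0x39000, lowmem_size - 0x39000);
   */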

If we're talking about RAM that is never a DMA target, perhaps e820
reserved memory, then we can make sure certain MemoryRegions are
skipped when mapped by QEMU and would expect the guest to never map
them through a vIOMMU as well.  Maybe then it's a question of where
we're trying to provide security (it might be more difficult if QEMU
needs to sanitize vIOMMU mappings to actively prevent mapping
reserved areas).

Is there anything unique about the VM case here?  Bare metal SMM needs
to be concerned about protecting itself from I/O devices that operate
outside of the realm of SMM mode as well, right?  Is something "simple"
like an AddressSpace switch necessary here, such that an I/O device
always has a mapping to a safe guest RAM page while the vCPU
AddressSpace can switch to some protected page?  The IOMMU and vCPU
mappings don't need to be the same.  The vCPU is more under our control
than the assigned device.
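
A rough sketch of that AddressSpace idea with QEMU's memory API (how
system_memory and smram get created is glossed over, and this is not
the actual Q35 wiring -- just an illustration of the CPU and device
views diverging at 0x30000):

  #include "exec/memory.h"   /* QEMU-internal header */

  static MemoryRegion cpu_root, sysmem_alias, smram_at_30000;
  static AddressSpace cpu_as, device_as;

  static void wire_views(MemoryRegion *system_memory, MemoryRegion *smram)
  {
      /* Devices keep the plain view: DMA to 0x30000 hits normal RAM. */
      address_space_init(&device_as, system_memory, "device-memory");

      /* vCPUs get an overlay: SMRAM aliased over 0x30000..0x3ffff. */
      memory_region_init(&cpu_root, NULL, "cpu-memory", UINT64_MAX);
      memory_region_init_alias(&sysmem_alias, NULL, "sysmem-alias",
                               system_memory, 0, UINT64_MAX);
      memory_region_add_subregion(&cpu_root, 0, &sysmem_alias);

      memory_region_init_alias(&smram_at_30000, NULL, "smram-at-30000",
                               smram, 0, 0x10000);
      memory_region_add_subregion_overlap(&cpu_root, 0x30000,
                                          &smram_at_30000, 1 /* priority */);
      address_space_init(&cpu_as, &cpu_root, "cpu-memory");
  }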

FWIW, RMRRs are a VT-d-specific mechanism to define an address range as
persistently identity-mapped for one or more devices.  IOW, the device
would always map that range.  I don't think that's what you're after
here.  RMRRs are also an abomination that I hope we never find a
requirement for in a VM.  Thanks,

Alex



* Re: [Qemu-devel] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-16 22:19                   ` Alex Williamson
@ 2019-08-17  0:20                     ` Yao, Jiewen
  2019-08-18 19:50                       ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Yao, Jiewen @ 2019-08-17  0:20 UTC (permalink / raw)
  To: Alex Williamson, Laszlo Ersek
  Cc: Chen, Yingwen, devel, Phillip Goerl, qemu devel list, Nakajima,
	Jun, Igor Mammedov, Paolo Bonzini, Boris Ostrovsky,
	edk2-rfc-groups-io, Joao Marcal Lemos Martins



> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Saturday, August 17, 2019 6:20 AM
> To: Laszlo Ersek <lersek@redhat.com>
> Cc: Yao, Jiewen <jiewen.yao@intel.com>; Paolo Bonzini
> <pbonzini@redhat.com>; devel@edk2.groups.io; edk2-rfc-groups-io
> <rfc@edk2.groups.io>; qemu devel list <qemu-devel@nongnu.org>; Igor
> Mammedov <imammedo@redhat.com>; Chen, Yingwen
> <yingwen.chen@intel.com>; Nakajima, Jun <jun.nakajima@intel.com>; Boris
> Ostrovsky <boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins
> <joao.m.martins@oracle.com>; Phillip Goerl <phillip.goerl@oracle.com>
> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
> 
> On Fri, 16 Aug 2019 22:15:15 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
> > +Alex (direct question at the bottom)
> >
> > On 08/16/19 09:49, Yao, Jiewen wrote:
> > > below
> > >
> > >> -----Original Message-----
> > >> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> > >> Sent: Friday, August 16, 2019 3:20 PM
> > >> To: Yao, Jiewen <jiewen.yao@intel.com>; Laszlo Ersek
> > >> <lersek@redhat.com>; devel@edk2.groups.io
> > >> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
> > >> <qemu-devel@nongnu.org>; Igor Mammedov
> <imammedo@redhat.com>;
> > >> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
> > >> <jun.nakajima@intel.com>; Boris Ostrovsky
> <boris.ostrovsky@oracle.com>;
> > >> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip
> Goerl
> > >> <phillip.goerl@oracle.com>
> > >> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
> > >>
> > >> On 16/08/19 04:46, Yao, Jiewen wrote:
> > >>> Comment below:
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> > >>>> Sent: Friday, August 16, 2019 12:21 AM
> > >>>> To: Laszlo Ersek <lersek@redhat.com>; devel@edk2.groups.io; Yao,
> > >> Jiewen
> > >>>> <jiewen.yao@intel.com>
> > >>>> Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
> > >>>> <qemu-devel@nongnu.org>; Igor Mammedov
> > >> <imammedo@redhat.com>;
> > >>>> Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
> > >>>> <jun.nakajima@intel.com>; Boris Ostrovsky
> > >> <boris.ostrovsky@oracle.com>;
> > >>>> Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip
> Goerl
> > >>>> <phillip.goerl@oracle.com>
> > >>>> Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
> > >>>>
> > >>>> On 15/08/19 17:00, Laszlo Ersek wrote:
> > >>>>> On 08/14/19 16:04, Paolo Bonzini wrote:
> > >>>>>> On 14/08/19 15:20, Yao, Jiewen wrote:
> > >>>>>>>> - Does this part require a new branch somewhere in the OVMF
> SEC
> > >>>> code?
> > >>>>>>>>   How do we determine whether the CPU executing SEC is BSP
> or
> > >>>>>>>>   hot-plugged AP?
> > >>>>>>> [Jiewen] I think this is blocked from hardware perspective, since
> the
> > >> first
> > >>>> instruction.
> > >>>>>>> There are some hardware specific registers can be used to
> determine
> > >> if
> > >>>> the CPU is new added.
> > >>>>>>> I don’t think this must be same as the real hardware.
> > >>>>>>> You are free to invent some registers in device model to be used
> in
> > >>>> OVMF hot plug driver.
> > >>>>>>
> > >>>>>> Yes, this would be a new operation mode for QEMU, that only
> applies
> > >> to
> > >>>>>> hot-plugged CPUs.  In this mode the AP doesn't reply to INIT or
> SMI,
> > >> in
> > >>>>>> fact it doesn't reply to anything at all.
> > >>>>>>
> > >>>>>>>> - How do we tell the hot-plugged AP where to start execution?
> (I.e.
> > >>>> that
> > >>>>>>>>   it should execute code at a particular pflash location.)
> > >>>>>>> [Jiewen] Same real mode reset vector at FFFF:FFF0.
> > >>>>>>
> > >>>>>> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
> > >>>>>> QEMU.  The AP does not start execution at all when it is
> unplugged,
> > >> so
> > >>>>>> no cache-as-RAM etc.
> > >>>>>>
> > >>>>>> We only need to modify QEMU so that hot-plugged APs do not reply
> > >>>>>> to INIT/SIPI/SMI.
> > >>>>>>
> > >>>>>>> I don’t think there is problem for real hardware, who always has
> CAR.
> > >>>>>>> Can QEMU provide some CPU specific space, such as MMIO
> region?
> > >>>>>>
> > >>>>>> Why is a CPU-specific region needed if every other processor is in
> SMM
> > >>>>>> and thus trusted.
> > >>>>>
> > >>>>> I was going through the steps Jiewen and Yingwen recommended.
> > >>>>>
> > >>>>> In step (02), the new CPU is expected to set up RAM access. In step
> > >>>>> (03), the new CPU, executing code from flash, is expected to "send
> > >> board
> > >>>>> message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
> > >>>>> message." For that action, the new CPU may need a stack
> (minimally if
> > >> we
> > >>>>> want to use C function calls).
> > >>>>>
> > >>>>> Until step (03), there had been no word about any other (=
> pre-plugged)
> > >>>>> CPUs (more precisely, Jiewen even confirmed "No impact to other
> > >>>>> processors"), so I didn't assume that other CPUs had entered SMM.
> > >>>>>
> > >>>>> Paolo, I've attempted to read Jiewen's response, and yours, as
> carefully
> > >>>>> as I can. I'm still very confused. If you have a better understanding,
> > >>>>> could you please write up the 15-step process from the thread
> starter
> > >>>>> again, with all QEMU customizations applied? Such as, unnecessary
> > >> steps
> > >>>>> removed, and platform specifics filled in.
> > >>>>
> > >>>> Sure.
> > >>>>
> > >>>> (01a) QEMU: create new CPU.  The CPU already exists, but it does
> not
> > >>>>      start running code until unparked by the CPU hotplug
> controller.
> > >>>>
> > >>>> (01b) QEMU: trigger SCI
> > >>>>
> > >>>> (02-03) no equivalent
> > >>>>
> > >>>> (04) Host CPU: (OS) execute GPE handler from DSDT
> > >>>>
> > >>>> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New
> > >>>>      CPU will not enter SMM because SMI is disabled)
> > >>>>
> > >>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
> > >>>>      rebase code.
> > >>>>
> > >>>> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
> > >>>>      new CPU
> > >>>>
> > >>>> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
> > >>> [Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is
> no
> > >>> restriction that INIT/SIPI/SIPI can only be sent in SMM.
> > >>
> > >> All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded
> > >> before 07a, so this is okay.
> > > [Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is
> delivered at 07a?
> > > I don’t see any extra step between 06 and 07a.
> > > What is the magic here?
> >
> > The magic is 07a itself, IIUC. The CPU hotplug controller would be
> > accessible only in SMM. And until 07a happens, the new CPU ignores
> > INIT/SIPI/SIPI even if another CPU sends it those, simply because QEMU
> > would implement the new CPU's behavior like that.
[Jiewen] Got it. Looks fine to me.



> > >> However I do see a problem, because a PCI device's DMA could
> overwrite
> > >> 0x38000 between (06) and (10) and hijack the code that is executed in
> > >> SMM.  How is this avoided on real hardware?  By the time the new
> CPU
> > >> enters SMM, it doesn't run off cache-as-RAM anymore.
> > > [Jiewen] Interesting question.
> > > I don’t think the DMA attack is considered in threat model for the virtual
> environment. We only list adversary below:
> > > -- Adversary: System Software Attacker, who can control any OS memory
> or silicon register from OS level, or read write BIOS data.
> > > -- Adversary: Simple hardware attacker, who can hot add or hot remove
> a CPU.
> >
> > We do have physical PCI(e) device assignment; sorry for not highlighting
> > that earlier.
[Jiewen] That is OK. Then we MUST add a third adversary:
-- Adversary: Simple hardware attacker, who can use a device to perform a DMA attack in the virtual world.
NOTE: The DMA attack in the real world is out of scope. That is handled by the IOMMU in the real world, such as VT-d. -- Please do clarify if this is TRUE.

In the real world:
#1: the SMM region MUST be a non-DMA-capable region.
#2: the MMIO region MUST be a non-DMA-capable region.
#3: the stolen memory MIGHT be a DMA-capable or a non-DMA-capable region. It depends upon the silicon design.
#4: the normal OS-accessible memory - including ACPI reclaim, ACPI NVS, and reserved memory not included by #3 - MUST be a DMA-capable region.
As such, IOMMU protection is NOT required for #1 and #2. IOMMU protection MIGHT be required for #3 and MUST be required for #4.
I assume the virtual environment is designed in the same way. Please correct me if I am wrong.
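
Restating those four cases as a sketch (the taxonomy below is
illustrative only; it is not an existing QEMU or edk2 enum):

  typedef enum {
      REGION_SMRAM,       /* #1: MUST NOT be a DMA target            */
      REGION_MMIO,        /* #2: MUST NOT be a DMA target            */
      REGION_STOLEN,      /* #3: DMA capability is silicon-dependent */
      REGION_OS_MEMORY,   /* #4: always a potential DMA target       */
  } RegionKind;

  /* 1 = IOMMU protection required, 0 = not required, -1 = might be. */
  static int needs_iommu_protection(RegionKind kind)
  {
      switch (kind) {
      case REGION_SMRAM:
      case REGION_MMIO:
          return 0;    /* the fabric already refuses DMA here   */
      case REGION_STOLEN:
          return -1;   /* depends on the silicon design         */
      case REGION_OS_MEMORY:
      default:
          return 1;    /* reachable by devices, must be policed */
      }
  }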



>> That feature (VFIO) does rely on the (physical) IOMMU, and
> > it makes sure that the assigned device can only access physical frames
> > that belong to the virtual machine that the device is assigned to.
[Jiewen] Thank you! Good to know.
I found https://www.kernel.org/doc/Documentation/vfio.txt
Is that what you described above?
Anyway, I believe the problem is clear and the solution in the real world is clear.
I will leave the virtual world discussion to Alex, Paolo, Laszlo.
If you need any of my input, please let me know.



> > However, as far as I know, VFIO doesn't try to restrict PCI DMA to
> > subsets of guest RAM... I could be wrong about that, I vaguely recall
> > RMRR support, which seems somewhat related.
> >
> > > I agree it is a threat from real hardware perspective. SMM may check
> VTd to make sure the 38000 is blocked.
> > > I doubt if it is a threat in virtual environment. Do we have a way to block
> DMA in virtual environment?
> >
> > I think that would be a VFIO feature.
> >
> > Alex: if we wanted to block PCI(e) DMA to a specific part of guest RAM
> > (expressed with guest-physical RAM addresses), perhaps permanently,
> > perhaps just for a while -- not sure about coordination though --, could
> > VFIO accommodate that (I guess by "punching holes" in the IOMMU page
> > tables)?
> 
> It depends.  For starters, the vfio mapping API does not allow
> unmapping arbitrary sub-ranges of previous mappings.  So the hole you
> want to punch would need to be independently mapped.  From there you
> get into the issue of whether this range is a potential DMA target.  If
> it is, then this is the path to data corruption.  We cannot interfere
> with the operation of the device and we have little to no visibility of
> active DMA targets.
> 
> If we're talking about RAM that is never a DMA target, perhaps e820
> reserved memory, then we can make sure certain MemoryRegions are
> skipped when mapped by QEMU and would expect the guest to never map
> them through a vIOMMU as well.  Maybe then it's a question of where
> we're trying to provide security (it might be more difficult if QEMU
> needs to sanitize vIOMMU mappings to actively prevent mapping
> reserved areas).
> 
> Is there anything unique about the VM case here?  Bare metal SMM needs
> to be concerned about protecting itself from I/O devices that operate
> outside of the realm of SMM mode as well, right?  Is something "simple"
> like an AddressSpace switch necessary here, such that an I/O device
> always has a mapping to a safe guest RAM page while the vCPU
> AddressSpace can switch to some protected page?  The IOMMU and vCPU
> mappings don't need to be the same.  The vCPU is more under our control
> than the assigned device.
> 
> FWIW, RMRRs are a VT-d-specific mechanism to define an address range as
> persistently identity-mapped for one or more devices.  IOW, the device
> would always map that range.  I don't think that's what you're after
> here.  RMRRs are also an abomination that I hope we never find a
> requirement for in a VM.  Thanks,
> 
> Alex


* Re: [Qemu-devel] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-17  0:20                     ` Yao, Jiewen
@ 2019-08-18 19:50                       ` Paolo Bonzini
  2019-08-18 23:00                         ` Yao, Jiewen
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Bonzini @ 2019-08-18 19:50 UTC (permalink / raw)
  To: Yao, Jiewen, Alex Williamson, Laszlo Ersek
  Cc: Chen, Yingwen, devel, Phillip Goerl, qemu devel list, Nakajima,
	Jun, Igor Mammedov, Boris Ostrovsky, edk2-rfc-groups-io,
	Joao Marcal Lemos Martins

On 17/08/19 02:20, Yao, Jiewen wrote:
> [Jiewen] That is OK. Then we MUST add the third adversary.
> -- Adversary: Simple hardware attacker, who can use device to perform DMA attack in the virtual world.
> NOTE: The DMA attack in the real world is out of scope. That is handled by the IOMMU in the real world, such as VT-d. -- Please do clarify if this is TRUE.
> 
> In the real world:
> #1: the SMM MUST be non-DMA capable region.
> #2: the MMIO MUST be non-DMA capable region.
> #3: the stolen memory MIGHT be DMA capable region or non-DMA capable
> region. It depends upon the silicon design.
> #4: the normal OS accessible memory - including ACPI reclaim, ACPI
> NVS, and reserved memory not included by #3 - MUST be DMA capable region.
> As such, IOMMU protection is NOT required for #1 and #2. IOMMU
> protection MIGHT be required for #3 and MUST be required for #4.
> I assume the virtual environment is designed in the same way. Please
> correct me if I am wrong.
> 

Correct.  The 0x30000...0x3ffff area is the only problematic one;
Igor's idea (or a variant, for example optionally remapping
0xa0000..0xaffff SMRAM to 0x30000) is becoming more and more attractive.

Paolo



* Re: [Qemu-devel] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-18 19:50                       ` Paolo Bonzini
@ 2019-08-18 23:00                         ` Yao, Jiewen
  2019-08-19 14:10                           ` Paolo Bonzini
  0 siblings, 1 reply; 23+ messages in thread
From: Yao, Jiewen @ 2019-08-18 23:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Chen, Yingwen, devel, Phillip Goerl, qemu devel list,
	Alex Williamson, Nakajima, Jun, Igor Mammedov, Boris Ostrovsky,
	edk2-rfc-groups-io, Laszlo Ersek, Joao Marcal Lemos Martins

In the real world, we deprecate A/B-seg usage because it is vulnerable to SMM cache-poisoning attacks.
I assume cache poisoning is out of scope in the virtual world, or that there is a way to prevent A/B-seg cache poisoning.

thank you!
Yao, Jiewen


> On Aug 19, 2019, at 3:50 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> On 17/08/19 02:20, Yao, Jiewen wrote:
>> [Jiewen] That is OK. Then we MUST add the third adversary.
>> -- Adversary: Simple hardware attacker, who can use device to perform DMA attack in the virtual world.
>> NOTE: The DMA attack in the real world is out of scope. That is handled by the IOMMU in the real world, such as VT-d. -- Please do clarify if this is TRUE.
>> 
>> In the real world:
>> #1: the SMM MUST be non-DMA capable region.
>> #2: the MMIO MUST be non-DMA capable region.
>> #3: the stolen memory MIGHT be DMA capable region or non-DMA capable
>> region. It depends upon the silicon design.
>> #4: the normal OS accessible memory - including ACPI reclaim, ACPI
>> NVS, and reserved memory not included by #3 - MUST be DMA capable region.
>> As such, IOMMU protection is NOT required for #1 and #2. IOMMU
>> protection MIGHT be required for #3 and MUST be required for #4.
>> I assume the virtual environment is designed in the same way. Please
>> correct me if I am wrong.
>> 
> 
> Correct.  The 0x30000...0x3ffff area is the only problematic one;
> Igor's idea (or a variant, for example optionally remapping
> 0xa0000..0xaffff SMRAM to 0x30000) is becoming more and more attractive.
> 
> Paolo



* Re: [Qemu-devel] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
  2019-08-18 23:00                         ` Yao, Jiewen
@ 2019-08-19 14:10                           ` Paolo Bonzini
  0 siblings, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2019-08-19 14:10 UTC (permalink / raw)
  To: Yao, Jiewen
  Cc: Chen, Yingwen, devel, Phillip Goerl, qemu devel list,
	Alex Williamson, Nakajima, Jun, Igor Mammedov, Boris Ostrovsky,
	edk2-rfc-groups-io, Laszlo Ersek, Joao Marcal Lemos Martins

On 19/08/19 01:00, Yao, Jiewen wrote:
> in real world, we deprecate AB-seg usage because they are vulnerable
> to smm cache poison attack. I assume cache poison is out of scope in
> the virtual world, or there is a way to prevent ABseg cache poison.

Indeed the SMRR would not cover the A-seg on real hardware.  However, if
the chipset allowed aliasing A-seg SMRAM to 0x30000, it would only be
used for SMBASE relocation of hotplugged CPUs.  The firmware would still
keep low SMRAM disabled, *except around SMBASE relocation of hotplugged
CPUs*.  To avoid cache poisoning attacks, you only have to issue a
WBINVD before enabling low SMRAM and before disabling it.  Hotplug SMI
is not a performance-sensitive path, so it's not a big deal.
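
As a sketch, the hotplug SMI handler would then bracket the relocation
like this (enable_low_smram() and the other helpers are hypothetical
stand-ins for the proposed aliasing control, which does not exist in
current chipsets or in edk2):

  static inline void wbinvd_flush(void)
  {
      __asm__ __volatile__ ("wbinvd" ::: "memory");
  }

  void relocate_hotplugged_cpu(void)
  {
      wbinvd_flush();                /* flush any poisoned lines first   */
      enable_low_smram();            /* alias A-seg SMRAM over 0x30000   */

      send_smi_to_new_cpu();         /* new CPU runs the rebase stub at
                                      * 0x38000 and moves its SMBASE     */
      wait_for_smbase_relocation();  /* ...into TSEG                     */

      wbinvd_flush();                /* flush before reopening 0x30000   */
      disable_low_smram();           /* low SMRAM stays disabled
                                      * outside this window              */
  }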

So I guess you agree that PCI DMA attacks are a potential vector also on
real hardware.  As Alex pointed out, VT-d is not a solution because
there could be legitimate DMA happening during CPU hotplug.  For OVMF
we'll probably go with Igor's idea; it would be nice if Intel chipsets
supported it too. :)

Paolo




Thread overview: 23+ messages
2019-08-13 14:16 [Qemu-devel] CPU hotplug using SMM with QEMU+OVMF Laszlo Ersek
2019-08-13 16:09 ` Laszlo Ersek
2019-08-13 16:18   ` Laszlo Ersek
2019-08-14 13:20   ` Yao, Jiewen
2019-08-14 14:04     ` Paolo Bonzini
2019-08-15  9:55       ` Yao, Jiewen
2019-08-15 16:04         ` Paolo Bonzini
2019-08-15 15:00       ` [Qemu-devel] [edk2-devel] " Laszlo Ersek
2019-08-15 16:16         ` Igor Mammedov
2019-08-15 16:21         ` Paolo Bonzini
2019-08-16  2:46           ` Yao, Jiewen
2019-08-16  7:20             ` Paolo Bonzini
2019-08-16  7:49               ` Yao, Jiewen
2019-08-16 20:15                 ` Laszlo Ersek
2019-08-16 22:19                   ` Alex Williamson
2019-08-17  0:20                     ` Yao, Jiewen
2019-08-18 19:50                       ` Paolo Bonzini
2019-08-18 23:00                         ` Yao, Jiewen
2019-08-19 14:10                           ` Paolo Bonzini
2019-08-16 20:00           ` Laszlo Ersek
2019-08-15 16:07       ` [Qemu-devel] " Igor Mammedov
2019-08-15 16:24         ` Paolo Bonzini
2019-08-16  7:42           ` Igor Mammedov
