From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53226) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yl2OP-0002u9-SD for qemu-devel@nongnu.org; Wed, 22 Apr 2015 17:41:15 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Yl2OM-0004E0-K4 for qemu-devel@nongnu.org; Wed, 22 Apr 2015 17:41:13 -0400 Received: from mx1.redhat.com ([209.132.183.28]:39906) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yl2OM-0004Di-BK for qemu-devel@nongnu.org; Wed, 22 Apr 2015 17:41:10 -0400 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id t3MLf91L031790 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL) for ; Wed, 22 Apr 2015 17:41:09 -0400 Message-ID: <55381571.4000201@redhat.com> Date: Wed, 22 Apr 2015 23:41:05 +0200 From: Laszlo Ersek MIME-Version: 1.0 References: <1429521560-2743-1-git-send-email-kraxel@redhat.com> <1429521560-2743-5-git-send-email-kraxel@redhat.com> <55365C33.2090101@redhat.com> <1429628650.21164.24.camel@nilsson.home.kraxel.org> In-Reply-To: <1429628650.21164.24.camel@nilsson.home.kraxel.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 5/6] [wip] tseg, part1, not (yet) tested List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Gerd Hoffmann Cc: pbonzini@redhat.com, qemu-devel@nongnu.org, mst@redhat.com On 04/21/15 17:04, Gerd Hoffmann wrote: >>> +static const MemoryRegionOps tseg_blackhole_ops = { >>> + .read = tseg_blackhole_read, >>> + .write = tseg_blackhole_write, >>> + .endianness = DEVICE_NATIVE_ENDIAN, >>> + .valid.min_access_size = 1, >>> + .valid.max_access_size = 4, >>> + .impl.min_access_size = 4, >>> + .impl.max_access_size = 4, >>> + .endianness = DEVICE_LITTLE_ENDIAN, >>> +}; >> >> (a) you've specified .endianness twice (with inconsistent initializers >> at that, although the value returned by the accessor happens to be >> unaffected). > > Oops, cut+paste bug, will fix (/me wonders why gcc didn't throw an error > on that one). > >> (b) Why are 8-byte accesses forbidden? ... > > just a matter of setting .valid.max_access_size = 8, can do that of > course. > >>> + >>> /* PCIe MMCFG */ >>> static void mch_update_pciexbar(MCHPCIState *mch) >>> { >>> @@ -267,6 +289,7 @@ static void mch_update_smram(MCHPCIState *mch) >>> { >>> PCIDevice *pd = PCI_DEVICE(mch); >>> bool h_smrame = (pd->config[MCH_HOST_BRIDGE_ESMRAMC] & MCH_HOST_BRIDGE_ESMRAMC_H_SMRAME); >>> + uint32_t tseg_size; >>> >>> /* implement SMRAM.D_LCK */ >>> if (pd->config[MCH_HOST_BRIDGE_SMRAM] & MCH_HOST_BRIDGE_SMRAM_D_LCK) { >>> @@ -296,6 +319,32 @@ static void mch_update_smram(MCHPCIState *mch) >>> memory_region_set_enabled(&mch->high_smram, false); >>> } >>> >>> + if (pd->config[MCH_HOST_BRIDGE_ESMRAMC] & MCH_HOST_BRIDGE_ESMRAMC_T_EN) { >>> + switch (pd->config[MCH_HOST_BRIDGE_ESMRAMC] & >>> + MCH_HOST_BRIDGE_ESMRAMC_TSEG_SZ_MASK) { >>> + case MCH_HOST_BRIDGE_ESMRAMC_TSEG_SZ_1MB: >>> + tseg_size = 1024 * 1024; >>> + break; >>> + case MCH_HOST_BRIDGE_ESMRAMC_TSEG_SZ_2MB: >>> + tseg_size = 1024 * 1024 * 2; >>> + break; >>> + case MCH_HOST_BRIDGE_ESMRAMC_TSEG_SZ_8MB: >>> + tseg_size = 1024 * 1024 * 8; >>> + break; >>> + default: >>> + tseg_size = 0; >>> + break; >>> + } >>> + } else { >>> + tseg_size = 0; >>> + } >> >> - so, is the guest supposed to select the tseg size by writing this >> register? > > Correct. > >> - I assume this register is not reprogrammable once SMRAM is locked -- >> is that correct? > > Correct. > >> - can the guest somehow use this facility to detect, dynamically, the >> presence of this feature? (For now I'm thinking about a static build >> flag for OVMF that would enable SMM support, but I'm 99% sure Jordan >> will object and ask for a dynamic feature detection.) > > Hmm. I think if we merge all the smm / smram / tseg patches in one go > this should work. Without patches reading the ESMRAMC register returns > zero. With the patches the three cache-disable bits are hardcoded to > '1'. This should work for detecting tseg support. > > You also have to test for smm support. The current protocol for this is > that seabios checks whenever smm is already initialized (see > *_apmc_smm_setup() in seabios/src/fw/smm.c) and if so it skips smm > initialization. Right now qemu sets the bit on reset when running on > kvm, so seabios doesn't try to use smm. On tcg the bit is clear after > reset and seabios actually uses smm mode. I started looking into this. Basically the condition for SMM/SMRAM support is: Q35 && SMRAM support && SMM support (1) The first subcondition can be checked via the host bridge devid (and OVMF already does, for other purposes). The second one relies on the ESMRAMC, which is in PCI config space. The third one is messy. It relies on SMI_EN, which is an ioport at PMBA+0x30. It requires a configured PMBA. The problem for OVMF is the following: this condition is too complex / too intrusive to evaluate in order to see whether Q35+SMM+SMRAM are available. For that reason, I'd like to ask if it would be possible to introduce a new fw_cfg file that would simply communicate the end result of the above expression. Long version: If the above condition evaluates to TRUE, we assume that the user wants "security" -- ie. it wants to prevent a "malicious" runtime guest OS form messing with OVMF's Secure Boot feature. Good. However, at the moment, OVMF has a large number of holes that a malicious runtime guest OS can exploit. The S3 LockBox and the varstore pflash chip are most frequently mentioned, but there are more; in particular I'm referring to the other special memory ranges OVMF uses. (Please refer to the A comprehensive memory map of OVMF section of the OVMF whitepaper for the following. It describes in detail what special memory ranges OVMF has, and the life cycle of each.) These ranges are adequately protected against a *benign* guest runtime OS; the UEFI memory map makes sure that such a runtime OS doesn't accidentally overwrite stuff that OVMF will either *need* for S3 resume in pristine form from the first cold boot, or will *overwrite* during S3 resume as scratch space (thereby potentially destroying OS data). But a malicious guest OS can nonetheless overwrite things in these special areas that OVMF will run (or use) during S3, before it locks down SMRAM again. Some of those holes / inappropriately protected memory ranges are tied to the SEC phase, which is the earliest and least capable UEFI phase. For example, refer to: +--------------------------+ 9216 KB PcdOvmfDxeMemFvBase | | |PEIFV from FVMAIN_COMPACT | size: 896 KB PcdOvmfPeiMemFvSize | decompressed firmware | | volume with PEI modules | | | +--------------------------+ 8320 KB PcdOvmfPeiMemFvBase "(6) PEIFV -- decompressed firmware volume with PEI modules". OVMF is an unconventional UEFI firmware in the sense that it runs *only* the SEC phase from (unmodifiable, read-only) flash. The PEI and DXE firmware volumes are decompressed from flash to RAM by SEC, and then PEI and DXE modules run from RAM. (It is more conventional to run PEI modules from the flash as well, not just SEC. The way OVMF does it allows it to save space in the pflash, because PEI modules are thus allowed to exist only in compressed (as opposed to directly executable) form in the pflash.) In order to save some cycles, OVMF's SEC decompresses the PEI and DXE firmware volumes (in one go) to RAM after a real reset only, and then "protects" the decompressed PEI firmware volume in RAM via the UEFI memory map (see above). Then, at S3 resume, SEC simply reuses the already decompressed (and "protected") PEI firmware volume when handing off to PEI. Obviously, if a guest OS pokes some foreign code into this decompressed PEI firmware volume before S3 suspend, then OVMF will run that foreign code at S3 resume, with the SMRAM open. Therefore OVMF has to forego such optimizations even as early as in SEC when condition (1) evaluates to false -- it must decompress the firmware volumes from pristine pflash over the possibly breached RAM even during S3 resume. And for that SEC needs to evaluate (1). This is where the complexity / intrusiveness of (1) becomes a problem. Programming the PMBA in SEC just so I can read SMI_EN (for the sake of evaluation (1)) is something I'd like to avoid. We already have a nice fw_cfg client library that is usable in SEC; that would be my choice for learning about condition (1). I'll audit the rest of the special memory regions too, of course. In any case, exposing this QEMU capability / configuration (see also Paolo's -machine smm=on/off idea) via fw_cfg seems both appropriate for fw_cfg (after all it does configure the firmware) and the least intrusive for OVMF. What do you think? Thanks! Laszlo >>> + memory_region_set_enabled(&mch->tseg_blackhole, tseg_size); >>> + memory_region_set_size(&mch->tseg_blackhole, tseg_size); >>> + memory_region_add_subregion_overlap(mch->system_memory, >>> + mch->below_4g_mem_size - tseg_size, >>> + &mch->tseg_blackhole, 1); >> >> So, does tseg overlay the last few MB of the RAM below 4GB? (Ie. right >> before the PCI hole starts?) > > tseg is just normal ram (yes, located at the end of memory), but (once > tseg is enabled) only cpus in smm mode are allowed to access it. > Likewise busmaster dma access is rejected, so non-smm code can't use the > sata controller to access this indirectly. > >>> + memory_region_init_io(&mch->tseg_blackhole, OBJECT(mch), >>> + &tseg_blackhole_ops, NULL, >>> + "tseg-blackhole", 0); >> >> Okay, this answers one of the above, tseg_blackhole is never uninitialized. >> >>> + memory_region_set_enabled(&mch->tseg_blackhole, false); >>> + memory_region_add_subregion_overlap(mch->system_memory, >>> + mch->below_4g_mem_size, >>> + &mch->tseg_blackhole, 1); >> >> The address (2nd argument) is inconsistent with the one in >> mch_update_smram() -- this function call places the tseg black hole at >> the beginning of the 32-bit PCI hole, not just below it. > > It's a zero-length hole here, therefore it actually is the same. But I > didn't bother to use "mch->below_4g_mem_size - 0" as start address to > make that more clear ;) > >> (a) place the permanent PEI RAM so that it ends at the top of >> "below_4g_mem_size"; PublishPeiMemory() >> (b) create one of the memory HOBs, QemuInitializeRam(), >> (c) create the MMIO HOB that describes the 32-bit PCI hole, >> MemMapInitialization() >> >> If the last 1 / 2 / 8 MB of the low RAM can't be used as general purpose >> RAM, then (a) is affected (it should simply be sunk by the tseg size of >> choice); (b) is affected similarly (just decrease the top of the memory >> HOB); > > Yes. > > cheers, > Gerd > >