From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Michael S. Tsirkin" Subject: Re: [Qemu-devel] How to reserve guest physical region for ACPI Date: Thu, 7 Jan 2016 12:54:30 +0200 Message-ID: <20160107123214-mutt-send-email-mst@redhat.com> References: <1449040860-19040-1-git-send-email-guangrong.xiao@linux.intel.com> <5680A0C8.6040505@linux.intel.com> <20151228141917-mutt-send-email-mst@redhat.com> <20151230165554.4d15e5a4@nial.brq.redhat.com> <20151230213740-mutt-send-email-mst@redhat.com> <20160105173025.4aa4e69f@nial.brq.redhat.com> <20160105183211-mutt-send-email-mst@redhat.com> <20160107113025.32cc5b52@nial.brq.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Xiao Guangrong , ehabkost@redhat.com, kvm@vger.kernel.org, gleb@kernel.org, mtosatti@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, pbonzini@redhat.com, dan.j.williams@intel.com, Laszlo Ersek , rth@twiddle.net To: Igor Mammedov Return-path: Received: from mx1.redhat.com ([209.132.183.28]:51281 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752786AbcAGKyf (ORCPT ); Thu, 7 Jan 2016 05:54:35 -0500 Content-Disposition: inline In-Reply-To: <20160107113025.32cc5b52@nial.brq.redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On Thu, Jan 07, 2016 at 11:30:25AM +0100, Igor Mammedov wrote: > On Tue, 5 Jan 2016 18:43:02 +0200 > "Michael S. Tsirkin" wrote: > > > On Tue, Jan 05, 2016 at 05:30:25PM +0100, Igor Mammedov wrote: > > > > > bios-linker-loader is a great interface for initializing some > > > > > guest owned data and linking it together but I think it adds > > > > > unnecessary complexity and is misused if it's used to handle > > > > > device owned data/on device memory in this and VMGID cases. > > > > > > > > I want a generic interface for guest to enumerate these things. linker > > > > seems quite reasonable but if you see a reason why it won't do, or want > > > > to propose a better interface, fine. > > > > > > > > PCI would do, too - though windows guys had concerns about > > > > returning PCI BARs from ACPI. > > > There were potential issues with pSeries bootloader that treated > > > PCI_CLASS_MEMORY_RAM as conventional RAM but it was fixed. > > > Could you point out to discussion about windows issues? > > > > > > What VMGEN patches that used PCI for mapping purposes were > > > stuck at, was that it was suggested to use PCI_CLASS_MEMORY_RAM > > > class id but we couldn't agree on it. > > > > > > VMGEN v13 with full discussion is here > > > https://patchwork.ozlabs.org/patch/443554/ > > > So to continue with this route we would need to pick some other > > > driver less class id so windows won't prompt for driver or > > > maybe supply our own driver stub to guarantee that no one > > > would touch it. Any suggestions? > > > > Pick any device/vendor id pair for which windows specifies no driver. > > There's a small risk that this will conflict with some > > guest but I think it's minimal. > device/vendor id pair was QEMU specific so doesn't conflicts with anything > issue we were trying to solve was to prevent windows asking for driver > even though it does so only once if told not to ask again. > > That's why PCI_CLASS_MEMORY_RAM was selected as it's generic driver-less > device descriptor in INF file which matches as the last resort if > there isn't any other diver that's matched device by device/vendor id pair. I think this is the only class in this inf. If you can't use it, you must use an existing device/vendor id pair, there's some risk involved but probably not much. > > > > > > > > > > > > > > > > > There was RFC on list to make BIOS boot from NVDIMM already > > > > > doing some ACPI table lookup/parsing. Now if they were forced > > > > > to also parse and execute AML to initialize QEMU with guest > > > > > allocated address that would complicate them quite a bit. > > > > > > > > If they just need to find a table by name, it won't be > > > > too bad, would it? > > > that's what they were doing scanning memory for static NVDIMM table. > > > However if it were DataTable, BIOS side would have to execute > > > AML so that the table address could be told to QEMU. > > > > Not at all. You can find any table by its signature without > > parsing AML. > yep, and then BIOS would need to tell its address to QEMU > writing to IO port which is allocated statically in QEMU > for this purpose and is described in AML only on guest side. io ports are an ABI too but they are way easier to maintain. > > > > > > > In case of direct mapping or PCI BAR there is no need to initialize > > > QEMU side from AML. > > > That also saves us IO port where this address should be written > > > if bios-linker-loader approach is used. > > > > > > > > > > > > While with NVDIMM control memory region mapped directly by QEMU, > > > > > respective patches don't need in any way to initialize QEMU, > > > > > all they would need just read necessary data from control region. > > > > > > > > > > Also using bios-linker-loader takes away some usable RAM > > > > > from guest and in the end that doesn't scale, > > > > > the more devices I add the less usable RAM is left for guest OS > > > > > while all the device needs is a piece of GPA address space > > > > > that would belong to it. > > > > > > > > I don't get this comment. I don't think it's MMIO that is wanted. > > > > If it's backed by qemu virtual memory then it's RAM. > > > Then why don't allocate video card VRAM the same way and try to explain > > > user that a guest started with '-m 128 -device cirrus-vga,vgamem_mb=64Mb' > > > only has 64Mb of available RAM because of we think that on device VRAM > > > is also RAM. > > > > > > Maybe I've used MMIO term wrongly here but it roughly reflects the idea > > > that on device memory (whether it's VRAM, NVDIMM control block or VMGEN > > > area) is not allocated from guest's usable RAM (as described in E820) > > > but rather directly mapped in guest's GPA and doesn't consume available > > > RAM as guest sees it. That's also the way it's done on real hardware. > > > > > > What we need in case of VMGEN ID and NVDIMM is on device memory > > > that could be directly accessed by guest. > > > Both direct mapping or PCI BAR do that job and we could use simple > > > static AML without any patching. > > > > At least with VMGEN the issue is that there's an AML method > > that returns the physical address. > > Then if guest OS moves the BAR (which is legal), it will break > > since caller has no way to know it's related to the BAR. > I've found a following MS doc "Firmware Allocation of PCI Device Resources in Windows". It looks like when MS implemented resource rebalancing in > Vista they pushed a compat change to PCI specs. > That ECN is called "Ignore PCI Boot Configuration_DSM Function" > and can be found here: > https://pcisig.com/sites/default/files/specification_documents/ECR-Ignorebootconfig-final.pdf > > It looks like it's possible to forbid rebalancing per > device/bridge if it has _DMS method that returns "do not > ignore the boot configuration of PCI resources". I'll have to study this but we don't want that globally, do we? This restricts hotplug functionality significantly. > > > > > > > > > > > > > See patch at the bottom that might be handy. > > > > > > > > > > > > > he also innovated a way to use 64-bit address in DSDT/SSDT.rev = 1: > > > > > > > | when writing ASL one shall make sure that only XP supported > > > > > > > | features are in global scope, which is evaluated when tables > > > > > > > | are loaded and features of rev2 and higher are inside methods. > > > > > > > | That way XP doesn't crash as far as it doesn't evaluate unsupported > > > > > > > | opcode and one can guard those opcodes checking _REV object if neccesary. > > > > > > > (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html) > > > > > > > > > > > > Yes, this technique works. > > > > > > > > > > > > An alternative is to add an XSDT, XP ignores that. > > > > > > XSDT at the moment breaks OVMF (because it loads both > > > > > > the RSDT and the XSDT, which is wrong), but I think > > > > > > Laszlo was working on a fix for that. > > > > > Using XSDT would increase ACPI tables occupied RAM > > > > > as it would duplicate DSDT + non XP supported AML > > > > > at global namespace. > > > > > > > > Not at all - I posted patches linking to same > > > > tables from both RSDT and XSDT at some point. > > > > Only the list of pointers would be different. > > > if you put XP incompatible AML in separate SSDT and link it > > > only from XSDT than that would work but if incompatibility > > > is in DSDT, one would have to provide compat DSDT for RSDT > > > an incompat DSDT for XSDT. > > > > So don't do this. > well spec says "An ACPI-compatible OS must use the XSDT if present", > which I read as tables pointed by RSDT MUST be pointed by XSDT > as well and RSDT MUST NOT not be used. > > so if we put incompatible changes in a separate SSDT and put > it only in XSDT that might work. Showstopper here is OVMF which > has issues with it as Laszlo pointed out. But that's just a bug. > Also since Windows implements only subset of spec XSDT trick > would cover only XP based versions while the rest will see and > use XSDT pointed tables which still could have incompatible > AML with some of the later windows versions. We'll have to see what these are exactly. If it's methods in SSDT we can check the version supported by the ASPM. > > > > > > So far policy was don't try to run guest OS on QEMU > > > configuration that isn't supported by it. > > > > It's better if guests don't see some features but > > don't crash. It's not always possible of course but > > we should try to avoid this. > > > > > For example we use VAR_PACKAGE when running with more > > > than 255 VCPUs (commit b4f4d5481) which BSODs XP. > > > > Yes. And it's because we violate the spec, DSDT > > should not have this stuff. > > > > > So we can continue with that policy with out resorting to > > > using both RSDT and XSDT, > > > It would be even easier as all AML would be dynamically > > > generated and DSDT would only contain AML elements for > > > a concrete QEMU configuration. > > > > I'd prefer XSDT but I won't nack it if you do it in DSDT. > > I think it's not spec compliant but guests do not > > seem to care. > > > > > > > So far we've managed keep DSDT compatible with XP while > > > > > introducing features from v2 and higher ACPI revisions as > > > > > AML that is only evaluated on demand. > > > > > We can continue doing so unless we have to unconditionally > > > > > add incompatible AML at global scope. > > > > > > > > > > > > > Yes. > > > > > > > > > > > > > > > > > Michael, Paolo, what do you think about these ideas? > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > > > > > > > So using a patch below, we can add Name(PQRS, 0x0) at the top of the > > > > > > SSDT (or bottom, or add a separate SSDT just for that). It returns the > > > > > > current offset so we can add that to the linker. > > > > > > > > > > > > Won't work if you append the Name to the Aml structure (these can be > > > > > > nested to arbitrary depth using aml_append), so using plain GArray for > > > > > > this API makes sense to me. > > > > > > > > > > > > ---> > > > > > > > > > > > > acpi: add build_append_named_dword, returning an offset in buffer > > > > > > > > > > > > This is a very limited form of support for runtime patching - > > > > > > similar in functionality to what we can do with ACPI_EXTRACT > > > > > > macros in python, but implemented in C. > > > > > > > > > > > > This is to allow ACPI code direct access to data tables - > > > > > > which is exactly what DataTableRegion is there for, except > > > > > > no known windows release so far implements DataTableRegion. > > > > > unsupported means Windows will BSOD, so it's practically > > > > > unusable unless MS will patch currently existing Windows > > > > > versions. > > > > > > > > Yes. That's why my patch allows patching SSDT without using > > > > DataTableRegion. > > > > > > > > > Another thing about DataTableRegion is that ACPI tables are > > > > > supposed to have static content which matches checksum in > > > > > table the header while you are trying to use it for dynamic > > > > > data. It would be cleaner/more compatible to teach > > > > > bios-linker-loader to just allocate memory and patch AML > > > > > with the allocated address. > > > > > > > > Yes - if address is static, you need to put it outside > > > > the table. Can come right before or right after this. > > > > > > > > > Also if OperationRegion() is used, then one has to patch > > > > > DefOpRegion directly as RegionOffset must be Integer, > > > > > using variable names is not permitted there. > > > > > > > > I am not sure the comment was understood correctly. > > > > The comment says really "we can't use DataTableRegion > > > > so here is an alternative". > > > so how are you going to access data at which patched > > > NameString point to? > > > for that you'd need a normal patched OperationRegion > > > as well since DataTableRegion isn't usable. > > > > For VMGENID you would patch the method that > > returns the address - you do not need an op region > > as you never access it. > > > > I don't know about NVDIMM. Maybe OperationRegion can > > use the patched NameString? Will need some thought. > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Michael S. Tsirkin > > > > > > > > > > > > --- > > > > > > > > > > > > diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h > > > > > > index 1b632dc..f8998ea 100644 > > > > > > --- a/include/hw/acpi/aml-build.h > > > > > > +++ b/include/hw/acpi/aml-build.h > > > > > > @@ -286,4 +286,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre); > > > > > > void > > > > > > build_rsdt(GArray *table_data, GArray *linker, GArray *table_offsets); > > > > > > > > > > > > +int > > > > > > +build_append_named_dword(GArray *array, const char *name_format, ...); > > > > > > + > > > > > > #endif > > > > > > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c > > > > > > index 0d4b324..7f9fa65 100644 > > > > > > --- a/hw/acpi/aml-build.c > > > > > > +++ b/hw/acpi/aml-build.c > > > > > > @@ -262,6 +262,32 @@ static void build_append_int(GArray *table, uint64_t value) > > > > > > } > > > > > > } > > > > > > > > > > > > +/* Build NAME(XXXX, 0x0) where 0x0 is encoded as a qword, > > > > > > + * and return the offset to 0x0 for runtime patching. > > > > > > + * > > > > > > + * Warning: runtime patching is best avoided. Only use this as > > > > > > + * a replacement for DataTableRegion (for guests that don't > > > > > > + * support it). > > > > > > + */ > > > > > > +int > > > > > > +build_append_named_qword(GArray *array, const char *name_format, ...) > > > > > > +{ > > > > > > + int offset; > > > > > > + va_list ap; > > > > > > + > > > > > > + va_start(ap, name_format); > > > > > > + build_append_namestringv(array, name_format, ap); > > > > > > + va_end(ap); > > > > > > + > > > > > > + build_append_byte(array, 0x0E); /* QWordPrefix */ > > > > > > + > > > > > > + offset = array->len; > > > > > > + build_append_int_noprefix(array, 0x0, 8); > > > > > > + assert(array->len == offset + 8); > > > > > > + > > > > > > + return offset; > > > > > > +} > > > > > > + > > > > > > static GPtrArray *alloc_list; > > > > > > > > > > > > static Aml *aml_alloc(void) > > > > > > > > > > > > > > > > -- > > > > To unsubscribe from this list: send the line "unsubscribe kvm" in > > > > the body of a message to majordomo@vger.kernel.org > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > >