All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/8] acpi: Some fixes for pxb support for ARM virt machine
@ 2020-12-23  9:08 Jiahui Cen
  2020-12-23  9:08 ` [PATCH v3 1/8] acpi: Allow DSDT acpi table changes Jiahui Cen
                   ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Jiahui Cen @ 2020-12-23  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: xieyingtai, Jiahui Cen, Eduardo Habkost, Michael S. Tsirkin,
	Ard Biesheuvel, Richard Henderson, Paolo Bonzini, Igor Mammedov,
	Laszlo Ersek, wu.wubin

This patch series adds some fixes for ARM virt machine pxb support.
1. Pass addr offset for IO, MMIO and bus number when builing crs, because
the addr_trans is needed to describe an addr resource. [1]
2. Inform guest os not to ignore the resource map generated by firmware as
the x86 default way. [2]
3. Reorder the root bridges [3] and exclude resources of extra root bridges
from main root bridge's _CRS.
4. Enable pxb for ARM virt machine by default.
5. Update expected DSDT files with the above changes and enable the pxb
unit-test.

v2->v3:
* Reorder the root bridges.
* Exclude resources of extra root bridges from main root bridge's _CRS.

v1->v2:
* Update expected DSDT files.
* Quote PCI Firmware spec as comments.

[1]: https://lore.kernel.org/qemu-devel/20201217132747.4744-1-cenjiahui@huawei.com/
[2]: https://lore.kernel.org/qemu-devel/20201217132926.4812-1-cenjiahui@huawei.com/
[3]: https://lore.kernel.org/lkml/20201218062335.5320-1-cenjiahui@huawei.com/

Jiahui Cen (8):
  acpi: Allow DSDT acpi table changes
  acpi: Add addr offset in build_crs
  acpi/gpex: Inform os to keep firmware resource map
  acpi/gpex: Exclude pxb's resources from PCI0
  acpi/gpex: Append pxb devs in ascending order
  Kconfig: Enable PXB for ARM_VIRT by default
  acpi: Enable pxb unit-test for ARM virt machine
  acpi: Update addr_trans and _DSM in expected files

 hw/acpi/aml-build.c               |  18 ++--
 hw/i386/acpi-build.c              |   3 +-
 hw/pci-bridge/Kconfig             |   2 +-
 hw/pci-host/gpex-acpi.c           |  96 ++++++++++++++------
 include/hw/acpi/aml-build.h       |   4 +-
 tests/data/acpi/microvm/DSDT.pcie | Bin 3023 -> 3031 bytes
 tests/data/acpi/virt/DSDT         | Bin 5196 -> 5204 bytes
 tests/data/acpi/virt/DSDT.memhp   | Bin 6557 -> 6565 bytes
 tests/data/acpi/virt/DSDT.numamem | Bin 5196 -> 5204 bytes
 tests/data/acpi/virt/DSDT.pxb     | Bin 7802 -> 7689 bytes
 tests/qtest/bios-tables-test.c    |   4 -
 11 files changed, 86 insertions(+), 41 deletions(-)

-- 
2.29.2



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 1/8] acpi: Allow DSDT acpi table changes
  2020-12-23  9:08 [PATCH v3 0/8] acpi: Some fixes for pxb support for ARM virt machine Jiahui Cen
@ 2020-12-23  9:08 ` Jiahui Cen
  2020-12-23  9:08 ` [PATCH v3 2/8] acpi: Add addr offset in build_crs Jiahui Cen
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Jiahui Cen @ 2020-12-23  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: xieyingtai, Jiahui Cen, Eduardo Habkost, Michael S. Tsirkin,
	Ard Biesheuvel, Richard Henderson, Paolo Bonzini, Igor Mammedov,
	Laszlo Ersek, wu.wubin

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
---
 tests/qtest/bios-tables-test-allowed-diff.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..42418e58e7 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,6 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/microvm/DSDT.pcie",
+"tests/data/acpi/virt/DSDT",
+"tests/data/acpi/virt/DSDT.memhp",
+"tests/data/acpi/virt/DSDT.numamem",
+"tests/data/acpi/virt/DSDT.pxb",
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 2/8] acpi: Add addr offset in build_crs
  2020-12-23  9:08 [PATCH v3 0/8] acpi: Some fixes for pxb support for ARM virt machine Jiahui Cen
  2020-12-23  9:08 ` [PATCH v3 1/8] acpi: Allow DSDT acpi table changes Jiahui Cen
@ 2020-12-23  9:08 ` Jiahui Cen
  2020-12-29 13:36   ` Igor Mammedov
  2020-12-23  9:08 ` [PATCH v3 3/8] acpi/gpex: Inform os to keep firmware resource map Jiahui Cen
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Jiahui Cen @ 2020-12-23  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: xieyingtai, Jiahui Cen, Eduardo Habkost, Michael S. Tsirkin,
	Ard Biesheuvel, Richard Henderson, Paolo Bonzini, Igor Mammedov,
	Laszlo Ersek, wu.wubin

AML needs Address Translation offset to describe how a bridge translates
addresses accross the bridge when using an address descriptor, and
especially on ARM, the translation offset of pio resource is usually
non zero.

Therefore, it's necessary to pass offset for pio, mmio32, mmio64 and bus
number into build_crs.

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
---
 hw/acpi/aml-build.c         | 18 ++++++++++--------
 hw/i386/acpi-build.c        |  3 ++-
 hw/pci-host/gpex-acpi.c     |  3 ++-
 include/hw/acpi/aml-build.h |  4 +++-
 4 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index f976aa667b..7b6ebb0cc8 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2076,7 +2076,9 @@ void build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog)
                  tpm2_ptr, "TPM2", table_data->len - tpm2_start, 4, NULL, NULL);
 }
 
-Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
+Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set, uint32_t io_offset,
+               uint32_t mmio32_offset, uint64_t mmio64_offset,
+               uint16_t bus_nr_offset)
 {
     Aml *crs = aml_resource_template();
     CrsRangeSet temp_range_set;
@@ -2189,10 +2191,10 @@ Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
     for (i = 0; i < temp_range_set.io_ranges->len; i++) {
         entry = g_ptr_array_index(temp_range_set.io_ranges, i);
         aml_append(crs,
-                   aml_word_io(AML_MIN_FIXED, AML_MAX_FIXED,
-                               AML_POS_DECODE, AML_ENTIRE_RANGE,
-                               0, entry->base, entry->limit, 0,
-                               entry->limit - entry->base + 1));
+                   aml_dword_io(AML_MIN_FIXED, AML_MAX_FIXED,
+                                AML_POS_DECODE, AML_ENTIRE_RANGE,
+                                0, entry->base, entry->limit, io_offset,
+                                entry->limit - entry->base + 1));
         crs_range_insert(range_set->io_ranges, entry->base, entry->limit);
     }
 
@@ -2205,7 +2207,7 @@ Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
                    aml_dword_memory(AML_POS_DECODE, AML_MIN_FIXED,
                                     AML_MAX_FIXED, AML_NON_CACHEABLE,
                                     AML_READ_WRITE,
-                                    0, entry->base, entry->limit, 0,
+                                    0, entry->base, entry->limit, mmio32_offset,
                                     entry->limit - entry->base + 1));
         crs_range_insert(range_set->mem_ranges, entry->base, entry->limit);
     }
@@ -2217,7 +2219,7 @@ Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
                    aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
                                     AML_MAX_FIXED, AML_NON_CACHEABLE,
                                     AML_READ_WRITE,
-                                    0, entry->base, entry->limit, 0,
+                                    0, entry->base, entry->limit, mmio64_offset,
                                     entry->limit - entry->base + 1));
         crs_range_insert(range_set->mem_64bit_ranges,
                          entry->base, entry->limit);
@@ -2230,7 +2232,7 @@ Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
                             0,
                             pci_bus_num(host->bus),
                             max_bus,
-                            0,
+                            bus_nr_offset,
                             max_bus - pci_bus_num(host->bus) + 1));
 
     return crs;
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index f18b71dea9..f56d699c7f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1360,7 +1360,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
             }
 
             aml_append(dev, build_prt(false));
-            crs = build_crs(PCI_HOST_BRIDGE(BUS(bus)->parent), &crs_range_set);
+            crs = build_crs(PCI_HOST_BRIDGE(BUS(bus)->parent), &crs_range_set,
+                            0, 0, 0, 0);
             aml_append(dev, aml_name_decl("_CRS", crs));
             aml_append(scope, dev);
             aml_append(dsdt, scope);
diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
index 7f20ee1c98..11b3db8f71 100644
--- a/hw/pci-host/gpex-acpi.c
+++ b/hw/pci-host/gpex-acpi.c
@@ -168,7 +168,8 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
              * 1. The resources the pci-brige/pcie-root-port need.
              * 2. The resources the devices behind pxb need.
              */
-            crs = build_crs(PCI_HOST_BRIDGE(BUS(bus)->parent), &crs_range_set);
+            crs = build_crs(PCI_HOST_BRIDGE(BUS(bus)->parent), &crs_range_set,
+                            cfg->pio.base, 0, 0, 0);
             aml_append(dev, aml_name_decl("_CRS", crs));
 
             acpi_dsdt_add_pci_osc(dev);
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index e727bea1bc..54a5aec4d7 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -452,7 +452,9 @@ void crs_replace_with_free_ranges(GPtrArray *ranges,
 void crs_range_set_init(CrsRangeSet *range_set);
 void crs_range_set_free(CrsRangeSet *range_set);
 
-Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set);
+Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set, uint32_t io_offset,
+               uint32_t mmio32_offset, uint64_t mmio64_offset,
+               uint16_t bus_nr_offset);
 
 void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
                        uint64_t len, int node, MemoryAffinityFlags flags);
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 3/8] acpi/gpex: Inform os to keep firmware resource map
  2020-12-23  9:08 [PATCH v3 0/8] acpi: Some fixes for pxb support for ARM virt machine Jiahui Cen
  2020-12-23  9:08 ` [PATCH v3 1/8] acpi: Allow DSDT acpi table changes Jiahui Cen
  2020-12-23  9:08 ` [PATCH v3 2/8] acpi: Add addr offset in build_crs Jiahui Cen
@ 2020-12-23  9:08 ` Jiahui Cen
  2020-12-29 13:41   ` Igor Mammedov
  2020-12-23  9:08 ` [PATCH v3 4/8] acpi/gpex: Exclude pxb's resources from PCI0 Jiahui Cen
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Jiahui Cen @ 2020-12-23  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: xieyingtai, Jiahui Cen, Eduardo Habkost, Michael S. Tsirkin,
	Ard Biesheuvel, Richard Henderson, Paolo Bonzini, Igor Mammedov,
	Laszlo Ersek, wu.wubin

There may be some differences in pci resource assignment between guest os
and firmware.

Eg. A Bridge with Bus [d2]
    -+-[0000:d2]---01.0-[d3]----01.0

    where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
          [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
                                          BAR4 (mem, 64-bit, pref) [size=64M]

    In EDK2, the Resource Map would be:
        PciBus: Resource Map for Bridge [D2|01|00]
        Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
           Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
           Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
        Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
    It would use 0x4100000 to calculate the root bus's PMem64 resource window.

    While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
    the PMem64 size, which would be 0x6000000. So kernel would try to
    allocate 0x6000000 from the PMem64 resource window, but since the window
    size is 0x4100000 as assigned by EDK2, the allocation would fail.

The diffences could result in resource assignment failure.

Using _DSM #5 method to inform guest os not to ignore the PCI configuration
that firmware has done at boot time could handle the differences.

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
---
 hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
index 11b3db8f71..c189306599 100644
--- a/hw/pci-host/gpex-acpi.c
+++ b/hw/pci-host/gpex-acpi.c
@@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
     UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
     ifctx = aml_if(aml_equal(aml_arg(0), UUID));
     ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
-    uint8_t byte_list[1] = {1};
-    buf = aml_buffer(1, byte_list);
+    uint8_t byte_list[] = {
+                0x1 << 0 /* support for functions other than function 0 */ |
+                0x1 << 5 /* support for function 5 */
+                };
+    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
     aml_append(ifctx1, aml_return(buf));
     aml_append(ifctx, ifctx1);
+
+    /* PCI Firmware Specification 3.1
+     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
+     */
+    /* Arg2: Function Index: 5 */
+    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
+    /* 0 - The operating system must not ignore the PCI configuration that
+     *     firmware has done at boot time.
+     */
+    aml_append(ifctx1, aml_return(aml_int(0)));
+    aml_append(ifctx, ifctx1);
     aml_append(method, ifctx);
 
     byte_list[0] = 0;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 4/8] acpi/gpex: Exclude pxb's resources from PCI0
  2020-12-23  9:08 [PATCH v3 0/8] acpi: Some fixes for pxb support for ARM virt machine Jiahui Cen
                   ` (2 preceding siblings ...)
  2020-12-23  9:08 ` [PATCH v3 3/8] acpi/gpex: Inform os to keep firmware resource map Jiahui Cen
@ 2020-12-23  9:08 ` Jiahui Cen
  2020-12-23  9:08 ` [PATCH v3 5/8] acpi/gpex: Append pxb devs in ascending order Jiahui Cen
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Jiahui Cen @ 2020-12-23  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: xieyingtai, Jiahui Cen, Eduardo Habkost, Michael S. Tsirkin,
	Ard Biesheuvel, Richard Henderson, Paolo Bonzini, Igor Mammedov,
	Laszlo Ersek, wu.wubin

Exclude the resources of extra root bridges from PCI0's _CRS. Otherwise,
the resource windows would overlap in guest, and the IO resource window
would fail to be registered.

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
---
 hw/pci-host/gpex-acpi.c | 64 +++++++++++++-------
 1 file changed, 43 insertions(+), 21 deletions(-)

diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
index c189306599..4bf1e94309 100644
--- a/hw/pci-host/gpex-acpi.c
+++ b/hw/pci-host/gpex-acpi.c
@@ -144,6 +144,8 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
     Aml *method, *crs, *dev, *rbuf;
     PCIBus *bus = cfg->bus;
     CrsRangeSet crs_range_set;
+    CrsRangeEntry *entry;
+    int i;
 
     /* start to construct the tables for pxb */
     crs_range_set_init(&crs_range_set);
@@ -191,7 +193,6 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
             aml_append(scope, dev);
         }
     }
-    crs_range_set_free(&crs_range_set);
 
     /* tables for the main */
     dev = aml_device("%s", "PCI0");
@@ -209,36 +210,55 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
     aml_append(method, aml_return(aml_int(cfg->ecam.base)));
     aml_append(dev, method);
 
+    /*
+     * At this point crs_range_set has all the ranges used by pci
+     * busses *other* than PCI0.  These ranges will be excluded from
+     * the PCI0._CRS.
+     */
     rbuf = aml_resource_template();
     aml_append(rbuf,
         aml_word_bus_number(AML_MIN_FIXED, AML_MAX_FIXED, AML_POS_DECODE,
                             0x0000, 0x0000, nr_pcie_buses - 1, 0x0000,
                             nr_pcie_buses));
     if (cfg->mmio32.size) {
-        aml_append(rbuf,
-                   aml_dword_memory(AML_POS_DECODE, AML_MIN_FIXED, AML_MAX_FIXED,
-                                    AML_NON_CACHEABLE, AML_READ_WRITE, 0x0000,
-                                    cfg->mmio32.base,
-                                    cfg->mmio32.base + cfg->mmio32.size - 1,
-                                    0x0000,
-                                    cfg->mmio32.size));
+        crs_replace_with_free_ranges(crs_range_set.mem_ranges,
+                                     cfg->mmio32.base,
+                                     cfg->mmio32.base + cfg->mmio32.size - 1);
+        for (i = 0; i < crs_range_set.mem_ranges->len; i++) {
+            entry = g_ptr_array_index(crs_range_set.mem_ranges, i);
+            aml_append(rbuf,
+                aml_dword_memory(AML_POS_DECODE, AML_MIN_FIXED, AML_MAX_FIXED,
+                                 AML_NON_CACHEABLE, AML_READ_WRITE, 0x0000,
+                                 entry->base, entry->limit,
+                                 0x0000, entry->limit - entry->base + 1));
+        }
     }
     if (cfg->pio.size) {
-        aml_append(rbuf,
-                   aml_dword_io(AML_MIN_FIXED, AML_MAX_FIXED, AML_POS_DECODE,
-                                AML_ENTIRE_RANGE, 0x0000, 0x0000,
-                                cfg->pio.size - 1,
-                                cfg->pio.base,
-                                cfg->pio.size));
+        crs_replace_with_free_ranges(crs_range_set.io_ranges,
+                                     0x0000,
+                                     cfg->pio.size - 1);
+        for (i = 0; i < crs_range_set.io_ranges->len; i++) {
+            entry = g_ptr_array_index(crs_range_set.io_ranges, i);
+            aml_append(rbuf,
+                aml_dword_io(AML_MIN_FIXED, AML_MAX_FIXED, AML_POS_DECODE,
+                             AML_ENTIRE_RANGE, 0x0000, entry->base,
+                             entry->limit, cfg->pio.base,
+                             entry->limit - entry->base + 1));
+        }
     }
     if (cfg->mmio64.size) {
-        aml_append(rbuf,
-                   aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED, AML_MAX_FIXED,
-                                    AML_NON_CACHEABLE, AML_READ_WRITE, 0x0000,
-                                    cfg->mmio64.base,
-                                    cfg->mmio64.base + cfg->mmio64.size - 1,
-                                    0x0000,
-                                    cfg->mmio64.size));
+        crs_replace_with_free_ranges(crs_range_set.mem_64bit_ranges,
+                                     cfg->mmio64.base,
+                                     cfg->mmio64.base + cfg->mmio64.size - 1);
+        for (i = 0; i < crs_range_set.mem_64bit_ranges->len; i++) {
+            entry = g_ptr_array_index(crs_range_set.mem_64bit_ranges, i);
+            aml_append(rbuf,
+                aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED, AML_MAX_FIXED,
+                                 AML_NON_CACHEABLE, AML_READ_WRITE, 0x0000,
+                                 entry->base,
+                                 entry->limit, 0x0000,
+                                 entry->limit - entry->base + 1));
+        }
     }
     aml_append(dev, aml_name_decl("_CRS", rbuf));
 
@@ -257,4 +277,6 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
     aml_append(dev_res0, aml_name_decl("_CRS", crs));
     aml_append(dev, dev_res0);
     aml_append(scope, dev);
+
+    crs_range_set_free(&crs_range_set);
 }
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 5/8] acpi/gpex: Append pxb devs in ascending order
  2020-12-23  9:08 [PATCH v3 0/8] acpi: Some fixes for pxb support for ARM virt machine Jiahui Cen
                   ` (3 preceding siblings ...)
  2020-12-23  9:08 ` [PATCH v3 4/8] acpi/gpex: Exclude pxb's resources from PCI0 Jiahui Cen
@ 2020-12-23  9:08 ` Jiahui Cen
  2020-12-29 13:47   ` Igor Mammedov
  2020-12-23  9:08 ` [PATCH v3 6/8] Kconfig: Enable PXB for ARM_VIRT by default Jiahui Cen
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Jiahui Cen @ 2020-12-23  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: xieyingtai, Jiahui Cen, Eduardo Habkost, Michael S. Tsirkin,
	Ard Biesheuvel, Richard Henderson, Paolo Bonzini, Igor Mammedov,
	Laszlo Ersek, wu.wubin

The overlap check of IO resource window would fail when Linux kernel
registers an IO resource [b, c) earlier than another resource [a, b).
Though this incorrect check could be fixed by [1], it would be better to
append pxb devs into DSDT table in ascending order.

[1]: https://lore.kernel.org/lkml/20201218062335.5320-1-cenjiahui@huawei.com/

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
---
 hw/pci-host/gpex-acpi.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
index 4bf1e94309..95a7a0f12b 100644
--- a/hw/pci-host/gpex-acpi.c
+++ b/hw/pci-host/gpex-acpi.c
@@ -141,7 +141,7 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
 void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 {
     int nr_pcie_buses = cfg->ecam.size / PCIE_MMCFG_SIZE_MIN;
-    Aml *method, *crs, *dev, *rbuf;
+    Aml *method, *crs, *dev, *rbuf, *pxb_devs[nr_pcie_buses];
     PCIBus *bus = cfg->bus;
     CrsRangeSet crs_range_set;
     CrsRangeEntry *entry;
@@ -149,6 +149,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 
     /* start to construct the tables for pxb */
     crs_range_set_init(&crs_range_set);
+    memset(pxb_devs, 0, sizeof(pxb_devs));
     if (bus) {
         QLIST_FOREACH(bus, &bus->child, sibling) {
             uint8_t bus_num = pci_bus_num(bus);
@@ -190,7 +191,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 
             acpi_dsdt_add_pci_osc(dev);
 
-            aml_append(scope, dev);
+            pxb_devs[bus_num] = dev;
         }
     }
 
@@ -278,5 +279,11 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
     aml_append(dev, dev_res0);
     aml_append(scope, dev);
 
+    for (i = 0; i < ARRAY_SIZE(pxb_devs); i++) {
+        if (pxb_devs[i]) {
+            aml_append(scope, pxb_devs[i]);
+        }
+    }
+
     crs_range_set_free(&crs_range_set);
 }
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 6/8] Kconfig: Enable PXB for ARM_VIRT by default
  2020-12-23  9:08 [PATCH v3 0/8] acpi: Some fixes for pxb support for ARM virt machine Jiahui Cen
                   ` (4 preceding siblings ...)
  2020-12-23  9:08 ` [PATCH v3 5/8] acpi/gpex: Append pxb devs in ascending order Jiahui Cen
@ 2020-12-23  9:08 ` Jiahui Cen
  2020-12-29 13:50   ` Igor Mammedov
  2020-12-23  9:08 ` [PATCH v3 7/8] acpi: Enable pxb unit-test for ARM virt machine Jiahui Cen
  2020-12-23  9:08 ` [PATCH v3 8/8] acpi: Update addr_trans and _DSM in expected files Jiahui Cen
  7 siblings, 1 reply; 26+ messages in thread
From: Jiahui Cen @ 2020-12-23  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: xieyingtai, Jiahui Cen, Eduardo Habkost, Michael S. Tsirkin,
	Ard Biesheuvel, Richard Henderson, Paolo Bonzini, Igor Mammedov,
	Laszlo Ersek, wu.wubin

PXB is now supported on ARM, so let's enable it by default.

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
---
 hw/pci-bridge/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci-bridge/Kconfig b/hw/pci-bridge/Kconfig
index a51ec716f5..f8df4315ba 100644
--- a/hw/pci-bridge/Kconfig
+++ b/hw/pci-bridge/Kconfig
@@ -5,7 +5,7 @@ config PCIE_PORT
 
 config PXB
     bool
-    default y if Q35
+    default y if Q35 || ARM_VIRT
 
 config XIO3130
     bool
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 7/8] acpi: Enable pxb unit-test for ARM virt machine
  2020-12-23  9:08 [PATCH v3 0/8] acpi: Some fixes for pxb support for ARM virt machine Jiahui Cen
                   ` (5 preceding siblings ...)
  2020-12-23  9:08 ` [PATCH v3 6/8] Kconfig: Enable PXB for ARM_VIRT by default Jiahui Cen
@ 2020-12-23  9:08 ` Jiahui Cen
  2020-12-23  9:08 ` [PATCH v3 8/8] acpi: Update addr_trans and _DSM in expected files Jiahui Cen
  7 siblings, 0 replies; 26+ messages in thread
From: Jiahui Cen @ 2020-12-23  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: xieyingtai, Jiahui Cen, Eduardo Habkost, Michael S. Tsirkin,
	Ard Biesheuvel, Richard Henderson, Paolo Bonzini, Igor Mammedov,
	Laszlo Ersek, wu.wubin

No matter whether the pxb is enabled or not, the CONFIG_PXB macro in test
would keep undefined. And since pxb is now enabled for ARM Virt machine
by default, let's enable pxb unit-test by removing the CONFIG_PXB.

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
---
 tests/qtest/bios-tables-test.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index 4e026f90d0..669202fc95 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -1196,7 +1196,6 @@ static void test_acpi_virt_tcg_numamem(void)
 
 }
 
-#ifdef CONFIG_PXB
 static void test_acpi_virt_tcg_pxb(void)
 {
     test_data data = {
@@ -1228,7 +1227,6 @@ static void test_acpi_virt_tcg_pxb(void)
 
     free_test_data(&data);
 }
-#endif
 
 static void test_acpi_tcg_acpi_hmat(const char *machine)
 {
@@ -1342,9 +1340,7 @@ int main(int argc, char *argv[])
         qtest_add_func("acpi/virt", test_acpi_virt_tcg);
         qtest_add_func("acpi/virt/numamem", test_acpi_virt_tcg_numamem);
         qtest_add_func("acpi/virt/memhp", test_acpi_virt_tcg_memhp);
-#ifdef CONFIG_PXB
         qtest_add_func("acpi/virt/pxb", test_acpi_virt_tcg_pxb);
-#endif
     }
     ret = g_test_run();
     boot_sector_cleanup(disk);
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 8/8] acpi: Update addr_trans and _DSM in expected files
  2020-12-23  9:08 [PATCH v3 0/8] acpi: Some fixes for pxb support for ARM virt machine Jiahui Cen
                   ` (6 preceding siblings ...)
  2020-12-23  9:08 ` [PATCH v3 7/8] acpi: Enable pxb unit-test for ARM virt machine Jiahui Cen
@ 2020-12-23  9:08 ` Jiahui Cen
  7 siblings, 0 replies; 26+ messages in thread
From: Jiahui Cen @ 2020-12-23  9:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: xieyingtai, Jiahui Cen, Eduardo Habkost, Michael S. Tsirkin,
	Ard Biesheuvel, Richard Henderson, Paolo Bonzini, Igor Mammedov,
	Laszlo Ersek, wu.wubin

Addr_trans in _CRS is changed and a new _DSM #5 method is added.
Also the expected file for pxb for ARM virt does not match the source code.

Update expected DSDT files accordingly, and re-enable their testing.

Full diff of changed files disassembly:

diff -ru /tmp/old/tests/data/acpi/microvm/DSDT.pcie.dsl /tmp/new/tests/data/acpi/microvm/DSDT.pcie.dsl
--- /tmp/old/tests/data/acpi/microvm/DSDT.pcie.dsl      2020-12-23 15:49:57.161081285 +0800
+++ /tmp/new/tests/data/acpi/microvm/DSDT.pcie.dsl      2020-12-23 15:55:11.837769953 +0800
@@ -9,9 +9,9 @@
  *
  * Original Table Header:
  *     Signature        "DSDT"
- *     Length           0x00000BCF (3023)
+ *     Length           0x00000BD7 (3031)
  *     Revision         0x02
- *     Checksum         0x29
+ *     Checksum         0x99
  *     OEM ID           "BOCHS "
  *     OEM Table ID     "BXPCDSDT"
  *     OEM Revision     0x00000001 (1)
@@ -1302,9 +1302,14 @@
                     {
                         Return (Buffer (One)
                         {
-                             0x01                                             // .
+                             0x21                                             // !
                         })
                     }
+
+                    If ((Arg2 == 0x05))
+                    {
+                        Return (Zero)
+                    }
                 }

                 Return (Buffer (One)
diff -ru /tmp/old/tests/data/acpi/virt/DSDT.dsl /tmp/new/tests/data/acpi/virt/DSDT.dsl
--- /tmp/old/tests/data/acpi/virt/DSDT.dsl      2020-12-23 15:49:57.421095066 +0800
+++ /tmp/new/tests/data/acpi/virt/DSDT.dsl      2020-12-23 15:55:12.267792771 +0800
@@ -9,9 +9,9 @@
  *
  * Original Table Header:
  *     Signature        "DSDT"
- *     Length           0x0000144C (5196)
+ *     Length           0x00001454 (5204)
  *     Revision         0x02
- *     Checksum         0xF0
+ *     Checksum         0x60
  *     OEM ID           "BOCHS "
  *     OEM Table ID     "BXPCDSDT"
  *     OEM Revision     0x00000001 (1)
@@ -1838,9 +1838,14 @@
                     {
                         Return (Buffer (One)
                         {
-                             0x01                                             // .
+                             0x21                                             // !
                         })
                     }
+
+                    If ((Arg2 == 0x05))
+                    {
+                        Return (Zero)
+                    }
                 }

                 Return (Buffer (One)
diff -ru /tmp/old/tests/data/acpi/virt/DSDT.memhp.dsl /tmp/new/tests/data/acpi/virt/DSDT.memhp.dsl
--- /tmp/old/tests/data/acpi/virt/DSDT.memhp.dsl        2020-12-23 15:49:57.421095066 +0800
+++ /tmp/new/tests/data/acpi/virt/DSDT.memhp.dsl        2020-12-23 15:55:12.277793302 +0800
@@ -9,9 +9,9 @@
  *
  * Original Table Header:
  *     Signature        "DSDT"
- *     Length           0x0000199D (6557)
+ *     Length           0x000019A5 (6565)
  *     Revision         0x02
- *     Checksum         0x11
+ *     Checksum         0x90
  *     OEM ID           "BOCHS "
  *     OEM Table ID     "BXPCDSDT"
  *     OEM Revision     0x00000001 (1)
@@ -1840,9 +1840,14 @@
                     {
                         Return (Buffer (One)
                         {
-                             0x01                                             // .
+                             0x21                                             // !
                         })
                     }
+
+                    If ((Arg2 == 0x05))
+                    {
+                        Return (Zero)
+                    }
                 }

                 Return (Buffer (One)
diff -ru /tmp/old/tests/data/acpi/virt/DSDT.numamem.dsl /tmp/new/tests/data/acpi/virt/DSDT.numamem.dsl
--- /tmp/old/tests/data/acpi/virt/DSDT.numamem.dsl      2020-12-23 15:49:57.431095596 +0800
+++ /tmp/new/tests/data/acpi/virt/DSDT.numamem.dsl      2020-12-23 15:55:12.287793832 +0800
@@ -9,9 +9,9 @@
  *
  * Original Table Header:
  *     Signature        "DSDT"
- *     Length           0x0000144C (5196)
+ *     Length           0x00001454 (5204)
  *     Revision         0x02
- *     Checksum         0xF0
+ *     Checksum         0x60
  *     OEM ID           "BOCHS "
  *     OEM Table ID     "BXPCDSDT"
  *     OEM Revision     0x00000001 (1)
@@ -1838,9 +1838,14 @@
                     {
                         Return (Buffer (One)
                         {
-                             0x01                                             // .
+                             0x21                                             // !
                         })
                     }
+
+                    If ((Arg2 == 0x05))
+                    {
+                        Return (Zero)
+                    }
                 }

                 Return (Buffer (One)
diff -ru /tmp/old/tests/data/acpi/virt/DSDT.pxb.dsl /tmp/new/tests/data/acpi/virt/DSDT.pxb.dsl
--- /tmp/old/tests/data/acpi/virt/DSDT.pxb.dsl  2020-12-23 15:49:57.441096126 +0800
+++ /tmp/new/tests/data/acpi/virt/DSDT.pxb.dsl  2020-12-23 15:55:12.287793832 +0800
@@ -9,9 +9,9 @@
  *
  * Original Table Header:
  *     Signature        "DSDT"
- *     Length           0x00001E7A (7802)
+ *     Length           0x00001E09 (7689)
  *     Revision         0x02
- *     Checksum         0x57
+ *     Checksum         0x30
  *     OEM ID           "BOCHS "
  *     OEM Table ID     "BXPCDSDT"
  *     OEM Revision     0x00000001 (1)
@@ -45,32 +45,6 @@
             })
         }

-        Device (FLS0)
-        {
-            Name (_HID, "LNRO0015")  // _HID: Hardware ID
-            Name (_UID, Zero)  // _UID: Unique ID
-            Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
-            {
-                Memory32Fixed (ReadWrite,
-                    0x00000000,         // Address Base
-                    0x04000000,         // Address Length
-                    )
-            })
-        }
-
-        Device (FLS1)
-        {
-            Name (_HID, "LNRO0015")  // _HID: Hardware ID
-            Name (_UID, One)  // _UID: Unique ID
-            Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
-            {
-                Memory32Fixed (ReadWrite,
-                    0x04000000,         // Address Base
-                    0x04000000,         // Address Length
-                    )
-            })
-        }
-
         Device (FWCF)
         {
             Name (_HID, "QEMU0002")  // _HID: Hardware ID
@@ -661,16 +635,15 @@
             })
         }

-        Device (PC80)
+        Device (PCI0)
         {
             Name (_HID, "PNP0A08" /* PCI Express Bus */)  // _HID: Hardware ID
             Name (_CID, "PNP0A03" /* PCI Bus */)  // _CID: Compatible ID
-            Name (_ADR, Zero)  // _ADR: Address
-            Name (_CCA, One)  // _CCA: Cache Coherency Attribute
             Name (_SEG, Zero)  // _SEG: PCI Segment
-            Name (_BBN, 0x80)  // _BBN: BIOS Bus Number
-            Name (_UID, 0x80)  // _UID: Unique ID
-            Name (_STR, Unicode ("pxb Device"))  // _STR: Description String
+            Name (_BBN, Zero)  // _BBN: BIOS Bus Number
+            Name (_UID, Zero)  // _UID: Unique ID
+            Name (_STR, Unicode ("PCIe 0 Device"))  // _STR: Description String
+            Name (_CCA, One)  // _CCA: Cache Coherency Attribute
             Name (_PRT, Package (0x80)  // _PRT: PCI Routing Table
             {
                 Package (0x04)
@@ -1789,15 +1762,41 @@
                 }
             }

+            Method (_CBA, 0, NotSerialized)  // _CBA: Configuration Base Address
+            {
+                Return (0x0000004010000000)
+            }
+
             Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
             {
                 WordBusNumber (ResourceProducer, MinFixed, MaxFixed, PosDecode,
                     0x0000,             // Granularity
-                    0x0080,             // Range Minimum
-                    0x0080,             // Range Maximum
+                    0x0000,             // Range Minimum
+                    0x007F,             // Range Maximum
                     0x0000,             // Translation Offset
-                    0x0001,             // Length
+                    0x0080,             // Length
                     ,, )
+                DWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite,
+                    0x00000000,         // Granularity
+                    0x10000000,         // Range Minimum
+                    0x3EFEFFFF,         // Range Maximum
+                    0x00000000,         // Translation Offset
+                    0x2EFF0000,         // Length
+                    ,, , AddressRangeMemory, TypeStatic)
+                DWordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
+                    0x00000000,         // Granularity
+                    0x00000000,         // Range Minimum
+                    0x0000FFFF,         // Range Maximum
+                    0x3EFF0000,         // Translation Offset
+                    0x00010000,         // Length
+                    ,, , TypeStatic, DenseTranslation)
+                QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite,
+                    0x0000000000000000, // Granularity
+                    0x0000008000000000, // Range Minimum
+                    0x000000FFFFFFFFFF, // Range Maximum
+                    0x0000000000000000, // Translation Offset
+                    0x0000008000000000, // Length
+                    ,, , AddressRangeMemory, TypeStatic)
             })
             Name (SUPP, Zero)
             Name (CTRL, Zero)
@@ -1808,8 +1807,8 @@
                 {
                     CreateDWordField (Arg3, 0x04, CDW2)
                     CreateDWordField (Arg3, 0x08, CDW3)
-                    SUPP = CDW2 /* \_SB_.PC80._OSC.CDW2 */
-                    CTRL = CDW3 /* \_SB_.PC80._OSC.CDW3 */
+                    SUPP = CDW2 /* \_SB_.PCI0._OSC.CDW2 */
+                    CTRL = CDW3 /* \_SB_.PCI0._OSC.CDW3 */
                     CTRL &= 0x1F
                     If ((Arg1 != One))
                     {
@@ -1821,7 +1820,7 @@
                         CDW1 |= 0x10
                     }

-                    CDW3 = CTRL /* \_SB_.PC80.CTRL */
+                    CDW3 = CTRL /* \_SB_.PCI0.CTRL */
                     Return (Arg3)
                 }
                 Else
@@ -1839,9 +1838,14 @@
                     {
                         Return (Buffer (One)
                         {
-                             0x01                                             // .
+                             0x21                                             // !
                         })
                     }
+
+                    If ((Arg2 == 0x05))
+                    {
+                        Return (Zero)
+                    }
                 }

                 Return (Buffer (One)
@@ -1849,17 +1853,30 @@
                      0x00                                             // .
                 })
             }
+
+            Device (RES0)
+            {
+                Name (_HID, "PNP0C02" /* PNP Motherboard Resources */)  // _HID: Hardware ID
+                Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
+                {
+                    QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite,
+                        0x0000000000000000, // Granularity
+                        0x0000004010000000, // Range Minimum
+                        0x000000401FFFFFFF, // Range Maximum
+                        0x0000000000000000, // Translation Offset
+                        0x0000000010000000, // Length
+                        ,, , AddressRangeMemory, TypeStatic)
+                })
+            }
         }

-        Device (PCI0)
+        Device (PC80)
         {
             Name (_HID, "PNP0A08" /* PCI Express Bus */)  // _HID: Hardware ID
             Name (_CID, "PNP0A03" /* PCI Bus */)  // _CID: Compatible ID
-            Name (_SEG, Zero)  // _SEG: PCI Segment
-            Name (_BBN, Zero)  // _BBN: BIOS Bus Number
-            Name (_UID, "PCI0")  // _UID: Unique ID
-            Name (_STR, Unicode ("PCIe 0 Device"))  // _STR: Description String
-            Name (_CCA, One)  // _CCA: Cache Coherency Attribute
+            Name (_BBN, 0x80)  // _BBN: BIOS Bus Number
+            Name (_UID, 0x80)  // _UID: Unique ID
+            Name (_STR, Unicode ("pxb Device"))  // _STR: Description String
             Name (_PRT, Package (0x80)  // _PRT: PCI Routing Table
             {
                 Package (0x04)
@@ -2978,46 +2995,16 @@
                 }
             }

-            Method (_CBA, 0, NotSerialized)  // _CBA: Configuration Base Address
-            {
-                Return (0x0000004010000000)
-            }
-
-            Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource Settings
+            Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
             {
-                Return (ResourceTemplate ()
-                {
-                    WordBusNumber (ResourceProducer, MinFixed, MaxFixed, PosDecode,
-                        0x0000,             // Granularity
-                        0x0000,             // Range Minimum
-                        0x007F,             // Range Maximum
-                        0x0000,             // Translation Offset
-                        0x0080,             // Length
-                        ,, )
-                    DWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite,
-                        0x00000000,         // Granularity
-                        0x10000000,         // Range Minimum
-                        0x3EFEFFFF,         // Range Maximum
-                        0x00000000,         // Translation Offset
-                        0x2EFF0000,         // Length
-                        ,, , AddressRangeMemory, TypeStatic)
-                    DWordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
-                        0x00000000,         // Granularity
-                        0x00000000,         // Range Minimum
-                        0x0000FFFF,         // Range Maximum
-                        0x3EFF0000,         // Translation Offset
-                        0x00010000,         // Length
-                        ,, , TypeStatic, DenseTranslation)
-                    QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite,
-                        0x0000000000000000, // Granularity
-                        0x0000008000000000, // Range Minimum
-                        0x000000FFFFFFFFFF, // Range Maximum
-                        0x0000000000000000, // Translation Offset
-                        0x0000008000000000, // Length
-                        ,, , AddressRangeMemory, TypeStatic)
-                })
-            }
-
+                WordBusNumber (ResourceProducer, MinFixed, MaxFixed, PosDecode,
+                    0x0000,             // Granularity
+                    0x0080,             // Range Minimum
+                    0x0080,             // Range Maximum
+                    0x0000,             // Translation Offset
+                    0x0001,             // Length
+                    ,, )
+            })
             Name (SUPP, Zero)
             Name (CTRL, Zero)
             Method (_OSC, 4, NotSerialized)  // _OSC: Operating System Capabilities
@@ -3027,8 +3014,8 @@
                 {
                     CreateDWordField (Arg3, 0x04, CDW2)
                     CreateDWordField (Arg3, 0x08, CDW3)
-                    SUPP = CDW2 /* \_SB_.PCI0._OSC.CDW2 */
-                    CTRL = CDW3 /* \_SB_.PCI0._OSC.CDW3 */
+                    SUPP = CDW2 /* \_SB_.PC80._OSC.CDW2 */
+                    CTRL = CDW3 /* \_SB_.PC80._OSC.CDW3 */
                     CTRL &= 0x1F
                     If ((Arg1 != One))
                     {
@@ -3040,7 +3027,7 @@
                         CDW1 |= 0x10
                     }

-                    CDW3 = CTRL /* \_SB_.PCI0.CTRL */
+                    CDW3 = CTRL /* \_SB_.PC80.CTRL */
                     Return (Arg3)
                 }
                 Else
@@ -3058,9 +3045,14 @@
                     {
                         Return (Buffer (One)
                         {
-                             0x01                                             // .
+                             0x21                                             // !
                         })
                     }
+
+                    If ((Arg2 == 0x05))
+                    {
+                        Return (Zero)
+                    }
                 }

                 Return (Buffer (One)
@@ -3068,21 +3060,6 @@
                      0x00                                             // .
                 })
             }
-
-            Device (RES0)
-            {
-                Name (_HID, "PNP0C02" /* PNP Motherboard Resources */)  // _HID: Hardware ID
-                Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
-                {
-                    QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite,
-                        0x0000000000000000, // Granularity
-                        0x0000004010000000, // Range Minimum
-                        0x000000401FFFFFFF, // Range Maximum
-                        0x0000000000000000, // Translation Offset
-                        0x0000000010000000, // Length
-                        ,, , AddressRangeMemory, TypeStatic)
-                })
-            }
         }

         Device (\_SB.GED)

Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
---
 tests/data/acpi/microvm/DSDT.pcie           | Bin 3023 -> 3031 bytes
 tests/data/acpi/virt/DSDT                   | Bin 5196 -> 5204 bytes
 tests/data/acpi/virt/DSDT.memhp             | Bin 6557 -> 6565 bytes
 tests/data/acpi/virt/DSDT.numamem           | Bin 5196 -> 5204 bytes
 tests/data/acpi/virt/DSDT.pxb               | Bin 7802 -> 7689 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   5 -----
 6 files changed, 5 deletions(-)

diff --git a/tests/data/acpi/microvm/DSDT.pcie b/tests/data/acpi/microvm/DSDT.pcie
index 4b765541e372f4ba4e25529c14acf696516c8f61..e590b98f9960025f75dd0544492d3088781406dc 100644
GIT binary patch
delta 59
zcmV-B0L1^#7uOdGL{mgm*9!mu0-2Et8v;SPu_reH0Z6l70pke>HD5$iO$4ARlS&I8
R2_c{dlWGbDqyUp@3uOk*5ZC|!

delta 51
zcmcaEeqNl*CD<k8JU0UaljcM&X(rE|8`aGj867sqGd||z(2aKq_GMY1IN6I^o{@2K
H5qBy8b(js<

diff --git a/tests/data/acpi/virt/DSDT b/tests/data/acpi/virt/DSDT
index bc519abff9cadc1552e4e586b0a3f5f0db498f4a..ea8a0869af1637ab75fe335e100256a2acf85e16 100644
GIT binary patch
delta 58
zcmX@3aYcj6CD<h-M1+BXDPba)G-LlpHAzk;w-t*WIk`AY<6VM%Sr%wc_7s-qR9wJ5
OIg5*R3B%+};l}{T<qzlp

delta 50
zcmcbjaYlp7CD<jzM}&ca>BB@WX~y=AYLc8xe#;j-a&mF##=8XjvMf-X>?thI$T+!B
G_%Q%yeGTXU

diff --git a/tests/data/acpi/virt/DSDT.memhp b/tests/data/acpi/virt/DSDT.memhp
index 54728e2b4b8b959f3f829386f6a388ef2600e747..897648637cc6c8af47c67a9a349477c0240f833b 100644
GIT binary patch
delta 60
zcmV-C0K@;CGo>>ML{mgmr5OMK0+5jk8v=lsu_qV_0!FB#K?w>7HD5$iO$4ARlSvaF
S2_c{dlWGbDqyV#N6Ep_<;t(eQ

delta 52
zcmZ2#JlB}ZCD<iot|S8kli)-yX{L_p8`UK^nf#V7cI4#Z(2aKq_GMY1IN4KJo{@2L
Ip|A-X0Bmp#CjbBd

diff --git a/tests/data/acpi/virt/DSDT.numamem b/tests/data/acpi/virt/DSDT.numamem
index bc519abff9cadc1552e4e586b0a3f5f0db498f4a..ea8a0869af1637ab75fe335e100256a2acf85e16 100644
GIT binary patch
delta 58
zcmX@3aYcj6CD<h-M1+BXDPba)G-LlpHAzk;w-t*WIk`AY<6VM%Sr%wc_7s-qR9wJ5
OIg5*R3B%+};l}{T<qzlp

delta 50
zcmcbjaYlp7CD<jzM}&ca>BB@WX~y=AYLc8xe#;j-a&mF##=8XjvMf-X>?thI$T+!B
G_%Q%yeGTXU

diff --git a/tests/data/acpi/virt/DSDT.pxb b/tests/data/acpi/virt/DSDT.pxb
index d5f0533a02d62bc2ae2db9b9de9484e5c06652fe..f380e6023e7fc209a43280d71bf8a91286e35b63 100644
GIT binary patch
delta 127
zcmexm(`m!y66_MfDaXLTWH6CSnyKa9MD>J?Z+CF8x~&Ls_ME(mQ+{%XF#qN`!fQEL
zIpUpzf;R`r$TD)6#=8XjvMkV?Y%46!sknfBauyfs5{AjsWy2VqCokhv-h7)giW5Z@
QFI?3UL1xCuuVg0x0IJI-00000

delta 259
zcmeCQ`DMf966_LECC9+P6h4tlnyK~aMD>JvLpPsb1CDqPPZwSvzaW1D14B~=j`&bd
z7a-{z6f7vn#n;Bkz{mgrER_t=O&~3X*t9UBYheNE0&B5w3wL&dXbE)n4K*+@Fanw%
z9OAff!F&!z-^nM0mH8cAf<SgUI|7Xfc6HzULO7O_LpR<f*q3F2;$&Z0c}~V9g3OFS
v2e?mm;51_84RH1}*j&W9mV;Bo5o|NVlFceYvWzI27#SxQ3WrUuk?jQl7Wq8e

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index 42418e58e7..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,6 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/microvm/DSDT.pcie",
-"tests/data/acpi/virt/DSDT",
-"tests/data/acpi/virt/DSDT.memhp",
-"tests/data/acpi/virt/DSDT.numamem",
-"tests/data/acpi/virt/DSDT.pxb",
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/8] acpi: Add addr offset in build_crs
  2020-12-23  9:08 ` [PATCH v3 2/8] acpi: Add addr offset in build_crs Jiahui Cen
@ 2020-12-29 13:36   ` Igor Mammedov
  2020-12-31  3:26     ` Jiahui Cen
  0 siblings, 1 reply; 26+ messages in thread
From: Igor Mammedov @ 2020-12-29 13:36 UTC (permalink / raw)
  To: Jiahui Cen
  Cc: xieyingtai, Eduardo Habkost, Michael S. Tsirkin, Ard Biesheuvel,
	Richard Henderson, qemu-devel, Paolo Bonzini, Laszlo Ersek,
	wu.wubin

On Wed, 23 Dec 2020 17:08:30 +0800
Jiahui Cen <cenjiahui@huawei.com> wrote:

> AML needs Address Translation offset to describe how a bridge translates
> addresses accross the bridge when using an address descriptor, and
> especially on ARM, the translation offset of pio resource is usually
> non zero.

could you point out where in patch [8/8] it becomes non zero?

> 
> Therefore, it's necessary to pass offset for pio, mmio32, mmio64 and bus
> number into build_crs.
> 
> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
> ---
>  hw/acpi/aml-build.c         | 18 ++++++++++--------
>  hw/i386/acpi-build.c        |  3 ++-
>  hw/pci-host/gpex-acpi.c     |  3 ++-
>  include/hw/acpi/aml-build.h |  4 +++-
>  4 files changed, 17 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index f976aa667b..7b6ebb0cc8 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -2076,7 +2076,9 @@ void build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog)
>                   tpm2_ptr, "TPM2", table_data->len - tpm2_start, 4, NULL, NULL);
>  }
>  
> -Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
> +Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set, uint32_t io_offset,
> +               uint32_t mmio32_offset, uint64_t mmio64_offset,
> +               uint16_t bus_nr_offset)
>  {
>      Aml *crs = aml_resource_template();
>      CrsRangeSet temp_range_set;
> @@ -2189,10 +2191,10 @@ Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
>      for (i = 0; i < temp_range_set.io_ranges->len; i++) {
>          entry = g_ptr_array_index(temp_range_set.io_ranges, i);
>          aml_append(crs,
> -                   aml_word_io(AML_MIN_FIXED, AML_MAX_FIXED,
> -                               AML_POS_DECODE, AML_ENTIRE_RANGE,
> -                               0, entry->base, entry->limit, 0,
> -                               entry->limit - entry->base + 1));
> +                   aml_dword_io(AML_MIN_FIXED, AML_MAX_FIXED,
> +                                AML_POS_DECODE, AML_ENTIRE_RANGE,
> +                                0, entry->base, entry->limit, io_offset,
> +                                entry->limit - entry->base + 1));
>          crs_range_insert(range_set->io_ranges, entry->base, entry->limit);
>      }
>  
> @@ -2205,7 +2207,7 @@ Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
>                     aml_dword_memory(AML_POS_DECODE, AML_MIN_FIXED,
>                                      AML_MAX_FIXED, AML_NON_CACHEABLE,
>                                      AML_READ_WRITE,
> -                                    0, entry->base, entry->limit, 0,
> +                                    0, entry->base, entry->limit, mmio32_offset,
>                                      entry->limit - entry->base + 1));
>          crs_range_insert(range_set->mem_ranges, entry->base, entry->limit);
>      }
> @@ -2217,7 +2219,7 @@ Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
>                     aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
>                                      AML_MAX_FIXED, AML_NON_CACHEABLE,
>                                      AML_READ_WRITE,
> -                                    0, entry->base, entry->limit, 0,
> +                                    0, entry->base, entry->limit, mmio64_offset,
>                                      entry->limit - entry->base + 1));
>          crs_range_insert(range_set->mem_64bit_ranges,
>                           entry->base, entry->limit);
> @@ -2230,7 +2232,7 @@ Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
>                              0,
>                              pci_bus_num(host->bus),
>                              max_bus,
> -                            0,
> +                            bus_nr_offset,
>                              max_bus - pci_bus_num(host->bus) + 1));
>  
>      return crs;
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index f18b71dea9..f56d699c7f 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1360,7 +1360,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>              }
>  
>              aml_append(dev, build_prt(false));
> -            crs = build_crs(PCI_HOST_BRIDGE(BUS(bus)->parent), &crs_range_set);
> +            crs = build_crs(PCI_HOST_BRIDGE(BUS(bus)->parent), &crs_range_set,
> +                            0, 0, 0, 0);
>              aml_append(dev, aml_name_decl("_CRS", crs));
>              aml_append(scope, dev);
>              aml_append(dsdt, scope);
> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
> index 7f20ee1c98..11b3db8f71 100644
> --- a/hw/pci-host/gpex-acpi.c
> +++ b/hw/pci-host/gpex-acpi.c
> @@ -168,7 +168,8 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
>               * 1. The resources the pci-brige/pcie-root-port need.
>               * 2. The resources the devices behind pxb need.
>               */
> -            crs = build_crs(PCI_HOST_BRIDGE(BUS(bus)->parent), &crs_range_set);
> +            crs = build_crs(PCI_HOST_BRIDGE(BUS(bus)->parent), &crs_range_set,
> +                            cfg->pio.base, 0, 0, 0);
>              aml_append(dev, aml_name_decl("_CRS", crs));
>  
>              acpi_dsdt_add_pci_osc(dev);
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index e727bea1bc..54a5aec4d7 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -452,7 +452,9 @@ void crs_replace_with_free_ranges(GPtrArray *ranges,
>  void crs_range_set_init(CrsRangeSet *range_set);
>  void crs_range_set_free(CrsRangeSet *range_set);
>  
> -Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set);
> +Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set, uint32_t io_offset,
> +               uint32_t mmio32_offset, uint64_t mmio64_offset,
> +               uint16_t bus_nr_offset);
>  
>  void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
>                         uint64_t len, int node, MemoryAffinityFlags flags);



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/8] acpi/gpex: Inform os to keep firmware resource map
  2020-12-23  9:08 ` [PATCH v3 3/8] acpi/gpex: Inform os to keep firmware resource map Jiahui Cen
@ 2020-12-29 13:41   ` Igor Mammedov
  2020-12-30 21:22     ` Michael S. Tsirkin
  2020-12-31  3:30     ` Jiahui Cen
  0 siblings, 2 replies; 26+ messages in thread
From: Igor Mammedov @ 2020-12-29 13:41 UTC (permalink / raw)
  To: Jiahui Cen
  Cc: xieyingtai, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Ard Biesheuvel, Paolo Bonzini,
	Laszlo Ersek, wu.wubin

On Wed, 23 Dec 2020 17:08:31 +0800
Jiahui Cen <cenjiahui@huawei.com> wrote:

> There may be some differences in pci resource assignment between guest os
> and firmware.
> 
> Eg. A Bridge with Bus [d2]
>     -+-[0000:d2]---01.0-[d3]----01.0
> 
>     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
>           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
>                                           BAR4 (mem, 64-bit, pref) [size=64M]
> 
>     In EDK2, the Resource Map would be:
>         PciBus: Resource Map for Bridge [D2|01|00]
>         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
>            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
>            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
>         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
>     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
> 
>     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
>     the PMem64 size, which would be 0x6000000. So kernel would try to
>     allocate 0x6000000 from the PMem64 resource window, but since the window
>     size is 0x4100000 as assigned by EDK2, the allocation would fail.
> 
> The diffences could result in resource assignment failure.
> 
> Using _DSM #5 method to inform guest os not to ignore the PCI configuration
> that firmware has done at boot time could handle the differences.

I'm not sure about this one, 
OS should able to reconfigure PCI resources according to what and where is plugged
(and it even more true is hotplug is taken into account)

> 
> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
> ---
>  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
> index 11b3db8f71..c189306599 100644
> --- a/hw/pci-host/gpex-acpi.c
> +++ b/hw/pci-host/gpex-acpi.c
> @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
>      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
>      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
>      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
> -    uint8_t byte_list[1] = {1};
> -    buf = aml_buffer(1, byte_list);
> +    uint8_t byte_list[] = {
> +                0x1 << 0 /* support for functions other than function 0 */ |
> +                0x1 << 5 /* support for function 5 */
> +                };
> +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
>      aml_append(ifctx1, aml_return(buf));
>      aml_append(ifctx, ifctx1);
> +
> +    /* PCI Firmware Specification 3.1
> +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
> +     */
> +    /* Arg2: Function Index: 5 */
> +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
> +    /* 0 - The operating system must not ignore the PCI configuration that
> +     *     firmware has done at boot time.
> +     */
> +    aml_append(ifctx1, aml_return(aml_int(0)));
> +    aml_append(ifctx, ifctx1);
>      aml_append(method, ifctx);
>  
>      byte_list[0] = 0;



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 5/8] acpi/gpex: Append pxb devs in ascending order
  2020-12-23  9:08 ` [PATCH v3 5/8] acpi/gpex: Append pxb devs in ascending order Jiahui Cen
@ 2020-12-29 13:47   ` Igor Mammedov
  2020-12-30 21:17     ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread
From: Igor Mammedov @ 2020-12-29 13:47 UTC (permalink / raw)
  To: Jiahui Cen
  Cc: xieyingtai, Eduardo Habkost, Michael S. Tsirkin, Ard Biesheuvel,
	Richard Henderson, qemu-devel, Paolo Bonzini, Laszlo Ersek,
	wu.wubin

On Wed, 23 Dec 2020 17:08:33 +0800
Jiahui Cen <cenjiahui@huawei.com> wrote:

> The overlap check of IO resource window would fail when Linux kernel
> registers an IO resource [b, c) earlier than another resource [a, b).
> Though this incorrect check could be fixed by [1], it would be better to
> append pxb devs into DSDT table in ascending order.
> 
> [1]: https://lore.kernel.org/lkml/20201218062335.5320-1-cenjiahui@huawei.com/

considering there is acceptable fix for kernel I'd rather avoid
workarounds on QEMU side. So I suggest dropping this patch.

it also should reduce noise in [8/8] masking other changes.

> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
> ---
>  hw/pci-host/gpex-acpi.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
> index 4bf1e94309..95a7a0f12b 100644
> --- a/hw/pci-host/gpex-acpi.c
> +++ b/hw/pci-host/gpex-acpi.c
> @@ -141,7 +141,7 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
>  void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
>  {
>      int nr_pcie_buses = cfg->ecam.size / PCIE_MMCFG_SIZE_MIN;
> -    Aml *method, *crs, *dev, *rbuf;
> +    Aml *method, *crs, *dev, *rbuf, *pxb_devs[nr_pcie_buses];
>      PCIBus *bus = cfg->bus;
>      CrsRangeSet crs_range_set;
>      CrsRangeEntry *entry;
> @@ -149,6 +149,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
>  
>      /* start to construct the tables for pxb */
>      crs_range_set_init(&crs_range_set);
> +    memset(pxb_devs, 0, sizeof(pxb_devs));
>      if (bus) {
>          QLIST_FOREACH(bus, &bus->child, sibling) {
>              uint8_t bus_num = pci_bus_num(bus);
> @@ -190,7 +191,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
>  
>              acpi_dsdt_add_pci_osc(dev);
>  
> -            aml_append(scope, dev);
> +            pxb_devs[bus_num] = dev;
>          }
>      }
>  
> @@ -278,5 +279,11 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
>      aml_append(dev, dev_res0);
>      aml_append(scope, dev);
>  
> +    for (i = 0; i < ARRAY_SIZE(pxb_devs); i++) {
> +        if (pxb_devs[i]) {
> +            aml_append(scope, pxb_devs[i]);
> +        }
> +    }
> +
>      crs_range_set_free(&crs_range_set);
>  }



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 6/8] Kconfig: Enable PXB for ARM_VIRT by default
  2020-12-23  9:08 ` [PATCH v3 6/8] Kconfig: Enable PXB for ARM_VIRT by default Jiahui Cen
@ 2020-12-29 13:50   ` Igor Mammedov
  2020-12-31  7:35     ` Jiahui Cen
  0 siblings, 1 reply; 26+ messages in thread
From: Igor Mammedov @ 2020-12-29 13:50 UTC (permalink / raw)
  To: Jiahui Cen
  Cc: xieyingtai, Eduardo Habkost, Michael S. Tsirkin, Ard Biesheuvel,
	Richard Henderson, qemu-devel, Paolo Bonzini, Laszlo Ersek,
	wu.wubin

On Wed, 23 Dec 2020 17:08:34 +0800
Jiahui Cen <cenjiahui@huawei.com> wrote:
subj
s/by default//
s/enable/compile/

> PXB is now supported on ARM, so let's enable it by default.
s/it by default/for arm_virt machine/
s/enable/compile/

> 
> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
> ---
>  hw/pci-bridge/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/pci-bridge/Kconfig b/hw/pci-bridge/Kconfig
> index a51ec716f5..f8df4315ba 100644
> --- a/hw/pci-bridge/Kconfig
> +++ b/hw/pci-bridge/Kconfig
> @@ -5,7 +5,7 @@ config PCIE_PORT
>  
>  config PXB
>      bool
> -    default y if Q35
> +    default y if Q35 || ARM_VIRT
>  
>  config XIO3130
>      bool



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 5/8] acpi/gpex: Append pxb devs in ascending order
  2020-12-29 13:47   ` Igor Mammedov
@ 2020-12-30 21:17     ` Michael S. Tsirkin
  2020-12-31  7:34       ` Jiahui Cen
  2021-01-05  0:21       ` Igor Mammedov
  0 siblings, 2 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2020-12-30 21:17 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: xieyingtai, Jiahui Cen, Eduardo Habkost, Ard Biesheuvel,
	Richard Henderson, qemu-devel, Paolo Bonzini, Laszlo Ersek,
	wu.wubin

On Tue, Dec 29, 2020 at 02:47:35PM +0100, Igor Mammedov wrote:
> On Wed, 23 Dec 2020 17:08:33 +0800
> Jiahui Cen <cenjiahui@huawei.com> wrote:
> 
> > The overlap check of IO resource window would fail when Linux kernel
> > registers an IO resource [b, c) earlier than another resource [a, b).
> > Though this incorrect check could be fixed by [1], it would be better to
> > append pxb devs into DSDT table in ascending order.
> > 
> > [1]: https://lore.kernel.org/lkml/20201218062335.5320-1-cenjiahui@huawei.com/
> 
> considering there is acceptable fix for kernel I'd rather avoid
> workarounds on QEMU side. So I suggest dropping this patch.

Well there's something to be said for a defined order of things.
And patch is from Dec 2020 will take ages for guests to be fixed,
and changing pci core on stable kernels is risky and needs
a ton of testing, not done eaily ...
Which guests are affected by the bug?

There are also some issues with the patch see below.

> it also should reduce noise in [8/8] masking other changes.
> 
> > Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
> > ---
> >  hw/pci-host/gpex-acpi.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
> > index 4bf1e94309..95a7a0f12b 100644
> > --- a/hw/pci-host/gpex-acpi.c
> > +++ b/hw/pci-host/gpex-acpi.c
> > @@ -141,7 +141,7 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
> >  void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
> >  {
> >      int nr_pcie_buses = cfg->ecam.size / PCIE_MMCFG_SIZE_MIN;
> > -    Aml *method, *crs, *dev, *rbuf;
> > +    Aml *method, *crs, *dev, *rbuf, *pxb_devs[nr_pcie_buses];

dynamically sized array on stack poses security issues

> >      PCIBus *bus = cfg->bus;
> >      CrsRangeSet crs_range_set;
> >      CrsRangeEntry *entry;
> > @@ -149,6 +149,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
> >  
> >      /* start to construct the tables for pxb */
> >      crs_range_set_init(&crs_range_set);
> > +    memset(pxb_devs, 0, sizeof(pxb_devs));
> >      if (bus) {
> >          QLIST_FOREACH(bus, &bus->child, sibling) {
> >              uint8_t bus_num = pci_bus_num(bus);
> > @@ -190,7 +191,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
> >  
> >              acpi_dsdt_add_pci_osc(dev);
> >  
> > -            aml_append(scope, dev);
> > +            pxb_devs[bus_num] = dev;

If bus numbers intersect this will overwrite old one.
I'd rather not worry about it, just have an array
of structs with bus numbers and sort it.


> >          }
> >      }
> >  
> > @@ -278,5 +279,11 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
> >      aml_append(dev, dev_res0);
> >      aml_append(scope, dev);
> >  
> > +    for (i = 0; i < ARRAY_SIZE(pxb_devs); i++) {
> > +        if (pxb_devs[i]) {
> > +            aml_append(scope, pxb_devs[i]);
> > +        }
> > +    }


so this sorts them by bus number not by io address.
Probably happens to help since bios numbers them in the same order ...
Is there a way to address this more robustly in case
bios changes? E.g. I see the bug is only in PIO so sort by that address maybe?

Also pls add a code comment explaining why we are doing all this
with link to patch, which guests are affected etc.

> > +
> >      crs_range_set_free(&crs_range_set);
> >  }



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/8] acpi/gpex: Inform os to keep firmware resource map
  2020-12-29 13:41   ` Igor Mammedov
@ 2020-12-30 21:22     ` Michael S. Tsirkin
  2020-12-31  8:22       ` Jiahui Cen
  2021-01-05  0:35       ` Igor Mammedov
  2020-12-31  3:30     ` Jiahui Cen
  1 sibling, 2 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2020-12-30 21:22 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: xieyingtai, Jiahui Cen, Eduardo Habkost, Richard Henderson,
	qemu-devel, Ard Biesheuvel, Paolo Bonzini, Laszlo Ersek,
	wu.wubin

On Tue, Dec 29, 2020 at 02:41:42PM +0100, Igor Mammedov wrote:
> On Wed, 23 Dec 2020 17:08:31 +0800
> Jiahui Cen <cenjiahui@huawei.com> wrote:
> 
> > There may be some differences in pci resource assignment between guest os
> > and firmware.
> > 
> > Eg. A Bridge with Bus [d2]
> >     -+-[0000:d2]---01.0-[d3]----01.0
> > 
> >     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
> >           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
> >                                           BAR4 (mem, 64-bit, pref) [size=64M]
> > 
> >     In EDK2, the Resource Map would be:
> >         PciBus: Resource Map for Bridge [D2|01|00]
> >         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
> >            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
> >            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
> >         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
> >     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
> > 
> >     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
> >     the PMem64 size, which would be 0x6000000. So kernel would try to
> >     allocate 0x6000000 from the PMem64 resource window, but since the window
> >     size is 0x4100000 as assigned by EDK2, the allocation would fail.
> > 
> > The diffences could result in resource assignment failure.
> > 
> > Using _DSM #5 method to inform guest os not to ignore the PCI configuration
> > that firmware has done at boot time could handle the differences.
> 
> I'm not sure about this one, 
> OS should able to reconfigure PCI resources according to what and where is plugged
> (and it even more true is hotplug is taken into account)

spec says this:

0: No (The operating system must not ignore the PCI configuration that firmware has done
at boot time. However, the operating system is free to configure the devices in this hierarchy
that have not been configured by the firmware. There may be a reduced level of hot plug
capability support in this hierarchy due to resource constraints. This situation is the same as
the legacy situation where this _DSM is not provided.)
1: Yes (The operating system may ignore the PCI configuration that the firmware has done
at boot time, and reconfigure/rebalance the resources in the hierarchy.)

and

IMPLEMENTATION NOTE
This _DSM function provides backwards compatibility on platforms that can run legacy operating
systems.
Operating systems for two different architectures (e.g., x86 and x64) can be installed on a platform.
The firmware cannot distinguish the operating system in time to change the boot configuration of
devices. Say for instance, an x86 operating system in non-PAE mode is installed on a system. The
x86 operating system cannot access device resource space above 4 GiB. So the firmware is required
to configure devices at boot time using addresses below 4 GiB. On the other hand, if an x64
operating system is installed on this system, it can access device resources above the 4 GiB so it does
not want the firmware to constrain the resource assignment below 4 GiB that the firmware
configures at boot time. It is not possible for the firmware to change this by the time it boots the
operating system. Ignoring the configurations done by firmware at boot time will allow the
operating system to push resource assignment using addresses above 4 GiB for an x64 operating
system while constrain it to addresses below 4 GiB for an x86 operating system.

so fundamentally, saying "1" here just means "you can ignore what
firmware configured if you like".


I have a different question though: our CRS etc is based on what
firmware configured. Is that ok? Or is ACPI expected to somehow
reconfigure itself when OS reconfigures devices?
Think it's ok but could not find documentation either way.


> > 
> > Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
> > ---
> >  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
> > index 11b3db8f71..c189306599 100644
> > --- a/hw/pci-host/gpex-acpi.c
> > +++ b/hw/pci-host/gpex-acpi.c
> > @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
> >      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
> >      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
> >      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
> > -    uint8_t byte_list[1] = {1};
> > -    buf = aml_buffer(1, byte_list);
> > +    uint8_t byte_list[] = {
> > +                0x1 << 0 /* support for functions other than function 0 */ |
> > +                0x1 << 5 /* support for function 5 */
> > +                };
> > +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
> >      aml_append(ifctx1, aml_return(buf));
> >      aml_append(ifctx, ifctx1);
> > +
> > +    /* PCI Firmware Specification 3.1
> > +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
> > +     */
> > +    /* Arg2: Function Index: 5 */
> > +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
> > +    /* 0 - The operating system must not ignore the PCI configuration that
> > +     *     firmware has done at boot time.
> > +     */
> > +    aml_append(ifctx1, aml_return(aml_int(0)));
> > +    aml_append(ifctx, ifctx1);
> >      aml_append(method, ifctx);
> >  
> >      byte_list[0] = 0;



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/8] acpi: Add addr offset in build_crs
  2020-12-29 13:36   ` Igor Mammedov
@ 2020-12-31  3:26     ` Jiahui Cen
  0 siblings, 0 replies; 26+ messages in thread
From: Jiahui Cen @ 2020-12-31  3:26 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: xieyingtai, Eduardo Habkost, Michael S. Tsirkin, Ard Biesheuvel,
	Richard Henderson, qemu-devel, Paolo Bonzini, Laszlo Ersek,
	wu.wubin



On 2020/12/29 21:36, Igor Mammedov wrote:
> On Wed, 23 Dec 2020 17:08:30 +0800
> Jiahui Cen <cenjiahui@huawei.com> wrote:
> 
>> AML needs Address Translation offset to describe how a bridge translates
>> addresses accross the bridge when using an address descriptor, and
>> especially on ARM, the translation offset of pio resource is usually
>> non zero.
> 
> could you point out where in patch [8/8] it becomes non zero?
> 

Actually the testcase is simple and there is no resource required by
the extra bus except bus number. So this patch seems not change the
expected files. Would it be better to add some devices under the extra
bus in the testcase?

BTW, there are several patches that changes the expected files and
make patch [8/8] messy. Should I separate patch [8/8] into different
patches to make it clear?

Thanks,
Jiahui

>>
>> Therefore, it's necessary to pass offset for pio, mmio32, mmio64 and bus
>> number into build_crs.
>>
>> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
>> ---
>>  hw/acpi/aml-build.c         | 18 ++++++++++--------
>>  hw/i386/acpi-build.c        |  3 ++-
>>  hw/pci-host/gpex-acpi.c     |  3 ++-
>>  include/hw/acpi/aml-build.h |  4 +++-
>>  4 files changed, 17 insertions(+), 11 deletions(-)
>>
>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> index f976aa667b..7b6ebb0cc8 100644
>> --- a/hw/acpi/aml-build.c
>> +++ b/hw/acpi/aml-build.c
>> @@ -2076,7 +2076,9 @@ void build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog)
>>                   tpm2_ptr, "TPM2", table_data->len - tpm2_start, 4, NULL, NULL);
>>  }
>>  
>> -Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
>> +Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set, uint32_t io_offset,
>> +               uint32_t mmio32_offset, uint64_t mmio64_offset,
>> +               uint16_t bus_nr_offset)
>>  {
>>      Aml *crs = aml_resource_template();
>>      CrsRangeSet temp_range_set;
>> @@ -2189,10 +2191,10 @@ Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
>>      for (i = 0; i < temp_range_set.io_ranges->len; i++) {
>>          entry = g_ptr_array_index(temp_range_set.io_ranges, i);
>>          aml_append(crs,
>> -                   aml_word_io(AML_MIN_FIXED, AML_MAX_FIXED,
>> -                               AML_POS_DECODE, AML_ENTIRE_RANGE,
>> -                               0, entry->base, entry->limit, 0,
>> -                               entry->limit - entry->base + 1));
>> +                   aml_dword_io(AML_MIN_FIXED, AML_MAX_FIXED,
>> +                                AML_POS_DECODE, AML_ENTIRE_RANGE,
>> +                                0, entry->base, entry->limit, io_offset,
>> +                                entry->limit - entry->base + 1));
>>          crs_range_insert(range_set->io_ranges, entry->base, entry->limit);
>>      }
>>  
>> @@ -2205,7 +2207,7 @@ Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
>>                     aml_dword_memory(AML_POS_DECODE, AML_MIN_FIXED,
>>                                      AML_MAX_FIXED, AML_NON_CACHEABLE,
>>                                      AML_READ_WRITE,
>> -                                    0, entry->base, entry->limit, 0,
>> +                                    0, entry->base, entry->limit, mmio32_offset,
>>                                      entry->limit - entry->base + 1));
>>          crs_range_insert(range_set->mem_ranges, entry->base, entry->limit);
>>      }
>> @@ -2217,7 +2219,7 @@ Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
>>                     aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
>>                                      AML_MAX_FIXED, AML_NON_CACHEABLE,
>>                                      AML_READ_WRITE,
>> -                                    0, entry->base, entry->limit, 0,
>> +                                    0, entry->base, entry->limit, mmio64_offset,
>>                                      entry->limit - entry->base + 1));
>>          crs_range_insert(range_set->mem_64bit_ranges,
>>                           entry->base, entry->limit);
>> @@ -2230,7 +2232,7 @@ Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
>>                              0,
>>                              pci_bus_num(host->bus),
>>                              max_bus,
>> -                            0,
>> +                            bus_nr_offset,
>>                              max_bus - pci_bus_num(host->bus) + 1));
>>  
>>      return crs;
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index f18b71dea9..f56d699c7f 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -1360,7 +1360,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>              }
>>  
>>              aml_append(dev, build_prt(false));
>> -            crs = build_crs(PCI_HOST_BRIDGE(BUS(bus)->parent), &crs_range_set);
>> +            crs = build_crs(PCI_HOST_BRIDGE(BUS(bus)->parent), &crs_range_set,
>> +                            0, 0, 0, 0);
>>              aml_append(dev, aml_name_decl("_CRS", crs));
>>              aml_append(scope, dev);
>>              aml_append(dsdt, scope);
>> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
>> index 7f20ee1c98..11b3db8f71 100644
>> --- a/hw/pci-host/gpex-acpi.c
>> +++ b/hw/pci-host/gpex-acpi.c
>> @@ -168,7 +168,8 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
>>               * 1. The resources the pci-brige/pcie-root-port need.
>>               * 2. The resources the devices behind pxb need.
>>               */
>> -            crs = build_crs(PCI_HOST_BRIDGE(BUS(bus)->parent), &crs_range_set);
>> +            crs = build_crs(PCI_HOST_BRIDGE(BUS(bus)->parent), &crs_range_set,
>> +                            cfg->pio.base, 0, 0, 0);
>>              aml_append(dev, aml_name_decl("_CRS", crs));
>>  
>>              acpi_dsdt_add_pci_osc(dev);
>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
>> index e727bea1bc..54a5aec4d7 100644
>> --- a/include/hw/acpi/aml-build.h
>> +++ b/include/hw/acpi/aml-build.h
>> @@ -452,7 +452,9 @@ void crs_replace_with_free_ranges(GPtrArray *ranges,
>>  void crs_range_set_init(CrsRangeSet *range_set);
>>  void crs_range_set_free(CrsRangeSet *range_set);
>>  
>> -Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set);
>> +Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set, uint32_t io_offset,
>> +               uint32_t mmio32_offset, uint64_t mmio64_offset,
>> +               uint16_t bus_nr_offset);
>>  
>>  void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
>>                         uint64_t len, int node, MemoryAffinityFlags flags);
> 
> .
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/8] acpi/gpex: Inform os to keep firmware resource map
  2020-12-29 13:41   ` Igor Mammedov
  2020-12-30 21:22     ` Michael S. Tsirkin
@ 2020-12-31  3:30     ` Jiahui Cen
  1 sibling, 0 replies; 26+ messages in thread
From: Jiahui Cen @ 2020-12-31  3:30 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: xieyingtai, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Ard Biesheuvel, Paolo Bonzini,
	Laszlo Ersek, wu.wubin



On 2020/12/29 21:41, Igor Mammedov wrote:
> On Wed, 23 Dec 2020 17:08:31 +0800
> Jiahui Cen <cenjiahui@huawei.com> wrote:
> 
>> There may be some differences in pci resource assignment between guest os
>> and firmware.
>>
>> Eg. A Bridge with Bus [d2]
>>     -+-[0000:d2]---01.0-[d3]----01.0
>>
>>     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
>>           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
>>                                           BAR4 (mem, 64-bit, pref) [size=64M]
>>
>>     In EDK2, the Resource Map would be:
>>         PciBus: Resource Map for Bridge [D2|01|00]
>>         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
>>            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
>>            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
>>         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
>>     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
>>
>>     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
>>     the PMem64 size, which would be 0x6000000. So kernel would try to
>>     allocate 0x6000000 from the PMem64 resource window, but since the window
>>     size is 0x4100000 as assigned by EDK2, the allocation would fail.
>>
>> The diffences could result in resource assignment failure.
>>
>> Using _DSM #5 method to inform guest os not to ignore the PCI configuration
>> that firmware has done at boot time could handle the differences.
> 
> I'm not sure about this one, 
> OS should able to reconfigure PCI resources according to what and where is plugged
> (and it even more true is hotplug is taken into account)

I think the problem is that OS can not reconfigure the resource windows set in _CRS
by firmware, which means the total resource range where OS can assign from is limited.
So would it be better that OS prefers the resource assignment by firmware and
reconfigures those not properly assigned resources?

And the bios seems to have taken hotplug reserved resources into account.

Thanks,
Jiahui

>>
>> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
>> ---
>>  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
>> index 11b3db8f71..c189306599 100644
>> --- a/hw/pci-host/gpex-acpi.c
>> +++ b/hw/pci-host/gpex-acpi.c
>> @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
>>      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
>>      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
>>      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
>> -    uint8_t byte_list[1] = {1};
>> -    buf = aml_buffer(1, byte_list);
>> +    uint8_t byte_list[] = {
>> +                0x1 << 0 /* support for functions other than function 0 */ |
>> +                0x1 << 5 /* support for function 5 */
>> +                };
>> +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
>>      aml_append(ifctx1, aml_return(buf));
>>      aml_append(ifctx, ifctx1);
>> +
>> +    /* PCI Firmware Specification 3.1
>> +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
>> +     */
>> +    /* Arg2: Function Index: 5 */
>> +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
>> +    /* 0 - The operating system must not ignore the PCI configuration that
>> +     *     firmware has done at boot time.
>> +     */
>> +    aml_append(ifctx1, aml_return(aml_int(0)));
>> +    aml_append(ifctx, ifctx1);
>>      aml_append(method, ifctx);
>>  
>>      byte_list[0] = 0;
> 
> .
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 5/8] acpi/gpex: Append pxb devs in ascending order
  2020-12-30 21:17     ` Michael S. Tsirkin
@ 2020-12-31  7:34       ` Jiahui Cen
  2021-01-05  0:21       ` Igor Mammedov
  1 sibling, 0 replies; 26+ messages in thread
From: Jiahui Cen @ 2020-12-31  7:34 UTC (permalink / raw)
  To: Michael S. Tsirkin, Igor Mammedov
  Cc: xieyingtai, Eduardo Habkost, Ard Biesheuvel, Richard Henderson,
	qemu-devel, Paolo Bonzini, Laszlo Ersek, wu.wubin



On 2020/12/31 5:17, Michael S. Tsirkin wrote:
> On Tue, Dec 29, 2020 at 02:47:35PM +0100, Igor Mammedov wrote:
>> On Wed, 23 Dec 2020 17:08:33 +0800
>> Jiahui Cen <cenjiahui@huawei.com> wrote:
>>
>>> The overlap check of IO resource window would fail when Linux kernel
>>> registers an IO resource [b, c) earlier than another resource [a, b).
>>> Though this incorrect check could be fixed by [1], it would be better to
>>> append pxb devs into DSDT table in ascending order.
>>>
>>> [1]: https://lore.kernel.org/lkml/20201218062335.5320-1-cenjiahui@huawei.com/
>>
>> considering there is acceptable fix for kernel I'd rather avoid
>> workarounds on QEMU side. So I suggest dropping this patch.
> 
> Well there's something to be said for a defined order of things.
> And patch is from Dec 2020 will take ages for guests to be fixed,
> and changing pci core on stable kernels is risky and needs
> a ton of testing, not done eaily ...

Agree. It seems not a fatal problem since usually the resources
are ordered, so I have no idea how long it would take to accept
the patch.

> Which guests are affected by the bug?

It is a common part of pci resource registery, and all those having
PCI_IOBASE defined would be affected, such as arm, arm64, mips, risc-v...

> 
> There are also some issues with the patch see below.
> 
>> it also should reduce noise in [8/8] masking other changes.
>>
>>> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
>>> ---
>>>  hw/pci-host/gpex-acpi.c | 11 +++++++++--
>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
>>> index 4bf1e94309..95a7a0f12b 100644
>>> --- a/hw/pci-host/gpex-acpi.c
>>> +++ b/hw/pci-host/gpex-acpi.c
>>> @@ -141,7 +141,7 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
>>>  void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
>>>  {
>>>      int nr_pcie_buses = cfg->ecam.size / PCIE_MMCFG_SIZE_MIN;
>>> -    Aml *method, *crs, *dev, *rbuf;
>>> +    Aml *method, *crs, *dev, *rbuf, *pxb_devs[nr_pcie_buses];
> 
> dynamically sized array on stack poses security issues

Thanks for your review.

I will replace it with dynamical allocation like g_alloc0.

> 
>>>      PCIBus *bus = cfg->bus;
>>>      CrsRangeSet crs_range_set;
>>>      CrsRangeEntry *entry;
>>> @@ -149,6 +149,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
>>>  
>>>      /* start to construct the tables for pxb */
>>>      crs_range_set_init(&crs_range_set);
>>> +    memset(pxb_devs, 0, sizeof(pxb_devs));
>>>      if (bus) {
>>>          QLIST_FOREACH(bus, &bus->child, sibling) {
>>>              uint8_t bus_num = pci_bus_num(bus);
>>> @@ -190,7 +191,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
>>>  
>>>              acpi_dsdt_add_pci_osc(dev);
>>>  
>>> -            aml_append(scope, dev);
>>> +            pxb_devs[bus_num] = dev;
> 
> If bus numbers intersect this will overwrite old one.
> I'd rather not worry about it, just have an array
> of structs with bus numbers and sort it.
> 

I have no idea when the bus numbers would intersect. IIUC, the
validity of bus number will be checked when pxb device realizes.
Thus different root buses would have different bus numbers.

> 
>>>          }
>>>      }
>>>  
>>> @@ -278,5 +279,11 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
>>>      aml_append(dev, dev_res0);
>>>      aml_append(scope, dev);
>>>  
>>> +    for (i = 0; i < ARRAY_SIZE(pxb_devs); i++) {
>>> +        if (pxb_devs[i]) {
>>> +            aml_append(scope, pxb_devs[i]);
>>> +        }
>>> +    }
> 
> 
> so this sorts them by bus number not by io address.
> Probably happens to help since bios numbers them in the same order ...
> Is there a way to address this more robustly in case
> bios changes? E.g. I see the bug is only in PIO so sort by that address maybe?
> 

In my humble opinion, sorting by bus number may be the simplest
way to workaround the bug, because generally the bios assigns
resources in bus number ascending order and thus the io address
would be assigned in order.

In case bios changes, as long as the bug is fixed, OS could handle
the resources no matter whether the resource is in order or not.

Otherwise we need to expose io address range from `build_crs`
for sorting. And in this way, there may be another problem that
the sorting would be difficult if a root bus has several separated
io resource windows which intersect other root buses.(Though,
generally the io resource window is a continuous range as bios
assigns resources in order.)

> Also pls add a code comment explaining why we are doing all this
> with link to patch, which guests are affected etc.

OK, I will add comments.

Thanks,
Jiahui

>>> +
>>>      crs_range_set_free(&crs_range_set);
>>>  }
> 
> .
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 6/8] Kconfig: Enable PXB for ARM_VIRT by default
  2020-12-29 13:50   ` Igor Mammedov
@ 2020-12-31  7:35     ` Jiahui Cen
  0 siblings, 0 replies; 26+ messages in thread
From: Jiahui Cen @ 2020-12-31  7:35 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: xieyingtai, Eduardo Habkost, Michael S. Tsirkin, Ard Biesheuvel,
	Richard Henderson, qemu-devel, Paolo Bonzini, Laszlo Ersek,
	wu.wubin



On 2020/12/29 21:50, Igor Mammedov wrote:
> On Wed, 23 Dec 2020 17:08:34 +0800
> Jiahui Cen <cenjiahui@huawei.com> wrote:
> subj
> s/by default//
> s/enable/compile/
> 
>> PXB is now supported on ARM, so let's enable it by default.
> s/it by default/for arm_virt machine/
> s/enable/compile/

Thanks for your review.
I will fix them in the next patches.

Thanks,
Jiahui

>>
>> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
>> ---
>>  hw/pci-bridge/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/pci-bridge/Kconfig b/hw/pci-bridge/Kconfig
>> index a51ec716f5..f8df4315ba 100644
>> --- a/hw/pci-bridge/Kconfig
>> +++ b/hw/pci-bridge/Kconfig
>> @@ -5,7 +5,7 @@ config PCIE_PORT
>>  
>>  config PXB
>>      bool
>> -    default y if Q35
>> +    default y if Q35 || ARM_VIRT
>>  
>>  config XIO3130
>>      bool
> 
> .
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/8] acpi/gpex: Inform os to keep firmware resource map
  2020-12-30 21:22     ` Michael S. Tsirkin
@ 2020-12-31  8:22       ` Jiahui Cen
  2021-01-05  0:35       ` Igor Mammedov
  1 sibling, 0 replies; 26+ messages in thread
From: Jiahui Cen @ 2020-12-31  8:22 UTC (permalink / raw)
  To: Michael S. Tsirkin, Igor Mammedov
  Cc: xieyingtai, Eduardo Habkost, Richard Henderson, qemu-devel,
	Ard Biesheuvel, Paolo Bonzini, Laszlo Ersek, wu.wubin



On 2020/12/31 5:22, Michael S. Tsirkin wrote:
> On Tue, Dec 29, 2020 at 02:41:42PM +0100, Igor Mammedov wrote:
>> On Wed, 23 Dec 2020 17:08:31 +0800
>> Jiahui Cen <cenjiahui@huawei.com> wrote:
>>
>>> There may be some differences in pci resource assignment between guest os
>>> and firmware.
>>>
>>> Eg. A Bridge with Bus [d2]
>>>     -+-[0000:d2]---01.0-[d3]----01.0
>>>
>>>     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
>>>           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
>>>                                           BAR4 (mem, 64-bit, pref) [size=64M]
>>>
>>>     In EDK2, the Resource Map would be:
>>>         PciBus: Resource Map for Bridge [D2|01|00]
>>>         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
>>>            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
>>>            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
>>>         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
>>>     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
>>>
>>>     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
>>>     the PMem64 size, which would be 0x6000000. So kernel would try to
>>>     allocate 0x6000000 from the PMem64 resource window, but since the window
>>>     size is 0x4100000 as assigned by EDK2, the allocation would fail.
>>>
>>> The diffences could result in resource assignment failure.
>>>
>>> Using _DSM #5 method to inform guest os not to ignore the PCI configuration
>>> that firmware has done at boot time could handle the differences.
>>
>> I'm not sure about this one, 
>> OS should able to reconfigure PCI resources according to what and where is plugged
>> (and it even more true is hotplug is taken into account)
> 
> spec says this:
> 
> 0: No (The operating system must not ignore the PCI configuration that firmware has done
> at boot time. However, the operating system is free to configure the devices in this hierarchy
> that have not been configured by the firmware. There may be a reduced level of hot plug
> capability support in this hierarchy due to resource constraints. This situation is the same as
> the legacy situation where this _DSM is not provided.)
> 1: Yes (The operating system may ignore the PCI configuration that the firmware has done
> at boot time, and reconfigure/rebalance the resources in the hierarchy.)
> 
> and
> 
> IMPLEMENTATION NOTE
> This _DSM function provides backwards compatibility on platforms that can run legacy operating
> systems.
> Operating systems for two different architectures (e.g., x86 and x64) can be installed on a platform.
> The firmware cannot distinguish the operating system in time to change the boot configuration of
> devices. Say for instance, an x86 operating system in non-PAE mode is installed on a system. The
> x86 operating system cannot access device resource space above 4 GiB. So the firmware is required
> to configure devices at boot time using addresses below 4 GiB. On the other hand, if an x64
> operating system is installed on this system, it can access device resources above the 4 GiB so it does
> not want the firmware to constrain the resource assignment below 4 GiB that the firmware
> configures at boot time. It is not possible for the firmware to change this by the time it boots the
> operating system. Ignoring the configurations done by firmware at boot time will allow the
> operating system to push resource assignment using addresses above 4 GiB for an x64 operating
> system while constrain it to addresses below 4 GiB for an x86 operating system.
> 
> so fundamentally, saying "1" here just means "you can ignore what
> firmware configured if you like".
> 
> 
> I have a different question though: our CRS etc is based on what
> firmware configured. Is that ok? Or is ACPI expected to somehow
> reconfigure itself when OS reconfigures devices?
> Think it's ok but could not find documentation either way.

In my humble opinion, it is ok.

I'm not sure whether it is useful, but PCI Firmware Specification Revision 3.0 Chapter 3.5 said:

Firmware must configure all Host Bridges in the systems, even if they are not connected to a
console or boot device. Firmware must configure Host Bridges in order to allow operating systems
to use the devices below the Host Bridges. This is because the Host Bridges programming model is
not defined by the PCI Specifications. “Configured” in this context means that:
- Memory and I/O resources are assigned and configured.
‰- Includes both the resources consumed by the Host Bridge and the resources passed through to
the secondary bus.
‰- The bridge is enabled to receive and forward transactions.
‰- The bridge is operating in “safe” mode. Safe mode includes:
● Enabling resources such as: I/O Port, Memory addresses, VGA routing, bus number, etc.
● Enabling detection of parity and system errors
● Programming cacheline, latency timer, and other registers as required by the PCI
Specifications.


I'm not sure, but does "This is because the Host Bridges
programming model is not defined by the PCI Specifications"
mean that OS has no way to reconfigure Host Bridges (and
their CRS etc configuration).

Please point it out if I misunderstand the spec.

Thanks,
Jiahui

> 
>>>
>>> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
>>> ---
>>>  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
>>> index 11b3db8f71..c189306599 100644
>>> --- a/hw/pci-host/gpex-acpi.c
>>> +++ b/hw/pci-host/gpex-acpi.c
>>> @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
>>>      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
>>>      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
>>>      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
>>> -    uint8_t byte_list[1] = {1};
>>> -    buf = aml_buffer(1, byte_list);
>>> +    uint8_t byte_list[] = {
>>> +                0x1 << 0 /* support for functions other than function 0 */ |
>>> +                0x1 << 5 /* support for function 5 */
>>> +                };
>>> +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
>>>      aml_append(ifctx1, aml_return(buf));
>>>      aml_append(ifctx, ifctx1);
>>> +
>>> +    /* PCI Firmware Specification 3.1
>>> +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
>>> +     */
>>> +    /* Arg2: Function Index: 5 */
>>> +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
>>> +    /* 0 - The operating system must not ignore the PCI configuration that
>>> +     *     firmware has done at boot time.
>>> +     */
>>> +    aml_append(ifctx1, aml_return(aml_int(0)));
>>> +    aml_append(ifctx, ifctx1);
>>>      aml_append(method, ifctx);
>>>  
>>>      byte_list[0] = 0;
> 
> .
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 5/8] acpi/gpex: Append pxb devs in ascending order
  2020-12-30 21:17     ` Michael S. Tsirkin
  2020-12-31  7:34       ` Jiahui Cen
@ 2021-01-05  0:21       ` Igor Mammedov
  1 sibling, 0 replies; 26+ messages in thread
From: Igor Mammedov @ 2021-01-05  0:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xieyingtai, Jiahui Cen, Eduardo Habkost, Richard Henderson,
	qemu-devel, Ard Biesheuvel, Paolo Bonzini, Laszlo Ersek,
	wu.wubin

On Wed, 30 Dec 2020 16:17:14 -0500
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Dec 29, 2020 at 02:47:35PM +0100, Igor Mammedov wrote:
> > On Wed, 23 Dec 2020 17:08:33 +0800
> > Jiahui Cen <cenjiahui@huawei.com> wrote:
> >   
> > > The overlap check of IO resource window would fail when Linux kernel
> > > registers an IO resource [b, c) earlier than another resource [a, b).
> > > Though this incorrect check could be fixed by [1], it would be better to
> > > append pxb devs into DSDT table in ascending order.
> > > 
> > > [1]: https://lore.kernel.org/lkml/20201218062335.5320-1-cenjiahui@huawei.com/  
> > 
> > considering there is acceptable fix for kernel I'd rather avoid
> > workarounds on QEMU side. So I suggest dropping this patch.  
> 
> Well there's something to be said for a defined order of things.
> And patch is from Dec 2020 will take ages for guests to be fixed,
> and changing pci core on stable kernels is risky and needs
> a ton of testing, not done eaily ...
> Which guests are affected by the bug?
it's workaround for a trivial bug for niche configuration
for a new QEMU feature that never worked for arm/virt machine
Downstream that think  that it is important enough to support
can backport and test patch thus helping stable trees to merge
it sooner.


> There are also some issues with the patch see below.
> 
> > it also should reduce noise in [8/8] masking other changes.
> >   
> > > Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
> > > ---
> > >  hw/pci-host/gpex-acpi.c | 11 +++++++++--
> > >  1 file changed, 9 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
> > > index 4bf1e94309..95a7a0f12b 100644
> > > --- a/hw/pci-host/gpex-acpi.c
> > > +++ b/hw/pci-host/gpex-acpi.c
> > > @@ -141,7 +141,7 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
> > >  void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
> > >  {
> > >      int nr_pcie_buses = cfg->ecam.size / PCIE_MMCFG_SIZE_MIN;
> > > -    Aml *method, *crs, *dev, *rbuf;
> > > +    Aml *method, *crs, *dev, *rbuf, *pxb_devs[nr_pcie_buses];  
> 
> dynamically sized array on stack poses security issues
> 
> > >      PCIBus *bus = cfg->bus;
> > >      CrsRangeSet crs_range_set;
> > >      CrsRangeEntry *entry;
> > > @@ -149,6 +149,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
> > >  
> > >      /* start to construct the tables for pxb */
> > >      crs_range_set_init(&crs_range_set);
> > > +    memset(pxb_devs, 0, sizeof(pxb_devs));
> > >      if (bus) {
> > >          QLIST_FOREACH(bus, &bus->child, sibling) {
> > >              uint8_t bus_num = pci_bus_num(bus);
> > > @@ -190,7 +191,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
> > >  
> > >              acpi_dsdt_add_pci_osc(dev);
> > >  
> > > -            aml_append(scope, dev);
> > > +            pxb_devs[bus_num] = dev;  
> 
> If bus numbers intersect this will overwrite old one.
> I'd rather not worry about it, just have an array
> of structs with bus numbers and sort it.
> 
> 
> > >          }
> > >      }
> > >  
> > > @@ -278,5 +279,11 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
> > >      aml_append(dev, dev_res0);
> > >      aml_append(scope, dev);
> > >  
> > > +    for (i = 0; i < ARRAY_SIZE(pxb_devs); i++) {
> > > +        if (pxb_devs[i]) {
> > > +            aml_append(scope, pxb_devs[i]);
> > > +        }
> > > +    }  
> 
> 
> so this sorts them by bus number not by io address.
> Probably happens to help since bios numbers them in the same order ...
> Is there a way to address this more robustly in case
> bios changes? E.g. I see the bug is only in PIO so sort by that address maybe?
> 
> Also pls add a code comment explaining why we are doing all this
> with link to patch, which guests are affected etc.
> 
> > > +
> > >      crs_range_set_free(&crs_range_set);
> > >  }  
> 
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/8] acpi/gpex: Inform os to keep firmware resource map
  2020-12-30 21:22     ` Michael S. Tsirkin
  2020-12-31  8:22       ` Jiahui Cen
@ 2021-01-05  0:35       ` Igor Mammedov
  2021-01-05  1:53         ` Jiahui Cen
  2021-01-05 19:33         ` Laszlo Ersek
  1 sibling, 2 replies; 26+ messages in thread
From: Igor Mammedov @ 2021-01-05  0:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: xieyingtai, Jiahui Cen, Eduardo Habkost, Richard Henderson,
	qemu-devel, Ard Biesheuvel, Paolo Bonzini, Laszlo Ersek,
	wu.wubin

On Wed, 30 Dec 2020 16:22:08 -0500
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Dec 29, 2020 at 02:41:42PM +0100, Igor Mammedov wrote:
> > On Wed, 23 Dec 2020 17:08:31 +0800
> > Jiahui Cen <cenjiahui@huawei.com> wrote:
> >   
> > > There may be some differences in pci resource assignment between guest os
> > > and firmware.
> > > 
> > > Eg. A Bridge with Bus [d2]
> > >     -+-[0000:d2]---01.0-[d3]----01.0
> > > 
> > >     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
> > >           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
> > >                                           BAR4 (mem, 64-bit, pref) [size=64M]
> > > 
> > >     In EDK2, the Resource Map would be:
> > >         PciBus: Resource Map for Bridge [D2|01|00]
> > >         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
> > >            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
> > >            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
> > >         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
> > >     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
> > > 
> > >     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
> > >     the PMem64 size, which would be 0x6000000. So kernel would try to
> > >     allocate 0x6000000 from the PMem64 resource window, but since the window
> > >     size is 0x4100000 as assigned by EDK2, the allocation would fail.
> > > 
> > > The diffences could result in resource assignment failure.
> > > 
> > > Using _DSM #5 method to inform guest os not to ignore the PCI configuration
> > > that firmware has done at boot time could handle the differences.  
> > 
> > I'm not sure about this one, 
> > OS should able to reconfigure PCI resources according to what and where is plugged
> > (and it even more true is hotplug is taken into account)  
> 
> spec says this:
> 
> 0: No (The operating system must not ignore the PCI configuration that firmware has done
> at boot time. However, the operating system is free to configure the devices in this hierarchy
> that have not been configured by the firmware. There may be a reduced level of hot plug
> capability support in this hierarchy due to resource constraints. This situation is the same as
> the legacy situation where this _DSM is not provided.)
> 1: Yes (The operating system may ignore the PCI configuration that the firmware has done
> at boot time, and reconfigure/rebalance the resources in the hierarchy.)
I sort of convinced my self that's is just hotplug work might need to implement reconfiguration
in guest kernel and maybe QEMU

Though I have a question,

 1. does it work for PC machine with current kernel, if so why?
 2. what it would take to make it work for arm/virt?

> and
> 
> IMPLEMENTATION NOTE
> This _DSM function provides backwards compatibility on platforms that can run legacy operating
> systems.
> Operating systems for two different architectures (e.g., x86 and x64) can be installed on a platform.
> The firmware cannot distinguish the operating system in time to change the boot configuration of
> devices. Say for instance, an x86 operating system in non-PAE mode is installed on a system. The
> x86 operating system cannot access device resource space above 4 GiB. So the firmware is required
> to configure devices at boot time using addresses below 4 GiB. On the other hand, if an x64
> operating system is installed on this system, it can access device resources above the 4 GiB so it does
> not want the firmware to constrain the resource assignment below 4 GiB that the firmware
> configures at boot time. It is not possible for the firmware to change this by the time it boots the
> operating system. Ignoring the configurations done by firmware at boot time will allow the
> operating system to push resource assignment using addresses above 4 GiB for an x64 operating
> system while constrain it to addresses below 4 GiB for an x86 operating system.
> 
> so fundamentally, saying "1" here just means "you can ignore what
> firmware configured if you like".
> 
> 
> I have a different question though: our CRS etc is based on what
> firmware configured. Is that ok? Or is ACPI expected to somehow
> reconfigure itself when OS reconfigures devices?
> Think it's ok but could not find documentation either way.

guest consume DSDT only at boot time,
reconfiguration can done later by PCI subsystem without
ACPI (at least it used to be so).

However DSM is dynamic,
and maybe evaluated at runtime,
though I don't know if kernel would re-evaluate this feature bit after boot


> 
> 
> > > 
> > > Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
> > > ---
> > >  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
> > >  1 file changed, 16 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
> > > index 11b3db8f71..c189306599 100644
> > > --- a/hw/pci-host/gpex-acpi.c
> > > +++ b/hw/pci-host/gpex-acpi.c
> > > @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
> > >      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
> > >      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
> > >      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
> > > -    uint8_t byte_list[1] = {1};
> > > -    buf = aml_buffer(1, byte_list);
> > > +    uint8_t byte_list[] = {
> > > +                0x1 << 0 /* support for functions other than function 0 */ |
> > > +                0x1 << 5 /* support for function 5 */
> > > +                };
> > > +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
> > >      aml_append(ifctx1, aml_return(buf));
> > >      aml_append(ifctx, ifctx1);
> > > +
> > > +    /* PCI Firmware Specification 3.1
> > > +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
> > > +     */
> > > +    /* Arg2: Function Index: 5 */
> > > +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
> > > +    /* 0 - The operating system must not ignore the PCI configuration that
> > > +     *     firmware has done at boot time.
> > > +     */
> > > +    aml_append(ifctx1, aml_return(aml_int(0)));
> > > +    aml_append(ifctx, ifctx1);
> > >      aml_append(method, ifctx);
> > >  
> > >      byte_list[0] = 0;  
> 
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/8] acpi/gpex: Inform os to keep firmware resource map
  2021-01-05  0:35       ` Igor Mammedov
@ 2021-01-05  1:53         ` Jiahui Cen
  2021-01-06 13:29           ` Igor Mammedov
  2021-01-05 19:33         ` Laszlo Ersek
  1 sibling, 1 reply; 26+ messages in thread
From: Jiahui Cen @ 2021-01-05  1:53 UTC (permalink / raw)
  To: Igor Mammedov, Michael S. Tsirkin
  Cc: xieyingtai, Eduardo Habkost, Richard Henderson, qemu-devel,
	Ard Biesheuvel, Paolo Bonzini, Laszlo Ersek, wu.wubin



On 2021/1/5 8:35, Igor Mammedov wrote:
> On Wed, 30 Dec 2020 16:22:08 -0500
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
>> On Tue, Dec 29, 2020 at 02:41:42PM +0100, Igor Mammedov wrote:
>>> On Wed, 23 Dec 2020 17:08:31 +0800
>>> Jiahui Cen <cenjiahui@huawei.com> wrote:
>>>   
>>>> There may be some differences in pci resource assignment between guest os
>>>> and firmware.
>>>>
>>>> Eg. A Bridge with Bus [d2]
>>>>     -+-[0000:d2]---01.0-[d3]----01.0
>>>>
>>>>     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
>>>>           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
>>>>                                           BAR4 (mem, 64-bit, pref) [size=64M]
>>>>
>>>>     In EDK2, the Resource Map would be:
>>>>         PciBus: Resource Map for Bridge [D2|01|00]
>>>>         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
>>>>            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
>>>>            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
>>>>         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
>>>>     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
>>>>
>>>>     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
>>>>     the PMem64 size, which would be 0x6000000. So kernel would try to
>>>>     allocate 0x6000000 from the PMem64 resource window, but since the window
>>>>     size is 0x4100000 as assigned by EDK2, the allocation would fail.
>>>>
>>>> The diffences could result in resource assignment failure.
>>>>
>>>> Using _DSM #5 method to inform guest os not to ignore the PCI configuration
>>>> that firmware has done at boot time could handle the differences.  
>>>
>>> I'm not sure about this one, 
>>> OS should able to reconfigure PCI resources according to what and where is plugged
>>> (and it even more true is hotplug is taken into account)  
>>
>> spec says this:
>>
>> 0: No (The operating system must not ignore the PCI configuration that firmware has done
>> at boot time. However, the operating system is free to configure the devices in this hierarchy
>> that have not been configured by the firmware. There may be a reduced level of hot plug
>> capability support in this hierarchy due to resource constraints. This situation is the same as
>> the legacy situation where this _DSM is not provided.)
>> 1: Yes (The operating system may ignore the PCI configuration that the firmware has done
>> at boot time, and reconfigure/rebalance the resources in the hierarchy.)
> I sort of convinced my self that's is just hotplug work might need to implement reconfiguration
> in guest kernel and maybe QEMU
> 
> Though I have a question,
> 
>  1. does it work for PC machine with current kernel, if so why?
>  2. what it would take to make it work for arm/virt?
> 

1. For x86, it generally keeps the configuration by firmware,
so there is nothing wrong for PC machine.

2. We add DSM method in DSDT to inform guest to keep
firmware's configuration, just like x86.

>> and
>>
>> IMPLEMENTATION NOTE
>> This _DSM function provides backwards compatibility on platforms that can run legacy operating
>> systems.
>> Operating systems for two different architectures (e.g., x86 and x64) can be installed on a platform.
>> The firmware cannot distinguish the operating system in time to change the boot configuration of
>> devices. Say for instance, an x86 operating system in non-PAE mode is installed on a system. The
>> x86 operating system cannot access device resource space above 4 GiB. So the firmware is required
>> to configure devices at boot time using addresses below 4 GiB. On the other hand, if an x64
>> operating system is installed on this system, it can access device resources above the 4 GiB so it does
>> not want the firmware to constrain the resource assignment below 4 GiB that the firmware
>> configures at boot time. It is not possible for the firmware to change this by the time it boots the
>> operating system. Ignoring the configurations done by firmware at boot time will allow the
>> operating system to push resource assignment using addresses above 4 GiB for an x64 operating
>> system while constrain it to addresses below 4 GiB for an x86 operating system.
>>
>> so fundamentally, saying "1" here just means "you can ignore what
>> firmware configured if you like".
>>
>>
>> I have a different question though: our CRS etc is based on what
>> firmware configured. Is that ok? Or is ACPI expected to somehow
>> reconfigure itself when OS reconfigures devices?
>> Think it's ok but could not find documentation either way.
> 
> guest consume DSDT only at boot time,
> reconfiguration can done later by PCI subsystem without
> ACPI (at least it used to be so).
> 
> However DSM is dynamic,
> and maybe evaluated at runtime,
> though I don't know if kernel would re-evaluate this feature bit after boot
> 

Seems kernel evaluates DSM only at boot time.

Thanks,
Jiahui

> 
>>
>>
>>>>
>>>> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
>>>> ---
>>>>  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
>>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
>>>> index 11b3db8f71..c189306599 100644
>>>> --- a/hw/pci-host/gpex-acpi.c
>>>> +++ b/hw/pci-host/gpex-acpi.c
>>>> @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
>>>>      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
>>>>      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
>>>>      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
>>>> -    uint8_t byte_list[1] = {1};
>>>> -    buf = aml_buffer(1, byte_list);
>>>> +    uint8_t byte_list[] = {
>>>> +                0x1 << 0 /* support for functions other than function 0 */ |
>>>> +                0x1 << 5 /* support for function 5 */
>>>> +                };
>>>> +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
>>>>      aml_append(ifctx1, aml_return(buf));
>>>>      aml_append(ifctx, ifctx1);
>>>> +
>>>> +    /* PCI Firmware Specification 3.1
>>>> +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
>>>> +     */
>>>> +    /* Arg2: Function Index: 5 */
>>>> +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
>>>> +    /* 0 - The operating system must not ignore the PCI configuration that
>>>> +     *     firmware has done at boot time.
>>>> +     */
>>>> +    aml_append(ifctx1, aml_return(aml_int(0)));
>>>> +    aml_append(ifctx, ifctx1);
>>>>      aml_append(method, ifctx);
>>>>  
>>>>      byte_list[0] = 0;  
>>
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/8] acpi/gpex: Inform os to keep firmware resource map
  2021-01-05  0:35       ` Igor Mammedov
  2021-01-05  1:53         ` Jiahui Cen
@ 2021-01-05 19:33         ` Laszlo Ersek
  1 sibling, 0 replies; 26+ messages in thread
From: Laszlo Ersek @ 2021-01-05 19:33 UTC (permalink / raw)
  To: Igor Mammedov, Michael S. Tsirkin
  Cc: xieyingtai, Jiahui Cen, Eduardo Habkost, Richard Henderson,
	qemu-devel, Ard Biesheuvel, Paolo Bonzini, wu.wubin

On 01/05/21 01:35, Igor Mammedov wrote:
> On Wed, 30 Dec 2020 16:22:08 -0500
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
>> On Tue, Dec 29, 2020 at 02:41:42PM +0100, Igor Mammedov wrote:
>>> On Wed, 23 Dec 2020 17:08:31 +0800
>>> Jiahui Cen <cenjiahui@huawei.com> wrote:
>>>   
>>>> There may be some differences in pci resource assignment between guest os
>>>> and firmware.
>>>>
>>>> Eg. A Bridge with Bus [d2]
>>>>     -+-[0000:d2]---01.0-[d3]----01.0
>>>>
>>>>     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
>>>>           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
>>>>                                           BAR4 (mem, 64-bit, pref) [size=64M]
>>>>
>>>>     In EDK2, the Resource Map would be:
>>>>         PciBus: Resource Map for Bridge [D2|01|00]
>>>>         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
>>>>            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
>>>>            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
>>>>         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
>>>>     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
>>>>
>>>>     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
>>>>     the PMem64 size, which would be 0x6000000. So kernel would try to
>>>>     allocate 0x6000000 from the PMem64 resource window, but since the window
>>>>     size is 0x4100000 as assigned by EDK2, the allocation would fail.
>>>>
>>>> The diffences could result in resource assignment failure.
>>>>
>>>> Using _DSM #5 method to inform guest os not to ignore the PCI configuration
>>>> that firmware has done at boot time could handle the differences.  
>>>
>>> I'm not sure about this one, 
>>> OS should able to reconfigure PCI resources according to what and where is plugged
>>> (and it even more true is hotplug is taken into account)  
>>
>> spec says this:
>>
>> 0: No (The operating system must not ignore the PCI configuration that firmware has done
>> at boot time. However, the operating system is free to configure the devices in this hierarchy
>> that have not been configured by the firmware. There may be a reduced level of hot plug
>> capability support in this hierarchy due to resource constraints. This situation is the same as
>> the legacy situation where this _DSM is not provided.)
>> 1: Yes (The operating system may ignore the PCI configuration that the firmware has done
>> at boot time, and reconfigure/rebalance the resources in the hierarchy.)
> I sort of convinced my self that's is just hotplug work might need to implement reconfiguration
> in guest kernel and maybe QEMU
> 
> Though I have a question,
> 
>  1. does it work for PC machine with current kernel, if so why?
>  2. what it would take to make it work for arm/virt?

The Linux/arm64 guest deals with PCI resources differently for
historical reasons. I was extremely confused by that as well, but Ard
explained here:
<https://www.redhat.com/archives/edk2-devel-archive/2020-December/msg01027.html>.

(Do not be alarmed by Ard's initial statement "That is not going to
work"; he later revised that here:
<https://lists.gnu.org/archive/html/qemu-devel/2020-12/msg05033.html>.)

Thanks,
Laszlo

> 
>> and
>>
>> IMPLEMENTATION NOTE
>> This _DSM function provides backwards compatibility on platforms that can run legacy operating
>> systems.
>> Operating systems for two different architectures (e.g., x86 and x64) can be installed on a platform.
>> The firmware cannot distinguish the operating system in time to change the boot configuration of
>> devices. Say for instance, an x86 operating system in non-PAE mode is installed on a system. The
>> x86 operating system cannot access device resource space above 4 GiB. So the firmware is required
>> to configure devices at boot time using addresses below 4 GiB. On the other hand, if an x64
>> operating system is installed on this system, it can access device resources above the 4 GiB so it does
>> not want the firmware to constrain the resource assignment below 4 GiB that the firmware
>> configures at boot time. It is not possible for the firmware to change this by the time it boots the
>> operating system. Ignoring the configurations done by firmware at boot time will allow the
>> operating system to push resource assignment using addresses above 4 GiB for an x64 operating
>> system while constrain it to addresses below 4 GiB for an x86 operating system.
>>
>> so fundamentally, saying "1" here just means "you can ignore what
>> firmware configured if you like".
>>
>>
>> I have a different question though: our CRS etc is based on what
>> firmware configured. Is that ok? Or is ACPI expected to somehow
>> reconfigure itself when OS reconfigures devices?
>> Think it's ok but could not find documentation either way.
> 
> guest consume DSDT only at boot time,
> reconfiguration can done later by PCI subsystem without
> ACPI (at least it used to be so).
> 
> However DSM is dynamic,
> and maybe evaluated at runtime,
> though I don't know if kernel would re-evaluate this feature bit after boot
> 
> 
>>
>>
>>>>
>>>> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
>>>> ---
>>>>  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
>>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
>>>> index 11b3db8f71..c189306599 100644
>>>> --- a/hw/pci-host/gpex-acpi.c
>>>> +++ b/hw/pci-host/gpex-acpi.c
>>>> @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
>>>>      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
>>>>      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
>>>>      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
>>>> -    uint8_t byte_list[1] = {1};
>>>> -    buf = aml_buffer(1, byte_list);
>>>> +    uint8_t byte_list[] = {
>>>> +                0x1 << 0 /* support for functions other than function 0 */ |
>>>> +                0x1 << 5 /* support for function 5 */
>>>> +                };
>>>> +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
>>>>      aml_append(ifctx1, aml_return(buf));
>>>>      aml_append(ifctx, ifctx1);
>>>> +
>>>> +    /* PCI Firmware Specification 3.1
>>>> +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
>>>> +     */
>>>> +    /* Arg2: Function Index: 5 */
>>>> +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
>>>> +    /* 0 - The operating system must not ignore the PCI configuration that
>>>> +     *     firmware has done at boot time.
>>>> +     */
>>>> +    aml_append(ifctx1, aml_return(aml_int(0)));
>>>> +    aml_append(ifctx, ifctx1);
>>>>      aml_append(method, ifctx);
>>>>  
>>>>      byte_list[0] = 0;  
>>
>>
> 
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/8] acpi/gpex: Inform os to keep firmware resource map
  2021-01-05  1:53         ` Jiahui Cen
@ 2021-01-06 13:29           ` Igor Mammedov
  2021-01-07  5:54             ` Jiahui Cen
  0 siblings, 1 reply; 26+ messages in thread
From: Igor Mammedov @ 2021-01-06 13:29 UTC (permalink / raw)
  To: Jiahui Cen
  Cc: xieyingtai, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Ard Biesheuvel, Paolo Bonzini,
	Laszlo Ersek, wu.wubin

On Tue, 5 Jan 2021 09:53:49 +0800
Jiahui Cen <cenjiahui@huawei.com> wrote:

> On 2021/1/5 8:35, Igor Mammedov wrote:
> > On Wed, 30 Dec 2020 16:22:08 -0500
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> >> On Tue, Dec 29, 2020 at 02:41:42PM +0100, Igor Mammedov wrote:  
> >>> On Wed, 23 Dec 2020 17:08:31 +0800
> >>> Jiahui Cen <cenjiahui@huawei.com> wrote:
> >>>     
> >>>> There may be some differences in pci resource assignment between guest os
> >>>> and firmware.
> >>>>
> >>>> Eg. A Bridge with Bus [d2]
> >>>>     -+-[0000:d2]---01.0-[d3]----01.0
> >>>>
> >>>>     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
> >>>>           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
> >>>>                                           BAR4 (mem, 64-bit, pref) [size=64M]
> >>>>
> >>>>     In EDK2, the Resource Map would be:
> >>>>         PciBus: Resource Map for Bridge [D2|01|00]
> >>>>         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
> >>>>            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
> >>>>            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
> >>>>         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
> >>>>     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
> >>>>
> >>>>     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
> >>>>     the PMem64 size, which would be 0x6000000. So kernel would try to
> >>>>     allocate 0x6000000 from the PMem64 resource window, but since the window
> >>>>     size is 0x4100000 as assigned by EDK2, the allocation would fail.
> >>>>
> >>>> The diffences could result in resource assignment failure.
> >>>>
> >>>> Using _DSM #5 method to inform guest os not to ignore the PCI configuration
> >>>> that firmware has done at boot time could handle the differences.    
> >>>
> >>> I'm not sure about this one, 
> >>> OS should able to reconfigure PCI resources according to what and where is plugged
> >>> (and it even more true is hotplug is taken into account)    
> >>
> >> spec says this:
> >>
> >> 0: No (The operating system must not ignore the PCI configuration that firmware has done
> >> at boot time. However, the operating system is free to configure the devices in this hierarchy
> >> that have not been configured by the firmware. There may be a reduced level of hot plug
> >> capability support in this hierarchy due to resource constraints. This situation is the same as
> >> the legacy situation where this _DSM is not provided.)
> >> 1: Yes (The operating system may ignore the PCI configuration that the firmware has done
> >> at boot time, and reconfigure/rebalance the resources in the hierarchy.)  
> > I sort of convinced my self that's is just hotplug work might need to implement reconfiguration
> > in guest kernel and maybe QEMU
> > 
> > Though I have a question,
> > 
> >  1. does it work for PC machine with current kernel, if so why?
> >  2. what it would take to make it work for arm/virt?
> >   
> 
> 1. For x86, it generally keeps the configuration by firmware,
> so there is nothing wrong for PC machine.
> 
> 2. We add DSM method in DSDT to inform guest to keep
> firmware's configuration, just like x86.
> 
> >> and
> >>
> >> IMPLEMENTATION NOTE
> >> This _DSM function provides backwards compatibility on platforms that can run legacy operating
> >> systems.
> >> Operating systems for two different architectures (e.g., x86 and x64) can be installed on a platform.
> >> The firmware cannot distinguish the operating system in time to change the boot configuration of
> >> devices. Say for instance, an x86 operating system in non-PAE mode is installed on a system. The
> >> x86 operating system cannot access device resource space above 4 GiB. So the firmware is required
> >> to configure devices at boot time using addresses below 4 GiB. On the other hand, if an x64
> >> operating system is installed on this system, it can access device resources above the 4 GiB so it does
> >> not want the firmware to constrain the resource assignment below 4 GiB that the firmware
> >> configures at boot time. It is not possible for the firmware to change this by the time it boots the
> >> operating system. Ignoring the configurations done by firmware at boot time will allow the
> >> operating system to push resource assignment using addresses above 4 GiB for an x64 operating
> >> system while constrain it to addresses below 4 GiB for an x86 operating system.
> >>
> >> so fundamentally, saying "1" here just means "you can ignore what
> >> firmware configured if you like".
> >>
> >>
> >> I have a different question though: our CRS etc is based on what
> >> firmware configured. Is that ok? Or is ACPI expected to somehow
> >> reconfigure itself when OS reconfigures devices?
> >> Think it's ok but could not find documentation either way.  
> > 
> > guest consume DSDT only at boot time,
> > reconfiguration can done later by PCI subsystem without
> > ACPI (at least it used to be so).
> > 
> > However DSM is dynamic,
> > and maybe evaluated at runtime,
> > though I don't know if kernel would re-evaluate this feature bit after boot
> >   
> 
> Seems kernel evaluates DSM only at boot time.

Ok, lets respin this series without 5/8
to avoid mixing unrelated changes in one series.

We can think about 5/8 some more and return to it later if it proves hard to merge.

> 
> Thanks,
> Jiahui
> 
> >   
> >>
> >>  
> >>>>
> >>>> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
> >>>> ---
> >>>>  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
> >>>>  1 file changed, 16 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
> >>>> index 11b3db8f71..c189306599 100644
> >>>> --- a/hw/pci-host/gpex-acpi.c
> >>>> +++ b/hw/pci-host/gpex-acpi.c
> >>>> @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
> >>>>      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
> >>>>      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
> >>>>      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
> >>>> -    uint8_t byte_list[1] = {1};
> >>>> -    buf = aml_buffer(1, byte_list);
> >>>> +    uint8_t byte_list[] = {
> >>>> +                0x1 << 0 /* support for functions other than function 0 */ |
> >>>> +                0x1 << 5 /* support for function 5 */
> >>>> +                };
> >>>> +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
> >>>>      aml_append(ifctx1, aml_return(buf));
> >>>>      aml_append(ifctx, ifctx1);
> >>>> +
> >>>> +    /* PCI Firmware Specification 3.1
> >>>> +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
> >>>> +     */
> >>>> +    /* Arg2: Function Index: 5 */
> >>>> +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
> >>>> +    /* 0 - The operating system must not ignore the PCI configuration that
> >>>> +     *     firmware has done at boot time.
> >>>> +     */
> >>>> +    aml_append(ifctx1, aml_return(aml_int(0)));
> >>>> +    aml_append(ifctx, ifctx1);
> >>>>      aml_append(method, ifctx);
> >>>>  
> >>>>      byte_list[0] = 0;    
> >>
> >>  
> > 
> > .
> >   
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/8] acpi/gpex: Inform os to keep firmware resource map
  2021-01-06 13:29           ` Igor Mammedov
@ 2021-01-07  5:54             ` Jiahui Cen
  0 siblings, 0 replies; 26+ messages in thread
From: Jiahui Cen @ 2021-01-07  5:54 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: xieyingtai, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Ard Biesheuvel, Paolo Bonzini,
	Laszlo Ersek, wu.wubin



On 2021/1/6 21:29, Igor Mammedov wrote:
> On Tue, 5 Jan 2021 09:53:49 +0800
> Jiahui Cen <cenjiahui@huawei.com> wrote:
> 
>> On 2021/1/5 8:35, Igor Mammedov wrote:
>>> On Wed, 30 Dec 2020 16:22:08 -0500
>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>   
>>>> On Tue, Dec 29, 2020 at 02:41:42PM +0100, Igor Mammedov wrote:  
>>>>> On Wed, 23 Dec 2020 17:08:31 +0800
>>>>> Jiahui Cen <cenjiahui@huawei.com> wrote:
>>>>>     
>>>>>> There may be some differences in pci resource assignment between guest os
>>>>>> and firmware.
>>>>>>
>>>>>> Eg. A Bridge with Bus [d2]
>>>>>>     -+-[0000:d2]---01.0-[d3]----01.0
>>>>>>
>>>>>>     where [d2:01.00] is a pcie-pci-bridge with BAR0 (mem, 64-bit, non-pref) [size=256]
>>>>>>           [d3:01.00] is a PCI Device with BAR0 (mem, 64-bit, pref) [size=128K]
>>>>>>                                           BAR4 (mem, 64-bit, pref) [size=64M]
>>>>>>
>>>>>>     In EDK2, the Resource Map would be:
>>>>>>         PciBus: Resource Map for Bridge [D2|01|00]
>>>>>>         Type = PMem64; Base = 0x8004000000;     Length = 0x4100000;     Alignment = 0x3FFFFFF
>>>>>>            Base = 0x8004000000; Length = 0x4000000;     Alignment = 0x3FFFFFF;  Owner = PCI [D3|01|00:20]
>>>>>>            Base = 0x8008000000; Length = 0x20000;       Alignment = 0x1FFFF;    Owner = PCI [D3|01|00:10]
>>>>>>         Type =  Mem64; Base = 0x8008100000;     Length = 0x100; Alignment = 0xFFF
>>>>>>     It would use 0x4100000 to calculate the root bus's PMem64 resource window.
>>>>>>
>>>>>>     While in Linux, kernel will use 0x1FFFFFF as the alignment to calculate
>>>>>>     the PMem64 size, which would be 0x6000000. So kernel would try to
>>>>>>     allocate 0x6000000 from the PMem64 resource window, but since the window
>>>>>>     size is 0x4100000 as assigned by EDK2, the allocation would fail.
>>>>>>
>>>>>> The diffences could result in resource assignment failure.
>>>>>>
>>>>>> Using _DSM #5 method to inform guest os not to ignore the PCI configuration
>>>>>> that firmware has done at boot time could handle the differences.    
>>>>>
>>>>> I'm not sure about this one, 
>>>>> OS should able to reconfigure PCI resources according to what and where is plugged
>>>>> (and it even more true is hotplug is taken into account)    
>>>>
>>>> spec says this:
>>>>
>>>> 0: No (The operating system must not ignore the PCI configuration that firmware has done
>>>> at boot time. However, the operating system is free to configure the devices in this hierarchy
>>>> that have not been configured by the firmware. There may be a reduced level of hot plug
>>>> capability support in this hierarchy due to resource constraints. This situation is the same as
>>>> the legacy situation where this _DSM is not provided.)
>>>> 1: Yes (The operating system may ignore the PCI configuration that the firmware has done
>>>> at boot time, and reconfigure/rebalance the resources in the hierarchy.)  
>>> I sort of convinced my self that's is just hotplug work might need to implement reconfiguration
>>> in guest kernel and maybe QEMU
>>>
>>> Though I have a question,
>>>
>>>  1. does it work for PC machine with current kernel, if so why?
>>>  2. what it would take to make it work for arm/virt?
>>>   
>>
>> 1. For x86, it generally keeps the configuration by firmware,
>> so there is nothing wrong for PC machine.
>>
>> 2. We add DSM method in DSDT to inform guest to keep
>> firmware's configuration, just like x86.
>>
>>>> and
>>>>
>>>> IMPLEMENTATION NOTE
>>>> This _DSM function provides backwards compatibility on platforms that can run legacy operating
>>>> systems.
>>>> Operating systems for two different architectures (e.g., x86 and x64) can be installed on a platform.
>>>> The firmware cannot distinguish the operating system in time to change the boot configuration of
>>>> devices. Say for instance, an x86 operating system in non-PAE mode is installed on a system. The
>>>> x86 operating system cannot access device resource space above 4 GiB. So the firmware is required
>>>> to configure devices at boot time using addresses below 4 GiB. On the other hand, if an x64
>>>> operating system is installed on this system, it can access device resources above the 4 GiB so it does
>>>> not want the firmware to constrain the resource assignment below 4 GiB that the firmware
>>>> configures at boot time. It is not possible for the firmware to change this by the time it boots the
>>>> operating system. Ignoring the configurations done by firmware at boot time will allow the
>>>> operating system to push resource assignment using addresses above 4 GiB for an x64 operating
>>>> system while constrain it to addresses below 4 GiB for an x86 operating system.
>>>>
>>>> so fundamentally, saying "1" here just means "you can ignore what
>>>> firmware configured if you like".
>>>>
>>>>
>>>> I have a different question though: our CRS etc is based on what
>>>> firmware configured. Is that ok? Or is ACPI expected to somehow
>>>> reconfigure itself when OS reconfigures devices?
>>>> Think it's ok but could not find documentation either way.  
>>>
>>> guest consume DSDT only at boot time,
>>> reconfiguration can done later by PCI subsystem without
>>> ACPI (at least it used to be so).
>>>
>>> However DSM is dynamic,
>>> and maybe evaluated at runtime,
>>> though I don't know if kernel would re-evaluate this feature bit after boot
>>>   
>>
>> Seems kernel evaluates DSM only at boot time.
> 
> Ok, lets respin this series without 5/8
> to avoid mixing unrelated changes in one series.
> 
> We can think about 5/8 some more and return to it later if it proves hard to merge.

OK, I'll split patch [5/8] from this series and
send them separately.

Thanks for the discussion.

Thanks,
Jiahui

> 
>>
>> Thanks,
>> Jiahui
>>
>>>   
>>>>
>>>>  
>>>>>>
>>>>>> Signed-off-by: Jiahui Cen <cenjiahui@huawei.com>
>>>>>> ---
>>>>>>  hw/pci-host/gpex-acpi.c | 18 ++++++++++++++++--
>>>>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
>>>>>> index 11b3db8f71..c189306599 100644
>>>>>> --- a/hw/pci-host/gpex-acpi.c
>>>>>> +++ b/hw/pci-host/gpex-acpi.c
>>>>>> @@ -112,10 +112,24 @@ static void acpi_dsdt_add_pci_osc(Aml *dev)
>>>>>>      UUID = aml_touuid("E5C937D0-3553-4D7A-9117-EA4D19C3434D");
>>>>>>      ifctx = aml_if(aml_equal(aml_arg(0), UUID));
>>>>>>      ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(0)));
>>>>>> -    uint8_t byte_list[1] = {1};
>>>>>> -    buf = aml_buffer(1, byte_list);
>>>>>> +    uint8_t byte_list[] = {
>>>>>> +                0x1 << 0 /* support for functions other than function 0 */ |
>>>>>> +                0x1 << 5 /* support for function 5 */
>>>>>> +                };
>>>>>> +    buf = aml_buffer(ARRAY_SIZE(byte_list), byte_list);
>>>>>>      aml_append(ifctx1, aml_return(buf));
>>>>>>      aml_append(ifctx, ifctx1);
>>>>>> +
>>>>>> +    /* PCI Firmware Specification 3.1
>>>>>> +     * 4.6.5. _DSM for Ignoring PCI Boot Configurations
>>>>>> +     */
>>>>>> +    /* Arg2: Function Index: 5 */
>>>>>> +    ifctx1 = aml_if(aml_equal(aml_arg(2), aml_int(5)));
>>>>>> +    /* 0 - The operating system must not ignore the PCI configuration that
>>>>>> +     *     firmware has done at boot time.
>>>>>> +     */
>>>>>> +    aml_append(ifctx1, aml_return(aml_int(0)));
>>>>>> +    aml_append(ifctx, ifctx1);
>>>>>>      aml_append(method, ifctx);
>>>>>>  
>>>>>>      byte_list[0] = 0;    
>>>>
>>>>  
>>>
>>> .
>>>   
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-01-07  5:56 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-23  9:08 [PATCH v3 0/8] acpi: Some fixes for pxb support for ARM virt machine Jiahui Cen
2020-12-23  9:08 ` [PATCH v3 1/8] acpi: Allow DSDT acpi table changes Jiahui Cen
2020-12-23  9:08 ` [PATCH v3 2/8] acpi: Add addr offset in build_crs Jiahui Cen
2020-12-29 13:36   ` Igor Mammedov
2020-12-31  3:26     ` Jiahui Cen
2020-12-23  9:08 ` [PATCH v3 3/8] acpi/gpex: Inform os to keep firmware resource map Jiahui Cen
2020-12-29 13:41   ` Igor Mammedov
2020-12-30 21:22     ` Michael S. Tsirkin
2020-12-31  8:22       ` Jiahui Cen
2021-01-05  0:35       ` Igor Mammedov
2021-01-05  1:53         ` Jiahui Cen
2021-01-06 13:29           ` Igor Mammedov
2021-01-07  5:54             ` Jiahui Cen
2021-01-05 19:33         ` Laszlo Ersek
2020-12-31  3:30     ` Jiahui Cen
2020-12-23  9:08 ` [PATCH v3 4/8] acpi/gpex: Exclude pxb's resources from PCI0 Jiahui Cen
2020-12-23  9:08 ` [PATCH v3 5/8] acpi/gpex: Append pxb devs in ascending order Jiahui Cen
2020-12-29 13:47   ` Igor Mammedov
2020-12-30 21:17     ` Michael S. Tsirkin
2020-12-31  7:34       ` Jiahui Cen
2021-01-05  0:21       ` Igor Mammedov
2020-12-23  9:08 ` [PATCH v3 6/8] Kconfig: Enable PXB for ARM_VIRT by default Jiahui Cen
2020-12-29 13:50   ` Igor Mammedov
2020-12-31  7:35     ` Jiahui Cen
2020-12-23  9:08 ` [PATCH v3 7/8] acpi: Enable pxb unit-test for ARM virt machine Jiahui Cen
2020-12-23  9:08 ` [PATCH v3 8/8] acpi: Update addr_trans and _DSM in expected files Jiahui Cen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.