All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] Support x2APIC mode with TCG accelerator
@ 2023-03-26  5:20 Bui Quang Minh
  2023-03-26  5:20 ` [PATCH v2 1/5] i386/tcg: implement x2APIC registers MSR access Bui Quang Minh
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Bui Quang Minh @ 2023-03-26  5:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, David Woodhouse,
	Igor Mammedov, Alex Bennée, Bui Quang Minh

Hi everyone,

This series implements x2APIC mode in userspace local APIC and the
RDMSR/WRMSR helper to access x2APIC registers in x2APIC mode. Intel iommu
and AMD iommu are adjusted to support x2APIC interrupt remapping. With this
series, we can now boot Linux kernel into x2APIC mode with TCG accelerator
using either Intel or AMD iommu.

Testing the emulated userspace APIC with kvm-unit-tests, disable test
device with this patch

diff --git a/lib/x86/fwcfg.c b/lib/x86/fwcfg.c
index 1734afb..f56fe1c 100644
--- a/lib/x86/fwcfg.c
+++ b/lib/x86/fwcfg.c
@@ -27,6 +27,7 @@ static void read_cfg_override(void)
 
        if ((str = getenv("TEST_DEVICE")))
                no_test_device = !atol(str);
+       no_test_device = true;
 
        if ((str = getenv("MEMLIMIT")))
                fw_override[FW_CFG_MAX_RAM] = atol(str) * 1024 * 1024;

~ env QEMU=/home/minh/Desktop/oss/qemu/build/qemu-system-x86_64 ACCEL=tcg \
./run_tests.sh -v -g apic 

TESTNAME=apic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/apic.flat -smp 2
-cpu qemu64,+x2apic,+tsc-deadline -machine kernel_irqchip=split FAIL
apic-split (54 tests, 8 unexpected failures, 1 skipped)
TESTNAME=ioapic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/ioapic.flat -smp
1 -cpu qemu64 -machine kernel_irqchip=split PASS ioapic-split (19 tests)
TESTNAME=x2apic TIMEOUT=30 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu
qemu64,+x2apic,+tsc-deadline FAIL x2apic (54 tests, 8 unexpected failures,
1 skipped) TESTNAME=xapic TIMEOUT=60 ACCEL=tcg ./x86/run x86/apic.flat -smp
2 -cpu qemu64,-x2apic,+tsc-deadline -machine pit=off FAIL xapic (43 tests,
6 unexpected failures, 2 skipped)

	FAIL: apic_disable: *0xfee00030: 50014
	FAIL: apic_disable: *0xfee00080: f0
	FAIL: apic_disable: *0xfee00030: 50014
	FAIL: apic_disable: *0xfee00080: f0 
	FAIL: apicbase: relocate apic

These errors are because we don't disable MMIO region when switching to
x2APIC and don't support relocate MMIO region yet. This is a problem
because, MMIO region is the same for all CPUs, in order to support these we
need to figure out how to allocate and manage different MMIO regions for
each CPUs. This can be an improvement in the future.

	FAIL: nmi-after-sti
	FAIL: multiple nmi

These errors are in the way we handle CPU_INTERRUPT_NMI in core TCG.

	FAIL: TMCCT should stay at zero

This error is related to APIC timer which should be addressed in separate
patch.

Version 2 changes,
- Add support for APIC ID larger than 255
- Adjust AMD iommu for x2APIC suuport
- Reorganize and split patch 1,2 into patch 1,2,3 in version 2

Thanks,
Quang Minh.

Bui Quang Minh (5):
  i386/tcg: implement x2APIC registers MSR access
  apic: add support for x2APIC mode
  apic, i386/tcg: add x2apic transitions
  intel_iommu: allow Extended Interrupt Mode when using userspace APIC
  amd_iommu: report x2APIC support to the operating system

 hw/i386/acpi-build.c                 |  28 +--
 hw/i386/amd_iommu.c                  |  21 +-
 hw/i386/amd_iommu.h                  |  16 +-
 hw/i386/intel_iommu.c                |  11 -
 hw/i386/x86.c                        |   8 +-
 hw/intc/apic.c                       | 358 ++++++++++++++++++++-------
 hw/intc/apic_common.c                |  15 +-
 hw/intc/trace-events                 |   4 +-
 include/hw/i386/apic.h               |   6 +-
 include/hw/i386/apic_internal.h      |   2 +-
 target/i386/cpu-sysemu.c             |  10 +
 target/i386/cpu.c                    |   5 +-
 target/i386/cpu.h                    |   9 +
 target/i386/tcg/sysemu/misc_helper.c |  31 +++
 14 files changed, 389 insertions(+), 135 deletions(-)

-- 
2.25.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 1/5] i386/tcg: implement x2APIC registers MSR access
  2023-03-26  5:20 [PATCH v2 0/5] Support x2APIC mode with TCG accelerator Bui Quang Minh
@ 2023-03-26  5:20 ` Bui Quang Minh
  2023-03-27 16:56   ` David Woodhouse
  2023-03-26  5:20 ` [PATCH v2 2/5] apic: add support for x2APIC mode Bui Quang Minh
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Bui Quang Minh @ 2023-03-26  5:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, David Woodhouse,
	Igor Mammedov, Alex Bennée, Bui Quang Minh

This commit refactors apic_mem_read/write to support both MMIO access in
xAPIC and MSR access in x2APIC.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/intc/apic.c                       | 79 ++++++++++++++++++----------
 hw/intc/trace-events                 |  4 +-
 include/hw/i386/apic.h               |  3 ++
 target/i386/cpu.h                    |  3 ++
 target/i386/tcg/sysemu/misc_helper.c | 27 ++++++++++
 5 files changed, 86 insertions(+), 30 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index 20b5a94073..61b494b20a 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -288,6 +288,13 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
     apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
 }
 
+bool is_x2apic_mode(DeviceState *dev)
+{
+    APICCommonState *s = APIC(dev);
+
+    return s->apicbase & MSR_IA32_APICBASE_EXTD;
+}
+
 static void apic_set_base(APICCommonState *s, uint64_t val)
 {
     s->apicbase = (val & 0xfffff000) |
@@ -636,16 +643,11 @@ static void apic_timer(void *opaque)
     apic_timer_update(s, s->next_time);
 }
 
-static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
+uint64_t apic_register_read(int index)
 {
     DeviceState *dev;
     APICCommonState *s;
-    uint32_t val;
-    int index;
-
-    if (size < 4) {
-        return 0;
-    }
+    uint64_t val;
 
     dev = cpu_get_current_apic();
     if (!dev) {
@@ -653,7 +655,6 @@ static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
     }
     s = APIC(dev);
 
-    index = (addr >> 4) & 0xff;
     switch(index) {
     case 0x02: /* id */
         val = s->id << 24;
@@ -720,7 +721,23 @@ static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
         val = 0;
         break;
     }
-    trace_apic_mem_readl(addr, val);
+
+    trace_apic_register_read(index, val);
+    return val;
+}
+
+static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
+{
+    uint32_t val;
+    int index;
+
+    if (size < 4) {
+        return 0;
+    }
+
+    index = (addr >> 4) & 0xff;
+    val = (uint32_t)apic_register_read(index);
+
     return val;
 }
 
@@ -737,27 +754,10 @@ static void apic_send_msi(MSIMessage *msi)
     apic_deliver_irq(dest, dest_mode, delivery, vector, trigger_mode);
 }
 
-static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
-                           unsigned size)
+void apic_register_write(int index, uint64_t val)
 {
     DeviceState *dev;
     APICCommonState *s;
-    int index = (addr >> 4) & 0xff;
-
-    if (size < 4) {
-        return;
-    }
-
-    if (addr > 0xfff || !index) {
-        /* MSI and MMIO APIC are at the same memory location,
-         * but actually not on the global bus: MSI is on PCI bus
-         * APIC is connected directly to the CPU.
-         * Mapping them on the global bus happens to work because
-         * MSI registers are reserved in APIC MMIO and vice versa. */
-        MSIMessage msi = { .address = addr, .data = val };
-        apic_send_msi(&msi);
-        return;
-    }
 
     dev = cpu_get_current_apic();
     if (!dev) {
@@ -765,7 +765,7 @@ static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
     }
     s = APIC(dev);
 
-    trace_apic_mem_writel(addr, val);
+    trace_apic_register_write(index, val);
 
     switch(index) {
     case 0x02:
@@ -843,6 +843,29 @@ static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
     }
 }
 
+static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
+                           unsigned size)
+{
+    int index = (addr >> 4) & 0xff;
+
+    if (size < 4) {
+        return;
+    }
+
+    if (addr > 0xfff || !index) {
+        /* MSI and MMIO APIC are at the same memory location,
+         * but actually not on the global bus: MSI is on PCI bus
+         * APIC is connected directly to the CPU.
+         * Mapping them on the global bus happens to work because
+         * MSI registers are reserved in APIC MMIO and vice versa. */
+        MSIMessage msi = { .address = addr, .data = val };
+        apic_send_msi(&msi);
+        return;
+    }
+
+    apic_register_write(index, val);
+}
+
 static void apic_pre_save(APICCommonState *s)
 {
     apic_sync_vapic(s, SYNC_FROM_VAPIC);
diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index 50cadfb996..9d4e7a67be 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -14,8 +14,8 @@ cpu_get_apic_base(uint64_t val) "0x%016"PRIx64
 # apic.c
 apic_local_deliver(int vector, uint32_t lvt) "vector %d delivery mode %d"
 apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, uint8_t vector_num, uint8_t trigger_mode) "dest %d dest_mode %d delivery_mode %d vector %d trigger_mode %d"
-apic_mem_readl(uint64_t addr, uint32_t val)  "0x%"PRIx64" = 0x%08x"
-apic_mem_writel(uint64_t addr, uint32_t val) "0x%"PRIx64" = 0x%08x"
+apic_register_read(uint8_t reg, uint64_t val) "register 0x%02x = 0x%"PRIx64
+apic_register_write(uint8_t reg, uint64_t val) "register 0x%02x = 0x%"PRIx64
 
 # ioapic.c
 ioapic_set_remote_irr(int n) "set remote irr for pin %d"
diff --git a/include/hw/i386/apic.h b/include/hw/i386/apic.h
index bdc15a7a73..2cebeb4faf 100644
--- a/include/hw/i386/apic.h
+++ b/include/hw/i386/apic.h
@@ -18,6 +18,9 @@ void apic_sipi(DeviceState *s);
 void apic_poll_irq(DeviceState *d);
 void apic_designate_bsp(DeviceState *d, bool bsp);
 int apic_get_highest_priority_irr(DeviceState *dev);
+uint64_t apic_register_read(int index);
+void apic_register_write(int index, uint64_t val);
+bool is_x2apic_mode(DeviceState *d);
 
 /* pc.c */
 DeviceState *cpu_get_current_apic(void);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d243e290d3..02165a5ad2 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -545,6 +545,9 @@ typedef enum X86Seg {
 #define MSR_IA32_VMX_TRUE_ENTRY_CTLS     0x00000490
 #define MSR_IA32_VMX_VMFUNC             0x00000491
 
+#define MSR_APIC_START                  0x00000800
+#define MSR_APIC_END                    0x000008ff
+
 #define XSTATE_FP_BIT                   0
 #define XSTATE_SSE_BIT                  1
 #define XSTATE_YMM_BIT                  2
diff --git a/target/i386/tcg/sysemu/misc_helper.c b/target/i386/tcg/sysemu/misc_helper.c
index e1528b7f80..1fce2076a3 100644
--- a/target/i386/tcg/sysemu/misc_helper.c
+++ b/target/i386/tcg/sysemu/misc_helper.c
@@ -25,6 +25,7 @@
 #include "exec/address-spaces.h"
 #include "exec/exec-all.h"
 #include "tcg/helper-tcg.h"
+#include "hw/i386/apic.h"
 
 void helper_outb(CPUX86State *env, uint32_t port, uint32_t data)
 {
@@ -289,6 +290,19 @@ void helper_wrmsr(CPUX86State *env)
         env->msr_bndcfgs = val;
         cpu_sync_bndcs_hflags(env);
         break;
+    case MSR_APIC_START ... MSR_APIC_END: {
+        int index = (uint32_t)env->regs[R_ECX] - MSR_APIC_START;
+
+        if (!is_x2apic_mode(env_archcpu(env)->apic_state)) {
+            goto error;
+        }
+
+        qemu_mutex_lock_iothread();
+        apic_register_write(index, val);
+        qemu_mutex_unlock_iothread();
+
+        break;
+    }
     default:
         if ((uint32_t)env->regs[R_ECX] >= MSR_MC0_CTL
             && (uint32_t)env->regs[R_ECX] < MSR_MC0_CTL +
@@ -455,6 +469,19 @@ void helper_rdmsr(CPUX86State *env)
         val = (cs->nr_threads * cs->nr_cores) | (cs->nr_cores << 16);
         break;
     }
+    case MSR_APIC_START ... MSR_APIC_END: {
+        int index = (uint32_t)env->regs[R_ECX] - MSR_APIC_START;
+
+        if (!is_x2apic_mode(env_archcpu(env)->apic_state)) {
+            raise_exception_ra(env, EXCP0D_GPF, GETPC());
+        }
+
+        qemu_mutex_lock_iothread();
+        val = apic_register_read(index);
+        qemu_mutex_unlock_iothread();
+
+        break;
+    }
     default:
         if ((uint32_t)env->regs[R_ECX] >= MSR_MC0_CTL
             && (uint32_t)env->regs[R_ECX] < MSR_MC0_CTL +
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-03-26  5:20 [PATCH v2 0/5] Support x2APIC mode with TCG accelerator Bui Quang Minh
  2023-03-26  5:20 ` [PATCH v2 1/5] i386/tcg: implement x2APIC registers MSR access Bui Quang Minh
@ 2023-03-26  5:20 ` Bui Quang Minh
  2023-03-27 11:04   ` David Woodhouse
  2023-03-26  5:20 ` [PATCH v2 3/5] apic, i386/tcg: add x2apic transitions Bui Quang Minh
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Bui Quang Minh @ 2023-03-26  5:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, David Woodhouse,
	Igor Mammedov, Alex Bennée, Bui Quang Minh

This commit extends the APIC ID to 32-bit long and remove the 255 max APIC
ID limit in userspace APIC. The array that manages local APICs is now
dynamically allocated based on the max APIC ID of created x86 machine.
Also, new x2APIC IPI destination determination scheme, self IPI and x2APIC
mode register access are supported.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/i386/x86.c                   |   8 +-
 hw/intc/apic.c                  | 229 +++++++++++++++++++++++---------
 hw/intc/apic_common.c           |   8 +-
 include/hw/i386/apic.h          |   3 +-
 include/hw/i386/apic_internal.h |   2 +-
 5 files changed, 184 insertions(+), 66 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a88a126123..fa9b15190d 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -132,11 +132,11 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
      * Can we support APIC ID 255 or higher?
      *
      * Under Xen: yes.
-     * With userspace emulated lapic: no
+     * With userspace emulated lapic: yes.
      * With KVM's in-kernel lapic: only if X2APIC API is enabled.
      */
     if (x86ms->apic_id_limit > 255 && !xen_enabled() &&
-        (!kvm_irqchip_in_kernel() || !kvm_enable_x2apic())) {
+        kvm_irqchip_in_kernel() && !kvm_enable_x2apic()) {
         error_report("current -smp configuration requires kernel "
                      "irqchip and X2APIC API support.");
         exit(EXIT_FAILURE);
@@ -146,6 +146,10 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
         kvm_set_max_apic_id(x86ms->apic_id_limit);
     }
 
+    if (!kvm_irqchip_in_kernel()) {
+        apic_set_max_apic_id(x86ms->apic_id_limit);
+    }
+
     possible_cpus = mc->possible_cpu_arch_ids(ms);
     for (i = 0; i < ms->smp.cpus; i++) {
         x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal);
diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index 61b494b20a..159527af30 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -31,15 +31,15 @@
 #include "hw/i386/apic-msidef.h"
 #include "qapi/error.h"
 #include "qom/object.h"
-
-#define MAX_APICS 255
-#define MAX_APIC_WORDS 8
+#include "tcg/helper-tcg.h"
 
 #define SYNC_FROM_VAPIC                 0x1
 #define SYNC_TO_VAPIC                   0x2
 #define SYNC_ISR_IRR_TO_VAPIC           0x4
 
-static APICCommonState *local_apics[MAX_APICS + 1];
+static APICCommonState **local_apics;
+static uint32_t max_apics;
+static uint32_t max_apic_words;
 
 #define TYPE_APIC "apic"
 /*This is reusing the APICCommonState typedef from APIC_COMMON */
@@ -49,7 +49,19 @@ DECLARE_INSTANCE_CHECKER(APICCommonState, APIC,
 static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
 static void apic_update_irq(APICCommonState *s);
 static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
-                                      uint8_t dest, uint8_t dest_mode);
+                                      uint32_t dest, uint8_t dest_mode);
+
+void apic_set_max_apic_id(uint32_t max_apic_id)
+{
+    int word_size = 32;
+
+    /* round up the max apic id to next multiple of words */
+    max_apics = (max_apic_id + word_size - 1) & ~(word_size - 1);
+
+    local_apics = g_malloc0(sizeof(*local_apics) * max_apics);
+    max_apic_words = max_apics >> 5;
+}
+
 
 /* Find first bit starting from msb */
 static int apic_fls_bit(uint32_t value)
@@ -199,7 +211,7 @@ static void apic_external_nmi(APICCommonState *s)
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
     int __i, __j;\
-    for(__i = 0; __i < MAX_APIC_WORDS; __i++) {\
+    for(__i = 0; __i < max_apic_words; __i++) {\
         uint32_t __mask = deliver_bitmask[__i];\
         if (__mask) {\
             for(__j = 0; __j < 32; __j++) {\
@@ -226,7 +238,7 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
             {
                 int i, d;
                 d = -1;
-                for(i = 0; i < MAX_APIC_WORDS; i++) {
+                for(i = 0; i < max_apic_words; i++) {
                     if (deliver_bitmask[i]) {
                         d = i * 32 + apic_ffs_bit(deliver_bitmask[i]);
                         break;
@@ -276,16 +288,17 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
                  apic_set_irq(apic_iter, vector_num, trigger_mode) );
 }
 
-void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
+void apic_deliver_irq(uint32_t dest, uint8_t dest_mode, uint8_t delivery_mode,
                       uint8_t vector_num, uint8_t trigger_mode)
 {
-    uint32_t deliver_bitmask[MAX_APIC_WORDS];
+    uint32_t *deliver_bitmask = g_malloc(max_apic_words * sizeof(uint32_t));
 
     trace_apic_deliver_irq(dest, dest_mode, delivery_mode, vector_num,
                            trigger_mode);
 
     apic_get_delivery_bitmask(deliver_bitmask, dest, dest_mode);
     apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
+    g_free(deliver_bitmask);
 }
 
 bool is_x2apic_mode(DeviceState *dev)
@@ -442,57 +455,91 @@ static void apic_eoi(APICCommonState *s)
     apic_update_irq(s);
 }
 
-static int apic_find_dest(uint8_t dest)
+static bool apic_match_dest(APICCommonState *apic, uint32_t dest)
 {
-    APICCommonState *apic = local_apics[dest];
+    if (is_x2apic_mode(&apic->parent_obj)) {
+        return apic->initial_apic_id == dest;
+    } else {
+        return apic->id == (uint8_t)dest;
+    }
+}
+
+static void apic_find_dest(uint32_t *deliver_bitmask, uint32_t dest)
+{
+    APICCommonState *apic = NULL;
     int i;
 
-    if (apic && apic->id == dest)
-        return dest;  /* shortcut in case apic->id == local_apics[dest]->id */
-
-    for (i = 0; i < MAX_APICS; i++) {
+    for (i = 0; i < max_apics; i++) {
         apic = local_apics[i];
-        if (apic && apic->id == dest)
-            return i;
-        if (!apic)
-            break;
+        if (apic && apic_match_dest(apic, dest)) {
+            apic_set_bit(deliver_bitmask, i);
+        }
     }
+}
 
-    return -1;
+static void apic_get_broadcast_bitmask(uint32_t *deliver_bitmask,
+                                       bool is_x2apic_broadcast)
+{
+    int i;
+    APICCommonState *apic_iter;
+
+    for (i = 0; i < max_apics; i++) {
+        apic_iter = local_apics[i];
+        if (apic_iter) {
+            bool apic_in_x2apic = is_x2apic_mode(&apic_iter->parent_obj);
+
+            if (is_x2apic_broadcast && apic_in_x2apic) {
+                apic_set_bit(deliver_bitmask, i);
+            } else if (!is_x2apic_broadcast && !apic_in_x2apic) {
+                apic_set_bit(deliver_bitmask, i);
+            }
+        }
+    }
 }
 
 static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
-                                      uint8_t dest, uint8_t dest_mode)
+                                      uint32_t dest, uint8_t dest_mode)
 {
     APICCommonState *apic_iter;
     int i;
 
+    memset(deliver_bitmask, 0x00, max_apic_words * sizeof(uint32_t));
+
+    /* x2APIC broadcast id for both physical and logical (cluster) mode */
+    if (dest == 0xffffffff) {
+        apic_get_broadcast_bitmask(deliver_bitmask, true);
+        return;
+    }
+
     if (dest_mode == 0) {
+        apic_find_dest(deliver_bitmask, dest);
+        /* Broadcast to xAPIC mode apics */
         if (dest == 0xff) {
-            memset(deliver_bitmask, 0xff, MAX_APIC_WORDS * sizeof(uint32_t));
-        } else {
-            int idx = apic_find_dest(dest);
-            memset(deliver_bitmask, 0x00, MAX_APIC_WORDS * sizeof(uint32_t));
-            if (idx >= 0)
-                apic_set_bit(deliver_bitmask, idx);
+            apic_get_broadcast_bitmask(deliver_bitmask, false);
         }
     } else {
         /* XXX: cluster mode */
-        memset(deliver_bitmask, 0x00, MAX_APIC_WORDS * sizeof(uint32_t));
-        for(i = 0; i < MAX_APICS; i++) {
+        for(i = 0; i < max_apics; i++) {
             apic_iter = local_apics[i];
             if (apic_iter) {
-                if (apic_iter->dest_mode == 0xf) {
-                    if (dest & apic_iter->log_dest)
-                        apic_set_bit(deliver_bitmask, i);
-                } else if (apic_iter->dest_mode == 0x0) {
-                    if ((dest & 0xf0) == (apic_iter->log_dest & 0xf0) &&
-                        (dest & apic_iter->log_dest & 0x0f)) {
+                /* x2APIC mode */
+                if (apic_iter->apicbase & MSR_IA32_APICBASE_EXTD) {
+                    if ((dest & 0xffff0000) == (apic_iter->log_dest & 0xffff0000) &&
+                        (dest & apic_iter->log_dest & 0xffff)) {
                         apic_set_bit(deliver_bitmask, i);
                     }
+                } else {
+                    if (apic_iter->dest_mode == 0xf) {
+                        if (dest & apic_iter->log_dest) {
+                            apic_set_bit(deliver_bitmask, i);
+                        }
+                    } else if (apic_iter->dest_mode == 0x0) {
+                        if ((dest & 0xf0) == (apic_iter->log_dest & 0xf0) &&
+                            (dest & apic_iter->log_dest & 0x0f)) {
+                            apic_set_bit(deliver_bitmask, i);
+                        }
+                    }
                 }
-            } else {
-                break;
             }
         }
     }
@@ -516,29 +563,36 @@ void apic_sipi(DeviceState *dev)
     s->wait_for_sipi = 0;
 }
 
-static void apic_deliver(DeviceState *dev, uint8_t dest, uint8_t dest_mode,
+static void apic_deliver(DeviceState *dev, uint32_t dest, uint8_t dest_mode,
                          uint8_t delivery_mode, uint8_t vector_num,
-                         uint8_t trigger_mode)
+                         uint8_t trigger_mode, uint8_t dest_shorthand)
 {
     APICCommonState *s = APIC(dev);
-    uint32_t deliver_bitmask[MAX_APIC_WORDS];
-    int dest_shorthand = (s->icr[0] >> 18) & 3;
     APICCommonState *apic_iter;
+    uint32_t deliver_bitmask_size = max_apic_words * sizeof(uint32_t);
+    uint32_t *deliver_bitmask = g_malloc(deliver_bitmask_size);
+    uint32_t current_apic_id;
+
+    if (is_x2apic_mode(dev)) {
+        current_apic_id = s->initial_apic_id;
+    } else {
+        current_apic_id = s->id;
+    }
 
     switch (dest_shorthand) {
     case 0:
         apic_get_delivery_bitmask(deliver_bitmask, dest, dest_mode);
         break;
     case 1:
-        memset(deliver_bitmask, 0x00, sizeof(deliver_bitmask));
-        apic_set_bit(deliver_bitmask, s->id);
+        memset(deliver_bitmask, 0x00, deliver_bitmask_size);
+        apic_set_bit(deliver_bitmask, current_apic_id);
         break;
     case 2:
-        memset(deliver_bitmask, 0xff, sizeof(deliver_bitmask));
+        memset(deliver_bitmask, 0xff, deliver_bitmask_size);
         break;
     case 3:
-        memset(deliver_bitmask, 0xff, sizeof(deliver_bitmask));
-        apic_reset_bit(deliver_bitmask, s->id);
+        memset(deliver_bitmask, 0xff, deliver_bitmask_size);
+        apic_reset_bit(deliver_bitmask, current_apic_id);
         break;
     }
 
@@ -562,6 +616,7 @@ static void apic_deliver(DeviceState *dev, uint8_t dest, uint8_t dest_mode,
     }
 
     apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
+    g_free(deliver_bitmask);
 }
 
 static bool apic_check_pic(APICCommonState *s)
@@ -657,7 +712,11 @@ uint64_t apic_register_read(int index)
 
     switch(index) {
     case 0x02: /* id */
-        val = s->id << 24;
+        if (is_x2apic_mode(dev)) {
+            val = s->initial_apic_id;
+        } else {
+            val = s->id << 24;
+        }
         break;
     case 0x03: /* version */
         val = s->version | ((APIC_LVT_NB - 1) << 16);
@@ -680,9 +739,17 @@ uint64_t apic_register_read(int index)
         val = 0;
         break;
     case 0x0d:
-        val = s->log_dest << 24;
+        if (is_x2apic_mode(dev)) {
+            val = s->log_dest;
+        } else {
+            val = s->log_dest << 24;
+        }
         break;
     case 0x0e:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         val = (s->dest_mode << 28) | 0xfffffff;
         break;
     case 0x0f:
@@ -745,7 +812,12 @@ static void apic_send_msi(MSIMessage *msi)
 {
     uint64_t addr = msi->address;
     uint32_t data = msi->data;
-    uint8_t dest = (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+    uint32_t dest = (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+    /*
+     * The higher 3 bytes of destination id is stored in higher word of
+     * msi address. See x86_iommu_irq_to_msi_message()
+     */
+    dest = dest | (addr >> 32);
     uint8_t vector = (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
     uint8_t dest_mode = (addr >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
     uint8_t trigger_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
@@ -769,6 +841,10 @@ void apic_register_write(int index, uint64_t val)
 
     switch(index) {
     case 0x02:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         s->id = (val >> 24);
         break;
     case 0x03:
@@ -788,9 +864,17 @@ void apic_register_write(int index, uint64_t val)
         apic_eoi(s);
         break;
     case 0x0d:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         s->log_dest = val >> 24;
         break;
     case 0x0e:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         s->dest_mode = val >> 28;
         break;
     case 0x0f:
@@ -802,13 +886,27 @@ void apic_register_write(int index, uint64_t val)
     case 0x20 ... 0x27:
     case 0x28:
         break;
-    case 0x30:
+    case 0x30: {
+        uint32_t dest;
+
         s->icr[0] = val;
-        apic_deliver(dev, (s->icr[1] >> 24) & 0xff, (s->icr[0] >> 11) & 1,
+        if (is_x2apic_mode(dev)) {
+            s->icr[1] = val >> 32;
+            dest = s->icr[1];
+        } else {
+            dest = (s->icr[1] >> 24) & 0xff;
+        }
+
+        apic_deliver(dev, dest, (s->icr[0] >> 11) & 1,
                      (s->icr[0] >> 8) & 7, (s->icr[0] & 0xff),
-                     (s->icr[0] >> 15) & 1);
+                     (s->icr[0] >> 15) & 1, (s->icr[0] >> 18) & 3);
         break;
+    }
     case 0x31:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         s->icr[1] = val;
         break;
     case 0x32 ... 0x37:
@@ -837,6 +935,23 @@ void apic_register_write(int index, uint64_t val)
             s->count_shift = (v + 1) & 7;
         }
         break;
+    case 0x3f: {
+        int vector = val & 0xff;
+
+        if (!is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
+        /*
+         * Self IPI is identical to IPI with
+         * - Destination shorthand: 1 (Self)
+         * - Trigger mode: 0 (Edge)
+         * - Delivery mode: 0 (Fixed)
+         */
+        apic_deliver(dev, 0, 0, APIC_DM_FIXED, vector, 0, 1);
+
+        break;
+    }
     default:
         s->esr |= APIC_ESR_ILLEGAL_ADDRESS;
         break;
@@ -894,12 +1009,6 @@ static void apic_realize(DeviceState *dev, Error **errp)
 {
     APICCommonState *s = APIC(dev);
 
-    if (s->id >= MAX_APICS) {
-        error_setg(errp, "%s initialization failed. APIC ID %d is invalid",
-                   object_get_typename(OBJECT(dev)), s->id);
-        return;
-    }
-
     if (kvm_enabled()) {
         warn_report("Userspace local APIC is deprecated for KVM.");
         warn_report("Do not use kernel-irqchip except for the -M isapc machine type.");
@@ -909,7 +1018,7 @@ static void apic_realize(DeviceState *dev, Error **errp)
                           APIC_SPACE_SIZE);
 
     s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, apic_timer, s);
-    local_apics[s->id] = s;
+    local_apics[s->initial_apic_id] = s;
 
     msi_nonbroken = true;
 }
@@ -919,7 +1028,7 @@ static void apic_unrealize(DeviceState *dev)
     APICCommonState *s = APIC(dev);
 
     timer_free(s->timer);
-    local_apics[s->id] = NULL;
+    local_apics[s->initial_apic_id] = NULL;
 }
 
 static void apic_class_init(ObjectClass *klass, void *data)
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 4a34f03047..39c207bd21 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -194,7 +194,11 @@ void apic_init_reset(DeviceState *dev)
     s = APIC_COMMON(dev);
     s->tpr = 0;
     s->spurious_vec = 0xff;
-    s->log_dest = 0;
+
+    if (!is_x2apic_mode(dev)) {
+        s->log_dest = 0;
+    }
+
     s->dest_mode = 0xf;
     memset(s->isr, 0, sizeof(s->isr));
     memset(s->tmr, 0, sizeof(s->tmr));
@@ -366,7 +370,7 @@ static const VMStateDescription vmstate_apic_common = {
         VMSTATE_UINT8(arb_id, APICCommonState),
         VMSTATE_UINT8(tpr, APICCommonState),
         VMSTATE_UINT32(spurious_vec, APICCommonState),
-        VMSTATE_UINT8(log_dest, APICCommonState),
+        VMSTATE_UINT32(log_dest, APICCommonState),
         VMSTATE_UINT8(dest_mode, APICCommonState),
         VMSTATE_UINT32_ARRAY(isr, APICCommonState, 8),
         VMSTATE_UINT32_ARRAY(tmr, APICCommonState, 8),
diff --git a/include/hw/i386/apic.h b/include/hw/i386/apic.h
index 2cebeb4faf..d938bfa8e0 100644
--- a/include/hw/i386/apic.h
+++ b/include/hw/i386/apic.h
@@ -3,7 +3,8 @@
 
 
 /* apic.c */
-void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
+void apic_set_max_apic_id(uint32_t max_apic_id);
+void apic_deliver_irq(uint32_t dest, uint8_t dest_mode, uint8_t delivery_mode,
                       uint8_t vector_num, uint8_t trigger_mode);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
index 5f2ba24bfc..5f60cd5e78 100644
--- a/include/hw/i386/apic_internal.h
+++ b/include/hw/i386/apic_internal.h
@@ -164,7 +164,7 @@ struct APICCommonState {
     uint8_t arb_id;
     uint8_t tpr;
     uint32_t spurious_vec;
-    uint8_t log_dest;
+    uint32_t log_dest;
     uint8_t dest_mode;
     uint32_t isr[8];  /* in service register */
     uint32_t tmr[8];  /* trigger mode register */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 3/5] apic, i386/tcg: add x2apic transitions
  2023-03-26  5:20 [PATCH v2 0/5] Support x2APIC mode with TCG accelerator Bui Quang Minh
  2023-03-26  5:20 ` [PATCH v2 1/5] i386/tcg: implement x2APIC registers MSR access Bui Quang Minh
  2023-03-26  5:20 ` [PATCH v2 2/5] apic: add support for x2APIC mode Bui Quang Minh
@ 2023-03-26  5:20 ` Bui Quang Minh
  2023-03-26  5:20 ` [PATCH v2 4/5] intel_iommu: allow Extended Interrupt Mode when using userspace APIC Bui Quang Minh
  2023-03-26  5:20 ` [PATCH v2 5/5] amd_iommu: report x2APIC support to the operating system Bui Quang Minh
  4 siblings, 0 replies; 23+ messages in thread
From: Bui Quang Minh @ 2023-03-26  5:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, David Woodhouse,
	Igor Mammedov, Alex Bennée, Bui Quang Minh

This commit adds support for x2APIC transitions when writing to
MSR_IA32_APICBASE register and finally adds CPUID_EXT_X2APIC to
TCG_EXT_FEATURES.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/intc/apic.c                       | 50 ++++++++++++++++++++++++++++
 hw/intc/apic_common.c                |  7 ++--
 target/i386/cpu-sysemu.c             | 10 ++++++
 target/i386/cpu.c                    |  5 +--
 target/i386/cpu.h                    |  6 ++++
 target/i386/tcg/sysemu/misc_helper.c |  4 +++
 6 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index 159527af30..735412274d 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -308,8 +308,41 @@ bool is_x2apic_mode(DeviceState *dev)
     return s->apicbase & MSR_IA32_APICBASE_EXTD;
 }
 
+static void apic_set_base_check(APICCommonState *s, uint64_t val)
+{
+    /* Enable x2apic when x2apic is not supported by CPU */
+    if (!cpu_has_x2apic_feature(&s->cpu->env) &&
+        val & MSR_IA32_APICBASE_EXTD)
+        raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+    /*
+     * Transition into invalid state
+     * (s->apicbase & MSR_IA32_APICBASE_ENABLE == 0) &&
+     * (s->apicbase & MSR_IA32_APICBASE_EXTD) == 1
+     */
+    if (!(val & MSR_IA32_APICBASE_ENABLE) &&
+        (val & MSR_IA32_APICBASE_EXTD))
+        raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+    /* Invalid transition from disabled mode to x2APIC */
+    if (!(s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+        !(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+        (val & MSR_IA32_APICBASE_ENABLE) &&
+        (val & MSR_IA32_APICBASE_EXTD))
+        raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+    /* Invalid transition from x2APIC to xAPIC */
+    if ((s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+        (s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+        (val & MSR_IA32_APICBASE_ENABLE) &&
+        !(val & MSR_IA32_APICBASE_EXTD))
+        raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+}
+
 static void apic_set_base(APICCommonState *s, uint64_t val)
 {
+    apic_set_base_check(s, val);
+
     s->apicbase = (val & 0xfffff000) |
         (s->apicbase & (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE));
     /* if disabled, cannot be enabled again */
@@ -318,6 +351,23 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
         cpu_clear_apic_feature(&s->cpu->env);
         s->spurious_vec &= ~APIC_SV_ENABLE;
     }
+
+    /* Transition from disabled mode to xAPIC */
+    if (!(s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+        (val & MSR_IA32_APICBASE_ENABLE)) {
+        s->apicbase |= MSR_IA32_APICBASE_ENABLE;
+        cpu_set_apic_feature(&s->cpu->env);
+    }
+
+    /* Transition from xAPIC to x2APIC */
+    if (cpu_has_x2apic_feature(&s->cpu->env) &&
+        !(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+        (val & MSR_IA32_APICBASE_EXTD)) {
+        s->apicbase |= MSR_IA32_APICBASE_EXTD;
+
+        s->log_dest = ((s->initial_apic_id & 0xffff0) << 16) |
+                      (1 << (s->initial_apic_id & 0xf));
+    }
 }
 
 static void apic_set_tpr(APICCommonState *s, uint8_t val)
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 39c207bd21..3d249838fc 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -43,11 +43,8 @@ void cpu_set_apic_base(DeviceState *dev, uint64_t val)
     if (dev) {
         APICCommonState *s = APIC_COMMON(dev);
         APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
-        /* switching to x2APIC, reset possibly modified xAPIC ID */
-        if (!(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
-            (val & MSR_IA32_APICBASE_EXTD)) {
-            s->id = s->initial_apic_id;
-        }
+        /* Reset possibly modified xAPIC ID */
+        s->id = s->initial_apic_id;
         info->set_base(s, val);
     }
 }
diff --git a/target/i386/cpu-sysemu.c b/target/i386/cpu-sysemu.c
index 28115edf44..de57892c25 100644
--- a/target/i386/cpu-sysemu.c
+++ b/target/i386/cpu-sysemu.c
@@ -235,6 +235,16 @@ void cpu_clear_apic_feature(CPUX86State *env)
     env->features[FEAT_1_EDX] &= ~CPUID_APIC;
 }
 
+void cpu_set_apic_feature(CPUX86State *env)
+{
+    env->features[FEAT_1_EDX] |= CPUID_APIC;
+}
+
+bool cpu_has_x2apic_feature(CPUX86State *env)
+{
+    return env->features[FEAT_1_ECX] & CPUID_EXT_X2APIC;
+}
+
 bool cpu_is_bsp(X86CPU *cpu)
 {
     return cpu_get_apic_base(cpu->apic_state) & MSR_IA32_APICBASE_BSP;
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6576287e5b..6847b2ae02 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -627,12 +627,13 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
           CPUID_EXT_XSAVE | /* CPUID_EXT_OSXSAVE is dynamic */   \
           CPUID_EXT_MOVBE | CPUID_EXT_AES | CPUID_EXT_HYPERVISOR | \
           CPUID_EXT_RDRAND | CPUID_EXT_AVX | CPUID_EXT_F16C | \
-          CPUID_EXT_FMA)
+          CPUID_EXT_FMA | CPUID_EXT_X2APIC)
           /* missing:
           CPUID_EXT_DTES64, CPUID_EXT_DSCPL, CPUID_EXT_VMX, CPUID_EXT_SMX,
           CPUID_EXT_EST, CPUID_EXT_TM2, CPUID_EXT_CID,
           CPUID_EXT_XTPR, CPUID_EXT_PDCM, CPUID_EXT_PCID, CPUID_EXT_DCA,
-          CPUID_EXT_X2APIC, CPUID_EXT_TSC_DEADLINE_TIMER */
+          CPUID_EXT_TSC_DEADLINE_TIMER
+          */
 
 #ifdef TARGET_X86_64
 #define TCG_EXT2_X86_64_FEATURES (CPUID_EXT2_SYSCALL | CPUID_EXT2_LM)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 02165a5ad2..53cd8f01dd 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -379,6 +379,10 @@ typedef enum X86Seg {
 #define MSR_IA32_APICBASE_ENABLE        (1<<11)
 #define MSR_IA32_APICBASE_EXTD          (1 << 10)
 #define MSR_IA32_APICBASE_BASE          (0xfffffU<<12)
+#define MSR_IA32_APICBASE_RESERVED \
+        (~(uint64_t)(MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE \
+                     | MSR_IA32_APICBASE_EXTD | MSR_IA32_APICBASE_BASE))
+
 #define MSR_IA32_FEATURE_CONTROL        0x0000003a
 #define MSR_TSC_ADJUST                  0x0000003b
 #define MSR_IA32_SPEC_CTRL              0x48
@@ -2158,8 +2162,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                    uint32_t *eax, uint32_t *ebx,
                    uint32_t *ecx, uint32_t *edx);
 void cpu_clear_apic_feature(CPUX86State *env);
+void cpu_set_apic_feature(CPUX86State *env);
 void host_cpuid(uint32_t function, uint32_t count,
                 uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx);
+bool cpu_has_x2apic_feature(CPUX86State *env);
 
 /* helper.c */
 void x86_cpu_set_a20(X86CPU *cpu, int a20_state);
diff --git a/target/i386/tcg/sysemu/misc_helper.c b/target/i386/tcg/sysemu/misc_helper.c
index 1fce2076a3..91a58d4d97 100644
--- a/target/i386/tcg/sysemu/misc_helper.c
+++ b/target/i386/tcg/sysemu/misc_helper.c
@@ -159,6 +159,10 @@ void helper_wrmsr(CPUX86State *env)
         env->sysenter_eip = val;
         break;
     case MSR_IA32_APICBASE:
+        if (val & MSR_IA32_APICBASE_RESERVED) {
+            goto error;
+        }
+
         cpu_set_apic_base(env_archcpu(env)->apic_state, val);
         break;
     case MSR_EFER:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 4/5] intel_iommu: allow Extended Interrupt Mode when using userspace APIC
  2023-03-26  5:20 [PATCH v2 0/5] Support x2APIC mode with TCG accelerator Bui Quang Minh
                   ` (2 preceding siblings ...)
  2023-03-26  5:20 ` [PATCH v2 3/5] apic, i386/tcg: add x2apic transitions Bui Quang Minh
@ 2023-03-26  5:20 ` Bui Quang Minh
  2023-03-26  5:20 ` [PATCH v2 5/5] amd_iommu: report x2APIC support to the operating system Bui Quang Minh
  4 siblings, 0 replies; 23+ messages in thread
From: Bui Quang Minh @ 2023-03-26  5:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, David Woodhouse,
	Igor Mammedov, Alex Bennée, Bui Quang Minh

As userspace APIC now supports x2APIC, intel interrupt remapping
hardware can be set to EIM mode when userspace local APIC is used.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/i386/intel_iommu.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index faade7def8..a34a4c8196 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4045,17 +4045,6 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
                       && x86_iommu_ir_supported(x86_iommu) ?
                                               ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
     }
-    if (s->intr_eim == ON_OFF_AUTO_ON && !s->buggy_eim) {
-        if (!kvm_irqchip_is_split()) {
-            error_setg(errp, "eim=on requires accel=kvm,kernel-irqchip=split");
-            return false;
-        }
-        if (!kvm_enable_x2apic()) {
-            error_setg(errp, "eim=on requires support on the KVM side"
-                             "(X2APIC_API, first shipped in v4.7)");
-            return false;
-        }
-    }
 
     /* Currently only address widths supported are 39 and 48 bits */
     if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 5/5] amd_iommu: report x2APIC support to the operating system
  2023-03-26  5:20 [PATCH v2 0/5] Support x2APIC mode with TCG accelerator Bui Quang Minh
                   ` (3 preceding siblings ...)
  2023-03-26  5:20 ` [PATCH v2 4/5] intel_iommu: allow Extended Interrupt Mode when using userspace APIC Bui Quang Minh
@ 2023-03-26  5:20 ` Bui Quang Minh
  4 siblings, 0 replies; 23+ messages in thread
From: Bui Quang Minh @ 2023-03-26  5:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, David Woodhouse,
	Igor Mammedov, Alex Bennée, Bui Quang Minh

This commit adds XTSup configuration to let user choose to whether enable
this feature or not. When XTSup is enabled, additional bytes in IRTE with
enabled guest virtual VAPIC are used to support 32-bit destination id.

Additionally, this commit changes to use IVHD type 0x11 in ACPI table for
feature report to operating system. This is because Linux does not use
XTSup in IOMMU Feature Reporting field of IVHD type 0x10 but only use XTSup
bit in EFR Register Image of IVHD 0x11 to indicate x2APIC support (see
init_iommu_one in linux/drivers/iommu/amd/init.c)

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/i386/acpi-build.c | 28 ++++++++++++++--------------
 hw/i386/amd_iommu.c  | 21 +++++++++++++++++++--
 hw/i386/amd_iommu.h  | 16 +++++++++++-----
 3 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index ec857a117e..72d6bb2892 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2339,7 +2339,7 @@ static void
 build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
                 const char *oem_table_id)
 {
-    int ivhd_table_len = 24;
+    int ivhd_table_len = 40;
     AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
     GArray *ivhd_blob = g_array_new(false, true, 1);
     AcpiTable table = { .sig = "IVRS", .rev = 1, .oem_id = oem_id,
@@ -2349,18 +2349,19 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
     /* IVinfo - IO virtualization information common to all
      * IOMMU units in a system
      */
-    build_append_int_noprefix(table_data, 40UL << 8/* PASize */, 4);
+    build_append_int_noprefix(table_data,
+                             (1UL << 0) | /* EFRSup */
+                             (40UL << 8), /* PASize */
+                             4);
     /* reserved */
     build_append_int_noprefix(table_data, 0, 8);
 
-    /* IVHD definition - type 10h */
-    build_append_int_noprefix(table_data, 0x10, 1);
+    /* IVHD definition - type 11h */
+    build_append_int_noprefix(table_data, 0x11, 1);
     /* virtualization flags */
     build_append_int_noprefix(table_data,
                              (1UL << 0) | /* HtTunEn      */
-                             (1UL << 4) | /* iotblSup     */
-                             (1UL << 6) | /* PrefSup      */
-                             (1UL << 7),  /* PPRSup       */
+                             (1UL << 4),  /* iotblSup     */
                              1);
 
     /*
@@ -2404,13 +2405,12 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
     build_append_int_noprefix(table_data, 0, 2);
     /* IOMMU info */
     build_append_int_noprefix(table_data, 0, 2);
-    /* IOMMU Feature Reporting */
-    build_append_int_noprefix(table_data,
-                             (48UL << 30) | /* HATS   */
-                             (48UL << 28) | /* GATS   */
-                             (1UL << 2)   | /* GTSup  */
-                             (1UL << 6),    /* GASup  */
-                             4);
+    /* IOMMU Attributes */
+    build_append_int_noprefix(table_data, 0, 4);
+    /* EFR Register Image */
+    build_append_int_noprefix(table_data, s->efr_reg, 8);
+    /* EFR Register Image 2 */
+    build_append_int_noprefix(table_data, 0, 8);
 
     /* IVHD entries as found above */
     g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len);
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index bcd016f5c5..5dfa93d945 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -31,6 +31,7 @@
 #include "hw/i386/apic_internal.h"
 #include "trace.h"
 #include "hw/i386/apic-msidef.h"
+#include "hw/qdev-properties.h"
 
 /* used AMD-Vi MMIO registers */
 const char *amdvi_mmio_low[] = {
@@ -1155,7 +1156,12 @@ static int amdvi_int_remap_ga(AMDVIState *iommu,
     irq->vector = irte.hi.fields.vector;
     irq->dest_mode = irte.lo.fields_remap.dm;
     irq->redir_hint = irte.lo.fields_remap.rq_eoi;
-    irq->dest = irte.lo.fields_remap.destination;
+    if (!iommu->xtsup) {
+        irq->dest = irte.lo.fields_remap.destination & 0xff;
+    } else {
+        irq->dest = irte.lo.fields_remap.destination |
+                    (irte.hi.fields.destination_hi << 24);
+    }
 
     return 0;
 }
@@ -1503,10 +1509,15 @@ static void amdvi_init(AMDVIState *s)
     s->enabled = false;
     s->ats_enabled = false;
     s->cmdbuf_enabled = false;
+    s->efr_reg = AMDVI_DEFAULT_EXT_FEATURES;
+
+    if (s->xtsup) {
+        s->efr_reg |= AMDVI_FEATURE_XT;
+    }
 
     /* reset MMIO */
     memset(s->mmior, 0, AMDVI_MMIO_SIZE);
-    amdvi_set_quad(s, AMDVI_MMIO_EXT_FEATURES, AMDVI_EXT_FEATURES,
+    amdvi_set_quad(s, AMDVI_MMIO_EXT_FEATURES, s->efr_reg,
             0xffffffffffffffef, 0);
     amdvi_set_quad(s, AMDVI_MMIO_STATUS, 0, 0x98, 0x67);
 
@@ -1586,6 +1597,11 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
     amdvi_init(s);
 }
 
+static Property amdvi_properties[] = {
+    DEFINE_PROP_BOOL("xtsup", AMDVIState, xtsup, false),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static const VMStateDescription vmstate_amdvi_sysbus = {
     .name = "amd-iommu",
     .unmigratable = 1
@@ -1612,6 +1628,7 @@ static void amdvi_sysbus_class_init(ObjectClass *klass, void *data)
     dc->user_creatable = true;
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     dc->desc = "AMD IOMMU (AMD-Vi) DMA Remapping device";
+    device_class_set_props(dc, amdvi_properties);
 }
 
 static const TypeInfo amdvi_sysbus = {
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 79d38a3e41..96df7b0400 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -154,6 +154,7 @@
 
 #define AMDVI_FEATURE_PREFETCH            (1ULL << 0) /* page prefetch       */
 #define AMDVI_FEATURE_PPR                 (1ULL << 1) /* PPR Support         */
+#define AMDVI_FEATURE_XT                  (1ULL << 2) /* x2APIC Support      */
 #define AMDVI_FEATURE_GT                  (1ULL << 4) /* Guest Translation   */
 #define AMDVI_FEATURE_IA                  (1ULL << 6) /* inval all support   */
 #define AMDVI_FEATURE_GA                  (1ULL << 7) /* guest VAPIC support */
@@ -173,8 +174,9 @@
 #define AMDVI_IOTLB_MAX_SIZE 1024
 #define AMDVI_DEVID_SHIFT    36
 
-/* extended feature support */
-#define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
+/* default extended feature */
+#define AMDVI_DEFAULT_EXT_FEATURES \
+        (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
         AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \
         AMDVI_GATS_MODE | AMDVI_HATS_MODE | AMDVI_FEATURE_GA)
 
@@ -278,8 +280,8 @@ union irte_ga_lo {
                 dm:1,
                 /* ------ */
                 guest_mode:1,
-                destination:8,
-                rsvd_1:48;
+                destination:24,
+                rsvd_1:32;
   } fields_remap;
 };
 
@@ -287,7 +289,8 @@ union irte_ga_hi {
   uint64_t val;
   struct {
       uint64_t  vector:8,
-                rsvd_2:56;
+                rsvd_2:48,
+                destination_hi:8;
   } fields;
 };
 
@@ -367,6 +370,9 @@ struct AMDVIState {
 
     /* Interrupt remapping */
     bool ga_enabled;
+    bool xtsup;
+
+    uint64_t efr_reg;            /* extended feature register */
 };
 
 #endif
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-03-26  5:20 ` [PATCH v2 2/5] apic: add support for x2APIC mode Bui Quang Minh
@ 2023-03-27 11:04   ` David Woodhouse
  2023-03-27 15:33     ` Bui Quang Minh
  0 siblings, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2023-03-27 11:04 UTC (permalink / raw)
  To: Bui Quang Minh, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

[-- Attachment #1: Type: text/plain, Size: 6168 bytes --]

On Sun, 2023-03-26 at 12:20 +0700, Bui Quang Minh wrote:
> This commit extends the APIC ID to 32-bit long and remove the 255 max APIC
> ID limit in userspace APIC. The array that manages local APICs is now
> dynamically allocated based on the max APIC ID of created x86 machine.
> Also, new x2APIC IPI destination determination scheme, self IPI and x2APIC
> mode register access are supported.
> 
> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
> ---
>  hw/i386/x86.c                   |   8 +-
>  hw/intc/apic.c                  | 229 +++++++++++++++++++++++---------
>  hw/intc/apic_common.c           |   8 +-
>  include/hw/i386/apic.h          |   3 +-
>  include/hw/i386/apic_internal.h |   2 +-
>  5 files changed, 184 insertions(+), 66 deletions(-)
> 
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index a88a126123..fa9b15190d 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -132,11 +132,11 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
>       * Can we support APIC ID 255 or higher?
>       *
>       * Under Xen: yes.
> -     * With userspace emulated lapic: no
> +     * With userspace emulated lapic: yes.

Are you making this unconditional? It shall not be possible to emulate
a CPU *without* X2APIC?


>       * With KVM's in-kernel lapic: only if X2APIC API is enabled.
>       */
>      if (x86ms->apic_id_limit > 255 && !xen_enabled() &&
> -        (!kvm_irqchip_in_kernel() || !kvm_enable_x2apic())) {
> +        kvm_irqchip_in_kernel() && !kvm_enable_x2apic()) {
>          error_report("current -smp configuration requires kernel "
>                       "irqchip and X2APIC API support.");
>          exit(EXIT_FAILURE);
...
> @@ -276,16 +288,17 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
>                   apic_set_irq(apic_iter, vector_num, trigger_mode) );
>  }
>  
> -void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
> +void apic_deliver_irq(uint32_t dest, uint8_t dest_mode, uint8_t delivery_mode,
>                        uint8_t vector_num, uint8_t trigger_mode)

We can make this 'static' while we're here. It isn't actually called
from anywhere else, is it?

>  {
> -    uint32_t deliver_bitmask[MAX_APIC_WORDS];
> +    uint32_t *deliver_bitmask = g_malloc(max_apic_words * sizeof(uint32_t));
>  
>      trace_apic_deliver_irq(dest, dest_mode, delivery_mode, vector_num,
>                             trigger_mode);
>  
>      apic_get_delivery_bitmask(deliver_bitmask, dest, dest_mode);
>      apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
> +    g_free(deliver_bitmask);
>  }
>  
>  bool is_x2apic_mode(DeviceState *dev)
...
>  
>  static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
> -                                      uint8_t dest, uint8_t dest_mode)
> +                                      uint32_t dest, uint8_t dest_mode)
>  {
>      APICCommonState *apic_iter;
>      int i;
>  
> +    memset(deliver_bitmask, 0x00, max_apic_words * sizeof(uint32_t));
> +
> +    /* x2APIC broadcast id for both physical and logical (cluster) mode */
> +    if (dest == 0xffffffff) {
> +        apic_get_broadcast_bitmask(deliver_bitmask, true);
> +        return;
> +    }
> +
>      if (dest_mode == 0) {

Might be nice to have a constant for DEST_MODE_PHYS vs.
DEST_MODE_LOGICAL to make this clearer? 

> +        apic_find_dest(deliver_bitmask, dest);
> +        /* Broadcast to xAPIC mode apics */
>          if (dest == 0xff) {
> -            memset(deliver_bitmask, 0xff, MAX_APIC_WORDS * sizeof(uint32_t));
> -        } else {
> -            int idx = apic_find_dest(dest);
> -            memset(deliver_bitmask, 0x00, MAX_APIC_WORDS * sizeof(uint32_t));
> -            if (idx >= 0)
> -                apic_set_bit(deliver_bitmask, idx);
> +            apic_get_broadcast_bitmask(deliver_bitmask, false);


Hrm... aren't you still interpreting destination 0x000000FF as
broadcast even for X2APIC mode? Or am I misreading this?


>          }
>      } else {
>          /* XXX: cluster mode */
> 
...

> @@ -366,7 +370,7 @@ static const VMStateDescription vmstate_apic_common = {
>          VMSTATE_UINT8(arb_id, APICCommonState),
>          VMSTATE_UINT8(tpr, APICCommonState),
>          VMSTATE_UINT32(spurious_vec, APICCommonState),
> -        VMSTATE_UINT8(log_dest, APICCommonState),
> +        VMSTATE_UINT32(log_dest, APICCommonState),
>          VMSTATE_UINT8(dest_mode, APICCommonState),
>          VMSTATE_UINT32_ARRAY(isr, APICCommonState, 8),
>          VMSTATE_UINT32_ARRAY(tmr, APICCommonState, 8),


Hm, doesn't this need to be added in a separate subsection, much as
ide_drive/pio_state in the example in docs/devel/migration.rst? Or did
I *not* need to do that in commit ecb0e98b4 (unrelated to x2apic, but
similar addition of state)?

Can you confirm that you've tested the behaviour when migrating back
from this to an older QEMU, both for a guest *with* X2APIC enabled
(which should fail gracefully), and a guest without X2APIC (which
should work).

> diff --git a/include/hw/i386/apic.h b/include/hw/i386/apic.h
> index 2cebeb4faf..d938bfa8e0 100644
> --- a/include/hw/i386/apic.h
> +++ b/include/hw/i386/apic.h
> @@ -3,7 +3,8 @@
>  
>  
>  /* apic.c */
> -void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
> +void apic_set_max_apic_id(uint32_t max_apic_id);
> +void apic_deliver_irq(uint32_t dest, uint8_t dest_mode, uint8_t delivery_mode,
>                        uint8_t vector_num, uint8_t trigger_mode);

Making it static means this can be removed, of course.




[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-03-27 11:04   ` David Woodhouse
@ 2023-03-27 15:33     ` Bui Quang Minh
  2023-03-27 15:37       ` David Woodhouse
  0 siblings, 1 reply; 23+ messages in thread
From: Bui Quang Minh @ 2023-03-27 15:33 UTC (permalink / raw)
  To: David Woodhouse, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

On 3/27/23 18:04, David Woodhouse wrote:
> On Sun, 2023-03-26 at 12:20 +0700, Bui Quang Minh wrote:
>> This commit extends the APIC ID to 32-bit long and remove the 255 max APIC
>> ID limit in userspace APIC. The array that manages local APICs is now
>> dynamically allocated based on the max APIC ID of created x86 machine.
>> Also, new x2APIC IPI destination determination scheme, self IPI and x2APIC
>> mode register access are supported.
>>
>> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
>> ---
>>   hw/i386/x86.c                   |   8 +-
>>   hw/intc/apic.c                  | 229 +++++++++++++++++++++++---------
>>   hw/intc/apic_common.c           |   8 +-
>>   include/hw/i386/apic.h          |   3 +-
>>   include/hw/i386/apic_internal.h |   2 +-
>>   5 files changed, 184 insertions(+), 66 deletions(-)
>>
>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>> index a88a126123..fa9b15190d 100644
>> --- a/hw/i386/x86.c
>> +++ b/hw/i386/x86.c
>> @@ -132,11 +132,11 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
>>        * Can we support APIC ID 255 or higher?
>>        *
>>        * Under Xen: yes.
>> -     * With userspace emulated lapic: no
>> +     * With userspace emulated lapic: yes.
> 
> Are you making this unconditional? It shall not be possible to emulate
> a CPU *without* X2APIC?

You are right, this should report error when APIC ID is higher than 255 
and x2APIC is not supported by the CPU.

> 
>>        * With KVM's in-kernel lapic: only if X2APIC API is enabled.
>>        */
>>       if (x86ms->apic_id_limit > 255 && !xen_enabled() &&
>> -        (!kvm_irqchip_in_kernel() || !kvm_enable_x2apic())) {
>> +        kvm_irqchip_in_kernel() && !kvm_enable_x2apic()) {
>>           error_report("current -smp configuration requires kernel "
>>                        "irqchip and X2APIC API support.");
>>           exit(EXIT_FAILURE);
> ...
>> @@ -276,16 +288,17 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
>>                    apic_set_irq(apic_iter, vector_num, trigger_mode) );
>>   }
>>   
>> -void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
>> +void apic_deliver_irq(uint32_t dest, uint8_t dest_mode, uint8_t delivery_mode,
>>                         uint8_t vector_num, uint8_t trigger_mode)
> 
> We can make this 'static' while we're here. It isn't actually called
> from anywhere else, is it?

I'll fix it in the next version.

>>   
>>   static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
>> -                                      uint8_t dest, uint8_t dest_mode)
>> +                                      uint32_t dest, uint8_t dest_mode)
>>   {
>>       APICCommonState *apic_iter;
>>       int i;
>>   
>> +    memset(deliver_bitmask, 0x00, max_apic_words * sizeof(uint32_t));
>> +
>> +    /* x2APIC broadcast id for both physical and logical (cluster) mode */
>> +    if (dest == 0xffffffff) {
>> +        apic_get_broadcast_bitmask(deliver_bitmask, true);
>> +        return;
>> +    }
>> +
>>       if (dest_mode == 0) {
> 
> Might be nice to have a constant for DEST_MODE_PHYS vs.
> DEST_MODE_LOGICAL to make this clearer?

I'll fix it in the next version.

>> +        apic_find_dest(deliver_bitmask, dest);
>> +        /* Broadcast to xAPIC mode apics */
>>           if (dest == 0xff) {
>> -            memset(deliver_bitmask, 0xff, MAX_APIC_WORDS * sizeof(uint32_t));
>> -        } else {
>> -            int idx = apic_find_dest(dest);
>> -            memset(deliver_bitmask, 0x00, MAX_APIC_WORDS * sizeof(uint32_t));
>> -            if (idx >= 0)
>> -                apic_set_bit(deliver_bitmask, idx);
>> +            apic_get_broadcast_bitmask(deliver_bitmask, false);
> 
> 
> Hrm... aren't you still interpreting destination 0x000000FF as
> broadcast even for X2APIC mode? Or am I misreading this?

In case the destination is 0xFF, the IPI will be delivered to CPU has 
APIC ID 0xFF if it is in x2APIC mode, and it will be delivered to all 
CPUs that are in xAPIC mode. In case the destination is 0xFFFFFFFF, the 
IPI is delivered to all CPUs that are in x2APIC mode. I've created 
apic_get_broadcast_bitmask function and changed the apic_find_dest to 
implement that logic.

>> @@ -366,7 +370,7 @@ static const VMStateDescription vmstate_apic_common = {
>>           VMSTATE_UINT8(arb_id, APICCommonState),
>>           VMSTATE_UINT8(tpr, APICCommonState),
>>           VMSTATE_UINT32(spurious_vec, APICCommonState),
>> -        VMSTATE_UINT8(log_dest, APICCommonState),
>> +        VMSTATE_UINT32(log_dest, APICCommonState),
>>           VMSTATE_UINT8(dest_mode, APICCommonState),
>>           VMSTATE_UINT32_ARRAY(isr, APICCommonState, 8),
>>           VMSTATE_UINT32_ARRAY(tmr, APICCommonState, 8),
> 
> 
> Hm, doesn't this need to be added in a separate subsection, much as
> ide_drive/pio_state in the example in docs/devel/migration.rst? Or did
> I *not* need to do that in commit ecb0e98b4 (unrelated to x2apic, but
> similar addition of state)?
> 
> Can you confirm that you've tested the behaviour when migrating back
> from this to an older QEMU, both for a guest *with* X2APIC enabled
> (which should fail gracefully), and a guest without X2APIC (which
> should work).

Oh, thank you for pointing out, I actually don't understand the use of 
vmstate before, I'll look at the document more and fix it.

>> diff --git a/include/hw/i386/apic.h b/include/hw/i386/apic.h
>> index 2cebeb4faf..d938bfa8e0 100644
>> --- a/include/hw/i386/apic.h
>> +++ b/include/hw/i386/apic.h
>> @@ -3,7 +3,8 @@
>>   
>>   
>>   /* apic.c */
>> -void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
>> +void apic_set_max_apic_id(uint32_t max_apic_id);
>> +void apic_deliver_irq(uint32_t dest, uint8_t dest_mode, uint8_t delivery_mode,
>>                         uint8_t vector_num, uint8_t trigger_mode);
> 
> Making it static means this can be removed, of course.

I'll fix it in next version.

Thanks,
Quang Minh.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-03-27 15:33     ` Bui Quang Minh
@ 2023-03-27 15:37       ` David Woodhouse
  2023-03-27 15:45         ` Bui Quang Minh
  0 siblings, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2023-03-27 15:37 UTC (permalink / raw)
  To: Bui Quang Minh, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

[-- Attachment #1: Type: text/plain, Size: 2033 bytes --]

On Mon, 2023-03-27 at 22:33 +0700, Bui Quang Minh wrote:
> 
> > > +    memset(deliver_bitmask, 0x00, max_apic_words * sizeof(uint32_t));
> > > +
> > > +    /* x2APIC broadcast id for both physical and logical (cluster) mode */
> > > +    if (dest == 0xffffffff) {
> > > +        apic_get_broadcast_bitmask(deliver_bitmask, true);
> > > +        return;
> > > +    }
> > > +
> > >        if (dest_mode == 0) {
> > 
> > Might be nice to have a constant for DEST_MODE_PHYS vs.
> > DEST_MODE_LOGICAL to make this clearer?
> 
> I'll fix it in the next version.
> 
> > > +        apic_find_dest(deliver_bitmask, dest);
> > > +        /* Broadcast to xAPIC mode apics */
> > >            if (dest == 0xff) {
> > > -            memset(deliver_bitmask, 0xff, MAX_APIC_WORDS * sizeof(uint32_t));
> > > -        } else {
> > > -            int idx = apic_find_dest(dest);
> > > -            memset(deliver_bitmask, 0x00, MAX_APIC_WORDS * sizeof(uint32_t));
> > > -            if (idx >= 0)
> > > -                apic_set_bit(deliver_bitmask, idx);
> > > +            apic_get_broadcast_bitmask(deliver_bitmask, false);
> > 
> > 
> > Hrm... aren't you still interpreting destination 0x000000FF as
> > broadcast even for X2APIC mode? Or am I misreading this?
> 
> In case the destination is 0xFF, the IPI will be delivered to CPU has
> APIC ID 0xFF if it is in x2APIC mode, and it will be delivered to all
> CPUs that are in xAPIC mode. In case the destination is 0xFFFFFFFF, the 
> IPI is delivered to all CPUs that are in x2APIC mode. I've created 
> apic_get_broadcast_bitmask function and changed the apic_find_dest to
> implement that logic.

Maybe I'm misreading the patch, but to me it looks that
if (dest == 0xff) apic_get_broadcast_bitmask() bit applies even in
x2apic mode? So delivering to the APIC with physical ID 255 will be
misinterpreted as a broadcast? 

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-03-27 15:37       ` David Woodhouse
@ 2023-03-27 15:45         ` Bui Quang Minh
  2023-03-27 16:22           ` David Woodhouse
  0 siblings, 1 reply; 23+ messages in thread
From: Bui Quang Minh @ 2023-03-27 15:45 UTC (permalink / raw)
  To: David Woodhouse, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

On 3/27/23 22:37, David Woodhouse wrote:
> On Mon, 2023-03-27 at 22:33 +0700, Bui Quang Minh wrote:
>>
>>>> +    memset(deliver_bitmask, 0x00, max_apic_words * sizeof(uint32_t));
>>>> +
>>>> +    /* x2APIC broadcast id for both physical and logical (cluster) mode */
>>>> +    if (dest == 0xffffffff) {
>>>> +        apic_get_broadcast_bitmask(deliver_bitmask, true);
>>>> +        return;
>>>> +    }
>>>> +
>>>>         if (dest_mode == 0) {
>>>
>>> Might be nice to have a constant for DEST_MODE_PHYS vs.
>>> DEST_MODE_LOGICAL to make this clearer?
>>
>> I'll fix it in the next version.
>>
>>>> +        apic_find_dest(deliver_bitmask, dest);
>>>> +        /* Broadcast to xAPIC mode apics */
>>>>             if (dest == 0xff) {
>>>> -            memset(deliver_bitmask, 0xff, MAX_APIC_WORDS * sizeof(uint32_t));
>>>> -        } else {
>>>> -            int idx = apic_find_dest(dest);
>>>> -            memset(deliver_bitmask, 0x00, MAX_APIC_WORDS * sizeof(uint32_t));
>>>> -            if (idx >= 0)
>>>> -                apic_set_bit(deliver_bitmask, idx);
>>>> +            apic_get_broadcast_bitmask(deliver_bitmask, false);
>>>
>>>
>>> Hrm... aren't you still interpreting destination 0x000000FF as
>>> broadcast even for X2APIC mode? Or am I misreading this?
>>
>> In case the destination is 0xFF, the IPI will be delivered to CPU has
>> APIC ID 0xFF if it is in x2APIC mode, and it will be delivered to all
>> CPUs that are in xAPIC mode. In case the destination is 0xFFFFFFFF, the
>> IPI is delivered to all CPUs that are in x2APIC mode. I've created
>> apic_get_broadcast_bitmask function and changed the apic_find_dest to
>> implement that logic.
> 
> Maybe I'm misreading the patch, but to me it looks that
> if (dest == 0xff) apic_get_broadcast_bitmask() bit applies even in
> x2apic mode? So delivering to the APIC with physical ID 255 will be
> misinterpreted as a broadcast?

In case dest == 0xff the second argument to apic_get_broadcast_bitmask 
is set to false which means this is xAPIC broadcast

static void apic_get_broadcast_bitmask(uint32_t *deliver_bitmask,
                                        bool is_x2apic_broadcast)
{
     int i;
     APICCommonState *apic_iter;

     for (i = 0; i < max_apics; i++) {
         apic_iter = local_apics[i];
         if (apic_iter) {
             bool apic_in_x2apic = is_x2apic_mode(&apic_iter->parent_obj);

             if (is_x2apic_broadcast && apic_in_x2apic) {
                 apic_set_bit(deliver_bitmask, i);
             } else if (!is_x2apic_broadcast && !apic_in_x2apic) {
                 apic_set_bit(deliver_bitmask, i);
             }
         }
     }
}

In apic_get_broadcast_bitmask, because is_x2apic_broadcast == false, the 
delivery bit set only if that apic_in_x2apic == false (that CPU is in 
xAPIC mode)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-03-27 15:45         ` Bui Quang Minh
@ 2023-03-27 16:22           ` David Woodhouse
  2023-03-27 16:35             ` Bui Quang Minh
  0 siblings, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2023-03-27 16:22 UTC (permalink / raw)
  To: Bui Quang Minh, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

[-- Attachment #1: Type: text/plain, Size: 1339 bytes --]

On Mon, 2023-03-27 at 22:45 +0700, Bui Quang Minh wrote:
> 
> > Maybe I'm misreading the patch, but to me it looks that
> > if (dest == 0xff) apic_get_broadcast_bitmask() bit applies even in
> > x2apic mode? So delivering to the APIC with physical ID 255 will be
> > misinterpreted as a broadcast?
> 
> In case dest == 0xff the second argument to apic_get_broadcast_bitmask 
> is set to false which means this is xAPIC broadcast

Yeah, but it *isn't* xAPIC broadcast. It's X2APIC unicast to APIC#255.

I think you want (although you don't have 'dev') something like this:


static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
                                      uint32_t dest, uint8_t dest_mode)
{
    APICCommonState *apic_iter;
    int i;

    memset(deliver_bitmask, 0x00, max_apic_words * sizeof(uint32_t));

    /* x2APIC broadcast id for both physical and logical (cluster) mode */
    if (dest == 0xffffffff) {
        apic_get_broadcast_bitmask(deliver_bitmask, true);
        return;
    }

    if (dest_mode == 0) {
        apic_find_dest(deliver_bitmask, dest);
        /* Broadcast to xAPIC mode apics */
-        if (dest == 0xff) {
+        if (dest == 0xff && is_x2apic_mode(dev)) {
            apic_get_broadcast_bitmask(deliver_bitmask, false);
        }
    } else {


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-03-27 16:22           ` David Woodhouse
@ 2023-03-27 16:35             ` Bui Quang Minh
  2023-03-27 16:49               ` David Woodhouse
  0 siblings, 1 reply; 23+ messages in thread
From: Bui Quang Minh @ 2023-03-27 16:35 UTC (permalink / raw)
  To: David Woodhouse, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

On 3/27/23 23:22, David Woodhouse wrote:
> On Mon, 2023-03-27 at 22:45 +0700, Bui Quang Minh wrote:
>>
>>> Maybe I'm misreading the patch, but to me it looks that
>>> if (dest == 0xff) apic_get_broadcast_bitmask() bit applies even in
>>> x2apic mode? So delivering to the APIC with physical ID 255 will be
>>> misinterpreted as a broadcast?
>>
>> In case dest == 0xff the second argument to apic_get_broadcast_bitmask
>> is set to false which means this is xAPIC broadcast
> 
> Yeah, but it *isn't* xAPIC broadcast. It's X2APIC unicast to APIC#255.
> 
> I think you want (although you don't have 'dev') something like this:
> 
> 
> static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
>                                        uint32_t dest, uint8_t dest_mode)
> {
>      APICCommonState *apic_iter;
>      int i;
> 
>      memset(deliver_bitmask, 0x00, max_apic_words * sizeof(uint32_t));
> 
>      /* x2APIC broadcast id for both physical and logical (cluster) mode */
>      if (dest == 0xffffffff) {
>          apic_get_broadcast_bitmask(deliver_bitmask, true);
>          return;
>      }
> 
>      if (dest_mode == 0) {
>          apic_find_dest(deliver_bitmask, dest);
>          /* Broadcast to xAPIC mode apics */
> -        if (dest == 0xff) {
> +        if (dest == 0xff && is_x2apic_mode(dev)) {
>              apic_get_broadcast_bitmask(deliver_bitmask, false);
>          }
>      } else {
> 

Hmm, the unicast case is handled in apic_find_dest function, the logic 
inside the if (dest == 0xff) is for handling the broadcast case only. 
This is because when dest == 0xff, it can be both a x2APIC unicast and 
xAPIC broadcast in case we have some CPUs that are in xAPIC and others 
are in x2APIC. Do you think the code here is tricky and hard to read?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-03-27 16:35             ` Bui Quang Minh
@ 2023-03-27 16:49               ` David Woodhouse
  2023-03-28 15:58                 ` Bui Quang Minh
  0 siblings, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2023-03-27 16:49 UTC (permalink / raw)
  To: Bui Quang Minh, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

[-- Attachment #1: Type: text/plain, Size: 2957 bytes --]

On Mon, 2023-03-27 at 23:35 +0700, Bui Quang Minh wrote:
> On 3/27/23 23:22, David Woodhouse wrote:
> > On Mon, 2023-03-27 at 22:45 +0700, Bui Quang Minh wrote:
> > > 
> > > > Maybe I'm misreading the patch, but to me it looks that
> > > > if (dest == 0xff) apic_get_broadcast_bitmask() bit applies even in
> > > > x2apic mode? So delivering to the APIC with physical ID 255 will be
> > > > misinterpreted as a broadcast?
> > > 
> > > In case dest == 0xff the second argument to apic_get_broadcast_bitmask
> > > is set to false which means this is xAPIC broadcast
> > 
> > Yeah, but it *isn't* xAPIC broadcast. It's X2APIC unicast to APIC#255.
> > 
> > I think you want (although you don't have 'dev') something like this:
> > 
> > 
> > static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
> >                                        uint32_t dest, uint8_t dest_mode)
> > {
> >      APICCommonState *apic_iter;
> >      int i;
> > 
> >      memset(deliver_bitmask, 0x00, max_apic_words * sizeof(uint32_t));
> > 
> >      /* x2APIC broadcast id for both physical and logical (cluster) mode */
> >      if (dest == 0xffffffff) {
> >          apic_get_broadcast_bitmask(deliver_bitmask, true);
> >          return;
> >      }
> > 
> >      if (dest_mode == 0) {
> >          apic_find_dest(deliver_bitmask, dest);
> >          /* Broadcast to xAPIC mode apics */
> > -        if (dest == 0xff) {
> > +        if (dest == 0xff && is_x2apic_mode(dev)) {
> >              apic_get_broadcast_bitmask(deliver_bitmask, false);
> >          }
> >      } else {
> > 
> 
> Hmm, the unicast case is handled in apic_find_dest function, the logic 
> inside the if (dest == 0xff) is for handling the broadcast case only.
> This is because when dest == 0xff, it can be both a x2APIC unicast and 
> xAPIC broadcast in case we have some CPUs that are in xAPIC and others 
> are in x2APIC.

Ah! Yes, I see it now.

Shouldn't apic_get_broadcast_bitmask(… true) add *all* APICs to the
mask, regardless of their mode? An APIC which is still in xAPIC mode
will only look at the low 8 bits and see 0xFF which it also interprets
as broadcast? Or is that not how real hardware behaves?

>  Do you think the code here is tricky and hard to read?

Well, I completely failed to read it... :)

I think changing the existing comment something like this might help...

-         /* Broadcast to xAPIC mode apics */
+         /* Any APIC in xAPIC mode will interpret 0xFF as broadcast */

Coupled with a comment on apic_get_delivery_bitmask() clarifying that
it depends on the mode of each APIC it considers — which is obvious
enough in retrospect now I read the code and you point it out to me,
but empirically, we have to concede that it wasn't obvious *enough* :)




[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/5] i386/tcg: implement x2APIC registers MSR access
  2023-03-26  5:20 ` [PATCH v2 1/5] i386/tcg: implement x2APIC registers MSR access Bui Quang Minh
@ 2023-03-27 16:56   ` David Woodhouse
  2023-03-28 16:33     ` Bui Quang Minh
  0 siblings, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2023-03-27 16:56 UTC (permalink / raw)
  To: Bui Quang Minh, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

[-- Attachment #1: Type: text/plain, Size: 1931 bytes --]

On Sun, 2023-03-26 at 12:20 +0700, Bui Quang Minh wrote:
> 
> +static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
> +                           unsigned size)
> +{
> +    int index = (addr >> 4) & 0xff;
> +
> +    if (size < 4) {
> +        return;
> +    }
> +
> +    if (addr > 0xfff || !index) {
> +        /* MSI and MMIO APIC are at the same memory location,
> +         * but actually not on the global bus: MSI is on PCI bus
> +         * APIC is connected directly to the CPU.
> +         * Mapping them on the global bus happens to work because
> +         * MSI registers are reserved in APIC MMIO and vice versa.
> */
> +        MSIMessage msi = { .address = addr, .data = val };
> +        apic_send_msi(&msi);
> +        return;
> +    }

I know you're just moving this bit around, but note that it means we
*can't* implement the 15-bit MSI trick as things stand, because those
extra 7 bits end up in bits 4-11 of the address, and that means the
'addr > 0xfff' check isn't correct any more.

However, that's only relevant in X2APIC mode... and there's no MMIO
access to registers in X2APIC mode. So the check could perhaps become
something like...

    DeviceState *apic = cpu_get_current_apic();
    if (!apic || is_x2apic_mode(apic) || addr > 0xfff || !index) {
        /* MSI and MMIO APIC are at the same memory location,
         * but actually not on the global bus: MSI is on PCI bus
         * APIC is connected directly to the CPU.
         * Mapping them on the global bus happens to work because
         * MSI registers are reserved in xAPIC MMIO and vice versa.
         * In X2APIC mode, there is no MMIO and bits 4-11 of the
         * address *might* be used to encode the extended dest ID.
         */

        MSIMessage msi = ...

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-03-27 16:49               ` David Woodhouse
@ 2023-03-28 15:58                 ` Bui Quang Minh
  2023-03-29 14:53                   ` Bui Quang Minh
  0 siblings, 1 reply; 23+ messages in thread
From: Bui Quang Minh @ 2023-03-28 15:58 UTC (permalink / raw)
  To: David Woodhouse, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

On 3/27/23 23:49, David Woodhouse wrote:
> On Mon, 2023-03-27 at 23:35 +0700, Bui Quang Minh wrote:
>> On 3/27/23 23:22, David Woodhouse wrote:
>>> On Mon, 2023-03-27 at 22:45 +0700, Bui Quang Minh wrote:
>>>>
>>>>> Maybe I'm misreading the patch, but to me it looks that
>>>>> if (dest == 0xff) apic_get_broadcast_bitmask() bit applies even in
>>>>> x2apic mode? So delivering to the APIC with physical ID 255 will be
>>>>> misinterpreted as a broadcast?
>>>>
>>>> In case dest == 0xff the second argument to apic_get_broadcast_bitmask
>>>> is set to false which means this is xAPIC broadcast
>>>
>>> Yeah, but it *isn't* xAPIC broadcast. It's X2APIC unicast to APIC#255.
>>>
>>> I think you want (although you don't have 'dev') something like this:
>>>
>>>
>>> static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
>>>                                         uint32_t dest, uint8_t dest_mode)
>>> {
>>>       APICCommonState *apic_iter;
>>>       int i;
>>>
>>>       memset(deliver_bitmask, 0x00, max_apic_words * sizeof(uint32_t));
>>>
>>>       /* x2APIC broadcast id for both physical and logical (cluster) mode */
>>>       if (dest == 0xffffffff) {
>>>           apic_get_broadcast_bitmask(deliver_bitmask, true);
>>>           return;
>>>       }
>>>
>>>       if (dest_mode == 0) {
>>>           apic_find_dest(deliver_bitmask, dest);
>>>           /* Broadcast to xAPIC mode apics */
>>> -        if (dest == 0xff) {
>>> +        if (dest == 0xff && is_x2apic_mode(dev)) {
>>>               apic_get_broadcast_bitmask(deliver_bitmask, false);
>>>           }
>>>       } else {
>>>
>>
>> Hmm, the unicast case is handled in apic_find_dest function, the logic
>> inside the if (dest == 0xff) is for handling the broadcast case only.
>> This is because when dest == 0xff, it can be both a x2APIC unicast and
>> xAPIC broadcast in case we have some CPUs that are in xAPIC and others
>> are in x2APIC.
> 
> Ah! Yes, I see it now.
> 
> Shouldn't apic_get_broadcast_bitmask(… true) add *all* APICs to the
> mask, regardless of their mode? An APIC which is still in xAPIC mode
> will only look at the low 8 bits and see 0xFF which it also interprets
> as broadcast? Or is that not how real hardware behaves?

This is interesting. Your point looks reasonable to me but I don't know 
how to verify it, I'm trying to write kernel module to test it but there 
are just too many things running on Linux that uses interrupt so the 
system hangs.

This raises another question: when dest == 0x102 in IPI, does the xAPIC 
mode CPU with APIC ID 0x2 accept the IPI? I can't see this stated 
clearly in the Intel SDM.

> 
>>   Do you think the code here is tricky and hard to read?
> 
> Well, I completely failed to read it... :)
> 
> I think changing the existing comment something like this might help...
> 
> -         /* Broadcast to xAPIC mode apics */
> +         /* Any APIC in xAPIC mode will interpret 0xFF as broadcast */
> 
> Coupled with a comment on apic_get_delivery_bitmask() clarifying that
> it depends on the mode of each APIC it considers — which is obvious
> enough in retrospect now I read the code and you point it out to me,
> but empirically, we have to concede that it wasn't obvious *enough* :)



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/5] i386/tcg: implement x2APIC registers MSR access
  2023-03-27 16:56   ` David Woodhouse
@ 2023-03-28 16:33     ` Bui Quang Minh
  0 siblings, 0 replies; 23+ messages in thread
From: Bui Quang Minh @ 2023-03-28 16:33 UTC (permalink / raw)
  To: David Woodhouse, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

On 3/27/23 23:56, David Woodhouse wrote:
> On Sun, 2023-03-26 at 12:20 +0700, Bui Quang Minh wrote:
>>
>> +static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
>> +                           unsigned size)
>> +{
>> +    int index = (addr >> 4) & 0xff;
>> +
>> +    if (size < 4) {
>> +        return;
>> +    }
>> +
>> +    if (addr > 0xfff || !index) {
>> +        /* MSI and MMIO APIC are at the same memory location,
>> +         * but actually not on the global bus: MSI is on PCI bus
>> +         * APIC is connected directly to the CPU.
>> +         * Mapping them on the global bus happens to work because
>> +         * MSI registers are reserved in APIC MMIO and vice versa.
>> */
>> +        MSIMessage msi = { .address = addr, .data = val };
>> +        apic_send_msi(&msi);
>> +        return;
>> +    }
> 
> I know you're just moving this bit around, but note that it means we
> *can't* implement the 15-bit MSI trick as things stand, because those
> extra 7 bits end up in bits 4-11 of the address, and that means the
> 'addr > 0xfff' check isn't correct any more.
> 
> However, that's only relevant in X2APIC mode... and there's no MMIO
> access to registers in X2APIC mode. So the check could perhaps become
> something like...
> 
>      DeviceState *apic = cpu_get_current_apic();
>      if (!apic || is_x2apic_mode(apic) || addr > 0xfff || !index) {
>          /* MSI and MMIO APIC are at the same memory location,
>           * but actually not on the global bus: MSI is on PCI bus
>           * APIC is connected directly to the CPU.
>           * Mapping them on the global bus happens to work because
>           * MSI registers are reserved in xAPIC MMIO and vice versa.
>           * In X2APIC mode, there is no MMIO and bits 4-11 of the
>           * address *might* be used to encode the extended dest ID.
>           */
> 
>          MSIMessage msi = ...

In my opinion, I think the with the emulated interrupt remap hardware we 
don't need to do MSI trick. The behavior is the same with real hardware, 
in order to use x2APIC an interrupt remap hardware is required, the OS 
will configure the interrupt source (IOxAPIC, MSI-capable) to use the 
remappable format for interrupt request.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-03-28 15:58                 ` Bui Quang Minh
@ 2023-03-29 14:53                   ` Bui Quang Minh
  2023-03-29 15:30                     ` Bui Quang Minh
  0 siblings, 1 reply; 23+ messages in thread
From: Bui Quang Minh @ 2023-03-29 14:53 UTC (permalink / raw)
  To: David Woodhouse, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

On 3/28/23 22:58, Bui Quang Minh wrote:
> On 3/27/23 23:49, David Woodhouse wrote:
>> On Mon, 2023-03-27 at 23:35 +0700, Bui Quang Minh wrote:
>>> On 3/27/23 23:22, David Woodhouse wrote:
>>>> On Mon, 2023-03-27 at 22:45 +0700, Bui Quang Minh wrote:
>>>>>
>>>>>> Maybe I'm misreading the patch, but to me it looks that
>>>>>> if (dest == 0xff) apic_get_broadcast_bitmask() bit applies even in
>>>>>> x2apic mode? So delivering to the APIC with physical ID 255 will be
>>>>>> misinterpreted as a broadcast?
>>>>>
>>>>> In case dest == 0xff the second argument to apic_get_broadcast_bitmask
>>>>> is set to false which means this is xAPIC broadcast
>>>>
>>>> Yeah, but it *isn't* xAPIC broadcast. It's X2APIC unicast to APIC#255.
>>>>
>>>> I think you want (although you don't have 'dev') something like this:
>>>>
>>>>
>>>> static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
>>>>                                         uint32_t dest, uint8_t 
>>>> dest_mode)
>>>> {
>>>>       APICCommonState *apic_iter;
>>>>       int i;
>>>>
>>>>       memset(deliver_bitmask, 0x00, max_apic_words * sizeof(uint32_t));
>>>>
>>>>       /* x2APIC broadcast id for both physical and logical (cluster) 
>>>> mode */
>>>>       if (dest == 0xffffffff) {
>>>>           apic_get_broadcast_bitmask(deliver_bitmask, true);
>>>>           return;
>>>>       }
>>>>
>>>>       if (dest_mode == 0) {
>>>>           apic_find_dest(deliver_bitmask, dest);
>>>>           /* Broadcast to xAPIC mode apics */
>>>> -        if (dest == 0xff) {
>>>> +        if (dest == 0xff && is_x2apic_mode(dev)) {
>>>>               apic_get_broadcast_bitmask(deliver_bitmask, false);
>>>>           }
>>>>       } else {
>>>>
>>>
>>> Hmm, the unicast case is handled in apic_find_dest function, the logic
>>> inside the if (dest == 0xff) is for handling the broadcast case only.
>>> This is because when dest == 0xff, it can be both a x2APIC unicast and
>>> xAPIC broadcast in case we have some CPUs that are in xAPIC and others
>>> are in x2APIC.
>>
>> Ah! Yes, I see it now.
>>
>> Shouldn't apic_get_broadcast_bitmask(… true) add *all* APICs to the
>> mask, regardless of their mode? An APIC which is still in xAPIC mode
>> will only look at the low 8 bits and see 0xFF which it also interprets
>> as broadcast? Or is that not how real hardware behaves?
> 
> This is interesting. Your point looks reasonable to me but I don't know 
> how to verify it, I'm trying to write kernel module to test it but there 
> are just too many things running on Linux that uses interrupt so the 
> system hangs.
> 
> This raises another question: when dest == 0x102 in IPI, does the xAPIC 
> mode CPU with APIC ID 0x2 accept the IPI? I can't see this stated 
> clearly in the Intel SDM.

I do some more testing on my hardware, your point is correct when dest 
== 0xffffffff, the interrupt is delivered to all APICs regardless of 
their mode. And when dest == 0x102 in IPI, xAPIC mode CPU with APIC ID 
0x2 also accepts that IPI.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-03-29 14:53                   ` Bui Quang Minh
@ 2023-03-29 15:30                     ` Bui Quang Minh
  2023-03-30  8:28                       ` Igor Mammedov
  2023-04-03 10:27                       ` David Woodhouse
  0 siblings, 2 replies; 23+ messages in thread
From: Bui Quang Minh @ 2023-03-29 15:30 UTC (permalink / raw)
  To: David Woodhouse, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

On 3/29/23 21:53, Bui Quang Minh wrote:
> On 3/28/23 22:58, Bui Quang Minh wrote:
>> On 3/27/23 23:49, David Woodhouse wrote:
>>> On Mon, 2023-03-27 at 23:35 +0700, Bui Quang Minh wrote:
>>>> On 3/27/23 23:22, David Woodhouse wrote:
>>>>> On Mon, 2023-03-27 at 22:45 +0700, Bui Quang Minh wrote:
>>>>>>
>>>>>>> Maybe I'm misreading the patch, but to me it looks that
>>>>>>> if (dest == 0xff) apic_get_broadcast_bitmask() bit applies even in
>>>>>>> x2apic mode? So delivering to the APIC with physical ID 255 will be
>>>>>>> misinterpreted as a broadcast?
>>>>>>
>>>>>> In case dest == 0xff the second argument to 
>>>>>> apic_get_broadcast_bitmask
>>>>>> is set to false which means this is xAPIC broadcast
>>>>>
>>>>> Yeah, but it *isn't* xAPIC broadcast. It's X2APIC unicast to APIC#255.
>>>>>
>>>>> I think you want (although you don't have 'dev') something like this:
>>>>>
>>>>>
>>>>> static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
>>>>>                                         uint32_t dest, uint8_t 
>>>>> dest_mode)
>>>>> {
>>>>>       APICCommonState *apic_iter;
>>>>>       int i;
>>>>>
>>>>>       memset(deliver_bitmask, 0x00, max_apic_words * 
>>>>> sizeof(uint32_t));
>>>>>
>>>>>       /* x2APIC broadcast id for both physical and logical 
>>>>> (cluster) mode */
>>>>>       if (dest == 0xffffffff) {
>>>>>           apic_get_broadcast_bitmask(deliver_bitmask, true);
>>>>>           return;
>>>>>       }
>>>>>
>>>>>       if (dest_mode == 0) {
>>>>>           apic_find_dest(deliver_bitmask, dest);
>>>>>           /* Broadcast to xAPIC mode apics */
>>>>> -        if (dest == 0xff) {
>>>>> +        if (dest == 0xff && is_x2apic_mode(dev)) {
>>>>>               apic_get_broadcast_bitmask(deliver_bitmask, false);
>>>>>           }
>>>>>       } else {
>>>>>
>>>>
>>>> Hmm, the unicast case is handled in apic_find_dest function, the logic
>>>> inside the if (dest == 0xff) is for handling the broadcast case only.
>>>> This is because when dest == 0xff, it can be both a x2APIC unicast and
>>>> xAPIC broadcast in case we have some CPUs that are in xAPIC and others
>>>> are in x2APIC.
>>>
>>> Ah! Yes, I see it now.
>>>
>>> Shouldn't apic_get_broadcast_bitmask(… true) add *all* APICs to the
>>> mask, regardless of their mode? An APIC which is still in xAPIC mode
>>> will only look at the low 8 bits and see 0xFF which it also interprets
>>> as broadcast? Or is that not how real hardware behaves?
>>
>> This is interesting. Your point looks reasonable to me but I don't 
>> know how to verify it, I'm trying to write kernel module to test it 
>> but there are just too many things running on Linux that uses 
>> interrupt so the system hangs.
>>
>> This raises another question: when dest == 0x102 in IPI, does the 
>> xAPIC mode CPU with APIC ID 0x2 accept the IPI? I can't see this 
>> stated clearly in the Intel SDM.
> 
> I do some more testing on my hardware, your point is correct when dest 
> == 0xffffffff, the interrupt is delivered to all APICs regardless of 
> their mode.

To be precisely, it only broadcasts to CPUs in xAPIC mode if the IPI 
destination mode is physical. In case the destination mode is logical, 
flat model/cluster model rule applies to determine if the xAPIC CPUs 
accept the IPI. Wow, this is so complicated :)


> And when dest == 0x102 in IPI, xAPIC mode CPU with APIC ID 
> 0x2 also accepts that IPI.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-03-29 15:30                     ` Bui Quang Minh
@ 2023-03-30  8:28                       ` Igor Mammedov
  2023-04-03 16:01                         ` Bui Quang Minh
  2023-04-03 10:27                       ` David Woodhouse
  1 sibling, 1 reply; 23+ messages in thread
From: Igor Mammedov @ 2023-03-30  8:28 UTC (permalink / raw)
  To: Bui Quang Minh
  Cc: David Woodhouse, qemu-devel, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, Michael S . Tsirkin, Marcel Apfelbaum,
	Alex Bennée

On Wed, 29 Mar 2023 22:30:44 +0700
Bui Quang Minh <minhquangbui99@gmail.com> wrote:

> On 3/29/23 21:53, Bui Quang Minh wrote:
> > On 3/28/23 22:58, Bui Quang Minh wrote:  
> >> On 3/27/23 23:49, David Woodhouse wrote:  
> >>> On Mon, 2023-03-27 at 23:35 +0700, Bui Quang Minh wrote:  
> >>>> On 3/27/23 23:22, David Woodhouse wrote:  
> >>>>> On Mon, 2023-03-27 at 22:45 +0700, Bui Quang Minh wrote:  
> >>>>>>  
> >>>>>>> Maybe I'm misreading the patch, but to me it looks that
> >>>>>>> if (dest == 0xff) apic_get_broadcast_bitmask() bit applies even in
> >>>>>>> x2apic mode? So delivering to the APIC with physical ID 255 will be
> >>>>>>> misinterpreted as a broadcast?  
> >>>>>>
> >>>>>> In case dest == 0xff the second argument to 
> >>>>>> apic_get_broadcast_bitmask
> >>>>>> is set to false which means this is xAPIC broadcast  
> >>>>>
> >>>>> Yeah, but it *isn't* xAPIC broadcast. It's X2APIC unicast to APIC#255.
> >>>>>
> >>>>> I think you want (although you don't have 'dev') something like this:
> >>>>>
> >>>>>
> >>>>> static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
> >>>>>                                         uint32_t dest, uint8_t 
> >>>>> dest_mode)
> >>>>> {
> >>>>>       APICCommonState *apic_iter;
> >>>>>       int i;
> >>>>>
> >>>>>       memset(deliver_bitmask, 0x00, max_apic_words * 
> >>>>> sizeof(uint32_t));
> >>>>>
> >>>>>       /* x2APIC broadcast id for both physical and logical 
> >>>>> (cluster) mode */
> >>>>>       if (dest == 0xffffffff) {
> >>>>>           apic_get_broadcast_bitmask(deliver_bitmask, true);
> >>>>>           return;
> >>>>>       }
> >>>>>
> >>>>>       if (dest_mode == 0) {
> >>>>>           apic_find_dest(deliver_bitmask, dest);
> >>>>>           /* Broadcast to xAPIC mode apics */
> >>>>> -        if (dest == 0xff) {
> >>>>> +        if (dest == 0xff && is_x2apic_mode(dev)) {
> >>>>>               apic_get_broadcast_bitmask(deliver_bitmask, false);
> >>>>>           }
> >>>>>       } else {
> >>>>>  
> >>>>
> >>>> Hmm, the unicast case is handled in apic_find_dest function, the logic
> >>>> inside the if (dest == 0xff) is for handling the broadcast case only.
> >>>> This is because when dest == 0xff, it can be both a x2APIC unicast and
> >>>> xAPIC broadcast in case we have some CPUs that are in xAPIC and others
> >>>> are in x2APIC.  
> >>>
> >>> Ah! Yes, I see it now.
> >>>
> >>> Shouldn't apic_get_broadcast_bitmask(… true) add *all* APICs to the
> >>> mask, regardless of their mode? An APIC which is still in xAPIC mode
> >>> will only look at the low 8 bits and see 0xFF which it also interprets
> >>> as broadcast? Or is that not how real hardware behaves?  
> >>
> >> This is interesting. Your point looks reasonable to me but I don't 
> >> know how to verify it, I'm trying to write kernel module to test it 
> >> but there are just too many things running on Linux that uses 
> >> interrupt so the system hangs.
> >>
> >> This raises another question: when dest == 0x102 in IPI, does the 
> >> xAPIC mode CPU with APIC ID 0x2 accept the IPI? I can't see this 
> >> stated clearly in the Intel SDM.  
> > 
> > I do some more testing on my hardware, your point is correct when dest 
> > == 0xffffffff, the interrupt is delivered to all APICs regardless of 
> > their mode.  
> 
> To be precisely, it only broadcasts to CPUs in xAPIC mode if the IPI 
> destination mode is physical. In case the destination mode is logical, 
> flat model/cluster model rule applies to determine if the xAPIC CPUs 
> accept the IPI. Wow, this is so complicated :)

It would be nice if you could update apic kvm unit test with your
findings if it doesn't test those variants yet.

> 
> 
> > And when dest == 0x102 in IPI, xAPIC mode CPU with APIC ID 
> > 0x2 also accepts that IPI.  
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-03-29 15:30                     ` Bui Quang Minh
  2023-03-30  8:28                       ` Igor Mammedov
@ 2023-04-03 10:27                       ` David Woodhouse
  2023-04-03 16:38                         ` Bui Quang Minh
  1 sibling, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2023-04-03 10:27 UTC (permalink / raw)
  To: Bui Quang Minh, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

[-- Attachment #1: Type: text/plain, Size: 913 bytes --]

On Wed, 2023-03-29 at 22:30 +0700, Bui Quang Minh wrote:
> 
> > 
> > I do some more testing on my hardware, your point is correct when dest 
> > == 0xffffffff, the interrupt is delivered to all APICs regardless of 
> > their mode.
> 
> To be precisely, it only broadcasts to CPUs in xAPIC mode if the IPI 
> destination mode is physical. In case the destination mode is logical, 
> flat model/cluster model rule applies to determine if the xAPIC CPUs 
> accept the IPI. Wow, this is so complicated :)

So even if you send to *all* of the first 8 CPUs in a cluster (e.g.
cluster 0x0001 giving a destination 0x000100FF, a CPU in xAPIC mode
doesn't see that as a broadcast because it's logical mode?

I would have assumed that a CPU in xAPIC mode would have looked at the
low byte and interpreted it as xAPIC logical mode, with the cluster in
the high nybble and the 4-bit mask in the low nybble?

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-03-30  8:28                       ` Igor Mammedov
@ 2023-04-03 16:01                         ` Bui Quang Minh
  0 siblings, 0 replies; 23+ messages in thread
From: Bui Quang Minh @ 2023-04-03 16:01 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: David Woodhouse, qemu-devel, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, Michael S . Tsirkin, Marcel Apfelbaum,
	Alex Bennée

On 3/30/23 15:28, Igor Mammedov wrote:
> On Wed, 29 Mar 2023 22:30:44 +0700
> Bui Quang Minh <minhquangbui99@gmail.com> wrote:
> 
>> On 3/29/23 21:53, Bui Quang Minh wrote:
>>> On 3/28/23 22:58, Bui Quang Minh wrote:
>>>> On 3/27/23 23:49, David Woodhouse wrote:
>>>>> On Mon, 2023-03-27 at 23:35 +0700, Bui Quang Minh wrote:
>>>>>> On 3/27/23 23:22, David Woodhouse wrote:
>>>>>>> On Mon, 2023-03-27 at 22:45 +0700, Bui Quang Minh wrote:
>>>>>>>>   
>>>>>>>>> Maybe I'm misreading the patch, but to me it looks that
>>>>>>>>> if (dest == 0xff) apic_get_broadcast_bitmask() bit applies even in
>>>>>>>>> x2apic mode? So delivering to the APIC with physical ID 255 will be
>>>>>>>>> misinterpreted as a broadcast?
>>>>>>>>
>>>>>>>> In case dest == 0xff the second argument to
>>>>>>>> apic_get_broadcast_bitmask
>>>>>>>> is set to false which means this is xAPIC broadcast
>>>>>>>
>>>>>>> Yeah, but it *isn't* xAPIC broadcast. It's X2APIC unicast to APIC#255.
>>>>>>>
>>>>>>> I think you want (although you don't have 'dev') something like this:
>>>>>>>
>>>>>>>
>>>>>>> static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
>>>>>>>                                          uint32_t dest, uint8_t
>>>>>>> dest_mode)
>>>>>>> {
>>>>>>>        APICCommonState *apic_iter;
>>>>>>>        int i;
>>>>>>>
>>>>>>>        memset(deliver_bitmask, 0x00, max_apic_words *
>>>>>>> sizeof(uint32_t));
>>>>>>>
>>>>>>>        /* x2APIC broadcast id for both physical and logical
>>>>>>> (cluster) mode */
>>>>>>>        if (dest == 0xffffffff) {
>>>>>>>            apic_get_broadcast_bitmask(deliver_bitmask, true);
>>>>>>>            return;
>>>>>>>        }
>>>>>>>
>>>>>>>        if (dest_mode == 0) {
>>>>>>>            apic_find_dest(deliver_bitmask, dest);
>>>>>>>            /* Broadcast to xAPIC mode apics */
>>>>>>> -        if (dest == 0xff) {
>>>>>>> +        if (dest == 0xff && is_x2apic_mode(dev)) {
>>>>>>>                apic_get_broadcast_bitmask(deliver_bitmask, false);
>>>>>>>            }
>>>>>>>        } else {
>>>>>>>   
>>>>>>
>>>>>> Hmm, the unicast case is handled in apic_find_dest function, the logic
>>>>>> inside the if (dest == 0xff) is for handling the broadcast case only.
>>>>>> This is because when dest == 0xff, it can be both a x2APIC unicast and
>>>>>> xAPIC broadcast in case we have some CPUs that are in xAPIC and others
>>>>>> are in x2APIC.
>>>>>
>>>>> Ah! Yes, I see it now.
>>>>>
>>>>> Shouldn't apic_get_broadcast_bitmask(… true) add *all* APICs to the
>>>>> mask, regardless of their mode? An APIC which is still in xAPIC mode
>>>>> will only look at the low 8 bits and see 0xFF which it also interprets
>>>>> as broadcast? Or is that not how real hardware behaves?
>>>>
>>>> This is interesting. Your point looks reasonable to me but I don't
>>>> know how to verify it, I'm trying to write kernel module to test it
>>>> but there are just too many things running on Linux that uses
>>>> interrupt so the system hangs.
>>>>
>>>> This raises another question: when dest == 0x102 in IPI, does the
>>>> xAPIC mode CPU with APIC ID 0x2 accept the IPI? I can't see this
>>>> stated clearly in the Intel SDM.
>>>
>>> I do some more testing on my hardware, your point is correct when dest
>>> == 0xffffffff, the interrupt is delivered to all APICs regardless of
>>> their mode.
>>
>> To be precisely, it only broadcasts to CPUs in xAPIC mode if the IPI
>> destination mode is physical. In case the destination mode is logical,
>> flat model/cluster model rule applies to determine if the xAPIC CPUs
>> accept the IPI. Wow, this is so complicated :)
> 
> It would be nice if you could update apic kvm unit test with your
> findings if it doesn't test those variants yet.
> 
>>
>>
>>> And when dest == 0x102 in IPI, xAPIC mode CPU with APIC ID
>>> 0x2 also accepts that IPI.

KVM does not do the same way as the real hardware in these cases, if the 
destination of IPI is 0xffffffff, IPI is broadcasted to x2APIC CPUs but 
not xAPIC CPUs. The same with IPI has destination 0x102 does not deliver 
to xAPIC CPU with APIC ID 0x2. This is the intended behavior as I see 
some comments mentioning it.

diff --git a/x86/apic.c b/x86/apic.c
index 20c3a1a..8c91d27 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -679,7 +679,7 @@ static void set_ldr(void *__ldr)
  	apic_write(APIC_LDR, ldr << 24);
  }

-static int test_fixed_ipi(u32 dest_mode, u8 dest, u8 vector,
+static int test_fixed_ipi(u32 dest_mode, u32 dest, u8 vector,
  			  int nr_ipis_expected, const char *mode_name)
  {
  	u64 start = rdtsc();
@@ -913,6 +913,38 @@ static void test_aliased_xapic_physical_ipi(void)
  	report(!f, "IPI to aliased xAPIC physical IDs");
  }

+static void reset_apic_cpu(void *arg)
+{
+	u8 *id = (u8 *)arg;
+	reset_apic();
+	*id = apic_id();
+}
+
+static void test_physical_ipi_with_x2apic_id(void)
+{
+	u8 vector = 0xf1;
+	int f = 0;
+	u8 apic_id_cpu1;
+
+	if (cpu_count() < 2)
+		return;
+
+	if (!is_x2apic_enabled())
+		return;
+
+	on_cpu(1, reset_apic_cpu, &apic_id_cpu1);
+	handle_irq(vector, handle_ipi);
+
+	/*
+	 * CPU1 is in xAPIC so it accepts the IPI because the (destination & 0xff)
+	 * matches its APIC ID.
+	 */
+	f += test_fixed_ipi(APIC_DEST_PHYSICAL, apic_id_cpu1 | 0x100, vector, 
1, "physical");
+	f += test_fixed_ipi(APIC_DEST_PHYSICAL, 0xffffffff, vector, 
cpu_count(), "physical");
+
+	report(!f, "IPI with x2apic id to xapic CPU");
+}
+
  typedef void (*apic_test_fn)(void);

  int main(void)
@@ -950,6 +982,7 @@ int main(void)
  		test_apic_id,
  		test_apicbase,
  		test_aliased_xapic_physical_ipi,
+		test_physical_ipi_with_x2apic_id,
  	};

  	assert_msg(is_apic_hw_enabled() && is_apic_sw_enabled(),

With this patch in kvm-unit-test, the version 3 of this series, which I 
will post soon, passes the test but not KVM. So I am not sure if I 
should post this test to kvm-unit-test.


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-04-03 10:27                       ` David Woodhouse
@ 2023-04-03 16:38                         ` Bui Quang Minh
  2023-04-09 14:31                           ` Bui Quang Minh
  0 siblings, 1 reply; 23+ messages in thread
From: Bui Quang Minh @ 2023-04-03 16:38 UTC (permalink / raw)
  To: David Woodhouse, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

On 4/3/23 17:27, David Woodhouse wrote:
> On Wed, 2023-03-29 at 22:30 +0700, Bui Quang Minh wrote:
>>
>>>
>>> I do some more testing on my hardware, your point is correct when dest
>>> == 0xffffffff, the interrupt is delivered to all APICs regardless of
>>> their mode.
>>
>> To be precisely, it only broadcasts to CPUs in xAPIC mode if the IPI
>> destination mode is physical. In case the destination mode is logical,
>> flat model/cluster model rule applies to determine if the xAPIC CPUs
>> accept the IPI. Wow, this is so complicated :)
> 
> So even if you send to *all* of the first 8 CPUs in a cluster (e.g.
> cluster 0x0001 giving a destination 0x000100FF, a CPU in xAPIC mode
> doesn't see that as a broadcast because it's logical mode?

I mean if the destination is 0xffffffff, the xAPIC CPU will see 
destination as 0xff. 0xff is broadcast in physical destination mode 
only, in logical destination, it may not be a broadcast. It may depend 
on the whether it is flat model or cluster model in logical destination 
mode.

In flat model, 8 bits are used as mask, so in theory, this model can 
only support 8 CPUs, each CPU reserves its own bit by setting the upper 
8 bits of APIC LDR register. In Intel SDM, it is said that 0xff can be 
interpreted as a broadcast, this is true in normal case, but I think if 
the CPU its APIC LDR to 0, the IPI should not deliver to this CPU. This 
also matches with the current flat model of logical destination mode 
implementation of userspace APIC in Qemu before my series. However, I 
see this seems like a corner case, I didn't test it on real hardware.

With cluster model, when writing the above paragraphs, I think that 0xff 
will be delivered to all APICs (mask = 0xf) of cluster 15 (0xf). 
However, reading the SDM more carefully, I see that the there are only 
15 clusters with address from 0 to 14 so there is no cluster with 
address 15. 0xff is interpreted as broadcast to all APICs in all 
clusters too.

In conclusion, IPI with destination 0xffffffff can be a broadcast to all 
xAPIC CPUs too if we just ignore the corner case in flat model of 
logical destination mode (we may need to test more)

> I would have assumed that a CPU in xAPIC mode would have looked at the
> low byte and interpreted it as xAPIC logical mode, with the cluster in
> the high nybble and the 4-bit mask in the low nybble?

Yes, this is the behavior in cluster model of logical destination mode 
(I try to stick with Intel SDM Section 10.6.2.2 Volume 3A when using 
those terminologies)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/5] apic: add support for x2APIC mode
  2023-04-03 16:38                         ` Bui Quang Minh
@ 2023-04-09 14:31                           ` Bui Quang Minh
  0 siblings, 0 replies; 23+ messages in thread
From: Bui Quang Minh @ 2023-04-09 14:31 UTC (permalink / raw)
  To: David Woodhouse, qemu-devel
  Cc: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Michael S . Tsirkin, Marcel Apfelbaum, Igor Mammedov,
	Alex Bennée

On 4/3/23 23:38, Bui Quang Minh wrote:
> On 4/3/23 17:27, David Woodhouse wrote:
>> On Wed, 2023-03-29 at 22:30 +0700, Bui Quang Minh wrote:
>>>
>>>>
>>>> I do some more testing on my hardware, your point is correct when dest
>>>> == 0xffffffff, the interrupt is delivered to all APICs regardless of
>>>> their mode.
>>>
>>> To be precisely, it only broadcasts to CPUs in xAPIC mode if the IPI
>>> destination mode is physical. In case the destination mode is logical,
>>> flat model/cluster model rule applies to determine if the xAPIC CPUs
>>> accept the IPI. Wow, this is so complicated :)
>>
>> So even if you send to *all* of the first 8 CPUs in a cluster (e.g.
>> cluster 0x0001 giving a destination 0x000100FF, a CPU in xAPIC mode
>> doesn't see that as a broadcast because it's logical mode?
> 
> I mean if the destination is 0xffffffff, the xAPIC CPU will see 
> destination as 0xff. 0xff is broadcast in physical destination mode 
> only, in logical destination, it may not be a broadcast. It may depend 
> on the whether it is flat model or cluster model in logical destination 
> mode.
> 
> In flat model, 8 bits are used as mask, so in theory, this model can 
> only support 8 CPUs, each CPU reserves its own bit by setting the upper 
> 8 bits of APIC LDR register. In Intel SDM, it is said that 0xff can be 
> interpreted as a broadcast, this is true in normal case, but I think if 
> the CPU its APIC LDR to 0, the IPI should not deliver to this CPU. This 
> also matches with the current flat model of logical destination mode 
> implementation of userspace APIC in Qemu before my series. However, I 
> see this seems like a corner case, I didn't test it on real hardware.
> 
> With cluster model, when writing the above paragraphs, I think that 0xff 
> will be delivered to all APICs (mask = 0xf) of cluster 15 (0xf). 
> However, reading the SDM more carefully, I see that the there are only 
> 15 clusters with address from 0 to 14 so there is no cluster with 
> address 15. 0xff is interpreted as broadcast to all APICs in all 
> clusters too.
> 
> In conclusion, IPI with destination 0xffffffff can be a broadcast to all 
> xAPIC CPUs too if we just ignore the corner case in flat model of 
> logical destination mode (we may need to test more)

I do some tests on my machine with custom Linux kernel module (I can't 
use kvm-unit-tests because the display on my laptop is on embedded 
display port not serial port so I don't know how to get the output). I 
find out that
- In flat model, if the CPU intentionally set its 8 bit address to 0 in 
APIC_LDR, the 0xff logical IPI does not deliver to this CPU.
- In cluster model, the 4 higher bits used as cluster address, if these 
4 bits is equal to 0xf, the IPI is broadcasted to all cluster. The 4 
lower bits are used to address APICs in the cluster. If the CPU 
intentionally set these 4 lower bits to 0 in APIC_LDR, same as the flat 
model, this CPU does not receive the logical IPI. For example, the CPUs 
set its address to 0x20 (cluster 2, address 0 in cluster), the logical 
IPI with destination == 0xff does deliver to cluster 2 but does not 
deliver to this CPU.

So I choose to stick with the current implementation, 0xffffffff in IPI 
is seen as 0xff IPI in xAPIC CPUs. This IPI if in physical mode is a 
broadcast but not in logical mode.


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-04-09 14:32 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-26  5:20 [PATCH v2 0/5] Support x2APIC mode with TCG accelerator Bui Quang Minh
2023-03-26  5:20 ` [PATCH v2 1/5] i386/tcg: implement x2APIC registers MSR access Bui Quang Minh
2023-03-27 16:56   ` David Woodhouse
2023-03-28 16:33     ` Bui Quang Minh
2023-03-26  5:20 ` [PATCH v2 2/5] apic: add support for x2APIC mode Bui Quang Minh
2023-03-27 11:04   ` David Woodhouse
2023-03-27 15:33     ` Bui Quang Minh
2023-03-27 15:37       ` David Woodhouse
2023-03-27 15:45         ` Bui Quang Minh
2023-03-27 16:22           ` David Woodhouse
2023-03-27 16:35             ` Bui Quang Minh
2023-03-27 16:49               ` David Woodhouse
2023-03-28 15:58                 ` Bui Quang Minh
2023-03-29 14:53                   ` Bui Quang Minh
2023-03-29 15:30                     ` Bui Quang Minh
2023-03-30  8:28                       ` Igor Mammedov
2023-04-03 16:01                         ` Bui Quang Minh
2023-04-03 10:27                       ` David Woodhouse
2023-04-03 16:38                         ` Bui Quang Minh
2023-04-09 14:31                           ` Bui Quang Minh
2023-03-26  5:20 ` [PATCH v2 3/5] apic, i386/tcg: add x2apic transitions Bui Quang Minh
2023-03-26  5:20 ` [PATCH v2 4/5] intel_iommu: allow Extended Interrupt Mode when using userspace APIC Bui Quang Minh
2023-03-26  5:20 ` [PATCH v2 5/5] amd_iommu: report x2APIC support to the operating system Bui Quang Minh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.