All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH kvm-unit-tests 00/17] VT-d unit test
@ 2016-10-26  7:47 Peter Xu
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 01/17] x86: intel-iommu: add vt-d init test Peter Xu
                   ` (16 more replies)
  0 siblings, 17 replies; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

This series is adding simplest test cases for VT-d. From this version
and so on, the work is rebased to Alex's PCI series v8. Removing the
"RFC" prefix.

Thanks to Drew for a lot of review work for v1 of this series. I think
it's time to send another spin since the changelog is going out of
control...

RFC -> v1:
- when init edu device fail, report_skip() rather than return error
  [Radim]
- use asserts rather than "static bool inited" to avoid multiple init
  of components (affects patch 1/2) [Drew]
- moving the first two patches out of the series [Drew]
- int vtd_init(), do not setup_idt() since smp_init() did it [Drew]
- when edu do not have MSI enabled, skip interrupt test [Radim]
- rename vtd_reg_*() into vtd_{read|write}[lq](), and move them to
  header file [Drew]
- use PAGE_MASK when able [Drew]
- use "&" instead of "|" in intel-iommu init test (three places)
  [Drew]
- use "vtd_init()" in unit test [Drew]
- mention that where intel-iommu.h comes from [Drew]
- re-written vtd_gcmd_or(), make it also work on even hardware [Drew]
- remove most of the oneline wrapper for VT-d registers, instead, use
  vtd_{read|write}* with register names [Drew]
- remove useless BDF helpers [Drew]
- move edu device macros into header file [Drew]
- make edu_check_alive static inline [Drew]
- remove all useless wrappers in pci-edu.c [Drew]
- remove pci_dma_dir_t and all its users, instead, use "bool
  from_device" [Drew]
- not use typedef for structs, to follow Linux/kvm-unit-tests coding
  style [Drew]
- let pci_dev_init() clean and simple, then provide
  pci_enable_defaults() for more complicated things [Drew]
- add one more patch to add intel-iommu test into x86/unittest [Radim]
- use 0x60 intr request instead of factorial to trigger edu device
  interrupt [Drew]
- ...and some other changes I just forgot to note down...

Currently only a very small test scope is covered:

* VT-d init
* DMAR: 4 bytes copy
* IR: MSI

However this series could be a base point to add more test cases for
VT-d. The problem is, there are many IOMMU error conditions which are
very hard to be triggered in a real guest (IOMMU has merely no
interface for guest user, and it's totally running in the background).
This piece of work can be a start point if we want to do more
complicated things and play around with Intel IOMMU devices (also for
IOMMU regression tests).

Please review. Thanks,

=================

To run the test:

./x86/run ./x86/intel-iommu.flat \
    -M q35,kernel-irqchip=split -global ioapic.version=0x20 \
    -device intel-iommu,intremap=on -device edu

Sample output:

pxdev:kvm-unit-tests [new-iommu-ut]# ./iommu_run.sh
/root/git/qemu/bin/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio
+-device pci-testdev -kernel ./x86/intel-iommu.flat -M q35,kernel-irqchip=split -global ioapic.version=0x20 -device intel-iommu,intremap=on -device edu
enabling apic
paging enabled
cr0 = 80010011
cr3 = 7fff000
cr4 = 20
VT-d version:   0x10
     cap:       0x0012008c22260206
     ecap:      0x0000000000f00f1a
PASS: init status check
PASS: fault status check
PASS: QI enablement
DMAR table address: 0x0000000007ff9000
PASS: DMAR table setup
IR table address: 0x0000000007ff8000
PASS: IR table setup
PASS: DMAR enablement
PASS: IR enablement
PASS: DMAR support 39 bits address width
PASS: DMAR support huge pages
PCI: init dev 0x0020 BAR 0 [MEM] addr 0xfea00000
PCI detected cap 0x5
Detected MSI for device 0x20 offset 0x40
allocated vt-d root entry for PCI bus 0
allocated vt-d context entry for devfn 0x20
map 4K page IOVA 0x0 to 0x7ff7000 (sid=0x0020)
edu device DMA start TO addr 0x0 size 0x4 off 0x0
edu device DMA start FROM addr 0x4 size 0x4 off 0x0
PASS: DMAR 4B memcpy test
INTR: setup IRTE index 0
MSI: dev 0x20 init 64bit address: addr=0xfee00010, data=0x0
PASS: EDU factorial INTR test

Peter Xu (17):
  x86: intel-iommu: add vt-d init test
  libcflat: add IS_ALIGNED() macro, and page sizes
  libcflat: moving MIN/MAX here
  vm/page: provide PGDIR_OFFSET() macro
  x86/asm: add cpu_relax()
  pci: introduce struct pci_dev
  pci: provide pci_scan_bars()
  x86/vmexit: leverage pci_scan_bars()
  pci: add pci_config_write[wb]()
  pci: provide pci_set_master()
  pci: provide pci_enable_defaults()
  pci: add bdf helpers
  pci: edu: introduce pci-edu helpers
  x86: intel-iommu: add dmar test
  pci: add msi support for 32/64bit address
  x86: intel-iommu: add IR MSI test
  x86/unittests: add intel-iommu test

 lib/alloc.c            |   3 -
 lib/libcflat.h         |   8 ++
 lib/pci-edu.c          |  74 ++++++++++++
 lib/pci-edu.h          |  80 +++++++++++++
 lib/pci-host-generic.c |  25 +++-
 lib/pci-testdev.c      |  13 +-
 lib/pci.c              | 160 +++++++++++++++++++++----
 lib/pci.h              |  39 ++++--
 lib/x86/asm/barrier.h  |  11 ++
 lib/x86/asm/page.h     |   3 +
 lib/x86/asm/pci.h      |  14 +++
 lib/x86/intel-iommu.c  | 313 +++++++++++++++++++++++++++++++++++++++++++++++++
 lib/x86/intel-iommu.h  | 139 ++++++++++++++++++++++
 lib/x86/vm.c           |   4 +-
 x86/Makefile.common    |   1 +
 x86/Makefile.x86_64    |   2 +
 x86/intel-iommu.c      | 121 +++++++++++++++++++
 x86/unittests.cfg      |   7 ++
 x86/vmexit.c           |  25 ++--
 19 files changed, 978 insertions(+), 64 deletions(-)
 create mode 100644 lib/pci-edu.c
 create mode 100644 lib/pci-edu.h
 create mode 100644 lib/x86/intel-iommu.c
 create mode 100644 lib/x86/intel-iommu.h
 create mode 100644 x86/intel-iommu.c

-- 
2.7.4


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 01/17] x86: intel-iommu: add vt-d init test
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 16:12   ` Andrew Jones
  2016-11-08 10:52   ` Alexander Gordeev
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 02/17] libcflat: add IS_ALIGNED() macro, and page sizes Peter Xu
                   ` (15 subsequent siblings)
  16 siblings, 2 replies; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

Adding fundamental init test for Intel IOMMU. This includes basic
initialization of Intel IOMMU device, like DMAR (DMA Remapping),
IR (Interrupt Remapping), QI (Queue Invalidation), etc.

Further tests can use vtd_init() to initialize Intel IOMMU environment.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/x86/intel-iommu.c |  88 ++++++++++++++++++++++++++++++++++++++
 lib/x86/intel-iommu.h | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++
 x86/Makefile.x86_64   |   2 +
 x86/intel-iommu.c     |  27 ++++++++++++
 4 files changed, 232 insertions(+)
 create mode 100644 lib/x86/intel-iommu.c
 create mode 100644 lib/x86/intel-iommu.h
 create mode 100644 x86/intel-iommu.c

diff --git a/lib/x86/intel-iommu.c b/lib/x86/intel-iommu.c
new file mode 100644
index 0000000..02eaf39
--- /dev/null
+++ b/lib/x86/intel-iommu.c
@@ -0,0 +1,88 @@
+/*
+ * Intel IOMMU APIs
+ *
+ * Copyright (C) 2016 Red Hat, Inc.
+ *
+ * Authors:
+ *   Peter Xu <peterx@redhat.com>,
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or
+ * later.
+ */
+
+#include "intel-iommu.h"
+
+#define ONE_BIT_ONLY(x) ((x) && !((x) & ((x) - 1)))
+
+#define VTD_RTA_MASK  (PAGE_MASK)
+#define VTD_IRTA_MASK (PAGE_MASK)
+
+static uint64_t vtd_root_table(void)
+{
+       /* No extend root table support yet */
+       return vtd_readq(DMAR_RTADDR_REG) & VTD_RTA_MASK;
+}
+
+static uint64_t vtd_ir_table(void)
+{
+	return vtd_readq(DMAR_IRTA_REG) & VTD_IRTA_MASK;
+}
+
+static void vtd_gcmd_or(uint32_t cmd)
+{
+	uint32_t status;
+
+	/* We only allow set one bit for each time */
+	assert(ONE_BIT_ONLY(cmd));
+
+	status = vtd_readl(DMAR_GSTS_REG);
+	vtd_writel(DMAR_GCMD_REG, status | cmd);
+
+	if (cmd & VTD_GCMD_ONE_SHOT_BITS)
+		/* One-shot bits are taking effect immediately */
+		return;
+
+	/* Make sure IOMMU handled our command request */
+	while (!(vtd_readl(DMAR_GSTS_REG) & cmd));
+}
+
+static void vtd_dump_init_info(void)
+{
+	printf("VT-d version:   0x%x\n", vtd_readl(DMAR_VER_REG));
+	printf("     cap:       0x%016lx\n", vtd_readq(DMAR_CAP_REG));
+	printf("     ecap:      0x%016lx\n", vtd_readq(DMAR_ECAP_REG));
+}
+
+static void vtd_setup_root_table(void)
+{
+	void *root = alloc_page();
+
+	memset(root, 0, PAGE_SIZE);
+	vtd_writeq(DMAR_RTADDR_REG, virt_to_phys(root));
+	vtd_gcmd_or(VTD_GCMD_ROOT);
+	printf("DMAR table address: 0x%016lx\n", vtd_root_table());
+}
+
+static void vtd_setup_ir_table(void)
+{
+	void *root = alloc_page();
+
+	memset(root, 0, PAGE_SIZE);
+	/* 0xf stands for table size (2^(0xf+1) == 65536) */
+	vtd_writeq(DMAR_IRTA_REG, virt_to_phys(root) | 0xf);
+	vtd_gcmd_or(VTD_GCMD_IR_TABLE);
+	printf("IR table address: 0x%016lx\n", vtd_ir_table());
+}
+
+void vtd_init(void)
+{
+	setup_vm();
+	smp_init();
+
+	vtd_dump_init_info();
+	vtd_gcmd_or(VTD_GCMD_QI); /* Enable QI */
+	vtd_setup_root_table();
+	vtd_setup_ir_table();
+	vtd_gcmd_or(VTD_GCMD_DMAR); /* Enable DMAR */
+	vtd_gcmd_or(VTD_GCMD_IR);   /* Enable IR */
+}
diff --git a/lib/x86/intel-iommu.h b/lib/x86/intel-iommu.h
new file mode 100644
index 0000000..537ce62
--- /dev/null
+++ b/lib/x86/intel-iommu.h
@@ -0,0 +1,115 @@
+/*
+ * Intel IOMMU header
+ *
+ * Copyright (C) 2016 Red Hat, Inc.
+ *
+ * Authors:
+ *   Peter Xu <peterx@redhat.com>,
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or
+ * later.
+ *
+ * (From include/linux/intel-iommu.h)
+ */
+
+#ifndef __INTEL_IOMMU_H__
+#define __INTEL_IOMMU_H__
+
+#include "libcflat.h"
+#include "vm.h"
+#include "isr.h"
+#include "smp.h"
+#include "desc.h"
+
+#define Q35_HOST_BRIDGE_IOMMU_ADDR  0xfed90000ULL
+
+/*
+ * Intel IOMMU register specification
+ */
+#define DMAR_VER_REG            0x0  /* Arch version supported by this IOMMU */
+#define DMAR_CAP_REG            0x8  /* Hardware supported capabilities */
+#define DMAR_CAP_REG_HI         0xc  /* High 32-bit of DMAR_CAP_REG */
+#define DMAR_ECAP_REG           0x10 /* Extended capabilities supported */
+#define DMAR_ECAP_REG_HI        0X14
+#define DMAR_GCMD_REG           0x18 /* Global command */
+#define DMAR_GSTS_REG           0x1c /* Global status */
+#define DMAR_RTADDR_REG         0x20 /* Root entry table */
+#define DMAR_RTADDR_REG_HI      0X24
+#define DMAR_CCMD_REG           0x28 /* Context command */
+#define DMAR_CCMD_REG_HI        0x2c
+#define DMAR_FSTS_REG           0x34 /* Fault status */
+#define DMAR_FECTL_REG          0x38 /* Fault control */
+#define DMAR_FEDATA_REG         0x3c /* Fault event interrupt data */
+#define DMAR_FEADDR_REG         0x40 /* Fault event interrupt addr */
+#define DMAR_FEUADDR_REG        0x44 /* Upper address */
+#define DMAR_AFLOG_REG          0x58 /* Advanced fault control */
+#define DMAR_AFLOG_REG_HI       0X5c
+#define DMAR_PMEN_REG           0x64 /* Enable protected memory region */
+#define DMAR_PLMBASE_REG        0x68 /* PMRR low addr */
+#define DMAR_PLMLIMIT_REG       0x6c /* PMRR low limit */
+#define DMAR_PHMBASE_REG        0x70 /* PMRR high base addr */
+#define DMAR_PHMBASE_REG_HI     0X74
+#define DMAR_PHMLIMIT_REG       0x78 /* PMRR high limit */
+#define DMAR_PHMLIMIT_REG_HI    0x7c
+#define DMAR_IQH_REG            0x80 /* Invalidation queue head */
+#define DMAR_IQH_REG_HI         0X84
+#define DMAR_IQT_REG            0x88 /* Invalidation queue tail */
+#define DMAR_IQT_REG_HI         0X8c
+#define DMAR_IQA_REG            0x90 /* Invalidation queue addr */
+#define DMAR_IQA_REG_HI         0x94
+#define DMAR_ICS_REG            0x9c /* Invalidation complete status */
+#define DMAR_IRTA_REG           0xb8 /* Interrupt remapping table addr */
+#define DMAR_IRTA_REG_HI        0xbc
+#define DMAR_IECTL_REG          0xa0 /* Invalidation event control */
+#define DMAR_IEDATA_REG         0xa4 /* Invalidation event data */
+#define DMAR_IEADDR_REG         0xa8 /* Invalidation event address */
+#define DMAR_IEUADDR_REG        0xac /* Invalidation event address */
+#define DMAR_PQH_REG            0xc0 /* Page request queue head */
+#define DMAR_PQH_REG_HI         0xc4
+#define DMAR_PQT_REG            0xc8 /* Page request queue tail*/
+#define DMAR_PQT_REG_HI         0xcc
+#define DMAR_PQA_REG            0xd0 /* Page request queue address */
+#define DMAR_PQA_REG_HI         0xd4
+#define DMAR_PRS_REG            0xdc /* Page request status */
+#define DMAR_PECTL_REG          0xe0 /* Page request event control */
+#define DMAR_PEDATA_REG         0xe4 /* Page request event data */
+#define DMAR_PEADDR_REG         0xe8 /* Page request event address */
+#define DMAR_PEUADDR_REG        0xec /* Page event upper address */
+#define DMAR_MTRRCAP_REG        0x100 /* MTRR capability */
+#define DMAR_MTRRCAP_REG_HI     0x104
+#define DMAR_MTRRDEF_REG        0x108 /* MTRR default type */
+#define DMAR_MTRRDEF_REG_HI     0x10c
+
+#define VTD_GCMD_IR_TABLE       (0x1000000)
+#define VTD_GCMD_IR             (0x2000000)
+#define VTD_GCMD_QI             (0x4000000)
+#define VTD_GCMD_WBF            (0x8000000)  /* Write Buffer Flush */
+#define VTD_GCMD_SFL            (0x20000000) /* Set Fault Log */
+#define VTD_GCMD_ROOT           (0x40000000)
+#define VTD_GCMD_DMAR           (0x80000000)
+#define VTD_GCMD_ONE_SHOT_BITS  (VTD_GCMD_IR_TABLE | VTD_GCMD_WBF | \
+				 VTD_GCMD_SFL | VTD_GCMD_ROOT)
+
+static inline void vtd_writel(unsigned int reg, uint32_t value)
+{
+	*(uint32_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg) = value;
+}
+
+static inline void vtd_writeq(unsigned int reg, uint64_t value)
+{
+	*(uint64_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg) = value;
+}
+
+static inline uint32_t vtd_readl(unsigned int reg)
+{
+	return *(uint32_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg);
+}
+
+static inline uint64_t vtd_readq(unsigned int reg)
+{
+	return *(uint64_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg);
+}
+
+void vtd_init(void);
+
+#endif
diff --git a/x86/Makefile.x86_64 b/x86/Makefile.x86_64
index f82492b..3e2821e 100644
--- a/x86/Makefile.x86_64
+++ b/x86/Makefile.x86_64
@@ -4,6 +4,7 @@ ldarch = elf64-x86-64
 CFLAGS += -mno-red-zone
 
 cflatobjs += lib/x86/setjmp64.o
+cflatobjs += lib/x86/intel-iommu.o
 
 tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
 	  $(TEST_DIR)/emulator.flat $(TEST_DIR)/idt_test.flat \
@@ -14,6 +15,7 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
 tests += $(TEST_DIR)/svm.flat
 tests += $(TEST_DIR)/vmx.flat
 tests += $(TEST_DIR)/tscdeadline_latency.flat
+tests += $(TEST_DIR)/intel-iommu.flat
 
 include $(TEST_DIR)/Makefile.common
 
diff --git a/x86/intel-iommu.c b/x86/intel-iommu.c
new file mode 100644
index 0000000..f247913
--- /dev/null
+++ b/x86/intel-iommu.c
@@ -0,0 +1,27 @@
+/*
+ * Intel IOMMU unit test.
+ *
+ * Copyright (C) 2016 Red Hat, Inc.
+ *
+ * Authors:
+ *   Peter Xu <peterx@redhat.com>,
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or
+ * later.
+ */
+
+#include "intel-iommu.h"
+
+int main(int argc, char *argv[])
+{
+	vtd_init();
+
+	report("fault status check", vtd_readl(DMAR_FSTS_REG) == 0);
+	report("QI enablement", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_QI);
+	report("DMAR table setup", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_ROOT);
+	report("IR table setup", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_IR_TABLE);
+	report("DMAR enablement", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_DMAR);
+	report("IR enablement", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_IR);
+
+	return report_summary();
+}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 02/17] libcflat: add IS_ALIGNED() macro, and page sizes
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 01/17] x86: intel-iommu: add vt-d init test Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 16:14   ` Andrew Jones
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 03/17] libcflat: moving MIN/MAX here Peter Xu
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

These macros will be useful to do page alignment checks.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/libcflat.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/lib/libcflat.h b/lib/libcflat.h
index 72b1bf9..a80df04 100644
--- a/lib/libcflat.h
+++ b/lib/libcflat.h
@@ -33,6 +33,11 @@
 #define __ALIGN_MASK(x, mask)	(((x) + (mask)) & ~(mask))
 #define __ALIGN(x, a)		__ALIGN_MASK(x, (typeof(x))(a) - 1)
 #define ALIGN(x, a)		__ALIGN((x), (a))
+#define IS_ALIGNED(x, a)	(((x) & ((typeof(x))(a) - 1)) == 0)
+
+#define SZ_4K			(0x1000)
+#define SZ_2M			(0x200000)
+#define SZ_1G			(0x40000000)
 
 typedef uint8_t		u8;
 typedef int8_t		s8;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 03/17] libcflat: moving MIN/MAX here
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 01/17] x86: intel-iommu: add vt-d init test Peter Xu
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 02/17] libcflat: add IS_ALIGNED() macro, and page sizes Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 16:15   ` Andrew Jones
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 04/17] vm/page: provide PGDIR_OFFSET() macro Peter Xu
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

That's something can be used outside alloc.c.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/alloc.c    | 3 ---
 lib/libcflat.h | 3 +++
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/alloc.c b/lib/alloc.c
index e1d7b8a..58af52b 100644
--- a/lib/alloc.c
+++ b/lib/alloc.c
@@ -7,9 +7,6 @@
 #include "asm/spinlock.h"
 #include "asm/io.h"
 
-#define MIN(a, b)		((a) < (b) ? (a) : (b))
-#define MAX(a, b)		((a) > (b) ? (a) : (b))
-
 #define PHYS_ALLOC_NR_REGIONS	256
 
 struct phys_alloc_region {
diff --git a/lib/libcflat.h b/lib/libcflat.h
index a80df04..f0308f6 100644
--- a/lib/libcflat.h
+++ b/lib/libcflat.h
@@ -39,6 +39,9 @@
 #define SZ_2M			(0x200000)
 #define SZ_1G			(0x40000000)
 
+#define MIN(a, b)		((a) < (b) ? (a) : (b))
+#define MAX(a, b)		((a) > (b) ? (a) : (b))
+
 typedef uint8_t		u8;
 typedef int8_t		s8;
 typedef uint16_t	u16;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 04/17] vm/page: provide PGDIR_OFFSET() macro
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
                   ` (2 preceding siblings ...)
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 03/17] libcflat: moving MIN/MAX here Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 05/17] x86/asm: add cpu_relax() Peter Xu
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

This can be used in further patches.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/x86/asm/page.h | 3 +++
 lib/x86/vm.c       | 4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/x86/asm/page.h b/lib/x86/asm/page.h
index 5044a49..c43bab2 100644
--- a/lib/x86/asm/page.h
+++ b/lib/x86/asm/page.h
@@ -41,5 +41,8 @@
 #define	PGDIR_MASK	1023
 #endif
 
+#define PGDIR_BITS(lvl)        (((lvl) - 1) * PGDIR_WIDTH + PAGE_SHIFT)
+#define PGDIR_OFFSET(va, lvl)  (((va) >> PGDIR_BITS(lvl)) & PGDIR_MASK)
+
 #endif /* !__ASSEMBLY__ */
 #endif
diff --git a/lib/x86/vm.c b/lib/x86/vm.c
index f7e778b..cda4c5f 100644
--- a/lib/x86/vm.c
+++ b/lib/x86/vm.c
@@ -48,7 +48,7 @@ unsigned long *install_pte(unsigned long *cr3,
     unsigned offset;
 
     for (level = PAGE_LEVEL; level > pte_level; --level) {
-	offset = ((unsigned long)virt >> ((level-1) * PGDIR_WIDTH + 12)) & PGDIR_MASK;
+	offset = PGDIR_OFFSET((unsigned long)virt, level);
 	if (!(pt[offset] & PT_PRESENT_MASK)) {
 	    unsigned long *new_pt = pt_page;
             if (!new_pt)
@@ -60,7 +60,7 @@ unsigned long *install_pte(unsigned long *cr3,
 	}
 	pt = phys_to_virt(pt[offset] & PT_ADDR_MASK);
     }
-    offset = ((unsigned long)virt >> ((level-1) * PGDIR_WIDTH + 12)) & PGDIR_MASK;
+    offset = PGDIR_OFFSET((unsigned long)virt, level);
     pt[offset] = pte;
     return &pt[offset];
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 05/17] x86/asm: add cpu_relax()
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
                   ` (3 preceding siblings ...)
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 04/17] vm/page: provide PGDIR_OFFSET() macro Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 16:18   ` Andrew Jones
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev Peter Xu
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

This will be useful to be put inside loops.

Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/x86/asm/barrier.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/lib/x86/asm/barrier.h b/lib/x86/asm/barrier.h
index 7c108bd..193fb4c 100644
--- a/lib/x86/asm/barrier.h
+++ b/lib/x86/asm/barrier.h
@@ -13,4 +13,15 @@
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
 
+/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
+static inline void rep_nop(void)
+{
+	asm volatile("rep; nop" ::: "memory");
+}
+
+static inline void cpu_relax(void)
+{
+	rep_nop();
+}
+
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
                   ` (4 preceding siblings ...)
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 05/17] x86/asm: add cpu_relax() Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 16:41   ` Andrew Jones
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 07/17] pci: provide pci_scan_bars() Peter Xu
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

To extend current PCI framework, we need a per-device struct to store
device specific information. Time to have a pci_dev struct. Most of the
current PCI APIs are converted to use this pci_dev object as the first
argument. Currently it only contains one field "pci_bdf", which is the
bdf of current device.

For a few APIs like pci_config_*() ops, I kept the old interface (use
PCI BDF value rather than "struct pci_dev") since they can be used in a
open context that without any specific PCI device.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/pci-host-generic.c | 13 ++++++----
 lib/pci-testdev.c      | 13 +++++-----
 lib/pci.c              | 64 ++++++++++++++++++++++++++++++--------------------
 lib/pci.h              | 24 ++++++++++++-------
 x86/vmexit.c           | 18 +++++++-------
 5 files changed, 80 insertions(+), 52 deletions(-)

diff --git a/lib/pci-host-generic.c b/lib/pci-host-generic.c
index 96f4b4b..be3f9e7 100644
--- a/lib/pci-host-generic.c
+++ b/lib/pci-host-generic.c
@@ -194,6 +194,7 @@ static bool pci_alloc_resource(u64 *addr, u32 bar, u64 size)
 
 bool pci_probe(void)
 {
+	struct pci_dev pci_dev;
 	pcidevaddr_t dev;
 	u8 header;
 	u32 cmd;
@@ -208,6 +209,8 @@ bool pci_probe(void)
 		if (!pci_dev_exists(dev))
 			continue;
 
+		pci_dev_init(&pci_dev, dev);
+
 		/* We are only interested in normal PCI devices */
 		header = pci_config_readb(dev, PCI_HEADER_TYPE);
 		if ((header & PCI_HEADER_TYPE_MASK) != PCI_HEADER_TYPE_NORMAL)
@@ -219,21 +222,21 @@ bool pci_probe(void)
 			u64 addr, size;
 			u32 bar;
 
-			size = pci_bar_size(dev, i);
+			size = pci_bar_size(&pci_dev, i);
 			if (!size)
 				continue;
 
-			bar = pci_bar_get(dev, i);
+			bar = pci_bar_get(&pci_dev, i);
 			if (pci_alloc_resource(&addr, bar, size)) {
-				pci_bar_set_addr(dev, i, addr);
+				pci_bar_set_addr(&pci_dev, i, addr);
 
-				if (pci_bar_is_memory(dev, i))
+				if (pci_bar_is_memory(&pci_dev, i))
 					cmd |= PCI_COMMAND_MEMORY;
 				else
 					cmd |= PCI_COMMAND_IO;
 			}
 
-			if (pci_bar_is64(dev, i))
+			if (pci_bar_is64(&pci_dev, i))
 				i++;
 		}
 
diff --git a/lib/pci-testdev.c b/lib/pci-testdev.c
index ad482d3..caa31bd 100644
--- a/lib/pci-testdev.c
+++ b/lib/pci-testdev.c
@@ -163,26 +163,27 @@ static int pci_testdev_all(struct pci_test_dev_hdr *test,
 
 int pci_testdev(void)
 {
+	struct pci_dev pci_dev;
 	phys_addr_t addr;
 	void __iomem *mem, *io;
-	pcidevaddr_t dev;
 	int nr_tests = 0;
 	bool ret;
 
-	dev = pci_find_dev(PCI_VENDOR_ID_REDHAT, PCI_DEVICE_ID_REDHAT_TEST);
-	if (dev == PCIDEVADDR_INVALID) {
+	ret = pci_find_dev(&pci_dev, PCI_VENDOR_ID_REDHAT,
+			   PCI_DEVICE_ID_REDHAT_TEST);
+	if (ret == -1) {
 		printf("'pci-testdev' device is not found, "
 		       "check QEMU '-device pci-testdev' parameter\n");
 		return -1;
 	}
 
-	ret = pci_bar_is_valid(dev, 0) && pci_bar_is_valid(dev, 1);
+	ret = pci_bar_is_valid(&pci_dev, 0) && pci_bar_is_valid(&pci_dev, 1);
 	assert(ret);
 
-	addr = pci_bar_get_addr(dev, 0);
+	addr = pci_bar_get_addr(&pci_dev, 0);
 	mem = ioremap(addr, PAGE_SIZE);
 
-	addr = pci_bar_get_addr(dev, 1);
+	addr = pci_bar_get_addr(&pci_dev, 1);
 	io = (void *)(unsigned long)addr;
 
 	nr_tests += pci_testdev_all(mem, &pci_testdev_mem_ops);
diff --git a/lib/pci.c b/lib/pci.c
index 2dbbba4..bb41e0f 100644
--- a/lib/pci.c
+++ b/lib/pci.c
@@ -13,18 +13,26 @@ bool pci_dev_exists(pcidevaddr_t dev)
 		pci_config_readw(dev, PCI_DEVICE_ID) != 0xffff);
 }
 
+void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf)
+{
+       memset(dev, 0, sizeof(*dev));
+       dev->pci_bdf = bdf;
+}
+
 /* Scan bus look for a specific device. Only bus 0 scanned for now. */
-pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id)
+int pci_find_dev(struct pci_dev *pci_dev, uint16_t vendor_id, uint16_t device_id)
 {
 	pcidevaddr_t dev;
 
-	for (dev = 0; dev < 256; ++dev) {
+	for (dev = 0; dev < PCI_DEVFN_MAX; ++dev) {
 		if (pci_config_readw(dev, PCI_VENDOR_ID) == vendor_id &&
-		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id)
-			return dev;
+		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id) {
+			pci_dev_init(pci_dev, dev);
+			return 0;
+		}
 	}
 
-	return PCIDEVADDR_INVALID;
+	return -1;
 }
 
 uint32_t pci_bar_mask(uint32_t bar)
@@ -33,12 +41,13 @@ uint32_t pci_bar_mask(uint32_t bar)
 		PCI_BASE_ADDRESS_IO_MASK : PCI_BASE_ADDRESS_MEM_MASK;
 }
 
-uint32_t pci_bar_get(pcidevaddr_t dev, int bar_num)
+uint32_t pci_bar_get(struct pci_dev *dev, int bar_num)
 {
-	return pci_config_readl(dev, PCI_BASE_ADDRESS_0 + bar_num * 4);
+	return pci_config_readl(dev->pci_bdf, PCI_BASE_ADDRESS_0 +
+				bar_num * 4);
 }
 
-phys_addr_t pci_bar_get_addr(pcidevaddr_t dev, int bar_num)
+phys_addr_t pci_bar_get_addr(struct pci_dev *dev, int bar_num)
 {
 	uint32_t bar = pci_bar_get(dev, bar_num);
 	uint32_t mask = pci_bar_mask(bar);
@@ -47,17 +56,18 @@ phys_addr_t pci_bar_get_addr(pcidevaddr_t dev, int bar_num)
 	if (pci_bar_is64(dev, bar_num))
 		addr |= (uint64_t)pci_bar_get(dev, bar_num + 1) << 32;
 
-	return pci_translate_addr(dev, addr);
+	return pci_translate_addr(dev->pci_bdf, addr);
 }
 
-void pci_bar_set_addr(pcidevaddr_t dev, int bar_num, phys_addr_t addr)
+void pci_bar_set_addr(struct pci_dev *dev, int bar_num, phys_addr_t addr)
 {
 	int off = PCI_BASE_ADDRESS_0 + bar_num * 4;
 
-	pci_config_writel(dev, off, (uint32_t)addr);
+	pci_config_writel(dev->pci_bdf, off, (uint32_t)addr);
 
 	if (pci_bar_is64(dev, bar_num))
-		pci_config_writel(dev, off + 4, (uint32_t)(addr >> 32));
+		pci_config_writel(dev->pci_bdf, off + 4,
+				  (uint32_t)(addr >> 32));
 }
 
 /*
@@ -70,20 +80,21 @@ void pci_bar_set_addr(pcidevaddr_t dev, int bar_num, phys_addr_t addr)
  * The following pci_bar_size_helper() and pci_bar_size() functions
  * implement the algorithm.
  */
-static uint32_t pci_bar_size_helper(pcidevaddr_t dev, int bar_num)
+static uint32_t pci_bar_size_helper(struct pci_dev *dev, int bar_num)
 {
 	int off = PCI_BASE_ADDRESS_0 + bar_num * 4;
+	uint16_t bdf = dev->pci_bdf;
 	uint32_t bar, val;
 
-	bar = pci_config_readl(dev, off);
-	pci_config_writel(dev, off, ~0u);
-	val = pci_config_readl(dev, off);
-	pci_config_writel(dev, off, bar);
+	bar = pci_config_readl(bdf, off);
+	pci_config_writel(bdf, off, ~0u);
+	val = pci_config_readl(bdf, off);
+	pci_config_writel(bdf, off, bar);
 
 	return val;
 }
 
-phys_addr_t pci_bar_size(pcidevaddr_t dev, int bar_num)
+phys_addr_t pci_bar_size(struct pci_dev *dev, int bar_num)
 {
 	uint32_t bar, size;
 
@@ -104,19 +115,19 @@ phys_addr_t pci_bar_size(pcidevaddr_t dev, int bar_num)
 	}
 }
 
-bool pci_bar_is_memory(pcidevaddr_t dev, int bar_num)
+bool pci_bar_is_memory(struct pci_dev *dev, int bar_num)
 {
 	uint32_t bar = pci_bar_get(dev, bar_num);
 
 	return !(bar & PCI_BASE_ADDRESS_SPACE_IO);
 }
 
-bool pci_bar_is_valid(pcidevaddr_t dev, int bar_num)
+bool pci_bar_is_valid(struct pci_dev *dev, int bar_num)
 {
 	return pci_bar_get(dev, bar_num);
 }
 
-bool pci_bar_is64(pcidevaddr_t dev, int bar_num)
+bool pci_bar_is64(struct pci_dev *dev, int bar_num)
 {
 	uint32_t bar = pci_bar_get(dev, bar_num);
 
@@ -135,6 +146,7 @@ static void pci_dev_print(pcidevaddr_t dev)
 	uint8_t progif = pci_config_readb(dev, PCI_CLASS_PROG);
 	uint8_t subclass = pci_config_readb(dev, PCI_CLASS_DEVICE);
 	uint8_t class = pci_config_readb(dev, PCI_CLASS_DEVICE + 1);
+	struct pci_dev pci_dev;
 	int i;
 
 	printf("dev %2d fn %d vendor_id %04x device_id %04x type %02x "
@@ -145,18 +157,20 @@ static void pci_dev_print(pcidevaddr_t dev)
 	if ((header & PCI_HEADER_TYPE_MASK) != PCI_HEADER_TYPE_NORMAL)
 		return;
 
+	pci_dev_init(&pci_dev, dev);
+
 	for (i = 0; i < 6; i++) {
 		phys_addr_t size, start, end;
 		uint32_t bar;
 
-		size = pci_bar_size(dev, i);
+		size = pci_bar_size(&pci_dev, i);
 		if (!size)
 			continue;
 
-		start = pci_bar_get_addr(dev, i);
+		start = pci_bar_get_addr(&pci_dev, i);
 		end = start + size - 1;
 
-		if (pci_bar_is64(dev, i)) {
+		if (pci_bar_is64(&pci_dev, i)) {
 			printf("\tBAR#%d,%d [%" PRIx64 "-%" PRIx64 " ",
 			       i, i + 1, start, end);
 			i++;
@@ -165,7 +179,7 @@ static void pci_dev_print(pcidevaddr_t dev)
 			       i, (uint32_t)start, (uint32_t)end);
 		}
 
-		bar = pci_bar_get(dev, i);
+		bar = pci_bar_get(&pci_dev, i);
 
 		if (bar & PCI_BASE_ADDRESS_SPACE_IO) {
 			printf("PIO]\n");
diff --git a/lib/pci.h b/lib/pci.h
index 40e11a8..af20dc5 100644
--- a/lib/pci.h
+++ b/lib/pci.h
@@ -15,10 +15,18 @@ enum {
 	PCIDEVADDR_INVALID = 0xffff,
 };
 
+#define PCI_DEVFN_MAX                   (256)
+
+struct pci_dev {
+	uint16_t pci_bdf;
+};
+
+void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf);
+
 extern bool pci_probe(void);
 extern void pci_print(void);
 extern bool pci_dev_exists(pcidevaddr_t dev);
-extern pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id);
+extern int pci_find_dev(struct pci_dev *dev, uint16_t vendor_id, uint16_t device_id);
 
 /*
  * @bar_num in all BAR access functions below is the index of the 32-bit
@@ -32,14 +40,14 @@ extern pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id);
  * It is expected the caller is aware of the device BAR layout and never
  * tries to address the middle of a 64-bit register.
  */
-extern phys_addr_t pci_bar_get_addr(pcidevaddr_t dev, int bar_num);
-extern void pci_bar_set_addr(pcidevaddr_t dev, int bar_num, phys_addr_t addr);
-extern phys_addr_t pci_bar_size(pcidevaddr_t dev, int bar_num);
-extern uint32_t pci_bar_get(pcidevaddr_t dev, int bar_num);
+extern phys_addr_t pci_bar_get_addr(struct pci_dev *dev, int bar_num);
+extern void pci_bar_set_addr(struct pci_dev *dev, int bar_num, phys_addr_t addr);
+extern phys_addr_t pci_bar_size(struct pci_dev *dev, int bar_num);
+extern uint32_t pci_bar_get(struct pci_dev *dev, int bar_num);
 extern uint32_t pci_bar_mask(uint32_t bar);
-extern bool pci_bar_is64(pcidevaddr_t dev, int bar_num);
-extern bool pci_bar_is_memory(pcidevaddr_t dev, int bar_num);
-extern bool pci_bar_is_valid(pcidevaddr_t dev, int bar_num);
+extern bool pci_bar_is64(struct pci_dev *dev, int bar_num);
+extern bool pci_bar_is_memory(struct pci_dev *dev, int bar_num);
+extern bool pci_bar_is_valid(struct pci_dev *dev, int bar_num);
 
 int pci_testdev(void);
 
diff --git a/x86/vmexit.c b/x86/vmexit.c
index 2d99d5f..2736ab8 100644
--- a/x86/vmexit.c
+++ b/x86/vmexit.c
@@ -372,7 +372,8 @@ int main(int ac, char **av)
 	struct fadt_descriptor_rev1 *fadt;
 	int i;
 	unsigned long membar = 0;
-	pcidevaddr_t pcidev;
+	struct pci_dev pcidev;
+	int ret;
 
 	smp_init();
 	setup_vm();
@@ -385,21 +386,22 @@ int main(int ac, char **av)
 	pm_tmr_blk = fadt->pm_tmr_blk;
 	printf("PM timer port is %x\n", pm_tmr_blk);
 
-	pcidev = pci_find_dev(PCI_VENDOR_ID_REDHAT, PCI_DEVICE_ID_REDHAT_TEST);
-	if (pcidev != PCIDEVADDR_INVALID) {
+	ret = pci_find_dev(&pcidev, PCI_VENDOR_ID_REDHAT,
+			   PCI_DEVICE_ID_REDHAT_TEST);
+	if (ret == 0) {
 		for (i = 0; i < PCI_TESTDEV_NUM_BARS; i++) {
-			if (!pci_bar_is_valid(pcidev, i)) {
+			if (!pci_bar_is_valid(&pcidev, i)) {
 				continue;
 			}
-			if (pci_bar_is_memory(pcidev, i)) {
-				membar = pci_bar_get_addr(pcidev, i);
+			if (pci_bar_is_memory(&pcidev, i)) {
+				membar = pci_bar_get_addr(&pcidev, i);
 				pci_test.memaddr = ioremap(membar, PAGE_SIZE);
 			} else {
-				pci_test.iobar = pci_bar_get_addr(pcidev, i);
+				pci_test.iobar = pci_bar_get_addr(&pcidev, i);
 			}
 		}
 		printf("pci-testdev at 0x%x membar %lx iobar %x\n",
-		       pcidev, membar, pci_test.iobar);
+		       pcidev.pci_bdf, membar, pci_test.iobar);
 	}
 
 	for (i = 0; i < ARRAY_SIZE(tests); ++i)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 07/17] pci: provide pci_scan_bars()
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
                   ` (5 preceding siblings ...)
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 16:47   ` Andrew Jones
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 08/17] x86/vmexit: leverage pci_scan_bars() Peter Xu
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

Let's provide a more general way to scan PCI bars, rather than read the
config registers every time.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/pci.c | 14 ++++++++++++++
 lib/pci.h |  3 +++
 2 files changed, 17 insertions(+)

diff --git a/lib/pci.c b/lib/pci.c
index bb41e0f..95a6581 100644
--- a/lib/pci.c
+++ b/lib/pci.c
@@ -35,6 +35,20 @@ int pci_find_dev(struct pci_dev *pci_dev, uint16_t vendor_id, uint16_t device_id
 	return -1;
 }
 
+void pci_scan_bars(struct pci_dev *dev)
+{
+	int i = 0;
+
+	for (i = 0; i < PCI_BAR_NUM; i++) {
+		if (!pci_bar_is_valid(dev, i))
+			continue;
+		dev->pci_bar[i] = pci_bar_get_addr(dev, i);
+		printf("PCI: init dev 0x%04x BAR %d [%s] addr 0x%lx\n",
+		       dev->pci_bdf, i, pci_bar_is_memory(dev, i) ?
+		       "MEM" : "IO", dev->pci_bar[i]);
+	}
+}
+
 uint32_t pci_bar_mask(uint32_t bar)
 {
 	return (bar & PCI_BASE_ADDRESS_SPACE_IO) ?
diff --git a/lib/pci.h b/lib/pci.h
index af20dc5..56b49f0 100644
--- a/lib/pci.h
+++ b/lib/pci.h
@@ -15,13 +15,16 @@ enum {
 	PCIDEVADDR_INVALID = 0xffff,
 };
 
+#define PCI_BAR_NUM                     (6)
 #define PCI_DEVFN_MAX                   (256)
 
 struct pci_dev {
 	uint16_t pci_bdf;
+	phys_addr_t pci_bar[PCI_BAR_NUM];
 };
 
 void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf);
+void pci_scan_bars(struct pci_dev *dev);
 
 extern bool pci_probe(void);
 extern void pci_print(void);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 08/17] x86/vmexit: leverage pci_scan_bars()
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
                   ` (6 preceding siblings ...)
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 07/17] pci: provide pci_scan_bars() Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 16:54   ` Andrew Jones
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 09/17] pci: add pci_config_write[wb]() Peter Xu
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

Since pci-testdev is a very specific device for QEMU, let's use the new
pci_scan_bars() helper, and selectively choose the bars we want.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/pci.h    |  2 ++
 x86/vmexit.c | 23 ++++++++---------------
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/lib/pci.h b/lib/pci.h
index 56b49f0..e48045c 100644
--- a/lib/pci.h
+++ b/lib/pci.h
@@ -66,6 +66,8 @@ int pci_testdev(void);
  * pci-testdev supports at least three types of tests (via mmio and
  * portio BARs): no-eventfd, wildcard-eventfd and datamatch-eventfd
  */
+#define PCI_TESTDEV_BAR_MEM		0
+#define PCI_TESTDEV_BAR_IO		1
 #define PCI_TESTDEV_NUM_BARS		2
 #define PCI_TESTDEV_NUM_TESTS		3
 
diff --git a/x86/vmexit.c b/x86/vmexit.c
index 2736ab8..880466e 100644
--- a/x86/vmexit.c
+++ b/x86/vmexit.c
@@ -373,7 +373,6 @@ int main(int ac, char **av)
 	int i;
 	unsigned long membar = 0;
 	struct pci_dev pcidev;
-	int ret;
 
 	smp_init();
 	setup_vm();
@@ -386,20 +385,14 @@ int main(int ac, char **av)
 	pm_tmr_blk = fadt->pm_tmr_blk;
 	printf("PM timer port is %x\n", pm_tmr_blk);
 
-	ret = pci_find_dev(&pcidev, PCI_VENDOR_ID_REDHAT,
-			   PCI_DEVICE_ID_REDHAT_TEST);
-	if (ret == 0) {
-		for (i = 0; i < PCI_TESTDEV_NUM_BARS; i++) {
-			if (!pci_bar_is_valid(&pcidev, i)) {
-				continue;
-			}
-			if (pci_bar_is_memory(&pcidev, i)) {
-				membar = pci_bar_get_addr(&pcidev, i);
-				pci_test.memaddr = ioremap(membar, PAGE_SIZE);
-			} else {
-				pci_test.iobar = pci_bar_get_addr(&pcidev, i);
-			}
-		}
+	if (!pci_find_dev(&pcidev, PCI_VENDOR_ID_REDHAT,
+			  PCI_DEVICE_ID_REDHAT_TEST)) {
+		pci_scan_bars(&pcidev);
+		assert(pci_bar_is_memory(&pcidev, PCI_TESTDEV_BAR_MEM));
+		assert(!pci_bar_is_memory(&pcidev, PCI_TESTDEV_BAR_IO));
+		membar = pcidev.pci_bar[PCI_TESTDEV_BAR_MEM];
+		pci_test.memaddr = ioremap(membar, PAGE_SIZE);
+		pci_test.iobar = pcidev.pci_bar[PCI_TESTDEV_BAR_IO];
 		printf("pci-testdev at 0x%x membar %lx iobar %x\n",
 		       pcidev.pci_bdf, membar, pci_test.iobar);
 	}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 09/17] pci: add pci_config_write[wb]()
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
                   ` (7 preceding siblings ...)
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 08/17] x86/vmexit: leverage pci_scan_bars() Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 16:59   ` Andrew Jones
  2016-11-05 17:06   ` Alexander Gordeev
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 10/17] pci: provide pci_set_master() Peter Xu
                   ` (7 subsequent siblings)
  16 siblings, 2 replies; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

We have writel() for 4 bytes but still missing 2/1 bytes accessors.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/pci-host-generic.c | 12 ++++++++++++
 lib/x86/asm/pci.h      | 14 ++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/lib/pci-host-generic.c b/lib/pci-host-generic.c
index be3f9e7..c04e4f5 100644
--- a/lib/pci-host-generic.c
+++ b/lib/pci-host-generic.c
@@ -291,6 +291,18 @@ u32 pci_config_readl(pcidevaddr_t dev, u8 off)
 	return readl(conf + off);
 }
 
+void pci_config_writeb(pcidevaddr_t dev, u8 off, u8 val)
+{
+	void __iomem *conf = pci_get_dev_conf(pci_host_bridge, dev);
+	writeb(val, conf + off);
+}
+
+void pci_config_writew(pcidevaddr_t dev, u8 off, u16 val)
+{
+	void __iomem *conf = pci_get_dev_conf(pci_host_bridge, dev);
+	writew(val, conf + off);
+}
+
 void pci_config_writel(pcidevaddr_t dev, u8 off, u32 val)
 {
 	void __iomem *conf = pci_get_dev_conf(pci_host_bridge, dev);
diff --git a/lib/x86/asm/pci.h b/lib/x86/asm/pci.h
index 7384b91..abfe5d1 100644
--- a/lib/x86/asm/pci.h
+++ b/lib/x86/asm/pci.h
@@ -29,6 +29,20 @@ static inline uint32_t pci_config_readl(pcidevaddr_t dev, uint8_t reg)
     return inl(0xCFC);
 }
 
+static inline void pci_config_writeb(pcidevaddr_t dev, uint8_t reg,
+				     uint8_t val)
+{
+    outl(PCI_CONF1_ADDRESS(dev, reg), 0xCF8);
+    outb(val, 0xCFC);
+}
+
+static inline void pci_config_writew(pcidevaddr_t dev, uint8_t reg,
+				     uint32_t val)
+{
+    outl(PCI_CONF1_ADDRESS(dev, reg), 0xCF8);
+    outw(val, 0xCFC);
+}
+
 static inline void pci_config_writel(pcidevaddr_t dev, uint8_t reg, uint32_t val)
 {
     outl(PCI_CONF1_ADDRESS(dev, reg), 0xCF8);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 10/17] pci: provide pci_set_master()
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
                   ` (8 preceding siblings ...)
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 09/17] pci: add pci_config_write[wb]() Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 17:04   ` Andrew Jones
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 11/17] pci: provide pci_enable_defaults() Peter Xu
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

We need to call this whenever we want to do DMA with a device.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/pci.c | 7 +++++++
 lib/pci.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/lib/pci.c b/lib/pci.c
index 95a6581..1495f61 100644
--- a/lib/pci.c
+++ b/lib/pci.c
@@ -7,6 +7,13 @@
 #include "pci.h"
 #include "asm/pci.h"
 
+void pci_set_master(struct pci_dev *dev, int master)
+{
+	uint32_t val = pci_config_readw(dev->pci_bdf, PCI_COMMAND);
+	val |= PCI_COMMAND_MASTER;
+	pci_config_writew(dev->pci_bdf, PCI_COMMAND, val);
+}
+
 bool pci_dev_exists(pcidevaddr_t dev)
 {
 	return (pci_config_readw(dev, PCI_VENDOR_ID) != 0xffff &&
diff --git a/lib/pci.h b/lib/pci.h
index e48045c..46657f4 100644
--- a/lib/pci.h
+++ b/lib/pci.h
@@ -25,6 +25,7 @@ struct pci_dev {
 
 void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf);
 void pci_scan_bars(struct pci_dev *dev);
+void pci_set_master(struct pci_dev *dev, int master);
 
 extern bool pci_probe(void);
 extern void pci_print(void);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 11/17] pci: provide pci_enable_defaults()
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
                   ` (9 preceding siblings ...)
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 10/17] pci: provide pci_set_master() Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 17:08   ` Andrew Jones
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 12/17] pci: add bdf helpers Peter Xu
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

Provide a function to do most of the common PCI init work.

Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/pci.c | 7 +++++++
 lib/pci.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/lib/pci.c b/lib/pci.c
index 1495f61..d78472f 100644
--- a/lib/pci.c
+++ b/lib/pci.c
@@ -56,6 +56,13 @@ void pci_scan_bars(struct pci_dev *dev)
 	}
 }
 
+int pci_enable_defaults(struct pci_dev *dev)
+{
+	pci_scan_bars(dev);
+	pci_set_master(dev, 1);
+	return 0;
+}
+
 uint32_t pci_bar_mask(uint32_t bar)
 {
 	return (bar & PCI_BASE_ADDRESS_SPACE_IO) ?
diff --git a/lib/pci.h b/lib/pci.h
index 46657f4..0336455 100644
--- a/lib/pci.h
+++ b/lib/pci.h
@@ -26,6 +26,7 @@ struct pci_dev {
 void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf);
 void pci_scan_bars(struct pci_dev *dev);
 void pci_set_master(struct pci_dev *dev, int master);
+int pci_enable_defaults(struct pci_dev *dev);
 
 extern bool pci_probe(void);
 extern void pci_print(void);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 12/17] pci: add bdf helpers
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
                   ` (10 preceding siblings ...)
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 11/17] pci: provide pci_enable_defaults() Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 17:51   ` Andrew Jones
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 13/17] pci: edu: introduce pci-edu helpers Peter Xu
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/pci.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/pci.h b/lib/pci.h
index 0336455..7a3bce9 100644
--- a/lib/pci.h
+++ b/lib/pci.h
@@ -18,6 +18,9 @@ enum {
 #define PCI_BAR_NUM                     (6)
 #define PCI_DEVFN_MAX                   (256)
 
+#define PCI_BDF_GET_DEVFN(x)            ((x) & 0xff)
+#define PCI_BDF_GET_BUS(x)              (((x) >> 8) & 0xff)
+
 struct pci_dev {
 	uint16_t pci_bdf;
 	phys_addr_t pci_bar[PCI_BAR_NUM];
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 13/17] pci: edu: introduce pci-edu helpers
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
                   ` (11 preceding siblings ...)
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 12/17] pci: add bdf helpers Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 17:18   ` Andrew Jones
  2016-11-04 17:24   ` Andrew Jones
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 14/17] x86: intel-iommu: add dmar test Peter Xu
                   ` (3 subsequent siblings)
  16 siblings, 2 replies; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

QEMU edu device is a pci device that is originally written for
educational purpose, however it also suits for IOMMU unit test. Adding
helpers for this specific device to implement the device logic.

The device supports lots of functions, here only DMA operation is
supported.

The spec of the device can be found at:

  https://github.com/qemu/qemu/blob/master/docs/specs/edu.txt

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/pci-edu.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/pci-edu.h | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 153 insertions(+)
 create mode 100644 lib/pci-edu.c
 create mode 100644 lib/pci-edu.h

diff --git a/lib/pci-edu.c b/lib/pci-edu.c
new file mode 100644
index 0000000..c594e08
--- /dev/null
+++ b/lib/pci-edu.c
@@ -0,0 +1,74 @@
+/*
+ * Edu PCI device.
+ *
+ * Copyright (C) 2016 Red Hat, Inc.
+ *
+ * Authors:
+ *   Peter Xu <peterx@redhat.com>,
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or
+ * later.
+ */
+
+#include "pci-edu.h"
+#include "asm/barrier.h"
+
+/* Return true if alive */
+static inline bool edu_check_alive(struct pci_edu_dev *dev)
+{
+	static uint32_t live_count = 1;
+	uint32_t value;
+
+	edu_reg_write(dev, EDU_REG_ALIVE, live_count++);
+	value = edu_reg_read(dev, EDU_REG_ALIVE);
+	return (live_count - 1 == ~value);
+}
+
+int edu_init(struct pci_edu_dev *dev)
+{
+	int ret;
+
+	ret = pci_find_dev(&dev->pci_dev, PCI_VENDOR_ID_QEMU,
+			   PCI_DEVICE_ID_EDU);
+	if (ret)
+		return ret;
+
+	pci_enable_defaults(&dev->pci_dev);
+	assert(edu_check_alive(dev));
+
+	return 0;
+}
+
+void edu_dma(struct pci_edu_dev *dev, iova_t iova,
+	     size_t size, int dev_offset, bool from_device)
+{
+	uint64_t from, to;
+	uint32_t cmd = EDU_CMD_DMA_START;
+
+	assert(size <= EDU_DMA_SIZE_MAX);
+	assert(dev_offset < EDU_DMA_SIZE_MAX &&
+	       dev_offset >= 0);
+
+	printf("edu device DMA start %s addr %p size 0x%lu off 0x%x\n",
+	       from_device ? "FROM" : "TO",
+	       (void *)iova, size, dev_offset);
+
+	if (from_device) {
+		from = dev_offset + EDU_DMA_START;
+		to = iova;
+		cmd |= EDU_CMD_DMA_FROM;
+	} else {
+		from = iova;
+		to = EDU_DMA_START + dev_offset;
+		cmd |= EDU_CMD_DMA_TO;
+	}
+
+	edu_reg_writeq(dev, EDU_REG_DMA_SRC, from);
+	edu_reg_writeq(dev, EDU_REG_DMA_DST, to);
+	edu_reg_writeq(dev, EDU_REG_DMA_COUNT, size);
+	edu_reg_write(dev, EDU_REG_DMA_CMD, cmd);
+
+	/* Wait until DMA finished */
+	while (edu_reg_read(dev, EDU_REG_DMA_CMD) & EDU_CMD_DMA_START)
+		cpu_relax();
+}
diff --git a/lib/pci-edu.h b/lib/pci-edu.h
new file mode 100644
index 0000000..f98d2d1
--- /dev/null
+++ b/lib/pci-edu.h
@@ -0,0 +1,79 @@
+/*
+ * Edu PCI device header.
+ *
+ * Copyright (C) 2016 Red Hat, Inc.
+ *
+ * Authors:
+ *   Peter Xu <peterx@redhat.com>,
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or
+ * later.
+ *
+ * Edu device is a virtualized device in QEMU. Please refer to
+ * docs/specs/edu.txt in QEMU repository for EDU device manual.
+ */
+#ifndef __PCI_EDU_H__
+#define __PCI_EDU_H__
+
+#include "pci.h"
+
+#define  PCI_VENDOR_ID_QEMU              (0x1234)
+#define  PCI_DEVICE_ID_EDU               (0x11e8)
+
+/* The only bar used by EDU device */
+#define EDU_BAR_MEM                 (0)
+#define EDU_MAGIC                   (0xed)
+#define EDU_VERSION                 (0x100)
+#define EDU_DMA_BUF_SIZE            (1 << 20)
+#define EDU_INPUT_BUF_SIZE          (256)
+
+#define EDU_REG_ID                  (0x0)
+#define EDU_REG_ALIVE               (0x4)
+#define EDU_REG_FACTORIAL           (0x8)
+#define EDU_REG_STATUS              (0x20)
+#define EDU_REG_DMA_SRC             (0x80)
+#define EDU_REG_DMA_DST             (0x88)
+#define EDU_REG_DMA_COUNT           (0x90)
+#define EDU_REG_DMA_CMD             (0x98)
+
+#define EDU_CMD_DMA_START           (0x01)
+#define EDU_CMD_DMA_FROM            (0x02)
+#define EDU_CMD_DMA_TO              (0x00)
+
+#define EDU_STATUS_FACTORIAL        (0x1)
+#define EDU_STATUS_INT_ENABLE       (0x80)
+
+#define EDU_DMA_START               (0x40000)
+#define EDU_DMA_SIZE_MAX            (4096)
+
+struct pci_edu_dev {
+	struct pci_dev pci_dev;
+};
+
+static inline uint64_t edu_reg_readq(struct pci_edu_dev *dev, int reg)
+{
+	return *(volatile uint64_t *)(dev->pci_dev.pci_bar[EDU_BAR_MEM] + reg);
+}
+
+static inline uint32_t edu_reg_read(struct pci_edu_dev *dev, int reg)
+{
+	return *(volatile uint32_t *)(dev->pci_dev.pci_bar[EDU_BAR_MEM] + reg);
+}
+
+static inline void edu_reg_writeq(struct pci_edu_dev *dev, int reg,
+				  uint64_t val)
+{
+	*(volatile uint64_t *)(dev->pci_dev.pci_bar[EDU_BAR_MEM] + reg) = val;
+}
+
+static inline void edu_reg_write(struct pci_edu_dev *dev, int reg,
+				 uint32_t val)
+{
+	*(volatile uint32_t *)(dev->pci_dev.pci_bar[EDU_BAR_MEM] + reg) = val;
+}
+
+int edu_init(struct pci_edu_dev *dev);
+void edu_dma(struct pci_edu_dev *dev, iova_t iova,
+	     size_t size, int dev_offset, bool from_device);
+
+#endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 14/17] x86: intel-iommu: add dmar test
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
                   ` (12 preceding siblings ...)
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 13/17] pci: edu: introduce pci-edu helpers Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 17:53   ` Andrew Jones
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 15/17] pci: add msi support for 32/64bit address Peter Xu
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

DMAR test is based on QEMU edu device. A 4B DMA memory copy is carried
out as the simplest DMAR test.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/pci.h             |   2 +
 lib/x86/intel-iommu.c | 131 ++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/x86/intel-iommu.h |  23 +++++++++
 x86/Makefile.common   |   1 +
 x86/intel-iommu.c     |  52 ++++++++++++++++++++
 5 files changed, 209 insertions(+)

diff --git a/lib/pci.h b/lib/pci.h
index 7a3bce9..bc6b91a 100644
--- a/lib/pci.h
+++ b/lib/pci.h
@@ -31,6 +31,8 @@ void pci_scan_bars(struct pci_dev *dev);
 void pci_set_master(struct pci_dev *dev, int master);
 int pci_enable_defaults(struct pci_dev *dev);
 
+typedef phys_addr_t iova_t;
+
 extern bool pci_probe(void);
 extern void pci_print(void);
 extern bool pci_dev_exists(pcidevaddr_t dev);
diff --git a/lib/x86/intel-iommu.c b/lib/x86/intel-iommu.c
index 02eaf39..b680453 100644
--- a/lib/x86/intel-iommu.c
+++ b/lib/x86/intel-iommu.c
@@ -11,6 +11,42 @@
  */
 
 #include "intel-iommu.h"
+#include "libcflat.h"
+
+/*
+ * VT-d in QEMU currently only support 39 bits address width, which is
+ * 3-level translation.
+ */
+#define VTD_PAGE_LEVEL      (3)
+#define VTD_CE_AW_39BIT     (0x1)
+
+typedef uint64_t vtd_pte_t;
+
+struct vtd_root_entry {
+	/* Quad 1 */
+	uint64_t present:1;
+	uint64_t __reserved:11;
+	uint64_t context_table_p:52;
+	/* Quad 2 */
+	uint64_t __reserved_2;
+} __attribute__ ((packed));
+typedef struct vtd_root_entry vtd_re_t;
+
+struct vtd_context_entry {
+	/* Quad 1 */
+	uint64_t present:1;
+	uint64_t disable_fault_report:1;
+	uint64_t trans_type:2;
+	uint64_t __reserved:8;
+	uint64_t slptptr:52;
+	/* Quad 2 */
+	uint64_t addr_width:3;
+	uint64_t __ignore:4;
+	uint64_t __reserved_2:1;
+	uint64_t domain_id:16;
+	uint64_t __reserved_3:40;
+} __attribute__ ((packed));
+typedef struct vtd_context_entry vtd_ce_t;
 
 #define ONE_BIT_ONLY(x) ((x) && !((x) & ((x) - 1)))
 
@@ -74,6 +110,101 @@ static void vtd_setup_ir_table(void)
 	printf("IR table address: 0x%016lx\n", vtd_ir_table());
 }
 
+static void vtd_install_pte(vtd_pte_t *root, iova_t iova,
+			    phys_addr_t pa, int level_target)
+{
+	int level;
+	unsigned int offset;
+	void *page;
+
+	for (level = VTD_PAGE_LEVEL; level > level_target; level--) {
+		offset = PGDIR_OFFSET(iova, level);
+		if (!(root[offset] & VTD_PTE_RW)) {
+			page = alloc_page();
+			memset(page, 0, PAGE_SIZE);
+			root[offset] = virt_to_phys(page) | VTD_PTE_RW;
+		}
+		root = (uint64_t *)(root[offset] & VTD_PTE_ADDR);
+	}
+
+	offset = PGDIR_OFFSET(iova, level);
+	root[offset] = pa | VTD_PTE_RW;
+	if (level != 1)
+		/* This is huge page */
+		root[offset] |= VTD_PTE_HUGE;
+}
+
+#define  VTD_FETCH_VIRT_ADDR(x) \
+	((void *)(((uint64_t)phys_to_virt(x)) >> PAGE_SHIFT))
+
+/**
+ * vtd_map_range: setup IO address mapping for specific memory range
+ *
+ * @sid: source ID of the device to setup
+ * @iova: start IO virtual address
+ * @pa: start physical address
+ * @size: size of the mapping area
+ */
+void vtd_map_range(uint16_t sid, iova_t iova, phys_addr_t pa, size_t size)
+{
+	uint8_t bus_n, devfn;
+	void *slptptr;
+	vtd_ce_t *ce;
+	vtd_re_t *re = phys_to_virt(vtd_root_table());
+
+	assert(IS_ALIGNED(iova, SZ_4K));
+	assert(IS_ALIGNED(pa, SZ_4K));
+	assert(IS_ALIGNED(size, SZ_4K));
+
+	bus_n = PCI_BDF_GET_BUS(sid);
+	devfn = PCI_BDF_GET_DEVFN(sid);
+
+	/* Point to the correct root entry */
+	re += bus_n;
+
+	if (!re->present) {
+		ce = alloc_page();
+		memset(ce, 0, PAGE_SIZE);
+		memset(re, 0, sizeof(*re));
+		re->context_table_p = virt_to_phys(ce) >> PAGE_SHIFT;
+		re->present = 1;
+		printf("allocated vt-d root entry for PCI bus %d\n",
+		       bus_n);
+	} else
+		ce = VTD_FETCH_VIRT_ADDR(re->context_table_p);
+
+	/* Point to the correct context entry */
+	ce += devfn;
+
+	if (!ce->present) {
+		slptptr = alloc_page();
+		memset(slptptr, 0, PAGE_SIZE);
+		memset(ce, 0, sizeof(*ce));
+		/* To make it simple, domain ID is the same as SID */
+		ce->domain_id = sid;
+		/* We only test 39 bits width case (3-level paging) */
+		ce->addr_width = VTD_CE_AW_39BIT;
+		ce->slptptr = virt_to_phys(slptptr) >> PAGE_SHIFT;
+		ce->trans_type = VTD_CONTEXT_TT_MULTI_LEVEL;
+		ce->present = 1;
+		/* No error reporting yet */
+		ce->disable_fault_report = 1;
+		printf("allocated vt-d context entry for devfn 0x%x\n",
+		       devfn);
+	} else
+		slptptr = VTD_FETCH_VIRT_ADDR(ce->slptptr);
+
+	while (size) {
+		/* TODO: currently we only map 4K pages (level = 1) */
+		printf("map 4K page IOVA 0x%lx to 0x%lx (sid=0x%04x)\n",
+		       iova, pa, sid);
+		vtd_install_pte(slptptr, iova, pa, 1);
+		size -= PAGE_SIZE;
+		iova += PAGE_SIZE;
+		pa += PAGE_SIZE;
+	}
+}
+
 void vtd_init(void)
 {
 	setup_vm();
diff --git a/lib/x86/intel-iommu.h b/lib/x86/intel-iommu.h
index 537ce62..40816bb 100644
--- a/lib/x86/intel-iommu.h
+++ b/lib/x86/intel-iommu.h
@@ -20,6 +20,7 @@
 #include "isr.h"
 #include "smp.h"
 #include "desc.h"
+#include "pci.h"
 
 #define Q35_HOST_BRIDGE_IOMMU_ADDR  0xfed90000ULL
 
@@ -90,6 +91,27 @@
 #define VTD_GCMD_ONE_SHOT_BITS  (VTD_GCMD_IR_TABLE | VTD_GCMD_WBF | \
 				 VTD_GCMD_SFL | VTD_GCMD_ROOT)
 
+/* Supported Adjusted Guest Address Widths */
+#define VTD_CAP_SAGAW_SHIFT         8
+ /* 39-bit AGAW, 3-level page-table */
+#define VTD_CAP_SAGAW_39bit         (0x2ULL << VTD_CAP_SAGAW_SHIFT)
+ /* 48-bit AGAW, 4-level page-table */
+#define VTD_CAP_SAGAW_48bit         (0x4ULL << VTD_CAP_SAGAW_SHIFT)
+#define VTD_CAP_SAGAW               VTD_CAP_SAGAW_39bit
+
+/* Both 1G/2M huge pages */
+#define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
+
+#define VTD_CONTEXT_TT_MULTI_LEVEL  0
+#define VTD_CONTEXT_TT_DEV_IOTLB    1
+#define VTD_CONTEXT_TT_PASS_THROUGH 2
+
+#define VTD_PTE_R                   (1 << 0)
+#define VTD_PTE_W                   (1 << 1)
+#define VTD_PTE_RW                  (VTD_PTE_R | VTD_PTE_W)
+#define VTD_PTE_ADDR                GENMASK_ULL(51, 12)
+#define VTD_PTE_HUGE                (1 << 7)
+
 static inline void vtd_writel(unsigned int reg, uint32_t value)
 {
 	*(uint32_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg) = value;
@@ -111,5 +133,6 @@ static inline uint64_t vtd_readq(unsigned int reg)
 }
 
 void vtd_init(void);
+void vtd_map_range(uint16_t sid, phys_addr_t iova, phys_addr_t pa, size_t size);
 
 #endif
diff --git a/x86/Makefile.common b/x86/Makefile.common
index 356d879..1dad18b 100644
--- a/x86/Makefile.common
+++ b/x86/Makefile.common
@@ -3,6 +3,7 @@
 all: test_cases
 
 cflatobjs += lib/pci.o
+cflatobjs += lib/pci-edu.o
 cflatobjs += lib/x86/io.o
 cflatobjs += lib/x86/smp.o
 cflatobjs += lib/x86/vm.o
diff --git a/x86/intel-iommu.c b/x86/intel-iommu.c
index f247913..26b39d1 100644
--- a/x86/intel-iommu.c
+++ b/x86/intel-iommu.c
@@ -11,9 +11,50 @@
  */
 
 #include "intel-iommu.h"
+#include "pci-edu.h"
+
+#define VTD_TEST_DMAR_4B ("DMAR 4B memcpy test")
+
+void vtd_test_dmar(struct pci_edu_dev *dev)
+{
+	void *page = alloc_page();
+
+#define DMA_TEST_WORD (0x12345678)
+	/* Modify the first 4 bytes of the page */
+	*(uint32_t *)page = DMA_TEST_WORD;
+
+	/*
+	 * Map the newly allocated page into IOVA address 0 (size 4K)
+	 * of the device address space. Root entry and context entry
+	 * will be automatically created when needed.
+	 */
+	vtd_map_range(dev->pci_dev.pci_bdf, 0, virt_to_phys(page), PAGE_SIZE);
+
+	/*
+	 * DMA the first 4 bytes of the page to EDU device buffer
+	 * offset 0.
+	 */
+	edu_dma(dev, 0, 4, 0, false);
+	/*
+	 * DMA the first 4 bytes of EDU device buffer into the page
+	 * with offset 4 (so it'll be using 4-7 bytes).
+	 */
+	edu_dma(dev, 4, 4, 0, true);
+
+	/*
+	 * Check data match between 0-3 bytes and 4-7 bytes of the
+	 * page.
+	 */
+	report(VTD_TEST_DMAR_4B, *((uint32_t *)page + 1) == DMA_TEST_WORD);
+
+	free_page(page);
+}
 
 int main(int argc, char *argv[])
 {
+	int ret;
+	struct pci_edu_dev dev;
+
 	vtd_init();
 
 	report("fault status check", vtd_readl(DMAR_FSTS_REG) == 0);
@@ -22,6 +63,17 @@ int main(int argc, char *argv[])
 	report("IR table setup", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_IR_TABLE);
 	report("DMAR enablement", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_DMAR);
 	report("IR enablement", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_IR);
+	report("DMAR support 39 bits address width",
+	       vtd_readq(DMAR_CAP_REG) & VTD_CAP_SAGAW);
+	report("DMAR support huge pages", vtd_readq(DMAR_CAP_REG) & VTD_CAP_SLLPS);
+
+	ret = edu_init(&dev);
+	if (ret) {
+		printf("Please specify \"-device edu\" to do "
+		       "further IOMMU tests.\n");
+		report_skip(VTD_TEST_DMAR_4B);
+	} else
+		vtd_test_dmar(&dev);
 
 	return report_summary();
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 15/17] pci: add msi support for 32/64bit address
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
                   ` (13 preceding siblings ...)
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 14/17] x86: intel-iommu: add dmar test Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 17:33   ` Andrew Jones
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 16/17] x86: intel-iommu: add IR MSI test Peter Xu
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 17/17] x86/unittests: add intel-iommu test Peter Xu
  16 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

pci_cap_walk() is provided to allow walk through all the capability bits
for a specific PCI device. If a cap handler is provided, it'll be
triggered if the cap is detected along the cap walk.

MSI cap handler is the first one supported. We can add more cap handler
in the future.

Meanwhile, pci_setup_msi() API is provided to support basic 32/64 bit
address MSI setup.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/pci.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/pci.h |  3 +++
 2 files changed, 71 insertions(+)

diff --git a/lib/pci.c b/lib/pci.c
index d78472f..8d9a455 100644
--- a/lib/pci.c
+++ b/lib/pci.c
@@ -7,6 +7,69 @@
 #include "pci.h"
 #include "asm/pci.h"
 
+typedef void (*pci_cap_handler)(struct pci_dev *dev, int cap_offset);
+
+static void pci_cap_msi_handler(struct pci_dev *dev, int cap_offset)
+{
+	printf("Detected MSI for device 0x%x offset 0x%x\n",
+	       dev->pci_bdf, cap_offset);
+	dev->msi_offset = cap_offset;
+}
+
+static pci_cap_handler cap_handlers[0xff] = {
+	[PCI_CAP_ID_MSI] = pci_cap_msi_handler,
+};
+
+void pci_cap_walk(struct pci_dev *dev)
+{
+	uint8_t cap_offset;
+	uint8_t cap_id;
+
+	cap_offset = pci_config_readb(dev->pci_bdf, PCI_CAPABILITY_LIST);
+	while (cap_offset) {
+		cap_id = pci_config_readb(dev->pci_bdf, cap_offset);
+		printf("PCI detected cap 0x%x\n", cap_id);
+		if (cap_handlers[cap_id])
+			cap_handlers[cap_id](dev, cap_offset);
+		cap_offset = pci_config_readb(dev->pci_bdf, cap_offset + 1);
+	}
+}
+
+int pci_setup_msi(struct pci_dev *dev, uint64_t msi_addr, uint32_t msi_data)
+{
+	uint16_t msi_control;
+	uint16_t offset;
+	pcidevaddr_t addr = dev->pci_bdf;
+
+	assert(dev);
+
+	if (!dev->msi_offset) {
+		printf("MSI: dev 0x%x does not support MSI.\n", addr);
+		return -1;
+	}
+
+	offset = dev->msi_offset;
+	msi_control = pci_config_readw(addr, offset + PCI_MSI_FLAGS);
+	pci_config_writel(addr, offset + PCI_MSI_ADDRESS_LO,
+			  msi_addr & 0xffffffff);
+
+	if (msi_control & PCI_MSI_FLAGS_64BIT) {
+		pci_config_writel(addr, offset + PCI_MSI_ADDRESS_HI,
+				  (uint32_t)(msi_addr >> 32));
+		pci_config_writel(addr, offset + PCI_MSI_DATA_64, msi_data);
+		printf("MSI: dev 0x%x init 64bit address: ", addr);
+	} else {
+		pci_config_writel(addr, offset + PCI_MSI_DATA_32, msi_data);
+		printf("MSI: dev 0x%x init 32bit address: ", addr);
+	}
+	printf("addr=0x%lx, data=0x%x\n", msi_addr, msi_data);
+
+	msi_control |= PCI_MSI_FLAGS_ENABLE;
+	pci_config_writew(addr, offset + PCI_MSI_FLAGS, msi_control);
+
+	return 0;
+}
+
 void pci_set_master(struct pci_dev *dev, int master)
 {
 	uint32_t val = pci_config_readw(dev->pci_bdf, PCI_COMMAND);
@@ -20,6 +83,10 @@ bool pci_dev_exists(pcidevaddr_t dev)
 		pci_config_readw(dev, PCI_DEVICE_ID) != 0xffff);
 }
 
+/*
+ * Scan bus look for a specific device. Only bus 0 scanned for now.
+ * After the scan, a pci_dev is returned with correct BAR information.
+ */
 void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf)
 {
        memset(dev, 0, sizeof(*dev));
@@ -60,6 +127,7 @@ int pci_enable_defaults(struct pci_dev *dev)
 {
 	pci_scan_bars(dev);
 	pci_set_master(dev, 1);
+	pci_cap_walk(dev);
 	return 0;
 }
 
diff --git a/lib/pci.h b/lib/pci.h
index bc6b91a..a0331ac 100644
--- a/lib/pci.h
+++ b/lib/pci.h
@@ -23,13 +23,16 @@ enum {
 
 struct pci_dev {
 	uint16_t pci_bdf;
+	uint16_t msi_offset;
 	phys_addr_t pci_bar[PCI_BAR_NUM];
 };
 
 void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf);
 void pci_scan_bars(struct pci_dev *dev);
 void pci_set_master(struct pci_dev *dev, int master);
+void pci_cap_walk(struct pci_dev *dev);
 int pci_enable_defaults(struct pci_dev *dev);
+int pci_setup_msi(struct pci_dev *dev, uint64_t msi_addr, uint32_t msi_data);
 
 typedef phys_addr_t iova_t;
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 16/17] x86: intel-iommu: add IR MSI test
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
                   ` (14 preceding siblings ...)
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 15/17] pci: add msi support for 32/64bit address Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 17:40   ` Andrew Jones
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 17/17] x86/unittests: add intel-iommu test Peter Xu
  16 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

First of all, vtd_setup_msi() is provided. It setup IRTE entries,
meanwhile, setup PCI device MSI vectors corresponding to VT-d spec.

Meanwhile, IR MSI test is added to intel IOMMU unit test. The basic IR
test is carried out by a edu INTR raise request. When write to the intr
raise register, interrupt will be generated. Type of interrupt will
depend on the setup (either INTx or MSI).

Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 lib/pci-edu.h         |  1 +
 lib/x86/intel-iommu.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/x86/intel-iommu.h |  1 +
 x86/intel-iommu.c     | 44 +++++++++++++++++++++++-
 4 files changed, 139 insertions(+), 1 deletion(-)

diff --git a/lib/pci-edu.h b/lib/pci-edu.h
index f98d2d1..ea46281 100644
--- a/lib/pci-edu.h
+++ b/lib/pci-edu.h
@@ -31,6 +31,7 @@
 #define EDU_REG_ALIVE               (0x4)
 #define EDU_REG_FACTORIAL           (0x8)
 #define EDU_REG_STATUS              (0x20)
+#define EDU_REG_INTR_RAISE          (0x60)
 #define EDU_REG_DMA_SRC             (0x80)
 #define EDU_REG_DMA_DST             (0x88)
 #define EDU_REG_DMA_COUNT           (0x90)
diff --git a/lib/x86/intel-iommu.c b/lib/x86/intel-iommu.c
index b680453..dd32887 100644
--- a/lib/x86/intel-iommu.c
+++ b/lib/x86/intel-iommu.c
@@ -12,6 +12,7 @@
 
 #include "intel-iommu.h"
 #include "libcflat.h"
+#include "pci.h"
 
 /*
  * VT-d in QEMU currently only support 39 bits address width, which is
@@ -48,6 +49,26 @@ struct vtd_context_entry {
 } __attribute__ ((packed));
 typedef struct vtd_context_entry vtd_ce_t;
 
+struct vtd_irte {
+	uint32_t present:1;
+	uint32_t fault_disable:1;    /* Fault Processing Disable */
+	uint32_t dest_mode:1;        /* Destination Mode */
+	uint32_t redir_hint:1;       /* Redirection Hint */
+	uint32_t trigger_mode:1;     /* Trigger Mode */
+	uint32_t delivery_mode:3;    /* Delivery Mode */
+	uint32_t __avail:4;          /* Available spaces for software */
+	uint32_t __reserved_0:3;     /* Reserved 0 */
+	uint32_t irte_mode:1;        /* IRTE Mode */
+	uint32_t vector:8;           /* Interrupt Vector */
+	uint32_t __reserved_1:8;     /* Reserved 1 */
+	uint32_t dest_id;            /* Destination ID */
+	uint16_t source_id:16;       /* Source-ID */
+	uint64_t sid_q:2;            /* Source-ID Qualifier */
+	uint64_t sid_vtype:2;        /* Source-ID Validation Type */
+	uint64_t __reserved_2:44;    /* Reserved 2 */
+} __attribute__ ((packed));
+typedef struct vtd_irte vtd_irte_t;
+
 #define ONE_BIT_ONLY(x) ((x) && !((x) & ((x) - 1)))
 
 #define VTD_RTA_MASK  (PAGE_MASK)
@@ -205,6 +226,79 @@ void vtd_map_range(uint16_t sid, iova_t iova, phys_addr_t pa, size_t size)
 	}
 }
 
+static uint16_t vtd_intr_index_alloc(void)
+{
+	static int index_ctr = 0;
+	assert(index_ctr < 65535);
+	return index_ctr++;
+}
+
+static void vtd_setup_irte(struct pci_dev *dev, vtd_irte_t *irte,
+			   int vector, int dest_id)
+{
+	assert(sizeof(vtd_irte_t) == 16);
+	memset(irte, 0, sizeof(*irte));
+	irte->fault_disable = 1;
+	irte->dest_mode = 0;	 /* physical */
+	irte->trigger_mode = 0;	 /* edge */
+	irte->delivery_mode = 0; /* fixed */
+	irte->irte_mode = 0;	 /* remapped */
+	irte->vector = vector;
+	irte->dest_id = dest_id;
+	irte->source_id = dev->pci_bdf;
+	irte->sid_q = 0;
+	irte->sid_vtype = 1;     /* full-sid verify */
+	irte->present = 1;
+}
+
+struct vtd_msi_addr {
+	uint32_t __dont_care:2;
+	uint32_t handle_15:1;	 /* handle[15] */
+	uint32_t shv:1;
+	uint32_t interrupt_format:1;
+	uint32_t handle_0_14:15; /* handle[0:14] */
+	uint32_t head:12;	 /* 0xfee */
+	uint32_t addr_hi;	 /* not used except with x2apic */
+} __attribute__ ((packed));
+typedef struct vtd_msi_addr vtd_msi_addr_t;
+
+struct vtd_msi_data {
+	uint16_t __reserved;
+	uint16_t subhandle;
+} __attribute__ ((packed));
+typedef struct vtd_msi_data vtd_msi_data_t;
+
+/**
+ * vtd_setup_msi - setup MSI message for a device
+ *
+ * @dev: PCI device to setup MSI
+ * @vector: interrupt vector
+ * @dest_id: destination processor
+ */
+int vtd_setup_msi(struct pci_dev *dev, int vector, int dest_id)
+{
+	vtd_msi_data_t msi_data = {};
+	vtd_msi_addr_t msi_addr = {};
+	vtd_irte_t *irte = phys_to_virt(vtd_ir_table());
+	uint16_t index = vtd_intr_index_alloc();
+
+	assert(sizeof(vtd_msi_addr_t) == 8);
+	assert(sizeof(vtd_msi_data_t) == 4);
+
+	printf("INTR: setup IRTE index %d\n", index);
+	vtd_setup_irte(dev, irte + index, vector, dest_id);
+
+	msi_addr.handle_15 = index >> 15 & 1;
+	msi_addr.shv = 0;
+	msi_addr.interrupt_format = 1;
+	msi_addr.handle_0_14 = index & 0x7fff;
+	msi_addr.head = 0xfee;
+	msi_data.subhandle = 0;
+
+	return pci_setup_msi(dev, *(uint64_t *)&msi_addr,
+			     *(uint32_t *)&msi_data);
+}
+
 void vtd_init(void)
 {
 	setup_vm();
diff --git a/lib/x86/intel-iommu.h b/lib/x86/intel-iommu.h
index 40816bb..6ce4263 100644
--- a/lib/x86/intel-iommu.h
+++ b/lib/x86/intel-iommu.h
@@ -134,5 +134,6 @@ static inline uint64_t vtd_readq(unsigned int reg)
 
 void vtd_init(void);
 void vtd_map_range(uint16_t sid, phys_addr_t iova, phys_addr_t pa, size_t size);
+int vtd_setup_msi(struct pci_dev *dev, int vector, int dest_id);
 
 #endif
diff --git a/x86/intel-iommu.c b/x86/intel-iommu.c
index 26b39d1..8f28736 100644
--- a/x86/intel-iommu.c
+++ b/x86/intel-iommu.c
@@ -12,8 +12,10 @@
 
 #include "intel-iommu.h"
 #include "pci-edu.h"
+#include "x86/apic.h"
 
 #define VTD_TEST_DMAR_4B ("DMAR 4B memcpy test")
+#define VTD_TEST_IR_MSI ("IR MSI")
 
 void vtd_test_dmar(struct pci_edu_dev *dev)
 {
@@ -50,6 +52,43 @@ void vtd_test_dmar(struct pci_edu_dev *dev)
 	free_page(page);
 }
 
+static volatile bool edu_intr_recved;
+
+static void edu_isr(isr_regs_t *regs)
+{
+	edu_intr_recved = true;
+	eoi();
+}
+
+static void vtd_test_ir(struct pci_edu_dev *dev)
+{
+#define VTD_TEST_VECTOR (0xee)
+	/*
+	 * Setup EDU PCI device MSI, using interrupt remapping. By
+	 * default, EDU device is using INTx.
+	 */
+	if (vtd_setup_msi(&dev->pci_dev, VTD_TEST_VECTOR, 0)) {
+		printf("edu device does not support MSI, skip test\n");
+		report_skip(VTD_TEST_IR_MSI);
+		return;
+	}
+
+	handle_irq(VTD_TEST_VECTOR, edu_isr);
+	irq_enable();
+
+	/* Manually trigger INTR */
+	edu_reg_write(dev, EDU_REG_INTR_RAISE, 1);
+
+	while (!edu_intr_recved)
+		cpu_relax();
+
+	/* Clear INTR bits */
+	edu_reg_write(dev, EDU_REG_INTR_RAISE, 0);
+
+	/* We are good as long as we reach here */
+	report(VTD_TEST_IR_MSI, true);
+}
+
 int main(int argc, char *argv[])
 {
 	int ret;
@@ -72,8 +111,11 @@ int main(int argc, char *argv[])
 		printf("Please specify \"-device edu\" to do "
 		       "further IOMMU tests.\n");
 		report_skip(VTD_TEST_DMAR_4B);
-	} else
+		report_skip(VTD_TEST_IR_MSI);
+	} else {
 		vtd_test_dmar(&dev);
+		vtd_test_ir(&dev);
+	}
 
 	return report_summary();
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH kvm-unit-tests 17/17] x86/unittests: add intel-iommu test
  2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
                   ` (15 preceding siblings ...)
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 16/17] x86: intel-iommu: add IR MSI test Peter Xu
@ 2016-10-26  7:47 ` Peter Xu
  2016-11-04 17:46   ` Andrew Jones
  16 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-10-26  7:47 UTC (permalink / raw)
  To: kvm; +Cc: drjones, rkrcmar, peterx, agordeev, jan.kiszka, pbonzini

I used pc-q35-2.7 instead of default q35 since I'd better make sure both
QEMU 2.7/2.8 can run the test, which is to at least satisfy the
following matrix:

  |----------+-----------+------------|
  | versions | KVM < 4.7 | KVM >= 4.7 |
  |----------+-----------+------------|
  | QEMU 2.7 |           |            |
  | QEMU 2.8 |           |            |
  |----------+-----------+------------|

If using default q35, QEMU2.8+KVM<4.7 will fail to run.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 x86/unittests.cfg | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index f760d8f..d59cf68 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -206,3 +206,10 @@ extra_params = -cpu kvm64,hv_time,hv_synic,hv_stimer -device hyperv-testdev
 file = hyperv_clock.flat
 smp = 2
 extra_params = -cpu kvm64,hv_time
+
+[intel_iommu]
+file = intel-iommu.flat
+arch = x86_64
+timeout = 30
+smp = 4
+extra_params = -M pc-q35-2.7,kernel-irqchip=split -device intel-iommu,intremap=on -device edu
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 01/17] x86: intel-iommu: add vt-d init test
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 01/17] x86: intel-iommu: add vt-d init test Peter Xu
@ 2016-11-04 16:12   ` Andrew Jones
  2016-11-07 16:32     ` Peter Xu
  2016-11-08 10:52   ` Alexander Gordeev
  1 sibling, 1 reply; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 16:12 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:04PM +0800, Peter Xu wrote:
> Adding fundamental init test for Intel IOMMU. This includes basic
> initialization of Intel IOMMU device, like DMAR (DMA Remapping),
> IR (Interrupt Remapping), QI (Queue Invalidation), etc.
> 
> Further tests can use vtd_init() to initialize Intel IOMMU environment.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/x86/intel-iommu.c |  88 ++++++++++++++++++++++++++++++++++++++
>  lib/x86/intel-iommu.h | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  x86/Makefile.x86_64   |   2 +
>  x86/intel-iommu.c     |  27 ++++++++++++
>  4 files changed, 232 insertions(+)
>  create mode 100644 lib/x86/intel-iommu.c
>  create mode 100644 lib/x86/intel-iommu.h
>  create mode 100644 x86/intel-iommu.c
> 
> diff --git a/lib/x86/intel-iommu.c b/lib/x86/intel-iommu.c
> new file mode 100644
> index 0000000..02eaf39
> --- /dev/null
> +++ b/lib/x86/intel-iommu.c
> @@ -0,0 +1,88 @@
> +/*
> + * Intel IOMMU APIs
> + *
> + * Copyright (C) 2016 Red Hat, Inc.
> + *
> + * Authors:
> + *   Peter Xu <peterx@redhat.com>,
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or
> + * later.
> + */
> +
> +#include "intel-iommu.h"
> +
> +#define ONE_BIT_ONLY(x) ((x) && !((x) & ((x) - 1)))

How about instead adding

 static inline bool is_power_of_2(unsigned long n)
 {
     return (n && !(n & (n - 1)));
 }

to libcflat.h, because there's a few other places I've needed
this check myself.

> +
> +#define VTD_RTA_MASK  (PAGE_MASK)
> +#define VTD_IRTA_MASK (PAGE_MASK)
> +
> +static uint64_t vtd_root_table(void)
> +{
> +       /* No extend root table support yet */
> +       return vtd_readq(DMAR_RTADDR_REG) & VTD_RTA_MASK;
> +}
> +
> +static uint64_t vtd_ir_table(void)
> +{
> +	return vtd_readq(DMAR_IRTA_REG) & VTD_IRTA_MASK;
> +}
> +
> +static void vtd_gcmd_or(uint32_t cmd)
> +{
> +	uint32_t status;
> +
> +	/* We only allow set one bit for each time */
> +	assert(ONE_BIT_ONLY(cmd));
> +
> +	status = vtd_readl(DMAR_GSTS_REG);
> +	vtd_writel(DMAR_GCMD_REG, status | cmd);
> +
> +	if (cmd & VTD_GCMD_ONE_SHOT_BITS)
> +		/* One-shot bits are taking effect immediately */
> +		return;

Please use {} when adding comments between an if and its one line.

> +
> +	/* Make sure IOMMU handled our command request */
> +	while (!(vtd_readl(DMAR_GSTS_REG) & cmd));
> +}
> +
> +static void vtd_dump_init_info(void)
> +{
> +	printf("VT-d version:   0x%x\n", vtd_readl(DMAR_VER_REG));
> +	printf("     cap:       0x%016lx\n", vtd_readq(DMAR_CAP_REG));
> +	printf("     ecap:      0x%016lx\n", vtd_readq(DMAR_ECAP_REG));
> +}
> +
> +static void vtd_setup_root_table(void)
> +{
> +	void *root = alloc_page();
> +
> +	memset(root, 0, PAGE_SIZE);
> +	vtd_writeq(DMAR_RTADDR_REG, virt_to_phys(root));
> +	vtd_gcmd_or(VTD_GCMD_ROOT);
> +	printf("DMAR table address: 0x%016lx\n", vtd_root_table());
> +}
> +
> +static void vtd_setup_ir_table(void)
> +{
> +	void *root = alloc_page();
> +
> +	memset(root, 0, PAGE_SIZE);
> +	/* 0xf stands for table size (2^(0xf+1) == 65536) */
> +	vtd_writeq(DMAR_IRTA_REG, virt_to_phys(root) | 0xf);
> +	vtd_gcmd_or(VTD_GCMD_IR_TABLE);
> +	printf("IR table address: 0x%016lx\n", vtd_ir_table());
> +}
> +
> +void vtd_init(void)
> +{
> +	setup_vm();
> +	smp_init();
> +
> +	vtd_dump_init_info();
> +	vtd_gcmd_or(VTD_GCMD_QI); /* Enable QI */
> +	vtd_setup_root_table();
> +	vtd_setup_ir_table();
> +	vtd_gcmd_or(VTD_GCMD_DMAR); /* Enable DMAR */
> +	vtd_gcmd_or(VTD_GCMD_IR);   /* Enable IR */
> +}
> diff --git a/lib/x86/intel-iommu.h b/lib/x86/intel-iommu.h
> new file mode 100644
> index 0000000..537ce62
> --- /dev/null
> +++ b/lib/x86/intel-iommu.h
> @@ -0,0 +1,115 @@
> +/*
> + * Intel IOMMU header
> + *
> + * Copyright (C) 2016 Red Hat, Inc.
> + *
> + * Authors:
> + *   Peter Xu <peterx@redhat.com>,
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or
> + * later.
> + *
> + * (From include/linux/intel-iommu.h)
> + */
> +
> +#ifndef __INTEL_IOMMU_H__
> +#define __INTEL_IOMMU_H__
> +
> +#include "libcflat.h"
> +#include "vm.h"
> +#include "isr.h"
> +#include "smp.h"
> +#include "desc.h"
> +
> +#define Q35_HOST_BRIDGE_IOMMU_ADDR  0xfed90000ULL
> +
> +/*
> + * Intel IOMMU register specification
> + */
> +#define DMAR_VER_REG            0x0  /* Arch version supported by this IOMMU */
> +#define DMAR_CAP_REG            0x8  /* Hardware supported capabilities */
> +#define DMAR_CAP_REG_HI         0xc  /* High 32-bit of DMAR_CAP_REG */
> +#define DMAR_ECAP_REG           0x10 /* Extended capabilities supported */
> +#define DMAR_ECAP_REG_HI        0X14
> +#define DMAR_GCMD_REG           0x18 /* Global command */
> +#define DMAR_GSTS_REG           0x1c /* Global status */
> +#define DMAR_RTADDR_REG         0x20 /* Root entry table */
> +#define DMAR_RTADDR_REG_HI      0X24
> +#define DMAR_CCMD_REG           0x28 /* Context command */
> +#define DMAR_CCMD_REG_HI        0x2c
> +#define DMAR_FSTS_REG           0x34 /* Fault status */
> +#define DMAR_FECTL_REG          0x38 /* Fault control */
> +#define DMAR_FEDATA_REG         0x3c /* Fault event interrupt data */
> +#define DMAR_FEADDR_REG         0x40 /* Fault event interrupt addr */
> +#define DMAR_FEUADDR_REG        0x44 /* Upper address */
> +#define DMAR_AFLOG_REG          0x58 /* Advanced fault control */
> +#define DMAR_AFLOG_REG_HI       0X5c
> +#define DMAR_PMEN_REG           0x64 /* Enable protected memory region */
> +#define DMAR_PLMBASE_REG        0x68 /* PMRR low addr */
> +#define DMAR_PLMLIMIT_REG       0x6c /* PMRR low limit */
> +#define DMAR_PHMBASE_REG        0x70 /* PMRR high base addr */
> +#define DMAR_PHMBASE_REG_HI     0X74
> +#define DMAR_PHMLIMIT_REG       0x78 /* PMRR high limit */
> +#define DMAR_PHMLIMIT_REG_HI    0x7c
> +#define DMAR_IQH_REG            0x80 /* Invalidation queue head */
> +#define DMAR_IQH_REG_HI         0X84
> +#define DMAR_IQT_REG            0x88 /* Invalidation queue tail */
> +#define DMAR_IQT_REG_HI         0X8c
> +#define DMAR_IQA_REG            0x90 /* Invalidation queue addr */
> +#define DMAR_IQA_REG_HI         0x94
> +#define DMAR_ICS_REG            0x9c /* Invalidation complete status */
> +#define DMAR_IRTA_REG           0xb8 /* Interrupt remapping table addr */
> +#define DMAR_IRTA_REG_HI        0xbc
> +#define DMAR_IECTL_REG          0xa0 /* Invalidation event control */
> +#define DMAR_IEDATA_REG         0xa4 /* Invalidation event data */
> +#define DMAR_IEADDR_REG         0xa8 /* Invalidation event address */
> +#define DMAR_IEUADDR_REG        0xac /* Invalidation event address */
> +#define DMAR_PQH_REG            0xc0 /* Page request queue head */
> +#define DMAR_PQH_REG_HI         0xc4
> +#define DMAR_PQT_REG            0xc8 /* Page request queue tail*/
> +#define DMAR_PQT_REG_HI         0xcc
> +#define DMAR_PQA_REG            0xd0 /* Page request queue address */
> +#define DMAR_PQA_REG_HI         0xd4
> +#define DMAR_PRS_REG            0xdc /* Page request status */
> +#define DMAR_PECTL_REG          0xe0 /* Page request event control */
> +#define DMAR_PEDATA_REG         0xe4 /* Page request event data */
> +#define DMAR_PEADDR_REG         0xe8 /* Page request event address */
> +#define DMAR_PEUADDR_REG        0xec /* Page event upper address */
> +#define DMAR_MTRRCAP_REG        0x100 /* MTRR capability */
> +#define DMAR_MTRRCAP_REG_HI     0x104
> +#define DMAR_MTRRDEF_REG        0x108 /* MTRR default type */
> +#define DMAR_MTRRDEF_REG_HI     0x10c
> +
> +#define VTD_GCMD_IR_TABLE       (0x1000000)
> +#define VTD_GCMD_IR             (0x2000000)
> +#define VTD_GCMD_QI             (0x4000000)
> +#define VTD_GCMD_WBF            (0x8000000)  /* Write Buffer Flush */
> +#define VTD_GCMD_SFL            (0x20000000) /* Set Fault Log */
> +#define VTD_GCMD_ROOT           (0x40000000)
> +#define VTD_GCMD_DMAR           (0x80000000)
> +#define VTD_GCMD_ONE_SHOT_BITS  (VTD_GCMD_IR_TABLE | VTD_GCMD_WBF | \
> +				 VTD_GCMD_SFL | VTD_GCMD_ROOT)
> +
> +static inline void vtd_writel(unsigned int reg, uint32_t value)
> +{
> +	*(uint32_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg) = value;
> +}
> +
> +static inline void vtd_writeq(unsigned int reg, uint64_t value)
> +{
> +	*(uint64_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg) = value;
> +}
> +
> +static inline uint32_t vtd_readl(unsigned int reg)
> +{
> +	return *(uint32_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg);
> +}
> +
> +static inline uint64_t vtd_readq(unsigned int reg)
> +{
> +	return *(uint64_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg);
> +}
> +
> +void vtd_init(void);
> +
> +#endif
> diff --git a/x86/Makefile.x86_64 b/x86/Makefile.x86_64
> index f82492b..3e2821e 100644
> --- a/x86/Makefile.x86_64
> +++ b/x86/Makefile.x86_64
> @@ -4,6 +4,7 @@ ldarch = elf64-x86-64
>  CFLAGS += -mno-red-zone
>  
>  cflatobjs += lib/x86/setjmp64.o
> +cflatobjs += lib/x86/intel-iommu.o
>  
>  tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
>  	  $(TEST_DIR)/emulator.flat $(TEST_DIR)/idt_test.flat \
> @@ -14,6 +15,7 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
>  tests += $(TEST_DIR)/svm.flat
>  tests += $(TEST_DIR)/vmx.flat
>  tests += $(TEST_DIR)/tscdeadline_latency.flat
> +tests += $(TEST_DIR)/intel-iommu.flat
>  
>  include $(TEST_DIR)/Makefile.common
>  
> diff --git a/x86/intel-iommu.c b/x86/intel-iommu.c
> new file mode 100644
> index 0000000..f247913
> --- /dev/null
> +++ b/x86/intel-iommu.c
> @@ -0,0 +1,27 @@
> +/*
> + * Intel IOMMU unit test.
> + *
> + * Copyright (C) 2016 Red Hat, Inc.
> + *
> + * Authors:
> + *   Peter Xu <peterx@redhat.com>,
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or
> + * later.
> + */
> +
> +#include "intel-iommu.h"
> +
> +int main(int argc, char *argv[])
> +{
> +	vtd_init();
> +
> +	report("fault status check", vtd_readl(DMAR_FSTS_REG) == 0);
> +	report("QI enablement", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_QI);
> +	report("DMAR table setup", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_ROOT);
> +	report("IR table setup", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_IR_TABLE);
> +	report("DMAR enablement", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_DMAR);
> +	report("IR enablement", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_IR);
> +
> +	return report_summary();
> +}
> -- 
> 2.7.4
>

Looks good to me, framework-wise (I didn't open the spec...)

Thanks,
drew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 02/17] libcflat: add IS_ALIGNED() macro, and page sizes
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 02/17] libcflat: add IS_ALIGNED() macro, and page sizes Peter Xu
@ 2016-11-04 16:14   ` Andrew Jones
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 16:14 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:05PM +0800, Peter Xu wrote:
> These macros will be useful to do page alignment checks.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/libcflat.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/lib/libcflat.h b/lib/libcflat.h
> index 72b1bf9..a80df04 100644
> --- a/lib/libcflat.h
> +++ b/lib/libcflat.h
> @@ -33,6 +33,11 @@
>  #define __ALIGN_MASK(x, mask)	(((x) + (mask)) & ~(mask))
>  #define __ALIGN(x, a)		__ALIGN_MASK(x, (typeof(x))(a) - 1)
>  #define ALIGN(x, a)		__ALIGN((x), (a))
> +#define IS_ALIGNED(x, a)	(((x) & ((typeof(x))(a) - 1)) == 0)
> +
> +#define SZ_4K			(0x1000)
> +#define SZ_2M			(0x200000)
> +#define SZ_1G			(0x40000000)
>  
>  typedef uint8_t		u8;
>  typedef int8_t		s8;
> -- 
> 2.7.4

Please add SZ_64K as well.

Otherwise

Reviewed-by: Andrew Jones <drjones@redhat.com>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 03/17] libcflat: moving MIN/MAX here
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 03/17] libcflat: moving MIN/MAX here Peter Xu
@ 2016-11-04 16:15   ` Andrew Jones
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 16:15 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:06PM +0800, Peter Xu wrote:
> That's something can be used outside alloc.c.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/alloc.c    | 3 ---
>  lib/libcflat.h | 3 +++
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/alloc.c b/lib/alloc.c
> index e1d7b8a..58af52b 100644
> --- a/lib/alloc.c
> +++ b/lib/alloc.c
> @@ -7,9 +7,6 @@
>  #include "asm/spinlock.h"
>  #include "asm/io.h"
>  
> -#define MIN(a, b)		((a) < (b) ? (a) : (b))
> -#define MAX(a, b)		((a) > (b) ? (a) : (b))
> -
>  #define PHYS_ALLOC_NR_REGIONS	256
>  
>  struct phys_alloc_region {
> diff --git a/lib/libcflat.h b/lib/libcflat.h
> index a80df04..f0308f6 100644
> --- a/lib/libcflat.h
> +++ b/lib/libcflat.h
> @@ -39,6 +39,9 @@
>  #define SZ_2M			(0x200000)
>  #define SZ_1G			(0x40000000)
>  
> +#define MIN(a, b)		((a) < (b) ? (a) : (b))
> +#define MAX(a, b)		((a) > (b) ? (a) : (b))
> +
>  typedef uint8_t		u8;
>  typedef int8_t		s8;
>  typedef uint16_t	u16;
> -- 
> 2.7.4

Reviewed-by: Andrew Jones <drjones@redhat.com>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 05/17] x86/asm: add cpu_relax()
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 05/17] x86/asm: add cpu_relax() Peter Xu
@ 2016-11-04 16:18   ` Andrew Jones
  2016-11-07 16:40     ` Peter Xu
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 16:18 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:08PM +0800, Peter Xu wrote:
> This will be useful to be put inside loops.
> 
> Suggested-by: Andrew Jones <drjones@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/x86/asm/barrier.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/lib/x86/asm/barrier.h b/lib/x86/asm/barrier.h
> index 7c108bd..193fb4c 100644
> --- a/lib/x86/asm/barrier.h
> +++ b/lib/x86/asm/barrier.h
> @@ -13,4 +13,15 @@
>  #define smp_rmb()	barrier()
>  #define smp_wmb()	barrier()
>  
> +/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
> +static inline void rep_nop(void)
> +{
> +	asm volatile("rep; nop" ::: "memory");
> +}
> +
> +static inline void cpu_relax(void)
> +{
> +	rep_nop();
> +}
> +
>  #endif
> -- 
> 2.7.4

Reviewed-by: Andrew Jones <drjones@redhat.com>

But this should be the first patch of the series, because your current
first patch already introduces one busy-wait loop where it could be used.

drew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev Peter Xu
@ 2016-11-04 16:41   ` Andrew Jones
  2016-11-07 17:05     ` Peter Xu
  2016-11-08 12:27     ` Alexander Gordeev
  0 siblings, 2 replies; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 16:41 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:09PM +0800, Peter Xu wrote:
> To extend current PCI framework, we need a per-device struct to store
> device specific information. Time to have a pci_dev struct. Most of the
> current PCI APIs are converted to use this pci_dev object as the first
> argument. Currently it only contains one field "pci_bdf", which is the
> bdf of current device.
> 
> For a few APIs like pci_config_*() ops, I kept the old interface (use
> PCI BDF value rather than "struct pci_dev") since they can be used in a
> open context that without any specific PCI device.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/pci-host-generic.c | 13 ++++++----
>  lib/pci-testdev.c      | 13 +++++-----
>  lib/pci.c              | 64 ++++++++++++++++++++++++++++++--------------------
>  lib/pci.h              | 24 ++++++++++++-------
>  x86/vmexit.c           | 18 +++++++-------
>  5 files changed, 80 insertions(+), 52 deletions(-)
> 
> diff --git a/lib/pci-host-generic.c b/lib/pci-host-generic.c
> index 96f4b4b..be3f9e7 100644
> --- a/lib/pci-host-generic.c
> +++ b/lib/pci-host-generic.c
> @@ -194,6 +194,7 @@ static bool pci_alloc_resource(u64 *addr, u32 bar, u64 size)
>  
>  bool pci_probe(void)
>  {
> +	struct pci_dev pci_dev;
>  	pcidevaddr_t dev;
>  	u8 header;
>  	u32 cmd;
> @@ -208,6 +209,8 @@ bool pci_probe(void)
>  		if (!pci_dev_exists(dev))
>  			continue;
>  
> +		pci_dev_init(&pci_dev, dev);
> +
>  		/* We are only interested in normal PCI devices */
>  		header = pci_config_readb(dev, PCI_HEADER_TYPE);
>  		if ((header & PCI_HEADER_TYPE_MASK) != PCI_HEADER_TYPE_NORMAL)
> @@ -219,21 +222,21 @@ bool pci_probe(void)
>  			u64 addr, size;
>  			u32 bar;
>  
> -			size = pci_bar_size(dev, i);
> +			size = pci_bar_size(&pci_dev, i);
>  			if (!size)
>  				continue;
>  
> -			bar = pci_bar_get(dev, i);
> +			bar = pci_bar_get(&pci_dev, i);
>  			if (pci_alloc_resource(&addr, bar, size)) {
> -				pci_bar_set_addr(dev, i, addr);
> +				pci_bar_set_addr(&pci_dev, i, addr);
>  
> -				if (pci_bar_is_memory(dev, i))
> +				if (pci_bar_is_memory(&pci_dev, i))
>  					cmd |= PCI_COMMAND_MEMORY;
>  				else
>  					cmd |= PCI_COMMAND_IO;
>  			}
>  
> -			if (pci_bar_is64(dev, i))
> +			if (pci_bar_is64(&pci_dev, i))
>  				i++;
>  		}
>  
> diff --git a/lib/pci-testdev.c b/lib/pci-testdev.c
> index ad482d3..caa31bd 100644
> --- a/lib/pci-testdev.c
> +++ b/lib/pci-testdev.c
> @@ -163,26 +163,27 @@ static int pci_testdev_all(struct pci_test_dev_hdr *test,
>  
>  int pci_testdev(void)
>  {
> +	struct pci_dev pci_dev;
>  	phys_addr_t addr;
>  	void __iomem *mem, *io;
> -	pcidevaddr_t dev;
>  	int nr_tests = 0;
>  	bool ret;
>  
> -	dev = pci_find_dev(PCI_VENDOR_ID_REDHAT, PCI_DEVICE_ID_REDHAT_TEST);
> -	if (dev == PCIDEVADDR_INVALID) {
> +	ret = pci_find_dev(&pci_dev, PCI_VENDOR_ID_REDHAT,
> +			   PCI_DEVICE_ID_REDHAT_TEST);
> +	if (ret == -1) {
>  		printf("'pci-testdev' device is not found, "
>  		       "check QEMU '-device pci-testdev' parameter\n");
>  		return -1;
>  	}
>  
> -	ret = pci_bar_is_valid(dev, 0) && pci_bar_is_valid(dev, 1);
> +	ret = pci_bar_is_valid(&pci_dev, 0) && pci_bar_is_valid(&pci_dev, 1);
>  	assert(ret);
>  
> -	addr = pci_bar_get_addr(dev, 0);
> +	addr = pci_bar_get_addr(&pci_dev, 0);
>  	mem = ioremap(addr, PAGE_SIZE);
>  
> -	addr = pci_bar_get_addr(dev, 1);
> +	addr = pci_bar_get_addr(&pci_dev, 1);
>  	io = (void *)(unsigned long)addr;
>  
>  	nr_tests += pci_testdev_all(mem, &pci_testdev_mem_ops);
> diff --git a/lib/pci.c b/lib/pci.c
> index 2dbbba4..bb41e0f 100644
> --- a/lib/pci.c
> +++ b/lib/pci.c
> @@ -13,18 +13,26 @@ bool pci_dev_exists(pcidevaddr_t dev)
>  		pci_config_readw(dev, PCI_DEVICE_ID) != 0xffff);
>  }
>  
> +void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf)
> +{
> +       memset(dev, 0, sizeof(*dev));
> +       dev->pci_bdf = bdf;

Hmm, bdf means bus-device-function, correct? We're not passing a "bdf"
in as the second argument to this function though. We're only passing
devid, assuming bus=0, fn=0. If you'd prefer we stop the bus,fn zero
assumption, then I think a precursor patch to this should be to change
our current handle type, pcidevaddr_t, to a "bdf_t".

> +}
> +
>  /* Scan bus look for a specific device. Only bus 0 scanned for now. */
> -pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id)
> +int pci_find_dev(struct pci_dev *pci_dev, uint16_t vendor_id, uint16_t device_id)
>  {
>  	pcidevaddr_t dev;
>  
> -	for (dev = 0; dev < 256; ++dev) {
> +	for (dev = 0; dev < PCI_DEVFN_MAX; ++dev) {

Why introduce this PCI_DEVFN_MAX define?

>  		if (pci_config_readw(dev, PCI_VENDOR_ID) == vendor_id &&
> -		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id)
> -			return dev;
> +		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id) {
> +			pci_dev_init(pci_dev, dev);
> +			return 0;
> +		}
>  	}
>  
> -	return PCIDEVADDR_INVALID;
> +	return -1;

Why not use bool for the ret type; true=good, false=bad?

>  }
>  
>  uint32_t pci_bar_mask(uint32_t bar)
> @@ -33,12 +41,13 @@ uint32_t pci_bar_mask(uint32_t bar)
>  		PCI_BASE_ADDRESS_IO_MASK : PCI_BASE_ADDRESS_MEM_MASK;
>  }
>  
> -uint32_t pci_bar_get(pcidevaddr_t dev, int bar_num)
> +uint32_t pci_bar_get(struct pci_dev *dev, int bar_num)
>  {
> -	return pci_config_readl(dev, PCI_BASE_ADDRESS_0 + bar_num * 4);
> +	return pci_config_readl(dev->pci_bdf, PCI_BASE_ADDRESS_0 +
> +				bar_num * 4);
>  }
>  
> -phys_addr_t pci_bar_get_addr(pcidevaddr_t dev, int bar_num)
> +phys_addr_t pci_bar_get_addr(struct pci_dev *dev, int bar_num)
>  {
>  	uint32_t bar = pci_bar_get(dev, bar_num);
>  	uint32_t mask = pci_bar_mask(bar);
> @@ -47,17 +56,18 @@ phys_addr_t pci_bar_get_addr(pcidevaddr_t dev, int bar_num)
>  	if (pci_bar_is64(dev, bar_num))
>  		addr |= (uint64_t)pci_bar_get(dev, bar_num + 1) << 32;
>  
> -	return pci_translate_addr(dev, addr);
> +	return pci_translate_addr(dev->pci_bdf, addr);
>  }
>  
> -void pci_bar_set_addr(pcidevaddr_t dev, int bar_num, phys_addr_t addr)
> +void pci_bar_set_addr(struct pci_dev *dev, int bar_num, phys_addr_t addr)
>  {
>  	int off = PCI_BASE_ADDRESS_0 + bar_num * 4;
>  
> -	pci_config_writel(dev, off, (uint32_t)addr);
> +	pci_config_writel(dev->pci_bdf, off, (uint32_t)addr);
>  
>  	if (pci_bar_is64(dev, bar_num))
> -		pci_config_writel(dev, off + 4, (uint32_t)(addr >> 32));
> +		pci_config_writel(dev->pci_bdf, off + 4,
> +				  (uint32_t)(addr >> 32));
>  }
>  
>  /*
> @@ -70,20 +80,21 @@ void pci_bar_set_addr(pcidevaddr_t dev, int bar_num, phys_addr_t addr)
>   * The following pci_bar_size_helper() and pci_bar_size() functions
>   * implement the algorithm.
>   */
> -static uint32_t pci_bar_size_helper(pcidevaddr_t dev, int bar_num)
> +static uint32_t pci_bar_size_helper(struct pci_dev *dev, int bar_num)
>  {
>  	int off = PCI_BASE_ADDRESS_0 + bar_num * 4;
> +	uint16_t bdf = dev->pci_bdf;
>  	uint32_t bar, val;
>  
> -	bar = pci_config_readl(dev, off);
> -	pci_config_writel(dev, off, ~0u);
> -	val = pci_config_readl(dev, off);
> -	pci_config_writel(dev, off, bar);
> +	bar = pci_config_readl(bdf, off);
> +	pci_config_writel(bdf, off, ~0u);
> +	val = pci_config_readl(bdf, off);
> +	pci_config_writel(bdf, off, bar);
>  
>  	return val;
>  }
>  
> -phys_addr_t pci_bar_size(pcidevaddr_t dev, int bar_num)
> +phys_addr_t pci_bar_size(struct pci_dev *dev, int bar_num)
>  {
>  	uint32_t bar, size;
>  
> @@ -104,19 +115,19 @@ phys_addr_t pci_bar_size(pcidevaddr_t dev, int bar_num)
>  	}
>  }
>  
> -bool pci_bar_is_memory(pcidevaddr_t dev, int bar_num)
> +bool pci_bar_is_memory(struct pci_dev *dev, int bar_num)
>  {
>  	uint32_t bar = pci_bar_get(dev, bar_num);
>  
>  	return !(bar & PCI_BASE_ADDRESS_SPACE_IO);
>  }
>  
> -bool pci_bar_is_valid(pcidevaddr_t dev, int bar_num)
> +bool pci_bar_is_valid(struct pci_dev *dev, int bar_num)
>  {
>  	return pci_bar_get(dev, bar_num);
>  }
>  
> -bool pci_bar_is64(pcidevaddr_t dev, int bar_num)
> +bool pci_bar_is64(struct pci_dev *dev, int bar_num)
>  {
>  	uint32_t bar = pci_bar_get(dev, bar_num);
>  
> @@ -135,6 +146,7 @@ static void pci_dev_print(pcidevaddr_t dev)
>  	uint8_t progif = pci_config_readb(dev, PCI_CLASS_PROG);
>  	uint8_t subclass = pci_config_readb(dev, PCI_CLASS_DEVICE);
>  	uint8_t class = pci_config_readb(dev, PCI_CLASS_DEVICE + 1);
> +	struct pci_dev pci_dev;
>  	int i;
>  
>  	printf("dev %2d fn %d vendor_id %04x device_id %04x type %02x "
> @@ -145,18 +157,20 @@ static void pci_dev_print(pcidevaddr_t dev)
>  	if ((header & PCI_HEADER_TYPE_MASK) != PCI_HEADER_TYPE_NORMAL)
>  		return;
>  
> +	pci_dev_init(&pci_dev, dev);
> +
>  	for (i = 0; i < 6; i++) {
>  		phys_addr_t size, start, end;
>  		uint32_t bar;
>  
> -		size = pci_bar_size(dev, i);
> +		size = pci_bar_size(&pci_dev, i);
>  		if (!size)
>  			continue;
>  
> -		start = pci_bar_get_addr(dev, i);
> +		start = pci_bar_get_addr(&pci_dev, i);
>  		end = start + size - 1;
>  
> -		if (pci_bar_is64(dev, i)) {
> +		if (pci_bar_is64(&pci_dev, i)) {
>  			printf("\tBAR#%d,%d [%" PRIx64 "-%" PRIx64 " ",
>  			       i, i + 1, start, end);
>  			i++;
> @@ -165,7 +179,7 @@ static void pci_dev_print(pcidevaddr_t dev)
>  			       i, (uint32_t)start, (uint32_t)end);
>  		}
>  
> -		bar = pci_bar_get(dev, i);
> +		bar = pci_bar_get(&pci_dev, i);
>  
>  		if (bar & PCI_BASE_ADDRESS_SPACE_IO) {
>  			printf("PIO]\n");
> diff --git a/lib/pci.h b/lib/pci.h
> index 40e11a8..af20dc5 100644
> --- a/lib/pci.h
> +++ b/lib/pci.h
> @@ -15,10 +15,18 @@ enum {
>  	PCIDEVADDR_INVALID = 0xffff,
>  };
>  
> +#define PCI_DEVFN_MAX                   (256)
> +
> +struct pci_dev {
> +	uint16_t pci_bdf;

No need for 'pci_' in the bdf name. It's whole name already has it,
pci_dev.bdf

> +};
> +
> +void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf);
> +
>  extern bool pci_probe(void);
>  extern void pci_print(void);
>  extern bool pci_dev_exists(pcidevaddr_t dev);
> -extern pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id);
> +extern int pci_find_dev(struct pci_dev *dev, uint16_t vendor_id, uint16_t device_id);
>  
>  /*
>   * @bar_num in all BAR access functions below is the index of the 32-bit
> @@ -32,14 +40,14 @@ extern pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id);
>   * It is expected the caller is aware of the device BAR layout and never
>   * tries to address the middle of a 64-bit register.
>   */
> -extern phys_addr_t pci_bar_get_addr(pcidevaddr_t dev, int bar_num);
> -extern void pci_bar_set_addr(pcidevaddr_t dev, int bar_num, phys_addr_t addr);
> -extern phys_addr_t pci_bar_size(pcidevaddr_t dev, int bar_num);
> -extern uint32_t pci_bar_get(pcidevaddr_t dev, int bar_num);
> +extern phys_addr_t pci_bar_get_addr(struct pci_dev *dev, int bar_num);
> +extern void pci_bar_set_addr(struct pci_dev *dev, int bar_num, phys_addr_t addr);
> +extern phys_addr_t pci_bar_size(struct pci_dev *dev, int bar_num);
> +extern uint32_t pci_bar_get(struct pci_dev *dev, int bar_num);
>  extern uint32_t pci_bar_mask(uint32_t bar);
> -extern bool pci_bar_is64(pcidevaddr_t dev, int bar_num);
> -extern bool pci_bar_is_memory(pcidevaddr_t dev, int bar_num);
> -extern bool pci_bar_is_valid(pcidevaddr_t dev, int bar_num);
> +extern bool pci_bar_is64(struct pci_dev *dev, int bar_num);
> +extern bool pci_bar_is_memory(struct pci_dev *dev, int bar_num);
> +extern bool pci_bar_is_valid(struct pci_dev *dev, int bar_num);
>  
>  int pci_testdev(void);
>  
> diff --git a/x86/vmexit.c b/x86/vmexit.c
> index 2d99d5f..2736ab8 100644
> --- a/x86/vmexit.c
> +++ b/x86/vmexit.c
> @@ -372,7 +372,8 @@ int main(int ac, char **av)
>  	struct fadt_descriptor_rev1 *fadt;
>  	int i;
>  	unsigned long membar = 0;
> -	pcidevaddr_t pcidev;
> +	struct pci_dev pcidev;
> +	int ret;
>  
>  	smp_init();
>  	setup_vm();
> @@ -385,21 +386,22 @@ int main(int ac, char **av)
>  	pm_tmr_blk = fadt->pm_tmr_blk;
>  	printf("PM timer port is %x\n", pm_tmr_blk);
>  
> -	pcidev = pci_find_dev(PCI_VENDOR_ID_REDHAT, PCI_DEVICE_ID_REDHAT_TEST);
> -	if (pcidev != PCIDEVADDR_INVALID) {
> +	ret = pci_find_dev(&pcidev, PCI_VENDOR_ID_REDHAT,
> +			   PCI_DEVICE_ID_REDHAT_TEST);
> +	if (ret == 0) {
>  		for (i = 0; i < PCI_TESTDEV_NUM_BARS; i++) {
> -			if (!pci_bar_is_valid(pcidev, i)) {
> +			if (!pci_bar_is_valid(&pcidev, i)) {
>  				continue;
>  			}
> -			if (pci_bar_is_memory(pcidev, i)) {
> -				membar = pci_bar_get_addr(pcidev, i);
> +			if (pci_bar_is_memory(&pcidev, i)) {
> +				membar = pci_bar_get_addr(&pcidev, i);
>  				pci_test.memaddr = ioremap(membar, PAGE_SIZE);
>  			} else {
> -				pci_test.iobar = pci_bar_get_addr(pcidev, i);
> +				pci_test.iobar = pci_bar_get_addr(&pcidev, i);
>  			}
>  		}
>  		printf("pci-testdev at 0x%x membar %lx iobar %x\n",
> -		       pcidev, membar, pci_test.iobar);
> +		       pcidev.pci_bdf, membar, pci_test.iobar);
>  	}
>  
>  	for (i = 0; i < ARRAY_SIZE(tests); ++i)
> -- 
> 2.7.4

I'm OK with introducing the pci_dev struct, but let's not switch over to
it from our other handle for everything all at once. I'd rather only
switch functions that will need more state than just devid/bdf. Anywhere
we never do then I think expecting the caller to call like
pci_foo(pcidev->bdf, ...) is reasonable.

Thanks,
drew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 07/17] pci: provide pci_scan_bars()
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 07/17] pci: provide pci_scan_bars() Peter Xu
@ 2016-11-04 16:47   ` Andrew Jones
  2016-11-07 17:16     ` Peter Xu
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 16:47 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:10PM +0800, Peter Xu wrote:
> Let's provide a more general way to scan PCI bars, rather than read the
> config registers every time.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/pci.c | 14 ++++++++++++++
>  lib/pci.h |  3 +++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/lib/pci.c b/lib/pci.c
> index bb41e0f..95a6581 100644
> --- a/lib/pci.c
> +++ b/lib/pci.c
> @@ -35,6 +35,20 @@ int pci_find_dev(struct pci_dev *pci_dev, uint16_t vendor_id, uint16_t device_id
>  	return -1;
>  }
>  
> +void pci_scan_bars(struct pci_dev *dev)
> +{
> +	int i = 0;
> +
> +	for (i = 0; i < PCI_BAR_NUM; i++) {

I don't think we need a PCI_BAR_NUM def, but if you do introduce it
then please change all uses of '6' to it that should be changed.
Same goes for PCI_DEVFN_MAX.

> +		if (!pci_bar_is_valid(dev, i))
> +			continue;
> +		dev->pci_bar[i] = pci_bar_get_addr(dev, i);
> +		printf("PCI: init dev 0x%04x BAR %d [%s] addr 0x%lx\n",
> +		       dev->pci_bdf, i, pci_bar_is_memory(dev, i) ?
> +		       "MEM" : "IO", dev->pci_bar[i]);

v9 of Alex's series has introduced pci_bar_print. You'll want to
use that here after modifying it to get the bars from struct.

> +	}
> +}
> +
>  uint32_t pci_bar_mask(uint32_t bar)
>  {
>  	return (bar & PCI_BASE_ADDRESS_SPACE_IO) ?
> diff --git a/lib/pci.h b/lib/pci.h
> index af20dc5..56b49f0 100644
> --- a/lib/pci.h
> +++ b/lib/pci.h
> @@ -15,13 +15,16 @@ enum {
>  	PCIDEVADDR_INVALID = 0xffff,
>  };
>  
> +#define PCI_BAR_NUM                     (6)
>  #define PCI_DEVFN_MAX                   (256)
>  
>  struct pci_dev {
>  	uint16_t pci_bdf;
> +	phys_addr_t pci_bar[PCI_BAR_NUM];
>  };
>  
>  void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf);
> +void pci_scan_bars(struct pci_dev *dev);
>  
>  extern bool pci_probe(void);
>  extern void pci_print(void);
> -- 
> 2.7.4
>

Thanks,
drew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 08/17] x86/vmexit: leverage pci_scan_bars()
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 08/17] x86/vmexit: leverage pci_scan_bars() Peter Xu
@ 2016-11-04 16:54   ` Andrew Jones
  2016-11-08 13:43     ` Alexander Gordeev
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 16:54 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:11PM +0800, Peter Xu wrote:
> Since pci-testdev is a very specific device for QEMU, let's use the new
> pci_scan_bars() helper, and selectively choose the bars we want.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/pci.h    |  2 ++
>  x86/vmexit.c | 23 ++++++++---------------
>  2 files changed, 10 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/pci.h b/lib/pci.h
> index 56b49f0..e48045c 100644
> --- a/lib/pci.h
> +++ b/lib/pci.h
> @@ -66,6 +66,8 @@ int pci_testdev(void);
>   * pci-testdev supports at least three types of tests (via mmio and
>   * portio BARs): no-eventfd, wildcard-eventfd and datamatch-eventfd
>   */
> +#define PCI_TESTDEV_BAR_MEM		0
> +#define PCI_TESTDEV_BAR_IO		1

More defines that I think we could live without, but I'll try to stop
complaining about self-documenting code...

>  #define PCI_TESTDEV_NUM_BARS		2
>  #define PCI_TESTDEV_NUM_TESTS		3
>  
> diff --git a/x86/vmexit.c b/x86/vmexit.c
> index 2736ab8..880466e 100644
> --- a/x86/vmexit.c
> +++ b/x86/vmexit.c
> @@ -373,7 +373,6 @@ int main(int ac, char **av)
>  	int i;
>  	unsigned long membar = 0;
>  	struct pci_dev pcidev;
> -	int ret;
>  
>  	smp_init();
>  	setup_vm();
> @@ -386,20 +385,14 @@ int main(int ac, char **av)
>  	pm_tmr_blk = fadt->pm_tmr_blk;
>  	printf("PM timer port is %x\n", pm_tmr_blk);
>  
> -	ret = pci_find_dev(&pcidev, PCI_VENDOR_ID_REDHAT,
> -			   PCI_DEVICE_ID_REDHAT_TEST);
> -	if (ret == 0) {
> -		for (i = 0; i < PCI_TESTDEV_NUM_BARS; i++) {
> -			if (!pci_bar_is_valid(&pcidev, i)) {
> -				continue;
> -			}
> -			if (pci_bar_is_memory(&pcidev, i)) {
> -				membar = pci_bar_get_addr(&pcidev, i);
> -				pci_test.memaddr = ioremap(membar, PAGE_SIZE);
> -			} else {
> -				pci_test.iobar = pci_bar_get_addr(&pcidev, i);
> -			}
> -		}
> +	if (!pci_find_dev(&pcidev, PCI_VENDOR_ID_REDHAT,
> +			  PCI_DEVICE_ID_REDHAT_TEST)) {
> +		pci_scan_bars(&pcidev);
> +		assert(pci_bar_is_memory(&pcidev, PCI_TESTDEV_BAR_MEM));
> +		assert(!pci_bar_is_memory(&pcidev, PCI_TESTDEV_BAR_IO));
> +		membar = pcidev.pci_bar[PCI_TESTDEV_BAR_MEM];
> +		pci_test.memaddr = ioremap(membar, PAGE_SIZE);
> +		pci_test.iobar = pcidev.pci_bar[PCI_TESTDEV_BAR_IO];
>  		printf("pci-testdev at 0x%x membar %lx iobar %x\n",
>  		       pcidev.pci_bdf, membar, pci_test.iobar);
>  	}
> -- 
> 2.7.4

I'm not sure what we gain by hardcoding the mapping of these bars, besides
slightly simpler code, but lib/pci-testdev.c does it too, and it should be
quite safe, so OK.

Reviewed-by: Andrew Jones <drjones@redhat.com>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 09/17] pci: add pci_config_write[wb]()
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 09/17] pci: add pci_config_write[wb]() Peter Xu
@ 2016-11-04 16:59   ` Andrew Jones
  2016-11-05 17:06   ` Alexander Gordeev
  1 sibling, 0 replies; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 16:59 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:12PM +0800, Peter Xu wrote:
> We have writel() for 4 bytes but still missing 2/1 bytes accessors.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/pci-host-generic.c | 12 ++++++++++++
>  lib/x86/asm/pci.h      | 14 ++++++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/lib/pci-host-generic.c b/lib/pci-host-generic.c
> index be3f9e7..c04e4f5 100644
> --- a/lib/pci-host-generic.c
> +++ b/lib/pci-host-generic.c
> @@ -291,6 +291,18 @@ u32 pci_config_readl(pcidevaddr_t dev, u8 off)
>  	return readl(conf + off);
>  }
>  
> +void pci_config_writeb(pcidevaddr_t dev, u8 off, u8 val)
> +{
> +	void __iomem *conf = pci_get_dev_conf(pci_host_bridge, dev);
> +	writeb(val, conf + off);
> +}
> +
> +void pci_config_writew(pcidevaddr_t dev, u8 off, u16 val)
> +{
> +	void __iomem *conf = pci_get_dev_conf(pci_host_bridge, dev);
> +	writew(val, conf + off);
> +}
> +
>  void pci_config_writel(pcidevaddr_t dev, u8 off, u32 val)
>  {
>  	void __iomem *conf = pci_get_dev_conf(pci_host_bridge, dev);
> diff --git a/lib/x86/asm/pci.h b/lib/x86/asm/pci.h
> index 7384b91..abfe5d1 100644
> --- a/lib/x86/asm/pci.h
> +++ b/lib/x86/asm/pci.h
> @@ -29,6 +29,20 @@ static inline uint32_t pci_config_readl(pcidevaddr_t dev, uint8_t reg)
>      return inl(0xCFC);
>  }
>  
> +static inline void pci_config_writeb(pcidevaddr_t dev, uint8_t reg,
> +				     uint8_t val)
> +{
> +    outl(PCI_CONF1_ADDRESS(dev, reg), 0xCF8);
> +    outb(val, 0xCFC);
> +}
> +
> +static inline void pci_config_writew(pcidevaddr_t dev, uint8_t reg,
> +				     uint32_t val)
                                         ^^ should be 16

> +{
> +    outl(PCI_CONF1_ADDRESS(dev, reg), 0xCF8);
> +    outw(val, 0xCFC);
> +}
> +
>  static inline void pci_config_writel(pcidevaddr_t dev, uint8_t reg, uint32_t val)
>  {
>      outl(PCI_CONF1_ADDRESS(dev, reg), 0xCF8);
> -- 
> 2.7.4
>

Otherwise

Reviewed-by: Andrew Jones <drjones@redhat.com> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 10/17] pci: provide pci_set_master()
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 10/17] pci: provide pci_set_master() Peter Xu
@ 2016-11-04 17:04   ` Andrew Jones
  2016-11-07 17:35     ` Peter Xu
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 17:04 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:13PM +0800, Peter Xu wrote:
> We need to call this whenever we want to do DMA with a device.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/pci.c | 7 +++++++
>  lib/pci.h | 1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/lib/pci.c b/lib/pci.c
> index 95a6581..1495f61 100644
> --- a/lib/pci.c
> +++ b/lib/pci.c
> @@ -7,6 +7,13 @@
>  #include "pci.h"
>  #include "asm/pci.h"
>  
> +void pci_set_master(struct pci_dev *dev, int master)

Why take pci_dev instead of just the devid/bdf here?
What is the 'master' argument (which is unused and at least
needs an __unused to compile) here for?

> +{
> +	uint32_t val = pci_config_readw(dev->pci_bdf, PCI_COMMAND);
> +	val |= PCI_COMMAND_MASTER;
> +	pci_config_writew(dev->pci_bdf, PCI_COMMAND, val);

Hmm... Alex uses pci_config_writel for the PCI_COMMAND register.
I see writew is correct. Need to go back his series and make another
comment...

> +}
> +
>  bool pci_dev_exists(pcidevaddr_t dev)
>  {
>  	return (pci_config_readw(dev, PCI_VENDOR_ID) != 0xffff &&
> diff --git a/lib/pci.h b/lib/pci.h
> index e48045c..46657f4 100644
> --- a/lib/pci.h
> +++ b/lib/pci.h
> @@ -25,6 +25,7 @@ struct pci_dev {
>  
>  void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf);
>  void pci_scan_bars(struct pci_dev *dev);
> +void pci_set_master(struct pci_dev *dev, int master);
>  
>  extern bool pci_probe(void);
>  extern void pci_print(void);
> -- 
> 2.7.4
>

drew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 11/17] pci: provide pci_enable_defaults()
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 11/17] pci: provide pci_enable_defaults() Peter Xu
@ 2016-11-04 17:08   ` Andrew Jones
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 17:08 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:14PM +0800, Peter Xu wrote:
> Provide a function to do most of the common PCI init work.
> 
> Suggested-by: Andrew Jones <drjones@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/pci.c | 7 +++++++
>  lib/pci.h | 1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/lib/pci.c b/lib/pci.c
> index 1495f61..d78472f 100644
> --- a/lib/pci.c
> +++ b/lib/pci.c
> @@ -56,6 +56,13 @@ void pci_scan_bars(struct pci_dev *dev)
>  	}
>  }
>  
> +int pci_enable_defaults(struct pci_dev *dev)
> +{
> +	pci_scan_bars(dev);
> +	pci_set_master(dev, 1);
> +	return 0;
> +}
> +
>  uint32_t pci_bar_mask(uint32_t bar)
>  {
>  	return (bar & PCI_BASE_ADDRESS_SPACE_IO) ?
> diff --git a/lib/pci.h b/lib/pci.h
> index 46657f4..0336455 100644
> --- a/lib/pci.h
> +++ b/lib/pci.h
> @@ -26,6 +26,7 @@ struct pci_dev {
>  void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf);
>  void pci_scan_bars(struct pci_dev *dev);
>  void pci_set_master(struct pci_dev *dev, int master);
> +int pci_enable_defaults(struct pci_dev *dev);
>  
>  extern bool pci_probe(void);
>  extern void pci_print(void);
> -- 
> 2.7.4
>

Reviewed-by: Andrew Jones <drjones@redhat.com> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 13/17] pci: edu: introduce pci-edu helpers
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 13/17] pci: edu: introduce pci-edu helpers Peter Xu
@ 2016-11-04 17:18   ` Andrew Jones
  2016-11-07 17:41     ` Peter Xu
  2016-11-04 17:24   ` Andrew Jones
  1 sibling, 1 reply; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 17:18 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:16PM +0800, Peter Xu wrote:
> QEMU edu device is a pci device that is originally written for
> educational purpose, however it also suits for IOMMU unit test. Adding
> helpers for this specific device to implement the device logic.
> 
> The device supports lots of functions, here only DMA operation is
> supported.
> 
> The spec of the device can be found at:
> 
>   https://github.com/qemu/qemu/blob/master/docs/specs/edu.txt
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/pci-edu.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  lib/pci-edu.h | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 153 insertions(+)
>  create mode 100644 lib/pci-edu.c
>  create mode 100644 lib/pci-edu.h
> 
> diff --git a/lib/pci-edu.c b/lib/pci-edu.c
> new file mode 100644
> index 0000000..c594e08
> --- /dev/null
> +++ b/lib/pci-edu.c
> @@ -0,0 +1,74 @@
> +/*
> + * Edu PCI device.
> + *
> + * Copyright (C) 2016 Red Hat, Inc.
> + *
> + * Authors:
> + *   Peter Xu <peterx@redhat.com>,
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or
> + * later.
> + */
> +
> +#include "pci-edu.h"
> +#include "asm/barrier.h"
> +
> +/* Return true if alive */
> +static inline bool edu_check_alive(struct pci_edu_dev *dev)
> +{
> +	static uint32_t live_count = 1;
> +	uint32_t value;
> +
> +	edu_reg_write(dev, EDU_REG_ALIVE, live_count++);
> +	value = edu_reg_read(dev, EDU_REG_ALIVE);
> +	return (live_count - 1 == ~value);
> +}
> +
> +int edu_init(struct pci_edu_dev *dev)
> +{
> +	int ret;
> +
> +	ret = pci_find_dev(&dev->pci_dev, PCI_VENDOR_ID_QEMU,
> +			   PCI_DEVICE_ID_EDU);
> +	if (ret)
> +		return ret;
> +
> +	pci_enable_defaults(&dev->pci_dev);
> +	assert(edu_check_alive(dev));
> +
> +	return 0;
> +}
> +
> +void edu_dma(struct pci_edu_dev *dev, iova_t iova,
> +	     size_t size, int dev_offset, bool from_device)
> +{
> +	uint64_t from, to;
> +	uint32_t cmd = EDU_CMD_DMA_START;
> +
> +	assert(size <= EDU_DMA_SIZE_MAX);
> +	assert(dev_offset < EDU_DMA_SIZE_MAX &&
> +	       dev_offset >= 0);
> +
> +	printf("edu device DMA start %s addr %p size 0x%lu off 0x%x\n",
> +	       from_device ? "FROM" : "TO",
> +	       (void *)iova, size, dev_offset);
> +
> +	if (from_device) {
> +		from = dev_offset + EDU_DMA_START;
> +		to = iova;
> +		cmd |= EDU_CMD_DMA_FROM;
> +	} else {
> +		from = iova;
> +		to = EDU_DMA_START + dev_offset;
> +		cmd |= EDU_CMD_DMA_TO;
> +	}
> +
> +	edu_reg_writeq(dev, EDU_REG_DMA_SRC, from);
> +	edu_reg_writeq(dev, EDU_REG_DMA_DST, to);
> +	edu_reg_writeq(dev, EDU_REG_DMA_COUNT, size);
> +	edu_reg_write(dev, EDU_REG_DMA_CMD, cmd);
> +
> +	/* Wait until DMA finished */
> +	while (edu_reg_read(dev, EDU_REG_DMA_CMD) & EDU_CMD_DMA_START)
> +		cpu_relax();
> +}
> diff --git a/lib/pci-edu.h b/lib/pci-edu.h
> new file mode 100644
> index 0000000..f98d2d1
> --- /dev/null
> +++ b/lib/pci-edu.h
> @@ -0,0 +1,79 @@
> +/*
> + * Edu PCI device header.
> + *
> + * Copyright (C) 2016 Red Hat, Inc.
> + *
> + * Authors:
> + *   Peter Xu <peterx@redhat.com>,
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or
> + * later.
> + *
> + * Edu device is a virtualized device in QEMU. Please refer to
> + * docs/specs/edu.txt in QEMU repository for EDU device manual.
> + */
> +#ifndef __PCI_EDU_H__
> +#define __PCI_EDU_H__
> +
> +#include "pci.h"
> +
> +#define  PCI_VENDOR_ID_QEMU              (0x1234)
> +#define  PCI_DEVICE_ID_EDU               (0x11e8)
> +
> +/* The only bar used by EDU device */
> +#define EDU_BAR_MEM                 (0)
> +#define EDU_MAGIC                   (0xed)
> +#define EDU_VERSION                 (0x100)
> +#define EDU_DMA_BUF_SIZE            (1 << 20)
> +#define EDU_INPUT_BUF_SIZE          (256)
> +
> +#define EDU_REG_ID                  (0x0)
> +#define EDU_REG_ALIVE               (0x4)
> +#define EDU_REG_FACTORIAL           (0x8)
> +#define EDU_REG_STATUS              (0x20)
> +#define EDU_REG_DMA_SRC             (0x80)
> +#define EDU_REG_DMA_DST             (0x88)
> +#define EDU_REG_DMA_COUNT           (0x90)
> +#define EDU_REG_DMA_CMD             (0x98)
> +
> +#define EDU_CMD_DMA_START           (0x01)
> +#define EDU_CMD_DMA_FROM            (0x02)
> +#define EDU_CMD_DMA_TO              (0x00)
> +
> +#define EDU_STATUS_FACTORIAL        (0x1)
> +#define EDU_STATUS_INT_ENABLE       (0x80)
> +
> +#define EDU_DMA_START               (0x40000)
> +#define EDU_DMA_SIZE_MAX            (4096)
> +
> +struct pci_edu_dev {
> +	struct pci_dev pci_dev;
> +};
> +
> +static inline uint64_t edu_reg_readq(struct pci_edu_dev *dev, int reg)
> +{
> +	return *(volatile uint64_t *)(dev->pci_dev.pci_bar[EDU_BAR_MEM] + reg);
> +}
> +
> +static inline uint32_t edu_reg_read(struct pci_edu_dev *dev, int reg)

please add the 'l' this read function name

> +{
> +	return *(volatile uint32_t *)(dev->pci_dev.pci_bar[EDU_BAR_MEM] + reg);
> +}
> +
> +static inline void edu_reg_writeq(struct pci_edu_dev *dev, int reg,
> +				  uint64_t val)
> +{
> +	*(volatile uint64_t *)(dev->pci_dev.pci_bar[EDU_BAR_MEM] + reg) = val;
> +}
> +
> +static inline void edu_reg_write(struct pci_edu_dev *dev, int reg,
> +				 uint32_t val)

please add the 'l'

> +{
> +	*(volatile uint32_t *)(dev->pci_dev.pci_bar[EDU_BAR_MEM] + reg) = val;
> +}
> +
> +int edu_init(struct pci_edu_dev *dev);
> +void edu_dma(struct pci_edu_dev *dev, iova_t iova,
> +	     size_t size, int dev_offset, bool from_device);
> +
> +#endif
> -- 
> 2.7.4

Otherwise

Reviewed-by: Andrew Jones <drjones@redhat.com>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 13/17] pci: edu: introduce pci-edu helpers
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 13/17] pci: edu: introduce pci-edu helpers Peter Xu
  2016-11-04 17:18   ` Andrew Jones
@ 2016-11-04 17:24   ` Andrew Jones
  2016-11-07 17:44     ` Peter Xu
  1 sibling, 1 reply; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 17:24 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:16PM +0800, Peter Xu wrote:
> QEMU edu device is a pci device that is originally written for
> educational purpose, however it also suits for IOMMU unit test. Adding
> helpers for this specific device to implement the device logic.
> 
> The device supports lots of functions, here only DMA operation is
> supported.
> 
> The spec of the device can be found at:
> 
>   https://github.com/qemu/qemu/blob/master/docs/specs/edu.txt
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/pci-edu.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  lib/pci-edu.h | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 153 insertions(+)
>  create mode 100644 lib/pci-edu.c
>  create mode 100644 lib/pci-edu.h
> 
> diff --git a/lib/pci-edu.c b/lib/pci-edu.c
> new file mode 100644
> index 0000000..c594e08
> --- /dev/null
> +++ b/lib/pci-edu.c
> @@ -0,0 +1,74 @@
> +/*
> + * Edu PCI device.
> + *
> + * Copyright (C) 2016 Red Hat, Inc.
> + *
> + * Authors:
> + *   Peter Xu <peterx@redhat.com>,
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or
> + * later.
> + */
> +
> +#include "pci-edu.h"
> +#include "asm/barrier.h"
> +
> +/* Return true if alive */
> +static inline bool edu_check_alive(struct pci_edu_dev *dev)
> +{
> +	static uint32_t live_count = 1;
> +	uint32_t value;
> +
> +	edu_reg_write(dev, EDU_REG_ALIVE, live_count++);
> +	value = edu_reg_read(dev, EDU_REG_ALIVE);
> +	return (live_count - 1 == ~value);
> +}
> +
> +int edu_init(struct pci_edu_dev *dev)
> +{
> +	int ret;
> +
> +	ret = pci_find_dev(&dev->pci_dev, PCI_VENDOR_ID_QEMU,
> +			   PCI_DEVICE_ID_EDU);
> +	if (ret)
> +		return ret;

Another comment: as I mentioned earlier, I think pci_find_dev should
return a bool. Here too, i.e. edu_init returns true when init
succeeds, else false. The only reason to go with a zero success,
non-zero (negative) failure is if multiple error codes will be
introduced. I don't think that'll happen for kvm-unit-tests though.

> +
> +	pci_enable_defaults(&dev->pci_dev);
> +	assert(edu_check_alive(dev));
> +
> +	return 0;
> +}
> +
> +void edu_dma(struct pci_edu_dev *dev, iova_t iova,
> +	     size_t size, int dev_offset, bool from_device)
> +{
> +	uint64_t from, to;
> +	uint32_t cmd = EDU_CMD_DMA_START;
> +
> +	assert(size <= EDU_DMA_SIZE_MAX);
> +	assert(dev_offset < EDU_DMA_SIZE_MAX &&
> +	       dev_offset >= 0);
> +
> +	printf("edu device DMA start %s addr %p size 0x%lu off 0x%x\n",
> +	       from_device ? "FROM" : "TO",
> +	       (void *)iova, size, dev_offset);
> +
> +	if (from_device) {
> +		from = dev_offset + EDU_DMA_START;
> +		to = iova;
> +		cmd |= EDU_CMD_DMA_FROM;
> +	} else {
> +		from = iova;
> +		to = EDU_DMA_START + dev_offset;
> +		cmd |= EDU_CMD_DMA_TO;
> +	}
> +
> +	edu_reg_writeq(dev, EDU_REG_DMA_SRC, from);
> +	edu_reg_writeq(dev, EDU_REG_DMA_DST, to);
> +	edu_reg_writeq(dev, EDU_REG_DMA_COUNT, size);
> +	edu_reg_write(dev, EDU_REG_DMA_CMD, cmd);
> +
> +	/* Wait until DMA finished */
> +	while (edu_reg_read(dev, EDU_REG_DMA_CMD) & EDU_CMD_DMA_START)
> +		cpu_relax();
> +}
> diff --git a/lib/pci-edu.h b/lib/pci-edu.h
> new file mode 100644
> index 0000000..f98d2d1
> --- /dev/null
> +++ b/lib/pci-edu.h
> @@ -0,0 +1,79 @@
> +/*
> + * Edu PCI device header.
> + *
> + * Copyright (C) 2016 Red Hat, Inc.
> + *
> + * Authors:
> + *   Peter Xu <peterx@redhat.com>,
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or
> + * later.
> + *
> + * Edu device is a virtualized device in QEMU. Please refer to
> + * docs/specs/edu.txt in QEMU repository for EDU device manual.
> + */
> +#ifndef __PCI_EDU_H__
> +#define __PCI_EDU_H__
> +
> +#include "pci.h"
> +
> +#define  PCI_VENDOR_ID_QEMU              (0x1234)
> +#define  PCI_DEVICE_ID_EDU               (0x11e8)
> +
> +/* The only bar used by EDU device */
> +#define EDU_BAR_MEM                 (0)
> +#define EDU_MAGIC                   (0xed)
> +#define EDU_VERSION                 (0x100)
> +#define EDU_DMA_BUF_SIZE            (1 << 20)
> +#define EDU_INPUT_BUF_SIZE          (256)
> +
> +#define EDU_REG_ID                  (0x0)
> +#define EDU_REG_ALIVE               (0x4)
> +#define EDU_REG_FACTORIAL           (0x8)
> +#define EDU_REG_STATUS              (0x20)
> +#define EDU_REG_DMA_SRC             (0x80)
> +#define EDU_REG_DMA_DST             (0x88)
> +#define EDU_REG_DMA_COUNT           (0x90)
> +#define EDU_REG_DMA_CMD             (0x98)
> +
> +#define EDU_CMD_DMA_START           (0x01)
> +#define EDU_CMD_DMA_FROM            (0x02)
> +#define EDU_CMD_DMA_TO              (0x00)
> +
> +#define EDU_STATUS_FACTORIAL        (0x1)
> +#define EDU_STATUS_INT_ENABLE       (0x80)
> +
> +#define EDU_DMA_START               (0x40000)
> +#define EDU_DMA_SIZE_MAX            (4096)
> +
> +struct pci_edu_dev {
> +	struct pci_dev pci_dev;
> +};
> +
> +static inline uint64_t edu_reg_readq(struct pci_edu_dev *dev, int reg)
> +{
> +	return *(volatile uint64_t *)(dev->pci_dev.pci_bar[EDU_BAR_MEM] + reg);
> +}
> +
> +static inline uint32_t edu_reg_read(struct pci_edu_dev *dev, int reg)
> +{
> +	return *(volatile uint32_t *)(dev->pci_dev.pci_bar[EDU_BAR_MEM] + reg);
> +}
> +
> +static inline void edu_reg_writeq(struct pci_edu_dev *dev, int reg,
> +				  uint64_t val)
> +{
> +	*(volatile uint64_t *)(dev->pci_dev.pci_bar[EDU_BAR_MEM] + reg) = val;
> +}
> +
> +static inline void edu_reg_write(struct pci_edu_dev *dev, int reg,
> +				 uint32_t val)
> +{
> +	*(volatile uint32_t *)(dev->pci_dev.pci_bar[EDU_BAR_MEM] + reg) = val;
> +}
> +
> +int edu_init(struct pci_edu_dev *dev);
> +void edu_dma(struct pci_edu_dev *dev, iova_t iova,
> +	     size_t size, int dev_offset, bool from_device);
> +
> +#endif
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 15/17] pci: add msi support for 32/64bit address
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 15/17] pci: add msi support for 32/64bit address Peter Xu
@ 2016-11-04 17:33   ` Andrew Jones
  2016-11-07 17:58     ` Peter Xu
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 17:33 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:18PM +0800, Peter Xu wrote:
> pci_cap_walk() is provided to allow walk through all the capability bits
> for a specific PCI device. If a cap handler is provided, it'll be
> triggered if the cap is detected along the cap walk.
> 
> MSI cap handler is the first one supported. We can add more cap handler
> in the future.
> 
> Meanwhile, pci_setup_msi() API is provided to support basic 32/64 bit
> address MSI setup.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/pci.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  lib/pci.h |  3 +++
>  2 files changed, 71 insertions(+)
> 
> diff --git a/lib/pci.c b/lib/pci.c
> index d78472f..8d9a455 100644
> --- a/lib/pci.c
> +++ b/lib/pci.c
> @@ -7,6 +7,69 @@
>  #include "pci.h"
>  #include "asm/pci.h"
>  
> +typedef void (*pci_cap_handler)(struct pci_dev *dev, int cap_offset);
> +
> +static void pci_cap_msi_handler(struct pci_dev *dev, int cap_offset)
> +{
> +	printf("Detected MSI for device 0x%x offset 0x%x\n",
> +	       dev->pci_bdf, cap_offset);
> +	dev->msi_offset = cap_offset;
> +}
> +
> +static pci_cap_handler cap_handlers[0xff] = {

Why 0xff? Shouldn't it be (PCI_CAP_ID_MAX + 1)?

> +	[PCI_CAP_ID_MSI] = pci_cap_msi_handler,
> +};
> +
> +void pci_cap_walk(struct pci_dev *dev)
> +{
> +	uint8_t cap_offset;
> +	uint8_t cap_id;
> +
> +	cap_offset = pci_config_readb(dev->pci_bdf, PCI_CAPABILITY_LIST);
> +	while (cap_offset) {
> +		cap_id = pci_config_readb(dev->pci_bdf, cap_offset);
> +		printf("PCI detected cap 0x%x\n", cap_id);
> +		if (cap_handlers[cap_id])
> +			cap_handlers[cap_id](dev, cap_offset);
> +		cap_offset = pci_config_readb(dev->pci_bdf, cap_offset + 1);
> +	}
> +}
> +
> +int pci_setup_msi(struct pci_dev *dev, uint64_t msi_addr, uint32_t msi_data)
> +{
> +	uint16_t msi_control;
> +	uint16_t offset;
> +	pcidevaddr_t addr = dev->pci_bdf;
> +
> +	assert(dev);
> +
> +	if (!dev->msi_offset) {
> +		printf("MSI: dev 0x%x does not support MSI.\n", addr);
> +		return -1;
> +	}
> +
> +	offset = dev->msi_offset;
> +	msi_control = pci_config_readw(addr, offset + PCI_MSI_FLAGS);
> +	pci_config_writel(addr, offset + PCI_MSI_ADDRESS_LO,
> +			  msi_addr & 0xffffffff);
> +
> +	if (msi_control & PCI_MSI_FLAGS_64BIT) {
> +		pci_config_writel(addr, offset + PCI_MSI_ADDRESS_HI,
> +				  (uint32_t)(msi_addr >> 32));
> +		pci_config_writel(addr, offset + PCI_MSI_DATA_64, msi_data);
> +		printf("MSI: dev 0x%x init 64bit address: ", addr);
> +	} else {
> +		pci_config_writel(addr, offset + PCI_MSI_DATA_32, msi_data);
> +		printf("MSI: dev 0x%x init 32bit address: ", addr);
> +	}
> +	printf("addr=0x%lx, data=0x%x\n", msi_addr, msi_data);
> +
> +	msi_control |= PCI_MSI_FLAGS_ENABLE;
> +	pci_config_writew(addr, offset + PCI_MSI_FLAGS, msi_control);
> +
> +	return 0;
> +}

Again we only have 0/-1 for good/bad. I'd prefer true/false.

> +
>  void pci_set_master(struct pci_dev *dev, int master)
>  {
>  	uint32_t val = pci_config_readw(dev->pci_bdf, PCI_COMMAND);
> @@ -20,6 +83,10 @@ bool pci_dev_exists(pcidevaddr_t dev)
>  		pci_config_readw(dev, PCI_DEVICE_ID) != 0xffff);
>  }
>  
> +/*
> + * Scan bus look for a specific device. Only bus 0 scanned for now.
> + * After the scan, a pci_dev is returned with correct BAR information.
> + */
>  void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf)
>  {
>         memset(dev, 0, sizeof(*dev));
> @@ -60,6 +127,7 @@ int pci_enable_defaults(struct pci_dev *dev)
>  {
>  	pci_scan_bars(dev);
>  	pci_set_master(dev, 1);
> +	pci_cap_walk(dev);
>  	return 0;
>  }
>  
> diff --git a/lib/pci.h b/lib/pci.h
> index bc6b91a..a0331ac 100644
> --- a/lib/pci.h
> +++ b/lib/pci.h
> @@ -23,13 +23,16 @@ enum {
>  
>  struct pci_dev {
>  	uint16_t pci_bdf;
> +	uint16_t msi_offset;
>  	phys_addr_t pci_bar[PCI_BAR_NUM];
>  };
>  
>  void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf);
>  void pci_scan_bars(struct pci_dev *dev);
>  void pci_set_master(struct pci_dev *dev, int master);
> +void pci_cap_walk(struct pci_dev *dev);
>  int pci_enable_defaults(struct pci_dev *dev);
> +int pci_setup_msi(struct pci_dev *dev, uint64_t msi_addr, uint32_t msi_data);
>  
>  typedef phys_addr_t iova_t;
>  
> -- 
> 2.7.4
>

Otherwise

Reviewed-by: Andrew Jones <drjones@redhat.com> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 16/17] x86: intel-iommu: add IR MSI test
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 16/17] x86: intel-iommu: add IR MSI test Peter Xu
@ 2016-11-04 17:40   ` Andrew Jones
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 17:40 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:19PM +0800, Peter Xu wrote:
> First of all, vtd_setup_msi() is provided. It setup IRTE entries,
> meanwhile, setup PCI device MSI vectors corresponding to VT-d spec.
> 
> Meanwhile, IR MSI test is added to intel IOMMU unit test. The basic IR
> test is carried out by a edu INTR raise request. When write to the intr
> raise register, interrupt will be generated. Type of interrupt will
> depend on the setup (either INTx or MSI).
> 
> Suggested-by: Andrew Jones <drjones@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/pci-edu.h         |  1 +
>  lib/x86/intel-iommu.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  lib/x86/intel-iommu.h |  1 +
>  x86/intel-iommu.c     | 44 +++++++++++++++++++++++-
>  4 files changed, 139 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/pci-edu.h b/lib/pci-edu.h
> index f98d2d1..ea46281 100644
> --- a/lib/pci-edu.h
> +++ b/lib/pci-edu.h
> @@ -31,6 +31,7 @@
>  #define EDU_REG_ALIVE               (0x4)
>  #define EDU_REG_FACTORIAL           (0x8)
>  #define EDU_REG_STATUS              (0x20)
> +#define EDU_REG_INTR_RAISE          (0x60)
>  #define EDU_REG_DMA_SRC             (0x80)
>  #define EDU_REG_DMA_DST             (0x88)
>  #define EDU_REG_DMA_COUNT           (0x90)
> diff --git a/lib/x86/intel-iommu.c b/lib/x86/intel-iommu.c
> index b680453..dd32887 100644
> --- a/lib/x86/intel-iommu.c
> +++ b/lib/x86/intel-iommu.c
> @@ -12,6 +12,7 @@
>  
>  #include "intel-iommu.h"
>  #include "libcflat.h"
> +#include "pci.h"
>  
>  /*
>   * VT-d in QEMU currently only support 39 bits address width, which is
> @@ -48,6 +49,26 @@ struct vtd_context_entry {
>  } __attribute__ ((packed));
>  typedef struct vtd_context_entry vtd_ce_t;
>  
> +struct vtd_irte {
> +	uint32_t present:1;
> +	uint32_t fault_disable:1;    /* Fault Processing Disable */
> +	uint32_t dest_mode:1;        /* Destination Mode */
> +	uint32_t redir_hint:1;       /* Redirection Hint */
> +	uint32_t trigger_mode:1;     /* Trigger Mode */
> +	uint32_t delivery_mode:3;    /* Delivery Mode */
> +	uint32_t __avail:4;          /* Available spaces for software */
> +	uint32_t __reserved_0:3;     /* Reserved 0 */
> +	uint32_t irte_mode:1;        /* IRTE Mode */
> +	uint32_t vector:8;           /* Interrupt Vector */
> +	uint32_t __reserved_1:8;     /* Reserved 1 */
> +	uint32_t dest_id;            /* Destination ID */
> +	uint16_t source_id:16;       /* Source-ID */
> +	uint64_t sid_q:2;            /* Source-ID Qualifier */
> +	uint64_t sid_vtype:2;        /* Source-ID Validation Type */
> +	uint64_t __reserved_2:44;    /* Reserved 2 */
> +} __attribute__ ((packed));
> +typedef struct vtd_irte vtd_irte_t;
> +
>  #define ONE_BIT_ONLY(x) ((x) && !((x) & ((x) - 1)))
>  
>  #define VTD_RTA_MASK  (PAGE_MASK)
> @@ -205,6 +226,79 @@ void vtd_map_range(uint16_t sid, iova_t iova, phys_addr_t pa, size_t size)
>  	}
>  }
>  
> +static uint16_t vtd_intr_index_alloc(void)
> +{
> +	static int index_ctr = 0;
> +	assert(index_ctr < 65535);
> +	return index_ctr++;
> +}
> +
> +static void vtd_setup_irte(struct pci_dev *dev, vtd_irte_t *irte,
> +			   int vector, int dest_id)
> +{
> +	assert(sizeof(vtd_irte_t) == 16);
> +	memset(irte, 0, sizeof(*irte));
> +	irte->fault_disable = 1;
> +	irte->dest_mode = 0;	 /* physical */
> +	irte->trigger_mode = 0;	 /* edge */
> +	irte->delivery_mode = 0; /* fixed */
> +	irte->irte_mode = 0;	 /* remapped */
> +	irte->vector = vector;
> +	irte->dest_id = dest_id;
> +	irte->source_id = dev->pci_bdf;
> +	irte->sid_q = 0;
> +	irte->sid_vtype = 1;     /* full-sid verify */
> +	irte->present = 1;
> +}
> +
> +struct vtd_msi_addr {
> +	uint32_t __dont_care:2;
> +	uint32_t handle_15:1;	 /* handle[15] */
> +	uint32_t shv:1;
> +	uint32_t interrupt_format:1;
> +	uint32_t handle_0_14:15; /* handle[0:14] */
> +	uint32_t head:12;	 /* 0xfee */
> +	uint32_t addr_hi;	 /* not used except with x2apic */
> +} __attribute__ ((packed));
> +typedef struct vtd_msi_addr vtd_msi_addr_t;
> +
> +struct vtd_msi_data {
> +	uint16_t __reserved;
> +	uint16_t subhandle;
> +} __attribute__ ((packed));
> +typedef struct vtd_msi_data vtd_msi_data_t;
> +
> +/**
> + * vtd_setup_msi - setup MSI message for a device
> + *
> + * @dev: PCI device to setup MSI
> + * @vector: interrupt vector
> + * @dest_id: destination processor
> + */
> +int vtd_setup_msi(struct pci_dev *dev, int vector, int dest_id)
> +{
> +	vtd_msi_data_t msi_data = {};
> +	vtd_msi_addr_t msi_addr = {};
> +	vtd_irte_t *irte = phys_to_virt(vtd_ir_table());
> +	uint16_t index = vtd_intr_index_alloc();
> +
> +	assert(sizeof(vtd_msi_addr_t) == 8);
> +	assert(sizeof(vtd_msi_data_t) == 4);
> +
> +	printf("INTR: setup IRTE index %d\n", index);
> +	vtd_setup_irte(dev, irte + index, vector, dest_id);
> +
> +	msi_addr.handle_15 = index >> 15 & 1;
> +	msi_addr.shv = 0;
> +	msi_addr.interrupt_format = 1;
> +	msi_addr.handle_0_14 = index & 0x7fff;
> +	msi_addr.head = 0xfee;
> +	msi_data.subhandle = 0;
> +
> +	return pci_setup_msi(dev, *(uint64_t *)&msi_addr,
> +			     *(uint32_t *)&msi_data);
> +}
> +
>  void vtd_init(void)
>  {
>  	setup_vm();
> diff --git a/lib/x86/intel-iommu.h b/lib/x86/intel-iommu.h
> index 40816bb..6ce4263 100644
> --- a/lib/x86/intel-iommu.h
> +++ b/lib/x86/intel-iommu.h
> @@ -134,5 +134,6 @@ static inline uint64_t vtd_readq(unsigned int reg)
>  
>  void vtd_init(void);
>  void vtd_map_range(uint16_t sid, phys_addr_t iova, phys_addr_t pa, size_t size);
> +int vtd_setup_msi(struct pci_dev *dev, int vector, int dest_id);
>  
>  #endif
> diff --git a/x86/intel-iommu.c b/x86/intel-iommu.c
> index 26b39d1..8f28736 100644
> --- a/x86/intel-iommu.c
> +++ b/x86/intel-iommu.c
> @@ -12,8 +12,10 @@
>  
>  #include "intel-iommu.h"
>  #include "pci-edu.h"
> +#include "x86/apic.h"
>  
>  #define VTD_TEST_DMAR_4B ("DMAR 4B memcpy test")
> +#define VTD_TEST_IR_MSI ("IR MSI")
>  
>  void vtd_test_dmar(struct pci_edu_dev *dev)
>  {
> @@ -50,6 +52,43 @@ void vtd_test_dmar(struct pci_edu_dev *dev)
>  	free_page(page);
>  }
>  
> +static volatile bool edu_intr_recved;
> +
> +static void edu_isr(isr_regs_t *regs)
> +{
> +	edu_intr_recved = true;
> +	eoi();
> +}
> +
> +static void vtd_test_ir(struct pci_edu_dev *dev)
> +{
> +#define VTD_TEST_VECTOR (0xee)
> +	/*
> +	 * Setup EDU PCI device MSI, using interrupt remapping. By
> +	 * default, EDU device is using INTx.
> +	 */
> +	if (vtd_setup_msi(&dev->pci_dev, VTD_TEST_VECTOR, 0)) {
> +		printf("edu device does not support MSI, skip test\n");
> +		report_skip(VTD_TEST_IR_MSI);
> +		return;
> +	}
> +
> +	handle_irq(VTD_TEST_VECTOR, edu_isr);
> +	irq_enable();
> +
> +	/* Manually trigger INTR */
> +	edu_reg_write(dev, EDU_REG_INTR_RAISE, 1);
> +
> +	while (!edu_intr_recved)
> +		cpu_relax();
> +
> +	/* Clear INTR bits */
> +	edu_reg_write(dev, EDU_REG_INTR_RAISE, 0);
> +
> +	/* We are good as long as we reach here */
> +	report(VTD_TEST_IR_MSI, true);
> +}
> +
>  int main(int argc, char *argv[])
>  {
>  	int ret;
> @@ -72,8 +111,11 @@ int main(int argc, char *argv[])
>  		printf("Please specify \"-device edu\" to do "
>  		       "further IOMMU tests.\n");
>  		report_skip(VTD_TEST_DMAR_4B);
> -	} else
> +		report_skip(VTD_TEST_IR_MSI);
> +	} else {
>  		vtd_test_dmar(&dev);
> +		vtd_test_ir(&dev);
> +	}
>  
>  	return report_summary();
>  }
> -- 
> 2.7.4
>

I didn't confirm vtd_setup_msi() was correct, but the test looks good to
me.

drew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 17/17] x86/unittests: add intel-iommu test
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 17/17] x86/unittests: add intel-iommu test Peter Xu
@ 2016-11-04 17:46   ` Andrew Jones
  2016-11-07 18:06     ` Peter Xu
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 17:46 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:20PM +0800, Peter Xu wrote:
> I used pc-q35-2.7 instead of default q35 since I'd better make sure both
> QEMU 2.7/2.8 can run the test, which is to at least satisfy the
> following matrix:
> 
>   |----------+-----------+------------|
>   | versions | KVM < 4.7 | KVM >= 4.7 |
>   |----------+-----------+------------|
>   | QEMU 2.7 |           |            |
>   | QEMU 2.8 |           |            |
>   |----------+-----------+------------|
> 
> If using default q35, QEMU2.8+KVM<4.7 will fail to run.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  x86/unittests.cfg | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/x86/unittests.cfg b/x86/unittests.cfg
> index f760d8f..d59cf68 100644
> --- a/x86/unittests.cfg
> +++ b/x86/unittests.cfg
> @@ -206,3 +206,10 @@ extra_params = -cpu kvm64,hv_time,hv_synic,hv_stimer -device hyperv-testdev
>  file = hyperv_clock.flat
>  smp = 2
>  extra_params = -cpu kvm64,hv_time
> +
> +[intel_iommu]
> +file = intel-iommu.flat
> +arch = x86_64
> +timeout = 30
> +smp = 4
> +extra_params = -M pc-q35-2.7,kernel-irqchip=split -device intel-iommu,intremap=on -device edu
> -- 
> 2.7.4
> 

Hmm, this is the first we've worried about this in kvm-unit-tests (And
I'm not sure we should start). Usually we just want the tests to work
with latest KVM and latest QEMU, or to graceful fail with
report_xfail/skip/abort when it can determine that it's missing something
it needs.

Thanks,
drew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 12/17] pci: add bdf helpers
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 12/17] pci: add bdf helpers Peter Xu
@ 2016-11-04 17:51   ` Andrew Jones
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 17:51 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:15PM +0800, Peter Xu wrote:
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/pci.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/lib/pci.h b/lib/pci.h
> index 0336455..7a3bce9 100644
> --- a/lib/pci.h
> +++ b/lib/pci.h
> @@ -18,6 +18,9 @@ enum {
>  #define PCI_BAR_NUM                     (6)
>  #define PCI_DEVFN_MAX                   (256)
>  
> +#define PCI_BDF_GET_DEVFN(x)            ((x) & 0xff)
> +#define PCI_BDF_GET_BUS(x)              (((x) >> 8) & 0xff)

I feel the above four defines (PCI_BAR_NUM, PCI_DEVFN_MAX,
PCI_BDF_GET_DEVFN, PCI_BDF_GET_BUS) are on the crufty side, but I
won't fight too hard to try to convince you to drop them :-)

> +
>  struct pci_dev {
>  	uint16_t pci_bdf;
>  	phys_addr_t pci_bar[PCI_BAR_NUM];
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 14/17] x86: intel-iommu: add dmar test
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 14/17] x86: intel-iommu: add dmar test Peter Xu
@ 2016-11-04 17:53   ` Andrew Jones
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Jones @ 2016-11-04 17:53 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:17PM +0800, Peter Xu wrote:
> DMAR test is based on QEMU edu device. A 4B DMA memory copy is carried
> out as the simplest DMAR test.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/pci.h             |   2 +
>  lib/x86/intel-iommu.c | 131 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  lib/x86/intel-iommu.h |  23 +++++++++
>  x86/Makefile.common   |   1 +
>  x86/intel-iommu.c     |  52 ++++++++++++++++++++
>  5 files changed, 209 insertions(+)
> 
> diff --git a/lib/pci.h b/lib/pci.h
> index 7a3bce9..bc6b91a 100644
> --- a/lib/pci.h
> +++ b/lib/pci.h
> @@ -31,6 +31,8 @@ void pci_scan_bars(struct pci_dev *dev);
>  void pci_set_master(struct pci_dev *dev, int master);
>  int pci_enable_defaults(struct pci_dev *dev);
>  
> +typedef phys_addr_t iova_t;
> +
>  extern bool pci_probe(void);
>  extern void pci_print(void);
>  extern bool pci_dev_exists(pcidevaddr_t dev);
> diff --git a/lib/x86/intel-iommu.c b/lib/x86/intel-iommu.c
> index 02eaf39..b680453 100644
> --- a/lib/x86/intel-iommu.c
> +++ b/lib/x86/intel-iommu.c
> @@ -11,6 +11,42 @@
>   */
>  
>  #include "intel-iommu.h"
> +#include "libcflat.h"
> +
> +/*
> + * VT-d in QEMU currently only support 39 bits address width, which is
> + * 3-level translation.
> + */
> +#define VTD_PAGE_LEVEL      (3)
> +#define VTD_CE_AW_39BIT     (0x1)
> +
> +typedef uint64_t vtd_pte_t;
> +
> +struct vtd_root_entry {
> +	/* Quad 1 */
> +	uint64_t present:1;
> +	uint64_t __reserved:11;
> +	uint64_t context_table_p:52;
> +	/* Quad 2 */
> +	uint64_t __reserved_2;
> +} __attribute__ ((packed));
> +typedef struct vtd_root_entry vtd_re_t;
> +
> +struct vtd_context_entry {
> +	/* Quad 1 */
> +	uint64_t present:1;
> +	uint64_t disable_fault_report:1;
> +	uint64_t trans_type:2;
> +	uint64_t __reserved:8;
> +	uint64_t slptptr:52;
> +	/* Quad 2 */
> +	uint64_t addr_width:3;
> +	uint64_t __ignore:4;
> +	uint64_t __reserved_2:1;
> +	uint64_t domain_id:16;
> +	uint64_t __reserved_3:40;
> +} __attribute__ ((packed));
> +typedef struct vtd_context_entry vtd_ce_t;
>  
>  #define ONE_BIT_ONLY(x) ((x) && !((x) & ((x) - 1)))
>  
> @@ -74,6 +110,101 @@ static void vtd_setup_ir_table(void)
>  	printf("IR table address: 0x%016lx\n", vtd_ir_table());
>  }
>  
> +static void vtd_install_pte(vtd_pte_t *root, iova_t iova,
> +			    phys_addr_t pa, int level_target)
> +{
> +	int level;
> +	unsigned int offset;
> +	void *page;
> +
> +	for (level = VTD_PAGE_LEVEL; level > level_target; level--) {
> +		offset = PGDIR_OFFSET(iova, level);
> +		if (!(root[offset] & VTD_PTE_RW)) {
> +			page = alloc_page();
> +			memset(page, 0, PAGE_SIZE);
> +			root[offset] = virt_to_phys(page) | VTD_PTE_RW;
> +		}
> +		root = (uint64_t *)(root[offset] & VTD_PTE_ADDR);
> +	}
> +
> +	offset = PGDIR_OFFSET(iova, level);
> +	root[offset] = pa | VTD_PTE_RW;
> +	if (level != 1)
> +		/* This is huge page */
> +		root[offset] |= VTD_PTE_HUGE;
> +}
> +
> +#define  VTD_FETCH_VIRT_ADDR(x) \
> +	((void *)(((uint64_t)phys_to_virt(x)) >> PAGE_SHIFT))
> +
> +/**
> + * vtd_map_range: setup IO address mapping for specific memory range
> + *
> + * @sid: source ID of the device to setup
> + * @iova: start IO virtual address
> + * @pa: start physical address
> + * @size: size of the mapping area
> + */
> +void vtd_map_range(uint16_t sid, iova_t iova, phys_addr_t pa, size_t size)
> +{
> +	uint8_t bus_n, devfn;
> +	void *slptptr;
> +	vtd_ce_t *ce;
> +	vtd_re_t *re = phys_to_virt(vtd_root_table());
> +
> +	assert(IS_ALIGNED(iova, SZ_4K));
> +	assert(IS_ALIGNED(pa, SZ_4K));
> +	assert(IS_ALIGNED(size, SZ_4K));
> +
> +	bus_n = PCI_BDF_GET_BUS(sid);
> +	devfn = PCI_BDF_GET_DEVFN(sid);
> +
> +	/* Point to the correct root entry */
> +	re += bus_n;
> +
> +	if (!re->present) {
> +		ce = alloc_page();
> +		memset(ce, 0, PAGE_SIZE);
> +		memset(re, 0, sizeof(*re));
> +		re->context_table_p = virt_to_phys(ce) >> PAGE_SHIFT;
> +		re->present = 1;
> +		printf("allocated vt-d root entry for PCI bus %d\n",
> +		       bus_n);
> +	} else
> +		ce = VTD_FETCH_VIRT_ADDR(re->context_table_p);
> +
> +	/* Point to the correct context entry */
> +	ce += devfn;
> +
> +	if (!ce->present) {
> +		slptptr = alloc_page();
> +		memset(slptptr, 0, PAGE_SIZE);
> +		memset(ce, 0, sizeof(*ce));
> +		/* To make it simple, domain ID is the same as SID */
> +		ce->domain_id = sid;
> +		/* We only test 39 bits width case (3-level paging) */
> +		ce->addr_width = VTD_CE_AW_39BIT;
> +		ce->slptptr = virt_to_phys(slptptr) >> PAGE_SHIFT;
> +		ce->trans_type = VTD_CONTEXT_TT_MULTI_LEVEL;
> +		ce->present = 1;
> +		/* No error reporting yet */
> +		ce->disable_fault_report = 1;
> +		printf("allocated vt-d context entry for devfn 0x%x\n",
> +		       devfn);
> +	} else
> +		slptptr = VTD_FETCH_VIRT_ADDR(ce->slptptr);
> +
> +	while (size) {
> +		/* TODO: currently we only map 4K pages (level = 1) */
> +		printf("map 4K page IOVA 0x%lx to 0x%lx (sid=0x%04x)\n",
> +		       iova, pa, sid);
> +		vtd_install_pte(slptptr, iova, pa, 1);
> +		size -= PAGE_SIZE;
> +		iova += PAGE_SIZE;
> +		pa += PAGE_SIZE;
> +	}
> +}
> +
>  void vtd_init(void)
>  {
>  	setup_vm();
> diff --git a/lib/x86/intel-iommu.h b/lib/x86/intel-iommu.h
> index 537ce62..40816bb 100644
> --- a/lib/x86/intel-iommu.h
> +++ b/lib/x86/intel-iommu.h
> @@ -20,6 +20,7 @@
>  #include "isr.h"
>  #include "smp.h"
>  #include "desc.h"
> +#include "pci.h"
>  
>  #define Q35_HOST_BRIDGE_IOMMU_ADDR  0xfed90000ULL
>  
> @@ -90,6 +91,27 @@
>  #define VTD_GCMD_ONE_SHOT_BITS  (VTD_GCMD_IR_TABLE | VTD_GCMD_WBF | \
>  				 VTD_GCMD_SFL | VTD_GCMD_ROOT)
>  
> +/* Supported Adjusted Guest Address Widths */
> +#define VTD_CAP_SAGAW_SHIFT         8
> + /* 39-bit AGAW, 3-level page-table */
> +#define VTD_CAP_SAGAW_39bit         (0x2ULL << VTD_CAP_SAGAW_SHIFT)
> + /* 48-bit AGAW, 4-level page-table */
> +#define VTD_CAP_SAGAW_48bit         (0x4ULL << VTD_CAP_SAGAW_SHIFT)
> +#define VTD_CAP_SAGAW               VTD_CAP_SAGAW_39bit
> +
> +/* Both 1G/2M huge pages */
> +#define VTD_CAP_SLLPS               ((1ULL << 34) | (1ULL << 35))
> +
> +#define VTD_CONTEXT_TT_MULTI_LEVEL  0
> +#define VTD_CONTEXT_TT_DEV_IOTLB    1
> +#define VTD_CONTEXT_TT_PASS_THROUGH 2
> +
> +#define VTD_PTE_R                   (1 << 0)
> +#define VTD_PTE_W                   (1 << 1)
> +#define VTD_PTE_RW                  (VTD_PTE_R | VTD_PTE_W)
> +#define VTD_PTE_ADDR                GENMASK_ULL(51, 12)
> +#define VTD_PTE_HUGE                (1 << 7)
> +
>  static inline void vtd_writel(unsigned int reg, uint32_t value)
>  {
>  	*(uint32_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg) = value;
> @@ -111,5 +133,6 @@ static inline uint64_t vtd_readq(unsigned int reg)
>  }
>  
>  void vtd_init(void);
> +void vtd_map_range(uint16_t sid, phys_addr_t iova, phys_addr_t pa, size_t size);
>  
>  #endif
> diff --git a/x86/Makefile.common b/x86/Makefile.common
> index 356d879..1dad18b 100644
> --- a/x86/Makefile.common
> +++ b/x86/Makefile.common
> @@ -3,6 +3,7 @@
>  all: test_cases
>  
>  cflatobjs += lib/pci.o
> +cflatobjs += lib/pci-edu.o
>  cflatobjs += lib/x86/io.o
>  cflatobjs += lib/x86/smp.o
>  cflatobjs += lib/x86/vm.o
> diff --git a/x86/intel-iommu.c b/x86/intel-iommu.c
> index f247913..26b39d1 100644
> --- a/x86/intel-iommu.c
> +++ b/x86/intel-iommu.c
> @@ -11,9 +11,50 @@
>   */
>  
>  #include "intel-iommu.h"
> +#include "pci-edu.h"
> +
> +#define VTD_TEST_DMAR_4B ("DMAR 4B memcpy test")
> +
> +void vtd_test_dmar(struct pci_edu_dev *dev)
> +{
> +	void *page = alloc_page();
> +
> +#define DMA_TEST_WORD (0x12345678)
> +	/* Modify the first 4 bytes of the page */
> +	*(uint32_t *)page = DMA_TEST_WORD;
> +
> +	/*
> +	 * Map the newly allocated page into IOVA address 0 (size 4K)
> +	 * of the device address space. Root entry and context entry
> +	 * will be automatically created when needed.
> +	 */
> +	vtd_map_range(dev->pci_dev.pci_bdf, 0, virt_to_phys(page), PAGE_SIZE);
> +
> +	/*
> +	 * DMA the first 4 bytes of the page to EDU device buffer
> +	 * offset 0.
> +	 */
> +	edu_dma(dev, 0, 4, 0, false);
> +	/*
> +	 * DMA the first 4 bytes of EDU device buffer into the page
> +	 * with offset 4 (so it'll be using 4-7 bytes).
> +	 */
> +	edu_dma(dev, 4, 4, 0, true);
> +
> +	/*
> +	 * Check data match between 0-3 bytes and 4-7 bytes of the
> +	 * page.
> +	 */
> +	report(VTD_TEST_DMAR_4B, *((uint32_t *)page + 1) == DMA_TEST_WORD);
> +
> +	free_page(page);
> +}
>  
>  int main(int argc, char *argv[])
>  {
> +	int ret;
> +	struct pci_edu_dev dev;
> +
>  	vtd_init();
>  
>  	report("fault status check", vtd_readl(DMAR_FSTS_REG) == 0);
> @@ -22,6 +63,17 @@ int main(int argc, char *argv[])
>  	report("IR table setup", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_IR_TABLE);
>  	report("DMAR enablement", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_DMAR);
>  	report("IR enablement", vtd_readl(DMAR_GSTS_REG) & VTD_GCMD_IR);
> +	report("DMAR support 39 bits address width",
> +	       vtd_readq(DMAR_CAP_REG) & VTD_CAP_SAGAW);
> +	report("DMAR support huge pages", vtd_readq(DMAR_CAP_REG) & VTD_CAP_SLLPS);
> +
> +	ret = edu_init(&dev);
> +	if (ret) {
> +		printf("Please specify \"-device edu\" to do "
> +		       "further IOMMU tests.\n");
> +		report_skip(VTD_TEST_DMAR_4B);
> +	} else
> +		vtd_test_dmar(&dev);
>  
>  	return report_summary();
>  }
> -- 
> 2.7.4

Test looks good to me, but I didn't attempt to review vtd_map_range.

drew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 09/17] pci: add pci_config_write[wb]()
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 09/17] pci: add pci_config_write[wb]() Peter Xu
  2016-11-04 16:59   ` Andrew Jones
@ 2016-11-05 17:06   ` Alexander Gordeev
  2016-11-07 17:25     ` Peter Xu
  1 sibling, 1 reply; 67+ messages in thread
From: Alexander Gordeev @ 2016-11-05 17:06 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, drjones, rkrcmar, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:12PM +0800, Peter Xu wrote:
> We have writel() for 4 bytes but still missing 2/1 bytes accessors.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  lib/pci-host-generic.c | 12 ++++++++++++
>  lib/x86/asm/pci.h      | 14 ++++++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/lib/pci-host-generic.c b/lib/pci-host-generic.c
> index be3f9e7..c04e4f5 100644
> --- a/lib/pci-host-generic.c
> +++ b/lib/pci-host-generic.c
> @@ -291,6 +291,18 @@ u32 pci_config_readl(pcidevaddr_t dev, u8 off)
>  	return readl(conf + off);
>  }
>  
> +void pci_config_writeb(pcidevaddr_t dev, u8 off, u8 val)
> +{
> +	void __iomem *conf = pci_get_dev_conf(pci_host_bridge, dev);
> +	writeb(val, conf + off);
> +}
> +
> +void pci_config_writew(pcidevaddr_t dev, u8 off, u16 val)
> +{
> +	void __iomem *conf = pci_get_dev_conf(pci_host_bridge, dev);
> +	writew(val, conf + off);
> +}
> +

The two above need to be declared in lib/asm-generic/pci-host-bridge.h

>  void pci_config_writel(pcidevaddr_t dev, u8 off, u32 val)
>  {
>  	void __iomem *conf = pci_get_dev_conf(pci_host_bridge, dev);
> diff --git a/lib/x86/asm/pci.h b/lib/x86/asm/pci.h
> index 7384b91..abfe5d1 100644
> --- a/lib/x86/asm/pci.h
> +++ b/lib/x86/asm/pci.h
> @@ -29,6 +29,20 @@ static inline uint32_t pci_config_readl(pcidevaddr_t dev, uint8_t reg)
>      return inl(0xCFC);
>  }
>  
> +static inline void pci_config_writeb(pcidevaddr_t dev, uint8_t reg,
> +				     uint8_t val)
> +{
> +    outl(PCI_CONF1_ADDRESS(dev, reg), 0xCF8);
> +    outb(val, 0xCFC);
> +}
> +
> +static inline void pci_config_writew(pcidevaddr_t dev, uint8_t reg,
> +				     uint32_t val)
> +{
> +    outl(PCI_CONF1_ADDRESS(dev, reg), 0xCF8);
> +    outw(val, 0xCFC);
> +}
> +
>  static inline void pci_config_writel(pcidevaddr_t dev, uint8_t reg, uint32_t val)
>  {
>      outl(PCI_CONF1_ADDRESS(dev, reg), 0xCF8);
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 01/17] x86: intel-iommu: add vt-d init test
  2016-11-04 16:12   ` Andrew Jones
@ 2016-11-07 16:32     ` Peter Xu
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Xu @ 2016-11-07 16:32 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Fri, Nov 04, 2016 at 05:12:04PM +0100, Andrew Jones wrote:

[...]

> > +#include "intel-iommu.h"
> > +
> > +#define ONE_BIT_ONLY(x) ((x) && !((x) & ((x) - 1)))
> 
> How about instead adding
> 
>  static inline bool is_power_of_2(unsigned long n)
>  {
>      return (n && !(n & (n - 1)));
>  }
> 
> to libcflat.h, because there's a few other places I've needed
> this check myself.

Sure. Will do. 

[...]

> > +static void vtd_gcmd_or(uint32_t cmd)
> > +{
> > +	uint32_t status;
> > +
> > +	/* We only allow set one bit for each time */
> > +	assert(ONE_BIT_ONLY(cmd));
> > +
> > +	status = vtd_readl(DMAR_GSTS_REG);
> > +	vtd_writel(DMAR_GCMD_REG, status | cmd);
> > +
> > +	if (cmd & VTD_GCMD_ONE_SHOT_BITS)
> > +		/* One-shot bits are taking effect immediately */
> > +		return;
> 
> Please use {} when adding comments between an if and its one line.

Will fix.

Thanks!

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 05/17] x86/asm: add cpu_relax()
  2016-11-04 16:18   ` Andrew Jones
@ 2016-11-07 16:40     ` Peter Xu
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Xu @ 2016-11-07 16:40 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Fri, Nov 04, 2016 at 05:18:29PM +0100, Andrew Jones wrote:
> On Wed, Oct 26, 2016 at 03:47:08PM +0800, Peter Xu wrote:
> > This will be useful to be put inside loops.
> > 
> > Suggested-by: Andrew Jones <drjones@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  lib/x86/asm/barrier.h | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/lib/x86/asm/barrier.h b/lib/x86/asm/barrier.h
> > index 7c108bd..193fb4c 100644
> > --- a/lib/x86/asm/barrier.h
> > +++ b/lib/x86/asm/barrier.h
> > @@ -13,4 +13,15 @@
> >  #define smp_rmb()	barrier()
> >  #define smp_wmb()	barrier()
> >  
> > +/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
> > +static inline void rep_nop(void)
> > +{
> > +	asm volatile("rep; nop" ::: "memory");
> > +}
> > +
> > +static inline void cpu_relax(void)
> > +{
> > +	rep_nop();
> > +}
> > +
> >  #endif
> > -- 
> > 2.7.4
> 
> Reviewed-by: Andrew Jones <drjones@redhat.com>
> 
> But this should be the first patch of the series, because your current
> first patch already introduces one busy-wait loop where it could be used.

I missed that patch. Will fix and reorder. Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev
  2016-11-04 16:41   ` Andrew Jones
@ 2016-11-07 17:05     ` Peter Xu
  2016-11-07 18:02       ` Andrew Jones
  2016-11-08 12:27     ` Alexander Gordeev
  1 sibling, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-11-07 17:05 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Fri, Nov 04, 2016 at 05:41:01PM +0100, Andrew Jones wrote:

[...]

> > +void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf)
> > +{
> > +       memset(dev, 0, sizeof(*dev));
> > +       dev->pci_bdf = bdf;
> 
> Hmm, bdf means bus-device-function, correct? We're not passing a "bdf"
> in as the second argument to this function though. We're only passing
> devid, assuming bus=0, fn=0. If you'd prefer we stop the bus,fn zero
> assumption, then I think a precursor patch to this should be to change
> our current handle type, pcidevaddr_t, to a "bdf_t".

I thought pcidevaddr_t means bdf? That's a wild guess of me from its
size, as it's defined as:

  typedef uint16_t pcidevaddr_t;

uint16_t makes it looks more like bdf rather than devfn. For devfn,
uint8_t suites better.

And what I mean here is to pass in the bdf of the device, not devfn
only (in our case, devfn is always bdf though, since we are only
supporting bus number zero).

> 
> > +}
> > +
> >  /* Scan bus look for a specific device. Only bus 0 scanned for now. */
> > -pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id)
> > +int pci_find_dev(struct pci_dev *pci_dev, uint16_t vendor_id, uint16_t device_id)
> >  {
> >  	pcidevaddr_t dev;
> >  
> > -	for (dev = 0; dev < 256; ++dev) {
> > +	for (dev = 0; dev < PCI_DEVFN_MAX; ++dev) {
> 
> Why introduce this PCI_DEVFN_MAX define?

Because I think this value might be re-used in the future in other
parts of the codes, so I defined a macro instead of using the raw
number here.

> 
> >  		if (pci_config_readw(dev, PCI_VENDOR_ID) == vendor_id &&
> > -		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id)
> > -			return dev;
> > +		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id) {
> > +			pci_dev_init(pci_dev, dev);
> > +			return 0;
> > +		}
> >  	}
> >  
> > -	return PCIDEVADDR_INVALID;
> > +	return -1;
> 
> Why not use bool for the ret type; true=good, false=bad?

True/false looks more direct, but zero/non-zero wins when in the
future we want to add more error codes than -1. Both work for me. :)

[...]

> > +struct pci_dev {
> > +	uint16_t pci_bdf;
> 
> No need for 'pci_' in the bdf name. It's whole name already has it,
> pci_dev.bdf

Yes, actually this is something of my own "bad" habit. I used to add
xxx_ prefix for better tag generations. When I read some big C repos
like Linux, I see lots of duplicated field names for different
structs/unions. For example, when I want to see codes related to
StructA.FieldX, cscope will always tell me something more than this
(rather than giving me StructA.FieldX, it gives me
Struct[ABCDE..].FieldX). Then I need to pick out where StructA is.
That's sometimes annoying, which depends on the length of the tag
list. So I prefer to add this kind of prefix in the codes.

Actually when I read some other codes (IIUC FreeBSD kernel), I see
that this "bad" technique is used as well with very short prefix,
e.g., for a struct called "net_filter" and field called "users", it
may be defined as:

  net_filter.nf_users

So the "nf_" prefix is the short form of "net_filter", and it can be
used to distinguish the net_filter "users" fields with other "users"
ones.

Here "pci" is short enough IMHO, so I prefixed it.

Not sure whether it's a good idea though.

[...]

> I'm OK with introducing the pci_dev struct, but let's not switch over to
> it from our other handle for everything all at once. I'd rather only
> switch functions that will need more state than just devid/bdf. Anywhere
> we never do then I think expecting the caller to call like
> pci_foo(pcidev->bdf, ...) is reasonable.

Agree. That's some point I want to achieve. E.g., for
pci_dev_exists(), I kept the old interface since it suites better
IMHO. Also for the pci_config_*() APIs as mentioned in the commit
message.

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 07/17] pci: provide pci_scan_bars()
  2016-11-04 16:47   ` Andrew Jones
@ 2016-11-07 17:16     ` Peter Xu
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Xu @ 2016-11-07 17:16 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Fri, Nov 04, 2016 at 05:47:20PM +0100, Andrew Jones wrote:
> On Wed, Oct 26, 2016 at 03:47:10PM +0800, Peter Xu wrote:
> > Let's provide a more general way to scan PCI bars, rather than read the
> > config registers every time.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  lib/pci.c | 14 ++++++++++++++
> >  lib/pci.h |  3 +++
> >  2 files changed, 17 insertions(+)
> > 
> > diff --git a/lib/pci.c b/lib/pci.c
> > index bb41e0f..95a6581 100644
> > --- a/lib/pci.c
> > +++ b/lib/pci.c
> > @@ -35,6 +35,20 @@ int pci_find_dev(struct pci_dev *pci_dev, uint16_t vendor_id, uint16_t device_id
> >  	return -1;
> >  }
> >  
> > +void pci_scan_bars(struct pci_dev *dev)
> > +{
> > +	int i = 0;
> > +
> > +	for (i = 0; i < PCI_BAR_NUM; i++) {
> 
> I don't think we need a PCI_BAR_NUM def, but if you do introduce it
> then please change all uses of '6' to it that should be changed.
> Same goes for PCI_DEVFN_MAX.

Then I will replace corresponding number "6" into PCI_BAR_NUM when
necessary. Same for PCI_DEVFN_MAX.

> 
> > +		if (!pci_bar_is_valid(dev, i))
> > +			continue;
> > +		dev->pci_bar[i] = pci_bar_get_addr(dev, i);
> > +		printf("PCI: init dev 0x%04x BAR %d [%s] addr 0x%lx\n",
> > +		       dev->pci_bdf, i, pci_bar_is_memory(dev, i) ?
> > +		       "MEM" : "IO", dev->pci_bar[i]);
> 
> v9 of Alex's series has introduced pci_bar_print. You'll want to
> use that here after modifying it to get the bars from struct.

Ok. Will call pci_bar_print() here.

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 09/17] pci: add pci_config_write[wb]()
  2016-11-05 17:06   ` Alexander Gordeev
@ 2016-11-07 17:25     ` Peter Xu
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Xu @ 2016-11-07 17:25 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: kvm, drjones, rkrcmar, jan.kiszka, pbonzini

On Sat, Nov 05, 2016 at 06:06:49PM +0100, Alexander Gordeev wrote:
> On Wed, Oct 26, 2016 at 03:47:12PM +0800, Peter Xu wrote:
> > We have writel() for 4 bytes but still missing 2/1 bytes accessors.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  lib/pci-host-generic.c | 12 ++++++++++++
> >  lib/x86/asm/pci.h      | 14 ++++++++++++++
> >  2 files changed, 26 insertions(+)
> > 
> > diff --git a/lib/pci-host-generic.c b/lib/pci-host-generic.c
> > index be3f9e7..c04e4f5 100644
> > --- a/lib/pci-host-generic.c
> > +++ b/lib/pci-host-generic.c
> > @@ -291,6 +291,18 @@ u32 pci_config_readl(pcidevaddr_t dev, u8 off)
> >  	return readl(conf + off);
> >  }
> >  
> > +void pci_config_writeb(pcidevaddr_t dev, u8 off, u8 val)
> > +{
> > +	void __iomem *conf = pci_get_dev_conf(pci_host_bridge, dev);
> > +	writeb(val, conf + off);
> > +}
> > +
> > +void pci_config_writew(pcidevaddr_t dev, u8 off, u16 val)
> > +{
> > +	void __iomem *conf = pci_get_dev_conf(pci_host_bridge, dev);
> > +	writew(val, conf + off);
> > +}
> > +
> 
> The two above need to be declared in lib/asm-generic/pci-host-bridge.h

Will add them. Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 10/17] pci: provide pci_set_master()
  2016-11-04 17:04   ` Andrew Jones
@ 2016-11-07 17:35     ` Peter Xu
  2016-11-07 17:59       ` Andrew Jones
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-11-07 17:35 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Fri, Nov 04, 2016 at 06:04:56PM +0100, Andrew Jones wrote:
> On Wed, Oct 26, 2016 at 03:47:13PM +0800, Peter Xu wrote:
> > We need to call this whenever we want to do DMA with a device.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  lib/pci.c | 7 +++++++
> >  lib/pci.h | 1 +
> >  2 files changed, 8 insertions(+)
> > 
> > diff --git a/lib/pci.c b/lib/pci.c
> > index 95a6581..1495f61 100644
> > --- a/lib/pci.c
> > +++ b/lib/pci.c
> > @@ -7,6 +7,13 @@
> >  #include "pci.h"
> >  #include "asm/pci.h"
> >  
> > +void pci_set_master(struct pci_dev *dev, int master)
> 
> Why take pci_dev instead of just the devid/bdf here?
> What is the 'master' argument (which is unused and at least
> needs an __unused to compile) here for?

I think both works for me here, but when this happens, IMHO pci_dev
struct is prefered, since as long as the caller has one *dev pointer
(I think if we introduce pci_dev, people should always keep one object
for each PCI device they manipulate in kvm-unit-tests), I would just
let people to pass in "dev" rather than "dev->pci_bdf", since it's
shorter for typing, and also it avoids referencing internal fields.

Regarding to the "master" variable... Oops, I just missed that. What I
want to do is:

	if (master)
		val |= PCI_COMMAND_MASTER;
	else
		val &= ~PCI_COMMAND_MASTER;

Thanks for pointing out. Will fix.

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 13/17] pci: edu: introduce pci-edu helpers
  2016-11-04 17:18   ` Andrew Jones
@ 2016-11-07 17:41     ` Peter Xu
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Xu @ 2016-11-07 17:41 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Fri, Nov 04, 2016 at 06:18:56PM +0100, Andrew Jones wrote:

[...]

> > +static inline uint32_t edu_reg_read(struct pci_edu_dev *dev, int reg)
> 
> please add the 'l' this read function name

[...]

> > +static inline void edu_reg_write(struct pci_edu_dev *dev, int reg,
> > +				 uint32_t val)
> 
> please add the 'l'

Fixing both. 

> Otherwise
> 
> Reviewed-by: Andrew Jones <drjones@redhat.com>

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 13/17] pci: edu: introduce pci-edu helpers
  2016-11-04 17:24   ` Andrew Jones
@ 2016-11-07 17:44     ` Peter Xu
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Xu @ 2016-11-07 17:44 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Fri, Nov 04, 2016 at 06:24:48PM +0100, Andrew Jones wrote:
> On Wed, Oct 26, 2016 at 03:47:16PM +0800, Peter Xu wrote:
> > QEMU edu device is a pci device that is originally written for
> > educational purpose, however it also suits for IOMMU unit test. Adding
> > helpers for this specific device to implement the device logic.
> > 
> > The device supports lots of functions, here only DMA operation is
> > supported.
> > 
> > The spec of the device can be found at:
> > 
> >   https://github.com/qemu/qemu/blob/master/docs/specs/edu.txt
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  lib/pci-edu.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  lib/pci-edu.h | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 153 insertions(+)
> >  create mode 100644 lib/pci-edu.c
> >  create mode 100644 lib/pci-edu.h
> > 
> > diff --git a/lib/pci-edu.c b/lib/pci-edu.c
> > new file mode 100644
> > index 0000000..c594e08
> > --- /dev/null
> > +++ b/lib/pci-edu.c
> > @@ -0,0 +1,74 @@
> > +/*
> > + * Edu PCI device.
> > + *
> > + * Copyright (C) 2016 Red Hat, Inc.
> > + *
> > + * Authors:
> > + *   Peter Xu <peterx@redhat.com>,
> > + *
> > + * This work is licensed under the terms of the GNU LGPL, version 2 or
> > + * later.
> > + */
> > +
> > +#include "pci-edu.h"
> > +#include "asm/barrier.h"
> > +
> > +/* Return true if alive */
> > +static inline bool edu_check_alive(struct pci_edu_dev *dev)
> > +{
> > +	static uint32_t live_count = 1;
> > +	uint32_t value;
> > +
> > +	edu_reg_write(dev, EDU_REG_ALIVE, live_count++);
> > +	value = edu_reg_read(dev, EDU_REG_ALIVE);
> > +	return (live_count - 1 == ~value);
> > +}
> > +
> > +int edu_init(struct pci_edu_dev *dev)
> > +{
> > +	int ret;
> > +
> > +	ret = pci_find_dev(&dev->pci_dev, PCI_VENDOR_ID_QEMU,
> > +			   PCI_DEVICE_ID_EDU);
> > +	if (ret)
> > +		return ret;
> 
> Another comment: as I mentioned earlier, I think pci_find_dev should
> return a bool. Here too, i.e. edu_init returns true when init
> succeeds, else false. The only reason to go with a zero success,
> non-zero (negative) failure is if multiple error codes will be
> introduced. I don't think that'll happen for kvm-unit-tests though.

Ok. Will change pci_find_dev's return code into bool. Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 15/17] pci: add msi support for 32/64bit address
  2016-11-04 17:33   ` Andrew Jones
@ 2016-11-07 17:58     ` Peter Xu
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Xu @ 2016-11-07 17:58 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Fri, Nov 04, 2016 at 06:33:29PM +0100, Andrew Jones wrote:

[...]

> > +static pci_cap_handler cap_handlers[0xff] = {
> 
> Why 0xff? Shouldn't it be (PCI_CAP_ID_MAX + 1)?

Yes. Will fix.

[...]

> > +int pci_setup_msi(struct pci_dev *dev, uint64_t msi_addr, uint32_t msi_data)
> > +{
> > +	uint16_t msi_control;
> > +	uint16_t offset;
> > +	pcidevaddr_t addr = dev->pci_bdf;
> > +
> > +	assert(dev);
> > +
> > +	if (!dev->msi_offset) {
> > +		printf("MSI: dev 0x%x does not support MSI.\n", addr);
> > +		return -1;
> > +	}
> > +
> > +	offset = dev->msi_offset;
> > +	msi_control = pci_config_readw(addr, offset + PCI_MSI_FLAGS);
> > +	pci_config_writel(addr, offset + PCI_MSI_ADDRESS_LO,
> > +			  msi_addr & 0xffffffff);
> > +
> > +	if (msi_control & PCI_MSI_FLAGS_64BIT) {
> > +		pci_config_writel(addr, offset + PCI_MSI_ADDRESS_HI,
> > +				  (uint32_t)(msi_addr >> 32));
> > +		pci_config_writel(addr, offset + PCI_MSI_DATA_64, msi_data);
> > +		printf("MSI: dev 0x%x init 64bit address: ", addr);
> > +	} else {
> > +		pci_config_writel(addr, offset + PCI_MSI_DATA_32, msi_data);
> > +		printf("MSI: dev 0x%x init 32bit address: ", addr);
> > +	}
> > +	printf("addr=0x%lx, data=0x%x\n", msi_addr, msi_data);
> > +
> > +	msi_control |= PCI_MSI_FLAGS_ENABLE;
> > +	pci_config_writew(addr, offset + PCI_MSI_FLAGS, msi_control);
> > +
> > +	return 0;
> > +}
> 
> Again we only have 0/-1 for good/bad. I'd prefer true/false.

Will use bool as return code.

> Otherwise
> 
> Reviewed-by: Andrew Jones <drjones@redhat.com> 

Thanks!

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 10/17] pci: provide pci_set_master()
  2016-11-07 17:35     ` Peter Xu
@ 2016-11-07 17:59       ` Andrew Jones
  2016-11-07 19:45         ` Peter Xu
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Jones @ 2016-11-07 17:59 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Tue, Nov 08, 2016 at 01:35:02AM +0800, Peter Xu wrote:
> On Fri, Nov 04, 2016 at 06:04:56PM +0100, Andrew Jones wrote:
> > On Wed, Oct 26, 2016 at 03:47:13PM +0800, Peter Xu wrote:
> > > We need to call this whenever we want to do DMA with a device.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  lib/pci.c | 7 +++++++
> > >  lib/pci.h | 1 +
> > >  2 files changed, 8 insertions(+)
> > > 
> > > diff --git a/lib/pci.c b/lib/pci.c
> > > index 95a6581..1495f61 100644
> > > --- a/lib/pci.c
> > > +++ b/lib/pci.c
> > > @@ -7,6 +7,13 @@
> > >  #include "pci.h"
> > >  #include "asm/pci.h"
> > >  
> > > +void pci_set_master(struct pci_dev *dev, int master)
> > 
> > Why take pci_dev instead of just the devid/bdf here?
> > What is the 'master' argument (which is unused and at least
> > needs an __unused to compile) here for?
> 
> I think both works for me here, but when this happens, IMHO pci_dev
> struct is prefered, since as long as the caller has one *dev pointer
> (I think if we introduce pci_dev, people should always keep one object
> for each PCI device they manipulate in kvm-unit-tests), I would just
> let people to pass in "dev" rather than "dev->pci_bdf", since it's
> shorter for typing, and also it avoids referencing internal fields.
> 
> Regarding to the "master" variable... Oops, I just missed that. What I
> want to do is:
> 
> 	if (master)
> 		val |= PCI_COMMAND_MASTER;
> 	else
> 		val &= ~PCI_COMMAND_MASTER;
> 
> Thanks for pointing out. Will fix.


Actually, maybe we can just drop this patch. It's just

 pci_config_writew(dev->bsd, PCI_COMMAND,
       pci_config_readw(dev->bsd, PCI_COMMAND) | PCI_COMMAND_MASTER);
                                               ^ change op as needed

Long, but no big deal... If updating PCI_COMMAND is a common thing
to do then we should have

 pci_command_set(dev, bits)
 pci_command_clr(dev, bits)

operators instead.

drew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev
  2016-11-07 17:05     ` Peter Xu
@ 2016-11-07 18:02       ` Andrew Jones
  2016-11-07 19:42         ` Peter Xu
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Jones @ 2016-11-07 18:02 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Tue, Nov 08, 2016 at 01:05:11AM +0800, Peter Xu wrote:
> On Fri, Nov 04, 2016 at 05:41:01PM +0100, Andrew Jones wrote:
> 
> [...]
> 
> > > +void pci_dev_init(struct pci_dev *dev, pcidevaddr_t bdf)
> > > +{
> > > +       memset(dev, 0, sizeof(*dev));
> > > +       dev->pci_bdf = bdf;
> > 
> > Hmm, bdf means bus-device-function, correct? We're not passing a "bdf"
> > in as the second argument to this function though. We're only passing
> > devid, assuming bus=0, fn=0. If you'd prefer we stop the bus,fn zero
> > assumption, then I think a precursor patch to this should be to change
> > our current handle type, pcidevaddr_t, to a "bdf_t".
> 
> I thought pcidevaddr_t means bdf? That's a wild guess of me from its
> size, as it's defined as:
> 
>   typedef uint16_t pcidevaddr_t;
> 
> uint16_t makes it looks more like bdf rather than devfn. For devfn,
> uint8_t suites better.
> 
> And what I mean here is to pass in the bdf of the device, not devfn
> only (in our case, devfn is always bdf though, since we are only
> supporting bus number zero).

see lib/x86/asm/pci.h:pci_config_read how 'dev' gets used, it's only dev.

> 
> > 
> > > +}
> > > +
> > >  /* Scan bus look for a specific device. Only bus 0 scanned for now. */
> > > -pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id)
> > > +int pci_find_dev(struct pci_dev *pci_dev, uint16_t vendor_id, uint16_t device_id)
> > >  {
> > >  	pcidevaddr_t dev;
> > >  
> > > -	for (dev = 0; dev < 256; ++dev) {
> > > +	for (dev = 0; dev < PCI_DEVFN_MAX; ++dev) {
> > 
> > Why introduce this PCI_DEVFN_MAX define?
> 
> Because I think this value might be re-used in the future in other
> parts of the codes, so I defined a macro instead of using the raw
> number here.
> 
> > 
> > >  		if (pci_config_readw(dev, PCI_VENDOR_ID) == vendor_id &&
> > > -		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id)
> > > -			return dev;
> > > +		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id) {
> > > +			pci_dev_init(pci_dev, dev);
> > > +			return 0;
> > > +		}
> > >  	}
> > >  
> > > -	return PCIDEVADDR_INVALID;
> > > +	return -1;
> > 
> > Why not use bool for the ret type; true=good, false=bad?
> 
> True/false looks more direct, but zero/non-zero wins when in the
> future we want to add more error codes than -1. Both work for me. :)
> 
> [...]
> 
> > > +struct pci_dev {
> > > +	uint16_t pci_bdf;
> > 
> > No need for 'pci_' in the bdf name. It's whole name already has it,
> > pci_dev.bdf
> 
> Yes, actually this is something of my own "bad" habit. I used to add
> xxx_ prefix for better tag generations. When I read some big C repos
> like Linux, I see lots of duplicated field names for different
> structs/unions. For example, when I want to see codes related to
> StructA.FieldX, cscope will always tell me something more than this
> (rather than giving me StructA.FieldX, it gives me
> Struct[ABCDE..].FieldX). Then I need to pick out where StructA is.
> That's sometimes annoying, which depends on the length of the tag
> list. So I prefer to add this kind of prefix in the codes.
> 
> Actually when I read some other codes (IIUC FreeBSD kernel), I see
> that this "bad" technique is used as well with very short prefix,
> e.g., for a struct called "net_filter" and field called "users", it
> may be defined as:
> 
>   net_filter.nf_users
> 
> So the "nf_" prefix is the short form of "net_filter", and it can be
> used to distinguish the net_filter "users" fields with other "users"
> ones.
> 
> Here "pci" is short enough IMHO, so I prefixed it.
> 
> Not sure whether it's a good idea though.

Seems like overkill for kvm-unit-tests.

> 
> [...]
> 
> > I'm OK with introducing the pci_dev struct, but let's not switch over to
> > it from our other handle for everything all at once. I'd rather only
> > switch functions that will need more state than just devid/bdf. Anywhere
> > we never do then I think expecting the caller to call like
> > pci_foo(pcidev->bdf, ...) is reasonable.
> 
> Agree. That's some point I want to achieve. E.g., for
> pci_dev_exists(), I kept the old interface since it suites better
> IMHO. Also for the pci_config_*() APIs as mentioned in the commit
> message.

Let's keep the passing of the struct down to the minimum.

Thanks,
drew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 17/17] x86/unittests: add intel-iommu test
  2016-11-04 17:46   ` Andrew Jones
@ 2016-11-07 18:06     ` Peter Xu
  2016-11-08 10:39       ` Andrew Jones
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-11-07 18:06 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Fri, Nov 04, 2016 at 06:46:23PM +0100, Andrew Jones wrote:
> On Wed, Oct 26, 2016 at 03:47:20PM +0800, Peter Xu wrote:
> > I used pc-q35-2.7 instead of default q35 since I'd better make sure both
> > QEMU 2.7/2.8 can run the test, which is to at least satisfy the
> > following matrix:
> > 
> >   |----------+-----------+------------|
> >   | versions | KVM < 4.7 | KVM >= 4.7 |
> >   |----------+-----------+------------|
> >   | QEMU 2.7 |           |            |
> >   | QEMU 2.8 |           |            |
> >   |----------+-----------+------------|
> > 
> > If using default q35, QEMU2.8+KVM<4.7 will fail to run.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  x86/unittests.cfg | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/x86/unittests.cfg b/x86/unittests.cfg
> > index f760d8f..d59cf68 100644
> > --- a/x86/unittests.cfg
> > +++ b/x86/unittests.cfg
> > @@ -206,3 +206,10 @@ extra_params = -cpu kvm64,hv_time,hv_synic,hv_stimer -device hyperv-testdev
> >  file = hyperv_clock.flat
> >  smp = 2
> >  extra_params = -cpu kvm64,hv_time
> > +
> > +[intel_iommu]
> > +file = intel-iommu.flat
> > +arch = x86_64
> > +timeout = 30
> > +smp = 4
> > +extra_params = -M pc-q35-2.7,kernel-irqchip=split -device intel-iommu,intremap=on -device edu
> > -- 
> > 2.7.4
> > 
> 
> Hmm, this is the first we've worried about this in kvm-unit-tests (And
> I'm not sure we should start). Usually we just want the tests to work
> with latest KVM and latest QEMU, or to graceful fail with
> report_xfail/skip/abort when it can determine that it's missing something
> it needs.

For this test, the latest command line should be:

  -M q35,kernel-irqchip=split -device intel-iommu,intremap=on -device edu

However it will fail if kernel version <4.7 (without x2apic supported
KVM). No chance for a report_skip().

I can use eim=off to avoid that check, however, if with "eim=off", we have:

  -M q35,kernel-irqchip=split -device intel-iommu,intremap=on,eim=off -device edu

It requires QEMU to be latest (>2.8). Otherwise QEMU will report "eim
property not found" and we lost chance to report as well.

So, I just chose to use pc-q35-2.7 here. Not sure whether this is a
good idea.

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev
  2016-11-07 18:02       ` Andrew Jones
@ 2016-11-07 19:42         ` Peter Xu
  2016-11-08 10:16           ` Andrew Jones
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-11-07 19:42 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Mon, Nov 07, 2016 at 07:02:12PM +0100, Andrew Jones wrote:

[...]

> > I thought pcidevaddr_t means bdf? That's a wild guess of me from its
> > size, as it's defined as:
> > 
> >   typedef uint16_t pcidevaddr_t;
> > 
> > uint16_t makes it looks more like bdf rather than devfn. For devfn,
> > uint8_t suites better.
> > 
> > And what I mean here is to pass in the bdf of the device, not devfn
> > only (in our case, devfn is always bdf though, since we are only
> > supporting bus number zero).
> 
> see lib/x86/asm/pci.h:pci_config_read how 'dev' gets used, it's only dev.

But shouldn't "dev" the same as "bdf" here?

[...]

> > Yes, actually this is something of my own "bad" habit. I used to add
> > xxx_ prefix for better tag generations. When I read some big C repos
> > like Linux, I see lots of duplicated field names for different
> > structs/unions. For example, when I want to see codes related to
> > StructA.FieldX, cscope will always tell me something more than this
> > (rather than giving me StructA.FieldX, it gives me
> > Struct[ABCDE..].FieldX). Then I need to pick out where StructA is.
> > That's sometimes annoying, which depends on the length of the tag
> > list. So I prefer to add this kind of prefix in the codes.
> > 
> > Actually when I read some other codes (IIUC FreeBSD kernel), I see
> > that this "bad" technique is used as well with very short prefix,
> > e.g., for a struct called "net_filter" and field called "users", it
> > may be defined as:
> > 
> >   net_filter.nf_users
> > 
> > So the "nf_" prefix is the short form of "net_filter", and it can be
> > used to distinguish the net_filter "users" fields with other "users"
> > ones.
> > 
> > Here "pci" is short enough IMHO, so I prefixed it.
> > 
> > Not sure whether it's a good idea though.
> 
> Seems like overkill for kvm-unit-tests.

Ok. Let me remove all pci_ prefix for pci_dev struct fields.

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 10/17] pci: provide pci_set_master()
  2016-11-07 17:59       ` Andrew Jones
@ 2016-11-07 19:45         ` Peter Xu
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Xu @ 2016-11-07 19:45 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Mon, Nov 07, 2016 at 06:59:30PM +0100, Andrew Jones wrote:
> On Tue, Nov 08, 2016 at 01:35:02AM +0800, Peter Xu wrote:
> > On Fri, Nov 04, 2016 at 06:04:56PM +0100, Andrew Jones wrote:
> > > On Wed, Oct 26, 2016 at 03:47:13PM +0800, Peter Xu wrote:
> > > > We need to call this whenever we want to do DMA with a device.
> > > > 
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > >  lib/pci.c | 7 +++++++
> > > >  lib/pci.h | 1 +
> > > >  2 files changed, 8 insertions(+)
> > > > 
> > > > diff --git a/lib/pci.c b/lib/pci.c
> > > > index 95a6581..1495f61 100644
> > > > --- a/lib/pci.c
> > > > +++ b/lib/pci.c
> > > > @@ -7,6 +7,13 @@
> > > >  #include "pci.h"
> > > >  #include "asm/pci.h"
> > > >  
> > > > +void pci_set_master(struct pci_dev *dev, int master)
> > > 
> > > Why take pci_dev instead of just the devid/bdf here?
> > > What is the 'master' argument (which is unused and at least
> > > needs an __unused to compile) here for?
> > 
> > I think both works for me here, but when this happens, IMHO pci_dev
> > struct is prefered, since as long as the caller has one *dev pointer
> > (I think if we introduce pci_dev, people should always keep one object
> > for each PCI device they manipulate in kvm-unit-tests), I would just
> > let people to pass in "dev" rather than "dev->pci_bdf", since it's
> > shorter for typing, and also it avoids referencing internal fields.
> > 
> > Regarding to the "master" variable... Oops, I just missed that. What I
> > want to do is:
> > 
> > 	if (master)
> > 		val |= PCI_COMMAND_MASTER;
> > 	else
> > 		val &= ~PCI_COMMAND_MASTER;
> > 
> > Thanks for pointing out. Will fix.
> 
> 
> Actually, maybe we can just drop this patch. It's just
> 
>  pci_config_writew(dev->bsd, PCI_COMMAND,
>        pci_config_readw(dev->bsd, PCI_COMMAND) | PCI_COMMAND_MASTER);
>                                                ^ change op as needed
> 
> Long, but no big deal... If updating PCI_COMMAND is a common thing
> to do then we should have
> 
>  pci_command_set(dev, bits)
>  pci_command_clr(dev, bits)
> 
> operators instead.

Sounds good to me. Actually I think it does not hurt to provide a
pci_set_master(), even as a wrapper to pci_command_set(). But for now,
let me drop pci_set_master() in this series. Maybe I can use one
function pci_cmd_set_clr(dev, set_bits, clr_bits) for the two?

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev
  2016-11-07 19:42         ` Peter Xu
@ 2016-11-08 10:16           ` Andrew Jones
  2016-11-08 15:33             ` Peter Xu
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Jones @ 2016-11-08 10:16 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Mon, Nov 07, 2016 at 02:42:29PM -0500, Peter Xu wrote:
> On Mon, Nov 07, 2016 at 07:02:12PM +0100, Andrew Jones wrote:
> 
> [...]
> 
> > > I thought pcidevaddr_t means bdf? That's a wild guess of me from its
> > > size, as it's defined as:
> > > 
> > >   typedef uint16_t pcidevaddr_t;
> > > 
> > > uint16_t makes it looks more like bdf rather than devfn. For devfn,
> > > uint8_t suites better.
> > > 
> > > And what I mean here is to pass in the bdf of the device, not devfn
> > > only (in our case, devfn is always bdf though, since we are only
> > > supporting bus number zero).
> > 
> > see lib/x86/asm/pci.h:pci_config_read how 'dev' gets used, it's only dev.
> 
> But shouldn't "dev" the same as "bdf" here?

Hmm, yes. We're only shifting it by 8, not 11, so you're right. The word
'dev' in pcidevaddr_t threw me off and I didn't bother looking up CF8's
bits until just now. OK, let's try to get consistent naming going on,
and we need better comments (comments at all). I'd like to see one
briefly describing 'bdf' and then use that for the primary name, i.e.
rename pcidevaddr_t to something like 'pcibdfaddr_t' or whatever.

> 
> [...]
> 
> > > Yes, actually this is something of my own "bad" habit. I used to add
> > > xxx_ prefix for better tag generations. When I read some big C repos
> > > like Linux, I see lots of duplicated field names for different
> > > structs/unions. For example, when I want to see codes related to
> > > StructA.FieldX, cscope will always tell me something more than this
> > > (rather than giving me StructA.FieldX, it gives me
> > > Struct[ABCDE..].FieldX). Then I need to pick out where StructA is.
> > > That's sometimes annoying, which depends on the length of the tag
> > > list. So I prefer to add this kind of prefix in the codes.
> > > 
> > > Actually when I read some other codes (IIUC FreeBSD kernel), I see
> > > that this "bad" technique is used as well with very short prefix,
> > > e.g., for a struct called "net_filter" and field called "users", it
> > > may be defined as:
> > > 
> > >   net_filter.nf_users
> > > 
> > > So the "nf_" prefix is the short form of "net_filter", and it can be
> > > used to distinguish the net_filter "users" fields with other "users"
> > > ones.
> > > 
> > > Here "pci" is short enough IMHO, so I prefixed it.
> > > 
> > > Not sure whether it's a good idea though.
> > 
> > Seems like overkill for kvm-unit-tests.
> 
> Ok. Let me remove all pci_ prefix for pci_dev struct fields.
>

Thanks,
drew 

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 17/17] x86/unittests: add intel-iommu test
  2016-11-07 18:06     ` Peter Xu
@ 2016-11-08 10:39       ` Andrew Jones
  2016-11-08 15:57         ` Peter Xu
  0 siblings, 1 reply; 67+ messages in thread
From: Andrew Jones @ 2016-11-08 10:39 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Mon, Nov 07, 2016 at 01:06:27PM -0500, Peter Xu wrote:
> On Fri, Nov 04, 2016 at 06:46:23PM +0100, Andrew Jones wrote:
> > On Wed, Oct 26, 2016 at 03:47:20PM +0800, Peter Xu wrote:
> > > I used pc-q35-2.7 instead of default q35 since I'd better make sure both
> > > QEMU 2.7/2.8 can run the test, which is to at least satisfy the
> > > following matrix:
> > > 
> > >   |----------+-----------+------------|
> > >   | versions | KVM < 4.7 | KVM >= 4.7 |
> > >   |----------+-----------+------------|
> > >   | QEMU 2.7 |           |            |
> > >   | QEMU 2.8 |           |            |
> > >   |----------+-----------+------------|
> > > 
> > > If using default q35, QEMU2.8+KVM<4.7 will fail to run.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  x86/unittests.cfg | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/x86/unittests.cfg b/x86/unittests.cfg
> > > index f760d8f..d59cf68 100644
> > > --- a/x86/unittests.cfg
> > > +++ b/x86/unittests.cfg
> > > @@ -206,3 +206,10 @@ extra_params = -cpu kvm64,hv_time,hv_synic,hv_stimer -device hyperv-testdev
> > >  file = hyperv_clock.flat
> > >  smp = 2
> > >  extra_params = -cpu kvm64,hv_time
> > > +
> > > +[intel_iommu]
> > > +file = intel-iommu.flat
> > > +arch = x86_64
> > > +timeout = 30
> > > +smp = 4
> > > +extra_params = -M pc-q35-2.7,kernel-irqchip=split -device intel-iommu,intremap=on -device edu
> > > -- 
> > > 2.7.4
> > > 
> > 
> > Hmm, this is the first we've worried about this in kvm-unit-tests (And
> > I'm not sure we should start). Usually we just want the tests to work
> > with latest KVM and latest QEMU, or to graceful fail with
> > report_xfail/skip/abort when it can determine that it's missing something
> > it needs.
> 
> For this test, the latest command line should be:
> 
>   -M q35,kernel-irqchip=split -device intel-iommu,intremap=on -device edu
> 
> However it will fail if kernel version <4.7 (without x2apic supported
> KVM). No chance for a report_skip().
> 
> I can use eim=off to avoid that check, however, if with "eim=off", we have:
> 
>   -M q35,kernel-irqchip=split -device intel-iommu,intremap=on,eim=off -device edu
> 
> It requires QEMU to be latest (>2.8). Otherwise QEMU will report "eim
> property not found" and we lost chance to report as well.
> 
> So, I just chose to use pc-q35-2.7 here. Not sure whether this is a
> good idea.

I think we just shouldn't care. Either this test is run on the latest
kernel and qemu, or it will fail. Hopefully something ends up in test.log
that indicates it failed due to an old kernel or old qemu. Old qemu is
clear from the "eim property not found"

Alternatively we can build support into runtime to capture kernel, qemu
versions and compare them against new unittests.cfg {kvm,qemu}_minver
fields. That would be a different series though, and will require
extending mkstandalone as well. Feel free to add it to your TODO. If you
do, then I hope you like Bash :-)

Thanks,
drew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 01/17] x86: intel-iommu: add vt-d init test
  2016-10-26  7:47 ` [PATCH kvm-unit-tests 01/17] x86: intel-iommu: add vt-d init test Peter Xu
  2016-11-04 16:12   ` Andrew Jones
@ 2016-11-08 10:52   ` Alexander Gordeev
  2016-11-08 15:24     ` Peter Xu
  1 sibling, 1 reply; 67+ messages in thread
From: Alexander Gordeev @ 2016-11-08 10:52 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, drjones, rkrcmar, jan.kiszka, pbonzini

On Wed, Oct 26, 2016 at 03:47:04PM +0800, Peter Xu wrote:
> +static inline void vtd_writel(unsigned int reg, uint32_t value)
> +{
> +	*(uint32_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg) = value;
> +}
> +
> +static inline void vtd_writeq(unsigned int reg, uint64_t value)
> +{
> +	*(uint64_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg) = value;
> +}
> +
> +static inline uint32_t vtd_readl(unsigned int reg)
> +{
> +	return *(uint32_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg);
> +}
> +
> +static inline uint64_t vtd_readq(unsigned int reg)
> +{
> +	return *(uint64_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg);
> +}

Is it safe to use memory dereference here?
By contrast, Linux seems uses read*()/write*() accessors.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev
  2016-11-04 16:41   ` Andrew Jones
  2016-11-07 17:05     ` Peter Xu
@ 2016-11-08 12:27     ` Alexander Gordeev
  2016-11-08 15:48       ` Peter Xu
  1 sibling, 1 reply; 67+ messages in thread
From: Alexander Gordeev @ 2016-11-08 12:27 UTC (permalink / raw)
  To: Andrew Jones; +Cc: Peter Xu, kvm, rkrcmar, jan.kiszka, pbonzini

On Fri, Nov 04, 2016 at 05:41:01PM +0100, Andrew Jones wrote:
> >  /* Scan bus look for a specific device. Only bus 0 scanned for now. */
> > -pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id)
> > +int pci_find_dev(struct pci_dev *pci_dev, uint16_t vendor_id, uint16_t device_id)
> >  {
> >  	pcidevaddr_t dev;
> >  
> > -	for (dev = 0; dev < 256; ++dev) {
> > +	for (dev = 0; dev < PCI_DEVFN_MAX; ++dev) {
> 
> Why introduce this PCI_DEVFN_MAX define?
> 
> >  		if (pci_config_readw(dev, PCI_VENDOR_ID) == vendor_id &&
> > -		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id)
> > -			return dev;
> > +		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id) {
> > +			pci_dev_init(pci_dev, dev);
> > +			return 0;
> > +		}
> >  	}
> >  
> > -	return PCIDEVADDR_INVALID;
> > +	return -1;
> 
> Why not use bool for the ret type; true=good, false=bad?

Both ways look strange to me. I would leave pci_find_dev() as is (low-
level) and move pci_dev_init() outside - so a usage would be i.e.:

	dev = pci_find_dev(PCI_VENDOR_ID_REDHAT, PCI_DEVICE_ID_REDHAT_TEST);
	if (dev == PCIDEVADDR_INVALID)
		...
	pci_dev_init(&pci_dev, dev);

> Thanks,
> drew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 08/17] x86/vmexit: leverage pci_scan_bars()
  2016-11-04 16:54   ` Andrew Jones
@ 2016-11-08 13:43     ` Alexander Gordeev
  2016-11-08 15:55       ` Peter Xu
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Gordeev @ 2016-11-08 13:43 UTC (permalink / raw)
  To: Andrew Jones; +Cc: Peter Xu, kvm, rkrcmar, jan.kiszka, pbonzini

On Fri, Nov 04, 2016 at 05:54:34PM +0100, Andrew Jones wrote:
> On Wed, Oct 26, 2016 at 03:47:11PM +0800, Peter Xu wrote:
> > Since pci-testdev is a very specific device for QEMU, let's use the new
> > pci_scan_bars() helper, and selectively choose the bars we want.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  lib/pci.h    |  2 ++
> >  x86/vmexit.c | 23 ++++++++---------------
> >  2 files changed, 10 insertions(+), 15 deletions(-)
> > 
> > diff --git a/lib/pci.h b/lib/pci.h
> > index 56b49f0..e48045c 100644
> > --- a/lib/pci.h
> > +++ b/lib/pci.h
> > @@ -66,6 +66,8 @@ int pci_testdev(void);
> >   * pci-testdev supports at least three types of tests (via mmio and
> >   * portio BARs): no-eventfd, wildcard-eventfd and datamatch-eventfd
> >   */
> > +#define PCI_TESTDEV_BAR_MEM		0
> > +#define PCI_TESTDEV_BAR_IO		1
> 
> More defines that I think we could live without, but I'll try to stop
> complaining about self-documenting code...
> 
> >  #define PCI_TESTDEV_NUM_BARS		2
> >  #define PCI_TESTDEV_NUM_TESTS		3
> >  
> > diff --git a/x86/vmexit.c b/x86/vmexit.c
> > index 2736ab8..880466e 100644
> > --- a/x86/vmexit.c
> > +++ b/x86/vmexit.c
> > @@ -373,7 +373,6 @@ int main(int ac, char **av)
> >  	int i;
> >  	unsigned long membar = 0;
> >  	struct pci_dev pcidev;
> > -	int ret;
> >  
> >  	smp_init();
> >  	setup_vm();
> > @@ -386,20 +385,14 @@ int main(int ac, char **av)
> >  	pm_tmr_blk = fadt->pm_tmr_blk;
> >  	printf("PM timer port is %x\n", pm_tmr_blk);
> >  
> > -	ret = pci_find_dev(&pcidev, PCI_VENDOR_ID_REDHAT,
> > -			   PCI_DEVICE_ID_REDHAT_TEST);
> > -	if (ret == 0) {
> > -		for (i = 0; i < PCI_TESTDEV_NUM_BARS; i++) {
> > -			if (!pci_bar_is_valid(&pcidev, i)) {
> > -				continue;
> > -			}
> > -			if (pci_bar_is_memory(&pcidev, i)) {
> > -				membar = pci_bar_get_addr(&pcidev, i);
> > -				pci_test.memaddr = ioremap(membar, PAGE_SIZE);
> > -			} else {
> > -				pci_test.iobar = pci_bar_get_addr(&pcidev, i);
> > -			}
> > -		}
> > +	if (!pci_find_dev(&pcidev, PCI_VENDOR_ID_REDHAT,
> > +			  PCI_DEVICE_ID_REDHAT_TEST)) {
> > +		pci_scan_bars(&pcidev);

I think if you attempt to cache all viable information in a pci_dev
structure then pci_scan_bars() should be mandatory to call when the
pci_dev is initilized. IOW, it needs to be called in pci_dev_init().

> > +		assert(pci_bar_is_memory(&pcidev, PCI_TESTDEV_BAR_MEM));
> > +		assert(!pci_bar_is_memory(&pcidev, PCI_TESTDEV_BAR_IO));
> > +		membar = pcidev.pci_bar[PCI_TESTDEV_BAR_MEM];
> > +		pci_test.memaddr = ioremap(membar, PAGE_SIZE);
> > +		pci_test.iobar = pcidev.pci_bar[PCI_TESTDEV_BAR_IO];
> >  		printf("pci-testdev at 0x%x membar %lx iobar %x\n",
> >  		       pcidev.pci_bdf, membar, pci_test.iobar);
> >  	}
> > -- 
> > 2.7.4
> 
> I'm not sure what we gain by hardcoding the mapping of these bars, besides
> slightly simpler code, but lib/pci-testdev.c does it too, and it should be
> quite safe, so OK.
> 
> Reviewed-by: Andrew Jones <drjones@redhat.com>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 01/17] x86: intel-iommu: add vt-d init test
  2016-11-08 10:52   ` Alexander Gordeev
@ 2016-11-08 15:24     ` Peter Xu
  2016-11-08 17:40       ` Alexander Gordeev
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-11-08 15:24 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: kvm, drjones, rkrcmar, jan.kiszka, pbonzini

On Tue, Nov 08, 2016 at 11:52:35AM +0100, Alexander Gordeev wrote:
> On Wed, Oct 26, 2016 at 03:47:04PM +0800, Peter Xu wrote:
> > +static inline void vtd_writel(unsigned int reg, uint32_t value)
> > +{
> > +	*(uint32_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg) = value;
> > +}
> > +
> > +static inline void vtd_writeq(unsigned int reg, uint64_t value)
> > +{
> > +	*(uint64_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg) = value;
> > +}
> > +
> > +static inline uint32_t vtd_readl(unsigned int reg)
> > +{
> > +	return *(uint32_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg);
> > +}
> > +
> > +static inline uint64_t vtd_readq(unsigned int reg)
> > +{
> > +	return *(uint64_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg);
> > +}
> 
> Is it safe to use memory dereference here?
> By contrast, Linux seems uses read*()/write*() accessors.

Looks like Linux is using memory barrior and volatile keywords to do
the protection (build_mmio_read() for x86). I think at least I can add
volatile here for each read/write to make it safer. Do you think
that'll suffice here? E.g.:

  static inline uint64_t vtd_readq(unsigned int reg)
  {
      return *(volatile uint64_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg);
  }

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev
  2016-11-08 10:16           ` Andrew Jones
@ 2016-11-08 15:33             ` Peter Xu
  2016-11-08 17:27               ` Andrew Jones
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Xu @ 2016-11-08 15:33 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Tue, Nov 08, 2016 at 11:16:11AM +0100, Andrew Jones wrote:
> On Mon, Nov 07, 2016 at 02:42:29PM -0500, Peter Xu wrote:
> > On Mon, Nov 07, 2016 at 07:02:12PM +0100, Andrew Jones wrote:
> > 
> > [...]
> > 
> > > > I thought pcidevaddr_t means bdf? That's a wild guess of me from its
> > > > size, as it's defined as:
> > > > 
> > > >   typedef uint16_t pcidevaddr_t;
> > > > 
> > > > uint16_t makes it looks more like bdf rather than devfn. For devfn,
> > > > uint8_t suites better.
> > > > 
> > > > And what I mean here is to pass in the bdf of the device, not devfn
> > > > only (in our case, devfn is always bdf though, since we are only
> > > > supporting bus number zero).
> > > 
> > > see lib/x86/asm/pci.h:pci_config_read how 'dev' gets used, it's only dev.
> > 
> > But shouldn't "dev" the same as "bdf" here?
> 
> Hmm, yes. We're only shifting it by 8, not 11, so you're right. The word
> 'dev' in pcidevaddr_t threw me off and I didn't bother looking up CF8's
> bits until just now. OK, let's try to get consistent naming going on,
> and we need better comments (comments at all). I'd like to see one
> briefly describing 'bdf' and then use that for the primary name, i.e.
> rename pcidevaddr_t to something like 'pcibdfaddr_t' or whatever.

Yes, indeed the naming is a little bit confusing here. I was always
trying to use bdf/devfn in the whole series to avoid misunderstanding.
Regarding to the naming of pcidevaddr_t: that's okay for me, but I
agree that bdf is better and clearer. If you would not mind, I'll keep
it as it is for now for this series.

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev
  2016-11-08 12:27     ` Alexander Gordeev
@ 2016-11-08 15:48       ` Peter Xu
  2016-11-08 17:35         ` Andrew Jones
  2016-11-08 17:46         ` Alexander Gordeev
  0 siblings, 2 replies; 67+ messages in thread
From: Peter Xu @ 2016-11-08 15:48 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: Andrew Jones, kvm, rkrcmar, jan.kiszka, pbonzini

On Tue, Nov 08, 2016 at 01:27:31PM +0100, Alexander Gordeev wrote:
> On Fri, Nov 04, 2016 at 05:41:01PM +0100, Andrew Jones wrote:
> > >  /* Scan bus look for a specific device. Only bus 0 scanned for now. */
> > > -pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id)
> > > +int pci_find_dev(struct pci_dev *pci_dev, uint16_t vendor_id, uint16_t device_id)
> > >  {
> > >  	pcidevaddr_t dev;
> > >  
> > > -	for (dev = 0; dev < 256; ++dev) {
> > > +	for (dev = 0; dev < PCI_DEVFN_MAX; ++dev) {
> > 
> > Why introduce this PCI_DEVFN_MAX define?
> > 
> > >  		if (pci_config_readw(dev, PCI_VENDOR_ID) == vendor_id &&
> > > -		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id)
> > > -			return dev;
> > > +		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id) {
> > > +			pci_dev_init(pci_dev, dev);
> > > +			return 0;
> > > +		}
> > >  	}
> > >  
> > > -	return PCIDEVADDR_INVALID;
> > > +	return -1;
> > 
> > Why not use bool for the ret type; true=good, false=bad?
> 
> Both ways look strange to me. I would leave pci_find_dev() as is (low-
> level) and move pci_dev_init() outside - so a usage would be i.e.:
> 
> 	dev = pci_find_dev(PCI_VENDOR_ID_REDHAT, PCI_DEVICE_ID_REDHAT_TEST);
> 	if (dev == PCIDEVADDR_INVALID)
> 		...
> 	pci_dev_init(&pci_dev, dev);

IMHO actually it'll be nicer with:

  struct pci_dev *pci_find_dev(uint16_t vendor_id, uint16_t device_id);

But this needs dynamic allocation of memory, which might be an
overkill for kvm-unit-test. So I chose to allocate the pci_dev on
caller stack.

Anyway, I take the interface issue as a matter of taste.

How about I take Drew's suggestion here with true/false? Then if we
have any better idea with solid reasons, we can post seperate patch
for it.

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 08/17] x86/vmexit: leverage pci_scan_bars()
  2016-11-08 13:43     ` Alexander Gordeev
@ 2016-11-08 15:55       ` Peter Xu
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Xu @ 2016-11-08 15:55 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: Andrew Jones, kvm, rkrcmar, jan.kiszka, pbonzini

On Tue, Nov 08, 2016 at 02:43:53PM +0100, Alexander Gordeev wrote:

[...]

> > > @@ -386,20 +385,14 @@ int main(int ac, char **av)
> > >  	pm_tmr_blk = fadt->pm_tmr_blk;
> > >  	printf("PM timer port is %x\n", pm_tmr_blk);
> > >  
> > > -	ret = pci_find_dev(&pcidev, PCI_VENDOR_ID_REDHAT,
> > > -			   PCI_DEVICE_ID_REDHAT_TEST);
> > > -	if (ret == 0) {
> > > -		for (i = 0; i < PCI_TESTDEV_NUM_BARS; i++) {
> > > -			if (!pci_bar_is_valid(&pcidev, i)) {
> > > -				continue;
> > > -			}
> > > -			if (pci_bar_is_memory(&pcidev, i)) {
> > > -				membar = pci_bar_get_addr(&pcidev, i);
> > > -				pci_test.memaddr = ioremap(membar, PAGE_SIZE);
> > > -			} else {
> > > -				pci_test.iobar = pci_bar_get_addr(&pcidev, i);
> > > -			}
> > > -		}
> > > +	if (!pci_find_dev(&pcidev, PCI_VENDOR_ID_REDHAT,
> > > +			  PCI_DEVICE_ID_REDHAT_TEST)) {
> > > +		pci_scan_bars(&pcidev);
> 
> I think if you attempt to cache all viable information in a pci_dev
> structure then pci_scan_bars() should be mandatory to call when the
> pci_dev is initilized. IOW, it needs to be called in pci_dev_init().

Yeah, I can put it into pci_dev_init(). Anyway, it's only a caching
operation.

But I also like Drew's idea about pci_enable_defaults() and to make
sure pci_dev_init() the minimum. Then, the caller will decide what to
do.

Again I think it's a matter of taste, and both work for me. :-)

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 17/17] x86/unittests: add intel-iommu test
  2016-11-08 10:39       ` Andrew Jones
@ 2016-11-08 15:57         ` Peter Xu
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Xu @ 2016-11-08 15:57 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Tue, Nov 08, 2016 at 11:39:55AM +0100, Andrew Jones wrote:

[...]

> > For this test, the latest command line should be:
> > 
> >   -M q35,kernel-irqchip=split -device intel-iommu,intremap=on -device edu
> > 
> > However it will fail if kernel version <4.7 (without x2apic supported
> > KVM). No chance for a report_skip().
> > 
> > I can use eim=off to avoid that check, however, if with "eim=off", we have:
> > 
> >   -M q35,kernel-irqchip=split -device intel-iommu,intremap=on,eim=off -device edu
> > 
> > It requires QEMU to be latest (>2.8). Otherwise QEMU will report "eim
> > property not found" and we lost chance to report as well.
> > 
> > So, I just chose to use pc-q35-2.7 here. Not sure whether this is a
> > good idea.
> 
> I think we just shouldn't care. Either this test is run on the latest
> kernel and qemu, or it will fail. Hopefully something ends up in test.log
> that indicates it failed due to an old kernel or old qemu. Old qemu is
> clear from the "eim property not found"

I see. Then let me make it simpler. I'll use q35 directly with eim=off.

> 
> Alternatively we can build support into runtime to capture kernel, qemu
> versions and compare them against new unittests.cfg {kvm,qemu}_minver
> fields. That would be a different series though, and will require
> extending mkstandalone as well. Feel free to add it to your TODO. If you
> do, then I hope you like Bash :-)

Sure. It's in. And I like Bash. :-)

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev
  2016-11-08 15:33             ` Peter Xu
@ 2016-11-08 17:27               ` Andrew Jones
  0 siblings, 0 replies; 67+ messages in thread
From: Andrew Jones @ 2016-11-08 17:27 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, rkrcmar, agordeev, jan.kiszka, pbonzini

On Tue, Nov 08, 2016 at 10:33:12AM -0500, Peter Xu wrote:
> On Tue, Nov 08, 2016 at 11:16:11AM +0100, Andrew Jones wrote:
> > On Mon, Nov 07, 2016 at 02:42:29PM -0500, Peter Xu wrote:
> > > On Mon, Nov 07, 2016 at 07:02:12PM +0100, Andrew Jones wrote:
> > > 
> > > [...]
> > > 
> > > > > I thought pcidevaddr_t means bdf? That's a wild guess of me from its
> > > > > size, as it's defined as:
> > > > > 
> > > > >   typedef uint16_t pcidevaddr_t;
> > > > > 
> > > > > uint16_t makes it looks more like bdf rather than devfn. For devfn,
> > > > > uint8_t suites better.
> > > > > 
> > > > > And what I mean here is to pass in the bdf of the device, not devfn
> > > > > only (in our case, devfn is always bdf though, since we are only
> > > > > supporting bus number zero).
> > > > 
> > > > see lib/x86/asm/pci.h:pci_config_read how 'dev' gets used, it's only dev.
> > > 
> > > But shouldn't "dev" the same as "bdf" here?
> > 
> > Hmm, yes. We're only shifting it by 8, not 11, so you're right. The word
> > 'dev' in pcidevaddr_t threw me off and I didn't bother looking up CF8's
> > bits until just now. OK, let's try to get consistent naming going on,
> > and we need better comments (comments at all). I'd like to see one
> > briefly describing 'bdf' and then use that for the primary name, i.e.
> > rename pcidevaddr_t to something like 'pcibdfaddr_t' or whatever.
> 
> Yes, indeed the naming is a little bit confusing here. I was always
> trying to use bdf/devfn in the whole series to avoid misunderstanding.
> Regarding to the naming of pcidevaddr_t: that's okay for me, but I
> agree that bdf is better and clearer. If you would not mind, I'll keep
> it as it is for now for this series.

Fine by me (now that I'm no longer confused :-)
> 
> Thanks,
> 
> -- peterx
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev
  2016-11-08 15:48       ` Peter Xu
@ 2016-11-08 17:35         ` Andrew Jones
  2016-11-08 17:54           ` Alexander Gordeev
  2016-11-08 17:46         ` Alexander Gordeev
  1 sibling, 1 reply; 67+ messages in thread
From: Andrew Jones @ 2016-11-08 17:35 UTC (permalink / raw)
  To: Peter Xu; +Cc: Alexander Gordeev, kvm, rkrcmar, jan.kiszka, pbonzini

On Tue, Nov 08, 2016 at 10:48:32AM -0500, Peter Xu wrote:
> On Tue, Nov 08, 2016 at 01:27:31PM +0100, Alexander Gordeev wrote:
> > On Fri, Nov 04, 2016 at 05:41:01PM +0100, Andrew Jones wrote:
> > > >  /* Scan bus look for a specific device. Only bus 0 scanned for now. */
> > > > -pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id)
> > > > +int pci_find_dev(struct pci_dev *pci_dev, uint16_t vendor_id, uint16_t device_id)
> > > >  {
> > > >  	pcidevaddr_t dev;
> > > >  
> > > > -	for (dev = 0; dev < 256; ++dev) {
> > > > +	for (dev = 0; dev < PCI_DEVFN_MAX; ++dev) {
> > > 
> > > Why introduce this PCI_DEVFN_MAX define?
> > > 
> > > >  		if (pci_config_readw(dev, PCI_VENDOR_ID) == vendor_id &&
> > > > -		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id)
> > > > -			return dev;
> > > > +		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id) {
> > > > +			pci_dev_init(pci_dev, dev);
> > > > +			return 0;
> > > > +		}
> > > >  	}
> > > >  
> > > > -	return PCIDEVADDR_INVALID;
> > > > +	return -1;
> > > 
> > > Why not use bool for the ret type; true=good, false=bad?
> > 
> > Both ways look strange to me. I would leave pci_find_dev() as is (low-
> > level) and move pci_dev_init() outside - so a usage would be i.e.:
> > 
> > 	dev = pci_find_dev(PCI_VENDOR_ID_REDHAT, PCI_DEVICE_ID_REDHAT_TEST);
> > 	if (dev == PCIDEVADDR_INVALID)
> > 		...
> > 	pci_dev_init(&pci_dev, dev);
> 
> IMHO actually it'll be nicer with:
> 
>   struct pci_dev *pci_find_dev(uint16_t vendor_id, uint16_t device_id);

Yes, I was going to suggest this too, but then...

> 
> But this needs dynamic allocation of memory, which might be an
> overkill for kvm-unit-test. So I chose to allocate the pci_dev on
> caller stack.

...saw the value of avoiding the alloc. That said, we do have precedent
for allocating in lib/* code already (virtio_bind), and I am currently
reworking x86's memory management code in order to enable malloc/calloc
use within lib code for x86. So we could go the alloc way as well...

> 
> Anyway, I take the interface issue as a matter of taste.

Not quite. I see Alex's reasoning. Perhaps users want to simply query
if a device is there (pci_find_dev), but not actually init a pci_dev
struct each time they call it. Alex's proposal keeps pci_find_dev more
generally useful.

Thanks,
drew

> 
> How about I take Drew's suggestion here with true/false? Then if we
> have any better idea with solid reasons, we can post seperate patch
> for it.
> 
> Thanks,
> 
> -- peterx
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 01/17] x86: intel-iommu: add vt-d init test
  2016-11-08 15:24     ` Peter Xu
@ 2016-11-08 17:40       ` Alexander Gordeev
  2016-11-08 17:42         ` Peter Xu
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Gordeev @ 2016-11-08 17:40 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, drjones, rkrcmar, jan.kiszka, pbonzini

On Tue, Nov 08, 2016 at 10:24:18AM -0500, Peter Xu wrote:
> On Tue, Nov 08, 2016 at 11:52:35AM +0100, Alexander Gordeev wrote:
> > On Wed, Oct 26, 2016 at 03:47:04PM +0800, Peter Xu wrote:
> > > +static inline void vtd_writel(unsigned int reg, uint32_t value)
> > > +{
> > > +	*(uint32_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg) = value;
> > > +}
> > > +
> > > +static inline void vtd_writeq(unsigned int reg, uint64_t value)
> > > +{
> > > +	*(uint64_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg) = value;
> > > +}
> > > +
> > > +static inline uint32_t vtd_readl(unsigned int reg)
> > > +{
> > > +	return *(uint32_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg);
> > > +}
> > > +
> > > +static inline uint64_t vtd_readq(unsigned int reg)
> > > +{
> > > +	return *(uint64_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg);
> > > +}
> > 
> > Is it safe to use memory dereference here?
> > By contrast, Linux seems uses read*()/write*() accessors.
> 
> Looks like Linux is using memory barrior and volatile keywords to do
> the protection (build_mmio_read() for x86). I think at least I can add
> volatile here for each read/write to make it safer. Do you think
> that'll suffice here? E.g.:
> 
>   static inline uint64_t vtd_readq(unsigned int reg)
>   {
>       return *(volatile uint64_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg);
>   }

Yeah, I think with volatile it is better. But I am still missing
why not just use read/write which already exist and use volatile:

static inline u64 __raw_readq(const volatile void *addr)
{
	assert(sizeof(unsigned long) == sizeof(u64));
	return *(const volatile u64 *)addr;
}

> Thanks,
> 
> -- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 01/17] x86: intel-iommu: add vt-d init test
  2016-11-08 17:40       ` Alexander Gordeev
@ 2016-11-08 17:42         ` Peter Xu
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Xu @ 2016-11-08 17:42 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: kvm, drjones, rkrcmar, jan.kiszka, pbonzini

On Tue, Nov 08, 2016 at 06:40:35PM +0100, Alexander Gordeev wrote:

[...]

> > Looks like Linux is using memory barrior and volatile keywords to do
> > the protection (build_mmio_read() for x86). I think at least I can add
> > volatile here for each read/write to make it safer. Do you think
> > that'll suffice here? E.g.:
> > 
> >   static inline uint64_t vtd_readq(unsigned int reg)
> >   {
> >       return *(volatile uint64_t *)(Q35_HOST_BRIDGE_IOMMU_ADDR + reg);
> >   }
> 
> Yeah, I think with volatile it is better. But I am still missing
> why not just use read/write which already exist and use volatile:
> 
> static inline u64 __raw_readq(const volatile void *addr)
> {
> 	assert(sizeof(unsigned long) == sizeof(u64));
> 	return *(const volatile u64 *)addr;
> }

That's nice! I'll use it. Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev
  2016-11-08 15:48       ` Peter Xu
  2016-11-08 17:35         ` Andrew Jones
@ 2016-11-08 17:46         ` Alexander Gordeev
  1 sibling, 0 replies; 67+ messages in thread
From: Alexander Gordeev @ 2016-11-08 17:46 UTC (permalink / raw)
  To: Peter Xu; +Cc: Andrew Jones, kvm, rkrcmar, jan.kiszka, pbonzini

On Tue, Nov 08, 2016 at 10:48:32AM -0500, Peter Xu wrote:
> > Both ways look strange to me. I would leave pci_find_dev() as is (low-
> > level) and move pci_dev_init() outside - so a usage would be i.e.:
> > 
> > 	dev = pci_find_dev(PCI_VENDOR_ID_REDHAT, PCI_DEVICE_ID_REDHAT_TEST);
> > 	if (dev == PCIDEVADDR_INVALID)
> > 		...
> > 	pci_dev_init(&pci_dev, dev);
> 
> IMHO actually it'll be nicer with:
> 
>   struct pci_dev *pci_find_dev(uint16_t vendor_id, uint16_t device_id);

I hesitated to propose this one, but IMO it is the most clear way
everybody are accustomed to :)

> But this needs dynamic allocation of memory, which might be an
> overkill for kvm-unit-test. So I chose to allocate the pci_dev on
> caller stack.

Well, the dynamic allocation is implemented and utilized, so why not?

> Anyway, I take the interface issue as a matter of taste.
> 
> How about I take Drew's suggestion here with true/false? Then if we
> have any better idea with solid reasons, we can post seperate patch
> for it.

As far as I am concerned in all disputed cases Drew's opinion prevails ;)

> Thanks,
> 
> -- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev
  2016-11-08 17:35         ` Andrew Jones
@ 2016-11-08 17:54           ` Alexander Gordeev
  2016-11-08 19:59             ` Peter Xu
  0 siblings, 1 reply; 67+ messages in thread
From: Alexander Gordeev @ 2016-11-08 17:54 UTC (permalink / raw)
  To: Andrew Jones; +Cc: Peter Xu, kvm, rkrcmar, jan.kiszka, pbonzini

On Tue, Nov 08, 2016 at 06:35:54PM +0100, Andrew Jones wrote:
> On Tue, Nov 08, 2016 at 10:48:32AM -0500, Peter Xu wrote:
> > On Tue, Nov 08, 2016 at 01:27:31PM +0100, Alexander Gordeev wrote:
> > > On Fri, Nov 04, 2016 at 05:41:01PM +0100, Andrew Jones wrote:
> > > > >  /* Scan bus look for a specific device. Only bus 0 scanned for now. */
> > > > > -pcidevaddr_t pci_find_dev(uint16_t vendor_id, uint16_t device_id)
> > > > > +int pci_find_dev(struct pci_dev *pci_dev, uint16_t vendor_id, uint16_t device_id)
> > > > >  {
> > > > >  	pcidevaddr_t dev;
> > > > >  
> > > > > -	for (dev = 0; dev < 256; ++dev) {
> > > > > +	for (dev = 0; dev < PCI_DEVFN_MAX; ++dev) {
> > > > 
> > > > Why introduce this PCI_DEVFN_MAX define?
> > > > 
> > > > >  		if (pci_config_readw(dev, PCI_VENDOR_ID) == vendor_id &&
> > > > > -		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id)
> > > > > -			return dev;
> > > > > +		    pci_config_readw(dev, PCI_DEVICE_ID) == device_id) {
> > > > > +			pci_dev_init(pci_dev, dev);
> > > > > +			return 0;
> > > > > +		}
> > > > >  	}
> > > > >  
> > > > > -	return PCIDEVADDR_INVALID;
> > > > > +	return -1;
> > > > 
> > > > Why not use bool for the ret type; true=good, false=bad?
> > > 
> > > Both ways look strange to me. I would leave pci_find_dev() as is (low-
> > > level) and move pci_dev_init() outside - so a usage would be i.e.:
> > > 
> > > 	dev = pci_find_dev(PCI_VENDOR_ID_REDHAT, PCI_DEVICE_ID_REDHAT_TEST);
> > > 	if (dev == PCIDEVADDR_INVALID)
> > > 		...
> > > 	pci_dev_init(&pci_dev, dev);
> > 
> > IMHO actually it'll be nicer with:
> > 
> >   struct pci_dev *pci_find_dev(uint16_t vendor_id, uint16_t device_id);
> 
> Yes, I was going to suggest this too, but then...
> 
> > 
> > But this needs dynamic allocation of memory, which might be an
> > overkill for kvm-unit-test. So I chose to allocate the pci_dev on
> > caller stack.
> 
> ...saw the value of avoiding the alloc. That said, we do have precedent
> for allocating in lib/* code already (virtio_bind), and I am currently
> reworking x86's memory management code in order to enable malloc/calloc
> use within lib code for x86. So we could go the alloc way as well...
> 
> > 
> > Anyway, I take the interface issue as a matter of taste.
> 
> Not quite. I see Alex's reasoning. Perhaps users want to simply query
> if a device is there (pci_find_dev), but not actually init a pci_dev
> struct each time they call it. Alex's proposal keeps pci_find_dev more
> generally useful.

Yep. So in the end of the day I would kept pci_find_dev() and added
pci_dev_alloc()/pci_dev_free() helpers (not sure about naming though).
The former would have to initialize pci_dev data to be consistent
and safely used by any other PCI API that takes pci_dev pointer.

> Thanks,
> drew

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev
  2016-11-08 17:54           ` Alexander Gordeev
@ 2016-11-08 19:59             ` Peter Xu
  0 siblings, 0 replies; 67+ messages in thread
From: Peter Xu @ 2016-11-08 19:59 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: Andrew Jones, kvm, rkrcmar, jan.kiszka, pbonzini

On Tue, Nov 08, 2016 at 06:54:16PM +0100, Alexander Gordeev wrote:

[...]

> > > IMHO actually it'll be nicer with:
> > > 
> > >   struct pci_dev *pci_find_dev(uint16_t vendor_id, uint16_t device_id);
> > 
> > Yes, I was going to suggest this too, but then...
> > 
> > > 
> > > But this needs dynamic allocation of memory, which might be an
> > > overkill for kvm-unit-test. So I chose to allocate the pci_dev on
> > > caller stack.
> > 
> > ...saw the value of avoiding the alloc. That said, we do have precedent
> > for allocating in lib/* code already (virtio_bind), and I am currently
> > reworking x86's memory management code in order to enable malloc/calloc
> > use within lib code for x86. So we could go the alloc way as well...
> > 
> > > 
> > > Anyway, I take the interface issue as a matter of taste.
> > 
> > Not quite. I see Alex's reasoning. Perhaps users want to simply query
> > if a device is there (pci_find_dev), but not actually init a pci_dev
> > struct each time they call it. Alex's proposal keeps pci_find_dev more
> > generally useful.

Sounds reasonable.

> 
> Yep. So in the end of the day I would kept pci_find_dev() and added
> pci_dev_alloc()/pci_dev_free() helpers (not sure about naming though).
> The former would have to initialize pci_dev data to be consistent
> and safely used by any other PCI API that takes pci_dev pointer.

Then I'll keep the old pci_find_dev() interface. And then when init a
PCI device, it will look like:

    struct pci_dev;
    pcidevaddr_t dev_addr;

    dev_addr = pci_find_dev(VENDOR_ID, DEVICE_ID);
    if (dev_addr == PCIDEVADDR_INVALID)
        goto fail;
    pci_dev_init(&pci_dev, dev_addr);
    pci_enable_defaults(&pci_dev);

Introducing pci_dev_alloc()/free() is a good idea as well. Since that
can be a work upon current codes, I'll avoid doing it in this series
to make the series shorter (it's already longer than I thought it
should be :-).

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2016-11-08 19:59 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-26  7:47 [PATCH kvm-unit-tests 00/17] VT-d unit test Peter Xu
2016-10-26  7:47 ` [PATCH kvm-unit-tests 01/17] x86: intel-iommu: add vt-d init test Peter Xu
2016-11-04 16:12   ` Andrew Jones
2016-11-07 16:32     ` Peter Xu
2016-11-08 10:52   ` Alexander Gordeev
2016-11-08 15:24     ` Peter Xu
2016-11-08 17:40       ` Alexander Gordeev
2016-11-08 17:42         ` Peter Xu
2016-10-26  7:47 ` [PATCH kvm-unit-tests 02/17] libcflat: add IS_ALIGNED() macro, and page sizes Peter Xu
2016-11-04 16:14   ` Andrew Jones
2016-10-26  7:47 ` [PATCH kvm-unit-tests 03/17] libcflat: moving MIN/MAX here Peter Xu
2016-11-04 16:15   ` Andrew Jones
2016-10-26  7:47 ` [PATCH kvm-unit-tests 04/17] vm/page: provide PGDIR_OFFSET() macro Peter Xu
2016-10-26  7:47 ` [PATCH kvm-unit-tests 05/17] x86/asm: add cpu_relax() Peter Xu
2016-11-04 16:18   ` Andrew Jones
2016-11-07 16:40     ` Peter Xu
2016-10-26  7:47 ` [PATCH kvm-unit-tests 06/17] pci: introduce struct pci_dev Peter Xu
2016-11-04 16:41   ` Andrew Jones
2016-11-07 17:05     ` Peter Xu
2016-11-07 18:02       ` Andrew Jones
2016-11-07 19:42         ` Peter Xu
2016-11-08 10:16           ` Andrew Jones
2016-11-08 15:33             ` Peter Xu
2016-11-08 17:27               ` Andrew Jones
2016-11-08 12:27     ` Alexander Gordeev
2016-11-08 15:48       ` Peter Xu
2016-11-08 17:35         ` Andrew Jones
2016-11-08 17:54           ` Alexander Gordeev
2016-11-08 19:59             ` Peter Xu
2016-11-08 17:46         ` Alexander Gordeev
2016-10-26  7:47 ` [PATCH kvm-unit-tests 07/17] pci: provide pci_scan_bars() Peter Xu
2016-11-04 16:47   ` Andrew Jones
2016-11-07 17:16     ` Peter Xu
2016-10-26  7:47 ` [PATCH kvm-unit-tests 08/17] x86/vmexit: leverage pci_scan_bars() Peter Xu
2016-11-04 16:54   ` Andrew Jones
2016-11-08 13:43     ` Alexander Gordeev
2016-11-08 15:55       ` Peter Xu
2016-10-26  7:47 ` [PATCH kvm-unit-tests 09/17] pci: add pci_config_write[wb]() Peter Xu
2016-11-04 16:59   ` Andrew Jones
2016-11-05 17:06   ` Alexander Gordeev
2016-11-07 17:25     ` Peter Xu
2016-10-26  7:47 ` [PATCH kvm-unit-tests 10/17] pci: provide pci_set_master() Peter Xu
2016-11-04 17:04   ` Andrew Jones
2016-11-07 17:35     ` Peter Xu
2016-11-07 17:59       ` Andrew Jones
2016-11-07 19:45         ` Peter Xu
2016-10-26  7:47 ` [PATCH kvm-unit-tests 11/17] pci: provide pci_enable_defaults() Peter Xu
2016-11-04 17:08   ` Andrew Jones
2016-10-26  7:47 ` [PATCH kvm-unit-tests 12/17] pci: add bdf helpers Peter Xu
2016-11-04 17:51   ` Andrew Jones
2016-10-26  7:47 ` [PATCH kvm-unit-tests 13/17] pci: edu: introduce pci-edu helpers Peter Xu
2016-11-04 17:18   ` Andrew Jones
2016-11-07 17:41     ` Peter Xu
2016-11-04 17:24   ` Andrew Jones
2016-11-07 17:44     ` Peter Xu
2016-10-26  7:47 ` [PATCH kvm-unit-tests 14/17] x86: intel-iommu: add dmar test Peter Xu
2016-11-04 17:53   ` Andrew Jones
2016-10-26  7:47 ` [PATCH kvm-unit-tests 15/17] pci: add msi support for 32/64bit address Peter Xu
2016-11-04 17:33   ` Andrew Jones
2016-11-07 17:58     ` Peter Xu
2016-10-26  7:47 ` [PATCH kvm-unit-tests 16/17] x86: intel-iommu: add IR MSI test Peter Xu
2016-11-04 17:40   ` Andrew Jones
2016-10-26  7:47 ` [PATCH kvm-unit-tests 17/17] x86/unittests: add intel-iommu test Peter Xu
2016-11-04 17:46   ` Andrew Jones
2016-11-07 18:06     ` Peter Xu
2016-11-08 10:39       ` Andrew Jones
2016-11-08 15:57         ` Peter Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.