All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup)
@ 2015-02-09 20:04 Boris Ostrovsky
  2015-02-09 20:04 ` [PATCH v3 1/7] x86/numa: Make use of NUMA_NO_NODE consistent Boris Ostrovsky
                   ` (11 more replies)
  0 siblings, 12 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-09 20:04 UTC (permalink / raw)
  To: jbeulich, keir, andrew.cooper3, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: boris.ostrovsky, dario.faggioli, port-xen, ufimtseva, xen-devel

Changes in v3:
* Added patch #1 to more consistently define nodes as a u8 and properly
  use NUMA_NO_NODE.
* Make changes to xen_sysctl_numainfo, similar to those made to
  xen_sysctl_topologyinfo. (Q: I kept both sets of changes in the same
  patch #3 to avoid bumping interface version twice. Perhaps it's better
  to split it into two?)
* Instead of copying data for each loop index allocate a buffer and copy
  once for all three queries in sysctl.c.
* Move hypercall buffer management from libxl to libxc (as requested by
  Dario, patches #5 and #6).
* Report topology info for offlined CPUs as well
* Added LIBXL_HAVE_PCITOPO macro

Changes in v2:
* Split topology sysctls into two --- one for CPU topology and the other
  for devices
* Avoid long loops in the hypervisor by using continuations. (I am not
  particularly happy about using first_dev in the interface, suggestions
  for a better interface would be appreciated)
* Use proper libxl conventions for interfaces
* Avoid hypervisor stack corruption when copying PXM data from guest


A few patches that add interface for querying hypervisor about device
topology and allow 'xl info -n' display this information if PXM object
is provided by ACPI.

This series also makes some optimizations and cleanup of current CPU
topology and NUMA sysctl queries.


Boris Ostrovsky (7):
  x86/numa: Make use of NUMA_NO_NODE consistent
  pci: Do not ignore device's PXM information
  sysctl: Make topologyinfo and numainfo sysctls a little more
    efficient
  sysctl: Add sysctl interface for querying PCI topology
  libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer
    management to libxc
  libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management
    to libxc
  libxl: Add interface for querying hypervisor about PCI topology

 tools/libxc/include/xenctrl.h       |   11 ++-
 tools/libxc/xc_misc.c               |   84 ++++++++++++---
 tools/libxl/libxl.c                 |  152 ++++++++++++-------------
 tools/libxl/libxl.h                 |   14 +++
 tools/libxl/libxl_freebsd.c         |   14 +++
 tools/libxl/libxl_internal.h        |    5 +
 tools/libxl/libxl_linux.c           |   74 ++++++++++++
 tools/libxl/libxl_netbsd.c          |   14 +++
 tools/libxl/libxl_types.idl         |    7 +
 tools/libxl/libxl_utils.c           |    8 ++
 tools/libxl/xl_cmdimpl.c            |   44 ++++++-
 tools/misc/xenpm.c                  |   92 ++++++---------
 tools/python/xen/lowlevel/xc/xc.c   |  103 ++++++-----------
 xen/arch/x86/physdev.c              |   23 ++++-
 xen/arch/x86/smpboot.c              |    2 +-
 xen/arch/x86/srat.c                 |   39 ++++---
 xen/arch/x86/x86_64/mm.c            |    2 +-
 xen/common/sysctl.c                 |  214 ++++++++++++++++++++++++++---------
 xen/drivers/passthrough/pci.c       |   13 ++-
 xen/drivers/passthrough/vtd/iommu.c |    4 +-
 xen/include/asm-x86/numa.h          |    4 +-
 xen/include/public/physdev.h        |    6 +
 xen/include/public/sysctl.h         |  104 ++++++++++++-----
 xen/include/xen/pci.h               |    5 +-
 24 files changed, 698 insertions(+), 340 deletions(-)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v3 1/7] x86/numa: Make use of NUMA_NO_NODE consistent
  2015-02-09 20:04 [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
@ 2015-02-09 20:04 ` Boris Ostrovsky
  2015-02-10 10:41   ` Andrew Cooper
                     ` (3 more replies)
  2015-02-09 20:04 ` [PATCH v3 2/7] pci: Do not ignore device's PXM information Boris Ostrovsky
                   ` (10 subsequent siblings)
  11 siblings, 4 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-09 20:04 UTC (permalink / raw)
  To: jbeulich, keir, andrew.cooper3, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: boris.ostrovsky, dario.faggioli, port-xen, ufimtseva, xen-devel

Various pieces of code test whether node value is NUMA_NO_NODE even
though pxm_to_node() may return (int)-1 for an invalid node.

Make pxm_to_node() and setup_node() return u8 and have them return
NUMA_NO_NODE when necessary.

Adjust code that tests for (node == -1).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/smpboot.c              |    2 +-
 xen/arch/x86/srat.c                 |   39 ++++++++++++++++++++++------------
 xen/arch/x86/x86_64/mm.c            |    2 +-
 xen/drivers/passthrough/vtd/iommu.c |    4 +-
 xen/include/asm-x86/numa.h          |    4 +-
 5 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index c54be7e..97956be 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -877,7 +877,7 @@ int cpu_add(uint32_t apic_id, uint32_t acpi_id, uint32_t pxm)
 
     if ( !srat_disabled() )
     {
-        if ( (node = setup_node(pxm)) < 0 )
+        if ( (node = setup_node(pxm)) == NUMA_NO_NODE )
         {
             dprintk(XENLOG_WARNING,
                     "Setup node failed for pxm %x\n", pxm);
diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index 29fc724..4dfa1c3 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -21,13 +21,16 @@
 #include <asm/e820.h>
 #include <asm/page.h>
 
+#define MAX_PXM   255
+
 static struct acpi_table_slit *__read_mostly acpi_slit;
 
 static nodemask_t memory_nodes_parsed __initdata;
 static nodemask_t processor_nodes_parsed __initdata;
 static nodemask_t nodes_found __initdata;
 static struct node nodes[MAX_NUMNODES] __initdata;
-static u8 __read_mostly pxm2node[256] = { [0 ... 255] = NUMA_NO_NODE };
+static u8 __read_mostly pxm2node[MAX_PXM + 1] =
+    { [0 ... MAX_PXM] = NUMA_NO_NODE };
 
 
 static int num_node_memblks;
@@ -37,21 +40,29 @@ static int memblk_nodeid[NR_NODE_MEMBLKS];
 
 static int node_to_pxm(int n);
 
-int pxm_to_node(int pxm)
+u8 pxm_to_node(int pxm)
 {
-	if ((unsigned)pxm >= 256)
-		return -1;
-	/* Extend 0xff to (int)-1 */
-	return (signed char)pxm2node[pxm];
+	if ((unsigned)pxm > MAX_PXM)
+		return NUMA_NO_NODE;
+
+	return pxm2node[pxm];
 }
 
-__devinit int setup_node(int pxm)
+__devinit u8 setup_node(int pxm)
 {
-	unsigned node = pxm2node[pxm];
-	if (node == 0xff) {
+	u8 node;
+
+	/* NUMA_NO_NODE is 255 */
+	BUILD_BUG_ON(MAX_NUMNODES > 254);
+
+	if ((unsigned)pxm > MAX_PXM)
+		return NUMA_NO_NODE;
+
+	node = pxm2node[pxm];
+	if (node == NUMA_NO_NODE) {
 		if (nodes_weight(nodes_found) >= MAX_NUMNODES)
-			return -1;
-		node = first_unset_node(nodes_found); 
+			return NUMA_NO_NODE;
+		node = first_unset_node(nodes_found);
 		node_set(node, nodes_found);
 		pxm2node[pxm] = node;
 	}
@@ -175,7 +186,7 @@ acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
 		return;
 	pxm = pa->proximity_domain;
 	node = setup_node(pxm);
-	if (node < 0) {
+	if (node == NUMA_NO_NODE) {
 		printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm);
 		bad_srat();
 		return;
@@ -208,7 +219,7 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
 		pxm |= pa->proximity_domain_hi[2] << 24;
 	}
 	node = setup_node(pxm);
-	if (node < 0) {
+	if (node == NUMA_NO_NODE) {
 		printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm);
 		bad_srat();
 		return;
@@ -252,7 +263,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 	if (srat_rev < 2)
 		pxm &= 0xff;
 	node = setup_node(pxm);
-	if (node < 0) {
+	if (node == NUMA_NO_NODE) {
 		printk(KERN_ERR "SRAT: Too many proximity domains.\n");
 		bad_srat();
 		return;
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index d631aee..4817dad 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1353,7 +1353,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     if ( !mem_hotadd_check(spfn, epfn) )
         return -EINVAL;
 
-    if ( (node = setup_node(pxm)) == -1 )
+    if ( (node = setup_node(pxm)) == NUMA_NO_NODE )
         return -EINVAL;
 
     if ( !valid_numa_range(spfn << PAGE_SHIFT, epfn << PAGE_SHIFT, node) )
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 19d8165..fd48f7b 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -190,14 +190,14 @@ u64 alloc_pgtable_maddr(struct acpi_drhd_unit *drhd, unsigned long npages)
     struct acpi_rhsa_unit *rhsa;
     struct page_info *pg, *cur_pg;
     u64 *vaddr;
-    int node = -1, i;
+    int node = NUMA_NO_NODE, i;
 
     rhsa = drhd_to_rhsa(drhd);
     if ( rhsa )
         node =  pxm_to_node(rhsa->proximity_domain);
 
     pg = alloc_domheap_pages(NULL, get_order_from_pages(npages),
-                             (node == -1 ) ? 0 : MEMF_node(node));
+                             (node == NUMA_NO_NODE ) ? 0 : MEMF_node(node));
     if ( !pg )
         return 0;
 
diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
index 5959860..2a43827 100644
--- a/xen/include/asm-x86/numa.h
+++ b/xen/include/asm-x86/numa.h
@@ -21,7 +21,7 @@ struct node {
 
 extern int compute_hash_shift(struct node *nodes, int numnodes,
 			      int *nodeids);
-extern int pxm_to_node(int nid);
+extern u8 pxm_to_node(int nid);
 
 #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
 #define VIRTUAL_BUG_ON(x) 
@@ -33,7 +33,7 @@ extern int numa_off;
 
 extern int srat_disabled(void);
 extern void numa_set_node(int cpu, int node);
-extern int setup_node(int pxm);
+extern u8 setup_node(int pxm);
 extern void srat_detect_node(int cpu);
 
 extern void setup_node_bootmem(int nodeid, u64 start, u64 end);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 2/7] pci: Do not ignore device's PXM information
  2015-02-09 20:04 [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
  2015-02-09 20:04 ` [PATCH v3 1/7] x86/numa: Make use of NUMA_NO_NODE consistent Boris Ostrovsky
@ 2015-02-09 20:04 ` Boris Ostrovsky
  2015-02-09 20:04 ` [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient Boris Ostrovsky
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-09 20:04 UTC (permalink / raw)
  To: jbeulich, keir, andrew.cooper3, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: boris.ostrovsky, dario.faggioli, port-xen, ufimtseva, xen-devel

If ACPI provides PXM data for IO devices then dom0 will pass it to
hypervisor during PHYSDEVOP_pci_device_add call. This information,
however, is currently ignored.

We will store this information (in the form of nodeID) in pci_dev
structure so that we can provide it, for example, to the toolstack
when it adds support (in the following patches) for querying the
hypervisor about device topology

We will also print it when user requests device information dump.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/physdev.c        |   23 ++++++++++++++++++++---
 xen/drivers/passthrough/pci.c |   13 +++++++++----
 xen/include/public/physdev.h  |    6 ++++++
 xen/include/xen/pci.h         |    5 ++++-
 4 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 6b3201b..ab92fa5 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -565,7 +565,8 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&manage_pci, arg, 1) != 0 )
             break;
 
-        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn, NULL);
+        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn,
+                             NULL, NUMA_NO_NODE);
         break;
     }
 
@@ -597,13 +598,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         pdev_info.physfn.devfn = manage_pci_ext.physfn.devfn;
         ret = pci_add_device(0, manage_pci_ext.bus,
                              manage_pci_ext.devfn,
-                             &pdev_info);
+                             &pdev_info, NUMA_NO_NODE);
         break;
     }
 
     case PHYSDEVOP_pci_device_add: {
         struct physdev_pci_device_add add;
         struct pci_dev_info pdev_info;
+        u8 node;
 
         ret = -EFAULT;
         if ( copy_from_guest(&add, arg, 1) != 0 )
@@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         }
         else
             pdev_info.is_virtfn = 0;
-        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
+
+        if ( add.flags & XEN_PCI_DEV_PXM )
+        {
+            uint32_t pxm;
+            size_t optarr_off = offsetof(struct physdev_pci_device_add, optarr) /
+                                sizeof(add.optarr[0]);
+
+            if ( copy_from_guest_offset(&pxm, arg, optarr_off, 1) )
+                break;
+
+            node = pxm_to_node(pxm);
+        }
+        else
+            node = NUMA_NO_NODE;
+
+        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info, node);
         break;
     }
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 78c6977..43b4384 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -568,7 +568,8 @@ static void pci_enable_acs(struct pci_dev *pdev)
     pci_conf_write16(seg, bus, dev, func, pos + PCI_ACS_CTRL, ctrl);
 }
 
-int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
+int pci_add_device(u16 seg, u8 bus, u8 devfn,
+                   const struct pci_dev_info *info, u8 node)
 {
     struct pci_seg *pseg;
     struct pci_dev *pdev;
@@ -586,7 +587,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
         pdev = pci_get_pdev(seg, info->physfn.bus, info->physfn.devfn);
         spin_unlock(&pcidevs_lock);
         if ( !pdev )
-            pci_add_device(seg, info->physfn.bus, info->physfn.devfn, NULL);
+            pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
+                           NULL, node);
         pdev_type = "virtual function";
     }
     else
@@ -609,6 +611,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
     if ( !pdev )
         goto out;
 
+    pdev->node = node;
+
     if ( info )
         pdev->info = *info;
     else if ( !pdev->vf_rlen[0] )
@@ -1191,10 +1195,11 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg)
 
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
     {
-        printk("%04x:%02x:%02x.%u - dom %-3d - MSIs < ",
+        printk("%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs < ",
                pseg->nr, pdev->bus,
                PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
-               pdev->domain ? pdev->domain->domain_id : -1);
+               pdev->domain ? pdev->domain->domain_id : -1,
+               (pdev->node != NUMA_NO_NODE) ? pdev->node : -1);
         list_for_each_entry ( msi, &pdev->msi_list, list )
                printk("%d ", msi->irq);
         printk(">\n");
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index d547928..309346b 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -293,6 +293,12 @@ struct physdev_pci_device_add {
         uint8_t bus;
         uint8_t devfn;
     } physfn;
+
+    /*
+     * Optional parameters array.
+     * First element ([0]) is PXM domain associated with the device (if
+     * XEN_PCI_DEV_PXM is set)
+     */
 #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
     uint32_t optarr[];
 #elif defined(__GNUC__)
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 5f295f3..1acab53 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -56,6 +56,8 @@ struct pci_dev {
 
     u8 phantom_stride;
 
+    u8 node; /* NUMA node */
+
     enum pdev_type {
         DEV_TYPE_PCI_UNKNOWN,
         DEV_TYPE_PCIe_ENDPOINT,
@@ -102,7 +104,8 @@ void setup_hwdom_pci_devices(struct domain *,
 int pci_release_devices(struct domain *d);
 int pci_add_segment(u16 seg);
 const unsigned long *pci_get_ro_map(u16 seg);
-int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *);
+int pci_add_device(u16 seg, u8 bus, u8 devfn,
+                   const struct pci_dev_info *, u8 node);
 int pci_remove_device(u16 seg, u8 bus, u8 devfn);
 int pci_ro_device(int seg, int bus, int devfn);
 void arch_pci_ro_device(int seg, int bdf);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient
  2015-02-09 20:04 [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
  2015-02-09 20:04 ` [PATCH v3 1/7] x86/numa: Make use of NUMA_NO_NODE consistent Boris Ostrovsky
  2015-02-09 20:04 ` [PATCH v3 2/7] pci: Do not ignore device's PXM information Boris Ostrovsky
@ 2015-02-09 20:04 ` Boris Ostrovsky
  2015-02-13 12:26   ` Wei Liu
                     ` (4 more replies)
  2015-02-09 20:04 ` [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
                   ` (8 subsequent siblings)
  11 siblings, 5 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-09 20:04 UTC (permalink / raw)
  To: jbeulich, keir, andrew.cooper3, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: boris.ostrovsky, dario.faggioli, port-xen, ufimtseva, xen-devel

Currently both of these sysctls make a copy to userspace for each index of
various query arrays. We should try to copy whole arrays instead.

This requires some changes in sysctl's public data structure, thus bump
interface version.

Report topology for all present (not just online) cpus.

Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
that these are used for CPU topology. Subsequent patch will add support for
PCI topology sysctl.

Clarify some somments in sysctl.h.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 tools/libxc/include/xenctrl.h     |    4 +-
 tools/libxc/xc_misc.c             |   10 ++--
 tools/libxl/libxl.c               |   71 +++++++------------
 tools/misc/xenpm.c                |   69 +++++++-----------
 tools/python/xen/lowlevel/xc/xc.c |   77 ++++++++------------
 xen/common/sysctl.c               |  141 ++++++++++++++++++++++---------------
 xen/include/public/sysctl.h       |   75 ++++++++++++--------
 7 files changed, 221 insertions(+), 226 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 0ad8b8d..0cb6743 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1226,7 +1226,7 @@ int xc_readconsolering(xc_interface *xch,
 int xc_send_debug_keys(xc_interface *xch, char *keys);
 
 typedef xen_sysctl_physinfo_t xc_physinfo_t;
-typedef xen_sysctl_topologyinfo_t xc_topologyinfo_t;
+typedef xen_sysctl_cputopoinfo_t xc_cputopoinfo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
@@ -1237,7 +1237,7 @@ typedef uint64_t xc_node_to_memfree_t;
 typedef uint32_t xc_node_to_node_dist_t;
 
 int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
-int xc_topologyinfo(xc_interface *xch, xc_topologyinfo_t *info);
+int xc_cputopoinfo(xc_interface *xch, xc_cputopoinfo_t *info);
 int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
 
 int xc_sched_id(xc_interface *xch,
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index e253a58..be68291 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -177,20 +177,20 @@ int xc_physinfo(xc_interface *xch,
     return 0;
 }
 
-int xc_topologyinfo(xc_interface *xch,
-                xc_topologyinfo_t *put_info)
+int xc_cputopoinfo(xc_interface *xch,
+                   xc_cputopoinfo_t *put_info)
 {
     int ret;
     DECLARE_SYSCTL;
 
-    sysctl.cmd = XEN_SYSCTL_topologyinfo;
+    sysctl.cmd = XEN_SYSCTL_cputopoinfo;
 
-    memcpy(&sysctl.u.topologyinfo, put_info, sizeof(*put_info));
+    memcpy(&sysctl.u.cputopoinfo, put_info, sizeof(*put_info));
 
     if ( (ret = do_sysctl(xch, &sysctl)) != 0 )
         return ret;
 
-    memcpy(put_info, &sysctl.u.topologyinfo, sizeof(*put_info));
+    memcpy(put_info, &sysctl.u.cputopoinfo, sizeof(*put_info));
 
     return 0;
 }
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 50a8928..b05c0bb 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5072,10 +5072,8 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo)
 libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
 {
     GC_INIT(ctx);
-    xc_topologyinfo_t tinfo;
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_core_t, coremap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_socket_t, socketmap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_node_t, nodemap);
+    xc_cputopoinfo_t tinfo;
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
     libxl_cputopology *ret = NULL;
     int i;
     int max_cpus;
@@ -5088,24 +5086,18 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
         goto out;
     }
 
-    coremap = xc_hypercall_buffer_alloc
-        (ctx->xch, coremap, sizeof(*coremap) * max_cpus);
-    socketmap = xc_hypercall_buffer_alloc
-        (ctx->xch, socketmap, sizeof(*socketmap) * max_cpus);
-    nodemap = xc_hypercall_buffer_alloc
-        (ctx->xch, nodemap, sizeof(*nodemap) * max_cpus);
-    if ((coremap == NULL) || (socketmap == NULL) || (nodemap == NULL)) {
+    cputopo = xc_hypercall_buffer_alloc(ctx->xch, cputopo,
+                                        sizeof(*cputopo) * max_cpus);
+    if (cputopo == NULL) {
         LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
                             "Unable to allocate hypercall arguments");
         goto fail;
     }
 
-    set_xen_guest_handle(tinfo.cpu_to_core, coremap);
-    set_xen_guest_handle(tinfo.cpu_to_socket, socketmap);
-    set_xen_guest_handle(tinfo.cpu_to_node, nodemap);
+    set_xen_guest_handle(tinfo.cputopo, cputopo);
     tinfo.max_cpu_index = max_cpus - 1;
-    if (xc_topologyinfo(ctx->xch, &tinfo) != 0) {
-        LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "Topology info hypercall failed");
+    if (xc_cputopoinfo(ctx->xch, &tinfo) != 0) {
+        LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "CPU topology info hypercall failed");
         goto fail;
     }
 
@@ -5115,18 +5107,16 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
     ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * max_cpus);
 
     for (i = 0; i < max_cpus; i++) {
-#define V(map, i) (map[i] == INVALID_TOPOLOGY_ID) ? \
-    LIBXL_CPUTOPOLOGY_INVALID_ENTRY : map[i]
-        ret[i].core = V(coremap, i);
-        ret[i].socket = V(socketmap, i);
-        ret[i].node = V(nodemap, i);
+#define V(map, i, invalid) ( cputopo[i].map == invalid) ? \
+    LIBXL_CPUTOPOLOGY_INVALID_ENTRY :  cputopo[i].map
+        ret[i].core = V(core, i, INVALID_CORE_ID);
+        ret[i].socket = V(socket, i, INVALID_SOCKET_ID);
+        ret[i].node = V(node, i, INVALID_NODE_ID);
 #undef V
     }
 
 fail:
-    xc_hypercall_buffer_free(ctx->xch, coremap);
-    xc_hypercall_buffer_free(ctx->xch, socketmap);
-    xc_hypercall_buffer_free(ctx->xch, nodemap);
+    xc_hypercall_buffer_free(ctx->xch, cputopo);
 
     if (ret)
         *nb_cpu_out = max_cpus;
@@ -5139,9 +5129,8 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
 {
     GC_INIT(ctx);
     xc_numainfo_t ninfo;
-    DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, memsize);
-    DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, memfree);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, node_dists);
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
+    DECLARE_HYPERCALL_BUFFER(uint8_t, distance);
     libxl_numainfo *ret = NULL;
     int i, j, max_nodes;
 
@@ -5153,21 +5142,16 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
         goto out;
     }
 
-    memsize = xc_hypercall_buffer_alloc
-        (ctx->xch, memsize, sizeof(*memsize) * max_nodes);
-    memfree = xc_hypercall_buffer_alloc
-        (ctx->xch, memfree, sizeof(*memfree) * max_nodes);
-    node_dists = xc_hypercall_buffer_alloc
-        (ctx->xch, node_dists, sizeof(*node_dists) * max_nodes * max_nodes);
-    if ((memsize == NULL) || (memfree == NULL) || (node_dists == NULL)) {
+    meminfo = xc_hypercall_buffer_alloc(ctx->xch, meminfo, sizeof(*meminfo) * max_nodes);
+    distance = xc_hypercall_buffer_alloc(ctx->xch, distance, sizeof(*distance) * max_nodes * max_nodes);
+    if ((meminfo == NULL) || (distance == NULL)) {
         LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
                             "Unable to allocate hypercall arguments");
         goto fail;
     }
 
-    set_xen_guest_handle(ninfo.node_to_memsize, memsize);
-    set_xen_guest_handle(ninfo.node_to_memfree, memfree);
-    set_xen_guest_handle(ninfo.node_to_node_distance, node_dists);
+    set_xen_guest_handle(ninfo.meminfo, meminfo);
+    set_xen_guest_handle(ninfo.distance, distance);
     ninfo.max_node_index = max_nodes - 1;
     if (xc_numainfo(ctx->xch, &ninfo) != 0) {
         LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting numainfo");
@@ -5181,23 +5165,22 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
 
     ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * max_nodes);
     for (i = 0; i < max_nodes; i++)
-        ret[i].dists = libxl__calloc(NOGC, max_nodes, sizeof(*node_dists));
+        ret[i].dists = libxl__calloc(NOGC, max_nodes, sizeof(*distance));
 
     for (i = 0; i < max_nodes; i++) {
-#define V(mem, i) (mem[i] == INVALID_NUMAINFO_ID) ? \
-    LIBXL_NUMAINFO_INVALID_ENTRY : mem[i]
+#define V(mem, i) (meminfo[i].mem == INVALID_MEM_SZ) ? \
+    LIBXL_NUMAINFO_INVALID_ENTRY : meminfo[i].mem
         ret[i].size = V(memsize, i);
         ret[i].free = V(memfree, i);
         ret[i].num_dists = max_nodes;
         for (j = 0; j < ret[i].num_dists; j++)
-            ret[i].dists[j] = V(node_dists, i * max_nodes + j);
+            ret[i].dists[j] = distance[i*max_nodes + j];
 #undef V
     }
 
  fail:
-    xc_hypercall_buffer_free(ctx->xch, memsize);
-    xc_hypercall_buffer_free(ctx->xch, memfree);
-    xc_hypercall_buffer_free(ctx->xch, node_dists);
+    xc_hypercall_buffer_free(ctx->xch, meminfo);
+    xc_hypercall_buffer_free(ctx->xch, distance);
 
  out:
     GC_FREE;
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index e43924c..f7fe620 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -355,16 +355,13 @@ static void signal_int_handler(int signo)
     int i, j, k;
     struct timeval tv;
     int cx_cap = 0, px_cap = 0;
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_core);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_socket);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_node);
-    xc_topologyinfo_t info = { 0 };
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
+    xc_cputopoinfo_t info = { 0 };
 
-    cpu_to_core = xc_hypercall_buffer_alloc(xc_handle, cpu_to_core, sizeof(*cpu_to_core) * MAX_NR_CPU);
-    cpu_to_socket = xc_hypercall_buffer_alloc(xc_handle, cpu_to_socket, sizeof(*cpu_to_socket) * MAX_NR_CPU);
-    cpu_to_node = xc_hypercall_buffer_alloc(xc_handle, cpu_to_node, sizeof(*cpu_to_node) * MAX_NR_CPU);
+    cputopo = xc_hypercall_buffer_alloc(xc_handle, cputopo,
+                                        sizeof(*cputopo) * MAX_NR_CPU);
 
-    if ( cpu_to_core == NULL || cpu_to_socket == NULL || cpu_to_node == NULL )
+    if ( cputopo == NULL )
     {
 	fprintf(stderr, "failed to allocate hypercall buffers\n");
 	goto out;
@@ -448,12 +445,10 @@ static void signal_int_handler(int signo)
             printf("  Avg freq\t%d\tKHz\n", avgfreq[i]);
     }
 
-    set_xen_guest_handle(info.cpu_to_core, cpu_to_core);
-    set_xen_guest_handle(info.cpu_to_socket, cpu_to_socket);
-    set_xen_guest_handle(info.cpu_to_node, cpu_to_node);
+    set_xen_guest_handle(info.cputopo, cputopo);
     info.max_cpu_index = MAX_NR_CPU - 1;
 
-    if ( cx_cap && !xc_topologyinfo(xc_handle, &info) )
+    if ( cx_cap && !xc_cputopoinfo(xc_handle, &info) )
     {
         uint32_t socket_ids[MAX_NR_CPU];
         uint32_t core_ids[MAX_NR_CPU];
@@ -465,8 +460,8 @@ static void signal_int_handler(int signo)
         /* check validity */
         for ( i = 0; i <= info.max_cpu_index; i++ )
         {
-            if ( cpu_to_core[i] == INVALID_TOPOLOGY_ID ||
-                 cpu_to_socket[i] == INVALID_TOPOLOGY_ID )
+            if ( cputopo[i].core == INVALID_CORE_ID ||
+                 cputopo[i].socket == INVALID_SOCKET_ID )
                 break;
         }
         if ( i > info.max_cpu_index )
@@ -475,20 +470,20 @@ static void signal_int_handler(int signo)
             for ( i = 0; i <= info.max_cpu_index; i++ )
             {
                 for ( j = 0; j < socket_nr; j++ )
-                    if ( cpu_to_socket[i] == socket_ids[j] )
+                    if ( cputopo[i].socket == socket_ids[j] )
                         break;
                 if ( j == socket_nr )
                 {
-                    socket_ids[j] = cpu_to_socket[i];
+                    socket_ids[j] = cputopo[i].socket;
                     socket_nr++;
                 }
 
                 for ( j = 0; j < core_nr; j++ )
-                    if ( cpu_to_core[i] == core_ids[j] )
+                    if ( cputopo[i].core == core_ids[j] )
                         break;
                 if ( j == core_nr )
                 {
-                    core_ids[j] = cpu_to_core[i];
+                    core_ids[j] = cputopo[i].core;
                     core_nr++;
                 }
             }
@@ -501,7 +496,7 @@ static void signal_int_handler(int signo)
 
                 for ( j = 0; j <= info.max_cpu_index; j++ )
                 {
-                    if ( cpu_to_socket[j] == socket_ids[i] )
+                    if ( cputopo[j].socket == socket_ids[i] )
                         break;
                 }
                 printf("\nSocket %d\n", socket_ids[i]);
@@ -520,8 +515,8 @@ static void signal_int_handler(int signo)
                 {
                     for ( j = 0; j <= info.max_cpu_index; j++ )
                     {
-                        if ( cpu_to_socket[j] == socket_ids[i] &&
-                             cpu_to_core[j] == core_ids[k] )
+                        if ( cputopo[j].socket == socket_ids[i] &&
+                             cputopo[j].core == core_ids[k] )
                             break;
                     }
                     printf("\t Core %d CPU %d\n", core_ids[k], j);
@@ -556,9 +551,7 @@ static void signal_int_handler(int signo)
     free(sum);
     free(avgfreq);
 out:
-    xc_hypercall_buffer_free(xc_handle, cpu_to_core);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_socket);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_node);
+    xc_hypercall_buffer_free(xc_handle, cputopo);
     xc_interface_close(xc_handle);
     exit(0);
 }
@@ -965,28 +958,22 @@ void scaling_governor_func(int argc, char *argv[])
 
 void cpu_topology_func(int argc, char *argv[])
 {
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_core);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_socket);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_node);
-    xc_topologyinfo_t info = { 0 };
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
+    xc_cputopoinfo_t info = { 0 };
     int i, rc = ENOMEM;
 
-    cpu_to_core = xc_hypercall_buffer_alloc(xc_handle, cpu_to_core, sizeof(*cpu_to_core) * MAX_NR_CPU);
-    cpu_to_socket = xc_hypercall_buffer_alloc(xc_handle, cpu_to_socket, sizeof(*cpu_to_socket) * MAX_NR_CPU);
-    cpu_to_node = xc_hypercall_buffer_alloc(xc_handle, cpu_to_node, sizeof(*cpu_to_node) * MAX_NR_CPU);
-
-    if ( cpu_to_core == NULL || cpu_to_socket == NULL || cpu_to_node == NULL )
+    cputopo = xc_hypercall_buffer_alloc(xc_handle, cputopo,
+                                        sizeof(*cputopo) * MAX_NR_CPU);
+    if ( cputopo == NULL )
     {
 	fprintf(stderr, "failed to allocate hypercall buffers\n");
 	goto out;
     }
 
-    set_xen_guest_handle(info.cpu_to_core, cpu_to_core);
-    set_xen_guest_handle(info.cpu_to_socket, cpu_to_socket);
-    set_xen_guest_handle(info.cpu_to_node, cpu_to_node);
+    set_xen_guest_handle(info.cputopo, cputopo);
     info.max_cpu_index = MAX_NR_CPU-1;
 
-    if ( xc_topologyinfo(xc_handle, &info) )
+    if ( xc_cputopoinfo(xc_handle, &info) )
     {
         rc = errno;
         fprintf(stderr, "Cannot get Xen CPU topology (%d - %s)\n",
@@ -1000,16 +987,14 @@ void cpu_topology_func(int argc, char *argv[])
     printf("CPU\tcore\tsocket\tnode\n");
     for ( i = 0; i <= info.max_cpu_index; i++ )
     {
-        if ( cpu_to_core[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].core == INVALID_CORE_ID )
             continue;
         printf("CPU%d\t %d\t %d\t %d\n",
-               i, cpu_to_core[i], cpu_to_socket[i], cpu_to_node[i]);
+               i, cputopo[i].core, cputopo[i].socket, cputopo[i].node);
     }
     rc = 0;
 out:
-    xc_hypercall_buffer_free(xc_handle, cpu_to_core);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_socket);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_node);
+    xc_hypercall_buffer_free(xc_handle, cputopo);
     if ( rc )
         exit(rc);
 }
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 2aa0dc7..4275968 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1221,30 +1221,20 @@ static PyObject *pyxc_getcpuinfo(XcObject *self, PyObject *args, PyObject *kwds)
 static PyObject *pyxc_topologyinfo(XcObject *self)
 {
 #define MAX_CPU_INDEX 255
-    xc_topologyinfo_t tinfo = { 0 };
+    xc_cputopoinfo_t tinfo = { 0 };
     int i, max_cpu_index;
     PyObject *ret_obj = NULL;
     PyObject *cpu_to_core_obj, *cpu_to_socket_obj, *cpu_to_node_obj;
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_core_t, coremap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_socket_t, socketmap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_node_t, nodemap);
 
-    coremap = xc_hypercall_buffer_alloc(self->xc_handle, coremap, sizeof(*coremap) * (MAX_CPU_INDEX+1));
-    if ( coremap == NULL )
-        goto out;
-    socketmap = xc_hypercall_buffer_alloc(self->xc_handle, socketmap, sizeof(*socketmap) * (MAX_CPU_INDEX+1));
-    if ( socketmap == NULL  )
-        goto out;
-    nodemap = xc_hypercall_buffer_alloc(self->xc_handle, nodemap, sizeof(*nodemap) * (MAX_CPU_INDEX+1));
-    if ( nodemap == NULL )
-        goto out;
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
 
-    set_xen_guest_handle(tinfo.cpu_to_core, coremap);
-    set_xen_guest_handle(tinfo.cpu_to_socket, socketmap);
-    set_xen_guest_handle(tinfo.cpu_to_node, nodemap);
+    cputopo = xc_hypercall_buffer_alloc(self->xc_handle, cputopo, sizeof(*cputopo) * (MAX_CPU_INDEX+1));
+    if ( cputopo == NULL )
+	goto out;
+    set_xen_guest_handle(tinfo.cputopo, cputopo);
     tinfo.max_cpu_index = MAX_CPU_INDEX;
 
-    if ( xc_topologyinfo(self->xc_handle, &tinfo) != 0 )
+    if ( xc_cputopoinfo(self->xc_handle, &tinfo) != 0 )
         goto out;
 
     max_cpu_index = tinfo.max_cpu_index;
@@ -1257,35 +1247,35 @@ static PyObject *pyxc_topologyinfo(XcObject *self)
     cpu_to_node_obj = PyList_New(0);
     for ( i = 0; i <= max_cpu_index; i++ )
     {
-        if ( coremap[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].core == INVALID_CORE_ID )
         {
             PyList_Append(cpu_to_core_obj, Py_None);
         }
         else
         {
-            PyObject *pyint = PyInt_FromLong(coremap[i]);
+            PyObject *pyint = PyInt_FromLong(cputopo[i].core);
             PyList_Append(cpu_to_core_obj, pyint);
             Py_DECREF(pyint);
         }
 
-        if ( socketmap[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].socket == INVALID_SOCKET_ID )
         {
             PyList_Append(cpu_to_socket_obj, Py_None);
         }
         else
         {
-            PyObject *pyint = PyInt_FromLong(socketmap[i]);
+            PyObject *pyint = PyInt_FromLong(cputopo[i].socket);
             PyList_Append(cpu_to_socket_obj, pyint);
             Py_DECREF(pyint);
         }
 
-        if ( nodemap[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].node == INVALID_NODE_ID )
         {
             PyList_Append(cpu_to_node_obj, Py_None);
         }
         else
         {
-            PyObject *pyint = PyInt_FromLong(nodemap[i]);
+            PyObject *pyint = PyInt_FromLong(cputopo[i].node);
             PyList_Append(cpu_to_node_obj, pyint);
             Py_DECREF(pyint);
         }
@@ -1303,9 +1293,7 @@ static PyObject *pyxc_topologyinfo(XcObject *self)
     Py_DECREF(cpu_to_node_obj);
 
 out:
-    xc_hypercall_buffer_free(self->xc_handle, coremap);
-    xc_hypercall_buffer_free(self->xc_handle, socketmap);
-    xc_hypercall_buffer_free(self->xc_handle, nodemap);
+    xc_hypercall_buffer_free(self->xc_handle, cputopo);
     return ret_obj ? ret_obj : pyxc_error_to_exception(self->xc_handle);
 #undef MAX_CPU_INDEX
 }
@@ -1314,28 +1302,23 @@ static PyObject *pyxc_numainfo(XcObject *self)
 {
 #define MAX_NODE_INDEX 31
     xc_numainfo_t ninfo = { 0 };
-    int i, j, max_node_index;
+    int i, j, max_node_index, invalid_node;
     uint64_t free_heap;
     PyObject *ret_obj = NULL, *node_to_node_dist_list_obj;
     PyObject *node_to_memsize_obj, *node_to_memfree_obj;
     PyObject *node_to_dma32_mem_obj, *node_to_node_dist_obj;
-    DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, node_memsize);
-    DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, node_memfree);
-    DECLARE_HYPERCALL_BUFFER(xc_node_to_node_dist_t, nodes_dist);
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
+    DECLARE_HYPERCALL_BUFFER(uint8_t, distance);
 
-    node_memsize = xc_hypercall_buffer_alloc(self->xc_handle, node_memsize, sizeof(*node_memsize)*(MAX_NODE_INDEX+1));
-    if ( node_memsize == NULL )
-        goto out;
-    node_memfree = xc_hypercall_buffer_alloc(self->xc_handle, node_memfree, sizeof(*node_memfree)*(MAX_NODE_INDEX+1));
-    if ( node_memfree == NULL )
+    meminfo = xc_hypercall_buffer_alloc(self->xc_handle, meminfo, sizeof(*meminfo) * (MAX_NODE_INDEX+1));
+    if ( meminfo == NULL )
         goto out;
-    nodes_dist = xc_hypercall_buffer_alloc(self->xc_handle, nodes_dist, sizeof(*nodes_dist)*(MAX_NODE_INDEX+1)*(MAX_NODE_INDEX+1));
-    if ( nodes_dist == NULL )
+    distance = xc_hypercall_buffer_alloc(self->xc_handle, distance, sizeof(*distance)*(MAX_NODE_INDEX+1)*(MAX_NODE_INDEX+1));
+    if ( distance == NULL )
         goto out;
 
-    set_xen_guest_handle(ninfo.node_to_memsize, node_memsize);
-    set_xen_guest_handle(ninfo.node_to_memfree, node_memfree);
-    set_xen_guest_handle(ninfo.node_to_node_distance, nodes_dist);
+    set_xen_guest_handle(ninfo.meminfo, meminfo);
+    set_xen_guest_handle(ninfo.distance, distance);
     ninfo.max_node_index = MAX_NODE_INDEX;
 
     if ( xc_numainfo(self->xc_handle, &ninfo) != 0 )
@@ -1355,12 +1338,12 @@ static PyObject *pyxc_numainfo(XcObject *self)
         PyObject *pyint;
 
         /* Total Memory */
-        pyint = PyInt_FromLong(node_memsize[i] >> 20); /* MB */
+        pyint = PyInt_FromLong(meminfo[i].memsize >> 20); /* MB */
         PyList_Append(node_to_memsize_obj, pyint);
         Py_DECREF(pyint);
 
         /* Free Memory */
-        pyint = PyInt_FromLong(node_memfree[i] >> 20); /* MB */
+        pyint = PyInt_FromLong(meminfo[i].memfree >> 20); /* MB */
         PyList_Append(node_to_memfree_obj, pyint);
         Py_DECREF(pyint);
 
@@ -1372,10 +1355,11 @@ static PyObject *pyxc_numainfo(XcObject *self)
 
         /* Node to Node Distance */
         node_to_node_dist_obj = PyList_New(0);
+        invalid_node = (meminfo[i].memsize == INVALID_MEM_SZ);
         for ( j = 0; j <= max_node_index; j++ )
         {
-            uint32_t dist = nodes_dist[i*(max_node_index+1) + j];
-            if ( dist == INVALID_TOPOLOGY_ID )
+            uint32_t dist = distance[i * (max_node_index + 1) + j];
+            if ( invalid_node )
             {
                 PyList_Append(node_to_node_dist_obj, Py_None);
             }
@@ -1406,9 +1390,8 @@ static PyObject *pyxc_numainfo(XcObject *self)
     Py_DECREF(node_to_node_dist_list_obj);
 
 out:
-    xc_hypercall_buffer_free(self->xc_handle, node_memsize);
-    xc_hypercall_buffer_free(self->xc_handle, node_memfree);
-    xc_hypercall_buffer_free(self->xc_handle, nodes_dist);
+    xc_hypercall_buffer_free(self->xc_handle, meminfo);
+    xc_hypercall_buffer_free(self->xc_handle, distance);
     return ret_obj ? ret_obj : pyxc_error_to_exception(self->xc_handle);
 #undef MAX_NODE_INDEX
 }
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 0cb6ee1..30c5806 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -274,85 +274,114 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 
     case XEN_SYSCTL_numainfo:
     {
-        uint32_t i, j, max_node_index, last_online_node;
+        uint32_t i, j, num_nodes, last_online_node;
         xen_sysctl_numainfo_t *ni = &op->u.numainfo;
+        xen_sysctl_meminfo_t *meminfo;
+        uint8_t *distance;
 
         last_online_node = last_node(node_online_map);
-        max_node_index = min_t(uint32_t, ni->max_node_index, last_online_node);
-        ni->max_node_index = last_online_node;
+        num_nodes = min_t(uint32_t, ni->max_node_index, last_online_node) + 1;
 
-        for ( i = 0; i <= max_node_index; i++ )
+        if ( guest_handle_is_null(ni->meminfo) ||
+             guest_handle_is_null(ni->distance) )
         {
-            if ( !guest_handle_is_null(ni->node_to_memsize) )
-            {
-                uint64_t memsize = node_online(i) ?
-                                   node_spanned_pages(i) << PAGE_SHIFT : 0ul;
-                if ( copy_to_guest_offset(ni->node_to_memsize, i, &memsize, 1) )
-                    break;
-            }
-            if ( !guest_handle_is_null(ni->node_to_memfree) )
-            {
-                uint64_t memfree = node_online(i) ?
-                                   avail_node_heap_pages(i) << PAGE_SHIFT : 0ul;
-                if ( copy_to_guest_offset(ni->node_to_memfree, i, &memfree, 1) )
-                    break;
-            }
+            ret = -EINVAL;
+            break;
+        }
 
-            if ( !guest_handle_is_null(ni->node_to_node_distance) )
+        meminfo = xmalloc_array(xen_sysctl_meminfo_t, num_nodes);
+        distance = xmalloc_array(uint8_t, num_nodes * num_nodes);
+        if ( !meminfo || !distance )
+        {
+            xfree(meminfo);
+            xfree(distance);
+            ret = -ENOMEM;
+            break;
+        }
+
+        for ( i = 0; i < num_nodes; i++ )
+        {
+            if ( node_online(i) )
             {
-                for ( j = 0; j <= max_node_index; j++)
-                {
-                    uint32_t distance = ~0u;
-                    if ( node_online(i) && node_online(j) )
-                        distance = __node_distance(i, j);
-                    if ( copy_to_guest_offset(
-                        ni->node_to_node_distance,
-                        i*(max_node_index+1) + j, &distance, 1) )
-                        break;
-                }
-                if ( j <= max_node_index )
-                    break;
+                meminfo[i].memsize = node_spanned_pages(i) << PAGE_SHIFT;
+                meminfo[i].memfree = avail_node_heap_pages(i) << PAGE_SHIFT;
+                for ( j = 0; j < num_nodes; j++ )
+                    distance[i * num_nodes + j] = __node_distance(i, j);
             }
+            else
+                meminfo[i].memsize = meminfo[i].memfree = INVALID_MEM_SZ;
+        }
+
+        if ( ni->max_node_index != last_online_node )
+        {
+            ni->max_node_index = last_online_node;
+            if ( __copy_field_to_guest(u_sysctl, op,
+                                       u.numainfo.max_node_index) )
+                ret = -EFAULT;
         }
 
-        ret = ((i <= max_node_index) || copy_to_guest(u_sysctl, op, 1))
-            ? -EFAULT : 0;
+        if ( !ret &&
+             (copy_to_guest_offset(ni->meminfo, 0, meminfo, num_nodes) ||
+              copy_to_guest_offset(ni->distance, 0, distance,
+                                   num_nodes * num_nodes)) )
+            ret = -EFAULT;
+
+        xfree(meminfo);
+        xfree(distance);
     }
     break;
 
-    case XEN_SYSCTL_topologyinfo:
+    case XEN_SYSCTL_cputopoinfo:
     {
-        uint32_t i, max_cpu_index, last_online_cpu;
-        xen_sysctl_topologyinfo_t *ti = &op->u.topologyinfo;
+        uint32_t i, num_cpus, last_present_cpu;
+        xen_sysctl_cputopoinfo_t *ti = &op->u.cputopoinfo;
+        struct xen_sysctl_cputopo *cputopo;
 
-        last_online_cpu = cpumask_last(&cpu_online_map);
-        max_cpu_index = min_t(uint32_t, ti->max_cpu_index, last_online_cpu);
-        ti->max_cpu_index = last_online_cpu;
+        if ( guest_handle_is_null(ti->cputopo) )
+        {
+            ret = -EINVAL;
+            break;
+        }
 
-        for ( i = 0; i <= max_cpu_index; i++ )
+        last_present_cpu = cpumask_last(&cpu_present_map);
+        num_cpus = min_t(uint32_t, ti->max_cpu_index, last_present_cpu) + 1;
+
+        cputopo = xmalloc_array(xen_sysctl_cputopo_t, num_cpus);
+        if ( !cputopo )
         {
-            if ( !guest_handle_is_null(ti->cpu_to_core) )
-            {
-                uint32_t core = cpu_online(i) ? cpu_to_core(i) : ~0u;
-                if ( copy_to_guest_offset(ti->cpu_to_core, i, &core, 1) )
-                    break;
-            }
-            if ( !guest_handle_is_null(ti->cpu_to_socket) )
+            ret = -ENOMEM;
+            break;
+        }
+
+        for ( i = 0; i < num_cpus; i++ )
+        {
+            if ( cpu_present(i) )
             {
-                uint32_t socket = cpu_online(i) ? cpu_to_socket(i) : ~0u;
-                if ( copy_to_guest_offset(ti->cpu_to_socket, i, &socket, 1) )
-                    break;
+                cputopo[i].core = cpu_to_core(i);
+                cputopo[i].socket = cpu_to_socket(i);
+                cputopo[i].node = cpu_to_node(i);
             }
-            if ( !guest_handle_is_null(ti->cpu_to_node) )
+            else
             {
-                uint32_t node = cpu_online(i) ? cpu_to_node(i) : ~0u;
-                if ( copy_to_guest_offset(ti->cpu_to_node, i, &node, 1) )
-                    break;
+                cputopo[i].core = INVALID_CORE_ID;
+                cputopo[i].socket = INVALID_SOCKET_ID;
+                cputopo[i].node = INVALID_NODE_ID;
             }
         }
 
-        ret = ((i <= max_cpu_index) || copy_to_guest(u_sysctl, op, 1))
-            ? -EFAULT : 0;
+        if ( ti->max_cpu_index != last_present_cpu )
+        {
+            ti->max_cpu_index = last_present_cpu;
+            if ( __copy_field_to_guest(u_sysctl, op,
+                                       u.cputopoinfo.max_cpu_index) )
+                ret = -EFAULT;
+        }
+
+
+        if ( !ret && copy_to_guest_offset(ti->cputopo, 0, cputopo,  num_cpus) )
+            ret = -EFAULT;
+
+        xfree(cputopo);
     }
     break;
 
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index b3713b3..7c78f81 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -34,7 +34,7 @@
 #include "xen.h"
 #include "domctl.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000B
+#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000C
 
 /*
  * Read console content from Xen buffer ring.
@@ -462,52 +462,67 @@ struct xen_sysctl_lockprof_op {
 typedef struct xen_sysctl_lockprof_op xen_sysctl_lockprof_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_lockprof_op_t);
 
-/* XEN_SYSCTL_topologyinfo */
-#define INVALID_TOPOLOGY_ID  (~0U)
-struct xen_sysctl_topologyinfo {
+/* XEN_SYSCTL_cputopoinfo */
+#define INVALID_CORE_ID  (~0U)
+#define INVALID_SOCKET_ID  (~0U)
+#define INVALID_NODE_ID  ((uint8_t)~0)
+
+struct xen_sysctl_cputopo {
+    uint32_t core;
+    uint32_t socket;
+    uint8_t node;
+};
+typedef struct xen_sysctl_cputopo xen_sysctl_cputopo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopo_t);
+
+struct xen_sysctl_cputopoinfo {
     /*
-     * IN: maximum addressable entry in the caller-provided arrays.
+     * IN: maximum addressable entry in the caller-provided cputopo arary
      * OUT: largest cpu identifier in the system.
-     * If OUT is greater than IN then the arrays are truncated!
-     * If OUT is leass than IN then the array tails are not written by sysctl.
      */
     uint32_t max_cpu_index;
 
     /*
-     * If not NULL, these arrays are filled with core/socket/node identifier
-     * for each cpu.
-     * If a cpu has no core/socket/node information (e.g., cpu not present) 
-     * then the sentinel value ~0u is written to each array.
-     * The number of array elements written by the sysctl is:
+     * OUT: core/socket/node identifier for each cpu.
+     * If information for a particular entry is not avalable it is set to
+     * INVALID_CORE/SOCKET/NODE_ID.
+     * The number of array elements for CPU topology written by the sysctl is:
      *   min(@max_cpu_index_IN,@max_cpu_index_OUT)+1
      */
-    XEN_GUEST_HANDLE_64(uint32) cpu_to_core;
-    XEN_GUEST_HANDLE_64(uint32) cpu_to_socket;
-    XEN_GUEST_HANDLE_64(uint32) cpu_to_node;
+    XEN_GUEST_HANDLE_64(xen_sysctl_cputopo_t) cputopo;
 };
-typedef struct xen_sysctl_topologyinfo xen_sysctl_topologyinfo_t;
-DEFINE_XEN_GUEST_HANDLE(xen_sysctl_topologyinfo_t);
+typedef struct xen_sysctl_cputopoinfo xen_sysctl_cputopoinfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopoinfo_t);
 
 /* XEN_SYSCTL_numainfo */
-#define INVALID_NUMAINFO_ID (~0U)
+#define INVALID_MEM_SZ (~0U)
+
+struct xen_sysctl_meminfo {
+    uint64_t memsize;
+    uint64_t memfree;
+};
+typedef struct xen_sysctl_meminfo xen_sysctl_meminfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_meminfo_t);
+
 struct xen_sysctl_numainfo {
     /*
      * IN: maximum addressable entry in the caller-provided arrays.
      * OUT: largest node identifier in the system.
-     * If OUT is greater than IN then the arrays are truncated!
      */
     uint32_t max_node_index;
 
-    /* NB. Entries are 0 if node is not present. */
-    XEN_GUEST_HANDLE_64(uint64) node_to_memsize;
-    XEN_GUEST_HANDLE_64(uint64) node_to_memfree;
+    /*
+     * OUT: max_node_index-sized array. Entries are INVALID_MEM_SZ
+     * if node is not present.
+     */
+    XEN_GUEST_HANDLE_64(xen_sysctl_meminfo_t) meminfo;
 
     /*
-     * Array, of size (max_node_index+1)^2, listing memory access distances
-     * between nodes. If an entry has no node distance information (e.g., node 
-     * not present) then the value ~0u is written.
-     * 
-     * Note that the array rows must be indexed by multiplying by the minimum 
+     * OUT: Array, of size (max_node_index+1)^2, listing memory access
+     * distances between nodes. If an entry has no node distance information
+     * (e.g., node not present) then the value ~0u is written.
+     *
+     * Note that the array rows must be indexed by multiplying by the minimum
      * of the caller-provided max_node_index and the returned value of
      * max_node_index. That is, if the largest node index in the system is
      * smaller than the caller can handle, a smaller 2-d array is constructed
@@ -516,7 +531,7 @@ struct xen_sysctl_numainfo {
      * in the system is larger than the caller can handle, then a 2-d array of
      * the maximum size handleable by the caller is constructed.
      */
-    XEN_GUEST_HANDLE_64(uint32) node_to_node_distance;
+    XEN_GUEST_HANDLE_64(uint8) distance;
 };
 typedef struct xen_sysctl_numainfo xen_sysctl_numainfo_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_numainfo_t);
@@ -671,7 +686,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_pm_op                         12
 #define XEN_SYSCTL_page_offline_op               14
 #define XEN_SYSCTL_lockprof_op                   15
-#define XEN_SYSCTL_topologyinfo                  16 
+#define XEN_SYSCTL_cputopoinfo                   16
 #define XEN_SYSCTL_numainfo                      17
 #define XEN_SYSCTL_cpupool_op                    18
 #define XEN_SYSCTL_scheduler_op                  19
@@ -682,7 +697,7 @@ struct xen_sysctl {
         struct xen_sysctl_readconsole       readconsole;
         struct xen_sysctl_tbuf_op           tbuf_op;
         struct xen_sysctl_physinfo          physinfo;
-        struct xen_sysctl_topologyinfo      topologyinfo;
+        struct xen_sysctl_cputopoinfo       cputopoinfo;
         struct xen_sysctl_numainfo          numainfo;
         struct xen_sysctl_sched_id          sched_id;
         struct xen_sysctl_perfc_op          perfc_op;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology
  2015-02-09 20:04 [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2015-02-09 20:04 ` [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient Boris Ostrovsky
@ 2015-02-09 20:04 ` Boris Ostrovsky
  2015-02-10 11:13   ` Andrew Cooper
  2015-02-09 20:04 ` [PATCH v3 5/7] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc Boris Ostrovsky
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-09 20:04 UTC (permalink / raw)
  To: jbeulich, keir, andrew.cooper3, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: boris.ostrovsky, dario.faggioli, port-xen, ufimtseva, xen-devel

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/common/sysctl.c         |   73 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h |   29 +++++++++++++++++
 2 files changed, 102 insertions(+), 0 deletions(-)

diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 30c5806..ea6557f 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -384,7 +384,80 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         xfree(cputopo);
     }
     break;
+#ifdef HAS_PCI
+    case XEN_SYSCTL_pcitopoinfo:
+    {
+        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
+        physdev_pci_device_t *devs;
+        uint8_t *nodes;
+        unsigned int first_dev, i;
+        int num_devs;
+
+        num_devs = ti->num_devs - ti->first_dev;
+
+        if ( guest_handle_is_null(ti->devs) ||
+             guest_handle_is_null(ti->nodes) ||
+             (num_devs <= 0) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        devs = xmalloc_array(physdev_pci_device_t, num_devs);
+        nodes = xmalloc_array(uint8_t, num_devs);
+        if ( !devs || !nodes )
+        {
+            xfree(devs);
+            xfree(nodes);
+            ret = -ENOMEM;
+            break;
+        }
+
+        first_dev = ti->first_dev;
+
+        if ( copy_from_guest_offset(devs, ti->devs, first_dev, num_devs) )
+        {
+            xfree(devs);
+            xfree(nodes);
+            ret = -EFAULT;
+            break;
+        }
 
+        for ( i = 0; i < num_devs; i++ )
+        {
+            struct pci_dev *pdev;
+
+            spin_lock(&pcidevs_lock);
+            pdev = pci_get_pdev(devs[i].seg, devs[i].bus, devs[i].devfn);
+            if ( !pdev || (pdev->node == NUMA_NO_NODE) )
+                nodes[i] = INVALID_NODE_ID;
+            else
+                nodes[i] = pdev->node;
+            spin_unlock(&pcidevs_lock);
+
+            if ( hypercall_preempt_check() )
+                break;
+        }
+
+        ti->first_dev += i + 1;
+
+        if ( __copy_field_to_guest(u_sysctl, op,
+                                   u.pcitopoinfo.first_dev) ||
+             copy_to_guest_offset(ti->nodes, first_dev, nodes,num_devs) )
+        {
+            ret = -EFAULT;
+            break;
+        }
+
+        if ( ti->first_dev < ti->num_devs )
+            ret = hypercall_create_continuation(__HYPERVISOR_sysctl,
+                                               "h", u_sysctl);
+
+        xfree(devs);
+        xfree(nodes);
+    }
+    break;
+#endif
 #ifdef TEST_COVERAGE
     case XEN_SYSCTL_coverage_op:
         ret = sysctl_coverage_op(&op->u.coverage_op);
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 7c78f81..044c3a1 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -33,6 +33,7 @@
 
 #include "xen.h"
 #include "domctl.h"
+#include "physdev.h"
 
 #define XEN_SYSCTL_INTERFACE_VERSION 0x0000000C
 
@@ -494,6 +495,32 @@ struct xen_sysctl_cputopoinfo {
 typedef struct xen_sysctl_cputopoinfo xen_sysctl_cputopoinfo_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopoinfo_t);
 
+/* XEN_SYSCTL_pcitopoinfo */
+struct xen_sysctl_pcitopoinfo {
+    /* IN: Size of pcitopo array */
+    uint32_t num_devs;
+
+    /*
+     * IN/OUT: First element of pcitopo array that needs to be processed by
+     * hypervisor.
+     * This is used primarily by hypercall continuations and callers will
+     * typically set it to zero
+     */
+    uint32_t first_dev;
+
+    /* IN: list of devices */
+    XEN_GUEST_HANDLE_64(physdev_pci_device_t) devs;
+
+    /*
+     * OUT: node identifier for each device.
+     * If information for a particular device is not avalable then set
+     * to INVALID_NODE_ID.
+     */
+    XEN_GUEST_HANDLE_64(uint8) nodes;
+};
+typedef struct xen_sysctl_pcitopoinfo xen_sysctl_pcitopoinfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopoinfo_t);
+
 /* XEN_SYSCTL_numainfo */
 #define INVALID_MEM_SZ (~0U)
 
@@ -692,12 +719,14 @@ struct xen_sysctl {
 #define XEN_SYSCTL_scheduler_op                  19
 #define XEN_SYSCTL_coverage_op                   20
 #define XEN_SYSCTL_psr_cmt_op                    21
+#define XEN_SYSCTL_pcitopoinfo                   22
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
         struct xen_sysctl_tbuf_op           tbuf_op;
         struct xen_sysctl_physinfo          physinfo;
         struct xen_sysctl_cputopoinfo       cputopoinfo;
+        struct xen_sysctl_pcitopoinfo       pcitopoinfo;
         struct xen_sysctl_numainfo          numainfo;
         struct xen_sysctl_sched_id          sched_id;
         struct xen_sysctl_perfc_op          perfc_op;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 5/7] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc
  2015-02-09 20:04 [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
                   ` (3 preceding siblings ...)
  2015-02-09 20:04 ` [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
@ 2015-02-09 20:04 ` Boris Ostrovsky
  2015-02-10 11:23   ` Andrew Cooper
                     ` (3 more replies)
  2015-02-09 20:04 ` [PATCH v3 6/7] libxl/libxc: Move libxl_get_numainfo()'s " Boris Ostrovsky
                   ` (6 subsequent siblings)
  11 siblings, 4 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-09 20:04 UTC (permalink / raw)
  To: jbeulich, keir, andrew.cooper3, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: boris.ostrovsky, dario.faggioli, port-xen, ufimtseva, xen-devel

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 tools/libxc/include/xenctrl.h     |    4 +-
 tools/libxc/xc_misc.c             |   23 +++++++++++-----
 tools/libxl/libxl.c               |   32 +++++------------------
 tools/misc/xenpm.c                |   51 ++++++++++++++++---------------------
 tools/python/xen/lowlevel/xc/xc.c |   23 ++++++-----------
 5 files changed, 55 insertions(+), 78 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 0cb6743..d94571d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1226,7 +1226,7 @@ int xc_readconsolering(xc_interface *xch,
 int xc_send_debug_keys(xc_interface *xch, char *keys);
 
 typedef xen_sysctl_physinfo_t xc_physinfo_t;
-typedef xen_sysctl_cputopoinfo_t xc_cputopoinfo_t;
+typedef xen_sysctl_cputopo_t xc_cputopo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
@@ -1237,7 +1237,7 @@ typedef uint64_t xc_node_to_memfree_t;
 typedef uint32_t xc_node_to_node_dist_t;
 
 int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
-int xc_cputopoinfo(xc_interface *xch, xc_cputopoinfo_t *info);
+int xc_cputopoinfo(xc_interface *xch, int *max_cpus, xc_cputopo_t *cputopo);
 int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
 
 int xc_sched_id(xc_interface *xch,
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index be68291..4c654f3 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -177,22 +177,31 @@ int xc_physinfo(xc_interface *xch,
     return 0;
 }
 
-int xc_cputopoinfo(xc_interface *xch,
-                   xc_cputopoinfo_t *put_info)
+int xc_cputopoinfo(xc_interface *xch, int *max_cpus,
+                   xc_cputopo_t *cputopo)
 {
     int ret;
     DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(cputopo, *max_cpus * sizeof(*cputopo),
+                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
 
-    sysctl.cmd = XEN_SYSCTL_cputopoinfo;
+    if ((ret = xc_hypercall_bounce_pre(xch, cputopo)))
+        goto out;
 
-    memcpy(&sysctl.u.cputopoinfo, put_info, sizeof(*put_info));
+    sysctl.u.cputopoinfo.max_cpu_index = *max_cpus;
+    set_xen_guest_handle(sysctl.u.cputopoinfo.cputopo, cputopo);
+
+    sysctl.cmd = XEN_SYSCTL_cputopoinfo;
 
     if ( (ret = do_sysctl(xch, &sysctl)) != 0 )
-        return ret;
+        goto out;
 
-    memcpy(put_info, &sysctl.u.cputopoinfo, sizeof(*put_info));
+    *max_cpus = sysctl.u.cputopoinfo.max_cpu_index + 1;
 
-    return 0;
+out:
+    xc_hypercall_bounce_post(xch, cputopo);
+
+    return ret;
 }
 
 int xc_numainfo(xc_interface *xch,
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index b05c0bb..f8d64c2 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5072,38 +5072,23 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo)
 libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
 {
     GC_INIT(ctx);
-    xc_cputopoinfo_t tinfo;
-    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
+    xc_cputopo_t *cputopo;
     libxl_cputopology *ret = NULL;
-    int i;
-    int max_cpus;
+    int i, max_cpus;
 
     max_cpus = libxl_get_max_cpus(ctx);
-    if (max_cpus < 0)
-    {
+    if (max_cpus < 0) {
         LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of CPUS");
-        ret = NULL;
         goto out;
     }
 
-    cputopo = xc_hypercall_buffer_alloc(ctx->xch, cputopo,
-                                        sizeof(*cputopo) * max_cpus);
-    if (cputopo == NULL) {
-        LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
-                            "Unable to allocate hypercall arguments");
-        goto fail;
-    }
+    cputopo = libxl__zalloc(gc, sizeof(*cputopo) * max_cpus);
 
-    set_xen_guest_handle(tinfo.cputopo, cputopo);
-    tinfo.max_cpu_index = max_cpus - 1;
-    if (xc_cputopoinfo(ctx->xch, &tinfo) != 0) {
+    if (xc_cputopoinfo(ctx->xch, &max_cpus, cputopo) != 0) {
         LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "CPU topology info hypercall failed");
-        goto fail;
+        goto out;
     }
 
-    if (tinfo.max_cpu_index < max_cpus - 1)
-        max_cpus = tinfo.max_cpu_index + 1;
-
     ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * max_cpus);
 
     for (i = 0; i < max_cpus; i++) {
@@ -5115,11 +5100,8 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
 #undef V
     }
 
-fail:
-    xc_hypercall_buffer_free(ctx->xch, cputopo);
+    *nb_cpu_out = max_cpus;
 
-    if (ret)
-        *nb_cpu_out = max_cpus;
  out:
     GC_FREE;
     return ret;
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index f7fe620..1d1eb40 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -355,12 +355,11 @@ static void signal_int_handler(int signo)
     int i, j, k;
     struct timeval tv;
     int cx_cap = 0, px_cap = 0;
-    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
-    xc_cputopoinfo_t info = { 0 };
-
-    cputopo = xc_hypercall_buffer_alloc(xc_handle, cputopo,
-                                        sizeof(*cputopo) * MAX_NR_CPU);
+    xc_cputopo_t *cputopo;
+    int max_cpus;
 
+    max_cpus = MAX_NR_CPU;
+    cputopo = calloc(max_cpus, sizeof(*cputopo));
     if ( cputopo == NULL )
     {
 	fprintf(stderr, "failed to allocate hypercall buffers\n");
@@ -445,29 +444,26 @@ static void signal_int_handler(int signo)
             printf("  Avg freq\t%d\tKHz\n", avgfreq[i]);
     }
 
-    set_xen_guest_handle(info.cputopo, cputopo);
-    info.max_cpu_index = MAX_NR_CPU - 1;
-
-    if ( cx_cap && !xc_cputopoinfo(xc_handle, &info) )
+    if ( cx_cap && !xc_cputopoinfo(xc_handle, &max_cpus, cputopo) )
     {
         uint32_t socket_ids[MAX_NR_CPU];
         uint32_t core_ids[MAX_NR_CPU];
         uint32_t socket_nr = 0;
         uint32_t core_nr = 0;
 
-        if ( info.max_cpu_index > MAX_NR_CPU - 1 )
-            info.max_cpu_index = MAX_NR_CPU - 1;
+        if ( max_cpus > MAX_NR_CPU )
+            max_cpus = MAX_NR_CPU;
         /* check validity */
-        for ( i = 0; i <= info.max_cpu_index; i++ )
+        for ( i = 0; i < max_cpus; i++ )
         {
             if ( cputopo[i].core == INVALID_CORE_ID ||
                  cputopo[i].socket == INVALID_SOCKET_ID )
                 break;
         }
-        if ( i > info.max_cpu_index )
+        if ( i >= max_cpus )
         {
             /* find socket nr & core nr per socket */
-            for ( i = 0; i <= info.max_cpu_index; i++ )
+            for ( i = 0; i < max_cpus; i++ )
             {
                 for ( j = 0; j < socket_nr; j++ )
                     if ( cputopo[i].socket == socket_ids[j] )
@@ -494,7 +490,7 @@ static void signal_int_handler(int signo)
                 unsigned int n;
                 uint64_t res;
 
-                for ( j = 0; j <= info.max_cpu_index; j++ )
+                for ( j = 0; j < max_cpus; j++ )
                 {
                     if ( cputopo[j].socket == socket_ids[i] )
                         break;
@@ -513,7 +509,7 @@ static void signal_int_handler(int signo)
                 }
                 for ( k = 0; k < core_nr; k++ )
                 {
-                    for ( j = 0; j <= info.max_cpu_index; j++ )
+                    for ( j = 0; j < max_cpus; j++ )
                     {
                         if ( cputopo[j].socket == socket_ids[i] &&
                              cputopo[j].core == core_ids[k] )
@@ -551,7 +547,7 @@ static void signal_int_handler(int signo)
     free(sum);
     free(avgfreq);
 out:
-    xc_hypercall_buffer_free(xc_handle, cputopo);
+    free(cputopo);
     xc_interface_close(xc_handle);
     exit(0);
 }
@@ -958,22 +954,19 @@ void scaling_governor_func(int argc, char *argv[])
 
 void cpu_topology_func(int argc, char *argv[])
 {
-    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
-    xc_cputopoinfo_t info = { 0 };
+    xc_cputopo_t *cputopo;
+    int max_cpus;
     int i, rc = ENOMEM;
 
-    cputopo = xc_hypercall_buffer_alloc(xc_handle, cputopo,
-                                        sizeof(*cputopo) * MAX_NR_CPU);
+    max_cpus = MAX_NR_CPU;
+    cputopo = calloc(max_cpus, sizeof(*cputopo));
     if ( cputopo == NULL )
     {
 	fprintf(stderr, "failed to allocate hypercall buffers\n");
 	goto out;
     }
 
-    set_xen_guest_handle(info.cputopo, cputopo);
-    info.max_cpu_index = MAX_NR_CPU-1;
-
-    if ( xc_cputopoinfo(xc_handle, &info) )
+    if ( xc_cputopoinfo(xc_handle, &max_cpus, cputopo) )
     {
         rc = errno;
         fprintf(stderr, "Cannot get Xen CPU topology (%d - %s)\n",
@@ -981,11 +974,11 @@ void cpu_topology_func(int argc, char *argv[])
         goto out;
     }
 
-    if ( info.max_cpu_index > (MAX_NR_CPU-1) )
-        info.max_cpu_index = MAX_NR_CPU-1;
+    if ( max_cpus > MAX_NR_CPU )
+        max_cpus = MAX_NR_CPU;
 
     printf("CPU\tcore\tsocket\tnode\n");
-    for ( i = 0; i <= info.max_cpu_index; i++ )
+    for ( i = 0; i < max_cpus; i++ )
     {
         if ( cputopo[i].core == INVALID_CORE_ID )
             continue;
@@ -994,7 +987,7 @@ void cpu_topology_func(int argc, char *argv[])
     }
     rc = 0;
 out:
-    xc_hypercall_buffer_free(xc_handle, cputopo);
+    free(cputopo);
     if ( rc )
         exit(rc);
 }
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 4275968..6e49dc5 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1221,31 +1221,24 @@ static PyObject *pyxc_getcpuinfo(XcObject *self, PyObject *args, PyObject *kwds)
 static PyObject *pyxc_topologyinfo(XcObject *self)
 {
 #define MAX_CPU_INDEX 255
-    xc_cputopoinfo_t tinfo = { 0 };
-    int i, max_cpu_index;
+    xc_cputopo_t *cputopo;
+    int i, max_cpus;
     PyObject *ret_obj = NULL;
     PyObject *cpu_to_core_obj, *cpu_to_socket_obj, *cpu_to_node_obj;
 
-    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
-
-    cputopo = xc_hypercall_buffer_alloc(self->xc_handle, cputopo, sizeof(*cputopo) * (MAX_CPU_INDEX+1));
+    max_cpus = MAX_CPU_INDEX + 1;
+    cputopo = calloc(max_cpus, sizeof(*cputopo));
     if ( cputopo == NULL )
 	goto out;
-    set_xen_guest_handle(tinfo.cputopo, cputopo);
-    tinfo.max_cpu_index = MAX_CPU_INDEX;
 
-    if ( xc_cputopoinfo(self->xc_handle, &tinfo) != 0 )
+    if ( xc_cputopoinfo(self->xc_handle, &max_cpus, cputopo) != 0 )
         goto out;
 
-    max_cpu_index = tinfo.max_cpu_index;
-    if ( max_cpu_index > MAX_CPU_INDEX )
-        max_cpu_index = MAX_CPU_INDEX;
-
     /* Construct cpu-to-* lists. */
     cpu_to_core_obj = PyList_New(0);
     cpu_to_socket_obj = PyList_New(0);
     cpu_to_node_obj = PyList_New(0);
-    for ( i = 0; i <= max_cpu_index; i++ )
+    for ( i = 0; i < max_cpus; i++ )
     {
         if ( cputopo[i].core == INVALID_CORE_ID )
         {
@@ -1281,7 +1274,7 @@ static PyObject *pyxc_topologyinfo(XcObject *self)
         }
     }
 
-    ret_obj = Py_BuildValue("{s:i}", "max_cpu_index", max_cpu_index);
+    ret_obj = Py_BuildValue("{s:i}", "max_cpu_index", max_cpus - 1);
 
     PyDict_SetItemString(ret_obj, "cpu_to_core", cpu_to_core_obj);
     Py_DECREF(cpu_to_core_obj);
@@ -1293,7 +1286,7 @@ static PyObject *pyxc_topologyinfo(XcObject *self)
     Py_DECREF(cpu_to_node_obj);
 
 out:
-    xc_hypercall_buffer_free(self->xc_handle, cputopo);
+    free(cputopo);
     return ret_obj ? ret_obj : pyxc_error_to_exception(self->xc_handle);
 #undef MAX_CPU_INDEX
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 6/7] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc
  2015-02-09 20:04 [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
                   ` (4 preceding siblings ...)
  2015-02-09 20:04 ` [PATCH v3 5/7] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc Boris Ostrovsky
@ 2015-02-09 20:04 ` Boris Ostrovsky
  2015-02-09 20:04 ` [PATCH v3 7/7] libxl: Add interface for querying hypervisor about PCI topology Boris Ostrovsky
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-09 20:04 UTC (permalink / raw)
  To: jbeulich, keir, andrew.cooper3, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: boris.ostrovsky, dario.faggioli, port-xen, ufimtseva, xen-devel

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 tools/libxc/include/xenctrl.h     |    4 ++-
 tools/libxc/xc_misc.c             |   29 ++++++++++++++++++++------
 tools/libxl/libxl.c               |   36 ++++++++-------------------------
 tools/python/xen/lowlevel/xc/xc.c |   39 +++++++++++++++---------------------
 4 files changed, 50 insertions(+), 58 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d94571d..4466cd7 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1228,6 +1228,7 @@ int xc_send_debug_keys(xc_interface *xch, char *keys);
 typedef xen_sysctl_physinfo_t xc_physinfo_t;
 typedef xen_sysctl_cputopo_t xc_cputopo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
+typedef xen_sysctl_meminfo_t xc_meminfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
 typedef uint32_t xc_cpu_to_socket_t;
@@ -1238,7 +1239,8 @@ typedef uint32_t xc_node_to_node_dist_t;
 
 int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
 int xc_cputopoinfo(xc_interface *xch, int *max_cpus, xc_cputopo_t *cputopo);
-int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
+int xc_numainfo(xc_interface *xch, int *max_nodes,
+                xc_meminfo_t *meminfo, uint8_t *distance);
 
 int xc_sched_id(xc_interface *xch,
                 int *sched_id);
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 4c654f3..2dd5e13 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -204,22 +204,37 @@ out:
     return ret;
 }
 
-int xc_numainfo(xc_interface *xch,
-                xc_numainfo_t *put_info)
+int xc_numainfo(xc_interface *xch, int *max_nodes,
+                xc_meminfo_t *meminfo, uint8_t *distance)
 {
     int ret;
     DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(meminfo, *max_nodes * sizeof(*meminfo),
+                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_HYPERCALL_BOUNCE(distance,
+                             *max_nodes * *max_nodes * sizeof(*distance),
+                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
 
-    sysctl.cmd = XEN_SYSCTL_numainfo;
+    if ((ret = xc_hypercall_bounce_pre(xch, meminfo)))
+        goto out;
+    if ((ret = xc_hypercall_bounce_pre(xch, distance)))
+        goto out;
 
-    memcpy(&sysctl.u.numainfo, put_info, sizeof(*put_info));
+    sysctl.u.numainfo.max_node_index = *max_nodes - 1;
+    set_xen_guest_handle(sysctl.u.numainfo.meminfo, meminfo);
+    set_xen_guest_handle(sysctl.u.numainfo.distance, distance);
 
+    sysctl.cmd = XEN_SYSCTL_numainfo;
     if ((ret = do_sysctl(xch, &sysctl)) != 0)
-        return ret;
+        goto out;
 
-    memcpy(put_info, &sysctl.u.numainfo, sizeof(*put_info));
+    *max_nodes = sysctl.u.numainfo.max_node_index + 1;
 
-    return 0;
+out:
+    xc_hypercall_bounce_post(xch, meminfo);
+    xc_hypercall_bounce_post(xch, distance);
+
+    return ret;
 }
 
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index f8d64c2..9c1c949 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5110,41 +5110,25 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
 libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
 {
     GC_INIT(ctx);
-    xc_numainfo_t ninfo;
-    DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
-    DECLARE_HYPERCALL_BUFFER(uint8_t, distance);
+    xc_meminfo_t *meminfo;
+    uint8_t *distance;
     libxl_numainfo *ret = NULL;
     int i, j, max_nodes;
 
     max_nodes = libxl_get_max_nodes(ctx);
-    if (max_nodes < 0)
-    {
+    if (max_nodes < 0) {
         LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES");
-        ret = NULL;
         goto out;
     }
 
-    meminfo = xc_hypercall_buffer_alloc(ctx->xch, meminfo, sizeof(*meminfo) * max_nodes);
-    distance = xc_hypercall_buffer_alloc(ctx->xch, distance, sizeof(*distance) * max_nodes * max_nodes);
-    if ((meminfo == NULL) || (distance == NULL)) {
-        LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
-                            "Unable to allocate hypercall arguments");
-        goto fail;
-    }
+    meminfo = libxl__zalloc(gc, sizeof(*meminfo) * max_nodes);
+    distance = libxl__zalloc(gc, sizeof(*distance) * max_nodes * max_nodes);
 
-    set_xen_guest_handle(ninfo.meminfo, meminfo);
-    set_xen_guest_handle(ninfo.distance, distance);
-    ninfo.max_node_index = max_nodes - 1;
-    if (xc_numainfo(ctx->xch, &ninfo) != 0) {
+    if (xc_numainfo(ctx->xch, &max_nodes, meminfo, distance) != 0) {
         LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting numainfo");
-        goto fail;
+        goto out;
     }
 
-    if (ninfo.max_node_index < max_nodes - 1)
-        max_nodes = ninfo.max_node_index + 1;
-
-    *nr = max_nodes;
-
     ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * max_nodes);
     for (i = 0; i < max_nodes; i++)
         ret[i].dists = libxl__calloc(NOGC, max_nodes, sizeof(*distance));
@@ -5156,13 +5140,11 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
         ret[i].free = V(memfree, i);
         ret[i].num_dists = max_nodes;
         for (j = 0; j < ret[i].num_dists; j++)
-            ret[i].dists[j] = distance[i*max_nodes + j];
+            ret[i].dists[j] = distance[i * max_nodes + j];
 #undef V
     }
 
- fail:
-    xc_hypercall_buffer_free(ctx->xch, meminfo);
-    xc_hypercall_buffer_free(ctx->xch, distance);
+    *nr = max_nodes;
 
  out:
     GC_FREE;
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 6e49dc5..3fd4805 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1294,39 +1294,32 @@ out:
 static PyObject *pyxc_numainfo(XcObject *self)
 {
 #define MAX_NODE_INDEX 31
-    xc_numainfo_t ninfo = { 0 };
-    int i, j, max_node_index, invalid_node;
+    int i, j, max_nodes, invalid_node;
     uint64_t free_heap;
     PyObject *ret_obj = NULL, *node_to_node_dist_list_obj;
     PyObject *node_to_memsize_obj, *node_to_memfree_obj;
     PyObject *node_to_dma32_mem_obj, *node_to_node_dist_obj;
-    DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
-    DECLARE_HYPERCALL_BUFFER(uint8_t, distance);
+    xc_meminfo_t *meminfo;
+    uint8_t *distance;
 
-    meminfo = xc_hypercall_buffer_alloc(self->xc_handle, meminfo, sizeof(*meminfo) * (MAX_NODE_INDEX+1));
-    if ( meminfo == NULL )
+    max_nodes = MAX_NODE_INDEX + 1;
+    meminfo = calloc(max_nodes, sizeof(*meminfo));
+    distance = calloc(max_nodes * max_nodes, sizeof(*distance));
+    if ( meminfo == NULL || distance == NULL)
         goto out;
-    distance = xc_hypercall_buffer_alloc(self->xc_handle, distance, sizeof(*distance)*(MAX_NODE_INDEX+1)*(MAX_NODE_INDEX+1));
-    if ( distance == NULL )
-        goto out;
-
-    set_xen_guest_handle(ninfo.meminfo, meminfo);
-    set_xen_guest_handle(ninfo.distance, distance);
-    ninfo.max_node_index = MAX_NODE_INDEX;
 
-    if ( xc_numainfo(self->xc_handle, &ninfo) != 0 )
+    if ( xc_numainfo(self->xc_handle, &max_nodes, meminfo, distance) != 0 )
         goto out;
 
-    max_node_index = ninfo.max_node_index;
-    if ( max_node_index > MAX_NODE_INDEX )
-        max_node_index = MAX_NODE_INDEX;
+    if ( max_nodes > MAX_NODE_INDEX + 1 )
+        max_nodes = MAX_NODE_INDEX + 1;
 
     /* Construct node-to-* lists. */
     node_to_memsize_obj = PyList_New(0);
     node_to_memfree_obj = PyList_New(0);
     node_to_dma32_mem_obj = PyList_New(0);
     node_to_node_dist_list_obj = PyList_New(0);
-    for ( i = 0; i <= max_node_index; i++ )
+    for ( i = 0; i < max_nodes; i++ )
     {
         PyObject *pyint;
 
@@ -1349,9 +1342,9 @@ static PyObject *pyxc_numainfo(XcObject *self)
         /* Node to Node Distance */
         node_to_node_dist_obj = PyList_New(0);
         invalid_node = (meminfo[i].memsize == INVALID_MEM_SZ);
-        for ( j = 0; j <= max_node_index; j++ )
+        for ( j = 0; j < max_nodes; j++ )
         {
-            uint32_t dist = distance[i * (max_node_index + 1) + j];
+            uint32_t dist = distance[i * max_nodes + j];
             if ( invalid_node )
             {
                 PyList_Append(node_to_node_dist_obj, Py_None);
@@ -1367,7 +1360,7 @@ static PyObject *pyxc_numainfo(XcObject *self)
         Py_DECREF(node_to_node_dist_obj);
     }
 
-    ret_obj = Py_BuildValue("{s:i}", "max_node_index", max_node_index);
+    ret_obj = Py_BuildValue("{s:i}", "max_node_index", max_nodes - 1);
 
     PyDict_SetItemString(ret_obj, "node_memsize", node_to_memsize_obj);
     Py_DECREF(node_to_memsize_obj);
@@ -1383,8 +1376,8 @@ static PyObject *pyxc_numainfo(XcObject *self)
     Py_DECREF(node_to_node_dist_list_obj);
 
 out:
-    xc_hypercall_buffer_free(self->xc_handle, meminfo);
-    xc_hypercall_buffer_free(self->xc_handle, distance);
+    free(meminfo);
+    free(distance);
     return ret_obj ? ret_obj : pyxc_error_to_exception(self->xc_handle);
 #undef MAX_NODE_INDEX
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v3 7/7] libxl: Add interface for querying hypervisor about PCI topology
  2015-02-09 20:04 [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
                   ` (5 preceding siblings ...)
  2015-02-09 20:04 ` [PATCH v3 6/7] libxl/libxc: Move libxl_get_numainfo()'s " Boris Ostrovsky
@ 2015-02-09 20:04 ` Boris Ostrovsky
  2015-02-13 12:43   ` Wei Liu
                     ` (3 more replies)
  2015-02-10  8:52 ` [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup) Jan Beulich
                   ` (4 subsequent siblings)
  11 siblings, 4 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-09 20:04 UTC (permalink / raw)
  To: jbeulich, keir, andrew.cooper3, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: boris.ostrovsky, dario.faggioli, port-xen, ufimtseva, xen-devel

.. and use this new interface to display it along with CPU topology
and NUMA information when 'xl info -n' command is issued

The output will look like
...
cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0
...
device topology        :
device           node
0000:00:00.0      0
0000:00:01.0      0
...

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 tools/libxc/include/xenctrl.h |    3 ++
 tools/libxc/xc_misc.c         |   32 +++++++++++++++++
 tools/libxl/libxl.c           |   49 +++++++++++++++++++++++++++
 tools/libxl/libxl.h           |   14 ++++++++
 tools/libxl/libxl_freebsd.c   |   14 ++++++++
 tools/libxl/libxl_internal.h  |    5 +++
 tools/libxl/libxl_linux.c     |   74 +++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_netbsd.c    |   14 ++++++++
 tools/libxl/libxl_types.idl   |    7 ++++
 tools/libxl/libxl_utils.c     |    8 ++++
 tools/libxl/xl_cmdimpl.c      |   44 ++++++++++++++++++++----
 11 files changed, 257 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 4466cd7..9ba40eb 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1229,6 +1229,7 @@ typedef xen_sysctl_physinfo_t xc_physinfo_t;
 typedef xen_sysctl_cputopo_t xc_cputopo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
 typedef xen_sysctl_meminfo_t xc_meminfo_t;
+typedef xen_sysctl_pcitopoinfo_t xc_pcitopoinfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
 typedef uint32_t xc_cpu_to_socket_t;
@@ -1241,6 +1242,8 @@ int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
 int xc_cputopoinfo(xc_interface *xch, int *max_cpus, xc_cputopo_t *cputopo);
 int xc_numainfo(xc_interface *xch, int *max_nodes,
                 xc_meminfo_t *meminfo, uint8_t *distance);
+int xc_pcitopoinfo(xc_interface *xch, int num_devs,
+                   physdev_pci_device_t *devs, uint8_t *nodes);
 
 int xc_sched_id(xc_interface *xch,
                 int *sched_id);
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 2dd5e13..a8266d7 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -204,6 +204,38 @@ out:
     return ret;
 }
 
+int xc_pcitopoinfo(xc_interface *xch, int num_devs,
+                   physdev_pci_device_t *devs,
+                   uint8_t *nodes)
+{
+    int ret;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(devs, num_devs * sizeof(*devs),
+                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    DECLARE_HYPERCALL_BOUNCE(nodes, num_devs* sizeof(*nodes),
+                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ((ret = xc_hypercall_bounce_pre(xch, devs)))
+        goto out;
+    if ((ret = xc_hypercall_bounce_pre(xch, nodes)))
+        goto out;
+
+    sysctl.u.pcitopoinfo.first_dev = 0;
+    sysctl.u.pcitopoinfo.num_devs = num_devs;
+    set_xen_guest_handle(sysctl.u.pcitopoinfo.devs, devs);
+    set_xen_guest_handle(sysctl.u.pcitopoinfo.nodes, nodes);
+
+    sysctl.cmd = XEN_SYSCTL_pcitopoinfo;
+
+    ret = do_sysctl(xch, &sysctl);
+
+ out:
+    xc_hypercall_bounce_post(xch, devs);
+    xc_hypercall_bounce_post(xch, nodes);
+
+    return ret;
+}
+
 int xc_numainfo(xc_interface *xch, int *max_nodes,
                 xc_meminfo_t *meminfo, uint8_t *distance)
 {
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 9c1c949..c3b6c8a 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5107,6 +5107,55 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
     return ret;
 }
 
+#ifdef  LIBXL_HAVE_PCITOPO
+libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs)
+{
+    GC_INIT(ctx);
+    physdev_pci_device_t *devs;
+    uint8_t *nodes;
+    libxl_pcitopology *ret = NULL;
+    int i, rc;
+
+    *num_devs = libxl__pci_numdevs(gc);
+    if (*num_devs <= 0) {
+        LOG(ERROR, "Unable to determine number of PCI devices");
+        goto out;
+    }
+
+    devs = libxl__zalloc(gc, sizeof(*devs) * *num_devs);
+    nodes = libxl__zalloc(gc, sizeof(*nodes) * *num_devs);
+    if (devs == NULL || nodes == NULL) {
+        LOGEV(ERROR, ENOMEM, "Unable to allocate hypercall arguments");
+        goto out;
+    }
+
+    rc = libxl__pci_topology_init(gc, devs, *num_devs);
+    if (rc) {
+        LOGEV(ERROR, rc, "Cannot initialize PCI hypercall structure");
+        goto out;
+    }
+
+    if (xc_pcitopoinfo(ctx->xch, *num_devs, devs, nodes) != 0) {
+        LOGE(ERROR, "PCI topology info hypercall failed");
+        goto out;
+    }
+
+    ret = libxl__zalloc(NOGC, sizeof(libxl_pcitopology) * *num_devs);
+
+    for (i = 0; i < *num_devs; i++) {
+        ret[i].seg = devs[i].seg;
+        ret[i].bus = devs[i].bus;
+        ret[i].devfn = devs[i].devfn;
+        ret[i].node = (nodes[i] == INVALID_NODE_ID) ?
+            LIBXL_PCITOPOLOGY_INVALID_ENTRY : nodes[i];
+    }
+
+ out:
+    GC_FREE;
+    return ret;
+}
+#endif
+
 libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
 {
     GC_INIT(ctx);
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 0a123f1..12d9fb9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -692,6 +692,14 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, libxl_mac *src);
 #define LIBXL_HAVE_PSR_CMT 1
 #endif
 
+/*
+ * LIBXL_HAVE_PCITOPO
+ *
+ * If this is defined, we have interfaces to query hypervisor about PCI device
+ * topology
+ */
+#define  LIBXL_HAVE_PCITOPO 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
@@ -1070,6 +1078,12 @@ void libxl_vminfo_list_free(libxl_vminfo *list, int nb_vm);
 libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out);
 void libxl_cputopology_list_free(libxl_cputopology *, int nb_cpu);
 
+#ifdef  LIBXL_HAVE_PCITOPO
+#define LIBXL_PCITOPOLOGY_INVALID_ENTRY (~(uint32_t)0)
+libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs);
+void libxl_pcitopology_list_free(libxl_pcitopology *, int num_devs);
+#endif
+
 #define LIBXL_NUMAINFO_INVALID_ENTRY (~(uint32_t)0)
 libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr);
 void libxl_numainfo_list_free(libxl_numainfo *, int nr);
diff --git a/tools/libxl/libxl_freebsd.c b/tools/libxl/libxl_freebsd.c
index e8b88b3..68d7b71 100644
--- a/tools/libxl/libxl_freebsd.c
+++ b/tools/libxl/libxl_freebsd.c
@@ -131,3 +131,17 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
 {
     return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
 }
+
+#ifdef  LIBXL_HAVE_PCITOPO
+int libxl__pci_numdevs(libxl__gc *gc)
+{
+    return ERROR_NI;
+}
+
+int libxl__pci_topology_init(libxl__gc *gc,
+                             physdev_pci_device_t *devs,
+                             int num_devs)
+{
+    return ERROR_NI;
+}
+#endif
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 9695f18..5a3a2c5 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1168,6 +1168,11 @@ _hidden int libxl__try_phy_backend(mode_t st_mode);
 
 _hidden char *libxl__devid_to_localdev(libxl__gc *gc, int devid);
 
+_hidden int libxl__pci_numdevs(libxl__gc *gc);
+_hidden int libxl__pci_topology_init(libxl__gc *gc,
+                                     physdev_pci_device_t *devs,
+                                     int num_devs);
+
 /* from libxl_pci */
 
 _hidden int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, int starting);
diff --git a/tools/libxl/libxl_linux.c b/tools/libxl/libxl_linux.c
index ea5d8c1..f558dba 100644
--- a/tools/libxl/libxl_linux.c
+++ b/tools/libxl/libxl_linux.c
@@ -279,3 +279,77 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
 {
     return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
 }
+
+#ifdef  LIBXL_HAVE_PCITOPO
+/* These two routines are "inspired" by pciutils */
+int libxl__pci_numdevs(libxl__gc *gc)
+{
+    DIR *dir;
+    struct dirent *entry;
+    int num_devs = 0;
+
+    dir = opendir("/sys/bus/pci/devices");
+    if (!dir) {
+        LOGEV(ERROR, errno, "Cannot open /sys/bus/pci/devices");
+        return ERROR_FAIL;
+    }
+
+    while ((entry = readdir(dir))) {
+        /* ".", ".." or a special non-device perhaps */
+        if (entry->d_name[0] == '.')
+            continue;
+        num_devs++;
+    }
+    closedir(dir);
+
+    return num_devs;
+}
+
+int libxl__pci_topology_init(libxl__gc *gc,
+                             physdev_pci_device_t *devs,
+                             int num_devs)
+{
+
+    DIR *dir;
+    struct dirent *entry;
+    int i, err = 0;
+
+    dir = opendir("/sys/bus/pci/devices");
+    if (!dir) {
+        LOGEV(ERROR, errno, "Cannot open /sys/bus/pci/devices");
+        return ERROR_FAIL;
+    }
+
+    i = 0;
+    while ((entry = readdir(dir))) {
+        unsigned int dom, bus, dev, func;
+
+        /* ".", ".." or a special non-device perhaps */
+        if (entry->d_name[0] == '.')
+            continue;
+
+        if (i == num_devs) {
+            LOGE(ERROR, "Too many devices\n");
+            err = ERROR_FAIL;
+            goto out;
+        }
+
+        if (sscanf(entry->d_name, "%x:%x:%x.%d", &dom, &bus, &dev, &func) < 4) {
+            LOGEV(ERROR, errno, "Error processing /sys/bus/pci/devices");
+            err = ERROR_FAIL;
+            goto out;
+        }
+
+        devs[i].seg = dom;
+        devs[i].bus = bus;
+        devs[i].devfn = ((dev & 0x1f) << 3) | (func & 7);
+
+        i++;
+    }
+
+ out:
+    closedir(dir);
+
+    return err;
+}
+#endif
diff --git a/tools/libxl/libxl_netbsd.c b/tools/libxl/libxl_netbsd.c
index 898e160..25c1701 100644
--- a/tools/libxl/libxl_netbsd.c
+++ b/tools/libxl/libxl_netbsd.c
@@ -95,3 +95,17 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
 {
     return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL;
 }
+
+#ifdef  LIBXL_HAVE_PCITOPO
+int libxl__pci_numdevs(libxl__gc *gc)
+{
+    return ERROR_NI;
+}
+
+int libxl__pci_topology_init(libxl__gc *gc,
+                             physdev_pci_device_t *devs,
+                             int num_devs)
+{
+    return ERROR_NI;
+}
+#endif
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index f7fc695..e49b5f8 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -642,6 +642,13 @@ libxl_cputopology = Struct("cputopology", [
     ("node", uint32),
     ], dir=DIR_OUT)
 
+libxl_pcitopology = Struct("pcitopology", [
+    ("seg", uint16),
+    ("bus", uint8),
+    ("devfn", uint8),
+    ("node", uint32),
+    ], dir=DIR_OUT)
+
 libxl_sched_credit_params = Struct("sched_credit_params", [
     ("tslice_ms", integer),
     ("ratelimit_us", integer),
diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index 7095b58..c31407f 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -877,6 +877,14 @@ void libxl_cputopology_list_free(libxl_cputopology *list, int nr)
     free(list);
 }
 
+void libxl_pcitopology_list_free(libxl_pcitopology *list, int nr)
+{
+    int i;
+    for (i = 0; i < nr; i++)
+        libxl_pcitopology_dispose(&list[i]);
+    free(list);
+}
+
 void libxl_numainfo_list_free(libxl_numainfo *list, int nr)
 {
     int i;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index ed0d478..c0e3f52 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -5196,12 +5196,17 @@ static void output_numainfo(void)
 
 static void output_topologyinfo(void)
 {
-    libxl_cputopology *info;
+    libxl_cputopology *cpuinfo;
     int i, nr;
+#ifdef  LIBXL_HAVE_PCITOPO
+    libxl_pcitopology *pciinfo;
+    int valid_devs;
+#endif
 
-    info = libxl_get_cpu_topology(ctx, &nr);
-    if (info == NULL) {
-        fprintf(stderr, "libxl_get_topologyinfo failed.\n");
+
+    cpuinfo = libxl_get_cpu_topology(ctx, &nr);
+    if (cpuinfo == NULL) {
+        fprintf(stderr, "libxl_get_cpu_topology failed.\n");
         return;
     }
 
@@ -5209,12 +5214,37 @@ static void output_topologyinfo(void)
     printf("cpu:    core    socket     node\n");
 
     for (i = 0; i < nr; i++) {
-        if (info[i].core != LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
+        if (cpuinfo[i].core != LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
             printf("%3d:    %4d     %4d     %4d\n", i,
-                   info[i].core, info[i].socket, info[i].node);
+                   cpuinfo[i].core, cpuinfo[i].socket, cpuinfo[i].node);
+    }
+
+    libxl_cputopology_list_free(cpuinfo, nr);
+
+#ifdef  LIBXL_HAVE_PCITOPO
+    pciinfo = libxl_get_pci_topology(ctx, &nr);
+    if (cpuinfo == NULL) {
+        fprintf(stderr, "libxl_get_pci_topology failed.\n");
+        return;
     }
 
-    libxl_cputopology_list_free(info, nr);
+    printf("device topology        :\n");
+    printf("device           node\n");
+    for (i = 0; i < nr; i++) {
+        if (pciinfo[i].node != LIBXL_PCITOPOLOGY_INVALID_ENTRY) {
+            printf("%04x:%02x:%02x.%01x      %d\n", pciinfo[i].seg,
+                   pciinfo[i].bus,
+                   ((pciinfo[i].devfn >> 3) & 0x1f), (pciinfo[i].devfn & 7),
+                   pciinfo[i].node);
+            valid_devs++;
+        }
+    }
+
+    if (valid_devs == 0)
+        printf("No device topology data available\n");
+
+    libxl_pcitopology_list_free(pciinfo, nr);
+#endif
 
     return;
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup)
  2015-02-09 20:04 [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
                   ` (6 preceding siblings ...)
  2015-02-09 20:04 ` [PATCH v3 7/7] libxl: Add interface for querying hypervisor about PCI topology Boris Ostrovsky
@ 2015-02-10  8:52 ` Jan Beulich
  2015-02-10 11:08   ` Andrew Cooper
  2015-02-10 10:26 ` [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology Robert Elz
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2015-02-10  8:52 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, port-xen, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel,
	ufimtseva, keir

>>> On 09.02.15 at 21:04, <boris.ostrovsky@oracle.com> wrote:
> * Make changes to xen_sysctl_numainfo, similar to those made to
>   xen_sysctl_topologyinfo. (Q: I kept both sets of changes in the same
>   patch #3 to avoid bumping interface version twice. Perhaps it's better
>   to split it into two?)

There's no need to bump it more than once between releases.
Hence yes - splitting would be better.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology
  2015-02-09 20:04 [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
                   ` (7 preceding siblings ...)
  2015-02-10  8:52 ` [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup) Jan Beulich
@ 2015-02-10 10:26 ` Robert Elz
       [not found] ` <1423512275-6531-3-git-send-email-boris.ostrovsky@oracle.com>
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 49+ messages in thread
From: Robert Elz @ 2015-02-10 10:26 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, ian.campbell, port-xen, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, wei.liu2,
	ufimtseva

    Date:        Mon,  9 Feb 2015 15:04:32 -0500
    From:        Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Message-ID:  <1423512275-6531-5-git-send-email-boris.ostrovsky@oracle.com>


  | +        num_devs = ti->num_devs - ti->first_dev;
  | +
  | +        if ( guest_handle_is_null(ti->devs) ||
  | +             guest_handle_is_null(ti->nodes) ||
  | +             (num_devs <= 0) )
  | +        {
  | +            ret = -EINVAL;
  | +            break;
  | +        }

Does that need a check that ti->first_dev <= ti_num_devs ?

(that is aside from the issue of subtracting one uint from another and hoping
that you'll get a negative signed int if the one subtracted is bigger).

The (num_devs <= 0) test (even if it works - which it should on most
rational architectures) isn't good enough, as it is possible to subtract
a very big uint from a very small unit and end up with an int that is
positive (and potentially, very big, consider (32bit): 2 - 0x80000001).

So, replace the (num_devs <= 0) test by (ti->first_dev <= ti->num_devs)  ??

Or possibly include both tests, just in case ti->num_devs is very large and
ti->first_dev is small (like 0) and when the large unsigned is converted
to signed, it becomes negative - or is all of this just too impossible
to worry about?

kre

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/7] x86/numa: Make use of NUMA_NO_NODE consistent
  2015-02-09 20:04 ` [PATCH v3 1/7] x86/numa: Make use of NUMA_NO_NODE consistent Boris Ostrovsky
@ 2015-02-10 10:41   ` Andrew Cooper
  2015-02-10 11:39   ` Jan Beulich
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 49+ messages in thread
From: Andrew Cooper @ 2015-02-10 10:41 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, keir, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: dario.faggioli, port-xen, ufimtseva, xen-devel

On 09/02/15 20:04, Boris Ostrovsky wrote:
> Various pieces of code test whether node value is NUMA_NO_NODE even
> though pxm_to_node() may return (int)-1 for an invalid node.
>
> Make pxm_to_node() and setup_node() return u8 and have them return
> NUMA_NO_NODE when necessary.
>
> Adjust code that tests for (node == -1).
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Looks fine.  A few minor comments.

> ---
>  xen/arch/x86/smpboot.c              |    2 +-
>  xen/arch/x86/srat.c                 |   39 ++++++++++++++++++++++------------
>  xen/arch/x86/x86_64/mm.c            |    2 +-
>  xen/drivers/passthrough/vtd/iommu.c |    4 +-
>  xen/include/asm-x86/numa.h          |    4 +-
>  5 files changed, 31 insertions(+), 20 deletions(-)
>
> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> index c54be7e..97956be 100644
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -877,7 +877,7 @@ int cpu_add(uint32_t apic_id, uint32_t acpi_id, uint32_t pxm)
>  
>      if ( !srat_disabled() )
>      {
> -        if ( (node = setup_node(pxm)) < 0 )
> +        if ( (node = setup_node(pxm)) == NUMA_NO_NODE )
>          {
>              dprintk(XENLOG_WARNING,
>                      "Setup node failed for pxm %x\n", pxm);
> diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> index 29fc724..4dfa1c3 100644
> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -21,13 +21,16 @@
>  #include <asm/e820.h>
>  #include <asm/page.h>
>  
> +#define MAX_PXM   255
> +
>  static struct acpi_table_slit *__read_mostly acpi_slit;
>  
>  static nodemask_t memory_nodes_parsed __initdata;
>  static nodemask_t processor_nodes_parsed __initdata;
>  static nodemask_t nodes_found __initdata;
>  static struct node nodes[MAX_NUMNODES] __initdata;
> -static u8 __read_mostly pxm2node[256] = { [0 ... 255] = NUMA_NO_NODE };
> +static u8 __read_mostly pxm2node[MAX_PXM + 1] =
> +    { [0 ... MAX_PXM] = NUMA_NO_NODE };
>  
>  
>  static int num_node_memblks;
> @@ -37,21 +40,29 @@ static int memblk_nodeid[NR_NODE_MEMBLKS];
>  
>  static int node_to_pxm(int n);
>  
> -int pxm_to_node(int pxm)
> +u8 pxm_to_node(int pxm)

You can make these parameters unsigned and do away with the unsigned
casting.  pxm appears to be unsigned in all the relevant call chains.

>  {
> -	if ((unsigned)pxm >= 256)
> -		return -1;
> -	/* Extend 0xff to (int)-1 */
> -	return (signed char)pxm2node[pxm];
> +	if ((unsigned)pxm > MAX_PXM)
> +		return NUMA_NO_NODE;
> +
> +	return pxm2node[pxm];
>  }
>  
> -__devinit int setup_node(int pxm)
> +__devinit u8 setup_node(int pxm)
>  {
> -	unsigned node = pxm2node[pxm];
> -	if (node == 0xff) {
> +	u8 node;
> +
> +	/* NUMA_NO_NODE is 255 */
> +	BUILD_BUG_ON(MAX_NUMNODES > 254);
> +
> +	if ((unsigned)pxm > MAX_PXM)
> +		return NUMA_NO_NODE;
> +
> +	node = pxm2node[pxm];
> +	if (node == NUMA_NO_NODE) {
>  		if (nodes_weight(nodes_found) >= MAX_NUMNODES)
> -			return -1;
> -		node = first_unset_node(nodes_found); 
> +			return NUMA_NO_NODE;
> +		node = first_unset_node(nodes_found);
>  		node_set(node, nodes_found);
>  		pxm2node[pxm] = node;
>  	}
> @@ -175,7 +186,7 @@ acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
>  		return;
>  	pxm = pa->proximity_domain;
>  	node = setup_node(pxm);
> -	if (node < 0) {
> +	if (node == NUMA_NO_NODE) {
>  		printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm);
>  		bad_srat();
>  		return;
> @@ -208,7 +219,7 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
>  		pxm |= pa->proximity_domain_hi[2] << 24;
>  	}
>  	node = setup_node(pxm);
> -	if (node < 0) {
> +	if (node == NUMA_NO_NODE) {
>  		printk(KERN_ERR "SRAT: Too many proximity domains %x\n", pxm);
>  		bad_srat();
>  		return;
> @@ -252,7 +263,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  	if (srat_rev < 2)
>  		pxm &= 0xff;
>  	node = setup_node(pxm);
> -	if (node < 0) {
> +	if (node == NUMA_NO_NODE) {
>  		printk(KERN_ERR "SRAT: Too many proximity domains.\n");
>  		bad_srat();
>  		return;
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index d631aee..4817dad 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -1353,7 +1353,7 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>      if ( !mem_hotadd_check(spfn, epfn) )
>          return -EINVAL;
>  
> -    if ( (node = setup_node(pxm)) == -1 )
> +    if ( (node = setup_node(pxm)) == NUMA_NO_NODE )
>          return -EINVAL;
>  
>      if ( !valid_numa_range(spfn << PAGE_SHIFT, epfn << PAGE_SHIFT, node) )
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index 19d8165..fd48f7b 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -190,14 +190,14 @@ u64 alloc_pgtable_maddr(struct acpi_drhd_unit *drhd, unsigned long npages)
>      struct acpi_rhsa_unit *rhsa;
>      struct page_info *pg, *cur_pg;
>      u64 *vaddr;
> -    int node = -1, i;
> +    int node = NUMA_NO_NODE, i;
>  
>      rhsa = drhd_to_rhsa(drhd);
>      if ( rhsa )
>          node =  pxm_to_node(rhsa->proximity_domain);
>  
>      pg = alloc_domheap_pages(NULL, get_order_from_pages(npages),
> -                             (node == -1 ) ? 0 : MEMF_node(node));
> +                             (node == NUMA_NO_NODE ) ? 0 : MEMF_node(node));

Drop the stray space while you are editing this line.

>      if ( !pg )
>          return 0;
>  
> diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> index 5959860..2a43827 100644
> --- a/xen/include/asm-x86/numa.h
> +++ b/xen/include/asm-x86/numa.h
> @@ -21,7 +21,7 @@ struct node {
>  
>  extern int compute_hash_shift(struct node *nodes, int numnodes,
>  			      int *nodeids);
> -extern int pxm_to_node(int nid);
> +extern u8 pxm_to_node(int nid);

The parameter should presumably be named pxm?

~Andrew

>  
>  #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
>  #define VIRTUAL_BUG_ON(x) 
> @@ -33,7 +33,7 @@ extern int numa_off;
>  
>  extern int srat_disabled(void);
>  extern void numa_set_node(int cpu, int node);
> -extern int setup_node(int pxm);
> +extern u8 setup_node(int pxm);
>  extern void srat_detect_node(int cpu);
>  
>  extern void setup_node_bootmem(int nodeid, u64 start, u64 end);

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 2/7] pci: Do not ignore device's PXM information
       [not found] ` <1423512275-6531-3-git-send-email-boris.ostrovsky@oracle.com>
@ 2015-02-10 10:45   ` Andrew Cooper
  2015-02-10 11:43   ` Jan Beulich
  1 sibling, 0 replies; 49+ messages in thread
From: Andrew Cooper @ 2015-02-10 10:45 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, keir, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: dario.faggioli, port-xen, ufimtseva, xen-devel

On 09/02/15 20:04, Boris Ostrovsky wrote:
> If ACPI provides PXM data for IO devices then dom0 will pass it to
> hypervisor during PHYSDEVOP_pci_device_add call. This information,
> however, is currently ignored.
>
> We will store this information (in the form of nodeID) in pci_dev
> structure so that we can provide it, for example, to the toolstack
> when it adds support (in the following patches) for querying the
> hypervisor about device topology
>
> We will also print it when user requests device information dump.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> ---
>  xen/arch/x86/physdev.c        |   23 ++++++++++++++++++++---
>  xen/drivers/passthrough/pci.c |   13 +++++++++----
>  xen/include/public/physdev.h  |    6 ++++++
>  xen/include/xen/pci.h         |    5 ++++-
>  4 files changed, 39 insertions(+), 8 deletions(-)
>
> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
> index 6b3201b..ab92fa5 100644
> --- a/xen/arch/x86/physdev.c
> +++ b/xen/arch/x86/physdev.c
> @@ -565,7 +565,8 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          if ( copy_from_guest(&manage_pci, arg, 1) != 0 )
>              break;
>  
> -        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn, NULL);
> +        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn,
> +                             NULL, NUMA_NO_NODE);
>          break;
>      }
>  
> @@ -597,13 +598,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          pdev_info.physfn.devfn = manage_pci_ext.physfn.devfn;
>          ret = pci_add_device(0, manage_pci_ext.bus,
>                               manage_pci_ext.devfn,
> -                             &pdev_info);
> +                             &pdev_info, NUMA_NO_NODE);
>          break;
>      }
>  
>      case PHYSDEVOP_pci_device_add: {
>          struct physdev_pci_device_add add;
>          struct pci_dev_info pdev_info;
> +        u8 node;
>  
>          ret = -EFAULT;
>          if ( copy_from_guest(&add, arg, 1) != 0 )
> @@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          }
>          else
>              pdev_info.is_virtfn = 0;
> -        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
> +
> +        if ( add.flags & XEN_PCI_DEV_PXM )
> +        {
> +            uint32_t pxm;
> +            size_t optarr_off = offsetof(struct physdev_pci_device_add, optarr) /
> +                                sizeof(add.optarr[0]);
> +
> +            if ( copy_from_guest_offset(&pxm, arg, optarr_off, 1) )
> +                break;
> +
> +            node = pxm_to_node(pxm);
> +        }
> +        else
> +            node = NUMA_NO_NODE;
> +
> +        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info, node);
>          break;
>      }
>  
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 78c6977..43b4384 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -568,7 +568,8 @@ static void pci_enable_acs(struct pci_dev *pdev)
>      pci_conf_write16(seg, bus, dev, func, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> -int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
> +int pci_add_device(u16 seg, u8 bus, u8 devfn,
> +                   const struct pci_dev_info *info, u8 node)
>  {
>      struct pci_seg *pseg;
>      struct pci_dev *pdev;
> @@ -586,7 +587,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
>          pdev = pci_get_pdev(seg, info->physfn.bus, info->physfn.devfn);
>          spin_unlock(&pcidevs_lock);
>          if ( !pdev )
> -            pci_add_device(seg, info->physfn.bus, info->physfn.devfn, NULL);
> +            pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
> +                           NULL, node);
>          pdev_type = "virtual function";
>      }
>      else
> @@ -609,6 +611,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
>      if ( !pdev )
>          goto out;
>  
> +    pdev->node = node;
> +
>      if ( info )
>          pdev->info = *info;
>      else if ( !pdev->vf_rlen[0] )
> @@ -1191,10 +1195,11 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg)
>  
>      list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
>      {
> -        printk("%04x:%02x:%02x.%u - dom %-3d - MSIs < ",
> +        printk("%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs < ",
>                 pseg->nr, pdev->bus,
>                 PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
> -               pdev->domain ? pdev->domain->domain_id : -1);
> +               pdev->domain ? pdev->domain->domain_id : -1,
> +               (pdev->node != NUMA_NO_NODE) ? pdev->node : -1);
>          list_for_each_entry ( msi, &pdev->msi_list, list )
>                 printk("%d ", msi->irq);
>          printk(">\n");
> diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
> index d547928..309346b 100644
> --- a/xen/include/public/physdev.h
> +++ b/xen/include/public/physdev.h
> @@ -293,6 +293,12 @@ struct physdev_pci_device_add {
>          uint8_t bus;
>          uint8_t devfn;
>      } physfn;
> +
> +    /*
> +     * Optional parameters array.
> +     * First element ([0]) is PXM domain associated with the device (if
> +     * XEN_PCI_DEV_PXM is set)
> +     */
>  #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
>      uint32_t optarr[];
>  #elif defined(__GNUC__)
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 5f295f3..1acab53 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -56,6 +56,8 @@ struct pci_dev {
>  
>      u8 phantom_stride;
>  
> +    u8 node; /* NUMA node */
> +
>      enum pdev_type {
>          DEV_TYPE_PCI_UNKNOWN,
>          DEV_TYPE_PCIe_ENDPOINT,
> @@ -102,7 +104,8 @@ void setup_hwdom_pci_devices(struct domain *,
>  int pci_release_devices(struct domain *d);
>  int pci_add_segment(u16 seg);
>  const unsigned long *pci_get_ro_map(u16 seg);
> -int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *);
> +int pci_add_device(u16 seg, u8 bus, u8 devfn,
> +                   const struct pci_dev_info *, u8 node);
>  int pci_remove_device(u16 seg, u8 bus, u8 devfn);
>  int pci_ro_device(int seg, int bus, int devfn);
>  void arch_pci_ro_device(int seg, int bdf);

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup)
  2015-02-10  8:52 ` [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup) Jan Beulich
@ 2015-02-10 11:08   ` Andrew Cooper
  0 siblings, 0 replies; 49+ messages in thread
From: Andrew Cooper @ 2015-02-10 11:08 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, port-xen, stefano.stabellini,
	dario.faggioli, ian.jackson, xen-devel, Jan Beulich, keir,
	ufimtseva

On 10/02/15 08:52, Jan Beulich wrote:
>>>> On 09.02.15 at 21:04, <boris.ostrovsky@oracle.com> wrote:
>> * Make changes to xen_sysctl_numainfo, similar to those made to
>>   xen_sysctl_topologyinfo. (Q: I kept both sets of changes in the same
>>   patch #3 to avoid bumping interface version twice. Perhaps it's better
>>   to split it into two?)
> There's no need to bump it more than once between releases.
> Hence yes - splitting would be better.

I will wait until patch 3 is split to do an in-depth review, but one
thing you need to fix is that the INVALID_$FOO identifiers need a XEN_
prefix to put them in the same namespace as the other identifiers.

~Andrew

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology
  2015-02-09 20:04 ` [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
@ 2015-02-10 11:13   ` Andrew Cooper
  2015-02-10 14:45     ` Boris Ostrovsky
       [not found]     ` <54DA19A4.8070603@oracle.com>
  0 siblings, 2 replies; 49+ messages in thread
From: Andrew Cooper @ 2015-02-10 11:13 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, keir, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: dario.faggioli, port-xen, ufimtseva, xen-devel

On 09/02/15 20:04, Boris Ostrovsky wrote:
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
>  xen/common/sysctl.c         |   73 +++++++++++++++++++++++++++++++++++++++++++
>  xen/include/public/sysctl.h |   29 +++++++++++++++++
>  2 files changed, 102 insertions(+), 0 deletions(-)
>
> diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
> index 30c5806..ea6557f 100644
> --- a/xen/common/sysctl.c
> +++ b/xen/common/sysctl.c
> @@ -384,7 +384,80 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>          xfree(cputopo);
>      }
>      break;
> +#ifdef HAS_PCI
> +    case XEN_SYSCTL_pcitopoinfo:
> +    {
> +        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
> +        physdev_pci_device_t *devs;
> +        uint8_t *nodes;
> +        unsigned int first_dev, i;
> +        int num_devs;
> +
> +        num_devs = ti->num_devs - ti->first_dev;
> +
> +        if ( guest_handle_is_null(ti->devs) ||
> +             guest_handle_is_null(ti->nodes) ||
> +             (num_devs <= 0) )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        devs = xmalloc_array(physdev_pci_device_t, num_devs);
> +        nodes = xmalloc_array(uint8_t, num_devs);

You can do all of this without any memory allocation at all, which will
simplify your error handling substantially.

Something along the lines of

for(...)
{
    copy one physdev_pci_device_t from the guest

    do the lookup

    copy one node id back to the guest
}

~Andrew

> +        if ( !devs || !nodes )
> +        {
> +            xfree(devs);
> +            xfree(nodes);
> +            ret = -ENOMEM;
> +            break;
> +        }
> +
> +        first_dev = ti->first_dev;
> +
> +        if ( copy_from_guest_offset(devs, ti->devs, first_dev, num_devs) )
> +        {
> +            xfree(devs);
> +            xfree(nodes);
> +            ret = -EFAULT;
> +            break;
> +        }
>  
> +        for ( i = 0; i < num_devs; i++ )
> +        {
> +            struct pci_dev *pdev;
> +
> +            spin_lock(&pcidevs_lock);
> +            pdev = pci_get_pdev(devs[i].seg, devs[i].bus, devs[i].devfn);
> +            if ( !pdev || (pdev->node == NUMA_NO_NODE) )
> +                nodes[i] = INVALID_NODE_ID;
> +            else
> +                nodes[i] = pdev->node;
> +            spin_unlock(&pcidevs_lock);
> +
> +            if ( hypercall_preempt_check() )
> +                break;
> +        }
> +
> +        ti->first_dev += i + 1;
> +
> +        if ( __copy_field_to_guest(u_sysctl, op,
> +                                   u.pcitopoinfo.first_dev) ||
> +             copy_to_guest_offset(ti->nodes, first_dev, nodes,num_devs) )
> +        {
> +            ret = -EFAULT;
> +            break;
> +        }
> +
> +        if ( ti->first_dev < ti->num_devs )
> +            ret = hypercall_create_continuation(__HYPERVISOR_sysctl,
> +                                               "h", u_sysctl);
> +
> +        xfree(devs);
> +        xfree(nodes);
> +    }
> +    break;
> +#endif
>  #ifdef TEST_COVERAGE
>      case XEN_SYSCTL_coverage_op:
>          ret = sysctl_coverage_op(&op->u.coverage_op);
> diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
> index 7c78f81..044c3a1 100644
> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -33,6 +33,7 @@
>  
>  #include "xen.h"
>  #include "domctl.h"
> +#include "physdev.h"
>  
>  #define XEN_SYSCTL_INTERFACE_VERSION 0x0000000C
>  
> @@ -494,6 +495,32 @@ struct xen_sysctl_cputopoinfo {
>  typedef struct xen_sysctl_cputopoinfo xen_sysctl_cputopoinfo_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopoinfo_t);
>  
> +/* XEN_SYSCTL_pcitopoinfo */
> +struct xen_sysctl_pcitopoinfo {
> +    /* IN: Size of pcitopo array */
> +    uint32_t num_devs;
> +
> +    /*
> +     * IN/OUT: First element of pcitopo array that needs to be processed by
> +     * hypervisor.
> +     * This is used primarily by hypercall continuations and callers will
> +     * typically set it to zero
> +     */
> +    uint32_t first_dev;
> +
> +    /* IN: list of devices */
> +    XEN_GUEST_HANDLE_64(physdev_pci_device_t) devs;
> +
> +    /*
> +     * OUT: node identifier for each device.
> +     * If information for a particular device is not avalable then set
> +     * to INVALID_NODE_ID.
> +     */
> +    XEN_GUEST_HANDLE_64(uint8) nodes;
> +};
> +typedef struct xen_sysctl_pcitopoinfo xen_sysctl_pcitopoinfo_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopoinfo_t);
> +
>  /* XEN_SYSCTL_numainfo */
>  #define INVALID_MEM_SZ (~0U)
>  
> @@ -692,12 +719,14 @@ struct xen_sysctl {
>  #define XEN_SYSCTL_scheduler_op                  19
>  #define XEN_SYSCTL_coverage_op                   20
>  #define XEN_SYSCTL_psr_cmt_op                    21
> +#define XEN_SYSCTL_pcitopoinfo                   22
>      uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
>      union {
>          struct xen_sysctl_readconsole       readconsole;
>          struct xen_sysctl_tbuf_op           tbuf_op;
>          struct xen_sysctl_physinfo          physinfo;
>          struct xen_sysctl_cputopoinfo       cputopoinfo;
> +        struct xen_sysctl_pcitopoinfo       pcitopoinfo;
>          struct xen_sysctl_numainfo          numainfo;
>          struct xen_sysctl_sched_id          sched_id;
>          struct xen_sysctl_perfc_op          perfc_op;

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 5/7] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc
  2015-02-09 20:04 ` [PATCH v3 5/7] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc Boris Ostrovsky
@ 2015-02-10 11:23   ` Andrew Cooper
  2015-02-10 14:48     ` Boris Ostrovsky
  2015-02-16 14:22   ` Dario Faggioli
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 49+ messages in thread
From: Andrew Cooper @ 2015-02-10 11:23 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, keir, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: dario.faggioli, port-xen, ufimtseva, xen-devel

On 09/02/15 20:04, Boris Ostrovsky wrote:
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
>  tools/libxc/include/xenctrl.h     |    4 +-
>  tools/libxc/xc_misc.c             |   23 +++++++++++-----
>  tools/libxl/libxl.c               |   32 +++++------------------
>  tools/misc/xenpm.c                |   51 ++++++++++++++++---------------------
>  tools/python/xen/lowlevel/xc/xc.c |   23 ++++++-----------
>  5 files changed, 55 insertions(+), 78 deletions(-)
>
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 0cb6743..d94571d 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -1226,7 +1226,7 @@ int xc_readconsolering(xc_interface *xch,
>  int xc_send_debug_keys(xc_interface *xch, char *keys);
>  
>  typedef xen_sysctl_physinfo_t xc_physinfo_t;
> -typedef xen_sysctl_cputopoinfo_t xc_cputopoinfo_t;
> +typedef xen_sysctl_cputopo_t xc_cputopo_t;

The sysctl structure is still named xen_sysctl_cputopoinfo_t.  Where
does xen_sysctl_cputopo_t come from?

>  typedef xen_sysctl_numainfo_t xc_numainfo_t;
>  
>  typedef uint32_t xc_cpu_to_node_t;
> @@ -1237,7 +1237,7 @@ typedef uint64_t xc_node_to_memfree_t;
>  typedef uint32_t xc_node_to_node_dist_t;
>  
>  int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
> -int xc_cputopoinfo(xc_interface *xch, xc_cputopoinfo_t *info);
> +int xc_cputopoinfo(xc_interface *xch, int *max_cpus, xc_cputopo_t *cputopo);

max_cpus is the length of the cputopo array, and is therefore an
unsigned value.

~Andrew

>  int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
>  
>  int xc_sched_id(xc_interface *xch,
> diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
> index be68291..4c654f3 100644
> --- a/tools/libxc/xc_misc.c
> +++ b/tools/libxc/xc_misc.c
> @@ -177,22 +177,31 @@ int xc_physinfo(xc_interface *xch,
>      return 0;
>  }
>  
> -int xc_cputopoinfo(xc_interface *xch,
> -                   xc_cputopoinfo_t *put_info)
> +int xc_cputopoinfo(xc_interface *xch, int *max_cpus,
> +                   xc_cputopo_t *cputopo)
>  {
>      int ret;
>      DECLARE_SYSCTL;
> +    DECLARE_HYPERCALL_BOUNCE(cputopo, *max_cpus * sizeof(*cputopo),
> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>  
> -    sysctl.cmd = XEN_SYSCTL_cputopoinfo;
> +    if ((ret = xc_hypercall_bounce_pre(xch, cputopo)))
> +        goto out;
>  
> -    memcpy(&sysctl.u.cputopoinfo, put_info, sizeof(*put_info));
> +    sysctl.u.cputopoinfo.max_cpu_index = *max_cpus;
> +    set_xen_guest_handle(sysctl.u.cputopoinfo.cputopo, cputopo);
> +
> +    sysctl.cmd = XEN_SYSCTL_cputopoinfo;
>  
>      if ( (ret = do_sysctl(xch, &sysctl)) != 0 )
> -        return ret;
> +        goto out;
>  
> -    memcpy(put_info, &sysctl.u.cputopoinfo, sizeof(*put_info));
> +    *max_cpus = sysctl.u.cputopoinfo.max_cpu_index + 1;
>  
> -    return 0;
> +out:
> +    xc_hypercall_bounce_post(xch, cputopo);
> +
> +    return ret;
>  }
>  
>  int xc_numainfo(xc_interface *xch,
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index b05c0bb..f8d64c2 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5072,38 +5072,23 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo)
>  libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
>  {
>      GC_INIT(ctx);
> -    xc_cputopoinfo_t tinfo;
> -    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
> +    xc_cputopo_t *cputopo;
>      libxl_cputopology *ret = NULL;
> -    int i;
> -    int max_cpus;
> +    int i, max_cpus;
>  
>      max_cpus = libxl_get_max_cpus(ctx);
> -    if (max_cpus < 0)
> -    {
> +    if (max_cpus < 0) {
>          LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of CPUS");
> -        ret = NULL;
>          goto out;
>      }
>  
> -    cputopo = xc_hypercall_buffer_alloc(ctx->xch, cputopo,
> -                                        sizeof(*cputopo) * max_cpus);
> -    if (cputopo == NULL) {
> -        LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
> -                            "Unable to allocate hypercall arguments");
> -        goto fail;
> -    }
> +    cputopo = libxl__zalloc(gc, sizeof(*cputopo) * max_cpus);
>  
> -    set_xen_guest_handle(tinfo.cputopo, cputopo);
> -    tinfo.max_cpu_index = max_cpus - 1;
> -    if (xc_cputopoinfo(ctx->xch, &tinfo) != 0) {
> +    if (xc_cputopoinfo(ctx->xch, &max_cpus, cputopo) != 0) {
>          LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "CPU topology info hypercall failed");
> -        goto fail;
> +        goto out;
>      }
>  
> -    if (tinfo.max_cpu_index < max_cpus - 1)
> -        max_cpus = tinfo.max_cpu_index + 1;
> -
>      ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * max_cpus);
>  
>      for (i = 0; i < max_cpus; i++) {
> @@ -5115,11 +5100,8 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
>  #undef V
>      }
>  
> -fail:
> -    xc_hypercall_buffer_free(ctx->xch, cputopo);
> +    *nb_cpu_out = max_cpus;
>  
> -    if (ret)
> -        *nb_cpu_out = max_cpus;
>   out:
>      GC_FREE;
>      return ret;
> diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
> index f7fe620..1d1eb40 100644
> --- a/tools/misc/xenpm.c
> +++ b/tools/misc/xenpm.c
> @@ -355,12 +355,11 @@ static void signal_int_handler(int signo)
>      int i, j, k;
>      struct timeval tv;
>      int cx_cap = 0, px_cap = 0;
> -    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
> -    xc_cputopoinfo_t info = { 0 };
> -
> -    cputopo = xc_hypercall_buffer_alloc(xc_handle, cputopo,
> -                                        sizeof(*cputopo) * MAX_NR_CPU);
> +    xc_cputopo_t *cputopo;
> +    int max_cpus;
>  
> +    max_cpus = MAX_NR_CPU;
> +    cputopo = calloc(max_cpus, sizeof(*cputopo));
>      if ( cputopo == NULL )
>      {
>  	fprintf(stderr, "failed to allocate hypercall buffers\n");
> @@ -445,29 +444,26 @@ static void signal_int_handler(int signo)
>              printf("  Avg freq\t%d\tKHz\n", avgfreq[i]);
>      }
>  
> -    set_xen_guest_handle(info.cputopo, cputopo);
> -    info.max_cpu_index = MAX_NR_CPU - 1;
> -
> -    if ( cx_cap && !xc_cputopoinfo(xc_handle, &info) )
> +    if ( cx_cap && !xc_cputopoinfo(xc_handle, &max_cpus, cputopo) )
>      {
>          uint32_t socket_ids[MAX_NR_CPU];
>          uint32_t core_ids[MAX_NR_CPU];
>          uint32_t socket_nr = 0;
>          uint32_t core_nr = 0;
>  
> -        if ( info.max_cpu_index > MAX_NR_CPU - 1 )
> -            info.max_cpu_index = MAX_NR_CPU - 1;
> +        if ( max_cpus > MAX_NR_CPU )
> +            max_cpus = MAX_NR_CPU;
>          /* check validity */
> -        for ( i = 0; i <= info.max_cpu_index; i++ )
> +        for ( i = 0; i < max_cpus; i++ )
>          {
>              if ( cputopo[i].core == INVALID_CORE_ID ||
>                   cputopo[i].socket == INVALID_SOCKET_ID )
>                  break;
>          }
> -        if ( i > info.max_cpu_index )
> +        if ( i >= max_cpus )
>          {
>              /* find socket nr & core nr per socket */
> -            for ( i = 0; i <= info.max_cpu_index; i++ )
> +            for ( i = 0; i < max_cpus; i++ )
>              {
>                  for ( j = 0; j < socket_nr; j++ )
>                      if ( cputopo[i].socket == socket_ids[j] )
> @@ -494,7 +490,7 @@ static void signal_int_handler(int signo)
>                  unsigned int n;
>                  uint64_t res;
>  
> -                for ( j = 0; j <= info.max_cpu_index; j++ )
> +                for ( j = 0; j < max_cpus; j++ )
>                  {
>                      if ( cputopo[j].socket == socket_ids[i] )
>                          break;
> @@ -513,7 +509,7 @@ static void signal_int_handler(int signo)
>                  }
>                  for ( k = 0; k < core_nr; k++ )
>                  {
> -                    for ( j = 0; j <= info.max_cpu_index; j++ )
> +                    for ( j = 0; j < max_cpus; j++ )
>                      {
>                          if ( cputopo[j].socket == socket_ids[i] &&
>                               cputopo[j].core == core_ids[k] )
> @@ -551,7 +547,7 @@ static void signal_int_handler(int signo)
>      free(sum);
>      free(avgfreq);
>  out:
> -    xc_hypercall_buffer_free(xc_handle, cputopo);
> +    free(cputopo);
>      xc_interface_close(xc_handle);
>      exit(0);
>  }
> @@ -958,22 +954,19 @@ void scaling_governor_func(int argc, char *argv[])
>  
>  void cpu_topology_func(int argc, char *argv[])
>  {
> -    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
> -    xc_cputopoinfo_t info = { 0 };
> +    xc_cputopo_t *cputopo;
> +    int max_cpus;
>      int i, rc = ENOMEM;
>  
> -    cputopo = xc_hypercall_buffer_alloc(xc_handle, cputopo,
> -                                        sizeof(*cputopo) * MAX_NR_CPU);
> +    max_cpus = MAX_NR_CPU;
> +    cputopo = calloc(max_cpus, sizeof(*cputopo));
>      if ( cputopo == NULL )
>      {
>  	fprintf(stderr, "failed to allocate hypercall buffers\n");
>  	goto out;
>      }
>  
> -    set_xen_guest_handle(info.cputopo, cputopo);
> -    info.max_cpu_index = MAX_NR_CPU-1;
> -
> -    if ( xc_cputopoinfo(xc_handle, &info) )
> +    if ( xc_cputopoinfo(xc_handle, &max_cpus, cputopo) )
>      {
>          rc = errno;
>          fprintf(stderr, "Cannot get Xen CPU topology (%d - %s)\n",
> @@ -981,11 +974,11 @@ void cpu_topology_func(int argc, char *argv[])
>          goto out;
>      }
>  
> -    if ( info.max_cpu_index > (MAX_NR_CPU-1) )
> -        info.max_cpu_index = MAX_NR_CPU-1;
> +    if ( max_cpus > MAX_NR_CPU )
> +        max_cpus = MAX_NR_CPU;
>  
>      printf("CPU\tcore\tsocket\tnode\n");
> -    for ( i = 0; i <= info.max_cpu_index; i++ )
> +    for ( i = 0; i < max_cpus; i++ )
>      {
>          if ( cputopo[i].core == INVALID_CORE_ID )
>              continue;
> @@ -994,7 +987,7 @@ void cpu_topology_func(int argc, char *argv[])
>      }
>      rc = 0;
>  out:
> -    xc_hypercall_buffer_free(xc_handle, cputopo);
> +    free(cputopo);
>      if ( rc )
>          exit(rc);
>  }
> diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
> index 4275968..6e49dc5 100644
> --- a/tools/python/xen/lowlevel/xc/xc.c
> +++ b/tools/python/xen/lowlevel/xc/xc.c
> @@ -1221,31 +1221,24 @@ static PyObject *pyxc_getcpuinfo(XcObject *self, PyObject *args, PyObject *kwds)
>  static PyObject *pyxc_topologyinfo(XcObject *self)
>  {
>  #define MAX_CPU_INDEX 255
> -    xc_cputopoinfo_t tinfo = { 0 };
> -    int i, max_cpu_index;
> +    xc_cputopo_t *cputopo;
> +    int i, max_cpus;
>      PyObject *ret_obj = NULL;
>      PyObject *cpu_to_core_obj, *cpu_to_socket_obj, *cpu_to_node_obj;
>  
> -    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
> -
> -    cputopo = xc_hypercall_buffer_alloc(self->xc_handle, cputopo, sizeof(*cputopo) * (MAX_CPU_INDEX+1));
> +    max_cpus = MAX_CPU_INDEX + 1;
> +    cputopo = calloc(max_cpus, sizeof(*cputopo));
>      if ( cputopo == NULL )
>  	goto out;
> -    set_xen_guest_handle(tinfo.cputopo, cputopo);
> -    tinfo.max_cpu_index = MAX_CPU_INDEX;
>  
> -    if ( xc_cputopoinfo(self->xc_handle, &tinfo) != 0 )
> +    if ( xc_cputopoinfo(self->xc_handle, &max_cpus, cputopo) != 0 )
>          goto out;
>  
> -    max_cpu_index = tinfo.max_cpu_index;
> -    if ( max_cpu_index > MAX_CPU_INDEX )
> -        max_cpu_index = MAX_CPU_INDEX;
> -
>      /* Construct cpu-to-* lists. */
>      cpu_to_core_obj = PyList_New(0);
>      cpu_to_socket_obj = PyList_New(0);
>      cpu_to_node_obj = PyList_New(0);
> -    for ( i = 0; i <= max_cpu_index; i++ )
> +    for ( i = 0; i < max_cpus; i++ )
>      {
>          if ( cputopo[i].core == INVALID_CORE_ID )
>          {
> @@ -1281,7 +1274,7 @@ static PyObject *pyxc_topologyinfo(XcObject *self)
>          }
>      }
>  
> -    ret_obj = Py_BuildValue("{s:i}", "max_cpu_index", max_cpu_index);
> +    ret_obj = Py_BuildValue("{s:i}", "max_cpu_index", max_cpus - 1);
>  
>      PyDict_SetItemString(ret_obj, "cpu_to_core", cpu_to_core_obj);
>      Py_DECREF(cpu_to_core_obj);
> @@ -1293,7 +1286,7 @@ static PyObject *pyxc_topologyinfo(XcObject *self)
>      Py_DECREF(cpu_to_node_obj);
>  
>  out:
> -    xc_hypercall_buffer_free(self->xc_handle, cputopo);
> +    free(cputopo);
>      return ret_obj ? ret_obj : pyxc_error_to_exception(self->xc_handle);
>  #undef MAX_CPU_INDEX
>  }

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/7] x86/numa: Make use of NUMA_NO_NODE consistent
  2015-02-09 20:04 ` [PATCH v3 1/7] x86/numa: Make use of NUMA_NO_NODE consistent Boris Ostrovsky
  2015-02-10 10:41   ` Andrew Cooper
@ 2015-02-10 11:39   ` Jan Beulich
       [not found]   ` <54D9FBE7020000780005E91E@mail.emea.novell.com>
       [not found]   ` <54D9E076.1080604@citrix.com>
  3 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2015-02-10 11:39 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, port-xen, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel,
	ufimtseva, keir

>>> On 09.02.15 at 21:04, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/include/asm-x86/numa.h
> +++ b/xen/include/asm-x86/numa.h
> @@ -21,7 +21,7 @@ struct node {
>  
>  extern int compute_hash_shift(struct node *nodes, int numnodes,
>  			      int *nodeids);
> -extern int pxm_to_node(int nid);
> +extern u8 pxm_to_node(int nid);

Perhaps introduce nodeid_t, so that once needed it can be widened?

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 2/7] pci: Do not ignore device's PXM information
       [not found] ` <1423512275-6531-3-git-send-email-boris.ostrovsky@oracle.com>
  2015-02-10 10:45   ` [PATCH v3 2/7] pci: Do not ignore device's PXM information Andrew Cooper
@ 2015-02-10 11:43   ` Jan Beulich
  1 sibling, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2015-02-10 11:43 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, port-xen, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel,
	ufimtseva, keir

>>> On 09.02.15 at 21:04, <boris.ostrovsky@oracle.com> wrote:
> If ACPI provides PXM data for IO devices then dom0 will pass it to
> hypervisor during PHYSDEVOP_pci_device_add call. This information,
> however, is currently ignored.
> 
> We will store this information (in the form of nodeID) in pci_dev
> structure so that we can provide it, for example, to the toolstack
> when it adds support (in the following patches) for querying the
> hypervisor about device topology
> 
> We will also print it when user requests device information dump.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

I disagree with the title - you still effectively ignore the value.
Displaying it upon debug key request doesn't really count as a
use. The patch itself, otoh is fine, so with the title adjusted to
state reality
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 6/7] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc
       [not found] ` <1423512275-6531-7-git-send-email-boris.ostrovsky@oracle.com>
@ 2015-02-10 11:45   ` Andrew Cooper
       [not found]   ` <54D9EF57.7050308@citrix.com>
  2015-02-23 16:45   ` Ian Campbell
  2 siblings, 0 replies; 49+ messages in thread
From: Andrew Cooper @ 2015-02-10 11:45 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, keir, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: dario.faggioli, port-xen, ufimtseva, xen-devel

On 09/02/15 20:04, Boris Ostrovsky wrote:
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
>  tools/libxc/include/xenctrl.h     |    4 ++-
>  tools/libxc/xc_misc.c             |   29 ++++++++++++++++++++------
>  tools/libxl/libxl.c               |   36 ++++++++-------------------------
>  tools/python/xen/lowlevel/xc/xc.c |   39 +++++++++++++++---------------------
>  4 files changed, 50 insertions(+), 58 deletions(-)
>
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index d94571d..4466cd7 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -1228,6 +1228,7 @@ int xc_send_debug_keys(xc_interface *xch, char *keys);
>  typedef xen_sysctl_physinfo_t xc_physinfo_t;
>  typedef xen_sysctl_cputopo_t xc_cputopo_t;
>  typedef xen_sysctl_numainfo_t xc_numainfo_t;
> +typedef xen_sysctl_meminfo_t xc_meminfo_t;
>  
>  typedef uint32_t xc_cpu_to_node_t;
>  typedef uint32_t xc_cpu_to_socket_t;
> @@ -1238,7 +1239,8 @@ typedef uint32_t xc_node_to_node_dist_t;
>  
>  int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
>  int xc_cputopoinfo(xc_interface *xch, int *max_cpus, xc_cputopo_t *cputopo);
> -int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
> +int xc_numainfo(xc_interface *xch, int *max_nodes,
> +                xc_meminfo_t *meminfo, uint8_t *distance);
>  
>  int xc_sched_id(xc_interface *xch,
>                  int *sched_id);
> diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
> index 4c654f3..2dd5e13 100644
> --- a/tools/libxc/xc_misc.c
> +++ b/tools/libxc/xc_misc.c
> @@ -204,22 +204,37 @@ out:
>      return ret;
>  }
>  
> -int xc_numainfo(xc_interface *xch,
> -                xc_numainfo_t *put_info)
> +int xc_numainfo(xc_interface *xch, int *max_nodes,
> +                xc_meminfo_t *meminfo, uint8_t *distance)
>  {
>      int ret;
>      DECLARE_SYSCTL;
> +    DECLARE_HYPERCALL_BOUNCE(meminfo, *max_nodes * sizeof(*meminfo),
> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_HYPERCALL_BOUNCE(distance,
> +                             *max_nodes * *max_nodes * sizeof(*distance),
> +                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>  
> -    sysctl.cmd = XEN_SYSCTL_numainfo;
> +    if ((ret = xc_hypercall_bounce_pre(xch, meminfo)))
> +        goto out;
> +    if ((ret = xc_hypercall_bounce_pre(xch, distance)))
> +        goto out;

You can safely combine these two together with an ||

>  
> -    memcpy(&sysctl.u.numainfo, put_info, sizeof(*put_info));
> +    sysctl.u.numainfo.max_node_index = *max_nodes - 1;

As you have you have already changed these hypercalls, can we fix the
length handling so it doesn't require +1 or -1.  Xen should receive the
length of the arrays, and return the number of elements written, with no
further adjustment necessary.

~Andrew

> +    set_xen_guest_handle(sysctl.u.numainfo.meminfo, meminfo);
> +    set_xen_guest_handle(sysctl.u.numainfo.distance, distance);
>  
> +    sysctl.cmd = XEN_SYSCTL_numainfo;
>      if ((ret = do_sysctl(xch, &sysctl)) != 0)
> -        return ret;
> +        goto out;
>  
> -    memcpy(put_info, &sysctl.u.numainfo, sizeof(*put_info));
> +    *max_nodes = sysctl.u.numainfo.max_node_index + 1;
>  
> -    return 0;
> +out:
> +    xc_hypercall_bounce_post(xch, meminfo);
> +    xc_hypercall_bounce_post(xch, distance);
> +
> +    return ret;
>  }
>  
>  
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index f8d64c2..9c1c949 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5110,41 +5110,25 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
>  libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
>  {
>      GC_INIT(ctx);
> -    xc_numainfo_t ninfo;
> -    DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
> -    DECLARE_HYPERCALL_BUFFER(uint8_t, distance);
> +    xc_meminfo_t *meminfo;
> +    uint8_t *distance;
>      libxl_numainfo *ret = NULL;
>      int i, j, max_nodes;
>  
>      max_nodes = libxl_get_max_nodes(ctx);
> -    if (max_nodes < 0)
> -    {
> +    if (max_nodes < 0) {
>          LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of NODES");
> -        ret = NULL;
>          goto out;
>      }
>  
> -    meminfo = xc_hypercall_buffer_alloc(ctx->xch, meminfo, sizeof(*meminfo) * max_nodes);
> -    distance = xc_hypercall_buffer_alloc(ctx->xch, distance, sizeof(*distance) * max_nodes * max_nodes);
> -    if ((meminfo == NULL) || (distance == NULL)) {
> -        LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
> -                            "Unable to allocate hypercall arguments");
> -        goto fail;
> -    }
> +    meminfo = libxl__zalloc(gc, sizeof(*meminfo) * max_nodes);
> +    distance = libxl__zalloc(gc, sizeof(*distance) * max_nodes * max_nodes);
>  
> -    set_xen_guest_handle(ninfo.meminfo, meminfo);
> -    set_xen_guest_handle(ninfo.distance, distance);
> -    ninfo.max_node_index = max_nodes - 1;
> -    if (xc_numainfo(ctx->xch, &ninfo) != 0) {
> +    if (xc_numainfo(ctx->xch, &max_nodes, meminfo, distance) != 0) {
>          LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting numainfo");
> -        goto fail;
> +        goto out;
>      }
>  
> -    if (ninfo.max_node_index < max_nodes - 1)
> -        max_nodes = ninfo.max_node_index + 1;
> -
> -    *nr = max_nodes;
> -
>      ret = libxl__zalloc(NOGC, sizeof(libxl_numainfo) * max_nodes);
>      for (i = 0; i < max_nodes; i++)
>          ret[i].dists = libxl__calloc(NOGC, max_nodes, sizeof(*distance));
> @@ -5156,13 +5140,11 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
>          ret[i].free = V(memfree, i);
>          ret[i].num_dists = max_nodes;
>          for (j = 0; j < ret[i].num_dists; j++)
> -            ret[i].dists[j] = distance[i*max_nodes + j];
> +            ret[i].dists[j] = distance[i * max_nodes + j];
>  #undef V
>      }
>  
> - fail:
> -    xc_hypercall_buffer_free(ctx->xch, meminfo);
> -    xc_hypercall_buffer_free(ctx->xch, distance);
> +    *nr = max_nodes;
>  
>   out:
>      GC_FREE;
> diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
> index 6e49dc5..3fd4805 100644
> --- a/tools/python/xen/lowlevel/xc/xc.c
> +++ b/tools/python/xen/lowlevel/xc/xc.c
> @@ -1294,39 +1294,32 @@ out:
>  static PyObject *pyxc_numainfo(XcObject *self)
>  {
>  #define MAX_NODE_INDEX 31
> -    xc_numainfo_t ninfo = { 0 };
> -    int i, j, max_node_index, invalid_node;
> +    int i, j, max_nodes, invalid_node;
>      uint64_t free_heap;
>      PyObject *ret_obj = NULL, *node_to_node_dist_list_obj;
>      PyObject *node_to_memsize_obj, *node_to_memfree_obj;
>      PyObject *node_to_dma32_mem_obj, *node_to_node_dist_obj;
> -    DECLARE_HYPERCALL_BUFFER(xen_sysctl_meminfo_t, meminfo);
> -    DECLARE_HYPERCALL_BUFFER(uint8_t, distance);
> +    xc_meminfo_t *meminfo;
> +    uint8_t *distance;
>  
> -    meminfo = xc_hypercall_buffer_alloc(self->xc_handle, meminfo, sizeof(*meminfo) * (MAX_NODE_INDEX+1));
> -    if ( meminfo == NULL )
> +    max_nodes = MAX_NODE_INDEX + 1;
> +    meminfo = calloc(max_nodes, sizeof(*meminfo));
> +    distance = calloc(max_nodes * max_nodes, sizeof(*distance));
> +    if ( meminfo == NULL || distance == NULL)
>          goto out;
> -    distance = xc_hypercall_buffer_alloc(self->xc_handle, distance, sizeof(*distance)*(MAX_NODE_INDEX+1)*(MAX_NODE_INDEX+1));
> -    if ( distance == NULL )
> -        goto out;
> -
> -    set_xen_guest_handle(ninfo.meminfo, meminfo);
> -    set_xen_guest_handle(ninfo.distance, distance);
> -    ninfo.max_node_index = MAX_NODE_INDEX;
>  
> -    if ( xc_numainfo(self->xc_handle, &ninfo) != 0 )
> +    if ( xc_numainfo(self->xc_handle, &max_nodes, meminfo, distance) != 0 )
>          goto out;
>  
> -    max_node_index = ninfo.max_node_index;
> -    if ( max_node_index > MAX_NODE_INDEX )
> -        max_node_index = MAX_NODE_INDEX;
> +    if ( max_nodes > MAX_NODE_INDEX + 1 )
> +        max_nodes = MAX_NODE_INDEX + 1;
>  
>      /* Construct node-to-* lists. */
>      node_to_memsize_obj = PyList_New(0);
>      node_to_memfree_obj = PyList_New(0);
>      node_to_dma32_mem_obj = PyList_New(0);
>      node_to_node_dist_list_obj = PyList_New(0);
> -    for ( i = 0; i <= max_node_index; i++ )
> +    for ( i = 0; i < max_nodes; i++ )
>      {
>          PyObject *pyint;
>  
> @@ -1349,9 +1342,9 @@ static PyObject *pyxc_numainfo(XcObject *self)
>          /* Node to Node Distance */
>          node_to_node_dist_obj = PyList_New(0);
>          invalid_node = (meminfo[i].memsize == INVALID_MEM_SZ);
> -        for ( j = 0; j <= max_node_index; j++ )
> +        for ( j = 0; j < max_nodes; j++ )
>          {
> -            uint32_t dist = distance[i * (max_node_index + 1) + j];
> +            uint32_t dist = distance[i * max_nodes + j];
>              if ( invalid_node )
>              {
>                  PyList_Append(node_to_node_dist_obj, Py_None);
> @@ -1367,7 +1360,7 @@ static PyObject *pyxc_numainfo(XcObject *self)
>          Py_DECREF(node_to_node_dist_obj);
>      }
>  
> -    ret_obj = Py_BuildValue("{s:i}", "max_node_index", max_node_index);
> +    ret_obj = Py_BuildValue("{s:i}", "max_node_index", max_nodes - 1);
>  
>      PyDict_SetItemString(ret_obj, "node_memsize", node_to_memsize_obj);
>      Py_DECREF(node_to_memsize_obj);
> @@ -1383,8 +1376,8 @@ static PyObject *pyxc_numainfo(XcObject *self)
>      Py_DECREF(node_to_node_dist_list_obj);
>  
>  out:
> -    xc_hypercall_buffer_free(self->xc_handle, meminfo);
> -    xc_hypercall_buffer_free(self->xc_handle, distance);
> +    free(meminfo);
> +    free(distance);
>      return ret_obj ? ret_obj : pyxc_error_to_exception(self->xc_handle);
>  #undef MAX_NODE_INDEX
>  }

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology
       [not found] ` <2508.1423564006@perseus.noi.kre.to>
@ 2015-02-10 14:37   ` Boris Ostrovsky
  0 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-10 14:37 UTC (permalink / raw)
  To: Robert Elz
  Cc: keir, ian.campbell, port-xen, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, wei.liu2,
	ufimtseva


On 02/10/2015 05:26 AM, Robert Elz wrote:
>      Date:        Mon,  9 Feb 2015 15:04:32 -0500
>      From:        Boris Ostrovsky <boris.ostrovsky@oracle.com>
>      Message-ID:  <1423512275-6531-5-git-send-email-boris.ostrovsky@oracle.com>
>
>
>    | +        num_devs = ti->num_devs - ti->first_dev;
>    | +
>    | +        if ( guest_handle_is_null(ti->devs) ||
>    | +             guest_handle_is_null(ti->nodes) ||
>    | +             (num_devs <= 0) )
>    | +        {
>    | +            ret = -EINVAL;
>    | +            break;
>    | +        }
>
> Does that need a check that ti->first_dev <= ti_num_devs ?
>
> (that is aside from the issue of subtracting one uint from another and hoping
> that you'll get a negative signed int if the one subtracted is bigger).
>
> The (num_devs <= 0) test (even if it works - which it should on most
> rational architectures) isn't good enough, as it is possible to subtract
> a very big uint from a very small unit and end up with an int that is
> positive (and potentially, very big, consider (32bit): 2 - 0x80000001).
>
> So, replace the (num_devs <= 0) test by (ti->first_dev <= ti->num_devs)  ??
>
> Or possibly include both tests, just in case ti->num_devs is very large and
> ti->first_dev is small (like 0) and when the large unsigned is converted
> to signed, it becomes negative - or is all of this just too impossible
> to worry about?


For the latter, I'll just keep num_devs as uint32. The only reason I 
declared it as a signed int was for (num_devs <= 0) test.

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology
  2015-02-10 11:13   ` Andrew Cooper
@ 2015-02-10 14:45     ` Boris Ostrovsky
       [not found]     ` <54DA19A4.8070603@oracle.com>
  1 sibling, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-10 14:45 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, keir, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: dario.faggioli, port-xen, ufimtseva, xen-devel


On 02/10/2015 06:13 AM, Andrew Cooper wrote:
> On 09/02/15 20:04, Boris Ostrovsky wrote:
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> ---
>>   xen/common/sysctl.c         |   73 +++++++++++++++++++++++++++++++++++++++++++
>>   xen/include/public/sysctl.h |   29 +++++++++++++++++
>>   2 files changed, 102 insertions(+), 0 deletions(-)
>>
>> diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
>> index 30c5806..ea6557f 100644
>> --- a/xen/common/sysctl.c
>> +++ b/xen/common/sysctl.c
>> @@ -384,7 +384,80 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>>           xfree(cputopo);
>>       }
>>       break;
>> +#ifdef HAS_PCI
>> +    case XEN_SYSCTL_pcitopoinfo:
>> +    {
>> +        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
>> +        physdev_pci_device_t *devs;
>> +        uint8_t *nodes;
>> +        unsigned int first_dev, i;
>> +        int num_devs;
>> +
>> +        num_devs = ti->num_devs - ti->first_dev;
>> +
>> +        if ( guest_handle_is_null(ti->devs) ||
>> +             guest_handle_is_null(ti->nodes) ||
>> +             (num_devs <= 0) )
>> +        {
>> +            ret = -EINVAL;
>> +            break;
>> +        }
>> +
>> +        devs = xmalloc_array(physdev_pci_device_t, num_devs);
>> +        nodes = xmalloc_array(uint8_t, num_devs);
> You can do all of this without any memory allocation at all, which will
> simplify your error handling substantially.
>
> Something along the lines of
>
> for(...)
> {
>      copy one physdev_pci_device_t from the guest
>
>      do the lookup
>
>      copy one node id back to the guest
> }

I am trying to avoid doing multiple copies. For lots of devices (IIRC, 
you said you had a system with a few thousand), having two copies per 
loop will add up, I think.

If you look at changes to cputopolgy and numatopology in earlier patch 
(the one you said you'd postpone reviewing until I split the third 
patch) I you will see that I did the same thing there.


-boris

>
> ~Andrew
>
>> +        if ( !devs || !nodes )
>> +        {
>> +            xfree(devs);
>> +            xfree(nodes);
>> +            ret = -ENOMEM;
>> +            break;
>> +        }
>> +
>> +        first_dev = ti->first_dev;
>> +
>> +        if ( copy_from_guest_offset(devs, ti->devs, first_dev, num_devs) )
>> +        {
>> +            xfree(devs);
>> +            xfree(nodes);
>> +            ret = -EFAULT;
>> +            break;
>> +        }
>>   
>> +        for ( i = 0; i < num_devs; i++ )
>> +        {
>> +            struct pci_dev *pdev;
>> +
>> +            spin_lock(&pcidevs_lock);
>> +            pdev = pci_get_pdev(devs[i].seg, devs[i].bus, devs[i].devfn);
>> +            if ( !pdev || (pdev->node == NUMA_NO_NODE) )
>> +                nodes[i] = INVALID_NODE_ID;
>> +            else
>> +                nodes[i] = pdev->node;
>> +            spin_unlock(&pcidevs_lock);
>> +
>> +            if ( hypercall_preempt_check() )
>> +                break;
>> +        }
>> +
>> +        ti->first_dev += i + 1;
>> +
>> +        if ( __copy_field_to_guest(u_sysctl, op,
>> +                                   u.pcitopoinfo.first_dev) ||
>> +             copy_to_guest_offset(ti->nodes, first_dev, nodes,num_devs) )
>> +        {
>> +            ret = -EFAULT;
>> +            break;
>> +        }
>> +
>> +        if ( ti->first_dev < ti->num_devs )
>> +            ret = hypercall_create_continuation(__HYPERVISOR_sysctl,
>> +                                               "h", u_sysctl);
>> +
>> +        xfree(devs);
>> +        xfree(nodes);
>> +    }
>> +    break;
>> +#endif
>>   #ifdef TEST_COVERAGE
>>       case XEN_SYSCTL_coverage_op:
>>           ret = sysctl_coverage_op(&op->u.coverage_op);
>> diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
>> index 7c78f81..044c3a1 100644
>> --- a/xen/include/public/sysctl.h
>> +++ b/xen/include/public/sysctl.h
>> @@ -33,6 +33,7 @@
>>   
>>   #include "xen.h"
>>   #include "domctl.h"
>> +#include "physdev.h"
>>   
>>   #define XEN_SYSCTL_INTERFACE_VERSION 0x0000000C
>>   
>> @@ -494,6 +495,32 @@ struct xen_sysctl_cputopoinfo {
>>   typedef struct xen_sysctl_cputopoinfo xen_sysctl_cputopoinfo_t;
>>   DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopoinfo_t);
>>   
>> +/* XEN_SYSCTL_pcitopoinfo */
>> +struct xen_sysctl_pcitopoinfo {
>> +    /* IN: Size of pcitopo array */
>> +    uint32_t num_devs;
>> +
>> +    /*
>> +     * IN/OUT: First element of pcitopo array that needs to be processed by
>> +     * hypervisor.
>> +     * This is used primarily by hypercall continuations and callers will
>> +     * typically set it to zero
>> +     */
>> +    uint32_t first_dev;
>> +
>> +    /* IN: list of devices */
>> +    XEN_GUEST_HANDLE_64(physdev_pci_device_t) devs;
>> +
>> +    /*
>> +     * OUT: node identifier for each device.
>> +     * If information for a particular device is not avalable then set
>> +     * to INVALID_NODE_ID.
>> +     */
>> +    XEN_GUEST_HANDLE_64(uint8) nodes;
>> +};
>> +typedef struct xen_sysctl_pcitopoinfo xen_sysctl_pcitopoinfo_t;
>> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopoinfo_t);
>> +
>>   /* XEN_SYSCTL_numainfo */
>>   #define INVALID_MEM_SZ (~0U)
>>   
>> @@ -692,12 +719,14 @@ struct xen_sysctl {
>>   #define XEN_SYSCTL_scheduler_op                  19
>>   #define XEN_SYSCTL_coverage_op                   20
>>   #define XEN_SYSCTL_psr_cmt_op                    21
>> +#define XEN_SYSCTL_pcitopoinfo                   22
>>       uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
>>       union {
>>           struct xen_sysctl_readconsole       readconsole;
>>           struct xen_sysctl_tbuf_op           tbuf_op;
>>           struct xen_sysctl_physinfo          physinfo;
>>           struct xen_sysctl_cputopoinfo       cputopoinfo;
>> +        struct xen_sysctl_pcitopoinfo       pcitopoinfo;
>>           struct xen_sysctl_numainfo          numainfo;
>>           struct xen_sysctl_sched_id          sched_id;
>>           struct xen_sysctl_perfc_op          perfc_op;
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 5/7] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc
  2015-02-10 11:23   ` Andrew Cooper
@ 2015-02-10 14:48     ` Boris Ostrovsky
  0 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-10 14:48 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, keir, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: dario.faggioli, port-xen, ufimtseva, xen-devel


On 02/10/2015 06:23 AM, Andrew Cooper wrote:
>
>
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 0cb6743..d94571d 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -1226,7 +1226,7 @@ int xc_readconsolering(xc_interface *xch,
>   int xc_send_debug_keys(xc_interface *xch, char *keys);
>   
>   typedef xen_sysctl_physinfo_t xc_physinfo_t;
> -typedef xen_sysctl_cputopoinfo_t xc_cputopoinfo_t;
> +typedef xen_sysctl_cputopo_t xc_cputopo_t;
> The sysctl structure is still named xen_sysctl_cputopoinfo_t.  Where
> does xen_sysctl_cputopo_t come from?

It's in patch 3 (the one you skipped), sysctl.h:

+struct xen_sysctl_cputopo {
+    uint32_t core;
+    uint32_t socket;
+    uint8_t node;
+};
+typedef struct xen_sysctl_cputopo xen_sysctl_cputopo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopo_t);

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology
       [not found]     ` <54DA19A4.8070603@oracle.com>
@ 2015-02-10 14:54       ` Andrew Cooper
       [not found]       ` <54DA1BC1.2030205@citrix.com>
  1 sibling, 0 replies; 49+ messages in thread
From: Andrew Cooper @ 2015-02-10 14:54 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, keir, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: dario.faggioli, port-xen, ufimtseva, xen-devel

On 10/02/15 14:45, Boris Ostrovsky wrote:
>
> On 02/10/2015 06:13 AM, Andrew Cooper wrote:
>> On 09/02/15 20:04, Boris Ostrovsky wrote:
>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>> ---
>>>   xen/common/sysctl.c         |   73
>>> +++++++++++++++++++++++++++++++++++++++++++
>>>   xen/include/public/sysctl.h |   29 +++++++++++++++++
>>>   2 files changed, 102 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
>>> index 30c5806..ea6557f 100644
>>> --- a/xen/common/sysctl.c
>>> +++ b/xen/common/sysctl.c
>>> @@ -384,7 +384,80 @@ long
>>> do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>>>           xfree(cputopo);
>>>       }
>>>       break;
>>> +#ifdef HAS_PCI
>>> +    case XEN_SYSCTL_pcitopoinfo:
>>> +    {
>>> +        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
>>> +        physdev_pci_device_t *devs;
>>> +        uint8_t *nodes;
>>> +        unsigned int first_dev, i;
>>> +        int num_devs;
>>> +
>>> +        num_devs = ti->num_devs - ti->first_dev;
>>> +
>>> +        if ( guest_handle_is_null(ti->devs) ||
>>> +             guest_handle_is_null(ti->nodes) ||
>>> +             (num_devs <= 0) )
>>> +        {
>>> +            ret = -EINVAL;
>>> +            break;
>>> +        }
>>> +
>>> +        devs = xmalloc_array(physdev_pci_device_t, num_devs);
>>> +        nodes = xmalloc_array(uint8_t, num_devs);
>> You can do all of this without any memory allocation at all, which will
>> simplify your error handling substantially.
>>
>> Something along the lines of
>>
>> for(...)
>> {
>>      copy one physdev_pci_device_t from the guest
>>
>>      do the lookup
>>
>>      copy one node id back to the guest
>> }
>
> I am trying to avoid doing multiple copies. For lots of devices (IIRC,
> you said you had a system with a few thousand), having two copies per
> loop will add up, I think.

copy_to/from_guest() is not expensive.  It is a straight memcpy with an
extable guards for pagefaults.

~Andrew

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/7] x86/numa: Make use of NUMA_NO_NODE consistent
       [not found]   ` <54D9FBE7020000780005E91E@mail.emea.novell.com>
@ 2015-02-10 14:55     ` Boris Ostrovsky
  0 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-10 14:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, ian.campbell, port-xen, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel,
	ufimtseva, keir


On 02/10/2015 06:39 AM, Jan Beulich wrote:
>>>> On 09.02.15 at 21:04, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/include/asm-x86/numa.h
>> +++ b/xen/include/asm-x86/numa.h
>> @@ -21,7 +21,7 @@ struct node {
>>   
>>   extern int compute_hash_shift(struct node *nodes, int numnodes,
>>   			      int *nodeids);
>> -extern int pxm_to_node(int nid);
>> +extern u8 pxm_to_node(int nid);
> Perhaps introduce nodeid_t, so that once needed it can be widened?

Yes. I actually thing there are more references to node IDs throughout 
the code that may need adjusting.

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 6/7] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc
       [not found]   ` <54D9EF57.7050308@citrix.com>
@ 2015-02-10 14:59     ` Boris Ostrovsky
  0 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-10 14:59 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, keir, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: dario.faggioli, port-xen, ufimtseva, xen-devel


On 02/10/2015 06:45 AM, Andrew Cooper wrote:
>
>>   
>> -    memcpy(&sysctl.u.numainfo, put_info, sizeof(*put_info));
>> +    sysctl.u.numainfo.max_node_index = *max_nodes - 1;
> As you have you have already changed these hypercalls, can we fix the
> length handling so it doesn't require +1 or -1.  Xen should receive the
> length of the arrays, and return the number of elements written, with no
> further adjustment necessary.

Yes, I should change this too.

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology
       [not found]       ` <54DA1BC1.2030205@citrix.com>
@ 2015-02-10 15:06         ` Boris Ostrovsky
       [not found]         ` <54DA1E6D.2010401@oracle.com>
  1 sibling, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-10 15:06 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, keir, ian.jackson, ian.campbell,
	stefano.stabellini, wei.liu2
  Cc: dario.faggioli, port-xen, ufimtseva, xen-devel


On 02/10/2015 09:54 AM, Andrew Cooper wrote:
> On 10/02/15 14:45, Boris Ostrovsky wrote:
>> On 02/10/2015 06:13 AM, Andrew Cooper wrote:
>>> On 09/02/15 20:04, Boris Ostrovsky wrote:
>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>> ---
>>>>    xen/common/sysctl.c         |   73
>>>> +++++++++++++++++++++++++++++++++++++++++++
>>>>    xen/include/public/sysctl.h |   29 +++++++++++++++++
>>>>    2 files changed, 102 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
>>>> index 30c5806..ea6557f 100644
>>>> --- a/xen/common/sysctl.c
>>>> +++ b/xen/common/sysctl.c
>>>> @@ -384,7 +384,80 @@ long
>>>> do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>>>>            xfree(cputopo);
>>>>        }
>>>>        break;
>>>> +#ifdef HAS_PCI
>>>> +    case XEN_SYSCTL_pcitopoinfo:
>>>> +    {
>>>> +        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
>>>> +        physdev_pci_device_t *devs;
>>>> +        uint8_t *nodes;
>>>> +        unsigned int first_dev, i;
>>>> +        int num_devs;
>>>> +
>>>> +        num_devs = ti->num_devs - ti->first_dev;
>>>> +
>>>> +        if ( guest_handle_is_null(ti->devs) ||
>>>> +             guest_handle_is_null(ti->nodes) ||
>>>> +             (num_devs <= 0) )
>>>> +        {
>>>> +            ret = -EINVAL;
>>>> +            break;
>>>> +        }
>>>> +
>>>> +        devs = xmalloc_array(physdev_pci_device_t, num_devs);
>>>> +        nodes = xmalloc_array(uint8_t, num_devs);
>>> You can do all of this without any memory allocation at all, which will
>>> simplify your error handling substantially.
>>>
>>> Something along the lines of
>>>
>>> for(...)
>>> {
>>>       copy one physdev_pci_device_t from the guest
>>>
>>>       do the lookup
>>>
>>>       copy one node id back to the guest
>>> }
>> I am trying to avoid doing multiple copies. For lots of devices (IIRC,
>> you said you had a system with a few thousand), having two copies per
>> loop will add up, I think.
> copy_to/from_guest() is not expensive.  It is a straight memcpy with an
> extable guards for pagefaults.

True, but still why do this inside a loop? xmalloc() of less than a page 
is not that expensive, is it?

(The downside is that when we have really lots of devices we may be 
asking for more than one page. I know that we try not to do this but 
again, I think the expense would be amortised over long loops.).

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology
       [not found]         ` <54DA1E6D.2010401@oracle.com>
@ 2015-02-10 16:30           ` Jan Beulich
       [not found]           ` <54DA322B02000078000C8167@mail.emea.novell.com>
  1 sibling, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2015-02-10 16:30 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: wei.liu2, ian.campbell, port-xen, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel,
	ufimtseva, keir

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 02/10/15 4:07 PM >>>
>On 02/10/2015 09:54 AM, Andrew Cooper wrote:
>> copy_to/from_guest() is not expensive.  It is a straight memcpy with an
>> extable guards for pagefaults.
>
>True, but still why do this inside a loop? xmalloc() of less than a page 
>is not that expensive, is it?
>
>(The downside is that when we have really lots of devices we may be 
>asking for more than one page. I know that we try not to do this but 
>again, I think the expense would be amortised over long loops.).

And risk failing the hypercall because there isn't enough contiguous memory?
And that perhaps only on really large systems where debugging may end up
being difficult? No, please don't.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology
       [not found]           ` <54DA322B02000078000C8167@mail.emea.novell.com>
@ 2015-02-10 16:33             ` Andrew Cooper
  0 siblings, 0 replies; 49+ messages in thread
From: Andrew Cooper @ 2015-02-10 16:33 UTC (permalink / raw)
  To: Jan Beulich, boris.ostrovsky
  Cc: wei.liu2, ian.campbell, port-xen, stefano.stabellini,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

On 10/02/15 16:30, Jan Beulich wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 02/10/15 4:07 PM >>>
>> On 02/10/2015 09:54 AM, Andrew Cooper wrote:
>>> copy_to/from_guest() is not expensive.  It is a straight memcpy with an
>>> extable guards for pagefaults.
>> True, but still why do this inside a loop? xmalloc() of less than a page 
>> is not that expensive, is it?
>>
>> (The downside is that when we have really lots of devices we may be 
>> asking for more than one page. I know that we try not to do this but 
>> again, I think the expense would be amortised over long loops.).
> And risk failing the hypercall because there isn't enough contiguous memory?
> And that perhaps only on really large systems where debugging may end up
> being difficult? No, please don't.

Indeed.  This hypercall is not part of a fastpath.  It can afford not to
amortise the cost in the name of making the code more simple and less
liable to fail.

~Andrew

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient
  2015-02-09 20:04 ` [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient Boris Ostrovsky
@ 2015-02-13 12:26   ` Wei Liu
       [not found]   ` <20150213122609.GU13644@zion.uk.xensource.com>
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 49+ messages in thread
From: Wei Liu @ 2015-02-13 12:26 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, ian.campbell, port-xen, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, wei.liu2,
	ufimtseva

On Mon, Feb 09, 2015 at 03:04:31PM -0500, Boris Ostrovsky wrote:
> Currently both of these sysctls make a copy to userspace for each index of
> various query arrays. We should try to copy whole arrays instead.
> 
> This requires some changes in sysctl's public data structure, thus bump
> interface version.
> 
> Report topology for all present (not just online) cpus.
> 
> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
> that these are used for CPU topology. Subsequent patch will add support for
> PCI topology sysctl.
> 
> Clarify some somments in sysctl.h.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
>  tools/libxc/include/xenctrl.h     |    4 +-
>  tools/libxc/xc_misc.c             |   10 ++--
>  tools/libxl/libxl.c               |   71 +++++++------------
>  tools/misc/xenpm.c                |   69 +++++++-----------
>  tools/python/xen/lowlevel/xc/xc.c |   77 ++++++++------------

Are these mostly mechanical changes? I'm assuming yes.

>  xen/common/sysctl.c               |  141 ++++++++++++++++++++++---------------
>  xen/include/public/sysctl.h       |   75 ++++++++++++--------
>  7 files changed, 221 insertions(+), 226 deletions(-)
[...]
> -        (ctx->xch, node_dists, sizeof(*node_dists) * max_nodes * max_nodes);
> -    if ((memsize == NULL) || (memfree == NULL) || (node_dists == NULL)) {
> +    meminfo = xc_hypercall_buffer_alloc(ctx->xch, meminfo, sizeof(*meminfo) * max_nodes);
> +    distance = xc_hypercall_buffer_alloc(ctx->xch, distance, sizeof(*distance) * max_nodes * max_nodes);

Please wrap these two lines to <80 column.

> +    if ((meminfo == NULL) || (distance == NULL)) {
>          LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
>                              "Unable to allocate hypercall arguments");
[...]
> -    set_xen_guest_handle(tinfo.cpu_to_core, coremap);
> -    set_xen_guest_handle(tinfo.cpu_to_socket, socketmap);
> -    set_xen_guest_handle(tinfo.cpu_to_node, nodemap);
> +    cputopo = xc_hypercall_buffer_alloc(self->xc_handle, cputopo, sizeof(*cputopo) * (MAX_CPU_INDEX+1));

Line too long.

> +    if ( cputopo == NULL )
> +	goto out;
> +    set_xen_guest_handle(tinfo.cputopo, cputopo);
>      tinfo.max_cpu_index = MAX_CPU_INDEX;
[...]
> -        goto out;
> -    node_memfree = xc_hypercall_buffer_alloc(self->xc_handle, node_memfree, sizeof(*node_memfree)*(MAX_NODE_INDEX+1));
> -    if ( node_memfree == NULL )
> +    meminfo = xc_hypercall_buffer_alloc(self->xc_handle, meminfo, sizeof(*meminfo) * (MAX_NODE_INDEX+1));

Ditto.

> +    if ( meminfo == NULL )
>          goto out;
> -    nodes_dist = xc_hypercall_buffer_alloc(self->xc_handle, nodes_dist, sizeof(*nodes_dist)*(MAX_NODE_INDEX+1)*(MAX_NODE_INDEX+1));
> -    if ( nodes_dist == NULL )
> +    distance = xc_hypercall_buffer_alloc(self->xc_handle, distance, sizeof(*distance)*(MAX_NODE_INDEX+1)*(MAX_NODE_INDEX+1));

Ditto.

Wei.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 7/7] libxl: Add interface for querying hypervisor about PCI topology
  2015-02-09 20:04 ` [PATCH v3 7/7] libxl: Add interface for querying hypervisor about PCI topology Boris Ostrovsky
@ 2015-02-13 12:43   ` Wei Liu
       [not found]   ` <20150213124345.GV13644@zion.uk.xensource.com>
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 49+ messages in thread
From: Wei Liu @ 2015-02-13 12:43 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, ian.campbell, port-xen, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, wei.liu2,
	ufimtseva

On Mon, Feb 09, 2015 at 03:04:35PM -0500, Boris Ostrovsky wrote:
[...]
> +#ifdef  LIBXL_HAVE_PCITOPO
> +libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs)
> +{
> +    GC_INIT(ctx);
> +    physdev_pci_device_t *devs;
> +    uint8_t *nodes;
> +    libxl_pcitopology *ret = NULL;
> +    int i, rc;
> +
> +    *num_devs = libxl__pci_numdevs(gc);
> +    if (*num_devs <= 0) {
> +        LOG(ERROR, "Unable to determine number of PCI devices");

Is 0 an error? What if the system actually have no PCI devices?

The rest looks OK to me.

Wei.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient
       [not found]   ` <20150213122609.GU13644@zion.uk.xensource.com>
@ 2015-02-13 14:21     ` Boris Ostrovsky
  0 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-13 14:21 UTC (permalink / raw)
  To: Wei Liu
  Cc: keir, ian.campbell, port-xen, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, ufimtseva

On 02/13/2015 07:26 AM, Wei Liu wrote:
> On Mon, Feb 09, 2015 at 03:04:31PM -0500, Boris Ostrovsky wrote:
>> Currently both of these sysctls make a copy to userspace for each index of
>> various query arrays. We should try to copy whole arrays instead.
>>
>> This requires some changes in sysctl's public data structure, thus bump
>> interface version.
>>
>> Report topology for all present (not just online) cpus.
>>
>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
>> that these are used for CPU topology. Subsequent patch will add support for
>> PCI topology sysctl.
>>
>> Clarify some somments in sysctl.h.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> ---
>>   tools/libxc/include/xenctrl.h     |    4 +-
>>   tools/libxc/xc_misc.c             |   10 ++--
>>   tools/libxl/libxl.c               |   71 +++++++------------
>>   tools/misc/xenpm.c                |   69 +++++++-----------
>>   tools/python/xen/lowlevel/xc/xc.c |   77 ++++++++------------
> Are these mostly mechanical changes? I'm assuming yes.

Pretty much, yes.

-boris


>
>>   xen/common/sysctl.c               |  141 ++++++++++++++++++++++---------------
>>   xen/include/public/sysctl.h       |   75 ++++++++++++--------
>>   7 files changed, 221 insertions(+), 226 deletions(-)
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 7/7] libxl: Add interface for querying hypervisor about PCI topology
       [not found]   ` <20150213124345.GV13644@zion.uk.xensource.com>
@ 2015-02-13 14:22     ` Boris Ostrovsky
  0 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-13 14:22 UTC (permalink / raw)
  To: Wei Liu
  Cc: keir, ian.campbell, port-xen, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, ufimtseva

On 02/13/2015 07:43 AM, Wei Liu wrote:
> On Mon, Feb 09, 2015 at 03:04:35PM -0500, Boris Ostrovsky wrote:
> [...]
>> +#ifdef  LIBXL_HAVE_PCITOPO
>> +libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs)
>> +{
>> +    GC_INIT(ctx);
>> +    physdev_pci_device_t *devs;
>> +    uint8_t *nodes;
>> +    libxl_pcitopology *ret = NULL;
>> +    int i, rc;
>> +
>> +    *num_devs = libxl__pci_numdevs(gc);
>> +    if (*num_devs <= 0) {
>> +        LOG(ERROR, "Unable to determine number of PCI devices");
> Is 0 an error? What if the system actually have no PCI devices?

Right, zero should be a non-error value.


-boris


>
> The rest looks OK to me.
>
> Wei.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 7/7] libxl: Add interface for querying hypervisor about PCI topology
  2015-02-09 20:04 ` [PATCH v3 7/7] libxl: Add interface for querying hypervisor about PCI topology Boris Ostrovsky
  2015-02-13 12:43   ` Wei Liu
       [not found]   ` <20150213124345.GV13644@zion.uk.xensource.com>
@ 2015-02-16 13:45   ` Dario Faggioli
  2015-02-16 15:54     ` Boris Ostrovsky
       [not found]     ` <54E212A3.7000802@oracle.com>
  2015-02-23 16:52   ` Ian Campbell
  3 siblings, 2 replies; 49+ messages in thread
From: Dario Faggioli @ 2015-02-16 13:45 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: Wei Liu, Ian Campbell, port-xen, Andrew Cooper, xen-devel,
	jbeulich, Stefano Stabellini, ufimtseva, Ian Jackson,
	Keir (Xen.org)


[-- Attachment #1.1: Type: text/plain, Size: 5050 bytes --]

On Mon, 2015-02-09 at 15:04 -0500, Boris Ostrovsky wrote:
> .. and use this new interface to display it along with CPU topology
> and NUMA information when 'xl info -n' command is issued
> 
> The output will look like
> ...
> cpu_topology           :
> cpu:    core    socket     node
>   0:       0        0        0
> ...
> device topology        :
> device           node
> 0000:00:00.0      0
> 0000:00:01.0      0
> ...
> 
Cool, I like this! :-)

At some point, we may want to think about gathering most of/all NUMA
related information and create appropriate xl commands, but for now, I
think it is fine to have this info here.

> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -692,6 +692,14 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, libxl_mac *src);
>  #define LIBXL_HAVE_PSR_CMT 1
>  #endif
>  
> +/*
> + * LIBXL_HAVE_PCITOPO
> + *
> + * If this is defined, we have interfaces to query hypervisor about PCI device
> + * topology
> + */
> +#define  LIBXL_HAVE_PCITOPO 1
> +
>
This is probably not a big deal, but the majority of the definitions of
these macros have a wording like this: "FOO indicates that ...". E.g.:

/*
 * LIBXL_HAVE_VCPUINFO_SOFT_AFFINITY indicates that a 'cpumap_soft'
 * field (of libxl_bitmap type) is present in libxl_vcpuinfo,
 * containing the soft affinity of a vcpu.
 */
#define LIBXL_HAVE_VCPUINFO_SOFT_AFFINITY 1

and homogeneity and consistency has some value here, I think.

In any case, you have an extra space between #define and
LIBXL_HAVE_PCITOPO. :-)

> @@ -1070,6 +1078,12 @@ void libxl_vminfo_list_free(libxl_vminfo *list, int nb_vm);
>  libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out);
>  void libxl_cputopology_list_free(libxl_cputopology *, int nb_cpu);
>  
> +#ifdef  LIBXL_HAVE_PCITOPO
> +#define LIBXL_PCITOPOLOGY_INVALID_ENTRY (~(uint32_t)0)
> +libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs);
> +void libxl_pcitopology_list_free(libxl_pcitopology *, int num_devs);
> +#endif
> +
Is there the need to gate new APIs like this? I've never done that, and
I don't think there is. AFAIUI, these are (apart from a few exceptions
like LIBXL_HAVE_NO_SUSPEND_RESUME) intended for being used by libxl
consumers, not libxl own implementation.

> diff --git a/tools/libxl/libxl_freebsd.c b/tools/libxl/libxl_freebsd.c
> index e8b88b3..68d7b71 100644
> --- a/tools/libxl/libxl_freebsd.c
> +++ b/tools/libxl/libxl_freebsd.c
> @@ -131,3 +131,17 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
>  {
>      return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
>  }
> +
> +#ifdef  LIBXL_HAVE_PCITOPO
> +int libxl__pci_numdevs(libxl__gc *gc)
> +{
> +    return ERROR_NI;
> +}
> +
> +int libxl__pci_topology_init(libxl__gc *gc,
> +                             physdev_pci_device_t *devs,
> +                             int num_devs)
> +{
> +    return ERROR_NI;
> +}
> +#endif
>
And the same for internal functions, and elsewhere.
 
> @@ -5209,12 +5214,37 @@ static void output_topologyinfo(void)
>      printf("cpu:    core    socket     node\n");
>  
>      for (i = 0; i < nr; i++) {
> -        if (info[i].core != LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
> +        if (cpuinfo[i].core != LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
>              printf("%3d:    %4d     %4d     %4d\n", i,
> -                   info[i].core, info[i].socket, info[i].node);
> +                   cpuinfo[i].core, cpuinfo[i].socket, cpuinfo[i].node);
> +    }
> +
> +    libxl_cputopology_list_free(cpuinfo, nr);
> +
> +#ifdef  LIBXL_HAVE_PCITOPO
> +    pciinfo = libxl_get_pci_topology(ctx, &nr);
> +    if (cpuinfo == NULL) {
> +        fprintf(stderr, "libxl_get_pci_topology failed.\n");
> +        return;
>      }
>  
> -    libxl_cputopology_list_free(info, nr);
> +    printf("device topology        :\n");
> +    printf("device           node\n");
> +    for (i = 0; i < nr; i++) {
> +        if (pciinfo[i].node != LIBXL_PCITOPOLOGY_INVALID_ENTRY) {
> +            printf("%04x:%02x:%02x.%01x      %d\n", pciinfo[i].seg,
> +                   pciinfo[i].bus,
> +                   ((pciinfo[i].devfn >> 3) & 0x1f), (pciinfo[i].devfn & 7),
> +                   pciinfo[i].node);
> +            valid_devs++;
> +        }
> +    }
> +
> +    if (valid_devs == 0)
> +        printf("No device topology data available\n");
> +
> +    libxl_pcitopology_list_free(pciinfo, nr);
> +#endif
>
And for implementation too, I think.

In fact, if I'm not missing something obvious, since we're always
distributing xl and libxl together, this will always be "#ifdef 1",
wouldn't it?

Also, ISTR that the first change that actually changes the API should
bump the MAJOR in libxl's Makefile. I'm not sure this change qualifies
as such, as you're adding stuff, not altering existing data structs
(e.g., by adding fields, etc). Personally, I think it does, but I'm
leaving this to tools maintainers.

Regards,
Dario

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 1/7] x86/numa: Make use of NUMA_NO_NODE consistent
       [not found]   ` <54D9E076.1080604@citrix.com>
@ 2015-02-16 13:57     ` Dario Faggioli
  0 siblings, 0 replies; 49+ messages in thread
From: Dario Faggioli @ 2015-02-16 13:57 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Ian Campbell, port-xen, xen-devel, jbeulich,
	Stefano Stabellini, ufimtseva, Ian Jackson, boris.ostrovsky,
	Keir (Xen.org)


[-- Attachment #1.1: Type: text/plain, Size: 1730 bytes --]

On Tue, 2015-02-10 at 10:41 +0000, Andrew Cooper wrote:
> On 09/02/15 20:04, Boris Ostrovsky wrote:

> > diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> > index 29fc724..4dfa1c3 100644
> > --- a/xen/arch/x86/srat.c
> > +++ b/xen/arch/x86/srat.c
> > @@ -21,13 +21,16 @@
> >  #include <asm/e820.h>
> >  #include <asm/page.h>
> >  
> > +#define MAX_PXM   255
> > +
> >  static struct acpi_table_slit *__read_mostly acpi_slit;
> >  
> >  static nodemask_t memory_nodes_parsed __initdata;
> >  static nodemask_t processor_nodes_parsed __initdata;
> >  static nodemask_t nodes_found __initdata;
> >  static struct node nodes[MAX_NUMNODES] __initdata;
> > -static u8 __read_mostly pxm2node[256] = { [0 ... 255] = NUMA_NO_NODE };
> > +static u8 __read_mostly pxm2node[MAX_PXM + 1] =
> > +    { [0 ... MAX_PXM] = NUMA_NO_NODE };
> >  
> >  
> >  static int num_node_memblks;
> > @@ -37,21 +40,29 @@ static int memblk_nodeid[NR_NODE_MEMBLKS];
> >  
> >  static int node_to_pxm(int n);
> >  
> > -int pxm_to_node(int pxm)
> > +u8 pxm_to_node(int pxm)
> 
> You can make these parameters unsigned and do away with the unsigned
> casting.  pxm appears to be unsigned in all the relevant call chains.
> 
+1

> > diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> > index 5959860..2a43827 100644
> > --- a/xen/include/asm-x86/numa.h
> > +++ b/xen/include/asm-x86/numa.h
> > @@ -21,7 +21,7 @@ struct node {
> >  
> >  extern int compute_hash_shift(struct node *nodes, int numnodes,
> >  			      int *nodeids);
> > -extern int pxm_to_node(int nid);
> > +extern u8 pxm_to_node(int nid);
> 
> The parameter should presumably be named pxm?
> 
+1  :-)

Regards,
Dario

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient
  2015-02-09 20:04 ` [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient Boris Ostrovsky
  2015-02-13 12:26   ` Wei Liu
       [not found]   ` <20150213122609.GU13644@zion.uk.xensource.com>
@ 2015-02-16 14:08   ` Dario Faggioli
  2015-02-16 15:55     ` Boris Ostrovsky
  2015-02-23 16:40   ` Ian Campbell
       [not found]   ` <1424709653.27930.212.camel@citrix.com>
  4 siblings, 1 reply; 49+ messages in thread
From: Dario Faggioli @ 2015-02-16 14:08 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: Wei Liu, Ian Campbell, port-xen, Andrew Cooper, xen-devel,
	jbeulich, Stefano Stabellini, ufimtseva, Ian Jackson,
	Keir (Xen.org)


[-- Attachment #1.1: Type: text/plain, Size: 799 bytes --]

On Mon, 2015-02-09 at 15:04 -0500, Boris Ostrovsky wrote:
> Currently both of these sysctls make a copy to userspace for each index of
> various query arrays. We should try to copy whole arrays instead.
> 
> This requires some changes in sysctl's public data structure, thus bump
> interface version.
> 
> Report topology for all present (not just online) cpus.
> 
> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
> that these are used for CPU topology. Subsequent patch will add support for
> PCI topology sysctl.
> 
> Clarify some somments in sysctl.h.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>
AFAIUI, this patch will change, so I'll look at it again in next
version. What I've seen here seemed all good to me.

Dario

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 5/7] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc
  2015-02-09 20:04 ` [PATCH v3 5/7] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc Boris Ostrovsky
  2015-02-10 11:23   ` Andrew Cooper
@ 2015-02-16 14:22   ` Dario Faggioli
  2015-02-23 16:44   ` Ian Campbell
       [not found]   ` <1424709863.27930.214.camel@citrix.com>
  3 siblings, 0 replies; 49+ messages in thread
From: Dario Faggioli @ 2015-02-16 14:22 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: Wei Liu, Ian Campbell, port-xen, Andrew Cooper, xen-devel,
	jbeulich, Stefano Stabellini, ufimtseva, Ian Jackson,
	Keir (Xen.org)


[-- Attachment #1.1: Type: text/plain, Size: 1752 bytes --]

On Mon, 2015-02-09 at 15:04 -0500, Boris Ostrovsky wrote:
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Just one thing:

> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5072,38 +5072,23 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo)
>  libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
>  {
>      GC_INIT(ctx);
> -    xc_cputopoinfo_t tinfo;
> -    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
> +    xc_cputopo_t *cputopo;
>      libxl_cputopology *ret = NULL;
> -    int i;
> -    int max_cpus;
> +    int i, max_cpus;
>  
>      max_cpus = libxl_get_max_cpus(ctx);
> -    if (max_cpus < 0)
> -    {
> +    if (max_cpus < 0) {
>          LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of CPUS");
> -        ret = NULL;
>          goto out;
>      }
>  
> -    cputopo = xc_hypercall_buffer_alloc(ctx->xch, cputopo,
> -                                        sizeof(*cputopo) * max_cpus);
> -    if (cputopo == NULL) {
> -        LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
> -                            "Unable to allocate hypercall arguments");
> -        goto fail;
> -    }
> +    cputopo = libxl__zalloc(gc, sizeof(*cputopo) * max_cpus);
>  
> -    set_xen_guest_handle(tinfo.cputopo, cputopo);
> -    tinfo.max_cpu_index = max_cpus - 1;
> -    if (xc_cputopoinfo(ctx->xch, &tinfo) != 0) {
> +    if (xc_cputopoinfo(ctx->xch, &max_cpus, cputopo) != 0) {
>          LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "CPU topology info hypercall failed");
>
While at it, you probably can convert this to the new LOGE() error
reporting style.

Regards,
Dario

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 7/7] libxl: Add interface for querying hypervisor about PCI topology
  2015-02-16 13:45   ` Dario Faggioli
@ 2015-02-16 15:54     ` Boris Ostrovsky
       [not found]     ` <54E212A3.7000802@oracle.com>
  1 sibling, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-16 15:54 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, Ian Campbell, port-xen, Andrew Cooper, xen-devel,
	jbeulich, Stefano Stabellini, ufimtseva, Ian Jackson,
	Keir (Xen.org)

On 02/16/2015 08:45 AM, Dario Faggioli wrote:
>
>
>> @@ -1070,6 +1078,12 @@ void libxl_vminfo_list_free(libxl_vminfo *list, int nb_vm);
>>   libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out);
>>   void libxl_cputopology_list_free(libxl_cputopology *, int nb_cpu);
>>
>> +#ifdef  LIBXL_HAVE_PCITOPO
>> +#define LIBXL_PCITOPOLOGY_INVALID_ENTRY (~(uint32_t)0)
>> +libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs);
>> +void libxl_pcitopology_list_free(libxl_pcitopology *, int num_devs);
>> +#endif
>> +
> Is there the need to gate new APIs like this? I've never done that, and
> I don't think there is. AFAIUI, these are (apart from a few exceptions
> like LIBXL_HAVE_NO_SUSPEND_RESUME) intended for being used by libxl
> consumers, not libxl own implementation.
>
>> diff --git a/tools/libxl/libxl_freebsd.c b/tools/libxl/libxl_freebsd.c
>> index e8b88b3..68d7b71 100644
>> --- a/tools/libxl/libxl_freebsd.c
>> +++ b/tools/libxl/libxl_freebsd.c
>> @@ -131,3 +131,17 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
>>   {
>>       return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
>>   }
>> +
>> +#ifdef  LIBXL_HAVE_PCITOPO
>> +int libxl__pci_numdevs(libxl__gc *gc)
>> +{
>> +    return ERROR_NI;
>> +}
>> +
>> +int libxl__pci_topology_init(libxl__gc *gc,
>> +                             physdev_pci_device_t *devs,
>> +                             int num_devs)
>> +{
>> +    return ERROR_NI;
>> +}
>> +#endif
>>
> And the same for internal functions, and elsewhere.
>
>> @@ -5209,12 +5214,37 @@ static void output_topologyinfo(void)
>>       printf("cpu:    core    socket     node\n");
>>
>>       for (i = 0; i < nr; i++) {
>> -        if (info[i].core != LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
>> +        if (cpuinfo[i].core != LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
>>               printf("%3d:    %4d     %4d     %4d\n", i,
>> -                   info[i].core, info[i].socket, info[i].node);
>> +                   cpuinfo[i].core, cpuinfo[i].socket, cpuinfo[i].node);
>> +    }
>> +
>> +    libxl_cputopology_list_free(cpuinfo, nr);
>> +
>> +#ifdef  LIBXL_HAVE_PCITOPO
>> +    pciinfo = libxl_get_pci_topology(ctx, &nr);
>> +    if (cpuinfo == NULL) {
>> +        fprintf(stderr, "libxl_get_pci_topology failed.\n");
>> +        return;
>>       }
>>
>> -    libxl_cputopology_list_free(info, nr);
>> +    printf("device topology        :\n");
>> +    printf("device           node\n");
>> +    for (i = 0; i < nr; i++) {
>> +        if (pciinfo[i].node != LIBXL_PCITOPOLOGY_INVALID_ENTRY) {
>> +            printf("%04x:%02x:%02x.%01x      %d\n", pciinfo[i].seg,
>> +                   pciinfo[i].bus,
>> +                   ((pciinfo[i].devfn >> 3) & 0x1f), (pciinfo[i].devfn & 7),
>> +                   pciinfo[i].node);
>> +            valid_devs++;
>> +        }
>> +    }
>> +
>> +    if (valid_devs == 0)
>> +        printf("No device topology data available\n");
>> +
>> +    libxl_pcitopology_list_free(pciinfo, nr);
>> +#endif
>>
> And for implementation too, I think.


I am not sure what the standard practice is for LIBXL_HAVE_ macros. I 
see a couple of examples where '#ifdef LIBXL_HAVE_*" is used in libxl 
code, which is why I have it here as well.

This is probably question for maintainers.


>
> In fact, if I'm not missing something obvious, since we're always
> distributing xl and libxl together, this will always be "#ifdef 1",
> wouldn't it?
>
> Also, ISTR that the first change that actually changes the API should
> bump the MAJOR in libxl's Makefile. I'm not sure this change qualifies
> as such, as you're adding stuff, not altering existing data structs
> (e.g., by adding fields, etc). Personally, I think it does, but I'm
> leaving this to tools maintainers.

libxl.h seems to suggest that API version is changed only when we make 
an incompatible change to the library. In my mind new interface is does 
not break compatibility so I didn't think a bump would be necessary.

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient
  2015-02-16 14:08   ` Dario Faggioli
@ 2015-02-16 15:55     ` Boris Ostrovsky
  0 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-16 15:55 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Wei Liu, Ian Campbell, port-xen, Andrew Cooper, xen-devel,
	jbeulich, Stefano Stabellini, ufimtseva, Ian Jackson,
	Keir (Xen.org)

On 02/16/2015 09:08 AM, Dario Faggioli wrote:
> On Mon, 2015-02-09 at 15:04 -0500, Boris Ostrovsky wrote:
>> Currently both of these sysctls make a copy to userspace for each index of
>> various query arrays. We should try to copy whole arrays instead.
>>
>> This requires some changes in sysctl's public data structure, thus bump
>> interface version.
>>
>> Report topology for all present (not just online) cpus.
>>
>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
>> that these are used for CPU topology. Subsequent patch will add support for
>> PCI topology sysctl.
>>
>> Clarify some somments in sysctl.h.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>
> AFAIUI, this patch will change, so I'll look at it again in next
> version. What I've seen here seemed all good to me.

Jan and Andrew don't want to make changes that I describe in the first 
paragraph so that will probably be different in the next version.

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 7/7] libxl: Add interface for querying hypervisor about PCI topology
       [not found]     ` <54E212A3.7000802@oracle.com>
@ 2015-02-16 16:06       ` Dario Faggioli
  2015-02-16 16:47       ` Wei Liu
  1 sibling, 0 replies; 49+ messages in thread
From: Dario Faggioli @ 2015-02-16 16:06 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: Wei Liu, Ian Campbell, port-xen, Andrew Cooper, xen-devel,
	jbeulich, Stefano Stabellini, ufimtseva, Ian Jackson,
	Keir (Xen.org)


[-- Attachment #1.1: Type: text/plain, Size: 1616 bytes --]

On Mon, 2015-02-16 at 10:54 -0500, Boris Ostrovsky wrote:

> I am not sure what the standard practice is for LIBXL_HAVE_ macros. I 
> see a couple of examples where '#ifdef LIBXL_HAVE_*" is used in libxl 
> code, which is why I have it here as well.
> 
Yes, that is the case for two of them, AFAICS:

 * LIBXL_HAVE_NO_SUSPEND_RESUME
 * LIBXL_HAVE_PSR_CMT

If you look at how these two are defined, you'll see that the definition
is actually conditional:

#if defined(__arm__) || defined(__aarch64__)
#define LIBXL_HAVE_NO_SUSPEND_RESUME 1
#endif

#if defined(__i386__) || defined(__x86_64__)
#define LIBXL_HAVE_PSR_CMT 1
#endif

Which is, IMO, why we need to take special care of them in libxl and xl
code. All other macros, which are always and unconditionally #defined to
1, we just forget about them, and leave it to the caller.

> This is probably question for maintainers.
> 
Sure! :-)

> > Also, ISTR that the first change that actually changes the API should
> > bump the MAJOR in libxl's Makefile. I'm not sure this change qualifies
> > as such, as you're adding stuff, not altering existing data structs
> > (e.g., by adding fields, etc). Personally, I think it does, but I'm
> > leaving this to tools maintainers.
> 
> libxl.h seems to suggest that API version is changed only when we make 
> an incompatible change to the library. In my mind new interface is does 
> not break compatibility so I didn't think a bump would be necessary.
> 
Yeah, that was why I was not sure. Re-thinking about it, you probably
are right, no need for it for now.

Regards,
Dario

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 7/7] libxl: Add interface for querying hypervisor about PCI topology
       [not found]     ` <54E212A3.7000802@oracle.com>
  2015-02-16 16:06       ` Dario Faggioli
@ 2015-02-16 16:47       ` Wei Liu
  1 sibling, 0 replies; 49+ messages in thread
From: Wei Liu @ 2015-02-16 16:47 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Wei Liu, Ian Campbell, port-xen, Andrew Cooper, Dario Faggioli,
	xen-devel, jbeulich, Stefano Stabellini, ufimtseva, Ian Jackson,
	Keir (Xen.org)

On Mon, Feb 16, 2015 at 10:54:11AM -0500, Boris Ostrovsky wrote:
[...]
> >>+    if (valid_devs == 0)
> >>+        printf("No device topology data available\n");
> >>+
> >>+    libxl_pcitopology_list_free(pciinfo, nr);
> >>+#endif
> >>
> >And for implementation too, I think.
> 
> 
> I am not sure what the standard practice is for LIBXL_HAVE_ macros. I see a
> couple of examples where '#ifdef LIBXL_HAVE_*" is used in libxl code, which
> is why I have it here as well.
> 
> This is probably question for maintainers.
> 

They are not needed inside libxl. They are also not needed in xl, if
your new feature works on all architectures etc, because xl is shipped
with libxl and will always use the latest API.

Sorry for not mentioning this earlier.

> 
> >
> >In fact, if I'm not missing something obvious, since we're always
> >distributing xl and libxl together, this will always be "#ifdef 1",
> >wouldn't it?
> >
> >Also, ISTR that the first change that actually changes the API should
> >bump the MAJOR in libxl's Makefile. I'm not sure this change qualifies
> >as such, as you're adding stuff, not altering existing data structs
> >(e.g., by adding fields, etc). Personally, I think it does, but I'm
> >leaving this to tools maintainers.
> 
> libxl.h seems to suggest that API version is changed only when we make an
> incompatible change to the library. In my mind new interface is does not
> break compatibility so I didn't think a bump would be necessary.
> 
> -boris
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient
  2015-02-09 20:04 ` [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient Boris Ostrovsky
                     ` (2 preceding siblings ...)
  2015-02-16 14:08   ` Dario Faggioli
@ 2015-02-23 16:40   ` Ian Campbell
       [not found]   ` <1424709653.27930.212.camel@citrix.com>
  4 siblings, 0 replies; 49+ messages in thread
From: Ian Campbell @ 2015-02-23 16:40 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ufimtseva, port-xen, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, jbeulich,
	keir

On Mon, 2015-02-09 at 15:04 -0500, Boris Ostrovsky wrote:
> Currently both of these sysctls make a copy to userspace for each index of
> various query arrays. We should try to copy whole arrays instead.
> 
> This requires some changes in sysctl's public data structure, thus bump
> interface version.
> 
> Report topology for all present (not just online) cpus.
> 
> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
> that these are used for CPU topology. Subsequent patch will add support for
> PCI topology sysctl.
> 
> Clarify some somments in sysctl.h.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Was there a significant change in v3 which invalidated my tools ack from
<1421688368.10440.171.camel@citrix.com>?

Per-patch changelogs (after the ---) are very helpful for this sort of
thing.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 5/7] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc
  2015-02-09 20:04 ` [PATCH v3 5/7] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc Boris Ostrovsky
  2015-02-10 11:23   ` Andrew Cooper
  2015-02-16 14:22   ` Dario Faggioli
@ 2015-02-23 16:44   ` Ian Campbell
       [not found]   ` <1424709863.27930.214.camel@citrix.com>
  3 siblings, 0 replies; 49+ messages in thread
From: Ian Campbell @ 2015-02-23 16:44 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ufimtseva, port-xen, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, jbeulich,
	keir

On Mon, 2015-02-09 at 15:04 -0500, Boris Ostrovsky wrote:

What is the rationale for this change? (i.e. it should be part of the
commit message)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 6/7] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc
       [not found] ` <1423512275-6531-7-git-send-email-boris.ostrovsky@oracle.com>
  2015-02-10 11:45   ` [PATCH v3 6/7] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc Andrew Cooper
       [not found]   ` <54D9EF57.7050308@citrix.com>
@ 2015-02-23 16:45   ` Ian Campbell
  2 siblings, 0 replies; 49+ messages in thread
From: Ian Campbell @ 2015-02-23 16:45 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ufimtseva, port-xen, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, jbeulich,
	keir

On Mon, 2015-02-09 at 15:04 -0500, Boris Ostrovsky wrote:

As with previous patch please include some rationale.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient
       [not found]   ` <1424709653.27930.212.camel@citrix.com>
@ 2015-02-23 16:48     ` Boris Ostrovsky
       [not found]     ` <54EB59F0.2070405@oracle.com>
  1 sibling, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-23 16:48 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, ufimtseva, port-xen, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, jbeulich,
	keir

On 02/23/2015 11:40 AM, Ian Campbell wrote:
> On Mon, 2015-02-09 at 15:04 -0500, Boris Ostrovsky wrote:
>> Currently both of these sysctls make a copy to userspace for each index of
>> various query arrays. We should try to copy whole arrays instead.
>>
>> This requires some changes in sysctl's public data structure, thus bump
>> interface version.
>>
>> Report topology for all present (not just online) cpus.
>>
>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
>> that these are used for CPU topology. Subsequent patch will add support for
>> PCI topology sysctl.
>>
>> Clarify some somments in sysctl.h.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Was there a significant change in v3 which invalidated my tools ack from
> <1421688368.10440.171.camel@citrix.com>?

Yes, because I merged changes that you ACKed together with (similar) 
changes for numa topology. Which will now have to unmerge because 
various people didn't like it. So I think your ACK will be restored.

>
> Per-patch changelogs (after the ---) are very helpful for this sort of
> thing.

I should have at least mentioned this in the cover letter, sorry.

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 5/7] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc
       [not found]   ` <1424709863.27930.214.camel@citrix.com>
@ 2015-02-23 16:52     ` Boris Ostrovsky
       [not found]     ` <54EB5AC9.7070409@oracle.com>
  1 sibling, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-23 16:52 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, ufimtseva, port-xen, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, jbeulich,
	keir

On 02/23/2015 11:44 AM, Ian Campbell wrote:
> On Mon, 2015-02-09 at 15:04 -0500, Boris Ostrovsky wrote:
>
> What is the rationale for this change?

libxl is not the right place to handle hypervisor-specific details like 
buffer management (most, if not all, of other services that libxl 
provides push these sort of things to libxc).

> (i.e. it should be part of the
> commit message)

Yes.

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 7/7] libxl: Add interface for querying hypervisor about PCI topology
  2015-02-09 20:04 ` [PATCH v3 7/7] libxl: Add interface for querying hypervisor about PCI topology Boris Ostrovsky
                     ` (2 preceding siblings ...)
  2015-02-16 13:45   ` Dario Faggioli
@ 2015-02-23 16:52   ` Ian Campbell
  3 siblings, 0 replies; 49+ messages in thread
From: Ian Campbell @ 2015-02-23 16:52 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ufimtseva, port-xen, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, jbeulich,
	keir

On Mon, 2015-02-09 at 15:04 -0500, Boris Ostrovsky wrote:
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 9c1c949..c3b6c8a 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5107,6 +5107,55 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
>      return ret;
>  }
>  
> +#ifdef  LIBXL_HAVE_PCITOPO

No need for this conditional anywhere within libxl itself, nor in
libxl.h or in xl_cmdimpl.c either, it's for external app convenience
only.

> +/*
> + * LIBXL_HAVE_PCITOPO
> + *
> + * If this is defined, we have interfaces to query hypervisor about PCI device
> + * topology
> + */
> +#define  LIBXL_HAVE_PCITOPO 1

Can we call it PCI_TOPOLOGY here please.

> diff --git a/tools/libxl/libxl_linux.c b/tools/libxl/libxl_linux.c
> index ea5d8c1..f558dba 100644
> --- a/tools/libxl/libxl_linux.c
> +++ b/tools/libxl/libxl_linux.c
> @@ -279,3 +279,77 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
>  {
>      return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
>  }
> +
> +#ifdef  LIBXL_HAVE_PCITOPO
> +/* These two routines are "inspired" by pciutils */

Unless you've actually copied code (with the attention to licensing
which that requires) then please just drop this comment since it's only
purpose is to make me suspicious about the licensing status of these
functions.

Or if you did copy (although you previously said not) then the comment
should be expanded to clarify the apparent licensing disparity.

Ian.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient
       [not found]     ` <54EB59F0.2070405@oracle.com>
@ 2015-02-23 16:59       ` Jan Beulich
       [not found]       ` <54EB6A770200007800062BDD@mail.emea.novell.com>
  1 sibling, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2015-02-23 16:59 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, Ian Campbell, port-xen, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel,
	ufimtseva, keir

>>> On 23.02.15 at 17:48, <boris.ostrovsky@oracle.com> wrote:
> On 02/23/2015 11:40 AM, Ian Campbell wrote:
>> Per-patch changelogs (after the ---) are very helpful for this sort of
>> thing.
> 
> I should have at least mentioned this in the cover letter, sorry.

Actually I also meant to suggest that you switch away from
listing the changes (only) in the cover letter - reviewing,
especially for later revisions, is easier when each patch lists
what got changed.

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 5/7] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc
       [not found]     ` <54EB5AC9.7070409@oracle.com>
@ 2015-02-23 17:14       ` Ian Campbell
  2015-02-23 17:59         ` Boris Ostrovsky
  0 siblings, 1 reply; 49+ messages in thread
From: Ian Campbell @ 2015-02-23 17:14 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ufimtseva, port-xen, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, jbeulich,
	keir

On Mon, 2015-02-23 at 11:52 -0500, Boris Ostrovsky wrote:
> On 02/23/2015 11:44 AM, Ian Campbell wrote:
> > On Mon, 2015-02-09 at 15:04 -0500, Boris Ostrovsky wrote:
> >
> > What is the rationale for this change?
> 
> libxl is not the right place to handle hypervisor-specific details like 
> buffer management (most, if not all, of other services that libxl 
> provides push these sort of things to libxc).

It is acceptable for callers to do the buffer management themselves in
principal. The primary reason to do so would be high frequency calls
where the bouncing would be unacceptable overhead on every iteration
(i.e. it allows callers to preallocate a single buffer).

There aren't many such interfaces though and as you say most of them are
in libxc (as it happens).

Nonetheless the argument for this changw should be made in terms of the
interface not being called frequently and therefore being tolerant of
the bouncing overhead which simplifies things for the caller at the
expense of the performance.

That's assuming these interfaces are indeed tolerant of the performance
hit...

> 
> > (i.e. it should be part of the
> > commit message)
> 
> Yes.
> 
> -boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient
       [not found]       ` <54EB6A770200007800062BDD@mail.emea.novell.com>
@ 2015-02-23 17:15         ` Ian Campbell
  0 siblings, 0 replies; 49+ messages in thread
From: Ian Campbell @ 2015-02-23 17:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, port-xen, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir,
	Boris Ostrovsky

On Mon, 2015-02-23 at 16:59 +0000, Jan Beulich wrote:
> >>> On 23.02.15 at 17:48, <boris.ostrovsky@oracle.com> wrote:
> > On 02/23/2015 11:40 AM, Ian Campbell wrote:
> >> Per-patch changelogs (after the ---) are very helpful for this sort of
> >> thing.
> > 
> > I should have at least mentioned this in the cover letter, sorry.
> 
> Actually I also meant to suggest that you switch away from
> listing the changes (only) in the cover letter - reviewing,
> especially for later revisions, is easier when each patch lists
> what got changed.

Please. I find it hard to figure out which patch a given change in the
cover letter actually impacts, plus it means I need to keep two mails
open to look at both...

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 5/7] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc
  2015-02-23 17:14       ` Ian Campbell
@ 2015-02-23 17:59         ` Boris Ostrovsky
  0 siblings, 0 replies; 49+ messages in thread
From: Boris Ostrovsky @ 2015-02-23 17:59 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, ufimtseva, port-xen, stefano.stabellini,
	andrew.cooper3, dario.faggioli, ian.jackson, xen-devel, jbeulich,
	keir


On 02/23/2015 12:14 PM, Ian Campbell wrote:
> On Mon, 2015-02-23 at 11:52 -0500, Boris Ostrovsky wrote:
>> On 02/23/2015 11:44 AM, Ian Campbell wrote:
>>> On Mon, 2015-02-09 at 15:04 -0500, Boris Ostrovsky wrote:
>>>
>>> What is the rationale for this change?
>> libxl is not the right place to handle hypervisor-specific details like
>> buffer management (most, if not all, of other services that libxl
>> provides push these sort of things to libxc).
> It is acceptable for callers to do the buffer management themselves in
> principal. The primary reason to do so would be high frequency calls
> where the bouncing would be unacceptable overhead on every iteration
> (i.e. it allows callers to preallocate a single buffer).
>
> There aren't many such interfaces though and as you say most of them are
> in libxc (as it happens).
>
> Nonetheless the argument for this changw should be made in terms of the
> interface not being called frequently and therefore being tolerant of
> the bouncing overhead which simplifies things for the caller at the
> expense of the performance.
>
> That's assuming these interfaces are indeed tolerant of the performance
> hit...

These interfaces should be called once per domain creation or as a query 
from commandline so they don't appear to be not performance-critical.

-boris

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2015-02-23 17:59 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-09 20:04 [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup) Boris Ostrovsky
2015-02-09 20:04 ` [PATCH v3 1/7] x86/numa: Make use of NUMA_NO_NODE consistent Boris Ostrovsky
2015-02-10 10:41   ` Andrew Cooper
2015-02-10 11:39   ` Jan Beulich
     [not found]   ` <54D9FBE7020000780005E91E@mail.emea.novell.com>
2015-02-10 14:55     ` Boris Ostrovsky
     [not found]   ` <54D9E076.1080604@citrix.com>
2015-02-16 13:57     ` Dario Faggioli
2015-02-09 20:04 ` [PATCH v3 2/7] pci: Do not ignore device's PXM information Boris Ostrovsky
2015-02-09 20:04 ` [PATCH v3 3/7] sysctl: Make topologyinfo and numainfo sysctls a little more efficient Boris Ostrovsky
2015-02-13 12:26   ` Wei Liu
     [not found]   ` <20150213122609.GU13644@zion.uk.xensource.com>
2015-02-13 14:21     ` Boris Ostrovsky
2015-02-16 14:08   ` Dario Faggioli
2015-02-16 15:55     ` Boris Ostrovsky
2015-02-23 16:40   ` Ian Campbell
     [not found]   ` <1424709653.27930.212.camel@citrix.com>
2015-02-23 16:48     ` Boris Ostrovsky
     [not found]     ` <54EB59F0.2070405@oracle.com>
2015-02-23 16:59       ` Jan Beulich
     [not found]       ` <54EB6A770200007800062BDD@mail.emea.novell.com>
2015-02-23 17:15         ` Ian Campbell
2015-02-09 20:04 ` [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
2015-02-10 11:13   ` Andrew Cooper
2015-02-10 14:45     ` Boris Ostrovsky
     [not found]     ` <54DA19A4.8070603@oracle.com>
2015-02-10 14:54       ` Andrew Cooper
     [not found]       ` <54DA1BC1.2030205@citrix.com>
2015-02-10 15:06         ` Boris Ostrovsky
     [not found]         ` <54DA1E6D.2010401@oracle.com>
2015-02-10 16:30           ` Jan Beulich
     [not found]           ` <54DA322B02000078000C8167@mail.emea.novell.com>
2015-02-10 16:33             ` Andrew Cooper
2015-02-09 20:04 ` [PATCH v3 5/7] libxl/libxc: Move libxl_get_cpu_topology()'s hypercall buffer management to libxc Boris Ostrovsky
2015-02-10 11:23   ` Andrew Cooper
2015-02-10 14:48     ` Boris Ostrovsky
2015-02-16 14:22   ` Dario Faggioli
2015-02-23 16:44   ` Ian Campbell
     [not found]   ` <1424709863.27930.214.camel@citrix.com>
2015-02-23 16:52     ` Boris Ostrovsky
     [not found]     ` <54EB5AC9.7070409@oracle.com>
2015-02-23 17:14       ` Ian Campbell
2015-02-23 17:59         ` Boris Ostrovsky
2015-02-09 20:04 ` [PATCH v3 6/7] libxl/libxc: Move libxl_get_numainfo()'s " Boris Ostrovsky
2015-02-09 20:04 ` [PATCH v3 7/7] libxl: Add interface for querying hypervisor about PCI topology Boris Ostrovsky
2015-02-13 12:43   ` Wei Liu
     [not found]   ` <20150213124345.GV13644@zion.uk.xensource.com>
2015-02-13 14:22     ` Boris Ostrovsky
2015-02-16 13:45   ` Dario Faggioli
2015-02-16 15:54     ` Boris Ostrovsky
     [not found]     ` <54E212A3.7000802@oracle.com>
2015-02-16 16:06       ` Dario Faggioli
2015-02-16 16:47       ` Wei Liu
2015-02-23 16:52   ` Ian Campbell
2015-02-10  8:52 ` [PATCH v3 0/7] Display IO topology when PXM data is available (plus some cleanup) Jan Beulich
2015-02-10 11:08   ` Andrew Cooper
2015-02-10 10:26 ` [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology Robert Elz
     [not found] ` <1423512275-6531-3-git-send-email-boris.ostrovsky@oracle.com>
2015-02-10 10:45   ` [PATCH v3 2/7] pci: Do not ignore device's PXM information Andrew Cooper
2015-02-10 11:43   ` Jan Beulich
     [not found] ` <2508.1423564006@perseus.noi.kre.to>
2015-02-10 14:37   ` [PATCH v3 4/7] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
     [not found] ` <1423512275-6531-7-git-send-email-boris.ostrovsky@oracle.com>
2015-02-10 11:45   ` [PATCH v3 6/7] libxl/libxc: Move libxl_get_numainfo()'s hypercall buffer management to libxc Andrew Cooper
     [not found]   ` <54D9EF57.7050308@citrix.com>
2015-02-10 14:59     ` Boris Ostrovsky
2015-02-23 16:45   ` Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.