* [PATCH 0/4] Display IO topology when PXM data is available
@ 2014-12-02 21:34 Boris Ostrovsky
  2014-12-02 21:34 ` [PATCH 1/4] pci: Do not ignore device's PXM information Boris Ostrovsky
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2014-12-02 21:34 UTC (permalink / raw)
  To: jbeulich, keir, ian.jackson, stefano.stabellini, ian.campbell, wei.liu2
  Cc: dario.faggioli, boris.ostrovsky, ufimtseva, xen-devel

Four patches that add an interface for querying the hypervisor about
device topology and allow 'xl info -n' to display this information when
a PXM object is provided by ACPI.

The patches are:

* Store PXM data (nodeID) in pci_dev during PHYSDEVOP_pci_device_add
  hypercall
* Modify XEN_SYSCTL_topologyinfo so that it can return both CPU and
  device topology data. Add corresponding libxl interface
* Use new interface to query the hypervisor about topology and print
  it with 'xl info -n'
* Replace all users of old cpu topology interface with the new
  call. This patch is optional.

Boris Ostrovsky (4):
  pci: Do not ignore device's PXM information
  sysctl/libxl: Add interface for returning IO topology data
  sysctl/libxl: Provide information about IO topology
  libxl: Switch to using new topology interface

 tools/libxl/libxl.c               | 130 +++++++++++++++++++++++++-------------
 tools/libxl/libxl.h               |   4 ++
 tools/libxl/libxl_freebsd.c       |  12 ++++
 tools/libxl/libxl_internal.h      |   5 ++
 tools/libxl/libxl_linux.c         |  74 ++++++++++++++++++++++
 tools/libxl/libxl_netbsd.c        |  12 ++++
 tools/libxl/libxl_numa.c          |  14 ++--
 tools/libxl/libxl_types.idl       |  12 ++++
 tools/libxl/libxl_utils.c         |  30 +++++----
 tools/libxl/xl_cmdimpl.c          |  75 +++++++++++++---------
 tools/misc/xenpm.c                |  64 ++++++++-----------
 tools/python/xen/lowlevel/xc/xc.c |  38 ++++-------
 xen/arch/x86/physdev.c            |  20 +++++-
 xen/common/sysctl.c               |  69 +++++++++++++++-----
 xen/drivers/passthrough/pci.c     |  13 ++--
 xen/include/public/sysctl.h       |  36 ++++++++---
 xen/include/xen/pci.h             |   5 +-
 17 files changed, 424 insertions(+), 189 deletions(-)

-- 
1.8.4.2


* [PATCH 1/4] pci: Do not ignore device's PXM information
  2014-12-02 21:34 [PATCH 0/4] Display IO topology when PXM data is available Boris Ostrovsky
@ 2014-12-02 21:34 ` Boris Ostrovsky
  2014-12-03 15:01   ` Andrew Cooper
  2014-12-05 15:53   ` Jan Beulich
  2014-12-02 21:34 ` [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data Boris Ostrovsky
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2014-12-02 21:34 UTC (permalink / raw)
  To: jbeulich, keir, ian.jackson, stefano.stabellini, ian.campbell, wei.liu2
  Cc: dario.faggioli, boris.ostrovsky, ufimtseva, xen-devel

If ACPI provides PXM data for IO devices then dom0 will pass it to the
hypervisor during the PHYSDEVOP_pci_device_add call. This information,
however, is currently ignored.

We should remember it (in the form of a node ID). We will also print it
when the user requests a device information dump.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/physdev.c        | 20 +++++++++++++++++---
 xen/drivers/passthrough/pci.c | 13 +++++++++----
 xen/include/xen/pci.h         |  5 ++++-
 3 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 6b3201b..7775f80 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -565,7 +565,8 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&manage_pci, arg, 1) != 0 )
             break;
 
-        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn, NULL);
+        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn,
+                             NULL, NUMA_NO_NODE);
         break;
     }
 
@@ -597,13 +598,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         pdev_info.physfn.devfn = manage_pci_ext.physfn.devfn;
         ret = pci_add_device(0, manage_pci_ext.bus,
                              manage_pci_ext.devfn,
-                             &pdev_info);
+                             &pdev_info, NUMA_NO_NODE);
         break;
     }
 
     case PHYSDEVOP_pci_device_add: {
         struct physdev_pci_device_add add;
         struct pci_dev_info pdev_info;
+        int node;
 
         ret = -EFAULT;
         if ( copy_from_guest(&add, arg, 1) != 0 )
@@ -618,7 +620,19 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         }
         else
             pdev_info.is_virtfn = 0;
-        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
+
+        if ( add.flags & XEN_PCI_DEV_PXM ) {
+            int optarr_off = offsetof(struct physdev_pci_device_add, optarr) /
+                             sizeof(add.optarr[0]);
+
+            if ( copy_from_guest_offset(&add.optarr[0], arg, optarr_off, 1) )
+                break;
+
+            node = pxm_to_node(add.optarr[0]);
+        } else
+            node = NUMA_NO_NODE;
+
+        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info, node);
         break;
     }
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 78c6977..898edaa 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -568,7 +568,8 @@ static void pci_enable_acs(struct pci_dev *pdev)
     pci_conf_write16(seg, bus, dev, func, pos + PCI_ACS_CTRL, ctrl);
 }
 
-int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
+int pci_add_device(u16 seg, u8 bus, u8 devfn,
+                   const struct pci_dev_info *info, const int node)
 {
     struct pci_seg *pseg;
     struct pci_dev *pdev;
@@ -586,7 +587,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
         pdev = pci_get_pdev(seg, info->physfn.bus, info->physfn.devfn);
         spin_unlock(&pcidevs_lock);
         if ( !pdev )
-            pci_add_device(seg, info->physfn.bus, info->physfn.devfn, NULL);
+            pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
+                           NULL, node);
         pdev_type = "virtual function";
     }
     else
@@ -609,6 +611,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
     if ( !pdev )
         goto out;
 
+    pdev->node = node;
+
     if ( info )
         pdev->info = *info;
     else if ( !pdev->vf_rlen[0] )
@@ -1191,10 +1195,11 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg)
 
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
     {
-        printk("%04x:%02x:%02x.%u - dom %-3d - MSIs < ",
+        printk("%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs < ",
                pseg->nr, pdev->bus,
                PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
-               pdev->domain ? pdev->domain->domain_id : -1);
+               pdev->domain ? pdev->domain->domain_id : -1,
+               (pdev->node != NUMA_NO_NODE) ? pdev->node : -1);
         list_for_each_entry ( msi, &pdev->msi_list, list )
                printk("%d ", msi->irq);
         printk(">\n");
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 5f295f3..a2a0a1c 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -56,6 +56,8 @@ struct pci_dev {
 
     u8 phantom_stride;
 
+    int node; /* NUMA node */
+
     enum pdev_type {
         DEV_TYPE_PCI_UNKNOWN,
         DEV_TYPE_PCIe_ENDPOINT,
@@ -102,7 +104,8 @@ void setup_hwdom_pci_devices(struct domain *,
 int pci_release_devices(struct domain *d);
 int pci_add_segment(u16 seg);
 const unsigned long *pci_get_ro_map(u16 seg);
-int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *);
+int pci_add_device(u16 seg, u8 bus, u8 devfn,
+                   const struct pci_dev_info *, const int);
 int pci_remove_device(u16 seg, u8 bus, u8 devfn);
 int pci_ro_device(int seg, int bus, int devfn);
 void arch_pci_ro_device(int seg, int bdf);
-- 
1.8.4.2


* [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-02 21:34 [PATCH 0/4] Display IO topology when PXM data is available Boris Ostrovsky
  2014-12-02 21:34 ` [PATCH 1/4] pci: Do not ignore device's PXM information Boris Ostrovsky
@ 2014-12-02 21:34 ` Boris Ostrovsky
  2014-12-03 15:20   ` Andrew Cooper
                     ` (3 more replies)
  2014-12-02 21:34 ` [PATCH 3/4] sysctl/libxl: Provide information about IO topology Boris Ostrovsky
  2014-12-02 21:34 ` [PATCH 4/4] libxl: Switch to using new topology interface Boris Ostrovsky
  3 siblings, 4 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2014-12-02 21:34 UTC (permalink / raw)
  To: jbeulich, keir, ian.jackson, stefano.stabellini, ian.campbell, wei.liu2
  Cc: dario.faggioli, boris.ostrovsky, ufimtseva, xen-devel

Make XEN_SYSCTL_topologyinfo more generic so that it can return both
CPU and IO topology (support for returning the latter is added in the
subsequent patch).

To do so, move (and rename) the previously used cpu_to_core/cpu_to_socket/
cpu_to_node arrays into struct xen_sysctl_cputopo and move it, together
with the newly added struct xen_sysctl_iotopo, into
xen_sysctl_topologyinfo.

Add libxl_get_topology() to handle the new interface and modify
libxl_get_cpu_topology() to be a wrapper around it.

Adjust xenpm and python's low-level C libraries for the new interface.

This change requires bumping XEN_SYSCTL_INTERFACE_VERSION.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 tools/libxl/libxl.c               | 79 ++++++++++++++++++++++++---------------
 tools/libxl/libxl.h               |  4 ++
 tools/libxl/libxl_types.idl       | 12 ++++++
 tools/libxl/libxl_utils.c         |  6 +++
 tools/misc/xenpm.c                | 64 +++++++++++++------------------
 tools/python/xen/lowlevel/xc/xc.c | 38 +++++++------------
 xen/common/sysctl.c               | 43 ++++++++++++---------
 xen/include/public/sysctl.h       | 36 +++++++++++++-----
 8 files changed, 162 insertions(+), 120 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index de23fec..58cdc3b 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5056,11 +5056,37 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo)
 libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
 {
     GC_INIT(ctx);
+    libxl_cputopology *cputopology = NULL;
+    libxl_topology *topo;
+    int i;
+
+    topo = libxl_get_topology(ctx);
+    if (topo == NULL) {
+        *nb_cpu_out = 0;
+        goto out;
+    }
+
+    cputopology = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * topo->cpu_num);
+    for (i = 0; i < topo->cpu_num; i++) {
+        cputopology[i].core = topo->cpu[i].core;
+        cputopology[i].socket = topo->cpu[i].socket;
+        cputopology[i].node = topo->cpu[i].node;
+    }
+    *nb_cpu_out = topo->cpu_num;
+
+    libxl_topology_free(topo);
+
+ out:
+    GC_FREE;
+    return cputopology;
+}
+
+libxl_topology *libxl_get_topology(libxl_ctx *ctx)
+{
+    GC_INIT(ctx);
     xc_topologyinfo_t tinfo;
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_core_t, coremap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_socket_t, socketmap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_node_t, nodemap);
-    libxl_cputopology *ret = NULL;
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
+    libxl_topology *ret = NULL;
     int i;
     int max_cpus;
 
@@ -5072,48 +5098,39 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
         goto out;
     }
 
-    coremap = xc_hypercall_buffer_alloc
-        (ctx->xch, coremap, sizeof(*coremap) * max_cpus);
-    socketmap = xc_hypercall_buffer_alloc
-        (ctx->xch, socketmap, sizeof(*socketmap) * max_cpus);
-    nodemap = xc_hypercall_buffer_alloc
-        (ctx->xch, nodemap, sizeof(*nodemap) * max_cpus);
-    if ((coremap == NULL) || (socketmap == NULL) || (nodemap == NULL)) {
+    cputopo = xc_hypercall_buffer_alloc(ctx->xch, cputopo,
+                                        sizeof(*cputopo) * max_cpus);
+    if (cputopo == NULL) {
         LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
                             "Unable to allocate hypercall arguments");
         goto fail;
     }
-
-    set_xen_guest_handle(tinfo.cpu_to_core, coremap);
-    set_xen_guest_handle(tinfo.cpu_to_socket, socketmap);
-    set_xen_guest_handle(tinfo.cpu_to_node, nodemap);
+    set_xen_guest_handle(tinfo.cputopo, cputopo);
     tinfo.max_cpu_index = max_cpus - 1;
+
+    set_xen_guest_handle(tinfo.iotopo, HYPERCALL_BUFFER_NULL);
+
     if (xc_topologyinfo(ctx->xch, &tinfo) != 0) {
         LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "Topology info hypercall failed");
         goto fail;
     }
 
-    if (tinfo.max_cpu_index < max_cpus - 1)
-        max_cpus = tinfo.max_cpu_index + 1;
+    ret = libxl__zalloc(NOGC, sizeof(*ret));
+    ret->cpu_num = tinfo.max_cpu_index + 1;
+    ret->cpu = libxl__zalloc(NOGC, sizeof(*(ret->cpu)) * ret->cpu_num);
 
-    ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * max_cpus);
-
-    for (i = 0; i < max_cpus; i++) {
-#define V(map, i) (map[i] == INVALID_TOPOLOGY_ID) ? \
-    LIBXL_CPUTOPOLOGY_INVALID_ENTRY : map[i]
-        ret[i].core = V(coremap, i);
-        ret[i].socket = V(socketmap, i);
-        ret[i].node = V(nodemap, i);
+    for (i = 0; i < ret->cpu_num; i++) {
+#define V(map, i) ( cputopo[i].map == INVALID_TOPOLOGY_ID) ? \
+    LIBXL_CPUTOPOLOGY_INVALID_ENTRY :  cputopo[i].map
+        ret->cpu[i].core = V(core, i);
+        ret->cpu[i].socket = V(socket, i);
+        ret->cpu[i].node = V(node, i);
 #undef V
     }
 
-fail:
-    xc_hypercall_buffer_free(ctx->xch, coremap);
-    xc_hypercall_buffer_free(ctx->xch, socketmap);
-    xc_hypercall_buffer_free(ctx->xch, nodemap);
+ fail:
+    xc_hypercall_buffer_free(ctx->xch, cputopo);
 
-    if (ret)
-        *nb_cpu_out = max_cpus;
  out:
     GC_FREE;
     return ret;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index c3451bd..5ab008d 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1060,6 +1060,10 @@ void libxl_vminfo_list_free(libxl_vminfo *list, int nb_vm);
 libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out);
 void libxl_cputopology_list_free(libxl_cputopology *, int nb_cpu);
 
+#define LIBXL_TOPOLOGY_INVALID_ENTRY (~(uint32_t)0)
+libxl_topology *libxl_get_topology(libxl_ctx *ctx);
+void libxl_topology_free(libxl_topology *);
+
 #define LIBXL_NUMAINFO_INVALID_ENTRY (~(uint32_t)0)
 libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr);
 void libxl_numainfo_list_free(libxl_numainfo *, int nr);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index f7fc695..673b273 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -642,6 +642,18 @@ libxl_cputopology = Struct("cputopology", [
     ("node", uint32),
     ], dir=DIR_OUT)
 
+libxl_iotopology = Struct("iotopology", [
+    ("seg", uint16),
+    ("bus", uint8),
+    ("devfn", uint8),
+    ("node", uint32),
+    ], dir=DIR_OUT)
+
+libxl_topology = Struct("topology", [
+    ("cpu", Array(libxl_cputopology, "cpu_num")),
+    ("dev", Array(libxl_iotopology, "dev_num")),
+    ], dir=DIR_OUT)
+
 libxl_sched_credit_params = Struct("sched_credit_params", [
     ("tslice_ms", integer),
     ("ratelimit_us", integer),
diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index 58df4f3..70c21a2 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -859,6 +859,12 @@ void libxl_cputopology_list_free(libxl_cputopology *list, int nr)
     free(list);
 }
 
+void libxl_topology_free(libxl_topology *tinfo)
+{
+    libxl_topology_dispose(tinfo);
+    free(tinfo);
+}
+
 void libxl_numainfo_list_free(libxl_numainfo *list, int nr)
 {
     int i;
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index e43924c..8fd51d2 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -355,16 +355,11 @@ static void signal_int_handler(int signo)
     int i, j, k;
     struct timeval tv;
     int cx_cap = 0, px_cap = 0;
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_core);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_socket);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_node);
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
     xc_topologyinfo_t info = { 0 };
 
-    cpu_to_core = xc_hypercall_buffer_alloc(xc_handle, cpu_to_core, sizeof(*cpu_to_core) * MAX_NR_CPU);
-    cpu_to_socket = xc_hypercall_buffer_alloc(xc_handle, cpu_to_socket, sizeof(*cpu_to_socket) * MAX_NR_CPU);
-    cpu_to_node = xc_hypercall_buffer_alloc(xc_handle, cpu_to_node, sizeof(*cpu_to_node) * MAX_NR_CPU);
-
-    if ( cpu_to_core == NULL || cpu_to_socket == NULL || cpu_to_node == NULL )
+    cputopo = xc_hypercall_buffer_alloc(xc_handle, cputopo, sizeof(*cputopo) * MAX_NR_CPU);
+    if ( cputopo == NULL )
     {
 	fprintf(stderr, "failed to allocate hypercall buffers\n");
 	goto out;
@@ -448,11 +443,11 @@ static void signal_int_handler(int signo)
             printf("  Avg freq\t%d\tKHz\n", avgfreq[i]);
     }
 
-    set_xen_guest_handle(info.cpu_to_core, cpu_to_core);
-    set_xen_guest_handle(info.cpu_to_socket, cpu_to_socket);
-    set_xen_guest_handle(info.cpu_to_node, cpu_to_node);
+    set_xen_guest_handle(info.cputopo, cputopo);
     info.max_cpu_index = MAX_NR_CPU - 1;
 
+    set_xen_guest_handle(info.iotopo, HYPERCALL_BUFFER_NULL);
+
     if ( cx_cap && !xc_topologyinfo(xc_handle, &info) )
     {
         uint32_t socket_ids[MAX_NR_CPU];
@@ -465,8 +460,8 @@ static void signal_int_handler(int signo)
         /* check validity */
         for ( i = 0; i <= info.max_cpu_index; i++ )
         {
-            if ( cpu_to_core[i] == INVALID_TOPOLOGY_ID ||
-                 cpu_to_socket[i] == INVALID_TOPOLOGY_ID )
+            if ( cputopo[i].core == INVALID_TOPOLOGY_ID ||
+                 cputopo[i].socket == INVALID_TOPOLOGY_ID )
                 break;
         }
         if ( i > info.max_cpu_index )
@@ -475,20 +470,20 @@ static void signal_int_handler(int signo)
             for ( i = 0; i <= info.max_cpu_index; i++ )
             {
                 for ( j = 0; j < socket_nr; j++ )
-                    if ( cpu_to_socket[i] == socket_ids[j] )
+                    if ( cputopo[i].socket == socket_ids[j] )
                         break;
                 if ( j == socket_nr )
                 {
-                    socket_ids[j] = cpu_to_socket[i];
+                    socket_ids[j] = cputopo[i].socket;
                     socket_nr++;
                 }
 
                 for ( j = 0; j < core_nr; j++ )
-                    if ( cpu_to_core[i] == core_ids[j] )
+                    if ( cputopo[i].core == core_ids[j] )
                         break;
                 if ( j == core_nr )
                 {
-                    core_ids[j] = cpu_to_core[i];
+                    core_ids[j] = cputopo[i].core;
                     core_nr++;
                 }
             }
@@ -501,7 +496,7 @@ static void signal_int_handler(int signo)
 
                 for ( j = 0; j <= info.max_cpu_index; j++ )
                 {
-                    if ( cpu_to_socket[j] == socket_ids[i] )
+                    if ( cputopo[j].socket == socket_ids[i] )
                         break;
                 }
                 printf("\nSocket %d\n", socket_ids[i]);
@@ -520,8 +515,8 @@ static void signal_int_handler(int signo)
                 {
                     for ( j = 0; j <= info.max_cpu_index; j++ )
                     {
-                        if ( cpu_to_socket[j] == socket_ids[i] &&
-                             cpu_to_core[j] == core_ids[k] )
+                        if ( cputopo[j].socket == socket_ids[i] &&
+                             cputopo[j].core == core_ids[k] )
                             break;
                     }
                     printf("\t Core %d CPU %d\n", core_ids[k], j);
@@ -556,9 +551,7 @@ static void signal_int_handler(int signo)
     free(sum);
     free(avgfreq);
 out:
-    xc_hypercall_buffer_free(xc_handle, cpu_to_core);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_socket);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_node);
+    xc_hypercall_buffer_free(xc_handle, cputopo);
     xc_interface_close(xc_handle);
     exit(0);
 }
@@ -965,27 +958,22 @@ void scaling_governor_func(int argc, char *argv[])
 
 void cpu_topology_func(int argc, char *argv[])
 {
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_core);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_socket);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_node);
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
     xc_topologyinfo_t info = { 0 };
     int i, rc = ENOMEM;
 
-    cpu_to_core = xc_hypercall_buffer_alloc(xc_handle, cpu_to_core, sizeof(*cpu_to_core) * MAX_NR_CPU);
-    cpu_to_socket = xc_hypercall_buffer_alloc(xc_handle, cpu_to_socket, sizeof(*cpu_to_socket) * MAX_NR_CPU);
-    cpu_to_node = xc_hypercall_buffer_alloc(xc_handle, cpu_to_node, sizeof(*cpu_to_node) * MAX_NR_CPU);
-
-    if ( cpu_to_core == NULL || cpu_to_socket == NULL || cpu_to_node == NULL )
+    cputopo = xc_hypercall_buffer_alloc(xc_handle, cputopo, sizeof(*cputopo) * MAX_NR_CPU);
+    if ( cputopo == NULL )
     {
 	fprintf(stderr, "failed to allocate hypercall buffers\n");
 	goto out;
     }
 
-    set_xen_guest_handle(info.cpu_to_core, cpu_to_core);
-    set_xen_guest_handle(info.cpu_to_socket, cpu_to_socket);
-    set_xen_guest_handle(info.cpu_to_node, cpu_to_node);
+    set_xen_guest_handle(info.cputopo, cputopo);
     info.max_cpu_index = MAX_NR_CPU-1;
 
+    set_xen_guest_handle(info.iotopo, HYPERCALL_BUFFER_NULL);
+
     if ( xc_topologyinfo(xc_handle, &info) )
     {
         rc = errno;
@@ -1000,16 +988,14 @@ void cpu_topology_func(int argc, char *argv[])
     printf("CPU\tcore\tsocket\tnode\n");
     for ( i = 0; i <= info.max_cpu_index; i++ )
     {
-        if ( cpu_to_core[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].core == INVALID_TOPOLOGY_ID )
             continue;
         printf("CPU%d\t %d\t %d\t %d\n",
-               i, cpu_to_core[i], cpu_to_socket[i], cpu_to_node[i]);
+               i, cputopo[i].core, cputopo[i].socket, cputopo[i].node);
     }
     rc = 0;
 out:
-    xc_hypercall_buffer_free(xc_handle, cpu_to_core);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_socket);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_node);
+    xc_hypercall_buffer_free(xc_handle, cputopo);
     if ( rc )
         exit(rc);
 }
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index d95d459..1f9252a 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1226,25 +1226,17 @@ static PyObject *pyxc_topologyinfo(XcObject *self)
     int i, max_cpu_index;
     PyObject *ret_obj = NULL;
     PyObject *cpu_to_core_obj, *cpu_to_socket_obj, *cpu_to_node_obj;
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_core_t, coremap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_socket_t, socketmap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_node_t, nodemap);
 
-    coremap = xc_hypercall_buffer_alloc(self->xc_handle, coremap, sizeof(*coremap) * (MAX_CPU_INDEX+1));
-    if ( coremap == NULL )
-        goto out;
-    socketmap = xc_hypercall_buffer_alloc(self->xc_handle, socketmap, sizeof(*socketmap) * (MAX_CPU_INDEX+1));
-    if ( socketmap == NULL  )
-        goto out;
-    nodemap = xc_hypercall_buffer_alloc(self->xc_handle, nodemap, sizeof(*nodemap) * (MAX_CPU_INDEX+1));
-    if ( nodemap == NULL )
-        goto out;
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
 
-    set_xen_guest_handle(tinfo.cpu_to_core, coremap);
-    set_xen_guest_handle(tinfo.cpu_to_socket, socketmap);
-    set_xen_guest_handle(tinfo.cpu_to_node, nodemap);
+    cputopo = xc_hypercall_buffer_alloc(self->xc_handle, cputopo, sizeof(*cputopo) * (MAX_CPU_INDEX+1));
+    if ( cputopo == NULL )
+	goto out;
+    set_xen_guest_handle(tinfo.cputopo, cputopo);
     tinfo.max_cpu_index = MAX_CPU_INDEX;
 
+    set_xen_guest_handle(tinfo.iotopo, HYPERCALL_BUFFER_NULL);
+
     if ( xc_topologyinfo(self->xc_handle, &tinfo) != 0 )
         goto out;
 
@@ -1258,35 +1250,35 @@ static PyObject *pyxc_topologyinfo(XcObject *self)
     cpu_to_node_obj = PyList_New(0);
     for ( i = 0; i <= max_cpu_index; i++ )
     {
-        if ( coremap[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].core == INVALID_TOPOLOGY_ID )
         {
             PyList_Append(cpu_to_core_obj, Py_None);
         }
         else
         {
-            PyObject *pyint = PyInt_FromLong(coremap[i]);
+            PyObject *pyint = PyInt_FromLong(cputopo[i].core);
             PyList_Append(cpu_to_core_obj, pyint);
             Py_DECREF(pyint);
         }
 
-        if ( socketmap[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].socket == INVALID_TOPOLOGY_ID )
         {
             PyList_Append(cpu_to_socket_obj, Py_None);
         }
         else
         {
-            PyObject *pyint = PyInt_FromLong(socketmap[i]);
+            PyObject *pyint = PyInt_FromLong(cputopo[i].socket);
             PyList_Append(cpu_to_socket_obj, pyint);
             Py_DECREF(pyint);
         }
 
-        if ( nodemap[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].node == INVALID_TOPOLOGY_ID )
         {
             PyList_Append(cpu_to_node_obj, Py_None);
         }
         else
         {
-            PyObject *pyint = PyInt_FromLong(nodemap[i]);
+            PyObject *pyint = PyInt_FromLong(cputopo[i].node);
             PyList_Append(cpu_to_node_obj, pyint);
             Py_DECREF(pyint);
         }
@@ -1304,9 +1296,7 @@ static PyObject *pyxc_topologyinfo(XcObject *self)
     Py_DECREF(cpu_to_node_obj);
 
 out:
-    xc_hypercall_buffer_free(self->xc_handle, coremap);
-    xc_hypercall_buffer_free(self->xc_handle, socketmap);
-    xc_hypercall_buffer_free(self->xc_handle, nodemap);
+    xc_hypercall_buffer_free(self->xc_handle, cputopo);
     return ret_obj ? ret_obj : pyxc_error_to_exception(self->xc_handle);
 #undef MAX_CPU_INDEX
 }
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 0cb6ee1..d4dc8ed 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -327,32 +327,41 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 
         last_online_cpu = cpumask_last(&cpu_online_map);
         max_cpu_index = min_t(uint32_t, ti->max_cpu_index, last_online_cpu);
-        ti->max_cpu_index = last_online_cpu;
+
+        if ( guest_handle_is_null(ti->cputopo) )
+        {
+            ret = -EINVAL;
+            break;
+        }
 
         for ( i = 0; i <= max_cpu_index; i++ )
         {
-            if ( !guest_handle_is_null(ti->cpu_to_core) )
-            {
-                uint32_t core = cpu_online(i) ? cpu_to_core(i) : ~0u;
-                if ( copy_to_guest_offset(ti->cpu_to_core, i, &core, 1) )
-                    break;
-            }
-            if ( !guest_handle_is_null(ti->cpu_to_socket) )
+            xen_sysctl_cputopo_t cputopo;
+
+            if ( cpu_online(i) )
             {
-                uint32_t socket = cpu_online(i) ? cpu_to_socket(i) : ~0u;
-                if ( copy_to_guest_offset(ti->cpu_to_socket, i, &socket, 1) )
-                    break;
+                cputopo.core = cpu_to_core(i);
+                cputopo.socket = cpu_to_socket(i);
+                cputopo.node = cpu_to_node(i);
             }
-            if ( !guest_handle_is_null(ti->cpu_to_node) )
+            else
+                cputopo.core = cputopo.socket =
+                    cputopo.node = INVALID_TOPOLOGY_ID;
+
+            if ( copy_to_guest_offset(ti->cputopo, i, &cputopo, 1) )
             {
-                uint32_t node = cpu_online(i) ? cpu_to_node(i) : ~0u;
-                if ( copy_to_guest_offset(ti->cpu_to_node, i, &node, 1) )
-                    break;
+                ret = -EFAULT;
+                break;
             }
         }
 
-        ret = ((i <= max_cpu_index) || copy_to_guest(u_sysctl, op, 1))
-            ? -EFAULT : 0;
+        if ( !ret && (ti->max_cpu_index != last_online_cpu) )
+        {
+             ti->max_cpu_index = last_online_cpu;
+             if ( __copy_field_to_guest(u_sysctl, op,
+                                        u.topologyinfo.max_cpu_index) )
+                ret = -EFAULT;
+        }
     }
     break;
 
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index b3713b3..e1d1348 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -34,7 +34,7 @@
 #include "xen.h"
 #include "domctl.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000B
+#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000C
 
 /*
  * Read console content from Xen buffer ring.
@@ -464,26 +464,44 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_lockprof_op_t);
 
 /* XEN_SYSCTL_topologyinfo */
 #define INVALID_TOPOLOGY_ID  (~0U)
+
+struct xen_sysctl_cputopo {
+    uint32_t core;
+    uint32_t socket;
+    uint32_t node;
+};
+typedef struct xen_sysctl_cputopo xen_sysctl_cputopo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopo_t);
+
+struct xen_sysctl_iotopo {
+    uint16_t seg;
+    uint8_t bus;
+    uint8_t devfn;
+    uint32_t node;
+};
+typedef struct xen_sysctl_iotopo xen_sysctl_iotopo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_iotopo_t);
+
 struct xen_sysctl_topologyinfo {
     /*
      * IN: maximum addressable entry in the caller-provided arrays.
-     * OUT: largest cpu identifier in the system.
+     * OUT: largest cpu identifier or max number of devices in the system.
      * If OUT is greater than IN then the arrays are truncated!
     * If OUT is less than IN then the array tails are not written by sysctl.
      */
     uint32_t max_cpu_index;
+    uint32_t max_devs;
 
     /*
      * If not NULL, these arrays are filled with core/socket/node identifier
-     * for each cpu.
-     * If a cpu has no core/socket/node information (e.g., cpu not present) 
-     * then the sentinel value ~0u is written to each array.
-     * The number of array elements written by the sysctl is:
+     * for each cpu and/or node for each PCI device.
+     * If information for a particular entry is not available it is set to
+     * INVALID_TOPOLOGY_ID.
+     * The number of array elements for CPU topology written by the sysctl is:
      *   min(@max_cpu_index_IN,@max_cpu_index_OUT)+1
      */
-    XEN_GUEST_HANDLE_64(uint32) cpu_to_core;
-    XEN_GUEST_HANDLE_64(uint32) cpu_to_socket;
-    XEN_GUEST_HANDLE_64(uint32) cpu_to_node;
+    XEN_GUEST_HANDLE_64(xen_sysctl_cputopo_t) cputopo;
+    XEN_GUEST_HANDLE_64(xen_sysctl_iotopo_t) iotopo;
 };
 typedef struct xen_sysctl_topologyinfo xen_sysctl_topologyinfo_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_topologyinfo_t);
-- 
1.8.4.2


* [PATCH 3/4] sysctl/libxl: Provide information about IO topology
  2014-12-02 21:34 [PATCH 0/4] Display IO topology when PXM data is available Boris Ostrovsky
  2014-12-02 21:34 ` [PATCH 1/4] pci: Do not ignore device's PXM information Boris Ostrovsky
  2014-12-02 21:34 ` [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data Boris Ostrovsky
@ 2014-12-02 21:34 ` Boris Ostrovsky
  2014-12-04 12:22   ` Dario Faggioli
  2014-12-05 15:59   ` Jan Beulich
  2014-12-02 21:34 ` [PATCH 4/4] libxl: Switch to using new topology interface Boris Ostrovsky
  3 siblings, 2 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2014-12-02 21:34 UTC (permalink / raw)
  To: jbeulich, keir, ian.jackson, stefano.stabellini, ian.campbell, wei.liu2
  Cc: dario.faggioli, boris.ostrovsky, ufimtseva, xen-devel

Add support to XEN_SYSCTL_topologyinfo for returning IO topology data.

Modify libxl_get_topology() to request this data, and provide
OS-dependent helper functions that determine which devices we are
inquiring about (implemented for Linux only).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 tools/libxl/libxl.c          | 28 ++++++++++++++++-
 tools/libxl/libxl_freebsd.c  | 12 +++++++
 tools/libxl/libxl_internal.h |  5 +++
 tools/libxl/libxl_linux.c    | 74 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_netbsd.c   | 12 +++++++
 tools/libxl/xl_cmdimpl.c     | 31 ++++++++++++++-----
 xen/common/sysctl.c          | 30 ++++++++++++++++++
 7 files changed, 183 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 58cdc3b..ee103ac 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5086,6 +5086,7 @@ libxl_topology *libxl_get_topology(libxl_ctx *ctx)
     GC_INIT(ctx);
     xc_topologyinfo_t tinfo;
     DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_iotopo_t, iotopo);
     libxl_topology *ret = NULL;
     int i;
     int max_cpus;
@@ -5108,7 +5109,20 @@ libxl_topology *libxl_get_topology(libxl_ctx *ctx)
     set_xen_guest_handle(tinfo.cputopo, cputopo);
     tinfo.max_cpu_index = max_cpus - 1;
 
-    set_xen_guest_handle(tinfo.iotopo, HYPERCALL_BUFFER_NULL);
+    tinfo.max_devs = libxl_pci_numdevs(ctx);
+    if (tinfo.max_devs > 0) {
+        iotopo = xc_hypercall_buffer_alloc(ctx->xch, iotopo,
+                                           sizeof(*iotopo) * tinfo.max_devs);
+        if (iotopo != NULL) {
+            if (libxl_pci_topology_init(ctx, iotopo, tinfo.max_devs))
+                tinfo.max_devs = 0;
+        } else
+            tinfo.max_devs = 0;
+    }
+    if (tinfo.max_devs > 0)
+        set_xen_guest_handle(tinfo.iotopo, iotopo);
+    else
+        set_xen_guest_handle(tinfo.iotopo, HYPERCALL_BUFFER_NULL);
 
     if (xc_topologyinfo(ctx->xch, &tinfo) != 0) {
         LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "Topology info hypercall failed");
@@ -5128,8 +5142,20 @@ libxl_topology *libxl_get_topology(libxl_ctx *ctx)
 #undef V
     }
 
+    ret->dev_num = tinfo.max_devs;
+    ret->dev = libxl__zalloc(NOGC, sizeof(libxl_iotopology) * ret->dev_num);
+
+    for (i = 0; i < tinfo.max_devs; i++) {
+        ret->dev[i].seg = iotopo[i].seg;
+        ret->dev[i].bus = iotopo[i].bus;
+        ret->dev[i].devfn = iotopo[i].devfn;
+        ret->dev[i].node = (iotopo[i].node == INVALID_TOPOLOGY_ID) ?
+                           LIBXL_TOPOLOGY_INVALID_ENTRY : iotopo[i].node;
+    }
+
  fail:
     xc_hypercall_buffer_free(ctx->xch, cputopo);
+    xc_hypercall_buffer_free(ctx->xch, iotopo);
 
  out:
     GC_FREE;
diff --git a/tools/libxl/libxl_freebsd.c b/tools/libxl/libxl_freebsd.c
index e8b88b3..8695a78 100644
--- a/tools/libxl/libxl_freebsd.c
+++ b/tools/libxl/libxl_freebsd.c
@@ -131,3 +131,15 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
 {
     return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
 }
+
+int libxl_pci_numdevs(libxl_ctx *ctx)
+{
+    return ERROR_NI;
+}
+
+int libxl_pci_topology_init(libxl_ctx *ctx,
+			    xen_sysctl_iotopo_t *iotopo,
+			    int numdev)
+{
+    return ERROR_NI;
+}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 4361421..c3a4194 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1168,6 +1168,11 @@ _hidden int libxl__try_phy_backend(mode_t st_mode);
 
 _hidden char *libxl__devid_to_localdev(libxl__gc *gc, int devid);
 
+_hidden int libxl_pci_numdevs(libxl_ctx *ctx);
+_hidden int libxl_pci_topology_init(libxl_ctx *ctx,
+                                    xen_sysctl_iotopo_t *iotopo,
+                                    int numdev);
+
 /* from libxl_pci */
 
 _hidden int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, int starting);
diff --git a/tools/libxl/libxl_linux.c b/tools/libxl/libxl_linux.c
index ea5d8c1..884c042 100644
--- a/tools/libxl/libxl_linux.c
+++ b/tools/libxl/libxl_linux.c
@@ -279,3 +279,77 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
 {
     return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
 }
+
+/* These two routines are "inspired" by pciutils */
+int libxl_pci_numdevs(libxl_ctx *ctx)
+{
+    DIR *dir;
+    struct dirent *entry;
+    int numdev = 0;
+
+    dir = opendir("/sys/bus/pci/devices");
+    if (!dir) {
+        LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, errno,
+                            "Cannot open /sys/bus/pci/devices");
+        return ERROR_FAIL;
+    }
+
+    while ((entry = readdir(dir))) {
+        /* ".", ".." or a special non-device perhaps */
+        if (entry->d_name[0] == '.')
+            continue;
+        numdev++;
+    }
+    closedir(dir);
+
+    return numdev;
+}
+
+int libxl_pci_topology_init(libxl_ctx *ctx,
+                            xen_sysctl_iotopo_t *iotopo,
+                            int numdev)
+{
+
+    DIR *dir;
+    struct dirent *entry;
+    int i;
+
+    dir = opendir("/sys/bus/pci/devices");
+    if (!dir) {
+        LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, errno,
+                            "Cannot open /sys/bus/pci/devices");
+        return ERROR_FAIL;
+    }
+
+    i = 0;
+    while ((entry = readdir(dir))) {
+        unsigned int dom, bus, dev, func;
+
+        /* ".", ".." or a special non-device perhaps */
+        if (entry->d_name[0] == '.')
+            continue;
+
+        if (i == numdev) {
+            LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "Too many devices\n");
+            closedir(dir);
+            return ERROR_FAIL;
+        }
+
+        if (sscanf(entry->d_name, "%x:%x:%x.%d", &dom, &bus, &dev, &func) < 4) {
+            LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
+                       "Error parsing device name %s", entry->d_name);
+            closedir(dir);
+            return ERROR_FAIL;
+        }
+
+        iotopo[i].seg = dom;
+        iotopo[i].bus = bus;
+        iotopo[i].devfn = ((dev & 0x1f) << 3) | (func & 7);
+
+        i++;
+    }
+
+    closedir(dir);
+
+    return 0;
+}
diff --git a/tools/libxl/libxl_netbsd.c b/tools/libxl/libxl_netbsd.c
index 898e160..0946b64 100644
--- a/tools/libxl/libxl_netbsd.c
+++ b/tools/libxl/libxl_netbsd.c
@@ -95,3 +95,15 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
 {
     return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL;
 }
+
+int libxl_pci_numdevs(libxl_ctx *ctx)
+{
+    return ERROR_NI;
+}
+
+int libxl_pci_topology_init(libxl_ctx *ctx,
+			    xen_sysctl_iotopo_t *iotopo,
+			    int numdev)
+{
+    return ERROR_NI;
+}
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 9afef3f..fd9beb3 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -5186,25 +5186,40 @@ static void output_numainfo(void)
 
 static void output_topologyinfo(void)
 {
-    libxl_cputopology *info;
-    int i, nr;
+    libxl_topology *info;
+    int i, valid_devs = 0;
 
-    info = libxl_get_cpu_topology(ctx, &nr);
+    info = libxl_get_topology(ctx);
     if (info == NULL) {
-        fprintf(stderr, "libxl_get_topologyinfo failed.\n");
+        fprintf(stderr, "libxl_get_topology failed.\n");
         return;
     }
 
     printf("cpu_topology           :\n");
     printf("cpu:    core    socket     node\n");
 
-    for (i = 0; i < nr; i++) {
-        if (info[i].core != LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
+    for (i = 0; i < info->cpu_num; i++) {
+        if (info->cpu[i].core != LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
             printf("%3d:    %4d     %4d     %4d\n", i,
-                   info[i].core, info[i].socket, info[i].node);
+                   info->cpu[i].core, info->cpu[i].socket, info->cpu[i].node);
     }
 
-    libxl_cputopology_list_free(info, nr);
+    printf("device topology        :\n");
+    printf("device           node\n");
+    for (i = 0; i < info->dev_num; i++) {
+        libxl_iotopology *dev = &info->dev[i];
+
+        if (dev->node != LIBXL_TOPOLOGY_INVALID_ENTRY) {
+            printf("%04x:%02x:%02x.%01x      %d\n", dev->seg, dev->bus,
+                   ((dev->devfn >> 3) & 0x1f), (dev->devfn & 7), dev->node);
+            valid_devs++;
+        }
+    }
+
+    if (valid_devs == 0)
+        printf(" No device topology data available\n");
+
+    libxl_topology_free(info);
 
     return;
 }
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index d4dc8ed..a82e0ed 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -328,6 +328,7 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         last_online_cpu = cpumask_last(&cpu_online_map);
         max_cpu_index = min_t(uint32_t, ti->max_cpu_index, last_online_cpu);
 
+        /* CPU topology buffers should always be provided */
         if ( guest_handle_is_null(ti->cputopo) )
         {
             ret = -EINVAL;
@@ -362,6 +363,35 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
                                         u.topologyinfo.max_cpu_index) )
                 ret = -EFAULT;
         }
+
+        if ( !ret && !guest_handle_is_null(ti->iotopo) )
+        {
+            for ( i = 0; i < ti->max_devs; i++ )
+            {
+                xen_sysctl_iotopo_t iotopo;
+                struct pci_dev *pdev;
+
+                if ( copy_from_guest_offset(&iotopo, ti->iotopo, i, 1) )
+                {
+                    ret = -EFAULT;
+                    break;
+                }
+
+                spin_lock(&pcidevs_lock);
+                pdev = pci_get_pdev(iotopo.seg, iotopo.bus, iotopo.devfn);
+                if ( !pdev || (pdev->node == NUMA_NO_NODE) )
+                    iotopo.node = INVALID_TOPOLOGY_ID;
+                else
+                    iotopo.node = pdev->node;
+                spin_unlock(&pcidevs_lock);
+
+                if ( copy_to_guest_offset(ti->iotopo, i, &iotopo, 1) )
+                {
+                    ret = -EFAULT;
+                    break;
+                }
+            }
+        }
     }
     break;
 
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/4] libxl: Switch to using new topology interface
  2014-12-02 21:34 [PATCH 0/4] Display IO topology when PXM data is available Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2014-12-02 21:34 ` [PATCH 3/4] sysctl/libxl: Provide information about IO topology Boris Ostrovsky
@ 2014-12-02 21:34 ` Boris Ostrovsky
  3 siblings, 0 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2014-12-02 21:34 UTC (permalink / raw)
  To: jbeulich, keir, ian.jackson, stefano.stabellini, ian.campbell, wei.liu2
  Cc: dario.faggioli, boris.ostrovsky, ufimtseva, xen-devel

Make current users of libxl_get_cpu_topology() call
libxl_get_topology() instead.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 tools/libxl/libxl.c       | 25 ++++++++++++-------------
 tools/libxl/libxl_numa.c  | 14 +++++++-------
 tools/libxl/libxl_utils.c | 24 ++++++++++++------------
 tools/libxl/xl_cmdimpl.c  | 44 +++++++++++++++++++++-----------------------
 4 files changed, 52 insertions(+), 55 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index ee103ac..5398e41 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -6410,30 +6410,29 @@ int libxl_cpupool_cpuadd(libxl_ctx *ctx, uint32_t poolid, int cpu)
 int libxl_cpupool_cpuadd_node(libxl_ctx *ctx, uint32_t poolid, int node, int *cpus)
 {
     int rc = 0;
-    int cpu, nr;
+    int cpu;
     libxl_bitmap freemap;
-    libxl_cputopology *topology;
+    libxl_topology *topology;
 
     if (libxl_get_freecpus(ctx, &freemap)) {
         return ERROR_FAIL;
     }
 
-    topology = libxl_get_cpu_topology(ctx, &nr);
+    topology = libxl_get_topology(ctx);
     if (!topology) {
         rc = ERROR_FAIL;
         goto out;
     }
 
     *cpus = 0;
-    for (cpu = 0; cpu < nr; cpu++) {
-        if (libxl_bitmap_test(&freemap, cpu) && (topology[cpu].node == node) &&
+    for (cpu = 0; cpu < topology->cpu_num; cpu++) {
+        if (libxl_bitmap_test(&freemap, cpu) && (topology->cpu[cpu].node == node) &&
             !libxl_cpupool_cpuadd(ctx, poolid, cpu)) {
                 (*cpus)++;
         }
-        libxl_cputopology_dispose(&topology[cpu]);
     }
 
-    free(topology);
+    libxl_topology_free(topology);
 out:
     libxl_bitmap_dispose(&freemap);
     return rc;
@@ -6457,8 +6456,8 @@ int libxl_cpupool_cpuremove_node(libxl_ctx *ctx, uint32_t poolid, int node, int
     int ret = 0;
     int n_pools;
     int p;
-    int cpu, nr_cpus;
-    libxl_cputopology *topology;
+    int cpu;
+    libxl_topology *topology;
     libxl_cpupoolinfo *poolinfo;
 
     poolinfo = libxl_list_cpupool(ctx, &n_pools);
@@ -6466,7 +6465,7 @@ int libxl_cpupool_cpuremove_node(libxl_ctx *ctx, uint32_t poolid, int node, int
         return ERROR_NOMEM;
     }
 
-    topology = libxl_get_cpu_topology(ctx, &nr_cpus);
+    topology = libxl_get_topology(ctx);
     if (!topology) {
         ret = ERROR_FAIL;
         goto out;
@@ -6475,8 +6474,8 @@ int libxl_cpupool_cpuremove_node(libxl_ctx *ctx, uint32_t poolid, int node, int
     *cpus = 0;
     for (p = 0; p < n_pools; p++) {
         if (poolinfo[p].poolid == poolid) {
-            for (cpu = 0; cpu < nr_cpus; cpu++) {
-                if ((topology[cpu].node == node) &&
+            for (cpu = 0; cpu < topology->cpu_num; cpu++) {
+                if ((topology->cpu[cpu].node == node) &&
                     libxl_bitmap_test(&poolinfo[p].cpumap, cpu) &&
                     !libxl_cpupool_cpuremove(ctx, poolid, cpu)) {
                         (*cpus)++;
@@ -6485,7 +6484,7 @@ int libxl_cpupool_cpuremove_node(libxl_ctx *ctx, uint32_t poolid, int node, int
         }
     }
 
-    libxl_cputopology_list_free(topology, nr_cpus);
+    libxl_topology_free(topology);
 
 out:
     libxl_cpupoolinfo_list_free(poolinfo, n_pools);
diff --git a/tools/libxl/libxl_numa.c b/tools/libxl/libxl_numa.c
index 94ca4fe..c809dc0 100644
--- a/tools/libxl/libxl_numa.c
+++ b/tools/libxl/libxl_numa.c
@@ -293,9 +293,9 @@ int libxl__get_numa_candidate(libxl__gc *gc,
                               int *cndt_found)
 {
     libxl__numa_candidate new_cndt;
-    libxl_cputopology *tinfo = NULL;
+    libxl_topology *tinfo = NULL;
     libxl_numainfo *ninfo = NULL;
-    int nr_nodes = 0, nr_suit_nodes, nr_cpus = 0;
+    int nr_nodes = 0, nr_suit_nodes;
     libxl_bitmap suitable_nodemap, nodemap;
     int *vcpus_on_node, rc = 0;
 
@@ -341,7 +341,7 @@ int libxl__get_numa_candidate(libxl__gc *gc,
         goto out;
     }
 
-    tinfo = libxl_get_cpu_topology(CTX, &nr_cpus);
+    tinfo = libxl_get_topology(CTX);
     if (tinfo == NULL) {
         rc = ERROR_FAIL;
         goto out;
@@ -372,7 +372,7 @@ int libxl__get_numa_candidate(libxl__gc *gc,
      * all we have to do later is summing up the right elements of the
      * vcpus_on_node array.
      */
-    rc = nr_vcpus_on_nodes(gc, tinfo, nr_cpus, suitable_cpumap, vcpus_on_node);
+    rc = nr_vcpus_on_nodes(gc, tinfo->cpu, tinfo->cpu_num, suitable_cpumap, vcpus_on_node);
     if (rc)
         goto out;
 
@@ -387,7 +387,7 @@ int libxl__get_numa_candidate(libxl__gc *gc,
     if (!min_nodes) {
         int cpus_per_node;
 
-        cpus_per_node = count_cpus_per_node(tinfo, nr_cpus, nr_nodes);
+        cpus_per_node = count_cpus_per_node(tinfo->cpu, tinfo->cpu_num, nr_nodes);
         if (cpus_per_node == 0)
             min_nodes = 1;
         else
@@ -451,7 +451,7 @@ int libxl__get_numa_candidate(libxl__gc *gc,
                 continue;
 
             /* And the same applies if this combination is short in cpus */
-            nodes_cpus = nodemap_to_nr_cpus(tinfo, nr_cpus, suitable_cpumap,
+            nodes_cpus = nodemap_to_nr_cpus(tinfo->cpu, tinfo->cpu_num, suitable_cpumap,
                                             &nodemap);
             if (min_cpus && nodes_cpus < min_cpus)
                 continue;
@@ -503,7 +503,7 @@ int libxl__get_numa_candidate(libxl__gc *gc,
     libxl_bitmap_dispose(&suitable_nodemap);
     libxl__numa_candidate_dispose(&new_cndt);
     libxl_numainfo_list_free(ninfo, nr_nodes);
-    libxl_cputopology_list_free(tinfo, nr_cpus);
+    libxl_topology_free(tinfo);
     return rc;
 }
 
diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index 70c21a2..3eb8684 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -751,22 +751,22 @@ int libxl_nodemap_to_cpumap(libxl_ctx *ctx,
                             const libxl_bitmap *nodemap,
                             libxl_bitmap *cpumap)
 {
-    libxl_cputopology *tinfo = NULL;
-    int nr_cpus = 0, i, rc = 0;
+    libxl_topology *tinfo = NULL;
+    int i, rc = 0;
 
-    tinfo = libxl_get_cpu_topology(ctx, &nr_cpus);
+    tinfo = libxl_get_topology(ctx);
     if (tinfo == NULL) {
         rc = ERROR_FAIL;
         goto out;
     }
 
     libxl_bitmap_set_none(cpumap);
-    for (i = 0; i < nr_cpus; i++) {
-        if (libxl_bitmap_test(nodemap, tinfo[i].node))
+    for (i = 0; i < tinfo->cpu_num; i++) {
+        if (libxl_bitmap_test(nodemap, tinfo->cpu[i].node))
             libxl_bitmap_set(cpumap, i);
     }
  out:
-    libxl_cputopology_list_free(tinfo, nr_cpus);
+    libxl_topology_free(tinfo);
     return rc;
 }
 
@@ -796,10 +796,10 @@ int libxl_cpumap_to_nodemap(libxl_ctx *ctx,
                             const libxl_bitmap *cpumap,
                             libxl_bitmap *nodemap)
 {
-    libxl_cputopology *tinfo = NULL;
-    int nr_cpus = 0, i, rc = 0;
+    libxl_topology *tinfo = NULL;
+    int i, rc = 0;
 
-    tinfo = libxl_get_cpu_topology(ctx, &nr_cpus);
+    tinfo = libxl_get_topology(ctx);
     if (tinfo == NULL) {
         rc = ERROR_FAIL;
         goto out;
@@ -807,12 +807,12 @@ int libxl_cpumap_to_nodemap(libxl_ctx *ctx,
 
     libxl_bitmap_set_none(nodemap);
     libxl_for_each_set_bit(i, *cpumap) {
-        if (i >= nr_cpus)
+        if (i >= tinfo->cpu_num)
             break;
-        libxl_bitmap_set(nodemap, tinfo[i].node);
+        libxl_bitmap_set(nodemap, tinfo->cpu[i].node);
     }
  out:
-    libxl_cputopology_list_free(tinfo, nr_cpus);
+    libxl_topology_free(tinfo);
     return rc;
 }
 
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index fd9beb3..824f980 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7045,7 +7045,7 @@ int main_cpupoolcreate(int argc, char **argv)
     libxl_bitmap freemap;
     libxl_bitmap cpumap;
     libxl_uuid uuid;
-    libxl_cputopology *topology;
+    libxl_topology *topology;
     int rc = -ERROR_FAIL;
 
     SWITCH_FOREACH_OPT(opt, "hnf:", opts, "cpupool-create", 0) {
@@ -7149,18 +7149,17 @@ int main_cpupoolcreate(int argc, char **argv)
         goto out_cfg;
     }
     if (!xlu_cfg_get_list(config, "nodes", &nodes, 0, 0)) {
-        int nr;
         n_cpus = 0;
         n_nodes = 0;
-        topology = libxl_get_cpu_topology(ctx, &nr);
+        topology = libxl_get_topology(ctx);
         if (topology == NULL) {
-            fprintf(stderr, "libxl_get_topologyinfo failed\n");
+            fprintf(stderr, "libxl_get_topology failed\n");
             goto out_cfg;
         }
         while ((buf = xlu_cfg_get_listitem(nodes, n_nodes)) != NULL) {
             n = atoi(buf);
-            for (i = 0; i < nr; i++) {
-                if ((topology[i].node == n) &&
+            for (i = 0; i < topology->cpu_num; i++) {
+                if ((topology->cpu[i].node == n) &&
                     libxl_bitmap_test(&freemap, i)) {
                     libxl_bitmap_set(&cpumap, i);
                     n_cpus++;
@@ -7169,7 +7168,7 @@ int main_cpupoolcreate(int argc, char **argv)
             n_nodes++;
         }
 
-        libxl_cputopology_list_free(topology, nr);
+        libxl_topology_free(topology);
 
         if (n_cpus == 0) {
             fprintf(stderr, "no free cpu found\n");
@@ -7462,12 +7461,11 @@ int main_cpupoolnumasplit(int argc, char **argv)
     libxl_scheduler sched;
     int n_pools;
     int node;
-    int n_cpus;
     char name[16];
     libxl_uuid uuid;
     libxl_bitmap cpumap;
     libxl_cpupoolinfo *poolinfo;
-    libxl_cputopology *topology;
+    libxl_topology *topology;
     libxl_dominfo info;
 
     SWITCH_FOREACH_OPT(opt, "", NULL, "cpupool-numa-split", 0) {
@@ -7491,22 +7489,22 @@ int main_cpupoolnumasplit(int argc, char **argv)
         return -ERROR_FAIL;
     }
 
-    topology = libxl_get_cpu_topology(ctx, &n_cpus);
+    topology = libxl_get_topology(ctx);
     if (topology == NULL) {
-        fprintf(stderr, "libxl_get_topologyinfo failed\n");
+        fprintf(stderr, "libxl_get_topology failed\n");
         return -ERROR_FAIL;
     }
 
     if (libxl_cpu_bitmap_alloc(ctx, &cpumap, 0)) {
         fprintf(stderr, "Failed to allocate cpumap\n");
-        libxl_cputopology_list_free(topology, n_cpus);
+        libxl_topology_free(topology);
         return -ERROR_FAIL;
     }
 
     /* Reset Pool-0 to 1st node: first add cpus, then remove cpus to avoid
        a cpupool without cpus in between */
 
-    node = topology[0].node;
+    node = topology->cpu[0].node;
     if (libxl_cpupool_cpuadd_node(ctx, 0, node, &n)) {
         fprintf(stderr, "error on adding cpu to Pool 0\n");
         return -ERROR_FAIL;
@@ -7520,9 +7518,9 @@ int main_cpupoolnumasplit(int argc, char **argv)
     }
 
     n = 0;
-    for (c = 0; c < n_cpus; c++) {
-        if (topology[c].node == node) {
-            topology[c].node = LIBXL_CPUTOPOLOGY_INVALID_ENTRY;
+    for (c = 0; c < topology->cpu_num; c++) {
+        if (topology->cpu[c].node == node) {
+            topology->cpu[c].node = LIBXL_CPUTOPOLOGY_INVALID_ENTRY;
             libxl_bitmap_set(&cpumap, n);
             n++;
         }
@@ -7547,12 +7545,12 @@ int main_cpupoolnumasplit(int argc, char **argv)
     }
     libxl_bitmap_set_none(&cpumap);
 
-    for (c = 0; c < n_cpus; c++) {
-        if (topology[c].node == LIBXL_CPUTOPOLOGY_INVALID_ENTRY) {
+    for (c = 0; c < topology->cpu_num; c++) {
+        if (topology->cpu[c].node == LIBXL_CPUTOPOLOGY_INVALID_ENTRY) {
             continue;
         }
 
-        node = topology[c].node;
+        node = topology->cpu[c].node;
         ret = -libxl_cpupool_cpuremove_node(ctx, 0, node, &n);
         if (ret) {
             fprintf(stderr, "error on removing cpu from Pool 0\n");
@@ -7574,15 +7572,15 @@ int main_cpupoolnumasplit(int argc, char **argv)
             goto out;
         }
 
-        for (p = c; p < n_cpus; p++) {
-            if (topology[p].node == node) {
-                topology[p].node = LIBXL_CPUTOPOLOGY_INVALID_ENTRY;
+        for (p = c; p < topology->cpu_num; p++) {
+            if (topology->cpu[p].node == node) {
+                topology->cpu[p].node = LIBXL_CPUTOPOLOGY_INVALID_ENTRY;
             }
         }
     }
 
 out:
-    libxl_cputopology_list_free(topology, n_cpus);
+    libxl_topology_free(topology);
     libxl_bitmap_dispose(&cpumap);
 
     return ret;
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] pci: Do not ignore device's PXM information
  2014-12-02 21:34 ` [PATCH 1/4] pci: Do not ignore device's PXM information Boris Ostrovsky
@ 2014-12-03 15:01   ` Andrew Cooper
  2014-12-03 15:19     ` Boris Ostrovsky
  2014-12-05 15:53   ` Jan Beulich
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2014-12-03 15:01 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2
  Cc: dario.faggioli, ufimtseva, xen-devel

On 02/12/14 21:34, Boris Ostrovsky wrote:
> If ACPI provides PXM data for IO devices then dom0 will pass it to
> hypervisor during PHYSDEVOP_pci_device_add call. This information,
> however, is currently ignored.
>
> We should remember it (in the form of nodeID). We will also print it
> when user requests device information dump.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
>  xen/arch/x86/physdev.c        | 20 +++++++++++++++++---
>  xen/drivers/passthrough/pci.c | 13 +++++++++----
>  xen/include/xen/pci.h         |  5 ++++-
>  3 files changed, 30 insertions(+), 8 deletions(-)
>
> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
> index 6b3201b..7775f80 100644
> --- a/xen/arch/x86/physdev.c
> +++ b/xen/arch/x86/physdev.c
> @@ -565,7 +565,8 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          if ( copy_from_guest(&manage_pci, arg, 1) != 0 )
>              break;
>  
> -        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn, NULL);
> +        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn,
> +                             NULL, NUMA_NO_NODE);
>          break;
>      }
>  
> @@ -597,13 +598,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          pdev_info.physfn.devfn = manage_pci_ext.physfn.devfn;
>          ret = pci_add_device(0, manage_pci_ext.bus,
>                               manage_pci_ext.devfn,
> -                             &pdev_info);
> +                             &pdev_info, NUMA_NO_NODE);
>          break;
>      }
>  
>      case PHYSDEVOP_pci_device_add: {
>          struct physdev_pci_device_add add;
>          struct pci_dev_info pdev_info;
> +        int node;
>  
>          ret = -EFAULT;
>          if ( copy_from_guest(&add, arg, 1) != 0 )
> @@ -618,7 +620,19 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          }
>          else
>              pdev_info.is_virtfn = 0;
> -        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
> +
> +        if ( add.flags & XEN_PCI_DEV_PXM ) {
> +            int optarr_off = offsetof(struct physdev_pci_device_add, optarr) /
> +                             sizeof(add.optarr[0]);
> +
> +            if ( copy_from_guest_offset(&add.optarr[0], arg, optarr_off, 1) )
> +                break;

This will clobber the hypervisor stack, attempting to put the PXM
information into what is probably pdev_info.

How is one expected to use XEN_PCI_DEV_PXM ? There is currently no
specification of how to use the variable length structure.

~Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] pci: Do not ignore device's PXM information
  2014-12-03 15:01   ` Andrew Cooper
@ 2014-12-03 15:19     ` Boris Ostrovsky
  0 siblings, 0 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2014-12-03 15:19 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2
  Cc: dario.faggioli, ufimtseva, xen-devel

On 12/03/2014 10:01 AM, Andrew Cooper wrote:
> On 02/12/14 21:34, Boris Ostrovsky wrote:
>> If ACPI provides PXM data for IO devices then dom0 will pass it to
>> hypervisor during PHYSDEVOP_pci_device_add call. This information,
>> however, is currently ignored.
>>
>> We should remember it (in the form of nodeID). We will also print it
>> when user requests device information dump.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> ---
>>   xen/arch/x86/physdev.c        | 20 +++++++++++++++++---
>>   xen/drivers/passthrough/pci.c | 13 +++++++++----
>>   xen/include/xen/pci.h         |  5 ++++-
>>   3 files changed, 30 insertions(+), 8 deletions(-)
>>
>> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
>> index 6b3201b..7775f80 100644
>> --- a/xen/arch/x86/physdev.c
>> +++ b/xen/arch/x86/physdev.c
>> @@ -565,7 +565,8 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>           if ( copy_from_guest(&manage_pci, arg, 1) != 0 )
>>               break;
>>   
>> -        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn, NULL);
>> +        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn,
>> +                             NULL, NUMA_NO_NODE);
>>           break;
>>       }
>>   
>> @@ -597,13 +598,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>           pdev_info.physfn.devfn = manage_pci_ext.physfn.devfn;
>>           ret = pci_add_device(0, manage_pci_ext.bus,
>>                                manage_pci_ext.devfn,
>> -                             &pdev_info);
>> +                             &pdev_info, NUMA_NO_NODE);
>>           break;
>>       }
>>   
>>       case PHYSDEVOP_pci_device_add: {
>>           struct physdev_pci_device_add add;
>>           struct pci_dev_info pdev_info;
>> +        int node;
>>   
>>           ret = -EFAULT;
>>           if ( copy_from_guest(&add, arg, 1) != 0 )
>> @@ -618,7 +620,19 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>           }
>>           else
>>               pdev_info.is_virtfn = 0;
>> -        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
>> +
>> +        if ( add.flags & XEN_PCI_DEV_PXM ) {
>> +            int optarr_off = offsetof(struct physdev_pci_device_add, optarr) /
>> +                             sizeof(add.optarr[0]);
>> +
>> +            if ( copy_from_guest_offset(&add.optarr[0], arg, optarr_off, 1) )
>> +                break;
> This will clobber the hypervisor stack, attempting to put the PXM
> information into what is probably pdev_info.

Sigh... I actually fixed this on the Linux side and have now introduced
the same bug here. (I guess I was lucky that the compiler must have
inserted a padding word between add and pdev_info for alignment.)

>
> How is one expected to use XEN_PCI_DEV_PXM ? There is currently no
> specification of how to use the variable length structure.

I don't think this is explicitly specified anywhere, but if this flag is
set then optarr[0] is supposed to hold the uint32_t PXM value. I will
add a comment to this effect to physdev.h.

Thanks.
-boris

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-02 21:34 ` [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data Boris Ostrovsky
@ 2014-12-03 15:20   ` Andrew Cooper
  2014-12-03 15:37     ` Boris Ostrovsky
  2014-12-04 11:24   ` Wei Liu
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2014-12-03 15:20 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2
  Cc: dario.faggioli, ufimtseva, xen-devel

On 02/12/14 21:34, Boris Ostrovsky wrote:
>  /* XEN_SYSCTL_topologyinfo */
>  #define INVALID_TOPOLOGY_ID  (~0U)
> +
> +struct xen_sysctl_cputopo {
> +    uint32_t core;
> +    uint32_t socket;
> +    uint32_t node;
> +};
> +typedef struct xen_sysctl_cputopo xen_sysctl_cputopo_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopo_t);
> +
> +struct xen_sysctl_iotopo {
> +    uint16_t seg;
> +    uint8_t bus;
> +    uint8_t devfn;
> +    uint32_t node;
> +};
> +typedef struct xen_sysctl_iotopo xen_sysctl_iotopo_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_iotopo_t);
> +
>  struct xen_sysctl_topologyinfo {
>      /*
>       * IN: maximum addressable entry in the caller-provided arrays.
> -     * OUT: largest cpu identifier in the system.
> +     * OUT: largest cpu identifier or max number of devices in the system.
>       * If OUT is greater than IN then the arrays are truncated!
>       * If OUT is leass than IN then the array tails are not written by sysctl.
>       */
>      uint32_t max_cpu_index;
> +    uint32_t max_devs;
>  
>      /*
>       * If not NULL, these arrays are filled with core/socket/node identifier
> -     * for each cpu.
> -     * If a cpu has no core/socket/node information (e.g., cpu not present) 
> -     * then the sentinel value ~0u is written to each array.
> -     * The number of array elements written by the sysctl is:
> +     * for each cpu, and/or the node identifier for each PCI device.
> +     * If information for a particular entry is not available, it is set to
> +     * INVALID_TOPOLOGY_ID.
> +     * The number of array elements for CPU topology written by the sysctl is:
>       *   min(@max_cpu_index_IN,@max_cpu_index_OUT)+1
>       */
> -    XEN_GUEST_HANDLE_64(uint32) cpu_to_core;
> -    XEN_GUEST_HANDLE_64(uint32) cpu_to_socket;
> -    XEN_GUEST_HANDLE_64(uint32) cpu_to_node;
> +    XEN_GUEST_HANDLE_64(xen_sysctl_cputopo_t) cputopo;
> +    XEN_GUEST_HANDLE_64(xen_sysctl_iotopo_t) iotopo;

These are inherently lists with different indices.  They should not be
conglomerated like this.

I would suggest introducing a new hypercall (xen_sysctl_iotopologyinfo
?) and leave this one alone.

~Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-03 15:20   ` Andrew Cooper
@ 2014-12-03 15:37     ` Boris Ostrovsky
  0 siblings, 0 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2014-12-03 15:37 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2
  Cc: dario.faggioli, ufimtseva, xen-devel

On 12/03/2014 10:20 AM, Andrew Cooper wrote:
> On 02/12/14 21:34, Boris Ostrovsky wrote:
>>   /* XEN_SYSCTL_topologyinfo */
>>   #define INVALID_TOPOLOGY_ID  (~0U)
>> +
>> +struct xen_sysctl_cputopo {
>> +    uint32_t core;
>> +    uint32_t socket;
>> +    uint32_t node;
>> +};
>> +typedef struct xen_sysctl_cputopo xen_sysctl_cputopo_t;
>> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopo_t);
>> +
>> +struct xen_sysctl_iotopo {
>> +    uint16_t seg;
>> +    uint8_t bus;
>> +    uint8_t devfn;
>> +    uint32_t node;
>> +};
>> +typedef struct xen_sysctl_iotopo xen_sysctl_iotopo_t;
>> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_iotopo_t);
>> +
>>   struct xen_sysctl_topologyinfo {
>>       /*
>>        * IN: maximum addressable entry in the caller-provided arrays.
>> -     * OUT: largest cpu identifier in the system.
>> +     * OUT: largest cpu identifier or max number of devices in the system.
>>        * If OUT is greater than IN then the arrays are truncated!
>>        * If OUT is less than IN then the array tails are not written by sysctl.
>>        */
>>       uint32_t max_cpu_index;
>> +    uint32_t max_devs;
>>   
>>       /*
>>        * If not NULL, these arrays are filled with core/socket/node identifier
>> -     * for each cpu.
>> -     * If a cpu has no core/socket/node information (e.g., cpu not present)
>> -     * then the sentinel value ~0u is written to each array.
>> -     * The number of array elements written by the sysctl is:
>> +     * for each cpu and/or node for each PCI device.
>> +     * If information for a particular entry is not available it is set to
>> +     * INVALID_TOPOLOGY_ID.
>> +     * The number of array elements for CPU topology written by the sysctl is:
>>        *   min(@max_cpu_index_IN,@max_cpu_index_OUT)+1
>>        */
>> -    XEN_GUEST_HANDLE_64(uint32) cpu_to_core;
>> -    XEN_GUEST_HANDLE_64(uint32) cpu_to_socket;
>> -    XEN_GUEST_HANDLE_64(uint32) cpu_to_node;
>> +    XEN_GUEST_HANDLE_64(xen_sysctl_cputopo_t) cputopo;
>> +    XEN_GUEST_HANDLE_64(xen_sysctl_iotopo_t) iotopo;
> These are inherently lists with different indices.  They should not
> be conglomerated like this.


I don't follow this. These are indeed lists with different indices, but
why can't they both be part of this struct?

-boris

>
> I would suggest introducing a new hypercall (xen_sysctl_iotopologyinfo
> ?) and leave this one alone.
>
> ~Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-02 21:34 ` [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data Boris Ostrovsky
  2014-12-03 15:20   ` Andrew Cooper
@ 2014-12-04 11:24   ` Wei Liu
  2014-12-04 11:55   ` Dario Faggioli
  2014-12-05 15:55   ` Jan Beulich
  3 siblings, 0 replies; 26+ messages in thread
From: Wei Liu @ 2014-12-04 11:24 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, jbeulich, keir, ufimtseva

On Tue, Dec 02, 2014 at 04:34:08PM -0500, Boris Ostrovsky wrote:
[...]
>  void libxl_cputopology_list_free(libxl_cputopology *, int nb_cpu);
>  
> +#define LIBXL_TOPOLOGY_INVALID_ENTRY (~(uint32_t)0)
> +libxl_topology *libxl_get_topology(libxl_ctx *ctx);
> +void libxl_topology_free(libxl_topology *);
> +
>  #define LIBXL_NUMAINFO_INVALID_ENTRY (~(uint32_t)0)
>  libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr);
>  void libxl_numainfo_list_free(libxl_numainfo *, int nr);
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index f7fc695..673b273 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -642,6 +642,18 @@ libxl_cputopology = Struct("cputopology", [
>      ("node", uint32),
>      ], dir=DIR_OUT)
>  
> +libxl_iotopology = Struct("iotopology", [
> +    ("seg", uint16),
> +    ("bus", uint8),
> +    ("devfn", uint8),
> +    ("node", uint32),
> +    ], dir=DIR_OUT)
> +
> +libxl_topology = Struct("topology", [
> +    ("cpu", Array(libxl_cputopology, "cpu_num")),
> +    ("dev", Array(libxl_iotopology, "dev_num")),
> +    ], dir=DIR_OUT)

Use "num_cpu" and "num_dev" please.

I think you also need to add a LIBXL_HAVE_$FOO in libxl.h. 

Wei.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-02 21:34 ` [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data Boris Ostrovsky
  2014-12-03 15:20   ` Andrew Cooper
  2014-12-04 11:24   ` Wei Liu
@ 2014-12-04 11:55   ` Dario Faggioli
  2014-12-04 16:26     ` Boris Ostrovsky
  2014-12-05 15:55   ` Jan Beulich
  3 siblings, 1 reply; 26+ messages in thread
From: Dario Faggioli @ 2014-12-04 11:55 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, ian.jackson,
	xen-devel, jbeulich, keir, ufimtseva


[-- Attachment #1.1: Type: text/plain, Size: 1962 bytes --]

On Tue, 2014-12-02 at 16:34 -0500, Boris Ostrovsky wrote:
> Make XEN_SYSCTL_topologyinfo more generic so that it can return both
> CPU and IO topology (support for returning the latter is added in the
> subsequent patch)
> 
Is it always the case that we need the full information (i.e., both CPU
and IO topology), at all levels above Xen?

I appreciate the performance implications (one hcall instead of two) in
cases where we do, but I still think, as Andrew suggested, that having a
new XEN_SYSCTL_iotopology (or something like that) and leaving
*_topologyinfo alone would make a better low-level interface.

All IMHO, of course.

> To do so move (and rename) previously used cpu_to_core/cpu_to_socket/
> cpu_to_node into struct xen_sysctl_cputopo and move it, together with
> newly added struct xen_sysctl_iotopo, to xen_sysctl_topologyinfo.
> 
> Add libxl_get_topology() to handle new interface and modify
> libxl_get_cpu_topology() to be a wrapper around it.
> 
This is, I think, useful, and I'd go for it, even if we add a new
hypercall at the Xen and xc level.

Of course, it would work the other way round: the new libxl function
would wrap the existing libxl_get_cpu_topology() and a new libxl call
(something like libxl_get_io_topology() ?).

> Adjust xenpm and python's low-level C libraries for new interface.
> 
All in one patch? Wouldn't splitting it at least in two (hypervisor and
tools parts) be better? If it were me, I'd probably split even more...

> This change requires bumping XEN_SYSCTL_INTERFACE_VERSION
>
Which would not be the case if we take the approach of adding a new,
iotopology specific, hcall, would it?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] sysctl/libxl: Provide information about IO topology
  2014-12-02 21:34 ` [PATCH 3/4] sysctl/libxl: Provide information about IO topology Boris Ostrovsky
@ 2014-12-04 12:22   ` Dario Faggioli
  2014-12-04 13:28     ` Ian Campbell
  2014-12-05 15:59   ` Jan Beulich
  1 sibling, 1 reply; 26+ messages in thread
From: Dario Faggioli @ 2014-12-04 12:22 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, ian.jackson,
	xen-devel, jbeulich, keir, ufimtseva


[-- Attachment #1.1: Type: text/plain, Size: 876 bytes --]

On Tue, 2014-12-02 at 16:34 -0500, Boris Ostrovsky wrote:

> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1168,6 +1168,11 @@ _hidden int libxl__try_phy_backend(mode_t st_mode);
>  
>  _hidden char *libxl__devid_to_localdev(libxl__gc *gc, int devid);
>  
> +_hidden int libxl_pci_numdevs(libxl_ctx *ctx);
> +_hidden int libxl_pci_topology_init(libxl_ctx *ctx,
> +                                    xen_sysctl_iotopo_t *iotopo,
> +                                    int numdev);
> +
IIRC, internal/hidden functions should have libxl__* as a prefix.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] sysctl/libxl: Provide information about IO topology
  2014-12-04 12:22   ` Dario Faggioli
@ 2014-12-04 13:28     ` Ian Campbell
  0 siblings, 0 replies; 26+ messages in thread
From: Ian Campbell @ 2014-12-04 13:28 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: keir, ufimtseva, stefano.stabellini, ian.jackson, xen-devel,
	jbeulich, wei.liu2, Boris Ostrovsky

On Thu, 2014-12-04 at 13:22 +0100, Dario Faggioli wrote:
> On Tue, 2014-12-02 at 16:34 -0500, Boris Ostrovsky wrote:
> 
> > --- a/tools/libxl/libxl_internal.h
> > +++ b/tools/libxl/libxl_internal.h
> > @@ -1168,6 +1168,11 @@ _hidden int libxl__try_phy_backend(mode_t st_mode);
> >  
> >  _hidden char *libxl__devid_to_localdev(libxl__gc *gc, int devid);
> >  
> > +_hidden int libxl_pci_numdevs(libxl_ctx *ctx);
> > +_hidden int libxl_pci_topology_init(libxl_ctx *ctx,
> > +                                    xen_sysctl_iotopo_t *iotopo,
> > +                                    int numdev);
> > +
> > IIRC, internal/hidden functions should have libxl__* as a prefix.

And, generally, take a gc instead of a ctx.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-04 11:55   ` Dario Faggioli
@ 2014-12-04 16:26     ` Boris Ostrovsky
  2014-12-04 16:44       ` Andrew Cooper
  2014-12-04 17:11       ` Jan Beulich
  0 siblings, 2 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2014-12-04 16:26 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: wei.liu2, ian.campbell, stefano.stabellini, ian.jackson,
	xen-devel, jbeulich, keir, ufimtseva

On 12/04/2014 06:55 AM, Dario Faggioli wrote:
> On Tue, 2014-12-02 at 16:34 -0500, Boris Ostrovsky wrote:
>> Make XEN_SYSCTL_topologyinfo more generic so that it can return both
>> CPU and IO topology (support for returning the latter is added in the
>> subsequent patch)
>>
> Is it always the case that we need the full information (i.e., both CPU
> and IO topology), at all levels above Xen?
>
> I appreciate the performance implications (one hcall instead of two) in
> cases where we do, but I still think, as Andrew suggested, that having a
> new XEN_SYSCTL_iotopology (or something like that) and leaving
> *_topologyinfo alone would make a better low-level interface.
>
> All IMHO, of course.

I don't feel strongly either way so I can go that route (and it will 
make patch 4 completely unnecessary).

(I am not sure, though, that I understood Andrew's reasoning for splitting
into two sysctls.)

>> To do so move (and rename) previously used cpu_to_core/cpu_to_socket/
>> cpu_to_node into struct xen_sysctl_cputopo and move it, together with
>> newly added struct xen_sysctl_iotopo, to xen_sysctl_topologyinfo.
>>
>> Add libxl_get_topology() to handle new interface and modify
>> libxl_get_cpu_topology() to be a wrapper around it.
>>
> This is, I think, useful, and I'd go for it, even if we add a new
> hypercall at the Xen and xc level.
>
> Of course, it would work the other way round: the new libxl function
> would wrap the existing libxl_get_cpu_topology() and a new libxl call
> (something like libxl_get_io_topology() ?).
>
>> Adjust xenpm and python's low-level C libraries for new interface.
>>
> All in one patch? Wouldn't splitting it at least in two (hypervisor and
> tools parts) be better? If it were me, I'd probably split even more...

I could not split it because this patch changes sysctl interface (more 
specifically, xen_sysctl_topologyinfo_t/xc_topologyinfo_t) so anyone who 
uses this struct needed to be updated at the same time.

Of course, if I were to leave the current interface intact and add another
sysctl for IO topology then this would not be necessary.

>
>> This change requires bumping XEN_SYSCTL_INTERFACE_VERSION
>>
> Which would not be the case if we take the approach of adding a new,
> iotopology specific, hcall, would it?

I would think that any changes to a public interface, including adding a 
new function, require new version.

(And if we get a new version, even if we split topology into CPU and IO
sysctls, I'd still like to put cpu_to_[core|socket|node] into a single
structure.)

Thanks.
-boris

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-04 16:26     ` Boris Ostrovsky
@ 2014-12-04 16:44       ` Andrew Cooper
  2014-12-04 17:11       ` Jan Beulich
  1 sibling, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2014-12-04 16:44 UTC (permalink / raw)
  To: Boris Ostrovsky, Dario Faggioli
  Cc: wei.liu2, ian.campbell, stefano.stabellini, ian.jackson,
	xen-devel, jbeulich, keir, ufimtseva

On 04/12/14 16:26, Boris Ostrovsky wrote:
> On 12/04/2014 06:55 AM, Dario Faggioli wrote:
>> On Tue, 2014-12-02 at 16:34 -0500, Boris Ostrovsky wrote:
>>> Make XEN_SYSCTL_topologyinfo more generic so that it can return both
>>> CPU and IO topology (support for returning the latter is added in the
>>> subsequent patch)
>>>
>> Is it always the case that we need the full information (i.e., both CPU
>> and IO topology), at all levels above Xen?
>>
>> I appreciate the performance implications (one hcall instead of two) in
>> cases where we do, but I still think, as Andrew suggested, that having a
>> new XEN_SYSCTL_iotopology (or something like that) and leaving
>> *_topologyinfo alone would make a better low-level interface.
>>
>> All IMHO, of course.
>
> I don't feel strongly either way so I can go that route (and it will
> make patch 4 completely unnecessary).
>
> (I am not sure, though, that I understood Andrew's reasoning for
> splitting into two sysctls.)

The two bits of information are different, and it is perfectly
reasonable to want the cpu information at one point, but the io
information at another.

As it stands, you appear to mandate that the caller wants the cpu
information, which is not true.

From a separate point of view, while the sysctl ABI version does allow
us to make changes like this, it makes a mess for valgrind and strace to
have radically different hypercalls with the same name, separated only
by the interface version.

>
>>> To do so move (and rename) previously used cpu_to_core/cpu_to_socket/
>>> cpu_to_node into struct xen_sysctl_cputopo and move it, together with
>>> newly added struct xen_sysctl_iotopo, to xen_sysctl_topologyinfo.
>>>
>>> Add libxl_get_topology() to handle new interface and modify
>>> libxl_get_cpu_topology() to be a wrapper around it.
>>>
>>> This is, I think, useful, and I'd go for it, even if we add a new
>>> hypercall at the Xen and xc level.
>>
>> Of course, it would work the other way round: the new libxl function
>> would wrap the existing libxl_get_cpu_topology() and a new libxl call
>> (something like libxl_get_io_topology() ?).
>>
>>> Adjust xenpm and python's low-level C libraries for new interface.
>>>
>> All in one patch? Wouldn't splitting it at least in two (hypervisor and
>> tools parts) be better? If it were me, I'd probably split even more...
>
> I could not split it because this patch changes sysctl interface (more
> specifically, xen_sysctl_topologyinfo_t/xc_topologyinfo_t) so anyone
> who uses this struct needed to be updated at the same time.
>
> Of course, if I were to leave current interface intact and add another
> sysctl for IO topology then this will not be necessary

This is the other reason why change simply for change's sake in the
interface is best avoided.  The unstable API goes very deep into the
toolstack.

>
>>
>>> This change requires bumping XEN_SYSCTL_INTERFACE_VERSION
>>>
>> Which would not be the case if we take the approach of adding a new,
>> iotopology specific, hcall, would it?
>
> I would think that any changes to a public interface, including adding
> a new function, require new version.
>
> (And if we get a new version, even if we split topology into CPU and
> IO sysctls, I'd still like to put cpu_to_[core|socket|node] into a
> single structure.)

This would certainly decrease the faff both the toolstack and hypervisor
have to go to when passing the parameters, and I think it would be a
good improvement.

~Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-04 16:26     ` Boris Ostrovsky
  2014-12-04 16:44       ` Andrew Cooper
@ 2014-12-04 17:11       ` Jan Beulich
  1 sibling, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2014-12-04 17:11 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, Dario Faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

>>> On 04.12.14 at 17:26, <boris.ostrovsky@oracle.com> wrote:
> On 12/04/2014 06:55 AM, Dario Faggioli wrote:
>> On Tue, 2014-12-02 at 16:34 -0500, Boris Ostrovsky wrote:
>>> This change requires bumping XEN_SYSCTL_INTERFACE_VERSION
>>>
>> Which would not be the case if we take the approach of adding a new,
>> iotopology specific, hcall, would it?
> 
> I would think that any changes to a public interface, including adding a 
> new function, require new version.

No, additions of new sub-ops don't require the version to be bumped.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] pci: Do not ignore device's PXM information
  2014-12-02 21:34 ` [PATCH 1/4] pci: Do not ignore device's PXM information Boris Ostrovsky
  2014-12-03 15:01   ` Andrew Cooper
@ 2014-12-05 15:53   ` Jan Beulich
  2014-12-05 17:02     ` Boris Ostrovsky
  1 sibling, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2014-12-05 15:53 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

>>> On 02.12.14 at 22:34, <boris.ostrovsky@oracle.com> wrote:
> If ACPI provides PXM data for IO devices then dom0 will pass it to
> hypervisor during PHYSDEVOP_pci_device_add call. This information,
> however, is currently ignored.
> 
> We should remember it (in the form of nodeID). We will also print it
> when user requests device information dump.

This on its own seems pretty little reason for changing the code;
considering that subsequent patches will at least convey the
information to the tool stack, maybe that should be mentioned
here as the primary reason for the change?

> @@ -597,13 +598,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          pdev_info.physfn.devfn = manage_pci_ext.physfn.devfn;
>          ret = pci_add_device(0, manage_pci_ext.bus,
>                               manage_pci_ext.devfn,
> -                             &pdev_info);
> +                             &pdev_info, NUMA_NO_NODE);
>          break;
>      }
>  
>      case PHYSDEVOP_pci_device_add: {
>          struct physdev_pci_device_add add;
>          struct pci_dev_info pdev_info;
> +        int node;

Here and everywhere else as applicable: unsigned int unless a
negative value is possible.

> @@ -618,7 +620,19 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          }
>          else
>              pdev_info.is_virtfn = 0;
> -        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
> +
> +        if ( add.flags & XEN_PCI_DEV_PXM ) {

Coding style.

> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -568,7 +568,8 @@ static void pci_enable_acs(struct pci_dev *pdev)
>      pci_conf_write16(seg, bus, dev, func, pos + PCI_ACS_CTRL, ctrl);
>  }
>  
> -int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
> +int pci_add_device(u16 seg, u8 bus, u8 devfn,
> +                   const struct pci_dev_info *info, const int node)

I don't see the need for the const.

> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -56,6 +56,8 @@ struct pci_dev {
>  
>      u8 phantom_stride;
>  
> +    int node; /* NUMA node */

I don't think we currently support node IDs wider than 8 bits.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-02 21:34 ` [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data Boris Ostrovsky
                     ` (2 preceding siblings ...)
  2014-12-04 11:55   ` Dario Faggioli
@ 2014-12-05 15:55   ` Jan Beulich
  2014-12-05 16:03     ` Jan Beulich
  3 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2014-12-05 15:55 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

>>> On 02.12.14 at 22:34, <boris.ostrovsky@oracle.com> wrote:
> +struct xen_sysctl_iotopo {
> +    uint16_t seg;
> +    uint8_t bus;
> +    uint8_t devfn;
> +    uint32_t node;
> +};

This is PCI-centric without expressing that in the name or layout. Perhaps
the first part should be a union from the very beginning?

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] sysctl/libxl: Provide information about IO topology
  2014-12-02 21:34 ` [PATCH 3/4] sysctl/libxl: Provide information about IO topology Boris Ostrovsky
  2014-12-04 12:22   ` Dario Faggioli
@ 2014-12-05 15:59   ` Jan Beulich
  1 sibling, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2014-12-05 15:59 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

>>> On 02.12.14 at 22:34, <boris.ostrovsky@oracle.com> wrote:
> @@ -362,6 +363,35 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>                                          u.topologyinfo.max_cpu_index) )
>                  ret = -EFAULT;
>          }
> +
> +        if ( !ret && !guest_handle_is_null(ti->iotopo) )
> +        {
> +            for ( i = 0; i < ti->max_devs; i++ )

Careful about long running loops here please.

Jan

> +            {
> +                xen_sysctl_iotopo_t iotopo;
> +                struct pci_dev *pdev;
> +
> +                if ( copy_from_guest_offset(&iotopo, ti->iotopo, i, 1) )
> +                {
> +                    ret = -EFAULT;
> +                    break;
> +                }
> +
> +                spin_lock(&pcidevs_lock);
> +                pdev = pci_get_pdev(iotopo.seg, iotopo.bus, iotopo.devfn);
> +                if ( !pdev || (pdev->node == NUMA_NO_NODE) )
> +                    iotopo.node = INVALID_TOPOLOGY_ID;
> +                else
> +                    iotopo.node = pdev->node;
> +                spin_unlock(&pcidevs_lock);
> +
> +                if ( copy_to_guest_offset(ti->iotopo, i, &iotopo, 1) )
> +                {
> +                    ret = -EFAULT;
> +                    break;
> +                }
> +            }
> +        }
>      }
>      break;
>  
> -- 
> 1.8.4.2

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-05 15:55   ` Jan Beulich
@ 2014-12-05 16:03     ` Jan Beulich
  2014-12-05 17:10       ` Boris Ostrovsky
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2014-12-05 16:03 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

>>> On 05.12.14 at 16:55, <JBeulich@suse.com> wrote:
>>>> On 02.12.14 at 22:34, <boris.ostrovsky@oracle.com> wrote:
>> +struct xen_sysctl_iotopo {
>> +    uint16_t seg;
>> +    uint8_t bus;
>> +    uint8_t devfn;
>> +    uint32_t node;
>> +};
> 
> This is PCI-centric without expressing in the name or layout. Perhaps
> the first part should be a union from the very beginning?

And I wonder whether that supposed union part wouldn't be nicely
done using struct physdev_pci_device.

Additionally please add IN and OUT annotations. When I first saw
this I assumed they would all be OUT (in which case the long running
loop problem mentioned in the reply to one of the other patches
wouldn't have been there), matching their CPU counterpart...

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] pci: Do not ignore device's PXM information
  2014-12-05 15:53   ` Jan Beulich
@ 2014-12-05 17:02     ` Boris Ostrovsky
  0 siblings, 0 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2014-12-05 17:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

On 12/05/2014 10:53 AM, Jan Beulich wrote:
>
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -56,6 +56,8 @@ struct pci_dev {
>>   
>>       u8 phantom_stride;
>>   
>> +    int node; /* NUMA node */
> I don't think we currently support node IDs wider than 8 bits.
>

I used an int because pxm_to_node() returns an int. OTOH, pxm2node[], 
for which pxm_to_node() is essentially a wrapper, is a u8.

-boris

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-05 16:03     ` Jan Beulich
@ 2014-12-05 17:10       ` Boris Ostrovsky
  2014-12-08  8:33         ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Boris Ostrovsky @ 2014-12-05 17:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

On 12/05/2014 11:03 AM, Jan Beulich wrote:
>>>> On 05.12.14 at 16:55, <JBeulich@suse.com> wrote:
>>>>> On 02.12.14 at 22:34, <boris.ostrovsky@oracle.com> wrote:
>>> +struct xen_sysctl_iotopo {
>>> +    uint16_t seg;
>>> +    uint8_t bus;
>>> +    uint8_t devfn;
>>> +    uint32_t node;
>>> +};
>>> This is PCI-centric without expressing that in the name or layout.

xen_sysctl_pcitopo would be a better name.

>>   Perhaps
>> the first part should be a union from the very beginning?
> And I wonder whether that supposed union part wouldn't be nicely
> done using struct physdev_pci_device.

They do look strikingly similar ;-)

How would a union be useful here?

>
> Additionally please add IN and OUT annotations. When I first saw
> this I assumed they would all be OUT (in which case the long running
> loop problem mentioned in the reply to one of the other patches
> wouldn't have been there), matching their CPU counterpart...

I don't follow this. Are you saying that if ti->max_devs in patch 3/4 is 
an IN (which it is) then we don't have to guard for long-running loops?

-boris

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-05 17:10       ` Boris Ostrovsky
@ 2014-12-08  8:33         ` Jan Beulich
  2014-12-08 14:56           ` Boris Ostrovsky
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2014-12-08  8:33 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

>>> On 05.12.14 at 18:10, <boris.ostrovsky@oracle.com> wrote:
> On 12/05/2014 11:03 AM, Jan Beulich wrote:
>>>>> On 05.12.14 at 16:55, <JBeulich@suse.com> wrote:
>>>>>> On 02.12.14 at 22:34, <boris.ostrovsky@oracle.com> wrote:
>>>> +struct xen_sysctl_iotopo {
>>>> +    uint16_t seg;
>>>> +    uint8_t bus;
>>>> +    uint8_t devfn;
>>>> +    uint32_t node;
>>>> +};
>>> This is PCI-centric without expressing that in the name or layout.
> 
> xen_sysctl_pcitopo would be a better name.
> 
>>>   Perhaps
>>> the first part should be a union from the very beginning?
>> And I wonder whether that supposed union part wouldn't be nicely
>> done using struct physdev_pci_device.
> 
> They do look strikingly similar ;-)
> 
> How would a union be useful here?

If this was left as iotopo, PCI would be one of potentially many I/O
variants, and hence a union would seem warranted. If you want to
rename it to pcitopo that of course would be pointless.

>> Additionally please add IN and OUT annotations. When I first saw
>> this I assumed they would all be OUT (in which case the long running
>> loop problem mentioned in the reply to one of the other patches
>> wouldn't have been there), matching their CPU counterpart...
> 
> I don't follow this. Are you saying that if ti->max_devs in patch 3/4 is 
> an IN (which it is) then we don't have to guard for long-running loops?

If they were all OUT then there wouldn't be a way for the entire
operation to be fooled into going over more devices than there are
in the system.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-08  8:33         ` Jan Beulich
@ 2014-12-08 14:56           ` Boris Ostrovsky
  2014-12-08 15:19             ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Boris Ostrovsky @ 2014-12-08 14:56 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir


>>> Additionally please add IN and OUT annotations. When I first saw
>>> this I assumed they would all be OUT (in which case the long running
>>> loop problem mentioned in the reply to one of the other patches
>>> wouldn't have been there), matching their CPU counterpart...
>> I don't follow this. Are you saying that if ti->max_devs in patch 3/4 is
>> an IN (which it is) then we don't have to guard for long-running loops?
> If they were all OUT then there wouldn't be a way for the entire
> operation to be fooled into going over more devices than there are
> in the system.

Assuming I add continuations to the loop, too many devices wouldn't be a
problem for the hypervisor, would they? If an unreasonable number is
provided then eventually copy_from_guest() will fault.

-boris

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-08 14:56           ` Boris Ostrovsky
@ 2014-12-08 15:19             ` Jan Beulich
  2014-12-08 15:30               ` Andrew Cooper
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2014-12-08 15:19 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

>>> On 08.12.14 at 15:56, <boris.ostrovsky@oracle.com> wrote:

>>>> Additionally please add IN and OUT annotations. When I first saw
>>>> this I assumed they would all be OUT (in which case the long running
>>>> loop problem mentioned in the reply to one of the other patches
>>>> wouldn't have been there), matching their CPU counterpart...
>>> I don't follow this. Are you saying that if ti->max_devs in patch 3/4 is
>>> an IN (which it is) then we don't have to guard for long-running loops?
>> If they were all OUT then there wouldn't be a way for the entire
>> operation to be fooled into going over more devices than there are
>> in the system.
> 
> Assuming I add continuations to the loop, too many devices wouldn't be a 
> problem for the hypervisor, would it? If an unreasonable number is 
> provided then eventually copy_from_guest() will fault.

Continuations would address the concern, but it doesn't seem like
their use is really warranted here.

Jan


* Re: [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data
  2014-12-08 15:19             ` Jan Beulich
@ 2014-12-08 15:30               ` Andrew Cooper
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2014-12-08 15:30 UTC (permalink / raw)
  To: Jan Beulich, Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

On 08/12/14 15:19, Jan Beulich wrote:
>>>> On 08.12.14 at 15:56, <boris.ostrovsky@oracle.com> wrote:
>>>>> Additionally please add IN and OUT annotations. When I first saw
>>>>> this I assumed they would all be OUT (in which case the long running
>>>>> loop problem mentioned in the reply to one of the other patches
>>>>> wouldn't have been there), matching their CPU counterpart...
>>>> I don't follow this. Are you saying that if ti->max_devs in patch 3/4 is
>>>> an IN (which it is) then we don't have to guard for long-running loops?
>>> If they were all OUT then there wouldn't be a way for the entire
>>> operation to be fooled into going over more devices than there are
>>> in the system.
>> Assuming I add continuations to the loop, too many devices wouldn't be a 
>> problem for the hypervisor, would it? If an unreasonable number is 
>> provided then eventually copy_from_guest() will fault.
> Continuations would address the concern, but it doesn't seem like
> their use is really warranted here.

It depends.  One server I have to hand looks like this:

[root@mpx1 ~]# lspci | wc -l
1759

(And I believe this one isn't fully populated with devices.)

A continuation is possibly warranted in a case like this, particularly
if an HVM domain is making this hypercall.

~Andrew


end of thread, other threads:[~2014-12-08 15:30 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-02 21:34 [PATCH 0/4] Display IO topology when PXM data is available Boris Ostrovsky
2014-12-02 21:34 ` [PATCH 1/4] pci: Do not ignore device's PXM information Boris Ostrovsky
2014-12-03 15:01   ` Andrew Cooper
2014-12-03 15:19     ` Boris Ostrovsky
2014-12-05 15:53   ` Jan Beulich
2014-12-05 17:02     ` Boris Ostrovsky
2014-12-02 21:34 ` [PATCH 2/4] sysctl/libxl: Add interface for returning IO topology data Boris Ostrovsky
2014-12-03 15:20   ` Andrew Cooper
2014-12-03 15:37     ` Boris Ostrovsky
2014-12-04 11:24   ` Wei Liu
2014-12-04 11:55   ` Dario Faggioli
2014-12-04 16:26     ` Boris Ostrovsky
2014-12-04 16:44       ` Andrew Cooper
2014-12-04 17:11       ` Jan Beulich
2014-12-05 15:55   ` Jan Beulich
2014-12-05 16:03     ` Jan Beulich
2014-12-05 17:10       ` Boris Ostrovsky
2014-12-08  8:33         ` Jan Beulich
2014-12-08 14:56           ` Boris Ostrovsky
2014-12-08 15:19             ` Jan Beulich
2014-12-08 15:30               ` Andrew Cooper
2014-12-02 21:34 ` [PATCH 3/4] sysctl/libxl: Provide information about IO topology Boris Ostrovsky
2014-12-04 12:22   ` Dario Faggioli
2014-12-04 13:28     ` Ian Campbell
2014-12-05 15:59   ` Jan Beulich
2014-12-02 21:34 ` [PATCH 4/4] libxl: Switch to using new topology interface Boris Ostrovsky
