All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] Display IO topology when PXM data is available
@ 2015-01-06  2:18 Boris Ostrovsky
  2015-01-06  2:18 ` [PATCH v2 1/4] pci: Do not ignore device's PXM information Boris Ostrovsky
                   ` (3 more replies)
  0 siblings, 4 replies; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-06  2:18 UTC (permalink / raw)
  To: jbeulich, keir, ian.jackson, stefano.stabellini, ian.campbell, wei.liu2
  Cc: andrew.cooper3, dario.faggioli, boris.ostrovsky, ufimtseva, xen-devel

Changes in v2:
* Split topology sysctls into two --- one for CPU topology and the other
  for devices
* Avoid long loops in the hypervisor by using continuations. (I am not
  particularly happy about using first_dev in the interface, suggestions
  for a better interface would be appreciated)
* Use proper libxl conventions for interfaces
* Avoid hypervisor stack corruption when copying PXM data from guest


4 patches that add interface for querying hypervisor about device
topology and allow 'xl info -n' display this information if PXM object
is provided by ACPI.

The patches are:

* Store PXM data (nodeID) in pci_dev during PHYSDEVOP_pci_device_add
  hypercall
* Modify XEN_SYSCTL_topologyinfo interface to make it a little more efficient.
  (This patch is not necessary for IO topology handling)
* Add XEN_SYSCTL_pcitopoinfo sysctl for querying hypervisor about
  device topology
* Make use of the above sysctl in libxl.


Boris Ostrovsky (4):
  pci: Do not ignore device's PXM information
  sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  sysctl: Add sysctl interface for querying PCI topology
  libxl: Add interface for querying hypervisor about PCI topology

 tools/libxc/include/xenctrl.h     |    6 +-
 tools/libxc/xc_misc.c             |   28 ++++++++--
 tools/libxl/libxl.c               |  110 ++++++++++++++++++++++++++-----------
 tools/libxl/libxl.h               |    4 +
 tools/libxl/libxl_freebsd.c       |   12 ++++
 tools/libxl/libxl_internal.h      |    5 ++
 tools/libxl/libxl_linux.c         |   71 ++++++++++++++++++++++++
 tools/libxl/libxl_netbsd.c        |   12 ++++
 tools/libxl/libxl_types.idl       |    7 ++
 tools/libxl/libxl_utils.c         |    8 +++
 tools/libxl/xl_cmdimpl.c          |   39 +++++++++++--
 tools/misc/xenpm.c                |   69 +++++++++--------------
 tools/python/xen/lowlevel/xc/xc.c |   40 +++++---------
 xen/arch/x86/physdev.c            |   23 +++++++-
 xen/common/sysctl.c               |  103 +++++++++++++++++++++++++++++------
 xen/drivers/passthrough/pci.c     |   13 +++-
 xen/include/public/physdev.h      |    6 ++
 xen/include/public/sysctl.h       |   75 +++++++++++++++++++------
 xen/include/xen/pci.h             |    5 +-
 19 files changed, 477 insertions(+), 159 deletions(-)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v2 1/4] pci: Do not ignore device's PXM information
  2015-01-06  2:18 [PATCH v2 0/4] Display IO topology when PXM data is available Boris Ostrovsky
@ 2015-01-06  2:18 ` Boris Ostrovsky
  2015-01-06 11:55   ` Andrew Cooper
  2015-01-07  9:06   ` Jan Beulich
  2015-01-06  2:18 ` [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient Boris Ostrovsky
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-06  2:18 UTC (permalink / raw)
  To: jbeulich, keir, ian.jackson, stefano.stabellini, ian.campbell, wei.liu2
  Cc: andrew.cooper3, dario.faggioli, boris.ostrovsky, ufimtseva, xen-devel

If ACPI provides PXM data for IO devices then dom0 will pass it to
hypervisor during PHYSDEVOP_pci_device_add call. This information,
however, is currently ignored.

We will store this information (in the form of nodeID) in pci_dev
structure so that we can provide it, for example, to the toolstack
when it adds support (in the following patches) for querying the
hypervisor about device topology

We will also print it when user requests device information dump.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/physdev.c        |   23 ++++++++++++++++++++---
 xen/drivers/passthrough/pci.c |   13 +++++++++----
 xen/include/public/physdev.h  |    6 ++++++
 xen/include/xen/pci.h         |    5 ++++-
 4 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 6b3201b..62e768e 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -565,7 +565,8 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&manage_pci, arg, 1) != 0 )
             break;
 
-        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn, NULL);
+        ret = pci_add_device(0, manage_pci.bus, manage_pci.devfn,
+                             NULL, NUMA_NO_NODE);
         break;
     }
 
@@ -597,13 +598,14 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         pdev_info.physfn.devfn = manage_pci_ext.physfn.devfn;
         ret = pci_add_device(0, manage_pci_ext.bus,
                              manage_pci_ext.devfn,
-                             &pdev_info);
+                             &pdev_info, NUMA_NO_NODE);
         break;
     }
 
     case PHYSDEVOP_pci_device_add: {
         struct physdev_pci_device_add add;
         struct pci_dev_info pdev_info;
+        int node;
 
         ret = -EFAULT;
         if ( copy_from_guest(&add, arg, 1) != 0 )
@@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         }
         else
             pdev_info.is_virtfn = 0;
-        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
+
+        if ( add.flags & XEN_PCI_DEV_PXM )
+        {
+            uint32_t pxm;
+            int optarr_off = offsetof(struct physdev_pci_device_add, optarr) /
+                             sizeof(add.optarr[0]);
+
+            if ( copy_from_guest_offset(&pxm, arg, optarr_off, 1) )
+                break;
+
+            node = pxm_to_node(pxm);
+        }
+        else
+            node = NUMA_NO_NODE;
+
+        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info, node);
         break;
     }
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 78c6977..3002129 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -568,7 +568,8 @@ static void pci_enable_acs(struct pci_dev *pdev)
     pci_conf_write16(seg, bus, dev, func, pos + PCI_ACS_CTRL, ctrl);
 }
 
-int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
+int pci_add_device(u16 seg, u8 bus, u8 devfn,
+                   const struct pci_dev_info *info, int node)
 {
     struct pci_seg *pseg;
     struct pci_dev *pdev;
@@ -586,7 +587,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
         pdev = pci_get_pdev(seg, info->physfn.bus, info->physfn.devfn);
         spin_unlock(&pcidevs_lock);
         if ( !pdev )
-            pci_add_device(seg, info->physfn.bus, info->physfn.devfn, NULL);
+            pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
+                           NULL, node);
         pdev_type = "virtual function";
     }
     else
@@ -609,6 +611,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info)
     if ( !pdev )
         goto out;
 
+    pdev->node = node;
+
     if ( info )
         pdev->info = *info;
     else if ( !pdev->vf_rlen[0] )
@@ -1191,10 +1195,11 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg)
 
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
     {
-        printk("%04x:%02x:%02x.%u - dom %-3d - MSIs < ",
+        printk("%04x:%02x:%02x.%u - dom %-3d - node %-3d - MSIs < ",
                pseg->nr, pdev->bus,
                PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
-               pdev->domain ? pdev->domain->domain_id : -1);
+               pdev->domain ? pdev->domain->domain_id : -1,
+               (pdev->node != NUMA_NO_NODE) ? pdev->node : -1);
         list_for_each_entry ( msi, &pdev->msi_list, list )
                printk("%d ", msi->irq);
         printk(">\n");
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index d547928..309346b 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -293,6 +293,12 @@ struct physdev_pci_device_add {
         uint8_t bus;
         uint8_t devfn;
     } physfn;
+
+    /*
+     * Optional parameters array.
+     * First element ([0]) is PXM domain associated with the device (if
+     * XEN_PCI_DEV_PXM is set)
+     */
 #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
     uint32_t optarr[];
 #elif defined(__GNUC__)
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 5f295f3..a5eb81c 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -56,6 +56,8 @@ struct pci_dev {
 
     u8 phantom_stride;
 
+    int node; /* NUMA node */
+
     enum pdev_type {
         DEV_TYPE_PCI_UNKNOWN,
         DEV_TYPE_PCIe_ENDPOINT,
@@ -102,7 +104,8 @@ void setup_hwdom_pci_devices(struct domain *,
 int pci_release_devices(struct domain *d);
 int pci_add_segment(u16 seg);
 const unsigned long *pci_get_ro_map(u16 seg);
-int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *);
+int pci_add_device(u16 seg, u8 bus, u8 devfn,
+                   const struct pci_dev_info *, int);
 int pci_remove_device(u16 seg, u8 bus, u8 devfn);
 int pci_ro_device(int seg, int bus, int devfn);
 void arch_pci_ro_device(int seg, int bdf);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-06  2:18 [PATCH v2 0/4] Display IO topology when PXM data is available Boris Ostrovsky
  2015-01-06  2:18 ` [PATCH v2 1/4] pci: Do not ignore device's PXM information Boris Ostrovsky
@ 2015-01-06  2:18 ` Boris Ostrovsky
  2015-01-06 13:41   ` Andrew Cooper
  2015-01-19 17:26   ` Ian Campbell
  2015-01-06  2:18 ` [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
  2015-01-06  2:18 ` [PATCH v2 4/4] libxl: Add interface for querying hypervisor about " Boris Ostrovsky
  3 siblings, 2 replies; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-06  2:18 UTC (permalink / raw)
  To: jbeulich, keir, ian.jackson, stefano.stabellini, ian.campbell, wei.liu2
  Cc: andrew.cooper3, dario.faggioli, boris.ostrovsky, ufimtseva, xen-devel

Instead of copying data for each field in xen_sysctl_topologyinfo separately
put cpu/socket/node into a single structure and do a single copy for each
processor.

There is also no need to copy whole op to user at the end, max_cpu_index is
sufficient

Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
that these are used for CPU topology. Subsequent patch will add support for
PCI topology sysctl.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 tools/libxc/include/xenctrl.h     |    4 +-
 tools/libxc/xc_misc.c             |   10 +++---
 tools/libxl/libxl.c               |   52 ++++++++++-----------------
 tools/misc/xenpm.c                |   69 ++++++++++++++----------------------
 tools/python/xen/lowlevel/xc/xc.c |   40 +++++++--------------
 xen/common/sysctl.c               |   47 +++++++++++++++----------
 xen/include/public/sysctl.h       |   40 ++++++++++++----------
 7 files changed, 117 insertions(+), 145 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 0ad8b8d..0cb6743 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1226,7 +1226,7 @@ int xc_readconsolering(xc_interface *xch,
 int xc_send_debug_keys(xc_interface *xch, char *keys);
 
 typedef xen_sysctl_physinfo_t xc_physinfo_t;
-typedef xen_sysctl_topologyinfo_t xc_topologyinfo_t;
+typedef xen_sysctl_cputopoinfo_t xc_cputopoinfo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
@@ -1237,7 +1237,7 @@ typedef uint64_t xc_node_to_memfree_t;
 typedef uint32_t xc_node_to_node_dist_t;
 
 int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
-int xc_topologyinfo(xc_interface *xch, xc_topologyinfo_t *info);
+int xc_cputopoinfo(xc_interface *xch, xc_cputopoinfo_t *info);
 int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
 
 int xc_sched_id(xc_interface *xch,
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index e253a58..be68291 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -177,20 +177,20 @@ int xc_physinfo(xc_interface *xch,
     return 0;
 }
 
-int xc_topologyinfo(xc_interface *xch,
-                xc_topologyinfo_t *put_info)
+int xc_cputopoinfo(xc_interface *xch,
+                   xc_cputopoinfo_t *put_info)
 {
     int ret;
     DECLARE_SYSCTL;
 
-    sysctl.cmd = XEN_SYSCTL_topologyinfo;
+    sysctl.cmd = XEN_SYSCTL_cputopoinfo;
 
-    memcpy(&sysctl.u.topologyinfo, put_info, sizeof(*put_info));
+    memcpy(&sysctl.u.cputopoinfo, put_info, sizeof(*put_info));
 
     if ( (ret = do_sysctl(xch, &sysctl)) != 0 )
         return ret;
 
-    memcpy(put_info, &sysctl.u.topologyinfo, sizeof(*put_info));
+    memcpy(put_info, &sysctl.u.cputopoinfo, sizeof(*put_info));
 
     return 0;
 }
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 50a8928..cd87614 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5072,10 +5072,8 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo *physinfo)
 libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
 {
     GC_INIT(ctx);
-    xc_topologyinfo_t tinfo;
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_core_t, coremap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_socket_t, socketmap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_node_t, nodemap);
+    xc_cputopoinfo_t tinfo;
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
     libxl_cputopology *ret = NULL;
     int i;
     int max_cpus;
@@ -5088,48 +5086,36 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
         goto out;
     }
 
-    coremap = xc_hypercall_buffer_alloc
-        (ctx->xch, coremap, sizeof(*coremap) * max_cpus);
-    socketmap = xc_hypercall_buffer_alloc
-        (ctx->xch, socketmap, sizeof(*socketmap) * max_cpus);
-    nodemap = xc_hypercall_buffer_alloc
-        (ctx->xch, nodemap, sizeof(*nodemap) * max_cpus);
-    if ((coremap == NULL) || (socketmap == NULL) || (nodemap == NULL)) {
+    cputopo = xc_hypercall_buffer_alloc(ctx->xch, cputopo,
+                                        sizeof(*cputopo) * max_cpus);
+    if (cputopo == NULL) {
         LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
                             "Unable to allocate hypercall arguments");
         goto fail;
     }
-
-    set_xen_guest_handle(tinfo.cpu_to_core, coremap);
-    set_xen_guest_handle(tinfo.cpu_to_socket, socketmap);
-    set_xen_guest_handle(tinfo.cpu_to_node, nodemap);
+    set_xen_guest_handle(tinfo.cputopo, cputopo);
     tinfo.max_cpu_index = max_cpus - 1;
-    if (xc_topologyinfo(ctx->xch, &tinfo) != 0) {
-        LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "Topology info hypercall failed");
+
+    if (xc_cputopoinfo(ctx->xch, &tinfo) != 0) {
+        LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "CPU topology info hypercall failed");
         goto fail;
     }
 
-    if (tinfo.max_cpu_index < max_cpus - 1)
-        max_cpus = tinfo.max_cpu_index + 1;
+    *nb_cpu_out = tinfo.max_cpu_index + 1;
+    ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * *nb_cpu_out);
 
-    ret = libxl__zalloc(NOGC, sizeof(libxl_cputopology) * max_cpus);
-
-    for (i = 0; i < max_cpus; i++) {
-#define V(map, i) (map[i] == INVALID_TOPOLOGY_ID) ? \
-    LIBXL_CPUTOPOLOGY_INVALID_ENTRY : map[i]
-        ret[i].core = V(coremap, i);
-        ret[i].socket = V(socketmap, i);
-        ret[i].node = V(nodemap, i);
+    for (i = 0; i < *nb_cpu_out; i++) {
+#define V(map, i) ( cputopo[i].map == INVALID_TOPOLOGY_ID) ? \
+    LIBXL_CPUTOPOLOGY_INVALID_ENTRY :  cputopo[i].map
+        ret[i].core = V(core, i);
+        ret[i].socket = V(socket, i);
+        ret[i].node = V(node, i);
 #undef V
     }
 
-fail:
-    xc_hypercall_buffer_free(ctx->xch, coremap);
-    xc_hypercall_buffer_free(ctx->xch, socketmap);
-    xc_hypercall_buffer_free(ctx->xch, nodemap);
+ fail:
+    xc_hypercall_buffer_free(ctx->xch, cputopo);
 
-    if (ret)
-        *nb_cpu_out = max_cpus;
  out:
     GC_FREE;
     return ret;
diff --git a/tools/misc/xenpm.c b/tools/misc/xenpm.c
index e43924c..13bfe7d 100644
--- a/tools/misc/xenpm.c
+++ b/tools/misc/xenpm.c
@@ -355,16 +355,13 @@ static void signal_int_handler(int signo)
     int i, j, k;
     struct timeval tv;
     int cx_cap = 0, px_cap = 0;
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_core);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_socket);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_node);
-    xc_topologyinfo_t info = { 0 };
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
+    xc_cputopoinfo_t info = { 0 };
 
-    cpu_to_core = xc_hypercall_buffer_alloc(xc_handle, cpu_to_core, sizeof(*cpu_to_core) * MAX_NR_CPU);
-    cpu_to_socket = xc_hypercall_buffer_alloc(xc_handle, cpu_to_socket, sizeof(*cpu_to_socket) * MAX_NR_CPU);
-    cpu_to_node = xc_hypercall_buffer_alloc(xc_handle, cpu_to_node, sizeof(*cpu_to_node) * MAX_NR_CPU);
+    cputopo = xc_hypercall_buffer_alloc(xc_handle, cputopo,
+                                        sizeof(*cputopo) * MAX_NR_CPU);
 
-    if ( cpu_to_core == NULL || cpu_to_socket == NULL || cpu_to_node == NULL )
+    if ( cputopo == NULL )
     {
 	fprintf(stderr, "failed to allocate hypercall buffers\n");
 	goto out;
@@ -448,12 +445,10 @@ static void signal_int_handler(int signo)
             printf("  Avg freq\t%d\tKHz\n", avgfreq[i]);
     }
 
-    set_xen_guest_handle(info.cpu_to_core, cpu_to_core);
-    set_xen_guest_handle(info.cpu_to_socket, cpu_to_socket);
-    set_xen_guest_handle(info.cpu_to_node, cpu_to_node);
+    set_xen_guest_handle(info.cputopo, cputopo);
     info.max_cpu_index = MAX_NR_CPU - 1;
 
-    if ( cx_cap && !xc_topologyinfo(xc_handle, &info) )
+    if ( cx_cap && !xc_cputopoinfo(xc_handle, &info) )
     {
         uint32_t socket_ids[MAX_NR_CPU];
         uint32_t core_ids[MAX_NR_CPU];
@@ -465,8 +460,8 @@ static void signal_int_handler(int signo)
         /* check validity */
         for ( i = 0; i <= info.max_cpu_index; i++ )
         {
-            if ( cpu_to_core[i] == INVALID_TOPOLOGY_ID ||
-                 cpu_to_socket[i] == INVALID_TOPOLOGY_ID )
+            if ( cputopo[i].core == INVALID_TOPOLOGY_ID ||
+                 cputopo[i].socket == INVALID_TOPOLOGY_ID )
                 break;
         }
         if ( i > info.max_cpu_index )
@@ -475,20 +470,20 @@ static void signal_int_handler(int signo)
             for ( i = 0; i <= info.max_cpu_index; i++ )
             {
                 for ( j = 0; j < socket_nr; j++ )
-                    if ( cpu_to_socket[i] == socket_ids[j] )
+                    if ( cputopo[i].socket == socket_ids[j] )
                         break;
                 if ( j == socket_nr )
                 {
-                    socket_ids[j] = cpu_to_socket[i];
+                    socket_ids[j] = cputopo[i].socket;
                     socket_nr++;
                 }
 
                 for ( j = 0; j < core_nr; j++ )
-                    if ( cpu_to_core[i] == core_ids[j] )
+                    if ( cputopo[i].core == core_ids[j] )
                         break;
                 if ( j == core_nr )
                 {
-                    core_ids[j] = cpu_to_core[i];
+                    core_ids[j] = cputopo[i].core;
                     core_nr++;
                 }
             }
@@ -501,7 +496,7 @@ static void signal_int_handler(int signo)
 
                 for ( j = 0; j <= info.max_cpu_index; j++ )
                 {
-                    if ( cpu_to_socket[j] == socket_ids[i] )
+                    if ( cputopo[j].socket == socket_ids[i] )
                         break;
                 }
                 printf("\nSocket %d\n", socket_ids[i]);
@@ -520,8 +515,8 @@ static void signal_int_handler(int signo)
                 {
                     for ( j = 0; j <= info.max_cpu_index; j++ )
                     {
-                        if ( cpu_to_socket[j] == socket_ids[i] &&
-                             cpu_to_core[j] == core_ids[k] )
+                        if ( cputopo[j].socket == socket_ids[i] &&
+                             cputopo[j].core == core_ids[k] )
                             break;
                     }
                     printf("\t Core %d CPU %d\n", core_ids[k], j);
@@ -556,9 +551,7 @@ static void signal_int_handler(int signo)
     free(sum);
     free(avgfreq);
 out:
-    xc_hypercall_buffer_free(xc_handle, cpu_to_core);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_socket);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_node);
+    xc_hypercall_buffer_free(xc_handle, cputopo);
     xc_interface_close(xc_handle);
     exit(0);
 }
@@ -965,28 +958,22 @@ void scaling_governor_func(int argc, char *argv[])
 
 void cpu_topology_func(int argc, char *argv[])
 {
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_core);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_socket);
-    DECLARE_HYPERCALL_BUFFER(uint32_t, cpu_to_node);
-    xc_topologyinfo_t info = { 0 };
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
+    xc_cputopoinfo_t info = { 0 };
     int i, rc = ENOMEM;
 
-    cpu_to_core = xc_hypercall_buffer_alloc(xc_handle, cpu_to_core, sizeof(*cpu_to_core) * MAX_NR_CPU);
-    cpu_to_socket = xc_hypercall_buffer_alloc(xc_handle, cpu_to_socket, sizeof(*cpu_to_socket) * MAX_NR_CPU);
-    cpu_to_node = xc_hypercall_buffer_alloc(xc_handle, cpu_to_node, sizeof(*cpu_to_node) * MAX_NR_CPU);
-
-    if ( cpu_to_core == NULL || cpu_to_socket == NULL || cpu_to_node == NULL )
+    cputopo = xc_hypercall_buffer_alloc(xc_handle, cputopo,
+                                        sizeof(*cputopo) * MAX_NR_CPU);
+    if ( cputopo == NULL )
     {
 	fprintf(stderr, "failed to allocate hypercall buffers\n");
 	goto out;
     }
 
-    set_xen_guest_handle(info.cpu_to_core, cpu_to_core);
-    set_xen_guest_handle(info.cpu_to_socket, cpu_to_socket);
-    set_xen_guest_handle(info.cpu_to_node, cpu_to_node);
+    set_xen_guest_handle(info.cputopo, cputopo);
     info.max_cpu_index = MAX_NR_CPU-1;
 
-    if ( xc_topologyinfo(xc_handle, &info) )
+    if ( xc_cputopoinfo(xc_handle, &info) )
     {
         rc = errno;
         fprintf(stderr, "Cannot get Xen CPU topology (%d - %s)\n",
@@ -1000,16 +987,14 @@ void cpu_topology_func(int argc, char *argv[])
     printf("CPU\tcore\tsocket\tnode\n");
     for ( i = 0; i <= info.max_cpu_index; i++ )
     {
-        if ( cpu_to_core[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].core == INVALID_TOPOLOGY_ID )
             continue;
         printf("CPU%d\t %d\t %d\t %d\n",
-               i, cpu_to_core[i], cpu_to_socket[i], cpu_to_node[i]);
+               i, cputopo[i].core, cputopo[i].socket, cputopo[i].node);
     }
     rc = 0;
 out:
-    xc_hypercall_buffer_free(xc_handle, cpu_to_core);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_socket);
-    xc_hypercall_buffer_free(xc_handle, cpu_to_node);
+    xc_hypercall_buffer_free(xc_handle, cputopo);
     if ( rc )
         exit(rc);
 }
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index f83e33d..6f8441d 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1221,30 +1221,20 @@ static PyObject *pyxc_getcpuinfo(XcObject *self, PyObject *args, PyObject *kwds)
 static PyObject *pyxc_topologyinfo(XcObject *self)
 {
 #define MAX_CPU_INDEX 255
-    xc_topologyinfo_t tinfo = { 0 };
+    xc_cputopoinfo_t tinfo = { 0 };
     int i, max_cpu_index;
     PyObject *ret_obj = NULL;
     PyObject *cpu_to_core_obj, *cpu_to_socket_obj, *cpu_to_node_obj;
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_core_t, coremap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_socket_t, socketmap);
-    DECLARE_HYPERCALL_BUFFER(xc_cpu_to_node_t, nodemap);
 
-    coremap = xc_hypercall_buffer_alloc(self->xc_handle, coremap, sizeof(*coremap) * (MAX_CPU_INDEX+1));
-    if ( coremap == NULL )
-        goto out;
-    socketmap = xc_hypercall_buffer_alloc(self->xc_handle, socketmap, sizeof(*socketmap) * (MAX_CPU_INDEX+1));
-    if ( socketmap == NULL  )
-        goto out;
-    nodemap = xc_hypercall_buffer_alloc(self->xc_handle, nodemap, sizeof(*nodemap) * (MAX_CPU_INDEX+1));
-    if ( nodemap == NULL )
-        goto out;
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_cputopo_t, cputopo);
 
-    set_xen_guest_handle(tinfo.cpu_to_core, coremap);
-    set_xen_guest_handle(tinfo.cpu_to_socket, socketmap);
-    set_xen_guest_handle(tinfo.cpu_to_node, nodemap);
+    cputopo = xc_hypercall_buffer_alloc(self->xc_handle, cputopo, sizeof(*cputopo) * (MAX_CPU_INDEX+1));
+    if ( cputopo == NULL )
+	goto out;
+    set_xen_guest_handle(tinfo.cputopo, cputopo);
     tinfo.max_cpu_index = MAX_CPU_INDEX;
 
-    if ( xc_topologyinfo(self->xc_handle, &tinfo) != 0 )
+    if ( xc_cputopoinfo(self->xc_handle, &tinfo) != 0 )
         goto out;
 
     max_cpu_index = tinfo.max_cpu_index;
@@ -1257,35 +1247,35 @@ static PyObject *pyxc_topologyinfo(XcObject *self)
     cpu_to_node_obj = PyList_New(0);
     for ( i = 0; i <= max_cpu_index; i++ )
     {
-        if ( coremap[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].core == INVALID_TOPOLOGY_ID )
         {
             PyList_Append(cpu_to_core_obj, Py_None);
         }
         else
         {
-            PyObject *pyint = PyInt_FromLong(coremap[i]);
+            PyObject *pyint = PyInt_FromLong(cputopo[i].core);
             PyList_Append(cpu_to_core_obj, pyint);
             Py_DECREF(pyint);
         }
 
-        if ( socketmap[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].socket == INVALID_TOPOLOGY_ID )
         {
             PyList_Append(cpu_to_socket_obj, Py_None);
         }
         else
         {
-            PyObject *pyint = PyInt_FromLong(socketmap[i]);
+            PyObject *pyint = PyInt_FromLong(cputopo[i].socket);
             PyList_Append(cpu_to_socket_obj, pyint);
             Py_DECREF(pyint);
         }
 
-        if ( nodemap[i] == INVALID_TOPOLOGY_ID )
+        if ( cputopo[i].node == INVALID_TOPOLOGY_ID )
         {
             PyList_Append(cpu_to_node_obj, Py_None);
         }
         else
         {
-            PyObject *pyint = PyInt_FromLong(nodemap[i]);
+            PyObject *pyint = PyInt_FromLong(cputopo[i].node);
             PyList_Append(cpu_to_node_obj, pyint);
             Py_DECREF(pyint);
         }
@@ -1303,9 +1293,7 @@ static PyObject *pyxc_topologyinfo(XcObject *self)
     Py_DECREF(cpu_to_node_obj);
 
 out:
-    xc_hypercall_buffer_free(self->xc_handle, coremap);
-    xc_hypercall_buffer_free(self->xc_handle, socketmap);
-    xc_hypercall_buffer_free(self->xc_handle, nodemap);
+    xc_hypercall_buffer_free(self->xc_handle, cputopo);
     return ret_obj ? ret_obj : pyxc_error_to_exception(self->xc_handle);
 #undef MAX_CPU_INDEX
 }
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 0cb6ee1..1254a24 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -320,39 +320,48 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
     }
     break;
 
-    case XEN_SYSCTL_topologyinfo:
+    case XEN_SYSCTL_cputopoinfo:
     {
         uint32_t i, max_cpu_index, last_online_cpu;
-        xen_sysctl_topologyinfo_t *ti = &op->u.topologyinfo;
+        xen_sysctl_cputopoinfo_t *ti = &op->u.cputopoinfo;
+
+        if ( guest_handle_is_null(ti->cputopo) )
+        {
+            ret = -EINVAL;
+            break;
+        }
 
         last_online_cpu = cpumask_last(&cpu_online_map);
         max_cpu_index = min_t(uint32_t, ti->max_cpu_index, last_online_cpu);
-        ti->max_cpu_index = last_online_cpu;
 
         for ( i = 0; i <= max_cpu_index; i++ )
         {
-            if ( !guest_handle_is_null(ti->cpu_to_core) )
-            {
-                uint32_t core = cpu_online(i) ? cpu_to_core(i) : ~0u;
-                if ( copy_to_guest_offset(ti->cpu_to_core, i, &core, 1) )
-                    break;
-            }
-            if ( !guest_handle_is_null(ti->cpu_to_socket) )
+            xen_sysctl_cputopo_t cputopo;
+
+            if ( cpu_online(i) )
             {
-                uint32_t socket = cpu_online(i) ? cpu_to_socket(i) : ~0u;
-                if ( copy_to_guest_offset(ti->cpu_to_socket, i, &socket, 1) )
-                    break;
+                cputopo.core = cpu_to_core(i);
+                cputopo.socket = cpu_to_socket(i);
+                cputopo.node = cpu_to_node(i);
             }
-            if ( !guest_handle_is_null(ti->cpu_to_node) )
+            else
+                cputopo.core = cputopo.socket =
+                    cputopo.node = INVALID_TOPOLOGY_ID;
+
+            if ( copy_to_guest_offset(ti->cputopo, i, &cputopo, 1) )
             {
-                uint32_t node = cpu_online(i) ? cpu_to_node(i) : ~0u;
-                if ( copy_to_guest_offset(ti->cpu_to_node, i, &node, 1) )
-                    break;
+                ret = -EFAULT;
+                break;
             }
         }
 
-        ret = ((i <= max_cpu_index) || copy_to_guest(u_sysctl, op, 1))
-            ? -EFAULT : 0;
+        if ( !ret && (ti->max_cpu_index != last_online_cpu) )
+        {
+             ti->max_cpu_index = last_online_cpu;
+             if ( __copy_field_to_guest(u_sysctl, op,
+                                        u.cputopoinfo.max_cpu_index) )
+                ret = -EFAULT;
+        }
     }
     break;
 
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index b3713b3..512ad74 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -34,7 +34,7 @@
 #include "xen.h"
 #include "domctl.h"
 
-#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000B
+#define XEN_SYSCTL_INTERFACE_VERSION 0x0000000C
 
 /*
  * Read console content from Xen buffer ring.
@@ -462,31 +462,35 @@ struct xen_sysctl_lockprof_op {
 typedef struct xen_sysctl_lockprof_op xen_sysctl_lockprof_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_lockprof_op_t);
 
-/* XEN_SYSCTL_topologyinfo */
+/* XEN_SYSCTL_cputopoinfo */
 #define INVALID_TOPOLOGY_ID  (~0U)
-struct xen_sysctl_topologyinfo {
+
+struct xen_sysctl_cputopo {
+    uint32_t core;
+    uint32_t socket;
+    uint32_t node;
+};
+typedef struct xen_sysctl_cputopo xen_sysctl_cputopo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopo_t);
+
+struct xen_sysctl_cputopoinfo {
     /*
-     * IN: maximum addressable entry in the caller-provided arrays.
+     * IN: maximum addressable entry in the caller-provided cputopo arary
      * OUT: largest cpu identifier in the system.
-     * If OUT is greater than IN then the arrays are truncated!
-     * If OUT is leass than IN then the array tails are not written by sysctl.
      */
     uint32_t max_cpu_index;
 
     /*
-     * If not NULL, these arrays are filled with core/socket/node identifier
-     * for each cpu.
-     * If a cpu has no core/socket/node information (e.g., cpu not present) 
-     * then the sentinel value ~0u is written to each array.
-     * The number of array elements written by the sysctl is:
+     * If not NULL, filled with core/socket/node identifier for each cpu
+     * If information for a particular entry is not avalable it is set to
+     * INVALID_TOPOLOGY_ID.
+     * The number of array elements for CPU topology written by the sysctl is:
      *   min(@max_cpu_index_IN,@max_cpu_index_OUT)+1
      */
-    XEN_GUEST_HANDLE_64(uint32) cpu_to_core;
-    XEN_GUEST_HANDLE_64(uint32) cpu_to_socket;
-    XEN_GUEST_HANDLE_64(uint32) cpu_to_node;
+    XEN_GUEST_HANDLE_64(xen_sysctl_cputopo_t) cputopo;
 };
-typedef struct xen_sysctl_topologyinfo xen_sysctl_topologyinfo_t;
-DEFINE_XEN_GUEST_HANDLE(xen_sysctl_topologyinfo_t);
+typedef struct xen_sysctl_cputopoinfo xen_sysctl_cputopoinfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopoinfo_t);
 
 /* XEN_SYSCTL_numainfo */
 #define INVALID_NUMAINFO_ID (~0U)
@@ -671,7 +675,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_pm_op                         12
 #define XEN_SYSCTL_page_offline_op               14
 #define XEN_SYSCTL_lockprof_op                   15
-#define XEN_SYSCTL_topologyinfo                  16 
+#define XEN_SYSCTL_cputopoinfo                   16
 #define XEN_SYSCTL_numainfo                      17
 #define XEN_SYSCTL_cpupool_op                    18
 #define XEN_SYSCTL_scheduler_op                  19
@@ -682,7 +686,7 @@ struct xen_sysctl {
         struct xen_sysctl_readconsole       readconsole;
         struct xen_sysctl_tbuf_op           tbuf_op;
         struct xen_sysctl_physinfo          physinfo;
-        struct xen_sysctl_topologyinfo      topologyinfo;
+        struct xen_sysctl_cputopoinfo       cputopoinfo;
         struct xen_sysctl_numainfo          numainfo;
         struct xen_sysctl_sched_id          sched_id;
         struct xen_sysctl_perfc_op          perfc_op;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
  2015-01-06  2:18 [PATCH v2 0/4] Display IO topology when PXM data is available Boris Ostrovsky
  2015-01-06  2:18 ` [PATCH v2 1/4] pci: Do not ignore device's PXM information Boris Ostrovsky
  2015-01-06  2:18 ` [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient Boris Ostrovsky
@ 2015-01-06  2:18 ` Boris Ostrovsky
  2015-01-06 16:55   ` Wei Liu
  2015-01-07  9:21   ` Jan Beulich
  2015-01-06  2:18 ` [PATCH v2 4/4] libxl: Add interface for querying hypervisor about " Boris Ostrovsky
  3 siblings, 2 replies; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-06  2:18 UTC (permalink / raw)
  To: jbeulich, keir, ian.jackson, stefano.stabellini, ian.campbell, wei.liu2
  Cc: andrew.cooper3, dario.faggioli, boris.ostrovsky, ufimtseva, xen-devel

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/common/sysctl.c         |   60 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h |   35 ++++++++++++++++++++++++-
 2 files changed, 94 insertions(+), 1 deletions(-)

diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 1254a24..f09d377 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -365,6 +365,66 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
     }
     break;
 
+    case XEN_SYSCTL_pcitopoinfo:
+    {
+        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
+
+        if ( guest_handle_is_null(ti->pcitopo) ||
+             (ti->first_dev >= ti->num_devs) )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        for ( ; ti->first_dev < ti->num_devs; ti->first_dev++ )
+        {
+            xen_sysctl_pcitopo_t pcitopo;
+            struct pci_dev *pdev;
+
+            if ( copy_from_guest_offset(&pcitopo, ti->pcitopo,
+                                        ti->first_dev, 1) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            spin_lock(&pcidevs_lock);
+            pdev = pci_get_pdev(pcitopo.pcidev.seg, pcitopo.pcidev.bus,
+                                pcitopo.pcidev.devfn);
+            if ( !pdev || (pdev->node == NUMA_NO_NODE) )
+                pcitopo.node = INVALID_TOPOLOGY_ID;
+            else
+                pcitopo.node = pdev->node;
+            spin_unlock(&pcidevs_lock);
+
+            if ( copy_to_guest_offset(ti->pcitopo, ti->first_dev,
+                                      &pcitopo, 1) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            if ( hypercall_preempt_check() )
+                break;
+        }
+
+        if ( !ret )
+        {
+            if ( __copy_field_to_guest(u_sysctl, op,
+                                       u.pcitopoinfo.first_dev) )
+            {
+                ret = -EFAULT;
+                break;
+            }
+
+            if ( ti->first_dev < ti->num_devs )
+                ret = hypercall_create_continuation(__HYPERVISOR_sysctl,
+                                                   "h", u_sysctl);
+
+        }
+    }
+    break;
+
 #ifdef TEST_COVERAGE
     case XEN_SYSCTL_coverage_op:
         ret = sysctl_coverage_op(&op->u.coverage_op);
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 512ad74..628ed6a 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -33,6 +33,7 @@
 
 #include "xen.h"
 #include "domctl.h"
+#include "physdev.h"
 
 #define XEN_SYSCTL_INTERFACE_VERSION 0x0000000C
 
@@ -463,7 +464,7 @@ typedef struct xen_sysctl_lockprof_op xen_sysctl_lockprof_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_lockprof_op_t);
 
 /* XEN_SYSCTL_cputopoinfo */
-#define INVALID_TOPOLOGY_ID  (~0U)
+#define INVALID_TOPOLOGY_ID  (~0U) /* Also used by pcitopo */
 
 struct xen_sysctl_cputopo {
     uint32_t core;
@@ -492,6 +493,36 @@ struct xen_sysctl_cputopoinfo {
 typedef struct xen_sysctl_cputopoinfo xen_sysctl_cputopoinfo_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopoinfo_t);
 
+/* XEN_SYSCTL_pcitopoinfo */
+struct xen_sysctl_pcitopo {
+    struct physdev_pci_device pcidev;
+    uint32_t node;
+};
+typedef struct xen_sysctl_pcitopo xen_sysctl_pcitopo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopo_t);
+
+struct xen_sysctl_pcitopoinfo {
+    /* IN: Size of pcitopo array */
+    uint32_t num_devs;
+
+    /*
+     * IN/OUT: First element of pcitopo array that needs to be processed by
+     * hypervisor.
+     * This is used primarily by hypercall continuations and callers will
+     * typically set it to zero
+     */
+    uint32_t first_dev;
+
+    /*
+     * If not NULL, filled with node identifier for each pcidev
+     * If information for a particular device is not avalable then node is set
+     * to INVALID_TOPOLOGY_ID.
+     */
+    XEN_GUEST_HANDLE_64(xen_sysctl_pcitopo_t) pcitopo;
+};
+typedef struct xen_sysctl_pcitopoinfo xen_sysctl_pcitopoinfo_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopoinfo_t);
+
 /* XEN_SYSCTL_numainfo */
 #define INVALID_NUMAINFO_ID (~0U)
 struct xen_sysctl_numainfo {
@@ -681,12 +712,14 @@ struct xen_sysctl {
 #define XEN_SYSCTL_scheduler_op                  19
 #define XEN_SYSCTL_coverage_op                   20
 #define XEN_SYSCTL_psr_cmt_op                    21
+#define XEN_SYSCTL_pcitopoinfo                   22
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
         struct xen_sysctl_tbuf_op           tbuf_op;
         struct xen_sysctl_physinfo          physinfo;
         struct xen_sysctl_cputopoinfo       cputopoinfo;
+        struct xen_sysctl_pcitopoinfo       pcitopoinfo;
         struct xen_sysctl_numainfo          numainfo;
         struct xen_sysctl_sched_id          sched_id;
         struct xen_sysctl_perfc_op          perfc_op;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
  2015-01-06  2:18 [PATCH v2 0/4] Display IO topology when PXM data is available Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2015-01-06  2:18 ` [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
@ 2015-01-06  2:18 ` Boris Ostrovsky
  2015-01-06 17:08   ` Wei Liu
                     ` (2 more replies)
  3 siblings, 3 replies; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-06  2:18 UTC (permalink / raw)
  To: jbeulich, keir, ian.jackson, stefano.stabellini, ian.campbell, wei.liu2
  Cc: andrew.cooper3, dario.faggioli, boris.ostrovsky, ufimtseva, xen-devel

.. and use this new interface to display it along with CPU topology
and NUMA information when 'xl info -n' command is issued

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 tools/libxc/include/xenctrl.h |    2 +
 tools/libxc/xc_misc.c         |   18 ++++++++++
 tools/libxl/libxl.c           |   58 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl.h           |    4 ++
 tools/libxl/libxl_freebsd.c   |   12 +++++++
 tools/libxl/libxl_internal.h  |    5 +++
 tools/libxl/libxl_linux.c     |   71 +++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_netbsd.c    |   12 +++++++
 tools/libxl/libxl_types.idl   |    7 ++++
 tools/libxl/libxl_utils.c     |    8 +++++
 tools/libxl/xl_cmdimpl.c      |   39 ++++++++++++++++++----
 11 files changed, 229 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 0cb6743..3c5824a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1227,6 +1227,7 @@ int xc_send_debug_keys(xc_interface *xch, char *keys);
 
 typedef xen_sysctl_physinfo_t xc_physinfo_t;
 typedef xen_sysctl_cputopoinfo_t xc_cputopoinfo_t;
+typedef xen_sysctl_pcitopoinfo_t xc_pcitopoinfo_t;
 typedef xen_sysctl_numainfo_t xc_numainfo_t;
 
 typedef uint32_t xc_cpu_to_node_t;
@@ -1238,6 +1239,7 @@ typedef uint32_t xc_node_to_node_dist_t;
 
 int xc_physinfo(xc_interface *xch, xc_physinfo_t *info);
 int xc_cputopoinfo(xc_interface *xch, xc_cputopoinfo_t *info);
+int xc_pcitopoinfo(xc_interface *xch, xc_pcitopoinfo_t *info);
 int xc_numainfo(xc_interface *xch, xc_numainfo_t *info);
 
 int xc_sched_id(xc_interface *xch,
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index be68291..e33a312 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -195,6 +195,24 @@ int xc_cputopoinfo(xc_interface *xch,
     return 0;
 }
 
+int xc_pcitopoinfo(xc_interface *xch,
+                   xc_pcitopoinfo_t *put_info)
+{
+    int ret;
+    DECLARE_SYSCTL;
+
+    sysctl.cmd = XEN_SYSCTL_pcitopoinfo;
+
+    memcpy(&sysctl.u.pcitopoinfo, put_info, sizeof(*put_info));
+
+    if ( (ret = do_sysctl(xch, &sysctl)) != 0 )
+        return ret;
+
+    memcpy(put_info, &sysctl.u.pcitopoinfo, sizeof(*put_info));
+
+    return 0;
+}
+
 int xc_numainfo(xc_interface *xch,
                 xc_numainfo_t *put_info)
 {
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index cd87614..888f068 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5121,6 +5121,64 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
     return ret;
 }
 
+libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs)
+{
+    GC_INIT(ctx);
+    xc_pcitopoinfo_t tinfo;
+    DECLARE_HYPERCALL_BUFFER(xen_sysctl_pcitopo_t, pcitopo);
+    libxl_pcitopology *ret = NULL;
+    int i, rc;
+
+    tinfo.num_devs = libxl__pci_numdevs(gc);
+    if (tinfo.num_devs <= 0) {
+        LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of PCI devices");
+        goto out;
+    }
+
+    pcitopo = xc_hypercall_buffer_alloc(ctx->xch, pcitopo,
+                                        sizeof(*pcitopo) * tinfo.num_devs);
+    if (pcitopo == NULL) {
+        LIBXL__LOG_ERRNOVAL(ctx, XTL_ERROR, ENOMEM,
+                            "Unable to allocate hypercall arguments");
+        goto out;
+    }
+
+    rc = libxl__pci_topology_init(gc, pcitopo, tinfo.num_devs);
+    if (rc) {
+        LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "Cannot initialize PCI"
+                            " hypercall structure");
+        goto fail;
+    }
+
+    tinfo.first_dev = 0;
+
+    set_xen_guest_handle(tinfo.pcitopo, pcitopo);
+
+    if (xc_pcitopoinfo(ctx->xch, &tinfo) != 0) {
+        LIBXL__LOG_ERRNO(ctx, XTL_ERROR, "PCI topology info hypercall failed");
+        goto fail;
+    }
+
+    ret = libxl__zalloc(NOGC, sizeof(libxl_pcitopology) * tinfo.num_devs);
+
+    for (i = 0; i < tinfo.num_devs; i++) {
+        ret[i].seg = pcitopo[i].pcidev.seg;
+        ret[i].bus = pcitopo[i].pcidev.bus;
+        ret[i].devfn = pcitopo[i].pcidev.devfn;
+        ret[i].node = (pcitopo[i].node ==  INVALID_TOPOLOGY_ID) ?
+            LIBXL_PCITOPOLOGY_INVALID_ENTRY :  pcitopo[i].node;
+    }
+
+    *num_devs = tinfo.num_devs;
+
+ fail:
+    xc_hypercall_buffer_free(ctx->xch, pcitopo);
+
+ out:
+    GC_FREE;
+    return ret;
+}
+
 libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
 {
     GC_INIT(ctx);
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 0a123f1..eb83f0a 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1070,6 +1070,10 @@ void libxl_vminfo_list_free(libxl_vminfo *list, int nb_vm);
 libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out);
 void libxl_cputopology_list_free(libxl_cputopology *, int nb_cpu);
 
+#define LIBXL_PCITOPOLOGY_INVALID_ENTRY (~(uint32_t)0)
+libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_dev);
+void libxl_pcitopology_list_free(libxl_pcitopology *, int num_dev);
+
 #define LIBXL_NUMAINFO_INVALID_ENTRY (~(uint32_t)0)
 libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr);
 void libxl_numainfo_list_free(libxl_numainfo *, int nr);
diff --git a/tools/libxl/libxl_freebsd.c b/tools/libxl/libxl_freebsd.c
index e8b88b3..3cb11cd 100644
--- a/tools/libxl/libxl_freebsd.c
+++ b/tools/libxl/libxl_freebsd.c
@@ -131,3 +131,15 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
 {
     return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
 }
+
+int libxl__pci_numdevs(libxl__gc *gc)
+{
+    return ERROR_NI;
+}
+
+int libxl__pci_topology_init(libxl__gc *gc,
+                             xen_sysctl_pcitopo_t *pcitopo,
+                             int numdev)
+{
+    return ERROR_NI;
+}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 9695f18..453e652 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1168,6 +1168,11 @@ _hidden int libxl__try_phy_backend(mode_t st_mode);
 
 _hidden char *libxl__devid_to_localdev(libxl__gc *gc, int devid);
 
+_hidden int libxl__pci_numdevs(libxl__gc *gc);
+_hidden int libxl__pci_topology_init(libxl__gc *gc,
+                                     xen_sysctl_pcitopo_t *pcitopo,
+                                     int numdev);
+
 /* from libxl_pci */
 
 _hidden int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, int starting);
diff --git a/tools/libxl/libxl_linux.c b/tools/libxl/libxl_linux.c
index ea5d8c1..07428c0 100644
--- a/tools/libxl/libxl_linux.c
+++ b/tools/libxl/libxl_linux.c
@@ -279,3 +279,74 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
 {
     return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
 }
+
+/* These two routines are "inspired" by pciutils */
+int libxl__pci_numdevs(libxl__gc *gc)
+{
+    DIR *dir;
+    struct dirent *entry;
+    int numdev = 0;
+
+    dir = opendir("/sys/bus/pci/devices");
+    if (!dir) {
+        LOGEV(ERROR, errno, "Cannot open /sys/bus/pci/devices");
+        return ERROR_FAIL;
+    }
+
+    while ((entry = readdir(dir))) {
+        /* ".", ".." or a special non-device perhaps */
+        if (entry->d_name[0] == '.')
+            continue;
+        numdev++;
+    }
+    closedir(dir);
+
+    return numdev;
+}
+
+int libxl__pci_topology_init(libxl__gc *gc,
+                             xen_sysctl_pcitopo_t *pcitopo,
+                             int numdev)
+{
+
+    DIR *dir;
+    struct dirent *entry;
+    int i;
+
+    dir = opendir("/sys/bus/pci/devices");
+    if (!dir) {
+        LOGEV(ERROR, errno, "Cannot open /sys/bus/pci/devices");
+        return ERROR_FAIL;
+    }
+
+    i = 0;
+    while ((entry = readdir(dir))) {
+        unsigned int dom, bus, dev, func;
+
+        /* ".", ".." or a special non-device perhaps */
+        if (entry->d_name[0] == '.')
+            continue;
+
+        if (i == numdev) {
+            LOGE(ERROR, "Too many devices\n");
+            closedir(dir);
+            return ERROR_FAIL;
+        }
+
+        if (sscanf(entry->d_name, "%x:%x:%x.%d", &dom, &bus, &dev, &func) < 4) {
+            LOGEV(ERROR, errno, "Error processing /sys/bus/pci/devices");
+            closedir(dir);
+            return ERROR_FAIL;
+        }
+
+        pcitopo[i].pcidev.seg = dom;
+        pcitopo[i].pcidev.bus = bus;
+        pcitopo[i].pcidev.devfn = ((dev & 0x1f) << 3) | (func & 7);
+
+        i++;
+    }
+
+    closedir(dir);
+
+    return 0;
+}
diff --git a/tools/libxl/libxl_netbsd.c b/tools/libxl/libxl_netbsd.c
index 898e160..743d439 100644
--- a/tools/libxl/libxl_netbsd.c
+++ b/tools/libxl/libxl_netbsd.c
@@ -95,3 +95,15 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
 {
     return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL;
 }
+
+int libxl__pci_numdevs(libxl__gc *gc)
+{
+    return ERROR_NI;
+}
+
+int libxl__pci_topology_init(libxl__gc *gc,
+                             xen_sysctl_pcitopo_t *pcitopo,
+                             int numdev)
+{
+    return ERROR_NI;
+}
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index f7fc695..e49b5f8 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -642,6 +642,13 @@ libxl_cputopology = Struct("cputopology", [
     ("node", uint32),
     ], dir=DIR_OUT)
 
+libxl_pcitopology = Struct("pcitopology", [
+    ("seg", uint16),
+    ("bus", uint8),
+    ("devfn", uint8),
+    ("node", uint32),
+    ], dir=DIR_OUT)
+
 libxl_sched_credit_params = Struct("sched_credit_params", [
     ("tslice_ms", integer),
     ("ratelimit_us", integer),
diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index 7095b58..c31407f 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -877,6 +877,14 @@ void libxl_cputopology_list_free(libxl_cputopology *list, int nr)
     free(list);
 }
 
+void libxl_pcitopology_list_free(libxl_pcitopology *list, int nr)
+{
+    int i;
+    for (i = 0; i < nr; i++)
+        libxl_pcitopology_dispose(&list[i]);
+    free(list);
+}
+
 void libxl_numainfo_list_free(libxl_numainfo *list, int nr)
 {
     int i;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 3737c7e..f2a3320 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -5195,12 +5195,14 @@ static void output_numainfo(void)
 
 static void output_topologyinfo(void)
 {
-    libxl_cputopology *info;
+    libxl_cputopology *cpuinfo;
+    libxl_pcitopology *pciinfo;
     int i, nr;
+    int valid_devs;
 
-    info = libxl_get_cpu_topology(ctx, &nr);
-    if (info == NULL) {
-        fprintf(stderr, "libxl_get_topologyinfo failed.\n");
+    cpuinfo = libxl_get_cpu_topology(ctx, &nr);
+    if (cpuinfo == NULL) {
+        fprintf(stderr, "libxl_get_cpu_topology failed.\n");
         return;
     }
 
@@ -5208,12 +5210,35 @@ static void output_topologyinfo(void)
     printf("cpu:    core    socket     node\n");
 
     for (i = 0; i < nr; i++) {
-        if (info[i].core != LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
+        if (cpuinfo[i].core != LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
             printf("%3d:    %4d     %4d     %4d\n", i,
-                   info[i].core, info[i].socket, info[i].node);
+                   cpuinfo[i].core, cpuinfo[i].socket, cpuinfo[i].node);
+    }
+
+    libxl_cputopology_list_free(cpuinfo, nr);
+
+    pciinfo = libxl_get_pci_topology(ctx, &nr);
+    if (cpuinfo == NULL) {
+        fprintf(stderr, "libxl_get_pci_topology failed.\n");
+        return;
     }
 
-    libxl_cputopology_list_free(info, nr);
+    printf("device topology        :\n");
+    printf("device           node\n");
+    for (i = 0; i < nr; i++) {
+        if (pciinfo[i].node != LIBXL_PCITOPOLOGY_INVALID_ENTRY) {
+            printf("%04x:%02x:%02x.%01x      %d\n", pciinfo[i].seg,
+                   pciinfo[i].bus,
+                   ((pciinfo[i].devfn >> 3) & 0x1f), (pciinfo[i].devfn & 7),
+                   pciinfo[i].node);
+            valid_devs++;
+        }
+    }
+
+    if (valid_devs == 0)
+        printf("No device topology data available\n");
+
+    libxl_pcitopology_list_free(pciinfo, nr);
 
     return;
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/4] pci: Do not ignore device's PXM information
  2015-01-06  2:18 ` [PATCH v2 1/4] pci: Do not ignore device's PXM information Boris Ostrovsky
@ 2015-01-06 11:55   ` Andrew Cooper
  2015-01-07  9:01     ` Jan Beulich
  2015-01-07  9:06   ` Jan Beulich
  1 sibling, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2015-01-06 11:55 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2
  Cc: dario.faggioli, ufimtseva, xen-devel

On 06/01/15 02:18, Boris Ostrovsky wrote:

> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 5f295f3..a5eb81c 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -56,6 +56,8 @@ struct pci_dev {
>  
>      u8 phantom_stride;
>  
> +    int node; /* NUMA node */
> +
>      enum pdev_type {
>          DEV_TYPE_PCI_UNKNOWN,
>          DEV_TYPE_PCIe_ENDPOINT,
> @@ -102,7 +104,8 @@ void setup_hwdom_pci_devices(struct domain *,
>  int pci_release_devices(struct domain *d);
>  int pci_add_segment(u16 seg);
>  const unsigned long *pci_get_ro_map(u16 seg);
> -int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *);
> +int pci_add_device(u16 seg, u8 bus, u8 devfn,
> +                   const struct pci_dev_info *, int);

Please use parameter names in definitions.

~Andrew

>  int pci_remove_device(u16 seg, u8 bus, u8 devfn);
>  int pci_ro_device(int seg, int bus, int devfn);
>  void arch_pci_ro_device(int seg, int bdf);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-06  2:18 ` [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient Boris Ostrovsky
@ 2015-01-06 13:41   ` Andrew Cooper
  2015-01-06 14:45     ` Boris Ostrovsky
                       ` (2 more replies)
  2015-01-19 17:26   ` Ian Campbell
  1 sibling, 3 replies; 60+ messages in thread
From: Andrew Cooper @ 2015-01-06 13:41 UTC (permalink / raw)
  To: Boris Ostrovsky, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2
  Cc: dario.faggioli, ufimtseva, xen-devel

On 06/01/15 02:18, Boris Ostrovsky wrote:
> Instead of copying data for each field in xen_sysctl_topologyinfo separately
> put cpu/socket/node into a single structure and do a single copy for each
> processor.
>
> There is also no need to copy whole op to user at the end, max_cpu_index is
> sufficient
>
> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
> that these are used for CPU topology. Subsequent patch will add support for
> PCI topology sysctl.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

If we are going to change the hypercall, then can we see about making it
a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
components which might want/need access to this information.  (i.e. I am
still looking for a reasonable way to get this information from Xen in
hwloc)

> <snip>
>
> +            if ( cpu_online(i) )
>              {
> -                uint32_t socket = cpu_online(i) ? cpu_to_socket(i) : ~0u;
> -                if ( copy_to_guest_offset(ti->cpu_to_socket, i, &socket, 1) )
> -                    break;
> +                cputopo.core = cpu_to_core(i);
> +                cputopo.socket = cpu_to_socket(i);
> +                cputopo.node = cpu_to_node(i);
>              }
> -            if ( !guest_handle_is_null(ti->cpu_to_node) )
> +            else
> +                cputopo.core = cputopo.socket =
> +                    cputopo.node = INVALID_TOPOLOGY_ID;
> +

In particular, can we fix this broken behaviour.  The cpu being online
or not has no bearing on whether Xen has topology information, and a
side effect is that when the cpu governer powers down a cpu, it
disappears from the reported topology.

~Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-06 13:41   ` Andrew Cooper
@ 2015-01-06 14:45     ` Boris Ostrovsky
  2015-01-07  9:12     ` Jan Beulich
  2015-01-07 15:23     ` Jan Beulich
  2 siblings, 0 replies; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-06 14:45 UTC (permalink / raw)
  To: Andrew Cooper, jbeulich, keir, ian.jackson, stefano.stabellini,
	ian.campbell, wei.liu2
  Cc: dario.faggioli, ufimtseva, xen-devel

On 01/06/2015 08:41 AM, Andrew Cooper wrote:
> On 06/01/15 02:18, Boris Ostrovsky wrote:
>> Instead of copying data for each field in xen_sysctl_topologyinfo separately
>> put cpu/socket/node into a single structure and do a single copy for each
>> processor.
>>
>> There is also no need to copy whole op to user at the end, max_cpu_index is
>> sufficient
>>
>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
>> that these are used for CPU topology. Subsequent patch will add support for
>> PCI topology sysctl.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> If we are going to change the hypercall, then can we see about making it
> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
> components which might want/need access to this information.  (i.e. I am
> still looking for a reasonable way to get this information from Xen in
> hwloc)

Can't those components dlopen libxl? That's what I was assuming we'd do 
with hwlock.

>
>> <snip>
>>
>> +            if ( cpu_online(i) )
>>               {
>> -                uint32_t socket = cpu_online(i) ? cpu_to_socket(i) : ~0u;
>> -                if ( copy_to_guest_offset(ti->cpu_to_socket, i, &socket, 1) )
>> -                    break;
>> +                cputopo.core = cpu_to_core(i);
>> +                cputopo.socket = cpu_to_socket(i);
>> +                cputopo.node = cpu_to_node(i);
>>               }
>> -            if ( !guest_handle_is_null(ti->cpu_to_node) )
>> +            else
>> +                cputopo.core = cputopo.socket =
>> +                    cputopo.node = INVALID_TOPOLOGY_ID;
>> +
> In particular, can we fix this broken behaviour.  The cpu being online
> or not has no bearing on whether Xen has topology information, and a
> side effect is that when the cpu governer powers down a cpu, it
> disappears from the reported topology.

Yes, I should fix that as well.

-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
  2015-01-06  2:18 ` [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
@ 2015-01-06 16:55   ` Wei Liu
  2015-01-06 18:15     ` Boris Ostrovsky
  2015-01-07  9:21   ` Jan Beulich
  1 sibling, 1 reply; 60+ messages in thread
From: Wei Liu @ 2015-01-06 16:55 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, keir,
	ufimtseva

On Mon, Jan 05, 2015 at 09:18:56PM -0500, Boris Ostrovsky wrote:
[...]
>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_lockprof_op_t);
>  
>  /* XEN_SYSCTL_cputopoinfo */

A bikeshed issue: is it better to just modified the above comment
instead of adding new trailing comment?

Wei.

> -#define INVALID_TOPOLOGY_ID  (~0U)
> +#define INVALID_TOPOLOGY_ID  (~0U) /* Also used by pcitopo */

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
  2015-01-06  2:18 ` [PATCH v2 4/4] libxl: Add interface for querying hypervisor about " Boris Ostrovsky
@ 2015-01-06 17:08   ` Wei Liu
  2015-01-07  9:04   ` Dario Faggioli
  2015-01-19 17:32   ` Ian Campbell
  2 siblings, 0 replies; 60+ messages in thread
From: Wei Liu @ 2015-01-06 17:08 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, keir,
	ufimtseva

On Mon, Jan 05, 2015 at 09:18:57PM -0500, Boris Ostrovsky wrote:
[...]
> +int libxl__pci_topology_init(libxl__gc *gc,
> +                             xen_sysctl_pcitopo_t *pcitopo,
> +                             int numdev)
> +{
> +
> +    DIR *dir;
> +    struct dirent *entry;
> +    int i;
> +
> +    dir = opendir("/sys/bus/pci/devices");
> +    if (!dir) {
> +        LOGEV(ERROR, errno, "Cannot open /sys/bus/pci/devices");
> +        return ERROR_FAIL;
> +    }
> +
> +    i = 0;
> +    while ((entry = readdir(dir))) {
> +        unsigned int dom, bus, dev, func;
> +
> +        /* ".", ".." or a special non-device perhaps */
> +        if (entry->d_name[0] == '.')
> +            continue;
> +
> +        if (i == numdev) {
> +            LOGE(ERROR, "Too many devices\n");
> +            closedir(dir);
> +            return ERROR_FAIL;

Please use "goto out" idiom.

Wei.

> +        }
> +
> +        if (sscanf(entry->d_name, "%x:%x:%x.%d", &dom, &bus, &dev, &func) < 4) {
> +            LOGEV(ERROR, errno, "Error processing /sys/bus/pci/devices");
> +            closedir(dir);
> +            return ERROR_FAIL;
> +        }
> +

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
  2015-01-06 16:55   ` Wei Liu
@ 2015-01-06 18:15     ` Boris Ostrovsky
  0 siblings, 0 replies; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-06 18:15 UTC (permalink / raw)
  To: Wei Liu
  Cc: keir, ian.campbell, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, ufimtseva

On 01/06/2015 11:55 AM, Wei Liu wrote:
> On Mon, Jan 05, 2015 at 09:18:56PM -0500, Boris Ostrovsky wrote:
> [...]
>>   DEFINE_XEN_GUEST_HANDLE(xen_sysctl_lockprof_op_t);
>>   
>>   /* XEN_SYSCTL_cputopoinfo */
> A bikeshed issue: is it better to just modified the above comment
> instead of adding new trailing comment?

To me the top comment applies to structure definitions as well which is 
why I thought a side comment would be more appropriate.

-boris

>
> Wei.
>
>> -#define INVALID_TOPOLOGY_ID  (~0U)
>> +#define INVALID_TOPOLOGY_ID  (~0U) /* Also used by pcitopo */

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/4] pci: Do not ignore device's PXM information
  2015-01-06 11:55   ` Andrew Cooper
@ 2015-01-07  9:01     ` Jan Beulich
  0 siblings, 0 replies; 60+ messages in thread
From: Jan Beulich @ 2015-01-07  9:01 UTC (permalink / raw)
  To: Andrew Cooper, Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

>>> On 06.01.15 at 12:55, <andrew.cooper3@citrix.com> wrote:
> On 06/01/15 02:18, Boris Ostrovsky wrote:
> 
>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>> index 5f295f3..a5eb81c 100644
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -56,6 +56,8 @@ struct pci_dev {
>>  
>>      u8 phantom_stride;
>>  
>> +    int node; /* NUMA node */
>> +
>>      enum pdev_type {
>>          DEV_TYPE_PCI_UNKNOWN,
>>          DEV_TYPE_PCIe_ENDPOINT,
>> @@ -102,7 +104,8 @@ void setup_hwdom_pci_devices(struct domain *,
>>  int pci_release_devices(struct domain *d);
>>  int pci_add_segment(u16 seg);
>>  const unsigned long *pci_get_ro_map(u16 seg);
>> -int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *);
>> +int pci_add_device(u16 seg, u8 bus, u8 devfn,
>> +                   const struct pci_dev_info *, int);
> 
> Please use parameter names in definitions.

For the added parameter - yes. For the pre-existing pointer one I don't
see a strong need to do so (and there was no name there before).

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
  2015-01-06  2:18 ` [PATCH v2 4/4] libxl: Add interface for querying hypervisor about " Boris Ostrovsky
  2015-01-06 17:08   ` Wei Liu
@ 2015-01-07  9:04   ` Dario Faggioli
  2015-01-07 14:15     ` Boris Ostrovsky
  2015-01-19 17:32   ` Ian Campbell
  2 siblings, 1 reply; 60+ messages in thread
From: Dario Faggioli @ 2015-01-07  9:04 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	ian.jackson, xen-devel, jbeulich, keir, ufimtseva


[-- Attachment #1.1: Type: text/plain, Size: 3221 bytes --]

On Mon, 2015-01-05 at 21:18 -0500, Boris Ostrovsky wrote:
> .. and use this new interface to display it along with CPU topology
> and NUMA information when 'xl info -n' command is issued
> 
Also, can we see how an `xl info -n' looks, on an IONUMA system?

> @@ -195,6 +195,24 @@ int xc_cputopoinfo(xc_interface *xch,
>      return 0;
>  }
>  
> +int xc_pcitopoinfo(xc_interface *xch,
> +                   xc_pcitopoinfo_t *put_info)
> +{
> +    int ret;
> +    DECLARE_SYSCTL;
> +
> +    sysctl.cmd = XEN_SYSCTL_pcitopoinfo;
> +
> +    memcpy(&sysctl.u.pcitopoinfo, put_info, sizeof(*put_info));
> +
> +    if ( (ret = do_sysctl(xch, &sysctl)) != 0 )
> +        return ret;
> +
> +    memcpy(put_info, &sysctl.u.pcitopoinfo, sizeof(*put_info));
> +
> +    return 0;
> +}

> @@ -5121,6 +5121,64 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
>      return ret;
>  }
>  
> +libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs)
> +{
> +    GC_INIT(ctx);
> +    xc_pcitopoinfo_t tinfo;
> +    DECLARE_HYPERCALL_BUFFER(xen_sysctl_pcitopo_t, pcitopo);
> +    libxl_pcitopology *ret = NULL;
> +    int i, rc;
> +
I see from where this comes from. However, at least from new functions,
I think we should avoid using things like DECLARE_HYPERCALL_BUFFER etc.,
in libxl. They belong in libxc, IMO.

This is basically what Andrew was doing here (although that was on
xc_{topology,numa}info:

http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/hwloc-support-experimental
http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=blobdiff;f=tools/libxc/xc_misc.c;h=4f672cead5a4d2b1fc23bcddb6fb49e4d6ec72b5;hp=330345459176961cacc7fda506e843831fe7156a;hb=3585994405b6a73c137309dd4be91f48c71e4903;hpb=9a80d5056766535ac624774b96495f8b97b1d28b

Basically, what I'm suggesting is to have xc_pcitopoinfo() to do most of
the work, with libxl_get_pci_topology being just a wrapper to it.

In fact, if you grep/cscope for things like DECLARE_HYPERCALL_BUFFER in
tools/libxl, the *only* results are these:

libxl.c             libxl_get_cpu_topology                  5076 DECLARE_HYPERCALL_BUFFER(xc_cpu_to_core_t, coremap);
libxl.c             libxl_get_cpu_topology                  5077 DECLARE_HYPERCALL_BUFFER(xc_cpu_to_socket_t, socketmap);
libxl.c             libxl_get_cpu_topology                  5078 DECLARE_HYPERCALL_BUFFER(xc_cpu_to_node_t, nodemap);
libxl.c             libxl_get_numainfo                      5142 DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, memsize);
libxl.c             libxl_get_numainfo                      5143 DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, memfree);
libxl.c             libxl_get_numainfo                      5144 DECLARE_HYPERCALL_BUFFER(uint32_t, node_dists);

And I think we should work toward removing these too, rather than adding
new ones! :-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/4] pci: Do not ignore device's PXM information
  2015-01-06  2:18 ` [PATCH v2 1/4] pci: Do not ignore device's PXM information Boris Ostrovsky
  2015-01-06 11:55   ` Andrew Cooper
@ 2015-01-07  9:06   ` Jan Beulich
  2015-01-07 14:42     ` Boris Ostrovsky
  1 sibling, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2015-01-07  9:06 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

>>> On 06.01.15 at 03:18, <boris.ostrovsky@oracle.com> wrote:
> @@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          }
>          else
>              pdev_info.is_virtfn = 0;
> -        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
> +
> +        if ( add.flags & XEN_PCI_DEV_PXM )
> +        {
> +            uint32_t pxm;
> +            int optarr_off = offsetof(struct physdev_pci_device_add, optarr) /

unsigned int or size_t.

> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -56,6 +56,8 @@ struct pci_dev {
>  
>      u8 phantom_stride;
>  
> +    int node; /* NUMA node */

I thought I asked about this on v1 already: Does this really need to be
an int, when commonly node numbers are stored in u8/unsigned char?
Shrinking the field size would prevent the structure size from growing...

Of course an additional question would be whether the node wouldn't
better go into struct arch_pci_dev - that depends on whether we
expect ARM to be using NUMA...

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-06 13:41   ` Andrew Cooper
  2015-01-06 14:45     ` Boris Ostrovsky
@ 2015-01-07  9:12     ` Jan Beulich
  2015-01-07 14:45       ` Boris Ostrovsky
  2015-01-16 15:56       ` Boris Ostrovsky
  2015-01-07 15:23     ` Jan Beulich
  2 siblings, 2 replies; 60+ messages in thread
From: Jan Beulich @ 2015-01-07  9:12 UTC (permalink / raw)
  To: Andrew Cooper, Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
> On 06/01/15 02:18, Boris Ostrovsky wrote:
>> Instead of copying data for each field in xen_sysctl_topologyinfo separately
>> put cpu/socket/node into a single structure and do a single copy for each
>> processor.
>>
>> There is also no need to copy whole op to user at the end, max_cpu_index is
>> sufficient
>>
>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
>> that these are used for CPU topology. Subsequent patch will add support for
>> PCI topology sysctl.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> 
> If we are going to change the hypercall, then can we see about making it
> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
> components which might want/need access to this information.  (i.e. I am
> still looking for a reasonable way to get this information from Xen in
> hwloc)

In which case leaving the sysctl alone and just adding a new non-sysctl
interface should be considered.

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
  2015-01-06  2:18 ` [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
  2015-01-06 16:55   ` Wei Liu
@ 2015-01-07  9:21   ` Jan Beulich
  2015-01-07 14:55     ` Boris Ostrovsky
  1 sibling, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2015-01-07  9:21 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

>>> On 06.01.15 at 03:18, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/common/sysctl.c
> +++ b/xen/common/sysctl.c
> @@ -365,6 +365,66 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>      }
>      break;
>  

#ifdef HAS_PCI

> +    case XEN_SYSCTL_pcitopoinfo:
> +    {
> +        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
> +
> +        if ( guest_handle_is_null(ti->pcitopo) ||
> +             (ti->first_dev >= ti->num_devs) )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        for ( ; ti->first_dev < ti->num_devs; ti->first_dev++ )
> +        {
> +            xen_sysctl_pcitopo_t pcitopo;
> +            struct pci_dev *pdev;
> +
> +            if ( copy_from_guest_offset(&pcitopo, ti->pcitopo,
> +                                        ti->first_dev, 1) )
> +            {
> +                ret = -EFAULT;
> +                break;
> +            }
> +
> +            spin_lock(&pcidevs_lock);
> +            pdev = pci_get_pdev(pcitopo.pcidev.seg, pcitopo.pcidev.bus,
> +                                pcitopo.pcidev.devfn);
> +            if ( !pdev || (pdev->node == NUMA_NO_NODE) )
> +                pcitopo.node = INVALID_TOPOLOGY_ID;
> +            else
> +                pcitopo.node = pdev->node;

Are hypervisor-internal node numbers really meaningful to the caller?

> +            spin_unlock(&pcidevs_lock);
> +
> +            if ( copy_to_guest_offset(ti->pcitopo, ti->first_dev,

__copy_ty_guest_offset()

> +                                      &pcitopo, 1) )
> +            {
> +                ret = -EFAULT;
> +                break;
> +            }
> +
> +            if ( hypercall_preempt_check() )
> +                break;

You didn't increment ->first_dev yet, i.e. you'd start with the same
index again when continuing later, and in the end you may not make
any forward progress.

> @@ -463,7 +464,7 @@ typedef struct xen_sysctl_lockprof_op xen_sysctl_lockprof_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_lockprof_op_t);
>  
>  /* XEN_SYSCTL_cputopoinfo */
> -#define INVALID_TOPOLOGY_ID  (~0U)
> +#define INVALID_TOPOLOGY_ID  (~0U) /* Also used by pcitopo */

Better extend the preceding comment.

> @@ -492,6 +493,36 @@ struct xen_sysctl_cputopoinfo {
>  typedef struct xen_sysctl_cputopoinfo xen_sysctl_cputopoinfo_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_cputopoinfo_t);
>  
> +/* XEN_SYSCTL_pcitopoinfo */
> +struct xen_sysctl_pcitopo {
> +    struct physdev_pci_device pcidev;
> +    uint32_t node;
> +};
> +typedef struct xen_sysctl_pcitopo xen_sysctl_pcitopo_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_pcitopo_t);
> +
> +struct xen_sysctl_pcitopoinfo {
> +    /* IN: Size of pcitopo array */
> +    uint32_t num_devs;
> +
> +    /*
> +     * IN/OUT: First element of pcitopo array that needs to be processed by
> +     * hypervisor.
> +     * This is used primarily by hypercall continuations and callers will
> +     * typically set it to zero
> +     */
> +    uint32_t first_dev;
> +
> +    /*
> +     * If not NULL, filled with node identifier for each pcidev

The "If not NULL" would be meaningful only if NULL had a special
meaning.

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
  2015-01-07  9:04   ` Dario Faggioli
@ 2015-01-07 14:15     ` Boris Ostrovsky
  2015-01-07 14:45       ` Dario Faggioli
  0 siblings, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-07 14:15 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	ian.jackson, xen-devel, jbeulich, keir, ufimtseva

On 01/07/2015 04:04 AM, Dario Faggioli wrote:
> On Mon, 2015-01-05 at 21:18 -0500, Boris Ostrovsky wrote:
>> .. and use this new interface to display it along with CPU topology
>> and NUMA information when 'xl info -n' command is issued
>>
> Also, can we see how an `xl info -n' looks, on an IONUMA system?
>
>> @@ -195,6 +195,24 @@ int xc_cputopoinfo(xc_interface *xch,
>>       return 0;
>>   }
>>   
>> +int xc_pcitopoinfo(xc_interface *xch,
>> +                   xc_pcitopoinfo_t *put_info)
>> +{
>> +    int ret;
>> +    DECLARE_SYSCTL;
>> +
>> +    sysctl.cmd = XEN_SYSCTL_pcitopoinfo;
>> +
>> +    memcpy(&sysctl.u.pcitopoinfo, put_info, sizeof(*put_info));
>> +
>> +    if ( (ret = do_sysctl(xch, &sysctl)) != 0 )
>> +        return ret;
>> +
>> +    memcpy(put_info, &sysctl.u.pcitopoinfo, sizeof(*put_info));
>> +
>> +    return 0;
>> +}
>> @@ -5121,6 +5121,64 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
>>       return ret;
>>   }
>>   
>> +libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs)
>> +{
>> +    GC_INIT(ctx);
>> +    xc_pcitopoinfo_t tinfo;
>> +    DECLARE_HYPERCALL_BUFFER(xen_sysctl_pcitopo_t, pcitopo);
>> +    libxl_pcitopology *ret = NULL;
>> +    int i, rc;
>> +
> I see from where this comes from. However, at least from new functions,
> I think we should avoid using things like DECLARE_HYPERCALL_BUFFER etc.,
> in libxl. They belong in libxc, IMO.
>
> This is basically what Andrew was doing here (although that was on
> xc_{topology,numa}info:
>
> http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/hwloc-support-experimental
> http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=blobdiff;f=tools/libxc/xc_misc.c;h=4f672cead5a4d2b1fc23bcddb6fb49e4d6ec72b5;hp=330345459176961cacc7fda506e843831fe7156a;hb=3585994405b6a73c137309dd4be91f48c71e4903;hpb=9a80d5056766535ac624774b96495f8b97b1d28b
>
> Basically, what I'm suggesting is to have xc_pcitopoinfo() to do most of
> the work, with libxl_get_pci_topology being just a wrapper to it.


Maybe then I should update libxl_get_cpu_topology() and 
libxl_get_numainfo() as well for code uniformity.

-boris


>
> In fact, if you grep/cscope for things like DECLARE_HYPERCALL_BUFFER in
> tools/libxl, the *only* results are these:
>
> libxl.c             libxl_get_cpu_topology                  5076 DECLARE_HYPERCALL_BUFFER(xc_cpu_to_core_t, coremap);
> libxl.c             libxl_get_cpu_topology                  5077 DECLARE_HYPERCALL_BUFFER(xc_cpu_to_socket_t, socketmap);
> libxl.c             libxl_get_cpu_topology                  5078 DECLARE_HYPERCALL_BUFFER(xc_cpu_to_node_t, nodemap);
> libxl.c             libxl_get_numainfo                      5142 DECLARE_HYPERCALL_BUFFER(xc_node_to_memsize_t, memsize);
> libxl.c             libxl_get_numainfo                      5143 DECLARE_HYPERCALL_BUFFER(xc_node_to_memfree_t, memfree);
> libxl.c             libxl_get_numainfo                      5144 DECLARE_HYPERCALL_BUFFER(uint32_t, node_dists);
>
> And I think we should work toward removing these too, rather than adding
> new ones! :-)
>
> Regards,
> Dario
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/4] pci: Do not ignore device's PXM information
  2015-01-07  9:06   ` Jan Beulich
@ 2015-01-07 14:42     ` Boris Ostrovsky
  2015-01-07 14:47       ` Andrew Cooper
  2015-01-07 15:06       ` Jan Beulich
  0 siblings, 2 replies; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-07 14:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

On 01/07/2015 04:06 AM, Jan Beulich wrote:
>>>> On 06.01.15 at 03:18, <boris.ostrovsky@oracle.com> wrote:
>> @@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>           }
>>           else
>>               pdev_info.is_virtfn = 0;
>> -        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
>> +
>> +        if ( add.flags & XEN_PCI_DEV_PXM )
>> +        {
>> +            uint32_t pxm;
>> +            int optarr_off = offsetof(struct physdev_pci_device_add, optarr) /
> unsigned int or size_t.
>
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -56,6 +56,8 @@ struct pci_dev {
>>   
>>       u8 phantom_stride;
>>   
>> +    int node; /* NUMA node */
> I thought I asked about this on v1 already: Does this really need to be
> an int, when commonly node numbers are stored in u8/unsigned char?
> Shrinking the field size would prevent the structure size from growing...


I kept this field as an int to be able to store NUMA_NO_NODE which I 
thought to be (int)-1.

But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to 
(int)-1 by pxm_to_node(). Given that there is a number of tests for 
NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node() 
return u8 as well?


> Of course an additional question would be whether the node wouldn't
> better go into struct arch_pci_dev - that depends on whether we
> expect ARM to be using NUMA...
>

Since we have CPU topology in common code I thought this would be 
arch-independent as well.

-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-07  9:12     ` Jan Beulich
@ 2015-01-07 14:45       ` Boris Ostrovsky
  2015-01-07 15:09         ` Jan Beulich
  2015-01-16 15:56       ` Boris Ostrovsky
  1 sibling, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-07 14:45 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

On 01/07/2015 04:12 AM, Jan Beulich wrote:
>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
>> On 06/01/15 02:18, Boris Ostrovsky wrote:
>>> Instead of copying data for each field in xen_sysctl_topologyinfo separately
>>> put cpu/socket/node into a single structure and do a single copy for each
>>> processor.
>>>
>>> There is also no need to copy whole op to user at the end, max_cpu_index is
>>> sufficient
>>>
>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
>>> that these are used for CPU topology. Subsequent patch will add support for
>>> PCI topology sysctl.
>>>
>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> If we are going to change the hypercall, then can we see about making it
>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
>> components which might want/need access to this information.  (i.e. I am
>> still looking for a reasonable way to get this information from Xen in
>> hwloc)
> In which case leaving the sysctl alone and just adding a new non-sysctl
> interface should be considered.

I'd expect IO NUMA information to be used together with CPU topology 
information so I am not sure how useful this would be. Unless we create 
a similar interface for that (CPU/memory) as well.

-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
  2015-01-07 14:15     ` Boris Ostrovsky
@ 2015-01-07 14:45       ` Dario Faggioli
  0 siblings, 0 replies; 60+ messages in thread
From: Dario Faggioli @ 2015-01-07 14:45 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	ian.jackson, xen-devel, jbeulich, keir, ufimtseva


[-- Attachment #1.1: Type: text/plain, Size: 1370 bytes --]

On Wed, 2015-01-07 at 09:15 -0500, Boris Ostrovsky wrote:
> On 01/07/2015 04:04 AM, Dario Faggioli wrote:
> > This is basically what Andrew was doing here (although that was on
> > xc_{topology,numa}info:
> >
> > http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/hwloc-support-experimental
> > http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=blobdiff;f=tools/libxc/xc_misc.c;h=4f672cead5a4d2b1fc23bcddb6fb49e4d6ec72b5;hp=330345459176961cacc7fda506e843831fe7156a;hb=3585994405b6a73c137309dd4be91f48c71e4903;hpb=9a80d5056766535ac624774b96495f8b97b1d28b
> >
> > Basically, what I'm suggesting is to have xc_pcitopoinfo() to do most of
> > the work, with libxl_get_pci_topology being just a wrapper to it.
> 
> 
> Maybe then I should update libxl_get_cpu_topology() and 
> libxl_get_numainfo() as well for code uniformity.
> 
Yes, that would be ideal.

I wanted to mention it, but didn't, because I felt it was not fair to
"ask" you that.. but yes, since you're reworking them anyway, that
really would IMO be wonderful. :-)

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/4] pci: Do not ignore device's PXM information
  2015-01-07 14:42     ` Boris Ostrovsky
@ 2015-01-07 14:47       ` Andrew Cooper
  2015-01-07 15:07         ` Jan Beulich
  2015-01-07 15:06       ` Jan Beulich
  1 sibling, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2015-01-07 14:47 UTC (permalink / raw)
  To: Boris Ostrovsky, Jan Beulich
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

On 07/01/15 14:42, Boris Ostrovsky wrote:
> On 01/07/2015 04:06 AM, Jan Beulich wrote:
>>>>> On 06.01.15 at 03:18, <boris.ostrovsky@oracle.com> wrote:
>>> @@ -618,7 +620,22 @@ ret_t do_physdev_op(int cmd,
>>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>>           }
>>>           else
>>>               pdev_info.is_virtfn = 0;
>>> -        ret = pci_add_device(add.seg, add.bus, add.devfn, &pdev_info);
>>> +
>>> +        if ( add.flags & XEN_PCI_DEV_PXM )
>>> +        {
>>> +            uint32_t pxm;
>>> +            int optarr_off = offsetof(struct
>>> physdev_pci_device_add, optarr) /
>> unsigned int or size_t.
>>
>>> --- a/xen/include/xen/pci.h
>>> +++ b/xen/include/xen/pci.h
>>> @@ -56,6 +56,8 @@ struct pci_dev {
>>>         u8 phantom_stride;
>>>   +    int node; /* NUMA node */
>> I thought I asked about this on v1 already: Does this really need to be
>> an int, when commonly node numbers are stored in u8/unsigned char?
>> Shrinking the field size would prevent the structure size from
>> growing...
>
>
> I kept this field as an int to be able to store NUMA_NO_NODE which I
> thought to be (int)-1.
>
> But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to
> (int)-1 by pxm_to_node(). Given that there is a number of tests for
> NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node()
> return u8 as well?

I noticed this as well, and found it quite counter intuitive.

I would suggest fixing NUMA_NO_NODE to -1 and removing some of the
type-punning.

~Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
  2015-01-07  9:21   ` Jan Beulich
@ 2015-01-07 14:55     ` Boris Ostrovsky
  2015-01-07 15:17       ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-07 14:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

On 01/07/2015 04:21 AM, Jan Beulich wrote:
>>>> On 06.01.15 at 03:18, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/common/sysctl.c
>> +++ b/xen/common/sysctl.c
>> @@ -365,6 +365,66 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>>       }
>>       break;
>>   
> #ifdef HAS_PCI
>
>> +    case XEN_SYSCTL_pcitopoinfo:
>> +    {
>> +        xen_sysctl_pcitopoinfo_t *ti = &op->u.pcitopoinfo;
>> +
>> +        if ( guest_handle_is_null(ti->pcitopo) ||
>> +             (ti->first_dev >= ti->num_devs) )
>> +        {
>> +            ret = -EINVAL;
>> +            break;
>> +        }
>> +
>> +        for ( ; ti->first_dev < ti->num_devs; ti->first_dev++ )
>> +        {
>> +            xen_sysctl_pcitopo_t pcitopo;
>> +            struct pci_dev *pdev;
>> +
>> +            if ( copy_from_guest_offset(&pcitopo, ti->pcitopo,
>> +                                        ti->first_dev, 1) )
>> +            {
>> +                ret = -EFAULT;
>> +                break;
>> +            }
>> +
>> +            spin_lock(&pcidevs_lock);
>> +            pdev = pci_get_pdev(pcitopo.pcidev.seg, pcitopo.pcidev.bus,
>> +                                pcitopo.pcidev.devfn);
>> +            if ( !pdev || (pdev->node == NUMA_NO_NODE) )
>> +                pcitopo.node = INVALID_TOPOLOGY_ID;
>> +            else
>> +                pcitopo.node = pdev->node;
> Are hypervisor-internal node numbers really meaningful to the caller?


This is the same information (pxm -> node mapping ) that we provide in 
XEN_SYSCTL_topologyinfo (renamed in this series to
XEN_SYSCTL_cputopoinfo). Given that I expect the two topologies to be 
used together I think the answer is yes.


>> @@ -463,7 +464,7 @@ typedef struct xen_sysctl_lockprof_op xen_sysctl_lockprof_op_t;
>>   DEFINE_XEN_GUEST_HANDLE(xen_sysctl_lockprof_op_t);
>>   
>>   /* XEN_SYSCTL_cputopoinfo */
>> -#define INVALID_TOPOLOGY_ID  (~0U)
>> +#define INVALID_TOPOLOGY_ID  (~0U) /* Also used by pcitopo */
> Better extend the preceding comment.

I mentioned it to Wei yesterday that the file is structured (or at least 
appears to me to be structured) in such a way that these top comments 
mark sections of definitions for each sysctl. And so I thought that I'd 
be breaking this convention if I were to extend the comment.

-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/4] pci: Do not ignore device's PXM information
  2015-01-07 14:42     ` Boris Ostrovsky
  2015-01-07 14:47       ` Andrew Cooper
@ 2015-01-07 15:06       ` Jan Beulich
  2015-01-07 15:31         ` Boris Ostrovsky
  1 sibling, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2015-01-07 15:06 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

>>> On 07.01.15 at 15:42, <boris.ostrovsky@oracle.com> wrote:
> On 01/07/2015 04:06 AM, Jan Beulich wrote:
>>>>> On 06.01.15 at 03:18, <boris.ostrovsky@oracle.com> wrote:
>>> --- a/xen/include/xen/pci.h
>>> +++ b/xen/include/xen/pci.h
>>> @@ -56,6 +56,8 @@ struct pci_dev {
>>>   
>>>       u8 phantom_stride;
>>>   
>>> +    int node; /* NUMA node */
>> I thought I asked about this on v1 already: Does this really need to be
>> an int, when commonly node numbers are stored in u8/unsigned char?
>> Shrinking the field size would prevent the structure size from growing...
> 
> I kept this field as an int to be able to store NUMA_NO_NODE which I 
> thought to be (int)-1.
> 
> But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to 
> (int)-1 by pxm_to_node(). Given that there is a number of tests for 
> NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node() 
> return u8 as well?

I think that would make sense, together with fixing up one of the three
callers in VT-d code (from alloc_pgtable_maddr()); the other two look
correct already.

>> Of course an additional question would be whether the node wouldn't
>> better go into struct arch_pci_dev - that depends on whether we
>> expect ARM to be using NUMA...
> 
> Since we have CPU topology in common code I thought this would be 
> arch-independent as well.

Not sure what you're referring to here: What common piece of data
stores the node of a particular CPU? cpu_to_node[] clearly is x86-
specific.

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/4] pci: Do not ignore device's PXM information
  2015-01-07 14:47       ` Andrew Cooper
@ 2015-01-07 15:07         ` Jan Beulich
  2015-01-07 15:34           ` Boris Ostrovsky
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2015-01-07 15:07 UTC (permalink / raw)
  To: Andrew Cooper, Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

>>> On 07.01.15 at 15:47, <andrew.cooper3@citrix.com> wrote:
> On 07/01/15 14:42, Boris Ostrovsky wrote:
>> I kept this field as an int to be able to store NUMA_NO_NODE which I
>> thought to be (int)-1.
>>
>> But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to
>> (int)-1 by pxm_to_node(). Given that there is a number of tests for
>> NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node()
>> return u8 as well?
> 
> I noticed this as well, and found it quite counter intuitive.
> 
> I would suggest fixing NUMA_NO_NODE to -1 and removing some of the
> type-punning.

I have to admit that I see no value in wasting 4 bytes for something
that for the foreseeable future won't exceed 1 byte.

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-07 14:45       ` Boris Ostrovsky
@ 2015-01-07 15:09         ` Jan Beulich
  0 siblings, 0 replies; 60+ messages in thread
From: Jan Beulich @ 2015-01-07 15:09 UTC (permalink / raw)
  To: Andrew Cooper, Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

>>> On 07.01.15 at 15:45, <boris.ostrovsky@oracle.com> wrote:
> On 01/07/2015 04:12 AM, Jan Beulich wrote:
>>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
>>> On 06/01/15 02:18, Boris Ostrovsky wrote:
>>>> Instead of copying data for each field in xen_sysctl_topologyinfo separately
>>>> put cpu/socket/node into a single structure and do a single copy for each
>>>> processor.
>>>>
>>>> There is also no need to copy whole op to user at the end, max_cpu_index is
>>>> sufficient
>>>>
>>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the 
> fact
>>>> that these are used for CPU topology. Subsequent patch will add support for
>>>> PCI topology sysctl.
>>>>
>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>> If we are going to change the hypercall, then can we see about making it
>>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
>>> components which might want/need access to this information.  (i.e. I am
>>> still looking for a reasonable way to get this information from Xen in
>>> hwloc)
>> In which case leaving the sysctl alone and just adding a new non-sysctl
>> interface should be considered.
> 
> I'd expect IO NUMA information to be used together with CPU topology 
> information so I am not sure how useful this would be. Unless we create 
> a similar interface for that (CPU/memory) as well.

Creating a new CPU topology interface while leaving alone the current
sysctl was what I meant to suggest. The I/O topology one would then
become a sibling to the CPU one.

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
  2015-01-07 14:55     ` Boris Ostrovsky
@ 2015-01-07 15:17       ` Jan Beulich
  2015-01-07 15:54         ` Boris Ostrovsky
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2015-01-07 15:17 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

>>> On 07.01.15 at 15:55, <boris.ostrovsky@oracle.com> wrote:
> On 01/07/2015 04:21 AM, Jan Beulich wrote:
>>>>> On 06.01.15 at 03:18, <boris.ostrovsky@oracle.com> wrote:
>>> +        for ( ; ti->first_dev < ti->num_devs; ti->first_dev++ )
>>> +        {
>>> +            xen_sysctl_pcitopo_t pcitopo;
>>> +            struct pci_dev *pdev;
>>> +
>>> +            if ( copy_from_guest_offset(&pcitopo, ti->pcitopo,
>>> +                                        ti->first_dev, 1) )
>>> +            {
>>> +                ret = -EFAULT;
>>> +                break;
>>> +            }
>>> +
>>> +            spin_lock(&pcidevs_lock);
>>> +            pdev = pci_get_pdev(pcitopo.pcidev.seg, pcitopo.pcidev.bus,
>>> +                                pcitopo.pcidev.devfn);
>>> +            if ( !pdev || (pdev->node == NUMA_NO_NODE) )
>>> +                pcitopo.node = INVALID_TOPOLOGY_ID;
>>> +            else
>>> +                pcitopo.node = pdev->node;
>> Are hypervisor-internal node numbers really meaningful to the caller?
> 
> 
> This is the same information (pxm -> node mapping ) that we provide in 
> XEN_SYSCTL_topologyinfo (renamed in this series to
> XEN_SYSCTL_cputopoinfo). Given that I expect the two topologies to be 
> used together I think the answer is yes.

Building your argumentation on potentially mis-designed existing
interfaces is bogus. The question is - what use is a Xen internal
node number to a caller of a particular hypercall (other than it
being purely informational, e.g. for printing human readable
output)?

In particular if we were to introduce a new non-sysctl interface,
determining whether the hypervisor internal representation is really
the right one to expose here should be one of the most important
design aspects. I personally think that exposing e.g. the firmware
determined (and hence hopefully stable across reboots) PXM would
be more reasonable.

>>> @@ -463,7 +464,7 @@ typedef struct xen_sysctl_lockprof_op 
> xen_sysctl_lockprof_op_t;
>>>   DEFINE_XEN_GUEST_HANDLE(xen_sysctl_lockprof_op_t);
>>>   
>>>   /* XEN_SYSCTL_cputopoinfo */
>>> -#define INVALID_TOPOLOGY_ID  (~0U)
>>> +#define INVALID_TOPOLOGY_ID  (~0U) /* Also used by pcitopo */
>> Better extend the preceding comment.
> 
> I mentioned it to Wei yesterday that the file is structured (or at least 
> appears to me to be structured) in such a way that these top comments 
> mark sections of definitions for each sysctl. And so I thought that I'd 
> be breaking this convention if I were to extend the comment.

Yeah, I saw that discussion after having replied. Looking more
closely, I think the placement of the INVALID_TOPOLOGY_ID
definition is sub-optimal when you want to use it in a second place.
Move it mode towards the beginning of the header, leave the
current comment as is, and if you feel so ad a new comment ahead
of it explaining which operations are using it.

And then, if we go the non-sysctl route, the definition would likely
need moving into xen.h anyway.

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-06 13:41   ` Andrew Cooper
  2015-01-06 14:45     ` Boris Ostrovsky
  2015-01-07  9:12     ` Jan Beulich
@ 2015-01-07 15:23     ` Jan Beulich
  2 siblings, 0 replies; 60+ messages in thread
From: Jan Beulich @ 2015-01-07 15:23 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir, Boris Ostrovsky

>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
> On 06/01/15 02:18, Boris Ostrovsky wrote:
>> Instead of copying data for each field in xen_sysctl_topologyinfo separately
>> put cpu/socket/node into a single structure and do a single copy for each
>> processor.
>>
>> There is also no need to copy whole op to user at the end, max_cpu_index is
>> sufficient
>>
>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the 
> fact
>> that these are used for CPU topology. Subsequent patch will add support for
>> PCI topology sysctl.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> 
> If we are going to change the hypercall, then can we see about making it
> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
> components which might want/need access to this information.  (i.e. I am
> still looking for a reasonable way to get this information from Xen in
> hwloc)

Now having mentioned the alternative of doing this via other than
sysctl several times, I started wondering why exactly we'd want
this: hwloc is still user mode code, i.e. whether you call this part of
"the toolstack" is ambiguous. The only real reason to make an
interface like this other than a sysctl would be if the kernel had a
need to access it. The stability of the interface otoh has no meaning
here at all imo. We could declare a particular sysctl as "will never
change", but I understand that the need to pass the correct
XEN_SYSCTL_INTERFACE_VERSION into the hypercall would still
make it a little cumbersome to use.

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/4] pci: Do not ignore device's PXM information
  2015-01-07 15:06       ` Jan Beulich
@ 2015-01-07 15:31         ` Boris Ostrovsky
  2015-01-07 15:44           ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-07 15:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

On 01/07/2015 10:06 AM, Jan Beulich wrote:
>
>>> Of course an additional question would be whether the node wouldn't
>>> better go into struct arch_pci_dev - that depends on whether we
>>> expect ARM to be using NUMA...
>> Since we have CPU topology in common code I thought this would be
>> arch-independent as well.
> Not sure what you're referring to here: What common piece of data
> stores the node of a particular CPU? cpu_to_node[] clearly is x86-
> specific.

I meant that we have common interface (sysctl) for querying topology and 
PCI is a common IO bus (unlike CPUs) so I figured I'd keep node in 
common part as well.

-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/4] pci: Do not ignore device's PXM information
  2015-01-07 15:07         ` Jan Beulich
@ 2015-01-07 15:34           ` Boris Ostrovsky
  2015-01-07 15:46             ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-07 15:34 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

On 01/07/2015 10:07 AM, Jan Beulich wrote:
>>>> On 07.01.15 at 15:47, <andrew.cooper3@citrix.com> wrote:
>> On 07/01/15 14:42, Boris Ostrovsky wrote:
>>> I kept this field as an int to be able to store NUMA_NO_NODE which I
>>> thought to be (int)-1.
>>>
>>> But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to
>>> (int)-1 by pxm_to_node(). Given that there is a number of tests for
>>> NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node()
>>> return u8 as well?
>> I noticed this as well, and found it quite counter intuitive.
>>
>> I would suggest fixing NUMA_NO_NODE to -1 and removing some of the
>> type-punning.
> I have to admit that I see no value in wasting 4 bytes for something
> that for the foreseeable future won't exceed 1 byte.

The downside of going to u8 is that we'd be limiting number of nodes to 
254, which is somewhat awkward. OTOH we already do this by testing 
nodeID against 0xff in various places.

-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/4] pci: Do not ignore device's PXM information
  2015-01-07 15:31         ` Boris Ostrovsky
@ 2015-01-07 15:44           ` Jan Beulich
  0 siblings, 0 replies; 60+ messages in thread
From: Jan Beulich @ 2015-01-07 15:44 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

>>> On 07.01.15 at 16:31, <boris.ostrovsky@oracle.com> wrote:
> On 01/07/2015 10:06 AM, Jan Beulich wrote:
>>
>>>> Of course an additional question would be whether the node wouldn't
>>>> better go into struct arch_pci_dev - that depends on whether we
>>>> expect ARM to be using NUMA...
>>> Since we have CPU topology in common code I thought this would be
>>> arch-independent as well.
>> Not sure what you're referring to here: What common piece of data
>> stores the node of a particular CPU? cpu_to_node[] clearly is x86-
>> specific.
> 
> I meant that we have common interface (sysctl) for querying topology and 
> PCI is a common IO bus (unlike CPUs) so I figured I'd keep node in 
> common part as well.

Makes little sense: ARM stubs out cpu_to_node(), so a similar
mechanism could be used to hide the fact that some arch-es may
have no node associated with a PCI device.

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/4] pci: Do not ignore device's PXM information
  2015-01-07 15:34           ` Boris Ostrovsky
@ 2015-01-07 15:46             ` Jan Beulich
  0 siblings, 0 replies; 60+ messages in thread
From: Jan Beulich @ 2015-01-07 15:46 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, Andrew Cooper,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

>>> On 07.01.15 at 16:34, <boris.ostrovsky@oracle.com> wrote:
> On 01/07/2015 10:07 AM, Jan Beulich wrote:
>>>>> On 07.01.15 at 15:47, <andrew.cooper3@citrix.com> wrote:
>>> On 07/01/15 14:42, Boris Ostrovsky wrote:
>>>> I kept this field as an int to be able to store NUMA_NO_NODE which I
>>>> thought to be (int)-1.
>>>>
>>>> But now I see that NUMA_NO_NODE is, in fact, 0xff but is promoted to
>>>> (int)-1 by pxm_to_node(). Given that there is a number of tests for
>>>> NUMA_NO_NODE and not for (int)-1, should we then make pxm_to_node()
>>>> return u8 as well?
>>> I noticed this as well, and found it quite counter intuitive.
>>>
>>> I would suggest fixing NUMA_NO_NODE to -1 and removing some of the
>>> type-punning.
>> I have to admit that I see no value in wasting 4 bytes for something
>> that for the foreseeable future won't exceed 1 byte.
> 
> The downside of going to u8 is that we'd be limiting number of nodes to 
> 254, which is somewhat awkward. OTOH we already do this by testing 
> nodeID against 0xff in various places.

With NODES_SHIFT being 6 and hence MAX_NUMNODES being 0x40,
we can't reach 254 right now anyway.

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
  2015-01-07 15:17       ` Jan Beulich
@ 2015-01-07 15:54         ` Boris Ostrovsky
  2015-01-07 16:52           ` Jan Beulich
  2015-01-07 17:55           ` Dario Faggioli
  0 siblings, 2 replies; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-07 15:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

On 01/07/2015 10:17 AM, Jan Beulich wrote:
>>>> On 07.01.15 at 15:55, <boris.ostrovsky@oracle.com> wrote:
>> On 01/07/2015 04:21 AM, Jan Beulich wrote:
>>>>>> On 06.01.15 at 03:18, <boris.ostrovsky@oracle.com> wrote:
>>>> +        for ( ; ti->first_dev < ti->num_devs; ti->first_dev++ )
>>>> +        {
>>>> +            xen_sysctl_pcitopo_t pcitopo;
>>>> +            struct pci_dev *pdev;
>>>> +
>>>> +            if ( copy_from_guest_offset(&pcitopo, ti->pcitopo,
>>>> +                                        ti->first_dev, 1) )
>>>> +            {
>>>> +                ret = -EFAULT;
>>>> +                break;
>>>> +            }
>>>> +
>>>> +            spin_lock(&pcidevs_lock);
>>>> +            pdev = pci_get_pdev(pcitopo.pcidev.seg, pcitopo.pcidev.bus,
>>>> +                                pcitopo.pcidev.devfn);
>>>> +            if ( !pdev || (pdev->node == NUMA_NO_NODE) )
>>>> +                pcitopo.node = INVALID_TOPOLOGY_ID;
>>>> +            else
>>>> +                pcitopo.node = pdev->node;
>>> Are hypervisor-internal node numbers really meaningful to the caller?
>>
>> This is the same information (pxm -> node mapping ) that we provide in
>> XEN_SYSCTL_topologyinfo (renamed in this series to
>> XEN_SYSCTL_cputopoinfo). Given that I expect the two topologies to be
>> used together I think the answer is yes.
> Building your argumentation on potentially mis-designed existing
> interfaces is bogus. The question is - what use is a Xen internal
> node number to a caller of a particular hypercall (other than it
> being purely informational, e.g. for printing human readable
> output)?

Just as with knowing CPU/memory topology --- this will help with placing 
a guest if we know what "proximity domain" both the device and the 
CPUs/memory belong to.

Exposing PXM values to the caller would be as good as those internal 
node IDs. The only (I think) problem is that PXMs are not necessarily 
zero-based and may not be contiguous and so we need to have some sort of 
a common mapping for both CPUs and devices. And hypervisor provides such 
mapping in persistent way.

And if we are going to keep this as a sysctl then we need to be 
consistent with what we do now for CPUs, which is pxm2node[]. Or change 
CPU topology sysctl as well, which I don't think is a good idea.

>
> In particular if we were to introduce a new non-sysctl interface,
> determining whether the hypervisor internal representation is really
> the right one to expose here should be one of the most important
> design aspects.

Yes.

> I personally think that exposing e.g. the firmware
> determined (and hence hopefully stable across reboots) PXM would
> be more reasonable.

Again, the main argument that I see against using PXM values directly is 
the fact that it's not zero-based/non-contiguous.

-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
  2015-01-07 15:54         ` Boris Ostrovsky
@ 2015-01-07 16:52           ` Jan Beulich
  2015-01-07 17:55           ` Dario Faggioli
  1 sibling, 0 replies; 60+ messages in thread
From: Jan Beulich @ 2015-01-07 16:52 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

>>> On 07.01.15 at 16:54, <boris.ostrovsky@oracle.com> wrote:
> On 01/07/2015 10:17 AM, Jan Beulich wrote:
>> I personally think that exposing e.g. the firmware
>> determined (and hence hopefully stable across reboots) PXM would
>> be more reasonable.
> 
> Again, the main argument that I see against using PXM values directly is 
> the fact that it's not zero-based/non-contiguous.

I have to admit that I can't see why either of the aspects would
matter.

One thing coming to mind though is that the memory allocation
interfaces want Xen node numbers passed in.

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
  2015-01-07 15:54         ` Boris Ostrovsky
  2015-01-07 16:52           ` Jan Beulich
@ 2015-01-07 17:55           ` Dario Faggioli
  2015-01-08  9:50             ` Jan Beulich
  1 sibling, 1 reply; 60+ messages in thread
From: Dario Faggioli @ 2015-01-07 17:55 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	ian.jackson, xen-devel, Jan Beulich, keir, ufimtseva


[-- Attachment #1.1: Type: text/plain, Size: 2619 bytes --]

On Wed, 2015-01-07 at 10:54 -0500, Boris Ostrovsky wrote:
> On 01/07/2015 10:17 AM, Jan Beulich wrote:
> >> This is the same information (pxm -> node mapping ) that we provide in
> >> XEN_SYSCTL_topologyinfo (renamed in this series to
> >> XEN_SYSCTL_cputopoinfo). Given that I expect the two topologies to be
> >> used together I think the answer is yes.
> > Building your argumentation on potentially mis-designed existing
> > interfaces is bogus. The question is - what use is a Xen internal
> > node number to a caller of a particular hypercall (other than it
> > being purely informational, e.g. for printing human readable
> > output)?
> 
> Just as with knowing CPU/memory topology --- this will help with placing 
> a guest if we know what "proximity domain" both the device and the 
> CPUs/memory belong to.
> 
FWIW, my view on how IONUMA information could be useful is this: either
somewhere inside toolstack, automatically, or by hand, one may want to
reason as follows:

"Ehi, network card X is on node #2, let's place domain d, to which I'm
passing through such card, on node #2 (or as much and as close as
possible to node #2), to get best performance!"

Of course, when inside the toolstack, we can do this automatically:

"Domain d does not come with any affinity/placement information, but it
is passed through net card X, which is on node #2, so let's try to place
it on node #2 too"

So, I think I agree with Boris that at least consistency is necessary.
Right now, pretty much everything at both the toolstack (libxl) and
command line (xl, for doing things by hand) level, uses node IDs
reported by the hypervisor, as this series is also doing.

In particular, at the command line level, affinity, cpupools, vNUMA (as
per Wei's patches)... Everything speaks "node ID language", AFAICT.

There probably would not be too serious issues in converting everything
to PXM, or adding duplicates, but I don't see the reason why we should
do such a thing... Perhaps I'm missing what using PXM would actually buy
us?

> And if we are going to keep this as a sysctl then we need to be 
> consistent with what we do now for CPUs, which is pxm2node[]. Or change 
> CPU topology sysctl as well, 
>
Indeed, and not only that.

> which I don't think is a good idea.
> 
Me neither.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
  2015-01-07 17:55           ` Dario Faggioli
@ 2015-01-08  9:50             ` Jan Beulich
  2015-01-08 15:49               ` Boris Ostrovsky
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2015-01-08  9:50 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	ian.jackson, xen-devel, ufimtseva, keir, Boris Ostrovsky

>>> On 07.01.15 at 18:55, <dario.faggioli@citrix.com> wrote:
> There probably would not be too serious issues in converting everything
> to PXM, or adding duplicates, but I don't see the reason why we should
> do such a thing... Perhaps I'm missing what using PXM would actually buy
> us?

As long as those node IDs don't get stored persistently for future
use, using them (despite being a Xen internal representation)
ought to be fine. My concern really is that such Xen internals may
easily get mis-used in the tool stack, e.g. assuming that they won't
change across reboots (perhaps after a hypervisor update).

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
  2015-01-08  9:50             ` Jan Beulich
@ 2015-01-08 15:49               ` Boris Ostrovsky
  2015-01-08 15:54                 ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-08 15:49 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	ian.jackson, xen-devel, ufimtseva, keir

On 01/08/2015 04:50 AM, Jan Beulich wrote:
>>>> On 07.01.15 at 18:55, <dario.faggioli@citrix.com> wrote:
>> There probably would not be too serious issues in converting everything
>> to PXM, or adding duplicates, but I don't see the reason why we should
>> do such a thing... Perhaps I'm missing what using PXM would actually buy
>> us?
> As long as those node IDs don't get stored persistently for future
> use, using them (despite being a Xen internal representation)
> ought to be fine. My concern really is that such Xen internals may
> easily get mis-used in the tool stack, e.g. assuming that they won't
> change across reboots (perhaps after a hypervisor update).

Are PXMs guaranteed to be persistent across firmware upgrades? I don't 
see anything about this in the APCI spec. The only semi-relevant 
statement that I found was that PXM values are meaningless (i.e. they 
are much like cookies).

(FWIW, Linux exposes nodeIDs in the same manner as Xen. In fact, Xen's 
srat.c appears to be derived from Linux code.)

-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology
  2015-01-08 15:49               ` Boris Ostrovsky
@ 2015-01-08 15:54                 ` Jan Beulich
  0 siblings, 0 replies; 60+ messages in thread
From: Jan Beulich @ 2015-01-08 15:54 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, andrew.cooper3,
	Dario Faggioli, ian.jackson, xen-devel, ufimtseva, keir

>>> On 08.01.15 at 16:49, <boris.ostrovsky@oracle.com> wrote:
> On 01/08/2015 04:50 AM, Jan Beulich wrote:
>>>>> On 07.01.15 at 18:55, <dario.faggioli@citrix.com> wrote:
>>> There probably would not be too serious issues in converting everything
>>> to PXM, or adding duplicates, but I don't see the reason why we should
>>> do such a thing... Perhaps I'm missing what using PXM would actually buy
>>> us?
>> As long as those node IDs don't get stored persistently for future
>> use, using them (despite being a Xen internal representation)
>> ought to be fine. My concern really is that such Xen internals may
>> easily get mis-used in the tool stack, e.g. assuming that they won't
>> change across reboots (perhaps after a hypervisor update).
> 
> Are PXMs guaranteed to be persistent across firmware upgrades? I don't 
> see anything about this in the APCI spec. The only semi-relevant 
> statement that I found was that PXM values are meaningless (i.e. they 
> are much like cookies).

Right, and I alluded to that earlier on. Still firmware updates are
presumable less frequent than hypervisor ones.

> (FWIW, Linux exposes nodeIDs in the same manner as Xen. In fact, Xen's 
> srat.c appears to be derived from Linux code.)

Not sure what the latter has to do with the former.

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-07  9:12     ` Jan Beulich
  2015-01-07 14:45       ` Boris Ostrovsky
@ 2015-01-16 15:56       ` Boris Ostrovsky
  2015-01-16 16:06         ` Jan Beulich
  1 sibling, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-16 15:56 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: wei.liu2, ian.campbell, stefano.stabellini, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir

On 01/07/2015 04:12 AM, Jan Beulich wrote:
>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
>> On 06/01/15 02:18, Boris Ostrovsky wrote:
>>> Instead of copying data for each field in xen_sysctl_topologyinfo separately
>>> put cpu/socket/node into a single structure and do a single copy for each
>>> processor.
>>>
>>> There is also no need to copy whole op to user at the end, max_cpu_index is
>>> sufficient
>>>
>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
>>> that these are used for CPU topology. Subsequent patch will add support for
>>> PCI topology sysctl.
>>>
>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> If we are going to change the hypercall, then can we see about making it
>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
>> components which might want/need access to this information.  (i.e. I am
>> still looking for a reasonable way to get this information from Xen in
>> hwloc)
> In which case leaving the sysctl alone and just adding a new non-sysctl
> interface should be considered.

(Sorry for late reply)

Would a platform op be an option here or do you prefer a whole new 
hypercall?

-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-16 15:56       ` Boris Ostrovsky
@ 2015-01-16 16:06         ` Jan Beulich
  2015-01-16 16:14           ` Boris Ostrovsky
  2015-01-16 16:16           ` Ian Campbell
  0 siblings, 2 replies; 60+ messages in thread
From: Jan Beulich @ 2015-01-16 16:06 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, Andrew Cooper,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

>>> On 16.01.15 at 16:56, <boris.ostrovsky@oracle.com> wrote:
> On 01/07/2015 04:12 AM, Jan Beulich wrote:
>>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
>>> On 06/01/15 02:18, Boris Ostrovsky wrote:
>>>> Instead of copying data for each field in xen_sysctl_topologyinfo separately
>>>> put cpu/socket/node into a single structure and do a single copy for each
>>>> processor.
>>>>
>>>> There is also no need to copy whole op to user at the end, max_cpu_index is
>>>> sufficient
>>>>
>>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the 
> fact
>>>> that these are used for CPU topology. Subsequent patch will add support for
>>>> PCI topology sysctl.
>>>>
>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>> If we are going to change the hypercall, then can we see about making it
>>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
>>> components which might want/need access to this information.  (i.e. I am
>>> still looking for a reasonable way to get this information from Xen in
>>> hwloc)
>> In which case leaving the sysctl alone and just adding a new non-sysctl
>> interface should be considered.
> 
> (Sorry for late reply)
> 
> Would a platform op be an option here or do you prefer a whole new 
> hypercall?

>From an abstract pov a platform op would be fine, but iirc you had
a need for preempting, which doesn't work well for that hypercall.
A whole new one seems overkill too. Perhaps slightly bending what
physdevop-s are used for might be an option...

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-16 16:06         ` Jan Beulich
@ 2015-01-16 16:14           ` Boris Ostrovsky
  2015-01-16 16:20             ` Jan Beulich
  2015-01-16 16:16           ` Ian Campbell
  1 sibling, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-16 16:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, ian.campbell, stefano.stabellini, Andrew Cooper,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

On 01/16/2015 11:06 AM, Jan Beulich wrote:
>>>> On 16.01.15 at 16:56, <boris.ostrovsky@oracle.com> wrote:
>> On 01/07/2015 04:12 AM, Jan Beulich wrote:
>>>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
>>>> On 06/01/15 02:18, Boris Ostrovsky wrote:
>>>>> Instead of copying data for each field in xen_sysctl_topologyinfo separately
>>>>> put cpu/socket/node into a single structure and do a single copy for each
>>>>> processor.
>>>>>
>>>>> There is also no need to copy whole op to user at the end, max_cpu_index is
>>>>> sufficient
>>>>>
>>>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the
>> fact
>>>>> that these are used for CPU topology. Subsequent patch will add support for
>>>>> PCI topology sysctl.
>>>>>
>>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>> If we are going to change the hypercall, then can we see about making it
>>>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
>>>> components which might want/need access to this information.  (i.e. I am
>>>> still looking for a reasonable way to get this information from Xen in
>>>> hwloc)
>>> In which case leaving the sysctl alone and just adding a new non-sysctl
>>> interface should be considered.
>> (Sorry for late reply)
>>
>> Would a platform op be an option here or do you prefer a whole new
>> hypercall?
>  From an abstract pov a platform op would be fine, but iirc you had
> a need for preempting, which doesn't work well for that hypercall.
> A whole new one seems overkill too. Perhaps slightly bending what
> physdevop-s are used for might be an option...

Since the plan is to query for both CPU and IO topologies physdevops 
doesn't seem to be a logical place for that, does it?

What is the problem with preempting while in platform op? We already do 
this for microcode updates.

-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-16 16:06         ` Jan Beulich
  2015-01-16 16:14           ` Boris Ostrovsky
@ 2015-01-16 16:16           ` Ian Campbell
  2015-01-16 16:34             ` Jan Beulich
  1 sibling, 1 reply; 60+ messages in thread
From: Ian Campbell @ 2015-01-16 16:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, stefano.stabellini, Andrew Cooper, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir, Boris Ostrovsky

On Fri, 2015-01-16 at 16:06 +0000, Jan Beulich wrote:
> >>> On 16.01.15 at 16:56, <boris.ostrovsky@oracle.com> wrote:
> > On 01/07/2015 04:12 AM, Jan Beulich wrote:
> >>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
> >>> On 06/01/15 02:18, Boris Ostrovsky wrote:
> >>>> Instead of copying data for each field in xen_sysctl_topologyinfo separately
> >>>> put cpu/socket/node into a single structure and do a single copy for each
> >>>> processor.
> >>>>
> >>>> There is also no need to copy whole op to user at the end, max_cpu_index is
> >>>> sufficient
> >>>>
> >>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the 
> > fact
> >>>> that these are used for CPU topology. Subsequent patch will add support for
> >>>> PCI topology sysctl.
> >>>>
> >>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> >>> If we are going to change the hypercall, then can we see about making it
> >>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
> >>> components which might want/need access to this information.  (i.e. I am
> >>> still looking for a reasonable way to get this information from Xen in
> >>> hwloc)
> >> In which case leaving the sysctl alone and just adding a new non-sysctl
> >> interface should be considered.
> > 
> > (Sorry for late reply)
> > 
> > Would a platform op be an option here or do you prefer a whole new 
> > hypercall?
> 
> From an abstract pov a platform op would be fine, but iirc you had
> a need for preempting, which doesn't work well for that hypercall.
> A whole new one seems overkill too. Perhaps slightly bending what
> physdevop-s are used for might be an option...

Unlike sysctls, physdevop-s are exposed to/stable for dom0 too aren't
they?

Ian.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-16 16:14           ` Boris Ostrovsky
@ 2015-01-16 16:20             ` Jan Beulich
  2015-01-16 16:34               ` Boris Ostrovsky
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2015-01-16 16:20 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, Andrew Cooper,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

>>> On 16.01.15 at 17:14, <boris.ostrovsky@oracle.com> wrote:
> On 01/16/2015 11:06 AM, Jan Beulich wrote:
>>>>> On 16.01.15 at 16:56, <boris.ostrovsky@oracle.com> wrote:
>>> On 01/07/2015 04:12 AM, Jan Beulich wrote:
>>>>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
>>>>> On 06/01/15 02:18, Boris Ostrovsky wrote:
>>>>>> Instead of copying data for each field in xen_sysctl_topologyinfo separately
>>>>>> put cpu/socket/node into a single structure and do a single copy for each
>>>>>> processor.
>>>>>>
>>>>>> There is also no need to copy whole op to user at the end, max_cpu_index is
>>>>>> sufficient
>>>>>>
>>>>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the
>>> fact
>>>>>> that these are used for CPU topology. Subsequent patch will add support for
>>>>>> PCI topology sysctl.
>>>>>>
>>>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>>> If we are going to change the hypercall, then can we see about making it
>>>>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
>>>>> components which might want/need access to this information.  (i.e. I am
>>>>> still looking for a reasonable way to get this information from Xen in
>>>>> hwloc)
>>>> In which case leaving the sysctl alone and just adding a new non-sysctl
>>>> interface should be considered.
>>> (Sorry for late reply)
>>>
>>> Would a platform op be an option here or do you prefer a whole new
>>> hypercall?
>>  From an abstract pov a platform op would be fine, but iirc you had
>> a need for preempting, which doesn't work well for that hypercall.
>> A whole new one seems overkill too. Perhaps slightly bending what
>> physdevop-s are used for might be an option...
> 
> Since the plan is to query for both CPU and IO topologies physdevops 
> doesn't seem to be a logical place for that, does it?

That's why I said "slightly bending ..."

> What is the problem with preempting while in platform op? We already do 
> this for microcode updates.

But there we don't need to alter the arguments passed to the
hypercall, whereas you will need a way to encode where to
continue from.

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-16 16:20             ` Jan Beulich
@ 2015-01-16 16:34               ` Boris Ostrovsky
  2015-01-16 16:42                 ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-16 16:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, ian.campbell, stefano.stabellini, Andrew Cooper,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

On 01/16/2015 11:20 AM, Jan Beulich wrote:
>>>> On 16.01.15 at 17:14, <boris.ostrovsky@oracle.com> wrote:
>> On 01/16/2015 11:06 AM, Jan Beulich wrote:
>>>>>> On 16.01.15 at 16:56, <boris.ostrovsky@oracle.com> wrote:
>>>> On 01/07/2015 04:12 AM, Jan Beulich wrote:
>>>>>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 06/01/15 02:18, Boris Ostrovsky wrote:
>>>>>>> Instead of copying data for each field in xen_sysctl_topologyinfo separately
>>>>>>> put cpu/socket/node into a single structure and do a single copy for each
>>>>>>> processor.
>>>>>>>
>>>>>>> There is also no need to copy whole op to user at the end, max_cpu_index is
>>>>>>> sufficient
>>>>>>>
>>>>>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the
>>>> fact
>>>>>>> that these are used for CPU topology. Subsequent patch will add support for
>>>>>>> PCI topology sysctl.
>>>>>>>
>>>>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>>>> If we are going to change the hypercall, then can we see about making it
>>>>>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
>>>>>> components which might want/need access to this information.  (i.e. I am
>>>>>> still looking for a reasonable way to get this information from Xen in
>>>>>> hwloc)
>>>>> In which case leaving the sysctl alone and just adding a new non-sysctl
>>>>> interface should be considered.
>>>> (Sorry for late reply)
>>>>
>>>> Would a platform op be an option here or do you prefer a whole new
>>>> hypercall?
>>>   From an abstract pov a platform op would be fine, but iirc you had
>>> a need for preempting, which doesn't work well for that hypercall.
>>> A whole new one seems overkill too. Perhaps slightly bending what
>>> physdevop-s are used for might be an option...
>> Since the plan is to query for both CPU and IO topologies physdevops
>> doesn't seem to be a logical place for that, does it?
> That's why I said "slightly bending ..."
>
>> What is the problem with preempting while in platform op? We already do
>> this for microcode updates.
> But there we don't need to alter the arguments passed to the
> hypercall, whereas you will need a way to encode where to
> continue from.

I needed to do this with last version and ended up having a field in the 
interface structure that hypervisor updates before issuing a continuation.

And I think I'd have to do the same for either platform op or physop 
since neither has an extra argument for passing via a continuation.

-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-16 16:16           ` Ian Campbell
@ 2015-01-16 16:34             ` Jan Beulich
  2015-01-16 16:38               ` Ian Campbell
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2015-01-16 16:34 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, stefano.stabellini, Andrew Cooper, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir, Boris Ostrovsky

>>> On 16.01.15 at 17:16, <Ian.Campbell@citrix.com> wrote:
> On Fri, 2015-01-16 at 16:06 +0000, Jan Beulich wrote:
>> >>> On 16.01.15 at 16:56, <boris.ostrovsky@oracle.com> wrote:
>> > On 01/07/2015 04:12 AM, Jan Beulich wrote:
>> >>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
>> >>> On 06/01/15 02:18, Boris Ostrovsky wrote:
>> >>>> Instead of copying data for each field in xen_sysctl_topologyinfo 
> separately
>> >>>> put cpu/socket/node into a single structure and do a single copy for each
>> >>>> processor.
>> >>>>
>> >>>> There is also no need to copy whole op to user at the end, max_cpu_index is
>> >>>> sufficient
>> >>>>
>> >>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the 
>> > fact
>> >>>> that these are used for CPU topology. Subsequent patch will add support for
>> >>>> PCI topology sysctl.
>> >>>>
>> >>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> >>> If we are going to change the hypercall, then can we see about making it
>> >>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
>> >>> components which might want/need access to this information.  (i.e. I am
>> >>> still looking for a reasonable way to get this information from Xen in
>> >>> hwloc)
>> >> In which case leaving the sysctl alone and just adding a new non-sysctl
>> >> interface should be considered.
>> > 
>> > (Sorry for late reply)
>> > 
>> > Would a platform op be an option here or do you prefer a whole new 
>> > hypercall?
>> 
>> From an abstract pov a platform op would be fine, but iirc you had
>> a need for preempting, which doesn't work well for that hypercall.
>> A whole new one seems overkill too. Perhaps slightly bending what
>> physdevop-s are used for might be an option...
> 
> Unlike sysctls, physdevop-s are exposed to/stable for dom0 too aren't
> they?

Sure, just like platformop-s. What is it I'm not understanding you
try to point out with your question?

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-16 16:34             ` Jan Beulich
@ 2015-01-16 16:38               ` Ian Campbell
  2015-01-16 16:45                 ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Ian Campbell @ 2015-01-16 16:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, stefano.stabellini, Andrew Cooper, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir, Boris Ostrovsky

On Fri, 2015-01-16 at 16:34 +0000, Jan Beulich wrote:
> >>> On 16.01.15 at 17:16, <Ian.Campbell@citrix.com> wrote:
> > On Fri, 2015-01-16 at 16:06 +0000, Jan Beulich wrote:
> >> >>> On 16.01.15 at 16:56, <boris.ostrovsky@oracle.com> wrote:
> >> > On 01/07/2015 04:12 AM, Jan Beulich wrote:
> >> >>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
> >> >>> On 06/01/15 02:18, Boris Ostrovsky wrote:
> >> >>>> Instead of copying data for each field in xen_sysctl_topologyinfo 
> > separately
> >> >>>> put cpu/socket/node into a single structure and do a single copy for each
> >> >>>> processor.
> >> >>>>
> >> >>>> There is also no need to copy whole op to user at the end, max_cpu_index is
> >> >>>> sufficient
> >> >>>>
> >> >>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the 
> >> > fact
> >> >>>> that these are used for CPU topology. Subsequent patch will add support for
> >> >>>> PCI topology sysctl.
> >> >>>>
> >> >>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> >> >>> If we are going to change the hypercall, then can we see about making it
> >> >>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
> >> >>> components which might want/need access to this information.  (i.e. I am
> >> >>> still looking for a reasonable way to get this information from Xen in
> >> >>> hwloc)
> >> >> In which case leaving the sysctl alone and just adding a new non-sysctl
> >> >> interface should be considered.
> >> > 
> >> > (Sorry for late reply)
> >> > 
> >> > Would a platform op be an option here or do you prefer a whole new 
> >> > hypercall?
> >> 
> >> From an abstract pov a platform op would be fine, but iirc you had
> >> a need for preempting, which doesn't work well for that hypercall.
> >> A whole new one seems overkill too. Perhaps slightly bending what
> >> physdevop-s are used for might be an option...
> > 
> > Unlike sysctls, physdevop-s are exposed to/stable for dom0 too aren't
> > they?
> 
> Sure, just like platformop-s. What is it I'm not understanding you
> try to point out with your question?

By moving from a sysctl to a physdev op we would then have to declare
the interface stable and lose the ability to change it in the future,
and since it didn't look like the intention here was to expose to dom0
(make more efficient didn't imply that at least) that seems a bit
unnecessary.

Ian.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-16 16:34               ` Boris Ostrovsky
@ 2015-01-16 16:42                 ` Jan Beulich
  0 siblings, 0 replies; 60+ messages in thread
From: Jan Beulich @ 2015-01-16 16:42 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, ian.campbell, stefano.stabellini, Andrew Cooper,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

>>> On 16.01.15 at 17:34, <boris.ostrovsky@oracle.com> wrote:
> On 01/16/2015 11:20 AM, Jan Beulich wrote:
>>>>> On 16.01.15 at 17:14, <boris.ostrovsky@oracle.com> wrote:
>>> On 01/16/2015 11:06 AM, Jan Beulich wrote:
>>>>>>> On 16.01.15 at 16:56, <boris.ostrovsky@oracle.com> wrote:
>>>>> On 01/07/2015 04:12 AM, Jan Beulich wrote:
>>>>>>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
>>>>>>> On 06/01/15 02:18, Boris Ostrovsky wrote:
>>>>>>>> Instead of copying data for each field in xen_sysctl_topologyinfo separately
>>>>>>>> put cpu/socket/node into a single structure and do a single copy for each
>>>>>>>> processor.
>>>>>>>>
>>>>>>>> There is also no need to copy whole op to user at the end, max_cpu_index is
>>>>>>>> sufficient
>>>>>>>>
>>>>>>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the
>>>>> fact
>>>>>>>> that these are used for CPU topology. Subsequent patch will add support for
>>>>>>>> PCI topology sysctl.
>>>>>>>>
>>>>>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>>>>> If we are going to change the hypercall, then can we see about making it
>>>>>>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
>>>>>>> components which might want/need access to this information.  (i.e. I am
>>>>>>> still looking for a reasonable way to get this information from Xen in
>>>>>>> hwloc)
>>>>>> In which case leaving the sysctl alone and just adding a new non-sysctl
>>>>>> interface should be considered.
>>>>> (Sorry for late reply)
>>>>>
>>>>> Would a platform op be an option here or do you prefer a whole new
>>>>> hypercall?
>>>>   From an abstract pov a platform op would be fine, but iirc you had
>>>> a need for preempting, which doesn't work well for that hypercall.
>>>> A whole new one seems overkill too. Perhaps slightly bending what
>>>> physdevop-s are used for might be an option...
>>> Since the plan is to query for both CPU and IO topologies physdevops
>>> doesn't seem to be a logical place for that, does it?
>> That's why I said "slightly bending ..."
>>
>>> What is the problem with preempting while in platform op? We already do
>>> this for microcode updates.
>> But there we don't need to alter the arguments passed to the
>> hypercall, whereas you will need a way to encode where to
>> continue from.
> 
> I needed to do this with last version and ended up having a field in the 
> interface structure that hypervisor updates before issuing a continuation.

While I'm not particularly happy about such, having an explicitly
clobbered field in the interface structure is certainly an option.

> And I think I'd have to do the same for either platform op or physop 
> since neither has an extra argument for passing via a continuation.

We have cmd, of which we could use the upper bits just like we do
elsewhere.

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-16 16:38               ` Ian Campbell
@ 2015-01-16 16:45                 ` Jan Beulich
  2015-01-16 16:57                   ` Andrew Cooper
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2015-01-16 16:45 UTC (permalink / raw)
  To: Ian Campbell
  Cc: wei.liu2, stefano.stabellini, Andrew Cooper, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir, Boris Ostrovsky

>>> On 16.01.15 at 17:38, <Ian.Campbell@citrix.com> wrote:
> On Fri, 2015-01-16 at 16:34 +0000, Jan Beulich wrote:
>> >>> On 16.01.15 at 17:16, <Ian.Campbell@citrix.com> wrote:
>> > On Fri, 2015-01-16 at 16:06 +0000, Jan Beulich wrote:
>> >> >>> On 16.01.15 at 16:56, <boris.ostrovsky@oracle.com> wrote:
>> >> > On 01/07/2015 04:12 AM, Jan Beulich wrote:
>> >> >>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
>> >> >>> On 06/01/15 02:18, Boris Ostrovsky wrote:
>> >> >>>> Instead of copying data for each field in xen_sysctl_topologyinfo 
>> > separately
>> >> >>>> put cpu/socket/node into a single structure and do a single copy for each
>> >> >>>> processor.
>> >> >>>>
>> >> >>>> There is also no need to copy whole op to user at the end, max_cpu_index 
> is
>> >> >>>> sufficient
>> >> >>>>
>> >> >>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the 
>> >> > fact
>> >> >>>> that these are used for CPU topology. Subsequent patch will add support 
> for
>> >> >>>> PCI topology sysctl.
>> >> >>>>
>> >> >>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> >> >>> If we are going to change the hypercall, then can we see about making it
>> >> >>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
>> >> >>> components which might want/need access to this information.  (i.e. I am
>> >> >>> still looking for a reasonable way to get this information from Xen in
>> >> >>> hwloc)
>> >> >> In which case leaving the sysctl alone and just adding a new non-sysctl
>> >> >> interface should be considered.
>> >> > 
>> >> > (Sorry for late reply)
>> >> > 
>> >> > Would a platform op be an option here or do you prefer a whole new 
>> >> > hypercall?
>> >> 
>> >> From an abstract pov a platform op would be fine, but iirc you had
>> >> a need for preempting, which doesn't work well for that hypercall.
>> >> A whole new one seems overkill too. Perhaps slightly bending what
>> >> physdevop-s are used for might be an option...
>> > 
>> > Unlike sysctls, physdevop-s are exposed to/stable for dom0 too aren't
>> > they?
>> 
>> Sure, just like platformop-s. What is it I'm not understanding you
>> try to point out with your question?
> 
> By moving from a sysctl to a physdev op we would then have to declare
> the interface stable and lose the ability to change it in the future,
> and since it didn't look like the intention here was to expose to dom0
> (make more efficient didn't imply that at least) that seems a bit
> unnecessary.

The conversion from sysctl was something Andrew had asked for.
After some consideration I had actually indicated I'm not really
convinced of the motivation he gave, but I don't think I heard
back on this. So _if_ we want to expose this to other than the
tool stack, then _of course_ the interface can't be changed at
our liking anymore (this stability was part of what Andrew wanted
iirc).

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-16 16:45                 ` Jan Beulich
@ 2015-01-16 16:57                   ` Andrew Cooper
  2015-01-16 17:07                     ` Boris Ostrovsky
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2015-01-16 16:57 UTC (permalink / raw)
  To: Jan Beulich, Ian Campbell
  Cc: wei.liu2, stefano.stabellini, dario.faggioli, ian.jackson,
	xen-devel, ufimtseva, keir, Boris Ostrovsky

On 16/01/15 16:45, Jan Beulich wrote:
>>>> On 16.01.15 at 17:38, <Ian.Campbell@citrix.com> wrote:
>> On Fri, 2015-01-16 at 16:34 +0000, Jan Beulich wrote:
>>>>>> On 16.01.15 at 17:16, <Ian.Campbell@citrix.com> wrote:
>>>> On Fri, 2015-01-16 at 16:06 +0000, Jan Beulich wrote:
>>>>>>>> On 16.01.15 at 16:56, <boris.ostrovsky@oracle.com> wrote:
>>>>>> On 01/07/2015 04:12 AM, Jan Beulich wrote:
>>>>>>>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
>>>>>>>> On 06/01/15 02:18, Boris Ostrovsky wrote:
>>>>>>>>> Instead of copying data for each field in xen_sysctl_topologyinfo 
>>>> separately
>>>>>>>>> put cpu/socket/node into a single structure and do a single copy for each
>>>>>>>>> processor.
>>>>>>>>>
>>>>>>>>> There is also no need to copy whole op to user at the end, max_cpu_index 
>> is
>>>>>>>>> sufficient
>>>>>>>>>
>>>>>>>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the 
>>>>>> fact
>>>>>>>>> that these are used for CPU topology. Subsequent patch will add support 
>> for
>>>>>>>>> PCI topology sysctl.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>>>>>> If we are going to change the hypercall, then can we see about making it
>>>>>>>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
>>>>>>>> components which might want/need access to this information.  (i.e. I am
>>>>>>>> still looking for a reasonable way to get this information from Xen in
>>>>>>>> hwloc)
>>>>>>> In which case leaving the sysctl alone and just adding a new non-sysctl
>>>>>>> interface should be considered.
>>>>>> (Sorry for late reply)
>>>>>>
>>>>>> Would a platform op be an option here or do you prefer a whole new 
>>>>>> hypercall?
>>>>> From an abstract pov a platform op would be fine, but iirc you had
>>>>> a need for preempting, which doesn't work well for that hypercall.
>>>>> A whole new one seems overkill too. Perhaps slightly bending what
>>>>> physdevop-s are used for might be an option...
>>>> Unlike sysctls, physdevop-s are exposed to/stable for dom0 too aren't
>>>> they?
>>> Sure, just like platformop-s. What is it I'm not understanding you
>>> try to point out with your question?
>> By moving from a sysctl to a physdev op we would then have to declare
>> the interface stable and lose the ability to change it in the future,
>> and since it didn't look like the intention here was to expose to dom0
>> (make more efficient didn't imply that at least) that seems a bit
>> unnecessary.
> The conversion from sysctl was something Andrew had asked for.
> After some consideration I had actually indicated I'm not really
> convinced of the motivation he gave, but I don't think I heard
> back on this. So _if_ we want to expose this to other than the
> tool stack, then _of course_ the interface can't be changed at
> our liking anymore (this stability was part of what Andrew wanted
> iirc).

The real question is whether this information is useful to anything
other than toolstack-like entities.

I have been partially dissuaded from my stance of "yes" on this point. 
While it is possible that there are toolstack-like entities which want
this information, there is almost nothing useful which could be done
without other toolstack gubbins in place.

~Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-16 16:57                   ` Andrew Cooper
@ 2015-01-16 17:07                     ` Boris Ostrovsky
  2015-01-19  8:57                       ` Jan Beulich
  0 siblings, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-16 17:07 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich, Ian Campbell
  Cc: wei.liu2, stefano.stabellini, dario.faggioli, ian.jackson,
	xen-devel, ufimtseva, keir

On 01/16/2015 11:57 AM, Andrew Cooper wrote:
> On 16/01/15 16:45, Jan Beulich wrote:
>>>>> On 16.01.15 at 17:38, <Ian.Campbell@citrix.com> wrote:
>>> On Fri, 2015-01-16 at 16:34 +0000, Jan Beulich wrote:
>>>>>>> On 16.01.15 at 17:16, <Ian.Campbell@citrix.com> wrote:
>>>>> On Fri, 2015-01-16 at 16:06 +0000, Jan Beulich wrote:
>>>>>>>>> On 16.01.15 at 16:56, <boris.ostrovsky@oracle.com> wrote:
>>>>>>> On 01/07/2015 04:12 AM, Jan Beulich wrote:
>>>>>>>>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>> On 06/01/15 02:18, Boris Ostrovsky wrote:
>>>>>>>>>> Instead of copying data for each field in xen_sysctl_topologyinfo
>>>>> separately
>>>>>>>>>> put cpu/socket/node into a single structure and do a single copy for each
>>>>>>>>>> processor.
>>>>>>>>>>
>>>>>>>>>> There is also no need to copy whole op to user at the end, max_cpu_index
>>> is
>>>>>>>>>> sufficient
>>>>>>>>>>
>>>>>>>>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the
>>>>>>> fact
>>>>>>>>>> that these are used for CPU topology. Subsequent patch will add support
>>> for
>>>>>>>>>> PCI topology sysctl.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>>>>>>> If we are going to change the hypercall, then can we see about making it
>>>>>>>>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
>>>>>>>>> components which might want/need access to this information.  (i.e. I am
>>>>>>>>> still looking for a reasonable way to get this information from Xen in
>>>>>>>>> hwloc)
>>>>>>>> In which case leaving the sysctl alone and just adding a new non-sysctl
>>>>>>>> interface should be considered.
>>>>>>> (Sorry for late reply)
>>>>>>>
>>>>>>> Would a platform op be an option here or do you prefer a whole new
>>>>>>> hypercall?
>>>>>>  From an abstract pov a platform op would be fine, but iirc you had
>>>>>> a need for preempting, which doesn't work well for that hypercall.
>>>>>> A whole new one seems overkill too. Perhaps slightly bending what
>>>>>> physdevop-s are used for might be an option...
>>>>> Unlike sysctls, physdevop-s are exposed to/stable for dom0 too aren't
>>>>> they?
>>>> Sure, just like platformop-s. What is it I'm not understanding you
>>>> try to point out with your question?
>>> By moving from a sysctl to a physdev op we would then have to declare
>>> the interface stable and lose the ability to change it in the future,
>>> and since it didn't look like the intention here was to expose to dom0
>>> (make more efficient didn't imply that at least) that seems a bit
>>> unnecessary.
>> The conversion from sysctl was something Andrew had asked for.
>> After some consideration I had actually indicated I'm not really
>> convinced of the motivation he gave, but I don't think I heard
>> back on this. So _if_ we want to expose this to other than the
>> tool stack, then _of course_ the interface can't be changed at
>> our liking anymore (this stability was part of what Andrew wanted
>> iirc).
> The real question is whether this information is useful to anything
> other than toolstack-like entities.
>
> I have been partially dissuaded from my stance of "yes" on this point.
> While it is possible that there are toolstack-like entities which want
> this information, there is almost nothing useful which could be done
> without other toolstack gubbins in place.

So then you are OK with keeping this in sysctl? (I actually must have 
misunderstood Jan's position on that from earlier discussion --- I 
thought he was advocating for moving away from sysctl as well).

Also, a side question --- how is platform ops interface considered 
stable if it is versioned? I may have a different understanding of what 
"stable" means.

-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-16 17:07                     ` Boris Ostrovsky
@ 2015-01-19  8:57                       ` Jan Beulich
  2015-01-19 10:48                         ` Ian Campbell
  0 siblings, 1 reply; 60+ messages in thread
From: Jan Beulich @ 2015-01-19  8:57 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: wei.liu2, Ian Campbell, stefano.stabellini, Andrew Cooper,
	dario.faggioli, ian.jackson, xen-devel, ufimtseva, keir

>>> On 16.01.15 at 18:07, <boris.ostrovsky@oracle.com> wrote:
> On 01/16/2015 11:57 AM, Andrew Cooper wrote:
>> On 16/01/15 16:45, Jan Beulich wrote:
>>>>>> On 16.01.15 at 17:38, <Ian.Campbell@citrix.com> wrote:
>>>> On Fri, 2015-01-16 at 16:34 +0000, Jan Beulich wrote:
>>>>>>>> On 16.01.15 at 17:16, <Ian.Campbell@citrix.com> wrote:
>>>>>> On Fri, 2015-01-16 at 16:06 +0000, Jan Beulich wrote:
>>>>>>>>>> On 16.01.15 at 16:56, <boris.ostrovsky@oracle.com> wrote:
>>>>>>>> On 01/07/2015 04:12 AM, Jan Beulich wrote:
>>>>>>>>>>>> On 06.01.15 at 14:41, <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>> On 06/01/15 02:18, Boris Ostrovsky wrote:
>>>>>>>>>>> Instead of copying data for each field in xen_sysctl_topologyinfo
>>>>>> separately
>>>>>>>>>>> put cpu/socket/node into a single structure and do a single copy for each
>>>>>>>>>>> processor.
>>>>>>>>>>>
>>>>>>>>>>> There is also no need to copy whole op to user at the end, max_cpu_index
>>>> is
>>>>>>>>>>> sufficient
>>>>>>>>>>>
>>>>>>>>>>> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the
>>>>>>>> fact
>>>>>>>>>>> that these are used for CPU topology. Subsequent patch will add support
>>>> for
>>>>>>>>>>> PCI topology sysctl.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>>>>>>>> If we are going to change the hypercall, then can we see about making it
>>>>>>>>>> a stable interface (i.e. not a sysctl/domctl)?  There are non-toolstack
>>>>>>>>>> components which might want/need access to this information.  (i.e. I am
>>>>>>>>>> still looking for a reasonable way to get this information from Xen in
>>>>>>>>>> hwloc)
>>>>>>>>> In which case leaving the sysctl alone and just adding a new non-sysctl
>>>>>>>>> interface should be considered.
>>>>>>>> (Sorry for late reply)
>>>>>>>>
>>>>>>>> Would a platform op be an option here or do you prefer a whole new
>>>>>>>> hypercall?
>>>>>>>  From an abstract pov a platform op would be fine, but iirc you had
>>>>>>> a need for preempting, which doesn't work well for that hypercall.
>>>>>>> A whole new one seems overkill too. Perhaps slightly bending what
>>>>>>> physdevop-s are used for might be an option...
>>>>>> Unlike sysctls, physdevop-s are exposed to/stable for dom0 too aren't
>>>>>> they?
>>>>> Sure, just like platformop-s. What is it I'm not understanding you
>>>>> try to point out with your question?
>>>> By moving from a sysctl to a physdev op we would then have to declare
>>>> the interface stable and lose the ability to change it in the future,
>>>> and since it didn't look like the intention here was to expose to dom0
>>>> (make more efficient didn't imply that at least) that seems a bit
>>>> unnecessary.
>>> The conversion from sysctl was something Andrew had asked for.
>>> After some consideration I had actually indicated I'm not really
>>> convinced of the motivation he gave, but I don't think I heard
>>> back on this. So _if_ we want to expose this to other than the
>>> tool stack, then _of course_ the interface can't be changed at
>>> our liking anymore (this stability was part of what Andrew wanted
>>> iirc).
>> The real question is whether this information is useful to anything
>> other than toolstack-like entities.
>>
>> I have been partially dissuaded from my stance of "yes" on this point.
>> While it is possible that there are toolstack-like entities which want
>> this information, there is almost nothing useful which could be done
>> without other toolstack gubbins in place.
> 
> So then you are OK with keeping this in sysctl? (I actually must have 
> misunderstood Jan's position on that from earlier discussion --- I 
> thought he was advocating for moving away from sysctl as well).

I initially was, but then reconsidered Andrew's arguments and
became unconvinced.

> Also, a side question --- how is platform ops interface considered 
> stable if it is versioned? I may have a different understanding of what 
> "stable" means.

I'm not sure why this has a version associated with it. In any
event that version didn't ever change since its introduction, by
means of which the interface has been stable.

Jan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-19  8:57                       ` Jan Beulich
@ 2015-01-19 10:48                         ` Ian Campbell
  0 siblings, 0 replies; 60+ messages in thread
From: Ian Campbell @ 2015-01-19 10:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, stefano.stabellini, Andrew Cooper, dario.faggioli,
	ian.jackson, xen-devel, ufimtseva, keir, Boris Ostrovsky

On Mon, 2015-01-19 at 08:57 +0000, Jan Beulich wrote:
> >>> On 16.01.15 at 18:07, <boris.ostrovsky@oracle.com> wrote:
> > Also, a side question --- how is platform ops interface considered 
> > stable if it is versioned? I may have a different understanding of what 
> > "stable" means.
> 
> I'm not sure why this has a version associated with it. In any
> event that version didn't ever change since its introduction, by
> means of which the interface has been stable.

It was split off from the old non-stable/versioned dom0_ops when that
was split into platform_op, domctl and sysctl, partly to allow it to
become stable. I suspect the versioning of the platforms_ops is just a
historical relic carried over from before then.

Ian.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
  2015-01-06  2:18 ` [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient Boris Ostrovsky
  2015-01-06 13:41   ` Andrew Cooper
@ 2015-01-19 17:26   ` Ian Campbell
  1 sibling, 0 replies; 60+ messages in thread
From: Ian Campbell @ 2015-01-19 17:26 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, ufimtseva, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, wei.liu2

On Mon, 2015-01-05 at 21:18 -0500, Boris Ostrovsky wrote:
> Instead of copying data for each field in xen_sysctl_topologyinfo separately
> put cpu/socket/node into a single structure and do a single copy for each
> processor.
> 
> There is also no need to copy whole op to user at the end, max_cpu_index is
> sufficient
> 
> Rename xen_sysctl_topologyinfo and XEN_SYSCTL_topologyinfo to reflect the fact
> that these are used for CPU topology. Subsequent patch will add support for
> PCI topology sysctl.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
>  tools/libxc/include/xenctrl.h     |    4 +-
>  tools/libxc/xc_misc.c             |   10 +++---
>  tools/libxl/libxl.c               |   52 ++++++++++-----------------
>  tools/misc/xenpm.c                |   69 ++++++++++++++----------------------
>  tools/python/xen/lowlevel/xc/xc.c |   40 +++++++--------------

FWIW the tools changes look correct to me (as expected, being largely
mechanical renamings), I think there's still a decision to be made about
which hypercall this is to be a sub-op of, but if the hypervisor guys
agree on this approach then the tools part is:
    Acked-by: Ian Campbell <ian.campbell@citrix.com>

(I'm not made keen on cputopo as an abbreviation, but not enough to ask
you to type "logy" over and over ;-))

Ian.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
  2015-01-06  2:18 ` [PATCH v2 4/4] libxl: Add interface for querying hypervisor about " Boris Ostrovsky
  2015-01-06 17:08   ` Wei Liu
  2015-01-07  9:04   ` Dario Faggioli
@ 2015-01-19 17:32   ` Ian Campbell
  2015-01-20 10:54     ` Roger Pau Monné
                       ` (2 more replies)
  2 siblings, 3 replies; 60+ messages in thread
From: Ian Campbell @ 2015-01-19 17:32 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, ufimtseva, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, chegger, jbeulich,
	wei.liu2, Roger Pau Monne

On Mon, 2015-01-05 at 21:18 -0500, Boris Ostrovsky wrote:

>  tools/libxl/libxl_freebsd.c   |   12 +++++++
>  tools/libxl/libxl_netbsd.c    |   12 +++++++

These are stubs, CCing some relevant folks to see if they would like to
provide an implementation.

> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index cd87614..888f068 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5121,6 +5121,64 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
>      return ret;
>  }
>  
> +libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs)
> +{
> +    GC_INIT(ctx);
> +    xc_pcitopoinfo_t tinfo;
> +    DECLARE_HYPERCALL_BUFFER(xen_sysctl_pcitopo_t, pcitopo);
> +    libxl_pcitopology *ret = NULL;
> +    int i, rc;
> +
> +    tinfo.num_devs = libxl__pci_numdevs(gc);
> +    if (tinfo.num_devs <= 0) {
> +        LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of PCI devices");

Please use the shorter LOG*() macros, which will avoid long lines and
some wrapping.

> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 0a123f1..eb83f0a 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -1070,6 +1070,10 @@ void libxl_vminfo_list_free(libxl_vminfo *list, int nb_vm);
>  libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out);
>  void libxl_cputopology_list_free(libxl_cputopology *, int nb_cpu);
>  
> +#define LIBXL_PCITOPOLOGY_INVALID_ENTRY (~(uint32_t)0)
> +libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_dev);
> +void libxl_pcitopology_list_free(libxl_pcitopology *, int num_dev);

Needs a #define LIBXL_HAVE_FOO #define to advertise the new
functionality.

> diff --git a/tools/libxl/libxl_linux.c b/tools/libxl/libxl_linux.c
> index ea5d8c1..07428c0 100644
> --- a/tools/libxl/libxl_linux.c
> +++ b/tools/libxl/libxl_linux.c
> @@ -279,3 +279,74 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
>  {
>      return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
>  }
> +
> +/* These two routines are "inspired" by pciutils */

"inspired" in a licensing consistent way?

Nothing else above the comments made by others.
Ian.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
  2015-01-19 17:32   ` Ian Campbell
@ 2015-01-20 10:54     ` Roger Pau Monné
  2015-01-20 10:56       ` Ian Campbell
  2015-01-20 15:15     ` Boris Ostrovsky
  2015-01-20 16:08     ` Egger, Christoph
  2 siblings, 1 reply; 60+ messages in thread
From: Roger Pau Monné @ 2015-01-20 10:54 UTC (permalink / raw)
  To: Ian Campbell, Boris Ostrovsky
  Cc: keir, ufimtseva, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, chegger, jbeulich,
	wei.liu2

El 19/01/15 a les 18.32, Ian Campbell ha escrit:
> On Mon, 2015-01-05 at 21:18 -0500, Boris Ostrovsky wrote:
> 
>>  tools/libxl/libxl_freebsd.c   |   12 +++++++
>>  tools/libxl/libxl_netbsd.c    |   12 +++++++
> 
> These are stubs, CCing some relevant folks to see if they would like to
> provide an implementation.

Not right now, I'm afraid until FreeBSD gains pci-passthrough support
this is not going to be implemented.

Roger.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
  2015-01-20 10:54     ` Roger Pau Monné
@ 2015-01-20 10:56       ` Ian Campbell
  0 siblings, 0 replies; 60+ messages in thread
From: Ian Campbell @ 2015-01-20 10:56 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: keir, ufimtseva, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, chegger, jbeulich,
	wei.liu2, Boris Ostrovsky

On Tue, 2015-01-20 at 11:54 +0100, Roger Pau Monné wrote:
> El 19/01/15 a les 18.32, Ian Campbell ha escrit:
> > On Mon, 2015-01-05 at 21:18 -0500, Boris Ostrovsky wrote:
> > 
> >>  tools/libxl/libxl_freebsd.c   |   12 +++++++
> >>  tools/libxl/libxl_netbsd.c    |   12 +++++++
> > 
> > These are stubs, CCing some relevant folks to see if they would like to
> > provide an implementation.
> 
> Not right now, I'm afraid until FreeBSD gains pci-passthrough support
> this is not going to be implemented.

Ah, I didn't realise it didn't have that. It's of course fair enough to
tie those together!

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
  2015-01-19 17:32   ` Ian Campbell
  2015-01-20 10:54     ` Roger Pau Monné
@ 2015-01-20 15:15     ` Boris Ostrovsky
  2015-01-20 15:21       ` Ian Campbell
  2015-01-20 16:08     ` Egger, Christoph
  2 siblings, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-20 15:15 UTC (permalink / raw)
  To: Ian Campbell
  Cc: keir, ufimtseva, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, chegger, jbeulich,
	wei.liu2, Roger Pau Monne

On 01/19/2015 12:32 PM, Ian Campbell wrote:
>
>> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
>> index 0a123f1..eb83f0a 100644
>> --- a/tools/libxl/libxl.h
>> +++ b/tools/libxl/libxl.h
>> @@ -1070,6 +1070,10 @@ void libxl_vminfo_list_free(libxl_vminfo *list, int nb_vm);
>>   libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out);
>>   void libxl_cputopology_list_free(libxl_cputopology *, int nb_cpu);
>>   
>> +#define LIBXL_PCITOPOLOGY_INVALID_ENTRY (~(uint32_t)0)
>> +libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_dev);
>> +void libxl_pcitopology_list_free(libxl_pcitopology *, int num_dev);
> Needs a #define LIBXL_HAVE_FOO #define to advertise the new
> functionality.

Ah, yes, Wei pointed it out to me earlier and I missed it.

>
>> diff --git a/tools/libxl/libxl_linux.c b/tools/libxl/libxl_linux.c
>> index ea5d8c1..07428c0 100644
>> --- a/tools/libxl/libxl_linux.c
>> +++ b/tools/libxl/libxl_linux.c
>> @@ -279,3 +279,74 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
>>   {
>>       return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
>>   }
>> +
>> +/* These two routines are "inspired" by pciutils */
> "inspired" in a licensing consistent way?

pciutils is licensed under GPL2 (e.g. 
http://git.kernel.org/cgit/utils/pciutils/pciutils.git/tree/COPYING) so 
this would be consistent.

Thanks.
-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
  2015-01-20 15:15     ` Boris Ostrovsky
@ 2015-01-20 15:21       ` Ian Campbell
  2015-01-20 16:04         ` Boris Ostrovsky
  0 siblings, 1 reply; 60+ messages in thread
From: Ian Campbell @ 2015-01-20 15:21 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, jbeulich, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, chegger, ufimtseva,
	wei.liu2, Roger Pau Monne

On Tue, 2015-01-20 at 10:15 -0500, Boris Ostrovsky wrote:
> >> diff --git a/tools/libxl/libxl_linux.c b/tools/libxl/libxl_linux.c
> >> index ea5d8c1..07428c0 100644
> >> --- a/tools/libxl/libxl_linux.c
> >> +++ b/tools/libxl/libxl_linux.c
> >> @@ -279,3 +279,74 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
> >>   {
> >>       return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
> >>   }
> >> +
> >> +/* These two routines are "inspired" by pciutils */
> > "inspired" in a licensing consistent way?
> 
> pciutils is licensed under GPL2 (e.g. 
> http://git.kernel.org/cgit/utils/pciutils/pciutils.git/tree/COPYING) so 
> this would be consistent.

libxl is LGPL 2.1 not GPL2, and I'm afraid transitioning code in that
direction is not allowed by the license.

Ian.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
  2015-01-20 15:21       ` Ian Campbell
@ 2015-01-20 16:04         ` Boris Ostrovsky
  2015-01-20 16:15           ` Ian Campbell
  0 siblings, 1 reply; 60+ messages in thread
From: Boris Ostrovsky @ 2015-01-20 16:04 UTC (permalink / raw)
  To: Ian Campbell
  Cc: keir, jbeulich, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, chegger, ufimtseva,
	wei.liu2, Roger Pau Monne

On 01/20/2015 10:21 AM, Ian Campbell wrote:
> On Tue, 2015-01-20 at 10:15 -0500, Boris Ostrovsky wrote:
>>>> diff --git a/tools/libxl/libxl_linux.c b/tools/libxl/libxl_linux.c
>>>> index ea5d8c1..07428c0 100644
>>>> --- a/tools/libxl/libxl_linux.c
>>>> +++ b/tools/libxl/libxl_linux.c
>>>> @@ -279,3 +279,74 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
>>>>    {
>>>>        return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
>>>>    }
>>>> +
>>>> +/* These two routines are "inspired" by pciutils */
>>> "inspired" in a licensing consistent way?
>> pciutils is licensed under GPL2 (e.g.
>> http://git.kernel.org/cgit/utils/pciutils/pciutils.git/tree/COPYING) so
>> this would be consistent.
> libxl is LGPL 2.1 not GPL2, and I'm afraid transitioning code in that
> direction is not allowed by the license.

I didn't realize libxl was licensed differently from the rest of Xen.

The code is not really coped from them (save for one-line comment) and 
it's kind of hard to write it any differently, given that it's only a 
few lines. The only reason I mentioned it was because I was parsing 
/proc/bus/pci/devices before and while debugging some other problem 
noticed that pciutils was using /sys/bus/pci/devices, which is slightly 
better.

Perhaps I should get in touch with pciutils copyright holder to see if 
he objects.


-boris

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
  2015-01-19 17:32   ` Ian Campbell
  2015-01-20 10:54     ` Roger Pau Monné
  2015-01-20 15:15     ` Boris Ostrovsky
@ 2015-01-20 16:08     ` Egger, Christoph
  2 siblings, 0 replies; 60+ messages in thread
From: Egger, Christoph @ 2015-01-20 16:08 UTC (permalink / raw)
  To: Ian Campbell, Boris Ostrovsky
  Cc: keir, ufimtseva, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, jbeulich, wei.liu2,
	Roger Pau Monne

On 2015/01/19 18:32, Ian Campbell wrote:
> On Mon, 2015-01-05 at 21:18 -0500, Boris Ostrovsky wrote:
> 
>>  tools/libxl/libxl_freebsd.c   |   12 +++++++
>>  tools/libxl/libxl_netbsd.c    |   12 +++++++
> 
> These are stubs, CCing some relevant folks to see if they would like to
> provide an implementation.

For netbsd I recommend CCing port-xen@netbsd.org.

Christoph

> 
>> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
>> index cd87614..888f068 100644
>> --- a/tools/libxl/libxl.c
>> +++ b/tools/libxl/libxl.c
>> @@ -5121,6 +5121,64 @@ libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out)
>>      return ret;
>>  }
>>  
>> +libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_devs)
>> +{
>> +    GC_INIT(ctx);
>> +    xc_pcitopoinfo_t tinfo;
>> +    DECLARE_HYPERCALL_BUFFER(xen_sysctl_pcitopo_t, pcitopo);
>> +    libxl_pcitopology *ret = NULL;
>> +    int i, rc;
>> +
>> +    tinfo.num_devs = libxl__pci_numdevs(gc);
>> +    if (tinfo.num_devs <= 0) {
>> +        LIBXL__LOG(ctx, XTL_ERROR, "Unable to determine number of PCI devices");
> 
> Please use the shorter LOG*() macros, which will avoid long lines and
> some wrapping.
> 
>> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
>> index 0a123f1..eb83f0a 100644
>> --- a/tools/libxl/libxl.h
>> +++ b/tools/libxl/libxl.h
>> @@ -1070,6 +1070,10 @@ void libxl_vminfo_list_free(libxl_vminfo *list, int nb_vm);
>>  libxl_cputopology *libxl_get_cpu_topology(libxl_ctx *ctx, int *nb_cpu_out);
>>  void libxl_cputopology_list_free(libxl_cputopology *, int nb_cpu);
>>  
>> +#define LIBXL_PCITOPOLOGY_INVALID_ENTRY (~(uint32_t)0)
>> +libxl_pcitopology *libxl_get_pci_topology(libxl_ctx *ctx, int *num_dev);
>> +void libxl_pcitopology_list_free(libxl_pcitopology *, int num_dev);
> 
> Needs a #define LIBXL_HAVE_FOO #define to advertise the new
> functionality.
> 
>> diff --git a/tools/libxl/libxl_linux.c b/tools/libxl/libxl_linux.c
>> index ea5d8c1..07428c0 100644
>> --- a/tools/libxl/libxl_linux.c
>> +++ b/tools/libxl/libxl_linux.c
>> @@ -279,3 +279,74 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
>>  {
>>      return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
>>  }
>> +
>> +/* These two routines are "inspired" by pciutils */
> 
> "inspired" in a licensing consistent way?
> 
> Nothing else above the comments made by others.
> Ian.
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 4/4] libxl: Add interface for querying hypervisor about PCI topology
  2015-01-20 16:04         ` Boris Ostrovsky
@ 2015-01-20 16:15           ` Ian Campbell
  0 siblings, 0 replies; 60+ messages in thread
From: Ian Campbell @ 2015-01-20 16:15 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: keir, ufimtseva, stefano.stabellini, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel, chegger, jbeulich,
	wei.liu2, Roger Pau Monne

On Tue, 2015-01-20 at 11:04 -0500, Boris Ostrovsky wrote:
> On 01/20/2015 10:21 AM, Ian Campbell wrote:
> > On Tue, 2015-01-20 at 10:15 -0500, Boris Ostrovsky wrote:
> >>>> diff --git a/tools/libxl/libxl_linux.c b/tools/libxl/libxl_linux.c
> >>>> index ea5d8c1..07428c0 100644
> >>>> --- a/tools/libxl/libxl_linux.c
> >>>> +++ b/tools/libxl/libxl_linux.c
> >>>> @@ -279,3 +279,74 @@ libxl_device_model_version libxl__default_device_model(libxl__gc *gc)
> >>>>    {
> >>>>        return LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
> >>>>    }
> >>>> +
> >>>> +/* These two routines are "inspired" by pciutils */
> >>> "inspired" in a licensing consistent way?
> >> pciutils is licensed under GPL2 (e.g.
> >> http://git.kernel.org/cgit/utils/pciutils/pciutils.git/tree/COPYING) so
> >> this would be consistent.
> > libxl is LGPL 2.1 not GPL2, and I'm afraid transitioning code in that
> > direction is not allowed by the license.
> 
> I didn't realize libxl was licensed differently from the rest of Xen.

Lots of the tools side is licensed under the LGPL, I believe the files
all have appropriate headers, and most subdirectories have a suitable
COPYING, but it looks like libxl does not.

> The code is not really coped from them (save for one-line comment) and 
> it's kind of hard to write it any differently, given that it's only a 
> few lines. The only reason I mentioned it was because I was parsing 
> /proc/bus/pci/devices before and while debugging some other problem 
> noticed that pciutils was using /sys/bus/pci/devices, which is slightly 
> better.
> 
> Perhaps I should get in touch with pciutils copyright holder to see if 
> he objects.

If you really feel you haven't copied any of their code I don't think
that would be necessary.

Ian.

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2015-01-20 16:15 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-06  2:18 [PATCH v2 0/4] Display IO topology when PXM data is available Boris Ostrovsky
2015-01-06  2:18 ` [PATCH v2 1/4] pci: Do not ignore device's PXM information Boris Ostrovsky
2015-01-06 11:55   ` Andrew Cooper
2015-01-07  9:01     ` Jan Beulich
2015-01-07  9:06   ` Jan Beulich
2015-01-07 14:42     ` Boris Ostrovsky
2015-01-07 14:47       ` Andrew Cooper
2015-01-07 15:07         ` Jan Beulich
2015-01-07 15:34           ` Boris Ostrovsky
2015-01-07 15:46             ` Jan Beulich
2015-01-07 15:06       ` Jan Beulich
2015-01-07 15:31         ` Boris Ostrovsky
2015-01-07 15:44           ` Jan Beulich
2015-01-06  2:18 ` [PATCH v2 2/4] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient Boris Ostrovsky
2015-01-06 13:41   ` Andrew Cooper
2015-01-06 14:45     ` Boris Ostrovsky
2015-01-07  9:12     ` Jan Beulich
2015-01-07 14:45       ` Boris Ostrovsky
2015-01-07 15:09         ` Jan Beulich
2015-01-16 15:56       ` Boris Ostrovsky
2015-01-16 16:06         ` Jan Beulich
2015-01-16 16:14           ` Boris Ostrovsky
2015-01-16 16:20             ` Jan Beulich
2015-01-16 16:34               ` Boris Ostrovsky
2015-01-16 16:42                 ` Jan Beulich
2015-01-16 16:16           ` Ian Campbell
2015-01-16 16:34             ` Jan Beulich
2015-01-16 16:38               ` Ian Campbell
2015-01-16 16:45                 ` Jan Beulich
2015-01-16 16:57                   ` Andrew Cooper
2015-01-16 17:07                     ` Boris Ostrovsky
2015-01-19  8:57                       ` Jan Beulich
2015-01-19 10:48                         ` Ian Campbell
2015-01-07 15:23     ` Jan Beulich
2015-01-19 17:26   ` Ian Campbell
2015-01-06  2:18 ` [PATCH v2 3/4] sysctl: Add sysctl interface for querying PCI topology Boris Ostrovsky
2015-01-06 16:55   ` Wei Liu
2015-01-06 18:15     ` Boris Ostrovsky
2015-01-07  9:21   ` Jan Beulich
2015-01-07 14:55     ` Boris Ostrovsky
2015-01-07 15:17       ` Jan Beulich
2015-01-07 15:54         ` Boris Ostrovsky
2015-01-07 16:52           ` Jan Beulich
2015-01-07 17:55           ` Dario Faggioli
2015-01-08  9:50             ` Jan Beulich
2015-01-08 15:49               ` Boris Ostrovsky
2015-01-08 15:54                 ` Jan Beulich
2015-01-06  2:18 ` [PATCH v2 4/4] libxl: Add interface for querying hypervisor about " Boris Ostrovsky
2015-01-06 17:08   ` Wei Liu
2015-01-07  9:04   ` Dario Faggioli
2015-01-07 14:15     ` Boris Ostrovsky
2015-01-07 14:45       ` Dario Faggioli
2015-01-19 17:32   ` Ian Campbell
2015-01-20 10:54     ` Roger Pau Monné
2015-01-20 10:56       ` Ian Campbell
2015-01-20 15:15     ` Boris Ostrovsky
2015-01-20 15:21       ` Ian Campbell
2015-01-20 16:04         ` Boris Ostrovsky
2015-01-20 16:15           ` Ian Campbell
2015-01-20 16:08     ` Egger, Christoph

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.