From: George Dunlap <george.dunlap@eu.citrix.com>
To: Dario Faggioli <raistlin@linux.it>
Cc: Andre Przywara <andre.przywara@amd.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
	Juergen Gross <juergen.gross@ts.fujitsu.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jan Beulich <JBeulich@suse.com>
Subject: Re: [PATCH 05 of 10 [RFC]] xl: Explicit node affinity specification for guests via config file
Date: Thu, 12 Apr 2012 11:24:46 +0100	[thread overview]
Message-ID: <4F86AD6E.3050705@eu.citrix.com> (raw)
In-Reply-To: <7e76233448b02810f0ae.1334150272@Solace>

On 11/04/12 14:17, Dario Faggioli wrote:
> Let the user specify the NUMA node affinity of a guest via the
> 'nodes=' config switch. Explicitly listing the intended target host
> nodes is required as of now.
>
> A valid setting will directly impact the node_affinity mask
> of the guest, i.e., it will change the behaviour of the low level
> memory allocator. On the other hand, this commit does not affect
> by any means how the guest's vCPUs are scheduled on the host's
> pCPUs.
I would probably break this into three separate patches, starting with 
the hypervisor, then libxc, then libxl, especially since they tend to 
have different maintainers and committers.
>
> TODO: * Better investigate and test interactions with cpupools.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -112,6 +112,15 @@ List of which cpus the guest is allowed
>   (all vcpus will run on cpus 0,2,3,5), or `cpus=["2", "3"]` (all vcpus
>   will run on cpus 2 and 3).
>
> +=item B<nodes=[ NODE, NODE, ...]>
> +
> +List of the NUMA nodes of the host the guest should be considered
> +`affine` with. Being affine to a (set of) NUMA node(s) mainly means
> +the guest's memory is going to be allocated on those node(s).
> +
> +A list of nodes should be specified as follows: `nodes=["0", "3"]`
> +for the guest to be affine with nodes 0 and 3.
> +
Hmm, I think that using "affine" here is technically correct, and is
what one would use in a research paper; but it's unusual in everyday
English, where one would more commonly describe a VM as "having
affinity with" something.  How about something like this:

"The list of NUMA nodes the domain is considered to have affinity with.
Memory for the domain will be allocated from these nodes."

(Technically you're also not supposed to end a sentence with a 
preposition, but I think "...with which the domain is considered to have 
affinity" is just too awkward.)

Also, is there a way to specify that the affinity should cover all
nodes, and/or be derived from the vcpus' cpu-affinity masks?
>   =item B<memory=MBYTES>
>
>   Start the guest with MBYTES megabytes of RAM.
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -110,6 +110,44 @@ int xc_domain_shutdown(xc_interface *xch
>   }
>
>
> +int xc_domain_node_affinity(xc_interface *xch,
> +                            uint32_t domid,
> +                            xc_nodemap_t node_affinity)
> +{
> +    DECLARE_DOMCTL;
> +    DECLARE_HYPERCALL_BUFFER(uint8_t, local);
> +    int ret = -1;
> +    int nodesize;
> +
> +    nodesize = xc_get_nodemap_size(xch);
> +    if (!nodesize)
> +    {
> +        PERROR("Could not get number of nodes");
> +        goto out;
> +    }
> +
> +    local = xc_hypercall_buffer_alloc(xch, local, nodesize);
> +    if ( local == NULL )
> +    {
> +        PERROR("Could not allocate memory for domain_node_affinity domctl hypercall");
> +        goto out;
> +    }
> +
> +    domctl.cmd = XEN_DOMCTL_node_affinity;
> +    domctl.domain = (domid_t)domid;
> +
> +    memcpy(local, node_affinity, nodesize);
> +    set_xen_guest_handle(domctl.u.node_affinity.nodemap.bitmap, local);
> +    domctl.u.node_affinity.nodemap.nr_elems = nodesize * 8;
> +
> +    ret = do_domctl(xch, &domctl);
> +
> +    xc_hypercall_buffer_free(xch, local);
> +
> + out:
> +    return ret;
> +}
> +
>   int xc_vcpu_setaffinity(xc_interface *xch,
>                           uint32_t domid,
>                           int vcpu,
> diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
> --- a/tools/libxc/xenctrl.h
> +++ b/tools/libxc/xenctrl.h
> @@ -520,6 +520,19 @@ int xc_watchdog(xc_interface *xch,
>                  uint32_t id,
>                  uint32_t timeout);
>
> +/**
> + * This function explicitly sets the affinity of a domain with the
> + * host NUMA nodes.
> + *
> + * @parm xch a handle to an open hypervisor interface.
> + * @parm domid the domain id whose node affinity is to be set.
> + * @parm node_affinity the map of the nodes the domain has affinity with.
> + * @return 0 on success, -1 on failure.
> + */
> +int xc_domain_node_affinity(xc_interface *xch,
> +                            uint32_t domid,
> +                            xc_nodemap_t node_affinity);
> +
Seems like it would be useful to have both a get and a set.
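For the "get" side I was imagining something roughly like the sketch
below.  To be clear, xc_domain_node_getaffinity() and the "get" flavour
of the domctl are names I'm making up here, just mirroring your setter:

    int xc_domain_node_getaffinity(xc_interface *xch,
                                   uint32_t domid,
                                   xc_nodemap_t node_affinity)
    {
        DECLARE_DOMCTL;
        DECLARE_HYPERCALL_BUFFER(uint8_t, local);
        int ret = -1;
        int nodesize;

        nodesize = xc_get_nodemap_size(xch);
        if ( !nodesize )
        {
            PERROR("Could not get number of nodes");
            return -1;
        }

        local = xc_hypercall_buffer_alloc(xch, local, nodesize);
        if ( local == NULL )
        {
            PERROR("Could not allocate memory for the nodemap");
            return -1;
        }

        /* Hypothetical "get" sub-op; the hypervisor fills in the bitmap */
        domctl.cmd = XEN_DOMCTL_getnodeaffinity;
        domctl.domain = (domid_t)domid;
        set_xen_guest_handle(domctl.u.node_affinity.nodemap.bitmap, local);
        domctl.u.node_affinity.nodemap.nr_elems = nodesize * 8;

        ret = do_domctl(xch, &domctl);
        if ( !ret )
            memcpy(node_affinity, local, nodesize);

        xc_hypercall_buffer_free(xch, local);
        return ret;
    }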
>   int xc_vcpu_setaffinity(xc_interface *xch,
>                           uint32_t domid,
>                           int vcpu,
> diff --git a/tools/libxl/gentest.py b/tools/libxl/gentest.py
> --- a/tools/libxl/gentest.py
> +++ b/tools/libxl/gentest.py
> @@ -20,7 +20,7 @@ def randomize_case(s):
>   def randomize_enum(e):
>       return random.choice([v.name for v in e.values])
>
> -handcoded = ["libxl_cpumap", "libxl_key_value_list",
> +handcoded = ["libxl_cpumap", "libxl_nodemap", "libxl_key_value_list",
>                "libxl_cpuid_policy_list", "libxl_file_reference",
>                "libxl_string_list"]
>
> @@ -119,6 +119,19 @@ static void libxl_cpumap_rand_init(libxl
>       }
>   }
>
> +static void libxl_nodemap_rand_init(libxl_nodemap *nodemap)
> +{
> +    int i;
> +    nodemap->size = rand() % 16;
> +    nodemap->map = calloc(nodemap->size, sizeof(*nodemap->map));
> +    libxl_for_each_node(i, *nodemap) {
> +        if (rand() % 2)
> +            libxl_nodemap_set(nodemap, i);
> +        else
> +            libxl_nodemap_reset(nodemap, i);
> +    }
> +}
> +
>   static void libxl_key_value_list_rand_init(libxl_key_value_list *pkvl)
>   {
>       int i, nr_kvp = rand() % 16;
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -3082,6 +3082,16 @@ int libxl_set_vcpuaffinity_all(libxl_ctx
>       return rc;
>   }
>
> +int libxl_set_node_affinity(libxl_ctx *ctx, uint32_t domid,
> +                            libxl_nodemap *nodemap)
> +{
> +    if (xc_domain_node_affinity(ctx->xch, domid, nodemap->map)) {
> +        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "setting node affinity");
> +        return ERROR_FAIL;
> +    }
> +    return 0;
> +}
> +
>   int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_cpumap *cpumap)
>   {
>       GC_INIT(ctx);
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -727,6 +727,8 @@ int libxl_set_vcpuaffinity(libxl_ctx *ct
>                              libxl_cpumap *cpumap);
>   int libxl_set_vcpuaffinity_all(libxl_ctx *ctx, uint32_t domid,
>                                  unsigned int max_vcpus, libxl_cpumap *cpumap);
> +int libxl_set_node_affinity(libxl_ctx *ctx, uint32_t domid,
> +                            libxl_nodemap *nodemap);
Hmm -- is there really no libxl_get_vcpuaffinity()?  That seems to be a 
missing component...
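Same goes for node affinity, I think.  If a getter shows up at the libxc
level, the libxl wrapper would presumably just be a few lines along
these lines (libxl_get_node_affinity() and xc_domain_node_getaffinity()
are hypothetical names, mirroring the setters; the caller is assumed to
have allocated the nodemap with libxl_nodemap_alloc() already):

    /* Sketch only: mirrors libxl_set_node_affinity() above. */
    int libxl_get_node_affinity(libxl_ctx *ctx, uint32_t domid,
                                libxl_nodemap *nodemap)
    {
        if (xc_domain_node_getaffinity(ctx->xch, domid, nodemap->map)) {
            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting node affinity");
            return ERROR_FAIL;
        }
        return 0;
    }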
>   int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_cpumap *cpumap);
>
>   libxl_scheduler libxl_get_scheduler(libxl_ctx *ctx);
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -121,6 +121,12 @@ int libxl__domain_build_info_setdefault(
>           libxl_cpumap_set_any(&b_info->cpumap);
>       }
>
> +    if (!b_info->nodemap.size) {
> +        if (libxl_nodemap_alloc(CTX, &b_info->nodemap))
> +            return ERROR_NOMEM;
> +        libxl_nodemap_set_none(&b_info->nodemap);
> +    }
> +
>       if (b_info->max_memkb == LIBXL_MEMKB_DEFAULT)
>           b_info->max_memkb = 32 * 1024;
>       if (b_info->target_memkb == LIBXL_MEMKB_DEFAULT)
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -67,6 +67,7 @@ int libxl__build_pre(libxl__gc *gc, uint
>       char *xs_domid, *con_domid;
>       xc_domain_max_vcpus(ctx->xch, domid, info->max_vcpus);
>       libxl_set_vcpuaffinity_all(ctx, domid, info->max_vcpus, &info->cpumap);
> +    libxl_set_node_affinity(ctx, domid, &info->nodemap);
>       xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + LIBXL_MAXMEM_CONSTANT);
>       if (info->type == LIBXL_DOMAIN_TYPE_PV)
>           xc_domain_set_memmap_limit(ctx->xch, domid,
> diff --git a/tools/libxl/libxl_json.c b/tools/libxl/libxl_json.c
> --- a/tools/libxl/libxl_json.c
> +++ b/tools/libxl/libxl_json.c
> @@ -119,6 +119,26 @@ out:
>       return s;
>   }
>
> +yajl_gen_status libxl_nodemap_gen_json(yajl_gen hand,
> +                                       libxl_nodemap *nodemap)
> +{
> +    yajl_gen_status s;
> +    int i;
> +
> +    s = yajl_gen_array_open(hand);
> +    if (s != yajl_gen_status_ok) goto out;
> +
> +    libxl_for_each_node(i, *nodemap) {
> +        if (libxl_nodemap_test(nodemap, i)) {
> +            s = yajl_gen_integer(hand, i);
> +            if (s != yajl_gen_status_ok) goto out;
> +        }
> +    }
> +    s = yajl_gen_array_close(hand);
> +out:
> +    return s;
> +}
> +
>   yajl_gen_status libxl_key_value_list_gen_json(yajl_gen hand,
>                                                 libxl_key_value_list *pkvl)
>   {
> diff --git a/tools/libxl/libxl_json.h b/tools/libxl/libxl_json.h
> --- a/tools/libxl/libxl_json.h
> +++ b/tools/libxl/libxl_json.h
> @@ -27,6 +27,7 @@ yajl_gen_status libxl_domid_gen_json(yaj
>   yajl_gen_status libxl_uuid_gen_json(yajl_gen hand, libxl_uuid *p);
>   yajl_gen_status libxl_mac_gen_json(yajl_gen hand, libxl_mac *p);
>   yajl_gen_status libxl_cpumap_gen_json(yajl_gen hand, libxl_cpumap *p);
> +yajl_gen_status libxl_nodemap_gen_json(yajl_gen hand, libxl_nodemap *p);
>   yajl_gen_status libxl_cpuid_policy_list_gen_json(yajl_gen hand,
>                                                    libxl_cpuid_policy_list *p);
>   yajl_gen_status libxl_string_list_gen_json(yajl_gen hand, libxl_string_list *p);
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -11,6 +11,7 @@ libxl_domid = Builtin("domid", json_fn =
>   libxl_uuid = Builtin("uuid", passby=PASS_BY_REFERENCE)
>   libxl_mac = Builtin("mac", passby=PASS_BY_REFERENCE)
>   libxl_cpumap = Builtin("cpumap", dispose_fn="libxl_cpumap_dispose", passby=PASS_BY_REFERENCE)
> +libxl_nodemap = Builtin("nodemap", dispose_fn="libxl_nodemap_dispose", passby=PASS_BY_REFERENCE)
>   libxl_cpuid_policy_list = Builtin("cpuid_policy_list", dispose_fn="libxl_cpuid_dispose", passby=PASS_BY_REFERENCE)
>
>   libxl_string_list = Builtin("string_list", dispose_fn="libxl_string_list_dispose", passby=PASS_BY_REFERENCE)
> @@ -233,6 +234,7 @@ libxl_domain_build_info = Struct("domain
>       ("max_vcpus",       integer),
>       ("cur_vcpus",       integer),
>       ("cpumap",          libxl_cpumap),
> +    ("nodemap",         libxl_nodemap),
>       ("tsc_mode",        libxl_tsc_mode),
>       ("max_memkb",       MemKB),
>       ("target_memkb",    MemKB),
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -515,7 +515,7 @@ static void parse_config_data(const char
>       const char *buf;
>       long l;
>       XLU_Config *config;
> -    XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids;
> +    XLU_ConfigList *cpus, *nodes, *vbds, *nics, *pcis, *cvfbs, *cpuids;
>       int pci_power_mgmt = 0;
>       int pci_msitranslate = 1;
>       int pci_permissive = 0;
> @@ -628,6 +628,39 @@ static void parse_config_data(const char
>           free(buf2);
>       }
>
> +    if (!xlu_cfg_get_list (config, "nodes", &nodes, 0, 0)) {
> +        int i, n_nodes = 0;
> +
> +        if (libxl_nodemap_alloc(ctx, &b_info->nodemap)) {
> +            fprintf(stderr, "Unable to allocate nodemap\n");
> +            exit(1);
> +        }
> +
> +        libxl_nodemap_set_none(&b_info->nodemap);
> +        while ((buf = xlu_cfg_get_listitem(nodes, n_nodes)) != NULL) {
> +            i = atoi(buf);
> +            if (!libxl_nodemap_node_valid(&b_info->nodemap, i)) {
> +                fprintf(stderr, "node %d illegal\n", i);
> +                exit(1);
> +            }
> +            libxl_nodemap_set(&b_info->nodemap, i);
> +            n_nodes++;
> +        }
> +
> +        if (n_nodes == 0)
> +            fprintf(stderr, "No valid node found: no affinity will be set\n");
> +    }
> +    else {
> +        /*
> +         * XXX What would a sane default be? I think doing nothing
> +         *     (i.e., relying on cpu-affinity/cpupool ==>  the current
> +         *     behavior) should be just fine.
> +         *     That would mean we're saying to the user "if you want us
> +         *     to take care of NUMA, please tell us, maybe just putting
> +         *     'nodes=auto', but tell us... otherwise we do as usual".
> +         */
I think that for this patch, doing nothing is fine (which means removing 
the whole else clause).  But once we have the auto placement, I think 
that "nodes=auto" should be the default.
> +    }
> +
>       if (!xlu_cfg_get_long (config, "memory", &l, 0)) {
>           b_info->max_memkb = l * 1024;
>           b_info->target_memkb = b_info->max_memkb;
> diff --git a/tools/python/xen/lowlevel/xl/xl.c b/tools/python/xen/lowlevel/xl/xl.c
> --- a/tools/python/xen/lowlevel/xl/xl.c
> +++ b/tools/python/xen/lowlevel/xl/xl.c
> @@ -243,6 +243,18 @@ int attrib__libxl_cpumap_set(PyObject *v
>       return 0;
>   }
>
> +int attrib__libxl_nodemap_set(PyObject *v, libxl_nodemap *pptr)
> +{
> +    int i;
> +    long node;
> +
> +    for (i = 0; i < PyList_Size(v); i++) {
> +        node = PyInt_AsLong(PyList_GetItem(v, i));
> +        libxl_nodemap_set(pptr, node);
> +    }
> +    return 0;
> +}
> +
>   int attrib__libxl_file_reference_set(PyObject *v, libxl_file_reference *pptr)
>   {
>       return genwrap__string_set(v, &pptr->path);
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -222,6 +222,7 @@ struct domain *domain_create(
>
>       spin_lock_init(&d->node_affinity_lock);
>       d->node_affinity = NODE_MASK_ALL;
> +    d->has_node_affinity = 0;
>
>       spin_lock_init(&d->shutdown_lock);
>       d->shutdown_code = -1;
> @@ -337,7 +338,7 @@ void domain_update_node_affinity(struct
>   {
>       cpumask_var_t cpumask;
>       cpumask_var_t online_affinity;
> -    const cpumask_t *online;
> +    const cpumask_t *online = cpupool_online_cpumask(d->cpupool);
>       nodemask_t nodemask = NODE_MASK_NONE;
>       struct vcpu *v;
>       unsigned int node;
> @@ -350,9 +351,22 @@ void domain_update_node_affinity(struct
>           return;
>       }
>
> -    online = cpupool_online_cpumask(d->cpupool);
> +    spin_lock(&d->node_affinity_lock);
>
> -    spin_lock(&d->node_affinity_lock);
> +    /*
> +     * If someone explicitly told us our NUMA affinity, avoid messing
> +     * that up. Notice, however, that vcpu affinity is still what
> +     * drives all scheduling decisions.
> +     *
> +     * XXX I'm quite sure this is fine wrt the various v->cpu_affinity
> +     *     (at least, that was the intended design!). Could it be useful
> +     *     to cross-check d->node_affinity against `online`? The basic
> +     *     idea here is "Hey, if you shoot yourself in the foot... you've
> +     *     shot yourself in the foot!", but, you know...
> +     */
> +    if ( d->has_node_affinity )
> +        goto out;
> +
>
>       for_each_vcpu ( d, v )
>       {
> @@ -365,6 +379,8 @@ void domain_update_node_affinity(struct
>               node_set(node, nodemask);
>
>       d->node_affinity = nodemask;
> +
> +out:
>       spin_unlock(&d->node_affinity_lock);
>
>       free_cpumask_var(online_affinity);
> @@ -372,6 +388,31 @@ void domain_update_node_affinity(struct
>   }
>
>
> +int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity)
> +{
> +    spin_lock(&d->node_affinity_lock);
> +
> +    /*
> +     * Being/becoming affine to _no_ nodes is not going to work,
> +     * let's take it as the `reset node affinity` command.
> +     */
> +    if ( nodes_empty(*affinity) )
> +    {
> +        d->has_node_affinity = 0;
> +        spin_unlock(&d->node_affinity_lock);
> +        domain_update_node_affinity(d);
> +        return 0;
> +    }
> +
> +    d->has_node_affinity = 1;
> +    d->node_affinity = *affinity;
> +
> +    spin_unlock(&d->node_affinity_lock);
> +
> +    return 0;
> +}
> +
> +
>   struct domain *get_domain_by_id(domid_t dom)
>   {
>       struct domain *d;
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -621,6 +621,27 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domc
>       }
>       break;
>
> +    case XEN_DOMCTL_node_affinity:
> +    {
> +        domid_t dom = op->domain;
> +        struct domain *d = rcu_lock_domain_by_id(dom);
> +        nodemask_t new_affinity;
> +
> +        ret = -ESRCH;
> +        if ( d == NULL )
> +            break;
> +
> +        /* XXX We need an xsm_* for this I guess, right? */
Yes. :-)
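Probably something along the lines of the other per-domctl checks;
sketch only, with xsm_nodeaffinity() being a new hook that would have
to be introduced for this:

    ret = xsm_nodeaffinity(d);    /* hypothetical new XSM hook */
    if ( ret )
    {
        rcu_unlock_domain(d);
        break;
    }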
> +
> +        ret = xenctl_nodemap_to_nodemask(&new_affinity,
> +                                         &op->u.node_affinity.nodemap);
> +        if ( !ret )
> +            ret = domain_set_node_affinity(d, &new_affinity);
> +
> +        rcu_unlock_domain(d);
> +    }
> +    break;
> +
>       case XEN_DOMCTL_setvcpuaffinity:
>       case XEN_DOMCTL_getvcpuaffinity:
>       {
> diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
> --- a/xen/common/keyhandler.c
> +++ b/xen/common/keyhandler.c
> @@ -217,6 +217,14 @@ static void cpuset_print(char *set, int
>       *set++ = '\0';
>   }
>
> +static void nodeset_print(char *set, int size, const nodemask_t *mask)
> +{
> +    *set++ = '[';
> +    set += nodelist_scnprintf(set, size-2, mask);
> +    *set++ = ']';
> +    *set++ = '\0';
> +}
> +
>   static void periodic_timer_print(char *str, int size, uint64_t period)
>   {
>       if ( period == 0 )
> @@ -272,6 +280,9 @@ static void dump_domains(unsigned char k
>
>           dump_pageframe_info(d);
>
> +        nodeset_print(tmpstr, sizeof(tmpstr), &d->node_affinity);
> +        printk("NODE affinity for domain %d: %s\n", d->domain_id, tmpstr);
> +
>           printk("VCPU information and callbacks for domain %u:\n",
>                  d->domain_id);
>           for_each_vcpu ( d, v )
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -278,6 +278,15 @@ typedef struct xen_domctl_getvcpuinfo xe
>   DEFINE_XEN_GUEST_HANDLE(xen_domctl_getvcpuinfo_t);
>
>
> +/* Set the NUMA node(s) with which the guest is `affine`. */
> +/* XEN_DOMCTL_node_affinity */
> +struct xen_domctl_node_affinity {
> +    struct xenctl_map nodemap;   /* IN */
> +};
> +typedef struct xen_domctl_node_affinity xen_domctl_node_affinity_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_domctl_node_affinity_t);
> +
> +
>   /* Get/set which physical cpus a vcpu can execute on. */
>   /* XEN_DOMCTL_setvcpuaffinity */
>   /* XEN_DOMCTL_getvcpuaffinity */
> @@ -914,6 +923,7 @@ struct xen_domctl {
>   #define XEN_DOMCTL_set_access_required           64
>   #define XEN_DOMCTL_audit_p2m                     65
>   #define XEN_DOMCTL_set_virq_handler              66
> +#define XEN_DOMCTL_node_affinity                 67
>   #define XEN_DOMCTL_gdbsx_guestmemio            1000
>   #define XEN_DOMCTL_gdbsx_pausevcpu             1001
>   #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
> @@ -927,6 +937,7 @@ struct xen_domctl {
>           struct xen_domctl_getpageframeinfo  getpageframeinfo;
>           struct xen_domctl_getpageframeinfo2 getpageframeinfo2;
>           struct xen_domctl_getpageframeinfo3 getpageframeinfo3;
> +        struct xen_domctl_node_affinity     node_affinity;
>           struct xen_domctl_vcpuaffinity      vcpuaffinity;
>           struct xen_domctl_shadow_op         shadow_op;
>           struct xen_domctl_max_mem           max_mem;
> diff --git a/xen/include/xen/nodemask.h b/xen/include/xen/nodemask.h
> --- a/xen/include/xen/nodemask.h
> +++ b/xen/include/xen/nodemask.h
> @@ -8,8 +8,9 @@
>    * See detailed comments in the file linux/bitmap.h describing the
>    * data type on which these nodemasks are based.
>    *
> - * For details of nodemask_scnprintf() and nodemask_parse(),
> - * see bitmap_scnprintf() and bitmap_parse() in lib/bitmap.c.
> + * For details of nodemask_scnprintf(), nodelist_scnprintf() and
> + * nodemask_parse(), see bitmap_scnprintf() and bitmap_parse()
> + * in lib/bitmap.c.
>    *
>    * The available nodemask operations are:
>    *
> @@ -48,6 +49,7 @@
>    * unsigned long *nodes_addr(mask)     Array of unsigned long's in mask
>    *
>    * int nodemask_scnprintf(buf, len, mask) Format nodemask for printing
> + * int nodelist_scnprintf(buf, len, mask) Format nodemask as a list for printing
>    * int nodemask_parse(ubuf, ulen, mask)        Parse ascii string as nodemask
>    *
>    * for_each_node_mask(node, mask)      for-loop node over mask
> @@ -280,6 +282,14 @@ static inline int __first_unset_node(con
>
>   #define nodes_addr(src) ((src).bits)
>
> +#define nodelist_scnprintf(buf, len, src) \
> +                       __nodelist_scnprintf((buf), (len), (src), MAX_NUMNODES)
> +static inline int __nodelist_scnprintf(char *buf, int len,
> +                                       const nodemask_t *srcp, int nbits)
> +{
> +       return bitmap_scnlistprintf(buf, len, srcp->bits, nbits);
> +}
> +
>   #if 0
>   #define nodemask_scnprintf(buf, len, src) \
>                          __nodemask_scnprintf((buf), (len), &(src), MAX_NUMNODES)
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -346,8 +346,12 @@ struct domain
>       /* Various mem_events */
>       struct mem_event_per_domain *mem_event;
>
> -    /* Currently computed from union of all vcpu cpu-affinity masks. */
> +    /*
> +     * Can be specified by the user. If that is not the case, it is
> +     * computed from the union of all the vcpu cpu-affinity masks.
> +     */
>       nodemask_t node_affinity;
> +    int has_node_affinity;
There's something that seems a bit clunky about this; but I can't really 
think of a better way.  At the very least I'd change the name of the 
element here to something more descriptive; perhaps "auto_node_affinity" 
(which would invert the meaning)?
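I.e., roughly (just to illustrate the inversion; completely untested):

    /* domain_create(): start out in "automatic" mode */
    d->auto_node_affinity = 1;

    /* domain_update_node_affinity(): only recompute while automatic */
    if ( !d->auto_node_affinity )
        goto out;

    /* domain_set_node_affinity(): empty mask means back to automatic */
    if ( nodes_empty(*affinity) )
        d->auto_node_affinity = 1;
    else
    {
        d->auto_node_affinity = 0;
        d->node_affinity = *affinity;
    }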
>       unsigned int last_alloc_node;
>       spinlock_t node_affinity_lock;
>   };
> @@ -416,6 +420,7 @@ static inline void get_knownalive_domain
>       ASSERT(!(atomic_read(&d->refcnt) & DOMAIN_DESTROYED));
>   }
>
> +int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity);
>   void domain_update_node_affinity(struct domain *d);
>
>   struct domain *domain_create(


Thread overview: 37+ messages in thread
2012-04-11 13:17 [PATCH 00 of 10 [RFC]] Automatically place guest on host's NUMA nodes with xl Dario Faggioli
2012-04-11 13:17 ` [PATCH 01 of 10 [RFC]] libxc: Generalize xenctl_cpumap to just xenctl_map Dario Faggioli
2012-04-11 16:08   ` George Dunlap
2012-04-11 16:31     ` Dario Faggioli
2012-04-11 16:41       ` Dario Faggioli
2012-04-11 13:17 ` [PATCH 02 of 10 [RFC]] libxl: Generalize libxl_cpumap to just libxl_map Dario Faggioli
2012-04-11 13:17 ` [PATCH 03 of 10 [RFC]] libxc, libxl: Introduce xc_nodemap_t and libxl_nodemap Dario Faggioli
2012-04-11 16:38   ` George Dunlap
2012-04-11 16:57     ` Dario Faggioli
2012-04-11 13:17 ` [PATCH 04 of 10 [RFC]] libxl: Introduce libxl_get_numainfo() calling xc_numainfo() Dario Faggioli
2012-04-11 13:17 ` [PATCH 05 of 10 [RFC]] xl: Explicit node affinity specification for guests via config file Dario Faggioli
2012-04-12 10:24   ` George Dunlap [this message]
2012-04-12 10:48     ` David Vrabel
2012-04-12 22:25       ` Dario Faggioli
2012-04-12 11:32     ` Formatting of emails which are comments on patches Ian Jackson
2012-04-12 11:42       ` George Dunlap
2012-04-12 22:21     ` [PATCH 05 of 10 [RFC]] xl: Explicit node affinity specification for guests via config file Dario Faggioli
2012-04-11 13:17 ` [PATCH 06 of 10 [RFC]] xl: Allow user to set or change node affinity on-line Dario Faggioli
2012-04-12 10:29   ` George Dunlap
2012-04-12 21:57     ` Dario Faggioli
2012-04-11 13:17 ` [PATCH 07 of 10 [RFC]] sched_credit: Let the scheduler know about `node affinity` Dario Faggioli
2012-04-12 23:06   ` Dario Faggioli
2012-04-27 14:45   ` George Dunlap
2012-05-02 15:13     ` Dario Faggioli
2012-04-11 13:17 ` [PATCH 08 of 10 [RFC]] xl: Introduce First Fit memory-wise placement of guests on nodes Dario Faggioli
2012-05-01 15:45   ` George Dunlap
2012-05-02 16:30     ` Dario Faggioli
2012-05-03  1:03       ` Dario Faggioli
2012-05-03  8:10         ` Ian Campbell
2012-05-03 10:16         ` George Dunlap
2012-05-03 13:41       ` George Dunlap
2012-05-03 14:58         ` Dario Faggioli
2012-04-11 13:17 ` [PATCH 09 of 10 [RFC]] xl: Introduce Best and Worst Fit guest placement algorithms Dario Faggioli
2012-04-16 10:29   ` Dario Faggioli
2012-04-11 13:17 ` [PATCH 10 of 10 [RFC]] xl: Some automatic NUMA placement documentation Dario Faggioli
2012-04-12  9:11   ` Ian Campbell
2012-04-12 10:32     ` Dario Faggioli
