From: Paolo Bonzini <pbonzini@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: vasilis.liaskovitis@profitbricks.com, hutao@cn.fujitsu.com,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
Date: Wed, 24 Jul 2013 14:41:36 +0200
Message-ID: <51EFCB80.50308@redhat.com>
In-Reply-To: <20130724133420.5c0c5653@nial.usersys.redhat.com>

On 24/07/2013 13:34, Igor Mammedov wrote:
> On Wed, 24 Jul 2013 11:41:04 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> On 24/07/2013 10:36, Igor Mammedov wrote:
>>> On Tue, 23 Jul 2013 19:09:26 +0200
>>> Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>>> On 23/07/2013 18:23, Igor Mammedov wrote:
>>>>> - if slot property is not specified on -device/device_add command,
>>>>> treat default value as request for assigning DimmDevice to
>>>>> the first free slot.
>>>>
>>>> Even with "-m" instead of "-numa mem", I think this is problematic
>>>> because we still need to separate the host and guest parts of the DIMM
>>>> device.  "-numa mem" (or the QMP command that Wanlong added) will be
>>>> necessary to allocate memory on the host side before adding a DIMM.
>>> why not do host allocation part at the same time when DIMM is added, is
>>> there a real need to separate DIMM device?
>>>
>>> I probably miss something but -numa mem option and co aside what problem
>>> couldn't be solved during DIMM device initialization and would require
>>> a split DIMM device?
>>
>> Because otherwise, every option we add to "-numa mem" will have to be
>> added to "-device dimm".  For example,
>>
>>    -device dimm,policy=interleave
> if it's a feature of the DIMM device, sure; if it is not, let's find a better
> place for it. See below for an alternative approach.
> 
>>
>> makes no sense to me.
>>
>> In fact, this is no different from having to do drive_add or netdev_add
>> before device_add.  First you tell QEMU about the host resources to use,
>> then you add the guest device and bind the device to those resources.
>>
>>>> So slots will have three states: free (created with "-m"), allocated (a
>>>> free slot moves to this state with "-numa mem...,populated=no" when
>>>> migrating, or with the QMP command for regular hotplug), populated (an
>>>> allocated slot moves to this state with "-device dimm").
>>>>
>>>> You would be able to plug a DIMM only into an allocated slot, and the
>>>> size will be specified on the slot rather than the DIMM device.
>>> 'slot' property is there only for migration sake to provide stable
>>> numeric ID for QEMU<->ACPI BIOS interface. It's not used for any other
>>> purpose and wasn't intended for any other usage.
>>
>> How would you otherwise refer to the memory you want to affect in a
>> set-mem-policy monitor command?
> could be 'id' property or even better a QOM path
> 
>>
>>> on baremetal a slot has nothing to do with the size of the plugged-in DIMM,
>>
>> On baremetal slots also belong to a specific NUMA node, for what it's
>> worth.  There are going to be differences with baremetal no matter what.
> sure we can deviate here, but I don't see full picture yet so I'm trying
> to find justification for it first and asking questions. Maybe a better
> solution will be found.
> 
>>
>>> why we
>>> would model it other way if it only brings problems: like predefined size,
>>
>> It doesn't have to be predefined.  In the previous discussions (and also
>> based on Vasilis and Hu Tao's implementations) I assumed predefined slot
>> sizes.  Now I understand the benefit of having a simpler command-line
>> with "-m", but then in return you need three slot states instead of just
>> unpopulated/populated.
>>
>> So you'd just do
>>
>>    set-mem-policy 0 size=2G      # free->allocated
>>    device_add dimm,slotid=0      # allocated->populated
>>
>> to hotplug a 2G DIMM.  And you'll be able to pin it to host NUMA nodes,
>> and assign it to guest NUMA nodes, like this:
>>
>>    set-mem-policy 0 size=2G,nodeid=1,policy=membind host-nodes=0-1
>>    device_add dimm,slotid=0
> Do policy and other -numa mem properties belong to a particular DIMM device
> or rather to a particular NUMA node?
> 
> How about following idea: guest-node maps to a specific host-node, then
> when we plug DIMM, guest node provides information on policies and whatever
> to the creator of DIMM device (via DimmBus and/or mhc) which allocates
> memory, applies policies and binds new memory to a specific host node.
> That would eliminate 2 stage approach.

It makes sense.  My main worry is not to deviate from what we've been
doing for drives and netdevs (because that's a proven design).  Both
"-numa mem" and this proposal satisfy that goal.

I originally proposed "-numa mem" because Vasilis and Hu's patches were
relying on specifying predefined sizes for all slots.  So "-numa mem"
was a good fit for both memory hotplug (done Hu's way) and NUMA policy.
It also simplified the command line, which had a lot of "mem-" prefixed
options.

With the approach you suggest it may not be necessary at all, and we can
go back to just

    -numa node,cpus=0,mem=1G,mem-policy=membind,mem-hostnodes=0-1,cpu-hostnodes=0

or something like that.
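
To illustrate the node-level idea, here is a minimal Python sketch. It assumes
each guest node carries a single policy that every DIMM plugged into it
inherits. The names (GuestNode, Dimm, plug_dimm) are made up for illustration
and are not QEMU interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class GuestNode:
    """Hypothetical guest NUMA node that owns the host memory policy."""
    nodeid: int
    policy: str = "default"      # e.g. "membind" or "interleave"
    host_nodes: tuple = ()       # host NUMA nodes the policy applies to
    dimms: list = field(default_factory=list)


@dataclass
class Dimm:
    """Hypothetical DIMM: records its size and what it inherited."""
    size: int
    nodeid: int
    policy: str
    host_nodes: tuple


def plug_dimm(node: GuestNode, size: int) -> Dimm:
    # Single step: the DIMM's creator reads policy and host binding from the
    # guest node, so the hotplug command needs no policy options of its own.
    dimm = Dimm(size=size, nodeid=node.nodeid,
                policy=node.policy, host_nodes=node.host_nodes)
    node.dimms.append(dimm)
    return dimm


node1 = GuestNode(nodeid=1, policy="membind", host_nodes=(0, 1))
dimm = plug_dimm(node1, 2 << 30)   # a 2G DIMM inherits membind on host nodes 0-1
```

With this shape the DIMM only has to say which node it is plugged into, and
everything else comes from the node.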

Whether it is workable depends on what granularity Wanlong/Hu want.

There may be some scenarios where per-slot policies make sense.  For
example, imagine that in general you want memory to be bound to the
corresponding host node.  It turns out some nodes are now fully
committed and others are free, and you need more memory on a VM.  You
can hotplug that memory without really caring about binding and
momentarily suffer some performance loss.
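
A rough Python sketch of that scenario, assuming a per-slot policy is optional
and falls back to the node's policy when absent. The function name and the
policy strings are illustrative only, not an existing interface.

```python
from typing import Optional


def effective_policy(node_policy: str, slot_policy: Optional[str] = None) -> str:
    # A per-slot policy, when given, wins over the guest node's policy;
    # otherwise the DIMM simply inherits what the node specifies.
    return slot_policy if slot_policy is not None else node_policy


# Normal case: memory follows the node's binding.
bound = effective_policy("membind")
# Host node fully committed: hotplug without binding and accept the slowdown.
unbound = effective_policy("membind", slot_policy="default")
```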

I agree that specifying the policy on every hotplug complicates
management and may be overkill.  But then, most guests are not NUMA at
all and you would hardly perceive the difference; you would just have to
issue two separate commands

    set-mem-policy 0 size=2G
    device_add dimm,slot=0

instead of

    device_add dimm,slot,size=2G

which is not a big chore.
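
For reference, the two-step flow corresponds to the three slot states discussed
earlier in the thread (free, allocated, populated). Below is a minimal Python
sketch of that state machine. The Slot class and its method names are
hypothetical and only mirror the monitor commands above; this is not QEMU code.

```python
from enum import Enum


class SlotState(Enum):
    FREE = "free"            # created with "-m"
    ALLOCATED = "allocated"  # host memory set up, no guest device yet
    POPULATED = "populated"  # guest DIMM device plugged in


class Slot:
    """Hypothetical model of one DIMM slot."""

    def __init__(self):
        self.state = SlotState.FREE
        self.size = None

    def set_mem_policy(self, size):
        # Host side: free -> allocated. The size lives on the slot.
        if self.state is not SlotState.FREE:
            raise ValueError("slot is not free")
        self.size = size
        self.state = SlotState.ALLOCATED

    def device_add_dimm(self):
        # Guest side: allocated -> populated. Only allocated slots take a DIMM.
        if self.state is not SlotState.ALLOCATED:
            raise ValueError("slot is not allocated")
        self.state = SlotState.POPULATED


slot = Slot()
slot.set_mem_policy(2 << 30)   # set-mem-policy 0 size=2G
slot.device_add_dimm()         # device_add dimm,slot=0
```

Plugging a DIMM into a slot that is still free raises an error in this model,
which matches the rule that only allocated slots can be populated.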

> in this case DIMM device only needs to specify where it's plugged in, using
> 'node' property (now number but could become QOM path to NUMA node object).

Yeah, then it's the same as the id.

Paolo

> Ideally it would be QOM hierarchy:
> 
> /nodeX/@dimmbus/dimm_device
> where even 'node' property would become obsolete, just specify right
> bus to attach DIMM device to.
> 
> PS:
> we need a similar QOM hierarchy for CPUs as well to sort out
> -numa cpus=ids mess.
> 
>>
>> Again, this is the same as drive_add/device_add.
>>
>> Paolo
>>
>>> allocated, free etc. I think slot should be either free or busy.
>>>
>>>
>>>>
>>>> In general, I don't think free slots should be managed by the DimmBus,
>>>> and host vs. guest separation should be there even if we accept your
>>>> "-m" extension (doesn't look bad at all, I must say).
>>>>
>>>> Paolo
>>>
>>
> 
