linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: David Hildenbrand <david@redhat.com>,
	"Li, Liang Z" <liang.z.li@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"mhocko@suse.com" <mhocko@suse.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"virtualization@lists.linux-foundation.org" 
	<virtualization@lists.linux-foundation.org>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>
Subject: Re: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
Date: Wed, 7 Dec 2016 21:28:24 +0100	[thread overview]
Message-ID: <20161207202824.GH28786@redhat.com> (raw)
In-Reply-To: <b58fd9f6-d9dd-dd56-d476-dd342174dac5@intel.com>

On Wed, Dec 07, 2016 at 11:54:34AM -0800, Dave Hansen wrote:
> We're talking about a bunch of different stuff which is all being
> conflated.  There are 3 issues here that I can see.  I'll attempt to
> summarize what I think is going on:
> 
> 1. Current patches do a hypercall for each order in the allocator.
>    This is inefficient, but independent from the underlying data
>    structure in the ABI, unless bitmaps are in play, which they aren't.
> 2. Should we have bitmaps in the ABI, even if they are not in use by the
>    guest implementation today?  Andrea says they have zero benefits
>    over a pfn/len scheme.  Dave doesn't think they have zero benefits
>    but isn't that attached to them.  QEMU's handling gets more
>    complicated when using a bitmap.
> 3. Should the ABI contain records each with a pfn/len pair or a
>    pfn/order pair?
>    3a. 'len' is more flexible, but will always be a power-of-two anyway
> 	for high-order pages (the common case)

Len wouldn't be a power of two practically only if we detect adjacent
pages of smaller order that may merge into larger orders we already
allocated (or the other way around).

[addr=2M, len=2M] allocated at order 9 pass
[addr=4M, len=1M] allocated at order 8 pass -> merge as [addr=2M, len=3M]

Not sure if it would be worth it, but that unless we do this, page-order or
len won't make much difference.

>    3b. if we decide not to have a bitmap, then we basically have plenty
> 	of space for 'len' and should just do it
>    3c. It's easiest for the hypervisor to turn pfn/len into the
>        madvise() calls that it needs.
> 
> Did I miss anything?

I think you summarized fine all my arguments in your summary.

> FWIW, I don't feel that strongly about the bitmap.  Li had one
> originally, but I think the code thus far has demonstrated a huge
> benefit without even having a bitmap.
> 
> I've got no objections to ripping the bitmap out of the ABI.

I think we need to see a statistic showing the number of bits set in
each bitmap in average, after some uptime and lru churn, like running
stresstest app for a while with I/O and then inflate the balloon and
count:

1) how many bits were set vs total number of bits used in bitmaps

2) how many times bitmaps were used vs bitmap_len = 0 case of single
   page

My guess would be like very low percentage for both points.

> Surely we can think of a few ways...
> 
> A bitmap is 64x more dense if the lists are unordered.  It means being
> able to store ~32k*2M=64G worth of 2M pages in one data page vs. ~1G.
> That's 64x fewer cachelines to touch, 64x fewer pages to move to the
> hypervisor and lets us allocate 1/64th the memory.  Given a maximum
> allocation that we're allowed, it lets us do 64x more per-pass.
> 
> Now, are those benefits worth it?  Maybe not, but let's not pretend they
> don't exist. ;)

In the best case there are benefits obviously, the question is how
common the best case is.

The best case if I understand correctly is all high order not
available, but plenty of order 0 pages available at phys address X,
X+8k, X+16k, X+(8k*nr_bits_in_bitmap). How common is that 0 pages
exist but they're not at an address < X or > X+(8k*nr_bits_in_bitmap)?

> Yes, the current code sends one batch of pages up to the hypervisor per
> order.  But, this has nothing to do with the underlying data structure,
> or the choice to have an order vs. len in the ABI.
> 
> What you describe here is obviously more efficient.

And it isn't possible with the current ABI.

So there is a connection with the MAX_ORDER..0 allocation loop and the
ABI change, but I agree any of the ABI proposed would still allow for
it this logic to be used. Bitmap or not bitmap, the loop would still
work.

  reply	other threads:[~2016-12-07 20:28 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-30  8:43 [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration Liang Li
2016-11-30  8:43 ` [PATCH kernel v5 1/5] virtio-balloon: rework deflate to add page to a list Liang Li
2016-11-30  8:43 ` [PATCH kernel v5 2/5] virtio-balloon: define new feature bit and head struct Liang Li
2016-11-30  8:43 ` [PATCH kernel v5 3/5] virtio-balloon: speed up inflate/deflate process Liang Li
2016-11-30  8:43 ` [PATCH kernel v5 4/5] virtio-balloon: define flags and head for host request vq Liang Li
2016-11-30  8:43 ` [PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info Liang Li
2016-11-30 19:15   ` Dave Hansen
2016-12-04 13:13     ` Li, Liang Z
2016-12-05 17:22       ` Dave Hansen
2016-12-06  4:47         ` Li, Liang Z
2016-12-06  8:40 ` [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration David Hildenbrand
2016-12-07 13:35   ` Li, Liang Z
2016-12-07 15:34     ` Dave Hansen
2016-12-09  3:09       ` Li, Liang Z
2016-12-07 15:42     ` David Hildenbrand
2016-12-07 15:45       ` Dave Hansen
2016-12-07 16:21         ` David Hildenbrand
2016-12-07 16:57           ` Dave Hansen
2016-12-07 18:38             ` [Qemu-devel] " Andrea Arcangeli
2016-12-07 18:44               ` Dave Hansen
2016-12-07 18:58                 ` Andrea Arcangeli
2016-12-07 19:54               ` Dave Hansen
2016-12-07 20:28                 ` Andrea Arcangeli [this message]
2016-12-09  4:45                   ` Li, Liang Z
2016-12-09  4:53                     ` Dave Hansen
2016-12-09  5:35                       ` Li, Liang Z
2016-12-09 16:42                         ` Andrea Arcangeli
2016-12-14  8:20                           ` Li, Liang Z
2016-12-14  8:59                       ` Li, Liang Z
2016-12-15 15:34                         ` Dave Hansen
2016-12-15 15:54                           ` Michael S. Tsirkin
2016-12-16  1:12                             ` Li, Liang Z
2016-12-16 15:40                               ` Andrea Arcangeli
2016-12-17 11:56                                 ` Li, Liang Z
2016-12-16  0:48                           ` Li, Liang Z
2016-12-16  1:09                             ` Dave Hansen
2016-12-16  1:38                               ` Li, Liang Z
2016-12-16  1:40                                 ` Dave Hansen
2016-12-16  1:43                                   ` Li, Liang Z
2016-12-16 16:01                                   ` Andrea Arcangeli
2016-12-17 12:39                                     ` Li, Liang Z

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161207202824.GH28786@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=liang.z.li@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).