All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: "mhocko@suse.com" <mhocko@suse.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"mst@redhat.com" <mst@redhat.com>,
	"Li, Liang Z" <liang.z.li@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"virtualization@lists.linux-foundation.org"
	<virtualization@lists.linux-foundation.org>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>
Subject: Re: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
Date: Wed, 7 Dec 2016 19:38:17 +0100	[thread overview]
Message-ID: <20161207183817.GE28786__5904.88189268417$1481136798$gmane$org@redhat.com> (raw)
In-Reply-To: <d3ff453c-56fa-19de-317c-1c82456f2831@intel.com>

Hello,

On Wed, Dec 07, 2016 at 08:57:01AM -0800, Dave Hansen wrote:
> It is more space-efficient.  We're fitting the order into 6 bits, which
> would allows the full 2^64 address space to be represented in one entry,

Very large order is the same as very large len, 6 bits of order or 8
bytes of len won't really move the needle here, simpler code is
preferable.

The main benefit of "len" is that it can be more granular, plus it's
simpler than the bitmap too. Eventually all this stuff has to end up
into a madvisev (not yet upstream but somebody posted it for jemalloc
and should get merged eventually).

So the bitmap shall be demuxed to a addr,len array anyway, the bitmap
won't ever be sent to the madvise syscall, which makes the
intermediate representation with the bitmap a complication with
basically no benefits compared to a (N, [addr1,len1], .., [addrN,
lenN]) representation.

If you prefer 1 byte of order (not just 6 bits) instead 8bytes of len
that's possible too, I wouldn't be against that, the conversion before
calling madvise would be pretty efficient too.

> and leaves room for the bitmap size to be encoded as well, if we decide
> we need a bitmap in the future.

How would a bitmap ever be useful with very large page-order?

> If that was purely a length, we'd be limited to 64*4k pages per entry,
> which isn't even a full large page.

I don't follow here.

What we suggest is to send the data down represented as (N,
[addr1,len1], ..., [addrN, lenN]) which allows infinite ranges each
one of maximum length 2^64, so 2^64 multiplied infinite times if you
wish. Simplifying the code and not having any bitmap at all and no :6
:6 bits either.

The high order to low order loop of allocations is the interesting part
here, not the bitmap, and the fact of doing a single vmexit to send
the large ranges.

Once we pull out the largest order regions, we just add them to the
array as [addr,1UL<<order], when the array reaches a maximum N number
of entries or we fail a order 0 allocation, we flush all those entries
down to qemu. Qemu then builds the iov for madvisev and it's pretty
much a 1:1 conversion, not a decoding operation converting the bitmap
in the (N, [addr1,len1], ..., [addrN, lenN]) for madvisev (or a flood
of madvise MADV_DONTNEED with current kernels).

Considering the loop that allocates starting from MAX_ORDER..1, the
chance the bitmap is actually getting filled with more than one bit at
page_shift of PAGE_SHIFT should be very low after some uptime.

By the very nature of this loop, if we already exacerbates all high
order buddies, the page-order 0 pages obtained are going to be fairly
fragmented reducing the usefulness of the bitmap and potentially only
wasting CPU/memory.

  reply	other threads:[~2016-12-07 18:38 UTC|newest]

Thread overview: 165+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-30  8:43 [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration Liang Li
2016-11-30  8:43 ` [Qemu-devel] " Liang Li
2016-11-30  8:43 ` Liang Li
2016-11-30  8:43 ` [PATCH kernel v5 1/5] virtio-balloon: rework deflate to add page to a list Liang Li
2016-11-30  8:43 ` Liang Li
2016-11-30  8:43   ` [Qemu-devel] " Liang Li
2016-11-30  8:43   ` Liang Li
2016-11-30  8:43 ` [PATCH kernel v5 2/5] virtio-balloon: define new feature bit and head struct Liang Li
2016-11-30  8:43 ` Liang Li
2016-11-30  8:43   ` [Qemu-devel] " Liang Li
2016-11-30  8:43   ` Liang Li
2016-11-30  8:43 ` [PATCH kernel v5 3/5] virtio-balloon: speed up inflate/deflate process Liang Li
2016-11-30  8:43   ` [Qemu-devel] " Liang Li
2016-11-30  8:43   ` Liang Li
2016-11-30  8:43 ` Liang Li
2016-11-30  8:43 ` [PATCH kernel v5 4/5] virtio-balloon: define flags and head for host request vq Liang Li
2016-11-30  8:43   ` [Qemu-devel] " Liang Li
2016-11-30  8:43   ` Liang Li
2016-11-30  8:43 ` Liang Li
2016-11-30  8:43 ` [PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info Liang Li
2016-11-30  8:43 ` Liang Li
2016-11-30  8:43   ` [Qemu-devel] " Liang Li
2016-11-30  8:43   ` Liang Li
2016-11-30 19:15   ` Dave Hansen
2016-11-30 19:15   ` Dave Hansen
2016-11-30 19:15     ` [Qemu-devel] " Dave Hansen
2016-11-30 19:15     ` Dave Hansen
2016-12-04 13:13     ` Li, Liang Z
2016-12-04 13:13     ` Li, Liang Z
2016-12-04 13:13       ` [Qemu-devel] " Li, Liang Z
2016-12-04 13:13       ` Li, Liang Z
2016-12-05 17:22       ` Dave Hansen
2016-12-05 17:22         ` [Qemu-devel] " Dave Hansen
2016-12-05 17:22         ` Dave Hansen
2016-12-05 17:22         ` Dave Hansen
2016-12-06  4:47         ` Li, Liang Z
2016-12-06  4:47           ` [Qemu-devel] " Li, Liang Z
2016-12-06  4:47           ` Li, Liang Z
2016-12-06  4:47           ` Li, Liang Z
2016-12-06  8:40 ` [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration David Hildenbrand
2016-12-06  8:40   ` [Qemu-devel] " David Hildenbrand
2016-12-06  8:40   ` David Hildenbrand
2016-12-06  8:40   ` David Hildenbrand
2016-12-07 13:35   ` Li, Liang Z
2016-12-07 13:35     ` [Qemu-devel] " Li, Liang Z
2016-12-07 13:35     ` Li, Liang Z
2016-12-07 13:35     ` Li, Liang Z
2016-12-07 15:34     ` Dave Hansen
2016-12-07 15:34       ` [Qemu-devel] " Dave Hansen
2016-12-07 15:34       ` Dave Hansen
2016-12-09  3:09       ` Li, Liang Z
2016-12-09  3:09         ` [Qemu-devel] " Li, Liang Z
2016-12-09  3:09         ` Li, Liang Z
2016-12-09  3:09         ` Li, Liang Z
2016-12-09  3:09       ` Li, Liang Z
2016-12-07 15:34     ` Dave Hansen
2016-12-07 15:42     ` David Hildenbrand
2016-12-07 15:42       ` [Qemu-devel] " David Hildenbrand
2016-12-07 15:42       ` David Hildenbrand
2016-12-07 15:45       ` Dave Hansen
2016-12-07 15:45         ` [Qemu-devel] " Dave Hansen
2016-12-07 15:45         ` Dave Hansen
2016-12-07 15:45         ` Dave Hansen
2016-12-07 16:21         ` David Hildenbrand
2016-12-07 16:21           ` [Qemu-devel] " David Hildenbrand
2016-12-07 16:21           ` David Hildenbrand
2016-12-07 16:21           ` David Hildenbrand
2016-12-07 16:57           ` Dave Hansen
2016-12-07 16:57           ` Dave Hansen
2016-12-07 16:57             ` [Qemu-devel] " Dave Hansen
2016-12-07 16:57             ` Dave Hansen
2016-12-07 18:38             ` Andrea Arcangeli [this message]
2016-12-07 18:38             ` [Qemu-devel] " Andrea Arcangeli
2016-12-07 18:38               ` Andrea Arcangeli
2016-12-07 18:38               ` Andrea Arcangeli
2016-12-07 18:44               ` Dave Hansen
2016-12-07 18:44               ` Dave Hansen
2016-12-07 18:44                 ` Dave Hansen
2016-12-07 18:44                 ` Dave Hansen
2016-12-07 18:58                 ` Andrea Arcangeli
2016-12-07 18:58                 ` Andrea Arcangeli
2016-12-07 18:58                   ` Andrea Arcangeli
2016-12-07 18:58                   ` Andrea Arcangeli
2016-12-07 19:54               ` Dave Hansen
2016-12-07 19:54                 ` Dave Hansen
2016-12-07 19:54                 ` Dave Hansen
2016-12-07 19:54                 ` Dave Hansen
2016-12-07 20:28                 ` Andrea Arcangeli
2016-12-07 20:28                 ` Andrea Arcangeli
2016-12-07 20:28                   ` Andrea Arcangeli
2016-12-07 20:28                   ` Andrea Arcangeli
2016-12-09  4:45                   ` Li, Liang Z
2016-12-09  4:45                   ` Li, Liang Z
2016-12-09  4:45                     ` Li, Liang Z
2016-12-09  4:45                     ` Li, Liang Z
2016-12-09  4:53                     ` Dave Hansen
2016-12-09  4:53                       ` Dave Hansen
2016-12-09  4:53                       ` Dave Hansen
2016-12-09  4:53                       ` Dave Hansen
2016-12-09  5:35                       ` Li, Liang Z
2016-12-09  5:35                         ` Li, Liang Z
2016-12-09  5:35                         ` Li, Liang Z
2016-12-09  5:35                         ` Li, Liang Z
2016-12-09 16:42                         ` Andrea Arcangeli
2016-12-09 16:42                           ` Andrea Arcangeli
2016-12-09 16:42                           ` Andrea Arcangeli
2016-12-09 16:42                           ` Andrea Arcangeli
2016-12-14  8:20                           ` Li, Liang Z
2016-12-14  8:20                             ` Li, Liang Z
2016-12-14  8:20                             ` Li, Liang Z
2016-12-14  8:20                             ` Li, Liang Z
2016-12-14  8:59                       ` Li, Liang Z
2016-12-14  8:59                         ` Li, Liang Z
2016-12-14  8:59                         ` Li, Liang Z
2016-12-14  8:59                         ` Li, Liang Z
2016-12-15 15:34                         ` Dave Hansen
2016-12-15 15:34                         ` Dave Hansen
2016-12-15 15:34                           ` Dave Hansen
2016-12-15 15:34                           ` Dave Hansen
2016-12-15 15:54                           ` Michael S. Tsirkin
2016-12-15 15:54                           ` Michael S. Tsirkin
2016-12-15 15:54                             ` Michael S. Tsirkin
2016-12-15 15:54                             ` Michael S. Tsirkin
2016-12-16  1:12                             ` Li, Liang Z
2016-12-16  1:12                               ` Li, Liang Z
2016-12-16  1:12                               ` Li, Liang Z
2016-12-16  1:12                               ` Li, Liang Z
2016-12-16 15:40                               ` Andrea Arcangeli
2016-12-16 15:40                                 ` Andrea Arcangeli
2016-12-16 15:40                                 ` Andrea Arcangeli
2016-12-17 11:56                                 ` Li, Liang Z
2016-12-17 11:56                                   ` Li, Liang Z
2016-12-17 11:56                                   ` Li, Liang Z
2016-12-17 11:56                                   ` Li, Liang Z
2016-12-16 15:40                               ` Andrea Arcangeli
2016-12-16  0:48                           ` Li, Liang Z
2016-12-16  0:48                             ` Li, Liang Z
2016-12-16  0:48                             ` Li, Liang Z
2016-12-16  0:48                             ` Li, Liang Z
2016-12-16  1:09                             ` Dave Hansen
2016-12-16  1:09                             ` Dave Hansen
2016-12-16  1:09                               ` Dave Hansen
2016-12-16  1:09                               ` Dave Hansen
2016-12-16  1:38                               ` Li, Liang Z
2016-12-16  1:38                                 ` Li, Liang Z
2016-12-16  1:38                                 ` Li, Liang Z
2016-12-16  1:38                                 ` Li, Liang Z
2016-12-16  1:40                                 ` Dave Hansen
2016-12-16  1:40                                   ` Dave Hansen
2016-12-16  1:40                                   ` Dave Hansen
2016-12-16  1:40                                   ` Dave Hansen
2016-12-16  1:43                                   ` Li, Liang Z
2016-12-16  1:43                                   ` Li, Liang Z
2016-12-16  1:43                                     ` Li, Liang Z
2016-12-16  1:43                                     ` Li, Liang Z
2016-12-16 16:01                                   ` Andrea Arcangeli
2016-12-16 16:01                                     ` Andrea Arcangeli
2016-12-16 16:01                                     ` Andrea Arcangeli
2016-12-16 16:01                                     ` Andrea Arcangeli
2016-12-17 12:39                                     ` Li, Liang Z
2016-12-17 12:39                                     ` Li, Liang Z
2016-12-17 12:39                                       ` Li, Liang Z
2016-12-17 12:39                                       ` Li, Liang Z
2016-12-07 15:42     ` David Hildenbrand
2016-12-07 13:35   ` Li, Liang Z

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='20161207183817.GE28786__5904.88189268417$1481136798$gmane$org@redhat.com' \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=dgilbert@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=liang.z.li@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.