linux-kernel.vger.kernel.org archive mirror
From: "Wang, Wei W" <wei.w.wang@intel.com>
To: David Hildenbrand <david@redhat.com>,
	"virtio-dev@lists.oasis-open.org"
	<virtio-dev@lists.oasis-open.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"virtualization@lists.linux-foundation.org" 
	<virtualization@lists.linux-foundation.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"mst@redhat.com" <mst@redhat.com>,
	"mhocko@kernel.org" <mhocko@kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>
Cc: "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"liliang.opensource@gmail.com" <liliang.opensource@gmail.com>,
	"yang.zhang.wz@gmail.com" <yang.zhang.wz@gmail.com>,
	"quan.xu0@gmail.com" <quan.xu0@gmail.com>,
	"nilal@redhat.com" <nilal@redhat.com>,
	"riel@redhat.com" <riel@redhat.com>,
	"peterx@redhat.com" <peterx@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Luiz Capitulino <lcapitulino@redhat.com>
Subject: RE: [PATCH v34 0/4] Virtio-balloon: support free page reporting
Date: Fri, 29 Jun 2018 15:55:04 +0000
Message-ID: <286AC319A985734F985F78AFA26841F7396C254C@shsmsx102.ccr.corp.intel.com>
In-Reply-To: <34bb25eb-97f3-8a9f-8a13-401dfcf39a2c@redhat.com>

On Friday, June 29, 2018 7:54 PM, David Hildenbrand wrote:
> On 29.06.2018 13:31, Wei Wang wrote:
> > On 06/29/2018 03:46 PM, David Hildenbrand wrote:
> >>>
> >>> I'm afraid it can't. For example, take a guest that has just booted
> >>> without much memory activity, and assume it has 8GB of free
> >>> memory. The arch_free_page there won't be able to capture the 8GB of
> >>> free pages since free() is never called. This results in no free pages
> >>> being reported to the host.
> >>
> >> So, it takes some time from when the guest boots up until the balloon
> >> device was initialized and therefore page hinting can start. For that
> >> period, you won't get any arch_free_page()/page hinting callbacks, correct.
> >>
> >> However in the hypervisor, you can theoretically track which pages
> >> the guest actually touched ("dirty"), so you already know "which
> >> pages were never touched while booting up until virtio-balloon was
> >> brought to life". These, you can directly exclude from migration. No
> >> interface required.
> >>
> >> The remaining problem is pages that were touched ("allocated") by the
> >> guest during bootup but freed again, before virtio-balloon came up.
> >> One would have to measure how many pages these usually are. I would
> >> say it would not be that many (because recently freed pages are
> >> likely to be used again for the next allocation). However, some
> >> pages would not be reported.
> >>
> >> During the lifetime of the guest, this should not be a problem,
> >> eventually one of these pages would get allocated/freed again, so the
> >> problem "solves itself over time". You are looking into the special
> >> case of migrating the VM just after it has been started. But we have
> >> the exact same problem also for ordinary free page hinting, so we
> >> should rather solve that problem. It is not migration specific.
> >>
> >> If we are looking for an alternative to "problem solves itself", we
> >> could do something like "when virtio-balloon comes up, it reports all
> >> free pages step by step using free page hinting, just like we would have
> >> from arch_free_pages()". This would be the same interface we are
> >> using for free page hinting - and it could even be made configurable in
> >> the guest.
> >>
> >> The current approach we are discussing internally for the details of
> >> Nitesh's work ("how the magic inside arch_free_pages() will work
> >> efficiently") would allow this just fine, as far as I can see.
> >>
> >> There would be a tiny window between virtio-balloon coming up
> >> and it having reported all free pages step by step, but that can be
> >> considered a very special corner case that I would argue is not worth
> >> optimizing.
> >>
> >> If I am missing something important here, sorry in advance :)
> >>
> >
> > Probably I didn't explain that well. Please see my re-try:
> >
> > That work is to monitor page allocation and free activities via
> > arch_alloc_pages and arch_free_pages. It has per-CPU lists to record
> > the pages that are freed to the mm free list, and the per-CPU lists
> > dump the recorded pages to a global list when any of them is full.
> > So a per-CPU list only gets free pages when an mm free() function
> > is called. Suppose we have 8GB of free memory on the mm free list,
> > but no application uses it and thus no mm free() calls are made.
> > In that case, arch_free_pages isn't called and no free pages are
> > added to the per-CPU lists, yet we have 8GB of free memory sitting
> > right on the mm free list.
> > How would you guarantee that the per-CPU lists have got all the free
> > pages that the mm free lists have?
> 
> As I said, if we have some mechanism that will scan the free pages once
> (not via arch_free_page()) and report hints using the same mechanism step by
> step (not your bulk interface), this problem is solved. And as I said, this is
> not a migration-specific problem; we have the same problem in the current
> page hinting RFC. These pages have to be reported.
> 
> >
> > - I'm also worried about the overhead of maintaining so many per-CPU
> > lists and the global list. For example, if applications frequently
> > allocate and free 4KB pages, each per-CPU list needs to implement
> > the buddy algorithm to sort and merge neighboring pages. Today
> > a server can have more than 100 CPUs, so there would be more than 100
> > per-CPU lists which need to sync to a global list under a lock. I'm
> > not sure this would scale well.
> 
> The overhead in the current RFC is definitely too high. But I consider this a
> problem to be solved before page hinting would go upstream. And we are
> discussing right now "if we have a reasonable page hinting implementation,
> why would we need your interface in addition".
> 
> >
> > - This seems to be a burden imposed on the core mm memory
> > allocation/free path. The whole overhead needs to be carried during
> > the whole system life cycle. What we actually expected is to just make
> > one call to get the free page hints only when live migration happens.
> 
> You're focusing too much on the actual implementation of the page hinting
> RFC right now. Assume for now that we would have
> - efficient page hinting without degrading other CPUs and little
>   overhead
> - a mechanism that solves reporting free pages once after we started up
>   virtio-balloon and actual free page hinting starts
> 
> Why would your suggestion still be applicable?
> 
> Your point for now is "I might not want to have page hinting enabled due to
> the overhead, but I still want a live migration speedup". If that overhead
> actually exists (we'll have to see), or if there is another reason to disable
> page hinting, then we have to decide whether that specific setup is worth
> merging your changes for.

All of the above "if we have" and "assume we have" statements don't sound like a valid argument to me.
 
> I am not (and don't want to be) in the position to make any decisions here :) I
> just want to understand if two interfaces for free pages actually make sense.

I responded to Nitesh about the differences; you may want to check with him about this.
I would suggest that you send your patches to LKML to start a discussion with the mm folks.

Best,
Wei
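
For readers following the thread, the scenario argued over above (a guest
with a large amount of free memory and almost no free() calls) can be
illustrated with a toy model. This is only a sketch under stated
assumptions: ToyGuest, enable_hook, hinted and scan_free_list are
hypothetical names invented here, the memory size is scaled down, and
nothing below is the code from the page hinting RFC or from this patch
series. It contrasts a free-path hook, which only sees pages freed after
the hook is active, with a one-time scan of the allocator's free list.

```python
# Toy model only: all names here are hypothetical, not kernel APIs.

class ToyGuest:
    def __init__(self, total_pages):
        # Every page starts on the allocator's free list (idle guest).
        self.free_list = set(range(total_pages))
        self.hook_enabled = False
        self.hinted = set()  # pages recorded by the free-path hook

    def enable_hook(self):
        # Stands in for the hinting mechanism becoming active.
        self.hook_enabled = True

    def alloc(self):
        page = self.free_list.pop()
        self.hinted.discard(page)  # an allocated page is no longer free
        return page

    def free(self, page):
        self.free_list.add(page)
        if self.hook_enabled:
            # Stands in for a hook on the free path recording the page.
            self.hinted.add(page)

    def scan_free_list(self):
        # Stands in for a one-time walk of the allocator's free list.
        return set(self.free_list)


guest = ToyGuest(total_pages=2048)  # scaled-down stand-in for 8GB
guest.enable_hook()

# Very little memory activity: a single allocate/free pair.
page = guest.alloc()
guest.free(page)

hook_view = guest.hinted
scan_view = guest.scan_free_list()
print(len(hook_view), "page(s) known via the free-path hook")
print(len(scan_view), "page(s) found by scanning the free list")
```

In this model the hook learns about a page only when that page passes
through free(), which is the gap Wei describes; a scan at startup, as
David suggests, would close that gap in the same toy setting. Whether
either approach is cheap enough in a real kernel is exactly what the
thread leaves open.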


Thread overview: 28+ messages
2018-06-25 12:05 [PATCH v34 0/4] Virtio-balloon: support free page reporting Wei Wang
2018-06-25 12:05 ` [PATCH v34 1/4] mm: support to get hints of free page blocks Wei Wang
2018-06-25 12:05 ` [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT Wei Wang
2018-06-26  1:37   ` Michael S. Tsirkin
2018-06-26  3:46     ` Wei Wang
2018-06-26  3:56       ` Michael S. Tsirkin
2018-06-26 12:27         ` Wei Wang
2018-06-26 13:34           ` Michael S. Tsirkin
2018-06-27  1:24             ` Wei Wang
2018-06-27  2:41               ` Michael S. Tsirkin
2018-06-27  3:00                 ` Wei Wang
2018-06-27  3:58                   ` Michael S. Tsirkin
2018-06-27  5:27                     ` Wei Wang
2018-06-27 16:53                       ` [virtio-dev] " Michael S. Tsirkin
2018-06-25 12:05 ` [PATCH v34 3/4] mm/page_poison: expose page_poisoning_enabled to kernel modules Wei Wang
2018-06-25 12:05 ` [PATCH v34 4/4] virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON Wei Wang
2018-06-27 11:06 ` [PATCH v34 0/4] Virtio-balloon: support free page reporting David Hildenbrand
2018-06-29  3:51   ` Wei Wang
2018-06-29  7:46     ` David Hildenbrand
2018-06-29 11:31       ` Wei Wang
2018-06-29 11:53         ` David Hildenbrand
2018-06-29 15:55           ` Wang, Wei W [this message]
2018-06-29 16:03             ` David Hildenbrand
2018-06-29 14:45   ` Michael S. Tsirkin
2018-06-29 15:28     ` David Hildenbrand
2018-06-29 15:52     ` Wang, Wei W
2018-06-29 16:32       ` Michael S. Tsirkin
2018-06-30  4:31 ` Wei Wang
