linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>,
	Alexander Duyck <alexander.duyck@gmail.com>
Cc: Nitesh Narayan Lal <nitesh@redhat.com>,
	kvm list <kvm@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	lcapitulino@redhat.com, pagupta@redhat.com, wei.w.wang@intel.com,
	Yang Zhang <yang.zhang.wz@gmail.com>,
	Rik van Riel <riel@surriel.com>,
	dodgen@google.com, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	dhildenb@redhat.com, Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [RFC][Patch v8 6/7] KVM: Enables the kernel to isolate and report free pages
Date: Mon, 11 Feb 2019 10:28:31 +0100	[thread overview]
Message-ID: <19f6d1f2-9287-6113-07b8-1988907b6108@redhat.com> (raw)
In-Reply-To: <20190209192104-mutt-send-email-mst@kernel.org>

On 10.02.19 01:38, Michael S. Tsirkin wrote:
> On Fri, Feb 08, 2019 at 02:05:09PM -0800, Alexander Duyck wrote:
>> On Fri, Feb 8, 2019 at 1:38 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>
>>> On Fri, Feb 08, 2019 at 03:41:55PM -0500, Nitesh Narayan Lal wrote:
>>>>>> I am also planning to try Michael's suggestion of using MAX_ORDER - 1.
>>>>>> However I am still thinking about a workload which I can use to test its
>>>>>> effectiveness.
>>>>> You might want to look at doing something like min(MAX_ORDER - 1,
>>>>> HUGETLB_PAGE_ORDER). I know for x86 a 2MB page is the upper limit for
>>>>> THP which is the most likely to be used page size with the guest.
>>>> Sure, thanks for the suggestion.
>>>
>>> Given current hinting in balloon is MAX_ORDER I'd say
>>> share code. If you feel a need to adjust down the road,
>>> adjust both of them with actual testing showing gains.
>>
>> Actually I'm left kind of wondering why we are even going through
>> virtio-balloon for this?
> 
> Just look at what does it do.
> 
> It improves memory overcommit if guests are cooperative, and it does
> this by giving the hypervisor addresses of pages which it can discard.
> 
> It's just *exactly* like the balloon with all the same limitations.

I agree, this belongs to virtio-balloon *unless* we run into real
problems implementing it via an asynchronous mechanism.

> 
>> It seems like this would make much more sense
>> as core functionality of KVM itself for the specific architectures
>> rather than some side thing.

Whatever can be handled in user space and does not have significant
performance impacts should be handled in user space. If we run into real
problems with that approach, fair enough. (e.g. vcpu yielding is a good
example where an implementation in KVM makes sense, not going via QEMU)

> 
> Well same as balloon: whether it's useful to you at all
> would very much depend on your workloads.
> 
> This kind of cooperative functionality is good for co-located
> single-tenant VMs. That's pretty niche.  The core things in KVM
> generally don't trust guests.
> 
> 
>> In addition this could end up being
>> redundant when you start getting into either the s390 or PowerPC
>> architectures as they already have means of providing unused page
>> hints.

I'd like to note that on s390x the functionality is not provided when
running nested guests. And there are real problems getting it ever
supported. (see description below how it works on s390x, the issue for
nested guests are the bits in the guest -> host page tables we cannot
support for nested guests).

Hinting only works for guests running one level under LPAR (with a
recent machine), but not nested guests.

(LPAR -> KVM1 works, LPAR - KVM1 -> KVM2 foes not work for the latter)

So an implementation for s390 would still make sense for this scenario.

> 
> Interesting. Is there host support in kvm?

On s390x there is. It works on page granularity and synchronization
between guest/host ("don't drop a page in the host while the guest is
reusing it") is done via special bits in the host->guest page table.
Instructions in the guest are able to modify these bits. A guest can
configure a "usage state" of it's backed PTEs. E.g. "unused" or "stable".

Whenever a page in the guest is freed/reused, the ESSA instruction is
triggered in the guest. It will modify the page table bits and add the
guest phyical pfn to a buffer in the host. Once that buffer is full,
ESSA will trigger an intercept to the hypervisor. Here, all these
"unused" pages can be zapped.

Also, when swapping a page out in the hypervisor, if it was masked by
the guest as unused or logically zero, instead of swapping out the page,
it can simply be dropped and a fresh zero page can be supplied when the
guest tries to access it.

"ESSA" is implemented in KVM in arch/s390/kvm/priv.c:handle_essa().

So on s390x, it works because the synchronization with the hypervisor is
directly built into hw vitualization support (guest->host page tables +
instruction) and ESSA will not intercept on every call (due to the buffer).
> 
> 
>> I have a set of patches I proposed that add similar functionality via
>> a KVM hypercall for x86 instead of doing it as a part of a Virtio
>> device[1].  I'm suspecting the overhead of doing things this way is
>> much less then having to make multiple madvise system calls from QEMU
>> back into the kernel.
> 
> Well whether it's a virtio device is orthogonal to whether it's an
> madvise call, right? You can build vhost-pagehint and that can
> handle requests in a VQ within balloon and do it
> within host kernel directly.
> 
> virtio rings let you pass multiple pages so it's really hard to
> say which will win outright - maybe it's more important
> to coalesce exits.

We don't know until we measure it.

-- 

Thanks,

David / dhildenb

  reply	other threads:[~2019-02-11  9:28 UTC|newest]

Thread overview: 116+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-04 20:18 [RFC][Patch v8 0/7] KVM: Guest Free Page Hinting Nitesh Narayan Lal
2019-02-04 20:18 ` [RFC][Patch v8 1/7] KVM: Support for guest free page hinting Nitesh Narayan Lal
2019-02-05  4:14   ` Michael S. Tsirkin
2019-02-05 13:06     ` Nitesh Narayan Lal
2019-02-05 16:27       ` Michael S. Tsirkin
2019-02-05 16:34         ` Nitesh Narayan Lal
2019-02-04 20:18 ` [RFC][Patch v8 2/7] KVM: Enabling guest free page hinting via static key Nitesh Narayan Lal
2019-02-08 18:07   ` Alexander Duyck
2019-02-08 18:22     ` Nitesh Narayan Lal
2019-02-04 20:18 ` [RFC][Patch v8 3/7] KVM: Guest free page hinting functional skeleton Nitesh Narayan Lal
2019-02-04 20:18 ` [RFC][Patch v8 4/7] KVM: Disabling page poisoning to prevent corruption Nitesh Narayan Lal
2019-02-07 17:23   ` Alexander Duyck
2019-02-07 17:56     ` Nitesh Narayan Lal
2019-02-07 18:24       ` Alexander Duyck
2019-02-07 19:14         ` Michael S. Tsirkin
2019-02-07 21:08   ` Michael S. Tsirkin
2019-02-04 20:18 ` [RFC][Patch v8 5/7] virtio: Enables to add a single descriptor to the host Nitesh Narayan Lal
2019-02-05 20:49   ` Michael S. Tsirkin
2019-02-06 12:56     ` Nitesh Narayan Lal
2019-02-06 13:15       ` Luiz Capitulino
2019-02-06 13:24         ` Nitesh Narayan Lal
2019-02-06 13:29           ` Luiz Capitulino
2019-02-06 14:05             ` Nitesh Narayan Lal
2019-02-06 18:03       ` Michael S. Tsirkin
2019-02-06 18:19         ` Nitesh Narayan Lal
2019-02-04 20:18 ` [RFC][Patch v8 6/7] KVM: Enables the kernel to isolate and report free pages Nitesh Narayan Lal
2019-02-05 20:45   ` Michael S. Tsirkin
2019-02-05 21:54     ` Nitesh Narayan Lal
2019-02-05 21:55       ` Michael S. Tsirkin
2019-02-07 17:43         ` Alexander Duyck
2019-02-07 19:01           ` Michael S. Tsirkin
2019-02-07 20:50           ` Nitesh Narayan Lal
2019-02-08 17:58             ` Alexander Duyck
2019-02-08 20:41               ` Nitesh Narayan Lal
2019-02-08 21:38                 ` Michael S. Tsirkin
2019-02-08 22:05                   ` Alexander Duyck
2019-02-10  0:38                     ` Michael S. Tsirkin
2019-02-11  9:28                       ` David Hildenbrand [this message]
2019-02-12  5:16                         ` Michael S. Tsirkin
2019-02-12 17:10                       ` Nitesh Narayan Lal
2019-02-08 21:35               ` Michael S. Tsirkin
2019-02-04 20:18 ` [RFC][Patch v8 7/7] KVM: Adding tracepoints for guest page hinting Nitesh Narayan Lal
2019-02-04 20:20 ` [RFC][QEMU PATCH] KVM: Support for guest free " Nitesh Narayan Lal
2019-02-12  9:03 ` [RFC][Patch v8 0/7] KVM: Guest Free Page Hinting Wang, Wei W
2019-02-12  9:24   ` David Hildenbrand
2019-02-12 17:24     ` Nitesh Narayan Lal
2019-02-12 19:34       ` David Hildenbrand
2019-02-13  8:55     ` Wang, Wei W
2019-02-13  9:19       ` David Hildenbrand
2019-02-13 12:17         ` Nitesh Narayan Lal
2019-02-13 17:09           ` Michael S. Tsirkin
2019-02-13 17:22             ` Nitesh Narayan Lal
     [not found]               ` <286AC319A985734F985F78AFA26841F73DF6F1C3@shsmsx102.ccr.corp.intel.com>
2019-02-14  9:34                 ` David Hildenbrand
2019-02-13 17:16         ` Michael S. Tsirkin
2019-02-13 17:59           ` David Hildenbrand
2019-02-13 19:08             ` Michael S. Tsirkin
2019-02-14  9:08         ` Wang, Wei W
2019-02-14 10:00           ` David Hildenbrand
2019-02-14 10:44             ` David Hildenbrand
2019-02-15  9:15             ` Wang, Wei W
2019-02-15  9:33               ` David Hildenbrand
2019-02-13  9:00 ` Wang, Wei W
2019-02-13 12:06   ` Nitesh Narayan Lal
2019-02-14  8:48     ` Wang, Wei W
2019-02-14  9:42       ` David Hildenbrand
2019-02-15  9:05         ` Wang, Wei W
2019-02-15  9:41           ` David Hildenbrand
2019-02-18  2:36             ` Wei Wang
2019-02-18  2:39               ` Wei Wang
2019-02-15 12:40           ` Nitesh Narayan Lal
2019-02-14 13:00       ` Nitesh Narayan Lal
2019-02-16  9:40 ` David Hildenbrand
2019-02-18 15:50   ` Nitesh Narayan Lal
2019-02-18 16:02     ` David Hildenbrand
2019-02-18 16:49   ` Michael S. Tsirkin
2019-02-18 16:59     ` David Hildenbrand
2019-02-18 17:31       ` Alexander Duyck
2019-02-18 17:41         ` David Hildenbrand
2019-02-18 23:47           ` Alexander Duyck
2019-02-19  2:45             ` Michael S. Tsirkin
2019-02-19  2:46             ` Andrea Arcangeli
2019-02-19 12:52               ` Nitesh Narayan Lal
2019-02-19 16:23               ` Alexander Duyck
2019-02-19  8:06             ` David Hildenbrand
2019-02-19 14:40               ` Michael S. Tsirkin
2019-02-19 14:44                 ` David Hildenbrand
2019-02-19 14:45                   ` David Hildenbrand
2019-02-18 18:01         ` Michael S. Tsirkin
2019-02-18 17:54       ` Michael S. Tsirkin
2019-02-18 18:29         ` David Hildenbrand
2019-02-18 19:16           ` Michael S. Tsirkin
2019-02-18 19:35             ` David Hildenbrand
2019-02-18 19:47               ` Michael S. Tsirkin
2019-02-18 20:04                 ` David Hildenbrand
2019-02-18 20:31                   ` Michael S. Tsirkin
2019-02-18 20:40                     ` Nitesh Narayan Lal
2019-02-18 21:04                       ` David Hildenbrand
2019-02-19  0:01                         ` Alexander Duyck
2019-02-19  7:54                           ` David Hildenbrand
2019-02-19 18:06                             ` Alexander Duyck
2019-02-19 18:31                               ` David Hildenbrand
2019-02-19 21:57                                 ` Alexander Duyck
2019-02-19 22:17                                   ` Michael S. Tsirkin
2019-02-19 22:36                                   ` David Hildenbrand
2019-02-19 19:58                               ` Michael S. Tsirkin
2019-02-19 20:02                                 ` David Hildenbrand
2019-02-19 20:17                                   ` Michael S. Tsirkin
2019-02-19 20:21                                     ` David Hildenbrand
2019-02-19 20:35                                       ` Michael S. Tsirkin
2019-02-19 12:47                         ` Nitesh Narayan Lal
2019-02-19 13:03                           ` David Hildenbrand
2019-02-19 14:17                             ` Nitesh Narayan Lal
2019-02-19 14:21                               ` David Hildenbrand
2019-02-18 20:53                     ` David Hildenbrand
2019-02-23  0:02 ` Alexander Duyck
2019-02-25 13:01   ` Nitesh Narayan Lal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=19f6d1f2-9287-6113-07b8-1988907b6108@redhat.com \
    --to=david@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=alexander.duyck@gmail.com \
    --cc=dhildenb@redhat.com \
    --cc=dodgen@google.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=lcapitulino@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=nitesh@redhat.com \
    --cc=pagupta@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=riel@surriel.com \
    --cc=wei.w.wang@intel.com \
    --cc=yang.zhang.wz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).