* [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
@ 2013-03-20  8:29 Xiao Guangrong
  0 siblings, 0 replies; 19+ messages in thread
From: Xiao Guangrong @ 2013-03-20  8:29 UTC (permalink / raw)
  To: mtosatti; +Cc: gleb, linux-kernel, kvm, Xiao Guangrong

Changelog:
V2:
  - do not reset n_requested_mmu_pages and n_max_mmu_pages
  - batch free root shadow pages to reduce vcpu notification and mmu-lock
    contention
  - remove the first patch that introduced kvm->arch.mmu_cache, since in this
    version we only 'memset zero' the hashtable rather than all mmu cache
    members
  - remove unnecessary kvm_reload_remote_mmus after kvm_mmu_zap_all

* Issue
The current kvm_mmu_zap_all is really slow - it holds mmu-lock while it
walks and zaps all shadow pages one by one, and it also needs to zap every
guest page's rmap and every shadow page's parent spte list. Things become
particularly bad if the guest uses more memory or vcpus. It does not scale.

* Idea
Since all shadow pages will be zapped, we can directly zap the mmu-cache
and rmaps so that vcpus fault on the new mmu-cache; after that, we can
directly free the memory used by the old mmu-cache.

The root shadow pages are a little special since they are currently in use
by vcpus and we cannot directly free them. Instead, we zap the root shadow
pages and re-add them to the new mmu-cache.
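
As a rough, purely illustrative sketch of that idea (this is not the actual
patch code: kvm_mmu_zap_and_readd_roots() and free_detached_shadow_pages()
are hypothetical helpers made up for this sketch, and the signature of
kvm_clear_all_gfn_page_info() from patch 3 is assumed):

/* Sketch only: invalidate the whole mmu-cache at once instead of zapping
 * shadow pages one by one under mmu-lock. */
static void fast_zap_all_shadow_pages(struct kvm *kvm)
{
	LIST_HEAD(detached);

	spin_lock(&kvm->mmu_lock);

	/* Forget every shadow page at once; vcpus will fault on a fresh,
	 * empty mmu-cache. */
	list_splice_init(&kvm->arch.active_mmu_pages, &detached);
	memset(kvm->arch.mmu_page_hash, 0, sizeof(kvm->arch.mmu_page_hash));

	/* Drop all gfn rmaps so future faults rebuild them from scratch. */
	kvm_clear_all_gfn_page_info(kvm);		/* patch 3 of this series */

	/* Root pages are still referenced by vcpus and cannot be freed:
	 * zap their entries and put them back into the new mmu-cache. */
	kvm_mmu_zap_and_readd_roots(kvm, &detached);	/* hypothetical helper */

	spin_unlock(&kvm->mmu_lock);

	/* Everything left on the detached list is unreachable; free it
	 * without further mmu-lock contention. */
	free_detached_shadow_pages(kvm, &detached);	/* hypothetical helper */
}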

* TODO
(1): free root shadow pages by using generation-number
(2): drop unnecessary @npages from kvm_arch_create_memslot

* Performance
The testcase, which measures the time to delete / add a memslot, can be
found at:
http://www.gossamer-threads.com/lists/engine?do=post_attachment;postatt_id=54896;list=linux
During the measurement all vcpus are idle, i.e. there is no mmu-lock
contention. I believe the results would look even better if other vcpus and
mmu notifications needed to hold the mmu-lock.

Guest VCPU:6, Mem:2048M

before: Run 10 times, Avg time:46078825 ns.

after: Run 10 times, Avg time:21558774 ns. (+ 113%)

Xiao Guangrong (7):
  KVM: MMU: introduce mmu_cache->pte_list_descs
  KVM: x86: introduce memslot_set_lpage_disallowed
  KVM: x86: introduce kvm_clear_all_gfn_page_info
  KVM: MMU: delete shadow page from hash list in
    kvm_mmu_prepare_zap_page
  KVM: MMU: split kvm_mmu_prepare_zap_page
  KVM: MMU: fast zap all shadow pages
  KVM: MMU: drop unnecessary kvm_reload_remote_mmus after
    kvm_mmu_zap_all

 arch/x86/include/asm/kvm_host.h |    7 ++-
 arch/x86/kvm/mmu.c              |  105 ++++++++++++++++++++++++++++++++++-----
 arch/x86/kvm/mmu.h              |    1 +
 arch/x86/kvm/x86.c              |   87 +++++++++++++++++++++++++-------
 include/linux/kvm_host.h        |    1 +
 5 files changed, 166 insertions(+), 35 deletions(-)

-- 
1.7.7.6



* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-04-18 16:36                             ` Gleb Natapov
@ 2013-04-18 17:34                               ` Marcelo Tosatti
  0 siblings, 0 replies; 19+ messages in thread
From: Marcelo Tosatti @ 2013-04-18 17:34 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Xiao Guangrong, linux-kernel, kvm

On Thu, Apr 18, 2013 at 07:36:03PM +0300, Gleb Natapov wrote:
> On Thu, Apr 18, 2013 at 11:01:18AM -0300, Marcelo Tosatti wrote:
> > On Thu, Apr 18, 2013 at 12:42:39PM +0300, Gleb Natapov wrote:
> > > > > that, but if not then less code is better.
> > > > 
> > > > The number of sp->role.invalid=1 pages is small (only shadow roots). It
> > > > can grow but is bounded to a handful. No improvement visible there.
> > > > 
> > > > The number of shadow pages with old mmu_gen_number is potentially large.
> > > > 
> > > > Returning all shadow pages to the allocator is problematic because it
> > > > takes a long time (therefore the suggestion to postpone it).
> > > > 
> > > > Spreading the work to free (or reuse) those shadow pages to individual
> > > > page fault instances alleviates the mmu_lock hold time issue without
> > > > significant reduction to post kvm_mmu_zap_all operation (which has to
> > > > rebuilt all pagetables anyway).
> > > > 
> > > > You prefer to modify SLAB allocator to aggressively free these stale
> > > > shadow pages rather than kvm_mmu_get_page to reuse them?
> > > Are you saying that what makes kvm_mmu_zap_all() slow is that we return
> > > all the shadow pages to the SLAB allocator? As far as I understand what
> > > makes it slow is walking over huge number of shadow pages via various
> > > lists, actually releasing them to the SLAB is not an issue, otherwise
> > > the problem could have been solved by just moving
> > > kvm_mmu_commit_zap_page() out of the mmu_lock. If there is measurable
> > > SLAB overhead from not reusing the pages I am all for reusing them, but
> > > is this really the case or just premature optimization?
> > 
> > Actually releasing them is not a problem. Walking all pages, lists and
> > releasing in the process part of the problem ("returning them to the allocator"
> > would have been clearer with "freeing them").
> > 
> > Point is at some point you have to walk all pages and release their data
> > structures. With Xiaos scheme its possible to avoid this lengthy process
> > by either:
> > 
> > 1) reusing the pages with stale generation number
> > or
> > 2) releasing them via the SLAB shrinker more aggressively
> > 
> But is it really so? The number of allocated shadow pages are limited
> via n_max_mmu_pages mechanism, so I expect most freeing to happen in
> make_mmu_pages_available() which is called during page fault so freeing
> will be spread across page faults more or less equally. Doing
> kvm_mmu_prepare_zap_page()/kvm_mmu_commit_zap_page() and zapping unknown
> number of shadow pages during kvm_mmu_get_page() just to reuse one does
> not sound like a clear win to me.

Makes sense.

> > (another typo, i meant "SLAB shrinker" not "SLAB allocator").
> > 
> > But you seem to be concerned for 1) due to code complexity issues?
> > 
> It adds code that looks to me redundant. I may be wrong of course, if
> it is a demonstrable win I am all for it.

Ditto.



* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-04-18 14:01                           ` Marcelo Tosatti
@ 2013-04-18 16:36                             ` Gleb Natapov
  2013-04-18 17:34                               ` Marcelo Tosatti
  0 siblings, 1 reply; 19+ messages in thread
From: Gleb Natapov @ 2013-04-18 16:36 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Xiao Guangrong, linux-kernel, kvm

On Thu, Apr 18, 2013 at 11:01:18AM -0300, Marcelo Tosatti wrote:
> On Thu, Apr 18, 2013 at 12:42:39PM +0300, Gleb Natapov wrote:
> > > > that, but if not then less code is better.
> > > 
> > > The number of sp->role.invalid=1 pages is small (only shadow roots). It
> > > can grow but is bounded to a handful. No improvement visible there.
> > > 
> > > The number of shadow pages with old mmu_gen_number is potentially large.
> > > 
> > > Returning all shadow pages to the allocator is problematic because it
> > > takes a long time (therefore the suggestion to postpone it).
> > > 
> > > Spreading the work to free (or reuse) those shadow pages to individual
> > > page fault instances alleviates the mmu_lock hold time issue without
> > > significant reduction to post kvm_mmu_zap_all operation (which has to
> > > rebuilt all pagetables anyway).
> > > 
> > > You prefer to modify SLAB allocator to aggressively free these stale
> > > shadow pages rather than kvm_mmu_get_page to reuse them?
> > Are you saying that what makes kvm_mmu_zap_all() slow is that we return
> > all the shadow pages to the SLAB allocator? As far as I understand what
> > makes it slow is walking over huge number of shadow pages via various
> > lists, actually releasing them to the SLAB is not an issue, otherwise
> > the problem could have been solved by just moving
> > kvm_mmu_commit_zap_page() out of the mmu_lock. If there is measurable
> > SLAB overhead from not reusing the pages I am all for reusing them, but
> > is this really the case or just premature optimization?
> 
> Actually releasing them is not a problem. Walking all pages, lists and
> releasing in the process part of the problem ("returning them to the allocator"
> would have been clearer with "freeing them").
> 
> Point is at some point you have to walk all pages and release their data
> structures. With Xiaos scheme its possible to avoid this lengthy process
> by either:
> 
> 1) reusing the pages with stale generation number
> or
> 2) releasing them via the SLAB shrinker more aggressively
> 
But is it really so? The number of allocated shadow pages is limited
via the n_max_mmu_pages mechanism, so I expect most freeing to happen in
make_mmu_pages_available(), which is called during page faults, so the
freeing will be spread across page faults more or less equally. Doing
kvm_mmu_prepare_zap_page()/kvm_mmu_commit_zap_page() and zapping an unknown
number of shadow pages during kvm_mmu_get_page() just to reuse one does
not sound like a clear win to me.
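
(For reference, the cap mechanism referred to above works roughly like this;
a simplified paraphrase of the mmu.c freeing loop of that era, not the
verbatim code:)

static void make_mmu_pages_available(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm = vcpu->kvm;
	struct kvm_mmu_page *sp;
	LIST_HEAD(invalid_list);

	/* Over the n_max_mmu_pages budget? Zap from the tail of
	 * active_mmu_pages, so the freeing cost is paid a little at a time
	 * across many page faults instead of all at once. */
	while (kvm->arch.n_used_mmu_pages >= kvm->arch.n_max_mmu_pages &&
	       !list_empty(&kvm->arch.active_mmu_pages)) {
		sp = list_entry(kvm->arch.active_mmu_pages.prev,
				struct kvm_mmu_page, link);
		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
	}
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
}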

> (another typo, i meant "SLAB shrinker" not "SLAB allocator").
> 
> But you seem to be concerned for 1) due to code complexity issues?
> 
It adds code that looks redundant to me. I may be wrong, of course; if
it is a demonstrable win I am all for it.

--
			Gleb.


* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-04-18  9:42                         ` Gleb Natapov
@ 2013-04-18 14:01                           ` Marcelo Tosatti
  2013-04-18 16:36                             ` Gleb Natapov
  0 siblings, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2013-04-18 14:01 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Xiao Guangrong, linux-kernel, kvm

On Thu, Apr 18, 2013 at 12:42:39PM +0300, Gleb Natapov wrote:
> > > that, but if not then less code is better.
> > 
> > The number of sp->role.invalid=1 pages is small (only shadow roots). It
> > can grow but is bounded to a handful. No improvement visible there.
> > 
> > The number of shadow pages with old mmu_gen_number is potentially large.
> > 
> > Returning all shadow pages to the allocator is problematic because it
> > takes a long time (therefore the suggestion to postpone it).
> > 
> > Spreading the work to free (or reuse) those shadow pages to individual
> > page fault instances alleviates the mmu_lock hold time issue without
> > significant reduction to post kvm_mmu_zap_all operation (which has to
> > rebuilt all pagetables anyway).
> > 
> > You prefer to modify SLAB allocator to aggressively free these stale
> > shadow pages rather than kvm_mmu_get_page to reuse them?
> Are you saying that what makes kvm_mmu_zap_all() slow is that we return
> all the shadow pages to the SLAB allocator? As far as I understand what
> makes it slow is walking over huge number of shadow pages via various
> lists, actually releasing them to the SLAB is not an issue, otherwise
> the problem could have been solved by just moving
> kvm_mmu_commit_zap_page() out of the mmu_lock. If there is measurable
> SLAB overhead from not reusing the pages I am all for reusing them, but
> is this really the case or just premature optimization?

Actually releasing them is not a problem. Walking all the pages and lists,
and releasing them in the process, is part of the problem ("returning them
to the allocator" would have been clearer as "freeing them").

The point is that at some point you have to walk all pages and release their
data structures. With Xiao's scheme it is possible to avoid this lengthy
process by either:

1) reusing the pages with stale generation number
or
2) releasing them via the SLAB shrinker more aggressively

(another typo, i meant "SLAB shrinker" not "SLAB allocator").

But you seem to be concerned for 1) due to code complexity issues?
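
For concreteness, option 2) could look something like the helper below,
called from the existing shrinker path (purely hypothetical: the
sp->mmu_gen_number field and the preference for stale pages are the proposed
parts, not existing code):

/* Hypothetical shrinker-side tweak: reclaim stale-generation pages first,
 * since lookups can never reuse them. */
static void mmu_shrink_stale_pages(struct kvm *kvm, unsigned long nr_to_scan)
{
	struct kvm_mmu_page *sp, *tmp;
	LIST_HEAD(invalid_list);

	spin_lock(&kvm->mmu_lock);
	list_for_each_entry_safe_reverse(sp, tmp,
					 &kvm->arch.active_mmu_pages, link) {
		if (nr_to_scan-- == 0)
			break;
		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number)
			kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
	}
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
	spin_unlock(&kvm->mmu_lock);
}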




* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-04-17 20:39                       ` Marcelo Tosatti
@ 2013-04-18  9:42                         ` Gleb Natapov
  2013-04-18 14:01                           ` Marcelo Tosatti
  0 siblings, 1 reply; 19+ messages in thread
From: Gleb Natapov @ 2013-04-18  9:42 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Xiao Guangrong, linux-kernel, kvm

On Wed, Apr 17, 2013 at 05:39:04PM -0300, Marcelo Tosatti wrote:
> On Fri, Mar 22, 2013 at 09:15:24PM +0200, Gleb Natapov wrote:
> > On Fri, Mar 22, 2013 at 08:37:33PM +0800, Xiao Guangrong wrote:
> > > On 03/22/2013 08:12 PM, Gleb Natapov wrote:
> > > > On Fri, Mar 22, 2013 at 08:03:04PM +0800, Xiao Guangrong wrote:
> > > >> On 03/22/2013 07:47 PM, Gleb Natapov wrote:
> > > >>> On Fri, Mar 22, 2013 at 07:39:24PM +0800, Xiao Guangrong wrote:
> > > >>>> On 03/22/2013 07:28 PM, Gleb Natapov wrote:
> > > >>>>> On Fri, Mar 22, 2013 at 07:10:44PM +0800, Xiao Guangrong wrote:
> > > >>>>>> On 03/22/2013 06:54 PM, Marcelo Tosatti wrote:
> > > >>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> And then have codepaths that nuke shadow pages break from the spinlock,
> > > >>>>>>>>
> > > >>>>>>>> I think this is not needed any more. We can let mmu_notify use the generation
> > > >>>>>>>> number to invalid all shadow pages, then we only need to free them after
> > > >>>>>>>> all vcpus down and mmu_notify unregistered - at this point, no lock contention,
> > > >>>>>>>> we can directly free them.
> > > >>>>>>>>
> > > >>>>>>>>> such as kvm_mmu_slot_remove_write_access does now (spin_needbreak).
> > > >>>>>>>>
> > > >>>>>>>> BTW, to my honest, i do not think spin_needbreak is a good way - it does
> > > >>>>>>>> not fix the hot-lock contention and it just occupies more cpu time to avoid
> > > >>>>>>>> possible soft lock-ups.
> > > >>>>>>>>
> > > >>>>>>>> Especially, zap-all-shadow-pages can let other vcpus fault and vcpus contest
> > > >>>>>>>> mmu-lock, then zap-all-shadow-pages release mmu-lock and wait, other vcpus
> > > >>>>>>>> create page tables again. zap-all-shadow-page need long time to be finished,
> > > >>>>>>>> the worst case is, it can not completed forever on intensive vcpu and memory
> > > >>>>>>>> usage.
> > > >>>>>>>
> > > >>>>>>> Yes, but the suggestion is to use spin_needbreak on the VM shutdown
> > > >>>>>>> cases, where there is no detailed concern about performance. Such as
> > > >>>>>>> mmu_notifier_release, kvm_destroy_vm, etc. In those cases what matters
> > > >>>>>>> most is that host remains unaffected (and that it finishes in a
> > > >>>>>>> reasonable time).
> > > >>>>>>
> > > >>>>>> Okay. I agree with you, will give a try.
> > > >>>>>>
> > > >>>>>>>
> > > >>>>>>>> I still think the right way to fix this kind of thing is optimization for
> > > >>>>>>>> mmu-lock.
> > > >>>>>>>
> > > >>>>>>> And then for the cases where performance matters just increase a
> > > >>>>>>> VM global generetion number, zap the roots and then on kvm_mmu_get_page:
> > > >>>>>>>
> > > >>>>>>> kvm_mmu_get_page() {
> > > >>>>>>> 	sp = lookup_hash(gfn)
> > > >>>>>>> 	if (sp->role = role) {
> > > >>>>>>> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
> > > >>>>>>> 			kvm_mmu_commit_zap_page(sp); (no need for TLB flushes as its unreachable)
> > > >>>>>>> 			kvm_mmu_init_page(sp);
> > > >>>>>>> 			proceed as if the page was just allocated
> > > >>>>>>> 		}
> > > >>>>>>> 	}
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> It makes the kvm_mmu_zap_all path even faster than you have now.
> > > >>>>>>> I suppose this was your idea correct with the generation number correct?
> > > >>>>>>
> > > >>>>>> Wow, great minds think alike, this is exactly what i am doing. ;)
> > > >>>>>>
> > > >>>>> Not that I disagree with above code, but why not make mmu_gen_number to be
> > > >>>>> part of a role and remove old pages in kvm_mmu_free_some_pages() whenever
> > > >>>>> limit is reached like we looks to be doing with role.invalid pages now.
> > > >>>>
> > > >>>> These pages can be reused after purge its entries and delete it from parents
> > > >>>> list, it can reduce the pressure of memory allocator. Also, we can move it to
> > > >>>> the head of active_list so that the pages with invalid_gen can be reclaimed first.
> > > >>>>
> > > >>> You mean tail of the active_list, since kvm_mmu_free_some_pages()
> > > >>> removes pages from tail? Since pages with new mmu_gen_number will be put
> > > >>
> > > >> I mean purge the invalid-gen page first, then update its valid-gen to current-gen,
> > > >> then move it to the head of active_list:
> > > >>
> > > >> kvm_mmu_get_page() {
> > > >> 	sp = lookup_hash(gfn)
> > > >> 	if (sp->role = role) {
> > > >> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
> > > >> 			kvm_mmu_purge_page(sp); (no need for TLB flushes as its unreachable)
> > > >> 			sp->mmu_gen_number = kvm->arch.mmu_gen_number;
> > > >> 			@@@@@@ move sp to the head of active list @@@@@@
> > > >> 		}
> > > >> 	}
> > > >> }
> > > >>
> > > >>
> > > > And I am saying that if you make mmu_gen_number part of the role you do
> > > > not need to change kvm_mmu_get_page() at all. It will just work.
> > > 
> > > Oh, i got what your said. But i want to reuse these page (without
> > > free and re-allocate). What do you think about this?
> > > 
> > We did not do that for sp->role.invalid pages although we could do what
> > is proposed above for them too (am I right?). If there is measurable
> > advantage of reusing invalid pages in kvm_mmu_get_page() lets do it like
> > that, but if not then less code is better.
> 
> The number of sp->role.invalid=1 pages is small (only shadow roots). It
> can grow but is bounded to a handful. No improvement visible there.
> 
> The number of shadow pages with old mmu_gen_number is potentially large.
> 
> Returning all shadow pages to the allocator is problematic because it
> takes a long time (therefore the suggestion to postpone it).
> 
> Spreading the work to free (or reuse) those shadow pages to individual
> page fault instances alleviates the mmu_lock hold time issue without
> significant reduction to post kvm_mmu_zap_all operation (which has to
> rebuilt all pagetables anyway).
> 
> You prefer to modify SLAB allocator to aggressively free these stale
> shadow pages rather than kvm_mmu_get_page to reuse them?
Are you saying that what makes kvm_mmu_zap_all() slow is that we return
all the shadow pages to the SLAB allocator? As far as I understand what
makes it slow is walking over huge number of shadow pages via various
lists, actually releasing them to the SLAB is not an issue, otherwise
the problem could have been solved by just moving
kvm_mmu_commit_zap_page() out of the mmu_lock. If there is measurable
SLAB overhead from not reusing the pages I am all for reusing them, but
is this really the case or just premature optimization?

--
			Gleb.


* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-03-22 19:15                     ` Gleb Natapov
@ 2013-04-17 20:39                       ` Marcelo Tosatti
  2013-04-18  9:42                         ` Gleb Natapov
  0 siblings, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2013-04-17 20:39 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Xiao Guangrong, linux-kernel, kvm

On Fri, Mar 22, 2013 at 09:15:24PM +0200, Gleb Natapov wrote:
> On Fri, Mar 22, 2013 at 08:37:33PM +0800, Xiao Guangrong wrote:
> > On 03/22/2013 08:12 PM, Gleb Natapov wrote:
> > > On Fri, Mar 22, 2013 at 08:03:04PM +0800, Xiao Guangrong wrote:
> > >> On 03/22/2013 07:47 PM, Gleb Natapov wrote:
> > >>> On Fri, Mar 22, 2013 at 07:39:24PM +0800, Xiao Guangrong wrote:
> > >>>> On 03/22/2013 07:28 PM, Gleb Natapov wrote:
> > >>>>> On Fri, Mar 22, 2013 at 07:10:44PM +0800, Xiao Guangrong wrote:
> > >>>>>> On 03/22/2013 06:54 PM, Marcelo Tosatti wrote:
> > >>>>>>
> > >>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> And then have codepaths that nuke shadow pages break from the spinlock,
> > >>>>>>>>
> > >>>>>>>> I think this is not needed any more. We can let mmu_notify use the generation
> > >>>>>>>> number to invalid all shadow pages, then we only need to free them after
> > >>>>>>>> all vcpus down and mmu_notify unregistered - at this point, no lock contention,
> > >>>>>>>> we can directly free them.
> > >>>>>>>>
> > >>>>>>>>> such as kvm_mmu_slot_remove_write_access does now (spin_needbreak).
> > >>>>>>>>
> > >>>>>>>> BTW, to my honest, i do not think spin_needbreak is a good way - it does
> > >>>>>>>> not fix the hot-lock contention and it just occupies more cpu time to avoid
> > >>>>>>>> possible soft lock-ups.
> > >>>>>>>>
> > >>>>>>>> Especially, zap-all-shadow-pages can let other vcpus fault and vcpus contest
> > >>>>>>>> mmu-lock, then zap-all-shadow-pages release mmu-lock and wait, other vcpus
> > >>>>>>>> create page tables again. zap-all-shadow-page need long time to be finished,
> > >>>>>>>> the worst case is, it can not completed forever on intensive vcpu and memory
> > >>>>>>>> usage.
> > >>>>>>>
> > >>>>>>> Yes, but the suggestion is to use spin_needbreak on the VM shutdown
> > >>>>>>> cases, where there is no detailed concern about performance. Such as
> > >>>>>>> mmu_notifier_release, kvm_destroy_vm, etc. In those cases what matters
> > >>>>>>> most is that host remains unaffected (and that it finishes in a
> > >>>>>>> reasonable time).
> > >>>>>>
> > >>>>>> Okay. I agree with you, will give a try.
> > >>>>>>
> > >>>>>>>
> > >>>>>>>> I still think the right way to fix this kind of thing is optimization for
> > >>>>>>>> mmu-lock.
> > >>>>>>>
> > >>>>>>> And then for the cases where performance matters just increase a
> > >>>>>>> VM global generetion number, zap the roots and then on kvm_mmu_get_page:
> > >>>>>>>
> > >>>>>>> kvm_mmu_get_page() {
> > >>>>>>> 	sp = lookup_hash(gfn)
> > >>>>>>> 	if (sp->role = role) {
> > >>>>>>> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
> > >>>>>>> 			kvm_mmu_commit_zap_page(sp); (no need for TLB flushes as its unreachable)
> > >>>>>>> 			kvm_mmu_init_page(sp);
> > >>>>>>> 			proceed as if the page was just allocated
> > >>>>>>> 		}
> > >>>>>>> 	}
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> It makes the kvm_mmu_zap_all path even faster than you have now.
> > >>>>>>> I suppose this was your idea correct with the generation number correct?
> > >>>>>>
> > >>>>>> Wow, great minds think alike, this is exactly what i am doing. ;)
> > >>>>>>
> > >>>>> Not that I disagree with above code, but why not make mmu_gen_number to be
> > >>>>> part of a role and remove old pages in kvm_mmu_free_some_pages() whenever
> > >>>>> limit is reached like we looks to be doing with role.invalid pages now.
> > >>>>
> > >>>> These pages can be reused after purge its entries and delete it from parents
> > >>>> list, it can reduce the pressure of memory allocator. Also, we can move it to
> > >>>> the head of active_list so that the pages with invalid_gen can be reclaimed first.
> > >>>>
> > >>> You mean tail of the active_list, since kvm_mmu_free_some_pages()
> > >>> removes pages from tail? Since pages with new mmu_gen_number will be put
> > >>
> > >> I mean purge the invalid-gen page first, then update its valid-gen to current-gen,
> > >> then move it to the head of active_list:
> > >>
> > >> kvm_mmu_get_page() {
> > >> 	sp = lookup_hash(gfn)
> > >> 	if (sp->role = role) {
> > >> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
> > >> 			kvm_mmu_purge_page(sp); (no need for TLB flushes as its unreachable)
> > >> 			sp->mmu_gen_number = kvm->arch.mmu_gen_number;
> > >> 			@@@@@@ move sp to the head of active list @@@@@@
> > >> 		}
> > >> 	}
> > >> }
> > >>
> > >>
> > > And I am saying that if you make mmu_gen_number part of the role you do
> > > not need to change kvm_mmu_get_page() at all. It will just work.
> > 
> > Oh, i got what your said. But i want to reuse these page (without
> > free and re-allocate). What do you think about this?
> > 
> We did not do that for sp->role.invalid pages although we could do what
> is proposed above for them too (am I right?). If there is measurable
> advantage of reusing invalid pages in kvm_mmu_get_page() lets do it like
> that, but if not then less code is better.

The number of sp->role.invalid=1 pages is small (only shadow roots). It
can grow but is bounded to a handful. No improvement visible there.

The number of shadow pages with an old mmu_gen_number is potentially large.

Returning all shadow pages to the allocator is problematic because it
takes a long time (hence the suggestion to postpone it).

Spreading the work of freeing (or reusing) those shadow pages across
individual page fault instances alleviates the mmu_lock hold time issue
without significantly slowing down the post-kvm_mmu_zap_all operation
(which has to rebuild all pagetables anyway).

Do you prefer to modify the SLAB allocator to aggressively free these stale
shadow pages rather than have kvm_mmu_get_page reuse them?



* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-03-22 12:37                   ` Xiao Guangrong
@ 2013-03-22 19:15                     ` Gleb Natapov
  2013-04-17 20:39                       ` Marcelo Tosatti
  0 siblings, 1 reply; 19+ messages in thread
From: Gleb Natapov @ 2013-03-22 19:15 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, linux-kernel, kvm

On Fri, Mar 22, 2013 at 08:37:33PM +0800, Xiao Guangrong wrote:
> On 03/22/2013 08:12 PM, Gleb Natapov wrote:
> > On Fri, Mar 22, 2013 at 08:03:04PM +0800, Xiao Guangrong wrote:
> >> On 03/22/2013 07:47 PM, Gleb Natapov wrote:
> >>> On Fri, Mar 22, 2013 at 07:39:24PM +0800, Xiao Guangrong wrote:
> >>>> On 03/22/2013 07:28 PM, Gleb Natapov wrote:
> >>>>> On Fri, Mar 22, 2013 at 07:10:44PM +0800, Xiao Guangrong wrote:
> >>>>>> On 03/22/2013 06:54 PM, Marcelo Tosatti wrote:
> >>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> And then have codepaths that nuke shadow pages break from the spinlock,
> >>>>>>>>
> >>>>>>>> I think this is not needed any more. We can let mmu_notify use the generation
> >>>>>>>> number to invalid all shadow pages, then we only need to free them after
> >>>>>>>> all vcpus down and mmu_notify unregistered - at this point, no lock contention,
> >>>>>>>> we can directly free them.
> >>>>>>>>
> >>>>>>>>> such as kvm_mmu_slot_remove_write_access does now (spin_needbreak).
> >>>>>>>>
> >>>>>>>> BTW, to my honest, i do not think spin_needbreak is a good way - it does
> >>>>>>>> not fix the hot-lock contention and it just occupies more cpu time to avoid
> >>>>>>>> possible soft lock-ups.
> >>>>>>>>
> >>>>>>>> Especially, zap-all-shadow-pages can let other vcpus fault and vcpus contest
> >>>>>>>> mmu-lock, then zap-all-shadow-pages release mmu-lock and wait, other vcpus
> >>>>>>>> create page tables again. zap-all-shadow-page need long time to be finished,
> >>>>>>>> the worst case is, it can not completed forever on intensive vcpu and memory
> >>>>>>>> usage.
> >>>>>>>
> >>>>>>> Yes, but the suggestion is to use spin_needbreak on the VM shutdown
> >>>>>>> cases, where there is no detailed concern about performance. Such as
> >>>>>>> mmu_notifier_release, kvm_destroy_vm, etc. In those cases what matters
> >>>>>>> most is that host remains unaffected (and that it finishes in a
> >>>>>>> reasonable time).
> >>>>>>
> >>>>>> Okay. I agree with you, will give a try.
> >>>>>>
> >>>>>>>
> >>>>>>>> I still think the right way to fix this kind of thing is optimization for
> >>>>>>>> mmu-lock.
> >>>>>>>
> >>>>>>> And then for the cases where performance matters just increase a
> >>>>>>> VM global generetion number, zap the roots and then on kvm_mmu_get_page:
> >>>>>>>
> >>>>>>> kvm_mmu_get_page() {
> >>>>>>> 	sp = lookup_hash(gfn)
> >>>>>>> 	if (sp->role = role) {
> >>>>>>> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
> >>>>>>> 			kvm_mmu_commit_zap_page(sp); (no need for TLB flushes as its unreachable)
> >>>>>>> 			kvm_mmu_init_page(sp);
> >>>>>>> 			proceed as if the page was just allocated
> >>>>>>> 		}
> >>>>>>> 	}
> >>>>>>> }
> >>>>>>>
> >>>>>>> It makes the kvm_mmu_zap_all path even faster than you have now.
> >>>>>>> I suppose this was your idea correct with the generation number correct?
> >>>>>>
> >>>>>> Wow, great minds think alike, this is exactly what i am doing. ;)
> >>>>>>
> >>>>> Not that I disagree with above code, but why not make mmu_gen_number to be
> >>>>> part of a role and remove old pages in kvm_mmu_free_some_pages() whenever
> >>>>> limit is reached like we looks to be doing with role.invalid pages now.
> >>>>
> >>>> These pages can be reused after purge its entries and delete it from parents
> >>>> list, it can reduce the pressure of memory allocator. Also, we can move it to
> >>>> the head of active_list so that the pages with invalid_gen can be reclaimed first.
> >>>>
> >>> You mean tail of the active_list, since kvm_mmu_free_some_pages()
> >>> removes pages from tail? Since pages with new mmu_gen_number will be put
> >>
> >> I mean purge the invalid-gen page first, then update its valid-gen to current-gen,
> >> then move it to the head of active_list:
> >>
> >> kvm_mmu_get_page() {
> >> 	sp = lookup_hash(gfn)
> >> 	if (sp->role = role) {
> >> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
> >> 			kvm_mmu_purge_page(sp); (no need for TLB flushes as its unreachable)
> >> 			sp->mmu_gen_number = kvm->arch.mmu_gen_number;
> >> 			@@@@@@ move sp to the head of active list @@@@@@
> >> 		}
> >> 	}
> >> }
> >>
> >>
> > And I am saying that if you make mmu_gen_number part of the role you do
> > not need to change kvm_mmu_get_page() at all. It will just work.
> 
> Oh, i got what your said. But i want to reuse these page (without
> free and re-allocate). What do you think about this?
> 
We did not do that for sp->role.invalid pages, although we could do what
is proposed above for them too (am I right?). If there is a measurable
advantage to reusing invalid pages in kvm_mmu_get_page() let's do it like
that, but if not then less code is better.

> > 
> >>> at the head of the list it is natural that tail will contain pages with
> >>> outdated generation numbers without need to explicitly move them.
> >>
> >> Currently, only the new allocated page can be moved to the head of
> >> active_list. The existing pages are not moved by kvm_mmu_get_page.
> >> It seems a bug.
> > Ideally it needs to be LRU list based on accessed bit scanning.
> 
> Yes, but unfortunately, A bit does not be supported on some intel cpus...
> 
Yes, but there is not much we can do there.

--
			Gleb.


* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-03-22 12:12                 ` Gleb Natapov
@ 2013-03-22 12:37                   ` Xiao Guangrong
  2013-03-22 19:15                     ` Gleb Natapov
  0 siblings, 1 reply; 19+ messages in thread
From: Xiao Guangrong @ 2013-03-22 12:37 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Marcelo Tosatti, linux-kernel, kvm

On 03/22/2013 08:12 PM, Gleb Natapov wrote:
> On Fri, Mar 22, 2013 at 08:03:04PM +0800, Xiao Guangrong wrote:
>> On 03/22/2013 07:47 PM, Gleb Natapov wrote:
>>> On Fri, Mar 22, 2013 at 07:39:24PM +0800, Xiao Guangrong wrote:
>>>> On 03/22/2013 07:28 PM, Gleb Natapov wrote:
>>>>> On Fri, Mar 22, 2013 at 07:10:44PM +0800, Xiao Guangrong wrote:
>>>>>> On 03/22/2013 06:54 PM, Marcelo Tosatti wrote:
>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> And then have codepaths that nuke shadow pages break from the spinlock,
>>>>>>>>
>>>>>>>> I think this is not needed any more. We can let mmu_notify use the generation
>>>>>>>> number to invalid all shadow pages, then we only need to free them after
>>>>>>>> all vcpus down and mmu_notify unregistered - at this point, no lock contention,
>>>>>>>> we can directly free them.
>>>>>>>>
>>>>>>>>> such as kvm_mmu_slot_remove_write_access does now (spin_needbreak).
>>>>>>>>
>>>>>>>> BTW, to my honest, i do not think spin_needbreak is a good way - it does
>>>>>>>> not fix the hot-lock contention and it just occupies more cpu time to avoid
>>>>>>>> possible soft lock-ups.
>>>>>>>>
>>>>>>>> Especially, zap-all-shadow-pages can let other vcpus fault and vcpus contest
>>>>>>>> mmu-lock, then zap-all-shadow-pages release mmu-lock and wait, other vcpus
>>>>>>>> create page tables again. zap-all-shadow-page need long time to be finished,
>>>>>>>> the worst case is, it can not completed forever on intensive vcpu and memory
>>>>>>>> usage.
>>>>>>>
>>>>>>> Yes, but the suggestion is to use spin_needbreak on the VM shutdown
>>>>>>> cases, where there is no detailed concern about performance. Such as
>>>>>>> mmu_notifier_release, kvm_destroy_vm, etc. In those cases what matters
>>>>>>> most is that host remains unaffected (and that it finishes in a
>>>>>>> reasonable time).
>>>>>>
>>>>>> Okay. I agree with you, will give a try.
>>>>>>
>>>>>>>
>>>>>>>> I still think the right way to fix this kind of thing is optimization for
>>>>>>>> mmu-lock.
>>>>>>>
>>>>>>> And then for the cases where performance matters just increase a
>>>>>>> VM global generetion number, zap the roots and then on kvm_mmu_get_page:
>>>>>>>
>>>>>>> kvm_mmu_get_page() {
>>>>>>> 	sp = lookup_hash(gfn)
>>>>>>> 	if (sp->role = role) {
>>>>>>> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
>>>>>>> 			kvm_mmu_commit_zap_page(sp); (no need for TLB flushes as its unreachable)
>>>>>>> 			kvm_mmu_init_page(sp);
>>>>>>> 			proceed as if the page was just allocated
>>>>>>> 		}
>>>>>>> 	}
>>>>>>> }
>>>>>>>
>>>>>>> It makes the kvm_mmu_zap_all path even faster than you have now.
>>>>>>> I suppose this was your idea correct with the generation number correct?
>>>>>>
>>>>>> Wow, great minds think alike, this is exactly what i am doing. ;)
>>>>>>
>>>>> Not that I disagree with above code, but why not make mmu_gen_number to be
>>>>> part of a role and remove old pages in kvm_mmu_free_some_pages() whenever
>>>>> limit is reached like we looks to be doing with role.invalid pages now.
>>>>
>>>> These pages can be reused after purge its entries and delete it from parents
>>>> list, it can reduce the pressure of memory allocator. Also, we can move it to
>>>> the head of active_list so that the pages with invalid_gen can be reclaimed first.
>>>>
>>> You mean tail of the active_list, since kvm_mmu_free_some_pages()
>>> removes pages from tail? Since pages with new mmu_gen_number will be put
>>
>> I mean purge the invalid-gen page first, then update its valid-gen to current-gen,
>> then move it to the head of active_list:
>>
>> kvm_mmu_get_page() {
>> 	sp = lookup_hash(gfn)
>> 	if (sp->role = role) {
>> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
>> 			kvm_mmu_purge_page(sp); (no need for TLB flushes as its unreachable)
>> 			sp->mmu_gen_number = kvm->arch.mmu_gen_number;
>> 			@@@@@@ move sp to the head of active list @@@@@@
>> 		}
>> 	}
>> }
>>
>>
> And I am saying that if you make mmu_gen_number part of the role you do
> not need to change kvm_mmu_get_page() at all. It will just work.

Oh, I got what you said. But I want to reuse these pages (without
freeing and re-allocating them). What do you think about this?

> 
>>> at the head of the list it is natural that tail will contain pages with
>>> outdated generation numbers without need to explicitly move them.
>>
>> Currently, only the new allocated page can be moved to the head of
>> active_list. The existing pages are not moved by kvm_mmu_get_page.
>> It seems a bug.
> Ideally it needs to be LRU list based on accessed bit scanning.

Yes, but unfortunately the A (accessed) bit is not supported on some Intel CPUs...



* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-03-22 12:03               ` Xiao Guangrong
@ 2013-03-22 12:12                 ` Gleb Natapov
  2013-03-22 12:37                   ` Xiao Guangrong
  0 siblings, 1 reply; 19+ messages in thread
From: Gleb Natapov @ 2013-03-22 12:12 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, linux-kernel, kvm

On Fri, Mar 22, 2013 at 08:03:04PM +0800, Xiao Guangrong wrote:
> On 03/22/2013 07:47 PM, Gleb Natapov wrote:
> > On Fri, Mar 22, 2013 at 07:39:24PM +0800, Xiao Guangrong wrote:
> >> On 03/22/2013 07:28 PM, Gleb Natapov wrote:
> >>> On Fri, Mar 22, 2013 at 07:10:44PM +0800, Xiao Guangrong wrote:
> >>>> On 03/22/2013 06:54 PM, Marcelo Tosatti wrote:
> >>>>
> >>>>>>
> >>>>>>>
> >>>>>>> And then have codepaths that nuke shadow pages break from the spinlock,
> >>>>>>
> >>>>>> I think this is not needed any more. We can let mmu_notify use the generation
> >>>>>> number to invalid all shadow pages, then we only need to free them after
> >>>>>> all vcpus down and mmu_notify unregistered - at this point, no lock contention,
> >>>>>> we can directly free them.
> >>>>>>
> >>>>>>> such as kvm_mmu_slot_remove_write_access does now (spin_needbreak).
> >>>>>>
> >>>>>> BTW, to my honest, i do not think spin_needbreak is a good way - it does
> >>>>>> not fix the hot-lock contention and it just occupies more cpu time to avoid
> >>>>>> possible soft lock-ups.
> >>>>>>
> >>>>>> Especially, zap-all-shadow-pages can let other vcpus fault and vcpus contest
> >>>>>> mmu-lock, then zap-all-shadow-pages release mmu-lock and wait, other vcpus
> >>>>>> create page tables again. zap-all-shadow-page need long time to be finished,
> >>>>>> the worst case is, it can not completed forever on intensive vcpu and memory
> >>>>>> usage.
> >>>>>
> >>>>> Yes, but the suggestion is to use spin_needbreak on the VM shutdown
> >>>>> cases, where there is no detailed concern about performance. Such as
> >>>>> mmu_notifier_release, kvm_destroy_vm, etc. In those cases what matters
> >>>>> most is that host remains unaffected (and that it finishes in a
> >>>>> reasonable time).
> >>>>
> >>>> Okay. I agree with you, will give a try.
> >>>>
> >>>>>
> >>>>>> I still think the right way to fix this kind of thing is optimization for
> >>>>>> mmu-lock.
> >>>>>
> >>>>> And then for the cases where performance matters just increase a
> >>>>> VM global generetion number, zap the roots and then on kvm_mmu_get_page:
> >>>>>
> >>>>> kvm_mmu_get_page() {
> >>>>> 	sp = lookup_hash(gfn)
> >>>>> 	if (sp->role = role) {
> >>>>> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
> >>>>> 			kvm_mmu_commit_zap_page(sp); (no need for TLB flushes as its unreachable)
> >>>>> 			kvm_mmu_init_page(sp);
> >>>>> 			proceed as if the page was just allocated
> >>>>> 		}
> >>>>> 	}
> >>>>> }
> >>>>>
> >>>>> It makes the kvm_mmu_zap_all path even faster than you have now.
> >>>>> I suppose this was your idea correct with the generation number correct?
> >>>>
> >>>> Wow, great minds think alike, this is exactly what i am doing. ;)
> >>>>
> >>> Not that I disagree with above code, but why not make mmu_gen_number to be
> >>> part of a role and remove old pages in kvm_mmu_free_some_pages() whenever
> >>> limit is reached like we looks to be doing with role.invalid pages now.
> >>
> >> These pages can be reused after purge its entries and delete it from parents
> >> list, it can reduce the pressure of memory allocator. Also, we can move it to
> >> the head of active_list so that the pages with invalid_gen can be reclaimed first.
> >>
> > You mean tail of the active_list, since kvm_mmu_free_some_pages()
> > removes pages from tail? Since pages with new mmu_gen_number will be put
> 
> I mean purge the invalid-gen page first, then update its valid-gen to current-gen,
> then move it to the head of active_list:
> 
> kvm_mmu_get_page() {
> 	sp = lookup_hash(gfn)
> 	if (sp->role = role) {
> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
> 			kvm_mmu_purge_page(sp); (no need for TLB flushes as its unreachable)
> 			sp->mmu_gen_number = kvm->arch.mmu_gen_number;
> 			@@@@@@ move sp to the head of active list @@@@@@
> 		}
> 	}
> }
> 
> 
And I am saying that if you make mmu_gen_number part of the role you do
not need to change kvm_mmu_get_page() at all. It will just work.

> > at the head of the list it is natural that tail will contain pages with
> > outdated generation numbers without need to explicitly move them.
> 
> Currently, only the new allocated page can be moved to the head of
> active_list. The existing pages are not moved by kvm_mmu_get_page.
> It seems a bug.
Ideally it needs to be an LRU list based on accessed-bit scanning.

--
			Gleb.


* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-03-22 11:47             ` Gleb Natapov
@ 2013-03-22 12:03               ` Xiao Guangrong
  2013-03-22 12:12                 ` Gleb Natapov
  0 siblings, 1 reply; 19+ messages in thread
From: Xiao Guangrong @ 2013-03-22 12:03 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Marcelo Tosatti, linux-kernel, kvm

On 03/22/2013 07:47 PM, Gleb Natapov wrote:
> On Fri, Mar 22, 2013 at 07:39:24PM +0800, Xiao Guangrong wrote:
>> On 03/22/2013 07:28 PM, Gleb Natapov wrote:
>>> On Fri, Mar 22, 2013 at 07:10:44PM +0800, Xiao Guangrong wrote:
>>>> On 03/22/2013 06:54 PM, Marcelo Tosatti wrote:
>>>>
>>>>>>
>>>>>>>
>>>>>>> And then have codepaths that nuke shadow pages break from the spinlock,
>>>>>>
>>>>>> I think this is not needed any more. We can let mmu_notify use the generation
>>>>>> number to invalid all shadow pages, then we only need to free them after
>>>>>> all vcpus down and mmu_notify unregistered - at this point, no lock contention,
>>>>>> we can directly free them.
>>>>>>
>>>>>>> such as kvm_mmu_slot_remove_write_access does now (spin_needbreak).
>>>>>>
>>>>>> BTW, to my honest, i do not think spin_needbreak is a good way - it does
>>>>>> not fix the hot-lock contention and it just occupies more cpu time to avoid
>>>>>> possible soft lock-ups.
>>>>>>
>>>>>> Especially, zap-all-shadow-pages can let other vcpus fault and vcpus contest
>>>>>> mmu-lock, then zap-all-shadow-pages release mmu-lock and wait, other vcpus
>>>>>> create page tables again. zap-all-shadow-page need long time to be finished,
>>>>>> the worst case is, it can not completed forever on intensive vcpu and memory
>>>>>> usage.
>>>>>
>>>>> Yes, but the suggestion is to use spin_needbreak on the VM shutdown
>>>>> cases, where there is no detailed concern about performance. Such as
>>>>> mmu_notifier_release, kvm_destroy_vm, etc. In those cases what matters
>>>>> most is that host remains unaffected (and that it finishes in a
>>>>> reasonable time).
>>>>
>>>> Okay. I agree with you, will give a try.
>>>>
>>>>>
>>>>>> I still think the right way to fix this kind of thing is optimization for
>>>>>> mmu-lock.
>>>>>
>>>>> And then for the cases where performance matters just increase a
>>>>> VM global generetion number, zap the roots and then on kvm_mmu_get_page:
>>>>>
>>>>> kvm_mmu_get_page() {
>>>>> 	sp = lookup_hash(gfn)
>>>>> 	if (sp->role = role) {
>>>>> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
>>>>> 			kvm_mmu_commit_zap_page(sp); (no need for TLB flushes as its unreachable)
>>>>> 			kvm_mmu_init_page(sp);
>>>>> 			proceed as if the page was just allocated
>>>>> 		}
>>>>> 	}
>>>>> }
>>>>>
>>>>> It makes the kvm_mmu_zap_all path even faster than you have now.
>>>>> I suppose this was your idea correct with the generation number correct?
>>>>
>>>> Wow, great minds think alike, this is exactly what i am doing. ;)
>>>>
>>> Not that I disagree with above code, but why not make mmu_gen_number to be
>>> part of a role and remove old pages in kvm_mmu_free_some_pages() whenever
>>> limit is reached like we looks to be doing with role.invalid pages now.
>>
>> These pages can be reused after purge its entries and delete it from parents
>> list, it can reduce the pressure of memory allocator. Also, we can move it to
>> the head of active_list so that the pages with invalid_gen can be reclaimed first.
>>
> You mean tail of the active_list, since kvm_mmu_free_some_pages()
> removes pages from tail? Since pages with new mmu_gen_number will be put

I mean purge the invalid-generation page first, then update its generation to
the current one, and then move it to the head of active_list:

kvm_mmu_get_page() {
	sp = lookup_hash(gfn)
	if (sp->role = role) {
		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
			kvm_mmu_purge_page(sp); (no need for TLB flushes as its unreachable)
			sp->mmu_gen_number = kvm->arch.mmu_gen_number;
			@@@@@@ move sp to the head of active list @@@@@@
		}
	}
}
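
Spelled out a little more concretely (sp->mmu_gen_number,
kvm->arch.mmu_gen_number and kvm_mmu_purge_page() are the fields/helper
proposed in this thread, not existing kernel code), the reuse step inside
kvm_mmu_get_page() might look like:

static struct kvm_mmu_page *reuse_stale_sp(struct kvm *kvm,
					   struct kvm_mmu_page *sp)
{
	if (sp->mmu_gen_number == kvm->arch.mmu_gen_number)
		return sp;			/* already current */

	/* Drop its sptes and unlink it from parent lists; no TLB flush is
	 * needed since the page is already unreachable. */
	kvm_mmu_purge_page(kvm, sp);		/* hypothetical, as above */
	sp->mmu_gen_number = kvm->arch.mmu_gen_number;

	/* Bump it to the head so tail-based reclaim picks other pages first. */
	list_move(&sp->link, &kvm->arch.active_mmu_pages);
	return sp;
}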


> at the head of the list it is natural that tail will contain pages with
> outdated generation numbers without need to explicitly move them.

Currently, only newly allocated pages are moved to the head of
active_list. Existing pages are not moved by kvm_mmu_get_page.
That looks like a bug.



* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-03-22 11:39           ` Xiao Guangrong
@ 2013-03-22 11:47             ` Gleb Natapov
  2013-03-22 12:03               ` Xiao Guangrong
  0 siblings, 1 reply; 19+ messages in thread
From: Gleb Natapov @ 2013-03-22 11:47 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, linux-kernel, kvm

On Fri, Mar 22, 2013 at 07:39:24PM +0800, Xiao Guangrong wrote:
> On 03/22/2013 07:28 PM, Gleb Natapov wrote:
> > On Fri, Mar 22, 2013 at 07:10:44PM +0800, Xiao Guangrong wrote:
> >> On 03/22/2013 06:54 PM, Marcelo Tosatti wrote:
> >>
> >>>>
> >>>>>
> >>>>> And then have codepaths that nuke shadow pages break from the spinlock,
> >>>>
> >>>> I think this is not needed any more. We can let mmu_notify use the generation
> >>>> number to invalid all shadow pages, then we only need to free them after
> >>>> all vcpus down and mmu_notify unregistered - at this point, no lock contention,
> >>>> we can directly free them.
> >>>>
> >>>>> such as kvm_mmu_slot_remove_write_access does now (spin_needbreak).
> >>>>
> >>>> BTW, to my honest, i do not think spin_needbreak is a good way - it does
> >>>> not fix the hot-lock contention and it just occupies more cpu time to avoid
> >>>> possible soft lock-ups.
> >>>>
> >>>> Especially, zap-all-shadow-pages can let other vcpus fault and vcpus contest
> >>>> mmu-lock, then zap-all-shadow-pages release mmu-lock and wait, other vcpus
> >>>> create page tables again. zap-all-shadow-page need long time to be finished,
> >>>> the worst case is, it can not completed forever on intensive vcpu and memory
> >>>> usage.
> >>>
> >>> Yes, but the suggestion is to use spin_needbreak on the VM shutdown
> >>> cases, where there is no detailed concern about performance. Such as
> >>> mmu_notifier_release, kvm_destroy_vm, etc. In those cases what matters
> >>> most is that host remains unaffected (and that it finishes in a
> >>> reasonable time).
> >>
> >> Okay. I agree with you, will give a try.
> >>
> >>>
> >>>> I still think the right way to fix this kind of thing is optimization for
> >>>> mmu-lock.
> >>>
> >>> And then for the cases where performance matters just increase a
> >>> VM global generetion number, zap the roots and then on kvm_mmu_get_page:
> >>>
> >>> kvm_mmu_get_page() {
> >>> 	sp = lookup_hash(gfn)
> >>> 	if (sp->role = role) {
> >>> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
> >>> 			kvm_mmu_commit_zap_page(sp); (no need for TLB flushes as its unreachable)
> >>> 			kvm_mmu_init_page(sp);
> >>> 			proceed as if the page was just allocated
> >>> 		}
> >>> 	}
> >>> }
> >>>
> >>> It makes the kvm_mmu_zap_all path even faster than you have now.
> >>> I suppose this was your idea correct with the generation number correct?
> >>
> >> Wow, great minds think alike, this is exactly what i am doing. ;)
> >>
> > Not that I disagree with above code, but why not make mmu_gen_number to be
> > part of a role and remove old pages in kvm_mmu_free_some_pages() whenever
> > limit is reached like we looks to be doing with role.invalid pages now.
> 
> These pages can be reused after purge its entries and delete it from parents
> list, it can reduce the pressure of memory allocator. Also, we can move it to
> the head of active_list so that the pages with invalid_gen can be reclaimed first.
> 
You mean the tail of the active_list, since kvm_mmu_free_some_pages()
removes pages from the tail? Since pages with a new mmu_gen_number will be
put at the head of the list, it is natural that the tail will contain pages
with outdated generation numbers, without needing to move them explicitly.

--
			Gleb.


* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-03-22 11:28         ` Gleb Natapov
@ 2013-03-22 11:39           ` Xiao Guangrong
  2013-03-22 11:47             ` Gleb Natapov
  0 siblings, 1 reply; 19+ messages in thread
From: Xiao Guangrong @ 2013-03-22 11:39 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Marcelo Tosatti, linux-kernel, kvm

On 03/22/2013 07:28 PM, Gleb Natapov wrote:
> On Fri, Mar 22, 2013 at 07:10:44PM +0800, Xiao Guangrong wrote:
>> On 03/22/2013 06:54 PM, Marcelo Tosatti wrote:
>>
>>>>
>>>>>
>>>>> And then have codepaths that nuke shadow pages break from the spinlock,
>>>>
>>>> I think this is not needed any more. We can let mmu_notify use the generation
>>>> number to invalid all shadow pages, then we only need to free them after
>>>> all vcpus down and mmu_notify unregistered - at this point, no lock contention,
>>>> we can directly free them.
>>>>
>>>>> such as kvm_mmu_slot_remove_write_access does now (spin_needbreak).
>>>>
>>>> BTW, to my honest, i do not think spin_needbreak is a good way - it does
>>>> not fix the hot-lock contention and it just occupies more cpu time to avoid
>>>> possible soft lock-ups.
>>>>
>>>> Especially, zap-all-shadow-pages can let other vcpus fault and vcpus contest
>>>> mmu-lock, then zap-all-shadow-pages release mmu-lock and wait, other vcpus
>>>> create page tables again. zap-all-shadow-page need long time to be finished,
>>>> the worst case is, it can not completed forever on intensive vcpu and memory
>>>> usage.
>>>
>>> Yes, but the suggestion is to use spin_needbreak on the VM shutdown
>>> cases, where there is no detailed concern about performance. Such as
>>> mmu_notifier_release, kvm_destroy_vm, etc. In those cases what matters
>>> most is that host remains unaffected (and that it finishes in a
>>> reasonable time).
>>
>> Okay. I agree with you, will give a try.
>>
>>>
>>>> I still think the right way to fix this kind of thing is optimization for
>>>> mmu-lock.
>>>
>>> And then for the cases where performance matters just increase a
>>> VM global generetion number, zap the roots and then on kvm_mmu_get_page:
>>>
>>> kvm_mmu_get_page() {
>>> 	sp = lookup_hash(gfn)
>>> 	if (sp->role = role) {
>>> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
>>> 			kvm_mmu_commit_zap_page(sp); (no need for TLB flushes as its unreachable)
>>> 			kvm_mmu_init_page(sp);
>>> 			proceed as if the page was just allocated
>>> 		}
>>> 	}
>>> }
>>>
>>> It makes the kvm_mmu_zap_all path even faster than you have now.
>>> I suppose this was your idea correct with the generation number correct?
>>
>> Wow, great minds think alike, this is exactly what i am doing. ;)
>>
> Not that I disagree with above code, but why not make mmu_gen_number to be
> part of a role and remove old pages in kvm_mmu_free_some_pages() whenever
> limit is reached like we looks to be doing with role.invalid pages now.

These pages can be reused after purging their entries and deleting them from
their parents' lists, which reduces pressure on the memory allocator. Also, we
can move them to the head of active_list so that pages with an invalid
generation are reclaimed first.




* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-03-22 11:10       ` Xiao Guangrong
@ 2013-03-22 11:28         ` Gleb Natapov
  2013-03-22 11:39           ` Xiao Guangrong
  0 siblings, 1 reply; 19+ messages in thread
From: Gleb Natapov @ 2013-03-22 11:28 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, linux-kernel, kvm

On Fri, Mar 22, 2013 at 07:10:44PM +0800, Xiao Guangrong wrote:
> On 03/22/2013 06:54 PM, Marcelo Tosatti wrote:
> 
> >>
> >>>
> >>> And then have codepaths that nuke shadow pages break from the spinlock,
> >>
> >> I think this is not needed any more. We can let mmu_notify use the generation
> >> number to invalid all shadow pages, then we only need to free them after
> >> all vcpus down and mmu_notify unregistered - at this point, no lock contention,
> >> we can directly free them.
> >>
> >>> such as kvm_mmu_slot_remove_write_access does now (spin_needbreak).
> >>
> >> BTW, to my honest, i do not think spin_needbreak is a good way - it does
> >> not fix the hot-lock contention and it just occupies more cpu time to avoid
> >> possible soft lock-ups.
> >>
> >> Especially, zap-all-shadow-pages can let other vcpus fault and vcpus contest
> >> mmu-lock, then zap-all-shadow-pages release mmu-lock and wait, other vcpus
> >> create page tables again. zap-all-shadow-page need long time to be finished,
> >> the worst case is, it can not completed forever on intensive vcpu and memory
> >> usage.
> > 
> > Yes, but the suggestion is to use spin_needbreak on the VM shutdown
> > cases, where there is no detailed concern about performance. Such as
> > mmu_notifier_release, kvm_destroy_vm, etc. In those cases what matters
> > most is that host remains unaffected (and that it finishes in a
> > reasonable time).
> 
> Okay. I agree with you, will give a try.
> 
> > 
> >> I still think the right way to fix this kind of thing is optimization for
> >> mmu-lock.
> > 
> > And then for the cases where performance matters just increase a
> > VM global generetion number, zap the roots and then on kvm_mmu_get_page:
> > 
> > kvm_mmu_get_page() {
> > 	sp = lookup_hash(gfn)
> > 	if (sp->role = role) {
> > 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
> > 			kvm_mmu_commit_zap_page(sp); (no need for TLB flushes as its unreachable)
> > 			kvm_mmu_init_page(sp);
> > 			proceed as if the page was just allocated
> > 		}
> > 	}
> > }
> > 
> > It makes the kvm_mmu_zap_all path even faster than you have now.
> > I suppose this was your idea correct with the generation number correct?
> 
> Wow, great minds think alike, this is exactly what i am doing. ;)
> 
Not that I disagree with the above code, but why not make mmu_gen_number
part of the role and remove old pages in kvm_mmu_free_some_pages() whenever
the limit is reached, like we seem to be doing with role.invalid pages now?
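
Concretely, the suggestion boils down to something like this (greatly
simplified; the real union kvm_mmu_page_role has more fields, and mmu_gen is
a proposed addition, not an existing one):

union kvm_mmu_page_role {
	unsigned word;
	struct {
		unsigned level:4;
		unsigned direct:1;
		unsigned access:3;
		unsigned invalid:1;
		unsigned mmu_gen:8;	/* proposed: bumped on "zap all" */
		/* ... remaining real fields omitted ... */
	};
};

/*
 * With mmu_gen folded into role.word, the existing
 * "sp->role.word == role.word" check in kvm_mmu_get_page() simply never
 * matches stale pages, so no extra code is needed: outdated pages drift to
 * the tail of active_mmu_pages and are reclaimed by
 * kvm_mmu_free_some_pages() once n_max_mmu_pages is hit.
 */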
 
--
			Gleb.


* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-03-22 10:54     ` Marcelo Tosatti
@ 2013-03-22 11:10       ` Xiao Guangrong
  2013-03-22 11:28         ` Gleb Natapov
  0 siblings, 1 reply; 19+ messages in thread
From: Xiao Guangrong @ 2013-03-22 11:10 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: gleb, linux-kernel, kvm

On 03/22/2013 06:54 PM, Marcelo Tosatti wrote:

>>
>>>
>>> And then have codepaths that nuke shadow pages break from the spinlock,
>>
>> I think this is not needed any more. We can let mmu_notify use the generation
>> number to invalid all shadow pages, then we only need to free them after
>> all vcpus down and mmu_notify unregistered - at this point, no lock contention,
>> we can directly free them.
>>
>>> such as kvm_mmu_slot_remove_write_access does now (spin_needbreak).
>>
>> BTW, to my honest, i do not think spin_needbreak is a good way - it does
>> not fix the hot-lock contention and it just occupies more cpu time to avoid
>> possible soft lock-ups.
>>
>> Especially, zap-all-shadow-pages can let other vcpus fault and vcpus contest
>> mmu-lock, then zap-all-shadow-pages release mmu-lock and wait, other vcpus
>> create page tables again. zap-all-shadow-page need long time to be finished,
>> the worst case is, it can not completed forever on intensive vcpu and memory
>> usage.
> 
> Yes, but the suggestion is to use spin_needbreak on the VM shutdown
> cases, where there is no detailed concern about performance. Such as
> mmu_notifier_release, kvm_destroy_vm, etc. In those cases what matters
> most is that host remains unaffected (and that it finishes in a
> reasonable time).

Okay. I agree with you, I will give it a try.

> 
>> I still think the right way to fix this kind of thing is optimization for
>> mmu-lock.
> 
> And then for the cases where performance matters just increase a
> VM global generetion number, zap the roots and then on kvm_mmu_get_page:
> 
> kvm_mmu_get_page() {
> 	sp = lookup_hash(gfn)
> 	if (sp->role = role) {
> 		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
> 			kvm_mmu_commit_zap_page(sp); (no need for TLB flushes as its unreachable)
> 			kvm_mmu_init_page(sp);
> 			proceed as if the page was just allocated
> 		}
> 	}
> }
> 
> It makes the kvm_mmu_zap_all path even faster than you have now.
> I suppose this was your idea with the generation number, correct?

Wow, great minds think alike, this is exactly what I am doing. ;)



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-03-22  2:11   ` Xiao Guangrong
  2013-03-22 10:01     ` Xiao Guangrong
@ 2013-03-22 10:54     ` Marcelo Tosatti
  2013-03-22 11:10       ` Xiao Guangrong
  1 sibling, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2013-03-22 10:54 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, linux-kernel, kvm

On Fri, Mar 22, 2013 at 10:11:17AM +0800, Xiao Guangrong wrote:
> On 03/22/2013 06:21 AM, Marcelo Tosatti wrote:
> > On Wed, Mar 20, 2013 at 04:30:20PM +0800, Xiao Guangrong wrote:
> >> Changlog:
> >> V2:
> >>   - do not reset n_requested_mmu_pages and n_max_mmu_pages
> >>   - batch free root shadow pages to reduce vcpu notification and mmu-lock
> >>     contention
> >>   - remove the first patch that introduce kvm->arch.mmu_cache since we only
> >>     'memset zero' on hashtable rather than all mmu cache members in this
> >>     version
> >>   - remove unnecessary kvm_reload_remote_mmus after kvm_mmu_zap_all
> >>
> >> * Issue
> >> The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to
> >> walk and zap all shadow pages one by one, also it need to zap all guest
> >> page's rmap and all shadow page's parent spte list. Particularly, things
> >> become worse if guest uses more memory or vcpus. It is not good for
> >> scalability.
> > 
> > Xiao, 
> > 
> > The bulk removal of shadow pages from the mmu cache is unnerving - it creates
> > two codepaths to delete a data structure: the usual, single entry one
> > and the bulk one.
> > 
> > There are two main usecases for kvm_mmu_zap_all(): to invalidate the
> > current mmu tree (from kvm_set_memory) and to tear down all pages
> > (VM shutdown).
> > 
> > The first usecase can use your idea of an invalid generation number
> > on shadow pages. That is, increment the VM generation number, nuke the root
> > pages and that's it.
> > 
> > The modifications should be contained to kvm_mmu_get_page() mostly,
> > correct? (would also have to keep counters to increase SLAB freeing 
> > ratio, relative to number of outdated shadow pages).
> 
> Yes.
> 
> > 
> > And then have codepaths that nuke shadow pages break from the spinlock,
> 
> I think this is not needed any more. We can let mmu_notify use the generation
> number to invalidate all shadow pages; then we only need to free them after
> all vcpus are down and mmu_notify is unregistered - at that point there is
> no lock contention and we can free them directly.
> 
> > such as kvm_mmu_slot_remove_write_access does now (spin_needbreak).
> 
> BTW, to be honest, I do not think spin_needbreak is a good approach - it does
> not fix the hot-lock contention, it just burns more cpu time to avoid
> possible soft lock-ups.
>
> Especially, zap-all-shadow-pages lets other vcpus fault and contend for
> mmu-lock; then zap-all-shadow-pages releases mmu-lock and waits while the
> other vcpus create page tables again. zap-all-shadow-pages needs a long time
> to finish, and in the worst case it may never complete under intensive vcpu
> and memory usage.

Yes, but the suggestion is to use spin_needbreak on the VM shutdown
cases, where there is no detailed concern about performance. Such as
mmu_notifier_release, kvm_destroy_vm, etc. In those cases what matters
most is that the host remains unaffected (and that it finishes in a
reasonable time).

> I still think the right way to fix this kind of thing is to optimize the
> mmu-lock.

And then, for the cases where performance matters, just increase a
VM-global generation number, zap the roots, and then in kvm_mmu_get_page:

kvm_mmu_get_page() {
	sp = lookup_hash(gfn)
	if (sp->role == role) {
		if (sp->mmu_gen_number != kvm->arch.mmu_gen_number) {
			kvm_mmu_commit_zap_page(sp); (no need for TLB flushes as it's unreachable)
			kvm_mmu_init_page(sp);
			proceed as if the page was just allocated
		}
	}
}

It makes the kvm_mmu_zap_all path even faster than you have now.
I suppose this was your idea with the generation number, correct?
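
Fleshing that pseudocode out a little, the check might look like the sketch
below. The generation fields and the function name are assumptions, and the
zap is written with the existing prepare/commit pair rather than the
shorthand single-page call above.

/* Sketch of the lazy recycling test, roughly as it would sit in the hash
 * walk of kvm_mmu_get_page(): return the page if it is still usable,
 * otherwise zap the stale page and let the caller allocate a fresh one.
 */
static struct kvm_mmu_page *mmu_check_gen(struct kvm_vcpu *vcpu,
					  struct kvm_mmu_page *sp,
					  union kvm_mmu_page_role role)
{
	struct kvm *kvm = vcpu->kvm;
	LIST_HEAD(invalid_list);

	if (sp->role.word != role.word)
		return NULL;			/* not a match at all */

	if (sp->mmu_gen_number == kvm->arch.mmu_gen_number)
		return sp;			/* current generation: reuse */

	/*
	 * Outdated page: it is no longer reachable from any valid root,
	 * which is why the pseudocode notes that a TLB flush is not
	 * strictly required here.
	 */
	kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
	return NULL;				/* caller allocates a new page */
}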


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-03-22  2:11   ` Xiao Guangrong
@ 2013-03-22 10:01     ` Xiao Guangrong
  2013-03-22 10:54     ` Marcelo Tosatti
  1 sibling, 0 replies; 19+ messages in thread
From: Xiao Guangrong @ 2013-03-22 10:01 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, gleb, linux-kernel, kvm

On 03/22/2013 10:11 AM, Xiao Guangrong wrote:

>> The modifications should be contained to kvm_mmu_get_page() mostly,
>> correct? (would also have to keep counters to increase SLAB freeing 
>> ratio, relative to number of outdated shadow pages).
> 
> Yes.
> 
>>
>> And then have codepaths that nuke shadow pages break from the spinlock,
> 
> I think this is not needed any more. We can let mmu_notify use the generation
> number to invalidate all shadow pages; then we only need to free them after
> all vcpus are down and mmu_notify is unregistered - at that point there is
> no lock contention and we can free them directly.

Sorry, this is wrong: after ->release() is called, the memory will be
freed, so zap-all-sp cannot be delayed. I will work out a good way to handle this...
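
For context, the ->release() path of that period looks roughly like the
simplified sketch below (treat details as approximate): the shadow pages must
be gone by the time it returns, because the backing memory may be reclaimed
right after.

/* Approximate shape of the notifier release path: there is no later point
 * at which the shadow pages could still be freed safely.
 */
static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
				     struct mm_struct *mm)
{
	struct kvm *kvm = mmu_notifier_to_kvm(mn);
	int idx;

	idx = srcu_read_lock(&kvm->srcu);
	kvm_arch_flush_shadow_all(kvm);	/* on x86 this ends up in kvm_mmu_zap_all() */
	srcu_read_unlock(&kvm->srcu, idx);
}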



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-03-21 22:21 ` Marcelo Tosatti
@ 2013-03-22  2:11   ` Xiao Guangrong
  2013-03-22 10:01     ` Xiao Guangrong
  2013-03-22 10:54     ` Marcelo Tosatti
  0 siblings, 2 replies; 19+ messages in thread
From: Xiao Guangrong @ 2013-03-22  2:11 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: gleb, linux-kernel, kvm

On 03/22/2013 06:21 AM, Marcelo Tosatti wrote:
> On Wed, Mar 20, 2013 at 04:30:20PM +0800, Xiao Guangrong wrote:
>> Changlog:
>> V2:
>>   - do not reset n_requested_mmu_pages and n_max_mmu_pages
>>   - batch free root shadow pages to reduce vcpu notification and mmu-lock
>>     contention
>>   - remove the first patch that introduce kvm->arch.mmu_cache since we only
>>     'memset zero' on hashtable rather than all mmu cache members in this
>>     version
>>   - remove unnecessary kvm_reload_remote_mmus after kvm_mmu_zap_all
>>
>> * Issue
>> The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to
>> walk and zap all shadow pages one by one, also it need to zap all guest
>> page's rmap and all shadow page's parent spte list. Particularly, things
>> become worse if guest uses more memory or vcpus. It is not good for
>> scalability.
> 
> Xiao, 
> 
> The bulk removal of shadow pages from the mmu cache is unnerving - it creates
> two codepaths to delete a data structure: the usual, single entry one
> and the bulk one.
> 
> There are two main usecases for kvm_mmu_zap_all(): to invalidate the
> current mmu tree (from kvm_set_memory) and to tear down all pages
> (VM shutdown).
> 
> The first usecase can use your idea of an invalid generation number
> on shadow pages. That is, increment the VM generation number, nuke the root
> pages and that's it.
> 
> The modifications should be contained to kvm_mmu_get_page() mostly,
> correct? (would also have to keep counters to increase SLAB freeing 
> ratio, relative to number of outdated shadow pages).

Yes.

> 
> And then have codepaths that nuke shadow pages break from the spinlock,

I think this is not needed any more. We can let mmu_notify use the generation
number to invalidate all shadow pages; then we only need to free them after
all vcpus are down and mmu_notify is unregistered - at that point there is
no lock contention and we can free them directly.

> such as kvm_mmu_slot_remove_write_access does now (spin_needbreak).

BTW, to be honest, I do not think spin_needbreak is a good approach - it does
not fix the hot-lock contention, it just burns more cpu time to avoid
possible soft lock-ups.

Especially, zap-all-shadow-pages lets other vcpus fault and contend for
mmu-lock; then zap-all-shadow-pages releases mmu-lock and waits while the
other vcpus create page tables again. zap-all-shadow-pages needs a long time
to finish, and in the worst case it may never complete under intensive vcpu
and memory usage.

I still think the right way to fix this kind of thing is to optimize the
mmu-lock.

> That would also solve the current issues without using more memory 
> for pte_list_desc and without the delicate "Reset MMU cache" step.
> 
> What do you think?

I agree with your point, Marcelo! I will redesign it. Thank you!


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
  2013-03-20  8:30 Xiao Guangrong
@ 2013-03-21 22:21 ` Marcelo Tosatti
  2013-03-22  2:11   ` Xiao Guangrong
  0 siblings, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2013-03-21 22:21 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, linux-kernel, kvm

On Wed, Mar 20, 2013 at 04:30:20PM +0800, Xiao Guangrong wrote:
> Changlog:
> V2:
>   - do not reset n_requested_mmu_pages and n_max_mmu_pages
>   - batch free root shadow pages to reduce vcpu notification and mmu-lock
>     contention
>   - remove the first patch that introduce kvm->arch.mmu_cache since we only
>     'memset zero' on hashtable rather than all mmu cache members in this
>     version
>   - remove unnecessary kvm_reload_remote_mmus after kvm_mmu_zap_all
> 
> * Issue
> The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to
> walk and zap all shadow pages one by one, also it need to zap all guest
> page's rmap and all shadow page's parent spte list. Particularly, things
> become worse if guest uses more memory or vcpus. It is not good for
> scalability.

Xiao, 

The bulk removal of shadow pages from the mmu cache is unnerving - it creates
two codepaths to delete a data structure: the usual, single entry one
and the bulk one.

There are two main usecases for kvm_mmu_zap_all(): to invalidate the
current mmu tree (from kvm_set_memory) and to tear down all pages
(VM shutdown).

The first usecase can use your idea of an invalid generation number
on shadow pages. That is, increment the VM generation number, nuke the root
pages and that's it.
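
As a rough sketch of that invalidation path (the mmu_gen_number field and the
roots-only zap helper are placeholders, not code from this series):

/* Sketch only: bump the VM-wide generation, zap just the root pages (the
 * only pages vcpus hold direct references to) and let every other shadow
 * page be recycled lazily once kvm_mmu_get_page() sees the stale generation.
 */
void kvm_mmu_invalidate_all_pages(struct kvm *kvm)
{
	spin_lock(&kvm->mmu_lock);
	kvm->arch.mmu_gen_number++;
	kvm_zap_obsolete_roots(kvm);	/* hypothetical helper */
	spin_unlock(&kvm->mmu_lock);

	/* Force vcpus to drop their current roots and fault them back in
	 * against the new generation. */
	kvm_reload_remote_mmus(kvm);
}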

The modifications should be contained to kvm_mmu_get_page() mostly,
correct? (would also have to keep counters to increase SLAB freeing 
ratio, relative to number of outdated shadow pages).

And then have codepaths that nuke shadow pages break from the spinlock,
such as kvm_mmu_slot_remove_write_access does now (spin_needbreak).
That would also solve the current issues without using more memory 
for pte_list_desc and without the delicate "Reset MMU cache" step.

What do you think?
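
For reference, the lock-break pattern referred to here looks roughly like the
sketch below; the function name is made up, and the prepare/commit helpers are
the existing mmu.c ones.

/* Drop and retake mmu_lock whenever someone is waiting on it, so a long
 * zap loop cannot hog the lock or trigger soft lockups.
 */
static void kvm_mmu_zap_all_lockbreak(struct kvm *kvm)
{
	struct kvm_mmu_page *sp, *node;
	LIST_HEAD(invalid_list);

	spin_lock(&kvm->mmu_lock);
restart:
	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
			goto restart;

		if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
			kvm_mmu_commit_zap_page(kvm, &invalid_list);
			cond_resched_lock(&kvm->mmu_lock);
			goto restart;
		}
	}
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
	spin_unlock(&kvm->mmu_lock);
}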

> * Idea
> Since all shadow page will be zapped, we can directly zap the mmu-cache
> and rmap so that vcpu will fault on the new mmu-cache, after that, we can
> directly free the memory used by old mmu-cache.
> 
> The root shadow page is little especial since they are currently used by
> vcpus, we can not directly free them. So, we zap the root shadow pages and
> re-add them into the new mmu-cache.
> 
> * TODO
> (1): free root shadow pages by using generation-number
> (2): drop unnecessary @npages from kvm_arch_create_memslot
> 
> * Performance
> The testcase can be found at:
> http://www.gossamer-threads.com/lists/engine?do=post_attachment;postatt_id=54896;list=linux
> is used to measure the time of delete / add memslot. At that time, all vcpus
> are waiting, that means, no mmu-lock contention. I believe the result be more
> beautiful if other vcpus and mmu notification need to hold the mmu-lock.
> 
> Guest VCPU:6, Mem:2048M
> 
> before: Run 10 times, Avg time:46078825 ns.
> 
> after: Run 10 times, Avg time:21558774 ns. (+ 113%)
> 
> Xiao Guangrong (7):
>   KVM: MMU: introduce mmu_cache->pte_list_descs
>   KVM: x86: introduce memslot_set_lpage_disallowed
>   KVM: x86: introduce kvm_clear_all_gfn_page_info
>   KVM: MMU: delete shadow page from hash list in
>     kvm_mmu_prepare_zap_page
>   KVM: MMU: split kvm_mmu_prepare_zap_page
>   KVM: MMU: fast zap all shadow pages
>   KVM: MMU: drop unnecessary kvm_reload_remote_mmus after
>     kvm_mmu_zap_all
> 
>  arch/x86/include/asm/kvm_host.h |    7 ++-
>  arch/x86/kvm/mmu.c              |  105 ++++++++++++++++++++++++++++++++++-----
>  arch/x86/kvm/mmu.h              |    1 +
>  arch/x86/kvm/x86.c              |   87 +++++++++++++++++++++++++-------
>  include/linux/kvm_host.h        |    1 +
>  5 files changed, 166 insertions(+), 35 deletions(-)
> 
> -- 
> 1.7.7.6

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages
@ 2013-03-20  8:30 Xiao Guangrong
  2013-03-21 22:21 ` Marcelo Tosatti
  0 siblings, 1 reply; 19+ messages in thread
From: Xiao Guangrong @ 2013-03-20  8:30 UTC (permalink / raw)
  To: mtosatti; +Cc: gleb, linux-kernel, kvm, Xiao Guangrong

Changlog:
V2:
  - do not reset n_requested_mmu_pages and n_max_mmu_pages
  - batch free root shadow pages to reduce vcpu notification and mmu-lock
    contention
  - remove the first patch that introduce kvm->arch.mmu_cache since we only
    'memset zero' on hashtable rather than all mmu cache members in this
    version
  - remove unnecessary kvm_reload_remote_mmus after kvm_mmu_zap_all

* Issue
The current kvm_mmu_zap_all is really slow - it is holding mmu-lock to
walk and zap all shadow pages one by one, also it need to zap all guest
page's rmap and all shadow page's parent spte list. Particularly, things
become worse if guest uses more memory or vcpus. It is not good for
scalability.

* Idea
Since all shadow page will be zapped, we can directly zap the mmu-cache
and rmap so that vcpu will fault on the new mmu-cache, after that, we can
directly free the memory used by old mmu-cache.

The root shadow page is little especial since they are currently used by
vcpus, we can not directly free them. So, we zap the root shadow pages and
re-add them into the new mmu-cache.

* TODO
(1): free root shadow pages by using generation-number
(2): drop unnecessary @npages from kvm_arch_create_memslot

* Performance
The testcase can be found at:
http://www.gossamer-threads.com/lists/engine?do=post_attachment;postatt_id=54896;list=linux
is used to measure the time of delete / add memslot. At that time, all vcpus
are waiting, that means, no mmu-lock contention. I believe the result be more
beautiful if other vcpus and mmu notification need to hold the mmu-lock.

Guest VCPU:6, Mem:2048M

before: Run 10 times, Avg time:46078825 ns.

after: Run 10 times, Avg time:21558774 ns. (+ 113%)

Xiao Guangrong (7):
  KVM: MMU: introduce mmu_cache->pte_list_descs
  KVM: x86: introduce memslot_set_lpage_disallowed
  KVM: x86: introduce kvm_clear_all_gfn_page_info
  KVM: MMU: delete shadow page from hash list in
    kvm_mmu_prepare_zap_page
  KVM: MMU: split kvm_mmu_prepare_zap_page
  KVM: MMU: fast zap all shadow pages
  KVM: MMU: drop unnecessary kvm_reload_remote_mmus after
    kvm_mmu_zap_all

 arch/x86/include/asm/kvm_host.h |    7 ++-
 arch/x86/kvm/mmu.c              |  105 ++++++++++++++++++++++++++++++++++-----
 arch/x86/kvm/mmu.h              |    1 +
 arch/x86/kvm/x86.c              |   87 +++++++++++++++++++++++++-------
 include/linux/kvm_host.h        |    1 +
 5 files changed, 166 insertions(+), 35 deletions(-)

-- 
1.7.7.6


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2013-04-18 17:44 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-20  8:29 [PATCH v2 0/7] KVM: MMU: fast zap all shadow pages Xiao Guangrong
2013-03-20  8:30 Xiao Guangrong
2013-03-21 22:21 ` Marcelo Tosatti
2013-03-22  2:11   ` Xiao Guangrong
2013-03-22 10:01     ` Xiao Guangrong
2013-03-22 10:54     ` Marcelo Tosatti
2013-03-22 11:10       ` Xiao Guangrong
2013-03-22 11:28         ` Gleb Natapov
2013-03-22 11:39           ` Xiao Guangrong
2013-03-22 11:47             ` Gleb Natapov
2013-03-22 12:03               ` Xiao Guangrong
2013-03-22 12:12                 ` Gleb Natapov
2013-03-22 12:37                   ` Xiao Guangrong
2013-03-22 19:15                     ` Gleb Natapov
2013-04-17 20:39                       ` Marcelo Tosatti
2013-04-18  9:42                         ` Gleb Natapov
2013-04-18 14:01                           ` Marcelo Tosatti
2013-04-18 16:36                             ` Gleb Natapov
2013-04-18 17:34                               ` Marcelo Tosatti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).