From: Avi Kivity
To: Alex Williamson
Cc: Jan Kiszka, Marcelo Tosatti, kvm@vger.kernel.org, ddutile@redhat.com, mst@redhat.com, chrisw@redhat.com
Subject: Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts
Date: Tue, 25 Jan 2011 16:53:44 +0200
Message-ID: <4D3EE3F8.3020603@redhat.com>
In-Reply-To: <1295966492.3230.55.camel@x201>
References: <20110121233040.22262.68117.stgit@s20.home> <20110124093241.GA28654@amt.cnet> <4D3D89B1.30300@siemens.com> <1295883899.3230.9.camel@x201> <1295933876.3230.46.camel@x201> <4D3E7D74.1030100@web.de> <1295966492.3230.55.camel@x201>

On 01/25/2011 04:41 PM, Alex Williamson wrote:
> > >  kvm: Allow memory slot array to grow on demand
> > >
> > >  Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
> > >  to grow on demand.  Private slots are now allocated at the
> > >  front instead of the end.  Only x86 seems to use private slots,
> >
> > Hmm, doesn't current user space expect slots 8..11 to be the private
> > ones and wouldn't it cause trouble if slots 0..3 are suddenly reserved?
>
> The private slots aren't currently visible to userspace, they're
> actually slots 32..35.  The patch automatically increments user-passed
> slot ids so userspace has its own zero-based view of the array.
> Frankly, I don't understand why userspace reserves slots 8..11, is this
> compatibility with older kernel implementations?

I think so.  I believe those kernel versions are too old now to matter,
but of course I can't be sure.

> > >  so this is now zero for all other archs.  The memslots pointer
> > >  is already updated using rcu, so changing the size of the
> > >  array when it's replaced is straightforward.  x86 also keeps
> > >  a bitmap of slots used by a kvm_mmu_page, which requires a
> > >  shadow tlb flush whenever we increase the number of slots.
> > >  This forces the pages to be rebuilt with the new bitmap size.
> >
> > Is it possible for user space to increase the slot number to ridiculous
> > amounts (at least as far as kmalloc allows) and then trigger a kernel
> > walk through them in non-preemptible contexts?  Just wondering, I haven't
> > checked all contexts of functions like kvm_is_visible_gfn yet.
> >
> > If yes, we should already switch to an rbtree or something like that.
> > Otherwise that may wait a bit, but probably not too long.
>
> Yeah, Avi has brought up the hole that userspace can exploit this
> interface with these changes.  However, for 99+% of users, this change
> leaves the slot array at about the same size, or makes it smaller.  Only
> huge, scale-out guests would probably even see a doubling of slots (my
> guest with 14 82576 VFs uses 48 slots).  On the kernel side, I think we
> can safely leave a tree implementation as a later optimization should we
> determine it's necessary.  We'll have to see how the userspace side
> shapes up to figure out what's best there.  Thanks,

A tree would probably be a pessimization until we are able to cache the
result of lookups.
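To make that concrete, the lookup in question is essentially the
following linear scan (a simplified userspace sketch; the structure and
field names here are illustrative stand-ins for the kvm memslot
structures, not the actual kernel code):

#include <stdio.h>
#include <stddef.h>

typedef unsigned long long gfn_t;

/* Illustrative stand-ins for the kernel's memslot structures. */
struct memslot {
	gfn_t base_gfn;
	unsigned long npages;
};

struct memslots {
	size_t nslots;
	struct memslot slots[8];
};

/*
 * One pass over a small contiguous array: the branch pattern is
 * regular and the whole array fits in a few cachelines.
 */
static struct memslot *find_slot(struct memslots *ms, gfn_t gfn)
{
	for (size_t i = 0; i < ms->nslots; i++) {
		struct memslot *s = &ms->slots[i];

		if (gfn >= s->base_gfn && gfn - s->base_gfn < s->npages)
			return s;
	}
	return NULL;	/* with TDP, the miss is the common case */
}

int main(void)
{
	struct memslots ms = {
		.nslots = 2,
		.slots = { { 0x0, 0xa0 }, { 0x100, 0xf00 } },
	};

	printf("gfn 0x123 -> %s\n", find_slot(&ms, 0x123) ? "hit" : "miss");
	return 0;
}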
The linear scan generates a very simple pattern of branch predictions
and memory accesses, while a tree touches a whole bunch of cachelines
and generates unpredictable branches (if the inputs are unpredictable).

Note that with TDP most lookups result in failure, so all we need is a
fast way to determine whether to perform the lookup at all.  That can be
done by caching the last lookup for this address in the spte by setting
a reserved bit.  For the other lookups, which we believe will succeed,
we can assume the probability of a match is related to the slot size,
and sort the slots by page count.
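The sorting half of that, sketched as a userspace toy (the slot
contents are made up; the point is only the descending-by-npages
comparator):

#include <stdio.h>
#include <stdlib.h>

struct memslot {
	unsigned long long base_gfn;
	unsigned long npages;
};

/*
 * If the probability that a successful lookup lands in a given slot is
 * proportional to that slot's size, scanning the largest slots first
 * minimizes the expected number of comparisons.
 */
static int cmp_npages_desc(const void *a, const void *b)
{
	const struct memslot *x = a, *y = b;

	if (x->npages != y->npages)
		return y->npages > x->npages ? 1 : -1;
	return 0;
}

int main(void)
{
	struct memslot slots[] = {
		{ 0x00000,   16 },	/* small, rarely-hit slot */
		{ 0x10000, 4096 },	/* large ram slot */
		{ 0x20000,  256 },
	};
	size_t n = sizeof(slots) / sizeof(slots[0]);

	qsort(slots, n, sizeof(slots[0]), cmp_npages_desc);

	for (size_t i = 0; i < n; i++)
		printf("slot %zu: base_gfn=%#llx npages=%lu\n",
		       i, slots[i].base_gfn, slots[i].npages);
	return 0;
}

-- 
error compiling committee.c: too many arguments to function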