From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 17 Feb 2015 14:29:31 +0100
From: "Michael S. Tsirkin"
To: Paolo Bonzini
Cc: Igor Mammedov, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH] vhost: support upto 509 memory regions
Message-ID: <20150217132931.GB6362@redhat.com>
References: <1423842599-5174-1-git-send-email-imammedo@redhat.com> <20150217090242.GA20254@redhat.com> <54E31F24.1060705@redhat.com> <20150217123212.GA6362@redhat.com> <54E33E09.5090603@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <54E33E09.5090603@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 17, 2015 at 02:11:37PM +0100, Paolo Bonzini wrote:
>
>
> On 17/02/2015 13:32, Michael S. Tsirkin wrote:
> > On Tue, Feb 17, 2015 at 11:59:48AM +0100, Paolo Bonzini wrote:
> >>
> >>
> >> On 17/02/2015 10:02, Michael S. Tsirkin wrote:
> >>>> Increasing VHOST_MEMORY_MAX_NREGIONS from 65 to 509
> >>>> to match KVM_USER_MEM_SLOTS fixes issue for vhost-net.
> >>>>
> >>>> Signed-off-by: Igor Mammedov
> >>>
> >>> This scares me a bit: each region is 32 bytes, so we are talking
> >>> about a 16K allocation that userspace can trigger.
> >>
> >> What's bad with a 16K allocation?
> >
> > It fails when memory is fragmented.
>
> If memory is _that_ fragmented, I think you have much bigger problems
> than vhost.
>
> > I'm guessing KVM doesn't do memory scans on the data path; vhost does.
>
> It does for MMIO memory-to-memory writes, but that's not a particularly
> fast path.
> > KVM doesn't access the memory map on fast paths, but QEMU does, so I
> > don't think it's beyond the expectations of the kernel.

QEMU has an elaborate data structure to deal with that.

> For example you can use a radix tree (not lib/radix-tree.c
> unfortunately), and cache GVA->HPA translations if it turns out that
> lookup has become a hot path.

All vhost lookups are hot path.

> The addressing space of x86 is in practice 44 bits or fewer, and each
> slot will typically be at least 1 GiB, so you only have 14 bits to
> dispatch on.  It's probably possible to only have two or three levels
> in the radix tree in the common case, and beat the linear scan real
> quick.

Not if there are about 6 regions, I think.

> The radix tree can be tuned to use order-0 allocations, and then your
> worries about fragmentation go away too.
>
> Paolo

Increasing the number might be reasonable for workloads such as nested
virt.  But depending on this in userspace when you don't have to is not
a good idea IMHO.

-- 
MST