From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753353AbcG2Tsh (ORCPT ); Fri, 29 Jul 2016 15:48:37 -0400 Received: from mga04.intel.com ([192.55.52.120]:58993 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752004AbcG2Ts3 (ORCPT ); Fri, 29 Jul 2016 15:48:29 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.28,441,1464678000"; d="scan'208";a="856027264" Subject: Re: [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process To: "Michael S. Tsirkin" , "Li, Liang Z" References: <1469582616-5729-1-git-send-email-liang.z.li@intel.com> <1469582616-5729-5-git-send-email-liang.z.li@intel.com> <5798DB49.7030803@intel.com> <20160728044000-mutt-send-email-mst@kernel.org> <20160729003759-mutt-send-email-mst@kernel.org> Cc: "linux-kernel@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "linux-mm@kvack.org" , "virtio-dev@lists.oasis-open.org" , "kvm@vger.kernel.org" , "qemu-devel@nongnu.org" , "dgilbert@redhat.com" , "quintela@redhat.com" , Andrew Morton , Vlastimil Babka , Mel Gorman , Paolo Bonzini , Cornelia Huck , Amit Shah From: Dave Hansen Message-ID: <579BB30B.2040704@intel.com> Date: Fri, 29 Jul 2016 12:48:27 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0 MIME-Version: 1.0 In-Reply-To: <20160729003759-mutt-send-email-mst@kernel.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 07/28/2016 02:51 PM, Michael S. Tsirkin wrote: >> > If 1MB is too big, how about 512K, or 256K? 32K seems too small. >> > > It's only small because it makes you rescan the free list. > So maybe you should do something else. > I looked at it a bit. Instead of scanning the free list, how about > scanning actual page structures? If page is unused, pass it to host. > Solves the problem of rescanning multiple times, does it not? FWIW, I think the new data structure needs some work. Before, we had a potentially very long list of 4k areas. Now, we've just got a very large bitmap. The bitmap might not even be very dense if we are ballooning relatively few things. Can I suggest an alternate scheme? I think you actually need a hybrid scheme that has bitmaps but also allows more flexibility in the pfn ranges. The payload could be a number of records each containing 3 things: pfn, page order, length of bitmap (maybe in powers of 2) Each record is followed by the bitmap. Or, if the bitmap length is 0, immediately followed by another record. A bitmap length of 0 implies a bitmap with the least significant bit set. Page order specifies how many pages each bit represents. This scheme could easily encode the new data structure you are proposing by just setting pfn=0, order=0, and a very long bitmap length. But, it could handle sparse bitmaps much better *and* represent large pages much more efficiently. There's plenty of space to fit a whole record in 64 bits.