From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752841AbdDEDbn (ORCPT ); Tue, 4 Apr 2017 23:31:43 -0400
Received: from mga03.intel.com ([134.134.136.65]:30979 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751885AbdDEDbl (ORCPT ); Tue, 4 Apr 2017 23:31:41 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.36,276,1486454400"; d="scan'208";a="1151168781"
From: "Wang, Wei W"
To: "virtio-dev@lists.oasis-open.org",
 "linux-kernel@vger.kernel.org",
 "qemu-devel@nongnu.org",
 "virtualization@lists.linux-foundation.org",
 "kvm@vger.kernel.org",
 "linux-mm@kvack.org",
 "mst@redhat.com",
 "david@redhat.com",
 "Hansen, Dave",
 "cornelia.huck@de.ibm.com",
 "akpm@linux-foundation.org",
 "mgorman@techsingularity.net",
 "aarcange@redhat.com",
 "amit.shah@redhat.com",
 "pbonzini@redhat.com",
 "liliang.opensource@gmail.com"
Subject: RE: [PATCH kernel v8 2/4] virtio-balloon: VIRTIO_BALLOON_F_CHUNK_TRANSFER
Thread-Topic: [PATCH kernel v8 2/4] virtio-balloon: VIRTIO_BALLOON_F_CHUNK_TRANSFER
Thread-Index: AQHSniTSyACGRJFyVk65PJumYo7Q6qG2OEkw
Date: Wed, 5 Apr 2017 03:31:36 +0000
Message-ID: <286AC319A985734F985F78AFA26841F7391E1962@shsmsx102.ccr.corp.intel.com>
References: <1489648127-37282-1-git-send-email-wei.w.wang@intel.com> <1489648127-37282-3-git-send-email-wei.w.wang@intel.com>
In-Reply-To: <1489648127-37282-3-git-send-email-wei.w.wang@intel.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiNDE3MzYwMWEtMjcxYi00MWYyLWE4ODgtZjhhY2Q5ZGMyODdmIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE1LjkuNi42IiwiVHJ1c3RlZExhYmVsSGFzaCI6InpybktXMkVsVHpXb2tOdjhMMmpOa0U2emV2cWtqRk55Q0dNNXdvZ0Ria0k9In0=
x-ctpclassification: CTP_IC
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by mail.home.local id v353Vmk8018579

On Thursday, March 16, 2017 3:09 PM Wei Wang wrote:
> The implementation of the current virtio-balloon is not very efficient, because
> the ballooned pages are transferred to the host one by one. Here is the
> breakdown of the time in percentage spent on each step of the balloon inflating
> process (inflating 7GB of an 8GB idle guest).
>
> 1) allocating pages (6.5%)
> 2) sending PFNs to host (68.3%)
> 3) address translation (6.1%)
> 4) madvise (19%)
>
> It takes about 4126ms for the inflating process to complete.
> The above profiling shows that the bottlenecks are stage 2) and stage 4).
>
> This patch optimizes step 2) by transferring pages to the host in chunks. A chunk
> consists of guest physically contiguous pages, and it is offered to the host via a
> base PFN (i.e. the start PFN of those physically contiguous pages) and the size
> (i.e. the total number of the pages). A chunk is formatted as below:
>
> --------------------------------------------------------
> |                 Base (52 bit)        | Rsvd (12 bit) |
> --------------------------------------------------------
> --------------------------------------------------------
> |                 Size (52 bit)        | Rsvd (12 bit) |
> --------------------------------------------------------
>
> By doing so, step 4) can also be optimized by doing address translation and
> madvise() in chunks rather than page by page.
>
> This optimization requires the negotiation of a new feature bit,
> VIRTIO_BALLOON_F_CHUNK_TRANSFER.
>
> With this new feature, the above ballooning process takes ~590ms resulting in
> an improvement of ~85%.
>
> TODO: optimize stage 1) by allocating/freeing a chunk of pages instead of a
> single page each time.
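
For anyone following the chunk format described above, here is a rough userspace sketch of how a sorted PFN list could be folded into (base, size) chunks. It is only an illustration and not the driver code: pfns_to_chunks() and struct chunk are made-up names, and plain uint64_t stands in for the patch's __le64, so no endianness conversion is shown. The two shift values are the ones the patch defines.

```c
#include <stddef.h>
#include <stdint.h>

/* Shift values as defined in the patch; the low 12 bits of each field are reserved. */
#define BALLOON_CHUNK_BASE_SHIFT 12
#define BALLOON_CHUNK_SIZE_SHIFT 12

/* Host-endian stand-in for the patch's struct balloon_page_chunk (hypothetical). */
struct chunk {
	uint64_t base; /* start PFN << BALLOON_CHUNK_BASE_SHIFT */
	uint64_t size; /* number of pages << BALLOON_CHUNK_SIZE_SHIFT */
};

/*
 * Hypothetical helper: coalesce a sorted array of PFNs into chunks of
 * physically contiguous pages. Returns the number of chunks written,
 * which is at most max_chunks.
 */
static size_t pfns_to_chunks(const uint64_t *pfns, size_t n,
			     struct chunk *out, size_t max_chunks)
{
	size_t used = 0;
	size_t i = 0;

	while (i < n && used < max_chunks) {
		uint64_t base = pfns[i];
		uint64_t count = 1;

		/* Extend the run while the next PFN is adjacent to the previous one. */
		while (i + count < n && pfns[i + count] == base + count)
			count++;

		out[used].base = base << BALLOON_CHUNK_BASE_SHIFT;
		out[used].size = count << BALLOON_CHUNK_SIZE_SHIFT;
		used++;
		i += count;
	}
	return used;
}
```

With this layout, six pages in three contiguous runs cost three 16-byte chunks instead of six separate PFN entries, which is where the saving in step 2) would come from.
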
>
> Signed-off-by: Liang Li
> Signed-off-by: Wei Wang
> Suggested-by: Michael S. Tsirkin
> ---
>  drivers/virtio/virtio_balloon.c     | 371 +++++++++++++++++++++++++++++++++---
>  include/uapi/linux/virtio_balloon.h |   9 +
>  2 files changed, 353 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index f59cb4f..3f4a161 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -42,6 +42,10 @@
>  #define OOM_VBALLOON_DEFAULT_PAGES 256
>  #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
>
> +#define PAGE_BMAP_SIZE (8 * PAGE_SIZE)
> +#define PFNS_PER_PAGE_BMAP (PAGE_BMAP_SIZE * BITS_PER_BYTE)
> +#define PAGE_BMAP_COUNT_MAX 32
> +
>  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
>  module_param(oom_pages, int, S_IRUSR | S_IWUSR);
>  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> @@ -50,6 +54,14 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  static struct vfsmount *balloon_mnt;
>  #endif
>
> +#define BALLOON_CHUNK_BASE_SHIFT 12
> +#define BALLOON_CHUNK_SIZE_SHIFT 12
> +struct balloon_page_chunk {
> +	__le64 base;
> +	__le64 size;
> +};
> +
> +typedef __le64 resp_data_t;
>  struct virtio_balloon {
>  	struct virtio_device *vdev;
>  	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> @@ -67,6 +79,31 @@ struct virtio_balloon {
>
>  	/* Number of balloon pages we've told the Host we're not using. */
>  	unsigned int num_pages;
> +	/* Pointer to the response header. */
> +	struct virtio_balloon_resp_hdr *resp_hdr;
> +	/* Pointer to the start address of response data. */
> +	resp_data_t *resp_data;

I think the implementation has an issue here: both the balloon pages and the
unused pages use the same buffer ("resp_data" above) to store chunks. This
would cause a race if live migration starts while ballooning is still in
progress, since both paths could then fill that buffer concurrently.

I plan to use separate buffers for CHUNKS_OF_BALLOON_PAGES and
CHUNKS_OF_UNUSED_PAGES. Please let me know if you have a different suggestion.
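
To make the separate-buffer idea a bit more concrete, here is a minimal userspace sketch, assuming one fixed-size buffer per chunk type. Only the two type names come from this thread; the enum values, struct names, buffer capacity and chunk_buf_add() are placeholders for illustration, and the real driver would still need whatever locking each path requires on its own buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Chunk types named in this thread; the numeric values are placeholders. */
enum chunk_type {
	CHUNKS_OF_BALLOON_PAGES = 0,
	CHUNKS_OF_UNUSED_PAGES = 1,
	CHUNK_TYPE_COUNT
};

#define CHUNK_BUF_ENTRIES 64 /* arbitrary capacity for this sketch */

/* Hypothetical per-type buffer: each path fills only its own array. */
struct chunk_buf {
	uint64_t data[CHUNK_BUF_ENTRIES];
	size_t used;
};

/* One buffer per chunk type instead of a single shared resp_data. */
struct balloon_bufs {
	struct chunk_buf buf[CHUNK_TYPE_COUNT];
};

/*
 * Append one entry to the buffer owned by the given type.
 * Returns 0 on success, -1 if that buffer is full.
 */
static int chunk_buf_add(struct balloon_bufs *b, enum chunk_type type,
			 uint64_t entry)
{
	struct chunk_buf *cb = &b->buf[type];

	if (cb->used == CHUNK_BUF_ENTRIES)
		return -1;
	cb->data[cb->used++] = entry;
	return 0;
}
```

Because the ballooning path and the unused-page path index different arrays, neither can overwrite chunks queued by the other, which is the property the shared buffer lacks.
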

Thanks.

Best,
Wei