From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Li, Liang Z"
To: "Hansen, Dave", "kvm@vger.kernel.org"
CC: "virtualization@lists.linux-foundation.org", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", "virtio-dev@lists.oasis-open.org", "qemu-devel@nongnu.org", "quintela@redhat.com", "dgilbert@redhat.com", "mst@redhat.com", "jasowang@redhat.com", "kirill.shutemov@linux.intel.com", "akpm@linux-foundation.org", "mhocko@suse.com", "pbonzini@redhat.com", Mel Gorman, Cornelia Huck, "Amit Shah"
Subject: RE: [PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info
Date: Sun, 4 Dec 2016 13:13:23 +0000
References: <1480495397-23225-1-git-send-email-liang.z.li@intel.com> <1480495397-23225-6-git-send-email-liang.z.li@intel.com> <438dd41a-fdf1-2a77-ef9c-8c103f492b2f@intel.com>
In-Reply-To: <438dd41a-fdf1-2a77-ef9c-8c103f492b2f@intel.com>
Content-Type: text/plain; charset="us-ascii"
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

> On 11/30/2016 12:43 AM, Liang Li wrote:
> > +static void send_unused_pages_info(struct virtio_balloon *vb,
> > +				unsigned long req_id)
> > +{
> > +	struct scatterlist sg_in;
> > +	unsigned long pos = 0;
> > +	struct virtqueue *vq = vb->req_vq;
> > +	struct virtio_balloon_resp_hdr *hdr = vb->resp_hdr;
> > +	int ret, order;
> > +
> > +	mutex_lock(&vb->balloon_lock);
> > +
> > +	for (order = MAX_ORDER - 1; order >= 0; order--) {
>
> I scratched my head for a bit on this one. Why are you walking over orders,
> *then* zones. I *think* you're doing it because you can efficiently fill the
> bitmaps at a given order for all zones, then move to a new bitmap. But, it
> would be interesting to document this.
>

Yes, using the order is somewhat strange, but it helps to keep the API simple.
Do you think it's acceptable?

> > +		pos = 0;
> > +		ret = get_unused_pages(vb->resp_data,
> > +			 vb->resp_buf_size / sizeof(unsigned long),
> > +			 order, &pos);
>
> FWIW, get_unused_pages() is a pretty bad name. "get" usually implies
> bumping reference counts or consuming something. You're just "recording"
> or "marking" them.
>

Will change to mark_unused_pages().

> > +		if (ret == -ENOSPC) {
> > +			void *new_resp_data;
> > +
> > +			new_resp_data = kmalloc(2 * vb->resp_buf_size,
> > +						GFP_KERNEL);
> > +			if (new_resp_data) {
> > +				kfree(vb->resp_data);
> > +				vb->resp_data = new_resp_data;
> > +				vb->resp_buf_size *= 2;
>
> What happens to the data in ->resp_data at this point? Doesn't this just
> throw it away?
>

Yes, so we should make sure the data in resp_data is not in use.

> ...
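To make the concern about ->resp_data concrete, here is a minimal user-space sketch of a doubling step that copies the recorded entries across instead of discarding them. It is not what the patch does. struct resp_buf, its "used" field and resp_buf_grow() are hypothetical names for illustration, and malloc()/free() stand in for kmalloc()/kfree(); in kernel code krealloc() would be the more usual primitive for this.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the resp_data/resp_buf_size pair in the patch. */
struct resp_buf {
	void *data;
	size_t size;	/* capacity in bytes */
	size_t used;	/* bytes holding entries not yet sent to the host */
};

/*
 * Double the buffer. Entries already recorded are copied into the new
 * allocation, so the caller does not have to guarantee that the old
 * contents are unused. Returns 0 on success, -1 if the allocation fails;
 * on failure the old buffer is left untouched and still valid.
 */
static int resp_buf_grow(struct resp_buf *rb)
{
	size_t new_size = rb->size ? rb->size * 2 : 64;
	void *new_data = malloc(new_size);

	if (!new_data)
		return -1;

	assert(rb->used <= rb->size);
	if (rb->used)
		memcpy(new_data, rb->data, rb->used);

	free(rb->data);
	rb->data = new_data;
	rb->size = new_size;
	return 0;
}
```

If the loop always resets pos to 0 before retrying, as the quoted code does, the copy is unnecessary and freeing the old buffer is harmless; the sketch only matters if entries recorded before the -ENOSPC are meant to survive.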
> > +struct page_info_item {
> > +	__le64 start_pfn : 52; /* start pfn for the bitmap */
> > +	__le64 page_shift : 6; /* page shift width, in bytes */
> > +	__le64 bmap_len : 6;  /* bitmap length, in bytes */
> > +};
>
> Is 'bmap_len' too short? a 64-byte buffer is a bit tiny. Right?
>

Currently we only use 8-byte and 0-byte bitmaps. Should we support more
than 64 bytes?

> > +static int mark_unused_pages(struct zone *zone,
> > +		unsigned long *unused_pages, unsigned long size,
> > +		int order, unsigned long *pos)
> > +{
> > +	unsigned long pfn, flags;
> > +	unsigned int t;
> > +	struct list_head *curr;
> > +	struct page_info_item *info;
> > +
> > +	if (zone_is_empty(zone))
> > +		return 0;
> > +
> > +	spin_lock_irqsave(&zone->lock, flags);
> > +
> > +	if (*pos + zone->free_area[order].nr_free > size)
> > +		return -ENOSPC;
>
> Urg, so this won't partially fill? So, what is the nr_free pages limit where
> we no longer fit in the kmalloc()'d buffer where this simply won't work?
>

Yes. My initial implementation filled the buffer partially, which is better
for the worst case. I thought the above code would be more efficient for most
cases ... Do you think partially filling the bitmap is better?

> > +	for (t = 0; t < MIGRATE_TYPES; t++) {
> > +		list_for_each(curr, &zone->free_area[order].free_list[t]) {
> > +			pfn = page_to_pfn(list_entry(curr, struct page, lru));
> > +			info = (struct page_info_item *)(unused_pages + *pos);
> > +			info->start_pfn = pfn;
> > +			info->page_shift = order + PAGE_SHIFT;
> > +			*pos += 1;
> > +		}
> > +	}
>
> Do we need to fill in ->bmap_len here?

For integrity, bmap_len should be filled in; I will add that. I omitted this
step only because QEMU assumes ->bmap_len is 0 and ignores this field.

Thanks for your comments!

Liang
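
For readers following the page_info_item discussion, the sketch below packs the three fields (52-bit start_pfn, 6-bit page_shift, 6-bit bmap_len) into one 64-bit word with explicit shifts and masks, and always fills in bmap_len. It is an illustration under assumptions, not the patch's code: the field order within the word (start_pfn in the low bits) is assumed, the page_info_* helper names are hypothetical, and the byte-order conversion a kernel implementation would need is left out.

```c
#include <assert.h>
#include <stdint.h>

#define PFN_BITS	52
#define SHIFT_BITS	6
#define BMAP_LEN_BITS	6

/*
 * Pack one item. Assumed layout, low bits to high bits:
 * start_pfn (52) | page_shift (6) | bmap_len (6).
 */
static uint64_t page_info_pack(uint64_t start_pfn, unsigned int page_shift,
			       unsigned int bmap_len)
{
	/* A 6-bit field holds 0..63, so larger values cannot be encoded. */
	assert(start_pfn < (UINT64_C(1) << PFN_BITS));
	assert(page_shift < (1u << SHIFT_BITS));
	assert(bmap_len < (1u << BMAP_LEN_BITS));

	return start_pfn |
	       ((uint64_t)page_shift << PFN_BITS) |
	       ((uint64_t)bmap_len << (PFN_BITS + SHIFT_BITS));
}

static uint64_t page_info_start_pfn(uint64_t item)
{
	return item & ((UINT64_C(1) << PFN_BITS) - 1);
}

static unsigned int page_info_page_shift(uint64_t item)
{
	return (unsigned int)(item >> PFN_BITS) & ((1u << SHIFT_BITS) - 1);
}

static unsigned int page_info_bmap_len(uint64_t item)
{
	return (unsigned int)(item >> (PFN_BITS + SHIFT_BITS)) &
	       ((1u << BMAP_LEN_BITS) - 1);
}
```

With a PAGE_SHIFT of 12, an order-3 free block starting at pfn 0x1234 and carrying no bitmap would be recorded as page_info_pack(0x1234, 3 + 12, 0), matching the "order + PAGE_SHIFT" assignment in the quoted loop. The 6-bit limit on bmap_len is what the "too short?" question is about: the largest encodable length is 63 bytes.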