From: "Li, Liang Z" <liang.z.li@intel.com>
To: "Hansen, Dave" <dave.hansen@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Cc: "virtualization@lists.linux-foundation.org" 
	<virtualization@lists.linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"virtio-dev@lists.oasis-open.org"
	<virtio-dev@lists.oasis-open.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"quintela@redhat.com" <quintela@redhat.com>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"jasowang@redhat.com" <jasowang@redhat.com>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"mhocko@suse.com" <mhocko@suse.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Cornelia Huck <cornelia.huck@de.ibm.com>,
	"Amit Shah" <amit.shah@redhat.com>
Subject: RE: [PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info
Date: Tue, 6 Dec 2016 04:47:27 +0000
Message-ID: <F2CBF3009FA73547804AE4C663CAB28E3A12F831@shsmsx102.ccr.corp.intel.com>
In-Reply-To: <70ece7a5-348b-2eb9-c40a-f21b08df042c@intel.com>

> >>> +	mutex_lock(&vb->balloon_lock);
> >>> +
> >>> +	for (order = MAX_ORDER - 1; order >= 0; order--) {
> >>
> >> I scratched my head for a bit on this one.  Why are you walking over
> >> orders, *then* zones?  I *think* you're doing it because you can efficiently
> >> fill the bitmaps at a given order for all zones, then move to a new
> >> bitmap.  But, it would be interesting to document this.
> >
> > Yes, walking by order first is somewhat strange, but it helps keep the API simple.
> > Do you think it's acceptable?
> 
> Yeah, it's fine.  Just comment it, please.
> 
Good!

> >>> +		if (ret == -ENOSPC) {
> >>> +			void *new_resp_data;
> >>> +
> >>> +			new_resp_data = kmalloc(2 * vb->resp_buf_size,
> >>> +						GFP_KERNEL);
> >>> +			if (new_resp_data) {
> >>> +				kfree(vb->resp_data);
> >>> +				vb->resp_data = new_resp_data;
> >>> +				vb->resp_buf_size *= 2;
> >>
> >> What happens to the data in ->resp_data at this point?  Doesn't this
> >> just throw it away?
> >
> > Yes, so we should make sure the data in resp_data is not in use.
> 
> But doesn't it have valid data that we just collected and haven't told the
> hypervisor about yet?  Aren't we throwing away good data that cost us
> something to collect?

Indeed.  Data already filled in for the previous zone may still be in the
buffer.  Should we change the API to

'int get_unused_pages(unsigned long *unused_pages, unsigned long size,
		int order, unsigned long *pos, struct zone *zone)' ?

Then we can use 'zone' to record which zone to retry from, and the data
already filled in is not discarded.
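
To illustrate what "not discard the filled data" means for the buffer itself,
here is a minimal userspace sketch.  The struct resp_buf type and the
resp_buf_grow() helper are hypothetical names invented for this example; they
are not part of the patch.  The sketch doubles the capacity but copies the
entries collected so far into the new buffer before freeing the old one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the driver's response buffer. */
struct resp_buf {
	unsigned long *data;	/* collected entries */
	size_t size;		/* capacity, in entries */
	size_t pos;		/* number of entries already filled */
};

/*
 * Double the capacity while keeping the first 'pos' entries.
 * Returns 0 on success, -1 on allocation failure (buffer left unchanged).
 */
static int resp_buf_grow(struct resp_buf *rb)
{
	size_t new_size = rb->size ? rb->size * 2 : 1;
	unsigned long *new_data;

	assert(rb->pos <= rb->size);

	new_data = malloc(new_size * sizeof(*new_data));
	if (!new_data)
		return -1;

	/* Carry over what was already collected instead of dropping it. */
	if (rb->pos)
		memcpy(new_data, rb->data, rb->pos * sizeof(*new_data));

	free(rb->data);
	rb->data = new_data;
	rb->size = new_size;
	return 0;
}
```

In kernel code, krealloc() gives the same "grow and keep the contents"
behaviour in one call; whether that or a zone-based retry is the better fit
for the driver is a separate question.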

> >> ...
> >>> +struct page_info_item {
> >>> +	__le64 start_pfn : 52; /* start pfn for the bitmap */
> >>> +	__le64 page_shift : 6; /* page shift width, in bytes */
> 
> What does a page_shift "in bytes" mean? :)

Obviously, you know. :o
I will try to make it clear.

> 
> >>> +	__le64 bmap_len : 6;  /* bitmap length, in bytes */
> >>> +};
> >>
> >> Is 'bmap_len' too short?  a 64-byte buffer is a bit tiny.  Right?
> >
> > Currently, we only use 8-byte and 0-byte bitmaps.  Should we support
> > more than 64 bytes?
> 
> It just means that with this format, you end up wasting at least ~1/8th of the
> space with metadata.  That's a bit unfortunate, but I guess it's not fatal.
> 
> I'd definitely call it out in the patch description and make sure other folks take
> a look at it.

OK.
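
For reference, the overhead figure can be checked with a little arithmetic.
The sketch below assumes each item is one 8-byte header followed by at most
63 bytes of bitmap (the largest value a 6-bit length field can hold); the
metadata_permille() helper is made up for this illustration.

```c
#include <assert.h>

#define ITEM_HEADER_BYTES 8u	/* one 64-bit page_info_item */

/*
 * Share of the stream taken by headers, in parts per thousand, for a
 * given bitmap length in bytes.  Illustrative helper, not from the patch.
 */
static unsigned int metadata_permille(unsigned int bmap_len)
{
	assert(bmap_len < 64);	/* bmap_len is a 6-bit field */
	return ITEM_HEADER_BYTES * 1000u / (ITEM_HEADER_BYTES + bmap_len);
}
```

With the longest bitmap this gives 8 / (8 + 63), about 11%, which is close to
the ~1/8th mentioned above.  With the 8-byte bitmaps used today the header is
half of each item.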

> 
> There's a somewhat easy fix, but that would make the qemu implementation
> more complicated: You could just have bmap_len==0x3f imply that there's
> another field that contains an extended bitmap length for when you need long
> bitmaps.
> 
> But, as you note, there's no need for it, so it's a matter of trading the extra
> complexity versus the desire to not have to change the ABI again for longer
> (hopefully).
> 

Your suggestion still works without changing the current code: just reserve
'bmap_len==0x3f' for future extension.  The current code does not use that value.
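
A sketch of how a receiver could treat the reserved value, written with
explicit shifts rather than C bitfields (whose layout is implementation
defined).  The bit positions chosen here (start_pfn in the low 52 bits, then
page_shift, then bmap_len) and all of the names are assumptions for
illustration only; the patch does not spell them out in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define PFN_BITS	52
#define SHIFT_BITS	6
#define LEN_BITS	6
#define BMAP_LEN_EXT	0x3f	/* reserved: "extended length follows" (hypothetical) */

/* Pack the three fields into one 64-bit word (assumed layout, low to high). */
static uint64_t item_pack(uint64_t start_pfn, unsigned int page_shift,
			  unsigned int bmap_len)
{
	assert(start_pfn < (1ULL << PFN_BITS));
	assert(page_shift < (1u << SHIFT_BITS));
	assert(bmap_len < (1u << LEN_BITS));

	return start_pfn |
	       ((uint64_t)page_shift << PFN_BITS) |
	       ((uint64_t)bmap_len << (PFN_BITS + SHIFT_BITS));
}

static uint64_t item_start_pfn(uint64_t item)
{
	return item & ((1ULL << PFN_BITS) - 1);
}

static unsigned int item_page_shift(uint64_t item)
{
	return (unsigned int)((item >> PFN_BITS) & ((1u << SHIFT_BITS) - 1));
}

static unsigned int item_bmap_len(uint64_t item)
{
	return (unsigned int)((item >> (PFN_BITS + SHIFT_BITS)) &
			      ((1u << LEN_BITS) - 1));
}

/* Nonzero if bmap_len is a real in-line length, zero if it is the escape. */
static int item_len_is_inline(uint64_t item)
{
	return item_bmap_len(item) != BMAP_LEN_EXT;
}
```

A sender that never emits 0x3f stays compatible with a later format that
gives the value a meaning, which is the point of reserving it now.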

> >>> +static int  mark_unused_pages(struct zone *zone,
> >>> +		unsigned long *unused_pages, unsigned long size,
> >>> +		int order, unsigned long *pos)
> >>> +{
> >>> +	unsigned long pfn, flags;
> >>> +	unsigned int t;
> >>> +	struct list_head *curr;
> >>> +	struct page_info_item *info;
> >>> +
> >>> +	if (zone_is_empty(zone))
> >>> +		return 0;
> >>> +
> >>> +	spin_lock_irqsave(&zone->lock, flags);
> >>> +
> >>> +	if (*pos + zone->free_area[order].nr_free > size)
> >>> +		return -ENOSPC;
> >>
> >> Urg, so this won't partially fill?  So, what's the nr_free pages limit
> >> where we no longer fit in the kmalloc()'d buffer where this simply won't
> >> work?
> >
> > Yes.  My initial implementation did partial fills, which is better for the worst case.
> > I thought the above code is more efficient for most cases ...
> > Do you think partially filling the bitmap is better?
> 
> Could you please answer the question I asked?
> 

For your question:
-------------------------------------------------------------------------------------------------------
> So, what's the nr_free pages limit where we no longer fit in the kmalloc()'d buffer
> where this simply won't work?
------------------------------------------------------------------------------------------------------
No.  If the buffer is not big enough to hold 'nr_free' pages, get_unused_pages() returns
'-ENOSPC', and the code that follows allocates a buffer twice the size and retries,
until a buffer of sufficient size is allocated.  The current order is not skipped unless the
buffer allocation fails.
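
The retry behaviour described above can be sketched in userspace as follows.
fill_all_or_nothing() is a hypothetical stand-in for get_unused_pages() (it
either writes everything or returns -ENOSPC), and malloc()/free() stand in
for kmalloc()/kfree(); none of these names come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical filler: writes 'needed' entries, or nothing and -ENOSPC. */
static int fill_all_or_nothing(unsigned long *buf, size_t size, size_t needed)
{
	size_t i;

	if (needed > size)
		return -ENOSPC;
	for (i = 0; i < needed; i++)
		buf[i] = i;
	return 0;
}

/*
 * Retry with a doubled buffer until the data fits or allocation fails.
 * On success *bufp and *sizep describe the buffer that was filled.
 * Returns 0 on success, -ENOMEM if a larger buffer could not be allocated.
 */
static int fill_with_retry(unsigned long **bufp, size_t *sizep, size_t needed)
{
	assert(*sizep > 0);

	for (;;) {
		unsigned long *new_buf;
		int ret = fill_all_or_nothing(*bufp, *sizep, needed);

		if (ret != -ENOSPC)
			return ret;

		new_buf = malloc(2 * *sizep * sizeof(*new_buf));
		if (!new_buf)
			return -ENOMEM;	/* only here is the request given up */

		free(*bufp);
		*bufp = new_buf;
		*sizep *= 2;
	}
}
```

Note that in the kernel a single kmalloc() request has an upper bound that
depends on the configuration, so the doubling cannot continue indefinitely:
once the required size exceeds that bound, the allocation fails and the
order is skipped.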

> Because if you don't get this right, it could mean that there are systems that
> simply *fail* here.
