linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Wei Wang <wei.w.wang@intel.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
	qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org,
	kvm@vger.kernel.org, linux-mm@kvack.org, david@redhat.com,
	dave.hansen@intel.com, cornelia.huck@de.ibm.com,
	akpm@linux-foundation.org, mgorman@techsingularity.net,
	aarcange@redhat.com, amit.shah@redhat.com, pbonzini@redhat.com,
	liliang.opensource@gmail.com,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [virtio-dev] Re: [PATCH v11 3/6] virtio-balloon: VIRTIO_BALLOON_F_PAGE_CHUNKS
Date: Thu, 15 Jun 2017 16:10:17 +0800	[thread overview]
Message-ID: <594240E9.2070705@intel.com> (raw)
In-Reply-To: <20170613200049-mutt-send-email-mst@kernel.org>

On 06/14/2017 01:56 AM, Michael S. Tsirkin wrote:
> On Fri, Jun 09, 2017 at 06:41:38PM +0800, Wei Wang wrote:
>> Add a new feature, VIRTIO_BALLOON_F_PAGE_CHUNKS, which enables
>> the transfer of the ballooned (i.e. inflated/deflated) pages in
>> chunks to the host.
> so now these chunks are just s/g list entry.
> So let's rename this VIRTIO_BALLOON_F_SG with a comment:
> * Use standard virtio s/g instead of PFN lists *

Actually, it's not using the standard s/g list in the implementation,
because:
using the standard s/g will need kmalloc() the indirect table on
demand (i.e. when virtqueue_add() converts s/g to indirect table);

The implementation directly pre-allocates an indirect desc table,
and uses a entry (i.e. vring_desc) to describe a chunk. This
avoids the overhead of kmalloc() the indirect table.


>> +/*
>> + * Callulates how many pfns can a page_bmap record. A bit corresponds to a
>> + * page of PAGE_SIZE.
>> + */
>> +#define VIRTIO_BALLOON_PFNS_PER_PAGE_BMAP \
>> +	(VIRTIO_BALLOON_PAGE_BMAP_SIZE * BITS_PER_BYTE)
>> +
>> +/* The number of page_bmap to allocate by default. */
>> +#define VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM	1
> It's not by default, it's at probe time, right?
It is the number of page bitmap being kept throughout the whole
lifecycle of the driver. The page bmap will be temporarily extended
due to insufficiency during a ballooning process, but when that
ballooning finishes, the extended part will be freed.
>> +/* The maximum number of page_bmap that can be allocated. */
> Not really, this is the size of the array we use to keep them.

This is the max number of the page bmap that can be
extended temporarily.

>> +#define VIRTIO_BALLOON_PAGE_BMAP_MAX_NUM	32
>> +
> So you still have a home-grown bitmap. I'd like to know why
> isn't xbitmap suggested for this purpose by Matthew Wilcox
> appropriate. Please add a comment explaining the requirements
> from the data structure.

I didn't find his xbitmap being upstreamed, did you?

>> +/*
>> + * QEMU virtio implementation requires the desc table size less than
>> + * VIRTQUEUE_MAX_SIZE, so minus 1 here.
> I think it doesn't, the issue is probably that you add a header
> as a separate s/g. In any case see below.
>
>> + */
>> +#define VIRTIO_BALLOON_MAX_PAGE_CHUNKS (VIRTQUEUE_MAX_SIZE - 1)
> This is wrong, virtio spec says s/g size should not exceed VQ size.
> If you want to support huge VQ sizes, you can add a fallback to
> smaller sizes until it fits in 1 page.

Probably no need for huge VQ size, 1024 queue size should be
enough. And we can have 1024 descriptors in the indirect
table, so the above size doesn't exceed the vq size, right?


> +static unsigned int extend_page_bmap_size(struct virtio_balloon *vb,
> +					  unsigned long pfn_num)
> what's this API doing?  Pls add comments. this seems to assume
> it will only be called once.
OK, I will add some comments here. This is the function to extend
the number of page bitmap when the original 1 page bmap is
not sufficient during a ballooning process. As mentioned above,
at the end of this ballooning process, the extended part will be freed.

> it would be better to avoid making
> this assumption, just look at what has been allocated
> and extend it.
Actually it's not an assumption. The rule here is that we always keep
"1" page bmap. "1" is defined by the
VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM. So when freeing, it also
references VIRTIO_BALLOON_PAGE_BMAP_DEFAULT_NUM (not assuming
any number)

> +}
> +
> +/* Add a chunk to the buffer. */
> +static void add_one_chunk(struct virtio_balloon *vb, struct virtqueue *vq,
> +			  u64 base_addr, u32 size)
> +{
> +	unsigned int *num = &vb->balloon_page_chunk.chunk_num;
> +	struct vring_desc *desc = &vb->balloon_page_chunk.desc_table[*num];
> +
> +	desc->addr = cpu_to_virtio64(vb->vdev, base_addr);
> +	desc->len = cpu_to_virtio32(vb->vdev, size);
> +	*num += 1;
> +	if (*num == VIRTIO_BALLOON_MAX_PAGE_CHUNKS)
> +		send_page_chunks(vb, vq);
> +}
> +
> Poking at virtio internals like this is not nice. Pls move to virtio
> code.  Also, pages must be read descriptors as host might modify them.
>
> This also lacks viommu support but this is not mandatory as
> that is borken atm anyway. I'll send a patch to at least fail cleanly.
OK, thanks.

>> +static void convert_bmap_to_chunks(struct virtio_balloon *vb,
>> +				   struct virtqueue *vq,
>> +				   unsigned long *bmap,
>> +				   unsigned long pfn_start,
>> +				   unsigned long size)
>> +{
>> +	unsigned long next_one, next_zero, pos = 0;
>> +	u64 chunk_base_addr;
>> +	u32 chunk_size;
>> +
>> +	while (pos < size) {
>> +		next_one = find_next_bit(bmap, size, pos);
>> +		/*
>> +		 * No "1" bit found, which means that there is no pfn
>> +		 * recorded in the rest of this bmap.
>> +		 */
>> +		if (next_one == size)
>> +			break;
>> +		next_zero = find_next_zero_bit(bmap, size, next_one + 1);
>> +		/*
>> +		 * A bit in page_bmap corresponds to a page of PAGE_SIZE.
>> +		 * Convert it to be pages of 4KB balloon page size when
>> +		 * adding it to a chunk.
> This looks wrong. add_one_chunk assumes size in bytes. So should be just
> PAGE_SIZE.

It's intended to be "chunk size", which is the number of pfns. The 
benefit is
that the 32-bit desc->len won't be overflow, as you mentioned below.


>
>> +		 */
>> +		chunk_size = (next_zero - next_one) *
>> +			     VIRTIO_BALLOON_PAGES_PER_PAGE;
> How do you know this won't overflow a 32 bit integer? Needs a comment.

If it stores size in bytes, it has the possibility to overflow.
If storing number of pfns, the 32-bit value can support 2^32*4KB=8TB
memory, unlikely to overflow.
> +
> +static int balloon_page_chunk_init(struct virtio_balloon *vb)
> +{
> +	int i;
> +
> +	vb->balloon_page_chunk.desc_table = alloc_indirect(vb->vdev,
> +						VIRTIO_BALLOON_MAX_PAGE_CHUNKS,
> +						GFP_KERNEL);
> This one's problematic, you aren't supposed to use APIs when device
> is not inited yet. Seems to work by luck here. I suggest moving
> this to probe, that's where we do a bunch of inits.
> And then you can move private init back to allocate too.

This is just to allocate an indirect desc table. If allocation fails, we 
need to clear
the related feature bit in ->validate(), right?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2017-06-15  8:08 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-09 10:41 [PATCH v11 0/6] Virtio-balloon Enhancement Wei Wang
2017-06-09 10:41 ` [PATCH v11 1/6] virtio-balloon: deflate via a page list Wei Wang
2017-06-09 10:41 ` [PATCH v11 2/6] virtio-balloon: coding format cleanup Wei Wang
2017-06-09 10:41 ` [PATCH v11 3/6] virtio-balloon: VIRTIO_BALLOON_F_PAGE_CHUNKS Wei Wang
2017-06-13 17:56   ` Michael S. Tsirkin
2017-06-13 17:59     ` Dave Hansen
2017-06-13 18:55       ` Michael S. Tsirkin
2017-06-15  8:10     ` Wei Wang [this message]
2017-06-16  3:19       ` [virtio-dev] " Michael S. Tsirkin
2017-06-28 15:04       ` Matthew Wilcox
2017-07-12 13:05         ` Wei Wang
2017-06-09 10:41 ` [PATCH v11 4/6] mm: function to offer a page block on the free list Wei Wang
2017-06-12 14:10   ` Dave Hansen
2017-06-12 16:28     ` Michael S. Tsirkin
2017-06-12 16:42       ` Dave Hansen
2017-06-12 20:34         ` Michael S. Tsirkin
2017-06-12 20:54           ` Dave Hansen
2017-06-13  2:56             ` Wei Wang
2017-06-20 16:44     ` Rik van Riel
2017-06-20 16:49       ` David Hildenbrand
2017-06-20 17:29         ` Rik van Riel
2017-06-20 18:26           ` Michael S. Tsirkin
2017-06-20 19:51             ` Rik van Riel
2017-06-21 12:41               ` Michael S. Tsirkin
2017-06-21  8:38           ` [Qemu-devel] " Wei Wang
2017-06-20 18:17         ` Michael S. Tsirkin
2017-06-20 18:54           ` David Hildenbrand
2017-06-20 18:56             ` Michael S. Tsirkin
2017-06-20 19:01               ` David Hildenbrand
2017-06-21 12:56         ` Christian Borntraeger
2017-06-21 13:47           ` David Hildenbrand
2017-06-09 10:41 ` [PATCH v11 5/6] mm: export symbol of next_zone and first_online_pgdat Wei Wang
2017-06-09 10:41 ` [PATCH v11 6/6] virtio-balloon: VIRTIO_BALLOON_F_CMD_VQ Wei Wang
2017-06-12 14:07   ` Dave Hansen
2017-06-13 10:17     ` Wei Wang
2017-06-20 16:18   ` Michael S. Tsirkin
2017-06-21  3:28     ` [virtio-dev] " Wei Wang
2017-06-21 12:28       ` Michael S. Tsirkin
2017-06-22  8:40         ` Wei Wang
2017-06-28 15:01           ` Michael S. Tsirkin
2017-07-12 12:57             ` Wei Wang
2017-06-09 11:18 ` [PATCH v11 0/6] Virtio-balloon Enhancement Wang, Wei W

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=594240E9.2070705@intel.com \
    --to=wei.w.wang@intel.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=amit.shah@redhat.com \
    --cc=cornelia.huck@de.ibm.com \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=liliang.opensource@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=virtio-dev@lists.oasis-open.org \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).