From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wei Wang Subject: Re: [PATCH v12 5/8] virtio-balloon: VIRTIO_BALLOON_F_SG Date: Wed, 26 Jul 2017 11:48:41 +0800 Message-ID: <59781119.8010200@intel.com> References: <1499863221-16206-1-git-send-email-wei.w.wang@intel.com> <1499863221-16206-6-git-send-email-wei.w.wang@intel.com> <20170712160129-mutt-send-email-mst@kernel.org> <5966241C.9060503@intel.com> <20170712163746-mutt-send-email-mst@kernel.org> <5967246B.9030804@intel.com> <20170713210819-mutt-send-email-mst@kernel.org> <59686EEB.8080805@intel.com> <20170723044036-mutt-send-email-mst@kernel.org> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Cc: linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, kvm@vger.kernel.org, linux-mm@kvack.org, david@redhat.com, cornelia.huck@de.ibm.com, akpm@linux-foundation.org, mgorman@techsingularity.net, aarcange@redhat.com, amit.shah@redhat.com, pbonzini@redhat.com, liliang.opensource@gmail.com, virtio-dev@lists.oasis-open.org, yang.zhang.wz@gmail.com, quan.xu@aliyun.com To: "Michael S. Tsirkin" Return-path: In-Reply-To: <20170723044036-mutt-send-email-mst@kernel.org> Sender: owner-linux-mm@kvack.org List-Id: kvm.vger.kernel.org On 07/23/2017 09:45 AM, Michael S. Tsirkin wrote: > On Fri, Jul 14, 2017 at 03:12:43PM +0800, Wei Wang wrote: >> On 07/14/2017 04:19 AM, Michael S. Tsirkin wrote: >>> On Thu, Jul 13, 2017 at 03:42:35PM +0800, Wei Wang wrote: >>>> On 07/12/2017 09:56 PM, Michael S. Tsirkin wrote: >>>>> So the way I see it, there are several issues: >>>>> >>>>> - internal wait - forces multiple APIs like kick/kick_sync >>>>> note how kick_sync can fail but your code never checks return code >>>>> - need to re-write the last descriptor - might not work >>>>> for alternative layouts which always expose descriptors >>>>> immediately >>>> Probably it wasn't clear. Please let me explain the two functions here: >>>> >>>> 1) virtqueue_add_chain_desc(vq, head_id, prev_id,..): >>>> grabs a desc from the vq and inserts it to the chain tail (which is indexed >>>> by >>>> prev_id, probably better to call it tail_id). Then, the new added desc >>>> becomes >>>> the tail (i.e. the last desc). The _F_NEXT flag is cleared for each desc >>>> when it's >>>> added to the chain, and set when another desc comes to follow later. >>> And this only works if there are multiple rings like >>> avail + descriptor ring. >>> It won't work e.g. with the proposed new layout where >>> writing out a descriptor exposes it immediately. >> I think it can support the 1.1 proposal, too. But before getting >> into that, I think we first need to deep dive into the implementation >> and usage of _first/next/last. The usage would need to lock the vq >> from the first to the end (otherwise, the returned info about the number >> of available desc in the vq, i.e. num_free, would be invalid): >> >> lock(vq); >> add_first(); >> add_next(); >> add_last(); >> unlock(vq); >> >> However, I think the case isn't this simple, since we need to check more >> things >> after each add_xx() step. For example, if only one entry is available at the >> time >> we start to use the vq, that is, num_free is 0 after add_first(), we >> wouldn't be >> able to add_next and add_last. So, it would work like this: >> >> start: >> ...get free page block.. >> lock(vq) >> retry: >> ret = add_first(..,&num_free,); >> if(ret == -ENOSPC) { >> goto retry; >> } else if (!num_free) { >> add_chain_head(); >> unlock(vq); >> kick & wait; >> goto start; >> } >> next_one: >> ...get free page block.. >> add_next(..,&num_free,); >> if (!num_free) { >> add_chain_head(); >> unlock(vq); >> kick & wait; >> goto start; >> } if (num_free == 1) { >> ...get free page block.. >> add_last(..); >> unlock(vq); >> kick & wait; >> goto start; >> } else { >> goto next_one; >> } >> >> The above seems unnecessary to me to have three different APIs. >> That's the reason to combine them into one virtqueue_add_chain_desc(). >> >> -- or, do you have a different thought about using the three APIs? >> >> >> Implementation Reference: >> >> struct desc_iterator { >> unsigned int head; >> unsigned int tail; >> }; >> >> add_first(*vq, *desc_iterator, *num_free, ..) >> { >> if (vq->vq.num_free < 1) >> return -ENOSPC; >> get_desc(&desc_id); >> desc[desc_id].flag &= ~_F_NEXT; >> desc_iterator->head = desc_id >> desc_iterator->tail = desc_iterator->head; >> *num_free = vq->vq.num_free; >> } >> >> add_next(vq, desc_iterator, *num_free,..) >> { >> get_desc(&desc_id); >> desc[desc_id].flag &= ~_F_NEXT; >> desc[desc_iterator.tail].next = desc_id; >> desc[desc_iterator->tail].flag |= _F_NEXT; >> desc_iterator->tail = desc_id; >> *num_free = vq->vq.num_free; >> } >> >> add_last(vq, desc_iterator,..) >> { >> get_desc(&desc_id); >> desc[desc_id].flag &= ~_F_NEXT; >> desc[desc_iterator.tail].next = desc_id; >> desc_iterator->tail = desc_id; >> >> add_chain_head(); // put the desc_iterator.head to the ring >> } >> >> >> Best, >> Wei > OK I thought this over. While we might need these new APIs in > the future, I think that at the moment, there's a way to implement > this feature that is significantly simpler. Just add each s/g > as a separate input buffer. Should it be an output buffer? I think output means from the driver to device (i.e. DMA_TO_DEVICE). > > This needs zero new APIs. > > I know that follow-up patches need to add a header in front > so you might be thinking: how am I going to add this > header? The answer is quite simple - add it as a separate > out header. > > Host will be able to distinguish between header and pages > by looking at the direction, and - should we want to add > IN data to header - additionally size (<4K => header). I think this works fine when the cmdq is only used for reporting the unused pages. It would be an issue if there are other usages (e.g. report memory statistics) interleaving. I think one solution would be to lock the cmdq until a cmd usage is done ((e.g. all the unused pages are reported) ) - in this case, the periodically updated guest memory statistics may be delayed for a while occasionally when live migration starts. Would this be acceptable? If not, probably we can have the cmdq for one usage only. Best, Wei -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org