Re: [PATCH v14 6/6] virtio-balloon: Add support for providing unused page reports to host

From: David Hildenbrand <david@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Cc: kvm list <kvm@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@kernel.org>, linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Vlastimil Babka <vbabka@suse.cz>,
	Yang Zhang <yang.zhang.wz@gmail.com>,
	Nitesh Narayan Lal <nitesh@redhat.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Pankaj Gupta <pagupta@redhat.com>,
	Rik van Riel <riel@surriel.com>,
	lcapitulino@redhat.com, Dave Hansen <dave.hansen@intel.com>,
	"Wang, Wei W" <wei.w.wang@intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH v14 6/6] virtio-balloon: Add support for providing unused page reports to host
Date: Mon, 2 Dec 2019 11:43:59 +0100	[thread overview]
Message-ID: <c97bc21d-6394-7667-32a7-28f121fa1045@redhat.com> (raw)
In-Reply-To: <CAKgT0Uf8iebEXSovdWfXq1FvyGpqrF-X0VDrq-h8xavQkvA_9w@mail.gmail.com>

[...]

>> Sounds like the device is unused :D
>>
>> "Device info for reporting unused pages" ?
>>
>> I am in general wondering, should we rename "unused" to "free". I.e.,
>> "free page reporting" instead of "unused page reporting"? Or what was
>> the motivation behind using "unused" ?
> 
> I honestly don't remember why I chose "unused" at this point. I can
> switch over to "free" if that is what is preferred.
> 
> Looking over the code a bit more I suspect the reason for avoiding it
> is because free page hinting also mentioned reporting in a few spots.

Maybe we should fix these cases. FWIW, I'd prefer "free page reporting".
(e.g., pairs nicely with "free page hinting").

>>> +
>>> +     virtqueue_kick(vq);
>>> +
>>> +     /* When host has read buffer, this completes via balloon_ack */
>>> +     wait_event(vb->acked, virtqueue_get_buf(vq, &unused));
>>
>> Is it safe to rely on the same ack-ing mechanism as the inflate/deflate
>> queue? What if both mechanisms are used concurrently and race/both wait
>> for the hypervisor?
>>
>> Maybe we need a separate vb->acked + callback function.
> 
> So if I understand correctly what is actually happening is that the
> wait event is simply a trigger that will wake us up, and at that point
> we check to see if the buffer we submitted is done. If not we go back
> to sleep. As such all we are really waiting on is the notification
> that the buffers we submitted have been processed. So it is using the
> same function but on a different virtual queue.

Very right, this is just a waitqueue (was only looking at this patch,
not the full code). This should indeed be fine.

>>>       vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE];
>>>       vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE];
>>>       if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
>>> @@ -932,12 +976,30 @@ static int virtballoon_probe(struct virtio_device *vdev)
>>>               if (err)
>>>                       goto out_del_balloon_wq;
>>>       }
>>> +
>>> +     vb->pr_dev_info.report = virtballoon_unused_page_report;
>>> +     if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING)) {
>>> +             unsigned int capacity;
>>> +
>>> +             capacity = min_t(unsigned int,
>>> +                              virtqueue_get_vring_size(vb->reporting_vq),
>>> +                              VIRTIO_BALLOON_VRING_HINTS_MAX);
>>> +             vb->pr_dev_info.capacity = capacity;
>>> +
>>> +             err = page_reporting_register(&vb->pr_dev_info);
>>> +             if (err)
>>> +                     goto out_unregister_shrinker;
>>> +     }
>>
>> It can happen here that we start reporting before marking the device
>> ready. Can that be problematic?
>>
>> Maybe we have to ignore any reports in virtballoon_unused_page_report()
>> until ready...
> 
> I don't think there is an issue with us putting buffers on the ring
> before it is ready. I think it will just cause our function to sleep.
> 
> I'm guessing that is the case since init_vqs will add a buffer to the
> stats vq and that happens even earlier in virtballoon_probe.
> 

Interesting: "Note: vqs are enabled automatically after probe returns.".
Learned something new.

The virtballoon_changed(vdev) *after* virtio_device_ready(vdev) made me
wonder, because that could also fill the queues.

Maybe Michael can clarify.

>>> +
>>>       virtio_device_ready(vdev);
>>>
>>>       if (towards_target(vb))
>>>               virtballoon_changed(vdev);
>>>       return 0;
>>>
>>> +out_unregister_shrinker:
>>> +     if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>>> +             virtio_balloon_unregister_shrinker(vb);
>>
>> A sync is done implicitly, right? So after this call, we won't get any
>> new callbacks/are stuck in a callback.
> 
> From what I can tell a read/write semaphore is used in
> unregister_shrinker when we delete it from the list so it shouldn't be
> an issue.

Yes, makes sense.

> 
>>>  out_del_balloon_wq:
>>>       if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
>>>               destroy_workqueue(vb->balloon_wq);
>>> @@ -966,6 +1028,8 @@ static void virtballoon_remove(struct virtio_device *vdev)
>>>  {
>>>       struct virtio_balloon *vb = vdev->priv;
>>>
>>> +     if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING))
>>> +             page_reporting_unregister(&vb->pr_dev_info);
>>
>> Dito, same question regarding syncs.
> 
> Yes, although for that one I was using pointer deletion, a barrier,
> and a cancel_work_sync since I didn't support a list.

Okay, perfect.

[...]
>>
>> Small and powerful patch :)
> 
> Agreed. Although we will have to see if we can keep it that way.
> Ideally I want to leave this with the ability so specify what size
> scatterlist we receive. However if we have to flip it around then it
> will force us to add logic for chopping up the scatterlist for
> processing in chunks.

I hope we can keep it like that. Otherwise each and every driver has to
implement this chopping-up (e.g., a hypervisor that can only send one
hint at a time - e.g., via  a simple hypercall - would have to implement
that).

-- 
Thanks,

David / dhildenb