On 6/4/19 12:25 PM, Alexander Duyck wrote:
> On Tue, Jun 4, 2019 at 9:08 AM Nitesh Narayan Lal <nitesh@redhat.com> wrote:
>>
>> On 6/4/19 11:14 AM, Alexander Duyck wrote:
>>> On Tue, Jun 4, 2019 at 5:55 AM Nitesh Narayan Lal <nitesh@redhat.com> wrote:
>>>> On 6/3/19 3:04 PM, Alexander Duyck wrote:
>>>>> On Mon, Jun 3, 2019 at 10:04 AM Nitesh Narayan Lal <nitesh@redhat.com> wrote:
>>>>>> This patch introduces the core infrastructure for free page hinting in
>>>>>> virtual environments. It enables the kernel to track the free pages which
>>>>>> can be reported to its hypervisor so that the hypervisor could
>>>>>> free and reuse that memory as per its requirement.
>>>>>>
>>>>>> While the pages are getting processed in the hypervisor (e.g.,
>>>>>> via MADV_FREE), the guest must not use them, otherwise, data loss
>>>>>> would be possible. To avoid such a situation, these pages are
>>>>>> temporarily removed from the buddy. The number of pages removed
>>>>>> temporarily from the buddy is governed by the backend (virtio-balloon
>>>>>> in our case).
>>>>>>
>>>>>> To efficiently identify free pages that can be hinted to the
>>>>>> hypervisor, bitmaps in a coarse granularity are used. Only fairly big
>>>>>> chunks are reported to the hypervisor - especially, to not break up THP
>>>>>> in the hypervisor - "MAX_ORDER - 2" on x86, and to save space. The bits
>>>>>> in the bitmap are an indication whether a page *might* be free, not a
>>>>>> guarantee. A new hook after buddy merging sets the bits.
>>>>>>
>>>>>> Bitmaps are stored per zone, protected by the zone lock. A workqueue
>>>>>> asynchronously processes the bitmaps, trying to isolate and report pages
>>>>>> that are still free. The backend (virtio-balloon) is responsible for
>>>>>> reporting these batched pages to the host synchronously. Once reporting/
>>>>>> freeing is complete, isolated pages are returned back to the buddy.
>>>>>>
>>>>>> There are still various things to look into (e.g., memory hotplug, more
>>>>>> efficient locking, possible races when disabling).
>>>>>>
>>>>>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>>>>> So one thing I had thought about, that I don't believe has been
>>>>> addressed in your solution, is to determine a means to guarantee
>>>>> forward progress. If you have a noisy thread that is allocating and
>>>>> freeing some block of memory repeatedly you will be stuck processing
>>>>> that and cannot get to the other work. Specifically if you have a zone
>>>>> where somebody is just cycling the number of pages needed to fill your
>>>>> hinting queue how do you get around it and get to the data that is
>>>>> actually cold instead of getting stuck processing the noise?
>>>> It should not matter. Every time the memory threshold is met, the
>>>> entire bitmap is scanned for possible isolation, not just a chunk of
>>>> memory. This will guarantee forward progress.
>>> So I think there may still be some issues. I see how you go from the
>>> start to the end, but how do you loop back to the start again as pages
>>> are added? The init_hinting_wq doesn't seem to have a way to get back
>>> to the start again if there is still work to do after you have
>>> completed your pass without queue_work_on firing off another thread.
>>>
>> That will be taken care of as part of a new job, which will be enqueued
>> as soon as the free memory count for the respective zone reaches the
>> threshold.
> So does that mean that you have multiple threads all calling
> queue_work_on until you get below the threshold?
Every time a page of order MAX_ORDER - 2 is added to the buddy, the free
memory count is incremented if the bit is not already set, and the count
is then checked against the threshold.
>  If so it seems like
> that would get expensive since that is an atomic test and set
> operation that would be hammered until you get below that threshold.

I am not sure I understood "until you get below that threshold".
Can you please explain?
test_and_set_bit() will be called every time a page of order
MAX_ORDER - 2 is added to the buddy (one that is not already hinted).
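
To make the free path described above concrete, here is a minimal
userspace sketch of it. This is not the code from the patch: struct
zone_hint, hint_on_chunk_freed(), hint_test_and_set_bit(), HINT_NBITS and
HINT_THRESHOLD are hypothetical names, the threshold value is arbitrary,
and the scans_queued counter only stands in for the queue_work_on() call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration only, not taken from the patch. */
#define HINT_NBITS     64  /* chunks tracked per zone in this toy model */
#define HINT_THRESHOLD 16  /* marked chunks needed before a scan is requested */

struct zone_hint {
	_Atomic uint64_t bitmap;  /* one bit per chunk of order MAX_ORDER - 2 */
	atomic_int free_chunks;   /* chunks marked as possibly free */
	int scans_queued;         /* stand-in for queue_work_on() requests */
};

/* Userspace stand-in for test_and_set_bit(): sets the bit, returns its old value. */
static bool hint_test_and_set_bit(unsigned int nr, _Atomic uint64_t *map)
{
	uint64_t mask = (uint64_t)1 << nr;

	return (atomic_fetch_or(map, mask) & mask) != 0;
}

/* Models the hook that runs after a chunk is merged into the buddy. */
static void hint_on_chunk_freed(struct zone_hint *zh, unsigned int chunk)
{
	assert(chunk < HINT_NBITS);

	/* Bit already set: the chunk was counted before, nothing to do. */
	if (hint_test_and_set_bit(chunk, &zh->bitmap))
		return;

	/* Newly marked chunk: bump the count and compare with the threshold. */
	if (atomic_fetch_add(&zh->free_chunks, 1) + 1 >= HINT_THRESHOLD)
		zh->scans_queued++;
}
```

In this sketch every newly marked chunk at or above the threshold requests
a scan. Nothing here models the scan itself or how it lowers the count
again.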


-- 
Regards
Nitesh