From: David Hildenbrand <david@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	Nitesh Narayan Lal <nitesh@redhat.com>,
	kvm list <kvm@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	lcapitulino@redhat.com, pagupta@redhat.com, wei.w.wang@intel.com,
	Yang Zhang <yang.zhang.wz@gmail.com>,
	Rik van Riel <riel@surriel.com>,
	dodgen@google.com, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	dhildenb@redhat.com, Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: Thoughts on simple scanner approach for free page hinting
Date: Mon, 8 Apr 2019 23:20:52 +0200
Message-ID: <9da317cb-38ee-9b02-2549-65d8b45d5354@redhat.com>
In-Reply-To: <efe01b95-33d4-71ce-2a48-ec43f0846d68@redhat.com>

On 08.04.19 22:51, David Hildenbrand wrote:
> On 08.04.19 22:10, Alexander Duyck wrote:
>> On Mon, Apr 8, 2019 at 11:40 AM David Hildenbrand <david@redhat.com> wrote:
>>>
>>>>>>
>>>>>> In addition we will need some way to identify which pages have been
>>>>>> hinted on and which have not. The way I believe easiest to do this
>>>>>> would be to overload the PageType value so that we could essentially
>>>>>> have two values for "Buddy" pages. We would have our standard "Buddy"
>>>>>> pages, and "Buddy" pages that also have the "Offline" value set in the
>>>>>> PageType field. Tracking the Online vs Offline pages this way would
>>>>>> actually allow us to do this with almost no overhead as the mapcount
>>>>>> value is already being reset to clear the "Buddy" flag so adding a
>>>>>> "Offline" flag to this clearing should come at no additional cost.
>>>>>
>>>>> Just noting here that this will require modifications to kdump
>>>>> (makedumpfile to be precise and the vmcore information exposed from the
>>>>> kernel), as kdump only checks for the actual mapcount value to
>>>>> detect buddy and offline pages (to exclude them from dumps), they are
>>>>> not treated as flags.
>>>>>
>>>>> For now, any mapcount values are really only separate values, meaning
>>>>> the individual bits are not of interest the way flags would be. Reusing
>>>>> other flags would make our life a lot easier. E.g. PG_young or so. But
>>>>> clearing of these is then the problematic part.
>>>>>
>>>>> Of course we could use in the kernel two values, Buddy and BuddyOffline.
>>>>> But then we have to check for two different values whenever we want to
>>>>> identify a buddy page in the kernel.
>>>>
>>>> Actually this may not be working the way you think it is working.
>>>
>>> Trust me, I know how it works. That's why I was giving you the notice.
>>>
>>> Read the first paragraph again and ignore the others. I am only
>>> concerned about makedumpfile that has to be changed.
>>>
>>> PAGE_OFFLINE_MAPCOUNT_VALUE
>>> PAGE_BUDDY_MAPCOUNT_VALUE
>>>
>>> Once you find out how these values are used, you should understand what
>>> has to be changed and where.
>>
>> Ugh. Is there an official repo I am supposed to refer to for makedumpfile?
>>
>> As far as the changes needed I don't think this would necessitate
>> additional exports. We could probably just get away with having
>> makedumpfile generate a new value by simply doing an "&" of the two
>> values to determine what an offline buddy would be. If need be I can
>> submit a patch for that. I find it kind of annoying that the kernel is
>> handling identifying these bits one way, and makedumpfile is doing it
>> another way. It should have been set up to handle this all the same
>> way.
>>
>>>
>>>>>>
>>>>>> Lastly we would need to create a specialized function for allocating
>>>>>> the non-"Offline" pages, and to tweak __free_one_page to tail enqueue
>>>>>> "Offline" pages. I'm thinking the alloc function would look
>>>>>> something like __rmqueue_smallest but without the "expand" and needing
>>>>>> to modify the !page check to also include a check to verify the page
>>>>>> is not "Offline". As far as the changes to __free_one_page it would be
>>>>>> a 2 line change to test for the PageType being offline, and if it is
>>>>>> to call add_to_free_area_tail instead of add_to_free_area.
>>>>>
>>>>> As already mentioned, there might be scenarios where the additional
>>>>> hinting thread might consume too many CPU cycles, especially if there is
>>>>> little guest activity and you mostly spend time scanning a handful of
>>>>> free pages and reporting them. I wonder if we can somehow limit the
>>>>> number of wakeups/scans for a given period to mitigate this issue.
>>>>
>>>> That is why I was talking about breaking nr_free into nr_freed and
>>>> nr_bound. By doing that I can record the nr_free value to a
>>>> virtio-balloon specific location at the start of any walk and should
>>>> know exactly how many pages were freed between that call and the next
>>>> one. By ordering things such that we place the "Offline" pages on the
>>>> tail of the list it should make the search quite fast since we would
>>>> just be always allocating off of the head of the queue until we have
>>>> hinted everything in the queue. So when we hit the last call to alloc
>>>> the non-"Offline" pages and shut down our thread we can use the
>>>> nr_freed value that we recorded to know exactly how many pages have
>>>> been added that haven't been hinted.
>>>>
>>>>> One main issue I see with your approach is that we need quite a lot of
>>>>> core memory management changes. This is a problem. I wonder if we can
>>>>> factor out most parts into callbacks.
>>>>
>>>> I think that is something we can't get away from. However if we make
>>>> this generic enough there would likely be others beyond just the
>>>> virtualization drivers that could make use of the infrastructure. For
>>>> example being able to track the rate at which the free areas are
>>>> cycling in and out pages seems like something that would be useful
>>>> outside of just the virtualization areas.
>>>
>>> Might be, but might be the other extreme, people not wanting such
>>> special cases in core mm. I assume the latter until I see a very clear
>>> design where such stuff has been properly factored out.
>>
>> The only real pain point I am seeing right now is the assumptions
>> makedumpfile is currently making about how mapcount is being used to
>> indicate pagetype. If we patch it to fix it most of the other bits are
>> minor.
> 
> I'll be curious how splitting etc. will be handled. Especially if you
> want to set Offline for all affected sub pages.
> 

Answering that myself, I guess you are planning to change the buddy to
basically copy the offline value to sub-pages when splitting, also
attaching them to the tail of the list instead of the head.
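
A minimal user-space sketch of that guess, not kernel code. Everything
prefixed toy_ is hypothetical and made up for illustration. The constants
are assumptions modelled on the page_type encoding (a type is set by
clearing its bit), and toy_expand() only loosely mirrors what the buddy
allocator's expand() does when it splits a block. The caller is assumed to
pass the block while its type bits are still set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; a page type is "set" by clearing its bit in page_type. */
#define PAGE_TYPE_BASE 0xf0000000u
#define PG_buddy       0x00000080u
#define PG_offline     0x00000100u
#define TOY_NR_ORDERS  3

struct toy_page {
	uint32_t page_type;            /* 0xffffffff means "no type" */
	unsigned int order;
	struct toy_page *prev, *next;
};

/* One free list per order, using a circular sentinel node. */
struct toy_free_area {
	struct toy_page head;
};

static int toy_page_has_type(const struct toy_page *p, uint32_t flag)
{
	return (p->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE;
}

static void toy_area_init(struct toy_free_area *a)
{
	a->head.prev = a->head.next = &a->head;
}

/* Insert p between two adjacent list nodes. */
static void toy_link(struct toy_page *p, struct toy_page *prev,
		     struct toy_page *next)
{
	p->prev = prev;
	p->next = next;
	prev->next = p;
	next->prev = p;
}

static void toy_add_to_free_area(struct toy_page *p, struct toy_free_area *a)
{
	toy_link(p, &a->head, a->head.next);   /* head: allocated first */
}

static void toy_add_to_free_area_tail(struct toy_page *p,
				      struct toy_free_area *a)
{
	toy_link(p, a->head.prev, &a->head);   /* tail: allocated last */
}

/*
 * Split a block of order `high` down to order `low`. Each upper half that
 * is split off goes back on a free list. If the block being split was
 * "Offline" (already hinted), the halves inherit that and go to the tail.
 */
static void toy_expand(struct toy_page *page, unsigned int low,
		       unsigned int high, struct toy_free_area *areas)
{
	int offline = toy_page_has_type(page, PG_offline);
	size_t size = (size_t)1 << high;

	while (high > low) {
		high--;
		size >>= 1;
		struct toy_page *half = &page[size];

		half->page_type = 0xffffffffu & ~PG_buddy;
		half->order = high;
		if (offline) {
			half->page_type &= ~PG_offline;
			toy_add_to_free_area_tail(half, &areas[high]);
		} else {
			toy_add_to_free_area(half, &areas[high]);
		}
	}
}

/* Returns 1 if the hinted halves ended up behind the unhinted page. */
static int toy_split_demo(void)
{
	static struct toy_page block[4];   /* one order-2 block */
	static struct toy_page other;      /* unrelated, not yet hinted */
	struct toy_free_area areas[TOY_NR_ORDERS];

	for (unsigned int i = 0; i < TOY_NR_ORDERS; i++)
		toy_area_init(&areas[i]);

	other.page_type = ~PG_buddy;
	toy_add_to_free_area(&other, &areas[0]);

	block[0].page_type = ~PG_buddy & ~PG_offline;
	toy_expand(&block[0], 0, 2, areas);

	return areas[0].head.next == &other &&
	       areas[0].head.prev == &block[1] &&
	       toy_page_has_type(&block[1], PG_buddy) &&
	       toy_page_has_type(&block[1], PG_offline) &&
	       areas[1].head.next == &block[2];
}
```

Calling toy_split_demo() from a main() should return 1 under these
assumptions: the unhinted page stays at the head of the order-0 list, so
an allocator pulling from the head would reach it before any hinted page.
Under this encoding a page that is both Buddy and Offline carries a single
page_type value, which is the value a dump tool comparing exact values
would have to learn about.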

-- 

Thanks,

David / dhildenb
