Re: + mm-introduce-reported-pages.patch added to -mm tree

From: David Hildenbrand <david@redhat.com>
To: Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Michal Hocko <mhocko@kernel.org>
Cc: akpm@linux-foundation.org, aarcange@redhat.com,
	dan.j.williams@intel.com, dave.hansen@intel.com,
	konrad.wilk@oracle.com, lcapitulino@redhat.com,
	mgorman@techsingularity.net, mm-commits@vger.kernel.org,
	mst@redhat.com, osalvador@suse.de, pagupta@redhat.com,
	pbonzini@redhat.com, riel@surriel.com, vbabka@suse.cz,
	wei.w.wang@intel.com, willy@infradead.org,
	yang.zhang.wz@gmail.com, linux-mm@kvack.org
Subject: Re: + mm-introduce-reported-pages.patch added to -mm tree
Date: Tue, 12 Nov 2019 14:04:07 +0100	[thread overview]
Message-ID: <8a407188-5dd2-648b-fc26-f03a826bfee3@redhat.com> (raw)
In-Reply-To: <1d881e86ed58511b20883fd0031623fe6cade480.camel@linux.intel.com>

>> Yes, they are protoypes, RFCs. You did the right thing to report the 
>> issues so Nitesh can look into them.
> 
> I feel like you are arguing this both ways. On one side you are saying
> that these are alternatives and need to be evaluated. Then on the other
> side you say they are RFCs and it isn't fair to hold the outcome of a
> performance evaluation against them.

I guess my point is to identify if there are fundamental performance
issues that cannot be solved easily later. If you have a prototype and
you perform some minor changes to solve them, then I don't consider it a
fundamental problem. It's still a prototype.

> 
> Maybe instead of directly comparing the two approaches we should just look
> at defining what requirements need to be met for either approach to be
> considered acceptable. Then neither of us necessarily needs to be
> comparing things directly and instead we are marching toward a set of
> requirements to get to the solution that will work best overall.

Makes perfect sense to me.

> 
>>>> You: Please take my patch set, it is better than the alternatives
>>>> because of X, for X in {RFC quality, sparse zones, locking internals,
>>>> current performance differences}
>>>
>>> I should have replied to Michal's original question and simply stated that
>>> Mel had not replied to the patches in the last month and a half. I half
>>> suspect that is the reason for Andrew applying it. It put some pressure on
>>> others to provide review feedback, which if nothing else I am grateful
>>> for.
>>>
>>> You had inserted the need to compare it against Nitesh's patch set. Which
>>> based on Nitesh's email is likely going to be a little while since he
>>> cannot give me an ETA.
>>
>> So I want us (you, me, Michal, Mel, Dave, ...) to discuss the direction 
>> we want to go. I'd love to do this on a design level, instead of having 
>> to wait for any patch set. But I guess this is harder to do? And 
>> especially as you keep mentioning the performance difference, I think we 
>> should evaluate if this is an unsolvable problem or just an issue in the 
>> current prototype?
>>
>> I mean people could have a look at Niteshs older series to see how it 
>> fundamentally differs to your approach (external tracking). Nitesh might 
>> have fixed some things in the mean time, and is replacing the "fake 
>> allocation" by page isolation. But I mean, the general approach should 
>> be obvious and sufficient for people to make a decision.
> 
> I think the problem is what is being asked for versus how we are going
> about it. Asking for performance comparisons isn't going to really lead to
> a design discussion.

I tend to agree. It's just me wondering how good of a performance we can
achieve with external tracking :)

> 
> One thing that might be interesting, at least to me, might be to just
> start a new thread to discuss the options/approaches. I know I haven't
> really heard much about the page isolation approach. I would be interested
> in hearing how you guys are planning to go about implementing that.

Yes, we should definitely do that. I was only skimming over the other
mail exchange (won't really be working this week), but I think you
identified some improvements yourself for your approach.

[...]

>>> fact is it is still invasive, just to different parts of the mm subsystem.
>>
>> I'd love to see how it uses the page isolation framework, and only has a 
>> single hook to queue pages. I don't like the way pages are pulled out of 
>> the buddy in Niteshs approach currently. What you have is cleaner.
> 
> I don't see how you could use the page isolation framework to pull out
> free pages. Is there a thread somewhere on the topic that I missed?

It's basically only isolating pages while reporting them, and not
pulling them out of the buddy (IOW, you move the pages to the isolate
queues where nobody is allowed to touch them, and setting the
migratetype properly). This e.g., makes other user of page isolation
(e.g., memory offlining, alloc_contig_range()) play nicely with these
isolated pages. "somebody else just isolated them, please try again."

start_isolate_page_range()/undo_isolate_page_range()/test_pages_isolated()
along with a lockless check if the page is free.

I think it should be something like this (ignoring different
migratetypes and such for now)

1. Test lockless if page is free: Not free? Done.
2. start_isolate_page_range(): Busy? Rare race (with other isolate users
or with an allocation). Done.
3. test_pages_isolated()
3a. no? Rare race, page not free anymore. undo_isolate_page_range()
3b. yes? Report, then undo_isolate_page_range()

If we would run into performance issues with the current page isolation
implementation (esp. locking), I think there are some nice
cleanups/reworks possible of which all current users could benefit
(especially accross pageblocks).

> 
>>> I would argue that one of my concerns about the hotplug and sparse
>>> handling is that by skipping those for now is essentially hiding what is
>>> likely to be some invasive code, likely not too different from what I had
>>> to deal with with compaction. At this point he adds more data to the zone
>>> struct than my changes, and I suspect as he progresses that may increase
>>> further. >
>>> I do not think it is fair to hold up review and acceptance of this patch
>>> set for performance comparisons with a patch set with no definite ETA.
>>
>> Michal asked "Is there really a consensus". A consensus that we want 
>> something like this, not that we want Nitesh's approach. It's just an 
>> alternative worth discussing.
>>
>> And if you are reworking your patch set now with Mel, we might get 
>> another alternative that everybody is pleased with. Nobody is against 
>> reviewing your series - that's perfect, it's against picking it up and 
>> sending it upstream. That's my concern an Michals concern if I am not wrong.
> 
> I agree it is not necessarily ready for upstream yet. Thus why I am
> working on a v14. My past experience has been that anything accepted at
> this state is going to spend at least a couple months in the mm tree
> before it is pushed. However I don't see the issue with it spending some
> time in the mm tree and linux-next to get more eyes on it to identify any
> potential issues or additional use cases. If anything I welcome the
> additional debate, as it allows for additional opportunities for
> improvement.

Indeed.

-- 

Thanks,

David / dhildenb