Subject: Re: + mm-introduce-reported-pages.patch added to -mm tree
From: Nitesh Narayan Lal
To: Alexander Duyck, David Hildenbrand, Michal Hocko
Cc: akpm@linux-foundation.org, aarcange@redhat.com, dan.j.williams@intel.com, dave.hansen@intel.com, konrad.wilk@oracle.com, lcapitulino@redhat.com, mgorman@techsingularity.net, mm-commits@vger.kernel.org, mst@redhat.com, osalvador@suse.de, pagupta@redhat.com, pbonzini@redhat.com, riel@surriel.com, vbabka@suse.cz, wei.w.wang@intel.com, willy@infradead.org, yang.zhang.wz@gmail.com, linux-mm@kvack.org
References: <20191106121605.GH8314@dhcp22.suse.cz> <20191106165416.GO8314@dhcp22.suse.cz>
Organization: Red Hat Inc.
Date: Thu, 7 Nov 2019 14:37:01 -0500

On 11/7/19 1:02 PM, Alexander Duyck wrote:
> On Thu, 2019-11-07 at 00:33 +0100, David Hildenbrand wrote:
>> On 06.11.19 18:48, Alexander Duyck wrote:
>>> On Wed, 2019-11-06 at 17:54 +0100, Michal Hocko wrote:
>>>> On Wed 06-11-19 08:35:43, Alexander Duyck wrote:
>>>>> On Wed, 2019-11-06 at 15:09 +0100, David Hildenbrand wrote:
>>>>>>> On 06.11.2019 13:16, Michal Hocko wrote:
>>>>>>>
>>>>>>> I didn't have time to read through newer versions of this patch series, but I remember there were concerns about this functionality being pulled into the page allocator previously, both by me and Mel [1][2]. Have those been addressed? I do not see an ack from Mel or any other MM people. Is there really a consensus that we want something like that living in the allocator?
>>>>>> I don't think there is. The discussion is still ongoing (although quiet; Nitesh is working on a new version AFAIK). I think we should not rush this.
>>>>> How much time is needed to get a review? I waited 2 weeks since posting v12 and the only comments I got on the code were from Andrew. Most of this hasn't changed much since v10, and that was posted back in mid September. I have been down to making small tweaks here and there and haven't had any real critiques on the approach since Mel had the comments about conflicts with compaction, which I addressed by allowing compaction to punt the reporter out so that it could split and splice the lists as it walked through them.
>>>> Well, people are busy and the MM community is not a large one. I cannot really help you much other than keep poking those people and give reasonable arguments so they decide to ack your patch.
>>> I get that. But v10 was posted in mid September. Back then we had a discussion about addressing what Mel had mentioned, and I had mentioned then that I had addressed it by allowing compaction to essentially reset the reporter to get it out of the list so compaction could do this split and splice tumbling logic.
>>>
>>>> I definitely do not intend to nack this work, I just have maintainability concerns, and considering there is an alternative approach that does not require touching page allocator internals and which we need to compare against, I do not really think there is any need to push something in right away. Or is there any pressing reason to have this merged right now?
>>> The alternative approach doesn't touch the page allocator, however it still has essentially the same changes to __free_one_page. I suspect the
>> Nitesh is working on Michal's suggestion to use page isolation instead AFAIK - which avoids this.
> Okay.
> However it makes it much harder to discuss when we are comparing against code that isn't public. If the design is being redone, do we have any ETA for when we will have something to actually compare to?

If it were just the design change, giving a definite ETA would have been possible. However, I also have to fix the performance with (MAX_ORDER - 2). Unlike you, I need some time to do that.
If I just post the code without fixing the performance, there will again be an unnecessary discussion about the same thing, which doesn't make any sense.

>
>>> performance issue seen is mostly due to the fact that because it doesn't touch the page allocator it is taking the zone lock and probing the page for each set bit to see if the page is still free. As such the performance regression seen gets worse the lower the order used for reporting.
>>>
>>> Also I suspect Nitesh's patches are also in need of further review. I have provided feedback, however my focus ended up being more on the kernel panics and 30% performance regression rather than debating architecture.
>> Please don't take this personally, but I really dislike you talking about Nitesh's RFCs (!) and pushing for your approach (although it was you that was late to the party!) in that way. If there are problems then please collaborate and fix instead of using the same wrong arguments over and over again.
> Since Nitesh is in the middle of doing a full rewrite anyway I don't have much to compare against except for the previous set, which still needs fixes. That is why I mentioned in the cover letter of the last patch set that I would prefer not to discuss it since I have no visibility into the patch set he is now working on.

Fair point.

>
>> a) hotplug/sparse zones: I explained a couple of times why we can ignore that. There was never a reply from you, yet you keep coming up with that. I don't enjoy talking to a wall.
> This gets to the heart of how Nitesh's patch set works. It is assuming that every zone is linear, that there will be no overlap between zones, and that the zones don't really change. These are key architectural assumptions that should really be discussed instead of simply dismissed.

They are not at all dismissed, they are just kept as a future action item.

>
> I guess part of the difference between us is that I am looking for something that is production ready and not a proof of concept. It sounds like you would prefer this work stays in a proof-of-concept stage for some time longer.

In my opinion, it is more about how many use cases we want to target initially. With your patch set, I agree we can cover more use cases where the solution will fit in. However, my series might not be suitable for use cases where we have memory hotplug or memory restriction. (This will be the case after I fix the issues in the series.)

>
>> b) Locking optimizations: Come on, these are premature optimizations and nothing that should dictate your design. *nobody* but you cares about that in an initial approach we get upstream. We can always optimize that.
> My concern isn't so much the locking as the fact that it is the hunt-and-peck approach through a bitmap that will become increasingly more stale as you are processing the data. Every bit you have to test for requires taking a zone lock and then probing to see if the page is still free and the right size. My concern is how much time is going to be spent with the zone lock held while other CPUs are waiting on access.

This can be prevented (at least to an extent) by checking whether the page is still in the buddy before acquiring the lock, as I have suggested previously.
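To make that concrete, here is a rough sketch of the kind of check I have in mind (illustrative only, not code from either series; try_report_page() and the min_order parameter are names made up for this example):

#include <linux/mm.h>
#include <linux/mmzone.h>
#include <linux/page-flags.h>
#include <linux/spinlock.h>

/*
 * Sketch of the "peek before locking" idea: the reporting thread has
 * already turned a set bit in its bitmap into a pfn. A cheap, unlocked
 * check filters out bits that have clearly gone stale, so zone->lock is
 * only taken for pages that still look like free buddy pages, and the
 * decision is re-validated under the lock.
 */
static bool try_report_page(struct zone *zone, unsigned long pfn,
			    unsigned int min_order)
{
	struct page *page = pfn_to_page(pfn);
	unsigned long flags;
	bool reported = false;

	/* Unlocked peek: racy by design, only used to skip stale bits. */
	if (!PageBuddy(page) || page_private(page) < min_order)
		return false;

	spin_lock_irqsave(&zone->lock, flags);
	/* Re-check under zone->lock; the page may have been allocated meanwhile. */
	if (PageBuddy(page) && page_private(page) >= min_order) {
		/* ... take the page off the free list and report it ... */
		reported = true;
	}
	spin_unlock_irqrestore(&zone->lock, flags);

	return reported;
}

The unlocked PageBuddy()/order peek does not make the result reliable on its own; it only avoids taking zone->lock for bits that are obviously stale, and the real decision is still made with the lock held.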
>
>> c) Kernel panics: Come on, we are talking about complicated RFCs here with moving design decisions. We want to discuss *design* and *architecture* here, not *implementation details*.
> Then why ask me to compare performance against it? You were the one pushing for me to test it, not me. If you and Nitesh knew the design wasn't complete enough to run it, why ask me to test it?
>
> Many of the kernel panics for the patch sets in the past have been related to fundamental architectural issues. For example ignoring things like NUMA, mangling the free_list by accessing it with the wrong locks held, etc.

Obviously we didn't know that earlier; whatever tests I tried, I didn't see any issues with them. Again, I am trying to learn from my mistakes and appreciate you helping me out with that.

>
>> d) Performance: We want to see a design that fits into the whole architecture cleanly, is maintainable, and provides a benefit. Of course, performance is relevant, but it certainly should not dictate our design of a *virtualization specific optimization feature*. Performance is not everything, otherwise please feel free to rewrite the kernel in ASM and claim it is better because it is faster.
> I agree performance is not everything. But when a system grinds down to 60% of what it was originally I find that significant.

60%? In one of your previous emails you suggested that the drop was 30%.

>
>> Again, I do value your review and feedback, but I absolutely do not enjoy the way you are trying to push your series here, sorry.
> Well I am a bit frustrated as I have had to provide a significant amount of feedback on Nitesh's patches, and in spite of that I feel like I am getting nothing in return.

I am not sure I understood the meaning here. May I know what you were expecting? I do try to review your series and share whatever I can.

> I have pointed out the various issues and opportunities to address them. At this point there are sections of his code that are directly copied from mine [1].

I don't think there is any problem with learning from you or your code. Is there? As far as giving proper credit is concerned, that was my mistake, and I intend to correct it as I have already mentioned.

> I have done everything I can to help the patches along but it seems like they aren't getting out of RFC or proof-of-concept state any time soon.

One reason for that is definitely the issues you pointed out. But also, since I started working on this project, I kept getting different design suggestions, which in my opinion is excellent. However, adopting them and making those changes may be easy for you, but I have to take my time to properly understand them before implementing them.

> So with that being the case why not consider his patch set as something that could end up being a follow-on/refactor instead of an alternative to mine?
>
I have already mentioned that I would like to see the solution which is better and has a consensus (it doesn't matter where it comes from).

>> Yes, if we end up finding out that there is real value in your approach, nothing speaks against considering it.
>> But please don't try to hurry and push your series in that way. Please give everybody time to evaluate.
> I would love to argue this patch set on the merits. However, I really don't feel like I am getting a fair comparison here, at least from you. Every other reply on the thread seems to be from you trying to reinforce any criticism and taking the opportunity to mention that there is another solution out there. It is fine to fight for your own idea, but at least let me reply to the criticisms of my own patchset before you pile on. I would really prefer to discuss Nitesh's patches on a posting of his patch set rather than here.
>
> [1]: https://lore.kernel.org/lkml/101649ae-58d4-76ee-91f3-42ac1c145c46@redhat.com/
>
-- 
Thanks
Nitesh