Subject: Re: + mm-introduce-reported-pages.patch added to -mm tree
From: Nitesh Narayan Lal
Organization: Red Hat Inc
To: Michal Hocko, akpm@linux-foundation.org
Cc: aarcange@redhat.com, alexander.h.duyck@linux.intel.com,
 dan.j.williams@intel.com, dave.hansen@intel.com, david@redhat.com,
 konrad.wilk@oracle.com, lcapitulino@redhat.com,
 mgorman@techsingularity.net, mm-commits@vger.kernel.org, mst@redhat.com,
 osalvador@suse.de, pagupta@redhat.com, pbonzini@redhat.com,
 riel@surriel.com, vbabka@suse.cz, wei.w.wang@intel.com,
 willy@infradead.org, yang.zhang.wz@gmail.com, linux-mm@kvack.org
Date: Mon, 11 Nov 2019 13:52:11 -0500
In-Reply-To: <20191106121605.GH8314@dhcp22.suse.cz>
References: <20191106000547.juQRi83gi%akpm@linux-foundation.org>
 <20191106121605.GH8314@dhcp22.suse.cz>

On 11/6/19 7:16 AM, Michal Hocko wrote:
> I didn't have time to read through newer versions of this patch series
> but I remember
> there were concerns about this functionality being pulled
> into the page allocator previously both by me and Mel [1][2]. Have those been
> addressed? I do not see an ack from Mel or any other MM people. Is there
> really a consensus that we want something like that living in the
> allocator?
>
> There has also been a different approach discussed and from [3]
> (referenced by the cover letter) I can only see
>
> : Then Nitesh's solution had changed to the bitmap approach[7]. However it
> : has been pointed out that this solution doesn't deal with sparse memory,
> : hotplug, and various other issues.
>
> which looks more like something to be done than a fundamental
> roadblocks.
>
> [1] http://lkml.kernel.org/r/20190912163525.GV2739@techsingularity.net
> [2] http://lkml.kernel.org/r/20190912091925.GM4023@dhcp22.suse.cz
> [3] http://lkml.kernel.org/r/29f43d5796feed0dec8e8bb98b187d9dac03b900.camel@linux.intel.com
> [...]

Hi,

I performed some experiments to find the root cause of the performance
degradation Alexander reported with my v12 patch-set. [1]

I will try to give a brief background of the previous discussion under v12
(Alexander can correct me if I am missing something).
Alexander pointed out two issues with my v12 posting [2]
(excluding the sparse zone and memory hotplug/hotremove support):

- A crash, which was caused because I was not using spin_lock_irqsave()
  (the fix suggestion came from Alexander).

- Performance degradation with Alexander's suggested setup, where we are using
  modified will-it-scale/page_fault with THP, CONFIG_SLAB_FREELIST_RANDOM &
  CONFIG_SHUFFLE_PAGE_ALLOCATOR. When I was using (MAX_ORDER - 2) as the
  PAGE_REPORTING_MIN_ORDER, I also observed significant performance degradation
  (around 20% in the number of threads launched on the 16th vCPU).
  However, on switching the PAGE_REPORTING_MIN_ORDER to (MAX_ORDER - 1), I was
  able to get performance similar to what Alexander is reporting.

PAGE_REPORTING_MIN_ORDER is the minimum order of a page to be captured in the
bitmap and reported to the hypervisor.

For the discussion where we are comparing the two series, the performance
aspect is more relevant and important.
It turns out that with the current implementation the number of VM exits with
PAGE_REPORTING_MIN_ORDER set to pageblock_order or (MAX_ORDER - 2) is
significantly larger than with (MAX_ORDER - 1).

One of the reasons could be that the lower-order pages are not getting
sufficient time to merge with each other; as a result, they are somehow getting
reported with 2 separate reporting requests, hence generating more VM exits.
Whereas with (MAX_ORDER - 1) we don't have that kind of situation, as I never
try to report any page which has order < (MAX_ORDER - 1).

To fix this, I might have to further limit the reporting, which could allow the
lower-order pages to further merge and hence reduce the VM exits. I will try to
do some experiments to see if I can fix this. In any case, if anyone has a
suggestion I would be more than happy to look in that direction.

Following are the numbers I gathered on a 30GB single NUMA, 16 vCPU guest
affined to a single host-NUMA:

On 16th vCPU:
With PAGE_REPORTING_MIN_ORDER as (MAX_ORDER - 1):
% Dip on the number of Processes = 1.3 %
% Dip on the number of  Threads  = 5.7 %

With PAGE_REPORTING_MIN_ORDER as (pageblock_order):
% Dip on the number of Processes = 5 %
% Dip on the number of  Threads  = 20 %

Michal's suggestion:
I was able to get the prototype which could use page-isolation API:
start_isolate_page_range()/undo_isolate_page_range() to work.
But the issue mentioned above was also evident with it.
Hence, I think that before deciding whether I want to use
__isolate_free_page(), which isolates pages from the buddy, or
start/undo_isolate_page_range(), which just marks the page as MIGRATE_ISOLATE,
it is important for me to resolve the above-mentioned issue.

Previous discussions:
More about how we ended up with these two approaches can be found at [3] &
[4], explained by Alexander & David.

[1] https://lore.kernel.org/lkml/20190812131235.27244-1-nitesh@redhat.com/
[2] https://lkml.org/lkml/2019/10/2/425
[3] https://lkml.org/lkml/2019/10/23/1166
[4] https://lkml.org/lkml/2019/9/12/48

-- 
Thanks
Nitesh
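
A toy model of the merging hypothesis discussed above, as a sketch only: it
is plain user-space C, not code from either patch series. It assumes
MAX_ORDER is 11 and models a single (MAX_ORDER - 1) block made of two
(MAX_ORDER - 2) buddies. The names toy_block, toy_report_pass() and
toy_run() are hypothetical, and one "request" stands in for one VM exit.
A reporting pass runs after the first buddy is freed and again after the
second buddy is freed and the two have merged.

```c
#include <assert.h>

#define MAX_ORDER 11	/* assumption: common x86-64 default at the time */

/* Hypothetical toy state: one (MAX_ORDER - 1) block made of two buddies. */
struct toy_block {
	int half_free[2];	/* 1 if that (MAX_ORDER - 2) half is free */
	int half_reported[2];	/* 1 if that half was already reported */
};

/* One reporting pass; returns the number of requests (modelled VM exits). */
int toy_report_pass(struct toy_block *b, int min_order)
{
	int requests = 0;
	int i;

	if (b->half_free[0] && b->half_free[1]) {
		/* Buddies merged into one free page of order MAX_ORDER - 1. */
		if (!b->half_reported[0] || !b->half_reported[1]) {
			b->half_reported[0] = b->half_reported[1] = 1;
			requests++;
		}
		return requests;
	}

	/* Not merged: a free half is a page of order MAX_ORDER - 2. */
	if (min_order > MAX_ORDER - 2)
		return 0;
	for (i = 0; i < 2; i++) {
		if (b->half_free[i] && !b->half_reported[i]) {
			b->half_reported[i] = 1;
			requests++;
		}
	}
	return requests;
}

/* Free the first buddy, run a pass, free the second buddy, run a pass. */
int toy_run(int min_order)
{
	struct toy_block b = { {0, 0}, {0, 0} };
	int requests = 0;

	/* The model only covers thresholds up to the merged block's order. */
	assert(min_order <= MAX_ORDER - 1);
	b.half_free[0] = 1;
	requests += toy_report_pass(&b, min_order);
	b.half_free[1] = 1;
	requests += toy_report_pass(&b, min_order);
	return requests;
}
```

In this model toy_run(MAX_ORDER - 2) returns 2, because the first buddy is
reported before its partner is freed, while toy_run(MAX_ORDER - 1) returns 1,
because nothing is reported until the merged block exists. It says nothing
about how often this interleaving happens in the real workload.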