From: David Rientjes <rientjes@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>, ying.huang@intel.com,
    Andrea Arcangeli <aarcange@redhat.com>, s.priebe@profihost.ag,
    mgorman@techsingularity.net,
    Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
    alex.williamson@redhat.com, lkp@01.org, kirill@shutemov.name,
    Andrew Morton <akpm@linux-foundation.org>, zi.yan@cs.rutgers.edu,
    Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression
Date: Mon, 3 Dec 2018 13:53:21 -0800 (PST)
Message-ID: <alpine.DEB.2.21.1812031345180.224765@chino.kir.corp.google.com>
In-Reply-To: <20181203212539.GR31738@dhcp22.suse.cz>

On Mon, 3 Dec 2018, Michal Hocko wrote:

> > I think extending functionality so thp can be allocated remotely if truly
> > desired is worthwhile
>
> This is a complete NUMA policy antipatern that we have for all other
> user memory allocations. So far you have to be explicit for your numa
> requirements. You are trying to conflate NUMA api with MADV and that is
> just conflating two orthogonal things and that is just wrong.
>

No, the page allocator change for both my patch and __GFP_COMPACT_ONLY has
nothing to do with any madvise() mode. It has to do with where thp
allocations are preferred. Yes, this is different than other memory
allocations where it doesn't cause a 13.9% access latency regression for
the lifetime of a binary for users who back their text with hugepages.
MADV_HUGEPAGE still has its purpose to try synchronous memory compaction
at fault time under all thp defrag modes other than "never". The specific
problem being reported here, and that both my patch and __GFP_COMPACT_ONLY
address, is the pointless reclaim activity that does not assist in making
compaction more successful.

> Let's put the __GFP_THISNODE issue aside. I do not remember you
> confirming that __GFP_COMPACT_ONLY patch is OK for you (sorry it might
> got lost in the emails storm from back then) but if that is the only
> agreeable solution for now then I can live with that.

The discussion between my patch and Andrea's patch seemed to only be about
whether this should be a gfp bit or not

> __GFP_NORETRY hack
> was shown to not work properly by Mel AFAIR. Again if I misremember then
> I am sorry and I can live with that.

Andrea's patch as posted in this thread sets __GFP_NORETRY for
__GFP_ONLY_COMPACT, so both my patch and his patch require it. His patch
gets this behavior for page faults by way of alloc_pages_vma(), mine gets
it from modifying GFP_TRANSHUGE.

> But conflating MADV_TRANSHUGE with
> an implicit numa placement policy and/or adding an opt-in for remote
> NUMA placing is completely backwards and a broken API which will likely
> bites us later. I sincerely hope we are not going to repeat mistakes
> from the past.

Assuming s/MADV_TRANSHUGE/MADV_HUGEPAGE/. Again, this is *not* about the
madvise(); it's specifically about the role of direct reclaim in the
allocation of a transparent hugepage at fault time regardless of any
madvise() because you can get the same behavior with defrag=always (and
the inconsistent use of __GFP_NORETRY there that is fixed by both of our
patches).
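
For readers who want to check which thp defrag mode a machine is running (the "defrag=always" case mentioned above), the following is a minimal sketch, not part of either patch. It assumes the usual sysfs convention in which the active choice is shown in square brackets, and the path constant and the helper names `active_mode` and `read_thp_defrag` are illustrative only.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the THP defrag knob on Linux; the file is absent when
# transparent hugepages are not available on the running kernel.
THP_DEFRAG_PATH = Path("/sys/kernel/mm/transparent_hugepage/defrag")


def active_mode(setting: str) -> str:
    """Return the bracketed (currently selected) token of a sysfs choice string."""
    for token in setting.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode marked in {setting!r}")


def read_thp_defrag(path: Path = THP_DEFRAG_PATH) -> Optional[str]:
    """Read the active defrag mode, or None if the knob cannot be read or parsed."""
    try:
        return active_mode(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("thp defrag mode:", read_thp_defrag())
```

The parsing is kept separate from the file read so it can be exercised on a plain string on systems without the knob.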