From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mike Rapoport Subject: Re: [PATCH] mm: introduce MADV_CLR_HUGEPAGE Date: Wed, 24 May 2017 18:13:41 +0300 Message-ID: <20170524151340.GG3063@rapoport-lnx> References: <20170522133559.GE27382@rapoport-lnx> <20170522135548.GA8514@dhcp22.suse.cz> <20170522142927.GG27382@rapoport-lnx> <20170524075043.GB3063@rapoport-lnx> <20170524103947.GC3063@rapoport-lnx> <91778b6e-cb69-bfca-51da-f8c3256e630e@suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Content-Disposition: inline In-Reply-To: <91778b6e-cb69-bfca-51da-f8c3256e630e@suse.cz> Sender: owner-linux-mm@kvack.org To: Vlastimil Babka Cc: Pavel Emelyanov , Michal Hocko , "Kirill A. Shutemov" , Andrew Morton , Arnd Bergmann , "Kirill A. Shutemov" , Andrea Arcangeli , linux-mm , lkml , Linux API List-Id: linux-api@vger.kernel.org On Wed, May 24, 2017 at 04:54:38PM +0200, Vlastimil Babka wrote: > On 05/24/2017 04:28 PM, Pavel Emelyanov wrote: > > On 05/24/2017 02:31 PM, Vlastimil Babka wrote: > >> On 05/24/2017 12:39 PM, Mike Rapoport wrote: > >>>> Hm so the prctl does: > >>>> > >>>> if (arg2) > >>>> me->mm->def_flags |= VM_NOHUGEPAGE; > >>>> else > >>>> me->mm->def_flags &= ~VM_NOHUGEPAGE; > >>>> > >>>> That's rather lazy implementation IMHO. Could we change it so the flag > >>>> is stored elsewhere in the mm, and the code that decides to (not) use > >>>> THP will check both the per-vma flag and the per-mm flag? > >>> > >>> I afraid I don't understand how that can help. > >>> What we need is an ability to temporarily disable collapse of the pages in > >>> VMAs that do not have VM_*HUGEPAGE flags set and that after we re-enable > >>> THP, the vma->vm_flags for those VMAs will remain intact. > >> > >> That's what I'm saying - instead of implementing the prctl flag via > >> mm->def_flags (which gets permanently propagated to newly created vma's > >> but e.g. doesn't affect already existing ones), it would be setting a > >> flag somewhere in mm, which khugepaged (and page faults) would check in > >> addition to the per-vma flags. > > > > I do not insist, but this would make existing paths (checking for flags) be > > 2 times slower -- from now on these would need to check two bits (vma flags > > and mm flags) which are 100% in different cache lines. > > I'd expect you already have mm struct cached during a page fault. And > THP-eligible page fault is just one per pmd, the overhead should be > practically zero. > > > What Mike is proposing is the way to fine-tune the existing vma flags. This > > would keep current paths as fast (or slow ;) ) as they are now. All the > > complexity would go to rare cases when someone needs to turn thp off for a > > while and then turn it back on. > > Yeah but it's extending user-space API for a corner case. We should do > that only when there's no other option. With madvise() I'm suggesting we rather add "completeness" to the existing API, IMHO. We do have API that sets VM_HUGEPAGE and clears VM_NOHUGEPAGE or vise versa, but we do not have an API that can clear both flags... And if we would use prctl(), we either change user visible behaviour or we still need to extend the API and use, say, arg2 to distinguish between the current behaviour and the new one. -- Sincerely yours, Mike. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org