From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751375AbdH1Jh7 (ORCPT ); Mon, 28 Aug 2017 05:37:59 -0400 Received: from merlin.infradead.org ([205.233.59.134]:49550 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750866AbdH1Jh6 (ORCPT ); Mon, 28 Aug 2017 05:37:58 -0400 Date: Mon, 28 Aug 2017 11:37:27 +0200 From: Peter Zijlstra To: "Kirill A. Shutemov" Cc: Laurent Dufour , paulmck@linux.vnet.ibm.com, akpm@linux-foundation.org, ak@linux.intel.com, mhocko@kernel.org, dave@stgolabs.net, jack@suse.cz, Matthew Wilcox , benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org, Thomas Gleixner , Ingo Molnar , hpa@zytor.com, Will Deacon , linux-kernel@vger.kernel.org, linux-mm@kvack.org, haren@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com, npiggin@gmail.com, bsingharora@gmail.com, Tim Chen , linuxppc-dev@lists.ozlabs.org, x86@kernel.org Subject: Re: [PATCH v2 14/20] mm: Provide speculative fault infrastructure Message-ID: <20170828093727.5wldedputadanssh@hirez.programming.kicks-ass.net> References: <1503007519-26777-1-git-send-email-ldufour@linux.vnet.ibm.com> <1503007519-26777-15-git-send-email-ldufour@linux.vnet.ibm.com> <20170827001823.n5wgkfq36z6snvf2@node.shutemov.name> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170827001823.n5wgkfq36z6snvf2@node.shutemov.name> User-Agent: NeoMutt/20170609 (1.8.3) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Aug 27, 2017 at 03:18:23AM +0300, Kirill A. Shutemov wrote: > On Fri, Aug 18, 2017 at 12:05:13AM +0200, Laurent Dufour wrote: > > + /* > > + * Can't call vm_ops service has we don't know what they would do > > + * with the VMA. > > + * This include huge page from hugetlbfs. > > + */ > > + if (vma->vm_ops) > > + goto unlock; > > I think we need to have a way to white-list safe ->vm_ops. Either that, or simply teach all ->fault() callbacks about speculative faults. Shouldn't be too hard, just 'work'. > > + > > + if (unlikely(!vma->anon_vma)) > > + goto unlock; > > It deserves a comment. Yes, that was very much not intended. It wrecks most of the fun. This really _should_ work for file maps too. > > + /* > > + * Do a speculative lookup of the PTE entry. > > + */ > > + local_irq_disable(); > > + pgd = pgd_offset(mm, address); > > + if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd))) > > + goto out_walk; > > + > > + p4d = p4d_alloc(mm, pgd, address); > > + if (p4d_none(*p4d) || unlikely(p4d_bad(*p4d))) > > + goto out_walk; > > + > > + pud = pud_alloc(mm, p4d, address); > > + if (pud_none(*pud) || unlikely(pud_bad(*pud))) > > + goto out_walk; > > + > > + pmd = pmd_offset(pud, address); > > + if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd))) > > + goto out_walk; > > + > > + /* > > + * The above does not allocate/instantiate page-tables because doing so > > + * would lead to the possibility of instantiating page-tables after > > + * free_pgtables() -- and consequently leaking them. > > + * > > + * The result is that we take at least one !speculative fault per PMD > > + * in order to instantiate it. > > + */ > > > Doing all this job and just give up because we cannot allocate page tables > looks very wasteful to me. > > Have you considered to look how we can hand over from speculative to > non-speculative path without starting from scratch (when possible)? So we _can_ in fact allocate and install page-tables, but we have to be very careful about it. The interesting case is where we race with free_pgtables() and install a page that was just taken out. But since we already have the VMA I think we can do something like: if (p*g_none()) { p*d_t *new = p*d_alloc_one(mm, address); spin_lock(&mm->page_table_lock); if (!vma_changed_or_dead(vma,seq)) { if (p*d_none()) p*d_populate(mm, p*d, new); else p*d_free(new); new = NULL; } spin_unlock(&mm->page_table_lock); if (new) { p*d_free(new); goto out_walk; } } I just never bothered with that, figured we ought to get the basics working before trying to be clever. > > + /* Transparent huge pages are not supported. */ > > + if (unlikely(pmd_trans_huge(*pmd))) > > + goto out_walk; > > That's looks like a blocker to me. > > Is there any problem with making it supported (besides plain coding)? Not that I can remember, but I never really looked at THP, I don't think we even had that when I did the first versions.