From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 28 Dec 2020 20:35:06 -0800 (PST)
From: Hugh Dickins
X-X-Sender: hugh@eggly.anvils
To: "Kirill A. Shutemov"
Cc: Linus Torvalds, Hugh Dickins, Matthew Wilcox, "Kirill A. Shutemov",
    Will Deacon, Linux Kernel Mailing List, Linux-MM, Linux ARM,
    Catalin Marinas, Jan Kara, Minchan Kim, Andrew Morton, Vinayak Menon,
    Android Kernel Team
Subject: Re: [PATCH 1/2] mm: Allow architectures to request 'old' entries when prefaulting
In-Reply-To: <20201228221237.6nu75kgxq7ikxn2a@box>
References: <20201226224016.dxjmordcfj75xgte@box>
    <20201227234853.5mjyxcybucts3kbq@box> <20201228125352.phnj2x2ci3kwfld5@box>
    <20201228220548.57hl32mmrvvefj6q@box> <20201228221237.6nu75kgxq7ikxn2a@box>
User-Agent: Alpine 2.11 (LSU 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Mailing-List: linux-kernel@vger.kernel.org

Got it at last, sorry it's taken so long.

On Tue, 29 Dec 2020, Kirill A. Shutemov wrote:
> On Tue, Dec 29, 2020 at 01:05:48AM +0300, Kirill A. Shutemov wrote:
> > On Mon, Dec 28, 2020 at 10:47:36AM -0800, Linus Torvalds wrote:
> > > On Mon, Dec 28, 2020 at 4:53 AM Kirill A. Shutemov wrote:
> > > >
> > > > So far I only found one more pin leak and an always-true check. I don't
> > > > see how it can lead to a crash or corruption. Keep looking.

Those mods look good in themselves, but, as you expected,
made no difference to the corruption I was seeing.

> > >
> > > Well, I noticed that the nommu.c version of filemap_map_pages() needs
> > > fixing, but that's obviously not the case Hugh sees.
> > >
> > > No, I think the problem is the
> > >
> > >         pte_unmap_unlock(vmf->pte, vmf->ptl);
> > >
> > > at the end of filemap_map_pages().
> > >
> > > Why?
> > >
> > > Because we've been updating vmf->pte as we go along:
> > >
> > >         vmf->pte += xas.xa_index - last_pgoff;
> > >
> > > and I think that by the time we get to that "pte_unmap_unlock()",
> > > vmf->pte potentially points past the edge of the page directory.
> >
> > Well, if it's true we have a bigger problem: we set up a pte entry without
> > the relevant PTL.
> >
> > But I *think* we should be fine here: do_fault_around() limits start_pgoff
> > and end_pgoff to stay within the page table.

Yes, Linus's patch had made no difference, the map_pages loop is safe
in that respect.

> >
> > It made me look at the code around pte_unmap_unlock() and I think that
> > the bug is that we have to reset vmf->address and NULLify vmf->pte once
> > we are done with faultaround:
> >
> > diff --git a/mm/memory.c
>
> Ugh.. Wrong place. Need to sleep.
>
> I'll look into your idea tomorrow.
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 87671284de62..e4daab80ed81 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2987,6 +2987,8 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, unsigned long address,
>  	} while ((head = next_map_page(vmf, &xas, end_pgoff)) != NULL);
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
>  	rcu_read_unlock();
> +	vmf->address = address;
> +	vmf->pte = NULL;
>  	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
>
>  	return ret;
> --

And that made no (noticeable) difference either.
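As an aside on the in-bounds question above: the pointer arithmetic stays
within one page table only because the around-range is clamped to a single
table's worth of entries before the map_pages loop ever runs,
do_fault_around()-style. A throwaway userspace model of just that clamping
(made-up names, PTES_PER_TABLE standing in for PTRS_PER_PTE, not the kernel
code) to make the arithmetic concrete:

/*
 * Model of why "vmf->pte += xas.xa_index - last_pgoff" cannot walk past
 * the page table: the pgoff range is clamped to one table up front, so
 * the relative advances stay inside table[].
 */
#include <assert.h>
#include <stdio.h>

#define PTES_PER_TABLE 512UL		/* stand-in for PTRS_PER_PTE */

struct fake_fault {
	unsigned long *pte;		/* points into one table[] */
};

static void map_range(struct fake_fault *f, unsigned long *table,
		      unsigned long start, unsigned long end)
{
	unsigned long last_pgoff = start;

	f->pte = table + (start & (PTES_PER_TABLE - 1));
	for (unsigned long pgoff = start; pgoff <= end; pgoff++) {
		f->pte += pgoff - last_pgoff;	/* same pattern as the loop */
		last_pgoff = pgoff;
		assert(f->pte < table + PTES_PER_TABLE);	/* never past table[] */
		*f->pte = pgoff;		/* "install" an entry */
	}
}

int main(void)
{
	static unsigned long table[PTES_PER_TABLE];
	struct fake_fault f;
	unsigned long fault_pgoff = 1000;

	/* Clamp the around-range to the table containing fault_pgoff. */
	unsigned long start = fault_pgoff & ~(PTES_PER_TABLE - 1);
	unsigned long end = start + PTES_PER_TABLE - 1;

	map_range(&f, table, fault_pgoff, end);
	printf("mapped pgoff %lu..%lu without leaving the table\n",
	       fault_pgoff, end);
	return 0;
}

The assert never fires here, which is the property the quoted discussion is
relying on.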
But at last I realized it's absolutely on the right track, but missing
the couple of early returns at the head of filemap_map_pages(): add

--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3025,14 +3025,12 @@ vm_fault_t filemap_map_pages(struct vm_f
 
 	rcu_read_lock();
 	head = first_map_page(vmf, &xas, end_pgoff);
-	if (!head) {
-		rcu_read_unlock();
-		return 0;
-	}
+	if (!head)
+		goto out;
 
 	if (filemap_map_pmd(vmf, head)) {
-		rcu_read_unlock();
-		return VM_FAULT_NOPAGE;
+		ret = VM_FAULT_NOPAGE;
+		goto out;
 	}
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
@@ -3066,9 +3064,9 @@ unlock:
 			put_page(head);
 	} while ((head = next_map_page(vmf, &xas, end_pgoff)) != NULL);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
+out:
 	rcu_read_unlock();
 	vmf->address = address;
-	vmf->pte = NULL;
 	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
 
 	return ret;
--

and then the corruption is fixed. It seems miraculous that the machines
even booted with that bad vmf->address going to __do_fault(): maybe that
tells us what a good job map_pages does most of the time.

You'll see I've tried removing the "vmf->pte = NULL;" there. I did
criticize earlier that vmf->pte was being left set, but was either thinking
back to some earlier era of mm/memory.c, or else confusing it with
vmf->prealloc_pte, which is NULLed when consumed: I could not find anything
in mm/memory.c that now needs vmf->pte to be cleared, and I seem to run
fine without it (even on i386 HIGHPTE).

So, the mystery is solved; but I don't think any of these patches should be
applied. Without thinking through Linus's suggestions re do_set_pte() in
particular, I do think this map_pages interface is too ugly and has given
us lots of trouble: please take your time to go over it all again, and come
up with a cleaner patch. I've grown rather jaded, and am questioning the
value of the rework: I don't think I want to look at or test another
version for a week or so.
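For whoever reworks this next, the shape of the eventual fix is worth
keeping: stash the caller's address up front, and route every exit path,
early returns included, through a single out: label that restores it. A
simplified userspace sketch of just that pattern (made-up names and trivial
stubs, not the kernel code):

#include <stdio.h>

enum map_result { MAP_NONE, MAP_NOPAGE, MAP_DONE };

struct fault_ctx {
	unsigned long address;	/* clobbered while mapping around the fault */
};

/* Trivial stand-ins for first_map_page() / filemap_map_pmd(). */
static int find_first(struct fault_ctx *ctx)	{ return ctx->address != 0; }
static int try_map_pmd(struct fault_ctx *ctx)	{ (void)ctx; return 0; }

static enum map_result map_pages(struct fault_ctx *ctx)
{
	unsigned long saved = ctx->address;	/* like saving vmf->address */
	enum map_result ret = MAP_NONE;

	if (!find_first(ctx))
		goto out;		/* early return #1: still restores */

	if (try_map_pmd(ctx)) {
		ret = MAP_NOPAGE;
		goto out;		/* early return #2: still restores */
	}

	ctx->address += 4096;		/* pretend the mapping loop advanced it */
	ret = MAP_DONE;
out:
	ctx->address = saved;		/* single restore point for every path */
	return ret;
}

int main(void)
{
	struct fault_ctx ctx = { .address = 0x1000 };

	printf("result=%d address=%#lx\n", map_pages(&ctx), ctx.address);
	return 0;
}

Missing the restore on the early-return paths is exactly the slip that the
goto-out shape makes hard to reintroduce.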
Hugh