linux-kernel.vger.kernel.org archive mirror
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: "Thomas Hellström (VMware)" <thomas_os@shipmail.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	torvalds@linux-foundation.org,
	"Thomas Hellstrom" <thellstrom@vmware.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Will Deacon" <will.deacon@arm.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Rik van Riel" <riel@surriel.com>,
	"Minchan Kim" <minchan@kernel.org>,
	"Michal Hocko" <mhocko@suse.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Jérôme Glisse" <jglisse@redhat.com>
Subject: Re: [PATCH v3 2/7] mm: Add a walk_page_mapping() function to the pagewalk code
Date: Fri, 4 Oct 2019 15:37:32 +0300
Message-ID: <20191004123732.xpr3vroee5mhg2zt@box.shutemov.name>
In-Reply-To: <d336497b-3716-0748-d838-378902399439@shipmail.org>

On Thu, Oct 03, 2019 at 01:32:45PM +0200, Thomas Hellström (VMware) wrote:
> > > + *   If @mapping allows faulting of huge pmds and puds, it is desirable
> > > + *   that its huge_fault() handler blocks while this function is running on
> > > + *   @mapping. Otherwise a race may occur where the huge entry is split when
> > > + *   it was intended to be handled in a huge entry callback. This requires an
> > > + *   external lock, for example that @mapping->i_mmap_rwsem is held in
> > > + *   write mode in the huge_fault() handlers.
> > Em. No. We have ptl for this. It's the only lock required (plus mmap_sem
> > on read) to split PMD entry into PTE table. And it can happen not only
> > from fault path.
> > 
> > If you care about splitting compound page under you, take a pin or lock a
> > page. It will block split_huge_page().
> > 
> > Suggestion to block fault path is not viable (and it will not happen
> > magically just because of this comment).
> > 
> I was specifically thinking of this:
> 
> https://elixir.bootlin.com/linux/latest/source/mm/pagewalk.c#L103
> 
> If a huge pud is concurrently faulted in here, it will immediately get split
> without getting processed in pud_entry(). An external lock would protect
> against that, but that's perhaps a bug in the pagewalk code?  For pmds the
> situation is not the same since when pte_entry is used, all pmds will
> unconditionally get split.

I *think* it should be fixed with something like this (there's no
pud_trans_unstable() yet):

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index d48c2a986ea3..221a3b945f42 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -102,10 +102,11 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 					break;
 				continue;
 			}
+		} else {
+			split_huge_pud(walk->vma, pud, addr);
 		}
 
-		split_huge_pud(walk->vma, pud, addr);
-		if (pud_none(*pud))
+		if (pud_none(*pud) || pud_trans_unstable(*pud))
 			goto again;
 
 		if (ops->pmd_entry || ops->pte_entry)

Or, better yet, it could be converted to what we do at the pmd level.

Honestly, all the code around PUD THP is missing a lot of groundwork.
Rushing it upstream for DAX was not the right move.

> There's a similar more scary race in
> 
> https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L3931
> 
> It looks like if a concurrent thread faults in a huge pud just after the
> test for pud_none in that pmd_alloc, things might go pretty bad.

Hm? It will fail the next pmd_none() check under ptl. Do you have a
particular racing scenario in mind?
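
To illustrate the "re-check under the lock" argument above, here is a
minimal user-space sketch. It is not kernel code: struct slot,
slot_populate() and the pthread mutex are hypothetical stand-ins for an
upper-level page table entry, the allocation path and the ptl. The
assumption is that the allocation happens outside the lock and that the
entry is tested again once the lock is held, so a caller that lost the
race frees its allocation instead of overwriting what a concurrent
thread installed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model of one upper-level page table entry. */
struct slot {
	pthread_mutex_t lock;	/* plays the role of the page table lock */
	void *table;		/* NULL models an empty ("none") entry */
};

/*
 * Try to install a freshly allocated lower-level table into @s.
 * Returns true if this caller installed its table, false if the
 * allocation failed or the entry was already populated by someone else.
 */
static bool slot_populate(struct slot *s, size_t table_size)
{
	void *new_table;
	bool installed = false;

	assert(table_size > 0);

	/* Allocate before taking the lock; allocation may be slow. */
	new_table = calloc(1, table_size);
	if (!new_table)
		return false;

	pthread_mutex_lock(&s->lock);
	if (s->table == NULL) {
		/* The entry is still empty under the lock: install ours. */
		s->table = new_table;
		installed = true;
	}
	pthread_mutex_unlock(&s->lock);

	/* Lost the race: the unlocked "is it empty?" test was stale. */
	if (!installed)
		free(new_table);

	return installed;
}
```

In this model an unlocked emptiness test by the caller is only an
optimization; correctness rests on the test made with the lock held.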

-- 
 Kirill A. Shutemov
