Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Nicholas Piggin <npiggin@gmail.com>
To: Anshuman Khandual <anshuman.khandual@arm.com>, linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Christophe Leroy <christophe.leroy@c-s.fr>,
	linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org,
	Mark Rutland <mark.rutland@arm.com>
Subject: Re: [PATCH 3/3] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings
Date: Tue, 25 Jun 2019 10:20:14 +1000
Message-ID: <1561421882.9uwq6zqlvo.astroid@bobo.none> (raw)
In-Reply-To: <8668f76d-faad-4e57-2f7b-f2b8969b1026@arm.com>

Anshuman Khandual's on June 24, 2019 4:52 pm:
> 
> 
> On 06/23/2019 03:14 PM, Nicholas Piggin wrote:
>> vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
>> Whether or not a vmap is huge depends on the architecture details,
>> alignments, boot options, etc., which the caller can not be expected
>> to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.
>> 
>> This change teaches vmalloc_to_page about larger pages, and returns
>> the struct page that corresponds to the offset within the large page.
>> This makes the API agnostic to mapping implementation details.
>> 
>> [*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap:
>>     fail gracefully on unexpected huge vmap mappings")
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>  include/asm-generic/4level-fixup.h |  1 +
>>  include/asm-generic/5level-fixup.h |  1 +
>>  mm/vmalloc.c                       | 37 +++++++++++++++++++-----------
>>  3 files changed, 26 insertions(+), 13 deletions(-)
>> 
>> diff --git a/include/asm-generic/4level-fixup.h b/include/asm-generic/4level-fixup.h
>> index e3667c9a33a5..3cc65a4dd093 100644
>> --- a/include/asm-generic/4level-fixup.h
>> +++ b/include/asm-generic/4level-fixup.h
>> @@ -20,6 +20,7 @@
>>  #define pud_none(pud)			0
>>  #define pud_bad(pud)			0
>>  #define pud_present(pud)		1
>> +#define pud_large(pud)			0
>>  #define pud_ERROR(pud)			do { } while (0)
>>  #define pud_clear(pud)			pgd_clear(pud)
>>  #define pud_val(pud)			pgd_val(pud)
>> diff --git a/include/asm-generic/5level-fixup.h b/include/asm-generic/5level-fixup.h
>> index bb6cb347018c..c4377db09a4f 100644
>> --- a/include/asm-generic/5level-fixup.h
>> +++ b/include/asm-generic/5level-fixup.h
>> @@ -22,6 +22,7 @@
>>  #define p4d_none(p4d)			0
>>  #define p4d_bad(p4d)			0
>>  #define p4d_present(p4d)		1
>> +#define p4d_large(p4d)			0
>>  #define p4d_ERROR(p4d)			do { } while (0)
>>  #define p4d_clear(p4d)			pgd_clear(p4d)
>>  #define p4d_val(p4d)			pgd_val(p4d)
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 4c9e150e5ad3..4be98f700862 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -36,6 +36,7 @@
>>  #include <linux/rbtree_augmented.h>
>>  
>>  #include <linux/uaccess.h>
>> +#include <asm/pgtable.h>
>>  #include <asm/tlbflush.h>
>>  #include <asm/shmparam.h>
>>  
>> @@ -284,26 +285,36 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>>  
>>  	if (pgd_none(*pgd))
>>  		return NULL;
>> +
>>  	p4d = p4d_offset(pgd, addr);
>>  	if (p4d_none(*p4d))
>>  		return NULL;
>> -	pud = pud_offset(p4d, addr);
>> +	if (WARN_ON_ONCE(p4d_bad(*p4d)))
>> +		return NULL;
> 
> The warning here is a required addition but it needs to be moved after p4d_large()
> check. Please see the next comment below.
> 
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>> +	if (p4d_large(*p4d))
>> +		return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
>> +#endif
>>  
>> -	/*
>> -	 * Don't dereference bad PUD or PMD (below) entries. This will also
>> -	 * identify huge mappings, which we may encounter on architectures
>> -	 * that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
>> -	 * identified as vmalloc addresses by is_vmalloc_addr(), but are
>> -	 * not [unambiguously] associated with a struct page, so there is
>> -	 * no correct value to return for them.
>> -	 */
>> -	WARN_ON_ONCE(pud_bad(*pud));
>> -	if (pud_none(*pud) || pud_bad(*pud))
>> +	pud = pud_offset(p4d, addr);
>> +	if (pud_none(*pud))
>> +		return NULL;
>> +	if (WARN_ON_ONCE(pud_bad(*pud)))
>>  		return NULL;
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>> +	if (pud_large(*pud))
>> +		return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
>> +#endif
>> +
> 
> pud_bad() on arm64 returns true when the PUD does not point to a next page
> table page implying the fact that it might be a large/huge entry. I am not
> sure if the semantics holds good for other architectures too. But on arm64
> if pud_large() is true, then pud_bad() will be true as well. So pud_bad()
> check must happen after pud_large() check. So the sequence here should be
> 
> 1. pud_none()	--> Nothing is in here, return NULL
> 2. pud_large()	--> Return offset page address from the huge page mapping
> 3. pud_bad()	--> Return NULL as there is no more page table level left
> 
> Checking pud_bad() first can return NULL for a valid huge mapping.
> 
>>  	pmd = pmd_offset(pud, addr);
>> -	WARN_ON_ONCE(pmd_bad(*pmd));
>> -	if (pmd_none(*pmd) || pmd_bad(*pmd))
>> +	if (pmd_none(*pmd))
>> +		return NULL;
>> +	if (WARN_ON_ONCE(pmd_bad(*pmd)))
>>  		return NULL;
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>> +	if (pmd_large(*pmd))
>> +		return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
>> +#endif
> 
> Ditto.
> 
> I see that your previous proposal had this right which got changed in this
> manner after my comments. Sorry about it.
> 
> It was recently when I learned (correctly) that expected semantics of pxx_bad()
> is that - It does not point to the next page table page.  Hence I wonder why is
> this not renamed as pxx_table() instead to make it absolutely clear.
> 

Okay, I'll change it and resend. It worked okay on powerpc but it
looks like the usual precedent is testing for large before bad so we
will have to go with that.

Thanks,
Nick


  reply index

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-23  9:44 [PATCH 0/3] " Nicholas Piggin
2019-06-23  9:44 ` [PATCH 1/3] arm64: mm: Add p?d_large() definitions Nicholas Piggin
2019-07-01  9:27   ` Will Deacon
2019-07-01 10:03     ` Steven Price
2019-07-01 10:15       ` Will Deacon
2019-07-02  3:07         ` Nicholas Piggin
2019-07-02 10:46           ` Will Deacon
2019-06-23  9:44 ` [PATCH 2/3] powerpc/64s: Add p?d_large definitions Nicholas Piggin
2019-06-23  9:44 ` [PATCH 3/3] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings Nicholas Piggin
2019-06-24  6:52   ` Anshuman Khandual
2019-06-25  0:20     ` Nicholas Piggin [this message]
2019-06-24  5:52 ` [PATCH 0/3] " Anshuman Khandual

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1561421882.9uwq6zqlvo.astroid@bobo.none \
    --to=npiggin@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=ard.biesheuvel@linaro.org \
    --cc=christophe.leroy@c-s.fr \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mark.rutland@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git