Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Anshuman Khandual <anshuman.khandual@arm.com>
To: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Dan Williams <dan.j.williams@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Michal Hocko <mhocko@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Mark Brown <broonie@kernel.org>,
	Steven Price <Steven.Price@arm.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Masahiro Yamada <yamada.masahiro@socionext.com>,
	Kees Cook <keescook@chromium.org>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Matthew Wilcox <willy@infradead.org>,
	Sri Krishna chowdary <schowdary@nvidia.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Russell King - ARM Linux <linux@armlinux.org.uk>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Paul Mackerras <paulus@samba.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	"David S. Miller" <davem@davemloft.net>,
	Vineet Gupta <vgupta@synopsys.com>,
	James Hogan <jhogan@kernel.org>,
	Paul Burton <paul.burton@mips.com>,
	Ralf Baechle <ralf@linux-mips.org>,
	linux-snps-arc@lists.infradead.org, linux-mips@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
	linux-sh@vger.kernel.org, sparclinux@vger.kernel.org,
	x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] mm/pgtable/debug: Add test validating architecture page table helpers
Date: Mon, 9 Sep 2019 11:56:50 +0530
Message-ID: <3d5de35f-8192-1c75-50a9-03e66e3b8e5c@arm.com> (raw)
In-Reply-To: <20190906210346.5ecbff01@thinkpad>



On 09/07/2019 12:33 AM, Gerald Schaefer wrote:
> On Fri, 6 Sep 2019 11:58:59 +0530
> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> 
>> On 09/05/2019 10:36 PM, Gerald Schaefer wrote:
>>> On Thu, 5 Sep 2019 14:48:14 +0530
>>> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>   
>>>>> [...]    
>>>>>> +
>>>>>> +#if !defined(__PAGETABLE_PMD_FOLDED) && !defined(__ARCH_HAS_4LEVEL_HACK)
>>>>>> +static void pud_clear_tests(pud_t *pudp)
>>>>>> +{
>>>>>> +	memset(pudp, RANDOM_NZVALUE, sizeof(pud_t));
>>>>>> +	pud_clear(pudp);
>>>>>> +	WARN_ON(!pud_none(READ_ONCE(*pudp)));
>>>>>> +}    
>>>>>
>>>>> For pgd/p4d/pud_clear(), we only clear if the page table level is present
>>>>> and not folded. The memset() here overwrites the table type bits, so
>>>>> pud_clear() will not clear anything on s390 and the pud_none() check will
>>>>> fail.
>>>>> Would it be possible to OR a (larger) random value into the table, so that
>>>>> the lower 12 bits would be preserved?    
>>>>
>>>> So the suggestion is instead of doing memset() on entry with RANDOM_NZVALUE,
>>>> it should OR a large random value preserving lower 12 bits. Hmm, this should
>>>> still do the trick for other platforms, they just need non zero value. So on
>>>> s390, the lower 12 bits on the page table entry already has valid value while
>>>> entering this function which would make sure that pud_clear() really does
>>>> clear the entry ?  
>>>
>>> Yes, in theory the table entry on s390 would have the type set in the last
>>> 4 bits, so preserving those would be enough. If it does not conflict with
>>> others, I would still suggest preserving all 12 bits since those would contain
>>> arch-specific flags in general, just to be sure. For s390, the pte/pmd tests
>>> would also work with the memset, but for consistency I think the same logic
>>> should be used in all pxd_clear_tests.  
>>
>> Makes sense but..
>>
>> There is a small challenge with this. Modifying individual bits on a given
>> page table entry from generic code like this test case is bit tricky. That
>> is because there are not enough helpers to create entries with an absolute
>> value. This would have been easier if all the platforms provided functions
>> like __pxx() which is not the case now. Otherwise something like this should
>> have worked.
>>
>>
>> pud_t pud = READ_ONCE(*pudp);
>> pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0))
>> WRITE_ONCE(*pudp, pud);
>>
>> But __pud() will fail to build in many platforms.
> 
> Hmm, I simply used this on my system to make pud_clear_tests() work, not
> sure if it works on all archs:
> 
> pud_val(*pudp) |= RANDOM_NZVALUE;

Which compiles on arm64 but then fails on x86 because of the way pmd_val()
has been defined there. on arm64 and s390 (with many others) pmd_val() is
a macro which still got the variable that can be used as lvalue but that is
not true for some other platforms like x86.

arch/arm64/include/asm/pgtable-types.h:	#define pmd_val(x)	((x).pmd)
arch/s390/include/asm/page.h:		#define pmd_val(x)	((x).pmd)
arch/x86/include/asm/pgtable.h:		#define pmd_val(x)       native_pmd_val(x)

static inline pmdval_t native_pmd_val(pmd_t pmd)
{
        return pmd.pmd;
}

Unless I am mistaken, the return value from this function can not be used as
lvalue for future assignments.

mm/arch_pgtable_test.c: In function ‘pud_clear_tests’:
mm/arch_pgtable_test.c:156:17: error: lvalue required as left operand of assignment
  pud_val(*pudp) |= RANDOM_ORVALUE;
                 ^~
AFAICS pxx_val() were never intended to be used as lvalue and using it that way
might just happen to work on all those platforms which define them as macros.
They meant to just provide values for an entry as being determined by the platform.

In principle pxx_val() on an entry was not supposed to be modified directly from
generic code without going through (again) platform helpers for any specific state
change (write, old, dirty, special, huge etc). The current use case is a deviation
for that.

I originally went with memset() just to load up the entries with non-zero value so
that we know pxx_clear() are really doing the clearing. The same is being followed
for all pxx_same() checks.

Another way for fixing the problem would be to mark them with known attributes
like write/young/huge etc instead which for sure will create non-zero entries.
We can do that for pxx_clear() and pxx_same() tests and drop RANDOM_NZVALUE
completely. Does that sound good ?

> 
>>
>> The other alternative will be to make sure memset() happens on all other
>> bits except the lower 12 bits which will depend on endianness. If s390
>> has a fixed endianness, we can still use either of them which will hold
>> good for others as well.
>>
>> memset(pudp, RANDOM_NZVALUE, sizeof(pud_t) - 3);
>>
>> OR
>>
>> memset(pudp + 3, RANDOM_NZVALUE, sizeof(pud_t) - 3);
>>
>>>
>>> However, there is another issue on s390 which will make this only work
>>> for pud_clear_tests(), and not for the p4d/pgd_tests. The problem is that
>>> mm_alloc() will only give you a 3-level page table initially on s390.
>>> This means that pudp == p4dp == pgdp, and so the p4d/pgd_tests will
>>> both see the pud level (of course this also affects other tests).  
>>
>> Got it.
>>
>>>
>>> Not sure yet how to fix this, i.e. how to initialize/update the page table
>>> to 5 levels. We can handle 5 level page tables, and it would be good if
>>> all levels could be tested, but using mm_alloc() to establish the page
>>> tables might not work on s390. One option could be to provide an arch-hook
>>> or weak function to allocate/initialize the mm.  
>>
>> Sure, got it. Though I plan to do add some arch specific tests or init sequence
>> like the above later on but for now the idea is to get the smallest possible set
>> of test cases which builds and runs on all platforms without requiring any arch
>> specific hooks or special casing (#ifdef) to be agreed upon broadly and accepted.
>>
>> Do you think this is absolutely necessary on s390 for the very first set of test
>> cases or we can add this later on as an improvement ?
> 
> It can be added later, no problem. I did not expect this to work flawlessly
> on s390 right from the start anyway, with all our peculiarities, so don't
> let this hinder you. I might come up with an add-on patch later.

Sure.

> 
> Actually, using get_unmapped_area() as suggested by Kirill could also
> solve this issue. We do create a new mm with 3-level page tables on s390,
> and the dynamic upgrade to 4 or 5 levels is then triggered exactly by
> arch_get_unmapped_area(), depending on the addr. But I currently don't
> see how / where arch_get_unmapped_area() is set up for such a dummy mm
> created by mm_alloc().

Normally they are set during program loading but we can set it up explicitly
for the test mm_struct if we need to but there are some other challenges.

load_[aout|elf|flat|..]_binary()
	setup_new_exec()
		arch_pick_mmap_layout().

I did some initial experiments around get_unmapped_area(). Seems bit tricky
to get it working on a pure 'test' mm_struct. It expects a real user context
in the form of current->mm.

get_unmapped_area()
{
	....
	get_area = current->mm->get_unmapped_area;
	....
	addr = get_area(file, addr, len, pgoff, flags); {
		....
		struct mm_struct *mm = current->mm;
		....
		if (addr) {
			...
			vma = find_vma_prev(mm, addr, &prev);
		}
		....
		vm_unmapped_area() {
			struct mm_struct *mm = current->mm;
			....
			/* Walks across mm->mm_rb.rb_node */
		}
	}
	....
}	

Simple call like get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0) to get an
address fails right away on current->mm->get_unmapped_area which does
not have a valid value in the kernel context.

There might be two methods to get around this problem

1) Write a custom get_unmapped_area() imitating the real one but going
   around the problem by taking an appropriately initialized mm_struct
   instead of current->mm.

2) Create dummy user task with dummy mm, switch 'current' context before
   calling into get_unmapped_area() and switch back again. Dont know if
   this is even possible.

Wondering if this might deviate too much from the original goal of
testing the page table helpers.

Looking back again at the proposed test vaddr, wondering what will be the
real problem in case it goes beyond user address range ? Will pxx_alloc()
fail to create page table ranges at required level ? Apart from skipping
pgtable_page_ctor/dtor for page table pages, it might not really affect
any helpers as such.

VADDR_TEST (PGDIR_SIZE + [P4D_SIZE] + PUD_SIZE + PMD_SIZE + PAGE_SIZE)

OR

A random page aligned address in [FIRST_USER_ADDRESS..TASK_SIZE] range ?


  reply index

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-03  8:01 [PATCH 0/1] mm/debug: Add tests for architecture exported " Anshuman Khandual
2019-09-03  8:01 ` [PATCH 1/1] mm/pgtable/debug: Add test validating architecture " Anshuman Khandual
2019-09-03 11:13   ` kbuild test robot
2019-09-04  6:14     ` Anshuman Khandual
2019-09-04 14:19   ` Kirill A. Shutemov
2019-09-05  8:18     ` Anshuman Khandual
2019-09-05  8:59       ` Kirill A. Shutemov
2019-09-06  7:03         ` Anshuman Khandual
2019-09-04 20:16   ` Gerald Schaefer
2019-09-05  9:18     ` Anshuman Khandual
2019-09-05 17:06       ` Gerald Schaefer
2019-09-06  6:28         ` Anshuman Khandual
2019-09-06 19:03           ` Gerald Schaefer
2019-09-09  6:26             ` Anshuman Khandual [this message]
2019-09-09 15:13               ` Kirill A. Shutemov
2019-09-10  3:56                 ` Anshuman Khandual
2019-09-10  4:45                   ` Christophe Leroy
2019-09-10  5:43                     ` Anshuman Khandual
2019-09-09 16:51               ` Gerald Schaefer
2019-09-04 23:14   ` Dave Hansen

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d5de35f-8192-1c75-50a9-03e66e3b8e5c@arm.com \
    --to=anshuman.khandual@arm.com \
    --cc=Steven.Price@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=ard.biesheuvel@linaro.org \
    --cc=broonie@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=davem@davemloft.net \
    --cc=gerald.schaefer@de.ibm.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=heiko.carstens@de.ibm.com \
    --cc=jgg@ziepe.ca \
    --cc=jhogan@kernel.org \
    --cc=keescook@chromium.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-sh@vger.kernel.org \
    --cc=linux-snps-arc@lists.infradead.org \
    --cc=linux@armlinux.org.uk \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mark.rutland@arm.com \
    --cc=mhocko@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=paul.burton@mips.com \
    --cc=paulus@samba.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=peterz@infradead.org \
    --cc=ralf@linux-mips.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=schowdary@nvidia.com \
    --cc=schwidefsky@de.ibm.com \
    --cc=sparclinux@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vbabka@suse.cz \
    --cc=vgupta@synopsys.com \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=yamada.masahiro@socionext.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git