linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: James Morse <james.morse@arm.com>
To: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Sasha Levin <sashal@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Vladimir Murzin <vladimir.murzin@arm.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Marc Zyngier <marc.zyngier@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Bhupesh Sharma <bhsharma@redhat.com>,
	kexec mailing list <kexec@lists.infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	James Morris <jmorris@namei.org>, linux-mm <linux-mm@kvack.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Matthias Brugger <matthias.bgg@gmail.com>,
	will@kernel.org, Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v3 08/17] arm64, trans_pgd: make trans_pgd_map_page generic
Date: Fri, 11 Oct 2019 19:15:15 +0100	[thread overview]
Message-ID: <ba96ab95-af8b-895e-e515-a94a63dd056a@arm.com> (raw)
In-Reply-To: <CA+CK2bAPA=L+KeWve=2PbNEh+B9mXRzTGr1iQqRCkOAs5dU-Qg@mail.gmail.com>

Hi Pavel,

On 06/09/2019 19:58, Pavel Tatashin wrote:
> On Fri, Sep 6, 2019 at 11:20 AM James Morse <james.morse@arm.com> wrote:
>> On 21/08/2019 19:31, Pavel Tatashin wrote:
>>> Currently, trans_pgd_map_page has assumptions that are relevant to
>>> hibernate. But, to make it generic we must allow it to use any allocator

>>> and also, can't assume that entries do not exist in the page table
>>> already.

[...]

>> Please don't use the page tables as an array: this is what the offset helpers are for.
> 
> Sure, I can use:
> 
> pte_offset_kernel()
> pmd_offset()
> pud_offset()
> pgd_offset_raw()

> The code becomes a little less efficient, because offsets return
> pointer to the entry after READ_ONCE, and we need to use another
> READ_ONCE() to read its content to parse its value in for example
> pud_table(), pud_none() etc . In my case we use READ_ONCE() only one
> time  per entry and operate on the content multiple times. Also,
> because of unfortunate differences in macro names, the code become a
> little less symmetric. Still, I can change the code to use _offsets
> here. Please let me know if you still think it is better to use them
> here.

We should make this as clearly readable as possible, that way reviewers can spot the bugs.
Using the helpers makes this more maintainable, as the helpers may be where strange things
like 52bit VA get implemented.

Making it fast is the compilers job. I agree it can't remove READ_ONCE()es, but I think
the difference between one and two READ_ONCE()es per leaf-entry is insignificant when we
go on to copy megabytes worth of data.


>> The copy_p?d() functions should decide if they should manipulate _this_ entry based on
>> _this_ entry and the kernel configuration. This is only really done in _copy_pte(), which
>> is where it should stay.
> 
> I am sorry, I do not understand this comment. Could you please
> elaborate what would you like me to change.

Consider the current _copy_pte():
|	} else if (debug_pagealloc_enabled() && !pte_none(pte)) {
|		/*
|		 * debug_pagealloc will removed the PTE_VALID bit if
|		 * the page isn't in use by the resume kernel. It may have
|		 * been in use by the original kernel, in which case we need
|		 * to put it back in our copy to do the restore.
|		 *
|		 * Before marking this entry valid, check the pfn should
|		 * be mapped.
|		 */
|		BUG_ON(!pfn_valid(pte_pfn(pte)));
|
|		set_pte(dst_ptep, pte_mkpresent(pte_mkwrite(pte)));
|	}

From this it is very obvious that we only put the valid bits back into the page table if
debug_pagealloc is enabled and the not-valid PTE's PFN points to memory that was part of
the linear map.

If this logic gets moved apart, and strung together with global variables, its not at all
clear what happens.


>>> diff --git a/arch/arm64/include/asm/trans_pgd.h b/arch/arm64/include/asm/trans_pgd.h
>>> index c7b5402b7d87..e3d022b1b526 100644
>>> --- a/arch/arm64/include/asm/trans_pgd.h
>>> +++ b/arch/arm64/include/asm/trans_pgd.h
>>> @@ -11,10 +11,45 @@
>>>  #include <linux/bits.h>
>>>  #include <asm/pgtable-types.h>
>>>
>>> +/*
>>> + * trans_alloc_page
>>> + *   - Allocator that should return exactly one uninitilaized page, if this
>>> + *    allocator fails, trans_pgd returns -ENOMEM error.
>>> + *
>>> + * trans_alloc_arg
>>> + *   - Passed to trans_alloc_page as an argument
>>
>> This is very familiar.
> 
> Sorry, What do you mean?

This stuff used to take a pointer to a function that allocates a page, and an argument for
that allocator ... until patch 2 when you squashed it all in... only to undo it here. This
looks like churn.


>>> + * trans_flags

[...]

> I re-evaluated "flags", and figured that they are indeed not needed.
> So, I will embed them into the code directly.

Great!



>>> diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
>>> index 00b62d8640c2..dbabccd78cc4 100644
>>> --- a/arch/arm64/mm/trans_pgd.c
>>> +++ b/arch/arm64/mm/trans_pgd.c
>>> @@ -17,6 +17,16 @@
>>>  #include <asm/pgtable.h>
>>>  #include <linux/suspend.h>

>>>
>>> -int trans_pgd_map_page(pgd_t *trans_pgd, void *page, unsigned long dst_addr,
>>> -                    pgprot_t pgprot)
>>> +int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
>>> +                    void *page, unsigned long dst_addr, pgprot_t pgprot)
>>>  {
>>> -     pgd_t *pgdp;
>>> -     pud_t *pudp;
>>> -     pmd_t *pmdp;
>>> -     pte_t *ptep;
>>> -
>>> -     pgdp = pgd_offset_raw(trans_pgd, dst_addr);
>>> -     if (pgd_none(READ_ONCE(*pgdp))) {
>>> -             pudp = (void *)get_safe_page(GFP_ATOMIC);
>>> -             if (!pudp)
>>> +     int pgd_idx = pgd_index(dst_addr);
>>> +     int pud_idx = pud_index(dst_addr);
>>> +     int pmd_idx = pmd_index(dst_addr);
>>> +     int pte_idx = pte_index(dst_addr);
>>
>> Yuck.
>>
> 
> What's wrong with pre-calculating indices? :)

The only thing to do with them is access the page tables as a C array. This stuff is the
business of the helpers, please use them. Its a maintenance headache if you don't.


>>> -     pudp = pud_offset(pgdp, dst_addr);
>>> -     if (pud_none(READ_ONCE(*pudp))) {
>>> -             pmdp = (void *)get_safe_page(GFP_ATOMIC);
>>> -             if (!pmdp)
>>> +     pudp = __va(pgd_page_paddr(pgd));
>>> +     pud = READ_ONCE(pudp[pud_idx]);
>>> +     if (pud_sect(pud)) {
>>> +             return -ENXIO;
>>> +     } else if (pud_none(pud) || pud_sect(pud)) {
>>> +             pmd_t *t = trans_alloc(info);
>>> +
>>> +             if (!t)
>>>                       return -ENOMEM;
>>
>> Choke on block mappings? This should never happen because this function should only create
>> the tables necessary to map one page. Not a block mapping in sight.
>>
>> (see my comments on patch 6)

> I can remove this, but what should I replace it with BUG() or silently
> ignore, and assume no huge page hre? I thought the idea is not to use
> BUG() calls in kernel code, and return errors instead. If, in the
> future PUD size mappings are added, how is that going to be detected?

...if in the future...

Could your turn RODATA_FULL_DEFAULT_ENABLED off in your kernel config, then check debugfs
kernel_page_tables export. You should see blocks mappings for the large contiguous blocks
of memory.


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2019-10-11 18:15 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-21 18:31 [PATCH v3 00/17] arm64: MMU enabled kexec relocation Pavel Tatashin
2019-08-21 18:31 ` [PATCH v3 01/17] kexec: quiet down kexec reboot Pavel Tatashin
2019-09-06 15:17   ` James Morse
2019-09-06 15:35     ` Pavel Tatashin
2019-08-21 18:31 ` [PATCH v3 02/17] arm64, hibernate: use get_safe_page directly Pavel Tatashin
2019-09-06 15:17   ` James Morse
2019-09-06 15:39     ` Pavel Tatashin
2019-08-21 18:31 ` [PATCH v3 03/17] arm64, hibernate: remove gotos in create_safe_exec_page Pavel Tatashin
2019-09-06 15:17   ` James Morse
2019-09-06 15:41     ` Pavel Tatashin
2019-08-21 18:31 ` [PATCH v3 04/17] arm64, hibernate: rename dst to page " Pavel Tatashin
2019-09-06 15:17   ` James Morse
2019-09-06 15:41     ` Pavel Tatashin
2019-08-21 18:31 ` [PATCH v3 05/17] arm64, hibernate: check pgd table allocation Pavel Tatashin
2019-09-06 15:17   ` James Morse
2019-09-06 15:44     ` Pavel Tatashin
2019-08-21 18:31 ` [PATCH v3 06/17] arm64, hibernate: add trans_pgd public functions Pavel Tatashin
2019-09-06 15:18   ` James Morse
2019-09-06 16:00     ` Pavel Tatashin
2019-10-11 18:16       ` James Morse
2019-08-21 18:31 ` [PATCH v3 07/17] arm64, hibernate: move page handling function to new trans_pgd.c Pavel Tatashin
2019-09-06 15:18   ` James Morse
2019-09-06 17:41     ` Pavel Tatashin
2019-08-21 18:31 ` [PATCH v3 08/17] arm64, trans_pgd: make trans_pgd_map_page generic Pavel Tatashin
2019-09-06 15:20   ` James Morse
2019-09-06 18:58     ` Pavel Tatashin
2019-10-11 18:15       ` James Morse [this message]
2019-08-21 18:31 ` [PATCH v3 09/17] arm64, trans_pgd: add trans_pgd_create_empty Pavel Tatashin
2019-09-06 15:20   ` James Morse
2019-09-06 19:00     ` Pavel Tatashin
2019-08-21 18:31 ` [PATCH v3 10/17] arm64, trans_pgd: adjust trans_pgd_create_copy interface Pavel Tatashin
2019-09-06 15:20   ` James Morse
2019-09-06 19:03     ` Pavel Tatashin
2019-08-21 18:31 ` [PATCH v3 11/17] arm64, trans_pgd: add PUD_SECT_RDONLY Pavel Tatashin
2019-09-06 15:21   ` James Morse
2019-09-06 19:04     ` Pavel Tatashin
2019-08-21 18:31 ` [PATCH v3 12/17] arm64, trans_pgd: complete generalization of trans_pgds Pavel Tatashin
2019-09-06 15:23   ` James Morse
2019-09-06 19:06     ` Pavel Tatashin
2019-08-21 18:32 ` [PATCH v3 13/17] kexec: add machine_kexec_post_load() Pavel Tatashin
2019-08-21 18:32 ` [PATCH v3 14/17] arm64, kexec: move relocation function setup and clean up Pavel Tatashin
2019-08-21 18:32 ` [PATCH v3 15/17] arm64, kexec: add expandable argument to relocation function Pavel Tatashin
2019-08-21 18:32 ` [PATCH v3 16/17] arm64, kexec: configure trans_pgd page table for kexec Pavel Tatashin
2019-08-21 18:32 ` [PATCH v3 17/17] arm64, kexec: enable MMU during kexec relocation Pavel Tatashin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ba96ab95-af8b-895e-e515-a94a63dd056a@arm.com \
    --to=james.morse@arm.com \
    --cc=bhsharma@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=ebiederm@xmission.com \
    --cc=jmorris@namei.org \
    --cc=kexec@lists.infradead.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=marc.zyngier@arm.com \
    --cc=mark.rutland@arm.com \
    --cc=matthias.bgg@gmail.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=sashal@kernel.org \
    --cc=vladimir.murzin@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).