linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* A logical error in arch/x86/mm/init.c
@ 2022-02-03 10:30 Nikita Popov
  2022-02-03 17:27 ` Dave Hansen
  0 siblings, 1 reply; 4+ messages in thread
From: Nikita Popov @ 2022-02-03 10:30 UTC (permalink / raw)
  To: dave.hansen, luto, peterz; +Cc: linux-kernel

Hello!

It appears that there is a logical error in arch/x86/mm/init.c in the
master branch. Although it is unlikely to occur in practice. I
discovered it while studying source code in that file.

Here is the description:
1) The function 'init_range_memory_mapping' is responsible for direct
mapping of the normal memory range. Before proceeding to mapping of a
valid subrange containing usable RAM the function checks that it is
safe to allocate pages from PGT_BUF subarea of BRK area (the 'safe'
means we we are not mapping the page belonging to PGT_BUF subarea
which can incur MMU issues if that page is allocated as a page
directory). As a result the global flag 'can_use_brk_pgt' reflects
whether the current subrange overlaps with the free area from PGT_BUF.
The 'true' value means no overlapping.
2) The function 'alloc_low_pages' is used during page directories
construction. If it happens that we exhausted PGT_BUF completely (such
condition usually causes can_use_brk_pgt == true), there is a chance
that we use common BRK area on memblock allocation failure (memblock
allocator is restricted to use already mapped pages only). If the
'can_use_brk_pgt' flag is set we allocate a page from the common BRK
area. But the BRK area is not protected with the similar check meaning
that in theory we can allocate a page serving as page directory for
the very same page. The flag 'can_use_brk_pgt' protects only the
PGT_BUF subarea of the BRK area. Not the whole BRK area.

In my opinion one of the simplest fixes here is to completely remove
the following lines:
    if (!ret && can_use_brk_pgt)
            ret = __pa(extend_brk(PAGE_SIZE * num, PAGE_SIZE));

I look forward to seeing your comments on this.
Kind regards.

Nikita Popov
Senior C Developer

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: A logical error in arch/x86/mm/init.c
  2022-02-03 10:30 A logical error in arch/x86/mm/init.c Nikita Popov
@ 2022-02-03 17:27 ` Dave Hansen
  2022-02-04  5:35   ` Nikita Popov
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Hansen @ 2022-02-03 17:27 UTC (permalink / raw)
  To: Nikita Popov, dave.hansen, luto, peterz
  Cc: linux-kernel, the arch/x86 maintainers, Juergen Gross

On 2/3/22 02:30, Nikita Popov wrote:
> It appears that there is a logical error in arch/x86/mm/init.c in the
> master branch. Although it is unlikely to occur in practice. I
> discovered it while studying source code in that file.

I looked at this a bit.  It seems to have originated in:

> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=c9b3234a6aba

which isn't the best changelog in the history of the world.  It's also
fixing a boot problem in a configuration that I can't readily reproduce
(Xen PV guest).

There's one thing from the old changelog that's confusing me:

>     But after we get back that page for pgt, it tiggers one bug in pgt allocation
>     with xen: We need to avoid to use page as pgt to map range that is
>     overlapping with that pgt page.

and presumably alluding to the same issue from your mail:

> ... which can incur MMU issues if that page is allocated as a page
> directory)

What are these "MMU issues"?

> In my opinion one of the simplest fixes here is to completely remove
> the following lines:
>     if (!ret && can_use_brk_pgt)
>             ret = __pa(extend_brk(PAGE_SIZE * num, PAGE_SIZE));

This _might_ be right.  But, my confidence that it won't break anything
else is pretty low.  It's also obviously not been tested.

I'd be much more confident if this issue was reproduced, even if the
reproduction was contrived by doing something like purposefully
exhausting the pgt_buf_* area.

If you really feel that this is something that needs to be fixed, I'd
appreciate if you could find some way to reproduce it and then send a
proper patch.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: A logical error in arch/x86/mm/init.c
  2022-02-03 17:27 ` Dave Hansen
@ 2022-02-04  5:35   ` Nikita Popov
  2022-02-04 10:59     ` Juergen Gross
  0 siblings, 1 reply; 4+ messages in thread
From: Nikita Popov @ 2022-02-04  5:35 UTC (permalink / raw)
  To: Dave Hansen
  Cc: dave.hansen, luto, peterz, linux-kernel,
	the arch/x86 maintainers, Juergen Gross

Thank you for your attention.
> If you really feel that this is something that needs to be fixed, I'd
> appreciate if you could find some way to reproduce it and then send a
> proper patch.
I believe this would be hard to reproduce.
I just noticed this discrepancy during manual code review.
I'm considering the following facts:
1) The area 'pgt_buf' is part of the 'brk' area defined in the linker
script. It is allocated in the function 'early_alloc_pgt_buf' using
the very same 'extend_brk'. The latter is essentially a stack-based
allocator picking its memory slices from the linker defined area.
2) The allocations from 'pgt_buf' are in the stack manner too.
One can expect that these two areas (one of which is completely
contained in the other) have the same properties in view of the direct
memory mapping.

Then there is the flag 'can_use_brk_pgt' which allows usage of the
pgt_buf area if a mapped range doesn't overlap with the free space of
the pgt_buf area. In the 'init_range_memory_mapping' function we can
observe that this flag doesn't reflect the relative position between a
mapped range and the free space of the brk area as a whole:
                /*
                 * if it is overlapping with brk pgt, we need to
                 * alloc pgt buf from memblock instead.
                 */
                can_use_brk_pgt = max(start, (u64)pgt_buf_end<<PAGE_SHIFT) >=
                                    min(end, (u64)pgt_buf_top<<PAGE_SHIFT);
This check is simply too narrow.

So for whatever reason this flag prohibits usage of the pgt_buf area,
I believe for the exact same reason we have to avoid using brk area if
the similar condition on the free space of the brk area holds.
> This _might_ be right.  But, my confidence that it won't break anything
> else is pretty low.  It's also obviously not been tested.
Yes, I agree here. I saw it as my duty to report the possible issue.
> What are these "MMU issues"?
I tried to deduce the underlying reason beyond the code fragments in
question. I presumed that checking for overlap is protecting against
some MMU issues that could affect stability of the kernel.

Kind regards,
Nikita Popov
Senior C Developer

On Thu, Feb 3, 2022 at 10:27 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 2/3/22 02:30, Nikita Popov wrote:
> > It appears that there is a logical error in arch/x86/mm/init.c in the
> > master branch. Although it is unlikely to occur in practice. I
> > discovered it while studying source code in that file.
>
> I looked at this a bit.  It seems to have originated in:
>
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=c9b3234a6aba
>
> which isn't the best changelog in the history of the world.  It's also
> fixing a boot problem in a configuration that I can't readily reproduce
> (Xen PV guest).
>
> There's one thing from the old changelog that's confusing me:
>
> >     But after we get back that page for pgt, it tiggers one bug in pgt allocation
> >     with xen: We need to avoid to use page as pgt to map range that is
> >     overlapping with that pgt page.
>
> and presumably alluding to the same issue from your mail:
>
> > ... which can incur MMU issues if that page is allocated as a page
> > directory)
>
> What are these "MMU issues"?
>
> > In my opinion one of the simplest fixes here is to completely remove
> > the following lines:
> >     if (!ret && can_use_brk_pgt)
> >             ret = __pa(extend_brk(PAGE_SIZE * num, PAGE_SIZE));
>
> This _might_ be right.  But, my confidence that it won't break anything
> else is pretty low.  It's also obviously not been tested.
>
> I'd be much more confident if this issue was reproduced, even if the
> reproduction was contrived by doing something like purposefully
> exhausting the pgt_buf_* area.
>
> If you really feel that this is something that needs to be fixed, I'd
> appreciate if you could find some way to reproduce it and then send a
> proper patch.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: A logical error in arch/x86/mm/init.c
  2022-02-04  5:35   ` Nikita Popov
@ 2022-02-04 10:59     ` Juergen Gross
  0 siblings, 0 replies; 4+ messages in thread
From: Juergen Gross @ 2022-02-04 10:59 UTC (permalink / raw)
  To: Nikita Popov, Dave Hansen
  Cc: dave.hansen, luto, peterz, linux-kernel, the arch/x86 maintainers


[-- Attachment #1.1.1: Type: text/plain, Size: 2655 bytes --]

On 04.02.22 06:35, Nikita Popov wrote:
> Thank you for your attention.
>> If you really feel that this is something that needs to be fixed, I'd
>> appreciate if you could find some way to reproduce it and then send a
>> proper patch.
> I believe this would be hard to reproduce.
> I just noticed this discrepancy during manual code review.
> I'm considering the following facts:
> 1) The area 'pgt_buf' is part of the 'brk' area defined in the linker
> script. It is allocated in the function 'early_alloc_pgt_buf' using
> the very same 'extend_brk'. The latter is essentially a stack-based
> allocator picking its memory slices from the linker defined area.
> 2) The allocations from 'pgt_buf' are in the stack manner too.
> One can expect that these two areas (one of which is completely
> contained in the other) have the same properties in view of the direct
> memory mapping.
> 
> Then there is the flag 'can_use_brk_pgt' which allows usage of the
> pgt_buf area if a mapped range doesn't overlap with the free space of
> the pgt_buf area. In the 'init_range_memory_mapping' function we can
> observe that this flag doesn't reflect the relative position between a
> mapped range and the free space of the brk area as a whole:
>                  /*
>                   * if it is overlapping with brk pgt, we need to
>                   * alloc pgt buf from memblock instead.
>                   */
>                  can_use_brk_pgt = max(start, (u64)pgt_buf_end<<PAGE_SHIFT) >=
>                                      min(end, (u64)pgt_buf_top<<PAGE_SHIFT);
> This check is simply too narrow.
> 
> So for whatever reason this flag prohibits usage of the pgt_buf area,
> I believe for the exact same reason we have to avoid using brk area if
> the similar condition on the free space of the brk area holds.
>> This _might_ be right.  But, my confidence that it won't break anything
>> else is pretty low.  It's also obviously not been tested.
> Yes, I agree here. I saw it as my duty to report the possible issue.
>> What are these "MMU issues"?
> I tried to deduce the underlying reason beyond the code fragments in
> question. I presumed that checking for overlap is protecting against
> some MMU issues that could affect stability of the kernel.

I've done a local test with the can_use_brk_pgt tests in
alloc_low_pages() removed and couldn't trigger any bug when running as
Xen PV guest.

This could be due to commit 0167d7d8b0beb4cf1, or some parts of early
memory management initialization (maybe for Xen PV only) have changed
since the patch introducing can_use_brk_pgt was applied.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-02-04 10:59 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-03 10:30 A logical error in arch/x86/mm/init.c Nikita Popov
2022-02-03 17:27 ` Dave Hansen
2022-02-04  5:35   ` Nikita Popov
2022-02-04 10:59     ` Juergen Gross

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).