linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Alexandre Ghiti <alexghiti@rivosinc.com>
To: Nylon Chen <nylon.chen@sifive.com>
Cc: alex@ghiti.fr, apatel@ventanamicro.com, catalin.marinas@arm.com,
	 will@kernel.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
	 aou@eecs.berkeley.edu, robh+dt@kernel.org,
	frowand.list@gmail.com,  rppt@kernel.org,
	akpm@linux-foundation.org, anup@brainfault.org,
	 linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,  linux-riscv@lists.infradead.org,
	devicetree@vger.kernel.org,  linux-mm@kvack.org,
	zong.li@sifive.com, nylon7717@gmail.com
Subject: Re: Fwd: [PATCH v8 0/4] riscv: Use PUD/P4D/PGD pages for the linear mapping
Date: Tue, 12 Mar 2024 10:33:39 +0100	[thread overview]
Message-ID: <CAHVXubjMfagEqBkfgvjcQM8ifcJrLUMgkBZ6HOLB318Hb25O_Q@mail.gmail.com> (raw)
In-Reply-To: <CAHh=Yk8cWiX8q7YT3OuqNBP=kgNdZm3cWZCm01+CfVVQcnPy=Q@mail.gmail.com>

Hi Nylon,

On Tue, Mar 12, 2024 at 7:48 AM Nylon Chen <nylon.chen@sifive.com> wrote:
>
> Nylon Chen <nylon.chen@sifive.com> 於 2024年3月12日 週二 下午2:40寫道:
> >
> > Alexandre Ghiti <alexghiti@rivosinc.com> 於 2024年2月5日 週一 下午5:32寫道:
> > >
> > > Hi Nylon,
> > >
> > > On Fri, Jan 19, 2024 at 10:27 AM Nylon Chen <nylon.chen@sifive.com> wrote:
> > > >
> > > > Alexandre Ghiti <alexghiti@rivosinc.com> 於 2024年1月18日 週四 下午9:01寫道:
> > > > >
> > > > > Hi Nylon,
> > > > Hi Alexandre, thanks for your feedback,
> > > > >
> > > > > On Thu, Jan 18, 2024 at 9:23 AM Nylon Chen <nylon.chen@sifive.com> wrote:
> > > > > >
> > > > > > > On 3/23/23 15:55, Anup Patel wrote:
> > > > > > > > On Thu, Mar 23, 2023 at 6:24 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > > > >> Hi Anup,
> > > > > > > >>
> > > > > > > >> On Thu, Mar 23, 2023 at 1:18 PM Anup Patel <apatel@ventanamicro.com> wrote:
> > > > > > > >>> Hi Alex,
> > > > > > > >>>
> > > > > > > >>> On Thu, Mar 16, 2023 at 6:48 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > > > > > >>>> This patchset intends to improve tlb utilization by using hugepages for
> > > > > > > >>>> the linear mapping.
> > > > > > > >>>>
> > > > > > > >>>> As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must
> > > > > > > >>>> take care of isolating the kernel text and rodata so that they are not
> > > > > > > >>>> mapped with a PUD mapping which would then assign wrong permissions to
> > > > > > > >>>> the whole region: it is achieved by introducing a new memblock API.
> > > > > > > >>>>
> > > > > > > >>>> Another patch makes use of this new API in arm64 which used some sort of
> > > > > > > >>>> hack to solve this issue: it was built/boot tested successfully.
> > > > > > > >>>>
> > > > > > > >>>> base-commit-tag: v6.3-rc1
> > > > > > > >>>>
> > > > > > > >>>> v8:
> > > > > > > >>>> - Fix rv32, as reported by Anup
> > > > > > > >>>> - Do not modify memblock_isolate_range and fixes comment, as suggested by Mike
> > > > > > > >>>> - Use the new memblock API for crash kernel too in arm64, as suggested by Andrew
> > > > > > > >>>> - Fix arm64 double mapping (which to me did not work in v7), but ends up not
> > > > > > > >>>>    being pretty at all, will wait for comments from arm64 reviewers, but
> > > > > > > >>>>    this patch can easily be dropped if they do not want it.
> > > > > > > >>>>
> > > > > > > >>>> v7:
> > > > > > > >>>> - Fix Anup bug report by introducing memblock_isolate_memory which
> > > > > > > >>>>    allows us to split the memblock mappings and then avoid to map the
> > > > > > > >>>>    the PUD which contains the kernel as read only
> > > > > > > >>>> - Add a patch to arm64 to use this newly introduced API
> > > > > > > >>>>
> > > > > > > >>>> v6:
> > > > > > > >>>> - quiet LLVM warning by casting phys_ram_base into an unsigned long
> > > > > > > >>>>
> > > > > > > >>>> v5:
> > > > > > > >>>> - Fix nommu builds by getting rid of riscv_pfn_base in patch 1, thanks
> > > > > > > >>>>    Conor
> > > > > > > >>>> - Add RB from Andrew
> > > > > > > >>>>
> > > > > > > >>>> v4:
> > > > > > > >>>> - Rebase on top of v6.2-rc3, as noted by Conor
> > > > > > > >>>> - Add Acked-by Rob
> > > > > > > >>>>
> > > > > > > >>>> v3:
> > > > > > > >>>> - Change the comment about initrd_start VA conversion so that it fits
> > > > > > > >>>>    ARM64 and RISCV64 (and others in the future if needed), as suggested
> > > > > > > >>>>    by Rob
> > > > > > > >>>>
> > > > > > > >>>> v2:
> > > > > > > >>>> - Add a comment on why RISCV64 does not need to set initrd_start/end that
> > > > > > > >>>>    early in the boot process, as asked by Rob
> > > > > > > >>>>
> > > > > > > >>>> Alexandre Ghiti (4):
> > > > > > > >>>>    riscv: Get rid of riscv_pfn_base variable
> > > > > > > >>>>    mm: Introduce memblock_isolate_memory
> > > > > > > >>>>    arm64: Make use of memblock_isolate_memory for the linear mapping
> > > > > > > >>>>    riscv: Use PUD/P4D/PGD pages for the linear mapping
> > > > > > > >>> Kernel boot fine on RV64 but there is a failure which is still not
> > > > > > > >>> addressed. You can see this failure as following message in
> > > > > > > >>> kernel boot log:
> > > > > > > >>>      0.000000] Failed to add a System RAM resource at 80200000
> > > > > > > >> Hmmm I don't get that in any of my test configs, would you mind
> > > > > > > >> sharing yours and your qemu command line?
> > > > > > > > Try alexghiti_test branch at
> > > > > > > > https://github.com/avpatel/linux.git
> > > > > > > >
> > > > > > > > I am building the kernel using defconfig and my rootfs is
> > > > > > > > based on busybox.
> > > > > > > >
> > > > > > > > My QEMU command is:
> > > > > > > > qemu-system-riscv64 -M virt -m 512M -nographic -bios
> > > > > > > > opensbi/build/platform/generic/firmware/fw_dynamic.bin -kernel
> > > > > > > > ./build-riscv64/arch/riscv/boot/Image -append "root=/dev/ram rw
> > > > > > > > console=ttyS0 earlycon" -initrd ./rootfs_riscv64.img -smp 4
> > > > > > >
> > > > > > >
> > > > > > > So splitting memblock.memory is the culprit, it "confuses" the resources
> > > > > > > addition and I can only find hacky ways to fix that...
> > > > > > Hi Alexandre,
> > > > > >
> > > > > > We encountered the same error as Anup. After adding your patch
> > > > > > (3335068f87217ea59d08f462187dc856652eea15), we will not encounter the
> > > > > > error again.
> > > > > >
> > > > > > What I have observed so far is
> > > > > >
> > > > > > - before your patch
> > > > > > When merging consecutive memblocks, if the memblock types are different,
> > > > > > they will be merged into reserved
> > > > > > - after your patch
> > > > > > When consecutive memblocks are merged, if the memblock types are
> > > > > > different, they will be merged into memory.
> > > > > >
> > > > > > Such a result will cause the memory location of OpenSBI to be changed
> > > > > > from reserved to memory. Will this have any side effects?
> > > > >
> > > > > I guess it will end up in the memory pool and pages from openSBI
> > > > > region will be allocated, so we should see very quickly bad stuff
> > > > > happening (either PMP violation or M-mode ecall never
> > > > > returning/trapping/etc).
> > > > >
> > > > > But I don't observe the same thing, I always see the openSBI region
> > > > > being reserved:
> > > > >
> > > > > reserved[0x0] [0x0000000080000000-0x000000008007ffff],
> > > > > 0x0000000000080000 bytes flags: 0x0
> > > > >
> > > > > Can you elaborate a bit more about "When consecutive memblocks are
> > > > > merged, if the memblock types are different, they will be merged into
> > > > > memory"? Where/when does this merge happen? Can you give me a config
> > > > > file and a kernel revision so that I can take a look?
> > > > Ok, If you want to reproduce the same results you just need to modify OpenSBI
> > > >
> > > > [ lib/sbi/sbi_domain.c ]
> > > > +#define TEST_SIZE 0x200000
> > > >
> > > > -                                 (scratch->fw_size - scratch->fw_rw_offset),
> > > > +                                 (TEST_SIZE - scratch->fw_rw_offset),
> > > >
> > > > In addition, you can insert checks in the kernel merged function
> > > > [ mm/memblock.c ]
> > > > static void __init_memblock memblock_merge_regions(struct memblock_type *type)
> > > >         while (i < type->cnt - 1) {
> > > >          ...
> > > >                 /* move forward from next + 1, index of which is i + 2 */
> > > >                 memmove(next, next + 1, (type->cnt - (i + 2)) * sizeof(*next));
> > > >                 type->cnt--;
> > > >         }
> > > > +       pr_info("Merged memblock_type: cnt = %lu, max = %lu,
> > > > total_size = 0x%llx\n",type->cnt, type->max, type->total_size);
> > > > +       for (i = 0; i < type->cnt; i++) {
> > > > +               const char *region_type =
> > > > memblock_is_memory(type->regions[i].base) ? "memory" : "reserve";
> > > > +               pr_info("Region %d: base = 0x%llx, size = 0x%llx, type
> > > > = %s\n", i, type->regions[i].base, type->regions[i].size,
> > > > region_type);
> > > > +       }
> > > >  }
> > > > This is kernel boot log
> > > > - before your patch
> > > > ...
> > > > [    0.000000] OF: fdt: Reserving memory: base = 0x80000000, size = 0x200000
> > > > [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x1628501
> > > > [    0.000000] Region 0: base = 0x80000000, size = 0x1600000, type = reserve
> > > > ...
> > > >
> > > > - after your patch
> > > > ...
> > > > [    0.000000] OF: fdt: Reserving memory: base = 0x80000000, size = 0x200000
> > > > [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x180c42e
> > > > [    0.000000] Region 0: base = 0x80000000, size = 0x1800000, type = memory
> > >
> > Hi Alex, thanks for your feedback.
> > > So the openSBI region is marked as memory, and not reserved because
> > > this region is now described as nomap, and memblock_mark_nomap() does
> > > not move this region into the reserved memblock list, but keep it in
> > > the memory list with the nomap flag
> > > (https://elixir.bootlin.com/linux/latest/source/drivers/of/fdt.c#L479).
> > > But as stated in the description of memblock_mark_nomap()
> > > (https://elixir.bootlin.com/linux/latest/source/mm/memblock.c#L969),
> > > the pages associated with the region will be marked as PageReserved
> > > and the region will not be covered in the linear mapping.
> > I traced it via GDB, and indeed, it enters
> > early_init_dt_reserve_memory() and calls memblock_reserve to reserve
> > this block of memory.
> >
> > [before your patch]
> > [    0.000000] OF: fdt: check nomap Reserving memory: base =
> > 0x80000000, size = 0x200000
> > [    0.000000] ---  Reserved memory: Base address: 80000000, Size:
> > 200000---
> > [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size =
> > 0x1e28501
> > [    0.000000] Region 0: base = 0x80000000, size = 0x1e00000, type =
> > reserve
> > [    0.000000] Region 1: base = 0xbfe00000, size = 0x6002, type =
> > memory
> > ....
> > [    0.000000] OF: fdt: Reserved memory: reserved region for node
> > 'mmode_resv0@80000000': base 0x0000000080000000, size 2 MiB
> > [    0.000000] OF: reserved mem:
> > 0x0000000080000000..0x00000000801fffff (2048 KiB) map non-reusable
> > mmode_resv0@80000000
> >
> > [after your patch]
> > [    0.000000] OF: fdt: check nomap Reserving memory: base =
> > 0x80000000, size = 0x200000
> > [    0.000000] --- Reserved memory: Base address: 80000000, Size: 200000---
> > [    0.000000] Merged memblock_type: cnt = 4, max = 128, total_size = 0x1e25501
> > [    0.000000] Region 0: base = 0x80000000, size = 0x1e00000, type = memory
> > [    0.000000] Region 1: base = 0xbfe00000, size = 0x6002, type = memory
> > ...
> > [    0.000000] OF: fdt: Reserved memory: reserved region for node
> > 'mmode_resv0@80000000': base 0x0000000080000000, size 2 MiB
> > [    0.000000] OF: reserved mem:
> > 0x0000000080000000..0x00000000801fffff (2048 KiB) map non-reusable
> > mmode_resv0@80000000
> >
> > At the moment, it can be confirmed that there is no need to worry
> > about this block of memory being used.
> >
> > But I still have a question I'd like to ask, which is why this
> > location is flagged as 'reserve' instead of 'memory' in the memblock
> Sorry, I asked the wrong question.
>
> Why is this location marked as "memory" instead of "reserve" in the memblock?

No idea, let's see if @Mike Rapoport can answer this :)

> >
> > Thanks,
> > Nylon
> > >
> > > So to me, this is normal and we are safe. Let me know if I made a mistake.
> > >
> > > And sorry for the long delay, that slipped my mind!
> > >
> > > Thanks,
> > >
> > > Alex
> > >
> > > > ...
> > > > [    0.000000] Failed to add a system RAM resource at 80200000
> > > > ...
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Alex
> > > > >
> > > > > > >
> > > > > > > So given that the arm64 patch with the new API is not pretty and that
> > > > > > > the simplest solution is to re-merge the memblock regions afterwards
> > > > > > > (which is done by memblock_clear_nomap), I'll drop the new API and the
> > > > > > > arm64 patch to use the nomap API like arm64: I'll take advantage of that
> > > > > > > to clean setup_vm_final which I have wanted to do for a long time.
> > > > > > >
> > > > > > > @Mike Thanks for you reviews!
> > > > > > >
> > > > > > > @Anup Thanks for all your bug reports on this patchset, I have to
> > > > > > > improve my test flow (it is in the work :)).
> > > > > > >
> > > > > > >
> > > > > > > > Regards,
> > > > > > > > Anup
> > > > > > > >
> > > > > > > >> Thanks
> > > > > > > >>
> > > > > > > >>> Regards,
> > > > > > > >>> Anup
> > > > > > > >>>
> > > > > > > >>>>   arch/arm64/mm/mmu.c           | 25 +++++++++++------
> > > > > > > >>>>   arch/riscv/include/asm/page.h | 19 +++++++++++--
> > > > > > > >>>>   arch/riscv/mm/init.c          | 53 ++++++++++++++++++++++++++++-------
> > > > > > > >>>>   arch/riscv/mm/physaddr.c      | 16 +++++++++++
> > > > > > > >>>>   drivers/of/fdt.c              | 11 ++++----
> > > > > > > >>>>   include/linux/memblock.h      |  1 +
> > > > > > > >>>>   mm/memblock.c                 | 20 +++++++++++++
> > > > > > > >>>>   7 files changed, 119 insertions(+), 26 deletions(-)
> > > > > > > >>>>
> > > > > > > >>>> --
> > > > > > > >>>> 2.37.2
> > > > > > > >>>>
> > > > > > > > _______________________________________________
> > > > > > > > linux-riscv mailing list
> > > > > > > > linux-riscv@lists.infradead.org
> > > > > > > > http://lists.infradead.org/mailman/listinfo/linux-riscv
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > linux-riscv mailing list
> > > > > > > linux-riscv@lists.infradead.org
> > > > > > > http://lists.infradead.org/mailman/listinfo/linux-riscv


      reply	other threads:[~2024-03-12  9:33 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-16 13:17 [PATCH v8 0/4] riscv: Use PUD/P4D/PGD pages for the linear mapping Alexandre Ghiti
2023-03-16 13:17 ` [PATCH v8 1/4] riscv: Get rid of riscv_pfn_base variable Alexandre Ghiti
2023-03-16 13:17 ` [PATCH v8 2/4] mm: Introduce memblock_isolate_memory Alexandre Ghiti
2023-03-16 20:12   ` Mike Rapoport
2023-03-20 10:54     ` Alexandre Ghiti
2023-03-20 17:44       ` Mike Rapoport
2023-03-23 11:52         ` Alexandre Ghiti
2023-03-23 12:16           ` Mike Rapoport
2023-03-23 12:30             ` Alexandre Ghiti
2023-03-16 13:17 ` [PATCH v8 3/4] arm64: Make use of memblock_isolate_memory for the linear mapping Alexandre Ghiti
2023-03-24 15:21   ` Will Deacon
2023-03-24 15:23     ` Alexandre Ghiti
2023-03-16 13:17 ` [PATCH v8 4/4] riscv: Use PUD/P4D/PGD pages " Alexandre Ghiti
2023-03-23 12:18 ` [PATCH v8 0/4] " Anup Patel
2023-03-23 12:54   ` Alexandre Ghiti
2023-03-23 14:55     ` Anup Patel
2023-03-24  9:59       ` Alexandre Ghiti
     [not found]         ` <CAPqJEFr6MgUyARfbWAo7EeQKLVd4xRJz_LOYN68UC-kPD1Hr5A@mail.gmail.com>
2024-01-18  8:23           ` Fwd: " Nylon Chen
2024-01-18 13:01             ` Alexandre Ghiti
2024-01-19  9:26               ` Nylon Chen
2024-02-05  9:32                 ` Alexandre Ghiti
2024-03-12  6:40                   ` Nylon Chen
2024-03-12  6:48                     ` Nylon Chen
2024-03-12  9:33                       ` Alexandre Ghiti [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHVXubjMfagEqBkfgvjcQM8ifcJrLUMgkBZ6HOLB318Hb25O_Q@mail.gmail.com \
    --to=alexghiti@rivosinc.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex@ghiti.fr \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=apatel@ventanamicro.com \
    --cc=catalin.marinas@arm.com \
    --cc=devicetree@vger.kernel.org \
    --cc=frowand.list@gmail.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=nylon.chen@sifive.com \
    --cc=nylon7717@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=robh+dt@kernel.org \
    --cc=rppt@kernel.org \
    --cc=will@kernel.org \
    --cc=zong.li@sifive.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).