linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexandre Ghiti <alexghiti@rivosinc.com>
To: Conor Dooley <conor@kernel.org>
Cc: Anup Patel <apatel@ventanamicro.com>,
	Alexandre Ghiti <alex@ghiti.fr>, Anup Patel <anup@brainfault.org>,
	Conor Dooley <conor.dooley@microchip.com>,
	Song Shuai <suagrfillet@gmail.com>,
	robh@kernel.org, Andrew Jones <ajones@ventanamicro.com>,
	palmer@rivosinc.com, jeeheng.sia@starfivetech.com,
	leyfoon.tan@starfivetech.com, mason.huo@starfivetech.com,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Guo Ren <guoren@kernel.org>,
	linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
	Sunil V L <sunilvl@ventanamicro.com>
Subject: Re: Bug report: kernel paniced when system hibernates
Date: Wed, 24 May 2023 15:57:20 +0200	[thread overview]
Message-ID: <CAHVXubjhti9reD8oCh3Jm97tvFdUGi9ShYnm0NhQU4ZYVgU6HA@mail.gmail.com> (raw)
In-Reply-To: <20230524-wasp-charm-fe5c5478957a@spud>

Hi Conor,

On Wed, May 24, 2023 at 3:49 PM Conor Dooley <conor@kernel.org> wrote:
>
> Alex, Anup,
>
> On Thu, May 18, 2023 at 07:34:16PM +0530, Anup Patel wrote:
> > On Thu, May 18, 2023 at 5:39 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
> > > On 5/18/23 08:53, Anup Patel wrote:
> > > > On Wed, May 17, 2023 at 8:26 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > >> On Wed, May 17, 2023 at 1:28 PM Conor Dooley <conor.dooley@microchip.com> wrote:
> > > >>> On Wed, May 17, 2023 at 10:58:02AM +0200, Alexandre Ghiti wrote:
> > > >>>> On Tue, May 16, 2023 at 1:12 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> > > >>>>> On Tue, May 16, 2023 at 11:24 AM Song Shuai <suagrfillet@gmail.com> wrote:
> > > >>>>> I actually removed this flag a few years ago, and I have to admit that
> > > >>>>> I need to check if that's necessary: the goal of commit 3335068f8721
> > > >>>>> ("riscv: Use PUD/P4D/PGD pages for the linear mapping") is to expose
> > > >>>>> the "right" start of DRAM so that we can align virtual and physical
> > > >>>>> addresses on a 1GB boundary.
> > > >>>>>
> > > >>>>> So I have to check if a nomap region is actually added as a
> > > >>>>> memblock.memory.regions[] or not: if yes, that's perfect, let's add
> > > >>>>> the nomap attributes to the PMP regions, otherwise, I don't think that
> > > >>>>> is a good solution.
> > > >>>> So here is the current linear mapping without nomap in openSBI:
> > > >>>>
> > > >>>> ---[ Linear mapping ]---
> > > >>>> 0xff60000000000000-0xff60000000200000    0x0000000080000000         2M
> > > >>>> PMD     D A G . . W R V
> > > >>>> 0xff60000000200000-0xff60000000e00000    0x0000000080200000        12M
> > > >>>> PMD     D A G . . . R V
> > > >>>>
> > > >>>> And below the linear mapping with nomap in openSBI:
> > > >>>>
> > > >>>> ---[ Linear mapping ]---
> > > >>>> 0xff60000000080000-0xff60000000200000    0x0000000080080000      1536K
> > > >>>> PTE     D A G . . W R V
> > > >>>> 0xff60000000200000-0xff60000000e00000    0x0000000080200000        12M
> > > >>>> PMD     D A G . . . R V
> > > >>>>
> > > >>>> So adding nomap does not misalign virtual and physical addresses, it
> > > >>>> prevents the usage of 1GB page for this area though, so that's a
> > > >>>> solution, we just lose this 1GB page here.
> > > >>>>
> > > >>>> But even though that may be the fix, I think we also need to fix that
> > > >>>> in the kernel as it would break compatibility with certain versions of
> > > >>>> openSBI *if* we fix openSBI...So here are a few solutions:
> > > >>>>
> > > >>>> 1. we can mark all "mmode_resv" nodes in the device tree as nomap,
> > > >>>> before the linear mapping is established (IIUC, those nodes are added
> > > >>>> by openSBI to advertise PMP regions)
> > > >>>>      -> This amounts to the same fix as opensbi and we lose the 1GB hugepage.
> > > >>> AFAIU, losing the 1 GB hugepage is a regression, which would make this
> > > >>> not an option, right?
> > > >> Not sure this is a real regression, I'd rather avoid it, but as
> > > >> mentioned in my first answer, Mike Rapoport showed that it was making
> > > >> no difference performance-wise...
> > > >>
> > > >>>> 2. we can tweak pfn_is_nosave function to *not* save pfn corresponding
> > > >>>> to PMP regions
> > > >>>>      -> We don't lose the 1GB hugepage \o/
> > > >>>> 3. we can use register_nosave_region() to not save the "mmode_resv"
> > > >>>> regions (x86 does that
> > > >>>> https://elixir.bootlin.com/linux/v6.4-rc1/source/arch/x86/kernel/e820.c#L753)
> > > >>>>      -> We don't lose the 1GB hugepage \o/
> > > >>>> 4. Given JeeHeng pointer to
> > > >>>> https://elixir.bootlin.com/linux/v6.4-rc1/source/kernel/power/snapshot.c#L1340,
> > > >>>> we can mark those pages as non-readable and make the hibernation
> > > >>>> process not save those pages
> > > >>>>      -> Very late-in-the-day idea, not sure what it's worth, we also
> > > >>>> lose the 1GB hugepage...
> > > >>> Ditto here re: introducing another regression.
> > > >>>
> > > >>>> To me, the best solution is 3 as it would prepare for other similar
> > > >>>> issues later, it is similar to x86 and it allows us to keep 1GB
> > > >>>> hugepages.
> > > >>>>
> > > >>>> I have been thinking, and to me nomap does not provide anything since
> > > >>>> the kernel should not address this memory range, so if it does, we
> > > >>>> must fix the kernel.
> > > >>>>
> > > >>>> Let me know what you all think, I'll be preparing a PoC of 3 in the meantime!
> > > >>> #3 would probably get my vote too. It seems like you could use it
> > > >>> dynamically if there was to be a future other provider of "mmode_resv"
> > > >>> regions, rather than doing something location-specific.
> > > >>>
> > > >>> We should probably document these opensbi reserved memory nodes though
> > > >>> in a dt-binding or w/e if we are going to be relying on them to not
> > > >>> crash!
> > > > Depending on a particular node name is fragile. If we really need
> > > > information from DT then I suggest adding "no-save-restore" DT
> > > > property in reserved memory nodes.
> > >
> > >
> > > I understand your point, the node name is the only thing I found that
> > > would work with current opensbi: any other idea what we could use instead?
> > >
> > >
> > > >> Yes, you're right, let's see what Atish and Anup think!
> > > > I think we have two possible approaches:
> > > >
> > > > 1) Update OpenSBI to set "no-map" DT property for firmware
> > > >      reserved regions. We were doing this previously but removed
> > > >      it later for performance reasons mentioned by Alex. It is also
> > > >      worth mentioning that ARM Trusted Firmware also sets "no-map"
> > > >      DT property for firmware reserved regions.
> > > >
> > > > 2) Add a new "no-save-restore" DT property in the reserved
> > > >      memory DT bindings. The hibernate support of Linux arch/riscv
> > > >      will use this DT property to exclude memory regions from
> > > >      save-restore. The EFI implementation of EDK2 and U-Boot
> > > >      should do the following:
> > > >      1) Treat all memory having "no-map" DT property as EFI
> > > >          reserved memory
> > > >      2) Treat all memory not having "no-map" DT property and
> > > >          not having "no-save-restore" DT property as EfiBootServicesData
> > > >      3) Treat all memory not having "no-map" DT property and
> > > >           having "no-save-restore" DT property as EfiRuntimeServiceData
> > > >           (Refer,
> > > > https://devicetree-specification.readthedocs.io/en/latest/chapter3-devicenodes.html#reserved-memory-and-uefi)
> > > >
> > > > Personally, I am leaning towards approach#1 since approach#2
> > > > will require changing DeviceTree specification as well.
> > >
> > >
> > > If needed, indeed #1 is the simplest, but I insist, to me it is not
> > > needed (and we don't have it in the current opensbi), if you have
> > > another opinion, I'm open to discuss it!
> >
> > I agree with you, backward compatibility with older firmwares
> > is important.
> >
> > Let's go with your proposed change to treat reserved DT nodes
> > with "mmode_resv*" name as M-mode firmware memory (it could
> > be any M-mode firmware). We will certainly need to document it
> > somewhere as an expectation of Linux RISC-V kernel.
>
> Actually, you two both probably know the answer to this, but was there a
> release done of OpenSBI where the reserved memory region was not
> specified to be no-map?

The reserved memory regions are *currently* not specified to be no-map
and that since v0.8: the culprit is commit 6966ad0abe70
("platform/lib: Allow the OS to map the regions that are protected by
PMP").

So to make sure we are on the same page:

- from 0.1 to 0.7 => reserved memory regions were marked as no-map
- starting from 0.8 => reserved memory regions are *not* marked as
no-map anymore

Hope that's clear!

Alex

>
> >
> > @Sunil How about treating "mmode_resv*" as
> > EfiRuntimeServiceData in EDK2 ? Other reserved memory
> > nodes can follow the device tree specification.
>

  reply	other threads:[~2023-05-24 13:57 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-16  9:24 Bug report: kernel paniced when system hibernates Song Shuai
2023-05-16  9:55 ` JeeHeng Sia
2023-05-16 11:14   ` Alexandre Ghiti
2023-05-16 11:27     ` JeeHeng Sia
2023-05-17  8:33       ` Alexandre Ghiti
2023-05-18  4:06         ` JeeHeng Sia
2023-05-16 11:12 ` Alexandre Ghiti
2023-05-17  8:58   ` Alexandre Ghiti
2023-05-17 11:05     ` Song Shuai
     [not found]       ` <CAHVXubgjgMvFV0MOABbtKr+2TH85+0kow7wOrjxFCP5iXt1saQ@mail.gmail.com>
2023-05-17 14:42         ` Fwd: " Alexandre Ghiti
2023-05-18  1:29           ` JeeHeng Sia
2023-05-18  9:13             ` Alexandre Ghiti
2023-05-18  3:29           ` Song Shuai
2023-05-18 11:54             ` Alexandre Ghiti
2023-05-17 11:27     ` Conor Dooley
2023-05-17 14:55       ` Alexandre Ghiti
2023-05-18  6:53         ` Anup Patel
2023-05-18  7:59           ` Conor Dooley
2023-05-18  8:41             ` Alexandre Ghiti
2023-05-18 10:35               ` Conor Dooley
2023-05-18 11:58                 ` Alexandre Ghiti
2023-05-24 23:53                   ` Atish Patra
2023-05-25 12:55                     ` Conor Dooley
2023-05-18 12:21                 ` JeeHeng Sia
2023-05-18 12:09           ` Alexandre Ghiti
2023-05-18 14:04             ` Anup Patel
2023-05-24 13:49               ` Conor Dooley
2023-05-24 13:57                 ` Alexandre Ghiti [this message]
2023-05-24 15:59                   ` Conor Dooley
2023-05-24 23:45               ` Atish Patra
2023-05-25 13:08                 ` Conor Dooley
2023-05-25 13:21                   ` Anup Patel
2023-05-25 13:37                     ` Conor Dooley
2023-05-25 13:43                       ` Anup Patel
2023-05-25 13:55                         ` Conor Dooley
2023-05-25 13:59                           ` Anup Patel
2023-05-25 14:20                             ` Conor Dooley
2023-05-25 17:39                               ` Atish Patra
2023-05-25 18:22                                 ` Conor Dooley
2023-05-25 18:37                                   ` Atish Patra
2023-05-25 18:39                                     ` Conor Dooley
2023-05-25 20:06                                       ` Atish Patra
2023-05-25 21:24                                         ` Conor Dooley
2023-05-26 13:14                                           ` Alexandre Ghiti
2023-05-26 14:59                                             ` Conor Dooley
2023-05-26 15:12                                               ` Alexandre Ghiti
2023-05-26 15:17                                                 ` Anup Patel
2023-05-26 15:22                                                   ` Alexandre Ghiti
2023-05-26 18:48                                                     ` Atish Patra
2023-05-16 11:35 ` Conor Dooley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHVXubjhti9reD8oCh3Jm97tvFdUGi9ShYnm0NhQU4ZYVgU6HA@mail.gmail.com \
    --to=alexghiti@rivosinc.com \
    --cc=ajones@ventanamicro.com \
    --cc=alex@ghiti.fr \
    --cc=anup@brainfault.org \
    --cc=apatel@ventanamicro.com \
    --cc=conor.dooley@microchip.com \
    --cc=conor@kernel.org \
    --cc=guoren@kernel.org \
    --cc=jeeheng.sia@starfivetech.com \
    --cc=leyfoon.tan@starfivetech.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=mason.huo@starfivetech.com \
    --cc=palmer@rivosinc.com \
    --cc=paul.walmsley@sifive.com \
    --cc=robh@kernel.org \
    --cc=suagrfillet@gmail.com \
    --cc=sunilvl@ventanamicro.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).