From: Steve Wahl <steve.wahl@hpe.com>
To: Pavin Joseph <me@pavinjoseph.com>,
	Simon Horman <horms@verge.net.au>,
	kexec@lists.infradead.org, Eric Biederman <ebiederm@xmission.com>
Cc: Steve Wahl <steve.wahl@hpe.com>,
	Eric Hagberg <ehagberg@gmail.com>,
	dave.hansen@linux.intel.com, regressions@lists.linux.dev,
	stable@vger.kernel.org
Subject: Re: [REGRESSION] kexec does firmware reboot in kernel v6.7.6
Date: Tue, 12 Mar 2024 17:02:06 -0500
Message-ID: <ZfDQ3j6lOf9xgC04@swahl-home.5wahls.com>
In-Reply-To: <42e3e931-2883-4faf-8a15-2d7660120381@pavinjoseph.com>

[*really* added kexec maintainers this time.]

Full thread starts here:
https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph.com/

On Wed, Mar 13, 2024 at 12:12:31AM +0530, Pavin Joseph wrote:
> On 3/12/24 20:43, Steve Wahl wrote:
> > But I don't want to introduce a new command line parameter if the
> > actual problem can be understood and fixed.  The question is how much
> > time do I have to pursue a direct fix before some other action needs
> > to be taken?
> 
> Perhaps the kexec maintainers [0] can be made aware of this and you could
> coordinate with them on a potential fix?
> 
> Currently maintained by
> P:      Simon Horman
> M:      horms@verge.net.au
> L:      kexec@lists.infradead.org

Probably a good idea to add kexec people to the list, so I've added
them to this email.

Everyone, my recent patch to the kernel that changed identity mapping:

7143c5f4cf2073193 x86/mm/ident_map: Use gbpages only where full GB page should be mapped.

... has broken kexec on a few machines.  The symptom is that they do a
full BIOS reboot instead of a kexec of the new kernel.  So far it seems
to be limited to AMD processors, but not all AMD processors; the
affected machines probably just share some characteristic.

The same machines that are broken by my patch are also broken in
previous kernels if you add "nogbpages" to the kernel command line.
That option makes the identity map bigger: "nogbpages" does for all
parts of the identity map what my patch does only for some parts of it.
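
To illustrate why the identity map gets bigger, here is a rough sketch
(plain Python, not kernel code).  It assumes x86-64 paging where one
4 KiB PMD table maps 1 GiB in 2 MiB pages, and it only counts PMD
tables.  The function name and the three policy names are made up for
this example; "full_only" is my shorthand for the behaviour the patch
subject describes.

```python
# Hypothetical sketch, not kernel code: count the 4 KiB PMD tables an
# x86-64 identity map needs for the physical range [start, end).
GB = 1 << 30


def pmd_tables_needed(start, end, policy):
    """Return the number of PMD tables needed to map [start, end).

    policy (names are illustrative assumptions):
      "gbpages"   - every touched 1 GiB region gets a 1 GiB page
      "full_only" - 1 GiB page only where the range covers the whole region
      "nogbpages" - never use 1 GiB pages
    """
    if policy not in ("gbpages", "full_only", "nogbpages"):
        raise ValueError("unknown policy: %r" % (policy,))
    if end <= start:
        return 0

    tables = 0
    first_region = start // GB
    last_region = (end - 1) // GB
    for region in range(first_region, last_region + 1):
        region_start = region * GB
        region_end = region_start + GB
        fully_covered = start <= region_start and region_end <= end

        if policy == "gbpages":
            use_gb_page = True
        elif policy == "full_only":
            use_gb_page = fully_covered
        else:  # "nogbpages"
            use_gb_page = False

        # A region mapped with 2 MiB pages needs its own PMD table;
        # a region mapped with one 1 GiB page needs none.
        if not use_gb_page:
            tables += 1
    return tables
```

For a request covering 0.5 GiB to 3.5 GiB, this gives 0 PMD tables
with "gbpages", 2 with "full_only" (only the two partial regions at the
ends), and 4 with "nogbpages".  Each extra table is one more 4 KiB page
in the identity map.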

I'm still hoping to find a machine I can reproduce this on to try and
debug it myself.

If any of you have any assistance or advice to offer, it would be most
welcome!

> I hope the root cause can be fixed instead of patching it over with a flag
> to suppress the problem, but I don't know how regressions are handled here.

That would be my preference as well.

Thanks,

--> Steve Wahl

-- 
Steve Wahl, Hewlett Packard Enterprise

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
