linux-kernel.vger.kernel.org archive mirror
From: Chuck Zmudzinski <brchuckz@netscape.net>
To: Thorsten Leemhuis <regressions@leemhuis.info>,
	Jan Beulich <jbeulich@suse.com>
Cc: lkml <linux-kernel@vger.kernel.org>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Andrew Lutomirski <luto@kernel.org>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	the arch/x86 maintainers <x86@kernel.org>,
	Juergen Gross <jgross@suse.com>
Subject: Re: Ping: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen
Date: Thu, 14 Jul 2022 18:45:11 -0400
Message-ID: <a13b8cdd-8e9f-a917-9e61-1ce6eee8da1c@netscape.net>
In-Reply-To: <1a486b6d-037e-ac54-4279-286b4ae9452e@netscape.net>

On 7/14/2022 6:33 PM, Chuck Zmudzinski wrote:
> On 7/14/2022 1:17 PM, Chuck Zmudzinski wrote:
> > On 7/5/22 6:57 AM, Thorsten Leemhuis wrote:
> > > [CCing tglx, mingo, Boris and Juergen]
> > >
> > > On 04.07.22 14:26, Jan Beulich wrote:
> > > > On 04.07.2022 13:58, Thorsten Leemhuis wrote:
> > > >> On 25.05.22 10:55, Jan Beulich wrote:
> > > >>> On 28.04.2022 16:50, Jan Beulich wrote:
> > > >>>> The latest with commit bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT
> > > >>>> with pat_enabled()") pat_enabled() returning false (because of PAT
> > > >>>> initialization being suppressed in the absence of MTRRs being announced
> > > >>>> to be available) has become a problem: The i915 driver now fails to
> > > >>>> initialize when running PV on Xen (i915_gem_object_pin_map() is where I
> > > >>>> located the induced failure), and its error handling is flaky enough to
> > > >>>> (at least sometimes) result in a hung system.
> > > >>>>
> > > >>>> Yet even beyond that problem the keying of the use of WC mappings to
> > > >>>> pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular
> > > >>>> graphics frame buffer accesses would have been quite a bit less
> > > >>>> performant than possible.
> > > >>>>
> > > >>>> Arrange for the function to return true in such environments, without
> > > >>>> undermining the rest of PAT MSR management logic considering PAT to be
> > > >>>> disabled: Specifically, no writes to the PAT MSR should occur.
> > > >>>>
> > > >>>> For the new boolean to live in .init.data, init_cache_modes() also needs
> > > >>>> moving to .init.text (where it could/should have lived already before).
> > > >>>>
> > > >>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > > >>>
> > > >>> The Linux kernel regression tracker is pestering me because things are
> > > >>> taking so long (effectively quoting him), and alternative proposals
> > > >>> made so far look to have more severe downsides.
> > > >>
> > > >> Has any progress been made with this patch? It afaics is meant to fix
> > > >> this regression, which ideally should have been fixed weeks ago (btw:
> > > >> adding a "Link:" tag pointing to it would be good):
> > > >> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
> > > >>
> > > >> According to Juergen it's still needed:
> > > >> https://lore.kernel.org/lkml/c5515533-29a9-9e91-5a36-45f00f25b37b@suse.com/
> > > >>
> > > >> Or was a different solution found to fix that regression?
> > > > 
> > > > No progress and no alternatives I'm aware of.
> > >
> > > Getting closer to the point where I need to bring this to Linus'
> > > attention. I hope this mail can help avoid this.
> > >
> > > Jan, I didn't follow this closely, but do you have any idea why Dave,
> > > Luto, and Peter are ignoring this? Is reverting bdd8b6c98239 an option to
> > > get the regression fixed? Would a repost maybe help getting this rolling
> > > again?
> >
> > Hi, Thorsten,
> >
> > Here is a link to the hardware probe of my system which exhibits
> > a system hang before fully booting with bdd8b6c98239. Without
> > bdd8b6c98239, the problem is gone:
> >
> > https://linux-hardware.org/?probe=32e615b538
> >
> > Keep in mind this problem is not seen with bdd8b6c98239
> > on the bare metal, but only when running as a traditional Dom0
> > PV type guest on Xen. I don't see the problem on Xen HVM
> > DomU, and I have not tested it on Xen PVH DomU, Xen PV DomU,
> > or the experimental Xen PVH Dom0.
>
> Update: On affected hardware, you do not need to run in a
> Xen PV Dom0 to see the regression caused by bdd8b6c98239.
>
> All you need to do is run, on the bare metal, on the affected
> hardware, with the Linux kernel nopat boot option.
>
> Jan mentions in his commit message the function in the i915
> driver that was touched by bdd8b6c98239 and that causes this
> regression. That is, any Intel IGD that needs to execute the
> function that Jan mentions in the commit message of his
> proposed patch when the i915 driver is setting up the graphics
> engine will most likely be hardware that is affected. My Intel
> IGD was marketed as HD Graphics 4600, I think.
>
> So find a system with these hardware characteristics, and
> try running, with the nopat option, the Linux kernel, with
> and without bdd8b6c98239. You will see the regression I
> am experiencing, I predict.

This raises a disturbing question: The commit message of
bdd8b6c98239 mentions the nopat option. It does not specify what
effect the commit was supposed to have on a system booted
with the nopat option, but the actual effect, both with the
seldom used nopat option and in a Xen PV Dom0, is a nasty
regression on some older Intel IGD devices. My question:

Was this intentional? Or just grossly incompetent? Any other
possibilities?

I think you should definitely notify Linus about this if you can
verify the story I am telling here.
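
To make the story concrete, here is a toy model of the failure as I
understand it. It is only a sketch, not kernel code: the names BootEnv,
model_pat_enabled(), check_before(), check_after() and regresses() are
made up for illustration. It assumes that before bdd8b6c98239 the i915
driver keyed on the CPU feature flag (X86_FEATURE_PAT), that afterwards it
keys on pat_enabled(), and that pat_enabled() returns false both with the
nopat option and when MTRRs are not announced as available (the Xen PV
case Jan describes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootEnv:
    """Hypothetical summary of the boot environment (illustration only)."""
    cpu_has_pat: bool     # CPU advertises the PAT feature (X86_FEATURE_PAT)
    nopat: bool           # kernel was booted with the nopat option
    mtrr_announced: bool  # MTRRs announced as available (assumed False on Xen PV)


def model_pat_enabled(env: BootEnv) -> bool:
    """Assumed behavior of pat_enabled(): false with nopat, and false when
    PAT initialization is suppressed because MTRRs are not announced."""
    return env.cpu_has_pat and not env.nopat and env.mtrr_announced


def check_before(env: BootEnv) -> bool:
    """Assumed driver check before bdd8b6c98239: the CPU feature flag."""
    return env.cpu_has_pat


def check_after(env: BootEnv) -> bool:
    """Assumed driver check after bdd8b6c98239: pat_enabled()."""
    return model_pat_enabled(env)


def regresses(env: BootEnv) -> bool:
    """True when the check passed before the commit but fails after it."""
    return check_before(env) and not check_after(env)


if __name__ == "__main__":
    cases = {
        "bare metal": BootEnv(cpu_has_pat=True, nopat=False, mtrr_announced=True),
        "bare metal nopat": BootEnv(cpu_has_pat=True, nopat=True, mtrr_announced=True),
        "Xen PV Dom0": BootEnv(cpu_has_pat=True, nopat=False, mtrr_announced=False),
    }
    for name, env in cases.items():
        print(f"{name:18} before={check_before(env)} "
              f"after={check_after(env)} regresses={regresses(env)}")
```

If these assumptions hold, the model flags exactly the two configurations
where I see the hang (nopat on the bare metal, and Xen PV Dom0) and not the
plain bare metal boot.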

Chuck

>
> Chuck
>
> >
> > You can probably verify yourself that reverting bdd8b6c98239
> > fixes the regression if you try to reproduce it with any Linux
> > version that has bdd8b6c98239, or its equivalent on the stable
> > branches, on hardware similar to the profile of my machine linked
> > above, which exhibits the problem. Mine is a Haswell core-i5 4590S
> > CPU on an ASRock B85M-Pro4 motherboard.
> >
> > Also, other notes:
> >
> > 1. Yes, AFAICT, Marek at Qubes OS is the first to report the problem.
> > 2. Juergen Gross' work to try to fix this has been helpful, but none
> > of his posted patches has fixed the regression on my system.
> > 3. Jan's patch fixes it also, and so do the two patches I posted to lkml
> > earlier this week to the appropriate maintainers.
> > 4. On the pkg-xen-devel mailing list, which is public, this issue was
> > briefly discussed where I first reported it. Someone there said they
> > did not see the issue with Broadwell Xeons. Mine is a Haswell core i5,
> > which is one generation older than Broadwell, so you are most likely
> > to be able to reproduce the problem with a Haswell core i5 desktop
> > system like my ASRock system, which was my own private build
> > which has been working fine for eight years until Linux 5.17.y.
> >
> > Hope this helps.
> >
> > Chuck
> >
> > > BTW, for anyone new to this, Jan's patch afaics is supposed to fix the
> > > regression reported here:
> > > https://lore.kernel.org/all/YnHK1Z3o99eMXsVK@mail-itl/
> > >
> > > Side note: Juergen Gross recently posted related patches in this code
> > > area to fix some other problems (regressions?), but his efforts look
> > > stalled, too:
> > > https://lore.kernel.org/all/ddb0cc0d-cefc-4f33-23f8-3a94c7c51a49@suse.com/
> > >
> > > And he recently stated that Jan's patch is still needed, even if his
> > > changes make it in.
> > > https://lore.kernel.org/all/c5515533-29a9-9e91-5a36-45f00f25b37b@suse.com/
> > >
> > > From my point of view this all looks a bit... unsatisfying. :-/
> > >
> > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > >
> > > P.S.: As the Linux kernel's regression tracker I deal with a lot of
> > > reports and sometimes miss something important when writing mails like
> > > this. If that's the case here, don't hesitate to tell me in a public
> > > reply, it's in everyone's interest to set the public record straight.
> >
>
>



Thread overview: 35+ messages
2022-04-28 14:50 [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen Jan Beulich
2022-05-03 12:54 ` Juergen Gross
2022-05-11 13:32   ` Juergen Gross
2022-05-21 13:56 ` Chuck Zmudzinski
2022-05-25  8:55 ` Ping: " Jan Beulich
2022-07-04 11:58   ` Thorsten Leemhuis
2022-07-04 12:26     ` Jan Beulich
2022-07-05 10:57       ` Thorsten Leemhuis
2022-07-05 11:02         ` Jan Beulich
2022-07-05 13:36         ` Borislav Petkov
2022-07-05 13:38           ` Juergen Gross
2022-07-14 17:17         ` Chuck Zmudzinski
2022-07-14 22:33           ` Chuck Zmudzinski
2022-07-14 22:45             ` Chuck Zmudzinski [this message]
2022-07-19 14:26               ` Chuck Zmudzinski
2022-07-05 15:04 ` Borislav Petkov
2022-07-05 15:56   ` Jan Beulich
2022-07-05 16:14     ` Borislav Petkov
2022-07-06  6:17       ` Jan Beulich
2022-07-06 17:01         ` Borislav Petkov
2022-07-07  6:38           ` Jan Beulich
2022-07-11 10:40             ` Borislav Petkov
2022-07-11 11:38       ` Chuck Zmudzinski
2022-07-11 12:28       ` [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen, with corrected patch Chuck Zmudzinski
2022-07-11 14:18       ` [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen Chuck Zmudzinski
2022-07-11 14:31         ` Juergen Gross
2022-07-11 17:41           ` Chuck Zmudzinski
2022-07-12  5:49             ` Juergen Gross
2022-07-12  6:04             ` Jan Beulich
2022-07-12 13:22               ` Chuck Zmudzinski
2022-07-12 13:32                 ` Juergen Gross
2022-07-12 15:09                   ` Chuck Zmudzinski
2022-07-12 15:30                     ` Juergen Gross
2022-07-12 16:34                       ` Chuck Zmudzinski
2022-08-15 10:20 ` [tip: x86/urgent] x86/PAT: Have pat_enabled() properly reflect state when running on Xen tip-bot2 for Jan Beulich
