* [PATCH 0/1] meta-yocto: bump qemu preferred version to 4.4
@ 2016-02-11 15:15 Bruce Ashfield
  2016-02-11 15:15 ` [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel Bruce Ashfield
  0 siblings, 1 reply; 24+ messages in thread
From: Bruce Ashfield @ 2016-02-11 15:15 UTC (permalink / raw)
  To: richard.purdie; +Cc: poky

Hi all,

4.4 has been in test in various forms for over a month now, without major 
issues on any of the image types I've tried.  Now that we have -stable
updates to 4.4 (as well as -rt updates), things are looking good to make
it the default for the qemu machines.

I'm also at the limit of the combos I can build and test, so this needs to
go out for wider testing.

Cheers,

Bruce


The following changes since commit 4e7320cc81178fe17c870e987a526d3f29caf906:

  linux-yocto/4.1: galileo backports and support (2016-02-11 10:01:55 -0500)

are available in the git repository at:

  git://git.pokylinux.org/poky-contrib zedd/kernel-yocto
  http://git.pokylinux.org/cgit.cgi/poky-contrib/log/?h=zedd/kernel-yocto

Bruce Ashfield (1):
  poky: update qemu* to prefer 4.4 kernel

 meta-yocto/conf/distro/poky.conf | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

-- 
2.1.0




* [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
  2016-02-11 15:15 [PATCH 0/1] meta-yocto: bump qemu preferred version to 4.4 Bruce Ashfield
@ 2016-02-11 15:15 ` Bruce Ashfield
  2016-02-12 14:36   ` Richard Purdie
  0 siblings, 1 reply; 24+ messages in thread
From: Bruce Ashfield @ 2016-02-11 15:15 UTC (permalink / raw)
  To: richard.purdie; +Cc: poky

4.4 is out and has had enough mileage to be the default for the
qemu machines. Tested with sato, minimal and kernel dev image
types.

Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
---
 meta-yocto/conf/distro/poky.conf | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/meta-yocto/conf/distro/poky.conf b/meta-yocto/conf/distro/poky.conf
index dec364498553..ede728dde1fd 100644
--- a/meta-yocto/conf/distro/poky.conf
+++ b/meta-yocto/conf/distro/poky.conf
@@ -18,13 +18,13 @@ POKY_DEFAULT_EXTRA_RRECOMMENDS = "kernel-module-af-packet"
 
 DISTRO_FEATURES ?= "${DISTRO_FEATURES_DEFAULT} ${DISTRO_FEATURES_LIBC} ${POKY_DEFAULT_DISTRO_FEATURES}"
 
-PREFERRED_VERSION_linux-yocto ?= "4.1%"
-PREFERRED_VERSION_linux-yocto_qemux86 ?= "4.1%"
-PREFERRED_VERSION_linux-yocto_qemux86-64 ?= "4.1%"
-PREFERRED_VERSION_linux-yocto_qemuarm ?= "4.1%"
-PREFERRED_VERSION_linux-yocto_qemumips ?= "4.1%"
-PREFERRED_VERSION_linux-yocto_qemumips64 ?= "4.1%"
-PREFERRED_VERSION_linux-yocto_qemuppc ?= "4.1%"
+PREFERRED_VERSION_linux-yocto ?= "4.4%"
+PREFERRED_VERSION_linux-yocto_qemux86 ?= "4.4%"
+PREFERRED_VERSION_linux-yocto_qemux86-64 ?= "4.4%"
+PREFERRED_VERSION_linux-yocto_qemuarm ?= "4.4%"
+PREFERRED_VERSION_linux-yocto_qemumips ?= "4.4%"
+PREFERRED_VERSION_linux-yocto_qemumips64 ?= "4.4%"
+PREFERRED_VERSION_linux-yocto_qemuppc ?= "4.4%"
 
 SDK_NAME = "${DISTRO}-${TCLIBC}-${SDK_ARCH}-${IMAGE_BASENAME}-${TUNE_PKGARCH}"
 SDKPATH = "/opt/${DISTRO}/${SDK_VERSION}"
-- 
2.1.0




* Re: [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
  2016-02-11 15:15 ` [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel Bruce Ashfield
@ 2016-02-12 14:36   ` Richard Purdie
  2016-02-12 15:32     ` Richard Purdie
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Purdie @ 2016-02-12 14:36 UTC (permalink / raw)
  To: Bruce Ashfield; +Cc: poky

On Thu, 2016-02-11 at 10:15 -0500, Bruce Ashfield wrote:
> 4.4 is out and has had enough mileage to be the default for the
> qemu machines. Tested with sato, minimal and kernel dev image
> types.

This mostly worked except for qemux86, which is showing X failing on all
builds. It's quite reproducible:
"MACHINE=qemux86 bitbake core-image-sato -c testimage" shows it.

https://autobuilder.yoctoproject.org/main/builders/nightly-x86/builds/648/steps/Running%20Sanity%20Tests/logs/stdio
is one example from the autobuilder.

Poking at the image manually, it appears Xorg fails due to:

[3587706.730] (EE) XKB: Could not invoke xkbcomp
[3587706.730] (EE) XKB: Couldn't compile keymap
[3587706.731] (EE) XKB: Failed to load keymap. Loading default keymap instead.
[3587706.765] (EE) XKB: Could not invoke xkbcomp
[3587706.765] (EE) XKB: Couldn't compile keymap

which prompted me to look at dmesg:

EXT4-fs (vda): re-mounted. Opts: data=ordered
random: dd urandom read with 51 bits of entropy available
x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining
------------[ cut here ]------------
WARNING: CPU: 0 PID: 705 at /media/build1/poky/build/tmp/work-shared/qemux86/kernel-source/arch/x86/mm/pat.c:985 untrack_pfn+0xaf/0xc0()
Modules linked in: uvesafb
CPU: 0 PID: 705 Comm: Xorg Not tainted 4.4.1-yocto-standard #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
 00000000 00000000 cf14dd78 c1397ab2 00000000 cf14dda8 c1051477 c1aa4f6c
 00000000 000002c1 c1a9fa4c 000003d9 c104b98f c104b98f cf244000 b6355000
 00000000 cf14ddb8 c1051552 00000009 00000000 cf14dde0 c104b98f cf14ddd0
Call Trace:
 [<c1397ab2>] dump_stack+0x4b/0x79
 [<c1051477>] warn_slowpath_common+0x87/0xc0
 [<c104b98f>] ? untrack_pfn+0xaf/0xc0
 [<c104b98f>] ? untrack_pfn+0xaf/0xc0
 [<c1051552>] warn_slowpath_null+0x22/0x30
 [<c104b98f>] untrack_pfn+0xaf/0xc0
 [<c104d54c>] ? kmap_atomic_prot+0x3c/0xf0
 [<c114e17f>] unmap_single_vma+0x4ef/0x500
 [<c114f007>] unmap_vmas+0x37/0x50
 [<c1154f8f>] exit_mmap+0x5f/0xf0
 [<c104eedd>] mmput+0x2d/0xb0
 [<c105009c>] copy_process+0xd2c/0x13c0
 [<c1050892>] _do_fork+0x82/0x340
 [<c105f2d1>] ? SyS_rt_sigaction+0x51/0xa0
 [<c1050c3c>] SyS_clone+0x2c/0x30
 [<c1001a03>] do_syscall_32_irqs_on+0x53/0xb0
 [<c189a94a>] entry_INT80_32+0x2a/0x2a
---[ end trace be3e0a61097feddc ]---
x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period (net c1c271c0)

which fits since do_fork() is probably failing and would cause this
error. It looks like the above is only a warning but could be fatal
later I guess.

A quick look at upstream makes me wonder about:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d9fe4fab11976e56b2e992980bf6ce948bdf02ac

which changes the code in this area.

Any idea what is going on here?

Cheers,

Richard



* Re: [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
  2016-02-12 14:36   ` Richard Purdie
@ 2016-02-12 15:32     ` Richard Purdie
  2016-02-13  8:31       ` Richard Purdie
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Purdie @ 2016-02-12 15:32 UTC (permalink / raw)
  To: Bruce Ashfield; +Cc: poky

On Fri, 2016-02-12 at 14:36 +0000, Richard Purdie wrote:
> A quick look at upstream makes me wonder about:
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d9fe4fab11976e56b2e992980bf6ce948bdf02ac
> 
> which changes the code in this area.

FWIW this doesn't help, the problem still exists after applying it.

Setting CONFIG_X86_PAT to "is not set" in the defconfig did "fix" it,
unsurprisingly.

Cheers,

Richard



* Re: [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
  2016-02-12 15:32     ` Richard Purdie
@ 2016-02-13  8:31       ` Richard Purdie
  2016-02-13 17:17           ` Richard Purdie
  2016-02-13 18:16         ` Bruce Ashfield
  0 siblings, 2 replies; 24+ messages in thread
From: Richard Purdie @ 2016-02-13  8:31 UTC (permalink / raw)
  To: Bruce Ashfield; +Cc: poky

On Fri, 2016-02-12 at 15:32 +0000, Richard Purdie wrote:
> On Fri, 2016-02-12 at 14:36 +0000, Richard Purdie wrote:
> > A quick look at upstream makes me wonder about:
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d9fe4fab11976e56b2e992980bf6ce948bdf02ac
> > 
> > which changes the code in this area.
> 
> FWIW this doesn't help, the problem still exists after applying it.
> 
> Setting CONFIG_X86_PAT to is not set in the defconfig did "fix" it,
> unsurprisingly.

Also, in the last set of builds, despite booting with "nopat", we see:

https://autobuilder.yoctoproject.org/main/builders/nightly-x86-lsb/builds/637/steps/Running%20Sanity%20Tests/logs/stdio

Central error: [    9.298049] Failed to add WC MTRR for [fd000000-fdffffff]; performance may suffer.

Why we only see this on the lsb run I don't know, but that address looks
like the one causing problems elsewhere with PAT, so this may well be
related to uvesafb.

Cheers,

Richard



* Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
  2016-02-13  8:31       ` Richard Purdie
@ 2016-02-13 17:17           ` Richard Purdie
  2016-02-13 18:16         ` Bruce Ashfield
  1 sibling, 0 replies; 24+ messages in thread
From: Richard Purdie @ 2016-02-13 17:17 UTC (permalink / raw)
  To: Bruce Ashfield, openembedded-core, Hart, Darren, saul.wold,
	Paul Gortmaker
  Cc: poky

I'm moving the discussion to OE-Core and pulling in some kernel people.
I think I understand what is wrong and how to fix it but I could use
someone who actually knows this code.

To summarise the story so far, on qemux86, X doesn't start and there is
a backtrace in the logs:

x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining
------------[ cut here ]------------
WARNING: CPU: 0 PID: 705 at /media/build1/poky/build/tmp/work-shared/qemux86/kernel-source/arch/x86/mm/pat.c:985 untrack_pfn+0xaf/0xc0()
Modules linked in: uvesafb
CPU: 0 PID: 705 Comm: Xorg Not tainted 4.4.1-yocto-standard #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
 00000000 00000000 cf14dd78 c1397ab2 00000000 cf14dda8 c1051477 c1aa4f6c
 00000000 000002c1 c1a9fa4c 000003d9 c104b98f c104b98f cf244000 b6355000
 00000000 cf14ddb8 c1051552 00000009 00000000 cf14dde0 c104b98f cf14ddd0
Call Trace:
 [<c1397ab2>] dump_stack+0x4b/0x79
 [<c1051477>] warn_slowpath_common+0x87/0xc0
 [<c104b98f>] ? untrack_pfn+0xaf/0xc0
 [<c104b98f>] ? untrack_pfn+0xaf/0xc0
 [<c1051552>] warn_slowpath_null+0x22/0x30
 [<c104b98f>] untrack_pfn+0xaf/0xc0
 [<c104d54c>] ? kmap_atomic_prot+0x3c/0xf0
 [<c114e17f>] unmap_single_vma+0x4ef/0x500
 [<c114f007>] unmap_vmas+0x37/0x50
 [<c1154f8f>] exit_mmap+0x5f/0xf0
 [<c104eedd>] mmput+0x2d/0xb0
 [<c105009c>] copy_process+0xd2c/0x13c0
 [<c1050892>] _do_fork+0x82/0x340
 [<c105f2d1>] ? SyS_rt_sigaction+0x51/0xa0
 [<c1050c3c>] SyS_clone+0x2c/0x30
 [<c1001a03>] do_syscall_32_irqs_on+0x53/0xb0
 [<c189a94a>] entry_INT80_32+0x2a/0x2a
---[ end trace be3e0a61097feddc ]---
x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining

The entry in question is set up by uvesafb, which calls ioremap_wc() in
its uvesafb_ioremap() function.

It appears that Xorg mmaps this from userspace, then later does a
fork() to execute a utility. At this point, when creating the vmas for
the new process, the PAT code says "eeek!" because the protection mode
for the new vmas doesn't match the old one and returns -EINVAL; the
process dies and X goes with it.
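
For reference, the userspace sequence described above (map a device
region shared, then fork) can be sketched roughly as follows. This is a
minimal illustration, not Xorg's actual code: the helper name
map_and_fork() is hypothetical, and which device node and length Xorg
really maps is an assumption.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of `path` shared, then fork a
 * child that exits immediately.
 * Returns the child's exit status, or -1 if open(), mmap(), fork() or
 * waitpid() fails. According to the thread, on the affected kernel it
 * is the fork() step that fails when the mapping is the uvesafb
 * framebuffer.
 */
int map_and_fork(const char *path, size_t len)
{
    assert(path != NULL);
    assert(len > 0);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    int result = -1;
    pid_t pid = fork();         /* the kernel copies the parent's vmas here */
    if (pid == 0)
        _exit(0);               /* child: nothing to do */
    if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            result = WEXITSTATUS(status);
    }

    munmap(p, len);
    return result;
}
```

Calling map_and_fork("/dev/fb0", 4096) on the failing image might be one
way to reproduce this without X, assuming the framebuffer is exposed at
that path; a result of 0 would mean the fork went through.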

There are a few hammers we can hit this with: we can boot with the
"nopat" option, which makes the problem go away, or turn off
CONFIG_X86_PAT. No surprises there. Changing uvesafb to use mtrr=0
doesn't help since the ioremap_wc() call still happens.

The real issue is the "expected mapping type uncached-minus ... got
write-combining" message; it all goes wrong from there.

Upon looking at the code and scratching my head for a long while, I
notice that there are two ways of representing the protection mode
data, "enum page_cache_mode" and "pgprot_t & _PAGE_CACHE_MASK".

The exact meaning of pgprot_t depends on which CPU you're running on;
older CPUs have errata meaning only a small number of bits can be used.
The exact mapping table is determined by __cachemode2pte_tbl and is
updated at boot by calls from update_cache_mode_entry().

The result of this is that if you map enum -> pgprot_t, then try to do
pgprot_t -> enum, you can get different values since it's not a 1:1
mapping.

This means the comparison in reserve_pfn_range() where it does "pcm !=
want_pcm" isn't correct and can trigger even in cases where there isn't
a problem.

This can be "fixed" by doing cachemode2protval(pcm) !=
cachemode2protval(want_pcm) and checking whether the protection bits
match, rather than the enum values, since in reality this is what we
really care about.

I can confirm that if I make that change, X boots up just fine.
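
To illustrate the round-trip problem and the proposed comparison, here
is a small user-space model. It is only a sketch under stated
assumptions: the enum, the two tables and all function names are
made-up stand-ins, and the table values are illustrative rather than
the kernel's real __cachemode2pte_tbl contents.

```c
#include <assert.h>
#include <stdint.h>

/* Toy cache modes; names echo enum page_cache_mode, values are illustrative. */
enum cache_mode { MODE_WB, MODE_WC, MODE_UC_MINUS, MODE_UC, MODE_NUM };

#define BIT_PWT 0x1u
#define BIT_PCD 0x2u

/* mode -> PTE cache bits. Assumed limited PAT: WC shares UC-minus's bits. */
static const uint8_t mode2bits[MODE_NUM] = {
    [MODE_WB]       = 0,
    [MODE_WC]       = BIT_PCD,
    [MODE_UC_MINUS] = BIT_PCD,
    [MODE_UC]       = BIT_PCD | BIT_PWT,
};

/* PTE cache bits -> mode. Each bit pattern can name only one mode. */
static const enum cache_mode bits2mode[4] = {
    [0]                 = MODE_WB,
    [BIT_PWT]           = MODE_UC,        /* never produced by mode2bits here */
    [BIT_PCD]           = MODE_UC_MINUS,
    [BIT_PCD | BIT_PWT] = MODE_UC,
};

uint8_t mode_to_bits(enum cache_mode m)
{
    assert((unsigned)m < MODE_NUM);
    return mode2bits[m];
}

enum cache_mode bits_to_mode(uint8_t bits)
{
    return bits2mode[bits & (BIT_PWT | BIT_PCD)];
}

/* Check in the style of "pcm != want_pcm": compares the enum values. */
int mismatch_by_enum(enum cache_mode tracked, uint8_t vma_bits)
{
    return tracked != bits_to_mode(vma_bits);
}

/* Proposed check: compares the protection bits both modes translate to. */
int mismatch_by_bits(enum cache_mode tracked, uint8_t vma_bits)
{
    return mode_to_bits(tracked) != mode_to_bits(bits_to_mode(vma_bits));
}
```

In this model a region tracked as MODE_WC, whose vma carries
mode_to_bits(MODE_WC), decodes back to MODE_UC_MINUS. The enum
comparison therefore reports a mismatch while the bit comparison does
not; a genuinely different mode such as MODE_WB is still caught by both.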

The problem is I really have no idea what I'm doing :).

Could someone who understands this code have a look and see whether the
above makes sense and if it does, perhaps open a discussion with
upstream about how to fix this properly (assuming my change isn't
actually the correct fix)?

We don't see this on qemux86-64 since that has more PAT bits working
and hence the values map correctly.

Bruce: Would you accept a patch doing the above for now?

Cheers,

Richard






* Re: [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
  2016-02-13  8:31       ` Richard Purdie
  2016-02-13 17:17           ` Richard Purdie
@ 2016-02-13 18:16         ` Bruce Ashfield
  1 sibling, 0 replies; 24+ messages in thread
From: Bruce Ashfield @ 2016-02-13 18:16 UTC (permalink / raw)
  To: Richard Purdie; +Cc: poky


On Sat, Feb 13, 2016 at 3:31 AM, Richard Purdie <
richard.purdie@linuxfoundation.org> wrote:

> On Fri, 2016-02-12 at 15:32 +0000, Richard Purdie wrote:
> > On Fri, 2016-02-12 at 14:36 +0000, Richard Purdie wrote:
> > > A quick look at upstream makes me wonder about:
> > >
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/com
> > > mi
> > > t/?id=d9fe4fab11976e56b2e992980bf6ce948bdf02ac
> > >
> > > which changes the code in this area.
> >
> > FWIW this doesn't help, the problem still exists after applying it.
> >
> > Setting CONFIG_X86_PAT to is not set in the defconfig did "fix" it,
> > unsurprisingly.
>
> Also, in the last set of builds, despite booting with "nopat", we see:
>
> https://autobuilder.yoctoproject.org/main/builders/nightly-x86-lsb/buil
> ds/637/steps/Running%20Sanity%20Tests/logs/stdio
>
> Central error: [    9.298049] Failed to add WC MTRR for
> [fd000000-fdffffff]; performance may suffer.
>
> Why we only see this on the lsb run I don't know but that address looks
> like the one causing problems elsewhere with PAT. It could therefore
> look like this is related to uvesafb.
>

Interesting. And of course, as I was mentioning in my update discussion,
I ran out of cycles to test the 3x kernel types with LSB .. Murphy's law,
I suppose.

Bruce


>
> Cheers,
>
> Richard



-- 
"Thou shalt not follow the NULL pointer, for chaos and madness await thee
at its end"


* Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
  2016-02-13 17:17           ` Richard Purdie
@ 2016-02-13 18:19             ` Bruce Ashfield
  -1 siblings, 0 replies; 24+ messages in thread
From: Bruce Ashfield @ 2016-02-13 18:19 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Hart, Darren, saul.wold, poky, openembedded-core


On Sat, Feb 13, 2016 at 12:17 PM, Richard Purdie <
richard.purdie@linuxfoundation.org> wrote:

> I'm moving the discussion to OE-Core and pulling in some kernel people.
> I think I understand what is wrong and how to fix it but I could use
> someone who actually knows this code.
>
> To summarise the story so far, on qemux86, X doesn't start and there is
> a backtrace in the logs:
>
> x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem
> 0xfd000000-0xfdffffff], got write-combining
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 705 at
> /media/build1/poky/build/tmp/work-shared/qemux86/kernel-source/arch/x86/mm/pat.c:985
> untrack_pfn+0xaf/0xc0()
> Modules linked in: uvesafb
> CPU: 0 PID: 705 Comm: Xorg Not tainted 4.4.1-yocto-standard #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>  00000000 00000000 cf14dd78 c1397ab2 00000000 cf14dda8 c1051477 c1aa4f6c
>  00000000 000002c1 c1a9fa4c 000003d9 c104b98f c104b98f cf244000 b6355000
>  00000000 cf14ddb8 c1051552 00000009 00000000 cf14dde0 c104b98f cf14ddd0
> Call Trace:
>  [<c1397ab2>] dump_stack+0x4b/0x79
>  [<c1051477>] warn_slowpath_common+0x87/0xc0
>  [<c104b98f>] ? untrack_pfn+0xaf/0xc0
>  [<c104b98f>] ? untrack_pfn+0xaf/0xc0
>  [<c1051552>] warn_slowpath_null+0x22/0x30
>  [<c104b98f>] untrack_pfn+0xaf/0xc0
>  [<c104d54c>] ? kmap_atomic_prot+0x3c/0xf0
>  [<c114e17f>] unmap_single_vma+0x4ef/0x500
>  [<c114f007>] unmap_vmas+0x37/0x50
>  [<c1154f8f>] exit_mmap+0x5f/0xf0
>  [<c104eedd>] mmput+0x2d/0xb0
>  [<c105009c>] copy_process+0xd2c/0x13c0
>  [<c1050892>] _do_fork+0x82/0x340
>  [<c105f2d1>] ? SyS_rt_sigaction+0x51/0xa0
>  [<c1050c3c>] SyS_clone+0x2c/0x30
>  [<c1001a03>] do_syscall_32_irqs_on+0x53/0xb0
>  [<c189a94a>] entry_INT80_32+0x2a/0x2a
> ---[ end trace be3e0a61097feddc ]---
> x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem
> 0xfd000000-0xfdffffff], got write-combining
>
> The entry in question is setup by uvesafb which in its
> uvesafb_ioremap() function calls ioremap_wc().
>
> It appears that Xorg mmaps this from userspace, then later does a
> fork() to execute a utility. At this point, when creating the vmas for
> the new process, the pat code says "eeek!" as the protection mode for
> the new vmas don't match the old one, returns -EINVAL, the process dies
> and X goes with it.
>
> There are a few hammers we can hit this with, we can boot with "nopat"
> option which makes the problem go away, or turn off CONFIG_X86_PAT. No
> surprises there. Changing uvesafb to use mtrr=0 doesn't help since the
> ioremap_wc call still happens.
>
> The real issue is the "expected mapping type uncached-minus for got
> write-combining" message, it all goes wrong from there.
>
> Upon looking at the code and scratching my head for a long while, I
> notice that there are two ways of representing the protection mode
> data, "enum page_cache_mode" and "pgprot_t & _PAGE_CACHE_MASK".
>
> The exact meaning of pgprot_t depends on which CPU you're running,
> older CPUs have errata meaning only a small number of bits can be used.
> The exact mapping table is determined by __cachemode2pte_tbl and is
> updated at boot by calls from update_cache_mode_entry().
>
> The result of this if you map enum -> pgprot_t, then try to do pgprot_t
> -> enum, you can get different values since its not a 1:1 mapping.
>
> This means the comparison in reserve_pfn_range() where it does "pcm !=
> want_pcm" isn't correct and can trigger even in cases where there isn't
> a problem.
>
> This can be "fixed" by doing cachemode2protval(pcm) !=
> cachemode2protval(want_pcm) and checking whether the protection bits
> match, rather than the enum values, since in reality this is what we
> really care about.
>
> I can confirm that if I make that change, X boots up just fine.
>
> The problem is I really have no idea what I'm doing :).
>
> Could someone who understands this code have a look and see whether the
> above makes sense and if it does, perhaps open a discussion with
> upstream about how to fix this properly (assuming my change isn't
> actually the correct fix)?
>

I'm not familiar with this code either, but I'll start doing some bisects
to see if we can isolate the commit that introduced the behaviour change.
This is the sort of thing that is very rare, and really isn't supposed to
happen. So until I read the commit log where the change was triggered,
I'd lean to it being a bug and something to report and fix upstream.

I know that I booted sato with the 4.4 update. Do we have any other
confirmations that this worked, or was I the only one doing 4.4 boot
tests until I sent the patch to bump the preferred version?


>
> We don't see this on qemux86-64 since that has more PAT bits working
> and hence the values map correctly.
>
> Bruce: Would you accept a patch doing the above for now?
>

I would. I'd rather not paper over this by changing the config fragments,
since once we throw that switch, it'll most likely sit there .. covered up.

Bruce


>
> Cheers,
>
> Richard
>
>



-- 
"Thou shalt not follow the NULL pointer, for chaos and madness await thee
at its end"



* Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
  2016-02-13 17:17           ` Richard Purdie
@ 2016-02-14 16:29             ` Paul Gortmaker
  -1 siblings, 0 replies; 24+ messages in thread
From: Paul Gortmaker @ 2016-02-14 16:29 UTC (permalink / raw)
  To: Richard Purdie
  Cc: Bruce Ashfield, Hart, Darren, saul.wold, poky, openembedded-core

[Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel] On 13/02/2016 (Sat 17:17) Richard Purdie wrote:

> I'm moving the discussion to OE-Core and pulling in some kernel people.
> I think I understand what is wrong and how to fix it but I could use
> someone who actually knows this code.
> 
> To summarise the story so far, on qemux86, X doesn't start and there is
> a backtrace in the logs:
> 
> x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 705 at /media/build1/poky/build/tmp/work-shared/qemux86/kernel-source/arch/x86/mm/pat.c:985 untrack_pfn+0xaf/0xc0()
> Modules linked in: uvesafb
> CPU: 0 PID: 705 Comm: Xorg Not tainted 4.4.1-yocto-standard #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>  00000000 00000000 cf14dd78 c1397ab2 00000000 cf14dda8 c1051477 c1aa4f6c
>  00000000 000002c1 c1a9fa4c 000003d9 c104b98f c104b98f cf244000 b6355000
>  00000000 cf14ddb8 c1051552 00000009 00000000 cf14dde0 c104b98f cf14ddd0
> Call Trace:
>  [<c1397ab2>] dump_stack+0x4b/0x79
>  [<c1051477>] warn_slowpath_common+0x87/0xc0
>  [<c104b98f>] ? untrack_pfn+0xaf/0xc0
>  [<c104b98f>] ? untrack_pfn+0xaf/0xc0
>  [<c1051552>] warn_slowpath_null+0x22/0x30
>  [<c104b98f>] untrack_pfn+0xaf/0xc0
>  [<c104d54c>] ? kmap_atomic_prot+0x3c/0xf0
>  [<c114e17f>] unmap_single_vma+0x4ef/0x500
>  [<c114f007>] unmap_vmas+0x37/0x50
>  [<c1154f8f>] exit_mmap+0x5f/0xf0
>  [<c104eedd>] mmput+0x2d/0xb0
>  [<c105009c>] copy_process+0xd2c/0x13c0
>  [<c1050892>] _do_fork+0x82/0x340
>  [<c105f2d1>] ? SyS_rt_sigaction+0x51/0xa0
>  [<c1050c3c>] SyS_clone+0x2c/0x30
>  [<c1001a03>] do_syscall_32_irqs_on+0x53/0xb0
>  [<c189a94a>] entry_INT80_32+0x2a/0x2a
> ---[ end trace be3e0a61097feddc ]---
> x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining
> 
> The entry in question is setup by uvesafb which in its
> uvesafb_ioremap() function calls ioremap_wc().
> 
> It appears that Xorg mmaps this from userspace, then later does a
> fork() to execute a utility. At this point, when creating the vmas for
> the new process, the pat code says "eeek!" as the protection mode for
> the new vmas don't match the old one, returns -EINVAL, the process dies
> and X goes with it.
> 
> There are a few hammers we can hit this with, we can boot with "nopat"
> option which makes the problem go away, or turn off CONFIG_X86_PAT. No
> surprises there. Changing uvesafb to use mtrr=0 doesn't help since the
> ioremap_wc call still happens.

Disabling PAT for qemu wouldn't be some horrible crime in the end; the
help text for the Kconfig option itself says:

     Say N here if you see bootup problems (boot crash, boot hang,
     spontaneous reboots) or a non-working video driver.

...and in theory PAT and the older MTRR are supposed to be performance
enhancements but not critical to have present.  I find it hard to get
excited about qemu video performance through the vesa driver.  :)
That said, it would be nice to fully understand what went pear shaped.

> 
> The real issue is the "expected mapping type uncached-minus for got
> write-combining" message, it all goes wrong from there.
> 
> Upon looking at the code and scratching my head for a long while, I
> notice that there are two ways of representing the protection mode
> data, "enum page_cache_mode" and "pgprot_t & _PAGE_CACHE_MASK".
> 
> The exact meaning of pgprot_t depends on which CPU you're running,
> older CPUs have errata meaning only a small number of bits can be used.
> The exact mapping table is determined by __cachemode2pte_tbl and is
> updated at boot by calls from update_cache_mode_entry().
> 
> The result of this is that if you map enum -> pgprot_t, then try to do
> pgprot_t -> enum, you can get different values since it's not a 1:1 mapping.
> 
> This means the comparison in reserve_pfn_range() where it does "pcm !=
> want_pcm" isn't correct and can trigger even in cases where there isn't
> a problem.
> 
> This can be "fixed" by doing cachemode2protval(pcm) !=
> cachemode2protval(want_pcm) and checking whether the protection bits
> match, rather than the enum values, since in reality this is what we
> really care about.
> 
> I can confirm that if I make that change, X boots up just fine.
> 
> The problem is I really have no idea what I'm doing :).

I know the feeling.  :)   Usually I find that being able to pinpoint the
exact commit where things failed adds that final bit of information
needed to get to the bottom of things.  Bruce and I fought with disks
disappearing on qemu versatile a while back and with a bisect traced it
down to some cryptic PCI swizzle mess.  But without the bisect pointing
us at where it went wrong, I'm not sure what we'd have done.  This case
is probably not that bad; it sounds like you've got 95% of it figured
out already.

Anyway, to that end, I'm assuming that if we drop in the 4.1 kernel
(which presumably also has PAT enabled) and leave X11 and qemu alone,
things work.  If so, we can use the "debugpat" bootarg or debugfs to
compare the 4.1 and 4.4 PAT entries (as per Documentation/x86/pat.txt)
to see where things differ between the two kernels.

And/or we can look at some of the relevant changes between the two
versions (see below) and spot test reverts of any that look suspect.
Or, just jump to a brute force bisect, while keeping an eye on the
.config file along the way to ensure it remains consistent as we go.

If this is still unresolved Tues when I'm back in the office and
more easily able to test gfx issues, I'll look at doing bisection.

P.
--

Note: below listing does not account for gregKH stable or yocto changes!

paul@acer:~/git/linux-head$ git log --oneline ^v4.1 v4.4 arch/x86/mm/pat.c 
35a5a10 x86/mm/pat: Extend set_page_memtype() to support Write-Through type
d1b4bfb x86/mm/pat: Add pgprot_writethrough()
0d69bdf x86/mm/pat: Change reserve_memtype() for Write-Through type
d79a40c x86/mm/pat: Use 7th PAT MSR slot for Write-Through PAT type
7202fdb x86/mm/pat: Remove pat_enabled() checks
9cd25aa x86/mm/pat: Emulate PAT when it is disabled
9dac629 x86/mm/pat: Untangle pat_init()
fbe7193 x86/mm/pat: Export pat_enabled()
cb32edf x86/mm/pat: Wrap pat_enabled into a function API
9e76561 x86/mm/pat: Convert to pr_*() usage
b73522e x86/mm/mtrr: Enhance MTRR checks in kernel mapping helpers

paul@acer:~/git/linux-head$ git log --oneline ^v4.1 v4.4 arch/x86/include/asm/pgtable_types.h
70f15287 x86/mm: Fix regression with huge pages on PAE
f70abb0 x86/asm: Fix pud/pmd interfaces to handle large PAT bit
4be4c1f x86/asm: Add pud/pmd mask interfaces to handle large PAT bit
d1b4bfb x86/mm/pat: Add pgprot_writethrough()

paul@acer:~/git/linux-head$ git log --oneline ^v4.1 v4.4 drivers/video/fbdev/uvesafb.c
9c27847 kernel/params: constify struct kernel_param_ops uses

paul@acer:~/git/linux-head$ git log --no-merges --oneline ^v4.1 v4.4 arch/x86/mm/ioremap*
8a0a5da x86/mm: Fix newly introduced printk format warnings
9a58eeb x86/mm: Remove region_is_ram() call from ioremap
1c9cf9b x86/mm: Move warning from __ioremap_check_ram() to the call site
623dffb x86/mm/pat: Add set_memory_wt() for Write-Through type
d838270 x86/mm, asm-generic: Add ioremap_wt() for creating Write-Through mappings
7202fdb x86/mm/pat: Remove pat_enabled() checks
1e6277d x86/mm: Mark arch_ioremap_p{m,u}d_supported() __init
cb32edf x86/mm/pat: Wrap pat_enabled into a function API
e4b6be3 x86/mm: Add ioremap_uc() helper to map memory uncacheable (not UC-)
562bfca x86/mm: Clean up types in xlate_dev_mem_ptr() some more


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
  2016-02-13 17:17           ` Richard Purdie
@ 2016-03-02  1:41             ` Paul Gortmaker
  -1 siblings, 0 replies; 24+ messages in thread
From: Paul Gortmaker @ 2016-03-02  1:41 UTC (permalink / raw)
  To: Richard Purdie
  Cc: Bruce Ashfield, Hart, Darren, saul.wold, poky, openembedded-core

[Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel] On 13/02/2016 (Sat 17:17) Richard Purdie wrote:

> I'm moving the discussion to OE-Core and pulling in some kernel people.
> I think I understand what is wrong and how to fix it but I could use
> someone who actually knows this code.
> 
> To summarise the story so far, on qemux86, X doesn't start and there is
> a backtrace in the logs:
> 
> x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining

So Bruce helped me set up a reproducer locally today since he'd already
invested the time on that, and then I boiled that down to divorce it
from the slower steps of build-deploy-boot to make the bisect something
that mortal humans could tolerate.

Amusingly enough that led to:

commit 9cd25aac1f44f269de5ecea11f7d927f37f1d01c
Author: Borislav Petkov <bp@suse.de>
Date:   Thu Jun 4 18:55:10 2015 +0200

    x86/mm/pat: Emulate PAT when it is disabled

So while some of us were joking on IRC about the validity of forcibly
disabling PAT (via cmdline or Kconfig) as a workaround, the one line
shortlog above tells us that it wasn't so off the mark after all.

Bruce and I will decide what to do with this tomorrow, but since Richard
spent so much time on it, I thought he'd like to know this in the
interim.  Good times.   :-/

Paul.
--

> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 705 at /media/build1/poky/build/tmp/work-shared/qemux86/kernel-source/arch/x86/mm/pat.c:985 untrack_pfn+0xaf/0xc0()
> Modules linked in: uvesafb
> CPU: 0 PID: 705 Comm: Xorg Not tainted 4.4.1-yocto-standard #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>  00000000 00000000 cf14dd78 c1397ab2 00000000 cf14dda8 c1051477 c1aa4f6c
>  00000000 000002c1 c1a9fa4c 000003d9 c104b98f c104b98f cf244000 b6355000
>  00000000 cf14ddb8 c1051552 00000009 00000000 cf14dde0 c104b98f cf14ddd0
> Call Trace:
>  [<c1397ab2>] dump_stack+0x4b/0x79
>  [<c1051477>] warn_slowpath_common+0x87/0xc0
>  [<c104b98f>] ? untrack_pfn+0xaf/0xc0
>  [<c104b98f>] ? untrack_pfn+0xaf/0xc0
>  [<c1051552>] warn_slowpath_null+0x22/0x30
>  [<c104b98f>] untrack_pfn+0xaf/0xc0
>  [<c104d54c>] ? kmap_atomic_prot+0x3c/0xf0
>  [<c114e17f>] unmap_single_vma+0x4ef/0x500
>  [<c114f007>] unmap_vmas+0x37/0x50
>  [<c1154f8f>] exit_mmap+0x5f/0xf0
>  [<c104eedd>] mmput+0x2d/0xb0
>  [<c105009c>] copy_process+0xd2c/0x13c0
>  [<c1050892>] _do_fork+0x82/0x340
>  [<c105f2d1>] ? SyS_rt_sigaction+0x51/0xa0
>  [<c1050c3c>] SyS_clone+0x2c/0x30
>  [<c1001a03>] do_syscall_32_irqs_on+0x53/0xb0
>  [<c189a94a>] entry_INT80_32+0x2a/0x2a
> ---[ end trace be3e0a61097feddc ]---
> x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining
> 
> The entry in question is setup by uvesafb which in its
> uvesafb_ioremap() function calls ioremap_wc().
> 
> It appears that Xorg mmaps this from userspace, then later does a
> fork() to execute a utility. At this point, when creating the vmas for
> the new process, the pat code says "eeek!" as the protection mode for
> the new vmas don't match the old one, returns -EINVAL, the process dies
> and X goes with it.
> 
> There are a few hammers we can hit this with, we can boot with "nopat"
> option which makes the problem go away, or turn off CONFIG_X86_PAT. No
> surprises there. Changing uvesafb to use mtrr=0 doesn't help since the
> ioremap_wc call still happens.
> 
> The real issue is the "expected mapping type uncached-minus for got
> write-combining" message, it all goes wrong from there.
> 
> Upon looking at the code and scratching my head for a long while, I
> notice that there are two ways of representing the protection mode
> data, "enum page_cache_mode" and "pgprot_t & _PAGE_CACHE_MASK".
> 
> The exact meaning of pgprot_t depends on which CPU you're running,
> older CPUs have errata meaning only a small number of bits can be used.
> The exact mapping table is determined by __cachemode2pte_tbl and is
> updated at boot by calls from update_cache_mode_entry().
> 
> The result of this is that if you map enum -> pgprot_t, then try to do
> pgprot_t -> enum, you can get different values since it's not a 1:1 mapping.
> 
> This means the comparison in reserve_pfn_range() where it does "pcm !=
> want_pcm" isn't correct and can trigger even in cases where there isn't
> a problem.
> 
> This can be "fixed" by doing cachemode2protval(pcm) !=
> cachemode2protval(want_pcm) and checking whether the protection bits
> match, rather than the enum values, since in reality this is what we
> really care about.
> 
> I can confirm that if I make that change, X boots up just fine.
> 
> The problem is I really have no idea what I'm doing :).
> 
> Could someone who understands this code have a look and see whether the
> above makes sense and if it does, perhaps open a discussion with
> upstream about how to fix this properly (assuming my change isn't
> actually the correct fix)?
> 
> We don't see this on qemux86-64 since that has more PAT bits working
> and hence the values map correctly.
> 
> Bruce: Would you accept a patch doing the above for now?
> 
> Cheers,
> 
> Richard
> 
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
  2016-03-02  1:41             ` Paul Gortmaker
@ 2016-03-09 18:53               ` Bruce Ashfield
  -1 siblings, 0 replies; 24+ messages in thread
From: Bruce Ashfield @ 2016-03-09 18:53 UTC (permalink / raw)
  To: Paul Gortmaker, Richard Purdie
  Cc: Hart, Darren, saul.wold, poky, openembedded-core

On 2016-03-01 8:41 PM, Paul Gortmaker wrote:
> [Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel] On 13/02/2016 (Sat 17:17) Richard Purdie wrote:
>
>> I'm moving the discussion to OE-Core and pulling in some kernel people.
>> I think I understand what is wrong and how to fix it but I could use
>> someone who actually knows this code.
>>
>> To summarise the story so far, on qemux86, X doesn't start and there is
>> a backtrace in the logs:
>>
>> x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining
>
> So Bruce helped me set up a reproducer locally today since he'd already
> invested the time on that, and then I boiled that down to divorce it
> from the slower steps of build-deploy-boot to make the bisect something
> that mortal humans could tolerate.
>
> Amusingly enough that led to:
>
> commit 9cd25aac1f44f269de5ecea11f7d927f37f1d01c
> Author: Borislav Petkov <bp@suse.de>
> Date:   Thu Jun 4 18:55:10 2015 +0200
>
>      x86/mm/pat: Emulate PAT when it is disabled
>
> So while some of us were joking on IRC about the validity of forcibly
> disabling PAT (via cmdline or Kconfig) as a workaround, the one line
> shortlog above tells us that it wasn't so off the mark after all.
>
> Bruce and I will decide what to do with this tomorrow, but since Richard
> spent so much time on it, I thought he'd like to know this in the
> interim.  Good times.   :-/

As another follow up: the thread can be summarized as "It doesn't
look like it should have worked before, and qemu's PAT emulation
may be the issue".

The suggestion is to run with 'nopat', which is what Richard originally
did.

So I'm going to prep a patch that drops the kernel patch, and leaves
nopat enabled on the qemu command line. That should get us put back
together in a semi-permanent way.

Bruce

>
> Paul.
> --
>
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 705 at /media/build1/poky/build/tmp/work-shared/qemux86/kernel-source/arch/x86/mm/pat.c:985 untrack_pfn+0xaf/0xc0()
>> Modules linked in: uvesafb
>> CPU: 0 PID: 705 Comm: Xorg Not tainted 4.4.1-yocto-standard #1
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>>   00000000 00000000 cf14dd78 c1397ab2 00000000 cf14dda8 c1051477 c1aa4f6c
>>   00000000 000002c1 c1a9fa4c 000003d9 c104b98f c104b98f cf244000 b6355000
>>   00000000 cf14ddb8 c1051552 00000009 00000000 cf14dde0 c104b98f cf14ddd0
>> Call Trace:
>>   [<c1397ab2>] dump_stack+0x4b/0x79
>>   [<c1051477>] warn_slowpath_common+0x87/0xc0
>>   [<c104b98f>] ? untrack_pfn+0xaf/0xc0
>>   [<c104b98f>] ? untrack_pfn+0xaf/0xc0
>>   [<c1051552>] warn_slowpath_null+0x22/0x30
>>   [<c104b98f>] untrack_pfn+0xaf/0xc0
>>   [<c104d54c>] ? kmap_atomic_prot+0x3c/0xf0
>>   [<c114e17f>] unmap_single_vma+0x4ef/0x500
>>   [<c114f007>] unmap_vmas+0x37/0x50
>>   [<c1154f8f>] exit_mmap+0x5f/0xf0
>>   [<c104eedd>] mmput+0x2d/0xb0
>>   [<c105009c>] copy_process+0xd2c/0x13c0
>>   [<c1050892>] _do_fork+0x82/0x340
>>   [<c105f2d1>] ? SyS_rt_sigaction+0x51/0xa0
>>   [<c1050c3c>] SyS_clone+0x2c/0x30
>>   [<c1001a03>] do_syscall_32_irqs_on+0x53/0xb0
>>   [<c189a94a>] entry_INT80_32+0x2a/0x2a
>> ---[ end trace be3e0a61097feddc ]---
>> x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining
>>
>> The entry in question is setup by uvesafb which in its
>> uvesafb_ioremap() function calls ioremap_wc().
>>
>> It appears that Xorg mmaps this from userspace, then later does a
>> fork() to execute a utility. At this point, when creating the vmas for
>> the new process, the pat code says "eeek!" as the protection mode for
>> the new vmas don't match the old one, returns -EINVAL, the process dies
>> and X goes with it.
>>
>> There are a few hammers we can hit this with, we can boot with "nopat"
>> option which makes the problem go away, or turn off CONFIG_X86_PAT. No
>> surprises there. Changing uvesafb to use mtrr=0 doesn't help since the
>> ioremap_wc call still happens.
>>
>> The real issue is the "expected mapping type uncached-minus for got
>> write-combining" message, it all goes wrong from there.
>>
>> Upon looking at the code and scratching my head for a long while, I
>> notice that there are two ways of representing the protection mode
>> data, "enum page_cache_mode" and "pgprot_t & _PAGE_CACHE_MASK".
>>
>> The exact meaning of pgprot_t depends on which CPU you're running,
>> older CPUs have errata meaning only a small number of bits can be used.
>> The exact mapping table is determined by __cachemode2pte_tbl and is
>> updated at boot by calls from update_cache_mode_entry().
>>
>> The result of this is that if you map enum -> pgprot_t, then try to do
>> pgprot_t -> enum, you can get different values since it's not a 1:1 mapping.
>>
>> This means the comparison in reserve_pfn_range() where it does "pcm !=
>> want_pcm" isn't correct and can trigger even in cases where there isn't
>> a problem.
>>
>> This can be "fixed" by doing cachemode2protval(pcm) !=
>> cachemode2protval(want_pcm) and checking whether the protection bits
>> match, rather than the enum values, since in reality this is what we
>> really care about.
>>
>> I can confirm that if I make that change, X boots up just fine.
>>
>> The problem is I really have no idea what I'm doing :).
>>
>> Could someone who understands this code have a look and see whether the
>> above makes sense and if it does, perhaps open a discussion with
>> upstream about how to fix this properly (assuming my change isn't
>> actually the correct fix)?
>>
>> We don't see this on qemux86-64 since that has more PAT bits working
>> and hence the values map correctly.
>>
>> Bruce: Would you accept a patch doing the above for now?
>>
>> Cheers,
>>
>> Richard
>>
>>




* Re: [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
@ 2016-03-09 18:53               ` Bruce Ashfield
  0 siblings, 0 replies; 24+ messages in thread
From: Bruce Ashfield @ 2016-03-09 18:53 UTC (permalink / raw)
  To: Paul Gortmaker, Richard Purdie
  Cc: Hart, Darren, saul.wold, poky, openembedded-core

On 2016-03-01 8:41 PM, Paul Gortmaker wrote:
> [Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel] On 13/02/2016 (Sat 17:17) Richard Purdie wrote:
>
>> I'm moving the discussion to OE-Core and pulling in some kernel people.
>> I think I understand what is wrong and how to fix it but I could use
>> someone who actually knows this code.
>>
>> To summarise the story so far, on qemux86, X doesn't start and there is
>> a backtrace in the logs:
>>
>> x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining
>
> So Bruce helped me set up a reproducer locally today since he'd already
> invested the time on that, and then I boiled that down to divorce it
> from the slower steps of build-deploy-boot to make the bisect something
> that mortal humans could tolerate.
>
> Amusingly enough that led to:
>
> commit 9cd25aac1f44f269de5ecea11f7d927f37f1d01c
> Author: Borislav Petkov <bp@suse.de>
> Date:   Thu Jun 4 18:55:10 2015 +0200
>
>      x86/mm/pat: Emulate PAT when it is disabled
>
> So while some of us were joking on IRC about the validity of forcibly
> disabling PAT (via cmdline or Kconfig) as a workaround, the one line
> shortlog above tells us that it wasn't so off the mark after all.
>
> Bruce and I will decide what to do with this tomorrow, but since Richard
> spent so much time on it, I thought he'd like to know this in the
> interim.  Good times.   :-/

As another follow-up, the thread can be summarized as "It doesn't
look like it should have worked before, and qemu's PAT emulation
may be the issue".

The suggestion is to run with 'nopat', which is what Richard originally
did.

So I'm going to prep a patch that drops the kernel patch, and leaves
nopat enabled on the qemu command line. That should get us put back
together in a semi-permanent way.

Bruce

>
> Paul.
> --
>
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 705 at /media/build1/poky/build/tmp/work-shared/qemux86/kernel-source/arch/x86/mm/pat.c:985 untrack_pfn+0xaf/0xc0()
>> Modules linked in: uvesafb
>> CPU: 0 PID: 705 Comm: Xorg Not tainted 4.4.1-yocto-standard #1
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>>   00000000 00000000 cf14dd78 c1397ab2 00000000 cf14dda8 c1051477 c1aa4f6c
>>   00000000 000002c1 c1a9fa4c 000003d9 c104b98f c104b98f cf244000 b6355000
>>   00000000 cf14ddb8 c1051552 00000009 00000000 cf14dde0 c104b98f cf14ddd0
>> Call Trace:
>>   [<c1397ab2>] dump_stack+0x4b/0x79
>>   [<c1051477>] warn_slowpath_common+0x87/0xc0
>>   [<c104b98f>] ? untrack_pfn+0xaf/0xc0
>>   [<c104b98f>] ? untrack_pfn+0xaf/0xc0
>>   [<c1051552>] warn_slowpath_null+0x22/0x30
>>   [<c104b98f>] untrack_pfn+0xaf/0xc0
>>   [<c104d54c>] ? kmap_atomic_prot+0x3c/0xf0
>>   [<c114e17f>] unmap_single_vma+0x4ef/0x500
>>   [<c114f007>] unmap_vmas+0x37/0x50
>>   [<c1154f8f>] exit_mmap+0x5f/0xf0
>>   [<c104eedd>] mmput+0x2d/0xb0
>>   [<c105009c>] copy_process+0xd2c/0x13c0
>>   [<c1050892>] _do_fork+0x82/0x340
>>   [<c105f2d1>] ? SyS_rt_sigaction+0x51/0xa0
>>   [<c1050c3c>] SyS_clone+0x2c/0x30
>>   [<c1001a03>] do_syscall_32_irqs_on+0x53/0xb0
>>   [<c189a94a>] entry_INT80_32+0x2a/0x2a
>> ---[ end trace be3e0a61097feddc ]---
>> x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining
>>
>> The entry in question is set up by uvesafb, which in its
>> uvesafb_ioremap() function calls ioremap_wc().
>>
>> It appears that Xorg mmaps this from userspace, then later does a
>> fork() to execute a utility. At this point, when creating the vmas for
>> the new process, the pat code says "eeek!" as the protection mode for
>> the new vmas doesn't match the old one, returns -EINVAL, the process dies
>> and X goes with it.
>>
>> There are a few hammers we can hit this with, we can boot with "nopat"
>> option which makes the problem go away, or turn off CONFIG_X86_PAT. No
>> surprises there. Changing uvesafb to use mtrr=0 doesn't help since the
>> ioremap_wc call still happens.
>>
>> The real issue is the "expected mapping type uncached-minus for got
>> write-combining" message, it all goes wrong from there.
>>
>> Upon looking at the code and scratching my head for a long while, I
>> notice that there are two ways of representing the protection mode
>> data, "enum page_cache_mode" and "pgprot_t & _PAGE_CACHE_MASK".
>>
>> The exact meaning of pgprot_t depends on which CPU you're running,
>> older CPUs have errata meaning only a small number of bits can be used.
>> The exact mapping table is determined by __cachemode2pte_tbl and is
>> updated at boot by calls from update_cache_mode_entry().
>>
>> The result of this is that if you map enum -> pgprot_t, then try to do
>> pgprot_t -> enum, you can get different values since it's not a 1:1 mapping.
>>
>> This means the comparison in reserve_pfn_range() where it does "pcm !=
>> want_pcm" isn't correct and can trigger even in cases where there isn't
>> a problem.
>>
>> This can be "fixed" by doing cachemode2protval(pcm) !=
>> cachemode2protval(want_pcm) and checking whether the protection bits
>> match, rather than the enum values, since in reality this is what we
>> really care about.
>>
>> I can confirm that if I make that change, X boots up just fine.
>>
>> The problem is I really have no idea what I'm doing :).
>>
>> Could someone who understands this code have a look and see whether the
>> above makes sense and if it does, perhaps open a discussion with
>> upstream about how to fix this properly (assuming my change isn't
>> actually the correct fix)?
>>
>> We don't see this on qemux86-64 since that has more PAT bits working
>> and hence the values map correctly.
>>
>> Bruce: Would you accept a patch doing the above for now?
>>
>> Cheers,
>>
>> Richard
>>
>>
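
The enum-versus-protection-bits mismatch described in the quoted
analysis can be illustrated with a toy model. This is a minimal sketch,
not kernel code: the CacheMode names echo "enum page_cache_mode", but
the bit values and both lookup tables are invented for illustration and
are not the contents of __cachemode2pte_tbl. It only shows why comparing
enum values can report a conflict when the underlying bits are identical.

```python
from enum import Enum


class CacheMode(Enum):
    """Toy stand-in for the kernel's enum page_cache_mode (assumed subset)."""
    WB = 0
    WC = 1
    UC_MINUS = 2


# Hypothetical forward table: on this imagined CPU, write-combining is not
# available, so WC falls back to the same bits as UC_MINUS.
MODE_TO_BITS = {
    CacheMode.WB: 0b00,
    CacheMode.WC: 0b10,
    CacheMode.UC_MINUS: 0b10,
}

# Hypothetical reverse table: each bit pattern can name only one mode,
# so the enum -> bits -> enum round trip is lossy.
BITS_TO_MODE = {
    0b00: CacheMode.WB,
    0b10: CacheMode.UC_MINUS,
}


def round_trip(mode):
    """Map a mode to protection bits and back again."""
    return BITS_TO_MODE[MODE_TO_BITS[mode]]


def conflict_by_enum(have, want):
    """Compare the enum values directly (the style of check that trips)."""
    return have != want


def conflict_by_bits(have, want):
    """Compare the protection bits each mode maps to instead."""
    return MODE_TO_BITS[have] != MODE_TO_BITS[want]


if __name__ == "__main__":
    requested = CacheMode.WC
    recovered = round_trip(requested)
    print("requested:", requested.name, "recovered:", recovered.name)
    print("conflict by enum:", conflict_by_enum(requested, recovered))
    print("conflict by bits:", conflict_by_bits(requested, recovered))
```

In this toy model a write-combining request comes back as
uncached-minus, which mirrors the shape of the "expected mapping type
uncached-minus ... got write-combining" message, while the bit-level
comparison sees no difference.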




* Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
  2016-03-09 18:53               ` Bruce Ashfield
@ 2016-03-09 21:23                 ` Richard Purdie
  -1 siblings, 0 replies; 24+ messages in thread
From: Richard Purdie @ 2016-03-09 21:23 UTC (permalink / raw)
  To: Bruce Ashfield, Paul Gortmaker
  Cc: Hart, Darren, saul.wold, poky, openembedded-core

On Wed, 2016-03-09 at 13:53 -0500, Bruce Ashfield wrote:
> On 2016-03-01 8:41 PM, Paul Gortmaker wrote:
> > [Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel] On
> > 13/02/2016 (Sat 17:17) Richard Purdie wrote:
> > 
> > > I'm moving the discussion to OE-Core and pulling in some kernel
> > > people.
> > > I think I understand what is wrong and how to fix it but I could
> > > use
> > > someone who actually knows this code.
> > > 
> > > To summarise the story so far, on qemux86, X doesn't start and
> > > there is
> > > a backtrace in the logs:
> > > 
> > > x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus
> > > for [mem 0xfd000000-0xfdffffff], got write-combining
> > 
> > So Bruce helped me set up a reproducer locally today since he'd
> > already
> > invested the time on that, and then I boiled that down to divorce
> > it
> > from the slower steps of build-deploy-boot to make the bisect
> > something
> > that mortal humans could tolerate.
> > 
> > Amusingly enough that led to:
> > 
> > commit 9cd25aac1f44f269de5ecea11f7d927f37f1d01c
> > Author: Borislav Petkov <bp@suse.de>
> > Date:   Thu Jun 4 18:55:10 2015 +0200
> > 
> >      x86/mm/pat: Emulate PAT when it is disabled
> > 
> > So while some of us were joking on IRC about the validity of
> > forcibly
> > disabling PAT (via cmdline or Kconfig) as a workaround, the one
> > line
> > shortlog above tells us that it wasn't so off the mark after all.
> > 
> > Bruce and I will decide what to do with this tomorrow, but since
> > Richard
> > spent so much time on it, I thought he'd like to know this in the
> > interim.  Good times.   :-/
> 
> As another follow-up, the thread can be summarized as "It doesn't
> look like it should have worked before, and qemu's PAT emulation
> may be the issue".
> 
> The suggestion is to run with 'nopat', which is what Richard
> originally
> did.
> 
> So I'm going to prep a patch that drops the kernel patch, and leaves
> nopat enabled on the qemu command line. That should get us put back
> together in a semi-permanent way.

How sure are we this is a bug in QEMU's pat emulation? If that is the
case we should file a bug against qemu and try and fix it rather than
work around it...

Cheers,

Richard




* Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
  2016-03-09 21:23                 ` Richard Purdie
@ 2016-03-10  4:12                   ` Bruce Ashfield
  -1 siblings, 0 replies; 24+ messages in thread
From: Bruce Ashfield @ 2016-03-10  4:12 UTC (permalink / raw)
  To: Richard Purdie, Paul Gortmaker
  Cc: Hart, Darren, saul.wold, poky, openembedded-core

On 2016-03-09 4:23 PM, Richard Purdie wrote:
> On Wed, 2016-03-09 at 13:53 -0500, Bruce Ashfield wrote:
>> On 2016-03-01 8:41 PM, Paul Gortmaker wrote:
>>> [Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel] On
>>> 13/02/2016 (Sat 17:17) Richard Purdie wrote:
>>>
>>>> I'm moving the discussion to OE-Core and pulling in some kernel
>>>> people.
>>>> I think I understand what is wrong and how to fix it but I could
>>>> use
>>>> someone who actually knows this code.
>>>>
>>>> To summarise the story so far, on qemux86, X doesn't start and
>>>> there is
>>>> a backtrace in the logs:
>>>>
>>>> x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus
>>>> for [mem 0xfd000000-0xfdffffff], got write-combining
>>>
>>> So Bruce helped me set up a reproducer locally today since he'd
>>> already
>>> invested the time on that, and then I boiled that down to divorce
>>> it
>>> from the slower steps of build-deploy-boot to make the bisect
>>> something
>>> that mortal humans could tolerate.
>>>
>>> Amusingly enough that led to:
>>>
>>> commit 9cd25aac1f44f269de5ecea11f7d927f37f1d01c
>>> Author: Borislav Petkov <bp@suse.de>
>>> Date:   Thu Jun 4 18:55:10 2015 +0200
>>>
>>>       x86/mm/pat: Emulate PAT when it is disabled
>>>
>>> So while some of us were joking on IRC about the validity of
>>> forcibly
>>> disabling PAT (via cmdline or Kconfig) as a workaround, the one
>>> line
>>> shortlog above tells us that it wasn't so off the mark after all.
>>>
>>> Bruce and I will decide what to do with this tomorrow, but since
>>> Richard
>>> spent so much time on it, I thought he'd like to know this in the
>>> interim.  Good times.   :-/
>>
>> As another follow-up, the thread can be summarized as "It doesn't
>> look like it should have worked before, and qemu's PAT emulation
>> may be the issue".
>>
>> The suggestion is to run with 'nopat', which is what Richard
>> originally
>> did.
>>
>> So I'm going to prep a patch that drops the kernel patch, and leaves
>> nopat enabled on the qemu command line. That should get us put back
>> together in a semi-permanent way.
>
> How sure are we this is a bug in QEMU's pat emulation? If that is the
> case we should file a bug against qemu and try and fix it rather than
> work around it...

It could still be something that the kernel can work around. Toshi
did say:

There is a matter of how qemu emulates CPU features.  There is no such
Intel CPU that supports PAT w/o MTRR.  This is why the current code
assumes this dependency.

That is likely the trigger. We've sent information about the CPU to
him, and with that there's a chance of a PAT fix.

He repeated our thought of running with 'nopat' while a fix is
considered.

It may be some time before that happens, and I was going to test
with the kernel patch dropped, and nopat in the qemu boot args. If
that works, I'd rather run with that, and then revisit when (if)
there are more changes upstream.
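
As a rough illustration of what "nopat in the qemu boot args" amounts
to: nopat is an x86 kernel command-line parameter, and when QEMU boots
a kernel directly the command line is passed with its -append option.
The helper below is a minimal sketch, not the runqemu change itself; the
function name and the sample command line are assumptions made up for
this example.

```python
def with_kernel_param(cmdline, param):
    """Return cmdline with param appended, unless it is already present.

    The command line is treated as whitespace-separated tokens, which is
    enough for a bare flag such as "nopat".
    """
    params = cmdline.split()
    if param not in params:
        params.append(param)
    return " ".join(params)


if __name__ == "__main__":
    # Hypothetical starting command line, for illustration only.
    base = "root=/dev/vda rw"
    print(with_kernel_param(base, "nopat"))
```

Making the helper idempotent means it can be applied to a command line
that already carries the flag without duplicating it.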

Bruce

>
> Cheers,
>
> Richard
>





* Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
  2016-03-10  4:12                   ` Bruce Ashfield
@ 2016-03-10 20:59                     ` Richard Purdie
  -1 siblings, 0 replies; 24+ messages in thread
From: Richard Purdie @ 2016-03-10 20:59 UTC (permalink / raw)
  To: Bruce Ashfield, Paul Gortmaker
  Cc: Hart, Darren, saul.wold, poky, openembedded-core

On Wed, 2016-03-09 at 23:12 -0500, Bruce Ashfield wrote:
> On 2016-03-09 4:23 PM, Richard Purdie wrote:
> > On Wed, 2016-03-09 at 13:53 -0500, Bruce Ashfield wrote:
> > > As another follow-up, the thread can be summarized as "It doesn't
> > > look like it should have worked before, and qemu's PAT emulation
> > > may be the issue".
> > > 
> > > The suggestion is to run with 'nopat', which is what Richard
> > > originally
> > > did.
> > > 
> > > So I'm going to prep a patch that drops the kernel patch, and
> > > leaves
> > > nopat enabled on the qemu command line. That should get us put
> > > back
> > > together in a semi-permanent way.
> > 
> > How sure are we this is a bug in QEMU's pat emulation? If that is
> > the
> > case we should file a bug against qemu and try and fix it rather
> > than
> > work around it...
> 
> It could still be something that the kernel can work around, Toshi
> did say:
> 
> There is a matter of how qemu emulates CPU features.  There is no
> such
> Intel CPU that supports PAT w/o MTRR.  This is why the current code
> assumes this dependency.
> 
> That is likely the trigger. We've sent information about the CPU to
> him, and with that there's a chance of a PAT fix.
> 
> He repeated our thought of running with 'nopat' while a fix is
> considered.
> 
> It may be some time before that happens, and I was going to test
> with the kernel patch dropped, and nopat in the qemu boot args. If
> that works, I'd rather run with that, and then revisit when (if)
> there's more changes upstream.

Reading the other thread, it looks like if MTRR is disabled, PAT needs
to be disabled too. That sounds like a simple enough patch which is
going upstream imminently so I think the preferred solution is to get
that into our kernels and then drop my patch?
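
The condition under discussion (a CPU model that advertises PAT but
not MTRR) can be checked from userspace on an x86 Linux guest by
looking at the "flags" line of /proc/cpuinfo. The following is a minimal
sketch under that assumption; it is not the upstream kernel patch, and
the function names are made up for this example.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first "flags" line.

    cpuinfo_text is expected to look like the contents of /proc/cpuinfo
    on x86, where each line has the form "key : value".
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pat_without_mtrr(flags):
    """True when PAT is advertised but MTRR is not."""
    return "pat" in flags and "mtrr" not in flags


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        flags = cpu_flags(f.read())
    print("PAT without MTRR:", pat_without_mtrr(flags))
```

Keeping the parsing separate from the file read lets the check be
exercised against a captured cpuinfo dump from the emulated machine.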

Cheers,

Richard




* Re: [poky] [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel
  2016-03-10 20:59                     ` Richard Purdie
@ 2016-03-10 21:55                       ` Bruce Ashfield
  -1 siblings, 0 replies; 24+ messages in thread
From: Bruce Ashfield @ 2016-03-10 21:55 UTC (permalink / raw)
  To: Richard Purdie, Paul Gortmaker
  Cc: Hart, Darren, saul.wold, poky, openembedded-core

On 2016-03-10 3:59 PM, Richard Purdie wrote:
> On Wed, 2016-03-09 at 23:12 -0500, Bruce Ashfield wrote:
>> On 2016-03-09 4:23 PM, Richard Purdie wrote:
>>> On Wed, 2016-03-09 at 13:53 -0500, Bruce Ashfield wrote:
>>>> As another follow-up, the thread can be summarized as "It doesn't
>>>> look like it should have worked before, and qemu's PAT emulation
>>>> may be the issue".
>>>>
>>>> The suggestion is to run with 'nopat', which is what Richard
>>>> originally
>>>> did.
>>>>
>>>> So I'm going to prep a patch that drops the kernel patch, and
>>>> leaves
>>>> nopat enabled on the qemu command line. That should get us put
>>>> back
>>>> together in a semi-permanent way.
>>>
>>> How sure are we this is a bug in QEMU's pat emulation? If that is
>>> the
>>> case we should file a bug against qemu and try and fix it rather
>>> than
>>> work around it...
>>
>> It could still be something that the kernel can work around, Toshi
>> did say:
>>
>> There is a matter of how qemu emulates CPU features.  There is no
>> such
>> Intel CPU that supports PAT w/o MTRR.  This is why the current code
>> assumes this dependency.
>>
>> That is likely the trigger. We've sent information about the CPU to
>> him, and with that there's a chance of a PAT fix.
>>
>> He repeated our thought of running with 'nopat' while a fix is
>> considered.
>>
>> It may be some time before that happens, and I was going to test
>> with the kernel patch dropped, and nopat in the qemu boot args. If
>> that works, I'd rather run with that, and then revisit when (if)
>> there's more changes upstream.
>
> Reading the other thread, it looks like if MTRR is disabled, PAT needs
> to be disabled too. That sounds like a simple enough patch which is
> going upstream imminently so I think the preferred solution is to get
> that into our kernels and then drop my patch?

Yep. It looks like a patch is being proposed that disables PAT based
on the flags of the CPU model.

When that's out, I'll merge it, and then we'll drop this. So for now,
we wait a bit longer.

Bruce

>
> Cheers,
>
> Richard
>






Thread overview: 24+ messages
2016-02-11 15:15 [PATCH 0/1] meta-yocto: bump qemu preferred version to 4.4 Bruce Ashfield
2016-02-11 15:15 ` [PATCH 1/1] poky: update qemu* to prefer 4.4 kernel Bruce Ashfield
2016-02-12 14:36   ` Richard Purdie
2016-02-12 15:32     ` Richard Purdie
2016-02-13  8:31       ` Richard Purdie
2016-02-13 17:17         ` [poky] " Richard Purdie
2016-02-13 17:17           ` Richard Purdie
2016-02-13 18:19           ` [poky] " Bruce Ashfield
2016-02-13 18:19             ` Bruce Ashfield
2016-02-14 16:29           ` [poky] " Paul Gortmaker
2016-02-14 16:29             ` Paul Gortmaker
2016-03-02  1:41           ` [poky] " Paul Gortmaker
2016-03-02  1:41             ` Paul Gortmaker
2016-03-09 18:53             ` [poky] " Bruce Ashfield
2016-03-09 18:53               ` Bruce Ashfield
2016-03-09 21:23               ` [poky] " Richard Purdie
2016-03-09 21:23                 ` Richard Purdie
2016-03-10  4:12                 ` [poky] " Bruce Ashfield
2016-03-10  4:12                   ` Bruce Ashfield
2016-03-10 20:59                   ` [poky] " Richard Purdie
2016-03-10 20:59                     ` Richard Purdie
2016-03-10 21:55                     ` [poky] " Bruce Ashfield
2016-03-10 21:55                       ` Bruce Ashfield
2016-02-13 18:16         ` Bruce Ashfield
