* Hibernate resume bug around 3.18-rc2 - Full PAT support
@ 2015-11-18 21:43 Vassilis Virvilis
  2015-11-19  5:39 ` Juergen Gross
  0 siblings, 1 reply; 22+ messages in thread
From: Vassilis Virvilis @ 2015-11-18 21:43 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 8068 bytes --]

Hi,

I have been hit by a hibernate/resume bug. Other people may have been too; the following reports are consistent with my observations:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1490494
https://bugs.archlinux.org/task/44807

Some observations:
1) The first few rapid hibernation / resume cycles do not fail.

2) A loaded machine (eclipse + chromium + firefox/iceweasel + thunderbird/icedove + Konsole) helps to reproduce the lockup during resume.

3) Long hibernation times (overnight) help to reproduce the lockup during resume.

4) With the bad commits (where the lockup during resume takes place), image loading during resume is significantly faster: it is fast, and then it locks.

How I hit the problem and what I have done:

I am running Debian unstable.

When Debian moved from 3.16 to 3.19, the problem raised its ugly head. I diligently upgraded up to 4.2.6; the problem persists.

I added no_console_suspend and initcall_debug to the kernel command line; see the attached image of the lockup.
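The report does not say how these options were set. Assuming a Debian system booting through GRUB (an assumption), the usual persistent route is to append them to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then run update-grub. A minimal sketch of that edit, working on the file contents as a string; the helper add_cmdline_options is hypothetical and not part of any Debian tool:

```python
import re
from typing import List

# Matches the GRUB_CMDLINE_LINUX_DEFAULT="..." line of /etc/default/grub.
_CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_cmdline_options(grub_defaults: str, options: List[str]) -> str:
    """Append kernel options to GRUB_CMDLINE_LINUX_DEFAULT (hypothetical helper).

    Options that are already present are not duplicated, so running it twice
    is harmless. Raises ValueError if the variable is missing.
    """
    def _merge(match):
        merged = match.group(2).split()
        for opt in options:
            if opt not in merged:
                merged.append(opt)
        return match.group(1) + " ".join(merged) + match.group(3)

    updated, count = _CMDLINE_RE.subn(_merge, grub_defaults, count=1)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT not found")
    return updated


if __name__ == "__main__":
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet"\n'
    print(add_cmdline_options(sample, ["no_console_suspend", "initcall_debug"]))
```

After writing the result back (as root), update-grub regenerates the boot configuration and the options apply from the next boot.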

I added drm.debug=0xe, but it didn't produce anything interesting (OK, who am I to judge?) and the runs did not have it, so I took it out again.
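drm.debug is a bitmask of DRM debug categories. Assuming the category bits defined in include/drm/drmP.h around v4.3 (CORE 0x01, DRIVER 0x02, KMS 0x04, PRIME 0x08; later kernels add higher bits that this sketch ignores), 0xe enables the DRIVER, KMS and PRIME messages but not CORE. A small decoder:

```python
from typing import List

# Category bits assumed from include/drm/drmP.h (circa v4.3); newer kernels define more.
DRM_DEBUG_CATEGORIES = (
    (0x01, "CORE"),
    (0x02, "DRIVER"),
    (0x04, "KMS"),
    (0x08, "PRIME"),
)


def decode_drm_debug(mask: int) -> List[str]:
    """Return the names of the known categories enabled in a drm.debug mask."""
    return [name for bit, name in DRM_DEBUG_CATEGORIES if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0xE))  # ['DRIVER', 'KMS', 'PRIME']
```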

I reproduced the problem when hibernating and resuming back to KDE as well as back to a text console.

I switched to the VGA console and the resume problem persists.

I started a kernel bisection from 3.16 to 3.19 following https://wiki.debian.org/DebianKernel/GitBisect

One month and 25 kernels later, the bisect log is below.

I hit some untestable kernels that did not boot. They hung at "Loading ramdisk..." before printing any actual kernel message.

It looks like the first bad/untestable commit is from Juergen Gross / Thomas Gleixner: Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip [full PAT support]

Full disclaimer: I may have fucked up the bisection. Finding bad commits was semi-easy; confirming a good commit needs 2-3 days of run time.
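Why good commits take so long to confirm can be made concrete with a simple model. Assume (an assumption, not something measured in this report) that every hibernate/resume cycle on a bad kernel locks up independently with probability p. Then n clean cycles still leave a (1 - p)^n chance that the kernel is actually bad, so keeping that miss rate at or below alpha needs n >= ln(alpha) / ln(1 - p) cycles:

```python
import math


def cycles_needed(p_fail: float, miss_rate: float) -> int:
    """Smallest positive n such that (1 - p_fail) ** n <= miss_rate.

    Model assumption: every hibernate/resume cycle on a bad kernel locks up
    independently with probability p_fail.
    """
    if not (0.0 < p_fail < 1.0 and 0.0 < miss_rate < 1.0):
        raise ValueError("p_fail and miss_rate must lie strictly between 0 and 1")
    n = math.ceil(math.log(miss_rate) / math.log(1.0 - p_fail))
    # Nudge n onto the exact boundary in case floating-point rounding was off by one.
    while (1.0 - p_fail) ** n > miss_rate:
        n += 1
    while n > 1 and (1.0 - p_fail) ** (n - 1) <= miss_rate:
        n -= 1
    return n


if __name__ == "__main__":
    # Illustrative numbers only: 20% lockup chance per cycle, 5% accepted miss rate.
    print(cycles_needed(0.2, 0.05))  # 14
```

Rarer lockups push the count up quickly: with a 5% lockup chance per cycle the same 5% bound needs 59 clean cycles, which fits the observation that a "good" verdict takes days.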

I would really appreciate some help and directions to nail this down.


Regards

      Vassilis Virvilis



bill@localhost:~/Downloads/linux$ git bisect log
git bisect start
# good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
# bad: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
git bisect bad bfa76d49576599a4b9f9b7a71f23d73d6dcff735
# good: [754c780953397dd5ee5191b7b3ca67e09088ce7a] Merge branch 'for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
git bisect good 754c780953397dd5ee5191b7b3ca67e09088ce7a
# bad: [7ef58b32f571bffb7763c6252ad7527562081f34] Merge tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux
git bisect bad 7ef58b32f571bffb7763c6252ad7527562081f34
# good: [53429290a054b30e4683297409fc4627b2592315] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
git bisect good 53429290a054b30e4683297409fc4627b2592315
# good: [3a647c1d7ab08145cee4b650f5e797d168846c51] Merge tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 3a647c1d7ab08145cee4b650f5e797d168846c51
# bad: [1366f5d3129f2abde606214de7afc3dd61781fa3] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
git bisect bad 1366f5d3129f2abde606214de7afc3dd61781fa3
# good: [151cd97630f87451cab412e40750d0e5f7581c98] Merge tag 'defconfig-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 151cd97630f87451cab412e40750d0e5f7581c98
# good: [ecb50f0afd35a51ef487e8a54b976052eb03d729] Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good ecb50f0afd35a51ef487e8a54b976052eb03d729
# bad: [3a5dc1fafb016560315fe45bb4ef8bde259dd1bc] Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 3a5dc1fafb016560315fe45bb4ef8bde259dd1bc
# good: [b6444bd0a18eb47343e16749ce80a6ebd521f124] Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good b6444bd0a18eb47343e16749ce80a6ebd521f124
# bad: [a023748d53c10850650fe86b1c4a7d421d576451] Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad a023748d53c10850650fe86b1c4a7d421d576451
# good: [773fed910d41e443e495a6bfa9ab1c2b7b13e012] Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 773fed910d41e443e495a6bfa9ab1c2b7b13e012
# good: [49a3b3cbdf1621678a39bd95a3e67c0f858539c7] x86: Use new cache mode type in mm/iomap_32.c
git bisect good 49a3b3cbdf1621678a39bd95a3e67c0f858539c7
# skip: [87ad0b713b1034b6caf559976c35ce47f6d1d1e9] x86: Clean up pgtable_types.h
git bisect skip 87ad0b713b1034b6caf559976c35ce47f6d1d1e9
# skip: [c06814d8419a74528500f85faf5fc01f67f8e7e6] x86: Use new cache mode type in setting page attributes
git bisect skip c06814d8419a74528500f85faf5fc01f67f8e7e6
# skip: [e00c8cc93c1ac01ecd5049929a50fb47b62bb041] x86: Use new cache mode type in memtype related functions
git bisect skip e00c8cc93c1ac01ecd5049929a50fb47b62bb041
# skip: [bd809af16e3ab1f8d55b3e2928c47c67e2a865d2] x86: Enable PAT to use cache mode translation tables
git bisect skip bd809af16e3ab1f8d55b3e2928c47c67e2a865d2
# skip: [f439c429c320981943f8b64b2a4049d946cb492b] x86: Support PAT bit in pagetable dump for lower levels
git bisect skip f439c429c320981943f8b64b2a4049d946cb492b
# skip: [47591df505129c9774af6cca2debf283a6e56ed7] xen: Support Xen pv-domains using PAT
git bisect skip 47591df505129c9774af6cca2debf283a6e56ed7
# skip: [b14097bd911c2554b0b5271b3a6b2d84044d1843] x86: Use new cache mode type in mm/ioremap.c
git bisect skip b14097bd911c2554b0b5271b3a6b2d84044d1843
# skip: [102e19e1955d85f31475416b1ee22980c6462cf8] x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
git bisect skip 102e19e1955d85f31475416b1ee22980c6462cf8
# skip: [f5b2831d654167d77da8afbef4d2584897b12d0c] x86: Respect PAT bit when copying pte values between large and normal pages
git bisect skip f5b2831d654167d77da8afbef4d2584897b12d0c
# skip: [0dbcae884779fdf7e2239a97ac7488877f0693d9] x86: mm: Move PAT only functions to mm/pat.c
git bisect skip 0dbcae884779fdf7e2239a97ac7488877f0693d9
# skip: [2a3746984c98b17b565e6a2c2bbaaaef757db1b4] x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
git bisect skip 2a3746984c98b17b565e6a2c2bbaaaef757db1b4
# only skipped commits left to test
# possible first bad commit: [a023748d53c10850650fe86b1c4a7d421d576451] Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
# possible first bad commit: [0dbcae884779fdf7e2239a97ac7488877f0693d9] x86: mm: Move PAT only functions to mm/pat.c
# possible first bad commit: [47591df505129c9774af6cca2debf283a6e56ed7] xen: Support Xen pv-domains using PAT
# possible first bad commit: [bd809af16e3ab1f8d55b3e2928c47c67e2a865d2] x86: Enable PAT to use cache mode translation tables
# possible first bad commit: [f5b2831d654167d77da8afbef4d2584897b12d0c] x86: Respect PAT bit when copying pte values between large and normal pages
# possible first bad commit: [f439c429c320981943f8b64b2a4049d946cb492b] x86: Support PAT bit in pagetable dump for lower levels
# possible first bad commit: [87ad0b713b1034b6caf559976c35ce47f6d1d1e9] x86: Clean up pgtable_types.h
# possible first bad commit: [e00c8cc93c1ac01ecd5049929a50fb47b62bb041] x86: Use new cache mode type in memtype related functions
# possible first bad commit: [b14097bd911c2554b0b5271b3a6b2d84044d1843] x86: Use new cache mode type in mm/ioremap.c
# possible first bad commit: [c06814d8419a74528500f85faf5fc01f67f8e7e6] x86: Use new cache mode type in setting page attributes
# possible first bad commit: [102e19e1955d85f31475416b1ee22980c6462cf8] x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
# possible first bad commit: [2a3746984c98b17b565e6a2c2bbaaaef757db1b4] x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
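The end of the log follows from how git bisect handles skipped commits: once every commit between the newest good commit and the oldest bad commit has been skipped, it reports "only skipped commits left to test" and lists the bad commit together with the skipped ones as possible first bad commits (here the x86-mm merge plus the 11 skipped commits, 12 in total). A simplified model, assuming a linear history (the real history is a merge DAG, so git's actual algorithm is more involved) and an illustrative commit order:

```python
from typing import List, Sequence, Set, Tuple


def remaining_candidates(
    commits: Sequence[str], good: Set[str], bad: Set[str], skipped: Set[str]
) -> Tuple[List[str], List[str]]:
    """Bisection state on a *linear* history, oldest commit first.

    Returns (candidates, untested): every commit that may still be the first
    bad one, and those among them that were neither tested nor skipped.
    """
    last_good = max(i for i, c in enumerate(commits) if c in good)
    first_bad = min(i for i, c in enumerate(commits) if c in bad and i > last_good)
    between = list(commits[last_good + 1:first_bad])
    untested = [c for c in between if c not in skipped]
    return between + [commits[first_bad]], untested


if __name__ == "__main__":
    # Illustrative order only; the real commits sit on a merged side branch.
    history = ["49a3b3c", "87ad0b7", "c06814d", "bd809af", "a023748"]
    candidates, untested = remaining_candidates(
        history, good={"49a3b3c"}, bad={"a023748"},
        skipped={"87ad0b7", "c06814d", "bd809af"},
    )
    if not untested:
        print("only skipped commits left to test")
    print("possible first bad commits:", candidates)
```

Narrowing further requires testing at least one of the skipped commits, which is why the non-booting kernels in this range block the bisection.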









[-- Attachment #2: IMG_20150916_201816.jpg --]
[-- Type: image/jpeg, Size: 242949 bytes --]


* Re: Hibernate resume bug around 3.18-rc2 - Full PAT support
  2015-11-18 21:43 Hibernate resume bug around 3.18-rc2 - Full PAT support Vassilis Virvilis
@ 2015-11-19  5:39 ` Juergen Gross
  2015-11-19  7:50   ` vasvir
  2015-11-23 18:48   ` Luis R. Rodriguez
  0 siblings, 2 replies; 22+ messages in thread
From: Juergen Gross @ 2015-11-19  5:39 UTC
  To: vasvir, linux-kernel; +Cc: Toshi Kani, Luis R. Rodriguez

On 18/11/15 22:43, Vassilis Virvilis wrote:
> Hi,
> 
> I have been hit by a hibernate/resume bug. Other people may have been
> too; the following reports are consistent with my observations:
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1490494
> https://bugs.archlinux.org/task/44807
> 
> Some observations:
> 1) The first few rapid hibernation / resume cycles do not fail.
> 
> 2) A loaded machine (eclipse + chromium + firefox/iceweasel +
> thunderbird/icedove + Konsole) helps to reproduce the lockup during resume.
>
> 3) Long hibernation times (overnight) help to reproduce the lockup
> during resume.
>
> 4) With the bad commits (where the lockup during resume takes place),
> image loading during resume is significantly faster: it is fast, and
> then it locks.
> 
> How I hit the problem and what I have done:
> 
> I am running Debian unstable.
>
> When Debian moved from 3.16 to 3.19, the problem raised its ugly head.
> I diligently upgraded up to 4.2.6; the problem persists.

Could you please try the most recent 4.3 kernel? There has been some
work related to this topic after 4.2 (large page PAT handling done by
Toshi Kani and MTRR/PAT handling by Luis Rodriguez).

Another interesting piece of information would be the exact hardware
you are using. Maybe we can see some similarities between yours and the
other two cases you referenced above.

> I added no_console_suspend and initcall_debug to the kernel command
> line; see the attached image of the lockup.
>
> I added drm.debug=0xe, but it didn't produce anything interesting (OK,
> who am I to judge?) and the runs did not have it, so I took it out
> again.
>
> I reproduced the problem when hibernating and resuming back to KDE as
> well as back to a text console.
> 
> I switched to the VGA console and the resume problem persists.
> 
> I started kernel bisection from 3.16 to 3.19 following
> https://wiki.debian.org/DebianKernel/GitBisect
> 
> One month and 25 kernels later, the bisect log is below.

Wow! Thanks for doing this work!


Juergen




* Re: Hibernate resume bug around 3.18-rc2 - Full PAT support
  2015-11-19  5:39 ` Juergen Gross
@ 2015-11-19  7:50   ` vasvir
  2015-11-19  9:10     ` Juergen Gross
  2015-11-23 18:48   ` Luis R. Rodriguez
  1 sibling, 1 reply; 22+ messages in thread
From: vasvir @ 2015-11-19  7:50 UTC
  To: Juergen Gross; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

Hi,

Thanks for the quick answer

>
> Could you please try the most recent 4.3 kernel? There has been some
> work related to this topic after 4.2 (large page PAT handling done by
> Toshi Kani and MTRR/PAT handling by Luis Rodriguez).

That means I should reset the bisection, right? Is there any other
information we can extract from it?

So do you want me to test 4.3 or 4.4-rc or the latest Linus tree? I
assume 4.3 for now.

I will do it later tonight. It will take 2 days at least to report back

>
> Another interesting piece of information would be the exact hardware
> you are using. Maybe we can see some similarities between yours and the
> other two cases you referenced above.
>

It is an i7
Motherboard: ASROCK H97 PRO4 RETAIL
CPU INTEL CORE I7-4790 3.60GHZ LGA1150 - BOX
It has 16GB of RAM, one SSD and one HDD
I have NO external graphics card

Do you want me to run something like lspci or lsusb on it?

I upgraded the motherboard BIOS to the latest version. The BIOS is not
the problem, though: I upgraded after the problem occurred, as a
countermeasure in case I was hit by a buggy BIOS and Linux had changed
its behavior to be stricter.

I experimented with ACPI compilers/decompilers and I was tempted to fix my
ACPI tables but I didn't.

I saw the kernel command line option acpi_osi=!Windows2013 but I didn't
try it. Do you think I should try it?

> Wow! Thanks for doing this work!
>

I would like this to be fixed so I am willing to do the testing.

   Vassilis



* Re: Hibernate resume bug around 3.18-rc2 - Full PAT support
  2015-11-19  7:50   ` vasvir
@ 2015-11-19  9:10     ` Juergen Gross
  2015-11-19 20:35       ` Vassilis Virvilis
  0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2015-11-19  9:10 UTC
  To: vasvir; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

On 19/11/15 08:50, vasvir@iit.demokritos.gr wrote:
> Hi,
> 
> Thanks for the quick answer
> 
>>
>> Could you please try the most recent 4.3 kernel? There has been some
>> work related to this topic after 4.2 (large page PAT handling done by
>> Toshi Kani and MTRR/PAT handling by Luis Rodriguez).
> 
> That means I should reset the bisection, right? Is there any other
> information we can extract from it?

I don't see what else should be specific to that patch other than the
information that the issue occurred due to that patch. All further
diagnostic information should be obtainable with a newer kernel, too.

> So do you want me to test 4.3 or 4.4-rc or the latest Linus tree? I
> assume 4.3 for now.

I think 4.3 is okay.

> I will do it later tonight. It will take 2 days at least to report back

Okay, thank you for your effort!

> 
>>
>> Another interesting piece of information would be the exact hardware
>> you are using. Maybe we can see some similarities between yours and the
>> other two cases you referenced above.
>>
> 
> It is an i7
> Motherboard: ASROCK H97 PRO4 RETAIL
> CPU INTEL CORE I7-4790 3.60GHZ LGA1150 - BOX
> It has 16GB of RAM, one SSD and one HDD
> I have NO external graphics card
> 
> Do you want me to run something like lspci or lsusb on it?

Yes, please post the output of both.

> I upgraded the motherboard BIOS to the latest version. The BIOS is not
> the problem, though: I upgraded after the problem occurred, as a
> countermeasure in case I was hit by a buggy BIOS and Linux had changed
> its behavior to be stricter.

BIOS was my first guess, but if the other two reports are really due to
the same problem, I doubt the BIOS is to blame (one Lenovo and one Sony
laptop).

> I experimented with ACPI compilers/decompilers and I was tempted to fix my
> ACPI tables but I didn't.
> 
> I saw the kernel command line option acpi_osi=!Windows2013 but I didn't
> try it. Do you think I should try it?

You could try "nopat" as command line option.
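After rebooting with a new option such as nopat, it is worth confirming that the option actually reached the running kernel; /proc/cmdline holds the command line the current kernel was booted with. A minimal sketch (the helper names are hypothetical, and it only checks that the option is present, not what effect it has):

```python
from typing import Optional


def cmdline_has(cmdline: str, option: str) -> bool:
    """True if `option` appears as a bare flag or as the key of a key=value token."""
    return any(tok.split("=", 1)[0] == option for tok in cmdline.split())


def running_cmdline(path: str = "/proc/cmdline") -> Optional[str]:
    """Read the running kernel's command line, or None if it is unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    line = running_cmdline()
    if line is not None:
        print("nopat active:", cmdline_has(line, "nopat"))
```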

> 
>> Wow! Thanks for doing this work!
>>
> 
> I would like this to be fixed so I am willing to do the testing.

I appreciate this spirit. :-)


Juergen



* Re: Hibernate resume bug around 3.18-rc2 - Full PAT support
  2015-11-19  9:10     ` Juergen Gross
@ 2015-11-19 20:35       ` Vassilis Virvilis
  2015-11-20  5:25         ` Vassilis Virvilis
  0 siblings, 1 reply; 22+ messages in thread
From: Vassilis Virvilis @ 2015-11-19 20:35 UTC
  To: Juergen Gross; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

[-- Attachment #1: Type: text/plain, Size: 686 bytes --]

On 11/19/2015 11:10 AM, Juergen Gross wrote:

>> So do you want me to test 4.3 or 4.4-rc or the latest Linus tree? I
>> assume 4.3 for now.
>
> I think 4.3 is okay.
>
>> I will do it later tonight. It will take 2 days at least to report back

I compiled 4.3 and I am running it right now.

If it fails, I will try it with the nopat option.

If that fails too, I will try 3.18-rc2 with nopat to see whether it fails as well.

>>
>> Do you want me to run something like lspci or lsusb on it?
>
> Yes, please post the output of both.


Here they are; see the attachments.

>
>> I would like this to be fixed so I am willing to do the testing.
>
> I appreciate this spirit. :-)
>

I appreciate the guidance. :-)


     Vassilis

[-- Attachment #2: lsusb.txt --]
[-- Type: text/plain, Size: 34976 bytes --]

Bus 004 Device 002: ID 8087:8001 Intel Corp. 
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 002: ID 8087:8009 Intel Corp. 
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 046d:089d Logitech, Inc. QuickCam E2500 series
Bus 001 Device 003: ID 045e:0745 Microsoft Corp. Nano Transceiver v1.0 for Bluetooth
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 004 Device 002: ID 8087:8001 Intel Corp. 
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         1 Single TT
  bMaxPacketSize0        64
  idVendor           0x8087 Intel Corp.
  idProduct          0x8001 
  bcdDevice            0.00
  iManufacturer           0 
  iProduct                0 
  iSerial                 0 
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           25
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         9 Hub
      bInterfaceSubClass      0 Unused
      bInterfaceProtocol      0 Full speed (or root) hub
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0002  1x 2 bytes
        bInterval              12
Hub Descriptor:
  bLength              11
  bDescriptorType      41
  nNbrPorts             8
  wHubCharacteristic 0x0009
    Per-port power switching
    Per-port overcurrent protection
    TT think time 8 FS bits
  bPwrOn2PwrGood        0 * 2 milli seconds
  bHubContrCurrent      0 milli Ampere
  DeviceRemovable    0x00 0x00
  PortPwrCtrlMask    0xff 0xff
 Hub Port Status:
   Port 1: 0000.0100 power
   Port 2: 0000.0100 power
   Port 3: 0000.0100 power
   Port 4: 0000.0100 power
   Port 5: 0000.0100 power
   Port 6: 0000.0100 power
   Port 7: 0000.0100 power
   Port 8: 0000.0100 power
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         0 Full speed (or root) hub
  bMaxPacketSize0        64
  bNumConfigurations      1
Device Status:     0x0001
  Self Powered

Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         0 Full speed (or root) hub
  bMaxPacketSize0        64
  idVendor           0x1d6b Linux Foundation
  idProduct          0x0002 2.0 root hub
  bcdDevice            4.03
  iManufacturer           3 Linux 4.3.0+ ehci_hcd
  iProduct                2 EHCI Host Controller
  iSerial                 1 0000:00:1d.0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           25
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         9 Hub
      bInterfaceSubClass      0 Unused
      bInterfaceProtocol      0 Full speed (or root) hub
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0004  1x 4 bytes
        bInterval              12
Hub Descriptor:
  bLength               9
  bDescriptorType      41
  nNbrPorts             2
  wHubCharacteristic 0x000a
    No power switching (usb 1.0)
    Per-port overcurrent protection
  bPwrOn2PwrGood       10 * 2 milli seconds
  bHubContrCurrent      0 milli Ampere
  DeviceRemovable    0x02
  PortPwrCtrlMask    0xff
 Hub Port Status:
   Port 1: 0000.0507 highspeed power suspend enable connect
   Port 2: 0000.0100 power
Device Status:     0x0001
  Self Powered

Bus 003 Device 002: ID 8087:8009 Intel Corp. 
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         1 Single TT
  bMaxPacketSize0        64
  idVendor           0x8087 Intel Corp.
  idProduct          0x8009 
  bcdDevice            0.00
  iManufacturer           0 
  iProduct                0 
  iSerial                 0 
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           25
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         9 Hub
      bInterfaceSubClass      0 Unused
      bInterfaceProtocol      0 Full speed (or root) hub
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0001  1x 1 bytes
        bInterval              12
Hub Descriptor:
  bLength               9
  bDescriptorType      41
  nNbrPorts             6
  wHubCharacteristic 0x0009
    Per-port power switching
    Per-port overcurrent protection
    TT think time 8 FS bits
  bPwrOn2PwrGood        0 * 2 milli seconds
  bHubContrCurrent      0 milli Ampere
  DeviceRemovable    0x00
  PortPwrCtrlMask    0xff
 Hub Port Status:
   Port 1: 0000.0100 power
   Port 2: 0000.0100 power
   Port 3: 0000.0100 power
   Port 4: 0000.0100 power
   Port 5: 0000.0100 power
   Port 6: 0000.0100 power
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         0 Full speed (or root) hub
  bMaxPacketSize0        64
  bNumConfigurations      1
Device Status:     0x0001
  Self Powered

Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         0 Full speed (or root) hub
  bMaxPacketSize0        64
  idVendor           0x1d6b Linux Foundation
  idProduct          0x0002 2.0 root hub
  bcdDevice            4.03
  iManufacturer           3 Linux 4.3.0+ ehci_hcd
  iProduct                2 EHCI Host Controller
  iSerial                 1 0000:00:1a.0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           25
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         9 Hub
      bInterfaceSubClass      0 Unused
      bInterfaceProtocol      0 Full speed (or root) hub
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0004  1x 4 bytes
        bInterval              12
Hub Descriptor:
  bLength               9
  bDescriptorType      41
  nNbrPorts             2
  wHubCharacteristic 0x000a
    No power switching (usb 1.0)
    Per-port overcurrent protection
  bPwrOn2PwrGood       10 * 2 milli seconds
  bHubContrCurrent      0 milli Ampere
  DeviceRemovable    0x02
  PortPwrCtrlMask    0xff
 Hub Port Status:
   Port 1: 0000.0507 highspeed power suspend enable connect
   Port 2: 0000.0100 power
Device Status:     0x0001
  Self Powered

Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               3.00
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         3 
  bMaxPacketSize0         9
  idVendor           0x1d6b Linux Foundation
  idProduct          0x0003 3.0 root hub
  bcdDevice            4.03
  iManufacturer           3 Linux 4.3.0+ xhci-hcd
  iProduct                2 xHCI Host Controller
  iSerial                 1 0000:00:14.0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           31
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         9 Hub
      bInterfaceSubClass      0 Unused
      bInterfaceProtocol      0 Full speed (or root) hub
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0004  1x 4 bytes
        bInterval              12
        bMaxBurst               0
Hub Descriptor:
  bLength              12
  bDescriptorType      42
  nNbrPorts             6
  wHubCharacteristic 0x000a
    No power switching (usb 1.0)
    Per-port overcurrent protection
  bPwrOn2PwrGood       10 * 2 milli seconds
  bHubContrCurrent      0 milli Ampere
  bHubDecLat          0.0 micro seconds
  wHubDelay             0 nano seconds
  DeviceRemovable    0x00
 Hub Port Status:
   Port 1: 0000.02a0 5Gbps power Rx.Detect
   Port 2: 0000.02a0 5Gbps power Rx.Detect
   Port 3: 0000.02a0 5Gbps power Rx.Detect
   Port 4: 0000.02a0 5Gbps power Rx.Detect
   Port 5: 0000.02a0 5Gbps power Rx.Detect
   Port 6: 0000.02a0 5Gbps power Rx.Detect
Binary Object Store Descriptor:
  bLength                 5
  bDescriptorType        15
  wTotalLength           15
  bNumDeviceCaps          1
  SuperSpeed USB Device Capability:
    bLength                10
    bDescriptorType        16
    bDevCapabilityType      3
    bmAttributes         0x02
      Latency Tolerance Messages (LTM) Supported
    wSpeedsSupported   0x0008
      Device can operate at SuperSpeed (5Gbps)
    bFunctionalitySupport   3
      Lowest fully-functional device speed is SuperSpeed (5Gbps)
    bU1DevExitLat          10 micro seconds
    bU2DevExitLat         512 micro seconds
Device Status:     0x0001
  Self Powered

Bus 001 Device 002: ID 046d:089d Logitech, Inc. QuickCam E2500 series
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0         8
  idVendor           0x046d Logitech, Inc.
  idProduct          0x089d QuickCam E2500 series
  bcdDevice            1.00
  iManufacturer           0 
  iProduct                0 
  iSerial                 0 
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength          336
    bNumInterfaces          3
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xa0
      (Bus Powered)
      Remote Wakeup
    MaxPower              100mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            1
          Transfer Type            Isochronous
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0000  1x 0 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       1
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            1
          Transfer Type            Isochronous
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0080  1x 128 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       2
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            1
          Transfer Type            Isochronous
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x00c0  1x 192 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       3
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            1
          Transfer Type            Isochronous
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0100  1x 256 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       4
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            1
          Transfer Type            Isochronous
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0180  1x 384 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       5
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            1
          Transfer Type            Isochronous
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       6
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            1
          Transfer Type            Isochronous
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0300  1x 768 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       7
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            1
          Transfer Type            Isochronous
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x03ff  1x 1023 bytes
        bInterval               1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       0
      bNumEndpoints           0
      bInterfaceClass         1 Audio
      bInterfaceSubClass      1 Control Device
      bInterfaceProtocol      0 
      iInterface              0 
      AudioControl Interface Descriptor:
        bLength                 9
        bDescriptorType        36
        bDescriptorSubtype      1 (HEADER)
        bcdADC               1.00
        wTotalLength           39
        bInCollection           1
        baInterfaceNr( 0)       2
      AudioControl Interface Descriptor:
        bLength                12
        bDescriptorType        36
        bDescriptorSubtype      2 (INPUT_TERMINAL)
        bTerminalID             1
        wTerminalType      0x0201 Microphone
        bAssocTerminal          0
        bNrChannels             1
        wChannelConfig     0x0000
        iChannelNames           0 
        iTerminal               0 
      AudioControl Interface Descriptor:
        bLength                 9
        bDescriptorType        36
        bDescriptorSubtype      6 (FEATURE_UNIT)
        bUnitID                 2
        bSourceID               1
        bControlSize            2
        bmaControls( 0)      0x43
        bmaControls( 0)      0x00
          Mute Control
          Volume Control
          Automatic Gain Control
        iFeature                0 
      AudioControl Interface Descriptor:
        bLength                 9
        bDescriptorType        36
        bDescriptorSubtype      3 (OUTPUT_TERMINAL)
        bTerminalID             3
        wTerminalType      0x0101 USB Streaming
        bAssocTerminal          0
        bSourceID               2
        iTerminal               0 
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       0
      bNumEndpoints           0
      bInterfaceClass         1 Audio
      bInterfaceSubClass      2 Streaming
      bInterfaceProtocol      0 
      iInterface              0 
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       1
      bNumEndpoints           1
      bInterfaceClass         1 Audio
      bInterfaceSubClass      2 Streaming
      bInterfaceProtocol      0 
      iInterface              0 
      AudioStreaming Interface Descriptor:
        bLength                 7
        bDescriptorType        36
        bDescriptorSubtype      1 (AS_GENERAL)
        bTerminalLink           3
        bDelay                  1 frames
        wFormatTag              1 PCM
      AudioStreaming Interface Descriptor:
        bLength                11
        bDescriptorType        36
        bDescriptorSubtype      2 (FORMAT_TYPE)
        bFormatType             1 (FORMAT_TYPE_I)
        bNrChannels             1
        bSubframeSize           2
        bBitResolution         16
        bSamFreqType            1 Discrete
        tSamFreq[ 0]         8000
      Endpoint Descriptor:
        bLength                 9
        bDescriptorType         5
        bEndpointAddress     0x83  EP 3 IN
        bmAttributes            1
          Transfer Type            Isochronous
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0010  1x 16 bytes
        bInterval               1
        bRefresh                0
        bSynchAddress           0
        AudioControl Endpoint Descriptor:
          bLength                 7
          bDescriptorType        37
          bDescriptorSubtype      1 (EP_GENERAL)
          bmAttributes         0x00
          bLockDelayUnits         0 Undefined
          wLockDelay              0 Undefined
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       2
      bNumEndpoints           1
      bInterfaceClass         1 Audio
      bInterfaceSubClass      2 Streaming
      bInterfaceProtocol      0 
      iInterface              0 
      AudioStreaming Interface Descriptor:
        bLength                 7
        bDescriptorType        36
        bDescriptorSubtype      1 (AS_GENERAL)
        bTerminalLink           3
        bDelay                  1 frames
        wFormatTag              1 PCM
      AudioStreaming Interface Descriptor:
        bLength                11
        bDescriptorType        36
        bDescriptorSubtype      2 (FORMAT_TYPE)
        bFormatType             1 (FORMAT_TYPE_I)
        bNrChannels             1
        bSubframeSize           2
        bBitResolution         16
        bSamFreqType            1 Discrete
        tSamFreq[ 0]        16000
      Endpoint Descriptor:
        bLength                 9
        bDescriptorType         5
        bEndpointAddress     0x83  EP 3 IN
        bmAttributes            1
          Transfer Type            Isochronous
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0020  1x 32 bytes
        bInterval               1
        bRefresh                0
        bSynchAddress           0
        AudioControl Endpoint Descriptor:
          bLength                 7
          bDescriptorType        37
          bDescriptorSubtype      1 (EP_GENERAL)
          bmAttributes         0x00
          bLockDelayUnits         0 Undefined
          wLockDelay              0 Undefined
Device Status:     0x0000
  (Bus Powered)

Bus 001 Device 003: ID 045e:0745 Microsoft Corp. Nano Transceiver v1.0 for Bluetooth
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  idVendor           0x045e Microsoft Corp.
  idProduct          0x0745 Nano Transceiver v1.0 for Bluetooth
  bcdDevice            6.56
  iManufacturer           1 Microsoft
  iProduct                2 Microsoft® 2.4GHz Transceiver v8.0
  iSerial                 0 
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           84
    bNumInterfaces          3
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xa0
      (Bus Powered)
      Remote Wakeup
    MaxPower              100mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Device
      bInterfaceSubClass      1 Boot Interface Subclass
      bInterfaceProtocol      1 Keyboard
      iInterface              0 
        HID Device Descriptor:
          bLength                 9
          bDescriptorType        33
          bcdHID               1.11
          bCountryCode            0 Not supported
          bNumDescriptors         1
          bDescriptorType        34 Report
          wDescriptorLength      57
         Report Descriptors: 
           ** UNAVAILABLE **
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval               4
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Device
      bInterfaceSubClass      1 Boot Interface Subclass
      bInterfaceProtocol      2 Mouse
      iInterface              0 
        HID Device Descriptor:
          bLength                 9
          bDescriptorType        33
          bcdHID               1.11
          bCountryCode            0 Not supported
          bNumDescriptors         1
          bDescriptorType        34 Report
          wDescriptorLength     295
         Report Descriptors: 
           ** UNAVAILABLE **
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x000a  1x 10 bytes
        bInterval               1
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Device
      bInterfaceSubClass      0 No Subclass
      bInterfaceProtocol      0 None
      iInterface              0 
        HID Device Descriptor:
          bLength                 9
          bDescriptorType        33
          bcdHID               1.11
          bCountryCode            0 Not supported
          bNumDescriptors         1
          bDescriptorType        34 Report
          wDescriptorLength     319
         Report Descriptors: 
           ** UNAVAILABLE **
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x83  EP 3 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0020  1x 32 bytes
        bInterval               1
Device Status:     0x0000
  (Bus Powered)

Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            9 Hub
  bDeviceSubClass         0 Unused
  bDeviceProtocol         1 Single TT
  bMaxPacketSize0        64
  idVendor           0x1d6b Linux Foundation
  idProduct          0x0002 2.0 root hub
  bcdDevice            4.03
  iManufacturer           3 Linux 4.3.0+ xhci-hcd
  iProduct                2 xHCI Host Controller
  iSerial                 1 0000:00:14.0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           25
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xe0
      Self Powered
      Remote Wakeup
    MaxPower                0mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         9 Hub
      bInterfaceSubClass      0 Unused
      bInterfaceProtocol      0 Full speed (or root) hub
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0004  1x 4 bytes
        bInterval              12
Hub Descriptor:
  bLength              11
  bDescriptorType      41
  nNbrPorts            14
  wHubCharacteristic 0x000a
    No power switching (usb 1.0)
    Per-port overcurrent protection
    TT think time 8 FS bits
  bPwrOn2PwrGood       10 * 2 milli seconds
  bHubContrCurrent      0 milli Ampere
  DeviceRemovable    0x00 0x00
  PortPwrCtrlMask    0xff 0xff
 Hub Port Status:
   Port 1: 0000.0100 power
   Port 2: 0000.0100 power
   Port 3: 0000.0103 power enable connect
   Port 4: 0000.0100 power
   Port 5: 0000.0100 power
   Port 6: 0000.0100 power
   Port 7: 0000.0100 power
   Port 8: 0000.0100 power
   Port 9: 0000.0100 power
   Port 10: 0000.0103 power enable connect
   Port 11: 0000.0100 power
   Port 12: 0000.0100 power
   Port 13: 0000.0100 power
   Port 14: 0000.0100 power
Device Status:     0x0001
  Self Powered

[-- Attachment #3: lspci.txt --]
[-- Type: text/plain, Size: 7362 bytes --]

00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller
00:16.0 Communication controller: Intel Corporation 9 Series Chipset Family ME Interface #1
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V
00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 9 Series Chipset Family HD Audio Controller
00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 (rev d0)
00:1c.6 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 7 (rev d0)
00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1
00:1f.0 ISA bridge: Intel Corporation 9 Series Chipset Family H97 Controller
00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode]
00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller

00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
	Subsystem: ASRock Incorporation Device 0c00
	Flags: bus master, fast devsel, latency 0
	Capabilities: [e0] Vendor Specific Information: Len=0c <?>
	Kernel driver in use: hsw_uncore

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
	Subsystem: ASRock Incorporation Device 0412
	Flags: bus master, fast devsel, latency 0, IRQ 31
	Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	I/O ports at f000 [size=64]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [d0] Power Management version 2
	Capabilities: [a4] PCI Advanced Features
	Kernel driver in use: i915

00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
	Subsystem: ASRock Incorporation Device 0c0c
	Flags: bus master, fast devsel, latency 0, IRQ 32
	Memory at f7c34000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [50] Power Management version 2
	Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
	Kernel driver in use: snd_hda_intel

00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller (prog-if 30 [XHCI])
	Subsystem: ASRock Incorporation Device 8cb1
	Flags: bus master, medium devsel, latency 0, IRQ 27
	Memory at f7c20000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [70] Power Management version 2
	Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
	Kernel driver in use: xhci_hcd

00:16.0 Communication controller: Intel Corporation 9 Series Chipset Family ME Interface #1
	Subsystem: ASRock Incorporation Device 8cba
	Flags: bus master, fast devsel, latency 0, IRQ 29
	Memory at f7c3f000 (64-bit, non-prefetchable) [size=16]
	Capabilities: [50] Power Management version 3
	Capabilities: [8c] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Kernel driver in use: mei_me

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V
	Subsystem: ASRock Incorporation Device 15a1
	Flags: bus master, fast devsel, latency 0, IRQ 26
	Memory at f7c00000 (32-bit, non-prefetchable) [size=128K]
	Memory at f7c3c000 (32-bit, non-prefetchable) [size=4K]
	I/O ports at f080 [size=32]
	Capabilities: [c8] Power Management version 2
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [e0] PCI Advanced Features
	Kernel driver in use: e1000e

00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2 (prog-if 20 [EHCI])
	Subsystem: ASRock Incorporation Device 8cad
	Flags: bus master, medium devsel, latency 0, IRQ 16
	Memory at f7c3b000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Capabilities: [98] PCI Advanced Features
	Kernel driver in use: ehci-pci

00:1b.0 Audio device: Intel Corporation 9 Series Chipset Family HD Audio Controller
	Subsystem: ASRock Incorporation Device d892
	Flags: bus master, fast devsel, latency 0, IRQ 30
	Memory at f7c30000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [50] Power Management version 2
	Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Kernel driver in use: snd_hda_intel

00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 (rev d0) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 24
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	Capabilities: [40] Express Root Port (Slot-), MSI 00
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: ASRock Incorporation Device 8c90
	Capabilities: [a0] Power Management version 3
	Kernel driver in use: pcieport

00:1c.6 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 7 (rev d0) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 25
	Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: ASRock Incorporation Device 244e
	Capabilities: [a0] Power Management version 3
	Kernel driver in use: pcieport

00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1 (prog-if 20 [EHCI])
	Subsystem: ASRock Incorporation Device 8ca6
	Flags: bus master, medium devsel, latency 0, IRQ 23
	Memory at f7c3a000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Capabilities: [98] PCI Advanced Features
	Kernel driver in use: ehci-pci

00:1f.0 ISA bridge: Intel Corporation 9 Series Chipset Family H97 Controller
	Subsystem: ASRock Incorporation Device 8cc6
	Flags: bus master, medium devsel, latency 0
	Capabilities: [e0] Vendor Specific Information: Len=0c <?>
	Kernel driver in use: lpc_ich

00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode] (prog-if 01 [AHCI 1.0])
	Subsystem: ASRock Incorporation Device 8c82
	Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 28
	I/O ports at f0d0 [size=8]
	I/O ports at f0c0 [size=4]
	I/O ports at f0b0 [size=8]
	I/O ports at f0a0 [size=4]
	I/O ports at f060 [size=32]
	Memory at f7c39000 (32-bit, non-prefetchable) [size=2K]
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [70] Power Management version 3
	Capabilities: [a8] SATA HBA v1.0
	Kernel driver in use: ahci

00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller
	Subsystem: ASRock Incorporation Device 8ca2
	Flags: medium devsel
	Memory at f7c38000 (64-bit, non-prefetchable) [size=256]
	I/O ports at f040 [size=32]


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-19 20:35       ` Vassilis Virvilis
@ 2015-11-20  5:25         ` Vassilis Virvilis
  2015-11-20  8:47           ` Juergen Gross
  0 siblings, 1 reply; 22+ messages in thread
From: Vassilis Virvilis @ 2015-11-20  5:25 UTC (permalink / raw)
  To: Juergen Gross; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

On 11/19/2015 10:35 PM, Vassilis Virvilis wrote:
>
> I compiled and I am running 4.3 right now.
>

It failed this morning. Last night I did 3 hibernate/resume cycles. In the last one I also turned off the PSU (this seems to push it over the edge, but it may be random behavior) and it worked. This morning, 7 hours later, it failed to resume, but it didn't hang on _lapic_resume. This time it rebooted, and I seem to recall this behavior with 4.2+ kernels. I forgot to mention it because my testing with 4.x kernels was a month earlier.

So the 4.3 kernel reboots on resume after a long hibernation.

I am testing with 4.3 and nopat right now.

      Vassilis


* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-20  5:25         ` Vassilis Virvilis
@ 2015-11-20  8:47           ` Juergen Gross
  2015-11-20 10:04             ` vasvir
  0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2015-11-20  8:47 UTC (permalink / raw)
  To: vasvir; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

On 20/11/15 06:25, Vassilis Virvilis wrote:
> On 11/19/2015 10:35 PM, Vassilis Virvilis wrote:
>>
>> I compiled and I am running 4.3 right now.
>>
> 
> It failed this morning. Last night I did 3 hibernate / resume cycles. In
> the last one I also turned off the PSU (this seems to push it over the
> edge - but it may be random behavior) and it worked. This morning 7h
> later failed to resume - but it didn't hang on _lapic_resume. This time
> it rebooted - and I seem to recall this behavior for 4.2+ kernels. I
> forgot to mention it because my testing with 4.x kernels were one month
> before.
> 
> So 4.3 kernel - reboots on resume after a long hibernation time.
> 
> I am testing with 4.3 and nopat right now.

I've just found a potential issue: if MTRR is disabled by the BIOS, the
PAT register of the boot processor won't be restored after resume.

Can you check whether pr_info("MTRR: Disabled\n") has been executed in
early boot? If so, this might be controlled by a BIOS option.


Juergen






* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-20  8:47           ` Juergen Gross
@ 2015-11-20 10:04             ` vasvir
  2015-11-20 12:23               ` Juergen Gross
  0 siblings, 1 reply; 22+ messages in thread
From: vasvir @ 2015-11-20 10:04 UTC (permalink / raw)
  To: Juergen Gross; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

> I've just found a potential issue: In case MTRR is disabled by the BIOS
> the PAT register of the boot processor won't be restored after resume.
>
> Can you check whether pr_info("MTRR: Disabled\n") has been executed in
> early boot? If yes, this might be a BIOS option.
>

I don't have access right now. I will test it later tonight (this is my
home machine).

Would $ dmesg | grep -i mtrr suffice, or do I need to look for MTRR
somewhere else, e.g. /proc or /sys?




* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-20 10:04             ` vasvir
@ 2015-11-20 12:23               ` Juergen Gross
  2015-11-21 11:49                 ` Vassilis Virvilis
  0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2015-11-20 12:23 UTC (permalink / raw)
  To: vasvir; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

On 20/11/15 11:04, vasvir@iit.demokritos.gr wrote:
>> I've just found a potential issue: In case MTRR is disabled by the BIOS
>> the PAT register of the boot processor won't be restored after resume.
>>
>> Can you check whether pr_info("MTRR: Disabled\n") has been executed in
>> early boot? If yes, this might be a BIOS option.
>>
> 
> I don't have access right now. I will test it later tonight (This is my
> home machine).
> 
> Would $dmesg | grep -i mtrr suffice or I need to look for the mtrr
> somewhere else e.g. /proc /sys etc?

I think grepping for MTRR in dmesg should be enough.


Juergen



* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-20 12:23               ` Juergen Gross
@ 2015-11-21 11:49                 ` Vassilis Virvilis
  2015-11-23  7:32                   ` Juergen Gross
  2015-11-23 18:56                   ` Luis R. Rodriguez
  0 siblings, 2 replies; 22+ messages in thread
From: Vassilis Virvilis @ 2015-11-21 11:49 UTC (permalink / raw)
  To: Juergen Gross; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

On 11/20/2015 02:23 PM, Juergen Gross wrote:
> On 20/11/15 11:04, vasvir@iit.demokritos.gr wrote:
>>> I've just found a potential issue: In case MTRR is disabled by the BIOS
>>> the PAT register of the boot processor won't be restored after resume.
>>>
>>> Can you check whether pr_info("MTRR: Disabled\n") has been executed in
>>> early boot? If yes, this might be a BIOS option.
>>>
>>
>> I don't have access right now. I will test it later tonight (This is my
>> home machine).
>>
>> Would $dmesg | grep -i mtrr suffice or I need to look for the mtrr
>> somewhere else e.g. /proc /sys etc?
>
> I think grepping for MTRR in dmesg should be enough.

Kernel 4.3 with nopat also died on the 4th or 5th hibernation, at the familiar "Calling lapic..." point (see the previously attached image).

$ dmesg | grep -i mtr for the 4.3 kernel with nopat
[    0.189113] calling  mtrr_if_init+0x0/0x5f @ 1
[    0.189116] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
[    0.189222] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
[    0.189559] calling  mtrr_init_finialize+0x0/0x3a @ 1
[    0.189560] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
[    8.994140] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
[    8.994154] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer.

$ dmesg | grep -i mtr for the 4.3 kernel with PAT enabled (the default)
[    0.189368] calling  mtrr_if_init+0x0/0x5f @ 1
[    0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
[    0.189478] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
[    0.189814] calling  mtrr_init_finialize+0x0/0x3a @ 1
[    0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
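Only the nopat run shows the "type mismatch" line. It gives the MTRR request as base,size in hex, and the "Failed to add WC MTRR" line that follows prints the same request as an inclusive range. A minimal C sketch of that arithmetic (the helper names are hypothetical and are not kernel functions):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Inclusive end address of an MTRR request logged as "<base>,<size>" (hex). */
uint64_t mtrr_range_end(uint64_t base, uint64_t size)
{
	assert(size > 0);
	return base + size - 1;
}

/* Print the request as a zero-padded inclusive range, in the same form
 * as the "Failed to add WC MTRR for [...]" log line. */
void print_mtrr_range(uint64_t base, uint64_t size)
{
	printf("[%016" PRIx64 "-%016" PRIx64 "]\n",
	       base, mtrr_range_end(base, size));
}
```

For example, print_mtrr_range(0xe0000000, 0x10000000) prints [00000000e0000000-00000000efffffff], the 256 MiB range named in the log.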


I also checked my BIOS and found nothing about MTRR. My BIOS manual is at ftp://europe.asrock.com/Manual/H97%20Pro4.pdf. Can you see any option related to MTRR?

Question: if your MTRR/PAT theory is correct, wouldn't the system lock up, hang or reboot every time it goes through hibernate/resume? Can it explain why the first hibernate/resume cycles in rapid succession after boot work, while the long ones fail more consistently?

Note: With PAT enabled the system boots up significantly faster.

Over the weekend I will go back to 3.18-rc2 and try to verify that my bisection is correct. Second-guessing yourself is a terrible thing...

I will also try with nopat, run dmesg | grep -i mtr and post the results.

Unless you have any other suggestions...

     Vassilis



* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-21 11:49                 ` Vassilis Virvilis
@ 2015-11-23  7:32                   ` Juergen Gross
  2015-11-23 14:11                     ` vasvir
  2015-11-23 18:56                   ` Luis R. Rodriguez
  1 sibling, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2015-11-23  7:32 UTC (permalink / raw)
  To: vasvir; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

On 21/11/15 12:49, Vassilis Virvilis wrote:
> On 11/20/2015 02:23 PM, Juergen Gross wrote:
>> On 20/11/15 11:04, vasvir@iit.demokritos.gr wrote:
>>>> I've just found a potential issue: In case MTRR is disabled by the BIOS
>>>> the PAT register of the boot processor won't be restored after resume.
>>>>
>>>> Can you check whether pr_info("MTRR: Disabled\n") has been executed in
>>>> early boot? If yes, this might be a BIOS option.
>>>>
>>>
>>> I don't have access right now. I will test it later tonight (This is my
>>> home machine).
>>>
>>> Would $dmesg | grep -i mtrr suffice or I need to look for the mtrr
>>> somewhere else e.g. /proc /sys etc?
>>
>> I think grepping for MTRR in dmesg should be enough.
> 
> kernel 4.3 +nopat also died on the 4th or the 5th hibernate on the
> familiar (see previously attached image) "Calling lapic..." place.
> 
> $dmesg | grep -i mtr for 4.3 kernel with nopat
> [    0.189113] calling  mtrr_if_init+0x0/0x5f @ 1
> [    0.189116] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [    0.189222] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000]
> with a huge-page mapping due to MTRR override.
> [    0.189559] calling  mtrr_init_finialize+0x0/0x3a @ 1
> [    0.189560] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0
> usecs
> [    8.994140] mtrr: type mismatch for e0000000,10000000 old: write-back
> new: write-combining
> [    8.994154] Failed to add WC MTRR for
> [00000000e0000000-00000000efffffff]; performance may suffer.
> 
> $dmesg | grep -i mtr for 4.3 kernel with default pat enabled
> [    0.189368] calling  mtrr_if_init+0x0/0x5f @ 1
> [    0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [    0.189478] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000]
> with a huge-page mapping due to MTRR override.
> [    0.189814] calling  mtrr_init_finialize+0x0/0x3a @ 1
> [    0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0
> usecs
> 
> 
> I also checked my BIOS. I found nothing about mtrr. My BIOS manual is
> ftp://europe.asrock.com/Manual/H97%20Pro4.pdf. Can you see any option
> about MTRR?

As the BIOS obviously isn't disabling MTRR, I don't think we need to
go down that route any longer.

> Question: If we assume your theory is correct about mtrr/pat, wouldn't
> lockup/hang reboot every time the system goes to hibernate/resume? Can
> this assumption explain why the first hibernation/resume cycles in rapid
> succession after system boot are working and the long ones fail somewhat
> more consistently?

Hmm, I'm really not sure. It would depend on the usage of non-standard
cache mode mappings. But as MTRR isn't disabled, this theory doesn't apply
to your problem.
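Some background on why such an effect "would depend on the usage of non-standard cache mode mappings" (even though the MTRR-disabled case doesn't apply here): if the boot processor resumed with the power-on default in its IA32_PAT MSR instead of the kernel's value, only PAT entries whose memory type differs between the two values would get the wrong cache type. A minimal C sketch, assuming the power-on default from the Intel SDM and assuming the kernel layout of that era was WB, WC, UC-, UC repeated (check pat_init() in the tree you test; later kernels changed the upper entries). All names below are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* x86 PAT memory type encodings (Intel SDM). */
enum pat_type {
	PAT_UC       = 0,
	PAT_WC       = 1,
	PAT_WT       = 4,
	PAT_WP       = 5,
	PAT_WB       = 6,
	PAT_UC_MINUS = 7,
};

/* Place memory type t in PAT entry i (entry i is byte i of IA32_PAT). */
#define PAT_ENTRY(i, t) ((uint64_t)(t) << ((i) * 8))

/* IA32_PAT power-on default: WB, WT, UC-, UC, repeated. */
const uint64_t PAT_POWER_ON =
	PAT_ENTRY(0, PAT_WB) | PAT_ENTRY(1, PAT_WT) |
	PAT_ENTRY(2, PAT_UC_MINUS) | PAT_ENTRY(3, PAT_UC) |
	PAT_ENTRY(4, PAT_WB) | PAT_ENTRY(5, PAT_WT) |
	PAT_ENTRY(6, PAT_UC_MINUS) | PAT_ENTRY(7, PAT_UC);

/* ASSUMPTION: kernel layout WB, WC, UC-, UC, repeated. */
const uint64_t PAT_KERNEL_ASSUMED =
	PAT_ENTRY(0, PAT_WB) | PAT_ENTRY(1, PAT_WC) |
	PAT_ENTRY(2, PAT_UC_MINUS) | PAT_ENTRY(3, PAT_UC) |
	PAT_ENTRY(4, PAT_WB) | PAT_ENTRY(5, PAT_WC) |
	PAT_ENTRY(6, PAT_UC_MINUS) | PAT_ENTRY(7, PAT_UC);

/* Memory type held in PAT entry i (only the low 3 bits are defined). */
unsigned pat_entry(uint64_t pat, unsigned i)
{
	assert(i < 8);
	return (unsigned)((pat >> (i * 8)) & 0x7);
}

/* Bit i is set if entry i has a different memory type in a and b. */
unsigned pat_diff_mask(uint64_t a, uint64_t b)
{
	unsigned mask = 0;

	for (unsigned i = 0; i < 8; i++)
		if (pat_entry(a, i) != pat_entry(b, i))
			mask |= 1u << i;
	return mask;
}
```

Under these assumptions only entries 1 and 5 differ (WC in the kernel layout, WT at power-on), so only mappings that select those entries, such as write-combining mappings, would be affected.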

> Note: With PAT enabled the system boots up significantly faster.
> 
> In the weekend I will return to 3.18-rc2 and I will try to verify my
> bisection is correct. Double guessing your self is a terrible thing...

Thanks.

> I will also try with nopat and I will run dmesg | grep -i mtr and post
> results
> 
> Unless you have any other suggestions...

I think we have to find out where the kernel is really hanging. Is there
any chance you can trigger an NMI?

Looking into suspend/resume code I found a strange inconsistency for
the lapic handling:

lapic_suspend()
{
...
#ifdef CONFIG_X86_THERMAL_VECTOR
        if (maxlvt >= 5)
                apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR);
#endif
...
}

lapic_resume()
{
...
#if defined(CONFIG_X86_MCE_INTEL)
        if (maxlvt >= 5)
                apic_write(APIC_LVTTHMR, apic_pm_state.apic_thmr);
#endif
...
}

and comparing that to:

clear_local_APIC()
{
...
#ifdef CONFIG_X86_THERMAL_VECTOR
        if (maxlvt >= 5) {
                v = apic_read(APIC_LVTTHMR);
                apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED);
        }
#endif
#ifdef CONFIG_X86_MCE_INTEL
        if (maxlvt >= 6) {
                v = apic_read(APIC_LVTCMCI);
                if (!(v & APIC_LVT_MASKED))
                        apic_write(APIC_LVTCMCI, v | APIC_LVT_MASKED);
        }
#endif
...
}

I think it would be interesting to know your kernel config...
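The inconsistency above can be modelled in userspace: the thermal LVT is saved under one config option and restored under another, so whether it survives suspend/resume depends on both being set. A minimal sketch, with the two Kconfig options turned into runtime booleans and the APIC register modelled as a plain variable (all names are hypothetical; the value the register holds after the power cycle is just a parameter):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the relevant APIC state. */
struct lvt_model {
	uint32_t lvtthmr;     /* stands in for APIC_LVTTHMR */
	uint32_t saved_thmr;  /* stands in for apic_pm_state.apic_thmr */
};

/*
 * Returns true if the thermal LVT holds its pre-suspend value after resume.
 * reset_value is whatever the register contains after the power cycle.
 */
bool thermal_lvt_survives(bool thermal_vector, bool mce_intel, int maxlvt,
			  uint32_t before, uint32_t reset_value)
{
	struct lvt_model m = { .lvtthmr = before, .saved_thmr = 0 };

	assert(maxlvt >= 0);

	/* lapic_suspend(): saved only under CONFIG_X86_THERMAL_VECTOR */
	if (thermal_vector && maxlvt >= 5)
		m.saved_thmr = m.lvtthmr;

	/* hibernation: the APIC loses its state */
	m.lvtthmr = reset_value;

	/* lapic_resume(): restored only under CONFIG_X86_MCE_INTEL */
	if (mce_intel && maxlvt >= 5)
		m.lvtthmr = m.saved_thmr;

	return m.lvtthmr == before;
}
```

Only the case where both options are set restores the original value; with just one of them set, the register ends up either never restored or restored from a value that was never saved. If the two options are always enabled together, only the first case can occur.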


Juergen



* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-23  7:32                   ` Juergen Gross
@ 2015-11-23 14:11                     ` vasvir
  2015-11-23 14:19                       ` Juergen Gross
  0 siblings, 1 reply; 22+ messages in thread
From: vasvir @ 2015-11-23 14:11 UTC (permalink / raw)
  To: Juergen Gross; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

On 11/20/2015 02:23 PM, Juergen Gross wrote:

>
> As the BIOS obviously isn't disabling MTRR I don't think we have
> to go that route any longer.

ok.

>>
>> In the weekend I will return to 3.18-rc2 and I will try to verify my
>> bisection is correct. Double guessing your self is a terrible thing...
>
> Thanks.
>
>> I will also try with nopat and I will run dmesg | grep -i mtr and post
>> results
>>
>> Unless you have any other suggestions...
>

I hit a very big problem here. I did
$ git checkout 773fed910d41e443e495a6bfa9ab1c2b7b13e012
$ make (with gcc 4.8, as in all my tests)

and the resulting kernel is unbootable: it hangs at "Loading initial
ramdisk...", the second line of the kernel boot.

That means my bisection is not reliable, because this commit is marked as good.

So now I am at a loss.

As I said, I followed https://wiki.debian.org/DebianKernel/GitBisect

I notice now that the article suggests a step
  $ make oldconfig

I ran it only once, at the start of the bisection, answering with the default
(Enter) to all config questions.

> I think we have to find out where the kernel is really hanging. Do you
> have any chance to trigger a NMI?

I am googling about it.

>
> Looking into suspend/resume code I found a strange inconsistency for
> the lapic handling:
>
> lapic_suspend()
> {
> ...
> #ifdef CONFIG_X86_THERMAL_VECTOR
>         if (maxlvt >= 5)
>                 apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR);
> #endif
> ...
> }
>
> lapic_resume()
> {
> ...
> #if defined(CONFIG_X86_MCE_INTEL)
>         if (maxlvt >= 5)
>                 apic_write(APIC_LVTTHMR, apic_pm_state.apic_thmr);
> #endif
> ...
> }
>
> and comparing that to:
>
> clear_local_APIC()
> {
> ...
> #ifdef CONFIG_X86_THERMAL_VECTOR
>         if (maxlvt >= 5) {
>                 v = apic_read(APIC_LVTTHMR);
>                 apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED);
>         }
> #endif
> #ifdef CONFIG_X86_MCE_INTEL
>         if (maxlvt >= 6) {
>                 v = apic_read(APIC_LVTCMCI);
>                 if (!(v & APIC_LVT_MASKED))
>                         apic_write(APIC_LVTCMCI, v | APIC_LVT_MASKED);
>         }
> #endif
> ...
> }
>

OK, I will send the .config when I get back home. I have all the kernels I
built as .deb archives. The problem is that the Debian kernel build
procedure does not record the git commit hash anywhere in the .deb file.

For which kernel would you like to see the config? 4.3?

     Vassilis





* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-23 14:11                     ` vasvir
@ 2015-11-23 14:19                       ` Juergen Gross
  2015-11-24 22:46                         ` Luis R. Rodriguez
  0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2015-11-23 14:19 UTC (permalink / raw)
  To: vasvir; +Cc: linux-kernel, Toshi Kani, Luis R. Rodriguez

On 23/11/15 15:11, vasvir@iit.demokritos.gr wrote:
> Ok I will send the .config when I get back home. I have all kernels I
> build in .deb archive. The problem is that the debian kernel build
> procedure does not hold somewhere in the deb file the git commit hash.
> 
> For which kernel would you care to see the config? 4.3?

It doesn't really matter anymore. I've already posted a patch to fix it and
got the reply that the fix is okay, but that no harm can come from the
current implementation, as the two config options are always either both
set or both unset.

Juergen


* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-19  5:39 ` Juergen Gross
  2015-11-19  7:50   ` vasvir
@ 2015-11-23 18:48   ` Luis R. Rodriguez
  2015-11-24  9:36     ` vasvir
  1 sibling, 1 reply; 22+ messages in thread
From: Luis R. Rodriguez @ 2015-11-23 18:48 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Vassilis Virvilis, linux-kernel, Toshi Kani, mcgrof, mcgrof

On Thu, Nov 19, 2015 at 06:39:28AM +0100, Juergen Gross wrote:
> On 18/11/15 22:43, Vassilis Virvilis wrote:
> > Hi,
> > 
> > I have been hit by a hibernate/resume bug. Other people may have too:
> > The following links are consistent with my observations
> > 
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1490494
> > https://bugs.archlinux.org/task/44807
> > 
> > Some observations:
> > 1) The first few rapid hibernation / resume cycles do not fail.
> > 
> > 2) If the computer is loaded (eclipse + chromium + firefox/iceweasel +
> > thunderbird/icedove + Konsole) helps to reproduce and lock up during resume

Let's try to speed up reproducing this.

I have a hunch this might be related to some BIOS-controlled MTRRs and a
mismatch that leads the kernel to think a certain type of MTRR write is OK
when in fact it is not. Given the workload description, perhaps this is
related to fan control, with the BIOS controlling the fans and conflicting
with some other device's MTRR. More on this suspicion in the other thread,
where you provide more logs.

On a kernel that you know fails, can you try replacing this workload with
something that brings your CPU to its knees quickly, perhaps running 'make -j'
on a Linux build for 2, 4, 8 or 16 minutes, then hitting CTRL-C and going on
to hibernate, to see whether triggering the CPU fan accelerates the issue?
If 'make -j' is so heavy that you can't even CTRL-C it, try 'make -j 16'.
Note that if this theory holds, a hot CPU could still trigger CPU fan control
on a fresh boot if the previous boot was CPU intensive.

If this doesn't do it, let's try exercising an MTRR-capable driver. Graphics
is the obvious target, so perhaps try some 3D stuff or a screen saver prior
to hibernation. Note that even if you boot with nomtrr the BIOS may still use
MTRRs: with PAT, Linux drivers may assume MTRRs are not in use, but the BIOS
may still do something behind the scenes. This is actually one reason why we
can't simply remove MTRR support from Linux: the BIOS may still do some wacky
stuff with MTRRs (one example I was given was that CPU fan control might use
WC MTRRs), so the kernel must be aware of this even if no MTRRs are ever used
by the Linux kernel itself, which is the case as of v4.3 onwards.

If that doesn't help speed it up, maybe try a screen saver, some 3D stuff
and CPU-intensive stuff all at once.

To speed up testing, you can try reducing your build time by reducing the
amount of crap you have to build:

make localmodconfig

That should only build the modules your running kernel has loaded, plus
whatever is already built in (=y).

> > 3) Long hibernation times (overnight) helps to reproduce and lock up
> > during resume
> > 
> > 4) For the bad commits (where the lockup during resume takes place) -
> > the image loading during resume is significantly faster. It is fast and
> > then it locks.
> > 
> > How I hit the problem and what I have done:
> > 
> > I am running debian unstable
> > 
> > Debian went from 3.16 to 3.19 - hence the problem raised its ugly head.
> > I upgraded diligently up to 4.2.6 - The problem persists
> > 
> > I started kernel bisection from 3.16 to 3.19 following
> > https://wiki.debian.org/DebianKernel/GitBisect
> > 
> > One month and 25 kernels later see below for the bisect log
> 
> Wow! Thanks for doing this work!
> 

Vassilis, indeed, the amount of work you have put into this is extremely
appreciated!

> Juergen
> 
> > 
> > I hit some untestable kernel that weren't booting. They were hanging at
> > "Loading ramdisk..." before any actual kernel message.
> > 
> > Looks like the first bad / untestable commit is from  Juergen Gross /
> > Thomas Gleixner Merge branch 'x86-mm-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip [full PAT support]
> > 

That is commit a023748d53c10850650fe86b1c4a7d421d576451
("Merge branch 'x86-mm-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")

Git is smart enough to tell you that you've hit a merge commit and that any
of the commits in that merge could be the issue. This is why your bisect
log shows a slew of commits. The next step is to go into the merge and
bisect through its commits; this will let us identify the exact commit
that may have caused the issue.

There are a few ways to do this; my preferred way is to "unfold" a merge
commit manually.

To keep things separate (without affecting other tests you might have in
your other git tree, and to avoid forcing you to lose fresh objects as you
continue to build and test in that tree), I'd do something like this:

mkdir ~/tmp
cd ~/tmp
git clone ~/linux/.git linux-dev-test

cd linux-dev-test

Notice that if you run git log and search for a023748d53c10850650fe86b1c4a7d421d576451
you'll see that the commit preceding it is 773fed910d41e443e495a6bfa9ab1c2b7b13e012
("Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")

To be clear, the list of commits you would typically see is just:

a023748d53c10850650fe86b1c4a7d421d576451 - Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
773fed910d41e443e495a6bfa9ab1c2b7b13e012 - Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

We want to go down into the commits in the merge commit a023748d53c and
then zero in on exactly which commit caused the issue. To do that, in your
linux-dev-test directory you can do this:

git checkout -b test-merge-commit a023748d53c10850650fe86b1c4a7d421d576451

That will create a branch for testing based on the merge commit.
Now do this:

git rebase -i 773fed910d41e443e495a6bfa9ab1c2b7b13e012

Then don't change any of the listed commits; just save and exit the
editor, and git will "unfold" the merge commit for you: it applies each
commit from that merge, one after the other, linearly into your git
history.

For instance, the rebase should show the following 22 commits. Just
leave the defaults as below and save and exit (ESC, then :wq if
in vi):

pick 96e70f832856 x86/mm: Avoid overlap the fixmap area on i386                 
pick 63e7b6d90c1e x86: mm: Re-use the early_ioremap fixed area                  
pick bdee237c0343 x86: mm: Use 2GB memory block size on large-memory x86-64 systems
pick 281d4078bec3 x86: Make page cache mode a real type                         
pick c27ce0af896b x86: Use new cache mode type in include/asm/fb.h              
pick 2d85ebf8e12e x86: Use new cache mode type in drivers/video/fbdev/gbefb.c   
pick 5006e45a6bc2 x86: Use new cache mode type in drivers/video/fbdev/vermilion 
pick 1c64216be164 x86: Use new cache mode type in arch/x86/pci                  
pick 2df58b6d3530 x86: Use new cache mode type in arch/x86/mm/init_64.c         
pick d85f33342a0f x86: Use new cache mode type in asm/pgtable.h                 
pick 49a3b3cbdf16 x86: Use new cache mode type in mm/iomap_32.c                 
pick 2a3746984c98 x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
pick 102e19e1955d x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
pick c06814d8419a x86: Use new cache mode type in setting page attributes       
pick b14097bd911c x86: Use new cache mode type in mm/ioremap.c                  
pick e00c8cc93c1a x86: Use new cache mode type in memtype related functions     
pick 87ad0b713b10 x86: Clean up pgtable_types.h                                 
pick f439c429c320 x86: Support PAT bit in pagetable dump for lower levels       
pick f5b2831d6541 x86: Respect PAT bit when copying pte values between large and normal pages
pick bd809af16e3a x86: Enable PAT to use cache mode translation tables          
pick 47591df50512 xen: Support Xen pv-domains using PAT                         
pick 0dbcae884779 x86: mm: Move PAT only functions to mm/pat.c 

You should see:

Successfully rebased and updated refs/heads/test-merge-commit.

Now if you run git log you will see the above commits as a linear
history. You can now bisect through this merge one commit at a time, so do:

git bisect start 099487de0934e3d5e326666914a426af89a0868b 773fed910d41e443e495a6bfa9ab1c2b7b13e012

Note that this assumes that the commit prior to the merge commit is fine.
Can you confirm that this is true? (git checkout -b test-prior-merge-gtest 773fed910d4,
build it and check that it doesn't break there.)

If we know for sure that 773fed910d4 did not break anything, then the above
bisect should let us zero in on the exact commit that caused the issue.
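On a linear history such as the unfolded merge, git bisect effectively performs a binary search between the known-good base and the known-bad tip. A minimal C sketch of that search, with the multi-day build and hibernate test replaced by a simulated predicate and a hypothetical culprit index (git's handling of skipped commits is not modelled), shows how many test kernels the 22 unfolded commits need in the worst case:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated test result: a commit is "bad" iff it is at or after the
 * hypothetical culprit. In real life this is days of hibernate cycles. */
static bool is_bad(int idx, int culprit)
{
	return idx >= culprit;
}

/*
 * Binary search over commits 0..n-1 (oldest first) on top of a known-good
 * base, where commit n-1 (the tip) is known bad. Returns the index of the
 * first bad commit and stores the number of tests in *tests_out.
 */
int bisect_first_bad(int n, int culprit, int *tests_out)
{
	int lo = 0, hi = n - 1;  /* the first bad commit lies in [lo, hi] */
	int tests = 0;

	assert(n > 0 && culprit >= 0 && culprit < n);
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;

		tests++;
		if (is_bad(mid, culprit))
			hi = mid;       /* culprit is mid or earlier */
		else
			lo = mid + 1;   /* culprit is after mid */
	}
	if (tests_out != NULL)
		*tests_out = tests;
	return lo;
}

/* Worst-case number of tests over every possible culprit position. */
int bisect_worst_case_tests(int n)
{
	int worst = 0;

	for (int c = 0; c < n; c++) {
		int t;

		bisect_first_bad(n, c, &t);
		if (t > worst)
			worst = t;
	}
	return worst;
}
```

bisect_worst_case_tests(22) returns 5, i.e. at most five test kernels, which matters when confirming a good commit takes 2-3 days.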

> > Full disclaimer: I may have fucked up the bisection. Finding bad commits
> > was semi easy - finding good commits needs a run time for 2-3 days.

Reducing the time it takes to reproduce a bug is an art, but perhaps
we can reduce that time here.

> > 
> > I would really appreciate some help and directions to nail this down.
> > 

The amount of time and patience on your side is appreciated as well.

> > 
> > Regards
> > 
> >      Vassilis Virvilis
> > 
> > 
> > 
> > bill@localhost:~/Downloads/linux$ git bisect log
> > git bisect start
> > # good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
> > git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
> > # bad: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
> > git bisect bad bfa76d49576599a4b9f9b7a71f23d73d6dcff735
> > # good: [754c780953397dd5ee5191b7b3ca67e09088ce7a] Merge branch
> > 'for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
> > git bisect good 754c780953397dd5ee5191b7b3ca67e09088ce7a
> > # bad: [7ef58b32f571bffb7763c6252ad7527562081f34] Merge tag
> > 'devicetree-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux
> > git bisect bad 7ef58b32f571bffb7763c6252ad7527562081f34
> > # good: [53429290a054b30e4683297409fc4627b2592315] Merge
> > git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
> > git bisect good 53429290a054b30e4683297409fc4627b2592315
> > # good: [3a647c1d7ab08145cee4b650f5e797d168846c51] Merge tag
> > 'drivers-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> > git bisect good 3a647c1d7ab08145cee4b650f5e797d168846c51
> > # bad: [1366f5d3129f2abde606214de7afc3dd61781fa3] Merge branch
> > 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
> > git bisect bad 1366f5d3129f2abde606214de7afc3dd61781fa3
> > # good: [151cd97630f87451cab412e40750d0e5f7581c98] Merge tag
> > 'defconfig-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> > git bisect good 151cd97630f87451cab412e40750d0e5f7581c98
> > # good: [ecb50f0afd35a51ef487e8a54b976052eb03d729] Merge branch
> > 'irq-core-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect good ecb50f0afd35a51ef487e8a54b976052eb03d729
> > # bad: [3a5dc1fafb016560315fe45bb4ef8bde259dd1bc] Merge branch
> > 'x86-microcode-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect bad 3a5dc1fafb016560315fe45bb4ef8bde259dd1bc
> > # good: [b6444bd0a18eb47343e16749ce80a6ebd521f124] Merge branch
> > 'x86-boot-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect good b6444bd0a18eb47343e16749ce80a6ebd521f124
> > # bad: [a023748d53c10850650fe86b1c4a7d421d576451] Merge branch
> > 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect bad a023748d53c10850650fe86b1c4a7d421d576451
> > # good: [773fed910d41e443e495a6bfa9ab1c2b7b13e012] Merge branches
> > 'x86-platform-for-linus' and 'x86-uv-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect good 773fed910d41e443e495a6bfa9ab1c2b7b13e012
> > # good: [49a3b3cbdf1621678a39bd95a3e67c0f858539c7] x86: Use new cache
> > mode type in mm/iomap_32.c
> > git bisect good 49a3b3cbdf1621678a39bd95a3e67c0f858539c7
> > # skip: [87ad0b713b1034b6caf559976c35ce47f6d1d1e9] x86: Clean up
> > pgtable_types.h
> > git bisect skip 87ad0b713b1034b6caf559976c35ce47f6d1d1e9
> > # skip: [c06814d8419a74528500f85faf5fc01f67f8e7e6] x86: Use new cache
> > mode type in setting page attributes
> > git bisect skip c06814d8419a74528500f85faf5fc01f67f8e7e6
> > # skip: [e00c8cc93c1ac01ecd5049929a50fb47b62bb041] x86: Use new cache
> > mode type in memtype related functions
> > git bisect skip e00c8cc93c1ac01ecd5049929a50fb47b62bb041
> > # skip: [bd809af16e3ab1f8d55b3e2928c47c67e2a865d2] x86: Enable PAT to
> > use cache mode translation tables
> > git bisect skip bd809af16e3ab1f8d55b3e2928c47c67e2a865d2
> > # skip: [f439c429c320981943f8b64b2a4049d946cb492b] x86: Support PAT bit
> > in pagetable dump for lower levels
> > git bisect skip f439c429c320981943f8b64b2a4049d946cb492b
> > # skip: [47591df505129c9774af6cca2debf283a6e56ed7] xen: Support Xen
> > pv-domains using PAT
> > git bisect skip 47591df505129c9774af6cca2debf283a6e56ed7
> > # skip: [b14097bd911c2554b0b5271b3a6b2d84044d1843] x86: Use new cache
> > mode type in mm/ioremap.c
> > git bisect skip b14097bd911c2554b0b5271b3a6b2d84044d1843
> > # skip: [102e19e1955d85f31475416b1ee22980c6462cf8] x86: Remove looking
> > for setting of _PAGE_PAT_LARGE in pageattr.c
> > git bisect skip 102e19e1955d85f31475416b1ee22980c6462cf8
> > # skip: [f5b2831d654167d77da8afbef4d2584897b12d0c] x86: Respect PAT bit
> > when copying pte values between large and normal pages
> > git bisect skip f5b2831d654167d77da8afbef4d2584897b12d0c
> > # skip: [0dbcae884779fdf7e2239a97ac7488877f0693d9] x86: mm: Move PAT
> > only functions to mm/pat.c
> > git bisect skip 0dbcae884779fdf7e2239a97ac7488877f0693d9
> > # skip: [2a3746984c98b17b565e6a2c2bbaaaef757db1b4] x86: Use new cache
> > mode type in track_pfn_remap() and track_pfn_insert()
> > git bisect skip 2a3746984c98b17b565e6a2c2bbaaaef757db1b4
> > # only skipped commits left to test
> > # possible first bad commit: [a023748d53c10850650fe86b1c4a7d421d576451]
> > Merge branch 'x86-mm-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > # possible first bad commit: [0dbcae884779fdf7e2239a97ac7488877f0693d9]
> > x86: mm: Move PAT only functions to mm/pat.c
> > # possible first bad commit: [47591df505129c9774af6cca2debf283a6e56ed7]
> > xen: Support Xen pv-domains using PAT
> > # possible first bad commit: [bd809af16e3ab1f8d55b3e2928c47c67e2a865d2]
> > x86: Enable PAT to use cache mode translation tables
> > # possible first bad commit: [f5b2831d654167d77da8afbef4d2584897b12d0c]
> > x86: Respect PAT bit when copying pte values between large and normal pages
> > # possible first bad commit: [f439c429c320981943f8b64b2a4049d946cb492b]
> > x86: Support PAT bit in pagetable dump for lower levels
> > # possible first bad commit: [87ad0b713b1034b6caf559976c35ce47f6d1d1e9]
> > x86: Clean up pgtable_types.h
> > # possible first bad commit: [e00c8cc93c1ac01ecd5049929a50fb47b62bb041]
> > x86: Use new cache mode type in memtype related functions
> > # possible first bad commit: [b14097bd911c2554b0b5271b3a6b2d84044d1843]
> > x86: Use new cache mode type in mm/ioremap.c
> > # possible first bad commit: [c06814d8419a74528500f85faf5fc01f67f8e7e6]
> > x86: Use new cache mode type in setting page attributes
> > # possible first bad commit: [102e19e1955d85f31475416b1ee22980c6462cf8]
> > x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
> > # possible first bad commit: [2a3746984c98b17b565e6a2c2bbaaaef757db1b4]
> > x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
> 

-- 
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg

* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-21 11:49                 ` Vassilis Virvilis
  2015-11-23  7:32                   ` Juergen Gross
@ 2015-11-23 18:56                   ` Luis R. Rodriguez
  2015-11-23 23:01                     ` Vassilis Virvilis
  1 sibling, 1 reply; 22+ messages in thread
From: Luis R. Rodriguez @ 2015-11-23 18:56 UTC (permalink / raw)
  To: Vassilis Virvilis; +Cc: Juergen Gross, linux-kernel, Toshi Kani

On Sat, Nov 21, 2015 at 01:49:06PM +0200, Vassilis Virvilis wrote:
> On 11/20/2015 02:23 PM, Juergen Gross wrote:
> >On 20/11/15 11:04, vasvir@iit.demokritos.gr wrote:
> >>>I've just found a potential issue: In case MTRR is disabled by the BIOS
> >>>the PAT register of the boot processor won't be restored after resume.
> >>>
> >>>Can you check whether pr_info("MTRR: Disabled\n") has been executed in
> >>>early boot? If yes, this might be a BIOS option.
> >>>
> >>
> >>I don't have access right now. I will test it later tonight (This is my
> >>home machine).
> >>
> >>Would $dmesg | grep -i mtrr suffice, or do I need to look for the MTRR
> >>state somewhere else, e.g. /proc, /sys etc.?
> >
> >I think grepping for MTRR in dmesg should be enough.
> 
> Kernel 4.3 with nopat also died on the 4th or 5th hibernation, at the familiar "Calling lapic..." point (see the previously attached image).
> 
> $dmesg | grep -i mtr for 4.3 kernel with nopat
> [    0.189113] calling  mtrr_if_init+0x0/0x5f @ 1
> [    0.189116] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [    0.189222] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
> [    0.189559] calling  mtrr_init_finialize+0x0/0x3a @ 1
> [    0.189560] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
> [    8.994140] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
> [    8.994154] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer.

It's not clear from the log who made this WC MTRR call that failed; I
hope we didn't attempt a WC write on a WB region. Who owns
00000000e0000000-00000000efffffff?

What does your log show right before and after this? To find out try:

dmesg | grep -5 -i mtrr  

Not being able to use WC is not fatal, it's just a performance issue. But if we
tried to override to WC a region that we should not, one that the BIOS might
rely on not being WC, that could be a big issue.
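To read the two warning lines side by side, here is a minimal sketch (Python, standard library only) that decodes the "type mismatch" line. It assumes the two hex fields are base and size in bytes, which is consistent with the "Failed to add WC MTRR for [00000000e0000000-00000000efffffff]" line that follows it. The regex and the helper name decode_mismatch are illustrative, not kernel code.

```python
import re

# Matches warnings such as
#   mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
MISMATCH_RE = re.compile(
    r"mtrr: type mismatch for (?P<base>[0-9a-f]+),(?P<size>[0-9a-f]+) "
    r"old: (?P<old>[\w-]+) new: (?P<new>[\w-]+)"
)


def decode_mismatch(line):
    """Return (start, end_inclusive, old_type, new_type), or None if no match.

    Assumption: the two hex fields are base and size in bytes.
    """
    m = MISMATCH_RE.search(line)
    if m is None:
        return None
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    return base, base + size - 1, m.group("old"), m.group("new")


line = ("[    9.943520] mtrr: type mismatch for e0000000,10000000 "
        "old: write-back new: write-combining")
start, end, old, new = decode_mismatch(line)
print(f"[{start:016x}-{end:016x}] {old} -> {new}, {(end - start + 1) >> 20} MiB")
# prints: [00000000e0000000-00000000efffffff] write-back -> write-combining, 256 MiB
```

Decoded this way, the request covers the 256 MiB range starting at 0xe0000000, the same range named in the "Failed to add WC MTRR" line.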

> $dmesg | grep -i mtr for 4.3 kernel with default pat enabled
> [    0.189368] calling  mtrr_if_init+0x0/0x5f @ 1
> [    0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [    0.189478] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
> [    0.189814] calling  mtrr_init_finialize+0x0/0x3a @ 1
> [    0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs

The fact that we don't see a conflict doesn't mean an issue or conflict didn't
trigger. PAT may have missed something the BIOS did, making the kernel assume
it could do something that it was not able to. The MTRR init code should pick
up on this stuff and let the kernel PAT code know if there could be a conflict,
but if for some reason that was missed, that could be an issue.

> I also checked my BIOS. I found nothing about MTRR. My BIOS manual is ftp://europe.asrock.com/Manual/H97%20Pro4.pdf. Can you see any option about MTRR?
> 
> Question: If we assume your theory about MTRR/PAT is correct, wouldn't the lockup/hang/reboot happen every time the system hibernates/resumes? Can this assumption explain why the first hibernate/resume cycles in rapid succession after system boot work, while the long ones fail somewhat more consistently?
> 
> Note: With PAT enabled the system boots up significantly faster.
> 
> At the weekend I will return to 3.18-rc2 and try to verify that my bisection is correct. Second-guessing yourself is a terrible thing...
> 
> I will also try with nopat and I will run dmesg | grep -i mtr and post results
> 
> Unless you have any other suggestions...

Bisection on the merge commit would help.

 Luis

* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-23 18:56                   ` Luis R. Rodriguez
@ 2015-11-23 23:01                     ` Vassilis Virvilis
  2015-11-24 22:16                       ` Luis R. Rodriguez
  0 siblings, 1 reply; 22+ messages in thread
From: Vassilis Virvilis @ 2015-11-23 23:01 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: Juergen Gross, linux-kernel, Toshi Kani

[-- Attachment #1: Type: text/plain, Size: 9405 bytes --]

On 11/23/2015 08:56 PM, Luis R. Rodriguez wrote:
> It's not clear from the log who made this WC MTRR call that failed; I
> hope we didn't attempt a WC write on a WB region. Who owns
> 00000000e0000000-00000000efffffff?

How can I answer that? Is there a utility to run, or should I peek inside /proc?

Here is an idea:
$dmesg | grep -i -5 e0000000
[    0.220941] pci_bus 0000:00: root bus resource [mem 0x000e4000-0x000e7fff window]
[    0.220944] pci_bus 0000:00: root bus resource [mem 0xdf200000-0xfeafffff window]
[    0.220950] pci 0000:00:00.0: [8086:0c00] type 00 class 0x060000
[    0.221012] pci 0000:00:02.0: [8086:0412] type 00 class 0x030000
[    0.221021] pci 0000:00:02.0: reg 0x10: [mem 0xf7800000-0xf7bfffff 64bit]
[    0.221025] pci 0000:00:02.0: reg 0x18: [mem 0xe0000000-0xefffffff 64bit pref]
[    0.221028] pci 0000:00:02.0: reg 0x20: [io  0xf000-0xf03f]
[    0.221081] pci 0000:00:03.0: [8086:0c0c] type 00 class 0x040300
[    0.221089] pci 0000:00:03.0: reg 0x10: [mem 0xf7c34000-0xf7c37fff 64bit]
[    0.221163] pci 0000:00:14.0: [8086:8cb1] type 00 class 0x0c0330
[    0.221184] pci 0000:00:14.0: reg 0x10: [mem 0xf7c20000-0xf7c2ffff 64bit]
--
[    0.453765] calling  ioapic_init_ops+0x0/0xf @ 1
[    0.453767] initcall ioapic_init_ops+0x0/0xf returned 0 after 0 usecs
[    0.453770] calling  add_pcspkr+0x0/0x3b @ 1
[    0.453781] initcall add_pcspkr+0x0/0x3b returned 0 after 8 usecs
[    0.453783] calling  sysfb_init+0x0/0x96 @ 1
[    0.453811] simple-framebuffer simple-framebuffer.0: framebuffer at 0xe0000000, 0x6bb000 bytes, mapped to 0xffffc90002000000
[    0.453814] simple-framebuffer simple-framebuffer.0: format=a8r8g8b8, mode=1680x1050x32, linelength=6720
[    0.557233] Console: switching to colour frame buffer device 210x65
[    0.660632] simple-framebuffer simple-framebuffer.0: fb0: simplefb registered!
[    0.661262] initcall sysfb_init+0x0/0x96 returned 0 after 202686 usecs
[    0.661266] calling  audit_classes_init+0x0/0xaa @ 1
--
[    9.744397] input: gspca_zc3xx as /devices/pci0000:00/0000:00:14.0/usb3/3-3/input/input18
[    9.744481] usbcore: registered new interface driver gspca_zc3xx
[    9.744484] initcall sd_driver_init+0x0/0x1000 [gspca_zc3xx] returned 0 after 319 usecs
[    9.745108] calling  i915_init+0x0/0xa2 [i915] @ 403
[    9.745542] [drm] Memory usable by graphics device = 2048M
[    9.745544] checking generic (e0000000 6bb000) vs hw (e0000000 10000000)
[    9.745544] fb: switching to inteldrmfb from simple
[    9.745831] calling  alsa_seq_device_init+0x0/0x1000 [snd_seq_device] @ 384
[    9.745842] initcall alsa_seq_device_init+0x0/0x1000 [snd_seq_device] returned 0 after 9 usecs
[    9.746179] calling  hmac_module_init+0x0/0x1000 [hmac] @ 471
[    9.746180] initcall hmac_module_init+0x0/0x1000 [hmac] returned 0 after 0 usecs
--
[    9.749840] calling  usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] @ 384
[    9.751163] usbcore: registered new interface driver snd-usb-audio
[    9.751166] initcall usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] returned 0 after 1292 usecs
[    9.943166] Console: switching to colour dummy device 80x25
[    9.943240] [drm] Replacing VGA console driver
[    9.943520] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
[    9.943526] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer.
[    9.947147] Adding 31249404k swap on /dev/sdb1.  Priority:-1 extents:1 across:31249404k FS
[    9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    9.949728] [drm] Driver supports precise vblank timestamp query.
[    9.949801] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    9.965787] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)

$lspci | grep 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

Looks like it is the graphics card or the graphics driver.
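The lookup done by hand above (grep for the address, then find which "reg" line covers it) can be scripted. A minimal sketch, assuming the "pci DDDD:BB:DD.F: reg 0xNN: [mem 0xSTART-0xEND ...]" line format shown in this log; bar_owners is a hypothetical helper, not an existing tool. The same ownership information is normally also listed in /proc/iomem.

```python
import re

# Matches PCI memory BAR lines printed during enumeration, e.g.
#   pci 0000:00:02.0: reg 0x18: [mem 0xe0000000-0xefffffff 64bit pref]
# I/O BARs ("[io  ...]") are deliberately not matched.
BAR_RE = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"reg 0x(?P<reg>[0-9a-f]+): \[mem 0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)"
)


def bar_owners(dmesg_lines, addr):
    """Return (device, BAR register) pairs whose memory BAR contains addr."""
    owners = []
    for line in dmesg_lines:
        m = BAR_RE.search(line)
        if m and int(m.group("start"), 16) <= addr <= int(m.group("end"), 16):
            owners.append((m.group("dev"), "0x" + m.group("reg")))
    return owners


# A few lines copied from the log above.
log = [
    "[    0.221021] pci 0000:00:02.0: reg 0x10: [mem 0xf7800000-0xf7bfffff 64bit]",
    "[    0.221025] pci 0000:00:02.0: reg 0x18: [mem 0xe0000000-0xefffffff 64bit pref]",
    "[    0.221028] pci 0000:00:02.0: reg 0x20: [io  0xf000-0xf03f]",
    "[    0.221089] pci 0000:00:03.0: reg 0x10: [mem 0xf7c34000-0xf7c37fff 64bit]",
]
print(bar_owners(log, 0xe0000000))  # [('0000:00:02.0', '0x18')]
```

On this log the address resolves to BAR 0x18 of 0000:00:02.0, the integrated graphics controller identified by lspci.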

I don't know if this is relevant
$ cat /proc/mtrr
reg00: base=0x000000000 (    0MB), size=16384MB, count=1: write-back
reg01: base=0x400000000 (16384MB), size=  512MB, count=1: write-back
reg02: base=0x0e0000000 ( 3584MB), size=  512MB, count=1: uncachable
reg03: base=0x0d0000000 ( 3328MB), size=  256MB, count=1: uncachable
reg04: base=0x0cf000000 ( 3312MB), size=   16MB, count=1: uncachable
reg05: base=0x41f000000 (16880MB), size=   16MB, count=1: uncachable
reg06: base=0x41ee00000 (16878MB), size=    2MB, count=1: uncachable
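To see why a write-combining request for e0000000-efffffff is flagged against this table, here is a deliberately simplified model in Python. It parses the /proc/mtrr lines above and reports the first variable register that fully encloses the request with a different type. Assumptions: registers are scanned in order reg00..reg06, sizes are always printed in MB (true for this output only), and uncachable registers are skipped. This is not the kernel's exact rule set; see mtrr_add_page() under arch/x86/kernel/cpu/mtrr/ for that.

```python
import re

# The /proc/mtrr output posted above.
PROC_MTRR = """\
reg00: base=0x000000000 (    0MB), size=16384MB, count=1: write-back
reg01: base=0x400000000 (16384MB), size=  512MB, count=1: write-back
reg02: base=0x0e0000000 ( 3584MB), size=  512MB, count=1: uncachable
reg03: base=0x0d0000000 ( 3328MB), size=  256MB, count=1: uncachable
reg04: base=0x0cf000000 ( 3312MB), size=   16MB, count=1: uncachable
reg05: base=0x41f000000 (16880MB), size=   16MB, count=1: uncachable
reg06: base=0x41ee00000 (16878MB), size=    2MB, count=1: uncachable
"""

# Assumes sizes are printed in MB, as in this particular output.
LINE_RE = re.compile(
    r"(?P<reg>reg\d+): base=0x(?P<base>[0-9a-f]+) .*?size=\s*(?P<size>\d+)MB"
    r".*: (?P<type>[\w-]+)$"
)


def parse_proc_mtrr(text):
    """Return (reg, base, size_in_bytes, type) tuples, in register order."""
    regs = []
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if m:
            regs.append((m["reg"], int(m["base"], 16), int(m["size"]) << 20, m["type"]))
    return regs


def first_conflict(regs, start, size, new_type):
    """Simplified model: the first non-uncachable register that fully encloses
    [start, start + size) with a type other than new_type, else None."""
    end = start + size - 1
    for reg, base, rsize, rtype in regs:
        if rtype == "uncachable":
            continue  # simplification chosen for this sketch, not the kernel's rule
        if base <= start and end <= base + rsize - 1 and rtype != new_type:
            return reg, rtype
    return None


regs = parse_proc_mtrr(PROC_MTRR)
print(first_conflict(regs, 0xe0000000, 0x10000000, "write-combining"))
# prints: ('reg00', 'write-back')
```

In this model the only write-back register enclosing the request is reg00, matching "old: write-back" in the warning. Note that reg02 also covers the same range as uncachable, which the sketch ignores by design.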

>
> What does your log show right before and after this? To find out try:
>
> dmesg | grep -5 -i mtrr
>

See the full dmesg attached.

$dmesg | grep -5 -i mtrr
[    0.189333] initcall arch_kdebugfs_init+0x0/0x1f returned 0 after 0 usecs
[    0.189336] calling  pt_init+0x0/0x2a4 @ 1
[    0.189349] initcall pt_init+0x0/0x2a4 returned -19 after 0 usecs
[    0.189352] calling  bts_init+0x0/0xa4 @ 1
[    0.189354] initcall bts_init+0x0/0xa4 returned 0 after 0 usecs
[    0.189357] calling  mtrr_if_init+0x0/0x5f @ 1
[    0.189360] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
[    0.189362] calling  ffh_cstate_init+0x0/0x26 @ 1
[    0.189363] initcall ffh_cstate_init+0x0/0x26 returned 0 after 0 usecs
[    0.189366] calling  activate_jump_labels+0x0/0x2d @ 1
[    0.189367] initcall activate_jump_labels+0x0/0x2d returned 0 after 0 usecs
[    0.189370] calling  kcmp_cookies_init+0x0/0x31 @ 1
--
[    0.189424] calling  dmi_id_init+0x0/0x300 @ 1
[    0.189448] initcall dmi_id_init+0x0/0x300 returned 0 after 0 usecs
[    0.189450] calling  pci_arch_init+0x0/0x63 @ 1
[    0.189458] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xf8000000-0xfbffffff] (base 0xf8000000)
[    0.189462] PCI: MMCONFIG at [mem 0xf8000000-0xfbffffff] reserved in E820
[    0.189467] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
[    0.189514] PCI: Using configuration type 1 for base access
[    0.189519] initcall pci_arch_init+0x0/0x63 returned 0 after 0 usecs
[    0.189528] calling  init_vdso+0x0/0x44 @ 1
[    0.189535] initcall init_vdso+0x0/0x44 returned 0 after 0 usecs
[    0.189538] calling  sysenter_setup+0x0/0x52 @ 1
--
[    0.189542] calling  topology_init+0x0/0x83 @ 1
[    0.189795] initcall topology_init+0x0/0x83 returned 0 after 0 usecs
[    0.189798] calling  fixup_ht_bug+0x0/0xed @ 1
[    0.189799] perf_event_intel: PMU erratum BJ122, BV98, HSD29 worked around, HT is on
[    0.189802] initcall fixup_ht_bug+0x0/0xed returned 0 after 0 usecs
[    0.189805] calling  mtrr_init_finialize+0x0/0x3a @ 1
[    0.189807] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
[    0.189809] calling  uid_cache_init+0x0/0x90 @ 1
[    0.189810] initcall uid_cache_init+0x0/0x90 returned 0 after 0 usecs
[    0.189812] calling  param_sysfs_init+0x0/0x1d9 @ 1
[    0.190106] initcall param_sysfs_init+0x0/0x1d9 returned 0 after 0 usecs
[    0.190108] calling  pm_sysrq_init+0x0/0x14 @ 1
--
[    9.749840] calling  usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] @ 384
[    9.751163] usbcore: registered new interface driver snd-usb-audio
[    9.751166] initcall usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] returned 0 after 1292 usecs
[    9.943166] Console: switching to colour dummy device 80x25
[    9.943240] [drm] Replacing VGA console driver
[    9.943520] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
[    9.943526] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer.
[    9.947147] Adding 31249404k swap on /dev/sdb1.  Priority:-1 extents:1 across:31249404k FS
[    9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    9.949728] [drm] Driver supports precise vblank timestamp query.
[    9.949801] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    9.965787] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)

> Not being able to use WC is not fatal, it's just a performance issue. But if
> we tried to override to WC a region that we should not, one that the BIOS
> might rely on not being WC, that could be a big issue.
>

>> $dmesg | grep -i mtr for 4.3 kernel with default pat enabled
>> [    0.189368] calling  mtrr_if_init+0x0/0x5f @ 1
>> [    0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
>> [    0.189478] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
>> [    0.189814] calling  mtrr_init_finialize+0x0/0x3a @ 1
>> [    0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
>
> The fact that we don't see a conflict doesn't mean an issue or conflict didn't
> trigger. PAT may have missed something the BIOS did, making the kernel assume
> it could do something that it was not able to. The MTRR init code should pick
> up on this stuff and let the kernel PAT code know if there could be a conflict,
> but if for some reason that was missed, that could be an issue.
>

OK, I am not sure whether there is something I should do here. I am attaching the dmesg for that boot just in case.
$cat /proc/mtrr gives the same results.

>> Unless you have any other suggestions...
>
> Bisection on the merge commit would help.
>

Will do.

Thanks for the guidance, and the thorough explanations.

     Vassilis



[-- Attachment #2: dmesg-4.3-nopat.txt.gz --]
[-- Type: application/x-gzip, Size: 27174 bytes --]

[-- Attachment #3: dmesg-4.3-pat.txt.gz --]
[-- Type: application/x-gzip, Size: 27067 bytes --]

* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-23 18:48   ` Luis R. Rodriguez
@ 2015-11-24  9:36     ` vasvir
  2015-11-24 22:03       ` Luis R. Rodriguez
  0 siblings, 1 reply; 22+ messages in thread
From: vasvir @ 2015-11-24  9:36 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Juergen Gross, Vassilis Virvilis, linux-kernel, Toshi Kani,
	mcgrof, mcgrof

> Let's try to speed up reproducing this.
>
> I have a hunch this might be related to some BIOS-controlled
> MTRRs and a mismatch which then leads the kernel to think that a type
> of MTRR write might be OK, but in fact it's not. Given the workload
> description, perhaps this could be related to fan control, the BIOS's
> control over it, and a clash with some other device's MTRR. More on this
> suspicion in the other thread where you provide more logs.
>
> On a kernel that you know fails, can you try replacing this workload by
> bringing your CPU to its knees quickly, perhaps with 'make -j' on a Linux
> build for 2, 4, 8 or 16 minutes, and then hitting CTRL-C and continuing to
> hibernation, to see if making the CPU fan kick in accelerates the issue?
> If 'make -j' is so heavy that you can't even CTRL-C it, try 'make -j 16'.
> Note that if this is true, a hot CPU could still trigger CPU fan controls
> on a fresh boot if the previous boot was CPU intensive.

OK, that nailed it: with kernel 4.3, a known "bad" kernel, I was able to
reproduce it on the second hibernate/resume cycle. Here is what I did, in
my own words, so you can spot inconsistencies.

I started a kernel compile with make -j 32. My computer stayed very
responsive, which is an impressive feat by the way.
In a second tab of my Konsole (I am running KDE) I ran $watch sensors. I
watched the core temperatures go from 38 to ~70 °C and the CPU fan
from ~1630 to ~1900 RPM. The first time, I hit Ctrl+C to stop the
compilation and then hibernated from KDE. I always hibernate from the KDE
start menu. Previously I had run some tests hibernating from the VT
console (although sddm may have been running on VT7) and I managed to
reproduce it there too, so (in my mind) it was not graphics-mode specific.
Since then I have always hibernated from KDE.

The first time it worked. For the second run I thought: why hit
Ctrl+C? Let's try to hibernate with the compilation still running. And it
failed. Now I don't know whether it failed because it was the second cycle,
because of the compilation load, or because of the
temperature-controlled fan register you mentioned.

Then I repeated the test with a known good kernel, 3.18 (which should be
773fed910d41e443e495a6bfa9ab1c2b7b13e012 according to my git bisect logs;
I have a problem there, see below), and it survived the same test
(two hibernations with the temperature at ~70 °C).


> If this doesn't do it, let's try forcing an MTRR-capable driver; graphics
> is the obvious target, so perhaps try some 3D stuff or a screen saver prior
> to hibernation. Note that even if you boot nomtrr the BIOS may still use
> MTRRs, and PAT use on Linux could assume MTRR is not being used by drivers
> while the BIOS still does something behind the scenes. This is actually one
> reason why we can't exactly remove MTRR support from Linux: the BIOS may
> still do some wacky stuff with MTRRs (one example I was given was that CPU
> can control might use WC MTRRs), so the kernel must be aware of this even
> if no MTRRs are ever used by the Linux kernel at all -- this is the case
> now as of v4.3 and onwards.
>
> If that doesn't help speed it up, maybe try screen saver + some 3D
> stuff + CPU-intensive stuff all together.

I have 3D effects enabled in my KDE. Since your tip succeeded in reproducing
the problem early I didn't bother, but if I should test 3D, which program /
benchmark should I run? glxgears?

>
> To help you speed up testing, you can try reducing your build time by
> reducing the amount of crap you have to build:
>
> make localmodconfig
>
> That should only build things your kernel has loaded as modules or has
> already enabled (=y).
>

Thanks for the tip. I don't want to change that right now. I don't mind
waiting a little because I get a .deb with the kernel and can retest
a known configuration. The other tip you gave, if it really works as it
seems to, would speed up the debugging cycle so much that I would become
the bottleneck.

>
> That is commit a023748d53c10850650fe86b1c4a7d421d576451
> ("Merge branch 'x86-mm-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
>
> Git is smart enough to tell you you've hit a merge commit and that all the
> possible commits in that merge could be the issue. This is why your bisect
> log shows a slew of commits. The next step is to unfold the merge and then
> bisect through that; this will let us identify the exact commit that may
> have caused the issue.
>
> There are a few ways to do this; my preferred way is to "unfold" a merge
> commit manually.
>
> To keep things separate (without affecting other tests you might have on
> your other git tree, and to avoid forcing you to lose fresh objects as you
> continue to build-test on the other tree), I'd do something like this:

We will go with your preferred way, no question about that.

>
> mkdir ~/tmp
> cd ~/tmp
> git clone ~/linux/.git linux-dev-test

OK, I have them in parallel: ~/path/linux and ~/path/linux-dev-test

>
> cd linux-dev-test
>
> Notice how if you do git log and search for
> a023748d53c10850650fe86b1c4a7d421d576451
> you'll see that the commit listed before this is
> 773fed910d41e443e495a6bfa9ab1c2b7b13e012
> ("Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
>
> To be clear the list of commits you typically would see is just:
>
> a023748d53c10850650fe86b1c4a7d421d576451 - Merge branch 'x86-mm-for-linus'
> of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> 773fed910d41e443e495a6bfa9ab1c2b7b13e012 - Merge branches
> 'x86-platform-for-linus' and 'x86-uv-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>
> We want to go down into the commits in the merge commit a023748d53c and
> then zero out exactly which commit caused the issue. To do that on your
> linux-dev-test directory you can do this:

Thank you for the explanations. I think I had understood that bit; git
bisect visualize (gitk) helped me grasp it. git log gave me a hard
time with all these "hidden commits". Confirmation is good.

>
> git checkout -b test-merge-commit a023748d53c10850650fe86b1c4a7d421d576451
>
> That will create a branch for testing based on the merge commit.
> Now do this:
>
> git rebase -i 773fed910d41e443e495a6bfa9ab1c2b7b13e012
>
> Then don't change any of the listed commits, just save and exit the
> editor, and git will actually "unfold" the merge commit for you -- it
> magically will apply each commit in that merge commit linearly into your
> git history.
>
> For instance, the rebase should show 22 commits as follows; just leave
> the defaults as below and hit ESC + :wq if in vi:
>
> pick 96e70f832856 x86/mm: Avoid overlap the fixmap area on i386
> pick 63e7b6d90c1e x86: mm: Re-use the early_ioremap fixed area
> pick bdee237c0343 x86: mm: Use 2GB memory block size on large-memory x86-64 systems
> pick 281d4078bec3 x86: Make page cache mode a real type
> pick c27ce0af896b x86: Use new cache mode type in include/asm/fb.h
> pick 2d85ebf8e12e x86: Use new cache mode type in drivers/video/fbdev/gbefb.c
> pick 5006e45a6bc2 x86: Use new cache mode type in drivers/video/fbdev/vermilion
> pick 1c64216be164 x86: Use new cache mode type in arch/x86/pci
> pick 2df58b6d3530 x86: Use new cache mode type in arch/x86/mm/init_64.c
> pick d85f33342a0f x86: Use new cache mode type in asm/pgtable.h
> pick 49a3b3cbdf16 x86: Use new cache mode type in mm/iomap_32.c
> pick 2a3746984c98 x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
> pick 102e19e1955d x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
> pick c06814d8419a x86: Use new cache mode type in setting page attributes
> pick b14097bd911c x86: Use new cache mode type in mm/ioremap.c
> pick e00c8cc93c1a x86: Use new cache mode type in memtype related functions
> pick 87ad0b713b10 x86: Clean up pgtable_types.h
> pick f439c429c320 x86: Support PAT bit in pagetable dump for lower levels
> pick f5b2831d6541 x86: Respect PAT bit when copying pte values between large and normal pages
> pick bd809af16e3a x86: Enable PAT to use cache mode translation tables
> pick 47591df50512 xen: Support Xen pv-domains using PAT
> pick 0dbcae884779 x86: mm: Move PAT only functions to mm/pat.c
>

OK, I will do it later tonight. But from my git bisect logs, what I was
expecting was:

# only skipped commits left to test
# possible first bad commit: [a023748d53c10850650fe86b1c4a7d421d576451]
Merge branch 'x86-mm-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
# possible first bad commit: [0dbcae884779fdf7e2239a97ac7488877f0693d9]
x86: mm: Move PAT only functions to mm/pat.c
# possible first bad commit: [47591df505129c9774af6cca2debf283a6e56ed7]
xen: Support Xen pv-domains using PAT
# possible first bad commit: [bd809af16e3ab1f8d55b3e2928c47c67e2a865d2]
x86: Enable PAT to use cache mode translation tables
# possible first bad commit: [f5b2831d654167d77da8afbef4d2584897b12d0c]
x86: Respect PAT bit when copying pte values between large and normal
pages
# possible first bad commit: [f439c429c320981943f8b64b2a4049d946cb492b]
x86: Support PAT bit in pagetable dump for lower levels
# possible first bad commit: [87ad0b713b1034b6caf559976c35ce47f6d1d1e9]
x86: Clean up pgtable_types.h
# possible first bad commit: [e00c8cc93c1ac01ecd5049929a50fb47b62bb041]
x86: Use new cache mode type in memtype related functions
# possible first bad commit: [b14097bd911c2554b0b5271b3a6b2d84044d1843]
x86: Use new cache mode type in mm/ioremap.c
# possible first bad commit: [c06814d8419a74528500f85faf5fc01f67f8e7e6]
x86: Use new cache mode type in setting page attributes
# possible first bad commit: [102e19e1955d85f31475416b1ee22980c6462cf8]
x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
# possible first bad commit: [2a3746984c98b17b565e6a2c2bbaaaef757db1b4]
x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()

Merge commit a023748d53c10850650fe86b1c4a7d421d576451 contains all the other
commits listed above. They are listed newest first.

Note that the commits listed above are untestable because the resulting
kernels are not bootable. They hang at the second line of boot output,
"Loading Ramdisk..." or something similar.

My last good commit was 49a3b3cbdf1621678a39bd95a3e67c0f858539c7, which
means git bisect had already started zeroing in on
a023748d53c10850650fe86b1c4a7d421d576451, since 49a3b3... is part of
a0237...

So, based on my git bisect so far, my understanding is:
last good merge commit: 773fed910d41e443e495a6bfa9ab1c2b7b13e012
first bad merge commit (next after 773fed...):
a023748d53c10850650fe86b1c4a7d421d576451
last good commit (inside a023748d53c10850650fe86b1c4a7d421d576451):
49a3b3cbdf1621678a39bd95a3e67c0f858539c7
all the others from 49a3b3cbdf1... to a023748d53c1... are
untestable/unbootable kernels.

Please correct me if I am wrong - it will help me build the correct mental
model.
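The mental model above can be checked with a small linear-history sketch in Python. It uses the 22 commits from the rebase todo list quoted above, treats the base (773fed910d41) as good and the unfolded tip as bad unless told otherwise, and mimics git bisect's refusal to narrow further when only skipped commits remain. candidates() and next_to_test() are hypothetical helpers; real git bisect works on the commit graph, not on a list.

```python
# The 22 commits of the unfolded merge, oldest first, as listed in the
# rebase todo list quoted above (abbreviated hashes).
COMMITS = [
    "96e70f832856", "63e7b6d90c1e", "bdee237c0343", "281d4078bec3",
    "c27ce0af896b", "2d85ebf8e12e", "5006e45a6bc2", "1c64216be164",
    "2df58b6d3530", "d85f33342a0f", "49a3b3cbdf16", "2a3746984c98",
    "102e19e1955d", "c06814d8419a", "b14097bd911c", "e00c8cc93c1a",
    "87ad0b713b10", "f439c429c320", "f5b2831d6541", "bd809af16e3a",
    "47591df50512", "0dbcae884779",
]


def candidates(commits, status):
    """Commits that could still be the first bad one in a linear history.

    status maps hash -> "good", "bad" or "skip". The base before commits[0]
    is assumed good; the last commit is assumed bad if no later bad is known.
    """
    last_good = max((i for i, c in enumerate(commits) if status.get(c) == "good"),
                    default=-1)
    first_bad = min((i for i, c in enumerate(commits)
                     if status.get(c) == "bad" and i > last_good),
                    default=len(commits) - 1)
    return commits[last_good + 1:first_bad + 1]


def next_to_test(commits, status):
    """Midpoint of the untested candidates, or None if only skipped ones remain."""
    cands = candidates(commits, status)
    untested = [c for c in cands[:-1] if c not in status]
    return untested[len(untested) // 2] if untested else None


# The situation described above: 49a3b3 good, everything after it skipped
# (unbootable), with the unfolded tip standing in for the bad merge commit.
status = {"49a3b3cbdf16": "good"}
status.update({c: "skip" for c in COMMITS[11:21]})
print(len(candidates(COMMITS, status)))  # 11
print(next_to_test(COMMITS, status))     # None
```

The eleven candidates are exactly the commits the bisect log lists as "possible first bad commit" (the log additionally lists the merge commit a023748d itself), which agrees with the summary above.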

> You should see:
>
> Successfully rebased and updated refs/heads/test-merge-commit.
>
> Now if you do git log you will see the above commits in linear
> atomic history. You can now bisect this merge commit atomically, so do:
>
> git bisect start 099487de0934e3d5e326666914a426af89a0868b
> 773fed910d41e443e495a6bfa9ab1c2b7b13e012
>
> Note that this assumes that the commit prior to the merge commit is fine.
> Is this true, can you confirm? (git checkout -b test-prior-merge-gtest
> 773fed910d4,
> build and see if it doesn't break there)
>
> If we know for sure 773fed910d4 did not break anything, then the above
> bisect should let us zero in on the exact atomic commit ID that caused the
> issue.
>

Now the problem is that I tried twice to verify that
773fed910d41e443e495a6bfa9ab1c2b7b13e012 is indeed a good commit, and I
ended up with an unbootable kernel (hangs at "Loading Ramdisk...").
This means that all my bisections so far are invalid. Very disappointing
indeed, but it's only a setback. I will figure it out and make sure I have
a valid setup for reproducible tests before I bother you again.

Just for the record I did
$git checkout 773fed910d41e443e495a6bfa9ab1c2b7b13e012
$fakeroot make -j 4 CC=gcc-4.8 deb-pkg

I will unfold the commits as you suggest, but if my bisection was right
(and there are serious hints to the contrary), I stopped on
unbootable/untestable kernels.

Thanks for the exhaustive mails with the explanations and the tips. They
are much appreciated.

      Vassilis



* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-24  9:36     ` vasvir
@ 2015-11-24 22:03       ` Luis R. Rodriguez
  0 siblings, 0 replies; 22+ messages in thread
From: Luis R. Rodriguez @ 2015-11-24 22:03 UTC (permalink / raw)
  To: vasvir; +Cc: Juergen Gross, linux-kernel, Toshi Kani, mcgrof

On Tue, Nov 24, 2015 at 11:36:54AM +0200, vasvir@iit.demokritos.gr wrote:
> > Let's try to speed up reproducing this.
> >
> > I have a hunch this might be related to some BIOS-controlled
> > MTRRs and a mismatch which then leads the kernel to think that a type
> > of MTRR write might be OK, but in fact it's not. Given the workload
> > description, perhaps this could be related to fan control, the BIOS's
> > control over it, and a clash with some other device's MTRR. More on this
> > suspicion in the other thread where you provide more logs.
> >
> > On a kernel that you know fails, can you try replacing this workload by
> > bringing your CPU to its knees quickly, perhaps with 'make -j' on a Linux
> > build for 2, 4, 8 or 16 minutes, and then hitting CTRL-C and continuing to
> > hibernation, to see if making the CPU fan kick in accelerates the issue?
> > If 'make -j' is so heavy that you can't even CTRL-C it, try 'make -j 16'.
> > Note that if this is true, a hot CPU could still trigger CPU fan controls
> > on a fresh boot if the previous boot was CPU intensive.
> 
> OK, that nailed it: with kernel 4.3, a known "bad" kernel, I was able to
> reproduce it on the second hibernate/resume cycle.

Great, glad we could reduce the amount of time to reproduce to what seems
to be a few minutes now.

> Here is what I did in my own words so you can spot inconsistencies.
> 
> I started a kernel compile with make -j 32. My computer stayed very
> responsive, which is an impressive feat by the way.
> In a second tab of my Konsole (I am running KDE) I ran $watch sensors. I
> watched the core temperatures go from 38 to ~70 °C and the CPU fan
> from ~1630 to ~1900 RPM. The first time, I hit Ctrl+C to stop the
> compilation and then hibernated from KDE. I always hibernate from the KDE
> start menu. Previously I had run some tests hibernating from the VT
> console (although sddm may have been running on VT7) and I managed to
> reproduce it there too, so (in my mind) it was not graphics-mode specific.
> Since then I have always hibernated from KDE.

Come to think of it, the mtrr_add() and/or ioremap_wc() calls would be
triggered on driver initialization, that is at probe / boot time. So if the
issue you are running into is a clash between the BIOS's own notion of what
MTRR type is set and a driver's later desired MTRR type (or equivalent PAT
type), then the issue could be triggered just at boot / hibernation / resume
time, without much interaction, at least on the graphics front.

> The first time it worked. The second time I thought: why hit Ctrl+C?
> Let's try to hibernate with the compilation running. And it
> failed.

OK. How long did you leave the machine on idle before resuming?

Can you try, on a fresh boot, to bring the temperature up to ~70 and, while it's
still compiling, hibernate and see if that triggers it? If we can reduce it to
only one hibernate that should reduce the time to troubleshoot; it is also just
puzzling that you'd need to hibernate twice to reproduce this issue.

> Now I don't know if it failed because it was the second cycle or
> because the load of the compilation was there or because of the
> temperature controlled fan register you mentioned.

If it's fan related, one test could be to hibernate on a fresh boot once fan
control is on, let it sit to cool, and then resume, vs. just resuming right
away. I.e.: determine whether we need fan control to be idle upon resume or
not, and also how many times fan control has to go on / off before you can
reproduce.

> Then I repeated the test with a known good kernel 3.18 (which should be
> 773fed910d41e443e495a6bfa9ab1c2b7b13e012 according to my git bisect logs -
> I have a problem there - see below) and it survived the same test
> (hibernate two times with temperature being ~70).
> 
> 
> > If this doesn't do it, let's try forcing an MTRR capable driver; graphics
> > is the obvious target. Try perhaps some 3D stuff or a screen saver prior
> > to hibernation. Note that even if you boot nomtrr the BIOS may still use
> > MTRRs, and PAT use on Linux could assume MTRR is not being used by
> > drivers while the BIOS may still do something behind the scenes. This is
> > actually one reason why we can't exactly remove MTRR support from Linux:
> > the BIOS may still do some wacky stuff with MTRRs. One example I was given
> > was that CPU fan control might use WC MTRRs, so the kernel must be aware
> > of this even if no MTRRs are ever used by the Linux kernel at all -- this
> > is the case now as of v4.3 and onwards.
> >
> > If that doesn't help speed it up, maybe try both screen saver + some 3D
> > stuff + CPU intensive stuff.
> 
> I have 3D effects enabled in my KDE. Since your tip succeeded in reproducing
> the problem early I didn't bother, but if I should test 3D, which program /
> benchmark should I run? glxgears?

As I mentioned above, I can't think now of a reason why this should trigger
the issue if it's MTRR related.

> > To help you speed up testing you can try reducing your build time by
> > reducing the amount of crap you have to build:
> >
> > make localmodconfig
> >
> > That should only build things your kernel has loaded as modules or is
> > already enabled (=y).
> >
> 
> Thanks for the tip. I don't want to change that right now. I don't mind
> waiting a little bit because I get a deb with the kernel and can retest
> a known configuration.

There is little risk in using it, you'll keep everything you had enabled as
built-in or modules. The only issue with this is if in between the commits
there was a kconfig symbol rename (driver rename), but I really don't
expect you to run into this as an issue.

> The other tip you gave, if it actually works as it seems to, would
> give such a boost to the debugging cycle that I would actually become
> the bottleneck.

Sure.

> > That is commit a023748d53c10850650fe86b1c4a7d421d576451
> > ("Merge branch 'x86-mm-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
> >
> > Git is smart enough to tell you you've hit a merge commit and that all the
> > possible commits in that merge could be the issue. This is why your bisect
> > log shows a slew of commits. The next step is to unfold the merge and
> > then bisect through its commits; this will then let us identify the exact
> > commit that may have caused the issue.
> >
> > There are a few ways to do this; my preferred way is to "unfold" a merge
> > commit manually.
> >
> > To keep things separate (without affecting other tests you might
> > have on your other git tree, and to avoid forcing you to lose
> > fresh objects as you continue to build-test on the other tree), I'd do
> > something like this:
> 
> we will go with your preferred way - no question about that.
> 
> >
> > mkdir ~/tmp
> > cd ~/tmp
> > git clone ~/linux/.git linux-dev-test
> 
> OK, I have them in parallel: ~/path/linux and ~/path/linux-dev-test
> 
> >
> > cd linux-dev-test
> >
> > Notice how if you do git log and search for
> > a023748d53c10850650fe86b1c4a7d421d576451 you'll see that the commit
> > listed before this is 773fed910d41e443e495a6bfa9ab1c2b7b13e012
> > ("Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
> >
> > To be clear, the list of commits you typically would see is just:
> >
> > a023748d53c10850650fe86b1c4a7d421d576451 - Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > 773fed910d41e443e495a6bfa9ab1c2b7b13e012 - Merge branches 'x86-platform-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> >
> > We want to go down into the commits in the merge commit a023748d53c and
> > then zero in on exactly which commit caused the issue. To do that in your
> > linux-dev-test directory you can do this:
> 
> Thank you for the explanations. I think I had understood that bit. git
> bisect visualize (gitk) helped me to grasp it. git log gave me a hard
> time with all these "hidden commits". Confirmation is good.
> 
> >
> > git checkout -b test-merge-commit a023748d53c10850650fe86b1c4a7d421d576451
> >
> > That will create a branch for testing based on the merge commit.
> > Now do this:
> >
> > git rebase -i 773fed910d41e443e495a6bfa9ab1c2b7b13e012
> >
> > Then don't change anything, just save and exit the editor, and
> > git will actually "unfold" the merge commit for you -- it magically
> > will apply each commit in that merge commit linearly into your git
> > history.
> >
> > For instance the rebase should show 22 commits as follows; just
> > leave the defaults set as below and just hit (ESC + :wq if
> > in vi):
> >
> > pick 96e70f832856 x86/mm: Avoid overlap the fixmap area on i386
> > pick 63e7b6d90c1e x86: mm: Re-use the early_ioremap fixed area
> > pick bdee237c0343 x86: mm: Use 2GB memory block size on large-memory x86-64 systems
> > pick 281d4078bec3 x86: Make page cache mode a real type
> > pick c27ce0af896b x86: Use new cache mode type in include/asm/fb.h
> > pick 2d85ebf8e12e x86: Use new cache mode type in drivers/video/fbdev/gbefb.c
> > pick 5006e45a6bc2 x86: Use new cache mode type in drivers/video/fbdev/vermilion
> > pick 1c64216be164 x86: Use new cache mode type in arch/x86/pci
> > pick 2df58b6d3530 x86: Use new cache mode type in arch/x86/mm/init_64.c
> > pick d85f33342a0f x86: Use new cache mode type in asm/pgtable.h
> > pick 49a3b3cbdf16 x86: Use new cache mode type in mm/iomap_32.c

Everything below here should be tested given you say 49a3b3cbdf16 is good.

> > pick 2a3746984c98 x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
> > pick 102e19e1955d x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
> > pick c06814d8419a x86: Use new cache mode type in setting page attributes
> > pick b14097bd911c x86: Use new cache mode type in mm/ioremap.c
> > pick e00c8cc93c1a x86: Use new cache mode type in memtype related functions
> > pick 87ad0b713b10 x86: Clean up pgtable_types.h
> > pick f439c429c320 x86: Support PAT bit in pagetable dump for lower levels
> > pick f5b2831d6541 x86: Respect PAT bit when copying pte values between large and normal pages
> > pick bd809af16e3a x86: Enable PAT to use cache mode translation tables
> > pick 47591df50512 xen: Support Xen pv-domains using PAT
> > pick 0dbcae884779 x86: mm: Move PAT only functions to mm/pat.c
> >
> 
> OK, I will do it later tonight. But from my (git bisect) logs what I was
> expecting was
> 
> # only skipped commits left to test
> # possible first bad commit: [a023748d53c10850650fe86b1c4a7d421d576451] Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> # possible first bad commit: [0dbcae884779fdf7e2239a97ac7488877f0693d9] x86: mm: Move PAT only functions to mm/pat.c
> # possible first bad commit: [47591df505129c9774af6cca2debf283a6e56ed7] xen: Support Xen pv-domains using PAT
> # possible first bad commit: [bd809af16e3ab1f8d55b3e2928c47c67e2a865d2] x86: Enable PAT to use cache mode translation tables
> # possible first bad commit: [f5b2831d654167d77da8afbef4d2584897b12d0c] x86: Respect PAT bit when copying pte values between large and normal pages
> # possible first bad commit: [f439c429c320981943f8b64b2a4049d946cb492b] x86: Support PAT bit in pagetable dump for lower levels
> # possible first bad commit: [87ad0b713b1034b6caf559976c35ce47f6d1d1e9] x86: Clean up pgtable_types.h
> # possible first bad commit: [e00c8cc93c1ac01ecd5049929a50fb47b62bb041] x86: Use new cache mode type in memtype related functions
> # possible first bad commit: [b14097bd911c2554b0b5271b3a6b2d84044d1843] x86: Use new cache mode type in mm/ioremap.c
> # possible first bad commit: [c06814d8419a74528500f85faf5fc01f67f8e7e6] x86: Use new cache mode type in setting page attributes
> # possible first bad commit: [102e19e1955d85f31475416b1ee22980c6462cf8] x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
> # possible first bad commit: [2a3746984c98b17b565e6a2c2bbaaaef757db1b4] x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()

You are correct. Since you say 49a3b3cbdf16 was your last good
commit, these are the possible bad candidates.
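As a rough illustration of why git ends with "only skipped commits left to test" and a list of "possible first bad commit" lines, here is a toy model of bisection over a linear history in which some kernels cannot be tested. It is a simplified sketch, not git's actual algorithm (git picks midpoints differently and also walks merges); the function name `bisect_linear` and the "good"/"bad"/"skip" result strings are hypothetical.

```python
from typing import Callable, List


def bisect_linear(commits: List[str], test: Callable[[str], str]) -> List[str]:
    """Toy bisection over a linear history, oldest commit first.

    commits[0] is known good and commits[-1] is known bad.
    test(commit) returns "good", "bad" or "skip" (e.g. the kernel does
    not boot). Returns the possible first bad commits: one entry when the
    search converges, several when only skipped commits are left between
    the last good and the first bad commit.
    """
    lo, hi = 0, len(commits) - 1  # invariant: lo is good, hi is bad
    skipped = set()
    while True:
        candidates = [i for i in range(lo + 1, hi) if i not in skipped]
        if not candidates:
            # Nothing testable remains: any of lo+1..hi may be the culprit.
            return commits[lo + 1:hi + 1]
        middle = (lo + hi) / 2
        mid = min(candidates, key=lambda i: abs(i - middle))
        result = test(commits[mid])
        if result == "good":
            lo = mid
        elif result == "bad":
            hi = mid
        elif result == "skip":
            skipped.add(mid)
        else:
            raise ValueError(f"unexpected test result: {result!r}")


history = [f"c{i}" for i in range(12)]  # c0 known good, c11 known bad
# Every kernel after the last good one fails to boot: all of c1..c11 remain.
print(bisect_linear(history, lambda c: "skip"))
# Every kernel boots and c5 introduced the bug.
print(bisect_linear(history, lambda c: "good" if history.index(c) < 5 else "bad"))  # ['c5']
```

When every commit after the last good one is unbootable, the model can only hand back the whole range, which is the same situation as the bisect log quoted above.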

> commit a023748d53c10850650fe86b1c4a7d421d576451 contains all the other
> commits listed above. The order is newest first.

Sure. All of those went into v3.19-rc1.

> Note that these commits listed above are untestable because the resulting
> kernels are not bootable. They hang in the second line of boot output in
> "Loading Ramdisk..." or something similar.
> 
> my last good commit was 49a3b3cbdf1621678a39bd95a3e67c0f858539c7 

That went in v3.19-rc1 as well.

> that
> means with git bisect I had already started zeroing in on
> a023748d53c10850650fe86b1c4a7d421d576451 since 49a3b3... was part of
> a0237...

This is correct. I missed that, thanks. That should reduce your bisect to
the above commits. We know that aa8f46878ab1a4a4e7b975b8fc8c398981e52986
("x86: mm: Move PAT only functions to mm/pat.c") was the last commit
that is part of the merge a023748d53c10850650fe86b1c4a7d421d576451, and we
know a023748d53c10850650fe86b1c4a7d421d576451 was a bad commit, so you can
now bisect on that tree between

x86: Use new cache mode type in mm/iomap_32.c

and

x86: mm: Move PAT only functions to mm/pat.c

Note that rebasing will then change your commit IDs, so the above
two named commits would appear differently on your tree. When
relaying information back to us, just use the name if working
on the rebased tree.
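Because rebasing changes the commit IDs, one way to translate between the two trees is to match on the commit subject. A minimal sketch, assuming subjects are unique within the range and using a few entries from the two `git log --oneline` style lists in this thread; the function name `map_by_subject` is hypothetical. `git patch-id` is a more robust option when subjects are not unique, since it hashes the diff itself.

```python
from typing import Dict, List


def map_by_subject(upstream: List[str], rebased: List[str]) -> Dict[str, str]:
    """Map rebased short IDs to upstream short IDs by commit subject.

    Both inputs are `git log --oneline` style lines: '<sha> <subject>'.
    Assumes subjects are unique within the range being compared.
    """
    upstream_by_subject = {}
    for line in upstream:
        sha, subject = line.strip().split(" ", 1)
        upstream_by_subject[subject] = sha
    mapping = {}
    for line in rebased:
        sha, subject = line.strip().split(" ", 1)  # strip() drops trailing padding
        if subject in upstream_by_subject:
            mapping[sha] = upstream_by_subject[subject]
    return mapping


upstream = [
    "0dbcae884779 x86: mm: Move PAT only functions to mm/pat.c",
    "47591df50512 xen: Support Xen pv-domains using PAT",
    "49a3b3cbdf16 x86: Use new cache mode type in mm/iomap_32.c",
]
rebased = [
    "5e9c2da70692 x86: Use new cache mode type in mm/iomap_32.c                 ",
    "7c67687de764 xen: Support Xen pv-domains using PAT                         ",
    "aa8f46878ab1 x86: mm: Move PAT only functions to mm/pat.c  ",
]
print(map_by_subject(upstream, rebased))
```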

Since you know the commit IDs now though you could also just
go back to your original tree and bisect between the two
commits now part of the same branch:

git bisect start a023748d53c10850650fe86b1c4a7d421d576451 49a3b3cbdf1621678a39bd95a3e67c0f858539c7

But for some reason git does a huge bisection here, I get 11 steps with
1814 revisions to test... we know there are only 12 revisions really
left to test though, for instance here are my commit IDs once I rebase
on the branch commit a023748d53c10850650fe86b1c4a7d421d576451 (as I said
notice how the commit IDs are now different):

5e9c2da70692 x86: Use new cache mode type in mm/iomap_32.c                 
e09f7c9da6b7 x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
7077aded72a2 x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
a019620d98ec x86: Use new cache mode type in setting page attributes       
848159761245 x86: Use new cache mode type in mm/ioremap.c                  
06d664382eea x86: Use new cache mode type in memtype related functions     
39ba0907d179 x86: Clean up pgtable_types.h                                 
155c520125fe x86: Support PAT bit in pagetable dump for lower levels       
f51279d0867f x86: Respect PAT bit when copying pte values between large and normal pages
ddbb181ad4ff x86: Enable PAT to use cache mode translation tables          
7c67687de764 xen: Support Xen pv-domains using PAT                         
aa8f46878ab1 x86: mm: Move PAT only functions to mm/pat.c  

So we really only need you to test about 3 or 4 commits (log2 of 11 is about 3.5).

So with my commit IDs I'd just do:

git bisect start aa8f46878ab1 5e9c2da70692

Rebasing onto the commit prior to the merge commit cuts the bisection down
from 1814 revisions to 5 revisions, and from roughly 11 steps to roughly 3
steps.
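A back-of-the-envelope check of these step counts, assuming an ideal binary search: singling out one first bad commit among N candidates takes at most ceil(log2 N) test builds. git prints its own "roughly N steps" estimate, which can differ from this bound by one. The helper name `max_bisect_builds` is hypothetical.

```python
import math


def max_bisect_builds(candidates: int) -> int:
    """Worst-case number of test builds for an ideal binary search that
    must single out one of `candidates` possible first bad commits."""
    if candidates <= 1:
        return 0
    return math.ceil(math.log2(candidates))


for n in (11, 1814):
    print(n, "candidates ->", max_bisect_builds(n), "builds at most")
```

For the 11 commits after the last known good one this gives at most 4 builds, and for 1814 revisions at most 11.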

> So based on my git bisect so far my understanding is
> last good merge commit: 773fed910d41e443e495a6bfa9ab1c2b7b13e012

773fed910d41e443e495a6bfa9ab1c2b7b13e012 is the commit prior to the
merge commit, but we have better information -- it's just within the merge
commit, so we need to dig in there to look at it.

> last bad merge commit (next after 773fed...):
> a023748d53c10850650fe86b1c4a7d421d576451

That's the merge commit, but a merge commit is just fluff (metadata
to preserve annotations of who queued up code and then handed it to
Linus later). We know the actual last commit that made code changes
was 0dbcae884779fdf7e2239a97ac7488877f0693d9 ("x86: mm: Move PAT
only functions to mm/pat.c"), so we can use that.

> last good commit (inside a023748d53c10850650fe86b1c4a7d421d576451):
> 49a3b3cbdf1621678a39bd95a3e67c0f858539c7

That's much better; that zeroes in on a point inside the merge commit.

> all the others from 49a3b3cbdf1... to a023748d53c1... are
> untestable/unbootable kernels.

To be clear, 49a3b3cbdf1 is bootable as it's your last good
commit. If you are saying that after that, and up to the
last commit of the merge commit (0dbcae884779), things are
not bootable, that's a big obstacle to bisecting this
further. If you can boot but it "hangs" on hibernate on
the merge commit a023748d53c1, I would suspect you should
at least be able to boot into the last commit of the merge,
0dbcae884779f. Can you confirm?

To be clear, here is the list of commits we are reviewing:

a023748d53c1 - merge commit
	0dbcae884779f - last commit of the merge commit
	49a3b3cbdf162 - last known good commit in the merge

Putting names on these:

a023748d53c1 - Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
	0dbcae884779f - x86: mm: Move PAT only functions to mm/pat.c
	49a3b3cbdf162 - x86: Use new cache mode type in mm/iomap_32.c

The actual list of commits, then, that we need you to bisect to find out what caused the issue:

git log --oneline 49a3b3cbdf162^1..0dbcae884779f
0dbcae884779 x86: mm: Move PAT only functions to mm/pat.c
47591df50512 xen: Support Xen pv-domains using PAT
bd809af16e3a x86: Enable PAT to use cache mode translation tables
f5b2831d6541 x86: Respect PAT bit when copying pte values between large and normal pages
f439c429c320 x86: Support PAT bit in pagetable dump for lower levels
87ad0b713b10 x86: Clean up pgtable_types.h
e00c8cc93c1a x86: Use new cache mode type in memtype related functions
b14097bd911c x86: Use new cache mode type in mm/ioremap.c
c06814d8419a x86: Use new cache mode type in setting page attributes
102e19e1955d x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
2a3746984c98 x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
49a3b3cbdf16 x86: Use new cache mode type in mm/iomap_32.c

Juergen is the author of all of these except 0dbcae884779, which is just a code
move, so it should not affect run time for you.

> Please correct me if I am wrong - it will help me build the correct mental
> model.

Hope this helps. If some of the other commits could not boot, obviously a
later commit fixed that, as the merge commit seems bootable -- so perhaps one
of these commits is important to fix the boot issue you noted. Since Juergen
is the author of all of the relevant patches and he's been active on this
thread, I am confident we should be able to get you a bootable kernel so you
can help complete the bisection.

  Luis

> > You should see:
> >
> > Successfully rebased and updated refs/heads/test-merge-commit.
> >
> > Now if you do git log you will see the above commits in linear
> > atomic history. You can now bisect this merge commit atomically, so do:
> >
> > git bisect start 099487de0934e3d5e326666914a426af89a0868b 773fed910d41e443e495a6bfa9ab1c2b7b13e012
> >
> > Note that this assumes that the commit prior to the merge commit is fine.
> > Is this true, can you confirm? (git checkout -b test-prior-merge-gtest
> > 773fed910d4, build and see that it doesn't break there)
> >
> > If we know for sure 773fed910d4 did not break anything then the above
> > bisect should let us zero in on the exact atomic commit ID that caused
> > the issue.
> >
> 
> Now the problem is that I tried twice to verify that
> 773fed910d41e443e495a6bfa9ab1c2b7b13e012 is indeed a good commit and I
> ended up with an unbootable kernel (hangs in "Loading Ramdisk..."). This
> is very disappointing and means that all my bisections so far are invalid.
> Very disappointing indeed but it's only a setback. I will figure it out
> and will make sure I have a valid setup for reproducible tests before I
> bother you again.
> 
> Just for the record I did
> $git checkout 773fed910d41e443e495a6bfa9ab1c2b7b13e012
> $fakeroot make -j 4 CC=gcc-4.8 deb-pkg
> 
> I will do as you suggest and unfold the commits, but if my bisection
> was right (and there are serious hints to the contrary) I stopped on
> unbootable/untestable kernels.
> 
> Thanks for the exhaustive mails with the explanations and the tips. They
> are much appreciated.
> 
>       Vassilis
> 
> 
> 

-- 
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg


* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-23 23:01                     ` Vassilis Virvilis
@ 2015-11-24 22:16                       ` Luis R. Rodriguez
  0 siblings, 0 replies; 22+ messages in thread
From: Luis R. Rodriguez @ 2015-11-24 22:16 UTC (permalink / raw)
  To: Vassilis Virvilis; +Cc: Juergen Gross, linux-kernel, Toshi Kani

On Tue, Nov 24, 2015 at 01:01:31AM +0200, Vassilis Virvilis wrote:
> On 11/23/2015 08:56 PM, Luis R. Rodriguez wrote:
> >It's not clear from the log who made this MTRR call for WC that failed. I
> >hope we didn't attempt a WC write on a WB region. Who owns
> >00000000e0000000-00000000efffffff ?
> 
> How can I answer that? Is there any utility to run? peek inside /proc?
> 
> [    0.221012] pci 0000:00:02.0: [8086:0412] type 00 class 0x030000
> [    0.221021] pci 0000:00:02.0: reg 0x10: [mem 0xf7800000-0xf7bfffff 64bit]
> [    0.221025] pci 0000:00:02.0: reg 0x18: [mem 0xe0000000-0xefffffff 64bit pref]
> [    0.221028] pci 0000:00:02.0: reg 0x20: [io  0xf000-0xf03f]

...

> [    0.453783] calling  sysfb_init+0x0/0x96 @ 1
> [    0.453811] simple-framebuffer simple-framebuffer.0: framebuffer at 0xe0000000, 0x6bb000 bytes, mapped to 0xffffc90002000000
> [    0.453814] simple-framebuffer simple-framebuffer.0: format=a8r8g8b8, mode=1680x1050x32, linelength=6720
> [    0.557233] Console: switching to colour frame buffer device 210x65
> [    0.660632] simple-framebuffer simple-framebuffer.0: fb0: simplefb registered!
> [    0.661262] initcall sysfb_init+0x0/0x96 returned 0 after 202686 usecs

...

> [    9.745108] calling  i915_init+0x0/0xa2 [i915] @ 403
> [    9.745542] [drm] Memory usable by graphics device = 2048M
> [    9.745544] checking generic (e0000000 6bb000) vs hw (e0000000 10000000)
> [    9.745544] fb: switching to inteldrmfb from simple

...

> [    9.943166] Console: switching to colour dummy device 80x25
> [    9.943240] [drm] Replacing VGA console driver
> [    9.943520] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
> [    9.943526] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer.
> [    9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [    9.949728] [drm] Driver supports precise vblank timestamp query.
> [    9.949801] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem

...

> $lspci | grep 00:02.0
> 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
>
> Looks like it is the graphics card or the graphics driver.

Good job yes.
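On the question of who owns 0xe0000000-0xefffffff: /proc/iomem (read as root) lists which driver or device claimed each physical range, and the PCI BAR lines in dmesg answer the same question, which is how device 00:02.0 was found here. A minimal sketch of doing that lookup mechanically, assuming the `pci <domain:bus:dev.fn>: reg 0x..: [mem start-end ...]` format quoted in this thread; the function name `owner_of` is hypothetical.

```python
import re
from typing import Iterable, Optional

# Matches dmesg lines such as:
#   pci 0000:00:02.0: reg 0x18: [mem 0xe0000000-0xefffffff 64bit pref]
BAR_RE = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"reg 0x[0-9a-f]+: \[mem 0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)"
)


def owner_of(addr: int, dmesg_lines: Iterable[str]) -> Optional[str]:
    """Return the PCI device whose memory BAR contains `addr`, if any."""
    for line in dmesg_lines:
        m = BAR_RE.search(line)
        if m and int(m["start"], 16) <= addr <= int(m["end"], 16):
            return m["dev"]
    return None  # I/O-port BARs and unrelated lines never match


log = """\
[    0.221021] pci 0000:00:02.0: reg 0x10: [mem 0xf7800000-0xf7bfffff 64bit]
[    0.221025] pci 0000:00:02.0: reg 0x18: [mem 0xe0000000-0xefffffff 64bit pref]
[    0.221028] pci 0000:00:02.0: reg 0x20: [io  0xf000-0xf03f]
""".splitlines()

print(owner_of(0xE0000000, log))  # 0000:00:02.0
```

The returned address can then be passed to `lspci -s` to name the device, as done in the thread.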

> I don't know if this is relevant
> $ cat /proc/mtrr
> reg00: base=0x000000000 (    0MB), size=16384MB, count=1: write-back
> reg01: base=0x400000000 (16384MB), size=  512MB, count=1: write-back
> reg02: base=0x0e0000000 ( 3584MB), size=  512MB, count=1: uncachable

Right, so it tried to set this to WC but failed. When PAT is in use, MTRR is
not used; PAT is used instead, and your log showed no error.

> reg03: base=0x0d0000000 ( 3328MB), size=  256MB, count=1: uncachable
> reg04: base=0x0cf000000 ( 3312MB), size=   16MB, count=1: uncachable
> reg05: base=0x41f000000 (16880MB), size=   16MB, count=1: uncachable
> reg06: base=0x41ee00000 (16878MB), size=    2MB, count=1: uncachable
> 
> >
> >What does your log show right before and after this? To find out try:
> >
> >dmesg | grep -5 -i mtrr
> >
> 
> See full dmesg attached
> 
> $dmesg | grep -5 -i mtrr
> [    0.189333] initcall arch_kdebugfs_init+0x0/0x1f returned 0 after 0 usecs
> [    0.189336] calling  pt_init+0x0/0x2a4 @ 1
> [    0.189349] initcall pt_init+0x0/0x2a4 returned -19 after 0 usecs
> [    0.189352] calling  bts_init+0x0/0xa4 @ 1
> [    0.189354] initcall bts_init+0x0/0xa4 returned 0 after 0 usecs
> [    0.189357] calling  mtrr_if_init+0x0/0x5f @ 1
> [    0.189360] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [    0.189362] calling  ffh_cstate_init+0x0/0x26 @ 1
> [    0.189363] initcall ffh_cstate_init+0x0/0x26 returned 0 after 0 usecs
> [    0.189366] calling  activate_jump_labels+0x0/0x2d @ 1
> [    0.189367] initcall activate_jump_labels+0x0/0x2d returned 0 after 0 usecs
> [    0.189370] calling  kcmp_cookies_init+0x0/0x31 @ 1
> --
> [    0.189424] calling  dmi_id_init+0x0/0x300 @ 1
> [    0.189448] initcall dmi_id_init+0x0/0x300 returned 0 after 0 usecs
> [    0.189450] calling  pci_arch_init+0x0/0x63 @ 1
> [    0.189458] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xf8000000-0xfbffffff] (base 0xf8000000)
> [    0.189462] PCI: MMCONFIG at [mem 0xf8000000-0xfbffffff] reserved in E820
> [    0.189467] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
> [    0.189514] PCI: Using configuration type 1 for base access
> [    0.189519] initcall pci_arch_init+0x0/0x63 returned 0 after 0 usecs
> [    0.189528] calling  init_vdso+0x0/0x44 @ 1
> [    0.189535] initcall init_vdso+0x0/0x44 returned 0 after 0 usecs
> [    0.189538] calling  sysenter_setup+0x0/0x52 @ 1
> --
> [    0.189542] calling  topology_init+0x0/0x83 @ 1
> [    0.189795] initcall topology_init+0x0/0x83 returned 0 after 0 usecs
> [    0.189798] calling  fixup_ht_bug+0x0/0xed @ 1
> [    0.189799] perf_event_intel: PMU erratum BJ122, BV98, HSD29 worked around, HT is on
> [    0.189802] initcall fixup_ht_bug+0x0/0xed returned 0 after 0 usecs
> [    0.189805] calling  mtrr_init_finialize+0x0/0x3a @ 1
> [    0.189807] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
> [    0.189809] calling  uid_cache_init+0x0/0x90 @ 1
> [    0.189810] initcall uid_cache_init+0x0/0x90 returned 0 after 0 usecs
> [    0.189812] calling  param_sysfs_init+0x0/0x1d9 @ 1
> [    0.190106] initcall param_sysfs_init+0x0/0x1d9 returned 0 after 0 usecs
> [    0.190108] calling  pm_sysrq_init+0x0/0x14 @ 1
> --
> [    9.749840] calling  usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] @ 384
> [    9.751163] usbcore: registered new interface driver snd-usb-audio
> [    9.751166] initcall usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] returned 0 after 1292 usecs
> [    9.943166] Console: switching to colour dummy device 80x25
> [    9.943240] [drm] Replacing VGA console driver
> [    9.943520] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
> [    9.943526] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer.
> [    9.947147] Adding 31249404k swap on /dev/sdb1.  Priority:-1 extents:1 across:31249404k FS
> [    9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [    9.949728] [drm] Driver supports precise vblank timestamp query.
> [    9.949801] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
> [    9.965787] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)

Thanks. I don't see anything obvious that should have caused the MTRR setup for the graphics driver
to fail here...
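For what it's worth, the exact message `mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining` can be reproduced from the /proc/mtrr listing in this thread with a simplified model of the check made when a driver asks for a WC MTRR: walk the variable ranges in register order and refuse the request at the first register that overlaps it with an incompatible type. This is an assumption loosely modelled on the kernel's mtrr_add_page() logic, not a copy of it, and it says nothing about the resume hang itself; the helper names are hypothetical. In this model the 16 GB write-back range reg00 encloses the request and trips the check before the uncachable reg02 is even considered.

```python
import re
from typing import List, NamedTuple, Optional


class Mtrr(NamedTuple):
    reg: int
    base: int
    size: int
    mtype: str


# Matches /proc/mtrr lines such as:
#   reg02: base=0x0e0000000 ( 3584MB), size=  512MB, count=1: uncachable
LINE_RE = re.compile(
    r"reg(?P<reg>\d+): base=0x(?P<base>[0-9a-f]+) \(\s*\d+MB\), "
    r"size=\s*(?P<size>\d+)(?P<unit>[KM])B, count=\d+: (?P<type>\S+)"
)


def parse_proc_mtrr(text: str) -> List[Mtrr]:
    scale = {"K": 1 << 10, "M": 1 << 20}
    return [
        Mtrr(int(m["reg"]), int(m["base"], 16),
             int(m["size"]) * scale[m["unit"]], m["type"])
        for m in LINE_RE.finditer(text)
    ]


def compatible(a: str, b: str) -> bool:
    # Modelling assumption: only uncachable and write-through may mix.
    return {a, b} == {"uncachable", "write-through"}


def check_add(regs: List[Mtrr], base: int, size: int, new_type: str) -> Optional[str]:
    """Simplified model: report the first conflict, or None if accepted."""
    end = base + size - 1
    for r in regs:
        r_end = r.base + r.size - 1
        if base > r_end or end < r.base:
            continue  # no overlap with this register
        if base < r.base or end > r_end:
            return f"{base:x},{size:x} overlaps existing reg{r.reg:02d}"
        if r.mtype != new_type and not compatible(r.mtype, new_type):
            return (f"type mismatch for {base:x},{size:x} "
                    f"old: {r.mtype} new: {new_type}")
    return None


PROC_MTRR = """\
reg00: base=0x000000000 (    0MB), size=16384MB, count=1: write-back
reg01: base=0x400000000 (16384MB), size=  512MB, count=1: write-back
reg02: base=0x0e0000000 ( 3584MB), size=  512MB, count=1: uncachable
reg03: base=0x0d0000000 ( 3328MB), size=  256MB, count=1: uncachable
reg04: base=0x0cf000000 ( 3312MB), size=   16MB, count=1: uncachable
reg05: base=0x41f000000 (16880MB), size=   16MB, count=1: uncachable
reg06: base=0x41ee00000 (16878MB), size=    2MB, count=1: uncachable
"""

regs = parse_proc_mtrr(PROC_MTRR)
print(check_add(regs, 0xE0000000, 0x10000000, "write-combining"))
# type mismatch for e0000000,10000000 old: write-back new: write-combining
```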

> >Not being able to use WC is not fatal, it's just a performance issue, but if we tried
> >to switch a region to WC that the BIOS might rely on not being WC, that could be a
> >big issue.
> >
> 
> >>$dmesg | grep -i mtr for 4.3 kernel with default pat enabled
> >>[    0.189368] calling  mtrr_if_init+0x0/0x5f @ 1
> >>[    0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> >>[    0.189478] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
> >>[    0.189814] calling  mtrr_init_finialize+0x0/0x3a @ 1
> >>[    0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
> >
> >The fact we don't see a conflict doesn't mean an issue or conflict didn't
> >trigger. If PAT missed something the BIOS did, that could make the kernel
> >assume it could do something it was not able to. The MTRR init code should pick
> >up on this stuff and let the kernel PAT code know if there could be a conflict,
> >but if for some reason that was missed, that could be an issue.
> >
> 
> Ok I am not sure if there is something I should do here. I am attaching the dmesg for that boot just in case.
> $cat /proc/mtrr gives the same results
> 
> >>Unless you have any other suggestions...
> >
> >Bisection on the merge commit would help.
> >
> 
> Will do.
> 
> Thanks for the guidance, and the thorough explanations.

This helps, but it doesn't give us further insight as to why the error really
occurred in the first place for the mtrr_add() call. Let's debug further with
the bisect and see where that takes us.

  Luis


* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-23 14:19                       ` Juergen Gross
@ 2015-11-24 22:46                         ` Luis R. Rodriguez
  2015-11-25  5:01                           ` Juergen Gross
  0 siblings, 1 reply; 22+ messages in thread
From: Luis R. Rodriguez @ 2015-11-24 22:46 UTC (permalink / raw)
  To: Juergen Gross; +Cc: vasvir, linux-kernel, Toshi Kani

On Mon, Nov 23, 2015 at 03:19:16PM +0100, Juergen Gross wrote:
> On 23/11/15 15:11, vasvir@iit.demokritos.gr wrote:
> > Ok I will send the .config when I get back home. I have all kernels I
> > build in .deb archive. The problem is that the debian kernel build
> > procedure does not hold somewhere in the deb file the git commit hash.
> > 
> > For which kernel would you care to see the config? 4.3?
> 
> Doesn't really matter anymore. I've posted a patch already to fix it and
> got the reply, that the fix is okay, but no harm can come from the
> current implementation, as the two config options are always either both
> set or reset.

Hrm, Vassilis seems to be able to reproduce this more effectively by heating up
his CPU prior to hibernation though. I have no idea what adding APIC_LVT_MASKED
((1 << 16)) to the Local Vector Table (LVT) Thermal Monitor (APIC_LVTTHMR 0x330) does but
clear_local_APIC() seems to be used to "cleanout any BIOS leftovers during
boot." If we're suspending but the fan is still on I wonder if this could cause
an issue with some settings the BIOS may have set prior to hibernation, and
a mismatch upon resume.

I can't find what APIC_LVT_MASKED does though, the best doc I found:

https://www-ssl.intel.com/content/dam/www/public/us/en/documents/white-papers/cpu-monitoring-dts-peci-paper.pdf

The inability to set the MTRR for the i915 card might be a totally separate
issue at this point; not sure. One could test that, I suppose, by just
using the vesa graphics driver (disabling i915) to at least get
a basic screen to see things and compile/test things.

  Luis


* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-24 22:46                         ` Luis R. Rodriguez
@ 2015-11-25  5:01                           ` Juergen Gross
  2015-11-25 19:24                             ` Luis R. Rodriguez
  0 siblings, 1 reply; 22+ messages in thread
From: Juergen Gross @ 2015-11-25  5:01 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: vasvir, linux-kernel, Toshi Kani

On 24/11/15 23:46, Luis R. Rodriguez wrote:
> On Mon, Nov 23, 2015 at 03:19:16PM +0100, Juergen Gross wrote:
>> On 23/11/15 15:11, vasvir@iit.demokritos.gr wrote:
>>> Ok I will send the .config when I get back home. I have all kernels I
>>> build in .deb archive. The problem is that the debian kernel build
>>> procedure does not hold somewhere in the deb file the git commit hash.
>>>
>>> For which kernel would you care to see the config? 4.3?
>>
>> Doesn't really matter anymore. I've posted a patch already to fix it and
>> got the reply, that the fix is okay, but no harm can come from the
>> current implementation, as the two config options are always either both
>> set or reset.
> 
> Hrm, Vassilis seems to be able to reproduce this more effectively by heating up
> his CPU prior to hibernation though. I have no idea what adding APIC_LVT_MASKED
> ((1 << 16)) to the Local Vector Table (LVT) Thermal Monitor (APIC_LVTTHMR 0x330) does but
> clear_local_APIC() seems to be used to "cleanout any BIOS leftovers during
> boot." If we're suspending but the fan is still on I wonder if this could cause
> an issue with some settings the BIOS may have set prior to hibernation, and
> a mismatch upon resume.
> 
> I can't find what APIC_LVT_MASKED does though, the best doc I found:

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf

Local APIC (chapter 10.4).


Juergen


* Re: Hibernate resume bug around 3,18-rc2 - Full PAT support
  2015-11-25  5:01                           ` Juergen Gross
@ 2015-11-25 19:24                             ` Luis R. Rodriguez
  0 siblings, 0 replies; 22+ messages in thread
From: Luis R. Rodriguez @ 2015-11-25 19:24 UTC (permalink / raw)
  To: Juergen Gross; +Cc: vasvir, linux-kernel, Toshi Kani

On Wed, Nov 25, 2015 at 06:01:20AM +0100, Juergen Gross wrote:
> On 24/11/15 23:46, Luis R. Rodriguez wrote:
> > On Mon, Nov 23, 2015 at 03:19:16PM +0100, Juergen Gross wrote:
> >> On 23/11/15 15:11, vasvir@iit.demokritos.gr wrote:
> >>> Ok I will send the .config when I get back home. I have all kernels I
> >>> build in .deb archive. The problem is that the debian kernel build
> >>> procedure does not hold somewhere in the deb file the git commit hash.
> >>>
> > >>> For which kernel would you care to see the config? 4.3?
> >>
> >> Doesn't really matter anymore. I've posted a patch already to fix it and
> >> got the reply, that the fix is okay, but no harm can come from the
> >> current implementation, as the two config options are always either both
> >> set or reset.
> > 
> > Hrm, Vassilis seems to be able to reproduce this more effectively by heating up
> > his CPU prior to hibernation though. I have no idea what adding APIC_LVT_MASKED
> > ((1 << 16)) to the Local Vector Table (LVT) Thermal Monitor (APIC_LVTTHMR 0x330) does but
> > clear_local_APIC() seems to be used to "cleanout any BIOS leftovers during
> > boot." If we're suspending but the fan is still on I wonder if this could cause
> > an issue with some settings the BIOS may have set prior to hibernation, and
> > a mismatch upon resume.
> > 
> > I can't find what APIC_LVT_MASKED does though, the best doc I found:
> 
> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf
> 
> Local APIC (chapter 10.4).

Thanks, yeah, I only see the same thing you spotted and fixed [0], but I also
agree it does not play a role in this issue. Although it is completely
undocumented, APIC_LVT_MASKED just masks the thermal interrupt while we go
down, and we simply restore the original value of the thermal register when
we come up. The only other cautionary material I could find about the
thermal register seemed to be specific to 32-bit x86.

Let's see what the bisect ends up with.

[0] https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=42baa2581c92f8d07e7260506c8d41caf14b0fc3

  Luis


Thread overview: 22+ messages
2015-11-18 21:43 Hibernate resume bug around 3,18-rc2 - Full PAT support Vassilis Virvilis
2015-11-19  5:39 ` Juergen Gross
2015-11-19  7:50   ` vasvir
2015-11-19  9:10     ` Juergen Gross
2015-11-19 20:35       ` Vassilis Virvilis
2015-11-20  5:25         ` Vassilis Virvilis
2015-11-20  8:47           ` Juergen Gross
2015-11-20 10:04             ` vasvir
2015-11-20 12:23               ` Juergen Gross
2015-11-21 11:49                 ` Vassilis Virvilis
2015-11-23  7:32                   ` Juergen Gross
2015-11-23 14:11                     ` vasvir
2015-11-23 14:19                       ` Juergen Gross
2015-11-24 22:46                         ` Luis R. Rodriguez
2015-11-25  5:01                           ` Juergen Gross
2015-11-25 19:24                             ` Luis R. Rodriguez
2015-11-23 18:56                   ` Luis R. Rodriguez
2015-11-23 23:01                     ` Vassilis Virvilis
2015-11-24 22:16                       ` Luis R. Rodriguez
2015-11-23 18:48   ` Luis R. Rodriguez
2015-11-24  9:36     ` vasvir
2015-11-24 22:03       ` Luis R. Rodriguez
