linux-kernel.vger.kernel.org archive mirror
* Regression from 3.4.9 to 3.4.16 "stable" kernel
@ 2012-10-29  4:03 Mark Lord
  2012-10-29  6:46 ` Willy Tarreau
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Lord @ 2012-10-29  4:03 UTC
  To: Greg Kroah-Hartman, stable, Linus Torvalds, Linux Kernel

My server here runs the 3.4.xx series of "stable" kernels.
Until today, it was running 3.4.9.
Today I tried to upgrade it to 3.4.16.
It hangs in setup.c.

I've isolated the fault down to this specific change
that was made between 3.4.9 and 3.4.16.
Reverting this change allows the system to boot/run normally again.


--- linux-3.4.9/arch/x86/kernel/setup.c	2012-08-15 11:17:17.000000000 -0400
+++ linux-3.4.16/arch/x86/kernel/setup.c	2012-10-28 13:36:33.000000000 -0400
@@ -927,8 +927,21 @@

 #ifdef CONFIG_X86_64
 	if (max_pfn > max_low_pfn) {
-		max_pfn_mapped = init_memory_mapping(1UL<<32,
-						     max_pfn<<PAGE_SHIFT);
+		int i;
+		for (i = 0; i < e820.nr_map; i++) {
+			struct e820entry *ei = &e820.map[i];
+
+			if (ei->addr + ei->size <= 1UL << 32)
+				continue;
+
+			if (ei->type == E820_RESERVED)
+				continue;
+
+			max_pfn_mapped = init_memory_mapping(
+				ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
+				ei->addr + ei->size);
+		}
+
 		/* can we preseve max_low_pfn ?*/
 		max_low_pfn = max_pfn;
 	}
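
For readers following the diff, the selection logic of the new loop can be
sketched as a small userspace function. This is only an illustration under
stated assumptions: struct region, struct range, ranges_to_map() and the
REGION_* names are hypothetical stand-ins, not kernel identifiers, and the
sketch only records the ranges the loop would pass to init_memory_mapping()
rather than mapping anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FOUR_GB (1ULL << 32)

/* Hypothetical stand-ins for the E820 entry types used in the diff. */
enum region_type { REGION_RAM = 1, REGION_RESERVED = 2 };

/* Hypothetical stand-in for struct e820entry. */
struct region {
	uint64_t addr;
	uint64_t size;
	enum region_type type;
};

/* A half-open [start, end) physical address range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Mirror the selection logic of the patched loop: collect the ranges that
 * would be handed to init_memory_mapping(). Returns the number of ranges
 * written to out (at most out_len).
 */
static size_t ranges_to_map(const struct region *map, size_t n,
			    struct range *out, size_t out_len)
{
	size_t count = 0;

	assert(map != NULL && out != NULL);

	for (size_t i = 0; i < n; i++) {
		uint64_t start = map[i].addr;
		uint64_t end = map[i].addr + map[i].size;

		if (end <= FOUR_GB)	/* entirely below 4 GB: skipped here */
			continue;
		if (map[i].type == REGION_RESERVED)
			continue;
		if (count == out_len)	/* output buffer full */
			break;

		/* An entry straddling 4 GB is clamped to start at 4 GB. */
		out[count].start = start < FOUR_GB ? FOUR_GB : start;
		out[count].end = end;
		count++;
	}
	return count;
}
```

Given a map with RAM from 3 GB to 5 GB, a reserved region starting at 5 GB,
and RAM from 8 GB to 12 GB, the function yields two ranges: 4 GB to 5 GB and
8 GB to 12 GB. Addresses not covered by any non-reserved entry are never
visited, which is how holes end up excluded.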


* Re: Regression from 3.4.9 to 3.4.16 "stable" kernel
  2012-10-29  4:03 Regression from 3.4.9 to 3.4.16 "stable" kernel Mark Lord
@ 2012-10-29  6:46 ` Willy Tarreau
  2012-10-29 14:22   ` Mark Lord
  0 siblings, 1 reply; 14+ messages in thread
From: Willy Tarreau @ 2012-10-29  6:46 UTC
  To: Mark Lord
  Cc: Greg Kroah-Hartman, stable, Linus Torvalds, Linux Kernel,
	Jacob Shin, H. Peter Anvin

On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
> My server here runs the 3.4.xx series of "stable" kernels.
> Until today, it was running 3.4.9.
> Today I tried to upgrade it to 3.4.16.
> It hangs in setup.c.
> 
> I've isolated the fault down to this specific change
> that was made between 3.4.9 and 3.4.16.
> Reverting this change allows the system to boot/run normally again.
> 
> 
> --- linux-3.4.9/arch/x86/kernel/setup.c	2012-08-15 11:17:17.000000000 -0400
> +++ linux-3.4.16/arch/x86/kernel/setup.c	2012-10-28 13:36:33.000000000 -0400
> @@ -927,8 +927,21 @@
> 
>  #ifdef CONFIG_X86_64
>  	if (max_pfn > max_low_pfn) {
> -		max_pfn_mapped = init_memory_mapping(1UL<<32,
> -						     max_pfn<<PAGE_SHIFT);
> +		int i;
> +		for (i = 0; i < e820.nr_map; i++) {
> +			struct e820entry *ei = &e820.map[i];
> +
> +			if (ei->addr + ei->size <= 1UL << 32)
> +				continue;
> +
> +			if (ei->type == E820_RESERVED)
> +				continue;
> +
> +			max_pfn_mapped = init_memory_mapping(
> +				ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
> +				ei->addr + ei->size);
> +		}
> +
>  		/* can we preseve max_low_pfn ?*/
>  		max_low_pfn = max_pfn;
>  	}

For the record, it is this commit introduced in 3.4.16 :

commit efd5fa0c1a1d1b46846ea6e8d1a783d0d8a6a721
Author: Jacob Shin <jacob.shin@amd.com>
Date:   Thu Oct 20 16:15:26 2011 -0500

    x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
    
    commit 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a upstream.
    
    On systems with very large memory (1 TB in our case), BIOS may report a
    reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
    these from the direct mapping.
    
    [ hpa: this should be done not just for > 4 GB but for everything above the legacy
      region (1 MB), at the very least.  That, however, turns out to require significant
      restructuring.  That work is well underway, but is not suitable for rc/stable. ]
    
    Signed-off-by: Jacob Shin <jacob.shin@amd.com>
    Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.shin@amd.com
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Willy



* Re: Regression from 3.4.9 to 3.4.16 "stable" kernel
  2012-10-29  6:46 ` Willy Tarreau
@ 2012-10-29 14:22   ` Mark Lord
  2012-10-29 14:37     ` Mark Lord
  2012-10-29 14:40     ` Ben Hutchings
  0 siblings, 2 replies; 14+ messages in thread
From: Mark Lord @ 2012-10-29 14:22 UTC
  To: Willy Tarreau
  Cc: Greg Kroah-Hartman, stable, Linus Torvalds, Linux Kernel,
	Jacob Shin, H. Peter Anvin

On 12-10-29 02:46 AM, Willy Tarreau wrote:
> On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
>> My server here runs the 3.4.xx series of "stable" kernels.
>> Until today, it was running 3.4.9.
>> Today I tried to upgrade it to 3.4.16.
>> It hangs in setup.c.
>>
>> I've isolated the fault down to this specific change
>> that was made between 3.4.9 and 3.4.16.
>> Reverting this change allows the system to boot/run normally again.
>>
>>
>> --- linux-3.4.9/arch/x86/kernel/setup.c	2012-08-15 11:17:17.000000000 -0400
>> +++ linux-3.4.16/arch/x86/kernel/setup.c	2012-10-28 13:36:33.000000000 -0400
>> @@ -927,8 +927,21 @@
>>
>>  #ifdef CONFIG_X86_64
>>  	if (max_pfn > max_low_pfn) {
>> -		max_pfn_mapped = init_memory_mapping(1UL<<32,
>> -						     max_pfn<<PAGE_SHIFT);
>> +		int i;
>> +		for (i = 0; i < e820.nr_map; i++) {
>> +			struct e820entry *ei = &e820.map[i];
>> +
>> +			if (ei->addr + ei->size <= 1UL << 32)
>> +				continue;
>> +
>> +			if (ei->type == E820_RESERVED)
>> +				continue;
>> +
>> +			max_pfn_mapped = init_memory_mapping(
>> +				ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
>> +				ei->addr + ei->size);
>> +		}
>> +
>>  		/* can we preseve max_low_pfn ?*/
>>  		max_low_pfn = max_pfn;
>>  	}
> 
> For the record, it is this commit introduced in 3.4.16 :
> 
> commit efd5fa0c1a1d1b46846ea6e8d1a783d0d8a6a721
> Author: Jacob Shin <jacob.shin@amd.com>
> Date:   Thu Oct 20 16:15:26 2011 -0500
> 
>     x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
>     
>     commit 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a upstream.
>     
>     On systems with very large memory (1 TB in our case), BIOS may report a
>     reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
>     these from the direct mapping.
>     
>     [ hpa: this should be done not just for > 4 GB but for everything above the legacy
>       region (1 MB), at the very least.  That, however, turns out to require significant
>       restructuring.  That work is well underway, but is not suitable for rc/stable. ]
>     
>     Signed-off-by: Jacob Shin <jacob.shin@amd.com>
>     Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.shin@amd.com
>     Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> Willy


Thanks, Willy.

I've also now downloaded linux-3.7.0-rc3, and it boots/runs without need for patching.
So there's a fix somewhere in between that perhaps could also get backported to -stable.

-ml



* Re: Regression from 3.4.9 to 3.4.16 "stable" kernel
  2012-10-29 14:22   ` Mark Lord
@ 2012-10-29 14:37     ` Mark Lord
  2012-10-29 14:40     ` Ben Hutchings
  1 sibling, 0 replies; 14+ messages in thread
From: Mark Lord @ 2012-10-29 14:37 UTC
  To: Willy Tarreau
  Cc: Greg Kroah-Hartman, stable, Linus Torvalds, Linux Kernel,
	Jacob Shin, H. Peter Anvin

On 12-10-29 10:22 AM, Mark Lord wrote:
> On 12-10-29 02:46 AM, Willy Tarreau wrote:
>> On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
>>> My server here runs the 3.4.xx series of "stable" kernels.
>>> Until today, it was running 3.4.9.
>>> Today I tried to upgrade it to 3.4.16.
>>> It hangs in setup.c.
>>>
>>> I've isolated the fault down to this specific change
>>> that was made between 3.4.9 and 3.4.16.
>>> Reverting this change allows the system to boot/run normally again.
..
>> For the record, it is this commit introduced in 3.4.16 :
>>
>> commit efd5fa0c1a1d1b46846ea6e8d1a783d0d8a6a721
>> Author: Jacob Shin <jacob.shin@amd.com>
>> Date:   Thu Oct 20 16:15:26 2011 -0500
>>
>>     x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
>>     
>>     commit 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a upstream.
>>     
>>     On systems with very large memory (1 TB in our case), BIOS may report a
>>     reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
>>     these from the direct mapping.
>>     
>>     [ hpa: this should be done not just for > 4 GB but for everything above the legacy
>>       region (1 MB), at the very least.  That, however, turns out to require significant
>>       restructuring.  That work is well underway, but is not suitable for rc/stable. ]
>>     
>>     Signed-off-by: Jacob Shin <jacob.shin@amd.com>
>>     Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.shin@amd.com
>>     Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
..
> I've also now downloaded linux-3.7.0-rc3, and it boots/runs without need for patching.
> So there's a fix somewhere in between that perhaps could also get backported to -stable.
..

Heh.. except that kernel has its own issues -- hangs in some kind of screen loop
in the Radeon code (?) when trying to shut down.  ctrl-alt-sysrq s+u+s+b gets out of that,
but it hangs in a similar fashion during the subsequent reboot.

A full power-off was required to get the Radeon video to behave so I could reboot
the system with 3.4.16 again.  I'm not going to pursue that issue for now, though.




* Re: Regression from 3.4.9 to 3.4.16 "stable" kernel
  2012-10-29 14:22   ` Mark Lord
  2012-10-29 14:37     ` Mark Lord
@ 2012-10-29 14:40     ` Ben Hutchings
  2012-10-29 14:47       ` Jacob Shin
  2012-10-29 16:37       ` Yinghai Lu
  1 sibling, 2 replies; 14+ messages in thread
From: Ben Hutchings @ 2012-10-29 14:40 UTC
  To: Mark Lord, Yinghai Lu
  Cc: Willy Tarreau, Greg Kroah-Hartman, stable, Linus Torvalds,
	Linux Kernel, Jacob Shin, H. Peter Anvin

[-- Attachment #1: Type: text/plain, Size: 3320 bytes --]

On Mon, 2012-10-29 at 10:22 -0400, Mark Lord wrote:
> On 12-10-29 02:46 AM, Willy Tarreau wrote:
> > On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
> >> My server here runs the 3.4.xx series of "stable" kernels.
> >> Until today, it was running 3.4.9.
> >> Today I tried to upgrade it to 3.4.16.
> >> It hangs in setup.c.
> >>
> >> I've isolated the fault down to this specific change
> >> that was made between 3.4.9 and 3.4.16.
> >> Reverting this change allows the system to boot/run normally again.
> >>
> >>
> >> --- linux-3.4.9/arch/x86/kernel/setup.c	2012-08-15 11:17:17.000000000 -0400
> >> +++ linux-3.4.16/arch/x86/kernel/setup.c	2012-10-28 13:36:33.000000000 -0400
> >> @@ -927,8 +927,21 @@
> >>
> >>  #ifdef CONFIG_X86_64
> >>  	if (max_pfn > max_low_pfn) {
> >> -		max_pfn_mapped = init_memory_mapping(1UL<<32,
> >> -						     max_pfn<<PAGE_SHIFT);
> >> +		int i;
> >> +		for (i = 0; i < e820.nr_map; i++) {
> >> +			struct e820entry *ei = &e820.map[i];
> >> +
> >> +			if (ei->addr + ei->size <= 1UL << 32)
> >> +				continue;
> >> +
> >> +			if (ei->type == E820_RESERVED)
> >> +				continue;
> >> +
> >> +			max_pfn_mapped = init_memory_mapping(
> >> +				ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
> >> +				ei->addr + ei->size);
> >> +		}
> >> +
> >>  		/* can we preseve max_low_pfn ?*/
> >>  		max_low_pfn = max_pfn;
> >>  	}
> > 
> > For the record, it is this commit introduced in 3.4.16 :
> > 
> > commit efd5fa0c1a1d1b46846ea6e8d1a783d0d8a6a721
> > Author: Jacob Shin <jacob.shin@amd.com>
> > Date:   Thu Oct 20 16:15:26 2011 -0500
> > 
> >     x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
> >     
> >     commit 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a upstream.
> >     
> >     On systems with very large memory (1 TB in our case), BIOS may report a
> >     reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
> >     these from the direct mapping.
> >     
> >     [ hpa: this should be done not just for > 4 GB but for everything above the legacy
> >       region (1 MB), at the very least.  That, however, turns out to require significant
> >       restructuring.  That work is well underway, but is not suitable for rc/stable. ]
> >     
> >     Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> >     Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.shin@amd.com
> >     Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
> >     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > 
> > Willy
> 
> 
> Thanks, Willy.
> 
> I've also now downloaded linux-3.7.0-rc3, and it boots/runs without need for patching.
> So there's a fix somewhere in between that perhaps could also get backported to -stable.

Might well be:

commit 1f2ff682ac951ed82cc043cf140d2851084512df
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Mon Oct 22 16:35:18 2012 -0700

    x86, mm: Use memblock memory loop instead of e820_RAM

However I'm not sure that this loop is correct either.  Yinghai, does
your version definitely iterate in increasing pfn order?  If not then
the max_pfn_mapped assignment must be conditional.

Ben.

-- 
Ben Hutchings
Humans are not rational beings; they are rationalising beings.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]


* Re: Regression from 3.4.9 to 3.4.16 "stable" kernel
  2012-10-29 14:40     ` Ben Hutchings
@ 2012-10-29 14:47       ` Jacob Shin
  2012-10-29 16:58         ` Greg Kroah-Hartman
  2012-10-29 16:37       ` Yinghai Lu
  1 sibling, 1 reply; 14+ messages in thread
From: Jacob Shin @ 2012-10-29 14:47 UTC
  To: Ben Hutchings
  Cc: Mark Lord, Yinghai Lu, Willy Tarreau, Greg Kroah-Hartman, stable,
	Linus Torvalds, Linux Kernel, H. Peter Anvin

On Mon, Oct 29, 2012 at 02:40:58PM +0000, Ben Hutchings wrote:
> On Mon, 2012-10-29 at 10:22 -0400, Mark Lord wrote:
> > On 12-10-29 02:46 AM, Willy Tarreau wrote:
> > > On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
> > >> My server here runs the 3.4.xx series of "stable" kernels.
> > >> Until today, it was running 3.4.9.
> > >> Today I tried to upgrade it to 3.4.16.
> > >> It hangs in setup.c.
> > >>
> > >> I've isolated the fault down to this specific change
> > >> that was made between 3.4.9 and 3.4.16.
> > >> Reverting this change allows the system to boot/run normally again.
> > >>
> > >>
> > >> --- linux-3.4.9/arch/x86/kernel/setup.c	2012-08-15 11:17:17.000000000 -0400
> > >> +++ linux-3.4.16/arch/x86/kernel/setup.c	2012-10-28 13:36:33.000000000 -0400
> > >> @@ -927,8 +927,21 @@
> > >>
> > >>  #ifdef CONFIG_X86_64
> > >>  	if (max_pfn > max_low_pfn) {
> > >> -		max_pfn_mapped = init_memory_mapping(1UL<<32,
> > >> -						     max_pfn<<PAGE_SHIFT);
> > >> +		int i;
> > >> +		for (i = 0; i < e820.nr_map; i++) {
> > >> +			struct e820entry *ei = &e820.map[i];
> > >> +
> > >> +			if (ei->addr + ei->size <= 1UL << 32)
> > >> +				continue;
> > >> +
> > >> +			if (ei->type == E820_RESERVED)
> > >> +				continue;
> > >> +
> > >> +			max_pfn_mapped = init_memory_mapping(
> > >> +				ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
> > >> +				ei->addr + ei->size);
> > >> +		}
> > >> +
> > >>  		/* can we preseve max_low_pfn ?*/
> > >>  		max_low_pfn = max_pfn;
> > >>  	}
> > > 
> > > For the record, it is this commit introduced in 3.4.16 :
> > > 
> > > commit efd5fa0c1a1d1b46846ea6e8d1a783d0d8a6a721
> > > Author: Jacob Shin <jacob.shin@amd.com>
> > > Date:   Thu Oct 20 16:15:26 2011 -0500
> > > 
> > >     x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
> > >     
> > >     commit 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a upstream.
> > >     
> > >     On systems with very large memory (1 TB in our case), BIOS may report a
> > >     reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
> > >     these from the direct mapping.
> > >     
> > >     [ hpa: this should be done not just for > 4 GB but for everything above the legacy
> > >       region (1 MB), at the very least.  That, however, turns out to require significant
> > >       restructuring.  That work is well underway, but is not suitable for rc/stable. ]
> > >     
> > >     Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> > >     Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.shin@amd.com
> > >     Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
> > >     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > 
> > > Willy
> > 
> > 
> > Thanks, Willy.
> > 
> > I've also now downloaded linux-3.7.0-rc3, and it boots/runs without need for patching.
> > So there's a fix somewhere in between that perhaps could also get backported to -stable.
> 
> Might well be:
> 
> commit 1f2ff682ac951ed82cc043cf140d2851084512df
> Author: Yinghai Lu <yinghai@kernel.org>
> Date:   Mon Oct 22 16:35:18 2012 -0700
> 
>     x86, mm: Use memblock memory loop instead of e820_RAM
> 
> However I'm not sure that this loop is correct either.  Yinghai, does
> your version definitely iterate in increasing pfn order?  If not then
> the max_pfn_mapped assignment must be conditional.

Hi, I believe these two commits in mainline should fix Alexander's failing
machine:

844ab6f993b1d32eb40512503d35ff6ad0c57030
f82f64dd9f485e13f29f369772d4a0e868e5633a

This thread has some more details:

https://lkml.org/lkml/2012/10/21/157

Sorry, and thanks!

> 
> Ben.
> 
> -- 
> Ben Hutchings
> Humans are not rational beings; they are rationalising beings.





* Re: Regression from 3.4.9 to 3.4.16 "stable" kernel
  2012-10-29 14:40     ` Ben Hutchings
  2012-10-29 14:47       ` Jacob Shin
@ 2012-10-29 16:37       ` Yinghai Lu
  1 sibling, 0 replies; 14+ messages in thread
From: Yinghai Lu @ 2012-10-29 16:37 UTC
  To: Ben Hutchings
  Cc: Mark Lord, Willy Tarreau, Greg Kroah-Hartman, stable,
	Linus Torvalds, Linux Kernel, Jacob Shin, H. Peter Anvin

On Mon, Oct 29, 2012 at 7:40 AM, Ben Hutchings <ben@decadent.org.uk> wrote:
> However I'm not sure that this loop is correct either.  Yinghai, does
> your version definitely iterate in increasing pfn order?  If not then
> the max_pfn_mapped assignment must be conditional.

yes, memblock is in order.

Yinghai


* Re: Regression from 3.4.9 to 3.4.16 "stable" kernel
  2012-10-29 14:47       ` Jacob Shin
@ 2012-10-29 16:58         ` Greg Kroah-Hartman
  2012-10-29 17:04           ` Jacob Shin
  2012-10-29 23:00           ` Mark Lord
  0 siblings, 2 replies; 14+ messages in thread
From: Greg Kroah-Hartman @ 2012-10-29 16:58 UTC
  To: Jacob Shin
  Cc: Ben Hutchings, Mark Lord, Yinghai Lu, Willy Tarreau, stable,
	Linus Torvalds, Linux Kernel, H. Peter Anvin

On Mon, Oct 29, 2012 at 09:47:22AM -0500, Jacob Shin wrote:
> On Mon, Oct 29, 2012 at 02:40:58PM +0000, Ben Hutchings wrote:
> > On Mon, 2012-10-29 at 10:22 -0400, Mark Lord wrote:
> > > On 12-10-29 02:46 AM, Willy Tarreau wrote:
> > > > On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
> > > >> My server here runs the 3.4.xx series of "stable" kernels.
> > > >> Until today, it was running 3.4.9.
> > > >> Today I tried to upgrade it to 3.4.16.
> > > >> It hangs in setup.c.
> > > >>
> > > >> I've isolated the fault down to this specific change
> > > >> that was made between 3.4.9 and 3.4.16.
> > > >> Reverting this change allows the system to boot/run normally again.
> > > >>
> > > >>
> > > >> --- linux-3.4.9/arch/x86/kernel/setup.c	2012-08-15 11:17:17.000000000 -0400
> > > >> +++ linux-3.4.16/arch/x86/kernel/setup.c	2012-10-28 13:36:33.000000000 -0400
> > > >> @@ -927,8 +927,21 @@
> > > >>
> > > >>  #ifdef CONFIG_X86_64
> > > >>  	if (max_pfn > max_low_pfn) {
> > > >> -		max_pfn_mapped = init_memory_mapping(1UL<<32,
> > > >> -						     max_pfn<<PAGE_SHIFT);
> > > >> +		int i;
> > > >> +		for (i = 0; i < e820.nr_map; i++) {
> > > >> +			struct e820entry *ei = &e820.map[i];
> > > >> +
> > > >> +			if (ei->addr + ei->size <= 1UL << 32)
> > > >> +				continue;
> > > >> +
> > > >> +			if (ei->type == E820_RESERVED)
> > > >> +				continue;
> > > >> +
> > > >> +			max_pfn_mapped = init_memory_mapping(
> > > >> +				ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
> > > >> +				ei->addr + ei->size);
> > > >> +		}
> > > >> +
> > > >>  		/* can we preseve max_low_pfn ?*/
> > > >>  		max_low_pfn = max_pfn;
> > > >>  	}
> > > > 
> > > > For the record, it is this commit introduced in 3.4.16 :
> > > > 
> > > > commit efd5fa0c1a1d1b46846ea6e8d1a783d0d8a6a721
> > > > Author: Jacob Shin <jacob.shin@amd.com>
> > > > Date:   Thu Oct 20 16:15:26 2011 -0500
> > > > 
> > > >     x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
> > > >     
> > > >     commit 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a upstream.
> > > >     
> > > >     On systems with very large memory (1 TB in our case), BIOS may report a
> > > >     reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
> > > >     these from the direct mapping.
> > > >     
> > > >     [ hpa: this should be done not just for > 4 GB but for everything above the legacy
> > > >       region (1 MB), at the very least.  That, however, turns out to require significant
> > > >       restructuring.  That work is well underway, but is not suitable for rc/stable. ]
> > > >     
> > > >     Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> > > >     Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.shin@amd.com
> > > >     Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
> > > >     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > 
> > > > Willy
> > > 
> > > 
> > > Thanks, Willy.
> > > 
> > > I've also now downloaded linux-3.7.0-rc3, and it boots/runs without need for patching.
> > > So there's a fix somewhere in between that perhaps could also get backported to -stable.
> > 
> > Might well be:
> > 
> > commit 1f2ff682ac951ed82cc043cf140d2851084512df
> > Author: Yinghai Lu <yinghai@kernel.org>
> > Date:   Mon Oct 22 16:35:18 2012 -0700
> > 
> >     x86, mm: Use memblock memory loop instead of e820_RAM
> > 
> > However I'm not sure that this loop is correct either.  Yinghai, does
> > your version definitely iterate in increasing pfn order?  If not then
> > the max_pfn_mapped assignment must be conditional.
> 
> Hi, I believe these two commits in mainline should fix Alexander's failing
> machine:
> 
> 844ab6f993b1d32eb40512503d35ff6ad0c57030
> f82f64dd9f485e13f29f369772d4a0e868e5633a
> 
> This thread has some more details:
> 
> https://lkml.org/lkml/2012/10/21/157
> 
> Sorry, and thanks!

Thanks, I've queued these up now.

greg k-h


* Re: Regression from 3.4.9 to 3.4.16 "stable" kernel
  2012-10-29 16:58         ` Greg Kroah-Hartman
@ 2012-10-29 17:04           ` Jacob Shin
  2012-10-29 23:00           ` Mark Lord
  1 sibling, 0 replies; 14+ messages in thread
From: Jacob Shin @ 2012-10-29 17:04 UTC
  To: Greg Kroah-Hartman
  Cc: Ben Hutchings, Mark Lord, Yinghai Lu, Willy Tarreau, stable,
	Linus Torvalds, Linux Kernel, H. Peter Anvin

On Mon, Oct 29, 2012 at 09:58:23AM -0700, Greg Kroah-Hartman wrote:
> On Mon, Oct 29, 2012 at 09:47:22AM -0500, Jacob Shin wrote:
> > On Mon, Oct 29, 2012 at 02:40:58PM +0000, Ben Hutchings wrote:
> > > On Mon, 2012-10-29 at 10:22 -0400, Mark Lord wrote:
> > > > On 12-10-29 02:46 AM, Willy Tarreau wrote:
> > > > > On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
> > > > >> My server here runs the 3.4.xx series of "stable" kernels.
> > > > >> Until today, it was running 3.4.9.
> > > > >> Today I tried to upgrade it to 3.4.16.
> > > > >> It hangs in setup.c.
> > > > >>
> > > > >> I've isolated the fault down to this specific change
> > > > >> that was made between 3.4.9 and 3.4.16.
> > > > >> Reverting this change allows the system to boot/run normally again.
> > > > >>
> > > > >>
> > > > >> --- linux-3.4.9/arch/x86/kernel/setup.c	2012-08-15 11:17:17.000000000 -0400
> > > > >> +++ linux-3.4.16/arch/x86/kernel/setup.c	2012-10-28 13:36:33.000000000 -0400
> > > > >> @@ -927,8 +927,21 @@
> > > > >>
> > > > >>  #ifdef CONFIG_X86_64
> > > > >>  	if (max_pfn > max_low_pfn) {
> > > > >> -		max_pfn_mapped = init_memory_mapping(1UL<<32,
> > > > >> -						     max_pfn<<PAGE_SHIFT);
> > > > >> +		int i;
> > > > >> +		for (i = 0; i < e820.nr_map; i++) {
> > > > >> +			struct e820entry *ei = &e820.map[i];
> > > > >> +
> > > > >> +			if (ei->addr + ei->size <= 1UL << 32)
> > > > >> +				continue;
> > > > >> +
> > > > >> +			if (ei->type == E820_RESERVED)
> > > > >> +				continue;
> > > > >> +
> > > > >> +			max_pfn_mapped = init_memory_mapping(
> > > > >> +				ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
> > > > >> +				ei->addr + ei->size);
> > > > >> +		}
> > > > >> +
> > > > >>  		/* can we preseve max_low_pfn ?*/
> > > > >>  		max_low_pfn = max_pfn;
> > > > >>  	}
> > > > > 
> > > > > For the record, it is this commit introduced in 3.4.16 :
> > > > > 
> > > > > commit efd5fa0c1a1d1b46846ea6e8d1a783d0d8a6a721
> > > > > Author: Jacob Shin <jacob.shin@amd.com>
> > > > > Date:   Thu Oct 20 16:15:26 2011 -0500
> > > > > 
> > > > >     x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
> > > > >     
> > > > >     commit 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a upstream.
> > > > >     
> > > > >     On systems with very large memory (1 TB in our case), BIOS may report a
> > > > >     reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
> > > > >     these from the direct mapping.
> > > > >     
> > > > >     [ hpa: this should be done not just for > 4 GB but for everything above the legacy
> > > > >       region (1 MB), at the very least.  That, however, turns out to require significant
> > > > >       restructuring.  That work is well underway, but is not suitable for rc/stable. ]
> > > > >     
> > > > >     Signed-off-by: Jacob Shin <jacob.shin@amd.com>
> > > > >     Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.shin@amd.com
> > > > >     Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
> > > > >     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > > 
> > > > > Willy
> > > > 
> > > > 
> > > > Thanks, Willy.
> > > > 
> > > > I've also now downloaded linux-3.7.0-rc3, and it boots/runs without need for patching.
> > > > So there's a fix somewhere in between that perhaps could also get backported to -stable.
> > > 
> > > Might well be:
> > > 
> > > commit 1f2ff682ac951ed82cc043cf140d2851084512df
> > > Author: Yinghai Lu <yinghai@kernel.org>
> > > Date:   Mon Oct 22 16:35:18 2012 -0700
> > > 
> > >     x86, mm: Use memblock memory loop instead of e820_RAM
> > > 
> > > However I'm not sure that this loop is correct either.  Yinghai, does
> > > your version definitely iterate in increasing pfn order?  If not then
> > > the max_pfn_mapped assignment must be conditional.
> > 
> > Hi, I believe these two commits in mainline should fix Alexander's failing
> > machine:
> > 
> > 844ab6f993b1d32eb40512503d35ff6ad0c57030
> > f82f64dd9f485e13f29f369772d4a0e868e5633a
> > 
> > This thread has some more details:
> > 
> > https://lkml.org/lkml/2012/10/21/157
> > 
> > Sorry, and thanks!
> 
> Thanks, I've queued these up now.

Thanks,

Also, unrelated to Alexander's panic but related to the commit in question,
1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a:

These two commits from Yinghai should also be backported to stable, though I
think that is already in progress (I saw an email to Yinghai saying that the
patch did not apply cleanly and needs to be backported manually):

6ede1fd3cb404c0016de6ac529df46d561bd558b
1f2ff682ac951ed82cc043cf140d2851084512df

Right Yinghai?

Thanks!

-Jacob

> 
> greg k-h
> 



* Re: Regression from 3.4.9 to 3.4.16 "stable" kernel
  2012-10-29 16:58         ` Greg Kroah-Hartman
  2012-10-29 17:04           ` Jacob Shin
@ 2012-10-29 23:00           ` Mark Lord
  2012-10-29 23:03             ` Greg Kroah-Hartman
  1 sibling, 1 reply; 14+ messages in thread
From: Mark Lord @ 2012-10-29 23:00 UTC
  To: Greg Kroah-Hartman
  Cc: Jacob Shin, Ben Hutchings, Yinghai Lu, Willy Tarreau, stable,
	Linus Torvalds, Linux Kernel, H. Peter Anvin

There's something else very wrong when going from 3.4.9 to 3.4.16.
I've done it on two machines here, one the AMD-450 server (64-bit),
and the other my main notebook (Core2duo 32-bit-PAE).

Both systems feel much more sluggish than usual with 3.4.16 running.
Reverted them both back to earlier kernels (3.4.9, 3.4.4-PAE),
and the usual responsive feel has returned.

Vague, I know, but something bad happened in there somewhere.

Cheers


* Re: Regression from 3.4.9 to 3.4.16 "stable" kernel
  2012-10-29 23:00           ` Mark Lord
@ 2012-10-29 23:03             ` Greg Kroah-Hartman
  2012-10-30  1:20               ` Yinghai Lu
  2012-10-30  4:53               ` Mark Lord
  0 siblings, 2 replies; 14+ messages in thread
From: Greg Kroah-Hartman @ 2012-10-29 23:03 UTC
  To: Mark Lord
  Cc: Jacob Shin, Ben Hutchings, Yinghai Lu, Willy Tarreau, stable,
	Linus Torvalds, Linux Kernel, H. Peter Anvin

On Mon, Oct 29, 2012 at 07:00:54PM -0400, Mark Lord wrote:
> There's something else very wrong when going from 3.4.9 to 3.4.16.
> I've done it on two machines here, one the AMD-450 server (64-bit),
> and the other my main notebook (Core2duo 32-bit-PAE).
> 
> Both systems feel much more sluggish than usual with 3.4.16 running.
> Reverted them both back to earlier kernels (3.4.9, 3.4.4-PAE),
> and the usual responsive feel has returned.
> 
> Vague, I know, but something bad happened in there somewhere.

That's too vague for me to do anything with, sorry.  Bisection would be
good if you can figure out how to measure this.

greg k-h


* Re: Regression from 3.4.9 to 3.4.16 "stable" kernel
  2012-10-29 23:03             ` Greg Kroah-Hartman
@ 2012-10-30  1:20               ` Yinghai Lu
  2012-10-30  4:53               ` Mark Lord
  1 sibling, 0 replies; 14+ messages in thread
From: Yinghai Lu @ 2012-10-30  1:20 UTC
  To: Greg Kroah-Hartman, Mark Lord
  Cc: Jacob Shin, Ben Hutchings, Willy Tarreau, stable, Linus Torvalds,
	Linux Kernel, H. Peter Anvin

On Mon, Oct 29, 2012 at 4:03 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Mon, Oct 29, 2012 at 07:00:54PM -0400, Mark Lord wrote:
>> Both systems feel much more sluggish than usual with 3.4.16 running.
>> Reverted them both back to earlier kernels (3.4.9, 3.4.4-PAE),
>> and the usual responsive feel has returned.
>>
>> Vague, I know, but something bad happened in there somewhere.
>
> That's too vague for me to do anything with, sorry.  Bisection would be
> good if you can figure out how to measure this.

Yes, at least you could post the boot logs of the working and non-working kernels.
Then we could figure out whether there is a corner case that is not handled or
was uncovered.


* Re: Regression from 3.4.9 to 3.4.16 "stable" kernel
  2012-10-29 23:03             ` Greg Kroah-Hartman
  2012-10-30  1:20               ` Yinghai Lu
@ 2012-10-30  4:53               ` Mark Lord
  2012-10-30 16:30                 ` Greg Kroah-Hartman
  1 sibling, 1 reply; 14+ messages in thread
From: Mark Lord @ 2012-10-30  4:53 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jacob Shin, Ben Hutchings, Yinghai Lu, Willy Tarreau, stable,
	Linus Torvalds, Linux Kernel, H. Peter Anvin

On 12-10-29 07:03 PM, Greg Kroah-Hartman wrote:
> On Mon, Oct 29, 2012 at 07:00:54PM -0400, Mark Lord wrote:
>> There's something else very wrong when going from 3.4.9 to 3.4.16.
>> I've done it on two machines here, one the AMD-450 server (64-bit),
>> and the other my main notebook (Core2duo 32-bit-PAE).
>>
>> Both systems feel much more sluggish than usual with 3.4.16 running.
>> Reverted them both back to earlier kernels (3.4.9, 3.4.4-PAE),
>> and the usual responsive feel has returned.
>>
>> Vague, I know, but something bad happened in there somewhere.
> 
> That's too vague for me to do anything with, sorry.  Bisection would be
> good if you can figure out how to measure this.

Well, I'd bet Donkeys to Daisies that reverting the kernel/sched.c changes
will probably fix the responsiveness, but I haven't done that yet.
I've lost enough time already debugging the other issues.

This is more just an indication that perhaps -stable patches need better review
than they're getting.  Take the setup.c breakage: as soon as I pointed it out,
a few people jumped in with knowledge that it was broken, and that patches
existed to fix it.

That kind of thing should be happening before a -stable release,
though I don't know how you would get the Right People to look
at this stuff then rather than after the fact.  Maybe a topic
for a future kernel summit or something.

Best wishes.
-ml


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Regression from 3.4.9 to 3.4.16 "stable" kernel
  2012-10-30  4:53               ` Mark Lord
@ 2012-10-30 16:30                 ` Greg Kroah-Hartman
  0 siblings, 0 replies; 14+ messages in thread
From: Greg Kroah-Hartman @ 2012-10-30 16:30 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jacob Shin, Ben Hutchings, Yinghai Lu, Willy Tarreau, stable,
	Linus Torvalds, Linux Kernel, H. Peter Anvin

On Tue, Oct 30, 2012 at 12:53:06AM -0400, Mark Lord wrote:
> On 12-10-29 07:03 PM, Greg Kroah-Hartman wrote:
> > On Mon, Oct 29, 2012 at 07:00:54PM -0400, Mark Lord wrote:
> >> There's something else very wrong when going from 3.4.9 to 3.4.16.
> >> I've done it on two machines here, one the AMD-450 server (64-bit),
> >> and the other my main notebook (Core2duo 32-bit-PAE).
> >>
> >> Both systems feel much more sluggish than usual with 3.4.16 running.
> >> Reverted them both back to earlier kernels (3.4.9, 3.4.4-PAE),
> >> and the usual responsive feel has returned.
> >>
> >> Vague, I know, but something bad happened in there somewhere.
> > 
> > That's too vague for me to do anything with, sorry.  Bisection would be
> > good if you can figure out how to measure this.
> 
> Well, I'd bet Donkeys to Daisies that reverting the kernel/sched.c changes
> will probably fix the responsiveness, but I haven't done that yet.
> I've lost enough time already debugging the other issues.
> 
> This is more just an indication that perhaps -stable patches need better review
> than they're getting.  Take the setup.c breakage: as soon as I pointed it out,
> a few people jumped in with knowledge that it was broken, and that patches
> existed to fix it.

There will always be bugs; fixing them quickly is the best that we can
do.

> That kind of thing should be happening before a -stable release,
> though I don't know how you would get the Right People to look
> at this stuff then rather than after the fact.  Maybe a topic
> for a future kernel summit or something.

I send patches to everyone involved, and there's a -rc period where
people are _supposed_ to test things out.  If you know of a better way
to get other people to test and review, please let me know; this is the
best that we have come up with so far.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-10-30 16:30 UTC | newest]

Thread overview: 14+ messages
2012-10-29  4:03 Regression from 3.4.9 to 3.4.16 "stable" kernel Mark Lord
2012-10-29  6:46 ` Willy Tarreau
2012-10-29 14:22   ` Mark Lord
2012-10-29 14:37     ` Mark Lord
2012-10-29 14:40     ` Ben Hutchings
2012-10-29 14:47       ` Jacob Shin
2012-10-29 16:58         ` Greg Kroah-Hartman
2012-10-29 17:04           ` Jacob Shin
2012-10-29 23:00           ` Mark Lord
2012-10-29 23:03             ` Greg Kroah-Hartman
2012-10-30  1:20               ` Yinghai Lu
2012-10-30  4:53               ` Mark Lord
2012-10-30 16:30                 ` Greg Kroah-Hartman
2012-10-29 16:37       ` Yinghai Lu
