linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86: memtest: WARN if bad RAM found
@ 2012-04-02 15:05 Jonathan Nieder
  2012-04-13 19:39 ` [PATCH resend] " Jonathan Nieder
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Nieder @ 2012-04-02 15:05 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Ben Hutchings, Andreas Herrmann, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin

From: Ben Hutchings <ben@decadent.org.uk>
Date: Mon, 5 Dec 2011 04:00:58 +0000

Since this is not a particularly thorough test, if we find any bad
bits of RAM then there is a fair chance that there are other bad bits
we fail to detect.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Hi,

The patch below comes from this discussion

  http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=50;bug=613321

and has been in use in Debian kernels since last December.  The
rationale does not seem particularly distro-specific, and all in all
it looks to me like a good change.

Nothing urgent here.  I imagine this patch as targeted at v3.5.

Thoughts?
Jonathan

 arch/x86/mm/memtest.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/memtest.c b/arch/x86/mm/memtest.c
index c80b9fb95734..38caeb44a218 100644
--- a/arch/x86/mm/memtest.c
+++ b/arch/x86/mm/memtest.c
@@ -30,6 +30,8 @@ static u64 patterns[] __initdata = {
 
 static void __init reserve_bad_mem(u64 pattern, u64 start_bad, u64 end_bad)
 {
+	WARN_ONCE(1, "Bad RAM detected. Use memtest86+ to perform a thorough test\n"
+		  "and the memmap= parameter to reserve the bad areas.");
 	printk(KERN_INFO "  %016llx bad mem addr %010llx - %010llx reserved\n",
 	       (unsigned long long) pattern,
 	       (unsigned long long) start_bad,
-- 
1.7.10.rc3



* [PATCH resend] x86: memtest: WARN if bad RAM found
  2012-04-02 15:05 [PATCH] x86: memtest: WARN if bad RAM found Jonathan Nieder
@ 2012-04-13 19:39 ` Jonathan Nieder
  2012-04-23 18:26   ` [PATCH resend v3] " Jonathan Nieder
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Nieder @ 2012-04-13 19:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: x86, linux-kernel, Ben Hutchings, Andreas Herrmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin

From: Ben Hutchings <ben@decadent.org.uk>
Date: Mon, 5 Dec 2011 04:00:58 +0000

Since this is not a particularly thorough test, if we find any bad
bits of RAM then there is a fair chance that there are other bad bits
we fail to detect.

Warn so that the TAINT_WARN flag shows up in panic traces and other bug
reports from users who enabled memtest and found bad RAM during boot.
This helps people debugging see that problems are potentially due to
unreliable RAM.  The warning text gives advice on how to make the
warning go away with the help of a more thorough test:

	Bad RAM detected. Use memtest86+ to perform a thorough test
	and the memmap= parameter to reserve the bad areas.

In this way, this patch should make the lives of people helping to
analyze bug reports from builds with CONFIG_MEMTEST enabled easier.

[jn: more explanation of impact]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Hi again,

The patch below last visited this list on 2 April, about a week and a
half ago.  No reply.

The patch has been in Debian since last December.  I like it.  The
patch is targeted at v3.5, so I would like to see it in linux-next so
it can get more exposure.  Comments?

Thanks,
Jonathan

 arch/x86/mm/memtest.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/memtest.c b/arch/x86/mm/memtest.c
index c80b9fb95734..38caeb44a218 100644
--- a/arch/x86/mm/memtest.c
+++ b/arch/x86/mm/memtest.c
@@ -30,6 +30,8 @@ static u64 patterns[] __initdata = {
 
 static void __init reserve_bad_mem(u64 pattern, u64 start_bad, u64 end_bad)
 {
+	WARN_ONCE(1, "Bad RAM detected. Use memtest86+ to perform a thorough test\n"
+		  "and the memmap= parameter to reserve the bad areas.");
 	printk(KERN_INFO "  %016llx bad mem addr %010llx - %010llx reserved\n",
 	       (unsigned long long) pattern,
 	       (unsigned long long) start_bad,
-- 
1.7.10
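
The once-only behaviour that the commit message above relies on can be
illustrated in user space.  This is a minimal sketch, not the kernel's
WARN_ONCE() implementation: reserve_bad_mem_sketch(), tainted_warn,
warnings_printed and ranges_reported are hypothetical names invented for
illustration.  It assumes only what the patch shows, namely that the
warning fires once while every bad range is still logged.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state; none of these are kernel APIs. */
static bool tainted_warn;             /* models "a warning has tainted us" */
static unsigned int warnings_printed; /* how often the warning text was emitted */
static unsigned int ranges_reported;  /* how many bad ranges were logged */

/*
 * User-space approximation of the patched function: the warning is
 * guarded by a static flag, so it is printed on the first call only,
 * while the per-range line is printed on every call.
 */
static void reserve_bad_mem_sketch(unsigned long long start_bad,
				   unsigned long long end_bad)
{
	static bool warned;

	if (!warned) {
		warned = true;
		tainted_warn = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: Bad RAM detected.\n");
	}

	ranges_reported++;
	printf("  bad mem addr %010llx - %010llx reserved\n",
	       start_bad, end_bad);
}
```

Calling reserve_bad_mem_sketch() for two different ranges prints the
warning once and the per-range line twice, so a log with many bad ranges
still carries a single warning.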



* [PATCH resend v3] x86: memtest: WARN if bad RAM found
  2012-04-13 19:39 ` [PATCH resend] " Jonathan Nieder
@ 2012-04-23 18:26   ` Jonathan Nieder
  2012-04-23 20:26     ` Yinghai Lu
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Nieder @ 2012-04-23 18:26 UTC (permalink / raw)
  To: x86
  Cc: Andrew Morton, linux-kernel, Ben Hutchings, Andreas Herrmann,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Dave Jones

From: Ben Hutchings <ben@decadent.org.uk>
Date: Mon, 5 Dec 2011 04:00:58 +0000

The novice who enables CONFIG_MEMTEST may not realize that it is not a
particularly thorough test.  If we find any bad bits of RAM then there
is a fair chance that there are other bad bits we fail to detect; add
a WARNING for this situation so people helping debug ensuing problems
can understand what happened.

The warning text gives advice to allow the sysadmin to run a more
thorough test and suppress the warning.

	Bad RAM detected. Use memtest86+ to perform a thorough test
	and the memmap= parameter to reserve the bad areas.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Jonathan Nieder wrote:

> The patch below last visited this list on 2 April, about a week and a
> half ago.  No reply.

Are Ben and I the only ones who care either way about this change?

 arch/x86/mm/memtest.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/memtest.c b/arch/x86/mm/memtest.c
index c80b9fb95734..38caeb44a218 100644
--- a/arch/x86/mm/memtest.c
+++ b/arch/x86/mm/memtest.c
@@ -30,6 +30,8 @@ static u64 patterns[] __initdata = {
 
 static void __init reserve_bad_mem(u64 pattern, u64 start_bad, u64 end_bad)
 {
+	WARN_ONCE(1, "Bad RAM detected. Use memtest86+ to perform a thorough test\n"
+		  "and the memmap= parameter to reserve the bad areas.");
 	printk(KERN_INFO "  %016llx bad mem addr %010llx - %010llx reserved\n",
 	       (unsigned long long) pattern,
 	       (unsigned long long) start_bad,
-- 
1.7.10
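
For the memmap= half of the advice, a reported range can be turned into
a boot argument of the form memmap=<size>$<start>, which marks that
region as reserved.  The helper below is a hypothetical user-space
sketch (format_memmap_arg() is not part of the patch or of the kernel),
and it assumes that end_bad is exclusive, which this thread does not
confirm.

```c
#include <stdio.h>

/*
 * Format a "memmap=<size>$<start>" boot argument covering the half-open
 * range [start_bad, end_bad).  Returns the number of characters written
 * (excluding the terminating NUL), or -1 if the range is empty or the
 * buffer is too small.
 */
static int format_memmap_arg(char *buf, size_t len,
			     unsigned long long start_bad,
			     unsigned long long end_bad)
{
	int n;

	if (end_bad <= start_bad)
		return -1;

	n = snprintf(buf, len, "memmap=0x%llx$0x%llx",
		     end_bad - start_bad, start_bad);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return n;
}
```

For example, start_bad = 0x1f000000 and end_bad = 0x1f010000 yield
memmap=0x10000$0x1f000000.  Some boot loaders treat '$' specially, so
the argument may need escaping in their configuration files.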



* Re: [PATCH resend v3] x86: memtest: WARN if bad RAM found
  2012-04-23 18:26   ` [PATCH resend v3] " Jonathan Nieder
@ 2012-04-23 20:26     ` Yinghai Lu
  2012-04-23 20:28       ` Jonathan Nieder
  0 siblings, 1 reply; 7+ messages in thread
From: Yinghai Lu @ 2012-04-23 20:26 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: x86, Andrew Morton, linux-kernel, Ben Hutchings,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Dave Jones

On Mon, Apr 23, 2012 at 11:26 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>
>  arch/x86/mm/memtest.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/x86/mm/memtest.c b/arch/x86/mm/memtest.c
> index c80b9fb95734..38caeb44a218 100644
> --- a/arch/x86/mm/memtest.c
> +++ b/arch/x86/mm/memtest.c
> @@ -30,6 +30,8 @@ static u64 patterns[] __initdata = {
>
>  static void __init reserve_bad_mem(u64 pattern, u64 start_bad, u64 end_bad)
>  {
> +       WARN_ONCE(1, "Bad RAM detected. Use memtest86+ to perform a thorough test\n"
> +                 "and the memmap= parameter to reserve the bad areas.");

You must be kidding : calling memtest86+ "thorough test".

Yinghai


* Re: [PATCH resend v3] x86: memtest: WARN if bad RAM found
  2012-04-23 20:26     ` Yinghai Lu
@ 2012-04-23 20:28       ` Jonathan Nieder
  2012-04-23 22:13         ` Yinghai Lu
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Nieder @ 2012-04-23 20:28 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: x86, Andrew Morton, linux-kernel, Ben Hutchings,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Dave Jones

Yinghai Lu wrote:
> On Mon, Apr 23, 2012 at 11:26 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:

>> --- a/arch/x86/mm/memtest.c
>> +++ b/arch/x86/mm/memtest.c
>> @@ -30,6 +30,8 @@ static u64 patterns[] __initdata = {
>>
>>  static void __init reserve_bad_mem(u64 pattern, u64 start_bad, u64 end_bad)
>>  {
>> +       WARN_ONCE(1, "Bad RAM detected. Use memtest86+ to perform a thorough test\n"
>> +                 "and the memmap= parameter to reserve the bad areas.");
>
> You must be kidding : calling memtest86+ "thorough test".

How about "more thorough test"?  Or do you have a better
recommendation for users?

Thanks for looking it over.
Jonathan


* Re: [PATCH resend v3] x86: memtest: WARN if bad RAM found
  2012-04-23 20:28       ` Jonathan Nieder
@ 2012-04-23 22:13         ` Yinghai Lu
  2012-04-24  2:50           ` [PATCH v4] " Jonathan Nieder
  0 siblings, 1 reply; 7+ messages in thread
From: Yinghai Lu @ 2012-04-23 22:13 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: x86, Andrew Morton, linux-kernel, Ben Hutchings,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Dave Jones

On Mon, Apr 23, 2012 at 1:28 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Yinghai Lu wrote:
>> On Mon, Apr 23, 2012 at 11:26 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>
>>> --- a/arch/x86/mm/memtest.c
>>> +++ b/arch/x86/mm/memtest.c
>>> @@ -30,6 +30,8 @@ static u64 patterns[] __initdata = {
>>>
>>>  static void __init reserve_bad_mem(u64 pattern, u64 start_bad, u64 end_bad)
>>>  {
>>> +       WARN_ONCE(1, "Bad RAM detected. Use memtest86+ to perform a thorough test\n"
>>> +                 "and the memmap= parameter to reserve the bad areas.");
>>
>> You must be kidding : calling memtest86+ "thorough test".
>
> How about "more thorough test"?  Or do you have a better
> recommendation for users?

early_memtest was added for debugging purposes.
Sometimes the BIOS messes up its settings: on some boots the memory is
OK, but on other boots the memory is not initialized properly.

In that case, preboot memtest tools are not going to help.

Also, preboot memtest tools and early_memtest do not stress the system
enough: only one process is running.
You need to run multiple instances of memtester to test your memory and
system.

Yinghai
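
The kind of single-pass check discussed above can be sketched in user
space as a fill followed by a verify.  This is an illustration under
stated assumptions, not the code in arch/x86/mm/memtest.c: struct
bad_range, pattern_fill() and pattern_verify() are hypothetical names,
and ranges are expressed as word indexes rather than physical addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: a run of bad words, as [start, end) indexes. */
struct bad_range {
	size_t start;
	size_t end;
};

/* Write the pattern to every 64-bit word of the buffer. */
static void pattern_fill(volatile uint64_t *mem, size_t nwords,
			 uint64_t pattern)
{
	size_t i;

	for (i = 0; i < nwords; i++)
		mem[i] = pattern;
}

/*
 * Read every word back and coalesce consecutive mismatches into ranges.
 * At most max_out ranges are stored in out[]; the return value is the
 * total number of ranges found, which may exceed max_out.
 */
static size_t pattern_verify(const volatile uint64_t *mem, size_t nwords,
			     uint64_t pattern,
			     struct bad_range *out, size_t max_out)
{
	size_t i, nranges = 0;
	int in_run = 0;

	for (i = 0; i < nwords; i++) {
		if (mem[i] != pattern) {
			if (!in_run) {
				in_run = 1;
				if (nranges < max_out)
					out[nranges].start = i;
			}
		} else if (in_run) {
			in_run = 0;
			if (nranges < max_out)
				out[nranges].end = i;
			nranges++;
		}
	}
	if (in_run) {
		if (nranges < max_out)
			out[nranges].end = nwords;
		nranges++;
	}

	return nranges;
}
```

A single thread doing one write and one read per word puts very little
load on the memory system, which is the limitation raised above; a clean
result from such a pass says little about behaviour under stress.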


* [PATCH v4] x86: memtest: WARN if bad RAM found
  2012-04-23 22:13         ` Yinghai Lu
@ 2012-04-24  2:50           ` Jonathan Nieder
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Nieder @ 2012-04-24  2:50 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: x86, Andrew Morton, linux-kernel, Ben Hutchings,
	Andreas Herrmann, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Dave Jones

From: Ben Hutchings <ben@decadent.org.uk>

The novice who enables CONFIG_MEMTEST may not realize that this is not
a particularly thorough test.  If we find any bad bits of RAM then
there is a fair chance that there are other bad bits we fail to
detect; add a WARNING for this situation so people helping debug
ensuing problems can understand what happened.

The warning text gives advice to allow the sysadmin to address the
warning by fixing the underlying problem or running a more thorough
test and using the memmap= parameter to reserve bad areas.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Yinghai Lu wrote:

> early_memtest was added for debugging purposes.
> Sometimes the BIOS messes up its settings: on some boots the memory is
> OK, but on other boots the memory is not initialized properly.
>
> In that case, preboot memtest tools are not going to help.
[... and another hint about how running multiple instances of
 memtester may be more suitable than preboot tools such as memtest86+]

Makes perfect sense.  How about this?

This punts to Documentation/memory.txt for advice, in the hope of
nudging people to improve that document where it is lacking (hint,
hint).

Thanks,
Jonathan

 arch/x86/mm/memtest.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/mm/memtest.c b/arch/x86/mm/memtest.c
index c80b9fb95734..d26067d5ddec 100644
--- a/arch/x86/mm/memtest.c
+++ b/arch/x86/mm/memtest.c
@@ -30,6 +30,7 @@ static u64 patterns[] __initdata = {
 
 static void __init reserve_bad_mem(u64 pattern, u64 start_bad, u64 end_bad)
 {
+	WARN_ONCE(1, "Bad RAM detected. See Documentation/memory.txt for hints.");
 	printk(KERN_INFO "  %016llx bad mem addr %010llx - %010llx reserved\n",
 	       (unsigned long long) pattern,
 	       (unsigned long long) start_bad,
-- 
1.7.10


