linux-kernel.vger.kernel.org archive mirror
* x86-64 bad pmds in 2.6.11.6
@ 2005-03-30 21:44 Dave Jones
  2005-03-31 10:41 ` Andi Kleen
From: Dave Jones @ 2005-03-30 21:44 UTC
  To: ak; +Cc: linux-kernel

[Apologies to Andi for getting this twice; I goofed the l-k address
 the first time.]

 
 I arrived at the office today to find my workstation had this spew
 in its dmesg buffer..
 
 mm/memory.c:97: bad pmd ffff81004b017438(00000038a5500a88).
 mm/memory.c:97: bad pmd ffff81004b017440(0000000000000003).
 mm/memory.c:97: bad pmd ffff81004b017448(00007ffffffff73b).
 mm/memory.c:97: bad pmd ffff81004b017450(00007ffffffff73c).
 mm/memory.c:97: bad pmd ffff81004b017458(00007ffffffff73d).
 mm/memory.c:97: bad pmd ffff81004b017468(00007ffffffff73e).
 mm/memory.c:97: bad pmd ffff81004b017470(00007ffffffff73f).
 mm/memory.c:97: bad pmd ffff81004b017478(00007ffffffff740).
 mm/memory.c:97: bad pmd ffff81004b017480(00007ffffffff741).
 mm/memory.c:97: bad pmd ffff81004b017488(00007ffffffff742).
 mm/memory.c:97: bad pmd ffff81004b017490(00007ffffffff743).
 mm/memory.c:97: bad pmd ffff81004b017498(00007ffffffff744).
 mm/memory.c:97: bad pmd ffff81004b0174a0(00007ffffffff745).
 mm/memory.c:97: bad pmd ffff81004b0174a8(00007ffffffff746).
 mm/memory.c:97: bad pmd ffff81004b0174b0(00007ffffffff747).
 mm/memory.c:97: bad pmd ffff81004b0174b8(00007ffffffff748).
 mm/memory.c:97: bad pmd ffff81004b0174c0(00007ffffffff749).
 mm/memory.c:97: bad pmd ffff81004b0174c8(00007ffffffff74a).
 mm/memory.c:97: bad pmd ffff81004b0174d0(00007ffffffff74b).
 mm/memory.c:97: bad pmd ffff81004b0174d8(00007ffffffff74c).
 mm/memory.c:97: bad pmd ffff81004b0174e0(00007ffffffff74d).
 mm/memory.c:97: bad pmd ffff81004b0174e8(00007ffffffff74e).
 mm/memory.c:97: bad pmd ffff81004b0174f0(00007ffffffff74f).
 mm/memory.c:97: bad pmd ffff81004b0174f8(00007ffffffff750).
 mm/memory.c:97: bad pmd ffff81004b017500(00007ffffffff751).
 mm/memory.c:97: bad pmd ffff81004b017508(00007ffffffff752).
 mm/memory.c:97: bad pmd ffff81004b017510(00007ffffffff753).
 mm/memory.c:97: bad pmd ffff81004b017518(00007ffffffff754).
 mm/memory.c:97: bad pmd ffff81004b017520(00007ffffffff755).
 mm/memory.c:97: bad pmd ffff81004b017528(00007ffffffff756).
 mm/memory.c:97: bad pmd ffff81004b017530(00007ffffffff757).
 mm/memory.c:97: bad pmd ffff81004b017538(00007ffffffff758).
 mm/memory.c:97: bad pmd ffff81004b017540(00007ffffffff759).
 mm/memory.c:97: bad pmd ffff81004b017548(00007ffffffff75a).
 mm/memory.c:97: bad pmd ffff81004b017550(00007ffffffff75b).
 mm/memory.c:97: bad pmd ffff81004b017558(00007ffffffff75c).
 mm/memory.c:97: bad pmd ffff81004b017560(00007ffffffff75d).
 mm/memory.c:97: bad pmd ffff81004b017568(00007ffffffff75e).
 mm/memory.c:97: bad pmd ffff81004b017570(00007ffffffff75f).
 mm/memory.c:97: bad pmd ffff81004b017578(00007ffffffff760).
 mm/memory.c:97: bad pmd ffff81004b017580(00007ffffffff761).
 mm/memory.c:97: bad pmd ffff81004b017588(00007ffffffff762).
 mm/memory.c:97: bad pmd ffff81004b017590(00007ffffffff763).
 mm/memory.c:97: bad pmd ffff81004b017598(00007ffffffff764).
 mm/memory.c:97: bad pmd ffff81004b0175a0(00007ffffffff765).
 mm/memory.c:97: bad pmd ffff81004b0175a8(00007ffffffff766).
 mm/memory.c:97: bad pmd ffff81004b0175b0(00007ffffffff767).
 mm/memory.c:97: bad pmd ffff81004b0175b8(00007ffffffff768).
 mm/memory.c:97: bad pmd ffff81004b0175c0(00007ffffffff769).
 mm/memory.c:97: bad pmd ffff81004b0175c8(00007ffffffff76a).
 mm/memory.c:97: bad pmd ffff81004b0175d0(00007ffffffff76b).
 mm/memory.c:97: bad pmd ffff81004b0175d8(00007ffffffff76c).
 mm/memory.c:97: bad pmd ffff81004b0175e0(00007ffffffff76d).
 mm/memory.c:97: bad pmd ffff81004b0175e8(00007ffffffff76e).
 mm/memory.c:97: bad pmd ffff81004b0175f0(00007ffffffff76f).
 mm/memory.c:97: bad pmd ffff81004b0175f8(00007ffffffff770).
 mm/memory.c:97: bad pmd ffff81004b017600(00007ffffffff771).
 mm/memory.c:97: bad pmd ffff81004b017608(00007ffffffff772).
 mm/memory.c:97: bad pmd ffff81004b017610(00007ffffffff773).
 mm/memory.c:97: bad pmd ffff81004b017618(00007ffffffff774).
 mm/memory.c:97: bad pmd ffff81004b017628(0000000000000010).
 mm/memory.c:97: bad pmd ffff81004b017630(00000000078bfbff).
 mm/memory.c:97: bad pmd ffff81004b017638(0000000000000006).
 mm/memory.c:97: bad pmd ffff81004b017640(0000000000001000).
 mm/memory.c:97: bad pmd ffff81004b017648(0000000000000011).
 mm/memory.c:97: bad pmd ffff81004b017650(0000000000000064).
 mm/memory.c:97: bad pmd ffff81004b017658(0000000000000003).
 mm/memory.c:97: bad pmd ffff81004b017660(0000000000400040).
 mm/memory.c:97: bad pmd ffff81004b017668(0000000000000004).
 mm/memory.c:97: bad pmd ffff81004b017670(0000000000000038).
 mm/memory.c:97: bad pmd ffff81004b017678(0000000000000005).
 mm/memory.c:97: bad pmd ffff81004b017680(0000000000000008).
 mm/memory.c:97: bad pmd ffff81004b017688(0000000000000007).
 mm/memory.c:97: bad pmd ffff81004b017698(0000000000000008).
 mm/memory.c:97: bad pmd ffff81004b0176a8(0000000000000009).
 mm/memory.c:97: bad pmd ffff81004b0176b0(0000000000403840).
 mm/memory.c:97: bad pmd ffff81004b0176b8(000000000000000b).
 mm/memory.c:97: bad pmd ffff81004b0176c0(00000000000001f4).
 mm/memory.c:97: bad pmd ffff81004b0176c8(000000000000000c).
 mm/memory.c:97: bad pmd ffff81004b0176d0(00000000000001f4).
 mm/memory.c:97: bad pmd ffff81004b0176d8(000000000000000d).
 mm/memory.c:97: bad pmd ffff81004b0176e0(00000000000001f4).
 mm/memory.c:97: bad pmd ffff81004b0176e8(000000000000000e).
 mm/memory.c:97: bad pmd ffff81004b0176f0(00000000000001f4).
 mm/memory.c:97: bad pmd ffff81004b0176f8(0000000000000017).
 mm/memory.c:97: bad pmd ffff81004b017708(000000000000000f).
 mm/memory.c:97: bad pmd ffff81004b017710(00007ffffffff734).
 mm/memory.c:97: bad pmd ffff81004b017730(5f36387800000000).
 mm/memory.c:97: bad pmd ffff81004b017738(0000000000003436).
 
 
I've not done a memtest86 run on this (yet), but I'll be very
surprised if this is bad RAM, especially since other folks also
seem to have hit the same thing when they moved to 2.6.11.
(My workstation previously ran 2.6.9 and 2.6.10 without incident.)

For example, http://lkml.org/lkml/2005/3/11/42 shows a similar
dump (with different addresses, of course).
Googling turns up a bunch of other similar dumps.
 
 		Dave
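The dumped values look structured rather than random. Read as 64-bit words of a single 4 KiB page, the entries from offset 0x628 onward appear to form an ELF auxiliary vector of the kind found near the top of a new process's stack: (type, value) pairs using the standard AT_* numbers, for example type 6 (AT_PAGESZ) followed by 0x1000. The minimal Python sketch below decodes that stretch of the dump, assuming that slots missing from the log held zero (a zero entry is presumably treated as empty rather than bad, so it is not reported). DUMP and decode_auxv are hypothetical names used only for illustration.

```python
# Standard ELF auxiliary-vector type numbers (as defined in <elf.h>).
AT_NAMES = {
    0: "AT_NULL", 3: "AT_PHDR", 4: "AT_PHENT", 5: "AT_PHNUM",
    6: "AT_PAGESZ", 7: "AT_BASE", 8: "AT_FLAGS", 9: "AT_ENTRY",
    11: "AT_UID", 12: "AT_EUID", 13: "AT_GID", 14: "AT_EGID",
    15: "AT_PLATFORM", 16: "AT_HWCAP", 17: "AT_CLKTCK", 23: "AT_SECURE",
}

# Page offset -> value, copied from the "bad pmd" lines above for offsets
# 0x628..0x710 (the low 12 bits of ffff81004b017628 and so on).
# Assumption: offsets that do not appear in the dump held zero.
DUMP = {
    0x628: 0x10, 0x630: 0x78BFBFF,
    0x638: 0x6, 0x640: 0x1000,
    0x648: 0x11, 0x650: 0x64,
    0x658: 0x3, 0x660: 0x400040,
    0x668: 0x4, 0x670: 0x38,
    0x678: 0x5, 0x680: 0x8,
    0x688: 0x7,
    0x698: 0x8,
    0x6A8: 0x9, 0x6B0: 0x403840,
    0x6B8: 0xB, 0x6C0: 0x1F4,
    0x6C8: 0xC, 0x6D0: 0x1F4,
    0x6D8: 0xD, 0x6E0: 0x1F4,
    0x6E8: 0xE, 0x6F0: 0x1F4,
    0x6F8: 0x17,
    0x708: 0xF, 0x710: 0x7FFFFFFFF734,
}


def decode_auxv(dump, start, end):
    """Read consecutive (type, value) pairs of 8-byte words from page
    offset start up to end, stopping after an AT_NULL entry.
    Returns a list of (name, value) tuples."""
    pairs = []
    for off in range(start, end, 16):
        a_type = dump.get(off, 0)
        a_val = dump.get(off + 8, 0)
        pairs.append((AT_NAMES.get(a_type, hex(a_type)), a_val))
        if a_type == 0:  # AT_NULL terminates the vector
            break
    return pairs


for name, value in decode_auxv(DUMP, 0x628, 0x728):
    print(f"{name:12} {value:#x}")
```

On this reading, the run of 00007ffffffff73b through 00007ffffffff774 values earlier in the dump would be argv/envp pointers into the same page.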


* re: x86-64 bad pmds in 2.6.11.6
@ 2005-04-08 16:33 Clem Taylor
From: Clem Taylor @ 2005-04-08 16:33 UTC
  To: linux-kernel

Dave Jones reported seeing bad pmd messages in 2.6.11.6. I've been
seeing them with 2.6.11 and, today, with 2.6.11.6. When I first saw the
problem I ran memtest86, and it didn't catch anything after about 3 hours.
However, unlike Dave, I don't see them when X starts. They tend to
happen after a program segfaults:

2.6.11:
Apr  3 23:23:33 klaatu kernel: sh[16361]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007ffffffff020 error 14
Apr  3 23:23:33 klaatu kernel: mm/memory.c:97: bad pmd ffff810027171010(00000000006b68b9).
... many more ...

2.6.11.6:
Apr  8 12:03:17 klaatu kernel: grep[20971]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007ffffffff090 error 14
Apr  8 12:03:17 klaatu kernel: mm/memory.c:97: bad pmd ffff810095929010(0000000000000015).
... many more ...
Apr  8 12:03:18 klaatu kernel: mm/memory.c:97: bad pmd ffff8100959299d0(000034365f363878).
Apr  8 12:03:18 klaatu kernel: grep[21116]: segfault at 0000000000000000 rip 0000000000000000 rsp 00007ffffffff0a0 error 14
Apr  8 12:03:18 klaatu kernel: mm/memory.c:97: bad pmd ffff810095f5b000(000000000000000f).
...

At the time I was running
find ... -exec grep -H ...
over a Linux kernel tree.

When I repeated the find, I didn't see any segfaults on the second run.

                                --Clem
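Each segfault line ends in "error 14", presumably the page-fault error code the CPU reported. As a hedged aid to reading it, here is a small Python decoder for the low five bits of the x86 page-fault error code (present/protection, write, user, reserved-bit, instruction-fetch). It assumes the kernel printed the code in hexadecimal; the decimal reading is shown alongside for comparison. PF_BITS and decode_pf_error are illustrative names, not kernel symbols.

```python
# Low five bits of the x86 page-fault error code, as described in the
# Intel SDM / AMD APM.  Each entry: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in a paging entry", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code):
    """Return human-readable flags for a page-fault error code."""
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


# The log prints "error 14"; show both possible readings of that number.
print("as hex 0x14: ", decode_pf_error(0x14))
print("as decimal 14:", decode_pf_error(14))
```

Read as 0x14, the code describes a user-mode instruction fetch from a non-present page, which fits the rip of 0 in all three segfault lines; the decimal reading (a write combined with a reserved-bit fault) fits less well. Bit 4 is only reported when no-execute (NX) support is enabled.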

* Re: x86-64 bad pmds in 2.6.11.6
@ 2005-08-08 16:55 Andy Davidson
From: Andy Davidson @ 2005-08-08 16:55 UTC
  To: linux-kernel, davej

On Wed, 6 Apr, 2005 22:49:03 -0400, Dave Jones wrote:
> On Thu, Mar 31, 2005 at 12:41:17PM +0200, Andi Kleen wrote:
>  > On Wed, Mar 30, 2005 at 04:44:55PM -0500, Dave Jones wrote:
>  > >  I arrived at the office today to find my workstation had this spew
>  > >  in its dmesg buffer..
>  > Looks like random memory corruption to me.
>  > Can you enable slab debugging etc.?
>  > >  mm/memory.c:97: bad pmd ffff81004b017438(00000038a5500a88).
>  > >  mm/memory.c:97: bad pmd ffff81004b017440(0000000000000003).
>  > >  mm/memory.c:97: bad pmd ffff81004b017448(00007ffffffff73b).
>  > >  mm/memory.c:97: bad pmd ffff81004b017450(00007ffffffff73c).
> I realised today that this happens every time X starts up for
> the first time.   I did some experiments, and found that with 2.6.12rc1
> it's gone. Either it got fixed accidentally, or it's hidden now
> by one of the many changes in the 4-level patches.
> I'll try and narrow this down a little more tomorrow, to see if I
> can pinpoint the exact -bk snapshot (may be tricky given they were
> broken for a while), as it'd be good to get this fixed in 2.6.11.x
> if .12 isn't going to show up any time soon.

Hi, Dave, all --

Does anyone remember seeing any system instability at the time of
these messages?

I'm running 2.6.11 on an SMP Opteron box that is logging these
messages. Occasionally the box then behaves as it would during a
serious memory leak: the load average shoots up, and the box becomes
unresponsive and stops accepting network connections. Memory is not
completely exhausted, though, and the kernel does not kill any processes.

A few minutes later, the machine returns to normal. This happens
roughly twice a week. Thankfully, it hasn't yet ruined my weekend
with a phone call from support, but it might. ;-)

If you do remember instability along with these messages, and an
upgrade cured it, then I will schedule some downtime to try one.


-- 

Regards, Andy Davidson                                andy@ebuyer.com
Systems Administrator,                                Ebuyer (UK) Ltd

* Re: x86-64 bad pmds in 2.6.11.6
@ 2005-09-20 17:12 Charles McCreary
  2005-09-20 17:30 ` Linus Torvalds
From: Charles McCreary @ 2005-09-20 17:12 UTC
  To: linux-kernel

Another data point for this thread. The box spewing the bad pmd messages is a
dual Opteron 246 on a TYAN S2885 Thunder K8W motherboard, running kernel
2.6.11.4-20a-smp.

Approximately one hour after the bad pmd messages, the box was completely
unresponsive. This machine is either idle or heavily loaded (many threads,
lots of I/O and NFS network traffic). I never see this when it is idle; when
it is heavily loaded, it invariably becomes unresponsive within 24 hours, so
it looks reproducible. I'm willing to provide more information and test patches.

Output:
Sep 15 06:42:46 lakeport -- MARK --
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680bc8(00002aaaaaaaba98).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680bd0(0000000000000002).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680bd8(00007ffffffffdcc).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680be0(00007ffffffffdcd).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680bf0(00007ffffffffdce).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680bf8(00007ffffffffdcf).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c00(00007ffffffffdd0).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c08(00007ffffffffdd1).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c10(00007ffffffffdd2).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c18(00007ffffffffdd3).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c20(00007ffffffffdd4).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c28(00007ffffffffdd5).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c30(00007ffffffffdd6).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c38(00007ffffffffdd7).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c40(00007ffffffffdd8).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c48(00007ffffffffdd9).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c50(00007ffffffffdda).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c58(00007ffffffffddb).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c60(00007ffffffffddc).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c68(00007ffffffffddd).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c70(00007ffffffffdde).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c78(00007ffffffffddf).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c80(00007ffffffffde0).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c88(00007ffffffffde1).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c90(00007ffffffffde2).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680c98(00007ffffffffde3).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680ca0(00007ffffffffde4).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680ca8(00007ffffffffde5).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680cb0(00007ffffffffde6).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680cc0(0000000000000010).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680cc8(00000000078bfbff).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680cd0(0000000000000006).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680cd8(0000000000001000).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680ce0(0000000000000011).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680ce8(0000000000000064).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680cf0(0000000000000003).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680cf8(0000000000400040).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d00(0000000000000004).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d08(0000000000000038).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d10(0000000000000005).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d18(0000000000000009).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d20(0000000000000007).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d28(00002aaaaaaab000).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d30(0000000000000008).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d40(0000000000000009).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d48(00000000004010f0).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d50(000000000000000b).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d60(000000000000000c).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d70(000000000000000d).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d80(000000000000000e).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680d90(0000000000000017).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680da0(000000000000000f).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680da8(00007ffffffffdc5).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680dc0(3638780000000000).
Sep 15 06:58:44 lakeport kernel: mm/memory.c:97: bad pmd ffff81013c680dc8(000000000034365f).
Sep 15 07:22:47 lakeport -- MARK --
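This dump has the same shape as Dave's: a run of pointers just below 0x7ffffffff000, then small numbers that look like ELF auxiliary-vector types. One further check is possible on the last two logged words of each dump. Read as little-endian bytes, they appear to spell "x86_64", and the page offset of that string matches the low 12 bits of the value sitting in what would be the AT_PLATFORM slot (type 0xf): 0x7ffffffff734 in Dave's dump and 0x7ffffffffdc5 here. The minimal Python sketch below performs that comparison, assuming 4 KiB pages and that each pair of words comes from the same page as the pointer; find_string and SAMPLES are hypothetical names, and the numbers are copied from the two logs.

```python
import struct

PAGE_MASK = 0xFFF  # assumes 4 KiB pages: the low 12 bits are the page offset


def find_string(words, base_off, needle=b"x86_64"):
    """Return the page offset at which needle occurs in the little-endian
    bytes of consecutive 64-bit words starting at page offset base_off,
    or None if it does not occur."""
    raw = b"".join(struct.pack("<Q", w) for w in words)
    idx = raw.find(needle)
    return None if idx < 0 else base_off + idx


# name -> (page offset of first word, logged words, value in the AT_PLATFORM slot)
SAMPLES = {
    "Dave Jones": (0x730, [0x5F36387800000000, 0x0000000000003436], 0x7FFFFFFFF734),
    "Charles McCreary": (0xDC0, [0x3638780000000000, 0x000000000034365F], 0x7FFFFFFFFDC5),
}

for who, (off, words, at_platform) in SAMPLES.items():
    found = find_string(words, off)
    expected = at_platform & PAGE_MASK
    print(f"{who}: string at {found:#x}, pointer offset {expected:#x}, match={found == expected}")
```

If both comparisons hold, the overwritten entries would look like the top page of a user stack rather than random bit flips. Clem's last logged value, 000034365f363878, also decodes to "x86_64". This is an observation about the logged data only; it does not say how such a page came to be walked as a pmd page.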




Thread overview: 48+ messages
2005-03-30 21:44 x86-64 bad pmds in 2.6.11.6 Dave Jones
2005-03-31 10:41 ` Andi Kleen
2005-03-31 21:52   ` Dave Jones
2005-04-01 11:52     ` Sergey S. Kostyliov
2005-04-07  2:49   ` Dave Jones
2005-04-07  6:29     ` Andi Kleen
2005-04-14 13:54       ` Hugh Dickins
2005-04-14 17:01         ` Andi Kleen
2005-04-14 17:34           ` Hugh Dickins
2005-04-14 18:10             ` Andi Kleen
2005-04-14 18:11               ` x86-64 bad pmds in 2.6.11.6 II Andi Kleen
2005-04-14 18:27                 ` Chris Wright
2005-04-15 17:24                   ` Andi Kleen
2005-04-15 17:28                     ` Chris Wright
2005-04-15 17:58                       ` Hugh Dickins
2005-04-15 18:07                         ` Dave Jones
2005-04-22 17:37                           ` Debugging patch was " Andi Kleen
2005-04-27 14:23                           ` New debugging " Andi Kleen
2005-04-27 17:37                             ` Dave Jones
2005-04-29 11:07                               ` Hans Kristian Rosbach
2005-04-19 13:35                         ` Andi Kleen
2005-04-19 15:52                           ` Hugh Dickins
2005-04-29 11:12                             ` Christopher Warner
2005-04-29 16:13                               ` Chris Wright
2005-04-29 17:32                               ` Dave Jones
2005-05-02 17:00                                 ` Andi Kleen
2005-05-02 15:28                                   ` Christopher Warner
2005-05-02 20:33                                     ` Chris Wright
2005-05-02 21:08                                       ` Dave Jones
2005-05-03 14:28                                         ` Andi Kleen
2005-05-03 15:15                                           ` Dave Jones
2005-05-10  9:36                                     ` Christopher Warner
2005-05-10 16:26                                       ` Chris Wright
2005-05-10 12:03                                         ` Christopher Warner
2005-05-10 16:38                                       ` Dave Jones
2005-05-10 16:46                                         ` Andi Kleen
2005-05-10 16:59                                           ` Dave Jones
2005-05-10 20:32                                             ` Andi Kleen
2005-05-10 20:43                                               ` Chris Wright
2005-05-12 21:23                                             ` Andi Kleen
2005-05-13 21:51                                               ` Peter J. Stieber
2005-05-14 17:29                                                 ` Peter J. Stieber
2005-04-08 16:33 x86-64 bad pmds in 2.6.11.6 Clem Taylor
2005-08-08 16:55 Andy Davidson
2005-09-20 17:12 Charles McCreary
2005-09-20 17:30 ` Linus Torvalds
2005-09-20 19:44   ` Chris Wedgwood
2005-09-20 23:23     ` Dave Jones
