Dave Jones wrote:
> On Tue, Dec 19, 2006 at 04:20:37PM +1100, Nick Piggin wrote:
> > Dave Jones wrote:
> >
> > > Eeek! page_mapcount(page) went negative! (-2)
> >
> > Hmm, probably happened once before, too.
>
> You're right. Going back further in the log, I noticed
> that it had happened again exactly at the time that cron restarted vpnc.
> The first time, the flags were different..
>
> Dec 4 00:01:03 firewall kernel: Eeek! page_mapcount(page) went negative! (-1)
> Dec 4 00:01:03 firewall kernel: page->flags = 400
> Dec 4 00:01:03 firewall kernel: page->count = 1
> Dec 4 00:01:03 firewall kernel: page->mapping = 00000000

Still reserved, with a NULL mapping. I'd say it could be the same page.

>
> > > page->flags = 404
> >
> > What's that? PG_referenced|PG_reserved? So I'd say it is likely
> > that some driver has got its refcounting wrong.
>
> At the time that it bit me, here's what was loaded..
>
> tun ipt_MASQUERADE iptable_nat ip_nat ipt_LOG xt_limit ipv6
> ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp
> iptable_filter ip_tables x_tables video sbs i2c_ec button battery asus_acpi ac
> parport_pc lp parport pcspkr ide_cd i2c_viapro i2c_core cdrom 3c59x via_rhine
> via_ircc mii irda crc_ccitt serio_raw dm_snapshot dm_zero dm_mirror dm_mod ext3
> jbd ehci_hcd ohci_hcd uhci_hcd
>
> The scary ones (i2c, irda) weren't in use at all, and had never been opened afaik,
> so the potential for those to be corrupting memory is slim, but not out of the
> question. (Why the hell asus_acpi is loaded is a mystery, this isn't an Asus,
> or a laptop. Probably dumb initscripts).

OK that could be useful if I do some grepping and see which ones are
setting PG_reserved.

> > And I see we've got another report for 2.6.19.1 from Chris, which
> > is equally vague.
>
> I'll be moving that box to 2.6.19.x at some point real soon, so I'll holler
> if I see it again on a later kernel.
>
> > IMO the pattern is much too consistent to be able to attribute
> > them all to hardware problems. And considering it takes so long
> > for these things to appear, can we get something like the attached
> > patch upstream at least until we manage to stamp them out?
>
> Sounds like a good idea to me.
>
> ACKed-by: Dave Jones

Thanks.

>
> > Any other debugging info we can add?
>
> Would it be useful to print the pfn of the page ?
> In cases like mine, where it bit twice before it killed the box, it
> might be interesting to see if its always the same page. Not sure
> what that would prove/disprove though.

Might help. I guess the site where it is allocated from might be another
one, although I'm hoping that if we know what ->nopage is being used then
we'll be able to track it.

OTOH it may be using remap_pfn_range from fops->mmap, rather than nopage...
I wonder how we could get at that info? vma->vm_file->f_op->mmap?

-- 
SUSE Labs, Novell Inc.
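
For anyone reading these reports later, here is a rough userspace sketch of how
the two page->flags values quoted above (400 and 404) decode. It is not part of
the patch under discussion. The bit numbers are an assumption taken from the
2.6.19-era include/linux/page-flags.h (PG_referenced = bit 2, PG_reserved =
bit 10); other kernel versions number them differently. The table is
deliberately partial, and the helper name decode_page_flags is made up for
illustration.

```python
# Partial table of page flag bits. ASSUMPTION: numbering as in the
# 2.6.19-era include/linux/page-flags.h; check the header of the kernel
# that produced the log before trusting the result.
PAGE_FLAG_BITS = {
    0: "PG_locked",
    1: "PG_error",
    2: "PG_referenced",
    3: "PG_uptodate",
    4: "PG_dirty",
    5: "PG_lru",
    6: "PG_active",
    7: "PG_slab",
    10: "PG_reserved",
}


def decode_page_flags(hex_value):
    """Return the names of the flag bits set in a hex string such as '404'.

    Bits that are not in the partial table above are reported as a single
    'unknown(0x...)' entry instead of being silently dropped.
    """
    value = int(hex_value, 16)
    names = [name for bit, name in sorted(PAGE_FLAG_BITS.items())
             if value & (1 << bit)]
    known_mask = sum(1 << bit for bit in PAGE_FLAG_BITS)
    unknown = value & ~known_mask
    if unknown:
        names.append("unknown(0x%x)" % unknown)
    return names


if __name__ == "__main__":
    # The two values from the oops reports quoted in this thread.
    for flags in ("400", "404"):
        print(flags, "=", "|".join(decode_page_flags(flags)))
```

Under that numbering, 400 comes out as PG_reserved alone and 404 as
PG_referenced|PG_reserved, which matches the reading given earlier in the
thread.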