* [Follow-up] Physical memory disappeared from /proc/meminfo
From: Marc Villemade @ 2008-08-17 17:59 UTC (permalink / raw)
  To: linux-kernel

Hi everyone,

(I apologize in advance for this long email)

While looking for answers to the memory problems I've been having for
some time now, I stumbled onto these posts:

Dated last year:
http://kerneltrap.org/mailarchive/linux-kernel/2007/8/26/164909

and dated a couple of months ago:
http://kerneltrap.org/mailarchive/linux-kernel/2008/6/24/2209554

I'm having exactly the same issue but on a 2.6.20.4 vanilla kernel
(x86). /proc/meminfo shows that
MemFree+Buffers+Cached+AnonPages+Slab+Mapped != MemTotal, which AFAIK
should be the case.

6_days_uptime_machine ~ # cat /proc/meminfo
MemTotal:      3106668 kB
MemFree:        678104 kB
Buffers:        120024 kB
Cached:          69892 kB
SwapCached:          0 kB
Active:         740872 kB
Inactive:      1621704 kB
HighTotal:     2227996 kB
HighFree:        21380 kB
LowTotal:       878672 kB
LowFree:        656724 kB
SwapTotal:     4192956 kB
SwapFree:      4192956 kB
Dirty:            1292 kB
Writeback:           0 kB
AnonPages:      586900 kB
Mapped:          13824 kB
Slab:            50432 kB
SReclaimable:    39092 kB
SUnreclaim:      11340 kB
PageTables:       1532 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   5746288 kB
Committed_AS:  1073624 kB
VmallocTotal:   114680 kB
VmallocUsed:      8944 kB
VmallocChunk:   105568 kB
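
Adding up the fields I listed, for this machine (just a back-of-the-envelope
check):

  678104 (MemFree) + 120024 (Buffers) + 69892 (Cached) + 586900 (AnonPages)
  + 50432 (Slab) + 13824 (Mapped) = 1519176 kB

which leaves 3106668 - 1519176 = 1587492 kB, roughly 1.5 GB, unaccounted
for. The same sum can be scripted with something like:

  awk '/^(MemFree|Buffers|Cached|AnonPages|Slab|Mapped):/ {s += $2} END {print s " kB"}' /proc/meminfo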

In contrast, here's the meminfo from a machine that was rebooted
20 hours earlier, on which the above-mentioned figures almost add up to
MemTotal. Roughly 50 MB is already missing, which makes me think the leak
starts right after boot-up...

20_hours_uptime_machine ~ # cat /proc/meminfo
MemTotal:      3106668 kB
MemFree:       2455932 kB
Buffers:         88624 kB
Cached:          69364 kB
SwapCached:          0 kB
Active:         496772 kB
Inactive:       114680 kB
HighTotal:     2227996 kB
HighFree:      1695240 kB
LowTotal:       878672 kB
LowFree:        760692 kB
SwapTotal:     4192956 kB
SwapFree:      4192956 kB
Dirty:            1016 kB
Writeback:           0 kB
AnonPages:      395888 kB
Mapped:          13956 kB
Slab:            23400 kB
SReclaimable:    12828 kB
SUnreclaim:      10572 kB
PageTables:       1048 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   5746288 kB
Committed_AS:   988928 kB
VmallocTotal:   114680 kB
VmallocUsed:      8944 kB
VmallocChunk:   105568 kB
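
(The same sum for this machine: 2455932 + 88624 + 69364 + 395888 + 23400
+ 13956 = 3047164 kB, so 3106668 - 3047164 = 59504 kB, about 58 MB, is
already unaccounted for.)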

Over time, I've noticed that the LRU lists (active/inactive) just get
bigger and bigger, and the inactive list in particular never seems to get
freed, which doesn't make a lot of sense to me. I've tried the drop_caches
thing, which helps for a while (but still doesn't make the memory
accounting go back to normal), but it's not a fix, only a temporary
workaround. I'd like to have those machines running without us having to
drop caches every so often.
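
(For completeness, by "the drop_caches thing" I mean the usual:

  sync
  echo 3 > /proc/sys/vm/drop_caches

which, as far as I understand, drops the clean page cache plus the
reclaimable dentry/inode slab; echoing 1 or 2 instead drops only the page
cache or only the dentries/inodes.)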


The main brain-teaser for me is that these machines were in use
several months ago in another setup that was almost identical in terms of
running processes (same kernel, same running processes; the only
differences are network-related, and the machines have not been
reinstalled), and back then we didn't have this kind of issue. Now we have
to reboot the servers every other week, otherwise applications simply get
refused more memory at some point. That is inexplicable to me!
Which is why I turn to you guys ;)

Looking at meminfo, something else strikes me: if SwapCached means
that something was once swapped out, and it is always 0 on my machines,
how can a machine that is apparently running out of memory, and that has
swap enabled, never swap anything? It seems logical that a process can't
use more memory than the system can allocate, which would make swap space
on a 32-bit machine with 4 GB of RAM useless if it weren't for the MMU.
These machines have the MMU enabled (hence the 3 GB available even though
4 GB are physically installed), so I should be able to use swap. So why
doesn't that seem to happen when the machines are apparently running out
of memory (refusing malloc() calls)? Or maybe I'm just totally wrong about
the meaning of SwapCached?!
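
(SwapFree equals SwapTotal in both dumps above, which also suggests
nothing has ever been pushed out to swap; watching the si/so columns of
something like "vmstat 5" should confirm whether any swap traffic happens
at all.)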


I've browsed (read: grep'd) through the changelogs from 2.6.20.4 up
to 2.6.26.3 and saw that a considerable number of memory leaks were fixed
during that period, but they were mostly related to USB (I don't have any
USB devices on these machines, although usbfs is used), NETFILTER (which I
don't use), or to architectures other than x86. I didn't see anything
strikingly matching my setup, except maybe some SCSI bugs (mostly related
to firmware).

Rob Mueller in June (the second post referenced above) was running
2.6.25.x and still had the problem. Would you guys know whether 2.6.26
fixes this issue? Fred, in the first thread I linked, says he doesn't have
the issue with 2.6.1 but had it with 2.6.12 and 2.6.20.x.

Here is some more info on the 6_days_uptime machine. I didn't include
a dmesg because this mail is already pretty long, and it doesn't seem to
me that there is anything of interest in it, but I could be totally wrong,
so please let me know if you want me to send it as well. I'll just copy a
couple of lines that look a bit suspicious to me:

------------- from DMESG ---
PM: Writing back config space on device 0000:08:03.0 at offset 3 (was
804000, writing 804010)
PM: Writing back config space on device 0000:08:03.0 at offset 2 (was
2000000, writing 2000010)
PM: Writing back config space on device 0000:08:03.0 at offset 1 (was
2b00000, writing 2b00146)

------------------------------------------------ ZONEINFO


6_days_uptime_machine ~ # cat /proc/zoneinfo
Node 0, zone      DMA
  pages free     2827
        min      17
        low      21
        high     25
        active   0
        inactive 0
        scanned  0 (a: 9 i: 9)
        spanned  4096
        present  4064
    nr_anon_pages 0
    nr_mapped    1
    nr_file_pages 0
    nr_slab_reclaimable 0
    nr_slab_unreclaimable 0
    nr_page_table_pages 0
    nr_dirty     0
    nr_writeback 0
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 0
        protection: (0, 873, 4048)
  pagesets
  all_unreclaimable: 1
  prev_priority:     12
  start_pfn:         0
Node 0, zone   Normal
  pages free     161364
        min      936
        low      1170
        high     1404
        active   33965
        inactive 7245
        scanned  0 (a: 0 i: 28)
        spanned  225280
        present  223520
    nr_anon_pages 5081
    nr_mapped    0
    nr_file_pages 31868
    nr_slab_reclaimable 9773
    nr_slab_unreclaimable 2790
    nr_page_table_pages 383
    nr_dirty     18
    nr_writeback 0
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 3
        protection: (0, 0, 25400)
  pagesets
    cpu: 0 pcp: 0
              count: 140
              high:  186
              batch: 31
    cpu: 0 pcp: 1
              count: 23
              high:  62
              batch: 15
  vm stats threshold: 24
    cpu: 1 pcp: 0
              count: 19
              high:  186
              batch: 31
    cpu: 1 pcp: 1
              count: 14
              high:  62
              batch: 15
  vm stats threshold: 24
    cpu: 2 pcp: 0
              count: 158
              high:  186
              batch: 31
    cpu: 2 pcp: 1
              count: 10
              high:  62
              batch: 15
  vm stats threshold: 24
    cpu: 3 pcp: 0
              count: 94
              high:  186
              batch: 31
    cpu: 3 pcp: 1
              count: 7
              high:  62
              batch: 15
  vm stats threshold: 24
  all_unreclaimable: 0
  prev_priority:     12
  start_pfn:         4096
Node 0, zone  HighMem
  pages free     3175
        min      128
        low      979
        high     1831
        active   153497
        inactive 398182
        scanned  0 (a: 0 i: 0)
        spanned  819200
        present  812800
    nr_anon_pages 143849
    nr_mapped    3456
    nr_file_pages 15590
    nr_slab_reclaimable 0
    nr_slab_unreclaimable 0
    nr_page_table_pages 0
    nr_dirty     44
    nr_writeback 0
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 0
        protection: (0, 0, 0)
  pagesets
    cpu: 0 pcp: 0
              count: 14
              high:  186
              batch: 31
    cpu: 0 pcp: 1
              count: 9
              high:  62
              batch: 15
  vm stats threshold: 36
    cpu: 1 pcp: 0
              count: 12
              high:  186
              batch: 31
    cpu: 1 pcp: 1
              count: 3
              high:  62
              batch: 15
  vm stats threshold: 36
    cpu: 2 pcp: 0
              count: 7
              high:  186
              batch: 31
    cpu: 2 pcp: 1
              count: 3
              high:  62
              batch: 15
  vm stats threshold: 36
    cpu: 3 pcp: 0
              count: 33
              high:  186
              batch: 31
    cpu: 3 pcp: 1
              count: 9
              high:  62
              batch: 15
  vm stats threshold: 36
  all_unreclaimable: 0
  prev_priority:     12
  start_pfn:         229376

------------------------------------------------ LSMOD


6_days_uptime_machine ~ # lsmod
Module                  Size  Used by
iptable_nat             7172  0
nf_nat                 16172  1 iptable_nat
nf_conntrack_ipv4      14860  2 iptable_nat
nf_conntrack           51336  3 iptable_nat,nf_nat,nf_conntrack_ipv4
nfnetlink               6040  3 nf_nat,nf_conntrack_ipv4,nf_conntrack
iptable_filter          3332  1
ip_tables              11508  2 iptable_nat,iptable_filter
x_tables               12804  2 iptable_nat,ip_tables
rtc                    11184  0
bonding                84248  0
bnx2                  142960  0
zlib_inflate           15232  1 bnx2
evdev                   9088  0
raid456               119568  0
xor                    15112  1 raid456
tg3                   104712  0
e1000                 121856  0
sata_nv                15496  0
libata                 96164  1 sata_nv
usbhid                 15240  0
ohci_hcd               19852  0
uhci_hcd               22036  0
usb_storage            34312  0
ehci_hcd               28824  0
usbcore               115084  6 usbhid,ohci_hcd,uhci_hcd,usb_storage,ehci_hcd


Thanks for any information that might help me figure this out. We've
been having this problem for two months now, and it's getting very
infuriating not to be able to fix it or even understand where the problem
stems from. If you need any more information, I'd be happy to provide it.
Just ask!

Cheers

Marc Villemade


* Re: [Follow-up] Physical memory disappeared from /proc/meminfo
From: Rik van Riel @ 2008-08-17 18:59 UTC (permalink / raw)
  To: Marc Villemade; +Cc: linux-kernel

On Sun, 17 Aug 2008 19:59:10 +0200
"Marc Villemade" <mastachand@gmail.com> wrote:

> I'm having exactly the same issue but on a 2.6.20.4 vanilla kernel
> (x86). /proc/meminfo shows that
> MemFree+Buffers+Cached+AnonPages+Slab+Mapped != MemTotal, which AFAIK
> should be the case.

There are also page tables, vmalloc memory and various
unaccounted kernel allocations.

-- 
All rights reversed.


* Re: [Follow-up] Physical memory disappeared from /proc/meminfo
From: Marc Villemade @ 2008-08-17 19:26 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel

Hey,

Thanks for your answer, but it still doesn't come close. I'm up to about
1.5 GB adding the fields I listed, and PageTables + VmallocTotal comes to
about 160 MB. It sounds unlikely to me that something unaccounted for
would grow to 1.4 GB, don't you think? Or maybe it is accounted for
somewhere other than /proc/meminfo? Please point me to the right place if
so.

Cheers

Marc Villemade


On Sun, Aug 17, 2008 at 8:59 PM, Rik van Riel <riel@redhat.com> wrote:
> On Sun, 17 Aug 2008 19:59:10 +0200
> "Marc Villemade" <mastachand@gmail.com> wrote:
>
>> I'm having exactly the same issue but on a 2.6.20.4 vanilla kernel
>> (x86). /proc/meminfo shows that
>> MemFree+Buffers+Cached+AnonPages+Slab+Mapped != MemTotal, which AFAIK
>> should be the case.
>
> There are also page tables, vmalloc memory and various
> unaccounted kernel allocations.
>
> --
> All rights reversed.
>


* Re: [Follow-up] Physical memory disappeared from /proc/meminfo
From: Arjan van de Ven @ 2008-08-17 19:42 UTC (permalink / raw)
  To: Marc Villemade; +Cc: Rik van Riel, linux-kernel

On Sun, 17 Aug 2008 21:26:07 +0200
"Marc Villemade" <mastachand@gmail.com> wrote:

> Hey,
> 
> Thanks for your answer, but it still doesn't come close. I'm up to about
> 1.5 GB adding the fields I listed, and PageTables + VmallocTotal comes to
> about 160 MB. It sounds unlikely to me that something unaccounted for
> would grow to 1.4 GB, don't you think? Or maybe it is accounted for
> somewhere other than /proc/meminfo? Please point me to the right place if
> so.
> 

You're not accidentally using shmfs or tmpfs, are you?
(Or do you otherwise have a leak in SysV shm?)

You can use the "ipcs" command to see your shm allocations.

(And for tmpfs/shmfs... check whether you have it mounted anywhere.
Something like /dev being on shmfs and /dev/null being a regular file
rather than a device can... totally screw you over.)
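
(If it helps, a quick way to see how much memory the tmpfs mounts are
actually holding is plain df on them, e.g.:

  df -h /dev /dev/shm

On tmpfs, the "Used" column is memory (shmem pages, possibly swapped out),
not disk.)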


* Re: [Follow-up] Physical memory disappeared from /proc/meminfo
From: Marc Villemade @ 2008-08-17 20:22 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Rik van Riel, linux-kernel

On Sun, Aug 17, 2008 at 9:42 PM, Arjan van de Ven <arjan@infradead.org> wrote:
> On Sun, 17 Aug 2008 21:26:07 +0200
> "Marc Villemade" <mastachand@gmail.com> wrote:
>
>> Hey,
>>
>> Thanks for your answer, but it still doesn't come close. I'm up to about
>> 1.5 GB adding the fields I listed, and PageTables + VmallocTotal comes to
>> about 160 MB. It sounds unlikely to me that something unaccounted for
>> would grow to 1.4 GB, don't you think? Or maybe it is accounted for
>> somewhere other than /proc/meminfo? Please point me to the right place if
>> so.
>>
>
> You're not accidentally using shmfs or tmpfs, are you?
> (Or do you otherwise have a leak in SysV shm?)
>
> You can use the "ipcs" command to see your shm allocations.
>
> (And for tmpfs/shmfs... check whether you have it mounted anywhere.
> Something like /dev being on shmfs and /dev/null being a regular file
> rather than a device can... totally screw you over.)
>

Hey,

here are my mount points:

6_days_uptime_machine ~ # mount
/dev/evms/root on / type ext3 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,nosuid)
devpts on /dev/pts type devpts (rw,nosuid,noexec)
/dev/evms/boot on /boot type ext3 (rw,noatime)
/dev/evms/data on /data type ext3 (rw,noatime,data=journal)
/dev/evms/queue on /mnt/queue type ext3 (rw,noatime)
none on /dev/shm type tmpfs (rw)
/dev on /opt/bizanga/chroot/dev type none (rw,bind)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc
(rw,noexec,nosuid,nodev)

So shmfs is never used, but tmpfs is used for /dev and /dev/shm. Do you
think that could be an issue?

And it looks like I'm not using any shared memory either:
6_days_uptime_machine ~ # ipcs -a

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status

------ Semaphore Arrays --------
key        semid      owner      perms      nsems

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages


And I've also checked that all /dev entries are devices. Thanks for the tip.

After Rik's email, I started looking a little more closely at the
other values in /proc/meminfo and /proc/zoneinfo, and I would like to
have more information about zoneinfo. It seems that most of the
inactive pages are in HighMem:

        active   148512
        inactive 403735
        scanned  0 (a: 0 i: 0)

And what does the "scanned" line mean? What about the fact that it's
all 0? Does that mean the zone has never been scanned for reclaimable
space? Maybe that's where my problem lies?! Any more info on those proc
entries would be greatly appreciated. My friend Google didn't have a lot
of good answers, and neither did the kernel documentation.


Marc Villemade


* Re: [Follow-up] Physical memory disappeared from /proc/meminfo
From: Andi Kleen @ 2008-08-18  6:09 UTC (permalink / raw)
  To: Marc Villemade; +Cc: linux-kernel

"Marc Villemade" <mastachand@gmail.com> writes:
>
> I'm having exactly the same issue but on a 2.6.20.4 vanilla kernel
> (x86). /proc/meminfo shows that
> MemFree+Buffers+Cached+AnonPages+Slab+Mapped != MemTotal, which AFAIK
> should be the case.

Nope, the equation is not necessarily true. 
See http://halobates.de/memorywaste.pdf for a detailed discussion.

-Andi
