linux-kernel.vger.kernel.org archive mirror
* Memory leaks on atom-based boards?
@ 2014-10-27 18:44 AL13N
  2014-11-09 11:56 ` Vlastimil Babka
  2014-11-21 20:06 ` Pavel Machek
  0 siblings, 2 replies; 7+ messages in thread
From: AL13N @ 2014-10-27 18:44 UTC (permalink / raw)
  To: linux-kernel

I have several machines with the same OS and kernel (3.14.22).

Two of those machines are Atom-based boards, and they hit OOM without
swap being used (MemAvailable crawls down towards 0, even though the
processes are not using more memory).

Specifically, this one machine i need to reboot every 3 to 5 days.

It has 4GB RAM and 4GB swap(SSD), but:
 - sum of all VmRSS < 500MB
 - sum of all tmpfs < 100MB
 - Slab is around 16MB
 - Cache will usually crawl down towards 0 (just like MemAvailable)
 - I couldn't find another explanation for the loss of memory
 - I also asked
http://serverfault.com/questions/616856/where-did-my-memory-go-on-linux-no-cache-slab-shm-ipcs
(the other machine)
 - This problem existed on this hardware at least from 3.12.* upwards.
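
The accounting in the list above can be reproduced with a short script. This is only a minimal sketch, not something from the thread: the helper names (`parse_kb_fields`, `sum_vmrss_kb`, `unaccounted_kb`) are hypothetical, and the set of /proc/meminfo fields treated as "explained" memory is an assumption. Note that summing VmRSS counts shared pages more than once, so it is an upper bound on process memory.

```python
import glob

def parse_kb_fields(text):
    """Parse 'Name:   1234 kB' style lines into {name: first numeric value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def sum_vmrss_kb():
    """Sum VmRSS over all processes; kernel threads have no VmRSS line."""
    total = 0
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as f:
                total += parse_kb_fields(f.read()).get("VmRSS", 0)
        except OSError:
            continue  # the process exited while we were scanning
    return total

# Assumption: these fields together cover the "explained" memory.
# Cached already includes tmpfs/shmem pages, so Shmem is not added again.
EXPLAINED = ("MemFree", "Buffers", "Cached", "Slab",
             "AnonPages", "PageTables", "KernelStack")

def unaccounted_kb(meminfo):
    """MemTotal minus everything the listed counters explain."""
    explained = sum(meminfo.get(key, 0) for key in EXPLAINED)
    return meminfo.get("MemTotal", 0) - explained
```

On a live system, `unaccounted_kb(parse_kb_fields(open("/proc/meminfo").read()))` gives the figure to watch; if it keeps growing while `sum_vmrss_kb()` stays flat, the memory is going somewhere these counters do not show.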

I've recompiled the kernel with kmemleak enabled (i figured the leak would be
in some module that is only loaded on this board), but it didn't report
anything (i also tested with the kmemleak test module, to check that it was
working).

My questions are:
 - Is this a kernel memory leak somewhere?
 - How can i find out what is allocating all this memory?



* Re: Memory leaks on atom-based boards?
  2014-10-27 18:44 Memory leaks on atom-based boards? AL13N
@ 2014-11-09 11:56 ` Vlastimil Babka
  2014-11-09 16:38   ` AL13N
  2014-11-21 20:06 ` Pavel Machek
  1 sibling, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2014-11-09 11:56 UTC (permalink / raw)
  To: AL13N, linux-kernel

On 10/27/2014 07:44 PM, AL13N wrote:
> My questions are:
>  - Is this a kernel memory leak somewhere?

Hi, this does look like a kernel memory leak. A known one was recently
fixed by the patch from https://lkml.org/lkml/2014/10/15/447, which made
it into 3.18-rc3 and should be backported to stable kernels 3.8+ soon.
You can tell whether this fix applies to you by checking the
thp_zero_page_alloc value in /proc/vmstat. A value X > 1 basically means
that X*2 MB of memory has leaked.
You say in the serverfault post that 3.17.2 helped, but the fix is not
in 3.17.2... it could just be that the circumstances changed and THP
zero pages are no longer being freed and reallocated.
So if you want to be sure, I would suggest going back to a version where
the problem appeared on your system and checking
thp_zero_page_alloc. Perhaps you'll see a >1 value even on 3.17.2, which
would mean some leak occurred there as well, but maybe not as severe.
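
The check described above can be scripted. The following is a rough sketch under stated assumptions: the function names are hypothetical, and the estimate simply applies the rule of thumb from this message (a value X > 1 means about X*2 MB leaked, with 2 MB being the huge page size assumed here).

```python
def parse_vmstat(text):
    """Parse /proc/vmstat style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters

def thp_counters(counters):
    """Keep only the transparent huge page (thp_*) counters."""
    return {k: v for k, v in counters.items() if k.startswith("thp_")}

def thp_zero_page_leak_mb(counters, huge_page_mb=2):
    """Rule of thumb from the message: X > 1 means roughly X * 2 MB leaked."""
    x = counters.get("thp_zero_page_alloc", 0)
    return x * huge_page_mb if x > 1 else 0
```

On an affected machine, `thp_zero_page_leak_mb(parse_vmstat(open("/proc/vmstat").read()))` would grow over time; a result of 0 suggests this particular leak is not the one being hit.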

>  - How can i find out what is allocating all this memory?

There's no simple way, unfortunately. Checking the kpageflags /proc file
might help. IIRC there used to be a patch in -mm tree to store who
allocated what page, but it might be bitrotten.




* Re: Memory leaks on atom-based boards?
  2014-11-09 11:56 ` Vlastimil Babka
@ 2014-11-09 16:38   ` AL13N
  2014-11-09 22:07     ` Vlastimil Babka
  0 siblings, 1 reply; 7+ messages in thread
From: AL13N @ 2014-11-09 16:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Vlastimil Babka

> On 10/27/2014 07:44 PM, AL13N wrote:
>> My questions are:
>>  - Is this a kernel memory leak somewhere?
>
> Hi, this does look like a kernel memory leak. There was recently a known
> one fixed by patch from https://lkml.org/lkml/2014/10/15/447 which made
> it to 3.18-rc3 and should be backported to stable kernels 3.8+ soon.
> You would recognize if this is the fix for you by checking the
> thp_zero_page_alloc value in /proc/vmstat. Value X > 1 basically means
> that X*2 MB memory is leaked.
> You say in the serverfault post that 3.17.2 helped, but the fix is not
> in 3.17.2... but it could be just that the circumstances changed and THP
> zero pages are no longer freed and reallocated.
> So if you want to be sure, I would suggest trying again a version where
> the problem appeared on your system, and checking the
> thp_zero_page_alloc. Perhaps you'll see a >1 value even on 3.17.2, which
> means some leak did occur there as well, but maybe not so severe.


i was gonna tell you guys, but i was waiting until i was sure: 3.17.2 does
seem to have fixed it. Before, i had OOM after 3, maybe 4 days (for at least 2
months); now i'm up more than 4 days and MemAvailable is still high
enough, at about 3.5GB, whereas otherwise it would dwindle to 0 (at
about 1GB/day).

Well, thp_zero_page_alloc is 0 on 3.17.2 ... so... i guess not? i'll keep
this value under observation...


>>  - How can i find out what is allocating all this memory?
>
> There's no simple way, unfortunately. Checking the kpageflags /proc file
> might help. IIRC there used to be a patch in -mm tree to store who
> allocated what page, but it might be bitrotten.


i checked what was in kpageflags (or kpagecount) but it's all some kind of
binary stuff...

do i need some tool to interpret these values?



* Re: Memory leaks on atom-based boards?
  2014-11-09 16:38   ` AL13N
@ 2014-11-09 22:07     ` Vlastimil Babka
  2014-11-10  0:19       ` AL13N
  0 siblings, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2014-11-09 22:07 UTC (permalink / raw)
  To: AL13N, linux-kernel

On 11/09/2014 05:38 PM, AL13N wrote:
>> On 10/27/2014 07:44 PM, AL13N wrote:
>>
>> Hi, this does look like a kernel memory leak. There was recently a known
>> one fixed by patch from https://lkml.org/lkml/2014/10/15/447 which made
>> it to 3.18-rc3 and should be backported to stable kernels 3.8+ soon.
>> You would recognize if this is the fix for you by checking the
>> thp_zero_page_alloc value in /proc/vmstat. Value X > 1 basically means
>> that X*2 MB memory is leaked.
>> You say in the serverfault post that 3.17.2 helped, but the fix is not
>> in 3.17.2... but it could be just that the circumstances changed and THP
>> zero pages are no longer freed and reallocated.
>> So if you want to be sure, I would suggest trying again a version where
>> the problem appeared on your system, and checking the
>> thp_zero_page_alloc. Perhaps you'll see a >1 value even on 3.17.2, which
>> means some leak did occur there as well, but maybe not so severe.
> 
> 
> i was gonna tell you guys, but i was waiting until i was sure, but indeed
> 3.17.2 fixed it, where i had OOM after 3, maybe 4 days (for at least 2
> months), now i'm up more than 4 days and the MemAvailable is still high
> enough... at about 3.5GB whereas otherwise it would dwindle until 0. (at
> about 1GB/day)
> 
> Well, it results to 0 on 3.17.2 ... so... i guess not? i'll keep this
> value under observation...

Hm, 0 sounds like nobody was allocating transparent huge pages at all. What
about the other thp_* stats?

>>>  - How can i find out what is allocating all this memory?
>>
>> There's no simple way, unfortunately. Checking the kpageflags /proc file
>> might help. IIRC there used to be a patch in -mm tree to store who
>> allocated what page, but it might be bitrotten.
> 
> 
> i checked what was in kpageflags (or kpagecount) but it's all some kind of
> binary stuff...
> 
> do i need some tool to interpret these values?

There's tools/vm/page-types.c in kernel sources which can read kpageflags, but
not the kpagecount...

Vlastimil
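
For readers wondering what the "binary stuff" in /proc/kpageflags is: the file holds one 64-bit flag word per page frame number. The snippet below is an illustrative sketch only, not a replacement for tools/vm/page-types.c. The function names are hypothetical, the bit numbers are taken from the kernel's pagemap documentation as best recalled, and reading the file requires root.

```python
import struct
from collections import Counter

# Bit positions as listed in the kernel's pagemap documentation (assumed here).
KPF_BITS = {
    0: "LOCKED", 1: "ERROR", 2: "REFERENCED", 3: "UPTODATE", 4: "DIRTY",
    5: "LRU", 6: "ACTIVE", 7: "SLAB", 8: "WRITEBACK", 9: "RECLAIM",
    10: "BUDDY", 11: "MMAP", 12: "ANON", 13: "SWAPCACHE", 14: "SWAPBACKED",
    15: "COMPOUND_HEAD", 16: "COMPOUND_TAIL", 17: "HUGE", 18: "UNEVICTABLE",
    19: "HWPOISON", 20: "NOPAGE", 21: "KSM", 22: "THP",
}

def decode_flags(value):
    """Return the names of the known flag bits set in one 64-bit word."""
    return [name for bit, name in sorted(KPF_BITS.items())
            if (value >> bit) & 1]

def summarize(raw):
    """Count pages per flag combination; raw is bytes read from kpageflags."""
    usable = len(raw) - len(raw) % 8  # ignore a trailing partial word
    counts = Counter()
    for (value,) in struct.iter_unpack("=Q", raw[:usable]):
        counts[",".join(decode_flags(value)) or "none"] += 1
    return counts
```

Feeding it the result of `open("/proc/kpageflags", "rb").read()` gives a histogram of page states; a steadily growing bucket with none of LRU, SLAB, BUDDY or MMAP set would be one place to start looking.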


* Re: Memory leaks on atom-based boards?
  2014-11-09 22:07     ` Vlastimil Babka
@ 2014-11-10  0:19       ` AL13N
  0 siblings, 0 replies; 7+ messages in thread
From: AL13N @ 2014-11-10  0:19 UTC (permalink / raw)
  To: linux-kernel; +Cc: Vlastimil Babka

> On 11/09/2014 05:38 PM, AL13N wrote:
>>> On 10/27/2014 07:44 PM, AL13N wrote:
>>>
>>> Hi, this does look like a kernel memory leak. There was recently a
>>> known
>>> one fixed by patch from https://lkml.org/lkml/2014/10/15/447 which made
>>> it to 3.18-rc3 and should be backported to stable kernels 3.8+ soon.
>>> You would recognize if this is the fix for you by checking the
>>> thp_zero_page_alloc value in /proc/vmstat. Value X > 1 basically means
>>> that X*2 MB memory is leaked.
>>> You say in the serverfault post that 3.17.2 helped, but the fix is not
>>> in 3.17.2... but it could be just that the circumstances changed and
>>> THP
>>> zero pages are no longer freed and reallocated.
>>> So if you want to be sure, I would suggest trying again a version where
>>> the problem appeared on your system, and checking the
>>> thp_zero_page_alloc. Perhaps you'll see a >1 value even on 3.17.2,
>>> which
>>> means some leak did occur there as well, but maybe not so severe.
>>
>>
>> i was gonna tell you guys, but i was waiting until i was sure, but
>> indeed
>> 3.17.2 fixed it, where i had OOM after 3, maybe 4 days (for at least 2
>> months), now i'm up more than 4 days and the MemAvailable is still high
>> enough... at about 3.5GB whereas otherwise it would dwindle until 0. (at
>> about 1GB/day)
>>
>> Well, it results to 0 on 3.17.2 ... so... i guess not? i'll keep this
>> value under observation...
>
> Hm, 0 sounds like nobody was allocating transparent huge pages at all.
> What
> about the other thp_* stats?

thp_fault_alloc 0
thp_fault_fallback 0
thp_collapse_alloc 0
thp_collapse_alloc_failed 0
thp_split 0
thp_zero_page_alloc 0
thp_zero_page_alloc_failed 0


i guess on 3.17.2 there's something that doesn't allocate thp? either
that, or it was a different issue after all...

>>>>  - How can i find out what is allocating all this memory?
>>>
>>> There's no simple way, unfortunately. Checking the kpageflags /proc
>>> file
>>> might help. IIRC there used to be a patch in -mm tree to store who
>>> allocated what page, but it might be bitrotten.
>>
>>
>> i checked what was in kpageflags (or kpagecount) but it's all some kind
>> of
>> binary stuff...
>>
>> do i need some tool to interpret these values?
>
> There's tools/vm/page-types.c in kernel sources which can read kpageflags,
> but
> not the kpagecount...

good to know...



* Re: Memory leaks on atom-based boards?
  2014-10-27 18:44 Memory leaks on atom-based boards? AL13N
  2014-11-09 11:56 ` Vlastimil Babka
@ 2014-11-21 20:06 ` Pavel Machek
  2014-11-21 21:08   ` AL13N
  1 sibling, 1 reply; 7+ messages in thread
From: Pavel Machek @ 2014-11-21 20:06 UTC (permalink / raw)
  To: AL13N; +Cc: linux-kernel

On Mon 2014-10-27 18:44:03, AL13N wrote:
> I have several machines with the same OS and kernel (3.14.22).
> 
> 2 of those machines are both atom-based boards and they get OOM, without
> swap being used (MemAvail crawls down towards 0, even though not more
> memory is used on processes).
> 
> Specifically, this one machine, i need to reboot every 3 to 5 days.

Run the machine without swap and with mem=512M (or something similar), and it
should reproduce sooner....

You may want to run a kernel compile (or a similar workload) to give the
leak a chance...

									Pavel


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: Memory leaks on atom-based boards?
  2014-11-21 20:06 ` Pavel Machek
@ 2014-11-21 21:08   ` AL13N
  0 siblings, 0 replies; 7+ messages in thread
From: AL13N @ 2014-11-21 21:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Pavel Machek

err

the whole point is that swap stays largely unused and i get OOM after 3 or 4
days, but only on the atom boards (2), while the other machines (5) with the
identical kernel are unaffected and can run for several months...

plus, the upgrade to 3.17.2 seems to have fixed the issue...

it's characterized by a slow and steady decline in MemAvailable in
/proc/meminfo.

i can report that the server in question has now been running for several
weeks since i fixed it, and MemAvailable is still above 3GB ...
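
A slow, steady decline like this can be quantified from periodic MemAvailable readings. The sketch below is hypothetical (the function names and the sample format are assumptions, not anything used in the thread): it fits a least-squares line through (timestamp, MemAvailable) samples and extrapolates to zero.

```python
def decline_rate_kb_per_day(samples):
    """Least-squares slope of (unix_seconds, mem_available_kb), in kB/day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t * 86400  # per-second slope scaled to per-day

def days_until_exhausted(samples):
    """Days until MemAvailable hits 0 at the fitted rate, or None if not falling."""
    slope = decline_rate_kb_per_day(samples)
    if slope >= 0:
        return None
    return samples[-1][1] / -slope
```

With one reading per hour from /proc/meminfo, a leak of about 1GB/day shows up as a slope near -1,000,000 kB/day well before the OOM killer gets involved.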



Thread overview: 7+ messages
2014-10-27 18:44 Memory leaks on atom-based boards? AL13N
2014-11-09 11:56 ` Vlastimil Babka
2014-11-09 16:38   ` AL13N
2014-11-09 22:07     ` Vlastimil Babka
2014-11-10  0:19       ` AL13N
2014-11-21 20:06 ` Pavel Machek
2014-11-21 21:08   ` AL13N
