linux-kernel.vger.kernel.org archive mirror
* That greedy Linux VM cache
@ 2014-01-30 16:58 Igor Podlesny
  2014-01-30 17:06 ` David Lang
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Igor Podlesny @ 2014-01-30 16:58 UTC (permalink / raw)
  To: linux-kernel

   Hello!

   Probably every Linux newcomer has concerns about low free memory
and hears the explanation from Linux old-timers that there's actually
plenty of it -- it's just cached, and when applications need it, it
will be used on demand. I also thought so until recently, when I
noticed that even when free memory is almost exhausted (~ 75 MiB), and
processes are in sleep_on_page_killable, the cache is still around
~ 500 MiB and is not going to give back what it has gained. Naturally,
vm.drop_caches = 3 doesn't squeeze it out either. That drama has been
happening on a rather outdated-but-still-has-2-GiB-of-RAM notebook
with kernels from 3.10 through 3.12.9 (3.13 is the first release in a
long time that simply freezes the notebook so cold that SysRq-B
doesn't work, but that's another story). Everything RAM-demanding just
crawls, the load average keeps climbing, and there's no paging out,
but ongoing disk activity is mostly _read_ with a bit of write. If
vm.swappiness is not 0, it swaps out, but not much; right now I ran
Chromium (in addition to a long-running Firefox) and only 32 MiB went
to swap, load avg. ~ 7
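
A sketch of the drop_caches sequence in question (the write needs
root; only clean page cache, dentries and inodes can be freed this
way):

```shell
# Flush dirty data first, then ask the kernel to drop the clean caches.
sync
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches   # 1=pagecache, 2=dentries+inodes, 3=both
fi
# Compare the numbers before and after:
grep -E '^(MemFree|Cached|Dirty):' /proc/meminfo
```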

   Again: 25 % of RAM is reported (by top, free, and finally
/proc/meminfo) to be cached, but it's kinda greedy.

   I came across a similar issue report:
http://www.spinics.net/lists/linux-btrfs/msg11723.html but questions
remain:

   * How to analyze it? slabtop doesn't show even 100 MiB of slab
   * Why is that possible?
   * The system is on Btrfs but /home is on XFS, so the disk I/O might
be related to text-segment paging? But anyway, this brings us back to
the question: hey, there's 500 MiB free^Wcached.

   While I'm thinking about moving the system back to XFS...

   P. S. While writing this, ~ 100 MiB got swapped out, and the cache
shrank(!) to 377 MiB; Firefox is mostly in "D" --
sleep_on_page_killable, and so is Chrome; load avg. ~ 7. I had to
close Skype to be able to finish this letter, and cached memory is now
439 MiB. :) I know it's time to upgrade, but hey, cached memory is
free memory, right?

-- 
End of message. Next message?


* Re: That greedy Linux VM cache
  2014-01-30 16:58 That greedy Linux VM cache Igor Podlesny
@ 2014-01-30 17:06 ` David Lang
  2014-01-31 14:47 ` Igor Podlesny
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: David Lang @ 2014-01-30 17:06 UTC (permalink / raw)
  To: Igor Podlesny; +Cc: linux-kernel

On Fri, 31 Jan 2014, Igor Podlesny wrote:

>   Hello!
>
>   Probably every Linux newcomer has concerns about low free memory
> and hears the explanation from Linux old-timers that there's actually
> plenty of it -- it's just cached, and when applications need it, it
> will be used on demand. I also thought so until recently, when I
> noticed that even when free memory is almost exhausted (~ 75 MiB)

that's actually quite a bit of free memory; it's very common for
servers to run far lower than that.

> , and processes are in sleep_on_page_killable, the cache is still
> around ~ 500 MiB and is not going to give back what it has gained.
> Naturally, vm.drop_caches = 3 doesn't squeeze it out either.

this is telling you that this data isn't clean cache that can just be
dropped; it's dirty cache that is waiting to be written, or is
otherwise locked.

rather than looking at the memory numbers, look at the swap numbers.
If you are doing any noticeable amount of swapping (si/so in vmstat),
then you are out of memory and the cache that can be dropped has been
dropped.

this does mean that you have a hard time telling when you are getting
close to running out of memory, but it's easy to see when you have.
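
A quick way to check this (a sketch; pswpin/pswpout in /proc/vmstat
are the cumulative page counts behind vmstat's si/so columns, so a
growing delta between samples means active swapping):

```shell
# Sample the cumulative swap-in/swap-out page counters twice, two
# seconds apart; if the numbers grow, the system is actively swapping
# and the droppable cache is already gone.
grep -E '^pswp(in|out) ' /proc/vmstat
sleep 2
grep -E '^pswp(in|out) ' /proc/vmstat
```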

> That drama has been happening on a rather
> outdated-but-still-has-2-GiB-of-RAM notebook with kernels from 3.10
> through 3.12.9 (3.13 is the first release in a long time that simply
> freezes the notebook so cold that SysRq-B doesn't work, but that's
> another story). Everything RAM-demanding just crawls, the load
> average keeps climbing, and there's no paging out, but ongoing disk
> activity is mostly _read_ with a bit of write. If vm.swappiness is
> not 0, it swaps out, but not much; right now I ran Chromium (in
> addition to a long-running Firefox) and only 32 MiB went to swap,
> load avg. ~ 7

that much read activity probably means that you are swapping pages in to use 
them, then dropping them to swap in another page, which you then drop to go back 
and fetch the first page.

David Lang

>   Again: 25 % of RAM is reported (by top, free, and finally
> /proc/meminfo) to be cached, but it's kinda greedy.
>
>   I came across a similar issue report:
> http://www.spinics.net/lists/linux-btrfs/msg11723.html but questions
> remain:
>
>   * How to analyze it? slabtop doesn't show even 100 MiB of slab
>   * Why is that possible?
>   * The system is on Btrfs but /home is on XFS, so the disk I/O might
> be related to text-segment paging? But anyway, this brings us back to
> the question: hey, there's 500 MiB free^Wcached.
>
>   While I'm thinking about moving the system back to XFS...
>
>   P. S. While writing this, ~ 100 MiB got swapped out, and the cache
> shrank(!) to 377 MiB; Firefox is mostly in "D" --
> sleep_on_page_killable, and so is Chrome; load avg. ~ 7. I had to
> close Skype to be able to finish this letter, and cached memory is
> now 439 MiB. :) I know it's time to upgrade, but hey, cached memory
> is free memory, right?
>
>


* Re: That greedy Linux VM cache
  2014-01-30 16:58 That greedy Linux VM cache Igor Podlesny
  2014-01-30 17:06 ` David Lang
@ 2014-01-31 14:47 ` Igor Podlesny
  2014-01-31 16:57   ` Austin S. Hemmelgarn
  2014-01-31 18:25 ` Henrique de Moraes Holschuh
  2014-02-03 10:55 ` Michal Hocko
  3 siblings, 1 reply; 9+ messages in thread
From: Igor Podlesny @ 2014-01-31 14:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton

On 31 January 2014 00:58, Igor Podlesny <for.poige+linux@gmail.com> wrote:
[...]
>  While I'm thinking about moving the system back to XFS...

   Well, it helped just a bit. The whole picture remains, so it's not
a Btrfs issue but seemingly a Linux VM one. The problem can be briefly
described as "if allowed to swap (swappiness != 0), the VM would
rather start swapping than reduce the cache, which holds ~ 25 % of
RAM". Even more briefly, it's stated in the Subject.

   From the user's point of view it looks like the system is being
heavily swapped (and can easily be misinterpreted as such), but
actually most of the disk activity is constant _reading_ from the
filesystem, not accessing the swap device.

   Should I file a bug report in the kernel's Bugzilla, or just
upgrade the notebook? )

-- 
End of message. Next message?


* Re: That greedy Linux VM cache
  2014-01-31 14:47 ` Igor Podlesny
@ 2014-01-31 16:57   ` Austin S. Hemmelgarn
  2014-01-31 17:08     ` Igor Podlesny
  0 siblings, 1 reply; 9+ messages in thread
From: Austin S. Hemmelgarn @ 2014-01-31 16:57 UTC (permalink / raw)
  To: for.poige+linux, linux-kernel; +Cc: Andrew Morton



On 01/31/2014 09:47 AM, Igor Podlesny wrote:
> On 31 January 2014 00:58, Igor Podlesny <for.poige+linux@gmail.com> wrote:
> [...]
>>  While I'm thinking about moving the system back to XFS...
> 
>    Well, it helped just a bit. The whole picture remains, so it's
> not a Btrfs issue but seemingly a Linux VM one. The problem can be
> briefly described as "if allowed to swap (swappiness != 0), the VM
> would rather start swapping than reduce the cache, which holds
> ~ 25 % of RAM". Even more briefly, it's stated in the Subject.
> 
>    From the user's point of view it looks like the system is being
> heavily swapped (and can easily be misinterpreted as such), but
> actually most of the disk activity is constant _reading_ from the
> filesystem, not accessing the swap device.
> 
>    Should I file a bug report in the kernel's Bugzilla, or just
> upgrade the notebook? )
> 
If I remember correctly, there is a sysctl for configuring how
aggressively the system tries to retain the VFS cache; changing the
value there might improve things for you.
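
The knob meant here is presumably vm.vfs_cache_pressure (default 100;
higher values make the kernel reclaim dentry and inode caches more
aggressively) -- a sketch:

```shell
# Read the current value; the write is commented out since it needs root.
cat /proc/sys/vm/vfs_cache_pressure
# sysctl -w vm.vfs_cache_pressure=200   # reclaim VFS caches more eagerly
```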


* Re: That greedy Linux VM cache
  2014-01-31 16:57   ` Austin S. Hemmelgarn
@ 2014-01-31 17:08     ` Igor Podlesny
  0 siblings, 0 replies; 9+ messages in thread
From: Igor Podlesny @ 2014-01-31 17:08 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-kernel

[...]
On 1 February 2014 00:57, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
> If I remember correctly, there is a sysctl for configuring how
> aggressively the system tries to retain the VFS cache; changing the
> value there might improve things for you.

   Yeah, in theory. In practice I never saw a difference, even with
vm.vfs_cache_pressure set to 800000.

-- 
End of message. Next message?


* Re: That greedy Linux VM cache
  2014-01-30 16:58 That greedy Linux VM cache Igor Podlesny
  2014-01-30 17:06 ` David Lang
  2014-01-31 14:47 ` Igor Podlesny
@ 2014-01-31 18:25 ` Henrique de Moraes Holschuh
  2014-02-03 10:55 ` Michal Hocko
  3 siblings, 0 replies; 9+ messages in thread
From: Henrique de Moraes Holschuh @ 2014-01-31 18:25 UTC (permalink / raw)
  To: linux-kernel

On Fri, 31 Jan 2014, Igor Podlesny wrote:
>    Probably every Linux newcomer has concerns about low free memory
> and hears the explanation from Linux old-timers that there's actually
> plenty of it -- it's just cached, and when applications need it, it
> will be used on demand. I also thought so

Yeah right, we wish it would...

Anyway, maybe this helps?
http://thread.gmane.org/gmane.linux.kernel.mm/112554/focus=81834

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


* Re: That greedy Linux VM cache
  2014-01-30 16:58 That greedy Linux VM cache Igor Podlesny
                   ` (2 preceding siblings ...)
  2014-01-31 18:25 ` Henrique de Moraes Holschuh
@ 2014-02-03 10:55 ` Michal Hocko
  3 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2014-02-03 10:55 UTC (permalink / raw)
  To: Igor Podlesny; +Cc: linux-kernel, linux-mm

[Adding linux-mm to the CC]

On Fri 31-01-14 00:58:16, Igor Podlesny wrote:
>    Hello!
> 
>    Probably every Linux newcomer has concerns about low free memory
> and hears the explanation from Linux old-timers that there's actually
> plenty of it -- it's just cached, and when applications need it, it
> will be used on demand. I also thought so until recently, when I
> noticed that even when free memory is almost exhausted (~ 75 MiB),
> and processes are in sleep_on_page_killable, the

This means that the page has to be written back before it can be
dropped. How much dirty memory do you have (compared to the total size
of the page cache)?
What does your /proc/sys/vm/dirty_ratio say?
How fast is your storage?

Also, is this a 32-bit or 64-bit system?
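
All of the above can be read straight from /proc -- for example:

```shell
# Dirty and writeback pages vs. the total page cache, the dirty_ratio
# threshold, and the machine word size of the running kernel.
grep -E '^(Dirty|Writeback|Cached):' /proc/meminfo
cat /proc/sys/vm/dirty_ratio
uname -m   # x86_64 means a 64-bit kernel
```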

> cache is still around ~ 500 MiB and is not going to give back what
> it has gained. Naturally, vm.drop_caches = 3 doesn't squeeze it out
> either. That drama has been happening on a rather
> outdated-but-still-has-2-GiB-of-RAM notebook with kernels from 3.10
> through 3.12.9 (3.13 is the first release in a long time that simply
> freezes the notebook so cold that SysRq-B doesn't work, but that's
> another story). Everything RAM-demanding just crawls, the load
> average keeps climbing, and there's no paging out, but ongoing disk
> activity is mostly _read_ with a bit of write. If vm.swappiness is
> not 0, it swaps out, but not much; right now I ran Chromium (in
> addition to a long-running Firefox) and only 32 MiB went to swap,
> load avg. ~ 7
> 
>    Again: 25 % of RAM is reported (by top, free, and finally
> /proc/meminfo) to be cached, but it's kinda greedy.
> 
>    I came across a similar issue report:
> http://www.spinics.net/lists/linux-btrfs/msg11723.html but questions
> remain:
> 
>    * How to analyze it? slabtop doesn't show even 100 MiB of slab

Snapshotting /proc/meminfo and /proc/vmstat every second or two while
your load is bad might tell us more.
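
For example, a bounded sampling loop along these lines (the log file
name is arbitrary):

```shell
# Capture timestamped snapshots of meminfo and vmstat; three samples
# here, extend the loop while the load is bad.
for i in 1 2 3; do
    date +%s
    cat /proc/meminfo /proc/vmstat
    sleep 1
done > vm-snapshots.log
wc -l vm-snapshots.log
```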

>    * Why is that possible?

That is hard to tell without some numbers. But it might be possible
that you are seeing the same issue as reported and fixed here:
http://marc.info/?l=linux-kernel&m=139060103406327&w=2

Especially if you are using tmpfs (e.g. as the backing storage for /tmp)

>    * The system is on Btrfs but /home is on XFS, so the disk I/O
> might be related to text-segment paging? But anyway, this brings us
> back to the question: hey, there's 500 MiB free^Wcached.
> 
>    While I'm thinking about moving the system back to XFS...
> 
>    P. S. While writing this, ~ 100 MiB got swapped out, and the
> cache shrank(!) to 377 MiB; Firefox is mostly in "D" --
> sleep_on_page_killable, and so is Chrome; load avg. ~ 7. I had to
> close Skype to be able to finish this letter, and cached memory is
> now 439 MiB. :) I know it's time to upgrade, but hey, cached memory
> is free memory, right?
> 
> -- 
> End of message. Next message?

-- 
Michal Hocko
SUSE Labs


* Re: That greedy Linux VM cache
  2014-02-08 19:42 Igor Podlesny
@ 2014-02-10 13:33 ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2014-02-10 13:33 UTC (permalink / raw)
  To: Igor Podlesny; +Cc: linux-kernel, linux-mm

On Sun 09-02-14 03:42:52, Igor Podlesny wrote:
> On 3 February 2014 18:55, Michal Hocko <mhocko@suse.cz> wrote:
> > [Adding linux-mm to the CC]
> 
> [...]
> 
> > This means that the page has to be written back before it can be
> > dropped. How much dirty memory do you have (compared to the total
> > size of the page cache)?
> 
>    Not too much. Maybe you missed that part, but I said that the
> disk is being mostly READ, NOT written.
>    I also said that the READing comes from the system partition (it
> was Btrfs).
> 
> > What does your /proc/sys/vm/dirty_ratio say?
> 
>    10

With 2 GiB of RAM this shouldn't be a lot, and it definitely shouldn't
be a problem with an SSD.
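
Rough arithmetic (assuming dirty_ratio applies to the full 2 GiB,
which slightly overestimates the real limit based on reclaimable
memory):

```shell
# Upper bound on dirty page cache: dirty_ratio percent of RAM, in MiB.
ram_mib=$((2 * 1024))
dirty_ratio=10
echo $((ram_mib * dirty_ratio / 100))   # prints 204
```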

> > How fast is your storage?
> 
>    It was a 5400 rpm HDD; today I installed an SSD.
> 
> > Also, is this 32b or 64b system?
> 
>    The kernel is x86_64 (or sometimes 32-bit), userspace is 32-bit
> -- a full x86_64 setup is simply not usable on 2 GiB,

Which is unexpected on its own. We have many systems with comparable
and much less memory running just fine. You haven't posted any numbers
yet, so it is still not clear where the bottleneck on your system is.

> you can run just one program, like in the MS-DOS era. :) (I'd give
> x32 a try, but alas, it's not really ready yet.)
> 
> >>    * How to analyze it? slabtop doesn't show even 100 MiB of slab
> >
> > Snapshotting /proc/meminfo and /proc/vmstat every second or two
> > while your load is bad might tell us more.
> >
> >>    * Why that's possible?
> >
> > That is hard to tell without some numbers. But it might be possible that
> > you are seeing the same issue as reported and fixed here:
> > http://marc.info/?l=linux-kernel&m=139060103406327&w=2
> 
>    No, there's no such amount of dirty data.

OK, then I would check whether this is fs-related. You said that
you've tried XFS or something else, with similar results?

> > Especially if you are using tmpfs (e.g. as the backing storage for /tmp)
> 
>    I use it, yeah, but it has ridiculously little occupied space, ~ 1-2 MiB.
> 
>    *** Okay, so as I said, I decided to try an SSD. The issue stays
> absolutely the same and shows even more clearly: when swappiness is
> 0, Btrfs-endio is heating up the processor, constantly taking almost
> all CPU resources (the storage is fast, the CPU is saturated), but
> when I set it higher, thus allowing swapping, it helps -- ~ 250 MiB
> got swapped out (quickly -- SSD rules) and the system became
> responsive again. As before, it didn't try to reduce the cache at
> all. I never saw the cache go even as low as 250 MiB, always higher
> (~ 25 % of RAM). So it's actually better to use swappiness = 100 in
> these circumstances.

Hmm, so swapping is fast enough while the page cache backed by the
storage is slow. I guess both the swap partition and the fs are backed
by the same storage, right?
Do you have sufficient free space on the filesystem?
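
Both are quick to check -- for example:

```shell
# Where does swap live, and how full is the root filesystem?
cat /proc/swaps   # header line plus one line per active swap device
df -h /           # free space on the root filesystem
```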

>    I think the problem should be easily reproducible -- the kernel
> allows you to limit available RAM. ;)
> 
>    P. S. The only theory left is the "Intel Corporation Mobile
> GM965/GL960 Integrated Graphics Controller" with the i915 kernel
> module. I don't know much about it, but it should have bitten off a
> good part of system RAM, right?

How much memory? I vaguely remember that i915 had very aggressive
reclaim logic which led to some stalls during reclaim, but I cannot
seem to find a reference right now.

Btw. Are you using vanilla kernel?

> Since it's Ubuntu, compiz runs by
> default, and pmap -d `pgrep compiz` shows lots of similar lines:

It would be good to reduce problem space by disabling compiz.
 
> ...
> e0344000      20 rw-s- 0000000102e33000 000:00005 card0
> e0479000      56 rw-s- 0000000102bf4000 000:00005 card0
> e0487000      48 rw-s- 0000000102be8000 000:00005 card0
> e0493000      56 rw-s- 0000000102bda000 000:00005 card0
> e04a1000      56 rw-s- 0000000102bcc000 000:00005 card0
> e04af000      48 rw-s- 0000000102bc0000 000:00005 card0
> e04bb000      56 rw-s- 0000000102bb2000 000:00005 card0
> e04c9000      48 rw-s- 0000000102d64000 000:00005 card0
> e04d5000     192 rw-s- 0000000102ce5000 000:00005 card0
> e0505000      80 rw-s- 0000000102de7000 000:00005 card0
> e0519000      20 rw-s- 0000000102ccc000 000:00005 card0
> e051e000     160 rw-s- 0000000102ca4000 000:00005 card0
> e0546000      20 rw-s- 0000000102c9f000 000:00005 card0
> e054b000      48 rw-s- 0000000102c93000 000:00005 card0
> e0557000      20 rw-s- 0000000102c8e000 000:00005 card0
> e055c000      20 rw-s- 0000000102c89000 000:00005 card0
> ...
> 
>    I have a suspicion... (I also dislike the sizes of those mappings)

The mappings do not seem to be too big (the biggest one is only 160 KiB)...

> ... that a sizable amount of that "cached memory" can be related to
> this i915. How can I check it?...

I am not sure I understand what you are asking about.
-- 
Michal Hocko
SUSE Labs


* Re: That greedy Linux VM cache
@ 2014-02-08 19:42 Igor Podlesny
  2014-02-10 13:33 ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Igor Podlesny @ 2014-02-08 19:42 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-kernel, linux-mm

On 3 February 2014 18:55, Michal Hocko <mhocko@suse.cz> wrote:
> [Adding linux-mm to the CC]

[...]

> This means that the page has to be written back before it can be
> dropped. How much dirty memory do you have (compared to the total
> size of the page cache)?

   Not too much. Maybe you missed that part, but I said that the disk
is being mostly READ, NOT written.
   I also said that the READing comes from the system partition (it
was Btrfs).

> What does your /proc/sys/vm/dirty_ratio say?

   10

> How fast is your storage?

   It was a 5400 rpm HDD; today I installed an SSD.

> Also, is this 32b or 64b system?

   The kernel is x86_64 (or sometimes 32-bit), userspace is 32-bit --
a full x86_64 setup is simply not usable on 2 GiB,
you can run just one program, like in the MS-DOS era. :) (I'd give
x32 a try, but alas, it's not really ready yet.)

>>    * How to analyze it? slabtop doesn't show even 100 MiB of slab
>
> Snapshotting /proc/meminfo and /proc/vmstat every second or two while
> your load is bad might tell us more.
>
>>    * Why that's possible?
>
> That is hard to tell without some numbers. But it might be possible that
> you are seeing the same issue as reported and fixed here:
> http://marc.info/?l=linux-kernel&m=139060103406327&w=2

   No, there's no such amount of dirty data.

> Especially if you are using tmpfs (e.g. as the backing storage for /tmp)

   I use it, yeah, but it has ridiculously little occupied space, ~ 1-2 MiB.

   *** Okay, so as I said, I decided to try an SSD. The issue stays
absolutely the same and shows even more clearly: when swappiness is 0,
Btrfs-endio is heating up the processor, constantly taking almost all
CPU resources (the storage is fast, the CPU is saturated), but when I
set it higher, thus allowing swapping, it helps -- ~ 250 MiB got
swapped out (quickly -- SSD rules) and the system became responsive
again. As before, it didn't try to reduce the cache at all. I never
saw the cache go even as low as 250 MiB, always higher (~ 25 % of
RAM). So it's actually better to use swappiness = 100 in these
circumstances.
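
The runtime change is a single sysctl (a sketch; the write needs root
and does not persist across reboots unless added to /etc/sysctl.conf):

```shell
cat /proc/sys/vm/swappiness     # current value; the usual default is 60
# sysctl -w vm.swappiness=100   # prefer swapping anon pages over
#                               # shrinking the page cache (root)
```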

   I think the problem should be easily reproducible -- the kernel
allows you to limit available RAM. ;)
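
For instance, via the mem= boot parameter -- a hypothetical GRUB line
(adjust to taste, run update-grub, and reboot):

```shell
# /etc/default/grub -- cap usable RAM at 2 GiB to reproduce the setup.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mem=2G"
```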

   P. S. The only theory left is the "Intel Corporation Mobile
GM965/GL960 Integrated Graphics Controller" with the i915 kernel
module. I don't know much about it, but it should have bitten off a
good part of system RAM, right? Since it's Ubuntu, compiz runs by
default, and pmap -d `pgrep compiz` shows lots of similar lines:

...
e0344000      20 rw-s- 0000000102e33000 000:00005 card0
e0479000      56 rw-s- 0000000102bf4000 000:00005 card0
e0487000      48 rw-s- 0000000102be8000 000:00005 card0
e0493000      56 rw-s- 0000000102bda000 000:00005 card0
e04a1000      56 rw-s- 0000000102bcc000 000:00005 card0
e04af000      48 rw-s- 0000000102bc0000 000:00005 card0
e04bb000      56 rw-s- 0000000102bb2000 000:00005 card0
e04c9000      48 rw-s- 0000000102d64000 000:00005 card0
e04d5000     192 rw-s- 0000000102ce5000 000:00005 card0
e0505000      80 rw-s- 0000000102de7000 000:00005 card0
e0519000      20 rw-s- 0000000102ccc000 000:00005 card0
e051e000     160 rw-s- 0000000102ca4000 000:00005 card0
e0546000      20 rw-s- 0000000102c9f000 000:00005 card0
e054b000      48 rw-s- 0000000102c93000 000:00005 card0
e0557000      20 rw-s- 0000000102c8e000 000:00005 card0
e055c000      20 rw-s- 0000000102c89000 000:00005 card0
...

   I have a suspicion... (I also dislike the sizes of those mappings)
... that a sizable amount of that "cached memory" can be related to
this i915. How can I check it?...
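
One way to check, assuming debugfs is mounted and this kernel exposes
the i915 GEM statistics there (the exact path varies by kernel
version):

```shell
# Total memory pinned in i915 GEM objects, if the debugfs file is
# readable (typically needs root).
path=/sys/kernel/debug/dri/0/i915_gem_objects
if [ -r "$path" ]; then
    cat "$path"
else
    echo "i915 debugfs info not readable here (needs root and debugfs)"
fi
```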

-- 
End of message. Next message?



Thread overview: 9+ messages
2014-01-30 16:58 That greedy Linux VM cache Igor Podlesny
2014-01-30 17:06 ` David Lang
2014-01-31 14:47 ` Igor Podlesny
2014-01-31 16:57   ` Austin S. Hemmelgarn
2014-01-31 17:08     ` Igor Podlesny
2014-01-31 18:25 ` Henrique de Moraes Holschuh
2014-02-03 10:55 ` Michal Hocko
2014-02-08 19:42 Igor Podlesny
2014-02-10 13:33 ` Michal Hocko
