From: "Nag Avadhanam (nag)" <nag@cisco.com>
To: "Theodore Ts'o" <tytso@mit.edu>,
	"Daniel Walker (danielwa)" <danielwa@cisco.com>
Cc: Dave Chinner <david@fromorbit.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"Khalid Mughal (khalidm)" <khalidm@cisco.com>,
	"xe-kernel@external.cisco.com" <xe-kernel@external.cisco.com>,
	"dave.hansen@intel.com" <dave.hansen@intel.com>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"riel@redhat.com" <riel@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH] kernel: fs: drop_caches: add dds drop_caches_count
Date: Tue, 16 Feb 2016 02:58:04 +0000	[thread overview]
Message-ID: <D2E7B337.D5404%nag@cisco.com> (raw)
In-Reply-To: <20160216004531.GA28260@thunk.org>

We have a class of platforms that are essentially swap-less embedded
systems that have limited memory resources (2GB and less).

There is a need to implement early alerts (before the OOM killer kicks in)
based on current memory usage, so admins can take appropriate steps (do
not initiate provisioning operations but continue supporting existing
services, de-provision certain services, etc., depending on the extent of
memory usage in the system).

There is also a general need to let end users know the available memory so
they can determine whether they can enable new services (this helps in
planning).

Both of these depend upon knowing the approximate memory usage of the
system (accurate to within a few tens of MB). We want to alert admins
before the system exhibits any thrashing behavior.

We find the source of the accounting anomalies to be the page cache
accounting. Anonymous page accounting is fine. Page cache usage on our
system can be attributed to these: the file system cache, the shared
memory store (non-reclaimable), and the in-memory file systems
(non-reclaimable). We know the sizes of the shared memory stores and of
the in-memory file systems.

If we can determine the amount of reclaimable file system cache (to within
a few tens of MB), we can improve the serviceability of these systems.
 
Total - (# of bytes of anon pages + # of bytes of shared memory/tmpfs
pages + # of bytes of non-reclaimable file system cache pages) gives us a
measure of the available memory.
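
For illustration, that subtraction can be sketched in user space by
parsing /proc/meminfo. This is a minimal sketch, not the proposed patch:
it assumes the non-reclaimable file system cache figure is supplied by the
caller, since that missing number is exactly what this thread is about,
and the function name and field choices (AnonPages, Shmem) are mine, not
from the patch:

```python
def estimate_available_kb(meminfo_text, nonreclaimable_cache_kb):
    """Estimate available memory in kB.

    meminfo_text: the contents of /proc/meminfo.
    nonreclaimable_cache_kb: file system cache (in kB) known to be
    non-reclaimable; the kernel does not export this directly, which is
    the gap discussed in this thread.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name] = int(parts[0])  # /proc/meminfo values are in kB

    anon_kb = fields.get("AnonPages", 0)
    shmem_kb = fields.get("Shmem", 0)  # shared memory + tmpfs pages
    return fields["MemTotal"] - (anon_kb + shmem_kb + nonreclaimable_cache_kb)


# Usage on a live system (Linux only):
#   with open("/proc/meminfo") as f:
#       print(estimate_available_kb(f.read(), nonreclaimable_cache_kb=0))
```

With nonreclaimable_cache_kb set to 0 this over-estimates available
memory, because it treats all remaining file cache as reclaimable; the
whole point of the patch under discussion is obtaining a better value for
that term.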


It's the calculation of the number of bytes of non-reclaimable file system
cache pages that has been troubling us. We do not want to count inactive
file pages (of programs/binaries) that were once mapped by any process in
the system as reclaimable, because reclaiming them might lead to thrashing
under memory pressure (we want to alert admins before the system starts
dropping text pages).

From our experiments, we determined that running a VM scan looking for
droppable pages came close to establishing that number. If there are
cheaper ways of determining this statistic, please let us know.

Thanks,
nag 


On 2/15/16, 4:45 PM, "Theodore Ts'o" <tytso@mit.edu> wrote:

>On Mon, Feb 15, 2016 at 03:52:31PM -0800, Daniel Walker wrote:
>> >>We need it to determine accurately what the free memory in the
>> >>system is. If you know where we can get this information already
>> >>please tell, we aren't aware of it. For instance /proc/meminfo isn't
>> >>accurate enough.
>> 
>> Approximate point-in-time indication is an accurate characterization
>> of what we are doing. This is good enough for us. NO matter what we
>> do, we are never going to be able to address the "time of check to
>> time of use² window.  But, this approximation works reasonably well
>> for our use case.
>
>Why do you need such accuracy, and what do you consider "good enough"?
>Having something which iterates over all of the inodes in the system
>is something that really shouldn't be in a general production kernel.
>At the very least it should only be accessible by root (so now only a
>careless system administrator can DOS attack the system), but
>Dave's original question still stands.  Why do you need a certain
>level of accuracy regarding how much memory is available after
>dropping all of the caches?  What problem are you trying to
>solve/avoid?
>
>It may be that you are going about things completely the wrong way,
>which is why understanding the higher order problem you are trying to
>solve might be helpful in finding something which is safer,
>architecturally cleaner, and something that could go into the upstream
>kernel.
>
>Cheers,
>
>						- Ted

Thread overview: 34+ messages
2016-02-12 20:14 [PATCH] kernel: fs: drop_caches: add dds drop_caches_count Daniel Walker
2016-02-14 21:18 ` Dave Chinner
2016-02-15 18:19   ` Daniel Walker
2016-02-15 23:05     ` Dave Chinner
2016-02-15 23:52       ` Daniel Walker
2016-02-16  0:45         ` Theodore Ts'o
2016-02-16  2:58           ` Nag Avadhanam (nag) [this message]
2016-02-16  5:38             ` Dave Chinner
2016-02-16  7:14               ` Nag Avadhanam
2016-02-16  8:35                 ` Dave Chinner
2016-02-16  8:43             ` Vladimir Davydov
2016-02-16 18:37               ` Nag Avadhanam
2016-02-16  5:28         ` Dave Chinner
2016-02-16  5:57           ` Nag Avadhanam
2016-02-16  8:22             ` Dave Chinner
2016-02-16 16:12           ` Rik van Riel