From: "Nag Avadhanam (nag)" <nag@cisco.com>
To: "Theodore Ts'o" <tytso@mit.edu>, "Daniel Walker (danielwa)" <danielwa@cisco.com>
Cc: Dave Chinner <david@fromorbit.com>, Alexander Viro <viro@zeniv.linux.org.uk>, "Khalid Mughal (khalidm)" <khalidm@cisco.com>, "xe-kernel@external.cisco.com" <xe-kernel@external.cisco.com>, "dave.hansen@intel.com" <dave.hansen@intel.com>, "hannes@cmpxchg.org" <hannes@cmpxchg.org>, "riel@redhat.com" <riel@redhat.com>, Jonathan Corbet <corbet@lwn.net>, "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>, "linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH] kernel: fs: drop_caches: add dds drop_caches_count
Date: Tue, 16 Feb 2016 02:58:04 +0000
Message-ID: <D2E7B337.D5404%nag@cisco.com>
In-Reply-To: <20160216004531.GA28260@thunk.org>

We have a class of platforms that are essentially swap-less embedded systems with limited memory resources (2GB or less).

There is a need to implement early alerts (before the OOM killer kicks in) based on current memory usage, so admins can take appropriate steps (do not initiate provisioning operations but support existing services, de-provision certain services, etc., depending on the extent of memory usage in the system).

There is also a general need to let end users know the available memory so they can determine whether they can enable new services (this helps in planning).

Both depend on knowing the approximate memory usage within the system (accurate to within a few tens of MB). We want to alert admins before the system exhibits any thrashing behavior.

We find the source of the accounting anomalies to be the page cache accounting. Anonymous page accounting is fine. Page cache usage on our system can be attributed to three things: the file system cache, the shared memory store (non-reclaimable) and the in-memory file systems (non-reclaimable).
We know the sizes of the shared memory stores and of the in-memory file systems. If we can determine the amount of reclaimable file system cache (+/- a few tens of MB), we can improve the serviceability of these systems.

Total - (# of bytes of anon pages + # of bytes of shared memory/tmpfs pages + # of bytes of non-reclaimable file system cache pages)

gives us a measure of the available memory.

It's the calculation of the # of bytes of non-reclaimable file system cache pages that has been troubling us. We do not want to count inactive file pages (of programs/binaries) that were once mapped by any process in the system as reclaimable, because that might lead to thrashing under memory pressure (we want to alert admins before the system starts dropping text pages).

From our experiments, we determined that running a VM scan looking for droppable pages came close to establishing that number.

If there are cheaper ways of determining this stat, please let us know.

Thanks,
nag

On 2/15/16, 4:45 PM, "Theodore Ts'o" <tytso@mit.edu> wrote:

>On Mon, Feb 15, 2016 at 03:52:31PM -0800, Daniel Walker wrote:
>> >>We need it to determine accurately what the free memory in the
>> >>system is. If you know where we can get this information already
>> >>please tell, we aren't aware of it. For instance /proc/meminfo isn't
>> >>accurate enough.
>>
>> Approximate point-in-time indication is an accurate characterization
>> of what we are doing. This is good enough for us. NO matter what we
>> do, we are never going to be able to address the "time of check to
>> time of use" window. But, this approximation works reasonably well
>> for our use case.
>
>Why do you need such accuracy, and what do you consider "good enough".
>Having something which iterates over all of the inodes in the system
>is something that really shouldn't be in a general production kernel
>At the very least it should only be accessible by root (so now only a
>careless system administrator can DOS attack the system) but the
>Dave's original question still stands. Why do you need a certain
>level of accuracy regarding how much memory is available after
>dropping all of the caches? What problem are you trying to
>solve/avoid?
>
>It may be that you are going about things completely the wrong way,
>which is why understanding the higher order problem you are trying to
>solve might be helpful in finding something which is safer,
>architecturally cleaner, and something that could go into the upstream
>kernel.
>
>Cheers,
>
>					- Ted
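
As a rough illustration of the estimate described in the message above (Total minus anon, shared memory/tmpfs and non-reclaimable file system cache), here is a minimal Python sketch. It is not the method used by the patch under discussion. Assumptions: the /proc/meminfo fields MemTotal, AnonPages and Shmem are used as stand-ins for "Total", "anon pages" and "shared memory/tmpfs pages"; the non-reclaimable file cache figure has no direct counter here and is simply passed in by the caller (for example, from a scan like the one the message mentions); the helper names and the 20%/10% alert thresholds are hypothetical.

```python
MIB = 1024 * 1024


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in bytes.

    Lines look like "MemTotal:  2097152 kB"; values with a "kB" unit
    are converted to bytes, unit-less values are kept as-is.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        fields[name.strip()] = value
    return fields


def estimate_available(total, anon, shmem, nonreclaimable_file):
    """Total - (anon + shared memory/tmpfs + non-reclaimable file cache).

    All arguments and the result are in bytes.
    """
    return total - (anon + shmem + nonreclaimable_file)


def alert_level(available, total, warn_fraction=0.20, critical_fraction=0.10):
    """Map the available fraction to an alert level.

    The thresholds are hypothetical placeholders, not values from the thread.
    """
    fraction = available / total
    if fraction < critical_fraction:
        return "critical"
    if fraction < warn_fraction:
        return "warn"
    return "ok"


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    # Hypothetical input: the caller must supply this figure from elsewhere.
    nonreclaimable_file = 0
    avail = estimate_available(
        info["MemTotal"], info["AnonPages"], info["Shmem"], nonreclaimable_file
    )
    print(avail // MIB, "MiB,", alert_level(avail, info["MemTotal"]))
```

Like any such reading, the result is only a point-in-time approximation: the counters can change between the moment they are read and the moment an admin acts on the alert.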