From: Daniel Walker <danielwa@cisco.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Khalid Mughal <khalidm@cisco.com>,
	xe-kernel@external.cisco.com, dave.hansen@intel.com,
	hannes@cmpxchg.org, riel@redhat.com,
	Jonathan Corbet <corbet@lwn.net>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	"Nag Avadhanam (nag)" <nag@cisco.com>
Subject: Re: [PATCH] kernel: fs: drop_caches: add dds drop_caches_count
Date: Mon, 15 Feb 2016 15:52:31 -0800
Message-ID: <56C264BF.3090100@cisco.com>
In-Reply-To: <20160215230511.GU19486@dastard>

On 02/15/2016 03:05 PM, Dave Chinner wrote:
> On Mon, Feb 15, 2016 at 10:19:54AM -0800, Daniel Walker wrote:
>> On 02/14/2016 01:18 PM, Dave Chinner wrote:
>>> On Fri, Feb 12, 2016 at 12:14:39PM -0800, Daniel Walker wrote:
>>>> From: Khalid Mughal <khalidm@cisco.com>
>>>>
>>>> Currently there is no way to figure out the droppable pagecache size
>>>> from the meminfo output. The MemFree size can shrink during normal
>>>> system operation, when some of the memory pages get cached and is
>>>> reflected in "Cached" field. Similarly for file operations some of
>>>> the buffer memory gets cached and it is reflected in "Buffers" field.
>>>> The kernel automatically reclaims all this cached & buffered memory,
>>>> when it is needed elsewhere on the system. The only way to manually
>>>> reclaim this memory is by writing 1 to /proc/sys/vm/drop_caches. But
>>>> this can have performance impact. Since it discards cached objects,
>>>> it may cause high CPU & I/O utilization to recreate the dropped
>>>> objects during heavy system load.
>>>> This patch computes the droppable pagecache count, using same
>>>> algorithm as "vm/drop_caches". It is non-destructive and does not
>>>> drop any pages. Therefore it does not have any impact on system
>>>> performance. The computation does not include the size of
>>>> reclaimable slab.
>>> Why, exactly, do you need this? You've described what the patch
>>> does (i.e. redundant, because we can read the code), and described
>>> that the kernel already accounts this reclaimable memory elsewhere
>>> and you can already read that and infer the amount of reclaimable
>>> memory from it. So why isn't that accounting sufficient?
>> We need it to determine accurately what the free memory in the
>> system is. If you know where we can get this information already
>> please tell, we aren't aware of it. For instance /proc/meminfo isn't
>> accurate enough.
> What you are proposing isn't accurate, either, because it will be
> stale by the time the inode cache traversal is completed and the
> count returned to userspace. e.g. pages that have already been
> accounted as droppable can be reclaimed or marked dirty and hence
> "unreclaimable".
>
> IOWs, the best you are going to get is an approximate point-in-time
> indication of how much memory is available for immediate reclaim.
> We're never going to get an accurate measure in userspace unless we
> accurately account for it in the kernel itself. Which, I think it
> has already been pointed out, is prohibitively expensive so isn't
> done.
>
> As for a replacement, looking at what pages you consider "droppable"
> is really only file pages that are not under dirty or under
> writeback. i.e. from /proc/meminfo:
>
> Active(file):     220128 kB
> Inactive(file):    60232 kB
> Dirty:                 0 kB
> Writeback:             0 kB
>
> i.e. reclaimable file cache = Active + inactive - dirty - writeback.
>
> And while you are there, when you drop slab caches:
>
> SReclaimable:      66632 kB
>
> some amount of that may be freed. No guarantees can be made about
> the amount, though.
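
For reference, the estimate suggested above can be written as a short sketch. This is only an illustration of the formula "Active + inactive - dirty - writeback", not code from the patch: the helper names `parse_meminfo` and `reclaimable_file_cache_kb` are hypothetical, and the sample text is the set of values quoted above. On a live system the same text would come from reading /proc/meminfo, and the result is stale as soon as it is computed.

```python
import re

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*([^:]+):\s+(\d+)(?:\s+kB)?\s*$", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

def reclaimable_file_cache_kb(fields):
    """Point-in-time estimate: Active(file) + Inactive(file) - Dirty - Writeback."""
    return (fields["Active(file)"] + fields["Inactive(file)"]
            - fields.get("Dirty", 0) - fields.get("Writeback", 0))

# Values quoted in the message above (hypothetical stand-in for /proc/meminfo).
SAMPLE = """\
Active(file):     220128 kB
Inactive(file):    60232 kB
Dirty:                 0 kB
Writeback:             0 kB
"""

estimate_kb = reclaimable_file_cache_kb(parse_meminfo(SAMPLE))
print(estimate_kb)
```

To try it against a running kernel, pass the contents of /proc/meminfo to `parse_meminfo` instead of `SAMPLE`.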

I got this response from another engineer here at Cisco (Nag, who is
also CC'd):

"

Approximate point-in-time indication is an accurate characterization of what we are doing. This is good enough for us. No matter what we do, we are never going to be able to address the "time of check to time of use" window. But this approximation works reasonably well for our use case.

As to his other suggestion of estimating the droppable cache, I have considered it but found it unusable. The problem is that the inactive file pages count includes a whole lot more pages than the droppable pages.

See the values of these before and after dropping reclaimable pages.

Before:

Active(file):     183488 kB
Inactive(file):   180504 kB

After (dropping caches):
Active(file):      89468 kB
Inactive(file):    32016 kB

The dirty and writeback counts are mostly 0 kB under our workload, as we
are mostly dealing with the read-only file pages of binaries
(programs/libraries).
"
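
The arithmetic behind that objection can be checked with a small sketch. It uses only the four values quoted above and assumes Dirty and Writeback are 0 kB, as stated for this workload; the function and variable names are hypothetical. It compares what the "Active + inactive" estimate would have predicted with what the file LRU totals show was actually dropped.

```python
def file_lru_kb(active_kb, inactive_kb):
    """Total file-backed LRU memory in kB (Active(file) + Inactive(file))."""
    return active_kb + inactive_kb

# Values quoted in the message above.
before_kb = file_lru_kb(183488, 180504)
after_kb = file_lru_kb(89468, 32016)

# With Dirty and Writeback assumed to be 0 kB, the meminfo-based estimate
# of droppable memory equals the "before" total.
estimated_droppable_kb = before_kb

# What the drop actually removed from the file LRU lists.
actually_dropped_kb = before_kb - after_kb

# How far the estimate overshoots.
overestimate_kb = estimated_droppable_kb - actually_dropped_kb

print(estimated_droppable_kb, actually_dropped_kb, overestimate_kb)
```

In this sample the overshoot equals the file pages that remained after the drop, about a third of the "before" total.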
