From: Waiman Long <longman@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Jonathan Corbet <corbet@lwn.net>,
"Luis R. Rodriguez" <mcgrof@kernel.org>,
Kees Cook <keescook@chromium.org>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-doc@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Jan Kara <jack@suse.cz>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@kernel.org>,
Miklos Szeredi <mszeredi@redhat.com>,
Matthew Wilcox <willy@infradead.org>,
Larry Woodman <lwoodman@redhat.com>,
James Bottomley <James.Bottomley@HansenPartnership.com>,
"Wangkai (Kevin C)" <wangkai86@huawei.com>
Subject: Re: [PATCH v6 0/7] fs/dcache: Track & limit # of negative dentries
Date: Tue, 10 Jul 2018 12:09:17 -0400
Message-ID: <a2794bcc-9193-cbca-3a54-47420a2ab52c@redhat.com>
In-Reply-To: <20180710142740.GQ14284@dhcp22.suse.cz>
On 07/10/2018 10:27 AM, Michal Hocko wrote:
> On Mon 09-07-18 12:01:04, Waiman Long wrote:
>> On 07/09/2018 04:19 AM, Michal Hocko wrote:
> [...]
>>> later needs a special treatment while the first one is ok? There are
>>> quite some resources which allow a non privileged user to consume a lot
>>> of memory and the memory controller is the only reliable way to mitigate
>>> the risk.
>> Yes, memory controller is the only reliable way to mitigate the risk,
>> but not all tasks are under the control of a memory controller with
>> kernel memory limit.
> But those which you do not trust should. So why do we need yet another
> mechanism for the reclaim?
Sometimes the cause is simply a programming error. I have seen a
customer report where a bug in their code generated a large number of
negative dentries and caused problems. In such a controlled environment,
they may not want to run their applications under a memory cgroup
because of the overhead involved. So a mechanism to highlight the
problem and notify the administrator is probably good to have.
>
> [...]
>>>> Patch 1 tracks the number of negative dentries present in the LRU
>>>> lists and reports it in /proc/sys/fs/dentry-state.
>>> If anything I _think_ vmstat would benefit from this because behavior of
>>> the memory reclaim does depend on the amount of neg. dentries.
>>>
>>>> Patch 2 adds a "neg-dentry-pc" sysctl parameter that can be used to
>>>> specify a soft limit on the number of negative dentries allowed as a percentage
>>>> of total system memory. This parameter is 0 by default which means no
>>>> negative dentry limiting will be performed.
>>> percentage has turned out to be a really wrong unit for many tunables
>>> over time. Even 1% can be just too much on really large machines.
>> Yes, that is true. Do you have any suggestion of what kind of unit
>> should be used? I can scale down the unit to 0.1% of the system memory.
>> Alternatively, one unit can be 10k/cpu thread, so a 20-thread system
>> corresponds to 200k, etc.
> I simply think this is a strange user interface. How much is a
> reasonable number? How can any admin figure that out?
Without the optional enforcement, the limit is essentially just a
notification mechanism: the system signals that something is going
wrong and that the system administrator needs to take a look. So it
is perfectly OK for the limit to be set high enough that we normally
won't come anywhere near that many negative dentries. The goal is to
prevent negative dentries from consuming a significant portion of the
system memory.
I am going to reduce the granularity of each unit to 1/1000 of the total
system memory so that on large systems with TBs of memory, a smaller
amount of memory can be specified.
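To make the arithmetic behind the two candidate units concrete, here is a
rough userspace sketch. It is not code from the patch: both helper names
are made up for illustration, and it only computes the budget implied by
a given setting.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): memory budget in bytes for a
 * limit expressed in units of 1/1000 of total system memory.
 */
static uint64_t neg_dentry_budget_bytes(uint64_t total_mem_bytes,
					unsigned int permille)
{
	assert(permille <= 1000);
	/* Divide first so very large memory sizes cannot overflow. */
	return (total_mem_bytes / 1000) * permille;
}

/*
 * Hypothetical helper for the alternative unit mentioned earlier in the
 * thread: one unit is 10k negative dentries per CPU thread.
 */
static uint64_t neg_dentry_limit_per_cpu(unsigned int nr_cpu_threads,
					 unsigned int units)
{
	return (uint64_t)nr_cpu_threads * units * 10000;
}
```

On a 1 TiB machine, one 1/1000 unit works out to roughly 1.1 GB, compared
with roughly 11 GB for one unit at 1% granularity. With the per-thread
unit, a 20-thread system at one unit gets a limit of 200k dentries.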
>>>> Patch 3 enables automatic pruning of least recently used negative
>>>> dentries when the total number is close to the preset limit.
>>> Please explain why this cannot be done in a standard dcache shrinking
>>> way. I strongly suspect that you are developing yet another reclaim with
>>> its own sets of tunable and bypassing the existing infrastructure. I
>>> haven't read patches yet but the cover letter doesn't really explain
>>> design much so I am only guessing.
>> The standard dcache shrinking happens when the system is almost running
>> out of free memory.
> Well, the standard reclaim happens when somebody needs memory. We are
> usually quite far away from "almost running out of memory". We do
> reclaim fs metadata including dentries so I really do not see why
> negative ones should be any special here.
That is fine. I can certainly live without the new reclaim mechanism.
>
>> This new shrinker will be turned on when the number
>> of negative dentries is close to the limit even when there is still
>> plenty of free memory left. It will stop when the number of negative
>> dentries is lowered to a safe level. The new shrinker is designed to
>> impose as little overhead as possible on the currently running tasks.
>> That is not true for the standard shrinker which will have a rather
>> significant performance impact on the currently running tasks.
> Do you have any numbers to back your claim? The memory reclaim is
> usually quite lightweight. Especially when we have a lot of clean
> fs {meta}data
In the case of dentries, it is the lock hold time of the LRU list that
can impact normal filesystem operations. The new shrinker that I added
purposely limits the lock hold time, whereas the standard shrinker can
hold the LRU lock for quite a long time if there are a lot of dentries
to get rid of. I have some performance numbers about this in the cover
letter of this patch series.
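As an illustration of the general idea only, one common way to bound lock
hold time is to detach a small batch of entries under the lock and free
them after dropping it. The userspace sketch below assumes that approach;
it is not the code in the patch, and the toy_* names and the batch
parameter are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for an LRU list; none of these names come from fs/dcache.c. */
struct toy_node {
	struct toy_node *next;
};

struct toy_lru {
	pthread_mutex_t lock;
	struct toy_node *head;	/* least recently used entry first */
	size_t nr_items;
};

/*
 * Free up to nr_to_free entries while holding the lock for at most
 * "batch" entries at a time. Returns the number of entries freed.
 */
static size_t toy_lru_prune(struct toy_lru *lru, size_t nr_to_free,
			    size_t batch)
{
	size_t freed = 0;

	assert(batch > 0);

	while (freed < nr_to_free) {
		struct toy_node *victims = NULL;
		size_t n = 0;

		pthread_mutex_lock(&lru->lock);
		/* Detach a bounded number of entries under the lock. */
		while (lru->head && n < batch && freed + n < nr_to_free) {
			struct toy_node *node = lru->head;

			lru->head = node->next;
			node->next = victims;
			victims = node;
			n++;
		}
		lru->nr_items -= n;
		pthread_mutex_unlock(&lru->lock);

		if (n == 0)
			break;	/* the list is empty */

		/* Free outside the lock so other users can make progress. */
		while (victims) {
			struct toy_node *next = victims->next;

			free(victims);
			victims = next;
		}
		freed += n;
	}
	return freed;
}
```

The trade-off is more lock acquisitions in exchange for a shorter worst
case hold time per acquisition, so other tasks that need the list wait
less each time.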
>> I can remove the new shrinker if people really don't want to add a new
>> one as long as I can keep the option to kill off newly created negative
>> dentries when the limit is exceeded.
> Please let's not add yet another memory reclaim mechanism. It will just
> backfire sooner or later.
As said above, I am going to remove the new shrinker in the next version
of the patch. We can always add it back later on if we feel there is a
need to do it.
Cheers,
Longman