From: Michal Hocko <mhocko@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Waiman Long <longman@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Jonathan Corbet <corbet@lwn.net>,
	"Luis R. Rodriguez" <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	Jan Kara <jack@suse.cz>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@kernel.org>,
	Miklos Szeredi <mszeredi@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	"Wangkai (Kevin,C)" <wangkai86@huawei.com>
Subject: Re: [PATCH v6 0/7] fs/dcache: Track & limit # of negative dentries
Date: Thu, 19 Jul 2018 10:45:38 +0200
Message-ID: <20180719084538.GP7193@dhcp22.suse.cz> (raw)
In-Reply-To: <20180719003329.GD19934@dastard>

On Thu 19-07-18 10:33:29, Dave Chinner wrote:
> On Tue, Jul 17, 2018 at 10:33:26AM +0200, Michal Hocko wrote:
> > On Mon 16-07-18 16:40:32, Andrew Morton wrote:
> > > On Mon, 16 Jul 2018 05:41:15 -0700 Matthew Wilcox <willy@infradead.org> wrote:
> > > It's quite a small code change and would provide a mechanism for
> > > implementing the hammer-cache-until-you've-freed-enough design above.
> > >
> > > Aside 2: if we *do* do something like the above __d_alloc() pseudo code
> > > then perhaps it could be cast in terms of pages, not dentries. ie,
> > >
> > > __d_alloc()
> > > {
> > > 	...
> > > 	while (too many pages in dentry_cache)
> > > 		call the dcache shrinker
> > > 	...
> > > }
>
> Direct reclaim will result in all the people who care about long
> tail latencies and/or highly concurrent workloads starting to hate
> you.
> Direct reclaim already hammers superblock shrinkers with
> excessive concurrency; this would only make it worse.

I can only confirm that! We have something similar in our SLES kernel.
We have had a page cache soft limit implemented for many years, and it
is basically a similar thing to the above: we just shrink the page cache
when we have too much of it. It turned out to be a complete PITA on
large machines where hundreds of CPUs fight for locks. We have tried to
address that, but it is a complete game of whack-a-mole.

The more important lesson from this is that the original motivation for
the functionality was to not allow too much page cache, which would push
useful DB data out to swap. As it turned out, MM internals have changed
a lot since its introduction, and we do not really swap out in the
presence of page cache anymore. Moreover, we have much more effective
reclaim protection thanks to memcg low limit reclaim etc. While that is
all good and nice, there are still people tuning the page cache limit
based on some really old admin guides; the feature does more harm than
good, and we see bug reports that the system gets stalled... I really do
not see why limiting (negative) dentries should be any different.

> IOWs, anything like this needs to co-ordinate with other reclaim
> operations in progress and, most likely, be done via background
> reclaim processing rather than blocking new allocations
> indefinitely. Background processing can be done in bulk and as
> efficiently as possible - concurrent direct reclaim in tiny batches
> will just hammer dcache locks and destroy performance when there is
> memory pressure.

Absolutely agreed!

> How many times do we have to learn this lesson the hard way?
>
> > > and, apart from the external name thing (grr), that should address
> > > these fragmentation issues, no?  I assume it's easy to ask slab how
> > > many pages are presently in use for a particular cache.
> >
> > I remember Dave Chinner had an idea how to age dcache pages to push
> > dentries with similar lifetimes to the same page. Not sure what
> > happened to that.
>
> Same thing that happened to all the "select the dentries on this
> page for reclaim" ideas. i.e. it's the referenced dentries that we
> can't reclaim or move that are the issue, not the reclaimable dentries
> on the page.
>
> Basically, without a hint at allocation time as to the expected
> lifetime of the dentry, we can't be smart about how we select partial
> pages to allocate from. And because we don't know at allocation time
> if the dentry is going to remain a negative dentry or not, we can't
> provide a hint about the expected lifetime of the object being
> allocated.

Can we allocate a new dentry at the time when we do know the lifetime,
or is the dentry pointer already so widely spread by that time that we
cannot?
-- 
Michal Hocko
SUSE Labs