* Limit dentry cache entries
@ 2013-05-20  3:50 Keyur Govande
  2013-05-20 12:20 ` Bob Peterson
  2013-05-20 22:53 ` Dave Chinner
  0 siblings, 2 replies; 11+ messages in thread
From: Keyur Govande @ 2013-05-20  3:50 UTC
  To: linux-fsdevel

Hello,

We have a bunch of servers that create a lot of temp files, or check
for the existence of non-existent files. Every such operation creates
a dentry object, and soon most of the free memory is consumed by
'negative' dentry entries. This behavior was observed on both CentOS
kernel v.2.6.32-358 and Amazon Linux kernel v.3.4.43-4.

There are also some processes running that occasionally allocate large
chunks of memory, and when this happens the kernel clears out a bunch
of stale dentry cache entries. This clearing takes some time: kswapd
kicks in, and an allocation plus bzero() of 4GB that normally takes
<1s takes 20s or more.

Because the memory needs are non-continuous but negative dentry
generation is fairly continuous, vfs_cache_pressure doesn't help much.

The thought I had was to have a sysctl that limits the number of
dentries per super-block (sb-max-dentry). Every time a new dentry is
allocated in d_alloc(), check if dentry_stat.nr_dentry exceeds (number
of super blocks * sb-max-dentry). If yes, queue up an asynchronous
workqueue call to prune_dcache(). Also have a separate sysctl to
indicate by what percentage to reduce the dentry entries when this
happens.

Thanks for your input. If this sounds like a reasonable idea, I'll
send out a patch.

Cheers,
Keyur.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Limit dentry cache entries
  2013-05-20  3:50 Limit dentry cache entries Keyur Govande
@ 2013-05-20 12:20 ` Bob Peterson
  2013-05-25  3:03   ` Keyur Govande
  2013-05-20 22:53 ` Dave Chinner
  1 sibling, 1 reply; 11+ messages in thread
From: Bob Peterson @ 2013-05-20 12:20 UTC
  To: Keyur Govande; +Cc: linux-fsdevel

Hi Keyur,

I like the idea. I've had people bring up the same issue, relating
to GFS2. This is especially true for doing du and similar ops on a
very large file system. This wasn't on GFS2, was it?

Regards,

Bob Peterson
Red Hat File Systems

* Re: Limit dentry cache entries
  2013-05-20  3:50 Limit dentry cache entries Keyur Govande
  2013-05-20 12:20 ` Bob Peterson
@ 2013-05-20 22:53 ` Dave Chinner
  2013-05-25  3:12   ` Keyur Govande
  1 sibling, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2013-05-20 22:53 UTC
  To: Keyur Govande; +Cc: linux-fsdevel

This request does come up every so often. There are valid reasons
for being able to control the exact size of the dentry and page
caches - I've seen a few implementations in storage appliance
vendor kernels where total control of memory usage yields a few
percent better performance on industry-specific benchmarks. Indeed,
years ago I thought that capping the size of the dentry cache was a
good idea, too.

However, the problem that I've seen with every single one of these
implementations is that the limit is carefully tuned for best
all-round performance in a given set of canned workloads. When the
limit is wrong, performance tanks, and it is just about impossible to
set a limit correctly for a machine that has a changing workload.

If your problem is negative dentries building up, where do you set
the limit? Set it low enough to keep only a small number of total
dentries to keep the negative dentries down, and you'll end up
with a dentry cache that isn't big enough to hold all the dentries
needed for efficient performance with workloads that do directory
traversals. It's a two-edged sword, and most people do not have
enough knowledge to tune a knob correctly.

IOWs, the automatic sizing of the dentry cache based on memory
pressure is the correct thing to do. Capping it, or allowing it to
be capped will simply generate bug reports for strange performance
problems....

That said, keeping lots of negative dentries around until memory
pressure kicks them out is probably the wrong thing to do. Negative
dentries are an optimisation for some workloads, but those workloads
tend to reference negative dentries with a temporal locality that
matches the unlink time.

Perhaps we need to separately reclaim negative dentries i.e. not
wait for memory pressure to reclaim them but use some other kind of
trigger for reclamation. That doesn't cap the size of the dentry
cache, but would address the problem of negative dentry buildup....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Limit dentry cache entries
  2013-05-20 12:20 ` Bob Peterson
@ 2013-05-25  3:03   ` Keyur Govande
  0 siblings, 0 replies; 11+ messages in thread
From: Keyur Govande @ 2013-05-25  3:03 UTC
  To: Bob Peterson; +Cc: linux-fsdevel

Hi Bob,

Actually this was observed both on EXT3 and XFS. Only tmpfs is immune
to the negative dentry caching problem, as might be expected :)

Thanks,
Keyur.

* Re: Limit dentry cache entries
  2013-05-20 22:53 ` Dave Chinner
@ 2013-05-25  3:12   ` Keyur Govande
  2013-05-26 23:23     ` Dave Chinner
  0 siblings, 1 reply; 11+ messages in thread
From: Keyur Govande @ 2013-05-25  3:12 UTC
  To: Dave Chinner; +Cc: linux-fsdevel

Hi Dave,

Thank you for responding. Sorry it took so long for me to get back,
been a bit busy.

I do agree that having a knob, and then setting a bad value can tank
performance. But not having a knob IMO is worse. Currently there are
no options for controlling the cache, bar dropping the caches
altogether every so often. The knob would have a default value of
((unsigned long) -1), so if one does not care for it, they would
experience the same behavior as today.

Also, setting a bad value for the knob would negatively impact file-IO
performance, which on a spinning disk isn't guaranteed anyway. The
current situation tanks memory performance which is more unexpected to
a normal user.

Thanks,
Keyur.

* Re: Limit dentry cache entries
  2013-05-25  3:12   ` Keyur Govande
@ 2013-05-26 23:23     ` Dave Chinner
  2013-05-28  6:12       ` Keyur Govande
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2013-05-26 23:23 UTC
  To: Keyur Govande; +Cc: linux-fsdevel

On Fri, May 24, 2013 at 11:12:50PM -0400, Keyur Govande wrote:
> Hi Dave,
> 
> Thank you for responding. Sorry it took so long for me to get back,
> been a bit busy.
> 
> I do agree that having a knob, and then setting a bad value can tank
> performance. But not having a knob IMO is worse.  Currently there are
> no options for controlling the cache, bar dropping the caches
> altogether every so often. The knob would have a default value of
> ((unsigned long) -1), so if one does not care for it, they would
> experience the same behavior as today.

And therein lies the problem with a knob. What's the point of having
a knob when nobody but a handful of people knows what it does, or
even how to recognise when they need to tweak it? It's long been a
Linux kernel policy that the kernel should do the right thing by
default. As such, knobs to tweak things are a last resort.

> Also, setting a bad value for the knob would negatively impact file-IO
> performance, which on a spinning disk isn't guaranteed anyway. The
> current situation tanks memory performance which is more unexpected to
> a normal user.

Which is precisely why a knob is the wrong solution. If it's
something a normal, unsuspecting user has problems with, then it
needs to be handled automatically by the kernel. Expecting users who
don't even know what a dentry is to know about a magic knob that
fixes a problem they don't even know they have is not an acceptable
solution.

The first step to solving such a problem is to provide a
reproducible, measurable test case in a simple script that
demonstrates the problem that needs solving. If we can reproduce it
at will, then half the battle is already won....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Limit dentry cache entries
  2013-05-26 23:23     ` Dave Chinner
@ 2013-05-28  6:12       ` Keyur Govande
  2013-05-28  6:24         ` Keyur Govande
  2013-05-28 10:49         ` Dave Chinner
  0 siblings, 2 replies; 11+ messages in thread
From: Keyur Govande @ 2013-05-28  6:12 UTC
  To: Dave Chinner; +Cc: linux-fsdevel

On Sun, May 26, 2013 at 7:23 PM, Dave Chinner <david@fromorbit.com> wrote:
> The first step to solving such a problem is to provide a
> reproducible, measurable test case in a simple script that
> demonstrates the problem that needs solving. If we can reproduce it
> at will, then half the battle is already won....
>

Here's a simple test case: https://gist.github.com/keyurdg/5660719 to
create a ton of dentry cache entries, and
https://gist.github.com/keyurdg/5660723 to allocate some memory.

I kicked off 3 instances of fopen in 3 different prefixed directories.
After all the memory was filled up with dentry entries, I tried
allocating 4GB of memory. It took ~20s. If I turned off the dentry
generation programs and attempted to allocate 4GB again, it only took
2s (because the memory was already free). Here's a quick graph of this
behavior: http://i.imgur.com/XhgX84d.png

I understand that in general, the kernel should do "the right thing"
without user input. But this seems to be a case where the user should
be allowed input into how memory is used. After all, there are already
lots of knobs in Linux that if set wrongly can cause user pain/bad
performance. IMO this new knob needs the right kind of documentation,
like suggesting the use of slabtop and perf to identify dentry as an
issue before setting the knob.

I'm also not tied to the idea of the knob being a limit on the number
of dentry cache entries. A limit just seems easiest to administer; but
if there are other ways of alleviating this issue, then I'd love to
explore those as well.

* Re: Limit dentry cache entries
  2013-05-28  6:12       ` Keyur Govande
@ 2013-05-28  6:24         ` Keyur Govande
  2013-05-28 10:49         ` Dave Chinner
  1 sibling, 0 replies; 11+ messages in thread
From: Keyur Govande @ 2013-05-28  6:24 UTC
  To: Dave Chinner; +Cc: linux-fsdevel

On Tue, May 28, 2013 at 2:12 AM, Keyur Govande <keyurgovande@gmail.com> wrote:
> On Sun, May 26, 2013 at 7:23 PM, Dave Chinner <david@fromorbit.com> wrote:
>> On Fri, May 24, 2013 at 11:12:50PM -0400, Keyur Govande wrote:
>>> On Mon, May 20, 2013 at 6:53 PM, Dave Chinner <david@fromorbit.com> wrote:
>>> > On Sun, May 19, 2013 at 11:50:55PM -0400, Keyur Govande wrote:
>>> >> Hello,
>>> >>
>>> >> We have a bunch of servers that create a lot of temp files, or check
>>> >> for the existence of non-existent files. Every such operation creates
>>> >> a dentry object and soon most of the free memory is consumed for
>>> >> 'negative' dentry entries. This behavior was observed on both CentOS
>>> >> kernel v.2.6.32-358 and Amazon Linux kernel v.3.4.43-4.
>>> >>
>>> >> There are also some processes running that occasionally allocate large
>>> >> chunks of memory, and when this happens the kernel clears out a bunch
>>> >> of stale dentry caches. This clearing takes some time. kswapd kicks
>>> >> in, and allocations and bzero() of 4GB that normally takes <1s, takes
>>> >> 20s or more.
>>> >>
>>> >> Because the memory needs are non-continuous but negative dentry
>>> >> generation is fairly continuous, vfs_cache_pressure doesn't help much.
>>> >>
>>> >> The thought I had was to have a sysctl that limits the number of
>>> >> dentries per super-block (sb-max-dentry). Everytime a new dentry is
>>> >> allocated in d_alloc(), check if dentry_stat.nr_dentry exceeds (number
>>> >> of super blocks * sb-max-dentry). If yes, queue up an asynchronous
>>> >> workqueue call to prune_dcache(). Also have a separate sysctl to
>>> >> indicate by what percentage to reduce the dentry entries when this
>>> >> happens.
>>> >
>>> > This request does come up every so often. There are valid reasons
>>> > for being able to control the exact size of the dentry and page
>>> > caches - I've seen a few implementations in storage appliance
>>> > vendor kernels where total control of memory usage yields a few
>>> > percent better performance of industry specific benchmarks. Indeed,
>>> > years ago I thought that capping the size of the dnetry cache was a
>>> > good idea, too.
>>> >
>>> > However, the problem that I've seen with every single on of these
>>> > implementations is that the limit is carefully tuned for best all
>>> > round performance in a given set of canned workloads. When the limit
>>> > is wrong, performance tanks, and it is just about impossible to set
>>> > a limit correctly for a machine that has a changing workload.
>>> >
>>> > If your problem is negative dentries building up, where do you set
>>> > the limit? Set it low enough to keep only a small number of total
>>> > dentries to keep the negative dentries down, and you'll end up
>>> > with a dentry cache that isn't big enough to hold all th dentries
>>> > needed for efficient performance with workloads that do directory
>>> > traversals. It's a two-edged sword, and most people do not have
>>> > enough knowledge to tune a knob correctly.
>>> >
>>> > IOWs, the automatic sizing of the dentry cache based on memory
>>> > pressure is the correct thing to do. Capping it, or allowing it to
>>> > be capped will simply generate bug reports for strange performance
>>> > problems....
>>> >
>>> > That said, keeping lots of negative dentries around until memory
>>> > pressure kicks them out is probably the wrong thing to do. Negative
>>> > dentries are an optimisation for some workloads, but they tend to
>>> > have references to negative dentries with a temporal locality that
>>> > matches the unlink time.
>>> >
>>> > Perhaps we need to separately reclaim negative dentries i.e. not
>>> > wait for memory pressure to reclaim them but use some other kind of
>>> > trigger for reclamation. That doesn't cap the size of the dentry
>>> > cache, but would address the problem of negative dentry buildup....
>>> >
>>> > Cheers,
>>> >
>>> > Dave.
>>> > --
>>> > Dave Chinner
>>> > david@fromorbit.com
>>>
>>> Hi Dave,
>>>
>>> Thank you for responding. Sorry it took so long for me to get back,
>>> been a bit busy.
>>>
>>> I do agree that having a knob, and then setting a bad value can tank
>>> performance. But not having a knob IMO is worse.  Currently there are
>>> no options for controlling the cache, bar dropping the caches
>>> altogether every so often. The knob would have a default value of
>>> ((unsigned long) -1)), so if one does not care for it, they would
>>> experience the same behavior as today.
>>
>> And therein lies the problem with a knob. What's the point of having
>> a knob that nobody but a handful of people know what it does or
>> evenhow to recognise when they need to tweak it. It's long been a
>> linux kernel policy that the kernel should do the right thing by
>> default. As such, knobs to tweak things are a last resort.
>>
>>> Also, setting a bad value for the knob would negatively impact file-IO
>>> performance, which on a spinning disk isn't guaranteed anyway. The
>>> current situation tanks memory performance which is more unexpected to
>>> a normal user.
>>
>> Which is precisely why a knob is the wrong solution. If it's
>> something a normal, unsuspecting user has problems with, then it
>> needs to be handled automatically by the kernel. Expecting users who
>> don't even know what a dentry is to know about a magic knob that
>> fixes a problem they don't even know they have is not an acceptable
>> solution.
>>
>> The first step to solving such a problem is to provide a
>> reproducible, measurable test case in a simple script that
>> demonstrates the problem that needs solving. If we can reproduce it
>> at will, then half the battle is already won....
>>
>
> Here's a simple test case: https://gist.github.com/keyurdg/5660719 to
> create a ton of dentry cache entries, and
> https://gist.github.com/keyurdg/5660723 to allocate some memory.
>
> I kicked off 3 instances of fopen in 3 different prefixed directories.
> After all the memory was filled up with dentry entries, I tried
> allocating 4GB of memory. It took ~20s. If I turned off the dentry
> generation programs and attempted to allocate 4GB again, it only took
> 2s (because the memory was already free). Here's a quick graph of this
> behavior: http://i.imgur.com/XhgX84d.png
>
> I understand that in general, the kernel should do "the right thing"
> without user input. But this seems to be a case where the user should
> be allowed input into how memory is used. After all, there are already
> lots of knobs in Linux that if set wrongly can cause user pain/bad
> performance. IMO this new knob needs the right kind of documentation,
> like suggesting the use of slabtop and perf to identify dentry as an
> issue before setting the knob.
>
> I'm also not tied to the idea of the knob being a limit on the number
> of dentry cache entries. A limit just seems easiest to administer; but
> if there are other ways of alleviating this issue, then I'd love to
> explore those as well.
>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@fromorbit.com

Forgot to add: the only "knob" for this issue ATM is to drop the
entire cache altogether, a massive overreaction to the problem. The
dentry cache system already has all its elements in an LRU; if we did
allow setting a limit, any dropped dentries have a good chance of not
being very significant (performance-wise).
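
To illustrate the LRU argument, here is a toy model in Python. It is only
a sketch of a capped LRU, not the kernel's dcache: the class name
CappedLRU, the touch() method and the limit parameter are all
hypothetical. It shows that with a cap in place, the entry that gets
dropped is the one that has gone longest without a lookup.

```python
from collections import OrderedDict


class CappedLRU:
    """Toy LRU cache with a hard entry limit (hypothetical, not kernel code)."""

    def __init__(self, limit):
        self.limit = limit
        self.entries = OrderedDict()  # least recently used first, newest last

    def touch(self, name):
        """Record a lookup of `name`; return the names evicted to honour the limit."""
        if name in self.entries:
            self.entries.move_to_end(name)  # a hit makes the entry most recent
        else:
            self.entries[name] = True
        evicted = []
        while len(self.entries) > self.limit:
            old, _ = self.entries.popitem(last=False)  # drop least recently used
            evicted.append(old)
        return evicted


if __name__ == "__main__":
    cache = CappedLRU(limit=3)
    for name in ["a", "b", "c"]:
        cache.touch(name)
    cache.touch("a")         # "a" is looked up again and becomes most recent
    print(cache.touch("d"))  # over the limit: the coldest entry is evicted
```

In this model a recently used name survives the cap, which is the
property I would expect a limit on the dentry LRU to preserve.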

* Re: Limit dentry cache entries
  2013-05-28  6:12       ` Keyur Govande
  2013-05-28  6:24         ` Keyur Govande
@ 2013-05-28 10:49         ` Dave Chinner
  2013-05-28 16:42           ` Keyur Govande
  1 sibling, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2013-05-28 10:49 UTC (permalink / raw)
  To: Keyur Govande; +Cc: linux-fsdevel

On Tue, May 28, 2013 at 02:12:26AM -0400, Keyur Govande wrote:
> On Sun, May 26, 2013 at 7:23 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Fri, May 24, 2013 at 11:12:50PM -0400, Keyur Govande wrote:
> >> On Mon, May 20, 2013 at 6:53 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> > On Sun, May 19, 2013 at 11:50:55PM -0400, Keyur Govande wrote:
> >> >> Hello,
> >> >>
> >> >> We have a bunch of servers that create a lot of temp files, or check
> >> >> for the existence of non-existent files. Every such operation creates
> >> >> a dentry object and soon most of the free memory is consumed for
> >> >> 'negative' dentry entries. This behavior was observed on both CentOS
> >> >> kernel v.2.6.32-358 and Amazon Linux kernel v.3.4.43-4.
....
> >> Also, setting a bad value for the knob would negatively impact file-IO
> >> performance, which on a spinning disk isn't guaranteed anyway. The
> >> current situation tanks memory performance which is more unexpected to
> >> a normal user.
> >
> > Which is precisely why a knob is the wrong solution. If it's
> > something a normal, unsuspecting user has problems with, then it
> > needs to be handled automatically by the kernel. Expecting users who
> > don't even know what a dentry is to know about a magic knob that
> > fixes a problem they don't even know they have is not an acceptable
> > solution.
> >
> > The first step to solving such a problem is to provide a
> > reproducible, measurable test case in a simple script that
> > demonstrates the problem that needs solving. If we can reproduce it
> > at will, then half the battle is already won....
> 
> Here's a simple test case: https://gist.github.com/keyurdg/5660719 to
> create a ton of dentry cache entries, and
> https://gist.github.com/keyurdg/5660723 to allocate some memory.
> 
> I kicked off 3 instances of fopen in 3 different prefixed directories.
> After all the memory was filled up with dentry entries, I tried
> allocating 4GB of memory. It took ~20s. If I turned off the dentry
> generation programs and attempted to allocate 4GB again, it only took
> 2s (because the memory was already free). Here's a quick graph of this
> behavior: http://i.imgur.com/XhgX84d.png

News at 11! Memory allocation when memory is full is slower than
when it's empty!

That's not what I was asking for. We were talking about negative
dentry buildup and possibly containing that, not a strawman "I can
fill all of memory with dentries by creating files" workload.

IOWs, your example is not demonstrating the problem you complained
about. We are not going to put a global limit on active dentries.

If you really want a global dentry cache size limit or to ensure
that certain processes have free memory available for use, then
perhaps you should be looking at what you can control with cgroups.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Limit dentry cache entries
  2013-05-28 10:49         ` Dave Chinner
@ 2013-05-28 16:42           ` Keyur Govande
  2013-05-28 17:14             ` Keyur Govande
  0 siblings, 1 reply; 11+ messages in thread
From: Keyur Govande @ 2013-05-28 16:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel

On Tue, May 28, 2013 at 6:49 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Tue, May 28, 2013 at 02:12:26AM -0400, Keyur Govande wrote:
>> On Sun, May 26, 2013 at 7:23 PM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Fri, May 24, 2013 at 11:12:50PM -0400, Keyur Govande wrote:
>> >> On Mon, May 20, 2013 at 6:53 PM, Dave Chinner <david@fromorbit.com> wrote:
>> >> > On Sun, May 19, 2013 at 11:50:55PM -0400, Keyur Govande wrote:
>> >> >> Hello,
>> >> >>
>> >> >> We have a bunch of servers that create a lot of temp files, or check
>> >> >> for the existence of non-existent files. Every such operation creates
>> >> >> a dentry object and soon most of the free memory is consumed for
>> >> >> 'negative' dentry entries. This behavior was observed on both CentOS
>> >> >> kernel v.2.6.32-358 and Amazon Linux kernel v.3.4.43-4.
> ....
>> >> Also, setting a bad value for the knob would negatively impact file-IO
>> >> performance, which on a spinning disk isn't guaranteed anyway. The
>> >> current situation tanks memory performance which is more unexpected to
>> >> a normal user.
>> >
>> > Which is precisely why a knob is the wrong solution. If it's
>> > something a normal, unsuspecting user has problems with, then it
>> > needs to be handled automatically by the kernel. Expecting users who
>> > don't even know what a dentry is to know about a magic knob that
>> > fixes a problem they don't even know they have is not an acceptable
>> > solution.
>> >
>> > The first step to solving such a problem is to provide a
>> > reproducible, measurable test case in a simple script that
>> > demonstrates the problem that needs solving. If we can reproduce it
>> > at will, then half the battle is already won....
>>
>> Here's a simple test case: https://gist.github.com/keyurdg/5660719 to
>> create a ton of dentry cache entries, and
>> https://gist.github.com/keyurdg/5660723 to allocate some memory.
>>
>> I kicked off 3 instances of fopen in 3 different prefixed directories.
>> After all the memory was filled up with dentry entries, I tried
>> allocating 4GB of memory. It took ~20s. If I turned off the dentry
>> generation programs and attempted to allocate 4GB again, it only took
>> 2s (because the memory was already free). Here's a quick graph of this
>> behavior: http://i.imgur.com/XhgX84d.png
>
> News at 11! Memory allocation when memory is full is slower than
> when it's empty!
>
> That's not what I was asking for. We were talking about negative
> dentry buildup and possibly containing that, not a strawman "I can
> fill all of memory with dentries by creating files" workload.

By passing in a mode of "r" like: "./fopen test1 r & ./fopen test2 r
&" you can create a ton of negative dentry cache entries.
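
For readers without access to the gist, the idea can be sketched in
Python. This is not the fopen program itself: the function name
probe_missing and its parameters are made up for illustration. Each
failed open-for-read is a lookup of a name that does not exist, which is
the kind of operation that can leave a negative dentry behind. Whether
one is actually cached depends on the kernel and filesystem.

```python
import os
import tempfile


def probe_missing(directory, prefix, count):
    """Try to open `count` non-existent files for reading; return the number of misses."""
    misses = 0
    for i in range(count):
        path = os.path.join(directory, f"{prefix}-{i}")
        try:
            with open(path, "r"):
                pass  # the file unexpectedly exists, so this was not a miss
        except FileNotFoundError:
            misses += 1  # failed lookup; the kernel may cache it as a negative dentry
    return misses


if __name__ == "__main__":
    # Assumption: a scratch directory stands in for the "test1" prefix above.
    with tempfile.TemporaryDirectory() as scratch:
        print(probe_missing(scratch, "test1", 100000))
```

Nothing is created on disk, so any dentry growth seen in slabtop while
this runs would come from the failed lookups alone.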

>
> IOWs, your example is not demonstrating the problem you complained
> about. We are not going to put a global limit on active dentries.
>
> If you really want a global dentry cache size limit or to ensure
> that certain processes have free memory available for use, then
> perhaps you should be looking at what you can control with cgroups.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

* Re: Limit dentry cache entries
  2013-05-28 16:42           ` Keyur Govande
@ 2013-05-28 17:14             ` Keyur Govande
  0 siblings, 0 replies; 11+ messages in thread
From: Keyur Govande @ 2013-05-28 17:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel

On Tue, May 28, 2013 at 12:42 PM, Keyur Govande <keyurgovande@gmail.com> wrote:
> On Tue, May 28, 2013 at 6:49 AM, Dave Chinner <david@fromorbit.com> wrote:
>> On Tue, May 28, 2013 at 02:12:26AM -0400, Keyur Govande wrote:
>>> On Sun, May 26, 2013 at 7:23 PM, Dave Chinner <david@fromorbit.com> wrote:
>>> > On Fri, May 24, 2013 at 11:12:50PM -0400, Keyur Govande wrote:
>>> >> On Mon, May 20, 2013 at 6:53 PM, Dave Chinner <david@fromorbit.com> wrote:
>>> >> > On Sun, May 19, 2013 at 11:50:55PM -0400, Keyur Govande wrote:
>>> >> >> Hello,
>>> >> >>
>>> >> >> We have a bunch of servers that create a lot of temp files, or check
>>> >> >> for the existence of non-existent files. Every such operation creates
>>> >> >> a dentry object and soon most of the free memory is consumed for
>>> >> >> 'negative' dentry entries. This behavior was observed on both CentOS
>>> >> >> kernel v.2.6.32-358 and Amazon Linux kernel v.3.4.43-4.
>> ....
>>> >> Also, setting a bad value for the knob would negatively impact file-IO
>>> >> performance, which on a spinning disk isn't guaranteed anyway. The
>>> >> current situation tanks memory performance which is more unexpected to
>>> >> a normal user.
>>> >
>>> > Which is precisely why a knob is the wrong solution. If it's
>>> > something a normal, unsuspecting user has problems with, then it
>>> > needs to be handled automatically by the kernel. Expecting users who
>>> > don't even know what a dentry is to know about a magic knob that
>>> > fixes a problem they don't even know they have is not an acceptable
>>> > solution.
>>> >
>>> > The first step to solving such a problem is to provide a
>>> > reproducible, measurable test case in a simple script that
>>> > demonstrates the problem that needs solving. If we can reproduce it
>>> > at will, then half the battle is already won....
>>>
>>> Here's a simple test case: https://gist.github.com/keyurdg/5660719 to
>>> create a ton of dentry cache entries, and
>>> https://gist.github.com/keyurdg/5660723 to allocate some memory.
>>>
>>> I kicked off 3 instances of fopen in 3 different prefixed directories.
>>> After all the memory was filled up with dentry entries, I tried
>>> allocating 4GB of memory. It took ~20s. If I turned off the dentry
>>> generation programs and attempted to allocate 4GB again, it only took
>>> 2s (because the memory was already free). Here's a quick graph of this
>>> behavior: http://i.imgur.com/XhgX84d.png
>>
>> News at 11! Memory allocation when memory is full is slower than
>> when it's empty!
>>
>> That's not what I was asking for. We were talking about negative
>> dentry buildup and possibly containing that, not a strawman "I can
>> fill all of memory with dentries by creating files" workload.
>
> By passing in a mode of "r" like: "./fopen test1 r & ./fopen test2 r
> &" you can create a ton of negative dentry cache entries.
>
>>
>> IOWs, your example is not demonstrating the problem you complained
>> about. We are not going to put a global limit on active dentries.
>>
>> If you really want a global dentry cache size limit or to ensure
>> that certain processes have free memory available for use, then
>> perhaps you should be looking at what you can control with cgroups.
>>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@fromorbit.com

I looked through some older discussions, such as
http://marc.info/?l=linux-fsdevel&m=131363083730886&w=2 and it sounds like
you might be OK with limiting the size of the "inactive cache"
(nr_unused). I think that is a perfectly reasonable solution, because
in my case, nr_unused bloating up is always the real issue.
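
For anyone who wants to watch nr_unused on their own machine, here is a
small Python sketch that reads /proc/sys/fs/dentry-state. The helper
names parse_dentry_state and unused_fraction are hypothetical. It relies
on the first four fields of that file being nr_dentry, nr_unused,
age_limit and want_pages, and ignores the remaining fields.

```python
def parse_dentry_state(text):
    """Parse the contents of /proc/sys/fs/dentry-state into named counters."""
    fields = [int(tok) for tok in text.split()]
    names = ("nr_dentry", "nr_unused", "age_limit", "want_pages")
    return dict(zip(names, fields))  # zip stops after four names; extra fields are dropped


def unused_fraction(state):
    """Return the share of all dentries that are currently unused."""
    if state["nr_dentry"] == 0:
        return 0.0
    return state["nr_unused"] / state["nr_dentry"]


if __name__ == "__main__":
    with open("/proc/sys/fs/dentry-state") as f:
        state = parse_dentry_state(f.read())
    print(state, unused_fraction(state))
```

Sampling this periodically while the test programs run should show
whether the growth is in nr_unused rather than in active dentries.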
