From: Chuck Lever <chuck.lever@oracle.com>
To: Jeff Layton <jlayton@redhat.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>, linux-nfs@vger.kernel.org
Subject: Re: [PATCH RFC] nfsd: report length of the largest hash chain in reply cache stats
Date: Mon, 18 Feb 2013 11:18:06 -0500
Message-ID: <50239A71-7563-434D-B69C-9D7009BD43F1@oracle.com>
In-Reply-To: <20130218093948.670ddb06@tlielax.poochiereds.net>


On Feb 18, 2013, at 9:39 AM, Jeff Layton <jlayton@redhat.com> wrote:

> On Sat, 16 Feb 2013 12:18:18 -0500
> Chuck Lever <chuck.lever@oracle.com> wrote:
> 
>> 
>> On Feb 16, 2013, at 8:39 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
>> 
>>> On Fri, Feb 15, 2013 at 05:20:58PM -0500, Jeff Layton wrote:
>>>> An excellent question, and not an easy one to answer. Clearly 1024
>>>> entries was not enough. We now cap the size as a function of the
>>>> available low memory, which I think is a reasonable way to keep it from
>>>> ballooning so large that the box falls over. We also have a shrinker
>>>> and periodic cache cleaner to prune off entries that have expired.
>>>> 
>>>> Of course one thing I haven't really considered enough is the
>>>> performance implications of walking the potentially much longer hash
>>>> chains here.
>>>> 
>>>> If that is a problem, then one way to counter that without moving to a
>>>> different structure altogether might be to alter the hash function
>>>> based on the max size of the cache. IOW, grow the number of hash buckets
>>>> as the max cache size grows?
>> 
>> The trouble with a hash table is that once you've allocated it, it's a heavy lift to increase the table size.  That sort of logic adds complexity and additional locking, and is often difficult to test.
>> 
> 
> I wasn't suggesting that we resize/rebalance the table on the fly. We
> determine the max allowable number of cache entries when the server
> starts. We could also determine the number of buckets at the same time,
> and alter the hashing function to take that number into account.
> 
> Of course, more buckets may not help if the hash function just sucks.

My point was that using a data structure like an rbtree, as you suggested, has the advantage of not requiring any extra logic (whether done statically or dynamically).  An rbtree implementation is much more likely to exercise exactly the same code paths no matter how many entries are in the cache.  It will scale deterministically as DRC capacity continues to increase over time.  And it is not dependent on efficient hash-ability of client XIDs.


> 
>>> Another reason to organize the cache per client address?
>> 
>> 
>> In theory, an active single client could evict all entries for other clients, but do we know this happens in practice?
>> 
> 
> I'm pretty sure that's what's been happening to our QA group. They have
> some shared NFS servers set up in the test lab for client testing. When
> things get busy, the DRC just plain doesn't appear to work. It's hard
> to know for sure, though, since the problem only crops up very rarely.
> 
> My hope is that the massive increase in the size of the DRC should
> prevent that from occurring now.

Well, right.  If the DRC is too small already, any client can cause entries to be evicted prematurely, even its own entries.  Now we have an opportunity to ask whether a larger cache makes it much less likely that a single obnoxious client will affect its neighbors.  In other words, maybe per-client sensitivity doesn't buy us much if the DRC can grow larger.


> 
>>> With a per-client maximum number of entries, sizing the hash tables
>>> should be easier.
>> 
>> 
>> When a server has only one client, should that client be allowed to maximize the use of a server's resources (eg, use all of the DRC resource the server has available)?  How about when a server has one active client and multiple quiescent clients?
>> 
> 
> Again, we have two "problems" to solve, and we need to take care not to
> conflate them too much.
> 
> 1) how do we best organize the cache for efficient lookups?
> 
> 2) what cache eviction policy should we use?

These are related, I feel.  Because memory capacity is increasing faster than memory bandwidth, I think lookup time efficiency is relatively more important than cache space efficiency.  Also, above we were arguing that a larger DRC is more effective.

What's more, 2) is a much harder problem to solve well.  And if lookups are efficient anyway, then 2) should matter much less, as it will only be needed to deal with memory pressure on the server, and not to keep lookup time efficient.

I was proposing that we deal with per-client caching via 2): choose which entry to evict based on how many entries there are for each client.  But since 2) looks less critical than 1), perhaps per-client caching doesn't provide much benefit after all.


> Organizing the cache based on client address might help both of those,
> but we'll need to determine whether it's worth the extra complexity.

Agree.  Data, data, data.


-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




