linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: James Simmons <jsimmons@infradead.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andreas Dilger <andreas.dilger@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Lustre Development List <lustre-devel@lists.lustre.org>
Subject: Re: [PATCH 00/20] staging: lustre: convert to rhashtable
Date: Wed, 18 Apr 2018 13:17:07 +1000	[thread overview]
Message-ID: <87y3hlqk3w.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <alpine.LFD.2.21.1804170429300.14398@casper.infradead.org>

[-- Attachment #1: Type: text/plain, Size: 3436 bytes --]

On Tue, Apr 17 2018, James Simmons wrote:

>> libcfs in lustre has a resizeable hashtable.
>> Linux already has a resizeable hashtable, rhashtable, which is better
>> is most metrics. See https://lwn.net/Articles/751374/ in a few days
>> for an introduction to rhashtable.
>
> Thansk for starting this work. I was think about cleaning the libcfs
> hash but your port to rhashtables is way better. How did you gather
> metrics to see that rhashtable was better than libcfs hash?

Code inspection and reputation.  It is hard to beat inlined lockless
code for lookups.  And rhashtable is heavily used in the network
subsystem and they are very focused on latency there.  I'm not sure that
insertion is as fast as it can be (I have some thoughts on that) but I'm
sure lookup will be better.
I haven't done any performance testing myself, only correctness.

>  
>> This series converts lustre to use rhashtable.  This affects several
>> different tables, and each is different is various ways.
>> 
>> There are two outstanding issues.  One is that a bug in rhashtable
>> means that we cannot enable auto-shrinking in one of the tables.  That
>> is documented as appropriate and should be fixed soon.
>> 
>> The other is that rhashtable has an atomic_t which counts the elements
>> in a hash table.  At least one table in lustre went to some trouble to
>> avoid any table-wide atomics, so that could lead to a regression.
>> I'm hoping that rhashtable can be enhanced with the option of a
>> per-cpu counter, or similar.
>> 
>
> This doesn't sound quite ready to land just yet. This will have to do some
> soak testing and a larger scope of test to make sure no new regressions
> happen. Believe me I did work to make lustre work better on tickless 
> systems, which I'm preparing for the linux client, and small changes could 
> break things in interesting ways. I will port the rhashtable change to the 
> Intel developement branch and get people more familar with the hash code
> to look at it.

Whether it is "ready" or not probably depends on perspective and
priorities.  As I see it, getting lustre cleaned up and out of staging
is a fairly high priority, and it will require a lot of code change.
It is inevitable that regressions will slip in (some already have) and
it is important to keep testing (the test suite is of great benefit, but
is only part of the story of course).  But to test the code, it needs to
land.  Testing the code in Intel's devel branch and then porting it
across doesn't really prove much.  For testing to be meaningful, it
needs to be tested in a branch that up-to-date with mainline and on
track to be merged into mainline.

I have no particular desire to rush this in, but I don't see any
particular benefit in delaying it either.

I guess I see staging as implicitly a 'devel' branch.  You seem to be
treating it a bit like a 'stable' branch - is that right?

As mentioned, I think there is room for improvement in rhashtable.
- making the atomic counter optional
- replacing the separate spinlocks with bit-locks in the hash-chain head
  so that the lock and chain are in the same cache line
- for the common case of inserting in an empty chain, a single atomic
  cmpxchg() should be sufficient
I think we have a better chance of being heard if we have "skin in the
game" and have upstream code that would use this.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  reply	other threads:[~2018-04-18  3:17 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-11 21:54 [PATCH 00/20] staging: lustre: convert to rhashtable NeilBrown
2018-04-11 21:54 ` [PATCH 03/20] staging: lustre: convert obd uuid hash " NeilBrown
2018-04-11 21:54 ` [PATCH 04/20] staging: lustre: convert osc_quota " NeilBrown
2018-04-11 21:54 ` [PATCH 05/20] staging: lustre: separate buckets from ldlm hash table NeilBrown
2018-04-11 21:54 ` [PATCH 09/20] staging: lustre: convert ldlm_resource hash to rhashtable NeilBrown
2018-04-11 21:54 ` [PATCH 07/20] staging: lustre: ldlm: store name directly in namespace NeilBrown
2018-04-11 21:54 ` [PATCH 10/20] staging: lustre: make struct lu_site_bkt_data private NeilBrown
2018-04-11 21:54 ` [PATCH 02/20] staging: lustre: convert lov_pool to use rhashtable NeilBrown
2018-04-11 21:54 ` [PATCH 12/20] staging: lustre: lu_object: factor out extra per-bucket data NeilBrown
2018-04-11 21:54 ` [PATCH 08/20] staging: lustre: simplify ldlm_ns_hash_defs[] NeilBrown
2018-04-11 21:54 ` [PATCH 01/20] staging: lustre: ptlrpc: convert conn_hash to rhashtable NeilBrown
2018-04-11 21:54 ` [PATCH 06/20] staging: lustre: ldlm: add a counter to the per-namespace data NeilBrown
2018-04-11 21:54 ` [PATCH 11/20] staging: lustre: lu_object: discard extra lru count NeilBrown
2018-04-11 21:54 ` [PATCH 17/20] staging: lustre: use call_rcu() to free lu_object_headers NeilBrown
2018-04-11 21:54 ` [PATCH 15/20] staging: lustre: llite: use more private data in dump_pgcache NeilBrown
2018-04-11 21:54 ` [PATCH 16/20] staging: lustre: llite: remove redundant lookup " NeilBrown
2018-04-11 21:54 ` [PATCH 14/20] staging: lustre: fold lu_object_new() into lu_object_find_at() NeilBrown
2018-04-11 21:54 ` [PATCH 13/20] staging: lustre: lu_object: move retry logic inside htable_lookup NeilBrown
2018-04-11 21:54 ` [PATCH 18/20] staging: lustre: change how "dump_page_cache" walks a hash table NeilBrown
2018-04-11 21:54 ` [PATCH 20/20] staging: lustre: remove cfs_hash resizeable hashtable implementation NeilBrown
2018-04-11 21:54 ` [PATCH 19/20] staging: lustre: convert lu_object cache to rhashtable NeilBrown
2018-04-17  3:35 ` [PATCH 00/20] staging: lustre: convert " James Simmons
2018-04-18  3:17   ` NeilBrown [this message]
2018-04-18 21:56     ` [lustre-devel] " Simmons, James A.
2018-04-23 13:08 ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y3hlqk3w.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=andreas.dilger@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jsimmons@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lustre-devel@lists.lustre.org \
    --cc=oleg.drokin@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).