All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josh Triplett <josh@joshtriplett.org>
To: Patrick McHardy <kaber@trash.net>
Cc: pablo@netfilter.org, netfilter-devel@vger.kernel.org,
	paulmck@linux.vnet.ibm.com
Subject: Re: [PATCH] netfilter: nft_hash: bug fixes and resizing
Date: Thu, 27 Feb 2014 07:47:46 -0800	[thread overview]
Message-ID: <20140227154746.GA26756@thin> (raw)
In-Reply-To: <1393512980-28780-2-git-send-email-kaber@trash.net>

On Thu, Feb 27, 2014 at 02:56:20PM +0000, Patrick McHardy wrote:
> The hash set type is very broken and was never meant to be merged in this
> state. Missing RCU synchronization on element removal, leaking chain
> refcounts when used as a verdict map, races during lookups, a fixed table
> size are probably just some of the problems. Luckily it is currently
> never chosen by the kernel when the rbtree type is also available.
> 
> Rewrite it to be usable.
> 
> The new implementation supports automatic hash table resizing using RCU,
> based on Paul McKenney's and Josh Triplett's algorithm "Optimized Resizing
> For RCU-Protected Hash Tables" described in [1].
> 
> Resizing doesn't require a second list head in the elements, it works by
> chosing a hash function that remaps elements to a predictable set of buckets,
> only resizing by integral factors and
> 
> - during expansion: linking new buckets to the old bucket that contains
>   elements for any of the new buckets, thereby creating imprecise chains,
>   then incrementally seperating the elements until the new buckets only
>   contain elements that hash directly to them.
> 
> - during shrinking: linking the hash chains of all old buckets that hash
>   to the same new bucket to form a single chain.
> 
> Expansion requires at most the number of elements in the longest hash chain
> grace periods, shrinking requires a single grace period.
> 
> Due to the requirement of having hash chains/elements linked to multiple
> buckets during resizing, homemade single linked lists are used instead of
> the existing list helpers, that don't support this in a clean fashion.
> As a side effect, the amount of memory required per element is reduced by
> one pointer.
> 
> Expansion is triggered when the load factors exceeds 75%, shrinking when
> the load factor goes below 30%. Both operations are allowed to fail and
> will be retried on the next insertion or removal if their respective
> conditions still hold.

Since this hash table uses chaining, does it really make sense to expand
at 75% load?  You might just want to expand at 100%, which is even
easier to check for.  But that seems like a question for benchmarks to
answer.

> [1] http://dl.acm.org/citation.cfm?id=2002181.2002192
> 
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Signed-off-by: Patrick McHardy <kaber@trash.net>

This looks correct to me.  Thank you very much for this work!

One suggestion and one minor micro-optimization (unrelated to the
algorithm implementation) below.  With those changed:
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

> +	nft_hash_for_each_entry(he, he->next) {
> +		if (nft_hash_data(&he->key, ntbl->size, set->klen) != h)
> +			continue;
> +		next = he;
> +		break;
> +	}

> +		nft_hash_for_each_entry(he, tbl->buckets[h]) {
> +			if (nft_hash_data(&he->key, ntbl->size, set->klen) != i)
> +				continue;
> +			RCU_INIT_POINTER(ntbl->buckets[i], he);
> +			break;
> +		}

In both of these cases, you could reverse the sense of the if with
s/!=/==/, move the "statement; break;" into the if, and the continue
would become redundant.  (You then wouldn't even need the braces around
the loop body.)

Second, in the load-factor calculation:

> +	/* Expand table when exceeding 75% load */
> +	if (tbl->elements * 4 / 3 > tbl->size)
> +		nft_hash_tbl_expand(set, priv);

I just checked, and GCC ends up using an imul to implement this, due to
the division by 3.  I'd suggest rewriting it to:

if (tbl->elements > (tbl->size >> 2) * 3)

Dividing tbl->size by 4 first avoids the possibility of integer
overflow, and GCC translates the *3 into a single lea instruction.

(Also, do you need an NFT_HASH_MAX_SIZE here?)

Similar considerations apply to the calculation for shrinking.

- Josh Triplett

  reply	other threads:[~2014-02-27 15:47 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-27 14:56 [PATCH] netfilter: hash resizing Patrick McHardy
2014-02-27 14:56 ` [PATCH] netfilter: nft_hash: bug fixes and resizing Patrick McHardy
2014-02-27 15:47   ` Josh Triplett [this message]
2014-02-27 16:06     ` Patrick McHardy
2014-02-27 19:47   ` Paul E. McKenney
2014-02-28 11:28     ` Patrick McHardy
2014-02-28 16:09       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140227154746.GA26756@thin \
    --to=josh@joshtriplett.org \
    --cc=kaber@trash.net \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.