* Question regarding hash_resize
From: Junchang Wang @ 2019-01-07 13:49 UTC
  To: perfbook

Hi all,

I've been reading hash_resize recently and have a few questions about
this algorithm. Please take a look when you have time. Any suggestions
are warmly welcome.

=== Question 1 ===
In hash_resize.c : hashtab_lock_mod
186         if (b > READ_ONCE(htp->ht_resize_cur)) {
187                 lsp->hbp[1] = NULL;
188                 return;
189         }
190         htp = rcu_dereference(htp->ht_new);

It seems we are missing a barrier (e.g., smp_mb()) between lines 189
and 190, because neither READ_ONCE() nor rcu_dereference() prevents
compilers and hardware from reordering accesses to the two unrelated
variables ht_resize_cur and ht_new. Is my understanding correct?

=== Question 2 ===
In hash_resize.c, each time an updater wants to access a bucket, the
updater must first acquire the bucket's lock (htb_lock), preventing
other updaters from accessing the same bucket concurrently. This
approach is fine if a bucket's linked list is relatively short, but on
a larger system where the linked lists are long and the
perftest_resize thread is running simultaneously, it could become a
performance bottleneck. One naive solution is to allow multiple
updaters to access the same bucket, provided they don't operate on
the same item in the bucket's list. I wonder whether there is any
existing work or discussion on this topic.

=== Question 3 ===
The Data Structures chapter also discusses other resizable hash
tables, notably "Resizable, scalable, concurrent hash tables via
relativistic programming" by Josh Triplett, which reduces the memory
footprint by using a single set of pointers rather than a pair. But
my understanding is that hash_resize.c is unique in that it allows
you to rebuild the hash table using a different hash function, which
could be very useful in practice (e.g., to prevent DDoS attacks).
Other solutions do not share this property. Is my understanding
correct? Did I miss any discussion of this topic in perfbook?

=== Question 4 ===
In the current implementation of hash_resize.c, the perftest_resize
thread can block an updater, and vice versa. This does not seem to be
what we want. Ideally, the two should be allowed to run concurrently,
or at least the perftest_resize thread should have lower priority so
that updaters are never blocked by it. Is that right? I'm very
interested in helping to improve this. Please let me know if you have
any suggestions.

Thanks,
--Junchang



* Re: Question regarding hash_resize
From: Paul E. McKenney @ 2019-01-07 18:33 UTC
  To: Junchang Wang; +Cc: perfbook

On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
> Hi all,
> 
> I've been reading hash_resize recently and have a few questions about
> this algorithm. Please take a look when you have time. Any suggestions
> are warmly welcome.
> 
> === Question 1 ===
> In hash_resize.c : hashtab_lock_mod
> 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
> 187                 lsp->hbp[1] = NULL;
> 188                 return;
> 189         }
> 190         htp = rcu_dereference(htp->ht_new);
> 
> It seems we are missing a barrier (e.g., smp_mb()) between lines 189
> and 190, because neither READ_ONCE() nor rcu_dereference() prevents
> compilers and hardware from reordering accesses to the two unrelated
> variables ht_resize_cur and ht_new. Is my understanding correct?

Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
section and there is a synchronize_rcu() between the update to ->ht_new
and the updates to ->ht_resize_cur.  For more details on how this works,
please see https://lwn.net/Articles/573497/.

Of course, if you find a code path in which a call to hashtab_lock_mod()
is invoked outside of an RCU read-side critical section, that would be
a bug.  (Can you tell me an exception to this rule, that is, a case
where hashtab_lock_mod() could safely be invoked outside of an RCU
read-side critical section?)
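
To make the ordering concrete, here is a minimal sketch of the two
sides (it uses the hash_resize.c field names, but is not the exact
code):

	/* Resize side: publish the new table, wait, then advance. */
	rcu_assign_pointer(htp->ht_new, htp_new);
	synchronize_rcu();	/* Wait for pre-existing readers. */
	/* ... per-bucket moves ... */
	WRITE_ONCE(htp->ht_resize_cur, i);  /* Bucket i now moved. */

	/* Update side, as in hashtab_lock_mod(). */
	rcu_read_lock();
	if (b <= READ_ONCE(htp->ht_resize_cur))
		htp = rcu_dereference(htp->ht_new); /* Guaranteed visible. */

Any reader that observes the post-grace-period store to ->ht_resize_cur
is guaranteed by RCU to also observe the pre-grace-period store to
->ht_new, which is exactly the ordering in question.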

> === Question 2 ===
> In hash_resize.c, each time an updater wants to access a bucket, the
> updater must first acquire the bucket's lock (htb_lock), preventing
> other updaters from accessing the same bucket concurrently. This
> approach is fine if a bucket's linked list is relatively short, but on
> a larger system where the linked lists are long and the
> perftest_resize thread is running simultaneously, it could become a
> performance bottleneck. One naive solution is to allow multiple
> updaters to access the same bucket, provided they don't operate on
> the same item in the bucket's list. I wonder whether there is any
> existing work or discussion on this topic.

One approach is to use a hashed array of locks, and to hash a given
element's address to locate the lock to be used.  Please see
Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
Needed Locks First"), including Quick Quiz 7.9, for additional details.

Another approach is to use RCU to protect traversals, and locks within the
linked-list elements themselves.  These locks are conditionally acquired
(again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
them in list order and by the tricks in Quick Quiz 7.9.
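
(Again purely for illustration, deletion under that second scheme
might look like the following sketch, with find_predecessor(),
validate(), and defer_free() as hypothetical helpers:)

	rcu_read_lock();
	prev = find_predecessor(head, key);	/* RCU-protected walk. */
	cur = rcu_dereference(prev->next);
	spin_lock(&prev->lock);		/* List order: predecessor... */
	spin_lock(&cur->lock);		/* ...then its successor. */
	if (validate(prev, cur)) {	/* Recheck under the locks. */
		rcu_assign_pointer(prev->next, cur->next);
		defer_free(cur);	/* Reclaim after a grace period. */
	}
	spin_unlock(&cur->lock);
	spin_unlock(&prev->lock);
	rcu_read_unlock();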

Non-blocking synchronization can also be used, but it is often quite a
bit more complicated.  See for example the split-order list of Shalev
and Shavit, along with Desnoyers's RCU-protected extension in the
userspace RCU library.

But it is usually -way- better to just choose a good hash function and
to increase the number of buckets.  Which is of course one reason for
having resizable hash tables.  ;-)

But the other techniques can be useful in more complex linked data
structures, such as graphs, where there is no reasonable way to
partition the data.  Nevertheless, many people choose to do the
partitioning anyway, especially on distributed systems.

> === Question 3 ===
> The Data Structures chapter also discusses other resizable hash
> tables, notably "Resizable, scalable, concurrent hash tables via
> relativistic programming" by Josh Triplett, which reduces the memory
> footprint by using a single set of pointers rather than a pair. But
> my understanding is that hash_resize.c is unique in that it allows
> you to rebuild the hash table using a different hash function, which
> could be very useful in practice (e.g., to prevent DDoS attacks).
> Other solutions do not share this property. Is my understanding
> correct? Did I miss any discussion of this topic in perfbook?

Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
(which I use in hash_resize.c) is the only one allowing arbitrary changes
to hash functions.  I expect that this advantage will become increasingly
important as security issues become more challenging.  Furthermore, I
suspect that the pointer-pair approach is faster and more scalable.
It is certainly simpler.

On the other hand, one advantage of the other two approaches is decreased
memory consumption.

Another advantage of Josh Triplett's pointer-unzip approach is that
concurrent updates are (in theory, anyway) not blocked for as long
by resize operations.  The other edge of this sword is that resizing
is much slower, given the need to wait for many RCU grace periods.

Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
and Shavit's split-order list is that all operations are non-blocking,
which can be important on massively overloaded systems, such as one
might find in cloud computing.

> === Question 4 ===
> In the current implementation of hash_resize.c, the perftest_resize
> thread can block an updater, and vice versa. This does not seem to be
> what we want. Ideally, the two should be allowed to run concurrently,
> or at least the perftest_resize thread should have lower priority so
> that updaters are never blocked by it. Is that right? I'm very
> interested in helping to improve this. Please let me know if you have
> any suggestions.

In hash_resize.c, an updater is blocked only for the time required to
redisposition a bucket.  This is a great improvement over blocking
updaters for the full resize over all buckets.

But yes, it is not hard to do better, for example, periodically dropping
the old-table lock in hashtab_resize().  This requires a few careful
adjustments, of course.  Can you tell me what these adjustments are?
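
(For concreteness, the lock-dropping idea might be sketched as
follows, with resize_one_bucket() as a hypothetical helper and
->ht_lock assumed to be the lock serializing resize operations; the
revalidation called out in the comment is where those careful
adjustments come in:)

	for (i = 0; i < htp->ht_nbuckets; i++) {
		resize_one_bucket(htp, htp_new, i);
		if ((i & 0xff) == 0xff) {
			spin_unlock(&htp_master->ht_lock);
			/* Updaters -- and in principle a second resize
			 * operation -- can run here, so any state read
			 * before the unlock must be revalidated after
			 * the lock is reacquired. */
			spin_lock(&htp_master->ht_lock);
		}
	}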

Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
optimizing for the race with hashtab_resize() doesn't make a whole lot
of sense.  Please see the patch below.  Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 737646a9c868d841b32199b52f5569668975953e
Author: Paul E. McKenney <paulmck@linux.ibm.com>
Date:   Mon Jan 7 10:29:14 2019 -0800

    datastruct/hash: Simplify hashtab_lookup()
    
    Because resizing leaves the old hash table intact, and because lookups
    are carried out within RCU read-side critical sections (which prevent
    a second resizing operation from starting), there is no need for a
    lookup to search anywhere but in the old hash table.  And in the common
    case, there is no resize, so there is no new hash table.  Therefore,
    eliminating the check for resizing speeds things up in the common
    case.  In addition, this simplifies the code.
    
    This commit therefore eliminates the ht_get_bucket() function,
    renames the ht_get_bucket_single() function to ht_get_bucket(),
    and modifies callers appropriately.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
index 29e05f907200..be4157959b83 100644
--- a/CodeSamples/datastruct/hash/hash_resize.c
+++ b/CodeSamples/datastruct/hash/hash_resize.c
@@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
 //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
 /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
 static struct ht_bucket *				//\lnlbl{single:b}
-ht_get_bucket_single(struct ht *htp, void *key, long *b,
-                     unsigned long *h)
+ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
 {
 	unsigned long hash = htp->ht_gethash(key);
 
@@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
 		*h = hash;				//\lnlbl{single:h}
 	return &htp->ht_bkt[*b];			//\lnlbl{single:return}
 }							//\lnlbl{single:e}
-
-/* Get hash bucket correesponding to key, accounting for resize. */
-static struct ht_bucket *				//\lnlbl{b}
-ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
-{
-	struct ht_bucket *htbp;
-
-	htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
-								//\fcvexclude
-	if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {	//\lnlbl{resized}
-		smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
-		*htp = rcu_dereference((*htp)->ht_new);	//\lnlbl{newtable}
-		htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket}
-	}
-	if (i)						//\lnlbl{chk_i}
-		*i = (*htp)->ht_idx;			//\lnlbl{set_idx}
-	return htbp;					//\lnlbl{return}
-}							//\lnlbl{e}
 //\end{snippet}
 
 /* Read-side lock/unlock functions. */
@@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
 
 	rcu_read_lock();				//\lnlbl{l:rcu_lock}
 	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{l:refhashtbl}
-	htbp = ht_get_bucket_single(htp, key, &b, &h);	//\lnlbl{l:refbucket}
+	htbp = ht_get_bucket(htp, key, &b, &h);		//\lnlbl{l:refbucket}
 	spin_lock(&htbp->htb_lock);			//\lnlbl{l:acq_bucket}
 	lsp->hbp[0] = htbp;				//\lnlbl{l:lsp0b}
 	lsp->hls_idx[0] = htp->ht_idx;
@@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
 		return;					//\lnlbl{l:fastret1}
 	}
 	htp = rcu_dereference(htp->ht_new);		//\lnlbl{l:new_hashtbl}
-	htbp = ht_get_bucket_single(htp, key, &b, &h);	//\lnlbl{l:get_newbkt}
+	htbp = ht_get_bucket(htp, key, &b, &h);		//\lnlbl{l:get_newbkt}
 	spin_lock(&htbp->htb_lock);			//\lnlbl{l:acq_newbkt}
 	lsp->hbp[1] = htbp;				//\lnlbl{l:lsp1b}
 	lsp->hls_idx[1] = htp->ht_idx;
@@ -223,16 +204,15 @@ struct ht_elem *					//\lnlbl{lkp:b}
 hashtab_lookup(struct hashtab *htp_master, void *key)
 {
 	long b;
-	int i;
 	struct ht *htp;
 	struct ht_elem *htep;
 	struct ht_bucket *htbp;
 
 	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{lkp:get_curtbl}
-	htbp = ht_get_bucket(&htp, key, &b, &i);	//\lnlbl{lkp:get_curbkt}
+	htbp = ht_get_bucket(htp, key, &b, NULL);	//\lnlbl{lkp:get_curbkt}
 	cds_list_for_each_entry_rcu(htep,		//\lnlbl{lkp:loop:b}
 	                            &htbp->htb_head,
-	                            hte_next[i]) {
+	                            hte_next[htp->ht_idx]) {
 		if (htp->ht_cmp(htep, key)) 		//\lnlbl{lkp:match}
 			return htep;			//\lnlbl{lkp:ret_match}
 	}						//\lnlbl{lkp:loop:e}
@@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master,
 		htbp = &htp->ht_bkt[i];			//\lnlbl{get_oldcur}
 		spin_lock(&htbp->htb_lock);		//\lnlbl{acq_oldcur}
 		cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b}
-			htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL);
+			htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL);
 			spin_lock(&htbp_new->htb_lock);
 			cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
 			spin_unlock(&htbp_new->htb_lock);
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index 5c61bf5e2389..0152437c274e 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -966,10 +966,8 @@ the old table.
 \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
 Bucket selection is shown in
 Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
-which shows \co{ht_get_bucket_single()} on
-lines~\lnref{single:b}-\lnref{single:e} and
-\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}.
-The \co{ht_get_bucket_single()} function returns a reference to the bucket
+which shows \co{ht_get_bucket()}.
+This function returns a reference to the bucket
 corresponding to the specified key in the specified hash table, without
 making any allowances for resizing.
 It also stores the bucket index corresponding to the key into the location
@@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding
 hash value corresponding to the key into the location
 referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
 Line~\lnref{single:return} then returns a reference to the corresponding bucket.
-
-The \co{ht_get_bucket()} function handles hash-table selection, invoking
-\co{ht_get_bucket_single()} on
-line~\lnref{call_single} to select the bucket
-corresponding to the hash in the current
-hash table, storing the hash value through parameter~\co{b}.
-If line~\lnref{resized} determines that the table is being resized and that
-line~\lnref{call_single}'s bucket has already been distributed across the new hash
-table, then line~\lnref{newtable} selects the new hash table and
-line~\lnref{newbucket}
-selects the bucket corresponding to the hash in the new hash table,
-again storing the hash value through parameter~\co{b}.
-\end{lineref}
-
-\QuickQuiz{}
-	The code in
-	Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
-	computes the hash twice!
-	Why this blatant inefficiency?
-\QuickQuizAnswer{
-	The reason is that the old and new hash tables might have
-	completely different hash functions, so that a hash computed
-	for the old table might be completely irrelevant to the
-	new table.
-} \QuickQuizEnd
-
-\begin{lineref}[ln:datastruct:hash_resize:get_bucket]
-If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then
-line~\lnref{set_idx} stores the pointer-set index for the selected hash table.
-Finally, line~\lnref{return} returns a reference to the selected hash bucket.
 \end{lineref}
 
 \QuickQuiz{}
@@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket.
 	functions described next.
 } \QuickQuizEnd
 
-This implementation of
-\co{ht_get_bucket_single()} and \co{ht_get_bucket()}
-permit lookups and modifications to run concurrently
-with a resize operation.
+This implementation of \co{ht_get_bucket()} permits lookups and
+modifications to run concurrently with a resize operation.
 
 \begin{listing}[tb]
 \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv}
@@ -1129,11 +1095,6 @@ hash lookups.
 Line~\lnref{get_curtbl} fetches the current hash table and
 line~\lnref{get_curbkt} obtains a reference
 to the bucket corresponding to the specified key.
-This bucket will be located in a new resized hash table when a
-resize operation has progressed past the bucket in the old hash
-table that contained the desired data element.
-Note that line~\lnref{get_curbkt} also passes back the index that will be
-used to select the correct set of pointers from the pair in each element.
 The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
 so that if line~\lnref{match}
 detects a match,
@@ -1144,22 +1105,17 @@ failure.
 \end{lineref}
 
 \QuickQuiz{}
-	In the \co{hashtab_lookup()} function in
-	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions},
-	the code carefully finds the right bucket in the new hash table
-	if the element to be looked up has already been distributed
-	by a concurrent resize operation.
-	This seems wasteful for RCU-protected lookups.
-	Why not just stick with the old hash table in this case?
+	\begin{lineref}[ln:datastruct:hash_resize:access:lkp]
+	What if execution reaches line~\lnref{loop:b}
+	of \co{hashtab_lookup()} in
+	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
+	just after this bucket has been resized.
+	Won't that result in lookup failures?
+	\end{lineref}
 \QuickQuizAnswer{
-	Suppose that a resize operation begins and distributes half of
-	the old table's buckets to the new table.
-	Suppose further that a thread adds a new element that goes into
-	one of the already-distributed buckets, and that this same thread
-	now looks up this newly added element.
-	If lookups unconditionally traversed only the old hash table,
-	this thread would get a lookup failure for the element that it
-	just added, which certainly sounds like a bug to me!
+	No, it won't.
+	Resizing into the new hash table leaves the old hash table
+	intact, courtesy of the pointer pairs.
 } \QuickQuizEnd
 
 \begin{lineref}[ln:datastruct:hash_resize:access:add]



* Re: Question regarding hash_resize
From: Akira Yokosawa @ 2019-01-07 22:54 UTC
  To: Paul E. McKenney; +Cc: Junchang Wang, perfbook, Akira Yokosawa

Hi Paul, 

On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote:
> On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
>> Hi all,
>>
>> I've been reading hash_resize recently and have a few questions about
>> this algorithm. Please take a look when you have time. Any suggestions
>> are warmly welcome.
>>
>> === Question 1 ===
>> In hash_resize.c : hashtab_lock_mod
>> 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
>> 187                 lsp->hbp[1] = NULL;
>> 188                 return;
>> 189         }
>> 190         htp = rcu_dereference(htp->ht_new);
>>
>> It seems we are missing a barrier (e.g., smp_mb()) between lines 189
>> and 190, because neither READ_ONCE() nor rcu_dereference() prevents
>> compilers and hardware from reordering accesses to the two unrelated
>> variables ht_resize_cur and ht_new. Is my understanding correct?
> 
> Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
> section

You mean "rcu_read_lock() at the beginning of hashtab_lock_mod() starts
an RCU read-side critical section", don't you?

>         and there is a synchronize_rcu() between the update to ->ht_new
> and the updates to ->ht_resize_cur.  For more details on how this works,
> please see https://lwn.net/Articles/573497/.
> 
> Of course, if you find a code path in which a call to hashtab_lock_mod()
> is invoked outside of an RCU read-side critical section, that would be
> a bug.  (Can you tell me an exception to this rule, that is, a case
> where hashtab_lock_mod() could safely be invoked outside of an RCU
> read-side critical section?)
> 
>> === Question 2 ===
>> In hash_resize.c, each time an updater wants to access a bucket, the
>> updater must first acquire the bucket's lock (htb_lock), preventing
>> other updaters from accessing the same bucket concurrently. This
>> approach is fine if a bucket's linked list is relatively short, but on
>> a larger system where the linked lists are long and the
>> perftest_resize thread is running simultaneously, it could become a
>> performance bottleneck. One naive solution is to allow multiple
>> updaters to access the same bucket, provided they don't operate on
>> the same item in the bucket's list. I wonder whether there is any
>> existing work or discussion on this topic.
> 
> One approach is to use a hashed array of locks, and to hash a given
> element's address to locate the lock to be used.  Please see
> Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
> Needed Locks First"), including Quick Quiz 7.9, for additional details.
> 
> Another approach is to use RCU to protect traversals, and locks within the
> linked-list elements themselves.  These locks are conditionally acquired
> (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
> them in list order and by the tricks in Quick Quiz 7.9.
> 
> Non-blocking synchronization can also be used, but it is often quite a
> bit more complicated.  See for example the split-order list of Shalev
> and Shavit, along with Desnoyers's RCU-protected extension in the
> userspace RCU library.
> 
> But it is usually -way- better to just choose a good hash function and
> to increase the number of buckets.  Which is of course one reason for
> having resizable hash tables.  ;-)
> 
> But the other techniques can be useful in more complex linked data
> structures, such as graphs, where there is no reasonable way to
> partition the data.  Nevertheless, many people choose to do the
> partitioning anyway, especially on distributed systems.
> 
>> === Question 3 ===
>> The Data Structures chapter also discusses other resizable hash
>> tables, notably "Resizable, scalable, concurrent hash tables via
>> relativistic programming" by Josh Triplett, which reduces the memory
>> footprint by using a single set of pointers rather than a pair. But
>> my understanding is that hash_resize.c is unique in that it allows
>> you to rebuild the hash table using a different hash function, which
>> could be very useful in practice (e.g., to prevent DDoS attacks).
>> Other solutions do not share this property. Is my understanding
>> correct? Did I miss any discussion of this topic in perfbook?
> 
> Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
> (which I use in hash_resize.c) is the only one allowing arbitrary changes
> to hash functions.  I expect that this advantage will become increasingly
> important as security issues become more challenging.  Furthermore, I
> suspect that the pointer-pair approach is faster and more scalable.
> It is certainly simpler.
> 
> On the other hand, one advantage of the other two approaches is decreased
> memory consumption.
> 
> Another advantage of Josh Triplett's pointer-unzip approach is that
> concurrent updates are (in theory, anyway) not blocked for as long
> by resize operations.  The other edge of this sword is that resizing
> is much slower, given the need to wait for many RCU grace periods.
> 
> Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
> and Shavit's split-order list is that all operations are non-blocking,
> which can be important on massively overloaded systems, such as one
> might find in cloud computing.
> 
>> === Question 4 ===
>> In the current implementation of hash_resize.c, the perftest_resize
>> thread can block an updater, and vice versa. This does not seem to be
>> what we want. Ideally, the two should be allowed to run concurrently,
>> or at least the perftest_resize thread should have lower priority so
>> that updaters are never blocked by it. Is that right? I'm very
>> interested in helping to improve this. Please let me know if you have
>> any suggestions.
> 
> In hash_resize.c, an updater is blocked only for the time required to
> redisposition a bucket.  This is a great improvement over blocking
> updaters for the full resize over all buckets.
> 
> But yes, it is not hard to do better, for example, periodically dropping
> the old-table lock in hashtab_resize().  This requires a few careful
> adjustments, of course.  Can you tell me what these adjustments are?
> 
> Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
> optimizing for the race with hashtab_resize() doesn't make a whole lot
> of sense.  Please see the patch below.  Thoughts?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 737646a9c868d841b32199b52f5569668975953e
> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> Date:   Mon Jan 7 10:29:14 2019 -0800
> 
>     datastruct/hash: Simplify hashtab_lookup()
>     
>     Because resizing leaves the old hash table intact, and because lookups
>     are carried out within RCU read-side critical sections (which prevent
>     a second resizing operation from starting), there is no need for a
>     lookup to search anywhere but in the old hash table.  And in the common
>     case, there is no resize, so there is no new hash table.  Therefore,
>     eliminating the check for resizing speeds things up in the common
>     case.  In addition, this simplifies the code.
>     
>     This commit therefore eliminates the ht_get_bucket() function,
>     renames the ht_get_bucket_single() function to ht_get_bucket(),
>     and modifies callers appropriately.
>     
>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> 
> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> index 29e05f907200..be4157959b83 100644
> --- a/CodeSamples/datastruct/hash/hash_resize.c
> +++ b/CodeSamples/datastruct/hash/hash_resize.c
> @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
>  //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
>  /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
>  static struct ht_bucket *				//\lnlbl{single:b}
> -ht_get_bucket_single(struct ht *htp, void *key, long *b,
> -                     unsigned long *h)
> +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
>  {
>  	unsigned long hash = htp->ht_gethash(key);
>  
> @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
>  		*h = hash;				//\lnlbl{single:h}
>  	return &htp->ht_bkt[*b];			//\lnlbl{single:return}
>  }							//\lnlbl{single:e}
> -
> -/* Get hash bucket correesponding to key, accounting for resize. */
> -static struct ht_bucket *				//\lnlbl{b}
> -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
> -{
> -	struct ht_bucket *htbp;
> -
> -	htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
> -								//\fcvexclude
> -	if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {	//\lnlbl{resized}
> -		smp_mb(); /* order ->ht_resize_cur before ->ht_new. */

If we can remove this memory barrier, the counterpart smp_mb() in
hashtab_resize() becomes unnecessary, doesn't it?

        Thanks, Akira

> -		*htp = rcu_dereference((*htp)->ht_new);	//\lnlbl{newtable}
> -		htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket}
> -	}
> -	if (i)						//\lnlbl{chk_i}
> -		*i = (*htp)->ht_idx;			//\lnlbl{set_idx}
> -	return htbp;					//\lnlbl{return}
> -}							//\lnlbl{e}
>  //\end{snippet}
>  
>  /* Read-side lock/unlock functions. */
> @@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
>  
>  	rcu_read_lock();				//\lnlbl{l:rcu_lock}
>  	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{l:refhashtbl}
> -	htbp = ht_get_bucket_single(htp, key, &b, &h);	//\lnlbl{l:refbucket}
> +	htbp = ht_get_bucket(htp, key, &b, &h);		//\lnlbl{l:refbucket}
>  	spin_lock(&htbp->htb_lock);			//\lnlbl{l:acq_bucket}
>  	lsp->hbp[0] = htbp;				//\lnlbl{l:lsp0b}
>  	lsp->hls_idx[0] = htp->ht_idx;
> @@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
>  		return;					//\lnlbl{l:fastret1}
>  	}
>  	htp = rcu_dereference(htp->ht_new);		//\lnlbl{l:new_hashtbl}
> -	htbp = ht_get_bucket_single(htp, key, &b, &h);	//\lnlbl{l:get_newbkt}
> +	htbp = ht_get_bucket(htp, key, &b, &h);		//\lnlbl{l:get_newbkt}
>  	spin_lock(&htbp->htb_lock);			//\lnlbl{l:acq_newbkt}
>  	lsp->hbp[1] = htbp;				//\lnlbl{l:lsp1b}
>  	lsp->hls_idx[1] = htp->ht_idx;
> @@ -223,16 +204,15 @@ struct ht_elem *					//\lnlbl{lkp:b}
>  hashtab_lookup(struct hashtab *htp_master, void *key)
>  {
>  	long b;
> -	int i;
>  	struct ht *htp;
>  	struct ht_elem *htep;
>  	struct ht_bucket *htbp;
>  
>  	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{lkp:get_curtbl}
> -	htbp = ht_get_bucket(&htp, key, &b, &i);	//\lnlbl{lkp:get_curbkt}
> +	htbp = ht_get_bucket(htp, key, &b, NULL);	//\lnlbl{lkp:get_curbkt}
>  	cds_list_for_each_entry_rcu(htep,		//\lnlbl{lkp:loop:b}
>  	                            &htbp->htb_head,
> -	                            hte_next[i]) {
> +	                            hte_next[htp->ht_idx]) {
>  		if (htp->ht_cmp(htep, key)) 		//\lnlbl{lkp:match}
>  			return htep;			//\lnlbl{lkp:ret_match}
>  	}						//\lnlbl{lkp:loop:e}
> @@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master,
>  		htbp = &htp->ht_bkt[i];			//\lnlbl{get_oldcur}
>  		spin_lock(&htbp->htb_lock);		//\lnlbl{acq_oldcur}
>  		cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b}
> -			htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL);
> +			htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL);
>  			spin_lock(&htbp_new->htb_lock);
>  			cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
>  			spin_unlock(&htbp_new->htb_lock);
> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> index 5c61bf5e2389..0152437c274e 100644
> --- a/datastruct/datastruct.tex
> +++ b/datastruct/datastruct.tex
> @@ -966,10 +966,8 @@ the old table.
>  \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
>  Bucket selection is shown in
>  Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
> -which shows \co{ht_get_bucket_single()} on
> -lines~\lnref{single:b}-\lnref{single:e} and
> -\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}.
> -The \co{ht_get_bucket_single()} function returns a reference to the bucket
> +which shows \co{ht_get_bucket()}.
> +This function returns a reference to the bucket
>  corresponding to the specified key in the specified hash table, without
>  making any allowances for resizing.
>  It also stores the bucket index corresponding to the key into the location
> @@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding
>  hash value corresponding to the key into the location
>  referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
>  Line~\lnref{single:return} then returns a reference to the corresponding bucket.
> -
> -The \co{ht_get_bucket()} function handles hash-table selection, invoking
> -\co{ht_get_bucket_single()} on
> -line~\lnref{call_single} to select the bucket
> -corresponding to the hash in the current
> -hash table, storing the hash value through parameter~\co{b}.
> -If line~\lnref{resized} determines that the table is being resized and that
> -line~\lnref{call_single}'s bucket has already been distributed across the new hash
> -table, then line~\lnref{newtable} selects the new hash table and
> -line~\lnref{newbucket}
> -selects the bucket corresponding to the hash in the new hash table,
> -again storing the hash value through parameter~\co{b}.
> -\end{lineref}
> -
> -\QuickQuiz{}
> -	The code in
> -	Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
> -	computes the hash twice!
> -	Why this blatant inefficiency?
> -\QuickQuizAnswer{
> -	The reason is that the old and new hash tables might have
> -	completely different hash functions, so that a hash computed
> -	for the old table might be completely irrelevant to the
> -	new table.
> -} \QuickQuizEnd
> -
> -\begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> -If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then
> -line~\lnref{set_idx} stores the pointer-set index for the selected hash table.
> -Finally, line~\lnref{return} returns a reference to the selected hash bucket.
>  \end{lineref}
>  
>  \QuickQuiz{}
> @@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket.
>  	functions described next.
>  } \QuickQuizEnd
>  
> -This implementation of
> -\co{ht_get_bucket_single()} and \co{ht_get_bucket()}
> -permit lookups and modifications to run concurrently
> -with a resize operation.
> +This implementation of \co{ht_get_bucket()} permits lookups and
> +modifications to run concurrently with a resize operation.
>  
>  \begin{listing}[tb]
>  \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv}
> @@ -1129,11 +1095,6 @@ hash lookups.
>  Line~\lnref{get_curtbl} fetches the current hash table and
>  line~\lnref{get_curbkt} obtains a reference
>  to the bucket corresponding to the specified key.
> -This bucket will be located in a new resized hash table when a
> -resize operation has progressed past the bucket in the old hash
> -table that contained the desired data element.
> -Note that line~\lnref{get_curbkt} also passes back the index that will be
> -used to select the correct set of pointers from the pair in each element.
>  The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
>  so that if line~\lnref{match}
>  detects a match,
> @@ -1144,22 +1105,17 @@ failure.
>  \end{lineref}
>  
>  \QuickQuiz{}
> -	In the \co{hashtab_lookup()} function in
> -	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions},
> -	the code carefully finds the right bucket in the new hash table
> -	if the element to be looked up has already been distributed
> -	by a concurrent resize operation.
> -	This seems wasteful for RCU-protected lookups.
> -	Why not just stick with the old hash table in this case?
> +	\begin{lineref}[ln:datastruct:hash_resize:access:lkp]
> +	What if execution reaches line~\lnref{loop:b}
> +	of \co{hashtab_lookup()} in
> +	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
> +	just after this bucket has been resized.
> +	Won't that result in lookup failures?
> +	\end{lineref}
>  \QuickQuizAnswer{
> -	Suppose that a resize operation begins and distributes half of
> -	the old table's buckets to the new table.
> -	Suppose further that a thread adds a new element that goes into
> -	one of the already-distributed buckets, and that this same thread
> -	now looks up this newly added element.
> -	If lookups unconditionally traversed only the old hash table,
> -	this thread would get a lookup failure for the element that it
> -	just added, which certainly sounds like a bug to me!
> +	No, it won't.
> +	Resizing into the new hash table leaves the old hash table
> +	intact, courtesy of the pointer pairs.
>  } \QuickQuizEnd
>  
>  \begin{lineref}[ln:datastruct:hash_resize:access:add]
> 



* Re: Question regarding hash_resize
From: Akira Yokosawa @ 2019-01-07 23:06 UTC
  To: Paul E. McKenney; +Cc: Junchang Wang, perfbook, Akira Yokosawa

On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
> Hi Paul, 
> 
> On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote:
>> On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
>>> Hi all,
>>>
>>> I've been reading hash_resize recently and have a few questions about
>>> this algorithm. Please take a look when you have time. Any suggestions
>>> are warmly welcome.
>>>
>>> === Question 1 ===
>>> In hash_resize.c : hashtab_lock_mod
>>> 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
>>> 187                 lsp->hbp[1] = NULL;
>>> 188                 return;
>>> 189         }
>>> 190         htp = rcu_dereference(htp->ht_new);
>>>
>>> It seems we are missing a barrier (e.g., smp_mb()) between lines 189
>>> and 190, because neither READ_ONCE() nor rcu_dereference() prevents
>>> compilers and hardware from reordering accesses to the two unrelated
>>> variables ht_resize_cur and ht_new. Is my understanding correct?
>>
>> Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
>> section
> 
> You mean "rcu_read_lock() at the beginning of hashtab_lock_mod() starts
> an RCU read-side critical section", don't you?
> 
>>         and there is a synchronize_rcu() between the update to ->ht_new
>> and the updates to ->ht_resize_cur.  For more details on how this works,
>> please see https://lwn.net/Articles/573497/.
>>
>> Of course, if you find a code path in which a call to hashtab_lock_mod()
>> is invoked outside of an RCU read-side critical section, that would be
>> a bug.  (Can you tell me an exception to this rule, that is, a case
>> where hashtab_lock_mod() could safely be invoked outside of an RCU
>> read-side critical section?)
>>
>>> === Question 2 ===
>>> In hash_resize.c, each time an updater wants to access a bucket, the
>>> updater must first acquire the bucket's lock (htb_lock), preventing
>>> other updaters from accessing the same bucket concurrently. This
>>> approach is fine if a bucket's linked list is relatively short, but on
>>> a larger system where the linked lists are long and the
>>> perftest_resize thread is running simultaneously, it could become a
>>> performance bottleneck. One naive solution is to allow multiple
>>> updaters to access the same bucket, provided they don't operate on
>>> the same item in the bucket's list. I wonder whether there is any
>>> existing work or discussion on this topic.
>>
>> One approach is to use a hashed array of locks, and to hash a given
>> element's address to locate the lock to be used.  Please see
>> Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
>> Needed Locks First"), including Quick Quiz 7.9, for additional details.
>>
>> Another approach is to use RCU to protect traversals, and locks within the
>> linked-list elements themselves.  These locks are conditionally acquired
>> (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
>> them in list order and by the tricks in Quick Quiz 7.9.
>>
>> Non-blocking synchronization can also be used, but it is often quite a
>> bit more complicated.  See for example the split-order list of Shalev
>> and Shavit, along with Desnoyers's RCU-protected extension in the
>> userspace RCU library.
>>
>> But it is usually -way- better to just choose a good hash function and
>> to increase the number of buckets.  Which is of course one reason for
>> having resizable hash tables.  ;-)
>>
>> But the other techniques can be useful in more complex linked data
>> structures, such as graphs, where there is no reasonable way to
>> partition the data.  Nevertheless, many people choose to do the
>> partitioning anyway, especially on distributed systems.
>>
>>> === Question 3 ===
>>> The Data Structures chapter also discusses other resizable hash
>>> tables, notably "Resizable, scalable, concurrent hash tables via
>>> relativistic programming" by Josh Triplett, which reduces the memory
>>> footprint by using a single set of pointers rather than a pair. But
>>> my understanding is that hash_resize.c is unique in that it allows
>>> you to rebuild the hash table using a different hash function, which
>>> could be very useful in practice (e.g., to prevent DDoS attacks).
>>> Other solutions do not share this property. Is my understanding
>>> correct? Did I miss any discussion of this topic in perfbook?
>>
>> Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
>> (which I use in hash_resize.c) is the only one allowing arbitrary changes
>> to hash functions.  I expect that this advantage will become increasingly
>> important as security issues become more challenging.  Furthermore, I
>> suspect that the pointer-pair approach is faster and more scalable.
>> It is certainly simpler.
>>
>> On the other hand, one advantage of the other two approaches is decreased
>> memory consumption.
>>
>> Another advantage of Josh Triplett's pointer-unzip approach is that
>> concurrent updates are (in theory, anyway) not blocked for as long
>> by resize operations.  The other edge of this sword is that resizing
>> is much slower, given the need to wait for many RCU grace periods.
>>
>> Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
>> and Shavit's split-order list is that all operations are non-blocking,
>> which can be important on massively overloaded systems, such as one
>> might find in cloud computing.
>>
>>> === Question 4 ===
>>> In the current implementation of hash_resize.c, the perftest_resize
>>> thread can block an updater, and vice versa. This does not seem to be
>>> what we want. Ideally, the two should be allowed to run concurrently,
>>> or at least the perftest_resize thread should have lower priority so
>>> that updaters are never blocked by it. Is that right? I'm very
>>> interested in helping to improve this. Please let me know if you have
>>> any suggestions.
>>
>> In hash_resize.c, an updater is blocked only for the time required to
>> redisposition a bucket.  This is a great improvement over blocking
>> updaters for the full resize over all buckets.
>>
>> But yes, it is not hard to do better, for example, periodically dropping
>> the old-table lock in hashtab_resize().  This requires a few careful
>> adjustments, of course.  Can you tell me what these adjustments are?
>>
>> Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
>> optimizing for the race with hashtab_resize() doesn't make a whole lot
>> of sense.  Please see the patch below.  Thoughts?
>>
>> 							Thanx, Paul
>>
>> ------------------------------------------------------------------------
>>
>> commit 737646a9c868d841b32199b52f5569668975953e
>> Author: Paul E. McKenney <paulmck@linux.ibm.com>
>> Date:   Mon Jan 7 10:29:14 2019 -0800
>>
>>     datastruct/hash: Simplify hashtab_lookup()
>>     
>>     Because resizing leaves the old hash table intact, and because lookups
>>     are carried out within RCU read-side critical sections (which prevent
>>     a second resizing operation from starting), there is no need for a
>>     lookup to search anywhere but in the old hash table.  And in the common
>>     case, there is no resize, so there is no new hash table.  Therefore,
>>     eliminating the check for resizing speeds things up in the common
>>     case.  In addition, this simplifies the code.
>>     
>>     This commit therefore eliminates the ht_get_bucket() function,
>>     renames the ht_get_bucket_single() function to ht_get_bucket(),
>>     and modifies callers appropriately.
>>     
>>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
>>
>> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
>> index 29e05f907200..be4157959b83 100644
>> --- a/CodeSamples/datastruct/hash/hash_resize.c
>> +++ b/CodeSamples/datastruct/hash/hash_resize.c
>> @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
>>  //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
>>  /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
>>  static struct ht_bucket *				//\lnlbl{single:b}
>> -ht_get_bucket_single(struct ht *htp, void *key, long *b,
>> -                     unsigned long *h)
>> +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
>>  {
>>  	unsigned long hash = htp->ht_gethash(key);
>>  
>> @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
>>  		*h = hash;				//\lnlbl{single:h}
>>  	return &htp->ht_bkt[*b];			//\lnlbl{single:return}
>>  }							//\lnlbl{single:e}
>> -
>> -/* Get hash bucket correesponding to key, accounting for resize. */
>> -static struct ht_bucket *				//\lnlbl{b}
>> -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
>> -{
>> -	struct ht_bucket *htbp;
>> -
>> -	htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
>> -								//\fcvexclude
>> -	if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {	//\lnlbl{resized}
>> -		smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
> 
> If we can remove this memory barrier, the counterpart smp_mb() in
> hashtab_resize() becomes unnecessary, doesn't it?

And the WRITE_ONCE() in the following line.

        Thanks, Akira
> 
>         Thanks, Akira
> 
>> -		*htp = rcu_dereference((*htp)->ht_new);	//\lnlbl{newtable}
>> -		htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket}
>> -	}
>> -	if (i)						//\lnlbl{chk_i}
>> -		*i = (*htp)->ht_idx;			//\lnlbl{set_idx}
>> -	return htbp;					//\lnlbl{return}
>> -}							//\lnlbl{e}
>>  //\end{snippet}
>>  
>>  /* Read-side lock/unlock functions. */
>> @@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
>>  
>>  	rcu_read_lock();				//\lnlbl{l:rcu_lock}
>>  	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{l:refhashtbl}
>> -	htbp = ht_get_bucket_single(htp, key, &b, &h);	//\lnlbl{l:refbucket}
>> +	htbp = ht_get_bucket(htp, key, &b, &h);		//\lnlbl{l:refbucket}
>>  	spin_lock(&htbp->htb_lock);			//\lnlbl{l:acq_bucket}
>>  	lsp->hbp[0] = htbp;				//\lnlbl{l:lsp0b}
>>  	lsp->hls_idx[0] = htp->ht_idx;
>> @@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
>>  		return;					//\lnlbl{l:fastret1}
>>  	}
>>  	htp = rcu_dereference(htp->ht_new);		//\lnlbl{l:new_hashtbl}
>> -	htbp = ht_get_bucket_single(htp, key, &b, &h);	//\lnlbl{l:get_newbkt}
>> +	htbp = ht_get_bucket(htp, key, &b, &h);		//\lnlbl{l:get_newbkt}
>>  	spin_lock(&htbp->htb_lock);			//\lnlbl{l:acq_newbkt}
>>  	lsp->hbp[1] = htbp;				//\lnlbl{l:lsp1b}
>>  	lsp->hls_idx[1] = htp->ht_idx;
>> @@ -223,16 +204,15 @@ struct ht_elem *					//\lnlbl{lkp:b}
>>  hashtab_lookup(struct hashtab *htp_master, void *key)
>>  {
>>  	long b;
>> -	int i;
>>  	struct ht *htp;
>>  	struct ht_elem *htep;
>>  	struct ht_bucket *htbp;
>>  
>>  	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{lkp:get_curtbl}
>> -	htbp = ht_get_bucket(&htp, key, &b, &i);	//\lnlbl{lkp:get_curbkt}
>> +	htbp = ht_get_bucket(htp, key, &b, NULL);	//\lnlbl{lkp:get_curbkt}
>>  	cds_list_for_each_entry_rcu(htep,		//\lnlbl{lkp:loop:b}
>>  	                            &htbp->htb_head,
>> -	                            hte_next[i]) {
>> +	                            hte_next[htp->ht_idx]) {
>>  		if (htp->ht_cmp(htep, key)) 		//\lnlbl{lkp:match}
>>  			return htep;			//\lnlbl{lkp:ret_match}
>>  	}						//\lnlbl{lkp:loop:e}
>> @@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master,
>>  		htbp = &htp->ht_bkt[i];			//\lnlbl{get_oldcur}
>>  		spin_lock(&htbp->htb_lock);		//\lnlbl{acq_oldcur}
>>  		cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b}
>> -			htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL);
>> +			htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL);
>>  			spin_lock(&htbp_new->htb_lock);
>>  			cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
>>  			spin_unlock(&htbp_new->htb_lock);
>> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
>> index 5c61bf5e2389..0152437c274e 100644
>> --- a/datastruct/datastruct.tex
>> +++ b/datastruct/datastruct.tex
>> @@ -966,10 +966,8 @@ the old table.
>>  \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
>>  Bucket selection is shown in
>>  Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
>> -which shows \co{ht_get_bucket_single()} on
>> -lines~\lnref{single:b}-\lnref{single:e} and
>> -\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}.
>> -The \co{ht_get_bucket_single()} function returns a reference to the bucket
>> +which shows \co{ht_get_bucket()}.
>> +This function returns a reference to the bucket
>>  corresponding to the specified key in the specified hash table, without
>>  making any allowances for resizing.
>>  It also stores the bucket index corresponding to the key into the location
>> @@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding
>>  hash value corresponding to the key into the location
>>  referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
>>  Line~\lnref{single:return} then returns a reference to the corresponding bucket.
>> -
>> -The \co{ht_get_bucket()} function handles hash-table selection, invoking
>> -\co{ht_get_bucket_single()} on
>> -line~\lnref{call_single} to select the bucket
>> -corresponding to the hash in the current
>> -hash table, storing the hash value through parameter~\co{b}.
>> -If line~\lnref{resized} determines that the table is being resized and that
>> -line~\lnref{call_single}'s bucket has already been distributed across the new hash
>> -table, then line~\lnref{newtable} selects the new hash table and
>> -line~\lnref{newbucket}
>> -selects the bucket corresponding to the hash in the new hash table,
>> -again storing the hash value through parameter~\co{b}.
>> -\end{lineref}
>> -
>> -\QuickQuiz{}
>> -	The code in
>> -	Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
>> -	computes the hash twice!
>> -	Why this blatant inefficiency?
>> -\QuickQuizAnswer{
>> -	The reason is that the old and new hash tables might have
>> -	completely different hash functions, so that a hash computed
>> -	for the old table might be completely irrelevant to the
>> -	new table.
>> -} \QuickQuizEnd
>> -
>> -\begin{lineref}[ln:datastruct:hash_resize:get_bucket]
>> -If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then
>> -line~\lnref{set_idx} stores the pointer-set index for the selected hash table.
>> -Finally, line~\lnref{return} returns a reference to the selected hash bucket.
>>  \end{lineref}
>>  
>>  \QuickQuiz{}
>> @@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket.
>>  	functions described next.
>>  } \QuickQuizEnd
>>  
>> -This implementation of
>> -\co{ht_get_bucket_single()} and \co{ht_get_bucket()}
>> -permit lookups and modifications to run concurrently
>> -with a resize operation.
>> +This implementation of \co{ht_get_bucket()} permits lookups and
>> +modifications to run concurrently with a resize operation.
>>  
>>  \begin{listing}[tb]
>>  \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv}
>> @@ -1129,11 +1095,6 @@ hash lookups.
>>  Line~\lnref{get_curtbl} fetches the current hash table and
>>  line~\lnref{get_curbkt} obtains a reference
>>  to the bucket corresponding to the specified key.
>> -This bucket will be located in a new resized hash table when a
>> -resize operation has progressed past the bucket in the old hash
>> -table that contained the desired data element.
>> -Note that line~\lnref{get_curbkt} also passes back the index that will be
>> -used to select the correct set of pointers from the pair in each element.
>>  The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
>>  so that if line~\lnref{match}
>>  detects a match,
>> @@ -1144,22 +1105,17 @@ failure.
>>  \end{lineref}
>>  
>>  \QuickQuiz{}
>> -	In the \co{hashtab_lookup()} function in
>> -	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions},
>> -	the code carefully finds the right bucket in the new hash table
>> -	if the element to be looked up has already been distributed
>> -	by a concurrent resize operation.
>> -	This seems wasteful for RCU-protected lookups.
>> -	Why not just stick with the old hash table in this case?
>> +	\begin{lineref}[ln:datastruct:hash_resize:access:lkp]
>> +	What if execution reaches line~\lnref{loop:b}
>> +	of \co{hashtab_lookup()} in
>> +	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
>> +	just after this bucket has been resized.
>> +	Won't that result in lookup failures?
>> +	\end{lineref}
>>  \QuickQuizAnswer{
>> -	Suppose that a resize operation begins and distributes half of
>> -	the old table's buckets to the new table.
>> -	Suppose further that a thread adds a new element that goes into
>> -	one of the already-distributed buckets, and that this same thread
>> -	now looks up this newly added element.
>> -	If lookups unconditionally traversed only the old hash table,
>> -	this thread would get a lookup failure for the element that it
>> -	just added, which certainly sounds like a bug to me!
>> +	No, it won't.
>> +	Resizing into the new hash table leaves the old hash table
>> +	intact, courtesy of the pointer pairs.
>>  } \QuickQuizEnd
>>  
>>  \begin{lineref}[ln:datastruct:hash_resize:access:add]
>>
> 



* Re: Question regarding hash_resize
From: Paul E. McKenney @ 2019-01-07 23:13 UTC
  To: Akira Yokosawa; +Cc: Junchang Wang, perfbook

On Tue, Jan 08, 2019 at 07:54:16AM +0900, Akira Yokosawa wrote:
> Hi Paul, 
> 
> On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote:
> > On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
> >> Hi all,
> >>
> >> I've been reading hash_resize recently and have a few questions about
> >> this algorithm. Please take a look when you have time. Any suggestions
> >> are warmly welcome.
> >>
> >> === Question 1 ===
> >> In hash_resize.c : hashtab_lock_mod
> >> 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
> >> 187                 lsp->hbp[1] = NULL;
> >> 188                 return;
> >> 189         }
> >> 190         htp = rcu_dereference(htp->ht_new);
> >>
> >> It seems we are missing a barrier (e.g., smp_mb()) between lines 189
> >> and 190, because neither READ_ONCE() nor rcu_dereference() prevents
> >> compilers and hardware from reordering accesses to the two unrelated
> >> variables ht_resize_cur and ht_new. Is my understanding correct?
> > 
> > Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
> > section
> 
> You mean "rcu_read_lock() at the beginning of hashtab_lock_mod() starts
> an RCU read-side critical section", don't you?

Indeed I do, good catch!

> >         and there is a synchronize_rcu() between the update to ->ht_new
> > and the updates to ->ht_resize_cur.  For more details on how this works,
> > please see https://lwn.net/Articles/573497/.
> > 
> > Of course, if you find a code path in which a call to hashtab_lock_mod()
> > is invoked outside of an RCU read-side critical section, that would be
> > a bug.  (Can you tell me an exception to this rule, that is, a case
> > where hashtab_lock_mod() could safely be invoked outside of an RCU
> > read-side critical section?)
> > 
> >> === Question 2 ===
> >> In hash_resize.c, each time an updater wants to access a bucket, the
> >> updater must first acquire the bucket's lock (htb_lock), preventing
> >> other updaters from accessing the same bucket concurrently. This
> >> approach is fine if a bucket's linked list is relatively short, but on
> >> a larger system where the linked lists are long and the
> >> perftest_resize thread is running simultaneously, it could become a
> >> performance bottleneck. One naive solution is to allow multiple
> >> updaters to access the same bucket, provided they don't operate on
> >> the same item in the bucket's list. I wonder whether there is any
> >> existing work or discussion on this topic.
> > 
> > One approach is to use a hashed array of locks, and to hash a given
> > element's address to locate the lock to be used.  Please see
> > Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
> > Needed Locks First"), including Quick Quiz 7.9, for additional details.
> > 
> > Another approach is to use RCU to protect traversals, and locks within the
> > linked-list elements themselves.  These locks are conditionally acquired
> > (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
> > them in list order, and the tricks in Quick Quiz 7.9.
> > 
> > Non-blocking synchronization can also be used, but it is often quite a
> > bit more complicated.  See for example the split-order list of Shalev
> > and Shavit, along with Desnoyers's RCU-protected extension in the
> > userspace RCU library.
> > 
> > But it is usually -way- better to just choose a good hash function and
> > to increase the number of buckets.  Which is of course one reason for
> > having resizable hash tables.  ;-)
> > 
> > But the other techniques can be useful in more complex linked data
> > structures, such as graphs, where there is no reasonable way to
> > partition the data.  Nevertheless, many people choose to do the
> > partitioning anyway, especially on distributed systems.
> > 
> >> === Question 3 ===
> >> Chapter Data Structures also discusses other resizable hash tables,
> >> namely "Resizable, scalable, concurrent hash tables via relativistic
> >> programming" from Josh Triplett, which can save memory footprint by
> >> using a single pair of pointers. But my understanding is that
> >> perftest_resize.c is unique in that it allows you to rebuild the hash
> >> table by utilizing a different hash function, which could be very
> >> useful in practice (e.g., to prevent DDoS attack). Other solutions do
> >> not share this property. Is my understanding correct? Did I miss any
> >> discussions on this topic in perfbook?
> > 
> > Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
> > (which I use in hash_resize.c) is the only one allowing arbitrary changes
> > to hash functions.  I expect that this advantage will become increasingly
> > important as security issues become more challenging.  Furthermore, I
> > suspect that the pointer-pair approach is faster and more scalable.
> > It is certainly simpler.
> > 
> > On the other hand, one advantage of the other two approaches is decreased
> > memory consumption.
> > 
> > Another advantage of Josh Triplett's pointer-unzip approach is that
> > concurrent updates are (in theory, anyway) not blocked for as long
> > by resize operations.  The other edge of this sword is that resizing
> > is much slower, given the need to wait for many RCU grace periods.
> > 
> > Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
> > and Shavit's split-order list is that all operations are non-blocking,
> > which can be important on massively overloaded systems, such as one
> > might find in cloud computing.
> > 
> >> === Question 4 ===
> >> In the current implementation of hash_resize.c, the perftest_resize
> >> could block an updater, and vice versa. It seems this is not what we
> >> expected. Ideally, they should be allowed to run concurrently, or at
> >> least the perftest_resize thread should have lower priority and
> >> updaters should never be blocked by the perftest_resize thread. Is
> >> that right? I'm very interested in helping improve. Please let me know
> >> if you have any suggestions.
> > 
> > In hash_resize.c, an updater is blocked only for the time required to
> > redisposition a bucket.  This is a great improvement over blocking
> > updaters for the full resize over all buckets.
> > 
> > But yes, it is not hard to do better, for example, periodically dropping
> > the old-table lock in hashtab_resize().  This requires a few careful
> > adjustments, of course.  Can you tell me what these adjustments are?
> > 
> > Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
> > optimizing for the race with hashtab_resize() doesn't make a whole lot
> > of sense.  Please see the patch below.  Thoughts?
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 737646a9c868d841b32199b52f5569668975953e
> > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > Date:   Mon Jan 7 10:29:14 2019 -0800
> > 
> >     datastruct/hash: Simplify hashtab_lookup()
> >     
> >     Because resizing leaves the old hash table intact, and because lookups
> >     are carried out within RCU read-side critical sections (which prevent
> >     a second resizing operation from starting), there is no need for a
> >     lookup to search anywhere but in the old hash table.  And in the common
> >     case, there is no resize, so there is no new hash table.  Therefore,
> >     eliminating the check for resizing speeds things up in the common
> >     case.  In addition, this simplifies the code.
> >     
> >     This commit therefore eliminates the ht_get_bucket() function,
> >     renames the ht_get_bucket_single() function to ht_get_bucket(),
> >     and modifies callers appropriately.
> >     
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > 
> > diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> > index 29e05f907200..be4157959b83 100644
> > --- a/CodeSamples/datastruct/hash/hash_resize.c
> > +++ b/CodeSamples/datastruct/hash/hash_resize.c
> > @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
> >  //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
> >  /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
> >  static struct ht_bucket *				//\lnlbl{single:b}
> > -ht_get_bucket_single(struct ht *htp, void *key, long *b,
> > -                     unsigned long *h)
> > +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
> >  {
> >  	unsigned long hash = htp->ht_gethash(key);
> >  
> > @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
> >  		*h = hash;				//\lnlbl{single:h}
> >  	return &htp->ht_bkt[*b];			//\lnlbl{single:return}
> >  }							//\lnlbl{single:e}
> > -
> > -/* Get hash bucket correesponding to key, accounting for resize. */
> > -static struct ht_bucket *				//\lnlbl{b}
> > -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
> > -{
> > -	struct ht_bucket *htbp;
> > -
> > -	htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
> > -								//\fcvexclude
> > -	if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {	//\lnlbl{resized}
> > -		smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
> 
> If we can remove this memory barrier, the counterpart smp_mb() in
> hashtab_resize() becomes unnecessary, doesn't it?

Or maybe I need to add a memory barrier to hashtab_lock_mod().  Thoughts?
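
For reference, a minimal sketch of the grace-period argument from Question 1
above, using the hash_resize.c names.  This is an illustration of the
ordering only, not a patch against the actual code:

	/* Resizer side, in hashtab_resize() (sketch): */
	rcu_assign_pointer(htp->ht_new, htp_new);  /* Publish new table. */
	synchronize_rcu();         /* Wait for pre-existing RCU readers. */
	/* ... redistribute a bucket, then, under its lock: ... */
	WRITE_ONCE(htp->ht_resize_cur, i);         /* Claim buckets 0..i. */

	/* Updater side, between rcu_read_lock() and rcu_read_unlock(): */
	if (b <= READ_ONCE(htp->ht_resize_cur)) {
		/* A critical section that observes the store to
		 * ->ht_resize_cur is guaranteed by synchronize_rcu()
		 * to also observe the earlier publication of ->ht_new,
		 * so the dereference below needs no added barrier. */
		htp = rcu_dereference(htp->ht_new);
	}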

							Thanx, Paul

>         Thanks, Akira
> 
> > -		*htp = rcu_dereference((*htp)->ht_new);	//\lnlbl{newtable}
> > -		htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket}
> > -	}
> > -	if (i)						//\lnlbl{chk_i}
> > -		*i = (*htp)->ht_idx;			//\lnlbl{set_idx}
> > -	return htbp;					//\lnlbl{return}
> > -}							//\lnlbl{e}
> >  //\end{snippet}
> >  
> >  /* Read-side lock/unlock functions. */
> > @@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
> >  
> >  	rcu_read_lock();				//\lnlbl{l:rcu_lock}
> >  	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{l:refhashtbl}
> > -	htbp = ht_get_bucket_single(htp, key, &b, &h);	//\lnlbl{l:refbucket}
> > +	htbp = ht_get_bucket(htp, key, &b, &h);		//\lnlbl{l:refbucket}
> >  	spin_lock(&htbp->htb_lock);			//\lnlbl{l:acq_bucket}
> >  	lsp->hbp[0] = htbp;				//\lnlbl{l:lsp0b}
> >  	lsp->hls_idx[0] = htp->ht_idx;
> > @@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
> >  		return;					//\lnlbl{l:fastret1}
> >  	}
> >  	htp = rcu_dereference(htp->ht_new);		//\lnlbl{l:new_hashtbl}
> > -	htbp = ht_get_bucket_single(htp, key, &b, &h);	//\lnlbl{l:get_newbkt}
> > +	htbp = ht_get_bucket(htp, key, &b, &h);		//\lnlbl{l:get_newbkt}
> >  	spin_lock(&htbp->htb_lock);			//\lnlbl{l:acq_newbkt}
> >  	lsp->hbp[1] = htbp;				//\lnlbl{l:lsp1b}
> >  	lsp->hls_idx[1] = htp->ht_idx;
> > @@ -223,16 +204,15 @@ struct ht_elem *					//\lnlbl{lkp:b}
> >  hashtab_lookup(struct hashtab *htp_master, void *key)
> >  {
> >  	long b;
> > -	int i;
> >  	struct ht *htp;
> >  	struct ht_elem *htep;
> >  	struct ht_bucket *htbp;
> >  
> >  	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{lkp:get_curtbl}
> > -	htbp = ht_get_bucket(&htp, key, &b, &i);	//\lnlbl{lkp:get_curbkt}
> > +	htbp = ht_get_bucket(htp, key, &b, NULL);	//\lnlbl{lkp:get_curbkt}
> >  	cds_list_for_each_entry_rcu(htep,		//\lnlbl{lkp:loop:b}
> >  	                            &htbp->htb_head,
> > -	                            hte_next[i]) {
> > +	                            hte_next[htp->ht_idx]) {
> >  		if (htp->ht_cmp(htep, key)) 		//\lnlbl{lkp:match}
> >  			return htep;			//\lnlbl{lkp:ret_match}
> >  	}						//\lnlbl{lkp:loop:e}
> > @@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master,
> >  		htbp = &htp->ht_bkt[i];			//\lnlbl{get_oldcur}
> >  		spin_lock(&htbp->htb_lock);		//\lnlbl{acq_oldcur}
> >  		cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b}
> > -			htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL);
> > +			htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL);
> >  			spin_lock(&htbp_new->htb_lock);
> >  			cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
> >  			spin_unlock(&htbp_new->htb_lock);
> > diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> > index 5c61bf5e2389..0152437c274e 100644
> > --- a/datastruct/datastruct.tex
> > +++ b/datastruct/datastruct.tex
> > @@ -966,10 +966,8 @@ the old table.
> >  \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> >  Bucket selection is shown in
> >  Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
> > -which shows \co{ht_get_bucket_single()} on
> > -lines~\lnref{single:b}-\lnref{single:e} and
> > -\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}.
> > -The \co{ht_get_bucket_single()} function returns a reference to the bucket
> > +which shows \co{ht_get_bucket()}.
> > +This function returns a reference to the bucket
> >  corresponding to the specified key in the specified hash table, without
> >  making any allowances for resizing.
> >  It also stores the bucket index corresponding to the key into the location
> > @@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding
> >  hash value corresponding to the key into the location
> >  referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
> >  Line~\lnref{single:return} then returns a reference to the corresponding bucket.
> > -
> > -The \co{ht_get_bucket()} function handles hash-table selection, invoking
> > -\co{ht_get_bucket_single()} on
> > -line~\lnref{call_single} to select the bucket
> > -corresponding to the hash in the current
> > -hash table, storing the hash value through parameter~\co{b}.
> > -If line~\lnref{resized} determines that the table is being resized and that
> > -line~\lnref{call_single}'s bucket has already been distributed across the new hash
> > -table, then line~\lnref{newtable} selects the new hash table and
> > -line~\lnref{newbucket}
> > -selects the bucket corresponding to the hash in the new hash table,
> > -again storing the hash value through parameter~\co{b}.
> > -\end{lineref}
> > -
> > -\QuickQuiz{}
> > -	The code in
> > -	Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
> > -	computes the hash twice!
> > -	Why this blatant inefficiency?
> > -\QuickQuizAnswer{
> > -	The reason is that the old and new hash tables might have
> > -	completely different hash functions, so that a hash computed
> > -	for the old table might be completely irrelevant to the
> > -	new table.
> > -} \QuickQuizEnd
> > -
> > -\begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> > -If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then
> > -line~\lnref{set_idx} stores the pointer-set index for the selected hash table.
> > -Finally, line~\lnref{return} returns a reference to the selected hash bucket.
> >  \end{lineref}
> >  
> >  \QuickQuiz{}
> > @@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket.
> >  	functions described next.
> >  } \QuickQuizEnd
> >  
> > -This implementation of
> > -\co{ht_get_bucket_single()} and \co{ht_get_bucket()}
> > -permit lookups and modifications to run concurrently
> > -with a resize operation.
> > +This implementation of \co{ht_get_bucket()} permits lookups and
> > +modifications to run concurrently with a resize operation.
> >  
> >  \begin{listing}[tb]
> >  \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv}
> > @@ -1129,11 +1095,6 @@ hash lookups.
> >  Line~\lnref{get_curtbl} fetches the current hash table and
> >  line~\lnref{get_curbkt} obtains a reference
> >  to the bucket corresponding to the specified key.
> > -This bucket will be located in a new resized hash table when a
> > -resize operation has progressed past the bucket in the old hash
> > -table that contained the desired data element.
> > -Note that line~\lnref{get_curbkt} also passes back the index that will be
> > -used to select the correct set of pointers from the pair in each element.
> >  The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
> >  so that if line~\lnref{match}
> >  detects a match,
> > @@ -1144,22 +1105,17 @@ failure.
> >  \end{lineref}
> >  
> >  \QuickQuiz{}
> > -	In the \co{hashtab_lookup()} function in
> > -	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions},
> > -	the code carefully finds the right bucket in the new hash table
> > -	if the element to be looked up has already been distributed
> > -	by a concurrent resize operation.
> > -	This seems wasteful for RCU-protected lookups.
> > -	Why not just stick with the old hash table in this case?
> > +	\begin{lineref}[ln:datastruct:hash_resize:access:lkp]
> > +	What if execution reaches line~\lnref{loop:b}
> > +	of \co{hashtab_lookup()} in
> > +	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
> > +	just after this bucket has been resized.
> > +	Won't that result in lookup failures?
> > +	\end{lineref}
> >  \QuickQuizAnswer{
> > -	Suppose that a resize operation begins and distributes half of
> > -	the old table's buckets to the new table.
> > -	Suppose further that a thread adds a new element that goes into
> > -	one of the already-distributed buckets, and that this same thread
> > -	now looks up this newly added element.
> > -	If lookups unconditionally traversed only the old hash table,
> > -	this thread would get a lookup failure for the element that it
> > -	just added, which certainly sounds like a bug to me!
> > +	No, it won't.
> > +	Resizing into the new hash table leaves the old hash table
> > +	intact, courtesy of the pointer pairs.
> >  } \QuickQuizEnd
> >  
> >  \begin{lineref}[ln:datastruct:hash_resize:access:add]
> > 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-07 23:13     ` Paul E. McKenney
@ 2019-01-07 23:33       ` Paul E. McKenney
  0 siblings, 0 replies; 23+ messages in thread
From: Paul E. McKenney @ 2019-01-07 23:33 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Junchang Wang, perfbook

On Mon, Jan 07, 2019 at 03:13:07PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 08, 2019 at 07:54:16AM +0900, Akira Yokosawa wrote:
> > Hi Paul, 
> > 
> > On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
> > >> Hi all,
> > >>
> > >> I'm reading hash_resize recently, and have a few questions regarding
> > >> this algorithm. Please take a look if you have time. Any suggestions
> > >> are warmly welcomed.
> > >>
> > >> === Question 1 ===
> > >> In hash_resize.c : hashtab_lock_mod
> > >> 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
> > >> 187                 lsp->hbp[1] = NULL;
> > >> 188                 return;
> > >> 189         }
> > >> 190         htp = rcu_dereference(htp->ht_new);
> > >>
> > >> It seems we are missing a barrier (e.g., smp_mb) in between lines 189
> > >> and 190, because neither READ_ONCE() nor rcu_dereference() can prevent
> > >> compilers and hardware from reordering the two unrelated variables,
> > >> ht_resize_cur and ht_new. Is my understanding correct?
> > > 
> > > Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
> > > section
> > 
> > You mean "rcu_read_lock() at the beginning of hashtab_lock_mod() starts
> > an RCU read-side critical section", don't you?
> 
> Indeed I do, good catch!
> 
> > >         and there is a synchronize_rcu() between the update to ->ht_new
> > > and the updates to ->ht_resize_cur.  For more details on how this works,
> > > please see https://lwn.net/Articles/573497/.
> > > 
> > > Of course, if you find a code path in which a call to hashtab_lock_mod()
> > > is invoked outside of an RCU read-side critical section, that would be
> > > a bug.  (Can you tell me an exception to this rule, that is, a case
> > > where hashtab_lock_mod() could safely be invoked outside of an RCU
> > > read-side critical section?)
> > > 
> > >> === Question 2 ===
> > >> In hash_resize.c, each time an updater wants to access a bucket, the
> > >> updater must first acquire the bucket's lock (htb_lock), preventing
> > >> other updaters accessing the same bucket concurrently. This approach
> > >> is OK if the linked list of a bucket is relatively short, but for a
> > >> larger system where linked lists are long enough and the
> > >> perftest_resize thread is running simultaneously, it could become a
> > >> potential performance bottleneck. One naive solution is to allow
> > >> multiple updaters to access the same bucket, only if they don't
> > >> operate on the same item of the list of this bucket. I wonder if there
> > >> are any existing works or discussions on this topic?
> > > 
> > > One approach is to use a hashed array of locks, and to hash a given
> > > element's address to locate the lock to be used.  Please see
> > > Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
> > > Needed Locks First"), including Quick Quiz 7.9, for additional details.
> > > 
> > > Another approach is to use RCU to protect traversals, and locks within the
> > > linked-list elements themselves.  These locks are conditionally acquired
> > > (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
> > > them in list order, and the tricks in Quick Quiz 7.9.
> > > 
> > > Non-blocking synchronization can also be used, but it is often quite a
> > > bit more complicated.  See for example the split-order list of Shalev
> > > and Shavit, along with Desnoyers's RCU-protected extension in the
> > > userspace RCU library.
> > > 
> > > But it is usually -way- better to just choose a good hash function and
> > > to increase the number of buckets.  Which is of course one reason for
> > > having resizable hash tables.  ;-)
> > > 
> > > But the other techniques can be useful in more complex linked data
> > > structures, such as graphs, where there is no reasonable way to
> > > partition the data.  Nevertheless, many people choose to do the
> > > partitioning anyway, especially on distributed systems.
> > > 
> > >> === Question 3 ===
> > >> Chapter Data Structures also discusses other resizable hash tables,
> > >> namely "Resizable, scalable, concurrent hash tables via relativistic
> > >> programming" from Josh Triplett, which can save memory footprint by
> > >> using a single pair of pointers. But my understanding is that
> > >> perftest_resize.c is unique in that it allows you to rebuild the hash
> > >> table by utilizing a different hash function, which could be very
> > >> useful in practice (e.g., to prevent DDoS attack). Other solutions do
> > >> not share this property. Is my understanding correct? Did I miss any
> > >> discussions on this topic in perfbook?
> > > 
> > > Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
> > > (which I use in hash_resize.c) is the only one allowing arbitrary changes
> > > to hash functions.  I expect that this advantage will become increasingly
> > > important as security issues become more challenging.  Furthermore, I
> > > suspect that the pointer-pair approach is faster and more scalable.
> > > It is certainly simpler.
> > > 
> > > On the other hand, one advantage of the other two approaches is decreased
> > > memory consumption.
> > > 
> > > Another advantage of Josh Triplett's pointer-unzip approach is that
> > > concurrent updates are (in theory, anyway) not blocked for as long
> > > by resize operations.  The other edge of this sword is that resizing
> > > is much slower, given the need to wait for many RCU grace periods.
> > > 
> > > Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
> > > and Shavit's split-order list is that all operations are non-blocking,
> > > which can be important on massively overloaded systems, such as one
> > > might find in cloud computing.
> > > 
> > >> === Question 4 ===
> > >> In the current implementation of hash_resize.c, the perftest_resize
> > >> could block an updater, and vice versa. It seems this is not what we
> > >> expected. Ideally, they should be allowed to run concurrently, or at
> > >> least the perftest_resize thread should have lower priority and
> > >> updaters should never be blocked by the perftest_resize thread. Is
> > >> that right? I'm very interested in helping improve. Please let me know
> > >> if you have any suggestions.
> > > 
> > > In hash_resize.c, an updater is blocked only for the time required to
> > > redisposition a bucket.  This is a great improvement over blocking
> > > updaters for the full resize over all buckets.
> > > 
> > > But yes, it is not hard to do better, for example, periodically dropping
> > > the old-table lock in hashtab_resize().  This requires a few careful
> > > adjustments, of course.  Can you tell me what these adjustments are?
> > > 
> > > Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
> > > optimizing for the race with hashtab_resize() doesn't make a whole lot
> > > of sense.  Please see the patch below.  Thoughts?
> > > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > commit 737646a9c868d841b32199b52f5569668975953e
> > > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > > Date:   Mon Jan 7 10:29:14 2019 -0800
> > > 
> > >     datastruct/hash: Simplify hashtab_lookup()
> > >     
> > >     Because resizing leaves the old hash table intact, and because lookups
> > >     are carried out within RCU read-side critical sections (which prevent
> > >     a second resizing operation from starting), there is no need for a
> > >     lookup to search anywhere but in the old hash table.  And in the common
> > >     case, there is no resize, so there is no new hash table.  Therefore,
> > >     eliminating the check for resizing speeds things up in the common
> > >     case.  In addition, this simplifies the code.
> > >     
> > >     This commit therefore eliminates the ht_get_bucket() function,
> > >     renames the ht_get_bucket_single() function to ht_get_bucket(),
> > >     and modifies callers appropriately.
> > >     
> > >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > > 
> > > diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> > > index 29e05f907200..be4157959b83 100644
> > > --- a/CodeSamples/datastruct/hash/hash_resize.c
> > > +++ b/CodeSamples/datastruct/hash/hash_resize.c
> > > @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
> > >  //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
> > >  /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
> > >  static struct ht_bucket *				//\lnlbl{single:b}
> > > -ht_get_bucket_single(struct ht *htp, void *key, long *b,
> > > -                     unsigned long *h)
> > > +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
> > >  {
> > >  	unsigned long hash = htp->ht_gethash(key);
> > >  
> > > @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
> > >  		*h = hash;				//\lnlbl{single:h}
> > >  	return &htp->ht_bkt[*b];			//\lnlbl{single:return}
> > >  }							//\lnlbl{single:e}
> > > -
> > > -/* Get hash bucket correesponding to key, accounting for resize. */
> > > -static struct ht_bucket *				//\lnlbl{b}
> > > -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
> > > -{
> > > -	struct ht_bucket *htbp;
> > > -
> > > -	htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
> > > -								//\fcvexclude
> > > -	if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {	//\lnlbl{resized}
> > > -		smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
> > 
> > If we can remove this memory barrier, the counterpart smp_mb() in
> > hashtab_resize() becomes unnecessary, doesn't it?
> 
> Or maybe I need to add a memory barrier to hashtab_lock_mod().  Thoughts?

No, the locking provides full mutual exclusion.  Queued a patch with your
Reported-by, thank you!
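
To spell the argument out, a sketch using the hash_resize.c names (the lock
in question is the old-table bucket's htb_lock; illustration only):

	/* Resizer, redistributing old-table bucket i (sketch): */
	spin_lock(&htbp->htb_lock);        /* Old bucket i's lock. */
	/* ... fill the corresponding new-table buckets ... */
	WRITE_ONCE(htp->ht_resize_cur, i); /* Still holding the lock. */
	spin_unlock(&htbp->htb_lock);

	/* Updater, in hashtab_lock_mod(), with b == i: */
	spin_lock(&htbp->htb_lock);        /* Same lock as above. */
	if (b > READ_ONCE(htp->ht_resize_cur)) {
		/* Resizer has not reached this bucket, so the old
		 * table suffices. */
	} else {
		/* The resizer's spin_unlock() and this updater's
		 * spin_lock() order the new-bucket fills before any
		 * reads here, so no smp_mb() is needed. */
	}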

							Thanx, Paul

> >         Thanks, Akira
> > 
> > > -		*htp = rcu_dereference((*htp)->ht_new);	//\lnlbl{newtable}
> > > -		htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket}
> > > -	}
> > > -	if (i)						//\lnlbl{chk_i}
> > > -		*i = (*htp)->ht_idx;			//\lnlbl{set_idx}
> > > -	return htbp;					//\lnlbl{return}
> > > -}							//\lnlbl{e}
> > >  //\end{snippet}
> > >  
> > >  /* Read-side lock/unlock functions. */
> > > @@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
> > >  
> > >  	rcu_read_lock();				//\lnlbl{l:rcu_lock}
> > >  	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{l:refhashtbl}
> > > -	htbp = ht_get_bucket_single(htp, key, &b, &h);	//\lnlbl{l:refbucket}
> > > +	htbp = ht_get_bucket(htp, key, &b, &h);		//\lnlbl{l:refbucket}
> > >  	spin_lock(&htbp->htb_lock);			//\lnlbl{l:acq_bucket}
> > >  	lsp->hbp[0] = htbp;				//\lnlbl{l:lsp0b}
> > >  	lsp->hls_idx[0] = htp->ht_idx;
> > > @@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
> > >  		return;					//\lnlbl{l:fastret1}
> > >  	}
> > >  	htp = rcu_dereference(htp->ht_new);		//\lnlbl{l:new_hashtbl}
> > > -	htbp = ht_get_bucket_single(htp, key, &b, &h);	//\lnlbl{l:get_newbkt}
> > > +	htbp = ht_get_bucket(htp, key, &b, &h);		//\lnlbl{l:get_newbkt}
> > >  	spin_lock(&htbp->htb_lock);			//\lnlbl{l:acq_newbkt}
> > >  	lsp->hbp[1] = htbp;				//\lnlbl{l:lsp1b}
> > >  	lsp->hls_idx[1] = htp->ht_idx;
> > > @@ -223,16 +204,15 @@ struct ht_elem *					//\lnlbl{lkp:b}
> > >  hashtab_lookup(struct hashtab *htp_master, void *key)
> > >  {
> > >  	long b;
> > > -	int i;
> > >  	struct ht *htp;
> > >  	struct ht_elem *htep;
> > >  	struct ht_bucket *htbp;
> > >  
> > >  	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{lkp:get_curtbl}
> > > -	htbp = ht_get_bucket(&htp, key, &b, &i);	//\lnlbl{lkp:get_curbkt}
> > > +	htbp = ht_get_bucket(htp, key, &b, NULL);	//\lnlbl{lkp:get_curbkt}
> > >  	cds_list_for_each_entry_rcu(htep,		//\lnlbl{lkp:loop:b}
> > >  	                            &htbp->htb_head,
> > > -	                            hte_next[i]) {
> > > +	                            hte_next[htp->ht_idx]) {
> > >  		if (htp->ht_cmp(htep, key)) 		//\lnlbl{lkp:match}
> > >  			return htep;			//\lnlbl{lkp:ret_match}
> > >  	}						//\lnlbl{lkp:loop:e}
> > > @@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master,
> > >  		htbp = &htp->ht_bkt[i];			//\lnlbl{get_oldcur}
> > >  		spin_lock(&htbp->htb_lock);		//\lnlbl{acq_oldcur}
> > >  		cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b}
> > > -			htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL);
> > > +			htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL);
> > >  			spin_lock(&htbp_new->htb_lock);
> > >  			cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
> > >  			spin_unlock(&htbp_new->htb_lock);
> > > diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> > > index 5c61bf5e2389..0152437c274e 100644
> > > --- a/datastruct/datastruct.tex
> > > +++ b/datastruct/datastruct.tex
> > > @@ -966,10 +966,8 @@ the old table.
> > >  \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> > >  Bucket selection is shown in
> > >  Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
> > > -which shows \co{ht_get_bucket_single()} on
> > > -lines~\lnref{single:b}-\lnref{single:e} and
> > > -\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}.
> > > -The \co{ht_get_bucket_single()} function returns a reference to the bucket
> > > +which shows \co{ht_get_bucket()}.
> > > +This function returns a reference to the bucket
> > >  corresponding to the specified key in the specified hash table, without
> > >  making any allowances for resizing.
> > >  It also stores the bucket index corresponding to the key into the location
> > > @@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding
> > >  hash value corresponding to the key into the location
> > >  referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
> > >  Line~\lnref{single:return} then returns a reference to the corresponding bucket.
> > > -
> > > -The \co{ht_get_bucket()} function handles hash-table selection, invoking
> > > -\co{ht_get_bucket_single()} on
> > > -line~\lnref{call_single} to select the bucket
> > > -corresponding to the hash in the current
> > > -hash table, storing the hash value through parameter~\co{b}.
> > > -If line~\lnref{resized} determines that the table is being resized and that
> > > -line~\lnref{call_single}'s bucket has already been distributed across the new hash
> > > -table, then line~\lnref{newtable} selects the new hash table and
> > > -line~\lnref{newbucket}
> > > -selects the bucket corresponding to the hash in the new hash table,
> > > -again storing the hash value through parameter~\co{b}.
> > > -\end{lineref}
> > > -
> > > -\QuickQuiz{}
> > > -	The code in
> > > -	Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
> > > -	computes the hash twice!
> > > -	Why this blatant inefficiency?
> > > -\QuickQuizAnswer{
> > > -	The reason is that the old and new hash tables might have
> > > -	completely different hash functions, so that a hash computed
> > > -	for the old table might be completely irrelevant to the
> > > -	new table.
> > > -} \QuickQuizEnd
> > > -
> > > -\begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> > > -If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then
> > > -line~\lnref{set_idx} stores the pointer-set index for the selected hash table.
> > > -Finally, line~\lnref{return} returns a reference to the selected hash bucket.
> > >  \end{lineref}
> > >  
> > >  \QuickQuiz{}
> > > @@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket.
> > >  	functions described next.
> > >  } \QuickQuizEnd
> > >  
> > > -This implementation of
> > > -\co{ht_get_bucket_single()} and \co{ht_get_bucket()}
> > > -permit lookups and modifications to run concurrently
> > > -with a resize operation.
> > > +This implementation of \co{ht_get_bucket()} permits lookups and
> > > +modifications to run concurrently with a resize operation.
> > >  
> > >  \begin{listing}[tb]
> > >  \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv}
> > > @@ -1129,11 +1095,6 @@ hash lookups.
> > >  Line~\lnref{get_curtbl} fetches the current hash table and
> > >  line~\lnref{get_curbkt} obtains a reference
> > >  to the bucket corresponding to the specified key.
> > > -This bucket will be located in a new resized hash table when a
> > > -resize operation has progressed past the bucket in the old hash
> > > -table that contained the desired data element.
> > > -Note that line~\lnref{get_curbkt} also passes back the index that will be
> > > -used to select the correct set of pointers from the pair in each element.
> > >  The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
> > >  so that if line~\lnref{match}
> > >  detects a match,
> > > @@ -1144,22 +1105,17 @@ failure.
> > >  \end{lineref}
> > >  
> > >  \QuickQuiz{}
> > > -	In the \co{hashtab_lookup()} function in
> > > -	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions},
> > > -	the code carefully finds the right bucket in the new hash table
> > > -	if the element to be looked up has already been distributed
> > > -	by a concurrent resize operation.
> > > -	This seems wasteful for RCU-protected lookups.
> > > -	Why not just stick with the old hash table in this case?
> > > +	\begin{lineref}[ln:datastruct:hash_resize:access:lkp]
> > > +	What if execution reaches line~\lnref{loop:b}
> > > +	of \co{hashtab_lookup()} in
> > > +	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
> > > +	just after this bucket has been resized.
> > > +	Won't that result in lookup failures?
> > > +	\end{lineref}
> > >  \QuickQuizAnswer{
> > > -	Suppose that a resize operation begins and distributes half of
> > > -	the old table's buckets to the new table.
> > > -	Suppose further that a thread adds a new element that goes into
> > > -	one of the already-distributed buckets, and that this same thread
> > > -	now looks up this newly added element.
> > > -	If lookups unconditionally traversed only the old hash table,
> > > -	this thread would get a lookup failure for the element that it
> > > -	just added, which certainly sounds like a bug to me!
> > > +	No, it won't.
> > > +	Resizing into the new hash table leaves the old hash table
> > > +	intact, courtesy of the pointer pairs.
> > >  } \QuickQuizEnd
> > >  
> > >  \begin{lineref}[ln:datastruct:hash_resize:access:add]
> > > 
> > 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-07 23:06     ` Akira Yokosawa
@ 2019-01-07 23:48       ` Paul E. McKenney
  2019-01-08 15:18         ` Akira Yokosawa
  2019-01-08  1:56       ` Junchang Wang
  1 sibling, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2019-01-07 23:48 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Junchang Wang, perfbook

On Tue, Jan 08, 2019 at 08:06:51AM +0900, Akira Yokosawa wrote:
> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
> > Hi Paul, 
> > 
> > On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote:
> >> On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
> >>> Hi all,
> >>>
> >>> I'm reading hash_resize recently, and have a few questions regarding
> >>> this algorithm. Please take a look if you have time. Any suggestions
> >>> are warmly welcomed.
> >>>
> >>> === Question 1 ===
> >>> In hash_resize.c : hashtab_lock_mod
> >>> 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
> >>> 187                 lsp->hbp[1] = NULL;
> >>> 188                 return;
> >>> 189         }
> >>> 190         htp = rcu_dereference(htp->ht_new);
> >>>
> >>> It seems we are missing a barrier (e.g., smp_mb) in between lines 189
> >>> and 190, because neither READ_ONCE() nor rcu_dereference() can prevent
> >>> compilers and hardware from reordering the two unrelated variables,
> >>> ht_resize_cur and ht_new. Is my understanding correct?
> >>
> >> Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
> >> section
> > 
> > You mean "rcu_read_lock() at the beginning of hashtab_lock_mod() starts
> > an RCU read-side critical section", don't you?
> > 
> >>         and there is a synchronize_rcu() between the update to ->ht_new
> >> and the updates to ->ht_resize_cur.  For more details on how this works,
> >> please see https://lwn.net/Articles/573497/.
> >>
> >> Of course, if you find a code path in which a call to hashtab_lock_mod()
> >> is invoked outside of an RCU read-side critical section, that would be
> >> a bug.  (Can you tell me an exception to this rule, that is, a case
> >> where hashtab_lock_mod() could safely be invoked outside of an RCU
> >> read-side critical section?)
> >>
> >>> === Question 2 ===
> >>> In hash_resize.c, each time an updater wants to access a bucket, the
> >>> updater must first acquire the bucket's lock (htb_lock), preventing
> >>> other updaters accessing the same bucket concurrently. This approach
> >>> is OK if the linked list of a bucket is relatively short, but for a
> >>> larger system where linked lists are long enough and the
> >>> perftest_resize thread is running simultaneously, it could become a
> >>> potential performance bottleneck. One naive solution is to allow
> >>> multiple updaters to access the same bucket, only if they don't
> >>> operate on the same item of the list of this bucket. I wonder if there
> >>> are any existing works or discussions on this topic?
> >>
> >> One approach is to use a hashed array of locks, and to hash a given
> >> element's address to locate the lock to be used.  Please see
> >> Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
> >> Needed Locks First"), including Quick Quiz 7.9, for additional details.
> >>
> >> Another approach is to use RCU to protect traversals, and locks within the
> >> linked-list elements themselves.  These locks are conditionally acquired
> >> (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
> >> them in list order, and the tricks in Quick Quiz 7.9.
> >>
> >> Non-blocking synchronization can also be used, but it is often quite a
> >> bit more complicated.  See for example the split-order list of Shalev
> >> and Shavit, along with Desnoyers's RCU-protected extension in the
> >> userspace RCU library.
> >>
> >> But it is usually -way- better to just choose a good hash function and
> >> to increase the number of buckets.  Which is of course one reason for
> >> having resizable hash tables.  ;-)
> >>
> >> But the other techniques can be useful in more complex linked data
> >> structures, such as graphs, where there is no reasonable way to
> >> partition the data.  Nevertheless, many people choose to do the
> >> partitioning anyway, especially on distributed systems.
> >>
> >>> === Question 3 ===
> >>> Chapter Data Structures also discusses other resizable hash tables,
> >>> namely "Resizable, scalable, concurrent hash tables via relativistic
> >>> programming" from Josh Triplett, which can save memory footprint by
> >>> using a single pair of pointers. But my understanding is that
> >>> perftest_resize.c is unique in that it allows you to rebuild the hash
> >>> table by utilizing a different hash function, which could be very
> >>> useful in practice (e.g., to prevent DDoS attack). Other solutions do
> >>> not share this property. Is my understanding correct? Did I miss any
> >>> discussions on this topic in perfbook?
> >>
> >> Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
> >> (which I use in hash_resize.c) is the only one allowing arbitrary changes
> >> to hash functions.  I expect that this advantage will become increasingly
> >> important as security issues become more challenging.  Furthermore, I
> >> suspect that the pointer-pair approach is faster and more scalable.
> >> It is certainly simpler.
> >>
> >> On the other hand, one advantage of the other two approaches is decreased
> >> memory consumption.
> >>
> >> Another advantage of Josh Triplett's pointer-unzip approach is that
> >> concurrent updates are (in theory, anyway) not blocked for as long
> >> by resize operations.  The other edge of this sword is that resizing
> >> is much slower, given the need to wait for many RCU grace periods.
> >>
> >> Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
> >> and Shavit's split-order list is that all operations are non-blocking,
> >> which can be important on massively overloaded systems, such as one
> >> might find in cloud computing.
> >>
> >>> === Question 4 ===
> >>> In the current implementation of hash_resize.c, the perftest_resize
> >>> could block an updater, and vice versa. It seems this is not what we
> >>> expected. Ideally, they should be allowed to run concurrently, or at
> >>> least the perftest_resize thread should have lower priority and
> >>> updaters should never be blocked by the perftest_resize thread. Is
> >>> that right? I'm very interested in helping improve. Please let me know
> >>> if you have any suggestions.
> >>
> >> In hash_resize.c, an updater is blocked only for the time required to
> >> redisposition a bucket.  This is a great improvement over blocking
> >> updaters for the full resize over all buckets.
> >>
> >> But yes, it is not hard to do better, for example, periodically dropping
> >> the old-table lock in hashtab_resize().  This requires a few careful
> >> adjustments, of course.  Can you tell me what these adjustments are?
> >>
> >> Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
> >> optimizing for the race with hashtab_resize() doesn't make a whole lot
> >> of sense.  Please see the patch below.  Thoughts?
> >>
> >> 							Thanx, Paul
> >>
> >> ------------------------------------------------------------------------
> >>
> >> commit 737646a9c868d841b32199b52f5569668975953e
> >> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> >> Date:   Mon Jan 7 10:29:14 2019 -0800
> >>
> >>     datastruct/hash: Simplify hashtab_lookup()
> >>     
> >>     Because resizing leaves the old hash table intact, and because lookups
> >>     are carried out within RCU read-side critical sections (which prevent
> >>     a second resizing operation from starting), there is no need for a
> >>     lookup to search anywhere but in the old hash table.  And in the common
> >>     case, there is no resize, so there is no new hash table.  Therefore,
> >>     eliminating the check for resizing speeds things up in the common
> >>     case.  In addition, this simplifies the code.
> >>     
> >>     This commit therefore eliminates the ht_get_bucket() function,
> >>     renames the ht_get_bucket_single() function to ht_get_bucket(),
> >>     and modifies callers appropriately.
> >>     
> >>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> >>
> >> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> >> index 29e05f907200..be4157959b83 100644
> >> --- a/CodeSamples/datastruct/hash/hash_resize.c
> >> +++ b/CodeSamples/datastruct/hash/hash_resize.c
> >> @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
> >>  //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
> >>  /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
> >>  static struct ht_bucket *				//\lnlbl{single:b}
> >> -ht_get_bucket_single(struct ht *htp, void *key, long *b,
> >> -                     unsigned long *h)
> >> +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
> >>  {
> >>  	unsigned long hash = htp->ht_gethash(key);
> >>  
> >> @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
> >>  		*h = hash;				//\lnlbl{single:h}
> >>  	return &htp->ht_bkt[*b];			//\lnlbl{single:return}
> >>  }							//\lnlbl{single:e}
> >> -
> >> -/* Get hash bucket correesponding to key, accounting for resize. */
> >> -static struct ht_bucket *				//\lnlbl{b}
> >> -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
> >> -{
> >> -	struct ht_bucket *htbp;
> >> -
> >> -	htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
> >> -								//\fcvexclude
> >> -	if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {	//\lnlbl{resized}
> >> -		smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
> > 
> > If we can remove this memory barrier, the counterpart smp_mb() in
> > hashtab_resize() becomes unnecessary, doesn't it?
> 
> And the WRITE_ONCE() in the following line.

Actually, that must stay.  It is true that the bucket lock is held until
hashtab_unlock_mod() releases it, and that this prevents concurrent
resizing of that bucket, but other buckets might well be resized, which
results in the possibility of concurrent reads and writes of ->ht_resize_cur.
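
A sketch of that remaining (harmless) race, again using the hash_resize.c
names; illustration only:

	/* Resizer, holding old-table bucket j's lock, j != b: */
	WRITE_ONCE(htp->ht_resize_cur, j);

	/* Updater, holding old-table bucket b's lock: */
	if (b > READ_ONCE(htp->ht_resize_cur)) {
		/* This load races with the store above because the two
		 * threads hold different bucket locks.  The race cannot
		 * change this condition's outcome for bucket b (that
		 * transition happens only under bucket b's lock), but
		 * the accesses must stay marked to prevent tearing. */
	}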

Anyway, here is the resulting commit.  Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 1aac0c703482c90c2ce4092b2cc604d474f5a44b
Author: Paul E. McKenney <paulmck@linux.ibm.com>
Date:   Mon Jan 7 15:39:40 2019 -0800

    datastruct/hash: Remove extraneous barrier from hashtab_resize()
    
    Now that hashtab_lookup() is resizing-agnostic, all non-initialization
    accesses to ->ht_resize_cur are protected by locking in the restricted
    sense that any change to ->ht_resize_cur that would change the value
    of the "if" condition cannot happen while the lock is held on the old
    bucket.  This means that the memory barrier may be removed.  However,
    the READ_ONCE() and WRITE_ONCE() markings on non-initialization accesses
    to ->ht_resize_cur must remain because reads from ->ht_resize_cur really
    can race with writes, just not in a way that changes the "if" condition.
    
    Reported-by: Akira Yokosawa <akiyks@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
index be4157959b83..9f68a00dabe3 100644
--- a/CodeSamples/datastruct/hash/hash_resize.c
+++ b/CodeSamples/datastruct/hash/hash_resize.c
@@ -288,7 +288,6 @@ int hashtab_resize(struct hashtab *htp_master,
 			cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
 			spin_unlock(&htbp_new->htb_lock);
 		}					//\lnlbl{loop_list:e}
-		smp_mb(); /* Fill new buckets before claiming them. */
 		WRITE_ONCE(htp->ht_resize_cur, i);	//\lnlbl{update_resize}
 		spin_unlock(&htbp->htb_lock);		//\lnlbl{rel_oldcur}
 	}						//\lnlbl{loop:e}
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index 0152437c274e..e2159330790f 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -1245,10 +1245,7 @@ lines~\lnref{loop_list:b}-\lnref{loop_list:e} adds one data element
 from the current old-table bucket to the corresponding new-table bucket,
 holding the new-table bucket's lock during the add operation.
 Line~\lnref{update_resize} updates
-\co{->ht_resize_cur} to indicate that this bucket has been distributed,
-and is preceded by a full memory barrier that pairs with the one in
-\co{ht_get_bucket()} shown in
-Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}.
+\co{->ht_resize_cur} to indicate that this bucket has been distributed.
 Finally, line~\lnref{rel_oldcur} releases the old-table bucket lock.
 
 \QuickQuiz{}


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-07 23:06     ` Akira Yokosawa
  2019-01-07 23:48       ` Paul E. McKenney
@ 2019-01-08  1:56       ` Junchang Wang
  2019-01-08 15:28         ` Paul E. McKenney
  1 sibling, 1 reply; 23+ messages in thread
From: Junchang Wang @ 2019-01-08  1:56 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Paul E. McKenney, perfbook

On Tue, Jan 8, 2019 at 7:06 AM Akira Yokosawa <akiyks@gmail.com> wrote:
>
> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
> > Hi Paul,
> >
> > On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote:
> >> On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
> >>> Hi all,
> >>>
> >>> I'm reading hash_resize recently, and have a few questions regarding
> >>> this algorithm. Please take a look if you have time. Any suggestions
> >>> are warmly welcomed.
> >>>
> >>> === Question 1 ===
> >>> In hash_resize.c : hashtab_lock_mod
> >>> 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
> >>> 187                 lsp->hbp[1] = NULL;
> >>> 188                 return;
> >>> 189         }
> >>> 190         htp = rcu_dereference(htp->ht_new);
> >>>
> >>> It seems we are missing a barrier (e.g., smp_mb) in between lines 189
> >>> and 190, because neither READ_ONCE() nor rcu_dereference() can prevent
> >>> compilers and hardware from reordering the two unrelated variables,
> >>> ht_resize_cur and ht_new. Is my understanding correct?
> >>
> >> Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
> >> section
> >
> > You mean "rcu_read_lock() at the beginning of hashtab_lock_mod() starts
> > an RCU read-side critical section", don't you?
> >
> >>         and there is a synchronize_rcu() between the update to ->ht_new
> >> and the updates to ->ht_resize_cur.  For more details on how this works,
> >> please see https://lwn.net/Articles/573497/.
> >>
> >> Of course, if you find a code path in which a call to hashtab_lock_mod()
> >> is invoked outside of an RCU read-side critical section, that would be
> >> a bug.  (Can you tell me an exception to this rule, that is, a case
> >> where hashtab_lock_mod() could safely be invoked outside of an RCU
> >> read-side critical section?)
> >>
> >>> === Question 2 ===
> >>> In hash_resize.c, each time an updater wants to access a bucket, the
> >>> updater must first acquire the bucket's lock (htb_lock), preventing
> >>> other updaters accessing the same bucket concurrently. This approach
> >>> is OK if the linked list of a bucket is relatively short, but for a
> >>> larger system where linked lists are long enough and the
> >>> perftest_resize thread is running simultaneously, it could become a
> >>> potential performance bottleneck. One naive solution is to allow
> >>> multiple updaters to access the same bucket, only if they don't
> >>> operate on the same item of the list of this bucket. I wonder if there
> >>> are any existing works or discussions on this topic?
> >>
> >> One approach is to use a hashed array of locks, and to hash a given
> >> element's address to locate the lock to be used.  Please see
> >> Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
> >> Needed Locks First"), including Quick Quiz 7.9, for additional details.
> >>
> >> Another approach is to use RCU to protect traversals, and locks within the
> >> linked-list elements themselves.  These locks are conditionally acquired
> >> (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
> >> them in list order, and the tricks in Quick Quiz 7.9.
> >>
> >> Non-blocking synchronization can also be used, but it is often quite a
> >> bit more complicated.  See for example the split-order list of Shalev
> >> and Shavit, along with Desnoyers's RCU-protected extension in the
> >> userspace RCU library.
> >>
> >> But it is usually -way- better to just choose a good hash function and
> >> to increase the number of buckets.  Which is of course one reason for
> >> having resizable hash tables.  ;-)
> >>
> >> But the other techniques can be useful in more complex linked data
> >> structures, such as graphs, where there is no reasonable way to
> >> partition the data.  Nevertheless, many people choose to do the
> >> partitioning anyway, especially on distributed systems.
> >>
> >>> === Question 3 ===
> >>> Chapter Data Structures also discusses other resizable hash tables,
> >>> namely "Resizable, scalable, concurrent hash tables via relativistic
> >>> programming" from Josh Triplett, which can save memory footprint by
> >>> using a single pair of pointers. But my understanding is that
> >>> perftest_resize.c is unique in that it allows you to rebuild the hash
> >>> table by utilizing a different hash function, which could be very
> >>> useful in practice (e.g., to prevent DDoS attack). Other solutions do
> >>> not share this property. Is my understanding correct? Did I miss any
> >>> discussions on this topic in perfbook?
> >>
> >> Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
> >> (which I use in hash_resize.c) is the only one allowing arbitrary changes
> >> to hash functions.  I expect that this advantage will become increasingly
> >> important as security issues become more challenging.  Furthermore, I
> >> suspect that the pointer-pair approach is faster and more scalable.
> >> It is certainly simpler.
> >>
> >> On the other hand, one advantage of the other two approaches is decreased
> >> memory consumption.
> >>
> >> Another advantage of Josh Triplett's pointer-unzip approach is that
> >> concurrent updates are (in theory, anyway) not blocked for as long
> >> by resize operations.  The other edge of this sword is that resizing
> >> is much slower, given the need to wait for many RCU grace periods.
> >>
> >> Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
> >> and Shavit's split-order list is that all operations are non-blocking,
> >> which can be important on massively overloaded systems, such as one
> >> might find in cloud computing.
> >>
> >>> === Question 4 ===
> >>> In the current implementation of hash_resize.c, the perftest_resize
> >>> could block an updater, and vice versa. It seems this is not what we
> >>> expected. Ideally, they should be allowed to run concurrently, or at
> >>> least the perftest_resize thread should have lower priority and
> >>> updaters should never be blocked by the perftest_resize thread. Is
> >>> that right? I'm very interested in helping improve. Please let me know
> >>> if you have any suggestions.
> >>
> >> In hash_resize.c, an updater is blocked only for the time required to
> >> redisposition a bucket.  This is a great improvement over blocking
> >> updaters for the full resize over all buckets.
> >>
> >> But yes, it is not hard to do better, for example, periodically dropping
> >> the old-table lock in hashtab_resize().  This requires a few careful
> >> adjustments, of course.  Can you tell me what these adjustments are?
> >>
> >> Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
> >> optimizing for the race with hashtab_resize() doesn't make a whole lot
> >> of sense.  Please see the patch below.  Thoughts?
> >>
> >>                                                      Thanx, Paul
> >>
> >> ------------------------------------------------------------------------
> >>
> >> commit 737646a9c868d841b32199b52f5569668975953e
> >> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> >> Date:   Mon Jan 7 10:29:14 2019 -0800
> >>
> >>     datastruct/hash: Simplify hashtab_lookup()
> >>
> >>     Because resizing leaves the old hash table intact, and because lookups
> >>     are carried out within RCU read-side critical sections (which prevent
> >>     a second resizing operation from starting), there is no need for a
> >>     lookup to search anywhere but in the old hash table.  And in the common
> >>     case, there is no resize, so there is no new hash table.  Therefore,
> >>     eliminating the check for resizing speeds things up in the common
> >>     case.  In addition, this simplifies the code.
> >>
> >>     This commit therefore eliminates the ht_get_bucket() function,
> >>     renames the ht_get_bucket_single() function to ht_get_bucket(),
> >>     and modifies callers appropriately.
> >>
> >>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> >>
> >> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> >> index 29e05f907200..be4157959b83 100644
> >> --- a/CodeSamples/datastruct/hash/hash_resize.c
> >> +++ b/CodeSamples/datastruct/hash/hash_resize.c
> >> @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
> >>  //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
> >>  /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
> >>  static struct ht_bucket *                           //\lnlbl{single:b}
> >> -ht_get_bucket_single(struct ht *htp, void *key, long *b,
> >> -                     unsigned long *h)
> >> +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
> >>  {
> >>      unsigned long hash = htp->ht_gethash(key);
> >>
> >> @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
> >>              *h = hash;                              //\lnlbl{single:h}
> >>      return &htp->ht_bkt[*b];                        //\lnlbl{single:return}
> >>  }                                                   //\lnlbl{single:e}
> >> -
> >> -/* Get hash bucket correesponding to key, accounting for resize. */
> >> -static struct ht_bucket *                           //\lnlbl{b}
> >> -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
> >> -{
> >> -    struct ht_bucket *htbp;
> >> -
> >> -    htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
> >> -                                                            //\fcvexclude
> >> -    if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {   //\lnlbl{resized}
> >> -            smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
> >
> > If we can remove this memory barrier, the counterpart smp_mb() in
> > hashtab_resize() becomes unnecessary, doesn't it?
>
> And the WRITE_ONCE() in the following line.
>
>         Thanks, Akira
> >
> >         Thanks, Akira
> >
> >> -            *htp = rcu_dereference((*htp)->ht_new); //\lnlbl{newtable}
> >> -            htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket}
> >> -    }
> >> -    if (i)                                          //\lnlbl{chk_i}
> >> -            *i = (*htp)->ht_idx;                    //\lnlbl{set_idx}
> >> -    return htbp;                                    //\lnlbl{return}
> >> -}                                                   //\lnlbl{e}
> >>  //\end{snippet}
> >>
> >>  /* Read-side lock/unlock functions. */
> >> @@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
> >>
> >>      rcu_read_lock();                                //\lnlbl{l:rcu_lock}
> >>      htp = rcu_dereference(htp_master->ht_cur);      //\lnlbl{l:refhashtbl}
> >> -    htbp = ht_get_bucket_single(htp, key, &b, &h);  //\lnlbl{l:refbucket}
> >> +    htbp = ht_get_bucket(htp, key, &b, &h);         //\lnlbl{l:refbucket}
> >>      spin_lock(&htbp->htb_lock);                     //\lnlbl{l:acq_bucket}
> >>      lsp->hbp[0] = htbp;                             //\lnlbl{l:lsp0b}
> >>      lsp->hls_idx[0] = htp->ht_idx;
> >> @@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
> >>              return;                                 //\lnlbl{l:fastret1}
> >>      }
> >>      htp = rcu_dereference(htp->ht_new);             //\lnlbl{l:new_hashtbl}
> >> -    htbp = ht_get_bucket_single(htp, key, &b, &h);  //\lnlbl{l:get_newbkt}
> >> +    htbp = ht_get_bucket(htp, key, &b, &h);         //\lnlbl{l:get_newbkt}
> >>      spin_lock(&htbp->htb_lock);                     //\lnlbl{l:acq_newbkt}
> >>      lsp->hbp[1] = htbp;                             //\lnlbl{l:lsp1b}
> >>      lsp->hls_idx[1] = htp->ht_idx;
> >> @@ -223,16 +204,15 @@ struct ht_elem *                                       //\lnlbl{lkp:b}
> >>  hashtab_lookup(struct hashtab *htp_master, void *key)
> >>  {
> >>      long b;
> >> -    int i;
> >>      struct ht *htp;
> >>      struct ht_elem *htep;
> >>      struct ht_bucket *htbp;
> >>
> >>      htp = rcu_dereference(htp_master->ht_cur);      //\lnlbl{lkp:get_curtbl}
> >> -    htbp = ht_get_bucket(&htp, key, &b, &i);        //\lnlbl{lkp:get_curbkt}
> >> +    htbp = ht_get_bucket(htp, key, &b, NULL);       //\lnlbl{lkp:get_curbkt}
> >>      cds_list_for_each_entry_rcu(htep,               //\lnlbl{lkp:loop:b}
> >>                                  &htbp->htb_head,
> >> -                                hte_next[i]) {
> >> +                                hte_next[htp->ht_idx]) {
> >>              if (htp->ht_cmp(htep, key))             //\lnlbl{lkp:match}
> >>                      return htep;                    //\lnlbl{lkp:ret_match}
> >>      }                                               //\lnlbl{lkp:loop:e}
> >> @@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master,
> >>              htbp = &htp->ht_bkt[i];                 //\lnlbl{get_oldcur}
> >>              spin_lock(&htbp->htb_lock);             //\lnlbl{acq_oldcur}
> >>              cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b}
> >> -                    htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL);
> >> +                    htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL);
> >>                      spin_lock(&htbp_new->htb_lock);
> >>                      cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
> >>                      spin_unlock(&htbp_new->htb_lock);
> >> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> >> index 5c61bf5e2389..0152437c274e 100644
> >> --- a/datastruct/datastruct.tex
> >> +++ b/datastruct/datastruct.tex
> >> @@ -966,10 +966,8 @@ the old table.
> >>  \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> >>  Bucket selection is shown in
> >>  Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
> >> -which shows \co{ht_get_bucket_single()} on
> >> -lines~\lnref{single:b}-\lnref{single:e} and
> >> -\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}.
> >> -The \co{ht_get_bucket_single()} function returns a reference to the bucket
> >> +which shows \co{ht_get_bucket()}.
> >> +This function returns a reference to the bucket
> >>  corresponding to the specified key in the specified hash table, without
> >>  making any allowances for resizing.
> >>  It also stores the bucket index corresponding to the key into the location
> >> @@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding
> >>  hash value corresponding to the key into the location
> >>  referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
> >>  Line~\lnref{single:return} then returns a reference to the corresponding bucket.
> >> -
> >> -The \co{ht_get_bucket()} function handles hash-table selection, invoking
> >> -\co{ht_get_bucket_single()} on
> >> -line~\lnref{call_single} to select the bucket
> >> -corresponding to the hash in the current
> >> -hash table, storing the hash value through parameter~\co{b}.
> >> -If line~\lnref{resized} determines that the table is being resized and that
> >> -line~\lnref{call_single}'s bucket has already been distributed across the new hash
> >> -table, then line~\lnref{newtable} selects the new hash table and
> >> -line~\lnref{newbucket}
> >> -selects the bucket corresponding to the hash in the new hash table,
> >> -again storing the hash value through parameter~\co{b}.
> >> -\end{lineref}
> >> -
> >> -\QuickQuiz{}
> >> -    The code in
> >> -    Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
> >> -    computes the hash twice!
> >> -    Why this blatant inefficiency?
> >> -\QuickQuizAnswer{
> >> -    The reason is that the old and new hash tables might have
> >> -    completely different hash functions, so that a hash computed
> >> -    for the old table might be completely irrelevant to the
> >> -    new table.
> >> -} \QuickQuizEnd
> >> -
> >> -\begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> >> -If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then
> >> -line~\lnref{set_idx} stores the pointer-set index for the selected hash table.
> >> -Finally, line~\lnref{return} returns a reference to the selected hash bucket.
> >>  \end{lineref}
> >>
> >>  \QuickQuiz{}
> >> @@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket.
> >>      functions described next.
> >>  } \QuickQuizEnd
> >>
> >> -This implementation of
> >> -\co{ht_get_bucket_single()} and \co{ht_get_bucket()}
> >> -permit lookups and modifications to run concurrently
> >> -with a resize operation.
> >> +This implementation of \co{ht_get_bucket()} permits lookups and
> >> +modifications to run concurrently with a resize operation.
> >>
> >>  \begin{listing}[tb]
> >>  \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv}
> >> @@ -1129,11 +1095,6 @@ hash lookups.
> >>  Line~\lnref{get_curtbl} fetches the current hash table and
> >>  line~\lnref{get_curbkt} obtains a reference
> >>  to the bucket corresponding to the specified key.
> >> -This bucket will be located in a new resized hash table when a
> >> -resize operation has progressed past the bucket in the old hash
> >> -table that contained the desired data element.
> >> -Note that line~\lnref{get_curbkt} also passes back the index that will be
> >> -used to select the correct set of pointers from the pair in each element.
> >>  The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
> >>  so that if line~\lnref{match}
> >>  detects a match,
> >> @@ -1144,22 +1105,17 @@ failure.
> >>  \end{lineref}
> >>
> >>  \QuickQuiz{}
> >> -    In the \co{hashtab_lookup()} function in
> >> -    Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions},
> >> -    the code carefully finds the right bucket in the new hash table
> >> -    if the element to be looked up has already been distributed
> >> -    by a concurrent resize operation.
> >> -    This seems wasteful for RCU-protected lookups.
> >> -    Why not just stick with the old hash table in this case?
> >> +    \begin{lineref}[ln:datastruct:hash_resize:access:lkp]
> >> +    What if execution reaches line~\lnref{loop:b}
> >> +    of \co{hashtab_lookup()} in
> >> +    Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
> >> +    just after this bucket has been resized.
> >> +    Won't that result in lookup failures?
> >> +    \end{lineref}
> >>  \QuickQuizAnswer{
> >> -    Suppose that a resize operation begins and distributes half of
> >> -    the old table's buckets to the new table.
> >> -    Suppose further that a thread adds a new element that goes into
> >> -    one of the already-distributed buckets, and that this same thread
> >> -    now looks up this newly added element.
> >> -    If lookups unconditionally traversed only the old hash table,
> >> -    this thread would get a lookup failure for the element that it
> >> -    just added, which certainly sounds like a bug to me!
> >> +    No, it won't.
> >> +    Resizing into the new hash table leaves the old hash table
> >> +    intact, courtesy of the pointer pairs.
> >>  } \QuickQuizEnd
> >>
> >>  \begin{lineref}[ln:datastruct:hash_resize:access:add]
> >>

Hi Paul and Akira,

Thanks a lot for the comments, which I need some more time to look
into. For Paul's patch, I have a few concerns. Please take a look.

My understanding is that with this patch, during the time period when
the resizing thread is running, an updater may insert/delete an item
into/from the new hash table while readers are still looking up data
in the old one, with the result that readers are unaware of
insertions/deletions happening concurrently. For example, it seems
the following sequence could happen.

1. The resizing thread starts.
2. The resizing thread successfully passes bucket *B* of the old hash table.
3. An updater wants to insert a new item *I* which should be inserted
into bucket *B*.
4. The updater will select the new hash table and insert the item *I*
into the new hash table.
5. A read request comes in and wants to look up item *I*. The lookup
request will check the old hash table and fail, won't it?
6. The resizing thread exits.
7. Now subsequent read requests can successfully find item *I*.

Is my understanding correct? Please let me know if I misunderstood
anything. Given that this patch accelerates the fast path, I think it
should be OK, because resizing typically happens rarely. Just want to
make sure I fully understand the algorithm.
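
To make the window concrete, the sequence above can be sketched in
terms of the API quoted in the patch (an illustrative sketch only:
the hashtab_add() and hashtab_unlock_mod() argument lists are
abbreviated, and htp_master, key, and the lock state ls are assumed
to be set up as in the test harness):

	/* Steps 1-2: hashtab_resize() is running and has already
	 * distributed bucket B, so B <= ->ht_resize_cur. */

	/* Steps 3-4: an updater inserts item I. */
	hashtab_lock_mod(htp_master, key, &ls);	/* selects the new table's bucket */
	hashtab_add(...);			/* so I lands only in the new table */
	hashtab_unlock_mod(...);

	/* Step 5: a concurrent reader looks up item I. */
	rcu_read_lock();
	htep = hashtab_lookup(htp_master, key);	/* searches only the old table */
	rcu_read_unlock();			/* htep is NULL until the resize completes */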

Thanks,
--Junchang


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-07 23:48       ` Paul E. McKenney
@ 2019-01-08 15:18         ` Akira Yokosawa
  2019-01-08 15:32           ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Akira Yokosawa @ 2019-01-08 15:18 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Junchang Wang, perfbook, Akira Yokosawa

On 2019/01/07 15:48:50 -0800, Paul E. McKenney wrote:
> On Tue, Jan 08, 2019 at 08:06:51AM +0900, Akira Yokosawa wrote:
>> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
>>> Hi Paul, 
>>>
>>> On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote:
>>>> On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
>>>>> Hi all,
>>>>>
>>>>> I'm reading hash_resize recently, and have a few questions regarding
>>>>> this algorithm. Please take a look if you have time. Any suggestions
>>>>> are warmly welcomed.
>>>>>
>>>>> === Question 1 ===
>>>>> In hash_resize.c : hashtab_lock_mod
>>>>> 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
>>>>> 187                 lsp->hbp[1] = NULL;
>>>>> 188                 return;
>>>>> 189         }
>>>>> 190         htp = rcu_dereference(htp->ht_new);
>>>>>
>>>>> It seems we are missing a barrier (e.g., smp_mb) in between lines 189
>>>>> and 190, because neither READ_ONCE() nor rcu_dereference() can prevent
>>>>> compilers and hardware from reordering the two unrelated variables,
>>>>> ht_resize_cur and ht_new. Is my understanding correct?
>>>>
>>>> Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
>>>> section
>>>
>>> You mean "rcu_read_lock() at the beginning of hashtab_lock_mod() starts
>>> an RCU read-side critical section", don't you?
>>>
>>>>         and there is a synchronize_rcu() between the update to ->ht_new
>>>> and the updates to ->ht_resize_cur.  For more details on how this works,
>>>> please see https://lwn.net/Articles/573497/.
>>>>
>>>> Of course, if you find a code path in which a call to hashtab_lock_mod()
>>>> is invoked outside of an RCU read-side critical section, that would be
>>>> a bug.  (Can you tell me an exception to this rule, that is, a case
>>>> where hashtab_lock_mod() could safely be invoked outside of an RCU
>>>> read-side critical section?)
>>>>
>>>>> === Question 2 ===
>>>>> In hash_resize.c, each time an updater wants to access a bucket, the
>>>>> updater must first acquire the bucket's lock (htb_lock), preventing
>>>>> other updaters accessing the same bucket concurrently. This approach
>>>>> is OK if the linked list of a bucket is relatively short, but for a
>>>>> larger system where linked lists are long enough and the
>>>>> perftest_resize thread is running simultaneously, it could become a
>>>>> potential performance bottleneck. One naive solution is to allow
>>>>> multiple updaters to access the same bucket, only if they don't
>>>>> operate on the same item of the list of this bucket. I wonder if there
>>>>> are any existing works or discussions on this topic?
>>>>
>>>> One approach is to use a hashed array of locks, and to hash a given
>>>> element's address to locate the lock to be used.  Please see
>>>> Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
>>>> Needed Locks First"), including Quick Quiz 7.9, for additional details.
>>>>
>>>> Another approach is to use RCU to protect traversals, and locks within the
>>>> linked-list elements themselves.  These locks are conditionally acquired
>>>> (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
>>>> them in list order, and the tricks in Quick Quiz 7.9.
>>>>
>>>> Non-blocking synchronization can also be used, but it is often quite a
>>>> bit more complicated.  See for example the split-order list of Shalev
>>>> and Shavit, along with Desnoyers's RCU-protected extension in the
>>>> userspace RCU library.
>>>>
>>>> But it is usually -way- better to just choose a good hash function and
>>>> to increase the number of buckets.  Which is of course one reason for
>>>> having resizable hash tables.  ;-)
>>>>
>>>> But the other techniques can be useful in more complex linked data
>>>> structures, such as graphs, where there is no reasonable way to
>>>> partition the data.  Nevertheless, many people choose to do the
>>>> partitioning anyway, especially on distributed systems.
>>>>
>>>>> === Question 3 ===
>>>>> Chapter Data Structures also discusses other resizable hash tables,
>>>>> namely "Resizable, scalable, concurrent hash tables via relativistic
>>>>> programming" from Josh Triplett, which can save memory footprint by
>>>>> using a single pair of pointers. But my understanding is that
>>>>> perftest_resize.c is unique in that it allows you to rebuild the hash
>>>>> table by utilizing a different hash function, which could be very
>>>>> useful in practice (e.g., to prevent DDoS attack). Other solutions do
>>>>> not share this property. Is my understanding correct? Did I miss any
>>>>> discussions on this topic in perfbook?
>>>>
>>>> Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
>>>> (which I use in hash_resize.c) is the only one allowing arbitrary changes
>>>> to hash functions.  I expect that this advantage will become increasingly
>>>> important as security issues become more challenging.  Furthermore, I
>>>> suspect that the pointer-pair approach is faster and more scalable.
>>>> It is certainly simpler.
>>>>
>>>> On the other hand, one advantage of the other two approaches is decreased
>>>> memory consumption.
>>>>
>>>> Another advantage of Josh Triplett's pointer-unzip approach is that
>>>> concurrent updates are (in theory, anyway) not blocked for as long
>>>> by resize operations.  The other edge of this sword is that resizing
>>>> is much slower, given the need to wait for many RCU grace periods.
>>>>
>>>> Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
>>>> and Shavit's split-order list is that all operations are non-blocking,
>>>> which can be important on massively overloaded systems, such as one
>>>> might find in cloud computing.
>>>>
>>>>> === Question 4 ===
>>>>> In the current implementation of hash_resize.c, the perftest_resize
>>>>> could block an updater, and vice versa. It seems this is not what we
>>>>> expected. Ideally, they should be allowed to run concurrently, or at
>>>>> least the perftest_resize thread should have lower priority and
>>>>> updaters should never be blocked by the perftest_resize thread. Is
>>>>> that right? I'm very interested in helping improve. Please let me know
>>>>> if you have any suggestions.
>>>>
>>>> In hash_resize.c, an updater is blocked only for the time required to
>>>> redisposition a bucket.  This is a great improvement over blocking
>>>> updaters for the full resize over all buckets.
>>>>
>>>> But yes, it is not hard to do better, for example, periodically dropping
>>>> the old-table lock in hashtab_resize().  This requires a few careful
>>>> adjustments, of course.  Can you tell me what these adjustments are?
>>>>
>>>> Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
>>>> optimizing for the race with hashtab_resize() doesn't make a whole lot
>>>> of sense.  Please see the patch below.  Thoughts?
>>>>
>>>> 							Thanx, Paul
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> commit 737646a9c868d841b32199b52f5569668975953e
>>>> Author: Paul E. McKenney <paulmck@linux.ibm.com>
>>>> Date:   Mon Jan 7 10:29:14 2019 -0800
>>>>
>>>>     datastruct/hash: Simplify hashtab_lookup()
>>>>     
>>>>     Because resizing leaves the old hash table intact, and because lookups
>>>>     are carried out within RCU read-side critical sections (which prevent
>>>>     a second resizing operation from starting), there is no need for a
>>>>     lookup to search anywhere but in the old hash table.  And in the common
>>>>     case, there is no resize, so there is no new hash table.  Therefore,
>>>>     eliminating the check for resizing speeds things up in the common
>>>>     case.  In addition, this simplifies the code.
>>>>     
>>>>     This commit therefore eliminates the ht_get_bucket() function,
>>>>     renames the ht_get_bucket_single() function to ht_get_bucket(),
>>>>     and modifies callers appropriately.
>>>>     
>>>>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
>>>>
>>>> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
>>>> index 29e05f907200..be4157959b83 100644
>>>> --- a/CodeSamples/datastruct/hash/hash_resize.c
>>>> +++ b/CodeSamples/datastruct/hash/hash_resize.c
>>>> @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
>>>>  //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
>>>>  /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
>>>>  static struct ht_bucket *				//\lnlbl{single:b}
>>>> -ht_get_bucket_single(struct ht *htp, void *key, long *b,
>>>> -                     unsigned long *h)
>>>> +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
>>>>  {
>>>>  	unsigned long hash = htp->ht_gethash(key);
>>>>  
>>>> @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
>>>>  		*h = hash;				//\lnlbl{single:h}
>>>>  	return &htp->ht_bkt[*b];			//\lnlbl{single:return}
>>>>  }							//\lnlbl{single:e}
>>>> -
>>>> -/* Get hash bucket correesponding to key, accounting for resize. */
>>>> -static struct ht_bucket *				//\lnlbl{b}
>>>> -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
>>>> -{
>>>> -	struct ht_bucket *htbp;
>>>> -
>>>> -	htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
>>>> -								//\fcvexclude
>>>> -	if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {	//\lnlbl{resized}
>>>> -		smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
>>>
>>> If we can remove this memory barrier, the counterpart smp_mb() in
>>> hashtab_resize() becomes unnecessary, doesn't it?
>>
>> And the WRITE_ONCE() in the following line.
> 
> Actually, that must stay.  It is true that the bucket lock is held by
> hashtab_unlock_mod(), and that this prevents concurrent resizing of that
> bucket, but other buckets might well be resized, which results in the
> possibility of concurrent reads and writes for ->ht_resize_cur.
> 
> Anyway, here is the resulting commit.  Thoughts?

Looks good to me. (Give or take a few typos in the commit log.)
I missed the remaining READ_ONCE(), which didn't appear in the diff.

And if you choose this way of simplifying hashtab_lookup(),
the Quick Quiz on smp_mb()s would become out of context.

        Thanks, Akira

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 1aac0c703482c90c2ce4092b2cc604d474f5a44b
> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> Date:   Mon Jan 7 15:39:40 2019 -0800
> 
>     datastruct/hash: Remove extraneous barrier from hashtab_resize()
>     
>     Now that hashtab_lookup() is resizing-agnostic, all non-initialization
>     accesses to ->ht_resize_cur are protected by locking in the restricted
>     sense that any change to ->ht_resize_cur that would change the value
>     of the "if" condition cannot happen while the lock is held on the old
>     bucket.  This means that the memory barrier may be removed.  However,
>     the READ_ONCE() and WRITE_ONCE() markings on non-initialization accesses
>     to ->ht_resize_cur must remain because reads from ->ht_resize_cur really
>     can race with writes, just not in a way that changes the "if" condition.
>     
>     Reported-by: Akira Yokosawa <akiyks@gmail.com>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> 
> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> index be4157959b83..9f68a00dabe3 100644
> --- a/CodeSamples/datastruct/hash/hash_resize.c
> +++ b/CodeSamples/datastruct/hash/hash_resize.c
> @@ -288,7 +288,6 @@ int hashtab_resize(struct hashtab *htp_master,
>  			cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
>  			spin_unlock(&htbp_new->htb_lock);
>  		}					//\lnlbl{loop_list:e}
> -		smp_mb(); /* Fill new buckets before claiming them. */
>  		WRITE_ONCE(htp->ht_resize_cur, i);	//\lnlbl{update_resize}
>  		spin_unlock(&htbp->htb_lock);		//\lnlbl{rel_oldcur}
>  	}						//\lnlbl{loop:e}
> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> index 0152437c274e..e2159330790f 100644
> --- a/datastruct/datastruct.tex
> +++ b/datastruct/datastruct.tex
> @@ -1245,10 +1245,7 @@ lines~\lnref{loop_list:b}-\lnref{loop_list:e} adds one data element
>  from the current old-table bucket to the corresponding new-table bucket,
>  holding the new-table bucket's lock during the add operation.
>  Line~\lnref{update_resize} updates
> -\co{->ht_resize_cur} to indicate that this bucket has been distributed,
> -and is preceded by a full memory barrier that pairs with the one in
> -\co{ht_get_bucket()} shown in
> -Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}.
> +\co{->ht_resize_cur} to indicate that this bucket has been distributed.
>  Finally, line~\lnref{rel_oldcur} releases the old-table bucket lock.
>  
>  \QuickQuiz{}
> 
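
Restating the reasoning above in code form: even with the smp_mb()
gone, the following two accesses from hash_resize.c can execute
concurrently when they refer to different buckets, which is why the
_ONCE() markings must remain (a sketch only; the locking context is
given in the comments):

	/* Resizer, in hashtab_resize(), holding old-table bucket i's lock: */
	WRITE_ONCE(htp->ht_resize_cur, i);	/* claim bucket i as distributed */

	/* Updater, in hashtab_lock_mod(), holding old-table bucket b's lock,
	 * where b != i -- so this read can race with the write above: */
	if (b > READ_ONCE(htp->ht_resize_cur)) {
		lsp->hbp[1] = NULL;		/* resize has not reached bucket b */
		return;
	}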


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-08  1:56       ` Junchang Wang
@ 2019-01-08 15:28         ` Paul E. McKenney
  2019-01-08 15:35           ` Akira Yokosawa
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2019-01-08 15:28 UTC (permalink / raw)
  To: Junchang Wang; +Cc: Akira Yokosawa, perfbook

On Tue, Jan 08, 2019 at 09:56:57AM +0800, Junchang Wang wrote:
> 
> Hi Paul and Akira,
> 
> Thanks a lot for the comments, which I need some more time to look
> into. For Paul's patch, I have a few concerns. Please take a look.
> 
> My understanding is that with this patch, during the time period when
> the resizing thread is running, an updater may insert/delete an item
> into/from the new hash table while readers are still looking up data
> in the old one, with the result that readers are unaware of
> insertions/deletions happening concurrently. For example, it seems
> the following sequence could happen.
> 
> 1. The resizing thread starts.
> 2. The resizing thread successfully passes bucket *B* of the old hash table.
> 3. An updater wants to insert a new item *I* which should be inserted
> into bucket *B*.
> 4. The updater will select the new hash table and insert the item *I*
> into the new hash table.
> 5. A read request comes in and wants to look up item *I*. The lookup
> request will check the old hash table and fail, won't it?
> 6. The resizing thread exits.
> 7. Now subsequent read requests can successfully find item *I*.

Yes, this can happen.

> Is my understanding correct? Please let me know if I misunderstood
> anything. Given that this patch accelerates the fast path, I think it
> should be OK, because resizing typically happens rarely. Just want to
> make sure I fully understand the algorithm.

It is a design choice, and some users would prefer not to fail to see
new items during a resize.  One approach would be to revert to the
old-style checking, and another would be to provide a separate
lookup interface that synchronizes with adds and deletes.
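
For concreteness, here is a minimal sketch of what that separate
interface might look like if it simply restored the old-style
fallback to the new table (hypothetical name, not in the repository;
the caller must be within an RCU read-side critical section, as for
hashtab_lookup()):

	struct ht_elem *hashtab_lookup_latest(struct hashtab *htp_master, void *key)
	{
		long b;
		struct ht *htp;
		struct ht_elem *htep;
		struct ht_bucket *htbp;

		htp = rcu_dereference(htp_master->ht_cur);
		htbp = ht_get_bucket(htp, key, &b, NULL);
		if (b <= READ_ONCE(htp->ht_resize_cur)) {
			/* Bucket already distributed: search the new table.
			 * NB: reinstating this check also reinstates the
			 * smp_mb() pairing question discussed earlier. */
			htp = rcu_dereference(htp->ht_new);
			htbp = ht_get_bucket(htp, key, &b, NULL);
		}
		cds_list_for_each_entry_rcu(htep, &htbp->htb_head,
		                            hte_next[htp->ht_idx]) {
			if (htp->ht_cmp(htep, key))
				return htep;
		}
		return NULL;
	}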

So, I could add a quick quiz with this information, I could revert the
change, or I could add another lookup function that provided more timely
information.  Left to myself, I would provide a quick quiz, but what
do you guys think?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-08 15:18         ` Akira Yokosawa
@ 2019-01-08 15:32           ` Paul E. McKenney
  0 siblings, 0 replies; 23+ messages in thread
From: Paul E. McKenney @ 2019-01-08 15:32 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Junchang Wang, perfbook

On Wed, Jan 09, 2019 at 12:18:14AM +0900, Akira Yokosawa wrote:
> On 2019/01/07 15:48:50 -0800, Paul E. McKenney wrote:
> > On Tue, Jan 08, 2019 at 08:06:51AM +0900, Akira Yokosawa wrote:
> >> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
> >>> Hi Paul, 
> >>>
> >>> On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote:
> >>>> -	if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {	//\lnlbl{resized}
> >>>> -		smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
> >>>
> >>> If we can remove this memory barrier, the counterpart smp_mb() in
> >>> hashtab_resize() becomes unnecessary, doesn't it?
> >>
> >> And the WRITE_ONCE() in the following line.
> > 
> > Actually, that must stay.  It is true that the bucket lock is held by
> > hashtab_unlock_mod(), and that this prevents concurrent resizing of that
> > bucket, but other buckets might well be resized, which results in the
> > possibility of concurrent reads and writes for ->ht_resize_cur.
> > 
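For concreteness, here is a rough standalone sketch of the race that
the ONCE() markings annotate (a mock-up, not the perfbook code:
pthread mutexes and file-scope variables stand in for ->htb_lock and
->ht_resize_cur).  The resizer stores the progress counter while
holding bucket i's lock, but an updater holding a *different*
bucket's lock may load it at that same instant:

#include <pthread.h>
#include <stdio.h>

/* Kernel-style one-access macros (GNU C __typeof__). */
#define READ_ONCE(x)     (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define NBKTS 8

static long resize_cur = -1;            /* mock ->ht_resize_cur */
static pthread_mutex_t bkt_lock[NBKTS]; /* mock per-bucket ->htb_lock */

/* Resizer: publishes progress under old-table bucket i's lock. */
static void *resizer(void *arg)
{
	for (long i = 0; i < NBKTS; i++) {
		pthread_mutex_lock(&bkt_lock[i]);
		/* ... redisposition bucket i into the new table ... */
		WRITE_ONCE(resize_cur, i);      /* races with load below */
		pthread_mutex_unlock(&bkt_lock[i]);
	}
	return NULL;
}

/* Updater: holds bucket b's lock, which need not be bucket i's.
 * The load races with the store above, but cannot change the "if"
 * outcome: reaching resize_cur >= b requires bucket b's lock. */
static void *updater(void *arg)
{
	long b = NBKTS - 1;

	pthread_mutex_lock(&bkt_lock[b]);
	if (b > READ_ONCE(resize_cur))
		printf("bucket %ld not yet distributed\n", b);
	pthread_mutex_unlock(&bkt_lock[b]);
	return NULL;
}

int main(void)
{
	pthread_t r, u;

	for (int i = 0; i < NBKTS; i++)
		pthread_mutex_init(&bkt_lock[i], NULL);
	pthread_create(&r, NULL, resizer, NULL);
	pthread_create(&u, NULL, updater, NULL);
	pthread_join(r, NULL);
	pthread_join(u, NULL);
	return 0;
}
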
> > Anyway, here is the resulting commit.  Thoughts?
> 
> Looks good to me. (Give or take a few typos in the commit log.)
> I missed the remaining READ_ONCE(), which didn't appear in the diff.
> 
> And if you choose this way of simplifying hashtab_lookup(),
> the Quick Quiz on smp_mb()s would become out of context.

Good catch, queued its removal with your Reported-by.

							Thanx, Paul

>         Thanks, Akira
> 
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 1aac0c703482c90c2ce4092b2cc604d474f5a44b
> > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > Date:   Mon Jan 7 15:39:40 2019 -0800
> > 
> >     datastruct/hash: Remove extraneous barrier from hashtab_resize()
> >     
> >     Now that hashtab_lookup() is resizing-agnostic, all non-initialization
> >     accesses to ->ht_resize_cur are protected by locking in the restricted
> >     sense that any change to ->ht_resize_cur that would change the value
> >     of the "if" condition cannot happen while the lock is held on the old
> >     bucket.  This means that the memory barrier may be removed.  However,
> >     the READ_ONCE() and WRITE_ONCE() markings on non-initialization accesses
> >     to ->ht_resize_cur must remain because reads from ->ht_resize_cur really
> >     can race with writes, just not in a way to change the "if" condition.
> >     
> >     Reported-by: Akira Yokosawa <akiyks@gmail.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> > 
> > diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> > index be4157959b83..9f68a00dabe3 100644
> > --- a/CodeSamples/datastruct/hash/hash_resize.c
> > +++ b/CodeSamples/datastruct/hash/hash_resize.c
> > @@ -288,7 +288,6 @@ int hashtab_resize(struct hashtab *htp_master,
> >  			cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
> >  			spin_unlock(&htbp_new->htb_lock);
> >  		}					//\lnlbl{loop_list:e}
> > -		smp_mb(); /* Fill new buckets before claiming them. */
> >  		WRITE_ONCE(htp->ht_resize_cur, i);	//\lnlbl{update_resize}
> >  		spin_unlock(&htbp->htb_lock);		//\lnlbl{rel_oldcur}
> >  	}						//\lnlbl{loop:e}
> > diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> > index 0152437c274e..e2159330790f 100644
> > --- a/datastruct/datastruct.tex
> > +++ b/datastruct/datastruct.tex
> > @@ -1245,10 +1245,7 @@ lines~\lnref{loop_list:b}-\lnref{loop_list:e} adds one data element
> >  from the current old-table bucket to the corresponding new-table bucket,
> >  holding the new-table bucket's lock during the add operation.
> >  Line~\lnref{update_resize} updates
> > -\co{->ht_resize_cur} to indicate that this bucket has been distributed,
> > -and is preceded by a full memory barrier that pairs with the one in
> > -\co{ht_get_bucket()} shown in
> > -Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}.
> > +\co{->ht_resize_cur} to indicate that this bucket has been distributed.
> >  Finally, line~\lnref{rel_oldcur} releases the old-table bucket lock.
> >  
> >  \QuickQuiz{}
> > 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-08 15:28         ` Paul E. McKenney
@ 2019-01-08 15:35           ` Akira Yokosawa
  2019-01-08 18:39             ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Akira Yokosawa @ 2019-01-08 15:35 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Junchang Wang, perfbook, Akira Yokosawa

On 2019/01/09 0:28, Paul E. McKenney wrote:
> On Tue, Jan 08, 2019 at 09:56:57AM +0800, Junchang Wang wrote:
>> On Tue, Jan 8, 2019 at 7:06 AM Akira Yokosawa <akiyks@gmail.com> wrote:
>>> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
>>>> Hi Paul,
>>>>
>>>> On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote:
>>>>> On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I'm reading hash_resize recently, and have a few questions regarding
>>>>>> this algorithm. Please take a look if you have time. Any suggestions
>>>>>> are warmly welcomed.
>>>>>>
>>>>>> === Question 1 ===
>>>>>> In hash_resize.c : hashtab_lock_mod
>>>>>> 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
>>>>>> 187                 lsp->hbp[1] = NULL;
>>>>>> 188                 return;
>>>>>> 189         }
>>>>>> 190         htp = rcu_dereference(htp->ht_new);
>>>>>>
>>>>>> It seems we are missing a barrier (e.g., smp_mb) in between lines 189
>>>>>> and 190, because neither READ_ONCE() nor rcu_dereference() can prevent
>>>>>> compilers and hardware from reordering the two unrelated variables,
>>>>>> ht_resize_cur and ht_new. Is my understanding correct?
>>>>>
>>>>> Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
>>>>> section
>>>>
>>>> You mean "rcu_read_lock() at the beginning of hashtab_lock_mod() starts
>>>> an RCU read-side critical section", don't you?
>>>>
>>>>>         and there is a synchronize_rcu() between the update to ->ht_new
>>>>> and the updates to ->ht_resize_cur.  For more details on how this works,
>>>>> please see https://lwn.net/Articles/573497/.
>>>>>
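To see this ordering argument in miniature, here is a standalone
sketch using userspace RCU (liburcu; build with something like
"gcc sketch.c -lpthread -lurcu").  The struct and field names only
mimic the real ones:

#define _LGPL_SOURCE
#include <urcu.h>
#include <pthread.h>
#include <stdio.h>

#define READ_ONCE(x)     (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct ht {
	struct ht *ht_new;
	long ht_resize_cur;
};
static struct ht new_tbl;
static struct ht old_tbl = { .ht_new = NULL, .ht_resize_cur = -1 };

/* Resize side, mirroring hashtab_resize()'s order of operations. */
static void *resizer(void *arg)
{
	rcu_register_thread();
	rcu_assign_pointer(old_tbl.ht_new, &new_tbl); /* publish */
	synchronize_rcu();      /* wait out all pre-existing readers */
	WRITE_ONCE(old_tbl.ht_resize_cur, 0);   /* only now advance */
	rcu_unregister_thread();
	return NULL;
}

/* Update side, mirroring hashtab_lock_mod()'s check. */
static void *updater(void *arg)
{
	long b = 0;

	rcu_register_thread();
	rcu_read_lock();
	if (b <= READ_ONCE(old_tbl.ht_resize_cur)) {
		/*
		 * A reader that sees the advanced ->ht_resize_cur
		 * must have started its critical section after the
		 * grace period, hence after ->ht_new was published,
		 * so no explicit smp_mb() is needed here.
		 */
		struct ht *htp = rcu_dereference(old_tbl.ht_new);

		printf("resize seen, new table at %p\n", (void *)htp);
	}
	rcu_read_unlock();
	rcu_unregister_thread();
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, resizer, NULL);
	pthread_create(&t2, NULL, updater, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}
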
>>>>> Of course, if you find a code path in which a call to hashtab_lock_mod()
>>>>> is invoked outside of an RCU read-side critical section, that would be
>>>>> a bug.  (Can you tell me an exception to this rule, that is, a case
>>>>> where hashtab_lock_mod() could safely be invoked outside of an RCU
>>>>> read-side critical section?)
>>>>>
>>>>>> === Question 2 ===
>>>>>> In hash_resize.c, each time an updater wants to access a bucket, the
>>>>>> updater must first acquire the bucket's lock (htb_lock), preventing
>>>>>> other updaters accessing the same bucket concurrently. This approach
>>>>>> is OK if the linked list of a bucket is relatively short, but for a
>>>>>> larger system where linked lists are long enough and the
>>>>>> perftest_resize thread is running simultaneously, it could become a
>>>>>> potential performance bottleneck. One naive solution is to allow
>>>>>> multiple updaters to access the same bucket, only if they don't
>>>>>> operate on the same item of the list of this bucket. I wonder if there
>>>>>> are any existing works or discussions on this topic?
>>>>>
>>>>> One approach is to use a hashed array of locks, and to hash a given
>>>>> element's address to locate the lock to be used.  Please see
>>>>> Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
>>>>> Needed Locks First"), including Quick Quiz 7.9, for additional details.
>>>>>
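For concreteness, a rough sketch of the hashed-lock-array idea
(pthread mutexes standing in for spinlocks; the lock count and the
pointer-mixing step are arbitrary choices):

#include <pthread.h>
#include <stdint.h>

#define N_LOCKS 64	/* arbitrary; a small multiple of the CPU count */

static pthread_mutex_t lockarray[N_LOCKS] = {
	[0 ... N_LOCKS - 1] = PTHREAD_MUTEX_INITIALIZER	/* GNU C ranges */
};

/* Hash the element's address to select the lock protecting it. */
static pthread_mutex_t *elem_lock(void *ep)
{
	uintptr_t a = (uintptr_t)ep;

	a ^= a >> 16;	/* cheap mixing of the pointer bits */
	return &lockarray[a % N_LOCKS];
}

/* Updaters contend only when their elements hash to the same lock,
 * even if those elements happen to share a hash-table bucket. */
void update_element(void *ep)
{
	pthread_mutex_lock(elem_lock(ep));
	/* ... modify the element ... */
	pthread_mutex_unlock(elem_lock(ep));
}
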
>>>>> Another approach is to use RCU to protect traversals, and locks within the
>>>>> linked-list elements themselves.  These locks are conditionally acquired
>>>>> (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
>>>>> them in list order, and the tricks in Quick Quiz 7.9.
>>>>>
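And a similarly rough sketch of this second approach, with a lock in
each element and conditional (trylock-based) acquisition in list
order; the element type is made up for illustration:

#include <pthread.h>

struct elem {
	struct elem *next;	/* list traversed under rcu_read_lock() */
	pthread_mutex_t lock;	/* per-element lock */
	int data;
};

/*
 * Conditionally acquire the locks of an element and its successor,
 * in list order.  A zero return means contention; the caller backs
 * off and retries, which is how deadlock is avoided.
 */
static int lock_elem_pair(struct elem *p, struct elem *q)
{
	if (pthread_mutex_trylock(&p->lock) != 0)
		return 0;			/* p contended */
	if (q && pthread_mutex_trylock(&q->lock) != 0) {
		pthread_mutex_unlock(&p->lock);	/* unwind and retry */
		return 0;
	}
	return 1;		/* both locks held, safe to update */
}
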
>>>>> Non-blocking synchronization can also be used, but it is often quite a
>>>>> bit more complicated.  See for example the split-order list of Shalev
>>>>> and Shavit, along with Desnoyers's RCU-protected extension in the
>>>>> userspace RCU library.
>>>>>
>>>>> But it is usually -way- better to just choose a good hash function and
>>>>> to increase the number of buckets.  Which is of course one reason for
>>>>> having resizable hash tables.  ;-)
>>>>>
>>>>> But the other techniques can be useful in more complex linked data
>>>>> structures, such as graphs, where there is no reasonable way to
>>>>> partition the data.  Nevertheless, many people choose to do the
>>>>> partitioning anyway, especially on distributed systems.
>>>>>
>>>>>> === Question 3 ===
>>>>>> Chapter Data Structures also discusses other resizable hash tables,
>>>>>> namely "Resizable, scalable, concurrent hash tables via relativistic
>>>>>> programming" from Josh Triplett, which can save memory footprint by
>>>>>> using a single pair of pointers. But my understanding is that
>>>>>> perftest_resize.c is unique in that it allows you to rebuild the hash
>>>>>> table by utilizing a different hash function, which could be very
>>>>>> useful in practice (e.g., to prevent DDoS attack). Other solutions do
>>>>>> not share this property. Is my understanding correct? Did I miss any
>>>>>> discussions on this topic in perfbook?
>>>>>
>>>>> Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
>>>>> (which I use in hash_resize.c) is the only one allowing arbitrary changes
>>>>> to hash functions.  I expect that this advantage will become increasingly
>>>>> important as security issues become more challenging.  Furthermore, I
>>>>> suspect that the pointer-pair approach is faster and more scalable.
>>>>> It is certainly simpler.
>>>>>
>>>>> On the other hand, one advantage of the other two approaches is decreased
>>>>> memory consumption.
>>>>>
>>>>> Another advantage of Josh Triplett's pointer-unzip approach is that
>>>>> concurrent updates are (in theory, anyway) not blocked for as long
>>>>> by resize operations.  The other edge of this sword is that resizing
>>>>> is much slower, given the need to wait for many RCU grace periods.
>>>>>
>>>>> Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
>>>>> and Shavit's split-order list is that all operations are non-blocking,
>>>>> which can be important on massively overloaded systems, such as one
>>>>> might find in cloud computing.
>>>>>
>>>>>> === Question 4 ===
>>>>>> In the current implementation of hash_resize.c, the perftest_resize
>>>>>> could block an updater, and vice versa. It seems this is not what we
>>>>>> expected. Ideally, they should be allowed to run concurrently, or at
>>>>>> least the perftest_resize thread should have lower priority and
>>>>>> updaters should never be blocked by the perftest_resize thread. Is
>>>>>> that right? I'm very interested in helping improve. Please let me know
>>>>>> if you have any suggestions.
>>>>>
>>>>> In hash_resize.c, an updater is blocked only for the time required to
>>>>> redisposition a bucket.  This is a great improvement over blocking
>>>>> updaters for the full resize over all buckets.
>>>>>
>>>>> But yes, it is not hard to do better, for example, periodically dropping
>>>>> the old-table lock in hashtab_resize().  This requires a few careful
>>>>> adjustments, of course.  Can you tell me what these adjustments are?
>>>>>
>>>>> Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
>>>>> optimizing for the race with hashtab_resize() doesn't make a whole lot
>>>>> of sense.  Please see the patch below.  Thoughts?
>>>>>
>>>>>                                                      Thanx, Paul
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> commit 737646a9c868d841b32199b52f5569668975953e
>>>>> Author: Paul E. McKenney <paulmck@linux.ibm.com>
>>>>> Date:   Mon Jan 7 10:29:14 2019 -0800
>>>>>
>>>>>     datastruct/hash: Simplify hashtab_lookup()
>>>>>
>>>>>     Because resizing leaves the old hash table intact, and because lookups
>>>>>     are carried out within RCU read-side critical sections (which prevent
>>>>>     a second resizing operation from starting), there is no need for a
>>>>>     lookup to search anywhere but in the old hash table.  And in the common
>>>>>     case, there is no resize, so there is no new hash table.  Therefore,
>>>>>     eliminating the check for resizing speeds things up in the common
>>>>>     case.  In addition, this simplifies the code.
>>>>>
>>>>>     This commit therefore eliminates the ht_get_bucket() function,
>>>>>     renames the ht_get_bucket_single() function to ht_get_bucket(),
>>>>>     and modifies callers appropriately.
>>>>>
>>>>>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
>>>>>
>>>>> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
>>>>> index 29e05f907200..be4157959b83 100644
>>>>> --- a/CodeSamples/datastruct/hash/hash_resize.c
>>>>> +++ b/CodeSamples/datastruct/hash/hash_resize.c
>>>>> @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
>>>>>  //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
>>>>>  /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
>>>>>  static struct ht_bucket *                           //\lnlbl{single:b}
>>>>> -ht_get_bucket_single(struct ht *htp, void *key, long *b,
>>>>> -                     unsigned long *h)
>>>>> +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
>>>>>  {
>>>>>      unsigned long hash = htp->ht_gethash(key);
>>>>>
>>>>> @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
>>>>>              *h = hash;                              //\lnlbl{single:h}
>>>>>      return &htp->ht_bkt[*b];                        //\lnlbl{single:return}
>>>>>  }                                                   //\lnlbl{single:e}
>>>>> -
>>>>> -/* Get hash bucket correesponding to key, accounting for resize. */
>>>>> -static struct ht_bucket *                           //\lnlbl{b}
>>>>> -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
>>>>> -{
>>>>> -    struct ht_bucket *htbp;
>>>>> -
>>>>> -    htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
>>>>> -                                                            //\fcvexclude
>>>>> -    if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {   //\lnlbl{resized}
>>>>> -            smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
>>>>
>>>> If we can remove this memory barrier, the counterpart smp_mb() in
>>>> hashtab_resize() becomes unnecessary, doesn't it?
>>>
>>> And the WRITE_ONCE() in the following line.
>>>
>>>         Thanks, Akira
>>>>
>>>>         Thanks, Akira
>>>>
>>>>> -            *htp = rcu_dereference((*htp)->ht_new); //\lnlbl{newtable}
>>>>> -            htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket}
>>>>> -    }
>>>>> -    if (i)                                          //\lnlbl{chk_i}
>>>>> -            *i = (*htp)->ht_idx;                    //\lnlbl{set_idx}
>>>>> -    return htbp;                                    //\lnlbl{return}
>>>>> -}                                                   //\lnlbl{e}
>>>>>  //\end{snippet}
>>>>>
>>>>>  /* Read-side lock/unlock functions. */
>>>>> @@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
>>>>>
>>>>>      rcu_read_lock();                                //\lnlbl{l:rcu_lock}
>>>>>      htp = rcu_dereference(htp_master->ht_cur);      //\lnlbl{l:refhashtbl}
>>>>> -    htbp = ht_get_bucket_single(htp, key, &b, &h);  //\lnlbl{l:refbucket}
>>>>> +    htbp = ht_get_bucket(htp, key, &b, &h);         //\lnlbl{l:refbucket}
>>>>>      spin_lock(&htbp->htb_lock);                     //\lnlbl{l:acq_bucket}
>>>>>      lsp->hbp[0] = htbp;                             //\lnlbl{l:lsp0b}
>>>>>      lsp->hls_idx[0] = htp->ht_idx;
>>>>> @@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
>>>>>              return;                                 //\lnlbl{l:fastret1}
>>>>>      }
>>>>>      htp = rcu_dereference(htp->ht_new);             //\lnlbl{l:new_hashtbl}
>>>>> -    htbp = ht_get_bucket_single(htp, key, &b, &h);  //\lnlbl{l:get_newbkt}
>>>>> +    htbp = ht_get_bucket(htp, key, &b, &h);         //\lnlbl{l:get_newbkt}
>>>>>      spin_lock(&htbp->htb_lock);                     //\lnlbl{l:acq_newbkt}
>>>>>      lsp->hbp[1] = htbp;                             //\lnlbl{l:lsp1b}
>>>>>      lsp->hls_idx[1] = htp->ht_idx;
>>>>> @@ -223,16 +204,15 @@ struct ht_elem *                                       //\lnlbl{lkp:b}
>>>>>  hashtab_lookup(struct hashtab *htp_master, void *key)
>>>>>  {
>>>>>      long b;
>>>>> -    int i;
>>>>>      struct ht *htp;
>>>>>      struct ht_elem *htep;
>>>>>      struct ht_bucket *htbp;
>>>>>
>>>>>      htp = rcu_dereference(htp_master->ht_cur);      //\lnlbl{lkp:get_curtbl}
>>>>> -    htbp = ht_get_bucket(&htp, key, &b, &i);        //\lnlbl{lkp:get_curbkt}
>>>>> +    htbp = ht_get_bucket(htp, key, &b, NULL);       //\lnlbl{lkp:get_curbkt}
>>>>>      cds_list_for_each_entry_rcu(htep,               //\lnlbl{lkp:loop:b}
>>>>>                                  &htbp->htb_head,
>>>>> -                                hte_next[i]) {
>>>>> +                                hte_next[htp->ht_idx]) {
>>>>>              if (htp->ht_cmp(htep, key))             //\lnlbl{lkp:match}
>>>>>                      return htep;                    //\lnlbl{lkp:ret_match}
>>>>>      }                                               //\lnlbl{lkp:loop:e}
>>>>> @@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master,
>>>>>              htbp = &htp->ht_bkt[i];                 //\lnlbl{get_oldcur}
>>>>>              spin_lock(&htbp->htb_lock);             //\lnlbl{acq_oldcur}
>>>>>              cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b}
>>>>> -                    htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL);
>>>>> +                    htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL);
>>>>>                      spin_lock(&htbp_new->htb_lock);
>>>>>                      cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
>>>>>                      spin_unlock(&htbp_new->htb_lock);
>>>>> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
>>>>> index 5c61bf5e2389..0152437c274e 100644
>>>>> --- a/datastruct/datastruct.tex
>>>>> +++ b/datastruct/datastruct.tex
>>>>> @@ -966,10 +966,8 @@ the old table.
>>>>>  \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
>>>>>  Bucket selection is shown in
>>>>>  Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
>>>>> -which shows \co{ht_get_bucket_single()} on
>>>>> -lines~\lnref{single:b}-\lnref{single:e} and
>>>>> -\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}.
>>>>> -The \co{ht_get_bucket_single()} function returns a reference to the bucket
>>>>> +which shows \co{ht_get_bucket()}.
>>>>> +This function returns a reference to the bucket
>>>>>  corresponding to the specified key in the specified hash table, without
>>>>>  making any allowances for resizing.
>>>>>  It also stores the bucket index corresponding to the key into the location
>>>>> @@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding
>>>>>  hash value corresponding to the key into the location
>>>>>  referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
>>>>>  Line~\lnref{single:return} then returns a reference to the corresponding bucket.
>>>>> -
>>>>> -The \co{ht_get_bucket()} function handles hash-table selection, invoking
>>>>> -\co{ht_get_bucket_single()} on
>>>>> -line~\lnref{call_single} to select the bucket
>>>>> -corresponding to the hash in the current
>>>>> -hash table, storing the hash value through parameter~\co{b}.
>>>>> -If line~\lnref{resized} determines that the table is being resized and that
>>>>> -line~\lnref{call_single}'s bucket has already been distributed across the new hash
>>>>> -table, then line~\lnref{newtable} selects the new hash table and
>>>>> -line~\lnref{newbucket}
>>>>> -selects the bucket corresponding to the hash in the new hash table,
>>>>> -again storing the hash value through parameter~\co{b}.
>>>>> -\end{lineref}
>>>>> -
>>>>> -\QuickQuiz{}
>>>>> -    The code in
>>>>> -    Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
>>>>> -    computes the hash twice!
>>>>> -    Why this blatant inefficiency?
>>>>> -\QuickQuizAnswer{
>>>>> -    The reason is that the old and new hash tables might have
>>>>> -    completely different hash functions, so that a hash computed
>>>>> -    for the old table might be completely irrelevant to the
>>>>> -    new table.
>>>>> -} \QuickQuizEnd
>>>>> -
>>>>> -\begin{lineref}[ln:datastruct:hash_resize:get_bucket]
>>>>> -If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then
>>>>> -line~\lnref{set_idx} stores the pointer-set index for the selected hash table.
>>>>> -Finally, line~\lnref{return} returns a reference to the selected hash bucket.
>>>>>  \end{lineref}
>>>>>
>>>>>  \QuickQuiz{}
>>>>> @@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket.
>>>>>      functions described next.
>>>>>  } \QuickQuizEnd
>>>>>
>>>>> -This implementation of
>>>>> -\co{ht_get_bucket_single()} and \co{ht_get_bucket()}
>>>>> -permit lookups and modifications to run concurrently
>>>>> -with a resize operation.
>>>>> +This implementation of \co{ht_get_bucket()} permits lookups and
>>>>> +modifications to run concurrently with a resize operation.
>>>>>
>>>>>  \begin{listing}[tb]
>>>>>  \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv}
>>>>> @@ -1129,11 +1095,6 @@ hash lookups.
>>>>>  Line~\lnref{get_curtbl} fetches the current hash table and
>>>>>  line~\lnref{get_curbkt} obtains a reference
>>>>>  to the bucket corresponding to the specified key.
>>>>> -This bucket will be located in a new resized hash table when a
>>>>> -resize operation has progressed past the bucket in the old hash
>>>>> -table that contained the desired data element.
>>>>> -Note that line~\lnref{get_curbkt} also passes back the index that will be
>>>>> -used to select the correct set of pointers from the pair in each element.
>>>>>  The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
>>>>>  so that if line~\lnref{match}
>>>>>  detects a match,
>>>>> @@ -1144,22 +1105,17 @@ failure.
>>>>>  \end{lineref}
>>>>>
>>>>>  \QuickQuiz{}
>>>>> -    In the \co{hashtab_lookup()} function in
>>>>> -    Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions},
>>>>> -    the code carefully finds the right bucket in the new hash table
>>>>> -    if the element to be looked up has already been distributed
>>>>> -    by a concurrent resize operation.
>>>>> -    This seems wasteful for RCU-protected lookups.
>>>>> -    Why not just stick with the old hash table in this case?
>>>>> +    \begin{lineref}[ln:datastruct:hash_resize:access:lkp]
>>>>> +    What if execution reaches line~\lnref{loop:b}
>>>>> +    of \co{hashtab_lookup()} in
>>>>> +    Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
>>>>> +    just after this bucket has been resized.
>>>>> +    Won't that result in lookup failures?
>>>>> +    \end{lineref}
>>>>>  \QuickQuizAnswer{
>>>>> -    Suppose that a resize operation begins and distributes half of
>>>>> -    the old table's buckets to the new table.
>>>>> -    Suppose further that a thread adds a new element that goes into
>>>>> -    one of the already-distributed buckets, and that this same thread
>>>>> -    now looks up this newly added element.
>>>>> -    If lookups unconditionally traversed only the old hash table,
>>>>> -    this thread would get a lookup failure for the element that it
>>>>> -    just added, which certainly sounds like a bug to me!
>>>>> +    No, it won't.
>>>>> +    Resizing into the new hash table leaves the old hash table
>>>>> +    intact, courtesy of the pointer pairs.
>>>>>  } \QuickQuizEnd
>>>>>
>>>>>  \begin{lineref}[ln:datastruct:hash_resize:access:add]
>>>>>
>>
>> Hi Paul and Akira,
>>
>> Thanks a lot for the comments, which I need some more time to look
>> into. For Paul's patch, I have a few concerns. Please take a look.
>>
>> My understanding is that with this patch, during the time period when
>> the resizing thread is running, an updater may insert/delete an item
>> into/from the new hash table, while readers are still looking up data
>> in the old one, leaving readers unaware of
>> insertions/deletions happening simultaneously. For example, it seems
>> the following sequence could happen.
>>
>> 1. The resizing thread starts.
>> 2. The resizing thread successfully passes bucket *B* of the old hash table.
>> 3. An updater wants to insert a new item *I* which should be inserted
>> into bucket *B*.
>> 4. The updater will select the new hash table and insert the item *I*
>> into the new hash table.
>> 5. A read request comes in and wants to look up item *I*. The lookup
>> request will check the old hash table and fail, won't it?
>> 6. The resizing thread exits.
>> 7. Now subsequent read requests can successfully find item *I*.
> 
> Yes, this can happen.
> 
>> Is my understanding correct? Please let me know if I misunderstood
>> anything. Given that this patch can accelerate the fast path,
>> I think it should be OK because resizing typically happens rarely.
>> Just want to make sure I fully understand the algorithm.
> 
> It is a design choice, and some users would prefer not to fail to see
> new items during a resize.  One approach would be to revert back to
> the old-style checking, and another would be to provide a separate
> lookup interface that synchronizes with adds and deletes.
> 
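One hypothetical shape such a separate interface might take is
sketched below.  Everything here is illustrative rather than code
from the tree: the name hashtab_lookup_sync(), the lock-state type
name, the explicit cmp argument (the real code keeps the compare
function in struct ht), and the assumption that hashtab_unlock_mod()
takes the lock state.  It leans on the fact that hashtab_lock_mod()
leaves the key's old-table bucket, and possibly the corresponding
new-table bucket, locked in lsp->hbp[0] and lsp->hbp[1]:

/* Hypothetical synchronized lookup: scanning under the same
 * per-bucket lock(s) as the update-side functions means this
 * variant cannot miss an element whose hashtab_add() has completed.
 * On success, the caller must invoke hashtab_unlock_mod() when done
 * with the element. */
struct ht_elem *
hashtab_lookup_sync(struct hashtab *htp_master, void *key,
                    struct ht_lock_state *lsp,
                    int (*cmp)(struct ht_elem *htep, void *key))
{
	int i;
	struct ht_elem *htep;

	hashtab_lock_mod(htp_master, key, lsp);
	for (i = 0; i < 2; i++) {
		if (!lsp->hbp[i])
			continue;
		cds_list_for_each_entry(htep, &lsp->hbp[i]->htb_head,
					hte_next[lsp->hls_idx[i]])
			if (cmp(htep, key))
				return htep;	/* caller unlocks later */
	}
	hashtab_unlock_mod(lsp);		/* not found: drop lock(s) */
	return NULL;
}
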
> So, I could add a quick quiz with this information, I could revert the
> change, or I could add another lookup function that provided more timely
> information.  Left to myself, I would provide a quick quiz, but what
> do you guys think?

Hi, I was composing a message, but now I'm replying to this one.
I think adding a quick quiz would be a good idea.

        Thanks, Akira

> 
> 							Thanx, Paul
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-08 15:35           ` Akira Yokosawa
@ 2019-01-08 18:39             ` Paul E. McKenney
  2019-01-08 22:16               ` Akira Yokosawa
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2019-01-08 18:39 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Junchang Wang, perfbook

On Wed, Jan 09, 2019 at 12:35:37AM +0900, Akira Yokosawa wrote:
> On 2019/01/09 0:28, Paul E. McKenney wrote:
> > On Tue, Jan 08, 2019 at 09:56:57AM +0800, Junchang Wang wrote:
> >> On Tue, Jan 8, 2019 at 7:06 AM Akira Yokosawa <akiyks@gmail.com> wrote:
> >>> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
> >>>> Hi Paul,
> >>>>
> >>>> On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote:
> >>>>> On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I'm reading hash_resize recently, and have a few questions regarding
> >>>>>> this algorithm. Please take a look if you have time. Any suggestions
> >>>>>> are warmly welcomed.
> >>>>>>
> >>>>>> === Question 1 ===
> >>>>>> In hash_resize.c : hashtab_lock_mod
> >>>>>> 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
> >>>>>> 187                 lsp->hbp[1] = NULL;
> >>>>>> 188                 return;
> >>>>>> 189         }
> >>>>>> 190         htp = rcu_dereference(htp->ht_new);
> >>>>>>
> >>>>>> It seems we are missing a barrier (e.g., smp_mb) in between lines 189
> >>>>>> and 190, because neither READ_ONCE() nor rcu_dereference() can prevent
> >>>>>> compilers and hardware from reordering the two unrelated variables,
> >>>>>> ht_resize_cur and ht_new. Is my understanding correct?
> >>>>>
> >>>>> Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
> >>>>> section
> >>>>
> >>>> You mean "rcu_read_lock() at the beginning of hashtab_lock_mod() starts
> >>>> an RCU read-side critical section", don't you?
> >>>>
> >>>>>         and there is a synchronize_rcu() between the update to ->ht_new
> >>>>> and the updates to ->ht_resize_cur.  For more details on how this works,
> >>>>> please see https://lwn.net/Articles/573497/.
> >>>>>
> >>>>> Of course, if you find a code path in which a call to hashtab_lock_mod()
> >>>>> is invoked outside of an RCU read-side critical section, that would be
> >>>>> a bug.  (Can you tell me an exception to this rule, that is, a case
> >>>>> where hashtab_lock_mod() could safely be invoked outside of an RCU
> >>>>> read-side critical section?)
> >>>>>
> >>>>>> === Question 2 ===
> >>>>>> In hash_resize.c, each time an updater wants to access a bucket, the
> >>>>>> updater must first acquire the bucket's lock (htb_lock), preventing
> >>>>>> other updaters accessing the same bucket concurrently. This approach
> >>>>>> is OK if the linked list of a bucket is relatively short, but for a
> >>>>>> larger system where linked lists are long enough and the
> >>>>>> perftest_resize thread is running simultaneously, it could become a
> >>>>>> potential performance bottleneck. One naive solution is to allow
> >>>>>> multiple updaters to access the same bucket, only if they don't
> >>>>>> operate on the same item of the list of this bucket. I wonder if there
> >>>>>> are any existing works or discussions on this topic?
> >>>>>
> >>>>> One approach is to use a hashed array of locks, and to hash a given
> >>>>> element's address to locate the lock to be used.  Please see
> >>>>> Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
> >>>>> Needed Locks First"), including Quick Quiz 7.9, for additional details.
> >>>>>
> >>>>> Another approach is to use RCU to protect traversals, and locks within the
> >>>>> linked-list elements themselves.  These locks are conditionally acquired
> >>>>> (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
> >>>>> them in list order, and the tricks in Quick Quiz 7.9.
> >>>>>
> >>>>> Non-blocking synchronization can also be used, but it is often quite a
> >>>>> bit more complicated.  See for example the split-order list of Shalev
> >>>>> and Shavit, along with Desnoyers's RCU-protected extension in the
> >>>>> userspace RCU library.
> >>>>>
> >>>>> But it is usually -way- better to just choose a good hash function and
> >>>>> to increase the number of buckets.  Which is of course one reason for
> >>>>> having resizable hash tables.  ;-)
> >>>>>
> >>>>> But the other techniques can be useful in more complex linked data
> >>>>> structures, such as graphs, where there is no reasonable way to
> >>>>> partition the data.  Nevertheless, many people choose to do the
> >>>>> partitioning anyway, especially on distributed systems.
> >>>>>
> >>>>>> === Question 3 ===
> >>>>>> Chapter Data Structures also discusses other resizable hash tables,
> >>>>>> namely "Resizable, scalable, concurrent hash tables via relativistic
> >>>>>> programming" from Josh Triplett, which can save memory footprint by
> >>>>>> using a single pair of pointers. But my understanding is that
> >>>>>> perftest_resize.c is unique in that it allows you to rebuild the hash
> >>>>>> table by utilizing a different hash function, which could be very
> >>>>>> useful in practice (e.g., to prevent DDoS attack). Other solutions do
> >>>>>> not share this property. Is my understanding correct? Did I miss any
> >>>>>> discussions on this topic in perfbook?
> >>>>>
> >>>>> Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
> >>>>> (which I use in hash_resize.c) is the only one allowing arbitrary changes
> >>>>> to hash functions.  I expect that this advantage will become increasingly
> >>>>> important as security issues become more challenging.  Furthermore, I
> >>>>> suspect that the pointer-pair approach is faster and more scalable.
> >>>>> It is certainly simpler.
> >>>>>
> >>>>> On the other hand, one advantage of the other two approaches is decreased
> >>>>> memory consumption.
> >>>>>
> >>>>> Another advantage of Josh Triplett's pointer-unzip approach is that
> >>>>> concurrent updates are (in theory, anyway) not blocked for as long
> >>>>> by resize operations.  The other edge of this sword is that resizing
> >>>>> is much slower, given the need to wait for many RCU grace periods.
> >>>>>
> >>>>> Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
> >>>>> and Shavit's split-order list is that all operations are non-blocking,
> >>>>> which can be important on massively overloaded systems, such as one
> >>>>> might find in cloud computing.
> >>>>>
> >>>>>> === Question 4 ===
> >>>>>> In the current implementation of hash_resize.c, the perftest_resize
> >>>>>> could block an updater, and vice versa. It seems this is not what we
> >>>>>> expected. Ideally, they should be allowed to run concurrently, or at
> >>>>>> least the perftest_resize thread should have lower priority and
> >>>>>> updaters should never be blocked by the perftest_resize thread. Is
> >>>>>> that right? I'm very interested in helping improve. Please let me know
> >>>>>> if you have any suggestions.
> >>>>>
> >>>>> In hash_resize.c, an updater is blocked only for the time required to
> >>>>> redisposition a bucket.  This is a great improvement over blocking
> >>>>> updaters for the full resize over all buckets.
> >>>>>
> >>>>> But yes, it is not hard to do better, for example, periodically dropping
> >>>>> the old-table lock in hashtab_resize().  This requires a few careful
> >>>>> adjustments, of course.  Can you tell me what these adjustments are?
> >>>>>
> >>>>> Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
> >>>>> optimizing for the race with hashtab_resize() doesn't make a whole lot
> >>>>> of sense.  Please see the patch below.  Thoughts?
> >>>>>
> >>>>>                                                      Thanx, Paul
> >>>>>
> >>>>> ------------------------------------------------------------------------
> >>>>>
> >>>>> commit 737646a9c868d841b32199b52f5569668975953e
> >>>>> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> >>>>> Date:   Mon Jan 7 10:29:14 2019 -0800
> >>>>>
> >>>>>     datastruct/hash: Simplify hashtab_lookup()
> >>>>>
> >>>>>     Because resizing leaves the old hash table intact, and because lookups
> >>>>>     are carried out within RCU read-side critical sections (which prevent
> >>>>>     a second resizing operation from starting), there is no need for a
> >>>>>     lookup to search anywhere but in the old hash table.  And in the common
> >>>>>     case, there is no resize, so there is no new hash table.  Therefore,
> >>>>>     eliminating the check for resizing speeds things up in the common
> >>>>>     case.  In addition, this simplifies the code.
> >>>>>
> >>>>>     This commit therefore eliminates the ht_get_bucket() function,
> >>>>>     renames the ht_get_bucket_single() function to ht_get_bucket(),
> >>>>>     and modifies callers appropriately.
> >>>>>
> >>>>>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> >>>>>
> >>>>> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> >>>>> index 29e05f907200..be4157959b83 100644
> >>>>> --- a/CodeSamples/datastruct/hash/hash_resize.c
> >>>>> +++ b/CodeSamples/datastruct/hash/hash_resize.c
> >>>>> @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
> >>>>>  //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
> >>>>>  /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
> >>>>>  static struct ht_bucket *                           //\lnlbl{single:b}
> >>>>> -ht_get_bucket_single(struct ht *htp, void *key, long *b,
> >>>>> -                     unsigned long *h)
> >>>>> +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
> >>>>>  {
> >>>>>      unsigned long hash = htp->ht_gethash(key);
> >>>>>
> >>>>> @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
> >>>>>              *h = hash;                              //\lnlbl{single:h}
> >>>>>      return &htp->ht_bkt[*b];                        //\lnlbl{single:return}
> >>>>>  }                                                   //\lnlbl{single:e}
> >>>>> -
> >>>>> -/* Get hash bucket correesponding to key, accounting for resize. */
> >>>>> -static struct ht_bucket *                           //\lnlbl{b}
> >>>>> -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
> >>>>> -{
> >>>>> -    struct ht_bucket *htbp;
> >>>>> -
> >>>>> -    htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
> >>>>> -                                                            //\fcvexclude
> >>>>> -    if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {   //\lnlbl{resized}
> >>>>> -            smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
> >>>>
> >>>> If we can remove this memory barrier, the counterpart smp_mb() in
> >>>> hashtab_resize() becomes unnecessary, doesn't it?
> >>>
> >>> And the WRITE_ONCE() in the following line.
> >>>
> >>>         Thanks, Akira
> >>>>
> >>>>         Thanks, Akira
> >>>>
> >>>>> -            *htp = rcu_dereference((*htp)->ht_new); //\lnlbl{newtable}
> >>>>> -            htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket}
> >>>>> -    }
> >>>>> -    if (i)                                          //\lnlbl{chk_i}
> >>>>> -            *i = (*htp)->ht_idx;                    //\lnlbl{set_idx}
> >>>>> -    return htbp;                                    //\lnlbl{return}
> >>>>> -}                                                   //\lnlbl{e}
> >>>>>  //\end{snippet}
> >>>>>
> >>>>>  /* Read-side lock/unlock functions. */
> >>>>> @@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
> >>>>>
> >>>>>      rcu_read_lock();                                //\lnlbl{l:rcu_lock}
> >>>>>      htp = rcu_dereference(htp_master->ht_cur);      //\lnlbl{l:refhashtbl}
> >>>>> -    htbp = ht_get_bucket_single(htp, key, &b, &h);  //\lnlbl{l:refbucket}
> >>>>> +    htbp = ht_get_bucket(htp, key, &b, &h);         //\lnlbl{l:refbucket}
> >>>>>      spin_lock(&htbp->htb_lock);                     //\lnlbl{l:acq_bucket}
> >>>>>      lsp->hbp[0] = htbp;                             //\lnlbl{l:lsp0b}
> >>>>>      lsp->hls_idx[0] = htp->ht_idx;
> >>>>> @@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
> >>>>>              return;                                 //\lnlbl{l:fastret1}
> >>>>>      }
> >>>>>      htp = rcu_dereference(htp->ht_new);             //\lnlbl{l:new_hashtbl}
> >>>>> -    htbp = ht_get_bucket_single(htp, key, &b, &h);  //\lnlbl{l:get_newbkt}
> >>>>> +    htbp = ht_get_bucket(htp, key, &b, &h);         //\lnlbl{l:get_newbkt}
> >>>>>      spin_lock(&htbp->htb_lock);                     //\lnlbl{l:acq_newbkt}
> >>>>>      lsp->hbp[1] = htbp;                             //\lnlbl{l:lsp1b}
> >>>>>      lsp->hls_idx[1] = htp->ht_idx;
> >>>>> @@ -223,16 +204,15 @@ struct ht_elem *                                       //\lnlbl{lkp:b}
> >>>>>  hashtab_lookup(struct hashtab *htp_master, void *key)
> >>>>>  {
> >>>>>      long b;
> >>>>> -    int i;
> >>>>>      struct ht *htp;
> >>>>>      struct ht_elem *htep;
> >>>>>      struct ht_bucket *htbp;
> >>>>>
> >>>>>      htp = rcu_dereference(htp_master->ht_cur);      //\lnlbl{lkp:get_curtbl}
> >>>>> -    htbp = ht_get_bucket(&htp, key, &b, &i);        //\lnlbl{lkp:get_curbkt}
> >>>>> +    htbp = ht_get_bucket(htp, key, &b, NULL);       //\lnlbl{lkp:get_curbkt}
> >>>>>      cds_list_for_each_entry_rcu(htep,               //\lnlbl{lkp:loop:b}
> >>>>>                                  &htbp->htb_head,
> >>>>> -                                hte_next[i]) {
> >>>>> +                                hte_next[htp->ht_idx]) {
> >>>>>              if (htp->ht_cmp(htep, key))             //\lnlbl{lkp:match}
> >>>>>                      return htep;                    //\lnlbl{lkp:ret_match}
> >>>>>      }                                               //\lnlbl{lkp:loop:e}
> >>>>> @@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master,
> >>>>>              htbp = &htp->ht_bkt[i];                 //\lnlbl{get_oldcur}
> >>>>>              spin_lock(&htbp->htb_lock);             //\lnlbl{acq_oldcur}
> >>>>>              cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b}
> >>>>> -                    htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL);
> >>>>> +                    htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL);
> >>>>>                      spin_lock(&htbp_new->htb_lock);
> >>>>>                      cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
> >>>>>                      spin_unlock(&htbp_new->htb_lock);
> >>>>> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> >>>>> index 5c61bf5e2389..0152437c274e 100644
> >>>>> --- a/datastruct/datastruct.tex
> >>>>> +++ b/datastruct/datastruct.tex
> >>>>> @@ -966,10 +966,8 @@ the old table.
> >>>>>  \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> >>>>>  Bucket selection is shown in
> >>>>>  Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
> >>>>> -which shows \co{ht_get_bucket_single()} on
> >>>>> -lines~\lnref{single:b}-\lnref{single:e} and
> >>>>> -\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}.
> >>>>> -The \co{ht_get_bucket_single()} function returns a reference to the bucket
> >>>>> +which shows \co{ht_get_bucket()}.
> >>>>> +This function returns a reference to the bucket
> >>>>>  corresponding to the specified key in the specified hash table, without
> >>>>>  making any allowances for resizing.
> >>>>>  It also stores the bucket index corresponding to the key into the location
> >>>>> @@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding
> >>>>>  hash value corresponding to the key into the location
> >>>>>  referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
> >>>>>  Line~\lnref{single:return} then returns a reference to the corresponding bucket.
> >>>>> -
> >>>>> -The \co{ht_get_bucket()} function handles hash-table selection, invoking
> >>>>> -\co{ht_get_bucket_single()} on
> >>>>> -line~\lnref{call_single} to select the bucket
> >>>>> -corresponding to the hash in the current
> >>>>> -hash table, storing the hash value through parameter~\co{b}.
> >>>>> -If line~\lnref{resized} determines that the table is being resized and that
> >>>>> -line~\lnref{call_single}'s bucket has already been distributed across the new hash
> >>>>> -table, then line~\lnref{newtable} selects the new hash table and
> >>>>> -line~\lnref{newbucket}
> >>>>> -selects the bucket corresponding to the hash in the new hash table,
> >>>>> -again storing the hash value through parameter~\co{b}.
> >>>>> -\end{lineref}
> >>>>> -
> >>>>> -\QuickQuiz{}
> >>>>> -    The code in
> >>>>> -    Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
> >>>>> -    computes the hash twice!
> >>>>> -    Why this blatant inefficiency?
> >>>>> -\QuickQuizAnswer{
> >>>>> -    The reason is that the old and new hash tables might have
> >>>>> -    completely different hash functions, so that a hash computed
> >>>>> -    for the old table might be completely irrelevant to the
> >>>>> -    new table.
> >>>>> -} \QuickQuizEnd
> >>>>> -
> >>>>> -\begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> >>>>> -If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then
> >>>>> -line~\lnref{set_idx} stores the pointer-set index for the selected hash table.
> >>>>> -Finally, line~\lnref{return} returns a reference to the selected hash bucket.
> >>>>>  \end{lineref}
> >>>>>
> >>>>>  \QuickQuiz{}
> >>>>> @@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket.
> >>>>>      functions described next.
> >>>>>  } \QuickQuizEnd
> >>>>>
> >>>>> -This implementation of
> >>>>> -\co{ht_get_bucket_single()} and \co{ht_get_bucket()}
> >>>>> -permit lookups and modifications to run concurrently
> >>>>> -with a resize operation.
> >>>>> +This implementation of \co{ht_get_bucket()} permits lookups and
> >>>>> +modifications to run concurrently with a resize operation.
> >>>>>
> >>>>>  \begin{listing}[tb]
> >>>>>  \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv}
> >>>>> @@ -1129,11 +1095,6 @@ hash lookups.
> >>>>>  Line~\lnref{get_curtbl} fetches the current hash table and
> >>>>>  line~\lnref{get_curbkt} obtains a reference
> >>>>>  to the bucket corresponding to the specified key.
> >>>>> -This bucket will be located in a new resized hash table when a
> >>>>> -resize operation has progressed past the bucket in the old hash
> >>>>> -table that contained the desired data element.
> >>>>> -Note that line~\lnref{get_curbkt} also passes back the index that will be
> >>>>> -used to select the correct set of pointers from the pair in each element.
> >>>>>  The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
> >>>>>  so that if line~\lnref{match}
> >>>>>  detects a match,
> >>>>> @@ -1144,22 +1105,17 @@ failure.
> >>>>>  \end{lineref}
> >>>>>
> >>>>>  \QuickQuiz{}
> >>>>> -    In the \co{hashtab_lookup()} function in
> >>>>> -    Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions},
> >>>>> -    the code carefully finds the right bucket in the new hash table
> >>>>> -    if the element to be looked up has already been distributed
> >>>>> -    by a concurrent resize operation.
> >>>>> -    This seems wasteful for RCU-protected lookups.
> >>>>> -    Why not just stick with the old hash table in this case?
> >>>>> +    \begin{lineref}[ln:datastruct:hash_resize:access:lkp]
> >>>>> +    What if execution reaches line~\lnref{loop:b}
> >>>>> +    of \co{hashtab_lookup()} in
> >>>>> +    Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
> >>>>> +    just after this bucket has been resized.
> >>>>> +    Won't that result in lookup failures?
> >>>>> +    \end{lineref}
> >>>>>  \QuickQuizAnswer{
> >>>>> -    Suppose that a resize operation begins and distributes half of
> >>>>> -    the old table's buckets to the new table.
> >>>>> -    Suppose further that a thread adds a new element that goes into
> >>>>> -    one of the already-distributed buckets, and that this same thread
> >>>>> -    now looks up this newly added element.
> >>>>> -    If lookups unconditionally traversed only the old hash table,
> >>>>> -    this thread would get a lookup failure for the element that it
> >>>>> -    just added, which certainly sounds like a bug to me!
> >>>>> +    No, it won't.
> >>>>> +    Resizing into the new hash table leaves the old hash table
> >>>>> +    intact, courtesy of the pointer pairs.
> >>>>>  } \QuickQuizEnd
> >>>>>
> >>>>>  \begin{lineref}[ln:datastruct:hash_resize:access:add]
> >>>>>
> >>
> >> Hi Paul and Akira,
> >>
> >> Thanks a lot for the comments, which I need some more time to look
> >> into. For Paul's patch, I have a few concerns. Please take a look.
> >>
> >> My understanding is that with this patch, during the time period when
> >> the resizing thread is running, an updater may insert/delete an item
> >> into/from the new hash table, while readers are still looking up data
> >> in the old one, leaving readers unaware of
> >> insertions/deletions happening simultaneously. For example, it seems
> >> the following sequence could happen.
> >>
> >> 1. The resizing thread starts.
> >> 2. The resizing thread successfully passes bucket *B* of the old hash table.
> >> 3. An updater wants to insert a new item *I* which should be inserted
> >> into bucket *B*.
> >> 4. The updater will select the new hash table and insert the item *I*
> >> into the new hash table.
> >> 5. A read request comes in and wants to look up item *I*. The lookup
> >> request will check the old hash table and fail, won't it?
> >> 6. The resizing thread exits.
> >> 7. Now subsequent read requests can successfully find item *I*.
> > 
> > Yes, this can happen.
> > 
> >> Is my understanding correct? Please let me know if I misunderstood
> >> anything. Given that this patch can accelerate the fast path,
> >> I think it should be OK because resizing typically happens rarely.
> >> Just want to make sure I fully understand the algorithm.
> > 
> > It is a design choice, and some users would prefer not to fail to see
> > new items during a resize.  One approach would be to revert back to
> > the old-style checking, and another would be to provide a separate
> > lookup interface that synchronizes with adds and deletes.
> > 
> > So, I could add a quick quiz with this information, I could revert the
> > change, or I could add another lookup function that provided more timely
> > information.  Left to myself, I would provide a quick quiz, but what
> > do you guys think?
> 
> Hi, I was composing a message, but now I'm replying to this one.
> I think adding a quick quiz would be a good idea.

But in the meantime, it occurred to me that I was looking at the
problem in the wrong way.  I believe that the following patch makes
hashtab_lookup() find elements recently added by hashtab_add(), even
during a resize, and without the need for memory barriers.

The scenario that convinced me to take this approach is when a thread
does hashtab_add(), then immediately searches for the newly added element.
Failing to find it would be quite a surprise to most people.

							Thanx, Paul

------------------------------------------------------------------------

commit b61179bdc22e9750147ad3f540215af225aa3376
Author: Paul E. McKenney <paulmck@linux.ibm.com>
Date:   Tue Jan 8 10:29:43 2019 -0800

    datastruct/hash: Make hashtab_lookup() more responsive during resize
    
    If hashtab_add() adds a new element to an already-resized bucket during
    a resize, hashtab_lookup() won't find it until the resize is complete.
    This is a not-unreasonable semantic, but might be quite surprising to most
    users, especially if the thread doing the lookup is the same thread that
    just added the new element.  This commit therefore causes hashtab_lookup()
    to find recently added elements even during resize operations.
    
    Note that this change involved a small refactoring of the code,
    introducing a new ht_search_bucket() helper function.  This refactoring
    avoids the need for memory barriers as well because when an element is
    not found and there is a resize in progress, the new version of the
    hash table is unconditionally searched, thus avoiding the need for a
    racy access to ->ht_resize_cur.  Although this approach can result in
    needless searches of not-yet-filled-in buckets, such searches will
    be rare (assuming resizing is rare), and a search of an empty bucket
    is cheap anyway.
    
    Reported-by: Junchang Wang <junchangwang@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
index 9f68a00dabe3..6dbfe020d78d 100644
--- a/CodeSamples/datastruct/hash/hash_resize.c
+++ b/CodeSamples/datastruct/hash/hash_resize.c
@@ -124,7 +124,8 @@ void hashtab_free(struct hashtab *htp_master)
 //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
 /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
 static struct ht_bucket *				//\lnlbl{single:b}
-ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
+ht_get_bucket(struct ht *htp, void *key,
+              long *b, unsigned long *h)
 {
 	unsigned long hash = htp->ht_gethash(key);
 
@@ -133,6 +134,24 @@ ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
 		*h = hash;				//\lnlbl{single:h}
 	return &htp->ht_bkt[*b];			//\lnlbl{single:return}
 }							//\lnlbl{single:e}
+
+/* Search the bucket for the specified key in the specified ht structure. */
+static struct ht_elem *					//\lnlbl{hsb:b}
+ht_search_bucket(struct ht *htp, void *key)
+{
+	long b;
+	struct ht_elem *htep;
+	struct ht_bucket *htbp;
+
+	htbp = ht_get_bucket(htp, key, &b, NULL);	//\lnlbl{hsb:get_curbkt}
+	cds_list_for_each_entry_rcu(htep,		//\lnlbl{hsb:loop:b}
+	                            &htbp->htb_head,
+	                            hte_next[htp->ht_idx]) {
+		if (htp->ht_cmp(htep, key)) 		//\lnlbl{hsb:match}
+			return htep;			//\lnlbl{hsb:ret_match}
+	}						//\lnlbl{hsb:loop:e}
+	return NULL;					//\lnlbl{hsb:ret_NULL}
+}							//\lnlbl{hsb:e}
 //\end{snippet}
 
 /* Read-side lock/unlock functions. */
@@ -203,20 +222,17 @@ void hashtab_lookup_done(struct ht_elem *htep)
 struct ht_elem *					//\lnlbl{lkp:b}
 hashtab_lookup(struct hashtab *htp_master, void *key)
 {
-	long b;
 	struct ht *htp;
 	struct ht_elem *htep;
-	struct ht_bucket *htbp;
 
 	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{lkp:get_curtbl}
-	htbp = ht_get_bucket(htp, key, &b, NULL);	//\lnlbl{lkp:get_curbkt}
-	cds_list_for_each_entry_rcu(htep,		//\lnlbl{lkp:loop:b}
-	                            &htbp->htb_head,
-	                            hte_next[htp->ht_idx]) {
-		if (htp->ht_cmp(htep, key)) 		//\lnlbl{lkp:match}
-			return htep;			//\lnlbl{lkp:ret_match}
-	}						//\lnlbl{lkp:loop:e}
-	return NULL;					//\lnlbl{lkp:ret_NULL}
+	htep = ht_search_bucket(htp, key);		//\lnlbl{lkp:get_curbkt}
+	if (htep)					//\lnlbl{lkp:entchk}
+		return htep;				//\lnlbl{lkp:ret_match}
+	htp = rcu_dereference(htp->ht_new);		//\lnlbl{lkp:get_nxttbl}
+	if (!htp)					//\lnlbl{lkp:htpchk}
+		return NULL;				//\lnlbl{lkp:noresize}
+	return ht_search_bucket(htp, key);		//\lnlbl{lkp:ret_nxtbkt}
 }							//\lnlbl{lkp:e}
 
 /*
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index eb7f790aa5a3..175056cbe911 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -966,8 +966,10 @@ the old table.
 \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
 Bucket selection is shown in
 Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
-which shows \co{ht_get_bucket()}.
-This function returns a reference to the bucket
+which shows \co{ht_get_bucket()} on
+lines~\lnref{single:b}-\lnref{single:e} and \co{ht_search_bucket()} on
+lines~\lnref{hsb:b}-\lnref{hsb:e}.
+The \co{ht_get_bucket()} function returns a reference to the bucket
 corresponding to the specified key in the specified hash table, without
 making any allowances for resizing.
 It also stores the bucket index corresponding to the key into the location
@@ -976,6 +978,17 @@ line~\lnref{single:gethash}, and the corresponding
 hash value corresponding to the key into the location
 referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
 Line~\lnref{single:return} then returns a reference to the corresponding bucket.
+
+The \co{ht_search_bucket()} function searches for the specified key
+within the specified hash-table version.
+Line~\lnref{hsb:get_curbkt} obtains a reference to the bucket corresponding
+to the specified key.
+The loop spanning lines~\lnref{hsb:loop:b}-\lnref{hsb:loop:e} searches
+that bucket, so that if line~\lnref{hsb:match} detects a match,
+line~\lnref{hsb:ret_match} returns a pointer to the enclosing data element.
+Otherwise, if there is no match,
+line~\lnref{hsb:ret_NULL} returns \co{NULL} to indicate
+failure.
 \end{lineref}
 
 \QuickQuiz{}
@@ -1093,31 +1106,18 @@ The \co{hashtab_lookup()} function on
 lines~\lnref{b}-\lnref{e} of the listing does
 hash lookups.
 Line~\lnref{get_curtbl} fetches the current hash table and
-line~\lnref{get_curbkt} obtains a reference
-to the bucket corresponding to the specified key.
-The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
-so that if line~\lnref{match}
-detects a match,
-line~\lnref{ret_match} returns a pointer to the enclosing data element.
-Otherwise, if there is no match,
-line~\lnref{ret_NULL} returns \co{NULL} to indicate
-failure.
+line~\lnref{get_curbkt} searches the bucket corresponding to the
+specified key.
+If line~\lnref{entchk} determines that the search was successful,
+line~\lnref{ret_match} returns a pointer to the element that was located.
+Otherwise, line~\lnref{get_nxttbl} picks up a pointer to the next version,
+and if line~\lnref{htpchk} determines that there is no resize in progress,
+line~\lnref{noresize} returns \co{NULL}.
+When there is a resize in progress, execution reaches
+line~\lnref{ret_nxtbkt}, which returns the result of searching the
+bucket corresponding to \co{key} in the new version.
 \end{lineref}
 
-\QuickQuiz{}
-	\begin{lineref}[ln:datastruct:hash_resize:access:lkp]
-	What if execution reaches line~\lnref{loop:b}
-	of \co{hashtab_lookup()} in
-	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
-	just after this bucket has been resized.
-	Won't that result in lookup failures?
-	\end{lineref}
-\QuickQuizAnswer{
-	No, it won't.
-	Resizing into the new hash table leaves the old hash table
-	intact, courtesy of the pointer pairs.
-} \QuickQuizEnd
-
 \begin{lineref}[ln:datastruct:hash_resize:access:add]
 The \co{hashtab_add()} function on lines~\lnref{b}-\lnref{e} of the listing adds
 new data elements to the hash table.



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-08 18:39             ` Paul E. McKenney
@ 2019-01-08 22:16               ` Akira Yokosawa
  2019-01-09  0:19                 ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Akira Yokosawa @ 2019-01-08 22:16 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Junchang Wang, perfbook, Akira Yokosawa

On 2019/01/08 10:39:31 -0800, Paul E. McKenney wrote:
> On Wed, Jan 09, 2019 at 12:35:37AM +0900, Akira Yokosawa wrote:
>> On 2019/01/09 0:28, Paul E. McKenney wrote:
>>> On Tue, Jan 08, 2019 at 09:56:57AM +0800, Junchang Wang wrote:
>>>> On Tue, Jan 8, 2019 at 7:06 AM Akira Yokosawa <akiyks@gmail.com> wrote:
>>>>> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
>>>>>> Hi Paul,
>>>>>>
>>>>>> On 2019/01/07 10:33:17 -0800, Paul E. McKenney wrote:
>>>>>>> On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'm reading hash_resize recently, and have a few questions regarding
>>>>>>>> this algorithm. Please take a look if you have time. Any suggestions
>>>>>>>> are warmly welcomed.
>>>>>>>>
>>>>>>>> === Question 1 ===
>>>>>>>> In hash_resize.c : hashtab_lock_mod
>>>>>>>> 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
>>>>>>>> 187                 lsp->hbp[1] = NULL;
>>>>>>>> 188                 return;
>>>>>>>> 189         }
>>>>>>>> 190         htp = rcu_dereference(htp->ht_new);
>>>>>>>>
>>>>>>>> It seems we are missing a barrier (e.g., smp_mb) in between lines 189
>>>>>>>> and 190, because neither READ_ONCE() nor rcu_dereference() can prevent
>>>>>>>> compilers and hardware from reordering the two unrelated variables,
>>>>>>>> ht_resize_cur and ht_new. Is my understanding correct?
>>>>>>>
>>>>>>> Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
>>>>>>> section
>>>>>>
>>>>>> You mean "rcu_read_lock() at the beginning of hashtab_lock_mod() starts
>>>>>> an RCU read-side critical section", don't you?
>>>>>>
>>>>>>>         and there is a synchronize_rcu() between the update to ->ht_new
>>>>>>> and the updates to ->ht_resize_cur.  For more details on how this works,
>>>>>>> please see https://lwn.net/Articles/573497/.
>>>>>>>
>>>>>>> Of course, if you find a code path in which a call to hashtab_lock_mod()
>>>>>>> is invoked outside of an RCU read-side critical section, that would be
>>>>>>> a bug.  (Can you tell me an exception to this rule, that is, a case
>>>>>>> where hashtab_lock_mod() could safely be invoked outside of an RCU
>>>>>>> read-side critical section?)
>>>>>>>
>>>>>>>> === Question 2 ===
>>>>>>>> In hash_resize.c, each time an updater wants to access a bucket, the
>>>>>>>> updater must first acquire the bucket's lock (htb_lock), preventing
>>>>>>>> other updaters accessing the same bucket concurrently. This approach
>>>>>>>> is OK if the linked list of a bucket is relatively short, but for a
>>>>>>>> larger system where linked lists are long enough and the
>>>>>>>> perftest_resize thread is running simultaneously, it could become a
>>>>>>>> potential performance bottleneck. One naive solution is to allow
>>>>>>>> multiple updaters to access the same bucket, only if they don't
>>>>>>>> operate on the same item of the list of this bucket. I wonder if there
>>>>>>>> are any existing works or discussions on this topic?
>>>>>>>
>>>>>>> One approach is to use a hashed array of locks, and to hash a given
>>>>>>> element's address to locate the lock to be used.  Please see
>>>>>>> Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
>>>>>>> Needed Locks First"), including Quick Quiz 7.9, for additional details.
>>>>>>>
>>>>>>> Another approach is to use RCU to protect traversals, and locks within the
>>>>>>> linked-list elements themselves.  These locks are conditionally acquired
>>>>>>> (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
>>>>>>> them in list order, and the tricks in Quick Quiz 7.9.
>>>>>>>
>>>>>>> Non-blocking synchronization can also be used, but it is often quite a
>>>>>>> bit more complicated.  See for example the split-order list of Shalev
>>>>>>> and Shavit, along with Desnoyers's RCU-protected extension in the
>>>>>>> userspace RCU library.
>>>>>>>
>>>>>>> But it is usually -way- better to just choose a good hash function and
>>>>>>> to increase the number of buckets.  Which is of course one reason for
>>>>>>> having resizable hash tables.  ;-)
>>>>>>>
>>>>>>> But the other techniques can be useful in more complex linked data
>>>>>>> structures, such as graphs, where there is no reasonable way to
>>>>>>> partition the data.  Nevertheless, many people choose to do the
>>>>>>> partitioning anyway, especially on distributed systems.
>>>>>>>
>>>>>>>> === Question 3 ===
>>>>>>>> Chapter Data Structures also discusses other resizable hash tables,
>>>>>>>> namely "Resizable, scalable, concurrent hash tables via relativistic
>>>>>>>> programming" from Josh Triplett, which can save memory footprint by
>>>>>>>> using a single pair of pointers. But my understanding is that
>>>>>>>> perftest_resize.c is unique in that it allows you to rebuild the hash
>>>>>>>> table by utilizing a different hash function, which could be very
>>>>>>>> useful in practice (e.g., to prevent DDoS attack). Other solutions do
>>>>>>>> not share this property. Is my understanding correct? Did I miss any
>>>>>>>> discussions on this topic in perfbook?
>>>>>>>
>>>>>>> Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
>>>>>>> (which I use in hash_resize.c) is the only one allowing arbitrary changes
>>>>>>> to hash functions.  I expect that this advantage will become increasingly
>>>>>>> important as security issues become more challenging.  Furthermore, I
>>>>>>> suspect that the pointer-pair approach is faster and more scalable.
>>>>>>> It is certainly simpler.
>>>>>>>
>>>>>>> On the other hand, one advantage of the other two approaches is decreased
>>>>>>> memory consumption.
>>>>>>>
>>>>>>> Another advantage of Josh Triplett's pointer-unzip approach is that
>>>>>>> concurrent updates are (in theory, anyway) not blocked for as long
>>>>>>> by resize operations.  The other edge of this sword is that resizing
>>>>>>> is much slower, given the need to wait for many RCU grace periods.
>>>>>>>
>>>>>>> Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
>>>>>>> and Shavit's split-order list is that all operations are non-blocking,
>>>>>>> which can be important on massively overloaded systems, such as one
>>>>>>> might find in cloud computing.
>>>>>>>
>>>>>>>> === Question 4 ===
>>>>>>>> In the current implementation of hash_resize.c, the perftest_resize
>>>>>>>> could block an updater, and vice versa. It seems this is not what we
>>>>>>>> expected. Ideally, they should be allowed to run concurrently, or at
>>>>>>>> least the perftest_resize thread should have lower priority and
>>>>>>>> updaters should never be blocked by the perftest_resize thread. Is
>>>>>>>> that right? I'm very interested in helping improve. Please let me know
>>>>>>>> if you have any suggestions.
>>>>>>>
>>>>>>> In hash_resize.c, an updater is blocked only for the time required to
>>>>>>> redisposition a bucket.  This is a great improvement over blocking
>>>>>>> updaters for the full resize over all buckets.
>>>>>>>
>>>>>>> But yes, it is not hard to do better, for example, periodically dropping
>>>>>>> the old-table lock in hashtab_resize().  This requires a few careful
>>>>>>> adjustments, of course.  Can you tell me what these adjustments are?
>>>>>>>
>>>>>>> Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
>>>>>>> optimizing for the race with hashtab_resize() doesn't make a whole lot
>>>>>>> of sense.  Please see the patch below.  Thoughts?
>>>>>>>
>>>>>>>                                                      Thanx, Paul
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>>
>>>>>>> commit 737646a9c868d841b32199b52f5569668975953e
>>>>>>> Author: Paul E. McKenney <paulmck@linux.ibm.com>
>>>>>>> Date:   Mon Jan 7 10:29:14 2019 -0800
>>>>>>>
>>>>>>>     datastruct/hash: Simplify hashtab_lookup()
>>>>>>>
>>>>>>>     Because resizing leaves the old hash table intact, and because lookups
>>>>>>>     are carried out within RCU read-side critical sections (which prevent
>>>>>>>     a second resizing operation from starting), there is no need for a
>>>>>>>     lookup to search anywhere but in the old hash table.  And in the common
>>>>>>>     case, there is no resize, so there is no new hash table.  Therefore,
>>>>>>>     eliminating the check for resizing speeds things up in the common
>>>>>>>     case.  In addition, this simplifies the code.
>>>>>>>
>>>>>>>     This commit therefore eliminates the ht_get_bucket() function,
>>>>>>>     renames the ht_get_bucket_single() function to ht_get_bucket(),
>>>>>>>     and modifies callers appropriately.
>>>>>>>
>>>>>>>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
>>>>>>>
>>>>>>> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
>>>>>>> index 29e05f907200..be4157959b83 100644
>>>>>>> --- a/CodeSamples/datastruct/hash/hash_resize.c
>>>>>>> +++ b/CodeSamples/datastruct/hash/hash_resize.c
>>>>>>> @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
>>>>>>>  //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
>>>>>>>  /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
>>>>>>>  static struct ht_bucket *                           //\lnlbl{single:b}
>>>>>>> -ht_get_bucket_single(struct ht *htp, void *key, long *b,
>>>>>>> -                     unsigned long *h)
>>>>>>> +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
>>>>>>>  {
>>>>>>>      unsigned long hash = htp->ht_gethash(key);
>>>>>>>
>>>>>>> @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
>>>>>>>              *h = hash;                              //\lnlbl{single:h}
>>>>>>>      return &htp->ht_bkt[*b];                        //\lnlbl{single:return}
>>>>>>>  }                                                   //\lnlbl{single:e}
>>>>>>> -
>>>>>>> -/* Get hash bucket correesponding to key, accounting for resize. */
>>>>>>> -static struct ht_bucket *                           //\lnlbl{b}
>>>>>>> -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
>>>>>>> -{
>>>>>>> -    struct ht_bucket *htbp;
>>>>>>> -
>>>>>>> -    htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
>>>>>>> -                                                            //\fcvexclude
>>>>>>> -    if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {   //\lnlbl{resized}
>>>>>>> -            smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
>>>>>>
>>>>>> If we can remove this memory barrier, the counterpart smp_mb() in
>>>>>> hashtab_resize() becomes unnecessary, doesn't it?
>>>>>
>>>>> And the WRITE_ONCE() in the following line.
>>>>>
>>>>>         Thanks, Akira
>>>>>>
>>>>>>         Thanks, Akira
>>>>>>
>>>>>>> -            *htp = rcu_dereference((*htp)->ht_new); //\lnlbl{newtable}
>>>>>>> -            htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket}
>>>>>>> -    }
>>>>>>> -    if (i)                                          //\lnlbl{chk_i}
>>>>>>> -            *i = (*htp)->ht_idx;                    //\lnlbl{set_idx}
>>>>>>> -    return htbp;                                    //\lnlbl{return}
>>>>>>> -}                                                   //\lnlbl{e}
>>>>>>>  //\end{snippet}
>>>>>>>
>>>>>>>  /* Read-side lock/unlock functions. */
>>>>>>> @@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
>>>>>>>
>>>>>>>      rcu_read_lock();                                //\lnlbl{l:rcu_lock}
>>>>>>>      htp = rcu_dereference(htp_master->ht_cur);      //\lnlbl{l:refhashtbl}
>>>>>>> -    htbp = ht_get_bucket_single(htp, key, &b, &h);  //\lnlbl{l:refbucket}
>>>>>>> +    htbp = ht_get_bucket(htp, key, &b, &h);         //\lnlbl{l:refbucket}
>>>>>>>      spin_lock(&htbp->htb_lock);                     //\lnlbl{l:acq_bucket}
>>>>>>>      lsp->hbp[0] = htbp;                             //\lnlbl{l:lsp0b}
>>>>>>>      lsp->hls_idx[0] = htp->ht_idx;
>>>>>>> @@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
>>>>>>>              return;                                 //\lnlbl{l:fastret1}
>>>>>>>      }
>>>>>>>      htp = rcu_dereference(htp->ht_new);             //\lnlbl{l:new_hashtbl}
>>>>>>> -    htbp = ht_get_bucket_single(htp, key, &b, &h);  //\lnlbl{l:get_newbkt}
>>>>>>> +    htbp = ht_get_bucket(htp, key, &b, &h);         //\lnlbl{l:get_newbkt}
>>>>>>>      spin_lock(&htbp->htb_lock);                     //\lnlbl{l:acq_newbkt}
>>>>>>>      lsp->hbp[1] = htbp;                             //\lnlbl{l:lsp1b}
>>>>>>>      lsp->hls_idx[1] = htp->ht_idx;
>>>>>>> @@ -223,16 +204,15 @@ struct ht_elem *                                       //\lnlbl{lkp:b}
>>>>>>>  hashtab_lookup(struct hashtab *htp_master, void *key)
>>>>>>>  {
>>>>>>>      long b;
>>>>>>> -    int i;
>>>>>>>      struct ht *htp;
>>>>>>>      struct ht_elem *htep;
>>>>>>>      struct ht_bucket *htbp;
>>>>>>>
>>>>>>>      htp = rcu_dereference(htp_master->ht_cur);      //\lnlbl{lkp:get_curtbl}
>>>>>>> -    htbp = ht_get_bucket(&htp, key, &b, &i);        //\lnlbl{lkp:get_curbkt}
>>>>>>> +    htbp = ht_get_bucket(htp, key, &b, NULL);       //\lnlbl{lkp:get_curbkt}
>>>>>>>      cds_list_for_each_entry_rcu(htep,               //\lnlbl{lkp:loop:b}
>>>>>>>                                  &htbp->htb_head,
>>>>>>> -                                hte_next[i]) {
>>>>>>> +                                hte_next[htp->ht_idx]) {
>>>>>>>              if (htp->ht_cmp(htep, key))             //\lnlbl{lkp:match}
>>>>>>>                      return htep;                    //\lnlbl{lkp:ret_match}
>>>>>>>      }                                               //\lnlbl{lkp:loop:e}
>>>>>>> @@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master,
>>>>>>>              htbp = &htp->ht_bkt[i];                 //\lnlbl{get_oldcur}
>>>>>>>              spin_lock(&htbp->htb_lock);             //\lnlbl{acq_oldcur}
>>>>>>>              cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b}
>>>>>>> -                    htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL);
>>>>>>> +                    htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL);
>>>>>>>                      spin_lock(&htbp_new->htb_lock);
>>>>>>>                      cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
>>>>>>>                      spin_unlock(&htbp_new->htb_lock);
>>>>>>> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
>>>>>>> index 5c61bf5e2389..0152437c274e 100644
>>>>>>> --- a/datastruct/datastruct.tex
>>>>>>> +++ b/datastruct/datastruct.tex
>>>>>>> @@ -966,10 +966,8 @@ the old table.
>>>>>>>  \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
>>>>>>>  Bucket selection is shown in
>>>>>>>  Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
>>>>>>> -which shows \co{ht_get_bucket_single()} on
>>>>>>> -lines~\lnref{single:b}-\lnref{single:e} and
>>>>>>> -\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}.
>>>>>>> -The \co{ht_get_bucket_single()} function returns a reference to the bucket
>>>>>>> +which shows \co{ht_get_bucket()}.
>>>>>>> +This function returns a reference to the bucket
>>>>>>>  corresponding to the specified key in the specified hash table, without
>>>>>>>  making any allowances for resizing.
>>>>>>>  It also stores the bucket index corresponding to the key into the location
>>>>>>> @@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding
>>>>>>>  hash value corresponding to the key into the location
>>>>>>>  referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
>>>>>>>  Line~\lnref{single:return} then returns a reference to the corresponding bucket.
>>>>>>> -
>>>>>>> -The \co{ht_get_bucket()} function handles hash-table selection, invoking
>>>>>>> -\co{ht_get_bucket_single()} on
>>>>>>> -line~\lnref{call_single} to select the bucket
>>>>>>> -corresponding to the hash in the current
>>>>>>> -hash table, storing the hash value through parameter~\co{b}.
>>>>>>> -If line~\lnref{resized} determines that the table is being resized and that
>>>>>>> -line~\lnref{call_single}'s bucket has already been distributed across the new hash
>>>>>>> -table, then line~\lnref{newtable} selects the new hash table and
>>>>>>> -line~\lnref{newbucket}
>>>>>>> -selects the bucket corresponding to the hash in the new hash table,
>>>>>>> -again storing the hash value through parameter~\co{b}.
>>>>>>> -\end{lineref}
>>>>>>> -
>>>>>>> -\QuickQuiz{}
>>>>>>> -    The code in
>>>>>>> -    Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
>>>>>>> -    computes the hash twice!
>>>>>>> -    Why this blatant inefficiency?
>>>>>>> -\QuickQuizAnswer{
>>>>>>> -    The reason is that the old and new hash tables might have
>>>>>>> -    completely different hash functions, so that a hash computed
>>>>>>> -    for the old table might be completely irrelevant to the
>>>>>>> -    new table.
>>>>>>> -} \QuickQuizEnd
>>>>>>> -
>>>>>>> -\begin{lineref}[ln:datastruct:hash_resize:get_bucket]
>>>>>>> -If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then
>>>>>>> -line~\lnref{set_idx} stores the pointer-set index for the selected hash table.
>>>>>>> -Finally, line~\lnref{return} returns a reference to the selected hash bucket.
>>>>>>>  \end{lineref}
>>>>>>>
>>>>>>>  \QuickQuiz{}
>>>>>>> @@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket.
>>>>>>>      functions described next.
>>>>>>>  } \QuickQuizEnd
>>>>>>>
>>>>>>> -This implementation of
>>>>>>> -\co{ht_get_bucket_single()} and \co{ht_get_bucket()}
>>>>>>> -permit lookups and modifications to run concurrently
>>>>>>> -with a resize operation.
>>>>>>> +This implementation of \co{ht_get_bucket()} permits lookups and
>>>>>>> +modifications to run concurrently with a resize operation.
>>>>>>>
>>>>>>>  \begin{listing}[tb]
>>>>>>>  \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv}
>>>>>>> @@ -1129,11 +1095,6 @@ hash lookups.
>>>>>>>  Line~\lnref{get_curtbl} fetches the current hash table and
>>>>>>>  line~\lnref{get_curbkt} obtains a reference
>>>>>>>  to the bucket corresponding to the specified key.
>>>>>>> -This bucket will be located in a new resized hash table when a
>>>>>>> -resize operation has progressed past the bucket in the old hash
>>>>>>> -table that contained the desired data element.
>>>>>>> -Note that line~\lnref{get_curbkt} also passes back the index that will be
>>>>>>> -used to select the correct set of pointers from the pair in each element.
>>>>>>>  The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
>>>>>>>  so that if line~\lnref{match}
>>>>>>>  detects a match,
>>>>>>> @@ -1144,22 +1105,17 @@ failure.
>>>>>>>  \end{lineref}
>>>>>>>
>>>>>>>  \QuickQuiz{}
>>>>>>> -    In the \co{hashtab_lookup()} function in
>>>>>>> -    Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions},
>>>>>>> -    the code carefully finds the right bucket in the new hash table
>>>>>>> -    if the element to be looked up has already been distributed
>>>>>>> -    by a concurrent resize operation.
>>>>>>> -    This seems wasteful for RCU-protected lookups.
>>>>>>> -    Why not just stick with the old hash table in this case?
>>>>>>> +    \begin{lineref}[ln:datastruct:hash_resize:access:lkp]
>>>>>>> +    What if execution reaches line~\lnref{loop:b}
>>>>>>> +    of \co{hashtab_lookup()} in
>>>>>>> +    Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
>>>>>>> +    just after this bucket has been resized.
>>>>>>> +    Won't that result in lookup failures?
>>>>>>> +    \end{lineref}
>>>>>>>  \QuickQuizAnswer{
>>>>>>> -    Suppose that a resize operation begins and distributes half of
>>>>>>> -    the old table's buckets to the new table.
>>>>>>> -    Suppose further that a thread adds a new element that goes into
>>>>>>> -    one of the already-distributed buckets, and that this same thread
>>>>>>> -    now looks up this newly added element.
>>>>>>> -    If lookups unconditionally traversed only the old hash table,
>>>>>>> -    this thread would get a lookup failure for the element that it
>>>>>>> -    just added, which certainly sounds like a bug to me!
>>>>>>> +    No, it won't.
>>>>>>> +    Resizing into the new hash table leaves the old hash table
>>>>>>> +    intact, courtesy of the pointer pairs.
>>>>>>>  } \QuickQuizEnd
>>>>>>>
>>>>>>>  \begin{lineref}[ln:datastruct:hash_resize:access:add]
>>>>>>>
>>>>
>>>> Hi Paul and Akira,
>>>>
>>>> Thanks a lot for the comments, which I need some more time to look
>>>> into. For Paul's patch, I have a few concerns. Please take a look.
>>>>
>>>> My understanding is that with this patch, during the time period when
>>>> the resizing thread is running, an updater may insert/delete an item
>>>> into/from the new hash table, while readers are still looking up data
>>>> in the old one, resulting in the readers being unaware of
>>>> insertions/deletions happening simultaneously. For example, it seems
>>>> the following sequence could happen.
>>>>
>>>> 1. The resizing thread starts.
>>>> 2. The resizing thread successfully passes bucket *B* of the old hash table.
>>>> 3. An updater wants to insert a new item *I* which should be inserted
>>>> into bucket *B*.
>>>> 4. The updater will select the new hash table and insert the item *I*
>>>> into the new hash table.
>>>> 5. A read request comes in and wants to lookup item *I*. The lookup
>>>> request will check the old hash table and fail. Doesn't it?
>>>> 6. The resizing thread exits.
>>>> 7. Now subsequent read requests can successfully find item *I*.
>>>
>>> Yes, this can happen.
>>>
>>>> Is my understanding correct? Please let me know if I misunderstood
>>>> anything. Given that this patch can accelerate the fast path,
>>>> I think it should be OK because resizing typically happens rarely.
>>>> Just want to make sure I fully understand the algorithm.
>>>
>>> It is a design choice, and some users would prefer not to fail to see
>>> new items during a resize.  One approach would be to revert back to
>>> the old-style checking, and another would be to provide a separate
>>> lookup interface that synchronizes with adds and deletes.
>>>
>>> So, I could add a quick quiz with this information, I could revert the
>>> change, or I could add another lookup function that provided more timely
>>> information.  Left to myself, I would provide a quick quiz, but what
>>> do you guys think?
>>
>> Hi, I was composing a message, but now I'm replying to this one.
>> I think adding a quick quiz would be a good idea.
> 
> But in the meantime, it occurred to me that I was looking at the
> problem in the wrong way.  I believe that the following patch makes
> hashtab_lookup() find elements recently added by hashtab_add(), even
> during a resize, and without the need for memory barriers.
> 
> The scenario that convinced me to take this approach is when a thread
> does hashtab_add(), then immediately searches for the newly added element.
> Failing to find it would be quite a surprise to most people.
When a thread does hashtab_del() and immediately checks the deletion,
it still finds the deleted element while resizing is in progress.
This would also be a surprise. The current version looks less consistent
than the simpler one did.
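
In code, the surprising sequence is roughly the following sketch, under
the same conventions as the patches in this thread (hashtab_unlock_mod()
is assumed; it does not appear in the excerpts here):

	/* Sketch only: a deletion can appear to "linger" during a resize,
	 * because the element stays reachable via the other pointer set. */
	void del_then_find(struct hashtab *htp_master,
			   struct ht_elem *htep, void *key)
	{
		struct ht_lock_state ls;
		struct ht_elem *htep2;

		hashtab_lock_mod(htp_master, key, &ls);
		hashtab_del(htep, &ls);		/* unlinks from the locked bucket(s) */
		hashtab_unlock_mod(&ls);	/* assumed unlock counterpart */

		rcu_read_lock();
		htep2 = hashtab_lookup(htp_master, key); /* may still find htep */
		rcu_read_unlock();
	}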

        Thanks, Akira

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit b61179bdc22e9750147ad3f540215af225aa3376
> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> Date:   Tue Jan 8 10:29:43 2019 -0800
> 
>     datastruct/hash: Make hashtab_lookup() more responsive during resize
>     
>     If hashtab_add() adds a new element to an already-resized bucket during
>     a resize, hashtab_lookup() won't find it until the resize is complete.
>     This is a not-unreasonable semantic, but might be quite surprising to most
>     users, especially if the thread doing the lookup is the same thread that
>     just added the new element.  This commit therefore causes hashtab_lookup()
>     to find recently added elements even during resize operations.
>     
>     Note that this change involved a small refactoring of the code,
>     introducing a new ht_search_bucket() helper function.  This refactoring
>     avoids the need for memory barriers as well because when an element is
>     not found and there is a resize in progress, the new version of the
>     hash table is unconditionally searched, thus avoiding the need for a
>     racy access to ->ht_resize_cur.  Although this approach can result in
>     needless searches of not-yet-filled-in buckets, such searches will
>     be rare (assuming resizing is rare), and a search of an empty bucket
>     is cheap anyway.
>     
>     Reported-by: Junchang Wang <junchangwang@gmail.com>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> 
> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> index 9f68a00dabe3..6dbfe020d78d 100644
> --- a/CodeSamples/datastruct/hash/hash_resize.c
> +++ b/CodeSamples/datastruct/hash/hash_resize.c
> @@ -124,7 +124,8 @@ void hashtab_free(struct hashtab *htp_master)
>  //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
>  /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
>  static struct ht_bucket *				//\lnlbl{single:b}
> -ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
> +ht_get_bucket(struct ht *htp, void *key,
> +              long *b, unsigned long *h)
>  {
>  	unsigned long hash = htp->ht_gethash(key);
>  
> @@ -133,6 +134,24 @@ ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
>  		*h = hash;				//\lnlbl{single:h}
>  	return &htp->ht_bkt[*b];			//\lnlbl{single:return}
>  }							//\lnlbl{single:e}
> +
> +/* Search the bucket for the specified key in the specified ht structure. */
> +static struct ht_elem *					//\lnlbl{hsb:b}
> +ht_search_bucket(struct ht *htp, void *key)
> +{
> +	long b;
> +	struct ht_elem *htep;
> +	struct ht_bucket *htbp;
> +
> +	htbp = ht_get_bucket(htp, key, &b, NULL);	//\lnlbl{hsb:get_curbkt}
> +	cds_list_for_each_entry_rcu(htep,		//\lnlbl{hsb:loop:b}
> +	                            &htbp->htb_head,
> +	                            hte_next[htp->ht_idx]) {
> +		if (htp->ht_cmp(htep, key)) 		//\lnlbl{hsb:match}
> +			return htep;			//\lnlbl{hsb:ret_match}
> +	}						//\lnlbl{hsb:loop:e}
> +	return NULL;					//\lnlbl{hsb:ret_NULL}
> +}							//\lnlbl{hsb:e}
>  //\end{snippet}
>  
>  /* Read-side lock/unlock functions. */
> @@ -203,20 +222,17 @@ void hashtab_lookup_done(struct ht_elem *htep)
>  struct ht_elem *					//\lnlbl{lkp:b}
>  hashtab_lookup(struct hashtab *htp_master, void *key)
>  {
> -	long b;
>  	struct ht *htp;
>  	struct ht_elem *htep;
> -	struct ht_bucket *htbp;
>  
>  	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{lkp:get_curtbl}
> -	htbp = ht_get_bucket(htp, key, &b, NULL);	//\lnlbl{lkp:get_curbkt}
> -	cds_list_for_each_entry_rcu(htep,		//\lnlbl{lkp:loop:b}
> -	                            &htbp->htb_head,
> -	                            hte_next[htp->ht_idx]) {
> -		if (htp->ht_cmp(htep, key)) 		//\lnlbl{lkp:match}
> -			return htep;			//\lnlbl{lkp:ret_match}
> -	}						//\lnlbl{lkp:loop:e}
> -	return NULL;					//\lnlbl{lkp:ret_NULL}
> +	htep = ht_search_bucket(htp, key);		//\lnlbl{lkp:get_curbkt}
> +	if (htep)					//\lnlbl{lkp:entchk}
> +		return htep;				//\lnlbl{lkp:ret_match}
> +	htp = rcu_dereference(htp->ht_new);		//\lnlbl{lkp:get_nxttbl}
> +	if (!htp)					//\lnlbl{lkp:htpchk}
> +		return NULL;				//\lnlbl{lkp:noresize}
> +	return ht_search_bucket(htp, key);		//\lnlbl{lkp:ret_nxtbkt}
>  }							//\lnlbl{lkp:e}
>  
>  /*
> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> index eb7f790aa5a3..175056cbe911 100644
> --- a/datastruct/datastruct.tex
> +++ b/datastruct/datastruct.tex
> @@ -966,8 +966,10 @@ the old table.
>  \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
>  Bucket selection is shown in
>  Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
> -which shows \co{ht_get_bucket()}.
> -This function returns a reference to the bucket
> +which shows \co{ht_get_bucket()} on
> +lines~\lnref{single:b}-\lnref{single:e} and \co{ht_search_bucket()} on
> +lines~\lnref{hsb:b}-\lnref{hsb:e}.
> +The \co{ht_get_bucket()} function returns a reference to the bucket
>  corresponding to the specified key in the specified hash table, without
>  making any allowances for resizing.
>  It also stores the bucket index corresponding to the key into the location
> @@ -976,6 +978,17 @@ line~\lnref{single:gethash}, and the corresponding
>  hash value corresponding to the key into the location
>  referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
>  Line~\lnref{single:return} then returns a reference to the corresponding bucket.
> +
> +The \co{ht_search_bucket()} function searches for the specified key
> +within the specified hash-table version.
> +Line~\lnref{hsb:get_curbkt} obtains a reference to the bucket corresponding
> +to the specified key.
> +The loop spanning lines~\lnref{hsb:loop:b}-\lnref{hsb:loop:e} searches
> +that bucket, so that if line~\lnref{hsb:match} detects a match,
> +line~\lnref{hsb:ret_match} returns a pointer to the enclosing data element.
> +Otherwise, if there is no match,
> +line~\lnref{hsb:ret_NULL} returns \co{NULL} to indicate
> +failure.
>  \end{lineref}
>  
>  \QuickQuiz{}
> @@ -1093,31 +1106,18 @@ The \co{hashtab_lookup()} function on
>  lines~\lnref{b}-\lnref{e} of the listing does
>  hash lookups.
>  Line~\lnref{get_curtbl} fetches the current hash table and
> -line~\lnref{get_curbkt} obtains a reference
> -to the bucket corresponding to the specified key.
> -The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
> -so that if line~\lnref{match}
> -detects a match,
> -line~\lnref{ret_match} returns a pointer to the enclosing data element.
> -Otherwise, if there is no match,
> -line~\lnref{ret_NULL} returns \co{NULL} to indicate
> -failure.
> +line~\lnref{get_curbkt} searches the bucket corresponding to the
> +specified key.
> +If line~\lnref{entchk} determines that the search was successful,
> +line~\lnref{ret_match} returns a pointer to the element that was located.
> +Otherwise, line~\lnref{get_nxttbl} picks up a pointer to the next version,
> +and if line~\lnref{htpchk} determines that there is no resize in progress,
> +line~\lnref{noresize} returns \co{NULL}.
> +When there is a resize in progress, execution reaches
> +line~\lnref{ret_nxtbkt}, which returns the result of searching the
> +bucket corresponding to \co{key} in the new version.
>  \end{lineref}
>  
> -\QuickQuiz{}
> -	\begin{lineref}[ln:datastruct:hash_resize:access:lkp]
> -	What if execution reaches line~\lnref{loop:b}
> -	of \co{hashtab_lookup()} in
> -	Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
> -	just after this bucket has been resized.
> -	Won't that result in lookup failures?
> -	\end{lineref}
> -\QuickQuizAnswer{
> -	No, it won't.
> -	Resizing into the new hash table leaves the old hash table
> -	intact, courtesy of the pointer pairs.
> -} \QuickQuizEnd
> -
>  \begin{lineref}[ln:datastruct:hash_resize:access:add]
>  The \co{hashtab_add()} function on lines~\lnref{b}-\lnref{e} of the listing adds
>  new data elements to the hash table.
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-08 22:16               ` Akira Yokosawa
@ 2019-01-09  0:19                 ` Paul E. McKenney
  2019-01-09  2:59                   ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2019-01-09  0:19 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Junchang Wang, perfbook

On Wed, Jan 09, 2019 at 07:16:05AM +0900, Akira Yokosawa wrote:
> On 2019/01/08 10:39:31 -0800, Paul E. McKenney wrote:
> > On Wed, Jan 09, 2019 at 12:35:37AM +0900, Akira Yokosawa wrote:
> >> On 2019/01/09 0:28, Paul E. McKenney wrote:
> >>> On Tue, Jan 08, 2019 at 09:56:57AM +0800, Junchang Wang wrote:
> >>>> On Tue, Jan 8, 2019 at 7:06 AM Akira Yokosawa <akiyks@gmail.com> wrote:
> >>>>> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:

[ . . . ]

> >>>> Hi Paul and Akira,
> >>>>
> >>>> Thanks a lot for the comments, which I need some more time to look
> >>>> into. For Paul's patch, I have a few concerns. Please take a look.
> >>>>
> >>>> My understanding is that with this patch, during the time period when
> >>>> the resizing thread is running, an updater may insert/delete an item
> >>>> into/from the new hash table, while readers are still looking up data
> >>>> in the old one, resulting in the readers being unaware of
> >>>> insertions/deletions happening simultaneously. For example, it seems
> >>>> the following sequence could happen.
> >>>>
> >>>> 1. The resizing thread starts.
> >>>> 2. The resizing thread successfully passes bucket *B* of the old hash table.
> >>>> 3. An updater wants to insert a new item *I* which should be inserted
> >>>> into bucket *B*.
> >>>> 4. The updater will select the new hash table and insert the item *I*
> >>>> into the new hash table.
> >>>> 5. A read request comes in and wants to lookup item *I*. The lookup
> >>>> request will check the old hash table and fail. Doesn't it?
> >>>> 6. The resizing thread exits.
> >>>> 7. Now subsequent read requests can successfully find item *I*.
> >>>
> >>> Yes, this can happen.
> >>>
> >>>> Is my understanding correct? Please let me know if I misunderstood
> >>>> anything. Given that this patch can accelerate the fast path,
> >>>> I think it should be OK because resizing typically happens rarely.
> >>>> Just want to make sure I fully understand the algorithm.
> >>>
> >>> It is a design choice, and some users would prefer not to fail to see
> >>> new items during a resize.  One approach would be to revert back to
> >>> the old-style checking, and another would be to provide a separate
> >>> lookup interface that synchronizes with adds and deletes.
> >>>
> >>> So, I could add a quick quiz with this information, I could revert the
> >>> change, or I could add another lookup function that provided more timely
> >>> information.  Left to myself, I would provide a quick quiz, but what
> >>> do you guys think?
> >>
> >> Hi, I was composing a message, but now I'm replying to this one.
> >> I think adding a quick quiz would be a good idea.
> > 
> > But in the meantime, it occurred to me that I was looking at the
> > problem in the wrong way.  I believe that the following patch makes
> > hashtab_lookup() find elements recently added by hashtab_add(), even
> > during a resize, and without the need for memory barriers.
> > 
> > The scenario that convinced me to take this approach is when a thread
> > does hashtab_add(), then immediately searches for the newly added element.
> > Failing to find it would be quite a surprise to most people.
> 
> When a thread does hashtab_del() and immediately checks the deletion,
> it still finds the deleted element while resizing is in progress.
> This would also be a surprise. The current version looks less consistent
> than the simpler one did.

I bet I can fix that...  Famous last words!  ;-)

But please see below and tell me what you think.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
index 6dbfe020d78d..632d9e27675b 100644
--- a/CodeSamples/datastruct/hash/hash_resize.c
+++ b/CodeSamples/datastruct/hash/hash_resize.c
@@ -257,9 +257,12 @@ void hashtab_add(struct ht_elem *htep,			//\lnlbl{add:b}
 void hashtab_del(struct ht_elem *htep,			//\lnlbl{del:b}
                  struct ht_lock_state *lsp)
 {
-	int i = lsp->hls_idx[!!lsp->hbp[1]];		//\lnlbl{del:i}
+	int new = !!lsp->hbp[1];			//\lnlbl{del:new}
+	int i = lsp->hls_idx[new];			//\lnlbl{del:i}
 
 	cds_list_del_rcu(&htep->hte_next[i]);		//\lnlbl{del:del}
+	if (new)
+		cds_list_del_rcu(&htep->hte_next[!i]);	//\lnlbl{del:del}
 }							//\lnlbl{del:e}
 //\end{snippet}
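
One caveat with the unconditional second cds_list_del_rcu(): it is safe
only if the element was actually linked into the second pointer set.  A
guarded variant, sketched here using a .prev == NULL convention (which
would require hashtab_add() to initialize .prev, as a later patch in this
thread does), might look like:

	/* Sketch only: unlink from the second pointer set only if linked. */
	void hashtab_del(struct ht_elem *htep, struct ht_lock_state *lsp)
	{
		int new = !!lsp->hbp[1];
		int i = lsp->hls_idx[new];

		cds_list_del_rcu(&htep->hte_next[i]);
		if (new && htep->hte_next[!i].prev) {	/* linked in other set? */
			cds_list_del_rcu(&htep->hte_next[!i]);
			htep->hte_next[!i].prev = NULL;
		}
	}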
 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-09  0:19                 ` Paul E. McKenney
@ 2019-01-09  2:59                   ` Paul E. McKenney
  2019-01-11  4:08                     ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2019-01-09  2:59 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Junchang Wang, perfbook

On Tue, Jan 08, 2019 at 04:19:59PM -0800, Paul E. McKenney wrote:
> On Wed, Jan 09, 2019 at 07:16:05AM +0900, Akira Yokosawa wrote:
> > On 2019/01/08 10:39:31 -0800, Paul E. McKenney wrote:
> > > On Wed, Jan 09, 2019 at 12:35:37AM +0900, Akira Yokosawa wrote:
> > >> On 2019/01/09 0:28, Paul E. McKenney wrote:
> > >>> On Tue, Jan 08, 2019 at 09:56:57AM +0800, Junchang Wang wrote:
> > >>>> On Tue, Jan 8, 2019 at 7:06 AM Akira Yokosawa <akiyks@gmail.com> wrote:
> > >>>>> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
> 
> [ . . . ]
> 
> > >>>> Hi Paul and Akira,
> > >>>>
> > >>>> Thanks a lot for the comments, which I need some more time to look
> > >>>> into. For Paul's patch, I have a few concerns. Please take a look.
> > >>>>
> > >>>> My understanding is that with this patch, during the time period when
> > >>>> the resizing thread is running, an updater may insert/delete an item
> > >>>> into/from the new hash table, while readers are still looking up data
> > >>>> in the old one, resulting in the readers being unaware of
> > >>>> insertions/deletions happening simultaneously. For example, it seems
> > >>>> the following sequence could happen.
> > >>>>
> > >>>> 1. The resizing thread starts.
> > >>>> 2. The resizing thread successfully passes bucket *B* of the old hash table.
> > >>>> 3. An updater wants to insert a new item *I* which should be inserted
> > >>>> into bucket *B*.
> > >>>> 4. The updater will select the new hash table and insert the item *I*
> > >>>> into the new hash table.
> > >>>> 5. A read request comes in and wants to lookup item *I*. The lookup
> > >>>> request will check the old hash table and fail. Doesn't it?
> > >>>> 6. The resizing thread exits.
> > >>>> 7. Now subsequent read requests can successfully find item *I*.
> > >>>
> > >>> Yes, this can happen.
> > >>>
> > >>>> Is my understanding correct? Please let me know if I misunderstood
> > >>>> anything. Given that this patch can accelerate the fast path,
> > >>>> I think it should be OK because resizing typically happens rarely.
> > >>>> Just want to make sure I fully understand the algorithm.
> > >>>
> > >>> It is a design choice, and some users would prefer not to fail to see
> > >>> new items during a resize.  One approach would be to revert back to
> > >>> the old-style checking, and another would be to provide a separate
> > >>> lookup interface that synchronizes with adds and deletes.
> > >>>
> > >>> So, I could add a quick quiz with this information, I could revert the
> > >>> change, or I could add another lookup function that provided more timely
> > >>> information.  Left to myself, I would provide a quick quiz, but what
> > >>> do you guys think?
> > >>
> > >> Hi, I was composing a message, but now I'm replying to this one.
> > >> I think adding a quick quiz would be a good idea.
> > > 
> > > But in the meantime, it occurred to me that I was looking at the
> > > problem in the wrong way.  I believe that the following patch makes
> > > hashtab_lookup() find elements recently added by hashtab_add(), even
> > > during a resize, and without the need for memory barriers.
> > > 
> > > The scenario that convinced me to take this approach is when a thread
> > > does hashtab_add(), then immediately searches for the newly added element.
> > > Failing to find it would be quite a surprise to most people.
> > 
> > When a thread does hashtab_del() and immediately checks the deletion,
> > it still finds the deleted element while resizing is in progress.
> > This would also be a surprise. The current version looks less consistent
> > than the simpler one did.
> 
> I bet I can fix that...  Famous last words!  ;-)
> 
> But please see below and tell me what you think.

Well, that is not quite right, but close.  Working on it...

							Thanx, Paul

> ------------------------------------------------------------------------
> 
> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> index 6dbfe020d78d..632d9e27675b 100644
> --- a/CodeSamples/datastruct/hash/hash_resize.c
> +++ b/CodeSamples/datastruct/hash/hash_resize.c
> @@ -257,9 +257,12 @@ void hashtab_add(struct ht_elem *htep,			//\lnlbl{add:b}
>  void hashtab_del(struct ht_elem *htep,			//\lnlbl{del:b}
>                   struct ht_lock_state *lsp)
>  {
> -	int i = lsp->hls_idx[!!lsp->hbp[1]];		//\lnlbl{del:i}
> +	int new = !!lsp->hbp[1];			//\lnlbl{del:new}
> +	int i = lsp->hls_idx[new];			//\lnlbl{del:i}
>  
>  	cds_list_del_rcu(&htep->hte_next[i]);		//\lnlbl{del:del}
> +	if (new)
> +		cds_list_del_rcu(&htep->hte_next[!i]);	//\lnlbl{del:del}
>  }							//\lnlbl{del:e}
>  //\end{snippet}
>  


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-09  2:59                   ` Paul E. McKenney
@ 2019-01-11  4:08                     ` Paul E. McKenney
  2019-01-11 14:25                       ` Akira Yokosawa
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2019-01-11  4:08 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Junchang Wang, perfbook

On Tue, Jan 08, 2019 at 06:59:13PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 08, 2019 at 04:19:59PM -0800, Paul E. McKenney wrote:
> > On Wed, Jan 09, 2019 at 07:16:05AM +0900, Akira Yokosawa wrote:
> > > On 2019/01/08 10:39:31 -0800, Paul E. McKenney wrote:
> > > > On Wed, Jan 09, 2019 at 12:35:37AM +0900, Akira Yokosawa wrote:
> > > >> On 2019/01/09 0:28, Paul E. McKenney wrote:
> > > >>> On Tue, Jan 08, 2019 at 09:56:57AM +0800, Junchang Wang wrote:
> > > >>>> On Tue, Jan 8, 2019 at 7:06 AM Akira Yokosawa <akiyks@gmail.com> wrote:
> > > >>>>> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
> > 
> > [ . . . ]
> > 
> > > >>>> Hi Paul and Akira,
> > > >>>>
> > > >>>> Thanks a lot for the comments, which I need some more time to look
> > > >>>> into. For Paul's patch, I have a few concerns. Please take a look.
> > > >>>>
> > > >>>> My understanding is that with this patch, during the time period when
> > > >>>> the resizing thread is running, an updater may insert/delete an item
> > > >>>> into/from the new hash table, while readers are still looking up data
> > > >>>> in the old one, resulting in the readers being unaware of
> > > >>>> insertions/deletions happening simultaneously. For example, it seems
> > > >>>> the following sequence could happen.
> > > >>>>
> > > >>>> 1. The resizing thread starts.
> > > >>>> 2. The resizing thread successfully passes bucket *B* of the old hash table.
> > > >>>> 3. An updater wants to insert a new item *I* which should be inserted
> > > >>>> into bucket *B*.
> > > >>>> 4. The updater will select the new hash table and insert the item *I*
> > > >>>> into the new hash table.
> > > >>>> 5. A read request comes in and wants to lookup item *I*. The lookup
> > > >>>> request will check the old hash table and fail. Doesn't it?
> > > >>>> 6. The resizing thread exits.
> > > >>>> 7. Now subsequent read requests can successfully find item *I*.
> > > >>>
> > > >>> Yes, this can happen.
> > > >>>
> > > >>>> Is my understanding correct? Please let me know if I misunderstood
> > > >>>> anything. Given that this patch can accelerate the fast path,
> > > >>>> I think it should be OK because resizing typically happens rarely.
> > > >>>> Just want to make sure I fully understand the algorithm.
> > > >>>
> > > >>> It is a design choice, and some users would prefer not to fail to see
> > > >>> new items during a resize.  One approach would be to revert back to
> > > >>> the old-style checking, and another would be to provide a separate
> > > >>> lookup interface that synchronizes with adds and deletes.
> > > >>>
> > > >>> So, I could add a quick quiz with this information, I could revert the
> > > >>> change, or I could add another lookup function that provided more timely
> > > >>> information.  Left to myself, I would provide a quick quiz, but what
> > > >>> do you guys think?
> > > >>
> > > >> Hi, I was composing a message, but now I'm replying to this one.
> > > >> I think adding a quick quiz would be a good idea.
> > > > 
> > > > But in the meantime, it occurred to me that I was looking at the
> > > > problem in the wrong way.  I believe that the following patch makes
> > > > hashtab_lookup() find elements recently added by hashtab_add(), even
> > > > during a resize, and without the need for memory barriers.
> > > > 
> > > > The scenario that convinced me to take this approach is when a thread
> > > > does hashtab_add(), then immediately searches for the newly added element.
> > > > Failing to find it would be quite a surprise to most people.
> > > 
> > > When a thread does hashtab_del() and immediately checks the deletion,
> > > it still finds the deleted element while resizing is in progress.
> > > This would also be a surprise. The current version looks less consistent
> > > than the simpler one did.
> > 
> > I bet I can fix that...  Famous last words!  ;-)
> > 
> > But please see below and tell me what you think.
> 
> Well, that is not quite right, but close.  Working on it...

Seems to be stable.  I have not yet updated the text.  I am currently
looking into whether I can get rid of ->ht_resize_cur.  In theory, this
would make it trivial to make the resizing "pause", releasing the lock
from time to time.
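
For reference, a "pausable" per-bucket loop might look like the following
sketch, with names taken from the hashtab_resize() excerpt earlier in this
thread (ht_nbuckets is an assumption, and the re-validation that updaters
would need across the pause is deliberately omitted):

	/* Sketch only: dropping the old-table bucket lock between buckets
	 * bounds how long any one updater is blocked by a resize pass. */
	for (i = 0; i < htp->ht_nbuckets; i++) {
		htbp = &htp->ht_bkt[i];
		spin_lock(&htbp->htb_lock);
		cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) {
			htbp_new = ht_get_bucket(htp_new,
						 htp_new->ht_getkey(htep), &b, NULL);
			spin_lock(&htbp_new->htb_lock);
			cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
			spin_unlock(&htbp_new->htb_lock);
		}
		WRITE_ONCE(htp->ht_resize_cur, i);	/* publish progress */
		spin_unlock(&htbp->htb_lock);	/* pause point between buckets */
	}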

For whatever it is worth...

							Thanx, Paul


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-11  4:08                     ` Paul E. McKenney
@ 2019-01-11 14:25                       ` Akira Yokosawa
  2019-01-11 15:43                         ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Akira Yokosawa @ 2019-01-11 14:25 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Junchang Wang, perfbook, Akira Yokosawa

On 2019/01/11 13:08, Paul E. McKenney wrote:
> On Tue, Jan 08, 2019 at 06:59:13PM -0800, Paul E. McKenney wrote:
>> On Tue, Jan 08, 2019 at 04:19:59PM -0800, Paul E. McKenney wrote:
>>> On Wed, Jan 09, 2019 at 07:16:05AM +0900, Akira Yokosawa wrote:
>>>> On 2019/01/08 10:39:31 -0800, Paul E. McKenney wrote:
>>>>> On Wed, Jan 09, 2019 at 12:35:37AM +0900, Akira Yokosawa wrote:
>>>>>> On 2019/01/09 0:28, Paul E. McKenney wrote:
>>>>>>> On Tue, Jan 08, 2019 at 09:56:57AM +0800, Junchang Wang wrote:
>>>>>>>> On Tue, Jan 8, 2019 at 7:06 AM Akira Yokosawa <akiyks@gmail.com> wrote:
>>>>>>>>> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
>>>
>>> [ . . . ]
>>>
>>>>>>>> Hi Paul and Akira,
>>>>>>>>
>>>>>>>> Thanks a lot for the comments, which I need some more time to look
>>>>>>>> into. For Paul's patch, I have a few concerns. Please take a look.
>>>>>>>>
>>>>>>>> My understanding is that with this patch, during the time period when
>>>>>>>> the resizing thread is running, an updater may insert/delete an item
>>>>>>>> into/from the new hash table, while readers are still looking up data
>>>>>>>> in the old one, resulting in the readers being unaware of
>>>>>>>> insertions/deletions happening simultaneously. For example, it seems
>>>>>>>> the following sequence could happen.
>>>>>>>>
>>>>>>>> 1. The resizing thread starts.
>>>>>>>> 2. The resizing thread successfully passes bucket *B* of the old hash table.
>>>>>>>> 3. An updater wants to insert a new item *I* which should be inserted
>>>>>>>> into bucket *B*.
>>>>>>>> 4. The updater will select the new hash table and insert the item *I*
>>>>>>>> into the new hash table.
>>>>>>>> 5. A read request comes in and wants to lookup item *I*. The lookup
>>>>>>>> request will check the old hash table and fail. Doesn't it?
>>>>>>>> 6. The resizing thread exits.
>>>>>>>> 7. Now subsequent read requests can successfully find item *I*.
>>>>>>>
>>>>>>> Yes, this can happen.
>>>>>>>
>>>>>>>> Is my understanding correct? Please let me know if I misunderstood
>>>>>>>> anything. Given that this patch can accelerate the fast path,
>>>>>>>> I think it should be OK because resizing typically happens rarely.
>>>>>>>> Just want to make sure I fully understand the algorithm.
>>>>>>>
>>>>>>> It is a design choice, and some users would prefer not to fail to see
>>>>>>> new items during a resize.  One approach would be to revert back to
>>>>>>> the old-style checking, and another would be to provide a separate
>>>>>>> lookup interface that synchronizes with adds and deletes.
>>>>>>>
>>>>>>> So, I could add a quick quiz with this information, I could revert the
>>>>>>> change, or I could add another lookup function that provided more timely
>>>>>>> information.  Left to myself, I would provide a quick quiz, but what
>>>>>>> do you guys think?
>>>>>>
>>>>>> Hi, I was composing a message, but now I'm replying to this one.
>>>>>> I think adding a quick quiz would be a good idea.
>>>>>
>>>>> But in the meantime, it occurred to me that I was looking at the
>>>>> problem in the wrong way.  I believe that the following patch makes
>>>>> hashtab_lookup() find elements recently added by hashtab_add(), even
>>>>> during a resize, and without the need for memory barriers.
>>>>>
>>>>> The scenario that convinced me to take this approach is when a thread
>>>>> does hashtab_add(), then immediately searches for the newly added element.
>>>>> Failing to find it would be quite a surprise to most people.
>>>>
>>>> When a thread does hashtab_del() and immediately checks the deletion,
>>>> it still finds the deleted element while resizing is in progress.
>>>> This would also be a surprise. The current version looks less consistent
>>>> than the simpler one did.
>>>
>>> I bet I can fix that...  Famous last words!  ;-)
>>>
>>> But please see below and tell me what you think.
>>
>> Well, that is not quite right, but close.  Working on it...
> 
> Seems to be stable.  I have not yet updated the text.  I am currently
> looking into whether I can get rid of ->ht_resize_cur.

Without ->ht_resize_cur, it would be hard (or impossible) for
hashtab_lock_mod() to see if the bucket in the new table needs
to be locked, wouldn't it?

I have another idea to simplify the code, with the possibility of
increasing the update-side cost during resizing.

By keeping the ->cur hash table updated until the resizing has finished,
hashtab_lookup() only needs to see ->cur.
hashtab_add() and hashtab_del() update both buckets if hashtab_lock_mod()
has locked two buckets.

I've added the code as alternative code to avoid messing up the line labels.
The diff is appended below.

Thoughts?

Note: By adding a "-DFCV_SNIPPET" flag to the compiler, you can compile
your version of the code.

        Thanks, Akira

>                                                        In theory, this
> would make it trivial to make the resizing "pause", releasing the lock
> from time to time.
> 
> For whatever it is worth...
> 
> 							Thanx, Paul
> 

diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
index 9f9fe8c..35bc847 100644
--- a/CodeSamples/datastruct/hash/hash_resize.c
+++ b/CodeSamples/datastruct/hash/hash_resize.c
@@ -30,7 +30,11 @@
 struct ht_elem {
 	struct rcu_head rh;
 	struct cds_list_head hte_next[2];		//\lnlbl{ht_elem:next}
+#ifndef FCV_SNIPPET
+	unsigned long hte_hash[2];
+#else /* #ifndef FCV_SNIPPET */
 	unsigned long hte_hash;
+#endif /* #ifndef FCV_SNIPPET */
 };
 
 /* Hash-table bucket element. */
@@ -230,12 +234,16 @@ hashtab_lookup(struct hashtab *htp_master, void *key)
 
 	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{lkp:get_curtbl}
 	htep = ht_search_bucket(htp, key);		//\lnlbl{lkp:get_curbkt}
+#ifndef FCV_SNIPPET
+	return htep;
+#else /* #ifndef FCV_SNIPPET */
 	if (htep)					//\lnlbl{lkp:entchk}
 		return htep;				//\lnlbl{lkp:ret_match}
 	htp = rcu_dereference(htp->ht_new);		//\lnlbl{lkp:get_nxttbl}
 	if (!htp)					//\lnlbl{lkp:htpchk}
 		return NULL;				//\lnlbl{lkp:noresize}
 	return ht_search_bucket(htp, key);		//\lnlbl{lkp:ret_nxtbkt}
+#endif /* #ifndef FCV_SNIPPET */
 }							//\lnlbl{lkp:e}
 
 /*
@@ -248,9 +256,18 @@ void hashtab_add(struct ht_elem *htep,			//\lnlbl{add:b}
 	struct ht_bucket *htbp = lsp->hbp[0];		//\lnlbl{add:htbp}
 	int i = lsp->hls_idx[0];			//\lnlbl{add:i}
 
+#ifndef FCV_SNIPPET
+	htep->hte_hash[0] = lsp->hls_hash[0];
+	cds_list_add_rcu(&htep->hte_next[i], &htbp->htb_head);
+	if ((htbp = lsp->hbp[1])) {
+	  htep->hte_hash[1] = lsp->hls_hash[1];
+	  cds_list_add_rcu(&htep->hte_next[!i], &htbp->htb_head);
+	}
+#else /* #ifndef FCV_SNIPPET */
 	htep->hte_hash = lsp->hls_hash[0];		//\lnlbl{add:hash}
 	htep->hte_next[!i].prev = NULL;			//\lnlbl{add:initp}
 	cds_list_add_rcu(&htep->hte_next[i], &htbp->htb_head); //\lnlbl{add:add}
+#endif /* #ifndef FCV_SNIPPET */
 }							//\lnlbl{add:e}
 
 /*
@@ -262,6 +279,11 @@ void hashtab_del(struct ht_elem *htep,			//\lnlbl{del:b}
 {
 	int i = lsp->hls_idx[0];			//\lnlbl{del:i}
 
+#ifndef FCV_SNIPPET
+	cds_list_del_rcu(&htep->hte_next[i]);
+	if (lsp->hbp[1])
+	  cds_list_del_rcu(&htep->hte_next[!i]);
+#else /* #ifndef FCV_SNIPPET */
 	if (htep->hte_next[i].prev) {			//\lnlbl{del:if}
 		cds_list_del_rcu(&htep->hte_next[i]);	//\lnlbl{del:del}
 		htep->hte_next[i].prev = NULL;		//\lnlbl{del:init}
@@ -270,6 +292,7 @@ void hashtab_del(struct ht_elem *htep,			//\lnlbl{del:b}
 		cds_list_del_rcu(&htep->hte_next[!i]);	//\lnlbl{del:delnew}
 		htep->hte_next[!i].prev = NULL;		//\lnlbl{del:initnew}
 	}
+#endif /* #ifndef FCV_SNIPPET */
 }							//\lnlbl{del:e}
 //\end{snippet}
 
@@ -350,5 +373,9 @@ void defer_del_rcu(struct rcu_head *rhp)
 
 #define quiescent_state() rcu_quiescent_state()
 
+#ifndef FCV_SNIPPET
+#define check_hash() (htep->hte_hash[0] != hash && htep->hte_hash[1] != hash)
+#endif /* #ifndef FCV_SNIPPET */
+
 #include "hashtorture.h"
 #endif /* #ifdef TEST_HASH */
diff --git a/CodeSamples/datastruct/hash/hashtorture.h b/CodeSamples/datastruct/hash/hashtorture.h
index 6f47baa..d6345cc 100644
--- a/CodeSamples/datastruct/hash/hashtorture.h
+++ b/CodeSamples/datastruct/hash/hashtorture.h
@@ -62,6 +62,10 @@ void (*defer_del_done)(struct ht_elem *htep) = NULL;
 #define rcu_barrier() do ; while (0)
 #endif /* #ifndef quiescent_state */
 
+#ifndef check_hash
+#define check_hash() (htep->hte_hash != hash)
+#endif /* #ifndef check_hash */
+
 /*
  * Test variables.
  */
@@ -988,7 +992,7 @@ int zoo_lookup(char *key)
 	htep = hashtab_lookup(perftest_htp, hash, key);
 	zhep = container_of(htep, struct zoo_he, zhe_e);
 	BUG_ON(htep &&
-	       (htep->hte_hash != hash ||
+	       (check_hash() ||
 	        strncmp(zhep->name, (char *)key, ZOO_NAMELEN) != 0));
 	hashtab_unlock_lookup(perftest_htp, hash);
 	hashtab_lookup_done(htep);
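
Stripped of the #ifdef scaffolding, the lookup under this alternative
reduces to a single search of the current table, because hashtab_add()
and hashtab_del() mirror elements into both pointer sets whenever two
buckets are locked (sketch of the !FCV_SNIPPET branch above):

	struct ht_elem *hashtab_lookup(struct hashtab *htp_master, void *key)
	{
		struct ht *htp = rcu_dereference(htp_master->ht_cur);

		return ht_search_bucket(htp, key);
	}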



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-11 14:25                       ` Akira Yokosawa
@ 2019-01-11 15:43                         ` Paul E. McKenney
  2019-01-11 22:56                           ` Akira Yokosawa
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2019-01-11 15:43 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Junchang Wang, perfbook

On Fri, Jan 11, 2019 at 11:25:18PM +0900, Akira Yokosawa wrote:
> On 2019/01/11 13:08, Paul E. McKenney wrote:
> > On Tue, Jan 08, 2019 at 06:59:13PM -0800, Paul E. McKenney wrote:
> >> On Tue, Jan 08, 2019 at 04:19:59PM -0800, Paul E. McKenney wrote:
> >>> On Wed, Jan 09, 2019 at 07:16:05AM +0900, Akira Yokosawa wrote:
> >>>> On 2019/01/08 10:39:31 -0800, Paul E. McKenney wrote:
> >>>>> On Wed, Jan 09, 2019 at 12:35:37AM +0900, Akira Yokosawa wrote:
> >>>>>> On 2019/01/09 0:28, Paul E. McKenney wrote:
> >>>>>>> On Tue, Jan 08, 2019 at 09:56:57AM +0800, Junchang Wang wrote:
> >>>>>>>> On Tue, Jan 8, 2019 at 7:06 AM Akira Yokosawa <akiyks@gmail.com> wrote:
> >>>>>>>>> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
> >>>
> >>> [ . . . ]
> >>>
> >>>>>>>> Hi Paul and Akira,
> >>>>>>>>
> >>>>>>>> Thanks a lot for the comments, which I need some more time to look
> >>>>>>>> into. For Paul's patch, I have a few concerns. Please take a look.
> >>>>>>>>
> >>>>>>>> My understanding is that with this patch, during the time period when
> >>>>>>>> the resizing thread is running, an updater may insert/delete an item
> >>>>>>>> into/from the new hash table, while readers are still looking up data
> >>>>>>>> in the old one, with the result that readers are unaware of
> >>>>>>>> insertions/deletions happening simultaneously. For example, it seems
> >>>>>>>> the following sequence could happen.
> >>>>>>>>
> >>>>>>>> 1. The resizing thread starts.
> >>>>>>>> 2. The resizing thread successfully passes bucket *B* of the old hash table.
> >>>>>>>> 3. An updater wants to insert a new item *I* which should be inserted
> >>>>>>>> into bucket *B*.
> >>>>>>>> 4. The updater will select the new hash table and insert the item *I*
> >>>>>>>> into the new hash table.
> >>>>>>>> 5. A read request comes in and wants to lookup item *I*. The lookup
> >>>>>>>> request will check the old hash table and fail. Doesn't it?
> >>>>>>>> 6. The resizing thread exits.
> >>>>>>>> 7. Now subsequent read requests can successfully find item *I*.
> >>>>>>>
> >>>>>>> Yes, this can happen.
> >>>>>>>
> >>>>>>>> Is my understanding correct? Please let me know if I misunderstood
> >>>>>>>> anything. Given that this patch can accelerate the fast path,
> >>>>>>>> I think it should be OK because resizing typically happens rarely.
> >>>>>>>> Just want to make sure I fully understand the algorithm.
> >>>>>>>
> >>>>>>> It is a design choice, and some users would prefer not to fail to see
> >>>>>>> new items during a resize.  One approach would be to revert back to
> >>>>>>> the old-style checking, and another would be to provide a separate
> >>>>>>> lookup interface that synchronizes with adds and deletes.
> >>>>>>>
> >>>>>>> So, I could add a quick quiz with this information, I could revert the
> >>>>>>> change, or I could add another lookup function that provided more timely
> >>>>>>> information.  Left to myself, I would provide a quick quiz, but what
> >>>>>>> do you guys think?
> >>>>>>
> >>>>>> Hi, I was composing a message, but now I'm replying to this one.
> >>>>>> I think adding a quick quiz would be a good idea.
> >>>>>
> >>>>> But in the meantime, it occurred to me that I was looking at the
> >>>>> problem in the wrong way.  I believe that the following patch makes
> >>>>> hashtab_lookup() find elements recently added by hashtab_add(), even
> >>>>> during a resize, and without the need for memory barriers.
> >>>>>
> >>>>> The scenario that convinced me to take this approach is when a thread
> >>>>> does hashtab_add(), then immediately searches for the newly added element.
> >>>>> Failing to find it would be quite a surprise to most people.
> >>>>
> >>>> When a thread does hashtab_del() and immediately checks the deletion,
> >>>> it still finds the deleted element while resizing is in progress.
> >>>> This would also be a surprise. Current version looks less consistent
> >>>> than the simpler one did.
> >>>
> >>> I bet I can fix that...  Famous last words!  ;-)
> >>>
> >>> But please see below and tell me what you think.
> >>
> >> Well, that is not quite right, but close.  Working on it...
> > 
> > Seems to be stable.  I have not yet updated the text.  I am currently
> > looking into whether I can get rid of ->ht_resize_cur.
> 
> Without ->ht_resize_cur, it would be hard (or impossible) for
> hashtab_lock_mod() to see if the bucket in the new table needs
> to be locked, wouldn't it?

The thought is to just unconditionally lock the buckets in both tables
when resizing.  This would of course result in added locking overhead
for adds and deletes, but only during resizing.
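
Roughly like the following untested sketch (reusing the names from
the patches in this thread; the ht_lock_state structure behind lsp
is assumed):

	static void hashtab_lock_mod(struct hashtab *htp_master, void *key,
				     struct ht_lock_state *lsp)
	{
		long b;
		unsigned long h;
		struct ht *htp;
		struct ht_bucket *htbp;

		rcu_read_lock();
		htp = rcu_dereference(htp_master->ht_cur);
		htbp = ht_get_bucket(htp, key, &b, &h);
		spin_lock(&htbp->htb_lock);	/* Old-table bucket, always. */
		lsp->hbp[0] = htbp;
		lsp->hls_idx[0] = htp->ht_idx;
		lsp->hls_hash[0] = h;
		htp = rcu_dereference(htp->ht_new);
		if (!htp) {			/* No resize in flight. */
			lsp->hbp[1] = NULL;
			return;
		}
		/* Resizing: lock the new-table bucket unconditionally,
		 * without consulting ->ht_resize_cur. */
		htbp = ht_get_bucket(htp, key, &b, &h);
		spin_lock(&htbp->htb_lock);
		lsp->hbp[1] = htbp;
		lsp->hls_idx[1] = htp->ht_idx;
		lsp->hls_hash[1] = h;
	}

The lock ordering (old bucket, then new) matches hashtab_resize(),
so as far as I can see no new deadlock cycle is introduced.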

> I have another idea to simplify the code, with the possibility of
> increasing update-side cost during resizing.
> 
> By keeping updating ->cur hashtab until the resizing finished,
> hashtab_lookup() only needs to see ->cur.
> hashtab_add() and hashtab_del() update both buckets if hashtab_lock_mod()
> has locked two buckets.

Now that hashtab_lookup() does a lookup in both buckets, it should not be
necessary to add to both buckets.  But you are right that it should not
hurt, given that there is no API for iterating over the entire hash table.
And I do see how it reduces the non-resize-time cost of failed lookups,
which is a good thing.

So this is a promising approach!  One question interspersed below.

							Thanx, Paul

> I've added the code as the alternative code to avoid messing up the line labels.
> The diff is appended below.
> 
> Thoughts?
> 
> Note: By adding a "-DFCV_SNIPPET" flag to the compiler, you can compile
> your version of the code.
> 
>         Thanks, Akira
> 
> >                                                        In theory, this
> > would make it trivial to make the resizing "pause", releasing the lock
> > from time to time.
> > 
> > For whatever it is worth...
> > 
> > 							Thanx, Paul
> > 
> 
> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> index 9f9fe8c..35bc847 100644
> --- a/CodeSamples/datastruct/hash/hash_resize.c
> +++ b/CodeSamples/datastruct/hash/hash_resize.c
> @@ -30,7 +30,11 @@
>  struct ht_elem {
>  	struct rcu_head rh;
>  	struct cds_list_head hte_next[2];		//\lnlbl{ht_elem:next}
> +#ifndef FCV_SNIPPET
> +	unsigned long hte_hash[2];
> +#else /* #ifndef FCV_SNIPPET */
>  	unsigned long hte_hash;
> +#endif /* #ifndef FCV_SNIPPET */
>  };
>  
>  /* Hash-table bucket element. */
> @@ -230,12 +234,16 @@ hashtab_lookup(struct hashtab *htp_master, void *key)
>  
>  	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{lkp:get_curtbl}
>  	htep = ht_search_bucket(htp, key);		//\lnlbl{lkp:get_curbkt}
> +#ifndef FCV_SNIPPET
> +	return htep;
> +#else /* #ifndef FCV_SNIPPET */
>  	if (htep)					//\lnlbl{lkp:entchk}
>  		return htep;				//\lnlbl{lkp:ret_match}
>  	htp = rcu_dereference(htp->ht_new);		//\lnlbl{lkp:get_nxttbl}
>  	if (!htp)					//\lnlbl{lkp:htpchk}
>  		return NULL;				//\lnlbl{lkp:noresize}
>  	return ht_search_bucket(htp, key);		//\lnlbl{lkp:ret_nxtbkt}
> +#endif /* #ifndef FCV_SNIPPET */
>  }							//\lnlbl{lkp:e}
>  
>  /*
> @@ -248,9 +256,18 @@ void hashtab_add(struct ht_elem *htep,			//\lnlbl{add:b}
>  	struct ht_bucket *htbp = lsp->hbp[0];		//\lnlbl{add:htbp}
>  	int i = lsp->hls_idx[0];			//\lnlbl{add:i}
>  
> +#ifndef FCV_SNIPPET
> +	htep->hte_hash[0] = lsp->hls_hash[0];
> +	cds_list_add_rcu(&htep->hte_next[i], &htbp->htb_head);
> +	if ((htbp = lsp->hbp[1])) {
> +	  htep->hte_hash[1] = lsp->hls_hash[1];
> +	  cds_list_add_rcu(&htep->hte_next[!i], &htbp->htb_head);
> +	}
> +#else /* #ifndef FCV_SNIPPET */
>  	htep->hte_hash = lsp->hls_hash[0];		//\lnlbl{add:hash}
>  	htep->hte_next[!i].prev = NULL;			//\lnlbl{add:initp}
>  	cds_list_add_rcu(&htep->hte_next[i], &htbp->htb_head); //\lnlbl{add:add}
> +#endif /* #ifndef FCV_SNIPPET */
>  }							//\lnlbl{add:e}
>  
>  /*
> @@ -262,6 +279,11 @@ void hashtab_del(struct ht_elem *htep,			//\lnlbl{del:b}
>  {
>  	int i = lsp->hls_idx[0];			//\lnlbl{del:i}
>  
> +#ifndef FCV_SNIPPET
> +	cds_list_del_rcu(&htep->hte_next[i]);
> +	if (lsp->hbp[1])
> +	  cds_list_del_rcu(&htep->hte_next[!i]);

What if the element was added before the resize started, so that it is not
in the new version of the table?  Or are we initializing each element's
pointers to point to themselves somewhere that I am currently blind to?

> +#else /* #ifndef FCV_SNIPPET */
>  	if (htep->hte_next[i].prev) {			//\lnlbl{del:if}
>  		cds_list_del_rcu(&htep->hte_next[i]);	//\lnlbl{del:del}
>  		htep->hte_next[i].prev = NULL;		//\lnlbl{del:init}
> @@ -270,6 +292,7 @@ void hashtab_del(struct ht_elem *htep,			//\lnlbl{del:b}
>  		cds_list_del_rcu(&htep->hte_next[!i]);	//\lnlbl{del:delnew}
>  		htep->hte_next[!i].prev = NULL;		//\lnlbl{del:initnew}
>  	}
> +#endif /* #ifndef FCV_SNIPPET */
>  }							//\lnlbl{del:e}
>  //\end{snippet}
>  
> @@ -350,5 +373,9 @@ void defer_del_rcu(struct rcu_head *rhp)
>  
>  #define quiescent_state() rcu_quiescent_state()
>  
> +#ifndef FCV_SNIPPET
> +#define check_hash() (htep->hte_hash[0] != hash && htep->hte_hash[1] != hash)
> +#endif /* #ifndef FCV_SNIPPET */
> +
>  #include "hashtorture.h"
>  #endif /* #ifdef TEST_HASH */
> diff --git a/CodeSamples/datastruct/hash/hashtorture.h b/CodeSamples/datastruct/hash/hashtorture.h
> index 6f47baa..d6345cc 100644
> --- a/CodeSamples/datastruct/hash/hashtorture.h
> +++ b/CodeSamples/datastruct/hash/hashtorture.h
> @@ -62,6 +62,10 @@ void (*defer_del_done)(struct ht_elem *htep) = NULL;
>  #define rcu_barrier() do ; while (0)
>  #endif /* #ifndef quiescent_state */
>  
> +#ifndef check_hash
> +#define check_hash() (htep->hte_hash != hash)
> +#endif /* #ifndef check_hash */
> +
>  /*
>   * Test variables.
>   */
> @@ -988,7 +992,7 @@ int zoo_lookup(char *key)
>  	htep = hashtab_lookup(perftest_htp, hash, key);
>  	zhep = container_of(htep, struct zoo_he, zhe_e);
>  	BUG_ON(htep &&
> -	       (htep->hte_hash != hash ||
> +	       (check_hash() ||
>  	        strncmp(zhep->name, (char *)key, ZOO_NAMELEN) != 0));
>  	hashtab_unlock_lookup(perftest_htp, hash);
>  	hashtab_lookup_done(htep);
> 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-11 15:43                         ` Paul E. McKenney
@ 2019-01-11 22:56                           ` Akira Yokosawa
  2019-01-11 23:28                             ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Akira Yokosawa @ 2019-01-11 22:56 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Junchang Wang, perfbook, Akira Yokosawa

On 2019/01/11 07:43:14 -0800, Paul E. McKenney wrote:
> On Fri, Jan 11, 2019 at 11:25:18PM +0900, Akira Yokosawa wrote:
>> On 2019/01/11 13:08, Paul E. McKenney wrote:
>>> On Tue, Jan 08, 2019 at 06:59:13PM -0800, Paul E. McKenney wrote:
>>>> On Tue, Jan 08, 2019 at 04:19:59PM -0800, Paul E. McKenney wrote:
>>>>> On Wed, Jan 09, 2019 at 07:16:05AM +0900, Akira Yokosawa wrote:
>>>>>> On 2019/01/08 10:39:31 -0800, Paul E. McKenney wrote:
>>>>>>> On Wed, Jan 09, 2019 at 12:35:37AM +0900, Akira Yokosawa wrote:
>>>>>>>> On 2019/01/09 0:28, Paul E. McKenney wrote:
>>>>>>>>> On Tue, Jan 08, 2019 at 09:56:57AM +0800, Junchang Wang wrote:
>>>>>>>>>> On Tue, Jan 8, 2019 at 7:06 AM Akira Yokosawa <akiyks@gmail.com> wrote:
>>>>>>>>>>> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
>>>>>
>>>>> [ . . . ]
>>>>>
>>>>>>>>>> Hi Paul and Akira,
>>>>>>>>>>
>>>>>>>>>> Thanks a lot for the comments, which I need some more time to look
>>>>>>>>>> into. For Paul's patch, I have a few concerns. Please take a look.
>>>>>>>>>>
>>>>>>>>>> My understanding is that with this patch, during the time period when
>>>>>>>>>> the resizing thread is running, an updater may insert/delete an item
>>>>>>>>>> into/from the new hash table, while readers are still looking up data
>>>>>>>>>> in the old one, with the result that readers are unaware of
>>>>>>>>>> insertions/deletions happening simultaneously. For example, it seems
>>>>>>>>>> the following sequence could happen.
>>>>>>>>>>
>>>>>>>>>> 1. The resizing thread starts.
>>>>>>>>>> 2. The resizing thread successfully passes bucket *B* of the old hash table.
>>>>>>>>>> 3. An updater wants to insert a new item *I* which should be inserted
>>>>>>>>>> into bucket *B*.
>>>>>>>>>> 4. The updater will select the new hash table and insert the item *I*
>>>>>>>>>> into the new hash table.
>>>>>>>>>> 5. A read request comes in and wants to lookup item *I*. The lookup
>>>>>>>>>> request will check the old hash table and fail. Doesn't it?
>>>>>>>>>> 6. The resizing thread exits.
>>>>>>>>>> 7. Now subsequent read requests can successfully find item *I*.
>>>>>>>>>
>>>>>>>>> Yes, this can happen.
>>>>>>>>>
>>>>>>>>>> Is my understanding correct? Please let me know if I misunderstood
>>>>>>>>>> anything. Given that this patch can accelerate the fast path,
>>>>>>>>>> I think it should be OK because resizing typically happens rarely.
>>>>>>>>>> Just want to make sure I fully understand the algorithm.
>>>>>>>>>
>>>>>>>>> It is a design choice, and some users would prefer not to fail to see
>>>>>>>>> new items during a resize.  One approach would be to revert back to
>>>>>>>>> the old-style checking, and another would be to provide a separate
>>>>>>>>> lookup interface that synchronizes with adds and deletes.
>>>>>>>>>
>>>>>>>>> So, I could add a quick quiz with this information, I could revert the
>>>>>>>>> change, or I could add another lookup function that provided more timely
>>>>>>>>> information.  Left to myself, I would provide a quick quiz, but what
>>>>>>>>> do you guys think?
>>>>>>>>
>>>>>>>> Hi, I was composing a message, but now I'm replying to this one.
>>>>>>>> I think adding a quick quiz would be a good idea.
>>>>>>>
>>>>>>> But in the meantime, it occurred to me that I was looking at the
>>>>>>> problem in the wrong way.  I believe that the following patch makes
>>>>>>> hashtab_lookup() find elements recently added by hashtab_add(), even
>>>>>>> during a resize, and without the need for memory barriers.
>>>>>>>
>>>>>>> The scenario that convinced me to take this approach is when a thread
>>>>>>> does hashtab_add(), then immediately searches for the newly added element.
>>>>>>> Failing to find it would be quite a surprise to most people.
>>>>>>
>>>>>> When a thread does hashtab_del() and immediately checks the deletion,
>>>>>> it still finds the deleted element while resizing is in progress.
>>>>>> This would also be a surprise. Current version looks less consistent
>>>>>> than the simpler one did.
>>>>>
>>>>> I bet I can fix that...  Famous last words!  ;-)
>>>>>
>>>>> But please see below and tell me what you think.
>>>>
>>>> Well, that is not quite right, but close.  Working on it...
>>>
>>> Seems to be stable.  I have not yet updated the text.  I am currently
>>> looking into whether I can get rid of ->ht_resize_cur.
>>
>> Without ->ht_resize_cur, it would be hard (or impossible) for
>> hashtab_lock_mod() to see if the bucket in the new table needs
>> to be locked, wouldn't it?
> 
> The thought is to just unconditionally lock the buckets in both tables
> when resizing.  This would of course result in added locking overhead
> for adds and deletes, but only during resizing.
> 
>> I have another idea to simplify the code, with the possibility of
>> increasing update-side cost during resizing.
>>
>> By keeping updating ->cur hashtab until the resizing finished,
>> hashtab_lookup() only needs to see ->cur.
>> hashtab_add() and hashtab_del() update both buckets if hashtab_lock_mod()
>> has locked two buckets.
> 
> Now that hashtab_lookup() does a lookup in both buckets, it should not be
> necessary to add to both buckets.  But you are right that it should not
> hurt, given that there is no API for iterating over the entire hash table.
> And I do see how it reduces the non-resize-time cost of failed lookups,
> which is a good thing.
> 
> So this is a promising approach!  One question interspersed below.
> 
> 							Thanx, Paul
> 
>> I've added the code as the alternative code to avoid messing up the line labels.
>> The diff is appended below.
>>
>> Thoughts?
>>
>> Note: By adding a "-DFCV_SNIPPET" flag to the compiler, you can compile
>> your version of the code.
>>
>>         Thanks, Akira
>>
>>>                                                        In theory, this
>>> would make it trivial to make the resizing "pause", releasing the lock
>>> from time to time.
>>>
>>> For whatever it is worth...
>>>
>>> 							Thanx, Paul
>>>
>>
>> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
>> index 9f9fe8c..35bc847 100644
>> --- a/CodeSamples/datastruct/hash/hash_resize.c
>> +++ b/CodeSamples/datastruct/hash/hash_resize.c
>> @@ -30,7 +30,11 @@
>>  struct ht_elem {
>>  	struct rcu_head rh;
>>  	struct cds_list_head hte_next[2];		//\lnlbl{ht_elem:next}
>> +#ifndef FCV_SNIPPET
>> +	unsigned long hte_hash[2];
>> +#else /* #ifndef FCV_SNIPPET */
>>  	unsigned long hte_hash;
>> +#endif /* #ifndef FCV_SNIPPET */
>>  };
>>  
>>  /* Hash-table bucket element. */
>> @@ -230,12 +234,16 @@ hashtab_lookup(struct hashtab *htp_master, void *key)
>>  
>>  	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{lkp:get_curtbl}
>>  	htep = ht_search_bucket(htp, key);		//\lnlbl{lkp:get_curbkt}
>> +#ifndef FCV_SNIPPET
>> +	return htep;
>> +#else /* #ifndef FCV_SNIPPET */
>>  	if (htep)					//\lnlbl{lkp:entchk}
>>  		return htep;				//\lnlbl{lkp:ret_match}
>>  	htp = rcu_dereference(htp->ht_new);		//\lnlbl{lkp:get_nxttbl}
>>  	if (!htp)					//\lnlbl{lkp:htpchk}
>>  		return NULL;				//\lnlbl{lkp:noresize}
>>  	return ht_search_bucket(htp, key);		//\lnlbl{lkp:ret_nxtbkt}
>> +#endif /* #ifndef FCV_SNIPPET */
>>  }							//\lnlbl{lkp:e}
>>  
>>  /*
>> @@ -248,9 +256,18 @@ void hashtab_add(struct ht_elem *htep,			//\lnlbl{add:b}
>>  	struct ht_bucket *htbp = lsp->hbp[0];		//\lnlbl{add:htbp}
>>  	int i = lsp->hls_idx[0];			//\lnlbl{add:i}
>>  
>> +#ifndef FCV_SNIPPET
>> +	htep->hte_hash[0] = lsp->hls_hash[0];
>> +	cds_list_add_rcu(&htep->hte_next[i], &htbp->htb_head);
>> +	if ((htbp = lsp->hbp[1])) {
>> +	  htep->hte_hash[1] = lsp->hls_hash[1];
>> +	  cds_list_add_rcu(&htep->hte_next[!i], &htbp->htb_head);
>> +	}
>> +#else /* #ifndef FCV_SNIPPET */
>>  	htep->hte_hash = lsp->hls_hash[0];		//\lnlbl{add:hash}
>>  	htep->hte_next[!i].prev = NULL;			//\lnlbl{add:initp}
>>  	cds_list_add_rcu(&htep->hte_next[i], &htbp->htb_head); //\lnlbl{add:add}
>> +#endif /* #ifndef FCV_SNIPPET */
>>  }							//\lnlbl{add:e}
>>  
>>  /*
>> @@ -262,6 +279,11 @@ void hashtab_del(struct ht_elem *htep,			//\lnlbl{del:b}
>>  {
>>  	int i = lsp->hls_idx[0];			//\lnlbl{del:i}
>>  
>> +#ifndef FCV_SNIPPET
>> +	cds_list_del_rcu(&htep->hte_next[i]);
>> +	if (lsp->hbp[1])
>> +	  cds_list_del_rcu(&htep->hte_next[!i]);
> 
> What if the element was added before the resize started, so that it is not
> in the new version of the table?  Or are we initializing each element's
> pointers to point to themselves somewhere that I am currently blind to?

You are considering unconditionally locking both buckets, but
this version relies on the update of ->ht_resize_cur.

That hashtab_lock_mod() returns a pointer in lsp->hbp[1] means
the resizing has distributed the bucket to the new table. Therefore,
the element is assured to be in the new bucket.
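
Or, in code terms (this is the corresponding fragment of
hashtab_lock_mod() quoted from earlier in this thread), hbp[1] is
set only on the path that has already observed the distribution:

	if (b > READ_ONCE(htp->ht_resize_cur)) {
		lsp->hbp[1] = NULL;	/* Bucket not yet distributed. */
		return;
	}
	htp = rcu_dereference(htp->ht_new);	/* Bucket b already moved. */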

And it looks like I used 2-space indents in the added code
in the diff. I'll fix them if you'd like to take this approach.

        Thanks, Akira

> 
>> +#else /* #ifndef FCV_SNIPPET */
>>  	if (htep->hte_next[i].prev) {			//\lnlbl{del:if}
>>  		cds_list_del_rcu(&htep->hte_next[i]);	//\lnlbl{del:del}
>>  		htep->hte_next[i].prev = NULL;		//\lnlbl{del:init}
>> @@ -270,6 +292,7 @@ void hashtab_del(struct ht_elem *htep,			//\lnlbl{del:b}
>>  		cds_list_del_rcu(&htep->hte_next[!i]);	//\lnlbl{del:delnew}
>>  		htep->hte_next[!i].prev = NULL;		//\lnlbl{del:initnew}
>>  	}
>> +#endif /* #ifndef FCV_SNIPPET */
>>  }							//\lnlbl{del:e}
>>  //\end{snippet}
>>  
>> @@ -350,5 +373,9 @@ void defer_del_rcu(struct rcu_head *rhp)
>>  
>>  #define quiescent_state() rcu_quiescent_state()
>>  
>> +#ifndef FCV_SNIPPET
>> +#define check_hash() (htep->hte_hash[0] != hash && htep->hte_hash[1] != hash)
>> +#endif /* #ifndef FCV_SNIPPET */
>> +
>>  #include "hashtorture.h"
>>  #endif /* #ifdef TEST_HASH */
>> diff --git a/CodeSamples/datastruct/hash/hashtorture.h b/CodeSamples/datastruct/hash/hashtorture.h
>> index 6f47baa..d6345cc 100644
>> --- a/CodeSamples/datastruct/hash/hashtorture.h
>> +++ b/CodeSamples/datastruct/hash/hashtorture.h
>> @@ -62,6 +62,10 @@ void (*defer_del_done)(struct ht_elem *htep) = NULL;
>>  #define rcu_barrier() do ; while (0)
>>  #endif /* #ifndef quiescent_state */
>>  
>> +#ifndef check_hash
>> +#define check_hash() (htep->hte_hash != hash)
>> +#endif /* #ifndef check_hash */
>> +
>>  /*
>>   * Test variables.
>>   */
>> @@ -988,7 +992,7 @@ int zoo_lookup(char *key)
>>  	htep = hashtab_lookup(perftest_htp, hash, key);
>>  	zhep = container_of(htep, struct zoo_he, zhe_e);
>>  	BUG_ON(htep &&
>> -	       (htep->hte_hash != hash ||
>> +	       (check_hash() ||
>>  	        strncmp(zhep->name, (char *)key, ZOO_NAMELEN) != 0));
>>  	hashtab_unlock_lookup(perftest_htp, hash);
>>  	hashtab_lookup_done(htep);
>>
>>
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-11 22:56                           ` Akira Yokosawa
@ 2019-01-11 23:28                             ` Paul E. McKenney
  0 siblings, 0 replies; 23+ messages in thread
From: Paul E. McKenney @ 2019-01-11 23:28 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: Junchang Wang, perfbook

On Sat, Jan 12, 2019 at 07:56:48AM +0900, Akira Yokosawa wrote:
> On 2019/01/11 07:43:14 -0800, Paul E. McKenney wrote:
> > On Fri, Jan 11, 2019 at 11:25:18PM +0900, Akira Yokosawa wrote:
> >> On 2019/01/11 13:08, Paul E. McKenney wrote:
> >>> On Tue, Jan 08, 2019 at 06:59:13PM -0800, Paul E. McKenney wrote:
> >>>> On Tue, Jan 08, 2019 at 04:19:59PM -0800, Paul E. McKenney wrote:
> >>>>> On Wed, Jan 09, 2019 at 07:16:05AM +0900, Akira Yokosawa wrote:
> >>>>>> On 2019/01/08 10:39:31 -0800, Paul E. McKenney wrote:
> >>>>>>> On Wed, Jan 09, 2019 at 12:35:37AM +0900, Akira Yokosawa wrote:
> >>>>>>>> On 2019/01/09 0:28, Paul E. McKenney wrote:
> >>>>>>>>> On Tue, Jan 08, 2019 at 09:56:57AM +0800, Junchang Wang wrote:
> >>>>>>>>>> On Tue, Jan 8, 2019 at 7:06 AM Akira Yokosawa <akiyks@gmail.com> wrote:
> >>>>>>>>>>> On 2019/01/08 07:54:16 +0900, Akira Yokosawa wrote:
> >>>>>
> >>>>> [ . . . ]
> >>>>>
> >>>>>>>>>> Hi Paul and Akira,
> >>>>>>>>>>
> >>>>>>>>>> Thanks a lot for the comments, which I need some more time to look
> >>>>>>>>>> into. For Paul's patch, I have a few concerns. Please take a look.
> >>>>>>>>>>
> >>>>>>>>>> My understanding is that with this patch, during the time period when
> >>>>>>>>>> the resizing thread is running, an updater may insert/delete an item
> >>>>>>>>>> into/from the new hash table, while readers are still looking up data
> >>>>>>>>>> in the old one, with the result that readers are unaware of
> >>>>>>>>>> insertions/deletions happening simultaneously. For example, it seems
> >>>>>>>>>> the following sequence could happen.
> >>>>>>>>>>
> >>>>>>>>>> 1. The resizing thread starts.
> >>>>>>>>>> 2. The resizing thread successfully passes bucket *B* of the old hash table.
> >>>>>>>>>> 3. An updater wants to insert a new item *I* which should be inserted
> >>>>>>>>>> into bucket *B*.
> >>>>>>>>>> 4. The updater will select the new hash table and insert the item *I*
> >>>>>>>>>> into the new hash table.
> >>>>>>>>>> 5. A read request comes in and wants to lookup item *I*. The lookup
> >>>>>>>>>> request will check the old hash table and fail. Doesn't it?
> >>>>>>>>>> 6. The resizing thread exits.
> >>>>>>>>>> 7. Now subsequent read requests can successfully find item *I*.
> >>>>>>>>>
> >>>>>>>>> Yes, this can happen.
> >>>>>>>>>
> >>>>>>>>>> Is my understanding correct? Please let me know if I misunderstood
> >>>>>>>>>> anything. Given that this patch can accelerate the fast path,
> >>>>>>>>>> I think it should be OK because resizing typically happens rarely.
> >>>>>>>>>> Just want to make sure I fully understand the algorithm.
> >>>>>>>>>
> >>>>>>>>> It is a design choice, and some users would prefer not to fail to see
> >>>>>>>>> new items during a resize.  One approach would be to revert back to
> >>>>>>>>> the old-style checking, and another would be to provide a separate
> >>>>>>>>> lookup interface that synchronizes with adds and deletes.
> >>>>>>>>>
> >>>>>>>>> So, I could add a quick quiz with this information, I could revert the
> >>>>>>>>> change, or I could add another lookup function that provided more timely
> >>>>>>>>> information.  Left to myself, I would provide a quick quiz, but what
> >>>>>>>>> do you guys think?
> >>>>>>>>
> >>>>>>>> Hi, I was composing a message, but now I'm replying to this one.
> >>>>>>>> I think adding a quick quiz would be a good idea.
> >>>>>>>
> >>>>>>> But in the meantime, it occurred to me that I was looking at the
> >>>>>>> problem in the wrong way.  I believe that the following patch makes
> >>>>>>> hashtab_lookup() find elements recently added by hashtab_add(), even
> >>>>>>> during a resize, and without the need for memory barriers.
> >>>>>>>
> >>>>>>> The scenario that convinced me to take this approach is when a thread
> >>>>>>> does hashtab_add(), then immediately searches for the newly added element.
> >>>>>>> Failing to find it would be quite a surprise to most people.
> >>>>>>
> >>>>>> When a thread does hashtab_del() and immediately checks the deletion,
> >>>>>> it still finds the deleted element while resizing is in progress.
> >>>>>> This would also be a surprise. Current version looks less consistent
> >>>>>> than the simpler one did.
> >>>>>
> >>>>> I bet I can fix that...  Famous last words!  ;-)
> >>>>>
> >>>>> But please see below and tell me what you think.
> >>>>
> >>>> Well, that is not quite right, but close.  Working on it...
> >>>
> >>> Seems to be stable.  I have not yet updated the text.  I am currently
> >>> looking into whether I can get rid of ->ht_resize_cur.
> >>
> >> Without ->ht_resize_cur, it would be hard (or impossible) for
> >> hashtab_lock_mod() to see if the bucket in the new table needs
> >> to be locked, wouldn't it?
> > 
> > The thought is to just unconditionally lock the buckets in both tables
> > when resizing.  This would of course result in added locking overhead
> > for adds and deletes, but only during resizing.
> > 
> >> I have another idea to simplify the code, with the possibility of
> >> increasing update-side cost during resizing.
> >>
> >> By keeping updating ->cur hashtab until the resizing finished,
> >> hashtab_lookup() only needs to see ->cur.
> >> hashtab_add() and hashtab_del() update both buckets if hashtab_lock_mod()
> >> has locked two buckets.
> > 
> > Now that hashtab_lookup() does a lookup in both buckets, it should not be
> > necessary to add to both buckets.  But you are right that it should not
> > hurt, given that there is no API for iterating over the entire hash table.
> > And I do see how it reduces the non-resize-time cost of failed lookups,
> > which is a good thing.
> > 
> > So this is a promising approach!  One question interspersed below.
> > 
> > 							Thanx, Paul
> > 
> >> I've added the code as the alternative code to avoid messing up the line labels.
> >> The diff is appended below.
> >>
> >> Thoughts?
> >>
> >> Note: By adding a "-DFCV_SNIPPET" flag to the compiler, you can compile
> >> your version of the code.
> >>
> >>         Thanks, Akira
> >>
> >>>                                                        In theory, this
> >>> would make it trivial to make the resizing "pause", releasing the lock
> >>> from time to time.
> >>>
> >>> For whatever it is worth...
> >>>
> >>> 							Thanx, Paul
> >>>
> >>
> >> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> >> index 9f9fe8c..35bc847 100644
> >> --- a/CodeSamples/datastruct/hash/hash_resize.c
> >> +++ b/CodeSamples/datastruct/hash/hash_resize.c
> >> @@ -30,7 +30,11 @@
> >>  struct ht_elem {
> >>  	struct rcu_head rh;
> >>  	struct cds_list_head hte_next[2];		//\lnlbl{ht_elem:next}
> >> +#ifndef FCV_SNIPPET
> >> +	unsigned long hte_hash[2];
> >> +#else /* #ifndef FCV_SNIPPET */
> >>  	unsigned long hte_hash;
> >> +#endif /* #ifndef FCV_SNIPPET */
> >>  };
> >>  
> >>  /* Hash-table bucket element. */
> >> @@ -230,12 +234,16 @@ hashtab_lookup(struct hashtab *htp_master, void *key)
> >>  
> >>  	htp = rcu_dereference(htp_master->ht_cur);	//\lnlbl{lkp:get_curtbl}
> >>  	htep = ht_search_bucket(htp, key);		//\lnlbl{lkp:get_curbkt}
> >> +#ifndef FCV_SNIPPET
> >> +	return htep;
> >> +#else /* #ifndef FCV_SNIPPET */
> >>  	if (htep)					//\lnlbl{lkp:entchk}
> >>  		return htep;				//\lnlbl{lkp:ret_match}
> >>  	htp = rcu_dereference(htp->ht_new);		//\lnlbl{lkp:get_nxttbl}
> >>  	if (!htp)					//\lnlbl{lkp:htpchk}
> >>  		return NULL;				//\lnlbl{lkp:noresize}
> >>  	return ht_search_bucket(htp, key);		//\lnlbl{lkp:ret_nxtbkt}
> >> +#endif /* #ifndef FCV_SNIPPET */
> >>  }							//\lnlbl{lkp:e}
> >>  
> >>  /*
> >> @@ -248,9 +256,18 @@ void hashtab_add(struct ht_elem *htep,			//\lnlbl{add:b}
> >>  	struct ht_bucket *htbp = lsp->hbp[0];		//\lnlbl{add:htbp}
> >>  	int i = lsp->hls_idx[0];			//\lnlbl{add:i}
> >>  
> >> +#ifndef FCV_SNIPPET
> >> +	htep->hte_hash[0] = lsp->hls_hash[0];
> >> +	cds_list_add_rcu(&htep->hte_next[i], &htbp->htb_head);
> >> +	if ((htbp = lsp->hbp[1])) {
> >> +	  htep->hte_hash[1] = lsp->hls_hash[1];
> >> +	  cds_list_add_rcu(&htep->hte_next[!i], &htbp->htb_head);
> >> +	}
> >> +#else /* #ifndef FCV_SNIPPET */
> >>  	htep->hte_hash = lsp->hls_hash[0];		//\lnlbl{add:hash}
> >>  	htep->hte_next[!i].prev = NULL;			//\lnlbl{add:initp}
> >>  	cds_list_add_rcu(&htep->hte_next[i], &htbp->htb_head); //\lnlbl{add:add}
> >> +#endif /* #ifndef FCV_SNIPPET */
> >>  }							//\lnlbl{add:e}
> >>  
> >>  /*
> >> @@ -262,6 +279,11 @@ void hashtab_del(struct ht_elem *htep,			//\lnlbl{del:b}
> >>  {
> >>  	int i = lsp->hls_idx[0];			//\lnlbl{del:i}
> >>  
> >> +#ifndef FCV_SNIPPET
> >> +	cds_list_del_rcu(&htep->hte_next[i]);
> >> +	if (lsp->hbp[1])
> >> +	  cds_list_del_rcu(&htep->hte_next[!i]);
> > 
> > What if the element was added before the resize started, so that it is not
> > in the new version of the table?  Or are we initializing each element's
> > pointers to point to themselves somewhere that I am currently blind to?
> 
> You are considering unconditionally locking both buckets, but
> this version relies on the update of ->ht_resize_cur.
> 
> That hashtab_lock_mod() returns a pointer in lsp->hbp[1] means
> the resizing has distributed the bucket to the new table. Therefore,
> the element is assured to be in the new bucket.

Ah, got it!

> And it looks like I used 2-space indents in the added code
> in the diff. I'll fix them if you'd like to take this approach.

Well, your taking this code over would have the advantage of my getting
back to reworking Section 9.5, section-level epigraphs, and so on.  The
only downside of your approach that I see is if we ever want to iterate
over the hash table, but that can be dealt with when and if.

Therefore, I am all for your approach.  ;-)

							Thanx, Paul

>         Thanks, Akira
> 
> > 
> >> +#else /* #ifndef FCV_SNIPPET */
> >>  	if (htep->hte_next[i].prev) {			//\lnlbl{del:if}
> >>  		cds_list_del_rcu(&htep->hte_next[i]);	//\lnlbl{del:del}
> >>  		htep->hte_next[i].prev = NULL;		//\lnlbl{del:init}
> >> @@ -270,6 +292,7 @@ void hashtab_del(struct ht_elem *htep,			//\lnlbl{del:b}
> >>  		cds_list_del_rcu(&htep->hte_next[!i]);	//\lnlbl{del:delnew}
> >>  		htep->hte_next[!i].prev = NULL;		//\lnlbl{del:initnew}
> >>  	}
> >> +#endif /* #ifndef FCV_SNIPPET */
> >>  }							//\lnlbl{del:e}
> >>  //\end{snippet}
> >>  
> >> @@ -350,5 +373,9 @@ void defer_del_rcu(struct rcu_head *rhp)
> >>  
> >>  #define quiescent_state() rcu_quiescent_state()
> >>  
> >> +#ifndef FCV_SNIPPET
> >> +#define check_hash() (htep->hte_hash[0] != hash && htep->hte_hash[1] != hash)
> >> +#endif /* #ifndef FCV_SNIPPET */
> >> +
> >>  #include "hashtorture.h"
> >>  #endif /* #ifdef TEST_HASH */
> >> diff --git a/CodeSamples/datastruct/hash/hashtorture.h b/CodeSamples/datastruct/hash/hashtorture.h
> >> index 6f47baa..d6345cc 100644
> >> --- a/CodeSamples/datastruct/hash/hashtorture.h
> >> +++ b/CodeSamples/datastruct/hash/hashtorture.h
> >> @@ -62,6 +62,10 @@ void (*defer_del_done)(struct ht_elem *htep) = NULL;
> >>  #define rcu_barrier() do ; while (0)
> >>  #endif /* #ifndef quiescent_state */
> >>  
> >> +#ifndef check_hash
> >> +#define check_hash() (htep->hte_hash != hash)
> >> +#endif /* #ifndef check_hash */
> >> +
> >>  /*
> >>   * Test variables.
> >>   */
> >> @@ -988,7 +992,7 @@ int zoo_lookup(char *key)
> >>  	htep = hashtab_lookup(perftest_htp, hash, key);
> >>  	zhep = container_of(htep, struct zoo_he, zhe_e);
> >>  	BUG_ON(htep &&
> >> -	       (htep->hte_hash != hash ||
> >> +	       (check_hash() ||
> >>  	        strncmp(zhep->name, (char *)key, ZOO_NAMELEN) != 0));
> >>  	hashtab_unlock_lookup(perftest_htp, hash);
> >>  	hashtab_lookup_done(htep);
> >>
> >>
> > 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-07 18:33 ` Paul E. McKenney
  2019-01-07 22:54   ` Akira Yokosawa
@ 2019-01-18 14:32   ` Junchang Wang
  2019-01-18 18:34     ` Paul E. McKenney
  1 sibling, 1 reply; 23+ messages in thread
From: Junchang Wang @ 2019-01-18 14:32 UTC (permalink / raw)
  To: Paul McKenney; +Cc: perfbook

Hi Paul,

Sorry for the late reply. Please see below,

On Tue, Jan 8, 2019 at 2:32 AM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
>
> On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
> > Hi all,
> >
> > I'm reading hash_resize recently, and have a few questions regarding
> > this algorithm. Please take a look if you have time. Any suggestions
> > are warmly welcomed.
> >
> > === Question 1 ===
> > In hash_resize.c : hashtab_lock_mod
> > 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
> > 187                 lsp->hbp[1] = NULL;
> > 188                 return;
> > 189         }
> > 190         htp = rcu_dereference(htp->ht_new);
> >
> > It seems we are missing a barrier (e.g., smp_mb) in between lines 189
> > and 190, because neither READ_ONCE() nor rcu_dereference() can prevent
> > compilers and hardware from reordering the two unrelated variables,
> > ht_resize_cur and ht_new. Is my understanding correct?
>
> Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
> section and there is a synchronize_rcu() between the update to ->ht_new
> and the updates to ->ht_resize_cur.  For more details on how this works,
> please see https://lwn.net/Articles/573497/.
>

Thanks for the pointer to the article, which is very clear and helpful.
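
Just to confirm my reading, the ordering argument boils down to the
following sketch (my paraphrase of hashtab_resize() and
hashtab_lock_mod(), not the exact code):

	/* Resizer. */
	rcu_assign_pointer(htp->ht_new, htp_new); /* Publish new table. */
	synchronize_rcu();	/* Wait for pre-existing readers. */
	for (i = 0; i < htp->ht_nbuckets; i++) {
		/* ... distribute bucket i under its lock ... */
		WRITE_ONCE(htp->ht_resize_cur, i);
	}

	/* Updater, inside rcu_read_lock(). */
	if (b <= READ_ONCE(htp->ht_resize_cur))
		htp = rcu_dereference(htp->ht_new); /* Must see non-NULL. */

Any reader that sees a post-grace-period store to ->ht_resize_cur
cannot have started before synchronize_rcu() returned, and so is
guaranteed to also see the earlier store to ->ht_new.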

> Of course, if you find a code path in which a call to hashtab_lock_mod()
> is invoked outside of an RCU read-side critical section, that would be
> a bug.  (Can you tell me an exception to this rule, that is, a case
> where hashtab_lock_mod() could safely be invoked outside of an RCU
> read-side critical section?)
>
> > === Question 2 ===
> > In hash_resize.c, each time an updater wants to access a bucket, the
> > updater must first acquire the bucket's lock (htb_lock), preventing
> > other updaters accessing the same bucket concurrently. This approach
> > is OK if the linked list of a bucket is relatively short, but for a
> > larger system where linked lists are long enough and the
> > perftest_resize thread is running simultaneously, it could become a
> > potential performance bottleneck. One naive solution is to allow
> > multiple updaters to access the same bucket, only if they don't
> > operate on the same item of the list of this bucket. I wonder if there
> > are any existing works or discussions on this topic?
>
> One approach is to use a hashed array of locks, and to hash a given
> element's address to locate the lock to be used.  Please see
> Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
> Needed Locks First"), including Quick Quiz 7.9, for additional details.
>
> Another approach is to use RCU to protect traversals, and locks within the
> linked-list elements themselves.  These locks are conditionally acquired
> (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
> them in list order, and the tricks in Quick Quiz 7.9.
>
> Non-blocking synchronization can also be used, but it is often quite a
> bit more complicated.  See for example the split-order list of Shalev
> and Shavit, along with Desnoyers's RCU-protected extension in the
> userspace RCU library.
>

I have carefully read the related sections of perfbook, Shalev and
Shavit's split-ordered lists, Michael's lock-free linked list, and
Desnoyers's implementation. What a wonderful journey. Thanks a lot for
the pointers!
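
On the hashed array of locks, I wrote a small sketch to check my
understanding (the names and the shard count are mine, purely for
illustration):

	#include <stdint.h>

	#define ELEM_LOCK_SHARDS 256

	spinlock_t elem_locks[ELEM_LOCK_SHARDS];

	static spinlock_t *elem_lock(void *ep)
	{
		/* Hash the element's address, dropping alignment bits,
		 * to select one of the locks. */
		return &elem_locks[((uintptr_t)ep >> 4) % ELEM_LOCK_SHARDS];
	}

Contention is then governed by the number of lock shards rather than
by the length of any one bucket's list.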

> But it is usually -way- better to just choose a good hash function and
> to increase the number of buckets.  Which is of course one reason for
> having resizable hash tables.  ;-)
>
> But the other techniques can be useful in more complex linked data
> structures, such as graphs, where there is no reasonable way to
> partition the data.  Nevertheless, many people choose to do the
> partitioning anyway, especially on distributed systems.
>
> > === Question 3 ===
> > Chapter Data Structures also discusses other resizable hash tables,
> > namely "Resizable, scalable, concurrent hash tables via relativistic
> > programming" from Josh Triplett, which can save memory footprint by
> > using a single pair of pointers. But my understanding is that
> > perftest_resize.c is unique in that it allows you to rebuild the hash
> > table by utilizing a different hash function, which could be very
> > useful in practice (e.g., to prevent DDoS attack). Other solutions do
> > not share this property. Is my understanding correct? Did I miss any
> > discussions on this topic in perfbook?
>
> Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
> (which I use in hash_resize.c) is the only one allowing arbitrary changes
> to hash functions.  I expect that this advantage will become increasingly
> important as security issues become more challenging.  Furthermore, I
> suspect that the pointer-pair approach is faster and more scalable.
> It is certainly simpler.
>
> On the other hand, one advantage of the other two approaches is decreased
> memory consumption.
>

Agree. My understanding is that the other two approaches trade
flexibility for memory efficiency.

> Another advantage of Josh Triplett's pointer-unzip approach is that
> concurrent updates are (in theory, anyway) not blocked for as long
> by resize operations.  The other edge of this sword is that resizing
> is much slower, given the need to wait for many RCU grace periods.
>

I have carefully read Triplett's paper. The most impressive part is
that the algorithm gives up some degree of internal consistency (a
reader can see items that are really for other buckets) to get better
performance.

> Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
> and Shavit's split-order list is that all operations are non-blocking,
> which can be important on massively overloaded systems, such as one
> might find in cloud computing.
>

My understanding is that if Herbert Xu's algorithm leverages lock-free
linked lists, it could also become non-blocking. Is that right? The
biggest challenge may come from the resizing thread, which must
atomically delete an item from the old table and insert it into the
new table. This could be very interesting. Could you please elaborate
on the applications that might benefit from the non-blocking property
of such hash tables?
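
By "lock-free linked lists" I mean something in the spirit of this
CAS-based insert (a toy, singly-linked sketch that ignores the
deletion marks a full Michael-style list needs):

	struct lf_node {
		struct lf_node *next;
	};

	static void lf_insert(struct lf_node **head, struct lf_node *node)
	{
		struct lf_node *old;

		do {
			old = READ_ONCE(*head);	/* Snapshot the head. */
			node->next = old;	/* Link new node in front. */
		} while (!__atomic_compare_exchange_n(head, &old, node, 0,
				__ATOMIC_RELEASE, __ATOMIC_RELAXED));
	}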


Thanks,
--Junchang

> > === Question 4 ===
> > In the current implementation of hash_resize.c, the perftest_resize
> > could block an updater, and vice versa. It seems this is not what we
> > expected. Ideally, they should be allowed to run concurrently, or at
> > least the perftest_resize thread should have lower priority and
> > updaters should never be blocked by the perftest_resize thread. Is
> > that right? I'm very interested in helping improve. Please let me know
> > if you have any suggestions.
>
> In hash_resize.c, an updater is blocked only for the time required to
> redisposition a bucket.  This is a great improvement over blocking
> updaters for the full resize over all buckets.
>
> But yes, it is not hard to do better, for example, periodically dropping
> the old-table lock in hashtab_resize().  This requires a few careful
> adjustments, of course.  Can you tell me what these adjustments are?
>
> Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
> optimizing for the race with hashtab_resize() doesn't make a whole lot
> of sense.  Please see the patch below.  Thoughts?
>
>                                                         Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 737646a9c868d841b32199b52f5569668975953e
> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> Date:   Mon Jan 7 10:29:14 2019 -0800
>
>     datastruct/hash: Simplify hashtab_lookup()
>
>     Because resizing leaves the old hash table intact, and because lookups
>     are carried out within RCU read-side critical sections (which prevent
>     a second resizing operation from starting), there is no need for a
>     lookup to search anywhere but in the old hash table.  And in the common
>     case, there is no resize, so there is no new hash table.  Therefore,
>     eliminating the check for resizing speeds things up in the common
>     case.  In addition, this simplifies the code.
>
>     This commit therefore eliminates the ht_get_bucket() function,
>     renames the ht_get_bucket_single() function to ht_get_bucket(),
>     and modifies callers appropriately.
>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
>
> diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> index 29e05f907200..be4157959b83 100644
> --- a/CodeSamples/datastruct/hash/hash_resize.c
> +++ b/CodeSamples/datastruct/hash/hash_resize.c
> @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
>  //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
>  /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
>  static struct ht_bucket *                              //\lnlbl{single:b}
> -ht_get_bucket_single(struct ht *htp, void *key, long *b,
> -                     unsigned long *h)
> +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
>  {
>         unsigned long hash = htp->ht_gethash(key);
>
> @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
>                 *h = hash;                              //\lnlbl{single:h}
>         return &htp->ht_bkt[*b];                        //\lnlbl{single:return}
>  }                                                      //\lnlbl{single:e}
> -
> -/* Get hash bucket correesponding to key, accounting for resize. */
> -static struct ht_bucket *                              //\lnlbl{b}
> -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
> -{
> -       struct ht_bucket *htbp;
> -
> -       htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
> -                                                               //\fcvexclude
> -       if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {   //\lnlbl{resized}
> -               smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
> -               *htp = rcu_dereference((*htp)->ht_new); //\lnlbl{newtable}
> -               htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket}
> -       }
> -       if (i)                                          //\lnlbl{chk_i}
> -               *i = (*htp)->ht_idx;                    //\lnlbl{set_idx}
> -       return htbp;                                    //\lnlbl{return}
> -}                                                      //\lnlbl{e}
>  //\end{snippet}
>
>  /* Read-side lock/unlock functions. */
> @@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
>
>         rcu_read_lock();                                //\lnlbl{l:rcu_lock}
>         htp = rcu_dereference(htp_master->ht_cur);      //\lnlbl{l:refhashtbl}
> -       htbp = ht_get_bucket_single(htp, key, &b, &h);  //\lnlbl{l:refbucket}
> +       htbp = ht_get_bucket(htp, key, &b, &h);         //\lnlbl{l:refbucket}
>         spin_lock(&htbp->htb_lock);                     //\lnlbl{l:acq_bucket}
>         lsp->hbp[0] = htbp;                             //\lnlbl{l:lsp0b}
>         lsp->hls_idx[0] = htp->ht_idx;
> @@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
>                 return;                                 //\lnlbl{l:fastret1}
>         }
>         htp = rcu_dereference(htp->ht_new);             //\lnlbl{l:new_hashtbl}
> -       htbp = ht_get_bucket_single(htp, key, &b, &h);  //\lnlbl{l:get_newbkt}
> +       htbp = ht_get_bucket(htp, key, &b, &h);         //\lnlbl{l:get_newbkt}
>         spin_lock(&htbp->htb_lock);                     //\lnlbl{l:acq_newbkt}
>         lsp->hbp[1] = htbp;                             //\lnlbl{l:lsp1b}
>         lsp->hls_idx[1] = htp->ht_idx;
> @@ -223,16 +204,15 @@ struct ht_elem *                                  //\lnlbl{lkp:b}
>  hashtab_lookup(struct hashtab *htp_master, void *key)
>  {
>         long b;
> -       int i;
>         struct ht *htp;
>         struct ht_elem *htep;
>         struct ht_bucket *htbp;
>
>         htp = rcu_dereference(htp_master->ht_cur);      //\lnlbl{lkp:get_curtbl}
> -       htbp = ht_get_bucket(&htp, key, &b, &i);        //\lnlbl{lkp:get_curbkt}
> +       htbp = ht_get_bucket(htp, key, &b, NULL);       //\lnlbl{lkp:get_curbkt}
>         cds_list_for_each_entry_rcu(htep,               //\lnlbl{lkp:loop:b}
>                                     &htbp->htb_head,
> -                                   hte_next[i]) {
> +                                   hte_next[htp->ht_idx]) {
>                 if (htp->ht_cmp(htep, key))             //\lnlbl{lkp:match}
>                         return htep;                    //\lnlbl{lkp:ret_match}
>         }                                               //\lnlbl{lkp:loop:e}
> @@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master,
>                 htbp = &htp->ht_bkt[i];                 //\lnlbl{get_oldcur}
>                 spin_lock(&htbp->htb_lock);             //\lnlbl{acq_oldcur}
>                 cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b}
> -                       htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL);
> +                       htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL);
>                         spin_lock(&htbp_new->htb_lock);
>                         cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
>                         spin_unlock(&htbp_new->htb_lock);
> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> index 5c61bf5e2389..0152437c274e 100644
> --- a/datastruct/datastruct.tex
> +++ b/datastruct/datastruct.tex
> @@ -966,10 +966,8 @@ the old table.
>  \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
>  Bucket selection is shown in
>  Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
> -which shows \co{ht_get_bucket_single()} on
> -lines~\lnref{single:b}-\lnref{single:e} and
> -\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}.
> -The \co{ht_get_bucket_single()} function returns a reference to the bucket
> +which shows \co{ht_get_bucket()}.
> +This function returns a reference to the bucket
>  corresponding to the specified key in the specified hash table, without
>  making any allowances for resizing.
>  It also stores the bucket index corresponding to the key into the location
> @@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding
>  hash value corresponding to the key into the location
>  referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
>  Line~\lnref{single:return} then returns a reference to the corresponding bucket.
> -
> -The \co{ht_get_bucket()} function handles hash-table selection, invoking
> -\co{ht_get_bucket_single()} on
> -line~\lnref{call_single} to select the bucket
> -corresponding to the hash in the current
> -hash table, storing the hash value through parameter~\co{b}.
> -If line~\lnref{resized} determines that the table is being resized and that
> -line~\lnref{call_single}'s bucket has already been distributed across the new hash
> -table, then line~\lnref{newtable} selects the new hash table and
> -line~\lnref{newbucket}
> -selects the bucket corresponding to the hash in the new hash table,
> -again storing the hash value through parameter~\co{b}.
> -\end{lineref}
> -
> -\QuickQuiz{}
> -       The code in
> -       Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
> -       computes the hash twice!
> -       Why this blatant inefficiency?
> -\QuickQuizAnswer{
> -       The reason is that the old and new hash tables might have
> -       completely different hash functions, so that a hash computed
> -       for the old table might be completely irrelevant to the
> -       new table.
> -} \QuickQuizEnd
> -
> -\begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> -If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then
> -line~\lnref{set_idx} stores the pointer-set index for the selected hash table.
> -Finally, line~\lnref{return} returns a reference to the selected hash bucket.
>  \end{lineref}
>
>  \QuickQuiz{}
> @@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket.
>         functions described next.
>  } \QuickQuizEnd
>
> -This implementation of
> -\co{ht_get_bucket_single()} and \co{ht_get_bucket()}
> -permit lookups and modifications to run concurrently
> -with a resize operation.
> +This implementation of \co{ht_get_bucket()} permits lookups and
> +modifications to run concurrently with a resize operation.
>
>  \begin{listing}[tb]
>  \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv}
> @@ -1129,11 +1095,6 @@ hash lookups.
>  Line~\lnref{get_curtbl} fetches the current hash table and
>  line~\lnref{get_curbkt} obtains a reference
>  to the bucket corresponding to the specified key.
> -This bucket will be located in a new resized hash table when a
> -resize operation has progressed past the bucket in the old hash
> -table that contained the desired data element.
> -Note that line~\lnref{get_curbkt} also passes back the index that will be
> -used to select the correct set of pointers from the pair in each element.
>  The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
>  so that if line~\lnref{match}
>  detects a match,
> @@ -1144,22 +1105,17 @@ failure.
>  \end{lineref}
>
>  \QuickQuiz{}
> -       In the \co{hashtab_lookup()} function in
> -       Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions},
> -       the code carefully finds the right bucket in the new hash table
> -       if the element to be looked up has already been distributed
> -       by a concurrent resize operation.
> -       This seems wasteful for RCU-protected lookups.
> -       Why not just stick with the old hash table in this case?
> +       \begin{lineref}[ln:datastruct:hash_resize:access:lkp]
> +       What if execution reaches line~\lnref{loop:b}
> +       of \co{hashtab_lookup()} in
> +       Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
> +       just after this bucket has been resized.
> +       just after this bucket has been resized?
> +       \end{lineref}
>  \QuickQuizAnswer{
> -       Suppose that a resize operation begins and distributes half of
> -       the old table's buckets to the new table.
> -       Suppose further that a thread adds a new element that goes into
> -       one of the already-distributed buckets, and that this same thread
> -       now looks up this newly added element.
> -       If lookups unconditionally traversed only the old hash table,
> -       this thread would get a lookup failure for the element that it
> -       just added, which certainly sounds like a bug to me!
> +       No, it won't.
> +       Resizing into the new hash table leaves the old hash table
> +       intact, courtesy of the pointer pairs.
>  } \QuickQuizEnd
>
>  \begin{lineref}[ln:datastruct:hash_resize:access:add]
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Question regarding hash_resize
  2019-01-18 14:32   ` Junchang Wang
@ 2019-01-18 18:34     ` Paul E. McKenney
  0 siblings, 0 replies; 23+ messages in thread
From: Paul E. McKenney @ 2019-01-18 18:34 UTC (permalink / raw)
  To: Junchang Wang; +Cc: perfbook

On Fri, Jan 18, 2019 at 10:32:51PM +0800, Junchang Wang wrote:
> Hi Paul,
> 
> Sorry for the late reply. Please see below,
> 
> On Tue, Jan 8, 2019 at 2:32 AM Paul E. McKenney <paulmck@linux.ibm.com> wrote:
> >
> > On Mon, Jan 07, 2019 at 09:49:19PM +0800, Junchang Wang wrote:
> > > Hi all,
> > >
> > > I'm reading hash_resize recently, and have a few questions regarding
> > > this algorithm. Please take a look if you have time. Any suggestions
> > > are warmly welcomed.
> > >
> > > === Question 1 ===
> > > In hash_resize.c : hashtab_lock_mod
> > > 186         if (b > READ_ONCE(htp->ht_resize_cur)) {
> > > 187                 lsp->hbp[1] = NULL;
> > > 188                 return;
> > > 189         }
> > > 190         htp = rcu_dereference(htp->ht_new);
> > >
> > > It seems we are missing a barrier (e.g., smp_mb) in between lines 189
> > > and 190, because neither READ_ONCE() nor rcu_dereference() can prevent
> > > compilers and hardware from reordering the two unrelated variables,
> > > ht_resize_cur and ht_new. Is my understanding correct?
> >
> > Ah, but hashtab_lock_mod() is invoked within an RCU read-side critical
> > section and there is a synchronize_rcu() between the update to ->ht_new
> > and the updates to ->ht_resize_cur.  For more details on how this works,
> > please see https://lwn.net/Articles/573497/.
> 
> Thanks for the pointer to the article, which is very clear and helpful.

Glad it helped!

> > Of course, if you find a code path in which a call to hashtab_lock_mod()
> > is invoked outside of an RCU read-side critical section, that would be
> > a bug.  (Can you tell me an exception to this rule, that is, a case
> > where hashtab_lock_mod() could safely be invoked outside of an RCU
> > read-side critical section?)
> >
> > > === Question 2 ===
> > > In hash_resize.c, each time an updater wants to access a bucket, the
> > > updater must first acquire the bucket's lock (htb_lock), preventing
> > > other updaters accessing the same bucket concurrently. This approach
> > > is OK if the linked list of a bucket is relatively short, but for a
> > > larger system where linked lists are long enough and the
> > > perftest_resize thread is running simultaneously, it could become a
> > > potential performance bottleneck. One naive solution is to allow
> > > multiple updaters to access the same bucket, only if they don't
> > > operate on the same item of the list of this bucket. I wonder if there
> > > are any existing works or discussions on this topic?
> >
> > One approach is to use a hashed array of locks, and to hash a given
> > element's address to locate the lock to be used.  Please see
> > Section 7.1.1.5 ("Conditional Locking") and Section 7.1.1.6 ("Acquire
> > Needed Locks First"), including Quick Quiz 7.9, for additional details.
> >
> > Another approach is to use RCU to protect traversals, and locks within the
> > linked-list elements themselves.  These locks are conditionally acquired
> > (again, please see Section 7.1.1.5), and deadlock is avoided by acquiring
> > them in list order, and the tricks in Quick Quiz 7.9.
> >
> > Non-blocking synchronization can also be used, but it is often quite a
> > bit more complicated.  See for example the split-order list of Shalev
> > and Shavit, along with Desnoyers's RCU-protected extension in the
> > userspace RCU library.
> 
> I have carefully read the related sections of perfbook, Shalev and
> Shavit's split-ordered lists, Michael's lock-free linked list, and
> Desnoyers's implementation. What a wonderful journey. Thanks a lot for
> the pointers!

It is an interesting algorithm!

One limitation is the need to reverse the bits of the key.  This is
sometimes (but not always) a performance bottleneck.
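
For concreteness, the bit reversal in question is along these lines
(the naive loop; real implementations use table-driven or SWAR
tricks, which is where the performance considerations come in):

	static unsigned long bit_reverse(unsigned long v)
	{
		unsigned long r = 0;
		unsigned int i;

		for (i = 0; i < 8 * sizeof(v); i++) {
			r = (r << 1) | (v & 1);	/* Emit bits reversed. */
			v >>= 1;
		}
		return r;
	}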

> > But it is usually -way- better to just choose a good hash function and
> > to increase the number of buckets.  Which is of course one reason for
> > having resizable hash tables.  ;-)
> >
> > But the other techniques can be useful in more complex linked data
> > structures, such as graphs, where there is no reasonable way to
> > partition the data.  Nevertheless, many people choose to do the
> > partitioning anyway, especially on distributed systems.
> >
> > > === Question 3 ===
> > > Chapter Data Structures also discusses other resizable hash tables,
> > > namely "Resizable, scalable, concurrent hash tables via relativistic
> > > programming" from Josh Triplett, which can save memory footprint by
> > > using a single pair of pointers. But my understanding is that
> > > perftest_resize.c is unique in that it allows you to rebuild the hash
> > > table by utilizing a different hash function, which could be very
> > > useful in practice (e.g., to prevent DDoS attack). Other solutions do
> > > not share this property. Is my understanding correct? Did I miss any
> > > discussions on this topic in perfbook?
> >
> > Indeed, to the best of my knowledge, Herbert Xu's pointer-pair approach
> > (which I use in hash_resize.c) is the only one allowing arbitrary changes
> > to hash functions.  I expect that this advantage will become increasingly
> > important as security issues become more challenging.  Furthermore, I
> > suspect that the pointer-pair approach is faster and more scalable.
> > It is certainly simpler.
> >
> > On the other hand, one advantage of the other two approaches is decreased
> > memory consumption.
> 
> Agreed. My understanding is that the other two approaches trade
> flexibility for memory efficiency.
> 
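
In case it helps onlookers, the flexibility comes from each element
carrying one set of list pointers per table, so that the old and new
tables can hash the same element completely differently.  A minimal
sketch of the idea (simplified, not the exact declarations from
hash_resize.c):

	struct ht_elem {
		struct rcu_head rh;
		struct cds_list_head hte_next[2]; /* One set of links per table. */
	};

Each table records which of the two pointer sets it uses (its ->ht_idx
field), which is why the patch below indexes hte_next[] with
htp->ht_idx during lookups and with !idx during resizing.
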
> > Another advantage of Josh Triplett's pointer-unzip approach is that
> > concurrent updates are (in theory, anyway) not blocked for as long
> > by resize operations.  The other edge of this sword is that resizing
> > is much slower, given the need to wait for many RCU grace periods.
> 
> I have carefully read Triplett's paper. The most impressive part is
> that the algorithm gives up some degree of internal consistency (a
> reader can see items that really belong to other buckets) to get
> better performance.

That tradeoff is a common theme running through almost all RCU-based
algorithms.

> > Another advantage of Mathieu Desnoyers's RCUified variant of Shalev
> > and Shavit's split-order list is that all operations are non-blocking,
> > which can be important on massively overloaded systems, such as one
> > might find in cloud computing.
> 
> My understanding is that if Herbert Xu's algorithm leverages lock-free
> linked lists, it could also become non-blocking. Is that right? The
> biggest challenge may come from the resizing thread, which must
> atomically delete an item from the old table and insert it into the
> new table. This could be very interesting. Could you please elaborate
> on the kinds of applications that might benefit from the non-blocking
> property of such hash tables?

Actually, there is no need for the resizing thread to remove the element
from the old list.  (I was confused on that point myself earlier in
this thread.)  So you only need to atomically add and atomically delete.
There is no need to atomically move.
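
In pointer-pair terms, "moving" an element during a resize is just an
add through the other set of pointers, leaving the old table's list
fully intact for concurrent readers.  The resize loop in the patch
below already shows the whole trick:

	/* Excerpt modeled on the resize loop below; comments added. */
	cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) {
		htbp_new = ht_get_bucket(htp_new,
					 htp_new->ht_getkey(htep), &b, NULL);
		spin_lock(&htbp_new->htb_lock);
		/* Add via the unused pointer set; no removal needed. */
		cds_list_add_rcu(&htep->hte_next[!idx],
				 &htbp_new->htb_head);
		spin_unlock(&htbp_new->htb_lock);
	}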

							Thanx, Paul

> Thanks,
> --Junchang
> 
> > > === Question 4 ===
> > > In the current implementation of hash_resize.c, the perftest_resize
> > > could block an updater, and vice versa. It seems this is not what we
> > > expected. Ideally, they should be allowed to run concurrently, or at
> > > least the perftest_resize thread should have lower priority and
> > > updaters should never be blocked by the perftest_resize thread. Is
> > > that right? I'm very interested in helping improve. Please let me know
> > > if you have any suggestions.
> >
> > In hash_resize.c, an updater is blocked only for the time required to
> > redisposition a bucket.  This is a great improvement over blocking
> > updaters for the full resize over all buckets.
> >
> > But yes, it is not hard to do better, for example, periodically dropping
> > the old-table lock in hashtab_resize().  This requires a few careful
> > adjustments, of course.  Can you tell me what these adjustments are?
> >
> > Hmmm...  I could simplify hashtab_lookup(), couldn't I?  After all,
> > optimizing for the race with hashtab_resize() doesn't make a whole lot
> > of sense.  Please see the patch below.  Thoughts?
> >
> >                                                         Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit 737646a9c868d841b32199b52f5569668975953e
> > Author: Paul E. McKenney <paulmck@linux.ibm.com>
> > Date:   Mon Jan 7 10:29:14 2019 -0800
> >
> >     datastruct/hash: Simplify hashtab_lookup()
> >
> >     Because resizing leaves the old hash table intact, and because lookups
> >     are carried out within RCU read-side critical sections (which prevent
> >     a second resizing operation from starting), there is no need for a
> >     lookup to search anywhere but in the old hash table.  And in the common
> >     case, there is no resize, so there is no new hash table.  Therefore,
> >     eliminating the check for resizing speeds things up in the common
> >     case.  In addition, this simplifies the code.
> >
> >     This commit therefore eliminates the ht_get_bucket() function,
> >     renames the ht_get_bucket_single() function to ht_get_bucket(),
> >     and modifies callers appropriately.
> >
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
> >
> > diff --git a/CodeSamples/datastruct/hash/hash_resize.c b/CodeSamples/datastruct/hash/hash_resize.c
> > index 29e05f907200..be4157959b83 100644
> > --- a/CodeSamples/datastruct/hash/hash_resize.c
> > +++ b/CodeSamples/datastruct/hash/hash_resize.c
> > @@ -124,8 +124,7 @@ void hashtab_free(struct hashtab *htp_master)
> >  //\begin{snippet}[labelbase=ln:datastruct:hash_resize:get_bucket,commandchars=\\\@\$]
> >  /* Get hash bucket corresponding to key, ignoring the possibility of resize. */
> >  static struct ht_bucket *                              //\lnlbl{single:b}
> > -ht_get_bucket_single(struct ht *htp, void *key, long *b,
> > -                     unsigned long *h)
> > +ht_get_bucket(struct ht *htp, void *key, long *b, unsigned long *h)
> >  {
> >         unsigned long hash = htp->ht_gethash(key);
> >
> > @@ -134,24 +133,6 @@ ht_get_bucket_single(struct ht *htp, void *key, long *b,
> >                 *h = hash;                              //\lnlbl{single:h}
> >         return &htp->ht_bkt[*b];                        //\lnlbl{single:return}
> >  }                                                      //\lnlbl{single:e}
> > -
> > -/* Get hash bucket correesponding to key, accounting for resize. */
> > -static struct ht_bucket *                              //\lnlbl{b}
> > -ht_get_bucket(struct ht **htp, void *key, long *b, int *i)
> > -{
> > -       struct ht_bucket *htbp;
> > -
> > -       htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{call_single}
> > -                                                               //\fcvexclude
> > -       if (*b <= READ_ONCE((*htp)->ht_resize_cur)) {   //\lnlbl{resized}
> > -               smp_mb(); /* order ->ht_resize_cur before ->ht_new. */
> > -               *htp = rcu_dereference((*htp)->ht_new); //\lnlbl{newtable}
> > -               htbp = ht_get_bucket_single(*htp, key, b, NULL); //\lnlbl{newbucket}
> > -       }
> > -       if (i)                                          //\lnlbl{chk_i}
> > -               *i = (*htp)->ht_idx;                    //\lnlbl{set_idx}
> > -       return htbp;                                    //\lnlbl{return}
> > -}                                                      //\lnlbl{e}
> >  //\end{snippet}
> >
> >  /* Read-side lock/unlock functions. */
> > @@ -178,7 +159,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
> >
> >         rcu_read_lock();                                //\lnlbl{l:rcu_lock}
> >         htp = rcu_dereference(htp_master->ht_cur);      //\lnlbl{l:refhashtbl}
> > -       htbp = ht_get_bucket_single(htp, key, &b, &h);  //\lnlbl{l:refbucket}
> > +       htbp = ht_get_bucket(htp, key, &b, &h);         //\lnlbl{l:refbucket}
> >         spin_lock(&htbp->htb_lock);                     //\lnlbl{l:acq_bucket}
> >         lsp->hbp[0] = htbp;                             //\lnlbl{l:lsp0b}
> >         lsp->hls_idx[0] = htp->ht_idx;
> > @@ -188,7 +169,7 @@ hashtab_lock_mod(struct hashtab *htp_master, void *key,
> >                 return;                                 //\lnlbl{l:fastret1}
> >         }
> >         htp = rcu_dereference(htp->ht_new);             //\lnlbl{l:new_hashtbl}
> > -       htbp = ht_get_bucket_single(htp, key, &b, &h);  //\lnlbl{l:get_newbkt}
> > +       htbp = ht_get_bucket(htp, key, &b, &h);         //\lnlbl{l:get_newbkt}
> >         spin_lock(&htbp->htb_lock);                     //\lnlbl{l:acq_newbkt}
> >         lsp->hbp[1] = htbp;                             //\lnlbl{l:lsp1b}
> >         lsp->hls_idx[1] = htp->ht_idx;
> > @@ -223,16 +204,15 @@ struct ht_elem *                                  //\lnlbl{lkp:b}
> >  hashtab_lookup(struct hashtab *htp_master, void *key)
> >  {
> >         long b;
> > -       int i;
> >         struct ht *htp;
> >         struct ht_elem *htep;
> >         struct ht_bucket *htbp;
> >
> >         htp = rcu_dereference(htp_master->ht_cur);      //\lnlbl{lkp:get_curtbl}
> > -       htbp = ht_get_bucket(&htp, key, &b, &i);        //\lnlbl{lkp:get_curbkt}
> > +       htbp = ht_get_bucket(htp, key, &b, NULL);       //\lnlbl{lkp:get_curbkt}
> >         cds_list_for_each_entry_rcu(htep,               //\lnlbl{lkp:loop:b}
> >                                     &htbp->htb_head,
> > -                                   hte_next[i]) {
> > +                                   hte_next[htp->ht_idx]) {
> >                 if (htp->ht_cmp(htep, key))             //\lnlbl{lkp:match}
> >                         return htep;                    //\lnlbl{lkp:ret_match}
> >         }                                               //\lnlbl{lkp:loop:e}
> > @@ -303,7 +283,7 @@ int hashtab_resize(struct hashtab *htp_master,
> >                 htbp = &htp->ht_bkt[i];                 //\lnlbl{get_oldcur}
> >                 spin_lock(&htbp->htb_lock);             //\lnlbl{acq_oldcur}
> >                 cds_list_for_each_entry(htep, &htbp->htb_head, hte_next[idx]) { //\lnlbl{loop_list:b}
> > -                       htbp_new = ht_get_bucket_single(htp_new, htp_new->ht_getkey(htep), &b, NULL);
> > +                       htbp_new = ht_get_bucket(htp_new, htp_new->ht_getkey(htep), &b, NULL);
> >                         spin_lock(&htbp_new->htb_lock);
> >                         cds_list_add_rcu(&htep->hte_next[!idx], &htbp_new->htb_head);
> >                         spin_unlock(&htbp_new->htb_lock);
> > diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> > index 5c61bf5e2389..0152437c274e 100644
> > --- a/datastruct/datastruct.tex
> > +++ b/datastruct/datastruct.tex
> > @@ -966,10 +966,8 @@ the old table.
> >  \begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> >  Bucket selection is shown in
> >  Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection},
> > -which shows \co{ht_get_bucket_single()} on
> > -lines~\lnref{single:b}-\lnref{single:e} and
> > -\co{ht_get_bucket()} on lines~\lnref{b}-\lnref{e}.
> > -The \co{ht_get_bucket_single()} function returns a reference to the bucket
> > +which shows \co{ht_get_bucket()}.
> > +This function returns a reference to the bucket
> >  corresponding to the specified key in the specified hash table, without
> >  making any allowances for resizing.
> >  It also stores the bucket index corresponding to the key into the location
> > @@ -978,36 +976,6 @@ line~\lnref{single:gethash}, and the corresponding
> >  hash value corresponding to the key into the location
> >  referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}.
> >  Line~\lnref{single:return} then returns a reference to the corresponding bucket.
> > -
> > -The \co{ht_get_bucket()} function handles hash-table selection, invoking
> > -\co{ht_get_bucket_single()} on
> > -line~\lnref{call_single} to select the bucket
> > -corresponding to the hash in the current
> > -hash table, storing the hash value through parameter~\co{b}.
> > -If line~\lnref{resized} determines that the table is being resized and that
> > -line~\lnref{call_single}'s bucket has already been distributed across the new hash
> > -table, then line~\lnref{newtable} selects the new hash table and
> > -line~\lnref{newbucket}
> > -selects the bucket corresponding to the hash in the new hash table,
> > -again storing the hash value through parameter~\co{b}.
> > -\end{lineref}
> > -
> > -\QuickQuiz{}
> > -       The code in
> > -       Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}
> > -       computes the hash twice!
> > -       Why this blatant inefficiency?
> > -\QuickQuizAnswer{
> > -       The reason is that the old and new hash tables might have
> > -       completely different hash functions, so that a hash computed
> > -       for the old table might be completely irrelevant to the
> > -       new table.
> > -} \QuickQuizEnd
> > -
> > -\begin{lineref}[ln:datastruct:hash_resize:get_bucket]
> > -If line~\lnref{chk_i} finds that parameter~\co{i} is non-\co{NULL}, then
> > -line~\lnref{set_idx} stores the pointer-set index for the selected hash table.
> > -Finally, line~\lnref{return} returns a reference to the selected hash bucket.
> >  \end{lineref}
> >
> >  \QuickQuiz{}
> > @@ -1021,10 +989,8 @@ Finally, line~\lnref{return} returns a reference to the selected hash bucket.
> >         functions described next.
> >  } \QuickQuizEnd
> >
> > -This implementation of
> > -\co{ht_get_bucket_single()} and \co{ht_get_bucket()}
> > -permit lookups and modifications to run concurrently
> > -with a resize operation.
> > +This implementation of \co{ht_get_bucket()} permits lookups and
> > +modifications to run concurrently with a resize operation.
> >
> >  \begin{listing}[tb]
> >  \input{CodeSamples/datastruct/hash/hash_resize@lock_unlock_mod.fcv}
> > @@ -1129,11 +1095,6 @@ hash lookups.
> >  Line~\lnref{get_curtbl} fetches the current hash table and
> >  line~\lnref{get_curbkt} obtains a reference
> >  to the bucket corresponding to the specified key.
> > -This bucket will be located in a new resized hash table when a
> > -resize operation has progressed past the bucket in the old hash
> > -table that contained the desired data element.
> > -Note that line~\lnref{get_curbkt} also passes back the index that will be
> > -used to select the correct set of pointers from the pair in each element.
> >  The loop spanning lines~\lnref{loop:b}-\lnref{loop:e} searches the bucket,
> >  so that if line~\lnref{match}
> >  detects a match,
> > @@ -1144,22 +1105,17 @@ failure.
> >  \end{lineref}
> >
> >  \QuickQuiz{}
> > -       In the \co{hashtab_lookup()} function in
> > -       Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions},
> > -       the code carefully finds the right bucket in the new hash table
> > -       if the element to be looked up has already been distributed
> > -       by a concurrent resize operation.
> > -       This seems wasteful for RCU-protected lookups.
> > -       Why not just stick with the old hash table in this case?
> > +       \begin{lineref}[ln:datastruct:hash_resize:access:lkp]
> > +       What if execution reaches line~\lnref{loop:b}
> > +       of \co{hashtab_lookup()} in
> > +       Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}
> > +       just after this bucket has been resized?
> > +       Won't that result in lookup failures?
> > +       \end{lineref}
> >  \QuickQuizAnswer{
> > -       Suppose that a resize operation begins and distributes half of
> > -       the old table's buckets to the new table.
> > -       Suppose further that a thread adds a new element that goes into
> > -       one of the already-distributed buckets, and that this same thread
> > -       now looks up this newly added element.
> > -       If lookups unconditionally traversed only the old hash table,
> > -       this thread would get a lookup failure for the element that it
> > -       just added, which certainly sounds like a bug to me!
> > +       No, it won't.
> > +       Resizing into the new hash table leaves the old hash table
> > +       intact, courtesy of the pointer pairs.
> >  } \QuickQuizEnd
> >
> >  \begin{lineref}[ln:datastruct:hash_resize:access:add]
> >
> 

