All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Roman Gushchin <klamm@yandex-team.ru>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Dipankar Sarma <dipankar@in.ibm.com>,
	zhmurov@yandex-team.ru, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	James Morris <jmorris@namei.org>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	Patrick McHardy <kaber@trash.net>
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
Date: Sat, 25 May 2013 04:37:15 -0700	[thread overview]
Message-ID: <20130525113715.GA3795@linux.vnet.ibm.com> (raw)
In-Reply-To: <519D19DA.50400@yandex-team.ru>

On Wed, May 22, 2013 at 11:17:46PM +0400, Roman Gushchin wrote:
> On 22.05.2013 21:45, Paul E. McKenney wrote:
> >On Wed, May 22, 2013 at 05:07:07PM +0400, Roman Gushchin wrote:
> >>On 22.05.2013 16:30, Eric Dumazet wrote:
> >>>On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote:
> >>>
> >>>>+/*
> >>>>+ * Same as ACCESS_ONCE(), but used for accessing field of a structure.
> >>>>+ * The main goal is preventing compiler to store &ptr->field in a register.
> >>>
> >>>But &ptr->field is a constant during the whole duration of
> >>>udp4_lib_lookup2() and could be in a register, in my case field is at
> >>>offset 0, and ptr is a parameter (so could be in a 'register')
> >>>
> >>>The bug you found is that compiler caches the indirection  (ptr->field)
> >>>into a register, not that compiler stores &ptr->field into a register.
> >>>
> >>>>+ */
> >>>>+#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
> >>>>+
> >>>
> >>>Here we force the compiler to consider ptr as volatile, but semantically
> >>>it is not required in rcu_dereference(ptr->field)
> >>
> >>Actually, we need to mark an "address of a place" where the field value is
> >>located as volatile before dereferencing. I have no idea how to do it in another way,
> >>except using multiple casts and offsetof's, but, IMHO, it will be even more complex:
> >>	ACCESS_ONCE(typeof(&ptr->field)((char*)ptr + offsetof(typeof(*ptr), field)))
> 
> Probably I miss one more ACCESS_ONCE() in this expression. Should be something like
> ACCESS_ONCE(typeof(&ptr->field)((char*)ACCESS_ONCE(ptr) + offsetof(typeof(*ptr), field))) .
> But this is not a working example, just an illustration against using ACCESS_ONCE() here.
> 
> >Why not just ACCESS_ONCE(ptr->field)?  Or if it is the thing that
> >ptr->field points to that is subject to change, ACCESS_ONCE(*ptr->field)?
> >
> >Or rcu_dereference(ptr->field), as appropriate?
> 
> It's not enough. So is the code now, and it doesn't work as expected.
> You can't write (ptr->field) without ptr being marked as a volatile pointer.
> 
> I try to explain the problem once more from scratch:
> 
> 1) We have the following pseudo-code (based on udp4_lib_lookup2()):
> 
> static void some_func(struct hlist_nulls_head *head) {
> 	struct hlist_nulls_node *node;
> 
> begin:
> 	for (node = rcu_dereference(head->first);
> 		!is_nulls(node) & ...;
> 		node = rcu_dereference(node->next)) {
> 		<...>
> 	}
> 
> 	if (restart_condition)
> 		goto begin;
> 
> 2) A problem occurs when restart_condition is true and we jump to the begin label.
> We do not recalculate (head + offsetof(head, first)) address, we just dereference
> again the OLD (head->first) pointer. So, we get a node, that WAS the first node in a
> previous time instead of current first node. That node can be dead, or, for instance,
> can be a head of another chain.

OK, this is what I was referring to when I said that the RCU list macros
assume that the list header is static (or equivalently, appropriately
protected).

With some_func() as written above, you would need to return some sort
of failure indication from some_func(), and have the caller refetch head.
Otherwise, as far as gcc is concerned, the value of the parameter head
is constant throughout some_func().

> It is correct from gcc's point of view, since it doesn't expect the head pointer
> to change, and this address is just (head + constant offset).

Agreed.

How does the caller calculate the value to pass in through the argument "head"?

> 3) If we start with a wrong first element, restart_condition can be always true, so,
> we get an infinite loop. In any case, we do not scan the whole (socket) chain,
> as expected.

Agreed.

> This scenario is absolutely real and causes our DNS servers to hang
> sometimes under high load.

I completely believe that such a hang could happen.

> Note, that there are no problems if we don't restart a loop. Also, it is highly
> dependent on gcc options, and the code in the body of the loop. Even small changes
> in the code (like adding debugging print) preventing reproducing of the issue.

Again, I believe that your retry logic needs to extend back into the
calling function for your some_func() example above.

							Thanx, Paul


  reply	other threads:[~2013-05-26  8:52 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-21  9:05 [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro Roman Gushchin
2013-05-21 10:40 ` David Laight
2013-05-21 11:55   ` Roman Gushchin
2013-05-21 13:42     ` David Laight
2013-05-21 12:09 ` Paul E. McKenney
2013-05-21 12:46   ` Roman Gushchin
2013-05-21 12:58     ` Paul E. McKenney
2013-05-21 13:37   ` Eric Dumazet
2013-05-21 13:44   ` Eric Dumazet
2013-05-21 14:47     ` Roman Gushchin
2013-05-21 15:16       ` Eric Dumazet
2013-05-21 15:51         ` Roman Gushchin
2013-05-21 15:38       ` Eric Dumazet
2013-05-21 15:51         ` Roman Gushchin
2013-05-21 18:12         ` [PATCH v2] " Roman Gushchin
2013-05-22  2:01           ` Eric Dumazet
2013-05-22  5:49             ` Eric Dumazet
2013-05-22 11:58               ` Roman Gushchin
2013-05-22 12:30                 ` Eric Dumazet
2013-05-22 13:07                   ` Roman Gushchin
2013-05-22 17:45                     ` Paul E. McKenney
2013-05-22 19:17                       ` Roman Gushchin
2013-05-25 11:37                         ` Paul E. McKenney [this message]
2013-05-27 11:34                           ` Roman Gushchin
2013-05-27 17:55                           ` Roman Gushchin
2013-05-28  0:12                             ` Eric Dumazet
2013-05-28  9:10                               ` Roman Gushchin
2013-05-29  0:34                                 ` Eric Dumazet
2013-05-29  1:31                                   ` Paul E. McKenney
2013-05-29  5:08                                     ` Eric Dumazet
2013-05-29 10:09                                       ` Roman Gushchin
2013-05-29 19:06                                         ` Eric Dumazet
2013-05-30  8:25                                           ` Roman Gushchin
2013-06-02 23:31                                             ` Eric Dumazet
2013-06-03  2:58                                               ` David Miller
2013-06-03  3:12                                                 ` Eric Dumazet
2013-06-03  3:27                                                   ` David Miller
2013-06-03  3:42                                                     ` Paul E. McKenney
2013-06-03  3:47                                                       ` Eric Dumazet
2013-06-03  3:49                                                       ` David Miller
2013-06-03  6:05                                                         ` Paul E. McKenney
2013-06-10 18:29                                                         ` Boris B. Zhmurov
2013-06-10 18:51                                                           ` Eric Dumazet
2013-06-03  3:48                                                   ` Paul E. McKenney
2013-06-03  3:42                                                 ` Paul E. McKenney
2013-05-29  9:17                                   ` Roman Gushchin
2013-05-29  1:19                                 ` Paul E. McKenney
2013-05-22 13:27                   ` David Laight
2013-05-22 13:27                     ` David Laight
2013-05-22 13:36                     ` Eric Dumazet
2013-05-22 14:23                       ` David Laight
2013-05-22 14:23                         ` David Laight
2013-05-22 13:55                     ` Roman Gushchin
2013-05-22  9:58             ` Paul E. McKenney
2013-05-22 12:28               ` Eric Dumazet
2013-05-22 13:00                 ` Paul E. McKenney
2013-05-22 14:16                   ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130525113715.GA3795@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=davem@davemloft.net \
    --cc=dipankar@in.ibm.com \
    --cc=eric.dumazet@gmail.com \
    --cc=jmorris@namei.org \
    --cc=kaber@trash.net \
    --cc=klamm@yandex-team.ru \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=yoshfuji@linux-ipv6.org \
    --cc=zhmurov@yandex-team.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.