All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>,
	netdev@vger.kernel.org, Stephen Hemminger <shemminger@vyatta.com>
Subject: Re: [PATCH 6/13] bridge: Add core IGMP snooping support
Date: Sat, 6 Mar 2010 11:00:00 -0800	[thread overview]
Message-ID: <20100306190000.GA24445@linux.vnet.ibm.com> (raw)
In-Reply-To: <20100306151933.GD6812@linux.vnet.ibm.com>

On Sat, Mar 06, 2010 at 07:19:33AM -0800, Paul E. McKenney wrote:
> On Sat, Mar 06, 2010 at 02:56:55PM +0800, Herbert Xu wrote:
> > On Fri, Mar 05, 2010 at 09:06:56PM -0800, Paul E. McKenney wrote:
> > > 
> > > Agreed, but the callbacks registered by the call_rcu_bh() might run
> > > at any time, possibly quite some time after the synchronize_rcu_bh()
> > > completes.  For example, the last call_rcu_bh() might register on
> > > one CPU, and the synchronize_rcu_bh() on another CPU.  Then there
> > > is no guarantee that the call_rcu_bh()'s callback will execute before
> > > the synchronize_rcu_bh() returns.
> > > 
> > > In contrast, rcu_barrier_bh() is guaranteed not to return until all
> > > pending RCU-bh callbacks have executed.
> > 
> > You're absolutely right.  I'll send a patch to fix this.
> > 
> > Incidentally, does rcu_barrier imply rcu_barrier_bh? What about
> > synchronize_rcu and synchronize_rcu_bh? The reason I'm asking is
> > that we use a mixture of rcu_read_lock_bh and rcu_read_lock all
> > over the place but only ever use rcu_barrier and synchronize_rcu.
> 
> Hmmm...  rcu_barrier() definitely does -not- imply rcu_barrier_bh(),
> because there are separate sets of callbacks whose execution can
> be throttled separately.  So, while you would expect RCU-bh grace
> periods to complete more quickly, if there was a large number of
> RCU-bh callbacks on a given CPU but very few RCU callbacks, it might
> well take longer for the RCU-bh callbacks to be invoked.
> 
> With TREE_PREEMPT_RCU, if there were no RCU readers but one long-running
> RCU-bh reader, then synchronize_rcu_bh() could return before
> synchronize_rcu() does.
> 
> The simple approach would be to do something like:
> 
> 	synchronize_rcu();
> 	synchronize_rcu_bh();
> 
> on the one hand, and:
> 
> 	rcu_barrier();
> 	rcu_barrier_bh();
> 
> on the other.  However, this is not so good for update-side latency.
> 
> Perhaps we need a primitive that waits for both RCU and RCU-bh in
> parallel?  This is pretty easy for synchronize_rcu() and
> synchronize_rcu_bh(), and probably not too hard for rcu_barrier()
> and rcu_barrier_bh().
> 
> Hmmm...  Do we have the same issue with call_rcu() and call_rcu_bh()?

But before I get too excited...

You really are talking about code like the following, correct?

	rcu_read_lock();
	p = rcu_dereference(global_p);
	do_something_with(p);
	rcu_read_unlock();

	. . .

	rcu_read_lock_bh();
	p = rcu_dereference(global_p);
	do_something_else_with(p);
	rcu_read_unlock_bh();

	. . . 

	spin_lock(&my_lock);
	p = global_p;
	rcu_assign_pointer(global_p, NULL);
	synchronize_rcu();  /* BUG -- also need synchronize_rcu_bh(). */
	kfree(p);
	spin_unlock(&my_lock);

In other words, different readers traversing the same data structure
under different flavors of RCU protection, but then using only one
flavor of RCU grace period during the update?

						Thanx, Paul

> > > > I understand.  However, AFAICS whatever it is that we are destroying
> > > > is taken off the reader's visible data structure before call_rcu_bh.
> > > > Do you have a particular case in mind where this is not the case?
> > > 
> > > I might simply have missed the operation that removed reader
> > > visibility, looking again...
> > > 
> > > Ah, I see it.  The "br->mdb = NULL" in br_multicast_stop() makes
> > > it impossible for the readers to get to any of the data.  Right?
> > 
> > Yes.  The read-side will see it and get nothing, while all write-side
> > paths will see that netif_running is false and exit.
> > 
> > > > > The br_multicast_del_pg() looks to need rcu_read_lock_bh() and
> > > > > rcu_read_unlock_bh() around its loop, if I understand the pointer-walking
> > > > > scheme correctly.
> > > > 
> > > > Any function that modifies the data structure is done under the
> > > > multicast_lock, including br_multicast_del_pg.
> > > 
> > > But spin_lock() does not take the place of rcu_read_lock_bh().
> > > And so, in theory, the RCU-bh grace period could complete between
> > > the time that br_multicast_del_pg() does its call_rcu_bh() and the
> > > "*pp = p->next;" at the top of the next loop iteration.  If so,
> > > then br_multicast_free_pg()'s kfree() will possibly have clobbered
> > > "p->next".  Low probability, yes, but a long-running interrupt
> > > could do the trick.
> > > 
> > > Or is there something I am missing that is preventing an RCU-bh
> > > grace period from completing near the bottom of br_multicast_del_pg()'s
> > > "for" loop?
> > 
> > Well all the locks are taken with BH disabled, this should prevent
> > this problem, no?
> > 
> > > > The read-side is the data path (non-IGMP multicast packets).  The
> > > > sole entry point is br_mdb_get().
> > > 
> > > Hmmm...  So the caller is responsible for rcu_read_lock_bh()?
> > 
> > Yes, all data paths through the bridge operate with BH disabled.
> > 
> > > Shouldn't the br_mdb_get() code path be using hlist_for_each_entry_rcu()
> > > in __br_mdb_ip_get(), then?  Or is something else going on here?
> > 
> > Indeed it should, I'll fix this up too.
> > 
> > Thanks for reviewing Paul!
> > -- 
> > Visit Openswan at http://www.openswan.org/
> > Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> > Home Page: http://gondor.apana.org.au/~herbert/
> > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

  reply	other threads:[~2010-03-06 19:00 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
2010-02-26 15:35 ` [PATCH 1/13] bridge: Do br_pass_frame_up after other ports Herbert Xu
2010-02-26 15:35 ` [PATCH 2/13] bridge: Allow tail-call on br_pass_frame_up Herbert Xu
2010-02-27 11:14   ` David Miller
2010-02-27 15:36     ` Herbert Xu
2010-02-26 15:35 ` [PATCH 3/13] bridge: Avoid unnecessary clone on forward path Herbert Xu
2010-02-26 15:35 ` [PATCH 4/13] bridge: Use BR_INPUT_SKB_CB on xmit path Herbert Xu
2010-02-26 15:35 ` [PATCH 5/13] bridge: Split may_deliver/deliver_clone out of br_flood Herbert Xu
2010-02-26 15:35 ` [PATCH 6/13] bridge: Add core IGMP snooping support Herbert Xu
2010-02-26 15:35 ` [PATCH 7/13] bridge: Add multicast forwarding functions Herbert Xu
2010-02-26 15:35 ` [PATCH 8/13] bridge: Add multicast start/stop hooks Herbert Xu
2010-02-26 15:35 ` [PATCH 9/13] bridge: Add multicast data-path hooks Herbert Xu
2010-02-26 15:35 ` [PATCH 10/13] bridge: Add multicast_router sysfs entries Herbert Xu
2010-02-27  0:42   ` Stephen Hemminger
2010-02-27 11:29     ` David Miller
2010-02-27 15:53       ` Herbert Xu
2010-03-09 12:25     ` Herbert Xu
2010-03-09 12:26       ` Herbert Xu
2010-02-26 15:35 ` [PATCH 11/13] bridge: Add multicast_snooping sysfs toggle Herbert Xu
2010-02-26 15:35 ` [PATCH 12/13] bridge: Add hash elasticity/max sysfs entries Herbert Xu
2010-02-26 15:35 ` [PATCH 13/13] bridge: Add multicast count/interval " Herbert Xu
2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
2010-02-28  5:41   ` [PATCH 1/13] bridge: Do br_pass_frame_up after other ports Herbert Xu
2010-02-28  5:41   ` [PATCH 2/13] bridge: Allow tail-call on br_pass_frame_up Herbert Xu
2010-02-28  5:41   ` [PATCH 3/13] bridge: Avoid unnecessary clone on forward path Herbert Xu
2010-02-28  5:41   ` [PATCH 4/13] bridge: Use BR_INPUT_SKB_CB on xmit path Herbert Xu
2010-02-28  5:41   ` [PATCH 5/13] bridge: Split may_deliver/deliver_clone out of br_flood Herbert Xu
2010-02-28  5:41   ` [PATCH 6/13] bridge: Add core IGMP snooping support Herbert Xu
2010-03-05 23:43     ` Paul E. McKenney
2010-03-06  1:17       ` Herbert Xu
2010-03-06  5:06         ` Paul E. McKenney
2010-03-06  6:56           ` Herbert Xu
2010-03-06  7:03             ` Herbert Xu
2010-03-07 23:31               ` David Miller
2010-03-06  7:07             ` Herbert Xu
2010-03-07 23:31               ` David Miller
2010-03-06 15:00             ` Paul E. McKenney
2010-03-06 15:19             ` Paul E. McKenney
2010-03-06 19:00               ` Paul E. McKenney [this message]
2010-03-07  2:45                 ` Herbert Xu
2010-03-07  3:11                   ` Paul E. McKenney
2010-03-08 18:50                     ` Arnd Bergmann
2010-03-09  3:15                       ` Paul E. McKenney
2010-03-11 18:49                       ` Arnd Bergmann
2010-03-14 23:01                         ` Paul E. McKenney
2010-03-09 21:12                     ` Arnd Bergmann
2010-03-10  2:14                       ` Paul E. McKenney
2010-03-10  9:41                         ` Arnd Bergmann
2010-03-10 10:39                           ` Eric Dumazet
2010-03-10 10:49                             ` Herbert Xu
2010-03-10 13:13                               ` Paul E. McKenney
2010-03-10 14:07                                 ` Herbert Xu
2010-03-10 16:26                                   ` Paul E. McKenney
2010-03-10 16:35                                     ` David Miller
2010-03-10 17:56                                       ` Arnd Bergmann
2010-03-10 21:25                                         ` Paul E. McKenney
2010-03-10 13:27                               ` Arnd Bergmann
2010-03-10 13:39                               ` Arnd Bergmann
2010-03-10 13:19                           ` Paul E. McKenney
2010-03-10 13:30                             ` Arnd Bergmann
2010-03-10 13:57                               ` Paul E. McKenney
2010-02-28  5:41   ` [PATCH 7/13] bridge: Add multicast forwarding functions Herbert Xu
2010-02-28  5:41   ` [PATCH 8/13] bridge: Add multicast start/stop hooks Herbert Xu
2010-02-28  5:41   ` [PATCH 9/13] bridge: Add multicast data-path hooks Herbert Xu
2010-04-27 17:13     ` [PATCH net-next] bridge: use is_multicast_ether_addr Stephen Hemminger
2010-04-27 19:53       ` David Miller
2010-02-28  5:41   ` [PATCH 10/13] bridge: Add multicast_router sysfs entries Herbert Xu
2010-04-27 17:13     ` [PATCH net-next] bridge: multicast router list manipulation Stephen Hemminger
2010-04-27 19:54       ` David Miller
2010-04-27 23:11       ` Michał Mirosław
2010-04-27 23:25         ` Stephen Hemminger
2010-04-27 23:28           ` David Miller
2010-04-27 23:44             ` Stephen Hemminger
2010-04-27 23:51               ` David Miller
2010-04-27 23:27         ` David Miller
2010-04-28  1:51       ` Herbert Xu
2010-02-28  5:41   ` [PATCH 11/13] bridge: Add multicast_snooping sysfs toggle Herbert Xu
2010-02-28  5:41   ` [PATCH 12/13] bridge: Add hash elasticity/max sysfs entries Herbert Xu
2010-02-28  5:41   ` [PATCH 13/13] bridge: Add multicast count/interval " Herbert Xu
2010-02-28  8:52   ` [1/13] bridge: Add IGMP snooping support David Miller
2010-03-01  2:08     ` Herbert Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100306190000.GA24445@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=netdev@vger.kernel.org \
    --cc=shemminger@vyatta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.