All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline)
@ 2013-03-28 17:16 Steven Rostedt
  2013-03-28 17:29 ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2013-03-28 17:16 UTC (permalink / raw)
  To: Jiri Pirko, Andy Gospodarek, David S. Miller
  Cc: LKML, netdev, Nicolas de Pesloüan, Thomas Gleixner,
	Guy Streeter, Paul E. McKenney

Hi,

I'm currently debugging a crash in an old 3.0-rt kernel that one of our
customers is seeing. The bug happens with a stress test that loads and
unloads the bonding module in a loop (I don't know all the details as
I'm not the one that is directly interacting with the customer). But the
bug looks to be something that may still be present and possibly present
in mainline too. It will just be much harder to trigger it in mainline.

In -rt, interrupts are threads, and can schedule in and out just like
any other thread. Note, mainline now supports interrupt threads so this
may be easily reproducible in mainline as well. I don't have the ability
to tell the customer to try mainline or other kernels, so my hands are
somewhat tied to what I can do.

But according to a core dump, I tracked down that the eth irq thread
crashed in bond_handle_frame() here:

	slave = bond_slave_get_rcu(skb->dev);
	bond = slave->bond; <--- BUG


the slave returned was NULL and accessing slave->bond caused a NULL
pointer dereference.

Looking at the code that unregisters the handler:

void netdev_rx_handler_unregister(struct net_device *dev)
{

        ASSERT_RTNL();
        RCU_INIT_POINTER(dev->rx_handler, NULL);
        RCU_INIT_POINTER(dev->rx_handler_data, NULL);
}

Which is basically:
	dev->rx_handler = NULL;
	dev->rx_handler_data = NULL;

And looking at __netif_receive_skb() we have:

        rx_handler = rcu_dereference(skb->dev->rx_handler);
        if (rx_handler) {
                if (pt_prev) {
                        ret = deliver_skb(skb, pt_prev, orig_dev);
                        pt_prev = NULL;
                }
                switch (rx_handler(&skb)) {

My question to all of you is, what stops this interrupt from happening
while the bonding module is unloading?  What happens if the interrupt
triggers and we have this:


	CPU0			CPU1
	----			----
  rx_handler = skb->dev->rx_handler

			netdev_rx_handler_unregister() {
			   dev->rx_handler = NULL;
			   dev->rx_handler_data = NULL;

  rx_handler()
   bond_handle_frame() {
    slave = skb->dev->rx_handler;
    bond = slave->bond; <-- NULL pointer dereference!!!


What protection am I missing in the bond release handler that would
prevent the above from happening?

Thanks!

-- Steve





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline)
  2013-03-28 17:16 [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline) Steven Rostedt
@ 2013-03-28 17:29 ` Eric Dumazet
  2013-03-28 17:44   ` Steven Rostedt
  2013-03-29  9:48   ` Jiri Pirko
  0 siblings, 2 replies; 18+ messages in thread
From: Eric Dumazet @ 2013-03-28 17:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Pirko, Andy Gospodarek, David S. Miller, LKML, netdev,
	Nicolas de Pesloüan, Thomas Gleixner, Guy Streeter,
	Paul E. McKenney

On Thu, 2013-03-28 at 13:16 -0400, Steven Rostedt wrote:
> Hi,
> 
> I'm currently debugging a crash in an old 3.0-rt kernel that one of our
> customers is seeing. The bug happens with a stress test that loads and
> unloads the bonding module in a loop (I don't know all the details as
> I'm not the one that is directly interacting with the customer). But the
> bug looks to be something that may still be present and possibly present
> in mainline too. It will just be much harder to trigger it in mainline.
> 
> In -rt, interrupts are threads, and can schedule in and out just like
> any other thread. Note, mainline now supports interrupt threads so this
> may be easily reproducible in mainline as well. I don't have the ability
> to tell the customer to try mainline or other kernels, so my hands are
> somewhat tied to what I can do.
> 
> But according to a core dump, I tracked down that the eth irq thread
> crashed in bond_handle_frame() here:
> 
> 	slave = bond_slave_get_rcu(skb->dev);
> 	bond = slave->bond; <--- BUG
> 
> 
> the slave returned was NULL and accessing slave->bond caused a NULL
> pointer dereference.
> 
> Looking at the code that unregisters the handler:
> 
> void netdev_rx_handler_unregister(struct net_device *dev)
> {
> 
>         ASSERT_RTNL();
>         RCU_INIT_POINTER(dev->rx_handler, NULL);
>         RCU_INIT_POINTER(dev->rx_handler_data, NULL);
> }
> 
> Which is basically:
> 	dev->rx_handler = NULL;
> 	dev->rx_handler_data = NULL;
> 
> And looking at __netif_receive_skb() we have:
> 
>         rx_handler = rcu_dereference(skb->dev->rx_handler);
>         if (rx_handler) {
>                 if (pt_prev) {
>                         ret = deliver_skb(skb, pt_prev, orig_dev);
>                         pt_prev = NULL;
>                 }
>                 switch (rx_handler(&skb)) {
> 
> My question to all of you is, what stops this interrupt from happening
> while the bonding module is unloading?  What happens if the interrupt
> triggers and we have this:
> 
> 
> 	CPU0			CPU1
> 	----			----
>   rx_handler = skb->dev->rx_handler
> 
> 			netdev_rx_handler_unregister() {
> 			   dev->rx_handler = NULL;
> 			   dev->rx_handler_data = NULL;
> 
>   rx_handler()
>    bond_handle_frame() {
>     slave = skb->dev->rx_handler;
>     bond = slave->bond; <-- NULL pointer dereference!!!
> 
> 
> What protection am I missing in the bond release handler that would
> prevent the above from happening?

Nothing :(

bug introduced in commit 35d48903e9781975e823b359ee85c257c9ff5c1c
(bonding: fix rx_handler locking)

CC Jiri

Fix seems simple :

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 6bbd90e..7956ca5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1457,6 +1457,8 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 	*pskb = skb;
 
 	slave = bond_slave_get_rcu(skb->dev);
+	if (!slave)
+		return ret;
 	bond = slave->bond;
 
 	if (bond->params.arp_interval)




^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline)
  2013-03-28 17:29 ` Eric Dumazet
@ 2013-03-28 17:44   ` Steven Rostedt
  2013-03-29  9:48   ` Jiri Pirko
  1 sibling, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2013-03-28 17:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Pirko, Andy Gospodarek, David S. Miller, LKML, netdev,
	Nicolas de Pesloüan, Thomas Gleixner, Guy Streeter,
	Paul E. McKenney

On Thu, 2013-03-28 at 10:29 -0700, Eric Dumazet wrote:

> Nothing :(
> 
> bug introduced in commit 35d48903e9781975e823b359ee85c257c9ff5c1c
> (bonding: fix rx_handler locking)
> 
> CC Jiri
> 
> Fix seems simple :
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 6bbd90e..7956ca5 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1457,6 +1457,8 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
>  	*pskb = skb;
>  
>  	slave = bond_slave_get_rcu(skb->dev);
> +	if (!slave)
> +		return ret;

Thanks! That's basically what I thought, but wanted to make sure there's
wasn't some other synchronization that I may have been missing.

-- Steve

>  	bond = slave->bond;
>  
>  	if (bond->params.arp_interval)
> 
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline)
  2013-03-28 17:29 ` Eric Dumazet
  2013-03-28 17:44   ` Steven Rostedt
@ 2013-03-29  9:48   ` Jiri Pirko
  2013-03-29 13:01     ` [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister() Eric Dumazet
                       ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Jiri Pirko @ 2013-03-29  9:48 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Steven Rostedt, Andy Gospodarek, David S. Miller, LKML, netdev,
	Nicolas de Pesloüan, Thomas Gleixner, Guy Streeter,
	Paul E. McKenney, stephen

Thu, Mar 28, 2013 at 06:29:52PM CET, eric.dumazet@gmail.com wrote:
>On Thu, 2013-03-28 at 13:16 -0400, Steven Rostedt wrote:
>> Hi,
>> 
>> I'm currently debugging a crash in an old 3.0-rt kernel that one of our
>> customers is seeing. The bug happens with a stress test that loads and
>> unloads the bonding module in a loop (I don't know all the details as
>> I'm not the one that is directly interacting with the customer). But the
>> bug looks to be something that may still be present and possibly present
>> in mainline too. It will just be much harder to trigger it in mainline.
>> 
>> In -rt, interrupts are threads, and can schedule in and out just like
>> any other thread. Note, mainline now supports interrupt threads so this
>> may be easily reproducible in mainline as well. I don't have the ability
>> to tell the customer to try mainline or other kernels, so my hands are
>> somewhat tied to what I can do.
>> 
>> But according to a core dump, I tracked down that the eth irq thread
>> crashed in bond_handle_frame() here:
>> 
>> 	slave = bond_slave_get_rcu(skb->dev);
>> 	bond = slave->bond; <--- BUG
>> 
>> 
>> the slave returned was NULL and accessing slave->bond caused a NULL
>> pointer dereference.
>> 
>> Looking at the code that unregisters the handler:
>> 
>> void netdev_rx_handler_unregister(struct net_device *dev)
>> {
>> 
>>         ASSERT_RTNL();
>>         RCU_INIT_POINTER(dev->rx_handler, NULL);
>>         RCU_INIT_POINTER(dev->rx_handler_data, NULL);
>> }
>> 
>> Which is basically:
>> 	dev->rx_handler = NULL;
>> 	dev->rx_handler_data = NULL;
>> 
>> And looking at __netif_receive_skb() we have:
>> 
>>         rx_handler = rcu_dereference(skb->dev->rx_handler);
>>         if (rx_handler) {
>>                 if (pt_prev) {
>>                         ret = deliver_skb(skb, pt_prev, orig_dev);
>>                         pt_prev = NULL;
>>                 }
>>                 switch (rx_handler(&skb)) {
>> 
>> My question to all of you is, what stops this interrupt from happening
>> while the bonding module is unloading?  What happens if the interrupt
>> triggers and we have this:
>> 
>> 
>> 	CPU0			CPU1
>> 	----			----
>>   rx_handler = skb->dev->rx_handler
>> 
>> 			netdev_rx_handler_unregister() {
>> 			   dev->rx_handler = NULL;
>> 			   dev->rx_handler_data = NULL;
>> 
>>   rx_handler()
>>    bond_handle_frame() {
>>     slave = skb->dev->rx_handler;
>>     bond = slave->bond; <-- NULL pointer dereference!!!
>> 
>> 
>> What protection am I missing in the bond release handler that would
>> prevent the above from happening?


Hmm. I think that this might be issue introduced by:
commit a9b3cd7f323b2e57593e7215362a7b02fc933e3a
Author: Stephen Hemminger <shemminger@vyatta.com>
Date:   Mon Aug 1 16:19:00 2011 +0000

    rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER


Because, if rcu_dereference(dev->rx_handler) is null,
rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe
we are hitting following scenario:


   CPU0				CPU1
   ----				----
  			    dev->rx_handler_data = NULL
 rcu_read_lock()
 			    dev->rx_handler = NULL


CPU0 will see rx_handler set and yet, rx_handler_data nulled. Write
barrier in rcu_assign_pointer() might prevent this reorder from happening.
Therefore I suggest:

diff --git a/net/core/dev.c b/net/core/dev.c
index 0caa38e..c16b829 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3332,8 +3332,8 @@ void netdev_rx_handler_unregister(struct net_device *dev)
 {
 
 	ASSERT_RTNL();
-	RCU_INIT_POINTER(dev->rx_handler, NULL);
-	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
+	rcu_assign_pointer(dev->rx_handler, NULL);
+	rcu_assign_pointer(dev->rx_handler_data, NULL);
 }
 EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
 

>
>Nothing :(
>
>bug introduced in commit 35d48903e9781975e823b359ee85c257c9ff5c1c
>(bonding: fix rx_handler locking)
>
>CC Jiri
>
>Fix seems simple :
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 6bbd90e..7956ca5 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1457,6 +1457,8 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
> 	*pskb = skb;
> 
> 	slave = bond_slave_get_rcu(skb->dev);
>+	if (!slave)
>+		return ret;
> 	bond = slave->bond;
> 
> 	if (bond->params.arp_interval)
>
>
>

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
  2013-03-29  9:48   ` Jiri Pirko
@ 2013-03-29 13:01     ` Eric Dumazet
  2013-03-29 13:17       ` Steven Rostedt
                         ` (2 more replies)
  2013-03-29 15:46     ` [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline) Stephen Hemminger
  2013-03-29 18:36     ` Steven Rostedt
  2 siblings, 3 replies; 18+ messages in thread
From: Eric Dumazet @ 2013-03-29 13:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Steven Rostedt, Andy Gospodarek, David S. Miller, LKML, netdev,
	Nicolas de Pesloüan, Thomas Gleixner, Guy Streeter,
	Paul E. McKenney, stephen

From: Eric Dumazet <edumazet@google.com>

On Fri, 2013-03-29 at 10:48 +0100, Jiri Pirko wrote:

> Hmm. I think that this might be issue introduced by:
> commit a9b3cd7f323b2e57593e7215362a7b02fc933e3a
> Author: Stephen Hemminger <shemminger@vyatta.com>
> Date:   Mon Aug 1 16:19:00 2011 +0000
> 
>     rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER
> 
> 
> Because, if rcu_dereference(dev->rx_handler) is null,
> rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe
> we are hitting following scenario:
> 
> 
>    CPU0				CPU1
>    ----				----
>   			    dev->rx_handler_data = NULL
>  rcu_read_lock()
>  			    dev->rx_handler = NULL
> 
> 
> CPU0 will see rx_handler set and yet, rx_handler_data nulled. Write
> barrier in rcu_assign_pointer() might prevent this reorder from happening.
> Therefore I suggest:
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0caa38e..c16b829 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3332,8 +3332,8 @@ void netdev_rx_handler_unregister(struct net_device *dev)
>  {
>  
>  	ASSERT_RTNL();
> -	RCU_INIT_POINTER(dev->rx_handler, NULL);
> -	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
> +	rcu_assign_pointer(dev->rx_handler, NULL);
> +	rcu_assign_pointer(dev->rx_handler_data, NULL);
>  }
>  EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
>  
> 

Nope this changes nothing at all.

However, we can fix the bug in a different way, if we want to avoid a
test in fast path.

With following patch, we can make sure that a reader seeing a non NULL
rx_handler has a guarantee to see a non NULL rx_handler_data

Thanks

[PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()

commit 35d48903e97819 (bonding: fix rx_handler locking) added a race
in bonding driver, reported by Steven Rostedt who did a very good
diagnosis :

<quoting Steven>

I'm currently debugging a crash in an old 3.0-rt kernel that one of our
customers is seeing. The bug happens with a stress test that loads and
unloads the bonding module in a loop (I don't know all the details as
I'm not the one that is directly interacting with the customer). But the
bug looks to be something that may still be present and possibly present
in mainline too. It will just be much harder to trigger it in mainline.

In -rt, interrupts are threads, and can schedule in and out just like
any other thread. Note, mainline now supports interrupt threads so this
may be easily reproducible in mainline as well. I don't have the ability
to tell the customer to try mainline or other kernels, so my hands are
somewhat tied to what I can do.

But according to a core dump, I tracked down that the eth irq thread
crashed in bond_handle_frame() here:

        slave = bond_slave_get_rcu(skb->dev);
        bond = slave->bond; <--- BUG


the slave returned was NULL and accessing slave->bond caused a NULL
pointer dereference.

Looking at the code that unregisters the handler:

void netdev_rx_handler_unregister(struct net_device *dev)
{

        ASSERT_RTNL();
        RCU_INIT_POINTER(dev->rx_handler, NULL);
        RCU_INIT_POINTER(dev->rx_handler_data, NULL);
}

Which is basically:
        dev->rx_handler = NULL;
        dev->rx_handler_data = NULL;

And looking at __netif_receive_skb() we have:

        rx_handler = rcu_dereference(skb->dev->rx_handler);
        if (rx_handler) {
                if (pt_prev) {
                        ret = deliver_skb(skb, pt_prev, orig_dev);
                        pt_prev = NULL;
                }
                switch (rx_handler(&skb)) {

My question to all of you is, what stops this interrupt from happening
while the bonding module is unloading?  What happens if the interrupt
triggers and we have this:


        CPU0                    CPU1
        ----                    ----
  rx_handler = skb->dev->rx_handler

                        netdev_rx_handler_unregister() {
                           dev->rx_handler = NULL;
                           dev->rx_handler_data = NULL;

  rx_handler()
   bond_handle_frame() {
    slave = skb->dev->rx_handler;
    bond = slave->bond; <-- NULL pointer dereference!!!


What protection am I missing in the bond release handler that would
prevent the above from happening?

</quoting Steven>

We can fix bug this in two ways. First is adding a test in
bond_handle_frame() and others to check if rx_handler_data is NULL.

A second way is adding a synchronize_net() in
netdev_rx_handler_unregister() to make sure that a rcu protected reader
has the guarantee to see a non NULL rx_handler_data.

The second way is better as it avoids an extra test in fast path.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
---
 net/core/dev.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index b13e5c7..56932a4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3314,6 +3314,7 @@ int netdev_rx_handler_register(struct net_device *dev,
 	if (dev->rx_handler)
 		return -EBUSY;
 
+	/* Note: rx_handler_data must be set before rx_handler */
 	rcu_assign_pointer(dev->rx_handler_data, rx_handler_data);
 	rcu_assign_pointer(dev->rx_handler, rx_handler);
 
@@ -3334,6 +3335,11 @@ void netdev_rx_handler_unregister(struct net_device *dev)
 
 	ASSERT_RTNL();
 	RCU_INIT_POINTER(dev->rx_handler, NULL);
+	/* a reader seeing a non NULL rx_handler in a rcu_read_lock()
+	 * section has a guarantee to see a non NULL rx_handler_data
+	 * as well.
+	 */
+	synchronize_net();
 	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
 }
 EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
  2013-03-29 13:01     ` [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister() Eric Dumazet
@ 2013-03-29 13:17       ` Steven Rostedt
  2013-03-29 13:38         ` Eric Dumazet
  2013-03-29 15:11       ` Ivan Vecera
  2013-03-29 19:20       ` Paul E. McKenney
  2 siblings, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2013-03-29 13:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Pirko, Andy Gospodarek, David S. Miller, LKML, netdev,
	Nicolas de Pesloüan, Thomas Gleixner, Guy Streeter,
	Paul E. McKenney, stephen

On Fri, 2013-03-29 at 06:01 -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> On Fri, 2013-03-29 at 10:48 +0100, Jiri Pirko wrote:
> 
> > Hmm. I think that this might be issue introduced by:
> > commit a9b3cd7f323b2e57593e7215362a7b02fc933e3a
> > Author: Stephen Hemminger <shemminger@vyatta.com>
> > Date:   Mon Aug 1 16:19:00 2011 +0000
> > 
> >     rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER
> > 
> > 
> > Because, if rcu_dereference(dev->rx_handler) is null,
> > rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe
> > we are hitting following scenario:
> > 
> > 
> >    CPU0				CPU1
> >    ----				----
> >   			    dev->rx_handler_data = NULL
> >  rcu_read_lock()
> >  			    dev->rx_handler = NULL
> > 
> > 
> > CPU0 will see rx_handler set and yet, rx_handler_data nulled. Write
> > barrier in rcu_assign_pointer() might prevent this reorder from happening.
> > Therefore I suggest:
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 0caa38e..c16b829 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3332,8 +3332,8 @@ void netdev_rx_handler_unregister(struct net_device *dev)
> >  {
> >  
> >  	ASSERT_RTNL();
> > -	RCU_INIT_POINTER(dev->rx_handler, NULL);
> > -	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
> > +	rcu_assign_pointer(dev->rx_handler, NULL);
> > +	rcu_assign_pointer(dev->rx_handler_data, NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
> >  
> > 
> 
> Nope this changes nothing at all.

Exactly! In fact, the bug triggered on an older kernel that had the
original rcu_assign_pointer()

> 
> However, we can fix the bug in a different way, if we want to avoid a
> test in fast path.
> 
> With following patch, we can make sure that a reader seeing a non NULL
> rx_handler has a guarantee to see a non NULL rx_handler_data
> 

[..]

> We can fix bug this in two ways. First is adding a test in
> bond_handle_frame() and others to check if rx_handler_data is NULL.
> 
> A second way is adding a synchronize_net() in
> netdev_rx_handler_unregister() to make sure that a rcu protected reader
> has the guarantee to see a non NULL rx_handler_data.
> 
> The second way is better as it avoids an extra test in fast path.
> 
> Reported-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jiri Pirko <jpirko@redhat.com>
> Cc: Paul E. McKenney <paulmck@us.ibm.com>
> ---
>  net/core/dev.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b13e5c7..56932a4 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3314,6 +3314,7 @@ int netdev_rx_handler_register(struct net_device *dev,
>  	if (dev->rx_handler)
>  		return -EBUSY;
>  
> +	/* Note: rx_handler_data must be set before rx_handler */
>  	rcu_assign_pointer(dev->rx_handler_data, rx_handler_data);
>  	rcu_assign_pointer(dev->rx_handler, rx_handler);
>  
> @@ -3334,6 +3335,11 @@ void netdev_rx_handler_unregister(struct net_device *dev)
>  
>  	ASSERT_RTNL();
>  	RCU_INIT_POINTER(dev->rx_handler, NULL);
> +	/* a reader seeing a non NULL rx_handler in a rcu_read_lock()
> +	 * section has a guarantee to see a non NULL rx_handler_data
> +	 * as well.
> +	 */
> +	synchronize_net();

I've thought about this too, but I wasn't sure we wanted two
synchronize_*() functions, as the caller does a synchronize as well.
That said, I think this is the more robust solution and it lets all
rx_handler() functions assume that their rx_handler_data is set. And it
removes the check from the fast path which outweighs an added
synchronization in the slow path.

Acked-by: Steven Rostedt <rostedt@goodmis.org>

Thanks!

-- Steve

>  	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
>  }
>  EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
  2013-03-29 13:17       ` Steven Rostedt
@ 2013-03-29 13:38         ` Eric Dumazet
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2013-03-29 13:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Pirko, Andy Gospodarek, David S. Miller, LKML, netdev,
	Nicolas de Pesloüan, Thomas Gleixner, Guy Streeter,
	Paul E. McKenney, stephen

On Fri, 2013-03-29 at 09:17 -0400, Steven Rostedt wrote:

> I've thought about this too, but I wasn't sure we wanted two
> synchronize_*() functions, as the caller does a synchronize as well.
> That said, I think this is the more robust solution and it lets all
> rx_handler() functions assume that their rx_handler_data is set. And it
> removes the check from the fast path which outweighs an added
> synchronization in the slow path.
> 

Note that I used synchronize_net(), which does a
synchronize_rcu_expedited() when RTNL is locked, so its normally quite
fast.

> Acked-by: Steven Rostedt <rostedt@goodmis.org>
> 
> Thanks!

Thanks a lot for your very detailed report and analysis !



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
  2013-03-29 13:01     ` [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister() Eric Dumazet
  2013-03-29 13:17       ` Steven Rostedt
@ 2013-03-29 15:11       ` Ivan Vecera
  2013-03-29 15:38         ` Eric Dumazet
  2013-03-29 19:20       ` Paul E. McKenney
  2 siblings, 1 reply; 18+ messages in thread
From: Ivan Vecera @ 2013-03-29 15:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Pirko, Steven Rostedt, Andy Gospodarek, David S. Miller,
	LKML, netdev, Nicolas de Pesloüan, Thomas Gleixner,
	Guy Streeter, Paul E. McKenney, stephen

On 03/29/2013 02:01 PM, Eric Dumazet wrote:
>> CPU0 will see rx_handler set and yet, rx_handler_data nulled. Write
>> >barrier in rcu_assign_pointer() might prevent this reorder from happening.
>> >Therefore I suggest:
>> >
>> >diff --git a/net/core/dev.c b/net/core/dev.c
>> >index 0caa38e..c16b829 100644
>> >--- a/net/core/dev.c
>> >+++ b/net/core/dev.c
>> >@@ -3332,8 +3332,8 @@ void netdev_rx_handler_unregister(struct net_device *dev)
>> >  {
>> >
>> >  	ASSERT_RTNL();
>> >-	RCU_INIT_POINTER(dev->rx_handler, NULL);
>> >-	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
>> >+	rcu_assign_pointer(dev->rx_handler, NULL);
>> >+	rcu_assign_pointer(dev->rx_handler_data, NULL);
>> >  }
>> >  EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
>> >
>> >
> Nope this changes nothing at all.
Erik, why doesn't help the write barrier between the assignments. It 
should guarantee their orders... or not?

Thanks,
Ivan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
  2013-03-29 15:11       ` Ivan Vecera
@ 2013-03-29 15:38         ` Eric Dumazet
  2013-03-29 16:12           ` Jiri Pirko
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2013-03-29 15:38 UTC (permalink / raw)
  To: Ivan Vecera
  Cc: Jiri Pirko, Steven Rostedt, Andy Gospodarek, David S. Miller,
	LKML, netdev, Nicolas de Pesloüan, Thomas Gleixner,
	Guy Streeter, Paul E. McKenney, stephen

On Fri, 2013-03-29 at 16:11 +0100, Ivan Vecera wrote:

> Erik, why doesn't help the write barrier between the assignments. It 
> should guarantee their orders... or not?
> 

Its not enough, I wont explain here why as RCU is quite well documented
in Documentation/RCU




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline)
  2013-03-29  9:48   ` Jiri Pirko
  2013-03-29 13:01     ` [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister() Eric Dumazet
@ 2013-03-29 15:46     ` Stephen Hemminger
  2013-03-29 18:36     ` Steven Rostedt
  2 siblings, 0 replies; 18+ messages in thread
From: Stephen Hemminger @ 2013-03-29 15:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Eric Dumazet, Steven Rostedt, Andy Gospodarek, David S. Miller,
	LKML, netdev, Nicolas de Pesloüan, Thomas Gleixner,
	Guy Streeter, Paul E. McKenney

On Fri, 29 Mar 2013 10:48:56 +0100
Jiri Pirko <jpirko@redhat.com> wrote:

> index 0caa38e..c16b829 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3332,8 +3332,8 @@ void netdev_rx_handler_unregister(struct net_device *dev)
>  {
>  
>  	ASSERT_RTNL();
> -	RCU_INIT_POINTER(dev->rx_handler, NULL);
> -	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
> +	rcu_assign_pointer(dev->rx_handler, NULL);
> +	rcu_assign_pointer(dev->rx_handler_data, NULL);
>  }
>  EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
It is worth noting that at the time rcu_assign_pointer() had a special
case tat if the value was NULL it would compile into RCU_INIT_POINTER without
the barrier.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
  2013-03-29 15:38         ` Eric Dumazet
@ 2013-03-29 16:12           ` Jiri Pirko
  2013-03-29 16:46             ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Jiri Pirko @ 2013-03-29 16:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ivan Vecera, Jiri Pirko, Steven Rostedt, Andy Gospodarek,
	David S. Miller, LKML, netdev, Nicolas de Pesloüan,
	Thomas Gleixner, Guy Streeter, Paul E. McKenney, stephen

Fri, Mar 29, 2013 at 04:38:15PM CET, eric.dumazet@gmail.com wrote:
>On Fri, 2013-03-29 at 16:11 +0100, Ivan Vecera wrote:
>
>> Erik, why doesn't help the write barrier between the assignments. It 
>> should guarantee their orders... or not?
>> 
>
>Its not enough, I wont explain here why as RCU is quite well documented
>in Documentation/RCU

Can you point me exact paragraph? I'm unable to find that :(

>
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
  2013-03-29 16:12           ` Jiri Pirko
@ 2013-03-29 16:46             ` Eric Dumazet
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2013-03-29 16:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Ivan Vecera, Jiri Pirko, Steven Rostedt, Andy Gospodarek,
	David S. Miller, LKML, netdev, Nicolas de Pesloüan,
	Thomas Gleixner, Guy Streeter, Paul E. McKenney, stephen

On Fri, 2013-03-29 at 17:12 +0100, Jiri Pirko wrote:
> Fri, Mar 29, 2013 at 04:38:15PM CET, eric.dumazet@gmail.com wrote:
> >On Fri, 2013-03-29 at 16:11 +0100, Ivan Vecera wrote:
> >
> >> Erik, why doesn't help the write barrier between the assignments. It 
> >> should guarantee their orders... or not?
> >> 
> >
> >Its not enough, I wont explain here why as RCU is quite well documented
> >in Documentation/RCU
> 
> Can you point me exact paragraph? I'm unable to find that :(
> 

You need a bit of RCU history to understand the issue

rcu_assign_pointer(dev->rx_handler, NULL) is certainly not needing a
barrier _before_ setting rx_handler to NULL.

Old kernels had this rcu_assign_pointer() implementation since
commit d99c4f6b13b3149bc83703ab1493beaeaaaf8a2d 
(Remove rcu_assign_pointer() penalty for NULL pointers)

#define rcu_assign_pointer(p, v) \
	({ \
		if (!__builtin_constant_p(v) || \
		    ((v) != NULL)) \
			smp_wmb(); \
		(p) = (v); \
	})

Note that wmb() was _not_ done if v was NULL


Because of various sparse issues, commit
d322f45ceed525daa9401154590bbae3222cfefb
(rcu: Make rcu_assign_pointer() unconditionally insert a memory barrier)
removed the conditional, because RCU_INIT_POINTER() was available.

In the rx_handler/rx_handler_data, we use two pointers protected by RCU,
but we want to only test rx_hander being NULL, and avoid testing
rx_handler_data.

Nothing in RCU guarantees that two different pointers have any order.

Here is what could happen

CPU0                                      CPU1

handler = rcu_dereference(dev->rx_handler)
if (handler) {
   handler(dev, ...);

                                          dev->rx_handler = NULL;
                                          smp_wmb(); // OR NOT
                                          dev->rx_handler_data = NULL;
                                          smp_wmb(); // OR NOT
 handler(dev)
   priv_data = rcu_dereference(dev->rx_handler_data);
   x = priv_data->some_field;   // CRASH because priv_data is NULL




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline)
  2013-03-29  9:48   ` Jiri Pirko
  2013-03-29 13:01     ` [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister() Eric Dumazet
  2013-03-29 15:46     ` [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline) Stephen Hemminger
@ 2013-03-29 18:36     ` Steven Rostedt
  2013-03-29 19:55       ` Steven Rostedt
  2013-03-30  9:19       ` Jiri Pirko
  2 siblings, 2 replies; 18+ messages in thread
From: Steven Rostedt @ 2013-03-29 18:36 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Eric Dumazet, Andy Gospodarek, David S. Miller, LKML, netdev,
	Nicolas de Pesloüan, Thomas Gleixner, Guy Streeter,
	Paul E. McKenney, stephen

On Fri, 2013-03-29 at 10:48 +0100, Jiri Pirko wrote:

> Because, if rcu_dereference(dev->rx_handler) is null,
> rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe
> we are hitting following scenario:
> 
> 
>    CPU0				CPU1
>    ----				----
>   			    dev->rx_handler_data = NULL
>  rcu_read_lock()
>  			    dev->rx_handler = NULL
> 
> 

That is not what is happening and that is not how RCU works. That is,
rcu_read_lock() does not block nor does it really do much with ordering
at all.

The problem is totally contained within the rcu_read_lock() as well:


If you have:

	rcu_read_lock();
	rx_handler = dev->rx_handler;
	rx_handler();
	rcu_read_unlock();

where rx_handler references rx->rx_handler_data you need much more than
making sure that rx->handler is set to null before rx_handler_data.

The way RCU works is it lets things exist in a "dual state". Kind of
like a Schödinger's cat. The solution Eric posted is a classic RCU
example of how this works.

When you set dev->rx_handler to NULL, there's two states that currently
exist in the system. Those that still see dev->rx_handler set to
something and those that see it set to NULL. You could put in memory
barriers to your hearts content, but you will still have a system that
sees things in a dual state. If you set dev->rx_handler_data to NULL,
you risk those that see rx_handler as a function can still reference the
rx_handler_data when it is NULL.

Think of it this way:

	dev->rx_handler() {

Once the function has been called, even if you set rx_handler() to NULL
at this point, it makes no difference, even with memory barriers. This
CPU is about to execute the previous value of rx_handler and there's
nothing you can do to stop it. Setting rx_handler_data to NULL now can
cause that CPU to reference the NULL pointer. There isn't a ordering
problem where rx_handler_data got set to NULL first.

But the beauty about RCU is the synchronize_*() functions, because that
puts the system back into a single state. After the synchronization is
complete, the entire system sees rx_handler() as NULL. There is no worry
about setting rx_handler_data to NULL now because nothing will be
referencing the previous value of rx_handler because that value no
longer exists in the system.

That means Eric's solution fits perfectly well here.

	< system in single state : everyone sees rx_handler = function() >

	rx_handler = NULL;

	< system in dual state : new calls see rx_handler = NULL, but
	  current calls see rx_handler = function >

	synchronize_net();

	< system is back to single state: everyone sees rx_handler = NULL >

	rx_handler_data = NULL;

no problem ;-)

-- Steve





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
  2013-03-29 13:01     ` [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister() Eric Dumazet
  2013-03-29 13:17       ` Steven Rostedt
  2013-03-29 15:11       ` Ivan Vecera
@ 2013-03-29 19:20       ` Paul E. McKenney
  2013-03-29 19:26         ` Eric Dumazet
  2013-03-29 19:39         ` David Miller
  2 siblings, 2 replies; 18+ messages in thread
From: Paul E. McKenney @ 2013-03-29 19:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Pirko, Steven Rostedt, Andy Gospodarek, David S. Miller,
	LKML, netdev, Nicolas de Pesloüan, Thomas Gleixner,
	Guy Streeter, stephen

On Fri, Mar 29, 2013 at 06:01:22AM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> On Fri, 2013-03-29 at 10:48 +0100, Jiri Pirko wrote:
> 
> > Hmm. I think that this might be issue introduced by:
> > commit a9b3cd7f323b2e57593e7215362a7b02fc933e3a
> > Author: Stephen Hemminger <shemminger@vyatta.com>
> > Date:   Mon Aug 1 16:19:00 2011 +0000
> > 
> >     rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER
> > 
> > 
> > Because, if rcu_dereference(dev->rx_handler) is null,
> > rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe
> > we are hitting following scenario:
> > 
> > 
> >    CPU0				CPU1
> >    ----				----
> >   			    dev->rx_handler_data = NULL
> >  rcu_read_lock()
> >  			    dev->rx_handler = NULL
> > 
> > 
> > CPU0 will see rx_handler set and yet, rx_handler_data nulled. Write
> > barrier in rcu_assign_pointer() might prevent this reorder from happening.
> > Therefore I suggest:
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 0caa38e..c16b829 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3332,8 +3332,8 @@ void netdev_rx_handler_unregister(struct net_device *dev)
> >  {
> >  
> >  	ASSERT_RTNL();
> > -	RCU_INIT_POINTER(dev->rx_handler, NULL);
> > -	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
> > +	rcu_assign_pointer(dev->rx_handler, NULL);
> > +	rcu_assign_pointer(dev->rx_handler_data, NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
> >  
> > 
> 
> Nope this changes nothing at all.
> 
> However, we can fix the bug in a different way, if we want to avoid a
> test in fast path.
> 
> With following patch, we can make sure that a reader seeing a non NULL
> rx_handler has a guarantee to see a non NULL rx_handler_data
> 
> Thanks
> 
> [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
> 
> commit 35d48903e97819 (bonding: fix rx_handler locking) added a race
> in bonding driver, reported by Steven Rostedt who did a very good
> diagnosis :
> 
> <quoting Steven>
> 
> I'm currently debugging a crash in an old 3.0-rt kernel that one of our
> customers is seeing. The bug happens with a stress test that loads and
> unloads the bonding module in a loop (I don't know all the details as
> I'm not the one that is directly interacting with the customer). But the
> bug looks to be something that may still be present and possibly present
> in mainline too. It will just be much harder to trigger it in mainline.
> 
> In -rt, interrupts are threads, and can schedule in and out just like
> any other thread. Note, mainline now supports interrupt threads so this
> may be easily reproducible in mainline as well. I don't have the ability
> to tell the customer to try mainline or other kernels, so my hands are
> somewhat tied to what I can do.
> 
> But according to a core dump, I tracked down that the eth irq thread
> crashed in bond_handle_frame() here:
> 
>         slave = bond_slave_get_rcu(skb->dev);
>         bond = slave->bond; <--- BUG
> 
> 
> the slave returned was NULL and accessing slave->bond caused a NULL
> pointer dereference.
> 
> Looking at the code that unregisters the handler:
> 
> void netdev_rx_handler_unregister(struct net_device *dev)
> {
> 
>         ASSERT_RTNL();
>         RCU_INIT_POINTER(dev->rx_handler, NULL);
>         RCU_INIT_POINTER(dev->rx_handler_data, NULL);
> }
> 
> Which is basically:
>         dev->rx_handler = NULL;
>         dev->rx_handler_data = NULL;
> 
> And looking at __netif_receive_skb() we have:
> 
>         rx_handler = rcu_dereference(skb->dev->rx_handler);
>         if (rx_handler) {
>                 if (pt_prev) {
>                         ret = deliver_skb(skb, pt_prev, orig_dev);
>                         pt_prev = NULL;
>                 }
>                 switch (rx_handler(&skb)) {
> 
> My question to all of you is, what stops this interrupt from happening
> while the bonding module is unloading?  What happens if the interrupt
> triggers and we have this:
> 
> 
>         CPU0                    CPU1
>         ----                    ----
>   rx_handler = skb->dev->rx_handler
> 
>                         netdev_rx_handler_unregister() {
>                            dev->rx_handler = NULL;
>                            dev->rx_handler_data = NULL;
> 
>   rx_handler()
>    bond_handle_frame() {
>     slave = skb->dev->rx_handler;
>     bond = slave->bond; <-- NULL pointer dereference!!!
> 
> 
> What protection am I missing in the bond release handler that would
> prevent the above from happening?
> 
> </quoting Steven>
> 
> We can fix bug this in two ways. First is adding a test in
> bond_handle_frame() and others to check if rx_handler_data is NULL.
> 
> A second way is adding a synchronize_net() in
> netdev_rx_handler_unregister() to make sure that a rcu protected reader
> has the guarantee to see a non NULL rx_handler_data.
> 
> The second way is better as it avoids an extra test in fast path.
> 
> Reported-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jiri Pirko <jpirko@redhat.com>
> Cc: Paul E. McKenney <paulmck@us.ibm.com>

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

With kudos to Steven Rostedt for his analogy between RCU and
Schrödinger's cat.  ;-)

> ---
>  net/core/dev.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b13e5c7..56932a4 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3314,6 +3314,7 @@ int netdev_rx_handler_register(struct net_device *dev,
>  	if (dev->rx_handler)
>  		return -EBUSY;
> 
> +	/* Note: rx_handler_data must be set before rx_handler */
>  	rcu_assign_pointer(dev->rx_handler_data, rx_handler_data);
>  	rcu_assign_pointer(dev->rx_handler, rx_handler);
> 
> @@ -3334,6 +3335,11 @@ void netdev_rx_handler_unregister(struct net_device *dev)
> 
>  	ASSERT_RTNL();
>  	RCU_INIT_POINTER(dev->rx_handler, NULL);
> +	/* a reader seeing a non NULL rx_handler in a rcu_read_lock()
> +	 * section has a guarantee to see a non NULL rx_handler_data
> +	 * as well.
> +	 */
> +	synchronize_net();
>  	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
>  }
>  EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
  2013-03-29 19:20       ` Paul E. McKenney
@ 2013-03-29 19:26         ` Eric Dumazet
  2013-03-29 19:39         ` David Miller
  1 sibling, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2013-03-29 19:26 UTC (permalink / raw)
  To: paulmck
  Cc: Jiri Pirko, Steven Rostedt, Andy Gospodarek, David S. Miller,
	LKML, netdev, Nicolas de Pesloüan, Thomas Gleixner,
	Guy Streeter, stephen

On Fri, 2013-03-29 at 12:20 -0700, Paul E. McKenney wrote:

> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> With kudos to Steven Rostedt for his analogy between RCU and
> Schrödinger's cat.  ;-)

Thanks Paul !



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
  2013-03-29 19:20       ` Paul E. McKenney
  2013-03-29 19:26         ` Eric Dumazet
@ 2013-03-29 19:39         ` David Miller
  1 sibling, 0 replies; 18+ messages in thread
From: David Miller @ 2013-03-29 19:39 UTC (permalink / raw)
  To: paulmck
  Cc: eric.dumazet, jpirko, rostedt, andy, linux-kernel, netdev,
	nicolas.2p.debian, tglx, streeter, stephen

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Fri, 29 Mar 2013 12:20:11 -0700

> On Fri, Mar 29, 2013 at 06:01:22AM -0700, Eric Dumazet wrote:
>> [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
>> 
>> commit 35d48903e97819 (bonding: fix rx_handler locking) added a race
>> in bonding driver, reported by Steven Rostedt who did a very good
>> diagnosis :
 ...
>> We can fix bug this in two ways. First is adding a test in
>> bond_handle_frame() and others to check if rx_handler_data is NULL.
>> 
>> A second way is adding a synchronize_net() in
>> netdev_rx_handler_unregister() to make sure that a rcu protected reader
>> has the guarantee to see a non NULL rx_handler_data.
>> 
>> The second way is better as it avoids an extra test in fast path.
>> 
>> Reported-by: Steven Rostedt <rostedt@goodmis.org>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Cc: Jiri Pirko <jpirko@redhat.com>
>> Cc: Paul E. McKenney <paulmck@us.ibm.com>
> 
> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> With kudos to Steven Rostedt for his analogy between RCU and
> Schrödinger's cat.  ;-)

Applied and queued up for -stable, thanks Everyone.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline)
  2013-03-29 18:36     ` Steven Rostedt
@ 2013-03-29 19:55       ` Steven Rostedt
  2013-03-30  9:19       ` Jiri Pirko
  1 sibling, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2013-03-29 19:55 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Eric Dumazet, Andy Gospodarek, David S. Miller, LKML, netdev,
	Nicolas de Pesloüan, Thomas Gleixner, Guy Streeter,
	Paul E. McKenney, stephen

On Fri, 2013-03-29 at 14:36 -0400, Steven Rostedt wrote:

This one's for you Paul ;-)

That means Eric's solution fits perfectly well here.

	< system in single state : everyone sees cat = alive >

	insert_into_box(cat);

	< system in dual state : new calls see cat == dead, but
	  current calls see cat == alive >

	open_box();

	< system is back to single state: everyone sees cat = dead >

	funeral(cat); 

no problem ;-)

-- Steve




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline)
  2013-03-29 18:36     ` Steven Rostedt
  2013-03-29 19:55       ` Steven Rostedt
@ 2013-03-30  9:19       ` Jiri Pirko
  1 sibling, 0 replies; 18+ messages in thread
From: Jiri Pirko @ 2013-03-30  9:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Pirko, Eric Dumazet, Andy Gospodarek, David S. Miller, LKML,
	netdev, Nicolas de Pesloüan, Thomas Gleixner, Guy Streeter,
	Paul E. McKenney, stephen

Fri, Mar 29, 2013 at 07:36:24PM CET, rostedt@goodmis.org wrote:
>On Fri, 2013-03-29 at 10:48 +0100, Jiri Pirko wrote:
>
>> Because, if rcu_dereference(dev->rx_handler) is null,
>> rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe
>> we are hitting following scenario:
>> 
>> 
>>    CPU0				CPU1
>>    ----				----
>>   			    dev->rx_handler_data = NULL
>>  rcu_read_lock()
>>  			    dev->rx_handler = NULL
>> 
>> 
>
>That is not what is happening and that is not how RCU works. That is,
>rcu_read_lock() does not block nor does it really do much with ordering
>at all.
>
>The problem is totally contained within the rcu_read_lock() as well:
>
>
>If you have:
>
>	rcu_read_lock();
>	rx_handler = dev->rx_handler;
>	rx_handler();
>	rcu_read_unlock();
>
>where rx_handler references rx->rx_handler_data you need much more than
>making sure that rx->handler is set to null before rx_handler_data.
>
>The way RCU works is it lets things exist in a "dual state". Kind of
>like a Schödinger's cat. The solution Eric posted is a classic RCU
>example of how this works.
>
>When you set dev->rx_handler to NULL, there's two states that currently
>exist in the system. Those that still see dev->rx_handler set to
>something and those that see it set to NULL. You could put in memory
>barriers to your hearts content, but you will still have a system that
>sees things in a dual state. If you set dev->rx_handler_data to NULL,
>you risk those that see rx_handler as a function can still reference the
>rx_handler_data when it is NULL.
>
>Think of it this way:
>
>	dev->rx_handler() {
>
>Once the function has been called, even if you set rx_handler() to NULL
>at this point, it makes no difference, even with memory barriers. This
>CPU is about to execute the previous value of rx_handler and there's
>nothing you can do to stop it. Setting rx_handler_data to NULL now can
>cause that CPU to reference the NULL pointer. There isn't a ordering
>problem where rx_handler_data got set to NULL first.
>
>But the beauty about RCU is the synchronize_*() functions, because that
>puts the system back into a single state. After the synchronization is
>complete, the entire system sees rx_handler() as NULL. There is no worry
>about setting rx_handler_data to NULL now because nothing will be
>referencing the previous value of rx_handler because that value no
>longer exists in the system.
>
>That means Eric's solution fits perfectly well here.
>
>	< system in single state : everyone sees rx_handler = function() >
>
>	rx_handler = NULL;
>
>	< system in dual state : new calls see rx_handler = NULL, but
>	  current calls see rx_handler = function >
>
>	synchronize_net();
>
>	< system is back to single state: everyone sees rx_handler = NULL >
>
>	rx_handler_data = NULL;
>
>no problem ;-)
>
>-- Steve


I think I understand now. I was under false impression that when rcu_read_lock()
is held, rcu_dereference(pointer) value is predetermined (for that
single run I mean).

Thank you very much for explanation!

Jiri

>
>
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-03-30  9:19 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-28 17:16 [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline) Steven Rostedt
2013-03-28 17:29 ` Eric Dumazet
2013-03-28 17:44   ` Steven Rostedt
2013-03-29  9:48   ` Jiri Pirko
2013-03-29 13:01     ` [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister() Eric Dumazet
2013-03-29 13:17       ` Steven Rostedt
2013-03-29 13:38         ` Eric Dumazet
2013-03-29 15:11       ` Ivan Vecera
2013-03-29 15:38         ` Eric Dumazet
2013-03-29 16:12           ` Jiri Pirko
2013-03-29 16:46             ` Eric Dumazet
2013-03-29 19:20       ` Paul E. McKenney
2013-03-29 19:26         ` Eric Dumazet
2013-03-29 19:39         ` David Miller
2013-03-29 15:46     ` [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline) Stephen Hemminger
2013-03-29 18:36     ` Steven Rostedt
2013-03-29 19:55       ` Steven Rostedt
2013-03-30  9:19       ` Jiri Pirko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.