linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-09-28  0:22 Suzanne Wood
  0 siblings, 0 replies; 27+ messages in thread
From: Suzanne Wood @ 2005-09-28  0:22 UTC (permalink / raw)
  To: davem; +Cc: Robert.Olsson, linux-kernel, paulmck, walpole

Many thanks for all you've provided here. 

   > From davem@davemloft.net  Tue Sep 27 14:36:32 2005

   > I agree with the changes to add rcu_dereference() use.
   > Those were definitely lacking and needed.

   > This following case is clever and correct, though.  It is from
   > the net/ipv4/devinet.c part of your patch:

   > @@ -409,7 +412,8 @@ static int inet_rtm_deladdr(struct sk_bu
   >  
   >         if ((in_dev = inetdev_by_index(ifm->ifa_index)) == NULL)
   >                 goto out;
   > -       __in_dev_put(in_dev);
   > +       in_dev_put(in_dev);
   > +       rcu_read_unlock();
   >  
   >         for (ifap = &in_dev->ifa_list; (ifa = *ifap) != NULL;
   >              ifap = &ifa->ifa_next) {

   > Everyone gets fooled by a certain invariant in the Linux networking
   > locking.  If the RTNL semaphore is held, _all_ device and address
   > configuration changes are blocked.  IP addresses cannot be removed,
   > devices cannot be brought up or down, routes cannot be added or
   > deleted, etc.  The RTNL semaphore serializes all of these operations.
   > And it is held during inet_rtm_deladdr() here.

   > So we _know_ that if inetdev_by_index() returns non-NULL someone
   > else (the device itself) holds at least _one_ reference to that
   > object which cannot go away, because all such actions would need
   > to take the RTNL semaphore which we hold.

   > So __in_dev_put() is safe here.

In this case, you want the refcnt decremented without the unnecessary 
test that in_dev_put() would incur.   I was concerned about the 
pairings of __in_dev_get which doesn't increment refcnt with 
__in_dev_put which decrements.  Didn't mean to address that til 
after some feedback, but thank you for clarifying my error here
since I can't trace any pairing with the use of __in_dev_put 
in inet_rtm_deladdr.

   > Arguably, it's being overly clever for questionable gain.
   > It definitely deserves a comment, how about that? :-)

   > Finally, about adding rcu_read_{lock,unlock}() around even
   > in_dev_{get,put}().  I bet that really isn't needed but I cannot
   > articulate why we can get away without it.  For example, if we
   > are given a pair executed in a function like:

   >         in_dev_get();

   >         ...

   >         in_dev_put();

   > who cares if we preempt?  The local function's execution holds the
   > necessary reference, so the object's refcount cannot ever fall to
   > zero.

   > We can't get any RCU callbacks invoked, as a result, so we don't
   > need the rcu_read_{lock,unlock}() calls here.

   > The in_dev_put() uses atomic_dec_and_test(), which provides a memory
   > barrier, so no out-of-order cpu memory references to the object
   > can escape past the decrement to zero of the object reference count.

   > In short, I think adding rcu_read_{lock,unlock}() is very heavy
   > handed and unnecessary.

In Paul McKenney's reference at
www.rdrop.com/users/paulmck/RCU/whatisRCU.html
"Reference counts may be used in conjunction with RCU to maintain 
longer-term references to data structures."  So you're right.  I
was basing those rcu_read_lock extents on the idea that the calling
function has the vision of the need for protection of an
rcu_dereference'd pointer.  Paul has also provided further insight
into discriminating between read-side and update-side uses of
rcu_dereference which I need to incorporate.

Many thanks again and I'll try for a better submission.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-10-01 18:37 Suzanne Wood
@ 2005-10-01 19:29 ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2005-10-01 19:29 UTC (permalink / raw)
  To: Suzanne Wood; +Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, walpole

On Sat, Oct 01, 2005 at 11:37:14AM -0700, Suzanne Wood wrote:
> 
> But the following may be worth considering.
> 
>   > It is also probably good to retain the old __in_dev_get()
> and deprecate it.

I think it's better to get rid of it actually so that if there are
external modules using this they can make sure that they're using
the right function.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-10-01 18:37 Suzanne Wood
  2005-10-01 19:29 ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Suzanne Wood @ 2005-10-01 18:37 UTC (permalink / raw)
  To: herbert; +Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, walpole

Please excuse this restatement of an earlier concern.

  > From suzannew Sat Oct  1 11:00:28 2005

  > With the spotlight leaving in_dev_get, we have the parallel question 
  > of in_dev_put() and __in_dev_put()  both defined with refcnt 
  > decrement, but the preceding underscore may lend itself to an
  > inadvertant pairing and refcnt inaccuracy.

Dave Miller already addressed this question in 
http://www.ussg.iu.edu/hypermail/linux/kernel/0509.3/0757.html 

  > It may not be reasonable to rename __in_dev_put for its parallel definition
  > since its current usage is with __in_dev_get_rtnl() which does not increment 
  > refcnt. 

But the following may be worth considering.

  > It is also probably good to retain the old __in_dev_get()
and deprecate it.

Thank you.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-10-01  7:12 ` Herbert Xu
@ 2005-10-01 18:04   ` Paul E. McKenney
  0 siblings, 0 replies; 27+ messages in thread
From: Paul E. McKenney @ 2005-10-01 18:04 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Suzanne Wood, Robert.Olsson, davem, linux-kernel, netdev, walpole

On Sat, Oct 01, 2005 at 05:12:48PM +1000, Herbert Xu wrote:
> On Fri, Sep 30, 2005 at 11:56:41PM -0700, Suzanne Wood wrote:
> > 
> > But it is interesting to have discarded what was developed yesterday
> > to minimize rcu_dereference impact:
> >
> > >> ----- Original message  -----
> > >> From: Herbert Xu <herbert@gondor.apana.org.au>
> > >> Date: Fri, 30 Sep 2005 11:19:07 +1000
> > >> 
> > >> On Thu, Sep 29, 2005 at 06:16:03PM -0700, Paul E. McKenney wrote:
> > >>> 
> > >>> OK, how about this instead?
> > >>> 
> > >>> rcu_read_lock();
> > >>> in_dev = dev->ip_ptr;
> > >>> if (in_dev) {
> > >>> atomic_inc(&rcu_dereference(in_dev)->refcnt);
> > >>> }
> > >>> rcu_read_unlock();
> > >>> return in_dev;
> > >> 
> > >> Looks great. 
> > 
> > while adding a function call level by wrapping  __in_dev_get_rcu
> > with in_dev_get as suggested here.
> 
> It might look different, but it should compile to the same result.
> GCC should be smart enough to combine the two branches and produce a
> memory barrier only when in_dev is not NULL.
>  
> > The other thing I'd hoped to address in pktgen.c was 
> > removing the __in_dev_put() which decrements refcnt 
> > while __in_dev_get_rcu() does not increment.
> 
> Well spotted.
> 
> Here is a patch on top of the last one to fix this bogus decrement.

Both this and Herbert's prior patch look good to me!

							Thanx, Paul

> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Thanks,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -1673,7 +1673,6 @@ static void pktgen_setup_inject(struct p
>  					pkt_dev->saddr_min = in_dev->ifa_list->ifa_address;
>  					pkt_dev->saddr_max = pkt_dev->saddr_min;
>  				}
> -				__in_dev_put(in_dev);	
>  			}
>  			rcu_read_unlock();
>  		}


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-10-01 18:00 Suzanne Wood
  0 siblings, 0 replies; 27+ messages in thread
From: Suzanne Wood @ 2005-10-01 18:00 UTC (permalink / raw)
  To: herbert; +Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, walpole

With the spotlight leaving in_dev_get, we have the parallel question 
of in_dev_put() and __in_dev_put()  both defined with refcnt 
decrement, but the preceding underscore may lend itself to an
inadvertant pairing and refcnt inaccuracy.

In the following  cscope of linux-2.6.14-rc2 prior to Herbert's patches 
which remove the occurence in pktgen.c and the others are
paired with __in_dev_get_rtnl(), except xfrm4_policy.c():

C symbol: __in_dev_put
  File           Function              Line
  File           Function              Line
0 inetdevice.h   <global>               172 #define __in_dev_put(idev) atomic_dec(&(idev)->refcnt)
1 pktgen.c       pktgen_setup_inject   1687 __in_dev_put(in_dev);
2 devinet.c      inet_rtm_deladdr       412 __in_dev_put(in_dev);
3 igmp.c         igmp_gq_timer_expire   686 __in_dev_put(in_dev);
4 igmp.c         igmp_ifc_timer_expire  698 __in_dev_put(in_dev);
5 igmp.c         igmp_heard_query       801 __in_dev_put(in_dev);
6 igmp.c         ip_mc_down            1228 __in_dev_put(in_dev);
7 igmp.c         ip_mc_down            1231 __in_dev_put(in_dev);
8 igmp.c         ip_mc_find_dev        1310 __in_dev_put(idev);
9 xfrm4_policy.c xfrm4_dst_ifdown       280 __in_dev_put(loopback_idev);

It may not be reasonable to rename __in_dev_put for its parallel definition
since its current usage is with __in_dev_get_rtnl() which does not increment 
refcnt. 

It is also probably good to retain the old __in_dev_get() 
and __in_dev_put() and deprecate them.

Old business:  this is probably stating the obvious,
with Herbert's patches in place, none of the test patch
I'd submitted 8 Sept should take effect.

Thank you.
Suzanne

----- Original Message ----- 
From: "Herbert Xu" <herbert@gondor.apana.org.au>
Sent: Saturday, October 01, 2005 12:12 AM

> On Fri, Sep 30, 2005 at 11:56:41PM -0700, Suzanne Wood wrote:
>> 
>> The other thing I'd hoped to address in pktgen.c was 
>> removing the __in_dev_put() which decrements refcnt 
>> while __in_dev_get_rcu() does not increment.
> 
> Well spotted.
> 
> Here is a patch on top of the last one to fix this bogus decrement.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-10-01  6:56 Suzanne Wood
@ 2005-10-01  7:12 ` Herbert Xu
  2005-10-01 18:04   ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2005-10-01  7:12 UTC (permalink / raw)
  To: Suzanne Wood; +Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, walpole

[-- Attachment #1: Type: text/plain, Size: 1442 bytes --]

On Fri, Sep 30, 2005 at 11:56:41PM -0700, Suzanne Wood wrote:
> 
> But it is interesting to have discarded what was developed yesterday
> to minimize rcu_dereference impact:
>
> >> ----- Original message  -----
> >> From: Herbert Xu <herbert@gondor.apana.org.au>
> >> Date: Fri, 30 Sep 2005 11:19:07 +1000
> >> 
> >> On Thu, Sep 29, 2005 at 06:16:03PM -0700, Paul E. McKenney wrote:
> >>> 
> >>> OK, how about this instead?
> >>> 
> >>> rcu_read_lock();
> >>> in_dev = dev->ip_ptr;
> >>> if (in_dev) {
> >>> atomic_inc(&rcu_dereference(in_dev)->refcnt);
> >>> }
> >>> rcu_read_unlock();
> >>> return in_dev;
> >> 
> >> Looks great. 
> 
> while adding a function call level by wrapping  __in_dev_get_rcu
> with in_dev_get as suggested here.

It might look different, but it should compile to the same result.
GCC should be smart enough to combine the two branches and produce a
memory barrier only when in_dev is not NULL.
 
> The other thing I'd hoped to address in pktgen.c was 
> removing the __in_dev_put() which decrements refcnt 
> while __in_dev_get_rcu() does not increment.

Well spotted.

Here is a patch on top of the last one to fix this bogus decrement.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 336 bytes --]

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -1673,7 +1673,6 @@ static void pktgen_setup_inject(struct p
 					pkt_dev->saddr_min = in_dev->ifa_list->ifa_address;
 					pkt_dev->saddr_max = pkt_dev->saddr_min;
 				}
-				__in_dev_put(in_dev);	
 			}
 			rcu_read_unlock();
 		}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-10-01  6:56 Suzanne Wood
  2005-10-01  7:12 ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Suzanne Wood @ 2005-10-01  6:56 UTC (permalink / raw)
  To: herbert
  Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, suzannew, walpole

Many thanks for addressing this issue.

----- Original Message ----- 
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Sat, 1 Oct 2005 11:13:12 +1000
> 
> On Thu, Sep 29, 2005 at 06:06:50PM -0700, Suzanne Wood wrote:
>> 
>> Are there three cases then?  RCU protection with refcnt, RCU without refcnt,
>> and the bare cast of the dereference? 
> 
> Correct.
> 
> The following patch renames __in_dev_get() to __in_dev_get_rtnl() and
> introduces __in_dev_get_rcu() to cover the second case.
> 
> 1) RCU with refcnt should use in_dev_get().
> 2) RCU without refcnt should use __in_dev_get_rcu().
> 3) All others must hold RTNL and use __in_dev_get_rtnl().
> 

The naming to indicate purpose looks good and the leading underscores
in the prior work did seem to imply wrapping to add functionality.

But it is interesting to have discarded what was developed yesterday
to minimize rcu_dereference impact:

>> ----- Original message  -----
>> From: Herbert Xu <herbert@gondor.apana.org.au>
>> Date: Fri, 30 Sep 2005 11:19:07 +1000
>> 
>> On Thu, Sep 29, 2005 at 06:16:03PM -0700, Paul E. McKenney wrote:
>>> 
>>> OK, how about this instead?
>>> 
>>> rcu_read_lock();
>>> in_dev = dev->ip_ptr;
>>> if (in_dev) {
>>> atomic_inc(&rcu_dereference(in_dev)->refcnt);
>>> }
>>> rcu_read_unlock();
>>> return in_dev;
>> 
>> Looks great. 

while adding a function call level by wrapping  __in_dev_get_rcu
with in_dev_get as suggested here.

> -- 
> diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
> --- a/include/linux/inetdevice.h
> +++ b/include/linux/inetdevice.h
> @@ -142,13 +142,21 @@ static __inline__ int bad_mask(u32 mask,
> 
> #define endfor_ifa(in_dev) }
> 
> +static inline struct in_device *__in_dev_get_rcu(const struct net_device *dev)
> +{
> + struct in_device *in_dev = dev->ip_ptr;
> + if (in_dev)
> + in_dev = rcu_dereference(in_dev);
> + return in_dev;
> +}
> +
> static __inline__ struct in_device *
> in_dev_get(const struct net_device *dev)
> {
>  struct in_device *in_dev;
> 
>  rcu_read_lock();
> - in_dev = dev->ip_ptr;
> + in_dev = __in_dev_get_rcu(dev);
>  if (in_dev)
>  atomic_inc(&in_dev->refcnt);
>  rcu_read_unlock();
> @@ -156,7 +164,7 @@ in_dev_get(const struct net_device *dev)
> }
> 
> static __inline__ struct in_device *
> -__in_dev_get(const struct net_device *dev)
> +__in_dev_get_rtnl(const struct net_device *dev)
> {
>  return (struct in_device*)dev->ip_ptr;
> }

The other thing I'd hoped to address in pktgen.c was 
removing the __in_dev_put() which decrements refcnt 
while __in_dev_get_rcu() does not increment.

> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -1667,7 +1667,7 @@ static void pktgen_setup_inject(struct p
>  struct in_device *in_dev; 
> 
>  rcu_read_lock();
> - in_dev = __in_dev_get(pkt_dev->odev);
> + in_dev = __in_dev_get_rcu(pkt_dev->odev);
>  if (in_dev) {
>  if (in_dev->ifa_list) {
>  pkt_dev->saddr_min = in_dev->ifa_list->ifa_address;

     pkt_dev->saddr_max = pkt_dev->saddr_min;
    }
-   __in_dev_put(in_dev); 
   }
   rcu_read_unlock();

Thank you very much again for resolving this by clarifying both the 
RCU and RTNL usages.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-09-30  1:06 Suzanne Wood
@ 2005-10-01  1:13 ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2005-10-01  1:13 UTC (permalink / raw)
  To: Suzanne Wood; +Cc: paulmck, Robert.Olsson, davem, linux-kernel, netdev, walpole

[-- Attachment #1: Type: text/plain, Size: 987 bytes --]

On Thu, Sep 29, 2005 at 06:06:50PM -0700, Suzanne Wood wrote:
> 
> Are there three cases then?  RCU protection with refcnt, RCU without refcnt,
> and the bare cast of the dereference? 

Correct.

The following patch renames __in_dev_get() to __in_dev_get_rtnl() and
introduces __in_dev_get_rcu() to cover the second case.

1) RCU with refcnt should use in_dev_get().
2) RCU without refcnt should use __in_dev_get_rcu().
3) All others must hold RTNL and use __in_dev_get_rtnl().

There is one exception in net/ipv4/route.c which is in fact a pre-existing
race condition.  I've marked it as such so that we remember to fix it.

This patch is based on suggestions and prior work by Suzanne Wood and
Paul McKenney.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 17044 bytes --]

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2776,7 +2776,7 @@ static u32 bond_glean_dev_ip(struct net_
 		return 0;
 
 	rcu_read_lock();
-	idev = __in_dev_get(dev);
+	idev = __in_dev_get_rcu(dev);
 	if (!idev)
 		goto out;
 
diff --git a/drivers/net/wan/sdlamain.c b/drivers/net/wan/sdlamain.c
--- a/drivers/net/wan/sdlamain.c
+++ b/drivers/net/wan/sdlamain.c
@@ -57,6 +57,7 @@
 #include <linux/ioport.h>	/* request_region(), release_region() */
 #include <linux/wanrouter.h>	/* WAN router definitions */
 #include <linux/wanpipe.h>	/* WANPIPE common user API definitions */
+#include <linux/rcupdate.h>
 
 #include <linux/in.h>
 #include <asm/io.h>		/* phys_to_virt() */
@@ -1268,37 +1269,41 @@ unsigned long get_ip_address(struct net_
 	
 	struct in_ifaddr *ifaddr;
 	struct in_device *in_dev;
+	unsigned long addr = 0;
 
-	if ((in_dev = __in_dev_get(dev)) == NULL){
-		return 0;
+	rcu_read_lock();
+	if ((in_dev = __in_dev_get_rcu(dev)) == NULL){
+		goto out;
 	}
 
 	if ((ifaddr = in_dev->ifa_list)== NULL ){
-		return 0;
+		goto out;
 	}
 	
 	switch (option){
 
 	case WAN_LOCAL_IP:
-		return ifaddr->ifa_local;
+		addr = ifaddr->ifa_local;
 		break;
 	
 	case WAN_POINTOPOINT_IP:
-		return ifaddr->ifa_address;
+		addr = ifaddr->ifa_address;
 		break;	
 
 	case WAN_NETMASK_IP:
-		return ifaddr->ifa_mask;
+		addr = ifaddr->ifa_mask;
 		break;
 
 	case WAN_BROADCAST_IP:
-		return ifaddr->ifa_broadcast;
+		addr = ifaddr->ifa_broadcast;
 		break;
 	default:
-		return 0;
+		break;
 	}
 
-	return 0;
+out:
+	rcu_read_unlock();
+	return addr;
 }	
 
 void add_gateway(sdla_t *card, struct net_device *dev)
diff --git a/drivers/net/wan/syncppp.c b/drivers/net/wan/syncppp.c
--- a/drivers/net/wan/syncppp.c
+++ b/drivers/net/wan/syncppp.c
@@ -769,7 +769,7 @@ static void sppp_cisco_input (struct spp
 		u32 addr = 0, mask = ~0; /* FIXME: is the mask correct? */
 #ifdef CONFIG_INET
 		rcu_read_lock();
-		if ((in_dev = __in_dev_get(dev)) != NULL)
+		if ((in_dev = __in_dev_get_rcu(dev)) != NULL)
 		{
 			for (ifa=in_dev->ifa_list; ifa != NULL;
 				ifa=ifa->ifa_next) {
diff --git a/drivers/net/wireless/strip.c b/drivers/net/wireless/strip.c
--- a/drivers/net/wireless/strip.c
+++ b/drivers/net/wireless/strip.c
@@ -1352,7 +1352,7 @@ static unsigned char *strip_make_packet(
 		struct in_device *in_dev;
 
 		rcu_read_lock();
-		in_dev = __in_dev_get(strip_info->dev);
+		in_dev = __in_dev_get_rcu(strip_info->dev);
 		if (in_dev == NULL) {
 			rcu_read_unlock();
 			return NULL;
@@ -1508,7 +1508,7 @@ static void strip_send(struct strip *str
 
 		brd = addr = 0;
 		rcu_read_lock();
-		in_dev = __in_dev_get(strip_info->dev);
+		in_dev = __in_dev_get_rcu(strip_info->dev);
 		if (in_dev) {
 			if (in_dev->ifa_list) {
 				brd = in_dev->ifa_list->ifa_broadcast;
diff --git a/drivers/parisc/led.c b/drivers/parisc/led.c
--- a/drivers/parisc/led.c
+++ b/drivers/parisc/led.c
@@ -37,6 +37,7 @@
 #include <linux/proc_fs.h>
 #include <linux/ctype.h>
 #include <linux/blkdev.h>
+#include <linux/rcupdate.h>
 #include <asm/io.h>
 #include <asm/processor.h>
 #include <asm/hardware.h>
@@ -358,9 +359,10 @@ static __inline__ int led_get_net_activi
 	/* we are running as tasklet, so locking dev_base 
 	 * for reading should be OK */
 	read_lock(&dev_base_lock);
+	rcu_read_lock();
 	for (dev = dev_base; dev; dev = dev->next) {
 	    struct net_device_stats *stats;
-	    struct in_device *in_dev = __in_dev_get(dev);
+	    struct in_device *in_dev = __in_dev_get_rcu(dev);
 	    if (!in_dev || !in_dev->ifa_list)
 		continue;
 	    if (LOOPBACK(in_dev->ifa_list->ifa_local))
@@ -371,6 +373,7 @@ static __inline__ int led_get_net_activi
 	    rx_total += stats->rx_packets;
 	    tx_total += stats->tx_packets;
 	}
+	rcu_read_unlock();
 	read_unlock(&dev_base_lock);
 
 	retval = 0;
diff --git a/drivers/s390/net/qeth_main.c b/drivers/s390/net/qeth_main.c
--- a/drivers/s390/net/qeth_main.c
+++ b/drivers/s390/net/qeth_main.c
@@ -5200,7 +5200,7 @@ qeth_free_vlan_addresses4(struct qeth_ca
 	if (!card->vlangrp)
 		return;
 	rcu_read_lock();
-	in_dev = __in_dev_get(card->vlangrp->vlan_devices[vid]);
+	in_dev = __in_dev_get_rcu(card->vlangrp->vlan_devices[vid]);
 	if (!in_dev)
 		goto out;
 	for (ifa = in_dev->ifa_list; ifa; ifa = ifa->ifa_next) {
@@ -7725,7 +7725,7 @@ qeth_arp_constructor(struct neighbour *n
 		goto out;
 
 	rcu_read_lock();
-	in_dev = rcu_dereference(__in_dev_get(dev));
+	in_dev = __in_dev_get_rcu(dev);
 	if (in_dev == NULL) {
 		rcu_read_unlock();
 		return -EINVAL;
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -142,13 +142,21 @@ static __inline__ int bad_mask(u32 mask,
 
 #define endfor_ifa(in_dev) }
 
+static inline struct in_device *__in_dev_get_rcu(const struct net_device *dev)
+{
+	struct in_device *in_dev = dev->ip_ptr;
+	if (in_dev)
+		in_dev = rcu_dereference(in_dev);
+	return in_dev;
+}
+
 static __inline__ struct in_device *
 in_dev_get(const struct net_device *dev)
 {
 	struct in_device *in_dev;
 
 	rcu_read_lock();
-	in_dev = dev->ip_ptr;
+	in_dev = __in_dev_get_rcu(dev);
 	if (in_dev)
 		atomic_inc(&in_dev->refcnt);
 	rcu_read_unlock();
@@ -156,7 +164,7 @@ in_dev_get(const struct net_device *dev)
 }
 
 static __inline__ struct in_device *
-__in_dev_get(const struct net_device *dev)
+__in_dev_get_rtnl(const struct net_device *dev)
 {
 	return (struct in_device*)dev->ip_ptr;
 }
diff --git a/net/atm/clip.c b/net/atm/clip.c
--- a/net/atm/clip.c
+++ b/net/atm/clip.c
@@ -310,7 +310,7 @@ static int clip_constructor(struct neigh
 	if (neigh->type != RTN_UNICAST) return -EINVAL;
 
 	rcu_read_lock();
-	in_dev = rcu_dereference(__in_dev_get(dev));
+	in_dev = __in_dev_get_rcu(dev);
 	if (!in_dev) {
 		rcu_read_unlock();
 		return -EINVAL;
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -703,7 +703,7 @@ int netpoll_setup(struct netpoll *np)
 
 	if (!np->local_ip) {
 		rcu_read_lock();
-		in_dev = __in_dev_get(ndev);
+		in_dev = __in_dev_get_rcu(ndev);
 
 		if (!in_dev || !in_dev->ifa_list) {
 			rcu_read_unlock();
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -1667,7 +1667,7 @@ static void pktgen_setup_inject(struct p
 			struct in_device *in_dev; 
 
 			rcu_read_lock();
-			in_dev = __in_dev_get(pkt_dev->odev);
+			in_dev = __in_dev_get_rcu(pkt_dev->odev);
 			if (in_dev) {
 				if (in_dev->ifa_list) {
 					pkt_dev->saddr_min = in_dev->ifa_list->ifa_address;
diff --git a/net/econet/af_econet.c b/net/econet/af_econet.c
--- a/net/econet/af_econet.c
+++ b/net/econet/af_econet.c
@@ -406,7 +406,7 @@ static int econet_sendmsg(struct kiocb *
 		unsigned long network = 0;
 
 		rcu_read_lock();
-		idev = __in_dev_get(dev);
+		idev = __in_dev_get_rcu(dev);
 		if (idev) {
 			if (idev->ifa_list)
 				network = ntohl(idev->ifa_list->ifa_address) & 
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -241,7 +241,7 @@ static int arp_constructor(struct neighb
 	neigh->type = inet_addr_type(addr);
 
 	rcu_read_lock();
-	in_dev = rcu_dereference(__in_dev_get(dev));
+	in_dev = __in_dev_get_rcu(dev);
 	if (in_dev == NULL) {
 		rcu_read_unlock();
 		return -EINVAL;
@@ -990,8 +990,8 @@ static int arp_req_set(struct arpreq *r,
 			ipv4_devconf.proxy_arp = 1;
 			return 0;
 		}
-		if (__in_dev_get(dev)) {
-			__in_dev_get(dev)->cnf.proxy_arp = 1;
+		if (__in_dev_get_rtnl(dev)) {
+			__in_dev_get_rtnl(dev)->cnf.proxy_arp = 1;
 			return 0;
 		}
 		return -ENXIO;
@@ -1096,8 +1096,8 @@ static int arp_req_delete(struct arpreq 
 				ipv4_devconf.proxy_arp = 0;
 				return 0;
 			}
-			if (__in_dev_get(dev)) {
-				__in_dev_get(dev)->cnf.proxy_arp = 0;
+			if (__in_dev_get_rtnl(dev)) {
+				__in_dev_get_rtnl(dev)->cnf.proxy_arp = 0;
 				return 0;
 			}
 			return -ENXIO;
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -351,7 +351,7 @@ static int inet_insert_ifa(struct in_ifa
 
 static int inet_set_ifa(struct net_device *dev, struct in_ifaddr *ifa)
 {
-	struct in_device *in_dev = __in_dev_get(dev);
+	struct in_device *in_dev = __in_dev_get_rtnl(dev);
 
 	ASSERT_RTNL();
 
@@ -449,7 +449,7 @@ static int inet_rtm_newaddr(struct sk_bu
 		goto out;
 
 	rc = -ENOBUFS;
-	if ((in_dev = __in_dev_get(dev)) == NULL) {
+	if ((in_dev = __in_dev_get_rtnl(dev)) == NULL) {
 		in_dev = inetdev_init(dev);
 		if (!in_dev)
 			goto out;
@@ -584,7 +584,7 @@ int devinet_ioctl(unsigned int cmd, void
 	if (colon)
 		*colon = ':';
 
-	if ((in_dev = __in_dev_get(dev)) != NULL) {
+	if ((in_dev = __in_dev_get_rtnl(dev)) != NULL) {
 		if (tryaddrmatch) {
 			/* Matthias Andree */
 			/* compare label and address (4.4BSD style) */
@@ -748,7 +748,7 @@ rarok:
 
 static int inet_gifconf(struct net_device *dev, char __user *buf, int len)
 {
-	struct in_device *in_dev = __in_dev_get(dev);
+	struct in_device *in_dev = __in_dev_get_rtnl(dev);
 	struct in_ifaddr *ifa;
 	struct ifreq ifr;
 	int done = 0;
@@ -791,7 +791,7 @@ u32 inet_select_addr(const struct net_de
 	struct in_device *in_dev;
 
 	rcu_read_lock();
-	in_dev = __in_dev_get(dev);
+	in_dev = __in_dev_get_rcu(dev);
 	if (!in_dev)
 		goto no_in_dev;
 
@@ -818,7 +818,7 @@ no_in_dev:
 	read_lock(&dev_base_lock);
 	rcu_read_lock();
 	for (dev = dev_base; dev; dev = dev->next) {
-		if ((in_dev = __in_dev_get(dev)) == NULL)
+		if ((in_dev = __in_dev_get_rcu(dev)) == NULL)
 			continue;
 
 		for_primary_ifa(in_dev) {
@@ -887,7 +887,7 @@ u32 inet_confirm_addr(const struct net_d
 
 	if (dev) {
 		rcu_read_lock();
-		if ((in_dev = __in_dev_get(dev)))
+		if ((in_dev = __in_dev_get_rcu(dev)))
 			addr = confirm_addr_indev(in_dev, dst, local, scope);
 		rcu_read_unlock();
 
@@ -897,7 +897,7 @@ u32 inet_confirm_addr(const struct net_d
 	read_lock(&dev_base_lock);
 	rcu_read_lock();
 	for (dev = dev_base; dev; dev = dev->next) {
-		if ((in_dev = __in_dev_get(dev))) {
+		if ((in_dev = __in_dev_get_rcu(dev))) {
 			addr = confirm_addr_indev(in_dev, dst, local, scope);
 			if (addr)
 				break;
@@ -957,7 +957,7 @@ static int inetdev_event(struct notifier
 			 void *ptr)
 {
 	struct net_device *dev = ptr;
-	struct in_device *in_dev = __in_dev_get(dev);
+	struct in_device *in_dev = __in_dev_get_rtnl(dev);
 
 	ASSERT_RTNL();
 
@@ -1078,7 +1078,7 @@ static int inet_dump_ifaddr(struct sk_bu
 		if (idx > s_idx)
 			s_ip_idx = 0;
 		rcu_read_lock();
-		if ((in_dev = __in_dev_get(dev)) == NULL) {
+		if ((in_dev = __in_dev_get_rcu(dev)) == NULL) {
 			rcu_read_unlock();
 			continue;
 		}
@@ -1149,7 +1149,7 @@ void inet_forward_change(void)
 	for (dev = dev_base; dev; dev = dev->next) {
 		struct in_device *in_dev;
 		rcu_read_lock();
-		in_dev = __in_dev_get(dev);
+		in_dev = __in_dev_get_rcu(dev);
 		if (in_dev)
 			in_dev->cnf.forwarding = on;
 		rcu_read_unlock();
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -173,7 +173,7 @@ int fib_validate_source(u32 src, u32 dst
 
 	no_addr = rpf = 0;
 	rcu_read_lock();
-	in_dev = __in_dev_get(dev);
+	in_dev = __in_dev_get_rcu(dev);
 	if (in_dev) {
 		no_addr = in_dev->ifa_list == NULL;
 		rpf = IN_DEV_RPFILTER(in_dev);
@@ -607,7 +607,7 @@ static int fib_inetaddr_event(struct not
 static int fib_netdev_event(struct notifier_block *this, unsigned long event, void *ptr)
 {
 	struct net_device *dev = ptr;
-	struct in_device *in_dev = __in_dev_get(dev);
+	struct in_device *in_dev = __in_dev_get_rtnl(dev);
 
 	if (event == NETDEV_UNREGISTER) {
 		fib_disable_ip(dev, 2);
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1087,7 +1087,7 @@ fib_convert_rtentry(int cmd, struct nlms
 		rta->rta_oif = &dev->ifindex;
 		if (colon) {
 			struct in_ifaddr *ifa;
-			struct in_device *in_dev = __in_dev_get(dev);
+			struct in_device *in_dev = __in_dev_get_rtnl(dev);
 			if (!in_dev)
 				return -ENODEV;
 			*colon = ':';
@@ -1268,7 +1268,7 @@ int fib_sync_up(struct net_device *dev)
 			}
 			if (nh->nh_dev == NULL || !(nh->nh_dev->flags&IFF_UP))
 				continue;
-			if (nh->nh_dev != dev || __in_dev_get(dev) == NULL)
+			if (nh->nh_dev != dev || !__in_dev_get_rtnl(dev))
 				continue;
 			alive++;
 			spin_lock_bh(&fib_multipath_lock);
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1323,7 +1323,7 @@ static struct in_device * ip_mc_find_dev
 	}
 	if (dev) {
 		imr->imr_ifindex = dev->ifindex;
-		idev = __in_dev_get(dev);
+		idev = __in_dev_get_rtnl(dev);
 	}
 	return idev;
 }
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1104,10 +1104,10 @@ static int ipgre_open(struct net_device 
 			return -EADDRNOTAVAIL;
 		dev = rt->u.dst.dev;
 		ip_rt_put(rt);
-		if (__in_dev_get(dev) == NULL)
+		if (__in_dev_get_rtnl(dev) == NULL)
 			return -EADDRNOTAVAIL;
 		t->mlink = dev->ifindex;
-		ip_mc_inc_group(__in_dev_get(dev), t->parms.iph.daddr);
+		ip_mc_inc_group(__in_dev_get_rtnl(dev), t->parms.iph.daddr);
 	}
 	return 0;
 }
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -149,7 +149,7 @@ struct net_device *ipmr_new_tunnel(struc
 		if (err == 0 && (dev = __dev_get_by_name(p.name)) != NULL) {
 			dev->flags |= IFF_MULTICAST;
 
-			in_dev = __in_dev_get(dev);
+			in_dev = __in_dev_get_rtnl(dev);
 			if (in_dev == NULL && (in_dev = inetdev_init(dev)) == NULL)
 				goto failure;
 			in_dev->cnf.rp_filter = 0;
@@ -278,7 +278,7 @@ static int vif_delete(int vifi)
 
 	dev_set_allmulti(dev, -1);
 
-	if ((in_dev = __in_dev_get(dev)) != NULL) {
+	if ((in_dev = __in_dev_get_rtnl(dev)) != NULL) {
 		in_dev->cnf.mc_forwarding--;
 		ip_rt_multicast_event(in_dev);
 	}
@@ -421,7 +421,7 @@ static int vif_add(struct vifctl *vifc, 
 		return -EINVAL;
 	}
 
-	if ((in_dev = __in_dev_get(dev)) == NULL)
+	if ((in_dev = __in_dev_get_rtnl(dev)) == NULL)
 		return -EADDRNOTAVAIL;
 	in_dev->cnf.mc_forwarding++;
 	dev_set_allmulti(dev, +1);
diff --git a/net/ipv4/netfilter/ip_conntrack_netbios_ns.c b/net/ipv4/netfilter/ip_conntrack_netbios_ns.c
--- a/net/ipv4/netfilter/ip_conntrack_netbios_ns.c
+++ b/net/ipv4/netfilter/ip_conntrack_netbios_ns.c
@@ -58,7 +58,7 @@ static int help(struct sk_buff **pskb,
 		goto out;
 
 	rcu_read_lock();
-	in_dev = __in_dev_get(rt->u.dst.dev);
+	in_dev = __in_dev_get_rcu(rt->u.dst.dev);
 	if (in_dev != NULL) {
 		for_primary_ifa(in_dev) {
 			if (ifa->ifa_broadcast == iph->daddr) {
diff --git a/net/ipv4/netfilter/ipt_REDIRECT.c b/net/ipv4/netfilter/ipt_REDIRECT.c
--- a/net/ipv4/netfilter/ipt_REDIRECT.c
+++ b/net/ipv4/netfilter/ipt_REDIRECT.c
@@ -93,7 +93,7 @@ redirect_target(struct sk_buff **pskb,
 		newdst = 0;
 		
 		rcu_read_lock();
-		indev = __in_dev_get((*pskb)->dev);
+		indev = __in_dev_get_rcu((*pskb)->dev);
 		if (indev && (ifa = indev->ifa_list))
 			newdst = ifa->ifa_local;
 		rcu_read_unlock();
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2128,7 +2128,7 @@ int ip_route_input(struct sk_buff *skb, 
 		struct in_device *in_dev;
 
 		rcu_read_lock();
-		if ((in_dev = __in_dev_get(dev)) != NULL) {
+		if ((in_dev = __in_dev_get_rcu(dev)) != NULL) {
 			int our = ip_check_mc(in_dev, daddr, saddr,
 				skb->nh.iph->protocol);
 			if (our
@@ -2443,7 +2443,9 @@ static int ip_route_output_slow(struct r
 		err = -ENODEV;
 		if (dev_out == NULL)
 			goto out;
-		if (__in_dev_get(dev_out) == NULL) {
+
+		/* RACE: Check return value of inet_select_addr instead. */
+		if (__in_dev_get_rtnl(dev_out) == NULL) {
 			dev_put(dev_out);
 			goto out;	/* Wrong error code */
 		}
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1806,7 +1806,7 @@ static void sit_add_v4_addrs(struct inet
 	}
 
         for (dev = dev_base; dev != NULL; dev = dev->next) {
-		struct in_device * in_dev = __in_dev_get(dev);
+		struct in_device * in_dev = __in_dev_get_rtnl(dev);
 		if (in_dev && (dev->flags & IFF_UP)) {
 			struct in_ifaddr * ifa;
 
diff --git a/net/irda/irlan/irlan_eth.c b/net/irda/irlan/irlan_eth.c
--- a/net/irda/irlan/irlan_eth.c
+++ b/net/irda/irlan/irlan_eth.c
@@ -310,7 +310,7 @@ void irlan_eth_send_gratuitous_arp(struc
 #ifdef CONFIG_INET
 	IRDA_DEBUG(4, "IrLAN: Sending gratuitous ARP\n");
 	rcu_read_lock();
-	in_dev = __in_dev_get(dev);
+	in_dev = __in_dev_get_rcu(dev);
 	if (in_dev == NULL)
 		goto out;
 	if (in_dev->ifa_list)
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -147,7 +147,7 @@ static void sctp_v4_copy_addrlist(struct
 	struct sctp_sockaddr_entry *addr;
 
 	rcu_read_lock();
-	if ((in_dev = __in_dev_get(dev)) == NULL) {
+	if ((in_dev = __in_dev_get_rcu(dev)) == NULL) {
 		rcu_read_unlock();
 		return;
 	}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-09-30  1:16         ` Paul E. McKenney
@ 2005-09-30  1:19           ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2005-09-30  1:19 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Suzanne Wood, Robert.Olsson, davem, linux-kernel, netdev, walpole

On Thu, Sep 29, 2005 at 06:16:03PM -0700, Paul E. McKenney wrote:
> 
> OK, how about this instead?
> 
> 	rcu_read_lock();
> 	in_dev = dev->ip_ptr;
> 	if (in_dev) {
> 		atomic_inc(&rcu_dereference(in_dev)->refcnt);
> 	}
> 	rcu_read_unlock();
> 	return in_dev;

Looks great.  Thanks Paul.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-09-30  1:04       ` Herbert Xu
@ 2005-09-30  1:16         ` Paul E. McKenney
  2005-09-30  1:19           ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2005-09-30  1:16 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Suzanne Wood, Robert.Olsson, davem, linux-kernel, netdev, walpole

On Fri, Sep 30, 2005 at 11:04:04AM +1000, Herbert Xu wrote:
> On Thu, Sep 29, 2005 at 05:36:42PM -0700, Paul E. McKenney wrote:
> >
> > > 	rcu_read_lock();
> > > 	in_dev = dev->ip_ptr;
> > > 	if (in_dev) {
> > > 		in_dev = rcu_dereference(in_dev);
> > > 		atomic_inc(&in_dev->refcnt);
> > > 	}
> > > 	rcu_read_unlock();
> > > 	return in_dev;
> > 
> > How about:
> > 
> > 	rcu_read_lock();
> > 	in_dev = dev->ip_ptr;
> > 	if (rcu_dereference(in_dev)) {
> > 		atomic_inc(&in_dev->refcnt);
> > 	}
> > 	rcu_read_unlock();
> > 	return in_dev;
> 
> With this the barrier will taken even when in_dev is NULL.
> 
> I agree this isn't such a big deal since it only impacts Alpha and then
> only when in_dev is NULL.  But as we already do the branch anyway to
> increment the reference count, we might as well make things a little
> better for Alpha.

OK, how about this instead?

	rcu_read_lock();
	in_dev = dev->ip_ptr;
	if (in_dev) {
		atomic_inc(&rcu_dereference(in_dev)->refcnt);
	}
	rcu_read_unlock();
	return in_dev;

						Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-09-30  1:06 Suzanne Wood
  2005-10-01  1:13 ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Suzanne Wood @ 2005-09-30  1:06 UTC (permalink / raw)
  To: paulmck, suzannew
  Cc: Robert.Olsson, davem, herbert, linux-kernel, netdev, walpole

In reviewing the 44 kernel uses of __in_dev_get and seeing many 
cases in 13 of 20 C code files of insertions of rcu_read_lock with 
and without the rcu_dereference that is indicated, so it does appear 
often to be programmer intent.  

Of the programs using __in_dev_get that don't include rcu, devinet.c 
and igmp.c use an rtnl lock.  Five other programs that use __in_dev_get 
without rcu have rtnl locking in the program source code, but I need 
to actually look further into the call tree to say more.

Are there three cases then?  RCU protection with refcnt, RCU without refcnt,
and the bare cast of the dereference? 

Thank you very much for getting it back on track.

  > From paulmck@us.ibm.com  Thu Sep 29 17:23:18 2005

  > Is there any case where __in_dev_get() might be called without
  > needing to be wrapped with rcu_dereference()?  If so, then I
  > agree (FWIW, given my meagre knowledge of Linux networking).

  > If all __in_dev_get() invocations need to be wrapped in
  > rcu_dereference(), then it seems to me that there would be
  > motivation to bury rcu_dereference() in __in_dev_get().

  > > > Some callers of __in_dev_get() don't need rcu_dereference at all
  > > > because they're protected by the rtnl.
  > > 
  > > > BTW, could you please move the rcu_dereference in in_dev_get()
  > > > into the if clause? The barrier is not needed when ip_ptr is
  > > > NULL.
  > > 
  > > The trouble with that may be that there are three events, the
  > > dereference, the assignment, and the conditional test.  The
  > > rcu_dereference() is meant to assure deferred destruction
  > > throughout.

  > One only needs an rcu_dereference() once on the data-flow path from
  > fetching the RCU-protected pointer to dereferencing that pointer.
  > If the pointer is NULL, there is no way you can dereference it,
  > so, technically, Herbert is quite correct.

  > However, rcu_dereference() only generates a memory barrier on DEC
  > Alpha, so there is normally no penalty for using it in the NULL-pointer
  > case.  So, when using rcu_dereference() unconditionally simplifies
  > the code, it may make sense to "just do it".

  > 							Thanx, Paul


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-09-30  0:36     ` Paul E. McKenney
@ 2005-09-30  1:04       ` Herbert Xu
  2005-09-30  1:16         ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2005-09-30  1:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Suzanne Wood, Robert.Olsson, davem, linux-kernel, netdev, walpole

On Thu, Sep 29, 2005 at 05:36:42PM -0700, Paul E. McKenney wrote:
>
> > 	rcu_read_lock();
> > 	in_dev = dev->ip_ptr;
> > 	if (in_dev) {
> > 		in_dev = rcu_dereference(in_dev);
> > 		atomic_inc(&in_dev->refcnt);
> > 	}
> > 	rcu_read_unlock();
> > 	return in_dev;
> 
> How about:
> 
> 	rcu_read_lock();
> 	in_dev = dev->ip_ptr;
> 	if (rcu_dereference(in_dev)) {
> 		atomic_inc(&in_dev->refcnt);
> 	}
> 	rcu_read_unlock();
> 	return in_dev;

With this the barrier will taken even when in_dev is NULL.

I agree this isn't such a big deal since it only impacts Alpha and then
only when in_dev is NULL.  But as we already do the branch anyway to
increment the reference count, we might as well make things a little
better for Alpha.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-09-30  0:27   ` Herbert Xu
@ 2005-09-30  0:36     ` Paul E. McKenney
  2005-09-30  1:04       ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2005-09-30  0:36 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Suzanne Wood, Robert.Olsson, davem, linux-kernel, netdev, walpole

On Fri, Sep 30, 2005 at 10:27:19AM +1000, Herbert Xu wrote:
> On Thu, Sep 29, 2005 at 05:23:46PM -0700, Paul E. McKenney wrote:
> > 
> > Is there any case where __in_dev_get() might be called without
> > needing to be wrapped with rcu_dereference()?  If so, then I
> > agree (FWIW, given my meagre knowledge of Linux networking).
> 
> Yes.  All paths that call __in_dev_get() under the rtnl do not
> need rcu_dereference (or any RCU at all) since the rtnl prevents
> any ip_ptr modification from occuring.
> 
> > However, rcu_dereference() only generates a memory barrier on DEC
> > Alpha, so there is normally no penalty for using it in the NULL-pointer
> > case.  So, when using rcu_dereference() unconditionally simplifies
> > the code, it may make sense to "just do it".
> 
> Here is what the code would look like:
> 
> 	rcu_read_lock();
> 	in_dev = dev->ip_ptr;
> 	if (in_dev) {
> 		in_dev = rcu_dereference(in_dev);
> 		atomic_inc(&in_dev->refcnt);
> 	}
> 	rcu_read_unlock();
> 	return in_dev;

How about:

	rcu_read_lock();
	in_dev = dev->ip_ptr;
	if (rcu_dereference(in_dev)) {
		atomic_inc(&in_dev->refcnt);
	}
	rcu_read_unlock();
	return in_dev;

Admittedly only saves one line, but...

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-09-30  0:23 ` Paul E. McKenney
@ 2005-09-30  0:27   ` Herbert Xu
  2005-09-30  0:36     ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2005-09-30  0:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Suzanne Wood, Robert.Olsson, davem, linux-kernel, netdev, walpole

On Thu, Sep 29, 2005 at 05:23:46PM -0700, Paul E. McKenney wrote:
> 
> Is there any case where __in_dev_get() might be called without
> needing to be wrapped with rcu_dereference()?  If so, then I
> agree (FWIW, given my meagre knowledge of Linux networking).

Yes.  All paths that call __in_dev_get() under the rtnl do not
need rcu_dereference (or any RCU at all) since the rtnl prevents
any ip_ptr modification from occuring.

> However, rcu_dereference() only generates a memory barrier on DEC
> Alpha, so there is normally no penalty for using it in the NULL-pointer
> case.  So, when using rcu_dereference() unconditionally simplifies
> the code, it may make sense to "just do it".

Here is what the code would look like:

	rcu_read_lock();
	in_dev = dev->ip_ptr;
	if (in_dev) {
		in_dev = rcu_dereference(in_dev);
		atomic_inc(&in_dev->refcnt);
	}
	rcu_read_unlock();
	return in_dev;

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-09-29 23:30 Suzanne Wood
  2005-09-30  0:21 ` Herbert Xu
@ 2005-09-30  0:23 ` Paul E. McKenney
  2005-09-30  0:27   ` Herbert Xu
  1 sibling, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2005-09-30  0:23 UTC (permalink / raw)
  To: Suzanne Wood; +Cc: herbert, Robert.Olsson, davem, linux-kernel, netdev, walpole

On Thu, Sep 29, 2005 at 04:30:28PM -0700, Suzanne Wood wrote:
> > Date: Fri, 30 Sep 2005 07:28:36 +1000
> > From: Herbert Xu <herbert@gondor.apana.org.au>
> 
> > On Thu, Sep 29, 2005 at 09:02:29AM -0700, Suzanne Wood wrote:
> > > 
> > > The exchange below suggests that it is equally important 
> > > to have the rcu_dereference() in __in_dev_get(), so the 
> > > idea of the only difference between in_dev_get and 
> > > __in_dev_get being the refcnt may be accepted.
> 
> > With __in_dev_get() it's the caller's responsibility to ensure
> > that RCU works correctly.  Therefore if any rcu_dereference is
> > needed it should be done by the caller.
> 
> This sounds reasonable to me.  Does everyone agree? 

Is there any case where __in_dev_get() might be called without
needing to be wrapped with rcu_dereference()?  If so, then I
agree (FWIW, given my meagre knowledge of Linux networking).

If all __in_dev_get() invocations need to be wrapped in
rcu_dereference(), then it seems to me that there would be
motivation to bury rcu_dereference() in __in_dev_get().

> > Some callers of __in_dev_get() don't need rcu_dereference at all
> > because they're protected by the rtnl.
> 
> > BTW, could you please move the rcu_dereference in in_dev_get()
> > into the if clause? The barrier is not needed when ip_ptr is
> > NULL.
> 
> The trouble with that may be that there are three events, the
> dereference, the assignment, and the conditional test.  The
> rcu_dereference() is meant to assure deferred destruction
> throughout.

One only needs an rcu_dereference() once on the data-flow path from
fetching the RCU-protected pointer to dereferencing that pointer.
If the pointer is NULL, there is no way you can dereference it,
so, technically, Herbert is quite correct.

However, rcu_dereference() only generates a memory barrier on DEC
Alpha, so there is normally no penalty for using it in the NULL-pointer
case.  So, when using rcu_dereference() unconditionally simplifies
the code, it may make sense to "just do it".

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-09-29 23:59 Suzanne Wood
@ 2005-09-30  0:23 ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2005-09-30  0:23 UTC (permalink / raw)
  To: Suzanne Wood; +Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, walpole

On Thu, Sep 29, 2005 at 04:59:56PM -0700, Suzanne Wood wrote:
> Sorry to be thinking on-line, but if you mean this:
> 
>   if (in_dev = rcu_dereference(dev->ip_ptr))
> 
> I think that's fine.

Close.  What I had in mind is

	rcu_read_lock();
	in_dev = dev->ip_ptr;
	if (in_dev) {
		in_dev = rcu_dereference(in_dev);
		atomic_inc(&in_dev->refcnt);
	}
	rcu_read_unlock();
	return in_dev;

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-09-29 23:30 Suzanne Wood
@ 2005-09-30  0:21 ` Herbert Xu
  2005-09-30  0:23 ` Paul E. McKenney
  1 sibling, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2005-09-30  0:21 UTC (permalink / raw)
  To: Suzanne Wood; +Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, walpole

On Thu, Sep 29, 2005 at 04:30:28PM -0700, Suzanne Wood wrote:
> 
> > BTW, could you please move the rcu_dereference in in_dev_get()
> > into the if clause? The barrier is not needed when ip_ptr is
> > NULL.
> 
> The trouble with that may be that there are three events, the
> dereference, the assignment, and the conditional test.  The
> rcu_dereference() is meant to assure deferred destruction
> throughout.

The deferred destruction is guaranteed here by the reference count.
The only purpose served by rcu_dereference() in in_dev_get() is to
prevent the user from seeing pre-initialisation data.

When the pointer is NULL, you can't see any data at all, let alone
pre-initialisation data.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-09-29 23:59 Suzanne Wood
  2005-09-30  0:23 ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Suzanne Wood @ 2005-09-29 23:59 UTC (permalink / raw)
  To: herbert; +Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, walpole

Sorry to be thinking on-line, but if you mean this:

  if (in_dev = rcu_dereference(dev->ip_ptr))

I think that's fine.

  > From suzannew Thu Sep 29 16:39:57 2005

  >   > From suzannew Thu Sep 29 16:30:28 2005

  >   > > From: Herbert Xu 30 Sep 2005 07:28

  >   > > BTW, could you please move the rcu_dereference in in_dev_get()
  >   > > into the if clause? The barrier is not needed when ip_ptr is
  >   > > NULL.

  >   > The trouble with that may be that there are three events, the
  >   > dereference, the assignment, and the conditional test.  The
  >   > rcu_dereference() is meant to assure deferred destruction
  >   > throughout.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-09-29 23:39 Suzanne Wood
  0 siblings, 0 replies; 27+ messages in thread
From: Suzanne Wood @ 2005-09-29 23:39 UTC (permalink / raw)
  To: herbert; +Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, walpole

  > From suzannew Thu Sep 29 16:30:28 2005

  > > From: Herbert Xu 30 Sep 2005 07:28

  > > BTW, could you please move the rcu_dereference in in_dev_get()
  > > into the if clause? The barrier is not needed when ip_ptr is
  > > NULL.

  > The trouble with that may be that there are three events, the
  > dereference, the assignment, and the conditional test.  The
  > rcu_dereference() is meant to assure deferred destruction
  > throughout.

Sorry, I was thinking in terms of the rcu_read_lock, so this is misstated.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-09-29 23:30 Suzanne Wood
  2005-09-30  0:21 ` Herbert Xu
  2005-09-30  0:23 ` Paul E. McKenney
  0 siblings, 2 replies; 27+ messages in thread
From: Suzanne Wood @ 2005-09-29 23:30 UTC (permalink / raw)
  To: herbert; +Cc: Robert.Olsson, davem, linux-kernel, netdev, paulmck, walpole

> Date: Fri, 30 Sep 2005 07:28:36 +1000
> From: Herbert Xu <herbert@gondor.apana.org.au>

> On Thu, Sep 29, 2005 at 09:02:29AM -0700, Suzanne Wood wrote:
> > 
> > The exchange below suggests that it is equally important 
> > to have the rcu_dereference() in __in_dev_get(), so the 
> > idea of the only difference between in_dev_get and 
> > __in_dev_get being the refcnt may be accepted.

> With __in_dev_get() it's the caller's responsibility to ensure
> that RCU works correctly.  Therefore if any rcu_dereference is
> needed it should be done by the caller.

This sounds reasonable to me.  Does everyone agree? 

> Some callers of __in_dev_get() don't need rcu_dereference at all
> because they're protected by the rtnl.

> BTW, could you please move the rcu_dereference in in_dev_get()
> into the if clause? The barrier is not needed when ip_ptr is
> NULL.

The trouble with that may be that there are three events, the
dereference, the assignment, and the conditional test.  The
rcu_dereference() is meant to assure deferred destruction
throughout.

Thank you very much for your suggestions.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-09-29 16:02 Suzanne Wood
@ 2005-09-29 21:28 ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2005-09-29 21:28 UTC (permalink / raw)
  To: Suzanne Wood; +Cc: paulmck, Robert.Olsson, davem, linux-kernel, netdev, walpole

On Thu, Sep 29, 2005 at 09:02:29AM -0700, Suzanne Wood wrote:
> 
> The exchange below suggests that it is equally important 
> to have the rcu_dereference() in __in_dev_get(), so the 
> idea of the only difference between in_dev_get and 
> __in_dev_get being the refcnt may be accepted.

With __in_dev_get() it's the caller's responsibility to ensure
that RCU works correctly.  Therefore if any rcu_dereference is
needed it should be done by the caller.

Some callers of __in_dev_get() don't need rcu_dereference at all
because they're protected by the rtnl.

BTW, could you please move the rcu_dereference in in_dev_get()
into the if clause? The barrier is not needed when ip_ptr is
NULL.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-09-29 16:02 Suzanne Wood
  2005-09-29 21:28 ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Suzanne Wood @ 2005-09-29 16:02 UTC (permalink / raw)
  To: paulmck
  Cc: Robert.Olsson, davem, herbert, linux-kernel, netdev, suzannew, walpole

The original motivation for this patch was in __in_dev_get 
usage.  I'll try to test a build, but should submittals be 
incremental? first addressing in_dev_get, then 
__in_dev_get?  What seems resolved so far follows.

The exchange below suggests that it is equally important 
to have the rcu_dereference() in __in_dev_get(), so the 
idea of the only difference between in_dev_get and 
__in_dev_get being the refcnt may be accepted.

Correct usage may be a question with the mismatched 
definitions (in terms of refcnt) of __in_dev_get() 
and __in_dev_put() that superficially appear 
paired and this may merit a comment.  If interested, 
examples are mentioned in 
www.uwsg.iu.edu/hypermail/linux/kernel/0509.1/0184.html
and
www.ussg.iu.edu/hypermail/linux/kernel/0509.3/0757.html

But when the refcnt is employed for the DEC Alpha, 
rcu-protection or other locking must be in place for 
multiple CPUs, which apparently affirms the value 
of the marking of an rcu read-side critical section 
done by the calling function which has the vision of the 
extent of use of the protected dereference.

Is this all reasonable to you?
Thank you very much.

----- Original Message ----- 
From: Paul E. McKenney
Sent: Wednesday, September 28, 2005 7:51 AM


> On Wed, Sep 28, 2005 at 12:55:45PM +1000, Herbert Xu wrote:
>> David S. Miller wrote:
>> > 
>> > I agree with the changes to add rcu_dereference() use.
>> > Those were definitely lacking and needed.
>> 
>> Actually I'm not so sure that they are all needed.  I only looked
>> at the > guarantee correct code.  We really need to look at each case
>> individually.
> 
> Yep, these two APIs are only part of the solution.
> 
> The reference-count approach is only guaranteed to work if the kernel
> thread that did the reference-count increment is later referencing that
> same data element.  Otherwise, one has the following possible situation
> on DEC Alpha:
> 
> o CPU 0 initializes and inserts a new element into the data
> structure, using rcu_assign_pointer() to provide any needed
> memory barriers.  (Or, if RCU is not being used, under the
> appropriate update-side lock.)
> 
> o CPU 1 acquires a reference to this new element, presumably
> using either a lock or rcu_read_lock() and rcu_dereference()
> in order to do so safely.  CPU 1 then increments the reference
> count.
> 
> o CPU 2 picks up a pointer to this new element, but in a way
> that relies on the reference count having been incremented,
> without using locking, rcu_read_lock(), rcu_dereference(),
> and so on.
> 
> This CPU can then see the pre-initialized contents of the
> newly inserted data structure (again, but only on DEC Alpha).
> 
> Again, if the same kernel thread that incremented the reference count
> is later accessing it, no problem, even on Alpha.
> 
> Thanx, Paul
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-09-28 14:51     ` Paul E. McKenney
@ 2005-09-28 22:11       ` Herbert Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Herbert Xu @ 2005-09-28 22:11 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: David S. Miller, suzannew, linux-kernel, Robert.Olsson, walpole, netdev

On Wed, Sep 28, 2005 at 07:51:10AM -0700, Paul E. McKenney wrote:
> 
> The reference-count approach is only guaranteed to work if the kernel
> thread that did the reference-count increment is later referencing that
> same data element.  Otherwise, one has the following possible situation
> on DEC Alpha:

You're quite right.  Without the rcu_dereference users of in_dev_get()
may see pre-initialisation contents of in_dev.

So these barriers are definitely needed.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-09-28  2:55   ` Herbert Xu
@ 2005-09-28 14:51     ` Paul E. McKenney
  2005-09-28 22:11       ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2005-09-28 14:51 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, suzannew, linux-kernel, Robert.Olsson, walpole, netdev

On Wed, Sep 28, 2005 at 12:55:45PM +1000, Herbert Xu wrote:
> David S. Miller <davem@davemloft.net> wrote:
> > 
> > I agree with the changes to add rcu_dereference() use.
> > Those were definitely lacking and needed.
> 
> Actually I'm not so sure that they are all needed.  I only looked
> at the very first one in the patch which is in in_dev_get().  That
> one certainly isn't necessary because the old value of ip_ptr
> is valid as long as the reference count does not hit zero.
> 
> The later is guaranteed by the increment in in_dev_get().
> 
> Because the pervasiveness of reference counting in the network stack,
> I believe that we should scrutinise the other bits in the patch too
> to make sure that they are all needed.
> 
> In general, using rcu_dereference/rcu_assign_pointer does not
> guarantee correct code.  We really need to look at each case
> individually.

Yep, these two APIs are only part of the solution.

The reference-count approach is only guaranteed to work if the kernel
thread that did the reference-count increment is later referencing that
same data element.  Otherwise, one has the following possible situation
on DEC Alpha:

o	CPU 0 initializes and inserts a new element into the data
	structure, using rcu_assign_pointer() to provide any needed
	memory barriers.  (Or, if RCU is not being used, under the
	appropriate update-side lock.)

o	CPU 1 acquires a reference to this new element, presumably
	using either a lock or rcu_read_lock() and rcu_dereference()
	in order to do so safely.  CPU 1 then increments the reference
	count.

o	CPU 2 picks up a pointer to this new element, but in a way
	that relies on the reference count having been incremented,
	without using locking, rcu_read_lock(), rcu_dereference(),
	and so on.

	This CPU can then see the pre-initialized contents of the
	newly inserted data structure (again, but only on DEC Alpha).

Again, if the same kernel thread that incremented the reference count
is later accessing it, no problem, even on Alpha.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-09-27 20:56 ` David S. Miller
@ 2005-09-28  2:55   ` Herbert Xu
  2005-09-28 14:51     ` Paul E. McKenney
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2005-09-28  2:55 UTC (permalink / raw)
  To: David S. Miller
  Cc: suzannew, linux-kernel, Robert.Olsson, paulmck, walpole, netdev

David S. Miller <davem@davemloft.net> wrote:
> 
> I agree with the changes to add rcu_dereference() use.
> Those were definitely lacking and needed.

Actually I'm not so sure that they are all needed.  I only looked
at the very first one in the patch which is in in_dev_get().  That
one certainly isn't necessary because the old value of ip_ptr
is valid as long as the reference count does not hit zero.

The later is guaranteed by the increment in in_dev_get().

Because the pervasiveness of reference counting in the network stack,
I believe that we should scrutinise the other bits in the patch too
to make sure that they are all needed.

In general, using rcu_dereference/rcu_assign_pointer does not
guarantee correct code.  We really need to look at each case
individually.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] identify in_dev_get rcu read-side critical sections
  2005-09-08 17:12 Suzanne Wood
@ 2005-09-27 20:56 ` David S. Miller
  2005-09-28  2:55   ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: David S. Miller @ 2005-09-27 20:56 UTC (permalink / raw)
  To: suzannew; +Cc: linux-kernel, Robert.Olsson, paulmck, walpole

From: Suzanne Wood <suzannew@cs.pdx.edu>
Date: Thu, 8 Sep 2005 10:12:52 -0700 (PDT)

> Please consider this request for suggestions on an attempt at a partial patch 
> based on the assumptions below to identify rcu read-side critical sections 
> for in_dev_get() defined in inetdevice.h.  Thank you.

Thanks a lot for your patch, and patience in waiting for it
to get reviewed :-)

I agree with the changes to add rcu_dereference() use.
Those were definitely lacking and needed.

This following case is clever and correct, though.  It is from
the net/ipv4/devinet.c part of your patch:

@@ -409,7 +412,8 @@ static int inet_rtm_deladdr(struct sk_bu
 
 	if ((in_dev = inetdev_by_index(ifm->ifa_index)) == NULL)
 		goto out;
-	__in_dev_put(in_dev);
+	in_dev_put(in_dev);
+	rcu_read_unlock();
 
 	for (ifap = &in_dev->ifa_list; (ifa = *ifap) != NULL;
 	     ifap = &ifa->ifa_next) {

Everyone gets fooled by a certain invariant in the Linux networking
locking.  If the RTNL semaphore is held, _all_ device and address
configuration changes are blocked.  IP addresses cannot be removed,
devices cannot be brought up or down, routes cannot be added or
deleted, etc.  The RTNL semaphore serializes all of these operations.
And it is held during inet_rtm_deladdr() here.

So we _know_ that if inetdev_by_index() returns non-NULL someone
else (the device itself) holds at least _one_ reference to that
object which cannot go away, because all such actions would need
to take the RTNL semaphore which we hold.

So __in_dev_put() is safe here.

Arguably, it's being overly clever for questionable gain.
It definitely deserves a comment, how about that? :-)

Finally, about adding rcu_read_{lock,unlock}() around even
in_dev_{get,put}().  I bet that really isn't needed but I cannot
articulate why we can get away without it.  For example, if we
are given a pair executed in a function like:

	in_dev_get();

	...

	in_dev_put();

who cares if we preempt?  The local function's execution holds the
necessary reference, so the object's refcount cannot ever fall to
zero.

We can't get any RCU callbacks invoked, as a result, so we don't
need the rcu_read_{lock,unlock}() calls here.

The in_dev_put() uses atomic_dec_and_test(), which provides a memory
barrier, so no out-of-order cpu memory references to the object
can escape past the decrement to zero of the object reference count.

In short, I think adding rcu_read_{lock,unlock}() is very heavy
handed and unnecessary.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC][PATCH] identify in_dev_get rcu read-side critical sections
@ 2005-09-08 17:12 Suzanne Wood
  2005-09-27 20:56 ` David S. Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Suzanne Wood @ 2005-09-08 17:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Robert.Olsson, paulmck, suzannew, walpole


Please consider this request for suggestions on an attempt at a partial patch 
based on the assumptions below to identify rcu read-side critical sections 
for in_dev_get() defined in inetdevice.h.  Thank you.

drivers/s390/net/lcs.c		|	 2 ++
drivers/s390/net/qeth_main.c	|	 4 ++++
include/linux/inetdevice.h	|	 2 +-
net/ipv4/arp.c			|	17 +++++++++++------
net/ipv4/devinet.c		|	 6 +++++-
net/ipv4/icmp.c			|	 4 ++--
net/ipv4/igmp.c			|	22 ++++++++++++++++++----
net/ipv4/ip_gre.c 		|	 1 +
net/ipv4/ip_input.c 		| 	 6 ++++--
net/ipv4/route.c		| 	41 ++++++++++++++++++++++++++++-------------
10 files changed, 76 insertions(+), 29 deletions(-)

---------------------------------------------------

Assumptions made in this patch development:

1. Programmer who inserts rcu_lock/unlock around an in_dev_get or
   __in_dev_get expects the protection of deferred destruction.

2. The marking of an rcu read-side critical section done by
   the calling function has the vision of the extent of use of the
   protected dereference.

3. Pairings of in_dev_get/in_dev_put and __in_dev_get/__in_dev_put
   indicate the need to balance increment/decrement of refcnt, 
   so __in_dev_get which does not increment is likely paired in 
   error with __in_dev_put which decrements (unless in_dev_hold 
   or similar is in the path).  

4. If programmer chooses __in_dev_get rather than in_dev_get, 
   the refcnt is not desired.

5. If a function returns an rcu protected pointer, the rcu_read_lock
   is in place, so the rcu_read_unlock occurs in the caller.

Questions/Suggestions:

   A pairing of in_dev_get with in_dev_put may indicate the addition
   in the in_dev_put conditional of something like 
   call_rcu(&idev->rcu_head, in_dev->rcu_put)
   after in_dev_finish_destroy or replace the latter with a call like
   inetdev_destroy().

   Differences between inetdevice.h in kernel 2.5.60 and linux-2.6.13-rc6
   include the replacement in in_dev_get() of read_lock(&inetdev_lock)/unlock 
   with rcu_read_lock/unlock and the addition of the rcu_head to the 
   in_ifaddr struct.

   In net/ipv4/route.c, consider __mkroute_output and follow the
   three refcnt increments done to out_dev by dev_hold and in_dev_get, 
   then one decrement to in_dev.  Correct refcnt may also be an issue in
   xfrm4_policy.c to allow the atomic_dec_and_test in in_dev_put
   to appropriately recognize the condition to free the in_device.

   In net/ipv4/arp.c, arp_req_set() includes the following where,
   with possible update, might this be risky to test the deref and
   deref again: 
                if (__in_dev_get(dev)) {
                        __in_dev_get(dev)->cnf.proxy_arp = 1;
                        return 0;
                }
   and similarly in arp_req_delete?  While the vast majority of uses
   of __in_dev_get indicate rcu protection, the above may be the exception
   that breaks the rule.  It seems reasonable to have both in_dev_get
   and __in_dev_get use rcu_dereference and have the difference between
   them be the refcnt.  For now, I'll submit a patch addressing only
   in_dev_get and hope for feedback on patching __in_dev_get.  Because
   one way or another, there are __in_dev_get uses that probably need 
   to be addressed.

While this is a subjective attempt to understand programmer intent,
with your help and suggestions, the plan is to build an automated
checker.  Thank you.

------------------------------------------------------------------

diff -urpNa -X dontdiff /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/drivers/s390/net/lcs.c linux-2.6.13-rc6/drivers/s390/net/lcs.c
--- /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/drivers/s390/net/lcs.c	2005-08-07 11:18:56.000000000 -0700
+++ linux-2.6.13-rc6/drivers/s390/net/lcs.c	2005-09-07 21:37:41.000000000 -0700
@@ -1270,6 +1270,7 @@ lcs_register_mc_addresses(void *data)
 		return 0;
 	LCS_DBF_TEXT(4, trace, "regmulti");
 
+	rcu_read_lock();
 	in4_dev = in_dev_get(card->dev);
 	if (in4_dev == NULL)
 		goto out;
@@ -1278,6 +1279,7 @@ lcs_register_mc_addresses(void *data)
 	lcs_set_mc_addresses(card, in4_dev);
 	read_unlock(&in4_dev->mc_list_lock);
 	in_dev_put(in4_dev);
+	rcu_read_unlock();
 
 	lcs_fix_multicast_list(card);
 out:
diff -urpNa -X dontdiff /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/drivers/s390/net/qeth_main.c linux-2.6.13-rc6/drivers/s390/net/qeth_main.c
--- /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/drivers/s390/net/qeth_main.c	2005-08-07 11:18:56.000000000 -0700
+++ linux-2.6.13-rc6/drivers/s390/net/qeth_main.c	2005-09-07 22:22:10.000000000 -0700
@@ -5441,6 +5441,7 @@ qeth_add_vlan_mc(struct qeth_card *card)
 		if (vg->vlan_devices[i] == NULL ||
 		    !(vg->vlan_devices[i]->flags & IFF_UP))
 			continue;
+		rcu_read_lock();
 		in_dev = in_dev_get(vg->vlan_devices[i]);
 		if (!in_dev)
 			continue;
@@ -5448,6 +5449,7 @@ qeth_add_vlan_mc(struct qeth_card *card)
 		qeth_add_mc(card,in_dev);
 		read_unlock(&in_dev->mc_list_lock);
 		in_dev_put(in_dev);
+		rcu_read_unlock();
 	}
 #endif
 }
@@ -5458,6 +5460,7 @@ qeth_add_multicast_ipv4(struct qeth_card
 	struct in_device *in4_dev;
 
 	QETH_DBF_TEXT(trace,4,"chkmcv4");
+	rcu_read_lock();
 	in4_dev = in_dev_get(card->dev);
 	if (in4_dev == NULL)
 		return;
@@ -5466,6 +5469,7 @@ qeth_add_multicast_ipv4(struct qeth_card
 	qeth_add_vlan_mc(card);
 	read_unlock(&in4_dev->mc_list_lock);
 	in_dev_put(in4_dev);
+	rcu_read_unlock();
 }
 
 #ifdef CONFIG_QETH_IPV6
diff -urpNa -X dontdiff /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/include/linux/inetdevice.h linux-2.6.13-rc6/include/linux/inetdevice.h
--- /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/include/linux/inetdevice.h	2005-08-07 11:18:56.000000000 -0700
+++ linux-2.6.13-rc6/include/linux/inetdevice.h	2005-08-19 14:27:02.000000000 -0700
@@ -148,7 +148,7 @@ in_dev_get(const struct net_device *dev)
 	struct in_device *in_dev;
 
 	rcu_read_lock();
-	in_dev = dev->ip_ptr;
+	in_dev = rcu_dereference(dev->ip_ptr);
 	if (in_dev)
 		atomic_inc(&in_dev->refcnt);
 	rcu_read_unlock();
diff -urpNa -X dontdiff /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/net/ipv4/arp.c linux-2.6.13-rc6/net/ipv4/arp.c
--- /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/net/ipv4/arp.c	2005-08-07 11:18:56.000000000 -0700
+++ linux-2.6.13-rc6/net/ipv4/arp.c	2005-09-07 22:49:09.000000000 -0700
@@ -334,9 +334,10 @@ static void arp_solicit(struct neighbour
 	struct net_device *dev = neigh->dev;
 	u32 target = *(u32*)neigh->primary_key;
 	int probes = atomic_read(&neigh->probes);
-	struct in_device *in_dev = in_dev_get(dev);
+	struct in_device *in_dev;
 
-	if (!in_dev)
+	rcu_read_lock();
+	if ((in_dev = in_dev_get(dev)) == NULL)
 		return;
 
 	switch (IN_DEV_ARP_ANNOUNCE(in_dev)) {
@@ -362,6 +363,8 @@ static void arp_solicit(struct neighbour
 
 	if (in_dev)
 		in_dev_put(in_dev);
+	rcu_read_unlock();
+
 	if (!saddr)
 		saddr = inet_select_addr(dev, target, RT_SCOPE_LINK);
 
@@ -543,11 +546,12 @@ static inline int arp_fwd_proxy(struct i
 		return 0;
 
 	/* place to check for proxy_arp for routes */
-
+	rcu_read_lock();
 	if ((out_dev = in_dev_get(rt->u.dst.dev)) != NULL) {
 		omi = IN_DEV_MEDIUM_ID(out_dev);
 		in_dev_put(out_dev);
 	}
+	rcu_read_unlock();
 	return (omi != imi && omi != -1);
 }
 
@@ -710,7 +714,7 @@ static void parp_redo(struct sk_buff *sk
 static int arp_process(struct sk_buff *skb)
 {
 	struct net_device *dev = skb->dev;
-	struct in_device *in_dev = in_dev_get(dev);
+	struct in_device *in_dev;
 	struct arphdr *arp;
 	unsigned char *arp_ptr;
 	struct rtable *rt;
@@ -723,8 +727,8 @@ static int arp_process(struct sk_buff *s
 	/* arp_rcv below verifies the ARP header and verifies the device
 	 * is ARP'able.
 	 */
-
-	if (in_dev == NULL)
+	rcu_read_lock();
+	if ((in_dev = in_dev_get(dev)) == NULL)
 		goto out;
 
 	arp = skb->nh.arph;
@@ -918,6 +922,7 @@ static int arp_process(struct sk_buff *s
 out:
 	if (in_dev)
 		in_dev_put(in_dev);
+	rcu_read_unlock();
 	kfree_skb(skb);
 	return 0;
 }
diff -urpNa -X dontdiff /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/net/ipv4/devinet.c linux-2.6.13-rc6/net/ipv4/devinet.c
--- /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/net/ipv4/devinet.c	2005-08-07 11:18:56.000000000 -0700
+++ linux-2.6.13-rc6/net/ipv4/devinet.c	2005-09-08 00:38:01.000000000 -0700
@@ -372,12 +372,15 @@ static int inet_set_ifa(struct net_devic
 	return inet_insert_ifa(ifa);
 }
 
+/* returns with rcu read-critical section open, so rcu_read_unlock() in caller */
+
 struct in_device *inetdev_by_index(int ifindex)
 {
 	struct net_device *dev;
 	struct in_device *in_dev = NULL;
 	read_lock(&dev_base_lock);
 	dev = __dev_get_by_index(ifindex);
+	rcu_read_lock();
 	if (dev)
 		in_dev = in_dev_get(dev);
 	read_unlock(&dev_base_lock);
@@ -409,7 +412,8 @@ static int inet_rtm_deladdr(struct sk_bu
 
 	if ((in_dev = inetdev_by_index(ifm->ifa_index)) == NULL)
 		goto out;
-	__in_dev_put(in_dev);
+	in_dev_put(in_dev);
+	rcu_read_unlock();
 
 	for (ifap = &in_dev->ifa_list; (ifa = *ifap) != NULL;
 	     ifap = &ifa->ifa_next) {
diff -urpNa -X dontdiff /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/net/ipv4/icmp.c linux-2.6.13-rc6/net/ipv4/icmp.c
--- /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/net/ipv4/icmp.c	2005-08-07 11:18:56.000000000 -0700
+++ linux-2.6.13-rc6/net/ipv4/icmp.c	2005-09-07 23:37:08.000000000 -0700
@@ -890,10 +890,10 @@ static void icmp_address_reply(struct sk
 	if (skb->len < 4 || !(rt->rt_flags&RTCF_DIRECTSRC))
 		goto out;
 
+	rcu_read_lock();
 	in_dev = in_dev_get(dev);
 	if (!in_dev)
 		goto out;
-	rcu_read_lock();
 	if (in_dev->ifa_list &&
 	    IN_DEV_LOG_MARTIANS(in_dev) &&
 	    IN_DEV_FORWARD(in_dev)) {
@@ -913,8 +913,8 @@ static void icmp_address_reply(struct sk
 			       NIPQUAD(*mp), dev->name, NIPQUAD(rt->rt_src));
 		}
 	}
-	rcu_read_unlock();
 	in_dev_put(in_dev);
+	rcu_read_unlock();
 out:;
 }
 
diff -urpNa -X dontdiff /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/net/ipv4/igmp.c linux-2.6.13-rc6/net/ipv4/igmp.c
--- /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/net/ipv4/igmp.c	2005-08-07 11:18:56.000000000 -0700
+++ linux-2.6.13-rc6/net/ipv4/igmp.c	2005-09-08 00:35:52.000000000 -0700
@@ -864,10 +864,12 @@ int igmp_rcv(struct sk_buff *skb)
 {
 	/* This basically follows the spec line by line -- see RFC1112 */
 	struct igmphdr *ih;
-	struct in_device *in_dev = in_dev_get(skb->dev);
+	struct in_device *in_dev;
 	int len = skb->len;
 
-	if (in_dev==NULL) {
+	rcu_read_lock();
+	if ((in_dev = in_dev_get(skb->dev) == NULL) {
+		rcu_read_unlock();
 		kfree_skb(skb);
 		return 0;
 	}
@@ -875,6 +877,7 @@ int igmp_rcv(struct sk_buff *skb)
 	if (!pskb_may_pull(skb, sizeof(struct igmphdr)) || 
 	    (u16)csum_fold(skb_checksum(skb, 0, len, 0))) {
 		in_dev_put(in_dev);
+		rcu_read_unlock();
 		kfree_skb(skb);
 		return 0;
 	}
@@ -907,6 +910,7 @@ int igmp_rcv(struct sk_buff *skb)
 		NETDEBUG(printk(KERN_DEBUG "New IGMP type=%d, why we do not know about it?\n", ih->type));
 	}
 	in_dev_put(in_dev);
+	rcu_read_unlock();
 	kfree_skb(skb);
 	return 0;
 }
@@ -1307,8 +1311,8 @@ static struct in_device * ip_mc_find_dev
 	if (imr->imr_ifindex) {
 		idev = inetdev_by_index(imr->imr_ifindex);
 		if (idev)
-			__in_dev_put(idev);
-		return idev;
+			__in_dev_put(idev); /* decrement before return? */
+		return idev; /* caller should rcu_read_unlock */
 	}
 	if (imr->imr_address.s_addr) {
 		dev = ip_dev_find(imr->imr_address.s_addr);
@@ -2098,6 +2102,7 @@ void ip_mc_drop_socket(struct sock *sk)
 			(void) ip_mc_leave_src(sk, iml, in_dev);
 			ip_mc_dec_group(in_dev, iml->multi.imr_multiaddr.s_addr);
 			in_dev_put(in_dev);
+			rcu_read_unlock();
 		}
 		sock_kfree_s(sk, iml, sizeof(*iml));
 
@@ -2154,6 +2159,7 @@ static inline struct ip_mc_list *igmp_mc
 	     state->dev; 
 	     state->dev = state->dev->next) {
 		struct in_device *in_dev;
+		rcu_read_lock();
 		in_dev = in_dev_get(state->dev);
 		if (!in_dev)
 			continue;
@@ -2165,6 +2171,7 @@ static inline struct ip_mc_list *igmp_mc
 		}
 		read_unlock(&in_dev->mc_list_lock);
 		in_dev_put(in_dev);
+		rcu_read_unlock();
 	}
 	return im;
 }
@@ -2183,11 +2190,13 @@ static struct ip_mc_list *igmp_mc_get_ne
 			state->in_dev = NULL;
 			break;
 		}
+		rcu_read_lock();
 		state->in_dev = in_dev_get(state->dev);
 		if (!state->in_dev)
 			continue;
 		read_lock(&state->in_dev->mc_list_lock);
 		im = state->in_dev->mc_list;
+		rcu_read_unlock();
 	}
 	return im;
 }
@@ -2317,6 +2326,7 @@ static inline struct ip_sf_list *igmp_mc
 	     state->dev; 
 	     state->dev = state->dev->next) {
 		struct in_device *idev;
+		rcu_read_lock();
 		idev = in_dev_get(state->dev);
 		if (unlikely(idev == NULL))
 			continue;
@@ -2334,6 +2344,7 @@ static inline struct ip_sf_list *igmp_mc
 		}
 		read_unlock(&idev->mc_list_lock);
 		in_dev_put(idev);
+		rcu_read_unlock();
 	}
 	return psf;
 }
@@ -2356,11 +2367,14 @@ static struct ip_sf_list *igmp_mcf_get_n
 				state->idev = NULL;
 				goto out;
 			}
+			rcu_read_lock();
 			state->idev = in_dev_get(state->dev);
 			if (!state->idev)
+				rcu_read_unlock();
 				continue;
 			read_lock(&state->idev->mc_list_lock);
 			state->im = state->idev->mc_list;
+			rcu_read_unlock();
 		}
 		if (!state->im)
 			break;
diff -urpNa -X dontdiff /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/net/ipv4/ip_gre.c linux-2.6.13-rc6/net/ipv4/ip_gre.c
--- /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/net/ipv4/ip_gre.c	2005-08-07 11:18:56.000000000 -0700
+++ linux-2.6.13-rc6/net/ipv4/ip_gre.c	2005-09-08 00:42:38.000000000 -0700
@@ -1121,6 +1121,7 @@ static int ipgre_close(struct net_device
 			ip_mc_dec_group(in_dev, t->parms.iph.daddr);
 			in_dev_put(in_dev);
 		}
+		rcu_read_unlock();
 	}
 	return 0;
 }
diff -urpNa -X dontdiff /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/net/ipv4/ip_input.c linux-2.6.13-rc6/net/ipv4/ip_input.c
--- /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/net/ipv4/ip_input.c	2005-08-07 11:18:56.000000000 -0700
+++ linux-2.6.13-rc6/net/ipv4/ip_input.c	2005-09-08 00:45:42.000000000 -0700
@@ -330,8 +330,9 @@ static inline int ip_rcv_finish(struct s
 
 		opt = &(IPCB(skb)->opt);
 		if (opt->srr) {
-			struct in_device *in_dev = in_dev_get(dev);
-			if (in_dev) {
+			struct in_device *in_dev;
+			rcu_read_lock();
+			if ((in_dev = in_dev_get(dev)) != NULL) {
 				if (!IN_DEV_SOURCE_ROUTE(in_dev)) {
 					if (IN_DEV_LOG_MARTIANS(in_dev) && net_ratelimit())
 						printk(KERN_INFO "source route option %u.%u.%u.%u -> %u.%u.%u.%u\n",
@@ -341,6 +342,7 @@ static inline int ip_rcv_finish(struct s
 				}
 				in_dev_put(in_dev);
 			}
+			rcu_read_unlock();
 			if (ip_options_rcv_srr(skb))
 				goto drop;
 		}
diff -urpNa -X dontdiff /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/net/ipv4/route.c linux-2.6.13-rc6/net/ipv4/route.c
--- /mnt/shared/linuxKernel2_6/linux-2.6.13-rc6/net/ipv4/route.c	2005-08-07 11:18:56.000000000 -0700
+++ linux-2.6.13-rc6/net/ipv4/route.c	2005-09-08 08:39:01.000000000 -0700
@@ -1112,14 +1112,15 @@ void ip_rt_redirect(u32 old_gw, u32 dadd
 		    u32 saddr, u8 tos, struct net_device *dev)
 {
 	int i, k;
-	struct in_device *in_dev = in_dev_get(dev);
+	struct in_device *in_dev;
 	struct rtable *rth, **rthp;
 	u32  skeys[2] = { saddr, 0 };
 	int  ikeys[2] = { dev->ifindex, 0 };
 
 	tos &= IPTOS_RT_MASK;
 
-	if (!in_dev)
+	rcu_read_lock();
+	if ((in_dev = in_dev_get(dev)) == NULL)
 		return;
 
 	if (new_gw == old_gw || !IN_DEV_RX_REDIRECTS(in_dev)
@@ -1171,6 +1172,7 @@ void ip_rt_redirect(u32 old_gw, u32 dadd
 				if (rt == NULL) {
 					ip_rt_put(rth);
 					in_dev_put(in_dev);
+					rcu_read_unlock();
 					return;
 				}
 
@@ -1223,6 +1225,7 @@ void ip_rt_redirect(u32 old_gw, u32 dadd
 		}
 	}
 	in_dev_put(in_dev);
+	rcu_read_unlock();
 	return;
 
 reject_redirect:
@@ -1236,6 +1239,7 @@ reject_redirect:
 		       NIPQUAD(saddr), NIPQUAD(daddr), tos);
 #endif
 	in_dev_put(in_dev);
+	rcu_read_unlock();
 }
 
 static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst)
@@ -1284,9 +1288,9 @@ static struct dst_entry *ipv4_negative_a
 void ip_rt_send_redirect(struct sk_buff *skb)
 {
 	struct rtable *rt = (struct rtable*)skb->dst;
-	struct in_device *in_dev = in_dev_get(rt->u.dst.dev);
-
-	if (!in_dev)
+	struct in_device *in_dev;
+	rcu_read_lock();
+	if ((in_dev = in_dev_get(rt->u.dst.dev)) == NULL)
 		return;
 
 	if (!IN_DEV_TX_REDIRECTS(in_dev))
@@ -1327,6 +1331,7 @@ void ip_rt_send_redirect(struct sk_buff 
 	}
 out:
         in_dev_put(in_dev);
+	rcu_read_unlock();
 }
 
 static int ip_error(struct sk_buff *skb)
@@ -1482,10 +1487,12 @@ static void ipv4_dst_ifdown(struct dst_e
 	struct rtable *rt = (struct rtable *) dst;
 	struct in_device *idev = rt->idev;
 	if (dev != &loopback_dev && idev && idev->dev == dev) {
-		struct in_device *loopback_idev = in_dev_get(&loopback_dev);
-		if (loopback_idev) {
+		struct in_device *loopback_idev;
+		rcu_read_lock();
+		if (loopback_idev = in_dev_get(&loopback_dev) != NULL) {
 			rt->idev = loopback_idev;
 			in_dev_put(idev);
+			rcu_read_unlock();
 		}
 	}
 }
@@ -1593,12 +1600,12 @@ static int ip_route_input_mc(struct sk_b
 	unsigned hash;
 	struct rtable *rth;
 	u32 spec_dst;
-	struct in_device *in_dev = in_dev_get(dev);
+	struct in_device *in_dev;
 	u32 itag = 0;
 
 	/* Primary sanity checks. */
-
-	if (in_dev == NULL)
+	rcu_read_lock();
+	if ((in_dev = in_dev_get(dev)) == NULL)
 		return -EINVAL;
 
 	if (MULTICAST(saddr) || BADCLASS(saddr) || LOOPBACK(saddr) ||
@@ -1656,15 +1663,18 @@ static int ip_route_input_mc(struct sk_b
 	RT_CACHE_STAT_INC(in_slow_mc);
 
 	in_dev_put(in_dev);
+	rcu_read_unlock();
 	hash = rt_hash_code(daddr, saddr ^ (dev->ifindex << 5), tos);
 	return rt_intern_hash(hash, rth, (struct rtable**) &skb->dst);
 
 e_nobufs:
 	in_dev_put(in_dev);
+	rcu_read_unlock();
 	return -ENOBUFS;
 
 e_inval:
 	in_dev_put(in_dev);
+	rcu_read_unlock();
 	return -EINVAL;
 }
 
@@ -1714,6 +1724,7 @@ static inline int __mkroute_input(struct
 	u32 spec_dst, itag;
 
 	/* get a working reference to the output device */
+	rcu_read_lock()
 	out_dev = in_dev_get(FIB_RES_DEV(*res));
 	if (out_dev == NULL) {
 		if (net_ratelimit())
@@ -1796,6 +1807,7 @@ static inline int __mkroute_input(struct
  cleanup:
 	/* release the working reference to the output device */
 	in_dev_put(out_dev);
+	rcu_read_unlock();
 	return err;
 }						
 
@@ -1899,7 +1911,7 @@ static int ip_route_input_slow(struct sk
 			       u8 tos, struct net_device *dev)
 {
 	struct fib_result res;
-	struct in_device *in_dev = in_dev_get(dev);
+	struct in_device *in_dev;
 	struct flowi fl = { .nl_u = { .ip4_u =
 				      { .daddr = daddr,
 					.saddr = saddr,
@@ -1919,8 +1931,8 @@ static int ip_route_input_slow(struct sk
 	int		free_res = 0;
 
 	/* IP on this device is disabled. */
-
-	if (!in_dev)
+	rcu_read_lock();
+	if ((in_dev = in_dev_get(dev)) == NULL)
 		goto out;
 
 	/* Check for the most weird martians, which can be not detected
@@ -1983,6 +1995,7 @@ static int ip_route_input_slow(struct sk
 	
 done:
 	in_dev_put(in_dev);
+	rcu_read_unlock();
 	if (free_res)
 		fib_res_put(&res);
 out:	return err;
@@ -2174,6 +2187,7 @@ static inline int __mkroute_output(struc
 		flags |= RTCF_LOCAL;
 
 	/* get work reference to inet device */
+	rcu_read_lock();
 	in_dev = in_dev_get(dev_out);
 	if (!in_dev)
 		return -EINVAL;
@@ -2270,6 +2284,7 @@ static inline int __mkroute_output(struc
 	*result = rth;
  cleanup:
 	/* release work reference to inet device */
+	rcu_read_unlock();
 	in_dev_put(in_dev);
 
 	return err;

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2005-10-01 19:29 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-09-28  0:22 [RFC][PATCH] identify in_dev_get rcu read-side critical sections Suzanne Wood
  -- strict thread matches above, loose matches on Subject: below --
2005-10-01 18:37 Suzanne Wood
2005-10-01 19:29 ` Herbert Xu
2005-10-01 18:00 Suzanne Wood
2005-10-01  6:56 Suzanne Wood
2005-10-01  7:12 ` Herbert Xu
2005-10-01 18:04   ` Paul E. McKenney
2005-09-30  1:06 Suzanne Wood
2005-10-01  1:13 ` Herbert Xu
2005-09-29 23:59 Suzanne Wood
2005-09-30  0:23 ` Herbert Xu
2005-09-29 23:39 Suzanne Wood
2005-09-29 23:30 Suzanne Wood
2005-09-30  0:21 ` Herbert Xu
2005-09-30  0:23 ` Paul E. McKenney
2005-09-30  0:27   ` Herbert Xu
2005-09-30  0:36     ` Paul E. McKenney
2005-09-30  1:04       ` Herbert Xu
2005-09-30  1:16         ` Paul E. McKenney
2005-09-30  1:19           ` Herbert Xu
2005-09-29 16:02 Suzanne Wood
2005-09-29 21:28 ` Herbert Xu
2005-09-08 17:12 Suzanne Wood
2005-09-27 20:56 ` David S. Miller
2005-09-28  2:55   ` Herbert Xu
2005-09-28 14:51     ` Paul E. McKenney
2005-09-28 22:11       ` Herbert Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).