netdev.vger.kernel.org archive mirror
* Performance regression from routing cache removal?
@ 2013-06-05 17:57 Shawn Bohrer
  2013-06-05 18:13 ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Shawn Bohrer @ 2013-06-05 17:57 UTC (permalink / raw)
  To: netdev; +Cc: davem

I've got a performance regression that I've been trying to track down
for the last couple of days.  The last known good kernel was 3.4 and
I'm now testing 3.10, so there have been a lot of changes in between.
The workload I'm testing has a single machine receiving UDP multicast
packets across approximately 350 multicast groups, with one socket per
multicast address receiving the data.  The traffic is small packets
and tends to be very bursty.  With the 3.10 kernel I'm seeing regular
spikes in one-way latency in the 10 millisecond range, and occasional
spikes in the 100-500 millisecond range.  I've started a git bisect,
which has narrowed me down to:

bad f5b0a8743601a4477419171f5046bd07d1c080a0
good fa0afcd10951afad2022dda09777d2bf70cdab3d

The remaining commits are all part of the routing cache removal so
I've stopped my bisection for now.  I can bisect further if anyone
thinks it will be valuable.
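
For reference, each receiving socket is set up roughly like this (a
simplified sketch; the group address and port are hypothetical and
error handling is omitted):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* One UDP socket per multicast group, bound to the group's port. */
static int join_group(const char *group, unsigned short port)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in addr;
	struct ip_mreq mreq;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);
	bind(fd, (struct sockaddr *)&addr, sizeof(addr));

	/* Join the group on the default interface. */
	mreq.imr_multiaddr.s_addr = inet_addr(group);
	mreq.imr_interface.s_addr = htonl(INADDR_ANY);
	setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
	return fd;
}

We call this roughly 350 times, once per group, and service the
resulting fds with epoll.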

Profiling the machine under 3.10-rc3 shows the following new hot spot
on the kernel side:

45.84%  [kernel.kallsyms]
        |
        |--8.55%-- ip_check_mc_rcu
        |          |
        |          |--99.41%-- ip_route_input_noref
        |          |          ip_rcv_finish
        |          |          ip_rcv
        |          |          __netif_receive_skb_core
        |          |          __netif_receive_skb
        |          |          netif_receive_skb
        |          |          napi_gro_frags
        |          |          mlx4_en_process_rx_cq
        |          |          mlx4_en_poll_rx_cq
        |          |          net_rx_action
        |          |          __do_softirq
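
(The profile was captured with something like "perf record -a -g"
over a busy interval, then "perf report".)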

Digging through the git history, the ip_check_mc_rcu() hot spot
appears to be a result of the routing cache removal.  I've also used
trace-cmd to capture some function graphs and compared 3.4 and 3.10.
ip_check_mc_rcu() is one obvious slowdown, and it also appears that
ipv4_pktinfo_prepare() may have slowed down between 3.4 and 3.10 as
well, though I have not yet looked at what changed there.  Here is an
example of ip_rcv_finish() traced on 3.10, keeping in mind that the
tracing itself adds overhead:

7474.898949: funcgraph_entry              |  ip_rcv_finish() {
7474.898950: funcgraph_entry              |    ip_route_input_noref() {
7474.898951: funcgraph_entry   6.693 us   |      ip_check_mc_rcu();
7474.898958: funcgraph_entry   0.080 us   |      fib_validate_source();
7474.898958: funcgraph_entry              |      rt_dst_alloc() {
7474.898958: funcgraph_entry              |        dst_alloc() {
7474.898958: funcgraph_entry   0.046 us   |          kmem_cache_alloc();
7474.898959: funcgraph_entry   0.029 us   |          local_bh_disable();
7474.898959: funcgraph_entry   0.030 us   |          local_bh_enable();
7474.898959: funcgraph_exit:   1.134 us   |        }
7474.898960: funcgraph_exit:   1.415 us   |      }
7474.898960: funcgraph_exit:   9.269 us   |    }
7474.898960: funcgraph_entry              |    ip_local_deliver() {
7474.898960: funcgraph_entry              |      ip_local_deliver_finish() {
7474.898960: funcgraph_entry   0.054 us   |        raw_local_deliver();
7474.898960: funcgraph_entry              |        udp_rcv() {
7474.898961: funcgraph_entry              |          __udp4_lib_rcv() {
7474.898961: funcgraph_entry              |            __pskb_pull_tail() {
7474.898961: funcgraph_entry   0.129 us   |              skb_copy_bits();
7474.898961: funcgraph_exit:   0.377 us   |            }
7474.898961: funcgraph_entry              |            __udp4_lib_mcast_deliver() {
7474.898962: funcgraph_entry   0.156 us   |              _raw_spin_lock();
7474.898962: funcgraph_entry   0.268 us   |              ip_mc_sf_allow();
7474.898963: funcgraph_entry              |              flush_stack() {
7474.898963: funcgraph_entry              |                udp_queue_rcv_skb() {
7474.898963: funcgraph_entry              |                  ipv4_pktinfo_prepare() {
7474.898963: funcgraph_entry              |                    fib_compute_spec_dst() {
7474.898963: funcgraph_entry              |                      fib_table_lookup() {
7474.898964: funcgraph_entry   0.059 us   |                        check_leaf();
7474.898964: funcgraph_entry   0.030 us   |                        check_leaf();
7474.898964: funcgraph_exit:   0.784 us   |                      }
7474.898964: funcgraph_entry              |                      fib_table_lookup() {
7474.898965: funcgraph_entry   0.181 us   |                        check_leaf();
7474.898965: funcgraph_exit:   0.511 us   |                      }
7474.898965: funcgraph_exit:   1.814 us   |                    }
7474.898965: funcgraph_entry              |                    dst_release() {
7474.898965: funcgraph_entry              |                      dst_destroy() {
7474.898966: funcgraph_entry   0.026 us   |                        local_bh_disable();
7474.898966: funcgraph_entry   0.029 us   |                        local_bh_enable();
7474.898966: funcgraph_entry   0.029 us   |                        ipv4_dst_destroy();
7474.898966: funcgraph_entry   0.065 us   |                        kmem_cache_free();
7474.898967: funcgraph_exit:   1.029 us   |                      }
7474.898967: funcgraph_exit:   1.299 us   |                    }
7474.898967: funcgraph_exit:   3.647 us   |                  }
7474.898967: funcgraph_entry   0.028 us   |                  _raw_spin_lock();
7474.898967: funcgraph_entry              |                  __udp_queue_rcv_skb() {
7474.898967: funcgraph_entry              |                    sock_queue_rcv_skb() {
7474.898967: funcgraph_entry   0.035 us   |                      sk_filter();
7474.898968: funcgraph_entry   0.030 us   |                      _raw_spin_lock_irqsave();
7474.898968: funcgraph_entry   0.032 us   |                      _raw_spin_unlock_irqrestore();
7474.898968: funcgraph_entry              |                      sock_def_readable() {
7474.898968: funcgraph_entry              |                        __wake_up_sync_key() {
7474.898968: funcgraph_entry   0.030 us   |                          _raw_spin_lock_irqsave();
7474.898969: funcgraph_entry              |                          __wake_up_common() {
7474.898969: funcgraph_entry              |                            ep_poll_callback() {
7474.898969: funcgraph_entry   0.129 us   |                              _raw_spin_lock_irqsave();
7474.898970: funcgraph_entry              |                              __wake_up_locked() {
7474.898970: funcgraph_entry              |                                __wake_up_common() {
7474.898970: funcgraph_entry              |                                  default_wake_function() {
7474.898970: funcgraph_entry              |                                    try_to_wake_up() {
7474.898970: funcgraph_entry   0.095 us   |                                      _raw_spin_lock_irqsave();
7474.898971: funcgraph_entry   0.037 us   |                                      select_task_rq_rt();
7474.898971: funcgraph_entry              |                                      native_smp_send_reschedule() {
7474.898972: funcgraph_entry              |                                        physflat_send_IPI_mask() {
7474.898972: funcgraph_entry   0.181 us   |                                          default_send_IPI_mask_sequence_phys();
7474.898972: funcgraph_exit:   0.421 us   |                                        }
7474.898972: funcgraph_exit:   0.677 us   |                                      }
7474.898972: funcgraph_entry   0.158 us   |                                      ttwu_stat();
7474.898973: funcgraph_entry   0.033 us   |                                      _raw_spin_unlock_irqrestore();
7474.898973: funcgraph_exit:   2.784 us   |                                    }
7474.898973: funcgraph_exit:   3.022 us   |                                  }
7474.898973: funcgraph_exit:   3.398 us   |                                }
7474.898973: funcgraph_exit:   3.616 us   |                              }
7474.898973: funcgraph_entry   0.033 us   |                              _raw_spin_unlock_irqrestore();
7474.898974: funcgraph_exit:   4.634 us   |                            }
7474.898974: funcgraph_exit:   4.948 us   |                          }
7474.898974: funcgraph_entry   0.033 us   |                          _raw_spin_unlock_irqrestore();
7474.898975: funcgraph_exit:   6.420 us   |                        }
7474.898975: funcgraph_exit:   6.765 us   |                      }
7474.898975: funcgraph_exit:   7.743 us   |                    }
7474.898975: funcgraph_exit:   8.029 us   |                  }
7474.898975: funcgraph_exit: + 12.542 us  |                }
7474.898976: funcgraph_exit: + 12.838 us  |              }
7474.898976: funcgraph_exit: + 14.130 us  |            }
7474.898976: funcgraph_exit: + 15.069 us  |          }
7474.898976: funcgraph_exit: + 15.355 us  |        }
7474.898976: funcgraph_exit: + 15.959 us  |      }
7474.898976: funcgraph_exit: + 16.244 us  |    }
7474.898976: funcgraph_exit: + 26.284 us  |  }

And here is an example from 3.4 for comparison:

342680.275804: funcgraph_entry              |  ip_rcv_finish() {
342680.275805: funcgraph_entry              |    ip_route_input_common() {
342680.275806: funcgraph_entry   0.044 us   |      ipv4_validate_peer();
342680.275806: funcgraph_entry   0.033 us   |      skb_dst_set_noref();
342680.275807: funcgraph_exit:   1.249 us   |    }
342680.275807: funcgraph_entry              |    ip_local_deliver() {
342680.275807: funcgraph_entry              |      ip_local_deliver_finish() {
342680.275807: funcgraph_entry   0.037 us   |        raw_local_deliver();
342680.275807: funcgraph_entry              |        udp_rcv() {
342680.275808: funcgraph_entry              |          __udp4_lib_rcv() {
342680.275808: funcgraph_entry              |            __pskb_pull_tail() {
342680.275808: funcgraph_entry   0.085 us   |              skb_copy_bits();
342680.275808: funcgraph_exit:   0.416 us   |            }
342680.275808: funcgraph_entry              |            __udp4_lib_mcast_deliver() {
342680.275809: funcgraph_entry   0.058 us   |              _raw_spin_lock();
342680.275809: funcgraph_entry   0.044 us   |              ip_mc_sf_allow();
342680.275809: funcgraph_entry              |              flush_stack() {
342680.275810: funcgraph_entry              |                udp_queue_rcv_skb() {
342680.275810: funcgraph_entry   0.030 us   |                  ipv4_pktinfo_prepare();
342680.275810: funcgraph_entry   0.030 us   |                  _raw_spin_lock();
342680.275811: funcgraph_entry              |                  __udp_queue_rcv_skb() {
342680.275811: funcgraph_entry              |                    sock_queue_rcv_skb() {
342680.275811: funcgraph_entry   0.046 us   |                      sk_filter();
342680.275811: funcgraph_entry   0.031 us   |                      _raw_spin_lock_irqsave();
342680.275811: funcgraph_entry   0.035 us   |                      _raw_spin_unlock_irqrestore();
342680.275812: funcgraph_entry              |                      sock_def_readable() {
342680.275812: funcgraph_entry              |                        __wake_up_sync_key() {
342680.275812: funcgraph_entry   0.088 us   |                          _raw_spin_lock_irqsave();
342680.275812: funcgraph_entry              |                          __wake_up_common() {
342680.275813: funcgraph_entry              |                            ep_poll_callback() {
342680.275813: funcgraph_entry   0.143 us   |                              _raw_spin_lock_irqsave();
342680.275813: funcgraph_entry              |                              __wake_up_locked() {
342680.275814: funcgraph_entry              |                                __wake_up_common() {
342680.275814: funcgraph_entry              |                                  default_wake_function() {
342680.275814: funcgraph_entry              |                                    try_to_wake_up() {
342680.275814: funcgraph_entry   0.103 us   |                                      _raw_spin_lock_irqsave();
342680.275815: funcgraph_entry   0.039 us   |                                      select_task_rq_rt();
342680.275815: funcgraph_entry              |                                      native_smp_send_reschedule() {
342680.275815: funcgraph_entry              |                                        physflat_send_IPI_mask() {
342680.275815: funcgraph_entry   0.122 us   |                                          default_send_IPI_mask_sequence_phys();
342680.275816: funcgraph_exit:   0.369 us   |                                        }
342680.275816: funcgraph_exit:   0.624 us   |                                      }
342680.275816: funcgraph_entry   0.117 us   |                                      ttwu_stat();
342680.275816: funcgraph_entry   0.040 us   |                                      _raw_spin_unlock_irqrestore();
342680.275817: funcgraph_exit:   2.529 us   |                                    }
342680.275817: funcgraph_exit:   2.763 us   |                                  }
342680.275817: funcgraph_exit:   3.121 us   |                                }
342680.275817: funcgraph_exit:   3.348 us   |                              }
342680.275817: funcgraph_entry   0.036 us   |                              _raw_spin_unlock_irqrestore();
342680.275817: funcgraph_exit:   4.422 us   |                            }
342680.275817: funcgraph_exit:   4.818 us   |                          }
342680.275817: funcgraph_entry   0.036 us   |                          _raw_spin_unlock_irqrestore();
342680.275818: funcgraph_exit:   5.666 us   |                        }
342680.275818: funcgraph_exit:   6.051 us   |                      }
342680.275818: funcgraph_exit:   7.076 us   |                    }
342680.275818: funcgraph_exit:   7.363 us   |                  }
342680.275818: funcgraph_exit:   8.333 us   |                }
342680.275818: funcgraph_exit:   8.613 us   |              }
342680.275818: funcgraph_exit:   9.838 us   |            }
342680.275818: funcgraph_exit: + 10.800 us  |          }
342680.275819: funcgraph_exit: + 11.034 us  |        }
342680.275819: funcgraph_exit: + 11.628 us  |      }
342680.275819: funcgraph_exit: + 11.965 us  |    }
342680.275819: funcgraph_exit: + 13.932 us  |  }
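
(Both traces were captured with something along the lines of
"trace-cmd record -p function_graph -g ip_rcv_finish" followed by
"trace-cmd report".)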

Now while it does appear that ip_check_mc_rcu() and
ipv4_pktinfo_prepare() may have added some latency in the receive
path, there may be something else going on here as well.  Obviously
even the ~12us difference in ip_rcv_finish() doesn't by itself explain
the large one-way latency spikes I'm seeing.  My theory is that the
additional per-packet overhead adds up during a large burst: at ~12us
of extra work per packet, a burst of 10,000 packets would delay the
tail of the burst by roughly 120 milliseconds, which is in line with
the spikes I see.  Any input on this problem would be appreciated.

Thanks,
Shawn


* Re: Performance regression from routing cache removal?
  2013-06-05 17:57 Performance regression from routing cache removal? Shawn Bohrer
@ 2013-06-05 18:13 ` Eric Dumazet
  2013-06-05 20:32   ` Shawn Bohrer
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2013-06-05 18:13 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: netdev, davem

On Wed, 2013-06-05 at 12:57 -0500, Shawn Bohrer wrote:
> I've got a performance regression that I've been trying to track down
> for the last couple of days.  The last known good kernel was 3.4 and
> I'm now testing 3.10, so there have been a lot of changes in between.
> The workload I'm testing has a single machine receiving UDP multicast
> packets across approximately 350 multicast groups, with one socket per
> multicast address receiving the data.  The traffic is small packets
> and tends to be very bursty.  With the 3.10 kernel I'm seeing regular
> spikes in one-way latency in the 10 millisecond range, and occasional
> spikes in the 100-500 millisecond range.  I've started a git bisect,
> which has narrowed me down to:
> 
> bad f5b0a8743601a4477419171f5046bd07d1c080a0
> good fa0afcd10951afad2022dda09777d2bf70cdab3d
> 
> The remaining commits are all part of the routing cache removal so
> I've stopped my bisection for now.  I can bisect further if anyone
> thinks it will be valuable.

> 
> Digging through the git history, the ip_check_mc_rcu() hot spot
> appears to be a result of the routing cache removal.

Yes, ip_check_mc_rcu() does a linear scan of your ~350 groups:

for_each_pmc_rcu(in_dev, im) {
    if (im->multiaddr == mc_addr)
          break;
}
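
where for_each_pmc_rcu() is just an RCU walk of the per-device
mc_list (as defined in net/ipv4/igmp.c at the time):

#define for_each_pmc_rcu(in_dev, pmc)				\
	for (pmc = rcu_dereference(in_dev->mc_list);		\
	     pmc != NULL;					\
	     pmc = rcu_dereference(pmc->next_rcu))

With ~350 groups that is ~175 pointer chases on average for every
received multicast packet, in the hottest part of the receive path.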


* Re: Performance regression from routing cache removal?
  2013-06-05 18:13 ` Eric Dumazet
@ 2013-06-05 20:32   ` Shawn Bohrer
  2013-06-05 20:52     ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Shawn Bohrer @ 2013-06-05 20:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem

On Wed, Jun 05, 2013 at 11:13:17AM -0700, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 12:57 -0500, Shawn Bohrer wrote:
> > I've got a performance regression that I've been trying to track down
> > for the last couple of days.  The last known good kernel was 3.4 and
> > I'm now testing 3.10, so there have been a lot of changes in between.
> > The workload I'm testing has a single machine receiving UDP multicast
> > packets across approximately 350 multicast groups, with one socket per
> > multicast address receiving the data.  The traffic is small packets
> > and tends to be very bursty.  With the 3.10 kernel I'm seeing regular
> > spikes in one-way latency in the 10 millisecond range, and occasional
> > spikes in the 100-500 millisecond range.  I've started a git bisect,
> > which has narrowed me down to:
> > 
> > bad f5b0a8743601a4477419171f5046bd07d1c080a0
> > good fa0afcd10951afad2022dda09777d2bf70cdab3d
> > 
> > The remaining commits are all part of the routing cache removal so
> > I've stopped my bisection for now.  I can bisect further if anyone
> > thinks it will be valuable.
> 
> > 
> > Digging through the git history, the ip_check_mc_rcu() hot spot
> > appears to be a result of the routing cache removal.
> 
> Yes, ip_check_mc_rcu() does a linear scan of your ~350 groups:
> 
> for_each_pmc_rcu(in_dev, im) {
>     if (im->multiaddr == mc_addr)
>           break;
> }

Indeed it does.  So what are my options here?

1) Use fewer multicast addresses.  This may not be possible since in
some cases I don't control the incoming data/addresses.  I am running
a test sending the same data over ~280 groups instead of ~350, and as
expected it does appear to be better.

2) Make that linear scan a hash lookup?  Are there any downsides or
reasons not to do this?

3) ?

Thanks,
Shawn


* Re: Performance regression from routing cache removal?
  2013-06-05 20:32   ` Shawn Bohrer
@ 2013-06-05 20:52     ` Eric Dumazet
  2013-06-07  0:35       ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2013-06-05 20:52 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: netdev, davem

On Wed, 2013-06-05 at 15:32 -0500, Shawn Bohrer wrote:

> Indeed it does.  So what are my options here?
> 
> 1) Use fewer multicast addresses.  This may not be possible since in
> some cases I don't control the incoming data/addresses.  I am running
> a test sending the same data over ~280 groups instead of ~350, and as
> expected it does appear to be better.
> 
> 2) Make that linear scan a hash lookup?  Are there any downsides or
> reasons not to do this?
> 

It can be easily done, with a threshold:

Above, say, 4 multicast addresses in the mc_list, allocate a hash
table and populate it.

The nice thing is that this hash table could be dynamically
reallocated instead of being a fixed size, as we would keep the
mc_list as well.

> 3) ?

One idea would be to extend IP early demux, which currently handles
only TCP sockets, to UDP sockets.

This would work if you have no more than one socket receiving traffic
for a particular address:port, and it would avoid the IP fib/dst work,
as the socket would keep a cached dst.
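
Roughly the idea, as a sketch only (udp_lookup_unique() is a
hypothetical helper, not an existing kernel function):

/* On ingress, before the routing decision: if exactly one UDP
 * socket matches (daddr, dport), reuse the dst it cached from a
 * previous packet and skip ip_route_input_noref() entirely. */
static bool udp_try_early_demux(struct sk_buff *skb,
				__be32 daddr, __be16 dport)
{
	struct sock *sk = udp_lookup_unique(dev_net(skb->dev),
					    daddr, dport);
	struct dst_entry *dst;

	if (!sk)
		return false;
	dst = sk->sk_rx_dst;	/* dst cached by an earlier packet */
	if (!dst)
		return false;
	skb_dst_set_noref(skb, dst);
	return true;
}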


* Re: Performance regression from routing cache removal?
  2013-06-05 20:52     ` Eric Dumazet
@ 2013-06-07  0:35       ` Eric Dumazet
  2013-06-07 14:48         ` Shawn Bohrer
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2013-06-07  0:35 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: netdev, davem

On Wed, 2013-06-05 at 13:52 -0700, Eric Dumazet wrote:

> It can be easily done, with a threshold:
> 
> Above, say, 4 multicast addresses in the mc_list, allocate a hash
> table and populate it.

Please try the following (untested) patch:

 include/linux/igmp.h       |    1 
 include/linux/inetdevice.h |    5 ++
 net/ipv4/igmp.c            |   73 +++++++++++++++++++++++++++++++++--
 3 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/include/linux/igmp.h b/include/linux/igmp.h
index 7f2bf15..e3362b5 100644
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -84,6 +84,7 @@ struct ip_mc_list {
 		struct ip_mc_list *next;
 		struct ip_mc_list __rcu *next_rcu;
 	};
+	struct ip_mc_list __rcu *next_hash;
 	struct timer_list	timer;
 	int			users;
 	atomic_t		refcnt;
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index ea1e3b8..b99cd23 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -50,12 +50,17 @@ struct ipv4_devconf {
 	DECLARE_BITMAP(state, IPV4_DEVCONF_MAX);
 };
 
+#define MC_HASH_SZ_LOG 9
+
 struct in_device {
 	struct net_device	*dev;
 	atomic_t		refcnt;
 	int			dead;
 	struct in_ifaddr	*ifa_list;	/* IP ifaddr chain		*/
+
 	struct ip_mc_list __rcu	*mc_list;	/* IP multicast filter chain    */
+	struct ip_mc_list __rcu	* __rcu *mc_hash;
+
 	int			mc_count;	/* Number of installed mcasts	*/
 	spinlock_t		mc_tomb_lock;
 	struct ip_mc_list	*mc_tomb;
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 450f625..9c60482 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1217,6 +1217,56 @@ static void igmp_group_added(struct ip_mc_list *im)
  *	Multicast list managers
  */
 
+static u32 ip_mc_hash(const struct ip_mc_list *im)
+{
+	return hash_32((u32)im->multiaddr, MC_HASH_SZ_LOG);
+}
+
+static void ip_mc_hash_add(struct in_device *in_dev,
+			   struct ip_mc_list *im)
+{
+	struct ip_mc_list __rcu **mc_hash;
+	u32 hash;
+
+	mc_hash = rtnl_dereference(in_dev->mc_hash);
+	if (mc_hash) {
+		hash = ip_mc_hash(im);
+		im->next_hash = rtnl_dereference(mc_hash[hash]);
+		rcu_assign_pointer(mc_hash[hash], im);
+	} else if (in_dev->mc_count >= 4) {
+		mc_hash = kzalloc(sizeof(struct ip_mc_list *) << MC_HASH_SZ_LOG,
+				  GFP_KERNEL);
+		if (mc_hash) {
+			struct ip_mc_list *aux = rtnl_dereference(in_dev->mc_list);
+
+			while (aux) {
+				hash = ip_mc_hash(aux);
+
+				aux->next_hash = rcu_dereference_protected(mc_hash[hash], 1);
+				RCU_INIT_POINTER(mc_hash[hash], aux);
+				aux = rtnl_dereference(aux->next_rcu);
+			}
+			rcu_assign_pointer(in_dev->mc_hash, mc_hash);
+		}
+	}
+}
+
+static void ip_mc_hash_remove(struct in_device *in_dev,
+			      struct ip_mc_list *im)
+{
+	struct ip_mc_list __rcu **mc_hash = rtnl_dereference(in_dev->mc_hash);
+	struct ip_mc_list *aux;
+	unsigned int hash;
+
+	if (!mc_hash)
+		return;
+	hash = ip_mc_hash(im);
+	mc_hash += hash;
+	while ((aux = rtnl_dereference(*mc_hash)) != im)
+		mc_hash = &aux->next_hash;
+	*mc_hash = im->next_hash;
+}
+
 
 /*
  *	A socket has joined a multicast group on device dev.
@@ -1258,6 +1308,8 @@ void ip_mc_inc_group(struct in_device *in_dev, __be32 addr)
 	in_dev->mc_count++;
 	rcu_assign_pointer(in_dev->mc_list, im);
 
+	ip_mc_hash_add(in_dev, im);
+
 #ifdef CONFIG_IP_MULTICAST
 	igmpv3_del_delrec(in_dev, im->multiaddr);
 #endif
@@ -1314,6 +1366,7 @@ void ip_mc_dec_group(struct in_device *in_dev, __be32 addr)
 	     ip = &i->next_rcu) {
 		if (i->multiaddr == addr) {
 			if (--i->users == 0) {
+				ip_mc_hash_remove(in_dev, i);
 				*ip = i->next_rcu;
 				in_dev->mc_count--;
 				igmp_group_dropped(i);
@@ -1431,6 +1484,7 @@ void ip_mc_destroy_dev(struct in_device *in_dev)
 		ip_mc_clear_src(i);
 		ip_ma_put(i);
 	}
+	kfree(in_dev->mc_hash);
 }
 
 /* RTNL is locked */
@@ -2321,12 +2375,25 @@ void ip_mc_drop_socket(struct sock *sk)
 int ip_check_mc_rcu(struct in_device *in_dev, __be32 mc_addr, __be32 src_addr, u16 proto)
 {
 	struct ip_mc_list *im;
+	struct ip_mc_list __rcu **mc_hash;
 	struct ip_sf_list *psf;
 	int rv = 0;
 
-	for_each_pmc_rcu(in_dev, im) {
-		if (im->multiaddr == mc_addr)
-			break;
+	mc_hash = rcu_dereference(in_dev->mc_hash);
+	if (mc_hash) {
+		u32 hash = hash_32((u32)mc_addr, MC_HASH_SZ_LOG);
+
+		for (im = rcu_dereference(mc_hash[hash]);
+		     im != NULL;
+		     im = rcu_dereference(im->next_hash)) {
+			if (im->multiaddr == mc_addr)
+				break;
+		}
+	} else {
+		for_each_pmc_rcu(in_dev, im) {
+			if (im->multiaddr == mc_addr)
+				break;
+		}
 	}
 	if (im && proto == IPPROTO_IGMP) {
 		rv = 1;
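
For scale: MC_HASH_SZ_LOG = 9 gives 512 buckets, i.e. a 4 KB
allocation of pointers on 64-bit, and ~350 groups spread over 512
buckets average well under one entry per chain, so the lookup becomes
effectively O(1).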


* Re: Performance regression from routing cache removal?
  2013-06-07  0:35       ` Eric Dumazet
@ 2013-06-07 14:48         ` Shawn Bohrer
  2013-06-07 15:48           ` [PATCH net-next] igmp: hash a hash table to speedup ip_check_mc_rcu() Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Shawn Bohrer @ 2013-06-07 14:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem

On Thu, Jun 06, 2013 at 05:35:01PM -0700, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 13:52 -0700, Eric Dumazet wrote:
> 
> > It can be easily done, with a threshold:
> > 
> > Above, say, 4 multicast addresses in the mc_list, allocate a hash
> > table and populate it.
> 
> Please try the following (untested) patch:
> 
>  include/linux/igmp.h       |    1 
>  include/linux/inetdevice.h |    5 ++
>  net/ipv4/igmp.c            |   73 +++++++++++++++++++++++++++++++++--
>  3 files changed, 76 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/igmp.h b/include/linux/igmp.h
> index 7f2bf15..e3362b5 100644
> --- a/include/linux/igmp.h
> +++ b/include/linux/igmp.h
> @@ -84,6 +84,7 @@ struct ip_mc_list {
>  		struct ip_mc_list *next;
>  		struct ip_mc_list __rcu *next_rcu;
>  	};
> +	struct ip_mc_list __rcu *next_hash;
>  	struct timer_list	timer;
>  	int			users;
>  	atomic_t		refcnt;
> diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
> index ea1e3b8..b99cd23 100644
> --- a/include/linux/inetdevice.h
> +++ b/include/linux/inetdevice.h
> @@ -50,12 +50,17 @@ struct ipv4_devconf {
>  	DECLARE_BITMAP(state, IPV4_DEVCONF_MAX);
>  };
>  
> +#define MC_HASH_SZ_LOG 9
> +
>  struct in_device {
>  	struct net_device	*dev;
>  	atomic_t		refcnt;
>  	int			dead;
>  	struct in_ifaddr	*ifa_list;	/* IP ifaddr chain		*/
> +
>  	struct ip_mc_list __rcu	*mc_list;	/* IP multicast filter chain    */
> +	struct ip_mc_list __rcu	* __rcu *mc_hash;
> +
>  	int			mc_count;	/* Number of installed mcasts	*/
>  	spinlock_t		mc_tomb_lock;
>  	struct ip_mc_list	*mc_tomb;
> diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
> index 450f625..9c60482 100644
> --- a/net/ipv4/igmp.c
> +++ b/net/ipv4/igmp.c
> @@ -1217,6 +1217,56 @@ static void igmp_group_added(struct ip_mc_list *im)
>   *	Multicast list managers
>   */
>  
> +static u32 ip_mc_hash(const struct ip_mc_list *im)
> +{
> +	return hash_32((u32)im->multiaddr, MC_HASH_SZ_LOG);
> +}
> +
> +static void ip_mc_hash_add(struct in_device *in_dev,
> +			   struct ip_mc_list *im)
> +{
> +	struct ip_mc_list __rcu **mc_hash;
> +	u32 hash;
> +
> +	mc_hash = rtnl_dereference(in_dev->mc_hash);
> +	if (mc_hash) {
> +		hash = ip_mc_hash(im);
> +		im->next_hash = rtnl_dereference(mc_hash[hash]);
> +		rcu_assign_pointer(mc_hash[hash], im);
> +	} else if (in_dev->mc_count >= 4) {
> +		mc_hash = kzalloc(sizeof(struct ip_mc_list *) << MC_HASH_SZ_LOG,
> +				  GFP_KERNEL);
> +		if (mc_hash) {
> +			struct ip_mc_list *aux = rtnl_dereference(in_dev->mc_list);
> +
> +			while (aux) {
> +				hash = ip_mc_hash(aux);
> +
> +				aux->next_hash = rcu_dereference_protected(mc_hash[hash], 1);
> +				RCU_INIT_POINTER(mc_hash[hash], aux);
> +				aux = rtnl_dereference(aux->next_rcu);
> +			}
> +			rcu_assign_pointer(in_dev->mc_hash, mc_hash);
> +		}
> +	}
> +}
> +
> +static void ip_mc_hash_remove(struct in_device *in_dev,
> +			      struct ip_mc_list *im)
> +{
> +	struct ip_mc_list __rcu **mc_hash = rtnl_dereference(in_dev->mc_hash);
> +	struct ip_mc_list *aux;
> +	unsigned int hash;
> +
> +	if (!mc_hash)
> +		return;
> +	hash = ip_mc_hash(im);
> +	mc_hash += hash;
> +	while ((aux = rtnl_dereference(*mc_hash)) != im)
> +		mc_hash = &aux->next_hash;
> +	*mc_hash = im->next_hash;
> +}
> +
>  
>  /*
>   *	A socket has joined a multicast group on device dev.
> @@ -1258,6 +1308,8 @@ void ip_mc_inc_group(struct in_device *in_dev, __be32 addr)
>  	in_dev->mc_count++;
>  	rcu_assign_pointer(in_dev->mc_list, im);
>  
> +	ip_mc_hash_add(in_dev, im);
> +
>  #ifdef CONFIG_IP_MULTICAST
>  	igmpv3_del_delrec(in_dev, im->multiaddr);
>  #endif
> @@ -1314,6 +1366,7 @@ void ip_mc_dec_group(struct in_device *in_dev, __be32 addr)
>  	     ip = &i->next_rcu) {
>  		if (i->multiaddr == addr) {
>  			if (--i->users == 0) {
> +				ip_mc_hash_remove(in_dev, i);
>  				*ip = i->next_rcu;
>  				in_dev->mc_count--;
>  				igmp_group_dropped(i);
> @@ -1431,6 +1484,7 @@ void ip_mc_destroy_dev(struct in_device *in_dev)
>  		ip_mc_clear_src(i);
>  		ip_ma_put(i);
>  	}
> +	kfree(in_dev->mc_hash);
>  }
>  
>  /* RTNL is locked */
> @@ -2321,12 +2375,25 @@ void ip_mc_drop_socket(struct sock *sk)
>  int ip_check_mc_rcu(struct in_device *in_dev, __be32 mc_addr, __be32 src_addr, u16 proto)
>  {
>  	struct ip_mc_list *im;
> +	struct ip_mc_list __rcu **mc_hash;
>  	struct ip_sf_list *psf;
>  	int rv = 0;
>  
> -	for_each_pmc_rcu(in_dev, im) {
> -		if (im->multiaddr == mc_addr)
> -			break;
> +	mc_hash = rcu_dereference(in_dev->mc_hash);
> +	if (mc_hash) {
> +		u32 hash = hash_32((u32)mc_addr, MC_HASH_SZ_LOG);
> +
> +		for (im = rcu_dereference(mc_hash[hash]);
> +		     im != NULL;
> +		     im = rcu_dereference(im->next_hash)) {
> +			if (im->multiaddr == mc_addr)
> +				break;
> +		}
> +	} else {
> +		for_each_pmc_rcu(in_dev, im) {
> +			if (im->multiaddr == mc_addr)
> +				break;
> +		}
>  	}
>  	if (im && proto == IPPROTO_IGMP) {
>  		rv = 1;
> 
> 

Thanks Eric!  I ran this patch last night and it greatly improves
multicast receive performance on 3.10-rc4.  3.10 may still be 1-2us
slower than 3.4, but that is a little hard for me to tell at the
moment since it looks like there is an mmapped I/O regression in 3.10
that I also need to track down.

--
Shawn


* [PATCH net-next] igmp: hash a hash table to speedup ip_check_mc_rcu()
  2013-06-07 14:48         ` Shawn Bohrer
@ 2013-06-07 15:48           ` Eric Dumazet
  2013-06-07 17:33             ` Shawn Bohrer
                               ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Eric Dumazet @ 2013-06-07 15:48 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: netdev, davem

From: Eric Dumazet <edumazet@google.com>

After IP route cache removal, multicast applications using
a lot of multicast addresses hit an O(N) behavior in ip_check_mc_rcu().

Add a per in_device hash table to get faster lookup.

This hash table is created only if the number of items in mc_list is
above 4.

Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
Shawn, this is a different version from the v0,
so please test it and add your "Tested-by: " if OK, thanks.

 include/linux/igmp.h       |    1 
 include/linux/inetdevice.h |    5 ++
 net/ipv4/devinet.c         |    1 
 net/ipv4/igmp.c            |   73 +++++++++++++++++++++++++++++++++--
 4 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/include/linux/igmp.h b/include/linux/igmp.h
index 7f2bf15..e3362b5 100644
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -84,6 +84,7 @@ struct ip_mc_list {
 		struct ip_mc_list *next;
 		struct ip_mc_list __rcu *next_rcu;
 	};
+	struct ip_mc_list __rcu *next_hash;
 	struct timer_list	timer;
 	int			users;
 	atomic_t		refcnt;
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index ea1e3b8..b99cd23 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -50,12 +50,17 @@ struct ipv4_devconf {
 	DECLARE_BITMAP(state, IPV4_DEVCONF_MAX);
 };
 
+#define MC_HASH_SZ_LOG 9
+
 struct in_device {
 	struct net_device	*dev;
 	atomic_t		refcnt;
 	int			dead;
 	struct in_ifaddr	*ifa_list;	/* IP ifaddr chain		*/
+
 	struct ip_mc_list __rcu	*mc_list;	/* IP multicast filter chain    */
+	struct ip_mc_list __rcu	* __rcu *mc_hash;
+
 	int			mc_count;	/* Number of installed mcasts	*/
 	spinlock_t		mc_tomb_lock;
 	struct ip_mc_list	*mc_tomb;
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index b047e2d..3469506 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -215,6 +215,7 @@ void in_dev_finish_destroy(struct in_device *idev)
 
 	WARN_ON(idev->ifa_list);
 	WARN_ON(idev->mc_list);
+	kfree(rcu_dereference_protected(idev->mc_hash, 1));
 #ifdef NET_REFCNT_DEBUG
 	pr_debug("%s: %p=%s\n", __func__, idev, dev ? dev->name : "NIL");
 #endif
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 450f625..f72011d 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1217,6 +1217,57 @@ static void igmp_group_added(struct ip_mc_list *im)
  *	Multicast list managers
  */
 
+static u32 ip_mc_hash(const struct ip_mc_list *im)
+{
+	return hash_32((u32)im->multiaddr, MC_HASH_SZ_LOG);
+}
+
+static void ip_mc_hash_add(struct in_device *in_dev,
+			   struct ip_mc_list *im)
+{
+	struct ip_mc_list __rcu **mc_hash;
+	u32 hash;
+
+	mc_hash = rtnl_dereference(in_dev->mc_hash);
+	if (mc_hash) {
+		hash = ip_mc_hash(im);
+		im->next_hash = rtnl_dereference(mc_hash[hash]);
+		rcu_assign_pointer(mc_hash[hash], im);
+		return;
+	}
+
+	/* do not use a hash table for small number of items */
+	if (in_dev->mc_count < 4)
+		return;
+
+	mc_hash = kzalloc(sizeof(struct ip_mc_list *) << MC_HASH_SZ_LOG,
+			  GFP_KERNEL);
+	if (!mc_hash)
+		return;
+
+	for_each_pmc_rtnl(in_dev, im) {
+		hash = ip_mc_hash(im);
+		im->next_hash = rtnl_dereference(mc_hash[hash]);
+		RCU_INIT_POINTER(mc_hash[hash], im);
+	}
+
+	rcu_assign_pointer(in_dev->mc_hash, mc_hash);
+}
+
+static void ip_mc_hash_remove(struct in_device *in_dev,
+			      struct ip_mc_list *im)
+{
+	struct ip_mc_list __rcu **mc_hash = rtnl_dereference(in_dev->mc_hash);
+	struct ip_mc_list *aux;
+
+	if (!mc_hash)
+		return;
+	mc_hash += ip_mc_hash(im);
+	while ((aux = rtnl_dereference(*mc_hash)) != im)
+		mc_hash = &aux->next_hash;
+	*mc_hash = im->next_hash;
+}
+
 
 /*
  *	A socket has joined a multicast group on device dev.
@@ -1258,6 +1309,8 @@ void ip_mc_inc_group(struct in_device *in_dev, __be32 addr)
 	in_dev->mc_count++;
 	rcu_assign_pointer(in_dev->mc_list, im);
 
+	ip_mc_hash_add(in_dev, im);
+
 #ifdef CONFIG_IP_MULTICAST
 	igmpv3_del_delrec(in_dev, im->multiaddr);
 #endif
@@ -1314,6 +1367,7 @@ void ip_mc_dec_group(struct in_device *in_dev, __be32 addr)
 	     ip = &i->next_rcu) {
 		if (i->multiaddr == addr) {
 			if (--i->users == 0) {
+				ip_mc_hash_remove(in_dev, i);
 				*ip = i->next_rcu;
 				in_dev->mc_count--;
 				igmp_group_dropped(i);
@@ -2321,12 +2375,25 @@ void ip_mc_drop_socket(struct sock *sk)
 int ip_check_mc_rcu(struct in_device *in_dev, __be32 mc_addr, __be32 src_addr, u16 proto)
 {
 	struct ip_mc_list *im;
+	struct ip_mc_list __rcu **mc_hash;
 	struct ip_sf_list *psf;
 	int rv = 0;
 
-	for_each_pmc_rcu(in_dev, im) {
-		if (im->multiaddr == mc_addr)
-			break;
+	mc_hash = rcu_dereference(in_dev->mc_hash);
+	if (mc_hash) {
+		u32 hash = hash_32((u32)mc_addr, MC_HASH_SZ_LOG);
+
+		for (im = rcu_dereference(mc_hash[hash]);
+		     im != NULL;
+		     im = rcu_dereference(im->next_hash)) {
+			if (im->multiaddr == mc_addr)
+				break;
+		}
+	} else {
+		for_each_pmc_rcu(in_dev, im) {
+			if (im->multiaddr == mc_addr)
+				break;
+		}
 	}
 	if (im && proto == IPPROTO_IGMP) {
 		rv = 1;


* Re: [PATCH net-next] igmp: hash a hash table to speedup ip_check_mc_rcu()
  2013-06-07 15:48           ` [PATCH net-next] igmp: hash a hash table to speedup ip_check_mc_rcu() Eric Dumazet
@ 2013-06-07 17:33             ` Shawn Bohrer
  2013-06-08  4:39             ` Cong Wang
  2013-06-12  7:26             ` David Miller
  2 siblings, 0 replies; 12+ messages in thread
From: Shawn Bohrer @ 2013-06-07 17:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem

On Fri, Jun 07, 2013 at 08:48:57AM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> After IP route cache removal, multicast applications using
> a lot of multicast addresses hit an O(N) behavior in ip_check_mc_rcu().
> 
> Add a per in_device hash table to get faster lookup.
> 
> This hash table is created only if the number of items in mc_list is
> above 4.
> 
> Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> Shawn, this is a different version from the v0,
> so please test it and add your "Tested-by: " if OK, thanks.

Patch works great thanks!

Tested-by: Shawn Bohrer <sbohrer@rgmadvisors.com>

--
Shawn


* Re: [PATCH net-next] igmp: hash a hash table to speedup ip_check_mc_rcu()
  2013-06-07 15:48           ` [PATCH net-next] igmp: hash a hash table to speedup ip_check_mc_rcu() Eric Dumazet
  2013-06-07 17:33             ` Shawn Bohrer
@ 2013-06-08  4:39             ` Cong Wang
  2013-06-08  5:23               ` Eric Dumazet
  2013-06-12  7:26             ` David Miller
  2 siblings, 1 reply; 12+ messages in thread
From: Cong Wang @ 2013-06-08  4:39 UTC (permalink / raw)
  To: netdev

On Fri, 07 Jun 2013 at 15:48 GMT, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> diff --git a/include/linux/igmp.h b/include/linux/igmp.h
> index 7f2bf15..e3362b5 100644
> --- a/include/linux/igmp.h
> +++ b/include/linux/igmp.h
> @@ -84,6 +84,7 @@ struct ip_mc_list {
>  		struct ip_mc_list *next;
>  		struct ip_mc_list __rcu *next_rcu;
>  	};
> +	struct ip_mc_list __rcu *next_hash;


Why not put this into the above union?


> +static void ip_mc_hash_add(struct in_device *in_dev,
> +			   struct ip_mc_list *im)
> +{
> +	struct ip_mc_list __rcu **mc_hash;
> +	u32 hash;
> +
> +	mc_hash = rtnl_dereference(in_dev->mc_hash);
> +	if (mc_hash) {
> +		hash = ip_mc_hash(im);
> +		im->next_hash = rtnl_dereference(mc_hash[hash]);
> +		rcu_assign_pointer(mc_hash[hash], im);
> +		return;
> +	}
> +
> +	/* do not use a hash table for small number of items */
> +	if (in_dev->mc_count < 4)
> +		return;


Can this check be moved to the beginning of this function?


> +
> +	mc_hash = kzalloc(sizeof(struct ip_mc_list *) << MC_HASH_SZ_LOG,
> +			  GFP_KERNEL);
> +	if (!mc_hash)
> +		return;
> +
> +	for_each_pmc_rtnl(in_dev, im) {
> +		hash = ip_mc_hash(im);
> +		im->next_hash = rtnl_dereference(mc_hash[hash]);
> +		RCU_INIT_POINTER(mc_hash[hash], im);
> +	}
> +
> +	rcu_assign_pointer(in_dev->mc_hash, mc_hash);
> +}
> +

Thanks!


* Re: [PATCH net-next] igmp: hash a hash table to speedup ip_check_mc_rcu()
  2013-06-08  4:39             ` Cong Wang
@ 2013-06-08  5:23               ` Eric Dumazet
  2013-06-10  1:58                 ` Cong Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2013-06-08  5:23 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev

On Sat, 2013-06-08 at 04:39 +0000, Cong Wang wrote:
> On Fri, 07 Jun 2013 at 15:48 GMT, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > diff --git a/include/linux/igmp.h b/include/linux/igmp.h
> > index 7f2bf15..e3362b5 100644
> > --- a/include/linux/igmp.h
> > +++ b/include/linux/igmp.h
> > @@ -84,6 +84,7 @@ struct ip_mc_list {
> >  		struct ip_mc_list *next;
> >  		struct ip_mc_list __rcu *next_rcu;
> >  	};
> > +	struct ip_mc_list __rcu *next_hash;
> 
> 
> Why not put this into the above union?
> 

Because it must be separate storage.

Read ip_mc_hash_add(), it's pretty clear...
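
Spelled out: once the hash table exists, every ip_mc_list entry sits
on the plain mc_list (chained via next/next_rcu) and in its hash
bucket (chained via next_hash) at the same time, so the two links
cannot share a union:

	union {
		struct ip_mc_list *next;
		struct ip_mc_list __rcu *next_rcu;	/* mc_list chain */
	};
	struct ip_mc_list __rcu *next_hash;		/* bucket chain */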

> 
> > +static void ip_mc_hash_add(struct in_device *in_dev,
> > +			   struct ip_mc_list *im)
> > +{
> > +	struct ip_mc_list __rcu **mc_hash;
> > +	u32 hash;
> > +
> > +	mc_hash = rtnl_dereference(in_dev->mc_hash);
> > +	if (mc_hash) {
> > +		hash = ip_mc_hash(im);
> > +		im->next_hash = rtnl_dereference(mc_hash[hash]);
> > +		rcu_assign_pointer(mc_hash[hash], im);
> > +		return;
> > +	}
> > +
> > +	/* do not use a hash table for small number of items */
> > +	if (in_dev->mc_count < 4)
> > +		return;
> 
> 
> Can this check be moved to the beginning of this function?

Absolutely not.

Once the hash table is created, all items must be inserted into it,
because of RCU lookups: a reader that sees a non-NULL mc_hash walks
only the hash chain, so any group missing from the table would become
invisible to ip_check_mc_rcu().


* Re: [PATCH net-next] igmp: hash a hash table to speedup ip_check_mc_rcu()
  2013-06-08  5:23               ` Eric Dumazet
@ 2013-06-10  1:58                 ` Cong Wang
  0 siblings, 0 replies; 12+ messages in thread
From: Cong Wang @ 2013-06-10  1:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers

Thanks for the explanation!

Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>


* Re: [PATCH net-next] igmp: hash a hash table to speedup ip_check_mc_rcu()
  2013-06-07 15:48           ` [PATCH net-next] igmp: hash a hash table to speedup ip_check_mc_rcu() Eric Dumazet
  2013-06-07 17:33             ` Shawn Bohrer
  2013-06-08  4:39             ` Cong Wang
@ 2013-06-12  7:26             ` David Miller
  2 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2013-06-12  7:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: sbohrer, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 07 Jun 2013 08:48:57 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> After IP route cache removal, multicast applications using
> a lot of multicast addresses hit an O(N) behavior in ip_check_mc_rcu().
> 
> Add a per in_device hash table to get faster lookup.
> 
> This hash table is created only if the number of items in mc_list is
> above 4.
> 
> Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.


Thread overview: 12+ messages
2013-06-05 17:57 Performance regression from routing cache removal? Shawn Bohrer
2013-06-05 18:13 ` Eric Dumazet
2013-06-05 20:32   ` Shawn Bohrer
2013-06-05 20:52     ` Eric Dumazet
2013-06-07  0:35       ` Eric Dumazet
2013-06-07 14:48         ` Shawn Bohrer
2013-06-07 15:48           ` [PATCH net-next] igmp: hash a hash table to speedup ip_check_mc_rcu() Eric Dumazet
2013-06-07 17:33             ` Shawn Bohrer
2013-06-08  4:39             ` Cong Wang
2013-06-08  5:23               ` Eric Dumazet
2013-06-10  1:58                 ` Cong Wang
2013-06-12  7:26             ` David Miller
