* [net-next PATCH v4 00/16] net/sched use rcu filters
@ 2014-09-10 15:46 John Fastabend
  2014-09-10 15:47 ` [net-next PATCH v4 01/16] net: qdisc: use rcu prefix and silence sparse warnings John Fastabend
                   ` (15 more replies)
  0 siblings, 16 replies; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:46 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

A rather long description...

This series drops the qdisc lock that is currently protecting the
ingress qdisc. This can be done after the tcf filters are made
lockless and the statistic accounting is safe to run without locks.

To do this the classifiers are converted to use RCU. This requires
updating each classifier individually to handle the new copy/update
requirement and also updating the core list traversals. This is
done in patches 2-12. This also makes the assumption that updates
to the tables are infrequent in comparison to the packets per
second being classified. On a 10Gbps link running near line rate
we can easily produce 12+ million packets per second, so IMO this
is a reasonable assumption. The updates are serialized by RTNL.
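
To make the copy/update requirement concrete, here is a minimal
sketch of the read/update split using made-up demo_* structures and
helpers (this is not the tcf_proto code from these patches): readers
walk the filter list under rcu_read_lock(), while updaters,
serialized by RTNL, publish replacements with list_replace_rcu() and
free old entries only after a grace period.

#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/rtnetlink.h>
#include <linux/skbuff.h>
#include <linux/slab.h>
#include <linux/types.h>

/* Illustrative only -- not the tcf_proto/cls_* structures. */
struct demo_filter {
	struct list_head	link;
	struct rcu_head		rcu;
	u32			classid;
};

/* Placeholder match helper; stands in for ematch/key comparisons. */
static bool demo_match(const struct sk_buff *skb,
		       const struct demo_filter *f)
{
	return skb->priority == f->classid;
}

/* Read side: lockless traversal, called with rcu_read_lock() held
 * by the fast path, concurrent with updates.
 */
static u32 demo_classify(struct list_head *flist,
			 const struct sk_buff *skb)
{
	struct demo_filter *f;

	list_for_each_entry_rcu(f, flist, link) {
		if (demo_match(skb, f))
			return f->classid;
	}
	return 0;
}

/* Update side: serialized by RTNL; the old filter is freed only
 * after a grace period, when readers can no longer see it.
 */
static void demo_replace(struct demo_filter *old, struct demo_filter *new)
{
	ASSERT_RTNL();
	list_replace_rcu(&old->link, &new->link);
	kfree_rcu(old, rcu);
}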

In order to have working statistics, patches 14 and 15 convert the
bstats and qstats, which do accounting for bytes and packets, into
percpu variables, and the u64_stats_update_{begin|end}
infrastructure is used to maintain consistent 64bit statistics.
Because these statistics are also used by the estimators, those
function calls had to be updated as well. So that I didn't have to
modify all qdiscs at this time, many of which don't have an easy
path to being made lockless, the percpu statistics are only used
when the TCQ_F_LLQDISC flag is set. It's worth noting that in the
mq and mqprio case sub-qdiscs are already mapped 1:1 with TX queues,
which tend to be equal to the number of CPUs in the system, so it's
not clear that removing locking in these cases would provide large
benefits. Most likely a new qdisc written from scratch would be
needed to implement an mq-htb or mq-tbf qdisc. It seems to me that
one reasonable approach to do this would be to use eventually
consistent counters and accept some imperfect rate limiting scheme.
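
For illustration, the per-cpu byte/packet accounting follows roughly
this minimal sketch (the demo_* names are made up; this is not the
gen_stats code added by the series). It assumes the update runs in
the qdisc transmit path with BH disabled, so this_cpu_ptr() needs no
extra protection and readers reconcile the 64bit values with the
u64_stats_fetch_begin/retry helpers.

#include <linux/percpu.h>
#include <linux/skbuff.h>
#include <linux/types.h>
#include <linux/u64_stats_sync.h>
#include <net/sch_generic.h>

/* Illustrative per-cpu counters, not the series' gnet_stats layout. */
struct demo_bstats_cpu {
	u64			bytes;
	u64			packets;
	struct u64_stats_sync	syncp;
};

/* Each CPU bumps only its own counters; u64_stats_update_begin/end
 * keep the 64bit values consistent for readers on 32bit machines.
 */
static void demo_bstats_update(struct demo_bstats_cpu __percpu *stats,
			       const struct sk_buff *skb)
{
	struct demo_bstats_cpu *b = this_cpu_ptr(stats);

	u64_stats_update_begin(&b->syncp);
	b->bytes   += qdisc_pkt_len(skb);
	b->packets += 1;
	u64_stats_update_end(&b->syncp);
}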

As for some history, I wrote what was basically these patches some
time ago and then got stalled working on other things. Cong Wang
made a proposal to remove the locking around the ingress qdisc,
which kicked me to get these patches working again. Some time
passed and I am now submitting the patches.

I have done some basic testing on this series and do not see any
immediate splats or issues. The patch series has been running on
my dev systems for a month or so now and I've not seen any issues,
although my configurations are not overly complicated.

My test cases at this point cover all the filters with a tight
loop to add/remove filters, some basic estimator tests where I add
an estimator to the qdisc and verify the statistics are accurate
using pktgen, and finally a small script to exercise the
'tc actions' interface. Feel free to send me more tests off list
and I can run them.

Comments:
  - Checkpatch is still giving errors on some >80 char lines; I know
    about this. IMO the way to fix this is to restructure the sched
    code to avoid being so heavily indented, but doing that here
    bloats the patchset and anyway there are already lots of >80
    char lines in these files. I would prefer to keep the patches
    as is, but let me know if others think I should fix these and
    I will. A follow up patch set could restructure the code and
    fix this throughout the code blocks.

Future work:
  - provide metadata such as the current cpu for the classifier
    to match on. This would allow for a multiqueue ingress
    qdisc strategy.
  - provide a filter hook on egress before the queue is selected
    to allow a classifier/action to pick the tx queue. This
    generalizes mqprio and should remove the need for many
    drivers to implement select_queue() callbacks. I have
    a patch for this now but it's not entirely clear to me
    it's all that useful considering mqprio already allows
    queueing by skb->priority. I also have a patch to do
    hardware based queue rate limiting; with that patch the
    egress filter is more interesting.
  - create a variant of tbf that does not require the qdisc
    lock using eventually consistent counters.
  - a lockless fifo ring may provide some wins for some use
    cases.

Changes:
 - v2: Fixed alloc without null check and used kmemdup.
       Use rcu_access_pointer.

 - v3: I inadvertently changed the logic in tcf_destroy_chain
       such that the tcf_proto list was not being NULL'd, as
       pointed out by Dave Miller. My test cases were still
       passing because this is only done after the qdisc is
       detached from the xmit path and right before the qdisc
       itself is destroyed, so at least in the existing code
       paths there were no further attempts to access the
       tcf_proto list. Anyway, it was wrong and should now be
       resolved by using correct rcu semantics.

       Also I fixed the sparse warnings from the tcf_chain
       calls and the percpu stats usage by doing correct
       rcu annotations.

 - v4: Eric Dumazet corrected my usage of RCU_INIT_POINTER,
       so I moved many rcu_assign_pointer calls over to
       RCU_INIT_POINTER. Also resolved some other comments.

       Split the cls_u32 patch into two patches: one for the
       percpu logic and the other for the rcu transformation.
       
       Cong Wang spotted a missing 'static' in cls_u32.

---

John Fastabend (16):
      net: qdisc: use rcu prefix and silence sparse warnings
      net: rcu-ify tcf_proto
      net: sched: cls_basic use RCU
      net: sched: cls_cgroup use RCU
      net: sched: cls_flow use RCU
      net: sched: fw use RCU
      net: sched: RCU cls_route
      net: sched: RCU cls_tcindex
      net: sched: make cls_u32 per cpu
      net: sched: make cls_u32 lockless
      net: sched: rcu'ify cls_rsvp
      net: sched: rcu'ify cls_bpf
      net: sched: make tc_action safe to walk under RCU
      net: sched: make bstats per cpu and estimator RCU safe
      net: sched: make qstats per cpu
      net: sched: drop ingress qdisc lock


 include/linux/netdevice.h  |   29 +----
 include/linux/rtnetlink.h  |   10 ++
 include/net/act_api.h      |    1 
 include/net/codel.h        |    4 -
 include/net/gen_stats.h    |   18 +++
 include/net/pkt_cls.h      |   10 ++
 include/net/sch_generic.h  |   93 ++++++++++++----
 net/core/dev.c             |   53 ++++++++-
 net/core/gen_estimator.c   |   60 ++++++++--
 net/core/gen_stats.c       |   77 +++++++++++++
 net/netfilter/xt_RATEEST.c |    4 -
 net/sched/act_api.c        |   23 ++--
 net/sched/act_police.c     |    4 -
 net/sched/cls_api.c        |   47 ++++----
 net/sched/cls_basic.c      |   80 ++++++++------
 net/sched/cls_bpf.c        |   93 ++++++++--------
 net/sched/cls_cgroup.c     |   63 +++++++----
 net/sched/cls_flow.c       |  145 ++++++++++++++-----------
 net/sched/cls_fw.c         |  111 +++++++++++++------
 net/sched/cls_route.c      |  226 +++++++++++++++++++++++----------------
 net/sched/cls_rsvp.h       |  157 +++++++++++++++------------
 net/sched/cls_tcindex.c    |  248 ++++++++++++++++++++++++++----------------
 net/sched/cls_u32.c        |  258 +++++++++++++++++++++++++++++---------------
 net/sched/sch_api.c        |   86 ++++++++++++---
 net/sched/sch_atm.c        |   22 ++--
 net/sched/sch_cbq.c        |   30 +++--
 net/sched/sch_choke.c      |   32 +++--
 net/sched/sch_codel.c      |    2 
 net/sched/sch_drr.c        |   26 +++-
 net/sched/sch_dsmark.c     |   11 +-
 net/sched/sch_fifo.c       |    6 +
 net/sched/sch_fq.c         |    4 -
 net/sched/sch_fq_codel.c   |   19 ++-
 net/sched/sch_generic.c    |   21 +++-
 net/sched/sch_gred.c       |   10 +-
 net/sched/sch_hfsc.c       |   38 ++++--
 net/sched/sch_hhf.c        |    8 +
 net/sched/sch_htb.c        |   35 +++---
 net/sched/sch_ingress.c    |   20 +++
 net/sched/sch_mq.c         |   31 +++--
 net/sched/sch_mqprio.c     |   54 ++++++---
 net/sched/sch_multiq.c     |   18 ++-
 net/sched/sch_netem.c      |   17 ++-
 net/sched/sch_pie.c        |    8 +
 net/sched/sch_plug.c       |    2 
 net/sched/sch_prio.c       |   21 ++--
 net/sched/sch_qfq.c        |   29 +++--
 net/sched/sch_red.c        |   13 +-
 net/sched/sch_sfb.c        |   28 +++--
 net/sched/sch_sfq.c        |   30 +++--
 net/sched/sch_tbf.c        |   11 +-
 net/sched/sch_teql.c       |   13 +-
 52 files changed, 1562 insertions(+), 897 deletions(-)

-- 
Signature


* [net-next PATCH v4 01/16] net: qdisc: use rcu prefix and silence sparse warnings
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
@ 2014-09-10 15:47 ` John Fastabend
  2014-09-11  0:23   ` Eric Dumazet
  2014-09-10 15:47 ` [net-next PATCH v4 02/16] net: rcu-ify tcf_proto John Fastabend
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:47 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

Add __rcu notation to qdisc handling; by doing this we can make
smatch output more legible. And anyway, some of the cases should
be using rcu_dereference(); see qdisc_all_tx_empty(),
qdisc_tx_changing(), and so on.

Also the *wake_queue() API is commonly called from driver timer
routines without the rcu lock or rtnl lock held, so I added
rcu_read_lock() blocks around netif_wake_subqueue and
netif_tx_wake_queue.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/netdevice.h |   29 ++++----------------------
 include/net/sch_generic.h |   25 +++++++++++++++++-----
 net/core/dev.c            |   51 +++++++++++++++++++++++++++++++++++++++++++--
 net/sched/sch_generic.c   |    4 ++--
 net/sched/sch_mqprio.c    |    6 ++++-
 net/sched/sch_teql.c      |   13 +++++++----
 6 files changed, 86 insertions(+), 42 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ba72f6b..ae721f5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -543,7 +543,7 @@ struct netdev_queue {
  * read mostly part
  */
 	struct net_device	*dev;
-	struct Qdisc		*qdisc;
+	struct Qdisc __rcu	*qdisc;
 	struct Qdisc		*qdisc_sleeping;
 #ifdef CONFIG_SYSFS
 	struct kobject		kobj;
@@ -2356,12 +2356,7 @@ static inline void input_queue_tail_incr_save(struct softnet_data *sd,
 DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
 
 void __netif_schedule(struct Qdisc *q);
-
-static inline void netif_schedule_queue(struct netdev_queue *txq)
-{
-	if (!(txq->state & QUEUE_STATE_ANY_XOFF))
-		__netif_schedule(txq->qdisc);
-}
+void netif_schedule_queue(struct netdev_queue *txq);
 
 static inline void netif_tx_schedule_all(struct net_device *dev)
 {
@@ -2397,11 +2392,7 @@ static inline void netif_tx_start_all_queues(struct net_device *dev)
 	}
 }
 
-static inline void netif_tx_wake_queue(struct netdev_queue *dev_queue)
-{
-	if (test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state))
-		__netif_schedule(dev_queue->qdisc);
-}
+void netif_tx_wake_queue(struct netdev_queue *dev_queue);
 
 /**
  *	netif_wake_queue - restart transmit
@@ -2673,19 +2664,7 @@ static inline bool netif_subqueue_stopped(const struct net_device *dev,
 	return __netif_subqueue_stopped(dev, skb_get_queue_mapping(skb));
 }
 
-/**
- *	netif_wake_subqueue - allow sending packets on subqueue
- *	@dev: network device
- *	@queue_index: sub queue index
- *
- * Resume individual transmit queue of a device with multiple transmit queues.
- */
-static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
-{
-	struct netdev_queue *txq = netdev_get_tx_queue(dev, queue_index);
-	if (test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &txq->state))
-		__netif_schedule(txq->qdisc);
-}
+void netif_wake_subqueue(struct net_device *dev, u16 queue_index);
 
 #ifdef CONFIG_XPS
 int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a3cfb8e..ce3b920 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -259,7 +259,9 @@ static inline spinlock_t *qdisc_lock(struct Qdisc *qdisc)
 
 static inline struct Qdisc *qdisc_root(const struct Qdisc *qdisc)
 {
-	return qdisc->dev_queue->qdisc;
+	struct Qdisc *q = rcu_dereference_rtnl(qdisc->dev_queue->qdisc);
+
+	return q;
 }
 
 static inline struct Qdisc *qdisc_root_sleeping(const struct Qdisc *qdisc)
@@ -384,7 +386,7 @@ static inline void qdisc_reset_all_tx_gt(struct net_device *dev, unsigned int i)
 	struct Qdisc *qdisc;
 
 	for (; i < dev->num_tx_queues; i++) {
-		qdisc = netdev_get_tx_queue(dev, i)->qdisc;
+		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc);
 		if (qdisc) {
 			spin_lock_bh(qdisc_lock(qdisc));
 			qdisc_reset(qdisc);
@@ -402,13 +404,18 @@ static inline void qdisc_reset_all_tx(struct net_device *dev)
 static inline bool qdisc_all_tx_empty(const struct net_device *dev)
 {
 	unsigned int i;
+
+	rcu_read_lock();
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-		const struct Qdisc *q = txq->qdisc;
+		const struct Qdisc *q = rcu_dereference(txq->qdisc);
 
-		if (q->q.qlen)
+		if (q->q.qlen) {
+			rcu_read_unlock();
 			return false;
+		}
 	}
+	rcu_read_unlock();
 	return true;
 }
 
@@ -416,10 +423,13 @@ static inline bool qdisc_all_tx_empty(const struct net_device *dev)
 static inline bool qdisc_tx_changing(const struct net_device *dev)
 {
 	unsigned int i;
+
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-		if (txq->qdisc != txq->qdisc_sleeping)
+		if (rcu_access_pointer(txq->qdisc) != txq->qdisc_sleeping) {
+			rcu_read_unlock();
 			return true;
+		}
 	}
 	return false;
 }
@@ -428,10 +438,13 @@ static inline bool qdisc_tx_changing(const struct net_device *dev)
 static inline bool qdisc_tx_is_noop(const struct net_device *dev)
 {
 	unsigned int i;
+
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-		if (txq->qdisc != &noop_qdisc)
+		if (rcu_dereference(txq->qdisc) != &noop_qdisc) {
+			rcu_read_unlock();
 			return false;
+		}
 	}
 	return true;
 }
diff --git a/net/core/dev.c b/net/core/dev.c
index 3c6a967..b3d6dbc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2177,6 +2177,53 @@ static struct dev_kfree_skb_cb *get_kfree_skb_cb(const struct sk_buff *skb)
 	return (struct dev_kfree_skb_cb *)skb->cb;
 }
 
+void netif_schedule_queue(struct netdev_queue *txq)
+{
+	rcu_read_lock();
+	if (!(txq->state & QUEUE_STATE_ANY_XOFF)) {
+		struct Qdisc *q = rcu_dereference(txq->qdisc);
+
+		__netif_schedule(q);
+	}
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(netif_schedule_queue);
+
+/**
+ *	netif_wake_subqueue - allow sending packets on subqueue
+ *	@dev: network device
+ *	@queue_index: sub queue index
+ *
+ * Resume individual transmit queue of a device with multiple transmit queues.
+ */
+void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
+{
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, queue_index);
+
+	if (test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &txq->state)) {
+		struct Qdisc *q;
+
+		rcu_read_lock();
+		q = rcu_dereference(txq->qdisc);
+		__netif_schedule(q);
+		rcu_read_unlock();
+	}
+}
+EXPORT_SYMBOL(netif_wake_subqueue);
+
+void netif_tx_wake_queue(struct netdev_queue *dev_queue)
+{
+	if (test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state)) {
+		struct Qdisc *q;
+
+		rcu_read_lock();
+		q = rcu_dereference(dev_queue->qdisc);
+		__netif_schedule(q);
+		rcu_read_unlock();
+	}
+}
+EXPORT_SYMBOL(netif_tx_wake_queue);
+
 void __dev_kfree_skb_irq(struct sk_buff *skb, enum skb_free_reason reason)
 {
 	unsigned long flags;
@@ -3432,7 +3479,7 @@ static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
 	skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl);
 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
 
-	q = rxq->qdisc;
+	q = rcu_dereference(rxq->qdisc);
 	if (q != &noop_qdisc) {
 		spin_lock(qdisc_lock(q));
 		if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
@@ -3449,7 +3496,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 {
 	struct netdev_queue *rxq = rcu_dereference(skb->dev->ingress_queue);
 
-	if (!rxq || rxq->qdisc == &noop_qdisc)
+	if (!rxq || rcu_access_pointer(rxq->qdisc) == &noop_qdisc)
 		goto out;
 
 	if (*pt_prev) {
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 19696eb..346ef85 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -783,7 +783,7 @@ static void dev_deactivate_queue(struct net_device *dev,
 	struct Qdisc *qdisc_default = _qdisc_default;
 	struct Qdisc *qdisc;
 
-	qdisc = dev_queue->qdisc;
+	qdisc = rtnl_dereference(dev_queue->qdisc);
 	if (qdisc) {
 		spin_lock_bh(qdisc_lock(qdisc));
 
@@ -876,7 +876,7 @@ static void dev_init_scheduler_queue(struct net_device *dev,
 {
 	struct Qdisc *qdisc = _qdisc;
 
-	dev_queue->qdisc = qdisc;
+	rcu_assign_pointer(dev_queue->qdisc, qdisc);
 	dev_queue->qdisc_sleeping = qdisc;
 }
 
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 6749e2f..37e7d25 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -231,7 +231,7 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
 	for (i = 0; i < dev->num_tx_queues; i++) {
-		qdisc = netdev_get_tx_queue(dev, i)->qdisc;
+		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc);
 		spin_lock_bh(qdisc_lock(qdisc));
 		sch->q.qlen		+= qdisc->q.qlen;
 		sch->bstats.bytes	+= qdisc->bstats.bytes;
@@ -340,7 +340,9 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 		spin_unlock_bh(d->lock);
 
 		for (i = tc.offset; i < tc.offset + tc.count; i++) {
-			qdisc = netdev_get_tx_queue(dev, i)->qdisc;
+			struct netdev_queue *q = netdev_get_tx_queue(dev, i);
+
+			qdisc = rtnl_dereference(q->qdisc);
 			spin_lock_bh(qdisc_lock(qdisc));
 			bstats.bytes      += qdisc->bstats.bytes;
 			bstats.packets    += qdisc->bstats.packets;
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index aaa8d03..5cd291b 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -96,11 +96,14 @@ teql_dequeue(struct Qdisc *sch)
 	struct teql_sched_data *dat = qdisc_priv(sch);
 	struct netdev_queue *dat_queue;
 	struct sk_buff *skb;
+	struct Qdisc *q;
 
 	skb = __skb_dequeue(&dat->q);
 	dat_queue = netdev_get_tx_queue(dat->m->dev, 0);
+	q = rcu_dereference_bh(dat_queue->qdisc);
+
 	if (skb == NULL) {
-		struct net_device *m = qdisc_dev(dat_queue->qdisc);
+		struct net_device *m = qdisc_dev(q);
 		if (m) {
 			dat->m->slaves = sch;
 			netif_wake_queue(m);
@@ -108,7 +111,7 @@ teql_dequeue(struct Qdisc *sch)
 	} else {
 		qdisc_bstats_update(sch, skb);
 	}
-	sch->q.qlen = dat->q.qlen + dat_queue->qdisc->q.qlen;
+	sch->q.qlen = dat->q.qlen + q->q.qlen;
 	return skb;
 }
 
@@ -157,9 +160,9 @@ teql_destroy(struct Qdisc *sch)
 						txq = netdev_get_tx_queue(master->dev, 0);
 						master->slaves = NULL;
 
-						root_lock = qdisc_root_sleeping_lock(txq->qdisc);
+						root_lock = qdisc_root_sleeping_lock(rtnl_dereference(txq->qdisc));
 						spin_lock_bh(root_lock);
-						qdisc_reset(txq->qdisc);
+						qdisc_reset(rtnl_dereference(txq->qdisc));
 						spin_unlock_bh(root_lock);
 					}
 				}
@@ -266,7 +269,7 @@ static inline int teql_resolve(struct sk_buff *skb,
 	struct dst_entry *dst = skb_dst(skb);
 	int res;
 
-	if (txq->qdisc == &noop_qdisc)
+	if (rcu_access_pointer(txq->qdisc) == &noop_qdisc)
 		return -ENODEV;
 
 	if (!dev->header_ops || !dst)


* [net-next PATCH v4 02/16] net: rcu-ify tcf_proto
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
  2014-09-10 15:47 ` [net-next PATCH v4 01/16] net: qdisc: use rcu prefix and silence sparse warnings John Fastabend
@ 2014-09-10 15:47 ` John Fastabend
  2014-09-11  0:56   ` Eric Dumazet
  2014-09-10 15:47 ` [net-next PATCH v4 03/16] net: sched: cls_basic use RCU John Fastabend
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:47 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

rcu'ify tcf_proto; this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.

This patch prepares the core net_sched infrastructure for running
the classifier/action chains without holding the qdisc lock; however,
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fallout.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |    9 +++++----
 net/sched/cls_api.c       |   30 +++++++++++++++---------------
 net/sched/sch_api.c       |   10 +++++-----
 net/sched/sch_atm.c       |   20 +++++++++++---------
 net/sched/sch_cbq.c       |   11 +++++++----
 net/sched/sch_choke.c     |   17 ++++++++++++-----
 net/sched/sch_drr.c       |    9 ++++++---
 net/sched/sch_dsmark.c    |    9 +++++----
 net/sched/sch_fq_codel.c  |   11 +++++++----
 net/sched/sch_hfsc.c      |    8 ++++----
 net/sched/sch_htb.c       |   15 ++++++++-------
 net/sched/sch_ingress.c   |    8 +++++---
 net/sched/sch_multiq.c    |    8 +++++---
 net/sched/sch_prio.c      |   11 +++++++----
 net/sched/sch_qfq.c       |    9 ++++++---
 net/sched/sch_sfb.c       |   15 +++++++++------
 net/sched/sch_sfq.c       |   11 +++++++----
 17 files changed, 124 insertions(+), 87 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index ce3b920..206d906 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -143,7 +143,7 @@ struct Qdisc_class_ops {
 	void			(*walk)(struct Qdisc *, struct qdisc_walker * arg);
 
 	/* Filter manipulation */
-	struct tcf_proto **	(*tcf_chain)(struct Qdisc *, unsigned long);
+	struct tcf_proto __rcu ** (*tcf_chain)(struct Qdisc *, unsigned long);
 	unsigned long		(*bind_tcf)(struct Qdisc *, unsigned long,
 					u32 classid);
 	void			(*unbind_tcf)(struct Qdisc *, unsigned long);
@@ -212,8 +212,8 @@ struct tcf_proto_ops {
 
 struct tcf_proto {
 	/* Fast access part */
-	struct tcf_proto	*next;
-	void			*root;
+	struct tcf_proto __rcu	*next;
+	void __rcu		*root;
 	int			(*classify)(struct sk_buff *,
 					    const struct tcf_proto *,
 					    struct tcf_result *);
@@ -225,6 +225,7 @@ struct tcf_proto {
 	struct Qdisc		*q;
 	void			*data;
 	const struct tcf_proto_ops	*ops;
+	struct rcu_head		rcu;
 };
 
 struct qdisc_skb_cb {
@@ -378,7 +379,7 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
 void __qdisc_calculate_pkt_len(struct sk_buff *skb,
 			       const struct qdisc_size_table *stab);
 void tcf_destroy(struct tcf_proto *tp);
-void tcf_destroy_chain(struct tcf_proto **fl);
+void tcf_destroy_chain(struct tcf_proto __rcu **fl);
 
 /* Reset all TX qdiscs greater then index of a device.  */
 static inline void qdisc_reset_all_tx_gt(struct net_device *dev, unsigned int i)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index c28b0d3..e547efd 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -117,7 +117,6 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
 {
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tca[TCA_MAX + 1];
-	spinlock_t *root_lock;
 	struct tcmsg *t;
 	u32 protocol;
 	u32 prio;
@@ -125,7 +124,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
 	u32 parent;
 	struct net_device *dev;
 	struct Qdisc  *q;
-	struct tcf_proto **back, **chain;
+	struct tcf_proto __rcu **back;
+	struct tcf_proto __rcu **chain;
 	struct tcf_proto *tp;
 	const struct tcf_proto_ops *tp_ops;
 	const struct Qdisc_class_ops *cops;
@@ -197,7 +197,9 @@ replay:
 		goto errout;
 
 	/* Check the chain for existence of proto-tcf with this priority */
-	for (back = chain; (tp = *back) != NULL; back = &tp->next) {
+	for (back = chain;
+	     (tp = rtnl_dereference(*back)) != NULL;
+	     back = &tp->next) {
 		if (tp->prio >= prio) {
 			if (tp->prio == prio) {
 				if (!nprio ||
@@ -209,8 +211,6 @@ replay:
 		}
 	}
 
-	root_lock = qdisc_root_sleeping_lock(q);
-
 	if (tp == NULL) {
 		/* Proto-tcf does not exist, create new one */
 
@@ -259,7 +259,8 @@ replay:
 		}
 		tp->ops = tp_ops;
 		tp->protocol = protocol;
-		tp->prio = nprio ? : TC_H_MAJ(tcf_auto_prio(*back));
+		tp->prio = nprio ? :
+			       TC_H_MAJ(tcf_auto_prio(rtnl_dereference(*back)));
 		tp->q = q;
 		tp->classify = tp_ops->classify;
 		tp->classid = parent;
@@ -280,9 +281,9 @@ replay:
 
 	if (fh == 0) {
 		if (n->nlmsg_type == RTM_DELTFILTER && t->tcm_handle == 0) {
-			spin_lock_bh(root_lock);
-			*back = tp->next;
-			spin_unlock_bh(root_lock);
+			struct tcf_proto *next = rtnl_dereference(tp->next);
+
+			RCU_INIT_POINTER(*back, next);
 
 			tfilter_notify(net, skb, n, tp, fh, RTM_DELTFILTER);
 			tcf_destroy(tp);
@@ -322,10 +323,8 @@ replay:
 			      n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE);
 	if (err == 0) {
 		if (tp_created) {
-			spin_lock_bh(root_lock);
-			tp->next = *back;
-			*back = tp;
-			spin_unlock_bh(root_lock);
+			RCU_INIT_POINTER(tp->next, rtnl_dereference(*back));
+			rcu_assign_pointer(*back, tp);
 		}
 		tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER);
 	} else {
@@ -420,7 +419,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 	int s_t;
 	struct net_device *dev;
 	struct Qdisc *q;
-	struct tcf_proto *tp, **chain;
+	struct tcf_proto *tp, __rcu **chain;
 	struct tcmsg *tcm = nlmsg_data(cb->nlh);
 	unsigned long cl = 0;
 	const struct Qdisc_class_ops *cops;
@@ -454,7 +453,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 
 	s_t = cb->args[0];
 
-	for (tp = *chain, t = 0; tp; tp = tp->next, t++) {
+	for (tp = rtnl_dereference(*chain), t = 0;
+	     tp; tp = rtnl_dereference(tp->next), t++) {
 		if (t < s_t)
 			continue;
 		if (TC_H_MAJ(tcm->tcm_info) &&
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 58bed75..ca62483 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1781,7 +1781,7 @@ int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
 	__be16 protocol = skb->protocol;
 	int err;
 
-	for (; tp; tp = tp->next) {
+	for (; tp; tp = rcu_dereference_bh(tp->next)) {
 		if (tp->protocol != protocol &&
 		    tp->protocol != htons(ETH_P_ALL))
 			continue;
@@ -1833,15 +1833,15 @@ void tcf_destroy(struct tcf_proto *tp)
 {
 	tp->ops->destroy(tp);
 	module_put(tp->ops->owner);
-	kfree(tp);
+	kfree_rcu(tp, rcu);
 }
 
-void tcf_destroy_chain(struct tcf_proto **fl)
+void tcf_destroy_chain(struct tcf_proto __rcu **fl)
 {
 	struct tcf_proto *tp;
 
-	while ((tp = *fl) != NULL) {
-		*fl = tp->next;
+	while ((tp = rtnl_dereference(*fl)) != NULL) {
+		RCU_INIT_POINTER(*fl, tp->next);
 		tcf_destroy(tp);
 	}
 }
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index 8449b33..c398f9c 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -41,7 +41,7 @@
 
 struct atm_flow_data {
 	struct Qdisc		*q;	/* FIFO, TBF, etc. */
-	struct tcf_proto	*filter_list;
+	struct tcf_proto __rcu	*filter_list;
 	struct atm_vcc		*vcc;	/* VCC; NULL if VCC is closed */
 	void			(*old_pop)(struct atm_vcc *vcc,
 					   struct sk_buff *skb); /* chaining */
@@ -273,7 +273,7 @@ static int atm_tc_change(struct Qdisc *sch, u32 classid, u32 parent,
 		error = -ENOBUFS;
 		goto err_out;
 	}
-	flow->filter_list = NULL;
+	RCU_INIT_POINTER(flow->filter_list, NULL);
 	flow->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, classid);
 	if (!flow->q)
 		flow->q = &noop_qdisc;
@@ -311,7 +311,7 @@ static int atm_tc_delete(struct Qdisc *sch, unsigned long arg)
 	pr_debug("atm_tc_delete(sch %p,[qdisc %p],flow %p)\n", sch, p, flow);
 	if (list_empty(&flow->list))
 		return -EINVAL;
-	if (flow->filter_list || flow == &p->link)
+	if (rcu_access_pointer(flow->filter_list) || flow == &p->link)
 		return -EBUSY;
 	/*
 	 * Reference count must be 2: one for "keepalive" (set at class
@@ -345,7 +345,8 @@ static void atm_tc_walk(struct Qdisc *sch, struct qdisc_walker *walker)
 	}
 }
 
-static struct tcf_proto **atm_tc_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct tcf_proto __rcu **atm_tc_find_tcf(struct Qdisc *sch,
+						unsigned long cl)
 {
 	struct atm_qdisc_data *p = qdisc_priv(sch);
 	struct atm_flow_data *flow = (struct atm_flow_data *)cl;
@@ -369,11 +370,12 @@ static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	flow = NULL;
 	if (TC_H_MAJ(skb->priority) != sch->handle ||
 	    !(flow = (struct atm_flow_data *)atm_tc_get(sch, skb->priority))) {
+		struct tcf_proto *fl;
+
 		list_for_each_entry(flow, &p->flows, list) {
-			if (flow->filter_list) {
-				result = tc_classify_compat(skb,
-							    flow->filter_list,
-							    &res);
+			fl = rcu_dereference_bh(flow->filter_list);
+			if (fl) {
+				result = tc_classify_compat(skb, fl, &res);
 				if (result < 0)
 					continue;
 				flow = (struct atm_flow_data *)res.class;
@@ -544,7 +546,7 @@ static int atm_tc_init(struct Qdisc *sch, struct nlattr *opt)
 	if (!p->link.q)
 		p->link.q = &noop_qdisc;
 	pr_debug("atm_tc_init: link (%p) qdisc %p\n", &p->link, p->link.q);
-	p->link.filter_list = NULL;
+	RCU_INIT_POINTER(p->link.filter_list, NULL);
 	p->link.vcc = NULL;
 	p->link.sock = NULL;
 	p->link.classid = sch->handle;
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 762a04b..a3244a8 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -133,7 +133,7 @@ struct cbq_class {
 	struct gnet_stats_rate_est64 rate_est;
 	struct tc_cbq_xstats	xstats;
 
-	struct tcf_proto	*filter_list;
+	struct tcf_proto __rcu	*filter_list;
 
 	int			refcnt;
 	int			filters;
@@ -221,6 +221,7 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 	struct cbq_class **defmap;
 	struct cbq_class *cl = NULL;
 	u32 prio = skb->priority;
+	struct tcf_proto *fl;
 	struct tcf_result res;
 
 	/*
@@ -235,11 +236,12 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 		int result = 0;
 		defmap = head->defaults;
 
+		fl = rcu_dereference_bh(head->filter_list);
 		/*
 		 * Step 2+n. Apply classifier.
 		 */
-		if (!head->filter_list ||
-		    (result = tc_classify_compat(skb, head->filter_list, &res)) < 0)
+		result = tc_classify_compat(skb, fl, &res);
+		if (!fl || result < 0)
 			goto fallback;
 
 		cl = (void *)res.class;
@@ -1954,7 +1956,8 @@ static int cbq_delete(struct Qdisc *sch, unsigned long arg)
 	return 0;
 }
 
-static struct tcf_proto **cbq_find_tcf(struct Qdisc *sch, unsigned long arg)
+static struct tcf_proto __rcu **cbq_find_tcf(struct Qdisc *sch,
+					     unsigned long arg)
 {
 	struct cbq_sched_data *q = qdisc_priv(sch);
 	struct cbq_class *cl = (struct cbq_class *)arg;
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index ed30e43..4b52b70 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -57,7 +57,7 @@ struct choke_sched_data {
 
 /* Variables */
 	struct red_vars  vars;
-	struct tcf_proto *filter_list;
+	struct tcf_proto __rcu *filter_list;
 	struct {
 		u32	prob_drop;	/* Early probability drops */
 		u32	prob_mark;	/* Early probability marks */
@@ -193,9 +193,11 @@ static bool choke_classify(struct sk_buff *skb,
 {
 	struct choke_sched_data *q = qdisc_priv(sch);
 	struct tcf_result res;
+	struct tcf_proto *fl;
 	int result;
 
-	result = tc_classify(skb, q->filter_list, &res);
+	fl = rcu_dereference_bh(q->filter_list);
+	result = tc_classify(skb, fl, &res);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -244,12 +246,14 @@ static bool choke_match_random(const struct choke_sched_data *q,
 			       unsigned int *pidx)
 {
 	struct sk_buff *oskb;
+	struct tcf_proto *fl;
 
 	if (q->head == q->tail)
 		return false;
 
 	oskb = choke_peek_random(q, pidx);
-	if (q->filter_list)
+	fl = rcu_dereference_bh(q->filter_list);
+	if (fl)
 		return choke_get_classid(nskb) == choke_get_classid(oskb);
 
 	return choke_match_flow(oskb, nskb);
@@ -259,9 +263,11 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct choke_sched_data *q = qdisc_priv(sch);
 	const struct red_parms *p = &q->parms;
+	struct tcf_proto *fl;
 	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 
-	if (q->filter_list) {
+	fl = rcu_dereference_bh(q->filter_list);
+	if (fl) {
 		/* If using external classifiers, get result and record it. */
 		if (!choke_classify(skb, sch, &ret))
 			goto other_drop;	/* Packet was eaten by filter */
@@ -554,7 +560,8 @@ static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
 	return 0;
 }
 
-static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct tcf_proto __rcu **choke_find_tcf(struct Qdisc *sch,
+					       unsigned long cl)
 {
 	struct choke_sched_data *q = qdisc_priv(sch);
 
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 7bbbfe1..d8b5ccf 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -35,7 +35,7 @@ struct drr_class {
 
 struct drr_sched {
 	struct list_head		active;
-	struct tcf_proto		*filter_list;
+	struct tcf_proto __rcu		*filter_list;
 	struct Qdisc_class_hash		clhash;
 };
 
@@ -184,7 +184,8 @@ static void drr_put_class(struct Qdisc *sch, unsigned long arg)
 		drr_destroy_class(sch, cl);
 }
 
-static struct tcf_proto **drr_tcf_chain(struct Qdisc *sch, unsigned long cl)
+static struct tcf_proto __rcu **drr_tcf_chain(struct Qdisc *sch,
+					      unsigned long cl)
 {
 	struct drr_sched *q = qdisc_priv(sch);
 
@@ -319,6 +320,7 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
 	struct drr_sched *q = qdisc_priv(sch);
 	struct drr_class *cl;
 	struct tcf_result res;
+	struct tcf_proto *fl;
 	int result;
 
 	if (TC_H_MAJ(skb->priority ^ sch->handle) == 0) {
@@ -328,7 +330,8 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	result = tc_classify(skb, q->filter_list, &res);
+	fl = rcu_dereference_bh(q->filter_list);
+	result = tc_classify(skb, fl, &res);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 49d6ef3..485e456 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -37,7 +37,7 @@
 
 struct dsmark_qdisc_data {
 	struct Qdisc		*q;
-	struct tcf_proto	*filter_list;
+	struct tcf_proto __rcu	*filter_list;
 	u8			*mask;	/* "owns" the array */
 	u8			*value;
 	u16			indices;
@@ -186,8 +186,8 @@ ignore:
 	}
 }
 
-static inline struct tcf_proto **dsmark_find_tcf(struct Qdisc *sch,
-						 unsigned long cl)
+static inline struct tcf_proto __rcu **dsmark_find_tcf(struct Qdisc *sch,
+						       unsigned long cl)
 {
 	struct dsmark_qdisc_data *p = qdisc_priv(sch);
 	return &p->filter_list;
@@ -229,7 +229,8 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		skb->tc_index = TC_H_MIN(skb->priority);
 	else {
 		struct tcf_result res;
-		int result = tc_classify(skb, p->filter_list, &res);
+		struct tcf_proto *fl = rcu_dereference_bh(p->filter_list);
+		int result = tc_classify(skb, fl, &res);
 
 		pr_debug("result %d class 0x%04x\n", result, res.classid);
 
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index cc56c8b..105cf55 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -52,7 +52,7 @@ struct fq_codel_flow {
 }; /* please try to keep this structure <= 64 bytes */
 
 struct fq_codel_sched_data {
-	struct tcf_proto *filter_list;	/* optional external classifier */
+	struct tcf_proto __rcu *filter_list; /* optional external classifier */
 	struct fq_codel_flow *flows;	/* Flows table [flows_cnt] */
 	u32		*backlogs;	/* backlog table [flows_cnt] */
 	u32		flows_cnt;	/* number of flows */
@@ -85,6 +85,7 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
 				      int *qerr)
 {
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct tcf_proto *filter;
 	struct tcf_result res;
 	int result;
 
@@ -93,11 +94,12 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
 	    TC_H_MIN(skb->priority) <= q->flows_cnt)
 		return TC_H_MIN(skb->priority);
 
-	if (!q->filter_list)
+	filter = rcu_dereference(q->filter_list);
+	if (!filter)
 		return fq_codel_hash(q, skb) + 1;
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	result = tc_classify(skb, q->filter_list, &res);
+	result = tc_classify(skb, filter, &res);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -496,7 +498,8 @@ static void fq_codel_put(struct Qdisc *q, unsigned long cl)
 {
 }
 
-static struct tcf_proto **fq_codel_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct tcf_proto __rcu **fq_codel_find_tcf(struct Qdisc *sch,
+						  unsigned long cl)
 {
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index ec8aeaa..04b0de4 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -116,7 +116,7 @@ struct hfsc_class {
 	struct gnet_stats_queue qstats;
 	struct gnet_stats_rate_est64 rate_est;
 	unsigned int	level;		/* class level in hierarchy */
-	struct tcf_proto *filter_list;	/* filter list */
+	struct tcf_proto __rcu *filter_list; /* filter list */
 	unsigned int	filter_cnt;	/* filter count */
 
 	struct hfsc_sched *sched;	/* scheduler data */
@@ -1161,7 +1161,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	head = &q->root;
-	tcf = q->root.filter_list;
+	tcf = rcu_dereference_bh(q->root.filter_list);
 	while (tcf && (result = tc_classify(skb, tcf, &res)) >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -1185,7 +1185,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 			return cl; /* hit leaf class */
 
 		/* apply inner filter chain */
-		tcf = cl->filter_list;
+		tcf = rcu_dereference_bh(cl->filter_list);
 		head = cl;
 	}
 
@@ -1285,7 +1285,7 @@ hfsc_unbind_tcf(struct Qdisc *sch, unsigned long arg)
 	cl->filter_cnt--;
 }
 
-static struct tcf_proto **
+static struct tcf_proto __rcu **
 hfsc_tcf_chain(struct Qdisc *sch, unsigned long arg)
 {
 	struct hfsc_sched *q = qdisc_priv(sch);
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index aea942c..6d16b9b 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -103,7 +103,7 @@ struct htb_class {
 	u32			prio;		/* these two are used only by leaves... */
 	int			quantum;	/* but stored for parent-to-leaf return */
 
-	struct tcf_proto	*filter_list;	/* class attached filters */
+	struct tcf_proto __rcu	*filter_list;	/* class attached filters */
 	int			filter_cnt;
 	int			refcnt;		/* usage count of this class */
 
@@ -153,7 +153,7 @@ struct htb_sched {
 	int			rate2quantum;	/* quant = rate / rate2quantum */
 
 	/* filters for qdisc itself */
-	struct tcf_proto	*filter_list;
+	struct tcf_proto __rcu	*filter_list;
 
 #define HTB_WARN_TOOMANYEVENTS	0x1
 	unsigned int		warned;	/* only one warning */
@@ -223,9 +223,9 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
 		if (cl->level == 0)
 			return cl;
 		/* Start with inner filter chain if a non-leaf class is selected */
-		tcf = cl->filter_list;
+		tcf = rcu_dereference_bh(cl->filter_list);
 	} else {
-		tcf = q->filter_list;
+		tcf = rcu_dereference_bh(q->filter_list);
 	}
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
@@ -251,7 +251,7 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
 			return cl;	/* we hit leaf; return it */
 
 		/* we have got inner class; apply inner filter chain */
-		tcf = cl->filter_list;
+		tcf = rcu_dereference_bh(cl->filter_list);
 	}
 	/* classification failed; try to use default class */
 	cl = htb_find(TC_H_MAKE(TC_H_MAJ(sch->handle), q->defcls), sch);
@@ -1519,11 +1519,12 @@ failure:
 	return err;
 }
 
-static struct tcf_proto **htb_find_tcf(struct Qdisc *sch, unsigned long arg)
+static struct tcf_proto __rcu **htb_find_tcf(struct Qdisc *sch,
+					     unsigned long arg)
 {
 	struct htb_sched *q = qdisc_priv(sch);
 	struct htb_class *cl = (struct htb_class *)arg;
-	struct tcf_proto **fl = cl ? &cl->filter_list : &q->filter_list;
+	struct tcf_proto __rcu **fl = cl ? &cl->filter_list : &q->filter_list;
 
 	return fl;
 }
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 62871c1..b351125 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -17,7 +17,7 @@
 
 
 struct ingress_qdisc_data {
-	struct tcf_proto	*filter_list;
+	struct tcf_proto __rcu	*filter_list;
 };
 
 /* ------------------------- Class/flow operations ------------------------- */
@@ -46,7 +46,8 @@ static void ingress_walk(struct Qdisc *sch, struct qdisc_walker *walker)
 {
 }
 
-static struct tcf_proto **ingress_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct tcf_proto __rcu **ingress_find_tcf(struct Qdisc *sch,
+						 unsigned long cl)
 {
 	struct ingress_qdisc_data *p = qdisc_priv(sch);
 
@@ -59,9 +60,10 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct ingress_qdisc_data *p = qdisc_priv(sch);
 	struct tcf_result res;
+	struct tcf_proto *fl = rcu_dereference_bh(p->filter_list);
 	int result;
 
-	result = tc_classify(skb, p->filter_list, &res);
+	result = tc_classify(skb, fl, &res);
 
 	qdisc_bstats_update(sch, skb);
 	switch (result) {
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index afb050a..c0466c1 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -31,7 +31,7 @@ struct multiq_sched_data {
 	u16 bands;
 	u16 max_bands;
 	u16 curband;
-	struct tcf_proto *filter_list;
+	struct tcf_proto __rcu *filter_list;
 	struct Qdisc **queues;
 };
 
@@ -42,10 +42,11 @@ multiq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 	struct multiq_sched_data *q = qdisc_priv(sch);
 	u32 band;
 	struct tcf_result res;
+	struct tcf_proto *fl = rcu_dereference_bh(q->filter_list);
 	int err;
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	err = tc_classify(skb, q->filter_list, &res);
+	err = tc_classify(skb, fl, &res);
 #ifdef CONFIG_NET_CLS_ACT
 	switch (err) {
 	case TC_ACT_STOLEN:
@@ -388,7 +389,8 @@ static void multiq_walk(struct Qdisc *sch, struct qdisc_walker *arg)
 	}
 }
 
-static struct tcf_proto **multiq_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct tcf_proto __rcu **multiq_find_tcf(struct Qdisc *sch,
+						unsigned long cl)
 {
 	struct multiq_sched_data *q = qdisc_priv(sch);
 
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 79359b6..03ef99e 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -24,7 +24,7 @@
 
 struct prio_sched_data {
 	int bands;
-	struct tcf_proto *filter_list;
+	struct tcf_proto __rcu *filter_list;
 	u8  prio2band[TC_PRIO_MAX+1];
 	struct Qdisc *queues[TCQ_PRIO_BANDS];
 };
@@ -36,11 +36,13 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 	struct prio_sched_data *q = qdisc_priv(sch);
 	u32 band = skb->priority;
 	struct tcf_result res;
+	struct tcf_proto *fl;
 	int err;
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	if (TC_H_MAJ(skb->priority) != sch->handle) {
-		err = tc_classify(skb, q->filter_list, &res);
+		fl = rcu_dereference_bh(q->filter_list);
+		err = tc_classify(skb, fl, &res);
 #ifdef CONFIG_NET_CLS_ACT
 		switch (err) {
 		case TC_ACT_STOLEN:
@@ -50,7 +52,7 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 			return NULL;
 		}
 #endif
-		if (!q->filter_list || err < 0) {
+		if (!fl || err < 0) {
 			if (TC_H_MAJ(band))
 				band = 0;
 			return q->queues[q->prio2band[band & TC_PRIO_MAX]];
@@ -351,7 +353,8 @@ static void prio_walk(struct Qdisc *sch, struct qdisc_walker *arg)
 	}
 }
 
-static struct tcf_proto **prio_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct tcf_proto __rcu **prio_find_tcf(struct Qdisc *sch,
+					      unsigned long cl)
 {
 	struct prio_sched_data *q = qdisc_priv(sch);
 
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 8056fb4..602ea01 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -181,7 +181,7 @@ struct qfq_group {
 };
 
 struct qfq_sched {
-	struct tcf_proto *filter_list;
+	struct tcf_proto __rcu *filter_list;
 	struct Qdisc_class_hash clhash;
 
 	u64			oldV, V;	/* Precise virtual times. */
@@ -576,7 +576,8 @@ static void qfq_put_class(struct Qdisc *sch, unsigned long arg)
 		qfq_destroy_class(sch, cl);
 }
 
-static struct tcf_proto **qfq_tcf_chain(struct Qdisc *sch, unsigned long cl)
+static struct tcf_proto __rcu **qfq_tcf_chain(struct Qdisc *sch,
+					      unsigned long cl)
 {
 	struct qfq_sched *q = qdisc_priv(sch);
 
@@ -704,6 +705,7 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 	struct qfq_sched *q = qdisc_priv(sch);
 	struct qfq_class *cl;
 	struct tcf_result res;
+	struct tcf_proto *fl;
 	int result;
 
 	if (TC_H_MAJ(skb->priority ^ sch->handle) == 0) {
@@ -714,7 +716,8 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	result = tc_classify(skb, q->filter_list, &res);
+	fl = rcu_dereference_bh(q->filter_list);
+	result = tc_classify(skb, fl, &res);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 9b0f709..1562fb2 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -55,7 +55,7 @@ struct sfb_bins {
 
 struct sfb_sched_data {
 	struct Qdisc	*qdisc;
-	struct tcf_proto *filter_list;
+	struct tcf_proto __rcu *filter_list;
 	unsigned long	rehash_interval;
 	unsigned long	warmup_time;	/* double buffering warmup time in jiffies */
 	u32		max;
@@ -253,13 +253,13 @@ static bool sfb_rate_limit(struct sk_buff *skb, struct sfb_sched_data *q)
 	return false;
 }
 
-static bool sfb_classify(struct sk_buff *skb, struct sfb_sched_data *q,
+static bool sfb_classify(struct sk_buff *skb, struct tcf_proto *fl,
 			 int *qerr, u32 *salt)
 {
 	struct tcf_result res;
 	int result;
 
-	result = tc_classify(skb, q->filter_list, &res);
+	result = tc_classify(skb, fl, &res);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -281,6 +281,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	struct sfb_sched_data *q = qdisc_priv(sch);
 	struct Qdisc *child = q->qdisc;
+	struct tcf_proto *fl;
 	int i;
 	u32 p_min = ~0;
 	u32 minqlen = ~0;
@@ -306,9 +307,10 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		}
 	}
 
-	if (q->filter_list) {
+	fl = rcu_dereference_bh(q->filter_list);
+	if (fl) {
 		/* If using external classifiers, get result and record it. */
-		if (!sfb_classify(skb, q, &ret, &salt))
+		if (!sfb_classify(skb, fl, &ret, &salt))
 			goto other_drop;
 		keys.src = salt;
 		keys.dst = 0;
@@ -660,7 +662,8 @@ static void sfb_walk(struct Qdisc *sch, struct qdisc_walker *walker)
 	}
 }
 
-static struct tcf_proto **sfb_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct tcf_proto __rcu **sfb_find_tcf(struct Qdisc *sch,
+					     unsigned long cl)
 {
 	struct sfb_sched_data *q = qdisc_priv(sch);
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 1af2f73..0bededd 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -125,7 +125,7 @@ struct sfq_sched_data {
 	u8		cur_depth;	/* depth of longest slot */
 	u8		flags;
 	unsigned short  scaled_quantum; /* SFQ_ALLOT_SIZE(quantum) */
-	struct tcf_proto *filter_list;
+	struct tcf_proto __rcu *filter_list;
 	sfq_index	*ht;		/* Hash table ('divisor' slots) */
 	struct sfq_slot	*slots;		/* Flows table ('maxflows' entries) */
 
@@ -187,6 +187,7 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	struct tcf_result res;
+	struct tcf_proto *fl;
 	int result;
 
 	if (TC_H_MAJ(skb->priority) == sch->handle &&
@@ -194,13 +195,14 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 	    TC_H_MIN(skb->priority) <= q->divisor)
 		return TC_H_MIN(skb->priority);
 
-	if (!q->filter_list) {
+	fl = rcu_dereference_bh(q->filter_list);
+	if (!fl) {
 		skb_flow_dissect(skb, &sfq_skb_cb(skb)->keys);
 		return sfq_hash(q, skb) + 1;
 	}
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	result = tc_classify(skb, q->filter_list, &res);
+	result = tc_classify(skb, fl, &res);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -841,7 +843,8 @@ static void sfq_put(struct Qdisc *q, unsigned long cl)
 {
 }
 
-static struct tcf_proto **sfq_find_tcf(struct Qdisc *sch, unsigned long cl)
+static struct tcf_proto __rcu **sfq_find_tcf(struct Qdisc *sch,
+					     unsigned long cl)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
 


* [net-next PATCH v4 03/16] net: sched: cls_basic use RCU
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
  2014-09-10 15:47 ` [net-next PATCH v4 01/16] net: qdisc: use rcu prefix and silence sparse warnings John Fastabend
  2014-09-10 15:47 ` [net-next PATCH v4 02/16] net: rcu-ify tcf_proto John Fastabend
@ 2014-09-10 15:47 ` John Fastabend
  2014-09-10 15:48 ` [net-next PATCH v4 04/16] net: sched: cls_cgroup " John Fastabend
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:47 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

Enable basic classifier for RCU.

Dereferencing tp->root may look a bit strange here but it is needed
by my accounting because it is allocated at init time and needs to
be kfree'd at destroy time. However, because it may be referenced
in the classify() path we must wait an RCU grace period before
freeing it. We use kfree_rcu() and rcu_ APIs to enforce this. This
pattern is used in all the classifiers.

Also the hgenerator can be incremented without concern because it
is always incremented under RTNL.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/cls_basic.c |   80 ++++++++++++++++++++++++++++---------------------
 1 file changed, 45 insertions(+), 35 deletions(-)

diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 0ae1813..1937298 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -24,6 +24,7 @@
 struct basic_head {
 	u32			hgenerator;
 	struct list_head	flist;
+	struct rcu_head		rcu;
 };
 
 struct basic_filter {
@@ -31,17 +32,19 @@ struct basic_filter {
 	struct tcf_exts		exts;
 	struct tcf_ematch_tree	ematches;
 	struct tcf_result	res;
+	struct tcf_proto	*tp;
 	struct list_head	link;
+	struct rcu_head		rcu;
 };
 
 static int basic_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			  struct tcf_result *res)
 {
 	int r;
-	struct basic_head *head = tp->root;
+	struct basic_head *head = rcu_dereference_bh(tp->root);
 	struct basic_filter *f;
 
-	list_for_each_entry(f, &head->flist, link) {
+	list_for_each_entry_rcu(f, &head->flist, link) {
 		if (!tcf_em_tree_match(skb, &f->ematches, NULL))
 			continue;
 		*res = f->res;
@@ -56,7 +59,7 @@ static int basic_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 static unsigned long basic_get(struct tcf_proto *tp, u32 handle)
 {
 	unsigned long l = 0UL;
-	struct basic_head *head = tp->root;
+	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f;
 
 	if (head == NULL)
@@ -81,12 +84,15 @@ static int basic_init(struct tcf_proto *tp)
 	if (head == NULL)
 		return -ENOBUFS;
 	INIT_LIST_HEAD(&head->flist);
-	tp->root = head;
+	rcu_assign_pointer(tp->root, head);
 	return 0;
 }
 
-static void basic_delete_filter(struct tcf_proto *tp, struct basic_filter *f)
+static void basic_delete_filter(struct rcu_head *head)
 {
+	struct basic_filter *f = container_of(head, struct basic_filter, rcu);
+	struct tcf_proto *tp = f->tp;
+
 	tcf_unbind_filter(tp, &f->res);
 	tcf_exts_destroy(tp, &f->exts);
 	tcf_em_tree_destroy(tp, &f->ematches);
@@ -95,27 +101,26 @@ static void basic_delete_filter(struct tcf_proto *tp, struct basic_filter *f)
 
 static void basic_destroy(struct tcf_proto *tp)
 {
-	struct basic_head *head = tp->root;
+	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f, *n;
 
 	list_for_each_entry_safe(f, n, &head->flist, link) {
-		list_del(&f->link);
-		basic_delete_filter(tp, f);
+		list_del_rcu(&f->link);
+		call_rcu(&f->rcu, basic_delete_filter);
 	}
-	kfree(head);
+	RCU_INIT_POINTER(tp->root, NULL);
+	kfree_rcu(head, rcu);
 }
 
 static int basic_delete(struct tcf_proto *tp, unsigned long arg)
 {
-	struct basic_head *head = tp->root;
+	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *t, *f = (struct basic_filter *) arg;
 
 	list_for_each_entry(t, &head->flist, link)
 		if (t == f) {
-			tcf_tree_lock(tp);
-			list_del(&t->link);
-			tcf_tree_unlock(tp);
-			basic_delete_filter(tp, t);
+			list_del_rcu(&t->link);
+			call_rcu(&t->rcu, basic_delete_filter);
 			return 0;
 		}
 
@@ -152,6 +157,7 @@ static int basic_set_parms(struct net *net, struct tcf_proto *tp,
 
 	tcf_exts_change(tp, &f->exts, &e);
 	tcf_em_tree_change(tp, &f->ematches, &t);
+	f->tp = tp;
 
 	return 0;
 errout:
@@ -164,9 +170,10 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
 			struct nlattr **tca, unsigned long *arg, bool ovr)
 {
 	int err;
-	struct basic_head *head = tp->root;
+	struct basic_head *head = rtnl_dereference(tp->root);
 	struct nlattr *tb[TCA_BASIC_MAX + 1];
-	struct basic_filter *f = (struct basic_filter *) *arg;
+	struct basic_filter *fold = (struct basic_filter *) *arg;
+	struct basic_filter *fnew;
 
 	if (tca[TCA_OPTIONS] == NULL)
 		return -EINVAL;
@@ -176,22 +183,23 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
 	if (err < 0)
 		return err;
 
-	if (f != NULL) {
-		if (handle && f->handle != handle)
+	if (fold != NULL) {
+		if (handle && fold->handle != handle)
 			return -EINVAL;
-		return basic_set_parms(net, tp, f, base, tb, tca[TCA_RATE], ovr);
 	}
 
 	err = -ENOBUFS;
-	f = kzalloc(sizeof(*f), GFP_KERNEL);
-	if (f == NULL)
+	fnew = kzalloc(sizeof(*fnew), GFP_KERNEL);
+	if (fnew == NULL)
 		goto errout;
 
-	tcf_exts_init(&f->exts, TCA_BASIC_ACT, TCA_BASIC_POLICE);
+	tcf_exts_init(&fnew->exts, TCA_BASIC_ACT, TCA_BASIC_POLICE);
 	err = -EINVAL;
-	if (handle)
-		f->handle = handle;
-	else {
+	if (handle) {
+		fnew->handle = handle;
+	} else if (fold) {
+		fnew->handle = fold->handle;
+	} else {
 		unsigned int i = 0x80000000;
 		do {
 			if (++head->hgenerator == 0x7FFFFFFF)
@@ -203,29 +211,31 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
 			goto errout;
 		}
 
-		f->handle = head->hgenerator;
+		fnew->handle = head->hgenerator;
 	}
 
-	err = basic_set_parms(net, tp, f, base, tb, tca[TCA_RATE], ovr);
+	err = basic_set_parms(net, tp, fnew, base, tb, tca[TCA_RATE], ovr);
 	if (err < 0)
 		goto errout;
 
-	tcf_tree_lock(tp);
-	list_add(&f->link, &head->flist);
-	tcf_tree_unlock(tp);
-	*arg = (unsigned long) f;
+	*arg = (unsigned long)fnew;
+
+	if (fold) {
+		list_replace_rcu(&fold->link, &fnew->link);
+		call_rcu(&fold->rcu, basic_delete_filter);
+	} else {
+		list_add_rcu(&fnew->link, &head->flist);
+	}
 
 	return 0;
 errout:
-	if (*arg == 0UL && f)
-		kfree(f);
-
+	kfree(fnew);
 	return err;
 }
 
 static void basic_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 {
-	struct basic_head *head = tp->root;
+	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f;
 
 	list_for_each_entry(f, &head->flist, link) {


* [net-next PATCH v4 04/16] net: sched: cls_cgroup use RCU
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
                   ` (2 preceding siblings ...)
  2014-09-10 15:47 ` [net-next PATCH v4 03/16] net: sched: cls_basic use RCU John Fastabend
@ 2014-09-10 15:48 ` John Fastabend
  2014-09-10 15:48 ` [net-next PATCH v4 05/16] net: sched: cls_flow " John Fastabend
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:48 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

Make cgroup classifier safe for RCU.

Also drops the calls in the classify routine that were doing an
rcu_read_lock()/rcu_read_unlock(). If the rcu_read_lock() isn't
held when entering this routine we have issues with deleting the
classifier chain, so remove the unnecessary
rcu_read_lock()/rcu_read_unlock() pair, noting that all paths
AFAIK hold rcu_read_lock.

If there is a case where classify is called without the rcu read lock
then an rcu splat will occur and we can correct it.
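
To make the update side easy to follow before reading the diff, here is
a distilled sketch of the pattern used: build a new head under RTNL,
publish it with rcu_assign_pointer(), and free the old one after a
grace period. The example_* names are illustrative only, and the real
callback also tears down the exts/ematches.

/* Sketch only: replace a single RCU-managed head under RTNL. */
static void example_head_free(struct rcu_head *rcu)
{
	struct cls_cgroup_head *head;

	head = container_of(rcu, struct cls_cgroup_head, rcu);
	/* the real callback also destroys exts and ematches */
	kfree(head);
}

static int example_replace_head(struct tcf_proto *tp)
{
	struct cls_cgroup_head *old = rtnl_dereference(tp->root);
	struct cls_cgroup_head *new;

	new = kzalloc(sizeof(*new), GFP_KERNEL);
	if (!new)
		return -ENOBUFS;
	/* ... fill in new from the netlink attributes ... */

	rcu_assign_pointer(tp->root, new);	/* publish */
	if (old)
		call_rcu(&old->rcu, example_head_free);
	return 0;
}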

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/cls_cgroup.c |   63 ++++++++++++++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 24 deletions(-)

diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index cacf01b..3b75487 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -22,17 +22,17 @@ struct cls_cgroup_head {
 	u32			handle;
 	struct tcf_exts		exts;
 	struct tcf_ematch_tree	ematches;
+	struct tcf_proto	*tp;
+	struct rcu_head		rcu;
 };
 
 static int cls_cgroup_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			       struct tcf_result *res)
 {
-	struct cls_cgroup_head *head = tp->root;
+	struct cls_cgroup_head *head = rcu_dereference_bh(tp->root);
 	u32 classid;
 
-	rcu_read_lock();
 	classid = task_cls_state(current)->classid;
-	rcu_read_unlock();
 
 	/*
 	 * Due to the nature of the classifier it is required to ignore all
@@ -80,13 +80,25 @@ static const struct nla_policy cgroup_policy[TCA_CGROUP_MAX + 1] = {
 	[TCA_CGROUP_EMATCHES]	= { .type = NLA_NESTED },
 };
 
+static void cls_cgroup_destroy_rcu(struct rcu_head *root)
+{
+	struct cls_cgroup_head *head = container_of(root,
+						    struct cls_cgroup_head,
+						    rcu);
+
+	tcf_exts_destroy(head->tp, &head->exts);
+	tcf_em_tree_destroy(head->tp, &head->ematches);
+	kfree(head);
+}
+
 static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
 			     struct tcf_proto *tp, unsigned long base,
 			     u32 handle, struct nlattr **tca,
 			     unsigned long *arg, bool ovr)
 {
 	struct nlattr *tb[TCA_CGROUP_MAX + 1];
-	struct cls_cgroup_head *head = tp->root;
+	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
+	struct cls_cgroup_head *new;
 	struct tcf_ematch_tree t;
 	struct tcf_exts e;
 	int err;
@@ -94,25 +106,24 @@ static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
 	if (!tca[TCA_OPTIONS])
 		return -EINVAL;
 
-	if (head == NULL) {
-		if (!handle)
-			return -EINVAL;
+	if (!head && !handle)
+		return -EINVAL;
 
-		head = kzalloc(sizeof(*head), GFP_KERNEL);
-		if (head == NULL)
-			return -ENOBUFS;
+	if (head && handle != head->handle)
+		return -ENOENT;
 
-		tcf_exts_init(&head->exts, TCA_CGROUP_ACT, TCA_CGROUP_POLICE);
-		head->handle = handle;
+	new = kzalloc(sizeof(*head), GFP_KERNEL);
+	if (!new)
+		return -ENOBUFS;
 
-		tcf_tree_lock(tp);
-		tp->root = head;
-		tcf_tree_unlock(tp);
+	if (head) {
+		new->handle = head->handle;
+	} else {
+		tcf_exts_init(&new->exts, TCA_CGROUP_ACT, TCA_CGROUP_POLICE);
+		new->handle = handle;
 	}
 
-	if (handle != head->handle)
-		return -ENOENT;
-
+	new->tp = tp;
 	err = nla_parse_nested(tb, TCA_CGROUP_MAX, tca[TCA_OPTIONS],
 			       cgroup_policy);
 	if (err < 0)
@@ -127,20 +138,24 @@ static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
 	if (err < 0)
 		return err;
 
-	tcf_exts_change(tp, &head->exts, &e);
-	tcf_em_tree_change(tp, &head->ematches, &t);
+	tcf_exts_change(tp, &new->exts, &e);
+	tcf_em_tree_change(tp, &new->ematches, &t);
 
+	rcu_assign_pointer(tp->root, new);
+	if (head)
+		call_rcu(&head->rcu, cls_cgroup_destroy_rcu);
 	return 0;
 }
 
 static void cls_cgroup_destroy(struct tcf_proto *tp)
 {
-	struct cls_cgroup_head *head = tp->root;
+	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
 
 	if (head) {
 		tcf_exts_destroy(tp, &head->exts);
 		tcf_em_tree_destroy(tp, &head->ematches);
-		kfree(head);
+		RCU_INIT_POINTER(tp->root, NULL);
+		kfree_rcu(head, rcu);
 	}
 }
 
@@ -151,7 +166,7 @@ static int cls_cgroup_delete(struct tcf_proto *tp, unsigned long arg)
 
 static void cls_cgroup_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 {
-	struct cls_cgroup_head *head = tp->root;
+	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
 
 	if (arg->count < arg->skip)
 		goto skip;
@@ -167,7 +182,7 @@ skip:
 static int cls_cgroup_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 			   struct sk_buff *skb, struct tcmsg *t)
 {
-	struct cls_cgroup_head *head = tp->root;
+	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
 	unsigned char *b = skb_tail_pointer(skb);
 	struct nlattr *nest;
 


* [net-next PATCH v4 05/16] net: sched: cls_flow use RCU
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
                   ` (3 preceding siblings ...)
  2014-09-10 15:48 ` [net-next PATCH v4 04/16] net: sched: cls_cgroup " John Fastabend
@ 2014-09-10 15:48 ` John Fastabend
  2014-09-11  0:58   ` Eric Dumazet
  2014-09-10 15:49 ` [net-next PATCH v4 06/16] net: sched: fw " John Fastabend
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:48 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_flow.c |  145 +++++++++++++++++++++++++++++---------------------
 1 file changed, 84 insertions(+), 61 deletions(-)

diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index 35be16f..95736fa 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -34,12 +34,14 @@
 
 struct flow_head {
 	struct list_head	filters;
+	struct rcu_head		rcu;
 };
 
 struct flow_filter {
 	struct list_head	list;
 	struct tcf_exts		exts;
 	struct tcf_ematch_tree	ematches;
+	struct tcf_proto	*tp;
 	struct timer_list	perturb_timer;
 	u32			perturb_period;
 	u32			handle;
@@ -54,6 +56,7 @@ struct flow_filter {
 	u32			divisor;
 	u32			baseclass;
 	u32			hashrnd;
+	struct rcu_head		rcu;
 };
 
 static inline u32 addr_fold(void *addr)
@@ -276,14 +279,14 @@ static u32 flow_key_get(struct sk_buff *skb, int key, struct flow_keys *flow)
 static int flow_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			 struct tcf_result *res)
 {
-	struct flow_head *head = tp->root;
+	struct flow_head *head = rcu_dereference_bh(tp->root);
 	struct flow_filter *f;
 	u32 keymask;
 	u32 classid;
 	unsigned int n, key;
 	int r;
 
-	list_for_each_entry(f, &head->filters, list) {
+	list_for_each_entry_rcu(f, &head->filters, list) {
 		u32 keys[FLOW_KEY_MAX + 1];
 		struct flow_keys flow_keys;
 
@@ -346,13 +349,23 @@ static const struct nla_policy flow_policy[TCA_FLOW_MAX + 1] = {
 	[TCA_FLOW_PERTURB]	= { .type = NLA_U32 },
 };
 
+static void flow_destroy_filter(struct rcu_head *head)
+{
+	struct flow_filter *f = container_of(head, struct flow_filter, rcu);
+
+	del_timer_sync(&f->perturb_timer);
+	tcf_exts_destroy(f->tp, &f->exts);
+	tcf_em_tree_destroy(f->tp, &f->ematches);
+	kfree(f);
+}
+
 static int flow_change(struct net *net, struct sk_buff *in_skb,
 		       struct tcf_proto *tp, unsigned long base,
 		       u32 handle, struct nlattr **tca,
 		       unsigned long *arg, bool ovr)
 {
-	struct flow_head *head = tp->root;
-	struct flow_filter *f;
+	struct flow_head *head = rtnl_dereference(tp->root);
+	struct flow_filter *fold, *fnew;
 	struct nlattr *opt = tca[TCA_OPTIONS];
 	struct nlattr *tb[TCA_FLOW_MAX + 1];
 	struct tcf_exts e;
@@ -401,20 +414,42 @@ static int flow_change(struct net *net, struct sk_buff *in_skb,
 	if (err < 0)
 		goto err1;
 
-	f = (struct flow_filter *)*arg;
-	if (f != NULL) {
+	err = -ENOBUFS;
+	fnew = kzalloc(sizeof(*fnew), GFP_KERNEL);
+	if (!fnew)
+		goto err2;
+
+	fold = (struct flow_filter *)*arg;
+	if (fold) {
 		err = -EINVAL;
-		if (f->handle != handle && handle)
+		if (fold->handle != handle && handle)
 			goto err2;
 
-		mode = f->mode;
+		/* Copy fold into fnew */
+		fnew->handle = fold->handle;
+		fnew->keymask = fold->keymask;
+		fnew->tp = fold->tp;
+
+		fnew->handle = fold->handle;
+		fnew->nkeys = fold->nkeys;
+		fnew->keymask = fold->keymask;
+		fnew->mode = fold->mode;
+		fnew->mask = fold->mask;
+		fnew->xor = fold->xor;
+		fnew->rshift = fold->rshift;
+		fnew->addend = fold->addend;
+		fnew->divisor = fold->divisor;
+		fnew->baseclass = fold->baseclass;
+		fnew->hashrnd = fold->hashrnd;
+
+		mode = fold->mode;
 		if (tb[TCA_FLOW_MODE])
 			mode = nla_get_u32(tb[TCA_FLOW_MODE]);
 		if (mode != FLOW_MODE_HASH && nkeys > 1)
 			goto err2;
 
 		if (mode == FLOW_MODE_HASH)
-			perturb_period = f->perturb_period;
+			perturb_period = fold->perturb_period;
 		if (tb[TCA_FLOW_PERTURB]) {
 			if (mode != FLOW_MODE_HASH)
 				goto err2;
@@ -444,83 +479,70 @@ static int flow_change(struct net *net, struct sk_buff *in_skb,
 		if (TC_H_MIN(baseclass) == 0)
 			baseclass = TC_H_MAKE(baseclass, 1);
 
-		err = -ENOBUFS;
-		f = kzalloc(sizeof(*f), GFP_KERNEL);
-		if (f == NULL)
-			goto err2;
-
-		f->handle = handle;
-		f->mask	  = ~0U;
-		tcf_exts_init(&f->exts, TCA_FLOW_ACT, TCA_FLOW_POLICE);
-
-		get_random_bytes(&f->hashrnd, 4);
-		f->perturb_timer.function = flow_perturbation;
-		f->perturb_timer.data = (unsigned long)f;
-		init_timer_deferrable(&f->perturb_timer);
+		fnew->handle = handle;
+		fnew->mask  = ~0U;
+		fnew->tp = tp;
+		get_random_bytes(&fnew->hashrnd, 4);
+		tcf_exts_init(&fnew->exts, TCA_FLOW_ACT, TCA_FLOW_POLICE);
 	}
 
-	tcf_exts_change(tp, &f->exts, &e);
-	tcf_em_tree_change(tp, &f->ematches, &t);
+	fnew->perturb_timer.function = flow_perturbation;
+	fnew->perturb_timer.data = (unsigned long)fnew;
+	init_timer_deferrable(&fnew->perturb_timer);
 
-	tcf_tree_lock(tp);
+	tcf_exts_change(tp, &fnew->exts, &e);
+	tcf_em_tree_change(tp, &fnew->ematches, &t);
 
 	if (tb[TCA_FLOW_KEYS]) {
-		f->keymask = keymask;
-		f->nkeys   = nkeys;
+		fnew->keymask = keymask;
+		fnew->nkeys   = nkeys;
 	}
 
-	f->mode = mode;
+	fnew->mode = mode;
 
 	if (tb[TCA_FLOW_MASK])
-		f->mask = nla_get_u32(tb[TCA_FLOW_MASK]);
+		fnew->mask = nla_get_u32(tb[TCA_FLOW_MASK]);
 	if (tb[TCA_FLOW_XOR])
-		f->xor = nla_get_u32(tb[TCA_FLOW_XOR]);
+		fnew->xor = nla_get_u32(tb[TCA_FLOW_XOR]);
 	if (tb[TCA_FLOW_RSHIFT])
-		f->rshift = nla_get_u32(tb[TCA_FLOW_RSHIFT]);
+		fnew->rshift = nla_get_u32(tb[TCA_FLOW_RSHIFT]);
 	if (tb[TCA_FLOW_ADDEND])
-		f->addend = nla_get_u32(tb[TCA_FLOW_ADDEND]);
+		fnew->addend = nla_get_u32(tb[TCA_FLOW_ADDEND]);
 
 	if (tb[TCA_FLOW_DIVISOR])
-		f->divisor = nla_get_u32(tb[TCA_FLOW_DIVISOR]);
+		fnew->divisor = nla_get_u32(tb[TCA_FLOW_DIVISOR]);
 	if (baseclass)
-		f->baseclass = baseclass;
+		fnew->baseclass = baseclass;
 
-	f->perturb_period = perturb_period;
-	del_timer(&f->perturb_timer);
+	fnew->perturb_period = perturb_period;
 	if (perturb_period)
-		mod_timer(&f->perturb_timer, jiffies + perturb_period);
+		mod_timer(&fnew->perturb_timer, jiffies + perturb_period);
 
 	if (*arg == 0)
-		list_add_tail(&f->list, &head->filters);
+		list_add_tail_rcu(&fnew->list, &head->filters);
+	else
+		list_replace_rcu(&fold->list, &fnew->list);
 
-	tcf_tree_unlock(tp);
+	*arg = (unsigned long)fnew;
 
-	*arg = (unsigned long)f;
+	if (fold)
+		call_rcu(&fold->rcu, flow_destroy_filter);
 	return 0;
 
 err2:
 	tcf_em_tree_destroy(tp, &t);
+	kfree(fnew);
 err1:
 	tcf_exts_destroy(tp, &e);
 	return err;
 }
 
-static void flow_destroy_filter(struct tcf_proto *tp, struct flow_filter *f)
-{
-	del_timer_sync(&f->perturb_timer);
-	tcf_exts_destroy(tp, &f->exts);
-	tcf_em_tree_destroy(tp, &f->ematches);
-	kfree(f);
-}
-
 static int flow_delete(struct tcf_proto *tp, unsigned long arg)
 {
 	struct flow_filter *f = (struct flow_filter *)arg;
 
-	tcf_tree_lock(tp);
-	list_del(&f->list);
-	tcf_tree_unlock(tp);
-	flow_destroy_filter(tp, f);
+	list_del_rcu(&f->list);
+	call_rcu(&f->rcu, flow_destroy_filter);
 	return 0;
 }
 
@@ -532,28 +554,29 @@ static int flow_init(struct tcf_proto *tp)
 	if (head == NULL)
 		return -ENOBUFS;
 	INIT_LIST_HEAD(&head->filters);
-	tp->root = head;
+	rcu_assign_pointer(tp->root, head);
 	return 0;
 }
 
 static void flow_destroy(struct tcf_proto *tp)
 {
-	struct flow_head *head = tp->root;
+	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *f, *next;
 
 	list_for_each_entry_safe(f, next, &head->filters, list) {
-		list_del(&f->list);
-		flow_destroy_filter(tp, f);
+		list_del_rcu(&f->list);
+		call_rcu(&f->rcu, flow_destroy_filter);
 	}
-	kfree(head);
+	RCU_INIT_POINTER(tp->root, NULL);
+	kfree_rcu(head, rcu);
 }
 
 static unsigned long flow_get(struct tcf_proto *tp, u32 handle)
 {
-	struct flow_head *head = tp->root;
+	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *f;
 
-	list_for_each_entry(f, &head->filters, list)
+	list_for_each_entry_rcu(f, &head->filters, list)
 		if (f->handle == handle)
 			return (unsigned long)f;
 	return 0;
@@ -626,10 +649,10 @@ nla_put_failure:
 
 static void flow_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 {
-	struct flow_head *head = tp->root;
+	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *f;
 
-	list_for_each_entry(f, &head->filters, list) {
+	list_for_each_entry_rcu(f, &head->filters, list) {
 		if (arg->count < arg->skip)
 			goto skip;
 		if (arg->fn(tp, (unsigned long)f, arg) < 0) {


* [net-next PATCH v4 06/16] net: sched: fw use RCU
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
                   ` (4 preceding siblings ...)
  2014-09-10 15:48 ` [net-next PATCH v4 05/16] net: sched: cls_flow " John Fastabend
@ 2014-09-10 15:49 ` John Fastabend
  2014-09-11  1:03   ` Eric Dumazet
  2014-09-10 15:49 ` [net-next PATCH v4 07/16] net: sched: RCU cls_route John Fastabend
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:49 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

RCU'ify fw classifier.
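
The interesting part is unlinking a filter from its per-bucket chain:
the chain is walked through __rcu next pointers under RTNL and the
predecessor pointer is updated so concurrent readers always see a
consistent list. A condensed sketch (fw_unlink_example is an
illustrative name, not the exact hunk below):

/* Sketch: remove f from its hash chain; caller holds RTNL. */
static void fw_unlink_example(struct fw_head *head, struct fw_filter *f)
{
	struct fw_filter __rcu **fp = &head->ht[fw_hash(f->id)];
	struct fw_filter *pfp;

	for (pfp = rtnl_dereference(*fp); pfp;
	     fp = &pfp->next, pfp = rtnl_dereference(*fp)) {
		if (pfp == f) {
			/* readers still on f keep seeing a valid ->next */
			RCU_INIT_POINTER(*fp, rtnl_dereference(f->next));
			call_rcu(&f->rcu, fw_delete_filter);
			return;
		}
	}
}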

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_fw.c |  111 ++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 77 insertions(+), 34 deletions(-)

diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index 861b03c..006b45a 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -33,17 +33,20 @@
 
 struct fw_head {
 	u32			mask;
-	struct fw_filter	*ht[HTSIZE];
+	struct fw_filter __rcu	*ht[HTSIZE];
+	struct rcu_head		rcu;
 };
 
 struct fw_filter {
-	struct fw_filter	*next;
+	struct fw_filter __rcu	*next;
 	u32			id;
 	struct tcf_result	res;
 #ifdef CONFIG_NET_CLS_IND
 	int			ifindex;
 #endif /* CONFIG_NET_CLS_IND */
 	struct tcf_exts		exts;
+	struct tcf_proto	*tp;
+	struct rcu_head		rcu;
 };
 
 static u32 fw_hash(u32 handle)
@@ -56,14 +59,16 @@ static u32 fw_hash(u32 handle)
 static int fw_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			  struct tcf_result *res)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rcu_dereference_bh(tp->root);
 	struct fw_filter *f;
 	int r;
 	u32 id = skb->mark;
 
 	if (head != NULL) {
 		id &= head->mask;
-		for (f = head->ht[fw_hash(id)]; f; f = f->next) {
+
+		for (f = rcu_dereference_bh(head->ht[fw_hash(id)]); f;
+		     f = rcu_dereference_bh(f->next)) {
 			if (f->id == id) {
 				*res = f->res;
 #ifdef CONFIG_NET_CLS_IND
@@ -92,13 +97,14 @@ static int fw_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 
 static unsigned long fw_get(struct tcf_proto *tp, u32 handle)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f;
 
 	if (head == NULL)
 		return 0;
 
-	for (f = head->ht[fw_hash(handle)]; f; f = f->next) {
+	f = rtnl_dereference(head->ht[fw_hash(handle)]);
+	for (; f; f = rtnl_dereference(f->next)) {
 		if (f->id == handle)
 			return (unsigned long)f;
 	}
@@ -114,8 +120,11 @@ static int fw_init(struct tcf_proto *tp)
 	return 0;
 }
 
-static void fw_delete_filter(struct tcf_proto *tp, struct fw_filter *f)
+static void fw_delete_filter(struct rcu_head *head)
 {
+	struct fw_filter *f = container_of(head, struct fw_filter, rcu);
+	struct tcf_proto *tp = f->tp;
+
 	tcf_unbind_filter(tp, &f->res);
 	tcf_exts_destroy(tp, &f->exts);
 	kfree(f);
@@ -123,7 +132,7 @@ static void fw_delete_filter(struct tcf_proto *tp, struct fw_filter *f)
 
 static void fw_destroy(struct tcf_proto *tp)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f;
 	int h;
 
@@ -131,29 +140,33 @@ static void fw_destroy(struct tcf_proto *tp)
 		return;
 
 	for (h = 0; h < HTSIZE; h++) {
-		while ((f = head->ht[h]) != NULL) {
-			head->ht[h] = f->next;
-			fw_delete_filter(tp, f);
+		while ((f = rtnl_dereference(head->ht[h])) != NULL) {
+			RCU_INIT_POINTER(head->ht[h],
+					 rtnl_dereference(f->next));
+			call_rcu(&f->rcu, fw_delete_filter);
 		}
 	}
-	kfree(head);
+	RCU_INIT_POINTER(tp->root, NULL);
+	kfree_rcu(head, rcu);
 }
 
 static int fw_delete(struct tcf_proto *tp, unsigned long arg)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f = (struct fw_filter *)arg;
-	struct fw_filter **fp;
+	struct fw_filter __rcu **fp;
+	struct fw_filter *pfp;
 
 	if (head == NULL || f == NULL)
 		goto out;
 
-	for (fp = &head->ht[fw_hash(f->id)]; *fp; fp = &(*fp)->next) {
-		if (*fp == f) {
-			tcf_tree_lock(tp);
-			*fp = f->next;
-			tcf_tree_unlock(tp);
-			fw_delete_filter(tp, f);
+	fp = &head->ht[fw_hash(f->id)];
+
+	for (pfp = rtnl_dereference(*fp); pfp;
+	     fp = &pfp->next, pfp = rtnl_dereference(*fp)) {
+		if (pfp == f) {
+			RCU_INIT_POINTER(*fp, rtnl_dereference(f->next));
+			call_rcu(&f->rcu, fw_delete_filter);
 			return 0;
 		}
 	}
@@ -171,7 +184,7 @@ static int
 fw_change_attrs(struct net *net, struct tcf_proto *tp, struct fw_filter *f,
 	struct nlattr **tb, struct nlattr **tca, unsigned long base, bool ovr)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rtnl_dereference(tp->root);
 	struct tcf_exts e;
 	u32 mask;
 	int err;
@@ -220,7 +233,7 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
 		     struct nlattr **tca,
 		     unsigned long *arg, bool ovr)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f = (struct fw_filter *) *arg;
 	struct nlattr *opt = tca[TCA_OPTIONS];
 	struct nlattr *tb[TCA_FW_MAX + 1];
@@ -233,10 +246,42 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
 	if (err < 0)
 		return err;
 
-	if (f != NULL) {
+	if (f) {
+		struct fw_filter *pfp, *fnew;
+		struct fw_filter __rcu **fp;
+
 		if (f->id != handle && handle)
 			return -EINVAL;
-		return fw_change_attrs(net, tp, f, tb, tca, base, ovr);
+
+		fnew = kzalloc(sizeof(struct fw_filter), GFP_KERNEL);
+		if (!fnew)
+			return -ENOBUFS;
+
+		fnew->id = f->id;
+		fnew->res = f->res;
+#ifdef CONFIG_NET_CLS_IND
+		fnew->ifindex = f->ifindex;
+#endif /* CONFIG_NET_CLS_IND */
+		fnew->tp = f->tp;
+
+		err = fw_change_attrs(net, tp, fnew, tb, tca, base, ovr);
+		if (err < 0) {
+			kfree(fnew);
+			return err;
+		}
+
+		fp = &head->ht[fw_hash(fnew->id)];
+		for (pfp = rtnl_dereference(*fp); pfp;
+		     fp = &pfp->next, pfp = rtnl_dereference(*fp))
+			if (pfp == f)
+				break;
+
+		RCU_INIT_POINTER(fnew->next, rtnl_dereference(pfp->next));
+		rcu_assign_pointer(*fp, fnew);
+		call_rcu(&f->rcu, fw_delete_filter);
+
+		*arg = (unsigned long)fnew;
+		return err;
 	}
 
 	if (!handle)
@@ -252,9 +297,7 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
 			return -ENOBUFS;
 		head->mask = mask;
 
-		tcf_tree_lock(tp);
-		tp->root = head;
-		tcf_tree_unlock(tp);
+		rcu_assign_pointer(tp->root, head);
 	}
 
 	f = kzalloc(sizeof(struct fw_filter), GFP_KERNEL);
@@ -263,15 +306,14 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
 
 	tcf_exts_init(&f->exts, TCA_FW_ACT, TCA_FW_POLICE);
 	f->id = handle;
+	f->tp = tp;
 
 	err = fw_change_attrs(net, tp, f, tb, tca, base, ovr);
 	if (err < 0)
 		goto errout;
 
-	f->next = head->ht[fw_hash(handle)];
-	tcf_tree_lock(tp);
-	head->ht[fw_hash(handle)] = f;
-	tcf_tree_unlock(tp);
+	RCU_INIT_POINTER(f->next, head->ht[fw_hash(handle)]);
+	rcu_assign_pointer(head->ht[fw_hash(handle)], f);
 
 	*arg = (unsigned long)f;
 	return 0;
@@ -283,7 +325,7 @@ errout:
 
 static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rtnl_dereference(tp->root);
 	int h;
 
 	if (head == NULL)
@@ -295,7 +337,8 @@ static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 	for (h = 0; h < HTSIZE; h++) {
 		struct fw_filter *f;
 
-		for (f = head->ht[h]; f; f = f->next) {
+		for (f = rtnl_dereference(head->ht[h]); f;
+		     f = rtnl_dereference(f->next)) {
 			if (arg->count < arg->skip) {
 				arg->count++;
 				continue;
@@ -312,7 +355,7 @@ static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 static int fw_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		   struct sk_buff *skb, struct tcmsg *t)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f = (struct fw_filter *)fh;
 	unsigned char *b = skb_tail_pointer(skb);
 	struct nlattr *nest;


* [net-next PATCH v4 07/16] net: sched: RCU cls_route
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
                   ` (5 preceding siblings ...)
  2014-09-10 15:49 ` [net-next PATCH v4 06/16] net: sched: fw " John Fastabend
@ 2014-09-10 15:49 ` John Fastabend
  2014-09-11  1:12   ` Eric Dumazet
  2014-09-10 15:50 ` [net-next PATCH v4 08/16] net: sched: RCU cls_tcindex John Fastabend
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:49 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

RCUify the route classifier. For now, however, a spinlock is used to
protect the fastmap cache.

The issue here is that the fastmap may be read by one CPU while the
cache is being updated by another, so the (id, iif, filter) triple has
to look atomic to readers. An array of pointers could be one possible
way to remove the lock later.
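
A distilled view of what the hunks below implement, with hypothetical
helper names fastmap_cache()/fastmap_lookup() used only for
illustration:

static DEFINE_SPINLOCK(fastmap_lock);

/* Writer side (sketch): overwrite a slot so readers never see a torn triple */
static void fastmap_cache(struct route4_head *head, u32 id, int iif,
			  struct route4_filter *f)
{
	int h = route4_fastmap_hash(id, iif);

	spin_lock_bh(&fastmap_lock);
	head->fastmap[h].id = id;
	head->fastmap[h].iif = iif;
	head->fastmap[h].filter = f;
	spin_unlock_bh(&fastmap_lock);
}

/* Reader side (sketch): sample the slot under the same lock */
static struct route4_filter *fastmap_lookup(struct route4_head *head,
					    u32 id, int iif)
{
	int h = route4_fastmap_hash(id, iif);
	struct route4_filter *f = NULL;

	spin_lock(&fastmap_lock);
	if (head->fastmap[h].id == id && head->fastmap[h].iif == iif)
		f = head->fastmap[h].filter;
	spin_unlock(&fastmap_lock);
	return f;
}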

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_route.c |  226 +++++++++++++++++++++++++++++--------------------
 1 file changed, 132 insertions(+), 94 deletions(-)

diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index dd9fc25..ba96dea 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -29,25 +29,26 @@
  *    are mutually  exclusive.
  * 3. "to TAG from ANY" has higher priority, than "to ANY from XXX"
  */
-
 struct route4_fastmap {
-	struct route4_filter	*filter;
-	u32			id;
-	int			iif;
+	struct route4_filter		*filter;
+	u32				id;
+	int				iif;
 };
 
 struct route4_head {
-	struct route4_fastmap	fastmap[16];
-	struct route4_bucket	*table[256 + 1];
+	struct route4_fastmap		fastmap[16];
+	struct route4_bucket __rcu	*table[256 + 1];
+	struct rcu_head			rcu;
 };
 
 struct route4_bucket {
 	/* 16 FROM buckets + 16 IIF buckets + 1 wildcard bucket */
-	struct route4_filter	*ht[16 + 16 + 1];
+	struct route4_filter __rcu	*ht[16 + 16 + 1];
+	struct rcu_head			rcu;
 };
 
 struct route4_filter {
-	struct route4_filter	*next;
+	struct route4_filter __rcu	*next;
 	u32			id;
 	int			iif;
 
@@ -55,6 +56,8 @@ struct route4_filter {
 	struct tcf_exts		exts;
 	u32			handle;
 	struct route4_bucket	*bkt;
+	struct tcf_proto	*tp;
+	struct rcu_head		rcu;
 };
 
 #define ROUTE4_FAILURE ((struct route4_filter *)(-1L))
@@ -64,14 +67,13 @@ static inline int route4_fastmap_hash(u32 id, int iif)
 	return id & 0xF;
 }
 
+static DEFINE_SPINLOCK(fastmap_lock);
 static void
-route4_reset_fastmap(struct Qdisc *q, struct route4_head *head, u32 id)
+route4_reset_fastmap(struct route4_head *head)
 {
-	spinlock_t *root_lock = qdisc_root_sleeping_lock(q);
-
-	spin_lock_bh(root_lock);
+	spin_lock_bh(&fastmap_lock);
 	memset(head->fastmap, 0, sizeof(head->fastmap));
-	spin_unlock_bh(root_lock);
+	spin_unlock_bh(&fastmap_lock);
 }
 
 static void
@@ -80,9 +82,12 @@ route4_set_fastmap(struct route4_head *head, u32 id, int iif,
 {
 	int h = route4_fastmap_hash(id, iif);
 
+	/* fastmap updates must look atomic to align id, iif, filter */
+	spin_lock_bh(&fastmap_lock);
 	head->fastmap[h].id = id;
 	head->fastmap[h].iif = iif;
 	head->fastmap[h].filter = f;
+	spin_unlock_bh(&fastmap_lock);
 }
 
 static inline int route4_hash_to(u32 id)
@@ -123,7 +128,7 @@ static inline int route4_hash_wild(void)
 static int route4_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			   struct tcf_result *res)
 {
-	struct route4_head *head = tp->root;
+	struct route4_head *head = rcu_dereference_bh(tp->root);
 	struct dst_entry *dst;
 	struct route4_bucket *b;
 	struct route4_filter *f;
@@ -141,32 +146,43 @@ static int route4_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 	iif = inet_iif(skb);
 
 	h = route4_fastmap_hash(id, iif);
+
+	spin_lock(&fastmap_lock);
 	if (id == head->fastmap[h].id &&
 	    iif == head->fastmap[h].iif &&
 	    (f = head->fastmap[h].filter) != NULL) {
-		if (f == ROUTE4_FAILURE)
+		if (f == ROUTE4_FAILURE) {
+			spin_unlock(&fastmap_lock);
 			goto failure;
+		}
 
 		*res = f->res;
+		spin_unlock(&fastmap_lock);
 		return 0;
 	}
+	spin_unlock(&fastmap_lock);
 
 	h = route4_hash_to(id);
 
 restart:
-	b = head->table[h];
+	b = rcu_dereference_bh(head->table[h]);
 	if (b) {
-		for (f = b->ht[route4_hash_from(id)]; f; f = f->next)
+		for (f = rcu_dereference_bh(b->ht[route4_hash_from(id)]);
+		     f;
+		     f = rcu_dereference_bh(f->next))
 			if (f->id == id)
 				ROUTE4_APPLY_RESULT();
 
-		for (f = b->ht[route4_hash_iif(iif)]; f; f = f->next)
+		for (f = rcu_dereference_bh(b->ht[route4_hash_iif(iif)]);
+		     f;
+		     f = rcu_dereference_bh(f->next))
 			if (f->iif == iif)
 				ROUTE4_APPLY_RESULT();
 
-		for (f = b->ht[route4_hash_wild()]; f; f = f->next)
+		for (f = rcu_dereference_bh(b->ht[route4_hash_wild()]);
+		     f;
+		     f = rcu_dereference_bh(f->next))
 			ROUTE4_APPLY_RESULT();
-
 	}
 	if (h < 256) {
 		h = 256;
@@ -213,7 +229,7 @@ static inline u32 from_hash(u32 id)
 
 static unsigned long route4_get(struct tcf_proto *tp, u32 handle)
 {
-	struct route4_head *head = tp->root;
+	struct route4_head *head = rtnl_dereference(tp->root);
 	struct route4_bucket *b;
 	struct route4_filter *f;
 	unsigned int h1, h2;
@@ -229,9 +245,11 @@ static unsigned long route4_get(struct tcf_proto *tp, u32 handle)
 	if (h2 > 32)
 		return 0;
 
-	b = head->table[h1];
+	b = rtnl_dereference(head->table[h1]);
 	if (b) {
-		for (f = b->ht[h2]; f; f = f->next)
+		for (f = rtnl_dereference(b->ht[h2]);
+		     f;
+		     f = rtnl_dereference(f->next))
 			if (f->handle == handle)
 				return (unsigned long)f;
 	}
@@ -248,8 +266,11 @@ static int route4_init(struct tcf_proto *tp)
 }
 
 static void
-route4_delete_filter(struct tcf_proto *tp, struct route4_filter *f)
+route4_delete_filter(struct rcu_head *head)
 {
+	struct route4_filter *f = container_of(head, struct route4_filter, rcu);
+	struct tcf_proto *tp = f->tp;
+
 	tcf_unbind_filter(tp, &f->res);
 	tcf_exts_destroy(tp, &f->exts);
 	kfree(f);
@@ -257,7 +278,7 @@ route4_delete_filter(struct tcf_proto *tp, struct route4_filter *f)
 
 static void route4_destroy(struct tcf_proto *tp)
 {
-	struct route4_head *head = tp->root;
+	struct route4_head *head = rtnl_dereference(tp->root);
 	int h1, h2;
 
 	if (head == NULL)
@@ -266,28 +287,35 @@ static void route4_destroy(struct tcf_proto *tp)
 	for (h1 = 0; h1 <= 256; h1++) {
 		struct route4_bucket *b;
 
-		b = head->table[h1];
+		b = rtnl_dereference(head->table[h1]);
 		if (b) {
 			for (h2 = 0; h2 <= 32; h2++) {
 				struct route4_filter *f;
 
-				while ((f = b->ht[h2]) != NULL) {
-					b->ht[h2] = f->next;
-					route4_delete_filter(tp, f);
+				while ((f = rtnl_dereference(b->ht[h2])) != NULL) {
+					struct route4_filter *next;
+
+					next = rtnl_dereference(f->next);
+					RCU_INIT_POINTER(b->ht[h2], next);
+					call_rcu(&f->rcu, route4_delete_filter);
 				}
 			}
-			kfree(b);
+			RCU_INIT_POINTER(head->table[h1], NULL);
+			kfree_rcu(b, rcu);
 		}
 	}
-	kfree(head);
+	RCU_INIT_POINTER(tp->root, NULL);
+	kfree_rcu(head, rcu);
 }
 
 static int route4_delete(struct tcf_proto *tp, unsigned long arg)
 {
-	struct route4_head *head = tp->root;
-	struct route4_filter **fp, *f = (struct route4_filter *)arg;
-	unsigned int h = 0;
+	struct route4_head *head = rtnl_dereference(tp->root);
+	struct route4_filter *f = (struct route4_filter *)arg;
+	struct route4_filter __rcu **fp;
+	struct route4_filter *nf;
 	struct route4_bucket *b;
+	unsigned int h = 0;
 	int i;
 
 	if (!head || !f)
@@ -296,27 +324,35 @@ static int route4_delete(struct tcf_proto *tp, unsigned long arg)
 	h = f->handle;
 	b = f->bkt;
 
-	for (fp = &b->ht[from_hash(h >> 16)]; *fp; fp = &(*fp)->next) {
-		if (*fp == f) {
-			tcf_tree_lock(tp);
-			*fp = f->next;
-			tcf_tree_unlock(tp);
+	fp = &b->ht[from_hash(h >> 16)];
+	for (nf = rtnl_dereference(*fp); nf;
+	     fp = &nf->next, nf = rtnl_dereference(*fp)) {
+		if (nf == f) {
+			/* unlink it */
+			RCU_INIT_POINTER(*fp, rtnl_dereference(f->next));
 
-			route4_reset_fastmap(tp->q, head, f->id);
-			route4_delete_filter(tp, f);
+			/* Remove any fastmap lookups that might reference
+			 * the filter; note it is already unlinked above so
+			 * it cannot get back into the fastmap.
+			 */
+			route4_reset_fastmap(head);
 
-			/* Strip tree */
+			/* Delete it */
+			call_rcu(&f->rcu, route4_delete_filter);
 
-			for (i = 0; i <= 32; i++)
-				if (b->ht[i])
+			/* Strip RTNL protected tree */
+			for (i = 0; i <= 32; i++) {
+				struct route4_filter *rt;
+
+				rt = rtnl_dereference(b->ht[i]);
+				if (rt)
 					return 0;
+			}
 
 			/* OK, session has no flows */
-			tcf_tree_lock(tp);
-			head->table[to_hash(h)] = NULL;
-			tcf_tree_unlock(tp);
+			RCU_INIT_POINTER(head->table[to_hash(h)], NULL);
+			kfree_rcu(b, rcu);
 
-			kfree(b);
 			return 0;
 		}
 	}
@@ -380,26 +416,25 @@ static int route4_set_parms(struct net *net, struct tcf_proto *tp,
 	}
 
 	h1 = to_hash(nhandle);
-	b = head->table[h1];
+	b = rtnl_dereference(head->table[h1]);
 	if (!b) {
 		err = -ENOBUFS;
 		b = kzalloc(sizeof(struct route4_bucket), GFP_KERNEL);
 		if (b == NULL)
 			goto errout;
 
-		tcf_tree_lock(tp);
-		head->table[h1] = b;
-		tcf_tree_unlock(tp);
+		rcu_assign_pointer(head->table[h1], b);
 	} else {
 		unsigned int h2 = from_hash(nhandle >> 16);
 
 		err = -EEXIST;
-		for (fp = b->ht[h2]; fp; fp = fp->next)
+		for (fp = rtnl_dereference(b->ht[h2]);
+		     fp;
+		     fp = rtnl_dereference(fp->next))
 			if (fp->handle == f->handle)
 				goto errout;
 	}
 
-	tcf_tree_lock(tp);
 	if (tb[TCA_ROUTE4_TO])
 		f->id = to;
 
@@ -410,7 +445,7 @@ static int route4_set_parms(struct net *net, struct tcf_proto *tp,
 
 	f->handle = nhandle;
 	f->bkt = b;
-	tcf_tree_unlock(tp);
+	f->tp = tp;
 
 	if (tb[TCA_ROUTE4_CLASSID]) {
 		f->res.classid = nla_get_u32(tb[TCA_ROUTE4_CLASSID]);
@@ -431,14 +466,15 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
 		       struct nlattr **tca,
 		       unsigned long *arg, bool ovr)
 {
-	struct route4_head *head = tp->root;
-	struct route4_filter *f, *f1, **fp;
+	struct route4_head *head = rtnl_dereference(tp->root);
+	struct route4_filter __rcu **fp;
+	struct route4_filter *fold, *f1, *pfp, *f = NULL;
 	struct route4_bucket *b;
 	struct nlattr *opt = tca[TCA_OPTIONS];
 	struct nlattr *tb[TCA_ROUTE4_MAX + 1];
 	unsigned int h, th;
-	u32 old_handle = 0;
 	int err;
+	bool new = true;
 
 	if (opt == NULL)
 		return handle ? -EINVAL : 0;
@@ -447,70 +483,70 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
 	if (err < 0)
 		return err;
 
-	f = (struct route4_filter *)*arg;
-	if (f) {
-		if (f->handle != handle && handle)
+	fold = (struct route4_filter *)*arg;
+	if (fold && handle && fold->handle != handle)
 			return -EINVAL;
 
-		if (f->bkt)
-			old_handle = f->handle;
-
-		err = route4_set_parms(net, tp, base, f, handle, head, tb,
-			tca[TCA_RATE], 0, ovr);
-		if (err < 0)
-			return err;
-
-		goto reinsert;
-	}
-
 	err = -ENOBUFS;
 	if (head == NULL) {
 		head = kzalloc(sizeof(struct route4_head), GFP_KERNEL);
 		if (head == NULL)
 			goto errout;
-
-		tcf_tree_lock(tp);
-		tp->root = head;
-		tcf_tree_unlock(tp);
+		rcu_assign_pointer(tp->root, head);
 	}
 
 	f = kzalloc(sizeof(struct route4_filter), GFP_KERNEL);
-	if (f == NULL)
+	if (!f)
 		goto errout;
 
 	tcf_exts_init(&f->exts, TCA_ROUTE4_ACT, TCA_ROUTE4_POLICE);
+	if (fold) {
+		f->id = fold->id;
+		f->iif = fold->iif;
+		f->res = fold->res;
+		f->handle = fold->handle;
+
+		f->tp = fold->tp;
+		f->bkt = fold->bkt;
+		new = false;
+	}
+
 	err = route4_set_parms(net, tp, base, f, handle, head, tb,
-		tca[TCA_RATE], 1, ovr);
+			       tca[TCA_RATE], new, ovr);
 	if (err < 0)
 		goto errout;
 
-reinsert:
 	h = from_hash(f->handle >> 16);
-	for (fp = &f->bkt->ht[h]; (f1 = *fp) != NULL; fp = &f1->next)
+	fp = &f->bkt->ht[h];
+	for (pfp = rtnl_dereference(*fp);
+	     (f1 = rtnl_dereference(*fp)) != NULL;
+	     fp = &f1->next)
 		if (f->handle < f1->handle)
 			break;
 
-	f->next = f1;
-	tcf_tree_lock(tp);
-	*fp = f;
+	rcu_assign_pointer(f->next, f1);
+	rcu_assign_pointer(*fp, f);
 
-	if (old_handle && f->handle != old_handle) {
-		th = to_hash(old_handle);
-		h = from_hash(old_handle >> 16);
-		b = head->table[th];
+	if (fold && fold->handle && f->handle != fold->handle) {
+		th = to_hash(fold->handle);
+		h = from_hash(fold->handle >> 16);
+		b = rtnl_dereference(head->table[th]);
 		if (b) {
-			for (fp = &b->ht[h]; *fp; fp = &(*fp)->next) {
-				if (*fp == f) {
+			fp = &b->ht[h];
+			for (pfp = rtnl_dereference(*fp); pfp;
+			     fp = &pfp->next, pfp = rtnl_dereference(*fp)) {
+				if (pfp == f) {
 					*fp = f->next;
 					break;
 				}
 			}
 		}
 	}
-	tcf_tree_unlock(tp);
 
-	route4_reset_fastmap(tp->q, head, f->id);
+	route4_reset_fastmap(head);
 	*arg = (unsigned long)f;
+	if (fold)
+		call_rcu(&fold->rcu, route4_delete_filter);
 	return 0;
 
 errout:
@@ -520,7 +556,7 @@ errout:
 
 static void route4_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 {
-	struct route4_head *head = tp->root;
+	struct route4_head *head = rtnl_dereference(tp->root);
 	unsigned int h, h1;
 
 	if (head == NULL)
@@ -530,13 +566,15 @@ static void route4_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 		return;
 
 	for (h = 0; h <= 256; h++) {
-		struct route4_bucket *b = head->table[h];
+		struct route4_bucket *b = rtnl_dereference(head->table[h]);
 
 		if (b) {
 			for (h1 = 0; h1 <= 32; h1++) {
 				struct route4_filter *f;
 
-				for (f = b->ht[h1]; f; f = f->next) {
+				for (f = rtnl_dereference(b->ht[h1]);
+				     f;
+				     f = rtnl_dereference(f->next)) {
 					if (arg->count < arg->skip) {
 						arg->count++;
 						continue;


* [net-next PATCH v4 08/16] net: sched: RCU cls_tcindex
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
                   ` (6 preceding siblings ...)
  2014-09-10 15:49 ` [net-next PATCH v4 07/16] net: sched: RCU cls_route John Fastabend
@ 2014-09-10 15:50 ` John Fastabend
  2014-09-11  1:17   ` Eric Dumazet
  2014-09-10 15:50 ` [net-next PATCH v4 09/16] net: sched: make cls_u32 per cpu John Fastabend
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:50 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

Make cls_tcindex RCU safe.

This patch adds a new RCU helper, rcu_dereference_bh_rtnl(), whose
lockdep check accepts either the RCU-bh read lock or RTNL. This is
needed because tcindex_lookup() is called from both contexts.
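
A minimal usage sketch (tcindex_find_example is a hypothetical name):
the same walk can be reached from the classify path under
rcu_read_lock_bh() and from the netlink update path under RTNL, and
the combined lockdep check accepts either.

/* Sketch: walk a chain shared by the BH-RCU fast path and the RTNL path */
static struct tcindex_filter *
tcindex_find_example(struct tcindex_data *p, u16 key)
{
	struct tcindex_filter __rcu **fp = &p->h[key % p->hash];
	struct tcindex_filter *f;

	for (f = rcu_dereference_bh_rtnl(*fp); f;
	     fp = &f->next, f = rcu_dereference_bh_rtnl(*fp))
		if (f->key == key)
			return f;
	return NULL;
}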

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/rtnetlink.h |   10 ++
 net/sched/cls_tcindex.c   |  248 ++++++++++++++++++++++++++++-----------------
 2 files changed, 164 insertions(+), 94 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 167bae7..6cacbce 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -47,6 +47,16 @@ static inline int lockdep_rtnl_is_held(void)
 	rcu_dereference_check(p, lockdep_rtnl_is_held())
 
 /**
+ * rcu_dereference_bh_rtnl - rcu_dereference_bh with debug checking
+ * @p: The pointer to read, prior to dereference
+ *
+ * Do an rcu_dereference_bh(p), but check caller either holds rcu_read_lock_bh()
+ * or RTNL. Note : Please prefer rtnl_dereference() or rcu_dereference_bh()
+ */
+#define rcu_dereference_bh_rtnl(p)				\
+	rcu_dereference_bh_check(p, lockdep_rtnl_is_held())
+
+/**
  * rtnl_dereference - fetch RCU pointer when updates are prevented by RTNL
  * @p: The pointer to read, prior to dereferencing
  *
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 3e9f764..a9f4279 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -32,19 +32,21 @@ struct tcindex_filter_result {
 struct tcindex_filter {
 	u16 key;
 	struct tcindex_filter_result result;
-	struct tcindex_filter *next;
+	struct tcindex_filter __rcu *next;
+	struct rcu_head rcu;
 };
 
 
 struct tcindex_data {
 	struct tcindex_filter_result *perfect; /* perfect hash; NULL if none */
-	struct tcindex_filter **h; /* imperfect hash; only used if !perfect;
-				      NULL if unused */
+	struct tcindex_filter __rcu **h; /* imperfect hash; */
+	struct tcf_proto *tp;
 	u16 mask;		/* AND key with mask */
-	int shift;		/* shift ANDed key to the right */
-	int hash;		/* hash table size; 0 if undefined */
-	int alloc_hash;		/* allocated size */
-	int fall_through;	/* 0: only classify if explicit match */
+	u32 shift;		/* shift ANDed key to the right */
+	u32 hash;		/* hash table size; 0 if undefined */
+	u32 alloc_hash;		/* allocated size */
+	u32 fall_through;	/* 0: only classify if explicit match */
+	struct rcu_head rcu;
 };
 
 static inline int
@@ -56,13 +58,18 @@ tcindex_filter_is_set(struct tcindex_filter_result *r)
 static struct tcindex_filter_result *
 tcindex_lookup(struct tcindex_data *p, u16 key)
 {
-	struct tcindex_filter *f;
+	if (p->perfect) {
+		struct tcindex_filter_result *f = p->perfect + key;
+
+		return tcindex_filter_is_set(f) ? f : NULL;
+	} else if (p->h) {
+		struct tcindex_filter __rcu **fp;
+		struct tcindex_filter *f;
 
-	if (p->perfect)
-		return tcindex_filter_is_set(p->perfect + key) ?
-			p->perfect + key : NULL;
-	else if (p->h) {
-		for (f = p->h[key % p->hash]; f; f = f->next)
+		fp = &p->h[key % p->hash];
+		for (f = rcu_dereference_bh_rtnl(*fp);
+		     f;
+		     fp = &f->next, f = rcu_dereference_bh_rtnl(*fp))
 			if (f->key == key)
 				return &f->result;
 	}
@@ -74,7 +81,7 @@ tcindex_lookup(struct tcindex_data *p, u16 key)
 static int tcindex_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			    struct tcf_result *res)
 {
-	struct tcindex_data *p = tp->root;
+	struct tcindex_data *p = rcu_dereference_bh(tp->root);
 	struct tcindex_filter_result *f;
 	int key = (skb->tc_index & p->mask) >> p->shift;
 
@@ -99,7 +106,7 @@ static int tcindex_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 
 static unsigned long tcindex_get(struct tcf_proto *tp, u32 handle)
 {
-	struct tcindex_data *p = tp->root;
+	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter_result *r;
 
 	pr_debug("tcindex_get(tp %p,handle 0x%08x)\n", tp, handle);
@@ -129,49 +136,59 @@ static int tcindex_init(struct tcf_proto *tp)
 	p->hash = DEFAULT_HASH_SIZE;
 	p->fall_through = 1;
 
-	tp->root = p;
+	rcu_assign_pointer(tp->root, p);
 	return 0;
 }
 
-
 static int
-__tcindex_delete(struct tcf_proto *tp, unsigned long arg, int lock)
+tcindex_delete(struct tcf_proto *tp, unsigned long arg)
 {
-	struct tcindex_data *p = tp->root;
+	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter_result *r = (struct tcindex_filter_result *) arg;
+	struct tcindex_filter __rcu **walk;
 	struct tcindex_filter *f = NULL;
 
-	pr_debug("tcindex_delete(tp %p,arg 0x%lx),p %p,f %p\n", tp, arg, p, f);
+	pr_debug("tcindex_delete(tp %p,arg 0x%lx),p %p\n", tp, arg, p);
 	if (p->perfect) {
 		if (!r->res.class)
 			return -ENOENT;
 	} else {
 		int i;
-		struct tcindex_filter **walk = NULL;
 
-		for (i = 0; i < p->hash; i++)
-			for (walk = p->h+i; *walk; walk = &(*walk)->next)
-				if (&(*walk)->result == r)
+		for (i = 0; i < p->hash; i++) {
+			walk = p->h + i;
+			for (f = rtnl_dereference(*walk); f;
+			     walk = &f->next, f = rtnl_dereference(*walk)) {
+				if (&f->result == r)
 					goto found;
+			}
+		}
 		return -ENOENT;
 
 found:
-		f = *walk;
-		if (lock)
-			tcf_tree_lock(tp);
-		*walk = f->next;
-		if (lock)
-			tcf_tree_unlock(tp);
+		rcu_assign_pointer(*walk, rtnl_dereference(f->next));
 	}
 	tcf_unbind_filter(tp, &r->res);
 	tcf_exts_destroy(tp, &r->exts);
-	kfree(f);
+	if (f)
+		kfree_rcu(f, rcu);
 	return 0;
 }
 
-static int tcindex_delete(struct tcf_proto *tp, unsigned long arg)
+static int tcindex_destroy_element(struct tcf_proto *tp,
+				   unsigned long arg,
+				   struct tcf_walker *walker)
 {
-	return __tcindex_delete(tp, arg, 1);
+	return tcindex_delete(tp, arg);
+}
+
+static void __tcindex_destroy(struct rcu_head *head)
+{
+	struct tcindex_data *p = container_of(head, struct tcindex_data, rcu);
+
+	kfree(p->perfect);
+	kfree(p->h);
+	kfree(p);
 }
 
 static inline int
@@ -194,6 +211,14 @@ static void tcindex_filter_result_init(struct tcindex_filter_result *r)
 	tcf_exts_init(&r->exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
 }
 
+static void __tcindex_partial_destroy(struct rcu_head *head)
+{
+	struct tcindex_data *p = container_of(head, struct tcindex_data, rcu);
+
+	kfree(p->perfect);
+	kfree(p);
+}
+
 static int
 tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 		  u32 handle, struct tcindex_data *p,
@@ -203,7 +228,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	int err, balloc = 0;
 	struct tcindex_filter_result new_filter_result, *old_r = r;
 	struct tcindex_filter_result cr;
-	struct tcindex_data cp;
+	struct tcindex_data *cp, *oldp;
 	struct tcindex_filter *f = NULL; /* make gcc behave */
 	struct tcf_exts e;
 
@@ -212,84 +237,118 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	if (err < 0)
 		return err;
 
-	memcpy(&cp, p, sizeof(cp));
-	tcindex_filter_result_init(&new_filter_result);
+	/* tcindex_data attributes must look atomic to classifier/lookup so
+	 * allocate new tcindex data and RCU assign it onto root. Keeping
+	 * perfect hash and hash pointers from old data.
+	 */
+	cp = kzalloc(sizeof(*cp), GFP_KERNEL);
+	if (!cp)
+		return -ENOMEM;
+
+	cp->mask = p->mask;
+	cp->shift = p->shift;
+	cp->hash = p->hash;
+	cp->alloc_hash = p->alloc_hash;
+	cp->fall_through = p->fall_through;
+	cp->tp = tp;
+
+	if (p->perfect) {
+		cp->perfect = kmemdup(p->perfect,
+				      sizeof(*r) * cp->hash, GFP_KERNEL);
+		if (!cp->perfect)
+			goto errout;
+	}
+	cp->h = p->h;
+
+	memset(&new_filter_result, 0, sizeof(new_filter_result));
+	tcf_exts_init(&new_filter_result.exts,
+		      TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
 
 	tcindex_filter_result_init(&cr);
 	if (old_r)
 		cr.res = r->res;
 
 	if (tb[TCA_TCINDEX_HASH])
-		cp.hash = nla_get_u32(tb[TCA_TCINDEX_HASH]);
+		cp->hash = nla_get_u32(tb[TCA_TCINDEX_HASH]);
 
 	if (tb[TCA_TCINDEX_MASK])
-		cp.mask = nla_get_u16(tb[TCA_TCINDEX_MASK]);
+		cp->mask = nla_get_u16(tb[TCA_TCINDEX_MASK]);
 
 	if (tb[TCA_TCINDEX_SHIFT])
-		cp.shift = nla_get_u32(tb[TCA_TCINDEX_SHIFT]);
+		cp->shift = nla_get_u32(tb[TCA_TCINDEX_SHIFT]);
 
 	err = -EBUSY;
+
 	/* Hash already allocated, make sure that we still meet the
 	 * requirements for the allocated hash.
 	 */
-	if (cp.perfect) {
-		if (!valid_perfect_hash(&cp) ||
-		    cp.hash > cp.alloc_hash)
+	if (cp->perfect) {
+		if (!valid_perfect_hash(cp) ||
+		    cp->hash > cp->alloc_hash)
 			goto errout;
-	} else if (cp.h && cp.hash != cp.alloc_hash)
+	} else if (cp->h && cp->hash != cp->alloc_hash) {
 		goto errout;
+	}
 
 	err = -EINVAL;
 	if (tb[TCA_TCINDEX_FALL_THROUGH])
-		cp.fall_through = nla_get_u32(tb[TCA_TCINDEX_FALL_THROUGH]);
+		cp->fall_through = nla_get_u32(tb[TCA_TCINDEX_FALL_THROUGH]);
 
-	if (!cp.hash) {
+	if (!cp->hash) {
 		/* Hash not specified, use perfect hash if the upper limit
 		 * of the hashing index is below the threshold.
 		 */
-		if ((cp.mask >> cp.shift) < PERFECT_HASH_THRESHOLD)
-			cp.hash = (cp.mask >> cp.shift) + 1;
+		if ((cp->mask >> cp->shift) < PERFECT_HASH_THRESHOLD)
+			cp->hash = (cp->mask >> cp->shift) + 1;
 		else
-			cp.hash = DEFAULT_HASH_SIZE;
+			cp->hash = DEFAULT_HASH_SIZE;
 	}
 
-	if (!cp.perfect && !cp.h)
-		cp.alloc_hash = cp.hash;
+	if (!cp->perfect && !cp->h)
+		cp->alloc_hash = cp->hash;
 
 	/* Note: this could be as restrictive as if (handle & ~(mask >> shift))
 	 * but then, we'd fail handles that may become valid after some future
 	 * mask change. While this is extremely unlikely to ever matter,
 	 * the check below is safer (and also more backwards-compatible).
 	 */
-	if (cp.perfect || valid_perfect_hash(&cp))
-		if (handle >= cp.alloc_hash)
+	if (cp->perfect || valid_perfect_hash(cp))
+		if (handle >= cp->alloc_hash)
 			goto errout;
 
 
 	err = -ENOMEM;
-	if (!cp.perfect && !cp.h) {
-		if (valid_perfect_hash(&cp)) {
+	if (!cp->perfect && !cp->h) {
+		if (valid_perfect_hash(cp)) {
 			int i;
 
-			cp.perfect = kcalloc(cp.hash, sizeof(*r), GFP_KERNEL);
-			if (!cp.perfect)
+			cp->perfect = kcalloc(cp->hash, sizeof(*r), GFP_KERNEL);
+			if (!cp->perfect)
 				goto errout;
-			for (i = 0; i < cp.hash; i++)
-				tcf_exts_init(&cp.perfect[i].exts, TCA_TCINDEX_ACT,
+			for (i = 0; i < cp->hash; i++)
+				tcf_exts_init(&cp->perfect[i].exts,
+					      TCA_TCINDEX_ACT,
 					      TCA_TCINDEX_POLICE);
 			balloc = 1;
 		} else {
-			cp.h = kcalloc(cp.hash, sizeof(f), GFP_KERNEL);
-			if (!cp.h)
+			struct tcindex_filter __rcu **hash;
+
+			hash = kcalloc(cp->hash,
+				       sizeof(struct tcindex_filter *),
+				       GFP_KERNEL);
+
+			if (!hash)
 				goto errout;
+
+			cp->h = hash;
 			balloc = 2;
 		}
 	}
 
-	if (cp.perfect)
-		r = cp.perfect + handle;
+	if (cp->perfect)
+		r = cp->perfect + handle;
 	else
-		r = tcindex_lookup(&cp, handle) ? : &new_filter_result;
+		r = tcindex_lookup(cp, handle) ? : &new_filter_result;
 
 	if (r == &new_filter_result) {
 		f = kzalloc(sizeof(*f), GFP_KERNEL);
@@ -307,33 +366,41 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	else
 		tcf_exts_change(tp, &cr.exts, &e);
 
-	tcf_tree_lock(tp);
 	if (old_r && old_r != r)
 		tcindex_filter_result_init(old_r);
 
-	memcpy(p, &cp, sizeof(cp));
+	oldp = p;
 	r->res = cr.res;
+	rcu_assign_pointer(tp->root, cp);
 
 	if (r == &new_filter_result) {
-		struct tcindex_filter **fp;
+		struct tcindex_filter *nfp;
+		struct tcindex_filter __rcu **fp;
 
 		f->key = handle;
 		f->result = new_filter_result;
 		f->next = NULL;
-		for (fp = p->h+(handle % p->hash); *fp; fp = &(*fp)->next)
-			/* nothing */;
-		*fp = f;
+
+		fp = p->h + (handle % p->hash);
+		for (nfp = rtnl_dereference(*fp);
+		     nfp;
+		     fp = &nfp->next, nfp = rtnl_dereference(*fp))
+				; /* nothing */
+
+		rcu_assign_pointer(*fp, f);
 	}
-	tcf_tree_unlock(tp);
 
+	if (oldp)
+		call_rcu(&oldp->rcu, __tcindex_partial_destroy);
 	return 0;
 
 errout_alloc:
 	if (balloc == 1)
-		kfree(cp.perfect);
+		kfree(cp->perfect);
 	else if (balloc == 2)
-		kfree(cp.h);
+		kfree(cp->h);
 errout:
+	kfree(cp);
 	tcf_exts_destroy(tp, &e);
 	return err;
 }
@@ -345,7 +412,7 @@ tcindex_change(struct net *net, struct sk_buff *in_skb,
 {
 	struct nlattr *opt = tca[TCA_OPTIONS];
 	struct nlattr *tb[TCA_TCINDEX_MAX + 1];
-	struct tcindex_data *p = tp->root;
+	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter_result *r = (struct tcindex_filter_result *) *arg;
 	int err;
 
@@ -364,10 +431,9 @@ tcindex_change(struct net *net, struct sk_buff *in_skb,
 				 tca[TCA_RATE], ovr);
 }
 
-
 static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker)
 {
-	struct tcindex_data *p = tp->root;
+	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter *f, *next;
 	int i;
 
@@ -390,8 +456,8 @@ static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker)
 	if (!p->h)
 		return;
 	for (i = 0; i < p->hash; i++) {
-		for (f = p->h[i]; f; f = next) {
-			next = f->next;
+		for (f = rtnl_dereference(p->h[i]); f; f = next) {
+			next = rtnl_dereference(f->next);
 			if (walker->count >= walker->skip) {
 				if (walker->fn(tp, (unsigned long) &f->result,
 				    walker) < 0) {
@@ -404,17 +470,9 @@ static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker)
 	}
 }
 
-
-static int tcindex_destroy_element(struct tcf_proto *tp,
-    unsigned long arg, struct tcf_walker *walker)
-{
-	return __tcindex_delete(tp, arg, 0);
-}
-
-
 static void tcindex_destroy(struct tcf_proto *tp)
 {
-	struct tcindex_data *p = tp->root;
+	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcf_walker walker;
 
 	pr_debug("tcindex_destroy(tp %p),p %p\n", tp, p);
@@ -422,17 +480,16 @@ static void tcindex_destroy(struct tcf_proto *tp)
 	walker.skip = 0;
 	walker.fn = tcindex_destroy_element;
 	tcindex_walk(tp, &walker);
-	kfree(p->perfect);
-	kfree(p->h);
-	kfree(p);
-	tp->root = NULL;
+
+	RCU_INIT_POINTER(tp->root, NULL);
+	call_rcu(&p->rcu, __tcindex_destroy);
 }
 
 
 static int tcindex_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
     struct sk_buff *skb, struct tcmsg *t)
 {
-	struct tcindex_data *p = tp->root;
+	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter_result *r = (struct tcindex_filter_result *) fh;
 	unsigned char *b = skb_tail_pointer(skb);
 	struct nlattr *nest;
@@ -455,15 +512,18 @@ static int tcindex_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		nla_nest_end(skb, nest);
 	} else {
 		if (p->perfect) {
-			t->tcm_handle = r-p->perfect;
+			t->tcm_handle = r - p->perfect;
 		} else {
 			struct tcindex_filter *f;
+			struct tcindex_filter __rcu **fp;
 			int i;
 
 			t->tcm_handle = 0;
 			for (i = 0; !t->tcm_handle && i < p->hash; i++) {
-				for (f = p->h[i]; !t->tcm_handle && f;
-				     f = f->next) {
+				fp = &p->h[i];
+				for (f = rtnl_dereference(*fp);
+				     !t->tcm_handle && f;
+				     fp = &f->next, f = rtnl_dereference(*fp)) {
 					if (&f->result == r)
 						t->tcm_handle = f->key;
 				}


* [net-next PATCH v4 09/16] net: sched: make cls_u32 per cpu
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
                   ` (7 preceding siblings ...)
  2014-09-10 15:50 ` [net-next PATCH v4 08/16] net: sched: RCU cls_tcindex John Fastabend
@ 2014-09-10 15:50 ` John Fastabend
  2014-09-11  1:19   ` Eric Dumazet
  2014-09-10 15:50 ` [net-next PATCH v4 10/16] net: sched: make cls_u32 lockless John Fastabend
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:50 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

This converts the cls_u32 counters to per-CPU variables in preparation
for converting the classifier over to RCU.
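
The pattern is the usual per-CPU counter split: a lockless
__this_cpu_inc() on the fast path, and a fold over all possible CPUs
when dumping. A simplified sketch of the fold (u32_sum_rcnt_example is
an illustrative name):

/* Sketch: fast path does __this_cpu_inc(n->pf->rcnt) with no locks;
 * the dump path aggregates the per-CPU copies under RTNL.
 */
static u64 u32_sum_rcnt_example(struct tc_u_knode *n)
{
	u64 rcnt = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		rcnt += per_cpu_ptr(n->pf, cpu)->rcnt;
	return rcnt;
}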

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_u32.c |   75 ++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 59 insertions(+), 16 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 70c0be8..f3227d7 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -55,10 +55,12 @@ struct tc_u_knode {
 	struct tcf_result	res;
 	struct tc_u_hnode	*ht_down;
 #ifdef CONFIG_CLS_U32_PERF
-	struct tc_u32_pcnt	*pf;
+	struct tc_u32_pcnt __percpu *pf;
 #endif
 #ifdef CONFIG_CLS_U32_MARK
-	struct tc_u32_mark	mark;
+	u32			val;
+	u32			mask;
+	u32 __percpu		*pcpu_success;
 #endif
 	struct tc_u32_sel	sel;
 };
@@ -115,16 +117,16 @@ next_knode:
 		struct tc_u32_key *key = n->sel.keys;
 
 #ifdef CONFIG_CLS_U32_PERF
-		n->pf->rcnt += 1;
+		__this_cpu_inc(n->pf->rcnt);
 		j = 0;
 #endif
 
 #ifdef CONFIG_CLS_U32_MARK
-		if ((skb->mark & n->mark.mask) != n->mark.val) {
+		if ((skb->mark & n->mask) != n->val) {
 			n = n->next;
 			goto next_knode;
 		} else {
-			n->mark.success++;
+			__this_cpu_inc(*n->pcpu_success);
 		}
 #endif
 
@@ -143,7 +145,7 @@ next_knode:
 				goto next_knode;
 			}
 #ifdef CONFIG_CLS_U32_PERF
-			n->pf->kcnts[j] += 1;
+			__this_cpu_inc(n->pf->kcnts[j]);
 			j++;
 #endif
 		}
@@ -159,7 +161,7 @@ check_terminal:
 				}
 #endif
 #ifdef CONFIG_CLS_U32_PERF
-				n->pf->rhit += 1;
+				__this_cpu_inc(n->pf->rhit);
 #endif
 				r = tcf_exts_exec(skb, &n->exts, res);
 				if (r < 0) {
@@ -342,7 +344,7 @@ static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n)
 	if (n->ht_down)
 		n->ht_down->refcnt--;
 #ifdef CONFIG_CLS_U32_PERF
-	kfree(n->pf);
+	free_percpu(n->pf);
 #endif
 	kfree(n);
 	return 0;
@@ -564,6 +566,9 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 	struct nlattr *tb[TCA_U32_MAX + 1];
 	u32 htid;
 	int err;
+#ifdef CONFIG_CLS_U32_PERF
+	size_t size;
+#endif
 
 	if (opt == NULL)
 		return handle ? -EINVAL : 0;
@@ -642,8 +647,9 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 		return -ENOBUFS;
 
 #ifdef CONFIG_CLS_U32_PERF
-	n->pf = kzalloc(sizeof(struct tc_u32_pcnt) + s->nkeys*sizeof(u64), GFP_KERNEL);
-	if (n->pf == NULL) {
+	size = sizeof(struct tc_u32_pcnt) + s->nkeys * sizeof(u64);
+	n->pf = __alloc_percpu(size, __alignof__(struct tc_u32_pcnt));
+	if (!n->pf) {
 		kfree(n);
 		return -ENOBUFS;
 	}
@@ -656,12 +662,14 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 	tcf_exts_init(&n->exts, TCA_U32_ACT, TCA_U32_POLICE);
 
 #ifdef CONFIG_CLS_U32_MARK
+	n->pcpu_success = alloc_percpu(u32);
+
 	if (tb[TCA_U32_MARK]) {
 		struct tc_u32_mark *mark;
 
 		mark = nla_data(tb[TCA_U32_MARK]);
-		memcpy(&n->mark, mark, sizeof(struct tc_u32_mark));
-		n->mark.success = 0;
+		n->val = mark->val;
+		n->mask = mark->mask;
 	}
 #endif
 
@@ -745,6 +753,11 @@ static int u32_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		if (nla_put_u32(skb, TCA_U32_DIVISOR, divisor))
 			goto nla_put_failure;
 	} else {
+#ifdef CONFIG_CLS_U32_PERF
+		struct tc_u32_pcnt *gpf;
+#endif
+		int cpu;
+
 		if (nla_put(skb, TCA_U32_SEL,
 			    sizeof(n->sel) + n->sel.nkeys*sizeof(struct tc_u32_key),
 			    &n->sel))
@@ -762,9 +775,20 @@ static int u32_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 			goto nla_put_failure;
 
 #ifdef CONFIG_CLS_U32_MARK
-		if ((n->mark.val || n->mark.mask) &&
-		    nla_put(skb, TCA_U32_MARK, sizeof(n->mark), &n->mark))
-			goto nla_put_failure;
+		if ((n->val || n->mask)) {
+			struct tc_u32_mark mark = {.val = n->val,
+						   .mask = n->mask,
+						   .success = 0};
+
+			for_each_possible_cpu(cpu) {
+				__u32 cnt = *per_cpu_ptr(n->pcpu_success, cpu);
+
+				mark.success += cnt;
+			}
+
+			if (nla_put(skb, TCA_U32_MARK, sizeof(mark), &mark))
+				goto nla_put_failure;
+		}
 #endif
 
 		if (tcf_exts_dump(skb, &n->exts) < 0)
@@ -779,10 +803,29 @@ static int u32_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		}
 #endif
 #ifdef CONFIG_CLS_U32_PERF
+		gpf = kzalloc(sizeof(struct tc_u32_pcnt) +
+			      n->sel.nkeys * sizeof(u64),
+			      GFP_KERNEL);
+		if (!gpf)
+			goto nla_put_failure;
+
+		for_each_possible_cpu(cpu) {
+			int i;
+			struct tc_u32_pcnt *pf = per_cpu_ptr(n->pf, cpu);
+
+			gpf->rcnt += pf->rcnt;
+			gpf->rhit += pf->rhit;
+			for (i = 0; i < n->sel.nkeys; i++)
+				gpf->kcnts[i] += pf->kcnts[i];
+		}
+
 		if (nla_put(skb, TCA_U32_PCNT,
 			    sizeof(struct tc_u32_pcnt) + n->sel.nkeys*sizeof(u64),
-			    n->pf))
+			    gpf)) {
+			kfree(gpf);
 			goto nla_put_failure;
+		}
+		kfree(gpf);
 #endif
 	}
 


* [net-next PATCH v4 10/16] net: sched: make cls_u32 lockless
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
                   ` (8 preceding siblings ...)
  2014-09-10 15:50 ` [net-next PATCH v4 09/16] net: sched: make cls_u32 per cpu John Fastabend
@ 2014-09-10 15:50 ` John Fastabend
  2014-09-11  1:26   ` Eric Dumazet
  2014-09-10 15:51 ` [net-next PATCH v4 11/16] net: sched: rcu'ify cls_rsvp John Fastabend
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:50 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

Make the cls_u32 classifier safe to run without holding the lock. This
patch converts the statistics kept in the read-side section,
u32_classify(), into per-CPU counters.
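
For reference, a minimal sketch of the per-CPU counter pattern this
relies on (the structure and function names below are illustrative,
not the actual cls_u32 symbols):

/* needs <linux/percpu.h>; counter allocated with alloc_percpu(u32) */
struct hit_counter {
	u32 __percpu *hits;
};

static void hit_record(struct hit_counter *c)
{
	__this_cpu_inc(*c->hits);	/* lockless fast-path increment */
}

static u64 hit_total(struct hit_counter *c)
{
	u64 sum = 0;
	int cpu;

	/* slow path (dump), serialized by RTNL */
	for_each_possible_cpu(cpu)
		sum += *per_cpu_ptr(c->hits, cpu);
	return sum;
}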

This patch was tested with a tight u32 filter add/delete loop while
generating traffic with pktgen. By running pktgen on vlan devices
created on top of a physical device we can hit the qdisc layer
correctly. For ingress qdiscs, a loopback cable was used.

for i in {1..100}; do
        q=`echo $i%8|bc`;
        echo -n "u32 tos: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \
                  action skbedit queue_mapping $q;
        sleep 1;
        tc filter del dev p3p2 prio $i;

        echo -n "u32 tos hash table: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q
        sleep 2;
        tc filter del dev p3p2 prio $i
        sleep 1;
done

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_u32.c |  183 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 110 insertions(+), 73 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index f3227d7..5ed5ac4 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -36,6 +36,7 @@
 #include <linux/kernel.h>
 #include <linux/string.h>
 #include <linux/errno.h>
+#include <linux/percpu.h>
 #include <linux/rtnetlink.h>
 #include <linux/skbuff.h>
 #include <linux/bitmap.h>
@@ -44,16 +45,16 @@
 #include <net/pkt_cls.h>
 
 struct tc_u_knode {
-	struct tc_u_knode	*next;
+	struct tc_u_knode __rcu	*next;
 	u32			handle;
-	struct tc_u_hnode	*ht_up;
+	struct tc_u_hnode __rcu	*ht_up;
 	struct tcf_exts		exts;
 #ifdef CONFIG_NET_CLS_IND
 	int			ifindex;
 #endif
 	u8			fshift;
 	struct tcf_result	res;
-	struct tc_u_hnode	*ht_down;
+	struct tc_u_hnode __rcu	*ht_down;
 #ifdef CONFIG_CLS_U32_PERF
 	struct tc_u32_pcnt __percpu *pf;
 #endif
@@ -62,24 +63,28 @@ struct tc_u_knode {
 	u32			mask;
 	u32 __percpu		*pcpu_success;
 #endif
+	struct tcf_proto	*tp;
 	struct tc_u32_sel	sel;
+	struct rcu_head		rcu;
 };
 
 struct tc_u_hnode {
-	struct tc_u_hnode	*next;
+	struct tc_u_hnode __rcu	*next;
 	u32			handle;
 	u32			prio;
 	struct tc_u_common	*tp_c;
 	int			refcnt;
 	unsigned int		divisor;
-	struct tc_u_knode	*ht[1];
+	struct tc_u_knode __rcu	*ht[1];
+	struct rcu_head		rcu;
 };
 
 struct tc_u_common {
-	struct tc_u_hnode	*hlist;
+	struct tc_u_hnode __rcu	*hlist;
 	struct Qdisc		*q;
 	int			refcnt;
 	u32			hgenerator;
+	struct rcu_head		rcu;
 };
 
 static inline unsigned int u32_hash_fold(__be32 key,
@@ -98,7 +103,7 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct
 		unsigned int	  off;
 	} stack[TC_U32_MAXDEPTH];
 
-	struct tc_u_hnode *ht = tp->root;
+	struct tc_u_hnode *ht = rcu_dereference_bh(tp->root);
 	unsigned int off = skb_network_offset(skb);
 	struct tc_u_knode *n;
 	int sdepth = 0;
@@ -110,7 +115,7 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct
 	int i, r;
 
 next_ht:
-	n = ht->ht[sel];
+	n = rcu_dereference_bh(ht->ht[sel]);
 
 next_knode:
 	if (n) {
@@ -123,7 +128,7 @@ next_knode:
 
 #ifdef CONFIG_CLS_U32_MARK
 		if ((skb->mark & n->mask) != n->val) {
-			n = n->next;
+			n = rcu_dereference_bh(n->next);
 			goto next_knode;
 		} else {
 			__this_cpu_inc(*n->pcpu_success);
@@ -141,7 +146,7 @@ next_knode:
 			if (!data)
 				goto out;
 			if ((*data ^ key->val) & key->mask) {
-				n = n->next;
+				n = rcu_dereference_bh(n->next);
 				goto next_knode;
 			}
 #ifdef CONFIG_CLS_U32_PERF
@@ -149,14 +154,16 @@ next_knode:
 			j++;
 #endif
 		}
-		if (n->ht_down == NULL) {
+
+		ht = rcu_dereference_bh(n->ht_down);
+		if (!ht) {
 check_terminal:
 			if (n->sel.flags & TC_U32_TERMINAL) {
 
 				*res = n->res;
 #ifdef CONFIG_NET_CLS_IND
 				if (!tcf_match_indev(skb, n->ifindex)) {
-					n = n->next;
+					n = rcu_dereference_bh(n->next);
 					goto next_knode;
 				}
 #endif
@@ -165,13 +172,13 @@ check_terminal:
 #endif
 				r = tcf_exts_exec(skb, &n->exts, res);
 				if (r < 0) {
-					n = n->next;
+					n = rcu_dereference_bh(n->next);
 					goto next_knode;
 				}
 
 				return r;
 			}
-			n = n->next;
+			n = rcu_dereference_bh(n->next);
 			goto next_knode;
 		}
 
@@ -182,7 +189,7 @@ check_terminal:
 		stack[sdepth].off = off;
 		sdepth++;
 
-		ht = n->ht_down;
+		ht = rcu_dereference_bh(n->ht_down);
 		sel = 0;
 		if (ht->divisor) {
 			__be32 *data, hdata;
@@ -224,7 +231,7 @@ check_terminal:
 	/* POP */
 	if (sdepth--) {
 		n = stack[sdepth].knode;
-		ht = n->ht_up;
+		ht = rcu_dereference_bh(n->ht_up);
 		off = stack[sdepth].off;
 		goto check_terminal;
 	}
@@ -241,7 +248,9 @@ u32_lookup_ht(struct tc_u_common *tp_c, u32 handle)
 {
 	struct tc_u_hnode *ht;
 
-	for (ht = tp_c->hlist; ht; ht = ht->next)
+	for (ht = rtnl_dereference(tp_c->hlist);
+	     ht;
+	     ht = rtnl_dereference(ht->next))
 		if (ht->handle == handle)
 			break;
 
@@ -258,7 +267,9 @@ u32_lookup_key(struct tc_u_hnode *ht, u32 handle)
 	if (sel > ht->divisor)
 		goto out;
 
-	for (n = ht->ht[sel]; n; n = n->next)
+	for (n = rtnl_dereference(ht->ht[sel]);
+	     n;
+	     n = rtnl_dereference(n->next))
 		if (n->handle == handle)
 			break;
 out:
@@ -272,7 +283,7 @@ static unsigned long u32_get(struct tcf_proto *tp, u32 handle)
 	struct tc_u_common *tp_c = tp->data;
 
 	if (TC_U32_HTID(handle) == TC_U32_ROOT)
-		ht = tp->root;
+		ht = rtnl_dereference(tp->root);
 	else
 		ht = u32_lookup_ht(tp_c, TC_U32_HTID(handle));
 
@@ -293,6 +304,9 @@ static u32 gen_new_htid(struct tc_u_common *tp_c)
 {
 	int i = 0x800;
 
+	/* hgenerator only used inside rtnl lock it is safe to increment
+	 * without read _copy_ update semantics
+	 */
 	do {
 		if (++tp_c->hgenerator == 0x7FF)
 			tp_c->hgenerator = 1;
@@ -328,11 +342,11 @@ static int u32_init(struct tcf_proto *tp)
 	}
 
 	tp_c->refcnt++;
-	root_ht->next = tp_c->hlist;
-	tp_c->hlist = root_ht;
+	RCU_INIT_POINTER(root_ht->next, tp_c->hlist);
+	rcu_assign_pointer(tp_c->hlist, root_ht);
 	root_ht->tp_c = tp_c;
 
-	tp->root = root_ht;
+	rcu_assign_pointer(tp->root, root_ht);
 	tp->data = tp_c;
 	return 0;
 }
@@ -350,19 +364,27 @@ static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n)
 	return 0;
 }
 
+static void u32_delete_key_rcu(struct rcu_head *rcu)
+{
+	struct tc_u_knode *key = container_of(rcu, struct tc_u_knode, rcu);
+
+	u32_destroy_key(key->tp, key);
+}
+
 static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
 {
-	struct tc_u_knode **kp;
+	struct tc_u_knode __rcu **kp;
+	struct tc_u_knode *pkp;
 	struct tc_u_hnode *ht = key->ht_up;
 
 	if (ht) {
-		for (kp = &ht->ht[TC_U32_HASH(key->handle)]; *kp; kp = &(*kp)->next) {
-			if (*kp == key) {
-				tcf_tree_lock(tp);
-				*kp = key->next;
-				tcf_tree_unlock(tp);
+		kp = &ht->ht[TC_U32_HASH(key->handle)];
+		for (pkp = rtnl_dereference(*kp); pkp;
+		     kp = &pkp->next, pkp = rtnl_dereference(*kp)) {
+			if (pkp == key) {
+				RCU_INIT_POINTER(*kp, key->next);
 
-				u32_destroy_key(tp, key);
+				call_rcu(&key->rcu, u32_delete_key_rcu);
 				return 0;
 			}
 		}
@@ -371,16 +393,16 @@ static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
 	return 0;
 }
 
-static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
+static void u32_clear_hnode(struct tc_u_hnode *ht)
 {
 	struct tc_u_knode *n;
 	unsigned int h;
 
 	for (h = 0; h <= ht->divisor; h++) {
-		while ((n = ht->ht[h]) != NULL) {
-			ht->ht[h] = n->next;
-
-			u32_destroy_key(tp, n);
+		while ((n = rtnl_dereference(ht->ht[h])) != NULL) {
+			RCU_INIT_POINTER(ht->ht[h],
+					 rtnl_dereference(n->next));
+			call_rcu(&n->rcu, u32_delete_key_rcu);
 		}
 	}
 }
@@ -388,28 +410,31 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 {
 	struct tc_u_common *tp_c = tp->data;
-	struct tc_u_hnode **hn;
+	struct tc_u_hnode __rcu **hn;
+	struct tc_u_hnode *phn;
 
 	WARN_ON(ht->refcnt);
 
-	u32_clear_hnode(tp, ht);
+	u32_clear_hnode(ht);
 
-	for (hn = &tp_c->hlist; *hn; hn = &(*hn)->next) {
-		if (*hn == ht) {
-			*hn = ht->next;
-			kfree(ht);
+	hn = &tp_c->hlist;
+	for (phn = rtnl_dereference(*hn);
+	     phn;
+	     hn = &phn->next, phn = rtnl_dereference(*hn)) {
+		if (phn == ht) {
+			RCU_INIT_POINTER(*hn, ht->next);
+			kfree_rcu(ht, rcu);
 			return 0;
 		}
 	}
 
-	WARN_ON(1);
 	return -ENOENT;
 }
 
 static void u32_destroy(struct tcf_proto *tp)
 {
 	struct tc_u_common *tp_c = tp->data;
-	struct tc_u_hnode *root_ht = tp->root;
+	struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
 
 	WARN_ON(root_ht == NULL);
 
@@ -421,17 +446,16 @@ static void u32_destroy(struct tcf_proto *tp)
 
 		tp->q->u32_node = NULL;
 
-		for (ht = tp_c->hlist; ht; ht = ht->next) {
+		for (ht = rtnl_dereference(tp_c->hlist);
+		     ht;
+		     ht = rtnl_dereference(ht->next)) {
 			ht->refcnt--;
-			u32_clear_hnode(tp, ht);
+			u32_clear_hnode(ht);
 		}
 
-		while ((ht = tp_c->hlist) != NULL) {
-			tp_c->hlist = ht->next;
-
-			WARN_ON(ht->refcnt != 0);
-
-			kfree(ht);
+		while ((ht = rtnl_dereference(tp_c->hlist)) != NULL) {
+			RCU_INIT_POINTER(tp_c->hlist, ht->next);
+			kfree_rcu(ht, rcu);
 		}
 
 		kfree(tp_c);
@@ -443,6 +467,7 @@ static void u32_destroy(struct tcf_proto *tp)
 static int u32_delete(struct tcf_proto *tp, unsigned long arg)
 {
 	struct tc_u_hnode *ht = (struct tc_u_hnode *)arg;
+	struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
 
 	if (ht == NULL)
 		return 0;
@@ -450,7 +475,7 @@ static int u32_delete(struct tcf_proto *tp, unsigned long arg)
 	if (TC_U32_KEY(ht->handle))
 		return u32_delete_key(tp, (struct tc_u_knode *)ht);
 
-	if (tp->root == ht)
+	if (root_ht == ht)
 		return -EINVAL;
 
 	if (ht->refcnt == 1) {
@@ -473,7 +498,9 @@ static u32 gen_new_kid(struct tc_u_hnode *ht, u32 handle)
 	if (!bitmap)
 		return handle | 0xFFF;
 
-	for (n = ht->ht[TC_U32_HASH(handle)]; n; n = n->next)
+	for (n = rtnl_dereference(ht->ht[TC_U32_HASH(handle)]);
+	     n;
+	     n = rtnl_dereference(n->next))
 		set_bit(TC_U32_NODE(n->handle), bitmap);
 
 	i = find_next_zero_bit(bitmap, NR_U32_NODE, 0x800);
@@ -523,10 +550,8 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp,
 			ht_down->refcnt++;
 		}
 
-		tcf_tree_lock(tp);
-		ht_old = n->ht_down;
-		n->ht_down = ht_down;
-		tcf_tree_unlock(tp);
+		ht_old = rtnl_dereference(n->ht_down);
+		rcu_assign_pointer(n->ht_down, ht_down);
 
 		if (ht_old)
 			ht_old->refcnt--;
@@ -606,8 +631,8 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 		ht->divisor = divisor;
 		ht->handle = handle;
 		ht->prio = tp->prio;
-		ht->next = tp_c->hlist;
-		tp_c->hlist = ht;
+		RCU_INIT_POINTER(ht->next, tp_c->hlist);
+		rcu_assign_pointer(tp_c->hlist, ht);
 		*arg = (unsigned long)ht;
 		return 0;
 	}
@@ -615,7 +640,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 	if (tb[TCA_U32_HASH]) {
 		htid = nla_get_u32(tb[TCA_U32_HASH]);
 		if (TC_U32_HTID(htid) == TC_U32_ROOT) {
-			ht = tp->root;
+			ht = rtnl_dereference(tp->root);
 			htid = ht->handle;
 		} else {
 			ht = u32_lookup_ht(tp->data, TC_U32_HTID(htid));
@@ -623,7 +648,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 				return -EINVAL;
 		}
 	} else {
-		ht = tp->root;
+		ht = rtnl_dereference(tp->root);
 		htid = ht->handle;
 	}
 
@@ -660,6 +685,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 	n->handle = handle;
 	n->fshift = s->hmask ? ffs(ntohl(s->hmask)) - 1 : 0;
 	tcf_exts_init(&n->exts, TCA_U32_ACT, TCA_U32_POLICE);
+	n->tp = tp;
 
 #ifdef CONFIG_CLS_U32_MARK
 	n->pcpu_success = alloc_percpu(u32);
@@ -675,21 +701,23 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 
 	err = u32_set_parms(net, tp, base, ht, n, tb, tca[TCA_RATE], ovr);
 	if (err == 0) {
-		struct tc_u_knode **ins;
-		for (ins = &ht->ht[TC_U32_HASH(handle)]; *ins; ins = &(*ins)->next)
-			if (TC_U32_NODE(handle) < TC_U32_NODE((*ins)->handle))
+		struct tc_u_knode __rcu **ins;
+		struct tc_u_knode *pins;
+
+		ins = &ht->ht[TC_U32_HASH(handle)];
+		for (pins = rtnl_dereference(*ins); pins;
+		     ins = &pins->next, pins = rtnl_dereference(*ins))
+			if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle))
 				break;
 
-		n->next = *ins;
-		tcf_tree_lock(tp);
-		*ins = n;
-		tcf_tree_unlock(tp);
+		RCU_INIT_POINTER(n->next, pins);
+		rcu_assign_pointer(*ins, n);
 
 		*arg = (unsigned long)n;
 		return 0;
 	}
 #ifdef CONFIG_CLS_U32_PERF
-	kfree(n->pf);
+	free_percpu(n->pf);
 #endif
 	kfree(n);
 	return err;
@@ -705,7 +733,9 @@ static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 	if (arg->stop)
 		return;
 
-	for (ht = tp_c->hlist; ht; ht = ht->next) {
+	for (ht = rtnl_dereference(tp_c->hlist);
+	     ht;
+	     ht = rtnl_dereference(ht->next)) {
 		if (ht->prio != tp->prio)
 			continue;
 		if (arg->count >= arg->skip) {
@@ -716,7 +746,9 @@ static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 		}
 		arg->count++;
 		for (h = 0; h <= ht->divisor; h++) {
-			for (n = ht->ht[h]; n; n = n->next) {
+			for (n = rtnl_dereference(ht->ht[h]);
+			     n;
+			     n = rtnl_dereference(n->next)) {
 				if (arg->count < arg->skip) {
 					arg->count++;
 					continue;
@@ -735,6 +767,7 @@ static int u32_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		     struct sk_buff *skb, struct tcmsg *t)
 {
 	struct tc_u_knode *n = (struct tc_u_knode *)fh;
+	struct tc_u_hnode *ht_up, *ht_down;
 	struct nlattr *nest;
 
 	if (n == NULL)
@@ -762,7 +795,9 @@ static int u32_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 			    sizeof(n->sel) + n->sel.nkeys*sizeof(struct tc_u32_key),
 			    &n->sel))
 			goto nla_put_failure;
-		if (n->ht_up) {
+
+		ht_up = rtnl_dereference(n->ht_up);
+		if (ht_up) {
 			u32 htid = n->handle & 0xFFFFF000;
 			if (nla_put_u32(skb, TCA_U32_HASH, htid))
 				goto nla_put_failure;
@@ -770,8 +805,10 @@ static int u32_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		if (n->res.classid &&
 		    nla_put_u32(skb, TCA_U32_CLASSID, n->res.classid))
 			goto nla_put_failure;
-		if (n->ht_down &&
-		    nla_put_u32(skb, TCA_U32_LINK, n->ht_down->handle))
+
+		ht_down = rtnl_dereference(n->ht_down);
+		if (ht_down &&
+		    nla_put_u32(skb, TCA_U32_LINK, ht_down->handle))
 			goto nla_put_failure;
 
 #ifdef CONFIG_CLS_U32_MARK


* [net-next PATCH v4 11/16] net: sched: rcu'ify cls_rsvp
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
                   ` (9 preceding siblings ...)
  2014-09-10 15:50 ` [net-next PATCH v4 10/16] net: sched: make cls_u32 lockless John Fastabend
@ 2014-09-10 15:51 ` John Fastabend
  2014-09-11  1:30   ` Eric Dumazet
  2014-09-10 15:51 ` [net-next PATCH v4 12/16] net: sched: rcu'ify cls_bpf John Fastabend
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:51 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_rsvp.h |  157 ++++++++++++++++++++++++++++----------------------
 1 file changed, 89 insertions(+), 68 deletions(-)

diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
index 1020e23..afa508b 100644
--- a/net/sched/cls_rsvp.h
+++ b/net/sched/cls_rsvp.h
@@ -70,31 +70,34 @@ struct rsvp_head {
 	u32			tmap[256/32];
 	u32			hgenerator;
 	u8			tgenerator;
-	struct rsvp_session	*ht[256];
+	struct rsvp_session __rcu *ht[256];
+	struct rcu_head		rcu;
 };
 
 struct rsvp_session {
-	struct rsvp_session	*next;
-	__be32			dst[RSVP_DST_LEN];
-	struct tc_rsvp_gpi 	dpi;
-	u8			protocol;
-	u8			tunnelid;
+	struct rsvp_session __rcu	*next;
+	__be32				dst[RSVP_DST_LEN];
+	struct tc_rsvp_gpi		dpi;
+	u8				protocol;
+	u8				tunnelid;
 	/* 16 (src,sport) hash slots, and one wildcard source slot */
-	struct rsvp_filter	*ht[16 + 1];
+	struct rsvp_filter __rcu	*ht[16 + 1];
+	struct rcu_head			rcu;
 };
 
 
 struct rsvp_filter {
-	struct rsvp_filter	*next;
-	__be32			src[RSVP_DST_LEN];
-	struct tc_rsvp_gpi	spi;
-	u8			tunnelhdr;
+	struct rsvp_filter __rcu	*next;
+	__be32				src[RSVP_DST_LEN];
+	struct tc_rsvp_gpi		spi;
+	u8				tunnelhdr;
 
-	struct tcf_result	res;
-	struct tcf_exts		exts;
+	struct tcf_result		res;
+	struct tcf_exts			exts;
 
-	u32			handle;
-	struct rsvp_session	*sess;
+	u32				handle;
+	struct rsvp_session		*sess;
+	struct rcu_head			rcu;
 };
 
 static inline unsigned int hash_dst(__be32 *dst, u8 protocol, u8 tunnelid)
@@ -128,7 +131,7 @@ static inline unsigned int hash_src(__be32 *src)
 static int rsvp_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			 struct tcf_result *res)
 {
-	struct rsvp_session **sht = ((struct rsvp_head *)tp->root)->ht;
+	struct rsvp_head *head = rcu_dereference_bh(tp->root);
 	struct rsvp_session *s;
 	struct rsvp_filter *f;
 	unsigned int h1, h2;
@@ -169,7 +172,8 @@ restart:
 	h1 = hash_dst(dst, protocol, tunnelid);
 	h2 = hash_src(src);
 
-	for (s = sht[h1]; s; s = s->next) {
+	for (s = rcu_dereference_bh(head->ht[h1]); s;
+	     s = rcu_dereference_bh(s->next)) {
 		if (dst[RSVP_DST_LEN-1] == s->dst[RSVP_DST_LEN - 1] &&
 		    protocol == s->protocol &&
 		    !(s->dpi.mask &
@@ -181,7 +185,8 @@ restart:
 #endif
 		    tunnelid == s->tunnelid) {
 
-			for (f = s->ht[h2]; f; f = f->next) {
+			for (f = rcu_dereference_bh(s->ht[h2]); f;
+			     f = rcu_dereference_bh(f->next)) {
 				if (src[RSVP_DST_LEN-1] == f->src[RSVP_DST_LEN - 1] &&
 				    !(f->spi.mask & (*(u32 *)(xprt + f->spi.offset) ^ f->spi.key))
 #if RSVP_DST_LEN == 4
@@ -205,7 +210,8 @@ matched:
 			}
 
 			/* And wildcard bucket... */
-			for (f = s->ht[16]; f; f = f->next) {
+			for (f = rcu_dereference_bh(s->ht[16]); f;
+			     f = rcu_dereference_bh(f->next)) {
 				*res = f->res;
 				RSVP_APPLY_RESULT();
 				goto matched;
@@ -218,7 +224,7 @@ matched:
 
 static unsigned long rsvp_get(struct tcf_proto *tp, u32 handle)
 {
-	struct rsvp_session **sht = ((struct rsvp_head *)tp->root)->ht;
+	struct rsvp_head *head = rtnl_dereference(tp->root);
 	struct rsvp_session *s;
 	struct rsvp_filter *f;
 	unsigned int h1 = handle & 0xFF;
@@ -227,8 +233,10 @@ static unsigned long rsvp_get(struct tcf_proto *tp, u32 handle)
 	if (h2 > 16)
 		return 0;
 
-	for (s = sht[h1]; s; s = s->next) {
-		for (f = s->ht[h2]; f; f = f->next) {
+	for (s = rtnl_dereference(head->ht[h1]); s;
+	     s = rtnl_dereference(s->next)) {
+		for (f = rtnl_dereference(s->ht[h2]); f;
+		     f = rtnl_dereference(f->next)) {
 			if (f->handle == handle)
 				return (unsigned long)f;
 		}
@@ -246,7 +254,7 @@ static int rsvp_init(struct tcf_proto *tp)
 
 	data = kzalloc(sizeof(struct rsvp_head), GFP_KERNEL);
 	if (data) {
-		tp->root = data;
+		rcu_assign_pointer(tp->root, data);
 		return 0;
 	}
 	return -ENOBUFS;
@@ -257,53 +265,55 @@ rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f)
 {
 	tcf_unbind_filter(tp, &f->res);
 	tcf_exts_destroy(tp, &f->exts);
-	kfree(f);
+	kfree_rcu(f, rcu);
 }
 
 static void rsvp_destroy(struct tcf_proto *tp)
 {
-	struct rsvp_head *data = xchg(&tp->root, NULL);
-	struct rsvp_session **sht;
+	struct rsvp_head *data = rtnl_dereference(tp->root);
 	int h1, h2;
 
 	if (data == NULL)
 		return;
 
-	sht = data->ht;
+	RCU_INIT_POINTER(tp->root, NULL);
 
 	for (h1 = 0; h1 < 256; h1++) {
 		struct rsvp_session *s;
 
-		while ((s = sht[h1]) != NULL) {
-			sht[h1] = s->next;
+		while ((s = rtnl_dereference(data->ht[h1])) != NULL) {
+			RCU_INIT_POINTER(data->ht[h1], s->next);
 
 			for (h2 = 0; h2 <= 16; h2++) {
 				struct rsvp_filter *f;
 
-				while ((f = s->ht[h2]) != NULL) {
-					s->ht[h2] = f->next;
+				while ((f = rtnl_dereference(s->ht[h2])) != NULL) {
+					rcu_assign_pointer(s->ht[h2], f->next);
 					rsvp_delete_filter(tp, f);
 				}
 			}
-			kfree(s);
+			kfree_rcu(s, rcu);
 		}
 	}
-	kfree(data);
+	RCU_INIT_POINTER(tp->root, NULL);
+	kfree_rcu(data, rcu);
 }
 
 static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
 {
-	struct rsvp_filter **fp, *f = (struct rsvp_filter *)arg;
+	struct rsvp_head *head = rtnl_dereference(tp->root);
+	struct rsvp_filter *nfp, *f = (struct rsvp_filter *)arg;
+	struct rsvp_filter __rcu **fp;
 	unsigned int h = f->handle;
-	struct rsvp_session **sp;
-	struct rsvp_session *s = f->sess;
+	struct rsvp_session __rcu **sp;
+	struct rsvp_session *nsp, *s = f->sess;
 	int i;
 
-	for (fp = &s->ht[(h >> 8) & 0xFF]; *fp; fp = &(*fp)->next) {
-		if (*fp == f) {
-			tcf_tree_lock(tp);
+	fp = &s->ht[(h >> 8) & 0xFF];
+	for (nfp = rtnl_dereference(*fp); nfp;
+	     fp = &nfp->next, nfp = rtnl_dereference(*fp)) {
+		if (nfp == f) {
 			*fp = f->next;
-			tcf_tree_unlock(tp);
 			rsvp_delete_filter(tp, f);
 
 			/* Strip tree */
@@ -313,14 +323,12 @@ static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
 					return 0;
 
 			/* OK, session has no flows */
-			for (sp = &((struct rsvp_head *)tp->root)->ht[h & 0xFF];
-			     *sp; sp = &(*sp)->next) {
-				if (*sp == s) {
-					tcf_tree_lock(tp);
+			sp = &head->ht[h & 0xFF];
+			for (nsp = rtnl_dereference(*sp); nsp;
+			     sp = &nsp->next, nsp = rtnl_dereference(*sp)) {
+				if (nsp == s) {
 					*sp = s->next;
-					tcf_tree_unlock(tp);
-
-					kfree(s);
+					kfree_rcu(s, rcu);
 					return 0;
 				}
 			}
@@ -333,7 +341,7 @@ static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
 
 static unsigned int gen_handle(struct tcf_proto *tp, unsigned salt)
 {
-	struct rsvp_head *data = tp->root;
+	struct rsvp_head *data = rtnl_dereference(tp->root);
 	int i = 0xFFFF;
 
 	while (i-- > 0) {
@@ -361,7 +369,7 @@ static int tunnel_bts(struct rsvp_head *data)
 
 static void tunnel_recycle(struct rsvp_head *data)
 {
-	struct rsvp_session **sht = data->ht;
+	struct rsvp_session __rcu **sht = data->ht;
 	u32 tmap[256/32];
 	int h1, h2;
 
@@ -369,11 +377,13 @@ static void tunnel_recycle(struct rsvp_head *data)
 
 	for (h1 = 0; h1 < 256; h1++) {
 		struct rsvp_session *s;
-		for (s = sht[h1]; s; s = s->next) {
+		for (s = rtnl_dereference(sht[h1]); s;
+		     s = rtnl_dereference(s->next)) {
 			for (h2 = 0; h2 <= 16; h2++) {
 				struct rsvp_filter *f;
 
-				for (f = s->ht[h2]; f; f = f->next) {
+				for (f = rtnl_dereference(s->ht[h2]); f;
+				     f = rtnl_dereference(f->next)) {
 					if (f->tunnelhdr == 0)
 						continue;
 					data->tgenerator = f->res.classid;
@@ -417,9 +427,11 @@ static int rsvp_change(struct net *net, struct sk_buff *in_skb,
 		       struct nlattr **tca,
 		       unsigned long *arg, bool ovr)
 {
-	struct rsvp_head *data = tp->root;
-	struct rsvp_filter *f, **fp;
-	struct rsvp_session *s, **sp;
+	struct rsvp_head *data = rtnl_dereference(tp->root);
+	struct rsvp_filter *f, *nfp;
+	struct rsvp_filter __rcu **fp;
+	struct rsvp_session *nsp, *s;
+	struct rsvp_session __rcu **sp;
 	struct tc_rsvp_pinfo *pinfo = NULL;
 	struct nlattr *opt = tca[TCA_OPTIONS];
 	struct nlattr *tb[TCA_RSVP_MAX + 1];
@@ -499,7 +511,9 @@ static int rsvp_change(struct net *net, struct sk_buff *in_skb,
 			goto errout;
 	}
 
-	for (sp = &data->ht[h1]; (s = *sp) != NULL; sp = &s->next) {
+	for (sp = &data->ht[h1];
+	     (s = rtnl_dereference(*sp)) != NULL;
+	     sp = &s->next) {
 		if (dst[RSVP_DST_LEN-1] == s->dst[RSVP_DST_LEN-1] &&
 		    pinfo && pinfo->protocol == s->protocol &&
 		    memcmp(&pinfo->dpi, &s->dpi, sizeof(s->dpi)) == 0 &&
@@ -521,12 +535,16 @@ insert:
 
 			tcf_exts_change(tp, &f->exts, &e);
 
-			for (fp = &s->ht[h2]; *fp; fp = &(*fp)->next)
-				if (((*fp)->spi.mask & f->spi.mask) != f->spi.mask)
+			fp = &s->ht[h2];
+			for (nfp = rtnl_dereference(*fp); nfp;
+			     fp = &nfp->next, nfp = rtnl_dereference(*fp)) {
+				__u32 mask = nfp->spi.mask & f->spi.mask;
+
+				if (mask != f->spi.mask)
 					break;
-			f->next = *fp;
-			wmb();
-			*fp = f;
+			}
+			RCU_INIT_POINTER(f->next, nfp);
+			rcu_assign_pointer(*fp, f);
 
 			*arg = (unsigned long)f;
 			return 0;
@@ -546,13 +564,14 @@ insert:
 		s->protocol = pinfo->protocol;
 		s->tunnelid = pinfo->tunnelid;
 	}
-	for (sp = &data->ht[h1]; *sp; sp = &(*sp)->next) {
-		if (((*sp)->dpi.mask&s->dpi.mask) != s->dpi.mask)
+	sp = &data->ht[h1];
+	for (nsp = rtnl_dereference(*sp); nsp;
+	     sp = &nsp->next, nsp = rtnl_dereference(*sp)) {
+		if ((nsp->dpi.mask & s->dpi.mask) != s->dpi.mask)
 			break;
 	}
-	s->next = *sp;
-	wmb();
-	*sp = s;
+	RCU_INIT_POINTER(s->next, nsp);
+	rcu_assign_pointer(*sp, s);
 
 	goto insert;
 
@@ -565,7 +584,7 @@ errout2:
 
 static void rsvp_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 {
-	struct rsvp_head *head = tp->root;
+	struct rsvp_head *head = rtnl_dereference(tp->root);
 	unsigned int h, h1;
 
 	if (arg->stop)
@@ -574,11 +593,13 @@ static void rsvp_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 	for (h = 0; h < 256; h++) {
 		struct rsvp_session *s;
 
-		for (s = head->ht[h]; s; s = s->next) {
+		for (s = rtnl_dereference(head->ht[h]); s;
+		     s = rtnl_dereference(s->next)) {
 			for (h1 = 0; h1 <= 16; h1++) {
 				struct rsvp_filter *f;
 
-				for (f = s->ht[h1]; f; f = f->next) {
+				for (f = rtnl_dereference(s->ht[h1]); f;
+				     f = rtnl_dereference(f->next)) {
 					if (arg->count < arg->skip) {
 						arg->count++;
 						continue;


* [net-next PATCH v4 12/16] net: sched: rcu'ify cls_bpf
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
                   ` (10 preceding siblings ...)
  2014-09-10 15:51 ` [net-next PATCH v4 11/16] net: sched: rcu'ify cls_rsvp John Fastabend
@ 2014-09-10 15:51 ` John Fastabend
  2014-09-11  2:28   ` Eric Dumazet
  2014-09-10 15:52 ` [net-next PATCH v4 13/16] net: sched: make tc_action safe to walk under RCU John Fastabend
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:51 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

This patch makes the cls_bpf classifier RCU safe. The tcf_lock was
being used to protect the list of cls_bpf_prog entries; now this list
is RCU safe and updates occur with list_replace_rcu().
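
As a rough sketch of that update scheme (illustrative names, not the
actual cls_bpf structures), an existing entry is swapped with the RCU
list helpers and only freed after a grace period, while readers walk
the list with list_for_each_entry_rcu():

/* needs <linux/rculist.h>, <linux/rcupdate.h>, <linux/slab.h> */
struct prog_entry {
	struct list_head link;
	struct rcu_head rcu;
	/* ... filter state ... */
};

static void prog_entry_free(struct rcu_head *head)
{
	kfree(container_of(head, struct prog_entry, rcu));
}

/* update side, serialized by RTNL */
static void prog_entry_replace(struct prog_entry *oldp,
			       struct prog_entry *newp)
{
	list_replace_rcu(&oldp->link, &newp->link);
	call_rcu(&oldp->rcu, prog_entry_free);	/* freed after a grace period */
}

/* read side, under rcu_read_lock() */
static void prog_list_walk(struct list_head *plist)
{
	struct prog_entry *p;

	list_for_each_entry_rcu(p, plist, link) {
		/* run the classifier program without taking tcf_lock */
	}
}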

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_bpf.c |   93 ++++++++++++++++++++++++++-------------------------
 1 file changed, 47 insertions(+), 46 deletions(-)

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 0e30d58..f2ba58a 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -27,6 +27,7 @@ MODULE_DESCRIPTION("TC BPF based classifier");
 struct cls_bpf_head {
 	struct list_head plist;
 	u32 hgen;
+	struct rcu_head rcu;
 };
 
 struct cls_bpf_prog {
@@ -37,6 +38,8 @@ struct cls_bpf_prog {
 	struct list_head link;
 	u32 handle;
 	u16 bpf_len;
+	struct tcf_proto *tp;
+	struct rcu_head rcu;
 };
 
 static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
@@ -49,11 +52,11 @@ static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
 static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			    struct tcf_result *res)
 {
-	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_head *head = rcu_dereference(tp->root);
 	struct cls_bpf_prog *prog;
 	int ret;
 
-	list_for_each_entry(prog, &head->plist, link) {
+	list_for_each_entry_rcu(prog, &head->plist, link) {
 		int filter_res = BPF_PROG_RUN(prog->filter, skb);
 
 		if (filter_res == 0)
@@ -81,8 +84,8 @@ static int cls_bpf_init(struct tcf_proto *tp)
 	if (head == NULL)
 		return -ENOBUFS;
 
-	INIT_LIST_HEAD(&head->plist);
-	tp->root = head;
+	INIT_LIST_HEAD_RCU(&head->plist);
+	rcu_assign_pointer(tp->root, head);
 
 	return 0;
 }
@@ -98,18 +101,22 @@ static void cls_bpf_delete_prog(struct tcf_proto *tp, struct cls_bpf_prog *prog)
 	kfree(prog);
 }
 
+static void __cls_bpf_delete_prog(struct rcu_head *rcu)
+{
+	struct cls_bpf_prog *prog = container_of(rcu, struct cls_bpf_prog, rcu);
+
+	cls_bpf_delete_prog(prog->tp, prog);
+}
+
 static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
 {
-	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_head *head = rtnl_dereference(tp->root);
 	struct cls_bpf_prog *prog, *todel = (struct cls_bpf_prog *) arg;
 
 	list_for_each_entry(prog, &head->plist, link) {
 		if (prog == todel) {
-			tcf_tree_lock(tp);
-			list_del(&prog->link);
-			tcf_tree_unlock(tp);
-
-			cls_bpf_delete_prog(tp, prog);
+			list_del_rcu(&prog->link);
+			call_rcu(&prog->rcu, __cls_bpf_delete_prog);
 			return 0;
 		}
 	}
@@ -119,27 +126,28 @@ static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
 
 static void cls_bpf_destroy(struct tcf_proto *tp)
 {
-	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_head *head = rtnl_dereference(tp->root);
 	struct cls_bpf_prog *prog, *tmp;
 
 	list_for_each_entry_safe(prog, tmp, &head->plist, link) {
-		list_del(&prog->link);
-		cls_bpf_delete_prog(tp, prog);
+		list_del_rcu(&prog->link);
+		call_rcu(&prog->rcu, __cls_bpf_delete_prog);
 	}
 
-	kfree(head);
+	RCU_INIT_POINTER(tp->root, NULL);
+	kfree_rcu(head, rcu);
 }
 
 static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
 {
-	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_head *head = rtnl_dereference(tp->root);
 	struct cls_bpf_prog *prog;
 	unsigned long ret = 0UL;
 
 	if (head == NULL)
 		return 0UL;
 
-	list_for_each_entry(prog, &head->plist, link) {
+	list_for_each_entry_rcu(prog, &head->plist, link) {
 		if (prog->handle == handle) {
 			ret = (unsigned long) prog;
 			break;
@@ -158,10 +166,10 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
 				   unsigned long base, struct nlattr **tb,
 				   struct nlattr *est, bool ovr)
 {
-	struct sock_filter *bpf_ops, *bpf_old;
+	struct sock_filter *bpf_ops;
 	struct tcf_exts exts;
 	struct sock_fprog_kern tmp;
-	struct bpf_prog *fp, *fp_old;
+	struct bpf_prog *fp;
 	u16 bpf_size, bpf_len;
 	u32 classid;
 	int ret;
@@ -197,26 +205,15 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
 	if (ret)
 		goto errout_free;
 
-	tcf_tree_lock(tp);
-	fp_old = prog->filter;
-	bpf_old = prog->bpf_ops;
-
 	prog->bpf_len = bpf_len;
 	prog->bpf_ops = bpf_ops;
 	prog->filter = fp;
 	prog->res.classid = classid;
-	tcf_tree_unlock(tp);
 
 	tcf_bind_filter(tp, &prog->res, base);
 	tcf_exts_change(tp, &prog->exts, &exts);
 
-	if (fp_old)
-		bpf_prog_destroy(fp_old);
-	if (bpf_old)
-		kfree(bpf_old);
-
 	return 0;
-
 errout_free:
 	kfree(bpf_ops);
 errout:
@@ -244,9 +241,10 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 			  u32 handle, struct nlattr **tca,
 			  unsigned long *arg, bool ovr)
 {
-	struct cls_bpf_head *head = tp->root;
-	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) *arg;
+	struct cls_bpf_head *head = rtnl_dereference(tp->root);
+	struct cls_bpf_prog *oldprog = (struct cls_bpf_prog *) *arg;
 	struct nlattr *tb[TCA_BPF_MAX + 1];
+	struct cls_bpf_prog *prog;
 	int ret;
 
 	if (tca[TCA_OPTIONS] == NULL)
@@ -256,18 +254,19 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	if (ret < 0)
 		return ret;
 
-	if (prog != NULL) {
-		if (handle && prog->handle != handle)
-			return -EINVAL;
-		return cls_bpf_modify_existing(net, tp, prog, base, tb,
-					       tca[TCA_RATE], ovr);
-	}
-
 	prog = kzalloc(sizeof(*prog), GFP_KERNEL);
-	if (prog == NULL)
+	if (!prog)
 		return -ENOBUFS;
 
 	tcf_exts_init(&prog->exts, TCA_BPF_ACT, TCA_BPF_POLICE);
+
+	if (oldprog) {
+		if (handle && oldprog->handle != handle) {
+			ret = -EINVAL;
+			goto errout;
+		}
+	}
+
 	if (handle == 0)
 		prog->handle = cls_bpf_grab_new_handle(tp, head);
 	else
@@ -281,15 +280,17 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	if (ret < 0)
 		goto errout;
 
-	tcf_tree_lock(tp);
-	list_add(&prog->link, &head->plist);
-	tcf_tree_unlock(tp);
+	if (oldprog) {
+		list_replace_rcu(&prog->link, &oldprog->link);
+		call_rcu(&oldprog->rcu, __cls_bpf_delete_prog);
+	} else {
+		list_add_rcu(&prog->link, &head->plist);
+	}
 
 	*arg = (unsigned long) prog;
-
 	return 0;
 errout:
-	if (*arg == 0UL && prog)
+	if (prog)
 		kfree(prog);
 
 	return ret;
@@ -339,10 +340,10 @@ nla_put_failure:
 
 static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 {
-	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_head *head = rtnl_dereference(tp->root);
 	struct cls_bpf_prog *prog;
 
-	list_for_each_entry(prog, &head->plist, link) {
+	list_for_each_entry_rcu(prog, &head->plist, link) {
 		if (arg->count < arg->skip)
 			goto skip;
 		if (arg->fn(tp, (unsigned long) prog, arg) < 0) {


* [net-next PATCH v4 13/16] net: sched: make tc_action safe to walk under RCU
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
                   ` (11 preceding siblings ...)
  2014-09-10 15:51 ` [net-next PATCH v4 12/16] net: sched: rcu'ify cls_bpf John Fastabend
@ 2014-09-10 15:52 ` John Fastabend
  2014-09-10 15:52 ` [net-next PATCH v4 14/16] net: sched: make bstats per cpu and estimator RCU safe John Fastabend
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:52 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

This patch makes tc_actions safe to walk from classifiers under RCU.
Notice that each action act() callback defined in each act_*.c
already does its own locking. This is needed even in the current
infrastructure for the case where users add/remove actions via
'tc actions' and reference them via index from the classifiers.

There are a couple of interesting pieces here (i.e. they need careful
review). In tcf_exts_exec() the call to tcf_action_exec() follows a
list_empty() check. However, although this occurs under RCU, there is
no guarantee that the list is still non-empty when tcf_action_exec()
is called. This patch fixes up the return values from tcf_action_exec()
so that packets won't be dropped on the floor when this occurs.
Hopefully it's rare, and by using RCU we are essentially accepting
that assumption.

Second, there is a suspect usage of list_splice_init_rcu() in the
tcf_exts_change() routine. Notice that it is used twice in succession
and that the second init works on the src tcf_exts. There is probably
a better way to accomplish this; a commented sketch of the sequence
follows.
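
For reference, a commented sketch of that splice sequence (it mirrors
the tcf_exts_change() hunk further down; the function name here is
made up for illustration):

static void exts_change_sketch(struct tcf_exts *dst, struct tcf_exts *src)
{
	LIST_HEAD(tmp);

	/* detach dst's old actions; the helper calls synchronize_rcu()
	 * before re-linking the detached entries onto tmp
	 */
	list_splice_init_rcu(&dst->actions, &tmp, synchronize_rcu);

	/* move src's actions onto dst; src->actions is left re-initialized
	 * (empty), so src must not be relied on after this call
	 */
	list_splice_init_rcu(&src->actions, &dst->actions, synchronize_rcu);

	/* the old actions parked on tmp can now be destroyed */
	tcf_action_destroy(&tmp, TCA_ACT_UNBIND);
}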

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/act_api.h |    1 +
 include/net/pkt_cls.h |   10 +++++++++-
 net/sched/act_api.c   |   14 +++++++-------
 net/sched/cls_api.c   |   17 ++++++++++-------
 4 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 3ee4c92..ddd0c5a 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -79,6 +79,7 @@ struct tc_action {
 	__u32			type; /* for backward compat(TCA_OLD_COMPAT) */
 	__u32			order;
 	struct list_head	list;
+	struct rcu_head		rcu;
 };
 
 struct tc_action_ops {
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 6da46dc..58c99fc 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -76,7 +76,7 @@ static inline void tcf_exts_init(struct tcf_exts *exts, int action, int police)
 {
 #ifdef CONFIG_NET_CLS_ACT
 	exts->type = 0;
-	INIT_LIST_HEAD(&exts->actions);
+	INIT_LIST_HEAD_RCU(&exts->actions);
 #endif
 	exts->action = action;
 	exts->police = police;
@@ -88,6 +88,8 @@ static inline void tcf_exts_init(struct tcf_exts *exts, int action, int police)
  *
  * Returns 1 if a predicative extension is present, i.e. an extension which
  * might cause further actions and thus overrule the regular tcf_result.
+ *
+ * This check is only valid if done under RTNL.
  */
 static inline int
 tcf_exts_is_predicative(struct tcf_exts *exts)
@@ -128,6 +130,12 @@ tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts,
 	       struct tcf_result *res)
 {
 #ifdef CONFIG_NET_CLS_ACT
+	/* This check is racy but this is OK, if the list is emptied
+	 * before walking the chain of actions the return value has
+	 * been updated to return zero. This way packets will not be
+	 * dropped when action list deletion occurs after the empty
+	 * check but before execution
+	 */
 	if (!list_empty(&exts->actions))
 		return tcf_action_exec(skb, &exts->actions, res);
 #endif
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 648778a..ae32b5b 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -381,14 +381,14 @@ int tcf_action_exec(struct sk_buff *skb, const struct list_head *actions,
 		    struct tcf_result *res)
 {
 	const struct tc_action *a;
-	int ret = -1;
+	int ret = 0;
 
 	if (skb->tc_verd & TC_NCLS) {
 		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
 		ret = TC_ACT_OK;
 		goto exec_done;
 	}
-	list_for_each_entry(a, actions, list) {
+	list_for_each_entry_rcu(a, actions, list) {
 repeat:
 		ret = a->ops->act(skb, a, res);
 		if (TC_MUNGED & skb->tc_verd) {
@@ -417,8 +417,8 @@ int tcf_action_destroy(struct list_head *actions, int bind)
 			module_put(a->ops->owner);
 		else if (ret < 0)
 			return ret;
-		list_del(&a->list);
-		kfree(a);
+		list_del_rcu(&a->list);
+		kfree_rcu(a, rcu);
 	}
 	return ret;
 }
@@ -584,7 +584,7 @@ int tcf_action_init(struct net *net, struct nlattr *nla,
 			goto err;
 		}
 		act->order = i;
-		list_add_tail(&act->list, actions);
+		list_add_tail_rcu(&act->list, actions);
 	}
 	return 0;
 
@@ -746,8 +746,8 @@ static void cleanup_a(struct list_head *actions)
 	struct tc_action *a, *tmp;
 
 	list_for_each_entry_safe(a, tmp, actions, list) {
-		list_del(&a->list);
-		kfree(a);
+		list_del_rcu(&a->list);
+		kfree_rcu(a, rcu);
 	}
 }
 
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index e547efd..dfce69b 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -500,7 +500,7 @@ void tcf_exts_destroy(struct tcf_proto *tp, struct tcf_exts *exts)
 {
 #ifdef CONFIG_NET_CLS_ACT
 	tcf_action_destroy(&exts->actions, TCA_ACT_UNBIND);
-	INIT_LIST_HEAD(&exts->actions);
+	INIT_LIST_HEAD_RCU(&exts->actions);
 #endif
 }
 EXPORT_SYMBOL(tcf_exts_destroy);
@@ -512,7 +512,7 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 	{
 		struct tc_action *act;
 
-		INIT_LIST_HEAD(&exts->actions);
+		INIT_LIST_HEAD_RCU(&exts->actions);
 		if (exts->police && tb[exts->police]) {
 			act = tcf_action_init_1(net, tb[exts->police], rate_tlv,
 						"police", ovr,
@@ -521,7 +521,7 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 				return PTR_ERR(act);
 
 			act->type = exts->type = TCA_OLD_COMPAT;
-			list_add(&act->list, &exts->actions);
+			list_add_rcu(&act->list, &exts->actions);
 		} else if (exts->action && tb[exts->action]) {
 			int err;
 			err = tcf_action_init(net, tb[exts->action], rate_tlv,
@@ -541,15 +541,18 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 }
 EXPORT_SYMBOL(tcf_exts_validate);
 
+/* It is not safe to use src->actions after this due to _init_rcu usage
+ * INIT_LIST_HEAD_RCU() is called on src->actions
+ */
 void tcf_exts_change(struct tcf_proto *tp, struct tcf_exts *dst,
 		     struct tcf_exts *src)
 {
 #ifdef CONFIG_NET_CLS_ACT
 	LIST_HEAD(tmp);
-	tcf_tree_lock(tp);
-	list_splice_init(&dst->actions, &tmp);
-	list_splice(&src->actions, &dst->actions);
-	tcf_tree_unlock(tp);
+	list_splice_init_rcu(&dst->actions, &tmp, synchronize_rcu);
+	list_splice_init_rcu(&src->actions,
+			     &dst->actions,
+			     synchronize_rcu);
 	tcf_action_destroy(&tmp, TCA_ACT_UNBIND);
 #endif
 }


* [net-next PATCH v4 14/16] net: sched: make bstats per cpu and estimator RCU safe
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
                   ` (12 preceding siblings ...)
  2014-09-10 15:52 ` [net-next PATCH v4 13/16] net: sched: make tc_action safe to walk under RCU John Fastabend
@ 2014-09-10 15:52 ` John Fastabend
  2014-09-10 15:52 ` [net-next PATCH v4 15/16] net: sched: make qstats per cpu John Fastabend
  2014-09-10 15:53 ` [net-next PATCH v4 16/16] net: sched: drop ingress qdisc lock John Fastabend
  15 siblings, 0 replies; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:52 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

In order to run qdiscs without locking, statistics and estimators
need to be handled correctly.

To resolve bstats, make the statistics per-CPU. Because this is only
needed for qdiscs that run without locks, which will not be the case
for most qdiscs in the near future, only create per-CPU stats when a
qdisc sets the TCQ_F_LLQDISC flag.

Next, because estimators use the bstats to calculate packets per
second and bytes per second, the estimator code paths are updated
to use the per-CPU statistics.

After this, if qdiscs use the _percpu routines to update stats and
set the TCQ_F_LLQDISC flag, they can be made safe to run without
locking. Any qdisc that sets the flag needs to ensure that the data
structures it owns are safe to use without locks; additionally, the
skb list routines currently run without any locking of their own, so
those would need to be updated in a future patch.
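
To illustrate, here is a minimal sketch of the per-CPU accounting
pattern (simplified structure and function names, not the actual
gnet_stats_* API): the fast path updates the local CPU's counters
inside a u64_stats section, and the dump/estimator path folds all
CPUs into one consistent total:

/* needs <linux/percpu.h> and <linux/u64_stats_sync.h> */
struct cpu_bstats_sketch {
	u64 bytes;
	u64 packets;
	struct u64_stats_sync syncp;
};

static void bstats_add(struct cpu_bstats_sketch __percpu *stats,
		       unsigned int len)
{
	struct cpu_bstats_sketch *b = this_cpu_ptr(stats);

	u64_stats_update_begin(&b->syncp);
	b->bytes += len;
	b->packets += 1;
	u64_stats_update_end(&b->syncp);
}

static void bstats_fold(struct cpu_bstats_sketch __percpu *stats,
			u64 *bytes, u64 *packets)
{
	int cpu;

	*bytes = 0;
	*packets = 0;
	for_each_possible_cpu(cpu) {
		struct cpu_bstats_sketch *b = per_cpu_ptr(stats, cpu);
		unsigned int start;
		u64 by, pk;

		do {
			start = u64_stats_fetch_begin_irq(&b->syncp);
			by = b->bytes;
			pk = b->packets;
		} while (u64_stats_fetch_retry_irq(&b->syncp, start));

		*bytes += by;
		*packets += pk;
	}
}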

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/gen_stats.h    |   16 ++++++++++++
 include/net/sch_generic.h  |   40 +++++++++++++++++++++++++++--
 net/core/gen_estimator.c   |   60 ++++++++++++++++++++++++++++++++++----------
 net/core/gen_stats.c       |   47 ++++++++++++++++++++++++++++++++++
 net/netfilter/xt_RATEEST.c |    4 +--
 net/sched/act_api.c        |    9 ++++---
 net/sched/act_police.c     |    4 +--
 net/sched/sch_api.c        |   51 ++++++++++++++++++++++++++++++++-----
 net/sched/sch_cbq.c        |    9 ++++---
 net/sched/sch_drr.c        |    9 ++++---
 net/sched/sch_generic.c    |   11 +++++++-
 net/sched/sch_hfsc.c       |   15 +++++++----
 net/sched/sch_htb.c        |   14 +++++++---
 net/sched/sch_ingress.c    |    2 +
 net/sched/sch_mq.c         |   10 ++++---
 net/sched/sch_mqprio.c     |   13 +++++-----
 net/sched/sch_multiq.c     |    2 +
 net/sched/sch_prio.c       |    2 +
 net/sched/sch_qfq.c        |   12 +++++----
 19 files changed, 258 insertions(+), 72 deletions(-)

diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index ea4271d..4b7ca2b 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -6,6 +6,11 @@
 #include <linux/rtnetlink.h>
 #include <linux/pkt_sched.h>
 
+struct gnet_stats_basic_cpu {
+	struct gnet_stats_basic_packed bstats;
+	struct u64_stats_sync syncp;
+};
+
 struct gnet_dump {
 	spinlock_t *      lock;
 	struct sk_buff *  skb;
@@ -26,10 +31,17 @@ int gnet_stats_start_copy_compat(struct sk_buff *skb, int type,
 				 int tc_stats_type, int xstats_type,
 				 spinlock_t *lock, struct gnet_dump *d);
 
+int gnet_stats_copy_basic_cpu(struct gnet_dump *d,
+			      struct gnet_stats_basic_cpu __percpu *b);
+void __gnet_stats_copy_basic_cpu(struct gnet_stats_basic_packed *bstats,
+				 struct gnet_stats_basic_cpu __percpu *b);
 int gnet_stats_copy_basic(struct gnet_dump *d,
 			  struct gnet_stats_basic_packed *b);
+void __gnet_stats_copy_basic(struct gnet_stats_basic_packed *bstats,
+			     struct gnet_stats_basic_packed *b);
 int gnet_stats_copy_rate_est(struct gnet_dump *d,
 			     const struct gnet_stats_basic_packed *b,
+			     const struct gnet_stats_basic_cpu __percpu *cpu_b,
 			     struct gnet_stats_rate_est64 *r);
 int gnet_stats_copy_queue(struct gnet_dump *d, struct gnet_stats_queue *q);
 int gnet_stats_copy_app(struct gnet_dump *d, void *st, int len);
@@ -37,13 +49,17 @@ int gnet_stats_copy_app(struct gnet_dump *d, void *st, int len);
 int gnet_stats_finish_copy(struct gnet_dump *d);
 
 int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
+		      struct gnet_stats_basic_cpu __percpu *cpu_bstats,
 		      struct gnet_stats_rate_est64 *rate_est,
 		      spinlock_t *stats_lock, struct nlattr *opt);
 void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
+			struct gnet_stats_basic_cpu __percpu *cpu_bstats,
 			struct gnet_stats_rate_est64 *rate_est);
 int gen_replace_estimator(struct gnet_stats_basic_packed *bstats,
+			  struct gnet_stats_basic_cpu __percpu *cpu_bstats,
 			  struct gnet_stats_rate_est64 *rate_est,
 			  spinlock_t *stats_lock, struct nlattr *opt);
 bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
+			  const struct gnet_stats_basic_cpu __percpu *cpu_bstat,
 			  const struct gnet_stats_rate_est64 *rate_est);
 #endif
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 206d906..e1ed293 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -6,6 +6,7 @@
 #include <linux/rcupdate.h>
 #include <linux/pkt_sched.h>
 #include <linux/pkt_cls.h>
+#include <linux/percpu.h>
 #include <net/gen_stats.h>
 #include <net/rtnetlink.h>
 
@@ -57,6 +58,9 @@ struct Qdisc {
 				      * Its true for MQ/MQPRIO slaves, or non
 				      * multiqueue device.
 				      */
+#define TCQ_F_LLQDISC		0x20 /* lockless qdiscs can run without using
+				      * the qdisc lock to serialize skbs.
+				      */
 #define TCQ_F_WARN_NONWC	(1 << 16)
 	u32			limit;
 	const struct Qdisc_ops	*ops;
@@ -83,7 +87,10 @@ struct Qdisc {
 	 */
 	unsigned long		state;
 	struct sk_buff_head	q;
-	struct gnet_stats_basic_packed bstats;
+	union {
+		struct gnet_stats_basic_packed bstats;
+		struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+	} bstats_qdisc;
 	unsigned int		__state;
 	struct gnet_stats_queue	qstats;
 	struct rcu_head		rcu_head;
@@ -490,7 +497,6 @@ static inline int qdisc_enqueue_root(struct sk_buff *skb, struct Qdisc *sch)
 	return qdisc_enqueue(skb, sch) & NET_XMIT_MASK;
 }
 
-
 static inline void bstats_update(struct gnet_stats_basic_packed *bstats,
 				 const struct sk_buff *skb)
 {
@@ -498,10 +504,38 @@ static inline void bstats_update(struct gnet_stats_basic_packed *bstats,
 	bstats->packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;
 }
 
+static inline void qdisc_bstats_update_cpu(struct Qdisc *sch,
+					   const struct sk_buff *skb)
+{
+	struct gnet_stats_basic_cpu *bstats =
+				this_cpu_ptr(sch->bstats_qdisc.cpu_bstats);
+
+	u64_stats_update_begin(&bstats->syncp);
+	bstats_update(&bstats->bstats, skb);
+	u64_stats_update_end(&bstats->syncp);
+}
+
 static inline void qdisc_bstats_update(struct Qdisc *sch,
 				       const struct sk_buff *skb)
 {
-	bstats_update(&sch->bstats, skb);
+	bstats_update(&sch->bstats_qdisc.bstats, skb);
+}
+
+static inline bool qdisc_is_lockless(struct Qdisc *q)
+{
+	return q->flags & TCQ_F_LLQDISC;
+}
+
+static inline int
+qdisc_bstats_copy_qdisc(struct gnet_dump *d, struct Qdisc *q)
+{
+	int err;
+
+	if (qdisc_is_lockless(q))
+		err = gnet_stats_copy_basic_cpu(d, q->bstats_qdisc.cpu_bstats);
+	else
+		err = gnet_stats_copy_basic(d, &q->bstats_qdisc.bstats);
+	return err;
 }
 
 static inline int __qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch,
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 9d33dff..0d13319 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -91,6 +91,8 @@ struct gen_estimator
 	u32			avpps;
 	struct rcu_head		e_rcu;
 	struct rb_node		node;
+	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+	struct rcu_head		head;
 };
 
 struct gen_estimator_head
@@ -122,11 +124,21 @@ static void est_timer(unsigned long arg)
 
 		spin_lock(e->stats_lock);
 		read_lock(&est_lock);
-		if (e->bstats == NULL)
+
+		if (e->bstats == NULL && e->cpu_bstats == NULL)
 			goto skip;
 
-		nbytes = e->bstats->bytes;
-		npackets = e->bstats->packets;
+		if (e->cpu_bstats) {
+			struct gnet_stats_basic_packed b = {0};
+
+			__gnet_stats_copy_basic_cpu(&b, e->cpu_bstats);
+			nbytes = b.bytes;
+			npackets = b.packets;
+		} else {
+			nbytes = e->bstats->bytes;
+			npackets = e->bstats->packets;
+		}
+
 		brate = (nbytes - e->last_bytes)<<(7 - idx);
 		e->last_bytes = nbytes;
 		e->avbps += (brate >> e->ewma_log) - (e->avbps >> e->ewma_log);
@@ -146,6 +158,11 @@ skip:
 	rcu_read_unlock();
 }
 
+static void *gen_get_bstats(struct gen_estimator *est)
+{
+	return est->cpu_bstats ? (void *)est->cpu_bstats : (void *)est->bstats;
+}
+
 static void gen_add_node(struct gen_estimator *est)
 {
 	struct rb_node **p = &est_root.rb_node, *parent = NULL;
@@ -156,7 +173,7 @@ static void gen_add_node(struct gen_estimator *est)
 		parent = *p;
 		e = rb_entry(parent, struct gen_estimator, node);
 
-		if (est->bstats > e->bstats)
+		if (gen_get_bstats(est)  > gen_get_bstats(e))
 			p = &parent->rb_right;
 		else
 			p = &parent->rb_left;
@@ -167,18 +184,20 @@ static void gen_add_node(struct gen_estimator *est)
 
 static
 struct gen_estimator *gen_find_node(const struct gnet_stats_basic_packed *bstats,
+				    const struct gnet_stats_basic_cpu __percpu *cpu_bstats,
 				    const struct gnet_stats_rate_est64 *rate_est)
 {
 	struct rb_node *p = est_root.rb_node;
+	void *b = bstats ? (void *)bstats : (void *)cpu_bstats;
 
 	while (p) {
 		struct gen_estimator *e;
 
 		e = rb_entry(p, struct gen_estimator, node);
 
-		if (bstats > e->bstats)
+		if (b > gen_get_bstats(e))
 			p = p->rb_right;
-		else if (bstats < e->bstats || rate_est != e->rate_est)
+		else if (b < gen_get_bstats(e) || rate_est != e->rate_est)
 			p = p->rb_left;
 		else
 			return e;
@@ -203,6 +222,7 @@ struct gen_estimator *gen_find_node(const struct gnet_stats_basic_packed *bstats
  *
  */
 int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
+		      struct gnet_stats_basic_cpu __percpu *cpu_bstats,
 		      struct gnet_stats_rate_est64 *rate_est,
 		      spinlock_t *stats_lock,
 		      struct nlattr *opt)
@@ -221,14 +241,24 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 	if (est == NULL)
 		return -ENOBUFS;
 
+	if (cpu_bstats) {
+		struct gnet_stats_basic_packed b = {0};
+
+		est->cpu_bstats = cpu_bstats;
+		__gnet_stats_copy_basic_cpu(&b, cpu_bstats);
+		est->last_bytes = b.bytes;
+		est->last_packets = b.packets;
+	} else {
+		est->bstats = bstats;
+		est->last_bytes = bstats->bytes;
+		est->last_packets = bstats->packets;
+	}
+
 	idx = parm->interval + 2;
-	est->bstats = bstats;
 	est->rate_est = rate_est;
 	est->stats_lock = stats_lock;
 	est->ewma_log = parm->ewma_log;
-	est->last_bytes = bstats->bytes;
 	est->avbps = rate_est->bps<<5;
-	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
 
 	spin_lock_bh(&est_tree_lock);
@@ -258,14 +288,14 @@ EXPORT_SYMBOL(gen_new_estimator);
  * Note : Caller should respect an RCU grace period before freeing stats_lock
  */
 void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
+			struct gnet_stats_basic_cpu __percpu *cpu_bstats,
 			struct gnet_stats_rate_est64 *rate_est)
 {
 	struct gen_estimator *e;
 
 	spin_lock_bh(&est_tree_lock);
-	while ((e = gen_find_node(bstats, rate_est))) {
+	while ((e = gen_find_node(bstats, cpu_bstats, rate_est))) {
 		rb_erase(&e->node, &est_root);
-
 		write_lock(&est_lock);
 		e->bstats = NULL;
 		write_unlock(&est_lock);
@@ -290,11 +320,12 @@ EXPORT_SYMBOL(gen_kill_estimator);
  * Returns 0 on success or a negative error code.
  */
 int gen_replace_estimator(struct gnet_stats_basic_packed *bstats,
+			  struct gnet_stats_basic_cpu __percpu *cpu_bstats,
 			  struct gnet_stats_rate_est64 *rate_est,
 			  spinlock_t *stats_lock, struct nlattr *opt)
 {
-	gen_kill_estimator(bstats, rate_est);
-	return gen_new_estimator(bstats, rate_est, stats_lock, opt);
+	gen_kill_estimator(bstats, cpu_bstats, rate_est);
+	return gen_new_estimator(bstats, cpu_bstats, rate_est, stats_lock, opt);
 }
 EXPORT_SYMBOL(gen_replace_estimator);
 
@@ -306,6 +337,7 @@ EXPORT_SYMBOL(gen_replace_estimator);
  * Returns true if estimator is active, and false if not.
  */
 bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
+			  const struct gnet_stats_basic_cpu __percpu *cpu_bstats,
 			  const struct gnet_stats_rate_est64 *rate_est)
 {
 	bool res;
@@ -313,7 +345,7 @@ bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
 	ASSERT_RTNL();
 
 	spin_lock_bh(&est_tree_lock);
-	res = gen_find_node(bstats, rate_est) != NULL;
+	res = gen_find_node(bstats, cpu_bstats, rate_est) != NULL;
 	spin_unlock_bh(&est_tree_lock);
 
 	return res;
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 2ddbce4..e43b55f 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -128,6 +128,50 @@ gnet_stats_copy_basic(struct gnet_dump *d, struct gnet_stats_basic_packed *b)
 }
 EXPORT_SYMBOL(gnet_stats_copy_basic);
 
+void
+__gnet_stats_copy_basic(struct gnet_stats_basic_packed *bstats,
+			struct gnet_stats_basic_packed *b)
+{
+	bstats->bytes = b->bytes;
+	bstats->packets = b->packets;
+}
+EXPORT_SYMBOL(__gnet_stats_copy_basic);
+
+void
+__gnet_stats_copy_basic_cpu(struct gnet_stats_basic_packed *bstats,
+			    struct gnet_stats_basic_cpu __percpu *b)
+{
+	int i;
+
+	for_each_possible_cpu(i) {
+		struct gnet_stats_basic_cpu *bcpu = per_cpu_ptr(b, i);
+		unsigned int start;
+		__u64 bytes;
+		__u32 packets;
+
+		do {
+			start = u64_stats_fetch_begin_irq(&bcpu->syncp);
+			bytes = bcpu->bstats.bytes;
+			packets = bcpu->bstats.packets;
+		} while (u64_stats_fetch_retry_irq(&bcpu->syncp, start));
+
+		bstats->bytes += bcpu->bstats.bytes;
+		bstats->packets += bcpu->bstats.packets;
+	}
+}
+EXPORT_SYMBOL(__gnet_stats_copy_basic_cpu);
+
+int
+gnet_stats_copy_basic_cpu(struct gnet_dump *d,
+			  struct gnet_stats_basic_cpu __percpu *b)
+{
+	struct gnet_stats_basic_packed bstats = {0};
+
+	__gnet_stats_copy_basic_cpu(&bstats, b);
+	return gnet_stats_copy_basic(d, &bstats);
+}
+EXPORT_SYMBOL(gnet_stats_copy_basic_cpu);
+
 /**
  * gnet_stats_copy_rate_est - copy rate estimator statistics into statistics TLV
  * @d: dumping handle
@@ -143,12 +187,13 @@ EXPORT_SYMBOL(gnet_stats_copy_basic);
 int
 gnet_stats_copy_rate_est(struct gnet_dump *d,
 			 const struct gnet_stats_basic_packed *b,
+			 const struct gnet_stats_basic_cpu __percpu *cpu_b,
 			 struct gnet_stats_rate_est64 *r)
 {
 	struct gnet_stats_rate_est est;
 	int res;
 
-	if (b && !gen_estimator_active(b, r))
+	if ((b || cpu_b) && !gen_estimator_active(b, cpu_b, r))
 		return 0;
 
 	est.bps = min_t(u64, UINT_MAX, r->bps);
diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index 370adf6..02e2532 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -64,7 +64,7 @@ void xt_rateest_put(struct xt_rateest *est)
 	mutex_lock(&xt_rateest_mutex);
 	if (--est->refcnt == 0) {
 		hlist_del(&est->list);
-		gen_kill_estimator(&est->bstats, &est->rstats);
+		gen_kill_estimator(&est->bstats, NULL, &est->rstats);
 		/*
 		 * gen_estimator est_timer() might access est->lock or bstats,
 		 * wait a RCU grace period before freeing 'est'
@@ -136,7 +136,7 @@ static int xt_rateest_tg_checkentry(const struct xt_tgchk_param *par)
 	cfg.est.interval	= info->interval;
 	cfg.est.ewma_log	= info->ewma_log;
 
-	ret = gen_new_estimator(&est->bstats, &est->rstats,
+	ret = gen_new_estimator(&est->bstats, NULL, &est->rstats,
 				&est->lock, &cfg.opt);
 	if (ret < 0)
 		goto err2;
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index ae32b5b..89504ea 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -35,7 +35,7 @@ void tcf_hash_destroy(struct tc_action *a)
 	spin_lock_bh(&hinfo->lock);
 	hlist_del(&p->tcfc_head);
 	spin_unlock_bh(&hinfo->lock);
-	gen_kill_estimator(&p->tcfc_bstats,
+	gen_kill_estimator(&p->tcfc_bstats, NULL,
 			   &p->tcfc_rate_est);
 	/*
 	 * gen_estimator est_timer() might access p->tcfc_lock
@@ -228,7 +228,7 @@ void tcf_hash_cleanup(struct tc_action *a, struct nlattr *est)
 {
 	struct tcf_common *pc = a->priv;
 	if (est)
-		gen_kill_estimator(&pc->tcfc_bstats,
+		gen_kill_estimator(&pc->tcfc_bstats, NULL,
 				   &pc->tcfc_rate_est);
 	kfree_rcu(pc, tcfc_rcu);
 }
@@ -252,7 +252,8 @@ int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
 	p->tcfc_tm.install = jiffies;
 	p->tcfc_tm.lastuse = jiffies;
 	if (est) {
-		int err = gen_new_estimator(&p->tcfc_bstats, &p->tcfc_rate_est,
+		int err = gen_new_estimator(&p->tcfc_bstats, NULL,
+					    &p->tcfc_rate_est,
 					    &p->tcfc_lock, est);
 		if (err) {
 			kfree(p);
@@ -620,7 +621,7 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *a,
 		goto errout;
 
 	if (gnet_stats_copy_basic(&d, &p->tcfc_bstats) < 0 ||
-	    gnet_stats_copy_rate_est(&d, &p->tcfc_bstats,
+	    gnet_stats_copy_rate_est(&d, &p->tcfc_bstats, NULL,
 				     &p->tcfc_rate_est) < 0 ||
 	    gnet_stats_copy_queue(&d, &p->tcfc_qstats) < 0)
 		goto errout;
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index f32bcb0..26f4bb3 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -178,14 +178,14 @@ override:
 
 	spin_lock_bh(&police->tcf_lock);
 	if (est) {
-		err = gen_replace_estimator(&police->tcf_bstats,
+		err = gen_replace_estimator(&police->tcf_bstats, NULL,
 					    &police->tcf_rate_est,
 					    &police->tcf_lock, est);
 		if (err)
 			goto failure_unlock;
 	} else if (tb[TCA_POLICE_AVRATE] &&
 		   (ret == ACT_P_CREATED ||
-		    !gen_estimator_active(&police->tcf_bstats,
+		    !gen_estimator_active(&police->tcf_bstats, NULL,
 					  &police->tcf_rate_est))) {
 		err = -EINVAL;
 		goto failure_unlock;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ca62483..beb2064 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -942,6 +942,13 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 	sch->handle = handle;
 
 	if (!ops->init || (err = ops->init(sch, tca[TCA_OPTIONS])) == 0) {
+		if (qdisc_is_lockless(sch)) {
+			sch->bstats_qdisc.cpu_bstats =
+				alloc_percpu(struct gnet_stats_basic_cpu);
+			if (!sch->bstats_qdisc.cpu_bstats)
+				goto err_out4;
+		}
+
 		if (tca[TCA_STAB]) {
 			stab = qdisc_get_stab(tca[TCA_STAB]);
 			if (IS_ERR(stab)) {
@@ -964,8 +971,18 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 			else
 				root_lock = qdisc_lock(sch);
 
-			err = gen_new_estimator(&sch->bstats, &sch->rate_est,
-						root_lock, tca[TCA_RATE]);
+			if (qdisc_is_lockless(sch))
+				err = gen_new_estimator(NULL,
+							sch->bstats_qdisc.cpu_bstats,
+							&sch->rate_est,
+							root_lock,
+							tca[TCA_RATE]);
+			else
+				err = gen_new_estimator(&sch->bstats_qdisc.bstats,
+							NULL,
+							&sch->rate_est,
+							root_lock,
+							tca[TCA_RATE]);
 			if (err)
 				goto err_out4;
 		}
@@ -1022,9 +1039,11 @@ static int qdisc_change(struct Qdisc *sch, struct nlattr **tca)
 		   because change can't be undone. */
 		if (sch->flags & TCQ_F_MQROOT)
 			goto out;
-		gen_replace_estimator(&sch->bstats, &sch->rate_est,
-					    qdisc_root_sleeping_lock(sch),
-					    tca[TCA_RATE]);
+		gen_replace_estimator(&sch->bstats_qdisc.bstats,
+				      sch->bstats_qdisc.cpu_bstats,
+				      &sch->rate_est,
+				      qdisc_root_sleeping_lock(sch),
+				      tca[TCA_RATE]);
 	}
 out:
 	return 0;
@@ -1304,6 +1323,7 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 	unsigned char *b = skb_tail_pointer(skb);
 	struct gnet_dump d;
 	struct qdisc_size_table *stab;
+	int err;
 
 	cond_resched();
 	nlh = nlmsg_put(skb, portid, seq, event, sizeof(*tcm), flags);
@@ -1334,10 +1354,25 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 	if (q->ops->dump_stats && q->ops->dump_stats(q, &d) < 0)
 		goto nla_put_failure;
 
-	if (gnet_stats_copy_basic(&d, &q->bstats) < 0 ||
-	    gnet_stats_copy_rate_est(&d, &q->bstats, &q->rate_est) < 0 ||
+	if (qdisc_is_lockless(q)) {
+		err = gnet_stats_copy_basic_cpu(&d, q->bstats_qdisc.cpu_bstats);
+		if (err < 0)
+			goto nla_put_failure;
+		err = gnet_stats_copy_rate_est(&d, NULL,
+					       q->bstats_qdisc.cpu_bstats,
+					       &q->rate_est);
+	} else {
+		err = gnet_stats_copy_basic(&d, &q->bstats_qdisc.bstats);
+		if (err < 0)
+			goto nla_put_failure;
+		err = gnet_stats_copy_rate_est(&d,
+					       &q->bstats_qdisc.bstats, NULL,
+					       &q->rate_est);
+	}
+
+	if (err < 0 ||
 	    gnet_stats_copy_queue(&d, &q->qstats) < 0)
-		goto nla_put_failure;
+		goto nla_put_failure;
 
 	if (gnet_stats_finish_copy(&d) < 0)
 		goto nla_put_failure;
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index a3244a8..208074d 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1602,7 +1602,7 @@ cbq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 		cl->xstats.undertime = cl->undertime - q->now;
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
-	    gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+	    gnet_stats_copy_rate_est(d, &cl->bstats, NULL, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, &cl->qstats) < 0)
 		return -1;
 
@@ -1671,7 +1671,7 @@ static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl)
 	tcf_destroy_chain(&cl->filter_list);
 	qdisc_destroy(cl->q);
 	qdisc_put_rtab(cl->R_tab);
-	gen_kill_estimator(&cl->bstats, &cl->rate_est);
+	gen_kill_estimator(&cl->bstats, NULL, &cl->rate_est);
 	if (cl != &q->link)
 		kfree(cl);
 }
@@ -1759,7 +1759,8 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
 		}
 
 		if (tca[TCA_RATE]) {
-			err = gen_replace_estimator(&cl->bstats, &cl->rate_est,
+			err = gen_replace_estimator(&cl->bstats, NULL,
+						    &cl->rate_est,
 						    qdisc_root_sleeping_lock(sch),
 						    tca[TCA_RATE]);
 			if (err) {
@@ -1852,7 +1853,7 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
 		goto failure;
 
 	if (tca[TCA_RATE]) {
-		err = gen_new_estimator(&cl->bstats, &cl->rate_est,
+		err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est,
 					qdisc_root_sleeping_lock(sch),
 					tca[TCA_RATE]);
 		if (err) {
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index d8b5ccf..88a65f6 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -88,7 +88,8 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 
 	if (cl != NULL) {
 		if (tca[TCA_RATE]) {
-			err = gen_replace_estimator(&cl->bstats, &cl->rate_est,
+			err = gen_replace_estimator(&cl->bstats, NULL,
+						    &cl->rate_est,
 						    qdisc_root_sleeping_lock(sch),
 						    tca[TCA_RATE]);
 			if (err)
@@ -116,7 +117,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		cl->qdisc = &noop_qdisc;
 
 	if (tca[TCA_RATE]) {
-		err = gen_replace_estimator(&cl->bstats, &cl->rate_est,
+		err = gen_replace_estimator(&cl->bstats, NULL, &cl->rate_est,
 					    qdisc_root_sleeping_lock(sch),
 					    tca[TCA_RATE]);
 		if (err) {
@@ -138,7 +139,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 
 static void drr_destroy_class(struct Qdisc *sch, struct drr_class *cl)
 {
-	gen_kill_estimator(&cl->bstats, &cl->rate_est);
+	gen_kill_estimator(&cl->bstats, NULL, &cl->rate_est);
 	qdisc_destroy(cl->qdisc);
 	kfree(cl);
 }
@@ -283,7 +284,7 @@ static int drr_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	}
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
-	    gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+	    gnet_stats_copy_rate_est(d, &cl->bstats, NULL, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, &cl->qdisc->qstats) < 0)
 		return -1;
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 346ef85..e3d203e 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -632,6 +632,9 @@ static void qdisc_rcu_free(struct rcu_head *head)
 {
 	struct Qdisc *qdisc = container_of(head, struct Qdisc, rcu_head);
 
+	if (qdisc_is_lockless(qdisc))
+		free_percpu(qdisc->bstats_qdisc.cpu_bstats);
+
 	kfree((char *) qdisc - qdisc->padded);
 }
 
@@ -648,7 +651,13 @@ void qdisc_destroy(struct Qdisc *qdisc)
 
 	qdisc_put_stab(rtnl_dereference(qdisc->stab));
 #endif
-	gen_kill_estimator(&qdisc->bstats, &qdisc->rate_est);
+	if (qdisc_is_lockless(qdisc))
+		gen_kill_estimator(NULL, qdisc->bstats_qdisc.cpu_bstats,
+				   &qdisc->rate_est);
+	else
+		gen_kill_estimator(&qdisc->bstats_qdisc.bstats, NULL,
+				   &qdisc->rate_est);
+
 	if (ops->reset)
 		ops->reset(qdisc);
 	if (ops->destroy)
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 04b0de4..3b112d2 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1014,9 +1014,12 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		cur_time = psched_get_time();
 
 		if (tca[TCA_RATE]) {
-			err = gen_replace_estimator(&cl->bstats, &cl->rate_est,
-					      qdisc_root_sleeping_lock(sch),
-					      tca[TCA_RATE]);
+			spinlock_t *lock = qdisc_root_sleeping_lock(sch);
+
+			err = gen_replace_estimator(&cl->bstats, NULL,
+						    &cl->rate_est,
+						    lock,
+						    tca[TCA_RATE]);
 			if (err)
 				return err;
 		}
@@ -1063,7 +1066,7 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		return -ENOBUFS;
 
 	if (tca[TCA_RATE]) {
-		err = gen_new_estimator(&cl->bstats, &cl->rate_est,
+		err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est,
 					qdisc_root_sleeping_lock(sch),
 					tca[TCA_RATE]);
 		if (err) {
@@ -1113,7 +1116,7 @@ hfsc_destroy_class(struct Qdisc *sch, struct hfsc_class *cl)
 
 	tcf_destroy_chain(&cl->filter_list);
 	qdisc_destroy(cl->qdisc);
-	gen_kill_estimator(&cl->bstats, &cl->rate_est);
+	gen_kill_estimator(&cl->bstats, NULL, &cl->rate_est);
 	if (cl != &q->root)
 		kfree(cl);
 }
@@ -1375,7 +1378,7 @@ hfsc_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	xstats.rtwork  = cl->cl_cumul;
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
-	    gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+	    gnet_stats_copy_rate_est(d, &cl->bstats, NULL, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, &cl->qstats) < 0)
 		return -1;
 
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 6d16b9b..8067a82 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1145,7 +1145,7 @@ htb_dump_class_stats(struct Qdisc *sch, unsigned long arg, struct gnet_dump *d)
 	cl->xstats.ctokens = PSCHED_NS2TICKS(cl->ctokens);
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
-	    gnet_stats_copy_rate_est(d, NULL, &cl->rate_est) < 0 ||
+	    gnet_stats_copy_rate_est(d, NULL, NULL, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, &cl->qstats) < 0)
 		return -1;
 
@@ -1235,7 +1235,7 @@ static void htb_destroy_class(struct Qdisc *sch, struct htb_class *cl)
 		WARN_ON(!cl->un.leaf.q);
 		qdisc_destroy(cl->un.leaf.q);
 	}
-	gen_kill_estimator(&cl->bstats, &cl->rate_est);
+	gen_kill_estimator(&cl->bstats, NULL, &cl->rate_est);
 	tcf_destroy_chain(&cl->filter_list);
 	kfree(cl);
 }
@@ -1402,7 +1402,8 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 			goto failure;
 
 		if (htb_rate_est || tca[TCA_RATE]) {
-			err = gen_new_estimator(&cl->bstats, &cl->rate_est,
+			err = gen_new_estimator(&cl->bstats, NULL,
+						&cl->rate_est,
 						qdisc_root_sleeping_lock(sch),
 						tca[TCA_RATE] ? : &est.nla);
 			if (err) {
@@ -1464,8 +1465,11 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 			parent->children++;
 	} else {
 		if (tca[TCA_RATE]) {
-			err = gen_replace_estimator(&cl->bstats, &cl->rate_est,
-						    qdisc_root_sleeping_lock(sch),
+			spinlock_t *lock = qdisc_root_sleeping_lock(sch);
+
+			err = gen_replace_estimator(&cl->bstats, NULL,
+						    &cl->rate_est,
+						    lock,
 						    tca[TCA_RATE]);
 			if (err)
 				return err;
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index b351125..25302be 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -65,7 +65,7 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	result = tc_classify(skb, fl, &res);
 
-	qdisc_bstats_update(sch, skb);
+	qdisc_bstats_update_cpu(sch, skb);
 	switch (result) {
 	case TC_ACT_SHOT:
 		result = TC_ACT_SHOT;
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index a8b2864..e96a41f 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -98,20 +98,22 @@ static void mq_attach(struct Qdisc *sch)
 
 static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct gnet_stats_basic_packed *bstats = &sch->bstats_qdisc.bstats;
 	struct net_device *dev = qdisc_dev(sch);
 	struct Qdisc *qdisc;
 	unsigned int ntx;
 
 	sch->q.qlen = 0;
-	memset(&sch->bstats, 0, sizeof(sch->bstats));
+	memset(bstats, 0, sizeof(sch->bstats_qdisc));
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
 	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
 		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
 		spin_lock_bh(qdisc_lock(qdisc));
 		sch->q.qlen		+= qdisc->q.qlen;
-		sch->bstats.bytes	+= qdisc->bstats.bytes;
-		sch->bstats.packets	+= qdisc->bstats.packets;
+
+		bstats->bytes		+= qdisc->bstats_qdisc.bstats.bytes;
+		bstats->packets		+= qdisc->bstats_qdisc.bstats.packets;
 		sch->qstats.qlen	+= qdisc->qstats.qlen;
 		sch->qstats.backlog	+= qdisc->qstats.backlog;
 		sch->qstats.drops	+= qdisc->qstats.drops;
@@ -201,7 +203,7 @@ static int mq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 	sch = dev_queue->qdisc_sleeping;
 	sch->qstats.qlen = sch->q.qlen;
-	if (gnet_stats_copy_basic(d, &sch->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, &sch->bstats_qdisc.bstats) < 0 ||
 	    gnet_stats_copy_queue(d, &sch->qstats) < 0)
 		return -1;
 	return 0;
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 37e7d25..6e3e4e9 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -219,6 +219,7 @@ static int mqprio_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
 
 static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct gnet_stats_basic_packed *bstats = &sch->bstats_qdisc.bstats;
 	struct net_device *dev = qdisc_dev(sch);
 	struct mqprio_sched *priv = qdisc_priv(sch);
 	unsigned char *b = skb_tail_pointer(skb);
@@ -227,15 +228,15 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	unsigned int i;
 
 	sch->q.qlen = 0;
-	memset(&sch->bstats, 0, sizeof(sch->bstats));
+	memset(bstats, 0, sizeof(sch->bstats_qdisc.bstats));
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc);
 		spin_lock_bh(qdisc_lock(qdisc));
 		sch->q.qlen		+= qdisc->q.qlen;
-		sch->bstats.bytes	+= qdisc->bstats.bytes;
-		sch->bstats.packets	+= qdisc->bstats.packets;
+		bstats->bytes		+= qdisc->bstats_qdisc.bstats.bytes;
+		bstats->packets		+= qdisc->bstats_qdisc.bstats.packets;
 		sch->qstats.qlen	+= qdisc->qstats.qlen;
 		sch->qstats.backlog	+= qdisc->qstats.backlog;
 		sch->qstats.drops	+= qdisc->qstats.drops;
@@ -344,8 +345,8 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 			qdisc = rtnl_dereference(q->qdisc);
 			spin_lock_bh(qdisc_lock(qdisc));
-			bstats.bytes      += qdisc->bstats.bytes;
-			bstats.packets    += qdisc->bstats.packets;
+			bstats.bytes      += qdisc->bstats_qdisc.bstats.bytes;
+			bstats.packets    += qdisc->bstats_qdisc.bstats.packets;
 			qstats.qlen       += qdisc->qstats.qlen;
 			qstats.backlog    += qdisc->qstats.backlog;
 			qstats.drops      += qdisc->qstats.drops;
@@ -363,7 +364,7 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 		sch = dev_queue->qdisc_sleeping;
 		sch->qstats.qlen = sch->q.qlen;
-		if (gnet_stats_copy_basic(d, &sch->bstats) < 0 ||
+		if (gnet_stats_copy_basic(d, &sch->bstats_qdisc.bstats) < 0 ||
 		    gnet_stats_copy_queue(d, &sch->qstats) < 0)
 			return -1;
 	}
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index c0466c1..d6430102 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -361,7 +361,7 @@ static int multiq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 	cl_q = q->queues[cl - 1];
 	cl_q->qstats.qlen = cl_q->q.qlen;
-	if (gnet_stats_copy_basic(d, &cl_q->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, &cl_q->bstats_qdisc.bstats) < 0 ||
 	    gnet_stats_copy_queue(d, &cl_q->qstats) < 0)
 		return -1;
 
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 03ef99e..9069aba 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -325,7 +325,7 @@ static int prio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 	cl_q = q->queues[cl - 1];
 	cl_q->qstats.qlen = cl_q->q.qlen;
-	if (gnet_stats_copy_basic(d, &cl_q->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, &cl_q->bstats_qdisc.bstats) < 0 ||
 	    gnet_stats_copy_queue(d, &cl_q->qstats) < 0)
 		return -1;
 
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 602ea01..52a602d 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -459,7 +459,8 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 
 	if (cl != NULL) { /* modify existing class */
 		if (tca[TCA_RATE]) {
-			err = gen_replace_estimator(&cl->bstats, &cl->rate_est,
+			err = gen_replace_estimator(&cl->bstats, NULL,
+						    &cl->rate_est,
 						    qdisc_root_sleeping_lock(sch),
 						    tca[TCA_RATE]);
 			if (err)
@@ -484,7 +485,8 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		cl->qdisc = &noop_qdisc;
 
 	if (tca[TCA_RATE]) {
-		err = gen_new_estimator(&cl->bstats, &cl->rate_est,
+		err = gen_new_estimator(&cl->bstats, NULL,
+					&cl->rate_est,
 					qdisc_root_sleeping_lock(sch),
 					tca[TCA_RATE]);
 		if (err)
@@ -505,7 +507,7 @@ set_change_agg:
 		new_agg = kzalloc(sizeof(*new_agg), GFP_KERNEL);
 		if (new_agg == NULL) {
 			err = -ENOBUFS;
-			gen_kill_estimator(&cl->bstats, &cl->rate_est);
+			gen_kill_estimator(&cl->bstats, NULL, &cl->rate_est);
 			goto destroy_class;
 		}
 		sch_tree_lock(sch);
@@ -530,7 +532,7 @@ static void qfq_destroy_class(struct Qdisc *sch, struct qfq_class *cl)
 	struct qfq_sched *q = qdisc_priv(sch);
 
 	qfq_rm_from_agg(q, cl);
-	gen_kill_estimator(&cl->bstats, &cl->rate_est);
+	gen_kill_estimator(&cl->bstats, NULL, &cl->rate_est);
 	qdisc_destroy(cl->qdisc);
 	kfree(cl);
 }
@@ -668,7 +670,7 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	xstats.lmax = cl->agg->lmax;
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
-	    gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+	    gnet_stats_copy_rate_est(d, &cl->bstats, NULL, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, &cl->qdisc->qstats) < 0)
 		return -1;
 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [net-next PATCH v4 15/16] net: sched: make qstats per cpu
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
                   ` (13 preceding siblings ...)
  2014-09-10 15:52 ` [net-next PATCH v4 14/16] net: sched: make bstats per cpu and estimator RCU safe John Fastabend
@ 2014-09-10 15:52 ` John Fastabend
  2014-09-10 15:53 ` [net-next PATCH v4 16/16] net: sched: drop ingress qdisc lock John Fastabend
  15 siblings, 0 replies; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:52 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

Now that qdiscs can run without the qdisc lock, the qstats need to
handle the case where multiple CPUs are receiving or transmitting
skbs at the same time. For now the only qdisc that supports running
without locks is the ingress qdisc, which only increments the 32-bit
drop counter.

When the stats are dumped, the per-CPU counters are summed and the
total is returned in the dump TLV.
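
As a rough sketch (condensed from the hunks below, not a change on
top of them), the lockless fast path bumps its own CPU's counters and
the dump path folds every CPU back into a single gnet_stats_queue:

	/* fast path, e.g. ingress drop accounting (no qdisc lock held) */
	this_cpu_ptr(sch->qstats_qdisc.cpu_qstats)->drops++;

	/* dump path, serialized by RTNL: fold all CPUs into 'qstats' */
	for_each_possible_cpu(i) {
		const struct gnet_stats_queue *qcpu =
			per_cpu_ptr(sch->qstats_qdisc.cpu_qstats, i);

		qstats->qlen       += qcpu->qlen;
		qstats->backlog    += qcpu->backlog;
		qstats->drops      += qcpu->drops;
		qstats->requeues   += qcpu->requeues;
		qstats->overlimits += qcpu->overlimits;
	}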

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/codel.h       |    4 ++--
 include/net/gen_stats.h   |    2 ++
 include/net/sch_generic.h |   19 +++++++++++--------
 net/core/gen_stats.c      |   30 +++++++++++++++++++++++++++++-
 net/sched/sch_api.c       |   31 ++++++++++++++++++++++++++-----
 net/sched/sch_atm.c       |    2 +-
 net/sched/sch_cbq.c       |   10 +++++-----
 net/sched/sch_choke.c     |   15 ++++++++-------
 net/sched/sch_codel.c     |    2 +-
 net/sched/sch_drr.c       |    8 ++++----
 net/sched/sch_dsmark.c    |    2 +-
 net/sched/sch_fifo.c      |    6 ++++--
 net/sched/sch_fq.c        |    4 ++--
 net/sched/sch_fq_codel.c  |    8 ++++----
 net/sched/sch_generic.c   |    8 +++++---
 net/sched/sch_gred.c      |   10 +++++-----
 net/sched/sch_hfsc.c      |   15 ++++++++-------
 net/sched/sch_hhf.c       |    8 ++++----
 net/sched/sch_htb.c       |    6 +++---
 net/sched/sch_ingress.c   |    4 +++-
 net/sched/sch_mq.c        |   23 ++++++++++++++---------
 net/sched/sch_mqprio.c    |   35 ++++++++++++++++++++++-------------
 net/sched/sch_multiq.c    |    8 ++++----
 net/sched/sch_netem.c     |   17 +++++++++--------
 net/sched/sch_pie.c       |    8 ++++----
 net/sched/sch_plug.c      |    2 +-
 net/sched/sch_prio.c      |    8 ++++----
 net/sched/sch_qfq.c       |    8 ++++----
 net/sched/sch_red.c       |   13 +++++++------
 net/sched/sch_sfb.c       |   13 +++++++------
 net/sched/sch_sfq.c       |   19 ++++++++++---------
 net/sched/sch_tbf.c       |   11 ++++++-----
 32 files changed, 220 insertions(+), 139 deletions(-)
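
One non-obvious bit, pulled out of the sch_api.c hunk below for
reviewers: qlen is a property of the skb list (sch->q.qlen), not of
the per-CPU gnet_stats_queue counters, so the dump path stashes it in
CPU 0's slot just before the generic per-CPU summation runs:

	stats = per_cpu_ptr(q->qstats_qdisc.cpu_qstats, 0);
	stats->qlen = qdisc_qlen(q);
	err = gnet_stats_copy_queue_cpu(&d, q->qstats_qdisc.cpu_qstats);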

diff --git a/include/net/codel.h b/include/net/codel.h
index aeee280..72d37a4 100644
--- a/include/net/codel.h
+++ b/include/net/codel.h
@@ -228,13 +228,13 @@ static bool codel_should_drop(const struct sk_buff *skb,
 	}
 
 	vars->ldelay = now - codel_get_enqueue_time(skb);
-	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 
 	if (unlikely(qdisc_pkt_len(skb) > stats->maxpacket))
 		stats->maxpacket = qdisc_pkt_len(skb);
 
 	if (codel_time_before(vars->ldelay, params->target) ||
-	    sch->qstats.backlog <= stats->maxpacket) {
+	    sch->qstats_qdisc.qstats.backlog <= stats->maxpacket) {
 		/* went below - stay below for at least interval */
 		vars->first_above_time = 0;
 		return false;
diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 4b7ca2b..d548dc9 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -44,6 +44,8 @@ int gnet_stats_copy_rate_est(struct gnet_dump *d,
 			     const struct gnet_stats_basic_cpu __percpu *cpu_b,
 			     struct gnet_stats_rate_est64 *r);
 int gnet_stats_copy_queue(struct gnet_dump *d, struct gnet_stats_queue *q);
+int gnet_stats_copy_queue_cpu(struct gnet_dump *d,
+			      struct gnet_stats_queue __percpu *q);
 int gnet_stats_copy_app(struct gnet_dump *d, void *st, int len);
 
 int gnet_stats_finish_copy(struct gnet_dump *d);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index e1ed293..c5b20cb 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -92,7 +92,10 @@ struct Qdisc {
 		struct gnet_stats_basic_cpu __percpu *cpu_bstats;
 	} bstats_qdisc;
 	unsigned int		__state;
-	struct gnet_stats_queue	qstats;
+	union {
+		struct gnet_stats_queue qstats;
+		struct gnet_stats_queue __percpu *cpu_qstats;
+	} qstats_qdisc;
 	struct rcu_head		rcu_head;
 	int			padded;
 	atomic_t		refcnt;
@@ -542,7 +545,7 @@ static inline int __qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch,
 				       struct sk_buff_head *list)
 {
 	__skb_queue_tail(list, skb);
-	sch->qstats.backlog += qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog += qdisc_pkt_len(skb);
 
 	return NET_XMIT_SUCCESS;
 }
@@ -558,7 +561,7 @@ static inline struct sk_buff *__qdisc_dequeue_head(struct Qdisc *sch,
 	struct sk_buff *skb = __skb_dequeue(list);
 
 	if (likely(skb != NULL)) {
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 		qdisc_bstats_update(sch, skb);
 	}
 
@@ -577,7 +580,7 @@ static inline unsigned int __qdisc_queue_drop_head(struct Qdisc *sch,
 
 	if (likely(skb != NULL)) {
 		unsigned int len = qdisc_pkt_len(skb);
-		sch->qstats.backlog -= len;
+		sch->qstats_qdisc.qstats.backlog -= len;
 		kfree_skb(skb);
 		return len;
 	}
@@ -596,7 +599,7 @@ static inline struct sk_buff *__qdisc_dequeue_tail(struct Qdisc *sch,
 	struct sk_buff *skb = __skb_dequeue_tail(list);
 
 	if (likely(skb != NULL))
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 
 	return skb;
 }
@@ -653,7 +656,7 @@ static inline void __qdisc_reset_queue(struct Qdisc *sch,
 static inline void qdisc_reset_queue(struct Qdisc *sch)
 {
 	__qdisc_reset_queue(sch, &sch->q);
-	sch->qstats.backlog = 0;
+	sch->qstats_qdisc.qstats.backlog = 0;
 }
 
 static inline unsigned int __qdisc_queue_drop(struct Qdisc *sch,
@@ -678,14 +681,14 @@ static inline unsigned int qdisc_queue_drop(struct Qdisc *sch)
 static inline int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch)
 {
 	kfree_skb(skb);
-	sch->qstats.drops++;
+	sch->qstats_qdisc.qstats.drops++;
 
 	return NET_XMIT_DROP;
 }
 
 static inline int qdisc_reshape_fail(struct sk_buff *skb, struct Qdisc *sch)
 {
-	sch->qstats.drops++;
+	sch->qstats_qdisc.qstats.drops++;
 
 #ifdef CONFIG_NET_CLS_ACT
 	if (sch->reshape_fail == NULL || sch->reshape_fail(skb, sch))
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index e43b55f..1d11e10 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -22,7 +22,7 @@
 #include <linux/gen_stats.h>
 #include <net/netlink.h>
 #include <net/gen_stats.h>
-
+#include <uapi/linux/gen_stats.h>
 
 static inline int
 gnet_stats_copy(struct gnet_dump *d, int type, void *buf, int size)
@@ -245,6 +245,34 @@ gnet_stats_copy_queue(struct gnet_dump *d, struct gnet_stats_queue *q)
 }
 EXPORT_SYMBOL(gnet_stats_copy_queue);
 
+static void
+__gnet_stats_copy_queue_cpu(struct gnet_stats_queue *qstats,
+			    struct gnet_stats_queue __percpu *q)
+{
+	int i;
+
+	for_each_possible_cpu(i) {
+		struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);
+
+		qstats->qlen += qcpu->qlen;
+		qstats->backlog += qcpu->backlog;
+		qstats->drops += qcpu->drops;
+		qstats->requeues += qcpu->requeues;
+		qstats->overlimits += qcpu->overlimits;
+	}
+}
+
+int
+gnet_stats_copy_queue_cpu(struct gnet_dump *d,
+			  struct gnet_stats_queue __percpu *q)
+{
+	struct gnet_stats_queue qstats = {0};
+
+	__gnet_stats_copy_queue_cpu(&qstats, q);
+	return gnet_stats_copy_queue(d, &qstats);
+}
+EXPORT_SYMBOL(gnet_stats_copy_queue_cpu);
+
 /**
  * gnet_stats_copy_app - copy application specific statistics into statistics TLV
  * @d: dumping handle
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index beb2064..db5626d 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -763,7 +763,7 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
 			cops->put(sch, cl);
 		}
 		sch->q.qlen -= n;
-		sch->qstats.drops += drops;
+		sch->qstats_qdisc.qstats.drops += drops;
 	}
 }
 EXPORT_SYMBOL(qdisc_tree_decrease_qlen);
@@ -947,6 +947,11 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 				alloc_percpu(struct gnet_stats_basic_cpu);
 			if (!sch->bstats_qdisc.cpu_bstats)
 				goto err_out4;
+
+			sch->qstats_qdisc.cpu_qstats =
+				alloc_percpu(struct gnet_stats_queue);
+			if (!sch->qstats_qdisc.cpu_qstats)
+				goto err_out4;
 		}
 
 		if (tca[TCA_STAB]) {
@@ -1341,7 +1346,6 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 		goto nla_put_failure;
 	if (q->ops->dump && q->ops->dump(q, skb) < 0)
 		goto nla_put_failure;
-	q->qstats.qlen = q->q.qlen;
 
 	stab = rtnl_dereference(q->stab);
 	if (stab && qdisc_dump_stab(skb, stab) < 0)
@@ -1355,24 +1359,41 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 		goto nla_put_failure;
 
 	if (qdisc_is_lockless(q)) {
+		struct gnet_stats_queue *stats;
+
 		err = gnet_stats_copy_basic_cpu(&d, q->bstats_qdisc.cpu_bstats);
 		if (err < 0)
 			goto nla_put_failure;
 		err = gnet_stats_copy_rate_est(&d, NULL,
 					       q->bstats_qdisc.cpu_bstats,
 					       &q->rate_est);
+
+		if (err < 0)
+			goto nla_put_failure;
+
+		/* Qlen is a property of the skb queue list, not of struct
+		 * gnet_stats_queue, so pack it into the first per-cpu slot.
+		 */
+		stats = per_cpu_ptr(q->qstats_qdisc.cpu_qstats, 0);
+		stats->qlen = qdisc_qlen(q);
+		err = gnet_stats_copy_queue_cpu(&d, q->qstats_qdisc.cpu_qstats);
 	} else {
 		err = gnet_stats_copy_basic(&d, &q->bstats_qdisc.bstats);
 		if (err < 0)
 			goto nla_put_failure;
+
+		q->qstats_qdisc.qstats.qlen = q->q.qlen;
 		err = gnet_stats_copy_rate_est(&d,
 					       &q->bstats_qdisc.bstats, NULL,
 					       &q->rate_est);
+		if (err < 0)
+			goto nla_put_failure;
+
+		err = gnet_stats_copy_queue(&d, &q->qstats_qdisc.qstats);
 	}
 
-	if (err < 0 ||
-	    gnet_stats_copy_queue(&d, &q->qstats) < 0)
-			goto nla_put_failure;
+	if (err < 0)
+		goto nla_put_failure;
 
 	if (gnet_stats_finish_copy(&d) < 0)
 		goto nla_put_failure;
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index c398f9c..f704006 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -417,7 +417,7 @@ done:
 	if (ret != NET_XMIT_SUCCESS) {
 drop: __maybe_unused
 		if (net_xmit_drop_count(ret)) {
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 			if (flow)
 				flow->qstats.drops++;
 		}
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 208074d..7ac9833 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -377,7 +377,7 @@ cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 #endif
 	if (cl == NULL) {
 		if (ret & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return ret;
 	}
@@ -395,7 +395,7 @@ cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	}
 
 	if (net_xmit_drop_count(ret)) {
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 		cbq_mark_toplevel(q, cl);
 		cl->qstats.drops++;
 	}
@@ -650,11 +650,11 @@ static int cbq_reshape_fail(struct sk_buff *skb, struct Qdisc *child)
 			return 0;
 		}
 		if (net_xmit_drop_count(ret))
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		return 0;
 	}
 
-	sch->qstats.drops++;
+	sch->qstats_qdisc.qstats.drops++;
 	return -1;
 }
 #endif
@@ -995,7 +995,7 @@ cbq_dequeue(struct Qdisc *sch)
 	 */
 
 	if (sch->q.qlen) {
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 		if (q->wd_expires)
 			qdisc_watchdog_schedule(&q->watchdog,
 						now + q->wd_expires);
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 4b52b70..4c18d20 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -127,7 +127,7 @@ static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx)
 	if (idx == q->tail)
 		choke_zap_tail_holes(q);
 
-	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 	qdisc_drop(skb, sch);
 	qdisc_tree_decrease_qlen(sch, 1);
 	--sch->q.qlen;
@@ -296,7 +296,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		if (q->vars.qavg > p->qth_max) {
 			q->vars.qcount = -1;
 
-			sch->qstats.overlimits++;
+			sch->qstats_qdisc.qstats.overlimits++;
 			if (use_harddrop(q) || !use_ecn(q) ||
 			    !INET_ECN_set_ce(skb)) {
 				q->stats.forced_drop++;
@@ -309,7 +309,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 				q->vars.qcount = 0;
 				q->vars.qR = red_random(p);
 
-				sch->qstats.overlimits++;
+				sch->qstats_qdisc.qstats.overlimits++;
 				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
 					q->stats.prob_drop++;
 					goto congestion_drop;
@@ -326,7 +326,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		q->tab[q->tail] = skb;
 		q->tail = (q->tail + 1) & q->tab_mask;
 		++sch->q.qlen;
-		sch->qstats.backlog += qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.backlog += qdisc_pkt_len(skb);
 		return NET_XMIT_SUCCESS;
 	}
 
@@ -339,7 +339,7 @@ congestion_drop:
 
 other_drop:
 	if (ret & __NET_XMIT_BYPASS)
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 	kfree_skb(skb);
 	return ret;
 }
@@ -359,7 +359,7 @@ static struct sk_buff *choke_dequeue(struct Qdisc *sch)
 	q->tab[q->head] = NULL;
 	choke_zap_head_holes(q);
 	--sch->q.qlen;
-	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 	qdisc_bstats_update(sch, skb);
 
 	return skb;
@@ -402,6 +402,7 @@ static void choke_free(void *addr)
 
 static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct choke_sched_data *q = qdisc_priv(sch);
 	struct nlattr *tb[TCA_CHOKE_MAX + 1];
 	const struct tc_red_qopt *ctl;
@@ -454,7 +455,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 					ntab[tail++] = skb;
 					continue;
 				}
-				sch->qstats.backlog -= qdisc_pkt_len(skb);
+				qstats->backlog -= qdisc_pkt_len(skb);
 				--sch->q.qlen;
 				qdisc_drop(skb, sch);
 			}
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index 2f9ab17..47a387e 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -149,7 +149,7 @@ static int codel_change(struct Qdisc *sch, struct nlattr *opt)
 	while (sch->q.qlen > sch->limit) {
 		struct sk_buff *skb = __skb_dequeue(&sch->q);
 
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 		qdisc_drop(skb, sch);
 	}
 	qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen);
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 88a65f6..4570597 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -280,12 +280,12 @@ static int drr_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	memset(&xstats, 0, sizeof(xstats));
 	if (cl->qdisc->q.qlen) {
 		xstats.deficit = cl->deficit;
-		cl->qdisc->qstats.qlen = cl->qdisc->q.qlen;
+		cl->qdisc->qstats_qdisc.qstats.qlen = cl->qdisc->q.qlen;
 	}
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->bstats, NULL, &cl->rate_est) < 0 ||
-	    gnet_stats_copy_queue(d, &cl->qdisc->qstats) < 0)
+	    gnet_stats_copy_queue(d, &cl->qdisc->qstats_qdisc.qstats) < 0)
 		return -1;
 
 	return gnet_stats_copy_app(d, &xstats, sizeof(xstats));
@@ -360,7 +360,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	cl = drr_classify(skb, sch, &err);
 	if (cl == NULL) {
 		if (err & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return err;
 	}
@@ -369,7 +369,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	if (unlikely(err != NET_XMIT_SUCCESS)) {
 		if (net_xmit_drop_count(err)) {
 			cl->qstats.drops++;
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		}
 		return err;
 	}
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 485e456..6e6c159 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -258,7 +258,7 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	err = qdisc_enqueue(skb, p->q);
 	if (err != NET_XMIT_SUCCESS) {
 		if (net_xmit_drop_count(err))
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		return err;
 	}
 
diff --git a/net/sched/sch_fifo.c b/net/sched/sch_fifo.c
index e15a9eb..264ae79 100644
--- a/net/sched/sch_fifo.c
+++ b/net/sched/sch_fifo.c
@@ -21,7 +21,9 @@
 
 static int bfifo_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
-	if (likely(sch->qstats.backlog + qdisc_pkt_len(skb) <= sch->limit))
+	u32 backlog = sch->qstats_qdisc.qstats.backlog + qdisc_pkt_len(skb);
+
+	if (likely(backlog <= sch->limit))
 		return qdisc_enqueue_tail(skb, sch);
 
 	return qdisc_reshape_fail(skb, sch);
@@ -42,7 +44,7 @@ static int pfifo_tail_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	/* queue full, remove one skb to fulfill the limit */
 	__qdisc_queue_drop_head(sch, &sch->q);
-	sch->qstats.drops++;
+	sch->qstats_qdisc.qstats.drops++;
 	qdisc_enqueue_tail(skb, sch);
 
 	return NET_XMIT_CN;
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index e12f997..d1fd16d 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -290,7 +290,7 @@ static struct sk_buff *fq_dequeue_head(struct Qdisc *sch, struct fq_flow *flow)
 		flow->head = skb->next;
 		skb->next = NULL;
 		flow->qlen--;
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 		sch->q.qlen--;
 	}
 	return skb;
@@ -371,7 +371,7 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	f->qlen++;
 	if (skb_is_retransmit(skb))
 		q->stat_tcp_retrans++;
-	sch->qstats.backlog += qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog += qdisc_pkt_len(skb);
 	if (fq_flow_is_detached(f)) {
 		fq_flow_add_tail(&q->new_flows, f);
 		if (time_after(jiffies, f->age + q->flow_refill_delay))
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 105cf55..6d3395d 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -164,8 +164,8 @@ static unsigned int fq_codel_drop(struct Qdisc *sch)
 	q->backlogs[idx] -= len;
 	kfree_skb(skb);
 	sch->q.qlen--;
-	sch->qstats.drops++;
-	sch->qstats.backlog -= len;
+	sch->qstats_qdisc.qstats.drops++;
+	sch->qstats_qdisc.qstats.backlog -= len;
 	flow->dropped++;
 	return idx;
 }
@@ -180,7 +180,7 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	idx = fq_codel_classify(skb, sch, &ret);
 	if (idx == 0) {
 		if (ret & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return ret;
 	}
@@ -190,7 +190,7 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	flow = &q->flows[idx];
 	flow_queue_add(flow, skb);
 	q->backlogs[idx] += qdisc_pkt_len(skb);
-	sch->qstats.backlog += qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog += qdisc_pkt_len(skb);
 
 	if (list_empty(&flow->flowchain)) {
 		list_add_tail(&flow->flowchain, &q->new_flows);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e3d203e..39527a9 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -49,7 +49,7 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 {
 	skb_dst_force(skb);
 	q->gso_skb = skb;
-	q->qstats.requeues++;
+	q->qstats_qdisc.qstats.requeues++;
 	q->q.qlen++;	/* it's still part of the queue */
 	__netif_schedule(q);
 
@@ -500,7 +500,7 @@ static void pfifo_fast_reset(struct Qdisc *qdisc)
 		__qdisc_reset_queue(qdisc, band2list(priv, prio));
 
 	priv->bitmap = 0;
-	qdisc->qstats.backlog = 0;
+	qdisc->qstats_qdisc.qstats.backlog = 0;
 	qdisc->q.qlen = 0;
 }
 
@@ -632,8 +632,10 @@ static void qdisc_rcu_free(struct rcu_head *head)
 {
 	struct Qdisc *qdisc = container_of(head, struct Qdisc, rcu_head);
 
-	if (qdisc_is_lockless(qdisc))
+	if (qdisc_is_lockless(qdisc)) {
 		free_percpu(qdisc->bstats_qdisc.cpu_bstats);
+		free_percpu(qdisc->qstats_qdisc.cpu_qstats);
+	}
 
 	kfree((char *) qdisc - qdisc->padded);
 }
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 12cbc09..198c44c 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -115,7 +115,7 @@ static inline unsigned int gred_backlog(struct gred_sched *table,
 					struct Qdisc *sch)
 {
 	if (gred_wred_mode(table))
-		return sch->qstats.backlog;
+		return sch->qstats_qdisc.qstats.backlog;
 	else
 		return q->backlog;
 }
@@ -209,7 +209,7 @@ static int gred_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		break;
 
 	case RED_PROB_MARK:
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 		if (!gred_use_ecn(t) || !INET_ECN_set_ce(skb)) {
 			q->stats.prob_drop++;
 			goto congestion_drop;
@@ -219,7 +219,7 @@ static int gred_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		break;
 
 	case RED_HARD_MARK:
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 		if (gred_use_harddrop(t) || !gred_use_ecn(t) ||
 		    !INET_ECN_set_ce(skb)) {
 			q->stats.forced_drop++;
@@ -261,7 +261,7 @@ static struct sk_buff *gred_dequeue(struct Qdisc *sch)
 			q->backlog -= qdisc_pkt_len(skb);
 
 			if (gred_wred_mode(t)) {
-				if (!sch->qstats.backlog)
+				if (!sch->qstats_qdisc.qstats.backlog)
 					red_start_of_idle_period(&t->wred_set);
 			} else {
 				if (!q->backlog)
@@ -294,7 +294,7 @@ static unsigned int gred_drop(struct Qdisc *sch)
 			q->stats.other++;
 
 			if (gred_wred_mode(t)) {
-				if (!sch->qstats.backlog)
+				if (!sch->qstats_qdisc.qstats.backlog)
 					red_start_of_idle_period(&t->wred_set);
 			} else {
 				if (!q->backlog)
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 3b112d2..4c08595 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1371,7 +1371,7 @@ hfsc_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	struct tc_hfsc_stats xstats;
 
 	cl->qstats.qlen = cl->qdisc->q.qlen;
-	cl->qstats.backlog = cl->qdisc->qstats.backlog;
+	cl->qstats.backlog = cl->qdisc->qstats_qdisc.qstats.backlog;
 	xstats.level   = cl->level;
 	xstats.period  = cl->cl_vtperiod;
 	xstats.work    = cl->cl_total;
@@ -1560,16 +1560,17 @@ hfsc_destroy_qdisc(struct Qdisc *sch)
 static int
 hfsc_dump_qdisc(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct hfsc_sched *q = qdisc_priv(sch);
 	unsigned char *b = skb_tail_pointer(skb);
 	struct tc_hfsc_qopt qopt;
 	struct hfsc_class *cl;
 	unsigned int i;
 
-	sch->qstats.backlog = 0;
+	qstats->backlog = 0;
 	for (i = 0; i < q->clhash.hashsize; i++) {
 		hlist_for_each_entry(cl, &q->clhash.hash[i], cl_common.hnode)
-			sch->qstats.backlog += cl->qdisc->qstats.backlog;
+			qstats->backlog += cl->qdisc->qstats_qdisc.qstats.backlog;
 	}
 
 	qopt.defcls = q->defcls;
@@ -1591,7 +1592,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	cl = hfsc_classify(skb, sch, &err);
 	if (cl == NULL) {
 		if (err & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return err;
 	}
@@ -1600,7 +1601,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	if (unlikely(err != NET_XMIT_SUCCESS)) {
 		if (net_xmit_drop_count(err)) {
 			cl->qstats.drops++;
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		}
 		return err;
 	}
@@ -1643,7 +1644,7 @@ hfsc_dequeue(struct Qdisc *sch)
 		 */
 		cl = vttree_get_minvt(&q->root, cur_time);
 		if (cl == NULL) {
-			sch->qstats.overlimits++;
+			sch->qstats_qdisc.qstats.overlimits++;
 			hfsc_schedule_watchdog(sch);
 			return NULL;
 		}
@@ -1698,7 +1699,7 @@ hfsc_drop(struct Qdisc *sch)
 				list_move_tail(&cl->dlist, &q->droplist);
 			}
 			cl->qstats.drops++;
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 			sch->q.qlen--;
 			return len;
 		}
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index d85b681..9ae250a 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -376,8 +376,8 @@ static unsigned int hhf_drop(struct Qdisc *sch)
 		struct sk_buff *skb = dequeue_head(bucket);
 
 		sch->q.qlen--;
-		sch->qstats.drops++;
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.drops++;
+		sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 		kfree_skb(skb);
 	}
 
@@ -395,7 +395,7 @@ static int hhf_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	bucket = &q->buckets[idx];
 	bucket_add(bucket, skb);
-	sch->qstats.backlog += qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog += qdisc_pkt_len(skb);
 
 	if (list_empty(&bucket->bucketchain)) {
 		unsigned int weight;
@@ -457,7 +457,7 @@ begin:
 	if (bucket->head) {
 		skb = dequeue_head(bucket);
 		sch->q.qlen--;
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 	}
 
 	if (!skb) {
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 8067a82..1de50e8 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -586,13 +586,13 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 #ifdef CONFIG_NET_CLS_ACT
 	} else if (!cl) {
 		if (ret & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return ret;
 #endif
 	} else if ((ret = qdisc_enqueue(skb, cl->un.leaf.q)) != NET_XMIT_SUCCESS) {
 		if (net_xmit_drop_count(ret)) {
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 			cl->qstats.drops++;
 		}
 		return ret;
@@ -925,7 +925,7 @@ ok:
 				goto ok;
 		}
 	}
-	sch->qstats.overlimits++;
+	sch->qstats_qdisc.qstats.overlimits++;
 	if (likely(next_event > q->now)) {
 		if (!test_bit(__QDISC_STATE_DEACTIVATED,
 			      &qdisc_root_sleeping(q->watchdog.qdisc)->state)) {
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 25302be..88a6289 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -59,6 +59,7 @@ static struct tcf_proto __rcu **ingress_find_tcf(struct Qdisc *sch,
 static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct ingress_qdisc_data *p = qdisc_priv(sch);
+	struct gnet_stats_queue *qstats;
 	struct tcf_result res;
 	struct tcf_proto *fl = rcu_dereference_bh(p->filter_list);
 	int result;
@@ -69,7 +70,8 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	switch (result) {
 	case TC_ACT_SHOT:
 		result = TC_ACT_SHOT;
-		sch->qstats.drops++;
+		qstats = this_cpu_ptr(sch->qstats_qdisc.cpu_qstats);
+		qstats->drops++;
 		break;
 	case TC_ACT_STOLEN:
 	case TC_ACT_QUEUED:
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index e96a41f..402460d 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -98,6 +98,7 @@ static void mq_attach(struct Qdisc *sch)
 
 static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct gnet_stats_basic_packed *bstats = &sch->bstats_qdisc.bstats;
 	struct net_device *dev = qdisc_dev(sch);
 	struct Qdisc *qdisc;
@@ -105,20 +106,24 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 
 	sch->q.qlen = 0;
 	memset(bstats, 0, sizeof(sch->bstats_qdisc));
-	memset(&sch->qstats, 0, sizeof(sch->qstats));
+	memset(qstats, 0, sizeof(sch->qstats_qdisc.qstats));
 
 	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
+		struct gnet_stats_queue *q_qstats;
+
 		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
 		spin_lock_bh(qdisc_lock(qdisc));
-		sch->q.qlen		+= qdisc->q.qlen;
+		q_qstats = &qdisc->qstats_qdisc.qstats;
 
+		sch->q.qlen		+= qdisc->q.qlen;
 		bstats->bytes		+= qdisc->bstats_qdisc.bstats.bytes;
 		bstats->packets		+= qdisc->bstats_qdisc.bstats.packets;
-		sch->qstats.qlen	+= qdisc->qstats.qlen;
-		sch->qstats.backlog	+= qdisc->qstats.backlog;
-		sch->qstats.drops	+= qdisc->qstats.drops;
-		sch->qstats.requeues	+= qdisc->qstats.requeues;
-		sch->qstats.overlimits	+= qdisc->qstats.overlimits;
+		qstats->qlen		+= q_qstats->qlen;
+		qstats->backlog		+= q_qstats->backlog;
+		qstats->drops		+= q_qstats->drops;
+		qstats->requeues	+= q_qstats->requeues;
+		qstats->overlimits	+= q_qstats->overlimits;
+
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
 	return 0;
@@ -202,9 +207,9 @@ static int mq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
 
 	sch = dev_queue->qdisc_sleeping;
-	sch->qstats.qlen = sch->q.qlen;
+	sch->qstats_qdisc.qstats.qlen = sch->q.qlen;
 	if (gnet_stats_copy_basic(d, &sch->bstats_qdisc.bstats) < 0 ||
-	    gnet_stats_copy_queue(d, &sch->qstats) < 0)
+	    gnet_stats_copy_queue(d, &sch->qstats_qdisc.qstats) < 0)
 		return -1;
 	return 0;
 }
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 6e3e4e9..6ec84a7 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -220,6 +220,7 @@ static int mqprio_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
 static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
 	struct gnet_stats_basic_packed *bstats = &sch->bstats_qdisc.bstats;
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct net_device *dev = qdisc_dev(sch);
 	struct mqprio_sched *priv = qdisc_priv(sch);
 	unsigned char *b = skb_tail_pointer(skb);
@@ -229,19 +230,25 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 
 	sch->q.qlen = 0;
 	memset(bstats, 0, sizeof(sch->bstats_qdisc.bstats));
-	memset(&sch->qstats, 0, sizeof(sch->qstats));
+	memset(qstats, 0, sizeof(sch->qstats_qdisc.qstats));
 
 	for (i = 0; i < dev->num_tx_queues; i++) {
+		struct gnet_stats_queue *q_qstats;
+
 		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc);
 		spin_lock_bh(qdisc_lock(qdisc));
 		sch->q.qlen		+= qdisc->q.qlen;
+
 		bstats->bytes		+= qdisc->bstats_qdisc.bstats.bytes;
 		bstats->packets		+= qdisc->bstats_qdisc.bstats.packets;
-		sch->qstats.qlen	+= qdisc->qstats.qlen;
-		sch->qstats.backlog	+= qdisc->qstats.backlog;
-		sch->qstats.drops	+= qdisc->qstats.drops;
-		sch->qstats.requeues	+= qdisc->qstats.requeues;
-		sch->qstats.overlimits	+= qdisc->qstats.overlimits;
+
+		q_qstats = &qdisc->qstats_qdisc.qstats;
+
+		qstats->qlen		+= q_qstats->qlen;
+		qstats->backlog		+= q_qstats->backlog;
+		qstats->drops		+= q_qstats->drops;
+		qstats->requeues	+= q_qstats->requeues;
+		qstats->overlimits	+= q_qstats->overlimits;
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
 
@@ -341,17 +348,19 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 		spin_unlock_bh(d->lock);
 
 		for (i = tc.offset; i < tc.offset + tc.count; i++) {
+			struct gnet_stats_queue *stats;
 			struct netdev_queue *q = netdev_get_tx_queue(dev, i);
 
 			qdisc = rtnl_dereference(q->qdisc);
 			spin_lock_bh(qdisc_lock(qdisc));
+			stats = &qdisc->qstats_qdisc.qstats;
 			bstats.bytes      += qdisc->bstats_qdisc.bstats.bytes;
 			bstats.packets    += qdisc->bstats_qdisc.bstats.packets;
-			qstats.qlen       += qdisc->qstats.qlen;
-			qstats.backlog    += qdisc->qstats.backlog;
-			qstats.drops      += qdisc->qstats.drops;
-			qstats.requeues   += qdisc->qstats.requeues;
-			qstats.overlimits += qdisc->qstats.overlimits;
+			qstats.qlen       += stats->qlen;
+			qstats.backlog    += stats->backlog;
+			qstats.drops      += stats->drops;
+			qstats.requeues   += stats->requeues;
+			qstats.overlimits += stats->overlimits;
 			spin_unlock_bh(qdisc_lock(qdisc));
 		}
 		/* Reclaim root sleeping lock before completing stats */
@@ -363,9 +372,9 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 		struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl);
 
 		sch = dev_queue->qdisc_sleeping;
-		sch->qstats.qlen = sch->q.qlen;
+		sch->qstats_qdisc.qstats.qlen = sch->q.qlen;
 		if (gnet_stats_copy_basic(d, &sch->bstats_qdisc.bstats) < 0 ||
-		    gnet_stats_copy_queue(d, &sch->qstats) < 0)
+		    gnet_stats_copy_queue(d, &sch->qstats_qdisc.qstats) < 0)
 			return -1;
 	}
 	return 0;
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index d6430102..2af2293 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -75,7 +75,7 @@ multiq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	if (qdisc == NULL) {
 
 		if (ret & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return ret;
 	}
@@ -87,7 +87,7 @@ multiq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		return NET_XMIT_SUCCESS;
 	}
 	if (net_xmit_drop_count(ret))
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 	return ret;
 }
 
@@ -360,9 +360,9 @@ static int multiq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct Qdisc *cl_q;
 
 	cl_q = q->queues[cl - 1];
-	cl_q->qstats.qlen = cl_q->q.qlen;
+	cl_q->qstats_qdisc.qstats.qlen = cl_q->q.qlen;
 	if (gnet_stats_copy_basic(d, &cl_q->bstats_qdisc.bstats) < 0 ||
-	    gnet_stats_copy_queue(d, &cl_q->qstats) < 0)
+	    gnet_stats_copy_queue(d, &cl_q->qstats_qdisc.qstats) < 0)
 		return -1;
 
 	return 0;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 111d70f..c7158ce 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -429,12 +429,12 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	/* Drop packet? */
 	if (loss_event(q)) {
 		if (q->ecn && INET_ECN_set_ce(skb))
-			sch->qstats.drops++; /* mark packet */
+			sch->qstats_qdisc.qstats.drops++; /* mark packet */
 		else
 			--count;
 	}
 	if (count == 0) {
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	}
@@ -478,7 +478,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	if (unlikely(skb_queue_len(&sch->q) >= sch->limit))
 		return qdisc_reshape_fail(skb, sch);
 
-	sch->qstats.backlog += qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog += qdisc_pkt_len(skb);
 
 	cb = netem_skb_cb(skb);
 	if (q->gap == 0 ||		/* not doing reordering */
@@ -526,7 +526,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		q->counter = 0;
 
 		__skb_queue_head(&sch->q, skb);
-		sch->qstats.requeues++;
+		sch->qstats_qdisc.qstats.requeues++;
 	}
 
 	return NET_XMIT_SUCCESS;
@@ -550,20 +550,21 @@ static unsigned int netem_drop(struct Qdisc *sch)
 			skb->next = NULL;
 			skb->prev = NULL;
 			len = qdisc_pkt_len(skb);
-			sch->qstats.backlog -= len;
+			sch->qstats_qdisc.qstats.backlog -= len;
 			kfree_skb(skb);
 		}
 	}
 	if (!len && q->qdisc && q->qdisc->ops->drop)
 	    len = q->qdisc->ops->drop(q->qdisc);
 	if (len)
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 
 	return len;
 }
 
 static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct netem_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
 	struct rb_node *p;
@@ -575,7 +576,7 @@ tfifo_dequeue:
 	skb = __skb_dequeue(&sch->q);
 	if (skb) {
 deliver:
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		qstats->backlog -= qdisc_pkt_len(skb);
 		qdisc_unthrottled(sch);
 		qdisc_bstats_update(sch, skb);
 		return skb;
@@ -610,7 +611,7 @@ deliver:
 
 				if (unlikely(err != NET_XMIT_SUCCESS)) {
 					if (net_xmit_drop_count(err)) {
-						sch->qstats.drops++;
+						qstats->drops++;
 						qdisc_tree_decrease_qlen(sch, 1);
 					}
 				}
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index fefeeb7..f454354 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -116,7 +116,7 @@ static bool drop_early(struct Qdisc *sch, u32 packet_size)
 	/* If we have fewer than 2 mtu-sized packets, disable drop_early,
 	 * similar to min_th in RED
 	 */
-	if (sch->qstats.backlog < 2 * mtu)
+	if (sch->qstats_qdisc.qstats.backlog < 2 * mtu)
 		return false;
 
 	/* If bytemode is turned on, use packet size to compute new
@@ -232,7 +232,7 @@ static int pie_change(struct Qdisc *sch, struct nlattr *opt)
 	while (sch->q.qlen > sch->limit) {
 		struct sk_buff *skb = __skb_dequeue(&sch->q);
 
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 		qdisc_drop(skb, sch);
 	}
 	qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen);
@@ -245,7 +245,7 @@ static void pie_process_dequeue(struct Qdisc *sch, struct sk_buff *skb)
 {
 
 	struct pie_sched_data *q = qdisc_priv(sch);
-	int qlen = sch->qstats.backlog;	/* current queue size in bytes */
+	int qlen = sch->qstats_qdisc.qstats.backlog; /* queue size in bytes */
 
 	/* If current queue is about 10 packets or more and dq_count is unset
 	 * we have enough packets to calculate the drain rate. Save
@@ -310,7 +310,7 @@ static void pie_process_dequeue(struct Qdisc *sch, struct sk_buff *skb)
 static void calculate_probability(struct Qdisc *sch)
 {
 	struct pie_sched_data *q = qdisc_priv(sch);
-	u32 qlen = sch->qstats.backlog;	/* queue size in bytes */
+	u32 qlen = sch->qstats_qdisc.qstats.backlog; /* queue size in bytes */
 	psched_time_t qdelay = 0;	/* in pschedtime */
 	psched_time_t qdelay_old = q->vars.qdelay;	/* in pschedtime */
 	s32 delta = 0;		/* determines the change in probability */
diff --git a/net/sched/sch_plug.c b/net/sched/sch_plug.c
index 89f8fcf..df2df95 100644
--- a/net/sched/sch_plug.c
+++ b/net/sched/sch_plug.c
@@ -90,7 +90,7 @@ static int plug_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct plug_sched_data *q = qdisc_priv(sch);
 
-	if (likely(sch->qstats.backlog + skb->len <= q->limit)) {
+	if (likely(sch->qstats_qdisc.qstats.backlog + skb->len <= q->limit)) {
 		if (!q->unplug_indefinite)
 			q->pkts_current_epoch++;
 		return qdisc_enqueue_tail(skb, sch);
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 9069aba..2dd3b8a 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -77,7 +77,7 @@ prio_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	if (qdisc == NULL) {
 
 		if (ret & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return ret;
 	}
@@ -89,7 +89,7 @@ prio_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		return NET_XMIT_SUCCESS;
 	}
 	if (net_xmit_drop_count(ret))
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 	return ret;
 }
 
@@ -324,9 +324,9 @@ static int prio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct Qdisc *cl_q;
 
 	cl_q = q->queues[cl - 1];
-	cl_q->qstats.qlen = cl_q->q.qlen;
+	cl_q->qstats_qdisc.qstats.qlen = cl_q->q.qlen;
 	if (gnet_stats_copy_basic(d, &cl_q->bstats_qdisc.bstats) < 0 ||
-	    gnet_stats_copy_queue(d, &cl_q->qstats) < 0)
+	    gnet_stats_copy_queue(d, &cl_q->qstats_qdisc.qstats) < 0)
 		return -1;
 
 	return 0;
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 52a602d..966647c 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -664,14 +664,14 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	struct tc_qfq_stats xstats;
 
 	memset(&xstats, 0, sizeof(xstats));
-	cl->qdisc->qstats.qlen = cl->qdisc->q.qlen;
+	cl->qdisc->qstats_qdisc.qstats.qlen = cl->qdisc->q.qlen;
 
 	xstats.weight = cl->agg->class_weight;
 	xstats.lmax = cl->agg->lmax;
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->bstats, NULL, &cl->rate_est) < 0 ||
-	    gnet_stats_copy_queue(d, &cl->qdisc->qstats) < 0)
+	    gnet_stats_copy_queue(d, &cl->qdisc->qstats_qdisc.qstats) < 0)
 		return -1;
 
 	return gnet_stats_copy_app(d, &xstats, sizeof(xstats));
@@ -1229,7 +1229,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	cl = qfq_classify(skb, sch, &err);
 	if (cl == NULL) {
 		if (err & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return err;
 	}
@@ -1249,7 +1249,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		pr_debug("qfq_enqueue: enqueue failed %d\n", err);
 		if (net_xmit_drop_count(err)) {
 			cl->qstats.drops++;
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		}
 		return err;
 	}
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 633e32d..01ec26b 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -64,7 +64,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	q->vars.qavg = red_calc_qavg(&q->parms,
 				     &q->vars,
-				     child->qstats.backlog);
+				     child->qstats_qdisc.qstats.backlog);
 
 	if (red_is_idling(&q->vars))
 		red_end_of_idle_period(&q->vars);
@@ -74,7 +74,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		break;
 
 	case RED_PROB_MARK:
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 		if (!red_use_ecn(q) || !INET_ECN_set_ce(skb)) {
 			q->stats.prob_drop++;
 			goto congestion_drop;
@@ -84,7 +84,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		break;
 
 	case RED_HARD_MARK:
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 		if (red_use_harddrop(q) || !red_use_ecn(q) ||
 		    !INET_ECN_set_ce(skb)) {
 			q->stats.forced_drop++;
@@ -100,7 +100,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		sch->q.qlen++;
 	} else if (net_xmit_drop_count(ret)) {
 		q->stats.pdrop++;
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 	}
 	return ret;
 
@@ -142,7 +142,7 @@ static unsigned int red_drop(struct Qdisc *sch)
 
 	if (child->ops->drop && (len = child->ops->drop(child)) > 0) {
 		q->stats.other++;
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 		sch->q.qlen--;
 		return len;
 	}
@@ -256,6 +256,7 @@ static int red_init(struct Qdisc *sch, struct nlattr *opt)
 
 static int red_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct red_sched_data *q = qdisc_priv(sch);
 	struct nlattr *opts = NULL;
 	struct tc_red_qopt opt = {
@@ -268,7 +269,7 @@ static int red_dump(struct Qdisc *sch, struct sk_buff *skb)
 		.Scell_log	= q->parms.Scell_log,
 	};
 
-	sch->qstats.backlog = q->qdisc->qstats.backlog;
+	qstats->backlog = q->qdisc->qstats_qdisc.qstats.backlog;
 	opts = nla_nest_start(skb, TCA_OPTIONS);
 	if (opts == NULL)
 		goto nla_put_failure;
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 1562fb2..3c411e7 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -290,7 +290,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	struct flow_keys keys;
 
 	if (unlikely(sch->q.qlen >= q->limit)) {
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 		q->stats.queuedrop++;
 		goto drop;
 	}
@@ -348,7 +348,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	sfb_skb_cb(skb)->hashes[slot] = 0;
 
 	if (unlikely(minqlen >= q->max)) {
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 		q->stats.bucketdrop++;
 		goto drop;
 	}
@@ -376,7 +376,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 			}
 		}
 		if (sfb_rate_limit(skb, q)) {
-			sch->qstats.overlimits++;
+			sch->qstats_qdisc.qstats.overlimits++;
 			q->stats.penaltydrop++;
 			goto drop;
 		}
@@ -411,7 +411,7 @@ enqueue:
 		increment_qlen(skb, q);
 	} else if (net_xmit_drop_count(ret)) {
 		q->stats.childdrop++;
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 	}
 	return ret;
 
@@ -420,7 +420,7 @@ drop:
 	return NET_XMIT_CN;
 other_drop:
 	if (ret & __NET_XMIT_BYPASS)
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 	kfree_skb(skb);
 	return ret;
 }
@@ -556,6 +556,7 @@ static int sfb_init(struct Qdisc *sch, struct nlattr *opt)
 
 static int sfb_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct sfb_sched_data *q = qdisc_priv(sch);
 	struct nlattr *opts;
 	struct tc_sfb_qopt opt = {
@@ -570,7 +571,7 @@ static int sfb_dump(struct Qdisc *sch, struct sk_buff *skb)
 		.penalty_burst = q->penalty_burst,
 	};
 
-	sch->qstats.backlog = q->qdisc->qstats.backlog;
+	qstats->backlog = q->qdisc->qstats_qdisc.qstats.backlog;
 	opts = nla_nest_start(skb, TCA_OPTIONS);
 	if (opts == NULL)
 		goto nla_put_failure;
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 0bededd..4f57a50 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -336,8 +336,8 @@ drop:
 		sfq_dec(q, x);
 		kfree_skb(skb);
 		sch->q.qlen--;
-		sch->qstats.drops++;
-		sch->qstats.backlog -= len;
+		sch->qstats_qdisc.qstats.drops++;
+		sch->qstats_qdisc.qstats.backlog -= len;
 		return len;
 	}
 
@@ -384,7 +384,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	hash = sfq_classify(skb, sch, &ret);
 	if (hash == 0) {
 		if (ret & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return ret;
 	}
@@ -414,7 +414,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 			break;
 
 		case RED_PROB_MARK:
-			sch->qstats.overlimits++;
+			sch->qstats_qdisc.qstats.overlimits++;
 			if (sfq_prob_mark(q)) {
 				/* We know we have at least one packet in queue */
 				if (sfq_headdrop(q) &&
@@ -431,7 +431,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 			goto congestion_drop;
 
 		case RED_HARD_MARK:
-			sch->qstats.overlimits++;
+			sch->qstats_qdisc.qstats.overlimits++;
 			if (sfq_hard_mark(q)) {
 				/* We know we have at least one packet in queue */
 				if (sfq_headdrop(q) &&
@@ -457,7 +457,7 @@ congestion_drop:
 		/* We know we have at least one packet in queue */
 		head = slot_dequeue_head(slot);
 		delta = qdisc_pkt_len(head) - qdisc_pkt_len(skb);
-		sch->qstats.backlog -= delta;
+		sch->qstats_qdisc.qstats.backlog -= delta;
 		slot->backlog -= delta;
 		qdisc_drop(head, sch);
 
@@ -466,7 +466,7 @@ congestion_drop:
 	}
 
 enqueue:
-	sch->qstats.backlog += qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog += qdisc_pkt_len(skb);
 	slot->backlog += qdisc_pkt_len(skb);
 	slot_queue_add(slot, skb);
 	sfq_inc(q, x);
@@ -525,7 +525,7 @@ next_slot:
 	sfq_dec(q, a);
 	qdisc_bstats_update(sch, skb);
 	sch->q.qlen--;
-	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 	slot->backlog -= qdisc_pkt_len(skb);
 	/* Is the slot empty? */
 	if (slot->qlen == 0) {
@@ -559,6 +559,7 @@ sfq_reset(struct Qdisc *sch)
  */
 static void sfq_rehash(struct Qdisc *sch)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
 	int i;
@@ -591,7 +592,7 @@ static void sfq_rehash(struct Qdisc *sch)
 		if (x == SFQ_EMPTY_SLOT) {
 			x = q->dep[0].next; /* get a free slot */
 			if (x >= SFQ_MAX_FLOWS) {
-drop:				sch->qstats.backlog -= qdisc_pkt_len(skb);
+drop:				qstats->backlog -= qdisc_pkt_len(skb);
 				kfree_skb(skb);
 				dropped++;
 				continue;
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 0c39b75..97d2eca 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -175,7 +175,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch)
 		ret = qdisc_enqueue(segs, q->qdisc);
 		if (ret != NET_XMIT_SUCCESS) {
 			if (net_xmit_drop_count(ret))
-				sch->qstats.drops++;
+				sch->qstats_qdisc.qstats.drops++;
 		} else {
 			nb++;
 		}
@@ -201,7 +201,7 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	ret = qdisc_enqueue(skb, q->qdisc);
 	if (ret != NET_XMIT_SUCCESS) {
 		if (net_xmit_drop_count(ret))
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		return ret;
 	}
 
@@ -216,7 +216,7 @@ static unsigned int tbf_drop(struct Qdisc *sch)
 
 	if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) {
 		sch->q.qlen--;
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 	}
 	return len;
 }
@@ -281,7 +281,7 @@ static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
 		   (cf. CSZ, HPFQ, HFSC)
 		 */
 
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 	}
 	return NULL;
 }
@@ -448,11 +448,12 @@ static void tbf_destroy(struct Qdisc *sch)
 
 static int tbf_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct tbf_sched_data *q = qdisc_priv(sch);
 	struct nlattr *nest;
 	struct tc_tbf_qopt opt;
 
-	sch->qstats.backlog = q->qdisc->qstats.backlog;
+	qstats->backlog = q->qdisc->qstats_qdisc.qstats.backlog;
 	nest = nla_nest_start(skb, TCA_OPTIONS);
 	if (nest == NULL)
 		goto nla_put_failure;
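
For reference, the hunks above all make the same mechanical conversion
from direct sch->qstats accesses to the new qstats_qdisc union. A
hypothetical pair of helpers (illustration only, not part of this
series; the *_sketch names are invented) captures the pattern each
qdisc currently open-codes:

static inline void qdisc_qstats_drop_sketch(struct Qdisc *sch)
{
        sch->qstats_qdisc.qstats.drops++;
}

static inline void qdisc_qstats_backlog_dec_sketch(struct Qdisc *sch,
                                                   const struct sk_buff *skb)
{
        sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
}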


* [net-next PATCH v4 16/16] net: sched: drop ingress qdisc lock
  2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
                   ` (14 preceding siblings ...)
  2014-09-10 15:52 ` [net-next PATCH v4 15/16] net: sched: make qstats per cpu John Fastabend
@ 2014-09-10 15:53 ` John Fastabend
  15 siblings, 0 replies; 30+ messages in thread
From: John Fastabend @ 2014-09-10 15:53 UTC (permalink / raw)
  To: xiyou.wangcong, davem, eric.dumazet, jhs; +Cc: netdev, paulmck, brouer

After the previous patches made the filters RCU safe and added
per-cpu counter support, we can drop the qdisc lock around the
ingress qdisc hook.

This is possible because the ingress qdisc is a very basic
qdisc that only updates stats and runs tc_classify; it is the
simplest qdisc we have.

For the per-cpu counters to be used, the ingress qdisc must set
the TCQ_F_LLQDISC flag. We could use per-cpu counters everywhere,
but they are only necessary when the qdisc lock is not held.
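
As a rough sketch (not code from this series; the cpu_bstats field name
and struct layout are assumptions made for illustration), a counter
update gated on the flag could look like:

static inline void qdisc_bstats_update_sketch(struct Qdisc *sch,
                                              const struct sk_buff *skb)
{
        if (sch->flags & TCQ_F_LLQDISC) {
                /* lockless path: per-cpu counters, 64-bit safe updates */
                struct gnet_stats_basic_cpu *b =
                        this_cpu_ptr(sch->bstats_qdisc.cpu_bstats);

                u64_stats_update_begin(&b->syncp);
                b->bstats.bytes += qdisc_pkt_len(skb);
                b->bstats.packets++;
                u64_stats_update_end(&b->syncp);
        } else {
                /* qdisc lock held: plain counters are sufficient */
                sch->bstats_qdisc.bstats.bytes += qdisc_pkt_len(skb);
                sch->bstats_qdisc.bstats.packets++;
        }
}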

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/core/dev.c          |    2 --
 net/sched/sch_ingress.c |    6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b3d6dbc..e400df9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3481,10 +3481,8 @@ static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
 
 	q = rcu_dereference(rxq->qdisc);
 	if (q != &noop_qdisc) {
-		spin_lock(qdisc_lock(q));
 		if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
 			result = qdisc_enqueue_root(skb, q);
-		spin_unlock(qdisc_lock(q));
 	}
 
 	return result;
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 88a6289..57ea680 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -89,6 +89,11 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 }
 
 /* ------------------------------------------------------------- */
+static int ingress_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	sch->flags |= TCQ_F_LLQDISC;
+	return 0;
+}
 
 static void ingress_destroy(struct Qdisc *sch)
 {
@@ -122,6 +127,7 @@ static const struct Qdisc_class_ops ingress_class_ops = {
 };
 
 static struct Qdisc_ops ingress_qdisc_ops __read_mostly = {
+	.init		=	ingress_init,
 	.cl_ops		=	&ingress_class_ops,
 	.id		=	"ingress",
 	.priv_size	=	sizeof(struct ingress_qdisc_data),


* Re: [net-next PATCH v4 01/16] net: qdisc: use rcu prefix and silence sparse warnings
  2014-09-10 15:47 ` [net-next PATCH v4 01/16] net: qdisc: use rcu prefix and silence sparse warnings John Fastabend
@ 2014-09-11  0:23   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2014-09-11  0:23 UTC (permalink / raw)
  To: John Fastabend; +Cc: xiyou.wangcong, davem, jhs, netdev, paulmck, brouer

On Wed, 2014-09-10 at 08:47 -0700, John Fastabend wrote:
> Add __rcu notation to qdisc handling; by doing this we can make
> smatch output more legible. And anyway some of the cases should
> be using rcu_dereference(), see qdisc_all_tx_empty(),
> qdisc_tx_changing(), and so on.
> 
> Also *wake_queue() API is commonly called from driver timer routines
> without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
> around netif_wake_subqueue and netif_tx_wake_queue.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

...

>  
> @@ -416,10 +423,13 @@ static inline bool qdisc_all_tx_empty(const struct net_device *dev)
>  static inline bool qdisc_tx_changing(const struct net_device *dev)
>  {
>  	unsigned int i;
> +
>  	for (i = 0; i < dev->num_tx_queues; i++) {
>  		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
> -		if (txq->qdisc != txq->qdisc_sleeping)
> +		if (rcu_access_pointer(txq->qdisc) != txq->qdisc_sleeping) {
> +			rcu_read_unlock();

You forgot to remove this rcu_read_unlock();

>  			return true;
> +		}
>  	}
>  	return false;
>  }
> @@ -428,10 +438,13 @@ static inline bool qdisc_tx_changing(const struct net_device *dev)
>  static inline bool qdisc_tx_is_noop(const struct net_device *dev)
>  {
>  	unsigned int i;
> +
>  	for (i = 0; i < dev->num_tx_queues; i++) {
>  		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
> -		if (txq->qdisc != &noop_qdisc)
> +		if (rcu_dereference(txq->qdisc) != &noop_qdisc) {

		rcu_access_pointer()

> +			rcu_read_unlock();

You forgot to remove this rcu_read_unlock();

>  			return false;
> +		}
>  	}
>  	return true;
>  }
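
For reference, with the leftover unlock dropped the helper would read
roughly as follows (a sketch of the fix being asked for, not the final
patch):

static inline bool qdisc_tx_changing(const struct net_device *dev)
{
        unsigned int i;

        for (i = 0; i < dev->num_tx_queues; i++) {
                struct netdev_queue *txq = netdev_get_tx_queue(dev, i);

                /* rcu_access_pointer() only reads the pointer value,
                 * so no rcu_read_lock()/unlock() pair is needed here.
                 */
                if (rcu_access_pointer(txq->qdisc) != txq->qdisc_sleeping)
                        return true;
        }
        return false;
}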


* Re: [net-next PATCH v4 02/16] net: rcu-ify tcf_proto
  2014-09-10 15:47 ` [net-next PATCH v4 02/16] net: rcu-ify tcf_proto John Fastabend
@ 2014-09-11  0:56   ` Eric Dumazet
  2014-09-12 15:03     ` John Fastabend
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2014-09-11  0:56 UTC (permalink / raw)
  To: John Fastabend; +Cc: xiyou.wangcong, davem, jhs, netdev, paulmck, brouer

On Wed, 2014-09-10 at 08:47 -0700, John Fastabend wrote:
> rcu'ify tcf_proto; this allows calling tc_classify() without holding
> any locks. Updaters are protected by RTNL.
> 
> This patch prepares the core net_sched infrastructure for running
> the classifier/action chains without holding the qdisc lock; however,
> it does nothing to ensure cls_xxx and act_xxx types also work without
> locking. Additional patches are required to address the fallout.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
...
> diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
> index ed30e43..4b52b70 100644
> --- a/net/sched/sch_choke.c
> +++ b/net/sched/sch_choke.c
> @@ -57,7 +57,7 @@ struct choke_sched_data {
>  
>  /* Variables */
>  	struct red_vars  vars;
> -	struct tcf_proto *filter_list;
> +	struct tcf_proto __rcu *filter_list;
>  	struct {
>  		u32	prob_drop;	/* Early probability drops */
>  		u32	prob_mark;	/* Early probability marks */
> @@ -193,9 +193,11 @@ static bool choke_classify(struct sk_buff *skb,
>  {
>  	struct choke_sched_data *q = qdisc_priv(sch);
>  	struct tcf_result res;
> +	struct tcf_proto *fl;
>  	int result;
>  
> -	result = tc_classify(skb, q->filter_list, &res);
> +	fl = rcu_dereference_bh(q->filter_list);

Hmm... please change the caller to pass fl.

Idea is to read q->filter_list once.

> +	result = tc_classify(skb, fl, &res);
>  	if (result >= 0) {
>  #ifdef CONFIG_NET_CLS_ACT
>  		switch (result) {
> @@ -244,12 +246,14 @@ static bool choke_match_random(const struct choke_sched_data *q,
>  			       unsigned int *pidx)
>  {
>  	struct sk_buff *oskb;
> +	struct tcf_proto *fl;
>  
>  	if (q->head == q->tail)
>  		return false;
>  
>  	oskb = choke_peek_random(q, pidx);
> -	if (q->filter_list)
> +	fl = rcu_dereference_bh(q->filter_list);

You could use rcu_access_pointer() and not have this fl variable.

> +	if (fl)
>  		return choke_get_classid(nskb) == choke_get_classid(oskb);
>  
>  	return choke_match_flow(oskb, nskb);
> @@ -259,9 +263,11 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  {
>  	struct choke_sched_data *q = qdisc_priv(sch);
>  	const struct red_parms *p = &q->parms;
> +	struct tcf_proto *fl;
>  	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
>  
> -	if (q->filter_list) {
> +	fl = rcu_dereference_bh(q->filter_list);
> +	if (fl) {
>  		/* If using external classifiers, get result and record it. */
>  		if (!choke_classify(skb, sch, &ret))

Here I think you should pass fl as an additional parameter to
choke_classify()


OR, just use rcu_access_pointer() here as you do not deref
q->filter_list here.


>  			goto other_drop;	/* Packet was eaten by filter */
> @@ -554,7 +560,8 @@ static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
>  	return 0;
>  }
>  
> -static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
> +static struct tcf_proto __rcu **choke_find_tcf(struct Qdisc *sch,
> +					       unsigned long cl)
>  {
>  	struct choke_sched_data *q = qdisc_priv(sch);
>  

remaining part seems fine.
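
To make the distinction concrete, a minimal sketch (the example_* names
are invented; only the two accessors matter): rcu_access_pointer() is
enough when the pointer is merely tested, while rcu_dereference_bh() is
required on the path that actually dereferences it from BH context.

struct example_priv {
        struct tcf_proto __rcu *filter_list;
};

/* NULL test only, no dereference */
static bool example_has_filters(struct example_priv *q)
{
        return rcu_access_pointer(q->filter_list) != NULL;
}

/* The pointer is dereferenced, so take one rcu_dereference_bh()
 * snapshot and hand it to tc_classify().
 */
static int example_classify(struct sk_buff *skb, struct example_priv *q,
                            struct tcf_result *res)
{
        struct tcf_proto *fl = rcu_dereference_bh(q->filter_list);

        return fl ? tc_classify(skb, fl, res) : -1;
}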


* Re: [net-next PATCH v4 05/16] net: sched: cls_flow use RCU
  2014-09-10 15:48 ` [net-next PATCH v4 05/16] net: sched: cls_flow " John Fastabend
@ 2014-09-11  0:58   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2014-09-11  0:58 UTC (permalink / raw)
  To: John Fastabend; +Cc: xiyou.wangcong, davem, jhs, netdev, paulmck, brouer

On Wed, 2014-09-10 at 08:48 -0700, John Fastabend wrote:
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  net/sched/cls_flow.c |  145 +++++++++++++++++++++++++++++---------------------
>  1 file changed, 84 insertions(+), 61 deletions(-)

Acked-by: Eric Dumazet <edumazet@google.com>


* Re: [net-next PATCH v4 06/16] net: sched: fw use RCU
  2014-09-10 15:49 ` [net-next PATCH v4 06/16] net: sched: fw " John Fastabend
@ 2014-09-11  1:03   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2014-09-11  1:03 UTC (permalink / raw)
  To: John Fastabend; +Cc: xiyou.wangcong, davem, jhs, netdev, paulmck, brouer

On Wed, 2014-09-10 at 08:49 -0700, John Fastabend wrote:
> RCU'ify fw classifier.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  net/sched/cls_fw.c |  111 ++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 77 insertions(+), 34 deletions(-)

Acked-by: Eric Dumazet <edumazet@google.com>


* Re: [net-next PATCH v4 07/16] net: sched: RCU cls_route
  2014-09-10 15:49 ` [net-next PATCH v4 07/16] net: sched: RCU cls_route John Fastabend
@ 2014-09-11  1:12   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2014-09-11  1:12 UTC (permalink / raw)
  To: John Fastabend; +Cc: xiyou.wangcong, davem, jhs, netdev, paulmck, brouer

On Wed, 2014-09-10 at 08:49 -0700, John Fastabend wrote:
> RCUify the route classifier. For now, however, spinlocks are used to
> protect the fastmap cache.
> 
> The issue here is the fastmap may be read by one CPU while the
> cache is being updated by another. An array of pointers could be
> one possible solution.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>


* Re: [net-next PATCH v4 08/16] net: sched: RCU cls_tcindex
  2014-09-10 15:50 ` [net-next PATCH v4 08/16] net: sched: RCU cls_tcindex John Fastabend
@ 2014-09-11  1:17   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2014-09-11  1:17 UTC (permalink / raw)
  To: John Fastabend; +Cc: xiyou.wangcong, davem, jhs, netdev, paulmck, brouer

On Wed, 2014-09-10 at 08:50 -0700, John Fastabend wrote:
> Make cls_tcindex RCU safe.
> 
> This patch adds a new RCU routine, rcu_dereference_bh_rtnl(), to check
> that the caller holds either the RCU read lock or RTNL. This is needed
> to handle the case where tcindex_lookup() is called in both contexts.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>


* Re: [net-next PATCH v4 09/16] net: sched: make cls_u32 per cpu
  2014-09-10 15:50 ` [net-next PATCH v4 09/16] net: sched: make cls_u32 per cpu John Fastabend
@ 2014-09-11  1:19   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2014-09-11  1:19 UTC (permalink / raw)
  To: John Fastabend; +Cc: xiyou.wangcong, davem, jhs, netdev, paulmck, brouer

On Wed, 2014-09-10 at 08:50 -0700, John Fastabend wrote:
> This uses per-cpu counters in cls_u32 in preparation
> for converting over to RCU.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
Acked-by: Eric Dumazet <edumazet@google.com>


* Re: [net-next PATCH v4 10/16] net: sched: make cls_u32 lockless
  2014-09-10 15:50 ` [net-next PATCH v4 10/16] net: sched: make cls_u32 lockless John Fastabend
@ 2014-09-11  1:26   ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2014-09-11  1:26 UTC (permalink / raw)
  To: John Fastabend; +Cc: xiyou.wangcong, davem, jhs, netdev, paulmck, brouer

On Wed, 2014-09-10 at 08:50 -0700, John Fastabend wrote:
> Make the cls_u32 classifier safe to run without holding the lock. This
> patch converts the statistics that are updated in the read-side path,
> u32_classify(), into per-cpu counters.
> 
> This patch was tested with a tight u32 filter add/delete loop while
> generating traffic with pktgen. By running pktgen on vlan devices
> created on top of a physical device we can hit the qdisc layer
> correctly. For ingress qdiscs a loopback cable was used.
> 
> for i in {1..100}; do
>         q=`echo $i%8|bc`;
>         echo -n "u32 tos: iteration $i on queue $q";
>         tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \
>                   action skbedit queue_mapping $q;
>         sleep 1;
>         tc filter del dev p3p2 prio $i;
> 
>         echo -n "u32 tos hash table: iteration $i on queue $q";
>         tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1
>         tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
>                 match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0
>         tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
>                 ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q
>         sleep 2;
>         tc filter del dev p3p2 prio $i
>         sleep 1;
> done
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>


* Re: [net-next PATCH v4 11/16] net: sched: rcu'ify cls_rsvp
  2014-09-10 15:51 ` [net-next PATCH v4 11/16] net: sched: rcu'ify cls_rsvp John Fastabend
@ 2014-09-11  1:30   ` Eric Dumazet
  2014-09-12 15:13     ` John Fastabend
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2014-09-11  1:30 UTC (permalink / raw)
  To: John Fastabend; +Cc: xiyou.wangcong, davem, jhs, netdev, paulmck, brouer

On Wed, 2014-09-10 at 08:51 -0700, John Fastabend wrote:
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  net/sched/cls_rsvp.h |  157 ++++++++++++++++++++++++++++----------------------
>  1 file changed, 89 insertions(+), 68 deletions(-)
> 
> diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
> index 1020e23..afa508b 100644
> --- a/net/sched/cls_rsvp.h
> +++ b/net/sched/cls_rsvp.h
> @@ -70,31 +70,34 @@ struct rsvp_head {
>  	u32			tmap[256/32];
>  	u32			hgenerator;
>  	u8			tgenerator;
> -	struct rsvp_session	*ht[256];
> +	struct rsvp_session __rcu *ht[256];
> +	struct rcu_head		rcu;
>  };
>  
>  struct rsvp_session {
> -	struct rsvp_session	*next;
> -	__be32			dst[RSVP_DST_LEN];
> -	struct tc_rsvp_gpi 	dpi;
> -	u8			protocol;
> -	u8			tunnelid;
> +	struct rsvp_session __rcu	*next;
> +	__be32				dst[RSVP_DST_LEN];
> +	struct tc_rsvp_gpi		dpi;
> +	u8				protocol;
> +	u8				tunnelid;
>  	/* 16 (src,sport) hash slots, and one wildcard source slot */
> -	struct rsvp_filter	*ht[16 + 1];
> +	struct rsvp_filter __rcu	*ht[16 + 1];
> +	struct rcu_head			rcu;
>  };
>  
> 
>  struct rsvp_filter {
> -	struct rsvp_filter	*next;
> -	__be32			src[RSVP_DST_LEN];
> -	struct tc_rsvp_gpi	spi;
> -	u8			tunnelhdr;
> +	struct rsvp_filter __rcu	*next;
> +	__be32				src[RSVP_DST_LEN];
> +	struct tc_rsvp_gpi		spi;
> +	u8				tunnelhdr;
>  
> -	struct tcf_result	res;
> -	struct tcf_exts		exts;
> +	struct tcf_result		res;
> +	struct tcf_exts			exts;
>  
> -	u32			handle;
> -	struct rsvp_session	*sess;
> +	u32				handle;
> +	struct rsvp_session		*sess;
> +	struct rcu_head			rcu;
>  };
>  
>  static inline unsigned int hash_dst(__be32 *dst, u8 protocol, u8 tunnelid)
> @@ -128,7 +131,7 @@ static inline unsigned int hash_src(__be32 *src)
>  static int rsvp_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>  			 struct tcf_result *res)
>  {
> -	struct rsvp_session **sht = ((struct rsvp_head *)tp->root)->ht;
> +	struct rsvp_head *head = rcu_dereference_bh(tp->root);
>  	struct rsvp_session *s;
>  	struct rsvp_filter *f;
>  	unsigned int h1, h2;
> @@ -169,7 +172,8 @@ restart:
>  	h1 = hash_dst(dst, protocol, tunnelid);
>  	h2 = hash_src(src);
>  
> -	for (s = sht[h1]; s; s = s->next) {
> +	for (s = rcu_dereference_bh(head->ht[h1]); s;
> +	     s = rcu_dereference_bh(s->next)) {
>  		if (dst[RSVP_DST_LEN-1] == s->dst[RSVP_DST_LEN - 1] &&
>  		    protocol == s->protocol &&
>  		    !(s->dpi.mask &
> @@ -181,7 +185,8 @@ restart:
>  #endif
>  		    tunnelid == s->tunnelid) {
>  
> -			for (f = s->ht[h2]; f; f = f->next) {
> +			for (f = rcu_dereference_bh(s->ht[h2]); f;
> +			     f = rcu_dereference_bh(f->next)) {
>  				if (src[RSVP_DST_LEN-1] == f->src[RSVP_DST_LEN - 1] &&
>  				    !(f->spi.mask & (*(u32 *)(xprt + f->spi.offset) ^ f->spi.key))
>  #if RSVP_DST_LEN == 4
> @@ -205,7 +210,8 @@ matched:
>  			}
>  
>  			/* And wildcard bucket... */
> -			for (f = s->ht[16]; f; f = f->next) {
> +			for (f = rcu_dereference_bh(s->ht[16]); f;
> +			     f = rcu_dereference_bh(f->next)) {
>  				*res = f->res;
>  				RSVP_APPLY_RESULT();
>  				goto matched;
> @@ -218,7 +224,7 @@ matched:
>  
>  static unsigned long rsvp_get(struct tcf_proto *tp, u32 handle)
>  {
> -	struct rsvp_session **sht = ((struct rsvp_head *)tp->root)->ht;
> +	struct rsvp_head *head = rtnl_dereference(tp->root);
>  	struct rsvp_session *s;
>  	struct rsvp_filter *f;
>  	unsigned int h1 = handle & 0xFF;
> @@ -227,8 +233,10 @@ static unsigned long rsvp_get(struct tcf_proto *tp, u32 handle)
>  	if (h2 > 16)
>  		return 0;
>  
> -	for (s = sht[h1]; s; s = s->next) {
> -		for (f = s->ht[h2]; f; f = f->next) {
> +	for (s = rtnl_dereference(head->ht[h1]); s;
> +	     s = rtnl_dereference(s->next)) {
> +		for (f = rtnl_dereference(s->ht[h2]); f;
> +		     f = rtnl_dereference(f->next)) {
>  			if (f->handle == handle)
>  				return (unsigned long)f;
>  		}
> @@ -246,7 +254,7 @@ static int rsvp_init(struct tcf_proto *tp)
>  
>  	data = kzalloc(sizeof(struct rsvp_head), GFP_KERNEL);
>  	if (data) {
> -		tp->root = data;
> +		rcu_assign_pointer(tp->root, data);
>  		return 0;
>  	}
>  	return -ENOBUFS;
> @@ -257,53 +265,55 @@ rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f)
>  {
>  	tcf_unbind_filter(tp, &f->res);
>  	tcf_exts_destroy(tp, &f->exts);
> -	kfree(f);
> +	kfree_rcu(f, rcu);
>  }
>  
>  static void rsvp_destroy(struct tcf_proto *tp)
>  {
> -	struct rsvp_head *data = xchg(&tp->root, NULL);
> -	struct rsvp_session **sht;
> +	struct rsvp_head *data = rtnl_dereference(tp->root);
>  	int h1, h2;
>  
>  	if (data == NULL)
>  		return;
>  
> -	sht = data->ht;
> +	RCU_INIT_POINTER(tp->root, NULL);
>  
>  	for (h1 = 0; h1 < 256; h1++) {
>  		struct rsvp_session *s;
>  
> -		while ((s = sht[h1]) != NULL) {
> -			sht[h1] = s->next;
> +		while ((s = rtnl_dereference(data->ht[h1])) != NULL) {
> +			RCU_INIT_POINTER(data->ht[h1], s->next);
>  
>  			for (h2 = 0; h2 <= 16; h2++) {
>  				struct rsvp_filter *f;
>  
> -				while ((f = s->ht[h2]) != NULL) {
> -					s->ht[h2] = f->next;
> +				while ((f = rtnl_dereference(s->ht[h2])) != NULL) {
> +					rcu_assign_pointer(s->ht[h2], f->next);
>  					rsvp_delete_filter(tp, f);
>  				}
>  			}
> -			kfree(s);
> +			kfree_rcu(s, rcu);
>  		}
>  	}
> -	kfree(data);
> +	RCU_INIT_POINTER(tp->root, NULL);

Strange, you already did the RCU_INIT_POINTER(tp->root, NULL) before the
for(h1 = 0; h1 < 256; h1++) loop

 

> +	kfree_rcu(data, rcu);
>  }
>  
>  static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
>  {
> -	struct rsvp_filter **fp, *f = (struct rsvp_filter *)arg;
> +	struct rsvp_head *head = rtnl_dereference(tp->root);
> +	struct rsvp_filter *nfp, *f = (struct rsvp_filter *)arg;
> +	struct rsvp_filter __rcu **fp;
>  	unsigned int h = f->handle;
> -	struct rsvp_session **sp;
> -	struct rsvp_session *s = f->sess;
> +	struct rsvp_session __rcu **sp;
> +	struct rsvp_session *nsp, *s = f->sess;
>  	int i;
>  
> -	for (fp = &s->ht[(h >> 8) & 0xFF]; *fp; fp = &(*fp)->next) {
> -		if (*fp == f) {
> -			tcf_tree_lock(tp);
> +	fp = &s->ht[(h >> 8) & 0xFF];
> +	for (nfp = rtnl_dereference(*fp); nfp;
> +	     fp = &nfp->next, nfp = rtnl_dereference(*fp)) {
> +		if (nfp == f) {
>  			*fp = f->next;

It seems you do not follow your own convention here ?

RCU_INIT_POINTER(*fp, f->next);

> -			tcf_tree_unlock(tp);
>  			rsvp_delete_filter(tp, f);
>  
>  			/* Strip tree */
> @@ -313,14 +323,12 @@ static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
>  					return 0;
>  
>  			/* OK, session has no flows */
> -			for (sp = &((struct rsvp_head *)tp->root)->ht[h & 0xFF];
> -			     *sp; sp = &(*sp)->next) {
> -				if (*sp == s) {
> -					tcf_tree_lock(tp);
> +			sp = &head->ht[h & 0xFF];
> +			for (nsp = rtnl_dereference(*sp); nsp;
> +			     sp = &nsp->next, nsp = rtnl_dereference(*sp)) {
> +				if (nsp == s) {
>  					*sp = s->next;

Same remark here.

> -					tcf_tree_unlock(tp);
> -
> -					kfree(s);
> +					kfree_rcu(s, rcu);
>  					return 0;
>  				}
>  			}
> @@ -333,7 +341,7 @@ static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)

Thanks !


* Re: [net-next PATCH v4 12/16] net: sched: rcu'ify cls_bpf
  2014-09-10 15:51 ` [net-next PATCH v4 12/16] net: sched: rcu'ify cls_bpf John Fastabend
@ 2014-09-11  2:28   ` Eric Dumazet
  2014-09-12 15:16     ` John Fastabend
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2014-09-11  2:28 UTC (permalink / raw)
  To: John Fastabend; +Cc: xiyou.wangcong, davem, jhs, netdev, paulmck, brouer

On Wed, 2014-09-10 at 08:51 -0700, John Fastabend wrote:
> This patch makes the cls_bpf classifier RCU safe. The tcf_lock
> was being used to protect a list of cls_bpf_prog; now this list
> is RCU safe and updates occur with list_replace_rcu().
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

...

> @@ -256,18 +254,19 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
>  	if (ret < 0)
>  		return ret;
>  
> -	if (prog != NULL) {
> -		if (handle && prog->handle != handle)
> -			return -EINVAL;
> -		return cls_bpf_modify_existing(net, tp, prog, base, tb,
> -					       tca[TCA_RATE], ovr);
> -	}
> -
>  	prog = kzalloc(sizeof(*prog), GFP_KERNEL);
> -	if (prog == NULL)
> +	if (!prog)
>  		return -ENOBUFS;
>  
>  	tcf_exts_init(&prog->exts, TCA_BPF_ACT, TCA_BPF_POLICE);
> +
> +	if (oldprog) {
> +		if (handle && oldprog->handle != handle) {
> +			ret = -EINVAL;
> +			goto errout;
> +		}
> +	}
> +
>  	if (handle == 0)
>  		prog->handle = cls_bpf_grab_new_handle(tp, head);
>  	else
> @@ -281,15 +280,17 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
>  	if (ret < 0)
>  		goto errout;
>  
> -	tcf_tree_lock(tp);
> -	list_add(&prog->link, &head->plist);
> -	tcf_tree_unlock(tp);
> +	if (oldprog) {
> +		list_replace_rcu(&prog->link, &oldprog->link);
> +		call_rcu(&oldprog->rcu, __cls_bpf_delete_prog);
> +	} else {
> +		list_add_rcu(&prog->link, &head->plist);
> +	}
>  
>  	*arg = (unsigned long) prog;
> -
>  	return 0;
>  errout:
> -	if (*arg == 0UL && prog)
> +	if (prog)
>  		kfree(prog);
>  

nit, you can directly call kfree(prog) even if prog == NULL
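
kfree(NULL) is defined to be a no-op, so the check can simply be
dropped:

errout:
        kfree(prog);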


* Re: [net-next PATCH v4 02/16] net: rcu-ify tcf_proto
  2014-09-11  0:56   ` Eric Dumazet
@ 2014-09-12 15:03     ` John Fastabend
  0 siblings, 0 replies; 30+ messages in thread
From: John Fastabend @ 2014-09-12 15:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: xiyou.wangcong, davem, jhs, netdev, paulmck, brouer

On 09/10/2014 05:56 PM, Eric Dumazet wrote:
> On Wed, 2014-09-10 at 08:47 -0700, John Fastabend wrote:
>> rcu'ify tcf_proto; this allows calling tc_classify() without holding
>> any locks. Updaters are protected by RTNL.
>>
>> This patch prepares the core net_sched infrastructure for running
>> the classifier/action chains without holding the qdisc lock; however,
>> it does nothing to ensure cls_xxx and act_xxx types also work without
>> locking. Additional patches are required to address the fallout.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
> ...
>> diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
>> index ed30e43..4b52b70 100644
>> --- a/net/sched/sch_choke.c
>> +++ b/net/sched/sch_choke.c
>> @@ -57,7 +57,7 @@ struct choke_sched_data {
>>
>>   /* Variables */
>>   	struct red_vars  vars;
>> -	struct tcf_proto *filter_list;
>> +	struct tcf_proto __rcu *filter_list;
>>   	struct {
>>   		u32	prob_drop;	/* Early probability drops */
>>   		u32	prob_mark;	/* Early probability marks */
>> @@ -193,9 +193,11 @@ static bool choke_classify(struct sk_buff *skb,
>>   {
>>   	struct choke_sched_data *q = qdisc_priv(sch);
>>   	struct tcf_result res;
>> +	struct tcf_proto *fl;
>>   	int result;
>>
>> -	result = tc_classify(skb, q->filter_list, &res);
>> +	fl = rcu_dereference_bh(q->filter_list);
>
> Hmm... please change the caller to pass fl.
>
> Idea is to read q->filter_list once.
>

I'll just use rcu_access_pointer() in the caller and leave this
rcu_dereference_bh() here.

>> +	result = tc_classify(skb, fl, &res);
>>   	if (result >= 0) {
>>   #ifdef CONFIG_NET_CLS_ACT
>>   		switch (result) {
>> @@ -244,12 +246,14 @@ static bool choke_match_random(const struct choke_sched_data *q,
>>   			       unsigned int *pidx)
>>   {
>>   	struct sk_buff *oskb;
>> +	struct tcf_proto *fl;
>>
>>   	if (q->head == q->tail)
>>   		return false;
>>
>>   	oskb = choke_peek_random(q, pidx);
>> -	if (q->filter_list)
>> +	fl = rcu_dereference_bh(q->filter_list);
>
> You could use rcu_access_pointer() and not have this fl variable.
>

done thanks.

>> +	if (fl)
>>   		return choke_get_classid(nskb) == choke_get_classid(oskb);
>>
>>   	return choke_match_flow(oskb, nskb);
>> @@ -259,9 +263,11 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>>   {
>>   	struct choke_sched_data *q = qdisc_priv(sch);
>>   	const struct red_parms *p = &q->parms;
>> +	struct tcf_proto *fl;
>>   	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
>>
>> -	if (q->filter_list) {
>> +	fl = rcu_dereference_bh(q->filter_list);
>> +	if (fl) {
>>   		/* If using external classifiers, get result and record it. */
>>   		if (!choke_classify(skb, sch, &ret))
>
> Here I think you should pass fl as an additional parameter to
> choke_classify()
>
>
> OR, just use rcu_access_pointer() here as you do not deref
> q->filter_list here.
>

Went with rcu_access_pointer.

>
>>   			goto other_drop;	/* Packet was eaten by filter */
>> @@ -554,7 +560,8 @@ static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
>>   	return 0;
>>   }
>>
>> -static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
>> +static struct tcf_proto __rcu **choke_find_tcf(struct Qdisc *sch,
>> +					       unsigned long cl)
>>   {
>>   	struct choke_sched_data *q = qdisc_priv(sch);
>>
>
> remaining part seems fine.
>
>


-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v4 11/16] net: sched: rcu'ify cls_rsvp
  2014-09-11  1:30   ` Eric Dumazet
@ 2014-09-12 15:13     ` John Fastabend
  0 siblings, 0 replies; 30+ messages in thread
From: John Fastabend @ 2014-09-12 15:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: xiyou.wangcong, davem, jhs, netdev, paulmck, brouer

On 09/10/2014 06:30 PM, Eric Dumazet wrote:
> On Wed, 2014-09-10 at 08:51 -0700, John Fastabend wrote:
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>   net/sched/cls_rsvp.h |  157 ++++++++++++++++++++++++++++----------------------
>>   1 file changed, 89 insertions(+), 68 deletions(-)
>>
>> diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
>> index 1020e23..afa508b 100644
>> --- a/net/sched/cls_rsvp.h
>> +++ b/net/sched/cls_rsvp.h
>> @@ -70,31 +70,34 @@ struct rsvp_head {
>>   	u32			tmap[256/32];
>>   	u32			hgenerator;
>>   	u8			tgenerator;
>> -	struct rsvp_session	*ht[256];
>> +	struct rsvp_session __rcu *ht[256];
>> +	struct rcu_head		rcu;
>>   };
>>
>>   struct rsvp_session {
>> -	struct rsvp_session	*next;
>> -	__be32			dst[RSVP_DST_LEN];
>> -	struct tc_rsvp_gpi 	dpi;
>> -	u8			protocol;
>> -	u8			tunnelid;
>> +	struct rsvp_session __rcu	*next;
>> +	__be32				dst[RSVP_DST_LEN];
>> +	struct tc_rsvp_gpi		dpi;
>> +	u8				protocol;
>> +	u8				tunnelid;
>>   	/* 16 (src,sport) hash slots, and one wildcard source slot */
>> -	struct rsvp_filter	*ht[16 + 1];
>> +	struct rsvp_filter __rcu	*ht[16 + 1];
>> +	struct rcu_head			rcu;
>>   };
>>
>>
>>   struct rsvp_filter {
>> -	struct rsvp_filter	*next;
>> -	__be32			src[RSVP_DST_LEN];
>> -	struct tc_rsvp_gpi	spi;
>> -	u8			tunnelhdr;
>> +	struct rsvp_filter __rcu	*next;
>> +	__be32				src[RSVP_DST_LEN];
>> +	struct tc_rsvp_gpi		spi;
>> +	u8				tunnelhdr;
>>
>> -	struct tcf_result	res;
>> -	struct tcf_exts		exts;
>> +	struct tcf_result		res;
>> +	struct tcf_exts			exts;
>>
>> -	u32			handle;
>> -	struct rsvp_session	*sess;
>> +	u32				handle;
>> +	struct rsvp_session		*sess;
>> +	struct rcu_head			rcu;
>>   };
>>
>>   static inline unsigned int hash_dst(__be32 *dst, u8 protocol, u8 tunnelid)
>> @@ -128,7 +131,7 @@ static inline unsigned int hash_src(__be32 *src)
>>   static int rsvp_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>>   			 struct tcf_result *res)
>>   {
>> -	struct rsvp_session **sht = ((struct rsvp_head *)tp->root)->ht;
>> +	struct rsvp_head *head = rcu_dereference_bh(tp->root);
>>   	struct rsvp_session *s;
>>   	struct rsvp_filter *f;
>>   	unsigned int h1, h2;
>> @@ -169,7 +172,8 @@ restart:
>>   	h1 = hash_dst(dst, protocol, tunnelid);
>>   	h2 = hash_src(src);
>>
>> -	for (s = sht[h1]; s; s = s->next) {
>> +	for (s = rcu_dereference_bh(head->ht[h1]); s;
>> +	     s = rcu_dereference_bh(s->next)) {
>>   		if (dst[RSVP_DST_LEN-1] == s->dst[RSVP_DST_LEN - 1] &&
>>   		    protocol == s->protocol &&
>>   		    !(s->dpi.mask &
>> @@ -181,7 +185,8 @@ restart:
>>   #endif
>>   		    tunnelid == s->tunnelid) {
>>
>> -			for (f = s->ht[h2]; f; f = f->next) {
>> +			for (f = rcu_dereference_bh(s->ht[h2]); f;
>> +			     f = rcu_dereference_bh(f->next)) {
>>   				if (src[RSVP_DST_LEN-1] == f->src[RSVP_DST_LEN - 1] &&
>>   				    !(f->spi.mask & (*(u32 *)(xprt + f->spi.offset) ^ f->spi.key))
>>   #if RSVP_DST_LEN == 4
>> @@ -205,7 +210,8 @@ matched:
>>   			}
>>
>>   			/* And wildcard bucket... */
>> -			for (f = s->ht[16]; f; f = f->next) {
>> +			for (f = rcu_dereference_bh(s->ht[16]); f;
>> +			     f = rcu_dereference_bh(f->next)) {
>>   				*res = f->res;
>>   				RSVP_APPLY_RESULT();
>>   				goto matched;
>> @@ -218,7 +224,7 @@ matched:
>>
>>   static unsigned long rsvp_get(struct tcf_proto *tp, u32 handle)
>>   {
>> -	struct rsvp_session **sht = ((struct rsvp_head *)tp->root)->ht;
>> +	struct rsvp_head *head = rtnl_dereference(tp->root);
>>   	struct rsvp_session *s;
>>   	struct rsvp_filter *f;
>>   	unsigned int h1 = handle & 0xFF;
>> @@ -227,8 +233,10 @@ static unsigned long rsvp_get(struct tcf_proto *tp, u32 handle)
>>   	if (h2 > 16)
>>   		return 0;
>>
>> -	for (s = sht[h1]; s; s = s->next) {
>> -		for (f = s->ht[h2]; f; f = f->next) {
>> +	for (s = rtnl_dereference(head->ht[h1]); s;
>> +	     s = rtnl_dereference(s->next)) {
>> +		for (f = rtnl_dereference(s->ht[h2]); f;
>> +		     f = rtnl_dereference(f->next)) {
>>   			if (f->handle == handle)
>>   				return (unsigned long)f;
>>   		}
>> @@ -246,7 +254,7 @@ static int rsvp_init(struct tcf_proto *tp)
>>
>>   	data = kzalloc(sizeof(struct rsvp_head), GFP_KERNEL);
>>   	if (data) {
>> -		tp->root = data;
>> +		rcu_assign_pointer(tp->root, data);
>>   		return 0;
>>   	}
>>   	return -ENOBUFS;
>> @@ -257,53 +265,55 @@ rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f)
>>   {
>>   	tcf_unbind_filter(tp, &f->res);
>>   	tcf_exts_destroy(tp, &f->exts);
>> -	kfree(f);
>> +	kfree_rcu(f, rcu);
>>   }
>>
>>   static void rsvp_destroy(struct tcf_proto *tp)
>>   {
>> -	struct rsvp_head *data = xchg(&tp->root, NULL);
>> -	struct rsvp_session **sht;
>> +	struct rsvp_head *data = rtnl_dereference(tp->root);
>>   	int h1, h2;
>>
>>   	if (data == NULL)
>>   		return;
>>
>> -	sht = data->ht;
>> +	RCU_INIT_POINTER(tp->root, NULL);
>>
>>   	for (h1 = 0; h1 < 256; h1++) {
>>   		struct rsvp_session *s;
>>
>> -		while ((s = sht[h1]) != NULL) {
>> -			sht[h1] = s->next;
>> +		while ((s = rtnl_dereference(data->ht[h1])) != NULL) {
>> +			RCU_INIT_POINTER(data->ht[h1], s->next);
>>
>>   			for (h2 = 0; h2 <= 16; h2++) {
>>   				struct rsvp_filter *f;
>>
>> -				while ((f = s->ht[h2]) != NULL) {
>> -					s->ht[h2] = f->next;
>> +				while ((f = rtnl_dereference(s->ht[h2])) != NULL) {
>> +					rcu_assign_pointer(s->ht[h2], f->next);
>>   					rsvp_delete_filter(tp, f);
>>   				}
>>   			}
>> -			kfree(s);
>> +			kfree_rcu(s, rcu);
>>   		}
>>   	}
>> -	kfree(data);
>> +	RCU_INIT_POINTER(tp->root, NULL);
>
> Strange, you already did the RCU_INIT_POINTER(tp->root, NULL) before the
> for(h1 = 0; h1 < 256; h1++) loop
>

Yep, I'll drop this duplicate call.

>
>
>> +	kfree_rcu(data, rcu);
>>   }
>>
>>   static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
>>   {
>> -	struct rsvp_filter **fp, *f = (struct rsvp_filter *)arg;
>> +	struct rsvp_head *head = rtnl_dereference(tp->root);
>> +	struct rsvp_filter *nfp, *f = (struct rsvp_filter *)arg;
>> +	struct rsvp_filter __rcu **fp;
>>   	unsigned int h = f->handle;
>> -	struct rsvp_session **sp;
>> -	struct rsvp_session *s = f->sess;
>> +	struct rsvp_session __rcu **sp;
>> +	struct rsvp_session *nsp, *s = f->sess;
>>   	int i;
>>
>> -	for (fp = &s->ht[(h >> 8) & 0xFF]; *fp; fp = &(*fp)->next) {
>> -		if (*fp == f) {
>> -			tcf_tree_lock(tp);
>> +	fp = &s->ht[(h >> 8) & 0xFF];
>> +	for (nfp = rtnl_dereference(*fp); nfp;
>> +	     fp = &nfp->next, nfp = rtnl_dereference(*fp)) {
>> +		if (nfp == f) {
>>   			*fp = f->next;
>
> It seems you do not follow your own convention here ?
>
> RCU_INIT_POINTER(*fp, f->next);

Converted both cases to RCU_INIT_POINTER().
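
Combining the hunk above with that suggestion, the unlink ends up
looking roughly like this (a sketch; the tree-strip logic that follows
in the real function is omitted):

        fp = &s->ht[(h >> 8) & 0xFF];
        for (nfp = rtnl_dereference(*fp); nfp;
             fp = &nfp->next, nfp = rtnl_dereference(*fp)) {
                if (nfp == f) {
                        RCU_INIT_POINTER(*fp, f->next);
                        rsvp_delete_filter(tp, f);
                        break;
                }
        }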

Thanks.


-- 
John Fastabend         Intel Corporation


* Re: [net-next PATCH v4 12/16] net: sched: rcu'ify cls_bpf
  2014-09-11  2:28   ` Eric Dumazet
@ 2014-09-12 15:16     ` John Fastabend
  0 siblings, 0 replies; 30+ messages in thread
From: John Fastabend @ 2014-09-12 15:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: xiyou.wangcong, davem, jhs, netdev, paulmck, brouer

On 09/10/2014 07:28 PM, Eric Dumazet wrote:
> On Wed, 2014-09-10 at 08:51 -0700, John Fastabend wrote:
>> This patch makes the cls_bpf classifier RCU safe. The tcf_lock
>> was being used to protect a list of cls_bpf_prog; now this list
>> is RCU safe and updates occur with list_replace_rcu().
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>
> ...

[...]

>> @@ -281,15 +280,17 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
>>   	if (ret < 0)
>>   		goto errout;
>>
>> -	tcf_tree_lock(tp);
>> -	list_add(&prog->link, &head->plist);
>> -	tcf_tree_unlock(tp);
>> +	if (oldprog) {
>> +		list_replace_rcu(&prog->link, &oldprog->link);
>> +		call_rcu(&oldprog->rcu, __cls_bpf_delete_prog);
>> +	} else {
>> +		list_add_rcu(&prog->link, &head->plist);
>> +	}
>>
>>   	*arg = (unsigned long) prog;
>> -
>>   	return 0;
>>   errout:
>> -	if (*arg == 0UL && prog)
>> +	if (prog)
>>   		kfree(prog);
>>
>
> nit, you can directly call kfree(prog) even if prog == NULL
>

Yep, made the change.

-- 
John Fastabend         Intel Corporation


Thread overview: 30+ messages
2014-09-10 15:46 [net-next PATCH v4 00/16] net/sched use rcu filters John Fastabend
2014-09-10 15:47 ` [net-next PATCH v4 01/16] net: qdisc: use rcu prefix and silence sparse warnings John Fastabend
2014-09-11  0:23   ` Eric Dumazet
2014-09-10 15:47 ` [net-next PATCH v4 02/16] net: rcu-ify tcf_proto John Fastabend
2014-09-11  0:56   ` Eric Dumazet
2014-09-12 15:03     ` John Fastabend
2014-09-10 15:47 ` [net-next PATCH v4 03/16] net: sched: cls_basic use RCU John Fastabend
2014-09-10 15:48 ` [net-next PATCH v4 04/16] net: sched: cls_cgroup " John Fastabend
2014-09-10 15:48 ` [net-next PATCH v4 05/16] net: sched: cls_flow " John Fastabend
2014-09-11  0:58   ` Eric Dumazet
2014-09-10 15:49 ` [net-next PATCH v4 06/16] net: sched: fw " John Fastabend
2014-09-11  1:03   ` Eric Dumazet
2014-09-10 15:49 ` [net-next PATCH v4 07/16] net: sched: RCU cls_route John Fastabend
2014-09-11  1:12   ` Eric Dumazet
2014-09-10 15:50 ` [net-next PATCH v4 08/16] net: sched: RCU cls_tcindex John Fastabend
2014-09-11  1:17   ` Eric Dumazet
2014-09-10 15:50 ` [net-next PATCH v4 09/16] net: sched: make cls_u32 per cpu John Fastabend
2014-09-11  1:19   ` Eric Dumazet
2014-09-10 15:50 ` [net-next PATCH v4 10/16] net: sched: make cls_u32 lockless John Fastabend
2014-09-11  1:26   ` Eric Dumazet
2014-09-10 15:51 ` [net-next PATCH v4 11/16] net: sched: rcu'ify cls_rsvp John Fastabend
2014-09-11  1:30   ` Eric Dumazet
2014-09-12 15:13     ` John Fastabend
2014-09-10 15:51 ` [net-next PATCH v4 12/16] net: sched: rcu'ify cls_bpf John Fastabend
2014-09-11  2:28   ` Eric Dumazet
2014-09-12 15:16     ` John Fastabend
2014-09-10 15:52 ` [net-next PATCH v4 13/16] net: sched: make tc_action safe to walk under RCU John Fastabend
2014-09-10 15:52 ` [net-next PATCH v4 14/16] net: sched: make bstats per cpu and estimator RCU safe John Fastabend
2014-09-10 15:52 ` [net-next PATCH v4 15/16] net: sched: make qstats per cpu John Fastabend
2014-09-10 15:53 ` [net-next PATCH v4 16/16] net: sched: drop ingress qdisc lock John Fastabend
