* [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc
@ 2014-04-30 16:34 John Fastabend
  2014-04-30 16:35 ` [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings John Fastabend
                   ` (14 more replies)
  0 siblings, 15 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:34 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

This series drops the qdisc lock that is currently protecting the
ingress qdisc. This can be done after the tcf filters are made
lockless and the statistics accounting is safe to run without locks.

To do this the classifiers are converted to use RCU. This requires
updating each classifier individually to handle the new copy/update
requirement and also updating the core list traversals. This is
done in patches 2-11. It also makes the assumption that updates
to the tables are infrequent in comparison to the packets per second
being classified. On a 10Gbps link running near line rate we can
easily produce 12+ million packets per second, so IMO this is a
reasonable assumption. And the updates are serialized by RTNL.
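
To make the copy/update requirement concrete, here is a rough sketch
of the pattern the classifier patches follow (the helpers match(),
modify() and filter_free_rcu() are placeholders, not the exact code
in the patches):

    /* reader: the classify() fast path, runs under the RCU read lock */
    list_for_each_entry_rcu(f, &head->flist, link) {
            if (match(skb, f)) {
                    *res = f->res;
                    return 0;
            }
    }

    /* updater: runs under RTNL, never in the fast path */
    fnew = kmalloc(sizeof(*fnew), GFP_KERNEL);
    *fnew = *fold;                          /* copy the existing filter */
    modify(fnew);                           /* apply the change to the copy */
    list_replace_rcu(&fold->link, &fnew->link);
    call_rcu(&fold->rcu, filter_free_rcu);  /* free after a grace period */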

In order to have working statistics, patches 13 and 14 convert the
bstats and qstats, which do accounting for bytes and packets, into
percpu variables, and the u64_stats_update_{begin|end} infrastructure
is used to maintain consistent 64bit statistics. Because these
statistics are also used by the estimators those function calls had
to be updated as well. So that I didn't have to modify all qdiscs at
this time, many of which don't have an easy path to becoming
lockless, the percpu statistics are only used when the TCQ_F_LLQDISC
flag is set. It's worth noting that in the mq and mqprio case
sub-qdiscs are already mapped 1:1 with TX queues, which tend to be
equal to the number of CPUs in the system, so it's not clear that
removing locking in these cases would provide any benefit. Most
likely a new qdisc written from scratch would be needed to implement
a mq-htb or mq-tbf qdisc.
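
To illustrate what the percpu accounting looks like, here is a rough
sketch (struct and function names below are illustrative only, they
do not necessarily match the names used in patches 13 and 14):

    struct qdisc_bstats_cpu {
            u64                     bytes;
            u64                     packets;
            struct u64_stats_sync   syncp;
    };

    /* per-packet update on the local cpu, no qdisc lock required */
    static void bstats_cpu_update(struct qdisc_bstats_cpu __percpu *pcpu,
                                  const struct sk_buff *skb)
    {
            struct qdisc_bstats_cpu *b = this_cpu_ptr(pcpu);

            u64_stats_update_begin(&b->syncp);
            b->bytes += qdisc_pkt_len(skb);
            b->packets++;
            u64_stats_update_end(&b->syncp);
    }

    /* dump path: sum over all cpus, retrying if a writer was active */
    for_each_possible_cpu(i) {
            struct qdisc_bstats_cpu *b = per_cpu_ptr(pcpu, i);
            unsigned int start;
            u64 bytes, packets;

            do {
                    start = u64_stats_fetch_begin(&b->syncp);
                    bytes = b->bytes;
                    packets = b->packets;
            } while (u64_stats_fetch_retry(&b->syncp, start));

            total_bytes += bytes;
            total_packets += packets;
    }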

For some history, I wrote what was basically these patches some
time ago and then got stalled working on other things. Cong Wang
made a proposal to remove the locking around the ingress qdisc,
which prompted me to get these patches working again.

I have done some basic testing on this series and do not see any
immediate splats or issues. I will continue doing some more testing
for the rest of the week before submitting without the RFC tag, but
any feedback would be good. If someone has a better idea for the
percpu union of gnet_stats_basic_* in struct Qdisc, or finds it
particularly ugly, that would be good to know. At this point I
believe the patch set is complete.

My test cases at this point cover all the filters with a
tight loop to add/remove filters, some basic estimator tests
where I add an estimator to the qdisc and verify the statistics
are accurate using pktgen, and finally a small script to
exercise the 'tc actions' interface. Feel free to send me more
tests off list and I can run them.

Performance numbers TBD (working on this).

Also there are still a few checkpatch warnings I need to resolve.

Future work:
  - provide metadata such as current cpu for the classifier
    to match on. this would allow for a multiqueue ingress
    qdisc strategy.
  - provide filter hook on egress before queue is selected
    to allow a classifier/action to pick the tx queue. This
    generalizes mqprio and should remove the need for many
    drivers to implement select_queue() callbacks.
  - create a variant of tbf that does not require the qdisc
    lock using eventually consistent counters.

---

John Fastabend (15):
      net: qdisc: use rcu prefix and silence sparse warnings
      net: rcu-ify tcf_proto
      net: sched: cls_basic use RCU
      net: sched: cls_cgroup use RCU
      net: sched: cls_flow use RCU
      net: sched: fw use RCU
      net: sched: RCU cls_route
      net: sched: RCU cls_tcindex
      net: sched: make cls_u32 lockless
      net: sched: rcu'ify cls_rsvp
      net: make cls_bpf rcu safe
      net: sched: make tc_action safe to walk under RCU
      net: sched: make bstats per cpu and estimator RCU safe
      net: sched: make qstats per cpu
      net: sched: drop ingress qdisc lock


 include/linux/netdevice.h  |   29 +----
 include/linux/rtnetlink.h  |   10 ++
 include/net/act_api.h      |    1 
 include/net/codel.h        |    4 -
 include/net/gen_stats.h    |   17 +++
 include/net/pkt_cls.h      |   12 ++
 include/net/sch_generic.h  |   93 +++++++++++++---
 net/core/dev.c             |   47 +++++++-
 net/core/gen_estimator.c   |   60 ++++++++--
 net/core/gen_stats.c       |   76 +++++++++++++
 net/netfilter/xt_RATEEST.c |    4 -
 net/sched/act_api.c        |   23 ++--
 net/sched/act_police.c     |    4 -
 net/sched/cls_api.c        |   47 ++++----
 net/sched/cls_basic.c      |   80 ++++++++------
 net/sched/cls_bpf.c        |   79 +++++++------
 net/sched/cls_cgroup.c     |   63 +++++++----
 net/sched/cls_flow.c       |  145 ++++++++++++++-----------
 net/sched/cls_fw.c         |  108 +++++++++++++-----
 net/sched/cls_route.c      |  221 ++++++++++++++++++++++----------------
 net/sched/cls_rsvp.h       |  152 +++++++++++++++-----------
 net/sched/cls_tcindex.c    |  240 +++++++++++++++++++++++++----------------
 net/sched/cls_u32.c        |  258 +++++++++++++++++++++++++++++---------------
 net/sched/sch_api.c        |   71 ++++++++++--
 net/sched/sch_atm.c        |   32 +++--
 net/sched/sch_cbq.c        |   40 ++++---
 net/sched/sch_choke.c      |   33 ++++--
 net/sched/sch_codel.c      |    2 
 net/sched/sch_drr.c        |   27 +++--
 net/sched/sch_dsmark.c     |   10 +-
 net/sched/sch_fifo.c       |    6 +
 net/sched/sch_fq.c         |    4 -
 net/sched/sch_fq_codel.c   |   19 ++-
 net/sched/sch_generic.c    |   20 +++
 net/sched/sch_gred.c       |   10 +-
 net/sched/sch_hfsc.c       |   47 +++++---
 net/sched/sch_hhf.c        |    8 +
 net/sched/sch_htb.c        |   45 +++++---
 net/sched/sch_ingress.c    |   20 +++
 net/sched/sch_mq.c         |   31 +++--
 net/sched/sch_mqprio.c     |   54 ++++++---
 net/sched/sch_multiq.c     |   18 ++-
 net/sched/sch_netem.c      |   17 ++-
 net/sched/sch_pie.c        |    8 +
 net/sched/sch_plug.c       |    2 
 net/sched/sch_prio.c       |   21 ++--
 net/sched/sch_qfq.c        |   29 +++--
 net/sched/sch_red.c        |   13 +-
 net/sched/sch_sfb.c        |   28 +++--
 net/sched/sch_sfq.c        |   30 +++--
 net/sched/sch_tbf.c        |   11 +-
 net/sched/sch_teql.c       |    9 +-
 52 files changed, 1552 insertions(+), 886 deletions(-)

-- 
Signature

* [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
@ 2014-04-30 16:35 ` John Fastabend
  2014-04-30 17:00   ` Eric Dumazet
  2014-05-14 19:39   ` John Fastabend
  2014-04-30 16:35 ` [RFC PATCH 02/15] net: rcu-ify tcf_proto John Fastabend
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:35 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

Add __rcu notation to qdisc handling; by doing this we can make
smatch output more legible. And anyway, some of the cases should
be using rcu_dereference(), see qdisc_all_tx_empty(),
qdisc_tx_changing(), and so on.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/netdevice.h |   29 ++++-------------------------
 include/net/sch_generic.h |   29 +++++++++++++++++++++++------
 net/core/dev.c            |   45 +++++++++++++++++++++++++++++++++++++++++++--
 net/sched/sch_generic.c   |    4 ++--
 net/sched/sch_mqprio.c    |    6 ++++--
 net/sched/sch_teql.c      |    9 +++++----
 6 files changed, 81 insertions(+), 41 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a803d79..4826988 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -546,7 +546,7 @@ struct netdev_queue {
  * read mostly part
  */
 	struct net_device	*dev;
-	struct Qdisc		*qdisc;
+	struct Qdisc __rcu	*qdisc;
 	struct Qdisc		*qdisc_sleeping;
 #ifdef CONFIG_SYSFS
 	struct kobject		kobj;
@@ -2138,12 +2138,7 @@ static inline void input_queue_tail_incr_save(struct softnet_data *sd,
 DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
 
 void __netif_schedule(struct Qdisc *q);
-
-static inline void netif_schedule_queue(struct netdev_queue *txq)
-{
-	if (!(txq->state & QUEUE_STATE_ANY_XOFF))
-		__netif_schedule(txq->qdisc);
-}
+void netif_schedule_queue(struct netdev_queue *txq);
 
 static inline void netif_tx_schedule_all(struct net_device *dev)
 {
@@ -2179,11 +2174,7 @@ static inline void netif_tx_start_all_queues(struct net_device *dev)
 	}
 }
 
-static inline void netif_tx_wake_queue(struct netdev_queue *dev_queue)
-{
-	if (test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state))
-		__netif_schedule(dev_queue->qdisc);
-}
+void netif_tx_wake_queue(struct netdev_queue *dev_queue);
 
 /**
  *	netif_wake_queue - restart transmit
@@ -2455,19 +2446,7 @@ static inline bool netif_subqueue_stopped(const struct net_device *dev,
 	return __netif_subqueue_stopped(dev, skb_get_queue_mapping(skb));
 }
 
-/**
- *	netif_wake_subqueue - allow sending packets on subqueue
- *	@dev: network device
- *	@queue_index: sub queue index
- *
- * Resume individual transmit queue of a device with multiple transmit queues.
- */
-static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
-{
-	struct netdev_queue *txq = netdev_get_tx_queue(dev, queue_index);
-	if (test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &txq->state))
-		__netif_schedule(txq->qdisc);
-}
+void netif_wake_subqueue(struct net_device *dev, u16 queue_index);
 
 #ifdef CONFIG_XPS
 int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d062f81..dc226c0 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -259,7 +259,9 @@ static inline spinlock_t *qdisc_lock(struct Qdisc *qdisc)
 
 static inline struct Qdisc *qdisc_root(const struct Qdisc *qdisc)
 {
-	return qdisc->dev_queue->qdisc;
+	struct Qdisc *q = rcu_dereference(qdisc->dev_queue->qdisc);
+
+	return q;
 }
 
 static inline struct Qdisc *qdisc_root_sleeping(const struct Qdisc *qdisc)
@@ -384,7 +386,7 @@ static inline void qdisc_reset_all_tx_gt(struct net_device *dev, unsigned int i)
 	struct Qdisc *qdisc;
 
 	for (; i < dev->num_tx_queues; i++) {
-		qdisc = netdev_get_tx_queue(dev, i)->qdisc;
+		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc);
 		if (qdisc) {
 			spin_lock_bh(qdisc_lock(qdisc));
 			qdisc_reset(qdisc);
@@ -402,13 +404,18 @@ static inline void qdisc_reset_all_tx(struct net_device *dev)
 static inline bool qdisc_all_tx_empty(const struct net_device *dev)
 {
 	unsigned int i;
+
+	rcu_read_lock();
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-		const struct Qdisc *q = txq->qdisc;
+		const struct Qdisc *q = rcu_dereference(txq->qdisc);
 
-		if (q->q.qlen)
+		if (q->q.qlen) {
+			rcu_read_unlock();
 			return false;
+		}
 	}
+	rcu_read_unlock();
 	return true;
 }
 
@@ -416,11 +423,16 @@ static inline bool qdisc_all_tx_empty(const struct net_device *dev)
 static inline bool qdisc_tx_changing(const struct net_device *dev)
 {
 	unsigned int i;
+
+	rcu_read_lock();
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-		if (txq->qdisc != txq->qdisc_sleeping)
+		if (rcu_dereference(txq->qdisc) != txq->qdisc_sleeping) {
+			rcu_read_unlock();
 			return true;
+		}
 	}
+	rcu_read_unlock();
 	return false;
 }
 
@@ -428,11 +440,16 @@ static inline bool qdisc_tx_changing(const struct net_device *dev)
 static inline bool qdisc_tx_is_noop(const struct net_device *dev)
 {
 	unsigned int i;
+
+	rcu_read_lock();
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-		if (txq->qdisc != &noop_qdisc)
+		if (rcu_dereference(txq->qdisc) != &noop_qdisc) {
+			rcu_read_unlock();
 			return false;
+		}
 	}
+	rcu_read_unlock();
 	return true;
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 11d70e3..5d88a89 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2159,6 +2159,47 @@ static struct dev_kfree_skb_cb *get_kfree_skb_cb(const struct sk_buff *skb)
 	return (struct dev_kfree_skb_cb *)skb->cb;
 }
 
+void netif_schedule_queue(struct netdev_queue *txq)
+{
+	rcu_read_lock();
+	if (!(txq->state & QUEUE_STATE_ANY_XOFF)) {
+		struct Qdisc *q = rcu_dereference(txq->qdisc);
+
+		__netif_schedule(q);
+	}
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(netif_schedule_queue);
+
+/**
+ *	netif_wake_subqueue - allow sending packets on subqueue
+ *	@dev: network device
+ *	@queue_index: sub queue index
+ *
+ * Resume individual transmit queue of a device with multiple transmit queues.
+ */
+void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
+{
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, queue_index);
+
+	if (test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &txq->state)) {
+		struct Qdisc *q = rcu_dereference(txq->qdisc);
+
+		__netif_schedule(q);
+	}
+}
+EXPORT_SYMBOL(netif_wake_subqueue);
+
+void netif_tx_wake_queue(struct netdev_queue *dev_queue)
+{
+	if (test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state)) {
+		struct Qdisc *q = rcu_dereference(dev_queue->qdisc);
+
+		__netif_schedule(q);
+	}
+}
+EXPORT_SYMBOL(netif_tx_wake_queue);
+
 void __dev_kfree_skb_irq(struct sk_buff *skb, enum skb_free_reason reason)
 {
 	unsigned long flags;
@@ -3401,7 +3442,7 @@ static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
 	skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl);
 	skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
 
-	q = rxq->qdisc;
+	q = rcu_dereference(rxq->qdisc);
 	if (q != &noop_qdisc) {
 		spin_lock(qdisc_lock(q));
 		if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
@@ -3418,7 +3459,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 {
 	struct netdev_queue *rxq = rcu_dereference(skb->dev->ingress_queue);
 
-	if (!rxq || rxq->qdisc == &noop_qdisc)
+	if (!rxq || rcu_dereference_bh(rxq->qdisc) == &noop_qdisc)
 		goto out;
 
 	if (*pt_prev) {
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e1543b0..cd34a49 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -778,7 +778,7 @@ static void dev_deactivate_queue(struct net_device *dev,
 	struct Qdisc *qdisc_default = _qdisc_default;
 	struct Qdisc *qdisc;
 
-	qdisc = dev_queue->qdisc;
+	qdisc = rtnl_dereference(dev_queue->qdisc);
 	if (qdisc) {
 		spin_lock_bh(qdisc_lock(qdisc));
 
@@ -871,7 +871,7 @@ static void dev_init_scheduler_queue(struct net_device *dev,
 {
 	struct Qdisc *qdisc = _qdisc;
 
-	dev_queue->qdisc = qdisc;
+	rcu_assign_pointer(dev_queue->qdisc, qdisc);
 	dev_queue->qdisc_sleeping = qdisc;
 }
 
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 6749e2f..37e7d25 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -231,7 +231,7 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
 	for (i = 0; i < dev->num_tx_queues; i++) {
-		qdisc = netdev_get_tx_queue(dev, i)->qdisc;
+		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc);
 		spin_lock_bh(qdisc_lock(qdisc));
 		sch->q.qlen		+= qdisc->q.qlen;
 		sch->bstats.bytes	+= qdisc->bstats.bytes;
@@ -340,7 +340,9 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 		spin_unlock_bh(d->lock);
 
 		for (i = tc.offset; i < tc.offset + tc.count; i++) {
-			qdisc = netdev_get_tx_queue(dev, i)->qdisc;
+			struct netdev_queue *q = netdev_get_tx_queue(dev, i);
+
+			qdisc = rtnl_dereference(q->qdisc);
 			spin_lock_bh(qdisc_lock(qdisc));
 			bstats.bytes      += qdisc->bstats.bytes;
 			bstats.packets    += qdisc->bstats.packets;
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index 4741671..b35ba70 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -100,7 +100,8 @@ teql_dequeue(struct Qdisc *sch)
 	skb = __skb_dequeue(&dat->q);
 	dat_queue = netdev_get_tx_queue(dat->m->dev, 0);
 	if (skb == NULL) {
-		struct net_device *m = qdisc_dev(dat_queue->qdisc);
+		struct Qdisc *q = rcu_dereference_bh(dat_queue->qdisc);
+		struct net_device *m = qdisc_dev(q);
 		if (m) {
 			dat->m->slaves = sch;
 			netif_wake_queue(m);
@@ -157,9 +158,9 @@ teql_destroy(struct Qdisc *sch)
 						txq = netdev_get_tx_queue(master->dev, 0);
 						master->slaves = NULL;
 
-						root_lock = qdisc_root_sleeping_lock(txq->qdisc);
+						root_lock = qdisc_root_sleeping_lock(rtnl_dereference(txq->qdisc));
 						spin_lock_bh(root_lock);
-						qdisc_reset(txq->qdisc);
+						qdisc_reset(rtnl_dereference(txq->qdisc));
 						spin_unlock_bh(root_lock);
 					}
 				}
@@ -266,7 +267,7 @@ static inline int teql_resolve(struct sk_buff *skb,
 	struct dst_entry *dst = skb_dst(skb);
 	int res;
 
-	if (txq->qdisc == &noop_qdisc)
+	if (rcu_dereference_bh(txq->qdisc) == &noop_qdisc)
 		return -ENODEV;
 
 	if (!dev->header_ops || !dst)

* [RFC PATCH 02/15] net: rcu-ify tcf_proto
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
  2014-04-30 16:35 ` [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings John Fastabend
@ 2014-04-30 16:35 ` John Fastabend
  2014-04-30 16:36 ` [RFC PATCH 03/15] net: sched: cls_basic use RCU John Fastabend
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:35 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

rcu'ify tcf_proto; this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.

This patch prepares the core net_sched infrastructure for running
the classifier/action chains without holding the qdisc lock; however,
it does nothing to ensure the cls_xxx and act_xxx types also work
without locking. Additional patches are required to address the
fallout.
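
For reference, the reader/updater split after this patch boils down
to the following (distilled from the hunks below, lightly simplified):

    /* reader side, e.g. tc_classify_compat(), holds only the RCU read lock */
    for (; tp; tp = rcu_dereference_bh(tp->next)) {
            if (tp->protocol != protocol &&
                tp->protocol != htons(ETH_P_ALL))
                    continue;
            err = tp->classify(skb, tp, res);
            if (err >= 0)
                    return err;
    }

    /* updater side, e.g. tc_ctl_tfilter(), runs under RTNL */
    rcu_assign_pointer(tp->next, rtnl_dereference(*back));
    rcu_assign_pointer(*back, tp);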

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |    5 +++--
 net/sched/cls_api.c       |   30 +++++++++++++++---------------
 net/sched/sch_api.c       |    6 +++---
 net/sched/sch_atm.c       |   30 +++++++++++++++++++-----------
 net/sched/sch_cbq.c       |   21 +++++++++++++++------
 net/sched/sch_choke.c     |   18 +++++++++++++-----
 net/sched/sch_drr.c       |   10 +++++++---
 net/sched/sch_dsmark.c    |    8 +++++---
 net/sched/sch_fq_codel.c  |   11 +++++++----
 net/sched/sch_hfsc.c      |   17 +++++++++++------
 net/sched/sch_htb.c       |   25 ++++++++++++++++---------
 net/sched/sch_ingress.c   |    8 +++++---
 net/sched/sch_multiq.c    |    8 +++++---
 net/sched/sch_prio.c      |   11 +++++++----
 net/sched/sch_qfq.c       |    9 ++++++---
 net/sched/sch_sfb.c       |   15 +++++++++------
 net/sched/sch_sfq.c       |   11 +++++++----
 17 files changed, 153 insertions(+), 90 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index dc226c0..20553c7 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -212,8 +212,8 @@ struct tcf_proto_ops {
 
 struct tcf_proto {
 	/* Fast access part */
-	struct tcf_proto	*next;
-	void			*root;
+	struct tcf_proto __rcu	*next;
+	void __rcu		*root;
 	int			(*classify)(struct sk_buff *,
 					    const struct tcf_proto *,
 					    struct tcf_result *);
@@ -225,6 +225,7 @@ struct tcf_proto {
 	struct Qdisc		*q;
 	void			*data;
 	const struct tcf_proto_ops	*ops;
+	struct rcu_head		rcu;
 };
 
 struct qdisc_skb_cb {
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 29a30a1..94c7d38 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -117,7 +117,6 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
 {
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tca[TCA_MAX + 1];
-	spinlock_t *root_lock;
 	struct tcmsg *t;
 	u32 protocol;
 	u32 prio;
@@ -125,7 +124,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
 	u32 parent;
 	struct net_device *dev;
 	struct Qdisc  *q;
-	struct tcf_proto **back, **chain;
+	struct tcf_proto __rcu **back;
+	struct tcf_proto __rcu **chain;
 	struct tcf_proto *tp;
 	const struct tcf_proto_ops *tp_ops;
 	const struct Qdisc_class_ops *cops;
@@ -196,7 +196,9 @@ replay:
 		goto errout;
 
 	/* Check the chain for existence of proto-tcf with this priority */
-	for (back = chain; (tp = *back) != NULL; back = &tp->next) {
+	for (back = chain;
+	     (tp = rtnl_dereference(*back)) != NULL;
+	     back = &tp->next) {
 		if (tp->prio >= prio) {
 			if (tp->prio == prio) {
 				if (!nprio ||
@@ -208,8 +210,6 @@ replay:
 		}
 	}
 
-	root_lock = qdisc_root_sleeping_lock(q);
-
 	if (tp == NULL) {
 		/* Proto-tcf does not exist, create new one */
 
@@ -258,7 +258,8 @@ replay:
 		}
 		tp->ops = tp_ops;
 		tp->protocol = protocol;
-		tp->prio = nprio ? : TC_H_MAJ(tcf_auto_prio(*back));
+		tp->prio = nprio ? :
+			       TC_H_MAJ(tcf_auto_prio(rtnl_dereference(*back)));
 		tp->q = q;
 		tp->classify = tp_ops->classify;
 		tp->classid = parent;
@@ -279,9 +280,9 @@ replay:
 
 	if (fh == 0) {
 		if (n->nlmsg_type == RTM_DELTFILTER && t->tcm_handle == 0) {
-			spin_lock_bh(root_lock);
-			*back = tp->next;
-			spin_unlock_bh(root_lock);
+			struct tcf_proto *next = rtnl_dereference(tp->next);
+
+			rcu_assign_pointer(*back, next);
 
 			tfilter_notify(net, skb, n, tp, fh, RTM_DELTFILTER);
 			tcf_destroy(tp);
@@ -320,10 +321,8 @@ replay:
 	err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh);
 	if (err == 0) {
 		if (tp_created) {
-			spin_lock_bh(root_lock);
-			tp->next = *back;
-			*back = tp;
-			spin_unlock_bh(root_lock);
+			rcu_assign_pointer(tp->next, rtnl_dereference(*back));
+			rcu_assign_pointer(*back, tp);
 		}
 		tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER);
 	} else {
@@ -418,7 +417,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 	int s_t;
 	struct net_device *dev;
 	struct Qdisc *q;
-	struct tcf_proto *tp, **chain;
+	struct tcf_proto *tp, __rcu **chain;
 	struct tcmsg *tcm = nlmsg_data(cb->nlh);
 	unsigned long cl = 0;
 	const struct Qdisc_class_ops *cops;
@@ -452,7 +451,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 
 	s_t = cb->args[0];
 
-	for (tp = *chain, t = 0; tp; tp = tp->next, t++) {
+	for (tp = rtnl_dereference(*chain), t = 0;
+	     tp; tp = rtnl_dereference(tp->next), t++) {
 		if (t < s_t)
 			continue;
 		if (TC_H_MAJ(tcm->tcm_info) &&
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index a0b84e0..30cc267 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1779,7 +1779,7 @@ int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
 	__be16 protocol = skb->protocol;
 	int err;
 
-	for (; tp; tp = tp->next) {
+	for (; tp; tp = rcu_dereference_bh(tp->next)) {
 		if (tp->protocol != protocol &&
 		    tp->protocol != htons(ETH_P_ALL))
 			continue;
@@ -1831,7 +1831,7 @@ void tcf_destroy(struct tcf_proto *tp)
 {
 	tp->ops->destroy(tp);
 	module_put(tp->ops->owner);
-	kfree(tp);
+	kfree_rcu(tp, rcu);
 }
 
 void tcf_destroy_chain(struct tcf_proto **fl)
@@ -1839,7 +1839,7 @@ void tcf_destroy_chain(struct tcf_proto **fl)
 	struct tcf_proto *tp;
 
 	while ((tp = *fl) != NULL) {
-		*fl = tp->next;
+		*fl = rtnl_dereference(tp->next);
 		tcf_destroy(tp);
 	}
 }
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index 8449b33..71e7e72 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -41,7 +41,7 @@
 
 struct atm_flow_data {
 	struct Qdisc		*q;	/* FIFO, TBF, etc. */
-	struct tcf_proto	*filter_list;
+	struct tcf_proto __rcu	*filter_list;
 	struct atm_vcc		*vcc;	/* VCC; NULL if VCC is closed */
 	void			(*old_pop)(struct atm_vcc *vcc,
 					   struct sk_buff *skb); /* chaining */
@@ -134,6 +134,7 @@ static void atm_tc_put(struct Qdisc *sch, unsigned long cl)
 {
 	struct atm_qdisc_data *p = qdisc_priv(sch);
 	struct atm_flow_data *flow = (struct atm_flow_data *)cl;
+	struct tcf_proto *fl;
 
 	pr_debug("atm_tc_put(sch %p,[qdisc %p],flow %p)\n", sch, p, flow);
 	if (--flow->ref)
@@ -142,7 +143,10 @@ static void atm_tc_put(struct Qdisc *sch, unsigned long cl)
 	list_del_init(&flow->list);
 	pr_debug("atm_tc_put: qdisc %p\n", flow->q);
 	qdisc_destroy(flow->q);
-	tcf_destroy_chain(&flow->filter_list);
+
+	fl = rtnl_dereference(flow->filter_list);
+	tcf_destroy_chain(&fl);
+
 	if (flow->sock) {
 		pr_debug("atm_tc_put: f_count %ld\n",
 			file_count(flow->sock->file));
@@ -273,7 +277,7 @@ static int atm_tc_change(struct Qdisc *sch, u32 classid, u32 parent,
 		error = -ENOBUFS;
 		goto err_out;
 	}
-	flow->filter_list = NULL;
+	RCU_INIT_POINTER(flow->filter_list, NULL);
 	flow->q = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops, classid);
 	if (!flow->q)
 		flow->q = &noop_qdisc;
@@ -311,7 +315,7 @@ static int atm_tc_delete(struct Qdisc *sch, unsigned long arg)
 	pr_debug("atm_tc_delete(sch %p,[qdisc %p],flow %p)\n", sch, p, flow);
 	if (list_empty(&flow->list))
 		return -EINVAL;
-	if (flow->filter_list || flow == &p->link)
+	if (rtnl_dereference(flow->filter_list) || flow == &p->link)
 		return -EBUSY;
 	/*
 	 * Reference count must be 2: one for "keepalive" (set at class
@@ -369,11 +373,12 @@ static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	flow = NULL;
 	if (TC_H_MAJ(skb->priority) != sch->handle ||
 	    !(flow = (struct atm_flow_data *)atm_tc_get(sch, skb->priority))) {
+		struct tcf_proto *fl;
+
 		list_for_each_entry(flow, &p->flows, list) {
-			if (flow->filter_list) {
-				result = tc_classify_compat(skb,
-							    flow->filter_list,
-							    &res);
+			fl = rcu_dereference_bh(flow->filter_list);
+			if (fl) {
+				result = tc_classify_compat(skb, fl, &res);
 				if (result < 0)
 					continue;
 				flow = (struct atm_flow_data *)res.class;
@@ -544,7 +549,7 @@ static int atm_tc_init(struct Qdisc *sch, struct nlattr *opt)
 	if (!p->link.q)
 		p->link.q = &noop_qdisc;
 	pr_debug("atm_tc_init: link (%p) qdisc %p\n", &p->link, p->link.q);
-	p->link.filter_list = NULL;
+	RCU_INIT_POINTER(p->link.filter_list, NULL);
 	p->link.vcc = NULL;
 	p->link.sock = NULL;
 	p->link.classid = sch->handle;
@@ -568,10 +573,13 @@ static void atm_tc_destroy(struct Qdisc *sch)
 {
 	struct atm_qdisc_data *p = qdisc_priv(sch);
 	struct atm_flow_data *flow, *tmp;
+	struct tcf_proto *fl;
 
 	pr_debug("atm_tc_destroy(sch %p,[qdisc %p])\n", sch, p);
-	list_for_each_entry(flow, &p->flows, list)
-		tcf_destroy_chain(&flow->filter_list);
+	list_for_each_entry(flow, &p->flows, list) {
+		fl = rtnl_dereference(flow->filter_list);
+		tcf_destroy_chain(&fl);
+	}
 
 	list_for_each_entry_safe(flow, tmp, &p->flows, list) {
 		if (flow->ref > 1)
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index ead5264..d0a6e60 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -133,7 +133,7 @@ struct cbq_class {
 	struct gnet_stats_rate_est64 rate_est;
 	struct tc_cbq_xstats	xstats;
 
-	struct tcf_proto	*filter_list;
+	struct tcf_proto __rcu	*filter_list;
 
 	int			refcnt;
 	int			filters;
@@ -222,6 +222,7 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 	struct cbq_class **defmap;
 	struct cbq_class *cl = NULL;
 	u32 prio = skb->priority;
+	struct tcf_proto *fl;
 	struct tcf_result res;
 
 	/*
@@ -234,13 +235,15 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	for (;;) {
 		int result = 0;
+
 		defmap = head->defaults;
 
+		fl = rcu_dereference_bh(head->filter_list);
 		/*
 		 * Step 2+n. Apply classifier.
 		 */
-		if (!head->filter_list ||
-		    (result = tc_classify_compat(skb, head->filter_list, &res)) < 0)
+		result = tc_classify_compat(skb, fl, &res);
+		if (!fl || result < 0)
 			goto fallback;
 
 		cl = (void *)res.class;
@@ -1683,10 +1686,13 @@ static unsigned long cbq_get(struct Qdisc *sch, u32 classid)
 static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl)
 {
 	struct cbq_sched_data *q = qdisc_priv(sch);
+	struct tcf_proto *fl;
 
 	WARN_ON(cl->filters);
 
-	tcf_destroy_chain(&cl->filter_list);
+	fl = rtnl_dereference(cl->filter_list);
+	tcf_destroy_chain(&fl);
+
 	qdisc_destroy(cl->q);
 	qdisc_put_rtab(cl->R_tab);
 	gen_kill_estimator(&cl->bstats, &cl->rate_est);
@@ -1699,6 +1705,7 @@ static void cbq_destroy(struct Qdisc *sch)
 	struct cbq_sched_data *q = qdisc_priv(sch);
 	struct hlist_node *next;
 	struct cbq_class *cl;
+	struct tcf_proto *fl;
 	unsigned int h;
 
 #ifdef CONFIG_NET_CLS_ACT
@@ -1710,8 +1717,10 @@ static void cbq_destroy(struct Qdisc *sch)
 	 * be bound to classes which have been destroyed already. --TGR '04
 	 */
 	for (h = 0; h < q->clhash.hashsize; h++) {
-		hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode)
-			tcf_destroy_chain(&cl->filter_list);
+		hlist_for_each_entry(cl, &q->clhash.hash[h], common.hnode) {
+			fl = rtnl_dereference(cl->filter_list);
+			tcf_destroy_chain(&fl);
+		}
 	}
 	for (h = 0; h < q->clhash.hashsize; h++) {
 		hlist_for_each_entry_safe(cl, next, &q->clhash.hash[h],
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 2aee028..31b2d75 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -57,7 +57,7 @@ struct choke_sched_data {
 
 /* Variables */
 	struct red_vars  vars;
-	struct tcf_proto *filter_list;
+	struct tcf_proto __rcu *filter_list;
 	struct {
 		u32	prob_drop;	/* Early probability drops */
 		u32	prob_mark;	/* Early probability marks */
@@ -193,9 +193,11 @@ static bool choke_classify(struct sk_buff *skb,
 {
 	struct choke_sched_data *q = qdisc_priv(sch);
 	struct tcf_result res;
+	struct tcf_proto *fl;
 	int result;
 
-	result = tc_classify(skb, q->filter_list, &res);
+	fl = rcu_dereference_bh(q->filter_list);
+	result = tc_classify(skb, fl, &res);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -244,12 +246,14 @@ static bool choke_match_random(const struct choke_sched_data *q,
 			       unsigned int *pidx)
 {
 	struct sk_buff *oskb;
+	struct tcf_proto *fl;
 
 	if (q->head == q->tail)
 		return false;
 
 	oskb = choke_peek_random(q, pidx);
-	if (q->filter_list)
+	fl = rcu_dereference_bh(q->filter_list);
+	if (fl)
 		return choke_get_classid(nskb) == choke_get_classid(oskb);
 
 	return choke_match_flow(oskb, nskb);
@@ -259,9 +263,11 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct choke_sched_data *q = qdisc_priv(sch);
 	const struct red_parms *p = &q->parms;
+	struct tcf_proto *fl;
 	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 
-	if (q->filter_list) {
+	fl = rcu_dereference_bh(q->filter_list);
+	if (fl) {
 		/* If using external classifiers, get result and record it. */
 		if (!choke_classify(skb, sch, &ret))
 			goto other_drop;	/* Packet was eaten by filter */
@@ -534,8 +540,10 @@ static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
 static void choke_destroy(struct Qdisc *sch)
 {
 	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tcf_proto *fl;
 
-	tcf_destroy_chain(&q->filter_list);
+	fl = rtnl_dereference(q->filter_list);
+	tcf_destroy_chain(&fl);
 	choke_free(q->tab);
 }
 
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 8302717..899026f 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -35,7 +35,7 @@ struct drr_class {
 
 struct drr_sched {
 	struct list_head		active;
-	struct tcf_proto		*filter_list;
+	struct tcf_proto __rcu		*filter_list;
 	struct Qdisc_class_hash		clhash;
 };
 
@@ -319,6 +319,7 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
 	struct drr_sched *q = qdisc_priv(sch);
 	struct drr_class *cl;
 	struct tcf_result res;
+	struct tcf_proto *fl;
 	int result;
 
 	if (TC_H_MAJ(skb->priority ^ sch->handle) == 0) {
@@ -328,7 +329,8 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	result = tc_classify(skb, q->filter_list, &res);
+	fl = rcu_dereference_bh(q->filter_list);
+	result = tc_classify(skb, fl, &res);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -467,9 +469,11 @@ static void drr_destroy_qdisc(struct Qdisc *sch)
 	struct drr_sched *q = qdisc_priv(sch);
 	struct drr_class *cl;
 	struct hlist_node *next;
+	struct tcf_proto *fl;
 	unsigned int i;
 
-	tcf_destroy_chain(&q->filter_list);
+	fl = rtnl_dereference(q->filter_list);
+	tcf_destroy_chain(&fl);
 
 	for (i = 0; i < q->clhash.hashsize; i++) {
 		hlist_for_each_entry_safe(cl, next, &q->clhash.hash[i],
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 49d6ef3..6b84bd4 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -37,7 +37,7 @@
 
 struct dsmark_qdisc_data {
 	struct Qdisc		*q;
-	struct tcf_proto	*filter_list;
+	struct tcf_proto __rcu	*filter_list;
 	u8			*mask;	/* "owns" the array */
 	u8			*value;
 	u16			indices;
@@ -229,7 +229,8 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		skb->tc_index = TC_H_MIN(skb->priority);
 	else {
 		struct tcf_result res;
-		int result = tc_classify(skb, p->filter_list, &res);
+		struct tcf_proto *fl = rcu_dereference_bh(p->filter_list);
+		int result = tc_classify(skb, fl, &res);
 
 		pr_debug("result %d class 0x%04x\n", result, res.classid);
 
@@ -404,10 +405,11 @@ static void dsmark_reset(struct Qdisc *sch)
 static void dsmark_destroy(struct Qdisc *sch)
 {
 	struct dsmark_qdisc_data *p = qdisc_priv(sch);
+	struct tcf_proto *fl = rtnl_dereference(p->filter_list);
 
 	pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p);
 
-	tcf_destroy_chain(&p->filter_list);
+	tcf_destroy_chain(&fl);
 	qdisc_destroy(p->q);
 	kfree(p->mask);
 }
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 0bf432c..d37eb06 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -52,7 +52,7 @@ struct fq_codel_flow {
 }; /* please try to keep this structure <= 64 bytes */
 
 struct fq_codel_sched_data {
-	struct tcf_proto *filter_list;	/* optional external classifier */
+	struct tcf_proto __rcu *filter_list; /* optional external classifier */
 	struct fq_codel_flow *flows;	/* Flows table [flows_cnt] */
 	u32		*backlogs;	/* backlog table [flows_cnt] */
 	u32		flows_cnt;	/* number of flows */
@@ -84,6 +84,7 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
 				      int *qerr)
 {
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct tcf_proto *filter;
 	struct tcf_result res;
 	int result;
 
@@ -92,11 +93,12 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
 	    TC_H_MIN(skb->priority) <= q->flows_cnt)
 		return TC_H_MIN(skb->priority);
 
-	if (!q->filter_list)
+	filter = rcu_dereference(q->filter_list);
+	if (!filter)
 		return fq_codel_hash(q, skb) + 1;
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	result = tc_classify(skb, q->filter_list, &res);
+	result = tc_classify(skb, filter, &res);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -376,8 +378,9 @@ static void fq_codel_free(void *addr)
 static void fq_codel_destroy(struct Qdisc *sch)
 {
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct tcf_proto *filter = rtnl_dereference(q->filter_list);
 
-	tcf_destroy_chain(&q->filter_list);
+	tcf_destroy_chain(&filter);
 	fq_codel_free(q->backlogs);
 	fq_codel_free(q->flows);
 }
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index ec8aeaa..95689d3 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -116,7 +116,7 @@ struct hfsc_class {
 	struct gnet_stats_queue qstats;
 	struct gnet_stats_rate_est64 rate_est;
 	unsigned int	level;		/* class level in hierarchy */
-	struct tcf_proto *filter_list;	/* filter list */
+	struct tcf_proto __rcu *filter_list; /* filter list */
 	unsigned int	filter_cnt;	/* filter count */
 
 	struct hfsc_sched *sched;	/* scheduler data */
@@ -1110,8 +1110,10 @@ static void
 hfsc_destroy_class(struct Qdisc *sch, struct hfsc_class *cl)
 {
 	struct hfsc_sched *q = qdisc_priv(sch);
+	struct tcf_proto *fl;
 
-	tcf_destroy_chain(&cl->filter_list);
+	fl = rtnl_dereference(cl->filter_list);
+	tcf_destroy_chain(&fl);
 	qdisc_destroy(cl->qdisc);
 	gen_kill_estimator(&cl->bstats, &cl->rate_est);
 	if (cl != &q->root)
@@ -1161,7 +1163,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	head = &q->root;
-	tcf = q->root.filter_list;
+	tcf = rcu_dereference_bh(q->root.filter_list);
 	while (tcf && (result = tc_classify(skb, tcf, &res)) >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -1185,7 +1187,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 			return cl; /* hit leaf class */
 
 		/* apply inner filter chain */
-		tcf = cl->filter_list;
+		tcf = rcu_dereference_bh(cl->filter_list);
 		head = cl;
 	}
 
@@ -1539,11 +1541,14 @@ hfsc_destroy_qdisc(struct Qdisc *sch)
 	struct hfsc_sched *q = qdisc_priv(sch);
 	struct hlist_node *next;
 	struct hfsc_class *cl;
+	struct tcf_proto *fl;
 	unsigned int i;
 
 	for (i = 0; i < q->clhash.hashsize; i++) {
-		hlist_for_each_entry(cl, &q->clhash.hash[i], cl_common.hnode)
-			tcf_destroy_chain(&cl->filter_list);
+		hlist_for_each_entry(cl, &q->clhash.hash[i], cl_common.hnode) {
+			fl = rtnl_dereference(cl->filter_list);
+			tcf_destroy_chain(&fl);
+		}
 	}
 	for (i = 0; i < q->clhash.hashsize; i++) {
 		hlist_for_each_entry_safe(cl, next, &q->clhash.hash[i],
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 9f949ab..2c2a6aa 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -103,7 +103,7 @@ struct htb_class {
 	u32			prio;		/* these two are used only by leaves... */
 	int			quantum;	/* but stored for parent-to-leaf return */
 
-	struct tcf_proto	*filter_list;	/* class attached filters */
+	struct tcf_proto __rcu	*filter_list;	/* class attached filters */
 	int			filter_cnt;
 	int			refcnt;		/* usage count of this class */
 
@@ -153,7 +153,7 @@ struct htb_sched {
 	int			rate2quantum;	/* quant = rate / rate2quantum */
 
 	/* filters for qdisc itself */
-	struct tcf_proto	*filter_list;
+	struct tcf_proto __rcu	*filter_list;
 
 #define HTB_WARN_TOOMANYEVENTS	0x1
 	unsigned int		warned;	/* only one warning */
@@ -223,9 +223,9 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
 		if (cl->level == 0)
 			return cl;
 		/* Start with inner filter chain if a non-leaf class is selected */
-		tcf = cl->filter_list;
+		tcf = rcu_dereference_bh(cl->filter_list);
 	} else {
-		tcf = q->filter_list;
+		tcf = rcu_dereference_bh(q->filter_list);
 	}
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
@@ -251,7 +251,7 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
 			return cl;	/* we hit leaf; return it */
 
 		/* we have got inner class; apply inner filter chain */
-		tcf = cl->filter_list;
+		tcf = rcu_dereference_bh(cl->filter_list);
 	}
 	/* classification failed; try to use default class */
 	cl = htb_find(TC_H_MAKE(TC_H_MAJ(sch->handle), q->defcls), sch);
@@ -1231,12 +1231,15 @@ static void htb_parent_to_leaf(struct htb_sched *q, struct htb_class *cl,
 
 static void htb_destroy_class(struct Qdisc *sch, struct htb_class *cl)
 {
+	struct tcf_proto *fl;
+
 	if (!cl->level) {
 		WARN_ON(!cl->un.leaf.q);
 		qdisc_destroy(cl->un.leaf.q);
 	}
 	gen_kill_estimator(&cl->bstats, &cl->rate_est);
-	tcf_destroy_chain(&cl->filter_list);
+	fl = rtnl_dereference(cl->filter_list);
+	tcf_destroy_chain(&fl);
 	kfree(cl);
 }
 
@@ -1245,6 +1248,7 @@ static void htb_destroy(struct Qdisc *sch)
 	struct htb_sched *q = qdisc_priv(sch);
 	struct hlist_node *next;
 	struct htb_class *cl;
+	struct tcf_proto *fl;
 	unsigned int i;
 
 	cancel_work_sync(&q->work);
@@ -1254,11 +1258,14 @@ static void htb_destroy(struct Qdisc *sch)
 	 * because filter need its target class alive to be able to call
 	 * unbind_filter on it (without Oops).
 	 */
-	tcf_destroy_chain(&q->filter_list);
+	fl = rtnl_dereference(q->filter_list);
+	tcf_destroy_chain(&fl);
 
 	for (i = 0; i < q->clhash.hashsize; i++) {
-		hlist_for_each_entry(cl, &q->clhash.hash[i], common.hnode)
-			tcf_destroy_chain(&cl->filter_list);
+		hlist_for_each_entry(cl, &q->clhash.hash[i], common.hnode) {
+			fl = rtnl_dereference(cl->filter_list);
+			tcf_destroy_chain(&fl);
+		}
 	}
 	for (i = 0; i < q->clhash.hashsize; i++) {
 		hlist_for_each_entry_safe(cl, next, &q->clhash.hash[i],
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 62871c1..be10333d 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -17,7 +17,7 @@
 
 
 struct ingress_qdisc_data {
-	struct tcf_proto	*filter_list;
+	struct tcf_proto __rcu	*filter_list;
 };
 
 /* ------------------------- Class/flow operations ------------------------- */
@@ -59,9 +59,10 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct ingress_qdisc_data *p = qdisc_priv(sch);
 	struct tcf_result res;
+	struct tcf_proto *fl = rcu_dereference_bh(p->filter_list);
 	int result;
 
-	result = tc_classify(skb, p->filter_list, &res);
+	result = tc_classify(skb, fl, &res);
 
 	qdisc_bstats_update(sch, skb);
 	switch (result) {
@@ -89,8 +90,9 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 static void ingress_destroy(struct Qdisc *sch)
 {
 	struct ingress_qdisc_data *p = qdisc_priv(sch);
+	struct tcf_proto *fl = rtnl_dereference(p->filter_list);
 
-	tcf_destroy_chain(&p->filter_list);
+	tcf_destroy_chain(&fl);
 }
 
 static int ingress_dump(struct Qdisc *sch, struct sk_buff *skb)
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index afb050a..169d637 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -31,7 +31,7 @@ struct multiq_sched_data {
 	u16 bands;
 	u16 max_bands;
 	u16 curband;
-	struct tcf_proto *filter_list;
+	struct tcf_proto __rcu *filter_list;
 	struct Qdisc **queues;
 };
 
@@ -42,10 +42,11 @@ multiq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 	struct multiq_sched_data *q = qdisc_priv(sch);
 	u32 band;
 	struct tcf_result res;
+	struct tcf_proto *fl = rcu_dereference_bh(q->filter_list);
 	int err;
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	err = tc_classify(skb, q->filter_list, &res);
+	err = tc_classify(skb, fl, &res);
 #ifdef CONFIG_NET_CLS_ACT
 	switch (err) {
 	case TC_ACT_STOLEN:
@@ -188,8 +189,9 @@ multiq_destroy(struct Qdisc *sch)
 {
 	int band;
 	struct multiq_sched_data *q = qdisc_priv(sch);
+	struct tcf_proto *fl = rtnl_dereference(q->filter_list);
 
-	tcf_destroy_chain(&q->filter_list);
+	tcf_destroy_chain(&fl);
 	for (band = 0; band < q->bands; band++)
 		qdisc_destroy(q->queues[band]);
 
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 79359b6..43aa35d 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -24,7 +24,7 @@
 
 struct prio_sched_data {
 	int bands;
-	struct tcf_proto *filter_list;
+	struct tcf_proto __rcu *filter_list;
 	u8  prio2band[TC_PRIO_MAX+1];
 	struct Qdisc *queues[TCQ_PRIO_BANDS];
 };
@@ -36,11 +36,13 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 	struct prio_sched_data *q = qdisc_priv(sch);
 	u32 band = skb->priority;
 	struct tcf_result res;
+	struct tcf_proto *fl;
 	int err;
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	if (TC_H_MAJ(skb->priority) != sch->handle) {
-		err = tc_classify(skb, q->filter_list, &res);
+		fl = rcu_dereference_bh(q->filter_list);
+		err = tc_classify(skb, fl, &res);
 #ifdef CONFIG_NET_CLS_ACT
 		switch (err) {
 		case TC_ACT_STOLEN:
@@ -50,7 +52,7 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 			return NULL;
 		}
 #endif
-		if (!q->filter_list || err < 0) {
+		if (!fl || err < 0) {
 			if (TC_H_MAJ(band))
 				band = 0;
 			return q->queues[q->prio2band[band & TC_PRIO_MAX]];
@@ -157,8 +159,9 @@ prio_destroy(struct Qdisc *sch)
 {
 	int prio;
 	struct prio_sched_data *q = qdisc_priv(sch);
+	struct tcf_proto *fl = rtnl_dereference(q->filter_list);
 
-	tcf_destroy_chain(&q->filter_list);
+	tcf_destroy_chain(&fl);
 	for (prio = 0; prio < q->bands; prio++)
 		qdisc_destroy(q->queues[prio]);
 }
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 8056fb4..cee5618 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -181,7 +181,7 @@ struct qfq_group {
 };
 
 struct qfq_sched {
-	struct tcf_proto *filter_list;
+	struct tcf_proto __rcu *filter_list;
 	struct Qdisc_class_hash clhash;
 
 	u64			oldV, V;	/* Precise virtual times. */
@@ -704,6 +704,7 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 	struct qfq_sched *q = qdisc_priv(sch);
 	struct qfq_class *cl;
 	struct tcf_result res;
+	struct tcf_proto *fl;
 	int result;
 
 	if (TC_H_MAJ(skb->priority ^ sch->handle) == 0) {
@@ -714,7 +715,8 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	result = tc_classify(skb, q->filter_list, &res);
+	fl = rcu_dereference_bh(q->filter_list);
+	result = tc_classify(skb, fl, &res);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -1524,9 +1526,10 @@ static void qfq_destroy_qdisc(struct Qdisc *sch)
 	struct qfq_sched *q = qdisc_priv(sch);
 	struct qfq_class *cl;
 	struct hlist_node *next;
+	struct tcf_proto *fl = rtnl_dereference(q->filter_list);
 	unsigned int i;
 
-	tcf_destroy_chain(&q->filter_list);
+	tcf_destroy_chain(&fl);
 
 	for (i = 0; i < q->clhash.hashsize; i++) {
 		hlist_for_each_entry_safe(cl, next, &q->clhash.hash[i],
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 9b0f709..5938d00 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -55,7 +55,7 @@ struct sfb_bins {
 
 struct sfb_sched_data {
 	struct Qdisc	*qdisc;
-	struct tcf_proto *filter_list;
+	struct tcf_proto __rcu *filter_list;
 	unsigned long	rehash_interval;
 	unsigned long	warmup_time;	/* double buffering warmup time in jiffies */
 	u32		max;
@@ -253,13 +253,13 @@ static bool sfb_rate_limit(struct sk_buff *skb, struct sfb_sched_data *q)
 	return false;
 }
 
-static bool sfb_classify(struct sk_buff *skb, struct sfb_sched_data *q,
+static bool sfb_classify(struct sk_buff *skb, struct tcf_proto *fl,
 			 int *qerr, u32 *salt)
 {
 	struct tcf_result res;
 	int result;
 
-	result = tc_classify(skb, q->filter_list, &res);
+	result = tc_classify(skb, fl, &res);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -281,6 +281,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	struct sfb_sched_data *q = qdisc_priv(sch);
 	struct Qdisc *child = q->qdisc;
+	struct tcf_proto *fl;
 	int i;
 	u32 p_min = ~0;
 	u32 minqlen = ~0;
@@ -306,9 +307,10 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		}
 	}
 
-	if (q->filter_list) {
+	fl = rcu_dereference_bh(q->filter_list);
+	if (fl) {
 		/* If using external classifiers, get result and record it. */
-		if (!sfb_classify(skb, q, &ret, &salt))
+		if (!sfb_classify(skb, fl, &ret, &salt))
 			goto other_drop;
 		keys.src = salt;
 		keys.dst = 0;
@@ -465,8 +467,9 @@ static void sfb_reset(struct Qdisc *sch)
 static void sfb_destroy(struct Qdisc *sch)
 {
 	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct tcf_proto *fl = rtnl_dereference(q->filter_list);
 
-	tcf_destroy_chain(&q->filter_list);
+	tcf_destroy_chain(&fl);
 	qdisc_destroy(q->qdisc);
 }
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 87317ff..13ec4eb 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -125,7 +125,7 @@ struct sfq_sched_data {
 	u8		cur_depth;	/* depth of longest slot */
 	u8		flags;
 	unsigned short  scaled_quantum; /* SFQ_ALLOT_SIZE(quantum) */
-	struct tcf_proto *filter_list;
+	struct tcf_proto __rcu *filter_list;
 	sfq_index	*ht;		/* Hash table ('divisor' slots) */
 	struct sfq_slot	*slots;		/* Flows table ('maxflows' entries) */
 
@@ -187,6 +187,7 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	struct tcf_result res;
+	struct tcf_proto *fl;
 	int result;
 
 	if (TC_H_MAJ(skb->priority) == sch->handle &&
@@ -194,13 +195,14 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 	    TC_H_MIN(skb->priority) <= q->divisor)
 		return TC_H_MIN(skb->priority);
 
-	if (!q->filter_list) {
+	fl = rcu_dereference_bh(q->filter_list);
+	if (!fl) {
 		skb_flow_dissect(skb, &sfq_skb_cb(skb)->keys);
 		return sfq_hash(q, skb) + 1;
 	}
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	result = tc_classify(skb, q->filter_list, &res);
+	result = tc_classify(skb, fl, &res);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -727,8 +729,9 @@ static void sfq_free(void *addr)
 static void sfq_destroy(struct Qdisc *sch)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
+	struct tcf_proto *fl = rtnl_dereference(q->filter_list);
 
-	tcf_destroy_chain(&q->filter_list);
+	tcf_destroy_chain(&fl);
 	q->perturb_period = 0;
 	del_timer_sync(&q->perturb_timer);
 	sfq_free(q->ht);

* [RFC PATCH 03/15] net: sched: cls_basic use RCU
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
  2014-04-30 16:35 ` [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings John Fastabend
  2014-04-30 16:35 ` [RFC PATCH 02/15] net: rcu-ify tcf_proto John Fastabend
@ 2014-04-30 16:36 ` John Fastabend
  2014-04-30 16:36 ` [RFC PATCH 04/15] net: sched: cls_cgroup " John Fastabend
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:36 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

Enable basic classifier for RCU.

Dereferencing tp->root may look a bit strange here, but it is needed
by my accounting because it is allocated at init time and needs to
be kfree'd at destroy time. However, because it may be referenced in
the classify() path we must wait an RCU grace period before freeing
it. We use kfree_rcu() and the rcu_ APIs to enforce this. This
pattern is used in all the classifiers.

Also the hgenerator can be incremented without concern because it
is always incremented under RTNL.
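
For clarity, the tp->root lifecycle described above reduces to the
following (distilled from the hunks below):

    /* init: allocate the list head and publish it */
    head = kzalloc(sizeof(*head), GFP_KERNEL);
    INIT_LIST_HEAD(&head->flist);
    rcu_assign_pointer(tp->root, head);

    /* classify: may run concurrently with destroy, so use RCU accessors */
    head = rcu_dereference_bh(tp->root);

    /* destroy: unpublish, then free only after a grace period */
    RCU_INIT_POINTER(tp->root, NULL);
    kfree_rcu(head, rcu);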

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_basic.c |   80 ++++++++++++++++++++++++++++---------------------
 1 file changed, 45 insertions(+), 35 deletions(-)

diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index e98ca99..9b3ca1d 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -24,6 +24,7 @@
 struct basic_head {
 	u32			hgenerator;
 	struct list_head	flist;
+	struct rcu_head		rcu;
 };
 
 struct basic_filter {
@@ -31,17 +32,19 @@ struct basic_filter {
 	struct tcf_exts		exts;
 	struct tcf_ematch_tree	ematches;
 	struct tcf_result	res;
+	struct tcf_proto	*tp;
 	struct list_head	link;
+	struct rcu_head		rcu;
 };
 
 static int basic_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			  struct tcf_result *res)
 {
 	int r;
-	struct basic_head *head = tp->root;
+	struct basic_head *head = rcu_dereference_bh(tp->root);
 	struct basic_filter *f;
 
-	list_for_each_entry(f, &head->flist, link) {
+	list_for_each_entry_rcu(f, &head->flist, link) {
 		if (!tcf_em_tree_match(skb, &f->ematches, NULL))
 			continue;
 		*res = f->res;
@@ -56,7 +59,7 @@ static int basic_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 static unsigned long basic_get(struct tcf_proto *tp, u32 handle)
 {
 	unsigned long l = 0UL;
-	struct basic_head *head = tp->root;
+	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f;
 
 	if (head == NULL)
@@ -81,12 +84,15 @@ static int basic_init(struct tcf_proto *tp)
 	if (head == NULL)
 		return -ENOBUFS;
 	INIT_LIST_HEAD(&head->flist);
-	tp->root = head;
+	rcu_assign_pointer(tp->root, head);
 	return 0;
 }
 
-static void basic_delete_filter(struct tcf_proto *tp, struct basic_filter *f)
+static void basic_delete_filter(struct rcu_head *head)
 {
+	struct basic_filter *f = container_of(head, struct basic_filter, rcu);
+	struct tcf_proto *tp = f->tp;
+
 	tcf_unbind_filter(tp, &f->res);
 	tcf_exts_destroy(tp, &f->exts);
 	tcf_em_tree_destroy(tp, &f->ematches);
@@ -95,27 +101,26 @@ static void basic_delete_filter(struct tcf_proto *tp, struct basic_filter *f)
 
 static void basic_destroy(struct tcf_proto *tp)
 {
-	struct basic_head *head = tp->root;
+	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f, *n;
 
 	list_for_each_entry_safe(f, n, &head->flist, link) {
-		list_del(&f->link);
-		basic_delete_filter(tp, f);
+		list_del_rcu(&f->link);
+		call_rcu(&f->rcu, basic_delete_filter);
 	}
-	kfree(head);
+	RCU_INIT_POINTER(tp->root, NULL);
+	kfree_rcu(head, rcu);
 }
 
 static int basic_delete(struct tcf_proto *tp, unsigned long arg)
 {
-	struct basic_head *head = tp->root;
+	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *t, *f = (struct basic_filter *) arg;
 
 	list_for_each_entry(t, &head->flist, link)
 		if (t == f) {
-			tcf_tree_lock(tp);
-			list_del(&t->link);
-			tcf_tree_unlock(tp);
-			basic_delete_filter(tp, t);
+			list_del_rcu(&t->link);
+			call_rcu(&t->rcu, basic_delete_filter);
 			return 0;
 		}
 
@@ -152,6 +157,7 @@ static int basic_set_parms(struct net *net, struct tcf_proto *tp,
 
 	tcf_exts_change(tp, &f->exts, &e);
 	tcf_em_tree_change(tp, &f->ematches, &t);
+	f->tp = tp;
 
 	return 0;
 errout:
@@ -164,9 +170,10 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
 			struct nlattr **tca, unsigned long *arg)
 {
 	int err;
-	struct basic_head *head = tp->root;
+	struct basic_head *head = rtnl_dereference(tp->root);
 	struct nlattr *tb[TCA_BASIC_MAX + 1];
-	struct basic_filter *f = (struct basic_filter *) *arg;
+	struct basic_filter *fold = (struct basic_filter *) *arg;
+	struct basic_filter *fnew;
 
 	if (tca[TCA_OPTIONS] == NULL)
 		return -EINVAL;
@@ -176,22 +183,23 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
 	if (err < 0)
 		return err;
 
-	if (f != NULL) {
-		if (handle && f->handle != handle)
+	if (fold != NULL) {
+		if (handle && fold->handle != handle)
 			return -EINVAL;
-		return basic_set_parms(net, tp, f, base, tb, tca[TCA_RATE]);
 	}
 
 	err = -ENOBUFS;
-	f = kzalloc(sizeof(*f), GFP_KERNEL);
-	if (f == NULL)
+	fnew = kzalloc(sizeof(*fnew), GFP_KERNEL);
+	if (fnew == NULL)
 		goto errout;
 
-	tcf_exts_init(&f->exts, TCA_BASIC_ACT, TCA_BASIC_POLICE);
+	tcf_exts_init(&fnew->exts, TCA_BASIC_ACT, TCA_BASIC_POLICE);
 	err = -EINVAL;
-	if (handle)
-		f->handle = handle;
-	else {
+	if (handle) {
+		fnew->handle = handle;
+	} else if (fold) {
+		fnew->handle = fold->handle;
+	} else {
 		unsigned int i = 0x80000000;
 		do {
 			if (++head->hgenerator == 0x7FFFFFFF)
@@ -203,29 +211,31 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
 			goto errout;
 		}
 
-		f->handle = head->hgenerator;
+		fnew->handle = head->hgenerator;
 	}
 
-	err = basic_set_parms(net, tp, f, base, tb, tca[TCA_RATE]);
+	err = basic_set_parms(net, tp, fnew, base, tb, tca[TCA_RATE]);
 	if (err < 0)
 		goto errout;
 
-	tcf_tree_lock(tp);
-	list_add(&f->link, &head->flist);
-	tcf_tree_unlock(tp);
-	*arg = (unsigned long) f;
+	*arg = (unsigned long) fnew;
+
+	if (fold) {
+		list_replace_rcu(&fold->link, &fnew->link);
+		call_rcu(&fold->rcu, basic_delete_filter);
+	} else {
+		list_add_rcu(&fnew->link, &head->flist);
+	}
 
 	return 0;
 errout:
-	if (*arg == 0UL && f)
-		kfree(f);
-
+	kfree(fnew);
 	return err;
 }
 
 static void basic_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 {
-	struct basic_head *head = tp->root;
+	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f;
 
 	list_for_each_entry(f, &head->flist, link) {

* [RFC PATCH 04/15] net: sched: cls_cgroup use RCU
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
                   ` (2 preceding siblings ...)
  2014-04-30 16:36 ` [RFC PATCH 03/15] net: sched: cls_basic use RCU John Fastabend
@ 2014-04-30 16:36 ` John Fastabend
  2014-04-30 16:36 ` [RFC PATCH 05/15] net: sched: cls_flow " John Fastabend
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:36 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

Make cgroup classifier safe for RCU.

Also drop the rcu_read_lock()/rcu_read_unlock() pair from the classify
routine. If the RCU read lock is not already held when entering this
routine we have bigger problems deleting the classifier chain, so the
extra pair is unnecessary; as far as I know all callers already hold
the RCU read lock.

If there is a path where classify is called without the RCU read lock
held, it will trigger an RCU splat and we can fix it then.
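
For illustration only (a minimal sketch, not part of the patch;
example_classify() is a made-up name and cls_cgroup_head is the struct
from the diff below): the qdisc fast path already runs classifiers
under rcu_read_lock_bh(), so inside ->classify() a plain
rcu_dereference_bh() on tp->root is all that is needed.

	static int example_classify(struct sk_buff *skb,
				    const struct tcf_proto *tp,
				    struct tcf_result *res)
	{
		/* caller (qdisc ingress/egress path) holds rcu_read_lock_bh() */
		struct cls_cgroup_head *head = rcu_dereference_bh(tp->root);

		if (!head)
			return -1;	/* no match */

		res->classid = head->handle;	/* placeholder match result */
		return 0;
	}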

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_cgroup.c |   63 ++++++++++++++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 24 deletions(-)

diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 8e2158a..b8db7ff 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -22,17 +22,17 @@ struct cls_cgroup_head {
 	u32			handle;
 	struct tcf_exts		exts;
 	struct tcf_ematch_tree	ematches;
+	struct tcf_proto	*tp;
+	struct rcu_head		rcu;
 };
 
 static int cls_cgroup_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			       struct tcf_result *res)
 {
-	struct cls_cgroup_head *head = tp->root;
+	struct cls_cgroup_head *head = rcu_dereference_bh(tp->root);
 	u32 classid;
 
-	rcu_read_lock();
 	classid = task_cls_state(current)->classid;
-	rcu_read_unlock();
 
 	/*
 	 * Due to the nature of the classifier it is required to ignore all
@@ -80,13 +80,25 @@ static const struct nla_policy cgroup_policy[TCA_CGROUP_MAX + 1] = {
 	[TCA_CGROUP_EMATCHES]	= { .type = NLA_NESTED },
 };
 
+static void cls_cgroup_destroy_rcu(struct rcu_head *root)
+{
+	struct cls_cgroup_head *head = container_of(root,
+						    struct cls_cgroup_head,
+						    rcu);
+
+	tcf_exts_destroy(head->tp, &head->exts);
+	tcf_em_tree_destroy(head->tp, &head->ematches);
+	kfree(head);
+}
+
 static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
 			     struct tcf_proto *tp, unsigned long base,
 			     u32 handle, struct nlattr **tca,
 			     unsigned long *arg)
 {
 	struct nlattr *tb[TCA_CGROUP_MAX + 1];
-	struct cls_cgroup_head *head = tp->root;
+	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
+	struct cls_cgroup_head *new;
 	struct tcf_ematch_tree t;
 	struct tcf_exts e;
 	int err;
@@ -94,25 +106,24 @@ static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
 	if (!tca[TCA_OPTIONS])
 		return -EINVAL;
 
-	if (head == NULL) {
-		if (!handle)
-			return -EINVAL;
+	if (!head && !handle)
+		return -EINVAL;
 
-		head = kzalloc(sizeof(*head), GFP_KERNEL);
-		if (head == NULL)
-			return -ENOBUFS;
+	if (head && handle != head->handle)
+		return -ENOENT;
 
-		tcf_exts_init(&head->exts, TCA_CGROUP_ACT, TCA_CGROUP_POLICE);
-		head->handle = handle;
+	new = kzalloc(sizeof(*head), GFP_KERNEL);
+	if (!new)
+		return -ENOBUFS;
 
-		tcf_tree_lock(tp);
-		tp->root = head;
-		tcf_tree_unlock(tp);
+	if (head) {
+		new->handle = head->handle;
+	} else {
+		tcf_exts_init(&new->exts, TCA_CGROUP_ACT, TCA_CGROUP_POLICE);
+		new->handle = handle;
 	}
 
-	if (handle != head->handle)
-		return -ENOENT;
-
+	new->tp = tp;
 	err = nla_parse_nested(tb, TCA_CGROUP_MAX, tca[TCA_OPTIONS],
 			       cgroup_policy);
 	if (err < 0)
@@ -127,20 +138,24 @@ static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
 	if (err < 0)
 		return err;
 
-	tcf_exts_change(tp, &head->exts, &e);
-	tcf_em_tree_change(tp, &head->ematches, &t);
+	tcf_exts_change(tp, &new->exts, &e);
+	tcf_em_tree_change(tp, &new->ematches, &t);
 
+	rcu_assign_pointer(tp->root, new);
+	if (head)
+		call_rcu(&head->rcu, cls_cgroup_destroy_rcu);
 	return 0;
 }
 
 static void cls_cgroup_destroy(struct tcf_proto *tp)
 {
-	struct cls_cgroup_head *head = tp->root;
+	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
 
 	if (head) {
 		tcf_exts_destroy(tp, &head->exts);
 		tcf_em_tree_destroy(tp, &head->ematches);
-		kfree(head);
+		RCU_INIT_POINTER(tp->root, NULL);
+		kfree_rcu(head, rcu);
 	}
 }
 
@@ -151,7 +166,7 @@ static int cls_cgroup_delete(struct tcf_proto *tp, unsigned long arg)
 
 static void cls_cgroup_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 {
-	struct cls_cgroup_head *head = tp->root;
+	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
 
 	if (arg->count < arg->skip)
 		goto skip;
@@ -167,7 +182,7 @@ skip:
 static int cls_cgroup_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 			   struct sk_buff *skb, struct tcmsg *t)
 {
-	struct cls_cgroup_head *head = tp->root;
+	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
 	unsigned char *b = skb_tail_pointer(skb);
 	struct nlattr *nest;
 


* [RFC PATCH 05/15] net: sched: cls_flow use RCU
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
                   ` (3 preceding siblings ...)
  2014-04-30 16:36 ` [RFC PATCH 04/15] net: sched: cls_cgroup " John Fastabend
@ 2014-04-30 16:36 ` John Fastabend
  2014-04-30 16:37 ` [RFC PATCH 06/15] net: sched: fw " John Fastabend
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:36 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_flow.c |  145 +++++++++++++++++++++++++++++---------------------
 1 file changed, 84 insertions(+), 61 deletions(-)

diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index 257029c..272337f 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -34,12 +34,15 @@
 
 struct flow_head {
 	struct list_head	filters;
+	struct rcu_head		rcu;
 };
 
 struct flow_filter {
 	struct list_head	list;
+	struct rcu_head		rcu;
 	struct tcf_exts		exts;
 	struct tcf_ematch_tree	ematches;
+	struct tcf_proto	*tp;
 	struct timer_list	perturb_timer;
 	u32			perturb_period;
 	u32			handle;
@@ -276,14 +279,14 @@ static u32 flow_key_get(struct sk_buff *skb, int key, struct flow_keys *flow)
 static int flow_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			 struct tcf_result *res)
 {
-	struct flow_head *head = tp->root;
+	struct flow_head *head = rcu_dereference_bh(tp->root);
 	struct flow_filter *f;
 	u32 keymask;
 	u32 classid;
 	unsigned int n, key;
 	int r;
 
-	list_for_each_entry(f, &head->filters, list) {
+	list_for_each_entry_rcu(f, &head->filters, list) {
 		u32 keys[FLOW_KEY_MAX + 1];
 		struct flow_keys flow_keys;
 
@@ -346,13 +349,23 @@ static const struct nla_policy flow_policy[TCA_FLOW_MAX + 1] = {
 	[TCA_FLOW_PERTURB]	= { .type = NLA_U32 },
 };
 
+static void flow_destroy_filter(struct rcu_head *head)
+{
+	struct flow_filter *f = container_of(head, struct flow_filter, rcu);
+
+	del_timer_sync(&f->perturb_timer);
+	tcf_exts_destroy(f->tp, &f->exts);
+	tcf_em_tree_destroy(f->tp, &f->ematches);
+	kfree(f);
+}
+
 static int flow_change(struct net *net, struct sk_buff *in_skb,
 		       struct tcf_proto *tp, unsigned long base,
 		       u32 handle, struct nlattr **tca,
 		       unsigned long *arg)
 {
-	struct flow_head *head = tp->root;
-	struct flow_filter *f;
+	struct flow_head *head = rtnl_dereference(tp->root);
+	struct flow_filter *fold, *fnew;
 	struct nlattr *opt = tca[TCA_OPTIONS];
 	struct nlattr *tb[TCA_FLOW_MAX + 1];
 	struct tcf_exts e;
@@ -401,20 +414,42 @@ static int flow_change(struct net *net, struct sk_buff *in_skb,
 	if (err < 0)
 		goto err1;
 
-	f = (struct flow_filter *)*arg;
-	if (f != NULL) {
+	err = -ENOBUFS;
+	fnew = kzalloc(sizeof(*fnew), GFP_KERNEL);
+	if (!fnew)
+		goto err2;
+
+	fold = (struct flow_filter *)*arg;
+	if (fold) {
 		err = -EINVAL;
-		if (f->handle != handle && handle)
+		if (fold->handle != handle && handle)
 			goto err2;
 
-		mode = f->mode;
+		/* Copy fold into fnew */
+		fnew->tp = fold->tp;
+
+		fnew->handle = fold->handle;
+		fnew->nkeys = fold->nkeys;
+		fnew->keymask = fold->keymask;
+		fnew->mode = fold->mode;
+		fnew->mask = fold->mask;
+		fnew->xor = fold->xor;
+		fnew->rshift = fold->rshift;
+		fnew->addend = fold->addend;
+		fnew->divisor = fold->divisor;
+		fnew->baseclass = fold->baseclass;
+		fnew->hashrnd = fold->hashrnd;
+
+		mode = fold->mode;
 		if (tb[TCA_FLOW_MODE])
 			mode = nla_get_u32(tb[TCA_FLOW_MODE]);
 		if (mode != FLOW_MODE_HASH && nkeys > 1)
 			goto err2;
 
 		if (mode == FLOW_MODE_HASH)
-			perturb_period = f->perturb_period;
+			perturb_period = fold->perturb_period;
 		if (tb[TCA_FLOW_PERTURB]) {
 			if (mode != FLOW_MODE_HASH)
 				goto err2;
@@ -444,83 +479,70 @@ static int flow_change(struct net *net, struct sk_buff *in_skb,
 		if (TC_H_MIN(baseclass) == 0)
 			baseclass = TC_H_MAKE(baseclass, 1);
 
-		err = -ENOBUFS;
-		f = kzalloc(sizeof(*f), GFP_KERNEL);
-		if (f == NULL)
-			goto err2;
-
-		f->handle = handle;
-		f->mask	  = ~0U;
-		tcf_exts_init(&f->exts, TCA_FLOW_ACT, TCA_FLOW_POLICE);
-
-		get_random_bytes(&f->hashrnd, 4);
-		f->perturb_timer.function = flow_perturbation;
-		f->perturb_timer.data = (unsigned long)f;
-		init_timer_deferrable(&f->perturb_timer);
+		fnew->handle = handle;
+		fnew->mask  = ~0U;
+		fnew->tp = tp;
+		get_random_bytes(&fnew->hashrnd, 4);
+		tcf_exts_init(&fnew->exts, TCA_FLOW_ACT, TCA_FLOW_POLICE);
 	}
 
-	tcf_exts_change(tp, &f->exts, &e);
-	tcf_em_tree_change(tp, &f->ematches, &t);
+	fnew->perturb_timer.function = flow_perturbation;
+	fnew->perturb_timer.data = (unsigned long)fnew;
+	init_timer_deferrable(&fnew->perturb_timer);
 
-	tcf_tree_lock(tp);
+	tcf_exts_change(tp, &fnew->exts, &e);
+	tcf_em_tree_change(tp, &fnew->ematches, &t);
 
 	if (tb[TCA_FLOW_KEYS]) {
-		f->keymask = keymask;
-		f->nkeys   = nkeys;
+		fnew->keymask = keymask;
+		fnew->nkeys   = nkeys;
 	}
 
-	f->mode = mode;
+	fnew->mode = mode;
 
 	if (tb[TCA_FLOW_MASK])
-		f->mask = nla_get_u32(tb[TCA_FLOW_MASK]);
+		fnew->mask = nla_get_u32(tb[TCA_FLOW_MASK]);
 	if (tb[TCA_FLOW_XOR])
-		f->xor = nla_get_u32(tb[TCA_FLOW_XOR]);
+		fnew->xor = nla_get_u32(tb[TCA_FLOW_XOR]);
 	if (tb[TCA_FLOW_RSHIFT])
-		f->rshift = nla_get_u32(tb[TCA_FLOW_RSHIFT]);
+		fnew->rshift = nla_get_u32(tb[TCA_FLOW_RSHIFT]);
 	if (tb[TCA_FLOW_ADDEND])
-		f->addend = nla_get_u32(tb[TCA_FLOW_ADDEND]);
+		fnew->addend = nla_get_u32(tb[TCA_FLOW_ADDEND]);
 
 	if (tb[TCA_FLOW_DIVISOR])
-		f->divisor = nla_get_u32(tb[TCA_FLOW_DIVISOR]);
+		fnew->divisor = nla_get_u32(tb[TCA_FLOW_DIVISOR]);
 	if (baseclass)
-		f->baseclass = baseclass;
+		fnew->baseclass = baseclass;
 
-	f->perturb_period = perturb_period;
-	del_timer(&f->perturb_timer);
+	fnew->perturb_period = perturb_period;
 	if (perturb_period)
-		mod_timer(&f->perturb_timer, jiffies + perturb_period);
+		mod_timer(&fnew->perturb_timer, jiffies + perturb_period);
 
 	if (*arg == 0)
-		list_add_tail(&f->list, &head->filters);
+		list_add_tail_rcu(&fnew->list, &head->filters);
+	else
+		list_replace_rcu(&fold->list, &fnew->list);
 
-	tcf_tree_unlock(tp);
+	*arg = (unsigned long)fnew;
 
-	*arg = (unsigned long)f;
+	if (fold)
+		call_rcu(&fold->rcu, flow_destroy_filter);
 	return 0;
 
 err2:
 	tcf_em_tree_destroy(tp, &t);
+	kfree(fnew);
 err1:
 	tcf_exts_destroy(tp, &e);
 	return err;
 }
 
-static void flow_destroy_filter(struct tcf_proto *tp, struct flow_filter *f)
-{
-	del_timer_sync(&f->perturb_timer);
-	tcf_exts_destroy(tp, &f->exts);
-	tcf_em_tree_destroy(tp, &f->ematches);
-	kfree(f);
-}
-
 static int flow_delete(struct tcf_proto *tp, unsigned long arg)
 {
 	struct flow_filter *f = (struct flow_filter *)arg;
 
-	tcf_tree_lock(tp);
-	list_del(&f->list);
-	tcf_tree_unlock(tp);
-	flow_destroy_filter(tp, f);
+	list_del_rcu(&f->list);
+	call_rcu(&f->rcu, flow_destroy_filter);
 	return 0;
 }
 
@@ -532,28 +554,29 @@ static int flow_init(struct tcf_proto *tp)
 	if (head == NULL)
 		return -ENOBUFS;
 	INIT_LIST_HEAD(&head->filters);
-	tp->root = head;
+	rcu_assign_pointer(tp->root, head);
 	return 0;
 }
 
 static void flow_destroy(struct tcf_proto *tp)
 {
-	struct flow_head *head = tp->root;
+	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *f, *next;
 
 	list_for_each_entry_safe(f, next, &head->filters, list) {
-		list_del(&f->list);
-		flow_destroy_filter(tp, f);
+		list_del_rcu(&f->list);
+		call_rcu(&f->rcu, flow_destroy_filter);
 	}
-	kfree(head);
+	RCU_INIT_POINTER(tp->root, NULL);
+	kfree_rcu(head, rcu);
 }
 
 static unsigned long flow_get(struct tcf_proto *tp, u32 handle)
 {
-	struct flow_head *head = tp->root;
+	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *f;
 
-	list_for_each_entry(f, &head->filters, list)
+	list_for_each_entry_rcu(f, &head->filters, list)
 		if (f->handle == handle)
 			return (unsigned long)f;
 	return 0;
@@ -626,10 +649,10 @@ nla_put_failure:
 
 static void flow_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 {
-	struct flow_head *head = tp->root;
+	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *f;
 
-	list_for_each_entry(f, &head->filters, list) {
+	list_for_each_entry_rcu(f, &head->filters, list) {
 		if (arg->count < arg->skip)
 			goto skip;
 		if (arg->fn(tp, (unsigned long)f, arg) < 0) {


* [RFC PATCH 06/15] net: sched: fw use RCU
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
                   ` (4 preceding siblings ...)
  2014-04-30 16:36 ` [RFC PATCH 05/15] net: sched: cls_flow " John Fastabend
@ 2014-04-30 16:37 ` John Fastabend
  2014-04-30 16:37 ` [RFC PATCH 07/15] net: sched: RCU cls_route John Fastabend
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:37 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

RCU'ify fw classifier.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_fw.c |  108 ++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 75 insertions(+), 33 deletions(-)

diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index 63a3ce7..5610e8c 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -34,16 +34,19 @@
 struct fw_head {
 	u32			mask;
 	struct fw_filter	*ht[HTSIZE];
+	struct rcu_head		rcu;
 };
 
 struct fw_filter {
-	struct fw_filter	*next;
+	struct fw_filter __rcu	*next;
 	u32			id;
 	struct tcf_result	res;
 #ifdef CONFIG_NET_CLS_IND
 	int			ifindex;
 #endif /* CONFIG_NET_CLS_IND */
 	struct tcf_exts		exts;
+	struct tcf_proto	*tp;
+	struct rcu_head		rcu;
 };
 
 static u32 fw_hash(u32 handle)
@@ -56,14 +59,16 @@ static u32 fw_hash(u32 handle)
 static int fw_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			  struct tcf_result *res)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rcu_dereference_bh(tp->root);
 	struct fw_filter *f;
 	int r;
 	u32 id = skb->mark;
 
 	if (head != NULL) {
 		id &= head->mask;
-		for (f = head->ht[fw_hash(id)]; f; f = f->next) {
+
+		for (f = rcu_dereference_bh(head->ht[fw_hash(id)]); f;
+		     f = rcu_dereference_bh(f->next)) {
 			if (f->id == id) {
 				*res = f->res;
 #ifdef CONFIG_NET_CLS_IND
@@ -92,13 +97,14 @@ static int fw_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 
 static unsigned long fw_get(struct tcf_proto *tp, u32 handle)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f;
 
 	if (head == NULL)
 		return 0;
 
-	for (f = head->ht[fw_hash(handle)]; f; f = f->next) {
+	f = rtnl_dereference(head->ht[fw_hash(handle)]);
+	for (; f; f = rtnl_dereference(f->next)) {
 		if (f->id == handle)
 			return (unsigned long)f;
 	}
@@ -114,8 +120,11 @@ static int fw_init(struct tcf_proto *tp)
 	return 0;
 }
 
-static void fw_delete_filter(struct tcf_proto *tp, struct fw_filter *f)
+static void fw_delete_filter(struct rcu_head *head)
 {
+	struct fw_filter *f = container_of(head, struct fw_filter, rcu);
+	struct tcf_proto *tp = f->tp;
+
 	tcf_unbind_filter(tp, &f->res);
 	tcf_exts_destroy(tp, &f->exts);
 	kfree(f);
@@ -123,7 +132,7 @@ static void fw_delete_filter(struct tcf_proto *tp, struct fw_filter *f)
 
 static void fw_destroy(struct tcf_proto *tp)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f;
 	int h;
 
@@ -131,29 +140,32 @@ static void fw_destroy(struct tcf_proto *tp)
 		return;
 
 	for (h = 0; h < HTSIZE; h++) {
-		while ((f = head->ht[h]) != NULL) {
-			head->ht[h] = f->next;
-			fw_delete_filter(tp, f);
+		while ((f = rtnl_dereference(head->ht[h])) != NULL) {
+			rcu_assign_pointer(head->ht[h],
+					   rtnl_dereference(f->next));
+			call_rcu(&f->rcu, fw_delete_filter);
 		}
 	}
-	kfree(head);
+	RCU_INIT_POINTER(tp->root, NULL);
+	kfree_rcu(head, rcu);
 }
 
 static int fw_delete(struct tcf_proto *tp, unsigned long arg)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f = (struct fw_filter *)arg;
-	struct fw_filter **fp;
+	struct fw_filter __rcu **fp, *pfp;
 
 	if (head == NULL || f == NULL)
 		goto out;
 
-	for (fp = &head->ht[fw_hash(f->id)]; *fp; fp = &(*fp)->next) {
-		if (*fp == f) {
-			tcf_tree_lock(tp);
-			*fp = f->next;
-			tcf_tree_unlock(tp);
-			fw_delete_filter(tp, f);
+	fp = &head->ht[fw_hash(f->id)];
+
+	for (pfp = rtnl_dereference(*fp); pfp;
+	     fp = &pfp->next, pfp = rtnl_dereference(*fp)) {
+		if (pfp == f) {
+			rcu_assign_pointer(*fp, rtnl_dereference(f->next));
+			call_rcu(&f->rcu, fw_delete_filter);
 			return 0;
 		}
 	}
@@ -171,7 +183,7 @@ static int
 fw_change_attrs(struct net *net, struct tcf_proto *tp, struct fw_filter *f,
 	struct nlattr **tb, struct nlattr **tca, unsigned long base)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rtnl_dereference(tp->root);
 	struct tcf_exts e;
 	u32 mask;
 	int err;
@@ -220,7 +232,7 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
 		     struct nlattr **tca,
 		     unsigned long *arg)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f = (struct fw_filter *) *arg;
 	struct nlattr *opt = tca[TCA_OPTIONS];
 	struct nlattr *tb[TCA_FW_MAX + 1];
@@ -233,10 +245,42 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
 	if (err < 0)
 		return err;
 
-	if (f != NULL) {
+	if (f) {
+		struct fw_filter *pfp, *fnew;
+		struct fw_filter __rcu **fp;
+
 		if (f->id != handle && handle)
 			return -EINVAL;
-		return fw_change_attrs(net, tp, f, tb, tca, base);
+
+		fnew = kzalloc(sizeof(struct fw_filter), GFP_KERNEL);
+		if (!fnew)
+			return -ENOBUFS;
+
+		fnew->id = f->id;
+		fnew->res = f->res;
+#ifdef CONFIG_NET_CLS_IND
+		fnew->ifindex = f->ifindex;
+#endif /* CONFIG_NET_CLS_IND */
+		fnew->tp = f->tp;
+
+		err = fw_change_attrs(net, tp, fnew, tb, tca, base);
+		if (err < 0) {
+			kfree(fnew);
+			return err;
+		}
+
+		fp = &head->ht[fw_hash(fnew->id)];
+		for (pfp = rtnl_dereference(*fp); pfp;
+		     fp = &pfp->next, pfp = rtnl_dereference(*fp))
+			if (pfp == f)
+				break;
+
+		rcu_assign_pointer(fnew->next, rtnl_dereference(pfp->next));
+		rcu_assign_pointer(*fp, fnew);
+		call_rcu(&f->rcu, fw_delete_filter);
+
+		*arg = (unsigned long)fnew;
+		return err;
 	}
 
 	if (!handle)
@@ -252,9 +296,7 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
 			return -ENOBUFS;
 		head->mask = mask;
 
-		tcf_tree_lock(tp);
-		tp->root = head;
-		tcf_tree_unlock(tp);
+		rcu_assign_pointer(tp->root, head);
 	}
 
 	f = kzalloc(sizeof(struct fw_filter), GFP_KERNEL);
@@ -263,15 +305,14 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
 
 	tcf_exts_init(&f->exts, TCA_FW_ACT, TCA_FW_POLICE);
 	f->id = handle;
+	f->tp = tp;
 
 	err = fw_change_attrs(net, tp, f, tb, tca, base);
 	if (err < 0)
 		goto errout;
 
-	f->next = head->ht[fw_hash(handle)];
-	tcf_tree_lock(tp);
-	head->ht[fw_hash(handle)] = f;
-	tcf_tree_unlock(tp);
+	rcu_assign_pointer(f->next, head->ht[fw_hash(handle)]);
+	rcu_assign_pointer(head->ht[fw_hash(handle)], f);
 
 	*arg = (unsigned long)f;
 	return 0;
@@ -283,7 +324,7 @@ errout:
 
 static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rtnl_dereference(tp->root);
 	int h;
 
 	if (head == NULL)
@@ -295,7 +336,8 @@ static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 	for (h = 0; h < HTSIZE; h++) {
 		struct fw_filter *f;
 
-		for (f = head->ht[h]; f; f = f->next) {
+		for (f = rtnl_dereference(head->ht[h]); f;
+		     f = rtnl_dereference(f->next)) {
 			if (arg->count < arg->skip) {
 				arg->count++;
 				continue;
@@ -312,7 +354,7 @@ static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 static int fw_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		   struct sk_buff *skb, struct tcmsg *t)
 {
-	struct fw_head *head = tp->root;
+	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f = (struct fw_filter *)fh;
 	unsigned char *b = skb_tail_pointer(skb);
 	struct nlattr *nest;


* [RFC PATCH 07/15] net: sched: RCU cls_route
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
                   ` (5 preceding siblings ...)
  2014-04-30 16:37 ` [RFC PATCH 06/15] net: sched: fw " John Fastabend
@ 2014-04-30 16:37 ` John Fastabend
  2014-04-30 16:38 ` [RFC PATCH 08/15] net: sched: RCU cls_tcindex John Fastabend
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:37 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

RCUify the route classifier. For now, however, a spinlock is used to
protect the fastmap cache.

The issue is that the fastmap may be read by one CPU while the cache
is being updated by another. An array of pointers could be one
possible solution.
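
For illustration, the interim protection looks roughly like the sketch
below (not part of the patch; fastmap_update() is a made-up helper,
route4_fastmap/route4_filter are the structs from the diff): updates
take a file-local spinlock so a reader never observes a half-written
id/iif/filter triple.

	static DEFINE_SPINLOCK(fastmap_lock);

	static void fastmap_update(struct route4_fastmap *fm, u32 id, int iif,
				   struct route4_filter *f)
	{
		spin_lock_bh(&fastmap_lock);
		fm->id = id;
		fm->iif = iif;
		fm->filter = f;
		spin_unlock_bh(&fastmap_lock);
	}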

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_route.c |  221 ++++++++++++++++++++++++++++---------------------
 1 file changed, 128 insertions(+), 93 deletions(-)

diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index 1ad3068..50caf0b 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -29,25 +29,26 @@
  *    are mutually  exclusive.
  * 3. "to TAG from ANY" has higher priority, than "to ANY from XXX"
  */
-
 struct route4_fastmap {
-	struct route4_filter	*filter;
-	u32			id;
-	int			iif;
+	struct route4_filter		*filter;
+	u32				id;
+	int				iif;
 };
 
 struct route4_head {
-	struct route4_fastmap	fastmap[16];
-	struct route4_bucket	*table[256 + 1];
+	struct route4_fastmap		fastmap[16];
+	struct route4_bucket __rcu	*table[256 + 1];
+	struct rcu_head			rcu;
 };
 
 struct route4_bucket {
 	/* 16 FROM buckets + 16 IIF buckets + 1 wildcard bucket */
-	struct route4_filter	*ht[16 + 16 + 1];
+	struct route4_filter __rcu	*ht[16 + 16 + 1];
+	struct rcu_head			rcu;
 };
 
 struct route4_filter {
-	struct route4_filter	*next;
+	struct route4_filter __rcu	*next;
 	u32			id;
 	int			iif;
 
@@ -55,6 +56,8 @@ struct route4_filter {
 	struct tcf_exts		exts;
 	u32			handle;
 	struct route4_bucket	*bkt;
+	struct tcf_proto	*tp;
+	struct rcu_head		rcu;
 };
 
 #define ROUTE4_FAILURE ((struct route4_filter *)(-1L))
@@ -64,14 +67,13 @@ static inline int route4_fastmap_hash(u32 id, int iif)
 	return id & 0xF;
 }
 
+static DEFINE_SPINLOCK(fastmap_lock);
 static void
-route4_reset_fastmap(struct Qdisc *q, struct route4_head *head, u32 id)
+route4_reset_fastmap(struct route4_head *head)
 {
-	spinlock_t *root_lock = qdisc_root_sleeping_lock(q);
-
-	spin_lock_bh(root_lock);
+	spin_lock_bh(&fastmap_lock);
 	memset(head->fastmap, 0, sizeof(head->fastmap));
-	spin_unlock_bh(root_lock);
+	spin_unlock_bh(&fastmap_lock);
 }
 
 static void
@@ -80,9 +82,12 @@ route4_set_fastmap(struct route4_head *head, u32 id, int iif,
 {
 	int h = route4_fastmap_hash(id, iif);
 
+	/* fastmap updates must look atomic to align id, iif, filter */
+	spin_lock_bh(&fastmap_lock);
 	head->fastmap[h].id = id;
 	head->fastmap[h].iif = iif;
 	head->fastmap[h].filter = f;
+	spin_unlock_bh(&fastmap_lock);
 }
 
 static inline int route4_hash_to(u32 id)
@@ -123,7 +128,7 @@ static inline int route4_hash_wild(void)
 static int route4_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			   struct tcf_result *res)
 {
-	struct route4_head *head = tp->root;
+	struct route4_head *head = rcu_dereference_bh(tp->root);
 	struct dst_entry *dst;
 	struct route4_bucket *b;
 	struct route4_filter *f;
@@ -141,32 +146,43 @@ static int route4_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 	iif = inet_iif(skb);
 
 	h = route4_fastmap_hash(id, iif);
+
+	spin_lock(&fastmap_lock);
 	if (id == head->fastmap[h].id &&
 	    iif == head->fastmap[h].iif &&
 	    (f = head->fastmap[h].filter) != NULL) {
-		if (f == ROUTE4_FAILURE)
+		if (f == ROUTE4_FAILURE) {
+			spin_unlock(&fastmap_lock);
 			goto failure;
+		}
 
 		*res = f->res;
+		spin_unlock(&fastmap_lock);
 		return 0;
 	}
+	spin_unlock(&fastmap_lock);
 
 	h = route4_hash_to(id);
 
 restart:
-	b = head->table[h];
+	b = rcu_dereference_bh(head->table[h]);
 	if (b) {
-		for (f = b->ht[route4_hash_from(id)]; f; f = f->next)
+		for (f = rcu_dereference_bh(b->ht[route4_hash_from(id)]);
+		     f;
+		     f = rcu_dereference_bh(f->next))
 			if (f->id == id)
 				ROUTE4_APPLY_RESULT();
 
-		for (f = b->ht[route4_hash_iif(iif)]; f; f = f->next)
+		for (f = rcu_dereference_bh(b->ht[route4_hash_iif(iif)]);
+		     f;
+		     f = rcu_dereference_bh(f->next))
 			if (f->iif == iif)
 				ROUTE4_APPLY_RESULT();
 
-		for (f = b->ht[route4_hash_wild()]; f; f = f->next)
+		for (f = rcu_dereference_bh(b->ht[route4_hash_wild()]);
+		     f;
+		     f = rcu_dereference_bh(f->next))
 			ROUTE4_APPLY_RESULT();
-
 	}
 	if (h < 256) {
 		h = 256;
@@ -213,7 +229,7 @@ static inline u32 from_hash(u32 id)
 
 static unsigned long route4_get(struct tcf_proto *tp, u32 handle)
 {
-	struct route4_head *head = tp->root;
+	struct route4_head *head = rtnl_dereference(tp->root);
 	struct route4_bucket *b;
 	struct route4_filter *f;
 	unsigned int h1, h2;
@@ -229,9 +245,11 @@ static unsigned long route4_get(struct tcf_proto *tp, u32 handle)
 	if (h2 > 32)
 		return 0;
 
-	b = head->table[h1];
+	b = rtnl_dereference(head->table[h1]);
 	if (b) {
-		for (f = b->ht[h2]; f; f = f->next)
+		for (f = rtnl_dereference(b->ht[h2]);
+		     f;
+		     f = rtnl_dereference(f->next))
 			if (f->handle == handle)
 				return (unsigned long)f;
 	}
@@ -248,8 +266,11 @@ static int route4_init(struct tcf_proto *tp)
 }
 
 static void
-route4_delete_filter(struct tcf_proto *tp, struct route4_filter *f)
+route4_delete_filter(struct rcu_head *head)
 {
+	struct route4_filter *f = container_of(head, struct route4_filter, rcu);
+	struct tcf_proto *tp = f->tp;
+
 	tcf_unbind_filter(tp, &f->res);
 	tcf_exts_destroy(tp, &f->exts);
 	kfree(f);
@@ -257,7 +278,7 @@ route4_delete_filter(struct tcf_proto *tp, struct route4_filter *f)
 
 static void route4_destroy(struct tcf_proto *tp)
 {
-	struct route4_head *head = tp->root;
+	struct route4_head *head = rtnl_dereference(tp->root);
 	int h1, h2;
 
 	if (head == NULL)
@@ -266,26 +287,31 @@ static void route4_destroy(struct tcf_proto *tp)
 	for (h1 = 0; h1 <= 256; h1++) {
 		struct route4_bucket *b;
 
-		b = head->table[h1];
+		b = rtnl_dereference(head->table[h1]);
 		if (b) {
 			for (h2 = 0; h2 <= 32; h2++) {
 				struct route4_filter *f;
 
-				while ((f = b->ht[h2]) != NULL) {
-					b->ht[h2] = f->next;
-					route4_delete_filter(tp, f);
+				while ((f = rtnl_dereference(b->ht[h2])) != NULL) {
+					struct route4_filter *next;
+
+					next = rtnl_dereference(f->next);
+					rcu_assign_pointer(b->ht[h2], next);
+					call_rcu(&f->rcu, route4_delete_filter);
 				}
 			}
-			kfree(b);
+			RCU_INIT_POINTER(head->table[h1], NULL);
+			kfree_rcu(b, rcu);
 		}
 	}
-	kfree(head);
+	RCU_INIT_POINTER(tp->root, NULL);
+	kfree_rcu(head, rcu);
 }
 
 static int route4_delete(struct tcf_proto *tp, unsigned long arg)
 {
-	struct route4_head *head = tp->root;
-	struct route4_filter **fp, *f = (struct route4_filter *)arg;
+	struct route4_head *head = rtnl_dereference(tp->root);
+	struct route4_filter __rcu **fp, *nf, *f = (struct route4_filter *)arg;
 	unsigned int h = 0;
 	struct route4_bucket *b;
 	int i;
@@ -296,27 +322,35 @@ static int route4_delete(struct tcf_proto *tp, unsigned long arg)
 	h = f->handle;
 	b = f->bkt;
 
-	for (fp = &b->ht[from_hash(h >> 16)]; *fp; fp = &(*fp)->next) {
-		if (*fp == f) {
-			tcf_tree_lock(tp);
+	fp = &b->ht[from_hash(h >> 16)];
+	for (nf = rtnl_dereference(*fp); nf;
+	     fp = &nf->next, nf = rtnl_dereference(*fp)) {
+		if (nf == f) {
+			/* unlink it */
 			*fp = f->next;
-			tcf_tree_unlock(tp);
 
-			route4_reset_fastmap(tp->q, head, f->id);
-			route4_delete_filter(tp, f);
+			/* Remove any fastmap lookups that might reference
+			 * this filter; it was unlinked above so it cannot
+			 * get back into the fastmap.
+			 */
+			route4_reset_fastmap(head);
+
+			/* Delete it */
+			call_rcu(&f->rcu, route4_delete_filter);
 
-			/* Strip tree */
+			/* Strip RTNL protected tree */
+			for (i = 0; i <= 32; i++) {
+				struct route4_filter *rt;
 
-			for (i = 0; i <= 32; i++)
-				if (b->ht[i])
+				rt = rtnl_dereference(b->ht[i]);
+				if (rt)
 					return 0;
+			}
 
 			/* OK, session has no flows */
-			tcf_tree_lock(tp);
-			head->table[to_hash(h)] = NULL;
-			tcf_tree_unlock(tp);
+			RCU_INIT_POINTER(head->table[to_hash(h)], NULL);
+			kfree_rcu(b, rcu);
 
-			kfree(b);
 			return 0;
 		}
 	}
@@ -379,26 +413,25 @@ static int route4_set_parms(struct net *net, struct tcf_proto *tp,
 	}
 
 	h1 = to_hash(nhandle);
-	b = head->table[h1];
+	b = rtnl_dereference(head->table[h1]);
 	if (!b) {
 		err = -ENOBUFS;
 		b = kzalloc(sizeof(struct route4_bucket), GFP_KERNEL);
 		if (b == NULL)
 			goto errout;
 
-		tcf_tree_lock(tp);
-		head->table[h1] = b;
-		tcf_tree_unlock(tp);
+		rcu_assign_pointer(head->table[h1], b);
 	} else {
 		unsigned int h2 = from_hash(nhandle >> 16);
 
 		err = -EEXIST;
-		for (fp = b->ht[h2]; fp; fp = fp->next)
+		for (fp = rtnl_dereference(b->ht[h2]);
+		     fp;
+		     fp = rtnl_dereference(fp->next))
 			if (fp->handle == f->handle)
 				goto errout;
 	}
 
-	tcf_tree_lock(tp);
 	if (tb[TCA_ROUTE4_TO])
 		f->id = to;
 
@@ -409,7 +442,7 @@ static int route4_set_parms(struct net *net, struct tcf_proto *tp,
 
 	f->handle = nhandle;
 	f->bkt = b;
-	tcf_tree_unlock(tp);
+	f->tp = tp;
 
 	if (tb[TCA_ROUTE4_CLASSID]) {
 		f->res.classid = nla_get_u32(tb[TCA_ROUTE4_CLASSID]);
@@ -430,14 +463,15 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
 		       struct nlattr **tca,
 		       unsigned long *arg)
 {
-	struct route4_head *head = tp->root;
-	struct route4_filter *f, *f1, **fp;
+	struct route4_head *head = rtnl_dereference(tp->root);
+	struct route4_filter __rcu **fp;
+	struct route4_filter *fold, *f1, *pfp, *f = NULL;
 	struct route4_bucket *b;
 	struct nlattr *opt = tca[TCA_OPTIONS];
 	struct nlattr *tb[TCA_ROUTE4_MAX + 1];
 	unsigned int h, th;
-	u32 old_handle = 0;
 	int err;
+	bool new = true;
 
 	if (opt == NULL)
 		return handle ? -EINVAL : 0;
@@ -446,70 +480,69 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
 	if (err < 0)
 		return err;
 
-	f = (struct route4_filter *)*arg;
-	if (f) {
-		if (f->handle != handle && handle)
+	fold = (struct route4_filter *)*arg;
+	if (fold && handle && fold->handle != handle)
 			return -EINVAL;
-
-		if (f->bkt)
-			old_handle = f->handle;
-
-		err = route4_set_parms(net, tp, base, f, handle, head, tb,
-			tca[TCA_RATE], 0);
-		if (err < 0)
-			return err;
-
-		goto reinsert;
-	}
-
 	err = -ENOBUFS;
 	if (head == NULL) {
 		head = kzalloc(sizeof(struct route4_head), GFP_KERNEL);
 		if (head == NULL)
 			goto errout;
-
-		tcf_tree_lock(tp);
-		tp->root = head;
-		tcf_tree_unlock(tp);
+		rcu_assign_pointer(tp->root, head);
 	}
 
 	f = kzalloc(sizeof(struct route4_filter), GFP_KERNEL);
-	if (f == NULL)
+	if (!f)
 		goto errout;
 
 	tcf_exts_init(&f->exts, TCA_ROUTE4_ACT, TCA_ROUTE4_POLICE);
+	if (fold) {
+		f->id = fold->id;
+		f->iif = fold->iif;
+		f->res = fold->res;
+		f->handle = fold->handle;
+
+		f->tp = fold->tp;
+		f->bkt = fold->bkt;
+		new = false;
+	}
+
 	err = route4_set_parms(net, tp, base, f, handle, head, tb,
-		tca[TCA_RATE], 1);
+			       tca[TCA_RATE], new);
 	if (err < 0)
 		goto errout;
 
-reinsert:
 	h = from_hash(f->handle >> 16);
-	for (fp = &f->bkt->ht[h]; (f1 = *fp) != NULL; fp = &f1->next)
+	fp = &f->bkt->ht[h];
+	for (pfp = rtnl_dereference(*fp);
+	     (f1 = rtnl_dereference(*fp)) != NULL;
+	     fp = &f1->next)
 		if (f->handle < f1->handle)
 			break;
 
-	f->next = f1;
-	tcf_tree_lock(tp);
-	*fp = f;
+	rcu_assign_pointer(f->next, f1);
+	rcu_assign_pointer(*fp, f);
 
-	if (old_handle && f->handle != old_handle) {
-		th = to_hash(old_handle);
-		h = from_hash(old_handle >> 16);
-		b = head->table[th];
+	if (fold && fold->handle && f->handle != fold->handle) {
+		th = to_hash(fold->handle);
+		h = from_hash(fold->handle >> 16);
+		b = rtnl_dereference(head->table[th]);
 		if (b) {
-			for (fp = &b->ht[h]; *fp; fp = &(*fp)->next) {
-				if (*fp == f) {
+			fp = &b->ht[h];
+			for (pfp = rtnl_dereference(*fp); pfp;
+			     fp = &pfp->next, pfp = rtnl_dereference(*fp)) {
+				if (pfp == f) {
 					*fp = f->next;
 					break;
 				}
 			}
 		}
 	}
-	tcf_tree_unlock(tp);
 
-	route4_reset_fastmap(tp->q, head, f->id);
+	route4_reset_fastmap(head);
 	*arg = (unsigned long)f;
+	if (fold)
+		call_rcu(&fold->rcu, route4_delete_filter);
 	return 0;
 
 errout:
@@ -519,7 +552,7 @@ errout:
 
 static void route4_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 {
-	struct route4_head *head = tp->root;
+	struct route4_head *head = rtnl_dereference(tp->root);
 	unsigned int h, h1;
 
 	if (head == NULL)
@@ -529,13 +562,15 @@ static void route4_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 		return;
 
 	for (h = 0; h <= 256; h++) {
-		struct route4_bucket *b = head->table[h];
+		struct route4_bucket *b = rtnl_dereference(head->table[h]);
 
 		if (b) {
 			for (h1 = 0; h1 <= 32; h1++) {
 				struct route4_filter *f;
 
-				for (f = b->ht[h1]; f; f = f->next) {
+				for (f = rtnl_dereference(b->ht[h1]);
+				     f;
+				     f = rtnl_dereference(f->next)) {
 					if (arg->count < arg->skip) {
 						arg->count++;
 						continue;


* [RFC PATCH 08/15] net: sched: RCU cls_tcindex
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
                   ` (6 preceding siblings ...)
  2014-04-30 16:37 ` [RFC PATCH 07/15] net: sched: RCU cls_route John Fastabend
@ 2014-04-30 16:38 ` John Fastabend
  2014-04-30 16:38 ` [RFC PATCH 09/15] net: sched: make cls_u32 lockless John Fastabend
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:38 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

Make cls_tcindex RCU safe.

This patch adds a new RCU helper, rcu_dereference_bh_rtnl(), which
checks that the caller holds either the RCU BH read lock or RTNL. This
is needed because tcindex_lookup() is called from both contexts.
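
A sketch of the intended usage (illustrative only, mirroring the
tcindex_lookup() hunk below): a hash-chain walk that runs both from
the softirq classify path (under rcu_read_lock_bh()) and from netlink
control paths (under RTNL) can fetch each link with the combined
lockdep check instead of choosing one of the two annotations.

	struct tcindex_filter __rcu **fp = &p->h[key % p->hash];
	struct tcindex_filter *f;

	for (f = rcu_dereference_bh_rtnl(*fp);
	     f;
	     fp = &f->next, f = rcu_dereference_bh_rtnl(*fp))
		if (f->key == key)
			return &f->result;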

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/rtnetlink.h |   10 ++
 net/sched/cls_tcindex.c   |  240 ++++++++++++++++++++++++++++-----------------
 2 files changed, 159 insertions(+), 91 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 8e3e66a..73228e2 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -42,6 +42,16 @@ static inline int lockdep_rtnl_is_held(void)
 	rcu_dereference_check(p, lockdep_rtnl_is_held())
 
 /**
+ * rcu_dereference_bh_rtnl - rcu_dereference_bh with debug checking
+ * @p: The pointer to read, prior to dereference
+ *
+ * Do an rcu_dereference_bh(p), but check caller either holds rcu_read_lock_bh()
+ * or RTNL. Note : Please prefer rtnl_dereference() or rcu_dereference_bh()
+ */
+#define rcu_dereference_bh_rtnl(p)				\
+	rcu_dereference_bh_check(p, lockdep_rtnl_is_held())
+
+/**
  * rtnl_dereference - fetch RCU pointer when updates are prevented by RTNL
  * @p: The pointer to read, prior to dereferencing
  *
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index eed8404..741c033 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -32,19 +32,22 @@ struct tcindex_filter_result {
 struct tcindex_filter {
 	u16 key;
 	struct tcindex_filter_result result;
-	struct tcindex_filter *next;
+	struct tcindex_filter __rcu *next;
+	struct rcu_head rcu;
 };
 
 
 struct tcindex_data {
 	struct tcindex_filter_result *perfect; /* perfect hash; NULL if none */
-	struct tcindex_filter **h; /* imperfect hash; only used if !perfect;
-				      NULL if unused */
+	struct tcindex_filter __rcu **h; /* imperfect hash; only used if
+					    !perfect; NULL if unused */
+	struct tcf_proto *tp;
 	u16 mask;		/* AND key with mask */
-	int shift;		/* shift ANDed key to the right */
-	int hash;		/* hash table size; 0 if undefined */
-	int alloc_hash;		/* allocated size */
-	int fall_through;	/* 0: only classify if explicit match */
+	u32 shift;		/* shift ANDed key to the right */
+	u32 hash;		/* hash table size; 0 if undefined */
+	u32 alloc_hash;		/* allocated size */
+	u32 fall_through;	/* 0: only classify if explicit match */
+	struct rcu_head rcu;
 };
 
 static inline int
@@ -56,13 +59,18 @@ tcindex_filter_is_set(struct tcindex_filter_result *r)
 static struct tcindex_filter_result *
 tcindex_lookup(struct tcindex_data *p, u16 key)
 {
-	struct tcindex_filter *f;
+	if (p->perfect) {
+		struct tcindex_filter_result *f = p->perfect + key;
+
+		return tcindex_filter_is_set(f) ? f : NULL;
+	} else if (p->h) {
+		struct tcindex_filter __rcu **fp;
+		struct tcindex_filter *f;
 
-	if (p->perfect)
-		return tcindex_filter_is_set(p->perfect + key) ?
-			p->perfect + key : NULL;
-	else if (p->h) {
-		for (f = p->h[key % p->hash]; f; f = f->next)
+		fp = &p->h[key % p->hash];
+		for (f = rcu_dereference_bh_rtnl(*fp);
+		     f;
+		     fp = &f->next, f = rcu_dereference_bh_rtnl(*fp))
 			if (f->key == key)
 				return &f->result;
 	}
@@ -74,7 +82,7 @@ tcindex_lookup(struct tcindex_data *p, u16 key)
 static int tcindex_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			    struct tcf_result *res)
 {
-	struct tcindex_data *p = tp->root;
+	struct tcindex_data *p = rcu_dereference(tp->root);
 	struct tcindex_filter_result *f;
 	int key = (skb->tc_index & p->mask) >> p->shift;
 
@@ -99,7 +107,7 @@ static int tcindex_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 
 static unsigned long tcindex_get(struct tcf_proto *tp, u32 handle)
 {
-	struct tcindex_data *p = tp->root;
+	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter_result *r;
 
 	pr_debug("tcindex_get(tp %p,handle 0x%08x)\n", tp, handle);
@@ -129,49 +137,59 @@ static int tcindex_init(struct tcf_proto *tp)
 	p->hash = DEFAULT_HASH_SIZE;
 	p->fall_through = 1;
 
-	tp->root = p;
+	rcu_assign_pointer(tp->root, p);
 	return 0;
 }
 
-
 static int
-__tcindex_delete(struct tcf_proto *tp, unsigned long arg, int lock)
+tcindex_delete(struct tcf_proto *tp, unsigned long arg)
 {
-	struct tcindex_data *p = tp->root;
+	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter_result *r = (struct tcindex_filter_result *) arg;
+	struct tcindex_filter __rcu **walk;
 	struct tcindex_filter *f = NULL;
 
-	pr_debug("tcindex_delete(tp %p,arg 0x%lx),p %p,f %p\n", tp, arg, p, f);
+	pr_debug("tcindex_delete(tp %p,arg 0x%lx),p %p\n", tp, arg, p);
 	if (p->perfect) {
 		if (!r->res.class)
 			return -ENOENT;
 	} else {
 		int i;
-		struct tcindex_filter **walk = NULL;
 
-		for (i = 0; i < p->hash; i++)
-			for (walk = p->h+i; *walk; walk = &(*walk)->next)
-				if (&(*walk)->result == r)
+		for (i = 0; i < p->hash; i++) {
+			walk = p->h + i;
+			for (f = rtnl_dereference(*walk); f;
+			     walk = &f->next, f = rtnl_dereference(*walk)) {
+				if (&f->result == r)
 					goto found;
+			}
+		}
 		return -ENOENT;
 
 found:
-		f = *walk;
-		if (lock)
-			tcf_tree_lock(tp);
-		*walk = f->next;
-		if (lock)
-			tcf_tree_unlock(tp);
+		rcu_assign_pointer(*walk, rtnl_dereference(f->next));
 	}
 	tcf_unbind_filter(tp, &r->res);
 	tcf_exts_destroy(tp, &r->exts);
-	kfree(f);
+	if (f)
+		kfree_rcu(f, rcu);
 	return 0;
 }
 
-static int tcindex_delete(struct tcf_proto *tp, unsigned long arg)
+static int tcindex_destroy_element(struct tcf_proto *tp,
+				   unsigned long arg,
+				   struct tcf_walker *walker)
 {
-	return __tcindex_delete(tp, arg, 1);
+	return tcindex_delete(tp, arg);
+}
+
+static void __tcindex_destroy(struct rcu_head *head)
+{
+	struct tcindex_data *p = container_of(head, struct tcindex_data, rcu);
+
+	kfree(p->perfect);
+	kfree(p->h);
+	kfree(p);
 }
 
 static inline int
@@ -188,6 +206,14 @@ static const struct nla_policy tcindex_policy[TCA_TCINDEX_MAX + 1] = {
 	[TCA_TCINDEX_CLASSID]		= { .type = NLA_U32 },
 };
 
+static void __tcindex_partial_destroy(struct rcu_head *head)
+{
+	struct tcindex_data *p = container_of(head, struct tcindex_data, rcu);
+
+	kfree(p->perfect);
+	kfree(p);
+}
+
 static int
 tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 		  u32 handle, struct tcindex_data *p,
@@ -197,7 +223,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	int err, balloc = 0;
 	struct tcindex_filter_result new_filter_result, *old_r = r;
 	struct tcindex_filter_result cr;
-	struct tcindex_data cp;
+	struct tcindex_data *cp, *oldp;
 	struct tcindex_filter *f = NULL; /* make gcc behave */
 	struct tcf_exts e;
 
@@ -206,7 +232,27 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	if (err < 0)
 		return err;
 
-	memcpy(&cp, p, sizeof(cp));
+	/* tcindex_data attributes must look atomic to classifier/lookup so
+	 * allocate new tcindex data and RCU assign it onto root. Keeping
+	 * perfect hash and hash pointers from old data.
+	 */
+	cp = kzalloc(sizeof(*cp), GFP_KERNEL);
+	if (!cp)
+		return -ENOMEM;
+
+	cp->mask = p->mask;
+	cp->shift = p->shift;
+	cp->hash = p->hash;
+	cp->alloc_hash = p->alloc_hash;
+	cp->fall_through = p->fall_through;
+	cp->tp = tp;
+
+	if (p->perfect) {
+		cp->perfect = kcalloc(cp->hash, sizeof(*r), GFP_KERNEL);
+		memcpy(cp->perfect, p->perfect, sizeof(*r) * cp->hash);
+	}
+	cp->h = p->h;
+
 	memset(&new_filter_result, 0, sizeof(new_filter_result));
 	tcf_exts_init(&new_filter_result.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
 
@@ -218,71 +264,82 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	}
 
 	if (tb[TCA_TCINDEX_HASH])
-		cp.hash = nla_get_u32(tb[TCA_TCINDEX_HASH]);
+		cp->hash = nla_get_u32(tb[TCA_TCINDEX_HASH]);
 
 	if (tb[TCA_TCINDEX_MASK])
-		cp.mask = nla_get_u16(tb[TCA_TCINDEX_MASK]);
+		cp->mask = nla_get_u16(tb[TCA_TCINDEX_MASK]);
 
 	if (tb[TCA_TCINDEX_SHIFT])
-		cp.shift = nla_get_u32(tb[TCA_TCINDEX_SHIFT]);
+		cp->shift = nla_get_u32(tb[TCA_TCINDEX_SHIFT]);
 
 	err = -EBUSY;
+
 	/* Hash already allocated, make sure that we still meet the
 	 * requirements for the allocated hash.
 	 */
-	if (cp.perfect) {
-		if (!valid_perfect_hash(&cp) ||
-		    cp.hash > cp.alloc_hash)
+	if (cp->perfect) {
+		if (!valid_perfect_hash(cp) ||
+		    cp->hash > cp->alloc_hash)
 			goto errout;
-	} else if (cp.h && cp.hash != cp.alloc_hash)
+	} else if (cp->h && cp->hash != cp->alloc_hash)
 		goto errout;
 
 	err = -EINVAL;
 	if (tb[TCA_TCINDEX_FALL_THROUGH])
-		cp.fall_through = nla_get_u32(tb[TCA_TCINDEX_FALL_THROUGH]);
+		cp->fall_through = nla_get_u32(tb[TCA_TCINDEX_FALL_THROUGH]);
 
-	if (!cp.hash) {
+	if (!cp->hash) {
 		/* Hash not specified, use perfect hash if the upper limit
 		 * of the hashing index is below the threshold.
 		 */
-		if ((cp.mask >> cp.shift) < PERFECT_HASH_THRESHOLD)
-			cp.hash = (cp.mask >> cp.shift) + 1;
+		if ((cp->mask >> cp->shift) < PERFECT_HASH_THRESHOLD)
+			cp->hash = (cp->mask >> cp->shift) + 1;
 		else
-			cp.hash = DEFAULT_HASH_SIZE;
+			cp->hash = DEFAULT_HASH_SIZE;
 	}
 
-	if (!cp.perfect && !cp.h)
-		cp.alloc_hash = cp.hash;
+	if (!cp->perfect && !cp->h)
+		cp->alloc_hash = cp->hash;
 
 	/* Note: this could be as restrictive as if (handle & ~(mask >> shift))
 	 * but then, we'd fail handles that may become valid after some future
 	 * mask change. While this is extremely unlikely to ever matter,
 	 * the check below is safer (and also more backwards-compatible).
 	 */
-	if (cp.perfect || valid_perfect_hash(&cp))
-		if (handle >= cp.alloc_hash)
+	if (cp->perfect || valid_perfect_hash(cp))
+		if (handle >= cp->alloc_hash)
 			goto errout;
 
 
 	err = -ENOMEM;
-	if (!cp.perfect && !cp.h) {
-		if (valid_perfect_hash(&cp)) {
-			cp.perfect = kcalloc(cp.hash, sizeof(*r), GFP_KERNEL);
-			if (!cp.perfect)
+	if (!cp->perfect && !cp->h) {
+		if (valid_perfect_hash(cp)) {
+			struct tcindex_filter_result *perfect;
+
+			perfect = kcalloc(cp->hash, sizeof(*r), GFP_KERNEL);
+			if (!perfect)
 				goto errout;
+			cp->perfect = perfect;
 			balloc = 1;
 		} else {
-			cp.h = kcalloc(cp.hash, sizeof(f), GFP_KERNEL);
-			if (!cp.h)
+			struct tcindex_filter __rcu **hash;
+
+			hash = kcalloc(cp->hash,
+				       sizeof(struct tcindex_filter *),
+				       GFP_KERNEL);
+
+			if (!hash)
 				goto errout;
+
+			cp->h = hash;
 			balloc = 2;
 		}
 	}
 
-	if (cp.perfect)
-		r = cp.perfect + handle;
+	if (cp->perfect)
+		r = cp->perfect + handle;
 	else
-		r = tcindex_lookup(&cp, handle) ? : &new_filter_result;
+		r = tcindex_lookup(cp, handle) ? : &new_filter_result;
 
 	if (r == &new_filter_result) {
 		f = kzalloc(sizeof(*f), GFP_KERNEL);
@@ -297,33 +354,41 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 
 	tcf_exts_change(tp, &cr.exts, &e);
 
-	tcf_tree_lock(tp);
 	if (old_r && old_r != r)
 		memset(old_r, 0, sizeof(*old_r));
 
-	memcpy(p, &cp, sizeof(cp));
+	oldp = p;
 	memcpy(r, &cr, sizeof(cr));
+	rcu_assign_pointer(tp->root, cp);
 
 	if (r == &new_filter_result) {
-		struct tcindex_filter **fp;
+		struct tcindex_filter *nfp;
+		struct tcindex_filter __rcu **fp;
 
 		f->key = handle;
 		f->result = new_filter_result;
 		f->next = NULL;
-		for (fp = p->h+(handle % p->hash); *fp; fp = &(*fp)->next)
-			/* nothing */;
-		*fp = f;
+
+		fp = p->h + (handle % p->hash);
+		for (nfp = rtnl_dereference(*fp);
+		     nfp;
+		     fp = &nfp->next, nfp = rtnl_dereference(*fp))
+				/* nothing */;
+
+		rcu_assign_pointer(*fp, f);
 	}
-	tcf_tree_unlock(tp);
 
+	if (oldp)
+		call_rcu(&oldp->rcu, __tcindex_partial_destroy);
 	return 0;
 
 errout_alloc:
 	if (balloc == 1)
-		kfree(cp.perfect);
+		kfree(cp->perfect);
 	else if (balloc == 2)
-		kfree(cp.h);
+		kfree(cp->h);
 errout:
+	kfree(cp);
 	tcf_exts_destroy(tp, &e);
 	return err;
 }
@@ -335,7 +400,7 @@ tcindex_change(struct net *net, struct sk_buff *in_skb,
 {
 	struct nlattr *opt = tca[TCA_OPTIONS];
 	struct nlattr *tb[TCA_TCINDEX_MAX + 1];
-	struct tcindex_data *p = tp->root;
+	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter_result *r = (struct tcindex_filter_result *) *arg;
 	int err;
 
@@ -354,10 +419,9 @@ tcindex_change(struct net *net, struct sk_buff *in_skb,
 				 tca[TCA_RATE]);
 }
 
-
 static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker)
 {
-	struct tcindex_data *p = tp->root;
+	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter *f, *next;
 	int i;
 
@@ -380,8 +444,8 @@ static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker)
 	if (!p->h)
 		return;
 	for (i = 0; i < p->hash; i++) {
-		for (f = p->h[i]; f; f = next) {
-			next = f->next;
+		for (f = rtnl_dereference(p->h[i]); f; f = next) {
+			next = rtnl_dereference(f->next);
 			if (walker->count >= walker->skip) {
 				if (walker->fn(tp, (unsigned long) &f->result,
 				    walker) < 0) {
@@ -394,17 +458,9 @@ static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker)
 	}
 }
 
-
-static int tcindex_destroy_element(struct tcf_proto *tp,
-    unsigned long arg, struct tcf_walker *walker)
-{
-	return __tcindex_delete(tp, arg, 0);
-}
-
-
 static void tcindex_destroy(struct tcf_proto *tp)
 {
-	struct tcindex_data *p = tp->root;
+	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcf_walker walker;
 
 	pr_debug("tcindex_destroy(tp %p),p %p\n", tp, p);
@@ -412,17 +468,16 @@ static void tcindex_destroy(struct tcf_proto *tp)
 	walker.skip = 0;
 	walker.fn = &tcindex_destroy_element;
 	tcindex_walk(tp, &walker);
-	kfree(p->perfect);
-	kfree(p->h);
-	kfree(p);
-	tp->root = NULL;
+
+	RCU_INIT_POINTER(tp->root, NULL);
+	call_rcu(&p->rcu, __tcindex_destroy);
 }
 
 
 static int tcindex_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
     struct sk_buff *skb, struct tcmsg *t)
 {
-	struct tcindex_data *p = tp->root;
+	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter_result *r = (struct tcindex_filter_result *) fh;
 	unsigned char *b = skb_tail_pointer(skb);
 	struct nlattr *nest;
@@ -445,15 +500,18 @@ static int tcindex_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		nla_nest_end(skb, nest);
 	} else {
 		if (p->perfect) {
-			t->tcm_handle = r-p->perfect;
+			t->tcm_handle = r - p->perfect;
 		} else {
 			struct tcindex_filter *f;
+			struct tcindex_filter __rcu **fp;
 			int i;
 
 			t->tcm_handle = 0;
 			for (i = 0; !t->tcm_handle && i < p->hash; i++) {
-				for (f = p->h[i]; !t->tcm_handle && f;
-				     f = f->next) {
+				fp = &p->h[i];
+				for (f = rtnl_dereference(*fp);
+				     !t->tcm_handle && f;
+				     fp = &f->next, f = rtnl_dereference(*fp)) {
 					if (&f->result == r)
 						t->tcm_handle = f->key;
 				}


* [RFC PATCH 09/15] net: sched: make cls_u32 lockless
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
                   ` (7 preceding siblings ...)
  2014-04-30 16:38 ` [RFC PATCH 08/15] net: sched: RCU cls_tcindex John Fastabend
@ 2014-04-30 16:38 ` John Fastabend
  2014-04-30 16:39 ` [RFC PATCH 10/15] net: sched: rcu'ify cls_rsvp John Fastabend
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:38 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

Make the cls_u32 classifier safe to run without holding the qdisc
lock. This patch converts the statistics that are updated in the
u32_classify() read-side path into per-CPU counters.
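
A minimal sketch of the per-CPU counter pattern (illustrative only;
the patch itself keeps the counters in struct tc_u32_pcnt and a
per-cpu mark success counter): allocate per CPU at filter setup, bump
locklessly on the classify fast path, and free from the RCU callback.

	#include <linux/percpu.h>

	u64 __percpu *hits;

	hits = alloc_percpu(u64);	/* at filter ->change() time */
	if (!hits)
		return -ENOMEM;

	this_cpu_inc(*hits);		/* in u32_classify(), no lock needed */

	free_percpu(hits);		/* from the call_rcu() callback */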

This patch was tested with a tight u32 filter add/delete loop while
generating traffic with pktgen. By running pktgen on vlan devices
created on top of a physical device we can exercise the qdisc layer
correctly. For ingress qdiscs a loopback cable was used.

for i in {1..100}; do
        q=`echo $i%8|bc`;
        echo -n "u32 tos: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \
                  action skbedit queue_mapping $q;
        sleep 1;
        tc filter del dev p3p2 prio $i;

        echo -n "u32 tos hash table: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q
        sleep 2;
        tc filter del dev p3p2 prio $i
        sleep 1;
done

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_u32.c |  258 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 169 insertions(+), 89 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 84c28da..3d7f364 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -36,6 +36,7 @@
 #include <linux/kernel.h>
 #include <linux/string.h>
 #include <linux/errno.h>
+#include <linux/percpu.h>
 #include <linux/rtnetlink.h>
 #include <linux/skbuff.h>
 #include <net/netlink.h>
@@ -43,40 +44,46 @@
 #include <net/pkt_cls.h>
 
 struct tc_u_knode {
-	struct tc_u_knode	*next;
+	struct tc_u_knode __rcu	*next;
 	u32			handle;
-	struct tc_u_hnode	*ht_up;
+	struct tc_u_hnode __rcu	*ht_up;
 	struct tcf_exts		exts;
 #ifdef CONFIG_NET_CLS_IND
 	int			ifindex;
 #endif
 	u8			fshift;
 	struct tcf_result	res;
-	struct tc_u_hnode	*ht_down;
+	struct tc_u_hnode __rcu	*ht_down;
 #ifdef CONFIG_CLS_U32_PERF
-	struct tc_u32_pcnt	*pf;
+	struct tc_u32_pcnt __percpu *pf;
 #endif
 #ifdef CONFIG_CLS_U32_MARK
-	struct tc_u32_mark	mark;
+	u32			val;
+	u32			mask;
+	u32 __percpu		*pcpu_success;
 #endif
+	struct tcf_proto	*tp;
 	struct tc_u32_sel	sel;
+	struct rcu_head		rcu;
 };
 
 struct tc_u_hnode {
-	struct tc_u_hnode	*next;
+	struct tc_u_hnode __rcu	*next;
 	u32			handle;
 	u32			prio;
 	struct tc_u_common	*tp_c;
 	int			refcnt;
 	unsigned int		divisor;
-	struct tc_u_knode	*ht[1];
+	struct tc_u_knode __rcu	*ht[1];
+	struct rcu_head		rcu;
 };
 
 struct tc_u_common {
-	struct tc_u_hnode	*hlist;
+	struct tc_u_hnode __rcu	*hlist;
 	struct Qdisc		*q;
 	int			refcnt;
 	u32			hgenerator;
+	struct rcu_head		rcu;
 };
 
 static inline unsigned int u32_hash_fold(__be32 key,
@@ -95,7 +102,7 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct
 		unsigned int	  off;
 	} stack[TC_U32_MAXDEPTH];
 
-	struct tc_u_hnode *ht = tp->root;
+	struct tc_u_hnode *ht = rcu_dereference_bh(tp->root);
 	unsigned int off = skb_network_offset(skb);
 	struct tc_u_knode *n;
 	int sdepth = 0;
@@ -107,23 +114,23 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct
 	int i, r;
 
 next_ht:
-	n = ht->ht[sel];
+	n = rcu_dereference_bh(ht->ht[sel]);
 
 next_knode:
 	if (n) {
 		struct tc_u32_key *key = n->sel.keys;
 
 #ifdef CONFIG_CLS_U32_PERF
-		n->pf->rcnt += 1;
+		this_cpu_inc(n->pf->rcnt);
 		j = 0;
 #endif
 
 #ifdef CONFIG_CLS_U32_MARK
-		if ((skb->mark & n->mark.mask) != n->mark.val) {
-			n = n->next;
+		if ((skb->mark & n->mask) != n->val) {
+			n = rcu_dereference_bh(n->next);
 			goto next_knode;
 		} else {
-			n->mark.success++;
+			this_cpu_inc(*n->pcpu_success);
 		}
 #endif
 
@@ -138,37 +145,39 @@ next_knode:
 			if (!data)
 				goto out;
 			if ((*data ^ key->val) & key->mask) {
-				n = n->next;
+				n = rcu_dereference_bh(n->next);
 				goto next_knode;
 			}
 #ifdef CONFIG_CLS_U32_PERF
-			n->pf->kcnts[j] += 1;
+			this_cpu_inc(n->pf->kcnts[j]);
 			j++;
 #endif
 		}
-		if (n->ht_down == NULL) {
+
+		ht = rcu_dereference_bh(n->ht_down);
+		if (!ht) {
 check_terminal:
 			if (n->sel.flags & TC_U32_TERMINAL) {
 
 				*res = n->res;
 #ifdef CONFIG_NET_CLS_IND
 				if (!tcf_match_indev(skb, n->ifindex)) {
-					n = n->next;
+					n = rcu_dereference_bh(n->next);
 					goto next_knode;
 				}
 #endif
 #ifdef CONFIG_CLS_U32_PERF
-				n->pf->rhit += 1;
+				this_cpu_inc(n->pf->rhit);
 #endif
 				r = tcf_exts_exec(skb, &n->exts, res);
 				if (r < 0) {
-					n = n->next;
+					n = rcu_dereference_bh(n->next);
 					goto next_knode;
 				}
 
 				return r;
 			}
-			n = n->next;
+			n = rcu_dereference_bh(n->next);
 			goto next_knode;
 		}
 
@@ -179,7 +188,7 @@ check_terminal:
 		stack[sdepth].off = off;
 		sdepth++;
 
-		ht = n->ht_down;
+		ht = rcu_dereference_bh(n->ht_down);
 		sel = 0;
 		if (ht->divisor) {
 			__be32 *data, hdata;
@@ -221,7 +230,7 @@ check_terminal:
 	/* POP */
 	if (sdepth--) {
 		n = stack[sdepth].knode;
-		ht = n->ht_up;
+		ht = rcu_dereference_bh(n->ht_up);
 		off = stack[sdepth].off;
 		goto check_terminal;
 	}
@@ -238,7 +247,9 @@ u32_lookup_ht(struct tc_u_common *tp_c, u32 handle)
 {
 	struct tc_u_hnode *ht;
 
-	for (ht = tp_c->hlist; ht; ht = ht->next)
+	for (ht = rtnl_dereference(tp_c->hlist);
+	     ht;
+	     ht = rtnl_dereference(ht->next))
 		if (ht->handle == handle)
 			break;
 
@@ -255,7 +266,9 @@ u32_lookup_key(struct tc_u_hnode *ht, u32 handle)
 	if (sel > ht->divisor)
 		goto out;
 
-	for (n = ht->ht[sel]; n; n = n->next)
+	for (n = rtnl_dereference(ht->ht[sel]);
+	     n;
+	     n = rtnl_dereference(n->next))
 		if (n->handle == handle)
 			break;
 out:
@@ -269,7 +282,7 @@ static unsigned long u32_get(struct tcf_proto *tp, u32 handle)
 	struct tc_u_common *tp_c = tp->data;
 
 	if (TC_U32_HTID(handle) == TC_U32_ROOT)
-		ht = tp->root;
+		ht = rtnl_dereference(tp->root);
 	else
 		ht = u32_lookup_ht(tp_c, TC_U32_HTID(handle));
 
@@ -290,6 +303,9 @@ static u32 gen_new_htid(struct tc_u_common *tp_c)
 {
 	int i = 0x800;
 
+	/* hgenerator only used inside rtnl lock it is safe to increment
+	 * without read _copy_ update semantics
+	 */
 	do {
 		if (++tp_c->hgenerator == 0x7FF)
 			tp_c->hgenerator = 1;
@@ -325,11 +341,11 @@ static int u32_init(struct tcf_proto *tp)
 	}
 
 	tp_c->refcnt++;
-	root_ht->next = tp_c->hlist;
-	tp_c->hlist = root_ht;
+	rcu_assign_pointer(root_ht->next, tp_c->hlist);
+	rcu_assign_pointer(tp_c->hlist, root_ht);
 	root_ht->tp_c = tp_c;
 
-	tp->root = root_ht;
+	rcu_assign_pointer(tp->root, root_ht);
 	tp->data = tp_c;
 	return 0;
 }
@@ -341,25 +357,33 @@ static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n)
 	if (n->ht_down)
 		n->ht_down->refcnt--;
 #ifdef CONFIG_CLS_U32_PERF
-	kfree(n->pf);
+	free_percpu(n->pf);
 #endif
 	kfree(n);
 	return 0;
 }
 
+void __u32_delete_key(struct rcu_head *rcu)
+{
+	struct tc_u_knode *key = container_of(rcu, struct tc_u_knode, rcu);
+
+	u32_destroy_key(key->tp, key);
+}
+
 static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
 {
-	struct tc_u_knode **kp;
+	struct tc_u_knode __rcu **kp;
+	struct tc_u_knode *pkp;
 	struct tc_u_hnode *ht = key->ht_up;
 
 	if (ht) {
-		for (kp = &ht->ht[TC_U32_HASH(key->handle)]; *kp; kp = &(*kp)->next) {
-			if (*kp == key) {
-				tcf_tree_lock(tp);
-				*kp = key->next;
-				tcf_tree_unlock(tp);
+		kp = &ht->ht[TC_U32_HASH(key->handle)];
+		for (pkp = rtnl_dereference(*kp); pkp;
+		     kp = &pkp->next, pkp = rtnl_dereference(*kp)) {
+			if (pkp == key) {
+				rcu_assign_pointer(*kp, key->next);
 
-				u32_destroy_key(tp, key);
+				call_rcu(&key->rcu, __u32_delete_key);
 				return 0;
 			}
 		}
@@ -368,16 +392,16 @@ static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
 	return 0;
 }
 
-static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
+static void u32_clear_hnode(struct tc_u_hnode *ht)
 {
 	struct tc_u_knode *n;
 	unsigned int h;
 
 	for (h = 0; h <= ht->divisor; h++) {
-		while ((n = ht->ht[h]) != NULL) {
-			ht->ht[h] = n->next;
-
-			u32_destroy_key(tp, n);
+		while ((n = rtnl_dereference(ht->ht[h])) != NULL) {
+			rcu_assign_pointer(ht->ht[h],
+					   rtnl_dereference(n->next));
+			call_rcu(&n->rcu, __u32_delete_key);
 		}
 	}
 }
@@ -385,28 +409,31 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 {
 	struct tc_u_common *tp_c = tp->data;
-	struct tc_u_hnode **hn;
+	struct tc_u_hnode __rcu **hn;
+	struct tc_u_hnode *phn;
 
 	WARN_ON(ht->refcnt);
 
-	u32_clear_hnode(tp, ht);
+	u32_clear_hnode(ht);
 
-	for (hn = &tp_c->hlist; *hn; hn = &(*hn)->next) {
-		if (*hn == ht) {
-			*hn = ht->next;
-			kfree(ht);
+	hn = &tp_c->hlist;
+	for (phn = rtnl_dereference(*hn);
+	     phn;
+	     hn = &phn->next, phn = rtnl_dereference(*hn)) {
+		if (phn == ht) {
+			rcu_assign_pointer(*hn, ht->next);
+			kfree_rcu(ht, rcu);
 			return 0;
 		}
 	}
 
-	WARN_ON(1);
 	return -ENOENT;
 }
 
 static void u32_destroy(struct tcf_proto *tp)
 {
 	struct tc_u_common *tp_c = tp->data;
-	struct tc_u_hnode *root_ht = tp->root;
+	struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
 
 	WARN_ON(root_ht == NULL);
 
@@ -418,17 +445,16 @@ static void u32_destroy(struct tcf_proto *tp)
 
 		tp->q->u32_node = NULL;
 
-		for (ht = tp_c->hlist; ht; ht = ht->next) {
+		for (ht = rtnl_dereference(tp_c->hlist);
+		     ht;
+		     ht = rtnl_dereference(ht->next)) {
 			ht->refcnt--;
-			u32_clear_hnode(tp, ht);
+			u32_clear_hnode(ht);
 		}
 
-		while ((ht = tp_c->hlist) != NULL) {
-			tp_c->hlist = ht->next;
-
-			WARN_ON(ht->refcnt != 0);
-
-			kfree(ht);
+		while ((ht = rtnl_dereference(tp_c->hlist)) != NULL) {
+			rcu_assign_pointer(tp_c->hlist, ht->next);
+			kfree_rcu(ht, rcu);
 		}
 
 		kfree(tp_c);
@@ -440,6 +466,7 @@ static void u32_destroy(struct tcf_proto *tp)
 static int u32_delete(struct tcf_proto *tp, unsigned long arg)
 {
 	struct tc_u_hnode *ht = (struct tc_u_hnode *)arg;
+	struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
 
 	if (ht == NULL)
 		return 0;
@@ -447,7 +474,7 @@ static int u32_delete(struct tcf_proto *tp, unsigned long arg)
 	if (TC_U32_KEY(ht->handle))
 		return u32_delete_key(tp, (struct tc_u_knode *)ht);
 
-	if (tp->root == ht)
+	if (root_ht == ht)
 		return -EINVAL;
 
 	if (ht->refcnt == 1) {
@@ -465,7 +492,9 @@ static u32 gen_new_kid(struct tc_u_hnode *ht, u32 handle)
 	struct tc_u_knode *n;
 	unsigned int i = 0x7FF;
 
-	for (n = ht->ht[TC_U32_HASH(handle)]; n; n = n->next)
+	for (n = rtnl_dereference(ht->ht[TC_U32_HASH(handle)]);
+	     n;
+	     n = rtnl_dereference(n->next))
 		if (i < TC_U32_NODE(n->handle))
 			i = TC_U32_NODE(n->handle);
 	i++;
@@ -512,10 +541,8 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp,
 			ht_down->refcnt++;
 		}
 
-		tcf_tree_lock(tp);
-		ht_old = n->ht_down;
-		n->ht_down = ht_down;
-		tcf_tree_unlock(tp);
+		ht_old = rtnl_dereference(n->ht_down);
+		rcu_assign_pointer(n->ht_down, ht_down);
 
 		if (ht_old)
 			ht_old->refcnt--;
@@ -555,6 +582,9 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 	struct nlattr *tb[TCA_U32_MAX + 1];
 	u32 htid;
 	int err;
+#ifdef CONFIG_CLS_U32_PERF
+	size_t size;
+#endif
 
 	if (opt == NULL)
 		return handle ? -EINVAL : 0;
@@ -592,8 +622,8 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 		ht->divisor = divisor;
 		ht->handle = handle;
 		ht->prio = tp->prio;
-		ht->next = tp_c->hlist;
-		tp_c->hlist = ht;
+		rcu_assign_pointer(ht->next, tp_c->hlist);
+		rcu_assign_pointer(tp_c->hlist, ht);
 		*arg = (unsigned long)ht;
 		return 0;
 	}
@@ -601,7 +631,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 	if (tb[TCA_U32_HASH]) {
 		htid = nla_get_u32(tb[TCA_U32_HASH]);
 		if (TC_U32_HTID(htid) == TC_U32_ROOT) {
-			ht = tp->root;
+			ht = rtnl_dereference(tp->root);
 			htid = ht->handle;
 		} else {
 			ht = u32_lookup_ht(tp->data, TC_U32_HTID(htid));
@@ -609,7 +639,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 				return -EINVAL;
 		}
 	} else {
-		ht = tp->root;
+		ht = rtnl_dereference(tp->root);
 		htid = ht->handle;
 	}
 
@@ -633,8 +663,9 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 		return -ENOBUFS;
 
 #ifdef CONFIG_CLS_U32_PERF
-	n->pf = kzalloc(sizeof(struct tc_u32_pcnt) + s->nkeys*sizeof(u64), GFP_KERNEL);
-	if (n->pf == NULL) {
+	size = sizeof(struct tc_u32_pcnt) + s->nkeys * sizeof(u64);
+	n->pf = __alloc_percpu(size, __alignof__(struct tc_u32_pcnt));
+	if (!n->pf) {
 		kfree(n);
 		return -ENOBUFS;
 	}
@@ -645,34 +676,39 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 	n->handle = handle;
 	n->fshift = s->hmask ? ffs(ntohl(s->hmask)) - 1 : 0;
 	tcf_exts_init(&n->exts, TCA_U32_ACT, TCA_U32_POLICE);
+	n->tp = tp;
 
 #ifdef CONFIG_CLS_U32_MARK
+	n->pcpu_success = alloc_percpu(u32);
+
 	if (tb[TCA_U32_MARK]) {
 		struct tc_u32_mark *mark;
 
 		mark = nla_data(tb[TCA_U32_MARK]);
-		memcpy(&n->mark, mark, sizeof(struct tc_u32_mark));
-		n->mark.success = 0;
+		n->val = mark->val;
+		n->mask = mark->mask;
 	}
 #endif
 
 	err = u32_set_parms(net, tp, base, ht, n, tb, tca[TCA_RATE]);
 	if (err == 0) {
-		struct tc_u_knode **ins;
-		for (ins = &ht->ht[TC_U32_HASH(handle)]; *ins; ins = &(*ins)->next)
-			if (TC_U32_NODE(handle) < TC_U32_NODE((*ins)->handle))
+		struct tc_u_knode __rcu **ins;
+		struct tc_u_knode *pins;
+
+		ins = &ht->ht[TC_U32_HASH(handle)];
+		for (pins = rtnl_dereference(*ins); pins;
+		     ins = &pins->next, pins = rtnl_dereference(*ins))
+			if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle))
 				break;
 
-		n->next = *ins;
-		tcf_tree_lock(tp);
-		*ins = n;
-		tcf_tree_unlock(tp);
+		rcu_assign_pointer(n->next, pins);
+		rcu_assign_pointer(*ins, n);
 
 		*arg = (unsigned long)n;
 		return 0;
 	}
 #ifdef CONFIG_CLS_U32_PERF
-	kfree(n->pf);
+	free_percpu(n->pf);
 #endif
 	kfree(n);
 	return err;
@@ -688,7 +724,9 @@ static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 	if (arg->stop)
 		return;
 
-	for (ht = tp_c->hlist; ht; ht = ht->next) {
+	for (ht = rtnl_dereference(tp_c->hlist);
+	     ht;
+	     ht = rtnl_dereference(ht->next)) {
 		if (ht->prio != tp->prio)
 			continue;
 		if (arg->count >= arg->skip) {
@@ -699,7 +737,9 @@ static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 		}
 		arg->count++;
 		for (h = 0; h <= ht->divisor; h++) {
-			for (n = ht->ht[h]; n; n = n->next) {
+			for (n = rtnl_dereference(ht->ht[h]);
+			     n;
+			     n = rtnl_dereference(n->next)) {
 				if (arg->count < arg->skip) {
 					arg->count++;
 					continue;
@@ -718,6 +758,7 @@ static int u32_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		     struct sk_buff *skb, struct tcmsg *t)
 {
 	struct tc_u_knode *n = (struct tc_u_knode *)fh;
+	struct tc_u_hnode *ht_up, *ht_down;
 	struct nlattr *nest;
 
 	if (n == NULL)
@@ -736,11 +777,18 @@ static int u32_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		if (nla_put_u32(skb, TCA_U32_DIVISOR, divisor))
 			goto nla_put_failure;
 	} else {
+#ifdef CONFIG_CLS_U32_PERF
+		struct tc_u32_pcnt *gpf;
+#endif
+		int cpu;
+
 		if (nla_put(skb, TCA_U32_SEL,
 			    sizeof(n->sel) + n->sel.nkeys*sizeof(struct tc_u32_key),
 			    &n->sel))
 			goto nla_put_failure;
-		if (n->ht_up) {
+
+		ht_up = rtnl_dereference(n->ht_up);
+		if (ht_up) {
 			u32 htid = n->handle & 0xFFFFF000;
 			if (nla_put_u32(skb, TCA_U32_HASH, htid))
 				goto nla_put_failure;
@@ -748,14 +796,27 @@ static int u32_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		if (n->res.classid &&
 		    nla_put_u32(skb, TCA_U32_CLASSID, n->res.classid))
 			goto nla_put_failure;
-		if (n->ht_down &&
-		    nla_put_u32(skb, TCA_U32_LINK, n->ht_down->handle))
+
+		ht_down = rtnl_dereference(n->ht_down);
+		if (ht_down &&
+		    nla_put_u32(skb, TCA_U32_LINK, ht_down->handle))
 			goto nla_put_failure;
 
 #ifdef CONFIG_CLS_U32_MARK
-		if ((n->mark.val || n->mark.mask) &&
-		    nla_put(skb, TCA_U32_MARK, sizeof(n->mark), &n->mark))
-			goto nla_put_failure;
+		if ((n->val || n->mask)) {
+			struct tc_u32_mark mark = {.val = n->val,
+						   .mask = n->mask,
+						   .success = 0};
+
+			for_each_possible_cpu(cpu) {
+				__u32 cnt = *per_cpu_ptr(n->pcpu_success, cpu);
+
+				mark.success += cnt;
+			}
+
+			if (nla_put(skb, TCA_U32_MARK, sizeof(mark), &mark))
+				goto nla_put_failure;
+		}
 #endif
 
 		if (tcf_exts_dump(skb, &n->exts) < 0)
@@ -770,10 +831,29 @@ static int u32_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		}
 #endif
 #ifdef CONFIG_CLS_U32_PERF
+		gpf = kzalloc(sizeof(struct tc_u32_pcnt) +
+					n->sel.nkeys * sizeof(u64),
+			      GFP_KERNEL);
+		if (!gpf)
+			goto nla_put_failure;
+
+		for_each_possible_cpu(cpu) {
+			int i;
+			struct tc_u32_pcnt *pf = per_cpu_ptr(n->pf, cpu);
+
+			gpf->rcnt += pf->rcnt;
+			gpf->rhit += pf->rhit;
+			for (i = 0; i < n->sel.nkeys; i++)
+				gpf->kcnts[i] += pf->kcnts[i];
+		}
+
 		if (nla_put(skb, TCA_U32_PCNT,
 			    sizeof(struct tc_u32_pcnt) + n->sel.nkeys*sizeof(u64),
-			    n->pf))
+			    gpf)) {
+			kfree(gpf);
 			goto nla_put_failure;
+		}
+		kfree(gpf);
 #endif
 	}
 

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 10/15] net: sched: rcu'ify cls_rsvp
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
                   ` (8 preceding siblings ...)
  2014-04-30 16:38 ` [RFC PATCH 09/15] net: sched: make cls_u32 lockless John Fastabend
@ 2014-04-30 16:39 ` John Fastabend
  2014-04-30 16:39 ` [RFC PATCH 11/15] net: make cls_bpf rcu safe John Fastabend
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:39 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_rsvp.h |  152 ++++++++++++++++++++++++++++----------------------
 1 file changed, 86 insertions(+), 66 deletions(-)

diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
index 19f8e5d..ebfc377 100644
--- a/net/sched/cls_rsvp.h
+++ b/net/sched/cls_rsvp.h
@@ -70,31 +70,34 @@ struct rsvp_head {
 	u32			tmap[256/32];
 	u32			hgenerator;
 	u8			tgenerator;
-	struct rsvp_session	*ht[256];
+	struct rsvp_session __rcu *ht[256];
+	struct rcu_head		rcu;
 };
 
 struct rsvp_session {
-	struct rsvp_session	*next;
-	__be32			dst[RSVP_DST_LEN];
-	struct tc_rsvp_gpi 	dpi;
-	u8			protocol;
-	u8			tunnelid;
+	struct rsvp_session __rcu	*next;
+	__be32				dst[RSVP_DST_LEN];
+	struct tc_rsvp_gpi		dpi;
+	u8				protocol;
+	u8				tunnelid;
 	/* 16 (src,sport) hash slots, and one wildcard source slot */
-	struct rsvp_filter	*ht[16 + 1];
+	struct rsvp_filter __rcu	*ht[16 + 1];
+	struct rcu_head			rcu;
 };
 
 
 struct rsvp_filter {
-	struct rsvp_filter	*next;
-	__be32			src[RSVP_DST_LEN];
-	struct tc_rsvp_gpi	spi;
-	u8			tunnelhdr;
+	struct rsvp_filter __rcu	*next;
+	__be32				src[RSVP_DST_LEN];
+	struct tc_rsvp_gpi		spi;
+	u8				tunnelhdr;
 
-	struct tcf_result	res;
-	struct tcf_exts		exts;
+	struct tcf_result		res;
+	struct tcf_exts			exts;
 
-	u32			handle;
-	struct rsvp_session	*sess;
+	u32				handle;
+	struct rsvp_session		*sess;
+	struct rcu_head			rcu;
 };
 
 static inline unsigned int hash_dst(__be32 *dst, u8 protocol, u8 tunnelid)
@@ -128,7 +131,7 @@ static inline unsigned int hash_src(__be32 *src)
 static int rsvp_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			 struct tcf_result *res)
 {
-	struct rsvp_session **sht = ((struct rsvp_head *)tp->root)->ht;
+	struct rsvp_head *head = rcu_dereference_bh(tp->root);
 	struct rsvp_session *s;
 	struct rsvp_filter *f;
 	unsigned int h1, h2;
@@ -169,7 +172,8 @@ restart:
 	h1 = hash_dst(dst, protocol, tunnelid);
 	h2 = hash_src(src);
 
-	for (s = sht[h1]; s; s = s->next) {
+	for (s = rcu_dereference_bh(head->ht[h1]); s;
+	     s = rcu_dereference_bh(s->next)) {
 		if (dst[RSVP_DST_LEN-1] == s->dst[RSVP_DST_LEN - 1] &&
 		    protocol == s->protocol &&
 		    !(s->dpi.mask &
@@ -181,7 +185,8 @@ restart:
 #endif
 		    tunnelid == s->tunnelid) {
 
-			for (f = s->ht[h2]; f; f = f->next) {
+			for (f = rcu_dereference_bh(s->ht[h2]); f;
+			     f = rcu_dereference_bh(f->next)) {
 				if (src[RSVP_DST_LEN-1] == f->src[RSVP_DST_LEN - 1] &&
 				    !(f->spi.mask & (*(u32 *)(xprt + f->spi.offset) ^ f->spi.key))
 #if RSVP_DST_LEN == 4
@@ -205,7 +210,8 @@ matched:
 			}
 
 			/* And wildcard bucket... */
-			for (f = s->ht[16]; f; f = f->next) {
+			for (f = rcu_dereference_bh(s->ht[16]); f;
+			     f = rcu_dereference_bh(f->next)) {
 				*res = f->res;
 				RSVP_APPLY_RESULT();
 				goto matched;
@@ -218,7 +224,7 @@ matched:
 
 static unsigned long rsvp_get(struct tcf_proto *tp, u32 handle)
 {
-	struct rsvp_session **sht = ((struct rsvp_head *)tp->root)->ht;
+	struct rsvp_head *head = rtnl_dereference(tp->root);
 	struct rsvp_session *s;
 	struct rsvp_filter *f;
 	unsigned int h1 = handle & 0xFF;
@@ -227,8 +233,10 @@ static unsigned long rsvp_get(struct tcf_proto *tp, u32 handle)
 	if (h2 > 16)
 		return 0;
 
-	for (s = sht[h1]; s; s = s->next) {
-		for (f = s->ht[h2]; f; f = f->next) {
+	for (s = rtnl_dereference(head->ht[h1]); s;
+	     s = rtnl_dereference(s->next)) {
+		for (f = rtnl_dereference(s->ht[h2]); f;
+		     f = rtnl_dereference(f->next)) {
 			if (f->handle == handle)
 				return (unsigned long)f;
 		}
@@ -246,7 +254,7 @@ static int rsvp_init(struct tcf_proto *tp)
 
 	data = kzalloc(sizeof(struct rsvp_head), GFP_KERNEL);
 	if (data) {
-		tp->root = data;
+		rcu_assign_pointer(tp->root, data);
 		return 0;
 	}
 	return -ENOBUFS;
@@ -257,53 +265,55 @@ rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f)
 {
 	tcf_unbind_filter(tp, &f->res);
 	tcf_exts_destroy(tp, &f->exts);
-	kfree(f);
+	kfree_rcu(f, rcu);
 }
 
 static void rsvp_destroy(struct tcf_proto *tp)
 {
-	struct rsvp_head *data = xchg(&tp->root, NULL);
-	struct rsvp_session **sht;
+	struct rsvp_head *data = rtnl_dereference(tp->root);
 	int h1, h2;
 
 	if (data == NULL)
 		return;
 
-	sht = data->ht;
+	RCU_INIT_POINTER(tp->root, NULL);
 
 	for (h1 = 0; h1 < 256; h1++) {
 		struct rsvp_session *s;
 
-		while ((s = sht[h1]) != NULL) {
-			sht[h1] = s->next;
+		while ((s = rtnl_dereference(data->ht[h1])) != NULL) {
+			rcu_assign_pointer(data->ht[h1], s->next);
 
 			for (h2 = 0; h2 <= 16; h2++) {
 				struct rsvp_filter *f;
 
-				while ((f = s->ht[h2]) != NULL) {
-					s->ht[h2] = f->next;
+				while ((f = rtnl_dereference(s->ht[h2])) != NULL) {
+					rcu_assign_pointer(s->ht[h2], f->next);
 					rsvp_delete_filter(tp, f);
 				}
 			}
-			kfree(s);
+			kfree_rcu(s, rcu);
 		}
 	}
-	kfree(data);
+	RCU_INIT_POINTER(tp->root, NULL);
+	kfree_rcu(data, rcu);
 }
 
 static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
 {
-	struct rsvp_filter **fp, *f = (struct rsvp_filter *)arg;
+	struct rsvp_head *head = rtnl_dereference(tp->root);
+	struct rsvp_filter *nfp, *f = (struct rsvp_filter *)arg;
+	struct rsvp_filter __rcu **fp;
 	unsigned int h = f->handle;
-	struct rsvp_session **sp;
-	struct rsvp_session *s = f->sess;
+	struct rsvp_session __rcu **sp;
+	struct rsvp_session *nsp, *s = f->sess;
 	int i;
 
-	for (fp = &s->ht[(h >> 8) & 0xFF]; *fp; fp = &(*fp)->next) {
-		if (*fp == f) {
-			tcf_tree_lock(tp);
+	fp = &s->ht[(h >> 8) & 0xFF];
+	for (nfp = rtnl_dereference(*fp); nfp;
+	     fp = &nfp->next, nfp = rtnl_dereference(*fp)) {
+		if (nfp == f) {
 			*fp = f->next;
-			tcf_tree_unlock(tp);
 			rsvp_delete_filter(tp, f);
 
 			/* Strip tree */
@@ -313,14 +323,12 @@ static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
 					return 0;
 
 			/* OK, session has no flows */
-			for (sp = &((struct rsvp_head *)tp->root)->ht[h & 0xFF];
-			     *sp; sp = &(*sp)->next) {
-				if (*sp == s) {
-					tcf_tree_lock(tp);
+			sp = &head->ht[h & 0xFF];
+			for (nsp = rtnl_dereference(*sp); nsp;
+			     sp = &nsp->next, nsp = rtnl_dereference(*sp)) {
+				if (nsp == s) {
 					*sp = s->next;
-					tcf_tree_unlock(tp);
-
-					kfree(s);
+					kfree_rcu(s, rcu);
 					return 0;
 				}
 			}
@@ -333,7 +341,7 @@ static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
 
 static unsigned int gen_handle(struct tcf_proto *tp, unsigned salt)
 {
-	struct rsvp_head *data = tp->root;
+	struct rsvp_head *data = rtnl_dereference(tp->root);
 	int i = 0xFFFF;
 
 	while (i-- > 0) {
@@ -361,7 +369,7 @@ static int tunnel_bts(struct rsvp_head *data)
 
 static void tunnel_recycle(struct rsvp_head *data)
 {
-	struct rsvp_session **sht = data->ht;
+	struct rsvp_session __rcu **sht = data->ht;
 	u32 tmap[256/32];
 	int h1, h2;
 
@@ -369,11 +377,13 @@ static void tunnel_recycle(struct rsvp_head *data)
 
 	for (h1 = 0; h1 < 256; h1++) {
 		struct rsvp_session *s;
-		for (s = sht[h1]; s; s = s->next) {
+		for (s = rtnl_dereference(sht[h1]); s;
+		     s = rtnl_dereference(s->next)) {
 			for (h2 = 0; h2 <= 16; h2++) {
 				struct rsvp_filter *f;
 
-				for (f = s->ht[h2]; f; f = f->next) {
+				for (f = rtnl_dereference(s->ht[h2]); f;
+				     f = rtnl_dereference(f->next)) {
 					if (f->tunnelhdr == 0)
 						continue;
 					data->tgenerator = f->res.classid;
@@ -417,9 +427,11 @@ static int rsvp_change(struct net *net, struct sk_buff *in_skb,
 		       struct nlattr **tca,
 		       unsigned long *arg)
 {
-	struct rsvp_head *data = tp->root;
-	struct rsvp_filter *f, **fp;
-	struct rsvp_session *s, **sp;
+	struct rsvp_head *data = rtnl_dereference(tp->root);
+	struct rsvp_filter *f, *nfp;
+	struct rsvp_filter __rcu **fp;
+	struct rsvp_session *nsp, *s;
+	struct rsvp_session __rcu **sp;
 	struct tc_rsvp_pinfo *pinfo = NULL;
 	struct nlattr *opt = tca[TCA_OPTIONS];
 	struct nlattr *tb[TCA_RSVP_MAX + 1];
@@ -499,7 +511,9 @@ static int rsvp_change(struct net *net, struct sk_buff *in_skb,
 			goto errout;
 	}
 
-	for (sp = &data->ht[h1]; (s = *sp) != NULL; sp = &s->next) {
+	for (sp = &data->ht[h1];
+	     (s = rtnl_dereference(*sp)) != NULL;
+	     sp = &s->next) {
 		if (dst[RSVP_DST_LEN-1] == s->dst[RSVP_DST_LEN-1] &&
 		    pinfo && pinfo->protocol == s->protocol &&
 		    memcmp(&pinfo->dpi, &s->dpi, sizeof(s->dpi)) == 0 &&
@@ -521,12 +535,14 @@ insert:
 
 			tcf_exts_change(tp, &f->exts, &e);
 
-			for (fp = &s->ht[h2]; *fp; fp = &(*fp)->next)
-				if (((*fp)->spi.mask & f->spi.mask) != f->spi.mask)
+			fp = &s->ht[h2];
+			for (nfp = rtnl_dereference(*fp); nfp;
+			     fp = &nfp->next, nfp = rtnl_dereference(*fp))
+				if ((nfp->spi.mask & f->spi.mask) != f->spi.mask)
 					break;
-			f->next = *fp;
+			rcu_assign_pointer(f->next, nfp);
 			wmb();
-			*fp = f;
+			rcu_assign_pointer(*fp, f);
 
 			*arg = (unsigned long)f;
 			return 0;
@@ -546,13 +562,15 @@ insert:
 		s->protocol = pinfo->protocol;
 		s->tunnelid = pinfo->tunnelid;
 	}
-	for (sp = &data->ht[h1]; *sp; sp = &(*sp)->next) {
-		if (((*sp)->dpi.mask&s->dpi.mask) != s->dpi.mask)
+	sp = &data->ht[h1];
+	for (nsp = rtnl_dereference(*sp); nsp;
+	     sp = &nsp->next, nsp = rtnl_dereference(*sp)) {
+		if ((nsp->dpi.mask & s->dpi.mask) != s->dpi.mask)
 			break;
 	}
-	s->next = *sp;
+	rcu_assign_pointer(s->next, nsp);
 	wmb();
-	*sp = s;
+	rcu_assign_pointer(*sp, s);
 
 	goto insert;
 
@@ -565,7 +583,7 @@ errout2:
 
 static void rsvp_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 {
-	struct rsvp_head *head = tp->root;
+	struct rsvp_head *head = rtnl_dereference(tp->root);
 	unsigned int h, h1;
 
 	if (arg->stop)
@@ -574,11 +592,13 @@ static void rsvp_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 	for (h = 0; h < 256; h++) {
 		struct rsvp_session *s;
 
-		for (s = head->ht[h]; s; s = s->next) {
+		for (s = rtnl_dereference(head->ht[h]); s;
+		     s = rtnl_dereference(s->next)) {
 			for (h1 = 0; h1 <= 16; h1++) {
 				struct rsvp_filter *f;
 
-				for (f = s->ht[h1]; f; f = f->next) {
+				for (f = rtnl_dereference(s->ht[h1]); f;
+				     f = rtnl_dereference(f->next)) {
 					if (arg->count < arg->skip) {
 						arg->count++;
 						continue;

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 11/15] net: make cls_bpf rcu safe
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
                   ` (9 preceding siblings ...)
  2014-04-30 16:39 ` [RFC PATCH 10/15] net: sched: rcu'ify cls_rsvp John Fastabend
@ 2014-04-30 16:39 ` John Fastabend
  2014-04-30 16:39 ` [RFC PATCH 12/15] net: sched: make tc_action safe to walk under RCU John Fastabend
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:39 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

This patch makes the cls_bpf classifier RCU safe. The tcf_lock was
being used to protect the list of cls_bpf_prog entries; that list is
now RCU safe and updates are done with list_replace_rcu().
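
A minimal sketch of the resulting update/read pattern, assuming the
fields and helpers introduced in the diff below (illustration only,
not the exact code):

    /* writer side, serialized by RTNL */
    list_replace_rcu(&oldprog->link, &prog->link);
    call_rcu(&oldprog->rcu, __cls_bpf_delete_prog);

    /* reader side, classify path under RCU */
    list_for_each_entry_rcu(prog, &head->plist, link) {
            /* run prog->filter against the skb */
    }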

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/cls_bpf.c |   79 ++++++++++++++++++++++++++-------------------------
 1 file changed, 40 insertions(+), 39 deletions(-)

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 8e3cf49..de83f53 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -25,8 +25,9 @@ MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
 MODULE_DESCRIPTION("TC BPF based classifier");
 
 struct cls_bpf_head {
-	struct list_head plist;
+	struct list_head __rcu plist;
 	u32 hgen;
+	struct rcu_head rcu;
 };
 
 struct cls_bpf_prog {
@@ -37,6 +38,8 @@ struct cls_bpf_prog {
 	struct list_head link;
 	u32 handle;
 	u16 bpf_len;
+	struct tcf_proto *tp;
+	struct rcu_head rcu;
 };
 
 static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
@@ -49,11 +52,11 @@ static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
 static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			    struct tcf_result *res)
 {
-	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_head *head = rcu_dereference(tp->root);
 	struct cls_bpf_prog *prog;
 	int ret;
 
-	list_for_each_entry(prog, &head->plist, link) {
+	list_for_each_entry_rcu(prog, &head->plist, link) {
 		int filter_res = SK_RUN_FILTER(prog->filter, skb);
 
 		if (filter_res == 0)
@@ -81,8 +84,8 @@ static int cls_bpf_init(struct tcf_proto *tp)
 	if (head == NULL)
 		return -ENOBUFS;
 
-	INIT_LIST_HEAD(&head->plist);
-	tp->root = head;
+	INIT_LIST_HEAD_RCU(&head->plist);
+	rcu_assign_pointer(tp->root, head);
 
 	return 0;
 }
@@ -98,6 +101,13 @@ static void cls_bpf_delete_prog(struct tcf_proto *tp, struct cls_bpf_prog *prog)
 	kfree(prog);
 }
 
+void __cls_bpf_delete_prog(struct rcu_head *rcu)
+{
+	struct cls_bpf_prog *prog = container_of(rcu, struct cls_bpf_prog, rcu);
+
+	cls_bpf_delete_prog(prog->tp, prog);
+}
+
 static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
 {
 	struct cls_bpf_head *head = tp->root;
@@ -105,11 +115,8 @@ static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
 
 	list_for_each_entry(prog, &head->plist, link) {
 		if (prog == todel) {
-			tcf_tree_lock(tp);
-			list_del(&prog->link);
-			tcf_tree_unlock(tp);
-
-			cls_bpf_delete_prog(tp, prog);
+			list_del_rcu(&prog->link);
+			call_rcu(&prog->rcu, __cls_bpf_delete_prog);
 			return 0;
 		}
 	}
@@ -123,11 +130,12 @@ static void cls_bpf_destroy(struct tcf_proto *tp)
 	struct cls_bpf_prog *prog, *tmp;
 
 	list_for_each_entry_safe(prog, tmp, &head->plist, link) {
-		list_del(&prog->link);
-		cls_bpf_delete_prog(tp, prog);
+		list_del_rcu(&prog->link);
+		call_rcu(&prog->rcu, __cls_bpf_delete_prog);
 	}
 
-	kfree(head);
+	RCU_INIT_POINTER(tp->root, NULL);
+	kfree_rcu(head, rcu);
 }
 
 static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
@@ -158,10 +166,10 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
 				   unsigned long base, struct nlattr **tb,
 				   struct nlattr *est)
 {
-	struct sock_filter *bpf_ops, *bpf_old;
+	struct sock_filter *bpf_ops;
 	struct tcf_exts exts;
 	struct sock_fprog tmp;
-	struct sk_filter *fp, *fp_old;
+	struct sk_filter *fp;
 	u16 bpf_size, bpf_len;
 	u32 classid;
 	int ret;
@@ -197,24 +205,13 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
 	if (ret)
 		goto errout_free;
 
-	tcf_tree_lock(tp);
-	fp_old = prog->filter;
-	bpf_old = prog->bpf_ops;
-
 	prog->bpf_len = bpf_len;
 	prog->bpf_ops = bpf_ops;
 	prog->filter = fp;
 	prog->res.classid = classid;
-	tcf_tree_unlock(tp);
 
 	tcf_bind_filter(tp, &prog->res, base);
 	tcf_exts_change(tp, &prog->exts, &exts);
-
-	if (fp_old)
-		sk_unattached_filter_destroy(fp_old);
-	if (bpf_old)
-		kfree(bpf_old);
-
 	return 0;
 
 errout_free:
@@ -245,8 +242,9 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 			  unsigned long *arg)
 {
 	struct cls_bpf_head *head = tp->root;
-	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) *arg;
+	struct cls_bpf_prog *oldprog = (struct cls_bpf_prog *) *arg;
 	struct nlattr *tb[TCA_BPF_MAX + 1];
+	struct cls_bpf_prog *prog;
 	int ret;
 
 	if (tca[TCA_OPTIONS] == NULL)
@@ -256,18 +254,19 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	if (ret < 0)
 		return ret;
 
-	if (prog != NULL) {
-		if (handle && prog->handle != handle)
-			return -EINVAL;
-		return cls_bpf_modify_existing(net, tp, prog, base, tb,
-					       tca[TCA_RATE]);
-	}
-
 	prog = kzalloc(sizeof(*prog), GFP_KERNEL);
 	if (prog == NULL)
 		return -ENOBUFS;
 
 	tcf_exts_init(&prog->exts, TCA_BPF_ACT, TCA_BPF_POLICE);
+
+	if (oldprog) {
+		if (handle && oldprog->handle != handle) {
+			ret = -EINVAL;
+			goto errout;
+		}
+	}
+
 	if (handle == 0)
 		prog->handle = cls_bpf_grab_new_handle(tp, head);
 	else
@@ -281,15 +280,17 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	if (ret < 0)
 		goto errout;
 
-	tcf_tree_lock(tp);
-	list_add(&prog->link, &head->plist);
-	tcf_tree_unlock(tp);
+	if (oldprog) {
+		list_replace_rcu(&prog->link, &oldprog->link);
+		call_rcu(&oldprog->rcu, __cls_bpf_delete_prog);
+	} else {
+		list_add_rcu(&prog->link, &head->plist);
+	}
 
 	*arg = (unsigned long) prog;
-
 	return 0;
 errout:
-	if (*arg == 0UL && prog)
+	if (prog)
 		kfree(prog);
 
 	return ret;

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 12/15] net: sched: make tc_action safe to walk under RCU
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
                   ` (10 preceding siblings ...)
  2014-04-30 16:39 ` [RFC PATCH 11/15] net: make cls_bpf rcu safe John Fastabend
@ 2014-04-30 16:39 ` John Fastabend
  2014-04-30 16:40 ` [RFC PATCH 13/15] net: sched: make bstats per cpu and estimator RCU safe John Fastabend
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:39 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

This patch makes tc_actions safe to walk from classifiers under RCU.
Notice that each action act() callback defined in each act_*.c
already does its own locking. This is needed even in the current
infrastructure for the case where users add/remove actions via
'tc actions' and reference them via index from the classifiers.

There are a couple of interesting pieces here that need careful review.
In tcf_exts_exec() the call to tcf_action_exec() follows a list_empty()
check. Although this happens under RCU, there is no guarantee that the
list is still non-empty by the time tcf_action_exec() is called. This
patch fixes up the return values from tcf_action_exec() so that packets
won't be dropped on the floor when this occurs. Hopefully this window
is rare; using RCU here effectively assumes that it is.
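
As a rough illustration (sketch only, simplified from the hunks below),
the read side now amounts to:

    /* reader side, under RCU */
    if (!list_empty(&exts->actions))    /* may race with list_del_rcu() */
            return tcf_action_exec(skb, &exts->actions, res);
    return 0;

With tcf_action_exec() initializing its return value to 0 instead of -1,
a list emptied between the check and the walk yields "no action matched"
rather than an error return that drops the packet.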

Second, there is a suspect usage of list_splice_init_rcu() in the
tcf_exts_change() routine. Notice that it is used twice in succession
and that the second init works on the src tcf_exts. There is probably
a better way to accomplish that.
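
For reference, the pattern in question is roughly (sketch of the
tcf_exts_change() hunk below):

    LIST_HEAD(tmp);

    /* move dst's old actions to tmp, then move src's actions to dst;
     * each splice waits for a grace period via synchronize_rcu(), and
     * the second call re-initializes src->actions, so src must not be
     * used afterwards
     */
    list_splice_init_rcu(&dst->actions, &tmp, synchronize_rcu);
    list_splice_init_rcu(&src->actions, &dst->actions, synchronize_rcu);
    tcf_action_destroy(&tmp, TCA_ACT_UNBIND);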

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/act_api.h |    1 +
 include/net/pkt_cls.h |   12 ++++++++++--
 net/sched/act_api.c   |   14 +++++++-------
 net/sched/cls_api.c   |   17 ++++++++++-------
 4 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 3ee4c92..ddd0c5a 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -79,6 +79,7 @@ struct tc_action {
 	__u32			type; /* for backward compat(TCA_OLD_COMPAT) */
 	__u32			order;
 	struct list_head	list;
+	struct rcu_head		rcu;
 };
 
 struct tc_action_ops {
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index a2441fb..69f0ec5 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -63,7 +63,7 @@ tcf_unbind_filter(struct tcf_proto *tp, struct tcf_result *r)
 struct tcf_exts {
 #ifdef CONFIG_NET_CLS_ACT
 	__u32	type; /* for backward compat(TCA_OLD_COMPAT) */
-	struct list_head actions;
+	struct list_head __rcu actions;
 #endif
 	/* Map to export classifier specific extension TLV types to the
 	 * generic extensions API. Unsupported extensions must be set to 0.
@@ -76,7 +76,7 @@ static inline void tcf_exts_init(struct tcf_exts *exts, int action, int police)
 {
 #ifdef CONFIG_NET_CLS_ACT
 	exts->type = 0;
-	INIT_LIST_HEAD(&exts->actions);
+	INIT_LIST_HEAD_RCU(&exts->actions);
 #endif
 	exts->action = action;
 	exts->police = police;
@@ -88,6 +88,8 @@ static inline void tcf_exts_init(struct tcf_exts *exts, int action, int police)
  *
  * Returns 1 if a predicative extension is present, i.e. an extension which
  * might cause further actions and thus overrule the regular tcf_result.
+ *
+ * This check is only valid if done under RTNL.
  */
 static inline int
 tcf_exts_is_predicative(struct tcf_exts *exts)
@@ -128,6 +130,12 @@ tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts,
 	       struct tcf_result *res)
 {
 #ifdef CONFIG_NET_CLS_ACT
+	/* This check is racy but this is OK, if the list is emptied
+	 * before walking the chain of actions the return value has
+	 * been updated to return zero. This way packets will not be
+	 * dropped when action list deletion occurs after the empty
+	 * check but before execution
+	 */
 	if (!list_empty(&exts->actions))
 		return tcf_action_exec(skb, &exts->actions, res);
 #endif
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 8a5ba5a..d73a546 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -381,14 +381,14 @@ int tcf_action_exec(struct sk_buff *skb, const struct list_head *actions,
 		    struct tcf_result *res)
 {
 	const struct tc_action *a;
-	int ret = -1;
+	int ret = 0;
 
 	if (skb->tc_verd & TC_NCLS) {
 		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
 		ret = TC_ACT_OK;
 		goto exec_done;
 	}
-	list_for_each_entry(a, actions, list) {
+	list_for_each_entry_rcu(a, actions, list) {
 repeat:
 		ret = a->ops->act(skb, a, res);
 		if (TC_MUNGED & skb->tc_verd) {
@@ -417,8 +417,8 @@ int tcf_action_destroy(struct list_head *actions, int bind)
 			module_put(a->ops->owner);
 		else if (ret < 0)
 			return ret;
-		list_del(&a->list);
-		kfree(a);
+		list_del_rcu(&a->list);
+		kfree_rcu(a, rcu);
 	}
 	return ret;
 }
@@ -584,7 +584,7 @@ int tcf_action_init(struct net *net, struct nlattr *nla,
 			goto err;
 		}
 		act->order = i;
-		list_add_tail(&act->list, actions);
+		list_add_tail_rcu(&act->list, actions);
 	}
 	return 0;
 
@@ -746,8 +746,8 @@ static void cleanup_a(struct list_head *actions)
 	struct tc_action *a, *tmp;
 
 	list_for_each_entry_safe(a, tmp, actions, list) {
-		list_del(&a->list);
-		kfree(a);
+		list_del_rcu(&a->list);
+		kfree_rcu(a, rcu);
 	}
 }
 
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 94c7d38..813e5ca 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -498,7 +498,7 @@ void tcf_exts_destroy(struct tcf_proto *tp, struct tcf_exts *exts)
 {
 #ifdef CONFIG_NET_CLS_ACT
 	tcf_action_destroy(&exts->actions, TCA_ACT_UNBIND);
-	INIT_LIST_HEAD(&exts->actions);
+	INIT_LIST_HEAD_RCU(&exts->actions);
 #endif
 }
 EXPORT_SYMBOL(tcf_exts_destroy);
@@ -510,7 +510,7 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 	{
 		struct tc_action *act;
 
-		INIT_LIST_HEAD(&exts->actions);
+		INIT_LIST_HEAD_RCU(&exts->actions);
 		if (exts->police && tb[exts->police]) {
 			act = tcf_action_init_1(net, tb[exts->police], rate_tlv,
 						"police", TCA_ACT_NOREPLACE,
@@ -519,7 +519,7 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 				return PTR_ERR(act);
 
 			act->type = exts->type = TCA_OLD_COMPAT;
-			list_add(&act->list, &exts->actions);
+			list_add_rcu(&act->list, &exts->actions);
 		} else if (exts->action && tb[exts->action]) {
 			int err;
 			err = tcf_action_init(net, tb[exts->action], rate_tlv,
@@ -539,16 +539,19 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 }
 EXPORT_SYMBOL(tcf_exts_validate);
 
+/* It is not safe to use src->actions after this due to _init_rcu usage
+ * INIT_LIST_HEAD_RCU() is called on src->actions
+ */
 void tcf_exts_change(struct tcf_proto *tp, struct tcf_exts *dst,
 		     struct tcf_exts *src)
 {
 #ifdef CONFIG_NET_CLS_ACT
 	if (!list_empty(&src->actions)) {
 		LIST_HEAD(tmp);
-		tcf_tree_lock(tp);
-		list_splice_init(&dst->actions, &tmp);
-		list_splice(&src->actions, &dst->actions);
-		tcf_tree_unlock(tp);
+		list_splice_init_rcu(&dst->actions, &tmp, synchronize_rcu);
+		list_splice_init_rcu(&src->actions,
+				     &dst->actions,
+				     synchronize_rcu);
 		tcf_action_destroy(&tmp, TCA_ACT_UNBIND);
 	}
 #endif

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 13/15] net: sched: make bstats per cpu and estimator RCU safe
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
                   ` (11 preceding siblings ...)
  2014-04-30 16:39 ` [RFC PATCH 12/15] net: sched: make tc_action safe to walk under RCU John Fastabend
@ 2014-04-30 16:40 ` John Fastabend
  2014-04-30 16:40 ` [RFC PATCH 14/15] net: sched: make qstats per cpu John Fastabend
  2014-04-30 16:41 ` [RFC PATCH 15/15] net: sched: drop ingress qdisc lock John Fastabend
  14 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:40 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

In order to run qdiscs without locking, statistics and estimators
need to be handled correctly.

To resolve bstats, make the statistics per-CPU. Because this is only
needed for qdiscs that run without locks, which will not be the case
for most qdiscs in the near future, per-CPU stats are only created
when a qdisc sets the TCQ_F_LLQDISC flag.
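
A rough sketch of how a lockless qdisc would then account a packet,
using the qdisc_bstats_update_cpu() helper added below (the function
name llq_enqueue is made up for illustration):

    static int llq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
    {
            /* per-CPU byte/packet accounting, no qdisc lock held */
            qdisc_bstats_update_cpu(sch, skb);

            /* ... enqueue skb into a structure safe for lockless use ... */
            return NET_XMIT_SUCCESS;
    }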

Next, because estimators use the bstats to calculate packets per
second and bytes per second, the estimator code paths are updated
to use the per-CPU statistics.
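
The read side folds the per-CPU counters back into a single
gnet_stats_basic_packed before handing them to the estimator or dump
code; simplified sketch of __gnet_stats_copy_basic_cpu() below, where
q is assumed to be a qdisc with TCQ_F_LLQDISC set:

    struct gnet_stats_basic_packed sum = {0};
    int cpu;

    for_each_possible_cpu(cpu) {
            struct gnet_stats_basic_cpu *bcpu =
                    per_cpu_ptr(q->bstats_qdisc.cpu_bstats, cpu);
            unsigned int start;
            u64 bytes;
            u32 packets;

            do {
                    start = u64_stats_fetch_begin_irq(&bcpu->syncp);
                    bytes = bcpu->bstats.bytes;
                    packets = bcpu->bstats.packets;
            } while (u64_stats_fetch_retry_irq(&bcpu->syncp, start));

            sum.bytes += bytes;
            sum.packets += packets;
    }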

After this, qdiscs that use the _percpu routines to update stats and
set TCQ_F_LLQDISC can be made safe to run without locking. Any qdisc
that sets the flag needs to ensure the data structures it owns are
safe to use without locks; additionally, the skb list routines
currently do no locking of their own, so those would need to be
updated in a future patch.
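
A qdisc opting in would do something like the following in its init
callback (sketch only; qdisc_create() then allocates the per-CPU
bstats when it sees the flag, as in the sch_api.c hunk below):

    static int llq_init(struct Qdisc *sch, struct nlattr *opt)
    {
            sch->flags |= TCQ_F_LLQDISC;    /* opt in to lockless stats */
            /* ... qdisc specific setup ... */
            return 0;
    }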

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/gen_stats.h    |   16 ++++++++++++
 include/net/sch_generic.h  |   40 +++++++++++++++++++++++++++--
 net/core/gen_estimator.c   |   60 ++++++++++++++++++++++++++++++++++----------
 net/core/gen_stats.c       |   46 +++++++++++++++++++++++++++++++++-
 net/netfilter/xt_RATEEST.c |    4 +--
 net/sched/act_api.c        |    9 ++++---
 net/sched/act_police.c     |    4 +--
 net/sched/sch_api.c        |   51 ++++++++++++++++++++++++++++++++-----
 net/sched/sch_cbq.c        |    9 ++++---
 net/sched/sch_drr.c        |    9 ++++---
 net/sched/sch_generic.c    |   12 ++++++++-
 net/sched/sch_hfsc.c       |   15 +++++++----
 net/sched/sch_htb.c        |   14 +++++++---
 net/sched/sch_ingress.c    |    2 +
 net/sched/sch_mq.c         |   10 ++++---
 net/sched/sch_mqprio.c     |   14 ++++++----
 net/sched/sch_multiq.c     |    2 +
 net/sched/sch_prio.c       |    2 +
 net/sched/sch_qfq.c        |   12 +++++----
 19 files changed, 259 insertions(+), 72 deletions(-)

diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index ea4271d..dec4312 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -6,6 +6,11 @@
 #include <linux/rtnetlink.h>
 #include <linux/pkt_sched.h>
 
+struct gnet_stats_basic_cpu {
+	struct gnet_stats_basic_packed bstats;
+	struct u64_stats_sync syncp;
+};
+
 struct gnet_dump {
 	spinlock_t *      lock;
 	struct sk_buff *  skb;
@@ -26,10 +31,17 @@ int gnet_stats_start_copy_compat(struct sk_buff *skb, int type,
 				 int tc_stats_type, int xstats_type,
 				 spinlock_t *lock, struct gnet_dump *d);
 
+int gnet_stats_copy_basic_cpu(struct gnet_dump *d,
+			      struct gnet_stats_basic_cpu *b);
+void __gnet_stats_copy_basic_cpu(struct gnet_stats_basic_packed *bstats,
+				 struct gnet_stats_basic_cpu *b);
 int gnet_stats_copy_basic(struct gnet_dump *d,
 			  struct gnet_stats_basic_packed *b);
+void __gnet_stats_copy_basic(struct gnet_stats_basic_packed *bstats,
+			     struct gnet_stats_basic_packed *b);
 int gnet_stats_copy_rate_est(struct gnet_dump *d,
 			     const struct gnet_stats_basic_packed *b,
+			     const struct gnet_stats_basic_cpu *cpu_b,
 			     struct gnet_stats_rate_est64 *r);
 int gnet_stats_copy_queue(struct gnet_dump *d, struct gnet_stats_queue *q);
 int gnet_stats_copy_app(struct gnet_dump *d, void *st, int len);
@@ -37,13 +49,17 @@ int gnet_stats_copy_app(struct gnet_dump *d, void *st, int len);
 int gnet_stats_finish_copy(struct gnet_dump *d);
 
 int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
+		      struct gnet_stats_basic_cpu *cpu_bstats,
 		      struct gnet_stats_rate_est64 *rate_est,
 		      spinlock_t *stats_lock, struct nlattr *opt);
 void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
+			struct gnet_stats_basic_cpu *cpu_bstats,
 			struct gnet_stats_rate_est64 *rate_est);
 int gen_replace_estimator(struct gnet_stats_basic_packed *bstats,
+			  struct gnet_stats_basic_cpu *cpu_bstats,
 			  struct gnet_stats_rate_est64 *rate_est,
 			  spinlock_t *stats_lock, struct nlattr *opt);
 bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
+			  const struct gnet_stats_basic_cpu *cpu_bstats,
 			  const struct gnet_stats_rate_est64 *rate_est);
 #endif
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 20553c7..1a55e19 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -6,6 +6,7 @@
 #include <linux/rcupdate.h>
 #include <linux/pkt_sched.h>
 #include <linux/pkt_cls.h>
+#include <linux/percpu.h>
 #include <net/gen_stats.h>
 #include <net/rtnetlink.h>
 
@@ -57,6 +58,9 @@ struct Qdisc {
 				      * Its true for MQ/MQPRIO slaves, or non
 				      * multiqueue device.
 				      */
+#define TCQ_F_LLQDISC		0x20 /* lockless qdiscs can run without using
+				      * the qdisc lock to serialize skbs.
+				      */
 #define TCQ_F_WARN_NONWC	(1 << 16)
 	u32			limit;
 	const struct Qdisc_ops	*ops;
@@ -83,7 +87,10 @@ struct Qdisc {
 	 */
 	unsigned long		state;
 	struct sk_buff_head	q;
-	struct gnet_stats_basic_packed bstats;
+	union {
+		struct gnet_stats_basic_packed bstats;
+		struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+	} bstats_qdisc;
 	unsigned int		__state;
 	struct gnet_stats_queue	qstats;
 	struct rcu_head		rcu_head;
@@ -494,7 +501,6 @@ static inline int qdisc_enqueue_root(struct sk_buff *skb, struct Qdisc *sch)
 	return qdisc_enqueue(skb, sch) & NET_XMIT_MASK;
 }
 
-
 static inline void bstats_update(struct gnet_stats_basic_packed *bstats,
 				 const struct sk_buff *skb)
 {
@@ -502,10 +508,38 @@ static inline void bstats_update(struct gnet_stats_basic_packed *bstats,
 	bstats->packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;
 }
 
+static inline void qdisc_bstats_update_cpu(struct Qdisc *sch,
+					   const struct sk_buff *skb)
+{
+	struct gnet_stats_basic_cpu *bstats =
+				this_cpu_ptr(sch->bstats_qdisc.cpu_bstats);
+
+	u64_stats_update_begin(&bstats->syncp);
+	bstats_update(&bstats->bstats, skb);
+	u64_stats_update_end(&bstats->syncp);
+}
+
 static inline void qdisc_bstats_update(struct Qdisc *sch,
 				       const struct sk_buff *skb)
 {
-	bstats_update(&sch->bstats, skb);
+	bstats_update(&sch->bstats_qdisc.bstats, skb);
+}
+
+static inline bool qdisc_is_lockless(struct Qdisc *q)
+{
+	return q->flags & TCQ_F_LLQDISC;
+}
+
+static inline int
+qdisc_bstats_copy_qdisc(struct gnet_dump *d, struct Qdisc *q)
+{
+	int err;
+
+	if (qdisc_is_lockless(q))
+		err = gnet_stats_copy_basic_cpu(d, q->bstats_qdisc.cpu_bstats);
+	else
+		err = gnet_stats_copy_basic(d, &q->bstats_qdisc.bstats);
+	return err;
 }
 
 static inline int __qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch,
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 6b5b6e7..4f963e7 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -91,6 +91,8 @@ struct gen_estimator
 	u32			avpps;
 	struct rcu_head		e_rcu;
 	struct rb_node		node;
+	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+	struct rcu_head		head;
 };
 
 struct gen_estimator_head
@@ -122,11 +124,21 @@ static void est_timer(unsigned long arg)
 
 		spin_lock(e->stats_lock);
 		read_lock(&est_lock);
-		if (e->bstats == NULL)
+
+		if (e->bstats == NULL && e->cpu_bstats == NULL)
 			goto skip;
 
-		nbytes = e->bstats->bytes;
-		npackets = e->bstats->packets;
+		if (e->cpu_bstats) {
+			struct gnet_stats_basic_packed b = {0};
+
+			__gnet_stats_copy_basic_cpu(&b, e->cpu_bstats);
+			nbytes = b.bytes;
+			npackets = b.packets;
+		} else {
+			nbytes = e->bstats->bytes;
+			npackets = e->bstats->packets;
+		}
+
 		brate = (nbytes - e->last_bytes)<<(7 - idx);
 		e->last_bytes = nbytes;
 		e->avbps += (brate >> e->ewma_log) - (e->avbps >> e->ewma_log);
@@ -146,6 +158,11 @@ skip:
 	rcu_read_unlock();
 }
 
+static void *gen_get_bstats(struct gen_estimator *est)
+{
+	return est->cpu_bstats ? (void *)est->cpu_bstats : (void *)est->bstats;
+}
+
 static void gen_add_node(struct gen_estimator *est)
 {
 	struct rb_node **p = &est_root.rb_node, *parent = NULL;
@@ -156,7 +173,7 @@ static void gen_add_node(struct gen_estimator *est)
 		parent = *p;
 		e = rb_entry(parent, struct gen_estimator, node);
 
-		if (est->bstats > e->bstats)
+		if (gen_get_bstats(est)  > gen_get_bstats(e))
 			p = &parent->rb_right;
 		else
 			p = &parent->rb_left;
@@ -167,18 +184,20 @@ static void gen_add_node(struct gen_estimator *est)
 
 static
 struct gen_estimator *gen_find_node(const struct gnet_stats_basic_packed *bstats,
+				    const struct gnet_stats_basic_cpu *cpu_bstats,
 				    const struct gnet_stats_rate_est64 *rate_est)
 {
 	struct rb_node *p = est_root.rb_node;
+	void *b = bstats ? (void *)bstats : (void *)cpu_bstats;
 
 	while (p) {
 		struct gen_estimator *e;
 
 		e = rb_entry(p, struct gen_estimator, node);
 
-		if (bstats > e->bstats)
+		if (b > gen_get_bstats(e))
 			p = p->rb_right;
-		else if (bstats < e->bstats || rate_est != e->rate_est)
+		else if (b < gen_get_bstats(e) || rate_est != e->rate_est)
 			p = p->rb_left;
 		else
 			return e;
@@ -203,6 +222,7 @@ struct gen_estimator *gen_find_node(const struct gnet_stats_basic_packed *bstats
  *
  */
 int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
+		      struct gnet_stats_basic_cpu *cpu_bstats,
 		      struct gnet_stats_rate_est64 *rate_est,
 		      spinlock_t *stats_lock,
 		      struct nlattr *opt)
@@ -221,14 +241,24 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 	if (est == NULL)
 		return -ENOBUFS;
 
+	if (cpu_bstats) {
+		struct gnet_stats_basic_packed b = {0};
+
+		est->cpu_bstats = cpu_bstats;
+		__gnet_stats_copy_basic_cpu(&b, cpu_bstats);
+		est->last_bytes = b.bytes;
+		est->last_packets = b.packets;
+	} else {
+		est->bstats = bstats;
+		est->last_bytes = bstats->bytes;
+		est->last_packets = bstats->packets;
+	}
+
 	idx = parm->interval + 2;
-	est->bstats = bstats;
 	est->rate_est = rate_est;
 	est->stats_lock = stats_lock;
 	est->ewma_log = parm->ewma_log;
-	est->last_bytes = bstats->bytes;
 	est->avbps = rate_est->bps<<5;
-	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
 
 	spin_lock_bh(&est_tree_lock);
@@ -258,14 +288,14 @@ EXPORT_SYMBOL(gen_new_estimator);
  * Note : Caller should respect an RCU grace period before freeing stats_lock
  */
 void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
+			struct gnet_stats_basic_cpu *cpu_bstats,
 			struct gnet_stats_rate_est64 *rate_est)
 {
 	struct gen_estimator *e;
 
 	spin_lock_bh(&est_tree_lock);
-	while ((e = gen_find_node(bstats, rate_est))) {
+	while ((e = gen_find_node(bstats, cpu_bstats, rate_est))) {
 		rb_erase(&e->node, &est_root);
-
 		write_lock(&est_lock);
 		e->bstats = NULL;
 		write_unlock(&est_lock);
@@ -290,11 +320,12 @@ EXPORT_SYMBOL(gen_kill_estimator);
  * Returns 0 on success or a negative error code.
  */
 int gen_replace_estimator(struct gnet_stats_basic_packed *bstats,
+			  struct gnet_stats_basic_cpu *cpu_bstats,
 			  struct gnet_stats_rate_est64 *rate_est,
 			  spinlock_t *stats_lock, struct nlattr *opt)
 {
-	gen_kill_estimator(bstats, rate_est);
-	return gen_new_estimator(bstats, rate_est, stats_lock, opt);
+	gen_kill_estimator(bstats, cpu_bstats, rate_est);
+	return gen_new_estimator(bstats, cpu_bstats, rate_est, stats_lock, opt);
 }
 EXPORT_SYMBOL(gen_replace_estimator);
 
@@ -306,6 +337,7 @@ EXPORT_SYMBOL(gen_replace_estimator);
  * Returns true if estimator is active, and false if not.
  */
 bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
+			  const struct gnet_stats_basic_cpu *cpu_bstats,
 			  const struct gnet_stats_rate_est64 *rate_est)
 {
 	bool res;
@@ -313,7 +345,7 @@ bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
 	ASSERT_RTNL();
 
 	spin_lock_bh(&est_tree_lock);
-	res = gen_find_node(bstats, rate_est) != NULL;
+	res = gen_find_node(bstats, cpu_bstats, rate_est) != NULL;
 	spin_unlock_bh(&est_tree_lock);
 
 	return res;
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 9d3d9e7..b51146a 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -128,6 +128,49 @@ gnet_stats_copy_basic(struct gnet_dump *d, struct gnet_stats_basic_packed *b)
 }
 EXPORT_SYMBOL(gnet_stats_copy_basic);
 
+void
+__gnet_stats_copy_basic(struct gnet_stats_basic_packed *bstats,
+			struct gnet_stats_basic_packed *b)
+{
+	bstats->bytes = b->bytes;
+	bstats->packets = b->packets;
+}
+EXPORT_SYMBOL(__gnet_stats_copy_basic);
+
+void
+__gnet_stats_copy_basic_cpu(struct gnet_stats_basic_packed *bstats,
+			    struct gnet_stats_basic_cpu *b)
+{
+	int i;
+
+	for_each_possible_cpu(i) {
+		struct gnet_stats_basic_cpu *bcpu = per_cpu_ptr(b, i);
+		unsigned int start;
+		__u64 bytes;
+		__u32 packets;
+
+		do {
+			start = u64_stats_fetch_begin_irq(&bcpu->syncp);
+			bytes = bcpu->bstats.bytes;
+			packets = bcpu->bstats.packets;
+		} while (u64_stats_fetch_retry_irq(&bcpu->syncp, start));
+
+		bstats->bytes += bcpu->bstats.bytes;
+		bstats->packets += bcpu->bstats.packets;
+	}
+}
+EXPORT_SYMBOL(__gnet_stats_copy_basic_cpu);
+
+int
+gnet_stats_copy_basic_cpu(struct gnet_dump *d, struct gnet_stats_basic_cpu *b)
+{
+	struct gnet_stats_basic_packed bstats = {0};
+
+	__gnet_stats_copy_basic_cpu(&bstats, b);
+	return gnet_stats_copy_basic(d, &bstats);
+}
+EXPORT_SYMBOL(gnet_stats_copy_basic_cpu);
+
 /**
  * gnet_stats_copy_rate_est - copy rate estimator statistics into statistics TLV
  * @d: dumping handle
@@ -143,12 +186,13 @@ EXPORT_SYMBOL(gnet_stats_copy_basic);
 int
 gnet_stats_copy_rate_est(struct gnet_dump *d,
 			 const struct gnet_stats_basic_packed *b,
+			 const struct gnet_stats_basic_cpu *cpu_b,
 			 struct gnet_stats_rate_est64 *r)
 {
 	struct gnet_stats_rate_est est;
 	int res;
 
-	if (b && !gen_estimator_active(b, r))
+	if ((b || cpu_b) && !gen_estimator_active(b, cpu_b, r))
 		return 0;
 
 	est.bps = min_t(u64, UINT_MAX, r->bps);
diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index 370adf6..02e2532 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -64,7 +64,7 @@ void xt_rateest_put(struct xt_rateest *est)
 	mutex_lock(&xt_rateest_mutex);
 	if (--est->refcnt == 0) {
 		hlist_del(&est->list);
-		gen_kill_estimator(&est->bstats, &est->rstats);
+		gen_kill_estimator(&est->bstats, NULL, &est->rstats);
 		/*
 		 * gen_estimator est_timer() might access est->lock or bstats,
 		 * wait a RCU grace period before freeing 'est'
@@ -136,7 +136,7 @@ static int xt_rateest_tg_checkentry(const struct xt_tgchk_param *par)
 	cfg.est.interval	= info->interval;
 	cfg.est.ewma_log	= info->ewma_log;
 
-	ret = gen_new_estimator(&est->bstats, &est->rstats,
+	ret = gen_new_estimator(&est->bstats, NULL, &est->rstats,
 				&est->lock, &cfg.opt);
 	if (ret < 0)
 		goto err2;
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index d73a546..0be3945 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -35,7 +35,7 @@ void tcf_hash_destroy(struct tc_action *a)
 	spin_lock_bh(&hinfo->lock);
 	hlist_del(&p->tcfc_head);
 	spin_unlock_bh(&hinfo->lock);
-	gen_kill_estimator(&p->tcfc_bstats,
+	gen_kill_estimator(&p->tcfc_bstats, NULL,
 			   &p->tcfc_rate_est);
 	/*
 	 * gen_estimator est_timer() might access p->tcfc_lock
@@ -228,7 +228,7 @@ void tcf_hash_cleanup(struct tc_action *a, struct nlattr *est)
 {
 	struct tcf_common *pc = a->priv;
 	if (est)
-		gen_kill_estimator(&pc->tcfc_bstats,
+		gen_kill_estimator(&pc->tcfc_bstats, NULL,
 				   &pc->tcfc_rate_est);
 	kfree_rcu(pc, tcfc_rcu);
 }
@@ -252,7 +252,8 @@ int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
 	p->tcfc_tm.install = jiffies;
 	p->tcfc_tm.lastuse = jiffies;
 	if (est) {
-		int err = gen_new_estimator(&p->tcfc_bstats, &p->tcfc_rate_est,
+		int err = gen_new_estimator(&p->tcfc_bstats, NULL,
+					    &p->tcfc_rate_est,
 					    &p->tcfc_lock, est);
 		if (err) {
 			kfree(p);
@@ -620,7 +621,7 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *a,
 		goto errout;
 
 	if (gnet_stats_copy_basic(&d, &p->tcfc_bstats) < 0 ||
-	    gnet_stats_copy_rate_est(&d, &p->tcfc_bstats,
+	    gnet_stats_copy_rate_est(&d, &p->tcfc_bstats, NULL,
 				     &p->tcfc_rate_est) < 0 ||
 	    gnet_stats_copy_queue(&d, &p->tcfc_qstats) < 0)
 		goto errout;
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 0566e46..18e6c83 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -178,14 +178,14 @@ override:
 
 	spin_lock_bh(&police->tcf_lock);
 	if (est) {
-		err = gen_replace_estimator(&police->tcf_bstats,
+		err = gen_replace_estimator(&police->tcf_bstats, NULL,
 					    &police->tcf_rate_est,
 					    &police->tcf_lock, est);
 		if (err)
 			goto failure_unlock;
 	} else if (tb[TCA_POLICE_AVRATE] &&
 		   (ret == ACT_P_CREATED ||
-		    !gen_estimator_active(&police->tcf_bstats,
+		    !gen_estimator_active(&police->tcf_bstats, NULL,
 					  &police->tcf_rate_est))) {
 		err = -EINVAL;
 		goto failure_unlock;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 30cc267..bf14313 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -942,6 +942,13 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 	sch->handle = handle;
 
 	if (!ops->init || (err = ops->init(sch, tca[TCA_OPTIONS])) == 0) {
+		if (qdisc_is_lockless(sch)) {
+			sch->bstats_qdisc.cpu_bstats =
+				alloc_percpu(struct gnet_stats_basic_cpu);
+			if (!sch->bstats_qdisc.cpu_bstats)
+				goto err_out4;
+		}
+
 		if (tca[TCA_STAB]) {
 			stab = qdisc_get_stab(tca[TCA_STAB]);
 			if (IS_ERR(stab)) {
@@ -964,8 +971,18 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 			else
 				root_lock = qdisc_lock(sch);
 
-			err = gen_new_estimator(&sch->bstats, &sch->rate_est,
-						root_lock, tca[TCA_RATE]);
+			if (qdisc_is_lockless(sch))
+				err = gen_new_estimator(NULL,
+							sch->bstats_qdisc.cpu_bstats,
+							&sch->rate_est,
+							root_lock,
+							tca[TCA_RATE]);
+			else
+				err = gen_new_estimator(&sch->bstats_qdisc.bstats,
+							NULL,
+							&sch->rate_est,
+							root_lock,
+							tca[TCA_RATE]);
 			if (err)
 				goto err_out4;
 		}
@@ -1022,9 +1039,11 @@ static int qdisc_change(struct Qdisc *sch, struct nlattr **tca)
 		   because change can't be undone. */
 		if (sch->flags & TCQ_F_MQROOT)
 			goto out;
-		gen_replace_estimator(&sch->bstats, &sch->rate_est,
-					    qdisc_root_sleeping_lock(sch),
-					    tca[TCA_RATE]);
+		gen_replace_estimator(&sch->bstats_qdisc.bstats,
+				      sch->bstats_qdisc.cpu_bstats,
+				      &sch->rate_est,
+				      qdisc_root_sleeping_lock(sch),
+				      tca[TCA_RATE]);
 	}
 out:
 	return 0;
@@ -1303,6 +1322,7 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 	unsigned char *b = skb_tail_pointer(skb);
 	struct gnet_dump d;
 	struct qdisc_size_table *stab;
+	int err;
 
 	cond_resched();
 	nlh = nlmsg_put(skb, portid, seq, event, sizeof(*tcm), flags);
@@ -1333,10 +1353,25 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 	if (q->ops->dump_stats && q->ops->dump_stats(q, &d) < 0)
 		goto nla_put_failure;
 
-	if (gnet_stats_copy_basic(&d, &q->bstats) < 0 ||
-	    gnet_stats_copy_rate_est(&d, &q->bstats, &q->rate_est) < 0 ||
+	if (qdisc_is_lockless(q)) {
+		err = gnet_stats_copy_basic_cpu(&d, q->bstats_qdisc.cpu_bstats);
+		if (err < 0)
+			goto nla_put_failure;
+		err = gnet_stats_copy_rate_est(&d, NULL,
+					       q->bstats_qdisc.cpu_bstats,
+					       &q->rate_est);
+	} else {
+		err = gnet_stats_copy_basic(&d, &q->bstats_qdisc.bstats);
+		if (err < 0)
+			goto nla_put_failure;
+		err = gnet_stats_copy_rate_est(&d,
+					       &q->bstats_qdisc.bstats, NULL,
+					       &q->rate_est);
+	}
+
+	if (err < 0 ||
 	    gnet_stats_copy_queue(&d, &q->qstats) < 0)
-		goto nla_put_failure;
+			goto nla_put_failure;
 
 	if (gnet_stats_finish_copy(&d) < 0)
 		goto nla_put_failure;
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index d0a6e60..9ecab38 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1623,7 +1623,7 @@ cbq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 		cl->xstats.undertime = cl->undertime - q->now;
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
-	    gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+	    gnet_stats_copy_rate_est(d, &cl->bstats, NULL, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, &cl->qstats) < 0)
 		return -1;
 
@@ -1695,7 +1695,7 @@ static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl)
 
 	qdisc_destroy(cl->q);
 	qdisc_put_rtab(cl->R_tab);
-	gen_kill_estimator(&cl->bstats, &cl->rate_est);
+	gen_kill_estimator(&cl->bstats, NULL, &cl->rate_est);
 	if (cl != &q->link)
 		kfree(cl);
 }
@@ -1786,7 +1786,8 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
 		}
 
 		if (tca[TCA_RATE]) {
-			err = gen_replace_estimator(&cl->bstats, &cl->rate_est,
+			err = gen_replace_estimator(&cl->bstats, NULL,
+						    &cl->rate_est,
 						    qdisc_root_sleeping_lock(sch),
 						    tca[TCA_RATE]);
 			if (err) {
@@ -1879,7 +1880,7 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
 		goto failure;
 
 	if (tca[TCA_RATE]) {
-		err = gen_new_estimator(&cl->bstats, &cl->rate_est,
+		err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est,
 					qdisc_root_sleeping_lock(sch),
 					tca[TCA_RATE]);
 		if (err) {
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 899026f..b58971a 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -88,7 +88,8 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 
 	if (cl != NULL) {
 		if (tca[TCA_RATE]) {
-			err = gen_replace_estimator(&cl->bstats, &cl->rate_est,
+			err = gen_replace_estimator(&cl->bstats, NULL,
+						    &cl->rate_est,
 						    qdisc_root_sleeping_lock(sch),
 						    tca[TCA_RATE]);
 			if (err)
@@ -116,7 +117,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		cl->qdisc = &noop_qdisc;
 
 	if (tca[TCA_RATE]) {
-		err = gen_replace_estimator(&cl->bstats, &cl->rate_est,
+		err = gen_replace_estimator(&cl->bstats, NULL, &cl->rate_est,
 					    qdisc_root_sleeping_lock(sch),
 					    tca[TCA_RATE]);
 		if (err) {
@@ -138,7 +139,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 
 static void drr_destroy_class(struct Qdisc *sch, struct drr_class *cl)
 {
-	gen_kill_estimator(&cl->bstats, &cl->rate_est);
+	gen_kill_estimator(&cl->bstats, NULL, &cl->rate_est);
 	qdisc_destroy(cl->qdisc);
 	kfree(cl);
 }
@@ -282,7 +283,7 @@ static int drr_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	}
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
-	    gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+	    gnet_stats_copy_rate_est(d, &cl->bstats, NULL, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, &cl->qdisc->qstats) < 0)
 		return -1;
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cd34a49..27033d0 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -627,6 +627,9 @@ static void qdisc_rcu_free(struct rcu_head *head)
 {
 	struct Qdisc *qdisc = container_of(head, struct Qdisc, rcu_head);
 
+	if (qdisc_is_lockless(qdisc))
+		free_percpu(qdisc->bstats_qdisc.cpu_bstats);
+
 	kfree((char *) qdisc - qdisc->padded);
 }
 
@@ -643,7 +646,13 @@ void qdisc_destroy(struct Qdisc *qdisc)
 
 	qdisc_put_stab(rtnl_dereference(qdisc->stab));
 #endif
-	gen_kill_estimator(&qdisc->bstats, &qdisc->rate_est);
+	if (qdisc_is_lockless(qdisc))
+		gen_kill_estimator(NULL, qdisc->bstats_qdisc.cpu_bstats,
+				   &qdisc->rate_est);
+	else
+		gen_kill_estimator(&qdisc->bstats_qdisc.bstats, NULL,
+				   &qdisc->rate_est);
+
 	if (ops->reset)
 		ops->reset(qdisc);
 	if (ops->destroy)
@@ -653,6 +662,7 @@ void qdisc_destroy(struct Qdisc *qdisc)
 	dev_put(qdisc_dev(qdisc));
 
 	kfree_skb(qdisc->gso_skb);
+
 	/*
 	 * gen_estimator est_timer() might access qdisc->q.lock,
 	 * wait a RCU grace period before freeing qdisc.
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 95689d3..d8f928c 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1014,9 +1014,12 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		cur_time = psched_get_time();
 
 		if (tca[TCA_RATE]) {
-			err = gen_replace_estimator(&cl->bstats, &cl->rate_est,
-					      qdisc_root_sleeping_lock(sch),
-					      tca[TCA_RATE]);
+			spinlock_t *lock = qdisc_root_sleeping_lock(sch);
+
+			err = gen_replace_estimator(&cl->bstats, NULL,
+						    &cl->rate_est,
+						    lock,
+						    tca[TCA_RATE]);
 			if (err)
 				return err;
 		}
@@ -1063,7 +1066,7 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		return -ENOBUFS;
 
 	if (tca[TCA_RATE]) {
-		err = gen_new_estimator(&cl->bstats, &cl->rate_est,
+		err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est,
 					qdisc_root_sleeping_lock(sch),
 					tca[TCA_RATE]);
 		if (err) {
@@ -1115,7 +1118,7 @@ hfsc_destroy_class(struct Qdisc *sch, struct hfsc_class *cl)
 	fl = rtnl_dereference(cl->filter_list);
 	tcf_destroy_chain(&fl);
 	qdisc_destroy(cl->qdisc);
-	gen_kill_estimator(&cl->bstats, &cl->rate_est);
+	gen_kill_estimator(&cl->bstats, NULL, &cl->rate_est);
 	if (cl != &q->root)
 		kfree(cl);
 }
@@ -1377,7 +1380,7 @@ hfsc_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	xstats.rtwork  = cl->cl_cumul;
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
-	    gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+	    gnet_stats_copy_rate_est(d, &cl->bstats, NULL, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, &cl->qstats) < 0)
 		return -1;
 
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 2c2a6aa..2bbed9c 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1145,7 +1145,7 @@ htb_dump_class_stats(struct Qdisc *sch, unsigned long arg, struct gnet_dump *d)
 	cl->xstats.ctokens = PSCHED_NS2TICKS(cl->ctokens);
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
-	    gnet_stats_copy_rate_est(d, NULL, &cl->rate_est) < 0 ||
+	    gnet_stats_copy_rate_est(d, NULL, NULL, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, &cl->qstats) < 0)
 		return -1;
 
@@ -1237,7 +1237,7 @@ static void htb_destroy_class(struct Qdisc *sch, struct htb_class *cl)
 		WARN_ON(!cl->un.leaf.q);
 		qdisc_destroy(cl->un.leaf.q);
 	}
-	gen_kill_estimator(&cl->bstats, &cl->rate_est);
+	gen_kill_estimator(&cl->bstats, NULL, &cl->rate_est);
 	fl = rtnl_dereference(cl->filter_list);
 	tcf_destroy_chain(&fl);
 	kfree(cl);
@@ -1409,7 +1409,8 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 			goto failure;
 
 		if (htb_rate_est || tca[TCA_RATE]) {
-			err = gen_new_estimator(&cl->bstats, &cl->rate_est,
+			err = gen_new_estimator(&cl->bstats, NULL,
+						&cl->rate_est,
 						qdisc_root_sleeping_lock(sch),
 						tca[TCA_RATE] ? : &est.nla);
 			if (err) {
@@ -1471,8 +1472,11 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 			parent->children++;
 	} else {
 		if (tca[TCA_RATE]) {
-			err = gen_replace_estimator(&cl->bstats, &cl->rate_est,
-						    qdisc_root_sleeping_lock(sch),
+			spinlock_t *lock = qdisc_root_sleeping_lock(sch);
+
+			err = gen_replace_estimator(&cl->bstats, NULL,
+						    &cl->rate_est,
+						    lock,
 						    tca[TCA_RATE]);
 			if (err)
 				return err;
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index be10333d..5700c27 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -64,7 +64,7 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	result = tc_classify(skb, fl, &res);
 
-	qdisc_bstats_update(sch, skb);
+	qdisc_bstats_update_cpu(sch, skb);
 	switch (result) {
 	case TC_ACT_SHOT:
 		result = TC_ACT_SHOT;
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index a8b2864..e96a41f 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -98,20 +98,22 @@ static void mq_attach(struct Qdisc *sch)
 
 static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct gnet_stats_basic_packed *bstats = &sch->bstats_qdisc.bstats;
 	struct net_device *dev = qdisc_dev(sch);
 	struct Qdisc *qdisc;
 	unsigned int ntx;
 
 	sch->q.qlen = 0;
-	memset(&sch->bstats, 0, sizeof(sch->bstats));
+	memset(bstats, 0, sizeof(sch->bstats_qdisc));
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
 	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
 		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
 		spin_lock_bh(qdisc_lock(qdisc));
 		sch->q.qlen		+= qdisc->q.qlen;
-		sch->bstats.bytes	+= qdisc->bstats.bytes;
-		sch->bstats.packets	+= qdisc->bstats.packets;
+
+		bstats->bytes		+= qdisc->bstats_qdisc.bstats.bytes;
+		bstats->packets		+= qdisc->bstats_qdisc.bstats.packets;
 		sch->qstats.qlen	+= qdisc->qstats.qlen;
 		sch->qstats.backlog	+= qdisc->qstats.backlog;
 		sch->qstats.drops	+= qdisc->qstats.drops;
@@ -201,7 +203,7 @@ static int mq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 	sch = dev_queue->qdisc_sleeping;
 	sch->qstats.qlen = sch->q.qlen;
-	if (gnet_stats_copy_basic(d, &sch->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, &sch->bstats_qdisc.bstats) < 0 ||
 	    gnet_stats_copy_queue(d, &sch->qstats) < 0)
 		return -1;
 	return 0;
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 37e7d25..10f5d4a 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -219,6 +219,7 @@ static int mqprio_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
 
 static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct gnet_stats_basic_packed *bstats = &sch->bstats_qdisc.bstats;
 	struct net_device *dev = qdisc_dev(sch);
 	struct mqprio_sched *priv = qdisc_priv(sch);
 	unsigned char *b = skb_tail_pointer(skb);
@@ -227,15 +228,16 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	unsigned int i;
 
 	sch->q.qlen = 0;
-	memset(&sch->bstats, 0, sizeof(sch->bstats));
+	memset(bstats, 0, sizeof(sch->bstats_qdisc.bstats));
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
 	for (i = 0; i < dev->num_tx_queues; i++) {
+
 		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc);
 		spin_lock_bh(qdisc_lock(qdisc));
 		sch->q.qlen		+= qdisc->q.qlen;
-		sch->bstats.bytes	+= qdisc->bstats.bytes;
-		sch->bstats.packets	+= qdisc->bstats.packets;
+		bstats->bytes		+= qdisc->bstats_qdisc.bstats.bytes;
+		bstats->packets		+= qdisc->bstats_qdisc.bstats.packets;
 		sch->qstats.qlen	+= qdisc->qstats.qlen;
 		sch->qstats.backlog	+= qdisc->qstats.backlog;
 		sch->qstats.drops	+= qdisc->qstats.drops;
@@ -344,8 +346,8 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 			qdisc = rtnl_dereference(q->qdisc);
 			spin_lock_bh(qdisc_lock(qdisc));
-			bstats.bytes      += qdisc->bstats.bytes;
-			bstats.packets    += qdisc->bstats.packets;
+			bstats.bytes      += qdisc->bstats_qdisc.bstats.bytes;
+			bstats.packets    += qdisc->bstats_qdisc.bstats.packets;
 			qstats.qlen       += qdisc->qstats.qlen;
 			qstats.backlog    += qdisc->qstats.backlog;
 			qstats.drops      += qdisc->qstats.drops;
@@ -363,7 +365,7 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 		sch = dev_queue->qdisc_sleeping;
 		sch->qstats.qlen = sch->q.qlen;
-		if (gnet_stats_copy_basic(d, &sch->bstats) < 0 ||
+		if (gnet_stats_copy_basic(d, &sch->bstats_qdisc.bstats) < 0 ||
 		    gnet_stats_copy_queue(d, &sch->qstats) < 0)
 			return -1;
 	}
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 169d637..73e8b2b 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -362,7 +362,7 @@ static int multiq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 	cl_q = q->queues[cl - 1];
 	cl_q->qstats.qlen = cl_q->q.qlen;
-	if (gnet_stats_copy_basic(d, &cl_q->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, &cl_q->bstats_qdisc.bstats) < 0 ||
 	    gnet_stats_copy_queue(d, &cl_q->qstats) < 0)
 		return -1;
 
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 43aa35d..1ae5d5b 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -326,7 +326,7 @@ static int prio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 	cl_q = q->queues[cl - 1];
 	cl_q->qstats.qlen = cl_q->q.qlen;
-	if (gnet_stats_copy_basic(d, &cl_q->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, &cl_q->bstats_qdisc.bstats) < 0 ||
 	    gnet_stats_copy_queue(d, &cl_q->qstats) < 0)
 		return -1;
 
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index cee5618..1cc4978 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -459,7 +459,8 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 
 	if (cl != NULL) { /* modify existing class */
 		if (tca[TCA_RATE]) {
-			err = gen_replace_estimator(&cl->bstats, &cl->rate_est,
+			err = gen_replace_estimator(&cl->bstats, NULL,
+						    &cl->rate_est,
 						    qdisc_root_sleeping_lock(sch),
 						    tca[TCA_RATE]);
 			if (err)
@@ -484,7 +485,8 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		cl->qdisc = &noop_qdisc;
 
 	if (tca[TCA_RATE]) {
-		err = gen_new_estimator(&cl->bstats, &cl->rate_est,
+		err = gen_new_estimator(&cl->bstats, NULL,
+					&cl->rate_est,
 					qdisc_root_sleeping_lock(sch),
 					tca[TCA_RATE]);
 		if (err)
@@ -505,7 +507,7 @@ set_change_agg:
 		new_agg = kzalloc(sizeof(*new_agg), GFP_KERNEL);
 		if (new_agg == NULL) {
 			err = -ENOBUFS;
-			gen_kill_estimator(&cl->bstats, &cl->rate_est);
+			gen_kill_estimator(&cl->bstats, NULL, &cl->rate_est);
 			goto destroy_class;
 		}
 		sch_tree_lock(sch);
@@ -530,7 +532,7 @@ static void qfq_destroy_class(struct Qdisc *sch, struct qfq_class *cl)
 	struct qfq_sched *q = qdisc_priv(sch);
 
 	qfq_rm_from_agg(q, cl);
-	gen_kill_estimator(&cl->bstats, &cl->rate_est);
+	gen_kill_estimator(&cl->bstats, NULL, &cl->rate_est);
 	qdisc_destroy(cl->qdisc);
 	kfree(cl);
 }
@@ -667,7 +669,7 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	xstats.lmax = cl->agg->lmax;
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
-	    gnet_stats_copy_rate_est(d, &cl->bstats, &cl->rate_est) < 0 ||
+	    gnet_stats_copy_rate_est(d, &cl->bstats, NULL, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, &cl->qdisc->qstats) < 0)
 		return -1;
 


* [RFC PATCH 14/15] net: sched: make qstats per cpu
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
                   ` (12 preceding siblings ...)
  2014-04-30 16:40 ` [RFC PATCH 13/15] net: sched: make bstats per cpu and estimator RCU safe John Fastabend
@ 2014-04-30 16:40 ` John Fastabend
  2014-04-30 17:08   ` Cong Wang
  2014-04-30 16:41 ` [RFC PATCH 15/15] net: sched: drop ingress qdisc lock John Fastabend
  14 siblings, 1 reply; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:40 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

Now that a qdisc can run without the qdisc lock, the qstats need to
handle the case where multiple CPUs are receiving or transmitting
skbs at the same time. For now the only qdisc that supports running
without locks is the ingress qdisc, which only increments the 32-bit
drop counter.

When the stats are dumped, the per-CPU values are summed and the
result is returned in the dump TLV.
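
As a rough sketch of the pattern (not part of the patch; the real
implementation lives in the gen_stats.c and sch_ingress.c hunks
below, and the helper names here are made up for illustration), the
lockless enqueue path only bumps this CPU's copy of the counters and
the dump path folds all CPUs back into a single struct
gnet_stats_queue:

/* lockless fast path: bump only this CPU's drop counter,
 * no qdisc lock held
 */
static inline void sketch_qstats_drop_cpu(struct Qdisc *sch)
{
	this_cpu_inc(sch->qstats_qdisc.cpu_qstats->drops);
}

/* dump path (RTNL held): fold every CPU into one struct
 * before building the TLV
 */
static void sketch_fold_qstats(struct gnet_stats_queue *sum,
			       struct gnet_stats_queue __percpu *q)
{
	int i;

	for_each_possible_cpu(i) {
		struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);

		sum->qlen	+= qcpu->qlen;
		sum->backlog	+= qcpu->backlog;
		sum->drops	+= qcpu->drops;
		sum->requeues	+= qcpu->requeues;
		sum->overlimits += qcpu->overlimits;
	}
}

Keeping the fast path to a single this_cpu increment avoids bouncing
a shared counter cacheline between CPUs; the more expensive
summation only runs under RTNL when userspace dumps the qdisc.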

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/codel.h       |    4 ++--
 include/net/gen_stats.h   |    1 +
 include/net/sch_generic.h |   19 +++++++++++--------
 net/core/gen_stats.c      |   30 +++++++++++++++++++++++++++++-
 net/sched/sch_api.c       |   20 +++++++++++++++-----
 net/sched/sch_atm.c       |    2 +-
 net/sched/sch_cbq.c       |   10 +++++-----
 net/sched/sch_choke.c     |   15 ++++++++-------
 net/sched/sch_codel.c     |    2 +-
 net/sched/sch_drr.c       |    8 ++++----
 net/sched/sch_dsmark.c    |    2 +-
 net/sched/sch_fifo.c      |    6 ++++--
 net/sched/sch_fq.c        |    4 ++--
 net/sched/sch_fq_codel.c  |    8 ++++----
 net/sched/sch_generic.c   |    4 ++--
 net/sched/sch_gred.c      |   10 +++++-----
 net/sched/sch_hfsc.c      |   15 ++++++++-------
 net/sched/sch_hhf.c       |    8 ++++----
 net/sched/sch_htb.c       |    6 +++---
 net/sched/sch_ingress.c   |    4 +++-
 net/sched/sch_mq.c        |   23 ++++++++++++++---------
 net/sched/sch_mqprio.c    |   34 +++++++++++++++++++++-------------
 net/sched/sch_multiq.c    |    8 ++++----
 net/sched/sch_netem.c     |   17 +++++++++--------
 net/sched/sch_pie.c       |    8 ++++----
 net/sched/sch_plug.c      |    2 +-
 net/sched/sch_prio.c      |    8 ++++----
 net/sched/sch_qfq.c       |    8 ++++----
 net/sched/sch_red.c       |   13 +++++++------
 net/sched/sch_sfb.c       |   13 +++++++------
 net/sched/sch_sfq.c       |   19 ++++++++++---------
 net/sched/sch_tbf.c       |   11 ++++++-----
 32 files changed, 204 insertions(+), 138 deletions(-)

diff --git a/include/net/codel.h b/include/net/codel.h
index fe0eab3..6c8911c 100644
--- a/include/net/codel.h
+++ b/include/net/codel.h
@@ -228,13 +228,13 @@ static bool codel_should_drop(const struct sk_buff *skb,
 	}
 
 	vars->ldelay = now - codel_get_enqueue_time(skb);
-	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 
 	if (unlikely(qdisc_pkt_len(skb) > stats->maxpacket))
 		stats->maxpacket = qdisc_pkt_len(skb);
 
 	if (codel_time_before(vars->ldelay, params->target) ||
-	    sch->qstats.backlog <= stats->maxpacket) {
+	    sch->qstats_qdisc.qstats.backlog <= stats->maxpacket) {
 		/* went below - stay below for at least interval */
 		vars->first_above_time = 0;
 		return false;
diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index dec4312..c9a6e38 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -44,6 +44,7 @@ int gnet_stats_copy_rate_est(struct gnet_dump *d,
 			     const struct gnet_stats_basic_cpu *cpu_b,
 			     struct gnet_stats_rate_est64 *r);
 int gnet_stats_copy_queue(struct gnet_dump *d, struct gnet_stats_queue *q);
+int gnet_stats_copy_queue_cpu(struct gnet_dump *d, struct gnet_stats_queue *q);
 int gnet_stats_copy_app(struct gnet_dump *d, void *st, int len);
 
 int gnet_stats_finish_copy(struct gnet_dump *d);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 1a55e19..6d67a27 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -92,7 +92,10 @@ struct Qdisc {
 		struct gnet_stats_basic_cpu __percpu *cpu_bstats;
 	} bstats_qdisc;
 	unsigned int		__state;
-	struct gnet_stats_queue	qstats;
+	union {
+		struct gnet_stats_queue qstats;
+		struct gnet_stats_queue __percpu *cpu_qstats;
+	} qstats_qdisc;
 	struct rcu_head		rcu_head;
 	int			padded;
 	atomic_t		refcnt;
@@ -546,7 +549,7 @@ static inline int __qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch,
 				       struct sk_buff_head *list)
 {
 	__skb_queue_tail(list, skb);
-	sch->qstats.backlog += qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog += qdisc_pkt_len(skb);
 
 	return NET_XMIT_SUCCESS;
 }
@@ -562,7 +565,7 @@ static inline struct sk_buff *__qdisc_dequeue_head(struct Qdisc *sch,
 	struct sk_buff *skb = __skb_dequeue(list);
 
 	if (likely(skb != NULL)) {
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 		qdisc_bstats_update(sch, skb);
 	}
 
@@ -581,7 +584,7 @@ static inline unsigned int __qdisc_queue_drop_head(struct Qdisc *sch,
 
 	if (likely(skb != NULL)) {
 		unsigned int len = qdisc_pkt_len(skb);
-		sch->qstats.backlog -= len;
+		sch->qstats_qdisc.qstats.backlog -= len;
 		kfree_skb(skb);
 		return len;
 	}
@@ -600,7 +603,7 @@ static inline struct sk_buff *__qdisc_dequeue_tail(struct Qdisc *sch,
 	struct sk_buff *skb = __skb_dequeue_tail(list);
 
 	if (likely(skb != NULL))
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 
 	return skb;
 }
@@ -657,7 +660,7 @@ static inline void __qdisc_reset_queue(struct Qdisc *sch,
 static inline void qdisc_reset_queue(struct Qdisc *sch)
 {
 	__qdisc_reset_queue(sch, &sch->q);
-	sch->qstats.backlog = 0;
+	sch->qstats_qdisc.qstats.backlog = 0;
 }
 
 static inline unsigned int __qdisc_queue_drop(struct Qdisc *sch,
@@ -682,14 +685,14 @@ static inline unsigned int qdisc_queue_drop(struct Qdisc *sch)
 static inline int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch)
 {
 	kfree_skb(skb);
-	sch->qstats.drops++;
+	sch->qstats_qdisc.qstats.drops++;
 
 	return NET_XMIT_DROP;
 }
 
 static inline int qdisc_reshape_fail(struct sk_buff *skb, struct Qdisc *sch)
 {
-	sch->qstats.drops++;
+	sch->qstats_qdisc.qstats.drops++;
 
 #ifdef CONFIG_NET_CLS_ACT
 	if (sch->reshape_fail == NULL || sch->reshape_fail(skb, sch))
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index b51146a..5d279de 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -22,7 +22,7 @@
 #include <linux/gen_stats.h>
 #include <net/netlink.h>
 #include <net/gen_stats.h>
-
+#include <uapi/linux/gen_stats.h>
 
 static inline int
 gnet_stats_copy(struct gnet_dump *d, int type, void *buf, int size)
@@ -244,6 +244,34 @@ gnet_stats_copy_queue(struct gnet_dump *d, struct gnet_stats_queue *q)
 }
 EXPORT_SYMBOL(gnet_stats_copy_queue);
 
+static void
+__gnet_stats_copy_queue_cpu(struct gnet_stats_queue *qstats,
+			    struct gnet_stats_queue *q)
+{
+	int i;
+
+	for_each_possible_cpu(i) {
+		struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);
+
+		qstats->qlen += qcpu->qlen;
+		qstats->backlog += qcpu->backlog;
+		qstats->drops += qcpu->drops;
+		qstats->requeues += qcpu->requeues;
+		qstats->overlimits += qcpu->overlimits;
+	}
+}
+
+int
+gnet_stats_copy_queue_cpu(struct gnet_dump *d, struct gnet_stats_queue *q)
+{
+	struct gnet_stats_queue qstats = {0};
+
+	__gnet_stats_copy_queue_cpu(&qstats, q);
+	return gnet_stats_copy_queue(d, &qstats);
+}
+EXPORT_SYMBOL(gnet_stats_copy_queue_cpu);
+
+
 /**
  * gnet_stats_copy_app - copy application specific statistics into statistics TLV
  * @d: dumping handle
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index bf14313..02cbbbe 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -763,7 +763,7 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
 			cops->put(sch, cl);
 		}
 		sch->q.qlen -= n;
-		sch->qstats.drops += drops;
+		sch->qstats_qdisc.qstats.drops += drops;
 	}
 }
 EXPORT_SYMBOL(qdisc_tree_decrease_qlen);
@@ -1340,7 +1340,6 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 		goto nla_put_failure;
 	if (q->ops->dump && q->ops->dump(q, skb) < 0)
 		goto nla_put_failure;
-	q->qstats.qlen = q->q.qlen;
 
 	stab = rtnl_dereference(q->stab);
 	if (stab && qdisc_dump_stab(skb, stab) < 0)
@@ -1360,18 +1359,29 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 		err = gnet_stats_copy_rate_est(&d, NULL,
 					       q->bstats_qdisc.cpu_bstats,
 					       &q->rate_est);
+
+		if (err < 0)
+			goto nla_put_failure;
+
+		q->qstats_qdisc.qstats.qlen = q->q.qlen;
+		err = gnet_stats_copy_queue_cpu(&d, q->qstats_qdisc.cpu_qstats);
 	} else {
 		err = gnet_stats_copy_basic(&d, &q->bstats_qdisc.bstats);
 		if (err < 0)
 			goto nla_put_failure;
+
+		q->qstats_qdisc.qstats.qlen = q->q.qlen;
 		err = gnet_stats_copy_rate_est(&d,
 					       &q->bstats_qdisc.bstats, NULL,
 					       &q->rate_est);
+		if (err < 0)
+			goto nla_put_failure;
+
+		err = gnet_stats_copy_queue(&d, &q->qstats_qdisc.qstats);
 	}
 
-	if (err < 0 ||
-	    gnet_stats_copy_queue(&d, &q->qstats) < 0)
-			goto nla_put_failure;
+	if (err < 0)
+		goto nla_put_failure;
 
 	if (gnet_stats_finish_copy(&d) < 0)
 		goto nla_put_failure;
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index 71e7e72..eab73c8 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -420,7 +420,7 @@ done:
 	if (ret != NET_XMIT_SUCCESS) {
 drop: __maybe_unused
 		if (net_xmit_drop_count(ret)) {
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 			if (flow)
 				flow->qstats.drops++;
 		}
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 9ecab38..e815ac4 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -384,7 +384,7 @@ cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 #endif
 	if (cl == NULL) {
 		if (ret & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return ret;
 	}
@@ -402,7 +402,7 @@ cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	}
 
 	if (net_xmit_drop_count(ret)) {
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 		cbq_mark_toplevel(q, cl);
 		cl->qstats.drops++;
 	}
@@ -657,11 +657,11 @@ static int cbq_reshape_fail(struct sk_buff *skb, struct Qdisc *child)
 			return 0;
 		}
 		if (net_xmit_drop_count(ret))
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		return 0;
 	}
 
-	sch->qstats.drops++;
+	sch->qstats_qdisc.qstats.drops++;
 	return -1;
 }
 #endif
@@ -1014,7 +1014,7 @@ cbq_dequeue(struct Qdisc *sch)
 	 */
 
 	if (sch->q.qlen) {
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 		if (q->wd_expires)
 			qdisc_watchdog_schedule(&q->watchdog,
 						now + q->wd_expires);
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 31b2d75..7dc89d9 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -127,7 +127,7 @@ static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx)
 	if (idx == q->tail)
 		choke_zap_tail_holes(q);
 
-	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 	qdisc_drop(skb, sch);
 	qdisc_tree_decrease_qlen(sch, 1);
 	--sch->q.qlen;
@@ -296,7 +296,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		if (q->vars.qavg > p->qth_max) {
 			q->vars.qcount = -1;
 
-			sch->qstats.overlimits++;
+			sch->qstats_qdisc.qstats.overlimits++;
 			if (use_harddrop(q) || !use_ecn(q) ||
 			    !INET_ECN_set_ce(skb)) {
 				q->stats.forced_drop++;
@@ -309,7 +309,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 				q->vars.qcount = 0;
 				q->vars.qR = red_random(p);
 
-				sch->qstats.overlimits++;
+				sch->qstats_qdisc.qstats.overlimits++;
 				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
 					q->stats.prob_drop++;
 					goto congestion_drop;
@@ -326,7 +326,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		q->tab[q->tail] = skb;
 		q->tail = (q->tail + 1) & q->tab_mask;
 		++sch->q.qlen;
-		sch->qstats.backlog += qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.backlog += qdisc_pkt_len(skb);
 		return NET_XMIT_SUCCESS;
 	}
 
@@ -339,7 +339,7 @@ congestion_drop:
 
 other_drop:
 	if (ret & __NET_XMIT_BYPASS)
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 	kfree_skb(skb);
 	return ret;
 }
@@ -359,7 +359,7 @@ static struct sk_buff *choke_dequeue(struct Qdisc *sch)
 	q->tab[q->head] = NULL;
 	choke_zap_head_holes(q);
 	--sch->q.qlen;
-	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 	qdisc_bstats_update(sch, skb);
 
 	return skb;
@@ -407,6 +407,7 @@ static void choke_free(void *addr)
 
 static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct choke_sched_data *q = qdisc_priv(sch);
 	struct nlattr *tb[TCA_CHOKE_MAX + 1];
 	const struct tc_red_qopt *ctl;
@@ -459,7 +460,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 					ntab[tail++] = skb;
 					continue;
 				}
-				sch->qstats.backlog -= qdisc_pkt_len(skb);
+				qstats->backlog -= qdisc_pkt_len(skb);
 				--sch->q.qlen;
 				qdisc_drop(skb, sch);
 			}
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index 2f9ab17..47a387e 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -149,7 +149,7 @@ static int codel_change(struct Qdisc *sch, struct nlattr *opt)
 	while (sch->q.qlen > sch->limit) {
 		struct sk_buff *skb = __skb_dequeue(&sch->q);
 
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 		qdisc_drop(skb, sch);
 	}
 	qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen);
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index b58971a..9547792 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -279,12 +279,12 @@ static int drr_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	memset(&xstats, 0, sizeof(xstats));
 	if (cl->qdisc->q.qlen) {
 		xstats.deficit = cl->deficit;
-		cl->qdisc->qstats.qlen = cl->qdisc->q.qlen;
+		cl->qdisc->qstats_qdisc.qstats.qlen = cl->qdisc->q.qlen;
 	}
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->bstats, NULL, &cl->rate_est) < 0 ||
-	    gnet_stats_copy_queue(d, &cl->qdisc->qstats) < 0)
+	    gnet_stats_copy_queue(d, &cl->qdisc->qstats_qdisc.qstats) < 0)
 		return -1;
 
 	return gnet_stats_copy_app(d, &xstats, sizeof(xstats));
@@ -359,7 +359,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	cl = drr_classify(skb, sch, &err);
 	if (cl == NULL) {
 		if (err & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return err;
 	}
@@ -368,7 +368,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	if (unlikely(err != NET_XMIT_SUCCESS)) {
 		if (net_xmit_drop_count(err)) {
 			cl->qstats.drops++;
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		}
 		return err;
 	}
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 6b84bd4..405a72d 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -258,7 +258,7 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	err = qdisc_enqueue(skb, p->q);
 	if (err != NET_XMIT_SUCCESS) {
 		if (net_xmit_drop_count(err))
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		return err;
 	}
 
diff --git a/net/sched/sch_fifo.c b/net/sched/sch_fifo.c
index e15a9eb..264ae79 100644
--- a/net/sched/sch_fifo.c
+++ b/net/sched/sch_fifo.c
@@ -21,7 +21,9 @@
 
 static int bfifo_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
-	if (likely(sch->qstats.backlog + qdisc_pkt_len(skb) <= sch->limit))
+	u32 limit = sch->qstats_qdisc.qstats.backlog + qdisc_pkt_len(skb);
+
+	if (likely(limit <= sch->limit))
 		return qdisc_enqueue_tail(skb, sch);
 
 	return qdisc_reshape_fail(skb, sch);
@@ -42,7 +44,7 @@ static int pfifo_tail_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	/* queue full, remove one skb to fulfill the limit */
 	__qdisc_queue_drop_head(sch, &sch->q);
-	sch->qstats.drops++;
+	sch->qstats_qdisc.qstats.drops++;
 	qdisc_enqueue_tail(skb, sch);
 
 	return NET_XMIT_CN;
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 23c682b..64f946d 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -290,7 +290,7 @@ static struct sk_buff *fq_dequeue_head(struct Qdisc *sch, struct fq_flow *flow)
 		flow->head = skb->next;
 		skb->next = NULL;
 		flow->qlen--;
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 		sch->q.qlen--;
 	}
 	return skb;
@@ -371,7 +371,7 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	f->qlen++;
 	if (skb_is_retransmit(skb))
 		q->stat_tcp_retrans++;
-	sch->qstats.backlog += qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog += qdisc_pkt_len(skb);
 	if (fq_flow_is_detached(f)) {
 		fq_flow_add_tail(&q->new_flows, f);
 		if (time_after(jiffies, f->age + q->flow_refill_delay))
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index d37eb06..8f71e67 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -163,8 +163,8 @@ static unsigned int fq_codel_drop(struct Qdisc *sch)
 	q->backlogs[idx] -= len;
 	kfree_skb(skb);
 	sch->q.qlen--;
-	sch->qstats.drops++;
-	sch->qstats.backlog -= len;
+	sch->qstats_qdisc.qstats.drops++;
+	sch->qstats_qdisc.qstats.backlog -= len;
 	flow->dropped++;
 	return idx;
 }
@@ -179,7 +179,7 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	idx = fq_codel_classify(skb, sch, &ret);
 	if (idx == 0) {
 		if (ret & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return ret;
 	}
@@ -189,7 +189,7 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	flow = &q->flows[idx];
 	flow_queue_add(flow, skb);
 	q->backlogs[idx] += qdisc_pkt_len(skb);
-	sch->qstats.backlog += qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog += qdisc_pkt_len(skb);
 
 	if (list_empty(&flow->flowchain)) {
 		list_add_tail(&flow->flowchain, &q->new_flows);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 27033d0..b5870a1 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -49,7 +49,7 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 {
 	skb_dst_force(skb);
 	q->gso_skb = skb;
-	q->qstats.requeues++;
+	q->qstats_qdisc.qstats.requeues++;
 	q->q.qlen++;	/* it's still part of the queue */
 	__netif_schedule(q);
 
@@ -495,7 +495,7 @@ static void pfifo_fast_reset(struct Qdisc *qdisc)
 		__qdisc_reset_queue(qdisc, band2list(priv, prio));
 
 	priv->bitmap = 0;
-	qdisc->qstats.backlog = 0;
+	qdisc->qstats_qdisc.qstats.backlog = 0;
 	qdisc->q.qlen = 0;
 }
 
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 12cbc09..198c44c 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -115,7 +115,7 @@ static inline unsigned int gred_backlog(struct gred_sched *table,
 					struct Qdisc *sch)
 {
 	if (gred_wred_mode(table))
-		return sch->qstats.backlog;
+		return sch->qstats_qdisc.qstats.backlog;
 	else
 		return q->backlog;
 }
@@ -209,7 +209,7 @@ static int gred_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		break;
 
 	case RED_PROB_MARK:
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 		if (!gred_use_ecn(t) || !INET_ECN_set_ce(skb)) {
 			q->stats.prob_drop++;
 			goto congestion_drop;
@@ -219,7 +219,7 @@ static int gred_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		break;
 
 	case RED_HARD_MARK:
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 		if (gred_use_harddrop(t) || !gred_use_ecn(t) ||
 		    !INET_ECN_set_ce(skb)) {
 			q->stats.forced_drop++;
@@ -261,7 +261,7 @@ static struct sk_buff *gred_dequeue(struct Qdisc *sch)
 			q->backlog -= qdisc_pkt_len(skb);
 
 			if (gred_wred_mode(t)) {
-				if (!sch->qstats.backlog)
+				if (!sch->qstats_qdisc.qstats.backlog)
 					red_start_of_idle_period(&t->wred_set);
 			} else {
 				if (!q->backlog)
@@ -294,7 +294,7 @@ static unsigned int gred_drop(struct Qdisc *sch)
 			q->stats.other++;
 
 			if (gred_wred_mode(t)) {
-				if (!sch->qstats.backlog)
+				if (!sch->qstats_qdisc.qstats.backlog)
 					red_start_of_idle_period(&t->wred_set);
 			} else {
 				if (!q->backlog)
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index d8f928c..f98c972 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1373,7 +1373,7 @@ hfsc_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	struct tc_hfsc_stats xstats;
 
 	cl->qstats.qlen = cl->qdisc->q.qlen;
-	cl->qstats.backlog = cl->qdisc->qstats.backlog;
+	cl->qstats.backlog = cl->qdisc->qstats_qdisc.qstats.backlog;
 	xstats.level   = cl->level;
 	xstats.period  = cl->cl_vtperiod;
 	xstats.work    = cl->cl_total;
@@ -1565,16 +1565,17 @@ hfsc_destroy_qdisc(struct Qdisc *sch)
 static int
 hfsc_dump_qdisc(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct hfsc_sched *q = qdisc_priv(sch);
 	unsigned char *b = skb_tail_pointer(skb);
 	struct tc_hfsc_qopt qopt;
 	struct hfsc_class *cl;
 	unsigned int i;
 
-	sch->qstats.backlog = 0;
+	qstats->backlog = 0;
 	for (i = 0; i < q->clhash.hashsize; i++) {
 		hlist_for_each_entry(cl, &q->clhash.hash[i], cl_common.hnode)
-			sch->qstats.backlog += cl->qdisc->qstats.backlog;
+			qstats->backlog += cl->qdisc->qstats_qdisc.qstats.backlog;
 	}
 
 	qopt.defcls = q->defcls;
@@ -1596,7 +1597,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	cl = hfsc_classify(skb, sch, &err);
 	if (cl == NULL) {
 		if (err & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return err;
 	}
@@ -1605,7 +1606,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	if (unlikely(err != NET_XMIT_SUCCESS)) {
 		if (net_xmit_drop_count(err)) {
 			cl->qstats.drops++;
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		}
 		return err;
 	}
@@ -1648,7 +1649,7 @@ hfsc_dequeue(struct Qdisc *sch)
 		 */
 		cl = vttree_get_minvt(&q->root, cur_time);
 		if (cl == NULL) {
-			sch->qstats.overlimits++;
+			sch->qstats_qdisc.qstats.overlimits++;
 			hfsc_schedule_watchdog(sch);
 			return NULL;
 		}
@@ -1703,7 +1704,7 @@ hfsc_drop(struct Qdisc *sch)
 				list_move_tail(&cl->dlist, &q->droplist);
 			}
 			cl->qstats.drops++;
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 			sch->q.qlen--;
 			return len;
 		}
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index edee03d..7cf4196 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -376,8 +376,8 @@ static unsigned int hhf_drop(struct Qdisc *sch)
 		struct sk_buff *skb = dequeue_head(bucket);
 
 		sch->q.qlen--;
-		sch->qstats.drops++;
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.drops++;
+		sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 		kfree_skb(skb);
 	}
 
@@ -395,7 +395,7 @@ static int hhf_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	bucket = &q->buckets[idx];
 	bucket_add(bucket, skb);
-	sch->qstats.backlog += qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog += qdisc_pkt_len(skb);
 
 	if (list_empty(&bucket->bucketchain)) {
 		unsigned int weight;
@@ -457,7 +457,7 @@ begin:
 	if (bucket->head) {
 		skb = dequeue_head(bucket);
 		sch->q.qlen--;
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 	}
 
 	if (!skb) {
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 2bbed9c..2cb80e9 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -586,13 +586,13 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 #ifdef CONFIG_NET_CLS_ACT
 	} else if (!cl) {
 		if (ret & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return ret;
 #endif
 	} else if ((ret = qdisc_enqueue(skb, cl->un.leaf.q)) != NET_XMIT_SUCCESS) {
 		if (net_xmit_drop_count(ret)) {
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 			cl->qstats.drops++;
 		}
 		return ret;
@@ -925,7 +925,7 @@ ok:
 				goto ok;
 		}
 	}
-	sch->qstats.overlimits++;
+	sch->qstats_qdisc.qstats.overlimits++;
 	if (likely(next_event > q->now)) {
 		if (!test_bit(__QDISC_STATE_DEACTIVATED,
 			      &qdisc_root_sleeping(q->watchdog.qdisc)->state)) {
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 5700c27..2423d7b 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -58,6 +58,7 @@ static struct tcf_proto **ingress_find_tcf(struct Qdisc *sch, unsigned long cl)
 static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct ingress_qdisc_data *p = qdisc_priv(sch);
+	struct gnet_stats_queue *qstats;
 	struct tcf_result res;
 	struct tcf_proto *fl = rcu_dereference_bh(p->filter_list);
 	int result;
@@ -68,7 +69,8 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	switch (result) {
 	case TC_ACT_SHOT:
 		result = TC_ACT_SHOT;
-		sch->qstats.drops++;
+		qstats = this_cpu_ptr(sch->qstats_qdisc.cpu_qstats);
+		qstats->drops++;
 		break;
 	case TC_ACT_STOLEN:
 	case TC_ACT_QUEUED:
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index e96a41f..402460d 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -98,6 +98,7 @@ static void mq_attach(struct Qdisc *sch)
 
 static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct gnet_stats_basic_packed *bstats = &sch->bstats_qdisc.bstats;
 	struct net_device *dev = qdisc_dev(sch);
 	struct Qdisc *qdisc;
@@ -105,20 +106,24 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 
 	sch->q.qlen = 0;
 	memset(bstats, 0, sizeof(sch->bstats_qdisc));
-	memset(&sch->qstats, 0, sizeof(sch->qstats));
+	memset(qstats, 0, sizeof(sch->qstats_qdisc.qstats));
 
 	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
+		struct gnet_stats_queue *q_qstats;
+
 		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
 		spin_lock_bh(qdisc_lock(qdisc));
-		sch->q.qlen		+= qdisc->q.qlen;
+		q_qstats = &qdisc->qstats_qdisc.qstats;
 
+		sch->q.qlen		+= qdisc->q.qlen;
 		bstats->bytes		+= qdisc->bstats_qdisc.bstats.bytes;
 		bstats->packets		+= qdisc->bstats_qdisc.bstats.packets;
-		sch->qstats.qlen	+= qdisc->qstats.qlen;
-		sch->qstats.backlog	+= qdisc->qstats.backlog;
-		sch->qstats.drops	+= qdisc->qstats.drops;
-		sch->qstats.requeues	+= qdisc->qstats.requeues;
-		sch->qstats.overlimits	+= qdisc->qstats.overlimits;
+		qstats->qlen		+= q_qstats->qlen;
+		qstats->backlog		+= q_qstats->backlog;
+		qstats->drops		+= q_qstats->drops;
+		qstats->requeues	+= q_qstats->requeues;
+		qstats->overlimits	+= q_qstats->overlimits;
+
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
 	return 0;
@@ -202,9 +207,9 @@ static int mq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
 
 	sch = dev_queue->qdisc_sleeping;
-	sch->qstats.qlen = sch->q.qlen;
+	sch->qstats_qdisc.qstats.qlen = sch->q.qlen;
 	if (gnet_stats_copy_basic(d, &sch->bstats_qdisc.bstats) < 0 ||
-	    gnet_stats_copy_queue(d, &sch->qstats) < 0)
+	    gnet_stats_copy_queue(d, &sch->qstats_qdisc.qstats) < 0)
 		return -1;
 	return 0;
 }
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 10f5d4a..6ec84a7 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -220,6 +220,7 @@ static int mqprio_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
 static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
 	struct gnet_stats_basic_packed *bstats = &sch->bstats_qdisc.bstats;
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct net_device *dev = qdisc_dev(sch);
 	struct mqprio_sched *priv = qdisc_priv(sch);
 	unsigned char *b = skb_tail_pointer(skb);
@@ -229,20 +230,25 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 
 	sch->q.qlen = 0;
 	memset(bstats, 0, sizeof(sch->bstats_qdisc.bstats));
-	memset(&sch->qstats, 0, sizeof(sch->qstats));
+	memset(qstats, 0, sizeof(sch->qstats_qdisc.qstats));
 
 	for (i = 0; i < dev->num_tx_queues; i++) {
+		struct gnet_stats_queue *q_qstats;
 
 		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc);
 		spin_lock_bh(qdisc_lock(qdisc));
 		sch->q.qlen		+= qdisc->q.qlen;
+
 		bstats->bytes		+= qdisc->bstats_qdisc.bstats.bytes;
 		bstats->packets		+= qdisc->bstats_qdisc.bstats.packets;
-		sch->qstats.qlen	+= qdisc->qstats.qlen;
-		sch->qstats.backlog	+= qdisc->qstats.backlog;
-		sch->qstats.drops	+= qdisc->qstats.drops;
-		sch->qstats.requeues	+= qdisc->qstats.requeues;
-		sch->qstats.overlimits	+= qdisc->qstats.overlimits;
+
+		q_qstats = &qdisc->qstats_qdisc.qstats;
+
+		qstats->qlen		+= q_qstats->qlen;
+		qstats->backlog		+= q_qstats->backlog;
+		qstats->drops		+= q_qstats->drops;
+		qstats->requeues	+= q_qstats->requeues;
+		qstats->overlimits	+= q_qstats->overlimits;
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
 
@@ -342,17 +348,19 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 		spin_unlock_bh(d->lock);
 
 		for (i = tc.offset; i < tc.offset + tc.count; i++) {
+			struct gnet_stats_queue *stats;
 			struct netdev_queue *q = netdev_get_tx_queue(dev, i);
 
 			qdisc = rtnl_dereference(q->qdisc);
 			spin_lock_bh(qdisc_lock(qdisc));
+			stats = &qdisc->qstats_qdisc.qstats;
 			bstats.bytes      += qdisc->bstats_qdisc.bstats.bytes;
 			bstats.packets    += qdisc->bstats_qdisc.bstats.packets;
-			qstats.qlen       += qdisc->qstats.qlen;
-			qstats.backlog    += qdisc->qstats.backlog;
-			qstats.drops      += qdisc->qstats.drops;
-			qstats.requeues   += qdisc->qstats.requeues;
-			qstats.overlimits += qdisc->qstats.overlimits;
+			qstats.qlen       += stats->qlen;
+			qstats.backlog    += stats->backlog;
+			qstats.drops      += stats->drops;
+			qstats.requeues   += stats->requeues;
+			qstats.overlimits += stats->overlimits;
 			spin_unlock_bh(qdisc_lock(qdisc));
 		}
 		/* Reclaim root sleeping lock before completing stats */
@@ -364,9 +372,9 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 		struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl);
 
 		sch = dev_queue->qdisc_sleeping;
-		sch->qstats.qlen = sch->q.qlen;
+		sch->qstats_qdisc.qstats.qlen = sch->q.qlen;
 		if (gnet_stats_copy_basic(d, &sch->bstats_qdisc.bstats) < 0 ||
-		    gnet_stats_copy_queue(d, &sch->qstats) < 0)
+		    gnet_stats_copy_queue(d, &sch->qstats_qdisc.qstats) < 0)
 			return -1;
 	}
 	return 0;
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 73e8b2b..dfc9c45 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -75,7 +75,7 @@ multiq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	if (qdisc == NULL) {
 
 		if (ret & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return ret;
 	}
@@ -87,7 +87,7 @@ multiq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		return NET_XMIT_SUCCESS;
 	}
 	if (net_xmit_drop_count(ret))
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 	return ret;
 }
 
@@ -361,9 +361,9 @@ static int multiq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct Qdisc *cl_q;
 
 	cl_q = q->queues[cl - 1];
-	cl_q->qstats.qlen = cl_q->q.qlen;
+	cl_q->qstats_qdisc.qstats.qlen = cl_q->q.qlen;
 	if (gnet_stats_copy_basic(d, &cl_q->bstats_qdisc.bstats) < 0 ||
-	    gnet_stats_copy_queue(d, &cl_q->qstats) < 0)
+	    gnet_stats_copy_queue(d, &cl_q->qstats_qdisc.qstats) < 0)
 		return -1;
 
 	return 0;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index f1669a00..b46272f 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -429,12 +429,12 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	/* Drop packet? */
 	if (loss_event(q)) {
 		if (q->ecn && INET_ECN_set_ce(skb))
-			sch->qstats.drops++; /* mark packet */
+			sch->qstats_qdisc.qstats.drops++; /* mark packet */
 		else
 			--count;
 	}
 	if (count == 0) {
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	}
@@ -478,7 +478,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	if (unlikely(skb_queue_len(&sch->q) >= sch->limit))
 		return qdisc_reshape_fail(skb, sch);
 
-	sch->qstats.backlog += qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog += qdisc_pkt_len(skb);
 
 	cb = netem_skb_cb(skb);
 	if (q->gap == 0 ||		/* not doing reordering */
@@ -526,7 +526,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		q->counter = 0;
 
 		__skb_queue_head(&sch->q, skb);
-		sch->qstats.requeues++;
+		sch->qstats_qdisc.qstats.requeues++;
 	}
 
 	return NET_XMIT_SUCCESS;
@@ -550,20 +550,21 @@ static unsigned int netem_drop(struct Qdisc *sch)
 			skb->next = NULL;
 			skb->prev = NULL;
 			len = qdisc_pkt_len(skb);
-			sch->qstats.backlog -= len;
+			sch->qstats_qdisc.qstats.backlog -= len;
 			kfree_skb(skb);
 		}
 	}
 	if (!len && q->qdisc && q->qdisc->ops->drop)
 	    len = q->qdisc->ops->drop(q->qdisc);
 	if (len)
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 
 	return len;
 }
 
 static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct netem_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
 	struct rb_node *p;
@@ -575,7 +576,7 @@ tfifo_dequeue:
 	skb = __skb_dequeue(&sch->q);
 	if (skb) {
 deliver:
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		qstats->backlog -= qdisc_pkt_len(skb);
 		qdisc_unthrottled(sch);
 		qdisc_bstats_update(sch, skb);
 		return skb;
@@ -610,7 +611,7 @@ deliver:
 
 				if (unlikely(err != NET_XMIT_SUCCESS)) {
 					if (net_xmit_drop_count(err)) {
-						sch->qstats.drops++;
+						qstats->drops++;
 						qdisc_tree_decrease_qlen(sch, 1);
 					}
 				}
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index fefeeb7..f454354 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -116,7 +116,7 @@ static bool drop_early(struct Qdisc *sch, u32 packet_size)
 	/* If we have fewer than 2 mtu-sized packets, disable drop_early,
 	 * similar to min_th in RED
 	 */
-	if (sch->qstats.backlog < 2 * mtu)
+	if (sch->qstats_qdisc.qstats.backlog < 2 * mtu)
 		return false;
 
 	/* If bytemode is turned on, use packet size to compute new
@@ -232,7 +232,7 @@ static int pie_change(struct Qdisc *sch, struct nlattr *opt)
 	while (sch->q.qlen > sch->limit) {
 		struct sk_buff *skb = __skb_dequeue(&sch->q);
 
-		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 		qdisc_drop(skb, sch);
 	}
 	qdisc_tree_decrease_qlen(sch, qlen - sch->q.qlen);
@@ -245,7 +245,7 @@ static void pie_process_dequeue(struct Qdisc *sch, struct sk_buff *skb)
 {
 
 	struct pie_sched_data *q = qdisc_priv(sch);
-	int qlen = sch->qstats.backlog;	/* current queue size in bytes */
+	int qlen = sch->qstats_qdisc.qstats.backlog;
 
 	/* If current queue is about 10 packets or more and dq_count is unset
 	 * we have enough packets to calculate the drain rate. Save
@@ -310,7 +310,7 @@ static void pie_process_dequeue(struct Qdisc *sch, struct sk_buff *skb)
 static void calculate_probability(struct Qdisc *sch)
 {
 	struct pie_sched_data *q = qdisc_priv(sch);
-	u32 qlen = sch->qstats.backlog;	/* queue size in bytes */
+	u32 qlen = sch->qstats_qdisc.qstats.backlog;
 	psched_time_t qdelay = 0;	/* in pschedtime */
 	psched_time_t qdelay_old = q->vars.qdelay;	/* in pschedtime */
 	s32 delta = 0;		/* determines the change in probability */
diff --git a/net/sched/sch_plug.c b/net/sched/sch_plug.c
index 89f8fcf..df2df95 100644
--- a/net/sched/sch_plug.c
+++ b/net/sched/sch_plug.c
@@ -90,7 +90,7 @@ static int plug_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct plug_sched_data *q = qdisc_priv(sch);
 
-	if (likely(sch->qstats.backlog + skb->len <= q->limit)) {
+	if (likely(sch->qstats_qdisc.qstats.backlog + skb->len <= q->limit)) {
 		if (!q->unplug_indefinite)
 			q->pkts_current_epoch++;
 		return qdisc_enqueue_tail(skb, sch);
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 1ae5d5b..ed34ccc 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -77,7 +77,7 @@ prio_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	if (qdisc == NULL) {
 
 		if (ret & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return ret;
 	}
@@ -89,7 +89,7 @@ prio_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		return NET_XMIT_SUCCESS;
 	}
 	if (net_xmit_drop_count(ret))
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 	return ret;
 }
 
@@ -325,9 +325,9 @@ static int prio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct Qdisc *cl_q;
 
 	cl_q = q->queues[cl - 1];
-	cl_q->qstats.qlen = cl_q->q.qlen;
+	cl_q->qstats_qdisc.qstats.qlen = cl_q->q.qlen;
 	if (gnet_stats_copy_basic(d, &cl_q->bstats_qdisc.bstats) < 0 ||
-	    gnet_stats_copy_queue(d, &cl_q->qstats) < 0)
+	    gnet_stats_copy_queue(d, &cl_q->qstats_qdisc.qstats) < 0)
 		return -1;
 
 	return 0;
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 1cc4978..f4902cf 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -663,14 +663,14 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	struct tc_qfq_stats xstats;
 
 	memset(&xstats, 0, sizeof(xstats));
-	cl->qdisc->qstats.qlen = cl->qdisc->q.qlen;
+	cl->qdisc->qstats_qdisc.qstats.qlen = cl->qdisc->q.qlen;
 
 	xstats.weight = cl->agg->class_weight;
 	xstats.lmax = cl->agg->lmax;
 
 	if (gnet_stats_copy_basic(d, &cl->bstats) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->bstats, NULL, &cl->rate_est) < 0 ||
-	    gnet_stats_copy_queue(d, &cl->qdisc->qstats) < 0)
+	    gnet_stats_copy_queue(d, &cl->qdisc->qstats_qdisc.qstats) < 0)
 		return -1;
 
 	return gnet_stats_copy_app(d, &xstats, sizeof(xstats));
@@ -1228,7 +1228,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	cl = qfq_classify(skb, sch, &err);
 	if (cl == NULL) {
 		if (err & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return err;
 	}
@@ -1248,7 +1248,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		pr_debug("qfq_enqueue: enqueue failed %d\n", err);
 		if (net_xmit_drop_count(err)) {
 			cl->qstats.drops++;
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		}
 		return err;
 	}
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 633e32d..01ec26b 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -64,7 +64,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	q->vars.qavg = red_calc_qavg(&q->parms,
 				     &q->vars,
-				     child->qstats.backlog);
+				     child->qstats_qdisc.qstats.backlog);
 
 	if (red_is_idling(&q->vars))
 		red_end_of_idle_period(&q->vars);
@@ -74,7 +74,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		break;
 
 	case RED_PROB_MARK:
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 		if (!red_use_ecn(q) || !INET_ECN_set_ce(skb)) {
 			q->stats.prob_drop++;
 			goto congestion_drop;
@@ -84,7 +84,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		break;
 
 	case RED_HARD_MARK:
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 		if (red_use_harddrop(q) || !red_use_ecn(q) ||
 		    !INET_ECN_set_ce(skb)) {
 			q->stats.forced_drop++;
@@ -100,7 +100,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		sch->q.qlen++;
 	} else if (net_xmit_drop_count(ret)) {
 		q->stats.pdrop++;
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 	}
 	return ret;
 
@@ -142,7 +142,7 @@ static unsigned int red_drop(struct Qdisc *sch)
 
 	if (child->ops->drop && (len = child->ops->drop(child)) > 0) {
 		q->stats.other++;
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 		sch->q.qlen--;
 		return len;
 	}
@@ -256,6 +256,7 @@ static int red_init(struct Qdisc *sch, struct nlattr *opt)
 
 static int red_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct red_sched_data *q = qdisc_priv(sch);
 	struct nlattr *opts = NULL;
 	struct tc_red_qopt opt = {
@@ -268,7 +269,7 @@ static int red_dump(struct Qdisc *sch, struct sk_buff *skb)
 		.Scell_log	= q->parms.Scell_log,
 	};
 
-	sch->qstats.backlog = q->qdisc->qstats.backlog;
+	qstats->backlog = q->qdisc->qstats_qdisc.qstats.backlog;
 	opts = nla_nest_start(skb, TCA_OPTIONS);
 	if (opts == NULL)
 		goto nla_put_failure;
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 5938d00..3de57da 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -290,7 +290,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	struct flow_keys keys;
 
 	if (unlikely(sch->q.qlen >= q->limit)) {
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 		q->stats.queuedrop++;
 		goto drop;
 	}
@@ -348,7 +348,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	sfb_skb_cb(skb)->hashes[slot] = 0;
 
 	if (unlikely(minqlen >= q->max)) {
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 		q->stats.bucketdrop++;
 		goto drop;
 	}
@@ -376,7 +376,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 			}
 		}
 		if (sfb_rate_limit(skb, q)) {
-			sch->qstats.overlimits++;
+			sch->qstats_qdisc.qstats.overlimits++;
 			q->stats.penaltydrop++;
 			goto drop;
 		}
@@ -411,7 +411,7 @@ enqueue:
 		increment_qlen(skb, q);
 	} else if (net_xmit_drop_count(ret)) {
 		q->stats.childdrop++;
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 	}
 	return ret;
 
@@ -420,7 +420,7 @@ drop:
 	return NET_XMIT_CN;
 other_drop:
 	if (ret & __NET_XMIT_BYPASS)
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 	kfree_skb(skb);
 	return ret;
 }
@@ -557,6 +557,7 @@ static int sfb_init(struct Qdisc *sch, struct nlattr *opt)
 
 static int sfb_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct sfb_sched_data *q = qdisc_priv(sch);
 	struct nlattr *opts;
 	struct tc_sfb_qopt opt = {
@@ -571,7 +572,7 @@ static int sfb_dump(struct Qdisc *sch, struct sk_buff *skb)
 		.penalty_burst = q->penalty_burst,
 	};
 
-	sch->qstats.backlog = q->qdisc->qstats.backlog;
+	qstats->backlog = q->qdisc->qstats_qdisc.qstats.backlog;
 	opts = nla_nest_start(skb, TCA_OPTIONS);
 	if (opts == NULL)
 		goto nla_put_failure;
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 13ec4eb..fdaebc7 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -336,8 +336,8 @@ drop:
 		sfq_dec(q, x);
 		kfree_skb(skb);
 		sch->q.qlen--;
-		sch->qstats.drops++;
-		sch->qstats.backlog -= len;
+		sch->qstats_qdisc.qstats.drops++;
+		sch->qstats_qdisc.qstats.backlog -= len;
 		return len;
 	}
 
@@ -384,7 +384,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	hash = sfq_classify(skb, sch, &ret);
 	if (hash == 0) {
 		if (ret & __NET_XMIT_BYPASS)
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		kfree_skb(skb);
 		return ret;
 	}
@@ -414,7 +414,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 			break;
 
 		case RED_PROB_MARK:
-			sch->qstats.overlimits++;
+			sch->qstats_qdisc.qstats.overlimits++;
 			if (sfq_prob_mark(q)) {
 				/* We know we have at least one packet in queue */
 				if (sfq_headdrop(q) &&
@@ -431,7 +431,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 			goto congestion_drop;
 
 		case RED_HARD_MARK:
-			sch->qstats.overlimits++;
+			sch->qstats_qdisc.qstats.overlimits++;
 			if (sfq_hard_mark(q)) {
 				/* We know we have at least one packet in queue */
 				if (sfq_headdrop(q) &&
@@ -457,7 +457,7 @@ congestion_drop:
 		/* We know we have at least one packet in queue */
 		head = slot_dequeue_head(slot);
 		delta = qdisc_pkt_len(head) - qdisc_pkt_len(skb);
-		sch->qstats.backlog -= delta;
+		sch->qstats_qdisc.qstats.backlog -= delta;
 		slot->backlog -= delta;
 		qdisc_drop(head, sch);
 
@@ -466,7 +466,7 @@ congestion_drop:
 	}
 
 enqueue:
-	sch->qstats.backlog += qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog += qdisc_pkt_len(skb);
 	slot->backlog += qdisc_pkt_len(skb);
 	slot_queue_add(slot, skb);
 	sfq_inc(q, x);
@@ -525,7 +525,7 @@ next_slot:
 	sfq_dec(q, a);
 	qdisc_bstats_update(sch, skb);
 	sch->q.qlen--;
-	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	sch->qstats_qdisc.qstats.backlog -= qdisc_pkt_len(skb);
 	slot->backlog -= qdisc_pkt_len(skb);
 	/* Is the slot empty? */
 	if (slot->qlen == 0) {
@@ -559,6 +559,7 @@ sfq_reset(struct Qdisc *sch)
  */
 static void sfq_rehash(struct Qdisc *sch)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
 	int i;
@@ -591,7 +592,7 @@ static void sfq_rehash(struct Qdisc *sch)
 		if (x == SFQ_EMPTY_SLOT) {
 			x = q->dep[0].next; /* get a free slot */
 			if (x >= SFQ_MAX_FLOWS) {
-drop:				sch->qstats.backlog -= qdisc_pkt_len(skb);
+drop:				qstats->backlog -= qdisc_pkt_len(skb);
 				kfree_skb(skb);
 				dropped++;
 				continue;
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 18ff634..edbd14e 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -175,7 +175,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch)
 		ret = qdisc_enqueue(segs, q->qdisc);
 		if (ret != NET_XMIT_SUCCESS) {
 			if (net_xmit_drop_count(ret))
-				sch->qstats.drops++;
+				sch->qstats_qdisc.qstats.drops++;
 		} else {
 			nb++;
 		}
@@ -201,7 +201,7 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	ret = qdisc_enqueue(skb, q->qdisc);
 	if (ret != NET_XMIT_SUCCESS) {
 		if (net_xmit_drop_count(ret))
-			sch->qstats.drops++;
+			sch->qstats_qdisc.qstats.drops++;
 		return ret;
 	}
 
@@ -216,7 +216,7 @@ static unsigned int tbf_drop(struct Qdisc *sch)
 
 	if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) {
 		sch->q.qlen--;
-		sch->qstats.drops++;
+		sch->qstats_qdisc.qstats.drops++;
 	}
 	return len;
 }
@@ -281,7 +281,7 @@ static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
 		   (cf. CSZ, HPFQ, HFSC)
 		 */
 
-		sch->qstats.overlimits++;
+		sch->qstats_qdisc.qstats.overlimits++;
 	}
 	return NULL;
 }
@@ -448,11 +448,12 @@ static void tbf_destroy(struct Qdisc *sch)
 
 static int tbf_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
+	struct gnet_stats_queue *qstats = &sch->qstats_qdisc.qstats;
 	struct tbf_sched_data *q = qdisc_priv(sch);
 	struct nlattr *nest;
 	struct tc_tbf_qopt opt;
 
-	sch->qstats.backlog = q->qdisc->qstats.backlog;
+	qstats->backlog = q->qdisc->qstats_qdisc.qstats.backlog;
 	nest = nla_nest_start(skb, TCA_OPTIONS);
 	if (nest == NULL)
 		goto nla_put_failure;

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 15/15] net: sched: drop ingress qdisc lock
  2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
                   ` (13 preceding siblings ...)
  2014-04-30 16:40 ` [RFC PATCH 14/15] net: sched: make qstats per cpu John Fastabend
@ 2014-04-30 16:41 ` John Fastabend
  14 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 16:41 UTC (permalink / raw)
  To: xiyou.wangcong, jhs; +Cc: netdev, davem, eric.dumazet

After the previous patches to make the filters RCU safe and
support per-cpu counters, we can drop the qdisc lock around
the ingress qdisc hook.

This is possible because the ingress qdisc is a very basic
qdisc that only updates stats and runs tc_classify. It's the
simplest qdisc we have.

In order for the per-cpu counters to be used, the
ingress qdisc must set the TCQ_F_LLQDISC flag.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/core/dev.c          |    2 --
 net/sched/sch_ingress.c |    6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 5d88a89..a7eb8aa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3444,10 +3444,8 @@ static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
 
 	q = rcu_dereference(rxq->qdisc);
 	if (q != &noop_qdisc) {
-		spin_lock(qdisc_lock(q));
 		if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
 			result = qdisc_enqueue_root(skb, q);
-		spin_unlock(qdisc_lock(q));
 	}
 
 	return result;
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 2423d7b..a81b333 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -88,6 +88,11 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 }
 
 /* ------------------------------------------------------------- */
+static int ingress_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	sch->flags |= TCQ_F_LLQDISC;
+	return 0;
+}
 
 static void ingress_destroy(struct Qdisc *sch)
 {
@@ -122,6 +127,7 @@ static const struct Qdisc_class_ops ingress_class_ops = {
 };
 
 static struct Qdisc_ops ingress_qdisc_ops __read_mostly = {
+	.init		=	ingress_init,
 	.cl_ops		=	&ingress_class_ops,
 	.id		=	"ingress",
 	.priv_size	=	sizeof(struct ingress_qdisc_data),

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings
  2014-04-30 16:35 ` [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings John Fastabend
@ 2014-04-30 17:00   ` Eric Dumazet
  2014-04-30 22:25     ` John Fastabend
  2014-05-14 19:39   ` John Fastabend
  1 sibling, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2014-04-30 17:00 UTC (permalink / raw)
  To: John Fastabend; +Cc: xiyou.wangcong, jhs, netdev, davem

On Wed, 2014-04-30 at 09:35 -0700, John Fastabend wrote:

>  static inline struct Qdisc *qdisc_root(const struct Qdisc *qdisc)
>  {
> -	return qdisc->dev_queue->qdisc;
> +	struct Qdisc *q = rcu_dereference(qdisc->dev_queue->qdisc);
> +
> +	return q;
>  }
>  

Hmm... I am pretty sure this can be called without rcu_read_lock()

rcu_dereference_rtnl() would be more appropriate here I think.
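
An untested sketch of what that would look like, keeping the helper itself
unchanged and only swapping the accessor:

static inline struct Qdisc *qdisc_root(const struct Qdisc *qdisc)
{
	/* control paths hold RTNL, so this satisfies sparse/lockdep
	 * without requiring rcu_read_lock() at every call site
	 */
	struct Qdisc *q = rcu_dereference_rtnl(qdisc->dev_queue->qdisc);

	return q;
}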

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 14/15] net: sched: make qstats per cpu
  2014-04-30 16:40 ` [RFC PATCH 14/15] net: sched: make qstats per cpu John Fastabend
@ 2014-04-30 17:08   ` Cong Wang
  2014-04-30 22:29     ` John Fastabend
  0 siblings, 1 reply; 26+ messages in thread
From: Cong Wang @ 2014-04-30 17:08 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jamal Hadi Salim, Linux Kernel Network Developers, David Miller,
	Eric Dumazet

On Wed, Apr 30, 2014 at 9:40 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 1a55e19..6d67a27 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -92,7 +92,10 @@ struct Qdisc {
>                 struct gnet_stats_basic_cpu __percpu *cpu_bstats;
>         } bstats_qdisc;
>         unsigned int            __state;
> -       struct gnet_stats_queue qstats;
> +       union {
> +               struct gnet_stats_queue qstats;
> +               struct gnet_stats_queue __percpu *cpu_qstats;
> +       } qstats_qdisc;
>         struct rcu_head         rcu_head;
>         int                     padded;
>         atomic_t                refcnt;


So only ingress qdisc uses ->cpu_qstats, the rest still uses
->qstats. Hmm, doesn't this mean ingress qdisc should init
->cpu_qstats in its ->init()?

[...]

> diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
> index 5700c27..2423d7b 100644
> --- a/net/sched/sch_ingress.c
> +++ b/net/sched/sch_ingress.c
> @@ -58,6 +58,7 @@ static struct tcf_proto **ingress_find_tcf(struct Qdisc *sch, unsigned long cl)
>  static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  {
>         struct ingress_qdisc_data *p = qdisc_priv(sch);
> +       struct gnet_stats_queue *qstats;
>         struct tcf_result res;
>         struct tcf_proto *fl = rcu_dereference_bh(p->filter_list);
>         int result;
> @@ -68,7 +69,8 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>         switch (result) {
>         case TC_ACT_SHOT:
>                 result = TC_ACT_SHOT;
> -               sch->qstats.drops++;
> +               qstats = this_cpu_ptr(sch->qstats_qdisc.cpu_qstats);
> +               qstats->drops++;
>                 break;
>         case TC_ACT_STOLEN:
>         case TC_ACT_QUEUED:

I don't see you call alloc_percpu() and free_percpu() in sch_ingress.c.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings
  2014-04-30 17:00   ` Eric Dumazet
@ 2014-04-30 22:25     ` John Fastabend
  2014-04-30 23:29       ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: John Fastabend @ 2014-04-30 22:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: xiyou.wangcong, jhs, netdev, davem

On 04/30/2014 10:00 AM, Eric Dumazet wrote:
> On Wed, 2014-04-30 at 09:35 -0700, John Fastabend wrote:
>
>>   static inline struct Qdisc *qdisc_root(const struct Qdisc *qdisc)
>>   {
>> -	return qdisc->dev_queue->qdisc;
>> +	struct Qdisc *q = rcu_dereference(qdisc->dev_queue->qdisc);
>> +
>> +	return q;
>>   }
>>
>
> Hmm... I am pretty sure this can be called without rcu_read_lock()
>
> rcu_dereference_rtnl() would be more appropriate here I think.
>
>

It looks like it, except for qdisc_watchdog(), which needs an
rcu_read_lock() if I'm not mistaken.

static enum hrtimer_restart qdisc_watchdog(struct hrtimer *timer)
{
	struct qdisc_watchdog *wd = container_of(timer, struct 		
						 qdisc_watchdog,
						 timer);

	rcu_read_lock();
	qdisc_unthrottled(wd->qdisc);
	__netif_schedule(qdisc_root(wd->qdisc));
	rcu_read_unlock();

	return HRTIMER_NORESTART;
}


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 14/15] net: sched: make qstats per cpu
  2014-04-30 17:08   ` Cong Wang
@ 2014-04-30 22:29     ` John Fastabend
  0 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-04-30 22:29 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jamal Hadi Salim, Linux Kernel Network Developers, David Miller,
	Eric Dumazet

On 04/30/2014 10:08 AM, Cong Wang wrote:
> On Wed, Apr 30, 2014 at 9:40 AM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index 1a55e19..6d67a27 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -92,7 +92,10 @@ struct Qdisc {
>>                  struct gnet_stats_basic_cpu __percpu *cpu_bstats;
>>          } bstats_qdisc;
>>          unsigned int            __state;
>> -       struct gnet_stats_queue qstats;
>> +       union {
>> +               struct gnet_stats_queue qstats;
>> +               struct gnet_stats_queue __percpu *cpu_qstats;
>> +       } qstats_qdisc;
>>          struct rcu_head         rcu_head;
>>          int                     padded;
>>          atomic_t                refcnt;
>
>
> So only ingress qdisc uses ->cpu_qstats, the rest still uses
> ->qstats. Hmm, doesn't this mean ingress qdisc should init
> ->cpu_qstats in its ->init()?
>
> [...]
>
>> diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
>> index 5700c27..2423d7b 100644
>> --- a/net/sched/sch_ingress.c
>> +++ b/net/sched/sch_ingress.c
>> @@ -58,6 +58,7 @@ static struct tcf_proto **ingress_find_tcf(struct Qdisc *sch, unsigned long cl)
>>   static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>>   {
>>          struct ingress_qdisc_data *p = qdisc_priv(sch);
>> +       struct gnet_stats_queue *qstats;
>>          struct tcf_result res;
>>          struct tcf_proto *fl = rcu_dereference_bh(p->filter_list);
>>          int result;
>> @@ -68,7 +69,8 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>>          switch (result) {
>>          case TC_ACT_SHOT:
>>                  result = TC_ACT_SHOT;
>> -               sch->qstats.drops++;
>> +               qstats = this_cpu_ptr(sch->qstats_qdisc.cpu_qstats);
>> +               qstats->drops++;
>>                  break;
>>          case TC_ACT_STOLEN:
>>          case TC_ACT_QUEUED:
>
> I don't see you call alloc_percpu() and free_percpu() in sch_ingress.c.
>


Right, I missed this update in the patch set; it's included now, although
it's part of sch_api.c and triggers off the TCQ_F_LLQDISC flag.
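
Roughly like the following (an untested sketch; the error label and the
exact placement of the create/destroy hunks are illustrative, not the
actual patch):

	/* in qdisc_create(), after ->init() has had a chance to set the flag */
	if (sch->flags & TCQ_F_LLQDISC) {
		sch->qstats_qdisc.cpu_qstats =
			alloc_percpu(struct gnet_stats_queue);
		if (!sch->qstats_qdisc.cpu_qstats)
			goto err_out;
	}

	/* and the matching teardown when the qdisc is destroyed */
	if (sch->flags & TCQ_F_LLQDISC)
		free_percpu(sch->qstats_qdisc.cpu_qstats);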

Thanks,
John

-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings
  2014-04-30 22:25     ` John Fastabend
@ 2014-04-30 23:29       ` Eric Dumazet
  2014-05-01 15:20         ` John Fastabend
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2014-04-30 23:29 UTC (permalink / raw)
  To: John Fastabend; +Cc: xiyou.wangcong, jhs, netdev, davem

On Wed, 2014-04-30 at 15:25 -0700, John Fastabend wrote:

> It looks like it except for qdisc_watchdog() which needs an
> rcu_read_lock() if I'm not mistaken/
> 
> static enum hrtimer_restart qdisc_watchdog(struct hrtimer *timer)
> {
> 	struct qdisc_watchdog *wd = container_of(timer, struct 		
> 						 qdisc_watchdog,
> 						 timer);
> 
> 	rcu_read_lock();
> 	qdisc_unthrottled(wd->qdisc);
> 	__netif_schedule(qdisc_root(wd->qdisc));
> 	rcu_read_unlock();
> 
> 	return HRTIMER_NORESTART;
> }
> 

Normally, HRTIMER handlers are run under softirq.

Anyway you could simply use following on your builds

CONFIG_PROVE_RCU=y

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings
  2014-04-30 23:29       ` Eric Dumazet
@ 2014-05-01 15:20         ` John Fastabend
  0 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-05-01 15:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: xiyou.wangcong, jhs, netdev, davem

On 04/30/2014 04:29 PM, Eric Dumazet wrote:
> On Wed, 2014-04-30 at 15:25 -0700, John Fastabend wrote:
>
>> It looks like it except for qdisc_watchdog() which needs an
>> rcu_read_lock() if I'm not mistaken/
>>
>> static enum hrtimer_restart qdisc_watchdog(struct hrtimer *timer)
>> {
>> 	struct qdisc_watchdog *wd = container_of(timer, struct 		
>> 						 qdisc_watchdog,
>> 						 timer);
>>
>> 	rcu_read_lock();
>> 	qdisc_unthrottled(wd->qdisc);
>> 	__netif_schedule(qdisc_root(wd->qdisc));
>> 	rcu_read_unlock();
>>
>> 	return HRTIMER_NORESTART;
>> }
>>
>
> Normally, HRTIMER handlers are run under softirq.
>

Right, so I'll change the qdisc_root() usage to rcu_dereference_rtnl().

> Anyway you could simply use following on your builds
>
> CONFIG_PROVE_RCU=y
>

I've been doing this, although I haven't checked the output on this
series for a bit, and I missed a few places that need to be wrapped
with rtnl_dereference() in the tcf_* code.
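
E.g. the pattern needed in the filter update paths (illustrative, not a
specific hunk from the series): the change/delete handlers run under RTNL
rather than rcu_read_lock(), so the list walk there wants

	struct tcf_proto *fl = rtnl_dereference(p->filter_list);

instead of rcu_dereference().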

Thanks!

>
>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings
  2014-04-30 16:35 ` [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings John Fastabend
  2014-04-30 17:00   ` Eric Dumazet
@ 2014-05-14 19:39   ` John Fastabend
  2014-05-15 20:41     ` Paul E. McKenney
  2014-05-15 20:43     ` David Miller
  1 sibling, 2 replies; 26+ messages in thread
From: John Fastabend @ 2014-05-14 19:39 UTC (permalink / raw)
  To: eric.dumazet, paulmck; +Cc: Cong Wang, jhs, netdev, davem

On 04/30/2014 09:35 AM, John Fastabend wrote:
> Add __rcu notation to qdisc handling by doing this we can make
> smatch output more legible. And anyways some of the cases should
> be using rcu_dereference() see qdisc_all_tx_empty(),
> qdisc_tx_chainging(), and so on.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---

Now I'm trying to resolve the lingering sparse errors/warnings and I
have one that I'm not sure about. Maybe someone has some insight,

net/sched/sch_generic.c:694:9: error: bad constant expression
net/sched/sch_generic.c:694:9: error: cannot size expression
net/sched/sch_generic.c:751:9: error: bad constant expression
net/sched/sch_generic.c:751:9: error: cannot size expression
net/sched/sch_generic.c:800:17: error: bad constant expression
net/sched/sch_generic.c:800:17: error: cannot size expression
net/sched/sch_generic.c:886:9: error: bad constant expression
net/sched/sch_generic.c:886:9: error: cannot size expression
net/sched/sch_generic.c:908:17: error: bad constant expression
net/sched/sch_generic.c:908:17: error: cannot size expression

Here are (what I believe are) the relevant changes to trigger it,

>   include/linux/netdevice.h |   29 ++++-------------------------
>   include/net/sch_generic.h |   29 +++++++++++++++++++++++------
>   net/core/dev.c            |   45 +++++++++++++++++++++++++++++++++++++++++++--
>   net/sched/sch_generic.c   |    4 ++--
>   net/sched/sch_mqprio.c    |    6 ++++--
>   net/sched/sch_teql.c      |    9 +++++----
>   6 files changed, 81 insertions(+), 41 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index a803d79..4826988 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -546,7 +546,7 @@ struct netdev_queue {
>    * read mostly part
>    */
>   	struct net_device	*dev;
> -	struct Qdisc		*qdisc;
> +	struct Qdisc __rcu	*qdisc;

Add __rcu here,

>   	struct Qdisc		*qdisc_sleeping;
>   #ifdef CONFIG_SYSFS
>   	struct kobject		kobj;

[...]

> +++ b/net/sched/sch_generic.c

[...]

> @@ -871,7 +871,7 @@ static void dev_init_scheduler_queue(struct net_device *dev,
>   {
>   	struct Qdisc *qdisc = _qdisc;
>
> -	dev_queue->qdisc = qdisc;
> +	rcu_assign_pointer(dev_queue->qdisc, qdisc);
>   	dev_queue->qdisc_sleeping = qdisc;
>   }

Then the rcu_assign_pointer() operations throw the "bad constant expression"
and "cannot size expression" errors. The line numbers are a bit off from
the errors above due to some other improvements/fixes, but this is line
886 from the sparse output above.

I've done similar types of transformations before without sparse errors
so I thought I might see what changed. After doing a revert of

"rcu: Define rcu_assign_pointer() in terms of smp_store_release()"
commit 88c1863066ccfa456797e12c5d8b4631aa1ad0d0

Sparse no longer throws an error. Here is the diff of that commit for
reference

> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 00a7fd6..4550f22 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -44,7 +44,6 @@
>  #include <linux/debugobjects.h>
>  #include <linux/bug.h>
>  #include <linux/compiler.h>
> -#include <asm/barrier.h>
>
>  extern int rcu_expedited; /* for sysctl */
>  #ifdef CONFIG_RCU_TORTURE_TEST
> @@ -582,7 +581,12 @@ static inline void rcu_preempt_sleep_check(void)
>   * please be careful when making changes to rcu_assign_pointer() and the
>   * other macros that it invokes.
>   */
> -#define rcu_assign_pointer(p, v) smp_store_release(&p, RCU_INITIALIZER(v))
> +#define rcu_assign_pointer(p, v) \
> +       do { \
> +               smp_wmb(); \
> +               ACCESS_ONCE(p) = RCU_INITIALIZER(v); \
> +       } while (0)
> +
>
>  /**
>   * rcu_access_pointer() - fetch RCU pointer with no dereferencing

At the moment I'm not seeing what changed that would cause the error.
Any ideas what I got wrong here?

Thanks!
John

-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings
  2014-05-14 19:39   ` John Fastabend
@ 2014-05-15 20:41     ` Paul E. McKenney
  2014-05-15 21:11       ` John Fastabend
  2014-05-15 20:43     ` David Miller
  1 sibling, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2014-05-15 20:41 UTC (permalink / raw)
  To: John Fastabend; +Cc: eric.dumazet, Cong Wang, jhs, netdev, davem

On Wed, May 14, 2014 at 12:39:12PM -0700, John Fastabend wrote:
> On 04/30/2014 09:35 AM, John Fastabend wrote:
> >Add __rcu notation to qdisc handling by doing this we can make
> >smatch output more legible. And anyways some of the cases should
> >be using rcu_dereference() see qdisc_all_tx_empty(),
> >qdisc_tx_chainging(), and so on.
> >
> >Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >---
> 
> Now I'm trying to resolve the lingering sparse errors/warnings and I
> have one that I'm not sure about. Maybe someone has some insight,
> 
> net/sched/sch_generic.c:694:9: error: bad constant expression
> net/sched/sch_generic.c:694:9: error: cannot size expression
> net/sched/sch_generic.c:751:9: error: bad constant expression
> net/sched/sch_generic.c:751:9: error: cannot size expression
> net/sched/sch_generic.c:800:17: error: bad constant expression
> net/sched/sch_generic.c:800:17: error: cannot size expression
> net/sched/sch_generic.c:886:9: error: bad constant expression
> net/sched/sch_generic.c:886:9: error: cannot size expression
> net/sched/sch_generic.c:908:17: error: bad constant expression
> net/sched/sch_generic.c:908:17: error: cannot size expression

There is some compiletime_assert_atomic_type() bustage that causes
errors like this.  There should be a fix on its way in.  Try making
compiletime_assert_atomic_type() be an empty macro, and if that works,
help is on the way.
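
E.g. as a throwaway local hack in include/linux/compiler.h, just to confirm
the diagnosis (not something to merge):

	/* temporary: stub out the assert so sparse stops tripping over
	 * smp_store_release() inside rcu_assign_pointer()
	 */
	#define compiletime_assert_atomic_type(t) do { } while (0)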

							Thanx, Paul

> Here are (what I believe are) the relevant changes to trigger it,
> 
> >  include/linux/netdevice.h |   29 ++++-------------------------
> >  include/net/sch_generic.h |   29 +++++++++++++++++++++++------
> >  net/core/dev.c            |   45 +++++++++++++++++++++++++++++++++++++++++++--
> >  net/sched/sch_generic.c   |    4 ++--
> >  net/sched/sch_mqprio.c    |    6 ++++--
> >  net/sched/sch_teql.c      |    9 +++++----
> >  6 files changed, 81 insertions(+), 41 deletions(-)
> >
> >diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >index a803d79..4826988 100644
> >--- a/include/linux/netdevice.h
> >+++ b/include/linux/netdevice.h
> >@@ -546,7 +546,7 @@ struct netdev_queue {
> >   * read mostly part
> >   */
> >  	struct net_device	*dev;
> >-	struct Qdisc		*qdisc;
> >+	struct Qdisc __rcu	*qdisc;
> 
> Add __rcu here,
> 
> >  	struct Qdisc		*qdisc_sleeping;
> >  #ifdef CONFIG_SYSFS
> >  	struct kobject		kobj;
> 
> [...]
> 
> >+++ b/net/sched/sch_generic.c
> 
> [...]
> 
> >@@ -871,7 +871,7 @@ static void dev_init_scheduler_queue(struct net_device *dev,
> >  {
> >  	struct Qdisc *qdisc = _qdisc;
> >
> >-	dev_queue->qdisc = qdisc;
> >+	rcu_assign_pointer(dev_queue->qdisc, qdisc);
> >  	dev_queue->qdisc_sleeping = qdisc;
> >  }
> 
> Then rcu_assign_pointer() operations throw the "bad constant expression"
> and "cannot size expression" errors. The line numbers are a bit off from
> the errors above due to some other improvements/fixes but this is line
> 886 from above sparse output.
> 
> I've done similar types of transformations before without sparse errors
> so I thought I might see what changed. After doing a revert of
> 
> "rcu: Define rcu_assign_pointer() in terms of smp_store_release()"
> commit 88c1863066ccfa456797e12c5d8b4631aa1ad0d0
> 
> Sparse no longer throws an error. Here is the diff of that commit for
> reference
> 
> >diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> >index 00a7fd6..4550f22 100644
> >--- a/include/linux/rcupdate.h
> >+++ b/include/linux/rcupdate.h
> >@@ -44,7 +44,6 @@
> > #include <linux/debugobjects.h>
> > #include <linux/bug.h>
> > #include <linux/compiler.h>
> >-#include <asm/barrier.h>
> >
> > extern int rcu_expedited; /* for sysctl */
> > #ifdef CONFIG_RCU_TORTURE_TEST
> >@@ -582,7 +581,12 @@ static inline void rcu_preempt_sleep_check(void)
> >  * please be careful when making changes to rcu_assign_pointer() and the
> >  * other macros that it invokes.
> >  */
> >-#define rcu_assign_pointer(p, v) smp_store_release(&p, RCU_INITIALIZER(v))
> >+#define rcu_assign_pointer(p, v) \
> >+       do { \
> >+               smp_wmb(); \
> >+               ACCESS_ONCE(p) = RCU_INITIALIZER(v); \
> >+       } while (0)
> >+
> >
> > /**
> >  * rcu_access_pointer() - fetch RCU pointer with no dereferencing
> 
> At the moment I'm not seeing what changed that would cause the error.
> Any ideas what I got wrong here?
> 
> Thanks!
> John
> 
> -- 
> John Fastabend         Intel Corporation
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings
  2014-05-14 19:39   ` John Fastabend
  2014-05-15 20:41     ` Paul E. McKenney
@ 2014-05-15 20:43     ` David Miller
  1 sibling, 0 replies; 26+ messages in thread
From: David Miller @ 2014-05-15 20:43 UTC (permalink / raw)
  To: john.fastabend; +Cc: eric.dumazet, paulmck, xiyou.wangcong, jhs, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Wed, 14 May 2014 12:39:12 -0700

> At the moment I'm not seeing what changed that would cause the error.
> Any ideas what I got wrong here?

John I took a quick look at this and I'm as mystified as you are :-/

Maybe Paul can provide some insight.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings
  2014-05-15 20:41     ` Paul E. McKenney
@ 2014-05-15 21:11       ` John Fastabend
  0 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2014-05-15 21:11 UTC (permalink / raw)
  To: paulmck; +Cc: eric.dumazet, Cong Wang, jhs, netdev, davem

On 05/15/2014 01:41 PM, Paul E. McKenney wrote:
> On Wed, May 14, 2014 at 12:39:12PM -0700, John Fastabend wrote:
>> On 04/30/2014 09:35 AM, John Fastabend wrote:
>>> Add __rcu notation to qdisc handling by doing this we can make
>>> smatch output more legible. And anyways some of the cases should
>>> be using rcu_dereference() see qdisc_all_tx_empty(),
>>> qdisc_tx_chainging(), and so on.
>>>
>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>> ---
>>
>> Now I'm trying to resolve the lingering sparse errors/warnings and I
>> have one that I'm not sure about. Maybe someone has some insight,
>>
>> net/sched/sch_generic.c:694:9: error: bad constant expression
>> net/sched/sch_generic.c:694:9: error: cannot size expression
>> net/sched/sch_generic.c:751:9: error: bad constant expression
>> net/sched/sch_generic.c:751:9: error: cannot size expression
>> net/sched/sch_generic.c:800:17: error: bad constant expression
>> net/sched/sch_generic.c:800:17: error: cannot size expression
>> net/sched/sch_generic.c:886:9: error: bad constant expression
>> net/sched/sch_generic.c:886:9: error: cannot size expression
>> net/sched/sch_generic.c:908:17: error: bad constant expression
>> net/sched/sch_generic.c:908:17: error: cannot size expression
>
> There is some compiletime_assert_atomic_type() bustage that causes
> errors like this.  There should be a fix on its way in.  Try making
> compiletime_assert_atomic_type() be an empty macro, and if that works,
> help is on the way.
>
> 							Thanx, Paul

OK the empty macro fixed the sparse errors I have. I'll pull in the
fix when I see it.

Thanks,
John


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2014-05-15 21:11 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-30 16:34 [RFC PATCH 00/15] remove qdisc lock from ingress_qdisc John Fastabend
2014-04-30 16:35 ` [RFC PATCH 01/15] net: qdisc: use rcu prefix and silence sparse warnings John Fastabend
2014-04-30 17:00   ` Eric Dumazet
2014-04-30 22:25     ` John Fastabend
2014-04-30 23:29       ` Eric Dumazet
2014-05-01 15:20         ` John Fastabend
2014-05-14 19:39   ` John Fastabend
2014-05-15 20:41     ` Paul E. McKenney
2014-05-15 21:11       ` John Fastabend
2014-05-15 20:43     ` David Miller
2014-04-30 16:35 ` [RFC PATCH 02/15] net: rcu-ify tcf_proto John Fastabend
2014-04-30 16:36 ` [RFC PATCH 03/15] net: sched: cls_basic use RCU John Fastabend
2014-04-30 16:36 ` [RFC PATCH 04/15] net: sched: cls_cgroup " John Fastabend
2014-04-30 16:36 ` [RFC PATCH 05/15] net: sched: cls_flow " John Fastabend
2014-04-30 16:37 ` [RFC PATCH 06/15] net: sched: fw " John Fastabend
2014-04-30 16:37 ` [RFC PATCH 07/15] net: sched: RCU cls_route John Fastabend
2014-04-30 16:38 ` [RFC PATCH 08/15] net: sched: RCU cls_tcindex John Fastabend
2014-04-30 16:38 ` [RFC PATCH 09/15] net: sched: make cls_u32 lockless John Fastabend
2014-04-30 16:39 ` [RFC PATCH 10/15] net: sched: rcu'ify cls_rsvp John Fastabend
2014-04-30 16:39 ` [RFC PATCH 11/15] net: make cls_bpf rcu safe John Fastabend
2014-04-30 16:39 ` [RFC PATCH 12/15] net: sched: make tc_action safe to walk under RCU John Fastabend
2014-04-30 16:40 ` [RFC PATCH 13/15] net: sched: make bstats per cpu and estimator RCU safe John Fastabend
2014-04-30 16:40 ` [RFC PATCH 14/15] net: sched: make qstats per cpu John Fastabend
2014-04-30 17:08   ` Cong Wang
2014-04-30 22:29     ` John Fastabend
2014-04-30 16:41 ` [RFC PATCH 15/15] net: sched: drop ingress qdisc lock John Fastabend
