All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/9] Try to simplify the gnet_stats and remove qdisc->running sequence counter.
@ 2021-10-16  8:49 Sebastian Andrzej Siewior
  2021-10-16  8:49 ` [PATCH net-next 1/9] gen_stats: Add instead Set the value in __gnet_stats_copy_basic() Sebastian Andrzej Siewior
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-16  8:49 UTC (permalink / raw)
  To: netdev, netfilter-devel
  Cc: Jakub Kicinski, David S. Miller, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Ahmed S. Darwish, Eric Dumazet, Thomas Gleixner

The first few patches is a follow up to
    https://lore.kernel.org/all/20211007175000.2334713-1-bigeasy@linutronix.de/

The remaining patches (#5+) remove the seqcount_t (Qdisc::running) from
the Qdisc. The statistics (Qdisc::bstats and Qdisc::cpu_bstats) use
u64_stats_t and the "running state" is now represented by a bit in
Qdisc::state.

By removing the seqcount_t from Qdisc and decoupling the bstats
statistics from the seqcount_t it is possible to query the statistics
even if the Qdisc is running instead of waiting until it is idle again.

The try-lock like usage of the seqcount_t in qdisc_run_begin() is
problematic on PREEMPT_RT. Inside the qdisc_run_begin/end() qdisc->running
sequence counter write sections, at sch_direct_xmit(), the seqcount write
serialization lock is released then re-acquired. This is fine for !RT, because
the writer is in a BH disabled region and there is a no in-IRQ reader. For RT
though, BH sections are preemptible. The earlier introduced seqcount_LOCKNAME_t
mechanism, which for RT the reader acquires then relesaes the write
serailization lock to avoid infinite spinning if it preempts a seqcount write
section, cannot work: the qdisc->running write serialization lock is already
intermittingly released inside the seqcount write section.

Sebastian



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH net-next 1/9] gen_stats: Add instead Set the value in __gnet_stats_copy_basic().
  2021-10-16  8:49 [PATCH net-next 0/9] Try to simplify the gnet_stats and remove qdisc->running sequence counter Sebastian Andrzej Siewior
@ 2021-10-16  8:49 ` Sebastian Andrzej Siewior
  2021-10-16  8:49 ` [PATCH net-next 2/9] gen_stats: Add gnet_stats_add_queue() Sebastian Andrzej Siewior
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-16  8:49 UTC (permalink / raw)
  To: netdev, netfilter-devel
  Cc: Jakub Kicinski, David S. Miller, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Ahmed S. Darwish, Eric Dumazet, Thomas Gleixner,
	Sebastian Andrzej Siewior

__gnet_stats_copy_basic() always assigns the value to the bstats
argument overwriting the previous value. The later added per-CPU version
always accumulated the values in the returning gnet_stats_basic_packed
argument.

Based on review there are five users of that function as of today:
- est_fetch_counters(), ___gnet_stats_copy_basic()
  memsets() bstats to zero, single invocation.

- mq_dump(), mqprio_dump(), mqprio_dump_class_stats()
  memsets() bstats to zero, multiple invocation but does not use the
  function due to !qdisc_is_percpu_stats().

Add the values in __gnet_stats_copy_basic() instead overwriting. Rename
the function to gnet_stats_add_basic() to make it more obvious.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/net/gen_stats.h  |  8 ++++----
 net/core/gen_estimator.c |  2 +-
 net/core/gen_stats.c     | 29 ++++++++++++++++-------------
 net/sched/sch_mq.c       |  5 ++---
 net/sched/sch_mqprio.c   | 11 +++++------
 5 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 1424e02cef90c..25740d004bdb0 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -46,10 +46,10 @@ int gnet_stats_copy_basic(const seqcount_t *running,
 			  struct gnet_dump *d,
 			  struct gnet_stats_basic_cpu __percpu *cpu,
 			  struct gnet_stats_basic_packed *b);
-void __gnet_stats_copy_basic(const seqcount_t *running,
-			     struct gnet_stats_basic_packed *bstats,
-			     struct gnet_stats_basic_cpu __percpu *cpu,
-			     struct gnet_stats_basic_packed *b);
+void gnet_stats_add_basic(const seqcount_t *running,
+			  struct gnet_stats_basic_packed *bstats,
+			  struct gnet_stats_basic_cpu __percpu *cpu,
+			  struct gnet_stats_basic_packed *b);
 int gnet_stats_copy_basic_hw(const seqcount_t *running,
 			     struct gnet_dump *d,
 			     struct gnet_stats_basic_cpu __percpu *cpu,
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 8e582e29a41e3..205df8b5116e5 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -66,7 +66,7 @@ static void est_fetch_counters(struct net_rate_estimator *e,
 	if (e->stats_lock)
 		spin_lock(e->stats_lock);
 
-	__gnet_stats_copy_basic(e->running, b, e->cpu_bstats, e->bstats);
+	gnet_stats_add_basic(e->running, b, e->cpu_bstats, e->bstats);
 
 	if (e->stats_lock)
 		spin_unlock(e->stats_lock);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index e491b083b3485..25d7c0989b83f 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -114,9 +114,8 @@ gnet_stats_start_copy(struct sk_buff *skb, int type, spinlock_t *lock,
 }
 EXPORT_SYMBOL(gnet_stats_start_copy);
 
-static void
-__gnet_stats_copy_basic_cpu(struct gnet_stats_basic_packed *bstats,
-			    struct gnet_stats_basic_cpu __percpu *cpu)
+static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_packed *bstats,
+				     struct gnet_stats_basic_cpu __percpu *cpu)
 {
 	int i;
 
@@ -136,26 +135,30 @@ __gnet_stats_copy_basic_cpu(struct gnet_stats_basic_packed *bstats,
 	}
 }
 
-void
-__gnet_stats_copy_basic(const seqcount_t *running,
-			struct gnet_stats_basic_packed *bstats,
-			struct gnet_stats_basic_cpu __percpu *cpu,
-			struct gnet_stats_basic_packed *b)
+void gnet_stats_add_basic(const seqcount_t *running,
+			  struct gnet_stats_basic_packed *bstats,
+			  struct gnet_stats_basic_cpu __percpu *cpu,
+			  struct gnet_stats_basic_packed *b)
 {
 	unsigned int seq;
+	u64 bytes = 0;
+	u64 packets = 0;
 
 	if (cpu) {
-		__gnet_stats_copy_basic_cpu(bstats, cpu);
+		gnet_stats_add_basic_cpu(bstats, cpu);
 		return;
 	}
 	do {
 		if (running)
 			seq = read_seqcount_begin(running);
-		bstats->bytes = b->bytes;
-		bstats->packets = b->packets;
+		bytes = b->bytes;
+		packets = b->packets;
 	} while (running && read_seqcount_retry(running, seq));
+
+	bstats->bytes += bytes;
+	bstats->packets += packets;
 }
-EXPORT_SYMBOL(__gnet_stats_copy_basic);
+EXPORT_SYMBOL(gnet_stats_add_basic);
 
 static int
 ___gnet_stats_copy_basic(const seqcount_t *running,
@@ -166,7 +169,7 @@ ___gnet_stats_copy_basic(const seqcount_t *running,
 {
 	struct gnet_stats_basic_packed bstats = {0};
 
-	__gnet_stats_copy_basic(running, &bstats, cpu, b);
+	gnet_stats_add_basic(running, &bstats, cpu, b);
 
 	if (d->compat_tc_stats && type == TCA_STATS_BASIC) {
 		d->tc_stats.bytes = bstats.bytes;
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index e04f1a87642b9..1edd98a50e33d 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -147,9 +147,8 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 
 		if (qdisc_is_percpu_stats(qdisc)) {
 			qlen = qdisc_qlen_sum(qdisc);
-			__gnet_stats_copy_basic(NULL, &sch->bstats,
-						qdisc->cpu_bstats,
-						&qdisc->bstats);
+			gnet_stats_add_basic(NULL, &sch->bstats,
+					     qdisc->cpu_bstats, &qdisc->bstats);
 			__gnet_stats_copy_queue(&sch->qstats,
 						qdisc->cpu_qstats,
 						&qdisc->qstats, qlen);
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index e1904e62425e5..4bae601e15e1e 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -405,9 +405,8 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 		if (qdisc_is_percpu_stats(qdisc)) {
 			__u32 qlen = qdisc_qlen_sum(qdisc);
 
-			__gnet_stats_copy_basic(NULL, &sch->bstats,
-						qdisc->cpu_bstats,
-						&qdisc->bstats);
+			gnet_stats_add_basic(NULL, &sch->bstats,
+					     qdisc->cpu_bstats, &qdisc->bstats);
 			__gnet_stats_copy_queue(&sch->qstats,
 						qdisc->cpu_qstats,
 						&qdisc->qstats, qlen);
@@ -535,9 +534,9 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 			if (qdisc_is_percpu_stats(qdisc)) {
 				qlen = qdisc_qlen_sum(qdisc);
 
-				__gnet_stats_copy_basic(NULL, &bstats,
-							qdisc->cpu_bstats,
-							&qdisc->bstats);
+				gnet_stats_add_basic(NULL, &bstats,
+						     qdisc->cpu_bstats,
+						     &qdisc->bstats);
 				__gnet_stats_copy_queue(&qstats,
 							qdisc->cpu_qstats,
 							&qdisc->qstats,
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 2/9] gen_stats: Add gnet_stats_add_queue().
  2021-10-16  8:49 [PATCH net-next 0/9] Try to simplify the gnet_stats and remove qdisc->running sequence counter Sebastian Andrzej Siewior
  2021-10-16  8:49 ` [PATCH net-next 1/9] gen_stats: Add instead Set the value in __gnet_stats_copy_basic() Sebastian Andrzej Siewior
@ 2021-10-16  8:49 ` Sebastian Andrzej Siewior
  2021-10-16  8:49 ` [PATCH net-next 3/9] mq, mqprio: Use gnet_stats_add_queue() Sebastian Andrzej Siewior
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-16  8:49 UTC (permalink / raw)
  To: netdev, netfilter-devel
  Cc: Jakub Kicinski, David S. Miller, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Ahmed S. Darwish, Eric Dumazet, Thomas Gleixner,
	Sebastian Andrzej Siewior

This function will replace __gnet_stats_copy_queue(). It reads all
arguments and adds them into the passed gnet_stats_queue argument.
In contrast to __gnet_stats_copy_queue() it also copies the qlen member.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/net/gen_stats.h |  3 +++
 net/core/gen_stats.c    | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 25740d004bdb0..148f0ba85f25a 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -62,6 +62,9 @@ int gnet_stats_copy_queue(struct gnet_dump *d,
 void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
 			     const struct gnet_stats_queue __percpu *cpu_q,
 			     const struct gnet_stats_queue *q, __u32 qlen);
+void gnet_stats_add_queue(struct gnet_stats_queue *qstats,
+			  const struct gnet_stats_queue __percpu *cpu_q,
+			  const struct gnet_stats_queue *q);
 int gnet_stats_copy_app(struct gnet_dump *d, void *st, int len);
 
 int gnet_stats_finish_copy(struct gnet_dump *d);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 25d7c0989b83f..26c020a7ead49 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -321,6 +321,38 @@ void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
 }
 EXPORT_SYMBOL(__gnet_stats_copy_queue);
 
+static void gnet_stats_add_queue_cpu(struct gnet_stats_queue *qstats,
+				     const struct gnet_stats_queue __percpu *q)
+{
+	int i;
+
+	for_each_possible_cpu(i) {
+		const struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);
+
+		qstats->qlen += qcpu->backlog;
+		qstats->backlog += qcpu->backlog;
+		qstats->drops += qcpu->drops;
+		qstats->requeues += qcpu->requeues;
+		qstats->overlimits += qcpu->overlimits;
+	}
+}
+
+void gnet_stats_add_queue(struct gnet_stats_queue *qstats,
+			  const struct gnet_stats_queue __percpu *cpu,
+			  const struct gnet_stats_queue *q)
+{
+	if (cpu) {
+		gnet_stats_add_queue_cpu(qstats, cpu);
+	} else {
+		qstats->qlen += q->qlen;
+		qstats->backlog += q->backlog;
+		qstats->drops += q->drops;
+		qstats->requeues += q->requeues;
+		qstats->overlimits += q->overlimits;
+	}
+}
+EXPORT_SYMBOL(gnet_stats_add_queue);
+
 /**
  * gnet_stats_copy_queue - copy queue statistics into statistics TLV
  * @d: dumping handle
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 3/9] mq, mqprio: Use gnet_stats_add_queue().
  2021-10-16  8:49 [PATCH net-next 0/9] Try to simplify the gnet_stats and remove qdisc->running sequence counter Sebastian Andrzej Siewior
  2021-10-16  8:49 ` [PATCH net-next 1/9] gen_stats: Add instead Set the value in __gnet_stats_copy_basic() Sebastian Andrzej Siewior
  2021-10-16  8:49 ` [PATCH net-next 2/9] gen_stats: Add gnet_stats_add_queue() Sebastian Andrzej Siewior
@ 2021-10-16  8:49 ` Sebastian Andrzej Siewior
  2021-10-16  8:49 ` [PATCH net-next 4/9] gen_stats: Move remaining users to gnet_stats_add_queue() Sebastian Andrzej Siewior
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-16  8:49 UTC (permalink / raw)
  To: netdev, netfilter-devel
  Cc: Jakub Kicinski, David S. Miller, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Ahmed S. Darwish, Eric Dumazet, Thomas Gleixner,
	Sebastian Andrzej Siewior

gnet_stats_add_basic() and gnet_stats_add_queue() add up the statistics
so they can be used directly for both the per-CPU and global case.

gnet_stats_add_queue() copies either Qdisc's per-CPU
gnet_stats_queue::qlen or the global member. The global
gnet_stats_queue::qlen isn't touched in the per-CPU case so there is no
need to consider it in the global-case.

In the per-CPU case, the sum of global gnet_stats_queue::qlen and
the per-CPU gnet_stats_queue::qlen was assigned to sch->q.qlen and
sch->qstats.qlen. Now both fields are copied individually.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/sched/sch_mq.c     | 24 +++++----------------
 net/sched/sch_mqprio.c | 49 +++++++++++-------------------------------
 2 files changed, 17 insertions(+), 56 deletions(-)

diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 1edd98a50e33d..9d58ecb4e80c6 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -130,7 +130,6 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 	struct net_device *dev = qdisc_dev(sch);
 	struct Qdisc *qdisc;
 	unsigned int ntx;
-	__u32 qlen = 0;
 
 	sch->q.qlen = 0;
 	memset(&sch->bstats, 0, sizeof(sch->bstats));
@@ -145,24 +144,11 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
 		spin_lock_bh(qdisc_lock(qdisc));
 
-		if (qdisc_is_percpu_stats(qdisc)) {
-			qlen = qdisc_qlen_sum(qdisc);
-			gnet_stats_add_basic(NULL, &sch->bstats,
-					     qdisc->cpu_bstats, &qdisc->bstats);
-			__gnet_stats_copy_queue(&sch->qstats,
-						qdisc->cpu_qstats,
-						&qdisc->qstats, qlen);
-			sch->q.qlen		+= qlen;
-		} else {
-			sch->q.qlen		+= qdisc->q.qlen;
-			sch->bstats.bytes	+= qdisc->bstats.bytes;
-			sch->bstats.packets	+= qdisc->bstats.packets;
-			sch->qstats.qlen	+= qdisc->qstats.qlen;
-			sch->qstats.backlog	+= qdisc->qstats.backlog;
-			sch->qstats.drops	+= qdisc->qstats.drops;
-			sch->qstats.requeues	+= qdisc->qstats.requeues;
-			sch->qstats.overlimits	+= qdisc->qstats.overlimits;
-		}
+		gnet_stats_add_basic(NULL, &sch->bstats, qdisc->cpu_bstats,
+				     &qdisc->bstats);
+		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
+				     &qdisc->qstats);
+		sch->q.qlen += qdisc_qlen(qdisc);
 
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 4bae601e15e1e..57427b40f0d2e 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -402,24 +402,11 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
 		spin_lock_bh(qdisc_lock(qdisc));
 
-		if (qdisc_is_percpu_stats(qdisc)) {
-			__u32 qlen = qdisc_qlen_sum(qdisc);
-
-			gnet_stats_add_basic(NULL, &sch->bstats,
-					     qdisc->cpu_bstats, &qdisc->bstats);
-			__gnet_stats_copy_queue(&sch->qstats,
-						qdisc->cpu_qstats,
-						&qdisc->qstats, qlen);
-			sch->q.qlen		+= qlen;
-		} else {
-			sch->q.qlen		+= qdisc->q.qlen;
-			sch->bstats.bytes	+= qdisc->bstats.bytes;
-			sch->bstats.packets	+= qdisc->bstats.packets;
-			sch->qstats.backlog	+= qdisc->qstats.backlog;
-			sch->qstats.drops	+= qdisc->qstats.drops;
-			sch->qstats.requeues	+= qdisc->qstats.requeues;
-			sch->qstats.overlimits	+= qdisc->qstats.overlimits;
-		}
+		gnet_stats_add_basic(NULL, &sch->bstats, qdisc->cpu_bstats,
+				     &qdisc->bstats);
+		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
+				     &qdisc->qstats);
+		sch->q.qlen += qdisc_qlen(qdisc);
 
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
@@ -511,7 +498,7 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 {
 	if (cl >= TC_H_MIN_PRIORITY) {
 		int i;
-		__u32 qlen = 0;
+		__u32 qlen;
 		struct gnet_stats_queue qstats = {0};
 		struct gnet_stats_basic_packed bstats = {0};
 		struct net_device *dev = qdisc_dev(sch);
@@ -531,27 +518,15 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 			spin_lock_bh(qdisc_lock(qdisc));
 
-			if (qdisc_is_percpu_stats(qdisc)) {
-				qlen = qdisc_qlen_sum(qdisc);
+			gnet_stats_add_basic(NULL, &bstats, qdisc->cpu_bstats,
+					     &qdisc->bstats);
+			gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
+					     &qdisc->qstats);
+			sch->q.qlen += qdisc_qlen(qdisc);
 
-				gnet_stats_add_basic(NULL, &bstats,
-						     qdisc->cpu_bstats,
-						     &qdisc->bstats);
-				__gnet_stats_copy_queue(&qstats,
-							qdisc->cpu_qstats,
-							&qdisc->qstats,
-							qlen);
-			} else {
-				qlen		+= qdisc->q.qlen;
-				bstats.bytes	+= qdisc->bstats.bytes;
-				bstats.packets	+= qdisc->bstats.packets;
-				qstats.backlog	+= qdisc->qstats.backlog;
-				qstats.drops	+= qdisc->qstats.drops;
-				qstats.requeues	+= qdisc->qstats.requeues;
-				qstats.overlimits += qdisc->qstats.overlimits;
-			}
 			spin_unlock_bh(qdisc_lock(qdisc));
 		}
+		qlen = qdisc_qlen(sch) + qstats.qlen;
 
 		/* Reclaim root sleeping lock before completing stats */
 		if (d->lock)
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 4/9] gen_stats: Move remaining users to gnet_stats_add_queue().
  2021-10-16  8:49 [PATCH net-next 0/9] Try to simplify the gnet_stats and remove qdisc->running sequence counter Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2021-10-16  8:49 ` [PATCH net-next 3/9] mq, mqprio: Use gnet_stats_add_queue() Sebastian Andrzej Siewior
@ 2021-10-16  8:49 ` Sebastian Andrzej Siewior
  2021-10-16  8:49 ` [PATCH net-next 5/9] u64_stats: Introduce u64_stats_set() Sebastian Andrzej Siewior
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-16  8:49 UTC (permalink / raw)
  To: netdev, netfilter-devel
  Cc: Jakub Kicinski, David S. Miller, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Ahmed S. Darwish, Eric Dumazet, Thomas Gleixner,
	Sebastian Andrzej Siewior

The gnet_stats_queue::qlen member is only used in the SMP-case.

qdisc_qstats_qlen_backlog() needs to add qdisc_qlen() to qstats.qlen to
have the same value as that provided by qdisc_qlen_sum().

gnet_stats_copy_queue() needs to overwritte the resulting qstats.qlen
field whith the caller submitted qlen value. It might be differ from the
submitted value.

Let both functions use gnet_stats_add_queue() and remove unused
__gnet_stats_copy_queue().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/net/gen_stats.h   |  3 ---
 include/net/sch_generic.h |  5 ++---
 net/core/gen_stats.c      | 39 ++-------------------------------------
 3 files changed, 4 insertions(+), 43 deletions(-)

diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 148f0ba85f25a..d47155f5db5d7 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -59,9 +59,6 @@ int gnet_stats_copy_rate_est(struct gnet_dump *d,
 int gnet_stats_copy_queue(struct gnet_dump *d,
 			  struct gnet_stats_queue __percpu *cpu_q,
 			  struct gnet_stats_queue *q, __u32 qlen);
-void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
-			     const struct gnet_stats_queue __percpu *cpu_q,
-			     const struct gnet_stats_queue *q, __u32 qlen);
 void gnet_stats_add_queue(struct gnet_stats_queue *qstats,
 			  const struct gnet_stats_queue __percpu *cpu_q,
 			  const struct gnet_stats_queue *q);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 5a011f8d394ea..7bc2d30b5c067 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -972,10 +972,9 @@ static inline void qdisc_qstats_qlen_backlog(struct Qdisc *sch,  __u32 *qlen,
 					     __u32 *backlog)
 {
 	struct gnet_stats_queue qstats = { 0 };
-	__u32 len = qdisc_qlen_sum(sch);
 
-	__gnet_stats_copy_queue(&qstats, sch->cpu_qstats, &sch->qstats, len);
-	*qlen = qstats.qlen;
+	gnet_stats_add_queue(&qstats, sch->cpu_qstats, &sch->qstats);
+	*qlen = qstats.qlen + qdisc_qlen(sch);
 	*backlog = qstats.backlog;
 }
 
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 26c020a7ead49..6ec11289140b6 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -285,42 +285,6 @@ gnet_stats_copy_rate_est(struct gnet_dump *d,
 }
 EXPORT_SYMBOL(gnet_stats_copy_rate_est);
 
-static void
-__gnet_stats_copy_queue_cpu(struct gnet_stats_queue *qstats,
-			    const struct gnet_stats_queue __percpu *q)
-{
-	int i;
-
-	for_each_possible_cpu(i) {
-		const struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);
-
-		qstats->qlen = 0;
-		qstats->backlog += qcpu->backlog;
-		qstats->drops += qcpu->drops;
-		qstats->requeues += qcpu->requeues;
-		qstats->overlimits += qcpu->overlimits;
-	}
-}
-
-void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
-			     const struct gnet_stats_queue __percpu *cpu,
-			     const struct gnet_stats_queue *q,
-			     __u32 qlen)
-{
-	if (cpu) {
-		__gnet_stats_copy_queue_cpu(qstats, cpu);
-	} else {
-		qstats->qlen = q->qlen;
-		qstats->backlog = q->backlog;
-		qstats->drops = q->drops;
-		qstats->requeues = q->requeues;
-		qstats->overlimits = q->overlimits;
-	}
-
-	qstats->qlen = qlen;
-}
-EXPORT_SYMBOL(__gnet_stats_copy_queue);
-
 static void gnet_stats_add_queue_cpu(struct gnet_stats_queue *qstats,
 				     const struct gnet_stats_queue __percpu *q)
 {
@@ -374,7 +338,8 @@ gnet_stats_copy_queue(struct gnet_dump *d,
 {
 	struct gnet_stats_queue qstats = {0};
 
-	__gnet_stats_copy_queue(&qstats, cpu_q, q, qlen);
+	gnet_stats_add_queue(&qstats, cpu_q, q);
+	qstats.qlen = qlen;
 
 	if (d->compat_tc_stats) {
 		d->tc_stats.drops = qstats.drops;
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 5/9] u64_stats: Introduce u64_stats_set()
  2021-10-16  8:49 [PATCH net-next 0/9] Try to simplify the gnet_stats and remove qdisc->running sequence counter Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2021-10-16  8:49 ` [PATCH net-next 4/9] gen_stats: Move remaining users to gnet_stats_add_queue() Sebastian Andrzej Siewior
@ 2021-10-16  8:49 ` Sebastian Andrzej Siewior
  2021-10-16  8:49 ` [PATCH net-next 6/9] net: sched: Protect Qdisc::bstats with u64_stats Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-16  8:49 UTC (permalink / raw)
  To: netdev, netfilter-devel
  Cc: Jakub Kicinski, David S. Miller, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Ahmed S. Darwish, Eric Dumazet, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: "Ahmed S. Darwish" <a.darwish@linutronix.de>

Allow to directly set a u64_stats_t value which is used to provide an init
function which sets it directly to zero intead of memset() the value.

Add u64_stats_set() to the u64_stats API.

[bigeasy: commit message. ]

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/u64_stats_sync.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index e81856c0ba134..e8ec116c916bf 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -83,6 +83,11 @@ static inline u64 u64_stats_read(const u64_stats_t *p)
 	return local64_read(&p->v);
 }
 
+static inline void u64_stats_set(u64_stats_t *p, u64 val)
+{
+	local64_set(&p->v, val);
+}
+
 static inline void u64_stats_add(u64_stats_t *p, unsigned long val)
 {
 	local64_add(val, &p->v);
@@ -104,6 +109,11 @@ static inline u64 u64_stats_read(const u64_stats_t *p)
 	return p->v;
 }
 
+static inline void u64_stats_set(u64_stats_t *p, u64 val)
+{
+	p->v = val;
+}
+
 static inline void u64_stats_add(u64_stats_t *p, unsigned long val)
 {
 	p->v += val;
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 6/9] net: sched: Protect Qdisc::bstats with u64_stats
  2021-10-16  8:49 [PATCH net-next 0/9] Try to simplify the gnet_stats and remove qdisc->running sequence counter Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2021-10-16  8:49 ` [PATCH net-next 5/9] u64_stats: Introduce u64_stats_set() Sebastian Andrzej Siewior
@ 2021-10-16  8:49 ` Sebastian Andrzej Siewior
  2021-10-26 23:47   ` kernel test robot
  2021-10-16  8:49 ` [PATCH net-next 7/9] net: sched: Use _bstats_update/set() instead of raw writes Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-16  8:49 UTC (permalink / raw)
  To: netdev, netfilter-devel
  Cc: Jakub Kicinski, David S. Miller, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Ahmed S. Darwish, Eric Dumazet, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: "Ahmed S. Darwish" <a.darwish@linutronix.de>

The not-per-CPU variant of qdisc tc (traffic control) statistics,
Qdisc::gnet_stats_basic_packed bstats, is protected with Qdisc::running
sequence counter.

This sequence counter is used for reliably protecting bstats reads from
parallel writes. Meanwhile, the seqcount's write section covers a much
wider area than bstats update: qdisc_run_begin() => qdisc_run_end().

That read/write section asymmetry can lead to needless retries of the
read section. To prepare for removing the Qdisc::running sequence
counter altogether, introduce a u64_stats sync point inside bstats
instead.

Modify _bstats_update() to start/end the bstats u64_stats write
section.

For bisectability, and finer commits granularity, the bstats read
section is still protected with a Qdisc::running read/retry loop and
qdisc_run_begin/end() still starts/ends that seqcount write section.
Once all call sites are modified to use _bstats_update(), the
Qdisc::running seqcount will be removed and bstats read/retry loop will
be modified to utilize the internal u64_stats sync point.

Note, using u64_stats implies no sequence counter protection for 64-bit
architectures. This can lead to the statistics "packets" vs. "bytes"
values getting out of sync on rare occasions. The individual values will
still be valid.

[bigeasy: Minor commit message edits, init all gnet_stats_basic_packed.]

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/net/gen_stats.h    |  2 ++
 include/net/sch_generic.h  |  2 ++
 net/core/gen_estimator.c   |  2 +-
 net/core/gen_stats.c       | 14 ++++++++++++--
 net/netfilter/xt_RATEEST.c |  1 +
 net/sched/act_api.c        |  2 ++
 net/sched/sch_atm.c        |  1 +
 net/sched/sch_cbq.c        |  1 +
 net/sched/sch_drr.c        |  1 +
 net/sched/sch_ets.c        |  2 +-
 net/sched/sch_generic.c    |  1 +
 net/sched/sch_gred.c       |  4 +++-
 net/sched/sch_hfsc.c       |  1 +
 net/sched/sch_htb.c        |  7 +++++--
 net/sched/sch_mq.c         |  2 +-
 net/sched/sch_mqprio.c     |  5 +++--
 net/sched/sch_qfq.c        |  1 +
 17 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index d47155f5db5d7..304d792f79776 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -11,6 +11,7 @@
 struct gnet_stats_basic_packed {
 	__u64	bytes;
 	__u64	packets;
+	struct u64_stats_sync syncp;
 };
 
 struct gnet_stats_basic_cpu {
@@ -34,6 +35,7 @@ struct gnet_dump {
 	struct tc_stats   tc_stats;
 };
 
+void gnet_stats_basic_packed_init(struct gnet_stats_basic_packed *b);
 int gnet_stats_start_copy(struct sk_buff *skb, int type, spinlock_t *lock,
 			  struct gnet_dump *d, int padattr);
 
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 7bc2d30b5c067..d7746aea3cecf 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -852,8 +852,10 @@ static inline int qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 static inline void _bstats_update(struct gnet_stats_basic_packed *bstats,
 				  __u64 bytes, __u32 packets)
 {
+	u64_stats_update_begin(&bstats->syncp);
 	bstats->bytes += bytes;
 	bstats->packets += packets;
+	u64_stats_update_end(&bstats->syncp);
 }
 
 static inline void bstats_update(struct gnet_stats_basic_packed *bstats,
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 205df8b5116e5..64978e77368f4 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -62,7 +62,7 @@ struct net_rate_estimator {
 static void est_fetch_counters(struct net_rate_estimator *e,
 			       struct gnet_stats_basic_packed *b)
 {
-	memset(b, 0, sizeof(*b));
+	gnet_stats_basic_packed_init(b);
 	if (e->stats_lock)
 		spin_lock(e->stats_lock);
 
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 6ec11289140b6..f2e12fe7112b1 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -18,7 +18,7 @@
 #include <linux/gen_stats.h>
 #include <net/netlink.h>
 #include <net/gen_stats.h>
-
+#include <net/sch_generic.h>
 
 static inline int
 gnet_stats_copy(struct gnet_dump *d, int type, void *buf, int size, int padattr)
@@ -114,6 +114,15 @@ gnet_stats_start_copy(struct sk_buff *skb, int type, spinlock_t *lock,
 }
 EXPORT_SYMBOL(gnet_stats_start_copy);
 
+/* Must not be inlined, due to u64_stats seqcount_t lockdep key */
+void gnet_stats_basic_packed_init(struct gnet_stats_basic_packed *b)
+{
+	b->bytes = 0;
+	b->packets = 0;
+	u64_stats_init(&b->syncp);
+}
+EXPORT_SYMBOL(gnet_stats_basic_packed_init);
+
 static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_packed *bstats,
 				     struct gnet_stats_basic_cpu __percpu *cpu)
 {
@@ -167,8 +176,9 @@ ___gnet_stats_copy_basic(const seqcount_t *running,
 			 struct gnet_stats_basic_packed *b,
 			 int type)
 {
-	struct gnet_stats_basic_packed bstats = {0};
+	struct gnet_stats_basic_packed bstats;
 
+	gnet_stats_basic_packed_init(&bstats);
 	gnet_stats_add_basic(running, &bstats, cpu, b);
 
 	if (d->compat_tc_stats && type == TCA_STATS_BASIC) {
diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index 0d5c422f87452..d5200725ca62c 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -143,6 +143,7 @@ static int xt_rateest_tg_checkentry(const struct xt_tgchk_param *par)
 	if (!est)
 		goto err1;
 
+	gnet_stats_basic_packed_init(&est->bstats);
 	strlcpy(est->name, info->name, sizeof(est->name));
 	spin_lock_init(&est->lock);
 	est->refcnt		= 1;
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 7dd3a2dc5fa40..0302dad42df14 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -490,6 +490,8 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 		if (!p->cpu_qstats)
 			goto err3;
 	}
+	gnet_stats_basic_packed_init(&p->tcfa_bstats);
+	gnet_stats_basic_packed_init(&p->tcfa_bstats_hw);
 	spin_lock_init(&p->tcfa_lock);
 	p->tcfa_index = index;
 	p->tcfa_tm.install = jiffies;
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index 7d8518176b45a..c8e1771383f9a 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -548,6 +548,7 @@ static int atm_tc_init(struct Qdisc *sch, struct nlattr *opt,
 	pr_debug("atm_tc_init(sch %p,[qdisc %p],opt %p)\n", sch, p, opt);
 	INIT_LIST_HEAD(&p->flows);
 	INIT_LIST_HEAD(&p->link.list);
+	gnet_stats_basic_packed_init(&p->link.bstats);
 	list_add(&p->link.list, &p->flows);
 	p->link.q = qdisc_create_dflt(sch->dev_queue,
 				      &pfifo_qdisc_ops, sch->handle, extack);
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index e0da15530f0e9..d01f6ec315f87 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1611,6 +1611,7 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
 	if (cl == NULL)
 		goto failure;
 
+	gnet_stats_basic_packed_init(&cl->bstats);
 	err = tcf_block_get(&cl->block, &cl->filter_list, sch, extack);
 	if (err) {
 		kfree(cl);
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 642cd179b7a75..319906e19a6ba 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -106,6 +106,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	if (cl == NULL)
 		return -ENOBUFS;
 
+	gnet_stats_basic_packed_init(&cl->bstats);
 	cl->common.classid = classid;
 	cl->quantum	   = quantum;
 	cl->qdisc	   = qdisc_create_dflt(sch->dev_queue,
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index ed86b7021f6d0..83693107371f9 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -689,7 +689,7 @@ static int ets_qdisc_change(struct Qdisc *sch, struct nlattr *opt,
 		q->classes[i].qdisc = NULL;
 		q->classes[i].quantum = 0;
 		q->classes[i].deficit = 0;
-		memset(&q->classes[i].bstats, 0, sizeof(q->classes[i].bstats));
+		gnet_stats_basic_packed_init(&q->classes[i].bstats);
 		memset(&q->classes[i].qstats, 0, sizeof(q->classes[i].qstats));
 	}
 	return 0;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 8c64a552a64fe..ef27ff3ddee4f 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -892,6 +892,7 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 	__skb_queue_head_init(&sch->gso_skb);
 	__skb_queue_head_init(&sch->skb_bad_txq);
 	qdisc_skb_head_init(&sch->q);
+	gnet_stats_basic_packed_init(&sch->bstats);
 	spin_lock_init(&sch->q.lock);
 
 	if (ops->static_flags & TCQ_F_CPUSTATS) {
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 621dc6afde8f3..2ddcbb2efdbbc 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -364,9 +364,11 @@ static int gred_offload_dump_stats(struct Qdisc *sch)
 	hw_stats->handle = sch->handle;
 	hw_stats->parent = sch->parent;
 
-	for (i = 0; i < MAX_DPs; i++)
+	for (i = 0; i < MAX_DPs; i++) {
+		gnet_stats_basic_packed_init(&hw_stats->stats.bstats[i]);
 		if (table->tab[i])
 			hw_stats->stats.xstats[i] = &table->tab[i]->stats;
+	}
 
 	ret = qdisc_offload_dump_helper(sch, TC_SETUP_QDISC_GRED, hw_stats);
 	/* Even if driver returns failure adjust the stats - in case offload
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index b7ac30cca035d..ff6ff54806fcd 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1406,6 +1406,7 @@ hfsc_init_qdisc(struct Qdisc *sch, struct nlattr *opt,
 	if (err)
 		return err;
 
+	gnet_stats_basic_packed_init(&q->root.bstats);
 	q->root.cl_common.classid = sch->handle;
 	q->root.sched   = q;
 	q->root.qdisc = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops,
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 5067a6e5d4fde..2e805b17efcf9 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1311,7 +1311,7 @@ static void htb_offload_aggregate_stats(struct htb_sched *q,
 	struct htb_class *c;
 	unsigned int i;
 
-	memset(&cl->bstats, 0, sizeof(cl->bstats));
+	gnet_stats_basic_packed_init(&cl->bstats);
 
 	for (i = 0; i < q->clhash.hashsize; i++) {
 		hlist_for_each_entry(c, &q->clhash.hash[i], common.hnode) {
@@ -1357,7 +1357,7 @@ htb_dump_class_stats(struct Qdisc *sch, unsigned long arg, struct gnet_dump *d)
 			if (cl->leaf.q)
 				cl->bstats = cl->leaf.q->bstats;
 			else
-				memset(&cl->bstats, 0, sizeof(cl->bstats));
+				gnet_stats_basic_packed_init(&cl->bstats);
 			cl->bstats.bytes += cl->bstats_bias.bytes;
 			cl->bstats.packets += cl->bstats_bias.packets;
 		} else {
@@ -1849,6 +1849,9 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 		if (!cl)
 			goto failure;
 
+		gnet_stats_basic_packed_init(&cl->bstats);
+		gnet_stats_basic_packed_init(&cl->bstats_bias);
+
 		err = tcf_block_get(&cl->block, &cl->filter_list, sch, extack);
 		if (err) {
 			kfree(cl);
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 9d58ecb4e80c6..704e14a58f09d 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -132,7 +132,7 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 	unsigned int ntx;
 
 	sch->q.qlen = 0;
-	memset(&sch->bstats, 0, sizeof(sch->bstats));
+	gnet_stats_basic_packed_init(&sch->bstats);
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
 	/* MQ supports lockless qdiscs. However, statistics accounting needs
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 57427b40f0d2e..fe6b4a178fc9f 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -390,7 +390,7 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	unsigned int ntx, tc;
 
 	sch->q.qlen = 0;
-	memset(&sch->bstats, 0, sizeof(sch->bstats));
+	gnet_stats_basic_packed_init(&sch->bstats);
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
 	/* MQ supports lockless qdiscs. However, statistics accounting needs
@@ -500,10 +500,11 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 		int i;
 		__u32 qlen;
 		struct gnet_stats_queue qstats = {0};
-		struct gnet_stats_basic_packed bstats = {0};
+		struct gnet_stats_basic_packed bstats;
 		struct net_device *dev = qdisc_dev(sch);
 		struct netdev_tc_txq tc = dev->tc_to_txq[cl & TC_BITMASK];
 
+		gnet_stats_basic_packed_init(&bstats);
 		/* Drop lock here it will be reclaimed before touching
 		 * statistics this is required because the d->lock we
 		 * hold here is the look on dev_queue->qdisc_sleeping
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 58a9d42b52b8f..b6d989b69324d 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -465,6 +465,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	if (cl == NULL)
 		return -ENOBUFS;
 
+	gnet_stats_basic_packed_init(&cl->bstats);
 	cl->common.classid = classid;
 	cl->deficit = lmax;
 
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 7/9] net: sched: Use _bstats_update/set() instead of raw writes
  2021-10-16  8:49 [PATCH net-next 0/9] Try to simplify the gnet_stats and remove qdisc->running sequence counter Sebastian Andrzej Siewior
                   ` (5 preceding siblings ...)
  2021-10-16  8:49 ` [PATCH net-next 6/9] net: sched: Protect Qdisc::bstats with u64_stats Sebastian Andrzej Siewior
@ 2021-10-16  8:49 ` Sebastian Andrzej Siewior
  2021-10-16  8:49 ` [PATCH net-next 8/9] net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types Sebastian Andrzej Siewior
  2021-10-16  8:49 ` [PATCH net-next 9/9] net: sched: Remove Qdisc::running sequence counter Sebastian Andrzej Siewior
  8 siblings, 0 replies; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-16  8:49 UTC (permalink / raw)
  To: netdev, netfilter-devel
  Cc: Jakub Kicinski, David S. Miller, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Ahmed S. Darwish, Eric Dumazet, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: "Ahmed S. Darwish" <a.darwish@linutronix.de>

The Qdisc::running sequence counter, used to protect Qdisc::bstats reads
from parallel writes, is in the process of being removed. Qdisc::bstats
read/writes will synchronize using an internal u64_stats sync point
instead.

Modify all bstats writes to use _bstats_update(). This ensures that
the internal u64_stats sync point is always acquired and released as
appropriate.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/gen_stats.c |  9 +++++----
 net/sched/sch_cbq.c  |  3 +--
 net/sched/sch_gred.c |  7 ++++---
 net/sched/sch_htb.c  | 25 +++++++++++++++----------
 net/sched/sch_qfq.c  |  3 +--
 5 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index f2e12fe7112b1..69576972a25f0 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -126,6 +126,7 @@ EXPORT_SYMBOL(gnet_stats_basic_packed_init);
 static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_packed *bstats,
 				     struct gnet_stats_basic_cpu __percpu *cpu)
 {
+	u64 t_bytes = 0, t_packets = 0;
 	int i;
 
 	for_each_possible_cpu(i) {
@@ -139,9 +140,10 @@ static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_packed *bstats,
 			packets = bcpu->bstats.packets;
 		} while (u64_stats_fetch_retry_irq(&bcpu->syncp, start));
 
-		bstats->bytes += bytes;
-		bstats->packets += packets;
+		t_bytes += bytes;
+		t_packets += packets;
 	}
+	_bstats_update(bstats, t_bytes, t_packets);
 }
 
 void gnet_stats_add_basic(const seqcount_t *running,
@@ -164,8 +166,7 @@ void gnet_stats_add_basic(const seqcount_t *running,
 		packets = b->packets;
 	} while (running && read_seqcount_retry(running, seq));
 
-	bstats->bytes += bytes;
-	bstats->packets += packets;
+	_bstats_update(bstats, bytes, packets);
 }
 EXPORT_SYMBOL(gnet_stats_add_basic);
 
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index d01f6ec315f87..ef9e87175d35c 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -565,8 +565,7 @@ cbq_update(struct cbq_sched_data *q)
 		long avgidle = cl->avgidle;
 		long idle;
 
-		cl->bstats.packets++;
-		cl->bstats.bytes += len;
+		_bstats_update(&cl->bstats, len, 1);
 
 		/*
 		 * (now - last) is total time between packet right edges.
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 2ddcbb2efdbbc..02b03d6d24ea4 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -353,6 +353,7 @@ static int gred_offload_dump_stats(struct Qdisc *sch)
 {
 	struct gred_sched *table = qdisc_priv(sch);
 	struct tc_gred_qopt_offload *hw_stats;
+	u64 bytes = 0, packets = 0;
 	unsigned int i;
 	int ret;
 
@@ -381,15 +382,15 @@ static int gred_offload_dump_stats(struct Qdisc *sch)
 		table->tab[i]->bytesin += hw_stats->stats.bstats[i].bytes;
 		table->tab[i]->backlog += hw_stats->stats.qstats[i].backlog;
 
-		_bstats_update(&sch->bstats,
-			       hw_stats->stats.bstats[i].bytes,
-			       hw_stats->stats.bstats[i].packets);
+		bytes += hw_stats->stats.bstats[i].bytes;
+		packets += hw_stats->stats.bstats[i].packets;
 		sch->qstats.qlen += hw_stats->stats.qstats[i].qlen;
 		sch->qstats.backlog += hw_stats->stats.qstats[i].backlog;
 		sch->qstats.drops += hw_stats->stats.qstats[i].drops;
 		sch->qstats.requeues += hw_stats->stats.qstats[i].requeues;
 		sch->qstats.overlimits += hw_stats->stats.qstats[i].overlimits;
 	}
+	_bstats_update(&sch->bstats, bytes, packets);
 
 	kfree(hw_stats);
 	return ret;
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 2e805b17efcf9..324ecfdf842a3 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1308,6 +1308,7 @@ static int htb_dump_class(struct Qdisc *sch, unsigned long arg,
 static void htb_offload_aggregate_stats(struct htb_sched *q,
 					struct htb_class *cl)
 {
+	u64 bytes = 0, packets = 0;
 	struct htb_class *c;
 	unsigned int i;
 
@@ -1323,14 +1324,15 @@ static void htb_offload_aggregate_stats(struct htb_sched *q,
 			if (p != cl)
 				continue;
 
-			cl->bstats.bytes += c->bstats_bias.bytes;
-			cl->bstats.packets += c->bstats_bias.packets;
+			bytes += c->bstats_bias.bytes;
+			packets += c->bstats_bias.packets;
 			if (c->level == 0) {
-				cl->bstats.bytes += c->leaf.q->bstats.bytes;
-				cl->bstats.packets += c->leaf.q->bstats.packets;
+				bytes += c->leaf.q->bstats.bytes;
+				packets += c->leaf.q->bstats.packets;
 			}
 		}
 	}
+	_bstats_update(&cl->bstats, bytes, packets);
 }
 
 static int
@@ -1358,8 +1360,9 @@ htb_dump_class_stats(struct Qdisc *sch, unsigned long arg, struct gnet_dump *d)
 				cl->bstats = cl->leaf.q->bstats;
 			else
 				gnet_stats_basic_packed_init(&cl->bstats);
-			cl->bstats.bytes += cl->bstats_bias.bytes;
-			cl->bstats.packets += cl->bstats_bias.packets;
+			_bstats_update(&cl->bstats,
+				       cl->bstats_bias.bytes,
+				       cl->bstats_bias.packets);
 		} else {
 			htb_offload_aggregate_stats(q, cl);
 		}
@@ -1578,8 +1581,9 @@ static int htb_destroy_class_offload(struct Qdisc *sch, struct htb_class *cl,
 		WARN_ON(old != q);
 
 	if (cl->parent) {
-		cl->parent->bstats_bias.bytes += q->bstats.bytes;
-		cl->parent->bstats_bias.packets += q->bstats.packets;
+		_bstats_update(&cl->parent->bstats_bias,
+			       q->bstats.bytes,
+			       q->bstats.packets);
 	}
 
 	offload_opt = (struct tc_htb_qopt_offload) {
@@ -1925,8 +1929,9 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 				htb_graft_helper(dev_queue, old_q);
 				goto err_kill_estimator;
 			}
-			parent->bstats_bias.bytes += old_q->bstats.bytes;
-			parent->bstats_bias.packets += old_q->bstats.packets;
+			_bstats_update(&parent->bstats_bias,
+				       old_q->bstats.bytes,
+				       old_q->bstats.packets);
 			qdisc_put(old_q);
 		}
 		new_q = qdisc_create_dflt(dev_queue, &pfifo_qdisc_ops,
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index b6d989b69324d..bea68c91027a3 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1235,8 +1235,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		return err;
 	}
 
-	cl->bstats.bytes += len;
-	cl->bstats.packets += gso_segs;
+	_bstats_update(&cl->bstats, len, gso_segs);
 	sch->qstats.backlog += len;
 	++sch->q.qlen;
 
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 8/9] net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types
  2021-10-16  8:49 [PATCH net-next 0/9] Try to simplify the gnet_stats and remove qdisc->running sequence counter Sebastian Andrzej Siewior
                   ` (6 preceding siblings ...)
  2021-10-16  8:49 ` [PATCH net-next 7/9] net: sched: Use _bstats_update/set() instead of raw writes Sebastian Andrzej Siewior
@ 2021-10-16  8:49 ` Sebastian Andrzej Siewior
  2021-10-16  8:49 ` [PATCH net-next 9/9] net: sched: Remove Qdisc::running sequence counter Sebastian Andrzej Siewior
  8 siblings, 0 replies; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-16  8:49 UTC (permalink / raw)
  To: netdev, netfilter-devel
  Cc: Jakub Kicinski, David S. Miller, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Ahmed S. Darwish, Eric Dumazet, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: "Ahmed S. Darwish" <a.darwish@linutronix.de>

The only factor differentiating per-CPU bstats data type (struct
gnet_stats_basic_cpu) from the packed non-per-CPU one (struct
gnet_stats_basic_packed) was a u64_stats sync point inside the former.
The two data types are now equivalent: earlier commits added a u64_stats
sync point to the latter.

Combine both data types into "struct gnet_stats_basic_sync". This
eliminates redundancy and simplifies the bstats read/write APIs.

Use u64_stats_t for bstats "packets" and "bytes" data types. On 64-bit
architectures, u64_stats sync points do not use sequence counter
protection.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 .../net/ethernet/netronome/nfp/abm/qdisc.c    |  2 +-
 include/net/act_api.h                         | 10 +--
 include/net/gen_stats.h                       | 44 ++++++-------
 include/net/netfilter/xt_rateest.h            |  2 +-
 include/net/pkt_cls.h                         |  4 +-
 include/net/sch_generic.h                     | 34 +++-------
 net/core/gen_estimator.c                      | 36 ++++++-----
 net/core/gen_stats.c                          | 62 ++++++++++---------
 net/netfilter/xt_RATEEST.c                    |  8 +--
 net/sched/act_api.c                           | 14 ++---
 net/sched/act_bpf.c                           |  2 +-
 net/sched/act_ife.c                           |  4 +-
 net/sched/act_mpls.c                          |  2 +-
 net/sched/act_police.c                        |  2 +-
 net/sched/act_sample.c                        |  2 +-
 net/sched/act_simple.c                        |  3 +-
 net/sched/act_skbedit.c                       |  2 +-
 net/sched/act_skbmod.c                        |  2 +-
 net/sched/sch_api.c                           |  2 +-
 net/sched/sch_atm.c                           |  4 +-
 net/sched/sch_cbq.c                           |  4 +-
 net/sched/sch_drr.c                           |  4 +-
 net/sched/sch_ets.c                           |  4 +-
 net/sched/sch_generic.c                       |  4 +-
 net/sched/sch_gred.c                          | 10 +--
 net/sched/sch_hfsc.c                          |  4 +-
 net/sched/sch_htb.c                           | 32 +++++-----
 net/sched/sch_mq.c                            |  2 +-
 net/sched/sch_mqprio.c                        |  6 +-
 net/sched/sch_qfq.c                           |  4 +-
 30 files changed, 155 insertions(+), 160 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
index 2473fb5f75e5e..2a5cc64227e9f 100644
--- a/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
+++ b/drivers/net/ethernet/netronome/nfp/abm/qdisc.c
@@ -458,7 +458,7 @@ nfp_abm_qdisc_graft(struct nfp_abm_link *alink, u32 handle, u32 child_handle,
 static void
 nfp_abm_stats_calculate(struct nfp_alink_stats *new,
 			struct nfp_alink_stats *old,
-			struct gnet_stats_basic_packed *bstats,
+			struct gnet_stats_basic_sync *bstats,
 			struct gnet_stats_queue *qstats)
 {
 	_bstats_update(bstats, new->tx_bytes - old->tx_bytes,
diff --git a/include/net/act_api.h b/include/net/act_api.h
index f19f7f4a463cd..b5b624c7e4888 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -30,13 +30,13 @@ struct tc_action {
 	atomic_t			tcfa_bindcnt;
 	int				tcfa_action;
 	struct tcf_t			tcfa_tm;
-	struct gnet_stats_basic_packed	tcfa_bstats;
-	struct gnet_stats_basic_packed	tcfa_bstats_hw;
+	struct gnet_stats_basic_sync	tcfa_bstats;
+	struct gnet_stats_basic_sync	tcfa_bstats_hw;
 	struct gnet_stats_queue		tcfa_qstats;
 	struct net_rate_estimator __rcu *tcfa_rate_est;
 	spinlock_t			tcfa_lock;
-	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
-	struct gnet_stats_basic_cpu __percpu *cpu_bstats_hw;
+	struct gnet_stats_basic_sync __percpu *cpu_bstats;
+	struct gnet_stats_basic_sync __percpu *cpu_bstats_hw;
 	struct gnet_stats_queue __percpu *cpu_qstats;
 	struct tc_cookie	__rcu *act_cookie;
 	struct tcf_chain	__rcu *goto_chain;
@@ -206,7 +206,7 @@ static inline void tcf_action_update_bstats(struct tc_action *a,
 					    struct sk_buff *skb)
 {
 	if (likely(a->cpu_bstats)) {
-		bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), skb);
+		bstats_update(this_cpu_ptr(a->cpu_bstats), skb);
 		return;
 	}
 	spin_lock(&a->tcfa_lock);
diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 304d792f79776..52b87588f467b 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -7,15 +7,17 @@
 #include <linux/rtnetlink.h>
 #include <linux/pkt_sched.h>
 
-/* Note: this used to be in include/uapi/linux/gen_stats.h */
-struct gnet_stats_basic_packed {
-	__u64	bytes;
-	__u64	packets;
-	struct u64_stats_sync syncp;
-};
-
-struct gnet_stats_basic_cpu {
-	struct gnet_stats_basic_packed bstats;
+/* Throughput stats.
+ * Must be initialized beforehand with gnet_stats_basic_sync_init().
+ *
+ * If no reads can ever occur parallel to writes (e.g. stack-allocated
+ * bstats), then the internal stat values can be written to and read
+ * from directly. Otherwise, use _bstats_set/update() for writes and
+ * gnet_stats_add_basic() for reads.
+ */
+struct gnet_stats_basic_sync {
+	u64_stats_t bytes;
+	u64_stats_t packets;
 	struct u64_stats_sync syncp;
 } __aligned(2 * sizeof(u64));
 
@@ -35,7 +37,7 @@ struct gnet_dump {
 	struct tc_stats   tc_stats;
 };
 
-void gnet_stats_basic_packed_init(struct gnet_stats_basic_packed *b);
+void gnet_stats_basic_sync_init(struct gnet_stats_basic_sync *b);
 int gnet_stats_start_copy(struct sk_buff *skb, int type, spinlock_t *lock,
 			  struct gnet_dump *d, int padattr);
 
@@ -46,16 +48,16 @@ int gnet_stats_start_copy_compat(struct sk_buff *skb, int type,
 
 int gnet_stats_copy_basic(const seqcount_t *running,
 			  struct gnet_dump *d,
-			  struct gnet_stats_basic_cpu __percpu *cpu,
-			  struct gnet_stats_basic_packed *b);
+			  struct gnet_stats_basic_sync __percpu *cpu,
+			  struct gnet_stats_basic_sync *b);
 void gnet_stats_add_basic(const seqcount_t *running,
-			  struct gnet_stats_basic_packed *bstats,
-			  struct gnet_stats_basic_cpu __percpu *cpu,
-			  struct gnet_stats_basic_packed *b);
+			  struct gnet_stats_basic_sync *bstats,
+			  struct gnet_stats_basic_sync __percpu *cpu,
+			  struct gnet_stats_basic_sync *b);
 int gnet_stats_copy_basic_hw(const seqcount_t *running,
 			     struct gnet_dump *d,
-			     struct gnet_stats_basic_cpu __percpu *cpu,
-			     struct gnet_stats_basic_packed *b);
+			     struct gnet_stats_basic_sync __percpu *cpu,
+			     struct gnet_stats_basic_sync *b);
 int gnet_stats_copy_rate_est(struct gnet_dump *d,
 			     struct net_rate_estimator __rcu **ptr);
 int gnet_stats_copy_queue(struct gnet_dump *d,
@@ -68,14 +70,14 @@ int gnet_stats_copy_app(struct gnet_dump *d, void *st, int len);
 
 int gnet_stats_finish_copy(struct gnet_dump *d);
 
-int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
-		      struct gnet_stats_basic_cpu __percpu *cpu_bstats,
+int gen_new_estimator(struct gnet_stats_basic_sync *bstats,
+		      struct gnet_stats_basic_sync __percpu *cpu_bstats,
 		      struct net_rate_estimator __rcu **rate_est,
 		      spinlock_t *lock,
 		      seqcount_t *running, struct nlattr *opt);
 void gen_kill_estimator(struct net_rate_estimator __rcu **ptr);
-int gen_replace_estimator(struct gnet_stats_basic_packed *bstats,
-			  struct gnet_stats_basic_cpu __percpu *cpu_bstats,
+int gen_replace_estimator(struct gnet_stats_basic_sync *bstats,
+			  struct gnet_stats_basic_sync __percpu *cpu_bstats,
 			  struct net_rate_estimator __rcu **ptr,
 			  spinlock_t *lock,
 			  seqcount_t *running, struct nlattr *opt);
diff --git a/include/net/netfilter/xt_rateest.h b/include/net/netfilter/xt_rateest.h
index 832ab69efda57..4c3809e141f4f 100644
--- a/include/net/netfilter/xt_rateest.h
+++ b/include/net/netfilter/xt_rateest.h
@@ -6,7 +6,7 @@
 
 struct xt_rateest {
 	/* keep lock and bstats on same cache line to speedup xt_rateest_tg() */
-	struct gnet_stats_basic_packed	bstats;
+	struct gnet_stats_basic_sync	bstats;
 	spinlock_t			lock;
 
 
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 83a6d07921806..4a5833108083f 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -765,7 +765,7 @@ struct tc_cookie {
 };
 
 struct tc_qopt_offload_stats {
-	struct gnet_stats_basic_packed *bstats;
+	struct gnet_stats_basic_sync *bstats;
 	struct gnet_stats_queue *qstats;
 };
 
@@ -885,7 +885,7 @@ struct tc_gred_qopt_offload_params {
 };
 
 struct tc_gred_qopt_offload_stats {
-	struct gnet_stats_basic_packed bstats[MAX_DPs];
+	struct gnet_stats_basic_sync bstats[MAX_DPs];
 	struct gnet_stats_queue qstats[MAX_DPs];
 	struct red_stats *xstats[MAX_DPs];
 };
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d7746aea3cecf..7882e3aa64482 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -97,7 +97,7 @@ struct Qdisc {
 	struct netdev_queue	*dev_queue;
 
 	struct net_rate_estimator __rcu *rate_est;
-	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+	struct gnet_stats_basic_sync __percpu *cpu_bstats;
 	struct gnet_stats_queue	__percpu *cpu_qstats;
 	int			pad;
 	refcount_t		refcnt;
@@ -107,7 +107,7 @@ struct Qdisc {
 	 */
 	struct sk_buff_head	gso_skb ____cacheline_aligned_in_smp;
 	struct qdisc_skb_head	q;
-	struct gnet_stats_basic_packed bstats;
+	struct gnet_stats_basic_sync bstats;
 	seqcount_t		running;
 	struct gnet_stats_queue	qstats;
 	unsigned long		state;
@@ -849,16 +849,16 @@ static inline int qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	return sch->enqueue(skb, sch, to_free);
 }
 
-static inline void _bstats_update(struct gnet_stats_basic_packed *bstats,
+static inline void _bstats_update(struct gnet_stats_basic_sync *bstats,
 				  __u64 bytes, __u32 packets)
 {
 	u64_stats_update_begin(&bstats->syncp);
-	bstats->bytes += bytes;
-	bstats->packets += packets;
+	u64_stats_add(&bstats->bytes, bytes);
+	u64_stats_add(&bstats->packets, packets);
 	u64_stats_update_end(&bstats->syncp);
 }
 
-static inline void bstats_update(struct gnet_stats_basic_packed *bstats,
+static inline void bstats_update(struct gnet_stats_basic_sync *bstats,
 				 const struct sk_buff *skb)
 {
 	_bstats_update(bstats,
@@ -866,26 +866,10 @@ static inline void bstats_update(struct gnet_stats_basic_packed *bstats,
 		       skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1);
 }
 
-static inline void _bstats_cpu_update(struct gnet_stats_basic_cpu *bstats,
-				      __u64 bytes, __u32 packets)
-{
-	u64_stats_update_begin(&bstats->syncp);
-	_bstats_update(&bstats->bstats, bytes, packets);
-	u64_stats_update_end(&bstats->syncp);
-}
-
-static inline void bstats_cpu_update(struct gnet_stats_basic_cpu *bstats,
-				     const struct sk_buff *skb)
-{
-	u64_stats_update_begin(&bstats->syncp);
-	bstats_update(&bstats->bstats, skb);
-	u64_stats_update_end(&bstats->syncp);
-}
-
 static inline void qdisc_bstats_cpu_update(struct Qdisc *sch,
 					   const struct sk_buff *skb)
 {
-	bstats_cpu_update(this_cpu_ptr(sch->cpu_bstats), skb);
+	bstats_update(this_cpu_ptr(sch->cpu_bstats), skb);
 }
 
 static inline void qdisc_bstats_update(struct Qdisc *sch,
@@ -1317,7 +1301,7 @@ void psched_ppscfg_precompute(struct psched_pktrate *r, u64 pktrate64);
 struct mini_Qdisc {
 	struct tcf_proto *filter_list;
 	struct tcf_block *block;
-	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+	struct gnet_stats_basic_sync __percpu *cpu_bstats;
 	struct gnet_stats_queue	__percpu *cpu_qstats;
 	struct rcu_head rcu;
 };
@@ -1325,7 +1309,7 @@ struct mini_Qdisc {
 static inline void mini_qdisc_bstats_cpu_update(struct mini_Qdisc *miniq,
 						const struct sk_buff *skb)
 {
-	bstats_cpu_update(this_cpu_ptr(miniq->cpu_bstats), skb);
+	bstats_update(this_cpu_ptr(miniq->cpu_bstats), skb);
 }
 
 static inline void mini_qdisc_qstats_cpu_drop(struct mini_Qdisc *miniq)
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 64978e77368f4..a73ad0bf324c4 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -40,10 +40,10 @@
  */
 
 struct net_rate_estimator {
-	struct gnet_stats_basic_packed	*bstats;
+	struct gnet_stats_basic_sync	*bstats;
 	spinlock_t		*stats_lock;
 	seqcount_t		*running;
-	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+	struct gnet_stats_basic_sync __percpu *cpu_bstats;
 	u8			ewma_log;
 	u8			intvl_log; /* period : (250ms << intvl_log) */
 
@@ -60,9 +60,9 @@ struct net_rate_estimator {
 };
 
 static void est_fetch_counters(struct net_rate_estimator *e,
-			       struct gnet_stats_basic_packed *b)
+			       struct gnet_stats_basic_sync *b)
 {
-	gnet_stats_basic_packed_init(b);
+	gnet_stats_basic_sync_init(b);
 	if (e->stats_lock)
 		spin_lock(e->stats_lock);
 
@@ -76,14 +76,18 @@ static void est_fetch_counters(struct net_rate_estimator *e,
 static void est_timer(struct timer_list *t)
 {
 	struct net_rate_estimator *est = from_timer(est, t, timer);
-	struct gnet_stats_basic_packed b;
+	struct gnet_stats_basic_sync b;
+	u64 b_bytes, b_packets;
 	u64 rate, brate;
 
 	est_fetch_counters(est, &b);
-	brate = (b.bytes - est->last_bytes) << (10 - est->intvl_log);
+	b_bytes = u64_stats_read(&b.bytes);
+	b_packets = u64_stats_read(&b.packets);
+
+	brate = (b_bytes - est->last_bytes) << (10 - est->intvl_log);
 	brate = (brate >> est->ewma_log) - (est->avbps >> est->ewma_log);
 
-	rate = (b.packets - est->last_packets) << (10 - est->intvl_log);
+	rate = (b_packets - est->last_packets) << (10 - est->intvl_log);
 	rate = (rate >> est->ewma_log) - (est->avpps >> est->ewma_log);
 
 	write_seqcount_begin(&est->seq);
@@ -91,8 +95,8 @@ static void est_timer(struct timer_list *t)
 	est->avpps += rate;
 	write_seqcount_end(&est->seq);
 
-	est->last_bytes = b.bytes;
-	est->last_packets = b.packets;
+	est->last_bytes = b_bytes;
+	est->last_packets = b_packets;
 
 	est->next_jiffies += ((HZ/4) << est->intvl_log);
 
@@ -121,8 +125,8 @@ static void est_timer(struct timer_list *t)
  * Returns 0 on success or a negative error code.
  *
  */
-int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
-		      struct gnet_stats_basic_cpu __percpu *cpu_bstats,
+int gen_new_estimator(struct gnet_stats_basic_sync *bstats,
+		      struct gnet_stats_basic_sync __percpu *cpu_bstats,
 		      struct net_rate_estimator __rcu **rate_est,
 		      spinlock_t *lock,
 		      seqcount_t *running,
@@ -130,7 +134,7 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 {
 	struct gnet_estimator *parm = nla_data(opt);
 	struct net_rate_estimator *old, *est;
-	struct gnet_stats_basic_packed b;
+	struct gnet_stats_basic_sync b;
 	int intvl_log;
 
 	if (nla_len(opt) < sizeof(*parm))
@@ -164,8 +168,8 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 	est_fetch_counters(est, &b);
 	if (lock)
 		local_bh_enable();
-	est->last_bytes = b.bytes;
-	est->last_packets = b.packets;
+	est->last_bytes = u64_stats_read(&b.bytes);
+	est->last_packets = u64_stats_read(&b.packets);
 
 	if (lock)
 		spin_lock_bh(lock);
@@ -222,8 +226,8 @@ EXPORT_SYMBOL(gen_kill_estimator);
  *
  * Returns 0 on success or a negative error code.
  */
-int gen_replace_estimator(struct gnet_stats_basic_packed *bstats,
-			  struct gnet_stats_basic_cpu __percpu *cpu_bstats,
+int gen_replace_estimator(struct gnet_stats_basic_sync *bstats,
+			  struct gnet_stats_basic_sync __percpu *cpu_bstats,
 			  struct net_rate_estimator __rcu **rate_est,
 			  spinlock_t *lock,
 			  seqcount_t *running, struct nlattr *opt)
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 69576972a25f0..5f57f761def69 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -115,29 +115,29 @@ gnet_stats_start_copy(struct sk_buff *skb, int type, spinlock_t *lock,
 EXPORT_SYMBOL(gnet_stats_start_copy);
 
 /* Must not be inlined, due to u64_stats seqcount_t lockdep key */
-void gnet_stats_basic_packed_init(struct gnet_stats_basic_packed *b)
+void gnet_stats_basic_sync_init(struct gnet_stats_basic_sync *b)
 {
-	b->bytes = 0;
-	b->packets = 0;
+	u64_stats_set(&b->bytes, 0);
+	u64_stats_set(&b->packets, 0);
 	u64_stats_init(&b->syncp);
 }
-EXPORT_SYMBOL(gnet_stats_basic_packed_init);
+EXPORT_SYMBOL(gnet_stats_basic_sync_init);
 
-static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_packed *bstats,
-				     struct gnet_stats_basic_cpu __percpu *cpu)
+static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_sync *bstats,
+				     struct gnet_stats_basic_sync __percpu *cpu)
 {
 	u64 t_bytes = 0, t_packets = 0;
 	int i;
 
 	for_each_possible_cpu(i) {
-		struct gnet_stats_basic_cpu *bcpu = per_cpu_ptr(cpu, i);
+		struct gnet_stats_basic_sync *bcpu = per_cpu_ptr(cpu, i);
 		unsigned int start;
 		u64 bytes, packets;
 
 		do {
 			start = u64_stats_fetch_begin_irq(&bcpu->syncp);
-			bytes = bcpu->bstats.bytes;
-			packets = bcpu->bstats.packets;
+			bytes = u64_stats_read(&bcpu->bytes);
+			packets = u64_stats_read(&bcpu->packets);
 		} while (u64_stats_fetch_retry_irq(&bcpu->syncp, start));
 
 		t_bytes += bytes;
@@ -147,9 +147,9 @@ static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_packed *bstats,
 }
 
 void gnet_stats_add_basic(const seqcount_t *running,
-			  struct gnet_stats_basic_packed *bstats,
-			  struct gnet_stats_basic_cpu __percpu *cpu,
-			  struct gnet_stats_basic_packed *b)
+			  struct gnet_stats_basic_sync *bstats,
+			  struct gnet_stats_basic_sync __percpu *cpu,
+			  struct gnet_stats_basic_sync *b)
 {
 	unsigned int seq;
 	u64 bytes = 0;
@@ -162,8 +162,8 @@ void gnet_stats_add_basic(const seqcount_t *running,
 	do {
 		if (running)
 			seq = read_seqcount_begin(running);
-		bytes = b->bytes;
-		packets = b->packets;
+		bytes = u64_stats_read(&b->bytes);
+		packets = u64_stats_read(&b->packets);
 	} while (running && read_seqcount_retry(running, seq));
 
 	_bstats_update(bstats, bytes, packets);
@@ -173,18 +173,22 @@ EXPORT_SYMBOL(gnet_stats_add_basic);
 static int
 ___gnet_stats_copy_basic(const seqcount_t *running,
 			 struct gnet_dump *d,
-			 struct gnet_stats_basic_cpu __percpu *cpu,
-			 struct gnet_stats_basic_packed *b,
+			 struct gnet_stats_basic_sync __percpu *cpu,
+			 struct gnet_stats_basic_sync *b,
 			 int type)
 {
-	struct gnet_stats_basic_packed bstats;
+	struct gnet_stats_basic_sync bstats;
+	u64 bstats_bytes, bstats_packets;
 
-	gnet_stats_basic_packed_init(&bstats);
+	gnet_stats_basic_sync_init(&bstats);
 	gnet_stats_add_basic(running, &bstats, cpu, b);
 
+	bstats_bytes = u64_stats_read(&bstats.bytes);
+	bstats_packets = u64_stats_read(&bstats.packets);
+
 	if (d->compat_tc_stats && type == TCA_STATS_BASIC) {
-		d->tc_stats.bytes = bstats.bytes;
-		d->tc_stats.packets = bstats.packets;
+		d->tc_stats.bytes = bstats_bytes;
+		d->tc_stats.packets = bstats_packets;
 	}
 
 	if (d->tail) {
@@ -192,14 +196,14 @@ ___gnet_stats_copy_basic(const seqcount_t *running,
 		int res;
 
 		memset(&sb, 0, sizeof(sb));
-		sb.bytes = bstats.bytes;
-		sb.packets = bstats.packets;
+		sb.bytes = bstats_bytes;
+		sb.packets = bstats_packets;
 		res = gnet_stats_copy(d, type, &sb, sizeof(sb), TCA_STATS_PAD);
-		if (res < 0 || sb.packets == bstats.packets)
+		if (res < 0 || sb.packets == bstats_packets)
 			return res;
 		/* emit 64bit stats only if needed */
-		return gnet_stats_copy(d, TCA_STATS_PKT64, &bstats.packets,
-				       sizeof(bstats.packets), TCA_STATS_PAD);
+		return gnet_stats_copy(d, TCA_STATS_PKT64, &bstats_packets,
+				       sizeof(bstats_packets), TCA_STATS_PAD);
 	}
 	return 0;
 }
@@ -220,8 +224,8 @@ ___gnet_stats_copy_basic(const seqcount_t *running,
 int
 gnet_stats_copy_basic(const seqcount_t *running,
 		      struct gnet_dump *d,
-		      struct gnet_stats_basic_cpu __percpu *cpu,
-		      struct gnet_stats_basic_packed *b)
+		      struct gnet_stats_basic_sync __percpu *cpu,
+		      struct gnet_stats_basic_sync *b)
 {
 	return ___gnet_stats_copy_basic(running, d, cpu, b,
 					TCA_STATS_BASIC);
@@ -244,8 +248,8 @@ EXPORT_SYMBOL(gnet_stats_copy_basic);
 int
 gnet_stats_copy_basic_hw(const seqcount_t *running,
 			 struct gnet_dump *d,
-			 struct gnet_stats_basic_cpu __percpu *cpu,
-			 struct gnet_stats_basic_packed *b)
+			 struct gnet_stats_basic_sync __percpu *cpu,
+			 struct gnet_stats_basic_sync *b)
 {
 	return ___gnet_stats_copy_basic(running, d, cpu, b,
 					TCA_STATS_BASIC_HW);
diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index d5200725ca62c..8aec1b529364a 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -94,11 +94,11 @@ static unsigned int
 xt_rateest_tg(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_rateest_target_info *info = par->targinfo;
-	struct gnet_stats_basic_packed *stats = &info->est->bstats;
+	struct gnet_stats_basic_sync *stats = &info->est->bstats;
 
 	spin_lock_bh(&info->est->lock);
-	stats->bytes += skb->len;
-	stats->packets++;
+	u64_stats_add(&stats->bytes, skb->len);
+	u64_stats_inc(&stats->packets);
 	spin_unlock_bh(&info->est->lock);
 
 	return XT_CONTINUE;
@@ -143,7 +143,7 @@ static int xt_rateest_tg_checkentry(const struct xt_tgchk_param *par)
 	if (!est)
 		goto err1;
 
-	gnet_stats_basic_packed_init(&est->bstats);
+	gnet_stats_basic_sync_init(&est->bstats);
 	strlcpy(est->name, info->name, sizeof(est->name));
 	spin_lock_init(&est->lock);
 	est->refcnt		= 1;
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 0302dad42df14..585829ffa0c4c 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -480,18 +480,18 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 		atomic_set(&p->tcfa_bindcnt, 1);
 
 	if (cpustats) {
-		p->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
+		p->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_sync);
 		if (!p->cpu_bstats)
 			goto err1;
-		p->cpu_bstats_hw = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
+		p->cpu_bstats_hw = netdev_alloc_pcpu_stats(struct gnet_stats_basic_sync);
 		if (!p->cpu_bstats_hw)
 			goto err2;
 		p->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
 		if (!p->cpu_qstats)
 			goto err3;
 	}
-	gnet_stats_basic_packed_init(&p->tcfa_bstats);
-	gnet_stats_basic_packed_init(&p->tcfa_bstats_hw);
+	gnet_stats_basic_sync_init(&p->tcfa_bstats);
+	gnet_stats_basic_sync_init(&p->tcfa_bstats_hw);
 	spin_lock_init(&p->tcfa_lock);
 	p->tcfa_index = index;
 	p->tcfa_tm.install = jiffies;
@@ -1128,13 +1128,13 @@ void tcf_action_update_stats(struct tc_action *a, u64 bytes, u64 packets,
 			     u64 drops, bool hw)
 {
 	if (a->cpu_bstats) {
-		_bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets);
+		_bstats_update(this_cpu_ptr(a->cpu_bstats), bytes, packets);
 
 		this_cpu_ptr(a->cpu_qstats)->drops += drops;
 
 		if (hw)
-			_bstats_cpu_update(this_cpu_ptr(a->cpu_bstats_hw),
-					   bytes, packets);
+			_bstats_update(this_cpu_ptr(a->cpu_bstats_hw),
+				       bytes, packets);
 		return;
 	}
 
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 5c36013339e11..f2bf896331a59 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -41,7 +41,7 @@ static int tcf_bpf_act(struct sk_buff *skb, const struct tc_action *act,
 	int action, filter_res;
 
 	tcf_lastuse_update(&prog->tcf_tm);
-	bstats_cpu_update(this_cpu_ptr(prog->common.cpu_bstats), skb);
+	bstats_update(this_cpu_ptr(prog->common.cpu_bstats), skb);
 
 	filter = rcu_dereference(prog->filter);
 	if (at_ingress) {
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 7064a365a1a98..b757f90a2d589 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -718,7 +718,7 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
 	u8 *tlv_data;
 	u16 metalen;
 
-	bstats_cpu_update(this_cpu_ptr(ife->common.cpu_bstats), skb);
+	bstats_update(this_cpu_ptr(ife->common.cpu_bstats), skb);
 	tcf_lastuse_update(&ife->tcf_tm);
 
 	if (skb_at_tc_ingress(skb))
@@ -806,7 +806,7 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
 			exceed_mtu = true;
 	}
 
-	bstats_cpu_update(this_cpu_ptr(ife->common.cpu_bstats), skb);
+	bstats_update(this_cpu_ptr(ife->common.cpu_bstats), skb);
 	tcf_lastuse_update(&ife->tcf_tm);
 
 	if (!metalen) {		/* no metadata to send */
diff --git a/net/sched/act_mpls.c b/net/sched/act_mpls.c
index e4529b428cf44..8faa4c58305e3 100644
--- a/net/sched/act_mpls.c
+++ b/net/sched/act_mpls.c
@@ -59,7 +59,7 @@ static int tcf_mpls_act(struct sk_buff *skb, const struct tc_action *a,
 	int ret, mac_len;
 
 	tcf_lastuse_update(&m->tcf_tm);
-	bstats_cpu_update(this_cpu_ptr(m->common.cpu_bstats), skb);
+	bstats_update(this_cpu_ptr(m->common.cpu_bstats), skb);
 
 	/* Ensure 'data' points at mac_header prior calling mpls manipulating
 	 * functions.
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 832157a840fc3..c9383805222df 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -248,7 +248,7 @@ static int tcf_police_act(struct sk_buff *skb, const struct tc_action *a,
 	int ret;
 
 	tcf_lastuse_update(&police->tcf_tm);
-	bstats_cpu_update(this_cpu_ptr(police->common.cpu_bstats), skb);
+	bstats_update(this_cpu_ptr(police->common.cpu_bstats), skb);
 
 	ret = READ_ONCE(police->tcf_action);
 	p = rcu_dereference_bh(police->params);
diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
index 230501eb9e069..ce859b0e0deb9 100644
--- a/net/sched/act_sample.c
+++ b/net/sched/act_sample.c
@@ -163,7 +163,7 @@ static int tcf_sample_act(struct sk_buff *skb, const struct tc_action *a,
 	int retval;
 
 	tcf_lastuse_update(&s->tcf_tm);
-	bstats_cpu_update(this_cpu_ptr(s->common.cpu_bstats), skb);
+	bstats_update(this_cpu_ptr(s->common.cpu_bstats), skb);
 	retval = READ_ONCE(s->tcf_action);
 
 	psample_group = rcu_dereference_bh(s->psample_group);
diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
index cbbe1861d3a20..e617ab4505ca4 100644
--- a/net/sched/act_simple.c
+++ b/net/sched/act_simple.c
@@ -36,7 +36,8 @@ static int tcf_simp_act(struct sk_buff *skb, const struct tc_action *a,
 	 * then it would look like "hello_3" (without quotes)
 	 */
 	pr_info("simple: %s_%llu\n",
-	       (char *)d->tcfd_defdata, d->tcf_bstats.packets);
+		(char *)d->tcfd_defdata,
+		u64_stats_read(&d->tcf_bstats.packets));
 	spin_unlock(&d->tcf_lock);
 	return d->tcf_action;
 }
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 6054185383474..d30ecbfc8f846 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -31,7 +31,7 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 	int action;
 
 	tcf_lastuse_update(&d->tcf_tm);
-	bstats_cpu_update(this_cpu_ptr(d->common.cpu_bstats), skb);
+	bstats_update(this_cpu_ptr(d->common.cpu_bstats), skb);
 
 	params = rcu_dereference_bh(d->params);
 	action = READ_ONCE(d->tcf_action);
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index ecb9ee6660954..9b6b52c5e24ec 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -31,7 +31,7 @@ static int tcf_skbmod_act(struct sk_buff *skb, const struct tc_action *a,
 	u64 flags;
 
 	tcf_lastuse_update(&d->tcf_tm);
-	bstats_cpu_update(this_cpu_ptr(d->common.cpu_bstats), skb);
+	bstats_update(this_cpu_ptr(d->common.cpu_bstats), skb);
 
 	action = READ_ONCE(d->tcf_action);
 	if (unlikely(action == TC_ACT_SHOT))
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 91820f67275c7..70f006cbf2126 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -885,7 +885,7 @@ static void qdisc_offload_graft_root(struct net_device *dev,
 static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 			 u32 portid, u32 seq, u16 flags, int event)
 {
-	struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL;
+	struct gnet_stats_basic_sync __percpu *cpu_bstats = NULL;
 	struct gnet_stats_queue __percpu *cpu_qstats = NULL;
 	struct tcmsg *tcm;
 	struct nlmsghdr  *nlh;
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index c8e1771383f9a..fbfe4ce9497b5 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -52,7 +52,7 @@ struct atm_flow_data {
 	struct atm_qdisc_data	*parent;	/* parent qdisc */
 	struct socket		*sock;		/* for closing */
 	int			ref;		/* reference count */
-	struct gnet_stats_basic_packed	bstats;
+	struct gnet_stats_basic_sync	bstats;
 	struct gnet_stats_queue	qstats;
 	struct list_head	list;
 	struct atm_flow_data	*excess;	/* flow for excess traffic;
@@ -548,7 +548,7 @@ static int atm_tc_init(struct Qdisc *sch, struct nlattr *opt,
 	pr_debug("atm_tc_init(sch %p,[qdisc %p],opt %p)\n", sch, p, opt);
 	INIT_LIST_HEAD(&p->flows);
 	INIT_LIST_HEAD(&p->link.list);
-	gnet_stats_basic_packed_init(&p->link.bstats);
+	gnet_stats_basic_sync_init(&p->link.bstats);
 	list_add(&p->link.list, &p->flows);
 	p->link.q = qdisc_create_dflt(sch->dev_queue,
 				      &pfifo_qdisc_ops, sch->handle, extack);
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index ef9e87175d35c..f0b1282fae111 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -116,7 +116,7 @@ struct cbq_class {
 	long			avgidle;
 	long			deficit;	/* Saved deficit for WRR */
 	psched_time_t		penalized;
-	struct gnet_stats_basic_packed bstats;
+	struct gnet_stats_basic_sync bstats;
 	struct gnet_stats_queue qstats;
 	struct net_rate_estimator __rcu *rate_est;
 	struct tc_cbq_xstats	xstats;
@@ -1610,7 +1610,7 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
 	if (cl == NULL)
 		goto failure;
 
-	gnet_stats_basic_packed_init(&cl->bstats);
+	gnet_stats_basic_sync_init(&cl->bstats);
 	err = tcf_block_get(&cl->block, &cl->filter_list, sch, extack);
 	if (err) {
 		kfree(cl);
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 319906e19a6ba..7243617a3595f 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -19,7 +19,7 @@ struct drr_class {
 	struct Qdisc_class_common	common;
 	unsigned int			filter_cnt;
 
-	struct gnet_stats_basic_packed		bstats;
+	struct gnet_stats_basic_sync		bstats;
 	struct gnet_stats_queue		qstats;
 	struct net_rate_estimator __rcu *rate_est;
 	struct list_head		alist;
@@ -106,7 +106,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	if (cl == NULL)
 		return -ENOBUFS;
 
-	gnet_stats_basic_packed_init(&cl->bstats);
+	gnet_stats_basic_sync_init(&cl->bstats);
 	cl->common.classid = classid;
 	cl->quantum	   = quantum;
 	cl->qdisc	   = qdisc_create_dflt(sch->dev_queue,
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index 83693107371f9..af56d155e7fca 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -41,7 +41,7 @@ struct ets_class {
 	struct Qdisc *qdisc;
 	u32 quantum;
 	u32 deficit;
-	struct gnet_stats_basic_packed bstats;
+	struct gnet_stats_basic_sync bstats;
 	struct gnet_stats_queue qstats;
 };
 
@@ -689,7 +689,7 @@ static int ets_qdisc_change(struct Qdisc *sch, struct nlattr *opt,
 		q->classes[i].qdisc = NULL;
 		q->classes[i].quantum = 0;
 		q->classes[i].deficit = 0;
-		gnet_stats_basic_packed_init(&q->classes[i].bstats);
+		gnet_stats_basic_sync_init(&q->classes[i].bstats);
 		memset(&q->classes[i].qstats, 0, sizeof(q->classes[i].qstats));
 	}
 	return 0;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index ef27ff3ddee4f..989186e7f1a02 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -892,12 +892,12 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 	__skb_queue_head_init(&sch->gso_skb);
 	__skb_queue_head_init(&sch->skb_bad_txq);
 	qdisc_skb_head_init(&sch->q);
-	gnet_stats_basic_packed_init(&sch->bstats);
+	gnet_stats_basic_sync_init(&sch->bstats);
 	spin_lock_init(&sch->q.lock);
 
 	if (ops->static_flags & TCQ_F_CPUSTATS) {
 		sch->cpu_bstats =
-			netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
+			netdev_alloc_pcpu_stats(struct gnet_stats_basic_sync);
 		if (!sch->cpu_bstats)
 			goto errout1;
 
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 02b03d6d24ea4..72de08ef8335e 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -366,7 +366,7 @@ static int gred_offload_dump_stats(struct Qdisc *sch)
 	hw_stats->parent = sch->parent;
 
 	for (i = 0; i < MAX_DPs; i++) {
-		gnet_stats_basic_packed_init(&hw_stats->stats.bstats[i]);
+		gnet_stats_basic_sync_init(&hw_stats->stats.bstats[i]);
 		if (table->tab[i])
 			hw_stats->stats.xstats[i] = &table->tab[i]->stats;
 	}
@@ -378,12 +378,12 @@ static int gred_offload_dump_stats(struct Qdisc *sch)
 	for (i = 0; i < MAX_DPs; i++) {
 		if (!table->tab[i])
 			continue;
-		table->tab[i]->packetsin += hw_stats->stats.bstats[i].packets;
-		table->tab[i]->bytesin += hw_stats->stats.bstats[i].bytes;
+		table->tab[i]->packetsin += u64_stats_read(&hw_stats->stats.bstats[i].packets);
+		table->tab[i]->bytesin += u64_stats_read(&hw_stats->stats.bstats[i].bytes);
 		table->tab[i]->backlog += hw_stats->stats.qstats[i].backlog;
 
-		bytes += hw_stats->stats.bstats[i].bytes;
-		packets += hw_stats->stats.bstats[i].packets;
+		bytes += u64_stats_read(&hw_stats->stats.bstats[i].bytes);
+		packets += u64_stats_read(&hw_stats->stats.bstats[i].packets);
 		sch->qstats.qlen += hw_stats->stats.qstats[i].qlen;
 		sch->qstats.backlog += hw_stats->stats.qstats[i].backlog;
 		sch->qstats.drops += hw_stats->stats.qstats[i].drops;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index ff6ff54806fcd..181c2905ff983 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -111,7 +111,7 @@ enum hfsc_class_flags {
 struct hfsc_class {
 	struct Qdisc_class_common cl_common;
 
-	struct gnet_stats_basic_packed bstats;
+	struct gnet_stats_basic_sync bstats;
 	struct gnet_stats_queue qstats;
 	struct net_rate_estimator __rcu *rate_est;
 	struct tcf_proto __rcu *filter_list; /* filter list */
@@ -1406,7 +1406,7 @@ hfsc_init_qdisc(struct Qdisc *sch, struct nlattr *opt,
 	if (err)
 		return err;
 
-	gnet_stats_basic_packed_init(&q->root.bstats);
+	gnet_stats_basic_sync_init(&q->root.bstats);
 	q->root.cl_common.classid = sch->handle;
 	q->root.sched   = q;
 	q->root.qdisc = qdisc_create_dflt(sch->dev_queue, &pfifo_qdisc_ops,
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 324ecfdf842a3..adceb9e210f61 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -113,8 +113,8 @@ struct htb_class {
 	/*
 	 * Written often fields
 	 */
-	struct gnet_stats_basic_packed bstats;
-	struct gnet_stats_basic_packed bstats_bias;
+	struct gnet_stats_basic_sync bstats;
+	struct gnet_stats_basic_sync bstats_bias;
 	struct tc_htb_xstats	xstats;	/* our special stats */
 
 	/* token bucket parameters */
@@ -1312,7 +1312,7 @@ static void htb_offload_aggregate_stats(struct htb_sched *q,
 	struct htb_class *c;
 	unsigned int i;
 
-	gnet_stats_basic_packed_init(&cl->bstats);
+	gnet_stats_basic_sync_init(&cl->bstats);
 
 	for (i = 0; i < q->clhash.hashsize; i++) {
 		hlist_for_each_entry(c, &q->clhash.hash[i], common.hnode) {
@@ -1324,11 +1324,11 @@ static void htb_offload_aggregate_stats(struct htb_sched *q,
 			if (p != cl)
 				continue;
 
-			bytes += c->bstats_bias.bytes;
-			packets += c->bstats_bias.packets;
+			bytes += u64_stats_read(&c->bstats_bias.bytes);
+			packets += u64_stats_read(&c->bstats_bias.packets);
 			if (c->level == 0) {
-				bytes += c->leaf.q->bstats.bytes;
-				packets += c->leaf.q->bstats.packets;
+				bytes += u64_stats_read(&c->leaf.q->bstats.bytes);
+				packets += u64_stats_read(&c->leaf.q->bstats.packets);
 			}
 		}
 	}
@@ -1359,10 +1359,10 @@ htb_dump_class_stats(struct Qdisc *sch, unsigned long arg, struct gnet_dump *d)
 			if (cl->leaf.q)
 				cl->bstats = cl->leaf.q->bstats;
 			else
-				gnet_stats_basic_packed_init(&cl->bstats);
+				gnet_stats_basic_sync_init(&cl->bstats);
 			_bstats_update(&cl->bstats,
-				       cl->bstats_bias.bytes,
-				       cl->bstats_bias.packets);
+				       u64_stats_read(&cl->bstats_bias.bytes),
+				       u64_stats_read(&cl->bstats_bias.packets));
 		} else {
 			htb_offload_aggregate_stats(q, cl);
 		}
@@ -1582,8 +1582,8 @@ static int htb_destroy_class_offload(struct Qdisc *sch, struct htb_class *cl,
 
 	if (cl->parent) {
 		_bstats_update(&cl->parent->bstats_bias,
-			       q->bstats.bytes,
-			       q->bstats.packets);
+			       u64_stats_read(&q->bstats.bytes),
+			       u64_stats_read(&q->bstats.packets));
 	}
 
 	offload_opt = (struct tc_htb_qopt_offload) {
@@ -1853,8 +1853,8 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 		if (!cl)
 			goto failure;
 
-		gnet_stats_basic_packed_init(&cl->bstats);
-		gnet_stats_basic_packed_init(&cl->bstats_bias);
+		gnet_stats_basic_sync_init(&cl->bstats);
+		gnet_stats_basic_sync_init(&cl->bstats_bias);
 
 		err = tcf_block_get(&cl->block, &cl->filter_list, sch, extack);
 		if (err) {
@@ -1930,8 +1930,8 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 				goto err_kill_estimator;
 			}
 			_bstats_update(&parent->bstats_bias,
-				       old_q->bstats.bytes,
-				       old_q->bstats.packets);
+				       u64_stats_read(&old_q->bstats.bytes),
+				       u64_stats_read(&old_q->bstats.packets));
 			qdisc_put(old_q);
 		}
 		new_q = qdisc_create_dflt(dev_queue, &pfifo_qdisc_ops,
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 704e14a58f09d..cedd0b3ef9cfb 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -132,7 +132,7 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 	unsigned int ntx;
 
 	sch->q.qlen = 0;
-	gnet_stats_basic_packed_init(&sch->bstats);
+	gnet_stats_basic_sync_init(&sch->bstats);
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
 	/* MQ supports lockless qdiscs. However, statistics accounting needs
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index fe6b4a178fc9f..3f7f756f92ca3 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -390,7 +390,7 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	unsigned int ntx, tc;
 
 	sch->q.qlen = 0;
-	gnet_stats_basic_packed_init(&sch->bstats);
+	gnet_stats_basic_sync_init(&sch->bstats);
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
 	/* MQ supports lockless qdiscs. However, statistics accounting needs
@@ -500,11 +500,11 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 		int i;
 		__u32 qlen;
 		struct gnet_stats_queue qstats = {0};
-		struct gnet_stats_basic_packed bstats;
+		struct gnet_stats_basic_sync bstats;
 		struct net_device *dev = qdisc_dev(sch);
 		struct netdev_tc_txq tc = dev->tc_to_txq[cl & TC_BITMASK];
 
-		gnet_stats_basic_packed_init(&bstats);
+		gnet_stats_basic_sync_init(&bstats);
 		/* Drop lock here it will be reclaimed before touching
 		 * statistics this is required because the d->lock we
 		 * hold here is the look on dev_queue->qdisc_sleeping
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index bea68c91027a3..a35200f591a2d 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -131,7 +131,7 @@ struct qfq_class {
 
 	unsigned int filter_cnt;
 
-	struct gnet_stats_basic_packed bstats;
+	struct gnet_stats_basic_sync bstats;
 	struct gnet_stats_queue qstats;
 	struct net_rate_estimator __rcu *rate_est;
 	struct Qdisc *qdisc;
@@ -465,7 +465,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	if (cl == NULL)
 		return -ENOBUFS;
 
-	gnet_stats_basic_packed_init(&cl->bstats);
+	gnet_stats_basic_sync_init(&cl->bstats);
 	cl->common.classid = classid;
 	cl->deficit = lmax;
 
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 9/9] net: sched: Remove Qdisc::running sequence counter
  2021-10-16  8:49 [PATCH net-next 0/9] Try to simplify the gnet_stats and remove qdisc->running sequence counter Sebastian Andrzej Siewior
                   ` (7 preceding siblings ...)
  2021-10-16  8:49 ` [PATCH net-next 8/9] net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types Sebastian Andrzej Siewior
@ 2021-10-16  8:49 ` Sebastian Andrzej Siewior
  2021-10-18 17:23   ` Eric Dumazet
  8 siblings, 1 reply; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-16  8:49 UTC (permalink / raw)
  To: netdev, netfilter-devel
  Cc: Jakub Kicinski, David S. Miller, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Ahmed S. Darwish, Eric Dumazet, Thomas Gleixner,
	Sebastian Andrzej Siewior

From: "Ahmed S. Darwish" <a.darwish@linutronix.de>

The Qdisc::running sequence counter has two uses:

  1. Reliably reading qdisc's tc statistics while the qdisc is running
     (a seqcount read/retry loop at gnet_stats_add_basic()).

  2. As a flag, indicating whether the qdisc in question is running
     (without any retry loops).

For the first usage, the Qdisc::running sequence counter write section,
qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
is actually needed: the raw qdisc's bstats update. A u64_stats sync
point was thus introduced (in previous commits) inside the bstats
structure itself. A local u64_stats write section is then started and
stopped for the bstats updates.

Use that u64_stats sync point mechanism for the bstats read/retry loop
at gnet_stats_add_basic().

For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
accessed with atomic bitops, is sufficient. Using a bit flag instead of
a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
to the SMP barriers implicitly added through raw_read_seqcount() and
write_seqcount_begin/end() getting removed. All call sites have been
surveyed though, and no required ordering was identified.

Now that the qdisc->running sequence counter is no longer used, remove
it.

Note, using u64_stats implies no sequence counter protection for 64-bit
architectures. This can lead to the qdisc tc statistics "packets" vs.
"bytes" values getting out of sync on rare occasions. The individual
values will still be valid.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/netdevice.h |  4 ----
 include/net/gen_stats.h   | 19 +++++++--------
 include/net/sch_generic.h | 33 +++++++++++---------------
 net/core/gen_estimator.c  | 16 ++++++++-----
 net/core/gen_stats.c      | 50 ++++++++++++++++++++++-----------------
 net/sched/act_api.c       |  9 +++----
 net/sched/act_police.c    |  2 +-
 net/sched/sch_api.c       | 16 +++----------
 net/sched/sch_atm.c       |  3 +--
 net/sched/sch_cbq.c       |  9 +++----
 net/sched/sch_drr.c       | 10 +++-----
 net/sched/sch_ets.c       |  3 +--
 net/sched/sch_generic.c   | 10 ++------
 net/sched/sch_hfsc.c      |  8 +++----
 net/sched/sch_htb.c       |  7 +++---
 net/sched/sch_mq.c        |  7 +++---
 net/sched/sch_mqprio.c    | 14 +++++------
 net/sched/sch_multiq.c    |  3 +--
 net/sched/sch_prio.c      |  4 ++--
 net/sched/sch_qfq.c       |  7 +++---
 net/sched/sch_taprio.c    |  2 +-
 21 files changed, 102 insertions(+), 134 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 173984414f387..f9cd6fea213f3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1916,7 +1916,6 @@ enum netdev_ml_priv_type {
  *	@sfp_bus:	attached &struct sfp_bus structure.
  *
  *	@qdisc_tx_busylock: lockdep class annotating Qdisc->busylock spinlock
- *	@qdisc_running_key: lockdep class annotating Qdisc->running seqcount
  *
  *	@proto_down:	protocol port state information can be sent to the
  *			switch driver and used to set the phys state of the
@@ -2250,7 +2249,6 @@ struct net_device {
 	struct phy_device	*phydev;
 	struct sfp_bus		*sfp_bus;
 	struct lock_class_key	*qdisc_tx_busylock;
-	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
 	unsigned		wol_enabled:1;
 	unsigned		threaded:1;
@@ -2360,13 +2358,11 @@ static inline void netdev_for_each_tx_queue(struct net_device *dev,
 #define netdev_lockdep_set_classes(dev)				\
 {								\
 	static struct lock_class_key qdisc_tx_busylock_key;	\
-	static struct lock_class_key qdisc_running_key;		\
 	static struct lock_class_key qdisc_xmit_lock_key;	\
 	static struct lock_class_key dev_addr_list_lock_key;	\
 	unsigned int i;						\
 								\
 	(dev)->qdisc_tx_busylock = &qdisc_tx_busylock_key;	\
-	(dev)->qdisc_running_key = &qdisc_running_key;		\
 	lockdep_set_class(&(dev)->addr_list_lock,		\
 			  &dev_addr_list_lock_key);		\
 	for (i = 0; i < (dev)->num_tx_queues; i++)		\
diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 52b87588f467b..7aa2b8e1fb298 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -46,18 +46,15 @@ int gnet_stats_start_copy_compat(struct sk_buff *skb, int type,
 				 spinlock_t *lock, struct gnet_dump *d,
 				 int padattr);
 
-int gnet_stats_copy_basic(const seqcount_t *running,
-			  struct gnet_dump *d,
+int gnet_stats_copy_basic(struct gnet_dump *d,
 			  struct gnet_stats_basic_sync __percpu *cpu,
-			  struct gnet_stats_basic_sync *b);
-void gnet_stats_add_basic(const seqcount_t *running,
-			  struct gnet_stats_basic_sync *bstats,
+			  struct gnet_stats_basic_sync *b, bool running);
+void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
 			  struct gnet_stats_basic_sync __percpu *cpu,
-			  struct gnet_stats_basic_sync *b);
-int gnet_stats_copy_basic_hw(const seqcount_t *running,
-			     struct gnet_dump *d,
+			  struct gnet_stats_basic_sync *b, bool running);
+int gnet_stats_copy_basic_hw(struct gnet_dump *d,
 			     struct gnet_stats_basic_sync __percpu *cpu,
-			     struct gnet_stats_basic_sync *b);
+			     struct gnet_stats_basic_sync *b, bool running);
 int gnet_stats_copy_rate_est(struct gnet_dump *d,
 			     struct net_rate_estimator __rcu **ptr);
 int gnet_stats_copy_queue(struct gnet_dump *d,
@@ -74,13 +71,13 @@ int gen_new_estimator(struct gnet_stats_basic_sync *bstats,
 		      struct gnet_stats_basic_sync __percpu *cpu_bstats,
 		      struct net_rate_estimator __rcu **rate_est,
 		      spinlock_t *lock,
-		      seqcount_t *running, struct nlattr *opt);
+		      bool running, struct nlattr *opt);
 void gen_kill_estimator(struct net_rate_estimator __rcu **ptr);
 int gen_replace_estimator(struct gnet_stats_basic_sync *bstats,
 			  struct gnet_stats_basic_sync __percpu *cpu_bstats,
 			  struct net_rate_estimator __rcu **ptr,
 			  spinlock_t *lock,
-			  seqcount_t *running, struct nlattr *opt);
+			  bool running, struct nlattr *opt);
 bool gen_estimator_active(struct net_rate_estimator __rcu **ptr);
 bool gen_estimator_read(struct net_rate_estimator __rcu **ptr,
 			struct gnet_stats_rate_est64 *sample);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 7882e3aa64482..baad2ab4d971c 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -38,6 +38,10 @@ enum qdisc_state_t {
 	__QDISC_STATE_DEACTIVATED,
 	__QDISC_STATE_MISSED,
 	__QDISC_STATE_DRAINING,
+	/* Only for !TCQ_F_NOLOCK qdisc. Never access it directly.
+	 * Use qdisc_run_begin/end() or qdisc_is_running() instead.
+	 */
+	__QDISC_STATE_RUNNING,
 };
 
 #define QDISC_STATE_MISSED	BIT(__QDISC_STATE_MISSED)
@@ -108,7 +112,6 @@ struct Qdisc {
 	struct sk_buff_head	gso_skb ____cacheline_aligned_in_smp;
 	struct qdisc_skb_head	q;
 	struct gnet_stats_basic_sync bstats;
-	seqcount_t		running;
 	struct gnet_stats_queue	qstats;
 	unsigned long		state;
 	struct Qdisc            *next_sched;
@@ -143,11 +146,15 @@ static inline struct Qdisc *qdisc_refcount_inc_nz(struct Qdisc *qdisc)
 	return NULL;
 }
 
+/* For !TCQ_F_NOLOCK qdisc: callers must either call this within a qdisc
+ * root_lock section, or provide their own memory barriers -- ordering
+ * against qdisc_run_begin/end() atomic bit operations.
+ */
 static inline bool qdisc_is_running(struct Qdisc *qdisc)
 {
 	if (qdisc->flags & TCQ_F_NOLOCK)
 		return spin_is_locked(&qdisc->seqlock);
-	return (raw_read_seqcount(&qdisc->running) & 1) ? true : false;
+	return test_bit(__QDISC_STATE_RUNNING, &qdisc->state);
 }
 
 static inline bool nolock_qdisc_is_empty(const struct Qdisc *qdisc)
@@ -167,6 +174,9 @@ static inline bool qdisc_is_empty(const struct Qdisc *qdisc)
 	return !READ_ONCE(qdisc->q.qlen);
 }
 
+/* For !TCQ_F_NOLOCK qdisc, qdisc_run_begin/end() must be invoked with
+ * the qdisc root lock acquired.
+ */
 static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 {
 	if (qdisc->flags & TCQ_F_NOLOCK) {
@@ -206,15 +216,8 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 		 * after it releases the lock at the end of qdisc_run_end().
 		 */
 		return spin_trylock(&qdisc->seqlock);
-	} else if (qdisc_is_running(qdisc)) {
-		return false;
 	}
-	/* Variant of write_seqcount_begin() telling lockdep a trylock
-	 * was attempted.
-	 */
-	raw_write_seqcount_begin(&qdisc->running);
-	seqcount_acquire(&qdisc->running.dep_map, 0, 1, _RET_IP_);
-	return true;
+	return test_and_set_bit(__QDISC_STATE_RUNNING, &qdisc->state);
 }
 
 static inline void qdisc_run_end(struct Qdisc *qdisc)
@@ -226,7 +229,7 @@ static inline void qdisc_run_end(struct Qdisc *qdisc)
 				      &qdisc->state)))
 			__netif_schedule(qdisc);
 	} else {
-		write_seqcount_end(&qdisc->running);
+		clear_bit(__QDISC_STATE_RUNNING, &qdisc->state);
 	}
 }
 
@@ -592,14 +595,6 @@ static inline spinlock_t *qdisc_root_sleeping_lock(const struct Qdisc *qdisc)
 	return qdisc_lock(root);
 }
 
-static inline seqcount_t *qdisc_root_sleeping_running(const struct Qdisc *qdisc)
-{
-	struct Qdisc *root = qdisc_root_sleeping(qdisc);
-
-	ASSERT_RTNL();
-	return &root->running;
-}
-
 static inline struct net_device *qdisc_dev(const struct Qdisc *qdisc)
 {
 	return qdisc->dev_queue->dev;
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index a73ad0bf324c4..4fcbdd71c59fa 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -42,7 +42,7 @@
 struct net_rate_estimator {
 	struct gnet_stats_basic_sync	*bstats;
 	spinlock_t		*stats_lock;
-	seqcount_t		*running;
+	bool			running;
 	struct gnet_stats_basic_sync __percpu *cpu_bstats;
 	u8			ewma_log;
 	u8			intvl_log; /* period : (250ms << intvl_log) */
@@ -66,7 +66,7 @@ static void est_fetch_counters(struct net_rate_estimator *e,
 	if (e->stats_lock)
 		spin_lock(e->stats_lock);
 
-	gnet_stats_add_basic(e->running, b, e->cpu_bstats, e->bstats);
+	gnet_stats_add_basic(b, e->cpu_bstats, e->bstats, e->running);
 
 	if (e->stats_lock)
 		spin_unlock(e->stats_lock);
@@ -113,7 +113,9 @@ static void est_timer(struct timer_list *t)
  * @cpu_bstats: bstats per cpu
  * @rate_est: rate estimator statistics
  * @lock: lock for statistics and control path
- * @running: qdisc running seqcount
+ * @running: true if @bstats represents a running qdisc, thus @bstats'
+ *           internal values might change during basic reads. Only used
+ *           if @bstats_cpu is NULL
  * @opt: rate estimator configuration TLV
  *
  * Creates a new rate estimator with &bstats as source and &rate_est
@@ -129,7 +131,7 @@ int gen_new_estimator(struct gnet_stats_basic_sync *bstats,
 		      struct gnet_stats_basic_sync __percpu *cpu_bstats,
 		      struct net_rate_estimator __rcu **rate_est,
 		      spinlock_t *lock,
-		      seqcount_t *running,
+		      bool running,
 		      struct nlattr *opt)
 {
 	struct gnet_estimator *parm = nla_data(opt);
@@ -218,7 +220,9 @@ EXPORT_SYMBOL(gen_kill_estimator);
  * @cpu_bstats: bstats per cpu
  * @rate_est: rate estimator statistics
  * @lock: lock for statistics and control path
- * @running: qdisc running seqcount (might be NULL)
+ * @running: true if @bstats represents a running qdisc, thus @bstats'
+ *           internal values might change during basic reads. Only used
+ *           if @cpu_bstats is NULL
  * @opt: rate estimator configuration TLV
  *
  * Replaces the configuration of a rate estimator by calling
@@ -230,7 +234,7 @@ int gen_replace_estimator(struct gnet_stats_basic_sync *bstats,
 			  struct gnet_stats_basic_sync __percpu *cpu_bstats,
 			  struct net_rate_estimator __rcu **rate_est,
 			  spinlock_t *lock,
-			  seqcount_t *running, struct nlattr *opt)
+			  bool running, struct nlattr *opt)
 {
 	return gen_new_estimator(bstats, cpu_bstats, rate_est,
 				 lock, running, opt);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 5f57f761def69..5516ea0d5da0b 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -146,42 +146,42 @@ static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_sync *bstats,
 	_bstats_update(bstats, t_bytes, t_packets);
 }
 
-void gnet_stats_add_basic(const seqcount_t *running,
-			  struct gnet_stats_basic_sync *bstats,
+void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
 			  struct gnet_stats_basic_sync __percpu *cpu,
-			  struct gnet_stats_basic_sync *b)
+			  struct gnet_stats_basic_sync *b, bool running)
 {
-	unsigned int seq;
+	unsigned int start;
 	u64 bytes = 0;
 	u64 packets = 0;
 
+	WARN_ON_ONCE((cpu || running) && !in_task());
+
 	if (cpu) {
 		gnet_stats_add_basic_cpu(bstats, cpu);
 		return;
 	}
 	do {
 		if (running)
-			seq = read_seqcount_begin(running);
+			start = u64_stats_fetch_begin_irq(&b->syncp);
 		bytes = u64_stats_read(&b->bytes);
 		packets = u64_stats_read(&b->packets);
-	} while (running && read_seqcount_retry(running, seq));
+	} while (running && u64_stats_fetch_retry_irq(&b->syncp, start));
 
 	_bstats_update(bstats, bytes, packets);
 }
 EXPORT_SYMBOL(gnet_stats_add_basic);
 
 static int
-___gnet_stats_copy_basic(const seqcount_t *running,
-			 struct gnet_dump *d,
+___gnet_stats_copy_basic(struct gnet_dump *d,
 			 struct gnet_stats_basic_sync __percpu *cpu,
 			 struct gnet_stats_basic_sync *b,
-			 int type)
+			 int type, bool running)
 {
 	struct gnet_stats_basic_sync bstats;
 	u64 bstats_bytes, bstats_packets;
 
 	gnet_stats_basic_sync_init(&bstats);
-	gnet_stats_add_basic(running, &bstats, cpu, b);
+	gnet_stats_add_basic(&bstats, cpu, b, running);
 
 	bstats_bytes = u64_stats_read(&bstats.bytes);
 	bstats_packets = u64_stats_read(&bstats.packets);
@@ -210,10 +210,14 @@ ___gnet_stats_copy_basic(const seqcount_t *running,
 
 /**
  * gnet_stats_copy_basic - copy basic statistics into statistic TLV
- * @running: seqcount_t pointer
  * @d: dumping handle
  * @cpu: copy statistic per cpu
  * @b: basic statistics
+ * @running: true if @b represents a running qdisc, thus @b's
+ *           internal values might change during basic reads.
+ *           Only used if @cpu is NULL
+ *
+ * Context: task; must not be run from IRQ or BH contexts
  *
  * Appends the basic statistics to the top level TLV created by
  * gnet_stats_start_copy().
@@ -222,22 +226,25 @@ ___gnet_stats_copy_basic(const seqcount_t *running,
  * if the room in the socket buffer was not sufficient.
  */
 int
-gnet_stats_copy_basic(const seqcount_t *running,
-		      struct gnet_dump *d,
+gnet_stats_copy_basic(struct gnet_dump *d,
 		      struct gnet_stats_basic_sync __percpu *cpu,
-		      struct gnet_stats_basic_sync *b)
+		      struct gnet_stats_basic_sync *b,
+		      bool running)
 {
-	return ___gnet_stats_copy_basic(running, d, cpu, b,
-					TCA_STATS_BASIC);
+	return ___gnet_stats_copy_basic(d, cpu, b, TCA_STATS_BASIC, running);
 }
 EXPORT_SYMBOL(gnet_stats_copy_basic);
 
 /**
  * gnet_stats_copy_basic_hw - copy basic hw statistics into statistic TLV
- * @running: seqcount_t pointer
  * @d: dumping handle
  * @cpu: copy statistic per cpu
  * @b: basic statistics
+ * @running: true if @b represents a running qdisc, thus @b's
+ *           internal values might change during basic reads.
+ *           Only used if @cpu is NULL
+ *
+ * Context: task; must not be run from IRQ or BH contexts
  *
  * Appends the basic statistics to the top level TLV created by
  * gnet_stats_start_copy().
@@ -246,13 +253,12 @@ EXPORT_SYMBOL(gnet_stats_copy_basic);
  * if the room in the socket buffer was not sufficient.
  */
 int
-gnet_stats_copy_basic_hw(const seqcount_t *running,
-			 struct gnet_dump *d,
+gnet_stats_copy_basic_hw(struct gnet_dump *d,
 			 struct gnet_stats_basic_sync __percpu *cpu,
-			 struct gnet_stats_basic_sync *b)
+			 struct gnet_stats_basic_sync *b,
+			 bool running)
 {
-	return ___gnet_stats_copy_basic(running, d, cpu, b,
-					TCA_STATS_BASIC_HW);
+	return ___gnet_stats_copy_basic(d, cpu, b, TCA_STATS_BASIC_HW, running);
 }
 EXPORT_SYMBOL(gnet_stats_copy_basic_hw);
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 585829ffa0c4c..4133b8ea5a57a 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -501,7 +501,7 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 	if (est) {
 		err = gen_new_estimator(&p->tcfa_bstats, p->cpu_bstats,
 					&p->tcfa_rate_est,
-					&p->tcfa_lock, NULL, est);
+					&p->tcfa_lock, false, est);
 		if (err)
 			goto err4;
 	}
@@ -1173,9 +1173,10 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *p,
 	if (err < 0)
 		goto errout;
 
-	if (gnet_stats_copy_basic(NULL, &d, p->cpu_bstats, &p->tcfa_bstats) < 0 ||
-	    gnet_stats_copy_basic_hw(NULL, &d, p->cpu_bstats_hw,
-				     &p->tcfa_bstats_hw) < 0 ||
+	if (gnet_stats_copy_basic(&d, p->cpu_bstats,
+				  &p->tcfa_bstats, false) < 0 ||
+	    gnet_stats_copy_basic_hw(&d, p->cpu_bstats_hw,
+				     &p->tcfa_bstats_hw, false) < 0 ||
 	    gnet_stats_copy_rate_est(&d, &p->tcfa_rate_est) < 0 ||
 	    gnet_stats_copy_queue(&d, p->cpu_qstats,
 				  &p->tcfa_qstats,
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index c9383805222df..9e77ba8401e53 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -125,7 +125,7 @@ static int tcf_police_init(struct net *net, struct nlattr *nla,
 					    police->common.cpu_bstats,
 					    &police->tcf_rate_est,
 					    &police->tcf_lock,
-					    NULL, est);
+					    false, est);
 		if (err)
 			goto failure;
 	} else if (tb[TCA_POLICE_AVRATE] &&
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 70f006cbf2126..efcd0b5e9a323 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -943,8 +943,7 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 		cpu_qstats = q->cpu_qstats;
 	}
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(q),
-				  &d, cpu_bstats, &q->bstats) < 0 ||
+	if (gnet_stats_copy_basic(&d, cpu_bstats, &q->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(&d, &q->rate_est) < 0 ||
 	    gnet_stats_copy_queue(&d, cpu_qstats, &q->qstats, qlen) < 0)
 		goto nla_put_failure;
@@ -1265,26 +1264,17 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 		rcu_assign_pointer(sch->stab, stab);
 	}
 	if (tca[TCA_RATE]) {
-		seqcount_t *running;
-
 		err = -EOPNOTSUPP;
 		if (sch->flags & TCQ_F_MQROOT) {
 			NL_SET_ERR_MSG(extack, "Cannot attach rate estimator to a multi-queue root qdisc");
 			goto err_out4;
 		}
 
-		if (sch->parent != TC_H_ROOT &&
-		    !(sch->flags & TCQ_F_INGRESS) &&
-		    (!p || !(p->flags & TCQ_F_MQROOT)))
-			running = qdisc_root_sleeping_running(sch);
-		else
-			running = &sch->running;
-
 		err = gen_new_estimator(&sch->bstats,
 					sch->cpu_bstats,
 					&sch->rate_est,
 					NULL,
-					running,
+					true,
 					tca[TCA_RATE]);
 		if (err) {
 			NL_SET_ERR_MSG(extack, "Failed to generate new estimator");
@@ -1360,7 +1350,7 @@ static int qdisc_change(struct Qdisc *sch, struct nlattr **tca,
 				      sch->cpu_bstats,
 				      &sch->rate_est,
 				      NULL,
-				      qdisc_root_sleeping_running(sch),
+				      true,
 				      tca[TCA_RATE]);
 	}
 out:
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index fbfe4ce9497b5..4c8e994cf0a53 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -653,8 +653,7 @@ atm_tc_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 {
 	struct atm_flow_data *flow = (struct atm_flow_data *)arg;
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, NULL, &flow->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &flow->bstats, true) < 0 ||
 	    gnet_stats_copy_queue(d, NULL, &flow->qstats, flow->q->q.qlen) < 0)
 		return -1;
 
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index f0b1282fae111..02d9f0dfe3564 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1383,8 +1383,7 @@ cbq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	if (cl->undertime != PSCHED_PASTPERFECT)
 		cl->xstats.undertime = cl->undertime - q->now;
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, NULL, &cl->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, NULL, &cl->qstats, qlen) < 0)
 		return -1;
@@ -1518,7 +1517,7 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
 			err = gen_replace_estimator(&cl->bstats, NULL,
 						    &cl->rate_est,
 						    NULL,
-						    qdisc_root_sleeping_running(sch),
+						    true,
 						    tca[TCA_RATE]);
 			if (err) {
 				NL_SET_ERR_MSG(extack, "Failed to replace specified rate estimator");
@@ -1619,9 +1618,7 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
 
 	if (tca[TCA_RATE]) {
 		err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est,
-					NULL,
-					qdisc_root_sleeping_running(sch),
-					tca[TCA_RATE]);
+					NULL, true, tca[TCA_RATE]);
 		if (err) {
 			NL_SET_ERR_MSG(extack, "Couldn't create new estimator");
 			tcf_block_put(cl->block);
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 7243617a3595f..18e4f7a0b2912 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -85,8 +85,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		if (tca[TCA_RATE]) {
 			err = gen_replace_estimator(&cl->bstats, NULL,
 						    &cl->rate_est,
-						    NULL,
-						    qdisc_root_sleeping_running(sch),
+						    NULL, true,
 						    tca[TCA_RATE]);
 			if (err) {
 				NL_SET_ERR_MSG(extack, "Failed to replace estimator");
@@ -119,9 +118,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 
 	if (tca[TCA_RATE]) {
 		err = gen_replace_estimator(&cl->bstats, NULL, &cl->rate_est,
-					    NULL,
-					    qdisc_root_sleeping_running(sch),
-					    tca[TCA_RATE]);
+					    NULL, true, tca[TCA_RATE]);
 		if (err) {
 			NL_SET_ERR_MSG(extack, "Failed to replace estimator");
 			qdisc_put(cl->qdisc);
@@ -268,8 +265,7 @@ static int drr_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	if (qlen)
 		xstats.deficit = cl->deficit;
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, NULL, &cl->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, cl_q->cpu_qstats, &cl_q->qstats, qlen) < 0)
 		return -1;
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index af56d155e7fca..0eae9ff5edf6f 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -325,8 +325,7 @@ static int ets_class_dump_stats(struct Qdisc *sch, unsigned long arg,
 	struct ets_class *cl = ets_class_from_arg(sch, arg);
 	struct Qdisc *cl_q = cl->qdisc;
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, NULL, &cl_q->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &cl_q->bstats, true) < 0 ||
 	    qdisc_qstats_copy(d, cl_q) < 0)
 		return -1;
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 989186e7f1a02..b0ff0dff27734 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -304,8 +304,8 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 
 /*
  * Transmit possibly several skbs, and handle the return status as
- * required. Owning running seqcount bit guarantees that
- * only one CPU can execute this function.
+ * required. Owning qdisc running bit guarantees that only one CPU
+ * can execute this function.
  *
  * Returns to the caller:
  *				false  - hardware queue frozen backoff
@@ -606,7 +606,6 @@ struct Qdisc noop_qdisc = {
 	.ops		=	&noop_qdisc_ops,
 	.q.lock		=	__SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
 	.dev_queue	=	&noop_netdev_queue,
-	.running	=	SEQCNT_ZERO(noop_qdisc.running),
 	.busylock	=	__SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
 	.gso_skb = {
 		.next = (struct sk_buff *)&noop_qdisc.gso_skb,
@@ -867,7 +866,6 @@ struct Qdisc_ops pfifo_fast_ops __read_mostly = {
 EXPORT_SYMBOL(pfifo_fast_ops);
 
 static struct lock_class_key qdisc_tx_busylock;
-static struct lock_class_key qdisc_running_key;
 
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 			  const struct Qdisc_ops *ops,
@@ -917,10 +915,6 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 	lockdep_set_class(&sch->seqlock,
 			  dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
 
-	seqcount_init(&sch->running);
-	lockdep_set_class(&sch->running,
-			  dev->qdisc_running_key ?: &qdisc_running_key);
-
 	sch->ops = ops;
 	sch->flags = ops->static_flags;
 	sch->enqueue = ops->enqueue;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 181c2905ff983..d3979a6000e7d 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -965,7 +965,7 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 			err = gen_replace_estimator(&cl->bstats, NULL,
 						    &cl->rate_est,
 						    NULL,
-						    qdisc_root_sleeping_running(sch),
+						    true,
 						    tca[TCA_RATE]);
 			if (err)
 				return err;
@@ -1033,9 +1033,7 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 
 	if (tca[TCA_RATE]) {
 		err = gen_new_estimator(&cl->bstats, NULL, &cl->rate_est,
-					NULL,
-					qdisc_root_sleeping_running(sch),
-					tca[TCA_RATE]);
+					NULL, true, tca[TCA_RATE]);
 		if (err) {
 			tcf_block_put(cl->block);
 			kfree(cl);
@@ -1328,7 +1326,7 @@ hfsc_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	xstats.work    = cl->cl_total;
 	xstats.rtwork  = cl->cl_cumul;
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), d, NULL, &cl->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, NULL, &cl->qstats, qlen) < 0)
 		return -1;
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index adceb9e210f61..cf1d45db4e84b 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1368,8 +1368,7 @@ htb_dump_class_stats(struct Qdisc *sch, unsigned long arg, struct gnet_dump *d)
 		}
 	}
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, NULL, &cl->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
 	    gnet_stats_copy_queue(d, NULL, &qs, qlen) < 0)
 		return -1;
@@ -1865,7 +1864,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 			err = gen_new_estimator(&cl->bstats, NULL,
 						&cl->rate_est,
 						NULL,
-						qdisc_root_sleeping_running(sch),
+						true,
 						tca[TCA_RATE] ? : &est.nla);
 			if (err)
 				goto err_block_put;
@@ -1991,7 +1990,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 			err = gen_replace_estimator(&cl->bstats, NULL,
 						    &cl->rate_est,
 						    NULL,
-						    qdisc_root_sleeping_running(sch),
+						    true,
 						    tca[TCA_RATE]);
 			if (err)
 				return err;
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index cedd0b3ef9cfb..83d2e54bf303a 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -144,8 +144,8 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
 		spin_lock_bh(qdisc_lock(qdisc));
 
-		gnet_stats_add_basic(NULL, &sch->bstats, qdisc->cpu_bstats,
-				     &qdisc->bstats);
+		gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats,
+				     &qdisc->bstats, false);
 		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
 				     &qdisc->qstats);
 		sch->q.qlen += qdisc_qlen(qdisc);
@@ -231,8 +231,7 @@ static int mq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct netdev_queue *dev_queue = mq_queue_get(sch, cl);
 
 	sch = dev_queue->qdisc_sleeping;
-	if (gnet_stats_copy_basic(&sch->running, d, sch->cpu_bstats,
-				  &sch->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, sch->cpu_bstats, &sch->bstats, true) < 0 ||
 	    qdisc_qstats_copy(d, sch) < 0)
 		return -1;
 	return 0;
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 3f7f756f92ca3..b29f3453c6eaf 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -402,8 +402,8 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
 		spin_lock_bh(qdisc_lock(qdisc));
 
-		gnet_stats_add_basic(NULL, &sch->bstats, qdisc->cpu_bstats,
-				     &qdisc->bstats);
+		gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats,
+				     &qdisc->bstats, false);
 		gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
 				     &qdisc->qstats);
 		sch->q.qlen += qdisc_qlen(qdisc);
@@ -519,8 +519,8 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 			spin_lock_bh(qdisc_lock(qdisc));
 
-			gnet_stats_add_basic(NULL, &bstats, qdisc->cpu_bstats,
-					     &qdisc->bstats);
+			gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
+					     &qdisc->bstats, false);
 			gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
 					     &qdisc->qstats);
 			sch->q.qlen += qdisc_qlen(qdisc);
@@ -532,15 +532,15 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 		/* Reclaim root sleeping lock before completing stats */
 		if (d->lock)
 			spin_lock_bh(d->lock);
-		if (gnet_stats_copy_basic(NULL, d, NULL, &bstats) < 0 ||
+		if (gnet_stats_copy_basic(d, NULL, &bstats, false) < 0 ||
 		    gnet_stats_copy_queue(d, NULL, &qstats, qlen) < 0)
 			return -1;
 	} else {
 		struct netdev_queue *dev_queue = mqprio_queue_get(sch, cl);
 
 		sch = dev_queue->qdisc_sleeping;
-		if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch), d,
-					  sch->cpu_bstats, &sch->bstats) < 0 ||
+		if (gnet_stats_copy_basic(d, sch->cpu_bstats,
+					  &sch->bstats, true) < 0 ||
 		    qdisc_qstats_copy(d, sch) < 0)
 			return -1;
 	}
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index e282e7382117a..cd8ab90c4765d 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -338,8 +338,7 @@ static int multiq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct Qdisc *cl_q;
 
 	cl_q = q->queues[cl - 1];
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, cl_q->cpu_bstats, &cl_q->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, cl_q->cpu_bstats, &cl_q->bstats, true) < 0 ||
 	    qdisc_qstats_copy(d, cl_q) < 0)
 		return -1;
 
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 03fdf31ccb6af..3b8d7197c06bf 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -361,8 +361,8 @@ static int prio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct Qdisc *cl_q;
 
 	cl_q = q->queues[cl - 1];
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, cl_q->cpu_bstats, &cl_q->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, cl_q->cpu_bstats,
+				  &cl_q->bstats, true) < 0 ||
 	    qdisc_qstats_copy(d, cl_q) < 0)
 		return -1;
 
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index a35200f591a2d..0b7f9ba28deb0 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -451,7 +451,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 			err = gen_replace_estimator(&cl->bstats, NULL,
 						    &cl->rate_est,
 						    NULL,
-						    qdisc_root_sleeping_running(sch),
+						    true,
 						    tca[TCA_RATE]);
 			if (err)
 				return err;
@@ -478,7 +478,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		err = gen_new_estimator(&cl->bstats, NULL,
 					&cl->rate_est,
 					NULL,
-					qdisc_root_sleeping_running(sch),
+					true,
 					tca[TCA_RATE]);
 		if (err)
 			goto destroy_class;
@@ -640,8 +640,7 @@ static int qfq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
 	xstats.weight = cl->agg->class_weight;
 	xstats.lmax = cl->agg->lmax;
 
-	if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
-				  d, NULL, &cl->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &cl->bstats, true) < 0 ||
 	    gnet_stats_copy_rate_est(d, &cl->rate_est) < 0 ||
 	    qdisc_qstats_copy(d, cl->qdisc) < 0)
 		return -1;
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index b9fd18d986464..9ab068fa2672b 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1977,7 +1977,7 @@ static int taprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct netdev_queue *dev_queue = taprio_queue_get(sch, cl);
 
 	sch = dev_queue->qdisc_sleeping;
-	if (gnet_stats_copy_basic(&sch->running, d, NULL, &sch->bstats) < 0 ||
+	if (gnet_stats_copy_basic(d, NULL, &sch->bstats, true) < 0 ||
 	    qdisc_qstats_copy(d, sch) < 0)
 		return -1;
 	return 0;
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 9/9] net: sched: Remove Qdisc::running sequence counter
  2021-10-16  8:49 ` [PATCH net-next 9/9] net: sched: Remove Qdisc::running sequence counter Sebastian Andrzej Siewior
@ 2021-10-18 17:23   ` Eric Dumazet
  2021-10-18 18:30     ` Eric Dumazet
  2021-10-19 10:12     ` [PATCH net-next] net: sched: Allow statistics reads from softirq Sebastian Andrzej Siewior
  0 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2021-10-18 17:23 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, netdev, netfilter-devel
  Cc: Jakub Kicinski, David S. Miller, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Ahmed S. Darwish, Eric Dumazet, Thomas Gleixner



On 10/16/21 1:49 AM, Sebastian Andrzej Siewior wrote:
> From: "Ahmed S. Darwish" <a.darwish@linutronix.de>
> 
> The Qdisc::running sequence counter has two uses:
> 
>   1. Reliably reading qdisc's tc statistics while the qdisc is running
>      (a seqcount read/retry loop at gnet_stats_add_basic()).
> 
>   2. As a flag, indicating whether the qdisc in question is running
>      (without any retry loops).
> 
> For the first usage, the Qdisc::running sequence counter write section,
> qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
> is actually needed: the raw qdisc's bstats update. A u64_stats sync
> point was thus introduced (in previous commits) inside the bstats
> structure itself. A local u64_stats write section is then started and
> stopped for the bstats updates.
> 
> Use that u64_stats sync point mechanism for the bstats read/retry loop
> at gnet_stats_add_basic().
> 
> For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
> accessed with atomic bitops, is sufficient. Using a bit flag instead of
> a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
> to the SMP barriers implicitly added through raw_read_seqcount() and
> write_seqcount_begin/end() getting removed. All call sites have been
> surveyed though, and no required ordering was identified.
> 
> Now that the qdisc->running sequence counter is no longer used, remove
> it.
> 
> Note, using u64_stats implies no sequence counter protection for 64-bit
> architectures. This can lead to the qdisc tc statistics "packets" vs.
> "bytes" values getting out of sync on rare occasions. The individual
> values will still be valid.
> 
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>


I see this has been merged this week end before we could test this thing during work days :/

Just add a rate estimator on a qdisc:

tc qd add dev lo root est 1sec 4sec pfifo

then :

[  140.824352] ------------[ cut here ]------------
[  140.824361] WARNING: CPU: 15 PID: 0 at net/core/gen_stats.c:157 gnet_stats_add_basic+0x97/0xc0
[  140.824378] Modules linked in: ipvlan bonding vfat fat w1_therm i2c_mux_pca954x i2c_mux ds2482 wire cdc_acm ehci_pci ehci_hcd bnx2x mdio xt_TCPMSS ip6table_mangle ip6_tables ipv6
[  140.824413] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 5.15.0-smp-DEV #73
[  140.824415] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 2.48.0 10/02/2019
[  140.824417] RIP: 0010:gnet_stats_add_basic+0x97/0xc0
[  140.824420] Code: 2c 38 4a 03 5c 38 08 48 c7 c6 68 15 51 a4 e8 60 00 c7 ff 44 39 e0 72 db 89 d8 eb 05 31 c0 45 31 ed 4d 01 2e 49 01 46 08 eb 17 <0f> 0b 4d 85 ff 75 96 48 8b 02 48 8b 4a 08 49 01 06 89 c8 49 01 46
[  140.824432] RSP: 0018:ffff99415fbc5e08 EFLAGS: 00010206
[  140.824434] RAX: 0000000080000100 RBX: ffff9939812f41d0 RCX: 0000000000000001
[  140.824436] RDX: ffff99399705e0b0 RSI: 0000000000000000 RDI: ffff99415fbc5e40
[  140.824438] RBP: ffff99415fbc5e30 R08: 0000000000000000 R09: 0000000000000000
[  140.824440] R10: 0000000000000000 R11: ffffffffffffffff R12: ffff99415fbd7740
[  140.824441] R13: dead000000000122 R14: ffff99415fbc5e40 R15: 0000000000000000
[  140.824443] FS:  0000000000000000(0000) GS:ffff99415fbc0000(0000) knlGS:0000000000000000
[  140.824445] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  140.824447] CR2: 000000000087fff0 CR3: 0000000f11610006 CR4: 00000000000606e0
[  140.824449] Call Trace:
[  140.824450]  <IRQ>
[  140.824453]  ? local_bh_enable+0x20/0x20
[  140.824457]  est_timer+0x5e/0x130
[  140.824460]  call_timer_fn+0x2c/0x110
[  140.824464]  expire_timers+0x4c/0xf0
[  140.824467]  __run_timers+0x16f/0x1b0
[  140.824470]  run_timer_softirq+0x1d/0x40
[  140.824473]  __do_softirq+0x142/0x2a1
[  140.824477]  irq_exit_rcu+0x6b/0xb0
[  140.824480]  sysvec_apic_timer_interrupt+0x79/0x90
[  140.824483]  </IRQ>
[  140.824493]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  140.824497] RIP: 0010:cpuidle_enter_state+0x19b/0x300
[  140.824502] Code: ff 45 84 e4 74 20 48 c7 45 c8 00 00 00 00 9c 8f 45 c8 f7 45 c8 00 02 00 00 0f 85 e4 00 00 00 31 ff e8 c9 0d 88 ff fb 8b 45 bc <85> c0 78 52 48 89 de 89 c3 48 6b d3 68 48 8b 4c 16 48 4c 2b 6d b0
[  140.824503] RSP: 0018:ffff99398089be60 EFLAGS: 00000246
[  140.824505] RAX: 0000000000000004 RBX: ffffffffa446cb28 RCX: 000000000000001f
[  140.824506] RDX: 000000000000000f RSI: 0000000000000000 RDI: 0000000000000000
[  140.824507] RBP: ffff99398089beb0 R08: 0000000000000002 R09: 00000020cf9326e4
[  140.824508] R10: 0000000000638824 R11: 0000000000000000 R12: 0000000000000000
[  140.824509] R13: 00000020c9c8d180 R14: ffffc4733fbe1c50 R15: 0000000000000004
[  140.824511]  cpuidle_enter+0x2e/0x40
[  140.824514]  do_idle+0x19f/0x240
[  140.824517]  cpu_startup_entry+0x25/0x30
[  140.824519]  start_secondary+0x7c/0x80
[  140.824521]  secondary_startup_64_no_verify+0xc3/0xcb
[  140.824525] ---[ end trace d64fa4b3dc94b292 ]---



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 9/9] net: sched: Remove Qdisc::running sequence counter
  2021-10-18 17:23   ` Eric Dumazet
@ 2021-10-18 18:30     ` Eric Dumazet
  2021-10-18 19:24       ` Eric Dumazet
  2021-10-18 23:53       ` Eric Dumazet
  2021-10-19 10:12     ` [PATCH net-next] net: sched: Allow statistics reads from softirq Sebastian Andrzej Siewior
  1 sibling, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2021-10-18 18:30 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, netdev, netfilter-devel
  Cc: Jakub Kicinski, David S. Miller, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Ahmed S. Darwish, Eric Dumazet, Thomas Gleixner



On 10/18/21 10:23 AM, Eric Dumazet wrote:
> 
> 
> On 10/16/21 1:49 AM, Sebastian Andrzej Siewior wrote:
>> From: "Ahmed S. Darwish" <a.darwish@linutronix.de>
>>
>> The Qdisc::running sequence counter has two uses:
>>
>>   1. Reliably reading qdisc's tc statistics while the qdisc is running
>>      (a seqcount read/retry loop at gnet_stats_add_basic()).
>>
>>   2. As a flag, indicating whether the qdisc in question is running
>>      (without any retry loops).
>>
>> For the first usage, the Qdisc::running sequence counter write section,
>> qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
>> is actually needed: the raw qdisc's bstats update. A u64_stats sync
>> point was thus introduced (in previous commits) inside the bstats
>> structure itself. A local u64_stats write section is then started and
>> stopped for the bstats updates.
>>
>> Use that u64_stats sync point mechanism for the bstats read/retry loop
>> at gnet_stats_add_basic().
>>
>> For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
>> accessed with atomic bitops, is sufficient. Using a bit flag instead of
>> a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
>> to the SMP barriers implicitly added through raw_read_seqcount() and
>> write_seqcount_begin/end() getting removed. All call sites have been
>> surveyed though, and no required ordering was identified.
>>
>> Now that the qdisc->running sequence counter is no longer used, remove
>> it.
>>
>> Note, using u64_stats implies no sequence counter protection for 64-bit
>> architectures. This can lead to the qdisc tc statistics "packets" vs.
>> "bytes" values getting out of sync on rare occasions. The individual
>> values will still be valid.
>>
>> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> 
> I see this has been merged this week end before we could test this thing during work days :/
> 
> Just add a rate estimator on a qdisc:
> 
> tc qd add dev lo root est 1sec 4sec pfifo
> 
> then :
> 
> [  140.824352] ------------[ cut here ]------------
> [  140.824361] WARNING: CPU: 15 PID: 0 at net/core/gen_stats.c:157 gnet_stats_add_basic+0x97/0xc0
> [  140.824378] Modules linked in: ipvlan bonding vfat fat w1_therm i2c_mux_pca954x i2c_mux ds2482 wire cdc_acm ehci_pci ehci_hcd bnx2x mdio xt_TCPMSS ip6table_mangle ip6_tables ipv6
> [  140.824413] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 5.15.0-smp-DEV #73
> [  140.824415] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 2.48.0 10/02/2019
> [  140.824417] RIP: 0010:gnet_stats_add_basic+0x97/0xc0
> [  140.824420] Code: 2c 38 4a 03 5c 38 08 48 c7 c6 68 15 51 a4 e8 60 00 c7 ff 44 39 e0 72 db 89 d8 eb 05 31 c0 45 31 ed 4d 01 2e 49 01 46 08 eb 17 <0f> 0b 4d 85 ff 75 96 48 8b 02 48 8b 4a 08 49 01 06 89 c8 49 01 46
> [  140.824432] RSP: 0018:ffff99415fbc5e08 EFLAGS: 00010206
> [  140.824434] RAX: 0000000080000100 RBX: ffff9939812f41d0 RCX: 0000000000000001
> [  140.824436] RDX: ffff99399705e0b0 RSI: 0000000000000000 RDI: ffff99415fbc5e40
> [  140.824438] RBP: ffff99415fbc5e30 R08: 0000000000000000 R09: 0000000000000000
> [  140.824440] R10: 0000000000000000 R11: ffffffffffffffff R12: ffff99415fbd7740
> [  140.824441] R13: dead000000000122 R14: ffff99415fbc5e40 R15: 0000000000000000
> [  140.824443] FS:  0000000000000000(0000) GS:ffff99415fbc0000(0000) knlGS:0000000000000000
> [  140.824445] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  140.824447] CR2: 000000000087fff0 CR3: 0000000f11610006 CR4: 00000000000606e0
> [  140.824449] Call Trace:
> [  140.824450]  <IRQ>
> [  140.824453]  ? local_bh_enable+0x20/0x20
> [  140.824457]  est_timer+0x5e/0x130
> [  140.824460]  call_timer_fn+0x2c/0x110
> [  140.824464]  expire_timers+0x4c/0xf0
> [  140.824467]  __run_timers+0x16f/0x1b0
> [  140.824470]  run_timer_softirq+0x1d/0x40
> [  140.824473]  __do_softirq+0x142/0x2a1
> [  140.824477]  irq_exit_rcu+0x6b/0xb0
> [  140.824480]  sysvec_apic_timer_interrupt+0x79/0x90
> [  140.824483]  </IRQ>
> [  140.824493]  asm_sysvec_apic_timer_interrupt+0x12/0x20
> [  140.824497] RIP: 0010:cpuidle_enter_state+0x19b/0x300
> [  140.824502] Code: ff 45 84 e4 74 20 48 c7 45 c8 00 00 00 00 9c 8f 45 c8 f7 45 c8 00 02 00 00 0f 85 e4 00 00 00 31 ff e8 c9 0d 88 ff fb 8b 45 bc <85> c0 78 52 48 89 de 89 c3 48 6b d3 68 48 8b 4c 16 48 4c 2b 6d b0
> [  140.824503] RSP: 0018:ffff99398089be60 EFLAGS: 00000246
> [  140.824505] RAX: 0000000000000004 RBX: ffffffffa446cb28 RCX: 000000000000001f
> [  140.824506] RDX: 000000000000000f RSI: 0000000000000000 RDI: 0000000000000000
> [  140.824507] RBP: ffff99398089beb0 R08: 0000000000000002 R09: 00000020cf9326e4
> [  140.824508] R10: 0000000000638824 R11: 0000000000000000 R12: 0000000000000000
> [  140.824509] R13: 00000020c9c8d180 R14: ffffc4733fbe1c50 R15: 0000000000000004
> [  140.824511]  cpuidle_enter+0x2e/0x40
> [  140.824514]  do_idle+0x19f/0x240
> [  140.824517]  cpu_startup_entry+0x25/0x30
> [  140.824519]  start_secondary+0x7c/0x80
> [  140.824521]  secondary_startup_64_no_verify+0xc3/0xcb
> [  140.824525] ---[ end trace d64fa4b3dc94b292 ]---
> 
> 

Is it just me, or is net-next broken ?

Pinging the default gateway from idle host shows huge and variable delays.

Other hosts still using older kernels are just fine.

It looks we miss real qdisc_run() or something.

lpk43:~# ping6 fe80::1%eth0
PING fe80::1%eth0(fe80::1) 56 data bytes
64 bytes from fe80::1: icmp_seq=1 ttl=64 time=0.177 ms
64 bytes from fe80::1: icmp_seq=2 ttl=64 time=0.138 ms
64 bytes from fe80::1: icmp_seq=3 ttl=64 time=118 ms
64 bytes from fe80::1: icmp_seq=4 ttl=64 time=394 ms
64 bytes from fe80::1: icmp_seq=5 ttl=64 time=0.146 ms
64 bytes from fe80::1: icmp_seq=6 ttl=64 time=823 ms
64 bytes from fe80::1: icmp_seq=7 ttl=64 time=77.1 ms
64 bytes from fe80::1: icmp_seq=8 ttl=64 time=0.165 ms
64 bytes from fe80::1: icmp_seq=9 ttl=64 time=0.181 ms
64 bytes from fe80::1: icmp_seq=10 ttl=64 time=276 ms
64 bytes from fe80::1: icmp_seq=11 ttl=64 time=0.159 ms
64 bytes from fe80::1: icmp_seq=12 ttl=64 time=17.3 ms
64 bytes from fe80::1: icmp_seq=13 ttl=64 time=0.134 ms
64 bytes from fe80::1: icmp_seq=14 ttl=64 time=0.210 ms
64 bytes from fe80::1: icmp_seq=15 ttl=64 time=0.134 ms
64 bytes from fe80::1: icmp_seq=16 ttl=64 time=414 ms
64 bytes from fe80::1: icmp_seq=17 ttl=64 time=443 ms
64 bytes from fe80::1: icmp_seq=18 ttl=64 time=0.142 ms
64 bytes from fe80::1: icmp_seq=19 ttl=64 time=0.137 ms
64 bytes from fe80::1: icmp_seq=20 ttl=64 time=121 ms
64 bytes from fe80::1: icmp_seq=21 ttl=64 time=169 ms
^C
--- fe80::1%eth0 ping statistics ---
21 packets transmitted, 21 received, 0% packet loss, time 20300ms
rtt min/avg/max/mdev = 0.134/136.098/823.204/213.070 ms

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 9/9] net: sched: Remove Qdisc::running sequence counter
  2021-10-18 18:30     ` Eric Dumazet
@ 2021-10-18 19:24       ` Eric Dumazet
  2021-10-18 23:53       ` Eric Dumazet
  1 sibling, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2021-10-18 19:24 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, netdev, netfilter-devel
  Cc: Jakub Kicinski, David S. Miller, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Ahmed S. Darwish, Eric Dumazet, Thomas Gleixner



On 10/18/21 11:30 AM, Eric Dumazet wrote:
> 
> 
> On 10/18/21 10:23 AM, Eric Dumazet wrote:
>>
>>
>> On 10/16/21 1:49 AM, Sebastian Andrzej Siewior wrote:
>>> From: "Ahmed S. Darwish" <a.darwish@linutronix.de>
>>>
>>> The Qdisc::running sequence counter has two uses:
>>>
>>>   1. Reliably reading qdisc's tc statistics while the qdisc is running
>>>      (a seqcount read/retry loop at gnet_stats_add_basic()).
>>>
>>>   2. As a flag, indicating whether the qdisc in question is running
>>>      (without any retry loops).
>>>
>>> For the first usage, the Qdisc::running sequence counter write section,
>>> qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
>>> is actually needed: the raw qdisc's bstats update. A u64_stats sync
>>> point was thus introduced (in previous commits) inside the bstats
>>> structure itself. A local u64_stats write section is then started and
>>> stopped for the bstats updates.
>>>
>>> Use that u64_stats sync point mechanism for the bstats read/retry loop
>>> at gnet_stats_add_basic().
>>>
>>> For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
>>> accessed with atomic bitops, is sufficient. Using a bit flag instead of
>>> a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
>>> to the SMP barriers implicitly added through raw_read_seqcount() and
>>> write_seqcount_begin/end() getting removed. All call sites have been
>>> surveyed though, and no required ordering was identified.
>>>
>>> Now that the qdisc->running sequence counter is no longer used, remove
>>> it.
>>>
>>> Note, using u64_stats implies no sequence counter protection for 64-bit
>>> architectures. This can lead to the qdisc tc statistics "packets" vs.
>>> "bytes" values getting out of sync on rare occasions. The individual
>>> values will still be valid.
>>>
>>> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
>>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>>
>>
>> I see this has been merged this week end before we could test this thing during work days :/
>>
>> Just add a rate estimator on a qdisc:
>>
>> tc qd add dev lo root est 1sec 4sec pfifo
>>
>> then :
>>
>> [  140.824352] ------------[ cut here ]------------
>> [  140.824361] WARNING: CPU: 15 PID: 0 at net/core/gen_stats.c:157 gnet_stats_add_basic+0x97/0xc0
>> [  140.824378] Modules linked in: ipvlan bonding vfat fat w1_therm i2c_mux_pca954x i2c_mux ds2482 wire cdc_acm ehci_pci ehci_hcd bnx2x mdio xt_TCPMSS ip6table_mangle ip6_tables ipv6
>> [  140.824413] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 5.15.0-smp-DEV #73
>> [  140.824415] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 2.48.0 10/02/2019
>> [  140.824417] RIP: 0010:gnet_stats_add_basic+0x97/0xc0
>> [  140.824420] Code: 2c 38 4a 03 5c 38 08 48 c7 c6 68 15 51 a4 e8 60 00 c7 ff 44 39 e0 72 db 89 d8 eb 05 31 c0 45 31 ed 4d 01 2e 49 01 46 08 eb 17 <0f> 0b 4d 85 ff 75 96 48 8b 02 48 8b 4a 08 49 01 06 89 c8 49 01 46
>> [  140.824432] RSP: 0018:ffff99415fbc5e08 EFLAGS: 00010206
>> [  140.824434] RAX: 0000000080000100 RBX: ffff9939812f41d0 RCX: 0000000000000001
>> [  140.824436] RDX: ffff99399705e0b0 RSI: 0000000000000000 RDI: ffff99415fbc5e40
>> [  140.824438] RBP: ffff99415fbc5e30 R08: 0000000000000000 R09: 0000000000000000
>> [  140.824440] R10: 0000000000000000 R11: ffffffffffffffff R12: ffff99415fbd7740
>> [  140.824441] R13: dead000000000122 R14: ffff99415fbc5e40 R15: 0000000000000000
>> [  140.824443] FS:  0000000000000000(0000) GS:ffff99415fbc0000(0000) knlGS:0000000000000000
>> [  140.824445] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  140.824447] CR2: 000000000087fff0 CR3: 0000000f11610006 CR4: 00000000000606e0
>> [  140.824449] Call Trace:
>> [  140.824450]  <IRQ>
>> [  140.824453]  ? local_bh_enable+0x20/0x20
>> [  140.824457]  est_timer+0x5e/0x130
>> [  140.824460]  call_timer_fn+0x2c/0x110
>> [  140.824464]  expire_timers+0x4c/0xf0
>> [  140.824467]  __run_timers+0x16f/0x1b0
>> [  140.824470]  run_timer_softirq+0x1d/0x40
>> [  140.824473]  __do_softirq+0x142/0x2a1
>> [  140.824477]  irq_exit_rcu+0x6b/0xb0
>> [  140.824480]  sysvec_apic_timer_interrupt+0x79/0x90
>> [  140.824483]  </IRQ>
>> [  140.824493]  asm_sysvec_apic_timer_interrupt+0x12/0x20
>> [  140.824497] RIP: 0010:cpuidle_enter_state+0x19b/0x300
>> [  140.824502] Code: ff 45 84 e4 74 20 48 c7 45 c8 00 00 00 00 9c 8f 45 c8 f7 45 c8 00 02 00 00 0f 85 e4 00 00 00 31 ff e8 c9 0d 88 ff fb 8b 45 bc <85> c0 78 52 48 89 de 89 c3 48 6b d3 68 48 8b 4c 16 48 4c 2b 6d b0
>> [  140.824503] RSP: 0018:ffff99398089be60 EFLAGS: 00000246
>> [  140.824505] RAX: 0000000000000004 RBX: ffffffffa446cb28 RCX: 000000000000001f
>> [  140.824506] RDX: 000000000000000f RSI: 0000000000000000 RDI: 0000000000000000
>> [  140.824507] RBP: ffff99398089beb0 R08: 0000000000000002 R09: 00000020cf9326e4
>> [  140.824508] R10: 0000000000638824 R11: 0000000000000000 R12: 0000000000000000
>> [  140.824509] R13: 00000020c9c8d180 R14: ffffc4733fbe1c50 R15: 0000000000000004
>> [  140.824511]  cpuidle_enter+0x2e/0x40
>> [  140.824514]  do_idle+0x19f/0x240
>> [  140.824517]  cpu_startup_entry+0x25/0x30
>> [  140.824519]  start_secondary+0x7c/0x80
>> [  140.824521]  secondary_startup_64_no_verify+0xc3/0xcb
>> [  140.824525] ---[ end trace d64fa4b3dc94b292 ]---
>>
>>
> 
> Is it just me, or is net-next broken ?
> 
> Pinging the default gateway from idle host shows huge and variable delays.
> 
> Other hosts still using older kernels are just fine.
> 
> It looks we miss real qdisc_run() or something.
> 
> lpk43:~# ping6 fe80::1%eth0
> PING fe80::1%eth0(fe80::1) 56 data bytes
> 64 bytes from fe80::1: icmp_seq=1 ttl=64 time=0.177 ms
> 64 bytes from fe80::1: icmp_seq=2 ttl=64 time=0.138 ms
> 64 bytes from fe80::1: icmp_seq=3 ttl=64 time=118 ms
> 64 bytes from fe80::1: icmp_seq=4 ttl=64 time=394 ms
> 64 bytes from fe80::1: icmp_seq=5 ttl=64 time=0.146 ms
> 64 bytes from fe80::1: icmp_seq=6 ttl=64 time=823 ms
> 64 bytes from fe80::1: icmp_seq=7 ttl=64 time=77.1 ms
> 64 bytes from fe80::1: icmp_seq=8 ttl=64 time=0.165 ms
> 64 bytes from fe80::1: icmp_seq=9 ttl=64 time=0.181 ms
> 64 bytes from fe80::1: icmp_seq=10 ttl=64 time=276 ms
> 64 bytes from fe80::1: icmp_seq=11 ttl=64 time=0.159 ms
> 64 bytes from fe80::1: icmp_seq=12 ttl=64 time=17.3 ms
> 64 bytes from fe80::1: icmp_seq=13 ttl=64 time=0.134 ms
> 64 bytes from fe80::1: icmp_seq=14 ttl=64 time=0.210 ms
> 64 bytes from fe80::1: icmp_seq=15 ttl=64 time=0.134 ms
> 64 bytes from fe80::1: icmp_seq=16 ttl=64 time=414 ms
> 64 bytes from fe80::1: icmp_seq=17 ttl=64 time=443 ms
> 64 bytes from fe80::1: icmp_seq=18 ttl=64 time=0.142 ms
> 64 bytes from fe80::1: icmp_seq=19 ttl=64 time=0.137 ms
> 64 bytes from fe80::1: icmp_seq=20 ttl=64 time=121 ms
> 64 bytes from fe80::1: icmp_seq=21 ttl=64 time=169 ms
> ^C
> --- fe80::1%eth0 ping statistics ---
> 21 packets transmitted, 21 received, 0% packet loss, time 20300ms
> rtt min/avg/max/mdev = 0.134/136.098/823.204/213.070 ms
> 

Reverting 29cbcd858283 ("net: sched: Remove Qdisc::running sequence counter")
solves the issue for me.

I wonder if this patch has been tested with normal qdiscs (ie not pfifo_fast which is lockless)




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 9/9] net: sched: Remove Qdisc::running sequence counter
  2021-10-18 18:30     ` Eric Dumazet
  2021-10-18 19:24       ` Eric Dumazet
@ 2021-10-18 23:53       ` Eric Dumazet
  1 sibling, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2021-10-18 23:53 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, netdev, netfilter-devel
  Cc: Jakub Kicinski, David S. Miller, Pablo Neira Ayuso,
	Jozsef Kadlecsik, Florian Westphal, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Ahmed S. Darwish, Eric Dumazet, Thomas Gleixner



On 10/18/21 11:30 AM, Eric Dumazet wrote:
> 
> 
> On 10/18/21 10:23 AM, Eric Dumazet wrote:
>>
>>
>> On 10/16/21 1:49 AM, Sebastian Andrzej Siewior wrote:
>>> From: "Ahmed S. Darwish" <a.darwish@linutronix.de>
>>>
>>> The Qdisc::running sequence counter has two uses:
>>>
>>>   1. Reliably reading qdisc's tc statistics while the qdisc is running
>>>      (a seqcount read/retry loop at gnet_stats_add_basic()).
>>>
>>>   2. As a flag, indicating whether the qdisc in question is running
>>>      (without any retry loops).
>>>
>>> For the first usage, the Qdisc::running sequence counter write section,
>>> qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
>>> is actually needed: the raw qdisc's bstats update. A u64_stats sync
>>> point was thus introduced (in previous commits) inside the bstats
>>> structure itself. A local u64_stats write section is then started and
>>> stopped for the bstats updates.
>>>
>>> Use that u64_stats sync point mechanism for the bstats read/retry loop
>>> at gnet_stats_add_basic().
>>>
>>> For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
>>> accessed with atomic bitops, is sufficient. Using a bit flag instead of
>>> a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
>>> to the SMP barriers implicitly added through raw_read_seqcount() and
>>> write_seqcount_begin/end() getting removed. All call sites have been
>>> surveyed though, and no required ordering was identified.
>>>
>>> Now that the qdisc->running sequence counter is no longer used, remove
>>> it.
>>>
>>> Note, using u64_stats implies no sequence counter protection for 64-bit
>>> architectures. This can lead to the qdisc tc statistics "packets" vs.
>>> "bytes" values getting out of sync on rare occasions. The individual
>>> values will still be valid.
>>>
>>> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
>>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>>
>>
>> I see this has been merged this week end before we could test this thing during work days :/
>>
>> Just add a rate estimator on a qdisc:
>>
>> tc qd add dev lo root est 1sec 4sec pfifo
>>
>> then :
>>
>> [  140.824352] ------------[ cut here ]------------
>> [  140.824361] WARNING: CPU: 15 PID: 0 at net/core/gen_stats.c:157 gnet_stats_add_basic+0x97/0xc0
>> [  140.824378] Modules linked in: ipvlan bonding vfat fat w1_therm i2c_mux_pca954x i2c_mux ds2482 wire cdc_acm ehci_pci ehci_hcd bnx2x mdio xt_TCPMSS ip6table_mangle ip6_tables ipv6
>> [  140.824413] CPU: 15 PID: 0 Comm: swapper/15 Not tainted 5.15.0-smp-DEV #73
>> [  140.824415] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 2.48.0 10/02/2019
>> [  140.824417] RIP: 0010:gnet_stats_add_basic+0x97/0xc0
>> [  140.824420] Code: 2c 38 4a 03 5c 38 08 48 c7 c6 68 15 51 a4 e8 60 00 c7 ff 44 39 e0 72 db 89 d8 eb 05 31 c0 45 31 ed 4d 01 2e 49 01 46 08 eb 17 <0f> 0b 4d 85 ff 75 96 48 8b 02 48 8b 4a 08 49 01 06 89 c8 49 01 46
>> [  140.824432] RSP: 0018:ffff99415fbc5e08 EFLAGS: 00010206
>> [  140.824434] RAX: 0000000080000100 RBX: ffff9939812f41d0 RCX: 0000000000000001
>> [  140.824436] RDX: ffff99399705e0b0 RSI: 0000000000000000 RDI: ffff99415fbc5e40
>> [  140.824438] RBP: ffff99415fbc5e30 R08: 0000000000000000 R09: 0000000000000000
>> [  140.824440] R10: 0000000000000000 R11: ffffffffffffffff R12: ffff99415fbd7740
>> [  140.824441] R13: dead000000000122 R14: ffff99415fbc5e40 R15: 0000000000000000
>> [  140.824443] FS:  0000000000000000(0000) GS:ffff99415fbc0000(0000) knlGS:0000000000000000
>> [  140.824445] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  140.824447] CR2: 000000000087fff0 CR3: 0000000f11610006 CR4: 00000000000606e0
>> [  140.824449] Call Trace:
>> [  140.824450]  <IRQ>
>> [  140.824453]  ? local_bh_enable+0x20/0x20
>> [  140.824457]  est_timer+0x5e/0x130
>> [  140.824460]  call_timer_fn+0x2c/0x110
>> [  140.824464]  expire_timers+0x4c/0xf0
>> [  140.824467]  __run_timers+0x16f/0x1b0
>> [  140.824470]  run_timer_softirq+0x1d/0x40
>> [  140.824473]  __do_softirq+0x142/0x2a1
>> [  140.824477]  irq_exit_rcu+0x6b/0xb0
>> [  140.824480]  sysvec_apic_timer_interrupt+0x79/0x90
>> [  140.824483]  </IRQ>
>> [  140.824493]  asm_sysvec_apic_timer_interrupt+0x12/0x20
>> [  140.824497] RIP: 0010:cpuidle_enter_state+0x19b/0x300
>> [  140.824502] Code: ff 45 84 e4 74 20 48 c7 45 c8 00 00 00 00 9c 8f 45 c8 f7 45 c8 00 02 00 00 0f 85 e4 00 00 00 31 ff e8 c9 0d 88 ff fb 8b 45 bc <85> c0 78 52 48 89 de 89 c3 48 6b d3 68 48 8b 4c 16 48 4c 2b 6d b0
>> [  140.824503] RSP: 0018:ffff99398089be60 EFLAGS: 00000246
>> [  140.824505] RAX: 0000000000000004 RBX: ffffffffa446cb28 RCX: 000000000000001f
>> [  140.824506] RDX: 000000000000000f RSI: 0000000000000000 RDI: 0000000000000000
>> [  140.824507] RBP: ffff99398089beb0 R08: 0000000000000002 R09: 00000020cf9326e4
>> [  140.824508] R10: 0000000000638824 R11: 0000000000000000 R12: 0000000000000000
>> [  140.824509] R13: 00000020c9c8d180 R14: ffffc4733fbe1c50 R15: 0000000000000004
>> [  140.824511]  cpuidle_enter+0x2e/0x40
>> [  140.824514]  do_idle+0x19f/0x240
>> [  140.824517]  cpu_startup_entry+0x25/0x30
>> [  140.824519]  start_secondary+0x7c/0x80
>> [  140.824521]  secondary_startup_64_no_verify+0xc3/0xcb
>> [  140.824525] ---[ end trace d64fa4b3dc94b292 ]---
>>
>>
> 
> Is it just me, or is net-next broken ?
> 
> Pinging the default gateway from idle host shows huge and variable delays.
> 
> Other hosts still using older kernels are just fine.
> 
> It looks we miss real qdisc_run() or something.
> 
> lpk43:~# ping6 fe80::1%eth0
> PING fe80::1%eth0(fe80::1) 56 data bytes
> 64 bytes from fe80::1: icmp_seq=1 ttl=64 time=0.177 ms
> 64 bytes from fe80::1: icmp_seq=2 ttl=64 time=0.138 ms
> 64 bytes from fe80::1: icmp_seq=3 ttl=64 time=118 ms
> 64 bytes from fe80::1: icmp_seq=4 ttl=64 time=394 ms
> 64 bytes from fe80::1: icmp_seq=5 ttl=64 time=0.146 ms
> 64 bytes from fe80::1: icmp_seq=6 ttl=64 time=823 ms
> 64 bytes from fe80::1: icmp_seq=7 ttl=64 time=77.1 ms
> 64 bytes from fe80::1: icmp_seq=8 ttl=64 time=0.165 ms
> 64 bytes from fe80::1: icmp_seq=9 ttl=64 time=0.181 ms
> 64 bytes from fe80::1: icmp_seq=10 ttl=64 time=276 ms
> 64 bytes from fe80::1: icmp_seq=11 ttl=64 time=0.159 ms
> 64 bytes from fe80::1: icmp_seq=12 ttl=64 time=17.3 ms
> 64 bytes from fe80::1: icmp_seq=13 ttl=64 time=0.134 ms
> 64 bytes from fe80::1: icmp_seq=14 ttl=64 time=0.210 ms
> 64 bytes from fe80::1: icmp_seq=15 ttl=64 time=0.134 ms
> 64 bytes from fe80::1: icmp_seq=16 ttl=64 time=414 ms
> 64 bytes from fe80::1: icmp_seq=17 ttl=64 time=443 ms
> 64 bytes from fe80::1: icmp_seq=18 ttl=64 time=0.142 ms
> 64 bytes from fe80::1: icmp_seq=19 ttl=64 time=0.137 ms
> 64 bytes from fe80::1: icmp_seq=20 ttl=64 time=121 ms
> 64 bytes from fe80::1: icmp_seq=21 ttl=64 time=169 ms
> ^C
> --- fe80::1%eth0 ping statistics ---
> 21 packets transmitted, 21 received, 0% packet loss, time 20300ms
> rtt min/avg/max/mdev = 0.134/136.098/823.204/213.070 ms
> 

So the issue was about a reverted test_and_set_bit() in qdisc_run_begin()

Also, we should not use atomic operations when changing __QDISC_STATE_RUNNING,
this is adding yet another pair of atomic ops in tx fast path.

I will test something like :

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index baad2ab4d971cd3fdc8d59acdd72d39fa6230370..ada02c4a4f518b732d62561a22b1d9033516b494 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -38,10 +38,13 @@ enum qdisc_state_t {
        __QDISC_STATE_DEACTIVATED,
        __QDISC_STATE_MISSED,
        __QDISC_STATE_DRAINING,
+};
+
+enum qdisc_state2_t {
        /* Only for !TCQ_F_NOLOCK qdisc. Never access it directly.
         * Use qdisc_run_begin/end() or qdisc_is_running() instead.
         */
-       __QDISC_STATE_RUNNING,
+       __QDISC_STATE2_RUNNING,
 };
 
 #define QDISC_STATE_MISSED     BIT(__QDISC_STATE_MISSED)
@@ -114,6 +117,7 @@ struct Qdisc {
        struct gnet_stats_basic_sync bstats;
        struct gnet_stats_queue qstats;
        unsigned long           state;
+       unsigned long           state2; /* must be written under qdisc spinlock */
        struct Qdisc            *next_sched;
        struct sk_buff_head     skb_bad_txq;
 
@@ -154,7 +158,7 @@ static inline bool qdisc_is_running(struct Qdisc *qdisc)
 {
        if (qdisc->flags & TCQ_F_NOLOCK)
                return spin_is_locked(&qdisc->seqlock);
-       return test_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+       return test_bit(__QDISC_STATE2_RUNNING, &qdisc->state2);
 }
 
 static inline bool nolock_qdisc_is_empty(const struct Qdisc *qdisc)
@@ -217,7 +221,7 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
                 */
                return spin_trylock(&qdisc->seqlock);
        }
-       return test_and_set_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+       return !__test_and_set_bit(__QDISC_STATE2_RUNNING, &qdisc->state2);
 }
 
 static inline void qdisc_run_end(struct Qdisc *qdisc)
@@ -229,7 +233,7 @@ static inline void qdisc_run_end(struct Qdisc *qdisc)
                                      &qdisc->state)))
                        __netif_schedule(qdisc);
        } else {
-               clear_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+               __clear_bit(__QDISC_STATE2_RUNNING, &qdisc->state2);
        }
 }
 


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next] net: sched: Allow statistics reads from softirq.
  2021-10-18 17:23   ` Eric Dumazet
  2021-10-18 18:30     ` Eric Dumazet
@ 2021-10-19 10:12     ` Sebastian Andrzej Siewior
  2021-10-19 12:10       ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-19 10:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, netfilter-devel, Jakub Kicinski, David S. Miller,
	Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Ahmed S. Darwish,
	Eric Dumazet, Thomas Gleixner

Eric reported that the rate estimator reads statics from the softirq
which in turn triggers a warning introduced in the statistics rework.

The warning is too cautious. The updates happen in the softirq context
so reads from softirq are fine since the writes can not be preempted.
The updates/writes happen during qdisc_run() which ensures one writer
and the softirq context.
The remaining bad context for reading statistics remains in hard-IRQ
because it may preempt a writer.

Fixes: 29cbcd8582837 ("net: sched: Remove Qdisc::running sequence counter")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/gen_stats.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 5516ea0d5da0b..15c270e22c5ef 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -154,7 +154,7 @@ void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
 	u64 bytes = 0;
 	u64 packets = 0;
 
-	WARN_ON_ONCE((cpu || running) && !in_task());
+	WARN_ON_ONCE((cpu || running) && in_hardirq());
 
 	if (cpu) {
 		gnet_stats_add_basic_cpu(bstats, cpu);
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] net: sched: Allow statistics reads from softirq.
  2021-10-19 10:12     ` [PATCH net-next] net: sched: Allow statistics reads from softirq Sebastian Andrzej Siewior
@ 2021-10-19 12:10       ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 17+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-10-19 12:10 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: eric.dumazet, netdev, netfilter-devel, kuba, davem, pablo,
	kadlec, fw, jhs, xiyou.wangcong, jiri, a.darwish, edumazet, tglx

Hello:

This patch was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Tue, 19 Oct 2021 12:12:04 +0200 you wrote:
> Eric reported that the rate estimator reads statics from the softirq
> which in turn triggers a warning introduced in the statistics rework.
> 
> The warning is too cautious. The updates happen in the softirq context
> so reads from softirq are fine since the writes can not be preempted.
> The updates/writes happen during qdisc_run() which ensures one writer
> and the softirq context.
> The remaining bad context for reading statistics remains in hard-IRQ
> because it may preempt a writer.
> 
> [...]

Here is the summary with links:
  - [net-next] net: sched: Allow statistics reads from softirq.
    https://git.kernel.org/netdev/net-next/c/e22db7bd552f

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 6/9] net: sched: Protect Qdisc::bstats with u64_stats
  2021-10-16  8:49 ` [PATCH net-next 6/9] net: sched: Protect Qdisc::bstats with u64_stats Sebastian Andrzej Siewior
@ 2021-10-26 23:47   ` kernel test robot
  0 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2021-10-26 23:47 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4991 bytes --]

Hi Sebastian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Sebastian-Andrzej-Siewior/Try-to-simplify-the-gnet_stats-and-remove-qdisc-running-sequence-counter/20211016-165122
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 295711fa8fec42a55623bf6997d05a21d7855132
config: arc-allyesconfig (attached as .config)
compiler: arceb-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/3f7dcf3ef7b80c1b54674bb0dc3fef74971cd975
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Sebastian-Andrzej-Siewior/Try-to-simplify-the-gnet_stats-and-remove-qdisc-running-sequence-counter/20211016-165122
        git checkout 3f7dcf3ef7b80c1b54674bb0dc3fef74971cd975
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   net/sched/sch_gred.c: In function 'gred_offload':
>> net/sched/sch_gred.c:350:1: error: the frame size of 1164 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
     350 | }
         | ^
   cc1: all warnings being treated as errors


vim +350 net/sched/sch_gred.c

^1da177e4c3f41 Linus Torvalds 2005-04-16  309  
890d8d23ec3c9e Jakub Kicinski 2018-11-19  310  static void gred_offload(struct Qdisc *sch, enum tc_gred_command command)
890d8d23ec3c9e Jakub Kicinski 2018-11-19  311  {
890d8d23ec3c9e Jakub Kicinski 2018-11-19  312  	struct gred_sched *table = qdisc_priv(sch);
890d8d23ec3c9e Jakub Kicinski 2018-11-19  313  	struct net_device *dev = qdisc_dev(sch);
890d8d23ec3c9e Jakub Kicinski 2018-11-19  314  	struct tc_gred_qopt_offload opt = {
890d8d23ec3c9e Jakub Kicinski 2018-11-19  315  		.command	= command,
890d8d23ec3c9e Jakub Kicinski 2018-11-19  316  		.handle		= sch->handle,
890d8d23ec3c9e Jakub Kicinski 2018-11-19  317  		.parent		= sch->parent,
890d8d23ec3c9e Jakub Kicinski 2018-11-19  318  	};
890d8d23ec3c9e Jakub Kicinski 2018-11-19  319  
890d8d23ec3c9e Jakub Kicinski 2018-11-19  320  	if (!tc_can_offload(dev) || !dev->netdev_ops->ndo_setup_tc)
890d8d23ec3c9e Jakub Kicinski 2018-11-19  321  		return;
890d8d23ec3c9e Jakub Kicinski 2018-11-19  322  
890d8d23ec3c9e Jakub Kicinski 2018-11-19  323  	if (command == TC_GRED_REPLACE) {
890d8d23ec3c9e Jakub Kicinski 2018-11-19  324  		unsigned int i;
890d8d23ec3c9e Jakub Kicinski 2018-11-19  325  
890d8d23ec3c9e Jakub Kicinski 2018-11-19  326  		opt.set.grio_on = gred_rio_mode(table);
890d8d23ec3c9e Jakub Kicinski 2018-11-19  327  		opt.set.wred_on = gred_wred_mode(table);
890d8d23ec3c9e Jakub Kicinski 2018-11-19  328  		opt.set.dp_cnt = table->DPs;
890d8d23ec3c9e Jakub Kicinski 2018-11-19  329  		opt.set.dp_def = table->def;
890d8d23ec3c9e Jakub Kicinski 2018-11-19  330  
890d8d23ec3c9e Jakub Kicinski 2018-11-19  331  		for (i = 0; i < table->DPs; i++) {
890d8d23ec3c9e Jakub Kicinski 2018-11-19  332  			struct gred_sched_data *q = table->tab[i];
890d8d23ec3c9e Jakub Kicinski 2018-11-19  333  
890d8d23ec3c9e Jakub Kicinski 2018-11-19  334  			if (!q)
890d8d23ec3c9e Jakub Kicinski 2018-11-19  335  				continue;
890d8d23ec3c9e Jakub Kicinski 2018-11-19  336  			opt.set.tab[i].present = true;
890d8d23ec3c9e Jakub Kicinski 2018-11-19  337  			opt.set.tab[i].limit = q->limit;
890d8d23ec3c9e Jakub Kicinski 2018-11-19  338  			opt.set.tab[i].prio = q->prio;
890d8d23ec3c9e Jakub Kicinski 2018-11-19  339  			opt.set.tab[i].min = q->parms.qth_min >> q->parms.Wlog;
890d8d23ec3c9e Jakub Kicinski 2018-11-19  340  			opt.set.tab[i].max = q->parms.qth_max >> q->parms.Wlog;
890d8d23ec3c9e Jakub Kicinski 2018-11-19  341  			opt.set.tab[i].is_ecn = gred_use_ecn(q);
890d8d23ec3c9e Jakub Kicinski 2018-11-19  342  			opt.set.tab[i].is_harddrop = gred_use_harddrop(q);
890d8d23ec3c9e Jakub Kicinski 2018-11-19  343  			opt.set.tab[i].probability = q->parms.max_P;
890d8d23ec3c9e Jakub Kicinski 2018-11-19  344  			opt.set.tab[i].backlog = &q->backlog;
890d8d23ec3c9e Jakub Kicinski 2018-11-19  345  		}
890d8d23ec3c9e Jakub Kicinski 2018-11-19  346  		opt.set.qstats = &sch->qstats;
890d8d23ec3c9e Jakub Kicinski 2018-11-19  347  	}
890d8d23ec3c9e Jakub Kicinski 2018-11-19  348  
890d8d23ec3c9e Jakub Kicinski 2018-11-19  349  	dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_GRED, &opt);
890d8d23ec3c9e Jakub Kicinski 2018-11-19 @350  }
890d8d23ec3c9e Jakub Kicinski 2018-11-19  351  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 69503 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-10-26 23:47 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-16  8:49 [PATCH net-next 0/9] Try to simplify the gnet_stats and remove qdisc->running sequence counter Sebastian Andrzej Siewior
2021-10-16  8:49 ` [PATCH net-next 1/9] gen_stats: Add instead Set the value in __gnet_stats_copy_basic() Sebastian Andrzej Siewior
2021-10-16  8:49 ` [PATCH net-next 2/9] gen_stats: Add gnet_stats_add_queue() Sebastian Andrzej Siewior
2021-10-16  8:49 ` [PATCH net-next 3/9] mq, mqprio: Use gnet_stats_add_queue() Sebastian Andrzej Siewior
2021-10-16  8:49 ` [PATCH net-next 4/9] gen_stats: Move remaining users to gnet_stats_add_queue() Sebastian Andrzej Siewior
2021-10-16  8:49 ` [PATCH net-next 5/9] u64_stats: Introduce u64_stats_set() Sebastian Andrzej Siewior
2021-10-16  8:49 ` [PATCH net-next 6/9] net: sched: Protect Qdisc::bstats with u64_stats Sebastian Andrzej Siewior
2021-10-26 23:47   ` kernel test robot
2021-10-16  8:49 ` [PATCH net-next 7/9] net: sched: Use _bstats_update/set() instead of raw writes Sebastian Andrzej Siewior
2021-10-16  8:49 ` [PATCH net-next 8/9] net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types Sebastian Andrzej Siewior
2021-10-16  8:49 ` [PATCH net-next 9/9] net: sched: Remove Qdisc::running sequence counter Sebastian Andrzej Siewior
2021-10-18 17:23   ` Eric Dumazet
2021-10-18 18:30     ` Eric Dumazet
2021-10-18 19:24       ` Eric Dumazet
2021-10-18 23:53       ` Eric Dumazet
2021-10-19 10:12     ` [PATCH net-next] net: sched: Allow statistics reads from softirq Sebastian Andrzej Siewior
2021-10-19 12:10       ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.