All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Eric Dumazet <edumazet@google.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.20 23/76] net: sched: put back q.qlen into a single location
Date: Fri,  8 Mar 2019 13:49:35 +0100	[thread overview]
Message-ID: <20190308124915.596417692@linuxfoundation.org> (raw)
In-Reply-To: <20190308124914.789210760@linuxfoundation.org>

4.20-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 46b1c18f9deb326a7e18348e668e4c7ab7c7458b ]

In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'")
John made the assumption that the data path had no need to read
the qdisc qlen (number of packets in the qdisc).

It is true when pfifo_fast is used as the root qdisc, or as direct MQ/MQPRIO
children.

But pfifo_fast can be used as leaf in class full qdiscs, and existing
logic needs to access the child qlen in an efficient way.

HTB breaks badly, since it uses cl->leaf.q->q.qlen in :
  htb_activate() -> WARN_ON()
  htb_dequeue_tree() to decide if a class can be htb_deactivated
  when it has no more packets.

HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
qdisc_tree_reduce_backlog() also read q.qlen directly.

Using qdisc_qlen_sum() (which iterates over all possible cpus)
in the data path is a non starter.

It seems we have to put back qlen in a central location,
at least for stable kernels.

For all qdisc but pfifo_fast, qlen is guarded by the qdisc lock,
so the existing q.qlen{++|--} are correct.

For 'lockless' qdisc (pfifo_fast so far), we need to use atomic_{inc|dec}()
because the spinlock might be not held (for example from
pfifo_fast_enqueue() and pfifo_fast_dequeue())

This patch adds atomic_qlen (in the same location than qlen)
and renames the following helpers, since we want to express
they can be used without qdisc lock, and that qlen is no longer percpu.

- qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
- qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()

Later (net-next) we might revert this patch by tracking all these
qlen uses and replace them by a more efficient method (not having
to access a precise qlen, but an empty/non_empty status that might
be less expensive to maintain/track).

Another possibility is to have a legacy pfifo_fast version that would
be used when used a a child qdisc, since the parent qdisc needs
a spinlock anyway. But then, future lockless qdiscs would also
have the same problem.

Fixes: 7e66016f2c65 ("net: sched: helpers to sum qlen and qlen for per cpu logic")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/sch_generic.h |   31 +++++++++++++------------------
 net/core/gen_stats.c      |    2 --
 net/sched/sch_generic.c   |   13 ++++++-------
 3 files changed, 19 insertions(+), 27 deletions(-)

--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -48,7 +48,10 @@ struct qdisc_size_table {
 struct qdisc_skb_head {
 	struct sk_buff	*head;
 	struct sk_buff	*tail;
-	__u32		qlen;
+	union {
+		u32		qlen;
+		atomic_t	atomic_qlen;
+	};
 	spinlock_t	lock;
 };
 
@@ -405,27 +408,19 @@ static inline void qdisc_cb_private_vali
 	BUILD_BUG_ON(sizeof(qcb->data) < sz);
 }
 
-static inline int qdisc_qlen_cpu(const struct Qdisc *q)
-{
-	return this_cpu_ptr(q->cpu_qstats)->qlen;
-}
-
 static inline int qdisc_qlen(const struct Qdisc *q)
 {
 	return q->q.qlen;
 }
 
-static inline int qdisc_qlen_sum(const struct Qdisc *q)
+static inline u32 qdisc_qlen_sum(const struct Qdisc *q)
 {
-	__u32 qlen = q->qstats.qlen;
-	int i;
+	u32 qlen = q->qstats.qlen;
 
-	if (q->flags & TCQ_F_NOLOCK) {
-		for_each_possible_cpu(i)
-			qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen;
-	} else {
+	if (q->flags & TCQ_F_NOLOCK)
+		qlen += atomic_read(&q->q.atomic_qlen);
+	else
 		qlen += q->q.qlen;
-	}
 
 	return qlen;
 }
@@ -798,14 +793,14 @@ static inline void qdisc_qstats_cpu_back
 	this_cpu_add(sch->cpu_qstats->backlog, qdisc_pkt_len(skb));
 }
 
-static inline void qdisc_qstats_cpu_qlen_inc(struct Qdisc *sch)
+static inline void qdisc_qstats_atomic_qlen_inc(struct Qdisc *sch)
 {
-	this_cpu_inc(sch->cpu_qstats->qlen);
+	atomic_inc(&sch->q.atomic_qlen);
 }
 
-static inline void qdisc_qstats_cpu_qlen_dec(struct Qdisc *sch)
+static inline void qdisc_qstats_atomic_qlen_dec(struct Qdisc *sch)
 {
-	this_cpu_dec(sch->cpu_qstats->qlen);
+	atomic_dec(&sch->q.atomic_qlen);
 }
 
 static inline void qdisc_qstats_cpu_requeues_inc(struct Qdisc *sch)
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -291,7 +291,6 @@ __gnet_stats_copy_queue_cpu(struct gnet_
 	for_each_possible_cpu(i) {
 		const struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);
 
-		qstats->qlen = 0;
 		qstats->backlog += qcpu->backlog;
 		qstats->drops += qcpu->drops;
 		qstats->requeues += qcpu->requeues;
@@ -307,7 +306,6 @@ void __gnet_stats_copy_queue(struct gnet
 	if (cpu) {
 		__gnet_stats_copy_queue_cpu(qstats, cpu);
 	} else {
-		qstats->qlen = q->qlen;
 		qstats->backlog = q->backlog;
 		qstats->drops = q->drops;
 		qstats->requeues = q->requeues;
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -68,7 +68,7 @@ static inline struct sk_buff *__skb_dequ
 			skb = __skb_dequeue(&q->skb_bad_txq);
 			if (qdisc_is_percpu_stats(q)) {
 				qdisc_qstats_cpu_backlog_dec(q, skb);
-				qdisc_qstats_cpu_qlen_dec(q);
+				qdisc_qstats_atomic_qlen_dec(q);
 			} else {
 				qdisc_qstats_backlog_dec(q, skb);
 				q->q.qlen--;
@@ -108,7 +108,7 @@ static inline void qdisc_enqueue_skb_bad
 
 	if (qdisc_is_percpu_stats(q)) {
 		qdisc_qstats_cpu_backlog_inc(q, skb);
-		qdisc_qstats_cpu_qlen_inc(q);
+		qdisc_qstats_atomic_qlen_inc(q);
 	} else {
 		qdisc_qstats_backlog_inc(q, skb);
 		q->q.qlen++;
@@ -147,7 +147,7 @@ static inline int dev_requeue_skb_locked
 
 		qdisc_qstats_cpu_requeues_inc(q);
 		qdisc_qstats_cpu_backlog_inc(q, skb);
-		qdisc_qstats_cpu_qlen_inc(q);
+		qdisc_qstats_atomic_qlen_inc(q);
 
 		skb = next;
 	}
@@ -252,7 +252,7 @@ static struct sk_buff *dequeue_skb(struc
 			skb = __skb_dequeue(&q->gso_skb);
 			if (qdisc_is_percpu_stats(q)) {
 				qdisc_qstats_cpu_backlog_dec(q, skb);
-				qdisc_qstats_cpu_qlen_dec(q);
+				qdisc_qstats_atomic_qlen_dec(q);
 			} else {
 				qdisc_qstats_backlog_dec(q, skb);
 				q->q.qlen--;
@@ -645,7 +645,7 @@ static int pfifo_fast_enqueue(struct sk_
 	if (unlikely(err))
 		return qdisc_drop_cpu(skb, qdisc, to_free);
 
-	qdisc_qstats_cpu_qlen_inc(qdisc);
+	qdisc_qstats_atomic_qlen_inc(qdisc);
 	/* Note: skb can not be used after skb_array_produce(),
 	 * so we better not use qdisc_qstats_cpu_backlog_inc()
 	 */
@@ -670,7 +670,7 @@ static struct sk_buff *pfifo_fast_dequeu
 	if (likely(skb)) {
 		qdisc_qstats_cpu_backlog_dec(qdisc, skb);
 		qdisc_bstats_cpu_update(qdisc, skb);
-		qdisc_qstats_cpu_qlen_dec(qdisc);
+		qdisc_qstats_atomic_qlen_dec(qdisc);
 	}
 
 	return skb;
@@ -714,7 +714,6 @@ static void pfifo_fast_reset(struct Qdis
 		struct gnet_stats_queue *q = per_cpu_ptr(qdisc->cpu_qstats, i);
 
 		q->backlog = 0;
-		q->qlen = 0;
 	}
 }
 



  parent reply	other threads:[~2019-03-08 12:54 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-08 12:49 [PATCH 4.20 00/76] 4.20.15-stable review Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 01/76] cpufreq: Use struct kobj_attribute instead of struct global_attr Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 02/76] staging: erofs: fix mis-acted TAIL merging behavior Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 03/76] binder: create node flag to request senders security context Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 04/76] USB: serial: option: add Telit ME910 ECM composition Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 05/76] USB: serial: cp210x: add ID for Ingenico 3070 Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 06/76] USB: serial: ftdi_sio: add ID for Hjelmslund Electronics USB485 Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 07/76] driver core: Postpone DMA tear-down until after devres release Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 08/76] staging: erofs: fix fast symlink w/o xattr when fs xattr is on Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 09/76] staging: erofs: fix memleak of inodes shared xattr array Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 10/76] staging: erofs: fix race of initializing xattrs of a inode at the same time Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 11/76] staging: erofs: fix illegal address access under memory pressure Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 12/76] staging: erofs: compressed_pages should not be accessed again after freed Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 13/76] staging: comedi: ni_660x: fix missing break in switch statement Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 14/76] staging: wilc1000: fix to set correct value for vif_num Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 15/76] staging: android: ion: fix sys heap pools gfp_flags Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 16/76] staging: android: ashmem: Dont call fallocate() with ashmem_mutex held Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 17/76] staging: android: ashmem: Avoid range_alloc() allocation " Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 18/76] ip6mr: Do not call __IP6_INC_STATS() from preemptible context Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 19/76] net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 20/76] net: dsa: mv88e6xxx: handle unknown duplex modes gracefully in mv88e6xxx_port_set_duplex Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 21/76] net: dsa: mv8e6xxx: fix number of internal PHYs for 88E6x90 family Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 22/76] net: mscc: Enable all ports in QSGMII Greg Kroah-Hartman
2019-03-08 12:49 ` Greg Kroah-Hartman [this message]
2019-03-08 12:49 ` [PATCH 4.20 24/76] net-sysfs: Fix mem leak in netdev_register_kobject Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 25/76] qmi_wwan: Add support for Quectel EG12/EM12 Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 26/76] sctp: call iov_iter_revert() after sending ABORT Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 27/76] sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79 Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 28/76] team: Free BPF filter when unregistering netdev Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 29/76] tipc: fix RDM/DGRAM connect() regression Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 30/76] bnxt_en: Drop oversize TX packets to prevent errors Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 31/76] geneve: correctly handle ipv6.disable module parameter Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 32/76] hv_netvsc: Fix IP header checksum for coalesced packets Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 33/76] ipv4: Add ICMPv6 support when parse route ipproto Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 34/76] lan743x: Fix TX Stall Issue Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 35/76] net: dsa: mv88e6xxx: Fix statistics on mv88e6161 Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 36/76] net: dsa: mv88e6xxx: Fix u64 statistics Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 37/76] netlabel: fix out-of-bounds memory accesses Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 38/76] net: netem: fix skb length BUG_ON in __skb_to_sgvec Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 39/76] net: nfc: Fix NULL dereference on nfc_llcp_build_tlv fails Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 40/76] net: phy: Micrel KSZ8061: link failure after cable connect Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 41/76] net: phy: phylink: fix uninitialized variable in phylink_get_mac_state Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 42/76] net: sit: fix memory leak in sit_init_net() Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 43/76] net: socket: set sock->sk to NULL after calling proto_ops::release() Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 44/76] tipc: fix race condition causing hung sendto Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 45/76] tun: fix blocking read Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 46/76] xen-netback: dont populate the hash cache on XenBus disconnect Greg Kroah-Hartman
2019-03-08 12:49 ` [PATCH 4.20 47/76] xen-netback: fix occasional leak of grant ref mappings under memory pressure Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 48/76] tun: remove unnecessary memory barrier Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 49/76] net: Add __icmp_send helper Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 50/76] net: avoid use IPCB in cipso_v4_error Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 51/76] ipv4: Return error for RTA_VIA attribute Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 52/76] ipv6: " Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 53/76] mpls: Return error for RTA_GATEWAY attribute Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 54/76] ipv4: Pass original device to ip_rcv_finish_core Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 55/76] net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390X Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 56/76] net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 57/76] net/sched: act_ipt: fix refcount leak when replace fails Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 58/76] net/sched: act_skbedit: " Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 59/76] net: sched: act_tunnel_key: fix NULL pointer dereference during init Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 60/76] x86/CPU/AMD: Set the CPB bit unconditionally on F17h Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 61/76] x86/boot/compressed/64: Do not read legacy ROM on EFI system Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 62/76] tracing: Fix event filters and triggers to handle negative numbers Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 63/76] xhci: tegra: Prevent error pointer dereference Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 64/76] usb: xhci: Fix for Enabling USB ROLE SWITCH QUIRK on INTEL_SUNRISEPOINT_LP_XHCI Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 65/76] applicom: Fix potential Spectre v1 vulnerabilities Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 66/76] MIPS: irq: Allocate accurate order pages for irq stack Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 67/76] aio: Fix locking in aio_poll() Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 68/76] xtensa: fix get_wchan Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 69/76] gnss: sirf: fix premature wakeup interrupt enable Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 70/76] USB: serial: cp210x: fix GPIO in autosuspend Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 71/76] selftests: firmware: fix verify_reqs() return value Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 72/76] Bluetooth: btrtl: Restore old logic to assume firmware is already loaded Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 73/76] Bluetooth: Fix locking in bt_accept_enqueue() for BH context Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 74/76] exec: Fix mem leak in kernel_read_file Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 75/76] scsi: core: reset host byte in DID_NEXUS_FAILURE case Greg Kroah-Hartman
2019-03-08 12:50 ` [PATCH 4.20 76/76] bpf: fix sanitation rewrite in case of non-pointers Greg Kroah-Hartman
2019-03-08 16:04 ` [PATCH 4.20 00/76] 4.20.15-stable review Jon Hunter
2019-03-08 16:04   ` Jon Hunter
2019-03-08 20:57 ` shuah
2019-03-09  6:44 ` Naresh Kamboju
2019-03-09 22:35 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190308124915.596417692@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.