All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Oltean <vladimir.oltean@nxp.com>
To: netdev@vger.kernel.org
Cc: Jiri Pirko <jiri@resnulli.us>,
	Pedro Tammela <pctammela@mojatatu.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	linux-kernel@vger.kernel.org, Eric Dumazet <edumazet@google.com>,
	intel-wired-lan@lists.osuosl.org,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Peilin Ye <yepeilin.cs@gmail.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [Intel-wired-lan] [PATCH RESEND net-next 2/5] net/sched: taprio: keep child Qdisc refcount elevated at 2 in offload mode
Date: Fri,  2 Jun 2023 13:37:47 +0300	[thread overview]
Message-ID: <20230602103750.2290132-3-vladimir.oltean@nxp.com> (raw)
In-Reply-To: <20230602103750.2290132-1-vladimir.oltean@nxp.com>

The taprio qdisc currently lives in the following equilibrium.

In the software scheduling case, taprio attaches itself to all TXQs,
thus having a refcount of 1 + the number of TX queues. In this mode,
q->qdiscs[] is not visible directly to the Qdisc API. The lifetime of
the Qdiscs from this private array lasts until qdisc_destroy() ->
taprio_destroy().

In the fully offloaded case, the root taprio has a refcount of 1, and
all child q->qdiscs[] also have a refcount of 1. The child q->qdiscs[]
are visible to the Qdisc API (they are attached to the netdev TXQs
directly), however taprio loses a reference to them very early - during
qdisc_graft(parent==NULL) -> taprio_attach(). At that time, taprio frees
the q->qdiscs[] array to not leak memory, but interestingly, it does not
release a reference on these qdiscs because it doesn't effectively own
them - they are created by taprio but owned by the Qdisc core, and will
be freed by qdisc_graft(parent==NULL, new==NULL) -> qdisc_put(old) when
the Qdisc is deleted or when the child Qdisc is replaced with something
else.

My interest is to change this equilibrium such that taprio also owns a
reference on the q->qdiscs[] child Qdiscs for the lifetime of the root
Qdisc, including in full offload mode. I want this because I would like
taprio_leaf(), taprio_dump_class(), taprio_dump_class_stats() to have
insight into q->qdiscs[] for the software scheduling mode - currently
they look at dev_queue->qdisc_sleeping, which is, as mentioned, the same
as the root taprio.

The following set of changes is necessary:
- don't free q->qdiscs[] early in taprio_attach(), free it late in
  taprio_destroy() for consistency with software mode. But:
- currently that's not possible, because taprio doesn't own a reference
  on q->qdiscs[]. So hold that reference - once during the initial
  attach() and once during subsequent graft() calls when the child is
  changed.
- always keep track of the current child in q->qdiscs[], even for full
  offload mode, so that we free in taprio_destroy() what we should, and
  not something stale.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/sched/sch_taprio.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index b1c611c72aa4..8807fc915b79 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -2138,23 +2138,20 @@ static void taprio_attach(struct Qdisc *sch)
 
 			qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
 			old = dev_graft_qdisc(dev_queue, qdisc);
+			/* Keep refcount of q->qdiscs[ntx] at 2 */
+			qdisc_refcount_inc(qdisc);
 		} else {
 			/* In software mode, attach the root taprio qdisc
 			 * to all netdev TX queues, so that dev_qdisc_enqueue()
 			 * goes through taprio_enqueue().
 			 */
 			old = dev_graft_qdisc(dev_queue, sch);
+			/* Keep root refcount at 1 + num_tx_queues */
 			qdisc_refcount_inc(sch);
 		}
 		if (old)
 			qdisc_put(old);
 	}
-
-	/* access to the child qdiscs is not needed in offload mode */
-	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
-		kfree(q->qdiscs);
-		q->qdiscs = NULL;
-	}
 }
 
 static struct netdev_queue *taprio_queue_get(struct Qdisc *sch,
@@ -2183,15 +2180,24 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
 	if (dev->flags & IFF_UP)
 		dev_deactivate(dev);
 
-	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
+	/* In software mode, the root taprio qdisc is still the one attached to
+	 * all netdev TX queues, and hence responsible for taprio_enqueue() to
+	 * forward the skbs to the child qdiscs from the private q->qdiscs[]
+	 * array. So only attach the new qdisc to the netdev queue in offload
+	 * mode, where the enqueue must bypass taprio. However, save the
+	 * reference to the new qdisc in the private array in both cases, to
+	 * have an up-to-date reference to our children.
+	 */
+	if (FULL_OFFLOAD_IS_ENABLED(q->flags))
 		*old = dev_graft_qdisc(dev_queue, new);
-	} else {
+	else
 		*old = q->qdiscs[cl - 1];
-		q->qdiscs[cl - 1] = new;
-	}
 
-	if (new)
+	q->qdiscs[cl - 1] = new;
+	if (new) {
+		qdisc_refcount_inc(new);
 		new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+	}
 
 	if (dev->flags & IFF_UP)
 		dev_activate(dev);
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

WARNING: multiple messages have this Message-ID (diff)
From: Vladimir Oltean <vladimir.oltean@nxp.com>
To: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Vinicius Costa Gomes <vinicius.gomes@intel.com>,
	linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>,
	Peilin Ye <yepeilin.cs@gmail.com>,
	Pedro Tammela <pctammela@mojatatu.com>
Subject: [PATCH RESEND net-next 2/5] net/sched: taprio: keep child Qdisc refcount elevated at 2 in offload mode
Date: Fri,  2 Jun 2023 13:37:47 +0300	[thread overview]
Message-ID: <20230602103750.2290132-3-vladimir.oltean@nxp.com> (raw)
In-Reply-To: <20230602103750.2290132-1-vladimir.oltean@nxp.com>

The taprio qdisc currently lives in the following equilibrium.

In the software scheduling case, taprio attaches itself to all TXQs,
thus having a refcount of 1 + the number of TX queues. In this mode,
q->qdiscs[] is not visible directly to the Qdisc API. The lifetime of
the Qdiscs from this private array lasts until qdisc_destroy() ->
taprio_destroy().

In the fully offloaded case, the root taprio has a refcount of 1, and
all child q->qdiscs[] also have a refcount of 1. The child q->qdiscs[]
are visible to the Qdisc API (they are attached to the netdev TXQs
directly), however taprio loses a reference to them very early - during
qdisc_graft(parent==NULL) -> taprio_attach(). At that time, taprio frees
the q->qdiscs[] array to not leak memory, but interestingly, it does not
release a reference on these qdiscs because it doesn't effectively own
them - they are created by taprio but owned by the Qdisc core, and will
be freed by qdisc_graft(parent==NULL, new==NULL) -> qdisc_put(old) when
the Qdisc is deleted or when the child Qdisc is replaced with something
else.

My interest is to change this equilibrium such that taprio also owns a
reference on the q->qdiscs[] child Qdiscs for the lifetime of the root
Qdisc, including in full offload mode. I want this because I would like
taprio_leaf(), taprio_dump_class(), taprio_dump_class_stats() to have
insight into q->qdiscs[] for the software scheduling mode - currently
they look at dev_queue->qdisc_sleeping, which is, as mentioned, the same
as the root taprio.

The following set of changes is necessary:
- don't free q->qdiscs[] early in taprio_attach(), free it late in
  taprio_destroy() for consistency with software mode. But:
- currently that's not possible, because taprio doesn't own a reference
  on q->qdiscs[]. So hold that reference - once during the initial
  attach() and once during subsequent graft() calls when the child is
  changed.
- always keep track of the current child in q->qdiscs[], even for full
  offload mode, so that we free in taprio_destroy() what we should, and
  not something stale.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/sched/sch_taprio.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index b1c611c72aa4..8807fc915b79 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -2138,23 +2138,20 @@ static void taprio_attach(struct Qdisc *sch)
 
 			qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
 			old = dev_graft_qdisc(dev_queue, qdisc);
+			/* Keep refcount of q->qdiscs[ntx] at 2 */
+			qdisc_refcount_inc(qdisc);
 		} else {
 			/* In software mode, attach the root taprio qdisc
 			 * to all netdev TX queues, so that dev_qdisc_enqueue()
 			 * goes through taprio_enqueue().
 			 */
 			old = dev_graft_qdisc(dev_queue, sch);
+			/* Keep root refcount at 1 + num_tx_queues */
 			qdisc_refcount_inc(sch);
 		}
 		if (old)
 			qdisc_put(old);
 	}
-
-	/* access to the child qdiscs is not needed in offload mode */
-	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
-		kfree(q->qdiscs);
-		q->qdiscs = NULL;
-	}
 }
 
 static struct netdev_queue *taprio_queue_get(struct Qdisc *sch,
@@ -2183,15 +2180,24 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
 	if (dev->flags & IFF_UP)
 		dev_deactivate(dev);
 
-	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
+	/* In software mode, the root taprio qdisc is still the one attached to
+	 * all netdev TX queues, and hence responsible for taprio_enqueue() to
+	 * forward the skbs to the child qdiscs from the private q->qdiscs[]
+	 * array. So only attach the new qdisc to the netdev queue in offload
+	 * mode, where the enqueue must bypass taprio. However, save the
+	 * reference to the new qdisc in the private array in both cases, to
+	 * have an up-to-date reference to our children.
+	 */
+	if (FULL_OFFLOAD_IS_ENABLED(q->flags))
 		*old = dev_graft_qdisc(dev_queue, new);
-	} else {
+	else
 		*old = q->qdiscs[cl - 1];
-		q->qdiscs[cl - 1] = new;
-	}
 
-	if (new)
+	q->qdiscs[cl - 1] = new;
+	if (new) {
+		qdisc_refcount_inc(new);
 		new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+	}
 
 	if (dev->flags & IFF_UP)
 		dev_activate(dev);
-- 
2.34.1


  parent reply	other threads:[~2023-06-02 10:38 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-02 10:37 [Intel-wired-lan] [PATCH RESEND net-next 0/5] Improve the taprio qdisc's relationship with its children Vladimir Oltean
2023-06-02 10:37 ` Vladimir Oltean
2023-06-02 10:37 ` [Intel-wired-lan] [PATCH RESEND net-next 1/5] net/sched: taprio: don't access q->qdiscs[] in unoffloaded mode during attach() Vladimir Oltean
2023-06-02 10:37   ` Vladimir Oltean
2023-06-02 10:37 ` Vladimir Oltean [this message]
2023-06-02 10:37   ` [PATCH RESEND net-next 2/5] net/sched: taprio: keep child Qdisc refcount elevated at 2 in offload mode Vladimir Oltean
2023-06-06 10:27   ` Paolo Abeni
2023-06-06 10:27     ` [Intel-wired-lan] " Paolo Abeni
2023-06-06 15:56     ` Vladimir Oltean
2023-06-06 15:56       ` [Intel-wired-lan] " Vladimir Oltean
2023-06-07 10:05       ` Paolo Abeni
2023-06-07 10:05         ` Paolo Abeni
2023-06-07 10:16         ` Vladimir Oltean
2023-06-07 10:16           ` [Intel-wired-lan] " Vladimir Oltean
2023-06-02 10:37 ` [Intel-wired-lan] [PATCH RESEND net-next 3/5] net/sched: taprio: try again to report q->qdiscs[] to qdisc_leaf() Vladimir Oltean
2023-06-02 10:37   ` Vladimir Oltean
2023-06-02 10:37 ` [Intel-wired-lan] [PATCH RESEND net-next 4/5] net/sched: taprio: delete misleading comment about preallocating child qdiscs Vladimir Oltean
2023-06-02 10:37   ` Vladimir Oltean
2023-06-02 10:37 ` [Intel-wired-lan] [PATCH RESEND net-next 5/5] net/sched: taprio: dump class stats for the actual q->qdiscs[] Vladimir Oltean
2023-06-02 10:37   ` Vladimir Oltean
2023-06-08 18:44   ` Jamal Hadi Salim
2023-06-08 18:44     ` [Intel-wired-lan] " Jamal Hadi Salim
2023-06-09 12:10     ` Vladimir Oltean
2023-06-09 12:10       ` [Intel-wired-lan] " Vladimir Oltean
2023-06-09 14:56       ` Victor Nogueira
2023-06-09 14:56         ` [Intel-wired-lan] " Victor Nogueira
2023-06-10 17:49         ` Vladimir Oltean
2023-06-10 17:49           ` [Intel-wired-lan] " Vladimir Oltean
2023-06-09 16:19       ` Jamal Hadi Salim
2023-06-09 16:19         ` Jamal Hadi Salim
2023-06-09 17:56         ` Vladimir Oltean
2023-06-09 17:56           ` [Intel-wired-lan] " Vladimir Oltean
2023-06-05 12:50 ` [PATCH RESEND net-next 0/5] Improve the taprio qdisc's relationship with its children Vladimir Oltean
2023-06-05 12:50   ` [Intel-wired-lan] " Vladimir Oltean
2023-06-05 13:27   ` Simon Horman
2023-06-05 13:27     ` [Intel-wired-lan] " Simon Horman
2023-06-05 15:44 ` Jamal Hadi Salim
2023-06-05 15:44   ` Jamal Hadi Salim
2023-06-05 16:53   ` Vladimir Oltean
2023-06-05 16:53     ` [Intel-wired-lan] " Vladimir Oltean
2023-06-06 15:39     ` Jamal Hadi Salim
2023-06-06 15:39       ` [Intel-wired-lan] " Jamal Hadi Salim
2023-06-06 16:31       ` Vladimir Oltean
2023-06-06 16:31         ` [Intel-wired-lan] " Vladimir Oltean
2023-06-06 17:42         ` Jamal Hadi Salim
2023-06-06 17:42           ` [Intel-wired-lan] " Jamal Hadi Salim
2023-06-07 10:09           ` Vladimir Oltean
2023-06-07 10:09             ` [Intel-wired-lan] " Vladimir Oltean
2023-06-07 10:30             ` Vladimir Oltean
2023-06-07 10:30               ` [Intel-wired-lan] " Vladimir Oltean

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230602103750.2290132-3-vladimir.oltean@nxp.com \
    --to=vladimir.oltean@nxp.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pctammela@mojatatu.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yepeilin.cs@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.