* [RFC PATCH net] net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports
@ 2021-09-12 16:00 Vladimir Oltean
  2021-09-12 16:13 ` Florian Fainelli
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir Oltean @ 2021-09-12 16:00 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	David S. Miller, Jakub Kicinski

Sometimes when unbinding the mv88e6xxx driver on Turris MOX, these error
messages appear:

mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 1 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 0 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 100 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 1 from fdb: -2
mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 0 from fdb: -2

(and similarly for other ports)

What happens is that DSA has a policy of "even if there are bugs, let's at
least not leak memory", so dsa_port_teardown() clears the dp->fdbs and
dp->mdbs lists, which are supposed to be empty by that point.

But if we delete that cleanup code, the warnings go away.

=> the FDB and MDB lists (used for refcounting on shared ports, aka CPU
and DSA ports) will eventually be empty, but they are not yet empty by
the time we tear those ports down. In other words, we are deleting them
too soon.

The addresses that DSA complains about are host-trapped addresses: the
local addresses of the ports, and the MAC address of the bridge device.

The problem is that offloading those entries happens from a deferred
work item scheduled by the SWITCHDEV_FDB_DEL_TO_DEVICE handler, and this
races with the teardown of the CPU and DSA ports where the refcounting
is kept.

In fact, not only does it race, but fundamentally speaking, if we
iterate through the port list linearly, we might end up tearing down the
shared ports even before we delete a DSA user port which has a bridge
upper.

So, as it turns out, we need to first tear down the user ports (and the
unused ones, for lack of a better place to do that), then the shared
ports (the CPU and DSA ports). In between, we need to ensure that all
work items scheduled by our switchdev handlers (which only run for user
ports, hence the reason we tear those down first) have finished.

Fixes: 161ca59d39e9 ("net: dsa: reference count the MDB entries at the cross-chip notifier level")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/dsa.h  |  5 +++++
 net/dsa/dsa.c      |  5 +++++
 net/dsa/dsa2.c     | 46 +++++++++++++++++++++++++++++++---------------
 net/dsa/dsa_priv.h |  1 +
 4 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 2c39dbac63bd..6e29c0e080f6 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -447,6 +447,11 @@ static inline bool dsa_port_is_user(struct dsa_port *dp)
 	return dp->type == DSA_PORT_TYPE_USER;
 }
 
+static inline bool dsa_port_is_unused(struct dsa_port *dp)
+{
+	return dp->type == DSA_PORT_TYPE_UNUSED;
+}
+
 static inline bool dsa_is_unused_port(struct dsa_switch *ds, int p)
 {
 	return dsa_to_port(ds, p)->type == DSA_PORT_TYPE_UNUSED;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 1dc45e40f961..41f36ad8b0ec 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -345,6 +345,11 @@ bool dsa_schedule_work(struct work_struct *work)
 	return queue_work(dsa_owq, work);
 }
 
+void dsa_flush_workqueue(void)
+{
+	flush_workqueue(dsa_owq);
+}
+
 int dsa_devlink_param_get(struct devlink *dl, u32 id,
 			  struct devlink_param_gset_ctx *ctx)
 {
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 906ae566aa22..17d0437d72c0 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -897,6 +897,33 @@ static void dsa_switch_teardown(struct dsa_switch *ds)
 	ds->setup = false;
 }
 
+/* First tear down the non-shared, then the shared ports. This ensures that
+ * all work items scheduled by our switchdev handlers for user ports have
+ * completed before we destroy the refcounting kept on the shared ports.
+ */
+static void dsa_tree_teardown_ports(struct dsa_switch_tree *dst)
+{
+	struct dsa_port *dp;
+
+	list_for_each_entry(dp, &dst->ports, list)
+		if (dsa_port_is_user(dp) || dsa_port_is_unused(dp))
+			dsa_port_teardown(dp);
+
+	dsa_flush_workqueue();
+
+	list_for_each_entry(dp, &dst->ports, list)
+		if (dsa_port_is_dsa(dp) || dsa_port_is_cpu(dp))
+			dsa_port_teardown(dp);
+}
+
+static void dsa_tree_teardown_switches(struct dsa_switch_tree *dst)
+{
+	struct dsa_port *dp;
+
+	list_for_each_entry(dp, &dst->ports, list)
+		dsa_switch_teardown(dp->ds);
+}
+
 static int dsa_tree_setup_switches(struct dsa_switch_tree *dst)
 {
 	struct dsa_port *dp;
@@ -923,26 +950,13 @@ static int dsa_tree_setup_switches(struct dsa_switch_tree *dst)
 	return 0;
 
 teardown:
-	list_for_each_entry(dp, &dst->ports, list)
-		dsa_port_teardown(dp);
+	dsa_tree_teardown_ports(dst);
 
-	list_for_each_entry(dp, &dst->ports, list)
-		dsa_switch_teardown(dp->ds);
+	dsa_tree_teardown_switches(dst);
 
 	return err;
 }
 
-static void dsa_tree_teardown_switches(struct dsa_switch_tree *dst)
-{
-	struct dsa_port *dp;
-
-	list_for_each_entry(dp, &dst->ports, list)
-		dsa_port_teardown(dp);
-
-	list_for_each_entry(dp, &dst->ports, list)
-		dsa_switch_teardown(dp->ds);
-}
-
 static int dsa_tree_setup_master(struct dsa_switch_tree *dst)
 {
 	struct dsa_port *dp;
@@ -1052,6 +1066,8 @@ static void dsa_tree_teardown(struct dsa_switch_tree *dst)
 
 	dsa_tree_teardown_master(dst);
 
+	dsa_tree_teardown_ports(dst);
+
 	dsa_tree_teardown_switches(dst);
 
 	dsa_tree_teardown_cpu_ports(dst);
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 33ab7d7af9eb..a5c9bc7b66c6 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -170,6 +170,7 @@ void dsa_tag_driver_put(const struct dsa_device_ops *ops);
 const struct dsa_device_ops *dsa_find_tagger_by_name(const char *buf);
 
 bool dsa_schedule_work(struct work_struct *work);
+void dsa_flush_workqueue(void);
 const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops);
 
 static inline int dsa_tag_protocol_overhead(const struct dsa_device_ops *ops)
-- 
2.25.1



* Re: [RFC PATCH net] net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports
  2021-09-12 16:00 [RFC PATCH net] net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports Vladimir Oltean
@ 2021-09-12 16:13 ` Florian Fainelli
  2021-09-12 16:19   ` Vladimir Oltean
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Fainelli @ 2021-09-12 16:13 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: Andrew Lunn, Vivien Didelot, Vladimir Oltean, David S. Miller,
	Jakub Kicinski



On 9/12/2021 9:00 AM, Vladimir Oltean wrote:
> Sometimes when unbinding the mv88e6xxx driver on Turris MOX, these error
> messages appear:
> 
> mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 1 from fdb: -2
> mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 0 from fdb: -2
> mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 100 from fdb: -2
> mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 1 from fdb: -2
> mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 0 from fdb: -2
> 
> (and similarly for other ports)
> 
> What happens is that DSA has a policy "even if there are bugs, let's at
> least not leak memory" and dsa_port_teardown() clears the dp->fdbs and
> dp->mdbs lists, which are supposed to be empty.
> 
> But deleting that cleanup code, the warnings go away.
> 
> => the FDB and MDB lists (used for refcounting on shared ports, aka CPU
> and DSA ports) will eventually be empty, but are not empty by the time
> we tear down those ports. Aka we are deleting them too soon.
> 
> The addresses that DSA complains about are host-trapped addresses: the
> local addresses of the ports, and the MAC address of the bridge device.
> 
> The problem is that offloading those entries happens from a deferred
> work item scheduled by the SWITCHDEV_FDB_DEL_TO_DEVICE handler, and this
> races with the teardown of the CPU and DSA ports where the refcounting
> is kept.
> 
> In fact, not only it races, but fundamentally speaking, if we iterate
> through the port list linearly, we might end up tearing down the shared
> ports even before we delete a DSA user port which has a bridge upper.
> 
> So as it turns out, we need to first tear down the user ports (and the
> unused ones, for no better place of doing that), then the shared ports
> (the CPU and DSA ports). In between, we need to ensure that all work
> items scheduled by our switchdev handlers (which only run for user
> ports, hence the reason why we tear them down first) have finished.
> 
> Fixes: 161ca59d39e9 ("net: dsa: reference count the MDB entries at the cross-chip notifier level")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

Did you post this as a RFC for a particular reason, or just to give 
reviewers some time?
-- 
Florian


* Re: [RFC PATCH net] net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports
  2021-09-12 16:13 ` Florian Fainelli
@ 2021-09-12 16:19   ` Vladimir Oltean
  2021-09-12 16:24     ` Florian Fainelli
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir Oltean @ 2021-09-12 16:19 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Vladimir Oltean, netdev, Andrew Lunn, Vivien Didelot,
	David S. Miller, Jakub Kicinski

On Sun, Sep 12, 2021 at 09:13:36AM -0700, Florian Fainelli wrote:
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
> 
> Did you post this as a RFC for a particular reason, or just to give
> reviewers some time?

Both.

In principle there's nothing wrong with what this patch does; there is
perhaps something wrong with what it doesn't do.

We keep saying that a network interface should be ready to pass traffic
as soon as it's registered, but that "walk dst->ports linearly when
calling dsa_port_setup" might not really live up to that promise.

So while we do end up bringing all ports up at the end of
dsa_tree_setup_switches, I think for consistency we should do the same
thing there, i.e. bring the shared ports up first, then the user ports.
That way, the user ports should really be prepared to pass traffic as
soon as they get registered.

But I don't really know what kind of story to build around that to
include it as part of this patch, other than consistency. For teardown,
I think it is much more obvious to see an issue.


* Re: [RFC PATCH net] net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports
  2021-09-12 16:19   ` Vladimir Oltean
@ 2021-09-12 16:24     ` Florian Fainelli
  2021-09-12 16:33       ` Vladimir Oltean
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Fainelli @ 2021-09-12 16:24 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Vladimir Oltean, netdev, Andrew Lunn, Vivien Didelot,
	David S. Miller, Jakub Kicinski



On 9/12/2021 9:19 AM, Vladimir Oltean wrote:
> On Sun, Sep 12, 2021 at 09:13:36AM -0700, Florian Fainelli wrote:
>> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
>>
>> Did you post this as a RFC for a particular reason, or just to give
>> reviewers some time?
> 
> Both.
> 
> In principle there's nothing wrong with what this patch does, only
> perhaps maybe something with what it doesn't do.
> 
> We keep saying that a network interface should be ready to pass traffic
> as soon as it's registered, but that "walk dst->ports linearly when
> calling dsa_port_setup" might not really live up to that promise.

That promise most definitely existed back when Lennert wrote this code:
we had an array of ports, and the switch drivers brought up their ports
in their ->setup() method. Nowadays I am not so sure anymore, because of
.port_enable() as much as because of the list.

This is making me wonder whether the occasional messages I am seeing on
system suspend from __dev_queue_xmit ("Virtual device %s asks to queue
packet!") might have something to do with that, and/or with the
inappropriate ordering between suspending the switch and the DSA master.

> 
> So while we do end up bringing all ports up at the end of
> dsa_tree_setup_switches, I think for consistency we should do the same
> thing there, i.e. bring the shared ports up first, then the user ports.
> That way, the user ports should really be prepared to pass traffic as
> soon as they get registered.
> 
> But I don't really know what kind of story to build around that to
> include it as part of this patch, other than consistency. For teardown,
> I think it is much more obvious to see an issue.
> 

-- 
Florian


* Re: [RFC PATCH net] net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports
  2021-09-12 16:24     ` Florian Fainelli
@ 2021-09-12 16:33       ` Vladimir Oltean
  2021-09-13  2:06         ` Florian Fainelli
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir Oltean @ 2021-09-12 16:33 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Vladimir Oltean, netdev, Andrew Lunn, Vivien Didelot,
	David S. Miller, Jakub Kicinski

On Sun, Sep 12, 2021 at 09:24:53AM -0700, Florian Fainelli wrote:
>
>
> On 9/12/2021 9:19 AM, Vladimir Oltean wrote:
> > On Sun, Sep 12, 2021 at 09:13:36AM -0700, Florian Fainelli wrote:
> > > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
> > >
> > > Did you post this as a RFC for a particular reason, or just to give
> > > reviewers some time?
> >
> > Both.
> >
> > In principle there's nothing wrong with what this patch does, only
> > perhaps maybe something with what it doesn't do.
> >
> > We keep saying that a network interface should be ready to pass traffic
> > as soon as it's registered, but that "walk dst->ports linearly when
> > calling dsa_port_setup" might not really live up to that promise.
>
> That promise most definitively existed back when Lennert wrote this code and
> we had an array of ports and the switch drivers brought up their port in
> their ->setup() method, nowadays, not so sure anymore because of the
> .port_enable() as much as the list.
>
> This is making me wonder whether the occasional messages I am seeing on
> system suspend from __dev_queue_xmit: Virtual device %s asks to queue
> packet! might have something to do with that and/or the inappropriate
> ordering between suspending the switch and the DSA master.

Sorry, I have never tested the suspend/resume code path, mostly because
I don't know what the easiest way would be to wake my systems up from
suspend. If you could give me some pointers there, I would be glad to
look into it.


* Re: [RFC PATCH net] net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports
  2021-09-12 16:33       ` Vladimir Oltean
@ 2021-09-13  2:06         ` Florian Fainelli
  2021-09-13  2:12           ` Vladimir Oltean
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Fainelli @ 2021-09-13  2:06 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Vladimir Oltean, netdev, Andrew Lunn, Vivien Didelot,
	David S. Miller, Jakub Kicinski



On 9/12/2021 9:33 AM, Vladimir Oltean wrote:
> On Sun, Sep 12, 2021 at 09:24:53AM -0700, Florian Fainelli wrote:
>>
>>
>> On 9/12/2021 9:19 AM, Vladimir Oltean wrote:
>>> On Sun, Sep 12, 2021 at 09:13:36AM -0700, Florian Fainelli wrote:
>>>> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
>>>>
>>>> Did you post this as a RFC for a particular reason, or just to give
>>>> reviewers some time?
>>>
>>> Both.
>>>
>>> In principle there's nothing wrong with what this patch does, only
>>> perhaps maybe something with what it doesn't do.
>>>
>>> We keep saying that a network interface should be ready to pass traffic
>>> as soon as it's registered, but that "walk dst->ports linearly when
>>> calling dsa_port_setup" might not really live up to that promise.
>>
>> That promise most definitively existed back when Lennert wrote this code and
>> we had an array of ports and the switch drivers brought up their port in
>> their ->setup() method, nowadays, not so sure anymore because of the
>> .port_enable() as much as the list.
>>
>> This is making me wonder whether the occasional messages I am seeing on
>> system suspend from __dev_queue_xmit: Virtual device %s asks to queue
>> packet! might have something to do with that and/or the inappropriate
>> ordering between suspending the switch and the DSA master.
> 
> Sorry, I have never tested the suspend/resume code path, mostly because
> I don't know what would the easiest way be to wake up my systems from
> suspend. If you could give me some pointers there I would be glad to
> look into it.

If your systems support suspend/resume just do:

echo mem > /sys/power/state
or
echo standby > /sys/power/state

if they don't, then maybe an x86 VM with dsa_loop may precipitate the 
problem, but since it uses DSA_TAG_PROTO_NONE, I doubt it; we would need 
to pass traffic on the DSA devices for this warning to show up.
-- 
Florian


* Re: [RFC PATCH net] net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports
  2021-09-13  2:06         ` Florian Fainelli
@ 2021-09-13  2:12           ` Vladimir Oltean
  2021-09-13  2:20             ` Florian Fainelli
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir Oltean @ 2021-09-13  2:12 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Vladimir Oltean, netdev, Andrew Lunn, Vivien Didelot,
	David S. Miller, Jakub Kicinski

On Sun, Sep 12, 2021 at 07:06:25PM -0700, Florian Fainelli wrote:
>
>
> On 9/12/2021 9:33 AM, Vladimir Oltean wrote:
> > On Sun, Sep 12, 2021 at 09:24:53AM -0700, Florian Fainelli wrote:
> > >
> > >
> > > On 9/12/2021 9:19 AM, Vladimir Oltean wrote:
> > > > On Sun, Sep 12, 2021 at 09:13:36AM -0700, Florian Fainelli wrote:
> > > > > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
> > > > >
> > > > > Did you post this as a RFC for a particular reason, or just to give
> > > > > reviewers some time?
> > > >
> > > > Both.
> > > >
> > > > In principle there's nothing wrong with what this patch does, only
> > > > perhaps maybe something with what it doesn't do.
> > > >
> > > > We keep saying that a network interface should be ready to pass traffic
> > > > as soon as it's registered, but that "walk dst->ports linearly when
> > > > calling dsa_port_setup" might not really live up to that promise.
> > >
> > > That promise most definitively existed back when Lennert wrote this code and
> > > we had an array of ports and the switch drivers brought up their port in
> > > their ->setup() method, nowadays, not so sure anymore because of the
> > > .port_enable() as much as the list.
> > >
> > > This is making me wonder whether the occasional messages I am seeing on
> > > system suspend from __dev_queue_xmit: Virtual device %s asks to queue
> > > packet! might have something to do with that and/or the inappropriate
> > > ordering between suspending the switch and the DSA master.
> >
> > Sorry, I have never tested the suspend/resume code path, mostly because
> > I don't know what would the easiest way be to wake up my systems from
> > suspend. If you could give me some pointers there I would be glad to
> > look into it.
>
> If your systems support suspend/resume just do:
>
> echo mem > /sys/power/state
> or
> echo standby > /sys/power/state
>
> if they don't, then maybe a x86 VM with dsa_loop may precipitate the
> problem, but since it uses DSA_TAG_PROTO_NONE, I doubt it, we would need to
> pass traffic on the DSA devices for this warning to show up.

I figured out a working combination in the meanwhile, I even found a bug
in the process:
https://patchwork.kernel.org/project/netdevbpf/patch/20210912192805.1394305-1-vladimir.oltean@nxp.com/

However I did not see those messages getting printed while pinging after
system resume (note that none of the DSA switch drivers I tested with
did implement .suspend or .resume), with net-next or with linux-stable/linux-5.14.y.

Is there more to your setup to reproduce this issue?


* Re: [RFC PATCH net] net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports
  2021-09-13  2:12           ` Vladimir Oltean
@ 2021-09-13  2:20             ` Florian Fainelli
  2021-09-13 11:31               ` Vladimir Oltean
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Fainelli @ 2021-09-13  2:20 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Vladimir Oltean, netdev, Andrew Lunn, Vivien Didelot,
	David S. Miller, Jakub Kicinski



On 9/12/2021 7:12 PM, Vladimir Oltean wrote:
> On Sun, Sep 12, 2021 at 07:06:25PM -0700, Florian Fainelli wrote:
>>
>>
>> On 9/12/2021 9:33 AM, Vladimir Oltean wrote:
>>> On Sun, Sep 12, 2021 at 09:24:53AM -0700, Florian Fainelli wrote:
>>>>
>>>>
>>>> On 9/12/2021 9:19 AM, Vladimir Oltean wrote:
>>>>> On Sun, Sep 12, 2021 at 09:13:36AM -0700, Florian Fainelli wrote:
>>>>>> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
>>>>>>
>>>>>> Did you post this as a RFC for a particular reason, or just to give
>>>>>> reviewers some time?
>>>>>
>>>>> Both.
>>>>>
>>>>> In principle there's nothing wrong with what this patch does, only
>>>>> perhaps maybe something with what it doesn't do.
>>>>>
>>>>> We keep saying that a network interface should be ready to pass traffic
>>>>> as soon as it's registered, but that "walk dst->ports linearly when
>>>>> calling dsa_port_setup" might not really live up to that promise.
>>>>
>>>> That promise most definitively existed back when Lennert wrote this code and
>>>> we had an array of ports and the switch drivers brought up their port in
>>>> their ->setup() method, nowadays, not so sure anymore because of the
>>>> .port_enable() as much as the list.
>>>>
>>>> This is making me wonder whether the occasional messages I am seeing on
>>>> system suspend from __dev_queue_xmit: Virtual device %s asks to queue
>>>> packet! might have something to do with that and/or the inappropriate
>>>> ordering between suspending the switch and the DSA master.
>>>
>>> Sorry, I have never tested the suspend/resume code path, mostly because
>>> I don't know what would the easiest way be to wake up my systems from
>>> suspend. If you could give me some pointers there I would be glad to
>>> look into it.
>>
>> If your systems support suspend/resume just do:
>>
>> echo mem > /sys/power/state
>> or
>> echo standby > /sys/power/state
>>
>> if they don't, then maybe a x86 VM with dsa_loop may precipitate the
>> problem, but since it uses DSA_TAG_PROTO_NONE, I doubt it, we would need to
>> pass traffic on the DSA devices for this warning to show up.
> 
> I figured out a working combination in the meanwhile, I even found a bug
> in the process:
> https://patchwork.kernel.org/project/netdevbpf/patch/20210912192805.1394305-1-vladimir.oltean@nxp.com/
> 
> However I did not see those messages getting printed while pinging after
> system resume (note that none of the DSA switch drivers I tested with
> did implement .suspend or .resume), with net-next or with linux-stable/linux-5.14.y.
> 
> Is there more to your setup to reproduce this issue?

All switch ports are brought up with a DHCP client. The issue is 
definitely intermittent and not frequent. I don't have suspend/resume 
working on 5.14.y yet, but going as far back as 5.10.y should let you 
see the same kind of messages.
-- 
Florian


* Re: [RFC PATCH net] net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports
  2021-09-13  2:20             ` Florian Fainelli
@ 2021-09-13 11:31               ` Vladimir Oltean
  0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Oltean @ 2021-09-13 11:31 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Vladimir Oltean, netdev, Andrew Lunn, Vivien Didelot,
	David S. Miller, Jakub Kicinski

On Sun, Sep 12, 2021 at 07:20:23PM -0700, Florian Fainelli wrote:
> On 9/12/2021 7:12 PM, Vladimir Oltean wrote:
> > On Sun, Sep 12, 2021 at 07:06:25PM -0700, Florian Fainelli wrote:
> > > On 9/12/2021 9:33 AM, Vladimir Oltean wrote:
> > > > On Sun, Sep 12, 2021 at 09:24:53AM -0700, Florian Fainelli wrote:
> > > > > On 9/12/2021 9:19 AM, Vladimir Oltean wrote:
> > > > > > On Sun, Sep 12, 2021 at 09:13:36AM -0700, Florian Fainelli wrote:
> > > > > > > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
> > > > > > >
> > > > > > > Did you post this as a RFC for a particular reason, or just to give
> > > > > > > reviewers some time?
> > > > > >
> > > > > > Both.
> > > > > >
> > > > > > In principle there's nothing wrong with what this patch does, only
> > > > > > perhaps maybe something with what it doesn't do.
> > > > > >
> > > > > > We keep saying that a network interface should be ready to pass traffic
> > > > > > as soon as it's registered, but that "walk dst->ports linearly when
> > > > > > calling dsa_port_setup" might not really live up to that promise.
> > > > >
> > > > > That promise most definitively existed back when Lennert wrote this code and
> > > > > we had an array of ports and the switch drivers brought up their port in
> > > > > their ->setup() method, nowadays, not so sure anymore because of the
> > > > > .port_enable() as much as the list.
> > > > >
> > > > > This is making me wonder whether the occasional messages I am seeing on
> > > > > system suspend from __dev_queue_xmit: Virtual device %s asks to queue
> > > > > packet! might have something to do with that and/or the inappropriate
> > > > > ordering between suspending the switch and the DSA master.
> > > >
> > > > Sorry, I have never tested the suspend/resume code path, mostly because
> > > > I don't know what would the easiest way be to wake up my systems from
> > > > suspend. If you could give me some pointers there I would be glad to
> > > > look into it.
> > >
> > > If your systems support suspend/resume just do:
> > >
> > > echo mem > /sys/power/state
> > > or
> > > echo standby > /sys/power/state
> > >
> > > if they don't, then maybe a x86 VM with dsa_loop may precipitate the
> > > problem, but since it uses DSA_TAG_PROTO_NONE, I doubt it, we would need to
> > > pass traffic on the DSA devices for this warning to show up.
> >
> > I figured out a working combination in the meanwhile, I even found a bug
> > in the process:
> > https://patchwork.kernel.org/project/netdevbpf/patch/20210912192805.1394305-1-vladimir.oltean@nxp.com/
> >
> > However I did not see those messages getting printed while pinging after
> > system resume (note that none of the DSA switch drivers I tested with
> > did implement .suspend or .resume), with net-next or with linux-stable/linux-5.14.y.
> >
> > Is there more to your setup to reproduce this issue?
>
> All switch ports are brought up with a DHCP client, the issue is
> definitively intermittent and not frequent, I don't have suspend/resume
> working on 5.14.y yet, but going as far back as 5.10.y should let you see
> the same kind of messages.

Nope, I don't see the message on linux-5.10.y either, but even if I look
at the code I don't understand what goes on.

Here are some facts:

- __dev_queue_xmit only prints "Virtual device %s asks to queue packet!"
  for devices which have a NULL dev->qdisc->enqueue, but return
  NETDEV_TX_BUSY in dev_hard_start_xmit

- only the noqueue qdisc manages that feat; normally register_qdisc
  sets up a default ->enqueue pointer (noop_qdisc_ops.enqueue), but
  there is a hack in noqueue_init which subverts that.

- The default qdiscs are assigned by attach_default_qdiscs(). There are
  two ways in which an interface can get the noqueue_qdisc_ops by default:
  (a) it declares the IFF_NO_QUEUE feature
  (b) the init of its other default qdisc (for example mq) failed for
      some reason, the kernel prints a warning and falls back to noqueue.

- DSA declares IFF_NO_QUEUE, so it uses the noqueue qdisc by default

- DSA never returns anything other than NETDEV_TX_OK in ndo_start_xmit.

So my conclusion is:
you have the noqueue qdisc on the DSA master, which got there either by
accident due to a failure of the mq qdisc initialization, or
intentionally by:

tc qdisc add dev eth0 root noqueue

When you use the noqueue qdisc and your DSA master device driver returns
NETDEV_TX_BUSY, the packet won't get queued, so the system is pretty
much within its rights to complain.

