All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: patch problem - mv88e6xxx: flush switchdev FDB workqueue
       [not found] <ccf51795-5821-203d-348e-295aabbdc735@danysek.cz>
@ 2022-03-13 14:10 ` Vladimir Oltean
  2022-03-13 14:18   ` Vladimir Oltean
  2022-03-14  6:38   ` gregkh
  0 siblings, 2 replies; 6+ messages in thread
From: Vladimir Oltean @ 2022-03-13 14:10 UTC (permalink / raw)
  To: Daniel Suchy; +Cc: davem, rafael.richter, gregkh, netdev, stable, linux-kernel

Hi Daniel,

On Sun, Mar 13, 2022 at 03:03:07PM +0100, Daniel Suchy wrote:
> Hello,
> 
> I noticed boot problems on my Turris Omnia (with Marvell 88E6176 switch
> chip) after "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
> removing VLAN" commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2566a89b9e163b2fcd104d6005e0149f197b8a48
> 
> Within logs I catched hung kernel tasks (see below), at least first is
> related to DSA subsystem.
> 
> When I revert this patch, everything works as expected and without any
> issues.
> 
> In my setup, I have few vlans on affected switch (i'm using ifupdown2 v3.0
> with iproute2 5.16 for configuration).
> 
> It seems your this patch introduces some new problem (at least for 5.15
> kernels). I suggest revert this patch.
> 
> - Daniel

Oh wow, I'm terribly sorry. Yes, this patch shouldn't have been
backported to kernel 5.15 and below, but I guess I missed the
backport notification email and forgot to tell Greg about this.
Patch "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
removing VLAN" needs to be immediately reverted from these trees.

Greg, to avoid this from happening in the future, would something like
this work? Is this parsed in some way?

Depends-on: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") # which first appeared in v5.16

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: patch problem - mv88e6xxx: flush switchdev FDB workqueue
  2022-03-13 14:10 ` patch problem - mv88e6xxx: flush switchdev FDB workqueue Vladimir Oltean
@ 2022-03-13 14:18   ` Vladimir Oltean
  2022-03-14  6:38   ` gregkh
  1 sibling, 0 replies; 6+ messages in thread
From: Vladimir Oltean @ 2022-03-13 14:18 UTC (permalink / raw)
  To: Daniel Suchy; +Cc: davem, rafael.richter, gregkh, netdev, stable, linux-kernel

On Sun, Mar 13, 2022 at 04:10:30PM +0200, Vladimir Oltean wrote:
> Hi Daniel,
> 
> On Sun, Mar 13, 2022 at 03:03:07PM +0100, Daniel Suchy wrote:
> > Hello,
> > 
> > I noticed boot problems on my Turris Omnia (with Marvell 88E6176 switch
> > chip) after "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
> > removing VLAN" commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2566a89b9e163b2fcd104d6005e0149f197b8a48
> > 
> > Within logs I catched hung kernel tasks (see below), at least first is
> > related to DSA subsystem.
> > 
> > When I revert this patch, everything works as expected and without any
> > issues.
> > 
> > In my setup, I have few vlans on affected switch (i'm using ifupdown2 v3.0
> > with iproute2 5.16 for configuration).
> > 
> > It seems your this patch introduces some new problem (at least for 5.15
> > kernels). I suggest revert this patch.
> > 
> > - Daniel
> 
> Oh wow, I'm terribly sorry. Yes, this patch shouldn't have been
> backported to kernel 5.15 and below, but I guess I missed the
> backport notification email and forgot to tell Greg about this.
> Patch "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
> removing VLAN" needs to be immediately reverted from these trees.
> 
> Greg, to avoid this from happening in the future, would something like
> this work? Is this parsed in some way?
> 
> Depends-on: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") # which first appeared in v5.16

Looks like only 5.15 is affected. I re-checked the backport notification
emails, and the patch failed to apply on 5.10, 5.4, 4.19 and 4.14, as
intended.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: patch problem - mv88e6xxx: flush switchdev FDB workqueue
  2022-03-13 14:10 ` patch problem - mv88e6xxx: flush switchdev FDB workqueue Vladimir Oltean
  2022-03-13 14:18   ` Vladimir Oltean
@ 2022-03-14  6:38   ` gregkh
  2022-03-14  7:21     ` gregkh
  1 sibling, 1 reply; 6+ messages in thread
From: gregkh @ 2022-03-14  6:38 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Daniel Suchy, davem, rafael.richter, netdev, stable, linux-kernel

On Sun, Mar 13, 2022 at 02:10:31PM +0000, Vladimir Oltean wrote:
> Hi Daniel,
> 
> On Sun, Mar 13, 2022 at 03:03:07PM +0100, Daniel Suchy wrote:
> > Hello,
> > 
> > I noticed boot problems on my Turris Omnia (with Marvell 88E6176 switch
> > chip) after "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
> > removing VLAN" commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2566a89b9e163b2fcd104d6005e0149f197b8a48
> > 
> > Within logs I catched hung kernel tasks (see below), at least first is
> > related to DSA subsystem.
> > 
> > When I revert this patch, everything works as expected and without any
> > issues.
> > 
> > In my setup, I have few vlans on affected switch (i'm using ifupdown2 v3.0
> > with iproute2 5.16 for configuration).
> > 
> > It seems your this patch introduces some new problem (at least for 5.15
> > kernels). I suggest revert this patch.
> > 
> > - Daniel
> 
> Oh wow, I'm terribly sorry. Yes, this patch shouldn't have been
> backported to kernel 5.15 and below, but I guess I missed the
> backport notification email and forgot to tell Greg about this.
> Patch "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
> removing VLAN" needs to be immediately reverted from these trees.
> 
> Greg, to avoid this from happening in the future, would something like
> this work? Is this parsed in some way?
> 
> Depends-on: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") # which first appeared in v5.16

The "Fixes:" tag will solve this, please just use that in the future.

I'll go revert this, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: patch problem - mv88e6xxx: flush switchdev FDB workqueue
  2022-03-14  6:38   ` gregkh
@ 2022-03-14  7:21     ` gregkh
  2022-03-14 11:17       ` Vladimir Oltean
  0 siblings, 1 reply; 6+ messages in thread
From: gregkh @ 2022-03-14  7:21 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Daniel Suchy, davem, rafael.richter, netdev, stable, linux-kernel

On Mon, Mar 14, 2022 at 07:38:50AM +0100, gregkh@linuxfoundation.org wrote:
> On Sun, Mar 13, 2022 at 02:10:31PM +0000, Vladimir Oltean wrote:
> > Hi Daniel,
> > 
> > On Sun, Mar 13, 2022 at 03:03:07PM +0100, Daniel Suchy wrote:
> > > Hello,
> > > 
> > > I noticed boot problems on my Turris Omnia (with Marvell 88E6176 switch
> > > chip) after "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
> > > removing VLAN" commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2566a89b9e163b2fcd104d6005e0149f197b8a48
> > > 
> > > Within logs I catched hung kernel tasks (see below), at least first is
> > > related to DSA subsystem.
> > > 
> > > When I revert this patch, everything works as expected and without any
> > > issues.
> > > 
> > > In my setup, I have few vlans on affected switch (i'm using ifupdown2 v3.0
> > > with iproute2 5.16 for configuration).
> > > 
> > > It seems your this patch introduces some new problem (at least for 5.15
> > > kernels). I suggest revert this patch.
> > > 
> > > - Daniel
> > 
> > Oh wow, I'm terribly sorry. Yes, this patch shouldn't have been
> > backported to kernel 5.15 and below, but I guess I missed the
> > backport notification email and forgot to tell Greg about this.
> > Patch "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
> > removing VLAN" needs to be immediately reverted from these trees.
> > 
> > Greg, to avoid this from happening in the future, would something like
> > this work? Is this parsed in some way?
> > 
> > Depends-on: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") # which first appeared in v5.16
> 
> The "Fixes:" tag will solve this, please just use that in the future.

Ah, you did have a fixes tag here, so then use the way to say "you also
need to add another patch here" by adding the sha to the line for the
stable tree:
	cc: stable@vger.kernel.org # 0faf890fc519

So, should I just backport that commit instead?  The "Fixes:" line says
this needs to be backported to 4.14, which is why I added it to these
trees.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: patch problem - mv88e6xxx: flush switchdev FDB workqueue
  2022-03-14  7:21     ` gregkh
@ 2022-03-14 11:17       ` Vladimir Oltean
  2022-03-14 11:25         ` gregkh
  0 siblings, 1 reply; 6+ messages in thread
From: Vladimir Oltean @ 2022-03-14 11:17 UTC (permalink / raw)
  To: gregkh; +Cc: Daniel Suchy, davem, rafael.richter, netdev, stable, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2482 bytes --]

On Mon, Mar 14, 2022 at 08:21:30AM +0100, gregkh@linuxfoundation.org wrote:
> On Mon, Mar 14, 2022 at 07:38:50AM +0100, gregkh@linuxfoundation.org wrote:
> > On Sun, Mar 13, 2022 at 02:10:31PM +0000, Vladimir Oltean wrote:
> > > Hi Daniel,
> > > 
> > > On Sun, Mar 13, 2022 at 03:03:07PM +0100, Daniel Suchy wrote:
> > > > Hello,
> > > > 
> > > > I noticed boot problems on my Turris Omnia (with Marvell 88E6176 switch
> > > > chip) after "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
> > > > removing VLAN" commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2566a89b9e163b2fcd104d6005e0149f197b8a48
> > > > 
> > > > Within logs I catched hung kernel tasks (see below), at least first is
> > > > related to DSA subsystem.
> > > > 
> > > > When I revert this patch, everything works as expected and without any
> > > > issues.
> > > > 
> > > > In my setup, I have few vlans on affected switch (i'm using ifupdown2 v3.0
> > > > with iproute2 5.16 for configuration).
> > > > 
> > > > It seems your this patch introduces some new problem (at least for 5.15
> > > > kernels). I suggest revert this patch.
> > > > 
> > > > - Daniel
> > > 
> > > Oh wow, I'm terribly sorry. Yes, this patch shouldn't have been
> > > backported to kernel 5.15 and below, but I guess I missed the
> > > backport notification email and forgot to tell Greg about this.
> > > Patch "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
> > > removing VLAN" needs to be immediately reverted from these trees.
> > > 
> > > Greg, to avoid this from happening in the future, would something like
> > > this work? Is this parsed in some way?
> > > 
> > > Depends-on: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") # which first appeared in v5.16
> > 
> > The "Fixes:" tag will solve this, please just use that in the future.
> 
> Ah, you did have a fixes tag here, so then use the way to say "you also
> need to add another patch here" by adding the sha to the line for the
> stable tree:
> 	cc: stable@vger.kernel.org # 0faf890fc519
> 
> So, should I just backport that commit instead?  The "Fixes:" line says
> this needs to be backported to 4.14, which is why I added it to these
> trees.
> 
> thanks,

No, don't backport the dependency, just revert the patch (hence my
question: how can I describe "don't backport beyond commit X"?).

Here, you can apply the revert attached.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Revert-net-dsa-mv88e6xxx-flush-switchdev-FDB-workque.patch --]
[-- Type: text/x-diff; name="0001-Revert-net-dsa-mv88e6xxx-flush-switchdev-FDB-workque.patch", Size: 3851 bytes --]

From 571ca982c0a53d127b4c92339262c1e896e9c5a1 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Mon, 14 Mar 2022 01:24:59 +0200
Subject: [PATCH] Revert "net: dsa: mv88e6xxx: flush switchdev FDB workqueue
 before removing VLAN"

This reverts commit 2566a89b9e163b2fcd104d6005e0149f197b8a48.

The above change depends on upstream commit 0faf890fc519 ("net: dsa:
drop rtnl_lock from dsa_slave_switchdev_event_work"), which is not
present in linux-5.15.y. Without that change, waiting for the switchdev
workqueue causes deadlocks on the rtnl_mutex.

Backporting the dependency commit isn't trivial/desirable, since it
requires that the following dependencies of the dependency are also
backported:

df405910ab9f net: dsa: sja1105: wait for dynamic config command completion on writes too
eb016afd83a9 net: dsa: sja1105: serialize access to the dynamic config interface
2468346c5677 net: mscc: ocelot: serialize access to the MAC table
f7eb4a1c0864 net: dsa: b53: serialize access to the ARL table
cf231b436f7c net: dsa: lantiq_gswip: serialize access to the PCE registers
338a3a4745aa net: dsa: introduce locking for the address lists on CPU and DSA ports

and then this bugfix on top:

8940e6b669ca ("net: dsa: avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held")

Reported-by: Daniel Suchy <danny@danysek.cz>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 7 -------
 include/net/dsa.h                | 1 -
 net/dsa/dsa.c                    | 1 -
 net/dsa/dsa_priv.h               | 1 +
 4 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 263da7e2d6be..056e3b65cd27 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2291,13 +2291,6 @@ static int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port,
 	if (!mv88e6xxx_max_vid(chip))
 		return -EOPNOTSUPP;
 
-	/* The ATU removal procedure needs the FID to be mapped in the VTU,
-	 * but FDB deletion runs concurrently with VLAN deletion. Flush the DSA
-	 * switchdev workqueue to ensure that all FDB entries are deleted
-	 * before we remove the VLAN.
-	 */
-	dsa_flush_workqueue();
-
 	mv88e6xxx_reg_lock(chip);
 
 	err = mv88e6xxx_port_get_pvid(chip, port, &pvid);
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 49e5ece9361c..d784e76113b8 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -1056,7 +1056,6 @@ void dsa_unregister_switch(struct dsa_switch *ds);
 int dsa_register_switch(struct dsa_switch *ds);
 void dsa_switch_shutdown(struct dsa_switch *ds);
 struct dsa_switch *dsa_switch_find(int tree_index, int sw_index);
-void dsa_flush_workqueue(void);
 #ifdef CONFIG_PM_SLEEP
 int dsa_switch_suspend(struct dsa_switch *ds);
 int dsa_switch_resume(struct dsa_switch *ds);
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 4ff03fb262e0..41f36ad8b0ec 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -349,7 +349,6 @@ void dsa_flush_workqueue(void)
 {
 	flush_workqueue(dsa_owq);
 }
-EXPORT_SYMBOL_GPL(dsa_flush_workqueue);
 
 int dsa_devlink_param_get(struct devlink *dl, u32 id,
 			  struct devlink_param_gset_ctx *ctx)
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 33ab7d7af9eb..a5c9bc7b66c6 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -170,6 +170,7 @@ void dsa_tag_driver_put(const struct dsa_device_ops *ops);
 const struct dsa_device_ops *dsa_find_tagger_by_name(const char *buf);
 
 bool dsa_schedule_work(struct work_struct *work);
+void dsa_flush_workqueue(void);
 const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops);
 
 static inline int dsa_tag_protocol_overhead(const struct dsa_device_ops *ops)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: patch problem - mv88e6xxx: flush switchdev FDB workqueue
  2022-03-14 11:17       ` Vladimir Oltean
@ 2022-03-14 11:25         ` gregkh
  0 siblings, 0 replies; 6+ messages in thread
From: gregkh @ 2022-03-14 11:25 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Daniel Suchy, davem, rafael.richter, netdev, stable, linux-kernel

On Mon, Mar 14, 2022 at 11:17:50AM +0000, Vladimir Oltean wrote:
> On Mon, Mar 14, 2022 at 08:21:30AM +0100, gregkh@linuxfoundation.org wrote:
> > On Mon, Mar 14, 2022 at 07:38:50AM +0100, gregkh@linuxfoundation.org wrote:
> > > On Sun, Mar 13, 2022 at 02:10:31PM +0000, Vladimir Oltean wrote:
> > > > Hi Daniel,
> > > > 
> > > > On Sun, Mar 13, 2022 at 03:03:07PM +0100, Daniel Suchy wrote:
> > > > > Hello,
> > > > > 
> > > > > I noticed boot problems on my Turris Omnia (with Marvell 88E6176 switch
> > > > > chip) after "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
> > > > > removing VLAN" commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2566a89b9e163b2fcd104d6005e0149f197b8a48
> > > > > 
> > > > > Within logs I catched hung kernel tasks (see below), at least first is
> > > > > related to DSA subsystem.
> > > > > 
> > > > > When I revert this patch, everything works as expected and without any
> > > > > issues.
> > > > > 
> > > > > In my setup, I have few vlans on affected switch (i'm using ifupdown2 v3.0
> > > > > with iproute2 5.16 for configuration).
> > > > > 
> > > > > It seems your this patch introduces some new problem (at least for 5.15
> > > > > kernels). I suggest revert this patch.
> > > > > 
> > > > > - Daniel
> > > > 
> > > > Oh wow, I'm terribly sorry. Yes, this patch shouldn't have been
> > > > backported to kernel 5.15 and below, but I guess I missed the
> > > > backport notification email and forgot to tell Greg about this.
> > > > Patch "net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
> > > > removing VLAN" needs to be immediately reverted from these trees.
> > > > 
> > > > Greg, to avoid this from happening in the future, would something like
> > > > this work? Is this parsed in some way?
> > > > 
> > > > Depends-on: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") # which first appeared in v5.16
> > > 
> > > The "Fixes:" tag will solve this, please just use that in the future.
> > 
> > Ah, you did have a fixes tag here, so then use the way to say "you also
> > need to add another patch here" by adding the sha to the line for the
> > stable tree:
> > 	cc: stable@vger.kernel.org # 0faf890fc519
> > 
> > So, should I just backport that commit instead?  The "Fixes:" line says
> > this needs to be backported to 4.14, which is why I added it to these
> > trees.
> > 
> > thanks,
> 
> No, don't backport the dependency, just revert the patch (hence my
> question: how can I describe "don't backport beyond commit X"?).
> 
> Here, you can apply the revert attached.


Thanks, now queued up.

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-03-14 11:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <ccf51795-5821-203d-348e-295aabbdc735@danysek.cz>
2022-03-13 14:10 ` patch problem - mv88e6xxx: flush switchdev FDB workqueue Vladimir Oltean
2022-03-13 14:18   ` Vladimir Oltean
2022-03-14  6:38   ` gregkh
2022-03-14  7:21     ` gregkh
2022-03-14 11:17       ` Vladimir Oltean
2022-03-14 11:25         ` gregkh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.